|Home | About | Journals | Submit | Contact Us | Français|
9p21.3 is among the most strongly replicated regions for cardiovascular disease (CVD). There are few reports of sequencing the associated 9p21.3 interval. We set out to sequence the 9p21.3 region followed by a comprehensive study of genetic associations with clinical and subclinical CVD and its risk factors, and with copy number variation and gene expression, in the Framingham Heart Study (FHS).
We sequenced 281 individuals (n=94 with myocardial infarction, n=94 with high coronary artery calcium levels, and n=93controls free of elevated coronary artery calcium or myocardial infarction) followed by genotyping and association in >7,000 additional FHS individuals. We assessed genetic associations with clinical and subclinical CVD, risk factor phenotypes, and gene expression levels of protein-coding genes CDKN2A and CDKN2B as well as the non-coding gene ANRIL in freshly harvested leukocytes and platelets. Within this large sample we found strong associations of 9p21.3 variants with increased risk for myocardial infarction, higher coronary artery calcium levels, and larger abdominal aorta diameters, and no evidence for association with traditional CVD risk factors. No common protein-coding variation, variants in splice donor or acceptor sites, or CNV events were observed. By contrast, strong associations were observed between genetic variants and gene expression, particularly for a short isoform of ANRIL and for CDKN2B.
Our thorough genomic characterization of 9p21.3 suggests common variants likely account for observed disease associations, and provide further support for the hypothesis that complex regulatory variation affecting ANRIL and CDKN2B gene expression may contribute to increased risk for clinically apparent and subclinical coronary artery disease and aortic disease.
The human chromosome 9p21.3 region, containing protein-coding genes CDKN2A and CDKN2B and the non-coding gene ANRIL, isone of the strongest and most consistently replicated genetic regions to emerge from genome-wide association studies (GWAS). SNPs in 9p21.3 are associated most strongly with myocardial infarction (MI) and coronary artery disease (CAD)1 but are also associated with abdominal2,3 and intracranial aneurysm3, Type II diabetes (e.g.,4), cancers 5,6, peripheral artery disease3,7, and coronary artery calcium (CAC), a measure of subclinical coronary artery atherosclerosis8,9, suggesting phenotypic pleiotropy of associations in the region. More modest associations have also been observed with incident heart failure10, sudden cardiac death11, coronary stenosis12, and weakly and somewhat inconsistently with stroke13. Thus far, the associations with cancer risk and Type II diabetes appear distinct, by physical location and correlation among the most strongly associated SNP, from the peak cardiovascular disease (CVD) associations that are located primarily in the 3′ region of ANRIL.
Various groups have reported that SNPs associated with cardiovascular outcomes are associated with the gene expression of ANRIL, CDKN2A, and CDKN2B, suggesting that differences in regulatory elements may at least partially explain the functional contribution of risk alleles to disease phenotypes3,14–21. The majority of the studies on 9p21.3 to date have focused on either single SNPs or directly genotyped or imputed SNPs in GWAS based on commercial genotyping arrays. Few groups have yet reported large-scale resequencing of the 9p21.3 region18,22, or on the presence or absence of copy number variations (CNV) in the region1, and to date reports have included only relatively small sample sizes. We set out to sequence the 9p21.3 region in three groups of participants selected by presence of MI or subclinical CAD (total n=281) in the Framingham Heart Study (FHS) to discover new variation and follow up novel and known variants in a larger set of FHS participants(n>7,000)to test associations with clinical and subclinical CVD, and CVD risk factors. We additionally sought to test for association with expression (mRNA) of genes in the region in several thousand individuals in leukocytes and platelets, to construct a denser LD map in the region than available in the HapMap, and determine if common CNVs exist in the 9p21.3 region and if it is associated with disease or its risk factors.
The Framingham Heart Study (FHS) is a community-based, prospective, longitudinal study of three generations of participants. The original cohort enrolled 5,209 participants starting in 194823, the Offspring (Second Generation) cohort enrolled 5,124 children and spouses of children of the original cohort starting in 197124, and the Third Generation enrolled 4,095 children of the Offspring cohort 2002–200525. Subjects from families were recruited for CT scans including 1,390 offspring (Second Generation) cohort participants at Exam 7 and 2,093 third generation participants at Exam 1. Participants for sequencing and association analysis (n=281) in the Discovery sequencing stage were drawn from the offspring cohort, as described below. Participants for genotyping and association analysis (up to n=7,290) in the Follow-up genotyping stage were drawn from the offspring and third generation cohorts, as described below. The overall study design, flow and key Table references are given in a flowchart in Figure S1.
Coronary artery calcium (CAC), abdominal aortic calcium (AAC), and aortic diameters were measured by an 8-slice cardiac multidetector (MDCT) scanner (Lightspeed Ultra, GE, Milwaukee, WI) as previously described26. CAC scores were calculated by a modified Agatston score based on the average of two sequential scans. There were 3,238 FHS participants with both CAC and follow-up genotyping available for analysis. The CAC dichotomous categories were defined as low (mean CAC value <100) and high (mean CAC ≥ 100). Abdominal aortic calcium (AAC) continuous measures were averaged from at least two measurements, with 3,316 participants also having genotypes available. Computed tomography measurements of antero-posterior arterial diameters were calculated. Among individuals with genotypes, diameter measurements were available in participants (n ranging from 3,287 to 3,300 at four anatomical sites): the ascending (AAO) and descending (DAO) thoracic aorta at the level of the right pulmonary artery, and the abdominal aorta 5 cm above (ABAO-5) and at (ABAO) the aorto-common iliac bifurcation. Individuals with known CVD (cardiovascular disease) or abdominal or thoracic surgeries were excluded.
Recognized MI and other CAD were defined as previously described, adjudicated by a panel of physicians and included ECG, cardiac biomarker, case history and/or autopsy evidence27. Hard coronary heart disease was defined as death due to CVD or a recognized MI. Prevalent events were identified across all available examinations, while incident events were identified as those which took place after the DNA collection for each individual. Age of onset for MI was defined using the date of the first documented event relative to birth date. Among those in the Follow-up genotyping stage, 113 prevalent MI cases and 72 incident MI cases were available for analysis.
Cardiovascular disease risk factors were collected as previously described.27 The risk factors were measured at the same examination for each cohort as the CAC and AAC measurements, and included total cholesterol, HDL cholesterol, log (triglyceride levels), body mass index, systolic blood pressure, hypertension (defined as SBP ≥ 140 or DBP ≥ 90 mm Hg or treatment with anti-hypertensive medication), prevalence of Type II diabetes(defined as a fasting plasma glucose ≥ 126 mg/dL and/or use of anti-diabetic treatment), and cigarette smoking (current smoker, regularly smoked ≥1 cigarette/day within the last 12 months). Each of these risk factors was included in the multivariable-adjusted models along with the reported use of lipid-lowering medication.
Participants from the FHS offspring cohort were selected in three groups based upon presence or absence of prior clinically apparent MI or high CAC for 9p21.3 resequencing: Group I) individuals with early onset (men <55, women <65) MI (n=94), Group II) individuals with no recognized MI, but high age-and sex-adjusted CAC28, defined as being in >90th percentile (n=94, mean ± SD: males 1775 ± 1466, females 544 ± 631), and Group III) of individuals with no recognized MI and with low age-adjusted CAC, defined as a value <25th percentile(n=93, mean ± SD: males 8 ± 16, females, 0 ± 0). Characteristics of the three groups are provided in Supplemental Table 1. To account for the family structure within FHS, we preferentially selected one participant per pedigree emphasizing individuals with earliest onset of MI, with highest CAC value, or with lowest sex-specific CAC within pedigrees with no evidence for MI or high CAC. Participants at age <58 years at the time of CT scan were excluded from Group III. A set of males (n=52) met criteria for Group III and were selected to provide a comparable sex distribution with Group II. Of females meeting the criteria, 44 were randomly selected to yield a total of 94 participants in Group III. Sequencing results were only available for 93 individuals in Group III since one sample had average call rates less than 80% across the 9p21.3 region.
Genomic DNA was whole-genome amplified and~15 ug transferred to each of two resequencing centers: the J. Craig Venter Institute and the University of Washington, Department of Genome Sciences sequencing center. The full targeted region for Sanger-based resequencing was 250,000 bases on chromosome 9 between 21,940,000 and 22,190,000 (NCBI Build 36.3 coordinates). The University of Washington targeted the region containing CDKN2A and CDKN2B from 21,940,000 – 22,069,266 and the Venter Institute targeted 22,070,300 – 22,190,000, encompassing ANRIL. Bidirectional sequencing reactions using BigDye Terminator v3.1 chemistry were employed (Applied Biosystems, Foster City, CA) with chromatograms generated by Applied Biosystems 3730XL capillary sequencers. Further details on the sequencing methods are in the Supplementary Text. The recovery of targeted regions was CDKN2A (99.74%), CDKN2B (99.87%) and ANRIL (96.4%). Identified variants were deposited in dbSNP by the sequencing centers.
Bi-allelic SNP variants with minor allele frequency (MAF) ≥ 2.0% were tested for association across groups under an additive model adjusting for age and sex. For each variant two group comparisons were made: 1) for MI: Group I (MI) versus Group III (no MI, low CAC), and 2) for CAC: Group II (no MI, high CAC) versus Group III (no MI, low CAC). Furthermore, each SNP was tested for differences in Group allele counts with both chi-square tests and Fisher’s exact tests to try to improve performance for rarer variants. Variants with one or more group comparison having P < 0.05 were selected for follow up genotyping in a larger sample of FHS participants.
Variants in the same 9p21 region assessed in Framingham were extracted from whole genome sequence data in the ARIC and CHS cohorts (n=187 MI/CHD cases from baseline, n=454 controls). Sequence kernel association tests (SKAT) 29 and T1 burden tests 30, adjusting for age, sex and study site, were applied to the whole region, as well the four individual gene regions (see Supplemental Text for further details).
172 SNPs in the 9p21.3 region were selected for further genotyping as follows: P < 0.05 for both MI and high CAC (n=27), P < 0.05 for MI alone (n=106), P < 0.05 for high CAC alone (n=22), SNPs in the region of densest prior reported associations which had poor call rates by sequencing including rs1333049 (n=8), SNPs with prior haplotype associations that were not significant in the Discovery sequencing stage analysis (n=2), and SNPs with large differential genotype counts between MI and/or high CAC groups and control groups which did not reach significance in the Discovery sequencing stage (n=7). The SNPs were submitted for assay design QC (Illumina, Inc., San Diego, CA) with 140 SNPs receiving good quality metrics and therefore selected for final genotyping.
Framingham Heart Study participants with available genomic DNA with cell line backups and with prior genotyping in the SHARe project with a genome-wide call rate >97% were selected for genotyping (n=7,379). Characteristics of the Follow-up stage participants are provided in Supplemental Table 2. Genotyping of 140 SNPs was conducted by Illumina Golden Gate assay. After DNA sample genotyping failures (n=89) SNP results were available for analysis in 7,290 FHS participants, including 260/281 participants included in the Discovery sequencing stage. Twenty of those from Discovery sequencing were not genotyped because they did not have plated genomic DNA with adequate cell line backups. One Discovery sample failed follow-up genotyping. Fourteen of 140 attempted SNPs failed genotyping leaving 126 SNPs available for analysis (average call rate for successful SNPs: 99.85%). A final list of 126 SNPs included in the Follow-up genotyping stage analysis and their coordinates and rs numbers are shown in Supplemental Table 3.
Regression models were used to examine associations with clinical CVD events, subclinical CVD ascertained by computed tomography (CAC, AAC, aortic diameters), and traditional CVD risk factors. Association analyses for CAC and AAC were conducted after adjusting for age at CT scan and sex, and separately with multivariable risk factors. Associations with CT aortic diameters (AAO, DAO, ABAO, and ABAO-5) were conducted after adjustment for age, height, and sex. Associations with traditional CVD risk factors were tested in age-and sex-adjusted models.
Primary Follow-up stage analyses were conducted on all participants successfully genotyped in order to maximize statistical power. This included 260/281 participants sequenced in the Discovery sequencing stage. Secondary analyses were also conducted with the 7,030 participants not sequenced in the Discovery stage and yielded similar results. Analysis of continuous traits used a linear mixed effects (LME) model with a random effects component to account for relatedness in FHS, as previously described 31. Additive, recessive and dominant models were analyzed for all continuous traits. Dichotomous traits were analyzed using general estimating equations as previously described 31. Conditional age-and sex-adjusted association analyses were conducted for continuous and dichotomous CAC by including a covariate for SNP genotype in the model for each of the top two SNPs in turn (rs1333049, rs1333045) to see which associations remained after accounting for these variants.
Since our study included novel variants from sequencing and a larger sample number than the HapMap CEU reference sample commonly used in GWAS imputation, we created a linkage disequilibrium (LD) map based on 1,170 unrelated individuals in the FHS offspring cohort. Genotype results were combined from the Follow-up genotyping stage for all non-monomorphic SNPs(n=125) and from prior GWAS studies in the FHS (n=23 additional SNPs). One participant was randomly selected from each family and genotypes for 1,170 unrelated individuals were loaded into Haploview (v4.2) to calculate haplotypes and pairwise LD (Figure 1). There were 67 of 126 tested SNPs that were perfectly correlated with 1 or more other SNPs (the 67 SNPs were in 22 distinct intra-correlated groups). Thus, we calculated an effective number of tests for follow-up associations as 81 SNPs per trait(59 non-perfectly correlated SNPs (126 minus 67) + 22 groups of perfectly correlated SNPs), yielding a Bonferroni threshold for each trait of P= [0.05/81]= 6.2×10−4.
Citrated venous blood was collected from Offspring Exam 8 participants as previously described32. Briefly, either platelet rich plasma (PRP) was isolated by centrifugation and filtered to remove white blood cell contaminants, or separately peripheral mononuclear cells (PBMCs) were isolated from whole blood buffy coats using centrifugation in CPT Ficoll separation tubes (Becton Dickinson, Franklin Lakes, NJ). Pellets were lysed with total RNA lysis solution (Qiagen, Germantown, MD) and frozen at −80°C. Total RNA was isolated with RNeasy Mini kits and purified with a Qiacube instrument (Qiagen). RNA quantity and quality was evaluated by NanoDrop Spectrophotometer (Nanodrop, Wilmington, DE). Complementary DNA (cDNA) was prepared with High Capacity cDNA Reverse Transcription Kits (Applied Biosystems, Foster City, CA) and pre-amplification conducted with a TaqMan PreAmp Master Mix (Applied Biosystems). Quantitative RT-PCR was run with TaqMan assays in DynamicArray 48.48 chips on a BioMark RTPCR system (Fluidigm, San Francisco, CA). TaqMan probe sequences for CDKN2A, CDKN2B and ANRIL long and short isoforms are given in Supplementary Table 4.
Of the Offspring participants with gene expression measurements, up to 1,499 also had Follow-up stage genotyping, CAC(n up to 618) or AAC (n up to 653). Gene expression levels were normalized to the mean cycle threshold (CT) of three control genes within samples (beta-actin, ACTB, beta-2-microglobulin, B2M, glyceraldehyde-3-phosphate dehydrogenase, GAPDH) by the 2−ΔΔCT method. Samples with CT values ≥ 30 for target genes or for all control genes were excluded from analysis. Due to the non-normal distribution of result after normalization relative expression levels were log-transformed before analysis. Association of SNPs with gene expression was evaluated under additive and recessive models and adjusting for either age, or age and sex. Association of expression levels with CAC or AAC was evaluated with LME adjusting for age, and alternatively age and sex.
Genotype intensities from the Affymetrix 500K (StyI and NspI arrays) for ~8,700 individual FHS samples with genome-wide call rates >97% were quantile normalized to the median intensity of a reference group of samples (Golden Helix SNP Variation Suite, Bozemann, MT). A logR ratio for individuals at each marker was calculated and corrected using principal components analysis for genotyping batch effects (Golden Helix SVS). Extreme outliers of logR ratios were excluded and CNVs were called as neutral (no gain or loss), a gain (one or more copies) or a loss (one or more copies). In total 425 SNPs were evaluated for potential CNV on chromosome 9 in a region that stretched from rs7025851 (at position 20,951,906) to rs1416509 (at position 23,055,716). This region surrounds our targeted sequencing and association region by ~1 Mb on either side.
We found 1,363 biallelic SNPs (excluding indels) and 55 complex variants (including single and multinucleotide insertions and deletions) that were present in one or more sample (Table 1A). Of all the variants, 651 (636 SNPs, 15 complex variants) were singletons. After categorizing against known variants in dbSNP Build 128 we found that 1,011/1,418 (71.3%) variants were novel. The novel variants clustered toward lower minor allele frequencies (Table 1A), and overall there appeared to be an excess of variants of low minor allele frequency in those in Group I (early onset MI, n=345 alleles counted with MAF<0.5%) or Group II (high levels of CAC, n=312 allele counts) compared with Group III (no prior MI and low CAC, n=286 allele counts), as shown in Supplemental Table 5. Additional MI/CHD cases and controls were sequenced for the same 9p21 region in the ARIC and CHS studies. Burden testing in ARIC + CHS for the whole region as well as four gene sub-regions indicated the most significant result for SKAT across the whole 9p21 region (P=0.052)(Supplemental Table 6).
As shown in Table 1B, most of the 516 variants in FHS with minor allele counts ≥ 5 were within a transcribed portion of the targeted region (322/516, 60.5%) and located within introns (300/516, 58.1%). Eleven of the 22 exonic variants were located within the noncoding RNA ANRIL, eight were located in 3′ untranslated regions (UTR), two within 5′UTRs, and one was a known nonsynonymous variant (rs3731249, Ala148Thr in CDKN2B, MAF 2.1%). One novel variant was found to encode an A>C change within an AG splice acceptor sequence in intron 6 of ANRIL (hg18 position 22,046,249, MAF 0.9%). Two of the 3′UTR variants (rs3217992 in CDKN2A with MAF 42.9%, rs3088440 in CDKN2B with MAF 9.4%) are in predicted microRNA binding sites (http://www.patrocles.org/). None of these potentially functional variants was associated with either MI or CAC in the Discovery sequencing stage association results (all P>0.05), and none were selected for follow-up genotyping.
Based on the Discovery stage results (top results in Table 2, full results in Supplemental Table 3), 140 SNPs were selected for further genotyping and, of these SNPs, 126 SNPs were successfully genotyped (quality control information in Supplemental Table 3). We observed strong associations in the Follow-up genotyping stage with CAC levels as a continuous phenotype (P<5×10−9, Table 3). We examined other associations for the strongest CAC-associated SNP (rs1333045) in further detail, as well as for the 9p21.3 SNP that has been most studied in the literature (rs1333049; r2=0.84 with rs1333045 in FHS). Both SNPs were also associated with increasing abdominal aortic diameter and prevalent MI but were not strongly associated with any CVD risk factors (Table 3). In single SNP analyses or conditional analyses (adjusting for either SNP), rs1333045 showed stronger association with CAC. The magnitude, direction and statistical significance of these findings was similar in multivariable analyses (adjusting for age, sex and all traditional CVD risk factors) of the association of both SNPs with CAC phenotypes (data not shown).
The strongest individual SNP associations from the Follow-up genotyping stage for all measures are shown in Table 4. Only associations for CAC, abdominal aortic diameter, and prevalent MI survive a multiple-test correction (P<6.2×10−4). Notably, the peak SNP for prevalent MI (rs4977574) is the same SNP with previous, strong associations in larger GWAS meta-analyses by the MIGen and CARDIoGRAM Consorta33, and is in strong LD with the SNPs most strongly associated in FHS with CAC (rs1333045, r2= 0.76 in FHS) and abdominal diameter (rs1333046, r2= 0.87 in FHS). The risk alleles at 9p21.3 were weakly associated in FHS with increased ascending and descending aortic dimensions, and moderately associated with increased abdominal aortic dimensions (Table 4). A prior peak GWAS and replication SNP (rs10811661) for Type II diabetes in multiple populations (e.g.,4) was also associated in our population with Type II diabetes (p<7.03×10−4) supporting prior results. As previously noted this variant is not correlated with the CVD and subclinical CVD associated alleles. Results in multivariable-adjusted models for all of the traits and disease endpoints did not differ substantially (data not shown). Overall, these findings show a consistent direction of effect of 9p21.3 SNPs with subclinical atherosclerosis, aortic dimensions, and MI with at most weak associations with traditional CVD risk factors. Plots of region-specific associations for the phenotypes examined along with Framingham specific LD, and gene annotation are shown in Figure 1.
There were strong and independent expression associations of SNPs with CDKN2B and short isoforms of ANRIL (EU741058, DQ485454) in leukocytes (Table 5), but not for CKDN2A or the long isoform of ANRIL (NR_003529). We did not observe any associations that survived multiple-test correction for CDKN2A, or for any of the transcripts in platelets. The pairwise transcript correlations after adjusting for the mean of 3 housekeeping genes in samples that had each transcript assayed with cycle threshold <30 are given in Supplemental Table 7. Regional association plots for all 4 transcript levels in leukocytes are shown in Figure 1.
We detected no common (MAF >1%) CNV gains or losses at 9p21.3 by our methodology in the following coordinate range: 20,951,906–23,055,716. The most common events were clustered in a region 3′ of ANRIL and 5′ of DMRTA1 (from 22,273,153–22,366,805). Among ~8,700 samples the most common CN gain was at rs10811687 (position 22,293,833, n=73, MAF=0.84%) and the most common CN loss was at rs1360137 (position 22,366,805, n=16, MAF=0.18%). Combining CN gains and losses at each position rs10811687 showed the greatest number of events (73 CN gains, 11 CN losses, combined MAF=0.97%). Plotting single SNP CN signals (Figure 1) shows that uncommon events are observed throughout the sequenced region.
The chromosome 9p21.3 locus containing CDKN2A, CDKN2B, and ANRIL is one of the most important and consistently associated loci to emerge from modern genetic studies employing GWAS to study cardiovascular disease susceptibility. The locus has been the target of considerable further genetic and functional studies since the initial discovery, but the functional mechanisms contributing to disease remain unknown. To our knowledge, this is the first study to report on re-sequencing of the region coupled with a comprehensive examination of associations with subclinical and clinical CVD and risk factors, copy number variation, and gene expression, all within a single community-based population of large sample size. At least two other studies have reported sequencing 9p21.3 in smaller samples. One study reported no improvement in resolving stronger functional variants after resequencing of 45 samples combined with imputation and fine-mapping22. A second study reported resequencing 9p21.3 in 50 individuals with evidence that supports risk-associated alleles having effects on expression of CDKN2B and ANRIL through modulation of enhancer binding18. The current study employed a resequencing strategy in population-extremes consisting of clinically affected individuals, those with extreme levels of subclinical coronary atherosclerosis, and healthy matched controls, whereas the prior studies employed resequencing strategies without considering clinical phenotypes. A further strength of the current study is the follow-up in a large cohort with expression, subclinical and clinical measures that were analyzed.
After the Discovery sequencing stage, we found no variant discovered by sequencing with MAF >0.75% with an obvious link to protein function or splicing that was also associated with CAC or MI (see Results and Table 1B). Likewise, in a survey of ~8,700 individuals we did not find strong evidence for common copy number variation in the 9p21.3 region that could directly influence gene function, consistent with a prior report on CNV and MI1. We did observe a slight over-representation of rare variants (MAF <0.5%) in MI and high CAC individuals after sequencing compared to controls (Supplemental Table 5). While rare events are proposed to have the potential to create “synthetic” common disease associations34, the idea remains controversial35 and confirmation of this hypothesis raised by our study would require thorough characterization and replication of all rare and common events in 9p21 and their functional impacts and associations. We followed this hypothesis with independent burden testing from sub-analysis of whole genome sequencing data in the ARIC and CHS studies, but the results do not implicate any gene sub-regions and only approach significance across the whole region in SKAT analysis, which includes both common and rare variant effects. Our results for 9p21 may be consistent with the suggested pattern that the genetic variance of complex traits is mainly accounted for by many variants and loci with modest effects36–38.
We found links between common variation at 9p21.3 and expression of some transcripts within the region. In leukocyte and platelet samples that were also densely genotyped, we found that previously reported variants for 9p21.3 clinical associations were associated with decreased transcript levels of short isoforms (EU741058, DQ485454) of the noncoding RNA ANRIL (e.g., rs1333049, G allele on forward strand (MAF 49.4%), beta −0.15 se 0.05 p<4.1×10−3), but not with a long isoform (NR_003529.1, beta 0.02 se 0.05 p<0.61). This study represents the largest set of samples characterized for gene expression genetics at 9p21.3 to date among at least 10 other studies3,14–21,39. Although there are substantial differences in tissues, probes and isoforms targeted across these studies that may limit direct comparisons, our overall findings are consistent with several prior findings. First, there is consistency among the largest studies, including the current one, that the MI/CAC risk alleles are most strongly associated with decreased levels of short isoforms of ANRIL 14,39 and with weaker39 or no association14 with CDKN2A/2B or a long ANRIL isoform, despite the location of the risk alleles adjacent to the 3′ end of the long isoform. Some studies were unable to distinguish between short and long isoforms due to their assay designs 16–18,20. Second, regardless of genotypes, we observe correlation in leukocytes and platelets of the short and long isoforms of ANRIL (r=0.784 in leukocytes) and between CDKN2A and CDKN2B (r=0.585 in leukocytes) supporting the co-expression of those genes in several tissues, and consistent with findings in other studies14,16,17,19,39. The correlations between ANRIL isoforms and CDKN2A/2B seem to indicate positive correlation between ANRIL isoforms and CDKN2A, and negative correlation with CDKN2B after accounting for housekeeping genes (Supplemental Table 7). The negative correlation of ANRIL expression with CDKN2B expression is consistent with experimental work indicating that ANRIL may inhibit CDKN2B expression and suggesting this relationship could be a key mechanistic driver15,18,40.
We discovered much stronger eQTLs for the short isoforms of ANRIL (rs944799, beta 0.30 se 0.05 p<4.6×10−9) and CDKN2B (rs598664, beta −0.36 se 0.07 p<6.1×10−7) in leukocytes than those directly linked to the MI/CAC alleles. These variants appear relatively distinct from the MI/CAC alleles (rs944799, MAF 49.8%, r2 to rs1333049=0.22; rs598664, MAF 11.0%, r2 to rs1333049=0.01). The SNP rs598664 strongly associated with decreased CDKN2B expression was modestly associated with increased CAC levels (p<1.6×10−3) but not associated with prevalent (p=0.66) or incident MI (p=0.58), though our power to detect disease association may be limited given relatively low MAF and low numbers of clinical events. While the SNP rs944799 showed modest association with increased CAC levels (p<0.02) and trended toward significant association with increased odds of prevalent (p=0.06) and incident (p=0.10) MI, in analyses adjusting for rs1333049 or rs1333045, rs944799 was no longer significant for association with CAC suggesting any associations are likely due to modest LD with the previously described risk variants (data not shown). Despite a high MAF, rs944799 is not present on any commercial genotyping arrays and has no moderate LD partners (r2>0.6) even in the 1000 genome phase 1 data, indicating that our strategy of resequencing, follow-up genotyping and association with gene expression and high resolution clinical phenotypes led to the discovery of a new candidate functional eQTL for ANRIL that may not have been adequately ascertained in prior studies.
At a nominal significance level (p<0.05), 77 variants and 37 variants were associated with ANRIL and CDKN2B expression, respectively, in leukocytes, while only seven variants were associated with CKDN2A expression in platelets or leukocytes. While LD between variants accounts largely for the high number of significant variants for ANRIL and CDKN2B expression, our results suggest that there are multiple, independent functional variants regulating both ANRIL and CDKN2B. These results are consistent with a previous study that employed allele-specific expression of CDKN2A, CDKN2B and ANRIL and genotyped 56 SNPs16, suggesting there are multiple, independent genetic signals for the expression of each gene. While the prior study16 did not distinguish between short and long isoforms of ANRIL, when we attempted to compare results, we found several consistent signals among 15 SNPs we had in common (Supplemental Table 8). Several different isoforms of ANRIL in blood and carotid plaque were also previously associated with risk SNPs and with extent of atherosclerosis severity further building a case for the importance of ANRIL expression in a relevant tissue and sample.39 A previous study including samples from FHS suggested that 9p21.3 alleles may contribute to platelet reactivity.41 Here we find 9p21.3 gene expression genetic results in platelets were less significant than those in leukocytes. This may be due to the anuclear nature of platelets and lower transcript abundance. The short isoforms of ANRIL were not well expressed in platelets whereas the long isoform (NR_003529.1) and CDKN2A and CDKN2B were well expressed, with modest associations observed with variants and decreased expression of NR_003529.1 and increased expression of CDKN2A (Table 5). The top associated variants in platelets are not in LD with the CAC and MI-associated variants indicating their effects are not likely linked to known disease associations at this time. If 9p21.3 disease associations are partially mediated through effects on platelet reactivity it is likely via a mechanism other than effects on platelet gene expression.
In a large, well-phenotyped cohort, we applied multiple genomic technologies to the study of the 9p21.3 association region including sequencing, follow-up genotyping, CNV, gene expression and associations with high-resolution measures of clinical and subclinical CVD and risk factors. We report associations that are consistent with findings from prior studies of MI and other forms of CVD, aortic aneurysm, CAC and Type II diabetes, confirming the impact of 9p21.3 CVD associations in a single population. Consistent with prior studies we find little evidence that 9p21.3 associations are mediated via copy number variation1 or through traditional risk factors. After comprehensive sequencing we failed to find any common or uncommon variants with clear functional effects that are linked to protein-coding genes (e.g., splicing or protein defects). By contrast, we found multiple strong transcript-SNP associations, with both common and uncommon variants, suggesting complex genetic regulation of the transcripts in the region. The strongest disease-associated variants were only modestly associated with decreased expression of ANRIL, and ANRIL expression was inversely correlated with CDKN2B in our sample consistent with several other large studies. If disease progression is altered through modulation of ANRIL and/or CDKN2B transcript levels, this raises the question of why those variants most strongly associated with transcript levels in leukocytes are not strongly associated with disease? One potential explanation is that the modulation of transcript levels that leads to higher risk is specific to particular conditions of regulatory activation and/or specific tissues which are not easily detected in cross-sectional tissue surveys. A recent effort to characterize the regulatory relationships in the region supports the idea that there is a complex regulatory space and that disease-associated alleles may have condition- and tissue-specific effects.18 Further detailed studies in multiple tissues are necessary to understand functional mechanisms and the key regulatory interactions, along with genetic follow-up in large clinical cohorts to determine how these variants influence disease and which ones are most important for potential clinical applications.
We thank Daniel Levy for critical comments on the manuscript. We thank Wei-Jie Seow for contributing to literature searches and cataloguing prior associations at 9p21.3.
Resequencing provided via the NHLBI DNA Resequencing and Genotyping Program by N01-HV-48194. Work on CNV calling and the FHS linkage disequilibrium map was supported by the National Heart, Lung and Blood Institute’s Framingham Heart Study (Contract No. N01-HC-25195) and its contract with Affymetrix, Inc. for genotyping services (Contract No. N02-HL-6-4278). A portion of this research utilized the Linux Cluster for Genetic Analysis (LinGA-II) funded by the Robert Dawson Evans Endowment of the Department of Medicine at Boston University School of Medicine and Boston Medical Center. CNV calling was supported by NIAMS grant R21-AR056405. Whole genome sequencing in CHS and ARIC was conducted as part of the CHARGE-S (Cohorts for Heart and Aging Research in Genome Epidemiology Cosortium) sequencing programming supported by NHLBI Grand Opportunity grant RC2-HL02419. The Atherosclerosis Risk in Communities Study is supported by National Heart, Lung, and Blood Institute contracts N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, and N01-HC-55022 and grants R01-HL-087641, R01-HL-59367 and R01-HL-086694; National Human Genome Research Institute contract U01-HG-004402; and National Institutes of Health contract HHSN268200625226C. The infrastructure was partly supported by Grant Number UL1-RR-025005, a component of the National Institutes of Health and NIH Roadmap for Medical Research. The Cardiovascular Health Study research was supported by NHLBI contracts N01-HC-85239, N01-HC-85079 through N01-HC-85086; N01-HC-35129, N01 HC-15103, N01 HC-55222, N01-HC-75150, N01-HC-45133, HHSN268201200036C and NHLBI grants HL080295, HL087652, HL105756 with additional contribution from NINDS. Additional support was provided through AG-023629, AG-15928, AG-20098, and AG-027058 from the NIA. See also http://www.chs-nhlbi.org/pi.htm. DNA handling and genotyping was supported in part by National Center of Advancing Translational Technologies CTSI grant UL1TR000124 and National Institute of Diabetes and Digestive and Kidney Diseases grant DK063491 to the Southern California Diabetes Endocrinology Research Center. We thank Daniel Levy and Chris J. O’Donnell for genotyping funding. Ian Rogers was supported by NHLBI 1T32 HL076136. Andrew Johnson was supported by an NHLBI IRTA Research Fellowship.