In 2007, the Wellcome Trust Case Control Consortium (WTCCC) performed a genome-wide association study in 2,000 British coronary heart disease (CHD) cases and 3,000 controls after genotyping 469,557 single nucleotide polymorphisms (SNPs). Seven variants associated with CHD were initially identified, and 5 SNPs were later found in replication studies. In the current study, the authors aimed to determine whether the 12 SNPs reported by the WTCCC predicted incident CHD through 2004 in a biracial, prospective cohort study (Atherosclerosis Risk in Communities) comprising 15,792 persons aged 45–64 years who had been selected by probability sampling from 4 different US communities in 1987–1989. Cox proportional hazards models with adjustment for age and gender were used to estimate CHD hazard rate ratios (HRRs) over a 17-year period (1,362 cases in whites and 397 cases in African Americans) under an additive genetic model. The results showed that 3 SNPs in whites (rs599839, rs1333049, and rs501120; HRRs were 1.10 (P=0.044), 1.14 (P<0.001), and 1.14 (P=0.030), respectively) and 1 SNP in African Americans (rs7250581; HRR=1.60, P=0.05) were significantly associated with incident CHD. This study demonstrates that genetic variants revealed in a case-control genome-wide association study enriched for early disease onset may play a role in the genetic etiology of CHD in the general population.
Genetic studies designed to identify sequence variants implicated in complex disorders have recently focused on genome-wide association studies of the relations between large numbers of single nucleotide polymorphisms (SNPs) measured simultaneously and disease risk. The development of high-density genotyping arrays and the availability of the HapMap database (http://hapmap.ncbi.nlm.nih.gov/), which describes common patterns of sequence variation in 4 different populations (1), have facilitated this approach and enabled the successful detection of SNPs that contribute to the etiology of such common diseases as type 2 diabetes (2–7), obesity (8–11), coronary heart disease (CHD) (2, 12, 13), and Alzheimer's disease (14, 15). In many of these studies, SNPs in genes previously reported to be associated with the phenotype of interest, such as transcription factor 7-like 2 and type 2 diabetes (2–5, 16), as well as SNPs in genes or pathways not previously known to be involved in disease pathogenesis, were identified. The replicated association of the fat mass- and obesity-associated gene with obesity is one such example and demonstrates the utility of this approach (9–11).
The Wellcome Trust Case Control Consortium (WTCCC) was formed in Great Britain to carry out an experiment in which approximately 2,000 cases for each of 7 complex diseases and 3,000 shared controls were genotyped using the Affymetrix GeneChip 500K Mapping Array Set (2). Seven SNPs showing either strong (P<5 × 10⁻⁷) or moderate (P=10⁻⁵–10⁻⁷) association with CHD were identified in a sample enriched for premature myocardial infarction or coronary revascularization occurring before the 66th birthday. A second report was subsequently published by the same consortium that presented evidence for replication in the German Myocardial Infarction Family Study for 3 genetic variants, including 2 of the 7 most likely susceptibility loci for CHD from the first report (17). Four additional loci were then identified when a combined analysis of the data from the original WTCCC study and the German Myocardial Infarction Family Study was undertaken. All of the SNPs are located on separate chromosomes or chromosomal regions, so they are not in linkage disequilibrium with each other; information concerning putative candidate genes in these regions and their function is provided in the Appendix Table. Our aim in the current study was to determine whether any of the 12 SNPs found to contribute to coronary artery disease susceptibility in the WTCCC study were associated with incident CHD in a large, biracial, population-based cohort.
The Atherosclerosis Risk in Communities (ARIC) Study is a prospective longitudinal investigation of the development of atherosclerosis involving 15,792 persons aged 45–64 years who were selected by probability sampling from 4 different communities in the United States. At the time of recruitment in 1987–1989, the participants were residents of Forsyth County, North Carolina; Jackson, Mississippi (African Americans only); the northwestern suburbs of Minneapolis, Minnesota; or Washington County, Maryland. Participants in the ARIC Study were excluded from analysis if they were African Americans from the Minnesota or Maryland field center (n=48), because of the small numbers recruited from these sites, or if they were neither African-American nor white (n=43). Additional exclusion criteria were: missing genotype data for all sequence variants (n=540); a positive or unknown history of CHD, prevalent stroke, or a history of transient ischemic attack at the initial clinic visit (n=1,446); and participant refusal for the use of DNA (n=40). When these persons were excluded, the study sample consisted of 13,675 participants.
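As an illustration only (the original analysis was carried out in Stata), the exclusion arithmetic described above can be checked with a short Python sketch. The dictionary labels are paraphrases of the text, and the sketch assumes that the exclusions were applied sequentially so that no participant is counted in more than one category.

```python
# Counts are taken from the text; names and labels are illustrative only.
ENROLLED = 15_792

EXCLUSIONS = {
    "African Americans at Minnesota or Maryland field centers": 48,
    "neither African-American nor white": 43,
    "missing genotype data for all sequence variants": 540,
    "positive/unknown CHD history, prevalent stroke, or TIA history": 1_446,
    "refused use of DNA": 40,
}


def analytic_sample_size(enrolled, exclusions):
    """Subtract non-overlapping exclusion counts from the enrolled cohort."""
    return enrolled - sum(exclusions.values())


final_n = analytic_sample_size(ENROLLED, EXCLUSIONS)
print(final_n)  # 13675, matching the reported study sample
```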
The initial clinical examination, referred to herein as visit 1, was carried out in 1987–1989 and included an interview designed to elicit information about the presence of cardiovascular disease risk factors, socioeconomic status, and family medical history. Incidence of CHD was determined over a period of 17 years by telephoning participants annually and by surveying discharge lists from local hospitals and death certificates from state vital statistics offices for potential cardiovascular events (18). Incident CHD cases were defined as those involving a definite or probable myocardial infarction, a silent myocardial infarction between examinations determined by electrocardiography, a definite CHD death, or a coronary revascularization procedure. Participants were followed for a mean of 13.1 years. A total of 1,759 CHD cases were identified through December 31, 2004.
All of the persons enrolled in the ARIC Study provided written informed consent, and the study design and methods were approved by institutional review boards at the collaborating medical centers. A detailed description of the ARIC Study has been published previously (19).
The prevalence of diabetes at visit 1 was defined as a fasting glucose level ≥126 mg/dL, a nonfasting glucose level ≥200 mg/dL, and/or self-reported physician diagnosis of or treatment for diabetes. Body mass index (weight (kg)/height (m)²) was calculated from height and weight measurements obtained at the baseline examination. Seated blood pressure was measured 3 times using a random-zero sphygmomanometer, and the last 2 measurements were averaged. Hypertension was defined as systolic blood pressure ≥140 mm Hg or diastolic blood pressure ≥90 mm Hg or current use of antihypertensive medication. Cigarette smoking status was analyzed by comparing current smokers with persons who had formerly or never smoked. Plasma total cholesterol level was measured by an enzymatic method (20), and the portion of low density lipoprotein (LDL) cholesterol was calculated (21). The level of high density lipoprotein (HDL) cholesterol was measured after dextran-magnesium precipitation of non-HDL cholesterol (22).
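The covariate definitions above translate directly into simple decision rules. The following Python sketch is one possible encoding; the function and parameter names are hypothetical and do not correspond to variables in the ARIC data set.

```python
def has_diabetes(glucose_mg_dl, fasting, self_reported=False):
    """Fasting glucose >=126 mg/dL, nonfasting >=200 mg/dL, and/or self-report."""
    threshold = 126 if fasting else 200
    return glucose_mg_dl >= threshold or self_reported


def body_mass_index(weight_kg, height_m):
    """Weight (kg) divided by the square of height (m)."""
    return weight_kg / height_m ** 2


def mean_seated_bp(readings):
    """Average the last 2 of the 3 seated measurements."""
    if len(readings) != 3:
        raise ValueError("expected exactly 3 blood pressure readings")
    return sum(readings[-2:]) / 2


def has_hypertension(systolic, diastolic, on_medication=False):
    """Systolic >=140 mm Hg, diastolic >=90 mm Hg, or antihypertensive use."""
    return systolic >= 140 or diastolic >= 90 or on_medication


# Hypothetical participant: the first systolic reading is discarded.
systolic = mean_seated_bp([150, 138, 142])
print(systolic, has_hypertension(systolic, 80))
```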
Genotyping of 12 SNPs previously identified either in the genome-wide association study of coronary artery disease carried out in the United Kingdom (2) or in the German Myocardial Infarction Family Study (17) was performed using the TaqMan assay (Applied Biosystems, Foster City, California). The oligonucleotide sequences for polymerase chain reaction primers and TaqMan probes are available upon request from the authors. Allele detection was performed using the ABI Prism 7700 Sequence Detection System (Applied Biosystems). The genotype call rate, or the percentage of samples to which a genotype was assigned, was determined prior to exclusion of individuals from the analysis and ranged from 93.8% for rs2943634 to 94.8% for rs688034. After the application of all exclusion criteria, the proportion of missing genotype data in the final study sample did not exceed 2.4% for any of the genetic variants. We also assessed the genotyping success rate by analyzing the concordance between genotypes for pairs of blind duplicates included with the DNA samples from the study participants. Kappa coefficients (23), an index of the percentage of agreement between measurements, corrected for agreement occurring by chance, were calculated for each SNP and ranged from 0.96 to 0.99.
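For readers unfamiliar with the kappa coefficient, the sketch below computes Cohen's kappa for paired genotype calls from blind duplicates. The duplicate pairs are invented for illustration and deliberately include one discordant call; they are not ARIC data, and the function name is an assumption.

```python
from collections import Counter


def cohen_kappa(pairs):
    """Chance-corrected agreement between two sets of genotype calls.

    `pairs` holds (first_call, second_call) tuples for duplicate samples.
    Assumes at least two distinct genotypes, so expected agreement is < 1.
    """
    n = len(pairs)
    observed = sum(a == b for a, b in pairs) / n
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    # Agreement expected by chance from the marginal genotype frequencies.
    expected = sum(first[g] * second[g] for g in first) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical duplicates: 9 concordant pairs and 1 discordant pair.
duplicates = (
    [("AA", "AA")] * 4 + [("AG", "AG")] * 3 + [("GG", "GG")] * 2 + [("AG", "GG")]
)
print(round(cohen_kappa(duplicates), 3))  # 0.848
```

With nearly perfect concordance, as in the values of 0.96 to 0.99 reported above, observed agreement approaches 1 and kappa approaches 1 regardless of the genotype frequencies.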
Statistical analysis was carried out using the Stata statistical software program, version 9.0 (Stata Corporation, College Station, Texas). The hypothesis that observed genotypes were in Hardy-Weinberg equilibrium was tested in noncases using a χ² goodness-of-fit test. The proportions, mean values, and standard deviations were calculated for established CHD risk factors for both the incident CHD cases and the comparison group of persons who did not meet the case definition. Cox proportional hazards models were used to estimate hazard rate ratios for CHD. For analyses of CHD cases, follow-up time intervals were defined as the time between visit 1 and the date of the first CHD event. For noncases, follow-up continued through one of the following dates: December 31, 2004, the date of death, or the date of last contact if the participant was lost to follow-up.
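The Hardy-Weinberg goodness-of-fit test mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the standard 1-degree-of-freedom test for a biallelic SNP, not the Stata routine used in the study; the genotype counts are hypothetical, and the function assumes that both alleles are observed.

```python
import math


def hardy_weinberg_test(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts with a deficit of heterozygotes.
chi2, p_value = hardy_weinberg_test(300, 400, 300)
print(chi2, p_value < 0.05)  # 40.0 True
```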
Covariates were assessed for statistical significance in the models using the Wald χ² statistic. The results of all statistical analyses are presented separately by self-reported racial group. A 2-sided P value less than 0.05 was considered statistically significant, with no correction applied for multiple comparisons, since there were strong prior odds of association based on the results of the WTCCC case-control studies (2, 17). Power analyses were conducted using the Cox regression module of the Power Analysis and Sample Size program (24).
Proportions, means, and standard deviations for the established CHD risk factors are shown in Table 1. The mean values for all clinical characteristics at visit 1 differed significantly between participants who developed CHD and noncases for both whites and African Americans, with the exception of mean body mass index for African-American participants. For both racial groups, CHD cases included a higher frequency of males, persons affected by diabetes and hypertension, and smokers in comparison with noncases.
Genotype frequencies for the polymorphisms identified in the WTCCC case-control study differed significantly in noncases between whites and African Americans (data not shown), so subsequent statistical analyses were performed separately by race. The allele and genotype frequencies for all genetic variants were in accordance with Hardy-Weinberg equilibrium expectations for whites. One of the 12 SNPs did not meet Hardy-Weinberg expectations at a P value of 0.05 for African Americans in the study sample (rs7250581, P=0.01); this may be attributable to the low minor allele frequency observed for this genetic variant. Table 2 shows genotype frequencies for the WTCCC SNPs in the ARIC cohort after stratification by race. There were significant differences between incident CHD cases and noncases for 3 of the SNPs in whites (rs1333049, rs501120, and rs8055236).
Table 3 shows results from Cox proportional hazards models used to estimate hazard rate ratios for incident CHD, stratified by both SNP and racial group. An additive genetic model was chosen for these analyses for consistency with the WTCCC reports (2, 17), and all of the genotypes for the individual SNPs were coded with respect to the risk allele for coronary artery disease determined by the WTCCC so that direction of effect could be easily compared. After adjustment for age and gender (model 1), rs7250581 was a nominally significant predictor of incident CHD for African Americans (hazard rate ratio=1.60, P=0.05), while 3 of the SNPs (rs599839, rs1333049, and rs501120) were associated with incident CHD for whites (hazard rate ratios were 1.10 (P=0.044), 1.14 (P<0.001), and 1.14 (P=0.030), respectively). When further adjustments were made for a group of CHD risk factors including body mass index, smoking, diabetes status, hypertension status, and HDL and LDL cholesterol levels (model 2), the associations with rs7250581 for African Americans and rs599839 for whites were no longer detected. Since rs599839 was reported to be significantly associated with serum LDL cholesterol concentration in 2 genome-wide association studies conducted in European Caucasian populations (25, 26), a third model incorporating only age, gender, and LDL cholesterol was fitted, and the association with incident CHD was eliminated (hazard rate ratio=1.03, 95% confidence interval: 0.94, 1.13; P=0.556). In addition, if the results of the comparison of genotype frequencies between cases and noncases for each genetic variant are also considered in the evaluation of the effect of adjustment for covariates (Table 2), only rs1333049 and rs501120 are consistently associated with coronary artery disease case status in whites in all analyses.
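To make the additive coding concrete, the Python sketch below counts copies of the risk allele in a genotype and shows what a per-allele hazard rate ratio implies for carriers of 2 copies. It assumes the usual formulation in which the risk-allele count enters the Cox model as a single linear term, so that the hazard rate ratio for k copies is the per-allele ratio raised to the power k. The function names and genotype strings are hypothetical.

```python
def risk_allele_count(genotype, risk_allele):
    """Additive coding: number of copies (0, 1, or 2) of the risk allele."""
    return genotype.count(risk_allele)


def implied_hazard_ratio(per_allele_hrr, n_risk_alleles):
    """Hazard rate ratio relative to carriers of 0 risk alleles."""
    return per_allele_hrr ** n_risk_alleles


# rs501120, with T as the risk allele and a model 1 per-allele ratio of 1.14.
for genotype in ("CC", "TC", "TT"):
    k = risk_allele_count(genotype, "T")
    print(genotype, k, round(implied_hazard_ratio(1.14, k), 2))
```

Under this assumption, a per-allele hazard rate ratio of 1.14 corresponds to a ratio of about 1.30 for homozygous carriers of the risk allele relative to persons carrying no copies.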
For rs7250581 in African Americans, the further addition of either body mass index, smoking, diabetes status, or hypertension status to the model adjusted for age and gender resulted in the absence of a significant association with incident CHD, while the inclusion of either LDL or HDL cholesterol did not affect the observed relation (data not shown).
Coronary artery disease and its clinical sequelae, including myocardial infarction, together constitute the single greatest cause of death worldwide (27, 28). CHD has a complex etiology, with multiple genes and environmental factors believed to be involved in its pathogenesis. Although the heritability of CHD is estimated to be 40%–60% (29), with a risk to first-degree relatives that is 5–7 times higher than that for members of the general population (30), the reproducible identification of genetic variants underlying this increased susceptibility has been difficult.
A genome-wide association study conducted by the WTCCC compared the frequency of 469,557 SNPs in approximately 2,000 cases and 3,000 shared controls for 7 different complex diseases, including CHD (2). The CHD cases in the WTCCC study included only persons with premature cardiovascular events occurring before the age of 66 years, to maximize the likelihood of detecting a genetic component contributing to disease causation. Seven novel variants were identified, with the strongest association being found for an SNP (rs1333049) located in the same region of chromosome 9p21 that was independently reported to harbor polymorphisms conferring increased risk of myocardial infarction in 5 different Caucasian populations (12, 13). However, the ascertainment scheme chosen by the WTCCC may also have resulted in the choice of CHD cases that were not representative of the population from which they were drawn; thus, SNPs identified using this type of study design may not be associated with CHD outside of the selected subgroup. Our aim in the current investigation was to determine whether any of the 7 SNPs associated with CHD in the initial case-control study (2) and the 5 additional SNPs identified in replication studies conducted by the WTCCC (17) predicted incident CHD in the large, biracial, population-based ARIC cohort.
Two of the sequence variants described by the WTCCC in case-control studies (rs1333049 and rs501120) were significantly associated with incident CHD for white participants in the ARIC Study. Results from the Cox proportional hazards models used to estimate the risk of CHD showed that the 2 intergenic SNPs were independently associated with CHD even after adjustment for multiple risk factors. One of these SNPs (rs1333049) maps to a region of chromosome 9p21 previously reported to be associated with risk of myocardial infarction among whites in the ARIC cohort (12). The 58-kilobase genomic interval defining the risk allele is located near the cyclin-dependent kinase inhibitor 2A, cyclin-dependent kinase inhibitor 2B, and cyclin-dependent kinase inhibitor antisense RNA loci, while there are no annotated genes or micro-RNAs within the region itself (Appendix Table).
The rs501120 variant is located approximately 127 kilobases downstream of the gene encoding stromal cell-derived factor 1 (SDF-1) on chromosome 10q11. SDF-1 is a member of the family of chemoattractant cytokines known as chemokines and is the ligand for cell-surface chemokine receptor 4. SDF-1 has been implicated in a wide variety of biologic processes, including trafficking of hematopoietic stem and progenitor cells (31), migration of lymphocytes and monocytes (32), and the recruitment of endothelial progenitor cells derived from bone marrow to ischemic tissues in animal models (33, 34). SDF-1 has also been reported to be involved in the induction of platelet aggregation, with high expression in smooth muscle cells, endothelial cells, and macrophages in human atherosclerotic plaques (35). Although the latter function suggests that the rs501120 T allele could confer susceptibility to CHD by playing a role in the regulation of thrombus formation after plaque rupture, this would require rs501120 to exert its effect at a considerable distance from the SDF-1 coding region (Appendix Table).
One additional SNP (rs599839) was associated with incident CHD in whites when the results were adjusted only for age and gender, but this relation was abolished after either LDL or HDL cholesterol was added as an additional covariate (P=0.556 and P=0.109, respectively). Since the rs599839 polymorphism was recently reported to be significantly associated with serum LDL cholesterol concentration in 2 genome-wide association studies conducted in European Caucasian populations (25, 26), these results suggest that the influence of this SNP on CHD risk in the ARIC cohort may operate through its effect on plasma lipid levels. The noncoding rs599839 variant is located near 3 coding genes on chromosome 1p13, including sortilin, which is involved in the receptor-mediated binding of lipoprotein lipase on the surface of adipocytes (36). A second SNP that is in linkage disequilibrium with rs599839 (rs646776; r²=0.89 in HapMap Utah residents with Northern and Western European ancestry) has been shown to be correlated with the expression of sortilin in human liver cells (25), providing a possible link to known events in lipoprotein metabolism (Appendix Table).
When incident CHD was examined in African Americans, one of the 12 SNPs (rs7250581) showed a nominally significant relation after adjustment for age and gender that was abrogated after the addition of multiple covariates known to be associated with cardiovascular disease. However, since the observed genotype frequencies for this variant did not meet Hardy-Weinberg equilibrium expectations, the possibility of a false-positive association must also be considered. As Table 4 shows, there was 80% power to detect a hazard rate ratio of 1.2–1.3 for 8 of the polymorphisms studied in African Americans, and a hazard rate ratio of 1.1–1.2 could be detected for all of the genetic variants in whites. Therefore, the failure to identify significant associations between most of the SNPs described in a population of Northern European origin in the WTCCC case-control study and incident CHD in African Americans could be due to differences in linkage disequilibrium in the genomic regions where the respective genetic markers reside; the SNPs identified by the WTCCC may be correlated with a true causative variant in whites but not in populations of African origin. Another possibility is that there may be variation in additional, as-yet-unknown genetic or environmental factors that might modify the effect of the risk variants, although this would need to be examined in a larger population of sufficient size to allow detection of such gene-environment or gene-gene interactions (37). Alternatively, the SNPs identified by the WTCCC were associated with prevalent CHD case status, so for both white and African-American participants, at least some of the polymorphisms analyzed here might play a role in CHD only after it has become well-established rather than in earlier events in disease pathogenesis.
The hazard rate ratios of 1.2 found for the 2 SNPs showing a significant association with CHD in whites under an additive genetic model were in accordance with the modest effects reported for most common sequence variants influencing complex disease in genome-wide association studies (38). However, homozygous carriers of the susceptibility alleles for the rs1333049 and rs501120 variants comprised 22.5% and 75.2% of white persons in the ARIC cohort, respectively, so the public health impact of these polymorphisms could be substantial, and further investigation in other community-based cohorts is warranted. Neither the WTCCC case-control studies nor this study was designed to test alternative hypotheses of disease causation (such as the contribution of copy number variation or of rare variants with large effects) that might help to further explain the observation that family history of CHD is a strong predictor of disease risk.
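The homozygote proportions quoted above give a rough sense of how common the risk alleles are. As a back-of-the-envelope sketch, and assuming Hardy-Weinberg proportions (which the text reports held for whites), the risk allele frequency is approximately the square root of the homozygote frequency. The function name is hypothetical, and the results are approximations rather than frequencies reported by the study.

```python
import math


def allele_frequency_from_homozygotes(homozygote_fraction):
    """Approximate allele frequency, assuming Hardy-Weinberg proportions."""
    return math.sqrt(homozygote_fraction)


# Homozygous risk-allele carrier proportions among whites, from the text.
homozygotes = {"rs1333049": 0.225, "rs501120": 0.752}
for snp, fraction in homozygotes.items():
    print(snp, round(allele_frequency_from_homozygotes(fraction), 3))
```

On this approximation the risk alleles have frequencies of roughly 0.47 and 0.87, respectively, which illustrates why variants with modest per-allele effects can still matter at the population level.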
Author affiliations: Human Genetics Center, University of Texas Health Science Center at Houston, Houston, Texas (Jan Bressler, Kelly A. Volcik, Eric Boerwinkle); Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, Minnesota (Aaron R. Folsom); Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina (David J. Couper); and Brown Foundation Institute of Molecular Medicine, University of Texas Health Science Center at Houston, Houston, Texas (Eric Boerwinkle).
The Atherosclerosis Risk in Communities Study is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute contracts N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, and N01-HC-55022.
Conflict of interest: none declared.
|dbSNP ID No.||Chromosome||Nearest Gene(s)||OMIM No.||Function||Reference No. (Current Study)||Genomic Distance (Base Pairs)|
|rs599839||1p21||CELSR2||604265||Knockdown in rat pups resulted in abnormal dendritic arborization||39, 40||29,515|
|rs599839||1p13.3||PSRC1|| ||Down-regulated by p53 and DNA damage||41||3′-UTR|
|rs599839||1p21.3–p13.1||SORT1||602458||Sorting protein in rat adipocyte vesicles transporting SLC2A4 to plasma membrane in response to insulin; binds LPL in adipocytes||36, 42, 43||118,407|
|rs17465637||1q41||MIA3|| ||Extracellular protein broadly expressed in human tissues; tumor suppressor in melanoma cell lines||44, 45||Intronic|
|rs17672135||1q43||FMN2||606373||Maternal effect gene required for progression through meiosis I in mice||46, 47||Intronic|
|rs6922269||6q25.1||MTHFD1L||611427||MTHFD1L catalyzes tetrahydrofolate synthesis in mitochondria; involved in de novo synthesis of purines and regeneration of methionine from homocysteine; homocysteine identified as a risk factor for vascular disease||48–50||Intronic|
|rs1333049||9p21||CDKN2A/CDKN2B||600160||CDKN2A encodes 2 proteins: p16(INK4), a cyclin-dependent kinase inhibitor, and p14(ARF), involved in p53 regulation; CDKN2B is an effector of TGFB-mediated cell cycle arrest||51, 52||150,455|
|rs501120||10q11.1||CXCL12||600835||Involved in platelet aggregation and expressed in macrophages in human atherosclerotic plaques||35||126,685|
|rs17228212||15q22.33||SMAD3||603109||Transcriptional modulator activated by TGFB; Smad3 (−/−) mice show enhanced intimal hyperplasia after vascular injury; SMAD3 expression detected in human atherosclerotic lesions and restenotic plaques||54–56||Intronic|
|rs8055236||16q24.2–q24.3||CDH13||601364||Receptor for adiponectin; associated with adiponectin levels and blood pressure; expressed in endothelial and smooth muscle cells, atherosclerotic lesions, and restenotic plaques; inhibits attachment of human aortic smooth muscle cells and endothelial cells in culture||57–60||Intronic|
|rs7250581||19q12||POP4||606114||Subunit of human ribonuclease P; associated with RMRP, which is involved in pre-rRNA processing||61, 62||32,815|
|rs688034||22q12.1||SEZ6L||607021||Associated with lung cancer||63, 64||Intronic|
Abbreviations: CDH13, cadherin 13, H-cadherin (heart); CDKN2A, cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4); CDKN2B, cyclin-dependent kinase inhibitor 2B (p15, inhibits CDK4); CDKN2BAS, CDKN2B antisense RNA (non-protein-coding); CELSR2, cadherin, EGF LAG 7-pass G-type receptor 2 (flamingo homolog, Drosophila); CXCL12, chemokine (C-X-C motif) ligand 12 (stromal cell-derived factor 1); FAM174A, family with sequence similarity 174, member A; FMN2, formin 2; ID, identification; LPL, lipoprotein lipase; MIA3, melanoma inhibitory activity family, member 3; MTHFD1L, methylenetetrahydrofolate dehydrogenase (NADP+-dependent) 1-like; OMIM, Online Mendelian Inheritance in Man; p53, transformation-related protein 53; POP4, processing of precursor 4, ribonuclease P/MRP subunit (Saccharomyces cerevisiae); PSRC1, proline/serine-rich coiled-coil 1; RMRP, RNA component of mitochondrial RNA processing endoribonuclease; rs, reference SNP; SEZ6L, seizure-related 6 homolog (mouse)-like; SLC2A4, solute carrier family 2 (facilitated glucose transporter), member 4; SMAD3, SMAD family member 3; SNP, single nucleotide polymorphism; SORT1, sortilin 1; TGFB, transforming growth factor β1; UTR, untranslated region.