Modern genotyping platforms permit a systematic search for inherited components of complex diseases. We performed a joint analysis of two genomewide association studies of coronary artery disease.
We first identified chromosomal loci that were strongly associated with coronary artery disease in the Wellcome Trust Case Control Consortium (WTCCC) study (which involved 1926 case subjects with coronary artery disease and 2938 controls) and looked for replication in the German MI [Myocardial Infarction] Family Study (which involved 875 case subjects with myocardial infarction and 1644 controls). Data on other single-nucleotide polymorphisms (SNPs) that were significantly associated with coronary artery disease in either study (P<0.001) were then combined to identify additional loci with a high probability of true association. Genotyping in both studies was performed with the use of the GeneChip Human Mapping 500K Array Set (Affymetrix).
Of thousands of chromosomal loci studied, the same locus had the strongest association with coronary artery disease in both the WTCCC and the German studies: chromosome 9p21.3 (SNP, rs1333049) (P=1.80×10⁻¹⁴ and P=3.40×10⁻⁶, respectively). Overall, the WTCCC study revealed nine loci that were strongly associated with coronary artery disease (P<1.2×10⁻⁵ and less than a 50% chance of being falsely positive). In addition to chromosome 9p21.3, two of these loci were successfully replicated (adjusted P<0.05) in the German study: chromosome 6q25.1 (rs6922269) and chromosome 2q36.3 (rs2943634). The combined analysis of the two studies identified four additional loci significantly associated with coronary artery disease (P<1.3×10⁻⁶) and a high probability (>80%) of a true association: chromosomes 1p13.3 (rs599839), 1q41 (rs17465637), 10q11.21 (rs501120), and 15q22.33 (rs17228212).
We identified several genetic loci that, individually and in aggregate, substantially affect the risk of development of coronary artery disease.
Coronary artery disease and its main complication, myocardial infarction, are leading causes of death and disability worldwide.1 Lifestyle and environmental factors play an important role in their development.2 In addition, these complex diseases cluster in families, suggesting a substantial genetic cause.3 Despite extensive exploration of many genes, strong evidence of a molecular genetic association with coronary artery disease or myocardial infarction remains to be obtained.
The recent development of high-density genotyping arrays provides unprecedented resolution for whole-genome assessment of variants associated with common diseases.4 Using the GeneChip Human Mapping 500K Array Set (Affymetrix), which simultaneously types approximately 500,000 genetic variants, the Wellcome Trust Case Control Consortium (WTCCC) recently reported on an analysis of data from approximately 2000 people (and a shared set of 3000 controls) for each of seven complex diseases, including coronary artery disease.5 Several loci showed strong associations with coronary artery disease, but the large number of statistical tests poses the challenge of discriminating between true and false associations. By providing a detailed analysis of the WTCCC data in conjunction with data from another large genomewide association study, the German MI [Myocardial Infarction] Family Study, we sought robust evidence for associations of genetic loci with the risk of coronary artery disease and myocardial infarction.
All participants in both studies were of white European origin. Detailed descriptions of recruitment and ascertainment in both studies are given in the Supplementary Appendix, available with the full text of this article at www.nejm.org. Local ethics committees approved the study protocols, and all participants gave written informed consent.
The 1988 case subjects in the WTCCC study had a history of either myocardial infarction or coronary revascularization before the age of 66 years, as well as a strong family history of coronary artery disease.6,7 Two independent control groups were studied: 1504 controls from the British 1958 Birth Cohort and 1500 controls selected from a sample of blood donors recruited as part of the WTCCC project.5
The 875 case subjects in the German MI Family Study were persons who had myocardial infarction before the age of 60 years and at least one first-degree relative with premature coronary artery disease.8,9 The 1644 controls were selected from a well-characterized random sample of German residents, stratified according to sex and age.10
All genotyping was performed with the use of the GeneChip Human Mapping 500K Array Set, which comprises a StyI chip and an NspI chip. Details on genotype-calling algorithms, quality criteria (at the level of both the individual and the SNP), and validation steps are described in the Supplementary Appendix. After these quality-control steps, 377,857 SNPs remained for analysis in 1926 case subjects and 2938 controls in the WTCCC study; in the German study, 272,602 SNPs remained, with data from 870 case subjects for the StyI chip, 772 case subjects for the NspI chip, and 1644 controls for both chips.
Before performing the genetic analyses, we examined the data from each cohort individually for population substructure and ascertained that such variation was negligible in both studies (see the Supplementary Appendix). We then undertook three types of genetic analysis. First, after identifying the SNPs with the strongest associations with coronary artery disease in the WTCCC sample, we sought to replicate these loci in the German study (primary analysis). Second, we combined data on other SNPs for which there was evidence of an association in either study, to identify additional loci with a low probability of being falsely positive (combined analysis). Third, we examined the data from both studies for associations with coronary artery disease among SNPs in genes reported to be associated with this disease (candidate-gene analysis).
All autosomal SNPs that showed evidence of a significant association with coronary artery disease in the WTCCC study (P<0.001 with use of the two-sided Cochran–Armitage trend test) were assessed for the false-positive-report probability (FPRP).11 (The rationale for the FPRP calculations and the assumptions underlying them are given in the Supplementary Appendix.) A small FPRP suggests that the association of a SNP is unlikely to be a false positive result. A SNP with an FPRP of less than 0.5 (i.e., one for which the chance of a truly positive association was greater than 50%), as well as SNPs within 100 kb in either direction of this SNP, also with P values of less than 0.001, were considered to represent a single locus. Formal replication testing of such loci was then performed in the German study with the use of the lead SNP (defined as the SNP with the lowest FPRP) for each locus and adjustment for multiple testing. A P value of less than 0.05 with use of the two-sided Cochran–Armitage trend test was considered to indicate statistical significance. Haplotype analysis was carried out on replicated loci, as described in the Supplementary Appendix. The power for replication in the German study was estimated for each locus for an odds ratio for myocardial infarction of 1.25 per allele and an adjusted significance level of 0.05.
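The Cochran–Armitage trend test used for screening can be sketched in a few lines. What follows is a minimal illustration, not the study's actual pipeline: it assumes additive scores of 0, 1, and 2 for the number of risk-allele copies and a normal approximation for the P value. The genotype counts are hypothetical and the function name is ours.

```python
import math


def cochran_armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Two-sided Cochran-Armitage trend test for a 2 x k genotype table.

    cases and controls hold genotype counts ordered by number of risk-allele
    copies. Returns (z, p), with p taken from a normal approximation.
    """
    if not (len(cases) == len(controls) == len(scores)):
        raise ValueError("cases, controls and scores must have equal length")
    n_cases, n_controls = sum(cases), sum(controls)
    n_total = n_cases + n_controls
    columns = [r + s for r, s in zip(cases, controls)]

    # Trend statistic: score-weighted difference between cases and controls.
    t_stat = sum(t * (r * n_controls - s * n_cases)
                 for t, r, s in zip(scores, cases, controls))

    # Variance of the statistic under the null hypothesis of no association.
    sum_tn = sum(t * n for t, n in zip(scores, columns))
    sum_t2n = sum(t * t * n for t, n in zip(scores, columns))
    variance = (n_cases * n_controls / n_total) * (n_total * sum_t2n - sum_tn ** 2)

    z = t_stat / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


if __name__ == "__main__":
    # Hypothetical counts for genotypes with 0, 1 and 2 risk alleles.
    z, p = cochran_armitage_trend(cases=(300, 500, 200), controls=(400, 450, 150))
    print(f"z = {z:.2f}, two-sided P = {p:.2e}")
```

Because the test uses a single degree of freedom for an additive effect, it is well powered for per-allele trends but can miss purely recessive or dominant effects.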
All SNPs that showed evidence of a significant association (P<0.001) with coronary artery disease in the WTCCC study or with myocardial infarction in the German study were combined to assess their combined FPRP. Pooled odds ratios for the risk allele and 95% confidence intervals were calculated with the two studies treated as separate strata.
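One common way to pool a per-allele odds ratio across two studies is fixed-effect, inverse-variance weighting of the study-specific log odds ratios. The sketch below illustrates that approach under stated assumptions: the exact estimator used in the study is not given in this text, the allele counts are hypothetical, and the function names are ours.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and Woolf standard error for one 2 x 2 allele table.

    a, b: risk and other allele counts in case subjects.
    c, d: risk and other allele counts in controls.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def pooled_odds_ratio(tables, z_crit=1.959964):
    """Fixed-effect pooled odds ratio with a 95% confidence interval.

    tables is an iterable of (a, b, c, d) tuples, one per study (stratum).
    """
    weight_sum = 0.0
    weighted_log_or = 0.0
    for a, b, c, d in tables:
        log_or, se = log_odds_ratio(a, b, c, d)
        weight = 1.0 / se ** 2  # inverse-variance weight
        weight_sum += weight
        weighted_log_or += weight * log_or
    pooled = weighted_log_or / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    return (math.exp(pooled),
            math.exp(pooled - z_crit * pooled_se),
            math.exp(pooled + z_crit * pooled_se))


if __name__ == "__main__":
    # Hypothetical allele counts for two studies.
    studies = [(2100, 1750, 2900, 2950), (950, 800, 1600, 1650)]
    odds_ratio, lower, upper = pooled_odds_ratio(studies)
    print(f"pooled OR = {odds_ratio:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```

Keeping each study as its own stratum means that differences in allele frequency between the two populations do not bias the pooled estimate.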
Candidate genes were identified as described in the Supplementary Appendix. On the GeneChip array, we distinguished between SNPs that were identical to those identified in previous studies as showing an association with coronary artery disease and SNPs that were in complete or near-complete linkage disequilibrium with them. Linkage disequilibrium was judged by either of two measures, an r² value of 0.8 or more or a disequilibrium coefficient (D’) of 0.9 or more, on the basis of data from the International HapMap Project.12
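The two linkage-disequilibrium measures can be computed from haplotype and allele frequencies with the standard definitions. The sketch below is illustrative only: the frequencies are hypothetical rather than HapMap values, and the function names and the proxy rule wrapper are ours.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (r_squared, d_prime) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B.
    """
    for value in (p_a, p_b):
        if not 0.0 < value < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' scales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    return r_squared, d_prime


def is_proxy(r_squared, d_prime, r2_min=0.8, d_prime_min=0.9):
    """Apply the either-or rule: r^2 >= 0.8 or D' >= 0.9."""
    return r_squared >= r2_min or d_prime >= d_prime_min


if __name__ == "__main__":
    # Hypothetical pair: the rarer allele B occurs only on haplotypes carrying A.
    r2, dp = linkage_disequilibrium(p_ab=0.1, p_a=0.5, p_b=0.1)
    print(f"r^2 = {r2:.3f}, D' = {dp:.3f}, proxy = {is_proxy(r2, dp)}")
```

The hypothetical pair shows why the either-or rule matters: when allele frequencies differ greatly, D’ can be close to 1 while r² stays low, so a SNP admitted on D’ alone may be a weak statistical proxy.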
Population attributable fractions were estimated in the German study with the use of the lead SNP from each of the three replicated loci. Adjustments were made for age, sex, the interaction between age and sex, the Prospective Cardiovascular Münster (PROCAM) study score, and the Framingham risk score. Further details are given in the Supplementary Appendix.
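For orientation, an unadjusted population attributable fraction can be sketched with Levin's formula extended to three genotype classes. This is not the adjusted procedure described in the Supplementary Appendix. It assumes Hardy–Weinberg genotype proportions, a multiplicative per-allele effect, the odds ratio standing in for the relative risk, and independent loci when fractions are combined. All function names and inputs are illustrative.

```python
def attributable_fraction(risk_allele_freq, relative_risk_per_allele):
    """Crude population attributable fraction for one biallelic locus.

    Uses PAF = 1 - 1 / sum_g(p_g * RR_g), where g is the number of risk-allele
    copies, p_g follows Hardy-Weinberg proportions and RR_g = RR ** g.
    """
    q = risk_allele_freq
    genotype_freqs = ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)
    mean_relative_risk = sum(freq * relative_risk_per_allele ** copies
                             for copies, freq in enumerate(genotype_freqs))
    return 1.0 - 1.0 / mean_relative_risk


def combined_attributable_fraction(fractions):
    """Combine per-locus fractions, assuming independent multiplicative loci."""
    remaining = 1.0
    for fraction in fractions:
        remaining *= 1.0 - fraction
    return 1.0 - remaining


if __name__ == "__main__":
    # Hypothetical (risk-allele frequency, per-allele relative risk) pairs.
    loci = [(0.45, 1.30), (0.25, 1.20), (0.65, 1.20)]
    per_locus = [attributable_fraction(q, rr) for q, rr in loci]
    print([round(f, 3) for f in per_locus])
    print(round(combined_attributable_fraction(per_locus), 3))
```

Because the sketch omits covariate adjustment and relies on the assumptions listed above, it should not be expected to reproduce the adjusted estimates reported for the study.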
In both the WTCCC and the German studies, the case subjects were young (mean age at first event, approximately 50 years) and had a strong familial basis for their disease (Table 1).
The distribution of P values for the association of SNPs with coronary artery disease in the WTCCC study and with myocardial infarction in the German study, according to chromosome, is shown in Figure 1. In the WTCCC study, 396 SNPs were significantly associated with coronary artery disease (P<0.001). Thirty of these SNPs clustering in nine chromosomal regions met the predefined criterion of an FPRP of less than 0.5 (Table 1 in the Supplementary Appendix). We tested these nine loci for a significant association with myocardial infarction in the German study. Three of the loci (chromosomes 9p21.3, 6q25.1, and 2q36.3) had such an association, even after adjustment for multiple testing for nine loci (Table 2).
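The FPRP criterion follows the standard formula FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the observed P value, 1 − β is the power to detect the assumed effect, and π is the prior probability of a true association. The sketch below applies that formula under hypothetical priors and power; the values actually assumed in the study are given in the Supplementary Appendix. The Bonferroni helper is likewise an assumption, since the text does not name the multiple-testing method used for the nine loci.

```python
def false_positive_report_probability(alpha, power, prior):
    """FPRP = alpha * (1 - prior) / (alpha * (1 - prior) + power * prior)."""
    for name, value in (("alpha", alpha), ("power", power), ("prior", prior)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be in the interval (0, 1]")
    false_positive_mass = alpha * (1.0 - prior)
    true_positive_mass = power * prior
    return false_positive_mass / (false_positive_mass + true_positive_mass)


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1 (an assumed adjustment method)."""
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Hypothetical inputs: the same P value under three different priors.
    for prior in (1e-3, 1e-4, 1e-5):
        fprp = false_positive_report_probability(alpha=1e-5, power=0.8, prior=prior)
        print(f"prior = {prior:g}: FPRP = {fprp:.3f}")
    print(bonferroni(0.004, n_tests=9))
```

The loop illustrates why a P value below 0.001 was not sufficient on its own: with a small prior probability, even a low P value can leave more than a 50% chance that the finding is a false positive.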
The locus on chromosome 9p21.3 showed the strongest signal in both the WTCCC and the German studies (P=1.80×10⁻¹⁴ and P=3.40×10⁻⁶, respectively) (Fig. 1 and Table 2). The combined P value for the association with coronary artery disease of the lead SNP in that locus, rs1333049, was 2.91×10⁻¹⁹, with the risk increased by 36% per copy of the C allele (95% confidence interval [CI], 27 to 46). Approximately 22% of the study participants were homozygous for this allele, with an additional 50% carrying a single copy. In the WTCCC and German studies, a similar pattern of association with respect to direction and magnitude of effect was found across a region of approximately 100 kb on 9p21.3 (Fig. 2). The 19 SNPs showing an association within this region were in strong linkage disequilibrium. However, two blocks could be distinguished, with strong linkage disequilibrium within each block (average D’>0.90) and moderate linkage disequilibrium between blocks (average D’, approximately 0.60) (Fig. 2). Three SNPs in block 1 (rs7044859, rs1292136, and rs7865618) and one SNP in block 2 (rs1333049) were sufficient to tag the region. Haplotype analysis showed that the association was mainly due to two mutually exclusive haplotypes for the four SNPs (TTGG and ACAC) (Supplementary Appendix). In the WTCCC study, the odds ratio for coronary artery disease with the ACAC haplotype (frequency, 0.324 among controls and 0.386 among case subjects), as compared with the TTGG haplotype (frequency, 0.333 among controls and 0.271 among case subjects), was 1.48 (95% CI, 1.34 to 1.64) per copy of the haplotype (P=2.1×10⁻¹⁴). The results of haplotype analysis in the German study were similar (Supplementary Appendix).
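Two of the figures in this paragraph can be checked with simple arithmetic. The sketch below derives the C-allele frequency from the reported genotype frequencies, compares those frequencies with Hardy–Weinberg expectations, and computes a crude haplotype odds ratio from the reported WTCCC haplotype frequencies. The function names are illustrative. The crude ratio comes out near 1.46, close to but not identical with the reported 1.48; we assume the reported value comes from a fitted haplotype model rather than from this simple ratio.

```python
def allele_frequency(homozygous_freq, heterozygous_freq):
    """Allele frequency from genotype frequencies: homozygotes plus half the heterozygotes."""
    return homozygous_freq + heterozygous_freq / 2


def hardy_weinberg_genotypes(q):
    """Expected frequencies of 0, 1 and 2 copies of an allele with frequency q."""
    return ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)


def crude_odds_ratio(case_a, case_b, control_a, control_b):
    """Odds of haplotype A versus haplotype B in cases, relative to controls."""
    return (case_a / case_b) / (control_a / control_b)


if __name__ == "__main__":
    # Reported genotype frequencies for the rs1333049 C allele: 22% CC, 50% one copy.
    q = allele_frequency(0.22, 0.50)
    none, one, two = hardy_weinberg_genotypes(q)
    print(f"C-allele frequency = {q:.2f}")
    print(f"Hardy-Weinberg expectation: one copy = {one:.3f}, two copies = {two:.3f}")

    # Reported WTCCC haplotype frequencies: ACAC versus TTGG.
    ratio = crude_odds_ratio(case_a=0.386, case_b=0.271,
                             control_a=0.324, control_b=0.333)
    print(f"crude haplotype odds ratio = {ratio:.2f}")
```

The implied allele frequency of 0.47 gives Hardy–Weinberg expectations of roughly 50% heterozygotes and 22% homozygotes, which agrees with the reported genotype distribution.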
The lead SNP on chromosome 6q25.1 (rs6922269, with a combined P value of 2.90×10⁻⁸ for the association with coronary artery disease) is in the gene for methylenetetrahydrofolate dehydrogenase (NADP+-dependent) 1–like protein (MTHFD1L). All positive SNPs in the region are located in introns in the middle portion of the gene (Fig. 1A in the Supplementary Appendix). The risk allele (A) for rs6922269 has a prevalence of approximately 25%, with the risk increased by 23% per copy (95% CI, 15 to 33). Haplotype analysis showed that only the two haplotypes carrying the A allele were more frequent in case subjects than in controls, confirming the increased odds ratio for coronary artery disease with the A allele in the single-locus analysis (Supplementary Appendix).
The third replicated locus, on chromosome 2q36.3 (rs2943634, with a combined P value of 1.61×10⁻⁷ for the association with coronary artery disease), encompasses a region of 233 kb (Fig. 1B in the Supplementary Appendix). There is only one pseudogene (ENSG00000197218) located within this region. The risk allele (C) for rs2943634 has a prevalence of approximately 65%, with the risk increased by 21% per copy (95% CI, 13 to 30). Haplotype analysis showed that the other associations observed in the linkage-disequilibrium block around rs2943634 are due to linkage disequilibrium with it (Supplementary Appendix).
The association of the loci on chromosomes 9p21.3 and 6q25.1 with myocardial infarction in the German study was not affected by adjustment for cardiovascular risk factors and scores. In contrast, the odds ratio for myocardial infarction at the locus on chromosome 2q36.3 was reduced after such adjustment (Table 2). Further analysis showed that this locus was also significantly related to the body-mass index (P=0.004 in an additive model), the presence or absence of hypertension (P=0.04), and the level of low-density lipoprotein cholesterol (P=0.03). The fully adjusted population attributable fractions for the three loci in the German study are shown in Table 2. The combined fraction for the three loci was 0.38 (95% CI, 0.13 to 0.55). The prediction of myocardial infarction on the basis of the Framingham risk score and the PROCAM study score was substantially improved by adding the predictive information of these three loci to the model (deviance, 191.48; P<1×10⁻¹⁰).
For the six loci with an FPRP of less than 0.5 in the WTCCC study for which the associations were not replicated in the German study, the results of each of the two studies are shown in Table 2 in the Supplementary Appendix. The power to replicate these loci ranged from 43 to 80%.
The combined analysis of all SNPs identified four additional loci with a high likelihood of association with coronary artery disease (FPRP <0.2) (Table 3, and Table 3 in the Supplementary Appendix). The locus on chromosome 1p13.3 involves the PSRC1 gene, which encodes a proline-rich protein (Fig. 1C in the Supplementary Appendix). The other region of chromosome 1 (1q41) maps to the melanoma inhibitory activity 3 (MIA3) gene (also known as ARNT or TANGO) (Fig. 1D in the Supplementary Appendix). The SNPs associated with chromosome 10q11.21 cluster in a region 100 kb downstream of the CXCL12 gene (also known as the gene for stromal-cell–derived factor 1 precursor) (Fig. 1E in the Supplementary Appendix). Finally, the SNP on chromosome 15q22.33 is an intronic SNP in the SMAD3 gene (Fig. 1F in the Supplementary Appendix). SMAD3 is a transcriptional modulator activated by transforming growth factor β (TGF-β) and activin type 1 receptor kinase.
In the WTCCC study, approximately 30% of subjects had confirmed evidence of coronary artery disease but had not had a myocardial infarction at the time of recruitment (Table 1). When subjects with coronary artery disease only and those with coronary artery disease and myocardial infarction were analyzed individually, the odds ratios for both phenotypes across all seven chromosomal regions remained significant (Table 4 in the Supplementary Appendix). For the chromosome 15 locus, the effect size was significantly greater (P=0.004) in the subgroup of patients with coronary artery disease only than in the subgroup with coronary artery disease and myocardial infarction. Analysis according to sex in the two studies combined showed that all seven loci affected the risk of coronary artery disease to a similar extent in women and in men (Table 5 in the Supplementary Appendix).
From a literature search, we identified 142 SNPs, in 91 candidate genes, that had been reported to be associated with coronary artery disease or myocardial infarction. Only 13 of these SNPs are represented on the GeneChip array. For 36 genes, there were no primary or tagging SNPs. For the other genes, we identified 270 SNPs in complete or near-complete linkage disequilibrium with the SNPs that were previously found to be associated with coronary artery disease or myocardial infarction. Although a number of SNPs had a promising association with coronary artery disease in the WTCCC study or with myocardial infarction in the German study, only two linked SNPs (rs17489268 and rs17411031) tagging the Ser447→Ter variant in the lipoprotein lipase gene had a significant association in both studies (Table 6 in the Supplementary Appendix).
We jointly analyzed data from two distinct but complementary genomewide association studies of coronary artery disease and myocardial infarction that performed ascertainment in similar ways and that involved the same genotyping platform. Sequential and combined analyses of the two data sets allowed us to identify several new genetic loci, which individually and in aggregate considerably affect the risk of coronary artery disease.
The association of chromosome 9p21.3 with coronary artery disease was the strongest found in the WTCCC study.5 The finding that this locus was also most strongly associated with myocardial infarction in the German study provides compelling proof of its involvement in coronary artery disease. The evidence of association is strong, the risk variant is common, and each copy of the allele substantially increases the probability of the disease. These findings unequivocally demonstrate a major genetic risk variant at this locus.
Indeed, during revision of this manuscript, two other genomewide studies reported a strong association of the same 9p21.3 locus with coronary artery disease and myocardial infarction,13,14 making this the most highly replicated locus for coronary artery disease identified to date. The region contains the coding sequences of genes for two cyclin-dependent kinase inhibitors, CDKN2A (encoding the prototypic INK4 protein p16INK4a) and CDKN2B (encoding p15INK4b), which play an important role in the regulation of the cell cycle and may be implicated, through their role in TGF-β-induced growth inhibition, in the pathogenesis of atherosclerosis.15-17 Although regulation of one or both of the CDKN2 genes may explain the association with coronary artery disease, other explanations also need to be considered, including involvement of the methylthioadenosine phosphorylase (MTAP) gene or of other expressed sequences located in the region. The same region has also recently been associated with increased susceptibility to type 2 diabetes,18-20 raising the possibility of a shared, rather than a single, mechanism causing both coronary artery disease and diabetes.
The association of chromosome 6q25.1 with coronary artery disease maps to the MTHFD1L gene, which encodes the mitochondrial isozyme of C1-tetrahydrofolate (THF) synthase.21,22 The family of C1-THF synthases is used in a variety of cellular processes, particularly the synthesis of purine and methionine.21 Therefore, MTHFD1L activity may also contribute to plasma homocysteine levels,21,23 raising the possibility of a link between MTHFD1L variants and this risk factor for coronary artery disease.24 A preliminary analysis of data from 1070 persons in the AtheroGene study25 has not revealed an association between rs6922269 genotypes and plasma homocysteine levels (Tiret L, Blankenberg S: personal communication). Nevertheless, further studies in a wider range of subjects are needed to investigate this possibility.
Our findings demonstrate the main strength of a genomewide approach — namely, the possibility of identifying hitherto unsuspected loci that increase susceptibility to complex diseases. However, the mechanisms underlying the newly identified associations are often not immediately obvious. Indeed, the mechanisms for the association of signals on chromosomes 9p21.3, 6q25.1, and 2q36.3 with coronary artery disease all require elucidation. Similarly, none of the chromosomal loci identified in the combined analysis have previously been strongly linked to coronary artery disease. However, genes in several of the loci (PSRC1 at 1p13.3, MIA3 at 1q41, and SMAD3 at 15q22.33) play a role in cell growth or inhibition.26-29 These processes are fundamental for the formation and progression of atherosclerotic plaque and also for plaque instability.30 Our results suggest that genetic regulation of these processes plays an important role in the development of coronary artery disease and myocardial infarction.
Some loci from the WTCCC study that we attempted to replicate did not show association in the German study. These negative data underscore the need to view genomewide associations with caution, despite their statistical strength, until they have been replicated in appropriate validation samples. In this context, caution should also be exercised with regard to the four loci identified in our combined analysis.31
Our primary objective was to identify loci with significant associations with coronary artery disease independently of any biologic assumptions. Nonetheless, the genotyping platform also offered an opportunity to examine genetic variants in genes with previously reported associations. Indeed, several showed evidence of an association in one of our studies. However, only SNPs in the lipoprotein lipase gene had evidence of an association in both studies. This finding is in agreement with those in most recent systematic studies that were largely unsuccessful in replicating initial findings in candidate genes.32 However, many of the previously studied gene variants are poorly tagged on the GeneChip array, which clearly fails to cover the full extent of even common variation in these genes.
Whether our findings can be translated into better prevention or treatment for coronary artery disease will become clear only over time and with further research. Although the odds ratios for each locus are modest, as anticipated for a polygenic disorder, the estimates of population attributable fractions for the three validated loci are substantial, both individually and in aggregate. This observation offers the potential for improved overall coronary risk prediction. However, the case subjects in both studies had a strong family history of premature coronary artery disease, which might have enhanced the power to detect an association with coronary artery disease but also might have increased the estimated population attributable risks beyond that of sporadic cases, and further analysis of the loci in a wider range of subjects is necessary. Further studies are also needed to investigate the associations of the loci with other types of atherosclerotic disease, as well as with cardiovascular risk factors and markers. At a genetic level, studies should focus on fine mapping of the associated regions and thorough investigation of candidate genes. Our results provide a framework for all these additional studies.
Our analysis has several important limitations. Although the GeneChip array typed over 500,000 variants, a substantial percentage could not be evaluated, for reasons given in the Supplementary Appendix. Furthermore, to reduce the effect of multiple testing, we used only the rather conservative Cochran–Armitage test for trend (an additive model) to screen the WTCCC data for significant associations. These limitations make it likely that some loci were missed and that further analysis of the data and subsequent validation will reveal other loci.
Nonetheless, by using a sequential strategy of initial replication and subsequent combination of information from the two genomewide association studies, we were able to describe several new genetic loci for coronary artery disease and myocardial infarction that have a considerable effect on the risk of these diseases and that merit in-depth follow-up studies. Most important, the finding that a single locus was the strongest signal in two separate studies carries promise for clinically relevant progress in our understanding of the genetics of coronary artery disease. As the current activity in genomewide association studies of complex traits accelerates, our approach may also provide a paradigm for combining the results of such studies to maximize the amount of valuable information that can be extracted from these expensive and laborious experiments.
Supported by grants from the Wellcome Trust, the National Genome Research Network 2 of the German Federal Ministry of Education and Research, and the Cardiogenics project of the European Union. Recruitment for the WTCCC study was supported by grants from the British Heart Foundation and the U.K. Medical Research Council, and recruitment for the German MI Family Study was supported by grants from the Deutsche Forschungsgemeinschaft and the Deutsche Herzstiftung. We also acknowledge support from the Wellcome Trust Functional Genomics Initiative in Cardiovascular Genetics and the KORA (Cooperative Research in the Region of Augsburg) research platform of the GSF–National Research Centre. Drs. Samani and Ball hold chairs funded by the British Heart Foundation, and Dr. Tobin holds a U.K. Medical Research Council Clinical Scientist Fellowship.
No potential conflict of interest relevant to this article was reported.
We thank Peter Tooze, Andrew Kenniry, Simon Potter, Petra Bruse, Janine Stegmann, Anika Götz, Michaela Vöstner, Klaus Stark, and Viviane Nicaud for assistance.