Recently, genome-wide association studies (GWAS) have identified a number of single nucleotide polymorphisms (SNPs) as being associated with coronary heart disease (CHD). We estimated the effect of these SNPs on incident CHD, stroke and total mortality in the prospective cohorts of the MORGAM Project. We studied cohorts from Finland, Sweden, France and Northern Ireland (total N = 33,282, including 1,436 incident CHD events and 571 incident stroke events). The lead SNPs at seven loci identified thus far and additional SNPs (in total 42) were genotyped using a case-cohort design. We estimated the effect of the SNPs on disease history at baseline, disease events during follow-up and classic risk factors. Multiple testing was taken into account using false discovery rate (FDR) analysis. SNP rs1333049 on chromosome 9p21.3 was associated with both CHD and stroke (HR = 1.20, 95% CI 1.08–1.34 for incident CHD events and 1.15, 0.99–1.34 for incident stroke). SNP rs11670734 (19q12) was associated with total mortality and stroke. SNP rs2146807 (10q11.21) showed some association with the fatality of acute coronary events. SNP rs2943634 (2q36.3) was associated with high density lipoprotein (HDL) cholesterol and SNPs rs599839, rs4970834 (1p13.3) and rs17228212 (15q22.33) were associated with non-HDL cholesterol. SNPs rs2943634 (2q36.3) and rs12525353 (6q25.1) were associated with blood pressure. These findings underline the need for replication studies in prospective settings and confirm the candidacy of several SNPs that may play a role in the etiology of cardiovascular disease.
While the role of lifestyle risk factors is well established in the development of cardiovascular disease (CVD), the identification of genetic factors involved in the susceptibility to CVD has been more challenging, with only a few candidate genes reproducibly associated with disease [Arnett et al., 2007; Cambien and Tiret, 2007]. Recently, genome-wide association studies (GWAS) have opened a fresh avenue of research by affording the possibility to explore the whole genome without any a priori biological hypotheses. This approach has led to the identification of a new locus on chromosome 9p21 associated with coronary heart disease (CHD) and myocardial infarction in several GWAS [Helgadottir et al., 2007; McPherson et al., 2007; Samani et al., 2007; Wellcome Trust Case Control Consortium (WTCCC), 2007]. This association has now been replicated in several large cohorts and appears to be a robust finding, even though the underlying mechanism is not yet elucidated [Schunkert et al., 2008]. In addition, the association has been shown with two other arterial diseases [Helgadottir et al., 2008].
In the WTCCC Study [Wellcome Trust Case Control Consortium, 2007] and the German Myocardial Infarction (MI) Study recently reported by the Cardiogenics Consortium [Samani et al., 2007], several loci, including chromosome 9p21.3, were identified as putative loci for CHD, namely 1p13.3, 1q41, 2q36.3, 6q25.1, 10q11.21 and 15q22.33. The natural sequence of this research is to replicate the GWAS findings in other populations and to address other interesting questions, such as the effect of the identified loci on the classic CVD risk factors or on other cardiovascular endpoints such as ischemic stroke. Moreover, these variants have been identified in case-control settings, but their effect on the risk of CHD, ischemic stroke and all-cause mortality has not been studied in a prospective setting. Of course, it is widely acknowledged that the task of distinguishing meaningful signals from noise (“separating the gold from the fool’s gold” [Dupuis and O’Donnell, 2007]) is not solely a problem with a genetic or technical solution, but must be addressed from epidemiological principles.
In consortium studies, a major challenge is to ensure that phenotypes have been assessed with the same rigour as the genotypes, and furthermore, it can be dangerous to assume that case-control studies will reflect the impact of specific SNPs on incident disease in diverse populations with different absolute risks [McCarthy et al., 2008; Pearson and Manolio, 2008]. The vast majority of subjects in the WTCCC were British residents but, even so, there were several loci, some of which were associated with disease, which demonstrated substantial geographical variation in allele frequencies across Britain. In general, the differences in allele frequencies in Europe [Cavalli-Sforza and Piazza, 1993] and even in more isolated populations such as Finns [Pastinen et al., 2001] have long been known. Thus, replication of the findings in other European populations is essential. Although the WTCCC findings might assuage concerns about the use of common control groups for several disease phenotypes, when CHD studies from different European populations are pooled, it is important to ensure that like is being compared with like. Lastly, given the fact that a substantial proportion of incident CHD cases die within 28 days and most of those who die do not even reach hospital [Tunstall-Pedoe et al., 1994], we should not assume that the findings in survivors will be identical to those in whom the outcome is fatal.
The case for replication is compelling and for this purpose we have used several extensively phenotyped prospective cohorts from Europe assembled within the MORGAM Project [Evans et al., 2005]. These cohorts have been followed up for between 5 and 10 years for CHD, stroke events and total mortality. Thus, we can assess whether the findings reported solely for CHD have relevance for the separate (though related) endpoints of stroke and total deaths, in cohorts free of disease at baseline. We can also explore the effect of these variants on classic CVD risk factors measured at baseline.
The study included two population-based cohorts from the FINRISK Study from Finland (follow-up 1992–2001 and 1997–2004), the ATBC study from Finland (follow-up 1992–1999), two cohorts from Northern Sweden (follow-up 1990–1999 and 1994–1999) and the PRIME cohort comprising three centers in France and one in Northern Ireland (5-year follow-up during the period 1991–1999). All the cohorts are part of the MORGAM Project [Evans et al., 2005; MORGAM Project, 2001-] and the cohort descriptions have been published [Kulathinal et al., 2005]. The study employed a case-cohort design in which all CVD cases from the prospective follow-up and a random subset of the cohort members were selected for genotyping [Kulathinal et al., 2007].
The outcomes analyzed included disease status at baseline and disease events during the follow-up. History of MI at baseline (yes or no) and history of stroke at baseline (yes or no) were primarily based on health information sources such as hospital discharge registers and on self-reports (questionnaire responses about doctor diagnoses). The cohorts were followed up for all fatal and non-fatal acute CHD and stroke events as well as all other deaths. The time and the type of the event were recorded. An event was considered fatal if the subject died within 28 days of the onset of the CHD or stroke event. The main CHD outcome was defined as the first fatal or non-fatal CHD event, which included definite and possible acute MI or coronary death, unstable angina pectoris, revascularization and unclassifiable fatal events. The main stroke outcome was defined as the first fatal or non-fatal likely cerebral infarction, which includes events validated as cerebral infarction and events that were not validated but most likely were cerebral infarctions on the basis of the clinical or death diagnoses. The details of the follow-up procedures are described in the cohort descriptions [Kulathinal et al., 2005].
Risk factors measured at baseline included total and high density lipoprotein (HDL) cholesterol, systolic and diastolic blood pressure (two measurements), daily smoking, height and weight. In the analysis, we used non-HDL cholesterol (difference of total cholesterol and HDL cholesterol), HDL-cholesterol, mean blood pressure (the mean of the two diastolic and the two systolic blood pressure measurements), current daily smoking (yes or no) and body mass index (weight in kilograms divided by square of height in meters). Other variables collected at baseline included self-reported history of diabetes, drug treatment for high cholesterol and drug treatment for high blood pressure. The baseline measurement procedures were highly standardised [Niemelä et al., 2007].
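As an illustration of how the derived variables above are defined, a minimal Python sketch follows. The function name, argument names and example values are hypothetical and are not taken from the MORGAM data; cholesterol is assumed to be in consistent units (for example mmol/l) and blood pressure in mmHg.

```python
def derived_risk_factors(total_chol, hdl_chol, sbp1, sbp2, dbp1, dbp2,
                         weight_kg, height_m):
    """Compute the derived baseline variables described in the text."""
    return {
        # Non-HDL cholesterol: total cholesterol minus HDL cholesterol.
        "non_hdl_cholesterol": total_chol - hdl_chol,
        # Mean blood pressure: mean of the two systolic and two diastolic readings.
        "mean_blood_pressure": (sbp1 + sbp2 + dbp1 + dbp2) / 4,
        # Body mass index: weight (kg) divided by the square of height (m).
        "body_mass_index": weight_kg / height_m ** 2,
    }


# Invented example subject.
example = derived_risk_factors(6.0, 1.5, 140, 136, 90, 86, 81.0, 1.80)
```

For this invented subject the sketch gives a non-HDL cholesterol of 4.5, a mean blood pressure of 113 and a body mass index of about 25.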
The SNP markers selected for genotyping included the seven lead SNPs (1p13.3, 1q41, 2q36.3, 6q25.1, 9p21.3, 10q11.21 and 15q22.33) identified in the WTCCC Study and German MI Study and six lead SNPs (1q32.2, 1q43, 5q21, 16q23, 19q12 and 22q12) identified in the WTCCC study but not replicated in the German MI Study. The genotyping plan also included proxies that were in almost complete linkage disequilibrium with the lead SNP and other SNPs that were selected because they brought some additional information about the haplotypic structure of the locus. The full list of the 42 SNPs is given in the web supplement http://www.ktl.fi/publications/morgam/cardiogenics/index.html.
Forty single nucleotide polymorphism (SNP) markers out of 42 were genotyped with the iPLEX chemistry on the MassARRAY system (Sequenom, San Diego, CA), using a protocol specified by the manufacturers, and 12.5–20 ng of genomic DNA. For 233 samples with less than 7.5 μg genomic DNA, the DNA was whole genome amplified prior to genotyping [Silander et al., 2005]. Genotyping was done in 384-well plates which contained eight non-template controls and eight plate-specific duplicates in plate-specific positions and 5% blind duplicates. The case status was unknown to the laboratory staff. Assay information has been provided elsewhere [Schunkert et al., 2008].
Because of difficulties in genotyping SNPs rs599839 and rs2943634 with the iPLEX technique, they were genotyped using the 5′ nuclease assay with MGB TaqMan probes (Applied Biosystems, Courtaboeuf, France). Fluorescence was measured with an ABI PRISM 7000 sequence detection system (Applied Biosystems). Primer and probe sequences can be found at the GeneCanvas website (http://www.gencanvas.org).
We used logistic regression to analyze the case fatality and the disease status at baseline, a Cox proportional hazards model to analyze the disease events during the follow-up and linear regression to analyze the association between genotypes and risk factors. Because subjects were selected for genotyping according to the case-cohort design, cases and subcohort members had to be weighted appropriately in the analyses. In logistic regression models, subjects had weights proportional to the inverse of the selection probabilities [Zhao and Lipsitz, 1992]. In time-to-event models, subcohort members had weights proportional to the inverse of the selection probabilities and cases had a weight of one at the time of the event. In linear regression models, only the subcohort members without history of CVD were included and no weighting was used. The analysis methods for the MORGAM case-cohort design are described in detail elsewhere [Kulathinal et al., 2007]. Statistical analysis was carried out using R [R Development Core Team, 2008].
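The effect of inverse-probability weighting can be illustrated with a small Python sketch. This is not the MORGAM analysis code (the actual methods are described in Kulathinal et al., 2007, and were implemented in R); the function names and the toy cohort are assumptions made purely for illustration. In the toy example every case is genotyped (selection probability 1) and non-cases enter the subcohort with probability 0.1.

```python
def inverse_probability_weights(selection_probs):
    """Return the weight 1/p for each genotyped subject's selection probability p."""
    if any(not 0 < p <= 1 for p in selection_probs):
        raise ValueError("selection probabilities must lie in (0, 1]")
    return [1.0 / p for p in selection_probs]


def weighted_proportion(indicators, weights):
    """Weighted proportion of subjects whose indicator equals 1."""
    return sum(y * w for y, w in zip(indicators, weights)) / sum(weights)


# Toy full cohort of 1,000 subjects: 100 cases (all genotyped) and 900
# non-cases, of whom 90 were sampled with probability 0.1.
is_case = [1] * 100 + [0] * 90
probs = [1.0] * 100 + [0.1] * 90
weights = inverse_probability_weights(probs)

naive = sum(is_case) / len(is_case)             # ignores the sampling design
weighted = weighted_proportion(is_case, weights)  # recovers the cohort proportion
```

The unweighted case proportion in the genotyped set is about 0.53, whereas the weighted proportion returns the 0.10 of the toy full cohort. The same weights would enter a weighted regression fit in place of this simple proportion.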
A separate model was fitted for each SNP, implying that there were 42 models for each outcome. Each analysis included only subjects who had complete data on the risk factors in the model. In all models, the heterozygote was coded as 0 and the homozygotes were coded as 1 and -1. Models for disease status at baseline were adjusted for age at baseline, sex and cohort. Models for events during the follow-up were adjusted for cohort, HDL cholesterol, non-HDL cholesterol, mean of systolic and diastolic blood pressure, body mass index, current daily smoking and history of diabetes. The age of the subject was used as the time variable and separate baseline hazards were assumed for men and women. For comparison we also fitted models that were adjusted only for cohort and current daily smoking and stratified by sex. Models for the risk factors were fitted for the subcohort and were adjusted for age at baseline, sex and cohort. Subcohort members who had drug treatment for high cholesterol were excluded from the analyses of the cholesterol measurements and subcohort members who were on drug treatment for high blood pressure were excluded from the blood pressure analyses.
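The genotype coding can be sketched in Python as follows. The function name and the two-letter genotype strings are hypothetical, and the choice of which homozygote receives +1 (here the one carrying two copies of the designated allele) is an assumption, since the text does not state it.

```python
def code_genotype(genotype, risk_allele):
    """Code a two-allele genotype as -1, 0 or 1.

    The heterozygote is coded 0; the homozygote carrying two copies of
    `risk_allele` is coded 1 and the other homozygote -1 (assumed direction).
    """
    if len(genotype) != 2:
        raise ValueError("expected a two-allele genotype such as 'CG'")
    return genotype.count(risk_allele) - 1


codes = [code_genotype(g, "C") for g in ("CC", "CG", "GG")]
```

Because adjacent genotypes differ by one unit, the exponentiated regression coefficient is a per-allele odds or hazard ratio, exactly as with a 0/1/2 allele count; only the intercept (or baseline hazard) is shifted.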
The study had a power of 86% to detect an effect on CHD with a hazard ratio of 1.2 and a power of 41% for detecting an effect with a hazard ratio of 1.1 in a single test with a nominal significance level of 5% [Cai and Zeng, 2004]. For stroke (cerebral infarction) the corresponding powers were 61 and 26%. The details of the power calculations are given in the Supplement.
False-discovery rate (FDR) analysis [Benjamini and Hochberg, 1995] with a conservative a priori assumption that there were no true positive findings in the results was used to address multiple testing. The FDR analysis was performed separately for each outcome. An SNP was identified as interesting if it was one of the seven lead SNPs found in the WTCCC Study and German MI Family Study (rs1333049, rs6922269, rs2943634, rs599839, rs17465637, rs501120 and rs17228212) or if it was identified in one of our main analyses (disease status at baseline, events during follow-up and baseline risk factors) using an FDR threshold of 20%.
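The Benjamini-Hochberg step-up adjustment can be sketched in a few lines of Python. This is the standard procedure, which does not attempt to estimate the proportion of true null hypotheses; whether the analysis reported here used exactly this form is an assumption. The function name and the p-values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values for four hypothetical SNPs and one outcome.
pvals = [0.01, 0.04, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)
interesting = [i for i, q in enumerate(qvals) if q <= 0.20]
```

With these invented p-values the first three SNPs fall below the 20% threshold and the fourth does not.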
The total number of individuals genotyped under the case-cohort design was 5,613, of whom 2,341 were subcohort members. The characteristics of the case-cohort set and the subcohort are summarized in Table I. The tables in the web supplement http://www.ktl.fi/publications/morgam/cardiogenics/index.html report the genotyping success rates, allele frequencies and Hardy-Weinberg test statistics for each cohort. No major departures from the Hardy-Weinberg equilibrium were found and 99.89% of blind duplicate genotypes were consistent. A total of 247 samples from the PRIME cohort were genotyped using both iPLEX and TaqMan chemistries for two of the SNPs, rs1333049 and rs6922269. Among 457 successful genotype pair comparisons, five discrepancies in four samples were present, resulting in a 98.9% genotype concordance.
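A Hardy-Weinberg check of the kind reported in the supplement can be sketched as a one-degree-of-freedom chi-square test. The exact test statistic used for the supplement tables is not specified in the text, so this Pearson chi-square version is an assumption, as are the function name and the genotype counts.

```python
import math


def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) of Hardy-Weinberg proportions.

    Assumes both alleles are observed; a monomorphic SNP would give an
    expected count of zero and cannot be tested this way.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: a deficit of heterozygotes relative to expectation.
chi2, p_value = hardy_weinberg_chi2(30, 40, 30)
```

For the invented counts the allele frequency is 0.5, the expected counts are 25, 50 and 25, and the chi-square statistic is 4, which corresponds to a p-value just below 0.05.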
In addition to the seven lead SNPs, six SNPs were identified as interesting by the FDR analysis: rs4970834 (1p13.3), rs2972147 (2q36.3), rs12525353 (6q25.1), rs10738610 (9p21.3), rs2146807 (10q11.21) and rs11670734 (19q12). The genotype distributions of the interesting SNPs in different study populations are reported in Table II. The results are not reported for rs2972147 (2q36.3) and rs10738610 (9p21.3) because they are almost complete proxies of the lead SNPs. The largest differences between the populations were found in chromosome 9p21.3 (rs1333049) where the risk allele C of rs1333049 had a frequency of 52.1% (95% confidence interval 44.2–60.0) in PRIME/Belfast and a frequency of 39.5% (36.2–42.8) in FINRISK and in chromosome 19q12 (rs11670734) where allele C had a frequency of 40.2% (32.5–47.9) in PRIME/Belfast and a frequency of 25.5% (18.9–32.1) in Northern Sweden.
The results concerning disease status at baseline (Table III) showed that chromosome 9p21.3 (rs1333049) was associated with both a history of MI and a history of stroke. The odds ratio per allele for rs1333049 was 1.24 (1.11–1.39) for MI and 1.22 (1.06–1.41) for stroke. Chromosome 19q12 (rs11670734) was also associated with history of MI and history of stroke. Chromosome 1p13.3 (rs599839) and 1q41 (rs17465637) were associated with history of MI. Chromosome 10q11.21 was also associated with history of MI, but this association was seen not for the lead SNP (rs501120) but for a proxy (rs2146807). No association was found for chromosome 2q36.3 (rs2943634), 6q25.1 (rs6922269) or 15q22.33 (rs17228212) for these disease outcomes.
The results from the analysis of disease events during the follow-up (Table IV) showed that chromosome 9p21.3 was associated with the risk of CHD, with a hazard ratio of 1.20 (1.07–1.34) per risk allele for the lead SNP (rs1333049). Chromosome 19q12 (rs11670734) was associated with all deaths, with a hazard ratio of 1.19 (1.06–1.33) per risk allele. No other associations could be confirmed with incident events, but in general the point estimates of the hazard ratios were in agreement with the odds ratios from the analysis of disease status at baseline.
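To make the per-allele hazard ratios easier to interpret, the following Python sketch back-calculates an approximate standard error and Wald statistic from a published estimate and its 95% confidence interval, and derives the hazard ratio implied for two copies of the risk allele. It assumes that the interval is a Wald interval on the log scale and that the per-allele effect is multiplicative; the function name is hypothetical, and the results are only approximate because the published figures are rounded.

```python
import math


def wald_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate log-scale SE, z statistic and two-sided p-value from a ratio and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# rs1333049 and incident CHD: hazard ratio 1.20 (1.07-1.34) per risk allele.
se, z, p = wald_from_ci(1.20, 1.07, 1.34)

# Under a multiplicative per-allele model, two copies of the risk allele
# correspond to the square of the per-allele hazard ratio.
hr_two_copies = 1.20 ** 2
```

For rs1333049 this gives a log-scale standard error of roughly 0.057, a z statistic of about 3.2 and an implied hazard ratio of 1.44 for carriers of two risk alleles relative to carriers of none.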
Chromosome 10q11.21 (rs2146807) showed an association with a history of MI at baseline but not with CHD during follow-up. The main difference between these analyses is that all cases with history at baseline were obviously non-fatal whereas at follow-up we observe both fatal and non-fatal events. To explore this further we built a logistic regression model to assess the associations between the SNPs and the fatality of first CHD event during the follow-up. The strongest association was found with rs2146807, which gave a per allele odds ratio of 0.614 (0.414–0.910) (FDR 25%). This suggests that the odds ratio of 1.36 for allele C for the prevalent MI might have been due to the increased case fatality seen for the T allele.
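The case-fatality odds ratio can be re-expressed for the other allele by taking reciprocals, as in the short Python sketch below. It assumes that the reported odds ratio of 0.614 refers to each additional copy of allele C, which the surrounding text suggests but does not state explicitly; the function name is hypothetical.

```python
def invert_odds_ratio(estimate, lower, upper):
    """Re-express an odds ratio and its CI for the opposite allele.

    Taking reciprocals swaps the confidence limits.
    """
    return 1.0 / estimate, 1.0 / upper, 1.0 / lower


or_t, lower_t, upper_t = invert_odds_ratio(0.614, 0.414, 0.910)
print(f"{or_t:.2f} ({lower_t:.2f}-{upper_t:.2f})")  # prints 1.63 (1.10-2.42)
```

Under that assumption, each copy of allele T would correspond to about 1.6-fold higher odds that a first CHD event is fatal, which is the direction needed for selective survival to inflate the association of allele C with prevalent MI.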
The results from the analysis of the association between SNPs and classic risk factors (Table V) show that the risk alleles of chromosome 1p13.3 (rs599839 and rs4970834) were associated with higher values of non-HDL cholesterol. Weaker associations were found between chromosome 15q22.33 (rs17228212) and non-HDL cholesterol and between chromosome 2q36.3 (rs2943634) and HDL-cholesterol. Chromosomes 2q36.3 (rs2943634) and 6q25.1 (rs12525353) were associated with blood pressure.
The association between chromosome 9p21.3 (rs1333049) and CHD was strongly confirmed in the prospective cohorts studied. The risk allele C of chromosome 9p21.3 was also found to increase the risk of stroke. Chromosome 19q12 (rs11670734) was associated with total mortality.
An association with disease outcomes was also found for chromosomes 1p13.3 (rs599839), 1q41 (rs17465637) and 10q11.21 (rs2146807) but not for chromosomes 2q36.3 (rs2943634), 6q25.1 (rs6922269) or 15q22.33 (rs17228212). Chromosomes 2q36.3 and 15q22.33 were, however, associated with HDL-cholesterol and non-HDL cholesterol, respectively. The risk alleles of rs599839 and rs4970834 (1p13) were also associated with higher values of non-HDL cholesterol. The association between rs599839 and LDL-cholesterol was reported earlier [Kathiresan et al., 2008] whereas the lipid associations for chromosomes 2q36.3 and 15q22.33 have not been previously identified. The associations between chromosomes 2q36.3 (rs2943634) and 6q25.1 (rs12525353) and blood pressure have also not been reported before. Chromosome 9p21.3, which showed the strongest association with disease outcome, was not associated with classic risk factors. Additional studies are needed to identify the biological pathways for the variants studied. Drug treatment for high cholesterol or for high blood pressure is a potential confounder when analyzing the impact of a gene on cholesterol or on blood pressure. Therefore, it is better to exclude individuals on drug treatment from these analyses if the impact of the drug treatment cannot be modelled reliably.
The findings are stronger for the disease status at baseline than for the disease events during the follow-up. This may reflect a short follow-up time in some cohorts and the fact that individuals with disease history at baseline have had a disease event at a relatively young age and are thus possibly genetically more predisposed. Allele T of chromosome 10q11.21 (rs2146807) seems to be associated with increased CHD case fatality, but additional studies are needed to confirm this association.
Only three out of the seven SNPs which were shown to have a significant association with CHD in the WTCCC Study and German MI Family Study were replicated in the MORGAM study of prevalent disease. Moreover, only one was associated with CHD in this prospective analysis. This may reflect genetic heterogeneity, gene-age interactions [Lasky-Su et al., 2008; Shi and Rao, 2008], the greater power of GWAS, or the possibility of false-positive association, and so the data must be interpreted with caution. While GWAS findings may be useful in the discovery of previously unsuspected pathways of pathogenesis, our findings illustrate the difficulty that we face in differentiating true signals from noise, when one takes, as one recent editorial put it, “a drink from the fire hose,” i.e. indulges in multiple hypothesis testing [Hunter and Kraft, 2007]. Even with mindful consideration of the probability of false discovery and adjustment for multiple comparisons, Strömberg et al. do well to remind us that the p-value is inherently confounded information, a mix of information about the effect size and the effective sample size [Strömberg et al., 2008]. Their alternative approach, conceptually echoing some of the tenets of Bradford Hill, is to take account of the prior information around putative effect size estimates. However, the objective determination of the priors remains a challenge.
In the present study it is notable that even though the two SNPs, whose association with disease has been replicated in prospective analysis, are carried in significantly different proportions in the Scandinavian and Irish populations, they are not strongly associated with the intermediate phenotypes or classic risk factors. There is undoubtedly still much to be learned about the possible mechanisms of effect for these SNPs. In particular, more power will be needed to reveal possible gene-environment interactions. Even with 212,577 person years of follow-up in these MORGAM cohorts, disentangling meaningful departures from a multiplicative model of interaction could require a considerable effort. Moreover, there may be gene-environment interactions that depend on threshold or cumulative lifetime effects. Nevertheless, it is acknowledged that longitudinal analysis must play a central role in the exploitation of new GWAS findings [Clark et al., 2005]. Initiation of disease may begin long before the phenotype becomes diagnosed, but at a particular point in time, individuals with a certain genotype may display a range of alternative phenotypes, influenced by the range of possible environments to which they are exposed. Few genetic studies take the dynamics of the relationships between an individual’s genotype and phenotypic outcome into account [Sing et al., 2004].
We have also provided some evidence, at least in respect of the 10q11.21 (rs2146807) SNP, of how the juxtaposition of divergent prospective and cross-sectional case-control associations may indicate the presence of prevalence-incidence “survival” bias. Thus, without replication studies couched in a prospective setting, the implications of GWAS findings for public health and prevention are all the harder to evaluate.
This research was part funded through the European Community’s Seventh Framework Programme (FP7/2007-2013), ENGAGE project, grant agreement HEALTH-F4-2007-201413 and through the Finnish Heart Association.
Contract grant sponsor: European Community’s Seventh Framework Programme; Contract grant number: FP7/2007-2013; Contract grant sponsor: ENGAGE project; Contract grant number: HEALTH-F4-2007-201413; Contract grant sponsor: Finnish Heart Association.
Sites and key personnel of contributing MORGAM Centres:
FINRISK, National Public Health Institute, Helsinki: V. Salomaa (principal investigator), A. Juolevi, E. Vartiainen, P. Jousilahti; ATBC, National Public Health Institute, Helsinki: J. Virtamo (principal investigator), H. Kilpeläinen; MORGAM Data Centre, National Public Health Institute, Helsinki: K. Kuulasmaa (head), Z. Cepaitis, A. Haukijärvi, B. Joseph, J. Karvanen, S. Kulathinal, M. Niemelä, O. Saarela; MORGAM Central Laboratory, National Public Health Institute, Helsinki: L. Peltonen (responsible person), K. Silander, S. Knaappila, M. Alanne, P. Laiho, M. Perola;
National Coordinating Centre, National Institute of Health and Medical Research (U258), Paris: P. Ducimetière (national coordinator), A. Bingham; PRIME/Strasbourg, Department of Epidemiology and Public Health, Louis Pasteur University, Faculty of Medicine, Strasbourg: D. Arveiler (principal investigator), B. Haas, A. Wagner; PRIME/Toulouse, Department of Epidemiology, Faculty of Medicine, Toulouse-Purpan, Toulouse: J. Ferrières (Principal Investigator), J-B. Ruidavets, V. Bongard, D. Deckers, C. Saulet, S. Barrere, M. Soubiraa; PRIME/Lille, Department of Epidemiology and Public Health, Pasteur Institute of Lille: P. Amouyel (principal investigator), M. Montaye, B. Lemaire, S. Beauchant, D. Cottel, C. Graux, N. Marecaux, C. Steclebout, S. Szeremeta; MORGAM Laboratory, INSERM U525, Paris: F. Cambien (responsible person), L. Tiret, V. Nicaud, D.A. Tregouet, C. Perret, C. Proust, M. de Suremain;
Northern Sweden, Umeå University Hospital, Department of Medicine, Umeå: B. Stegmayr (principal investigator), K. Asplund (former principal investigator), M. Eriksson;
PRIME/Belfast, Queen’s University Belfast, Belfast, Northern Ireland: F. Kee (principal investigator), A. Evans (former principal investigator), J. Yarnell, E. Gardner; MORGAM Coordinating Centre, Queen’s University Belfast, Belfast, Northern Ireland: A. Evans (MORGAM coordinator), S. Cashman;
A. Evans (chair), S. Blankenberg (Mainz, Germany), F. Cambien, M. Ferrario (Varese, Italy), K. Kuulasmaa, L. Peltonen, M. Perola, V. Salomaa, D. Shields (Dublin, Ireland), P.-G. Wiklund (Sweden), H. Tunstall-Pedoe (Dundee, Scotland), K. Asplund (honorary consultant, Stockholm, Sweden), B. Stegmayer (former member).