Early efforts to localize and identify genes that contribute to risk of common chronic diseases often used either candidate gene studies or family-based linkage studies, which suffered from low statistical power, lack of replication, and low precision (1). Although there were successes, progress was generally slow. Recently, genome-wide association studies (GWAS) have proven to be productive when they have adequate sample sizes and replication opportunities. Their primary aim is to identify novel genetic loci associated with inter-individual variation in the levels of risk factors, the measure of subclinical disease, or the risk of clinical events. The method does not require assumptions about a priori biologic involvement, is precise in its ability to localize genetic effects to relatively small regions of the genome, and can be extended to evaluate potential gene-environment interactions.
GWAS have successfully identified genetic loci associated with a variety of conditions such as type 2 diabetes (2) and coronary disease (3–5). The large number of statistical tests required in GWAS poses a special challenge because few studies that have DNA and high-quality phenotype data are sufficiently large to provide adequate statistical power for detecting small to modest effect sizes (6). Meta-analyses combining previously published findings have improved the ability to detect new loci (2). Even before the era of GWAS (7), the requirement for large sample sizes and the importance of replication have served as powerful incentives for collaboration.
Our understanding of the risk factors for common chronic diseases has benefited from large population-based cohort studies. Although these studies are costly and time consuming, they are generally free of the survival and recall biases typically encountered in case-control studies. The cohort design, with its prospective standardized data collection, is often the preferred method for estimating disease incidence and evaluating risk factors. The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium was formed to facilitate GWAS meta-analyses and replication opportunities among multiple large and well-phenotyped cohort studies. The design of the CHARGE Consortium includes five prospective cohort studies from the US and Europe: the Age, Gene/Environment Susceptibility (AGES)--Reykjavik Study, the Atherosclerosis Risk in Communities (ARIC) Study, the Cardiovascular Health Study (CHS), the Framingham Heart Study (FHS), and the Rotterdam Study (RS).
With genome-wide data on about 38,000 individuals, these cohort studies have a large number of phenotypes measured in a similar way, and a prospective meta-analysis of within-study association data from the 5 studies, with a properly selected level of genome-wide significance, is a powerful approach to finding genuine phenotypic associations with novel genetic loci. The CHARGE Consortium provides a unique opportunity for collaborative investigation of the genetic determinants of risk factors, measures of subclinical disease, and clinical events.
Participating studies (8–14) were prospective cohort studies that had multiple cardiovascular and aging phenotypes in common and that had genome-wide scans completed or in progress in 2007–2008 (Table 1).
Briefly, the AGES-Reykjavik Study represents a sample drawn from the established population-based cohort, the Reykjavik Study (8). The original Reykjavik Study comprised a random sample of 30,795 men and women living in Reykjavik in 1967 and born between 1907 and 1935. Over the years 1967–1996, 6 examinations were conducted in 6 subcohorts. Between 2002 and 2006, the AGES-Reykjavik study re-examined 5764 survivors of the original cohort.
The ARIC study is a population-based prospective cohort study of cardiovascular disease and its risk factors sponsored by the National Heart, Lung, and Blood Institute (NHLBI). ARIC included 15,792 individuals aged 45–64 years at baseline (1987–89), chosen by probability sampling from four US communities (9). Cohort members completed four clinic examinations, conducted three years apart between 1987 and 1998. Follow-up for clinical events is annual.
The CHS is a population-based NHLBI-funded cohort study of risk factors for cardiovascular disease in adults 65 years of age or older conducted at four field centers (10). The original predominantly white cohort of 5201 persons was recruited in 1989–1990 from random samples of the Medicare lists. An additional 687 African-Americans were enrolled in 1992–93. CHS participants completed standardized clinical examinations and questionnaires at study baseline and at nine annual follow-up visits. Follow-up for clinical events occurs every 6 months.
The FHS began in 1948 with the recruitment of an original cohort of 5209 men and women who were 28 to 62 years of age at entry (11). Clinic examinations were performed approximately every two years. In 1971, a second generation of study participants, 5124 children and spouses of children of the original cohort, were enrolled (12). With two exceptions, clinic examinations took place approximately every four years. Enrollment of the third generation cohort of 4095 children of offspring cohort participants began in 2002 (13).
The RS is a prospective population-based cohort study comprising 7983 subjects aged 55 years or older. A trained interviewer visited the individuals at home for a computerized questionnaire, and individuals were subsequently examined at a research center. Baseline data were collected between 1990 and 1993 (14). The original cohort underwent 3 additional examinations. In 2000–2001, an additional 3011 individuals aged 55 or older (mainly 55–64) were recruited and examined. Since 2006, an additional cohort of individuals aged 45 years or older (mainly 45–59 years) is being recruited, comprising 3236 subjects as of May 1, 2008.
All of the CHARGE cohort studies were approved by their respective institutional review committees, and the subjects from all the cohorts provided written informed consent.
Each cohort study has its own administrative structure and set of investigators. Although investigators from several cohorts had occasionally collaborated on analyses, there was little precedent for consortia of cardiovascular epidemiology cohorts. In late 2007, it became clear that because all cohorts shared both a common prospective population-based design and a large number of phenotypes assessed by similar data-collection methods (Table 2), a cohort-level collaboration would facilitate a series of prospectively planned joint meta-analyses. The resulting CHARGE consortium represents a voluntary federation of 5 large complex studies. Between October 2007 and February 2008, the principles and procedures for the CHARGE consortium were developed and approved by the parent studies (public website: http://web.chargeconsortium.com).
The primary aim is the conduct of high-quality analyses that produce, in an efficient and timely manner, reliable and valid findings across multiple cardiovascular and aging-related phenotypes. The organizational structure is simple and comprises a Research Steering Committee (RSC), an Analysis Committee, a Genotyping Committee, and approximately 20 phenotype-specific working groups. The RSC, which has 2 representatives from each cohort, is responsible for establishing the other committees, for nominating working group members, and for developing general guidelines for collaboration, authorship, sharing of results, publication, and timely participation. The Analysis Committee develops guidelines that the working groups are encouraged to adopt or adapt, and the Genotyping Committee coordinates requests for follow-up genotyping.
The main scientific work takes place in the phenotype-specific working groups, which have responsibility for developing and executing the scientific plans. Working groups standardize phenotypes across the cohorts, decide whether and how to include other non-member studies with similar phenotypes, and agree on analysis plans, often with input from the Analysis Committee. The phenotype working groups also develop plans for authorship and manuscripts, evaluate results, write manuscripts, and decide on the need for follow-up genotyping.
For each manuscript, the working-group investigators establish pre-specified plans for analysis and timelines for participation. Before results are shared, each cohort must formally opt-in or opt-out of participation. For any phenotype, each cohort may work with other studies or consortia rather than CHARGE, and individual cohorts remain free to publish cohort-specific findings for any phenotype. The decision to opt-in represents a commitment to collaborate only with the CHARGE working group for that particular analysis until the manuscript is accepted for publication. Only investigators from cohorts that have opted-in have access to shared results. After the results have been shared, investigators cannot opt out to publish their findings on their own. Working group members agree not to share the GWAS findings with outside groups without the permission of the members who generated the data. Transparency, disclosure, and communications about all collaborations, additional follow-up experiments, or efforts to obtain additional funding have been essential to developing, ensuring, and maintaining trust within the consortium.
In practice, many of the CHARGE phenotype working groups have already engaged investigators from non-member studies as collaborators, including at least a dozen other studies from the US and Europe. Collaborating non-member studies either agree to the overall CHARGE principles, or the CHARGE working group develops and negotiates a new CHARGE-compatible agreement with the non-member studies or consortia.
Using traditional authorship criteria (15), the CHARGE RSC encourages the designation of multiple co-equal first and last authors so that the authorship matches the scientific contributions of conducting and coordinating analyses from five complex studies. Special efforts are made to provide opportunities for young investigators. The original CHARGE consortium agreement calls for posting shared results on a public website once a manuscript is published in a journal. A recent change in the NIH GWAS policy may affect this plan (16). The consortium remains committed to the NIH GWAS policy on intellectual property (17).
The CHARGE consortium was developed after each cohort study had contracted for its genotyping platform and decided on the selection of the individuals to be included in the GWAS. Indeed, the five cohorts used four different platforms (Table 3), which share fewer than about 60,000 SNPs. To maximize the availability of comparable genetic data and coverage of the genome, each cohort used recently developed methods (18,19) to impute genotypes for Europeans and European Americans at each of the 2.5 million autosomal CEPH HapMap SNPs. Prior to imputation, individuals were excluded for low call rates or sex mismatches (Table 3). Next, criteria such as high levels of missingness, highly significant departures from Hardy-Weinberg equilibrium, or low minor allele frequencies (MAF) were used to determine which SNPs to include in the imputation step. All the remaining individuals and SNPs entered the imputation process, which provided estimates for all the HapMap SNPs, including any that may have failed the data-cleaning criteria.
The ratio of the observed dosage variance to the expected binomial variance, the dosage-variance ratio, has proved to be a useful metric of imputation quality. To assess the accuracy of imputed genotypes, cohorts compared the imputation output to SNPs that had been previously genotyped on other platforms and that had not been used in the imputation process. In an internal analysis that compared the imputed SNPs to the actually genotyped SNPs in the RS, the mean concordance (number of concordant individuals/total number of individuals) between the imputed and the genotyped SNPs was 0.989 for imputed SNPs with a dosage-variance ratio >= 0.9. For ratios between 0.5 and 0.9, the concordance was 0.937; and for ratios <= 0.5, it was 0.889. Validation efforts produced similar results in other cohorts.
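As an illustration of the dosage-variance ratio described above, the following minimal Python sketch computes it for a single SNP. The function name is hypothetical, and the expected binomial variance is taken to be 2p(1-p) with p estimated as half the mean dosage, which assumes Hardy-Weinberg proportions; the cohorts' imputation software may compute the quantity differently.

```python
from statistics import fmean, pvariance


def dosage_variance_ratio(dosages):
    """Observed dosage variance divided by the expected binomial variance.

    `dosages` holds one imputed allele dosage (between 0 and 2) per individual.
    Assumption of this sketch: the expected variance is 2p(1-p), where p is
    the allele frequency estimated as half the mean dosage.
    """
    p = fmean(dosages) / 2.0
    expected = 2.0 * p * (1.0 - p)
    if expected == 0.0:
        return float("nan")  # monomorphic SNP: the ratio is undefined
    return pvariance(dosages) / expected


# Made-up dosages: confidently imputed genotypes versus dosages shrunk
# toward the mean, as happens when the imputation is uncertain.
confident = [0.0, 1.0, 1.0, 2.0]
uncertain = [0.5, 1.0, 1.0, 1.5]
print(dosage_variance_ratio(confident), dosage_variance_ratio(uncertain))
```

A ratio near 1 indicates that the imputed dosages vary about as much as real genotypes would, whereas a ratio well below 1 signals dosages that have collapsed toward the allele-frequency mean.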
The CHARGE Analysis Committee developed a set of general plans as guidelines for all working groups. The issues include quality control of genotype data, decisions about what results to share across cohorts, formats for sharing data, strand alignments, coding of alleles, choice of covariates for adjustments, detection of and correction for population structure, within-study phenotype analysis plans, between-study meta-analysis methods, and the importance of written analysis plans prior to sharing the results. The goal was to provide a flexible plan that could be adapted or adopted by working groups. For each stage in the analysis, there are several valid options available, and some are summarized briefly.
Special features of the CHARGE consortium are the large overall sample size, the population-based recruitment of cohort members, the standardized methods of data collection, and the prospective follow-up for clinical events. In case-control studies, it is not usually possible to obtain DNA from fatal cases. With the cohort design, DNA is generally available for all events, including the fatal ones; and failure-time models are recommended for associations with incident disease.
For most traits, the additive or the 1 degree-of-freedom regression model is used to assess the association between the phenotype and the number of copies of a specified allele. For many patterns of ‘true’ associations, tests derived from this model have good power compared with other approaches (20). The single regression coefficient is readily interpreted and easily used in meta-analysis. When imputed genotypes are used, the observed allele count is simply replaced by the imputation’s “estimated dosage.” Standard errors for the regression estimates are usually calculated with model-robust (‘sandwich’) methods. Routine adjustment is anticipated for age and sex though specific studies may also adjust for site (CHS, ARIC), for family relationships (FHS), or for cohort (FHS, RS). When necessary, principal components analysis is used to correct for within-study population structure (21). Additionally, the method of genomic control is used to correct both within-study and meta-analyzed GWAS results for possible stratification (22).
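The additive model and the genomic-control correction described above can be sketched in a few lines of Python. This is a simplified illustration, not the consortium's analysis code: the function names are hypothetical, the regression is unadjusted (no age, sex, site, or family terms), the sandwich estimator shown is the basic HC0 form, and flooring the inflation factor at 1 is a common convention assumed here rather than something stated in the text.

```python
from statistics import fmean, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def additive_fit(dosage, phenotype):
    """Slope of phenotype on allele dosage with a model-robust (HC0) SE.

    `dosage` is the observed allele count or the imputed estimated dosage.
    """
    xbar, ybar = fmean(dosage), fmean(phenotype)
    dx = [x - xbar for x in dosage]
    sxx = sum(d * d for d in dx)
    beta = sum(d * (y - ybar) for d, y in zip(dx, phenotype)) / sxx
    resid = [(y - ybar) - beta * d for d, y in zip(dx, phenotype)]
    # Sandwich variance of the slope: sum((x - xbar)^2 * e^2) / Sxx^2
    var_beta = sum((d * e) ** 2 for d, e in zip(dx, resid)) / (sxx * sxx)
    return beta, var_beta ** 0.5


def genomic_control(chi2_stats):
    """Inflation factor lambda and the chi-square statistics divided by it."""
    lam = max(median(chi2_stats) / CHI2_1DF_MEDIAN, 1.0)  # assumed floor at 1
    return lam, [x / lam for x in chi2_stats]


# Made-up data for one SNP: three individuals with 0, 1, and 2 allele copies.
beta, se = additive_fit([0.0, 1.0, 2.0], [0.0, 1.0, 4.0])
print(beta, se)
```

The single slope and its standard error are the per-SNP quantities that each cohort would contribute to a meta-analysis, and the genomic-control step rescales test statistics when their median exceeds the value expected under the null.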
For the additive model, the regression coefficients estimate the difference in phenotype associated with each extra copy of the minor allele. Due to low power and potentially misleading results, meta-analyses are not reported for those SNPs for which the MAF or the effective sample size, across CHARGE, is too small (23). The acceptable lower threshold of MAF depends on the total sample size for continuous traits or on the total number of events across all cohorts for dichotomous traits.
The analysis of 2.5 million SNPs across the genome poses an obvious multiple-testing problem. Before sharing results, working groups select a p-value threshold to identify a set of genotype-phenotype associations, almost all of which can be expected to replicate in similar populations. With 2.5 million tests, the use of a Bonferroni correction to control the family-wise error rate (FWER) at 0.05 yields a threshold p-value of 2E-8. Another way to interpret this threshold is to estimate the expected number of false-positive (EFP) tests: if there are no true associations, each test contributes on average 2E-8 false positives and, across the genome, yields an expected total of 0.05 false-positive results. Similarly, a threshold of 1/2.5 million, which equals 4E-7, gives an expectation of one false-positive result for all tests. Unlike the FWER interpretation, the control of EFP is not “conservative” for correlated tests (24). The CHARGE Analysis Committee recommends pre-specifying a fixed p-value threshold as well as a number of tests, but the decision about the exact threshold to use is left up to the working groups. The Analysis Committee has also provided power calculations for both continuous (Supplemental Figure 1) and binary phenotypes (Supplemental Figure 2).
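The threshold arithmetic in this paragraph can be checked with a short Python sketch. The function and constant names are hypothetical; the numbers are those given in the text.

```python
N_TESTS = 2_500_000  # number of HapMap SNPs tested, as stated in the text


def bonferroni_threshold(fwer, n_tests=N_TESTS):
    """Per-test p-value threshold that controls the family-wise error rate."""
    return fwer / n_tests


def expected_false_positives(p_threshold, n_tests=N_TESTS):
    """Expected number of false positives if no true associations exist."""
    return p_threshold * n_tests


print(bonferroni_threshold(0.05))       # about 2E-8
print(expected_false_positives(2e-8))   # about 0.05
print(expected_false_positives(4e-7))   # about 1
```

The two functions are inverses of each other, which is the point of the paragraph: a Bonferroni threshold for an FWER of 0.05 is the same number as the threshold that yields 0.05 expected false positives.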
When promising results from GWAS meta-analyses arise from SNPs that were imputed in some or all of the cohorts, genotyping the imputed markers in a sample of the existing cohort members serves to validate the imputation process. For the purpose of replication, genotyping high-signal SNPs in independent samples provides additional evidence about the presence or absence of an association. The number of SNPs and the number of independent individuals to be genotyped depend on the available resources and populations. Key follow-up efforts--resequencing high signal areas, fine mapping and functional studies--are likely to require new resources.
The cohort-study methods papers provide detail about many of the phenotypes listed in Table 2. For CHD, investigators knowledgeable about the phenotype in each study decided to focus on fatal and non-fatal myocardial infarction (MI) as the primary outcome because the MI criteria differed in only trivial ways among the studies. There were some minor differences in the definition of the composite outcome of MI, fatal CHD, and sudden death, which became the secondary outcome. Only subjects at risk for an incident event were included in the analysis. MI survivors whose DNA was drawn after the event were not eligible. The primary analysis was restricted to Europeans or European Americans. Patients entered the analysis at the time of the DNA blood draw, and were followed until an event, death, loss to follow up, or the last visit. The main recommendations of the Analysis Committee were adopted, and a threshold of 5E-8 was selected for genome-wide statistical significance. Analyses in progress include about 1700 MIs and 2300 CHD events among about 29,000 eligible patients. Each cohort conducted its own analysis, and results were uploaded to a secure share site for the fixed-effects meta-analysis. Even with this number of events (Supplemental Figure 2), power is good only for relatively high MAFs (> 0.25) and large relative risks (> 1.3).
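To make the fixed-effects step concrete, the following Python sketch combines per-cohort regression estimates for one SNP. The text does not specify the weighting scheme, so inverse-variance weighting is assumed here as the usual fixed-effects choice; the function name and the example estimates are made up for illustration.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis for one SNP.

    `betas` and `ses` are the per-cohort regression coefficients (for
    example, log hazard ratios per allele copy) and their standard errors.
    Returns the pooled estimate, its standard error, and a two-sided p-value.
    """
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p_value


# Hypothetical estimates from five cohorts for a single SNP.
betas = [0.12, 0.08, 0.15, 0.10, 0.09]
ses = [0.05, 0.04, 0.06, 0.03, 0.05]
beta, se, p = fixed_effects_meta(betas, ses)
print(beta, se, p, p < 5e-8)
```

Because only the coefficient and standard error leave each cohort, this kind of pooling requires no sharing of individual-level data.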
In thousands of published papers, the five CHARGE cohort studies and many of the collaborating studies have already characterized the risk factors for and the incidence and prognosis of a variety of aging-related and cardiovascular conditions. The analysis of incident myocardial infarction, for instance, is free from the survival bias typically associated with cross-sectional or case-control studies. The methodologic advantages of the prospective population-based cohort design, the similarity of phenotypes across five studies, the availability of genome-wide genotyping data in each cohort, and the need for large sample sizes to provide reliable estimates of genotype-phenotype associations have served as the primary incentives for the formation of the CHARGE consortium, which includes GWAS data on about 38,000 individuals. The consortium effort relies on collaborative methods that are similar to those used by the individual contributing cohorts.
Phenotype experts who know the studies and the data well are responsible for phenotype-standardization across cohorts. The coordinated prospectively planned meta-analyses of CHARGE provide results that are virtually identical to a cohort-adjusted pooled analysis of individual level data. This approach--the within-study analysis followed by a between-study meta-analysis--avoids the human subjects issues associated with individual-level data sharing.
Editors, reviewers, and readers expect replication as the standard in science (6). The finding of a genetic association in one population with evidence for replication in multiple independent populations provides moderate assurance against false-positive reports and helps to establish the validity of the original finding. In a single experiment, the discovery-replication structure is traditionally embodied in a two-stage design. The CHARGE consortium includes up to five independent replicate samples as well as additional collaborating studies for some phenotype working groups, so that it would have been possible to set up analysis plans within CHARGE to mimic the traditional two-stage design for replication. For instance, the two largest cohorts could have served as the discovery set and the others as the replication set. However, attaining the extremely small p-values expected in GWAS requires large sample sizes. For any phenotype, a prospective meta-analysis of all participating cohorts, with a properly selected level of genome-wide statistical significance to minimize the chance of false positives, is the most powerful approach to finding new genuine associations for genetic loci (25). When findings narrowly miss the pre-specified significance threshold, genotyping individuals in other independent populations provides additional evidence about the association. For findings that substantially exceed pre-established significance thresholds, the results of a CHARGE meta-analysis effectively provide evidence of a multi-study replication.
The effort to assemble and manage the CHARGE consortium has provided some interesting and unanticipated challenges. Participating cohorts often had relationships with outside study groups that pre-dated the formation of CHARGE. Timelines for genotyping and imputation have shifted. Purchases of new computer systems for the volume of work were sometimes necessary. Each cohort came to the consortium with their own traditions for methods of analysis, organization, and authorship policies that, while appropriate for their own work, were not always optimal for collaboration with multiple external groups. Within each cohort, the investigators had often formed working groups that divided up the large number of available phenotypes in ways that made sense locally but did not necessarily match the configuration that had been adopted by other cohorts. The RSC has attempted to create a set of CHARGE working groups that accommodate the needs and the conventions of the various cohorts. Transparency, disclosure, and professional collaborative behavior by all participating investigators have been essential to the process.
Resource limitations are another challenge. Grant applications that funded the original single-study genome-wide genotyping effort typically imagined a much simpler design. The CHS whole-genome study had as its primary aim, for instance, the analysis of data on three endpoints, coronary disease, stroke and heart failure. With a score of active phenotype working groups, the CHARGE collaboration broadened the scope of the short-term work well beyond initial expectations for all the participating cohorts.
One of the premier challenges has been communication among scores of investigators at a dozen sites. CHS and ARIC are themselves multi-site studies. To be successful, the CHARGE collaboration has required effective communication: (1) within each cohort; (2) between cohorts; (3) within the CHARGE working groups; and (4) among the major CHARGE committees. In addition to the traditional methods of conference calls and email, the CHARGE “wiki,” set up by Dr J Bis (Seattle, WA), has provided a crucial and highly functional user-driven website for calendars, minutes, guidelines, working group analysis plans, manuscript proposals, and other documents. In the end, there is no substitute for face-to-face meetings, especially at the beginning of the collaboration, and this complex meta-organization has benefited from several CHARGE-wide meetings.
The major emerging opportunity is the collaboration with other studies and consortia. Many working groups have already incorporated non-member studies into their efforts. Several working groups have coordinated submissions of initial manuscripts with the parallel submission of manuscripts from other studies or consortia. Several working groups have embarked on plans for joint meta-analyses between CHARGE and other consortia. CHARGE has tried to acknowledge and reward the efforts of champions, who assume leadership responsibility for moving these large complex projects forward and who are often hard-working young investigators, the key to the future success of population science.
The CHARGE Consortium represents an innovative model of collaborative research conducted by research teams that know well the strengths, the limitations, and the data from five prospective population-based cohort studies. By leveraging the dense genotyping, deep phenotyping and the diverse expertise, prospective meta-analyses are underway to identify and replicate the major common genetic determinants of risk factors, measures of subclinical disease, and clinical events for cardiovascular disease and aging.
The authors thank Drs Josh Bis, Nicole Glazer, and Ken Rice for comments on earlier drafts. A full list of investigators from the CHARGE cohorts appears at: http://web.chargeconsortium.com.
Funding sources: Age, Gene/Environment Susceptibility--Reykjavik Study has been funded by NIH contract N01-AG-12100, the NIA Intramural Research Program, Hjartavernd (the Icelandic Heart Association), and the Althingi (the Icelandic Parliament).
The Atherosclerosis Risk in Communities Study is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute contracts N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, N01-HC-55022 and R01HL087641; National Human Genome Research Institute contract U01HG004402; and National Institutes of Health contract HHSN268200625226C. The authors thank the staff and participants of the ARIC study for their important contributions. Infrastructure was partly supported by Grant Number UL1RR025005, a component of the National Institutes of Health and NIH Roadmap for Medical Research.
Cardiovascular Health Study: The research reported in this article was supported by contract numbers N01-HC-85079 through N01-HC-85086, N01-HC-35129, N01 HC-15103, N01 HC-55222, N01-HC-75150, N01-HC-45133, grant numbers U01 HL080295 and R01 HL087652 from the National Heart, Lung, and Blood Institute, with additional contribution from the National Institute of Neurological Disorders and Stroke. A full list of principal CHS investigators and institutions can be found at http://www.chs-nhlbi.org/pi.htm.
Framingham Heart Study: From the Framingham Heart Study of the National Heart Lung and Blood Institute of the National Institutes of Health and Boston University School of Medicine. This work was supported by the National Heart, Lung and Blood Institute’s Framingham Heart Study (Contract No. N01-HC-25195) and its contract with Affymetrix, Inc for genotyping services (Contract No. N02-HL-6-4278), and by grants from the National Institute of Neurological Disorders and Stroke (NS17950; PAW) and the National Institute on Aging (AG08122, AG16495; PAW). Analyses reflect intellectual input and resource development from the Framingham Heart Study investigators participating in the SNP Health Association Resource (SHARe) project.
Rotterdam Study: The GWA database of the Rotterdam Study was funded through the Netherlands Organisation of Scientific Research NWO (nr. 175.010.2005.011). The Rotterdam Study is supported by the Erasmus Medical Center and Erasmus University, Rotterdam; the Netherlands Organization for Scientific Research (NWO), the Netherlands Organization for Health Research and Development (ZonMw), the Research Institute for Diseases in the Elderly (RIDE), the Ministry of Education, Culture and Science, the Ministry for Health, Welfare and Sports, the European Commission (DG XII), and the Municipality of Rotterdam.
Disclosures and conflicts: None.
The authors had full access to and take full responsibility for the integrity of the data. All authors have read and agree to the manuscript as written.