Genome-wide association study (GWAS) consortia and collaborations formed to detect genetic loci for common phenotypes or investigate gene-environment (G*E) interactions are increasingly common. While these consortia effectively increase sample size, phenotype heterogeneity across studies represents a major obstacle that limits successful identification of these associations. Investigators are faced with the challenge of how to harmonize previously collected phenotype data obtained using different data collection instruments which cover topics in varying degrees of detail and over diverse time frames. This process has not been described in detail. We describe here some of the strategies and pitfalls associated with combining phenotype data from varying studies. Using the Gene Environment Association Studies (GENEVA) multi-site GWAS consortium as an example, this paper provides an illustration to guide GWAS consortia through the process of phenotype harmonization and describes key issues that arise when sharing data across disparate studies. GENEVA is unusual in the diversity of disease endpoints and so the issues it faces as its participating studies share data will be informative for many collaborations. Phenotype harmonization requires identifying common phenotypes, determining the feasibility of cross-study analysis for each, preparing common definitions, and applying appropriate algorithms. Other issues to be considered include genotyping timeframes, coordination of parallel efforts by other collaborative groups, analytic approaches, and imputation of genotype data. GENEVA's harmonization efforts and policy of promoting data sharing and collaboration, not only within GENEVA but also with outside collaborations, can provide important guidance to ongoing and new consortia.
The vast majority of genome-wide association studies (GWAS) have focused on the main effects of gene variants at specific loci on disease outcomes or traits, although most associations identified so far account for only a small portion of the phenotype variations seen [McCarthy et al., 2008; Hindorff et al., 2009]. To extend these findings GWAS consortia and collaborations have formed in which investigators share data to achieve adequate power to identify genetic loci associated with secondary phenotypes and increase the power to study less common outcomes [Psaty et al., 2009; Manolio et al., 2007]. There have been several single center studies of gene-environment (G*E) interactions in complex disease [Kang et al., 2010; Cornelis et al., 2009] but the development of consortia also offers the opportunity to enhance power to discover and verify G*E interactions. Investigators planning new studies and looking to establish consortia or collaborations can share information on how phenotypes are being measured, what questions are being asked, and how responses will be coded so that similar data can be combined with relative ease. Tools such as the PhenX Toolkit of standardized, high priority measures (https://www.phenxtoolkit.org) are increasingly available to investigators planning new studies, though most current collaborations involve existing studies whose phenotypes and data collection instruments are already defined.
While considerable effort has gone into reducing genotype measurement error and ensuring genotype accuracy and consistency of results [de Bakker et al., 2008], phenotype heterogeneity, in both outcomes and covariates across studies, represents a major challenge to successful GWAS analysis of common traits [Zeggini and Ioannidis, 2009; Seminara et al., 2007] or genome-wide assessment of G*E interactions in complex diseases. The recent discovery of markers that are specifically associated with estrogen-receptor negative breast cancer highlights the potential importance of specific, harmonized phenotypes: earlier GWAS of a general breast cancer phenotype did not identify these markers, due to lack of power [Kraft and Haiman, 2010]. In contrast, cross-study analysis groups, such as those established by the Psychiatric Genetics Consortium [Psychiatric GWAS Consortium Coordinating Committee et al., 2009], have begun to analyze GWAS results related to differing psychiatric diagnoses that are known to have shared genetic underpinnings (e.g. Schizophrenia and Bipolar Disorder, Bipolar and Major Depressive Disorder) [International Schizophrenia Consortium et al., 2009; Liu et al., 2011]. The existing literature suggests that phenotype harmonization may reveal novel loci for disease subtypes as well as shared variants among broad categories of disease. These examples illustrate how phenotype harmonization is a crucial first step and a key component of successful multi-study collaboration, where investigators are faced with the challenge of harmonizing previously collected phenotype data derived from diverse data collection instruments over different timeframes and in varying degrees of detail.
Phenotype harmonization in GWAS consortia and collaborations involves a multi-stage process of 1) identifying commonalities and differences in phenotype data to assess the feasibility and potential benefit of combining data; 2) developing common data definitions for phenotypes of interest; and 3) creating and applying study-specific algorithms to convert data into a common format. The goal is to maximize the comparability and compatibility of data across datasets and to minimize inconsistencies and misclassification [Seminara et al., 2007]. The process must balance the need to augment sample size, which increases power for gene discovery, against the likelihood of increased data heterogeneity, which decreases power to detect real effects.
GENEVA (Gene Environment Association Studies) is a multi-site collaborative program initiated in 2006 as part of the National Institutes of Health (NIH) Genes, Environment and Health Initiative (GEI) that aims to accelerate the understanding of genetic and environmental contributions to health and disease [Cornelis et al., 2010]. The GENEVA consortium, led by the National Human Genome Research Institute (NHGRI) and working closely with representatives from several institutes at the NIH, includes a Coordinating Center (CC), two genotyping centers, and fourteen independently-designed GWAS whose primary outcomes of interest include addiction, blood pressure, cardiovascular disease, chronic obstructive pulmonary disease, dental caries, type 2 diabetes, lung cancer, maternal metabolism and birth weight interactions, oral clefts, premature birth, primary open-angle glaucoma, prostate cancer, stroke, and venous thrombosis. Two additional GWAS examining coagulation and melanoma joined the consortium in 2009. Each study has collected relevant environmental exposure data. The participating studies vary widely in design: some of the longitudinal studies have been ongoing for years or even decades, others have international data collection sites, and some are family-based studies with phenotype and genotype data available on multiple family members. Using GENEVA as a practical example, this paper provides an illustration to guide GWAS consortia through the process of phenotype harmonization and describes key phenotype harmonization issues that arise when sharing data across disparate studies. The principles outlined here can also be applied to other data sharing contexts.
Phenotype harmonization must be recognized early on as a key requirement for cross-study collaboration. Key personnel should be identified and, depending on the complexity, size, number, and primary endpoints of the participating studies, a group or committee should be established or identified that takes responsibility for directing and overseeing phenotype harmonization activities. In GENEVA, three groups are involved in phenotype harmonization: a Phenotype Harmonization Subcommittee (PHS); the phenotype-specific Working Groups (WGs); and the Coordinating Center (CC) (Figure 1).
Given the diverse nature of studies supported by GENEVA and their lack of familiarity with each other's phenotypes, the GENEVA Steering Committee agreed that phenotype harmonization was a core task warranting investigator participation and leadership, so a PHS was created. The PHS includes individual study, NIH and CC representatives who are interested in harmonization of specific phenotypes and exposures or in increasing sample size for improving power of analyses in their areas of investigation. The Subcommittee provides a forum for discussion of common problems, sets policies, identifies phenotypes common across studies, encourages data sharing, establishes and oversees phenotype-specific WGs, and provides advice, direction and feedback to the CC regarding phenotype harmonization and related issues.
Phenotype WGs consist of representatives from each study contributing data to that phenotype's cross-study analyses. These representatives generally have expertise in the subject area and are aware of the intricacies involved in categorizing or characterizing the phenotype. The WGs also have representatives from the NIH and the CC. Each WG is led by an investigator with a specific interest in the phenotype who manages and coordinates the group's activities. The WGs evaluate the feasibility and logistics for cross-study analyses related to specific phenotypes of interest. They identify and define variables to be shared, identify covariates, recommend the most appropriate analytic methods, draft analysis plans using a template developed by the PHS, and identify the investigators performing the analyses, usually members of the WG.
The CC is responsible for phenotype data organization and coordinating activities related to data collection and management and facilitating cross-study analyses. The CC assists the PHS with identifying potential areas of common interest, harmonizes covariates, and establishes and manages a centralized relational database which serves as a common phenotype/genotype repository providing working groups with cross-study data upon request. The CC assists working groups with phenotype harmonization and provides them with statistical summaries of phenotype data and a central file-sharing site. The CC may also assist working group investigators with data analysis, if requested.
Phenotype harmonization involves a number of integrated activities (Table 1). Note that the elements below may be addressed concurrently and not necessarily in the order given.
Phenotype harmonization starts with an inventory of phenotypes that investigators are interested in pursuing during cross-study analyses. For consortia involving only a few studies, or for consortia in which all studies have a common interest (e.g. stroke), this may only require a review of data collected by different studies and discussion among investigators. For collaborations such as GENEVA that incorporate studies of various designs and diverse outcomes of interest, a more extensive (formal survey) process is required.
In GENEVA, the CC reviewed the initial phenotype submission plans and data collection forms for each study. The CC identified overlapping phenotype areas for which data had been collected, and created a web-based survey (see Supplementary Material, Appendix A) in which study investigators were asked to indicate, for 13 broad phenotype categories, if (a) their study had collected specific data, and if so, (b) the level of sharing, i.e. would they share it solely with other GENEVA investigators or would they share their data with authorized researchers through the controlled-access Database of Genotypes and Phenotypes (dbGaP) [Mailman et al., 2007]. Investigators were invited to list other phenotypes for which data were available and which could be shared in cross-study analyses. The CC tabulated responses and provided the information back to the PHS for review and discussion. If there was interest in a phenotype across three or more studies, the Subcommittee solicited a volunteer to lead a WG for that phenotype and solicited nominations for WG members from those studies interested in participating in cross-study analyses of those phenotypes. If only two studies were able to share data on a specific phenotype, they could still collaborate and perform cross-study analyses, but a formal WG might not be required.
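The tabulation step described above can be sketched in a few lines of Python. This is only an illustration of the three-or-more-studies rule: the study names, phenotype categories, and the function names `tabulate_interest` and `recommend` are hypothetical and are not taken from the GENEVA survey itself.

```python
from collections import defaultdict

# Hypothetical survey responses: study -> phenotype categories it can share.
survey = {
    "study_a": {"smoking", "alcohol use", "anthropometry"},
    "study_b": {"smoking", "anthropometry"},
    "study_c": {"smoking", "alcohol use", "anthropometry"},
    "study_d": {"oral health"},
}

def tabulate_interest(responses):
    """Map each phenotype to the sorted list of studies able to share it."""
    by_phenotype = defaultdict(list)
    for study, phenotypes in responses.items():
        for phenotype in phenotypes:
            by_phenotype[phenotype].append(study)
    return {p: sorted(s) for p, s in by_phenotype.items()}

def recommend(n_studies):
    """Apply the rule from the text: a formal WG needs three or more studies."""
    if n_studies >= 3:
        return "form working group"
    if n_studies == 2:
        return "informal collaboration"
    return "no cross-study analysis"

interest = tabulate_interest(survey)
plan = {p: recommend(len(studies)) for p, studies in interest.items()}
```

In this toy input, smoking and anthropometry reach the threshold for a formal WG, alcohol use is shared by only two studies, and oral health by one.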
The most important task of the phenotype-specific WGs is to determine the feasibility and logistics for cross-study analyses related to a given phenotype. The WGs' reviews need to take into account each study's data scope, consent limitations, and study design, including the actual questions asked, data collection protocols, phenotype definitions, possible values or responses, estimated number of individuals for whom there would be phenotype data, and any other factors that might influence analyses. The primary considerations fall into seven main categories:
Once the review of the data definitions and values described above indicates that data from different studies are comparable, common definitions and values need to be agreed upon. WGs may combine categories into larger units (e.g. drinks per day combined with drinking days per week to get drinks per week), stipulate inclusion and exclusion criteria (e.g. excluding those who have never had a drink), create a dichotomous variable (e.g. ever or never drinker, or use longitudinal and cross-sectional data to define whether the respondent is a current smoker or has quit smoking), or assign a standard measure to be used (e.g. BMI calculated from a variety of self-reported or laboratory assessments of height and weight). In general, the more tightly defined the phenotype, the greater the likelihood that one or more studies may be unable to contribute to the analyses; the looser the definition, the greater the likelihood that more subjects and more studies can be included. An example of how data from various studies might be combined is shown in Tables 2a and 2b.
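As a minimal sketch of such study-specific conversions, the Python below maps two differently coded records onto one common definition (drinks per week and BMI). The two studies, their field names, and the function names are assumptions made for illustration; they do not reproduce any GENEVA algorithm or the contents of Tables 2a and 2b.

```python
LB_TO_KG = 0.45359237  # exact definition of the pound in kilograms
IN_TO_M = 0.0254       # exact definition of the inch in metres

def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2, the agreed standard measure."""
    return weight_kg / height_m ** 2

def harmonize_study_a(record):
    """Hypothetical study A: drinks/day and drinking days/week, metric units."""
    return {
        "drinks_per_week": record["drinks_per_day"] * record["drinking_days_per_week"],
        "bmi": bmi(record["weight_kg"], record["height_cm"] / 100),
    }

def harmonize_study_b(record):
    """Hypothetical study B: drinks/week asked directly, imperial units."""
    return {
        "drinks_per_week": record["drinks_per_week"],
        "bmi": bmi(record["weight_lb"] * LB_TO_KG, record["height_in"] * IN_TO_M),
    }

# One algorithm per study, all producing the same common format.
HARMONIZERS = {"A": harmonize_study_a, "B": harmonize_study_b}

def harmonize(study, record):
    return HARMONIZERS[study](record)

row_a = harmonize("A", {"drinks_per_day": 2, "drinking_days_per_week": 3,
                        "weight_kg": 70.0, "height_cm": 175.0})
row_b = harmonize("B", {"drinks_per_week": 6,
                        "weight_lb": 154.0, "height_in": 70.0})
```

Keeping one function per study makes each recoding decision explicit and reviewable, while the shared output keys enforce the common definition.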
For continuously distributed outcomes, GENEVA WGs, like other large-scale collaborative meta-analyses [Lindgren et al., 2009; Thorgeirsson et al., 2010], have applied the same transformation to all datasets for a single outcome. These transformations are discussed in a phenotype-specific manner for each meta-analysis and vary across phenotypes. Logical and extreme outliers are deleted or recoded before transformations are applied to avoid non-normal error distributions. Together, the removal or recoding of outliers, the application of a common transformation, and a strategy for covariate selection (see below) ensure that the estimates from the contributing studies are comparable and interpretable.
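One way to picture this sequence is the short Python sketch below: logically impossible values are deleted, extreme values are recoded to agreed bounds, and a single transformation is then applied to every study's data. The natural-log transformation, the bounds, and the function name `clean_and_transform` are illustrative assumptions; the text notes that the actual choices vary by phenotype.

```python
import math

def clean_and_transform(values, lower, upper, transform=math.log):
    """Delete logical outliers, recode extremes, then transform.

    `lower` and `upper` are hypothetical bounds agreed on by a WG.
    """
    cleaned = []
    for v in values:
        if v is None or v <= 0:
            # Logical outlier (missing or impossible value): delete.
            continue
        # Extreme outlier: recode to the nearest agreed bound.
        v = min(max(v, lower), upper)
        cleaned.append(transform(v))
    return cleaned

# The same call, with the same bounds, is applied to each study's data.
raw_bmi = [22.0, -1.0, 250.0, 30.0]
log_bmi = clean_and_transform(raw_bmi, lower=12.0, upper=70.0)
```

Applying an identical function and identical bounds to each dataset is what keeps the resulting effect estimates on a common scale.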
Phenotype WGs determine phenotype definitions, identify covariates, and set inclusion/exclusion criteria. But they must also consider the advantages and disadvantages of different analytic approaches, i.e. whether to perform meta-analyses of summary data provided by each study, analyses of pooled individual-level data, or both. If investigators choose to perform meta-analyses of summary data, the WG will need to agree on the subgroup analyses, data format, and analysis method used by each study to produce its summary statistics. Investigators need to address the following points:
The issues surrounding phenotype harmonization also apply to the selection and harmonization of covariates. For GENEVA, WGs have identified a standard set of covariates (e.g. sex and age) applicable to all studies. Other covariates relevant to the phenotype and that can be easily harmonized across contributing studies (e.g. smoking as a covariate for caffeine consumption) are also included. Where covariates are relevant to the characteristics of one study but not to others (e.g. menopause in a sample of older women) adjustment may be inappropriate and stratified analysis or exclusion may be a more appropriate strategy. Studies that have divergent assessment protocols for cases and control subjects have elected, in some instances, to analyze cases and controls separately. This strategy of uniform covariate selection is similar to that used by several other consortia in their large-scale meta-analyses [Lindgren et al., 2009; Thorgeirsson et al., 2010] and has led to comparability of the resulting parameter estimates from individual analyses. However, it can also result in some study-specific confounds. In GENEVA, it is the responsibility of the individual study investigators to assess the feasibility and acceptability of covariate adjustments.
Creation of a centralized database and repository for all phenotype and genotype data provides investigators with the opportunity to easily access the data as they identify new areas of interest for cross-study analyses. In GENEVA, once each study's genotype and phenotype data have gone through a standardized quality control cleaning process, the CC adds each study's phenotype data, data dictionary, individual-level consent and public use or GENEVA-only use status to a centralized relational database. As the WGs define common phenotypes and covariates for cross-study analyses, the CC applies study-specific algorithms to each study's phenotype data to create a dataset and data dictionary for the harmonized variables and covariates, and these are added to the centralized relational database.
While investigators work through the process of phenotype harmonization, they should simultaneously develop a preliminary analysis plan as issues will likely arise that affect harmonization decisions. In GENEVA, each WG drafts a plan for analysis using guidelines created by the PHS (Table 1), defining the variables or outcomes of interest and their type (e.g. whether they are discrete, ordinal or continuous variables), identifying covariates needed for the analyses, stating inclusion and exclusion criteria (e.g. race or ethnicity, value limits), describing the planned subgroup analyses and proposed analytic approach, and specifying individuals' roles in analysis. These plans are refined as preliminary analyses of the selected phenotypes are conducted by study investigators, and the current plans are posted on the consortium's website.
For WGs that have decided to conduct meta-analyses of summary data provided by each study, phenotypes are defined and variables recoded (if necessary) by all studies so that genome-wide analyses can be done on comparable measures. Analyses of individual study data are performed by each study's investigators as soon as that study's phenotype and genotype data have completed the cleaning process. The summary statistics are then provided to the WG member leading the meta-analysis. Where the WG has decided on an analysis of pooled individual-level data, the CC applies the appropriate algorithms to each dataset to harmonize each study's data according to the WG's definition, pools the data into one combined dataset, and provides this to the investigators doing the analysis.
In general, a majority of meta-analyses of gene effects conducted by GENEVA groups have adopted a fixed-effects model, with selective testing for random-effects in subsequent follow-up analyses. Kraft, Zeggini & Ioannidis [Kraft et al., 2009] provide evidence for reduced power and deflation of the heterogeneity parameter in standard meta-analyses – hence, heterogeneity testing is restricted to the most promising signals.
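For readers unfamiliar with the calculation, a fixed-effects meta-analysis of per-study effect estimates can be sketched as follows. The sketch assumes inverse-variance weighting and Cochran's Q as the heterogeneity statistic, which are common choices but are not specified in the text; the input values are invented.

```python
import math

def fixed_effects(betas, ses):
    """Inverse-variance fixed-effects pooling of per-study estimates.

    Returns the pooled effect, its standard error, and Cochran's Q.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Q measures how far the study estimates scatter around the pooled value.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return pooled, pooled_se, q

# Invented per-study SNP effects (betas) and standard errors.
pooled, pooled_se, q = fixed_effects([0.2, 0.4], [0.1, 0.1])
```

Under the null hypothesis of homogeneity, Q is compared with a chi-square distribution with (number of studies minus one) degrees of freedom.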
Genotyping of the magnitude required for a consortium such as GENEVA, which includes data on over 80,000 individuals from 16 disparate studies, does not take place on all studies simultaneously. One study's genotyping results arrive for cleaning and quality control checks while other studies' samples are being prepared for or are undergoing genotyping. Therefore genotyping timelines become a critical consideration when planning cross-study analyses. WGs must consider that GWAS results are emerging in the context of sharp competition among various research groups for publication, and there should be an ongoing process to decide when sufficient numbers are available to conduct a study that represents a meaningful advance. WGs may decide to proceed with data analyses when only a few studies are complete and if statistical thresholds are achieved because discoveries relating to the phenotype of interest are emerging so rapidly. The data from studies whose genotype results will be completed later can be used for replication analyses. Alternate strategies might be to wait until more studies' genotype results are available, refine the phenotype definitions so the group can conduct more specific, stratified analyses of key phenotypes, or collaborate with outside consortia doing similar investigations.
There are a large number of consortia undertaking GWAS analyses (Table 3) and the phenotypes under investigation by various consortia often overlap. Negotiations with other consortia WGs can lead to collaborations where GWAS investigators can contribute to data analyses being led by other consortia. This benefits both groups—investigators are able to contribute to primary analyses of an important phenotype and the collaboration's primary analyses now include data on additional individuals. Studies that are still being genotyped can contribute to later analyses. Alternatively, investigators may decide to serve as a replication study or look at other related or intermediate phenotypes or associations and G*E analyses beyond the initial ones being conducted. There are, of course, implications of overlapping data across independent meta-analyses and in some instances, such overlap is unavoidable. In those instances, appropriate corrections, such as those described by Lin & Sullivan [Lin DY and Sullivan PF, 2009], might be considered.
Inevitably an increase in sample size achieved by pooling samples across studies introduces phenotype heterogeneity that may alter the power to discover genes or G*E interactions for a complex trait of interest. During gene discovery, the balance between power achieved by larger sample size versus loss of power produced by introduced phenotype heterogeneity during phenotype harmonization may favor the former. For example, little is known about the genetic architecture of primary open-angle glaucoma (POAG) although the sib relative risk is 10 [Wolfs et al., 1998]. Icelandic investigators reported two SNPs between CAV1 and CAV2 associated with POAG with an OR of ~1.3 and with a p-value of 5E-10 [Thorleifsson et al., 2010]. Several replication datasets were provided and phenotype definitions varied widely. Not all replication sets achieved statistical significance and several had wide confidence intervals. When the replication sets were combined they did achieve significance with an effect size and confidence comparable to that reported in the discovery set. This suggests that the association between CAV1 and CAV2 gene variants and POAG represents a true positive and that pooling samples during the phenotype harmonization process can provide power even in the face of phenotype heterogeneity during gene discovery, although this might not be generalizable to all complex traits.
On the other hand, when trying to identify G*E interactions for a phenotype with known genetic architecture, the balance between power gained by adding more samples versus power degradation produced by phenotype heterogeneity may favor the latter. The tradeoff between sample size and phenotypic heterogeneity of exposure data is modeled in Figure 2 [Lindstrom et al., 2009]. This hypothetical example considers a rare disease (prevalence 1 in 1,000), no main effect for the binary genetic factor (with 20% prevalence), an odds ratio of 1.5 for the exposure, an interaction odds ratio of 1.35, and a Type I error rate of 5E-8. This illustrates the power of a case-control study to detect a G*E interaction (departure from a multiplicative odds model) when the binary exposure is measured perfectly or via a good proxy with 77% specificity and 99% sensitivity (roughly analogous to self-reported versus measured overweight status). Figure 2 illustrates that even modest misclassification can greatly decrease the power of tests for G*E interaction (and the relative decrease is greater for rare exposures). On the other hand Figure 2 also illustrates that a large study using the proxy can have greater power than a smaller study using the perfect measure, although the power gain is modest at best. Nevertheless, the modest power gain may be important when the perfect measure is prohibitively expensive or only available on a small fraction of samples, while the good measure is relatively inexpensive or already available on many samples.
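The reason misclassification is more damaging for rare exposures can be seen with a small calculation. The Python sketch below does not reproduce the power analysis in Figure 2; it only computes, for the sensitivity and specificity quoted above, what fraction of individuals classified as exposed by the proxy are truly exposed. The true exposure prevalences (30% and 5%) and the function names are assumptions chosen for illustration.

```python
def observed_prevalence(true_prev, sensitivity, specificity):
    """Share of subjects the proxy classifies as exposed."""
    true_pos = sensitivity * true_prev
    false_pos = (1.0 - specificity) * (1.0 - true_prev)
    return true_pos + false_pos

def positive_predictive_value(true_prev, sensitivity, specificity):
    """Share of proxy-exposed subjects who are truly exposed."""
    true_pos = sensitivity * true_prev
    return true_pos / observed_prevalence(true_prev, sensitivity, specificity)

SENS, SPEC = 0.99, 0.77  # proxy characteristics quoted in the text

# Hypothetical true exposure prevalences: common (30%) and rare (5%).
ppv_common = positive_predictive_value(0.30, SENS, SPEC)
ppv_rare = positive_predictive_value(0.05, SENS, SPEC)
```

With these inputs, roughly 65% of proxy-exposed subjects are truly exposed when the exposure is common, but fewer than 20% when it is rare, so the "exposed" group is far more diluted in the rare-exposure setting.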
In many consortia or collaborations, different studies' samples may be genotyped on different platforms or may be genotyped using different versions of the same technology. This is particularly true when large studies are conducted over time because the technology is continuously evolving. Imputation tries to address these differences [Li et al., 2009], but this, too, can create more uncertainties in data comparability when investigators use different software packages to impute data. In GENEVA, genotyping is being performed on both Affymetrix and Illumina platforms with varying degrees of single nucleotide polymorphism (SNP) coverage. In addition, the two platforms detect different sets of SNPs. Plans to conduct imputation vary widely across studies, and there has been considerable discussion regarding the comparability of imputation results performed at different sites using different HapMap reference panels [http://hapmap.ncbi.nlm.nih.gov] and software packages (MACH [http://www.sph.umich.edu/csg/abecasis/MACH/index.html; Li et al., 2009; Li et al., 2006], Impute [https://mathgen.stats.ox.ac.uk/impute/impute.html; Marchini et al., 2007; Howie et al., 2009] and BEAGLE [https://faculty.washington.edu/browning/beagle/beagle.html; Browning and Browning, 2009] being the most common). Most WGs are utilizing individual investigators' imputation results and reviewing preliminary analyses of individual study data to ensure that analyses yield similar findings. Another solution for consortia, and one adopted by GENEVA, is to have all genotype data imputed centrally using a standard methodology.
As previously mentioned, pivotal issues WGs face when combining or harmonizing data from different studies are determining the analytic plan and the precise modeling strategy, adjusting for confounders, and performing stratified analysis. WGs continually consider, for instance: (a) whether waves of longitudinal data should be combined for comparability to cross-sectional studies; (b) whether analyses should be adjusted for covariates, such as ethnicity, or conducted separately in each group; (c) whether analyses should remove, truncate, or adjust for outliers (e.g. by normalizing distributions) and how these outliers are defined; (d) which statistical models accommodate the nuances of each dataset; and (e) whether studies should focus on main effects of individual SNPs, candidate genes, on gene systems or on gene-by-gene (G*G) and G*E analyses. Overarching these phenotype issues are comparable concerns regarding uniform quality control metrics for genotype data and comparable imputation statistics which add inherent complexity to the process. A pre-determined analytic strategy for the main gene discovery stage as well as approaches to G*E and G*G associations requires special planning as there is no standard approach though all agree that very large numbers are required to support the enhanced power requirements of G*E investigations [Garcia-Closas and Lubin, 1999; Kraft and Hunter, 2005].
Many investigators are now looking at using controls from one study as cases in another study as a way of expanding the size of an investigation without the cost of genotyping additional individuals. However, this is only possible when the subjects come from studies whose primary interests are not associated with each other. The approach is complex because it requires well-characterized phenotypes for selection of cases and controls, yet many of the phenotypes are secondary measures collected by the contributing studies and thus may not be well-characterized. Such phenotypes also may represent only a snapshot of the individual's phenotype at one point in time and may not be correlated with the development of future disease or conditions. GENEVA currently has a WG looking specifically at this issue, and its investigation is still in progress.
The infrastructure and processes described here have enabled GENEVA investigators to address common problems, facilitated the generation of new ideas for research and cross-study analyses, and provided GENEVA investigators with a means of turning these questions into active areas of study. GENEVA initially convened seven phenotype WGs. These included anthropometry, alcohol use, smoking, caffeine use, female reproductive history, psychiatric history, and oral health. As groups met and discussed the details of how and what data were collected, a few collaborations were determined to be not feasible (e.g. psychiatric history was not adequately addressed across most of the studies) and one group found that the distribution of data between two studies differed to such an extent that cross-study analyses were inappropriate. However, as investigators have become familiar with the various studies and with each other, and as new studies have joined the consortium, several new areas of mutual interest (e.g. sleep, protective effects, and physical activity) were identified and prompted the formation of new WGs. Each WG has identified one or more key phenotypes for which three or more studies are contributing data (Table 4). Thus far, eleven studies have genotype and phenotype data that have undergone consortium-level quality control and assurance. Six studies have so far contributed data to cross-study analyses, and an additional six have plans to contribute data to cross-study analyses.
Large consortia have emerged to correlate phenotypes with specific variants at certain genetic loci but the strategies and pitfalls associated with combining phenotype data from varying studies have not previously been described in detail. A key recent critique of GWAS reveals a principal concern with effect sizes [Goldstein, 2009] suggesting that meta-analyses of complex traits may be the avenue to successful gene discovery. Consortia-based studies that incorporate the approaches described here will be essential to achieving the power to support investigations of effect sizes demonstrated to be typical [Garcia-Closas and Lubin, 1999].
GENEVA is unusual in the diversity of its studies, yet the issues it has had to face in sharing data are common to all collaborations. GENEVA's harmonization efforts have important applications to current consortia that struggle with study heterogeneity but that seek to maximize the use and value of their data or extend their current data to explore the genetic architecture of novel but relevant traits.
Furthermore, GENEVA's policy to promote data sharing and collaboration, not only within GENEVA but also with other consortia and collaborations outside of GENEVA, has presented unique challenges in harmonization. Our experience demonstrates that systematic planning by a knowledgeable Coordinating Center, a team of collaborative investigators, and engaged program staff from the funding agency, are critical for maximizing the scientific return from large-scale GWAS. Collaborations such as GENEVA can stimulate development of novel methodologies for phenotype harmonization which, hopefully, will actively translate into steps towards identification of genetic loci important for traits related to health and disease.
Funding support for the GENEVA genome‐wide association studies was provided through the NIH Genes, Environment and Health Initiative (GEI). Some studies also received support from individual NIH Institutes. Barnes (U01HG004738); Beaty (NIDCR: U01DE018993, NIH contract: HHSN268200782096C); Bierut (U01HG004422, NIAAA: U10AA008401, NIDA: P01CA089392, R01DA013423, NIH contract: HHSN268200782096C); Boerwinkle (U01HG004402, NHLBI: N01HC55015, N01HC55016, N01HC55018, N01HC55019, N01HC-55020, N01HC-55021, N01HC55022, R01HL087641, R01HL59367, R01HL086694, NIH contract: HHSN268200625226C, NIH Roadmap for Medical Research: UL1RR025005); Caporaso (Z01CP010200); Fornage (U01HG004729); Hu (U01HG004399); Haiman (U01HG004726, NCI: CA63464, CA54281, CA136792); Heit (U01HG004735); Lowe (U01HG004415); Marazita (NIDCR:U01DE018903, NIH contract: HHSN268200782096C); Mitchell (U01HG004436, NINDS and Office for Research in Women's Health: R01NS45012); Murray (U01HG004423); Pasquale [U01HG004728, NEI: R01EY015473, NEI: R01EY015872 (J.L.Wiggs)]. Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by the GENEVA Coordinating Center (U01HG004446). Assistance with data cleaning was provided by the National Center for Biotechnology Information. Genotyping was performed at the Broad Institute of MIT and Harvard, with funding support from the NIH GEI (U01HG04424), and Johns Hopkins University Center for Inherited Disease Research, with support from the NIH GEI (U01HG004438) and the NIH contract “High throughput genotyping for studying the genetic contributions to human disease” (HHSN268200782096C). Funding support was also provided, in part, by the Intramural Research Program of the NIH, National Library of Medicine. M.C. Cornelis is a recipient of a Canadian Institutes of Health Research Fellowship. A. Agrawal is supported by grants from the NIDA. K. C. Barnes is supported in part by the Mary Beryl Patch Turnbull Scholar Program. 
L. R. Pasquale is supported by a Research to Prevent Blindness Physician Scientist Award.
The GENEVA consortium thanks the staff and participants of all GENEVA studies for their important contributions.