Participants reconvened to hear the recommendations from each breakout group and discuss overall recommendations and strategies to move forward. The recommendations below capture the main points and suggestions from the participants of the workshop, but do not necessarily represent a consensus view of all participants.
A variety of approaches are needed to capture data on environmental exposures, individual genomes, and epigenomes, each of which is likely to contribute to the etiology of disease. Targeted as well as broad approaches are needed in studies of gene-environment interplay, depending on the research question, and studies should include multiple genes and environmental exposures whenever possible. Whichever approach is used, the downstream goals should include ways to help inform policy decisions, address health disparities, and improve public health. Targeted (hypothesis-driven) studies that focus, for example, on a specific phenotype, disease, or environmental exposure are best for investigating situations in which there are known disease associations with particular genes and/or environmental factors. In these situations, case-control (including nested case-control) and family study designs may be particularly valuable. These approaches are more cost-effective than discovery-driven designs, but because they focus on particular hypotheses, they can miss important effects that lie outside the study design. Broad, agnostic (discovery-driven rather than hypothesis-driven) approaches may be more appropriate in situations where it is unknown how the environment is modifying genetic effects or vice versa. Although more costly than a targeted approach, genome-wide studies can provide a more thorough characterization of the potential interrelationships of environmental exposures, phenotypes, and phenotype groups. Discovery-based research that incorporates both genomics and environmental exposures should be encouraged in both environmental research and genomics.
A benefit of using existing population studies for the study of gene-environment interplay is the ability to leverage existing investments, as many completed and ongoing studies have sophisticated phenotypic and exposure measures, follow-up information, and stored biological specimens. Existing cohorts and intervention studies can sometimes be supplemented to collect new data, whether genetic or environmental, to enable the study of gene-environment interplay.
Some environmental factors can be assessed at time points that predate the onset of the disease. Examples include baseline data from cohort studies, special exposures documented in databases (e.g. toxins released from industrial sites recorded in emissions inventories), and medical exposures extracted from patient records. One could also conduct secondary analyses using archived data and biospecimens, which are rich resources for studies of gene-environment interplay. Specimen banks of newborns’ bloodspots can serve as a resource in which prenatal exposures can be measured retrospectively. Exposures that accumulate slowly over time, are intermittent, or have a short half-life are particularly challenging to assess, but some biomarkers can provide a record of exposures during previous time periods. The problem of assessing past exposure may be attenuated when the lag between etiologically relevant periods of exposure and onset of disease is short, i.e. when the condition is characterized by acute onset. In addition, measurement tools and resources being developed for gene-environment studies to improve precision or enhance the potential for future data harmonization can be leveraged in ongoing studies. These include measurement technology in the form of more sophisticated biomarkers and environmental sensors being developed and validated in the GEI Exposure Biology Program (GEI EBP; www.gei.nih.gov/exposurebiology/). The NHGRI-funded Consensus Measures for Phenotypes and Exposures (PhenX; https://www.phenx.org/), which provides standardized, validated measurement tools for high-priority phenotypes and exposures for GWAS, and Phenotype Finder IN Data Resources (PFINDR; http://www.nhlbi.nih.gov/resources/pfindr.htm), a tool to support cross-study data discovery among NHLBI genomic studies, are other valuable resources for the scientific community [Stover et al., 2010].
Existing studies or cohorts may also present some disadvantages for gene-environment interaction studies, as most were not originally designed to identify this complexity. Existing studies often lack exposure data from the relevant time of risk, and often do not have variables that span disease domains or provide detailed exposure information relevant to the hypotheses of interest. Informed consents may also not allow for data sharing or new uses of data or specimens. All of these are important in leveraging these resources. In addition, existing cohorts often do not include sufficient or appropriate representation of the diverse racial, ethnic, and age groups and social backgrounds that may be necessary to detect population- or group-specific gene-environment interactions.
New cohorts and study designs are necessary for detecting the multiple genetic and environmental factors that lead to human disease. Desirable characteristics of new cohort studies for the study of gene-environment interaction generally include: large sample size; diverse demographic representation; a broad range of genetic backgrounds and environmental exposures with early and recent exposure data; a broad array of clinical and laboratory measures with regular follow-up over long periods of time; high-quality endpoint ascertainment and documentation; and measurements that are appropriate for the cohort(s) being studied. Furthermore, attention to the selection of participants in case-control studies is needed in order to enhance environmental variation. A limited range of exposures and/or limited numbers of subjects in critical exposure groups are impediments to assessing exposure effects alone, and the problem may be magnified in gene-environment interaction studies. Studies should also have policies and procedures in place for the collection and storage of biological specimens and for giving other researchers open access to materials and data; researchers should develop plans for re-contacting individuals for additional experimental studies or for follow-up clinical care. There is also a great need for studies in underserved populations (e.g. American Indians, African Americans, immigrants, low-income individuals, the elderly and children) that often bear a disproportionate burden of certain diseases and/or exposures.
It is important to measure environmental exposures at appropriate time points, because many genes are only expressed during specific developmental periods, and some exposures may have greater impact during specific developmental stages. Potential sensitive periods for environmental exposure include but are not limited to time of conception, gestation, infancy, and puberty. Due to the sporadic or cumulative nature, and/or “sensitive timing” of environmental exposures over a lifetime, a large, longitudinal population study will be needed in order to identify some gene-environment interactions.
Much larger sample sizes are needed for detection of interactions than for main effects [Thomas, 2010]. Thousands of cases and controls are needed to detect interaction relative risks of about 1.5 for a candidate gene study, or tens of thousands for a GWAS. Power for detecting interactions would be further diminished by measurement error in either exposure or genotype. Such error can also have unpredictable effects on the direction of an interaction, particularly if one or both are differentially misclassified. Many designs can be appropriate for studying a complex disease, but the sample size requirements for gene-environment studies will depend on many factors, including the prevalence and dose of the environmental variable(s) of interest, allele frequencies, effect sizes, outcome(s), effect modifiers, and/or covariates. Certain study designs can reduce the needed sample size by altering those parameters. For instance, by enriching a cohort with subjects having a rare variant, researchers can increase the prevalence of that variant within their study population. Likewise, increasing the prevalence of an exposure or effect modifier of interest through over-sampling will alter the required sample size. Similarly, the use of animal studies may allow increased levels of the exposure so that the exposure effect size increases dramatically, thereby reducing the needed sample size, although this approach has to be tempered by a need for realistic “doses” of environmental exposure.
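The dependence of sample size on prevalence, effect size, and significance threshold described above can be illustrated with a rough calculation. The following is a minimal sketch, not a method taken from the workshop or from Thomas [2010]. It assumes a binary genotype and a binary exposure that are independent in the source population, a rare disease (so that controls reflect the population distribution), a 1:1 case-control ratio, and a Wald test on the log interaction odds ratio. The function names and all parameter values are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def interaction_power(n_cases, n_controls, p_g, p_e, or_g, or_e, or_int, alpha=0.05):
    """Approximate power of a Wald test for a multiplicative G x E interaction.

    Assumes binary G and E, G-E independence in the population, and a rare
    disease, so the control distribution equals the population distribution.
    """
    cells = [(g, e) for g in (0, 1) for e in (0, 1)]
    # Expected proportion of controls in each genotype-by-exposure cell.
    ctrl = {(g, e): (p_g if g else 1 - p_g) * (p_e if e else 1 - p_e)
            for g, e in cells}
    # Odds of disease in each cell relative to the unexposed non-carriers.
    rel = {(g, e): (or_g ** g) * (or_e ** e) * (or_int ** (g * e))
           for g, e in cells}
    total = sum(ctrl[c] * rel[c] for c in cells)
    # Expected proportion of cases in each cell.
    case = {c: ctrl[c] * rel[c] / total for c in cells}
    # Variance of the log interaction OR: sum of reciprocal expected counts.
    var = sum(1 / (n_cases * case[c]) + 1 / (n_controls * ctrl[c]) for c in cells)
    z = NormalDist()
    return z.cdf(abs(log(or_int)) / sqrt(var) - z.inv_cdf(1 - alpha / 2))


def required_cases(target_power=0.8, **kwargs):
    """Smallest number of cases (with equal controls) reaching the target power."""
    n = 1
    while interaction_power(n, n, **kwargs) < target_power:
        n *= 2
    lo, hi = n // 2, n
    while hi - lo > 1:  # binary search; power increases with n
        mid = (lo + hi) // 2
        if interaction_power(mid, mid, **kwargs) >= target_power:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Hypothetical scenario: 30% carrier and exposure prevalence, no main effects.
    scenario = dict(p_g=0.3, p_e=0.3, or_g=1.0, or_e=1.0, or_int=1.5)
    print("candidate gene (alpha = 0.05):", required_cases(alpha=0.05, **scenario))
    print("genome-wide (alpha = 5e-8):", required_cases(alpha=5e-8, **scenario))
```

Varying `p_g` or `p_e` in this sketch shows how the prevalence of a variant or exposure changes the requirement. Measurement error, which the sketch ignores, would be expected to raise the required numbers further.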
Type 1 errors continue to be a concern in the analysis of gene-environment interaction, as in other large-scale genomic analyses. Methods to address type 1 errors include the integration of known pathway information to inform analysis; testing for interaction within an environmental subgroup to identify genetic effects that are only apparent under certain environmental exposures; utilizing case-only designs to increase the power to test for interactions; and family-based designs to avoid bias from population stratification [Thomas, 2010]. Other examples of research designs that allow the investigation of complex diseases without prohibitively large sample sizes include studies of controlled environment(s), intervention studies, and clinical trials. The use of biomarkers as a refinement of outcome or exposure measures, and of data reduction methods, may also decrease the need for large samples. Other exploratory methods such as multiple regression and pattern recognition can also be used for the evaluation of gene-environment interplay.
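As an illustration of the case-only design mentioned above, the sketch below estimates a multiplicative interaction odds ratio from cases alone. It relies on the standard assumptions of that design, namely that genotype and exposure are independent in the source population and that the disease is rare. Under those assumptions, the genotype-exposure odds ratio among cases approximates the interaction odds ratio. The function name and the counts are hypothetical and are not data from any study.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def case_only_interaction(n_g1e1, n_g1e0, n_g0e1, n_g0e0, level=0.95):
    """Case-only estimate of the multiplicative G x E interaction odds ratio.

    Arguments are counts of cases by genotype (g) and exposure (e) status.
    Returns the point estimate and a Wald confidence interval.
    Valid only if G and E are independent in the source population.
    """
    or_int = (n_g1e1 * n_g0e0) / (n_g1e0 * n_g0e1)
    # Standard error of the log odds ratio from a 2 x 2 table.
    se = sqrt(1 / n_g1e1 + 1 / n_g1e0 + 1 / n_g0e1 + 1 / n_g0e0)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return or_int, exp(log(or_int) - z * se), exp(log(or_int) + z * se)


if __name__ == "__main__":
    # Hypothetical counts among 1,000 cases.
    estimate, lower, upper = case_only_interaction(60, 140, 140, 660)
    print(f"interaction OR = {estimate:.2f} ({lower:.2f} to {upper:.2f})")
```

Because no controls contribute sampling variance, the interval is narrower than that of a comparable case-control analysis. The estimate is biased, however, if genotype and exposure are correlated in the population, for example when a genotype influences exposure-related behavior.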
Studies utilizing cell lines and animal models will be important for elucidation of the function of genetic variants identified by GWAS and for mechanistic studies to understand how environmental factors interact with genetic factors to influence health and disease. Data simulation and animal models can be useful in developing theories about the functionality of a given gene variant but may not be applicable to humans. Environmental manipulation need not be limited to animal models, but can also be conducted with humans, such as human chamber studies (controlled environmental exposures) or special epidemiologic studies (e.g. occupationally exposed individuals). Animal models can also be useful for conducting basic mechanistic studies.
Yet there remains a need for the development of more sophisticated analytical methods and approaches for gene-environment interaction studies to explore the multiple levels of data in gene-environment interplay. Continued support is needed for the development of bioinformatic and biostatistical tools and methods. In addition, the field suffers from a shortage of investigators trained in computational biology and statistical methodology who are capable of developing new methods and analytic tools. Interdisciplinary training programs in computational, statistical, and molecular biology are also needed to train the next generation in global approaches to research. A database for standardized approaches, tools, and GWAS and gene-environment study results would also benefit the research community. Additionally, existing resources for environmental measurement tools and databases [e.g. GEI Exposure Biology Program, PhenX, the Pharmacogenomics Knowledge Base (PharmGKB; http://www.pharmgkb.org), the Database of Genotypes and Phenotypes (dbGaP; http://www.ncbi.nlm.nih.gov/gap), etc.] should be leveraged to aid in harmonization of data from multiple studies [Altman, 2007; Mailman et al., 2007; Stover et al., 2010].
Measurement of phenotypes should be precise and accurate, well correlated with the disease, disorder, or trait of interest, easy to perform, low burden, and low cost. However, there is a tradeoff between the need for low-cost, harmonized phenotypic measures that allow for the pooling of data from large studies and the need to discover and use new endophenotypes and biomarkers that more precisely characterize both the exposure and early or subclinical features of the disease. Across studies, there is a need to use more intensive measures on high-risk subsets of the study population (e.g. siblings of autistic children). As much information as possible should be captured on these subsets, including data on physical and social environments and molecular markers. However, in addition to more intensive and targeted measures, a large number of phenotypic measures that are cheaper and easier to collect could be obtained and harmonized in many studies, thereby allowing for pooling of phenotype results among studies. To further assist phenotype harmonization across studies, phenotypes should be collected, whenever possible, using standardized methods and approaches, e.g. those outlined in PhenX [Stover et al., 2010].
Those conducting phenotype-specific studies should be cautious in assuming that phenotype measurements are the same across studies. More investment is needed in defining and refining the phenotype so that data can be analyzed, harmonized, and interpreted successfully. Previous family studies and linkage studies may be useful for improving phenotyping and determining heritable components, but such work must be done carefully, as meta-analyses can provide different results than those of individual studies due to differences in sample ascertainment and phenotype heterogeneity [Manolio et al., 2009].
It is important to focus on complex phenotypes as well as to search for associations with the underlying mechanisms or endophenotypes, as each approach yields different information. Although the search for intermediate phenotypes or endophenotypes is desirable, measurement of endophenotypes can be as complicated as that of the disease outcome. On the other hand, the search for intermediate phenotypes can lead to the discovery of new molecular phenotypes. Challenge or intervention studies are one approach, when possible, to confirm endophenotypes and gene-environment interactions. This is particularly important when endophenotypes include behaviors that are sensitive to gene-environment correlation. It is important to note that measurement of biomarkers may be complex and needs to account for co-morbidities and the presence of multiple exposures. Also, the importance of the cell type affected in the disease process should not be underestimated. Longitudinal information can help identify time- and age-dependent phenotype and biomarker measures and their genetic underpinnings.