The contributions from Group 10 introduced a variety of new methods for the detection of G×E interactions in both case-control and family-based data using both cross-sectional and longitudinal study designs. These contributions also illustrated a number of challenges that arise when considering G×E interactions in a GWAS.
One of the most difficult challenges of GWAS is how to deal with the large p
, small n
problem that arises when the number of variables considered (p
) is much larger than the number of subjects (n
). The problem becomes even more pronounced when one seeks to test interactions between SNPs and one or more environmental exposures in addition to determining the main effect of each SNP. One approach commonly used to reduce the number of tests performed is to select a subset of SNPs to be tested for interactions with known or hypothesized environmental predictors of the phenotype. This can be done by first conducting a single-SNP analysis for each SNP in the genome-wide data, in which one SNP at a time, along with relevant covariates, is tested for association with the phenotype, and then only the most significant SNPs are followed up in G×E interaction testing. Several Group 10 contributions showed that the use of this strategy may miss potentially important SNPs that have a very small main effect, but a significant G×E effect. For example, Arya et al. 
tested genotype×sex interactions for association with RA and found 30 SNP×sex interactions that were nominally significant (p
), but none of these were significant in the single-SNP analysis. Similarly, Zhuang and Morris 
also tested genotype×sex interactions for association with RA and identified eight novel SNPs that demonstrated genetic effects in only one sex, or reciprocal effects on risk in males and females, but these SNPs were not significant in the single-SNP analysis. In the FHS, Maenner et al. 
found significant evidence for an interaction between smoking and a SNP selected by a RF algorithm as the most important, but this SNP ranked as only the 2,111th
-value in the single-SNP analysis (out of 355,649 SNPs).
The inability of many of the groups to detect a G×E interaction that reached a genome-wide level of significance is likely to be due to inadequate sample sizes. To explore the power to detect an interaction in a GWAS, we adopt a standard logistic model framework for a disease outcome (D), with form logit[Pr(D=1|G,E)] = β0 + βg G + βe E + βge G×E. This model parameterizes the baseline disease prevalence (β0), the main effects of G (βg) and E (βe), and the G×E interaction (βge). The quantity of interest is the interaction ORge = exp(βge) = ORG | E=1 / ORG | E=0, or, in other words, the odds ratio for a given SNP (G) in exposed (E=1) individuals divided by the odds ratio for G in unexposed (E=0) individuals. The epidemiologist may want to adopt the alternative exposure-based interpretation for ORge, specifically ORE | G=1 / ORE | G=0. shows the required number of case-control pairs required to achieve 80% power for detecting an interaction, for various underlying values of ORge, minor allele frequencies, and exposure prevalences. The range of exposure prevalences represents that of many common environmental exposures, including physical inactivity, obesity, and smoking. An exposure prevalence of 0.1 is representative of physical inactivity in non-Hispanic whites (10.9% according to Centers for Disease Control (CDC) statistics for 2007) and that of 0.5 is representative of obesity in non-Hispanic black women (53% according to National Health and Nutrition Examination Survey (NHANES) statistics from 2003–2006). The prevalence of physical inactivity in other racial/ethnic groups, obesity in other racial/ethnic/sex groups, and smoking (19.8% according to CDC statistics for 2007) falls between 0.1 and 0.5. The required sample size to detect a significant G×E interaction of reasonable magnitude in a GWAS at p<10−7 is approximately two to three times larger than that needed to test a single variant at p<0.05 due to the correction for multiple testing.
Required number of case-control pairs to detect a G×E interaction
High as the sample sizes are in , they are underestimates because in reality both G and E are likely measured with error. As discussed below, accurate measurement of environmental exposures is the exception rather than the rule. Genotypes are also subject to measurement errors but these are generally small compared with errors in environmental exposures. However, in GWAS, the vast majority of the common SNPs are not measured directly but rather are captured through tag SNPs. When testing for SNP main effects, it is well known that imperfect tagging inflates the required sample size by a factor of approximately 1/r2
[Pritchard and Przeworski, 2001
]. Under certain assumptions, this applies more generally to covariates measured with error, specifically the required sample is inflated by the reciprocal of the square of the correlation coefficient between the true value of the covariate and its measurement [Devine and Smith, 1998
]. Thus, if
is the LD between a tag SNP and the causal SNP, and
is the squared correlation coefficient between the true exposure and the measured exposure, we can expect an approximate sample size inflation of
. For example, if
, then the required sample size to detect a G×E interaction with a given power will be 39% higher than the required sample size if both G and E were measured without error.
Unfortunately, given the realities of epidemiological research and the desire to continue to use valuable existing studies (e.g., FHS), the required sample sizes are often not practical. Moreover, recent discussions have pointed out that corrections for multiple testing, such as the Bonferroni correction, are too conservative because they do not take into account correlations between the tests due to LD [Rice et al., 2008
]. Rice et al. point out that the effect sizes of susceptibility alleles (and G×E interactions) will rarely reach the required level of significance in GWAS if a Bonferroni correction is used. Although the Bonferroni correction is easy and straightforward to calculate, less conservative methods, such as permutation testing, false-discovery rate, and sequential methods (splitting the data into a test set and replication set), may need to be applied to balance the type I and type II errors (false positives and false negatives, respectively). Alternately, Maenner et al. 
initially used a machine learning approach, which is not based on p
-values so a Bonferroni correction is not applicable. Machine learning approaches can screen large amounts of data and take into account interaction effects as well as main effects without requiring model specification. They then selected a very small number of variables with the highest variable importance scores and tested these for interactions using traditional regression approaches.
Accurately quantifying environmental exposures at the individual level is a challenge that is widely recognized and the topic of the NIH Genes, Environment and Health Initiative. The difficulty of this task, however, may be most evident when contrasted with the detail and volume of information obtained from genetic samples. In all three data sets, measures of environmental variables were comparatively crude, if available at all. Although the FHS data includes a variable for cigarettes smoked per day at each visit, missing data and the lack of information about smoking before and between visits make it difficult to quantify smoking behavior. Given the structure of smoking data, a choice has to be made between creating crude categories of exposure (e.g., ever or never smoked) and basing the ascertainment of exposure on a limited time period (e.g., average number of cigarettes smoked per day across visits, or last known smoking status). More complete measurements of environmental exposures across time would not only better represent the exposure of interest, but would also allow greater flexibility to replicate findings and to compare with other studies in which exposures are computed differently. Several groups selected sex as an “environmental” variable of interest, which could represent a proxy for different environmental exposures [Arya et al., 2009
]. While the measurement of sex is certainly more straightforward than smoking or alcohol consumption, it does not necessarily provide more insight into the causal pathways of specific environmental exposures. Because many health outcomes are thought to be a complex combination of environmental and genetic factors, future studies should strive to create new methods for the collection of meaningful environmental information that is as reliable and comprehensive as the genotype data.
The FHS presented additional challenges beyond that of multiple comparisons and measurement of environmental exposures when testing for G×E interactions. The longitudinal, family-based design resulted in data that were correlated in two ways: repeated measurements taken in the same individual at multiple time points and measurements taken in members of the same family. Shi et al. 
addressed both types of correlation by applying a three-level hierarchical linear mixed-effects model to account for correlation due to the family-based longitudinal data. Using the simulated data, they found this model to be generally more powerful than using a cross-sectional model that accounted for familial correlation. Joubert et al. 
used a novel variance-component method to account for both repeated measures and familial correlation. Maenner et al. 
utilized the longitudinal data by analyzing age at onset of CHD as the outcome. Gu et al. 
used a two-level factor analysis for longitudinal data. Maenner et al.
and Gu et al.
used a generalized estimating equations model to confirm the results from their primary analysis while accounting for familial correlation. The use of longitudinal data in studies of G×E interactions is particularly appealing because it may help overcome some of the pitfalls discussed above. Specifically, some of the power lost by conducting a G×E analysis using GWA data may be recaptured by the use of longitudinal data and having multiple measurements of the environmental exposure may lessen the problem of measurement error.
As discussed previously, because tests for G×E interaction are generally less powerful than those to detect main effects and current GWAS are typically only powered to detect main effects, it is important for investigators to choose an analysis method that has the most power to detect G×E interactions. Group 10 has investigated the use of many different methods, including traditional logistic regression [Arya et al., 2009
; Zhuang and Morris, 2009
], latent components analysis [Gu et al., 2009
], machine learning algorithms [Maenner et al., 2009
], an extended generalized estimating equations approach [Chiu et al., 2009
], and hierarchical modeling [Shi et al. 2009
]. As discussed previously, many of these analyses identified markers involved in G×E interactions that would have been missed if tested for main effects alone. Zhuang and Morris 
and Joubert et al. 
applied a two degree of freedom test that has been shown to be a more robust choice to detect markers involved in disease risk because it jointly tests for main effect and interaction [Kraft et al., 2007
]. The case-only analysis has been shown to be a powerful alternative to test for G×E [Khoury and Flanders, 1996
; Piegorsch et al., 1994
]. However, in a genome-wide setting, the assumption of G×E independence in the population is untenable across the large number of markers. Recently, Murcray et al. 
developed a two-step method that uses a case-only style screening step on the combined case-control sample to reduce the number of markers tested formally for interaction. They show that their two-step test is more powerful than a traditional logistic regression model for interaction under a wide range of scenarios, even in the presence of an association between gene and environment in the population. Mukherjee and Chatterjee 
developed an empirical Bayes-type shrinkage estimator to model G×E interactions with the efficiency of the case-only design and unbiasedness of a case-control design. By combining case-control and case-only analysis, Li and Conti 
developed a Bayes model averaging approach to obtain a single estimate of the interaction effect. Through simulation, their Bayes model averaging approach was shown to be more powerful and robust to violations of independence than traditional approaches. Although complex disease is likely to be more complicated than can be defined by simple two-way interaction models, the development of powerful tools that incorporate the joint effects of genes and environment is an important step toward understanding disease outcomes.
Although only one of the Group 10 contributions tested for population stratification [Arya et al., 2009
], it should be considered in all studies of G×E interaction. Population stratification can occur when systematic differences in allele frequencies exist between subgroups of a population, often corresponding to distinct genetic ancestry. This is an issue for GWAS because it can result in erroneous associations with the outcome. For analyses including G×E interactions, population stratification is an issue if population membership is associated with the outcome, the genetic effect, and the environmental exposure. One way to determine this is first to test for population stratification, i.e., by employing principal-components analyses using the software EIGENSTRAT [Patterson et al., 2006
; Price et al., 2006
]. If distinct populations are found, then population membership can be tested for association with the environmental variable as well as with the outcome. A priori criteria for a measure of association should be determined by the investigator. In this scenario, it is assumed that the association with the genetic effect is established through the principal-components analysis. If population membership is also found to be associated with the environmental exposure and the outcome, then the final analyses should adjust for population stratification.
Finally, analyzing G×E interactions can be computationally intensive. For dichotomous traits, testing G×E interaction under a logistic regression framework requires maximizing the likelihood function numerically; for quantitative traits, testing the interaction with family-based longitudinal data using a mixed-effects model also relies on numerical optimization, both of which are computationally much more extensive than contingency table or regular regression approaches. Bayesian methods are, inherently, computationally demanding as well, but may allow consideration of more complex models. When scaling up to genome-wide data with hundreds of thousands of SNPs, care should be taken to choose an appropriate statistical model, analysis software, and necessary computing hardware. As demonstrated by Shi et al. 
, mixed-effects models with Kronecker and hierarchical structures yielded comparable model fitness. However, the Kronecker analysis required about 5 minutes for a single model fitting, while the hierarchical model required only 3 seconds, both using SAS PROC MIXED. Due to the parallel nature of the genome-wide scan, cluster computing with tens or hundreds of computing units working simultaneously can significantly reduce the overall computation time. SAS Grid computing enables SAS applications to automatically utilize grid resources. PLINK version 1.05 [Purcell et al., 2007
] provides an R interface, which allows the use of abundant analytical resources developed under R and also offers an option for cluster computing at no cost.
The analysis of G×E interaction is likely to be of increasing importance as researchers attempt to unravel the etiology of complex diseases using high-volume genetic data. A researcher primarily interested in environmental risk factors may be interested in identifying genes that modify the effect of a target environmental risk factor for a disease, while a researcher primarily interested in genetic risk factors may want to know how an environmental factor affects the penetrance of a gene on a disease. Either situation can be viewed as G×E interaction, and for both, the researcher will be charged with conducting the most efficient analysis possible. At a minimum, this will include consideration of sample size and power, potential population stratification, and the best way to measure the environmental exposure. The Group 10 contributions have provided examples of several approaches one might take in testing G×E interactions, for example, jointly testing for a main and interaction effect, testing for population stratification, and the use of longitudinal data with multiple measurements of the environmental exposure to lessen the problem of measurement error. Although several new issues arise when one analyzes G×E interactions in a GWAS, Group 10 demonstrated that such analysis may hold the promise of uncovering new genes that might not otherwise be detected.