Published GWAS have thus far not fully explored the potential role of the environment in modifying genetic susceptibility. One reason researchers are deterred from considering *G-E* interactions in GWAS is the lack of consensus on the most effective statistical approach for investigating them. In the current study, we have provided an in-depth comparison of logistic regression-based methods that aim either to enhance the detection of genetic factors or to investigate statistical *G-E* interactions directly. Under the reasonable assumption that the majority of tested markers are not associated with risk of type 2 diabetes in any stratum defined by body mass index, our empirical results provide some information on the type I error rates of these tests. In the absence of a known (statistical) *G-E* interaction, it is difficult to draw conclusions about the relative power of the different tests for *G-E* interaction that we compared. However, markers with a known marginal association with type 2 diabetes (e.g., *TCF7L2*) can serve as positive controls for the 2 df tests, which are geared toward marker discovery rather than toward testing *G-E* interactions per se.
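
Because most markers are expected to be null, the calibration of a test can be summarized from its genome-wide distribution of P values. The following is a minimal Python sketch. It assumes a list of two-sided P values from a 1 df test. The function name `empirical_type1_summary` is hypothetical, and the genomic-control inflation factor λ it returns is one common summary, not necessarily the one used in this study.

```python
from statistics import NormalDist, median

def empirical_type1_summary(p_values, alpha=0.05):
    """Summarize calibration of a test applied to many (mostly null) markers.

    Returns (fraction of P values below alpha, genomic-control lambda), where
    lambda = median observed 1-df chi-square / median of a chi-square(1).
    """
    std_normal = NormalDist()
    # Convert each two-sided P value back to a 1-df chi-square statistic (z^2);
    # P values of exactly 0 are floored to avoid inv_cdf(0).
    chi2 = [std_normal.inv_cdf(max(p, 1e-300) / 2.0) ** 2 for p in p_values]
    null_median = std_normal.inv_cdf(0.75) ** 2  # median of chi-square(1), about 0.455
    lam = median(chi2) / null_median
    frac_below = sum(p < alpha for p in p_values) / len(p_values)
    return frac_below, lam
```

For a well-calibrated test, the fraction below α should be close to α and λ close to 1. Values of λ well above 1 suggest inflation, although genuine polygenic signal can also raise λ.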

In the course of conducting the current study, we observed that tests requiring specification of the main effect of the environment can have a profoundly inflated type I error rate when that effect is misspecified. Because the standard case-only test does not require the environmental main effect to be specified, it is in principle immune to this inflation, even if the interaction term is misspecified (see the Appendix). In practice, we did not observe any inflation in the type I error rate for this test (Web Figure 1). We presented 3 strategies that can correct this inflation: 2 involve fitting more flexible models for the environmental main effect, and 1 involves calculating a robust variance estimate. Although all 3 methods yield tests with nominal type I error rates, it remains an open question which approach yields the most power under different alternative models. Developing flexible and powerful modeling strategies for *G-E* interactions (especially when more than 1 environmental factor is considered) is an area of active research and beyond the scope of the current study.
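
The robust-variance strategy can be sketched as a Wald test of the *G*×*E* coefficient in which the model-based standard error is replaced by a sandwich (Huber-White) estimate. This Python/NumPy sketch illustrates the idea under stated assumptions and is not the implementation used in this study. Genotype is coded additively and the exposure is a single standardized continuous variable. The simulated data are hypothetical: a quadratic exposure effect is fitted with only a linear term, and there is no true interaction. Note that the sandwich estimate corrects the variance; it does not remove any bias in the coefficient itself.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    return beta

def model_and_robust_se(X, y, beta):
    """Model-based and sandwich (robust) standard errors for each coefficient."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    bread_inv = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    scores = X * (y - p)[:, None]        # per-subject score contributions
    meat = scores.T @ scores             # empirical covariance of the scores
    robust_cov = bread_inv @ meat @ bread_inv
    return np.sqrt(np.diag(bread_inv)), np.sqrt(np.diag(robust_cov))

# Hypothetical simulation: quadratic exposure effect, no G x E interaction.
rng = np.random.default_rng(0)
n = 20000
g = rng.binomial(2, 0.3, n)              # additive genotype coding
e = rng.normal(size=n)                   # standardized exposure
logit = -2.0 + 0.2 * g + 0.8 * e + 0.4 * e**2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X = np.column_stack([np.ones(n), g, e, g * e])  # misspecified: linear in E only
beta = fit_logistic(X, y)
se_model, se_robust = model_and_robust_se(X, y, beta)
z_model = beta[3] / se_model[3]          # Wald statistic, model-based SE
z_robust = beta[3] / se_robust[3]        # Wald statistic, sandwich SE
```

When the model is correctly specified, the two standard errors agree closely. They diverge as the misspecified environmental main effect distorts the model-based variance.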

If strict control of type I error is the primary criterion for evaluating alternative tests for *G-E* interactions, the standard logistic regression test is generally accepted as the superior option for analysis of case-control data (8). Tests that exploit *G*-*E* independence (i.e., case-only and semiparametric maximum-likelihood estimation) have been perceived as less preferable because they can yield biased results when this assumption is violated (6). A true *G*-*E* correlation might occur when an exposure depends on an individual’s behavior, when the exposure is itself a trait under partial genetic control (such as body mass index), or in samples with latent population substructure (1, 37). A causal *G*-*E* association is likely to occur for only a very small fraction of the tested markers, and the strength of correlation between *E* and any single marker is likely to be small (as in the case of body mass index). Population stratification could create *G*-*E* correlations for a much larger fraction of markers. However, all of the methods presented here could greatly reduce the risk and magnitude of *G*-*E* correlation due to stratification if analyses are conducted within strata where the *G*-*E* independence assumption is likely to hold (e.g., within groups defined by self-reported ethnicity and/or genome-wide genetic similarity) (4, 38).
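
Under *G*-*E* independence in the source population and a rare disease, the *G*-*E* odds ratio among cases estimates the multiplicative interaction odds ratio. Stratification can be handled by estimating this odds ratio within strata and then pooling. The following is a minimal sketch. It assumes a binary carrier indicator and a dichotomized exposure; this is a hypothetical simplification, since body mass index is continuous. It also assumes no empty cells and uses inverse-variance (Woolf) pooling, which is one of several reasonable pooling choices.

```python
import math

def case_only_stratified(tables):
    """Inverse-variance pooled case-only test across strata.

    Each table holds counts among CASES only, within one stratum, in the order
    (carrier & exposed, carrier & unexposed, noncarrier & exposed, noncarrier & unexposed).
    Assumes all counts are positive (no continuity correction is applied).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for a, b, c, d in tables:
        log_or = math.log((a * d) / (b * c))  # G-E log odds ratio among cases
        var = 1 / a + 1 / b + 1 / c + 1 / d   # Woolf variance of the log odds ratio
        weighted_sum += log_or / var
        total_weight += 1 / var
    pooled = weighted_sum / total_weight
    z = pooled * math.sqrt(total_weight)      # pooled estimate / its standard error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return math.exp(pooled), z, p_value
```

Because no control data enter the calculation, no model for the environmental main effect is needed. This is the property discussed in the preceding paragraphs.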

In principle, both sources of *G*-*E* correlation could have been a concern in our study, given the known genetic influences on body mass index (22, 23) and potential differences in body mass index among US self-identified Caucasians with recent ancestors from different parts of Europe. In practice, known genetic markers for body mass index were indeed associated with body mass index in our samples (Web Table 4), and the first principal component of genetic variation was associated with body mass index in the HPFS (*P* < 0.03; no significant association in the NHS). Nevertheless, the case-only and semiparametric maximum-likelihood estimation methods showed no evidence of inflation in the type I error rate, and the set of markers with highly significant *G-E* interaction tests (*P* < 1 × 10⁻⁵) was not enriched for markers that were correlated with body mass index in controls. This observation is consistent with theoretical power calculations (Web Figure 8), which showed no discernible increase in the type I error rate for the case-only test using 2,000 cases when as many as 5,000 markers were modestly associated with exposure (e.g., the odds of exposure among carriers of the minor allele were 1.05 to 1.10 times those among noncarriers).
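
The logic of such a calculation can be approximated roughly. Suppose a marker has no interaction but a population *G*-*E* odds ratio of 1.05 to 1.10. The case-only estimate is then centered on that odds ratio rather than on 1, and the question becomes how often a Wald test crosses a given threshold. The sketch below is a rough normal approximation, not the calculation behind Web Figure 8. The carrier frequency and exposure prevalence are assumptions, as is the use of independence-based expected cell counts for the standard error.

```python
import math
from statistics import NormalDist

def case_only_rejection_prob(n_cases, carrier_freq, exposure_freq, ge_odds_ratio, alpha):
    """Approximate probability that a two-sided case-only Wald test rejects at level
    alpha for a marker with NO interaction but population G-E odds ratio ge_odds_ratio.

    Rough approximation: the Woolf standard error uses the cell counts expected under
    G-E independence, which is reasonable only when ge_odds_ratio is close to 1.
    """
    cells = [n_cases * f_g * f_e
             for f_g in (carrier_freq, 1 - carrier_freq)
             for f_e in (exposure_freq, 1 - exposure_freq)]
    se = math.sqrt(sum(1 / n for n in cells))
    shift = math.log(ge_odds_ratio) / se                       # mean of the Wald statistic
    z_crit = -NormalDist().inv_cdf(alpha / 2)                  # two-sided critical value
    upper = 0.5 * math.erfc((z_crit - shift) / math.sqrt(2))   # P(Z > z_crit)
    lower = 0.5 * math.erfc((z_crit + shift) / math.sqrt(2))   # P(Z < -z_crit)
    return upper + lower

# Hypothetical inputs: 2,000 cases, 30% carriers, 50% exposed, G-E odds ratio 1.10.
for alpha in (1e-5, 5e-8):
    prob = case_only_rejection_prob(2000, 0.3, 0.5, 1.10, alpha)
    print(alpha, prob, 5000 * prob)  # per-marker probability; expected hits among 5,000 markers
```

Multiplying the per-marker probability by the number of affected markers gives the expected number of spurious hits among them. That number can be compared with the number expected by chance among all null markers at the same threshold.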

By contrast, the standard logistic regression-based test and the hedge tests for *G-E* interaction showed, as expected, no evidence of inflated type I error rates, yet the set of markers with highly significant interactions was enriched for markers correlated (by chance) with body mass index. This is because the interaction odds ratio estimated by standard logistic regression is correlated with the observed association between *G* and *E* in controls.
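
For a binary genotype and a binary exposure (a simplification used here only for illustration), this mechanism can be written explicitly. Consider the saturated model $\operatorname{logit}\Pr(D=1 \mid G,E) = \beta_0 + \beta_G G + \beta_E E + \beta_{GE} GE$, and let $n_{ged}$ denote the number of subjects with $G=g$, $E=e$, and disease status $D=d$. The estimated interaction odds ratio is then exactly the ratio of the *G*-*E* odds ratios in cases and in controls:

$$
\hat{\psi} = \exp(\hat{\beta}_{GE}) = \frac{\widehat{\mathrm{OR}}_{GE \mid D=1}}{\widehat{\mathrm{OR}}_{GE \mid D=0}},
\qquad
\widehat{\mathrm{OR}}_{GE \mid D=d} = \frac{n_{11d}\, n_{00d}}{n_{10d}\, n_{01d}}.
$$

A chance *G*-*E* association in controls therefore shifts $\log\hat{\psi}$ by $-\log\widehat{\mathrm{OR}}_{GE \mid D=0}$. The case-only estimator uses only the numerator, so it is not pulled toward such markers. Under independence with a rare disease, it also avoids the sampling variability of the control odds ratio, which is the source of its efficiency gain.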

Our results suggest that the power gains from the case-only and semiparametric maximum-likelihood estimation methods, which leverage the *G*-*E* independence assumption, may outweigh the risk of an increased type I error rate should that assumption not hold. We emphasize, however, that in other (e.g., larger) data sets or for other exposures the potential for bias remains (Web Figure 8). Markers that reach statistical significance with these methods should therefore be vetted to ensure that the observed interaction is not due to *G*-*E* correlation. Such markers can be compared against published lists of markers associated with the environmental exposure. If no such lists exist or they are not deemed sufficiently exhaustive, the markers can be tested for association with the exposure in auxiliary data sets. The putative interactions can also be retested using the other methods outlined here that are not sensitive to the *G*-*E* independence assumption. We also note that tests that leverage the *G*-*E* independence assumption can have lower power than the standard logistic regression test for interaction in special cases where the interaction odds ratio and the *G*-*E* association odds ratio are in opposite directions (24).
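
The vetting steps described above can be organized as a simple screen. The following Python sketch uses hypothetical inputs throughout: the marker identifiers, the set of published exposure-associated markers, and the auxiliary 2 × 2 counts. A real screen would use the genotype and exposure coding of the main analysis.

```python
import math

def ge_association_p(a, b, c, d):
    """Two-sided Wald P value for the G-E log odds ratio from a 2x2 table ordered
    (carrier & exposed, carrier & unexposed, noncarrier & exposed, noncarrier & unexposed).
    Assumes all counts are positive."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.erfc(abs(log_or / se) / math.sqrt(2))

def vet_case_only_hits(hits, known_exposure_markers, aux_control_tables, alpha=0.05):
    """Flag hits from independence-based tests that may reflect G-E correlation.

    hits: marker IDs significant under a test assuming G-E independence.
    known_exposure_markers: hypothetical set of IDs from published exposure studies.
    aux_control_tables: hypothetical mapping of marker ID -> 2x2 G-E counts in an
    auxiliary sample; markers without a table are checked against the list only.
    """
    flagged = {}
    for marker in hits:
        reasons = []
        if marker in known_exposure_markers:
            reasons.append("known exposure locus")
        table = aux_control_tables.get(marker)
        if table is not None and ge_association_p(*table) < alpha:
            reasons.append("G-E association in auxiliary data")
        if reasons:
            flagged[marker] = reasons
    return flagged
```

Flagged markers are not necessarily false positives. They are candidates for retesting with methods that do not rely on the independence assumption.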

We have focused on the performance of tests incorporating *G-E* interactions in a single GWAS. Their relative performance in the multistage context remains unclear. In that setting, a fraction of markers showing evidence for *G-E* interaction, or for association allowing for *G-E* interaction (although not necessarily at a genome-wide significance level), is tested in a follow-up sample. Although in many situations the case-only method shows little evidence of an inflated type I error rate when a conservative genome-wide significance level is used (Web Figure 8), the proportion of false positives due to *G-E* dependence may be higher when a more liberal significance threshold is used to select markers for follow-up. This could reduce the power of the case-only approach in the multistage context, a point that requires further research.

The 2-stage and empirical-Bayes-shrinkage estimator tests were proposed to gain efficiency when the *G*-*E* independence assumption holds in the underlying population while remaining resistant to increased type I error when that assumption is violated (7, 8). Neither the 2-stage nor the standard logistic regression test showed systematic inflation in type I error rates. Although both tended to favor markers that exhibited chance *G*-*E* dependence in controls, this tendency was markedly greater for the standard test. We note that our implementation of the empirical-Bayes estimator has a heuristic justification. It would be interesting to compare this computationally simple approach with the general semiparametric maximum-likelihood estimator proposed by Mukherjee and Chatterjee, especially in situations where adjusting for covariates is important.
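
For orientation, one commonly cited form of the Mukherjee and Chatterjee shrinkage estimator combines two estimates of the log interaction odds ratio: the case-control estimate β̂_CC and the case-only estimate β̂_CO. Each is weighted according to the squared discrepancy between them, relative to the estimated variance of β̂_CC. The sketch below implements that form under this assumption. It is not necessarily the implementation used in this study, and it returns only a point estimate; a test would also require a variance, for example by the delta method.

```python
def eb_shrinkage(beta_cc, beta_co, var_cc):
    """Empirical-Bayes-type combination of case-control and case-only log interaction ORs.

    theta = beta_cc - beta_co estimates the departure from G-E independence. The weight
    on the case-control estimate grows with theta**2 relative to var_cc (> 0), so the
    estimator moves toward the robust case-control estimate when independence looks doubtful.
    """
    theta_sq = (beta_cc - beta_co) ** 2
    w_cc = theta_sq / (theta_sq + var_cc)
    return w_cc * beta_cc + (1.0 - w_cc) * beta_co
```

When the two estimates agree, the result equals the more efficient case-only value. When they disagree strongly, it approaches the case-control value. This is the "hedge" behavior described above.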

In the current study, the performance of the joint 2 df and semiparametric maximum-likelihood estimation 2 df tests was generally similar to that of the marginal genetic test, with 1 notable exception: a nearly 3-fold improvement in detection of an established type 2 diabetes locus (*TCF7L2*) in the NHS. We note, however, that in the current application the joint tests performed only as well as a marginal-effect test that simply adjusts for the environmental factor. If these findings reflect the strong association between body mass index and type 2 diabetes, they may be relevant to other settings in which a strong environmental risk factor is being tested. The choice between a 1 df and a 2 df test will largely be dictated by the goals of the study, because each test addresses a very different research question. The likely success of either approach will ultimately depend on the true penetrance model and the prevalence of exposure, which will vary by disease.
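
A 2 df joint test asks whether the genotype matters through either its main effect or its interaction with exposure, that is, H₀: β_G = β_GE = 0 in a model that also contains E. The following is a minimal Wald version. It assumes the two coefficient estimates and their 2 × 2 covariance matrix are already available from a fitted logistic regression, and the function name is hypothetical. A likelihood-ratio statistic comparing models with and without the G and G×E terms would be referred to the same distribution.

```python
import math

def joint_2df_wald(beta_g, beta_ge, cov):
    """Wald test of H0: beta_G = beta_GxE = 0 (2 df).

    cov is the estimated 2x2 covariance [[var_g, cov_g_ge], [cov_g_ge, var_ge]]
    of the two coefficients from a model with G, E, and G x E.
    """
    (v11, v12), (v21, v22) = cov
    det = v11 * v22 - v12 * v21
    # Quadratic form b' V^{-1} b, using the explicit inverse of a 2x2 matrix.
    stat = (v22 * beta_g**2 - (v12 + v21) * beta_g * beta_ge + v11 * beta_ge**2) / det
    p_value = math.exp(-stat / 2.0)  # survival function of chi-square with 2 df
    return stat, p_value
```

Because the chi-square distribution with 2 df has survival function exp(−x/2), the P value needs no special library.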

The nested case-control design of the NHS and HPFS type 2 diabetes GWAS combines the resistance to bias of cohort designs with the cost-efficiency of case-control designs and is thus ideal for genome-wide *G-E* interaction studies. Nevertheless, no *G-E* interaction reached genome-wide significance in either data set, nor did any nominally significant finding replicate across studies; we acknowledge this as a limitation. Moreover, there is currently no confirmed *G-E* interaction in the diabetes literature, so we were unable to include a positive control that would have provided additional insight into the power of the tests. We also cannot generalize our findings to studies in which population stratification is a more pressing concern, notably studies in recently admixed populations such as African Americans or Latinos. Chatterjee and Carroll (4) considered the potential for bias due to population stratification. They proposed a modified semiparametric maximum-likelihood estimation test for settings in which the *G*-*E* independence assumption holds only conditional on a set of stratification variables. Future application of these different tests to substructured populations will be highly informative.