The Case-Only (CO) analysis was always the most efficient choice to test for G×E interaction across a range of interaction effect sizes (), which is not surprising given past reports [Khoury and Flanders 1996; Li and Conti 2009; Piegorsch, et al. 1994; Wang and Lee 2008]. Although this approach is potentially biased in practice, we include it in the comparison of relative efficiencies to show a lower bound for sample size requirements for a GWAS. For small interaction effects (1.5–2.0), there are noticeable differences in the required sample sizes to achieve 80% power for the two-step approaches. To detect an Rge = 1.5, the least efficient two-step strategy is to screen on marginal effects, with a required sample size of N1 = 7,932 cases (and N0 = 7,932 controls), which is only slightly more efficient than the standard CC approach. On the other hand, the Environment-Gene Two-Step (EG2) approach requires 4,468 cases under the same assumed parameter settings. As the interaction effect size gets larger, the relative efficiencies (RE) of the methods remain relatively constant compared to the Case-Control (CC) test (REDG2 ≈ 1.85, REH2 ≈ 1.95, REEG2 ≈ 2.08, RECO
Sample Size Required to Achieve 80% Power for Tests of Gene-Environment Interaction in a Genome-wide Association Study by Interaction Effect Size for Binary Environmental Exposure
The EG2 test was often more efficient than the DG2 across a wide range of parameter settings (). The RE of the latter method was more sensitive to assumptions about exposure prevalence and minor allele frequency than the EG2 or Hybrid (H2) approaches, as these impact the strength of the induced marginal genetic effect tested in Step 1. Specifically, if the exposure is rare (pE = 0.1), the RE of the DG2 test is 0.36, making it less efficient than the traditional one-step scan (CC). On the other hand, the DG2 is the most efficient of the two-step methods (RE = 2.0) for a more common exposure, pE = 0.5. Minor allele frequencies (qA), exposure prevalence (pE), and main effects (Rg, Re) have little effect on the RE of the EG2 and H2 approaches. However, the EG2 test is sensitive to disease prevalence, with higher RE for rare diseases (RE = 1.90 when p0 = 0.01) than for a more common disease (RE = 1.37 when p0 = 0.10). The relative efficiency of the EG2 decreased dramatically when a large number of markers had a G-E association in the population (pge > 0). For example, if 10,000 markers (pge = 0.01) have a detectable G-E association in Step 1, the RE of the EG2 approach decreases 56%, from RE = 1.90 to 1.34. As the proportion of such markers increases to 100%, the relative efficiency of the EG2 approaches 1.0, i.e. it requires the same sample size as the CC approach. The H2 shows a similar trend, with decreasing efficiency for an increasing number of markers with a G-E association; however, the decrease is more gradual (a decrease of 28% in RE for pge = 0.01). In general, the H2 is a robust approach that provides either the best or nearly the best efficiency across a wide range of models.
Sample Size (N) and Relative Efficiency (RE) Required to Achieve 80% Power to Detect a True Gene-Environment Interaction for a Collection of Testing Strategies across a Range of Parameter Settings
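The loss of EG2 efficiency when many markers carry a G-E association can be read as a multiple-testing effect: every marker that passes the Step 1 screen adds to the number of Step 2 tests. The sketch below is only an illustration under stated assumptions, not the authors' calculation. It assumes a panel of 1,000,000 markers (implied by 10,000 markers corresponding to pge = 0.01), a hypothetical Step 1 threshold `alpha_a`, full Step 1 power for G-E-associated markers, and a Bonferroni correction in Step 2. All function and parameter names are ours.

```python
def expected_step2_tests(n_markers, p_ge, alpha_a, step1_power=1.0):
    """Expected number of markers carried into Step 2 of an EG2-style scan.

    Assumptions (not from the source): markers without a G-E association
    pass Step 1 at rate alpha_a, and markers with a population G-E
    association pass at rate step1_power.
    """
    n_assoc = n_markers * p_ge          # markers with a true G-E association
    n_null = n_markers - n_assoc        # markers passing only by chance
    return n_null * alpha_a + n_assoc * step1_power


def step2_threshold(alpha, n_step2_tests):
    """Bonferroni threshold for Step 2 (an assumed correction scheme)."""
    return alpha / max(n_step2_tests, 1.0)


# Hypothetical inputs: 1,000,000 markers, alpha_a = 1e-5, overall alpha = 0.05
m_none = expected_step2_tests(1_000_000, 0.0, 1e-5)    # chance passes only
m_some = expected_step2_tests(1_000_000, 0.01, 1e-5)   # plus 10,000 associated markers
thr_none = step2_threshold(0.05, m_none)
thr_some = step2_threshold(0.05, m_some)
```

Under these assumptions the Step 2 threshold becomes roughly a thousand times stricter once 1% of markers carry a G-E association, which is the direction of the efficiency loss described above.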
Both the EG2 and DG2 scans can be optimized for a set of assumed population parameters as a function of the Step 1 significance thresholds, αA and αM. The EG2 was more efficient for stricter significance thresholds, αA = 1.0E-05 for the base model parameters (RE = 1.89) (). Conversely, the DG2 analysis was more efficient for a more liberal screening threshold, αM = 1.0E-03 (RE = 1.41). A single optimal choice of both αA and αM does exist for a single set of parameter settings. However, as many of these parameters are unknown at the time of study design and analysis, a robust choice can be made across minor allele frequencies and penetrance models. For the base model, the H2 can be optimized across both significance thresholds, with the required sample size across choices for αM being flatter across a range of αA near the optimal choice (). Specifically, for αM in (6.0E-04, 1.3E-03) and αA in (8.0E-06, 1.1E-05), the minimum sample size required to achieve 80% power is 1,289 (RE = 2.04). Although these choices of αA and αM define the region of optimal efficiency, the choices of significance thresholds for the screening steps (αA, αM) are robust across a fairly large window around the optimum. For example, for the range considered in , the least efficient choice for these parameters (αA = 1.0E-06, αM = 5E-03) still had a relative efficiency of 1.93 relative to the traditional case-control (CC) test for interaction (N1 = 1,364, NCC
Figure III Sample Size Required to Achieve 80% Power for the Hybrid Two-Step Test of Gene-Environment Interaction in a Genome-wide Association Study by Step 1 Significance Thresholds for the Disease-Gene and Environment-Gene Two-Step Tests for a Binary Exposure
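The relative efficiencies quoted for the H2 scan can be sanity-checked with simple arithmetic. The sketch below assumes that RE is the ratio of the sample size required by the CC test to the sample size required by the method in question, which is consistent with the statement that an RE of 1.0 means equal sample size to the CC approach. The CC sample sizes it produces are back-calculations from the quoted (N, RE) pairs, not values reported by the authors, and the function names are ours.

```python
def relative_efficiency(n_cc, n_method):
    """RE of a method versus the CC test, taken here as N_CC / N_method."""
    return n_cc / n_method


def implied_n_cc(n_method, re):
    """Back out the CC sample size implied by a quoted (N, RE) pair."""
    return n_method * re


# (N, RE) pairs quoted in the text for the H2 scan
optimal = implied_n_cc(1289, 2.04)   # about 2,630
least = implied_n_cc(1364, 1.93)     # about 2,633

# The two back-calculated values should agree up to rounding of RE
relative_gap = abs(optimal - least) / optimal
```

The two implied values differ by well under 1%, so the quoted sample sizes and relative efficiencies are mutually consistent under the assumed definition of RE.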
The H2 approach is robust to the choice of ρ, the allocation of the overall Type I error rate to the EG2 method, with the relative efficiency remaining relatively flat across a range of interaction effect sizes (). This pattern holds across a wide range of parameter settings (). Generally, the H2 has the highest efficiency when ρ ≥ 0.5, except when there is a non-zero main effect (Rge>1.0) or a common exposure (pE = 0.5), or when more controls than cases are sampled from the population (e.g. case:control ratio = 1:3). The H2 is often most powerful when ρ = 0.9 and is robust to many unknown population parameters, i.e. minor allele frequency and genetic main effect. Except when there is a sizeable genetic main effect (Rg ≥ 1.3) or a rare exposure (pE = 0.1), the H2 approach was always more powerful than the EG2 or DG2 alone for some value of ρ. Although the H2 approach is often most efficient for an unbalanced allocation of α (ρ ≠ 0.5), in practice a balanced allocation (ρ = 0.5) might be a natural choice to implement. For the parameter settings we considered in , the largest difference between the relative efficiencies for the H2 design with ρ = 0.5 compared to the optimal choice of ρ is 0.14 when there exists a sizeable genetic main effect (Rg ≥ 1.3).
Relative Efficiency (RE) to Achieve 80% Power to Detect a True Gene-Environment Interaction for the Hybrid Two-Step Analysis for Various Allocations (ρ) of α to the Environment-Gene Two-Step Across a Range of Interaction Effect Sizes.
Sample Size (N) Required and Relative Efficiency (RE) to Achieve 80% Power to Detect a True Gene-Environment Interaction for the Hybrid Two-Step Analysis for Various Allocations (ρ) of α to the Environment-Gene Two-Step.
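The role of ρ in the hybrid scan can be sketched as a split of the overall Type I error rate between the two arms. The following is a minimal illustration, assuming that the EG2 arm receives ρα, the DG2 arm receives the remaining (1 − ρ)α, and each arm is Bonferroni-corrected for the number of markers that passed its own Step 1 screen. The function name, the overall α of 0.05, and the marker counts are hypothetical and are not taken from the source.

```python
def hybrid_step2_thresholds(alpha, rho, m_eg, m_dg):
    """Split overall alpha between the EG2 and DG2 arms of a hybrid scan.

    rho is the share of alpha given to the EG2 arm. Each arm is then
    Bonferroni-corrected for the markers that passed its own Step 1
    screen (an assumed correction scheme).
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    alpha_eg = rho * alpha            # Type I error budget for the EG2 arm
    alpha_dg = (1.0 - rho) * alpha    # remaining budget for the DG2 arm
    thr_eg = alpha_eg / m_eg if m_eg > 0 else 0.0
    thr_dg = alpha_dg / m_dg if m_dg > 0 else 0.0
    return thr_eg, thr_dg


# Hypothetical example: balanced allocation, 10 markers pass the EG screen
# and 1,000 pass the DG screen
balanced = hybrid_step2_thresholds(0.05, 0.5, 10, 1000)

# Unbalanced allocation favouring the EG2 arm
eg_heavy = hybrid_step2_thresholds(0.05, 0.9, 10, 1000)
```

Because the two budgets sum to α, a union bound keeps the overall Type I error rate at or below α for any ρ in [0, 1]. Setting ρ = 1 or ρ = 0 gives all of α to the EG2 or the DG2 arm, respectively.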
All of the tests described can be applied to test for interaction between G and a continuous environmental factor. In general, the relative efficiencies of the methods are similar to those in the binary E situation. Under our base model parameters, the relative efficiencies of the two-step methods are similar when the interaction effect size is modest (Rge = 1.15) (). For an interaction effect size of 1.3, the EG2, DG2 and H2 tests all converge to be approximately twice as efficient as the traditional CC test as the interaction effect goes to infinity (REEG2 = 2.12, REDG2 = 2.02, REH2 = 1.95).
Sample Size Required to Achieve 80% Power for Tests of Gene-Environment Interaction in a Genome-wide Association Study by Interaction Effect Size for a Continuous Environmental Exposure in the Absence of a Genetic Main Effect (Rg = 1.0).