Let

be the vector of estimated genetic main and gene-environment interaction effects from study
i, obtained from fitting the generalized linear model:
For example, for a continuous outcome, the link function
g(
x) =
x is equivalent to standard linear regression, and for a binary outcome, the link
g(
x) = log(
x/(1 –
x)) is equivalent to logistic regression. For ease of exposition, the genotype is assumed to be coded additively (0, 1, or 2 copies of the minor allele) and the exposure is assumed to be binary, so that

is a scalar. The results are easily extended to categorical exposure with three or more levels, for example.
We note that for continuous exposures, both standard and joint tests can be invalid when the exposure main effect is mis-specified [
8,
9]. Using the Huber-White robust ‘sandwich’ estimate of the variance-covariance matrix of

yields valid tests even when the continuous exposure is not modeled accurately. These robust covariance matrices are currently available from some software packages (e.g. ProbABEL [
10]), but not others (e.g. PLINK [
11]). Alternatively, breaking the continuous variable into categories (e.g. a binary indicator for high versus low exposure) also yields valid tests [
8].
Following the exposition in van Houwelingen et al. [
12] and assuming the sample size in every study is large enough so that

is multivariate normal with variance-covariance matrix Σ
i, we can write the log likelihood for the observed

as:
One can solve for the maximum likelihood estimate

using the score estimating equation
This leads to the weighted least square solution:
Given

, the Wald test of the joint null β = 0 is

, where

is the Fisher information.
The score test also has a simple form:
Both of these tests have a non-central χ
2 distribution with 2 d.f. under the null hypothesis. (The assumption of normality is used here to motivate the estimation procedure; the test remains valid if the estimates

and Σ
i are unbiased but not normally distributed. Due to the consistency and asymptotic normality of maximum likelihood estimates, the study-specific estimates are likely unbiased and normally distributed when the individual sample sizes are ‘large enough’. Care should be taken when some studies have relatively small sample sizes, e.g. under 100 subjects.)
Although in principle this procedure yields valid and efficient tests of the joint null, in practice it may be difficult to implement, as it requires the covariance of

and

. Standard packages for the analysis of GWAS data such as PLINK [
11] or
glu (
http://code.google.com/p/glu-genetics/) do not report this covariance without custom modification. (The --robust option in ProbABEL [
10] will produce estimates of the variance-covariance matrix.) To avoid this complication, one might conduct a stratified analysis in each study, i.e. estimate the genetic effect in the exposed subjects

and the effect in unexposed

separately. Since these estimates are uncorrelated (they are constructed using non-overlapping sets of subjects), the score test simplifies to:
where
and

Note that expression [C] is simply the sum of the usual fixed-effect (inverse-variance weighted) tests for

.
To demonstrate this approach, we used genotype data from five GWASs conducted using European-ancestry subjects: case-control studies of breast cancer (BRCA), type 2 diabetes (T2D) and coronary heart disease (CHD) in the Nurses’ Health Study (NHS), and case-control studies of T2D and CHD in the Health Professionals Follow-up Study (HPFS) [
13,
14]. For each subject in each GWAS, a continuous phenotype
Y was simulated as a function of two single nuclear polymorphisms (SNP) and one binary environmental exposure:
where
G1 was the (observed) count of minor alleles for rs505922 (frequency from 34.1 to 35.8% in these studies),
G2 was the (observed) count of minor alleles for rs1219648 (frequency from 39.6 to 42.0% in these studies), the exposure
E was a Bernoulli 0-1 variable with expectation 0.3, and the residual variation
![[sm epsilon]](/corehtml/pmc/pmcents/x220A.gif)
was normally distributed with mean 0 and standard deviation σ.
E and
![[sm epsilon]](/corehtml/pmc/pmcents/x220A.gif)
were independent of each other as well as
G1 and
G2; since rs505922 is on chromosome 9 and rs1219648 is on chromosome 10,
G1 and
G2 were also effectively independent. We chose this model to examine the performance of the joint meta-analysis (and the marginal and standard interaction meta-analyses) in two situations: one where the effect does not differ across the exposure strata, and one where it does. The two SNPs were chosen arbitrarily, but are representative of the markers present on genome-wide genotyping platforms (rs505922 is on the Affymetrix Axiom and Illumina HumanHap550 arrays, among others, and rs1219648 is on the Affymetrix 6.0, Illumina HumanHap550, and Illumina OmniExpress arrays, among others).
To illustrate the validity of the meta-analytic joint test, we conducted a GWAS of a single realization of Y. This GWAS included 2,543,290 genotyped or imputed SNPs distributed along the 22 autosomal chromosomes that were available for all of the five studies. Because most of these SNPs are not in linkage disequilibrium with G1 or G2 and hence are not associated with the simulated phenotype, results from tests of the phenotype can be used to estimate the type I error rate of the meta-analytic approaches. For this example, the parameters b = (bE, bG1, bG2, bG2E)’ and σ were chosen so that the marginal test of G1 would have high power at the 5 × 10−8 level in the total sample (n = 10,161), and the joint gene-environment test of G2 would have high power in the total sample, while the marginal test of G2 would have modest or low power. The values b = (0.05, 0.12, 0.05, 0.10)andσ = 1.00 satisfied these criteria. Under this model, the genetic variants explained 0.9% of the variation in Y (0.7% for G1 and 0.2% for G2). We estimated subsequently the power of the meta-analytic joint test (and the marginal and standard interaction meta-analyses test) to detect G1 and G2 in this specific situation, by generating 100,000 realizations of Y with these parameters.
To further exemplify the power of the meta-analytic joint test relative to meta-analytic marginal and standard 1 d.f. interaction tests across a broad range of situations, we simulated 1,000 realizations of
Y under each of 320 different parameter settings. We kept the values of
bG1, and σ fixed at their values in the previous simulation (0.12 and 1.00, respectively), while we varied
bE,
bG2 and
bG2E across a grid defined by
bE ![[set membership]](/corehtml/pmc/pmcents/x2208.gif)
{0.05, 0.1, 0.5, 1, 2},
bG2 ![[set membership]](/corehtml/pmc/pmcents/x2208.gif)
{0.01, 0.04, 0.07, 0.1}, and
bG2E ![[set membership]](/corehtml/pmc/pmcents/x2208.gif)
{−0.15 to 0.15 by 0.2 step}.
As these studies were conducted using different genotyping platforms (the Illumina 550k for the breast cancer study and Affymetrix 6.0 for the other four studies), we imputed missing genotypes using MACH [
15] and the HapMap (rel22) CEU data. We conducted GWAS for SNPs associated with
Y using ProbABEL, which takes the dosage files from MACH as input. We compared three tests. In the first test, we estimated the marginal effect of each SNP in each study, and then combined these estimates using standard fixed-effect meta-analysis. In the second, we estimated the usual product interaction term in each study (adjusting for SNP and exposure main effects), and then combined the estimates of these parameters using fixed-effect meta-analysis. In the final test, we estimated SNP effects stratified by study and exposure, and calculated the overall joint test using expression [C] above. We also investigated the effect of conditioning on a main effect for
E (but not including a
G ×
E interaction term) on the performance of the marginal tests for
G1 and
G2.