Testing GxE interactions with family-based data has a bit of a checkered history. For a dichotomous exposure, interest properly centers on whether the relative risk associated with carrying a variant genotype differs between exposed and unexposed individuals. While Mendelian inheritance guarantees that family-based, i.e., transmission-based, inferences about genetic effects are protected against inflated Type I error rates due to genetic population structure, our simulations document that this protection does not extend to inference related to gene-by-environment interaction.
One early method19 treated transmission of the variant allele as the dichotomous event of interest, and used logistic regression to compare transmission rates to unexposed versus exposed affected offspring. A similar and seductively simple method creates a two-by-two table based on categorizing all the heterozygous parents, with transmission/nontransmission of the designated allele forming the columns and exposed/unexposed (offspring) forming the rows.10
One simply carries out a chi-squared test for independence. Unfortunately, while still used, such approaches that directly compare allelic transmission rates are invalid.9,20
First, they do not account for the induced dependency, present even in a homogeneous population, between transmissions from the mother and the father to an affected offspring. Second, and more importantly, in stratified populations transmission rates can differ between exposed and unexposed offspring even when the relative risks for carrying a variant genotype do not.
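The second point can be illustrated with a small calculation. The sketch below is not the simulation design used in this paper; it assumes a dominant genotype effect with the same relative risk in every subpopulation, Hardy-Weinberg proportions and random mating within each subpopulation, exposure independent of genotype within each subpopulation, and a common baseline risk and exposure effect. The function names and parameter values are hypothetical. Under these assumptions the transmission rate from heterozygous parents to affected offspring depends on the allele frequency, so pooling subpopulations that differ in both allele frequency and exposure prevalence yields different rates for exposed and unexposed offspring even though there is no interaction.

```python
def transmission_rate(p, rr):
    """P(heterozygous parent transmits the variant | affected offspring).

    Assumes a dominant model with relative risk rr for carriers, and that the
    other parent transmits the variant with probability p (random mating).
    """
    return rr / (rr + 1 + p * (rr - 1))


def pooled_transmission_rate(subpops, rr, exposed):
    """Transmission rate after pooling heterozygous parents across subpopulations.

    subpops: list of (allele_frequency, exposure_prevalence, relative_size).
    exposed: True for exposed affected offspring, False for unexposed.
    """
    transmitted = 0.0
    total = 0.0
    for p, prevalence, size in subpops:
        stratum_fraction = prevalence if exposed else 1.0 - prevalence
        het_parent_freq = 2 * p * (1 - p)
        # Relative number of heterozygous parents of affected offspring in
        # this exposure stratum, up to factors common to all subpopulations.
        weight = size * stratum_fraction * het_parent_freq
        transmitted += weight * rr
        total += weight * (rr + 1 + p * (rr - 1))
    return transmitted / total


# Hypothetical structured population: the subpopulation with the more common
# allele also has the higher exposure prevalence.
SUBPOPS = [(0.1, 0.2, 1.0), (0.5, 0.8, 1.0)]
RR = 3.0  # same genotype relative risk in exposed and unexposed

if __name__ == "__main__":
    print("exposed:  ", pooled_transmission_rate(SUBPOPS, RR, exposed=True))
    print("unexposed:", pooled_transmission_rate(SUBPOPS, RR, exposed=False))
```

Within a single homogeneous subpopulation the two pooled rates coincide, which is why the discrepancy reflects population structure rather than interaction.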
Other existing methods for testing interaction with case-parents data also in effect compare transmissions to exposed versus unexposed affected offspring, but do not use allelic transmission rates directly. These methods avoid the problems of transmission dependency by treating the family rather than each allelic transmission as the unit of analysis.
Many were developed with candidate SNPs in mind, that is, under the strong assumption that the SNP under study is causative and not in LD with another causative SNP. As our simulations show, these methods are valid under that unrealistically narrow assumption. Our simulations further demonstrate that these approaches are generally invalid for structured populations when the structure is exposure-related. The knotty problem is that population structure tends to produce heterogeneity in marker transmission rates even when genotype relative risks for the causative allele do not differ between exposed and unexposed individuals within families.
To demonstrate the potential for inflated Type I error rates when testing GxE interactions, we used an extreme scenario in our simulations. For less extreme scenarios, the inflation will be less. Of course, an investigator will generally not know how extreme exposure-related population structure may be in a targeted population. Other approaches to alleviate such bias include stratification on reported ethnicity or on strata derived from a large genome-wide panel of SNPs.21
The ability of such methods to overcome bias depends heavily on how well the assigned strata can identify sufficiently homogeneous subpopulations. If families can be assigned unambiguously to their truly relevant subpopulation, then stratification will correct the inflation of Type I error rates for GxE; however, in most settings this expectation would be unrealistic.
Tests of interaction depend heavily on how one specifies the null model. The choice between the multiplicative versus the additive interaction null models has been a long-debated subject.22
The multiplicative model is widely used mainly due to the mathematical convenience of logistic regression, whereas the additive model has been argued to be more biologically relevant. We focused on testing a multiplicative null in this paper, but the tetrad design and the case-sib design also allow testing of an additive model.
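The distinction between the two nulls can be made concrete with the standard definitions for a dichotomous genotype and a dichotomous exposure: under the multiplicative null the joint relative risk equals the product of the two single-factor relative risks, and under the additive null it equals their sum minus one. The short sketch below only restates those definitions; the function name and the example values are hypothetical.

```python
def expected_joint_rr(rr_genotype, rr_exposure, null="multiplicative"):
    """Joint relative risk implied by a no-interaction null model.

    rr_genotype: relative risk for variant carriers among the unexposed.
    rr_exposure: relative risk for exposure among non-carriers.
    """
    if null == "multiplicative":
        return rr_genotype * rr_exposure
    if null == "additive":
        return rr_genotype + rr_exposure - 1.0
    raise ValueError("null must be 'multiplicative' or 'additive'")


if __name__ == "__main__":
    # The same pair of single-factor relative risks implies different joint
    # risks under the two nulls, so data consistent with one null can show
    # "interaction" relative to the other.
    print(expected_joint_rr(2.0, 3.0, "multiplicative"))
    print(expected_joint_rr(2.0, 3.0, "additive"))
```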
While results shown here have been restricted to a dichotomous exposure and a dichotomous phenotype, the same sorts of biases occur in the more general context where the exposure is continuous or even where the phenotype is quantitative. Biases can also occur in haplotype-based analyses when haplotypes under study are in LD with a causative SNP. Our simulations indicate that the Type I error rates are inflated for several haplotype-based approaches such as GEI-TRIMM23 (see Shi et al18). Thus, great care must be taken with inferences about GxE interactions when using family data. The usual analyses suggesting causative multiplicative interactions between an exposure and a genotype may simply be showing the tendency for exposure to serve as a marker for the LD relationship between the measured marker (even if a haplotype) and an unmeasured causative variant. It is worth noting that bias can occur even without differential LD in the subpopulations. Consider a scenario where a causative locus A is typed, but there is also another causative locus B. Bias can occur whenever both the haplotype frequencies and exposure prevalence differ in the two subpopulations even if the LD between loci A and B remains the same across subpopulations.27
Our proposed remedy is to adjust for a family-based measure of the exposure distribution, Ē, multiplied by genotype. This remedy works extremely well for a dichotomous exposure. If the exposure is continuous, then correcting for exposure-related population structure bias in assessing gene-by-environment interaction becomes more complex, and this problem is the subject of ongoing research.
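As a rough illustration of how the adjustment covariate might be assembled, the sketch below assumes that Ē is the mean of the dichotomous exposure across the genotyped siblings in a family, and that both G (number of variant copies) and Ē are coded as linear. Both are assumptions made here for illustration; the data layout and function names are hypothetical, and the sketch shows only the construction of the covariate, not the fitting of the genotype/exposure model.

```python
from collections import defaultdict


def family_mean_exposure(records):
    """Return {family_id: mean exposure} from (family_id, exposed) pairs.

    exposed is 0 or 1 for each sibling. For a sib pair the mean takes the
    values 0, 0.5 or 1 (assumed definition of the family-based measure).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for family_id, exposed in records:
        sums[family_id] += exposed
        counts[family_id] += 1
    return {fam: sums[fam] / counts[fam] for fam in sums}


def g_by_ebar_term(variant_copies, e_bar):
    """Linear-by-linear adjustment covariate: genotype times family exposure."""
    return variant_copies * e_bar


if __name__ == "__main__":
    # Hypothetical sib pairs: family f1 is discordant for exposure,
    # family f2 has both siblings exposed.
    records = [("f1", 1), ("f1", 0), ("f2", 1), ("f2", 1)]
    e_bar = family_mean_exposure(records)
    print(e_bar)
    print(g_by_ebar_term(2, e_bar["f1"]))
```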
Of the interaction analyses that used the GĒ-adjustment, the tetrad analysis was virtually identical to the sibpair analysis that imposed within-family G-by-E independence. The other case-sib analysis was consistently less powerful. The within-family independence assumption used here to good advantage is far less stringent than independence of genotype and exposure in the general population (the assumption required for case-only analyses) and should often be plausible. While this supports use of the case-sib method for assessing GxE, genotyping parents provides more power for testing genetic main effects and it permits additional questions to be addressed, e.g., whether there are prenatal maternally-mediated effects on risk, and whether the risk associated with a variant allele depends on the parent of origin.
While GĒ-adjustment removes bias from the assessment of GxE interaction, it also costs power in situations where adjustment is not needed. One could perform a 4-df (degree-of-freedom) likelihood ratio test by comparing the base genotype/exposure models with and without GĒ-adjustment to investigate whether inference in a particular data set will likely be subject to bias from population structure. To reduce the number of degrees of freedom, one can fit both G and Ē as linear, resulting in a 1-df likelihood ratio test. Unfortunately, even this 1-df test is not very sensitive. One can nevertheless set a liberal α-level (e.g., α=0.2) and use the model without GĒ-adjustment to achieve more power when the 1-df likelihood ratio test does not reject. Empirical Bayes approaches could also potentially help the investigator to negotiate a compromise between bias and efficiency.28
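The pretest strategy just described can be sketched as follows, assuming the maximized log-likelihoods of the models with and without GĒ-adjustment are already available from whatever software fitted them. The function names are hypothetical. The chi-squared tail probability uses exact closed forms for 1 df and for even df, which covers the 1-df and 4-df tests discussed here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable (df = 1 or even df)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        term = 1.0
        total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    raise ValueError("only df = 1 or a positive even df is supported")


def choose_model(loglik_unadjusted, loglik_adjusted, df=1, alpha=0.2):
    """Likelihood ratio pretest for whether GE-bar adjustment appears needed.

    Returns (model_to_use, lrt_statistic, p_value). The adjusted model is
    retained when the pretest rejects at the liberal level alpha.
    """
    statistic = 2.0 * (loglik_adjusted - loglik_unadjusted)
    p_value = chi2_sf(statistic, df)
    model = "adjusted" if p_value < alpha else "unadjusted"
    return model, statistic, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods from the two nested fits.
    print(choose_model(-100.0, -99.0, df=1, alpha=0.2))
    print(choose_model(-100.0, -99.5, df=1, alpha=0.2))
```

Because the pretest has limited sensitivity, a decision to drop the adjustment on this basis trades some protection against bias for power.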
The non-robustness problem highlighted here for family-based analyses of genotype-by-exposure interaction also will plague family-based analyses of genotype-by-genotype (GxG) interactions aimed at elucidating epistatic effects. Even for loci that are unlinked (e.g., on different chromosomes), analyses can generate spurious evidence for epistasis if there is genetic population structure. A robust analysis for GxG interactions could be accomplished through stratification on parental genotypes at both loci (i.e., 36 mating type strata), but this strategy would require a large sample size.
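The count of 36 strata can be reproduced by enumeration, assuming a diallelic locus has six unordered parental mating types and that strata are formed by crossing the mating types at the two loci. The variable names below are hypothetical.

```python
from itertools import combinations_with_replacement, product

# Parental genotypes coded as number of variant copies at a diallelic locus.
GENOTYPES = (0, 1, 2)

# Unordered parental mating types at one locus: (0,0), (0,1), ..., (2,2).
mating_types_one_locus = list(combinations_with_replacement(GENOTYPES, 2))

# Strata for a two-locus analysis: one mating type per locus.
two_locus_strata = list(product(mating_types_one_locus, repeat=2))

if __name__ == "__main__":
    print(len(mating_types_one_locus), len(two_locus_strata))
```

With families spread over this many strata, many strata will be sparse unless the study is large, which is the sample-size concern noted above.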
Most researchers, whether involved in the development or application of GxE methods, have mistakenly presumed that family-based methods must be robust to population structure. Under that mistaken assumption, investigators have scanned the genome SNP by SNP looking for GxE. We have shown that when the exposure participates in the population structure, the usual analyses of markers do not guarantee robust tests for GxE interaction effects. Recognition of this potential source of bias is particularly important for SNP-by-SNP analyses of family-based genome-wide association studies. Our proposed method provides one strategy for ameliorating the problem, at least for dichotomous exposures.