Family-based designs are of particular interest when studying diseases with onset early in life, such as asthma (1), autism, or birth defects. Investigators collect DNA from cases and their parents (producing a “triad”) in order to find genetic markers related to risk; one can also study maternally mediated genetic effects and parent-of-origin (imprinting) effects. Genotype relative risks can be estimated using a log-linear approach (2
One can also use case-parent triads to study multiplicative gene-by-environment interactions for a dichotomous exposure or a categorical exposure (5). Several other family-based approaches have been proposed, including the case-sibling design (7), the pseudosibling analysis (8), the family-based association test with interaction (FBAT-I) (9), and a method recently developed by Vansteelandt et al. (10
While a case-only approach could also be used, its reliance on gene/environment independence in the source population should worry a careful investigator. A triad design requires only a much weaker, within-family independence assumption and also enables assessment of genotype main effects. Thus, the case-parent design is both more robust and more informative than the case-only design.
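To make the case-only logic concrete, here is a minimal Python sketch; the function name `case_only_interaction_or` and the counts are hypothetical. Among cases alone, the odds ratio relating exposure to carrier status estimates the multiplicative interaction (on the odds-ratio scale, for a rare disease) only if genotype and exposure are independent in the source population. Nothing in the computation checks that assumption, which is exactly the vulnerability noted above.

```python
def case_only_interaction_or(a: int, b: int, c: int, d: int) -> float:
    """Exposure-genotype odds ratio computed among cases only (hypothetical helper).

    Interpreting it as a multiplicative G x E interaction ASSUMES genotype and
    exposure are independent in the source population.

    a: carriers, exposed       b: carriers, unexposed
    c: non-carriers, exposed   d: non-carriers, unexposed
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not study data.
    print(case_only_interaction_or(40, 60, 50, 150))
```

With these made-up counts the odds ratio is (40 × 150)/(60 × 50) = 2.0; a population in which carriers are more often exposed would inflate this value even with no interaction at all.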
Triad designs also offer advantages over the usual case-control design, which can be subject to self-selection, differential recall, and bias from genetic population stratification. Stratification produces bias when subpopulations with a higher prevalence of the allele also have a higher baseline risk of disease; it can also bias interaction assessments if exposure prevalence varies across subpopulations. By conditioning on parental genotypes, triad designs avoid bias from these kinds of confounding. Even if parents self-select on the basis of their own genotypes or their affected child's exposures, self-selection will not produce bias unless the decision to participate is also influenced by which of their alleles they happened to pass on to their offspring.
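Conditioning on parental genotypes works because, given the parents, the offspring genotype depends only on Mendelian transmission, which ancestry cannot alter. The following Python sketch (hypothetical helper `mendelian_probs`, assuming an autosomal diallelic marker coded as the count of variant alleles, 0, 1, or 2) enumerates the four equally likely combinations of transmitted parental alleles to obtain these transmission probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def mendelian_probs(mother: int, father: int) -> dict:
    """P(offspring variant-allele count | parental counts), autosomal diallelic marker.

    Hypothetical helper; genotypes are coded as the number of variant alleles.
    """
    for g in (mother, father):
        if g not in (0, 1, 2):
            raise ValueError(f"genotype must be 0, 1, or 2, got {g!r}")

    def alleles(g: int) -> list:
        # 1 = variant allele, 0 = reference allele
        return [1] * g + [0] * (2 - g)

    # Each parent transmits one of its two alleles with probability 1/2,
    # so the four (maternal, paternal) combinations are equally likely.
    counts = Counter(m + f for m, f in product(alleles(mother), alleles(father)))
    return {g: Fraction(n, 4) for g, n in sorted(counts.items())}
```

For example, two heterozygous parents give probabilities 1/4, 1/2, and 1/4 for 0, 1, and 2 variant alleles, whereas a homozygous-variant by homozygous-reference mating always yields a heterozygous child and therefore carries no information about transmission.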
Case-control designs applied to diseases with early-life onset also typically do not enable assessment of important potential confounders and contributors to risk, such as prenatal effects of the maternal genotype and imprinting effects. In addition, for a study of gene-by-environment interaction, family-based designs generally offer better power than would a case-control design for the same number of cases (6). On the other hand, case-control designs, unlike family-based designs, enable estimation of the exposure main effect in addition to the interaction effect. This advantage carries a cost, however: the exposure main effect must then be modeled, and misspecifying the main effect of a continuous exposure can bias the estimated interaction.
In this paper, we describe how a method previously developed for studying genetic effects on a quantitative trait (12) can be used to assess gene-by-environment interaction involving continuous or categorical exposures. The method uses an approach we call quantitative polytomous logistic (QPL) (12). Suppose an autosomal diallelic marker, such as a single nucleotide polymorphism, is studied in case-parent triads and an exposure is measured for the cases. The exposure may be either one experienced via the mother during pregnancy or one experienced later by the offspring. We show that the interaction assessed by QPL corresponds to the usual multiplicative interaction. To accommodate families with a missing genotype, we use an expectation-maximization algorithm (13
The proposed approach requires conditional independence of the exposure and the offspring's genotype, given the parental genotypes, meaning that the exposure must not predict genetic transmissions from parents to offspring in the population at large, conditional on the parental genotypes. This assumption is much weaker than assuming independence of the exposure and genotypes in the population at large (as required with a case-only study), though it can be violated for genes that can influence exposure, such as genes governing alcohol metabolism.
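To see why this assumption matters, suppose, purely for illustration, that the risk for an offspring with variant-allele count g and exposure z is proportional to exp{α(z) + β_g + γ_g z}, with β_0 = γ_0 = 0. Under the conditional independence assumption, the probability that an affected offspring has genotype g, given parental genotypes (m, f) and exposure z, is proportional to M(g | m, f) exp(β_g + γ_g z), where M is the Mendelian transmission probability. The exposure main effect α(z) cancels, so it never has to be specified, and γ_g is the log of the multiplicative interaction per unit of exposure. The Python/NumPy sketch below evaluates this conditional log-likelihood; the function names are hypothetical, it is our illustrative reading of a polytomous logistic model with Mendelian offsets rather than the authors' QPL implementation, and it omits the expectation-maximization step for missing genotypes.

```python
import numpy as np


def mendelian_matrix(mother, father):
    """Rows = triads; columns = P(offspring variant count = 0, 1, 2 | parents)."""
    pm = np.asarray(mother, dtype=float) / 2.0  # P(mother transmits the variant)
    pf = np.asarray(father, dtype=float) / 2.0  # P(father transmits the variant)
    return np.column_stack([
        (1 - pm) * (1 - pf),
        pm * (1 - pf) + (1 - pm) * pf,
        pm * pf,
    ])


def polytomous_loglik(params, child, mother, father, z):
    """Conditional log-likelihood of offspring genotypes given parents and exposure.

    ASSUMED parameterization: params = (beta1, beta2, gamma1, gamma2);
    genotype 0 is the reference, gamma_g is the log interaction per unit z.
    """
    beta1, beta2, gamma1, gamma2 = params
    z = np.asarray(z, dtype=float)
    child = np.asarray(child, dtype=int)
    lin = np.column_stack([
        np.zeros_like(z),
        beta1 + gamma1 * z,
        beta2 + gamma2 * z,
    ])
    with np.errstate(divide="ignore"):
        # log(0) = -inf marks genotypes incompatible with the parents
        logw = np.log(mendelian_matrix(mother, father)) + lin
    # Log-sum-exp over the genotypes compatible with each mating type
    top = logw.max(axis=1, keepdims=True)
    log_denom = top[:, 0] + np.log(np.exp(logw - top).sum(axis=1))
    log_num = logw[np.arange(child.size), child]
    return float(np.sum(log_num - log_denom))
```

Maximizing this function (for example, by passing its negative to a general-purpose optimizer such as `scipy.optimize.minimize`) would give estimates of β_1, β_2, γ_1, and γ_2. Triads from uninformative matings, where only one offspring genotype is possible, contribute exactly zero.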
We use simulations to compare the power and robustness of the proposed method with those of 2 other analytic approaches based on family data, the FBAT-I (9) and the quantitative transmission disequilibrium test (QTDT) (14), and also with the usual case-control (15) and case-only (16) approaches. To illustrate our proposed method, we test for gene-by-environment interaction between maternal smoking and polymorphisms in a gene involved in detoxification of smoking products, in a study of cleft lip/palate.
Some explanation for our application of the QTDT is needed. The QTDT uses a linear model in which the dependent variable is a quantitative trait, parental genotypes are included as covariates, and one tests for a predictive effect of the offspring genotype on the offspring trait. Here we simply replace the quantitative trait with the exposure, in effect testing whether, among cases and conditional on the parental genotypes, the inherited genotype predicts the exposure. Under a no-interaction null hypothesis, genotype and exposure should be independent among cases, given the parental genotypes. The method recently proposed by Vansteelandt et al. (10) uses a similar strategy. We do not simulate that method, however, because its current implementation does not permit saturation of the base model in genotype main effects, and thus the test built on it could often be invalid.
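The exposure-as-trait idea behind our use of the QTDT can be sketched as ordinary least squares. In this simplified Python/NumPy version (hypothetical function `transmission_regression`), the cases' exposure is regressed on the offspring's variant-allele count with the parents' summed allele count as a covariate, and the estimated offspring coefficient and its t statistic are returned. Because the expected offspring count given the parents is (m + f)/2, the offspring coefficient is driven by within-family deviations from Mendelian expectation. This is an assumption-laden stand-in, not the QTDT software, which offers additional modeling options (for example, variance components for family correlation).

```python
import numpy as np


def transmission_regression(child, mother, father, z):
    """OLS of exposure z on offspring allele count, adjusting for parental counts.

    Hypothetical simplification of the exposure-as-trait idea.
    Returns (coefficient for the offspring count, its t statistic).
    """
    child = np.asarray(child, dtype=float)
    parents = np.asarray(mother, dtype=float) + np.asarray(father, dtype=float)
    z = np.asarray(z, dtype=float)
    # Design matrix: intercept, offspring count, summed parental count
    X = np.column_stack([np.ones_like(child), child, parents])
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)  # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(coef[1]), float(coef[1] / np.sqrt(cov[1, 1]))
```

Under the no-interaction null and the conditional independence assumption, the offspring coefficient should be near zero; a clearly nonzero t statistic indicates that, within families, the transmitted genotype predicts the cases' exposure.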
The case-sibling design will also not be considered further here, because for young-onset diseases many families will not have an unaffected offspring available; in addition, the exposure (particularly maternal exposures incurred during pregnancy) will tend to be overmatched in this context. The pseudosibling approach is also not considered, because for complete data it is virtually identical in performance to the polytomous logistic method we will describe, but unlike our approach it cannot readily accommodate incompletely genotyped persons.