Studies of GxE interaction are critical in advancing our understanding of the etiology of orofacial clefts (Zhu et al., 2009), and yet there have been few successes in actually identifying these important interactions (Khoury and Wacholder, 2009). At least part of the problem has been that the development of robust statistical methods and models for identifying GxE interaction has not kept pace with the rapid advances in molecular genetics (Birnbaum et al., 2009; Grant et al., 2009; Beaty et al., 2010; Mangold et al., 2010; Dixon et al., 2011; Rahimov et al., 2012). Furthermore, the combination of inadequate sample size, study heterogeneity and differential assessments of environmental exposures continues to challenge studies of GxE interaction (Clayton and McKeigue, 2001; Thomas, 2010; Weinberg, 2009). Novel study designs that increase power but are less prone to confounding from stratification, such as the hybrid design developed in this study, are important in advancing the study of GxE interaction.
The self-administered questionnaires used in this study contained a much more comprehensive list of environmental exposures than the six maternal exposures examined here. Instead of a haphazard exploration of all possible combinations of genes and exposures, however, we chose to narrow down our search for GxE interaction by studying only those exposures that had already shown an association with clefting in the same study population (Lie et al., 2008; Deroo et al., 2008; Wilcox et al., 2007; Johansen et al., 2008). Despite this more targeted approach, there was little statistical evidence overall for GxE interactions between the six maternal first-trimester exposures and the 334 cleft candidate genes tested in our data. One possible exception was an interaction between the T-domain transcription factor gene T-box 4 (TBX4; chr 17q21-q22) and dietary folate in isolated CPO. TBX4 is a member of an evolutionarily highly conserved family of genes that regulate key developmental processes (King et al., 2006). The mouse homolog, Tbx4, regulates limb development and specification of limb identity (Duboc and Logan, 2011). This gene also maps to the 17q21 region that has previously shown significant results in a meta-analysis of 13 genome-wide linkage scans for CL/P (Marazita et al., 2004). Finally, this chromosomal region is also syntenic with the region harboring the mouse clf1 mutation (Juriloff, 1996). Although these studies indicate that TBX4 might be more relevant to CL/P than CPO, its connection to maternal dietary folate in our data will need to be verified in other datasets before we can categorically dismiss it as a false positive.
Several studies have previously used the case-parent triad design to investigate GxE interactions in clefting (Jugessur et al., 2003; Shi et al., 2007; Wu et al., 2010). Case-parent triads allow a range of causal scenarios to be investigated with relatively high precision (Gjessing and Lie, 2006). These include fetal and maternal gene-effects, parent-of-origin effects, gene-gene (GxG) interaction, and GxE interaction. For GxE interaction, one compares the transmission of a particular allele or haplotype to an affected offspring between triads of exposed and unexposed mothers. A statistically significant difference between the two transmission patterns would suggest a multiplicative interaction. The use of case-parent triads also overcomes the problem of population stratification by effectively using non-transmitted parental alleles as controls, to be compared with the alleles transmitted to the case child. As both the case and control alleles derive from the same individuals, they are thus guaranteed to be selected from the same population subgroup.
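The comparison of transmission patterns between exposed and unexposed triads can be illustrated with a simplified, count-based sketch. It is not the log-linear likelihood model used in HAPLIN; it assumes a TDT-style summary in which, among heterozygous parents, the ratio of transmitted to non-transmitted copies of an allele estimates the relative risk under a multiplicative model. The function name and the counts in the usage lines are hypothetical.

```python
import math

def transmission_interaction(t_exp, nt_exp, t_unexp, nt_unexp):
    """Compare allele transmission between exposed and unexposed triads.

    t_*  : copies of the allele transmitted by heterozygous parents
    nt_* : copies not transmitted by heterozygous parents
    All four counts are assumed to be positive.
    """
    rr_exp = t_exp / nt_exp          # TDT-style relative risk, exposed stratum
    rr_unexp = t_unexp / nt_unexp    # TDT-style relative risk, unexposed stratum
    log_ratio = math.log(rr_exp / rr_unexp)
    # Large-sample standard error of the log ratio of the two transmission odds
    se = math.sqrt(1 / t_exp + 1 / nt_exp + 1 / t_unexp + 1 / nt_unexp)
    z = log_ratio / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return {"rr_ratio": rr_exp / rr_unexp, "se_log": se, "z": z, "p": p_value}

# Hypothetical counts, for illustration only
result = transmission_interaction(60, 40, 50, 50)
print(result)
```

A ratio of relative risks near 1 indicates similar transmission in both strata, that is, no evidence of multiplicative interaction.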
Despite these attractive attributes, a notable limitation of the case-parent triad design is its inability to assess the main effects of an environmental exposure. Comparing genetic effects in the exposed and unexposed triads may reveal interactions, but says nothing about whether the environmental exposure is protective or deleterious. While the case-parent triad design protects against population stratification, it has in general a somewhat lower efficiency than a case-control design. As a consequence, various “hybrid designs” have been proposed to combine the merits of the case-parent triad and case-control designs. The full hybrid design involves complete case-parent triads together with complete control-parent triads (not necessarily the same number of controls as cases), but truncated versions that include leaving out the control child and genotyping only his/her parents, leaving out the control-father, or using case-mother dyads together with control-mother dyads have also been proposed (Kazma et al., 2011; Vermeulen et al., 2009; Weinberg et al., 2011; Shi et al., 2008).
Because the hybrid design involves independent controls, it adds more statistical power to the analyses and allows an estimation of the main effects of an exposure. A complete case-parent triad provides two transmitted case alleles and two non-transmitted control alleles, but adding a complete control-parent triad adds four independent control alleles. Since the alleles carried by the control child are already present in his/her parents, a complete control-parent triad thus counts as two full independent controls. Although the hybrid design gains advantages from both the case-parent and the case-control designs, it is also to some extent influenced by population stratification. Since it incorporates a case-control component, the bias in the latter may show up in the overall estimate. While the effect is lower than for the case-control design alone, it may still be noticeable.
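The allele bookkeeping described above can be written out as a small sketch. The function name and the triad numbers in the usage lines are hypothetical and serve only to make the arithmetic explicit, per autosomal locus and assuming complete triads.

```python
def hybrid_allele_counts(n_case_triads, n_control_triads):
    """Tally alleles contributed by complete triads in a full hybrid design."""
    return {
        # Each case child carries two alleles transmitted by the parents
        "transmitted_case_alleles": 2 * n_case_triads,
        # The two parental alleles not passed on serve as matched controls
        "nontransmitted_control_alleles": 2 * n_case_triads,
        # Control parents carry four alleles; the control child adds none
        "independent_control_alleles": 4 * n_control_triads,
        # Four alleles correspond to two diploid controls per control triad
        "independent_control_equivalents": 2 * n_control_triads,
    }

# Hypothetical sample sizes, for illustration only
print(hybrid_allele_counts(n_case_triads=100, n_control_triads=200))
```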
The HAPLIN implementation of the hybrid design uses a full maximum-likelihood log-linear model setup, as a direct extension of the case-parent triad model originally implemented in HAPLIN (Gjessing and Lie, 2006). The implementation makes the standard “rare disease assumption” which allows relative risks and odds ratios to be used interchangeably. This assumption is reasonable for orofacial clefts given the relatively low overall risks of CL/P and CPO, thus enabling the relative risk estimates from the case-parent triads to be combined with odds ratio estimates deriving from the case-control comparison. An advantage of the complete maximum-likelihood framework is that imputation of missing genotype data and haplotype reconstruction can be done using the EM algorithm. There is, however, always a price to pay in the form of increased computation time due to more complex model implementation. An additional advantage of the log-linear model setup is that we can obtain explicit estimates for relative risks with asymptotic standard errors. Based on this, interactions may be quantified by computing ratios of each stratum relative risk to a reference stratum. Testing can be performed using the likelihood ratio, Wald, and score tests.
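The rare disease assumption can be checked numerically. The sketch below converts a relative risk into the corresponding odds ratio for a given baseline risk; the function name and the baseline risks used are illustrative assumptions, not values taken from this study.

```python
def odds_ratio_from_relative_risk(baseline_risk, relative_risk):
    """Odds ratio implied by a relative risk at a given baseline risk."""
    exposed_risk = baseline_risk * relative_risk
    if not (0 < baseline_risk < 1 and 0 < exposed_risk < 1):
        raise ValueError("risks must lie strictly between 0 and 1")
    odds_exposed = exposed_risk / (1 - exposed_risk)
    odds_baseline = baseline_risk / (1 - baseline_risk)
    return odds_exposed / odds_baseline

# Illustrative baseline risks: a rare outcome versus a common one
rare = odds_ratio_from_relative_risk(0.002, 2.0)
common = odds_ratio_from_relative_risk(0.2, 2.0)
print(f"rare outcome:   OR = {rare:.3f}")
print(f"common outcome: OR = {common:.3f}")
```

For the rare outcome the odds ratio stays very close to the relative risk of 2, whereas for the common outcome the two diverge noticeably, which is why the assumption matters when combining triad-based and case-control estimates.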
The Wald test for interaction is a flexible approach that allows any set of estimated parameters to be compared across the strata of exposure. In the present application, we only analyzed a single parameter – the log relative risk for a given SNP variant under the assumption of a multiplicative dose response. If haplotype risks were estimated, one might equally well test relative risks linked to more than one haplotype. Similarly, one might test for different haplotype frequencies across strata. The model setup is also simplified by estimating the same model over all strata independently. As an additional check of the software implementation and estimation, we performed a likelihood ratio test for the same interactions. The LRT requires the null model to be estimated explicitly, in this case with a model assuming different SNP frequencies but the same SNP relative risk across strata. The results from the LRT were nearly identical to the Wald test, as would be expected in our model framework.
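For a single parameter, such as the log relative risk of a SNP variant, a Wald test of equality across exposure strata can be sketched as follows. This is a generic heterogeneity statistic under the assumption that the stratum estimates are independent, as they are when the same model is fitted separately in each stratum; it is not extracted from the HAPLIN source, and the input values are hypothetical.

```python
import math

def wald_heterogeneity(log_rrs, std_errors):
    """Wald chi-square for equal log relative risks across independent strata.

    Returns the statistic and its degrees of freedom (number of strata - 1).
    """
    if len(log_rrs) != len(std_errors) or len(log_rrs) < 2:
        raise ValueError("need matching estimates for at least two strata")
    weights = [1 / se ** 2 for se in std_errors]   # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, log_rrs)) / sum(weights)
    statistic = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_rrs))
    return statistic, len(log_rrs) - 1

# Hypothetical stratum estimates: RR = 1.0 (unexposed) and RR = 2.0 (exposed)
stat, df = wald_heterogeneity([math.log(1.0), math.log(2.0)], [0.2, 0.2])
print(f"chi-square = {stat:.3f} on {df} df")
```

With two strata the statistic equals the squared difference of the log relative risks divided by the sum of their variances. A p-value follows from the chi-square distribution with the returned degrees of freedom.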
When testing interactions with a continuous exposure variable, the variable can be grouped into suitable categories, each one large enough that the asymptotic properties of the estimator would be expected to hold true. To increase power, one might test for a trend-type relationship, assuming systematically increasing or decreasing genetic effects in the exposure categories as described in “Model implementation” under “Statistical analysis” in the Methods section. In our setting, we decided to dichotomize all exposures, using a cut-off value consistent with previously detected exposure effects on the risk of clefts. shows the cut-off values used for each type of exposure.
Even though the hybrid design affords more statistical power to detect GxE interaction, it may still be too limited with respect to our sample size. Appropriate control for multiple testing places additional constraints on what effect sizes are possible to detect, even in a candidate-gene setting such as in the present study. Our power simulations explore a selection of scenarios with varying size of the relative risk associated with a SNP and the proportion of SNPs exhibiting this strength of association. The results in show that we should be able to detect all but the smallest relative risks and proportion of SNPs. We recently studied the role of maternal smoking and variants in nicotine dependence genes using another novel approach that involves the use of instrumental variables (IV) (Wehby et al., 2011). Under the IV model, maternal smoking before and during pregnancy increased the risk of clefting by about 4-5 times at the sample average smoking rate, substantially higher than that found with classical analytic models. This may be because the usual models cannot account for self-selection into smoking based on unobserved confounders. Therefore, a relative risk of 1.8 to 2.0 in our power calculations may be well within the range of expected risks.
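The way multiple testing constrains detectable effect sizes can be illustrated with an analytical approximation. The sketch below is not the simulation procedure used in this study: it assumes a two-sided Wald test on a log ratio of relative risks with a known standard error and a Bonferroni-corrected significance level. The standard error and the number of tests are hypothetical.

```python
import math
from statistics import NormalDist

def wald_power(log_ratio, std_error, n_tests, alpha=0.05):
    """Approximate power of a two-sided Wald test after Bonferroni correction."""
    normal = NormalDist()
    alpha_per_test = alpha / n_tests
    z_crit = normal.inv_cdf(1 - alpha_per_test / 2)
    shift = abs(log_ratio) / std_error   # expected z-score under the alternative
    # Probability that the test statistic falls in either rejection region
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)

# Hypothetical inputs: interaction ratio of 2.0, standard error of 0.2
for n_tests in (1, 100, 1000):
    power = wald_power(math.log(2.0), 0.2, n_tests)
    print(f"{n_tests:5d} tests: power = {power:.2f}")
```

Power falls as the number of tests grows, so only larger interaction effects remain detectable at a fixed sample size.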
Comparing the results of GxE interaction across studies is rarely straightforward, not only because of important differences in study design and methodology, but also because of differences that are unique to the specific populations studied (Clayton and McKeigue, 2001; Thomas, 2010; Weinberg, 2009). The availability of case-parent triads and independent controls (or control families) increases the power to detect interactions while retaining a degree of protection against population stratification. Even in a candidate gene study, correction for multiple testing leads to non-significant overall results, and statistical power to detect interactions is frequently lost due to rare exposure categories or low allele frequencies. While these challenges are well understood, it is still remarkable that orofacial clefts, a phenotype of supposedly very high heritability, remain so hard to decipher.
In conclusion, identifying GxE interactions in complex traits is still fraught with difficulties, but their identification is essential to applying the findings to improve diagnosis, prognosis and therapies/prevention. As large sample sizes continue to accrue through collaboration and biorepositories, statistical power will increase to detect the effects of GxE interaction. The application of powerful new methodologies, such as the one outlined herein, will enable investigators to detect those effects more efficiently while balancing the realities of phenotyping and genotyping costs.