We have shown that it is possible to screen for gene–environment interactions by integrating results from GWAS and EWAS. Our most promising results are candidates for prospective studies in additional independent cohorts.
We chose environmental factors and SNPs with strong evidence for marginal associations in EWAS and GWAS. However, it would also be possible to evaluate interactions between factors that lack strong marginal evidence. Given the small marginal effects for most common SNPs, many genuine associations do not reach GWS and remain false negatives (Ioannidis et al. 2011). Some may have strong interactions with environmental factors (Khoury and Wacholder 2009), and may only be discovered if appropriate joint environmental variables are considered. However, choosing them from millions of non-GWS SNPs would be a significant challenge. In addition, testing for interactions is power-intensive (Hunter 2005), and testing a large number would impose a significant power and multiplicity burden (Thomas 2010). It has been argued that strict Bonferroni multiplicity corrections need not be used when considering factors derived from previous observations (Rothman 1990). However, we counter that interaction effects need not exist between factors that have robust evidence from EWAS and GWAS. Further, by estimating the FDR, we present a more powerful way to prioritize findings than the Bonferroni correction.
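To illustrate why an FDR-based criterion can retain more candidate interactions than a Bonferroni threshold, the following minimal sketch compares the two on a set of made-up p values. It uses the Benjamini-Hochberg step-up procedure as a stand-in; this is an assumption for illustration and not necessarily the FDR estimator used in this study. The function names and the p values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag tests whose p value is at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Flag tests under the Benjamini-Hochberg step-up procedure.

    Sort the p values, find the largest rank k such that
    p_(k) <= (k / m) * alpha, and flag every test ranked 1..k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


# Hypothetical p values for ten interaction tests (not results from this study).
example_p = [0.001, 0.008, 0.012, 0.03, 0.2, 0.5, 0.7, 0.9, 0.95, 0.99]

n_bonferroni = sum(bonferroni_reject(example_p))
n_bh = sum(benjamini_hochberg_reject(example_p))
print(n_bonferroni, n_bh)
```

With these illustrative values, the Bonferroni threshold of 0.005 retains one test, whereas the Benjamini-Hochberg procedure at the same nominal level retains three. The price is a different error guarantee: an expected proportion of false discoveries rather than control of any false positive.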
Selecting environmental factors to test for interaction is even more difficult. In contrast to SNPs, there is no high-throughput platform that captures environmental factors with low measurement error. This lack of measurement capacity limits the data available for interaction testing.
We were able to use a prior EWAS to systematically screen 266 environmental factors, measured in serum and urine, for association with T2D. We selected five factors with the strongest support for further testing. An advantage of our approach is that it allows for hypothesis generation while keeping the total number of tests lower than testing all possible factor pairs. However, it is still very important to account for multiple hypothesis testing. We used multiplicity correction and FDR, but other approaches may also be employed (Ioannidis 2006). Other alternatives exist to filter the hypothesis space of interactions, such as prioritizing interacting factors based on evidence of physical or toxicological interaction (Patel et al. 2012a).
There were other challenges in this study. First, we had low-to-moderate power to detect moderate interaction effects for some of the interactions we tested. Not surprisingly, the p values and effect sizes of results were modest and only one survived Bonferroni correction. We also obtained modest FDR estimates for the other highest-ranking interactions. However, we observed that the top interactions between these SNPs and EWAS factors were stronger than the interactions between any of the same SNPs and other conventional risk factors for T2D, such as caloric intake, BMI, and physical fitness. We conclude that our top findings are strong candidates for extensive validation through replication in higher-powered investigations.
Replication studies can investigate trends in SNP interactions with various environmental entities in populations of different ancestry. Population stratification (Smith et al. 2007) is one source of bias in estimating the phenotypic effects of SNPs. Although our analysis adjusted and stratified for race, to date, the SNPs identified by GWAS are best characterized in Caucasian populations. Genetic effects for GWAS-discovered markers may be different in other groups (Hayes et al. 2007; Ioannidis 2009; Shu et al. 2010; Tsai et al. 2010; Unoki et al. 2008; Yamauchi et al. 2010). For example, one study of African–American heart disease patients replicated 17 SNPs found in subjects of European descent. The study identified only one SNP (rs7903146 TCF7L2) associated with T2D in African–Americans from a list of 15 SNPs common to this study, including rs13266634 (SLC30A8) (Lettre et al. 2011). Little is known about gene–environment interactions in populations of different ancestry, and this question should be investigated.
The potential imbalance in sample size across interaction tests was a limitation of this study. Ideally, each interaction pair should be tested in the same participants. However, NHANES subjects did not all undergo the same tests. Our smallest subsamples were those with Heptachlor Epoxide and PCB170. These factors gave high marginal effects, but their analyses had lower power relative to other subsamples. Our results may be biased and less generalizable than those from tests with larger sample sizes.
There are few documented examples of interactions between GWS SNPs and diverse environmental or dietary factors in T2D (Cornelis et al. 2009). We have been able to hypothesize about possible new ones. For example, the strongest evidence for interaction in our data was between rs13266634, a non-synonymous coding SNP in the SLC30A8 gene, and three nutrient factors: trans-β-carotene, cis-β-carotene, and γ-tocopherol. SLC30A8 is expressed in pancreatic islets and localized in insulin secretory granules of islet β cells. It appears to modulate insulin secretion and storage (Chimienti et al. 2004). Several reports have found diet-dependent glucose intolerance and insulin secretion abnormalities in SLC30A8 knockout mice (Lemaire et al. 2009; Nicolson et al. 2009; Pound et al. 2009). rs13266634 has been associated with T2D in numerous GWAS [e.g., Sladek et al. (2007), Table S1], and can influence insulin secretion following glucose challenge (Staiger et al. 2007). Thus, this SNP may be important in T2D pathogenesis. Our study enabled us to hypothesize that impaired insulin secretion driven by rs13266634 may increase T2D risk if combined with high or low levels of specific nutrients.
Alternatively, γ-tocopherol and β-carotene may be markers of other dietary components. β-Carotene is a lipid-soluble dietary factor correlated with fruit and vegetable consumption (Block et al. 2001), components that are associated with T2D prevention (Carter et al. 2010). In contrast, the richest sources of γ-tocopherol include soybean oils and margarine (Wagner et al. 2004), components with higher fatty acid content. Fatty acids influence β-cell function and have even been shown to potentiate insulin secretion among individuals genetically predisposed to T2D (Ashcroft and Rorsman 2012). Of interest, vitamin E appears to modify GWAS-identified SNPs associated with serum lipid levels, metabolic traits that are risk factors for T2D (Dumitrescu et al. 2012).
One hypothesis under debate regarding the etiology of T2D is the thrifty genotype hypothesis, in which T2D risk genotypes provided advantages for indigenous human populations. In the present environment of more readily available nutrients and calories, these thrifty genotypes have become risk genotypes. However, evidence to support the existence of such thrifty genes or their interactions with these environmental factors and behaviors is lacking. To this end, competing hypotheses have emerged, including the “thrifty phenotype” (Hales and Barker 2001) and “drifty genotype” (Speakman 2008), whereby predisposition to metabolic diseases is a result of a mismatch in nutrition environments between early (pre-childhood) and adult life or of random genetic drift, respectively. Further, more recent events in human history, such as famine, may have played a role in enriching thrifty genes in certain populations (Diamond 2003). One reason for the lack of formal evidence to support these hypotheses may be that other constituents of the modern lifestyle, such as those indicated by EWAS (in addition to higher overall energy intake), may be interacting with genotypes that conferred advantages to early human populations. Future studies should examine the role of other indicators of modern lifestyle and environment in T2D, as we have attempted here.
There was some unavoidable asymmetry in our selection of SNPs and environmental factors. We chose SNPs with documented robust associations with T2D and environmental factors with strong associations to T2D in NHANES. Only three variants were significantly associated with T2D overall, and only two were significantly associated with T2D in race-stratified analyses. This pattern was anticipated, given the small marginal effects of these genetic factors.
While interactions may be informative of causality (Davey Smith 2010), these findings are subject to bias. For environmental factors, confounding and reverse causality are major issues (Ioannidis et al. 2009). Little is known about the causal nature, if any, of the relationship between these factors and T2D (Song et al. 2009). Our findings must be confirmed in independent, larger populations. Prospective studies will be critical.
The SNPs we examined may have robust marginal associations with T2D, but they might only tag the actual causal SNP. Our power is decreased for tagging SNPs that are not in complete linkage disequilibrium with the causal SNP. More importantly, etiological inference might be hindered if the causal SNP is unknown.
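The loss of power from incomplete linkage disequilibrium can be sketched numerically. The example below assumes the commonly used approximation that, for a 1-degree-of-freedom association test, the noncentrality parameter observed at a tag SNP is roughly r² times the noncentrality parameter at the causal SNP. Whether this carries over unchanged to interaction tests is an assumption made here for illustration only; the function names and input values are hypothetical and do not come from this study.

```python
from statistics import NormalDist


def power_1df(ncp, alpha=0.05):
    """Power of a two-sided 1-df test with noncentrality parameter `ncp`.

    A 1-df noncentral chi-square statistic is the square of a normal
    variable with mean sqrt(ncp) and unit variance, so the power can be
    computed from the standard normal distribution.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


def power_at_tag(ncp_causal, r2, alpha=0.05):
    """Approximate power at a tag SNP in LD (r squared = `r2`) with the causal SNP.

    Assumption: the noncentrality parameter is attenuated by a factor of r2.
    """
    return power_1df(ncp_causal * r2, alpha)


# Hypothetical inputs: noncentrality of 10 at the causal SNP, varying LD.
for r2 in (1.0, 0.8, 0.5):
    print(r2, round(power_at_tag(10.0, r2), 3))
```

Under this approximation, a tag SNP with r² = 0.5 behaves as though the study had half its effective sample size for that locus, which is one reason modest interaction estimates at tag SNPs may understate the signal at the causal variant.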
Nevertheless, these findings may have important implications for personalized medicine (Chan and Ginsburg 2011) or the “missing heritability” debate (Manolio et al. 2009). For example, Roberts et al. (2012) have recently quantified the difficulty in predicting disease risk using entire genomes of individuals. However, Roberts et al. (2012) considered only genetic or environmental main effects, not interactions. On the other hand, Aschard et al. (2012) recently provided theoretical arguments that gene–environment interactions are unlikely to improve risk prediction. However, only a limited number of interactions (maximum of 10) were considered in these simulations. It is possible that inclusion of many interaction effects may improve prediction. We hypothesize that the lack of predictive capacity observed in the Roberts et al. investigation and predicted by the Aschard et al. simulations arises from not considering multiple interactions between environmental exposures and the genome. To test empirically the hypothesis that multiple interactions may influence heritability estimates, we would require relatedness information between participants, which is currently unavailable in NHANES. Further, to test whether multiple interactions influence risk prediction, we would require samples with the same environmental and genetic measures for all participants. Nevertheless, we demonstrate one way of identifying multiple interactions to test in these contexts in future investigations.
Infrastructure-related challenges remain in this area (Hunter 2005). First, unlike for common SNPs (Hindorff et al. 2009a), we lack a complete list of candidate environmental factors. Screening and validating gene–environment interactions is power-intensive, and will require both environmental and genetic measures to be collected in multiple studies (Ioannidis et al. 2009), augmentation of GWAS with environmental data (Khoury and Wacholder 2009), and adoption of measurement standards (e.g., Hamilton et al. 2011). A systematic approach to investigating the interactions of environment and the individual genome may help explain a substantial component of disease risk, lead to hypotheses regarding disease pathology, or help shed light on the debate on the genetic basis of disease (Gibson 2011).