|Home | About | Journals | Submit | Contact Us | Français|
In the presence of epistasis multilocus association tests of human complex traits can provide powerful methods to detect susceptibility variants. We undertook multilocus analyses in 1924 type 2 diabetes cases and 2938 controls from the Wellcome Trust Case Control Consortium (WTCCC). We performed a two-dimensional genome-wide association (GWA) scan using joint two-locus tests of association including main and epistatic effects in 70,236 markers tagging common variants. We found two-locus association at 79 SNP-pairs at a Bonferroni-corrected P-value = 0.05 (uncorrected P-value = 2.14 × 10−11). The 79 pair-wise results always contained rs11196205 in TCF7L2 paired with 79 variants including confirmed variants in FTO, TSPAN8, and CDKAL1, which are associated in the absence of epistasis. However, the majority (82%) of the 79 variants did not have compelling single-locus association signals (P-value = 5 × 10−4). Analyses conditional on the single-locus effects at TCF7L2 established that the joint two-locus results could be attributed to single-locus association at TCF7L2 alone. Interaction analyses among the peak 80 regions and among 23 previously established diabetes candidate genes identified five SNP-pairs with case-control and case-only epistatic signals. Our results demonstrate the feasibility of systematic scans in GWA data, but confirm that single-locus association can underlie and obscure multilocus findings.
Epistasis, or gene–gene interaction, has recently received much attention in gene-mapping studies in human genetics (Carlborg & Haley, 2004; Marchini et al., 2005; Evans et al., 2006; Cordell, 2009). Conventional methods used to locate genes involved in human traits are primarily based on single-locus analyses and often ignore the joint effects of multiple genes on the phenotype. However, studies in model organisms have shown that epistatic interactions can have large effects, may occur frequently, and can involve more than two loci (see Mackay, 2001; Carlborg & Haley, 2004). For example, novel loci that act through epistatic pathways have been identified through multidimensional scans of complex traits in the mouse (Sen & Churchill, 2001; Shimomura et al., 2001), rat (Sugiyama et al., 2001; Ways et al., 2002), chicken (Carlborg et al., 2003; Carlborg et al., 2004), Drosophila (Montooth et al., 2003), and in plants (Holland et al., 1997; Li et al., 1997). These studies suggest that multiple interacting genes can influence complex phenotypes often in the absence of significant single-locus effects. Epistasis in this context typically refers to the interaction between genetic variants at multiple loci, or to the presence of non-additive genetic variance in the population. In the present study, we define epistasis as the statistical interaction between genetic variants at multiple loci, and the epistatic variance as the genetic variance not attributed to single-locus additive and dominance effects at the population level.
The findings of epistatic interactions across multiple species imply that epistasis should be taken into account when searching for susceptibility loci for human complex traits, particularly within the context of recent genome-wide association (GWA) analyses. Strategies to incorporate epistasis within GWA scans have been examined and results indicate that in the presence of epistasis, association studies of complex human traits can have more power to detect susceptibility variants under multilocus models (Marchini et al., 2005; Evans et al., 2006). However, if variants at two loci interact, then the estimated multilocus variance-components, and especially the observed epistatic variance, will depend on allele frequency (Lynch & Walsh, 1998; Hill et al., 2008). In particular, Hill et al., (2008) show under several population genetic models that if the allele frequency distribution is taken into account, a large proportion of the population genetic variance will be additive, even in the presence of epistasis. These results suggest that joint multilocus tests, including both main and epistatic effects, may prove advantageous over other multilocus approaches, such as epistasis-only or conditional analyses, for identifying novel susceptibility loci in GWA data even in the presence of interactions.
We considered joint multilocus effects on a genome-wide level with respect to type 2 diabetes (T2D), which is one of the most common heritable complex traits with rising prevalence throughout the world. To date over 30 T2D susceptibility loci have been identified (Saxena et al., 2007; Scott et al., 2007; Sladek et al., 2007; Steinthorsdottir et al., 2007; WTCCC, 2007; Zeggini et al., 2007; Unoki et al., 2008; Yasuda et al., 2008; Zeggini et al., 2008; Kong et al., 2009; Rung et al., 2009; Dupuis et al., 2010; Voight et al., 2010), but these variants explain only a small proportion of the heritability of T2D. We aimed to assess the contribution of genetic variants to the disease, incorporating the possibility of epistatic interactions within the GWA analyses by using the logistic regression framework. We performed a systematic two-locus GWA scan including epistasis in T2D cases and controls used in the WTCCC GWA scan.
The study consisted of 1924 T2D cases and 2938 controls used in the WTCCC GWA scan and genotypes assayed on the Affymetrix 500K platform. We considered genotype data for a subset of 393,143 common (minor allele frequency (MAF) ≥ 0.05) SNPs. These SNPs passed stringent quality control criteria as previously described (WTCCC, 2007; Zeggini et al., 2008; 2007), including checks for Hardy-Weinberg equilibrium in the controls, and checks for differential missing rates in cases and controls.
Analyses were performed on a subset of SNPs in the data (tag-SNPs) that tagged genetic variation at all remaining genotyped variants in the T2D cases and WTCCC controls. We chose to analyse tag-SNPs as a compromise between reducing computational load and maintaining power to detect multilocus effects, although selecting tag-SNPs alone will reduce the power to detect both joint two-locus effects and non-additive genetic variance components (Hill et al., 2008). Our primary goal was to perform a genome-wide scan to test for evidence for joint two-locus association, rather than focus on specific pairs of regions using all available genetic variation. We defined tag-SNPs based on a pairwise r2 of 0.2 in the WTCCC data. This resulted in a final set of 70,236 tag-SNPs for the two-locus GWA analyses.
The goal of the study was to identify novel T2D loci with evidence for joint multilocus effects. We first fitted a two-locus joint-effects test including main and epistatic components for all pairs of non-syntenic tag-SNPs across the genome. We did not perform an epistatic-only GWA scan because our aim was to identify T2D susceptibility variants with joint multi-locus effects rather than identify evidence for epistasis alone. We applied a Bonferroni correction threshold (P = 2.14 × 10−11) to obtain genome-wide significant pairs of candidate loci with joint-effects. We obtained 79 genome-wide significant marker-pairs, for which we then fitted a series of nested models: single-locus, epistatic, and two-locus models conditional on the main effects at TCF7L2. We used different multiple testing thresholds to assess results from the nested models. In the single-locus analyses we used previously published thresholds (WTCCC, 2007) to qualify genome-wide significance (P = 5.00 × 10−7) and a relaxed threshold to establish suggestive evidence for association (P = 5.00 × 10−4). In the two-locus test conditional on the main effects at TCF7L2 we used a Bonferroni corrected threshold (P = 7.51 10−7) to establish significance. In the epistasis analyses we used Bonferroni corrected thresholds for tests focusing on the peak 79 pairs of markers from the joint-effects analyses (P = 6.30 × 10−4). Lastly, we also performed epistatic-analyses specific to the 23 confirmed T2D regions available in our data (see Epistasis-only analyses Methods section) and applied a Bonferroni correction for multiple testing in these analyses (P = 1.97 × 10−4). Our two-locus scan strategy required only moderate computational demands, because the joint two-locus test is an exact test that has a closed form solution to the maximum likelihood, and the majority of the two-locus analyses over the genome were performed using this test. However, the epistasis-only tests were computationally intensive, therefore we only tested for pair-wise epistasis among the peak joint two-locus results and 23 confirmed T2D regions.
We performed case-control two-locus analyses within the logistic regression framework. Our principal two-locus GWA analysestook into account the main effects and all possible interaction terms between two loci in their contribution to the phenotype. The contribution of two loci to the linear predictor of T2D status in individual i can be described as
where β0 is the intercept, β1 and β3 are the main-effects additive effects at loci 1 and 2, β2 and β4 are the main-effects dominance effects at loci 1 and 2, and β5–β8 are the additive-by-additive (β5), additive-by-dominance (β6), dominance-by-additive (β7), and dominance-by-dominance (β8) epistatic effects. We scaled the genotypes at each of the two biallelic loci on additive (XA) and dominance (XD) scales. For example, for biallelic locus 1 with genotypes 11, 12, and 22, on the additive scale we counted the number of copies of the minor (2) allele (11:XA1 = 0, 12:XA1 = 1, 22:XA1 = 2) and on the dominance scale we compared heterozygote to homozygote genotypes (11:XA1 = 0, 12:XA1 1, 22:XA1 = 0). In the principal two-locus GWA analyses we compared the likelihood of the full two-locus model to the null model with no genetic effects (including just the β0 term) to assess the evidence for joint two-locus effects on T2D. Genome-wide significance thresholds were obtained by Bonferroni correction for the overall number of tests performed (2.34 × 109), where a nominal P = 2.14 × 10−11 corresponded to a corrected P = 0.05.
Single locus analyses were performed within the logistic regression framework by assessing the fit of the main additive and dominant effects at each locus. In particular, at each locus we compared the likelihood of the following model:
where β1 and β2 were the main additive and dominance effects at that locus, to the likelihood under a null model with no genetic effects.
For the 80 peak regions of interest we wished to assess whether the two-locus signals were primarily due to the main effects at TCF7L2. We performed two-locus analyses conditional on TCF7L2 main effects, by comparing the fit of model (1) to the fit of model (2), where β1 and β2 in model (2) were the main-effect additive and dominance effects specifically at TCF7L2. To correct for multiple testing in these conditional analyses, we applied a Bonferroni correction for the number of pair-wise tests genome-wide that involve the tag-SNP rs11196205 in TCF7L2 (66,451 tests, excluding tests within chromosome 10), resulting in a nominal P = 7.51 × 10−7 corresponding to corrected P = 0.05.
The significant findings from the joint two-locus analyses included marker-pairs that were always in combination with tag-SNP rs11196205 in TCF7L2, which was the strongest single-locus association signal for T2D in these data. Previous studies show that when single-locus effects are large relative to interaction effects and particularly if they are sufficient to identify at least one of the loci in the pair, interaction-based approaches are not more powerful than single-locus approaches in identifying susceptibility loci (Marchini et al., 2005). To explore this further in our dataset, we performed genome-wide permutations of the data by keeping the genotypes at rs11196205 in TCF7L2 constant and permuting the genotypes at the secondary locus in the pair, but preserving LD across the genome, for 1000 replicates. For each replicate we obtained the joint two-locus and single-locus test statistics. Using the 1000 replicates we then obtained the null distribution of the joint two-locus test statistic under a single-locus null model. The resulting permutation-based multiple testing threshold for the joint-two locus test under a null single-locus model was P = 1 × 10−14 and corresponded to a corrected P = 0.05.
In the top 79 marker-pairs from the joint two-locus test, we evaluated the nominal significance of the interaction components alone to estimate the evidence of epistasis. We assessed the significance of the additive-by-additive epistasis variance term for case-control analyses as implemented within PLINK v1.06 (Purcell et al., 2007). PLINK interaction analyses in the case-control setting were also performed for variants in 23 of 28 established T2D candidate loci at the time of these analyses (Saxena et al., 2007; Scott et al., 2007; Sladek et al., 2007; Steinthorsdottir et al., 2007; WTCCC, 2007; Zeggini et al., 2007; Unoki et al., 2008; Yasuda et al., 2008; Zeggini et al., 2008; Rung et al., 2009; Kong et al., 2009; Dupuis et al., 2010). We selected 23 T2D loci by selecting regions that were well represented by tagSNPs on the Affymetrix 500K array (by using HapMap CEU as a reference panel) and that passed quality control criteria in our study. We chose one tag-SNP per T2D candidate region (Table S1) and performed all pair-wise comparisons resulting in 253 tests between 23 candidate regions.
We then proceeded to test for interaction in the T2D cases only, because case-only analyses may provide an advantage over case-control analyses in terms of detecting interactions that contribute to disease-susceptibility (Yang et al., 1999). Case-only analyses were first performed for the 79 pair-wise regions of interest in PLINK. For both case-control and case-only analyses we applied a multiple-testing correction for the number of SNP pairs (79), setting the corrected epistasis P-value threshold to P = 0.00063. Case-only analyses were then also performed for the 253 pair-wise tests between the 23 T2D candidate regions.
For SNP-pairs of interest we also estimated genotype relative risks (GRRs) by using the odds ratios corresponding to the two-locus genotypes. We fit the full two-locus logistic regression model with respect to the two-locus genotype to obtain the odds ratios of T2D for individuals from a particular two-locus genotype relative to the baseline genotype, which was the rare double homozygote. Overall, the standard errors associated with the odds ratios were consistent among the eight remaining genotype classes. The odds ratios were used as estimates for the two-locus GRRs. We also estimated power to detect joint two-locus effects under one of the top five epistatic two-locus models in our dataset shown in inFigureFigure 2. Assuming a population prevalence of 2.4% for T2D (from Barroso, 2005) and the two-locus GRRs and allele frequencies from Figure 2, we simulated 1924 cases and 2938 controls for 1000 replicates. Simulated genotypes were analysed using the joint-effects two-locus test and the two-locus test conditional on TCF7L2.
We performed a two-dimensional genome-wide scan of T2D using a joint two-locus test of association including main and epistatic effects in a subset of 70,236 markers tagging common SNP variation at r2 > 0.2 and MAF ≥ 0.05. We examined 2.34 × 109 comparisons involving pairs of inter-chromosomal SNPs, and found that 79 pairs showed evidence for two-locus association at a Bonferroni-corrected P-value = 0.05, or uncorrected P-value = 2.14 × 10−11 (Fig. S1). The 79 pairs consisted of 80 markers (Fig. 1), where all of the 79 pairs involved loci that showed evidence for joint effects on diabetes in cooperation with rs11196205 in TCF7L2, which had the strongest single-locus effects (Table 1, Table S2). In CEPH individuals from the HapMap Project (Frazer et al., 2007), marker rs11196205 is in linkage disequilibrium with previously established T2D variants in TCF7L2, rs7903146 (D′ = 1, r2 = 0.47) and rs7901695 (D′ = 1, r2 = 0.497). Overall, we observed an enrichment of two-locus signals over the expectation under the null model that neither locus contributes to T2D, as shown in the quantile–quantile plot of these two-locus GWA results (Fig. S1).
The peak joint pair-wise results included confirmed variants in FTO, TSPAN8, and CDKAL1, which exhibited significant main-effects and no evidence for epistasis with rs11196205 in TCF7L2. However, the majority (82%) of the 79 tag-SNPs did not have compelling single-locus evidence for association (P-value = 5 × 10−4), suggesting the possibility that epistatic effects may contribute to T2D susceptibility.
Single-locus analyses for rs11196205 in TCF7L2 resulted in the strongest evidence for single-locus association (P = 6.99 × 10−11). Because TCF7L2 was the strongest single-locus association signal in the data, we next assessed whether the 79 peak two-locus signals were primarily due to the main effects at TCF7L2 using a TCF7L2 conditional test. None of the 79 pairs of loci surpassed genome-wide significance for the conditional test (Fig. S1), indicating that the majority of the genome-wide significant joint two-locus signals can be attributed to the main effects at TCF7L2 alone. In addition, we also performed genome-wide permutations of the data under a null model where we kept the single-locus genotypes at rs11196205 in TCF7L2, but randomized the effect at the secondary locus across the genome. The resulting permutation-based joint two-locus threshold under this single-locus null model (P = 1 × 10−14) was more extreme than the observed joint two-locus findings, confirming the conditional test findings that the joint two-locus effects in our data are predominantly due to TCF7L2 alone (Fig. S1).
However, the two-locus analyses highlighted pairs of loci where the secondary locus did not have single-locus evidence for association, but where the joint two-locus results were genome-wide significant and the TCF7L2 conditional results for the pair were strongly suggestive of association. For example, in SNP-pairs rs1122637 (RPP40) – rs11196205 (TCF7L2), rs12447958 (A2BP1) – rs11196205 (TCF7L2), and rs2945770 (ST3GAL1) – rs11196205 (TCF7L2), the main effects at the secondary locus were not significant, but there was moderate evidence for association in the conditional analyses (P = 5 × 10−4). These findings imply that epistatic interactions may contribute to the joint two-locus effects and suggest further analyses assessing the significance of the interaction terms alone in the peak 79 SNP-pairs.
We performed analyses assessing the evidence for gene–gene interactions alone, both in a case-control design and within the type 2 diabetes cases only, for the peak 79 pairs of loci (Table S2), testing specifically the additive-by-additive interaction component. In the case-control analyses 27 pairs of SNPs surpassed nominal significance (P = 0.05) for additive-by-additive epistasis and in the case-only analyses, there were 21 pairs, of which 17 overlapped with the case-control epistatic findings. When we took into account multiple testing, five of the 79 pairs of SNPs showed evidence for additive-by-additive epistasis both in the case/control analyses and in analyses restricted to the T2D cases alone (Table 2). In these five SNP-pairs both the case-only and the case-control epistasis findings were nominally significant, and at least one surpassed the multiple threshold correction. The five loci pairs included rs11196205 (TCF7L2) in combination with tag-SNPs located near SLC9A9 (rs17635531), and regions on chromosomes 4 (rs1450140), 6 (rs1122637; rs1935683) and 11 (rs12803308). We examined pair rs1935683 (RFPL4B) – rs11196205 (TCF7L2) in more detail and obtained the odds ratios corresponding to the two-locus genotypes at the two SNPs. We used the odds ratios as estimators of the GRRs to obtain an insight into the underlying two-locus genetic model for this pair of regions (Fig. 2). Using the observed GRRs at these two loci, we then simulated case-control data to estimate power to detect these loci in our study. Power to detect multilocus joint effects at these two loci was estimated at 94.1%. However, when we tested power to detect rs1935683 (RFPL4B), conditional on TCF7L2 main effects, only 28.6% of the replicates surpassed the corrected threshold (P = 7.51 × 10−7) for the conditional test. These results indicate that the conditional test has low power to detect epistatic loci of modest single-locus effects that interact with TCF7L2, which has strong single-locus effects in T2D.
Finally, we tested for epistasis across pair-wise combinations of tag-SNPs tagging variation in 23 regions that were type 2 diabetes susceptibility loci. We did not observe evidence for additive-by-additive interaction when correcting for the number of candidate-region tests performed (nominal P = 1.97 × 10−4 corresponding to corrected P = 0.05). The peak signals in these analyses were obtained for IRS1 (rs2943634) with IGF2BP2 (rs4402960), with suggestive evidence for epistasis from both case-control (P = 2.67 × 10−4) and case-only (P = 0.01) tests, and for NOTCH2 (rs10923931) with PPARG (rs1801282) with modest evidence for epistasis from case-control analyses (P = 5.97 × 10−3).
We performed a two-locus GWA scan including epistasis in T2D by selecting variants tagging common variation across the human genome. We obtained genome-wide evidence for joint two-locus effects at 79 pairs of markers, all of which included a single variant (rs11196205) in TCF7L2. Our conditional test results and permutation results indicate that the genome-wide significant joint two-locus effects are due to the strong main-effects at TCF7L2 and not to genome-wide significant epistatic effects. We obtained interesting signals for epistasis at five pairs of regions and more detailed analyses of the SNP-pair with the strongest evidence for epistasis (rs1935683 in RFPL4B and rs11196205 in TCF7L2) gave an indication into potential underlying multilocus models.
Overall, we found weak evidence that epistatic effects contribute to the multilocus joint effects obtained in this study. These results are likely in part influenced by the strong single-locus effects of rs11196205 in TCF7L2 in our dataset. Previous studies have shown that although multilocus approaches have on average more power than single-locus approaches to detect susceptibility loci in the presence of epistasis, that is not necessarily the case when one of the loci in the pair has strong single-locus effects that are large relative to the interaction effects and are sufficient to detect this locus in a single-locus GWAS (Marchini et al., 2005). Our findings confirm that in multilocus association testing very strong single locus effects can obscure the two-locus positive results. For example, for one of the observed suggestive models of epistasis in Figure 2, we estimated over 94% power to detect susceptibility variants under the joint two-locus test, however, power to detect the effects of the secondary locus (rs1935683 in RFPL4B) in the pair under the conditional test was only 28.6%. These results suggest that although interactions may contribute to T2D, there is currently insufficient power to detect them even in the single largest GWA study for T2D to date (WTCCC, 2007). Power to detect loci in the presence of epistasis depends on many factors, including sample-size, underlying model of epistasis, and disease allele frequencies at the two loci. Several approaches to detect genetic interactions exist (see Cordell, 2009) and the majority of methods used in our study focus on detecting susceptibility variants in the presence of epistasis, rather than targeting interactions alone. Tests for epistasis in GWA data may prove challenging, both computationally and due to the effect of allele frequency on the corresponding epistatic variance components. Computationally tractable methods to detect epistasis alone have previously been developed (Ritchie et al., 2001; Purcell et al., 2007; Steffens et al., 2010; Wan et al., 2010), and one of these has been applied to the T2D data used in our study (Wan et al., 2010) also reporting negative evidence for epistasis among genome-wide SNPs at least 1Mb away from each other. These findings confirm our results for the conditional and epistasis-only tests, when interpreted in a genome-wide context. In our study, single-locus tests together with the conditional test indicate the relative magnitude of epistatic effects. The joint two-locus analyses may provide a sense of the proportion of two locus effects in the genome that are epistatic, for example, the joint two-locus test results and associated quantile–quantile plot show that the majority of the deviation from the null model of no two-locus association is due to TCF7L2 alone, although there is some minimal enrichment of two-locus signals for tests excluding TCF7L2 (Fig. S1). Based on these findings, it is unlikely that interactions between common variants will be detected at genome-wide significant thresholds in present-day single GWA scans, even if they contribute appreciably to the missing heritability in T2D. Replication of our negative findings is important, but the replication sets for our study will be smaller than the original sample set used, therefore obscuring the distinction between lack of power to detect signals and negative results. Larger sample sizes and meta-analysis approaches may help detect epistatic effects, but do not address the problems of low power to detect epistasis under certain underlying genetic models, and interpreting the epistatic effects in the context of the allele frequency spectrum. Furthermore, the best strategy to perform meta-analyses on results from multiple two-locus GWA scans remains unclear.
Although we did not find evidence for epistasis genome-wide, our few interaction-only analysis findings point towards loci for which there is weak or no evidence for single-locus association. Previous studies have shown that association analyses can detect large epistatic effects in the complete absence of main effects (Culverhouse et al., 2002), but another factor which can influence the power of multilocus GWA search strategy is the presence of LD between marker and disease locus (Marchini et al., 2005). We applied a tagging approach to select a subset of markers for the joint two-locus GWA tests, as this approach significantly reduced the computational load of the analyses. However, tagging may both reduce power to detect multilocus association, and may yield misleading inferences from interaction follow-up results due to the effect of linkage disequilibrium on the underlying two-locus models (Lynch & Walsh, 1998; Hill et al., 2008). Future studies need to improve our understanding of the effect of linkage disequilibrium on the epistatic components of the test statistic under various underlying models of epistasis.
The extent to which findings of statistical interactions relate to biological epistasis remains unclear. As pointed out by Lynch and Walsh (1998), unless the allele frequencies are known, the estimated variance components provide limited insight into the biological gene action. Some approaches incorporate additional biological data to prioritize, improve, or interpret statistical interaction analyses (Aylor & Zeng, 2008; Emily et al., 2009). However, there are many genetic models of epistasis for which the biological mechanisms are difficult to envision. The optimal approach would be to test for statistical epistasis in a situation where biological epistasis exists, unfortunately, examples of such studies are rare. The underlying genetic models for most complex traits are likely models of higher-dimensional biological complexity. However, an assessment of pair-wise joint effects is useful as a first step towards obtaining further insight into the genetic etiology of common complex traits.
We thank J. Broxholme for computational assistance and J. Randall for helping create Figure 1. JTB is supported by a Sir Henry Wellcome Trust Fellowship. NJT is supported by the MRC as part of the MRC CAiTE Centre (G0600705). EZ and APM are supported by the Wellcome Trust (Grant nos. WT088885/Z/09/Z to EZ and WT081682/Z/06/Z to APM). This study makes use of data generated by the Wellcome Trust Case-Control Consortium. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under award 076113.