We first used data from a yeast linkage study, consisting of 112 segregants derived from a cross of a yeast lab strain (S288C) and a wild isolate, as described in 
, to study epistatic effects. We initially attempted a naïve method for detecting epistatic effects that searches over all possible SNP pairs. For 2,931 markers, there are 4,293,915 SNP pairs in total. Using 10,000 permutations, at cutoffs defined by the 1% and 0.1% quantiles of the permutation statistics (see Methods
), we estimated an FDR of 96% and 99%, respectively. The number of SNP pairs selected at these thresholds closely matches the number of SNP pairs expected by chance.
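The pair count quoted above is the number of unordered marker pairs, and the permutation cutoff can be read as an empirical quantile. The snippet below is a minimal sketch of both ideas, not the code used in the study; the function names and the toy permutation p-values are assumptions introduced for illustration.

```python
import math

def n_snp_pairs(n_markers):
    """Number of unordered SNP pairs among n_markers markers."""
    return math.comb(n_markers, 2)

def quantile_cutoff(perm_pvalues, q):
    """Empirical q-quantile of permutation p-values (hypothetical helper).

    Roughly a fraction q of the permutation p-values are at or below
    the returned value.
    """
    ordered = sorted(perm_pvalues)
    k = max(round(q * len(ordered)) - 1, 0)
    return ordered[k]

# Yeast data set: 2,931 markers.
print(n_snp_pairs(2931))  # 4293915

# Toy permutation p-values (assumed, evenly spaced) standing in for
# the statistics from 10,000 permutations.
toy_perm = [i / 10000 for i in range(1, 10001)]
cutoff_1pct = quantile_cutoff(toy_perm, 0.01)
cutoff_01pct = quantile_cutoff(toy_perm, 0.001)
print(cutoff_1pct, cutoff_01pct)
```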
Next, we tried three strategies to reduce the number of SNP pairs under consideration: marginal-by-marginal (MM), marginal-by-genomewide (MG) and STRING (ST). We tested the epistasis model (see Methods
) using a subset of SNP pairs defined by each strategy. The MM strategy selects a set of SNP pairs such that both SNPs in the pair are associated with the trait at a given significance level, determined from the one-dimensional regression model (see Methods
). The MG strategy selects a set of SNP pairs such that at least one SNP in the pair is associated with the trait at a given significance level. The ST strategy selects pairs of genes with protein-protein interactions in the STRING database that pass a given significance threshold, and includes all SNP pairs that map to a selected gene pair.
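The three subset definitions can be sketched as simple set constructions. This is an illustrative sketch under stated assumptions, not the implementation used in the study: the SNP and gene names, the marginal p-values, the interaction scores and the `min_score` threshold are all hypothetical toy values.

```python
from itertools import combinations

def mm_pairs(marginal_p, alpha):
    """MM: both SNPs in the pair are marginally associated at level alpha."""
    hits = sorted(s for s, p in marginal_p.items() if p <= alpha)
    return set(combinations(hits, 2))

def mg_pairs(marginal_p, alpha):
    """MG: at least one SNP in the pair is marginally associated."""
    snps = sorted(marginal_p)
    return {(a, b) for a, b in combinations(snps, 2)
            if marginal_p[a] <= alpha or marginal_p[b] <= alpha}

def st_pairs(interaction_scores, snps_by_gene, min_score):
    """ST: all SNP pairs mapping to a gene pair that passes the threshold."""
    pairs = set()
    for (g1, g2), score in interaction_scores.items():
        if score < min_score:
            continue
        for a in snps_by_gene.get(g1, []):
            for b in snps_by_gene.get(g2, []):
                if a != b:
                    pairs.add(tuple(sorted((a, b))))
    return pairs

# Hypothetical toy inputs.
marginal_p = {"snp1": 1e-5, "snp2": 0.2, "snp3": 1e-4, "snp4": 0.6}
interaction_scores = {("geneA", "geneB"): 0.9, ("geneA", "geneC"): 0.3}
snps_by_gene = {"geneA": ["snp1", "snp2"], "geneB": ["snp4"], "geneC": ["snp3"]}

mm = mm_pairs(marginal_p, alpha=1e-3)
mg = mg_pairs(marginal_p, alpha=1e-3)
st = st_pairs(interaction_scores, snps_by_gene, min_score=0.7)
print(len(mm), len(mg), len(st))
```

By construction every MM pair is also an MG pair, so at a fixed significance level MG always tests at least as many pairs as MM.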
We compared the strategies by their estimated FDR, as shown in . We define a p-value cutoff by the 0.1% quantile of the permutation statistics, which fixes the expected number of false positives. Overall, the MM and MG strategies tend to have lower FDRs than the ST strategy. For the MM and MG strategies, the FDR tends to increase with the number of tests. This trend reflects the multiple-testing burden: as the number of tests increases, it becomes harder to distinguish true interaction effects from those expected by chance. Indeed, we found that selecting a very small number of marginal SNPs gives the smallest FDR.
Comparison of the FDR (determined at cutoffs corresponding to the 0.1% quantile of permutation p-values) for detecting interactions in yeast gene expression data among the different subset strategies.
We also compared each strategy with an appropriate control strategy (MM0
, respectively; see Methods
and for further details). The goal is to assess whether the information in the candidate strategies aids in the detection of epistatic effects. The performance relative to 500 randomly chosen respective controls is given in . Both the MM and MG strategies result in a lower FDR than the random control (p
0.17, respectively for approximately 3,000 tests). In contrast, the ST strategy does not outperform the random control at any significance threshold (p
1.0). The FDR as a function of the p-value cutoff is shown in Figure S1.
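The comparison against random controls can be read as an empirical p-value: the fraction of control subsets whose FDR is at least as low as that of the candidate strategy. The sketch below illustrates this reading under stated assumptions; the function name, the exact counting rule and the synthetic control FDR values are illustrative and are not taken from the Methods.

```python
import random

def empirical_p(candidate_fdr, control_fdrs):
    """Fraction of random controls with an FDR at least as low as the candidate's."""
    as_good = sum(1 for f in control_fdrs if f <= candidate_fdr)
    return as_good / len(control_fdrs)

# Synthetic FDRs for 500 random control subsets (assumed values, for illustration only).
rng = random.Random(0)
control_fdrs = [rng.uniform(0.5, 1.0) for _ in range(500)]

p_better = empirical_p(0.40, control_fdrs)  # candidate beats every control
p_worse = empirical_p(1.0, control_fdrs)    # candidate beats no control
print(p_better, p_worse)
```

Under this counting rule, a strategy that is never better than its random controls receives a p-value of 1.0.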
Illustration of comparisons between subset strategies and control strategies for the MM strategy.
The best performing strategy was the MM strategy. The FDR was 39.7%, with 10 hits selected at this cutoff, whereas the expected number of false positives was below four. A plot of the most significant interaction from the MM strategy is given in .
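The relation between the reported FDR, the number of hits and the expected number of false positives is simple arithmetic. The sketch below assumes the FDR is estimated as the expected number of false positives divided by the number of selected pairs; the function names are hypothetical and the formula is an assumption rather than the definition given in the Methods.

```python
def estimated_fdr(expected_false_positives, n_selected):
    """Assumed FDR estimate: expected false positives over selected pairs."""
    if n_selected == 0:
        return 0.0  # convention used here when nothing is selected
    return min(1.0, expected_false_positives / n_selected)

def implied_false_positives(fdr, n_selected):
    """Expected number of false positives implied by an FDR and a hit count."""
    return fdr * n_selected

# MM strategy in yeast: FDR of 39.7% with 10 hits.
implied = implied_false_positives(0.397, 10)
print(round(implied, 2))  # just under four expected false positives
```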
The data supporting the most significant interaction from the MM strategy is shown here.
We then tried a similar approach to systematically compare the performance of the different strategies on human data. However, we faced considerable limitations, including greater computational and statistical complexity and stronger environmental effects. We chose a restricted set of 297,153 HapMap SNPs (see Methods
), corresponding to 44 billion possible SNP pairs. As in yeast, we performed permutation tests to assess the significance of the interaction test statistics. However, even with substantial computational resources, we were able to perform only 1,000 permutations. With fewer permutations, we must use a less stringent p-value threshold to assess significance, and therefore have less power to separate the strongest signals from noise. We note that in the yeast analysis, using the same (1%) quantile cutoff as for the human data (instead of the 0.1% quantile), we do not observe improved performance of the MM or MG strategies compared to the ST strategy, with all strategies giving FDRs of 78–100%.
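Both the size of the search space and the effect of the permutation budget can be checked with a short calculation. The sketch below is illustrative only; `tail_count` is a hypothetical helper that counts how many permutation statistics are expected to lie beyond a given quantile, which indicates how stably that quantile can be estimated.

```python
import math

def tail_count(n_permutations, quantile):
    """Expected number of permutation statistics beyond the given quantile."""
    return n_permutations * quantile

# Search space for 297,153 SNPs: about 44 billion unordered pairs.
human_pairs = math.comb(297_153, 2)
print(f"{human_pairs:,}")

# With 1,000 permutations the 0.1% quantile rests on about one statistic,
# whereas the 1% quantile rests on about ten.
for n_perm in (1_000, 10_000):
    for q in (0.01, 0.001):
        print(n_perm, q, tail_count(n_perm, q))
```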
Applying the naïve method, which searches over all possible SNP pairs, at a p-value cutoff defined by the 1% quantile of the permutation statistics, we find an FDR of 82%. Thus, of the 12 results selected, only about two are expected to be true.
Next, we compared the performance of the MM, MG and ST strategies. As shown in , we do not see any trend suggesting that any one of the MM, MG or ST strategies achieves superior performance.
Comparison of the FDR (determined at cutoffs corresponding to the 0.1% quantile of permutation p-values) for detecting interactions in human gene expression data among the different subset strategies.
To allow a larger number of permutations, we evaluated the MM strategy on a very limited set of SNPs. We applied the MM strategy using the top 5,000 marginally significant SNPs for each gene expression trait, corresponding to 1.2 million SNP pairs, and assessed significance using 10,000 permutations. At the 0.1% quantile of the permutation statistics we estimate an FDR of 0.33, detecting three putative hits of which approximately one is expected by chance (Figure S2).
Details of the genes and SNPs associated epistatically are given in . A plot of the FDR at different p-value thresholds is given in Figure S3
. We tested the epistatic association with expression levels of HLA-DRB1 in other populations, obtaining p-values of 0.0055, 0.0098, 0.015, 0.00012, 0.93, 0.40 and 0.59 for CHB, JPT, GIH, MEX, LWK, MKK and YRI, respectively, indicating that the interaction replicates well across non-African populations (see also Figure S4
). We estimate the percent variance explained by the epistasis term to be 9.3%, 7.9%, 7.0%, and 28.7% in CHB, JPT, GIH, and MEX, respectively. We asked whether the interaction effect for HLA-DRB1 would disappear once dominant or recessive effects are taken into account. We found that the interaction remains significant (p
1.631e-11, CEU). Both SNPs fall within CNV regions based on the Database for Genomic Variants 
. However, the Hardy-Weinberg p-values are not significant (p
0.646), indicating that these SNPs are unlikely to fall within a copy number variant region.
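One way to think about the variance explained by the epistasis term, and about the check for dominant or recessive effects, is as a comparison of nested regression models. The sketch below assumes a linear model with additive genotype codes (0, 1, 2), optional heterozygote indicators for dominance, and a product term for the interaction; this model form, the helper names and the simulated data are assumptions for illustration and may differ from the model described in the Methods.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def rss(design, y):
    """Residual sum of squares of the least-squares fit of y on the design."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(design, y))

def design_matrix(g1, g2, interaction, dominance):
    """Build rows: intercept, additive terms, optional dominance and product terms."""
    rows = []
    for a, b in zip(g1, g2):
        row = [1.0, float(a), float(b)]
        if dominance:
            row += [float(a == 1), float(b == 1)]  # heterozygote indicators
        if interaction:
            row.append(float(a * b))
        rows.append(row)
    return rows

def interaction_pve(g1, g2, y, dominance=False):
    """Share of total variance gained by adding the interaction term."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)
    reduced = rss(design_matrix(g1, g2, False, dominance), y)
    full = rss(design_matrix(g1, g2, True, dominance), y)
    return (reduced - full) / tss

# Simulated genotypes and expression values (assumed, not real data).
rng = random.Random(1)
n = 300
g1 = [rng.choice([0, 1, 2]) for _ in range(n)]
g2 = [rng.choice([0, 1, 2]) for _ in range(n)]
y = [a * b + rng.gauss(0, 0.5) for a, b in zip(g1, g2)]

pve_additive = interaction_pve(g1, g2, y)
pve_dominance = interaction_pve(g1, g2, y, dominance=True)
print(round(pve_additive, 3), round(pve_dominance, 3))
```

In this simulation the interaction share stays clearly above zero after the dominance terms are added, which is the pattern one would expect if an interaction is not merely an artifact of dominant or recessive marginal effects.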
Details of the two hits discovered by the MM strategy in a human CEU eQTL dataset.