The identification of genetic susceptibility loci for human complex diseases has been rather successful due to the ability to combine different genome-wide association studies via meta-analyses. In the quest for the missing heritability, genome-wide association interaction studies have become increasingly popular and the field shows a boost in methodological developments [14]. When lower-order effects are not appropriately accounted for in epistasis screening, derived results may not be trustworthy and conclusions about genuine epistasis may be ungrounded.
Indeed, the challenge is to find epistasis effects above and beyond singular marker contributing effects, should there be any. In this work, we investigated the power of MB-MDR for quantitative traits and unrelated individuals, while targeting gene–gene interactions accounting for potential main effects.
As was already observed in [6], MB-MDR adequately controls the type I error rate at 5% when no association is present (null data). Under additive corrections, type I error and false positive rates are high irrespective of the adjustment method considered, but they are controlled under co-dominant corrections. This is due to the existence of SNP4, which was simulated with both additive and dominance effects (heterozygote advantage). Hence, additive adjustment does not fully remove the effect of SNP4. As shown before, the consequence is that a number of SNPs appear to interact significantly with SNP4. Not surprisingly, this occurs more often under additive correction than under co-dominant correction. This is because correcting for main effects using the co-dominant model removes the entire effect of SNP4, so that false positive results arise only by chance (5% nominal error rate). When no main effects adjustment is implemented, MB-MDR gives even higher false positive rates.
The lower power profiles under co-dominant corrections are explained by the different contributions of additive and dominance effects to the total main effects variance, as already shown. When there is a substantial contribution of dominance effects, as mentioned before, additive coding does not fully remove the main effect contribution of the interacting SNPs. For instance, under M27, when the contribution of main effects is maximal (p = 0.5), almost 33% of the main effects variance is due to dominance, hence the large difference in power profiles between additive and co-dominant codings.
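The argument about incomplete additive adjustment can be illustrated with a minimal Python sketch. It uses made-up toy data (not the simulated data of this study) in which the heterozygote has the highest trait mean, and all function names and values are illustrative assumptions. Additive adjustment is taken to be a simple linear regression on the allele count (0/1/2), and co-dominant adjustment a regression on genotype indicators, which for a single SNP amounts to centring each genotype group at its own mean.

```python
from statistics import mean


def additive_residuals(genotypes, trait):
    """Residuals after regressing the trait on the allele count (0/1/2)."""
    g_bar, y_bar = mean(genotypes), mean(trait)
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotypes, trait))
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    slope = sxy / sxx
    intercept = y_bar - slope * g_bar
    return [y - (intercept + slope * g) for g, y in zip(genotypes, trait)]


def codominant_residuals(genotypes, trait):
    """Residuals after regressing on genotype indicators (group-mean centring)."""
    means = {g: mean(y for gi, y in zip(genotypes, trait) if gi == g)
             for g in set(genotypes)}
    return [y - means[g] for g, y in zip(genotypes, trait)]


def mean_by_genotype(genotypes, values):
    """Average of `values` within each genotype class."""
    return {g: mean(v for gi, v in zip(genotypes, values) if gi == g)
            for g in sorted(set(genotypes))}


# Hypothetical toy data: genotype means 0.0, 1.0 and 0.5 (heterozygote advantage).
genotypes = [0] * 4 + [1] * 4 + [2] * 4
trait = [-0.1, 0.1, -0.2, 0.2, 0.9, 1.1, 0.8, 1.2, 0.4, 0.6, 0.3, 0.7]

additive_means = mean_by_genotype(genotypes, additive_residuals(genotypes, trait))
codominant_means = mean_by_genotype(genotypes, codominant_residuals(genotypes, trait))
```

In this toy setting the residuals after additive adjustment still differ by genotype (about +0.5 for heterozygotes and about −0.25 for both homozygote classes), whereas the co-dominant residuals have genotype means of zero. Any leftover genotype signal of this kind can later be picked up as an apparent interaction.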
Interestingly, easy-to-use automatic subset selection procedures (MRAIC) and single regression-based identification of important main effects prior to MB-MDR screening result in lower power and almost zero false positive rates. Often, a list of top SNPs is generated to derive disease genetic risk scores. Some of these SNPs may reach user-defined significance, some may even reach genome-wide significance and some may not be significant at all. Hence, correcting for SNPs in such a list (e.g. top5, 10, 15) may remove more of the trait's variability than is really necessary, especially when correction for multiple testing is not performed. Note that we considered a minimum of 5 top findings since at least 4 SNPs were allowed to contribute to the main effects variance.
In order to attain sufficient power, any main effects corrective method that leads to an over-correction during epistasis screening should be avoided. All considered residual-based approaches (MRAIC, SR0.05, SRperm, SRtop5, SRtop10, SRtop15) led to uncontrolled false positive rates. This can be explained by either the way the residuals were obtained (inappropriate main effects coding) or by the non-exhaustive list of markers considered in the residual computation.
Only co-dominantly correcting for significant SNPs as an integral part of MB-MDR screening performs much better. However, the poor performance of MB-MDR1D and MB-MDRlist and the excellent performance of MB-MDRadjust in terms of controlling false positive epistasis rates support the intuition that it (only) matters to correct for those SNPs that are involved in the SNP pair under investigation, when no other SNPs are expected to modify the effect of that pair.
The aforementioned discussion clearly raises questions about how best to correct for lower-order effects when higher-order (>2) interactions are targeted. In either case, to aid in the interpretation of results, it is always good practice to assess the joint information of clusters of SNPs that contribute to the trait variability [15].
Finally, we emphasize that most statistical epistasis detection methods can be decomposed into a core component and a multiple testing correction component. Keeping the core component, but using a more refined multiple testing correction can generally enhance its performance. For instance, assumptions underlying the maxT procedure of [7] that is implemented in MB-MDR are likely to be violated for MB-MDR1D and MB-MDRlist. Indeed, the null and the alternative hypotheses per pair of SNPs under investigation are no longer the same for all interaction tests.
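To see why comparable hypotheses across tests matter, it helps to look at how a maxT-type correction pools information over all tests. The sketch below is a generic permutation-based step-down maxT adjustment written for illustration; the source does not state which variant MB-MDR uses, so this is not the MB-MDR implementation. The function name and the toy numbers are assumptions. Each observed statistic is compared with maxima taken over the permutation statistics of other tests, which is only meaningful when all tests share a comparable null distribution.

```python
def step_down_maxt(observed, permuted):
    """Step-down maxT adjusted p-values.

    observed: list of m test statistics (larger means more extreme).
    permuted: list of B lists, each holding the m statistics recomputed
              after one permutation of the trait.
    Returns the adjusted p-values in the original order of the tests.
    """
    m = len(observed)
    # Rank hypotheses from most to least extreme observed statistic.
    order = sorted(range(m), key=lambda j: observed[j], reverse=True)
    counts = [0] * m
    for perm_stats in permuted:
        running_max = float("-inf")
        # Walk from the least to the most extreme hypothesis so that
        # running_max is the maximum over all hypotheses ranked at or below it.
        for rank in range(m - 1, -1, -1):
            j = order[rank]
            running_max = max(running_max, perm_stats[j])
            if running_max >= observed[j]:
                counts[rank] += 1
    adjusted = [0.0] * m
    previous = 0.0
    for rank in range(m):
        # Enforce monotonicity of the adjusted p-values along the ranking.
        previous = max(previous, counts[rank] / len(permuted))
        adjusted[order[rank]] = previous
    return adjusted


# Hypothetical statistics for three SNP pairs and four permutations.
observed = [5.0, 1.0, 3.0]
permuted = [[1.0, 2.0, 0.5], [6.0, 0.2, 0.1], [0.3, 0.4, 3.5], [2.0, 2.0, 2.0]]
adjusted_p = step_down_maxt(observed, permuted)
```

If some pairs are tested against a different null hypothesis than others (for example because only some pairs contain an adjusted SNP), the permutation maxima no longer represent a common reference distribution, and the adjusted p-values lose their interpretation.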
In conclusion, rather than adjusting for lower-order effects prior to MB-MDR and using residuals as the new trait, or adjusting only for significant SNP(s), we advocate an “on-the-fly” main effects adjustment (MB-MDRadjust). This type of adjustment only removes potential main effects contributions in the pair under investigation but keeps the null and alternative hypotheses similar from one pair of SNPs to another. We have shown that the commonly used additive coding in the “on-the-fly” adjustment (MB-MDRadjust) is not sufficient and leads to overly optimistic results and that co-dominant adjustments are to be preferred. This will ensure an acceptable balance between type I error and power to identify the interactions.
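The idea of adjusting only for the two SNPs in the pair under investigation can be pictured with the following minimal Python sketch. It is an assumption-laden illustration, not the MB-MDRadjust code: here the co-dominant main effects of the two SNPs are removed by alternating group-mean centring (a backfitting-style substitute for fitting the two-factor main effects regression), and the adjustment is expressed through per-pair residuals purely for simplicity. All names and data are hypothetical.

```python
from statistics import mean


def centre_within(groups, values):
    """Subtract from each value the mean of its genotype group."""
    means = {g: mean(v for gi, v in zip(groups, values) if gi == g)
             for g in set(groups)}
    return [v - means[g] for g, v in zip(groups, values)]


def adjust_pair(snp_a, snp_b, trait, sweeps=100, tol=1e-12):
    """Remove co-dominant main effects of snp_a and snp_b from the trait.

    Alternates centring by each SNP's genotype groups until the residuals
    stop changing; what remains is not explained by either main effect.
    """
    resid = list(trait)
    for _ in range(sweeps):
        updated = centre_within(snp_b, centre_within(snp_a, resid))
        change = max(abs(u - r) for u, r in zip(updated, resid))
        resid = updated
        if change < tol:
            break
    return resid


# Hypothetical balanced 3 x 3 genotype grid, one individual per cell.
snp_a = [0, 0, 0, 1, 1, 1, 2, 2, 2]
snp_b = [0, 1, 2, 0, 1, 2, 0, 1, 2]
effect_a = {0: 0.0, 1: 1.0, 2: 0.5}   # includes a dominance component
effect_b = {0: 0.2, 1: 0.0, 2: -0.3}

main_only = [effect_a[a] + effect_b[b] for a, b in zip(snp_a, snp_b)]
with_epistasis = [y + (1.0 if (a, b) == (2, 2) else 0.0)
                  for y, a, b in zip(main_only, snp_a, snp_b)]

resid_main_only = adjust_pair(snp_a, snp_b, main_only)
resid_with_epistasis = adjust_pair(snp_a, snp_b, with_epistasis)
```

With main effects only, the residuals vanish, so nothing is left to be flagged as interaction. When an extra effect is added to the double-homozygote cell, part of it survives the adjustment and remains available to the interaction screen.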
Realistic settings often involve both additive and dominance genetic effects on the trait under investigation. Equivalent to our co-dominant coding, a perhaps biologically more meaningful coding involves introducing 2 variables X1 and X2 with values −1, 0, 1 and −1/2, 1/2, −1/2, respectively, for homozygous wild type, heterozygous and homozygous mutant genotypes. In such a coding scheme, both additive and dominance scales are represented. This 2-parameter coding is statistically attractive since it is invariant to allele coding (i.e. whether the homozygous wild type or the homozygous mutant genotype is coded as 1 for X1) [16]. The utility of the aforementioned coding as a way to adjust for lower-order effects in MB-MDR higher-order epistasis screening will be the subject of future research.
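The invariance property of the 2-parameter coding can be checked directly. The short Python sketch below encodes the values given above; the function name and the representation of a genotype as a count of mutant alleles (0, 1 or 2) are assumptions made for illustration.

```python
def additive_dominance_coding(mutant_allele_count):
    """Map a genotype (0, 1 or 2 mutant alleles) to the pair (X1, X2).

    X1 takes the values -1, 0, 1 (additive scale) and
    X2 takes the values -1/2, 1/2, -1/2 (dominance scale).
    """
    x1 = mutant_allele_count - 1
    x2 = 0.5 if mutant_allele_count == 1 else -0.5
    return x1, x2


# Swapping the reference allele turns a count g into 2 - g.
coding_table = {g: additive_dominance_coding(g) for g in (0, 1, 2)}
swapped_table = {g: additive_dominance_coding(2 - g) for g in (0, 1, 2)}
```

Swapping which allele is counted only flips the sign of X1 and leaves X2 unchanged, so a model containing both variables spans the same space under either allele labelling.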
Software
The MB-MDR software with the MB-MDRadjust option is available upon request from the first author (jmahachie@ulg.ac.be).