One goal of personal genomics is to use information about genomic variation to predict who is at risk for various common diseases. Technological advances in genotyping have spawned several personal genetic testing services that market genotyping services directly to the consumer. An important goal of consumer genetic testing is to provide health information along with the genotyping results. This has the potential to integrate detailed personal genetic and genomic information into healthcare decision making. Despite the potential importance of these advances, there are some important limitations. One concern is that much of the literature that is used to formulate personal genetics reports is based on genetic association studies that consider each genetic variant independently of the others. It is our working hypothesis that the true value of personal genomics will only be realized when the complexity of the genotype-to-phenotype mapping relationship is embraced, rather than ignored. We focus here on complexity in genetic architecture due to epistasis or nonlinear gene-gene interaction. We have previously developed a multifactor dimensionality reduction (MDR) algorithm and software package for detecting nonlinear interactions in genetic association studies. In most prior MDR analyses, the permutation testing strategy used to assess statistical significance was unable to differentiate MDR models that captured only interaction effects from those that also detected independent main effects. Statistical interpretation of MDR models required post-hoc analysis using entropy-based measures of interaction information. We introduce here a novel permutation test that allows the effects of nonlinear interactions between multiple genetic variants to be specifically tested in a manner that is not confounded by linear additive effects. 
We show using data simulated across 35 different epistasis models with varying effect sizes (heritabilities = 0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4) and sample sizes (n = 400, 800, 1600) that the power to detect interactions using the explicit test of epistasis is no different than a standard permutation test. We also show that the test has the appropriate size or type I error rate of approximately 0.05. We then apply MDR with the new explicit test of epistasis to a large genetic study of bladder cancer (n=914) and show that a previously reported nonlinear interaction between two XPD gene polymorphisms is indeed significant (P = 0.005), even after considering the strong additive effect of smoking in the model. Finally, we evaluated the power of the explicit test of epistasis to detect the nonlinear interaction between two XPD gene polymorphisms by simulating data from the MDR model of bladder cancer susceptibility. We show that the power to detect the interaction alone was 1.00 while the power to detect the independent effect of smoking alone was 0.06 which is close to the expected type I error rate of 0.05. Importantly, the power to detect the interaction with smoking in the model was 0.94. The results of this study provide for the first time a simple method for explicitly testing epistasis or gene-gene interaction effects in genetic association studies. An important advantage of the method is that it can be combined with any modeling approach. The explicit test of epistasis brings us a step closer to the type of routine gene-gene interaction analysis that is needed if we are to enable personal genomics.
PMCID: PMC2916690  PMID: 19908385
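The abstract above does not spell out how the explicit test of epistasis builds its null distribution. As a minimal sketch, assuming the null data sets are generated by shuffling each variant independently within cases and within controls (an assumption for illustration, with hypothetical function names), the permutation step could look like this. Each variant keeps its genotype counts per class, so main effects survive, while the pairing between variants is broken.

```python
import random
from collections import Counter


def permute_within_class(genotypes, labels, rng):
    """Shuffle every SNP column independently, separately in each class.

    genotypes: list of rows, one per subject, each a list of genotype codes.
    labels:    list of class labels (e.g. 1 = case, 0 = control).
    Hypothetical helper; the published procedure may differ in detail.
    """
    permuted = [list(row) for row in genotypes]
    n_snps = len(genotypes[0])
    for cls in sorted(set(labels)):
        rows = [i for i, y in enumerate(labels) if y == cls]
        for j in range(n_snps):
            column = [genotypes[i][j] for i in rows]
            rng.shuffle(column)  # breaks the pairing between SNPs
            for i, g in zip(rows, column):
                permuted[i][j] = g
    return permuted


def marginal_counts(genotypes, labels, snp):
    """Genotype counts of one SNP within each class (its main effect)."""
    return Counter((y, row[snp]) for row, y in zip(genotypes, labels))


# Toy data: six subjects, two SNPs coded 0/1/2 (illustrative values only).
genotypes = [[0, 2], [1, 1], [2, 0], [0, 0], [1, 2], [2, 1]]
labels = [1, 1, 1, 0, 0, 0]
shuffled = permute_within_class(genotypes, labels, random.Random(0))
```

Refitting the model on many such shuffled data sets would give a null distribution in which any remaining signal is attributable to independent effects alone.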
2.  The effect of alternative permutation testing strategies on the performance of multifactor dimensionality reduction 
BMC Research Notes  2008;1:139.
Multifactor Dimensionality Reduction (MDR) is a novel method developed to detect gene-gene interactions in case-control association analysis by exhaustively searching multi-locus combinations. While the end-goal of analysis is hypothesis generation, significance testing is employed to indicate statistical interest in a resulting model. Because the underlying distribution for the null hypothesis of no association is unknown, non-parametric permutation testing is used. Lately, there has been more emphasis on selecting all statistically significant models at the end of MDR analysis in order to avoid missing a true signal. This approach opens up questions about the permutation testing procedure. Traditionally omnibus permutation testing is used, where one permutation distribution is generated for all models. An alternative is n-locus permutation testing, where a separate distribution is created for each n-level of interaction tested.
In this study, we show that the false positive rate for the MDR method is at or below a selected alpha level, and demonstrate the conservative nature of omnibus testing. We compare the power and false positive rates of both permutation approaches and find omnibus permutation testing optimal for preserving power while protecting against false positives.
Omnibus permutation testing should be used with the MDR method.
PMCID: PMC2631601  PMID: 19116021
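The difference between the two strategies described in this abstract can be sketched in a few lines. The sketch assumes that each permutation records the best test statistic found at every interaction order, and it uses the common (exceedances + 1) / (permutations + 1) convention for the empirical p-value. The function names and toy numbers are hypothetical.

```python
def empirical_p(observed, null_stats):
    """Empirical p-value: share of null statistics at least as large as observed."""
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


def omnibus_null(best_by_order):
    """One null distribution for all models: per permutation, keep the best
    statistic over every interaction order that was searched."""
    per_order = list(best_by_order.values())
    return [max(stats) for stats in zip(*per_order)]


# Toy permutation results (illustrative only): best accuracy per permutation
# for 1-locus and 2-locus models, four permutations each.
best_by_order = {
    1: [0.52, 0.55, 0.50, 0.53],
    2: [0.56, 0.54, 0.58, 0.55],
}
observed_one_locus = 0.555

# n-locus testing compares a 1-locus model only with the 1-locus null.
p_n_locus = empirical_p(observed_one_locus, best_by_order[1])
# Omnibus testing compares it with the best model of any order.
p_omnibus = empirical_p(observed_one_locus, omnibus_null(best_by_order))
```

Because the omnibus null takes a maximum over orders, its p-value can never be smaller than the n-locus p-value for the same model, which is the conservative behaviour the abstract reports.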
3.  A General Framework for Formal Tests of Interaction after Exhaustive Search Methods with Applications to MDR and MDR-PDT 
PLoS ONE  2010;5(2):e9363.
The initial presentation of multifactor dimensionality reduction (MDR) featured cross-validation to mitigate over-fitting, computationally efficient searches of the epistatic model space, and variable construction with constructive induction to alleviate the curse of dimensionality. However, the method was unable to differentiate association signals arising from true interactions from those due to independent main effects at individual loci. This issue leads to problems in inference and interpretability for the results from MDR and its family-based complement, the MDR-pedigree disequilibrium test (PDT). A suggestion from previous work was to fit regression models post hoc to specifically evaluate the null hypothesis of no interaction for MDR or MDR-PDT models. We demonstrate with simulation that fitting a regression model on the same data as that analyzed by MDR or MDR-PDT is not a valid test of interaction. This is likely to be true for any other procedure that searches for models, and then performs an uncorrected test for interaction. We also show with simulation that when strong main effects are present and the null hypothesis of no interaction is true, MDR and MDR-PDT reject at far greater than the nominal rate. We also provide a valid regression-based permutation test procedure that specifically tests the null hypothesis of no interaction, and does not reject the null when only main effects are present. The regression-based permutation test implemented here conducts a valid test of interaction after a search for multilocus models, and can be applied to any method that conducts a search to find a multilocus model representing an interaction.
PMCID: PMC2826406  PMID: 20186329
4.  Detecting purely epistatic multi-locus interactions by an omnibus permutation test on ensembles of two-locus analyses 
BMC Bioinformatics  2009;10:294.
Purely epistatic multi-locus interactions cannot generally be detected via single-locus analysis in case-control studies of complex diseases. Recently, many two-locus and multi-locus analysis techniques have been shown to be promising for epistasis detection. However, exhaustive multi-locus analysis requires prohibitively large computational efforts when problems involve large-scale or genome-wide data. Furthermore, there is no explicit proof that a combination of multiple two-locus analyses can lead to the correct identification of multi-locus interactions.
The proposed 2LOmb algorithm performs an omnibus permutation test on ensembles of two-locus analyses. The algorithm consists of four main steps: two-locus analysis, a permutation test, global p-value determination and a progressive search for the best ensemble. 2LOmb is benchmarked against an exhaustive two-locus analysis technique, a set association approach, a correlation-based feature selection (CFS) technique and a tuned ReliefF (TuRF) technique. The simulation results indicate that 2LOmb produces a low false-positive error. Moreover, 2LOmb has the best performance in terms of an ability to identify all causative single nucleotide polymorphisms (SNPs) and a low number of output SNPs in purely epistatic two-, three- and four-locus interaction problems. The interaction models constructed from the 2LOmb outputs via a multifactor dimensionality reduction (MDR) method are also included for the confirmation of epistasis detection. 2LOmb is subsequently applied to a type 2 diabetes mellitus (T2D) data set, which is obtained as a part of the UK genome-wide genetic epidemiology study by the Wellcome Trust Case Control Consortium (WTCCC). After primarily screening for SNPs that locate within or near 372 candidate genes and exhibit no marginal single-locus effects, the T2D data set is reduced to 7,065 SNPs from 370 genes. The 2LOmb search in the reduced T2D data reveals that four intronic SNPs in PGM1 (phosphoglucomutase 1), two intronic SNPs in LMX1A (LIM homeobox transcription factor 1, alpha), two intronic SNPs in PARK2 (Parkinson disease (autosomal recessive, juvenile) 2, parkin) and three intronic SNPs in GYS2 (glycogen synthase 2 (liver)) are associated with the disease. The 2LOmb result suggests that there is no interaction between each pair of the identified genes that can be described by purely epistatic two-locus interaction models. 
Moreover, there are no interactions between these four genes that can be described by purely epistatic multi-locus interaction models with marginal two-locus effects. The findings provide an alternative explanation for the aetiology of T2D in a UK population.
An omnibus permutation test on ensembles of two-locus analyses can detect purely epistatic multi-locus interactions with marginal two-locus effects. The study also reveals that SNPs from large-scale or genome-wide case-control data which are discarded after single-locus analysis detects no association can still be useful for genetic epidemiology studies.
PMCID: PMC2759961  PMID: 19761607
5.  Multifactor dimensionality reduction: An analysis strategy for modelling and detecting gene - gene interactions in human genetics and pharmacogenomics studies 
Human Genomics  2006;2(5):318-328.
The detection of gene - gene and gene - environment interactions associated with complex human disease or pharmacogenomic endpoints is a difficult challenge for human geneticists. Unlike rare, Mendelian diseases that are associated with a single gene, most common diseases are caused by the non-linear interaction of numerous genetic and environmental variables. The dimensionality involved in the evaluation of combinations of many such variables quickly diminishes the usefulness of traditional, parametric statistical methods. Multifactor dimensionality reduction (MDR) is a novel and powerful statistical tool for detecting and modelling epistasis. MDR is a non-parametric and model-free approach that has been shown to have reasonable power to detect epistasis in both theoretical and empirical studies. MDR has detected interactions in diseases such as sporadic breast cancer, multiple sclerosis and essential hypertension.
As this method is more frequently applied, and has gained acceptance in the study of human disease and pharmacogenomics, it is becoming increasingly important that the implementation of the MDR approach is properly understood. As with all statistical methods, MDR is only powerful and useful when implemented correctly. Concerns regarding dataset structure, configuration parameters and the proper execution of permutation testing in reference to a particular dataset and configuration are essential to the method's effectiveness.
The detection, characterisation and interpretation of gene - gene and gene - environment interactions are expected to improve the diagnosis, prevention and treatment of common human diseases. MDR can be a powerful tool in reaching these goals when used appropriately.
PMCID: PMC3500181  PMID: 16595076
epistasis; multifactor dimensionality reduction; gene - gene interactions; gene - environment interactions; pharmacogenomics
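The dimensionality reduction at the heart of MDR can be illustrated with a short sketch. It assumes the usual formulation in which each multilocus genotype combination is labelled high risk when its case:control ratio exceeds a threshold (here the overall case:control ratio), and it scores the pooled variable by balanced accuracy on the same data. Cross-validation and permutation testing are omitted, and all names and data are hypothetical.

```python
from collections import defaultdict


def mdr_high_risk_cells(genotypes, labels, snps, threshold=None):
    """Return the genotype combinations labelled high risk for the given SNPs."""
    cases = defaultdict(int)
    controls = defaultdict(int)
    for row, y in zip(genotypes, labels):
        cell = tuple(row[j] for j in snps)
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    if threshold is None:
        # Assumed default: the overall case:control ratio of the data set.
        threshold = sum(cases.values()) / sum(controls.values())
    cells = set(cases) | set(controls)
    return {c for c in cells if cases.get(c, 0) > threshold * controls.get(c, 0)}


def balanced_accuracy(genotypes, labels, snps, high_risk):
    """Mean of sensitivity and specificity of the high/low-risk variable."""
    tp = fn = tn = fp = 0
    for row, y in zip(genotypes, labels):
        predicted_case = tuple(row[j] for j in snps) in high_risk
        if y == 1:
            tp += predicted_case
            fn += not predicted_case
        else:
            fp += predicted_case
            tn += not predicted_case
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Toy XOR-like data: neither SNP is informative alone, the pair is.
genotypes = [[0, 1], [1, 0], [0, 1], [1, 0], [0, 0], [1, 1], [0, 0], [1, 1]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pair_cells = mdr_high_risk_cells(genotypes, labels, snps=(0, 1))
pair_accuracy = balanced_accuracy(genotypes, labels, (0, 1), pair_cells)
single_cells = mdr_high_risk_cells(genotypes, labels, snps=(0,))
single_accuracy = balanced_accuracy(genotypes, labels, (0,), single_cells)
```

In this toy example the two-SNP model separates cases from controls while the single-SNP model does no better than chance, which is the kind of purely epistatic signal the method is designed to find.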
6.  Extension of multifactor dimensionality reduction for identifying multilocus effects in the GAW14 simulated data 
BMC Genetics  2005;6(Suppl 1):S145.
The multifactor dimensionality reduction (MDR) is a model-free approach that can identify gene × gene or gene × environment effects in a case-control study. Here we explore several modifications of the MDR method. We extended MDR to provide model selection without cross-validation, and use a chi-square statistic as an alternative to prediction error (PE). We also modified the permutation test to provide different levels of stringency. The extended MDR (EMDR) includes three permutation tests (fixed, non-fixed, and omnibus) to obtain p-values of multilocus models. The goal of this study was to compare the different approaches implemented in the EMDR method and evaluate the ability to identify genetic effects in the Genetic Analysis Workshop 14 simulated data. We used three replicates from the simulated family data, generating matched pairs from family triads. The results showed: 1) chi-square and PE statistics give nearly consistent results; 2) results of EMDR without cross-validation matched those of EMDR with 10-fold cross-validation; 3) the fixed permutation test reports false-positive results in data from loci unrelated to the disease, but the non-fixed and omnibus permutation tests perform well in preventing false positives, with the omnibus test being the most conservative. We conclude that the non-cross-validation test can provide accurate results with the advantage of high efficiency compared to 10-fold cross-validation, and the non-fixed permutation test provides a good compromise between power and false-positive rate.
PMCID: PMC1866790  PMID: 16451605
7.  Multifactor dimensionality reduction reveals a three-locus epistatic interaction associated with susceptibility to pulmonary tuberculosis 
BioData Mining  2013;6:4.
Identifying high-order genetic associations with non-additive (i.e. epistatic) effects in population-based studies of common human diseases is a computational challenge. Multifactor dimensionality reduction (MDR) is a machine learning method that was designed specifically for this problem. The goal of the present study was to apply MDR to mining high-order epistatic interactions in a population-based genetic study of tuberculosis (TB).
The study used a previously published data set consisting of 19 candidate single-nucleotide polymorphisms (SNPs) in 321 pulmonary TB cases and 347 healthy controls from Guinea-Bissau in Africa. The ReliefF algorithm was applied first to generate a smaller set of the five most informative SNPs. MDR with 10-fold cross-validation was then applied to look at all possible combinations of two, three, four and five SNPs. The MDR model with the best testing accuracy (TA) consisted of SNPs rs2305619, rs187084, and rs11465421 (TA = 0.588) in PTX3, TLR9 and DC-Sign, respectively. A general 1000-fold permutation test of the null hypothesis of no association confirmed the statistical significance of the model (p = 0.008). An additional 1000-fold permutation test designed specifically to test the linear null hypothesis that the association effects are only additive confirmed the presence of non-additive (i.e. nonlinear) or epistatic effects (p = 0.013). An independent information-gain measure corroborated these results with a third-order epistatic interaction that was stronger than any lower-order associations.
We have identified statistically significant evidence for a three-way epistatic interaction that is associated with susceptibility to TB. This interaction is stronger than any previously described one-way or two-way associations. This study highlights the importance of using machine learning methods that are designed to embrace, rather than ignore, the complexity of common diseases such as TB. We recommend future studies of the genetics of TB take into account the possibility that high-order epistatic interactions might play an important role in disease susceptibility.
PMCID: PMC3618340  PMID: 23418869
Epistasis; Gene-gene interactions; Machine learning; Pulmonary tuberculosis
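The effect of filtering on the size of the search in this study can be checked with a small enumeration. The sketch below counts the two- to five-SNP models over the five retained SNPs and, for comparison, over all 19 candidates. The SNP labels are placeholders rather than the identifiers selected in the study.

```python
from itertools import combinations
from math import comb

# Placeholder labels for the five SNPs kept after filtering.
filtered_snps = ["snp_a", "snp_b", "snp_c", "snp_d", "snp_e"]

# Every combination of two, three, four and five SNPs is one candidate model.
models = [
    combo
    for order in range(2, 6)
    for combo in combinations(filtered_snps, order)
]
n_models_filtered = len(models)

# The same search over all 19 candidate SNPs, counted without enumerating.
n_models_unfiltered = sum(comb(19, order) for order in range(2, 6))
```

The exhaustive search over the filtered set is small enough to repeat within every cross-validation fold and every permutation, which is presumably part of the motivation for filtering first.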
8.  FAM-MDR: A Flexible Family-Based Multifactor Dimensionality Reduction Technique to Detect Epistasis Using Related Individuals 
PLoS ONE  2010;5(4):e10304.
We propose a novel multifactor dimensionality reduction method for epistasis detection in small or extended pedigrees, FAM-MDR. It combines features of the Genome-wide Rapid Association using Mixed Model And Regression approach (GRAMMAR) with Model-Based MDR (MB-MDR). We focus on continuous traits, although the method is general and can be used for outcomes of any type, including binary and censored traits. When comparing FAM-MDR with Pedigree-based Generalized MDR (PGMDR), which is a generalization of Multifactor Dimensionality Reduction (MDR) to continuous traits and related individuals, FAM-MDR was found to outperform PGMDR in terms of power, in most of the considered simulated scenarios. Additional simulations revealed that PGMDR does not appropriately deal with multiple testing and consequently gives rise to overly optimistic results. FAM-MDR adequately deals with multiple testing in epistasis screens and is in contrast rather conservative, by construction. Furthermore, simulations show that correcting for lower order (main) effects is of utmost importance when claiming epistasis. As Type 2 Diabetes Mellitus (T2DM) is a complex phenotype likely influenced by gene-gene interactions, we applied FAM-MDR to examine data on glucose area-under-the-curve (GAUC), an endophenotype of T2DM for which multiple independent genetic associations have been observed, in the Amish Family Diabetes Study (AFDS). This application reveals that FAM-MDR makes more efficient use of the available data than PGMDR and can deal with multi-generational pedigrees more easily. In conclusion, we have validated FAM-MDR and compared it to PGMDR, the current state-of-the-art MDR method for family data, using both simulations and a practical dataset. FAM-MDR is found to outperform PGMDR in that it handles the multiple testing issue more correctly, has increased power, and efficiently uses all available information.
PMCID: PMC2858665  PMID: 20421984
9.  A cross-validation procedure for general pedigrees and matched odds ratio fitness metric implemented for the multifactor dimensionality reduction pedigree disequilibrium test 
Genetic epidemiology  2010;34(2):194-199.
As genetic epidemiology looks beyond mapping single disease susceptibility loci, interest in detecting epistatic interactions between genes has grown. The dimensionality and comparisons required to search the epistatic space and the inference for a significant result pose challenges for testing epistatic disease models. The Multifactor Dimensionality Reduction Pedigree Disequilibrium Test (MDR-PDT) was developed to test for multilocus models in pedigree data. In the present study we rigorously tested MDR-PDT with new cross-validation (CV) (both 5- and 10-fold) and omnibus model selection algorithms by simulating a range of heritabilities, odds ratios, minor allele frequencies, sample sizes, and numbers of interacting loci. Power was evaluated using 100, 500, and 1000 families, with minor allele frequencies 0.2 and 0.4 and broad-sense heritabilities of 0.005, 0.01, 0.03, 0.05, and 0.1 for 2- and 3-locus purely epistatic penetrance models. We also compared the prediction error measure of effect with a predicted matched odds ratio for final model selection and testing. We report that the CV procedure is valid with the permutation test, MDR-PDT performs similarly with 5- and 10-fold CV, and that the matched odds ratio is more powerful than prediction error as the fitness metric for MDR-PDT.
PMCID: PMC2811750  PMID: 19697353
Epistasis; MDR-PDT; complex disease; family-based association; bioinformatics
10.  Screening and Rapid Molecular Diagnosis of Tuberculosis in Prisons in Russia and Eastern Europe: A Cost-Effectiveness Analysis 
PLoS Medicine  2012;9(11):e1001348.
Daniel Winetsky and colleagues investigate eight strategies for screening and diagnosis of tuberculosis within prisons of the former Soviet Union.
Prisons of the former Soviet Union (FSU) have high rates of multidrug-resistant tuberculosis (MDR-TB) and are thought to drive general population tuberculosis (TB) epidemics. Effective prison case detection, though employing more expensive technologies, may reduce long-term treatment costs and slow MDR-TB transmission.
Methods and Findings
We developed a dynamic transmission model of TB and drug resistance matched to the epidemiology and costs in FSU prisons. We evaluated eight strategies for TB screening and diagnosis involving, alone or in combination, self-referral, symptom screening, mass miniature radiography (MMR), and sputum PCR with probes for rifampin resistance (Xpert MTB/RIF). Over a 10-y horizon, we projected costs, quality-adjusted life years (QALYs), and TB and MDR-TB prevalence. Using sputum PCR as an annual primary screening tool among the general prison population most effectively reduced overall TB prevalence (from 2.78% to 2.31%) and MDR-TB prevalence (from 0.74% to 0.63%), and cost US$543/QALY for additional QALYs gained compared to MMR screening with sputum PCR reserved for rapid detection of MDR-TB. Adding sputum PCR to the currently used strategy of annual MMR screening was cost-saving over 10 y compared to MMR screening alone, but produced only a modest reduction in MDR-TB prevalence (from 0.74% to 0.69%) and had minimal effect on overall TB prevalence (from 2.78% to 2.74%). Strategies based on symptom screening alone were less effective and more expensive than MMR-based strategies. Study limitations included scarce primary TB time-series data in FSU prisons and uncertainties regarding screening test characteristics.
In prisons of the FSU, annual screening of the general inmate population with sputum PCR most effectively reduces TB and MDR-TB prevalence, doing so cost-effectively. If this approach is not feasible, the current strategy of annual MMR is both more effective and less expensive than strategies using self-referral or symptom screening alone, and the addition of sputum PCR for rapid MDR-TB detection may be cost-saving over time.
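The cost per QALY figures quoted above are incremental cost-effectiveness ratios: the extra cost of one strategy over a comparator divided by the extra QALYs it yields. A minimal sketch of that arithmetic follows. The input values are hypothetical and are not taken from the study, which reports only the resulting ratio.

```python
def incremental_cost_effectiveness_ratio(cost_new, qaly_new, cost_ref, qaly_ref):
    """Extra cost per extra QALY of a new strategy relative to a reference one."""
    delta_qaly = qaly_new - qaly_ref
    if delta_qaly == 0:
        raise ValueError("strategies yield the same QALYs; ratio is undefined")
    return (cost_new - cost_ref) / delta_qaly


# Hypothetical totals for two screening strategies (not values from the study).
example_icer = incremental_cost_effectiveness_ratio(
    cost_new=120_000.0, qaly_new=50.0,
    cost_ref=100_000.0, qaly_ref=10.0,
)
```

A strategy that costs less and yields more QALYs than its comparator gives a negative ratio and is usually described as cost-saving rather than reported as a cost per QALY.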
Editors' Summary
Tuberculosis (TB)—a contagious bacterial disease—is a major public health problem, particularly in low- and middle-income countries. In 2010, about nine million people developed TB, and about 1.5 million people died from the disease. Mycobacterium tuberculosis, the bacterium that causes TB, is spread in airborne droplets when people with active disease cough or sneeze. The characteristic symptoms of TB include fever, a persistent cough, and night sweats. Diagnostic tests include sputum smear microscopy (examination of mucus from the lungs for M. tuberculosis bacilli), mycobacterial culture (growth of M. tuberculosis from sputum), and chest X-rays. TB can also be diagnosed by looking for fragments of the M. tuberculosis genetic blueprint in sputum samples (sputum PCR). Importantly, sputum PCR can detect the genetic changes that make M. tuberculosis resistant to rifampicin, a constituent of the cocktail of antibiotics that is used to cure TB. Rifampicin resistance is an indicator of multidrug-resistant TB (MDR-TB), the emergence of which is thwarting ongoing global efforts to control TB.
Why Was This Study Done?
Prisons present unique challenges for TB control. Overcrowding, poor ventilation, and inadequate medical care increase the spread of TB among prisoners, who often come from disadvantaged populations where the prevalence of TB (the proportion of the population with TB) is already high. Prisons also act as reservoirs for TB, recycling the disease back into the civilian population. The prisons of the former Soviet Union, for example, which have extremely high rates of MDR-TB, are thought to drive TB epidemics in the general population. Because effective identification of active TB among prison inmates has the potential to improve TB control outside prisons, the World Health Organization recommends active TB case finding among prisoners using self-referral, screening with symptom questionnaires, or screening with chest X-rays or mass miniature radiography (MMR). But which of these strategies will reduce the prevalence of TB in prisons most effectively, and which is most cost-effective? Here, the researchers evaluate the relative effectiveness and cost-effectiveness of alternative strategies for screening and diagnosis of TB in prisons by modeling TB and MDR-TB epidemics in prisons of the former Soviet Union.
What Did the Researchers Do and Find?
The researchers used a dynamic transmission model of TB that simulates the movement of individuals in prisons in the former Soviet Union through different stages of TB infection to estimate the costs, quality-adjusted life years (QALYs; a measure of disease burden that includes both the quantity and quality of life) saved, and TB and MDR-TB prevalence for eight TB screening/diagnostic strategies over a ten-year period. Compared to annual MMR alone (the current strategy), annual screening with sputum PCR produced the greatest reduction in the prevalence of TB and of MDR-TB among the prison population. Adding sputum PCR for detection of MDR-TB to annual MMR screening did not affect the overall TB prevalence but slightly reduced the MDR-TB prevalence and saved nearly US$2,000 over ten years per model prison of 1,000 inmates, compared to MMR screening alone. Annual sputum PCR was the most cost-effective strategy, costing US$543/QALY for additional QALYs gained compared to MMR screening plus sputum PCR for MDR-TB detection. Other strategies tested, including symptom screening alone or combined with sputum PCR, were either more expensive and less effective or less cost-effective than these two options.
What Do These Findings Mean?
These findings suggest that, in prisons in the former Soviet Union, annual screening with sputum PCR will most effectively reduce TB and MDR-TB prevalence and will be cost-effective. That is, the cost per QALY saved of this strategy is less than the per-capita gross domestic product of any of the former Soviet Union countries. The paucity of primary data on some facets of TB epidemiology in prisons in the former Soviet Union and the assumptions built into the mathematical model limit the accuracy of these findings. Moreover, because most of the benefits of sputum PCR screening come from treating the MDR-TB cases that are detected using this screening approach, these findings cannot be generalized to prison settings without a functioning MDR-TB treatment program or with a very low MDR-TB prevalence. Despite these and other limitations, these findings provide valuable information about the screening strategies that are most likely to interrupt the TB cycle in prisons, thereby saving resources and averting preventable deaths both inside and outside prisons.
Additional Information
The World Health Organization provides information (in several languages) on all aspects of tuberculosis, including general information on tuberculosis diagnostics and on tuberculosis in prisons; a report published in the Bulletin of the World Health Organization in 2006 describes tough measures taken in Russian prisons to slow the spread of TB
The Stop TB Partnership is working towards tuberculosis elimination; patient stories about tuberculosis are available (in English and Spanish)
The US Centers for Disease Control and Prevention has information about tuberculosis, about its diagnosis, and about tuberculosis in prisons (some information in English and Spanish)
A PLOS Medicine Research Article by Iacapo Baussano et al. describes a systematic review of tuberculosis incidence in prisons; a linked editorial entitled The Health Crisis of Tuberculosis in Prisons Extends beyond the Prison Walls is also available
The Tuberculosis Survival Project, which aims to raise awareness of tuberculosis and provide support for people with tuberculosis, provides personal stories about treatment for tuberculosis; the Tuberculosis Vaccine Initiative also provides personal stories about dealing with tuberculosis
MedlinePlus has links to further information about tuberculosis (in English and Spanish)
PMCID: PMC3507963  PMID: 23209384
11.  A robust multifactor dimensionality reduction method for detecting gene-gene interactions with application to the genetic analysis of bladder cancer susceptibility 
Annals of human genetics  2010;75(1):20-28.
A central goal of human genetics is to identify and characterize susceptibility genes for common complex human diseases. An important challenge in this endeavor is the modeling of gene-gene interaction or epistasis that can result in non-additivity of genetic effects. The multifactor dimensionality reduction (MDR) method was developed as a machine learning alternative to parametric logistic regression for detecting interactions in the absence of significant marginal effects. The goal of MDR is to reduce the dimensionality inherent in modeling combinations of polymorphisms using a computational approach called constructive induction. Here, we propose a Robust Multifactor Dimensionality Reduction (RMDR) method that performs constructive induction using a Fisher’s Exact Test rather than a predetermined threshold. The advantage of this approach is that only those genotype combinations that are determined to be statistically significant are considered in the MDR analysis. We use two simulation studies to demonstrate that this approach will increase the success rate of MDR when there are only a few genotype combinations that are significantly associated with case-control status. We show that there is no loss of success rate when this is not the case. We then apply the RMDR method to the detection of gene-gene interactions in genotype data from a population-based study of bladder cancer in New Hampshire.
PMCID: PMC3057873  PMID: 21091664
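One way to picture the constructive induction step described in this abstract is to test each genotype combination against the rest of the sample and leave non-significant combinations unlabelled. The sketch below assumes a two-sided Fisher's exact test at a 0.05 level on the 2 × 2 table of (in cell, not in cell) by (case, control). The significance level, the table layout and the function names are assumptions for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    # Sum over all tables at least as unlikely as the observed one.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


def label_cell(cell_cases, cell_controls, total_cases, total_controls, alpha=0.05):
    """Label one genotype combination as 'high', 'low' or 'unknown' risk."""
    p = fisher_exact_two_sided(
        cell_cases, cell_controls,
        total_cases - cell_cases, total_controls - cell_controls,
    )
    if p >= alpha:
        return "unknown"  # not significant: left out of the model
    # Compare the cell's case:control ratio with the overall ratio.
    if cell_cases * total_controls > cell_controls * total_cases:
        return "high"
    return "low"
```

For example, with 20 cases and 20 controls overall, `label_cell(10, 0, 20, 20)` would be labelled high risk, whereas a cell holding two cases and two controls would stay unlabelled.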
12.  Accelerating epistasis analysis in human genetics with consumer graphics hardware 
BMC Research Notes  2009;2:149.
Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions.
We found that using MDR on GPUs consistently increased performance per machine over both a feature rich Java software package and a C++ cluster implementation. A GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore, this GPU system provides extremely cost effective performance while leaving the CPU available for other tasks. The GPU workstation containing three GPUs costs $2000, while obtaining similar performance on a Beowulf cluster requires 150 CPU cores which, including the added infrastructure and support cost of the cluster system, cost approximately $82,500.
Graphics hardware based computing provides a cost effective means to perform genetic analysis of epistasis using MDR on large datasets without the infrastructure of a computing cluster.
PMCID: PMC2732631  PMID: 19630950
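For readers unfamiliar with the core MDR step that an exhaustive analysis repeats for every SNP combination, the following is a minimal Python sketch. It is not the GPU, Java, or C++ implementation discussed in the abstract above. It assumes genotypes coded 0/1/2 and case status coded 1/0, and it scores each model by training balanced accuracy only (cross-validation and permutation testing are omitted). All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mdr_balanced_accuracy(genotypes, status, snps):
    """Score one SNP combination by pooling genotype cells into high/low risk."""
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, status):
        cell = tuple(row[s] for s in snps)  # multilocus genotype of this individual
        (cases if y == 1 else controls)[cell] += 1
    # A cell is high risk when its case:control ratio exceeds the overall ratio.
    threshold = n_cases / n_controls
    true_pos = false_pos = 0
    for cell in set(cases) | set(controls):
        if cases[cell] > threshold * controls[cell]:
            true_pos += cases[cell]
            false_pos += controls[cell]
    sensitivity = true_pos / n_cases
    specificity = (n_controls - false_pos) / n_controls
    return (sensitivity + specificity) / 2


def exhaustive_mdr(genotypes, status, order=2):
    """Evaluate every SNP combination of the given order; return (score, snps)."""
    n_snps = len(genotypes[0])
    return max(
        (mdr_balanced_accuracy(genotypes, status, snps), snps)
        for snps in combinations(range(n_snps), order)
    )


# Toy data: status is the XOR of SNP 0 and SNP 1; SNP 2 is uninformative.
toy_genotypes = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
toy_status = [a ^ b for a, b, _ in toy_genotypes]
best_score, best_snps = exhaustive_mdr(toy_genotypes, toy_status)
```

On this toy data the purely interacting pair (SNP 0, SNP 1) is the only model that separates cases from controls. The loop over combinations is the part whose cost grows rapidly with the number of SNPs and the interaction order, which is the work that parallel hardware is used for.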
13.  A robustness study of parametric and non-parametric tests in model-based multifactor dimensionality reduction for epistasis detection 
BioData Mining  2013;6:9.
Applying a statistical method implies identifying underlying (model) assumptions and checking their validity in the particular context. One of these contexts is association modeling for epistasis detection. Here, depending on the technique used, violation of model assumptions may result in increased type I error, power loss, or biased parameter estimates. Remedial measures for violated underlying conditions or assumptions include data transformation or selecting a more relaxed modeling or testing strategy. Model-Based Multifactor Dimensionality Reduction (MB-MDR) for epistasis detection relies on association testing between a trait and a factor consisting of multilocus genotype information. For quantitative traits, the framework is essentially Analysis of Variance (ANOVA), which decomposes the variability in the trait amongst the different factors. In this study, we assess, through simulations, the cumulative effect of deviations from normality and homoscedasticity on the overall performance of quantitative MB-MDR to detect 2-locus epistasis signals in the absence of main effects.
Our simulation study focuses on pure epistasis models with varying degrees of genetic influence on a quantitative trait. Conditional on a multilocus genotype, we consider quantitative trait distributions that are normal, chi-square or Student’s t with constant or non-constant phenotypic variances. All data are analyzed with MB-MDR using the built-in Student’s t-test for association, as well as a novel MB-MDR implementation based on Welch’s t-test. Traits are either left untransformed or are transformed into new traits via logarithmic, standardization or rank-based transformations, prior to MB-MDR modeling.
Our simulation results show that MB-MDR controls type I error and false positive rates irrespective of the association test considered. Empirically-based MB-MDR power estimates for MB-MDR with Welch’s t-tests are generally lower than those for MB-MDR with Student’s t-tests. Trait transformations involving ranks tend to lead to increased power compared to the other considered data transformations.
When performing MB-MDR screening for gene-gene interactions with quantitative traits, we recommend first rank-transforming traits to normality and then applying MB-MDR modeling with Student’s t-tests as internal tests for association.
PMCID: PMC3668290  PMID: 23618370
Model-based multifactor dimensionality reduction; Epistasis; Model violations; Data transformation
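The two ingredients compared in the abstract above, a rank-based transformation to normality and the Student versus Welch t statistics, can be sketched in a few lines of Python. This is only an illustration and not the MB-MDR implementation. The rank offset (rank - 0.5) / n is one of several common conventions and is an assumption here, as are the function names.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def rank_to_normal(values):
    """Rank-based inverse normal transform (ties broken by input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    transformed = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # Map the rank to a standard normal quantile.
        transformed[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return transformed


def student_t(a, b):
    """Two-sample t statistic with a pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def welch_t(a, b):
    """Two-sample t statistic without the equal-variance assumption."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


skewed_trait = [10.0, 1000.0, 3.0]
normal_scores = rank_to_normal(skewed_trait)
```

The two statistics coincide when both groups have the same size and differ otherwise. The Welch version also uses different degrees of freedom, which are not computed in this sketch.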
14.  The choice of null distributions for detecting gene-gene interactions in genome-wide association studies 
BMC Bioinformatics  2011;12(Suppl 1):S26.
In genome-wide association studies (GWAS), the number of single-nucleotide polymorphisms (SNPs) typically ranges between 500,000 and 1,000,000. Accordingly, detecting gene-gene interactions in GWAS is computationally challenging because it involves hundreds of billions of SNP pairs. Stage-wise strategies are often used to overcome the computational difficulty. In the first stage, fast screening methods (e.g. Tuning ReliefF) are applied to reduce the whole SNP set to a small subset. In the second stage, sophisticated modeling methods (e.g., multifactor-dimensionality reduction (MDR)) are applied to the subset of SNPs to identify interesting interaction models and the corresponding interaction patterns. In the third stage, the significance of the identified interaction patterns is evaluated by hypothesis testing.
In this paper, we show that this stage-wise strategy could be problematic in controlling the false positive rate if the null distribution is not appropriately chosen. This is because screening and modeling may change the null distribution used in hypothesis testing. In our simulation study, we use some popular screening methods and the popular modeling method MDR as examples to show the effect of an inappropriate choice of null distribution. To choose appropriate null distributions, we suggest using a permutation test or testing on an independent data set. We demonstrate their performance using synthetic data and a real genome-wide data set from an Age-related Macular Degeneration (AMD) study.
A permutation test or testing on an independent data set can help in choosing appropriate null distributions for hypothesis testing, which provides more reliable results in practice.
PMCID: PMC3044281  PMID: 21342556
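The central point of the abstract above, that the null distribution must reflect the screening and modeling stages, can be illustrated with a small Python sketch. Here the whole pipeline is re-run on every permuted data set, so that selection becomes part of the null. The "pipeline" is a deliberately simple stand-in (the best single-SNP difference in mean genotype), not ReliefF or MDR, and all names are hypothetical.

```python
import random


def screen_and_score(genotypes, status):
    """Stand-in for 'screen, then model': return the best single-SNP score.

    Hypothetical statistic: largest absolute difference in mean genotype
    between cases (1) and controls (0) over all SNPs.
    """
    cases = [row for row, y in zip(genotypes, status) if y == 1]
    controls = [row for row, y in zip(genotypes, status) if y == 0]
    best = 0.0
    for j in range(len(genotypes[0])):
        case_mean = sum(row[j] for row in cases) / len(cases)
        control_mean = sum(row[j] for row in controls) / len(controls)
        best = max(best, abs(case_mean - control_mean))
    return best


def permutation_p_value(genotypes, status, n_perm=199, seed=0):
    """Permutation p-value where selection is repeated inside every permutation."""
    rng = random.Random(seed)
    observed = screen_and_score(genotypes, status)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Re-run the whole pipeline so that selection is part of the null.
        if screen_and_score(genotypes, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: SNP 0 separates cases from controls, SNP 1 is uninformative.
toy_genotypes = [[2, i % 2] for i in range(10)] + [[0, i % 2] for i in range(10)]
toy_status = [1] * 10 + [0] * 10
p_value = permutation_p_value(toy_genotypes, toy_status)
```

Comparing the selected statistic against a null built for a single pre-specified SNP would ignore the maximization step, which is the kind of mismatch the abstract warns about.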
15.  An efficient algorithm to perform multiple testing in epistasis screening 
BMC Bioinformatics  2013;14:138.
Research in epistasis or gene-gene interaction detection for human complex traits has grown over the last few years. It has been marked by promising methodological developments, improved translation efforts of statistical epistasis to biological epistasis and attempts to integrate different omics information sources into the epistasis screening to enhance power. The quest for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm rises linearly with the number of hypothesis tests. Gene-gene interaction studies will require memory proportional to the square of the number of SNPs. A genome-wide epistasis search would therefore require terabytes of memory. Hence, cache problems are likely to occur, increasing the computation time. In this work we present a new version of maxT, requiring an amount of memory independent of the number of genetic effects to be investigated. This algorithm was implemented in C++ in our epistasis screening software MBMDR-3.0.3. We evaluate the new implementation in terms of memory efficiency and speed using simulated data. The software is illustrated on real-life data for Crohn’s disease.
In the case of a binary (affected/unaffected) trait, the parallel workflow of MBMDR-3.0.3 analyzes all gene-gene interactions with a dataset of 100,000 SNPs typed on 1000 individuals within 4 days and 9 hours, using 999 permutations of the trait to assess statistical significance, on a cluster composed of 10 blades, each containing four Quad-Core AMD Opteron(tm) 2352 2.1 GHz processors. In the case of a continuous trait, a similar run takes 9 days. Our program found 14 SNP-SNP interactions with a multiple-testing corrected p-value of less than 0.05 on real-life Crohn’s disease (CD) data.
Our software is the first implementation of the MB-MDR methodology able to solve large-scale SNP-SNP interaction problems within a few days, without using much memory, while adequately controlling the type I error rates. A new implementation to reach genome-wide epistasis screening is under construction. In the context of Crohn’s disease, MBMDR-3.0.3 could identify epistasis involving regions that are well known in the field and could be explained from a biological point of view. This demonstrates the power of our software to find relevant phenotype-genotype higher-order associations.
PMCID: PMC3648350  PMID: 23617239
Epistasis; Multiple testing; maxT; MB-MDR; GWA studies; Crohn’s disease
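To make the memory argument in the abstract above concrete, the sketch below shows a simplified single-step maxT in Python. It keeps only the top few observed statistics and one maximum per permutation, so memory does not grow with the number of SNP pairs. This is an illustration under stated assumptions, not the step-down algorithm implemented in MBMDR-3.0.3. The toy statistic and all names are hypothetical.

```python
import heapq
import random
from itertools import combinations
from math import comb


def single_step_maxt(pairs, statistic, trait, n_perm=99, top_n=3, seed=0):
    """Single-step maxT adjusted p-values for the top_n pairs.

    `pairs` is a zero-argument callable returning a fresh iterator of pairs;
    `statistic(pair, trait)` returns a number (larger means stronger signal).
    """
    rng = random.Random(seed)
    # Keep only the top_n observed statistics, not one value per pair.
    top = heapq.nlargest(top_n, ((statistic(p, trait), p) for p in pairs()))
    shuffled = list(trait)
    perm_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Only the maximum over all pairs is retained for each permutation.
        perm_max.append(max(statistic(p, shuffled) for p in pairs()))
    return [
        (p, t, (1 + sum(m >= t for m in perm_max)) / (n_perm + 1))
        for t, p in top
    ]


# Toy data: 4 SNPs (rows) typed on 8 individuals, with a binary trait.
snps = [
    [0, 1, 2, 0, 1, 2, 0, 1],
    [0, 0, 1, 1, 2, 2, 0, 1],
    [2, 1, 0, 2, 1, 0, 2, 1],
    [1, 1, 0, 0, 2, 2, 1, 0],
]
trait = [0, 0, 1, 1, 1, 1, 0, 0]


def toy_statistic(pair, y):
    # Hypothetical score: absolute covariance-like sum of the genotype product with y.
    i, j = pair
    y_bar = sum(y) / len(y)
    return abs(sum(a * b * (v - y_bar) for a, b, v in zip(snps[i], snps[j], y)))


results = single_step_maxt(
    lambda: combinations(range(len(snps)), 2), toy_statistic, trait
)
n_pairs_for_100k_snps = comb(100_000, 2)  # 4,999,950,000 SNP pairs
```

For the 100,000 SNPs mentioned above there are about 5 billion pairs, so storing one statistic per pair and per permutation quickly becomes impractical, whereas this sketch stores only n_perm + top_n values.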
16.  Risk score modeling of multiple gene to gene interactions using aggregated-multifactor dimensionality reduction 
BioData Mining  2013;6:1.
Multifactor Dimensionality Reduction (MDR) has been widely applied to detect gene-gene (GxG) interactions associated with complex diseases. Existing MDR methods summarize disease risk by a dichotomous predisposing model (high-risk/low-risk) from one optimal GxG interaction, which does not take the accumulated effects from multiple GxG interactions into account.
We propose an Aggregated-Multifactor Dimensionality Reduction (A-MDR) method that exhaustively searches for and detects significant GxG interactions to generate an epistasis enriched gene network. An aggregated epistasis enriched risk score, which takes into account multiple GxG interactions simultaneously, replaces the dichotomous predisposing risk variable and provides higher resolution in the quantification of disease susceptibility. We evaluate this new A-MDR approach in a broad range of simulations. Also, we present the results of an application of the A-MDR method to a data set derived from Juvenile Idiopathic Arthritis patients treated with methotrexate (MTX) that revealed several GxG interactions in the folate pathway that were associated with treatment response. The epistasis enriched risk score that pooled information from 82 significant GxG interactions distinguished MTX responders from non-responders with 82% accuracy.
The proposed A-MDR is innovative in the MDR framework to investigate aggregated effects among GxG interactions. New measures (pOR, pRR and pChi) are proposed to detect multiple GxG interactions.
PMCID: PMC3560267  PMID: 23294634
A-MDR; Epistasis enriched risk score; Epistasis enriched gene network; pRR; pOR; pChi
17.  A Simple and Computationally Efficient Sampling Approach to Covariate Adjustment for Multifactor Dimensionality Reduction Analysis of Epistasis 
Human Heredity  2010;70(3):219-225.
Epistasis or gene-gene interaction is a fundamental component of the genetic architecture of complex traits such as disease susceptibility. Multifactor dimensionality reduction (MDR) was developed as a nonparametric and model-free method to detect epistasis when there are no significant marginal genetic effects. However, in many studies of complex disease, other covariates like age of onset and smoking status could have a strong main effect and may potentially interfere with MDR's ability to achieve its goal. In this paper, we present a simple and computationally efficient sampling method to adjust for covariate effects in MDR. We use simulation to show that after adjustment, MDR has sufficient power to detect true gene-gene interactions. We also compare our method with the state-of-the-art technique for covariate adjustment. The results suggest that our proposed method performs similarly, but is more computationally efficient. We then apply this new method to an analysis of a population-based bladder cancer study in New Hampshire.
PMCID: PMC2982850  PMID: 20924193
Covariate adjustment; Multifactor dimensionality reduction; Epistasis
18.  Interaction among apoptosis-associated sequence variants and joint effects on aggressive prostate cancer 
BMC Medical Genomics  2012;5:11.
Molecular and epidemiological evidence demonstrates that altered gene expression and single nucleotide polymorphisms in the apoptotic pathway are linked to many cancers. Yet, few studies emphasize the interaction of variant apoptotic genes and their joint modifying effects on prostate cancer (PCA) outcomes. An exhaustive assessment of all the possible two-, three- and four-way gene-gene interactions is computationally burdensome. This statistical conundrum stems from the prohibitive amount of data needed to account for multiple hypothesis testing.
To address this issue, we systematically prioritized and evaluated individual effects and complex interactions among 172 apoptotic SNPs in relation to PCA risk and aggressive disease (i.e., Gleason score ≥ 7 and tumor stages III/IV). Single and joint modifying effects on PCA outcomes among European-American men were analyzed using statistical epistasis networks coupled with multi-factor dimensionality reduction (SEN-guided MDR). The case-control study design included 1,175 incident PCA cases and 1,111 controls from the prostate, lung, colo-rectal, and ovarian (PLCO) cancer screening trial. Moreover, a subset analysis of PCA cases consisted of 688 aggressive and 488 non-aggressive PCA cases. SNP profiles were obtained using the NCI Cancer Genetic Markers of Susceptibility (CGEMS) data portal. Main effects were assessed using logistic regression (LR) models. Prior to modeling interactions, SEN was used to pre-process our genetic data. SEN used network science to reduce our analysis from > 36 million to < 13,000 SNP interactions. Interactions were visualized, evaluated, and validated using entropy-based MDR. All parametric and non-parametric models were adjusted for age, family history of PCA, and multiple hypothesis testing.
Following LR modeling, eleven and thirteen sequence variants were associated with PCA risk and aggressive disease, respectively. However, none of these markers remained significant after we adjusted for multiple comparisons. Nevertheless, we detected a modest synergistic interaction between AKT3 rs2125230-PRKCQ rs571715 and disease aggressiveness using SEN-guided MDR (p = 0.011).
In summary, entropy-based SEN-guided MDR facilitated the logical prioritization and evaluation of apoptotic SNPs in relation to aggressive PCA. The suggestive interaction between AKT3-PRKCQ and aggressive PCA requires further validation using independent observational studies.
PMCID: PMC3355002  PMID: 22546513
Prostate cancer; Apoptosis; Single nucleotide polymorphisms; Gene-gene interactions; Multifactor dimensionality reduction (MDR); Statistical epistasis networks (SEN)
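The size of the search space mentioned in the abstract above can be checked with a short calculation. The sketch below counts all two-, three- and four-way combinations of 172 SNPs. Reading the "> 36 million" figure as the sum of these three counts is an assumption, although the numbers are consistent with it.

```python
from math import comb

n_snps = 172
# Number of distinct k-way SNP combinations for k = 2, 3, 4.
counts = {k: comb(n_snps, k) for k in (2, 3, 4)}
total = sum(counts.values())

for k, count in counts.items():
    print(f"{k}-way combinations: {count:,}")
print(f"total: {total:,}")
```

The counts are 14,706 pairs, 833,340 triples and 35,208,615 four-way combinations, for a total of 36,056,661. The four-way term dominates, which is why pre-filtering before interaction modeling has such a large effect.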
19.  An omnibus permutation test on ensembles of two-locus analyses can detect pure epistasis and genetic heterogeneity in genome-wide association studies 
SpringerPlus  2013;2:230.
This article presents the ability of an omnibus permutation test on ensembles of two-locus analyses (2LOmb) to detect pure epistasis in the presence of genetic heterogeneity. The performance of 2LOmb is evaluated in various simulation scenarios covering two independent causes of complex disease where each cause is governed by a purely epistatic interaction. Different scenarios are set up by varying the number of available single nucleotide polymorphisms (SNPs) in the data, the number of causative SNPs and the ratio of case samples from two affected groups. The simulation results indicate that 2LOmb outperforms multifactor dimensionality reduction (MDR) and random forest (RF) techniques in terms of a low number of output SNPs and a high number of correctly-identified causative SNPs. Moreover, 2LOmb is capable of identifying the number of independent interactions in tractable computational time and can be used in genome-wide association studies. 2LOmb is subsequently applied to a type 1 diabetes mellitus (T1D) data set, which was collected from a UK population by the Wellcome Trust Case Control Consortium (WTCCC). After screening for SNPs that are located within or near genes and exhibit no marginal single-locus effects, the T1D data set is reduced to 95,991 SNPs from 12,146 genes. The 2LOmb search in the reduced T1D data set reveals that 12 SNPs, which can be divided into two independent sets, are associated with the disease. The first SNP set consists of three SNPs from MUC21 (mucin 21, cell surface associated), three SNPs from MUC22 (mucin 22), two SNPs from PSORS1C1 (psoriasis susceptibility 1 candidate 1) and one SNP from TCF19 (transcription factor 19). A four-locus interaction between these four genes is also detected. The second SNP set consists of three SNPs from ATAD1 (ATPase family, AAA domain containing 1).
Overall, the findings indicate the detection of pure epistasis in the presence of genetic heterogeneity and provide an alternative explanation for the aetiology of T1D in the UK population.
PMCID: PMC4006521  PMID: 24804170
Attribute selection; Complex disease; Epistasis; Genetic heterogeneity; Genome-wide association study; Pattern recognition; Permutation test; Single nucleotide polymorphism; Type 1 diabetes mellitus
20.  A novel survival multifactor dimensionality reduction method for detecting gene–gene interactions with application to bladder cancer prognosis 
Human Genetics  2010;129(1):101-110.
The widespread use of high-throughput methods of single nucleotide polymorphism (SNP) genotyping has created a number of computational and statistical challenges. The problem of identifying SNP–SNP interactions in case–control studies has been studied extensively, and a number of new techniques have been developed. Little progress has been made, however, in the analysis of SNP–SNP interactions in relation to time-to-event data, such as patient survival time or time to cancer relapse. We present an extension of the two-class multifactor dimensionality reduction (MDR) algorithm that enables detection and characterization of epistatic SNP–SNP interactions in the context of survival analysis. The proposed Survival MDR (Surv-MDR) method handles survival data by modifying MDR’s constructive induction algorithm to use the log-rank test. Surv-MDR replaces balanced accuracy with log-rank test statistics as the score to determine the best models. We simulated datasets with a survival outcome related to two loci in the absence of any marginal effects. We compared Surv-MDR with Cox regression for their ability to identify the true predictive loci in these simulated data. We also used this simulation to construct the empirical distribution of Surv-MDR’s testing score. We then applied Surv-MDR to genetic data from a population-based epidemiologic study to find prognostic markers of survival time following a bladder cancer diagnosis. We identified several two-locus SNP combinations that have strong associations with patients’ survival outcome. Surv-MDR is capable of detecting interaction models with weak main effects. These epistatic models tend to be dropped by traditional Cox regression approaches to evaluating interactions.
With improved efficiency to handle genome wide datasets, Surv-MDR will play an important role in a research strategy that embraces the complexity of the genotype–phenotype mapping relationship since epistatic interactions are an important component of the genetic basis of disease.
PMCID: PMC3255326  PMID: 20981448
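The log-rank statistic that Surv-MDR uses in place of case-control counts can be sketched in Python as follows. This is an illustrative implementation of the standard two-group log-rank statistic, not the Surv-MDR code. The rule used to label a genotype cell as high risk (a positive statistic for that cell versus everyone else) is an assumption, and the function names are hypothetical.

```python
from math import sqrt


def logrank_z(times, events, in_group):
    """Standardized log-rank statistic for one group versus the rest.

    times: follow-up times; events: 1 if the event was observed, 0 if censored;
    in_group: 1 if the individual belongs to the group of interest, else 0.
    A positive value means more events in the group than expected.
    """
    observed_minus_expected = 0.0
    var = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(in_group[i] for i in at_risk)
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_group[i])
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the number of events in the group.
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected / sqrt(var) if var > 0 else 0.0


def label_cells(cells, times, events):
    """Assumed rule: a genotype cell is high risk when its log-rank z is positive."""
    return {
        cell: logrank_z(times, events, [int(c == cell) for c in cells]) > 0
        for cell in set(cells)
    }


# Toy data: carriers of "AA" fail early, carriers of "Aa" fail late.
toy_cells = ["AA", "AA", "AA", "Aa", "Aa", "Aa"]
toy_times = [1, 2, 3, 4, 5, 6]
toy_events = [1, 1, 1, 1, 1, 1]
high_risk = label_cells(toy_cells, toy_times, toy_events)
```

Once cells are pooled into high-risk and low-risk groups, the same statistic computed between the two pooled groups can serve as the model score, in the role that balanced accuracy plays in case-control MDR.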
21.  Cost-Effectiveness of Treating Multidrug-Resistant Tuberculosis 
PLoS Medicine  2006;3(7):e241.
Despite the existence of effective drug treatments, tuberculosis (TB) causes 2 million deaths annually worldwide. Effective treatment is complicated by multidrug-resistant TB (MDR TB) strains that respond only to second-line drugs. We projected the health benefits and cost-effectiveness of using drug susceptibility testing and second-line drugs in a lower-middle-income setting with high levels of MDR TB.
Methods and Findings
We developed a dynamic state-transition model of TB. In a base case analysis, the model was calibrated to approximate the TB epidemic in Peru, a setting with a smear-positive TB incidence of 120 per 100,000 and 4.5% MDR TB among prevalent cases. Secondary analyses considered other settings. The following strategies were evaluated: first-line drugs administered under directly observed therapy (DOTS), locally standardized second-line drugs for previously treated cases (STR1), locally standardized second-line drugs for previously treated cases with test-confirmed MDR TB (STR2), comprehensive drug susceptibility testing and individualized treatment for previously treated cases (ITR1), and comprehensive drug susceptibility testing and individualized treatment for all cases (ITR2). Outcomes were costs per TB death averted and costs per quality-adjusted life year (QALY) gained. We found that strategies incorporating the use of second-line drug regimens following first-line treatment failure were highly cost-effective compared to strategies using first-line drugs only. In our base case, standardized second-line treatment for confirmed MDR TB cases (STR2) had an incremental cost-effectiveness ratio of $720 per QALY ($8,700 per averted death) compared to DOTS. Individualized second-line drug treatment for MDR TB following first-line failure (ITR1) provided more benefit at an incremental cost of $990 per QALY ($12,000 per averted death) compared to STR2. A more aggressive version of the individualized treatment strategy (ITR2), in which both new and previously treated cases are tested for MDR TB, had an incremental cost-effectiveness ratio of $11,000 per QALY ($160,000 per averted death) compared to ITR1. The STR2 and ITR1 strategies remained cost-effective under a wide range of alternative assumptions about treatment costs, effectiveness, MDR TB prevalence, and transmission.
Treatment of MDR TB using second-line drugs is highly cost-effective in Peru. In other settings, the attractiveness of strategies using second-line drugs will depend on TB incidence, MDR burden, and the available budget, but simulation results suggest that individualized regimens would be cost-effective in a wide range of situations.
Editors' Summary
Tuberculosis (TB) remains one of the most entrenched diseases on the planet—an estimated one in three people worldwide are infected with Mycobacterium tuberculosis, which causes the disease. Although effective drugs exist, a major reason for the failure to stem the spread of TB lies in the rise of drug-resistant strains of the bacterium. Some strains are resistant to several drugs; patients with this sort of infection are said to have multidrug-resistant (MDR) TB. The development of drug-resistant strains is fostered when health-care workers do not follow treatment guidelines or fail to ensure that patients take the whole treatment course. The World Health Organization recommends an approach to TB control called “DOTS,” which has been adopted by many countries. (See the link below for an explanation of what DOTS involves.) The antibiotics that are used in DOTS are described as “first-line” treatment drugs. They are highly effective against non-resistant TB but much less so against MDR TB. There are other, more expensive, “second-line” antibiotics that perform better against MDR TB.
  Why Was This Study Done?
Despite the worrying rise in MDR TB cases, the much higher cost of using second-line drugs is prompting some policy-makers to question the merits of introducing them in poor countries with limited resources. However, with MDR TB accounting for nearly a third of TB cases in some countries, first-line therapies seem unlikely to be sufficient in the long term. Second-line strategies, or “DOTS-Plus” strategies, are either standardized for a particular region or are chosen for individual patients on the basis of drug susceptibility tests. The researchers wanted to investigate whether standardized or individualized second-line regimens could save lives and be cost-effective in poor countries.
  What Did the Researchers Do and Find?
The researchers used a method called modeling. They took information already available about TB in Peru, where for every 100,000 people there are 120 new TB infections every year, and 4.5% of existing cases are MDR TB. The researchers then calculated what might happen over the next 30 years, comparing the likely effects of five alternative strategies. In four, new cases were given first-line drugs for 6 months. Those who were not cured were then treated in different ways. In DOTS, they were retreated with a second course of the same drugs; in STR1 they were given an 18-month standardized course of second-line and first-line drugs; in STR2, only confirmed MDR TB patients were given an 18-month standardized course of second-line and first-line drugs; and in ITR1, confirmed MDR TB patients were given a personalized regimen of second-line drugs. The fifth strategy, ITR2, tested all patients for drug susceptibility at the outset of treatment, and those with MDR TB were given an individualized course; those not cured were tested again and given another individualized course.
  Compared with DOTS, both the STR1 and STR2 strategies averted 4.8 deaths per 100,000 population, at a cost of $8,700 per averted death—with STR2 being a better value for money since it treated only confirmed MDR TB cases with the more expensive, second-line drugs. Of the individualized treatments, ITR1 averted an extra 0.9 deaths at a cost of $12,000 per averted death; ITR2 averted a further 1.2 deaths but at a much higher $160,000 per saved life.
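As a rough illustration of how these figures relate, the incremental cost-effectiveness ratio is the extra cost of a strategy divided by its extra health benefit relative to the next-best alternative. The Python sketch below back-calculates the implied extra cost per 100,000 population from the rounded figures quoted above. The resulting cost values are derived here for illustration only and are not reported by the study. The function name is hypothetical.

```python
def icer(cost_new, effect_new, cost_old, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    return (cost_new - cost_old) / (effect_new - effect_old)


# (extra deaths averted per 100,000, US$ per averted death) as quoted above.
reported = {
    "STR2 vs DOTS": (4.8, 8_700),
    "ITR1 vs STR2": (0.9, 12_000),
    "ITR2 vs ITR1": (1.2, 160_000),
}

# Implied extra cost per 100,000 population (back-calculated from rounded figures).
implied_extra_cost = {
    name: round(deaths * cost_per_death)
    for name, (deaths, cost_per_death) in reported.items()
}

# Round trip: feeding the implied cost back in recovers the quoted ratio.
str2_ratio = icer(implied_extra_cost["STR2 vs DOTS"], 4.8, 0, 0)
```

Each ratio is computed against the previous strategy rather than against DOTS, which is why the ratio for ITR2 is so much larger: it buys only 1.2 further averted deaths for a much larger additional outlay.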
  What Do These Findings Mean?
Despite the slightly higher cost of ITR1, the extra number of lives it would save compared with STR2 makes it a good approach for treatment in Peru. However, cost-effectiveness varies with other factors. If the difference in cost between the two strategies became higher than $9,500 per patient, STR would be preferable. And, if MDR TB were present in 10% of all TB cases, ITR2—with comprehensive drug susceptibility testing for all TB patients—would be best.
  The findings are of interest not just in Peru but in other developing countries where MDR TB is a growing problem. The researchers maintain that, in areas where DOTS has not yet been fully implemented, it would be more efficient to expand DOTS than to introduce DOTS-Plus. But they add that it would be beneficial to expand DOTS as well as implement DOTS-Plus. Individualized treatment after drug susceptibility testing is likely to be cost-effective even in the poorest of countries, which should give impetus to governments and organizations in those countries where MDR TB is a growing concern to modify their approach to treatment.
  Additional Information.
Please access these Web sites via the online version of this summary at
• Basic information about tuberculosis can be found on the Web site of the US National Institute of Allergy and Infectious Diseases
• The Web site of the World Health Organization's Stop TB department outlines both the DOTS and DOTS-Plus strategies
• TB Alert, a UK-based charity that promotes TB awareness worldwide, has information on TB in several European, African, and Asian languages
Resch and colleagues found that treatment of MDR TB using second-line drugs is highly cost-effective in Peru.
PMCID: PMC1483913  PMID: 16796403
22.  Examination of polymorphic glutathione S-transferase (GST) genes, tobacco smoking and prostate cancer risk among Men of African Descent: A case-control study 
BMC Cancer  2009;9:397.
Polymorphisms in glutathione S-transferase (GST) genes may influence response to oxidative stress and modify prostate cancer (PCA) susceptibility. These enzymes generally detoxify endogenous and exogenous agents, but also participate in the activation and inactivation of oxidative metabolites that may contribute to PCA development. Genetic variations within selected GST genes may influence PCA risk following exposure to carcinogenic compounds found in cigarette smoke by decreasing the ability to detoxify them. Thus, we evaluated the effects of polymorphic GSTs (M1, T1, and P1) alone and combined with cigarette smoking on PCA susceptibility.
In order to evaluate the effects of GST polymorphisms in relation to PCA risk, we used TaqMan allelic discrimination assays along with a multi-faceted statistical strategy involving conventional and advanced statistical methodologies (e.g., Multifactor Dimensionality Reduction and Interaction Graphs). Genetic profiles collected from 873 men of African descent (208 cases and 665 controls) were utilized to systematically evaluate the single and joint modifying effects of GSTM1 and GSTT1 gene deletions, GSTP1 105 Val and cigarette smoking on PCA risk.
We observed a moderately significant association with PCA risk among men possessing at least one variant GSTP1 105 Val allele (OR = 1.56; 95%CI = 0.95-2.58; p = 0.049), which was confirmed by MDR permutation testing (p = 0.001). We did not observe any significant single gene effects among GSTM1 (OR = 1.08; 95%CI = 0.65-1.82; p = 0.718) and GSTT1 (OR = 1.15; 95%CI = 0.66-2.02; p = 0.622) on PCA risk among all subjects. Although the GSTM1-GSTP1 pairwise combination was selected as the best two-factor LR and MDR model (p = 0.01), assessment of the hierarchical entropy graph suggested that the observed synergistic effect was primarily driven by the GSTP1 Val marker. Notably, the GSTM1-GSTP1 axis did not provide additional information gain when compared to either locus alone based on a hierarchical entropy algorithm and graph. Smoking status did not significantly modify the relationship between the GST SNPs and PCA.
A moderately significant association was observed between PCA risk and possession of at least one variant GSTP1 105 Val allele (p = 0.049) among men of African descent. We also observed a 2.1-fold increase in PCA risk associated with men possessing the GSTP1 (Val/Val) and GSTM1 (*1/*1 + *1/*0) alleles. MDR analysis validated these findings, detecting GSTP1 105 Val (p = 0.001) as the best single factor for predicting PCA risk. Our findings emphasize the importance of utilizing a combination of traditional and advanced statistical tools to identify and validate single gene and multi-locus interactions in relation to cancer susceptibility.
PMCID: PMC2783040  PMID: 19917083
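Odds ratios with 95% confidence intervals, as quoted in the abstract above, are commonly obtained from a 2x2 table of carrier status by case status. The Python sketch below uses the standard log-odds (Woolf) interval. The counts are invented for illustration and are not taken from the study, and the abstract does not state which method the authors used.

```python
from math import exp, log, sqrt


def odds_ratio_ci(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio and Wald-type confidence interval on the log-odds scale."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 carrier cases, 10 non-carrier cases,
# 10 carrier controls, 20 non-carrier controls.
toy_or, toy_lower, toy_upper = odds_ratio_ci(20, 10, 10, 20)
```

An interval computed this way excludes 1 exactly when the corresponding two-sided Wald test is significant at the 5% level.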
23.  Gene ontology analysis of pairwise genetic associations in two genome-wide studies of sporadic ALS 
BioData Mining  2012;5:9.
It is increasingly clear that common human diseases have a complex genetic architecture characterized by both additive and nonadditive genetic effects. The goal of the present study was to determine whether patterns of both additive and nonadditive genetic associations aggregate in specific functional groups as defined by the Gene Ontology (GO).
We first estimated all pairwise additive and nonadditive genetic effects using the multifactor dimensionality reduction (MDR) method that makes few assumptions about the underlying genetic model. Statistical significance was evaluated using permutation testing in two genome-wide association studies of ALS. The detection data consisted of 276 subjects with ALS and 271 healthy controls while the replication data consisted of 221 subjects with ALS and 211 healthy controls. Both studies included genotypes from approximately 550,000 single-nucleotide polymorphisms (SNPs). Each SNP was mapped to a gene if it was within 500 kb of the start or end. Each SNP was assigned a p-value based on its strongest joint effect with the other SNPs. We then used the Exploratory Visual Analysis (EVA) method and software to assign a p-value to each gene based on the overabundance of significant SNPs at the α = 0.05 level in the gene. We also used EVA to assign p-values to each GO group based on the overabundance of significant genes at the α = 0.05 level. A GO category was determined to replicate if that category was significant at the α = 0.05 level in both studies. We found two GO categories that replicated in both studies. The first, ‘Regulation of Cellular Component Organization and Biogenesis’, a GO Biological Process, had p-values of 0.010 and 0.014 in the detection and replication studies, respectively. The second, ‘Actin Cytoskeleton’, a GO Cellular Component, had p-values of 0.040 and 0.046 in the detection and replication studies, respectively.
Pathway analysis of pairwise genetic associations in two GWAS of sporadic ALS revealed a set of genes involved in cellular component organization, and more specifically in the actin cytoskeleton, that were not reported by prior GWAS. However, prior biological studies have implicated the actin cytoskeleton in ALS and other motor neuron diseases. This study supports the idea that pathway-level analysis of GWAS data may discover important associations not revealed using conventional one-SNP-at-a-time approaches.
PMCID: PMC3463436  PMID: 22839596
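The idea of an "overabundance of significant SNPs at the α = 0.05 level" in the abstract above can be illustrated with an exact binomial upper-tail probability. The abstract does not describe how EVA computes its p-values, so the Python sketch below is a stand-in under the simplifying assumption that SNPs are independent. The function name and example numbers are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.05):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more significant SNPs."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical gene with 40 SNPs, 6 of which have p < 0.05.
# Under the null, about 40 * 0.05 = 2 significant SNPs are expected.
gene_p_value = binomial_upper_tail(6, 40)
```

The same calculation can be applied one level up, with significant genes counted within a Gene Ontology category. In real data, linkage disequilibrium between neighboring SNPs violates the independence assumption, so a permutation-based null would be more defensible.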
24.  A comparison of internal model validation methods for multifactor dimensionality reduction in the case of genetic heterogeneity 
BMC Research Notes  2012;5:623.
Determining the genes responsible for certain human traits can be challenging when the underlying genetic model takes a complicated form such as heterogeneity (in which different genetic models can result in the same trait) or epistasis (in which genes interact with other genes and the environment). Multifactor Dimensionality Reduction (MDR) is a widely used method that effectively detects epistasis; however, it does not perform well in the presence of heterogeneity partly due to its reliance on cross-validation for internal model validation. Cross-validation allows for only one “best” model and is therefore inadequate when more than one model could cause the same trait. We hypothesize that another internal model validation method known as a three-way split will be better at detecting heterogeneity models.
In this study, we test this hypothesis with a simulation study that compares the performance of MDR in detecting models of heterogeneity under the two internal model validation techniques. We simulated a range of disease models with both main effects and gene-gene interactions across a range of effect sizes. We assessed the performance of each method using a range of definitions of power.
Overall, the power of MDR to detect heterogeneity models was relatively poor, especially under more conservative (strict) definitions of power. While the overall power was low, our results show that the cross-validation approach greatly outperformed the three-way split approach in detecting heterogeneity. This would motivate using cross-validation with MDR in studies where heterogeneity might be present. These results also emphasize the challenge of detecting heterogeneity models and the need for further methods development.
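To make the two internal validation schemes compared above concrete, the following Python sketch shows how the subject indices might be partitioned under each. The number of folds, the equal three-way proportions, and the function names are assumptions for illustration; the abstract does not report the settings used in the study.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Every subject appears in exactly one test fold, and the model that
    is most consistent across folds is reported as the single best model.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in range(k) if f != i for j in folds[f]]
        yield train, folds[i]


def three_way_split(n, fractions=(1 / 3, 1 / 3), seed=0):
    """Split indices once into training, testing and validation sets.

    ASSUMPTION: equal thirds; the remainder after the first two parts
    becomes the validation set.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * fractions[0])
    n_test = int(n * fractions[1])
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    valid = idx[n_train + n_test:]
    return train, test, valid


train, test, valid = three_way_split(900)
print(len(train), len(test), len(valid))
```

The practical difference is that cross-validation reuses every subject for both fitting and testing across folds, whereas the three-way split fits each candidate model on only a fraction of the data.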
PMCID: PMC3599301  PMID: 23126544
25.  Multifactor dimensionality reduction for graphics processing units enables genome-wide testing of epistasis in sporadic ALS 
Bioinformatics  2010;26(5):694-695.
Motivation: Epistasis, the presence of gene–gene interactions, has been hypothesized to be at the root of many common human diseases, but current genome-wide association studies largely ignore its role. Multifactor dimensionality reduction (MDR) is a powerful model-free method for detecting epistatic relationships between genes, but computational costs have made its application to genome-wide data difficult. Graphics processing units (GPUs), the hardware responsible for rendering computer games, are powerful parallel processors. Using GPUs to run MDR on a genome-wide dataset allows for statistically rigorous testing of epistasis.
Results: The implementation of MDR for GPUs (MDRGPU) includes core features of the widely used Java software package, MDR. This GPU implementation allows for large-scale analysis of epistasis at a dramatically lower cost than the standard CPU-based implementations. As a proof-of-concept, we applied this software to a genome-wide study of sporadic amyotrophic lateral sclerosis (ALS). We discovered a statistically significant two-SNP classifier and subsequently replicated the significance of these two SNPs in an independent study of ALS. MDRGPU makes the large-scale analysis of epistasis tractable and opens the door to statistically rigorous testing of interactions in genome-wide datasets.
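For readers unfamiliar with the core MDR step that such an implementation must repeat for every SNP pair, here is a minimal CPU-only Python sketch. It is not the MDRGPU code: the function name `mdr_two_snp`, the 0/1/2 genotype coding, and the use of in-sample balanced accuracy without cross-validation or permutation testing are all simplifying assumptions.

```python
from collections import Counter


def mdr_two_snp(snp_a, snp_b, status):
    """Pool two-SNP genotype cells into high/low risk and score the result.

    snp_a, snp_b -- genotypes coded 0/1/2 (ASSUMED coding)
    status       -- 1 for case, 0 for control
    Requires at least one case and one control.
    Returns the set of high-risk cells and the in-sample balanced accuracy.
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(snp_a, snp_b, status):
        (cases if y == 1 else controls)[(a, b)] += 1

    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    threshold = n_cases / n_controls  # overall case:control ratio

    # A cell is high risk when its case:control ratio exceeds the threshold.
    high_risk = {
        cell
        for cell in set(cases) | set(controls)
        if cases[cell] > threshold * controls[cell]
    }

    sensitivity = sum(cases[c] for c in high_risk) / n_cases
    specificity = sum(
        n for c, n in controls.items() if c not in high_risk
    ) / n_controls
    return high_risk, 0.5 * (sensitivity + specificity)


# Toy XOR-like pattern: neither SNP is informative on its own.
cells, acc = mdr_two_snp([0, 1, 0, 1], [1, 0, 0, 1], [1, 1, 0, 0])
print(sorted(cells), acc)
```

Because each SNP pair is scored independently of every other pair, an exhaustive pairwise scan of this kind parallelizes naturally, which is why it suits GPU hardware.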
Availability: MDRGPU is open source and available free of charge from
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2828117  PMID: 20081222

