Gene regulatory networks (GRNs) drive the cellular processes that sustain life. To do so reliably, GRNs must be robust to perturbations, such as gene deletion and the addition or removal of regulatory interactions. GRNs must also be robust to genetic changes in regulatory regions that define the logic of signal-integration, as these changes can affect how specific combinations of regulatory signals are mapped to particular gene expression states. Previous theoretical analyses have demonstrated that the robustness of a GRN is influenced by its underlying topological properties, such as degree distribution and modularity. Another important topological property is assortativity, which measures the propensity with which nodes of similar connectivity are connected to one another. How assortativity influences the robustness of the signal-integration logic of GRNs remains an open question. Here, we use computational models of GRNs to investigate this relationship. We separately consider each of the three dynamical regimes of this model for a variety of degree distributions. We find that in the chaotic regime, robustness exhibits a pronounced increase as assortativity becomes more positive, while in the critical and ordered regimes, robustness is generally less sensitive to changes in assortativity. We attribute the increased robustness to a decrease in the duration of the gene expression pattern, which is caused by a reduction in the average size of a GRN’s in-components. This study provides the first direct evidence that assortativity influences the robustness of the signal-integration logic of computational models of GRNs, illuminates a mechanistic explanation for this influence, and furthers our understanding of the relationship between topology and robustness in complex biological systems.
Boolean networks; regulatory regions; in-components; genetic regulation
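The notion of assortativity used in the abstract above can be illustrated with a minimal sketch. It assumes an undirected network and measures assortativity as the Pearson correlation between the degrees at the two ends of each edge; GRNs are directed, so the study itself presumably uses a directed variant. The function name and the toy network are hypothetical.

```python
from math import sqrt

def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge.

    Assumption: the network is undirected and given as a list of (u, v) pairs.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # every node has the same degree: undefined
    return cov / sqrt(var_x * var_y)

# Toy example: a star network, where one hub connects only to low-degree leaves.
star = [("hub", leaf) for leaf in "abcd"]
star_assortativity = degree_assortativity(star)
```

A star network is maximally disassortative under this measure, because every edge joins the highest-degree node to a lowest-degree node.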
The rapid development of sequencing technologies makes thousands to millions of genetic attributes available for testing associations with various biological traits. Searching this enormous high-dimensional data space imposes a great computational challenge in genome-wide association studies. We introduce a network-based approach to supervise the search for three-locus models of disease susceptibility. Such statistical epistasis networks (SEN) are built using strong pairwise epistatic interactions and provide a global interaction map to search for higher-order interactions by prioritizing genetic attributes clustered together in the networks. Applying this approach to a population-based bladder cancer dataset, we found a high susceptibility three-way model of genetic variations in DNA repair and immune regulation pathways, which holds great potential for studying the etiology of bladder cancer with further biological validations. We demonstrate that our SEN-supervised search is able to find a small subset of three-locus models with significantly high associations at a substantially reduced computational cost.
Epistasis; High-order genetic interactions; GWAS; Statistical epistasis networks; MDR
Decades after the eradication of smallpox, its etiological agent, variola virus (VARV), remains a threat as a potential bioweapon. Outbreaks of smallpox around the time of the global eradication effort exhibited variable case fatality rates (CFRs), likely attributable in part to complex viral genetic determinants of smallpox virulence. We aimed to identify genome-wide single nucleotide polymorphisms associated with CFR. We evaluated unadjusted and outbreak geographic location-adjusted models of single SNPs and two- and three-way interactions between SNPs.
Using the data mining approach multifactor dimensionality reduction (MDR), we identified five VARV SNPs in models significantly associated with CFR. The top-performing unadjusted and adjusted models both revealed the same two-way gene-gene interaction. We discuss the biological plausibility of the influence of the SNPs identified in these and other significant models on the strain-specific virulence of VARV.
We have identified genetic loci in the VARV genome that are statistically associated with VARV virulence as measured by CFR. While our ability to infer a causal relationship between the specific SNPs identified in our analysis and VARV virulence is limited, our results suggest that smallpox severity is in part associated with VARV strain variation and that VARV virulence may be determined by multiple genetic loci. This study represents the first application of MDR to the identification of pathogen gene-gene interactions for predicting infectious disease outbreak severity.
Smallpox; Variola virus; Single nucleotide polymorphisms; Multifactor dimensionality reduction
The collection and analysis of genomic data has the potential to reveal novel druggable targets by providing insight into the genetic basis of disease. However, the number of drugs, targeting new molecular entities, approved by the US Food and Drug Administration (FDA) has not increased in the years since the collection of genomic data has become commonplace. The paucity of translatable results can be partly attributed to conventional analysis methods that test one gene at a time in an effort to identify disease-associated factors as candidate drug targets. By disengaging genetic factors from their position within the genetic regulatory system, much of the information stored within the genomic data set is lost. Here we discuss how genomic data is used to identify disease-associated genes or genomic regions, how disease-associated regions are validated as functional targets, and the role network analysis can play in bridging the gap between data generation and effective drug target identification.
Algorithms designed to detect complex genetic disease associations are initially evaluated using simulated datasets. Typical evaluations vary constraints that influence the correct detection of underlying models (i.e. number of loci, heritability, and minor allele frequency). Such studies neglect to account for model architecture (i.e. the unique specification and arrangement of penetrance values comprising the genetic model), which alone can influence the detectability of a model. In order to design a simulation study which efficiently takes architecture into account, a reliable metric is needed for model selection.
We evaluate three metrics as predictors of relative model detection difficulty derived from previous works: (1) Penetrance table variance (PTV), (2) customized odds ratio (COR), and (3) our own Ease of Detection Measure (EDM), calculated from the penetrance values and respective genotype frequencies of each simulated genetic model. We evaluate the reliability of these metrics across three very different data search algorithms, each with the capacity to detect epistatic interactions. We find that a model’s EDM and COR are each stronger predictors of model detection success than heritability.
This study formally identifies and evaluates metrics which quantify model detection difficulty. We utilize these metrics to intelligently select models from a population of potential architectures. This allows for an improved simulation study design which accounts for differences in detection difficulty attributed to model architecture. We implement the calculation and utilization of EDM and COR into GAMETES, an algorithm which rapidly and precisely generates pure, strict, n-locus epistatic models.
EDM; COR; GAMETES; SNP; Model detection; Epistasis; Simulation; Model; Genetics
The study of common, complex multifactorial diseases in genetic epidemiology is complicated by nonlinearity in the genotype-to-phenotype mapping relationship that is due, in part, to epistasis or gene-gene interactions. Symbolic discriminant analysis (SDA) is a flexible modeling approach which uses genetic programming (GP) to evolve an optimal predictive model using a predefined collection of mathematical functions, constants, and attributes. This has been shown to be an effective strategy for modeling epistasis. In the present study, we introduce the genetic “mask” as a novel building block which exploits expert knowledge in the form of a pre-constructed relationship between two attributes. The goal of this study was to determine whether the availability of “mask” building blocks improves SDA performance. The results of this study support the idea that pre-processing data improves GP performance.
Genetic Analysis; Genetic Epidemiology; Genetic Programming; Symbolic Discriminant Analysis; Symbolic Regression; Function Set; Two-Locus Model; Genetic Mask
Schizophrenia is a complex genetic disorder. Gene set-based analytic (GSA) methods have been widely applied for exploratory analyses of large, high-throughput datasets, but less commonly employed for biological hypothesis testing. Our primary hypothesis is that variation in ion channel genes contributes to the genetic susceptibility to schizophrenia. We applied Exploratory Visual Analysis (EVA), one GSA application, to analyze European-American (EA) and African-American (AA) schizophrenia genome-wide association study datasets for statistical enrichment of ion channel gene sets, comparing GSA results derived under three SNP-to-gene mapping strategies: (1) GENIC; (2) 500-Kb; (3) 2.5-Mb; and three complementary SNP-to-gene statistical reduction methods: (1) minimum p value (pMIN); (2) a novel method, proportion of SNPs per gene with p-values below a pre-defined α-threshold (PROP); and (3) the truncated product method (TPM). In the EA analyses, ion channel gene set(s) were enriched under all mapping and statistical approaches. In the AA analysis, ion channel gene set(s) were significantly enriched under pMIN for all mapping strategies and under PROP for broader mapping strategies. Less extensive enrichment in the AA sample may reflect true ethnic differences in susceptibility, sampling or case ascertainment differences, or higher dimensionality relative to sample size of the AA data. More consistent findings under broader mapping strategies may reflect enhanced power due to increased SNP inclusion, enhanced capture of effects over extended haplotypes or significant contributions from regulatory regions. While extensive pMIN findings may reflect gene size bias, the extent and significance of PROP and TPM findings suggest that common variation at ion channel genes may capture some of the heritability of schizophrenia.
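The three SNP-to-gene reduction methods named above can be sketched as follows. This is an illustration only, not the EVA implementation: the function name, the default values of `alpha` and `tau`, and the example p-values are assumptions. It returns the raw statistics; judging their significance would additionally require a null distribution, which is not shown.

```python
from math import prod

def gene_level_statistics(p_values, alpha=0.05, tau=0.05):
    """Summarize the SNP p-values mapped to one gene in three ways.

    alpha and tau are hypothetical defaults chosen for illustration.
    """
    # pMIN: the smallest SNP p-value in the gene.
    p_min = min(p_values)
    # PROP: the fraction of the gene's SNPs with p-values below alpha.
    prop = sum(p < alpha for p in p_values) / len(p_values)
    # TPM: the product of all p-values at or below the truncation point tau
    # (an empty product is 1 when no p-value qualifies).
    tpm = prod(p for p in p_values if p <= tau)
    return p_min, prop, tpm

# Hypothetical p-values for four SNPs mapped to a single gene.
p_min, prop, tpm = gene_level_statistics([0.001, 0.04, 0.2, 0.6])
```

The sketch makes the gene size concern in the abstract concrete: pMIN can only decrease as more SNPs are mapped to a gene, whereas PROP is normalized by the number of SNPs.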
Clinical studies suggest metabolic memory to hyperglycemia. We tested whether diabetes leads to persistent systematic in vitro gene expression alterations in patients with type 1 diabetes (T1D) compared with their monozygotic, nondiabetic twins. Microarray gene expression was determined in skin fibroblasts (SFs) of five twin pairs cultured in high glucose (HG) for ∼6 weeks. The Exploratory Visual Analysis System tested group differences in gene expression levels within KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways. An overabundance of differentially expressed genes was found in eight pathways: arachidonic acid metabolism (P = 0.003849), transforming growth factor-β signaling (P = 0.009167), glutathione metabolism (P = 0.01281), glycosylphosphatidylinositol anchor (P = 0.01949), adherens junction (P = 0.03134), dorsal-ventral axis formation (P = 0.03695), proteasome (P = 0.04327), and complement and coagulation cascade (P = 0.04666). Several genes involved in epigenetic mechanisms were also differentially expressed. All differentially expressed pathways and all the epigenetically relevant differentially expressed genes have previously been related to HG in vitro or to diabetes and its complications in animal and human studies. However, this is the first in vitro study demonstrating diabetes-relevant gene expression differences between T1D-discordant identical twins. These SF gene expression differences, persistent despite the HG in vitro conditions, likely reflect “metabolic memory”, and discordant identical twins thus represent an excellent model for studying diabetic epigenetic processes in humans.
Simulation studies are useful in various disciplines for a number of reasons including the development and evaluation of new computational and statistical methods. This is particularly true in human genetics and genetic epidemiology where new analytical methods are needed for the detection and characterization of disease susceptibility genes whose effects are complex, nonlinear, and partially or solely dependent on the effects of other genes. Despite this need, the development of complex genetic models that can be used to simulate data is not always intuitive. In fact, only a few such models have been published. In this paper, we present a strategy for identifying complex genetic models for simulation studies that utilizes genetic algorithms. The genetic models used in this study are penetrance functions that define the probability of disease given that a specific DNA sequence variation has been inherited. We demonstrate that the genetic algorithm approach routinely identifies interesting and useful penetrance functions in a human-competitive manner.
Epistasis is recognized as ubiquitous in the genetic architecture of complex traits such as disease susceptibility. Experimental studies in model organisms have revealed extensive evidence of biological interactions among genes. Meanwhile, statistical and computational studies in human populations have suggested non-additive effects of genetic variation on complex traits. Although these studies form a baseline for understanding the genetic architecture of complex traits, to date they have only considered interactions among a small number of genetic variants. Our goal here is to use network science to determine the extent to which non-additive interactions exist beyond small subsets of genetic variants. We infer statistical epistasis networks to characterize the global space of pairwise interactions among approximately 1500 Single Nucleotide Polymorphisms (SNPs) spanning nearly 500 cancer susceptibility genes in a large population-based study of bladder cancer.
The statistical epistasis network was built by linking pairs of SNPs if their pairwise interactions were stronger than a systematically derived threshold. Its topology clearly differentiated this real-data network from networks obtained from permutations of the same data under the null hypothesis that no association exists between genotype and phenotype. The network had a significantly higher number of hub SNPs and, interestingly, these hub SNPs did not necessarily have high main effects. The network had a largest connected component of 39 SNPs that was absent from all of the permuted-data networks. In addition, the vertex degrees of this network distinctively followed an approximate power-law distribution and its topology appeared scale-free.
In contrast to many existing techniques focusing on high main-effect SNPs or models of several interacting SNPs, our network approach characterized a global picture of gene-gene interactions in a population-based genetic dataset. The network was built using pairwise interactions, and its distinctive network topology and large connected components indicated joint effects in a large set of SNPs. Our observations suggested that this particular statistical epistasis network captured important features of the genetic architecture of bladder cancer that have not been described previously.
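The network construction described above (link two SNPs when their pairwise interaction exceeds a threshold, then inspect degrees and connected components) can be sketched as follows. The SNP names, interaction scores, and threshold are hypothetical; the abstract does not say how interaction strength is scored or how the threshold is derived, so both are simply taken as inputs here.

```python
from collections import defaultdict

def build_network(pair_scores, threshold):
    """Link two SNPs when their pairwise interaction score exceeds the threshold."""
    adjacency = defaultdict(set)
    for (snp_a, snp_b), score in pair_scores.items():
        if score > threshold:
            adjacency[snp_a].add(snp_b)
            adjacency[snp_b].add(snp_a)
    return adjacency

def largest_component(adjacency):
    """Return the node set of the largest connected component (depth-first search)."""
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Hypothetical pairwise interaction scores between five SNPs.
scores = {
    ("rs1", "rs2"): 0.9,
    ("rs2", "rs3"): 0.8,
    ("rs4", "rs5"): 0.7,
    ("rs1", "rs5"): 0.1,  # below the threshold, so no edge is created
}
network = build_network(scores, threshold=0.5)
hub_degree = len(network["rs2"])
component = largest_component(network)
```

Running the same construction on phenotype-permuted data would give the null networks against which the real-data topology is compared.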
Over the past several years, genome-wide association studies (GWAS) have succeeded in identifying hundreds of genetic markers associated with common diseases. However, most of these markers confer relatively small increments of risk and explain only a small proportion of familial clustering. To identify obstacles to future progress in genetic epidemiology research and provide recommendations to NIH for overcoming these barriers, the National Cancer Institute sponsored a workshop entitled “Next Generation Analytic Tools for Large-Scale Genetic Epidemiology Studies of Complex Diseases” on September 15–16, 2010. The goal of the workshop was to facilitate discussions on (1) statistical strategies and methods to efficiently identify genetic and environmental factors contributing to the risk of complex disease; and (2) how to develop, apply, and evaluate these strategies for the design, analysis, and interpretation of large-scale complex disease association studies in order to guide NIH in setting the future agenda in this area of research. The workshop was organized as a series of short presentations covering scientific (gene-gene and gene-environment interaction, complex phenotypes, and rare variants and next generation sequencing) and methodological (simulation modeling and computational resources and data management) topic areas. Specific needs to advance the field were identified during each session and are summarized.
gene-gene interactions; gene-environment interactions; rare variants; next generation sequencing; complex phenotypes; simulations; computational resources
Genome-wide association studies (GWAS) have evolved over the last ten years into a powerful tool for investigating the genetic architecture of human disease. In this work, we review the key concepts underlying GWAS, including the architecture of common diseases, the structure of common human genetic variation, technologies for capturing genetic information, study designs, and the statistical methods used for data analysis. We also look forward to the future beyond GWAS.
A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components instead of single component failures. This greatly complicates both the task of selecting informative genetic variants and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Previously these methods have been evaluated with datasets simulated according to pre-defined genetic models.
Here we develop and evaluate a model-free evolution strategy to generate datasets which display a complex relationship between individual genotype and disease susceptibility. We show that this model-free approach is capable of generating a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. We specifically generate eight hundred Pareto fronts, one for each independent run of our algorithm. In each run the predictiveness of single genetic variants and pairs of genetic variants is minimized, while the predictiveness of third-, fourth-, or fifth-order combinations is maximized. Two hundred runs of the algorithm are further dedicated to creating datasets with predictive fourth- or fifth-order interactions and minimized lower-level effects.
This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models. This allows researchers to evaluate which methods will succeed on human genetics problems where the model is not known in advance. We further make freely available to the community the entire Pareto-optimal front of datasets from each run so that novel methods may be rigorously evaluated. These 76,600 datasets are available from http://discovery.dartmouth.edu/model_free_data/.
In human genetics it is now possible to measure large numbers of DNA sequence variations across the human genome. Given current knowledge about biological networks and disease processes it seems likely that disease risk can best be modeled by interactions between biological components, which may be examined as interacting DNA sequence variations. The machine learning challenge is to effectively explore interactions in these datasets to identify combinations of variations which are predictive of common human diseases. Genetic programming is a promising approach to this problem. The goal of this study is to examine the role that an expert knowledge aware initializer can play in the framework of genetic programming. We show that this expert knowledge aware initializer outperforms both a random initializer and an enumerative initializer.
Simulation studies are useful in various disciplines for a number of reasons including the development and evaluation of new computational and statistical methods. This is particularly true in human genetics and genetic epidemiology where new analytical methods are needed for the detection and characterization of disease susceptibility genes whose effects are complex, nonlinear, and partially or solely dependent on the effects of other genes (i.e. epistasis or gene-gene interaction). Despite this need, the development of complex genetic models that can be used to simulate data is not always intuitive. In fact, only a few such models have been published. We have previously developed a genetic algorithm approach to discovering complex genetic models in which two single nucleotide polymorphisms (SNPs) influence disease risk solely through nonlinear interactions. In this paper, we extend this approach for the discovery of high-order epistasis models involving three to five SNPs. We demonstrate that the genetic algorithm is capable of routinely discovering interesting high-order epistasis models in which each SNP influences risk of disease only through interactions with the other SNPs in the model. This study opens the door for routine simulation of complex gene-gene interactions among SNPs for the development and evaluation of new statistical and computational approaches for identifying common, complex multifactorial disease susceptibility genes.
Gene-Gene Interactions; Simulation; Penetrance; Genetic Epidemiology
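The defining property of the epistasis models described above, that each SNP influences risk only through interaction, can be checked numerically. The sketch below covers the two-locus case for brevity, assumes Hardy-Weinberg genotype frequencies, and uses a hypothetical penetrance table chosen for illustration; it is not a model taken from the study.

```python
def genotype_frequencies(maf):
    """Hardy-Weinberg frequencies for genotypes AA, Aa, aa (assumption)."""
    major = 1 - maf
    return [major * major, 2 * major * maf, maf * maf]

def marginal_penetrances(table, maf_a, maf_b):
    """Average a 3x3 penetrance table over the other locus.

    table[i][j] is P(disease | genotype i at SNP A, genotype j at SNP B).
    """
    freq_a = genotype_frequencies(maf_a)
    freq_b = genotype_frequencies(maf_b)
    by_snp_a = [sum(table[i][j] * freq_b[j] for j in range(3)) for i in range(3)]
    by_snp_b = [sum(table[i][j] * freq_a[i] for i in range(3)) for j in range(3)]
    return by_snp_a, by_snp_b

# Hypothetical two-locus penetrance table with an XOR-like pattern.
table = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]
by_snp_a, by_snp_b = marginal_penetrances(table, maf_a=0.5, maf_b=0.5)
```

With both minor allele frequencies at 0.5, every marginal penetrance equals 0.05, so neither SNP shows a main effect even though the joint genotype changes disease probability.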
The identification of genes that influence the risk of common, complex disease primarily through interactions with other genes and environmental factors remains a statistical and computational challenge in genetic epidemiology. This challenge is partly due to the limitations of parametric statistical methods for detecting genetic effects that are dependent solely or partially on interactions. We have previously introduced a genetic programming neural network (GPNN) as a method for optimizing the architecture of a neural network to improve the identification of genetic and gene-environment combinations associated with disease risk. Previous empirical studies suggest GPNN has excellent power for identifying gene-gene and gene-environment interactions. The goal of this study was to compare the power of GPNN to stepwise logistic regression (SLR) and classification and regression trees (CART) for identifying gene-gene and gene-environment interactions. SLR and CART are standard methods of analysis for genetic association studies. Using simulated data, we show that GPNN has higher power to identify gene-gene and gene-environment interactions than SLR and CART. These results indicate that GPNN may be a useful pattern recognition approach for detecting gene-gene and gene-environment interactions in studies of human disease.
One of the central goals of human genetics is the identification of loci with alleles or genotypes that confer increased susceptibility. The availability of dense maps of single-nucleotide polymorphisms (SNPs) along with high-throughput genotyping technologies has set the stage for routine genome-wide association studies that are expected to significantly improve our ability to identify susceptibility loci. Before this promise can be realized, there are some significant challenges that need to be addressed. We address here the challenge of detecting epistasis or gene-gene interactions in genome-wide association studies. Discovering epistatic interactions in high dimensional datasets remains a challenge due to the computational complexity resulting from the analysis of all possible combinations of SNPs. One potential way to overcome the computational burden of a genome-wide epistasis analysis would be to devise a logical way to prioritize the many SNPs in a dataset so that the data may be analyzed more efficiently and yet still retain important biological information. One of the strongest demonstrations of the functional relationship between genes is protein-protein interaction. Thus, it is plausible that the expert knowledge extracted from protein interaction databases may allow for a more efficient analysis of genome-wide studies as well as facilitate the biological interpretation of the data. In this review we will discuss the challenges of detecting epistasis in genome-wide genetic studies and the means by which we propose to apply expert knowledge extracted from protein interaction databases to facilitate this process. We explore some of the fundamentals of protein interactions and the databases that are publicly available.
protein-protein interaction; expert knowledge; epistasis; MDR; SNP
Polymorphisms in glutathione S-transferase (GST) genes may influence response to oxidative stress and modify prostate cancer (PCA) susceptibility. These enzymes generally detoxify endogenous and exogenous agents, but also participate in the activation and inactivation of oxidative metabolites that may contribute to PCA development. Genetic variations within selected GST genes may influence PCA risk following exposure to carcinogenic compounds found in cigarette smoke and decrease the ability to detoxify them. Thus, we evaluated the effects of polymorphic GSTs (M1, T1, and P1) alone and combined with cigarette smoking on PCA susceptibility.
In order to evaluate the effects of GST polymorphisms in relation to PCA risk, we used TaqMan allelic discrimination assays along with a multi-faceted statistical strategy involving conventional and advanced statistical methodologies (e.g., Multifactor Dimensionality Reduction and Interaction Graphs). Genetic profiles collected from 873 men of African-descent (208 cases and 665 controls) were utilized to systematically evaluate the single and joint modifying effects of GSTM1 and GSTT1 gene deletions, GSTP1 105 Val and cigarette smoking on PCA risk.
We observed a moderately significant association between PCA risk and possession of at least one variant GSTP1 105 Val allele (OR = 1.56; 95%CI = 0.95-2.58; p = 0.049), which was confirmed by MDR permutation testing (p = 0.001). We did not observe any significant single gene effects among GSTM1 (OR = 1.08; 95%CI = 0.65-1.82; p = 0.718) and GSTT1 (OR = 1.15; 95%CI = 0.66-2.02; p = 0.622) on PCA risk among all subjects. Although the GSTM1-GSTP1 pairwise combination was selected as the best two-factor LR and MDR model (p = 0.01), assessment of the hierarchical entropy graph suggested that the observed synergistic effect was primarily driven by the GSTP1 Val marker. Notably, the GSTM1-GSTP1 axis did not provide additional information gain when compared to either locus alone based on a hierarchical entropy algorithm and graph. Smoking status did not significantly modify the relationship between the GST SNPs and PCA.
A moderately significant association was observed between PCA risk and possession of at least one variant GSTP1 105 Val allele (p = 0.049) among men of African descent. We also observed a 2.1-fold increase in PCA risk associated with men possessing the GSTP1 (Val/Val) and GSTM1 (*1/*1 + *1/*0) alleles. MDR analysis validated these findings, detecting GSTP1 105 Val (p = 0.001) as the best single factor for predicting PCA risk. Our findings emphasize the importance of utilizing a combination of traditional and advanced statistical tools to identify and validate single gene and multi-locus interactions in relation to cancer susceptibility.
Background: Obesity is a growing worldwide problem with genetic and environmental causes, and it is an underlying basis for many diseases. Studies have shown that the toxicant-activated aryl hydrocarbon receptor (AHR) may disrupt fat metabolism and contribute to obesity. The AHR is a nuclear receptor/transcription factor that is best known for responding to environmental toxicant exposures to induce a battery of xenobiotic-metabolizing genes.
Objectives: The intent of the work reported here was to test more directly the role of the AHR in obesity and fat metabolism in the absence of exogenous toxicants.
Methods: We used two congenic mouse models that differ at the Ahr gene and encode AHRs with a 10-fold difference in signaling activity. The two mouse strains were fed either a low-fat (regular) diet or a high-fat (Western) diet.
Results: The Western diet differentially affected body size, body fat:body mass ratios, liver size and liver metabolism, and liver mRNA and miRNA profiles. The regular diet had no significant differential effects.
Conclusions: The results suggest that the AHR plays a large and broad role in obesity and associated complications, and importantly, may provide a simple and effective therapeutic strategy to combat obesity, heart disease, and other obesity-associated illnesses.
aryl hydrocarbon receptor; gene–environment interaction; liver; mRNA; miRNA; obesity; Western diet
Genome-wide association studies are becoming the de facto standard in the genetic analysis of common human diseases. Given the complexity and robustness of biological networks, such diseases are unlikely to be the result of single points of failure but instead likely arise from the joint failure of two or more interacting components. The hope in genome-wide screens is that these points of failure can be linked to single nucleotide polymorphisms (SNPs) which confer disease susceptibility. Detecting interacting variants that lead to disease in the absence of single-gene effects is difficult, however, and methods to exhaustively analyze sets of these variants for interactions are combinatorial in nature, making them computationally infeasible. Efficient algorithms which can detect interacting SNPs are needed. ReliefF is one such promising algorithm, although it has a low success rate for noisy datasets when the interaction effect is small. ReliefF has been paired with an iterative approach, Tuned ReliefF (TuRF), which improves the estimation of weights in noisy data but does not fundamentally change the underlying ReliefF algorithm. To improve the sensitivity of studies using these methods to detect small effects we introduce Spatially Uniform ReliefF (SURF).
SURF's ability to detect interactions in this domain is significantly greater than that of ReliefF. Similarly, SURF in combination with the TuRF strategy significantly outperforms TuRF alone for SNP selection under an epistasis model. It is important to note that this increase in success rate does not require an increase in algorithmic complexity and is achieved even with the removal of a nuisance parameter from the algorithm.
Researchers performing genetic association studies and aiming to discover gene-gene interactions associated with increased disease susceptibility should use SURF in place of ReliefF. For instance, SURF should be used instead of ReliefF to filter a dataset before an exhaustive MDR analysis. This change increases the ability of a study to detect gene-gene interactions. The SURF algorithm is implemented in the open source Multifactor Dimensionality Reduction (MDR) software package available from .
Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions.
We found that using MDR on GPUs consistently increased performance per machine over both a feature-rich Java software package and a C++ cluster implementation. A GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore, this GPU system provides extremely cost-effective performance while leaving the CPU available for other tasks. The GPU workstation containing three GPUs costs $2000, while obtaining similar performance on a Beowulf cluster requires 150 CPU cores which, including the added infrastructure and support cost of the cluster system, cost approximately $82,500.
Graphics hardware based computing provides a cost effective means to perform genetic analysis of epistasis using MDR on large datasets without the infrastructure of a computing cluster.
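The exhaustive search that makes MDR expensive can be sketched on the CPU as follows. This is a simplified illustration, not the GPU, Java, or C++ implementation discussed above: it omits cross-validation, scores each SNP combination by balanced accuracy (an assumption), and uses hypothetical function names and toy data. It assumes the dataset contains at least one case and one control.

```python
from collections import Counter
from itertools import combinations

def mdr_balanced_accuracy(genotypes, status, snps):
    """Score one SNP combination with a simplified MDR step."""
    cases, controls = Counter(), Counter()
    for row, is_case in zip(genotypes, status):
        cell = tuple(row[s] for s in snps)  # multi-locus genotype
        (cases if is_case else controls)[cell] += 1
    n_cases, n_controls = sum(cases.values()), sum(controls.values())

    true_pos = true_neg = 0
    for cell in set(cases) | set(controls):
        # A cell is high-risk when its case:control ratio exceeds the
        # overall ratio (written as a product to avoid dividing by zero).
        high_risk = cases[cell] * n_controls > controls[cell] * n_cases
        if high_risk:
            true_pos += cases[cell]
        else:
            true_neg += controls[cell]
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)

def best_model(genotypes, status, order=2):
    """Exhaustively evaluate every SNP combination of the given order."""
    n_snps = len(genotypes[0])
    return max(
        combinations(range(n_snps), order),
        key=lambda snps: mdr_balanced_accuracy(genotypes, status, snps),
    )

# Toy data: disease status is the XOR of SNP 0 and SNP 1; SNP 2 is noise.
genotypes = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
status = [a != b for a, b, _ in genotypes]
top_pair = best_model(genotypes, status)
```

Each combination is scored independently of the others, and the number of combinations grows combinatorially with the interaction order, which is the workload the abstract offloads to graphics hardware.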