Results 1-6 (6)
 

1.  SNP interaction detection with Random Forests in high-dimensional genetic data 
BMC Bioinformatics  2012;13:164.
Background
Identifying variants associated with complex human traits in high-dimensional data is a central goal of genome-wide association studies. However, complicated etiologies such as gene-gene interactions are ignored by the univariate analysis usually applied in these studies. Random Forests (RF) are a popular data-mining technique that can accommodate a large number of predictor variables and allow for complex models with interactions. RF analysis produces measures of variable importance that can be used to rank the predictor variables. Thus, single nucleotide polymorphism (SNP) analysis using RFs is gaining popularity as a potential filter approach that considers interactions in high-dimensional data. However, the impact of data dimensionality on the power of RF to identify interactions has not been thoroughly explored. We investigate the ability of rankings from variable importance measures to detect gene-gene interaction effects and their potential effectiveness as filters compared to p-values from univariate logistic regression, particularly as the data becomes increasingly high-dimensional.
Results
RF effectively identifies interactions in low-dimensional data. As the total number of predictor variables increases, the probability of detection declines more rapidly for interacting SNPs than for non-interacting SNPs, indicating that in high-dimensional data the RF variable importance measures capture marginal effects rather than the effects of interactions.
Conclusions
While RF remains a promising data-mining technique that extends univariate methods to condition on multiple variables simultaneously, RF variable importance measures fail to detect interaction effects in high-dimensional data in the absence of a strong marginal component, and therefore may not be useful as a filter technique that allows for interaction effects in genome-wide data.
doi:10.1186/1471-2105-13-164
PMCID: PMC3463421  PMID: 22793366
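To make concrete what an interaction "in the absence of a strong marginal component" looks like, here is a minimal sketch in Python. The two binary markers and the XOR-style penetrance table are hypothetical and are not taken from the article; they only illustrate why a method driven by marginal signal can miss such a pair.

```python
from itertools import product

# Hypothetical toy penetrance table (an assumption for illustration):
# disease occurs only when exactly one of two binary markers carries
# the risk variant, i.e. an XOR pattern.
def penetrance(a: int, b: int) -> float:
    return 1.0 if a != b else 0.0

def marginal_risk(marker: str, value: int) -> float:
    """Average disease risk given one marker, with the other marker
    assumed to be 0 or 1 with equal probability."""
    if marker == "a":
        risks = [penetrance(value, b) for b in (0, 1)]
    else:
        risks = [penetrance(a, value) for a in (0, 1)]
    return sum(risks) / len(risks)

# Risk seen when looking at one marker at a time.
marginal = {(m, v): marginal_risk(m, v) for m, v in product("ab", (0, 1))}

# Risk seen when looking at both markers jointly.
joint = {(a, b): penetrance(a, b) for a, b in product((0, 1), repeat=2)}
```

Every entry of `marginal` is 0.5, so neither marker looks informative on its own, while `joint` separates cases from controls perfectly.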
2.  The Effect of Retrospective Sampling on Estimates of Prediction Error for Multifactor Dimensionality Reduction 
Annals of human genetics  2011;75(1):46-61.
SUMMARY
The standard in genetic association studies of complex diseases is replication and validation of positive results, with an emphasis on assessing the predictive value of associations. In response to this need, a number of analytical approaches have been developed to identify predictive models that account for complex genetic etiologies. Multifactor Dimensionality Reduction (MDR) is a commonly used, highly successful method designed to evaluate potential gene-gene interactions. MDR relies on classification error in a cross-validation framework to rank and evaluate potentially predictive models. Previous work has demonstrated the high power of MDR, but has not considered the accuracy and variance of the MDR prediction error estimate. Here, we evaluate the bias and variance of the MDR error estimate as both a retrospective and a prospective estimator and show that MDR can both underestimate and overestimate error. We argue that a prospective error estimate is necessary if MDR models are used for prediction, and propose a bootstrap resampling estimate that integrates population prevalence to accurately estimate prospective error. We demonstrate that this bootstrap estimate is preferable for prediction to the error estimate currently produced by MDR. Although demonstrated with MDR, the proposed estimation is applicable to all data-mining methods that use similar estimates.
doi:10.1111/j.1469-1809.2010.00587.x
PMCID: PMC2955770  PMID: 20560921
epistasis; gene-gene interaction; retrospective and prospective sampling; prediction error; bias; variance
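The idea of integrating population prevalence into an error estimate can be sketched as follows. This is a simplified illustration, not the authors' procedure: the function names are hypothetical, and the bootstrap resamples the predictions of an already fitted classifier rather than refitting the model on each resample.

```python
import random

def prospective_error(case_preds, control_preds, prevalence):
    """Weight class-specific error rates by population prevalence.

    case_preds / control_preds: predicted labels (1 = case, 0 = control)
    for true cases and true controls. Both lists must be non-empty.
    """
    false_neg_rate = sum(1 for p in case_preds if p == 0) / len(case_preds)
    false_pos_rate = sum(1 for p in control_preds if p == 1) / len(control_preds)
    return prevalence * false_neg_rate + (1 - prevalence) * false_pos_rate

def bootstrap_prospective_error(case_preds, control_preds, prevalence,
                                n_boot=200, seed=0):
    """Average the prevalence-weighted error over bootstrap resamples,
    drawing cases and controls separately (with replacement)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        cases = rng.choices(case_preds, k=len(case_preds))
        controls = rng.choices(control_preds, k=len(control_preds))
        estimates.append(prospective_error(cases, controls, prevalence))
    return sum(estimates) / n_boot

# Hypothetical predictions from a balanced case-control sample.
case_preds = [1] * 8 + [0] * 2      # 2 of 10 cases misclassified
control_preds = [0] * 9 + [1] * 1   # 1 of 10 controls misclassified
retrospective = (2 + 1) / 20        # error in the sampled data
prospective = prospective_error(case_preds, control_preds, prevalence=0.05)
```

With an assumed prevalence of 5%, the prospective error is dominated by the false positive rate among controls, so it differs from the error observed in the balanced sample.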
3.  Auriculotherapy for Pain Management: A Systematic Review and Meta-Analysis of Randomized Controlled Trials 
Abstract
Objectives
Side-effects of standard pain medications can limit their use. Therefore, nonpharmacologic pain relief techniques such as auriculotherapy may play an important role in pain management. Our aim was to conduct a systematic review and meta-analysis of studies evaluating auriculotherapy for pain management.
Design
MEDLINE®, ISI Web of Science, CINAHL, AMED, and the Cochrane Library were searched through December 2008. Randomized trials published in English were included if they compared auriculotherapy to sham, placebo, or standard-of-care control and measured outcomes of pain or medication use. Two (2) reviewers independently assessed trial eligibility and quality and abstracted data to a standardized form. Standardized mean differences (SMD) were calculated for studies using a pain score or analgesic requirement as a primary outcome.
Results
Seventeen (17) studies met inclusion criteria (8 perioperative, 4 acute, and 5 chronic pain). Auriculotherapy was superior to controls for studies evaluating pain intensity (SMD, 1.56 [95% confidence interval (CI): 0.85, 2.26]; 8 studies). For perioperative pain, auriculotherapy reduced analgesic use (SMD, 0.54 [95% CI: 0.30, 0.77]; 5 studies). For acute pain and chronic pain, auriculotherapy reduced pain intensity (SMD for acute pain, 1.35 [95% CI: 0.08, 2.64], 2 studies; SMD for chronic pain, 1.84 [95% CI: 0.60, 3.07], 5 studies). Removal of poor quality studies did not alter the conclusions. Significant heterogeneity existed among studies of acute and chronic pain, but not perioperative pain.
Conclusions
Auriculotherapy may be effective for the treatment of a variety of types of pain, especially postoperative pain. However, a more accurate estimate of the effect will require further large, well-designed trials.
doi:10.1089/acm.2009.0451
PMCID: PMC3110838  PMID: 20954963
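For readers unfamiliar with the standardized mean difference, a minimal sketch of one common form (Cohen's d with a pooled standard deviation and a large-sample confidence interval) is shown below. The abstract does not state which SMD variant or pooling model was used, so the formula choice, the function name, and the example numbers are all assumptions.

```python
import math

def standardized_mean_difference(mean_ctrl, sd_ctrl, n_ctrl,
                                 mean_trt, sd_trt, n_trt):
    """Cohen's d with an approximate 95% confidence interval.

    Signed so that a positive value means lower scores (less pain)
    in the treatment group than in the control group.
    """
    pooled_var = ((n_ctrl - 1) * sd_ctrl ** 2 + (n_trt - 1) * sd_trt ** 2) \
        / (n_ctrl + n_trt - 2)
    d = (mean_ctrl - mean_trt) / math.sqrt(pooled_var)
    # Large-sample approximation to the variance of d.
    var_d = (n_ctrl + n_trt) / (n_ctrl * n_trt) + d ** 2 / (2 * (n_ctrl + n_trt))
    half_width = 1.96 * math.sqrt(var_d)
    return d, d - half_width, d + half_width

# Hypothetical trial: pain score 6.0 (control) vs 4.0 (treatment), SD 2.0,
# 20 participants per arm.
d, ci_low, ci_high = standardized_mean_difference(6.0, 2.0, 20, 4.0, 2.0, 20)
```

In this made-up trial the two-point difference equals one pooled standard deviation, so `d` is 1.0.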
4.  An R package implementation of multifactor dimensionality reduction 
BioData Mining  2011;4:24.
Background
A breadth of high-dimensional data is now available with unprecedented numbers of genetic markers, and data-mining approaches to variable selection are increasingly being used to uncover associations, including potential gene-gene and gene-environment interactions. One of the most commonly used data-mining methods for case-control data is Multifactor Dimensionality Reduction (MDR), which has displayed success in both simulations and real data applications. Additional software applications in alternative programming languages can improve the availability and usefulness of the method for a broader range of users.
Results
We introduce a package for the R statistical language to implement the Multifactor Dimensionality Reduction (MDR) method for nonparametric variable selection of interactions. This package is designed to provide an alternative implementation for R users, with great flexibility and utility for both data analysis and research. The 'MDR' package is freely available online at http://www.r-project.org/. We also provide data examples to illustrate the use and functionality of the package.
Conclusions
MDR is a frequently-used data-mining method to identify potential gene-gene interactions, and alternative implementations will further increase this usage. We introduce a flexible software package for R users.
doi:10.1186/1756-0381-4-24
PMCID: PMC3177775  PMID: 21846375
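The core MDR step, pooling multilocus genotype cells into high-risk and low-risk groups by their case:control ratio, can be sketched as follows. This Python sketch is not the API of the R package described above; the function names, the strict ">" comparison, and the default threshold (the overall case:control ratio) are assumptions for illustration.

```python
from collections import Counter

def mdr_high_risk_cells(genotypes, labels, threshold=None):
    """Return the set of genotype combinations labelled high-risk.

    genotypes: list of tuples, one multilocus genotype per individual.
    labels: 1 for case, 0 for control (both classes must be present).
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    if threshold is None:
        threshold = n_cases / n_controls
    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)
    high_risk = set()
    for cell in set(cases) | set(controls):
        # A cell with cases but no controls is treated as high-risk.
        if controls[cell] == 0 or cases[cell] / controls[cell] > threshold:
            high_risk.add(cell)
    return high_risk

def classification_error(genotypes, labels, high_risk):
    """Fraction of individuals whose high/low-risk cell disagrees with status."""
    wrong = sum(1 for g, y in zip(genotypes, labels)
                if (g in high_risk) != (y == 1))
    return wrong / len(labels)

# Hypothetical two-locus data with an XOR-like pattern.
genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
labels = [0, 0, 1, 1, 1, 1, 0, 0]
high = mdr_high_risk_cells(genotypes, labels)
error = classification_error(genotypes, labels, high)
```

In a full analysis this labelling would be repeated for every candidate marker combination inside a validation scheme, and the combinations ranked by their error.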
5.  Grammatical evolution decision trees for detecting gene-gene interactions 
BioData Mining  2010;3:8.
Background
A fundamental goal of human genetics is the discovery of polymorphisms that predict common, complex diseases. It is hypothesized that complex diseases are due to a myriad of factors including environmental exposures and complex genetic risk models, including gene-gene interactions. Such epistatic models present an important analytical challenge, requiring that methods perform not only statistical modeling, but also variable selection to generate testable genetic model hypotheses. This challenge is amplified by recent advances in genotyping technology, as the number of potential predictor variables is rapidly increasing.
Methods
Decision trees are a highly successful, easily interpretable data-mining method that is typically optimized with a hierarchical model-building approach, which limits its potential to identify interacting effects. To overcome this limitation, we utilize evolutionary computation, specifically grammatical evolution, to build decision trees to detect and model gene-gene interactions. In the current study, we introduce the Grammatical Evolution Decision Trees (GEDT) method and software and evaluate this approach on simulated data representing gene-gene interaction models of a range of effect sizes. We compare the performance of the method to a traditional decision tree algorithm and a random search approach and demonstrate the improved performance of the method to detect purely epistatic interactions.
Results
The results of our simulations demonstrate that GEDT has high power to detect even very moderate genetic risk models. GEDT has high power to detect interactions with and without main effects.
Conclusions
GEDT, while still in its initial stages of development, is a promising new approach for identifying gene-gene interactions in genetic association studies.
doi:10.1186/1756-0381-3-8
PMCID: PMC3000379  PMID: 21087514
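Grammatical evolution builds a structure by using a list of integers (codons) to choose productions from a grammar. The sketch below shows that genotype-to-phenotype mapping for a toy decision-tree grammar. The grammar, the symbol names, and the step limit are hypothetical and are not the grammar used by GEDT.

```python
# Hypothetical toy grammar: a tree is either a leaf or a binary split on a SNP.
GRAMMAR = {
    "<tree>": [["<leaf>"], ["if", "<snp>", "<tree>", "<tree>"]],
    "<snp>": [["snp1"], ["snp2"]],
    "<leaf>": [["case"], ["control"]],
}

def map_genome(codons, start="<tree>", max_steps=50):
    """Expand the leftmost non-terminal using codon % number_of_productions.

    Codons are reused from the beginning when exhausted (wrapping).
    Returns None if the expansion does not finish within max_steps.
    """
    symbols = [start]
    output = []
    position = 0
    steps = 0
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in GRAMMAR:
            output.append(symbol)          # terminal: emit as-is
            continue
        if steps >= max_steps:
            return None                    # treat as an invalid individual
        options = GRAMMAR[symbol]
        codon = codons[position % len(codons)]
        symbols = list(options[codon % len(options)]) + symbols
        position += 1
        steps += 1
    return " ".join(output)

tree = map_genome([1, 0, 0, 1, 0, 0])
```

Here `tree` is the prefix-form string `"if snp1 control case"`: split on `snp1`, predict control on one branch and case on the other. An evolutionary search would then mutate and recombine codon lists and score each resulting tree on the data.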
6.  A comparison of internal validation techniques for multifactor dimensionality reduction 
BMC Bioinformatics  2010;11:394.
Background
It is hypothesized that common, complex diseases may be due to complex interactions between genetic and environmental factors, which are difficult to detect in high-dimensional data using traditional statistical approaches. Multifactor Dimensionality Reduction (MDR) is the most commonly used data-mining method to detect epistatic interactions. In all data-mining methods, it is important to consider internal validation procedures to obtain prediction estimates, prevent model over-fitting, and reduce potential false positive findings. Currently, MDR utilizes cross-validation for internal validation. In this study, we incorporate the use of a three-way split (3WS) of the data in combination with a post-hoc pruning procedure as an alternative to cross-validation for internal model validation to reduce computation time without impairing performance. We compare the power to detect true disease-causing loci using MDR with both 5- and 10-fold cross-validation to MDR with 3WS for a range of single-locus and epistatic disease models. Additionally, we analyze a dataset in HIV immunogenetics to demonstrate the results of the two strategies on real data.
Results
MDR with 3WS is approximately five times faster computationally than MDR with 5-fold cross-validation. Before pruning, the power to find exactly the true disease loci, without detecting false positive loci, is higher with 5-fold cross-validation than with 3WS. However, the power to find the true disease-causing loci in addition to false positive loci is equivalent for the two approaches. With the incorporation of a pruning procedure after the 3WS, the power of the 3WS approach to detect only the exact disease loci is equivalent to that of MDR with cross-validation. In the real data application, the cross-validation and 3WS analyses indicate the same two-locus model.
Conclusions
Our results reveal that the performance of the two internal validation methods is equivalent when pruning procedures are used. The specific pruning procedure should be chosen with an understanding of the trade-off between identifying all relevant genetic effects at the cost of including false positives, and excluding false positives at the cost of missing important genetic factors. This implies that 3WS may be a powerful and computationally efficient approach to screen for epistatic effects, and could be used to identify candidate interactions in large-scale genetic studies.
doi:10.1186/1471-2105-11-394
PMCID: PMC2920275  PMID: 20650002
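A three-way split partitions the samples once into three disjoint subsets instead of rotating through k folds. The sketch below shows one way to produce such a partition. The subset names (training, testing, validation), the proportions, and the function name are assumptions, since the abstract does not specify them.

```python
import random

def three_way_split(n_samples, fractions=(0.5, 0.25, 0.25), seed=0):
    """Shuffle sample indices once and cut them into three disjoint parts.

    fractions: assumed proportions for the first two parts; the third
    part receives whatever remains after rounding down.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(fractions[0] * n_samples)
    n_test = int(fractions[1] * n_samples)
    training = indices[:n_train]
    testing = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]
    return training, testing, validation

training, testing, validation = three_way_split(100)
```

Because each subset is used once, candidate models are evaluated on a single partition rather than refit k times, which is where a speed advantage over k-fold cross-validation would come from.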
