PMCC PMCC

Search tips
Search criteria

Advanced
Results 1-8 (8)
 

Clipboard (0)
None
Journals
Year of Publication
Document Types
author:("Yu, guoyang")
1.  Comparative Analysis of Methods for Identifying Recurrent Copy Number Alterations in Cancer 
PLoS ONE  2012;7(12):e52516.
Recurrent copy number alterations (CNAs) play an important role in cancer genesis. While a number of computational methods have been proposed for identifying such CNAs, their relative merits remain largely unknown in practice since very few efforts have been focused on comparative analysis of the methods. To facilitate studies of recurrent CNA identification in cancer genome, it is imperative to conduct a comprehensive comparison of performance and limitations among existing methods. In this paper, six representative methods proposed in the latest six years are compared. These include one-stage and two-stage approaches, working with raw intensity ratio data and discretized data respectively. They are based on various techniques such as kernel regression, correlation matrix diagonal segmentation, semi-parametric permutation and cyclic permutation schemes. We explore multiple criteria including type I error rate, detection power, Receiver Operating Characteristics (ROC) curve and the area under curve (AUC), and computational complexity, to evaluate performance of the methods under multiple simulation scenarios. We also characterize their abilities on applications to two real datasets obtained from cancers with lung adenocarcinoma and glioblastoma. This comparison study reveals general characteristics of the existing methods for identifying recurrent CNAs, and further provides new insights into their strengths and weaknesses. It is believed helpful to accelerate the development of novel and improved methods.
doi:10.1371/journal.pone.0052516
PMCID: PMC3527554  PMID: 23285074
2.  Genome-wide identification of significant aberrations in cancer genome 
BMC Genomics  2012;13:342.
Background
Somatic Copy Number Alterations (CNAs) in human genomes are present in almost all human cancers. Systematic efforts to characterize such structural variants must effectively distinguish significant consensus events from random background aberrations. Here we introduce Significant Aberration in Cancer (SAIC), a new method for characterizing and assessing the statistical significance of recurrent CNA units. Three main features of SAIC include: (1) exploiting the intrinsic correlation among consecutive probes to assign a score to each CNA unit instead of single probes; (2) performing permutations on CNA units that preserve correlations inherent in the copy number data; and (3) iteratively detecting Significant Copy Number Aberrations (SCAs) and estimating an unbiased null distribution by applying an SCA-exclusive permutation scheme.
Results
We test and compare the performance of SAIC against four peer methods (GISTIC, STAC, KC-SMART, CMDS) on a large number of simulation datasets. Experimental results show that SAIC outperforms peer methods in terms of larger area under the Receiver Operating Characteristics curve and increased detection power. We then apply SAIC to analyze structural genomic aberrations acquired in four real cancer genome-wide copy number data sets (ovarian cancer, metastatic prostate cancer, lung adenocarcinoma, glioblastoma). When compared with previously reported results, SAIC successfully identifies most SCAs known to be of biological significance and associated with oncogenes (e.g., KRAS, CCNE1, and MYC) or tumor suppressor genes (e.g., CDKN2A/B). Furthermore, SAIC identifies a number of novel SCAs in these copy number data that encompass tumor related genes and may warrant further studies.
Conclusions
Supported by a well-grounded theoretical framework, SAIC has been developed and used to identify SCAs in various cancer copy number data sets, providing useful information to study the landscape of cancer genomes. Open–source and platform-independent SAIC software is implemented using C++, together with R scripts for data formatting and Perl scripts for user interfacing, and it is easy to install and efficient to use. The source code and documentation are freely available at http://www.cbil.ece.vt.edu/software.htm.
doi:10.1186/1471-2164-13-342
PMCID: PMC3428679  PMID: 22839576
3.  BACOM: in silico detection of genomic deletion types and correction of normal cell contamination in copy number data 
Bioinformatics  2011;27(11):1473-1480.
Motivation: Identification of somatic DNA copy number alterations (CNAs) and significant consensus events (SCEs) in cancer genomes is a main task in discovering potential cancer-driving genes such as oncogenes and tumor suppressors. The recent development of SNP array technology has facilitated studies on copy number changes at a genome-wide scale with high resolution. However, existing copy number analysis methods are oblivious to normal cell contamination and cannot distinguish between contributions of cancerous and normal cells to the measured copy number signals. This contamination could significantly confound downstream analysis of CNAs and affect the power to detect SCEs in clinical samples.
Results: We report here a statistically principled in silico approach, Bayesian Analysis of COpy number Mixtures (BACOM), to accurately estimate genomic deletion type and normal tissue contamination, and accordingly recover the true copy number profile in cancer cells. We tested the proposed method on two simulated datasets, two prostate cancer datasets and The Cancer Genome Atlas high-grade ovarian dataset, and obtained very promising results supported by the ground truth and biological plausibility. Moreover, based on a large number of comparative simulation studies, the proposed method gives significantly improved power to detect SCEs after in silico correction of normal tissue contamination. We develop a cross-platform open-source Java application that implements the whole pipeline of copy number analysis of heterogeneous cancer tissues including relevant processing steps. We also provide an R interface, bacomR, for running BACOM within the R environment, making it straightforward to include in existing data pipelines.
Availability: The cross-platform, stand-alone Java application, BACOM, the R interface, bacomR, all source code and the simulation data used in this article are freely available at authors' web site: http://www.cbil.ece.vt.edu/software.htm.
Contact: yuewang@vt.edu
Supplementary Information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr183
PMCID: PMC3102226  PMID: 21498400
4.  PUGSVM: a caBIGTM analytical tool for multiclass gene selection and predictive classification 
Bioinformatics  2010;27(5):736-738.
Summary: Phenotypic Up-regulated Gene Support Vector Machine (PUGSVM) is a cancer Biomedical Informatics Grid (caBIG™) analytical tool for multiclass gene selection and classification. PUGSVM addresses the problem of imbalanced class separability, small sample size and high gene space dimensionality, where multiclass gene markers are defined by the union of one-versus-everyone phenotypic upregulated genes, and used by a well-matched one-versus-rest support vector machine. PUGSVM provides a simple yet more accurate strategy to identify statistically reproducible mechanistic marker genes for characterization of heterogeneous diseases.
Availability: http://www.cbil.ece.vt.edu/caBIG-PUGSVM.htm.
Contact: yuewang@vt.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq721
PMCID: PMC3042183  PMID: 21186245
5.  Comparative analysis of methods for detecting interacting loci 
BMC Genomics  2011;12:344.
Background
Interactions among genetic loci are believed to play an important role in disease risk. While many methods have been proposed for detecting such interactions, their relative performance remains largely unclear, mainly because different data sources, detection performance criteria, and experimental protocols were used in the papers introducing these methods and in subsequent studies. Moreover, there have been very few studies strictly focused on comparison of existing methods. Given the importance of detecting gene-gene and gene-environment interactions, a rigorous, comprehensive comparison of performance and limitations of available interaction detection methods is warranted.
Results
We report a comparison of eight representative methods, of which seven were specifically designed to detect interactions among single nucleotide polymorphisms (SNPs), with the last a popular main-effect testing method used as a baseline for performance evaluation. The selected methods, multifactor dimensionality reduction (MDR), full interaction model (FIM), information gain (IG), Bayesian epistasis association mapping (BEAM), SNP harvester (SH), maximum entropy conditional probability modeling (MECPM), logistic regression with an interaction term (LRIT), and logistic regression (LR) were compared on a large number of simulated data sets, each, consistent with complex disease models, embedding multiple sets of interacting SNPs, under different interaction models. The assessment criteria included several relevant detection power measures, family-wise type I error rate, and computational complexity. There are several important results from this study. First, while some SNPs in interactions with strong effects are successfully detected, most of the methods miss many interacting SNPs at an acceptable rate of false positives. In this study, the best-performing method was MECPM. Second, the statistical significance assessment criteria, used by some of the methods to control the type I error rate, are quite conservative, thereby limiting their power and making it difficult to fairly compare them. Third, as expected, power varies for different models and as a function of penetrance, minor allele frequency, linkage disequilibrium and marginal effects. Fourth, the analytical relationships between power and these factors are derived, aiding in the interpretation of the study results. Fifth, for these methods the magnitude of the main effect influences the power of the tests. Sixth, most methods can detect some ground-truth SNPs but have modest power to detect the whole set of interacting SNPs.
Conclusion
This comparison study provides new insights into the strengths and limitations of current methods for detecting interacting loci. This study, along with freely available simulation tools we provide, should help support development of improved methods. The simulation tools are available at: http://code.google.com/p/simulation-tool-bmc-ms9169818735220977/downloads/list.
doi:10.1186/1471-2164-12-344
PMCID: PMC3161015  PMID: 21729295
6.  A Ground Truth Based Comparative Study on Detecting Epistatic SNPs 
Genome-wide association studies (GWAS) have been widely applied to identify informative SNPs associated with common and complex diseases. Besides single-SNP analysis, the interaction between SNPs is believed to play an important role in disease risk due to the complex networking of genetic regulations. While many approaches have been proposed for detecting SNP interactions, the relative performance and merits of these methods in practice are largely unclear. In this paper, a ground-truth based comparative study is reported involving 9 popular SNP detection methods using realistic simulation datasets. The results provide general characteristics and guidelines on these methods that may be informative to the biological investigators.
doi:10.1109/BIBMW.2009.5332132
PMCID: PMC2998769  PMID: 21151836
Genome-wide association study; single-nucleotide polymorphism; SNP interaction
7.  An algorithm for learning maximum entropy probability models of disease risk that efficiently searches and sparingly encodes multilocus genomic interactions 
Bioinformatics  2009;25(19):2478-2485.
Motivation: In both genome-wide association studies (GWAS) and pathway analysis, the modest sample size relative to the number of genetic markers presents formidable computational, statistical and methodological challenges for accurately identifying markers/interactions and for building phenotype-predictive models.
Results: We address these objectives via maximum entropy conditional probability modeling (MECPM), coupled with a novel model structure search. Unlike neural networks and support vector machines (SVMs), MECPM makes explicit and is determined by the interactions that confer phenotype-predictive power. Our method identifies both a marker subset and the multiple k-way interactions between these markers. Additional key aspects are: (i) evaluation of a select subset of up to five-way interactions while retaining relatively low complexity; (ii) flexible single nucleotide polymorphism (SNP) coding (dominant, recessive) within each interaction; (iii) no mathematical interaction form assumed; (iv) model structure and order selection based on the Bayesian Information Criterion, which fairly compares interactions at different orders and automatically sets the experiment-wide significance level; (v) MECPM directly yields a phenotype-predictive model. MECPM was compared with a panel of methods on datasets with up to 1000 SNPs and up to eight embedded penetrance function (i.e. ground-truth) interactions, including a five-way, involving less than 20 SNPs. MECPM achieved improved sensitivity and specificity for detecting both ground-truth markers and interactions, compared with previous methods.
Availability: http://www.cbil.ece.vt.edu/ResearchOngoingSNP.htm
Contact: djmiller@engr.psu.edu
Supplementary information:Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp435
PMCID: PMC3140808  PMID: 19608708

Results 1-8 (8)