Search tips
Search criteria

Results 1-3 (3)

Clipboard (0)
more »
Year of Publication
Document Types
author:("Yu, guoyang")
1.  BACOM: in silico detection of genomic deletion types and correction of normal cell contamination in copy number data 
Bioinformatics  2011;27(11):1473-1480.
Motivation: Identification of somatic DNA copy number alterations (CNAs) and significant consensus events (SCEs) in cancer genomes is a main task in discovering potential cancer-driving genes such as oncogenes and tumor suppressors. The recent development of SNP array technology has facilitated studies on copy number changes at a genome-wide scale with high resolution. However, existing copy number analysis methods are oblivious to normal cell contamination and cannot distinguish between contributions of cancerous and normal cells to the measured copy number signals. This contamination could significantly confound downstream analysis of CNAs and affect the power to detect SCEs in clinical samples.
Results: We report here a statistically principled in silico approach, Bayesian Analysis of COpy number Mixtures (BACOM), to accurately estimate genomic deletion type and normal tissue contamination, and accordingly recover the true copy number profile in cancer cells. We tested the proposed method on two simulated datasets, two prostate cancer datasets and The Cancer Genome Atlas high-grade ovarian dataset, and obtained very promising results supported by the ground truth and biological plausibility. Moreover, based on a large number of comparative simulation studies, the proposed method gives significantly improved power to detect SCEs after in silico correction of normal tissue contamination. We develop a cross-platform open-source Java application that implements the whole pipeline of copy number analysis of heterogeneous cancer tissues including relevant processing steps. We also provide an R interface, bacomR, for running BACOM within the R environment, making it straightforward to include in existing data pipelines.
Availability: The cross-platform, stand-alone Java application, BACOM, the R interface, bacomR, all source code and the simulation data used in this article are freely available at authors' web site:
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3102226  PMID: 21498400
2.  PUGSVM: a caBIGTM analytical tool for multiclass gene selection and predictive classification 
Bioinformatics  2010;27(5):736-738.
Summary: Phenotypic Up-regulated Gene Support Vector Machine (PUGSVM) is a cancer Biomedical Informatics Grid (caBIG™) analytical tool for multiclass gene selection and classification. PUGSVM addresses the problem of imbalanced class separability, small sample size and high gene space dimensionality, where multiclass gene markers are defined by the union of one-versus-everyone phenotypic upregulated genes, and used by a well-matched one-versus-rest support vector machine. PUGSVM provides a simple yet more accurate strategy to identify statistically reproducible mechanistic marker genes for characterization of heterogeneous diseases.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3042183  PMID: 21186245
3.  An algorithm for learning maximum entropy probability models of disease risk that efficiently searches and sparingly encodes multilocus genomic interactions 
Bioinformatics  2009;25(19):2478-2485.
Motivation: In both genome-wide association studies (GWAS) and pathway analysis, the modest sample size relative to the number of genetic markers presents formidable computational, statistical and methodological challenges for accurately identifying markers/interactions and for building phenotype-predictive models.
Results: We address these objectives via maximum entropy conditional probability modeling (MECPM), coupled with a novel model structure search. Unlike neural networks and support vector machines (SVMs), MECPM makes explicit and is determined by the interactions that confer phenotype-predictive power. Our method identifies both a marker subset and the multiple k-way interactions between these markers. Additional key aspects are: (i) evaluation of a select subset of up to five-way interactions while retaining relatively low complexity; (ii) flexible single nucleotide polymorphism (SNP) coding (dominant, recessive) within each interaction; (iii) no mathematical interaction form assumed; (iv) model structure and order selection based on the Bayesian Information Criterion, which fairly compares interactions at different orders and automatically sets the experiment-wide significance level; (v) MECPM directly yields a phenotype-predictive model. MECPM was compared with a panel of methods on datasets with up to 1000 SNPs and up to eight embedded penetrance function (i.e. ground-truth) interactions, including a five-way, involving less than 20 SNPs. MECPM achieved improved sensitivity and specificity for detecting both ground-truth markers and interactions, compared with previous methods.
Supplementary information:Supplementary data are available at Bioinformatics online.
PMCID: PMC3140808  PMID: 19608708

Results 1-3 (3)