
Results 1-10 (10)

2.  Multi-profile Bayesian alignment model for LC-MS data analysis with integration of internal standards 
Bioinformatics  2013;29(21):2774-2780.
Motivation: Liquid chromatography-mass spectrometry (LC-MS) has been widely used for profiling expression levels of biomolecules in various ‘-omic’ studies including proteomics, metabolomics and glycomics. Appropriate LC-MS data preprocessing steps are needed to detect true differences between biological groups. Retention time (RT) alignment, which is required to ensure that ion intensity measurements among multiple LC-MS runs are comparable, is one of the most important yet challenging preprocessing steps. Current alignment approaches estimate RT variability using either single chromatograms or detected peaks, but do not simultaneously take into account the complementary information embedded in the entire LC-MS data.
Results: We propose a Bayesian alignment model for LC-MS data analysis. The alignment model provides estimates of the RT variability along with uncertainty measures. The model enables integration of multiple sources of information including internal standards and clustered chromatograms in a mathematically rigorous framework. We apply the model to LC-MS metabolomic, proteomic and glycomic data. The performance of the model is evaluated based on ground-truth data, by measuring correlation of variation, RT difference across runs and peak-matching performance. We demonstrate that the Bayesian alignment model significantly improves RT alignment performance through appropriate integration of relevant information.
Availability and implementation: MATLAB code, raw and preprocessed LC-MS data are available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3799465  PMID: 24013927
3.  Robust identification of transcriptional regulatory networks using a Gibbs sampler on outlier sum statistic 
Bioinformatics  2012;28(15):1990-1997.
Motivation: Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive ‘noise’ in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context.
Results: In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer.
Availability and implementation: The Gibbs sampler MATLAB package is freely available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3400952  PMID: 22595208
4.  CAM-CM: a signal deconvolution tool for in vivo dynamic contrast-enhanced imaging of complex tissues 
Bioinformatics  2011;27(18):2607-2609.
Summary: In vivo dynamic contrast-enhanced imaging tools provide non-invasive methods for analyzing various functional changes associated with disease initiation, progression and responses to therapy. The quantitative application of these tools has been hindered by their inability to accurately resolve and characterize targeted tissues due to spatially mixed tissue heterogeneity. The Convex Analysis of Mixtures – Compartment Modeling (CAM-CM) signal deconvolution tool has been developed to automatically identify pure-volume pixels located at the corners of the clustered pixel time series scatter simplex and subsequently estimate tissue-specific pharmacokinetic parameters. CAM-CM can dissect complex tissues into regions with differential tracer kinetics at pixel-wise resolution and provide a systems biology tool for defining imaging signatures predictive of phenotypes.
Availability: The MATLAB source code can be downloaded at the authors' website
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3167053  PMID: 21785131
5.  BACOM: in silico detection of genomic deletion types and correction of normal cell contamination in copy number data 
Bioinformatics  2011;27(11):1473-1480.
Motivation: Identification of somatic DNA copy number alterations (CNAs) and significant consensus events (SCEs) in cancer genomes is a main task in discovering potential cancer-driving genes such as oncogenes and tumor suppressors. The recent development of SNP array technology has facilitated studies on copy number changes at a genome-wide scale with high resolution. However, existing copy number analysis methods are oblivious to normal cell contamination and cannot distinguish between contributions of cancerous and normal cells to the measured copy number signals. This contamination could significantly confound downstream analysis of CNAs and affect the power to detect SCEs in clinical samples.
Results: We report here a statistically principled in silico approach, Bayesian Analysis of COpy number Mixtures (BACOM), to accurately estimate genomic deletion type and normal tissue contamination, and accordingly recover the true copy number profile in cancer cells. We tested the proposed method on two simulated datasets, two prostate cancer datasets and The Cancer Genome Atlas high-grade ovarian dataset, and obtained very promising results supported by the ground truth and biological plausibility. Moreover, based on a large number of comparative simulation studies, the proposed method gives significantly improved power to detect SCEs after in silico correction of normal tissue contamination. We develop a cross-platform open-source Java application that implements the whole pipeline of copy number analysis of heterogeneous cancer tissues including relevant processing steps. We also provide an R interface, bacomR, for running BACOM within the R environment, making it straightforward to include in existing data pipelines.
Availability: The cross-platform, stand-alone Java application, BACOM, the R interface, bacomR, all source code and the simulation data used in this article are freely available at authors' web site:
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3102226  PMID: 21498400
6.  DDN: a caBIG® analytical tool for differential network analysis 
Bioinformatics  2011;27(7):1036-1038.
Summary: Differential dependency network (DDN) is a caBIG® (cancer Biomedical Informatics Grid) analytical tool for detecting and visualizing statistically significant topological changes in transcriptional networks representing two biological conditions. Developed under caBIG®'s In Silico Research Centers of Excellence (ISRCE) Program, DDN enables differential network analysis and provides an alternative way for defining network biomarkers predictive of phenotypes. DDN also serves as a useful systems biology tool for users across biomedical research communities to infer how genetic, epigenetic or environment variables may affect biological networks and clinical phenotypes. Besides the standalone Java application, we have also developed a Cytoscape plug-in, CytoDDN, to integrate network analysis and visualization seamlessly.
Availability: The Java and MATLAB source code can be downloaded at the authors' web site
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3065688  PMID: 21296752
7.  PUGSVM: a caBIG™ analytical tool for multiclass gene selection and predictive classification 
Bioinformatics  2010;27(5):736-738.
Summary: Phenotypic Up-regulated Gene Support Vector Machine (PUGSVM) is a cancer Biomedical Informatics Grid (caBIG™) analytical tool for multiclass gene selection and classification. PUGSVM addresses the problem of imbalanced class separability, small sample size and high gene space dimensionality, where multiclass gene markers are defined by the union of one-versus-everyone phenotypic upregulated genes, and used by a well-matched one-versus-rest support vector machine. PUGSVM provides a simple yet more accurate strategy to identify statistically reproducible mechanistic marker genes for characterization of heterogeneous diseases.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3042183  PMID: 21186245
8.  Multilevel support vector regression analysis to identify condition-specific regulatory networks 
Bioinformatics  2010;26(11):1416-1422.
Motivation: The identification of gene regulatory modules is an important yet challenging problem in computational biology. While many computational methods have been proposed to identify regulatory modules, their initial success is largely compromised by a high rate of false positives, especially when applied to human cancer studies. New strategies are needed for reliable regulatory module identification.
Results: We present a new approach, namely multilevel support vector regression (ml-SVR), to systematically identify condition-specific regulatory modules. The approach is built upon a multilevel analysis strategy designed for suppressing false positive predictions. With this strategy, a regulatory module becomes ever more significant as more relevant gene sets are formed at finer levels. At each level, a two-stage support vector regression (SVR) method is utilized to help reduce false positive predictions by integrating binding motif information and gene expression data; a significance analysis procedure is followed to assess the significance of each regulatory module. To evaluate the effectiveness of the proposed strategy, we first compared the ml-SVR approach with other existing methods on simulation data and yeast cell cycle data. The resulting performance shows that the ml-SVR approach outperforms other methods in the identification of both regulators and their target genes. We then applied our method to breast cancer cell line data to identify condition-specific regulatory modules associated with estrogen treatment. Experimental results show that our method can identify biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer.
Availability and implementation: The ml-SVR MATLAB package can be downloaded at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2872001  PMID: 20375112
9.  An algorithm for learning maximum entropy probability models of disease risk that efficiently searches and sparingly encodes multilocus genomic interactions 
Bioinformatics  2009;25(19):2478-2485.
Motivation: In both genome-wide association studies (GWAS) and pathway analysis, the modest sample size relative to the number of genetic markers presents formidable computational, statistical and methodological challenges for accurately identifying markers/interactions and for building phenotype-predictive models.
Results: We address these objectives via maximum entropy conditional probability modeling (MECPM), coupled with a novel model structure search. Unlike neural networks and support vector machines (SVMs), MECPM makes explicit and is determined by the interactions that confer phenotype-predictive power. Our method identifies both a marker subset and the multiple k-way interactions between these markers. Additional key aspects are: (i) evaluation of a select subset of up to five-way interactions while retaining relatively low complexity; (ii) flexible single nucleotide polymorphism (SNP) coding (dominant, recessive) within each interaction; (iii) no mathematical interaction form assumed; (iv) model structure and order selection based on the Bayesian Information Criterion, which fairly compares interactions at different orders and automatically sets the experiment-wide significance level; (v) MECPM directly yields a phenotype-predictive model. MECPM was compared with a panel of methods on datasets with up to 1000 SNPs and up to eight embedded penetrance function (i.e. ground-truth) interactions, including a five-way, involving less than 20 SNPs. MECPM achieved improved sensitivity and specificity for detecting both ground-truth markers and interactions, compared with previous methods.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3140808  PMID: 19608708
10.  Differential dependency network analysis to identify condition-specific topological changes in biological networks 
Bioinformatics  2008;25(4):526-532.
Motivation: Significant efforts have been made to acquire data under different conditions and to construct static networks that can explain various gene regulation mechanisms. However, gene regulatory networks are dynamic and condition-specific; under different conditions, networks exhibit different regulation patterns accompanied by different transcriptional network topologies. Thus, an investigation on the topological changes in transcriptional networks can facilitate the understanding of cell development or provide novel insights into the pathophysiology of certain diseases, and help identify the key genetic players that could serve as biomarkers or drug targets.
Results: Here, we report a differential dependency network (DDN) analysis to detect statistically significant topological changes in the transcriptional networks between two biological conditions. We propose a local dependency model to represent the local structures of a network by a set of conditional probabilities. We develop an efficient learning algorithm to learn the local dependency model using the Lasso technique. A permutation test is subsequently performed to estimate the statistical significance of each learned local structure. In testing on a simulation dataset, the proposed algorithm accurately detected all the genes with network topological changes. The method was then applied to the estrogen-dependent T-47D estrogen receptor-positive (ER+) breast cancer cell line datasets and human and mouse embryonic stem cell datasets. In both experiments using real microarray datasets, the proposed method produced biologically meaningful results. We expect DDN to emerge as an important bioinformatics tool in transcriptional network analyses. While we focus specifically on transcriptional networks, the DDN method we introduce here is generally applicable to other biological networks with similar characteristics.
Availability: The DDN MATLAB toolbox and experiment data are available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2642641  PMID: 19112081
