Statins are among the most frequently prescribed drugs because of their efficacy and low toxicity in treating hypercholesterolemia. Recently, statins have been reported to inhibit the proliferation of cancer cells, especially those with TP53 mutations. Since TP53 mutations occur in almost all ovarian high-grade serous carcinomas, we determined whether statins suppress tumor growth in animal models of ovarian cancer.
Two ovarian cancer mouse models were employed. The first was a genetically engineered model, mogp-TAg, in which the oviduct glycoprotein-1 promoter drives expression of SV40 T-antigen in gynecologic tissues. These mice spontaneously develop serous tubal intraepithelial carcinomas (STICs), which are recognized precursor lesions of ovarian cancer. The second model was a xenograft tumor model in which human ovarian cancer cells were inoculated into immunocompromised mice. Mice in both models were treated with lovastatin, and effects on tumor growth were monitored. The molecular mechanisms underlying the anti-tumor effects of lovastatin were also investigated.
Lovastatin significantly reduced the development of STICs in mogp-TAg mice and inhibited ovarian tumor growth in the mouse xenograft model. Knockdown of prenylation enzymes in the mevalonate pathway recapitulated the lovastatin-induced anti-proliferative phenotype. Transcriptome analysis indicated that lovastatin affected the expression of genes associated with DNA replication, Rho/PLC signaling, glycolysis, and cholesterol biosynthesis pathways, suggesting that statins have pleiotropic effects on tumor cells.
These results suggest that repurposing statins for ovarian cancer may provide a promising strategy to prevent and manage this devastating disease.
Summary: Identification of protein interaction subnetworks is an important step toward understanding complex molecular mechanisms in cancer. In this paper, we develop the BMRF-Net package, implemented in Java and C++, to identify protein interaction subnetworks based on a bagging Markov random field (BMRF) framework. By integrating gene expression data and protein–protein interaction data, this software tool can be used to identify biologically meaningful subnetworks. A user-friendly graphical user interface, developed as a Cytoscape plugin, handles input and output for the BMRF-Net software. The detailed structure of the identified networks can be conveniently visualized in Cytoscape. The BMRF-Net package has been applied to breast cancer data to identify significant subnetworks related to breast cancer recurrence.
Availability and implementation: The BMRF-Net package is available at http://sourceforge.net/projects/bmrfcjava/. The package has been tested under Ubuntu 12.04 (64-bit), Java 7, glibc 2.15 and Cytoscape 3.1.0.
Supplementary information: Supplementary data are available at Bioinformatics online.
Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground-truth-based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG™ toolkit (VIsual Statistical Data Analyzer - VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and the mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.
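Partition accuracy in ground-truth comparisons is commonly computed as the fraction of samples whose cluster agrees with the true class after the best one-to-one relabeling of clusters. The sketch below assumes that definition (the study's exact formula may differ) and searches label mappings by brute force, which is adequate for the handful of clusters typical of sample clustering; the function name is hypothetical.

```python
from itertools import permutations

def partition_accuracy(true_labels, cluster_labels):
    """Fraction of samples correctly assigned after the best one-to-one
    mapping of cluster IDs onto class IDs (brute force over permutations)."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    k = max(len(classes), len(clusters))
    # Pad with None so surplus clusters can stay unmatched (never counted as hits)
    class_slots = classes + [None] * (k - len(classes))
    best = 0
    for perm in permutations(class_slots, k):
        # perm[i] is the class assigned to clusters[i]
        mapping = dict(zip(clusters, perm))
        hits = sum(1 for t, c in zip(true_labels, cluster_labels) if mapping[c] == t)
        best = max(best, hits)
    return best / len(true_labels)

truth = [0, 0, 1, 1, 2, 2]
found = [1, 1, 0, 0, 2, 0]
print(partition_accuracy(truth, found))
```

Repeating the clustering over several runs or resamples and applying `statistics.mean` and `statistics.stdev` to the resulting accuracies gives the mean and standard deviation of partition accuracy.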
Clustering Evaluation; Sample Clustering; Comparative Study; Gene Expression Data
Summary: We have developed an integrated molecular network learning method, within a well-grounded mathematical framework, to construct differential dependency networks with significant rewiring. This knowledge-fused differential dependency networks (KDDN) method, implemented as a Java Cytoscape app, can be used to optimally integrate prior biological knowledge with measured data to simultaneously construct both common and differential networks, to quantitatively assign model parameters and p-values for significant rewiring, and to provide user-friendly graphical results. The KDDN algorithm is computationally efficient and supports parallel computing on widely available multi-core machines. We demonstrate the performance of KDDN on various simulations and real gene expression datasets, and further compare the results with those obtained by the most relevant peer methods. The acquired biologically plausible results provide new insights into network rewiring as a mechanistic principle and illustrate KDDN’s ability to detect such rewiring efficiently and correctly. Although the principal application here involves microarray gene expression data, our methodology can be readily applied to other types of quantitative molecular profiling data.
Availability: Source code and compiled package are freely available for download at http://apps.cytoscape.org/apps/kddn
Supplementary data are available at Bioinformatics online.
Summary: We develop a novel unsupervised deconvolution method, within a well-grounded mathematical framework, to dissect mixed gene expressions in heterogeneous tumor samples. We implement an R package, UNsupervised DecOnvolution (UNDO), that can be used to automatically detect cell-specific marker genes (MGs) located on the scatter radii of mixed gene expressions, estimate cellular proportions in each sample and deconvolute mixed expressions into cell-specific expression profiles. We demonstrate the performance of UNDO over a wide range of tumor–stroma mixing proportions, validate UNDO on various biologically mixed benchmark gene expression datasets and further estimate tumor purity in TCGA/CPTAC datasets. The highly accurate deconvolution results suggest not only the existence of cell-specific MGs but also UNDO’s ability to detect them blindly and correctly. Although the principal application here involves microarray gene expression data, our methodology can be readily applied to other types of quantitative molecular profiling data.
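The geometry behind marker genes on scatter radii can be illustrated in the simplest case of two samples and two cell types. If the mixed expression of gene g is x_g = A s_g with nonnegative A and s_g, a gene expressed in only one cell type lies exactly on the ray spanned by the corresponding column of A, so the genes with the smallest and largest scatter angles point along the two columns. The sketch below assumes noise-free data, exactly two sources, and proportions that sum to one within each sample; it is a simplified illustration, not the UNDO implementation, and `deconvolve_two_sources` is a hypothetical name.

```python
import math

def deconvolve_two_sources(x1, x2):
    """x1, x2: mixed expression of each gene in sample 1 and sample 2.
    Returns (A, S): A[i][k] = proportion of cell type k in sample i,
    S[k][g] = expression of gene g in cell type k."""
    # Angle of each gene's scatter point in the (sample 1, sample 2) plane
    angles = [math.atan2(b, a) for a, b in zip(x1, x2)]
    g_lo = min(range(len(angles)), key=angles.__getitem__)
    g_hi = max(range(len(angles)), key=angles.__getitem__)
    # Marker genes on the two boundary radii give the column directions of A
    d1 = (x1[g_lo], x2[g_lo])
    d2 = (x1[g_hi], x2[g_hi])
    # Scale columns so proportions sum to one per sample:
    # c1*d1[i] + c2*d2[i] = 1 for i = 0, 1 (solved by Cramer's rule)
    det = d1[0] * d2[1] - d2[0] * d1[1]
    c1 = (d2[1] - d2[0]) / det
    c2 = (d1[0] - d1[1]) / det
    A = [[c1 * d1[0], c2 * d2[0]],
         [c1 * d1[1], c2 * d2[1]]]
    # Cell-specific profiles: S = A^{-1} X
    det_a = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    S = [[(A[1][1] * a - A[0][1] * b) / det_a for a, b in zip(x1, x2)],
         [(-A[1][0] * a + A[0][0] * b) / det_a for a, b in zip(x1, x2)]]
    return A, S
```

With real, noisy data and more cell types, the boundary rays would have to be found robustly (for example from many candidate marker genes rather than two extreme points), which this sketch does not attempt.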
Availability and implementation: UNDO is available at http://bioconductor.org/packages.
Supplementary data are available at Bioinformatics online.
Chromatin immunoprecipitation with massively parallel DNA sequencing (ChIP-seq) has greatly improved the reliability with which transcription factor binding sites (TFBSs) can be identified from genome-wide profiling studies. Many computational tools have been developed to detect binding events or peaks; however, robust detection of weak binding events remains a challenge for current peak-calling tools. We have developed a novel Bayesian approach (ChIP-BIT) to reliably detect TFBSs and their target genes by jointly modeling binding signal intensities and binding locations of TFBSs. Specifically, a Gaussian mixture model is used to capture both binding and background signals in sample data. As a unique feature of ChIP-BIT, background signals are modeled by a local Gaussian distribution that is accurately estimated from the input data. Extensive simulation studies showed significantly improved performance of ChIP-BIT in target gene prediction, particularly for detecting weak binding signals at gene promoter regions. We applied ChIP-BIT to find target genes from NOTCH3 and PBX1 ChIP-seq data acquired from MCF-7 breast cancer cells. TF knockdown experiments have initially validated about 30% of the co-regulated target genes identified by ChIP-BIT as being differentially expressed in MCF-7 cells. Functional analysis of these genes further revealed the existence of crosstalk between the Notch and Wnt signaling pathways.
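The mixture idea can be sketched as a two-component Gaussian mixture in which the background component's mean and standard deviation are fixed from control (input) data, and only the binding component and the mixing weight are fitted by expectation-maximization. This is a simplified one-dimensional illustration, not ChIP-BIT itself: ChIP-BIT also models binding locations and uses local background estimates, and the function name and defaults here are hypothetical.

```python
import math
import random
import statistics

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def fit_binding_mixture(signal, input_signal, n_iter=200):
    """EM for p(x) = (1 - w) N(mu_bg, sd_bg) + w N(mu_b, sd_b).
    Background parameters are estimated once from the input (control) data
    and held fixed; only w, mu_b and sd_b are updated."""
    mu_bg = statistics.fmean(input_signal)
    sd_bg = statistics.stdev(input_signal)
    # Initialise the binding component from the upper half of the sample signal
    upper = sorted(signal)[len(signal) // 2:]
    w, mu_b, sd_b = 0.5, statistics.fmean(upper), statistics.stdev(upper)
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability that each region reflects binding
        post = []
        for x in signal:
            pb = w * normal_pdf(x, mu_b, sd_b)
            pg = (1 - w) * normal_pdf(x, mu_bg, sd_bg)
            total = pb + pg
            # If both densities underflow, fall back to the side of the background mean
            post.append(pb / total if total > 0 else float(x > mu_bg))
        # M-step: update the binding component and mixing weight only
        r_sum = sum(post)
        w = r_sum / len(signal)
        mu_b = sum(r * x for r, x in zip(post, signal)) / r_sum
        var_b = sum(r * (x - mu_b) ** 2 for r, x in zip(post, signal)) / r_sum
        sd_b = max(math.sqrt(var_b), 1e-6)  # guard against a collapsed component
    return w, mu_b, sd_b, post

# Synthetic example: 800 background regions followed by 200 bound regions
rng = random.Random(1)
signal = [rng.gauss(0, 1) for _ in range(800)] + [rng.gauss(6, 1) for _ in range(200)]
control = [rng.gauss(0, 1) for _ in range(1000)]
w, mu_b, sd_b, post = fit_binding_mixture(signal, control)
```

Fixing the background from the input rather than fitting it freely is what keeps weak but real binding signals from being absorbed into an inflated background component in this toy setting.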
High-coverage whole-genome DNA sequencing enables the identification of somatic structural variations (SSVs), which become more evident when paired tumor and normal samples are analyzed. Recent studies show that simultaneous analysis of paired samples provides better resolution in SSV detection than subtracting shared SVs. However, available tools can neither identify all types of SSVs nor provide any ranking information regarding their somatic features. In this paper, we have developed a Bayesian framework, called BSSV, that integrates read alignment information from both tumor and normal samples to calculate the significance of each SSV. On simulated data, the precision of BSSV is comparable to that of available tools, and the false negative rate is significantly lower. We have also applied this approach to The Cancer Genome Atlas breast cancer data for SSV detection. Many genes known to be mutated in breast cancer, such as RAD51, BRIP1, ER, PGR and PTPRD, have been successfully identified.
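To make the idea of scoring SVs by jointly using tumor and normal read evidence concrete, the sketch below computes a posterior probability that a candidate SV is somatic under three hypotheses: absent (supporting reads are alignment errors), somatic (supported in the tumor only), and germline (supported in both). It uses binomial likelihoods of supporting-read counts. This is a deliberately simplified stand-in, not the BSSV model; the allele fraction, error rate and priors are illustrative assumptions.

```python
import math

def log_binom(k, n, p):
    """Log binomial probability of k supporting reads out of n."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)

def somatic_posterior(t_alt, t_depth, n_alt, n_depth,
                      vaf=0.4, err=0.01, priors=(0.98, 0.01, 0.01)):
    """Posterior P(somatic) for one candidate SV from supporting-read counts
    in tumor (t_alt of t_depth) and normal (n_alt of n_depth).
    vaf, err and priors are illustrative assumptions, not fitted values."""
    p_absent, p_somatic, p_germline = priors
    log_post = {
        "absent": math.log(p_absent)
        + log_binom(t_alt, t_depth, err) + log_binom(n_alt, n_depth, err),
        "somatic": math.log(p_somatic)
        + log_binom(t_alt, t_depth, vaf) + log_binom(n_alt, n_depth, err),
        "germline": math.log(p_germline)
        + log_binom(t_alt, t_depth, vaf) + log_binom(n_alt, n_depth, vaf),
    }
    # Normalise in log space for numerical stability
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return math.exp(log_post["somatic"] - m) / z
```

Sorting candidates by this posterior yields the kind of ranking that subtraction-based pipelines do not provide.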
Identifying protein interaction networks is an important step toward understanding the molecular mechanisms of cancer. Several methods have been developed to integrate protein-protein interaction (PPI) data with gene expression data for network identification. However, they often fail to model the dependency between genes in the network, leaving many important genes, especially upstream genes, unidentified. A method that improves network identification by incorporating the dependency between genes is therefore needed.
We proposed an approach for identifying protein interaction networks by incorporating mutual information (MI) into a Markov random field (MRF) based framework to model the dependency between genes. MI is widely used in information theory to measure the dependence between random variables, that is, how much knowing one variable reduces uncertainty about the other. Unlike the traditional Pearson correlation test, MI can capture both linear and nonlinear relationships between random variables. Among the existing MI estimators, we chose the k-nearest neighbor MI (kNN-MI) estimator, which has been shown to have minimal bias. The estimated MI is integrated with an MRF framework to model gene dependency in the context of the network. Maximum a posteriori (MAP) estimation is applied to the MRF-based model to estimate the network score. To reduce the computational complexity of finding the optimal network, a probabilistic search algorithm is implemented. We further increase the robustness and reproducibility of the results by applying a non-parametric bootstrapping method to measure the confidence level of the identified genes. To evaluate the performance of the proposed method, we tested it on simulation data under different conditions. The experimental results show improved accuracy in subnetwork identification compared with existing methods. Furthermore, we applied our method to real breast cancer patient data; the identified protein interaction network shows a close association with breast cancer recurrence, which is supported by functional annotation. We also show that the identified subnetworks can be used to predict the recurrence status of cancer patients by survival analysis.
We have developed an integrated approach for protein interaction network identification that combines a Markov random field framework with mutual information to model gene dependency in the PPI network. Improvements in subnetwork identification over existing methods have been demonstrated on simulation datasets. We then applied our method to breast cancer patient data to identify recurrence-related subnetworks. The experimental results show that the identified genes are enriched in pathways and functional categories relevant to the progression and recurrence of breast cancer. Finally, survival analysis based on the identified subnetworks achieves good performance in classifying the recurrence status of cancer patients.
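The k-nearest-neighbor MI estimator referred to above is presumably the Kraskov-Stögbauer-Grassberger (KSG) estimator. A minimal pure-Python sketch of its first variant follows, assuming that is the estimator meant; it is O(N²), acceptable for the sample sizes of typical expression studies but not an optimized implementation. The example contrasts MI with Pearson correlation on a relationship that is purely nonlinear; the data are synthetic.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma of a positive integer: psi(n) = -gamma + sum_{k=1}^{n-1} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))

def ksg_mutual_information(x, y, k=3):
    """KSG estimator (algorithm 1) of I(X;Y) in nats, max-norm in joint space."""
    n = len(x)
    acc = 0.0
    for i in range(n):
        # Distance from point i to its k-th nearest neighbour in (x, y) space
        joint = sorted(max(abs(x[i] - x[j]), abs(y[i] - y[j]))
                       for j in range(n) if j != i)
        eps = joint[k - 1]
        # Marginal neighbour counts strictly inside eps (point i excluded)
        nx = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        acc += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - acc / n

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# y depends on x only through x**2, so Pearson correlation is near zero
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(400)]
y = [v * v + rng.gauss(0, 0.01) for v in x]
print(pearson(x, y), ksg_mutual_information(x, y))
```

The Pearson coefficient comes out close to zero while the MI estimate is clearly positive, which is the property that motivates using MI for gene dependency.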
In Duchenne muscular dystrophy, asynchronous regeneration in microenvironments within muscle tissue results in development of fibrosis in lieu of global muscle recovery.
We sought to determine the mechanisms underlying the failure of muscle regeneration observed in dystrophic muscle through hypothesis generation using muscle profiling data (human dystrophy and murine regeneration). We found that transforming growth factor β–centered networks strongly associated with pathological fibrosis and failed regeneration were also induced during normal regeneration but at distinct time points. We hypothesized that asynchronously regenerating microenvironments are an underlying driver of fibrosis and failed regeneration. We validated this hypothesis using an experimental model of focal asynchronous bouts of muscle regeneration in wild-type (WT) mice. A chronic inflammatory state and reduced mitochondrial oxidative capacity were observed in bouts separated by 4 d, whereas a chronic profibrotic state was seen in bouts separated by 10 d. Treatment of asynchronously remodeling WT muscle with either prednisone or VBP15 mitigated the molecular phenotype. Our asynchronous regeneration model for pathological fibrosis and muscle wasting in the muscular dystrophies is likely generalizable to tissue failure in chronic inflammatory states in other regenerative tissues.
Recent advances in RNA sequencing (RNA-Seq) technology have offered unprecedented scope and resolution for transcriptome analysis. However, precise quantification of mRNA abundance and identification of differentially expressed genes are complicated by biological and technical variations in RNA-Seq data.
We systematically study the variation in count data and dissect the sources of variation into between-sample variation and within-sample variation. A novel Bayesian framework is developed for joint estimation of gene-level mRNA abundance and differential state, which models the intrinsic variability in RNA-Seq data to improve the estimation. Specifically, a Poisson-Lognormal model is incorporated into the Bayesian framework to model within-sample variation; a Gamma-Gamma model is then used to model between-sample variation, which accounts for over-dispersion of read counts among multiple samples. Simulation studies, in which sequencing counts are synthesized based on parameters learned from real datasets, have demonstrated the advantage of the proposed method in both quantification of mRNA abundance and identification of differentially expressed genes. Moreover, a performance comparison on data from the Sequencing Quality Control (SEQC) Project with ERCC spike-in controls has shown that the proposed method outperforms existing RNA-Seq methods in differential analysis. Application to a breast cancer dataset has further illustrated that the proposed Bayesian model can 'blindly' estimate sources of variation caused by sequencing biases.
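Why a single Poisson layer is insufficient can be seen by simulation. The sketch below draws read counts for one gene from a hypothetical two-level generative model in the spirit of the one described: a between-sample expression level drawn from a Gamma distribution, a within-sample log-normal perturbation of the rate, and a Poisson draw for the count. The sample variance exceeds the mean, the over-dispersion a plain Poisson model cannot capture. Parameter values and function names are illustrative assumptions, and this shows only the generative side, not the authors' Bayesian inference.

```python
import math
import random
import statistics

def poisson_draw(lam, rng):
    """Knuth's Poisson sampler (adequate for moderate rates)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_counts(n_samples, mean_level=20.0, shape=5.0, sigma=0.2, seed=0):
    rng = random.Random(seed)
    counts = []
    for _ in range(n_samples):
        # Between-sample variation: expression level ~ Gamma(shape, scale=mean/shape)
        level = rng.gammavariate(shape, mean_level / shape)
        # Within-sample variation: log-normal perturbation of the Poisson rate
        rate = level * math.exp(rng.gauss(0.0, sigma))
        counts.append(poisson_draw(rate, rng))
    return counts

counts = simulate_counts(2000)
print(statistics.fmean(counts), statistics.variance(counts))
```

For a Poisson variable the variance would equal the mean; here the extra Gamma and log-normal layers inflate the variance well beyond it.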
We have developed a novel Bayesian hierarchical approach to investigate within-sample and between-sample variations in RNA-Seq data. Simulation studies and real-data applications have demonstrated the favorable performance of the proposed method. The software package is available at http://www.cbil.ece.vt.edu/software.htm.
Modeling biological networks serves as both a major goal and an effective tool of systems biology in studying the mechanisms that orchestrate the activities of gene products in cells. Biological networks are context-specific and dynamic in nature. To systematically characterize the selectively activated regulatory components and mechanisms, modeling tools must be able to effectively distinguish significant rewiring from random background fluctuations. While differential networks cannot be constructed from existing knowledge alone, incorporating prior knowledge into data-driven approaches can improve the robustness and biological relevance of network inference. However, the major unresolved roadblocks include: a large solution space with a small sample size; highly complex networks; imperfect prior knowledge; lack of significance assessment; and heuristic learning of structural parameters.
To address these challenges, we formulated the inference of differential dependency networks that incorporate both conditional data and prior knowledge as a convex optimization problem, and developed an efficient learning algorithm to jointly infer the conserved biological network and the significant rewiring across different conditions. We used a novel sampling scheme to estimate the expected error rate due to “random” knowledge. Based on that scheme, we developed a strategy that fully exploits the benefit of this data-knowledge integrated approach. We demonstrated and validated the principle and performance of our method using synthetic datasets. We then applied our method to yeast cell line and breast cancer microarray data and obtained biologically plausible results. The open-source R software package and the experimental data are freely available at http://www.cbil.ece.vt.edu/software.htm.
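A common way to fold prior knowledge into a convex sparse-regression formulation is to lower the ℓ1 penalty on edges supported by prior knowledge. The sketch below shows that idea for a single target gene, using a weighted lasso solved by cyclic coordinate descent. It illustrates only the knowledge-weighted penalty, not the full differential dependency network model (which additionally couples the two conditions to detect rewiring and assesses significance); the function name, weights and data are assumptions for illustration.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding: the proximal operator of t * |b|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * sum_j weights[j] * |b_j| by cyclic
    coordinate descent. X is a list of rows (one row per sample)."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    sq = [sum(v * v for v in col) / n for col in cols]
    b = [0.0] * p
    resid = list(y)  # y - Xb, with b initialised to zero
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of predictor j with the partial residual (b_j added back)
            rho = sum(c * r for c, r in zip(cols[j], resid)) / n + sq[j] * b[j]
            new_bj = soft_threshold(rho, lam * weights[j]) / sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, cols[j])]
            b[j] = new_bj
    return b

# Hypothetical data: the target gene depends only on regulator 0
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
y = [row[0] + rng.gauss(0, 0.5) for row in X]
b_flat = weighted_lasso(X, y, lam=0.3, weights=[1.0] * 5)
# Prior knowledge supports the edge to regulator 0, so its penalty is lowered
b_prior = weighted_lasso(X, y, lam=0.3, weights=[0.1, 1.0, 1.0, 1.0, 1.0])
```

With the reduced penalty the supported edge is shrunk much less, while unsupported edges are still driven to zero; if the prior edge were wrong, the data term alone would have to sustain it, which is one reason robustness to "random" knowledge needs to be assessed separately.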
Experiments on both synthetic and real data demonstrate the effectiveness of the knowledge-fused differential dependency network in revealing the statistically significant rewiring in biological networks. The method efficiently leverages data-driven evidence and existing biological knowledge while remaining robust to the false positive edges in the prior knowledge. The identified network rewiring events are supported by previous studies in the literature and also provide new mechanistic insight into the biological systems. We expect the knowledge-fused differential dependency network analysis, together with the open-source R package, to be an important and useful bioinformatics tool in biological network analyses.
Biological networks; Probabilistic graphical models; Differential dependency network; Network rewiring; Network analysis; Systems biology; Knowledge incorporation; Convex optimization
Reliable inference of transcription regulatory networks is still a challenging task in the field of computational biology. Network component analysis (NCA) has become a powerful scheme to uncover the networks behind complex biological processes, especially when gene expression data are integrated with binding motif information. However, the performance of NCA is impaired by the high rate of false connections in binding motif information and the high level of noise in gene expression data. Moreover, in real applications such as cancer research, the performance of NCA in simultaneously analyzing multiple candidate transcription factors (TFs) is further limited by the small number of samples in gene expression data. In this paper, we propose a novel scheme, stability-based NCA, to overcome these problems by addressing the inconsistency between gene expression data and motif binding information (i.e., prior network knowledge). This method introduces small perturbations into the prior network knowledge and uses the variation of the estimated TF activities to reflect their stability. Such a scheme is less limited by sample size and is particularly capable of identifying condition-specific TFs and their target genes. Experimental results on both simulation data and real breast cancer data demonstrate the efficiency and robustness of the proposed method.
transcription regulatory network; network component analysis; stability analysis; transcription factor activity; target genes identification
The reliability and reproducibility of gene biomarkers for classification of cancer patients have been challenged due to measurement noise and biological heterogeneity among patients. In this paper, we propose a novel module-based feature selection framework, which integrates biological network information and gene expression data to identify biomarkers not as individual genes but as functional modules. Results from four breast cancer studies demonstrate that the identified module biomarkers i) achieve higher classification accuracy in independent validation datasets; ii) are more reproducible than individual gene markers; iii) improve the biological interpretability of results; and iv) are enriched in cancer “disease drivers”.
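One simple way to turn a functional module into a single classification feature is to average the standardized expression of its member genes in each sample. The sketch below does exactly that; the averaging rule, gene names and module definitions are hypothetical choices for illustration, and the framework described here may summarize modules differently.

```python
import statistics

def zscore_rows(expr):
    """Standardise each gene (row) across samples."""
    out = {}
    for gene, values in expr.items():
        mu, sd = statistics.fmean(values), statistics.pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out

def module_activity(expr, modules):
    """Return {module: [activity per sample]} as the mean z-score of the
    member genes present in the expression data."""
    z = zscore_rows(expr)
    n_samples = len(next(iter(expr.values())))
    activity = {}
    for name, genes in modules.items():
        members = [g for g in genes if g in z]
        if not members:
            continue  # module has no measured genes
        activity[name] = [statistics.fmean(z[g][s] for g in members)
                          for s in range(n_samples)]
    return activity

# Hypothetical genes and modules; "GX" is not measured and is ignored
expr = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [4, 3, 2, 1]}
modules = {"M1": ["G1", "G2"], "M2": ["G3", "GX"]}
activity = module_activity(expr, modules)
```

Averaging across member genes dampens the noise of any single gene, which is one intuition for why module-level features can be more reproducible than individual gene markers.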
Cancer biomarkers; systems biology; feature selection; disease classification
Identification of cooperative gene regulatory networks is an important topic in biological research, especially in cancer research. Traditional approaches suffer from large noise in gene expression data and false positive connections in motif binding data; they also fail to identify the modularized structure of gene regulatory networks. Methods that can reveal the underlying modularized structure and are robust to noise and false positives are needed.
We proposed and developed an integrated approach to identify gene regulatory networks, which consists of a novel clustering method, motif-guided affinity propagation clustering (mAPC), and a sampling-based method, the Gibbs sampler based on the outlier sum statistic (GibbsOS). In the first step, mAPC obtains co-regulated gene modules by clustering genes with a similarity measure that takes into account both gene expression data and binding motif information. This clustering method reduces the effect of noise in microarray data to obtain modularized gene clusters. However, because motif binding data contain many false positives, some genes not regulated by certain transcription factors (TFs) will be falsely clustered with true target genes. To overcome this problem, GibbsOS is applied in the second step to refine each cluster for the identification of true target genes. To evaluate the performance of the proposed method, we generated simulation data under different signal-to-noise ratios and false positive ratios. The experimental results show improved accuracy in terms of clustering and transcription factor identification. Moreover, improved performance in target gene identification is demonstrated compared with GibbsOS. Finally, we applied the proposed method to two breast cancer patient datasets to identify cooperative transcriptional regulatory networks associated with breast cancer recurrence, as supported by their functional annotations.
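Affinity propagation, the clustering step underlying mAPC, exchanges "responsibility" and "availability" messages over a similarity matrix until exemplars emerge. The pure-Python sketch below implements standard affinity propagation (Frey and Dueck's message updates with damping) on a given similarity matrix. How mAPC combines expression similarity with motif information is not specified here; a weighted sum such as `alpha * expression_sim + (1 - alpha) * motif_sim` would be one hypothetical way to build the input matrix. The example uses simple one-dimensional points instead.

```python
import statistics

def affinity_propagation(S, damping=0.5, n_iter=200):
    """Standard affinity propagation on an n x n similarity matrix S whose
    diagonal holds the preferences. Returns the exemplar index of each point."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(n_iter):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != first)
            for k in range(n):
                best_other = second if k == first else vals[first]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # sum over i' != k
            for i in range(n):
                if i == k:
                    new = total  # a(k,k) = sum_{i' != k} max(0, r(i',k))
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point's exemplar maximises a(i,k) + r(i,k)
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]

# Two well-separated groups; similarity = negative squared distance
points = [0.0, 0.3, 0.5, 5.0, 5.2, 5.6]
S = [[-(a - b) ** 2 for b in points] for a in points]
# Median off-diagonal similarity as the shared preference (a common default)
pref = statistics.median(S[i][j] for i in range(6) for j in range(6) if i != j)
for i in range(6):
    S[i][i] = pref
labels = affinity_propagation(S)
print(labels)
```

Unlike K-means, affinity propagation does not require the number of clusters in advance; the preference on the diagonal controls how many exemplars emerge.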
We have developed a two-step approach for gene regulatory network identification, featuring an integrated method to identify modularized regulatory structures and refine their target genes subsequently. Simulation studies have shown the robustness of the method against noise in gene expression data and false positives in motif binding data. The proposed method has been applied to two breast cancer gene expression datasets to infer the hidden regulation mechanisms. The experimental results demonstrate the efficacy of the method in identifying key regulatory networks related to the progression and recurrence of breast cancer.
Understanding the molecular changes that drive an acquired antiestrogen resistance phenotype is of major clinical relevance. Previous methodologies for addressing this question have taken a single gene/pathway approach and the resulting gains have been limited in terms of their clinical impact. Recent systems biology approaches allow for the integration of data from high throughput “-omics” technologies. We highlight recent advances in the field of antiestrogen resistance with a focus on transcriptomics, proteomics and methylomics.
Systems biology; breast cancer; estrogens; antiestrogens
To construct biologically interpretable gene sets for muscular dystrophy (MD) sub-type classification, we propose a novel computational scheme that integrates the protein-protein interaction (PPI) network, functional gene set information, and mRNA profiling data. The workflow of the proposed scheme includes three major steps: first, we apply an affinity propagation clustering (APC) approach to identify gene sub-networks associated with each MD sub-type, using a new distance metric for APC that combines PPI network information with gene-gene co-expression relationships; second, we further incorporate functional gene set knowledge, which complements the physical PPI information, into our scheme for biomarker identification; finally, based on the constructed sub-networks and gene set features, we apply multi-class support vector machines (MSVMs) for MD sub-type classification, which also highlight the biomarkers contributing to sub-type prediction. The experimental results show that our scheme can help identify sub-networks and gene sets that are more relevant to MD than those constructed by other conventional approaches. Moreover, our integrative strategy improves prediction accuracy substantially, especially for the ‘hard-to-classify’ sub-types.
Gene expression; Classification; Muscular dystrophy; Affinity propagation clustering; Biomarker discovery
Many computational methods for identification of transcription regulatory modules produce many false positives in practice because of noise in binding information and gene expression profiling data. In this paper, we propose a multi-level strategy for condition-specific gene regulatory module identification that integrates motif binding information and gene expression data through support vector regression and statistical significance analysis. We have demonstrated the feasibility of the proposed method on a yeast cell cycle data set. The study on a breast cancer microarray data set shows that it can successfully identify significant and reliable regulatory modules associated with breast cancer.
transcription regulatory module; motif enrichment analysis; SVR; support vector regression; statistical significance analysis; multi-level regulator identification
Lack of understanding of endocrine resistance remains one of the major challenges for breast cancer researchers, clinicians, and patients. Current reductionist approaches to understanding the molecular signaling driving resistance have offered mostly incremental progress over the past 10 years. As the field of systems biology has begun to mature, the approaches and network modeling tools being developed and applied therein offer a different way to think about how molecular signaling and the regulation of critical cellular functions are integrated. To gain novel insights, we first describe some of the key challenges facing network modeling of endocrine resistance, many of which arise from the properties of the data spaces being studied. We then use the activation of the unfolded protein response (UPR) following antiestrogen-induced endoplasmic reticulum stress in breast cancer cells to illustrate our approaches to computational modeling. Activation of the UPR is a key determinant of cell fate decision making and of the regulation of autophagy and apoptosis. These initial studies provide insight into a small subnetwork topology obtained using differential dependency network analysis and focused on the UPR gene XBP1. The XBP1 subnetwork topology incorporates BCAR3, BCL2, BIK, NFκB, and other genes as nodes; the connecting edges represent the dependency structures among these nodes. As data from ongoing cellular and molecular studies become available, we will build detailed mathematical models of this XBP1-UPR network.
Antiestrogen; autophagy; apoptosis; breast cancer; cell signaling; endoplasmic reticulum; estrogens; gene networks; unfolded protein response; computational modeling; mathematical modeling; systems biology
With the advent of high-throughput biotechnology capable of monitoring genomic signals, it has become increasingly promising to understand molecular cellular mechanisms through systems biology approaches. One of the active research topics in systems biology is to infer gene transcriptional regulatory networks using various genomic data; this inference problem can be formulated as a linear model with latent signals associated with regulatory proteins called transcription factors (TFs). Because common statistical assumptions may not hold for genomic signals, typical latent variable algorithms such as independent component analysis (ICA) are unable to reveal the underlying true regulatory signals. Liao et al. proposed to perform inference using an approach named network component analysis (NCA), which is optimized by least-squares fitting with biological knowledge constraints. However, the incompleteness of biological knowledge and its inconsistency with gene expression data are not considered in the original NCA solution, which could greatly affect the inference accuracy. To overcome these limitations, we propose a linear extraction scheme, namely regulatory component analysis (RCA), to infer underlying regulatory signals even with partial biological knowledge. Numerical simulations show a significant improvement of RCA over NCA, not only when the signal-to-noise ratio (SNR) is low, but also when the given biological knowledge is incomplete and inconsistent with gene expression data. Furthermore, regulatory network inference on real E. coli data, compared with several typical linear latent variable methods, again demonstrates the effectiveness and improved performance of the proposed algorithm.
Transcriptional regulatory network inference; Source extraction; Gene expression; Genomic signal processing
Motivation: Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive ‘noise’ in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context.
Results: In this article, we propose to link how well a target gene represents its regulator with the uncertainty of its expression in representing other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic and hence to uncover the underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method under different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer.
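The outlier sum statistic was introduced by Tibshirani and Hastie for detecting genes with outlying expression in a subset of samples: each gene is centered by its median and scaled by its median absolute deviation (MAD), and the standardized values above the upper fence q75 + IQR are summed. The sketch below computes that statistic for a single gene; how it is aggregated across target genes and embedded in the Gibbs sampler is not reproduced here, and the quartile convention (Python's default "exclusive" method) is an assumption.

```python
import statistics

def outlier_sum(values):
    """Outlier sum for one gene: standardise by median and MAD, then sum the
    standardised values lying above q75 + IQR (the upper outlier fence)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return 0.0  # no spread, so no meaningful outliers
    z = [(v - med) / mad for v in values]
    q1, _, q3 = statistics.quantiles(z, n=4)
    fence = q3 + (q3 - q1)
    return sum(v for v in z if v > fence)

print(outlier_sum([1, 2, 3, 4, 5, 6, 7, 8]))
print(outlier_sum([1, 2, 3, 4, 5, 6, 7, 30]))
```

Because only the values beyond the fence contribute, the statistic responds to a strong effect in a few samples rather than to a uniform shift, which suits regulation events that are active in only some conditions.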
Availability and implementation: The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm.
Supplementary data are available at Bioinformatics online.
Ovarian cancer is often called the ‘silent killer’ since it is difficult to have early detection and prognosis. Understanding the biological mechanism related to ovarian cancer becomes extremely important for the purpose of treatment. We propose an integrative framework to identify pathway related networks based on large-scale TCGA copy number data and gene expression profiles. The integrative approach first detects highly conserved copy number altered genes and regards them as seed genes, and then applies a network-based method to identify subnetworks that can differentiate gene expression patterns between different phenotypes of ovarian cancer patients. The identified subnetworks are further validated on an independent gene expression data set using a network-based classification method. The experimental results show that our approach can not only achieve good prediction performance across different data sets, but also identify biological meaningful subnetworks involved in many signaling pathways related to ovarian cancer.
How breast cancer cells respond to the stress of endocrine therapies determines whether they acquire a resistant phenotype or execute a cell death pathway. A successfully executed survival signal then requires a decision on whether or not to replicate. How these cell fate decisions are regulated is unclear, but evidence suggests that the signals determining these outcomes are highly integrated. Central to the final cell fate decision is signaling from the unfolded protein response, which can be activated after stress is sensed within the endoplasmic reticulum. The duration of the response to stress is partly mediated by the duration of inositol requiring enzyme-1 (IRE1; ERN) activation following its release from heat shock protein A5 (HSPA5). The resulting signaling appears to use several B-cell lymphoma-2 (BCL2) family members both to suppress apoptosis and to activate autophagy. Changes in metabolism induced by cellular stress are key components of this regulatory system, and the metabolome adapts further in response to stress. Here we describe the unfolded protein response, autophagy and apoptosis, and how their regulation is integrated. Central topological features of the signaling network that integrate cell fate regulation and decision execution are discussed.
Keywords: Cell signaling; endoplasmic reticulum; estrogens; unfolded protein response
Identification of differentially expressed subnetworks from protein–protein interaction (PPI) networks has become increasingly important to our global understanding of the molecular mechanisms that drive cancer. Several methods have been proposed for PPI subnetwork identification, but they do not explicitly consider the dependency among network member genes, leaving many important hub genes largely unidentified. We present a new method, based on a bagging Markov random field (BMRF) framework, to improve subnetwork identification for mechanistic studies of breast cancer. The method follows a maximum a posteriori principle to form a novel network score that explicitly considers pairwise gene interactions in PPI networks, and it searches for subnetworks with maximal network scores. To improve the robustness of the identified subnetworks across data sets, a bagging scheme based on bootstrap samples is implemented to statistically select high-confidence subnetworks. We first compared the BMRF-based method with existing methods on simulation data to demonstrate its improved performance. We then applied our method to breast cancer data to identify PPI subnetworks associated with breast cancer progression and/or tamoxifen resistance. The experimental results show that the BMRF approach not only achieves improved prediction performance when tested on independent data sets but also reveals biologically meaningful subnetworks relevant to breast cancer and tamoxifen resistance.
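To make the idea of a network score with pairwise terms plus bagging concrete, here is a toy sketch. It assumes a simplified score (per-gene evidence |t| minus a threshold `tau`, plus a reward `beta` for every PPI edge inside the subnetwork), a greedy search from a seed gene, and selection of genes that appear in at least `min_freq` of the bootstrap runs. These choices and all names are illustrative assumptions, not the published BMRF posterior or search procedure.

```python
from collections import Counter

import numpy as np

def gene_t(expr, labels):
    """Per-gene Welch t-statistic; expr has shape (genes, samples)."""
    a, b = expr[:, labels == 1], expr[:, labels == 0]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se

def mrf_score(members, unary, edges, beta):
    """Assumed score: unary evidence of members plus beta per edge inside the subnetwork."""
    inner = sum(1 for i, j in edges if i in members and j in members)
    return sum(unary[g] for g in members) + beta * inner

def greedy_subnetwork(seed, neighbors, unary, edges, beta):
    """Add the neighbor giving the highest score until no extension increases it."""
    members = {seed}
    best = mrf_score(members, unary, edges, beta)
    while True:
        frontier = {n for g in members for n in neighbors[g]} - members
        scored = [(mrf_score(members | {n}, unary, edges, beta), n) for n in frontier]
        if not scored:
            break
        score, gene = max(scored)
        if not score > best:
            break
        members.add(gene)
        best = score
    return members

def bagged_selection(expr, labels, edges, seed, n_boot=50, tau=2.0, beta=1.0,
                     min_freq=0.8, rng_seed=0):
    """Keep genes selected in at least min_freq of the bootstrap runs."""
    rng = np.random.default_rng(rng_seed)
    neighbors = {g: set() for g in range(expr.shape[0])}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    idx1, idx0 = np.flatnonzero(labels == 1), np.flatnonzero(labels == 0)
    counts = Counter()
    for _ in range(n_boot):
        # Resample within each class so both phenotypes are always present.
        boot = np.concatenate([rng.choice(idx1, idx1.size), rng.choice(idx0, idx0.size)])
        unary = np.abs(gene_t(expr[:, boot], labels[boot])) - tau
        counts.update(greedy_subnetwork(seed, neighbors, unary, edges, beta))
    return {g for g, c in counts.items() if c / n_boot >= min_freq}
```

The pairwise reward is what lets a weakly differential hub gene join a subnetwork when it links strongly differential neighbors, and the bagging step discards genes whose selection depends on a particular sample.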
Summary: Differential dependency network (DDN) is a caBIG® (cancer Biomedical Informatics Grid) analytical tool for detecting and visualizing statistically significant topological changes in transcriptional networks representing two biological conditions. Developed under caBIG®'s In Silico Research Centers of Excellence (ISRCE) Program, DDN enables differential network analysis and provides an alternative way to define network biomarkers predictive of phenotypes. DDN also serves as a useful systems biology tool for users across biomedical research communities to infer how genetic, epigenetic or environmental variables may affect biological networks and clinical phenotypes. Besides the standalone Java application, we have also developed a Cytoscape plug-in, CytoDDN, to integrate network analysis and visualization seamlessly.
Availability: The Java and MATLAB source code can be downloaded from the authors' website: http://www.cbil.ece.vt.edu/software.htm
Supplementary information: Supplementary data are available at Bioinformatics online.
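DDN's own estimation procedure is not described in this summary, so as a deliberately simplified stand-in the sketch below flags gene pairs whose Pearson correlation changes between two conditions, using Fisher's z-transformation and a two-sided normal test. This captures only a crude notion of a "topological change" compared with a dependency network, applies no multiple-testing correction, and uses a hypothetical function name and threshold (`alpha`).

```python
import math

import numpy as np

def differential_edges(x_a, x_b, alpha=0.01):
    """Gene pairs whose correlation differs between conditions A and B.

    x_a, x_b: arrays of shape (samples, genes), one per condition.
    Returns (i, j, r_a, r_b, p) tuples with two-sided p < alpha (uncorrected).
    """
    # Clip so atanh stays finite when a correlation is exactly +/-1.
    r_a = np.clip(np.corrcoef(x_a, rowvar=False), -0.999999, 0.999999)
    r_b = np.clip(np.corrcoef(x_b, rowvar=False), -0.999999, 0.999999)
    se = math.sqrt(1.0 / (x_a.shape[0] - 3) + 1.0 / (x_b.shape[0] - 3))
    hits = []
    n_genes = r_a.shape[0]
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            z = (math.atanh(r_a[i, j]) - math.atanh(r_b[i, j])) / se
            p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal p-value
            if p < alpha:
                hits.append((i, j, float(r_a[i, j]), float(r_b[i, j]), p))
    return hits
```

A pair that is tightly coupled in one condition and independent in the other is the simplest example of the condition-specific rewiring that DDN is designed to detect and visualize.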