Cross-referencing experimental data with our current knowledge of signaling network topologies is one central goal of mathematical modeling of cellular signal transduction networks. We present a new methodology for data-driven interrogation and training of signaling networks. While most published methods for signaling network inference operate on Bayesian, Boolean, or ODE models, our approach uses integer linear programming (ILP) on interaction graphs to encode constraints on the qualitative behavior of the nodes. These constraints are posed by the network topology and their formulation as ILP allows us to predict the possible qualitative changes (up, down, no effect) of the activation levels of the nodes for a given stimulus. We provide four basic operations to detect and remove inconsistencies between measurements and predicted behavior: (i) find a topology-consistent explanation for responses of signaling nodes measured in a stimulus-response experiment (if none exists, find the closest explanation); (ii) determine a minimal set of nodes that need to be corrected to make an inconsistent scenario consistent; (iii) determine the optimal subgraph of the given network topology which can best reflect measurements from a set of experimental scenarios; (iv) find possibly missing edges that would improve the consistency of the graph with respect to a set of experimental scenarios the most. We demonstrate the applicability of the proposed approach by interrogating a manually curated interaction graph model of EGFR/ErbB signaling against a library of high-throughput phosphoproteomic data measured in primary hepatocytes. Our methods detect interactions that are likely to be inactive in hepatocytes and provide suggestions for new interactions that, if included, would significantly improve the goodness of fit. Our framework is highly flexible and the underlying model requires only easily accessible biological knowledge. 
All related algorithms are implemented in the freely available toolbox SigNetTrainer, making the approach appealing for a variety of applications.
Cellular signal transduction is orchestrated by communication networks of signaling proteins commonly depicted on signaling pathway maps. However, each cell type may have distinct variants of signaling pathways, and wiring diagrams are often altered in disease states. The identification of truly active signaling topologies based on experimental data is therefore one key challenge in systems biology of cellular signaling. We present a new framework for training signaling networks based on interaction graphs (IG). In contrast to complex modeling formalisms, IG capture merely the known positive and negative edges between the components. This basic information, however, already sets hard constraints on the possible qualitative behaviors of the nodes when perturbing the network. Our approach uses Integer Linear Programming to encode these constraints and to predict the possible changes (down, neutral, up) of the activation levels of the involved players for a given experiment. Based on this formulation we developed several algorithms for detecting and removing inconsistencies between measurements and network topology. Demonstrated by EGFR/ErbB signaling in hepatocytes, our approach delivers direct conclusions on edges that are likely inactive or missing relative to canonical pathway maps. Such information drives the further elucidation of signaling network topologies under normal and pathological phenotypes.
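To make the notion of topology-imposed sign constraints concrete, the following minimal Python sketch enumerates the qualitative labelings (down, neutral, up, encoded as -1, 0, +1) of a small signed interaction graph that are compatible with a given stimulus. It is not the ILP formulation and not the SigNetTrainer implementation: brute-force enumeration is used in place of integer linear programming, and the consistency rule (a node may change only in a direction supported by at least one incoming influence, and may stay neutral only if it receives no influence or opposing influences) is a simplifying assumption made for illustration. The function names and the toy graph are hypothetical.

```python
from itertools import product

def node_is_consistent(node, state, edges):
    """Check the assumed sign rule for one node given all node labels."""
    # Nonzero influences arriving at the node: edge sign times source label.
    influences = {sign * state[src] for src, tgt, sign in edges if tgt == node}
    influences.discard(0)
    if state[node] == 0:
        # Neutral is allowed with no influence or with opposing influences.
        return len(influences) != 1
    # A change needs at least one influence pointing in the same direction.
    return state[node] in influences

def consistent_labelings(nodes, edges, fixed):
    """Enumerate all labelings in {-1, 0, +1} consistent with the graph.

    edges: list of (source, target, sign) with sign in {-1, +1}
    fixed: dict mapping stimulated (input) nodes to their imposed label
    """
    free = [n for n in nodes if n not in fixed]
    results = []
    for values in product((-1, 0, 1), repeat=len(free)):
        state = dict(fixed)
        state.update(zip(free, values))
        if all(node_is_consistent(n, state, edges) for n in free):
            results.append(state)
    return results

# Hypothetical toy graph: A activates B and C, B inhibits C.
toy_nodes = ["A", "B", "C"]
toy_edges = [("A", "B", +1), ("A", "C", +1), ("B", "C", -1)]
labelings = consistent_labelings(toy_nodes, toy_edges, {"A": +1})
for labeling in labelings:
    print(labeling)
```

In the toy graph, stimulating A forces B up, while C receives an activating influence from A and an inhibiting one from B, so every label for C remains possible. Under the assumed rule, a measurement of C alone could never contradict this topology, whereas a measured decrease of B would. Enumeration grows exponentially with the number of nodes, which is one reason an optimization formulation is attractive for realistic networks.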
A gene regulatory network is an abstract mapping of gene regulation in living cells that can help to predict the system behavior of living organisms. Such predictive capability can potentially lead to the development of improved diagnostic tests and therapeutics. DNA microarrays, which measure the expression levels of thousands of genes in parallel, constitute the numeric seed for the inference of gene regulatory networks. In this paper, we propose a new approach for inferring gene regulatory networks from time-series gene expression data using a linear time-variant model. Here, Self-Adaptive Differential Evolution, a versatile and robust evolutionary algorithm, is used as the learning paradigm.
To assess the performance of the proposed approach, a well-known nonlinear synthetic network was used. The reconstruction method inferred this synthetic network topology and the associated regulatory parameters with high accuracy from both noise-free and noisy time-series data. For validation purposes, the proposed approach was also applied to the simulated expression dataset of cAMP oscillations in Dictyostelium discoideum, where it proved its strength in finding the correct regulations. The approach was further verified on the real expression dataset of the SOS DNA repair system in Escherichia coli, where it found more correct and reasonable regulations than various existing methods.
With the proposed approach, gene interaction networks were inferred efficiently from synthetic data, simulated cAMP oscillation expression data, and real expression data. The computational time of this approach is also considerably smaller, which makes it more suitable for the reconstruction of larger networks. The proposed approach can thus serve as a starting point for future research in this area.
Network motifs provided a “conceptual tool” for understanding the functional principles of biological networks, but such motifs have primarily been used to consider static network structures. Static networks, however, cannot be used to reveal time- and region-specific traits of biological systems. To overcome this limitation, we proposed the concept of a “spatiotemporal network motif,” a spatiotemporal sequence of network motifs of sub-networks which are active only at specific time points and body parts.
On the basis of this concept, we analyzed the developmental gene regulatory network of the Drosophila melanogaster embryo. We identified spatiotemporal network motifs and investigated their distribution pattern in time and space. As a result, we found how key developmental processes are temporally and spatially regulated by the gene network. In particular, we found that nested feedback loops appeared frequently throughout the entire developmental process. From mathematical simulations, we found that mutual inhibition in the nested feedback loops contributes to the formation of spatial expression patterns.
Taken together, the proposed concept and the simulations can be used to unravel the design principle of developmental gene regulatory networks.
Substantial effort in recent years has been devoted to constructing and analyzing large-scale gene and protein networks based on 'omic data and literature mining. These interaction graphs provide valuable insight into the topologies of complex biological networks, but are rarely context-specific and cannot be used to predict the responses of cell signaling proteins to specific ligands or drugs. Conversely, traditional approaches to analyzing cell signaling are narrow in scope and cannot easily make use of network-level data. Here we combine network analysis and functional experimentation using a hybrid approach in which graphs are converted into simple mathematical models that can be trained against biochemical data. Specifically, we created Boolean logic models of immediate-early signaling in liver cells by training a literature-based prior knowledge network against biochemical data obtained from primary human hepatocytes and four hepatocellular carcinoma cell lines exposed to combinations of cytokines and small-molecule kinase inhibitors. Distinct families of models were recovered for each cell type that clustered topologically into normal and diseased sets. Comparison revealed that clustering arises from systematic differences in signaling logic in three regions of the network. We also infer the existence of a new interaction involving Jak-Stat and NFκB signaling and show that it arises from the polypharmacology of an IκB kinase inhibitor rather than previously unidentified protein-protein associations. These results constitute a proof-of-principle that receptor-mediated signal transduction can be reverse engineered using biochemical data so that the immediate effects of drugs on normal and diseased cells can be studied in a systematic manner.
liver; signal transduction; hepatocellular carcinoma; cancer; network inference; Boolean logic modeling
Living cells are controlled by networks of interacting genes, proteins and biochemicals. Cells use the emergent collective dynamics of these networks to probe their surroundings, perform computations and generate appropriate responses. Here, we consider genetic networks, interacting sets of genes that regulate one another’s expression. It is possible to infer the interaction topology of genetic networks from high-throughput experimental measurements. However, such experiments rarely provide information on the detailed nature of each interaction. We show that topological approaches provide powerful means of dealing with the missing biochemical data. We first discuss the biochemical basis of gene regulation, and describe how genes can be connected into networks. We then show that, given weak constraints on the underlying biochemistry, topology alone determines the emergent properties of certain simple networks. Finally, we apply these approaches to the realistic example of quorum-sensing networks: chemical communication systems that coordinate the responses of bacterial populations.
synthetic biology; feedback loops; Boolean threshold models
To elucidate mechanisms of cancer progression, we generated inducible human neoplasia in 3-dimensionally intact epithelial tissue. Gene expression profiling of both epithelia and stroma at specific time points during tumor progression revealed sequential enrichment of genes mediating discrete biologic functions in each tissue compartment. A core cancer progression signature was distilled using the increased signaling specificity of downstream oncogene effectors and subjected to network modeling. Network topology predicted that tumor development depends upon specific ECM-interacting network hubs. Blockade of one such hub, the β1 integrin subunit, disrupted network gene expression and attenuated tumorigenesis in vivo. Thus, integrating network modeling and temporal gene expression analysis of inducible human neoplasia provides an approach to prioritize and characterize genes functioning in cancer progression.
Investigating tumor progression in patient samples is complicated by etiologic heterogeneity, genetic instability, and an overabundance of precursor lesions that fail to progress. These complexities obscure construction of a dynamic picture of progression from normal tissue to invasive cancer. Here, we generate inducible human neoplasia driven by conditionally active Ras and characterize the sequence of gene expression programs engaged in epithelial tumor tissue and adjacent stroma during carcinogenesis. We show that tumor-intrinsic gene expression can be refined by sufficient downstream oncogene effectors and apply a generalizable network modeling strategy to prioritize targets based upon local interconnectivity. This analysis highlights the importance of tumor-stroma interaction during tumorigenesis and identifies β1 integrin as a potential oncotherapeutic target that distinguishes normal and neoplastic tissue.
Cancer; Gene Expression; Skin; Stroma; Tumor Progression
The reconstruction of gene regulatory networks from time series gene expression data is one of the most difficult problems in systems biology. This is due to several reasons, among them the combinatorial explosion of possible network topologies, limited information content of the experimental data with high levels of noise, and the complexity of gene regulation at the transcriptional, translational and post-translational levels. At the same time, quantitative, dynamic models, ideally with probability distributions over model topologies and parameters, are highly desirable.
We present a novel approach to infer such models from data, based on nonlinear differential equations, which we embed into a stochastic Bayesian framework. We thus address both the stochasticity of experimental data and the need for quantitative dynamic models. Furthermore, the Bayesian framework makes it easy to integrate prior knowledge into the inference process. Using stochastic sampling from the Bayesian posterior distribution, our approach can infer different likely network topologies and model parameters along with their respective probabilities from given data. We evaluate our approach on simulated data and the challenge #3 data from the DREAM 2 initiative. On the simulated data, we study effects of different levels of noise and dataset sizes. Results on real data show that the dynamics and main regulatory interactions are correctly reconstructed.
Our approach combines dynamic modeling using differential equations with a stochastic learning framework, thus bridging the gap between biophysical modeling and stochastic inference approaches. Results show that the method can reap the advantages of both worlds, and allows the reconstruction of biophysically accurate dynamic models from noisy data. In addition, the stochastic learning framework used permits the computation of probability distributions over models and model parameters, which holds interesting prospects for experimental design purposes.
Gene expression data generated systematically in a given system over multiple time points provides a source of perturbation that can be leveraged to infer causal relationships among genes explaining network changes. Previously, we showed that food intake has a large impact on blood gene expression patterns and that these responses, either in terms of gene expression level or gene-gene connectivity, are strongly associated with metabolic diseases. In this study, we explored which genes drive the changes of gene expression patterns in response to time and food intake. We applied the Granger causality test and the dynamic Bayesian network to gene expression data generated from blood samples collected at multiple time points during the course of a day. The simulation result shows that combining many short time series together is as powerful to infer Granger causality as using a single long time series. Using the Granger causality test, we identified genes that were supported as the most likely causal candidates for the coordinated temporal changes in the network. These results show that PER1 is a key regulator of the blood transcriptional network, in which multiple biological processes are under circadian rhythm regulation. The fasted and fed dynamic Bayesian networks showed that over 72% of dynamic connections are self links. Finally, we show that different processes such as inflammation and lipid metabolism, which are disconnected in the static network, become dynamically linked in response to food intake, which would suggest that increasing nutritional load leads to coordinate regulation of these biological processes. In conclusion, our results suggest that food intake has a profound impact on the dynamic co-regulation of multiple biological processes, such as metabolism, immune response, apoptosis and circadian rhythm. The results could have broader implications for the design of studies of disease association and drug response in clinical trials.
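The idea of pooling many short time series for a Granger causality test can be illustrated with a small, dependency-free Python sketch. It is an illustration under stated assumptions, not the procedure used in the study: only a single lag is considered, the lagged observations of all series are simply stacked into one regression, and only the F statistic is returned (no p-value or multiple-testing correction). All function names are hypothetical.

```python
def _solve(a, b):
    """Solve the small linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def _rss(rows, targets):
    """Residual sum of squares of an ordinary least-squares fit."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    coef = _solve(xtx, xty)
    return sum((t - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, t in zip(rows, targets))

def granger_f_lag1(series_pairs):
    """F statistic for 'x Granger-causes y' at lag 1, pooled over series.

    series_pairs: list of (x, y) tuples, each a pair of equally long lists.
    """
    full, restricted, targets = [], [], []
    for x, y in series_pairs:
        for t in range(1, len(y)):
            full.append([1.0, y[t - 1], x[t - 1]])   # own past + past of x
            restricted.append([1.0, y[t - 1]])       # own past only
            targets.append(y[t])
    rss_full = _rss(full, targets)
    rss_restricted = _rss(restricted, targets)
    dof = len(targets) - 3  # observations minus parameters of the full model
    # One restriction is tested, so the numerator has one degree of freedom.
    return (rss_restricted - rss_full) / (rss_full / dof)
```

Calling `granger_f_lag1` with a list of short `(x, y)` series tests whether the past of `x` improves the prediction of `y` beyond what the past of `y` already provides; swapping the roles of the two series tests the reverse direction. Each short series contributes its own lagged pairs, so no artificial transition is created between the end of one series and the start of the next.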
Peripheral blood is the most readily accessible human tissue for clinical studies and experimental research more generally. Large-scale molecular profiling technologies have enabled measurements of mRNA expression on the scale of whole genomes. Understanding the relationships between human blood gene expression profiles and clinical traits is extremely useful for inferring causal factors for human disease and for studying drug response. Biological pathways and the complex behaviors they induce are not static, but change dynamically in response to external factors such as intake/uptake of nutrients and administration of drugs. We employed a randomized, two-arm cross-over design to assess the effects of fasting and feeding on the dynamic changes of blood transcriptional network. Our work has convincingly shown that feeding or increasing nutritional load affects the human circadian rhythm system which connects to other biological processes including metabolic and immune responses. We believe this is a first step towards a more comprehensive population-based study that seeks to connect changes in the blood transcriptome to drug response, and to disease and biology more generally.
Biological networks are highly dynamic in response to environmental and physiological cues. This variability is in contrast to conventional analyses of biological networks, which have overwhelmingly employed static graph models which stay constant over time to describe biological systems and their underlying molecular interactions.
To overcome these limitations, we propose here a new statistical modelling framework, the ARTIVA formalism (Auto Regressive TIme VArying models), and an associated inferential procedure that allows us to learn temporally varying gene-regulation networks from biological time-course expression data. ARTIVA simultaneously infers the topology of a regulatory network and how it changes over time. It allows us to recover the chronology of regulatory associations for individual genes involved in a specific biological process (development, stress response, etc.).
We demonstrate that the ARTIVA approach generates detailed insights into the function and dynamics of complex biological systems and efficiently exploits time-course data in systems biology. In particular, two biological scenarios are analyzed: the developmental stages of Drosophila melanogaster and the response of Saccharomyces cerevisiae to benomyl poisoning.
ARTIVA recovers essential temporal dependencies in biological systems from transcriptional data and provides a natural starting point to learn and investigate their dynamics in greater detail.
Gene regulatory networks are perhaps the most important organizational level in the cell where signals from the cell state and the outside environment are integrated in terms of activation and inhibition of genes. For the last decade, the study of such networks has been fueled by large-scale experiments and renewed attention from the theoretical field. Different models have been proposed to, for instance, investigate expression dynamics, explain the network topology we observe in bacteria and yeast, and analyze the evolvability and robustness of such networks. Yet how these gene regulatory networks evolve and become evolvable remains an open question.
An individual-oriented evolutionary model is used to shed light on this matter. Each individual has a genome from which its gene regulatory network is derived. Mutations, such as gene duplications and deletions, alter the genome, while the resulting network determines the gene expression pattern and hence fitness. With this protocol we let a population of individuals evolve under Darwinian selection in an environment that changes through time.
Our work demonstrates that long-term evolution of complex gene regulatory networks in a changing environment can lead to a striking increase in the efficiency of generating beneficial mutations. We show that the population evolves towards genotype-phenotype mappings that allow for an orchestrated network-wide change in the gene expression pattern, requiring only a few specific gene indels. The genes involved are hubs of the networks, or directly influence the hubs. Moreover, throughout the evolutionary trajectory the networks maintain their mutational robustness. In other words, evolution in an alternating environment leads to a network that is sensitive to a small class of beneficial mutations, while the majority of mutations remain neutral: an example of evolution of evolvability.
A cell receives signals both from its internal and external environment and responds by changing the expression of genes. In this manner the cell adjusts to heat, osmotic pressures and other circumstances during its lifetime. Over long timescales, the network of interacting genes and its regulatory actions also undergo evolutionary adaptation. Yet how do such networks evolve and become adapted?
In this paper we describe the study of a simple model of gene regulatory networks, focusing solely on evolutionary adaptation. We let a population of individuals evolve, while the external environment changes through time. To ensure evolution is the only source of adaptation, we do not provide the individuals with a sensor to the environment. We show that the interplay between the long-term process of evolution and short-term gene regulation dynamics leads to a striking increase in the efficiency of creating well-adapted offspring. Beneficial mutations become more frequent, nevertheless robustness to the majority of mutations is maintained. Thus we demonstrate a clear example of the evolution of evolvability.
Biological networks are constructed of repeated simplified patterns, or modules, called network motifs. Network motifs can be found in a variety of organisms including bacteria, plants, and animals, as well as intracellular transcription networks for gene expression and signal transduction processes in neuronal circuits. Standard models of signal transduction events for synaptic plasticity and learning often fail to capture the complexity and cooperativity of the molecular interactions underlying these processes. Here, we apply network motifs to a model for signal transduction during an in vitro form of eyeblink classical conditioning that reveals an underlying organization of these molecular pathways. Experimental evidence suggests there are two stages of synaptic AMPA receptor (AMPAR) trafficking during conditioning. Synaptic incorporation of GluR1-containing AMPARs occurs early to activate silent synapses conveying the auditory conditioned stimulus and this initial step is followed by delivery of GluR4 subunits that supports acquisition of learned conditioned responses (CRs). Overall, the network design of the two stages of synaptic AMPAR delivery during conditioning describes a coherent feed-forward loop (C1-FFL) with AND logic. The combined inputs of GluR1 synaptic delivery AND the sustained activation of 3-phosphoinositide-dependent protein-kinase-1 (PDK-1) result in synaptic incorporation of GluR4-containing AMPARs and the gradual acquisition of CRs. The network architecture described here for conditioning is postulated to act generally as a sign-sensitive delay element that is consistent with the non-linearity of the conditioning process. Interestingly, this FFL structure also performs coincidence detection. A motif-based approach to modeling signal transduction can be used as a new tool for understanding molecular mechanisms underlying synaptic plasticity and learning and for comparing findings across forms of learning and model systems.
classical conditioning; AMPA receptor trafficking; network motifs; model; signal transduction; eyeblink; in vitro; feed-forward loops
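The sign-sensitive delay attributed above to a coherent feed-forward loop with AND logic can be reproduced with a very small simulation. The Python sketch below is generic and hypothetical: X, Y, and Z are abstract nodes (X drives Y, and the output Z is produced only while X is present AND Y exceeds a threshold), the kinetics are a first-order Euler update, and all parameter values are arbitrary. No mapping of these nodes onto GluR1, PDK-1, or GluR4 kinetics is implied.

```python
def simulate_c1_ffl(x_signal, dt=0.1, beta=1.0, alpha=1.0, threshold=0.5):
    """Simulate a coherent type-1 feed-forward loop with AND logic.

    x_signal: sequence of 0/1 values giving the input X at each time step.
    Returns a list of booleans: whether production of Z is active per step.
    """
    y = 0.0
    z_active = []
    for x in x_signal:
        # AND gate: Z needs the direct input X and enough accumulated Y.
        z_active.append(x == 1 and y > threshold)
        # Y is produced while X is on and decays at rate alpha (Euler step).
        y += dt * (beta * x - alpha * y)
    return z_active

# Input switched on for 30 steps, then off for 10 steps.
signal = [1] * 30 + [0] * 10
z = simulate_c1_ffl(signal)
on_delay = z.index(True)                    # steps until Z first responds
off_delay = z[30:].index(False)             # steps until Z stops after X is removed
print("ON delay (steps):", on_delay, "| OFF delay (steps):", off_delay)
```

After the input is switched on, Z responds only once Y has accumulated past the threshold, whereas removing the input switches Z off without delay because the AND gate loses its direct input. A brief pulse of X that ends before Y reaches the threshold never activates Z, which is the sense in which such a motif filters transient inputs and requires coincident, sustained signals.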
Cells adapt to extra- and intra-cellular signals by dynamic orchestration of activities of pathways in the biochemical networks. Dynamic control of the gene expression process represents a major mechanism for pathway activity regulation. Gene expression has thus been routinely measured, most frequently at the steady-state mRNA abundance level using microarray technology. The results are widely used in statistical inference of the structures of underlying biochemical networks, with the assumption that functionally related genes exhibit similar dynamic profiles. Steady-state mRNA abundance, however, is a composite of two factors: transcription rate and mRNA degradation rate. The question being asked here is therefore whether steady-state mRNA abundance or either of the two factors is a more informative measurement target for studying network dynamics. The yeast S. cerevisiae was used as the model organism and transcription rate was chosen out of the two factors in this study, because genome-wide determination of transcription rates has been reported for several physiological processes in this species. Our strategy is to test which one is a better measurement of functional relatedness between genes. The analysis was performed on those S. cerevisiae genes that have bacterial orthologs as identified by reciprocal BLAST analysis, so that functional relatedness of a gene pair can be measured by the frequency at which their bacterial orthologs co-occur in the same operon in the collection of bacterial genomes. It is found that transcription rate data is generally a better parameter for functional relatedness than steady-state mRNA abundance, suggesting transcription rate data is more informative to use in deciphering the logic used by the cells in dynamic regulation of biochemical network behaviors. The significance of this finding for network and systems biology, as well as biomedical research in general, is discussed.
The physical periphery of a biological cell is mainly described by signaling pathways which are triggered by transmembrane proteins and receptors that are sentinels to control the whole gene regulatory network of a cell. However, our current knowledge about the gene regulatory mechanisms that are governed by extracellular signals is severely limited.
The purpose of this paper is threefold. First, we infer a gene regulatory network from a large-scale B-cell lymphoma expression data set using the C3NET algorithm. Second, we provide a functional and structural analysis of the largest connected component of this network, revealing that this network component corresponds to the peripheral region of a cell. Third, we analyze the hierarchical organization of network components of the whole inferred B-cell gene regulatory network by introducing a new approach which exploits the variability within the data as well as the inferential characteristics of C3NET. As a result, we find a functional bisection of the network corresponding to different cellular components.
Overall, our study highlights the peripheral gene regulatory network of B-cells and shows that it is centered around hub transmembrane proteins located at the physical periphery of the cell. In addition, we identify a variety of novel pathological transmembrane proteins such as ion channel complexes and signaling receptors in B-cell lymphoma.
B-cell lymphoma; Gene expression data; Gene regulatory network; Statistical network inference
Identifying gene regulatory networks (GRNs) from time course gene expression data has attracted increasing attention. Due to the computational complexity, most approaches for GRN reconstruction are limited to a small number of genes and to low connectivity of the underlying networks. These approaches can only identify a single network for a given set of genes. However, a large-scale gene network may contain multiple potential sub-networks, in which genes are functionally related only to other genes in the same sub-network.
We propose the network and community identification (NCI) method for identifying multiple subnetworks from gene expression data by incorporating community structure information into GRN inference. The proposed algorithm iteratively solves two optimization problems and shows promise for application to large-scale GRNs. Furthermore, we present the efficient Block PCA method for searching communities in GRNs.
The NCI method is effective in identifying multiple subnetworks in a large-scale GRN. With the splitting algorithm, the Block PCA method is a promising approach for exploring communities in a large-scale GRN.
Motivation: Reconstructing gene networks from microarray data has provided mechanistic information on cellular processes. A popular structure learning method, Bayesian network inference, has been used to determine network topology despite its shortcomings, i.e. the high-computational cost when analyzing a large number of genes and the inefficiency in exploiting prior knowledge, such as the co-regulation information of the genes. To address these limitations, we are introducing an alternative method, knowledge-driven matrix factorization (KMF) framework, to reconstruct phenotype-specific modular gene networks.
Results: Considering the reconstruction of gene network as a matrix factorization problem, we first use the gene expression data to estimate a correlation matrix, and then factorize the correlation matrix to recover the gene modules and the interactions between them. Prior knowledge from Gene Ontology is integrated into the matrix factorization. We applied this KMF algorithm to hepatocellular carcinoma (HepG2) cells treated with free fatty acids (FFAs). By comparing the module networks for the different conditions, we identified the specific modules that are involved in conferring the cytotoxic phenotype induced by palmitate. Further analysis of the gene modules of the different conditions suggested individual genes that play important roles in palmitate-induced cytotoxicity. In summary, KMF can efficiently integrate gene expression data with prior knowledge, thereby providing a powerful method of reconstructing phenotype-specific gene networks and valuable insights into the mechanisms that govern the phenotype.
Supplementary information: Supplementary data are available at Bioinformatics online.
Understanding gene expression and regulation is essential for understanding biological mechanisms. Because gene expression profiling has been widely used in basic biological research, especially in transcription regulation studies, we have developed GeneReg, an easy-to-use R package, to construct gene regulatory networks from time course gene expression profiling data. More importantly, this package can provide information about time delays between an expression change in a regulator and that of its target genes.
The R package GeneReg is based on time delay linear regression, which can generate a model of the expression levels of regulators at a given time point against the expression levels of their target genes at a later time point. There are two parameters in the model, time delay and regulation coefficient. Time delay is the time lag during which expression change of the regulator is transmitted to change in target gene expression. Regulation coefficient expresses the regulation effect: a positive regulation coefficient indicates activation and negative indicates repression. GeneReg was implemented on a real Saccharomyces cerevisiae cell cycle dataset; more than thirty percent of the modeled regulations, based entirely on gene expression files, were found to be consistent with previous discoveries from known databases.
GeneReg is an easy-to-use, simple, fast R package for gene regulatory network construction from short time course gene expression data. It may be applied to study time-related biological processes such as cell cycle, cell differentiation, or causal inference.
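The two model parameters described above, time delay and regulation coefficient, can be illustrated with a short sketch. It is written in Python rather than R and does not reproduce the GeneReg package or its interface: it assumes equally spaced time points, a single regulator, and selection of the delay by the highest coefficient of determination, all of which are simplifications chosen here. The function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit y ~ slope * x + intercept; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

def best_time_delay(regulator, target, max_delay):
    """Regress target[t + d] on regulator[t] for d = 0..max_delay.

    Returns (delay, regulation_coefficient, r2) for the best-fitting delay.
    A positive coefficient suggests activation, a negative one repression.
    """
    best = None
    for d in range(max_delay + 1):
        x = regulator[: len(regulator) - d]  # regulator at time t
        y = target[d:]                       # target at time t + d
        slope, _, r2 = fit_line(x, y)
        if best is None or r2 > best[2]:
            best = (d, slope, r2)
    return best
```

Given two expression profiles sampled at the same time points, `best_time_delay(regulator, target, max_delay)` returns the lag (in sampling intervals) at which the regulator best predicts the target, together with the sign and size of the fitted coefficient. With short time courses, each additional lag removes one data point from the fit, so `max_delay` should stay small relative to the series length.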
Gene expression time series array data have become a useful resource for investigating gene functions and the interactions between genes. However, gene expression array data are invariably contaminated with noise, and many nonlinear regulatory relationships are omitted by linear models. Because of these practical limitations, inference of gene regulatory models from expression data is still far from satisfactory.
In this study, we present a model-based computational approach, the Slice Pattern Model (SPM), to identify gene regulatory networks from time series gene expression array data. To assess the stability and reliability of our model, an artificial gene network is reconstructed with both the traditional linear model and SPM. SPM can handle multiple transcriptional time lags and reconstructs the gene network more accurately. Using SPM, a 17 time-series gene expression data set from the yeast cell cycle is used to reconstruct the regulatory network. Under the reliability threshold θ = 55%, 18 relationships between genes are identified and the transcriptional regulatory network is reconstructed. Results from previous studies indicate that most of the gene relationships identified by SPM are correct.
With the help of pattern recognition and similarity analysis, the effect of noise is limited in the SPM method. In addition, a genetic algorithm is introduced to optimize the parameters of the gene network model, which is performed based on a statistical method in our experiments. The experimental results demonstrate that the gene regulatory model reconstructed using SPM is more stable and reliable than models obtained from the traditional linear model.
Recent advancements in genetics and proteomics have led to the acquisition of large quantitative data sets. However, the use of these data to reverse engineer biochemical networks has remained a challenging problem. Many methods have been proposed to infer biochemical network topologies from different types of biological data. Here, we focus on unraveling network topologies from steady state responses of biochemical networks to successive experimental perturbations.
We propose a computational algorithm which combines a deterministic network inference method termed Modular Response Analysis (MRA) and a statistical model selection algorithm called Bayesian Variable Selection, to infer functional interactions in cellular signaling pathways and gene regulatory networks. It can be used to identify interactions among individual molecules involved in a biochemical pathway or reveal how different functional modules of a biological network interact with each other to exchange information. In cases where not all network components are known, our method reveals functional interactions which are not direct but correspond to the interaction routes through unknown elements. Using computer simulated perturbation responses of signaling pathways and gene regulatory networks from the DREAM challenge, we demonstrate that the proposed method is robust against noise and scalable to large networks. We also show that our method can infer network topologies using incomplete perturbation datasets. Consequently, we have used this algorithm to explore the ERBB regulated G1/S transition pathway in certain breast cancer cells to understand the molecular mechanisms which cause these cells to become drug resistant. The algorithm successfully inferred many well characterized interactions of this pathway by analyzing experimentally obtained perturbation data. Additionally, it identified some molecular interactions which promote drug resistance in breast cancer cells.
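The deterministic MRA step can be sketched as follows, under stated assumptions. The sketch uses the noise-free relation commonly given in the MRA literature, r = -[diag(R^-1)]^-1 R^-1, where R[i, j] is the global (steady-state) response of node i to a perturbation that directly affects only node j, and r is the local response matrix with diagonal entries equal to -1. The Bayesian Variable Selection step of the proposed algorithm is not shown, and the function name and example values are hypothetical.

```python
import numpy as np

def local_response_matrix(global_response):
    """Recover local responses r from global responses R (noise-free MRA)."""
    R = np.asarray(global_response, dtype=float)
    R_inv = np.linalg.inv(R)
    # Dividing row i by (R^-1)[i, i] is left-multiplication by diag(R^-1)^-1.
    return -R_inv / np.diag(R_inv)[:, None]

# Hypothetical three-node network: r_true[i, j] is the direct effect of node j on node i.
r_true = np.array([[-1.0, 0.0, -0.5],
                   [0.8, -1.0, 0.0],
                   [0.0, 0.6, -1.0]])
# Each perturbation acts on exactly one node with an unknown strength.
perturbation_strength = np.diag([0.3, -0.2, 0.5])
R = -np.linalg.inv(r_true) @ perturbation_strength
r_recovered = local_response_matrix(R)
```

Because the perturbation strengths cancel in the normalisation by the diagonal, they do not need to be known. With noisy or incomplete data this direct inversion becomes unreliable, which is the setting where a statistical model selection step is useful.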
The proposed algorithm provides a robust, scalable and cost-effective solution for inferring network topologies from biological data. It can potentially be applied to explore novel pathways which play important roles in life-threatening diseases such as cancer.
Network inference; Bayesian statistics; Modular Response Analysis; Signaling pathways.
Discovery of essential genes in pathogenic organisms is an important step in the development of new medication. Despite the growing amount of genome data available, little is known about C. albicans, a major fungal pathogen. Most of the human population carries C. albicans as a commensal, but it can cause systemic infection that may lead to the death of the host if the immune system has deteriorated. In many organisms, central nodes in the interaction network (hubs) play a crucial role in information and energy transport. Knock-outs of such hubs often lead to lethal phenotypes, making them interesting drug targets. To identify these central genes via topological analysis, we inferred gene regulatory networks that are sparse and scale-free. We collected information from various sources to complement the limited expression data available. We utilized a linear regression algorithm to infer genome-wide gene regulatory interaction networks. To evaluate the predictive power of our approach, we used an automated text-mining system that scanned full-text research papers for known interactions. With the help of the compendium of known interactions, we also optimized the influence of the prior knowledge and the sparseness of the model to achieve the best results. We compare the results of our approach with those of other state-of-the-art network inference methods and show that our approach outperforms those methods. Finally, we identify a number of hubs in the genome of the fungus and investigate their biological relevance.
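The topological step of picking hubs from an inferred network can be sketched in a few lines. This is only an illustration that ranks nodes by degree; the criterion actually used to define hubs is not specified here, and the function name `find_hubs` and the gene labels are hypothetical.

```python
from collections import Counter

def find_hubs(edges, top_n=1):
    """Rank nodes by degree (incoming plus outgoing edges) and return the top_n."""
    degree = Counter()
    for regulator, target in edges:
        degree[regulator] += 1
        degree[target] += 1
    return degree.most_common(top_n)

# Hypothetical inferred regulatory edges (regulator, target).
edges = [("geneA", "geneB"), ("geneA", "geneC"), ("geneA", "geneD"),
         ("geneE", "geneB"), ("geneE", "geneC")]
hubs = find_hubs(edges, top_n=1)
```

In a sparse, scale-free network most nodes have low degree, so a small number of high-degree nodes stand out under such a ranking.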
network inference; linear regression; LASSO; reverse engineering; scale-free; Candida albicans; hubs; prior knowledge
Gene regulatory networks for development underlie cell fate specification and differentiation. Network topology, logic and dynamics can be obtained by thorough experimental analysis. Our understanding of the gene regulatory network controlling endomesoderm specification in the sea urchin embryo has attained an advanced level such that it explains developmental phenomenology. Here we review how the network explains the mechanisms utilized in development to control the formation of dynamic expression patterns of transcription factors and signaling molecules. The network represents the genomic program controlling timely activation of specification and differentiation genes in the correct embryonic lineages. It can also be used to study evolution of body plans. We demonstrate how comparing the sea urchin gene regulatory network to that of the sea star, and to that of later developmental stages in the sea urchin, reveals mechanisms underlying the origin of evolutionary novelty. The experimentally based gene regulatory network for endomesoderm specification in the sea urchin embryo provides unique insights into the system level properties of cell fate specification and its evolution.
gene regulation in development; evolution; systems level properties
Motivation: Network inference approaches are widely used to shed light on regulatory interplay between molecular players such as genes and proteins. Biochemical processes underlying networks of interest (e.g. gene regulatory or protein signalling networks) are generally nonlinear. In many settings, knowledge is available concerning relevant chemical kinetics. However, existing network inference methods for continuous, steady-state data are typically rooted in statistical formulations, which do not exploit chemical kinetics to guide inference.
Results: Herein, we present an approach to network inference for steady-state data that is rooted in non-linear descriptions of biochemical mechanism. We use equilibrium analysis of chemical kinetics to obtain functional forms that are in turn used to infer networks using steady-state data. The approach we propose is directly applicable to conventional steady-state gene expression or proteomic data and does not require knowledge of either network topology or any kinetic parameters. We illustrate the approach in the context of protein phosphorylation networks, using data simulated from a recent mechanistic model and proteomic data from cancer cell lines. In the former, the true network is known and used for assessment, whereas in the latter, results are compared against known biochemistry. We find that the proposed methodology is more effective at estimating network topology than methods based on linear models.
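To make the contrast with linear models concrete, the sketch below scores one candidate regulator against one target using a saturating functional form and a linear form. The Michaelis-Menten-type form y = V x / (K + x) is an assumption chosen for illustration and is not necessarily the functional form derived in the paper; the function names, the grid of K values, and the simulated data are likewise hypothetical.

```python
import numpy as np

def fit_saturating(x, y, k_grid):
    """Fit y = V * x / (K + x): grid search over K, closed-form least squares for V."""
    best = None
    for k in k_grid:
        f = x / (k + x)
        v = (f @ y) / (f @ f)
        sse = float(np.sum((y - v * f) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(v), float(k))
    return best  # (sum of squared errors, V, K)

def fit_linear(x, y):
    """Fit y = a * x + b by ordinary least squares and return its squared error."""
    design = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ coef) ** 2))

# Simulated noise-free steady-state data from a saturating relationship.
x = np.linspace(0.1, 10.0, 20)
y = 2.0 * x / (1.5 + x)
sse_saturating, v_hat, k_hat = fit_saturating(x, y, np.linspace(0.5, 3.0, 26))
sse_linear = fit_linear(x, y)
```

A lower residual for the mechanistically motivated form is the kind of evidence that could favor one candidate edge over another; a full inference procedure would also need to search over sets of regulators and penalize model complexity.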
Supplementary data are available at Bioinformatics online.
Modern approaches to treating genetic disorders, cancers and even epidemics rely on a detailed understanding of the underlying gene signaling network. Previous work has used time series microarray data to infer gene signaling networks given a large number of accurate time series samples. Microarray data available for many biological experiments is limited to a small number of arrays with little or no time series guarantees. When several samples are averaged to examine differences in mean value between a diseased and normal state, information from individual samples that could indicate a gene relationship can be lost.
Asynchronous Inference of Regulatory Networks (AIRnet) provides gene signaling network inference using more practical assumptions about the microarray data. By learning correlation patterns for the changes in microarray values from all pairs of samples, accurate network reconstructions can be performed with data that is normally available in microarray experiments.
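The idea of working with changes between all pairs of samples, rather than with absolute values, can be sketched as follows. This is a simplified illustration and not AIRnet's actual algorithm: it uses a plain Pearson correlation of pairwise differences, and the function name and example matrix are hypothetical.

```python
from itertools import combinations

import numpy as np

def pairwise_change_correlation(expression):
    """expression: samples x genes. Correlate gene-wise changes over all sample pairs."""
    X = np.asarray(expression, dtype=float)
    pairs = combinations(range(X.shape[0]), 2)
    # One row per unordered sample pair, holding the change in every gene.
    changes = np.array([X[j] - X[i] for i, j in pairs])
    return np.corrcoef(changes, rowvar=False)

# Hypothetical data: 4 unordered samples, 3 genes.
# Gene 1 rises with gene 0; gene 2 falls when gene 0 rises.
expression = [[1.0, 3.0, -1.0],
              [2.0, 5.0, -2.0],
              [4.0, 9.0, -4.0],
              [3.0, 7.0, -3.0]]
corr = pairwise_change_correlation(expression)
```

With n samples this yields n(n-1)/2 change vectors, so even a small number of arrays without time ordering provides many observations of co-varying changes.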
By focussing on the changes between microarray samples, instead of absolute values, increased information can be gleaned from expression data.
Successive whole genome duplications have recently been firmly established in all major eukaryote kingdoms. Such exponential evolutionary processes must have largely contributed to shape the topology of protein-protein interaction (PPI) networks by outweighing, in particular, all time-linear network growths modeled so far.
We propose and solve a mathematical model of PPI network evolution under successive genome duplications. This demonstrates, from first principles, that evolutionary conservation and scale-free topology are intrinsically linked properties of PPI networks and emerge from i) prevailing exponential network dynamics under duplication and ii) asymmetric divergence of gene duplicates. While required, we argue that this asymmetric divergence arises, in fact, spontaneously at the level of protein-binding sites. This supports a refined model of PPI network evolution in terms of protein domains under exponential and asymmetric duplication/divergence dynamics, with multidomain proteins underlying the combinatorial formation of protein complexes. Genome duplication then provides a powerful source of PPI network innovation by promoting local rearrangements of multidomain proteins on a genome wide scale. Yet, we show that the overall conservation and topology of PPI networks are robust to extensive domain shuffling of multidomain proteins as well as to finer details of protein interaction and evolution. Finally, large scale features of direct and indirect PPI networks of S. cerevisiae are well reproduced numerically with only two adjusted parameters of clear biological significance (i.e. network effective growth rate and average number of protein-binding domains per protein).
This study demonstrates the statistical consequences of genome duplication and domain shuffling on the conservation and topology of PPI networks over a broad evolutionary scale across eukaryote kingdoms. In particular, scale-free topologies of PPI networks, which are found to be robust to extensive shuffling of protein domains, appear to be a simple consequence of the conservation of protein-binding domains under asymmetric duplication/divergence dynamics in the course of evolution.
Network Component Analysis (NCA) has shown its effectiveness in discovering regulators and inferring transcription factor activities (TFAs) when both microarray data and ChIP-on-chip data are available. However, an NCA scheme is not applicable to many biological studies because the available topology information is limited, for example when ChIP-on-chip data are lacking. We propose a new approach, motif-directed NCA (mNCA), to integrate motif information and gene expression data to infer regulatory networks.
We develop motif-directed NCA (mNCA) to incorporate motif information into NCA for regulatory network inference. While motif information is readily available from knowledge databases, it is a "noisy" source of network topology information consisting of many false positives. To overcome this problem, we develop a stability analysis procedure embedded in mNCA to resolve the inconsistency between motif information and gene expression data, and to enable the identification of stable TFAs. The mNCA approach has been applied to a time course microarray data set of muscle regeneration. The experimental results show that the inferred TFAs are not only numerically stable but also biologically relevant to the muscle differentiation process. In particular, several inferred TFAs, such as those of MyoD, myogenin and YY1, are well supported by biological experiments.
A novel computational approach, mNCA, has been developed to integrate motif information and gene expression data for regulatory network reconstruction. Specifically, motif analysis is used to obtain initial network topology, and stability analysis is developed and applied with mNCA to extract stable TFAs. Experimental results on muscle regeneration microarray data have demonstrated that mNCA is a practical and reliable computational method for regulatory network inference and pathway discovery.
Biological networks are important for elucidating disease etiology due to their ability to model complex high dimensional data and biological systems. Proteomics provides a critical data source for such models, but currently lacks robust de novo methods for network construction, which could bring important insights in systems biology.
We have evaluated the construction of network models using methods derived from weighted gene co-expression network analysis (WGCNA). We show that approximately scale-free peptide networks, composed of statistically significant modules, are feasible and biologically meaningful using two mouse lung experiments and one human plasma experiment. Within each network, peptides derived from the same protein are shown to have a statistically higher topological overlap and concordance in abundance, which is potentially important for inferring protein abundance. The module representatives, called eigenpeptides, correlate significantly with biological phenotypes. Furthermore, within modules, we find significant enrichment for biological function and known interactions (gene ontology and protein-protein interactions).
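The two network quantities mentioned above, a soft-thresholded co-expression adjacency and the topological overlap, can be sketched as follows. This is not the WGCNA R package or its API; it is a minimal sketch assuming the unsigned adjacency a_ij = |cor(x_i, x_j)|^beta and the standard topological overlap (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with beta = 6 chosen only for illustration. The function names and the synthetic peptide profiles are hypothetical.

```python
import numpy as np

def soft_threshold_adjacency(abundance, beta=6):
    """abundance: samples x peptides. Unsigned adjacency |correlation| ** beta."""
    corr = np.corrcoef(abundance, rowvar=False)
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # no self-connections
    return adj

def topological_overlap(adj):
    """Topological overlap matrix computed from a zero-diagonal adjacency matrix."""
    shared = adj @ adj                      # l_ij: connection strength via shared neighbours
    connectivity = adj.sum(axis=1)          # k_i
    min_k = np.minimum.outer(connectivity, connectivity)
    tom = (shared + adj) / (min_k + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom

# Synthetic profiles: peptides 0 and 1 mimic the same protein, peptide 2 is unrelated.
t = np.arange(40)
peptide0 = np.sin(0.3 * t)
peptide1 = peptide0 + 0.01 * np.cos(1.7 * t)
peptide2 = np.cos(0.3 * t)
abundance = np.column_stack([peptide0, peptide1, peptide2])
tom = topological_overlap(soft_threshold_adjacency(abundance))
```

In this toy example the two peptides built from the same profile receive a much higher topological overlap than either has with the unrelated peptide, which is the pattern described above for peptides derived from the same protein.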
Biological networks are important tools in the analysis of complex systems. In this paper we evaluate the application of weighted co-expression network analysis to quantitative proteomics data. Protein co-expression networks allow novel approaches for biological interpretation, quality control, inference of protein abundance, a framework for potentially resolving degenerate peptide-protein mappings, and biomarker signature discovery.
Biomarkers; Biological networks; Networks; Systems biology; Virology; Sarcopenia; LC-MS; Proteomics