To decipher the complexity and improve the understanding of host-pathogen interactions, biologists must adopt new system level approaches in which the hierarchy of biological interactions and dynamics can be studied. This paper presents the application of systems biology for the cross-comparative analysis and interactome modeling of three different infectious agents, leading to the identification of novel, unique and common molecular host responses (biosignatures).
A computational systems biology method was utilized to create interactome models of the host responses to Brucella melitensis (BMEL), Salmonella enterica Typhimurium (STM) and Mycobacterium avium paratuberculosis (MAP). A bovine ligated ileal loop biological model was employed to capture the host gene expression response at four time points post infection. New methods based on Dynamic Bayesian Network (DBN) machine learning were employed to conduct a systematic comparative analysis of pathway and Gene Ontology category perturbations.
A cross-comparative assessment of 219 pathways and 1620 gene ontology (GO) categories was performed on each pathogen-host condition. Both unique and common pathway and GO perturbations indicated remarkable temporal differences in pathogen-host response profiles. Highly discriminatory pathways were selected from each pathogen condition to create a common system level interactome model comprised of 622 genes. This model was trained with data from each pathogen condition to capture unique and common gene expression features and relationships leading to the identification of candidate host-pathogen points of interactions and discriminatory biosignatures.
Our results provide deeper understanding of the overall complexity of host defensive and pathogen invasion processes as well as the identification of novel host-pathogen interactions. The application of advanced computational methods for developing interactome models based on DBN has proven to be instrumental in conducting multi-conditional cross-comparative analyses. Further, this approach generates a fully simulateable model with capabilities for predictive analysis as well as for diagnostic pattern recognition. The resulting biosignatures may represent future targets for identification of emerging pathogens as well as for development of antimicrobial drugs, immunotherapeutics, or vaccines for prevention and treatment of diseases caused by known, emerging/re-emerging infectious agents.
Adequate control of serum glucose in critically ill patients is a complex problem requiring continuous monitoring and intervention, which have a direct effect on clinical outcomes. Understanding temporal relationships can help to improve our knowledge of complex disease processes and their response to treatment. We discuss a Dynamic Bayesian Network (DBN) model that we created using the open-source Projeny toolkit to represent various clinical variables and the temporal and atemporal relationships underlying insulin and glucose homeostasis. We evaluated this model by comparing the DBN model’s insulin dose predictions against those of a rule-based protocol (eProtocol-insulin) currently used in the ICU. The results suggest that the DBN model’s predictions are as effective as or better than those of the rule-based protocol. The limitations of our methods are discussed, with a brief note on their generalizability.
A significant amount of attention has recently been focused on modeling of gene regulatory networks. Two frequently used large-scale modeling frameworks are Bayesian networks (BNs) and Boolean networks, the latter one being a special case of its recent stochastic extension, probabilistic Boolean networks (PBNs). PBN is a promising model class that generalizes the standard rule-based interactions of Boolean networks into the stochastic setting. Dynamic Bayesian networks (DBNs) is a general and versatile model class that is able to represent complex temporal stochastic processes and has also been proposed as a model for gene regulatory systems. In this paper, we concentrate on these two model classes and demonstrate that PBNs and a certain subclass of DBNs can represent the same joint probability distribution over their common variables. The major benefit of introducing the relationships between the models is that it opens up the possibility of applying the standard tools of DBNs to PBNs and vice versa. Hence, the standard learning tools of DBNs can be applied in the context of PBNs, and the inference methods give a natural way of handling the missing values in PBNs which are often present in gene expression measurements. Conversely, the tools for controlling the stationary behavior of the networks, tools for projecting networks onto sub-networks, and efficient learning schemes can be used for DBNs. In other words, the introduced relationships between the models extend the collection of analysis tools for both model classes.
Gene regulatory networks; Probabilistic Boolean networks; Dynamic Bayesian networks
A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, allowing to deal with the stochastic aspects of gene expressions and noisy measurements in a natural way, Bayesian networks appear attractive in the field of inferring gene interactions structure from microarray experiments data. However, the basic formalism has some disadvantages, e.g. it is sometimes hard to distinguish between the origin and the target of an interaction. Two kinds of microarray experiments yield data particularly rich in information regarding the direction of interactions: time series and perturbation experiments. In order to correctly handle them, the basic formalism must be modified. For example, dynamic Bayesian networks (DBN) apply to time series microarray data. To our knowledge the DBN technique has not been applied in the context of perturbation experiments.
We extend the framework of dynamic Bayesian networks in order to incorporate perturbations. Moreover, an exact algorithm for inferring an optimal network is proposed and a discretization method specialized for time series data from perturbation experiments is introduced. We apply our procedure to realistic simulations data. The results are compared with those obtained by standard DBN learning techniques. Moreover, the advantages of using exact learning algorithm instead of heuristic methods are analyzed.
We show that the quality of inferred networks dramatically improves when using data from perturbation experiments. We also conclude that the exact algorithm should be used when it is possible, i.e. when considered set of genes is small enough.
Coordination among cortical neurons is believed to be key element in mediating many high level cortical processes such as perception, attention, learning and memory formation. Inferring the topology of the neural circuitry underlying this coordination is important to characterize the highly non-linear, time-varying interactions between cortical neurons in the presence of complex stimuli. In this work, we investigate the applicability of Dynamic Bayesian Networks (DBNs) in inferring the effective connectivity between spiking cortical neurons from their observed spike trains. We demonstrate that DBNs can infer the underlying non-linear and time-varying causal interactions between these neurons and can discriminate between mono and polysynaptic links between them under certain constraints governing their putative connectivity. We analyzed conditionally-Poisson spike train data mimicking spiking activity of cortical networks of small and moderately-large sizes. The performance was assessed and compared to other methods under systematic variations of the network structure to mimic a wide range of responses typically observed in the cortex. Results demonstrate the utility of DBN in inferring the effective connectivity in cortical networks.
ensemble recordings; spike trains; functional connectivity; effective connectivity; dynamic Bayesian networks; multiple single unit activity; spiking cortical networks
Dynamic Bayesian network (DBN) is among the mainstream approaches for modeling various biological networks, including the gene regulatory network (GRN). Most current methods for learning DBN employ either local search such as hill-climbing, or a meta stochastic global optimization framework such as genetic algorithm or simulated annealing, which are only able to locate sub-optimal solutions. Further, current DBN applications have essentially been limited to small sized networks.
To overcome the above difficulties, we introduce here a deterministic global optimization based DBN approach for reverse engineering genetic networks from time course gene expression data. For such DBN models that consist only of inter time slice arcs, we show that there exists a polynomial time algorithm for learning the globally optimal network structure. The proposed approach, named GlobalMIT+, employs the recently proposed information theoretic scoring metric named mutual information test (MIT). GlobalMIT+ is able to learn high-order time delayed genetic interactions, which are common to most biological systems. Evaluation of the approach using both synthetic and real data sets, including a 733 cyanobacterial gene expression data set, shows significantly improved performance over other techniques.
Our studies demonstrate that deterministic global optimization approaches can infer large scale genetic networks.
Online automated quality assessment is critical to determine a sensor's fitness for purpose in real-time applications. A Dynamic Bayesian Network (DBN) framework is proposed to produce probabilistic quality assessments and represent the uncertainty of sequentially correlated sensor readings. This is a novel framework to represent the causes, quality state and observed effects of individual sensor errors without imposing any constraints upon the physical deployment or measured phenomenon. It represents the casual relationship between quality tests and combines them in a way to generate uncertainty estimates of samples. The DBN was implemented for a particular marine deployment of temperature and conductivity sensors in Hobart, Australia. The DBN was shown to offer a substantial average improvement (34%) in replicating the error bars that were generated by experts when compared to a fuzzy logic approach.
online filtering; automated; quality assessment; sensors; dynamic Bayesian networks
Cross-referencing experimental data with our current knowledge of signaling network topologies is one central goal of mathematical modeling of cellular signal transduction networks. We present a new methodology for data-driven interrogation and training of signaling networks. While most published methods for signaling network inference operate on Bayesian, Boolean, or ODE models, our approach uses integer linear programming (ILP) on interaction graphs to encode constraints on the qualitative behavior of the nodes. These constraints are posed by the network topology and their formulation as ILP allows us to predict the possible qualitative changes (up, down, no effect) of the activation levels of the nodes for a given stimulus. We provide four basic operations to detect and remove inconsistencies between measurements and predicted behavior: (i) find a topology-consistent explanation for responses of signaling nodes measured in a stimulus-response experiment (if none exists, find the closest explanation); (ii) determine a minimal set of nodes that need to be corrected to make an inconsistent scenario consistent; (iii) determine the optimal subgraph of the given network topology which can best reflect measurements from a set of experimental scenarios; (iv) find possibly missing edges that would improve the consistency of the graph with respect to a set of experimental scenarios the most. We demonstrate the applicability of the proposed approach by interrogating a manually curated interaction graph model of EGFR/ErbB signaling against a library of high-throughput phosphoproteomic data measured in primary hepatocytes. Our methods detect interactions that are likely to be inactive in hepatocytes and provide suggestions for new interactions that, if included, would significantly improve the goodness of fit. Our framework is highly flexible and the underlying model requires only easily accessible biological knowledge. All related algorithms were implemented in a freely available toolbox SigNetTrainer making it an appealing approach for various applications.
Cellular signal transduction is orchestrated by communication networks of signaling proteins commonly depicted on signaling pathway maps. However, each cell type may have distinct variants of signaling pathways, and wiring diagrams are often altered in disease states. The identification of truly active signaling topologies based on experimental data is therefore one key challenge in systems biology of cellular signaling. We present a new framework for training signaling networks based on interaction graphs (IG). In contrast to complex modeling formalisms, IG capture merely the known positive and negative edges between the components. This basic information, however, already sets hard constraints on the possible qualitative behaviors of the nodes when perturbing the network. Our approach uses Integer Linear Programming to encode these constraints and to predict the possible changes (down, neutral, up) of the activation levels of the involved players for a given experiment. Based on this formulation we developed several algorithms for detecting and removing inconsistencies between measurements and network topology. Demonstrated by EGFR/ErbB signaling in hepatocytes, our approach delivers direct conclusions on edges that are likely inactive or missing relative to canonical pathway maps. Such information drives the further elucidation of signaling network topologies under normal and pathological phenotypes.
Network inference deals with the reconstruction of biological networks from experimental data. A variety of different reverse engineering techniques are available; they differ in the underlying assumptions and mathematical models used. One common problem for all approaches stems from the complexity of the task, due to the combinatorial explosion of different network topologies for increasing network size. To handle this problem, constraints are frequently used, for example on the node degree, number of edges, or constraints on regulation functions between network components. We propose to exploit topological considerations in the inference of gene regulatory networks. Such systems are often controlled by a small number of hub genes, while most other genes have only limited influence on the network's dynamic. We model gene regulation using a Bayesian network with discrete, Boolean nodes. A hierarchical prior is employed to identify hub genes. The first layer of the prior is used to regularize weights on edges emanating from one specific node. A second prior on hyperparameters controls the magnitude of the former regularization for different nodes. The net effect is that central nodes tend to form in reconstructed networks. Network reconstruction is then performed by maximization of or sampling from the posterior distribution. We evaluate our approach on simulated and real experimental data, indicating that we can reconstruct main regulatory interactions from the data. We furthermore compare our approach to other state-of-the art methods, showing superior performance in identifying hubs. Using a large publicly available dataset of over 800 cell cycle regulated genes, we are able to identify several main hub genes. Our method may thus provide a valuable tool to identify interesting candidate genes for further study. Furthermore, the approach presented may stimulate further developments in regularization methods for network reconstruction from data.
Biological gene networks appear to be dynamically robust to mutation, stochasticity, and changes in the environment and also appear to be sparsely connected. Studies with computational models, however, have suggested that denser gene networks evolve to be more dynamically robust than sparser networks. We resolve this discrepancy by showing that misassumptions about how to measure robustness in artificial networks have inadvertently discounted the costs of network complexity. We show that when the costs of complexity are taken into account, that robustness implies a parsimonious network structure that is sparsely connected and not unnecessarily complex; and that selection will favor sparse networks when network topology is free to evolve. Because a robust system of heredity is necessary for the adaptive evolution of complex phenotypes, the maintenance of frugal network complexity is likely a crucial design constraint that underlies biological organization.
complexity; evolvability; gene network; robustness
It is a great challenge of modern biology to determine the functional roles of non-synonymous Single Nucleotide Polymorphisms (nsSNPs) on complex phenotypes. Statistical and machine learning techniques establish correlations between genotype and phenotype, but may fail to infer the biologically relevant mechanisms. The emerging paradigm of Network-based Association Studies aims to address this problem of statistical analysis. However, a mechanistic understanding of how individual molecular components work together in a system requires knowledge of molecular structures, and their interactions.
To address the challenge of understanding the genetic, molecular, and cellular basis of complex phenotypes, we have, for the first time, developed a structural systems biology approach for genome-wide multiscale modeling of nsSNPs - from the atomic details of molecular interactions to the emergent properties of biological networks. We apply our approach to determine the functional roles of nsSNPs associated with hypoxia tolerance in Drosophila melanogaster. The integrated view of the functional roles of nsSNP at both molecular and network levels allows us to identify driver mutations and their interactions (epistasis) in H, Rad51D, Ulp1, Wnt5, HDAC4, Sol, Dys, GalNAc-T2, and CG33714 genes, all of which are involved in the up-regulation of Notch and Gurken/EGFR signaling pathways. Moreover, we find that a large fraction of the driver mutations are neither located in conserved functional sites, nor responsible for structural stability, but rather regulate protein activity through allosteric transitions, protein-protein interactions, or protein-nucleic acid interactions. This finding should impact future Genome-Wide Association Studies.
Our studies demonstrate that the consolidation of statistical, structural, and network views of biomolecules and their interactions can provide new insight into the functional role of nsSNPs in Genome-Wide Association Studies, in a way that neither the knowledge of molecular structures nor biological networks alone could achieve. Thus, multiscale modeling of nsSNPs may prove to be a powerful tool for establishing the functional roles of sequence variants in a wide array of applications.
In the last decade, advances in genomics, proteomics, and metabolomics have yielded large-scale datasets that have driven an interest in global analyses, with the objective of understanding biological systems as a whole. Systems biology integrates computational modeling and experimental biology to predict and characterize the dynamic properties of biological systems, which are viewed as complex signaling networks. Whereas the systems analysis of disease-perturbed networks holds promise for identification of drug targets for therapy, equally the identified critical network nodes may be targeted through nutritional intervention in either a preventative or therapeutic fashion. As such, in the context of the nutritional sciences, it is envisioned that systems analysis of normal and nutrient-perturbed signaling networks in combination with knowledge of underlying genetic polymorphisms will lead to a future in which the health of individuals will be improved through predictive and preventative nutrition. Although high-throughput transcriptomic microarray data were initially most readily available and amenable to systems analysis, recent technological and methodological advances in MS have contributed to a linear increase in proteomic investigations. It is now commonplace for combined proteomic technologies to generate complex, multi-faceted datasets, and these will be the keystone of future systems biology research. This review will define systems biology, outline current proteomic methodologies, highlight successful applications of proteomics in nutrition research, and discuss the challenges for future applications of systems biology approaches in the nutritional sciences.
Reverse engineering cellular networks is currently one of the most challenging problems in systems biology. Dynamic Bayesian networks (DBNs) seem to be particularly suitable for inferring relationships between cellular variables from the analysis of time series measurements of mRNA or protein concentrations. As evaluating inference results on a real dataset is controversial, the use of simulated data has been proposed. However, DBN approaches that use continuous variables, thus avoiding the information loss associated with discretization, have not yet been extensively assessed, and most of the proposed approaches have dealt with linear Gaussian models.
We propose a generalization of dynamic Gaussian networks to accommodate nonlinear dependencies between variables. As a benchmark dataset to test the new approach, we used data from a mathematical model of cell cycle control in budding yeast that realistically reproduces the complexity of a cellular system. We evaluated the ability of the networks to describe the dynamics of cellular systems and their precision in reconstructing the true underlying causal relationships between variables. We also tested the robustness of the results by analyzing the effect of noise on the data, and the impact of a different sampling time.
The results confirmed that DBNs with Gaussian models can be effectively exploited for a first level analysis of data from complex cellular systems. The inferred models are parsimonious and have a satisfying goodness of fit. Furthermore, the networks not only offer a phenomenological description of the dynamics of cellular systems, but are also able to suggest hypotheses concerning the causal interactions between variables. The proposed nonlinear generalization of Gaussian models yielded models characterized by a slightly lower goodness of fit than the linear model, but a better ability to recover the true underlying connections between variables.
Inferring the topology of a gene-regulatory network (GRN) from genome-scale time-series measurements of transcriptional change has proved useful for disentangling complex biological processes. To address the challenges associated with this inference, a number of competing approaches have previously been used, including examples from information theory, Bayesian and dynamic Bayesian networks (DBNs), and ordinary differential equation (ODE) or stochastic differential equation. The performance of these competing approaches have previously been assessed using a variety of in silico and in vivo datasets. Here, we revisit this work by assessing the performance of more recent network inference algorithms, including a novel non-parametric learning approach based upon nonlinear dynamical systems. For larger GRNs, containing hundreds of genes, these non-parametric approaches more accurately infer network structures than do traditional approaches, but at significant computational cost. For smaller systems, DBNs are competitive with the non-parametric approaches with respect to computational time and accuracy, and both of these approaches appear to be more accurate than Granger causality-based methods and those using simple ODEs models.
gene-regulatory networks; inference; gene expression
Uncovering the operating principles underlying cellular processes by using 'omics' data is often a difficult task due to the high-dimensionality of the solution space that spans all interactions among the bio-molecules under consideration. A rational way to overcome this problem is to use the topology of bio-molecular interaction networks in order to constrain the solution space. Such approaches systematically integrate the existing biological knowledge with the 'omics' data.
Here we introduce a hypothesis-driven method that integrates bio-molecular network topology with transcriptome data, thereby allowing the identification of key biological features (Reporter Features) around which transcriptional changes are significantly concentrated. We have combined transcriptome data with different biological networks in order to identify Reporter Gene Ontologies, Reporter Transcription Factors, Reporter Proteins and Reporter Complexes, and use this to decipher the logic of regulatory circuits playing a key role in yeast glucose repression and human diabetes.
Reporter Features offer the opportunity to identify regulatory hot-spots in bio-molecular interaction networks that are significantly affected between or across conditions. Results of the Reporter Feature analysis not only provide a snapshot of the transcriptional regulatory program but also are biologically easy to interpret and provide a powerful way to generate new hypotheses. Our Reporter Features analyses of yeast glucose repression and human diabetes data brings hints towards the understanding of the principles of transcriptional regulation controlling these two important and potentially closely related systems.
The aim of this study was to provide a framework for the analysis of visceral obesity and its determinants in women, where complex inter-relationships are observed among lifestyle, nutritional and metabolic predictors. Thirty-four predictors related to lifestyle, adiposity, body fat distribution, blood lipids and adipocyte sizes have been considered as potential correlates of visceral obesity in women. To properly address the difficulties in managing such interactions given our limited sample of 150 women, bootstrapped Bayesian networks were constructed based on novel constraint-based learning methods that appeared recently in the statistical learning community. Statistical significance of edge strengths was evaluated and the less reliable edges were pruned to increase the network robustness. To allow accessible interpretation and integrate biological knowledge into the final network, several undirected edges were afterwards directed with physiological expertise according to relevant literature.
Extensive experiments on synthetic data sampled from a known Bayesian network show that the algorithm, called Recursive Hybrid Parents and Children (RHPC), outperforms state-of-the-art algorithms that appeared in the recent literature. Regarding biological plausibility, we found that the inference results obtained with the proposed method were in excellent agreement with biological knowledge. For example, these analyses indicated that visceral adipose tissue accumulation is strongly related to blood lipid alterations independent of overall obesity level.
Bayesian Networks are a useful tool for investigating and summarizing evidence when complex relationships exist among predictors, in particular, as in the case of multifactorial conditions like visceral obesity, when there is a concurrent incidence for several variables, interacting in a complex manner. The source code and the data sets used for the empirical tests are available at http://www710.univ-lyon1.fr/~aaussem/Software.html.
Inference of biological networks has become an important tool in Systems Biology. Nowadays it is becoming clearer that the complexity of organisms is more related with the organization of its components in networks rather than with the individual behaviour of the components. Among various approaches for inferring networks, Bayesian Networks are very attractive due to their probabilistic nature and flexibility to incorporate interventions and extra sources of information. Recently various attempts to infer networks with different Bayesian Networks approaches were pursued. The specific interest in this paper is to compare the performance of three different inference approaches: Bayesian Networks without any modification; Bayesian Networks modified to take into account specific interventions produced during data collection; and a probabilistic hierarchical model that allows the inclusion of extra knowledge in the inference of Bayesian Networks. The inference is performed in three different types of data: (i) synthetic data obtained from a Gaussian distribution, (ii) synthetic data simulated with Netbuilder and (iii) Real data obtained in flow cytometry experiments.
Bayesian Networks with interventions and Bayesian Networks with inclusion of extra knowledge outperform simple Bayesian Networks in all data sets when considering the reconstruction accuracy and taking the edge directions into account. In the Real data the increase in accuracy is also observed when not taking the edge directions into account.
Although it comes with a small extra computational cost the use of more refined Bayesian network models is justified. Both the inclusion of extra knowledge and the use of interventions have outperformed the simple Bayesian network model in simulated and Real data sets. Also, if the source of extra knowledge used in the inference is not reliable the inferred network is not deteriorated. If the extra knowledge has a good agreement with the data there is no significant difference in using the Bayesian networks with interventions or Bayesian networks with the extra knowledge.
With the advent of high-throughput biotechnology capable of monitoring genomic signals, it becomes increasingly promising to understand molecular cellular mechanisms through systems biology approaches. One of the active research topics in systems biology is to infer gene transcriptional regulatory networks using various genomic data; this inference problem can be formulated as a linear model with latent signals associated with some regulatory proteins called transcription factors (TFs). As common statistical assumptions may not hold for genomic signals, typical latent variable algorithms such as independent component analysis (ICA) are incapable to reveal underlying true regulatory signals. Liao et al.  proposed to perform inference using an approach named network component analysis (NCA), the optimization of which is achieved by a least-squares fitting approach with biological knowledge constraints. However, the incompleteness of biological knowledge and its inconsistency with gene expression data are not considered in the original NCA solution, which could greatly affect the inference accuracy. To overcome these limitations, we propose a linear extraction scheme, namely regulatory component analysis (RCA), to infer underlying regulatory signals even with partial biological knowledge. Numerical simulations show a significant improvement of our proposed RCA over NCA, not only when signal-to-noise-ratio (SNR) is low, but also when the given biological knowledge is incomplete and inconsistent to gene expression data. Furthermore, real biological experiments on E. coli are performed for regulatory network inference in comparison with several typical linear latent variable methods, which again demonstrates the effectiveness and improved performance of the proposed algorithm.
Transcriptional regulatory network inference; Source extraction; Gene expression; Genomic signal processing
Motivation: Primary purpose of modeling gene regulatory networks for developmental process is to reveal pathways governing the cellular differentiation to specific phenotypes. Knowledge of differentiation network will enable generation of desired cell fates by careful alteration of the governing network by adequate manipulation of cellular environment.
Results: We have developed a novel integer programming-based approach to reconstruct the underlying regulatory architecture of differentiating embryonic stem cells from discrete temporal gene expression data. The network reconstruction problem is formulated using inherent features of biological networks: (i) that of cascade architecture which enables treatment of the entire complex network as a set of interconnected modules and (ii) that of sparsity of interconnection between the transcription factors. The developed framework is applied to the system of embryonic stem cells differentiating towards pancreatic lineage. Experimentally determined expression profile dynamics of relevant transcription factors serve as the input to the network identification algorithm. The developed formulation accurately captures many of the known regulatory modes involved in pancreatic differentiation. The predictive capacity of the model is tested by simulating an in silico potential pathway of subsequent differentiation. The predicted pathway is experimentally verified by concurrent differentiation experiments. Experimental results agree well with model predictions, thereby illustrating the predictive accuracy of the proposed algorithm.
Supplementary information: Supplementary data are available at Bioinformatics online.
Systems biology is an approach to the science that views biology as an information science, studies biological systems as a whole and their interactions with the environment. This approach, for the reasons described here, has particular power in the search for informative diagnostic biomarkers of diseases because it focuses on the fundamental causes and keys on the identification and understanding of disease- perturbed molecular networks. In this review, we describe some recent developments that have used systems biology to address complex diseases – prion disease and drug induced liver injury- and use these as examples to illustrate the importance of understanding network structure and dynamics. The knowledge of network dynamics through in vitro experimental perturbation and modeling allows us to determine the state of the networks, to identify molecular correlates, and to derive new disease treatment approaches to reverse the pathology or prevent its progress into a more severe state through the manipulation of network states. This general approach, including diagnostics and therapeutics, is becoming known as systems medicine.
Systems biology; biomarkers; systems medicine; prion disease; drug induced liver injury; microRNA; organ-specific proteins
Gene Regulatory Networks (GRNs) have become a major focus of interest in recent years. Elucidating the architecture and dynamics of large scale gene regulatory networks is an important goal in systems biology. The knowledge of the gene regulatory networks further gives insights about gene regulatory pathways. This information leads to many potential applications in medicine and molecular biology, examples of which are identification of metabolic pathways, complex genetic diseases, drug discovery and toxicology analysis. High-throughput technologies allow studying various aspects of gene regulatory networks on a genome-wide scale and we will discuss recent advances as well as limitations and future challenges for gene network modeling. Novel approaches are needed to both infer the causal genes and generate hypothesis on the underlying regulatory mechanisms.
In the present article, we introduce a new method for identifying a set of optimal gene regulatory pathways by using structural equations as a tool for modeling gene regulatory networks. The method, first of all, generates data on reaction flows in a pathway. A set of constraints is formulated incorporating weighting coefficients. Finally the gene regulatory pathways are obtained through optimization of an objective function with respect to these weighting coefficients. The effectiveness of the present method is successfully tested on ten gene regulatory networks existing in the literature. A comparative study with the existing extreme pathway analysis also forms a part of this investigation. The results compare favorably with earlier experimental results. The validated pathways point to a combination of previously documented and novel findings.
We show that our method can correctly identify the causal genes and effectively output experimentally verified pathways. The present method has been successful in deriving the optimal regulatory pathways for all the regulatory networks considered. The biological significance and applicability of the optimal pathways has also been discussed. Finally the usefulness of the present method on genetic engineering is depicted with an example.
Dynamic Bayesian Networks (DBNs) are widely used in regulatory network structure inference with gene expression data. Current methods assumed that the underlying stochastic processes that generate the gene expression data are stationary. The assumption is not realistic in certain applications where the intrinsic regulatory networks are subject to changes for adapting to internal or external stimuli.
In this paper we investigate a novel non-stationary DBNs method with a potential regulator detection technique and a flexible lag choosing mechanism. We apply the approach for the gene regulatory network inference on three non-stationary time series data. For the Macrophages and Arabidopsis data sets with the reference networks, our method shows better network structure prediction accuracy. For the Drosophila data set, our approach converges faster and shows a better prediction accuracy on transition times. In addition, our reconstructed regulatory networks on the Drosophila data not only share a lot of similarities with the predictions of the work of other researchers but also provide many new structural information for further investigation.
Compared with recent proposed non-stationary DBNs methods, our approach has better structure prediction accuracy By detecting potential regulators, our method reduces the size of the search space, hence may speed up the convergence of MCMC sampling.
Despite large amounts of available genomic and proteomic data, predicting the structure and response of signaling networks is still a significant challenge. While statistical method such as Bayesian network has been explored to meet this challenge, employing existing biological knowledge for network prediction is difficult. The objective of this study is to develop a novel approach that integrates prior biological knowledge in the form of the Ontology Fingerprint to infer cell-type-specific signaling networks via data-driven Bayesian network learning; and to further use the trained model to predict cellular responses.
We applied our novel approach to address the Predictive Signaling Network Modeling challenge of the fourth (2009) Dialog for Reverse Engineering Assessment's and Methods (DREAM4) competition. The challenge results showed that our method accurately captured signal transduction of a network of protein kinases and phosphoproteins in that the predicted protein phosphorylation levels under all experimental conditions were highly correlated (R2 = 0.93) with the observed results. Based on the evaluation of the DREAM4 organizer, our team was ranked as one of the top five best performers in predicting network structure and protein phosphorylation activity under test conditions.
Bayesian network can be used to simulate the propagation of signals in cellular systems. Incorporating the Ontology Fingerprint as prior biological knowledge allows us to efficiently infer concise signaling network structure and to accurately predict cellular responses.
A central goal of systems biology is the construction of predictive models of bio-molecular networks. Cellular networks of moderate size have been modeled successfully in a quantitative way based on differential equations. However, in large-scale networks, knowledge of mechanistic details and kinetic parameters is often too limited to allow for the set-up of predictive quantitative models.
Here, we review methodologies for qualitative and semi-quantitative modeling of cellular signal transduction networks. In particular, we focus on three different but related formalisms facilitating modeling of signaling processes with different levels of detail: interaction graphs, logical/Boolean networks, and logic-based ordinary differential equations (ODEs). Albeit the simplest models possible, interaction graphs allow the identification of important network properties such as signaling paths, feedback loops, or global interdependencies. Logical or Boolean models can be derived from interaction graphs by constraining the logical combination of edges. Logical models can be used to study the basic input–output behavior of the system under investigation and to analyze its qualitative dynamic properties by discrete simulations. They also provide a suitable framework to identify proper intervention strategies enforcing or repressing certain behaviors. Finally, as a third formalism, Boolean networks can be transformed into logic-based ODEs enabling studies on essential quantitative and dynamic features of a signaling network, where time and states are continuous.
We describe and illustrate key methods and applications of the different modeling formalisms and discuss their relationships. In particular, as one important aspect for model reuse, we will show how these three modeling approaches can be combined to a modeling pipeline (or model hierarchy) allowing one to start with the simplest representation of a signaling network (interaction graph), which can later be refined to logical and eventually to logic-based ODE models. Importantly, systems and network properties determined in the rougher representation are conserved during these transformations.
Interaction graphs; Logical models; Boolean models; Signal transduction; Qualitative modeling; ODE models; EGF signaling
Structural analysis of biochemical networks is a growing field in bioinformatics and systems biology. The availability of an increasing amount of biological data from molecular biological networks promises a deeper understanding but confronts researchers with the problem of combinatorial explosion. The amount of qualitative network data is growing much faster than the amount of quantitative data, such as enzyme kinetics. In many cases it is even impossible to measure quantitative data because of limitations of experimental methods, or for ethical reasons. Thus, a huge amount of qualitative data, such as interaction data, is available, but it was not sufficiently used for modeling purposes, until now. New approaches have been developed, but the complexity of data often limits the application of many of the methods. Biochemical Petri nets make it possible to explore static and dynamic qualitative system properties. One Petri net approach is model validation based on the computation of the system's invariant properties, focusing on t-invariants. T-invariants correspond to subnetworks, which describe the basic system behavior.
With increasing system complexity, the basic behavior can only be expressed by a huge number of t-invariants. According to our validation criteria for biochemical Petri nets, the necessary verification of the biological meaning, by interpreting each subnetwork (t-invariant) manually, is not possible anymore. Thus, an automated, biologically meaningful classification would be helpful in analyzing t-invariants, and supporting the understanding of the basic behavior of the considered biological system.
Here, we introduce a new approach to automatically classify t-invariants to cope with network complexity. We apply clustering techniques such as UPGMA, Complete Linkage, Single Linkage, and Neighbor Joining in combination with different distance measures to get biologically meaningful clusters (t-clusters), which can be interpreted as modules. To find the optimal number of t-clusters to consider for interpretation, the cluster validity measure, Silhouette Width, is applied.
We considered two different case studies as examples: a small signal transduction pathway (pheromone response pathway in Saccharomyces cerevisiae) and a medium-sized gene regulatory network (gene regulation of Duchenne muscular dystrophy). We automatically classified the t-invariants into functionally distinct t-clusters, which could be interpreted biologically as functional modules in the network. We found differences in the suitability of the various distance measures as well as the clustering methods. In terms of a biologically meaningful classification of t-invariants, the best results are obtained using the Tanimoto distance measure. Considering clustering methods, the obtained results suggest that UPGMA and Complete Linkage are suitable for clustering t-invariants with respect to the biological interpretability.
We propose a new approach for the biological classification of Petri net t-invariants based on cluster analysis. Due to the biologically meaningful data reduction and structuring of network processes, large sets of t-invariants can be evaluated, allowing for model validation of qualitative biochemical Petri nets. This approach can also be applied to elementary mode analysis.