To decipher the complexity and improve the understanding of host-pathogen interactions, biologists must adopt new system level approaches in which the hierarchy of biological interactions and dynamics can be studied. This paper presents the application of systems biology for the cross-comparative analysis and interactome modeling of three different infectious agents, leading to the identification of novel, unique and common molecular host responses (biosignatures).
A computational systems biology method was utilized to create interactome models of the host responses to Brucella melitensis (BMEL), Salmonella enterica Typhimurium (STM) and Mycobacterium avium paratuberculosis (MAP). A bovine ligated ileal loop biological model was employed to capture the host gene expression response at four time points post infection. New methods based on Dynamic Bayesian Network (DBN) machine learning were employed to conduct a systematic comparative analysis of pathway and Gene Ontology category perturbations.
A cross-comparative assessment of 219 pathways and 1620 gene ontology (GO) categories was performed on each pathogen-host condition. Both unique and common pathway and GO perturbations indicated remarkable temporal differences in pathogen-host response profiles. Highly discriminatory pathways were selected from each pathogen condition to create a common system level interactome model comprised of 622 genes. This model was trained with data from each pathogen condition to capture unique and common gene expression features and relationships leading to the identification of candidate host-pathogen points of interactions and discriminatory biosignatures.
Our results provide a deeper understanding of the overall complexity of host defensive and pathogen invasion processes, and identify novel host-pathogen interactions. The application of advanced computational methods for developing interactome models based on DBN has proven to be instrumental in conducting multi-conditional cross-comparative analyses. Further, this approach generates a fully simulatable model with capabilities for predictive analysis as well as for diagnostic pattern recognition. The resulting biosignatures may represent future targets for identification of emerging pathogens as well as for development of antimicrobial drugs, immunotherapeutics, or vaccines for prevention and treatment of diseases caused by known, emerging, or re-emerging infectious agents.
Adequate control of serum glucose in critically ill patients is a complex problem requiring continuous monitoring and intervention, which have a direct effect on clinical outcomes. Understanding temporal relationships can help to improve our knowledge of complex disease processes and their response to treatment. We discuss a Dynamic Bayesian Network (DBN) model that we created using the open-source Projeny toolkit to represent various clinical variables and the temporal and atemporal relationships underlying insulin and glucose homeostasis. We evaluated this model by comparing the DBN model’s insulin dose predictions against those of a rule-based protocol (eProtocol-insulin) currently used in the ICU. The results suggest that the DBN model’s predictions are as effective as or better than those of the rule-based protocol. The limitations of our methods are discussed, with a brief note on their generalizability.
A significant amount of attention has recently been focused on modeling of gene regulatory networks. Two frequently used large-scale modeling frameworks are Bayesian networks (BNs) and Boolean networks, the latter being a special case of its recent stochastic extension, probabilistic Boolean networks (PBNs). PBNs are a promising model class that generalizes the standard rule-based interactions of Boolean networks into the stochastic setting. Dynamic Bayesian networks (DBNs) are a general and versatile model class that can represent complex temporal stochastic processes and have also been proposed as a model for gene regulatory systems. In this paper, we concentrate on these two model classes and demonstrate that PBNs and a certain subclass of DBNs can represent the same joint probability distribution over their common variables. The major benefit of introducing the relationships between the models is that it opens up the possibility of applying the standard tools of DBNs to PBNs and vice versa. Hence, the standard learning tools of DBNs can be applied in the context of PBNs, and the inference methods give a natural way of handling the missing values that are often present in gene expression measurements. Conversely, the PBN tools for controlling the stationary behavior of the networks, tools for projecting networks onto sub-networks, and efficient learning schemes can be used for DBNs. In other words, the introduced relationships between the models extend the collection of analysis tools for both model classes.
Gene regulatory networks; Probabilistic Boolean networks; Dynamic Bayesian networks
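The correspondence between PBNs and DBNs described in the abstract above can be illustrated with a small sketch. It assumes an independent PBN, in which each gene picks one of its candidate Boolean predictor functions at every step with a fixed selection probability. Under that assumption the state-transition probability factorizes into per-gene conditionals, which is the form of a two-slice DBN. The two-gene network, its functions, and its probabilities are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical independent PBN: gene -> list of (predictor function, selection probability).
# Each predictor maps the full state at time t to the gene's value at time t+1.
PBN = {
    0: [(lambda x: x[1], 0.7), (lambda x: 1 - x[1], 0.3)],
    1: [(lambda x: x[0] & x[1], 1.0)],
}

def node_cpd(node, state):
    """P(x_node(t+1) = 1 | x(t) = state): the DBN-style conditional for one gene."""
    return sum(p for f, p in PBN[node] if f(state) == 1)

def transition_prob(state, nxt):
    """P(x(t+1) = nxt | x(t) = state) as a product of per-gene conditionals."""
    prob = 1.0
    for node in sorted(PBN):
        p_one = node_cpd(node, state)
        prob *= p_one if nxt[node] == 1 else 1.0 - p_one
    return prob

# All joint states of the network, useful for building the full transition matrix.
states = list(product((0, 1), repeat=len(PBN)))
```

Because each row of the resulting transition matrix is a product of per-gene conditionals, standard DBN machinery that works on such conditionals can, in principle, be applied to the PBN.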
A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, which allows them to handle the stochastic aspects of gene expression and noisy measurements in a natural way, Bayesian networks appear attractive for inferring the structure of gene interactions from microarray data. However, the basic formalism has some disadvantages, e.g. it is sometimes hard to distinguish between the origin and the target of an interaction. Two kinds of microarray experiments yield data particularly rich in information regarding the direction of interactions: time series and perturbation experiments. In order to handle them correctly, the basic formalism must be modified. For example, dynamic Bayesian networks (DBNs) apply to time series microarray data. To our knowledge, the DBN technique has not been applied in the context of perturbation experiments.
We extend the framework of dynamic Bayesian networks in order to incorporate perturbations. Moreover, an exact algorithm for inferring an optimal network is proposed and a discretization method specialized for time series data from perturbation experiments is introduced. We apply our procedure to realistic simulated data. The results are compared with those obtained by standard DBN learning techniques. Moreover, the advantages of using an exact learning algorithm instead of heuristic methods are analyzed.
We show that the quality of inferred networks dramatically improves when using data from perturbation experiments. We also conclude that the exact algorithm should be used whenever possible, i.e. when the set of genes considered is small enough.
Coordination among cortical neurons is believed to be a key element in mediating many high-level cortical processes such as perception, attention, learning and memory formation. Inferring the topology of the neural circuitry underlying this coordination is important to characterize the highly non-linear, time-varying interactions between cortical neurons in the presence of complex stimuli. In this work, we investigate the applicability of Dynamic Bayesian Networks (DBNs) in inferring the effective connectivity between spiking cortical neurons from their observed spike trains. We demonstrate that DBNs can infer the underlying non-linear and time-varying causal interactions between these neurons and can discriminate between mono- and polysynaptic links between them under certain constraints governing their putative connectivity. We analyzed conditionally-Poisson spike train data mimicking spiking activity of cortical networks of small and moderately large sizes. The performance was assessed and compared to other methods under systematic variations of the network structure to mimic a wide range of responses typically observed in the cortex. Results demonstrate the utility of DBNs in inferring the effective connectivity in cortical networks.
ensemble recordings; spike trains; functional connectivity; effective connectivity; dynamic Bayesian networks; multiple single unit activity; spiking cortical networks
Dynamic Bayesian network (DBN) is among the mainstream approaches for modeling various biological networks, including the gene regulatory network (GRN). Most current methods for learning DBN employ either local search such as hill-climbing, or a meta stochastic global optimization framework such as genetic algorithm or simulated annealing, which are only able to locate sub-optimal solutions. Further, current DBN applications have essentially been limited to small sized networks.
To overcome the above difficulties, we introduce here a deterministic global optimization based DBN approach for reverse engineering genetic networks from time course gene expression data. For DBN models that consist only of inter-time-slice arcs, we show that there exists a polynomial time algorithm for learning the globally optimal network structure. The proposed approach, named GlobalMIT+, employs the recently proposed information theoretic scoring metric named mutual information test (MIT). GlobalMIT+ is able to learn high-order time-delayed genetic interactions, which are common to most biological systems. Evaluation of the approach using both synthetic and real data sets, including a cyanobacterial gene expression data set of 733 genes, shows significantly improved performance over other techniques.
Our studies demonstrate that deterministic global optimization approaches can infer large scale genetic networks.
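The reason a DBN with only inter-time-slice arcs admits exhaustive structure search can be sketched as follows. Every arc points from time t to time t+1, so no cycle can form and each gene's parent set can be scored independently of the others. This is not the GlobalMIT+ implementation: the cap on parent-set size (`max_parents`) and the flat per-parent `penalty` are assumptions made for illustration, and the penalty is a stand-in for the statistical penalty term of the MIT score, not a reproduction of it.

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two paired discrete sequences."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log(c * n / (cx[x] * cy[y])) for (x, y), c in cxy.items())

def best_parents(data, child, max_parents=2, penalty=0.1):
    """Exhaustively score parent sets for `child`, drawn from the previous time slice.

    data: list of time points, each a tuple of discrete gene states.
    The per-parent penalty is a placeholder, not the MIT penalty term.
    """
    n_genes = len(data[0])
    target = [row[child] for row in data[1:]]          # child at time t+1
    best_score, best_set = 0.0, ()                     # empty set scores 0
    for k in range(1, max_parents + 1):
        for parents in combinations(range(n_genes), k):
            joint = [tuple(row[p] for p in parents) for row in data[:-1]]  # parents at t
            score = mutual_information(joint, target) - penalty * k
            if score > best_score:
                best_score, best_set = score, parents
    return best_set

def learn_structure(data, **kwargs):
    """Choose each gene's parents independently; no acyclicity check is needed."""
    return {gene: best_parents(data, gene, **kwargs) for gene in range(len(data[0]))}
```

With a fixed `max_parents`, the number of candidate parent sets per gene grows polynomially in the number of genes, which is what keeps this kind of exhaustive search tractable.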
Online automated quality assessment is critical to determine a sensor's fitness for purpose in real-time applications. A Dynamic Bayesian Network (DBN) framework is proposed to produce probabilistic quality assessments and represent the uncertainty of sequentially correlated sensor readings. This is a novel framework to represent the causes, quality state and observed effects of individual sensor errors without imposing any constraints upon the physical deployment or measured phenomenon. It represents the causal relationships between quality tests and combines them to generate uncertainty estimates for samples. The DBN was implemented for a particular marine deployment of temperature and conductivity sensors in Hobart, Australia. The DBN was shown to offer a substantial average improvement (34%) in replicating the error bars that were generated by experts when compared to a fuzzy logic approach.
online filtering; automated; quality assessment; sensors; dynamic Bayesian networks
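The idea of an online, probabilistic quality state for sequentially correlated readings can be sketched with the simplest possible DBN: a single hidden quality state per time step, filtered forward as test outcomes arrive. This is a minimal sketch, not the framework from the abstract above. The state names, the single pass/fail test, and all probabilities used in the usage note are hypothetical.

```python
def forward_filter(observations, prior, transition, emission):
    """Return P(state_t | obs_1..t) for each t in a one-variable discrete DBN.

    prior[s]           : P(state_0 = s), the belief before the first reading
    transition[s][s2]  : P(state_t = s2 | state_{t-1} = s)
    emission[s][o]     : P(obs_t = o | state_t = s)
    """
    states = list(prior)
    belief = dict(prior)
    history = []
    for obs in observations:
        # Predict: push the previous belief through the transition model.
        predicted = {s2: sum(belief[s] * transition[s][s2] for s in states)
                     for s2 in states}
        # Update: weight each state by the likelihood of the observed test outcome.
        unnorm = {s: predicted[s] * emission[s][obs] for s in states}
        total = sum(unnorm.values())
        belief = {s: v / total for s, v in unnorm.items()}
        history.append(belief)
    return history
```

As a usage example, with a prior of `{"good": 0.9, "bad": 0.1}`, a sticky transition model, and a test that fails more often in the `"bad"` state, a run of consecutive failed tests raises the filtered probability of `"bad"` step by step, which is the kind of per-sample uncertainty estimate the abstract describes.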
Network inference deals with the reconstruction of biological networks from experimental data. A variety of different reverse engineering techniques are available; they differ in the underlying assumptions and mathematical models used. One common problem for all approaches stems from the complexity of the task, due to the combinatorial explosion of different network topologies for increasing network size. To handle this problem, constraints are frequently used, for example on the node degree, number of edges, or constraints on regulation functions between network components. We propose to exploit topological considerations in the inference of gene regulatory networks. Such systems are often controlled by a small number of hub genes, while most other genes have only limited influence on the network's dynamics. We model gene regulation using a Bayesian network with discrete, Boolean nodes. A hierarchical prior is employed to identify hub genes. The first layer of the prior is used to regularize weights on edges emanating from one specific node. A second prior on hyperparameters controls the magnitude of the former regularization for different nodes. The net effect is that central nodes tend to form in reconstructed networks. Network reconstruction is then performed by maximization of or sampling from the posterior distribution. We evaluate our approach on simulated and real experimental data, indicating that we can reconstruct main regulatory interactions from the data. We furthermore compare our approach to other state-of-the-art methods, showing superior performance in identifying hubs. Using a large publicly available dataset of over 800 cell cycle regulated genes, we are able to identify several main hub genes. Our method may thus provide a valuable tool to identify interesting candidate genes for further study. Furthermore, the approach presented may stimulate further developments in regularization methods for network reconstruction from data.
Biological gene networks appear to be dynamically robust to mutation, stochasticity, and changes in the environment and also appear to be sparsely connected. Studies with computational models, however, have suggested that denser gene networks evolve to be more dynamically robust than sparser networks. We resolve this discrepancy by showing that misassumptions about how to measure robustness in artificial networks have inadvertently discounted the costs of network complexity. We show that when the costs of complexity are taken into account, robustness implies a parsimonious network structure that is sparsely connected and not unnecessarily complex, and that selection will favor sparse networks when network topology is free to evolve. Because a robust system of heredity is necessary for the adaptive evolution of complex phenotypes, the maintenance of frugal network complexity is likely a crucial design constraint that underlies biological organization.
complexity; evolvability; gene network; robustness
In the last decade, advances in genomics, proteomics, and metabolomics have yielded large-scale datasets that have driven an interest in global analyses, with the objective of understanding biological systems as a whole. Systems biology integrates computational modeling and experimental biology to predict and characterize the dynamic properties of biological systems, which are viewed as complex signaling networks. Whereas the systems analysis of disease-perturbed networks holds promise for identification of drug targets for therapy, equally the identified critical network nodes may be targeted through nutritional intervention in either a preventative or therapeutic fashion. As such, in the context of the nutritional sciences, it is envisioned that systems analysis of normal and nutrient-perturbed signaling networks in combination with knowledge of underlying genetic polymorphisms will lead to a future in which the health of individuals will be improved through predictive and preventative nutrition. Although high-throughput transcriptomic microarray data were initially most readily available and amenable to systems analysis, recent technological and methodological advances in MS have contributed to a linear increase in proteomic investigations. It is now commonplace for combined proteomic technologies to generate complex, multi-faceted datasets, and these will be the keystone of future systems biology research. This review will define systems biology, outline current proteomic methodologies, highlight successful applications of proteomics in nutrition research, and discuss the challenges for future applications of systems biology approaches in the nutritional sciences.
Reverse engineering cellular networks is currently one of the most challenging problems in systems biology. Dynamic Bayesian networks (DBNs) seem to be particularly suitable for inferring relationships between cellular variables from the analysis of time series measurements of mRNA or protein concentrations. As evaluating inference results on a real dataset is controversial, the use of simulated data has been proposed. However, DBN approaches that use continuous variables, thus avoiding the information loss associated with discretization, have not yet been extensively assessed, and most of the proposed approaches have dealt with linear Gaussian models.
We propose a generalization of dynamic Gaussian networks to accommodate nonlinear dependencies between variables. As a benchmark dataset to test the new approach, we used data from a mathematical model of cell cycle control in budding yeast that realistically reproduces the complexity of a cellular system. We evaluated the ability of the networks to describe the dynamics of cellular systems and their precision in reconstructing the true underlying causal relationships between variables. We also tested the robustness of the results by analyzing the effect of noise on the data, and the impact of a different sampling time.
The results confirmed that DBNs with Gaussian models can be effectively exploited for a first level analysis of data from complex cellular systems. The inferred models are parsimonious and have a satisfying goodness of fit. Furthermore, the networks not only offer a phenomenological description of the dynamics of cellular systems, but are also able to suggest hypotheses concerning the causal interactions between variables. The proposed nonlinear generalization of Gaussian models yielded models characterized by a slightly lower goodness of fit than the linear model, but a better ability to recover the true underlying connections between variables.
Inferring the topology of a gene-regulatory network (GRN) from genome-scale time-series measurements of transcriptional change has proved useful for disentangling complex biological processes. To address the challenges associated with this inference, a number of competing approaches have previously been used, including examples from information theory, Bayesian and dynamic Bayesian networks (DBNs), and ordinary differential equation (ODE) or stochastic differential equation models. The performance of these competing approaches has previously been assessed using a variety of in silico and in vivo datasets. Here, we revisit this work by assessing the performance of more recent network inference algorithms, including a novel non-parametric learning approach based upon nonlinear dynamical systems. For larger GRNs, containing hundreds of genes, these non-parametric approaches more accurately infer network structures than do traditional approaches, but at significant computational cost. For smaller systems, DBNs are competitive with the non-parametric approaches with respect to computational time and accuracy, and both of these approaches appear to be more accurate than Granger causality-based methods and those using simple ODE models.
gene-regulatory networks; inference; gene expression
Uncovering the operating principles underlying cellular processes by using 'omics' data is often a difficult task due to the high-dimensionality of the solution space that spans all interactions among the bio-molecules under consideration. A rational way to overcome this problem is to use the topology of bio-molecular interaction networks in order to constrain the solution space. Such approaches systematically integrate the existing biological knowledge with the 'omics' data.
Here we introduce a hypothesis-driven method that integrates bio-molecular network topology with transcriptome data, thereby allowing the identification of key biological features (Reporter Features) around which transcriptional changes are significantly concentrated. We have combined transcriptome data with different biological networks in order to identify Reporter Gene Ontologies, Reporter Transcription Factors, Reporter Proteins and Reporter Complexes, and use this to decipher the logic of regulatory circuits playing a key role in yeast glucose repression and human diabetes.
Reporter Features offer the opportunity to identify regulatory hot-spots in bio-molecular interaction networks that are significantly affected between or across conditions. Results of the Reporter Feature analysis not only provide a snapshot of the transcriptional regulatory program but are also biologically easy to interpret and provide a powerful way to generate new hypotheses. Our Reporter Features analyses of yeast glucose repression and human diabetes data offer hints toward understanding the principles of transcriptional regulation controlling these two important and potentially closely related systems.
The aim of this study was to provide a framework for the analysis of visceral obesity and its determinants in women, where complex inter-relationships are observed among lifestyle, nutritional and metabolic predictors. Thirty-four predictors related to lifestyle, adiposity, body fat distribution, blood lipids and adipocyte sizes have been considered as potential correlates of visceral obesity in women. To properly address the difficulties in managing such interactions given our limited sample of 150 women, bootstrapped Bayesian networks were constructed based on novel constraint-based learning methods that appeared recently in the statistical learning community. Statistical significance of edge strengths was evaluated and the less reliable edges were pruned to increase the network robustness. To allow accessible interpretation and integrate biological knowledge into the final network, several undirected edges were afterwards directed with physiological expertise according to relevant literature.
Extensive experiments on synthetic data sampled from a known Bayesian network show that the algorithm, called Recursive Hybrid Parents and Children (RHPC), outperforms state-of-the-art algorithms that appeared in the recent literature. Regarding biological plausibility, we found that the inference results obtained with the proposed method were in excellent agreement with biological knowledge. For example, these analyses indicated that visceral adipose tissue accumulation is strongly related to blood lipid alterations independent of overall obesity level.
Bayesian Networks are a useful tool for investigating and summarizing evidence when complex relationships exist among predictors, in particular, as in the case of multifactorial conditions like visceral obesity, when there is a concurrent incidence for several variables, interacting in a complex manner. The source code and the data sets used for the empirical tests are available at http://www710.univ-lyon1.fr/~aaussem/Software.html.
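The bootstrapping and pruning steps described in the abstract above can be sketched as follows. Resample the subjects with replacement, learn a structure on each resample, record how often each edge appears, and keep only edges whose frequency clears a cutoff. The structure learner here is deliberately a placeholder, a simple absolute-correlation threshold producing undirected edges, and is not RHPC or any constraint-based algorithm. All function names, thresholds, and the seed are assumptions made for illustration.

```python
import random
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def learn_edges(rows, threshold=0.6):
    """Placeholder learner (NOT RHPC): link variables whose |correlation| > threshold."""
    cols = list(zip(*rows))
    return {(i, j) for i, j in combinations(range(len(cols)), 2)
            if abs(pearson(cols[i], cols[j])) > threshold}

def bootstrap_edge_strengths(rows, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each undirected edge is learned."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]   # resample subjects with replacement
        for edge in learn_edges(sample):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / n_boot for edge, c in counts.items()}

def prune(strengths, min_strength=0.5):
    """Keep only edges that appear in at least `min_strength` of the resamples."""
    return {edge for edge, s in strengths.items() if s >= min_strength}
```

Swapping `learn_edges` for a real structure-learning routine leaves the resampling and pruning logic unchanged, which is the main point of the sketch.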
Inference of biological networks has become an important tool in Systems Biology. It is becoming clearer that the complexity of organisms is related more to the organization of their components in networks than to the individual behaviour of those components. Among various approaches for inferring networks, Bayesian Networks are very attractive due to their probabilistic nature and flexibility to incorporate interventions and extra sources of information. Recently, various attempts to infer networks with different Bayesian Network approaches have been pursued. The specific interest in this paper is to compare the performance of three different inference approaches: Bayesian Networks without any modification; Bayesian Networks modified to take into account specific interventions produced during data collection; and a probabilistic hierarchical model that allows the inclusion of extra knowledge in the inference of Bayesian Networks. The inference is performed on three different types of data: (i) synthetic data obtained from a Gaussian distribution, (ii) synthetic data simulated with Netbuilder and (iii) real data obtained in flow cytometry experiments.
Bayesian Networks with interventions and Bayesian Networks with inclusion of extra knowledge outperform simple Bayesian Networks in all data sets when considering the reconstruction accuracy and taking the edge directions into account. In the real data, the increase in accuracy is also observed when not taking the edge directions into account.
Although it comes with a small extra computational cost, the use of more refined Bayesian network models is justified. Both the inclusion of extra knowledge and the use of interventions outperformed the simple Bayesian network model in simulated and real data sets. Also, if the source of extra knowledge used in the inference is not reliable, the inferred network does not deteriorate. If the extra knowledge agrees well with the data, there is no significant difference between using Bayesian networks with interventions and Bayesian networks with the extra knowledge.
Motivation: The primary purpose of modeling gene regulatory networks for developmental processes is to reveal pathways governing cellular differentiation to specific phenotypes. Knowledge of the differentiation network will enable generation of desired cell fates by careful alteration of the governing network through adequate manipulation of the cellular environment.
Results: We have developed a novel integer programming-based approach to reconstruct the underlying regulatory architecture of differentiating embryonic stem cells from discrete temporal gene expression data. The network reconstruction problem is formulated using inherent features of biological networks: (i) that of cascade architecture which enables treatment of the entire complex network as a set of interconnected modules and (ii) that of sparsity of interconnection between the transcription factors. The developed framework is applied to the system of embryonic stem cells differentiating towards pancreatic lineage. Experimentally determined expression profile dynamics of relevant transcription factors serve as the input to the network identification algorithm. The developed formulation accurately captures many of the known regulatory modes involved in pancreatic differentiation. The predictive capacity of the model is tested by simulating an in silico potential pathway of subsequent differentiation. The predicted pathway is experimentally verified by concurrent differentiation experiments. Experimental results agree well with model predictions, thereby illustrating the predictive accuracy of the proposed algorithm.
Supplementary information: Supplementary data are available at Bioinformatics online.
Systems biology is an approach to science that views biology as an information science and studies biological systems as a whole, together with their interactions with the environment. This approach, for the reasons described here, has particular power in the search for informative diagnostic biomarkers of diseases because it focuses on the fundamental causes and on the identification and understanding of disease-perturbed molecular networks. In this review, we describe some recent developments that have used systems biology to address complex diseases, namely prion disease and drug-induced liver injury, and use these as examples to illustrate the importance of understanding network structure and dynamics. The knowledge of network dynamics through in vitro experimental perturbation and modeling allows us to determine the state of the networks, to identify molecular correlates, and to derive new disease treatment approaches to reverse the pathology or prevent its progress into a more severe state through the manipulation of network states. This general approach, including diagnostics and therapeutics, is becoming known as systems medicine.
Systems biology; biomarkers; systems medicine; prion disease; drug induced liver injury; microRNA; organ-specific proteins
Gene Regulatory Networks (GRNs) have become a major focus of interest in recent years. Elucidating the architecture and dynamics of large-scale gene regulatory networks is an important goal in systems biology. Knowledge of gene regulatory networks further gives insights into gene regulatory pathways. This information leads to many potential applications in medicine and molecular biology, examples of which are identification of metabolic pathways, complex genetic diseases, drug discovery and toxicology analysis. High-throughput technologies allow studying various aspects of gene regulatory networks on a genome-wide scale, and we will discuss recent advances as well as limitations and future challenges for gene network modeling. Novel approaches are needed both to infer the causal genes and to generate hypotheses on the underlying regulatory mechanisms.
In the present article, we introduce a new method for identifying a set of optimal gene regulatory pathways by using structural equations as a tool for modeling gene regulatory networks. The method, first of all, generates data on reaction flows in a pathway. A set of constraints is formulated incorporating weighting coefficients. Finally the gene regulatory pathways are obtained through optimization of an objective function with respect to these weighting coefficients. The effectiveness of the present method is successfully tested on ten gene regulatory networks existing in the literature. A comparative study with the existing extreme pathway analysis also forms a part of this investigation. The results compare favorably with earlier experimental results. The validated pathways point to a combination of previously documented and novel findings.
We show that our method can correctly identify the causal genes and effectively output experimentally verified pathways. The present method has been successful in deriving the optimal regulatory pathways for all the regulatory networks considered. The biological significance and applicability of the optimal pathways have also been discussed. Finally, the usefulness of the present method for genetic engineering is illustrated with an example.
Despite large amounts of available genomic and proteomic data, predicting the structure and response of signaling networks is still a significant challenge. While statistical methods such as Bayesian networks have been explored to meet this challenge, employing existing biological knowledge for network prediction is difficult. The objective of this study is to develop a novel approach that integrates prior biological knowledge in the form of the Ontology Fingerprint to infer cell-type-specific signaling networks via data-driven Bayesian network learning, and to further use the trained model to predict cellular responses.
We applied our novel approach to address the Predictive Signaling Network Modeling challenge of the fourth (2009) Dialogue for Reverse Engineering Assessments and Methods (DREAM4) competition. The challenge results showed that our method accurately captured signal transduction of a network of protein kinases and phosphoproteins in that the predicted protein phosphorylation levels under all experimental conditions were highly correlated (R² = 0.93) with the observed results. Based on the evaluation of the DREAM4 organizer, our team was ranked among the top five performers in predicting network structure and protein phosphorylation activity under test conditions.
Bayesian networks can be used to simulate the propagation of signals in cellular systems. Incorporating the Ontology Fingerprint as prior biological knowledge allows us to efficiently infer concise signaling network structure and to accurately predict cellular responses.
Dynamic Bayesian Networks (DBNs) are widely used in regulatory network structure inference with gene expression data. Current methods assume that the underlying stochastic processes that generate the gene expression data are stationary. This assumption is not realistic in certain applications where the intrinsic regulatory networks change in order to adapt to internal or external stimuli.
In this paper we investigate a novel non-stationary DBN method with a potential regulator detection technique and a flexible lag choosing mechanism. We apply the approach to gene regulatory network inference on three non-stationary time series data sets. For the Macrophages and Arabidopsis data sets, which have reference networks, our method shows better network structure prediction accuracy. For the Drosophila data set, our approach converges faster and shows better prediction accuracy on transition times. In addition, our reconstructed regulatory networks on the Drosophila data not only share many similarities with the predictions of other researchers but also provide much new structural information for further investigation.
Compared with recently proposed non-stationary DBN methods, our approach has better structure prediction accuracy. By detecting potential regulators, our method reduces the size of the search space and hence may speed up the convergence of MCMC sampling.
Structural analysis of biochemical networks is a growing field in bioinformatics and systems biology. The availability of an increasing amount of biological data from molecular biological networks promises a deeper understanding but confronts researchers with the problem of combinatorial explosion. The amount of qualitative network data is growing much faster than the amount of quantitative data, such as enzyme kinetics. In many cases it is even impossible to measure quantitative data because of limitations of experimental methods, or for ethical reasons. Thus, a huge amount of qualitative data, such as interaction data, is available, but it has not been sufficiently used for modeling purposes until now. New approaches have been developed, but the complexity of the data often limits the application of many of the methods. Biochemical Petri nets make it possible to explore static and dynamic qualitative system properties. One Petri net approach is model validation based on the computation of the system's invariant properties, focusing on t-invariants. T-invariants correspond to subnetworks, which describe the basic system behavior.
With increasing system complexity, the basic behavior can only be expressed by a huge number of t-invariants. According to our validation criteria for biochemical Petri nets, the necessary verification of the biological meaning, by interpreting each subnetwork (t-invariant) manually, is not possible anymore. Thus, an automated, biologically meaningful classification would be helpful in analyzing t-invariants, and supporting the understanding of the basic behavior of the considered biological system.
Here, we introduce a new approach to automatically classify t-invariants to cope with network complexity. We apply clustering techniques such as UPGMA, Complete Linkage, Single Linkage, and Neighbor Joining in combination with different distance measures to get biologically meaningful clusters (t-clusters), which can be interpreted as modules. To find the optimal number of t-clusters to consider for interpretation, the cluster validity measure, Silhouette Width, is applied.
We considered two different case studies as examples: a small signal transduction pathway (pheromone response pathway in Saccharomyces cerevisiae) and a medium-sized gene regulatory network (gene regulation of Duchenne muscular dystrophy). We automatically classified the t-invariants into functionally distinct t-clusters, which could be interpreted biologically as functional modules in the network. We found differences in the suitability of the various distance measures as well as the clustering methods. In terms of a biologically meaningful classification of t-invariants, the best results are obtained using the Tanimoto distance measure. Considering clustering methods, the obtained results suggest that UPGMA and Complete Linkage are suitable for clustering t-invariants with respect to the biological interpretability.
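The classification described above can be illustrated with a minimal sketch. It assumes each t-invariant is represented only by its support (the set of transitions it contains), uses the Tanimoto distance with UPGMA (average linkage), and picks the number of t-clusters with the highest mean Silhouette Width. The transition names and the six toy t-invariants are invented for illustration and are not taken from either case study; the function names are likewise hypothetical.

```python
from itertools import combinations

def tanimoto_distance(a, b):
    """Tanimoto distance between two transition sets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def upgma(items, k, dist=tanimoto_distance):
    """Average-linkage agglomerative clustering, stopped at k clusters.

    Returns a list of clusters, each a list of indices into `items`.
    """
    clusters = [[i] for i in range(len(items))]

    def average_link(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist(items[i], items[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the closest pair of clusters.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_link(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)
    return clusters

def silhouette_width(items, clusters, dist=tanimoto_distance):
    """Mean silhouette over all items; members of singleton clusters score 0.

    Requires at least two clusters.
    """
    scores = []
    for c in clusters:
        for i in c:
            if len(c) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the other members of the same cluster.
            a = sum(dist(items[i], items[j]) for j in c if j != i) / (len(c) - 1)
            # b: smallest mean distance to any other cluster.
            b = min(
                sum(dist(items[i], items[j]) for j in other) / len(other)
                for other in clusters
                if other is not c
            )
            denom = max(a, b)
            scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)

# Toy t-invariant supports (hypothetical transition names).
t_invariants = [
    {"t1", "t2", "t3"},
    {"t1", "t2", "t4"},
    {"t1", "t2", "t3", "t4"},
    {"t7", "t8"},
    {"t7", "t8", "t9"},
    {"t8", "t9"},
]

# Try every cluster count from 2 to n-1 and keep the best-scoring one.
candidates = {
    k: upgma(t_invariants, k) for k in range(2, len(t_invariants))
}
best_k = max(candidates, key=lambda k: silhouette_width(t_invariants, candidates[k]))
best_clusters = candidates[best_k]
print(best_k, sorted(sorted(c) for c in best_clusters))
```

Working on supports alone discards the firing multiplicities of the transitions; whether that is appropriate depends on the network, so this should be read as one possible encoding rather than the representation used in the study.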
We propose a new approach for the biological classification of Petri net t-invariants based on cluster analysis. Due to the biologically meaningful data reduction and structuring of network processes, large sets of t-invariants can be evaluated, allowing for model validation of qualitative biochemical Petri nets. This approach can also be applied to elementary mode analysis.
Motivation: Modern systems biology aims at understanding how the different molecular components of a biological cell interact. Often, cellular functions are performed by complexes consisting of many different proteins. The composition of these complexes may change according to the cellular environment, and one protein may be involved in several different processes. The automatic discovery of functional complexes from protein interaction data is challenging. While previous approaches use approximations to extract dense modules, our approach exactly solves the problem of dense module enumeration. Furthermore, constraints from additional information sources such as gene expression and phenotype data can be integrated, so we can systematically mine for dense modules with interesting profiles.
Results: Given a weighted protein interaction network, our method discovers all protein sets that satisfy a user-defined minimum density threshold. We employ a reverse search strategy, which allows us to exploit the density criterion in an efficient way. Our experiments show that the novel approach is feasible and produces biologically meaningful results. In comparative validation studies using yeast data, the method achieved the best overall prediction performance with respect to confirmed complexes. Moreover, by enhancing the yeast network with phenotypic and phylogenetic profiles and the human network with tissue-specific expression data, we identified condition-dependent complex variants.
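To make the problem statement concrete, the sketch below enumerates every protein set whose density meets a user-defined threshold in a tiny weighted network. It is a brute-force illustration only: it checks all subsets and does not implement the reverse search strategy mentioned above, so it is usable only on very small inputs. Density is assumed here to be the sum of edge weights inside the set divided by the number of possible pairs; the protein names, weights, and function names are invented for the example.

```python
from itertools import combinations

def density(nodes, weights):
    """Sum of edge weights within `nodes` divided by the number of node pairs.

    `weights` maps frozenset({u, v}) to an interaction weight; missing
    pairs count as weight 0. Sets with fewer than two nodes have density 0.
    """
    n = len(nodes)
    if n < 2:
        return 0.0
    total = sum(weights.get(frozenset(pair), 0.0) for pair in combinations(nodes, 2))
    return total / (n * (n - 1) / 2)

def dense_modules(nodes, weights, min_density, min_size=2):
    """Exhaustively list all node sets with density >= min_density.

    Exponential in the number of nodes; for illustration only.
    """
    found = []
    ordered = sorted(nodes)
    for size in range(min_size, len(ordered) + 1):
        for subset in combinations(ordered, size):
            if density(subset, weights) >= min_density:
                found.append(subset)
    return found

# Hypothetical weighted interaction network on four proteins.
weights = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.7,
    frozenset(("C", "D")): 0.3,
}
proteins = {"A", "B", "C", "D"}

modules = dense_modules(proteins, weights, min_density=0.6)
print(modules)
```

In this toy network the triple A, B, C qualifies together with each of its pairs, while any set containing D falls below the threshold because D has only one weak interaction.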
Availability: A C++ implementation of the algorithm is available at http://www.kyb.tuebingen.mpg.de/~georgii/dme.html.
Supplementary information: Supplementary data are available at Bioinformatics online.
The current challenge of Systems Biology is to integrate high throughput data sets for simulating the complexity of biological networks, exploit the evolution of nature-designed networks that maintain the robustness of a biological system, and thereby generate novel, experimentally testable hypotheses. In order to simulate non-linear biological complexities, we have previously developed an Enzyme-Centric mechanistic modeling approach and validated it using a metabolic network in E. coli. The idea is to use prior knowledge of the catalytic and regulatory mechanisms of each enzyme within the metabolic network to build a dynamic model for investigating network-level regulation and thus understand the natural design principle behind the network.
In this paper, we further demonstrate the application of complex enzyme catalytic and regulatory modules to simulate nonlinear network regulatory patterns, in contrast to a simple linear conversion model. We learned and validated that it is essential to incorporate prior knowledge from the literature to simulate non-linear biological complexities. The expandability of the network is demonstrated and validated with the complex, multiply regulated amino acid biosynthetic network. We also demonstrated the compatibility of mechanistic models across closely related species. Furthermore, the eukaryotic protein factory model for ensuring steady mRNA production is simulated, and the coupling of RNA transcription and splicing is validated by both mathematical simulation and experimental analysis.
We demonstrated the importance of modeling complex enzyme catalytic and regulatory mechanisms to further understand nonlinear network regulatory patterns. The simulations presented in this paper reveal how a living system maintains homeostasis and its robustness to continue functioning while facing environmental stresses or genetic mutations.
The cell cycle is one of the biological processes most frequently investigated in systems biology studies, and it involves a large number of genes and networks of protein interactions. A deep knowledge of the molecular aspects of this biological process can contribute to making cancer research more accurate and innovative. In this context, mathematical modelling of the cell cycle plays an important role in quantifying the behaviour of each component of the system. The mathematical modelling of a biological process such as the cell cycle allows a systemic description that helps to highlight features, such as emergent properties, which could remain hidden when the analysis is performed only from a reductionist point of view. Moreover, in modelling complex systems, a complete annotation of all the components is equally important for understanding the interaction mechanisms inside the network: for this reason, data integration of the model components has high relevance in systems biology studies.
In this work, we present a resource, the Cell Cycle Database, intended to support systems biology analysis of the cell cycle process in two groups of organisms, yeast and mammals. The database integrates information about genes and proteins involved in the cell cycle process, stores complete models of the interaction networks and allows the mathematical simulation over time of the quantitative behaviour of each component. To accomplish this task, we developed a web interface for browsing information related to cell cycle genes, proteins and mathematical models. In this framework, we have implemented a pipeline which allows users to work with the mathematical part of the models in order to solve, using different variables, the ordinary differential equation systems that describe the biological process.
This integrated system is freely available in order to support systems biology research on the cell cycle, and it aims to become a useful resource for collecting all the information related to current and future models of this network. The flexibility of the database allows the addition of mathematical data which are used for simulating the behaviour of the cell cycle components in the different models. The resource deals with two relevant problems in systems biology: data integration and mathematical simulation of a crucial biological process related to cancer, namely the cell cycle. In this way the resource is useful both for retrieving information about cell cycle model components and for analyzing their dynamical properties. The Cell Cycle Database can be used to find system-level properties, such as stable steady states and oscillations, by coupling structural and dynamical information about models.
Our understanding of how evolution acts on biological networks remains patchy, as is our knowledge of how that action is best identified, modelled and understood. Starting with network structure and the evolution of protein–protein interaction networks, we briefly survey the ways in which network evolution is being addressed in the fields of systems biology, development and ecology. The approaches highlighted demonstrate a movement away from a focus on network topology towards a more integrated view, placing biological properties centre-stage. We argue that there remains great potential in a closer synergy between evolutionary biology and biological network analysis, although that may require the development of novel approaches and even different analogies for biological networks themselves.
correlation network; eQTL network; eurovision song contest; gene regulatory network; protein–protein interaction network