COnstraint-Based Reconstruction and Analysis (COBRA) methods are widely used for genome-scale modeling of metabolic networks in both prokaryotes and eukaryotes. Due to the successes with metabolism, there is an increasing effort to apply COBRA methods to reconstruct and analyze integrated models of cellular processes. The COBRA Toolbox for MATLAB is a leading software package for genome-scale analysis of metabolism; however, it was not designed to elegantly capture the complexity inherent in integrated biological networks and lacks an integration framework for the multiomics data used in systems biology. The openCOBRA Project is a community effort to promote constraints-based research through the distribution of freely available software.
Here, we describe COBRA for Python (COBRApy), a Python package that provides support for basic COBRA methods. COBRApy is designed in an object-oriented fashion that facilitates the representation of the complex biological processes of metabolism and gene expression. COBRApy does not require MATLAB to function; however, it includes an interface to the COBRA Toolbox for MATLAB to facilitate use of legacy codes. For improved performance, COBRApy includes parallel processing support for computationally intensive processes.
COBRApy is an object-oriented framework designed to meet the computational challenges associated with the next generation of stoichiometric constraint-based models and high-density omics data sets.
Genome-scale; Network reconstruction; Metabolism; Gene expression; Constraint-based modeling
We present SBEToolbox (Systems Biology and Evolution Toolbox), an open-source Matlab toolbox for biological network analysis. It takes a network file as input, calculates a variety of centralities and topological metrics, clusters nodes into modules, and displays the network using different graph layout algorithms. Straightforward implementation and the inclusion of high-level functions allow the functionality to be easily extended or tailored through developing custom plugins. SBEGUI, a menu-driven graphical user interface (GUI) of SBEToolbox, enables easy access to various network and graph algorithms for programmers and non-programmers alike. All source code and sample data are freely available at https://github.com/biocoder/SBEToolbox/releases.
Matlab toolbox; biological network; node centrality; network evolution
Mathematical modelling of cellular networks is an integral part of Systems Biology and requires appropriate software tools. An important class of methods in Systems Biology deals with structural or topological (parameter-free) analysis of cellular networks. So far, software tools providing such methods for both mass-flow (metabolic) as well as signal-flow (signalling and regulatory) networks are lacking.
Herein we introduce CellNetAnalyzer, a toolbox for MATLAB facilitating, in an interactive and visual manner, a comprehensive structural analysis of metabolic, signalling and regulatory networks. The particular strengths of CellNetAnalyzer are methods for functional network analysis, i.e. for characterising functional states, for detecting functional dependencies, for identifying intervention strategies, or for giving qualitative predictions on the effects of perturbations. CellNetAnalyzer extends its predecessor FluxAnalyzer (originally developed for metabolic network and pathway analysis) by a new modelling framework for examining signal-flow networks. Two of the novel methods implemented in CellNetAnalyzer are discussed in more detail regarding algorithmic issues and applications: the computation and analysis (i) of shortest positive and shortest negative paths and circuits in interaction graphs and (ii) of minimal intervention sets in logical networks.
CellNetAnalyzer provides a single suite to perform structural and qualitative analysis of both mass-flow- and signal-flow-based cellular networks in a user-friendly environment. It provides a large toolbox with various, partially unique, functions and algorithms for functional network analysis.CellNetAnalyzer is freely available for academic use.
Large-scale protein-protein interaction networks provide new opportunities for understanding cellular organization and functioning. We introduce network schemas to elucidate shared mechanisms within interactomes. Network schemas specify descriptions of proteins and the topology of interactions among them. We develop algorithms for systematically uncovering recurring, over-represented schemas in physical interaction networks. We apply our methods to the S. cerevisiae interactome, focusing on schemas consisting of proteins described via sequence motifs and molecular function annotations and interacting with one another in one of four basic network topologies. We identify hundreds of recurring and over-represented network schemas of various complexity, and demonstrate via graph-theoretic representations how more complex schemas are organized in terms of their lower-order constituents. The uncovered schemas span a wide range of cellular activities, with many signaling and transport related higher-order schemas. We establish the functional importance of the schemas by showing that they correspond to functionally cohesive sets of proteins, are enriched in the frequency with which they have instances in the H. sapiens interactome, and are useful for predicting protein function. Our findings suggest that network schemas are a powerful paradigm for organizing, interrogating, and annotating cellular networks.
Large-scale networks of protein-protein interactions provide a view into the workings of the cell. However, these interaction maps do not come with a key for interpreting them, so it is necessary to develop methods that shed light on their functioning and organization. We propose the language of network schemas for describing recurring patterns of specific types of proteins and their interactions. That is, network schemas describe proteins and specify the topology of interactions among them. A single network schema can describe, for example, a common template that underlies several distinct cellular pathways, such as signaling pathways. We develop a computational methodology for identifying network schemas that are recurrent and over-represented in the network, even given the distributions of their constituent components. We apply this methodology to the physical interaction network in S. cerevisiae and begin to build a hierarchy of schemas starting with the four simplest topologies. We validate the biological relevance of the schemas that we find, discuss the insights our findings lend into the organization of interactomes, touch upon cross-genomic aspects of schema analysis, and show how to use schemas to annotate uncharacterized protein families.
It is a great challenge of modern biology to determine the functional roles of non-synonymous Single Nucleotide Polymorphisms (nsSNPs) on complex phenotypes. Statistical and machine learning techniques establish correlations between genotype and phenotype, but may fail to infer the biologically relevant mechanisms. The emerging paradigm of Network-based Association Studies aims to address this problem of statistical analysis. However, a mechanistic understanding of how individual molecular components work together in a system requires knowledge of molecular structures, and their interactions.
To address the challenge of understanding the genetic, molecular, and cellular basis of complex phenotypes, we have, for the first time, developed a structural systems biology approach for genome-wide multiscale modeling of nsSNPs - from the atomic details of molecular interactions to the emergent properties of biological networks. We apply our approach to determine the functional roles of nsSNPs associated with hypoxia tolerance in Drosophila melanogaster. The integrated view of the functional roles of nsSNP at both molecular and network levels allows us to identify driver mutations and their interactions (epistasis) in H, Rad51D, Ulp1, Wnt5, HDAC4, Sol, Dys, GalNAc-T2, and CG33714 genes, all of which are involved in the up-regulation of Notch and Gurken/EGFR signaling pathways. Moreover, we find that a large fraction of the driver mutations are neither located in conserved functional sites, nor responsible for structural stability, but rather regulate protein activity through allosteric transitions, protein-protein interactions, or protein-nucleic acid interactions. This finding should impact future Genome-Wide Association Studies.
Our studies demonstrate that the consolidation of statistical, structural, and network views of biomolecules and their interactions can provide new insight into the functional role of nsSNPs in Genome-Wide Association Studies, in a way that neither the knowledge of molecular structures nor biological networks alone could achieve. Thus, multiscale modeling of nsSNPs may prove to be a powerful tool for establishing the functional roles of sequence variants in a wide array of applications.
Understanding gene interactions in complex living systems can be seen as the ultimate goal of the systems biology revolution. Hence, to elucidate disease ontology fully and to reduce the cost of drug development, gene regulatory networks (GRNs) have to be constructed. During the last decade, many GRN inference algorithms based on genome-wide data have been developed to unravel the complexity of gene regulation. Time series transcriptomic data measured by genome-wide DNA microarrays are traditionally used for GRN modelling. One of the major problems with microarrays is that a dataset consists of relatively few time points with respect to the large number of genes. Dimensionality is one of the interesting problems in GRN modelling.
In this paper, we develop a biclustering function enrichment analysis toolbox (BicAT-plus) to study the effect of biclustering in reducing data dimensions. The network generated from our system was validated via available interaction databases and was compared with previous methods. The results revealed the performance of our proposed method.
Because of the sparse nature of GRNs, the results of biclustering techniques differ significantly from those of previous methods.
Phenomenological information about regulatory interactions is frequently available and can be readily converted to Boolean models. Fully quantitative models, on the other hand, provide detailed insights into the precise dynamics of the underlying system. In order to connect discrete and continuous modeling approaches, methods for the conversion of Boolean systems into systems of ordinary differential equations have been developed recently. As biological interaction networks have steadily grown in size and complexity, a fully automated framework for the conversion process is desirable.
We present Odefy, a MATLAB- and Octave-compatible toolbox for the automated transformation of Boolean models into systems of ordinary differential equations. Models can be created from sets of Boolean equations or graph representations of Boolean networks. Alternatively, the user can import Boolean models from the CellNetAnalyzer toolbox, GINSim and the PBN toolbox. The Boolean models are transformed to systems of ordinary differential equations by multivariate polynomial interpolation and optional application of sigmoidal Hill functions. Our toolbox contains basic simulation and visualization functionalities for both, the Boolean as well as the continuous models. For further analyses, models can be exported to SQUAD, GNA, MATLAB script files, the SB toolbox, SBML and R script files. Odefy contains a user-friendly graphical user interface for convenient access to the simulation and exporting functionalities. We illustrate the validity of our transformation approach as well as the usage and benefit of the Odefy toolbox for two biological systems: a mutual inhibitory switch known from stem cell differentiation and a regulatory network giving rise to a specific spatial expression pattern at the mid-hindbrain boundary.
Odefy provides an easy-to-use toolbox for the automatic conversion of Boolean models to systems of ordinary differential equations. It can be efficiently connected to a variety of input and output formats for further analysis and investigations. The toolbox is open-source and can be downloaded at http://cmb.helmholtz-muenchen.de/odefy.
Over the past decade, a growing community of researchers has emerged around the use of COnstraint-Based Reconstruction and Analysis (COBRA) methods to simulate, analyze and predict a variety of metabolic phenotypes using genome-scale models. The COBRA Toolbox, a MATLAB package for implementing COBRA methods, was presented earlier. Here we present a significant update of this in silico ToolBox. Version 2.0 of the COBRA Toolbox expands the scope of computations by including in silico analysis methods developed since its original release. New functions include: (1) network gap filling, (2) 13C analysis, (3) metabolic engineering, (4) omics-guided analysis, and (5) visualization. As with the first version, the COBRA Toolbox reads and writes Systems Biology Markup Language formatted models. In version 2.0, we improved performance, usability, and the level of documentation. A suite of test scripts can now be used to learn the core functionality of the Toolbox and validate results. This Toolbox lowers the barrier of entry to use powerful COBRA methods.
Systems Biology; Computational Biology; MATLAB; Flux Balance Analysis; Fluxomics; Visualization; Gap Filling; Metabolic Engineering
The impact of gene silencing on cellular phenotypes is difficult to establish due to the complexity of interactions in the associated biological processes and pathways. A recent genome-wide RNA knock-down study both identified and phenotypically characterized a set of important genes for the cell cycle in HeLa cells. Here, we combine a molecular interaction network analysis, based on physical and functional protein interactions, in conjunction with evolutionary information, to elucidate the common biological and topological properties of these key genes. Our results show that these genes tend to be conserved with their corresponding protein interactions across several species and are key constituents of the evolutionary conserved molecular interaction network. Moreover, a group of bistable network motifs is found to be conserved within this network, which are likely to influence the network stability and therefore the robustness of cellular functioning. They form a cluster, which displays functional homogeneity and is significantly enriched in genes phenotypically relevant for mitosis. Additional results reveal a relationship between specific cellular processes and the phenotypic outcomes induced by gene silencing. This study introduces new ideas regarding the relationship between genotype and phenotype in the context of the cell cycle. We show that the analysis of molecular interaction networks can result in the identification of genes relevant to cellular processes, which is a promising avenue for future research.
The human brain is a complex system whose topological organization can be represented using connectomics. Recent studies have shown that human connectomes can be constructed using various neuroimaging technologies and further characterized using sophisticated analytic strategies, such as graph theory. These methods reveal the intriguing topological architectures of human brain networks in healthy populations and explore the changes throughout normal development and aging and under various pathological conditions. However, given the huge complexity of this methodology, toolboxes for graph-based network visualization are still lacking. Here, using MATLAB with a graphical user interface (GUI), we developed a graph-theoretical network visualization toolbox, called BrainNet Viewer, to illustrate human connectomes as ball-and-stick models. Within this toolbox, several combinations of defined files with connectome information can be loaded to display different combinations of brain surface, nodes and edges. In addition, display properties, such as the color and size of network elements or the layout of the figure, can be adjusted within a comprehensive but easy-to-use settings panel. Moreover, BrainNet Viewer draws the brain surface, nodes and edges in sequence and displays brain networks in multiple views, as required by the user. The figure can be manipulated with certain interaction functions to display more detailed information. Furthermore, the figures can be exported as commonly used image file formats or demonstration video for further use. BrainNet Viewer helps researchers to visualize brain networks in an easy, flexible and quick manner, and this software is freely available on the NITRC website (www.nitrc.org/projects/bnv/).
The Matlab software is a one of the most advanced development tool for application in engineering practice. From our point of view the most important is the image processing toolbox, offering many built-in functions, including mathematical morphology, and implementation of a many artificial neural networks as AI. It is very popular platform for creation of the specialized program for image analysis, also in pathology. Based on the latest version of Matlab Builder Java toolbox, it is possible to create the software, serving as a remote system for image analysis in pathology via internet communication. The internet platform can be realized based on Java Servlet Pages with Tomcat server as servlet container.
In presented software implementation we propose remote image analysis realized by Matlab algorithms. These algorithms can be compiled to executable jar file with the help of Matlab Builder Java toolbox. The Matlab function must be declared with the set of input data, output structure with numerical results and Matlab web figure. Any function prepared in that manner can be used as a Java function in Java Servlet Pages (JSP). The graphical user interface providing the input data and displaying the results (also in graphical form) must be implemented in JSP. Additionally the data storage to database can be implemented within algorithm written in Matlab with the help of Matlab Database Toolbox directly with the image processing. The complete JSP page can be run by Tomcat server.
The proposed tool for remote image analysis was tested on the Computerized Analysis of Medical Images (CAMI) software developed by author. The user provides image and case information (diagnosis, staining, image parameter etc.). When analysis is initialized, input data with image are sent to servlet on Tomcat. When analysis is done, client obtains the graphical results as an image with marked recognized cells and also the quantitative output. Additionally, the results are stored in a server database. The internet platform was tested on PC Intel Core2 Duo T9600 2.8GHz 4GB RAM server with 768x576 pixel size, 1.28Mb tiff format images reffering to meningioma tumour (x400, Ki-67/MIB-1). The time consumption was as following: at analysis by CAMI, locally on a server – 3.5 seconds, at remote analysis – 26 seconds, from which 22 seconds were used for data transfer via internet connection. At jpg format image (102 Kb) the consumption time was reduced to 14 seconds.
The results have confirmed that designed remote platform can be useful for pathology image analysis. The time consumption is depended mainly on the image size and speed of the internet connections. The presented implementation can be used for many types of analysis at different staining, tissue, morphometry approaches, etc. The significant problem is the implementation of the JSP page in the multithread form, that can be used parallelly by many users. The presented platform for image analysis in pathology can be especially useful for small laboratory without its own image analysis system.
Deciphering heterogeneous cellular networks with embedded modules is a great challenge of current systems biology. Experimental and computational studies construct complex networks of molecules that describe various aspects of the cell such as transcriptional regulation, protein interactions and metabolism. Groups of interacting genes and proteins reflect network modules that potentially share regulatory mechanisms and relate to common function. Here, we present GraphWeb, a public web server for biological network analysis and module discovery. GraphWeb provides methods to: (1) integrate heterogeneous and multispecies data for constructing directed and undirected, weighted and unweighted networks; (ii) discover network modules using a variety of algorithms and topological filters and (iii) interpret modules using functional knowledge of the Gene Ontology and pathways, as well as regulatory features such as binding motifs and microRNA targets. GraphWeb is designed to analyse individual or multiple merged networks, search for conserved features across multiple species, mine large biological networks for smaller modules, discover novel candidates and connections for known pathways and compare results of high-throughput datasets. The GraphWeb is available at http://biit.cs.ut.ee/graphweb/.
In recent years, graph theoretical analyses of neuroimaging data have increased our understanding of the organization of large-scale structural and functional brain networks. However, tools for pipeline application of graph theory for analyzing topology of brain networks is still lacking. In this report, we describe the development of a graph-analysis toolbox (GAT) that facilitates analysis and comparison of structural and functional network brain networks. GAT provides a graphical user interface (GUI) that facilitates construction and analysis of brain networks, comparison of regional and global topological properties between networks, analysis of network hub and modules, and analysis of resilience of the networks to random failure and targeted attacks. Area under a curve (AUC) and functional data analyses (FDA), in conjunction with permutation testing, is employed for testing the differences in network topologies; analyses that are less sensitive to the thresholding process. We demonstrated the capabilities of GAT by investigating the differences in the organization of regional gray-matter correlation networks in survivors of acute lymphoblastic leukemia (ALL) and healthy matched Controls (CON). The results revealed an alteration in small-world characteristics of the brain networks in the ALL survivors; an observation that confirm our hypothesis suggesting widespread neurobiological injury in ALL survivors. Along with demonstration of the capabilities of the GAT, this is the first report of altered large-scale structural brain networks in ALL survivors.
Molecular pathways represent an ensemble of interactions occurring among molecules within the cell and between cells. The identification of similarities between molecular pathways across organisms and functions has a critical role in understanding complex biological processes. For the inference of such novel information, the comparison of molecular pathways requires to account for imperfect matches (flexibility) and to efficiently handle complex network topologies. To date, these characteristics are only partially available in tools designed to compare molecular interaction maps.
Our approach MIMO (Molecular Interaction Maps Overlap) addresses the first problem by allowing the introduction of gaps and mismatches between query and template pathways and permits -when necessary- supervised queries incorporating a priori biological information. It then addresses the second issue by relying directly on the rich graph topology described in the Systems Biology Markup Language (SBML) standard, and uses multidigraphs to efficiently handle multiple queries on biological graph databases. The algorithm has been here successfully used to highlight the contact point between various human pathways in the Reactome database.
MIMO offers a flexible and efficient graph-matching tool for comparing complex biological pathways.
Graphs and networks are common analysis representations for biological systems. Many traditional graph algorithms such as k-clique, k-coloring, and subgraph matching have great potential as analysis techniques for newly available data in biology. Yet, as the amount of genomic and bionetwork information rapidly grows, scientists need advanced new computational strategies and tools for dealing with the complexities of the bionetwork analysis and the volume of the data.
We introduce a computational framework for graph analysis called the Biological Graph Environment (BioGraphE), which provides a general, scalable integration platform for connecting graph problems in biology to optimized computational solvers and high-performance systems. This framework enables biology researchers and computational scientists to identify and deploy network analysis applications and to easily connect them to efficient and powerful computational software and hardware that are specifically designed and tuned to solve complex graph problems. In our particular application of BioGraphE to support network analysis in genome biology, we investigate the use of a Boolean satisfiability solver known as Survey Propagation as a core computational solver executing on standard high-performance parallel systems, as well as multi-threaded architectures.
In our application of BioGraphE to conduct bionetwork analysis of homology networks, we found that BioGraphE and a custom, parallel implementation of the Survey Propagation SAT solver were capable of solving very large bionetwork problems at high rates of execution on different high-performance computing platforms.
This Teaching Resource provides lecture notes, slides, and a problem set for a set of three lectures from a course entitled “Systems Biology: Biomedical Modeling.” The materials are from three separate lectures introducing applications of graph theory and network analysis in systems biology. The first lecture describes different types of intracellular networks, methods for constructing biological networks, and different types of graphs used to represent regulatory intracellular networks. The second lecture surveys milestones and key concepts in network analysis by introducing topological measures, random networks, growing network models, and topological observations from molecular biological systems abstracted to networks. The third lecture discusses methods for analyzing lists of genes and experimental data in the context of prior knowledge networks to make predictions.
A great deal of data has accumulated on signalling pathways. These large datasets are thought to contain much implicit information on their molecular structure, interaction and activity information, which provides a picture of intricate molecular networks believed to underlie biological functions. While tremendous advances have been made in trying to understand these systems, how information is transmitted within them is still poorly understood. This ever growing amount of data demands we adopt powerful computational techniques that will play a pivotal role in the conversion of mined data to knowledge, and in elucidating the topological and functional properties of protein - protein interactions.
A computational framework is presented which allows for the description of embedded networks, and identification of common shared components thought to assist in the transmission of information within the systems studied. By employing the graph theories of network biology - such as degree distribution, clustering coefficient, vertex betweenness and shortest path measures - topological features of protein-protein interactions for published datasets of the p53, nuclear factor kappa B (NF-κB) and G1/S phase of the cell cycle systems were ascertained. Highly ranked nodes which in some cases were identified as connecting proteins most likely responsible for propagation of transduction signals across the networks were determined. The functional consequences of these nodes in the context of their network environment were also determined. These findings highlight the usefulness of the framework in identifying possible combination or links as targets for therapeutic responses; and put forward the idea of using retrieved knowledge on the shared components in constructing better organised and structured models of signalling networks.
It is hoped that through the data mined reconstructed signal transduction networks, well developed models of the published data can be built which in the end would guide the prediction of new targets based on the pathway's environment for further analysis. Source code is available upon request.
Biological networks characterize the interactions of biomolecules at a systems-level. One important property of biological networks is the modular structure, in which nodes are densely connected with each other, but between which there are only sparse connections. In this report, we attempted to find the relationship between the network topology and formation of modular structure by comparing gene co-expression networks with random networks. The organization of gene functional modules was also investigated.
We constructed a genome-wide Arabidopsis gene co-expression network (AGCN) by using 1094 microarrays. We then analyzed the topological properties of AGCN and partitioned the network into modules by using an efficient graph clustering algorithm. In the AGCN, 382 hub genes formed a clique, and they were densely connected only to a small subset of the network. At the module level, the network clustering results provide a systems-level understanding of the gene modules that coordinate multiple biological processes to carry out specific biological functions. For instance, the photosynthesis module in AGCN involves a very large number (> 1000) of genes which participate in various biological processes including photosynthesis, electron transport, pigment metabolism, chloroplast organization and biogenesis, cofactor metabolism, protein biosynthesis, and vitamin metabolism. The cell cycle module orchestrated the coordinated expression of hundreds of genes involved in cell cycle, DNA metabolism, and cytoskeleton organization and biogenesis. We also compared the AGCN constructed in this study with a graphical Gaussian model (GGM) based Arabidopsis gene network. The photosynthesis, protein biosynthesis, and cell cycle modules identified from the GGM network had much smaller module sizes compared with the modules found in the AGCN, respectively.
This study reveals new insight into the topological properties of biological networks. The preferential hub-hub connections might be necessary for the formation of modular structure in gene co-expression networks. The study also reveals new insight into the organization of gene functional modules.
Networks are employed to represent many nonlinear complex systems in the real world. The topological aspects and relationships between the structure and function of biological networks have been widely studied in the past few decades. However dynamic and control features of complex networks have not been widely researched, in comparison to topological network features. In this study, we explore the relationship between network controllability, topological parameters, and network medicine (metabolic drug targets). Considering the assumption that targets of approved anticancer metabolic drugs are driver nodes (which control cancer metabolic networks), we have applied topological analysis to genome-scale metabolic models of 15 normal and corresponding cancer cell types. The results show that besides primary network parameters, more complex network metrics such as motifs and clusters may also be appropriate for controlling the systems providing the controllability relationship between topological parameters and drug targets. Consequently, this study reveals the possibilities of following a set of driver nodes in network clusters instead of considering them individually according to their centralities. This outcome suggests considering distributed control systems instead of nodal control for cancer metabolic networks, leading to a new strategy in the field of network medicine.
Molecular typing methods are commonly used to study genetic relationships among bacterial isolates. Many of these methods have become standardized and produce portable data. A popular approach for analyzing such data is to construct graphs, including phylogenies. Inferences from graph representations of data assist in understanding the patterns of transmission of bacterial pathogens, and basing these graph constructs on biological models of evolution of the molecular marker helps make these inferences. Spoligotyping is a widely used method for genotyping isolates of Mycobacterium tuberculosis that exploits polymorphism in the direct repeat region. Our goal was to examine a range of models describing the evolution of spoligotypes in order to develop a visualization method to represent likely relationships among M. tuberculosis isolates.
We found that inferred mutations of spoligotypes frequently involve the loss of a single or very few adjacent spacers. Using a second-order variant of Akaike's Information Criterion, we selected the Zipf model as the basis for resolving ambiguities in the ancestry of spoligotypes. We developed a method to construct graphs of spoligotypes (which we call spoligoforests). To demonstrate this method, we applied it to a tuberculosis data set from Cuba and compared the method to some existing methods.
We propose a new approach in analyzing relationships of M. tuberculosis isolates using spoligotypes. The spoligoforest recovers a plausible history of transmission and mutation events based on the selected deletion model. The method may be suitable to study markers based on loci of similar structure from other bacteria. The groupings and relationships in the spoligoforest can be analyzed along with the clinical features of strains to provide an understanding of the evolution of spoligotypes.
With the advent of high-throughput biotechnology capable of monitoring genomic signals, it becomes increasingly promising to understand molecular cellular mechanisms through systems biology approaches. One of the active research topics in systems biology is to infer gene transcriptional regulatory networks using various genomic data; this inference problem can be formulated as a linear model with latent signals associated with some regulatory proteins called transcription factors (TFs). As common statistical assumptions may not hold for genomic signals, typical latent variable algorithms such as independent component analysis (ICA) are incapable to reveal underlying true regulatory signals. Liao et al.  proposed to perform inference using an approach named network component analysis (NCA), the optimization of which is achieved by a least-squares fitting approach with biological knowledge constraints. However, the incompleteness of biological knowledge and its inconsistency with gene expression data are not considered in the original NCA solution, which could greatly affect the inference accuracy. To overcome these limitations, we propose a linear extraction scheme, namely regulatory component analysis (RCA), to infer underlying regulatory signals even with partial biological knowledge. Numerical simulations show a significant improvement of our proposed RCA over NCA, not only when signal-to-noise-ratio (SNR) is low, but also when the given biological knowledge is incomplete and inconsistent to gene expression data. Furthermore, real biological experiments on E. coli are performed for regulatory network inference in comparison with several typical linear latent variable methods, which again demonstrates the effectiveness and improved performance of the proposed algorithm.
Transcriptional regulatory network inference; Source extraction; Gene expression; Genomic signal processing
The rapid progress of molecular biology tools for directed genetic modifications, accurate quantitative experimental approaches, high-throughput measurements, together with development of genome sequencing has made the foundation for a new area of metabolic engineering that is driven by metabolic models. Systematic analysis of biological processes by means of modelling and simulations has made the identification of metabolic networks and prediction of metabolic capabilities under different conditions possible. For facilitating such systemic analysis, we have developed the BioMet Toolbox, a web-based resource for stoichiometric analysis and for integration of transcriptome and interactome data, thereby exploiting the capabilities of genome-scale metabolic models. The BioMet Toolbox provides an effective user-friendly way to perform linear programming simulations towards maximized or minimized growth rates, substrate uptake rates and metabolic production rates by detecting relevant fluxes, simulate single and double gene deletions or detect metabolites around which major transcriptional changes are concentrated. These tools can be used for high-throughput in silico screening and allows fully standardized simulations. Model files for various model organisms (fungi and bacteria) are included. Overall, the BioMet Toolbox serves as a valuable resource for exploring the capabilities of these metabolic networks. BioMet Toolbox is freely available at www.sysbio.se/BioMet/.
Matlab, a powerful and productive language that allows for rapid prototyping, modeling and simulation, is widely used in computational biology. Modeling and simulation of large biological systems often require more computational resources then are available on a single computer. Existing distributed computing environments like the Distributed Computing Toolbox, MatlabMPI, Matlab*G and others allow for the remote (and possibly parallel) execution of Matlab commands with varying support for features like an easy-to-use application programming interface, load-balanced utilization of resources, extensibility over the wide area network, and minimal system administration skill requirements. However, all of these environments require some level of access to participating machines to manually distribute the user-defined libraries that the remote call may invoke.
mGrid augments the usual process distribution seen in other similar distributed systems by adding facilities for user code distribution. mGrid's client-side interface is an easy-to-use native Matlab toolbox that transparently executes user-defined code on remote machines (i.e. the user is unaware that the code is executing somewhere else). Run-time variables are automatically packed and distributed with the user-defined code and automated load-balancing of remote resources enables smooth concurrent execution. mGrid is an open source environment. Apart from the programming language itself, all other components are also open source, freely available tools: light-weight PHP scripts and the Apache web server.
Transparent, load-balanced distribution of user-defined Matlab toolboxes and rapid prototyping of many simple parallel applications can now be done with a single easy-to-use Matlab command. Because mGrid utilizes only Matlab, light-weight PHP scripts and the Apache web server, installation and configuration are very simple. Moreover, the web-based infrastructure of mGrid allows for it to be easily extensible over the Internet.
An Arabidopsis thaliana transcriptional network reveals regulatory mechanisms for the control of genes related to stress adaptation.
Understanding the molecular mechanisms plants have evolved to adapt their biological activities to a constantly changing environment is an intriguing question and one that requires a systems biology approach. Here we present a network analysis of genome-wide expression data combined with reverse-engineering network modeling to dissect the transcriptional control of Arabidopsis thaliana. The regulatory network is inferred by using an assembly of microarray data containing steady-state RNA expression levels from several growth conditions, developmental stages, biotic and abiotic stresses, and a variety of mutant genotypes.
We show that the A. thaliana regulatory network has the characteristic properties of hierarchical networks. We successfully applied our quantitative network model to predict the full transcriptome of the plant for a set of microarray experiments not included in the training dataset. We also used our model to analyze the robustness in expression levels conferred by network motifs such as the coherent feed-forward loop. In addition, the meta-analysis presented here has allowed us to identify regulatory and robust genetic structures.
These data suggest that A. thaliana has evolved high connectivity in terms of transcriptional regulation among cellular functions involved in response and adaptation to changing environments, while gene networks constitutively expressed or less related to stress response are characterized by a lower connectivity. Taken together, these findings suggest conserved regulatory strategies that have been selected during the evolutionary history of this eukaryote.
The problem of identifying dynamics of biological networks is of critical importance in order to understand biological systems. In this article, we propose a data-driven inference scheme to identify temporally evolving network representations of genetic networks. In the formulation of the optimization problem, we use an adjacency map as a priori information and define a cost function that both drives the connectivity of the graph to match biological data as well as generates a sparse and robust network at corresponding time intervals. Through simulation studies of simple examples, it is shown that this optimization scheme can help capture the topological change of a biological signaling pathway, and furthermore, might help to understand the structure and dynamics of biological genetic networks.
inference of dynamic models; temporally evolving networks; gene regulatory networks