With the growing importance of computational models in systems biology, there has been much interest in recent years in developing standard model interchange languages that permit biologists to easily exchange models between different software tools. In this chapter, two chief model exchange standards, SBML and CellML, are described. In addition, other related features including visual layout initiatives, ontologies and best practices for model annotation are discussed. Software tools such as developer libraries and basic editing tools are also introduced, together with a discussion on the future of modeling languages and visualization tools in systems biology.
SBML; CellML; Ontology; MIRIAM; SBGN; TEDDY; MIASE; Standards; libSBML; SBMLEditor; Biomodels.net; SBO
A great variety of software applications are now employed in the metabolic engineering field. These applications have been created to support a wide range of experimental and analysis techniques. Computational tools are utilized throughout the metabolic engineering workflow to extract and interpret relevant information from large data sets, to present complex models in a more manageable form, and to propose efficient network design strategies. In this review, we present a number of tools that can assist in modifying and understanding cellular metabolic networks. The review covers seven areas of relevance to metabolic engineers. These include metabolic reconstruction efforts, network visualization, nucleic acid and protein engineering, metabolic flux analysis, pathway prospecting, post-structural network analysis and culture optimization. The list of available tools is extensive and we can only highlight a small, representative portion of the tools from each area.
metabolic engineering; software; genome-scale metabolic networks; network visualization
Many biological studies are carried out on large populations of cells, often in order to obtain enough material to make measurements. However, we now know that noise is endemic in biological systems and this results in cell-to-cell variability in what appears to be a population of identical cells. Although often neglected, this noise can have a dramatic effect on system responses to environmental cues with significant and often counter-intuitive biological outcomes. A recent study in BMC Systems Biology provides an example of this, documenting a bimodal distribution of activated extracellular signal-regulated kinase in a population of cells exposed to epidermal growth factor and demonstrating that the observed bimodality of the response is induced purely by noise.
See research article: http://www.biomedcentral.com/1752-0509/6/109
One problem with synthetic genes in genetically engineered organisms is that these foreign DNAs will eventually lose their functions over evolutionary time in the absence of selective pressures. This general limitation can restrain the long-term study and industrial application of synthetic genetic circuits. Previous studies have shown that because of their crucial regulatory functions, prokaryotic promoters in synthetic genetic circuits are especially vulnerable to mutations. In this study, we rationally designed robust bidirectional promoters (BDPs), which are self-protected through the complementarity of their overlapping forward and backward promoter sequences on the DNA duplex. When the transcription of a target non-essential gene (e.g. green fluorescent protein) was coupled to the transcription of an essential gene (e.g. antibiotic resistance gene) through the BDP, the evolutionary half-time of the gene of interest increased 4–10 times, depending on the strain and experimental conditions used. This design of using BDPs to increase the mutational stability of genetic circuits can potentially be applied to synthetic biology applications in general.
Genetically identical cells can show phenotypic variability. This is often caused by stochastic events that originate from randomness in biochemical processes involved in gene expression and other extrinsic cellular processes. From an engineering perspective, there have been efforts focused on theory and experiments to control noise levels by perturbing and replacing gene network components. However, systematic methods for noise control are lacking, mainly due to the intractable mathematical structure of noise propagation through reaction networks. Here, we provide a numerical analysis method by quantifying the parametric sensitivity of noise characteristics at the level of the linear noise approximation. Our analysis is readily applicable to various types of noise control and to different types of system; for example, we can orthogonally control the mean and noise levels and can control system dynamics such as noisy oscillations. As an illustration, we applied our method to HIV and yeast gene expression systems and metabolic networks. The oscillatory signal control was applied to p53 oscillations from DNA damage. Furthermore, we showed that the efficiency of orthogonal control can be enhanced by applying extrinsic noise and feedback. Our noise control analysis can be applied to any stochastic model belonging to continuous time Markovian systems such as biological and chemical reaction systems, and even computer and social networks. We anticipate the proposed analysis to be a useful tool for designing and controlling synthetic gene networks.
Stochastic gene expression at the single cell level can lead to significant phenotypic variation at the population level. To obtain a desired phenotype, the noise levels of intracellular protein concentrations may need to be tuned and controlled. Noise levels often decrease in relative amount as the mean values increase. This implies that the noise levels can be passively controlled by changing the mean values. From an engineering perspective, the noise levels can be further controlled while the mean values are simultaneously adjusted to desired values. Here, systematic schemes for such simultaneous control are described by identifying where and by how much the system needs to be perturbed. The schemes can be applied to the design process of a potential therapeutic HIV drug that targets a certain set of reactions, identified by the proposed analysis, to prevent stochastic transition to the lytic state. In some cases, when the noise levels change strongly with the mean values, the simultaneous control cannot be performed efficiently. This problem is shown to be resolved by applying extra noise and feedback.
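The statement that noise levels often decrease in relative amount as the mean increases can be illustrated with the simplest textbook case, a birth-death process with constant production and first-order degradation. The sketch below is not the model analyzed in the study; the rate names `k_prod` and `k_deg`, the truncation `n_max` and the chosen rate values are assumptions made for illustration. It computes the stationary distribution from detailed balance and reports the coefficient of variation, which for this process equals one over the square root of the mean.

```python
import math

def stationary_distribution(k_prod, k_deg, n_max=5000):
    """Stationary distribution of a birth-death process on 0..n_max.

    Detailed balance gives k_prod * p[n] = k_deg * (n + 1) * p[n + 1].
    The recurrence is evaluated in log space to avoid overflow.
    """
    log_ratio = math.log(k_prod / k_deg)
    log_p = [0.0]
    for n in range(n_max):
        log_p.append(log_p[-1] + log_ratio - math.log(n + 1))
    peak = max(log_p)
    weights = [math.exp(lp - peak) for lp in log_p]
    total = sum(weights)
    return [w / total for w in weights]

def mean_and_cv(p):
    """Return the mean copy number and the coefficient of variation."""
    mean = sum(n * pn for n, pn in enumerate(p))
    var = sum((n - mean) ** 2 * pn for n, pn in enumerate(p))
    return mean, math.sqrt(var) / mean

if __name__ == "__main__":
    # Hypothetical production rates; degradation rate fixed at 1.
    for k_prod in (10.0, 100.0, 1000.0):
        mean, cv = mean_and_cv(stationary_distribution(k_prod, 1.0))
        print(f"mean = {mean:8.2f}   CV = {cv:.4f}")
```

In this toy case the mean and the relative noise cannot be set independently, which is the situation in which a systematic scheme for simultaneous control becomes relevant.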
Motivation: The SBML Render Extension enables coloring and shape information of biochemical models to be stored in the Systems Biology Markup Language (SBML). Rendering of this stored graphical information in a portable and well-supported system such as TeX would be useful for researchers preparing documentation and presentations. In addition, since the Render Extension is not yet supported by many applications, it is helpful for such rendering functionality to be extended to the more popular CellDesigner annotation as well.
Results: SBML2TikZ supports automatic generation of graphics for biochemical models in the popular TeX typesetting system. The library generates a script of TeX macro commands for the vector graphics languages PGF/TikZ that can be compiled into scalable vector graphics described in a model.
Availability: Source code, documentation and compiled binaries for the SBML2TikZ library can be found at http://www.sbml2tikz.org. In addition, a web application is available at http://www.sys-bio.org/layout
Supplementary information: Supplementary data are available at Bioinformatics online.
We have created the Knowledgebase of Standard Biological Parts (SBPkb) as a publicly accessible Semantic Web resource for synthetic biology (sbolstandard.org). The SBPkb allows researchers to query and retrieve standard biological parts for research and use in synthetic biology. Its initial version includes all of the information about parts stored in the Registry of Standard Biological Parts (partsregistry.org). SBPkb transforms this information so that it is computable, using our semantic framework for synthetic biology parts. This framework, known as SBOL-semantic, was built as part of the Synthetic Biology Open Language (SBOL), a project of the Synthetic Biology Data Exchange Group. SBOL-semantic represents commonly used synthetic biology entities, and its purpose is to improve the distribution and exchange of descriptions of biological parts. In this paper, we describe the data and our methods for transformation to SBPkb, and finally, we demonstrate the value of our knowledgebase with a set of sample queries. We use RDF technology and SPARQL queries to retrieve candidate “promoter” parts that are known to be both negatively and positively regulated. This method provides new web-based data access for performing part searches that are not currently possible.
In synthetic biology, gene regulatory circuits are often constructed by combining smaller circuit components. Connections between components are achieved by transcription factors acting on promoters. If the individual components behave as true modules and certain module interface conditions are satisfied, the function of the composite circuits can in principle be predicted.
In this paper, we investigate one of the interface conditions: fan-out. We quantify the fan-out, a concept widely used in electrical engineering, to indicate the maximum number of downstream inputs that an upstream output transcription factor can regulate. The fan-out is shown to be closely related to the retroactivity studied by Del Vecchio et al. An efficient operational method for measuring the fan-out is proposed and shown to be applicable to various types of module interfaces. The fan-out is also shown to be enhanced by self-inhibitory regulation on the output. The potential role of an inhibitory regulation is discussed.
The proposed estimation method for fan-out not only provides an experimentally efficient way for quantifying the level of modularity in gene regulatory circuits but also helps characterize and design module interfaces, enabling the modular construction of gene circuits.
One problem with engineered genetic circuits in synthetic microbes is their stability over evolutionary time in the absence of selective pressure. Since the design of a selective environment for maintaining the function of a circuit will be unique to every circuit, general design principles are needed for engineering evolutionarily robust circuits that permit the long-term study or applied use of synthetic circuits.
We first measured the stability of two BioBrick-assembled genetic circuits propagated in Escherichia coli over multiple generations and the mutations that caused their loss-of-function. The first circuit, T9002, loses function in less than 20 generations and the mutation that repeatedly causes its loss-of-function is a deletion between two homologous transcriptional terminators. To measure the effect of transcriptional terminator homology levels on evolutionary stability, we re-engineered six versions of T9002 with a different transcriptional terminator at the end of the circuit. When there is no homology between terminators, the evolutionary half-life of this circuit is significantly improved over 2-fold and is independent of the expression level. Removing homology between terminators and decreasing expression level 4-fold increases the evolutionary half-life over 17-fold. The second circuit, I7101, loses function in less than 50 generations due to a deletion between repeated operator sequences in the promoter. This circuit was re-engineered with different promoters from a promoter library and with a kanamycin resistance gene (kanR) within the circuit to put a selective pressure on the promoter. The evolutionary stability dynamics and loss-of-function mutations in all these circuits are described. We also found that, on average, evolutionary half-life decreases exponentially with increasing expression levels.
A wide variety of loss-of-function mutations are observed in BioBrick-assembled genetic circuits including point mutations, small insertions and deletions, large deletions, and insertion sequence (IS) element insertions that often occur in the scar sequence between parts. Promoter mutations are selected for more than any other biological part. Genetic circuits can be re-engineered to be more evolutionarily robust with a few simple design principles: high expression of genetic circuits comes at the cost of low evolutionary stability, repeated sequences should be avoided, and the use of inducible promoters increases stability. Inclusion of an antibiotic resistance gene within the circuit does not ensure evolutionary stability.
Motivation: Model exchange in systems and synthetic biology has been standardized for computers with the Systems Biology Markup Language (SBML) and CellML, but specialized software is needed for the generation of models in these formats. Text-based model definition languages allow researchers to create models simply, and then export them to a common exchange format. Modular languages allow researchers to create and combine complex models more easily. We saw a use for a modular text-based language, together with a translation library to allow other programs to read the models as well.
Summary: The Antimony language provides a way for a researcher to use simple text statements to create, import, and combine biological models, allowing complex models to be built from simpler models, and provides a special syntax for the creation of modular genetic networks. The libAntimony library allows other software packages to import these models and convert them either to SBML or their own internal format.
Availability: The Antimony language specification and the libAntimony library are available under a BSD license from http://antimony.sourceforge.net/
Synthetic biology is an engineering discipline that builds on modeling practices from systems biology and wet-lab techniques from genetic engineering. As synthetic biology advances, efficient procedures will be developed that will allow a synthetic biologist to design, analyze and build biological networks. In this idealized pipeline, computer-aided design (CAD) is a necessary component. The role of a CAD application would be to allow efficient transition from a general design to a final product. TinkerCell is a design tool for serving this purpose in synthetic biology. In TinkerCell, users build biological networks using biological parts and modules. The network can be analyzed using one of several functions provided by TinkerCell or custom programs from third-party sources. Since best practices for modeling and constructing synthetic biology networks have not yet been established, TinkerCell is designed as a flexible and extensible application that can adjust itself to changes in the field.
synthetic biology; modeling; software; CAD; simulation; systems biology; design; standards; computational
Genetic circuits can be assembled from standardized biological parts called BioBricks. Examples of BioBricks include promoters, ribosome-binding sites, coding sequences and transcriptional terminators. Standard BioBrick assembly normally involves restriction enzyme digestion and ligation of two BioBricks at a time. The method described here is an alternative assembly strategy that allows for two or more PCR-amplified BioBricks to be quickly assembled and re-engineered using the Clontech In-Fusion PCR Cloning Kit. This method allows for a large number of parallel assemblies to be performed and is a flexible way to mix and match BioBricks. In-Fusion assembly can be semi-standardized by the use of simple primer design rules that minimize the time involved in planning assembly reactions. We describe the success rate and mutation rate of In-Fusion assembled genetic circuits using various homology and primer lengths. We also demonstrate the success and flexibility of this method with six specific examples of BioBrick assembly and re-engineering. These examples include assembly of two basic parts, part swapping, a deletion, an insertion, and three-way In-Fusion assemblies.
A wide range of research areas in molecular biology and medical biochemistry require a reliable enzyme classification system, e.g., drug design, metabolic network reconstruction and systems biology. When research scientists in the above-mentioned areas wish to unambiguously refer to an enzyme and its function, the EC number introduced by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB) is used. However, each and every one of these applications is critically dependent upon the consistency and reliability of the underlying data for success. We have developed tools for the validation of the EC number classification scheme. In this paper, we present validated data of 3788 enzymatic reactions including 229 sub-subclasses of the EC classification system. Over 80% agreement was found between our assignment and the EC classification. For 61 (i.e., only 2.5%) reactions we found that their assignment was inconsistent with the rules of the nomenclature committee; they have to be transferred to other sub-subclasses. We demonstrate that our validation results can be used to initiate corrections and improvements to the EC number classification scheme.
A fundamental understanding of metabolism in organisms, which can only be achieved by integrated studies of their biology using a systems biology approach, will aid in the design of future metabolic engineering strategies. Metabolic network reconstruction provides insight into the molecular mechanisms of a particular organism. An annotated genome containing the specific metabolic genes found in a particular organism can be used to reconstruct its metabolic network. The correlation between the genome and metabolism is made by searching gene databases or by searching protein databases with a known EC number in order to find the associated gene. The success of the search process is critically dependent upon the consistency and reliability of the underlying data. Therefore, we have developed tools which can be used to identify wrong or inconsistent classifications of enzymes and help to remove them from the relevant search databases.
Cytoplasmic transport of organelles, nucleic acids and proteins on microtubules is usually bidirectional with dynein and kinesin motors mediating the delivery of cargoes in the cytoplasm. Here we combine live cell microscopy, single virus tracking and trajectory segmentation to systematically identify the parameters of a stochastic computational model of cargo transport by molecular motors on microtubules. The model parameters are identified using an evolutionary optimization algorithm to minimize the Kullback-Leibler divergence between the in silico and the in vivo run length and velocity distributions of the viruses on microtubules. The present stochastic model suggests that bidirectional transport of human adenoviruses can be explained without explicit motor coordination. The model enables the prediction of the number of motors active on the viral cargo during microtubule-dependent motions as well as the number of motor binding sites, with the protein hexon as the binding site for the motors.
Molecular motors, due to their transportation function, are essential to the cell, but they are often hijacked by viruses to reach their replication site. Imaging of virus trajectories provides information about the patterns of virus transport in the cytoplasm, leading to improved understanding of the underlying mechanisms. In turn improved understanding may suggest actions that can be taken to interfere with the transport of pathogens in the cell. In this work we use in vivo imaging of virus trajectories to develop a computational model of virus transport in the cell. The model parameters are identified by an optimization procedure to minimize the discrepancy between in vivo and in silico trajectories. The model explains the in vivo trajectories as the result of a stochastic interaction between motors. Furthermore it enables predictions on the number of motors and binding sites on pathogens, quantities that are difficult to obtain experimentally. Beyond the understanding of mechanisms involved in pathogen transport, the present paper introduces a systematic parameter identification algorithm for stochastic models using in vivo imaging. The discrete and noisy characteristics of biological systems have led to increased attention in stochastic models and this work provides a methodology for their systematic development.
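The objective used for parameter identification above, the Kullback-Leibler divergence between simulated and measured distributions, can be sketched for binned data as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the bin edges, the pseudocount and the direction of the divergence (measured relative to simulated) are all choices made here, and any run-length values passed in would be placeholders rather than data from the study.

```python
import math

def normalized_histogram(samples, edges, pseudocount=1e-9):
    """Bin samples into [edges[i], edges[i+1]) and normalize to sum to 1.

    A small pseudocount keeps every bin strictly positive so that the
    divergence stays finite; samples outside the edges are ignored.
    """
    counts = [pseudocount] * (len(edges) - 1)
    for x in samples:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1.0
                break
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def objective(in_vivo_samples, in_silico_samples, edges):
    """Discrepancy that an optimizer would try to minimize."""
    p = normalized_histogram(in_vivo_samples, edges)
    q = normalized_histogram(in_silico_samples, edges)
    return kl_divergence(p, q)
```

An evolutionary optimizer would call `objective` once per candidate parameter set, with `in_silico_samples` drawn from a stochastic simulation run with those parameters. Because the divergence is asymmetric, the choice of direction affects which mismatches are penalized most.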
Probably one of the most characteristic features of a living system is its continual propensity to change as it juggles the demands of survival with the need to replicate. Internally these changes are manifest as changes in metabolite, protein and gene activities. Such changes have become increasingly obvious to experimentalists with the advent of high-throughput technologies. In this chapter we highlight some of the quantitative approaches used to rationalize the study of cellular dynamics. The chapter focuses attention on the analysis of quantitative models based on differential equations using biochemical control theory. Basic pathway motifs are discussed, including straight chain, branched and cyclic systems. In addition, some of the properties conferred by positive and negative feedback loops are discussed particularly in relation to bistability and oscillatory dynamics.
Motifs; control analysis; stability; dynamic models
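As a small worked example of biochemical control analysis on a straight chain, the sketch below computes flux control coefficients for a two-step pathway X0 -> S -> X1 and checks that they sum to one, as the flux summation theorem requires. The rate laws (a reversible mass-action first step and an irreversible second step, each scaled by an enzyme level), the parameter values and the function names are assumptions chosen for illustration and are not taken from the chapter.

```python
def steady_state_flux(e1, e2, x0=1.0, k1=2.0, k1r=1.0, k2=3.0):
    """Steady-state flux of X0 -> S -> X1.

    Assumed rate laws: v1 = e1 * (k1 * x0 - k1r * s) and v2 = e2 * k2 * s.
    Setting v1 = v2 gives the steady-state concentration s in closed form.
    """
    s = e1 * k1 * x0 / (e1 * k1r + e2 * k2)
    return e2 * k2 * s

def flux_control_coefficient(index, enzymes, rel_step=1e-6):
    """Scaled sensitivity (dJ/de_i) * (e_i / J) by central differences."""
    up = list(enzymes)
    down = list(enzymes)
    up[index] *= 1.0 + rel_step
    down[index] *= 1.0 - rel_step
    flux = steady_state_flux(*enzymes)
    delta = steady_state_flux(*up) - steady_state_flux(*down)
    return delta / (2.0 * rel_step * flux)

if __name__ == "__main__":
    enzymes = (1.0, 1.0)  # hypothetical enzyme levels
    c1 = flux_control_coefficient(0, enzymes)
    c2 = flux_control_coefficient(1, enzymes)
    print(f"C1 = {c1:.4f}, C2 = {c2:.4f}, sum = {c1 + c2:.4f}")
```

With these assumed parameters, the first step carries three quarters of the flux control and the second step one quarter, so control is shared between the steps rather than residing in a single rate-limiting one.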
Synthetic biology brings together concepts and techniques from engineering and biology. In this field, computer-aided design (CAD) is necessary in order to bridge the gap between computational modeling and biological data. Using a CAD application, it would be possible to construct models using available biological "parts" and directly generate the DNA sequence that represents the model, thus increasing the efficiency of design and construction of synthetic networks.
An application named TinkerCell has been developed in order to serve as a CAD tool for synthetic biology. TinkerCell is a visual modeling tool that supports a hierarchy of biological parts. Each part in this hierarchy consists of a set of attributes that define the part, such as sequence or rate constants. Models that are constructed using these parts can be analyzed using various third-party C and Python programs that are hosted by TinkerCell via an extensive C and Python application programming interface (API). TinkerCell supports the notion of modules, which are networks with interfaces. Such modules can be connected to each other, forming larger modular networks. TinkerCell is a free and open-source project under the Berkeley Software Distribution license. Downloads, documentation, and tutorials are available at .
An ideal CAD application for engineering biological systems would provide features such as: building and simulating networks, analyzing robustness of networks, and searching databases for components that meet the design criteria. At the current state of synthetic biology, there are no established methods for measuring robustness or identifying components that fit a design. The same is true for databases of biological parts. TinkerCell's flexible modeling framework allows it to cope with changes in the field. Such changes may involve the way parts are characterized or the way synthetic networks are modeled and analyzed computationally. TinkerCell can readily accept third-party algorithms, allowing it to serve as a platform for testing different methods relevant to synthetic biology.
Recognizing that certain biological functions can be associated with specific DNA sequences has led various fields of biology to adopt the notion of the genetic part. This concept provides a finer level of granularity than the traditional notion of the gene. However, a method of formally relating how a set of parts relates to a function has not yet emerged. Synthetic biology both demands such a formalism and provides an ideal setting for testing hypotheses about relationships between DNA sequences and phenotypes beyond the gene-centric methods used in genetics. Attribute grammars are used in computer science to translate the text of a program source code into the computational operations it represents. By associating attributes with parts, modifying the value of these attributes using rules that describe the structure of DNA sequences, and using a multi-pass compilation process, it is possible to translate DNA sequences into molecular interaction network models. These capabilities are illustrated by simple example grammars expressing how gene expression rates are dependent upon single or multiple parts. The translation process is validated by systematically generating, translating, and simulating the phenotype of all the sequences in the design space generated by a small library of genetic parts. Attribute grammars represent a flexible framework connecting parts with models of biological function. They will be instrumental for building mathematical models of libraries of genetic constructs synthesized to characterize the function of genetic parts. This formalism is also expected to provide a solid foundation for the development of computer assisted design applications for synthetic biology.
Deciphering the genetic code has been one of the major milestones in our understanding of how genetic information is stored in DNA sequences. However, only part of the genetic information is captured by the simple rules describing the correspondence between genes and proteins. The molecular mechanisms of gene expression are now understood well enough to recognize that DNA sequences are rich in functional blocks that do not code for proteins. It has proved difficult to express the function of these genetic parts in a computer-readable format that could be used to predict the emerging behavior of DNA sequences combining multiple interacting parts. We show that methods used by computer scientists to develop programming languages can be applied to DNA sequences. They provide a framework to: 1) express the biological functions of genetic parts, 2) describe how these functions depend on the context in which the parts are placed, and 3) translate DNA sequences composed of multiple parts into a model predicting how the DNA sequence will behave in vivo. Our approach provides a formal representation of how the biological function of genetic parts can be used to assist in the engineering of synthetic DNA sequences by automatically generating models of the design for analysis.
Motivation: Simulations are an essential tool when analyzing biochemical networks. Researchers and developers seeking to refine simulation tools or develop new ones would benefit greatly from being able to compare their simulation results.
Summary: We present an approach to compare simulation results between several SBML capable simulators and provide a website for the community to share simulation results.
Availability: The website with simulation results and additional material can be found under: http://sys-bio.org/sbwWiki/compare. The software used to generate the simulation results is available on the website for download.
The slime mold Dictyostelium discoideum is one of the model systems of biological pattern formation. One of the most successful answers to the challenge of establishing a spiral wave pattern in a colony of homogeneously distributed D. discoideum cells has been the suggestion of a developmental path the cells follow (Lauzeral and coworkers). This is a well-defined change in properties each cell undergoes on a longer time scale than the typical dynamics of the cell. Here we show that this concept leads to an inhomogeneous and systematic spatial distribution of spiral waves, which can be predicted from the distribution of cells on the developmental path. We propose specific experiments for checking whether such systematics are also found in data and thus, indirectly, provide evidence of a developmental path.
Spatio-temporal pattern formation is a core discipline of theoretical biology. Formation of large-scale patterns from local interactions can very prominently be observed in the swarming behavior of fish and birds, in animal markings or bacterial growth patterns. It also plays a critical role in the life cycle of the social amoeba Dictyostelium discoideum. A homogeneous colony of amoebae is partitioned into subgroups that will form migrating slugs by a collective phase of chemotactic signaling, exhibiting typical and well-known patterns for this sort of excitable dynamics (circular and spiral waves). The mechanism of spatial localization of aggregation centers (that is, the centers of periodic circular and spiral waves) is unclear, despite its crucial role in the organism's procreation. Here we demonstrate for an established computational model of D. discoideum that the initial properties of potentially very few cells have a driving influence on the resulting asymptotic collective state of the colony. Analogous processes take place in diverse situations, for example in heart cells (where spiral waves occur in potentially fatal ventricular fibrillation), so that a deeper understanding of this additional layer of self-organized pattern formation would be beneficial to a wide range of applications.
Synthetic biology is a useful tool to investigate the dynamics of small biological networks and to assess our capacity to predict their behavior from computational models. In this work we report the construction of three different synthetic networks in Escherichia coli based upon the incoherent feed-forward loop architecture. The steady state behavior of the networks was investigated experimentally and computationally under different mutational regimes in a population-based assay. Our data show that the three incoherent feed-forward networks, using three different macromolecular inhibitory elements, reproduce the behavior predicted from our computational model. We also demonstrate that specific biological motifs can be designed to generate similar behavior using different components. In addition, we show how it is possible to tune the behavior of the networks in a predictable manner by applying suitable mutations to the inhibitory elements.
Feed-forward networks; Modules; Simulation; Synthetic biology
Just as complex electronic circuits are built from simple Boolean gates, diverse biological functions, including signal transduction, differentiation, and stress response, frequently use biochemical switches as a functional module. A relatively small number of such switches have been described in the literature, and these exhibit considerable diversity in chemical topology. We asked if biochemical switches are indeed rare and if there are common chemical motifs and family relationships among such switches. We performed a systematic exploration of chemical reaction space by generating all possible stoichiometrically valid chemical configurations up to 3 molecules and 6 reactions and up to 4 molecules and 3 reactions. We used Monte Carlo sampling of parameter space for each such configuration to generate specific models and checked each model for switching properties. We found nearly 4,500 reaction topologies, or about 10% of our tested configurations, that demonstrate switching behavior. Commonly accepted topological features such as feedback were poor predictors of bistability, and we identified new reaction motifs that were likely to be found in switches. Furthermore, the discovered switches were related in that most of the larger configurations were derived from smaller ones by addition of one or more reactions. To explore even larger configurations, we developed two tools: the “bistabilizer,” which converts almost-bistable systems into bistable ones, and frequent motif mining, which helps rank untested configurations. Both of these tools increased the coverage of our library of bistable systems. Thus, our systematic exploration of chemical reaction space has produced a valuable resource for investigating the key signaling motif of bistability.
How does a cell know what type of cell it is supposed to become? How do external chemical signals change the underlying “state” of the cell? How are response pathways triggered on the application of a stress? Such questions of differentiation, signal transduction, and stress response, while seemingly diverse, all pertain to the storage of state information, or “memory,” by biochemical switches. Just as a computer memory unit can store a bit of 0 or 1 through electrical signals, a biochemical switch can be in one of two states, where chemical signals are on or off. This lets the cell record the presence/absence of an environmental stimulus, the level of a signaling molecule, or the result of a cell fate decision. There are a small number of published ways by which a group of chemical reactions come together to realize a switch. We undertook an exhaustive computational exploration to see if chemical switches are indeed rare and found, surprisingly, that they are actually abundant, highly diverse, but related to one another. Our catalog of switches opens up new bioinformatics approaches to understanding cellular decision making and cellular memory.
Gene regulatory networks are perhaps the most important organizational level in the cell where signals from the cell state and the outside environment are integrated in terms of activation and inhibition of genes. For the last decade, the study of such networks has been fueled by large-scale experiments and renewed attention from the theoretical field. Different models have been proposed to, for instance, investigate expression dynamics, explain the network topology we observe in bacteria and yeast, and analyze the evolvability and robustness of such networks. Yet how these gene regulatory networks evolve and become evolvable remains an open question.
An individual-oriented evolutionary model is used to shed light on this matter. Each individual has a genome from which its gene regulatory network is derived. Mutations, such as gene duplications and deletions, alter the genome, while the resulting network determines the gene expression pattern and hence fitness. With this protocol we let a population of individuals evolve under Darwinian selection in an environment that changes through time.
Our work demonstrates that long-term evolution of complex gene regulatory networks in a changing environment can lead to a striking increase in the efficiency of generating beneficial mutations. We show that the population evolves towards genotype-phenotype mappings that allow for an orchestrated network-wide change in the gene expression pattern, requiring only a few specific gene indels. The genes involved are hubs of the networks or directly influence the hubs. Moreover, throughout the evolutionary trajectory the networks maintain their mutational robustness. In other words, evolution in an alternating environment leads to a network that is sensitive to a small class of beneficial mutations, while the majority of mutations remain neutral: an example of evolution of evolvability.
A cell receives signals both from its internal and external environment and responds by changing the expression of genes. In this manner the cell adjusts to heat, osmotic pressures and other circumstances during its lifetime. Over long timescales, the network of interacting genes and its regulatory actions also undergo evolutionary adaptation. Yet how do such networks evolve and become adapted?
In this paper we describe the study of a simple model of gene regulatory networks, focusing solely on evolutionary adaptation. We let a population of individuals evolve, while the external environment changes through time. To ensure evolution is the only source of adaptation, we do not provide the individuals with a sensor to the environment. We show that the interplay between the long-term process of evolution and short-term gene regulation dynamics leads to a striking increase in the efficiency of creating well-adapted offspring. Beneficial mutations become more frequent; nevertheless, robustness to the majority of mutations is maintained. Thus we demonstrate a clear example of the evolution of evolvability.
Reconstructions of cellular metabolism are publicly available for a variety of different microorganisms and some mammalian genomes. To date, these reconstructions are “genome-scale” and strive to include all reactions implied by the genome annotation, as well as those with direct experimental evidence. Clearly, many of the reactions in a genome-scale reconstruction will not be active under particular conditions or in a particular cell type. Methods to tailor these comprehensive genome-scale reconstructions into context-specific networks will aid predictive in silico modeling for a particular situation. We present a method called Gene Inactivity Moderated by Metabolism and Expression (GIMME) to achieve this goal. The GIMME algorithm uses quantitative gene expression data and one or more presupposed metabolic objectives to produce the context-specific reconstruction that is most consistent with the available data. Furthermore, the algorithm provides a quantitative inconsistency score indicating how consistent a set of gene expression data is with a particular metabolic objective. We show that this algorithm produces results consistent with biological experiments and intuition for adaptive evolution of bacteria, rational design of metabolic engineering strains, and human skeletal muscle cells. This work represents progress towards producing constraint-based models of metabolism that are specific to the conditions where the expression profiling data is available.
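To make the idea of an inconsistency score concrete, the following sketch evaluates one for a given flux distribution. The abstract does not state the formula, so the weighting used here is an assumption: each reaction whose expression falls below a cutoff is penalized by its flux magnitude times the shortfall. The full method would also solve an optimization problem to find the flux distribution that minimizes this score while meeting the metabolic objective, which is omitted here. The function name, argument names, and toy values are hypothetical.

```python
def inconsistency_score(fluxes, expression, threshold):
    """Sum of |flux| * (threshold - expression) over below-threshold reactions.

    fluxes:     dict mapping reaction id -> flux value (sign gives direction)
    expression: dict mapping reaction id -> expression level of its gene(s)
    threshold:  expression cutoff below which a reaction counts as "inactive"
    Reactions without expression data carry no penalty (assumed convention).
    """
    score = 0.0
    for rxn, flux in fluxes.items():
        level = expression.get(rxn)
        if level is None:
            continue
        penalty = max(0.0, threshold - level)  # zero when expression is high enough
        score += penalty * abs(flux)
    return score

# Hypothetical toy data: three reactions, one without expression data.
toy_fluxes = {"r1": 2.0, "r2": -1.0, "r3": 5.0, "r4": 3.0}
toy_expression = {"r1": 10.0, "r2": 3.0, "r3": 20.0}
toy_score = inconsistency_score(toy_fluxes, toy_expression, threshold=12.0)
```

A score of zero means the flux distribution uses only reactions whose expression meets the cutoff. Larger values indicate that the assumed objective can only be met by routing flux through poorly expressed reactions.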
Systems biology aims to characterize cells and organisms as systems through the careful curation of all components. Large models that account for all known metabolism in microorganisms have been created by our group and by others around the world. Furthermore, models are available for human cells. These models represent all possible biochemical reactions in a cell, but cells choose which subset of reactions to use to suit their immediate purposes. We have developed a method to combine widely available gene expression data with presupposed cellular functions to predict the subset of reactions that a cell uses under particular conditions. We quantify the consistency of subsets of reactions with existing biological knowledge to demonstrate that the method produces biologically realistic subsets of reactions. This method is useful for determining the activity of metabolic reactions in Escherichia coli and will be essential for understanding human cellular metabolism.
A major challenge in systems biology is to understand how complex and highly connected metabolic networks are organized. The structure of these networks is investigated here by identifying sets of metabolites that have a similar biosynthetic potential. We measure the biosynthetic potential of a particular compound by determining all metabolites that can be produced from it and, following a terminology introduced previously, call this set the scope of the compound. To identify groups of compounds with similar scopes, we apply a hierarchical clustering method. We find that compounds within the same cluster often display similar chemical structures and appear in the same metabolic pathway. For each cluster we define a consensus scope by determining a set of metabolites that is most similar to all scopes within the cluster. This allows for a generalization from scopes of single compounds to scopes of a chemical family. We observe that most of the resulting consensus scopes overlap or are fully contained in others, revealing a hierarchical ordering of metabolites according to their biosynthetic potential. Our investigations show that this hierarchy is not only determined by the chemical complexity of the metabolites, but also strongly by their biological function. As a general tendency, metabolites which are necessary for essential cellular processes exhibit a larger biosynthetic potential than those involved in secondary metabolism. A central result is that chemically very similar substances with different biological functions may differ significantly in their biosynthetic potentials. Our studies provide an important step towards understanding fundamental design principles of metabolic networks determined by the structural and functional complexity of metabolites.
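The notion of a scope can be sketched as a simple network expansion: starting from a seed set, repeatedly fire every reaction whose substrates are all available, and add its products until nothing new appears. The code below is a minimal illustration under stated assumptions. The reaction representation, the toy network, and the use of the Jaccard distance to compare scopes are choices made for this example and are not taken from the study, which may treat cofactors, reversibility, and the similarity measure differently.

```python
def compute_scope(seed, reactions):
    """Return all metabolites producible from `seed` by iterative expansion.

    seed:      iterable of metabolite names available at the start
    reactions: list of (substrates, products) pairs, each a set of names
    """
    available = set(seed)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            # A reaction can fire only when every substrate is available.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, an assumed measure of how different two scopes are."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Hypothetical toy network: A -> B, B + C -> D, D -> E.
toy_reactions = [
    ({"A"}, {"B"}),
    ({"B", "C"}, {"D"}),
    ({"D"}, {"E"}),
]
scope_a = compute_scope({"A"}, toy_reactions)        # C is missing, so D is unreachable
scope_ac = compute_scope({"A", "C"}, toy_reactions)  # every metabolite becomes reachable
```

Pairwise distances between such scopes could then be passed to any standard hierarchical clustering routine to group compounds with similar biosynthetic potential.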
Life is based on the ability of cells to convert raw materials into complex chemicals like proteins or DNA. This ability is obtained through the interplay of a large number of enzymes, which are specialized proteins, each facilitating one specific chemical transformation. Since the products of one reaction can again be substrates for others, the entirety of all reactions forms a large and complex network in which important substances can be produced from many different combinations of simple chemicals and through a variety of pathways. The aim of our work is to gain understanding of the structural design of these networks and the evolutionary principles shaping them. We propose a computational strategy which allows us to pinpoint characteristic structural and functional properties distinguishing networks characterizing living processes from those that may occur in inanimate matter. Our approach reveals an intricate and unexpected hierarchical organization of the network, and gives rise to new hypotheses regarding the evolutionary origins of metabolism.
Circadian clocks are endogenous time-keeping systems that temporally organize biological processes. Gating of cell cycle events by a circadian clock is a universal observation that is currently considered a mechanism serving to protect DNA from diurnal exposure to ultraviolet radiation or other mutagens. In this study, we put forward another possibility: that such gating helps to insulate the circadian clock from perturbations induced by transcriptional inhibition during the M phase of the cell cycle. We introduced a periodic pulse of transcriptional inhibition into a previously published mammalian circadian model and simulated the behavior of the modified model under both constant darkness and light–dark cycle conditions. The simulation results under constant darkness indicated that periodic transcriptional inhibition could entrain/lock the circadian clock just as a light–dark cycle does. At equilibrium states, a transcriptional inhibition pulse of certain periods was always locked close to certain circadian phases where inhibition of Per and Bmal1 mRNA synthesis was most balanced. In a light–dark cycle condition, inhibitions imposed at different parts of a circadian period induced different degrees of perturbation to the circadian clock. When imposed at the middle- or late-night phase, the transcriptional inhibition cycle induced the least perturbation to the circadian clock. The late-night time window of least perturbation overlapped with the experimentally observed time window in which mitosis is most frequent. This supports our hypothesis that the circadian clock gates the cell cycle M phase to certain circadian phases to minimize perturbations induced by the latter. This study reveals the hidden effects of the cell division cycle on the circadian clock and, together with the current picture of genome stability maintenance by circadian gating of the cell cycle, provides a more comprehensive understanding of the phenomenon of circadian gating of the cell cycle.
The circadian clock and the cell cycle are two important biological processes that are essential for nearly all eukaryotes. The circadian clock governs molecular processes and physiological behaviors that follow a 24 h day-night period, while the cell cycle controls the cell division process. It has been widely observed that cell division does not occur randomly across day and night, but instead is normally confined to specific times during day and night. These observations suggest that cell cycle events are gated by the circadian clock. Regarding the biological benefit and rationale for this intriguing gating phenomenon, it has been postulated that circadian gating helps to maintain genome stability by confining radiation-sensitive cell cycle phases to night. Bearing in mind that global transcriptional inhibition occurs at cell division and that transcriptional inhibition shifts circadian phases and periods, we postulate that confining cell division to specific circadian times benefits the circadian clock by removing or minimizing the side effects of cell division on the circadian clock. Our results, based on computational simulation, show that periodic transcriptional inhibition can perturb the circadian clock by altering circadian phases and periods, and that the magnitude of the perturbation is clearly circadian phase dependent. Specifically, transcriptional inhibition initiated at certain circadian phases induced minimal perturbation to the circadian clock. These results support our postulation. Our postulation and results point to the importance of the effect of cell division on the circadian clock in the interaction between the circadian clock and the cell cycle and suggest that it should be considered together with other factors in the exploitation of the circadian-cell cycle interaction, especially the phenomenon of circadian gating of the cell cycle.