

Brief Bioinform. 2010 January; 11(1): 80–95.
Published online 2009 November 11. doi: 10.1093/bib/bbp054
PMCID: PMC2810114

The challenges of informatics in synthetic biology: from biomolecular networks to artificial organisms


The field of synthetic biology holds an inspiring vision for the future; it integrates computational analysis, biological data and the systems engineering paradigm in the design of new biological machines and systems. These biological machines are built from basic biomolecular components analogous to electrical devices, and managing the information flow among these components requires augmenting biological insight with a formal approach to information management. Here we review the informatics challenges in synthetic biology along three dimensions: in silico, in vitro and in vivo. First, we describe the state of the art of in silico support for synthetic biology, from specific data exchange formats to the most popular software platforms and algorithms. Next, we cast in vitro synthetic biology in terms of information flow, and discuss genetic fidelity in DNA manipulation, development strategies for biological parts and the regulation of biomolecular networks. Finally, we explore how the engineering chassis can manipulate biological circuitries in vivo to give rise to future artificial organisms.

Keywords: informatics, synthetic biology, systems biology, networks


The processing and management of information is a critical part of synthetic biology, a field that approaches the design of biologically based machines from a systems engineering perspective, as a complement to systems biology. Whereas systems biology studies how biological parts give rise to the emergent properties and functions of a unified organism, the main goal of synthetic biology is to start with a set of functions and properties, and build a suitable system out of biological components. In other words, systems biology and synthetic biology represent two sides of the same coin: analysis and design [1].

The development of biologically based solutions to human problems is as old as humanity. For thousands of years, humans have bred plants for agriculture, horses for transportation and pets for companionship. Genetic engineering pioneered the use of natural genes to modify organisms. Synthetic biologists also alter natural systems for human use, but with a different approach: they engineer biological systems starting from artificial components. As in systems engineering, biological modules could be developed from an eclectic set of natural sources and rapidly combined into innovations far beyond the reach of incremental, time-consuming adjustments to natural organisms. This departure from traditional biological engineering inspires novel ways to solve age-old problems, such as those in alternative energy [2], drug manufacture [3, 4], therapeutics [5] and green chemistry [6]. In other words, synthetic biology opens the door to unprecedented biochemical flexibility, a marked break from an incremental pattern of progress.

In theory, the synthetic biologist should be able to start with a set of desired features, design a biological circuitry that meets those requirements, and implement that design in vivo. The reality is not so straightforward (Figure 1). The current practice of producing complex biological systems usually requires iterative optimization, partly because biological parts are subject to apoptosis, crosstalk, mutations and perturbations. In addition, a biological component can exhibit context dependence: it can stop working when it is transplanted from its native context into another cell type. Synthetic biological circuitry also suffers from biological noise and undesirable initial conditions. The issues inherent in this field become most apparent when one considers that biological components, when put together, give rise to emergent properties of the whole. The existence of emergent properties indicates that our biological knowledge and design capabilities are not yet sophisticated enough for the a priori design and production of a prototype with a reasonable chance of success.

Figure 1:
The synthetic biology infrastructure. Solid lines indicate the components of synthetic biology and the connections among them. Bold solid lines emphasize the main path from given requirements to finished product. Boxes with thin solid lines indicate support ...

Acknowledging the existence of emergent properties clearly implies the need for a better understanding of systems biology. What is less obvious is that efficiently building a robust infrastructure for synthetic biology requires careful management of relevant information by the research community. Such information includes biological device data exchanged by collaborators, network models exported by software and signals transduced from one biological device to another. The complexity and volume of this information imply an opportunity for synergy through standardized communication. However, reviews of synthetic biology from an informatics perspective are rare; this review addresses that gap in the literature.


Computer-based design and simulation are key elements of synthetic biology, and they require efficient communication both among researchers and among software programs. Taken together, these facts imply the need to standardize synthetic biology data in silico.

Information standards

Most of the efforts in synthetic biology data standardization can be grouped into two areas. One starts with a network perspective; the other takes a ‘bottom-up’ approach that emphasizes the fundamental building block of synthetic biology, the biological part. The dominant parts format appears to be the BioBrick Standard [7] (Figure 2), which is used by the Registry of Standard Biological Parts and the international Genetically Engineered Machine (iGEM) competition [8]. The BioBrick Standard is a set of rules that define features of a DNA sequence so that BioBricks can be easily combined into larger compositions in vitro. In other words, each BioBrick is an easily clonable DNA sequence that codes for a biological part. While the standard addresses the ease of DNA construction, extending the format to support the functional composition of these modules remains an important challenge [9]. The BioBrick format bases its parts characterization on promoter structure and sequences, which does not translate easily into functional characterization within the context of interacting networks [10]. Sequence-based descriptions of parts are appropriate for designing small systems, where potential interactions can be processed intuitively (for example, by ignoring ‘nonessential’ DNA segments), but they become impractical for the design of large networks, because even ‘nonessential’ portions of biological sequences still affect functional efficiency in DNA promoters, RNA and proteins [11]. (Besides publishing new biological parts, [11] proposed a general strategy that addresses the problems of emergent properties and design inaccuracy; this new way of developing and characterizing components will likely influence how future biological parts are presented in databases and publications.)
Minor changes in nonessential sequences affect individual components only by small, quantitatively detectable amounts, but small changes to one component can still have a dramatic impact on network behavior. Therefore, quantitative characterizations of component functions are necessary for efficient network design. Canton et al. [12] proposed to extend the BioBrick documentation standard with quantitative descriptions formatted as datasheets, akin to those common in electrical engineering. However, different biological parts may require different types of information [9]; in other words, the Registry may need more than one datasheet format.

Figure 2:
The BioBrick Standard [7]. (a) Basic sequence template of the BioBrick Standard. The insert of the BioBrick is flanked upstream and downstream with restriction sites. EcoRI and XbaI restriction sites are at the 5′-end (prefix). SpeI and PstI restriction ...
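Figure 2 describes BioBrick inserts flanked by EcoRI/XbaI (prefix) and SpeI/PstI (suffix) sites, and the Figure 5 caption notes that the insert itself must not contain sequences recognized by these four enzymes. A minimal sketch of that check follows. The recognition sequences are the standard ones for these enzymes; the function name and interface are illustrative assumptions, not part of the Registry's tooling.

```python
# Recognition sites of the four BioBrick enzymes (all four are palindromic).
BIOBRICK_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
}


def forbidden_sites(insert: str) -> dict[str, list[int]]:
    """Return {enzyme: [0-based positions]} for every BioBrick site found in the insert.

    Because every site is its own reverse complement, scanning the top strand
    also covers the bottom strand.
    """
    seq = insert.upper()
    hits: dict[str, list[int]] = {}
    for enzyme, site in BIOBRICK_SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits


print(forbidden_sites("ATGGAATTCAAA"))
```

Here the hypothetical insert "ATGGAATTCAAA" carries an internal EcoRI site at position 3, so it would have to be recoded before it could serve as a BioBrick part.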

Other enhancements of the BioBrick Standard have also been proposed. Experimental tests to confirm the validity of plasmid inserts in a collection of clones revealed unexpected discrepancies, so a quality control scheme for the Registry has been proposed [13]. A provisional BioBrick language (PoBoL) was created to define a data exchange standard [14]. More specifically, PoBoL aims to define minimal information requirements for BioBricks, provide annotation methods for BioBricks, maintain interlinking possibilities and set the stage for further language extensions.

Of equal importance to biological parts standardization is agreement on how network designs should be described. To model biological systems, it seems logical to start with conventions developed in the systems biology community, such as the Systems Biology Markup Language (SBML) [15–18], Cellular Markup Language (CellML) [19, 20], MIRIAM [21], Systems Biology Graphical Notation (SBGN) [22, 23] and BioPAX [24]. SBGN is formally presented in [23] as a set of graphical conventions intended to help biologists communicate clearly and efficiently.

SBML was developed to exchange biological process information in the systems biology community [15–18]. It can be used to model a variety of phenomena, such as metabolic pathways, gene regulation and cell signaling pathways. Its success can be attributed to a number of factors. First, SBML incorporates a number of other useful standards: MathML 2.0 [25], which provides a common mathematical expression language; the Resource Description Framework (RDF) [26], which allows for machine-readable metadata; and the Systems Biology Ontology (SBO) [27, 28], a set of six controlled vocabularies. Second, SBML has community-driven software support [29]. A particularly useful component is libSBML [30], an application programming interface (API) library that makes SBML file manipulation accessible to scripting languages, and existing translation scripts bridge SBML-structured data and other formats [31]. Third, the SBML format is used in the BioModels Database [32]. Recent developments demonstrate both language extensions and applications: SBML's utility has been extended to stochastic simulations [33], and it has been used in the analysis of iron metabolism [34] and the RB/E2F pathway [35].
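libSBML [30] is the usual way to read and validate SBML programmatically. To show what an SBML document actually contains without assuming a libSBML installation, the sketch below uses only Python's standard xml.etree.ElementTree to list the species and reactions of a small, hand-written Level 3 Version 1 model. The toy model is hypothetical, and unlike libSBML this code performs no validation.

```python
import xml.etree.ElementTree as ET

# Namespace used by SBML Level 3 Version 1 Core documents.
NS = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}

# Hypothetical toy model: species A is converted into species B by reaction R1.
TOY_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy_conversion">
    <listOfCompartments>
      <compartment id="cell" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell" initialAmount="10" hasOnlySubstanceUnits="false"
               boundaryCondition="false" constant="false"/>
      <species id="B" compartment="cell" initialAmount="0" hasOnlySubstanceUnits="false"
               boundaryCondition="false" constant="false"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="R1" reversible="false" fast="false">
        <listOfReactants>
          <speciesReference species="A" stoichiometry="1" constant="true"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="B" stoichiometry="1" constant="true"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def summarize_sbml(text):
    """Return (model id, species ids, {reaction id: (reactant ids, product ids)})."""
    model = ET.fromstring(text).find("sbml:model", NS)
    species = [s.get("id") for s in model.iterfind("sbml:listOfSpecies/sbml:species", NS)]
    reactions = {}
    for r in model.iterfind("sbml:listOfReactions/sbml:reaction", NS):
        reactants = [ref.get("species") for ref in
                     r.iterfind("sbml:listOfReactants/sbml:speciesReference", NS)]
        products = [ref.get("species") for ref in
                    r.iterfind("sbml:listOfProducts/sbml:speciesReference", NS)]
        reactions[r.get("id")] = (reactants, products)
    return model.get("id"), species, reactions


print(summarize_sbml(TOY_SBML))
```

In practice libSBML would also check the document against the SBML specification, which is one reason the text singles it out.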

CellML, an alternative to SBML, is an XML-based markup language that models the cell as a set of ordinary differential equations [19, 20]. Its more modular structure is convenient for multi-scale modeling and reuse of parts, but it places less emphasis on biochemistry. CellML also incorporates MathML and RDF, and it has some community-driven software support [36]. Translators bridge SBML and CellML [37]. Community adoption of this standard has resulted in the CellML Model Repository, a publicly accessible database of curated biological models whose current state is presented in [38]. CellML's flexibility stems from its ability to represent biological phenomena through mathematical and model-building constructs, but sometimes explicit biological descriptions are useful. To this end, Wimalaratne et al. [39] have developed a biophysical annotation framework.

MIRIAM (minimum information requested in the annotation of biochemical models) is a scheme for providing extensive, structured documentation in the model file [21]. A model is only useful if it is sufficiently annotated. Controlled annotations are achieved with the help of uniform resource identifiers (URIs) [40]. The MIRIAM approach provides a common annotation format as well as controlled vocabularies and databases [41].

SBGN is a recent attempt at standardizing the visual representation of biological networks (Figure 3) [22]. Recently, automatic equation generation for SBML from SBGN diagrams was made possible [42].

Figure 3:
SBGN network example of inter-cellular signaling near the neuromuscular junction [22, 23]. Biological concepts are organized with glyphs, or named containers. Some glyphs represent entity pool nodes, each of which is a population of entities that are ...

BioPAX is an effort to represent pathway data with ontological annotations [24, 43]. BioPAX complements formats like CellML and SBML because it focuses on the integration of large qualitative pathways rather than on mathematical modeling [10, 44].

The synthetic biology community also has other approaches that border on standardization. For example, Pedersen et al. [45] introduced a formal language called Genetic Engineering of Cells (GEC), which allows modular modeling of interactions between potentially undetermined proteins and genes.

Ideally, a synthetic biology design approach would have the versatility to employ both network- and component-centric standards so that multiple levels of detail could be considered at the same time. In addition to importing publicly accessible data in common formats, the workflow would integrate problem-specific data and formats as well. Integration of the network and component perspectives is occurring or anticipated on multiple fronts. The BioBrick format is expected to support the design of ever more complex networks by incorporating integration approaches akin to BioPAX [10] that allow for ontological annotations. Conversely, standards like CellML and SBML that already allow mathematical network modeling would benefit from extending their formalisms to leverage synthetic biology constructs, such as DNA sequences and device-level information [10]. A third front consists of integration efforts pursued not through explicit dialogue on standards but through software development: OpenCell (PCEnv), a CellML-based platform, can model both quantitative networks and synthetic biology constructs [19].

The result of these efforts would be a comprehensive description framework, but the classic tradeoff between detail-driven accuracy and analytical efficiency will persist. Because this tradeoff can be balanced in many different ways, each subgroup within synthetic biology may opt to pursue its own specialized formats for data management. For example, a network that depends on transcriptional regulation and a model that depends on protein–protein interactions may have different description requirements for modules and control kinetics equations. Such specializations may be easily achieved through the custom tag facility of XML [46], which is already familiar to developers of SBML and CellML.
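In SBML, a customary place for such extensions is an element's annotation, which may hold XML from another namespace. The sketch below attaches a part description to a species and reads it back with Python's standard ElementTree. The namespace URI, the element and attribute names and the sequence are all hypothetical, invented for illustration; software that does not understand the namespace can skip it without losing the core model.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"
# Hypothetical namespace for a lab-specific part description (not a real standard).
PART_NS = "http://example.org/hypothetical-part-annotation"

SPECIES_XML = f"""<species xmlns="{SBML_NS}" id="repressor_X" compartment="cell"
         hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false">
  <annotation>
    <part:device xmlns:part="{PART_NS}" partId="hypothetical_part_001" role="repressor">
      <part:sequence>ATGAAAGCATAA</part:sequence>
    </part:device>
  </annotation>
</species>"""


def read_part_annotation(xml_text):
    """Return the custom part fields stored in a species annotation, or None if absent."""
    species = ET.fromstring(xml_text)
    # Clark notation {namespace}tag addresses elements in each namespace explicitly.
    device = species.find(f"{{{SBML_NS}}}annotation/{{{PART_NS}}}device")
    if device is None:
        return None
    return {
        "partId": device.get("partId"),
        "role": device.get("role"),
        "sequence": device.findtext(f"{{{PART_NS}}}sequence"),
    }


print(read_part_annotation(SPECIES_XML))
```

Because the extension lives in its own namespace, a subgroup could evolve it independently of the core SBML schema.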

Databases and software tools

No single data standard in synthetic biology has yet achieved the scope necessary to account for all useful information, such as epigenetic data [9]. Nevertheless, current data formats are still useful for organizing biological information in databases and software. The Synthetic Biology Software Suite (SynBioSS), designed for modeling synthetic genetic constructs, uses the Registry of Standard Biological Parts as well as a kinetic parameter database [47]. GenoCAD aims to streamline the design of synthetic DNA sequences [48]. This program reflects a debate in the synthetic biology community about whether parts need well-formatted ends for easy connection of coding sequences: the software takes advantage of the BioBrick-formatted DNA registry, but it also aims to do away with standardizing the means by which parts are connected. This approach requires a BioBrick-independent, general means of producing long stretches of error-free DNA (discussed later). CellML has software support through OpenCell (formerly PCEnv), Cellular Open Resource (COR) [49], InsilicoIDE and JSIM [19]. Cytoscape can visualize and analyze complex networks for biological research [50], and plug-ins that confer additional features are actively being developed [51–54]. Funahashi's CellDesigner [55], an editor for SBML, was designed as a tool to model network dynamics. It has a plug-in facility that enables third parties to extend its capabilities. CellDesigner's utility has been extended to stochastic simulations [33] and automatic equation generation from SBGN diagrams [42], and it has been used in the analyses of iron metabolism [34] and the RB/E2F pathway [35]. The Process Modeling Tool (ProMoT) is a ‘drag and drop’ design platform [29]. Other software developments can be found at format-specific resource pages [29, 36, 56].
In short, concurrent with the efforts to reach consensus on information standards are attempts to employ data and standards in the design of synthetic networks.

Algorithms and heuristics

Computer-based informatics also offers relatively low-cost, quick simulations prior to in vitro implementation. Loewe [57] proposed a framework that combines systems biology and evolutionary theory to simulate mutations whose effects are too subtle to be detected in vitro. Chen et al. [58] proposed a stochastic game theory-based approach to address complications due to uncertain initial conditions and extracellular disturbances; they also proposed managing uncertainties by addressing four design specifications [59]. Banga [60] has recently reviewed optimization in computational systems biology. Because computational resources are limited, model simplification is a useful strategy. To this end, enzyme kinetic models are translated among a number of formats to reduce model complexity, and Hadlich et al. [61] developed an algorithm to automate this kinetic format translation. Bentley [36] proposes methods called systemic computation (SC) and fractal proteins for improving the simulation of biological systems. OptCircuit is an optimization-based method for automatically identifying the required circuits from a database of components and kinetic parameters [62]; this method may work well with Ellis et al.'s strategy of designing networks from quantitatively characterized libraries of diversified components [11]. Cantone et al. [63] developed a small synthetic gene network to assess current modeling and reverse-engineering algorithms. Models based on ordinary differential equations and Bayesian networks were qualitatively accurate, but it is not yet clear whether these conclusions generalize to larger networks. In short, the need for unambiguous, quantitative and collaborative exchange of digital information is currently being addressed by a variety of standards, databases and software.
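As a worked illustration of how a kinetic format change trades accuracy for simplicity (this is not Hadlich et al.'s algorithm [61]; the rate law and numbers are assumptions chosen for clarity), consider replacing a Michaelis–Menten rate v = Vmax*S/(Km + S) with the first-order form v = (Vmax/Km)*S, which is linear in S. Dividing the two gives (Km + S)/Km, so the relative error of the simplified form is exactly S/Km: acceptable only while the substrate concentration stays well below Km.

```python
def michaelis_menten(s, vmax, km):
    """Saturating Michaelis-Menten rate law."""
    return vmax * s / (km + s)


def first_order(s, vmax, km):
    """Linear (first-order) approximation, valid when s << km."""
    return (vmax / km) * s


def relative_error(s, vmax, km):
    """Relative overestimate of the first-order form; analytically equal to s / km."""
    exact = michaelis_menten(s, vmax, km)
    return (first_order(s, vmax, km) - exact) / exact


for s in (0.01, 0.1, 1.0):
    print(f"S/Km = {s:>4}: relative error = {relative_error(s, vmax=2.0, km=1.0):.3f}")
```

The printed errors track S/Km, which gives a quick criterion for when the simpler format is safe to substitute.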

Improvements in algorithms for analyzing networks in synthetic and systems biology are needed, because our current, relatively simple models cannot handle the abundant data acquired from complex biological systems [31]. Issues in network analysis are exemplified by the fact that inferences from small networks cannot simply be extrapolated to larger ones: Stumpf et al. [64] have shown that subnetworks of a scale-free network are not necessarily scale-free. In general, rigorous statistical analysis of network data is difficult because the data contain numerous correlations [31].
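The sampling effect studied by Stumpf et al. [64] can be explored with a short simulation. The sketch below grows a network by simple preferential attachment (a Barabási–Albert-style generator written for illustration; the network size, attachment parameter m, 30% sampling fraction and random seed are arbitrary assumptions), keeps a random subset of nodes, and compares degrees in the induced subnetwork with those of the full network. It does not test for scale-freeness itself; it only shows that sampling changes the degree statistics (the mean degree shrinks roughly in proportion to the sampling fraction, and isolated nodes appear), so properties measured on a subnetwork cannot simply be transferred to the whole.

```python
import random


def preferential_attachment(n, m, rng):
    """Edge set of an n-node graph grown by linear preferential attachment."""
    edges = set()
    endpoints = []  # each node appears once per edge end, so choice() is degree-biased
    for i in range(m + 1):  # seed: complete graph on m + 1 nodes
        for j in range(i):
            edges.add((j, i))
            endpoints += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.add((t, new))
            endpoints += [t, new]
    return edges


def induced_degrees(edges, nodes):
    """Degree of each node, counting only edges whose endpoints are both in `nodes`."""
    nodes = set(nodes)
    deg = dict.fromkeys(nodes, 0)
    for a, b in edges:
        if a in nodes and b in nodes:
            deg[a] += 1
            deg[b] += 1
    return deg


rng = random.Random(42)
n = 2000
edges = preferential_attachment(n, m=2, rng=rng)
full = induced_degrees(edges, range(n))
sub = induced_degrees(edges, rng.sample(range(n), int(0.3 * n)))

mean_full = sum(full.values()) / len(full)
mean_sub = sum(sub.values()) / len(sub)
isolated = sum(1 for d in sub.values() if d == 0) / len(sub)
print(f"mean degree: full {mean_full:.2f}, 30% sample {mean_sub:.2f}; isolated in sample: {isolated:.0%}")
```

Every node of the full network has degree at least m, yet a sizeable share of sampled nodes end up with no neighbors at all, a feature the full network never had.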


The informatics approach can also reframe the in vitro aspects of synthetic biology. In this light, DNA synthesis from a computer-aided design is essentially a format conversion from bytes to basepairs. Biological parts development often involves a refinement of signal transduction, or data flow within a biological circuit, and protein complexes can be modeled as noisy communication channels [65, 66]. Indeed, information-processing devices such as logic gates have already been implemented in vitro (Figure 4). In other words, critical informatics technology in synthetic biology resides not only in computers but also in biological circuitry.

Figure 4:
Transcription-based logic gates constructed from modular transcription units [67]. Electronic logic gates are the fundamental building blocks of computational ability. For each logic gate, the table presents the boolean logic (column 2), design a biological ...
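The behavior summarized in Figure 4 can be approximated with a simple phenomenological model. In the sketch below (an illustrative assumption, not the constructs of [67]), each input is a transcription factor whose activation of the output promoter follows a Hill function. An AND gate requires both activators, modeled as the product of the two activation terms (as if binding were independent); an OR gate fires if either is bound. The Hill constant k, coefficient n, input levels and 0.5 output threshold are arbitrary choices.

```python
def hill_activation(x, k, n):
    """Fraction of promoter activation by an activator at concentration x."""
    return x**n / (k**n + x**n)


def and_gate(a, b, k=1.0, n=4.0):
    # Output needs both activators bound: product of independent activation terms.
    return hill_activation(a, k, n) * hill_activation(b, k, n)


def or_gate(a, b, k=1.0, n=4.0):
    # Output needs at least one activator bound.
    pa, pb = hill_activation(a, k, n), hill_activation(b, k, n)
    return 1.0 - (1.0 - pa) * (1.0 - pb)


def truth_table(gate, low=0.1, high=10.0, threshold=0.5):
    """Digitize the gate's response to low/high input levels as (A, B, output) rows."""
    rows = []
    for a_on in (False, True):
        for b_on in (False, True):
            out = gate(high if a_on else low, high if b_on else low)
            rows.append((int(a_on), int(b_on), int(out > threshold)))
    return rows


print("AND", truth_table(and_gate))
print("OR ", truth_table(or_gate))
```

The steepness n controls how cleanly intermediate input levels are pushed toward 0 or 1; with shallow Hill functions the same wiring gives a much blurrier truth table.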

DNA synthesis

Following a successful simulation, the computer-based network design must be translated into an in vitro DNA sequence. BioBrick-formatted synthetic genes can provide a set of required, proofread sequences that one can splice together (Figure 5); combined, they form a much longer sequence that codes for the synthetic biological circuitry. On the other hand, doing away with the BioBrick parts connection formats can streamline the design of synthetic DNA sequences [48], as long as sequence proofreading can still be done. In other words, an approach independent of the build-by-parts strategy requires a high-fidelity method for writing the basepair sequence, because even a single-basepair mutation can cause a system-wide disorder, as in sickle-cell anemia. Linshiz et al. [68] developed a strategy for writing large, error-free DNA molecules from potentially faulty building blocks (Figure 6). Gibson et al. [69] developed a method for constructing large DNA molecules and demonstrated that an entire 582 970-basepair Mycoplasma genitalium genome can be handled with high fidelity.

Figure 5:
Assembling DNA molecules with BioBrick parts [70]. Gene A is to be added to the standardized plasmid p1. Neither Gene A nor any gene within p1 may have sequences that can be recognized by the four restriction enzymes used during the main assembly process. ...
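The assembly in Figure 5 relies on the fact that XbaI (T^CTAGA) and SpeI (A^CTAGT) leave the same four-base CTAG overhang, so a SpeI-cut end can be ligated to an XbaI-cut end. The resulting junction reads ACTAGA, a mixed ‘scar’ that neither enzyme recognizes, so later assembly rounds that reuse XbaI and SpeI do not cut the composite part internally. The sketch below models only the top strand at the junction; it ignores the EcoRI/PstI ends, the vector and the exact scar bases defined by the standard, and the example sequences are hypothetical.

```python
XBAI_SITE = "TCTAGA"  # XbaI cuts T^CTAGA
SPEI_SITE = "ACTAGT"  # SpeI cuts A^CTAGT


def ligate_spei_to_xbai(upstream, downstream):
    """Top-strand sequence after joining a SpeI-cut upstream part to an XbaI-cut downstream part."""
    i = upstream.find(SPEI_SITE)
    j = downstream.find(XBAI_SITE)
    if i < 0 or j < 0:
        raise ValueError("upstream needs a SpeI site and downstream an XbaI site")
    # SpeI cuts after the leading A of its site; XbaI cuts after the leading T of its
    # site, so the downstream fragment begins with the CTAG overhang followed by A.
    return upstream[: i + 1] + downstream[j + 1 :]


joined = ligate_spei_to_xbai("ATGAAAACTAGT", "TCTAGAGGGTAA")
print(joined)
print(XBAI_SITE in joined, SPEI_SITE in joined)
```

The junction in the joined sequence is ACTAGA, which matches neither recognition site.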
Figure 6:
Recursive construction of error-free DNA molecules from imperfect oligonucleotides [68]. (A) GFP DNA construction. The entire sequence is divided into overlapping ones in silico. These pieces are synthesized conventionally. Assembly by overlapping ssDNA ...
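Two ingredients of the approach in Figure 6 can be sketched in a few lines: dividing a target sequence in silico into overlapping fragments, and estimating how likely a synthesized piece is to be error-free. The estimate assumes independent per-base synthesis errors at a hypothetical rate e of 1 in 500; under that assumption a molecule of length L is error-free with probability (1 - e)^L, which falls quickly with L and explains why building long molecules from short, verified pieces pays off. This is not the recursive procedure of Linshiz et al. [68] itself, and the fragment length and overlap are arbitrary.

```python
def split_with_overlaps(seq, fragment_len, overlap):
    """Divide seq into fragments of fragment_len that share `overlap` bases with the next one."""
    if not 1 <= overlap < fragment_len:
        raise ValueError("need 1 <= overlap < fragment_len")
    step = fragment_len - overlap
    fragments, start = [], 0
    while True:
        fragments.append(seq[start:start + fragment_len])
        if start + fragment_len >= len(seq):
            return fragments
        start += step


def reassemble(fragments, overlap):
    """Join fragments, checking that each shares its overlap with the previous one."""
    out = fragments[0]
    for prev, frag in zip(fragments, fragments[1:]):
        if prev[-overlap:] != frag[:overlap]:
            raise ValueError("adjacent fragments do not overlap as expected")
        out += frag[overlap:]
    return out


def p_error_free(length, error_rate):
    """Probability of an error-free molecule, assuming independent per-base errors."""
    return (1.0 - error_rate) ** length


ERROR_RATE = 0.002  # hypothetical: one error per 500 bases
print(f"60-mer error-free:  {p_error_free(60, ERROR_RATE):.2f}")
print(f"720-bp error-free: {p_error_free(720, ERROR_RATE):.2f}")
```

Under this assumed error rate, most short oligonucleotides come out correct while most 720-bp molecules do not, which is the motivation for assembling from verified short pieces.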

Biological component design

Just as electrical circuits need devices that control data flow, biological networks need biological parts that modulate signal transduction. Informatics issues at the component and network levels overlap. We will start with components and then transition to network informatics.

Synthetic biological devices are often derived from natural devices that were optimized by evolution. Natural components may therefore exhibit context dependence that prevents them from connecting compatibly with other devices. One example is the codon mismatch that occurs when a biological part is transferred from one organism to a host of a different kingdom [71]. To adapt natural parts to the needs of synthetic biology, they must be standardized. Lucks [72] proposed a set of general features to consider when developing a biological device: an ideal part would be independent, reliable, tunable, orthogonal and composable. In other words, it does not interfere with other circuitry (independent), functions as intended regardless of context (reliable), can operate in a range of selectable modes (tunable), does not interfere with similar devices (orthogonal), and can be combined predictably into systems (composable). In addition, DNA sequences must adhere to the rules of transcription control [73]. Suarez et al. [74] discuss the challenges in the computational design of proteins. Martin et al. [71] review guidelines for engineering synthetic enzymes. Recent synthetic biology devices include a cellular counter in Escherichia coli [75], a tunable synthetic mammalian oscillator [76], an aptazyme-based riboswitch [77], a tunable synthetic gene oscillator [78] and a double inversion recombination switch [79]. Incidentally, Tsai et al. [80] argue that biological oscillators sometimes contain positive feedback loops in order to achieve frequency control without amplitude change. Dawid et al. [81] designed synthetic RNA regulatory elements based on transcription attenuator control.
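A common response to codon mismatch is to recode a coding sequence so that each amino acid uses a codon the new host translates efficiently, leaving the encoded protein unchanged. The sketch below uses a fragment of the standard genetic code and a hypothetical host preference table (one ‘preferred’ codon per amino acid); a real design would draw on measured codon usage of the host and weigh other sequence features, which this illustration ignores.

```python
# Fragment of the standard genetic code covering only the codons used below.
STANDARD_CODE = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "CTG": "L", "CTT": "L", "CTC": "L", "CTA": "L", "TTA": "L", "TTG": "L",
    "TAA": "*", "TAG": "*", "TGA": "*",
}

# Hypothetical host preference table: one preferred codon per amino acid.
HOST_PREFERRED = {"M": "ATG", "A": "GCG", "K": "AAA", "L": "CTG", "*": "TAA"}


def codons(cds):
    """Split a coding sequence into upper-case codons."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3].upper() for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate using the partial code above (KeyError for codons not in the table)."""
    return "".join(STANDARD_CODE[c] for c in codons(cds))


def recode_for_host(cds):
    """Replace every codon with the host's preferred synonymous codon."""
    return "".join(HOST_PREFERRED[STANDARD_CODE[c]] for c in codons(cds))


original = "ATGGCAAAGTTATGA"
recoded = recode_for_host(original)
print(recoded, translate(original) == translate(recoded))
```

The check on the last line is the essential invariant: the DNA changes, but the translated protein does not.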

Arkin [79] proposed developing a group of devices from a common core structure by altering a particular key property. Calling such a group a ‘family of parts’, Arkin argued that related devices are likely to share characterization protocols. Common protocols for a versatile set of devices would simplify the physical composition process, with important ramifications for design strategies as well as for parts organization within the Registry. However, it is important to keep in mind that similar devices raise the risk of crosstalk and interference with each other [10]. Unlike in electrical circuits, multiple copies of the same biological ‘logic gate’ probably cannot be used in the same space.

Ellis et al. [11] proposed developing libraries of diversified components (parts that are functionally equivalent but differ in their nonessential sequences) to improve design strategy. Differences in nonessential sequences affect the quantitative functional efficiencies of components, and this in turn can have a large impact on overall network behavior. If documented libraries are established before design, then one can accurately simulate and fine-tune a system by picking components with appropriate functional efficiencies. In other words, Ellis et al. [11] propose to move component ‘tweaking’ to the front end of the synthetic biology infrastructure, upstream of software-based network design. Such ‘diversified’ parts would address issues of emergent properties, biological noise and tunability, and they may also address the need for compatible inputs and outputs in serial connectivity. Ellis et al. [11] successfully employed this strategy to develop a feed-forward loop network and a gene timer network. Such libraries will probably be established not only for DNA but also for RNA and proteins.
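The front-end ‘tweaking’ idea can be sketched as a lookup: given a library of functionally equivalent variants with characterized strengths, predict each variant's output and pick the one closest to a design target. The sketch below uses a deliberately simple model (constitutive expression with first-order decay, dP/dt = strength - decay_rate*P, whose steady state is strength/decay_rate), and the variant names, strengths and target are hypothetical rather than values from Ellis et al. [11].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartVariant:
    name: str
    strength: float  # characterized relative promoter strength (hypothetical units per time)


# Hypothetical library of functionally equivalent variants differing only in nonessential sequence.
LIBRARY = [
    PartVariant("prom_v1", 0.2),
    PartVariant("prom_v2", 0.5),
    PartVariant("prom_v3", 1.1),
    PartVariant("prom_v4", 2.4),
]


def steady_state_output(strength, decay_rate=0.1):
    """Steady state of dP/dt = strength - decay_rate * P, i.e. P* = strength / decay_rate."""
    return strength / decay_rate


def pick_variant(library, target, decay_rate=0.1):
    """Return the variant whose predicted steady-state output is closest to the target."""
    return min(library, key=lambda v: abs(steady_state_output(v.strength, decay_rate) - target))


best = pick_variant(LIBRARY, target=10.0)
print(best.name, steady_state_output(best.strength))
```

Replacing steady_state_output with a full network simulation turns the same selection loop into the kind of library-driven design the text describes, at a correspondingly higher computational cost.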

Biological noise presents problems for information flow through biological parts. A digital step-like interface between components may reduce the effect that noise would have on an analog system [82].
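The benefit of a step-like interface [82] can be illustrated numerically. The sketch below passes the same noisy input through two hypothetical transfer functions, a linear (analog) one and a steep Hill function standing in for a digital switch, and compares the spread of the outputs. The Gaussian noise model, Hill coefficient and threshold are assumptions. A steep switch suppresses noise only for inputs well away from its threshold; near the threshold it amplifies noise instead.

```python
import random
import statistics


def linear_transfer(x, full_scale=2.0):
    """Graded (analog) response: output proportional to input."""
    return x / full_scale


def switch_transfer(x, k=1.0, n=8.0):
    """Steep Hill function approximating a digital, step-like response with threshold k."""
    return x**n / (k**n + x**n)


rng = random.Random(0)
# Hypothetical noisy 'high' input: mean 1.8, standard deviation 0.2 (arbitrary units).
inputs = [max(0.0, rng.gauss(1.8, 0.2)) for _ in range(5000)]

linear_sd = statistics.pstdev([linear_transfer(x) for x in inputs])
switch_sd = statistics.pstdev([switch_transfer(x) for x in inputs])
print(f"output SD: linear {linear_sd:.3f}, switch-like {switch_sd:.3f}")
```

The switch saturates near 1 for most of the noisy inputs, so its output varies far less than the linear response does for the same input noise.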

Network informatics

Information flow can also be addressed from the perspective of networks. The oldest synthetic biological circuits were based on transcriptional regulation: within a transcriptional network, two genes were connected by having one gene code for a transcription factor acting on the promoter of the other gene. Carrera et al. [83] demonstrated a method to model the transcription regulation network of E. coli and proposed to rewire it by exchanging the endogenous promoters. Other biological circuit experiments have involved RNA-based regulation and metabolism [84]. Recently, Bashor et al. [85] introduced the idea of using protein scaffolds, and hence protein–protein interactions, to control synthetic regulatory networks, and constructed such a network. Compared with translation-dependent regulatory circuits, protein-level connections have the potential for quicker responses with lower consumption of cellular resources. Engineering protein–protein interactions becomes a tractable problem if system design leverages well-characterized protein domains [86] that enable a combinatorial strategy for generating synthetic proteins and signaling pathways. In anticipation of multicellular assemblies with synthetic signaling requirements, Weber et al. developed a metabolite-controlled intercellular signaling method [87]. To achieve transient system dynamics, Yin et al. [88] argued for augmenting target structure sequences with the capability to automatically construct self-assembly and disassembly pathways, and implemented such a system with a DNA hairpin motif.
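The simplest of these connections, one gene encoding a repressor of the other gene's promoter, can be written as two ordinary differential equations. The sketch below integrates such a cascade with forward Euler; the Hill-type repression term, the parameter values and the step size are illustrative assumptions, not taken from the cited studies. Because repressor X starts at zero, gene Y is briefly expressed before X accumulates and represses it, after which both settle at the analytically known steady state.

```python
def simulate_cascade(t_end=100.0, dt=0.01, alpha_x=1.0, alpha_y=1.0,
                     k=1.0, n=2.0, delta=0.1):
    """Forward-Euler integration of a two-gene cascade in which protein X represses gene Y.

    dX/dt = alpha_x - delta * X
    dY/dt = alpha_y / (1 + (X / k)**n) - delta * Y

    Returns (final X, final Y, peak Y during the transient).
    """
    x = y = 0.0
    y_peak = 0.0
    for _ in range(int(round(t_end / dt))):
        dx = alpha_x - delta * x
        dy = alpha_y / (1.0 + (x / k) ** n) - delta * y
        x += dx * dt
        y += dy * dt
        y_peak = max(y_peak, y)
    return x, y, y_peak


x_end, y_end, y_peak = simulate_cascade()
# Analytical steady state: X* = alpha_x / delta = 10, Y* = alpha_y / (1 + (X*/k)**n) / delta
print(f"X = {x_end:.3f}, Y = {y_end:.4f} (Y* = {1 / 101 / 0.1:.4f}), transient Y peak = {y_peak:.2f}")
```

Swapping the repression term for an activation term, or adding a third gene, is a one-line change, which is part of why ODE models of this form are a common starting point for circuit design.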

Biological noise is also a problem at the network level. Studying noise in complex networks traditionally involves computational perturbation methods, because implementing an arbitrary noise source in vitro is often nontrivial. To bridge this gap, Lu et al. [89] have developed a means of implementing simple in silico perturbation sources as in vitro molecular noise generators.
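On the computational side, a standard way to generate intrinsic noise is stochastic simulation. The sketch below applies Gillespie's direct method to a hypothetical constitutively expressed protein (synthesis at rate k_syn, first-order degradation at rate k_deg per molecule). For this birth-death process the stationary copy number is Poisson-distributed with mean k_syn/k_deg, so the Fano factor (variance divided by mean) should come out close to 1. The rates, run length and seed are arbitrary choices.

```python
import random


def gillespie_birth_death(k_syn, k_deg, t_end, rng):
    """Simulate 0 -> P (rate k_syn) and P -> 0 (rate k_deg * n) with Gillespie's direct method.

    Returns the time-averaged mean and variance of the copy number n.
    """
    t, n = 0.0, 0
    sum_n = sum_n2 = 0.0
    while True:
        a_syn, a_deg = k_syn, k_deg * n
        a_total = a_syn + a_deg
        tau = rng.expovariate(a_total)      # waiting time to the next reaction
        dwell = min(tau, t_end - t)         # time spent in the current state
        sum_n += n * dwell
        sum_n2 += n * n * dwell
        t += tau
        if t >= t_end:
            break
        if rng.random() * a_total < a_syn:  # choose which reaction fires
            n += 1
        else:
            n -= 1
    mean = sum_n / t_end
    return mean, sum_n2 / t_end - mean**2


mean, var = gillespie_birth_death(k_syn=10.0, k_deg=1.0, t_end=2000.0, rng=random.Random(1))
print(f"mean = {mean:.2f} (expected 10), Fano factor = {var / mean:.2f} (expected about 1)")
```

Trajectories from a simulation like this are the in silico perturbation sources that the molecular noise generators of Lu et al. [89] aim to reproduce in vitro.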


Whereas in vitro synthetic biology enables biochemical flexibility, in vivo synthetic biology endows a biological network with large-scale production capacity [90]. The first step in the transition from in vitro to in vivo is the insertion of the constructed DNA into a biological chassis where transcription and translation can take place, such as a bacterium's genome. Itaya et al. [91] addressed the physicochemical stability issues of large DNA by developing the Bacillus subtilis genome (BGM) vector, which accommodates large DNA as part of the B. subtilis genome and might combine well with cell-free expression systems in the future [92]. Shao et al. [93] developed a method for assembling a 19 kb recombinant DNA molecule in Saccharomyces cerevisiae. Minaeva et al. [56] integrated two recombination methods, phage site-specific recombination and Red/ET-mediated recombination, into a straightforward, convenient protocol. This method, called the Dual-In/Out Strategy, was applied successfully to plasmid-less, marker-less E. coli.

When a biological network is expressed from synthetic DNA sequences within the host, or engineering chassis, crosstalk between the host and the synthetic circuitry can adversely affect performance. For example, endogenous carotenoid pathways in higher plants seem to resist synthetic alterations [94]. Emergent problems from crosstalk are not surprising, even in commonly studied organisms like E. coli, because significant portions of organismal gene regulatory networks are not yet known [95]. Hence, minimizing or at least controlling crosstalk is a desirable goal in network information control. One approach is to reach community consensus on a ‘standard’ organism in which ‘standard’ parts exhibit negligible crosstalk and other desired properties. The obvious candidates are organisms that already have methods for accommodating large DNA molecules: S. cerevisiae [93] and E. coli [56]. However, both species will probably require crosstalk reduction through numerous deletions of nonessential genes.

The logical endpoint of systematic nonessential gene deletion is the concept of the minimal cell [96, 97], which in theory is composed only of genetic material critical to survival. Natural minimal cells like Pelagibacter ubique that thrive in resource-deficient environments may also be good starting points for the development of a standard artificial organism [97]. The standard artificial organism, however, is not necessarily a minimal cell, because effective crosstalk elimination may occur before all nonessential genes are deleted. In addition, parasitic minimal cells and artificially minimized cells may have fastidious growth requirements and lack the reliability of a bulkier genome [82]. Synthetic biology needs a host that minimizes interference while providing robust cellular infrastructure, and minimal cells do not guarantee that.

Another way to address crosstalk is to develop orthogonal ribosomes and mRNAs that interact only with each other and with neither the ribosomes nor the genetic material of the host organism [98]. Evolved ribosome–mRNA pairs can then be used to construct cellular networks [99]. With this approach, a synthetic type 1 coherent feed-forward loop was developed in E. coli [100], demonstrating that synthetic circuits can be based on orthogonal transcription–translation networks. With enough orthogonal components, it may be possible to build a parallel metabolism within the cell [101].

Ultimately, however, it may be necessary to implement physicochemical partitions with a phospholipid bilayer, whose use by natural cells makes a convincing case for its adoption in synthetic biology. The bilayer can form a liposome into which several biochemical modules can be incorporated; Forster and Church [96] roughly outline the series of steps needed. This is essentially a ‘ground-up’ approach to the minimal cell, and the option to use artificial, low-interference modules suggests a higher chance of success than the ‘top-down’ approach of multiple gene deletions. Recently, Kuruma et al. [102] (this paper represents the latest progress in the development of the liposome into a viable chassis) developed a liposome-based system that synthesizes phosphatidic acid, a major constituent of cell membranes: a cell-free translation system was encapsulated in a liposome, in which functional membrane enzymes were then synthesized. This represents a significant step toward liposome-encapsulated phospholipid bilayer biosynthesis and points toward synthetic modules with autopoietic capabilities.

At the border of in vitro and in vivo synthetic biology is the cell-free system, a platform for implementing complex biological processes outside a cell membrane. Historically, it has been difficult to activate more than one biochemical network in a single platform, but Jewett et al. [103] (this paper represents the latest progress on integrating multiple biochemical networks in a single cell-free system) have recently developed a cell-free system capable of co-activating central catabolism, oxidative phosphorylation, and protein synthesis.

Once a synthetic network has been fully implemented in vivo, the combined host-guest network must be characterized for performance and potential crosstalk. However, experimental perturbations inevitably introduce noise into the data. In fact, for protein interaction networks, the rates of false-positive and false-negative results may be as high as 40% [104, 105]. To address this problem, Lappe and Holm [106] devised a strategy for deriving interaction networks efficiently. Cantone et al. [63] found that reverse-engineering methods based on ordinary differential equations and Bayesian networks were effective at inferring the structure of a small, synthetic gene regulatory network.
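The ODE-based reverse-engineering idea can be sketched in a few lines. This is a minimal illustration, not the pipeline used by Cantone et al. [63]. It assumes linear dynamics dx/dt = A x + u, a hypothetical five-gene network with made-up interaction weights, and one steady-state experiment per gene in which only that gene receives a unit input. Under those assumptions A X = -I, so the network is recovered as A = -X^{-1}, in the spirit of regression-based methods such as NIR (network identification by multiple regression). Weak entries are then thresholded away to yield a signed edge list.

```python
import random

# Hypothetical 5-gene network: (regulator, target) -> interaction weight.
TRUE_EDGES = {(0, 1): 0.8, (1, 2): 0.8, (2, 0): -0.8,
              (0, 3): 0.8, (3, 4): 0.8, (4, 1): -0.8}
N_GENES = 5


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def build_a():
    """Matrix A of dx/dt = A x + u: unit self-degradation plus the edges above."""
    a = [[-1.0 if i == j else 0.0 for j in range(N_GENES)] for i in range(N_GENES)]
    for (src, dst), w in TRUE_EDGES.items():
        a[dst][src] = w  # row = target gene, column = regulator
    return a


def steady_states(a, noise, seed=0):
    """Simulated data: experiment k applies a unit input to gene k only.

    At steady state A x_k = -u_k, so with U = I the steady states are X = -A^{-1}.
    Uniform noise in [-noise, noise] stands in for measurement error.
    """
    rng = random.Random(seed)
    inv = invert(a)
    return [[-inv[i][k] + rng.uniform(-noise, noise) for k in range(N_GENES)]
            for i in range(N_GENES)]


def infer(x, threshold):
    """Estimate A = -X^{-1}, keep off-diagonal terms above threshold.

    Returns {(regulator, target): +1 for activation, -1 for repression}.
    """
    a_hat = [[-v for v in row] for row in invert(x)]
    return {(src, dst): (1 if a_hat[dst][src] > 0 else -1)
            for dst in range(N_GENES) for src in range(N_GENES)
            if src != dst and abs(a_hat[dst][src]) > threshold}


if __name__ == "__main__":
    x = steady_states(build_a(), noise=0.005)
    print(sorted(infer(x, threshold=0.4).items()))
```

With small measurement noise this toy setup recovers every signed edge. Raising `noise` or lowering `threshold` can produce spurious or missing edges, the false positives and false negatives discussed above. The requirement to perturb every gene individually is also a strong simplification.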


This survey of the role of information processing in synthetic biology suggests how future developments may be influenced by current ones (Table 1). Consolidation of, and additions to, data exchange formats are needed to enable efficient communication between people and software. The likely improvement in the quantitative precision of component functional data will reduce unpredictability in network design and the need for post hoc tweaking. Current hosts for in vivo synthetic biology include E. coli and S. cerevisiae, but future hosts may take a more minimalist approach and incorporate orthogonal metabolic systems.
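The claim that more precise component data should make network designs more predictable can be illustrated with a simple Monte Carlo sketch. Everything in it is an assumption made for illustration: a hypothetical two-part cascade (an inducer activates a repressor, which represses an output promoter), Hill exponent 2, nominal thresholds K1 = 0.5 and K2 = 0.3, and log-normal characterization errors on each threshold. The sketch propagates that uncertainty to the predicted output.

```python
import math
import random
import statistics


def cascade_output(inducer, k1, k2, n=2.0, beta1=1.0, beta2=1.0):
    """Steady state of a hypothetical two-part cascade.

    Stage 1: the inducer activates expression of a repressor R (Hill function).
    Stage 2: R represses the output promoter (repressive Hill function).
    """
    r = beta1 * inducer**n / (k1**n + inducer**n)
    return beta2 / (1.0 + (r / k2)**n)


def predicted_spread(cv, samples=5000, seed=1, inducer=0.5,
                     k1_nominal=0.5, k2_nominal=0.3):
    """Coefficient of variation of the predicted output when each part's K
    is known only up to a multiplicative log-normal error with the given CV."""
    rng = random.Random(seed)
    sigma = math.sqrt(math.log(1.0 + cv**2))  # log-normal sigma giving the requested CV
    mu = -sigma**2 / 2.0                      # keeps the mean multiplier at 1
    outputs = []
    for _ in range(samples):
        k1 = k1_nominal * rng.lognormvariate(mu, sigma)
        k2 = k2_nominal * rng.lognormvariate(mu, sigma)
        outputs.append(cascade_output(inducer, k1, k2))
    return statistics.stdev(outputs) / statistics.mean(outputs)


if __name__ == "__main__":
    for cv in (0.5, 0.2, 0.1):
        print(f"part CV {cv:.0%} -> predicted output CV {predicted_spread(cv):.0%}")
```

Under these assumptions, shrinking each part's characterization error from 50% to 10% narrows the spread of the predicted output. The same sampling approach applies to any network model whose parameters come from characterized part libraries.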

Table 1:
Recent major developments in synthetic biology. For each development, the row indicates its immediate impact niche, and the column indicates the informatics scope. However, all items noted have the potential to deeply influence the progress of synthetic ...

Synthetic biology is the next step in the progress of engineering biological systems. The key informatics challenges (some of which overlap with those of systems biology) are standardization, development of appropriate statistical analysis methods, digital data integrity, biological noise control and limitation of crosstalk (Table 2). When these issues are properly addressed, the result will be artificial organisms unrivaled in their biochemical sophistication.

Table 2:
Idealized recipe for synthetic biology. For each step, potentially useful Tools are identified. All steps exhibit Issues at this time. Emergent properties can be thought of as the result of biological noise. Note that many of the problems can be traced ...

Key Points

  • The main goal of synthetic biology is to start with a set of functions and properties, and build a suitable system out of biological components.
  • Component data standards (such as BioBrick) will likely require extensions to account for quantitative performance data, so that network design can become more predictable.
  • Data standards for networks and components will likely consolidate in order to increase the accuracy of design simulations and efficiency of collaborations.
  • Biological parts development will likely employ the strategy of building quantitatively characterized libraries of diversified components, because these libraries will increase the accuracy of network-level simulations.
  • Host interference of synthetic networks might be effectively addressed by gene deletions and the use of orthogonal protein expression systems.


This work was supported in part by the National Library of Medicine (NLM/NIH) under grant K99 LM009826 and the National Human Genome Research Institute (NHGRI/NIH) under grants 1R01HG003354 and 1R01HG004836.



Gil Alterovitz is a Harvard Medical School faculty member in the Children's Hospital Informatics Program at the Harvard/MIT Division of Health Sciences and Technology (HST).


Taro Muso is a graduate of the Harvard/MIT Division of Health Sciences and Technology (HST) and an affiliate of the Partners Healthcare Center for Personalized Genetic Medicine.


Marco F. Ramoni is an Associate Professor of Pediatrics and Medicine at Harvard Medical School, and the Director of the Biomedical Cybernetics Laboratory at the Partners Healthcare Center for Personalized Genetic Medicine.


  • Barrett CL, Kim TY, Kim HU, et al. Systems biology as a foundation for genome-scale synthetic biology. Curr Opin Biotechnol. 2006;17:488–92.
  • Lee SK, Chou H, Ham TS, et al. Metabolic engineering of microorganisms for biofuels production: from bugs to synthetic biology to fuels. Curr Opin Biotechnol. 2008;19:556–63.
  • Chang MCY, Keasling JD. Production of isoprenoid pharmaceuticals by engineered microbes. Nat Chem Biol. 2006;2:674–81.
  • Weber W, Schoenmakers R, Keller B, et al. A synthetic mammalian gene circuit reveals antituberculosis compounds. Proc Natl Acad Sci USA. 2008;105:9994–8.
  • Lu TK, Collins JJ. Engineered bacteriophage targeting gene networks as adjuvants for antibiotic therapy. Proc Natl Acad Sci USA. 2009;106:4629–34.
  • Marguet P, Balagadde F, Tan C, et al. Biology by design: reduction and synthesis of cellular components and behaviour. J R Soc Interface. 2007;4:607–23.
  • Knight T. Idempotent vector design for standard assembly of BioBricks. MIT Synthetic Biology Working Group. 2003;1:1–11. (23 October 2009, date last accessed)
  • Brown J. The iGEM competition: building with biology. IET Synth Biol. 2007;1:3–6.
  • Purnick PEM, Weiss R. The second wave of synthetic biology: from modules to systems. Nat Rev Mol Cell Biol. 2009;10:410–22.
  • Matsuoka Y, Ghosh S, Kitano H. Consistent design schematics for biological systems: standardization of representation in biological engineering. J R Soc Interface (Advance online version). 2009. doi: 10.1098/rsif.2009.0046.focus. (23 October 2009, date last accessed)
  • Ellis T, Wang X, Collins JJ. Diversity-based, model-guided construction of synthetic gene networks with predicted functions. Nat Biotechnol. 2009;27:465–71.
  • Canton B, Labno A, Endy D. Refinement and standardization of synthetic biological parts and devices. Nat Biotechnol. 2008;26:787–93.
  • Peccoud J, Blauvelt MF, Cai Y, et al. Targeted development of registries of biological parts. PLoS ONE. 2008;3:e2671.
  • Participants. PoBoL: provisional BioBrick language. Standards and Specifications in Synthetic Biology Workshop, 26–27 April 2008; Seattle, WA, USA.
  • Hucka M, Finney A, Sauro HM, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19:524–31.
  • Finney A, Hucka M. Systems biology markup language: Level 2 and beyond. Biochem Soc Trans. 2003;31:1472–3.
  • Hucka M, Finney A, Bornstein BJ, et al. Evolving a lingua franca and associated software infrastructure for computational systems biology: the Systems Biology Markup Language (SBML) project. Syst Biol (Stevenage). 2004;1:41–53.
  • Endler L, Rodriguez N, Juty N, et al. Designing and encoding models for synthetic biology. J R Soc Interface. 2009;6:S405–17.
  • Beard DA, Britten R, Cooling MT, et al. CellML metadata standards, associated tools and repositories. Philos Transact A Math Phys Eng Sci. 2009;367:1845–67.
  • Lloyd CM, Halstead MD, Nielsen PF. CellML: its future, present and past. Prog Biophys Mol Biol. 2004;85:433–50.
  • Le Novère N, Finney A, Hucka M, et al. Minimum information requested in the annotation of biochemical models (MIRIAM). Nat Biotechnol. 2005;23:1509–15.
  • Le Novère N, Moodie S, Sorokin A, et al. Systems biology graphical notation: process diagram level 1. Nature Precedings. 2008:1–75. (23 October 2009, date last accessed)
  • Le Novère N, Hucka M, Mi H, et al. The systems biology graphical notation. Nat Biotechnol. 2009;27:735–41.
  • BioPAX Workgroup. BioPAX – biological pathways exchange language, level 3, release candidate 3 (version 0.92) documentation. 2007.
  • Ausbrooks R, Buswell S, Carlisle D, et al. Mathematical Markup Language (MathML) Version 2.0, 2nd edn. Carlisle D, Ion P, Miner R, Poppelier N, editors. W3C Recommendation REC-MathML2-20031021, 2003. (23 October 2009, date last accessed)
  • RDF/XML Syntax Specification (Revised). (23 October 2009, date last accessed). World Wide Web URL:
  • Le Novère N, Courtot M, Laibe C. Adding semantics in kinetics models of biochemical pathways. Proceedings of the 2nd International Symposium on Experimental Standard Conditions of Enzyme Characterizations (ESEC 2006), 19–23 March 2006; Beilstein Institute, Frankfurt am Main, Germany, 2007, pp. 137–53. (23 October 2009, date last accessed)
  • Le Novère N. Model storage, exchange and integration. BMC Neurosci. 2006;7(Suppl 1):S11.
  • Marchisio MA, Stelling J. Computational design of synthetic gene circuits with composable parts. Bioinformatics. 2008;24:1903–10.
  • Bornstein BJ, Keating SM, Jouraku A, et al. LibSBML: an API library for SBML. Bioinformatics. 2008;24:880–1.
  • de Silva E, Stumpf MPH. Complex networks and simple models in biology. J R Soc Interface. 2005;2:419–30.
  • Le Novère N, Bornstein B, Broicher A, et al. BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. Nucleic Acids Res. 2006;34:D689–91.
  • Erhard F, Friedel CC, Zimmer R. FERN – a Java framework for stochastic simulation and evaluation of reaction networks. BMC Bioinformatics. 2008;9:356.
  • Hower V, Mendes P, Torti FM, et al. A general map of iron metabolism and tissue-specific subnetworks. Mol Biosyst. 2009;5:422–43.
  • Calzone L, Gelay A, Zinovyev A, et al. A comprehensive modular map of molecular interactions in RB/E2F pathway. Mol Syst Biol. 2008;4:173.
  • Bentley PJ. Methods for improving simulations of biological systems: systemic computation and fractal proteins. J R Soc Interface. 2009;6:S451–66.
  • Schilstra MJ, Li L, Matthews J, et al. CellML2SBML: conversion of CellML into SBML. Bioinformatics. 2006;22:1018–20.
  • Lloyd CM, Lawson JR, Hunter PJ, et al. The CellML model repository. Bioinformatics. 2008;24:2122–3.
  • Wimalaratne SM, Halstead MDB, Lloyd CM, et al. Biophysical annotation and representation of CellML models. Bioinformatics. 2009;25:2263–70.
  • Berners-Lee T, Fielding R, Masinter L. Uniform Resource Identifier (URI): Generic Syntax. RFC 3986. Request For Comments Archive, 2005. (23 October 2009, date last accessed)
  • Laibe C, Le Novère N. MIRIAM resources: tools to generate and resolve robust cross-references in systems biology. BMC Syst Biol. 2007;1:58.
  • Drager A, Hassis N, Supper J, et al. SBMLsqueezer: a CellDesigner plug-in to generate kinetic rate equations for biochemical networks. BMC Syst Biol. 2008;2:39.
  • Luciano JS. PAX of mind for pathway researchers. Drug Discov Today. 2005;10:937–42.
  • Stromback L, Lambrix P. Representations of molecular pathways: an evaluation of SBML, PSI MI and BioPAX. Bioinformatics. 2005;21:4401–7.
  • Pedersen M, Phillips A. Towards programming languages for genetic engineering of living cells. J R Soc Interface. 2009;6:S437–50.
  • Bray T, Paoli J, Sperberg-McQueen CM, et al. Extensible Markup Language (XML) 1.0. 5th edn. 2008. (23 October 2009, date last accessed)
  • Hill AD, Tomshine JR, Weeding EMB, et al. SynBioSS: the synthetic biology modeling suite. Bioinformatics. 2008;24:2551–3.
  • Czar MJ, Cai Y, Peccoud J. Writing DNA with GenoCAD. Nucleic Acids Res. 2009;37:W40–7.
  • Garny A, Noble D, Hunter PJ, et al. Cellular Open Resource (COR): current status and future directions. Philos Transact A Math Phys Eng Sci. 2009;367:1885–905.
  • Cline MS, Smoot M, Cerami E, et al. Integration of biological networks and gene expression data using Cytoscape. Nat Protoc. 2007;2:2366–82.
  • Guziolowski C, Bourde A, Moreews F, et al. BioQuali Cytoscape plugin: analysing the global consistency of regulatory networks. BMC Genomics. 2009;10:244.
  • Bindea G, Mlecnik B, Hackl H, et al. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. 2009;25:1091–3.
  • Clement-Ziza M, Malabat C, Weber C, et al. Genoscape: a Cytoscape plug-in to automate the retrieval and integration of gene expression data and molecular networks. Bioinformatics. 2009;25:2617–8.
  • Gao J, Ade AS, Tarcea VG, et al. Integrating and annotating the interactome using the MiMI plugin for Cytoscape. Bioinformatics. 2009;25:137–8.
  • Funahashi A, Morohashi M, Kitano H, et al. CellDesigner: a process diagram editor for gene-regulatory and biochemical networks. BIOSILICO. 2003;1:159–62.
  • Minaeva NI, Gak ER, Zimenkov DV, et al. Dual-In/Out strategy for genes integration into bacterial chromosome: a novel approach to step-by-step construction of plasmid-less marker-less recombinant E. coli strains with predesigned genome structure. BMC Biotechnol. 2008;8:63.
  • Loewe L. A framework for evolutionary systems biology. BMC Syst Biol. 2009;3:27.
  • Chen BS, Chang CH, Lee HC. Robust synthetic biology design: stochastic game theory approach. Bioinformatics. 2009;25:1822–30.
  • Chen BS, Wu CH. A systematic design method for robust synthetic biology to satisfy design specifications. BMC Syst Biol. 2009;3:66.
  • Banga J. Optimization in computational systems biology. BMC Syst Biol. 2008;2:47.
  • Hadlich F, Noack S, Wiechert W. Translating biochemical network models between different kinetic formats. Metab Eng. 2009;11:87–100.
  • Dasika M, Maranas C. OptCircuit: an optimization based method for computational design of genetic circuits. BMC Syst Biol. 2008;2:24.
  • Cantone I, Marucci L, Iorio F, et al. A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell. 2009;137:172–81.
  • Stumpf MP, Wiuf C, May RM. Subnets of scale-free networks are not scale-free: sampling properties of networks. Proc Natl Acad Sci USA. 2005;102:4221–4.
  • Lenaerts T, Ferkinghoff-Borg J, Schymkowitz J, et al. Information theoretical quantification of cooperativity in signalling complexes. BMC Syst Biol. 2009;3:9.
  • Lenaerts T, Ferkinghoff-Borg J, Stricher F, et al. Quantifying information transfer by protein domains: analysis of the Fyn SH2 domain structure. BMC Struct Biol. 2008;8:43.
  • Greber D, Fussenegger M. Mammalian synthetic biology: engineering of sophisticated gene networks. J Biotechnol. 2007;130:329–45.
  • Linshiz G, Yehezkel TB, Kaplan S, et al. Recursive construction of perfect DNA molecules from imperfect oligonucleotides. Mol Syst Biol. 2008;4:191.
  • Gibson DG, Benders GA, Andrews-Pfannkoch C, et al. Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome. Science. 2008;319:1215–20.
  • Leonard E, Nielsen D, Solomon K, et al. Engineering microbes with synthetic biology frameworks. Trends Biotechnol. 2008;26:674–81.
  • Martin CH, Nielsen DR, Solomon KV, et al. Synthetic metabolism: engineering biology at the protein and pathway scales. Chem Biol. 2009;16:277–86.
  • Lucks JB, Qi L, Whitaker WR, et al. Toward scalable parts families for predictable design of biological circuits. Curr Opin Microbiol. 2008;11:567–73.
  • Welch M, Villalobos A, Gustafsson C, et al. You're one in a googol: optimizing genes for protein expression. J R Soc Interface. 2009;6:S467–76.
  • Suarez M, Jaramillo A. Challenges in the computational design of proteins. J R Soc Interface. 2009;6:S477–91.
  • Friedland AE, Lu TK, Wang X, et al. Synthetic gene networks that count. Science. 2009;324:1199–202.
  • Tigges M, Marquez-Lago TT, Stelling J, et al. A tunable synthetic mammalian oscillator. Nature. 2009;457:309–12.
  • Ogawa A, Maeda M. An artificial aptazyme-based riboswitch and its cascading system in E. coli. ChemBioChem. 2008;9:206–9.
  • Stricker J, Cookson S, Bennett MR, et al. A fast, robust and tunable synthetic gene oscillator. Nature. 2008;456:516–9.
  • Ham TS, Lee SK, Keasling JD, et al. Design and construction of a double inversion recombination switch for heritable sequential genetic memory. PLoS ONE. 2008;3:e2815.
  • Tsai TY-C, Choi YS, Ma W, et al. Robust, tunable biological oscillations from interlinked positive and negative feedback loops. Science. 2008;321:126–9.
  • Dawid A, Cayrol B, Isambert H. RNA synthetic biology inspired from bacteria: construction of transcription attenuators under antisense regulation. Phys Biol. 2009;6:025007.
  • Andrianantoandro E, Basu S, Karig DK, et al. Synthetic biology: new engineering rules for an emerging discipline. Mol Syst Biol. 2006;2:2006.0028.
  • Carrera J, Rodrigo G, Jaramillo A. Model-based redesign of global transcription regulation. Nucleic Acids Res. 2009;37:e38.
  • Guye P, Weiss R. Customized signaling with reconfigurable protein scaffolds. Nat Biotechnol. 2008;26:526–8.
  • Bashor CJ, Helman NC, Yan S, et al. Using engineered scaffold interactions to reshape MAP kinase pathway signaling dynamics. Science. 2008;319:1539–43.
  • Pawson T, Nash P. Assembly of cell regulatory systems through protein interaction domains. Science. 2003;300:445–52.
  • Weber W, Schuetz M, Denervaud N, et al. A synthetic metabolite-based mammalian inter-cell signaling system. Mol Biosyst. 2009;5:757–63.
  • Yin P, Choi HMT, Calvert CR, et al. Programming biomolecular self-assembly pathways. Nature. 2008;451:318–22.
  • Lu T, Ferry M, Weiss R, et al. A molecular noise generator. Phys Biol. 2008;5:036006.
  • Forster AC, Church GM. Synthetic biology projects in vitro. Genome Res. 2007;17:1–6.
  • Itaya M, Fujita K, Kuroki A, et al. Bottom-up genome assembly using the Bacillus subtilis genome vector. Nat Methods. 2008;5:41–3.
  • Shimizu Y, Kuruma Y, Ying BW, et al. Cell-free translation systems for protein engineering. FEBS J. 2006;273:4133–40.
  • Shao Z, Zhao H, Zhao H. DNA assembler, an in vivo genetic method for rapid construction of biochemical pathways. Nucleic Acids Res. 2009;37:e16.
  • Fraser PD, Enfissi EMA, Bramley PM. Genetic engineering of carotenoid formation in tomato fruit and the potential application of systems and synthetic biology approaches. Arch Biochem Biophys. 2009;483:196–204.
  • Keseler IM, Bonavides-Martinez C, Collado-Vides J, et al. EcoCyc: a comprehensive view of Escherichia coli biology. Nucleic Acids Res. 2009;37:D464–70.
  • Forster AC, Church GM. Towards synthesis of a minimal cell. Mol Syst Biol. 2006;2:45.
  • Moya A, Gil R, Latorre A, et al. Toward minimal bacterial cells: evolution vs. design. FEMS Microbiol Rev. 2009;33:225–35.
  • Rackham O, Chin JW. A network of orthogonal ribosome-mRNA pairs. Nat Chem Biol. 2005;1:159–66.
  • Rackham O, Chin JW. Synthesizing cellular networks from evolved ribosome-mRNA pairs. Biochem Soc Trans. 2006;34:328–9.
  • An W, Chin JW. Synthesis of orthogonal transcription-translation networks. Proc Natl Acad Sci USA. 2009;106:8477–82.
  • Filipovska A, Rackham O. Building a parallel metabolism within the cell. ACS Chem Biol. 2008;3:51–63.
  • Kuruma Y, Stano P, Ueda T, et al. A synthetic biology approach to the construction of membrane proteins in semi-synthetic minimal cells. Biochim Biophys Acta. 2009;1788:567–74.
  • Jewett MC, Calhoun KA, Voloshin A, et al. An integrated cell-free metabolic platform for protein production and synthetic biology. Mol Syst Biol. 2008;4:220.
  • Tong AH, Lesage G, Bader GD, et al. Global mapping of the yeast genetic interaction network. Science. 2004;303:808–13.
  • Bader JS, Chaudhuri A, Rothberg JM, et al. Gaining confidence in high-throughput protein interaction networks. Nat Biotechnol. 2004;22:78–85.
  • Lappe M, Holm L. Unraveling protein interaction networks with near-optimal efficiency. Nat Biotechnol. 2004;22:98–103.

Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press