The advent of high throughput genome-scale bioinformatics has led to an exponential increase in available cellular system data. Systems metabolic engineering attempts to use data-driven approaches – based on the data collected with high throughput technologies – to identify gene targets and optimize phenotypical properties on a systems level. Current systems metabolic engineering tools are limited for predicting and defining complex phenotypes such as chemical tolerances and other global, multigenic traits. The most pragmatic systems-based tool for metabolic engineering to arise is the in silico genome-scale metabolic reconstruction. This tool has seen wide adoption for modeling cell growth and predicting beneficial gene knockouts, and we examine here how this approach can be expanded for novel organisms. This review will highlight advances of the systems metabolic engineering approach with a focus on de novo development and use of genome-scale metabolic reconstructions for metabolic engineering applications. We will then discuss the challenges and prospects for this emerging field to enable model-based metabolic engineering. Specifically, we argue that current state-of-the-art systems metabolic engineering techniques represent a viable first step for improving product yield that still must be followed by combinatorial techniques or random strain mutagenesis to achieve optimal cellular systems.
Cellular metabolism is a coordinated, interwoven collection of metabolites, enzymes, and regulatory factors. Metabolic engineering attempts to favor desired product formation by reconfiguring this interconnected network through the introduction of genetic controls and novel enzymes. In one respect, the complexity of these cellular networks makes engineering these systems a daunting task. However, at the same time, each of the sources of complexity within a cell provides an access point for improving cellular phenotype. In this regard, changes at the genetic, regulatory, enzymatic, and small molecule level can lead to desired phenotypes. However, it is usually difficult (and sometimes impossible) to naively select a genetic target among the wide array of potential candidate genes and components. To address this complexity, systematic approaches for redesigning cells have been formalized and explored. This basic premise founded the field of Metabolic Engineering – initially aimed at identifying pathway limitations through the systematic analysis and quantification of pathway fluxes.
A great deal has changed since the field of metabolic engineering was first described nearly 20 years ago. Specifically, the advent of post-genomic technologies and high-throughput biology provides the ability to make cellular measurements and perturbations at vastly increased speeds and accuracy. The resulting explosion of data has made it possible to take accurate snapshots of entire cellular function. Integrating and synthesizing this data forms the foundation for systems biology research. When the efforts of systems biology and metabolic engineering are combined, a systems metabolic engineering approach emerges with promises to unlock cellular potential and describe cellular phenomena. This approach treats the cell as an integrated, global network, and attempts to use model-based and data-driven approaches to identify pathway bottlenecks and provide cellular reconstructions. However, the cost of producing these models (and the data required to create and validate them) is often high in both money and time. Therefore, it is essential that a sound methodology and firm set of outcomes and expectations be defined prior to utilizing these approaches. The purpose of this review is to provide an overview of systems metabolic engineering and to highlight areas where more work must be done before realizing the potential of ground-up systems metabolic engineering. In this review, we will particularly recognize the contributions of genome-scale modeling as it remains the most tangible and applicable systems biology approach for metabolic engineering. Prior to discussing these approaches, we will review the area of metabolic engineering to build the context for the emerging systems metabolic engineering paradigm. We will then discuss the complete workflow required to perform systems metabolic engineering for a newly discovered organism. Finally, we will conclude with prospects and challenges for the future of systems metabolic engineering as an enabling tool for improving cellular phenotypes.
Metabolic engineering embodies the manipulation of enzymatic, transport, and regulatory functions of a cell through recombinant DNA technologies with the goal of improving a cellular phenotype, often yield of a desired product. The traditional metabolic engineering toolbox comprises rationally selected deletions and overexpressions of native and heterologous genes. More recently, this toolbox has been expanded to include many new tools for controlling gene expression, for modulating regulatory networks, for combinatorial genetics, and for employing synthetic biology approaches (Text Box 1). The current portfolio of advances in metabolic engineering is large for such a young field of study. A recent example of complex metabolic pathway engineering can be seen in work by the Keasling lab to produce artemisinic acid, a precursor to artemisinin, an anti-malarial drug. By relieving growth inhibition caused by a toxic pathway intermediate [4–7] and by tuning intergenic regions of polycistronic operons to alter expression levels of individual genes to balance flux, flux through the pathway to the intermediate amorphadiene was increased 1 000 000-fold and resulted in an artemisinic acid titer of 300 mg/L. Similar heterologous pathway engineering approaches have been used to produce other complex products such as fosfomycin [9, 10] and novel polyketides [11–13].
Metabolic engineering has had additional success increasing the productivity of industrially relevant small molecules [14–17], alcohol-based biofuels [18–21], and biodiesel [22, 23]. Recently, this work has been expanded to hijack E. coli's amino acid biosynthetic pathway and divert 2-keto acid intermediates for the synthesis of 1-butanol, 2-methyl-1-butanol, 3-methyl-1-butanol, and 2-phenylethanol. A similar approach was used to produce (S)-3-methyl-1-pentanol. In contrast to these single pathway optimization projects, creating complex products or phenotypes places a larger demand on metabolic engineers, often requiring novel approaches to modulate multiple gene targets at the same time. Combinatorial metabolic engineering approaches have attempted to overcome this problem. Examples include improving xylose metabolism [26, 27] and further probing metabolic landscapes in E. coli. Genome shuffling and shotgun genomic approaches have been successful in inducing phenotypical improvements [29, 30] and have led to improved thermotolerance, ethanol tolerance, and ethanol production in Saccharomyces cerevisiae and to improved acid tolerance and L-lactic acid production in Lactobacillus rhamnosus. These are two industrially relevant strain improvements. A final example of combinatorial metabolic engineering is the use of global transcriptional machinery engineering (gTME) to increase ethanol tolerance and production in yeast [23, 33]. Most of these combinatorial tools were created to address a limited ability to systematically improve and model cellular phenotypes.
Systems metabolic engineering embodies the incorporation and probing of large-scale datasets with the goal of improving a cellular phenotype and synthesizing cellular function in the form of models. Recent advances in the “omics” technologies enabled by high throughput biology techniques have expanded traditional metabolic engineering to further incorporate a systems-level view of cells. These capabilities have ushered in the field of systems metabolic engineering [2, 34] (Fig. 1). However, this wealth of data has created new challenges. Specifically, a major goal of systems biology involves combining high throughput genomic, transcriptomic, proteomic, metabolomic, and fluxomic data to develop a robust and experimentally confirmable in silico cell model. This complete cell model could theoretically simulate (and ideally predict) cell and metabolic function. In this regard, this model would be invaluable for metabolic engineering by enabling rational predictions of phenotypical response for changes in media, gene knockouts, antibiotic effects, or incorporation of heterologous pathways.
Large-scale global measurements are favored as a means of assessing cellular and metabolic function. In some respects, the use of carbon-labeled substrates to reconstruct cell-wide flux maps represented the first attempt at applying a systems biology approach toward metabolic engineering. More recently, our capacities have expanded beyond this point to measure transcript levels, protein levels, interactions, concentrations, and even localizations. However, our ability to reconcile this data is not yet complete enough to build comprehensive cell models. Consequently, the most comprehensive and predictive models to come out of systems biology work are global metabolic network reconstructions. These models serve as a basic outgrowth of simple material balances and typically only account for stoichiometric reactions occurring within the cell. Despite being simplistic in their view of the cell, these models have successfully predicted various metabolic perturbations and can aid in designing improved cells (examples discussed below). These in silico genome-scale metabolic reconstructions form the backbone of future applications of systems metabolic engineering.
Metabolic engineering has numerous successes to its credit, and new metabolic engineering tools have recently been discovered. For instance, by mutating the genetic sequence of a constitutive promoter, Alper et al. created a library of promoters with different strengths, allowing the transcription rate of a gene to be modulated to desired levels by inserting the correct promoter upstream. This allows for the possibility of engineering the transcription rates of all enzymes in a metabolic pathway to optimize pathway flux. In addition to modifying the promoters themselves, Alper et al. [23, 33] mutated sigma factors in prokaryotes and binding and associated transcription factors in eukaryotes to alter the global transcriptional machinery of the model organism. This global transcription machinery engineering (gTME) altered expression levels for hundreds of different genes compared to wildtype expression levels. Several groups have employed shotgun genomics and combinatorial genetic screens to identify gene over-expressions in a high-throughput manner. Synthetic biologists have invented many devices that can be used by metabolic engineers for cellular programming. The development of an inducible “on” switch for cell pathways can allow for the production of toxic products when cell mass is maximized. Building on this, Gardner et al. developed a novel genetic toggle switch, an inducible system that can switch between two states, with gene A on and gene B off or with gene B on and gene A off. These cellular “on/off” switches can easily be incorporated into metabolic models.
These novel metabolic engineering tools, promoter engineering, gTME, combinatorial shotgun overexpressions, and synthetic cell switches, will prove valuable additions to the metabolic engineer's toolbox.
While still an emergent field, systems metabolic engineering has already had significant successes. Park et al. [35, 36] and Lee et al. [2, 36] used similar approaches to improve the production of L-valine and L-threonine, respectively in E. coli. Their approaches diverged from the traditional means of creating industrial amino acid producing strains based on using random mutations and screening. Product yield was first increased using traditional pathway approaches (overexpression of rate limiting enzymes and deletion of genes to increase metabolic precursors). Then a systems metabolic engineering approach was used taking advantage of transcriptome analysis and in silico model-based metabolic reconstructions to identify gene knockouts. Sequential rounds of non-random mutations resulted in final product yields of 0.378 g L-valine per gram glucose and 0.393 g L-threonine per gram glucose. These yields are comparable to those values obtained from industrial strains, demonstrating that a rational approach can achieve yields similar to an unguided approach in shorter time-frames. Moreover, these strains may be further enhanced through in silico predictions because their genomes are still fully characterized, unlike in a randomly mutated strain. This outcome represents an important difference in systems metabolic engineering over strain improvement through random mutagenesis.
In a similar fashion, Alper et al. used a genome-scale metabolic model of E. coli to identify single, double, and triple gene knockouts that improved lycopene production. A triple knockout system, which would have been intractable to discover without the genome-scale model using standard strain optimization search strategies, yielded lycopene production at nearly a 37% increase over an engineered parental strain. Furthermore, in 2002, Lee et al. [16, 39] used an early E. coli metabolic reconstruction to engineer the production of succinic acid, reaching 85% of the maximum theoretical yield.
More recently, these models have been employed to increase yields of complex products as in the case of recombinant human interleukin-2 (IL-2) production in E. coli. A simplified stoichiometric model (a lumped model that ignored vitamins and minerals) was used to predict amino acid supplementations that would increase IL-2 production. Successful results included increases from 81 to 195 mg IL-2/L in shake flask, from 403 to 594 mg IL-2/L in a fermenter, and from 5150 to 10 010 mg IL-2/L in a fed-batch cultivated fermenter. Albeit simplified, this model was able to successfully predict changes in culturing conditions.
Beyond E. coli, this approach has seen significant adoption in the yeast S. cerevisiae. Initial work was performed to improve succinic acid production in yeast. Bro et al. used an in silico genome-scale metabolic reconstruction of S. cerevisiae to engineer yeast reductive oxidative metabolism for decreased glycerol production and increased ethanol production on glucose as well as on a glucose/xylose carbon source.
A third industrially relevant organism suitable for systems metabolic engineering is the amino acid producing bacterium Corynebacterium glutamicum, responsible for producing nearly 1 500 000 tons/yr of L-glutamate and 550 000 tons/yr of L-lysine. Kjeldsen et al. recently published a genome-scale in silico metabolic reconstruction of C. glutamicum. In addition, work has already been done to aid in the reconstruction of transcriptional regulatory networks of Corynebacteria, easing potential future integration of metabolic and regulatory networks to increase model robustness. Given the results above, this model may be used to engineer C. glutamicum for increased industrial amino acid titers.
The results above illustrate the power of in silico modeling as a complement to traditional metabolic engineering approaches for producing small molecules and biopharmaceuticals. The successful application of limited genome-scale models for metabolic engineering gives reason to believe that further advances will be made when more comprehensive models are assembled. Beyond the limited scope of these models, a major potential drawback is the availability of a metabolic reconstruction. For standard, un-mutated strains of interest, network availability is often not an issue due to freely available genome-scale reconstructions for common organisms such as Escherichia coli, Saccharomyces cerevisiae, Aspergillus niger, Bacillus subtilis, and Corynebacterium glutamicum. However, for industrial strains, mutant versions of common strains, or newly isolated organisms, the lack of an in silico model limits the applicability of a systems-based approach. Thus, researchers are challenged with the following dilemma: invest resources to create a model or use established, traditional approaches. In the next section, we go through the steps in network construction and highlight limitations and challenges that need to be overcome.
A critical decision for any metabolic engineering project is the selection of a platform organism. Uncharacterized, non-model organisms may have innate biochemical pathways to produce desired products but can exhibit low growth rates or be limited by poorly developed genetic tools. Common industrial model organisms such as S. cerevisiae and E. coli have well-developed genetic tools, but may lack the necessary resistance or biosynthetic pathways.
When non-model organisms are chosen as the platform, building systems biology expertise and capacity from scratch is not a trivial task. High throughput biology measurements require high precision experiments. However, the price of these techniques is decreasing as they become more standardized. For the case of in silico genome-scale metabolic reconstructions, many steps are required if starting with an unsequenced organism (Fig. 2).
The required input for an in silico genome-scale metabolic reconstruction is the organism's genome sequence, which determines innate cellular capacities. The first genome sequenced was that of bacteriophage ΦX174, in 1977. A seemingly disproportionate amount of effort was necessary to discover the 5 kilobase sequence. By the early 1990s, Sanger's biochemistry had revolutionized sequencing technology, and the automation of Sanger's chain termination method would eventually allow for the shotgun sequencing of much larger genomes with decreasing costs. However, recent advances in “next-generation” sequencing (relying mainly on cheaper, shorter reads) have greatly reduced this cost. For example, the next-generation ABI SOLiD platform sequencer can sequence DNA for about $2 per megabase, compared to $500 per megabase with Sanger-based sequencing. Thus, next generation sequencers provide the means for rapid de novo sequencing, a required first step for model reconstruction.
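The per-megabase prices quoted above can be turned into a rough cost comparison. The following is only a back-of-the-envelope sketch: the genome size and the coverage value are hypothetical placeholders (not figures from this review), and real projects would use different coverage for each platform and incur assembly and labor costs that are ignored here.

```python
def sequencing_cost_usd(genome_size_mb: float, coverage: float, usd_per_mb: float) -> float:
    """Raw sequencing cost = (megabases read) x (price per megabase)."""
    return genome_size_mb * coverage * usd_per_mb


# Prices per megabase as quoted in the text.
SANGER_USD_PER_MB = 500.0
NEXT_GEN_USD_PER_MB = 2.0

# Hypothetical inputs (assumptions for illustration only).
GENOME_SIZE_MB = 5.0   # a bacterial-sized genome
COVERAGE = 1.0         # same coverage for both platforms, purely for comparison

sanger_cost = sequencing_cost_usd(GENOME_SIZE_MB, COVERAGE, SANGER_USD_PER_MB)
next_gen_cost = sequencing_cost_usd(GENOME_SIZE_MB, COVERAGE, NEXT_GEN_USD_PER_MB)
fold_reduction = sanger_cost / next_gen_cost

print(f"Sanger:   ${sanger_cost:,.0f}")
print(f"Next-gen: ${next_gen_cost:,.0f}")
print(f"Fold reduction at equal coverage: {fold_reduction:.0f}x")
```

At equal coverage the ratio depends only on the two quoted prices, so the assumed genome size cancels out of the fold reduction.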
These drastic decreases in sequencing cost have essentially brought major sequencing centers into the hands of individual investigators. This explosion of genomic data allows for comparative genomics, the systems-based study of genomes across different strains or species. Comparative genomics can be used to rationally engineer metabolic pathways by uncovering essential enzymes that can increase product yield. As an example, a comparative genomic analysis between the E. coli and Mannheimia succiniciproducens genomes, combined with in silico flux analysis, allowed for a seven-fold improvement in succinic acid yield in E. coli . In the near future, these costs will be reduced further, removing this step as a financial or time limitation in the systems metabolic engineering approach. We expect that sequencing diverse collections of organisms will lead to systems-based pathway design and will help advance the field of metabolic engineering by facilitating pathway construction and design. Currently however, the cost of sequencing for more than a handful of organisms cannot be overlooked as a trivial cost.
Following genome sequencing, the next step in the metabolic network reconstruction process requires the bioinformatic discovery of all unique ORFs coding for enzymes in the metabolic network. Once identified, ORFs are assigned an enzyme functionality based on database information that includes activity, substrate specificity, cofactor dependence, and location within the cell (for compartmentalized models). Challenges to this process include assigning function to enzymes that may catalyze several reactions, enzymes with broad substrate specificity, or enzymes unique to the organism of study.
Genome annotation and metabolic reconstructions can be automated through a coupling with metabolic databases (Table 1), such as KEGG, LIGAND, BioCYC, EcoCYC, MetaCYC, PathBLAST, FMM, SEED, and BRENDA. These databases collect bioinformatic and systems biology knowledge sets, serving as a repository for new or well-characterized pathways and reconstructions. As this repository grows in size and continues to be characterized, it essentially amounts to available biological catalysts that can be “pulled off the shelf” and imported into cells via synthetic biology constructs. Thus, the prospect for de novo pathway design is being advanced in these endeavors.
For uncommon genes, function can be assigned using gene finding algorithms, sequence homology searches, and non-homology based algorithms (Table 1). The drawback of these approaches is that they rely upon databases and commonalities of studied organisms, but seemingly similar enzymes can have different functionalities in different organisms. Furthermore, many unsequenced organisms contain pathways, enzymes, or metabolites that have not been characterized. In particular, the biochemical verification of even very closely related enzymes is important because minor differences in sequence have been known to drastically alter enzyme function. Thus, a great deal of hand curation is necessary to ensure the accuracy of the genome annotation, making this step the limiting factor for employing a systems metabolic engineering approach to unsequenced organisms.
Once organism annotation is completed, it is necessary to create the metabolic network and component interactions. For a metabolic network enzyme, reaction stoichiometry, cofactor specificity, substrate specificity, directionality or reversibility, and cellular location all need to be included in the model. The combination of verified automatically generated network reactions and manually curated reactions yields the first version of the genome-scale metabolic reconstruction.
The first metabolic reconstruction is often neither sufficiently accurate nor complete, so an approach to improve the model is necessary. At this stage, computational flux analysis can be performed using the in silico metabolic reconstruction to assess its accuracy (Table 1). In this regard, flux balance analysis (FBA) is a constraint-based optimization approach toward quantifying flux distribution inside a genome-scale in silico metabolic reconstruction under the assumption of steady state conditions. An informative review of many flux analysis techniques has previously been published by Park et al. Initially, these in silico models are used to assess whether the model can predict biomass formation in general. These biomass formation tests are used to address the absence of key metabolic enzymes.
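For readers less familiar with FBA, the optimization problem described in words above is commonly written as a linear program. The notation below is an assumption introduced here for illustration and does not appear in the original text: S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c is a weight vector that selects the objective (typically the biomass reaction), and the bounds encode reversibility and uptake limits.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0 && \text{(steady state: no net accumulation of any metabolite)} \\
& v_j^{\min} \le v_j \le v_j^{\max} && \text{for every reaction } j .
\end{aligned}
```

In this form, a biomass formation test amounts to checking whether the optimal value of the objective is strictly positive under the chosen medium constraints.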
After successfully modeling cell growth, the reconstruction should be checked to confirm that it generates all known metabolites produced by the organism. This step requires either prior knowledge of the strain or global systems biology measurements of cellular components, including metabolites. The inability of the metabolic reconstruction simulation to produce a known cellular component implies a gap in the network that must be resolved using database or experimental methods. Furthermore, Manichaikul et al. developed a methodology based on RT-PCR and RACE to verify hypothetical enzymes in order to refine genome annotations.
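The gap check described above can be illustrated with a simple reachability test. This is a deliberately simplified, hypothetical sketch: it ignores stoichiometry and flux balance and only asks whether each known metabolite can be reached from the medium through the listed reactions. The reaction names, metabolite identifiers, and function names are all invented for illustration.

```python
def producible_metabolites(reactions, seed):
    """Expand the set of available metabolites by repeatedly 'firing' every
    reaction whose substrates are all available (topological check only)."""
    available = set(seed)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def find_gaps(reactions, seed, known_metabolites):
    """Known metabolites that the network cannot reach from the seed set."""
    reachable = producible_metabolites(reactions, seed)
    return sorted(set(known_metabolites) - reachable)


# Toy network: reaction id -> (substrates, products). All names are hypothetical.
toy_reactions = {
    "R1": (["glc"], ["g6p"]),
    "R2": (["g6p"], ["pyr"]),
    "R3": (["oaa", "accoa"], ["cit"]),  # cannot fire: oaa and accoa are never made
}
medium = ["glc"]
observed_metabolites = ["g6p", "pyr", "cit"]

gaps = find_gaps(toy_reactions, medium, observed_metabolites)
print("Metabolites indicating a network gap:", gaps)
```

In this toy case the unreachable metabolite points to missing reactions upstream of it, which is the kind of gap that would then be resolved through database searches or experiments.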
When the metabolic reconstruction can replicate cellular metabolite synthesis, further iterative improvement of the model is possible using comparative gene knockout data. Experimental data about growth or lack of growth of gene knockout strains can be compared to the in silico model. If the in silico model predicts no growth, while the experimental knockout grows, the discrepancy is likely due to an isozyme or an unsuspected metabolic pathway. If the in silico model predicts growth, but the experimental knockout does not grow, then most likely there are enzyme functionalities in the metabolic reconstruction that are not actually present in the organism. GrowMatch is a novel program that can automatically search for reactions to add to or to suppress from the network to help fix these growth versus no growth discrepancies.
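The two kinds of discrepancy described above can be sorted mechanically before any curation begins. The sketch below is not GrowMatch and does not reproduce its algorithm; it only classifies each knockout by comparing a predicted and an observed growth call. The gene names and data are hypothetical.

```python
def classify_knockout(predicted_growth: bool, observed_growth: bool) -> str:
    """Compare an in silico growth prediction with the experimental result."""
    if predicted_growth == observed_growth:
        return "consistent"
    if observed_growth:
        # Model says no growth, strain grows: the model is likely missing
        # an isozyme or an alternative pathway.
        return "model missing a reaction"
    # Model says growth, strain does not grow: the model likely contains
    # a functionality that the organism does not have.
    return "model has an extra reaction"


# Hypothetical knockout data: gene -> does the knockout strain grow?
predicted = {"geneA": True, "geneB": False, "geneC": True}
observed = {"geneA": True, "geneB": True, "geneC": False}

report = {gene: classify_knockout(predicted[gene], observed[gene]) for gene in predicted}
for gene, verdict in report.items():
    print(f"{gene}: {verdict}")
```

Genes flagged as inconsistent are the starting points for adding or suppressing reactions in the next model iteration.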
To the present date, most successful systems metabolic engineering approaches have used flux analysis to model gene knockouts in metabolic reconstructions in order to identify otherwise intractable deletions to increase product yield. These gene knockouts are modeled in silico by constraining the flux through the deleted reaction to be zero. Often, these reconstructions also introduce synthetic enzyme pathways to model non-native products.
A number of optimization algorithms have been developed to better reflect flux redistributions in response to a gene deletion. In addition to the linear programming of FBA, minimization algorithms such as minimization of metabolic adjustment (MOMA) and regulatory on/off minimization (ROOM) have been developed to model gene knockouts. MOMA minimizes the flux redistributions in knockout models compared to wildtype fluxes, while ROOM minimizes the number of significant flux changes in knockout flux models compared to wildtype fluxes. FBA, MOMA, and ROOM all only have a single objective function when calculating cellular flux distributions, normally to optimize cell growth. However, the common objective of a systems metabolic engineer is to optimize production of a given metabolite without decreasing cell growth. Therefore, the ability to specify multiple objectives, as in OptKnock, is very useful for the dual optimization of cell growth and product yield. Additional modeling constructions such as OptStrain, OptReg, and OptGene have potential systems metabolic engineering application. These modeling approaches attempt to find multiple targets for flux improvement through gene overexpressions, gene knockdowns, gene deletions, and heterologous protein incorporations. These approaches are expanding the in silico metabolic engineering toolbox to complement our experimental capacities.
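As a sketch of how a knockout enters these formulations, the constraint set of FBA can be extended with a zero-flux condition, and MOMA then replaces the growth objective with a distance to the wildtype flux distribution. The symbols are assumptions introduced here for illustration: S is the stoichiometric matrix, v the knockout flux vector, v^wt the wildtype flux vector, and K the set of reactions removed by the deletion.

```latex
\begin{aligned}
\text{Knockout constraint:} \quad & v_k = 0 \quad \text{for all } k \in K \\[6pt]
\text{MOMA:} \quad & \min_{v} \; \sum_{j} \left( v_j - v_j^{\mathrm{wt}} \right)^2 \\
\text{subject to} \quad & S\,v = 0, \qquad v_j^{\min} \le v_j \le v_j^{\max}, \qquad v_k = 0 \;\; \text{for all } k \in K .
\end{aligned}
```

Under the same constraints, ROOM instead counts the reactions whose flux differs significantly from the wildtype value and minimizes that count, which turns the problem into a mixed-integer program rather than a quadratic one.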
These flux analysis algorithms can be used to model gene knockouts and insertion of heterologous enzymes for metabolic pathway engineering applications; however, they do have inherent limitations. Heterologous enzymes are simply inserted into the reconstruction matrix, assuming that they are going to be actively expressed in their new host. This is often not the case due to solubility problems, unknown regulators, or problems with unoptimized codons. In addition, these flux analyses are not particularly adept at modeling overexpressions of enzymes. This limitation results from the fact that flux analyses based on stoichiometric matrices tend to represent gene overexpression by adjusting the flux constraint. When looking at the basic genome to fluxome cellular data pathway: Genome → transcriptome → proteome → metabolome → fluxome, it can be seen that the enzyme overexpression occurs at the genomic level. Due to cellular complexity, there is no direct, linear relation between any of the levels of cellular data. In addition, the enzyme may not be the rate-limiting step or may be part of a tightly regulated pathway, so overexpression will have no phenotypical effect. Thus, these models are more predictive for the type of activity change required in the cell rather than the way in which to deliver this change. In this respect, these systems metabolic engineering tools provide guidance for strain engineering, but still rely on the availability of a powerful metabolic engineering toolbox (Text Box 1) capable of inducing these changes.
A general advantage of the systems metabolic engineering approach to strain improvement is complete cataloguing of all genetic modifications, allowing for future rational pathway or systems based engineering. To utilize traditional metabolic engineering techniques, only knowledge of the basic pathway of interest is prerequisite, and all engineering endeavors exploit only reactions closely related to the basic pathway. On the contrary, the systems metabolic engineering approach can target enzymes for gene knockout that are seemingly not related to the pathway or phenotype of interest. The cumulative effects of these knockouts combined with traditional metabolic engineering approaches generate an increased product titer or an enhanced phenotype. Even after several rounds of rational engineering, it is possible to adapt the metabolic reconstruction to model the engineered strain by constraining it to accurately reflect experimental 13Carbon flux data. This reconstruction can be used for further in silico analysis to search for potential genetic modifications [2, 35, 36]. Thus, an iterative systems metabolic engineering approach may be used to engineer already altered cells.
Systems metabolic engineering is dependent on the availability and accuracy of high throughput data to incorporate into in silico models. The omics revolution has vastly increased our knowledge base, but our basic understanding of the cell's complexity is rudimentary. Therefore, our in silico cell models are also comparatively simple. Systems metabolic engineering's effectiveness at phenotypical predictions will theoretically increase as in silico reconstructions further reflect the complexity of the cell.
Reconstruction complexity can be increased in three ways in the near future. Firstly, in vivo enzyme kinetic data are lacking for many metabolic reactions, and incorporating this data into the metabolic model will increase its predictive prowess. Secondly, a suitable in silico representation of traditional enzyme overexpression and novel enzyme underexpression [8, 77] will permit the modeling of essential metabolic engineering techniques, increasing the capacity of the reconstruction to model genetic perturbations. By simulating gene over-expression, knockdown, and knockout, instead of only gene knockout, the power of the reconstruction will have increased three fold. Finally, the incorporation of transcriptional regulatory networks and signaling networks into the metabolic network reconstruction will greatly increase the predictive efficacy of the model and allow for dynamic cellular representation.
Transcriptional regulatory network reconstructions can be created following the same basic outline as for metabolic reconstructions. Recent work by Faith et al. using network inference algorithms may automate the future development of regulatory networks. In 2002, Covert and Palsson developed an experimentally confirmable integrated metabolic/regulatory model for the central carbon metabolism of E. coli that could model growth, substrate uptake, product secretion, and gene expression. Furthermore, a very complete genome-scale reconstruction of E. coli's transcriptional and translational machinery was recently completed by Thiele et al., but it has not been integrated into the metabolic reconstruction. Advanced integrated transcriptional regulatory/metabolic reconstructions could be used to model global transcriptional modulation, a key facet missing from existing metabolic reconstructions. Dynamic analysis of integrated metabolic, transcriptional regulatory, and signaling networks is in the initial stages. Lee et al. proposed an integrative, dynamic FBA to solve a stoichiometric reconstruction containing metabolic, regulatory, and signaling processes of a S. cerevisiae pathway. Likewise, Covert et al. modeled the dynamic behavior of the three networks in E. coli central carbon metabolism using an altered FBA approach. When the integration of the metabolic, transcriptional regulatory, and signal transduction cellular networks is scaled up to the genome level, it will vastly improve the predictive power of in silico cell models.
It is important to note that as network or integrated network reconstructions increase in complexity, the computational power necessary to model flux analyses increases. Modeling multiple gene knockouts by systematically searching the reconstruction for single, double, triple, or more knockouts leads very quickly to combinatorial explosion. Therefore, a sequential, iterative approach is typically employed whereby the reconstruction is searched for a certain number of potential single knockouts, and then these single knockouts are searched for potential double knockouts, and so on. Alternatively, the optimization function can be augmented to simultaneously guarantee product yields and maximal growth. Regardless, increased computational power and refined optimization techniques would allow for a more thorough search.
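The contrast between an exhaustive knockout search and the sequential, iterative approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation from any of the cited studies: the gene names, the `toy_score` function (a hypothetical stand-in for a flux balance simulation of product yield), and the choice to keep a fixed number of best candidates per round (`beam`) are all invented for the example.

```python
from math import comb


def exhaustive_count(n_genes, max_k):
    """Number of knockout sets of size 1..max_k among n_genes candidates."""
    return sum(comb(n_genes, k) for k in range(1, max_k + 1))


def sequential_search(genes, score, max_k, beam=3):
    """Sequential search: keep the `beam` best knockout sets of size k,
    then extend only those sets by one more gene to reach size k + 1."""
    frontier = [frozenset()]
    best = (score(frozenset()), frozenset())
    evaluations = 0
    for _ in range(max_k):
        # Extend every retained knockout set by one additional gene.
        candidates = {ko | {g} for ko in frontier for g in genes if g not in ko}
        scored = []
        for ko in candidates:
            scored.append((score(ko), ko))
            evaluations += 1
        # Highest score first; ties broken by gene names for determinism.
        scored.sort(key=lambda t: (-t[0], sorted(t[1])))
        frontier = [ko for _, ko in scored[:beam]]
        if scored and scored[0][0] > best[0]:
            best = scored[0]
    return best, evaluations


def toy_score(knockouts):
    """Hypothetical stand-in for a model simulation (assumption): three
    genes help on their own, and two of them are synergistic together."""
    gains = {"g1": 0.10, "g4": 0.25, "g7": 0.15}
    total = sum(gains.get(g, -0.01) for g in knockouts)
    if {"g4", "g7"} <= knockouts:
        total += 0.20
    return total


genes = [f"g{i}" for i in range(20)]
(best_score, best_set), n_evals = sequential_search(genes, toy_score, max_k=3)
n_exhaustive = exhaustive_count(len(genes), 3)
print(sorted(best_set), n_evals, n_exhaustive)
```

Even with only 20 candidate genes and at most three knockouts, the exhaustive search must evaluate every single, double, and triple combination, whereas the sequential search evaluates only extensions of the retained sets. The trade-off is that a sequential search can miss combinations whose individual knockouts look unpromising on their own, which is one reason a more thorough search remains desirable.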
Assuming that reconstruction and modeling capabilities continue to improve, there are still two potential difficulties blocking the proliferation of the systems metabolic engineering approach. First and most importantly, even though the cost of generating the necessary high throughput data for a systems approach has been decreasing rapidly, the price tag can still be prohibitively high, especially in an industrial setting where many mutant strains are under study. These high throughput costs include sequencing the genome, generating microarrays for analyzing transcript and protein levels, and collecting 13C flux data to verify the flux model. For instance, if multiple randomly mutated strains have an improved phenotype, all of their genomes would have to be sequenced and analyzed in order to find the beneficial mutations. It may be more cost-effective to first shuffle the genomes together to combine the phenotypical improvements, and then sequence only the most improved strain. The second difficulty slowing the incorporation of systems-based approaches is that high throughput data in effect provide a "snapshot" of certain cellular component levels, such as transcript, metabolite, or protein levels. This snapshot provides a very descriptive view of the cell status but inherently carries no information about past or future cell states. Therefore, when a reconstruction is fitted to these high throughput data, it runs the risk of becoming a descriptive model instead of the desired predictive model. As a result, systems metabolic engineering must address both cost and model predictivity to compete with traditional strategies for strain improvement.
In the near future, the most successful approaches utilizing systems metabolic engineering will continue to employ genome-scale metabolic reconstructions to model gene knockouts. Currently, there are metabolic reconstructions for over 30 organisms [56, 83], but of these, it has been mainly the metabolic reconstructions of commonly modeled organisms such as E. coli and S. cerevisiae that have been used for systems approaches. As metabolic reconstructions are expanded and genetic tools are developed for non-model organisms, systems approaches toward engineering these organisms will thrive. Modeling gene overexpression and underexpression will be difficult because the relation between transcription levels and network flux modulation remains unquantified. However, once available, these capabilities will certainly advance the field.
To maximize the benefit of using systems approaches, a metabolic engineer must ask the following questions. First, what organism should be engineered to optimize yield? Second, in what stage of the strain development process should a systems biology approach be applied? Third, is a systems metabolic engineering approach more cost-effective than traditional pathway engineering or industry-styled random mutagenesis and screening?
In conclusion, the emerging field of systems metabolic engineering offers many prospects and capabilities. Previous systems metabolic engineering successes have used model organisms with existing metabolic reconstructions to discover gene knockouts to improve product yield. At the present time, strains improved in this manner can be further optimized using novel combinatorial techniques and random mutagenesis to increase product yield. As more non-model organisms or industrial strains are sequenced and annotated, the library of organisms (and enzymes) available to a systems metabolic engineer will increase. Furthermore, reconstruction predictive power can be improved by incorporating other cellular processes, allowing for the improvement of multigenic traits and non-metabolic phenotypes. At present, systems biology approaches for metabolic engineering work well for predicting changes for metabolic pathway engineering. The computational support of the field must also be advanced to match our current capabilities of high-throughput biology measurements. In addition, this approach has significant limitations in the engineering of complex, multigenic phenotypes such as tolerance and thus still requires follow-up using combinatorial metabolic and cellular engineering tools. As these approaches evolve, the cost of systems metabolic engineering will continue to decrease, allowing for widespread adoption as an iterative first step in phenotype improvement.
We acknowledge support from NIH (Grant number: 1R01GM090221-01) and the Camille and Henry Dreyfus New Faculty Award.
Dr. Hal Alper is an Assistant Professor in the Department of Chemical Engineering at The University of Texas at Austin. He earned his Ph.D. in chemical engineering from the Massachusetts Institute of Technology in 2006 and was a postdoctoral research associate at the Whitehead Institute for Biomedical Research from 2006–2008, and at Shire Human Genetic Therapies from 2007–2008. Dr. Alper's research is in the area of cellular and metabolic engineering. His research focuses on metabolic and cellular engineering in the context of biofuel, biochemical, and biopharmaceutical production in an array of model host organisms.
John Blazeck received his BS in chemical engineering from the University of Florida, and is a graduate student in the Laboratory for Cellular and Metabolic Engineering at the University of Texas at Austin. His current research utilizes oleaginous yeast for the production of biofuels.
The authors have declared no conflict of interest.