|Home | About | Journals | Submit | Contact Us | Français|
A genome-scale metabolic network reconstruction for Clostridium acetobutylicum (ATCC 824) was carried out using a new semi-automated reverse engineering algorithm. The network consists of 422 intracellular metabolites involved in 552 reactions and includes 80 membrane transport reactions. The metabolic network illustrates the reliance of clostridia on the urea cycle, intracellular l-glutamate solute pools, and the acetylornithine transaminase for amino acid biosynthesis from the 2-oxoglutarate precursor. The semi-automated reverse engineering algorithm identified discrepancies in reaction network databases that are major obstacles for fully automated network-building algorithms. The proposed semi-automated approach allowed for the conservation of unique clostridial metabolic pathways, such as an incomplete TCA cycle. A thermodynamic analysis was used to determine the physiological conditions under which proposed pathways (e.g., reverse partial TCA cycle and reverse arginine biosynthesis pathway) are feasible. The reconstructed metabolic network was used to create a genome-scale model that correctly characterized the butyrate kinase knock-out and the asolventogenic M5 pSOL1 megaplasmid degenerate strains. Systematic gene knock-out simulations were performed to identify a set of genes encoding clostridial enzymes essential for growth in silico.
Genome-scale models involve the application of flux balance analysis (FBA) to the two-dimensional stoichiometric matrix of a reconstructed metabolic network (Edwards et al., 1999; Stephanopoulos et al., 1998). Maximizing the specific growth rate has become an accepted objective function of FBA (Edwards et al., 1999), but not the only one (Knorr et al., 2007). Thermodynamic (Henry et al., 2007; Kummel et al., 2006) and regulatory (Covert et al., 2001; Gianchandani et al., 2006; Thomas et al., 2004, 2007) flux constraints along with metabolite conservation relationships (Cakir et al., 2006; Nikolaev et al., 2005) have been developed to decrease the size of the steady-state flux-distribution solution space of FBA.
Solventogenic butyric-acid clostridia are of interest for industrial solvent (particularly bio-butanol) production from diverse substrates, including most hexoses and pentoses, cellulose and xylans (Demain et al., 2005; Montoya et al., 2001; Schwarz, 2001). C. acetobutylicum ATCC 824 is the first sequenced solventogenic Clostridium and can be argued that it serves as a model organism for clostridial metabolism and sporulation in general (Paredes et al., 2005; Thormann et al., 2002). It is an endospore former that displays several defined cascading sigma-factor regulated metabolic programs which impact or are driven by the extracellular environment (Husemann and Papoutsakis, 1988; Jones and Woods, 1986; Paredes et al., 2005; Zhao et al., 2005). It also has an incomplete TCA cycle that may operate in reverse to synthesize fumarate from oxaloacetate (Nolling et al., 2001). Although a genome-scale model has also been constructed for the endospore-forming Bacillus subtilis (Oh et al., 2007), clostridia differ substantially from bacilli in many different ways (Paredes et al., 2005). For example, clostridia are strict anaerobes while bacilli are facultative aerobes. Thus, a genome-scale model of C. acetobutylicum will not only serve genetic, biotechnological and physiological research needs of butyric-acid clostridia, but significantly, its genome-scale metabolic model may eventually be extrapolated to similar pathogenic and non-pathogenic clostridia with annotated genomes.
The development of a genome-scale metabolic network reconstruction and associated stoichiometric matrix requires the piece-wise integration of: (i) enzymes with annotated Enzyme Commission (EC) numbers and associated biological reactions; (ii) metabolic pathway blueprints from biochemical reaction, enzymatic, and membrane transport databases; and (iii) physiological knowledge of the organism transcriptome, proteome and metabolome, including high-throughput data when available. The traditional model-building methodology involves iterative organization of these data into a functional flux network (Becker and Palsson, 2005; Forster et al., 2003; Heinemann et al., 2005). Automation of a metabolic network reconstruction, based on enzyme homology, requires the use of a generalized metabolic network topology readily available from reaction network databases such as Kyoto Encyclopedia of Genes and Genomes (KEGG) and MetaCyc (Caspi et al., 2006; Francke et al., 2005; Kanehisa and Goto, 2000). Due to incomplete genome annotation, these methods commonly result in a non-functional metabolic network due to missing enzymes and other gaps in the network. Thus, algorithms have been developed to automate the processes needed to rectify these discrepancies in metabolic network drafts.
From initial drafts of the genome-scale metabolic network for C. acetobutylicum presented here, two categories of network gaps were identified: (i) gaps resulting from missing enzymes or unknown biological reactions and (ii) gaps resulting from discrepancies in biological reaction databases due to incorrect and mislabeling of compounds and reactions. The first category of network gaps have been addressed by many recently developed algorithms. Techniques used by these algorithms include: genome context analysis (advances of comparative genomics), metabolic pathway homology, enzymatic databases, and high-throughput-omics data (Francke et al., 2005; Kharchenko et al., 2006; Kumar et al., 2007; Notebaart et al., 2006; Osterman and Overbeek, 2003). Other useful algorithms make use of growth phenotyping data (Reed et al., 2006) and genetic perturbations (MacCarthy et al., 2005; Tegner et al., 2003), but these data exist only for a very small percentage of organisms with sequenced and annotated genomes. To address both types of network gaps, analysis of the stoichiometric matrix can be used to identify compounds without both an origin of biosynthesis and degradation (or transport in/out of the network) (Kumar et al., 2007; Reed et al., 2003). From our experience, many discrepancies of the reconstructed metabolic network are not evident from direct analysis of the stoichiometric matrix itself. We found that some discrepancies result in internal cycling of isolated pathways within the metabolic network. Common fixes to metabolic network discrepancies allow transport of inadequately synthesized (or degraded) biological macromolecules into (or out of) the network. This methodology may result in a miscalculation of the metabolic flux profile. Here, we propose a new semi-automated algorithm, based on reverse engineering, to quickly identify both categories of discrepancies in the stoichiometric matrix and illustrate a few examples encountered in metabolic network reconstruction for C. acetobutylicum. Our method allows for the conservation of pathways unique to each bacterial genome. We also demonstrate the usefulness of thermodynamic analysis of proposed pathways here.
The genome-scale metabolic model for C. acetobutylicum was derived from mass balances given all known or predicted intracellular metabolic and membrane transport reactions as well as empirical relations for biomass composition. The pseudo-steady state assumption was assumed for all mass balances, resulting in a system of linear equations (Edwards et al., 1999; Papoutsakis, 1984). Prediction of metabolic reactions or transport processes were based on the annotated genome (Nolling et al., 2001) in conjunction with accumulated physiological data. Reactions of the genome-scale model consist of biochemical reactions given constant physiological pH 7 and concentration of free metal ions, as opposed to charge-balanced chemical reactions, according to a previously developed formalism (Alberty, 1993, 1994, 2002). The reconstruction of the metabolic network and integration of these pathways to simulate cell growth in silico was divided into the following separate processes: (i) building metabolic pathways and membrane transport reactions based on genomic annotation, enzyme homology and experimental observations; (ii) developing biomass constituting equations based on physiological data; and (iii) identifying incomplete metabolic pathways and missing metabolite membrane transport reactions through semiautomated reverse engineering of the metabolic network. These three model-building processes are discussed in detail below and were used iteratively to generate a genome-scale model of C. acetobutylicum capable of cell growth in silico.
The resulting composite equation, S · ν=0, consisted of a two-dimensional matrix, S, and a vector, ν, of all intracellular and membrane transport fluxes. Integration of transport reaction fluxes into the stoichiometric matrix of a metabolic model was published (Edwards et al., 2001). Constraints, in the form αi≤νi≤βi were applied to all components of the flux vector. A constraint for irreversibility consisted of setting αi or βi to zero (depending on the reaction-flux direction) while setting the opposite constraint near infinity. The flux vector was optimized through linear programming, a technique commonly referred to as FBA (Edwards et al., 1999; Papoutsakis, 1984). The objective function used in the optimization algorithm was to maximize the specific growth rate. The stoichiometric matrix was constructed in MATLAB (The Mathworks, Inc., Natick, MA). Constrained optimization by linear programming was performed with LINDO API (Lindo Systems, Chicago, IL), within the MATLAB environment. A list of all biochemical reactions, biomass constituting equations, exchange reactions, and associated ranges of applied constraints for FBA is given as Supplementary Appendix 1.
The iterative metabolic pathway construction procedure is summarized in Figure 1. The procedure was initiated with data mining of metabolic pathways specific to C. acetobutylicum contained in the KEGG (Kanehisa and Goto, 2000), the GenomeNet (Kanehisa et al., 2002), MetaCyc (Caspi et al., 2006) and the Comprehensive Microbial Resource (CMR) (Peterson et al., 2001) at The Institute for Genomic Research (TIGR; http://www.tigr.org/). This set of metabolic reactions was further supplemented with metabolite transport reactions obtained from the Transport Classification Database (TCD) (Busch and Saier, 2002; Saier et al., 2006) and TransportDB (Ren et al., 2007). Unresolved metabolic pathways were identified through reverse engineering of metabolic network reconstruction (discussed below). Additional metabolic and transport reactions were identified through the PUMA2 database (Maltsev et al., 2006) and literature specific to the C. acetobutylicum physiology. Furthermore, BLASTP analyses of C. acetobutylicum proteins of unknown function to other annotated clostridial genomes were used to identify additional enzymes contained in KEGG and CMR that were required by the metabolic network. In the absence of clostridial data, genomes of the well-studied bacteria (in order) B. subtilis (Kunst et al., 1997), Staphylococcus aureus N315 (Kuroda et al., 2001), and Escherichia coli K-12 MG1655 (Blattner et al., 1997) were used. The BRENDA enzymatic database (Schomburg et al., 2004) and ExPASy ENZYME database (Bairoch, 2000) were used to further identify substrates/products and stoichiometry of reactions catalyzed by individual enzymes and characterize unresolved pathways. The BRENDA database was also parsed to obtain a list of all enzymes catalyzing irreversible reactions under physiological conditions, and this list was used to identify enzymes in the C. acetobutylicum metabolic network catalyzing irreversible reactions.
The contribution of the metabolic network to the production of biomass was calculated based on genomic and physiological data available for C. acetobutylicum. The components of the biomass constituting equation were adapted from a platform initially created for S. aureus N315 (Heinemann et al., 2005) and recently used for Methanosarcina barkeri (Feist et al., 2006). Specifically, biomass was defined as a sum of: RNA, DNA, protein, lipids, cell wall, and solute pools of the cytoplasm. The specific definition of each of these broad terms was constructed according to genomic information obtained from NCBI and from literature data. The total list of biomass constituting equations and energetic requirements are shown in Supplementary Appendix 1. The average DNA composition was based on the nucleotide content of the entire genome and the pSOL1 megaplasmid. The average protein and RNA compositions were calculated from an analysis of known ORFs. The calculation of the average RNA sequence included ribosomal and tRNA sequences in addition to ORFs. Previously published data, specific to C. acetobutylicum and B. subtilis, enabled specifically tailored constituting equations for lipids, teichoic acids, and peptidoglycan biosyntheses. These equations are also shown in Supplementary Appendix 1. Due to the unavailability of specific data, the composition of the intracellular solute pool (shown in Supplementary Appendix 1) was assumed similar to those published for S. aureus N315 (Heinemann et al., 2005) with some notable exceptions (discussed later). Also consistent with the model for S. aureus (Heinemann et al., 2005), a growth maintenance value of 40 mmol ATP/(g cell dry weight per hour) was assumed (Stephanopoulos et al., 1998).
Data mining of biochemical pathway databases (KEGG, in particular) were used in compiling initial drafts of the metabolic network for C. acetobutylicum. However, as is currently the case for most genomes, incomplete gene annotation leads to several incomplete metabolic pathways within such biochemical pathway databases. In addition, other inconsistencies were observed in data obtained directly from these biochemical pathway databases. These included: (i) multiple identity markers for the same compound; (ii) compounds that lacked an origin of synthesis/degradation within the biochemical database; (iii) incorrect stoichiometry of metabolic reactions; and (iv) misappropriated enzymes to a particular cell type. Identification of the source of a broken metabolic pathway (gaps) of the network is a laborious task, especially in the case where multiple sources of inconsistencies may exist (Kumar et al., 2007; Reed et al., 2003). Thus, a reverse engineering approach was developed to identify such inconsistencies within the metabolic network. The approach was designed to be used in conjunction with or after the identification of dead-ends through stoichiometric matrix analysis (Reed et al., 2003). The proposed reverse engineering approach includes optimizing the reaction flux network with an objective function of maximizing the specific growth rate. In general, a metabolic network with one or multiple incomplete biochemical pathways (from substrate to biomass building blocks) was found to result in a maximized specific growth rate of zero (no growth in silico). This approach is illustrated by a flow diagram of Figure 2. Our reverse engineering algorithm requires a set of biomass constituting equations (see Supplementary Appendix 1) and a metabolic network (complete or incomplete). The set of membrane transporters required for minimal medium (Monot et al., 1982) (see Supplementary Appendix 1) were used here as well. If the application of FBA to the existing metabolic network does not yield the production of biomass in silico, biomass transfer equations are added to the metabolic network. These equations are listed in Supplementary Appendix 2 and consist of the individual components comprising biomass (e.g., RNA, DNA, protein, lipids, cell wall, and pooled solutes) and which are separately transported into an incomplete metabolic network. The addition of biomass transfer equations results in a positive specific growth rate in silico when FBA is applied. It is noted that biomass transfer equations and component transfer equations (discussed later) are arbitrary membrane transport equations used to identify metabolic network discrepancies only. These equations are not present in the final version of the metabolic network reconstruction. Following their addition, one-by-one the biomass transfer equations are eliminated. Once the elimination of a biomass transfer equation results in a specific growth rate of zero (arrested growth in silico), that broadly defined component of biomass is broken down into its constituents. For example, the biomass component RNA is composed of genome-specific stoichiometric amounts of ATP, CTP, GTP, and UTP. In this case, the RNA biomass transfer equation would be removed and ATP, CTP, GTP, and UTP would be added to the metabolic network by separate equations termed component transfer equations. The full list of component transfer equations used in the model-building process is given in Supplementary Appendix 2. In a similar procedure, the component transfer equations are systematically eliminated until a specific growth rate of zero is realized. The component responsible for arresting growth in silico is recognized as being inadequately synthesized/degraded in the existing metabolic network. Upon identification of this type of discrepancy in the metabolic network, iterative measures, as shown in Figure 1, are implemented to resolve the network connectivity.
We also assessed the thermodynamic feasibility of proposed metabolic pathways (e.g., the reverse partial TCA cycle) for C. acetobutylicum that are not common to reaction network databases. This was done by calculating the Gibbs free energy of all reactions of the pathway using previously published methods and estimated values for the standard Gibbs free energy of formation, , and estimated standard Gibbs free energy of reaction, (Henry et al., 2006, 2007).
A negative Gibbs free energy of reaction,
is required for a metabolic reaction to occur and was calculated given m compounds of a chemical reaction with stoichiometric coefficients n, where R is the ideal gas constant, and an assumed temperature, T, of 298 K. Millimolar concentrations, ci, of reaction components (Henry et al., 2006) and dimensionless activity coefficients, γi were used to calculate the concentration-dependent term of the Gibbs free energy of reaction equation (Eq. 2). As shown previously (Henry et al., 2007), the standard error in and terms calculated from group contribution theory (Mavrovouniotis, 1990) outweighed the influence of ionic strength, despite the illustration of its strong influence on ΔrG′ (Maskow and von Stockar, 2005). Given these results, activity coefficients were set to 1 for our calculations. For proposed pathways in C. acetobutylicum not native to reaction network databases (e.g., KEGG), combinations of metabolite concentrations yielding negative ΔrG′ values for every reaction in the pathway were calculated. Pathways incapable of producing negative ΔrG′ values for every reaction are thermodynamically infeasible. Resulting metabolite concentrations were compared to measured physiological metabolite concentrations of C. acetobutylicum (when available) to assess the practicality of the proposed reaction, similar to that done for glycolysis (Maskow and von Stockar, 2005). For cases in which not all metabolite data were available, ranges of metabolite concentrations at which a proposed pathway is feasible were calculated. It is noted that a wide range of short-comings currently exist for the thermodynamic analysis of metabolic pathways (Maskow and von Stockar, 2005). Aside from the obvious pitfalls of accurate and cytoplasm ionic strength calculations, the influence of intracellular pH changes on is unaccounted for in our calculations as pH 7 was assumed for all cases. In addition, charge balances were carried-out using only the major protonated species at pH 7 to remain consistent with calculations (Henry et al., 2006, 2007).
The genome-scale metabolic network for C. acetobutylicum was constructed using the iterative methods of pathway construction shown in Figure 1 and the reverse engineering algorithm of Figure 2. The network consists of 422 metabolites involved in 552 reactions, including 80 metabolite transport reactions across the cell membrane. Simulation of the genome-scale model produced a positive specific growth rate for the wild-type genome with the complete set of transporter reactions. The buk gene knock-out mutant (Green and Bennett, 1998; Harris et al., 2000) was simulated by restricting flux through the butyrate kinase enzyme (Buk, EC 22.214.171.124, CAC3075) to zero using constraints. In addition, the pSOL1 megaplasmid degenerate M5 strain (Tomas et al., 2003) was simulated by restricting flux through enzymes encoded by megaplasmid genes. These reactions are specifically labeled in Supplementary Appendix 1. The qualitative results of these simulations are given in Table I. Resulting specific growth rates of these simulation studies did not match experimental observations due to the lack of regulatory mechanisms and large number of reversible reactions in this initial version of the genome-scale model. We further investigated the capabilities of the genome-scale model to simulate growth on the published minimal medium formulation for C. acetobutylicum (Monot et al., 1982) and a glycerol-containing synthetic medium (Vasconcelos et al., 1994). These results are also summarized in Table I. In all cases, growth in silico was successful without adding further additional transport equations to provide metabolites or macromolecules not adequately synthesized or effectively degraded by the metabolic network. In addition, observed phenotypes of knock-out strains were obtained in silico, suggesting that the network is complete and represents C. acetobutylicum metabolism. The number of reactions in the reconstructed metabolic network used to represent specified metabolic functions is shown in Table II. This table also provides statistics that relate the completed metabolic network to the genomic annotation used to reconstruct it.
An example of one iteration of the semi-automated reverse engineering process for completing the genome-scale metabolic network is shown in Figure 3, whereby deficient lipid biosynthesis of lipoteichoic acid, diglucosyl diacylglycerol and d-glucosyl-1,2-diacylglycerol were found responsible for arresting cell growth when the metabolic flux profile was optimized. The metabolic pathways for these precursors were investigated and manually rectified. Employing the reverse engineering procedure iteratively was necessary for identifying and correcting these growth-preventing errors in the metabolic network. Application of the reverse engineering algorithm of Figure 2 to an initial draft of the C. acetobutylicum metabolic network largely created from the KEGG database, revealed reaction network discrepancies beyond simply missing enzymes. These discrepancies are shown as Supplementary Appendix 3 and include a list of aerobic reactions annotated in KEGG to belong to C. acetobutylicum, a strict anaerobe.
Total lipids in C. acetobutylicum have been found to account for 5–6% of the dry cell weight (Lepage et al., 1987). It has been also reported that solvent exposure leads to an increase in the ratio of saturated and cyclopropane fatty acids to unsaturated membrane fatty acids (Baer et al., 1987; Vollherbst-Schneck et al., 1984; Zhao et al., 2003), changes in the mean fatty acid acyl chain length (Lepage et al., 1987; Vollherbst-Schneck et al., 1984; Zhao et al., 2003) and changes in the membrane phospholipid composition (Johnston and Goldfine, 1992; Lepage et al., 1987; MacDonald and Goldfine, 1991). Nevertheless, due to the absence of specific compositional information about these changes, a single lipid biosynthesis equation (see Supplementary Appendix 1) was used in the calculation of biomass composition over the entire course of exponential growth. The relative amounts of lipids and phospholipids of the lipids biosynthesis equation was derived based on a consensus of the cited literature data corresponding to exponential growth. The fatty acid composition in all cases was also held constant at 16:0 (carbon chain length: number of double-bonds), which is a dominant experimental observation (Lepage et al., 1987; Vollherbst-Schneck et al., 1984). For the lipid-equation component of lipoteichoic acid (LTA), literature data specific to B. subtilis (Neuhaus and Baddiley, 2003; Perego et al., 1995) were used, due to insufficient data available for C. acetobutylicum. The average LTA composition of 29 glycerophosphate units per chain was used. Also, an average of 13 glycerophosphate units per chain were substituted with d-alanine esters (d-alanylation) in B. subtilis (Neuhaus and Baddiley, 2003; Perego et al., 1995). The process of d-alanylation was ignored in the C. acetobutylicum model due to the absence of a dlt operon (Kiriukhin and Neuhaus, 2001; Perego et al., 1995).
Cell wall is made up of crosslinked peptidoglycan and wall teichoic acid (WTA). Due to the lack of information specific to C. acetobutylicum, in the cell-wall equation (see Supplementary Appendix 1), the stoichiometric coefficients of these components were kept identical to those found for S. aureus N315 (Heinemann et al., 2005). At the time of model construction, the genome-scale model of B. subtilis (Oh et al., 2007) had not yet been published, and thus information from B. subtilis was not employed in our model. Modifications of peptidoglycan structures and amino acids of the interpeptide bridge have been observed as a result of environmental changes (Schleifer and Kandler, 1972), and large differences exist between the peptidoglycan structures of vegetative cells and spores (Atrih and Foster, 2001; Makino and Moriyama, 2002). However, a single description of crosslinked peptidoglycan (Cummins and Johnson, 1971; Schleifer and Kandler, 1972) (see Supplementary Appendix 1) was used for model development of C. acetobutylicum vegetative growth. In addition, a model of WTA from B. subtilis (Neuhaus and Baddiley, 2003; Perego et al., 1995) was used, in absence of specific literature data for C. acetobutylicum. As with LTA, the cellular process of d-alanylation of WTA was ignored for the C. acetobutylicum model.
The TCA cycle of C. acetobutylicum is incomplete, lacking the necessary enzymes to support the biochemical conversions of: (i) oxaloacetate to citrate; (ii) succinyl-CoA to succinate; and (iii) succinate to fumarate. It has been proposed that fumarate is synthesized from oxaloacetate (with malate as an intermediate) through a reverse TCA pathway involving malate dehydrogenase (EC 126.96.36.199, CAC0566) and fumarate dehydratase (FumA, EC 188.8.131.52, CAC3090, CAC3091) (Nolling et al., 2001). Calculation of the estimated standard Gibbs free energy of reaction () using published values (Henry et al., 2007) yielded a thermodynamically favorable production of malate from oxaloacetate (). It is noted that significantly different standard Gibbs free energy of formation values have been published recently (Alberty, 2004). With an intracellular NAD+/NADH ratio of 10, this reaction is favorable (for the production of malate) given a malate to oxaloacetate ratio of less than ~300. If the NAD+/NADH ratio drops to 0.1, the reaction is favorable for malate to oxaloacetate ratios less than ~3 × 104. Fumarate production from malate through FumA is less thermodynamically favorable (); however, this reaction has a negative ΔrG′ when the intracellular ratio of malate to fumarate is greater than ~2.
With an incomplete TCA cycle, biosynthesis of the amino acid precursor 2-oxoglutarate remained unresolved in initial versions of the metabolic network but it was suggested to include a previously unresolved pathway involving the urea cycle (Nolling et al., 2001). Since growth was observable in a minimal medium containing no free amino acids or peptides (Monot et al., 1982), C. acetobutylicum must be capable of synthesizing all necessary l- and d-amino acids. To resolve this previously undefined pathway, the biosynthesis of 2-oxoglutarate without an l-glutamate precursor was explored through several possible mechanisms. One investigated pathway requires the conversion of l-ornithine to l-proline by ornithine cyclodeaminase (EC 184.108.40.206), an enzyme and metabolic reaction known to exist in S. aureus N315 (SAU0113). However, BLASTP results of this encoded protein sequence against that of the C. acetobutylicum genome returned a bit-score insufficient for orthology (Paredes et al., 2005; Pearson, 1996). Other investigated pathways require a 2-oxoglutarate to l-glutamate conversion as part of an intermediate reaction step. This requires a pre-existing l-glutamate solute pool for l-glutamate biosynthesis through these pathways. However, the investigation of known enzymatic mechanisms in available public databases and known pathways of similar organisms revealed an alternative pathway through enzyme homology. Although the genome annotation did not identify an ORF of C. acetobutylicum encoding a bacterial ornithine-oxo-acid transferase enzyme (RocD, EC 220.127.116.11), this enzyme was investigated using the BRENDA enzymatic database. Several researchers have identified the inclusion of 2-substituted oxo-acids (e.g., pyruvate, 2-oxoglutarate, oxaloacetate, 2-oxobutanoate, methylglycoxyl) as viable substrates in the conversion of l-ornithine to l-glutamate 5-semi-aldehyde as catalyzed by a Bacillus sp. YM-2-isolated ornithine-oxo-acid transferase (RocD, EC 18.104.22.168) (Jhee et al., 1995). Among microorganisms similar to C. acetobutylicum, both B. subtilis and S. aureus N315 are known to contain both RocD and ArgD (EC 22.214.171.124) enzymes. Both RocD and ArgD of B. subtilis (and S. aureus) were analyzed against the C. acetobutylicum genome using BLASTP. Bit-scores suggested orthology (Pearson, 1996) of both enzymes (of both organisms) with CAC2388. When further analyzed in the BRENDA enzyme database, multiple researchers have noted that acetylornithine transaminase (EC 126.96.36.199) may exhibit very similar enzymatic activity to ornithine-oxo-acid transferase (EC 188.8.131.52) (Billheimer et al., 1976; Friedrich et al., 1978; Voellmy and Leisinger, 1975). Because of the homology observed between the two enzymes and the availability of pyruvate through glycolysis (Desai et al., 1999; Papoutsakis and Meyer, 1985), we propose the following biochemical reaction catalyzed by the enzyme of CAC2388, which is currently annotated as an acetylornithine transaminase (ArgD, EC 184.108.40.206) in C. acetobutylicum:
The estimated standard Gibbs free energy () of the reaction shown as Equation (3) was calculated as 2.19 kcal mol−1 using published values (Henry et al., 2007) of charged species at pH 7. Intracellular pyruvate levels were estimated as 5 mM based on E. coli data (Yang et al., 2001), and an intracellular concentration of l-alanine of 1.3 mM was assumed (from S. aureus data) (Heinemann et al., 2005). Intracellular l-ornithine levels were estimated at 10 mM, based on measurements in other bacteria (Poolman et al., 1987). Given these values, a ratio of intracellular concentration of l-ornithine to l-glutamate 5-semi-aldehyde greater than or equal to 10.6 is required for the reaction of Equation (3) to proceed in the forward direction as written. Finally, l-glutamate is synthesized by the metabolic network from l-glutamate 5-semi-aldehyde by γ-glutamyl-phosphate reductase (ProA, EC 220.127.116.11, CAC3254) and by glutamate 5-kinase (ProB, EC 18.104.22.168, CAC3253). Using this mechanism of l-glutamate biosynthesis, a proposed amino acid biosynthesis and the linking of the degenerate TCA cycle through the urea cycle are shown in Figure 4. The proposed pathway yields a net production of l-glutamate because l-ornithine does not require an l-glutamate precursor in C. acetobutylicum. Given the incomplete TCA cycle of C. acetobutylicum, the pathway of l-ornithine biosynthesis is as follows (also shown in Fig. 4): (i) pyruvate to oxaloacetate (PykA, EC22.214.171.124, CAC2660); (ii) oxaloacetate to l-aspartate (NadB, EC 126.96.36.199, CAC1024); (iii) l-asparate to l-argininosuccinate (ArgG, EC 188.8.131.52, CAC0973); (iv) l-argininocussinate to l-arginine (ArgH, EC 184.108.40.206, CAC0974); and (v) l-arginine to L-ornithine (EC 220.127.116.11, CAC1054). In absence of an l-glutamate intracellular solute pool, a citrulline pool is required of this pathway. Citrulline solute pools, up to 1 mM, have been measured in clostridia (Kleiner, 1979). Based on current annotation and available enzymatic studies, the pathway presented is the most probable mode of l-glutamate biosynthesis in absence of a complete TCA cycle and intracellular l-glutamate (or l-glutamine) solute pool.
We have proposed a pathway (Eq. 3) for the biosynthesis of l-glutamate. However, in the presence of an l-glutamate solute pool, the arginine biosynthesis pathway (operating in reverse) may contribute to l-glutamate biosynthesis. Here, we further discuss this pathway, the thermodynamic feasibility of its operation in reverse, and the physiological characteristics (metabolic programs and environmental conditions) that have been observed to alter the size of the intracellular l-glutamate solute pool in other clostridia. The production of the l-glutamate biosynthesis precursor through ArgD (CAC2388) (Eq. 3) suggests that l-glutamate biosynthesis through this (single) enzyme may be a bottleneck of protein biosynthesis. The proposed biosynthesis of l-glutamate from l-ornithine with calculated values is illustrated in Figure 5. l-glutamate, from the intracellular solute pool, and acetyl-CoA, from the primary metabolism of glucose, are used by the glutamate N-acetyltransferase (ArgJ, EC 18.104.22.168, CAC2391, CAC3020) to produce N-acetyl-l-glutamate, which is then combined with l-ornithine (from the urea cycle) to form N-acetyl-ornithine and l-glutamate by ArgJ. l-glutamate is converted to 2-oxoglutarate by a host of enzymes in C. acetobutylicum; the glutamate dehydrogenase (EC 22.214.171.124, CAC0737) and the NADPH-dependent glutamate synthase (EC 126.96.36.199, CAC0764) are listed in Figure 5. The 2-oxoglutarate and N-acetyl-ornithine are converted to N-acetyl-l-glutamate 5-semi-aldehyde and l-glutamate by the N-acetylornithine aminotransferase (ArgD, EC 188.8.131.52, CAC2388). Further processing back to N-acetyl-l-glutamate occurs through the N-acetyl-γ-phosphate-reductase (ArgC, EC 184.108.40.206, CAC2390) and the acetylglutamate kinase (ArgB, EC 220.127.116.11, CAC2389). This cycle results in the net production of 1 mole of l-glutamate per mole of l-ornithine processed by this pathway. The net biochemical reaction (not charge-balanced) of the cycle, starting and ending with N-acetyl-l-glutamate, as shown in Figure 5, is given as Equation (4).
Also shown in Figure 5 is the observed inhibitory relationship between weak acids and the accumulation of an intracellular l-glutamate solute pool. Recent studies have suggested that the intracellular accumulation of weak acids at low extracellular pH disrupts the cellular anion balance and results in a largely diminished l-glutamate solute pool (Flythe and Russell, 2006; Roe et al., 1998). Although the governing mechanisms behind this observation remain unclear, intracellular l-glutamate pools approached ~0 mM in the presence of 100 mM sodium acetate and 100 mM sodium lactate (pH 5.0) in C. sporogenes MD1 (Flythe and Russell, 2006). The effects of fermentation acids on the intracellular pools of TCA cycle component 2-oxoglutarate remain unknown. However, the glutamate dehydrogenase (EC 18.104.22.168, CAC0764) strongly favors the formation of l-glutamate from 2-oxoglutarate (), and intracellular measurement of 2-oxoglutarate, investigating its involvement in nitrogen sensing in various bacteria, yielded pool levels orders of magnitude lower than that of l-glutamate (Muller et al., 2006; Muro-Pastor et al., 2001). Thus, a metabolic reaction (Eq. 3) ultimately leading to the production of l-glutamate or 2-oxoglutarate without the involvement of the intracellular l-glutamate solute pools or direct transport into the cell was required of the genome-scale metabolic network for C. acetobutylicum. Other specific cases of specialized clostridial metabolic network-building, related to anaerobic pathways, are discussed in the following sections.
A thermodynamic analysis was performed to assess the feasibility of the proposed reverse function of the arginine biosynthesis pathway, shown in Figure 5. This was done by calculating the ΔrG′ for all reactions of the pathway. It is noted that for a pathway to be thermodynamically possible, all reactions of the pathway (not the net reaction shown by Eq. 4) must have negative ΔrG′ values. Given the large positive values of (shown in Fig. 5) for reactions catalyzed by glutamate dehydrogenase (EC 22.214.171.124, CAC0737) and acetylglutamate kinase (ArgB, EC 126.96.36.199, CAC2389), the proposed pathway requires significant concentration gradients to be thermodynamically feasible. The effect of the size of the l-glutamate solute pool is two-fold in this case. On one hand, the l-glutamate concentration must be large enough to provide the concentration gradient needed to overcome the large of the glutamate dehydrogenase, yet a smaller l-glutamate solute pool favors the reverse function of the N-acetylornithine aminotransferase (ArgD, EC 188.8.131.52, CAC2388), as shown in Figure 5. In addition, a thermodynamic analysis of this reaction pathway requires inputs of the intracellular nucleotide levels of ATP, ADP, NADPH, and NADP+; however, it has been shown extensively in C. acetobutylicum that these levels vary widely in response to substrate limitation and dominant metabolic programs (e.g., acid-ogenesis or solventogenesis) (Girbal and Soucaille, 1994; Grupe and Gottschalk, 1992; Meyer and Papoutsakis, 1989). Thus, the thermodynamic feasibility of the proposed reverse arginine biosynthesis pathway was studied as a function of the intracellular concentration ratios of ADP/ATP, NADP+/NADPH, and l-glutamate/l-ornithine. The following intracellular metabolite concentrations were used as constants in the analysis: 0.4 mM acetyl-CoA (Boynton et al., 1994); 0.1 mM CoA (Boynton et al., 1994); 1 mM ammonia (Schreier et al., 1982); and 10 mM orthophosphate (Heinemann et al., 2005). The concentration ratios mentioned above were adjusted to yield ΔrG′ values of zero for all reactions of the pathway (Fig. 5). These calculations produced a 3-dimensional plane (ΔrG′ = 0), shown as Figure 6. Above the plane of Figure 6, reverse operation of the arginine biosynthesis pathway is thermodynamically infeasible, and below the plane all ΔrG′ values of the pathway are negative. For example, given ADP/ATP and NADP+/NADPH ratios of 10 (respectively), the ratio of l-glutamate to l-ornithine intracellular solute pool levels must be less than 3 for l-glutamate production from the reverse arginine biosynthesis pathway to be possible. Even though the intracellular nucleotide levels have been widely studied, metabolomic data of the l-glutamate/l-ornithine ratio in C. acetobutylicum remain unknown; however, this analysis provides the necessary physiological conditions to allow the arginine biosynthesis pathway to contribute to the intracellular l-glutamate solute pool.
Development of a genome-scale model for a strict anaerobe, such as C. acetobutylicum, from reaction network databases and enzyme homology yielded multiple aerobic reactions that were further resolved using the BRENDA database to locate anaerobic reactions catalyzed by available enzymes. The list of aerobic reactions assigned to the C. acetobutylicum genome in the KEGG database (as of August 2007) is presented in Supplementary Appendix 3. It is possible that many of the enzymes identified through homology searches that catalyze aerobic reactions also catalyze anaerobic reactions that remain uncharacterized. Two examples are: (i) the NAD biosynthesis pathway; and (ii) anaerobic biosynthesis of l-isoleucine.
The quinolinate precursor of NAD is commonly synthesized in vivo from l-aspartate through an iminoaspartate intermediate by l-aspartate oxidase (NadB, EC 184.108.40.206, CAC1024) and quinolinate synthase (NadA, EC 220.127.116.11, CAC1025). Alternatively, quinolinate is synthesized from the metabolism of l-tryptophan. However, with current genome annotation of C. acetobutylicum, the pathway of possible l-tryptophan utilization, yielding quinolinate, is largely uncharacterized. This biochemical process requires, at minimum, five enzymes, and none have been identified in C. acetobutylicum through gene homology. Since a minimal medium (Monot et al., 1982), that contained no amino acids or peptides was used, the assumption was made that amino acids were synthesized in vivo for incorporation into protein and as precursors of other biological macromolecules. Thus, quinolinate biosynthesis from l-tryptophan was not considered a feasible pathway of biosynthesis in a minimal medium. Thus, a feasible pathway of NAD biosynthesis requires the conversion of l-aspartate to iminoaspartate by l-aspartate oxidase (NadB, EC 18.104.22.168, CAC1024) under anaerobic conditions. Incidentally, l-aspartate oxidase is also one of multiple catalysts for the conversion between l-aspartate and oxaloacetate. However, reaction mechanisms catalyzed by l-aspartate oxidase currently available in the KEGG database are aerobic. Through the BRENDA database and a further literature investigation, fumarate was identified as a possible electron acceptor for the conversion of l-aspartate to oxaloacetate catalyzed by l-aspartate oxidase under anaerobic conditions (Messner and Imlay, 2002; Tedeschi et al., 1996). Further, an l-asparate oxidase has been identified in an anaerobic hyperthermophilic bacterium and has been found to catalyze anaerobic l-aspartate dehydrogenation (Sakuraba et al., 2002). Thus, we propose the conversion of l-aspartate to iminoaspartate by l-asparate oxidase (NadB, EC 22.214.171.124, CAC1024) in the C. acetobutylicum metabolic network through the use of fumarate as a terminal electron acceptor, resulting in the production of succinate as well as iminoaspartate, as shown by Equation (5).
The biosynthesis pathway of l-isoleucine in C. acetobutylicum was found not to include l-threonine (Nolling et al., 2001). Homology analysis of the threonine dehydratase from B. subtilis (IlvA, EC 126.96.36.199, BG10673), which catalyzes the reaction of l-threonine to 2-oxobutanoate, yielded a low bit-score (Pearson, 1996) when compared to ORFs of C. acetobutylicum. The biosynthesis of 2-oxobutanoate through a 2-methylmaleate intermediate was investigated since this pathway was suggested for M. thermaautotrophicum (Eikmanns et al., 1983). However, a homology search of the B. subtilis l-serine dehydratase (SdaAA, SdaAB; EC 188.8.131.52; BG13397, BG13398) against proteins of the C. acetobutylicum genome using BLASTP returned low bit-scores as well. Finally, biosynthesis was traced from l-aspartate to homoserine to 2-oxobutanoate through homoserine-O-succinyl-transferase (MetB, EC 184.108.40.206, CAC1825) and cystathione-γ-synthase (EC 220.127.116.11, CAC0390). This metabolic route of l-isoleucine biosynthesis is inefficient as MetB requires succinyl-CoA as a substrate. This TCA cycle component is derived from 2-oxoglutarate at the bottom of the incomplete TCA cycle of C. acetobutylicum (Fig. 4).
In the current model, succinate is produced from succinyl-CoA in the biosynthesis of homoserine and from the anaerobic biosynthesis of NAD. However, a clear path for its degradation remains elusive due to the incomplete TCA cycle (Fig. 4). Utilization of succinate through the reverse reaction of Equation (5) is infeasible since iminoaspartate is consumed by NAD biosynthesis. Other possibilities for succinate assimilation exist: (i) it is transported out of the cell, (ii) it is converted back to succinyl-CoA by a CoA transferase, or (iii) it is processed to butyric acid through a crotonyl-CoA intermediate by a pathway similar to that observed for C. kluyveri (Sohling and Gottschalk, 1996). The conversion of succinate to succinyl-CoA was chosen for the genome-scale model for the following reasons: (i) the primary metabolism of C. acetobutylicum is well-established and does not support butyric acid production from succinate, (ii) succinate is not a byproduct commonly found in C. acetobutylicum fermentation broths, (iii) several CoA transferase enzymes exist in the genome (see Supplementary Appendix 1), and (iv) a purified CoA transferase from C. acetobutylicum demonstrated reversibility and specificity for multiple substrates in previous kinetic studies (Wiesenborn et al., 1989). Therefore, we realize that the proposed pathway of succinate assimilation to succinyl-CoA is an approximation based on the best available data at this time.
The reconstructed metabolic network for C. acetobutylicum was used with FBA and systematic gene knock-outs to identify those enzymes (and their encoding genes) that will prevent growth when knocked-out in silico. One goal of this computational study is to identify gene knock-outs that arrest growth but do not disrupt the primary metabolism of C. acetobutylicum. Cells were grown in silico on three different media in this study, given the developed genome-scale model for C. acetobutylicum: (i) the minimal medium extracellular environment (Monot et al., 1982), (ii) minimal medium supplemented with l-glutamine, l-asparagine, l-histidine and l-cysteine (called partially supplemented medium), and (iii) minimal medium supplemented with all l-amino acids as well as d-ribose and glycerol 3-phosphate (called supplemented medium). It is noted that the energetics and metabolic capacities of these in silico knock-out strains were not probed in depth. Only the ability of the altered metabolic network to produce biomass in silico was investigated, so the underlying membrane transport mechanisms of supplemented media nutrients and details of resulting metabolic capacity were ignored for these simulations. Reactions resulting in arrested growth in silico of C. acetobutylicum for each medium are included in Supplementary Appendix 1. Table III contains a summary of the number of reactions arresting growth in silico, broken-down into broadly defined metabolic pathways. In particular, in the absence of an extracellular source of amino acids (minimal medium), the pathways of amino acids biosynthesis (e.g., aromatic amino acids biosynthesis) contained a large number of reactions that arrested growth in silico when knocked-out. In the presence of supplemented media, predictably, these pathways did not arrest growth in silico when knocked-out. However, four reactions in amino acids metabolism did arrest growth in this medium following in silico knock-outs. These particular enzymes are responsible for processing amino acids into precursors of other pathways. One member of this group is the d-alanine-d-alanine ligase (ddlA, EC 18.104.22.168, CAC2895) that produces d-alanyl-d-alanine, which is vital to peptidoglycan biosynthesis. Conversely, in the presence of supplemented media, the large numbers of related reactions leading to arrested growth in silico were in the biosynthesis of steroids, riboflavin, purine and glycerolipids.
Semi-automated reverse engineering of a genome-scale reaction network using building-block transfer equations was developed and coupled with iterative measures of network-building through database and literature mining resulting in the first genome-scale reaction network for C. acetobutylicum. This is the first genome-scale model for any of the clostridia. Thus, several examples of the use of reaction and enzyme databases to characterize anaerobic reactions catalyzed by pathways for several well-known enzymes were presented. In addition, the function of the incomplete TCA cycle, through incorporation of the urea cycle, was resolved in detail based on homology searches and metabolic demands of the genome-scale reaction network. This led to two possible mechanisms of l-glutamate biosynthesis that are influenced by the presence of weak acids and an intracellular l-glutamate solute pool. Thermodynamic analyses determined the physiological conditions under which the proposed pathways are capable of producing l-glutamate. Given the previously observed concentration ratios of ADP/ATP and NAD+/NADH, the levels of the intracellular l-glutamate solute pool (which is also subject to fluctuation) has a large impact on the thermodynamic feasibility of many metabolic pathways in C. acetobutylicum. Our model successfully predicted acidogenesis and solventogenesis of the wild-type strain, the loss of butyric acid production in the buk knock-out, and the loss of butanol and acetone production by the M5 strain. However, due to the lack of regulation, the genome-scale model cannot yet describe the metabolic events of acido-genesis, acid re-uptake, and solventogenesis as a cascading sequence of events governed by the activation of sigma factors.
This work was supported by NSF grant BES-0418157. R.S.S. was supported by NIH NRSA post-doctoral training grant F32GM078947.
Contract grant sponsor: NIH
Contract grant number: GM078947
Additional Supporting Information may be found in the online version of this article.