|Home | About | Journals | Submit | Contact Us | Français|
Pseudomonas aeruginosa is a major life-threatening opportunistic pathogen that commonly infects immunocompromised patients. This bacterium owes its success as a pathogen largely to its metabolic versatility and flexibility. A thorough understanding of P. aeruginosa's metabolism is thus pivotal for the design of effective intervention strategies. Here we aim to provide, through systems analysis, a basis for the characterization of the genome-scale properties of this pathogen's versatile metabolic network. To this end, we reconstructed a genome-scale metabolic network of Pseudomonas aeruginosa PAO1. This reconstruction accounts for 1,056 genes (19% of the genome), 1,030 proteins, and 883 reactions. Flux balance analysis was used to identify key features of P. aeruginosa metabolism, such as growth yield, under defined conditions and with defined knowledge gaps within the network. BIOLOG substrate oxidation data were used in model expansion, and a genome-scale transposon knockout set was compared against in silico knockout predictions to validate the model. Ultimately, this genome-scale model provides a basic modeling framework with which to explore the metabolism of P. aeruginosa in the context of its environmental and genetic constraints, thereby contributing to a more thorough understanding of the genotype-phenotype relationships in this resourceful and dangerous pathogen.
With sequenced genomes now routinely being made available to the public, detailed annotations and various publicly available genomic resources have enabled the formation of genome-scale models of metabolism for a wide variety of organisms (12, 21, 26, 42, 63, 65). A wealth of data from well-controlled experiments, coupled with advancements in methods for computational network analysis, have allowed these models to aid interrogation of metabolic behavior. In addition, an iterative process to model development—cycles of in silico model predictions, experimental (i.e., wet lab) validation, and subsequent model refinement—has enabled the development of methods that have contributed to biological discovery, such as in determination of likely drug targets in Mycobacterium tuberculosis (3, 26), metabolic engineering of cells for production of valuable compounds (5, 32, 34), and development of novel frameworks for contextualizing high-throughput “-omics” data sets (15, 24, 64).
Pseudomonas aeruginosa is a ubiquitous gram-negative bacterium that is capable of surviving in a broad range of natural environments, although it is mostly known for its role as an opportunistic pathogen (40, 60, 72). While P. aeruginosa is generally found in aerobic environments, it is able to thrive anoxically and, notably, to denitrify (58). It was also recently shown that P. aeruginosa can form biofilms under microaerobic (i.e., very low oxygen) conditions similar to those found in the lungs of cystic fibrosis (CF) patients (1). These features strongly contribute to the notable success of P. aeruginosa in chronically infecting the lungs of CF patients, nearly all of whom have lifelong P. aeruginosa infections starting at an early age (49, 70). P. aeruginosa is also a serious pathogen in nosocomial infections and various acute infections in immunocompromised patients, such as severe burns and urinary tract infections (39, 49, 68). Part of the reason for this remarkable ecological success is thought to be the considerable metabolic versatility and flexibility of P. aeruginosa (62), which renders the study of the metabolism of this life-threatening microbe crucial to the understanding of its pathogenicity and opportunistic nature.
We present a genome-scale metabolic reconstruction of P. aeruginosa PAO1 (hereafter referred to as PAO1), called in silico strain iMO1056 following an often-used naming convention (48). This model accounts for 1,056 genes encoding 1,030 proteins that catalyze 883 reactions (Fig. (Fig.1a).1a). Gene-protein-reaction (GPR) associations are accounted for in the reconstruction, as well as the stoichiometry and thermodynamically derived directionality of all included reactions. This reconstruction process led to the reannotation of several open reading frames (ORFs). The model was tested against high-throughput substrate utilization experiments (90% matched the usage of common substrates) and two published sets of genome-wide knockout data (85% accuracy of essentiality predictions). Several additional predictions were made regarding PAO1 physiology and virulence, and these may be tested subsequently. This genome-scale reconstruction and subsequent constraint-based modeling enable integration of high-throughput data to generate novel, testable hypotheses that will assist in exploring the physiology of PAO1 and assessing its relevance for pathology.
The reconstruction process is outlined schematically in Fig. Fig.2,2, and details are described elsewhere (47). The initial draft of iMO1056 was built from the annotated PAO1 genome available at the Pseudomonas Genome Database (PseudoCAP; www.pseudomonas.com), supplemented by literature mining and BLAST searches (69). Biological information databases such as EXPASY, KEGG, and TCDB were used to link annotated genes to proteins and proteins to reactions (19, 28, 51), and confidence values were added to reactions based on the reported confidence values from the PseudoCAP annotation. These GPR associations link genetic data to reactions in the model and allow the model to predict effects of genetic perturbations on metabolic phenotype. GPR associations are defined through Boolean logic, where, for example, two protein subunits that combine as a complex to perform one enzymatic function would have an “and” relationship with relation to that reaction, whereas two isozymes that are each independently capable of catalyzing a reaction would have an “or” relationship.
Gaps in metabolic pathways necessary for cell growth and production of major virulence factors were filled through careful literature mining and homology and context analysis with BLAST at the Pseudomonas Genome Database and NCBI websites and other online resources (53). Genes whose functions were inferred via these sources were assigned reaction identification confidence values based on the guidelines of the Pseudomonas Genome Database genome annotation effort for protein name confidence values. This concordance ensures that confidence values for key findings in our study are consistent with confidence values for gene functions in the currently published annotation (69). The PAO1 biomass reaction (defined as a relative weight of metabolites necessary to synthesize biomass, as described previously) (66) was assumed to be similar to the published Escherichia coli biomass reaction, albeit with some minor variations to account for PAO1-specific data, as described in the supplemental material. Energy efficiency of the electron transport chain in PAO1 was approximated by assuming that P. aeruginosa grown aerobically would have a similar P/O ratio to published values for the model prokaryote Escherichia coli (41).
Confidence classes for reactions were assigned based on the PseudoCAP protein name confidence rating system (69), which rates the confidences of gene-protein associations in the PAO1 annotation. We extended the system to rating confidences for reaction associations. There are four levels in this rating system, as follows: class 1, which accounts for genes whose functions have been demonstrated experimentally with PAO1; class 2, which accounts for genes whose functions were inferred via homology with an experimentally characterized protein in another organism; class 3, which represents gene-protein associations based on the presence of conserved amino acid motifs; and class 4, which accounts for genes with similarity to other genes of unknown function. We adhered to this rating system for genes whose functions were newly annotated in this study (see Table Table2,2, “proposed confidence” column), and for genes which were not newly annotated, we assigned reaction confidences equivalent to the protein name confidences in the annotation.
In order to perform flux balance analysis (FBA), a network must first be represented in the form of a stoichiometric matrix (S matrix), as shown in Fig. Fig.3a.3a. By convention, rows of the S matrix represent metabolites and columns represent reactions. Reaction substrates have negative coefficients in the S matrix, while products have positive coefficients. The S matrix for a network wholly describes the stoichiometry of reactions in the network. The S matrix also accounts for transport reactions across the cell membrane, represented as reactions interconverting intracellular and extracellular compounds, often coupled with hydrolysis of ATP (in the case of ATP-binding cassette transporters) or exploitation of a chemical gradient (e.g., H+ symport or antiport). Figure Figure3a,3a, for example, illustrates four transport reactions, namely, R4, R5, R6, and R7.
FBA is a computational method that calculates reaction fluxes, their distribution within a metabolic network, and on this basis, growth yields (Fig. (Fig.3)3) (33). In order to obtain flux predictions, the metabolic network is assumed to be at steady state, meaning that concentrations of intracellular metabolites do not vary during the course of a simulation. This steady-state assumption, which makes FBA results directly comparable to data from cells at a fixed growth rate (i.e., in exponential phase), is necessary since kinetic parameters are not accounted for in iMO1056.
In many FBA applications, a linear biomass reaction is defined and assumed to be the objective toward which the flux distribution of a network is optimized. The biomass reaction represents a weighted ratio of components forming the dry weight of a cell, as well as hydrolysis of ATP to account for the energy needs involved in growth and cellular maintenance (66). Optimization for biomass is justified by the assumption that bacteria have been optimized evolutionarily for growth, and experimental studies have verified the rationality of this assumption (16, 55). This optimization step is necessary due to the underdetermined nature of metabolic networks, as a range of flux profiles are possible for a given network. FBA optimization is constrained by the stoichiometry of the network and by upper and lower bounds on reactions, as set by thermodynamic and substrate uptake constraints. In addition, conditional constraints, such as gene knockouts and alternative gene regulation, can be placed on the system to simulate experimental conditions (22, 59).
To perform FBA, the flux of the biomass reaction is maximized given the steady-state condition, as follows: S · v = 0, where S is the S matrix representing the network (Fig. (Fig.3a)3a) and v is the vector of all fluxes in the network (Fig. (Fig.3b,3b, top panel). Ultimately, FBA yields a predicted optimal growth yield and a flux distribution through the system that will support this optimal growth (Fig. (Fig.3b,3b, bottom panel). See the supplemental material for a discussion of constraints used in FBA simulations in this study.
PAO1 was tested for the ability to utilize various carbon sources, using Biolog GN2 microplates (Biolog Inc., Hayward, CA). All procedures were performed as indicated by the manufacturer. Bacteria were grown overnight at 28°C on a Biolog universal growth agar plate. Samples were swabbed from the surface of the plate and suspended in GN inoculating fluid. Each well of the microplate was inoculated with 150 μl of bacterial suspension, and the plate was incubated at 28°C for 24 h. Subsequently, the microplate was read by a microplate reader, and the results were analyzed with MicroLog3 4.20 software.
The assay performed with the Biolog plate is a measurement of the reduction of tetrazolium dye after 24 h, indicating whether P. aeruginosa has been respiring actively during that period. Active respiration in minimal medium indicates utilization of the sole carbon source provided.
FBA optimizations were performed on a Dell computer with 2 GB of RAM and a 1.8-GHz Intel Centrino processor linked to a dedicated database server with Simpheny software (Genomatica, Inc.). Transposon knockout data were handled in Excel (Microsoft Corp.), and statistical analysis of transposon data was performed in Matlab R2006a (Mathworks, Inc.).
We generated a constraint-based, genome-scale representation of PAO1's metabolism. The reconstruction accounts for 1,056 genes involved in 883 metabolic reactions (Fig. (Fig.1a),1a), including anabolic pathways necessary for the synthesis of all major cellular biomass components from basic metabolic precursors (see Fig. Fig.4a4a for a complete network map). As described in Materials and Methods, the reconstruction process included an initial model-building stage based heavily on public databases and literature, a gap-filling stage in which missing steps in essential metabolic pathways were refined, and finally, an extension and validation stage that included expansion of the model by comparison with Biolog substrate utilization data, comparison of in silico predictions with physiological data, and validation against two experimentally determined sets of likely essential genes.
Aside from accounting for all major pathways necessary for growth of P. aeruginosa, an effort was made to reconstruct some of the major pathways associated with virulence for this bacterium. The metabolism of P. aeruginosa is involved in a number of virulence processes, such as quorum sensing (11, 23), expression of lipopolysaccharide and rhamnolipids (61), and notoriously, the switch from the nonmucoid to mucoid form and associated production of the exopolysaccharide alginate (46). Table Table11 lists the virulence-associated pathways accounted for in iMO1056 and some relevant features of these pathways.
A major value of a manual model-building effort is the careful revision of the current genome annotation based on literature evidence encountered during the model-building process, BLAST searches, and gap closures. Annotation refinements represent (i) knowledge that was in the literature but was overlooked in the original annotation and (ii) new hypotheses that came directly from the model-building process. Specifically, the identification of network gaps (instances where a metabolite can be consumed but not produced or vice versa) using FBA optimization pinpointed reactions that are necessary for a pathway to function but that are peripheral to the pathway and thus were overlooked during historical pathway characterization. Table Table22 provides a list of recommended annotation refinements for PAO1 that arose from the reconstruction process. In some cases, such as the annotation refinements for the nuh gene (Table (Table2),2), the function of the gene was not changed, but rather, specific reactions which are important for certain cellular processes were assigned to already annotated genes.
The first level of annotation refinement is the assignment of a specific function to a previously uncharacterized gene based on literature evidence or a BLAST search. An example of this in iMO1056 is the gene mtnP (PA3004), which was annotated in PseudoCAP as a “probable nucleoside phosphorylase” but was subsequently found in the literature to have the specific function of a 5′-methylthioadenosine phosphorylase (Fig. (Fig.4b)4b) (57). The mtnP gene was first identified (57) through detailed sequence analysis, and then its function was confirmed (57) by knocking out mtnP in a ΔmetZ strain of P. aeruginosa and observing a loss of growth on minimal medium supplemented with methionine, homocysteine, or methylthioadenosine (57). mtnP is necessary for the function of the methionine salvage pathway in P. aeruginosa and was previously characterized in the literature but not in the P. aeruginosa annotation, and its inclusion in iMO1056 is an example of how the manual reconstruction process can benefit annotation efforts in a way that would not be immediately obvious from automatic model building. In this case, the reconstruction process led us to investigate the methionine salvage cycle and to uncover a reaction that was necessary for the pathway to function but which would have been difficult to pinpoint without a rational model-building approach.
The next level of gene annotation refinement is the assignment of a new function based on gap analysis. This type of annotation refinement occurs when a gap in a pathway renders model growth infeasible, and the missing reaction is subsequently identified by literature mining or a homology search. For example, the previously uncharacterized conserved hypothetical protein PA4457 was identified as an arabinose-5-phosphate isomerase gene (kdsD), which was lacking in the original PAO1 annotation (Fig. (Fig.4c).4c). Gap analysis revealed a need for the production of d-arabinose-5-phosphate in the network to enable synthesis of 3-deoxy-d-manno-2-octulosonate (KDO). KDO links the backbone of the lipid A moiety to the core oligosaccharide of lipopolysaccharide and is essential for growth of PAO1 (20). A series of BLAST searches in the nonredundant NCBI database revealed that PA4457 (thus far annotated as a gene coding for a hypothetical protein) had 79% identity over a length of 326 amino acids to the protein KdsD carried by Pseudomonas fluorescens Pf-5 (score, 501; E value, 3 × 10−130). KdsD is an arabinose-5-phosphate isomerase, the enzyme that catalyzes the interconversion of d-arabinose 5-phosphate and d-ribulose 5-phosphate (38). This reannotation and its subsequent incorporation into the model enabled iMO1056 to grow in silico.
Gap analysis also enables the reconciliation of conflicting knowledge in the literature. An earlier version of the PAO1 reconstruction was unable to produce biomass in the absence of oxygen. It was discovered through gap analysis that the inability of iMO1056 to grow without oxygen was due to the oxygen consumption of the enzyme l-aspartate oxidase (NadB), which catalyzes the first step in the production of NAD. Literature mining revealed that while NadB has a strict requirement for O2 when assayed in vitro, it has been suggested that another electron acceptor might suffice in the absence of elemental oxygen in vivo (17). In the network reconstruction, we included an alternate reaction stoichiometry for NadB that had been suggested previously (17), and we thus eliminated the oxygen requirement for growth (Fig. (Fig.4d).4d). Despite this literature analysis, however, it is noteworthy that knocking out nadB is not lethal for P. aeruginosa in vivo (10). This result suggests that another pathway for NAD biosynthesis likely exists. More detailed biochemical study is necessary to determine if there is another anaerobic path for NAD synthesis in PAO1 and whether NadB can truly function in the absence of oxygen.
Gap analysis was also used to refine the stoichiometry and directionality of several reactions that were incorrect or vague in online databases. For instance, the reaction catalyzed by the electron transport enzyme NADPH-quinone oxidoreductase (PA4975; EC 126.96.36.199) was found upon analysis of free ATP production to be necessarily irreversible, despite its being listed as reversible in the EXPASY database (19). Such network refinements would be difficult to systematize without a genome-scale model that accounts for the interdependency of reactions and metabolites.
The in silico biomass yield of iMO1056 on glucose minimal medium under steady-state (continuous culture) conditions was calculated using FBA. In silico constraints used for minimal medium conditions are outlined in the supplemental material. The maximum specific growth rate was determined to be 1.048 h−1, with a specific glucose uptake rate of 10 mmol Glc · g dry weight−1 · h−1, giving an in silico biomass yield of 0.1048 g dry weight · (mmol Glc)−1. This value for in silico biomass yield falls within the range of experimentally determined values for biomass yield for P. aeruginosa under similar conditions, which were reported previously (67) as 0.094, 0.085, and 0.118 g dry weight · (mmol Glc)−1 at temperatures of 30°C, 38°C, and 41°C, respectively. This value for in silico biomass yield is also within 20% of the maximum biomass yield determined experimentally for P. aeruginosa ATCC 9027 under aerobic growth conditions in another study [0.088 g dry weight · (mmol Glc)−1] (7).
P. aeruginosa is renowned for its ability to survive by denitrification in anaerobic environments (13, 58). The pathway for denitrification of nitrate was included in the reconstruction, and iMO1056 is able to grow anaerobically, with an in silico biomass yield of 0.0846 g dry weight · (mmol Glc)−1, when nitrate is provided under glucose limitation. This in silico biomass yield is within 25% of an experimental value for maximum biomass yield reported previously for P. aeruginosa ATCC 9027 under anaerobic growth conditions [0.067 g dry weight · (mmol Glc)−1] (7). The ratio of maximum anaerobic yield to maximum aerobic yield reported previously (7) is 0.76, which differs from the ratio obtained in silico (0.81) by only 6%, indicating a good match between in silico versus in vivo (experimental) ratios of anaerobic to aerobic yield.
A Biolog study was performed for P. aeruginosa by cultivating PAO1 in minimal medium supplemented with various carbon sources (as described in Materials and Methods). Of the 95 carbon sources tested, 30 could be compared directly to in silico growth of iMO1056 (by assuming that substrate utilization indicates growth, which is usually but not always true). Disagreements between the Biolog data and the in silico growth simulations were used as hypotheses for refining the network reconstruction. The remaining 65 carbon sources included 15 that were present as intracellular metabolites in iMO1056 but for which there were no transporters assigned and 50 that are not currently accounted for in the iMO1056 reconstruction. Of these, 14 showed a growth phenotype in the Biolog data, indicating that incorporation of these compounds into the model might be a future step in iMO1056 model expansion.
Computational experiments were performed to determine the growth of iMO1056 on the 30 Biolog carbon sources that were directly testable with iMO1056 (Fig. (Fig.5a).5a). The in silico growth experiments were performed in Simpheny by using FBA, as described in Materials and Methods. Disagreements between Biolog substrate utilization data and in silico growth simulations were investigated through gap analysis and literature mining, and these discrepancies were rectified where possible. In one case (l-leucine), the Biolog data were unclear about the growth of PAO1, so that data point in Fig. Fig.5a5a is shaded lighter than those for the other carbon sources but is still considered to show a growth phenotype. After completion of pathways for substrates which initially conflicted with the Biolog data set, PAO1 viability matched iMO1056 viability under 27 of 30 conditions (Fig. (Fig.5a).5a). This 90% match indicates that the core metabolism and various catabolic pathways of PAO1 were sufficiently reconstructed in iMO1056 for the model to properly predict catabolism of a variety of common substrates, including many amino acids and several common sugars.
The 15 Biolog carbon sources that were intracellular metabolites but did not have transporters in iMO1056 gave in silico no-growth phenotypes due to the inability of iMO1056 to uptake these compounds from the environment. We evaluated the cause of these in silico phenotypes by adding a temporary transport reaction to iMO1056 for each carbon source in turn and assessing the new transporter-augmented models for in silico growth. Figure Figure5b5b indicates the results of this study, where numbers in the table represent the numbers of carbon compounds out of 15 total that fall into each of four possible permutations (growth versus nongrowth and in silico versus Biolog data). The 10 carbon sources on which PAO1 did not grow in the Biolog study (and thus matched the in silico results for iMO1056) were grouped into the “correctly lacks transporter” and “correctly lacks either transporter or pathway” categories, depending on whether the transporter-augmented model for that carbon source was able to grow (Fig. (Fig.5b).5b). The five carbon sources on which PAO1 did grow in the Biolog study (which thus contradicted the in silico results for iMO1056) were grouped into the similar categories “incorrectly lacks transporter” and “incorrectly lacks pathway and transporter,” indicating the likely cause of the mismatch between Biolog data and in silico simulation. These categorizations will help to focus future annotation refinement efforts. The full list of Biolog results can be found in the supplemental material.
A list of genes predicted to be essential for in silico growth of iMO1056 was compiled and compared with a set of in vivo essential genes (i.e., genes that are reported to be required for growth) from the literature. The set of in vivo essential genes encompasses genes deemed essential in both of two independent genome-scale transposon mutant studies of PAO1 (25, 35). This overlapping data set was chosen because it has a higher confidence value for gene essentiality than the candidate essentials from either transposon study individually, since the inclusion of two transposon studies increases coverage of the PAO1 genome and thus reduces the number of genes erroneously labeled essential. The set of in vivo essentials includes all genes in the PAO1 genome that were not knocked out in either of the two transposon studies. In silico gene essentiality predictions for iMO1056 were calculated in the Simpheny platform via FBA as previously described (27).
Genes in the in vivo essential set but not in the iMO1056 reconstruction were assumed to be involved either in nonmetabolic functions or in metabolic functions peripheral to central metabolism and were thus not included in our essentiality analysis. The total accuracy of essentiality predictions was 85% (Fig. (Fig.6a).6a). This accuracy is comparable to the 94% accuracy achieved in a recent reconstruction of Bacillus subtilis, which also used a genome-scale knockout set for model validation (42). Of genes in the iMO1056 reconstruction, in silico essentiality predictions matched in vivo essentiality for 41% of in vivo essential genes and 91% of in vivo nonessential genes (Fig. (Fig.6a).6a). Although the total accuracy of predictions is significant, these discrepancies (particularly in matching in vivo essentiality) deserve further discussion (as follows) and can serve as a collection of hypotheses that can subsequently be tested.
First, it is informative to note the functions of genes that mismatched between the in silico and in vivo essentiality predictions. Figure Figure6b,6b, c, d, and e show the functional distribution of genes in iMO1056 that are nonessential in vivo and in silico (true-negative results), essential in vivo and in silico (true-positive results), essential in silico but not in vivo (false-positive results), and essential in vivo but not in silico (false-negative results), respectively. A simple analysis of these figures can explain much of the discrepancy. For instance, among the false-negative set, 10 genes encode tRNA synthetases. These synthetases catalyze the formation of aminoacyl-tRNAs in preparation for protein synthesis. These reactions were included in iMO1056 for the sake of completeness, but the tRNAs represent gaps in the model and so are not truly involved in any part of model function (although the tRNAs are gaps in the sense that they cannot participate in metabolic pathways in the iMO1056 model, and they were included in the reconstruction because they represent genes with known enzymatic functions). Considering these 10 reactions to be nonmetabolic and removing them from iMO1056 would improve the match between the in silico essential gene set and the set of in vivo essential genes in iMO1056 from 41% to 44%.
Vitamin and cofactor synthesis genes make up the largest number of genes in both the false-positive and false-negative categories (Fig. 6d and e), and they make up the second largest number of genes in the true-positive category as well (Fig. (Fig.6c).6c). The high discrepancy between in silico and in vivo predictions of essentiality for vitamin and cofactor synthesis genes might be due in part to uncertainty about the vitamin and cofactor content of the LB medium on which the transposon mutants were grown. Since only trace amounts of vitamins are required for cell survival, it is possible that some essential vitamins were present at low concentrations in the LB medium but were not accounted for in our in silico LB medium. This is especially relevant considering previous analyses of yeast extract composition, which showed a high variance in vitamin concentration between extracts from different manufacturers (42; see the supplemental material). Furthermore, since the iMO1056 biomass reaction was based on that of E. coli, it is possible that nutritional differences between the two bacteria contribute to errant predictions about PAO1 vitamin and cofactor essentiality. It is therefore likely that growth experiments on better-defined media will be particularly helpful in elucidating the true essentiality of genes in vitamin and cofactor synthesis pathways.
Second, it is important that the in vivo gene essentiality predictions from the transposon mutant sets are not completely accurate in assessing gene essentiality. An example of a gene that was predicted to be essential in vivo despite in actuality being a nonessential gene is PA0555, encoding the enzyme fructose-1,6-biphosphate aldolase (FBPA). This gene was classified as essential in vivo because transposon mutants with a disrupted FBPA gene were not identified in either PAO1 transposon study, but previous literature shows that the FBPA knockout is not lethal for PAO1 (2). FBPA was correctly predicted to be nonessential in the in silico data set.
The in vivo PAO1 transposon studies used in this analysis discuss the possibility of errors in their candidate essential gene sets, and it was suggested in one of the studies that the number of candidate essential genes in that study (678 total) may overestimate the true number of essential genes by 40 to 50% (25). This overestimate prediction was based on an analysis of the number of ORFs that would likely escape transposon mutagenesis, assuming that all base pairs have equal susceptibility to transposon insertion, including essential genes. Due to its higher coverage of the PAO1 genome, the set of overlapping essentials from the two PAO1 studies should have a higher confidence than that for candidate essentials from either study alone, but the combined transposon set yields an in vivo candidate essential set that is reduced in size by only 34 genes, which does not nearly approximate the 40 to 50% discrepancy suggested (25). Hence, some of the mismatches between the in vivo essentials and the in silico essentials are likely due to overestimation of gene essentiality in the in vivo studies, which would translate into false-negative results in the in silico predictions of essentiality.
Additionally, in one of the transposon studies used in our analysis (35), transposons were reported to likely disrupt genes downstream within an operon, but in the other study (25), an outward-facing neomycin phosphotransferase promoter was inserted as part of the transposon in an attempt to reduce the disruption of downstream genes. It is somewhat unclear from the study, however, how successful the neomycin phosphotransferase promoter actually was in preventing disruption of downstream genes. We therefore undertook an analysis of whether genes downstream of transposon insertions are generally disrupted in vivo, a phenomenon that would cause some nonessential genes to be classified as essential in vivo due to transposons in those genes disrupting downstream essential genes. In order to investigate this phenomenon, we performed a statistical analysis of the number of in vivo essential genes <1,000 bp downstream of genes classified as giving false-negative results. Consistent with the reports for the two transposon studies, we determined that the false-negative results were significantly more likely to have essential genes downstream of them than a set of random genes from the in vivo study that reported polar disruption of downstream genes (indicating possible disruption of downstream essential genes by transposons) (35) but that false-negative results were not more likely to have essential genes downstream of them than a set of random genes from the in vivo study with the neomycin phosphotransferase promoter (indicating a lack of disruption of downstream essential genes by transposons) (25). The result for the in vivo study reporting polar disruption of downstream genes (35) might have been due to the study's much lower coverage of the genome than that of the other study (23% versus 88% of ORFs disrupted). Regardless, this analysis suggests that in a genome-scale transposon study, an outward-facing neomycin phosphotransferase promoter is effective in preventing disruption of downstream genes (see the supplemental material for more details).
We present iMO1056, a genome-scale reconstruction of P. aeruginosa PAO1 that accounts for 1,056 genes encoding 1,030 proteins that are involved in 883 reactions. Specifically reconstructed pathways included those necessary for growth and for production of common virulence factors, including alginate, rhamnolipids, phenazines, and quorum-sensing molecules. iMO1056 was validated with experimental growth rate data from literature, Biolog viability data on multiple carbon sources, and a genome-scale gene essentiality set derived from two independent transposon mutant studies of P. aeruginosa. Model predictions matched well with experimental data in many cases.
Building a genome-scale reconstruction of PAO1 offered insights that would have been difficult to obtain through any other means, including evidence for annotation refinements, the ability to quantitatively predict growth yields and secretion rates under various conditions, and a potential way to probe metabolic stresses incurred by the production of various virulence factors that may be explored in future studies. During the model-building process, we were forced to approach each metabolic pathway quantitatively and comprehensively, so previous gaps in knowledge were highlighted and investigated in a rational manner. This process of gap analysis involves coupling in silico growth simulations with bioinformatic searches and literature mining to fill holes in otherwise known pathways and offers a unique tool for identifying areas of metabolism that require further elucidation. Through gap analysis of P. aeruginosa metabolism, we were able to refine the annotation of a number of genes with respect to their function, directionality, or stoichiometry. Some of these changes are highlighted in Table Table2.2. These annotation refinements represent many crucial metabolic functions of P. aeruginosa, and without a systematic network-level approach to guide our analysis, it would have been difficult to highlight these weak points in current knowledge.
As in any genome-scale annotation effort, the network reconstruction process for P. aeruginosa will require continued work in the future. Mismatches between in vivo and in silico essential genes have highlighted metabolic regions of uncertainty in current knowledge, and these discrepancies still require more work to be explained adequately. In addition, several pathways in the model remain incomplete due to a lack of available knowledge. These gaps include synthesis of cofactors (e.g., cobalamin and thiamine) and, notably, the entire fatty acid β-oxidation pathway, which exists in but has not been characterized extensively for pseudomonads, to our knowledge (52). The Biolog data also indicated some pathways that should be investigated further in the future, such as substrates that PAO1 utilized but which are not in iMO1056 or for which iMO1056 has no membrane transporter. Furthermore, some pathways were not included in the model despite having been studied in the literature, simply because they were peripheral to the main processes of interest in this study (growth and expression of virulence traits). Thus, for instance, pathways for degrading aromatic compounds and for handling xenobiotics were generally not included in the model, and these offer areas for model expansion in the future. Specific validation involving controlled growth experiments with P. aeruginosa will also be informative, and coupling wet lab and in silico experiments will lend greater insight into both specific functions and global properties of P. aeruginosa metabolism.
Additionally, a genome-scale reconstruction of P. aeruginosa metabolism enables the interrogation of several otherwise difficult research questions. For instance, it would be informative to compare the P. aeruginosa metabolic network with metabolic networks of other related but nonpathogenic species. This comparison would allow probing of properties such as pathway redundancy and growth burden of key virulence pathways and would offer insight into how these system-level properties might affect pathogenicity. Such a genome-scale network analysis between a pathogen and a nonpathogen has never been done, to our knowledge, and could provide significant insight into the mechanisms for disease and possible therapeutic targets. Another question of great interest involves the enumeration of selective pressures on P. aeruginosa in the CF lung environment. For example, P. aeruginosa samples obtained from sputum during the life of a CF patient have shown gene mutations and significant alterations in gene expression over time (4, 60). Notably, convergent evolution toward a loss of virulence factors by P. aeruginosa strains taken from multiple CF patients has been demonstrated, suggesting that virulence traits might be selected against once a stable infection has been achieved (60). The effect of the loss of the lasR gene, a master regulator of hundreds of virulence-related genes through the quorum-sensing system in P. aeruginosa, has also been evaluated (8, 56). It was shown that loss of the lasR gene conferred a growth advantage both to strains extracted from the lungs of CF patients and from strains that emerged by selection on rich medium, indicating that optimization of yield could act as a metabolic objective even in the CF lung (8). This result is counterintuitive, as much of the success of P. aeruginosa in chronic lung infections seems to lie in its ability to adopt a slow-growth phenotype, which would suggest that optimization of yield is perhaps not a strong selective pressure in that environment (6). Regardless of whether slow-growing mutants of P. aeruginosa in the CF lung environment are actually optimized for yield, these studies indicate that metabolic selective pressure might be a factor in the evolution of chronic CF strains. With the integration of simple regulatory rules into iMO1056, it will be possible to model different hypotheses about selective pressures in the lung and to analyze the causes for these selective processes.
In addition to the pathogenicity analysis and evolutionary studies outlined above, iMO1056 can serve as a valuable tool in interpreting and informing genome-scale transcriptomic studies of PAO1. Microarray analysis has attracted sizeable interest in the P. aeruginosa community and was recently used to determine genome-level expression changes under various stresses and conditions (13, 36, 71). Enabling a survey of system-level traits of an organism in a relatively unbiased way, microarrays are a crucial element in postgenomic analyses of cell phenotype. While gene expression data cannot be linked directly with metabolic fluxes, past studies have used gene expression data to indicate the regulation of whole pathways in metabolism, thus indicating global phenotypes (9, 12). Furthermore, metabolic reconstructions have been used to predict which genes in a network are likely to be coregulated, and overlaying expression data on a metabolic reconstruction can inform interpretation of microarrays within the context of a genome-scale model (9, 43).
P. aeruginosa is an organism of much interest for its various roles as an opportunistic pathogen (60). From chronic lifelong infections of the lungs of CF patients to acute, highly deadly infections of severe wounds in burn victims, the robustness and environmental diversity of P. aeruginosa are testament to its remarkable natural metabolic agility. We chose to focus our reconstruction effort on P. aeruginosa PAO1 since it is the most studied P. aeruginosa strain and is also the best-characterized strain, but with minor modifications to iMO1056 the model can be tailored to describe similar strains, such as the more virulent P. aeruginosa PA14 (62, 69). A genome-scale metabolic model represents a potentially enormous tool for rational drug design and prediction of cell phenotypes and, in conjunction with regulatory information, can serve in modeling disease processes and engineering therapeutic responses. For P. aeruginosa, a bacterium whose metabolic diversity is a major determinant of virulence, a metabolic network reconstruction will serve as an essential component in a multifaceted and effective response to disease.
We thank Joanna Goldberg, Kenneth N. Timmis, Arvind Chavali, Erwin Gianchandani, Corinne Locke, and Rhiannon Franck for their thoughtful comments and contributions to this study. We also thank the reviewers for thoughtful comments that led to interesting new analyses and insights.
We acknowledge funding from the Whitaker Foundation, the National Science Foundation (CAREER grant 0643548), the National Institutes of Health (NIH Biotechnology training grant [M.A.O.]), the Kluyver Centre for Genomics (The Netherlands), the European Union (Project PROBACTYS; no. 21904), and ERA-NET SysMO (BMBF).
Published ahead of print on 11 January 2008.
†Supplemental material for this article may be found at http://jb.asm.org/.