The screening of single-gene deletion mutants on glycerol minimal medium provides a meaningful addition to the collection of data regarding essential genes for E. coli. With the combination of other such genome-scale gene essentiality studies, we continue to refine our notion of which genes are required for growth on rich and minimal media. From a comparison of genes required for growth under rich- and minimal-medium conditions, a toolkit of genes enabling growth in limiting environments can be identified. By studying the genes required for growth on glycerol minimal medium, we showed that (i) our understanding of the roles that these essential genes play in this toolkit is clear and relatively complete, as only two putative genes of unknown function (yjhS and yhhK) were identified as essential in this phenotyping screen; (ii) the current metabolic and regulatory model is highly accurate in its essentiality predictions; and (iii) comparisons of model predictions and high-throughput phenotyping data represent a powerful approach to rapidly generate model refinements and hypotheses likely to lead to an enhanced understanding of the organism.
Remarkably, 112 of the identified 119 conditionally essential genes are included in the current metabolic model. This observation suggests that the applied experimental approach has a very low rate of incorrectly identifying essential genes. Otherwise, nonmetabolic and uncharacterized genes (at least 40% of E. coli genes) would comprise a substantially larger fraction of the identified set. At the same time, it indicates that an inventory of E. coli metabolic genes captured in the current model (1,003 out of ~4,400 genes in the E. coli genome) is rather comprehensive, at least with respect to the pathways required to support growth on minimal medium. The fact that the identified conditionally essential gene set contained only two genes of unknown function is notable but not surprising, since our screening protocol is conceptually equivalent to the identification of auxotrophs, a historical standard in the study of E. coli genetics.
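The enrichment argument above can be made concrete with a short calculation. The following is a minimal sketch, not part of the original analysis: it uses only the counts quoted in the text, rounds the genome to 4,400 genes (the text gives this figure as approximate), and assumes a simple null model in which the 119 conditionally essential genes are drawn at random from the genome. The function name hypergeom_tail is introduced here for illustration.

```python
from math import comb

# Counts quoted in the text. GENOME_GENES is approximate (~4,400 in the text).
GENOME_GENES = 4400
MODEL_GENES = 1003
ESSENTIAL_FOUND = 119
ESSENTIAL_IN_MODEL = 112


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` items are sampled without replacement
    from `population` items, of which `successes` are marked."""
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favorable / comb(population, draws)


# Fraction of the genome covered by the model, and fraction of hits in the model.
model_fraction = MODEL_GENES / GENOME_GENES
hit_fraction = ESSENTIAL_IN_MODEL / ESSENTIAL_FOUND
fold_enrichment = hit_fraction / model_fraction

# Assumed null model: essential calls are unrelated to model membership.
p_value = hypergeom_tail(
    GENOME_GENES, MODEL_GENES, ESSENTIAL_FOUND, ESSENTIAL_IN_MODEL
)

print(f"model genes as share of genome: {model_fraction:.1%}")
print(f"essential genes found in model: {hit_fraction:.1%}")
print(f"fold enrichment: {fold_enrichment:.2f}")
print(f"chance probability under random draw: {p_value:.1e}")
```

Under these assumptions, model genes make up roughly a quarter of the genome but more than nine-tenths of the identified set, which is the disparity the argument relies on. A random-draw null is deliberately crude; it ignores, for example, that metabolic genes are expected to be overrepresented among auxotrophy-type phenotypes in the first place.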
These experimentally essential genes can be mapped to metabolic subsystems, which allows a level of generalization enabling us to detect tendencies across multiple organisms that may be obscured by details of functional variants. This type of analysis readily facilitates the identification of metabolic functions that are required by different organisms without the potentially complicating details regarding how the molecules are synthesized. For example, Bacillus subtilis, E. coli, and Corynebacteria use three different chemistries in the lysine biosynthesis DAP (meso-diaminopimelate) pathway, but their purposes remain the same. It should be noted that these subsystem projections were made only for conditionally essential genes and not for genes that are essential for growth on rich medium (and likely essential in minimal-medium environments, as well). For example, only the portions of the pathways that are required for NAD and CoA biosynthesis on minimal and not rich medium are represented. Otherwise, these fundamentally essential subsystems would be present in all analyzed genomes.
The set of conditionally essential subsystems (and genes therein) identified in this study may also be used to assess the metabolic potentials of organisms present in environmental samples as captured by emerging metagenomics data (49). Researchers will be able to rapidly assess the pathways present within an environmental sample and use the essentiality information to develop potential laboratory medium formulations to facilitate further controlled study in the laboratory (47). Furthermore, the presence of certain pathways and the absence of others may provide insights into the microenvironment from which the sample was taken and also indicate local intracommunity relationships between species that are present in the sample. This subsystem-based essentiality analysis approach could be a useful tool to add to the growing compendium of methods (5) being developed to analyze and interpret these complex data.
Further analysis of the generated gene essentiality data set was made using a metabolic and regulatory model allowing the data to be easily placed into biological context. Discrepancies between model and experiment can be used to improve the predictive capabilities of the model by indicating regions that are not captured accurately by the models or, more importantly, can point to areas in metabolism or regulation that require further experimental interrogation. For example, a number of independent gene deletion studies have shown that some genes involved in arginine biosynthesis are not essential (18), but without these enzymes, the current literature cannot explain how this essential amino acid is synthesized. Therefore, further experiments need to be conducted to either identify novel arginine biosynthetic genes or determine which multifunctional enzymes can compensate for any perturbation of the genes.
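The model-versus-experiment comparison described above amounts to sorting each gene into one of four categories. The sketch below illustrates that bookkeeping under the convention used in this discussion, where a "positive" is a lethal (essential) call, so a false negative is a lethal phenotype with a nonlethal model prediction. The gene labels and calls are hypothetical placeholders, not data from the screen, and the function names are introduced here for illustration.

```python
from collections import Counter


def classify(model_essential, observed_essential):
    """Label one gene by comparing the model call with the experimental call.
    'Positive' means the gene is called essential (deletion is lethal)."""
    if model_essential and observed_essential:
        return "true positive"
    if model_essential and not observed_essential:
        return "false positive"   # model predicts lethal, mutant grows
    if not model_essential and observed_essential:
        return "false negative"   # model predicts growth, mutant is lethal
    return "true negative"


def compare(predictions, observations):
    """Classify every gene that has both a prediction and an observation."""
    shared = predictions.keys() & observations.keys()
    return {
        gene: classify(predictions[gene], observations[gene])
        for gene in sorted(shared)
    }


# Hypothetical placeholder calls (True = essential), for illustration only.
predictions = {"geneA": True, "geneB": True, "geneC": False, "geneD": False}
observations = {"geneA": True, "geneB": False, "geneC": True, "geneD": False}

calls = compare(predictions, observations)
summary = Counter(calls.values())

for gene, label in calls.items():
    print(f"{gene}: {label}")
print(dict(summary))
```

In this framing, false positives tend to point at routes the model lacks (an unmodeled alternative enzyme or transport mechanism), while false negatives tend to point at requirements the model does not impose.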
Additionally, based on the experimental results, several model improvements are suggested. Since a number of experimentally essential genes are involved in cofactor biosynthesis, a number of cofactors should be included in the biomass objective function used to conduct the growth prediction simulation. These cofactors include pyridoxal-5-phosphate, isoprenoids, hemes, ACP, and ubiquinone. These will help correct for the false negatives (lethal phenotypes with nonlethal model predictions) that account for a large number of discrepancies in both minimal- and rich-medium phenotypes (data not shown for rich medium). A wild-type biomass composition does not always correlate with an essential biomass composition; for example, only a core and not a complete LPS is required for cell survival (37). Accordingly, the essentiality of these and other biomass components can be refined or relaxed based on the nonessentiality of the corresponding biosynthetic-pathway genes. These issues are being addressed in a forthcoming updated metabolic reconstruction of E. coli (A. Feist and B. O. Palsson, personal communication) and represent a significant advance.
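The effect of adding a cofactor to the biomass objective can be illustrated with a deliberately simplified sketch. It is not a flux balance calculation: in a constraint-based model, lethality is decided by whether any flux distribution can still feed the biomass reaction, whereas here each biomass component is simply mapped to the genes assumed to be indispensable for making it. The gene labels aaX and cofX, the mapping, and the function name are hypothetical.

```python
# Hypothetical mapping: biomass component -> genes assumed indispensable
# for its synthesis. A stand-in for the pathway logic of a real model.
REQUIRED_GENES = {
    "amino acids": {"aaX"},
    "pyridoxal-5-phosphate": {"cofX"},
}


def lethal_given_biomass(deleted_gene, biomass_components):
    """Predict lethality: the deletion is lethal if it blocks any component
    that the biomass objective demands."""
    return any(
        deleted_gene in REQUIRED_GENES[component]
        for component in biomass_components
    )


# Biomass definition without the cofactor, then with it added.
original_biomass = ["amino acids"]
extended_biomass = original_biomass + ["pyridoxal-5-phosphate"]

before = lethal_given_biomass("cofX", original_biomass)
after = lethal_given_biomass("cofX", extended_biomass)

print(f"cofX deletion lethal with original biomass: {before}")
print(f"cofX deletion lethal with extended biomass: {after}")
```

The same logic runs in reverse for components such as the complete LPS: removing a nonessential component from the required list turns a lethal prediction into a nonlethal one.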
Model improvements are also suggested with regard to the first steps of glycerol metabolism (Fig. ). As previously noted, analysis of the false positives suggests that glycerol import can occur by passive transport across the cell membrane in the absence of the glpF-encoded transporter. Additionally, the initial enzymatic steps required to convert glycerol to dihydroxyacetone phosphate appear to be exclusively mediated by GlpK and GlpD rather than by GldA and the DhaKLM-PtsHI complex. This pathway bias is likely due to transcriptional regulatory effects. Indeed, the elevated expression of glpK during growth on glycerol revealed by quantitative RT-PCR (Fig. ) further supports the notion that the GlpK-GlpD branch is dominant under these conditions. Furthermore, a recent study showed that the DhaR transcriptional regulator specifically upregulates the genes encoding DhaKLM in the presence of dihydroxyacetone, but not glycerol (2). Under the conditions utilized in this study, quantitative RT-PCR of dhaM (Fig. ) showed that the dhaKLM genes are only minimally expressed, leaving the alternative glycerol metabolic pathway dormant. Including the recently characterized DhaR regulatory interaction (2) in the integrated regulatory-metabolic model will readily correct this discrepancy.
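How a single regulatory rule changes an essentiality prediction can be sketched with Boolean logic. The example below is an illustrative simplification, not the integrated model itself: it treats the two routes from glycerol to dihydroxyacetone phosphate as all-or-nothing branches, and it encodes the DhaR interaction as one assumed rule (dhaKLM expressed only when dihydroxyacetone is in the medium). The function names and the use_dhar_rule switch are introduced here for illustration.

```python
# Genes of the two routes from glycerol to dihydroxyacetone phosphate,
# as named in the text.
GLP_BRANCH_GENES = {"glpK", "glpD"}
DHA_BRANCH_GENES = {"gldA", "dhaK", "dhaL", "dhaM", "ptsH", "ptsI"}


def dha_branch_expressed(medium, use_dhar_rule):
    """Assumed Boolean rule: DhaR turns on dhaKLM only when dihydroxyacetone
    is present. Without the rule, the branch is treated as always available."""
    if not use_dhar_rule:
        return True
    return "dihydroxyacetone" in medium


def predicted_lethal(deleted, medium, use_dhar_rule):
    """A deletion is predicted lethal if neither branch can operate."""
    glp_ok = not (GLP_BRANCH_GENES & deleted)
    dha_ok = (
        not (DHA_BRANCH_GENES & deleted)
        and dha_branch_expressed(medium, use_dhar_rule)
    )
    return not (glp_ok or dha_ok)


glycerol_medium = {"glycerol"}

# Without the rule, the alternative branch masks the loss of glpK.
without_rule = predicted_lethal({"glpK"}, glycerol_medium, use_dhar_rule=False)
# With the rule, the alternative branch is off on glycerol.
with_rule = predicted_lethal({"glpK"}, glycerol_medium, use_dhar_rule=True)

print(f"glpK deletion lethal without DhaR rule: {without_rule}")
print(f"glpK deletion lethal with DhaR rule: {with_rule}")
```

In this toy setting, adding the rule is what makes a glpK deletion predicted lethal on glycerol, consistent with the GlpK-GlpD branch being the only active route under the conditions described.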
In summary, this high-throughput phenotyping screen provides a significantly enhanced view of the conditionally essential gene set required for growth under minimally supplemented growth conditions and additionally represents the most comprehensive assessment of the constraint-based metabolic model of E. coli conducted to date. Moreover, this study further highlights the utility of using genome-scale models as a context for content in interpreting and analyzing complex high-throughput data sets. This powerful synergistic approach of not only using models as data analysis tools, but also using high-throughput data as feedback for model improvement, is becoming a paradigm that will continue to drive systems biology research forward.