This study describes the generation of GEMs representing the union (pangenome) and the intersection (core) of all identifiable metabolic reactions contained in sixteen genomes of E. coli. We used the E. coli pan-GEM to rapidly construct six E. coli strain-specific GEMs. A comparison between model growth predictions and Biolog phenotypes measured in the laboratory demonstrated an accuracy of more than 88%, including under anaerobic conditions. Additional quantitative data were generated for each strain and used to validate the correlation between model predictions and the experimental physiology of the strains. These new E. coli GEMs serve as a framework to examine genome-scale metabolic similarities and differences between strains in an evolutionary context with respect to the commensal, EHEC, and UPEC lineages.
The two E. coli K-12 strains (MG1655 and W3110) are widely used laboratory strains that are believed to have diverged from the same parental strain (strain EMG2 or WG1) approximately 50 years ago [77]. The sole identified metabolic differences between the two E. coli K-12 strains based on genome comparison include the gatA gene that is involved in galactitol transport, dcuA involved in C4-dicarboxylate transport metabolism, and also tnaB thought to be involved in the utilization of tryptophan as a carbon and/or nitrogen source [77]. Of these four metabolic gene differences, only inactivation of gatA leads to a loss of a reaction in iEco1335_W3110 compared to iEco1339_MG1655, since the other genes, including dcuA, have isozymes. The gatA gene contains an insertion sequence (IS) element in E. coli W3110, which suggests a loss of galactitol utilization as a carbon source, yet experimental data (Figure ) reveal that the strain can still use this substrate as a sole carbon source, indicating that other transporters may permit galactitol transport in E. coli W3110. Although the two E. coli K-12 strains (MG1655 and W3110) exhibited almost no differences in their GEMs, quantitative and strain-specific differences were observed during batch growth in minimal media with glucose as the sole carbon source. While in silico predictions for growth yield were similar for iEco1339_MG1655 and iEco1335_W3110, experimental data reveal that in both aerobic and anaerobic conditions, strain MG1655 had higher growth yields, a higher growth rate, and attained the final biomass value in less time than strain W3110 (Figure ). Therefore, although the in silico models for these two strains are nearly indistinguishable, strain-specific differences in complex traits such as biomass composition [78], ATP requirements, P/O ratios, and glucose uptake rates may account for these experimental differences. Previous studies have shown that despite their nearly identical genomes and very similar growth patterns in a bioreactor, W3110 and MG1655 have many significant differences in their transcriptomes and proteomes. These include differential expression of pathways affecting central metabolism and the generation of precursor metabolites and energy [79], suggesting that future models for even these very similar strains will need to account for subtle genetic differences between strains to accurately predict phenotypic traits in simulated culture conditions.
Previous analyses of the E. coli pangenome estimated that on average each new E. coli genome sequence added about 176 unique genes to the pangenome [8], and among these unique genes, we found that each additional E. coli genome resulted in 27 metabolic gene additions corresponding to about 2 new metabolic reactions and 20 isozymes suitable for inclusion in the pan-GEM (Figure ). Clearly some of the metabolic differences between E. coli strains are due to the addition of genes with new metabolic activity. However, our ability to add new reactions to the metabolic reconstructions is severely limited by the paucity of experimental characterization of the metabolic genes, proteins, and reactions unique to pathogenic strains. Since the strain-specific portions of the genomes remain largely uncharacterized, our current understanding of the metabolic functions they encode is dominated by the presence and absence of genes encoding functions represented in the iEco1339_MG1655 GEM. Many of the genes included in this model are not universally conserved among the genomes we examined, resulting in strain-specific GEMs with an average of 70 fewer genes than iEco1339_MG1655 (Table ). This observation is also consistent with draft GEMs generated using the Model SEED [80], where the GEM for E. coli MG1655 contained more genes (>60) and reactions (>460) than the draft GEMs for all four pathogenic E. coli strains examined in this work (data not shown).
Number of strain-specific orthologous genes in common with those contained in iEco1339_MG1655
Although carbon source utilization has become a standard method to assess the validity of computational metabolic model predictions, this study was the first to examine this procedure under anaerobic conditions. Initially, the accuracy of predictions for carbon source utilization under anaerobic conditions was lower than that determined under aerobic conditions. We attribute this difference to the comparison of Biolog carbon source assays, which examine the ability of a microbial strain to generate energy from each sole carbon source, with in silico analysis, which determines growth as a positive flux value for the biomass reaction. One possible explanation for discrepancies between experimental and in silico data is that a microbial strain may be able to generate energy from a given carbon source even though that carbon source is not suitable to sustain growth (i.e., generate a positive biomass value). Therefore, rather than maximize the objective value for the biomass equation, we added two reactions to monitor the ability to generate energy through electron transfer to quinones, and in many cases this analysis resolved discrepancies between in silico predictions and experimental data, especially for anaerobic conditions. Although this methodology of examining carbon source utilization seems trivial, validation of accurate carbon source utilization is important for modeling complex environments such as those encountered in a host, as 31 of the 76 carbon sources tested here were used to simulate the conditions reflecting invasion of a human cell to study S. typhimurium LT2 infection [69]. Therefore, the validation of these strain-specific metabolic models for carbon source utilization will prove useful for future computational modeling of pathogenic E. coli strains in conditions encountered in the gastrointestinal tract or in other locations such as the urinary tract in mammalian hosts.
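The effect of switching the in silico criterion from biomass production to energy generation can be illustrated with a minimal sketch. The substrate names, flux values, and Biolog calls below are hypothetical placeholders, not data from this study, and the fluxes are assumed to come from separate FBA runs that are not reproduced here.

```python
# Hypothetical per-substrate results (NOT data from the study).
# Each entry: (predicted biomass flux, predicted quinone-reduction flux,
#              Biolog respiration observed in the laboratory)
TOY_RESULTS = {
    "glucose":    (0.80, 12.0, True),
    "galactitol": (0.00,  3.5, True),   # energy generated, no predicted growth
    "formate":    (0.00,  1.2, True),   # energy generated, no predicted growth
    "citrate":    (0.00,  0.0, False),
    "sucrose":    (0.50,  8.0, False),  # model over-predicts utilization
}

TOLERANCE = 1e-6  # fluxes below this value are treated as zero


def positive_by_biomass(biomass_flux, energy_flux):
    """Call a substrate 'utilized' only if the biomass reaction carries flux."""
    return biomass_flux > TOLERANCE


def positive_by_energy(biomass_flux, energy_flux):
    """Call a substrate 'utilized' if electrons can be passed to quinones."""
    return energy_flux > TOLERANCE


def accuracy(results, criterion):
    """Fraction of substrates where the in silico call matches the Biolog call."""
    matches = sum(
        1 for biomass, energy, observed in results.values()
        if criterion(biomass, energy) == observed
    )
    return matches / len(results)


biomass_accuracy = accuracy(TOY_RESULTS, positive_by_biomass)
energy_accuracy = accuracy(TOY_RESULTS, positive_by_energy)
```

In this toy data set the energy-based criterion agrees with the Biolog calls more often than the biomass-based one because two substrates support respiration without supporting growth; how large the effect is for a real model depends entirely on the network and the substrates tested.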
With the generation of the first GEMs for pathogenic E. coli strains (two EHEC strains and two UPEC strains), properties of these genome-scale metabolic networks were investigated to identify differences that may play a role in human disease. We analyzed two E. coli O157:H7 strains associated with foodborne outbreaks: strain EDL933, isolated from ground beef in the U.S. in 1982, and strain Sakai, isolated from contaminated radish sprouts that sickened thousands in Japan in 1996. Strains CFT073 and UTI89, which cause human disease outside of the intestine, were isolated from patients with acute urinary tract infections. A comparison of reaction deletions between the EHEC and UPEC metabolic networks reveals that the EHEC strains have more missing genes corresponding to reactions for inner membrane transport than the UPEC strains. In addition, the reaction deletions that occur in both pathogenic lineages relative to E. coli K-12 strains are mainly associated with genes involved in lipopolysaccharide biosynthesis/recycling and alternate carbon utilization. It seems likely that some of these missing reactions are the result of acquisition of genes during the evolution of the K-12 lineage. Some of the reactions missing from both pathogen lineages may instead result from parallel deletions driven by selective pressures common to both pathogens.
Batch growth experiments were conducted to compare growth yields, growth rates, and the amount of time to attain final biomass among strains. We were surprised that EDL933, Sakai, and CFT073 have significantly higher growth rates than MG1655 during aerobic growth conditions, yet the in silico predictions reveal little to no difference. We sought to determine whether strain-specific glucose uptake rates would improve in silico growth rate predictions. Experimentally determined glucose uptake rates were actually lower for EDL933 and CFT073 than for MG1655, and did not improve in silico predictions. The growth yield values we measured in the laboratory were also significantly (Student's t-test, p < 0.05) higher for EDL933 and CFT073 than for the two K-12 strains, but in silico predictions showed only minor strain-to-strain variations. Dynamic FBA using the strain-specific E. coli GEMs predicts a similar growth rate from all models, including the model for the ancestral core of E. coli. Yet the actual growth rates determined experimentally vary significantly between strains, suggesting that our models are not accounting for some strain-specific factors such as oxygen uptake rates, biomass composition, ATP requirement parameters, or additional uncharacterized reactions. The length of time required to attain final biomass was significantly (Student's t-test, p < 0.05) shorter for the four pathogens, suggesting that they may be more efficient at biomass production during glucose catabolism, and dynamic FBA analysis accurately predicted this phenotypic difference among the strains.
In anaerobic batch growth conditions there were also differences between strains. All pathogenic strains have higher growth rates than the K-12 strains. The FBA predictions for both EHEC strains reflect this phenotype, but the in silico growth rate predictions for the UPEC strains did not reflect this trend. The experimentally determined glucose uptake rates are higher for both pathogenic lineages than for K-12, and these organism-specific parameters improved the FBA predictions. The growth yields determined experimentally are significantly (Student's t-test, p < 0.005) higher for the four pathogens than for the K-12 strains. The length of time required to attain final biomass, both predicted by FBA and determined experimentally, was significantly (Student's t-test, p < 0.05) shorter for the EHEC strains than for the K-12 strains. Overall, for anaerobic glucose catabolism, all four pathogens appear to grow better than both E. coli K-12 strains.
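The way a strain-specific glucose uptake rate feeds into predicted batch growth can be sketched with a simple time-stepping loop of the kind used in the static optimization approach to dynamic FBA. In the sketch below the FBA solve at each step is replaced by a fixed biomass yield, which is a deliberate simplification; the function name, parameter values, and the two strains are hypothetical and do not correspond to measurements from this study.

```python
def simulate_batch(v_max, biomass_yield, glucose0=10.0, biomass0=0.01,
                   k_m=0.01, dt=0.01, t_max=48.0):
    """Simulate batch growth on glucose with explicit Euler steps.

    v_max          maximum glucose uptake rate (mmol/gDW/h), assumed value
    biomass_yield  gDW formed per mmol glucose; stands in for the FBA solution
    glucose0       initial glucose (mmol/L)
    biomass0       initial biomass (gDW/L)
    Returns (time at which glucose is exhausted in h, final biomass in gDW/L).
    """
    t, glucose, biomass = 0.0, glucose0, biomass0
    while t < t_max and glucose > 1e-6:
        # Uptake bound from the current glucose concentration (Michaelis-Menten).
        uptake = v_max * glucose / (k_m + glucose)
        # Never consume more glucose in one step than is left in the medium.
        uptake = min(uptake, glucose / (biomass * dt))
        # A real dynamic FBA would solve the GEM here with this uptake bound;
        # the fixed yield below is a placeholder for that optimization.
        growth_rate = biomass_yield * uptake
        glucose -= uptake * biomass * dt
        biomass += growth_rate * biomass * dt
        t += dt
    return t, biomass


# Two hypothetical strains that differ only in their glucose uptake rate.
time_fast, final_fast = simulate_batch(v_max=12.0, biomass_yield=0.09)
time_slow, final_slow = simulate_batch(v_max=8.0, biomass_yield=0.09)
```

With identical yields, the strain with the higher uptake rate exhausts the glucose sooner but ends at the same final biomass. This is one reason why uptake rates alone cannot explain the yield differences measured between strains: a yield difference has to come from the network or from parameters such as biomass composition and ATP requirements.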
Even though the metabolic networks of each E. coli strain differ, there were relatively few strain-to-strain differences in reactions predicted as essential for the two growth conditions examined. While some essential reactions identified for all strains were unique to anaerobic growth in comparison to aerobic growth, there were relatively few differences among the strains. The two reactions (fumarate reductase and glycolate oxidase) predicted as essential for the E. coli O157:H7 strains play essential metabolic roles in glycolate recycling and the reoxidation of menaquinol, and represent new targets for control strategies that may help to prevent and treat human EHEC illness.
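The idea behind an essential-reaction screen is to delete one reaction at a time and ask whether the network can still produce biomass. The sketch below uses a simple reachability check in place of FBA, so it ignores stoichiometry and flux balance; the toy network and all reaction and metabolite names are hypothetical.

```python
# Toy network: reaction name -> (set of substrates, set of products).
# All names are illustrative and do not come from the E. coli GEMs.
TOY_NETWORK = {
    "glc_uptake":  ({"glc_ext"}, {"glc"}),
    "glycolysis":  ({"glc"}, {"pyr"}),
    "route_a":     ({"pyr"}, {"precursor"}),
    "route_b":     ({"pyr"}, {"precursor"}),   # alternate route, like an isozyme
    "biomass_rxn": ({"precursor"}, {"biomass"}),
}


def can_produce(network, seeds, target):
    """Return True if `target` is reachable from the seed metabolites."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in network.values():
            # A reaction can fire once all of its substrates are available.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return target in available


def essential_reactions(network, seeds, target):
    """Reactions whose individual removal makes `target` unreachable."""
    essential = []
    for reaction in network:
        reduced = {name: rxn for name, rxn in network.items() if name != reaction}
        if not can_produce(reduced, seeds, target):
            essential.append(reaction)
    return sorted(essential)


essential = essential_reactions(TOY_NETWORK, seeds={"glc_ext"}, target="biomass")
```

Here `route_a` and `route_b` are each dispensable because the other can substitute, which mirrors how isozymes and alternate pathways keep most strain-to-strain gene differences from changing the predicted set of essential reactions. Condition-specific essentiality could be explored in the same way by changing the seed metabolites.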
The comparison of the pan- and core-GEMs reveals that a substantial fraction (92%) of the reactions in our current pan-GEM are also in the ancestral core-GEM. However, our knowledge of the detailed biochemistry of the pangenome is likely incomplete, since many of the genes in other E. coli strains have unknown functions. One reason the numbers of reactions in the core- and pan-GEMs are so similar is that the genes that have been well characterized biochemically in E. coli tend to be the genes that are conserved and likely ancestral. While the pathogenic E. coli strains are of great medical interest, they are not typically the focus of intense biochemical study to uncover the functions of their novel metabolic genes.
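Comparing the pan- and core-GEMs amounts to taking the union and the intersection of per-strain reaction sets. The following minimal sketch shows the calculation on hypothetical strains and reaction identifiers; the resulting fraction is illustrative and unrelated to the 92% reported here.

```python
# Hypothetical reaction content per strain (identifiers are placeholders).
STRAIN_REACTIONS = {
    "strain_A": {"R1", "R2", "R3", "R4"},
    "strain_B": {"R1", "R2", "R3", "R5"},
    "strain_C": {"R1", "R2", "R3"},
}


def pan_and_core(strain_reactions):
    """Return (pan, core): the union and intersection of all reaction sets."""
    reaction_sets = list(strain_reactions.values())
    pan = set().union(*reaction_sets)
    core = set.intersection(*reaction_sets)
    return pan, core


pan, core = pan_and_core(STRAIN_REACTIONS)
# Fraction of pan-GEM reactions that are also present in the core-GEM.
core_fraction = len(core) / len(pan)
```

Because only characterized reactions can be entered into the sets, the fraction grows when strain-specific genes lack annotated functions, which is the bias described above.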
Overall, when data for aerobic conditions are viewed phylogenetically (Figure ), there is no clear trend specific to the two pathogenic lineages, yet it appears that E. coli CFT073 has evolved with a growth rate similar to the E. coli ancestral core predictions, whereas all other strains have evolved with higher growth rates and yields (Figure ).
In contrast, in anaerobic conditions (Figure ), higher growth yields and faster batch growth performance were observed for both EHEC E. coli strains (EDL933 and Sakai), and the insight derived from E. coli ancestral core in silico predictions suggests that the UPEC and K-12 lineages have evolved with less efficient anaerobic glucose catabolism than the EHEC lineage. One possible explanation for this behavior is that the K-12 and UPEC strains do not routinely encounter the selective pressure of anaerobic conditions, whereas the EHEC strains may have evolved for improved growth in anaerobic conditions, enabling their growth in both bovine and mammalian GI tracts, thus suggesting that many EHEC strains may have a better-suited anaerobic metabolism for glucose utilization. These findings suggest that E. coli K-12 strains could be engineered to be more efficient for anaerobic batch growth and that other E. coli strains not examined in this work may yield similar results, yet additional studies are warranted to examine more E. coli strain-specific GEMs, quantitative parameters, and catabolism of substrates other than glucose.