is perhaps the best characterized and studied bacterium and is of interest industrially, genetically and pathologically. For these reasons, in silico modeling efforts have been made to describe and predict its cellular behavior. With the vast amounts of '-omics' data being generated, there is a growing need to incorporate and reconcile heterogeneous datasets, including genomic, transcriptomic, proteomic and metabolomic data [1]. A constraint-based model of E. coli metabolism can accomplish this and serve as a model-centric database.
In addition to providing the context for various '-omics' data types, constraint-based models provide a framework to compute cellular functions [2]. This modeling method finds the limits of cellular, biochemical and systemic functions, thereby identifying all allowable solutions. Searches within the allowable solution space can identify solutions of interest, for example a solution that maximizes a particular objective. This approach to genome-scale model building has been reviewed in detail [3]. In general, the successive application of constraints (stoichiometric, thermodynamic and enzyme capacity constraints) to the metabolic network progressively shrinks the space of possible solutions. Linear optimization is often used to find a particular solution in the allowable solution space that maximizes a chosen objective function, such as cellular growth (Figure 1). A more detailed description of the constraint-based modeling approach can be found in Materials and methods.
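The optimization described above is conventionally written as a linear program. The notation below is an assumption introduced for illustration (the symbols S, v, c, α and β are not defined in this section, and the paper's own formulation is given in Materials and methods):

```latex
\begin{aligned}
\max_{v}\quad & Z = c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0 \\
& \alpha_i \le v_i \le \beta_i, \qquad i = 1, \dots, n
\end{aligned}
```

Here S would be the m × n stoichiometric matrix (m metabolites, n reactions), v the vector of reaction fluxes, and c a weight vector that selects the objective, for example the flux through a growth reaction. The equality S v = 0 expresses the stoichiometric (steady-state mass balance) constraint, setting α_i = 0 for an irreversible reaction expresses a thermodynamic constraint, and a finite β_i expresses an enzyme capacity constraint.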
Figure 1 Principles of constraint-based modeling. A three-dimensional flux space for a given metabolic network is depicted here. Without any constraints the fluxes can take on any real value. After application of stoichiometric, thermodynamic and enzyme capacity ...
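As a minimal sketch of how the three kinds of constraints carve out an allowable flux space, the following Python fragment tests whether a candidate flux vector is allowable for a toy three-reaction network. The network, the bounds and the function name `is_allowable` are hypothetical and chosen only for illustration; they are not taken from the model described in this paper.

```python
# Hypothetical toy network (not from the paper):
#   R_up:   -> A      (uptake)
#   R_conv: A -> B    (conversion)
#   R_out:  B ->      (export, standing in for "growth")
# Rows are metabolites (A, B); columns are reactions (R_up, R_conv, R_out).
S = [
    [1, -1, 0],   # metabolite A: made by R_up, consumed by R_conv
    [0, 1, -1],   # metabolite B: made by R_conv, consumed by R_out
]

# Thermodynamic constraint (assumed): all three reactions are irreversible.
LOWER = [0.0, 0.0, 0.0]
# Enzyme capacity constraints (assumed, arbitrary units).
UPPER = [10.0, 8.0, 1000.0]


def is_allowable(v, tol=1e-9):
    """Return True if flux vector v satisfies S.v = 0 and the flux bounds."""
    # Stoichiometric constraint: no net production of any metabolite.
    balanced = all(
        abs(sum(coeff * flux for coeff, flux in zip(row, v))) <= tol
        for row in S
    )
    # Thermodynamic and capacity constraints: each flux within its bounds.
    within_bounds = all(
        lo - tol <= flux <= hi + tol
        for lo, flux, hi in zip(LOWER, v, UPPER)
    )
    return balanced and within_bounds
```

In this toy case the allowable space is the line segment v = (x, x, x) with 0 ≤ x ≤ 8, so a linear optimization that maximizes the flux through R_out would select its endpoint at x = 8.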
The constraint-based modeling approach has been used to study E. coli metabolism for over ten years; the history of such model building efforts has recently been reviewed [7]. The first genome-scale metabolic (GSM) model, accounting for 660 gene products (iJE660 GSM), was reconstructed using genomic information, biochemical data and physiological data [8]. This genome-scale model has been used to perform in silico gene deletion studies [8] and to predict both optimal growth behavior [9] and the outcome of adaptive evolution [10].
This paper reports an expansion of iJE660a GSM, which itself is a slight modification of the original genome-scale metabolic model (iJE660 GSM) [8]. Gene to protein to reaction (GPR) associations are included directly in the new model (iJR904 GSM/GPR). These associations describe the dependence of reactions on proteins and of proteins on genes (Figure 2). The metabolic network described by iJR904 has also changed; individual reactions are now elementally and charge balanced, and a significant number of new genes and novel reactions have been added to the model. iJR904 GSM/GPR accounts for 904 genes and the 931 unique biochemical reactions that the encoded proteins carry out. This paper discusses the effects that these additional reactions have on the predictive capabilities of the model and identifies putative ORFs in the genome that could resolve gaps in the metabolic network.
Figure 2 Representation of gene to protein to reaction (GPR) associations. Each gene included in the model is associated with at least one reaction. Examples of different types of associations are shown, where the top layer is the gene locus, the second layer ...
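One way to picture how GPR associations link a gene deletion to the reactions it disables is the short Python sketch below. The data layout, the gene and reaction names and the function `active_reactions` are all hypothetical; the sketch assumes the common case in which a reaction can be catalyzed by any one of several alternative enzymes (isozymes), and each enzyme requires all of its genes (for example the subunits of a complex). It is not the representation used in the model itself.

```python
# Hypothetical GPR table. Each reaction maps to a list of alternative
# enzymes; each enzyme is the set of genes it needs in order to function.
GPR = {
    "R1": [{"geneA"}],             # one gene, one enzyme, one reaction
    "R2": [{"geneB", "geneC"}],    # enzyme complex: both genes required
    "R3": [{"geneD"}, {"geneE"}],  # isozymes: either gene is sufficient
    "R4": [{"geneA"}],             # geneA's product also catalyzes R4
}


def active_reactions(gpr, deleted_genes):
    """Return the set of reactions that still have a functional enzyme."""
    deleted = set(deleted_genes)
    return {
        reaction
        for reaction, enzymes in gpr.items()
        # A reaction stays active if at least one of its enzymes
        # depends on none of the deleted genes.
        if any(enzyme.isdisjoint(deleted) for enzyme in enzymes)
    }
```

Under these assumptions, deleting `geneC` removes only R2, deleting `geneA` removes both R1 and R4, and R3 is lost only when `geneD` and `geneE` are deleted together.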
Since computational models of E. coli will continue to grow in size and scope [7], it will become important to distinguish between the different models; a naming convention will aid in this effort. The naming convention we chose mirrors the one already established for plasmids. The general form of the names of in silico strains is iXXxxxa YYY. The 'i' in the name refers to an in silico model (that is, a computer model). This 'i' is followed by the initials (XX) of the person who developed the model and then the number of genes (xxx) included in the model. Any letters (a) after the number of genes indicate that slight modifications were made to the model; for instance, iJE660a is derived from iJE660. Further designation of the content and scope of a model is found in YYY; here the acronyms GSM and GPR stand for genome-scale model and gene-protein-reaction associations, respectively. The contents of iJE660a and iJR904 can be found on our website [11], and iJR904 is also detailed in the additional data files.
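The naming convention is regular enough to be parsed mechanically. The Python sketch below is one possible reading of it. It assumes exactly two upper-case initials, lower-case revision letters, and content designations separated by a slash as in 'GSM/GPR'. The names `ModelName` and `parse_model_name` are hypothetical and introduced only for this illustration.

```python
import re
from typing import NamedTuple, Tuple


class ModelName(NamedTuple):
    initials: str            # XX: initials of the model developer
    gene_count: int          # xxx: number of genes in the model
    revision: str            # a: letters marking slight modifications
    scope: Tuple[str, ...]   # YYY: content designations such as GSM, GPR


# 'i', two capitals, digits, optional lower-case letters, optional scope.
_NAME_RE = re.compile(r"^i([A-Z]{2})(\d+)([a-z]*)(?:\s+(\S+))?$")


def parse_model_name(name: str) -> ModelName:
    """Split a name of the form 'iXXxxxa YYY' into its components."""
    match = _NAME_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not an 'iXXxxxa YYY' style name: {name!r}")
    initials, genes, revision, scope = match.groups()
    scope_parts = tuple(scope.split("/")) if scope else ()
    return ModelName(initials, int(genes), revision, scope_parts)
```

For example, `parse_model_name("iJR904 GSM/GPR")` yields the initials `JR`, a gene count of 904, an empty revision and the scope `("GSM", "GPR")`, while `parse_model_name("iJE660a GSM")` reports the revision `a`.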