Flux balance analysis (FBA) is the use of a linear program (LP) to model the flow of metabolites through the network of reactions in a cell [1]. FBA simulations give insight into the relative rates at which reactions occur when the cell is optimized for a specific objective. A fundamental assumption of FBA is that organisms can function optimally (often as a result of adaptive evolution) in that they make optimal use of scarce resources to serve the needs of the organism. This characterization of cell behavior naturally leads to a mathematical programming modeling paradigm. FBA has been used to predict growth rates, gene essentiality, and other features of multiple organisms [2].
Several related challenges are encountered in the building of metabolic reconstructions. To apply FBA to a constraint-based model, both a reaction network (representing organism-specific biochemical capabilities) and an objective (representing a desired or measurable physiological goal) need to be specified. Currently, complete reaction networks for organisms are not known. There may be reactions in a cell that must be active for the production of biomass that have not been cataloged in biological databases or documented in the literature. Another challenge is modeler error; the modeler can mistakenly omit a reaction or transport process that is necessary for the production of biomass. Aside from establishing a model that can produce biomass, a common difficulty in using FBA models is that of finding a culture medium that can allow the in silico cell to send flux through the biomass reaction.
Several methods for restoring functionality in broken FBA models, that is, models incapable of a desired level of flux through the biomass reaction, have been previously proposed. GapFind [6] is a procedure that determines which metabolites in a network cannot be produced, and GapFill [6] determines a minimal set of reactions to add from a universal database so that a specified set of metabolites may be produced. These optimization-based procedures have already been integrated into the Model SEED metabolic reconstruction pipeline with some success [7]. Reed et al. [8] utilize a method that adds a minimum-sized set of reactions from a universal database so that a specified level of biomass production is possible in the resulting model. MetaFlux [9] is an automated approach to find missing reactions, exchange reactions, and biomass metabolites. OptStrain [10] determines the maximum possible yield of a desired product based on the inclusion of all reactions in a universal database and then finds the minimum number of reactions from the database needed to achieve the optimal yield. Segrè et al. [11] use the Forward Propagation and Backward Propagation/Backtracking algorithms [12] to first determine the metabolites that can be produced in a model, and then find the precursors of essential metabolites that cannot be produced.
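The forward-propagation idea mentioned above can be illustrated with a generic sketch. This is not the implementation from the cited work: the function name, the representation of a reaction as a (reactants, products) pair, and the treatment of every reaction as irreversible are all assumptions made for illustration.

```python
def forward_propagation(reactions, seeds):
    """Return the set of metabolites reachable from `seeds`.

    A reaction can fire once all of its reactants are available; its
    products are then added, and the scan repeats until nothing new appears.
    Reactions are treated as irreversible (an assumption of this sketch).
    """
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if set(reactants) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


# Hypothetical toy network: A -> B, B + C -> D, E -> F
reactions = [(["A"], ["B"]), (["B", "C"], ["D"]), (["E"], ["F"])]
producible = forward_propagation(reactions, seeds={"A", "C"})
```

In this toy network, F is never reached because its only precursor E is neither a seed nor a product of any reaction that fires, which is the kind of nonproducible metabolite a backtracking step would then examine.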
Several investigators have proposed methods for filling gaps in metabolic networks outside of the FBA paradigm, including searching through a network of metabolites and reactions for logically possible paths [13] and using logic programming to construct pathways [15]. These methods do not ensure that the mass balancing constraints of FBA models are satisfied, nor do they consider the effects of generated pathways on the production of biomass. Thus, the application of these methods does not guarantee the generation of a constraint-based model that produces biomass when FBA is applied.
A fundamental assumption of FBA modeling is that metabolites remain at constant concentration within the cell. Throughout this paper, we use the term metabolite to refer to any molecule whose concentration is of interest, including byproducts of metabolism, coenzymes, and protons. Let $v_j$, for each reaction $j \in R$, be the flux through reaction $j$, that is, the number of times that the reaction occurs per unit time. Let $S_{ij}$ be the stoichiometric coefficient for metabolite $i$ in reaction $j$, for each $i \in M$ and $j \in R$, with the convention that $S_{ij}$ is negative for metabolites $i$ that are reactants for reaction $j$, positive for metabolites $i$ that are products for reaction $j$, and 0 otherwise. Metabolites may participate in a unidirectional or reversible exchange reaction. For our purposes, it will be helpful to distinguish source reactions from escape reactions and assign variables $b_i^{\mathrm{src}}$ and $b_i^{\mathrm{esc}}$ for the fluxes through these reactions. We wish to restrict transport fluxes to zero for any metabolite unless its concentration changes in the cell due to transport processes. The conservation of mass for metabolite $i$ may be stated as follows:
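In the notation defined above, a form of the balance consistent with the sign convention for $S_{ij}$ is sketched here. The symbol $b_i^{\mathrm{esc}}$ for the escape flux is assumed by analogy with $b_i^{\mathrm{src}}$, and the label (1) is chosen to match the way the constraint is referenced later in the text.

```latex
\begin{equation}
\sum_{j \in R} S_{ij}\, v_j \;+\; b_i^{\mathrm{src}} \;-\; b_i^{\mathrm{esc}} \;=\; 0,
\qquad \forall\, i \in M
\tag{1}
\end{equation}
```

Source flux enters with a positive sign because it supplies metabolite $i$ to the cell, and escape flux enters with a negative sign because it removes it.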
The set of reactions $R$ may include a (potentially artificial) biomass reaction, which reflects the objective of the cell in terms of which metabolites are emphasized for production or consumption by other processes. An objective can be added to the model, reflecting the desire to maximize flux through the biomass reaction. Maximizing flux through the biomass reaction is one of several possible objectives that one could assign to a cell. FBA models with this particular objective have been shown to reflect the behavior of single-celled organisms during cell growth. Assessing whether positive biomass production is possible is an effective method for testing the completeness of a metabolic reconstruction. If an FBA model is incapable of producing biomass, then there is likely a gap in the reaction network.
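The completeness test described above can be tried on a toy network. The following is a minimal sketch using SciPy's `linprog`; the three-reaction network, the common flux bound of 10, and the helper name `max_biomass` are hypothetical, and the uptake reaction is folded into the stoichiometric matrix as an ordinary column for brevity.

```python
import numpy as np
from scipy.optimize import linprog


def max_biomass(S, bounds, biomass_idx):
    """Maximize flux through one column of S subject to S v = 0 and flux bounds."""
    n_rxns = S.shape[1]
    c = np.zeros(n_rxns)
    c[biomass_idx] = -1.0  # linprog minimizes, so negate to maximize
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun


# Rows: metabolites A, B. Columns: uptake (-> A), conversion (A -> B), biomass (B ->).
S_full = np.array([[1.0, -1.0, 0.0],
                   [0.0, 1.0, -1.0]])
complete = max_biomass(S_full, [(0, 10)] * 3, biomass_idx=2)

# Delete the conversion reaction to create a gap; biomass is now column 1.
S_gap = S_full[:, [0, 2]]
broken = max_biomass(S_gap, [(0, 10)] * 2, biomass_idx=1)
```

With the conversion step present, the biomass flux reaches its upper bound; once that column is removed, the only feasible flux distribution is zero, which signals a gap in the network.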
Upper and lower bounds on each reaction flux are specified. If possible, these bounds are based on experimentally observed fluxes and free energy considerations, as for the S. cerevisiae and E. coli models. If not, then a common lower and upper bound for all reactions can be assigned, and the fluxes returned by FBA give the investigator an idea of the relative activity of the reactions in the network for a given biomass reaction; the actual flux values in this latter case are less important than the ratios. For example, if $v_j / v_k \geq 4$, the model indicates that a mechanism for maximizing biomass production exists wherein reaction $j$ is at least 4 times as active as reaction $k$. If we generate (1) for metabolites and reactions within a cell and add the flux bounds, we obtain the linear programming-based FBA model. The general model can be expressed compactly as follows:
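Assembled from the pieces defined in the text, one consistent way to write the linear program is sketched here. The symbols $v_{\mathrm{biomass}}$, $l_j$, $u_j$, and $b_i^{\mathrm{esc}}$ are notational assumptions introduced for this sketch, and exchange fluxes are understood to be fixed at zero for metabolites that have no transport process.

```latex
\begin{align*}
\max \quad & v_{\mathrm{biomass}} \\
\text{s.t.} \quad
  & \sum_{j \in R} S_{ij}\, v_j + b_i^{\mathrm{src}} - b_i^{\mathrm{esc}} = 0,
    && \forall\, i \in M, \\
  & l_j \le v_j \le u_j,
    && \forall\, j \in R, \\
  & b_i^{\mathrm{src}} \ge 0, \quad b_i^{\mathrm{esc}} \ge 0,
    && \forall\, i \in M.
\end{align*}
```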
In this paper, we propose a new approach, called FBA-Gap, to address the challenges of building FBA models. The procedure identifies gaps in the metabolic network that are preventing flux through a specified objective, which in our case is the biomass reaction that represents cellular growth. Given a metabolic reconstruction and a biomass reaction, the goal is to find the most plausible modification of the metabolic reconstruction so that the model is capable of sending flux through the biomass reaction. FBA-Gap uses mathematical optimization to determine a minimum cost set of additional exchange reactions needed such that the flux through the biomass reaction can exceed a given threshold. Costs are assigned to source and escape reactions a priori based on their plausibility and distance to the biomass reaction. In general, exchange reactions for metabolites that exist in the extracellular compartment are given a low cost, while exchange reactions for metabolites that exist only in cytosolic and intracellular compartments are given a high cost. The output is a minimum cost set of exchange reactions and a flux distribution for the expanded reaction network. If the model is robust and has no detrimental gaps, the selected exchange reactions will correspond to missing transport reactions for uptake of metabolites from in silico culture medium or for discharge of byproducts into the extracellular space. However, if the model has internal gaps in the reaction network, exchange reactions will be added for internal metabolites that are furthest from the biomass reaction.
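The behavior described above can be illustrated on a three-metabolite chain. The sketch below is a simplification under stated assumptions, not the formulation developed in this paper: cost is charged per unit of source or escape flux so that the problem stays a plain LP, and the function name, the cost values, the threshold, and the common flux bound are all hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def fba_gap_sketch(S, biomass_idx, src_cost, esc_cost, threshold=1.0, vmax=10.0):
    """Cheapest source/escape fluxes that let biomass flux reach `threshold`.

    Assumption of this sketch: cost is proportional to exchange flux.
    """
    m, n = S.shape
    # Variable order: reaction fluxes v (n), source fluxes (m), escape fluxes (m).
    c = np.concatenate([np.zeros(n), src_cost, esc_cost])
    # Mass balance: S v + b_src - b_esc = 0
    A_eq = np.hstack([S, np.eye(m), -np.eye(m)])
    bounds = [(0.0, vmax)] * (n + 2 * m)
    bounds[biomass_idx] = (threshold, vmax)  # force biomass flux to the threshold
    res = linprog(c, A_eq=A_eq, b_eq=np.zeros(m), bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun, res.x[n:n + m], res.x[n + m:]


# Rows: A (extracellular), B, C (internal). Columns: A -> B, B -> C, biomass (C ->).
S = np.array([[-1.0, 0.0, 0.0],
              [1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
src_cost = np.array([1.0, 10.0, 100.0])  # cost grows toward the biomass reaction
esc_cost = np.array([1.0, 1.0, 1.0])

# Intact chain: only a cheap source for A is needed.
cost_full, src_full, _ = fba_gap_sketch(S, 2, src_cost, esc_cost)

# Remove A -> B: the cheapest repair is a source for B, not for C.
cost_gap, src_gap, _ = fba_gap_sketch(S[:, [1, 2]], 1, src_cost, esc_cost)
```

In the intact chain the selected source corresponds to a missing uptake reaction for the extracellular metabolite. After the internal reaction is removed, the source lands on B, the internal metabolite farther from the biomass reaction, rather than on the biomass precursor C.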
Our method is a departure from previous gap-filling methods in that we place an increased emphasis on the accuracy of the final model. The approach is to preserve the set of reactions in the initial model and to direct the model builder to a set of reactions that lead to a biomass-producing model and can be added with high confidence. In the GapFind/GapFill framework, reactions are added until every metabolite in the model is produced, and many additional reactions may be added to a model that are not required for the production of biomass. We will demonstrate that the proposed method is less computationally intensive than GapFind/GapFill. In the method described in [8], hereafter referred to as GapReed, reactions may be added to the model which are downstream/upstream of the actual gap. In other words, there is no attempt to ensure that modifications address gaps in the “backbone” of the network; the gaps may be masked by implausible exchange reactions or secondary pathways. The emphasis in our method is directing the modeler to the gaps in the backbone of the network that can be addressed by adding high-confidence reactions to the model.
The cost structure in FBA-Gap for the artificial exchange reactions is crucial to the proper identification of gaps in the metabolic network. Our approach is to identify the gaps that are farthest from the biomass reaction, utilizing as much of the existing network as possible. A trivial “fix” to any constraint-based model would be to add exchange reactions for every component of the biomass reaction, which would always result in a solution that has no biological relevance. Measuring distance in a metabolic network is a well-studied problem. Distances between metabolites in a metabolic network have been used to establish and refute scale-freeness [19]. Investigators have noted difficulties associated with the inclusion of coenzymes in distance calculations, not the least of which is specifying which metabolites are coenzymes [13]. Some of these coenzymes are ubiquitous, so that every metabolite appears near every other metabolite. Solutions to these difficulties include the introduction of compartments [13], excluding the most common metabolites from distance calculations [21], and using the Euclidean distance of attribute vectors for metabolites [14]. In FBA-Gap, the length of a path in the metabolic network is based on the number of reactions in which each metabolite occurs, penalizing paths that pass through often-occurring metabolites. Gaps where coenzymes play a prominent role can be discovered, but preference is given to other gaps.
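One way to realize such an occurrence-weighted path length is sketched below. The exact weighting is not specified in this section, so the choices here are assumptions: metabolites are linked whenever one is a reactant and the other a product of the same reaction, links are undirected, and stepping onto a metabolite costs the number of reactions in which it occurs. The function name and the toy reactions are hypothetical.

```python
import heapq
from collections import defaultdict


def weighted_distances(reactions, targets):
    """Dijkstra over metabolites; entering a metabolite costs its reaction count."""
    occurrences = defaultdict(int)
    neighbors = defaultdict(set)
    for reactants, products in reactions:
        for met in set(reactants) | set(products):
            occurrences[met] += 1
        for r in reactants:
            for p in products:
                neighbors[r].add(p)
                neighbors[p].add(r)
    dist = {t: 0 for t in targets}
    heap = [(0, t) for t in targets]
    heapq.heapify(heap)
    while heap:
        d, met = heapq.heappop(heap)
        if d > dist.get(met, float("inf")):
            continue  # stale queue entry
        for nxt in neighbors[met]:
            nd = d + occurrences[nxt]
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


# Hypothetical toy network in which ATP/ADP take part in every reaction.
reactions = [
    (["A", "ATP"], ["B", "ADP"]),
    (["B", "ATP"], ["C", "ADP"]),
    (["X", "ATP"], ["Y", "ADP"]),
]
dist = weighted_distances(reactions, targets=["C"])
```

With unit edge lengths, A and Y would both sit two steps from C, since Y is reachable through the shared coenzyme ATP. Under the occurrence weighting, the path through ATP is penalized, so A (on the actual pathway to C) ends up closer than Y.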
In the remainder of the paper, we describe the FBA-Gap method for building metabolic reaction networks and demonstrate its effectiveness in computational experiments. The method is used to help create a new metabolic reconstruction for a cellular organism based on a partial reconstruction. We compare the accuracy and computation time of FBA-Gap to existing gap-filling methods for this model. We then remove the exchange reactions from several existing models of organisms and apply FBA-Gap, yielding a hypothesis for minimal media for each organism. Finally, we delete a portion of the internal reactions of a working model, and apply FBA-Gap to detect the resulting gaps in the network.