|Home | About | Journals | Submit | Contact Us | Français|
Kinetic modeling of metabolic pathways has become a major field of systems biology. It combines structural information about metabolic pathways with quantitative enzymatic rate laws. Some of the kinetic constants needed for a model could be collected from ever-growing literature and public web resources, but they are often incomplete, incompatible, or simply not available. We address this lack of information by parameter balancing, a method to complete given sets of kinetic constants. Based on Bayesian parameter estimation, it exploits the thermodynamic dependencies among different biochemical quantities to guess realistic model parameters from available kinetic data. Our algorithm accounts for varying measurement conditions in the input data (pH value and temperature). It can process kinetic constants and state-dependent quantities such as metabolite concentrations or chemical potentials, and uses prior distributions and data augmentation to keep the estimated quantities within plausible ranges. An online service and free software for parameter balancing with models provided in SBML format (Systems Biology Markup Language) is accessible at www.semanticsbml.org. We demonstrate its practical use with a small model of the phosphofructokinase reaction and discuss its possible applications and limitations. In the future, parameter balancing could become an important routine step in the kinetic modeling of large metabolic networks.
The complex, dynamic behavior of cell metabolism can be simulated by mathematical models. Metabolic pathway models consist of enzymatic reactions described by their stoichiometry, the enzymatic rate laws, and their kinetic constants (such as, for instance, equilibrium constants or catalytic constants). The more we know about these quantities, the more reliably we can simulate the metabolic dynamics. Kinetic laws of individual enzymes have been studied experimentally for about 100 years,(1) and metabolic control theory,(2) a theoretical apparatus for the analysis of metabolic systems, has been developed since the 1970s. Recently, comprehensive web databases, advances in high-throughput experiments, and inexpensive computing power have led to a new interest in metabolic modeling. In particular, the numerous large-scale metabolic networks reconstructed from sequenced genomes3−5 call for automatic routines that can fill these networks with enzymatic rate laws and turn them into dynamic models.
Unfortunately, the enzymatic mechanisms and the rate laws of most enzymes are unknown, and it is laborious to determine them exclusively by enzyme assays. A pragmatic solution is to substitute missing kinetic laws by standard rate laws, such as mass-action kinetics, generalized mass-action kinetics,(6) or linlog kinetics.7,8 Here, we will use the common modular rate law,(9) a generalized version of the reversible Michaelis−Menten rate law, suitable for any reaction stoichiometry and accounting for various types of allosteric regulation. Once a metabolic network and enzymatic rate laws have been chosen, we need numerical values for the kinetic constants. This can be a challenge, especially for large networks. Modelers can find known kinetic constants in published models, in the literature, or in public web resources such as Sabio-RK,(10) Brenda,(11) and NIST.12,13 As pointed out by Alberty,(14) varying conditions such as pH or salt concentrations can be taken into account by describing biochemical reactants and reactions in terms of transformed thermodynamic quantities. In the future, automated enzyme assays might provide more kinetic data, but they will still not reach the speed at which metabolic networks are reconstructed from newly sequenced genomes. Available kinetic data may not be suited for a model if they are contradictory or measured under inappropriate conditions (e.g., pH values and temperatures). Furthermore, data collected from various sources are very unlikely to represent a thermodynamically consistent set. Since incompleteness of the kinetic constants remains a major obstacle, methods for guessing unknown kinetic constants or adjusting the known values will become increasingly important.
Here, we present parameter balancing, an approach to infer complete and consistent sets of model parameters from incomplete, inconsistent kinetic data. This is only possible due to mutual dependencies between the kinetic constants and other model parameters, arising from their definitions or from thermodynamic laws (Wegscheider conditions(15) and Haldane relationships). In a simple approach, incomplete kinetic data sets could be complemented by inserting all available values into the model and adding other quantities that can be directly computed from them. However, this might leave parameters undetermined and would not eliminate inconsistencies between the original data values. As a better strategy, we determine parameter sets that are consistent by construction and resemble the original data as closely as possible. Since these values may not be uniquely determined, we have to restrict them to plausible ranges and quantify the remaining uncertainties by error bars. We have previously suggested(16) implementing this parameter balancing approach in a Bayesian statistical framework(17) and based on standard rate laws.(9) The critical step is to identify a set of independent model quantities that can be freely selected during parameter fitting, sampling, or optimization and from which all remaining quantities can be easily computed. After establishing this dependence scheme for a wide range of kinetic constants and metabolic quantities, we obtain a relatively general and simple data integration method that can guarantee consistent parameter sets. Here, we use more general formulas and present an interactive online workflow for models in SBML format (Systems Biology Markup Language,(18)www.sbml.org), then illustrate its use with an example case.
Parameter balancing exploits the fact that kinetic model parameters often share mutual dependencies, either by their definition or because of thermodynamic constraints.(16) For kinetic models with modular rate laws(9) and many other rate laws, the model parameters can be split into two subsets: a set of independent basis quantities that can be chosen arbitrarily and a set of derived quantities that can be computed from linear combinations of the basis quantities. As a simple example, let us consider the concentration and the amount of a substance within a cellular compartment. The amount a can be computed from the concentration c and the compartment volume V by the formula a = cV. If we choose (e.g., guess, sample, fit, or optimize) all three model parameters independently, this dependence will be violated. Of course, this problem is easy to avoid: we just need to treat a as a derived quantity to be computed from the basis quantities c and V. On a logarithmic scale, we obtain the simple linear dependence scheme ln a = ln c + ln V. In parameter balancing, we do exactly the same thing, but treat all quantities of a (possibly large) kinetic model at the same time. In addition, we consider not only dependencies arising from quantity definitions, as in this simple example, but also more involved dependencies arising from the laws of thermodynamics. The resulting dependence scheme consists of many linear equations, emerging from a thermodynamic and kinetic analysis of the rate laws. For practical calculations, these equations are represented by a large, sparse matrix to be constructed from the metabolic network.
Parameter balancing exploits this partition of the parameter set into basis and derived quantities. Initially, the basis quantities are estimated from data by a linear regression model that follows directly from the dependence scheme. Following this, the dependence scheme is used again to determine the dependent model quantities. The estimation is based on Bayesian statistics, combining the data with prior distributions for all model parameters, which can be selected to incorporate general knowledge about plausible values. Accordingly, parameter balancing yields not only point estimates but also posterior distributions, from which mean values, variances, correlations, and even random samples of all relevant quantities can be obtained. A detailed description is given in the Supporting Information. On the basis of our practical experience, we have extended the original parameter balancing approach in three major ways:
(1) Our model quantities comprise not only kinetic constants, but possibly also metabolite and enzyme concentrations for one or more metabolic states. From their values, we can derive other state-dependent quantities; in particular, the chemical potentials and reaction affinities. The reaction affinities describe how strongly chemical reactions deviate from their equilibrium and predetermine the reaction directions.
(2) In biochemical reactions, individual protonation states of a molecule (“chemical species”) are usually subsumed in a single “reactant” concentration. As pointed out by Alberty,14,19,20 these reactants should be described by transformed thermodynamic quantities, which effectively depend on the pH value and on salt concentrations. In our approach, quantities such as equilibrium constants, Gibbs free energies, and chemical potentials are given as transformed values, depending both on temperature and on pH. If experimental values stem from measurements under incompatible conditions, the discrepancies can be corrected during parameter balancing.
(3) Although prior distributions can help to define plausible ranges for the basis quantities, they are not applicable to the derived quantities. As a consequence, whenever few data on these quantities are available, their large uncertainties lead to unreasonable balancing results. We address this problem by augmenting the experimental data set with fictitious pseudo values, playing a role similar to the prior distributions. They allow the modeler to control the variance of the independent values and, thus, the reliability of the whole estimate.
A detailed description of the original method and all new features is given in the Supporting Information.
For practical use, we have implemented an interactive workflow for SBML models, allowing the user to balance the parameters and to replace or complete kinetic rate laws in the model. Three kinds of information are needed as an input: the network structure, which is obtained from the SBML file, the mathematical rate laws, which are chosen from the list of modular rate laws,(9) and a table with collected kinetic constants and other relevant data (for an example, see Figure Figure11 and Table Table1).1). In the model, enzymes and allosteric activators and inhibitors need to be listed as reaction modifiers and specified by SBO terms. The quantity types are defined as in the Systems Biology Ontology,(21) and the names of the reactions and species refer to IDs in the SBML model. All this helps the process to run in an almost fully automated manner. For further details, see the Supporting Information and our online documentation at www.semanticsbml.org.
At the end of the workflow, kinetic rate laws with a set of balanced parameters are inserted into the SBML model. As a result of balancing, most parameters are represented by normal distributions of their logarithmic values. When converting these values back to nonlogarithmic scale, we obtain log-normal distributions and need to carefully distinguish between their arithmetic and geometric mean values. To insert the most probable parameter set into a model, we choose the median values of the nonlogarithmic parameters, which are identical to the geometric mean. In addition, we can sample additional logarithmic parameter sets from the normal distribution, convert them back by applying the exponential function, and insert them into the model.
In the Supporting Information, we discuss a number of possible extensions, such as handling of identical species in different compartments, electrochemical potentials, cell compartments with different pH values, forward and reverse reaction rates, use of correlated priors, use of equilibrium constants as basis quantities, and prescribed reaction directions. We plan to include these features in a future version of the workflow.
We have implemented parameter balancing as an open source software written in Python. An interactive online version is accessible at www.semanticsbml.org. The Web site also contains the documentation and a number of example models, including the phosphofructokinase reaction discussed below. At the beginning of the workflow, the user uploads an SBML model,(18) defining the network structure (see Figure Figure1)1) as well as a formatted data table (see Table Table1)1) listing the known kinetic constants and other quantities relevant for the model. After uploading both files, the data can be filtered for a specific source organism, and missing standard errors are completed by default values. Then, a table of relevant model quantities is produced, with blank rows where data are missing and averaged values where more than one data point was available. The standard chemical potentials or equilibrium constants measured under different temperature or pH are not averaged, but kept as separate values. As a second source of information, we use prior distributions, describing our general expectation about the quantity types. Such priors can be derived from the typical ranges of kinetic constants found in the literature.
On the basis of the prior distributions and pseudo values chosen by the user, a set of model parameters is determined by parameter balancing (see Table Table2).2). For comparison, the user can also choose between several simpler methods for data completion: (i) completing all missing quantities by the prior means and widths (for missing basis quantities) or by pseudo values and their standard errors (for missing derived quantities), leading to complete, but inconsistent parameter sets; (ii) completing all missing basis quantities by prior values and computing all derived quantities by the dependence scheme; (iii) completing all missing derived quantities by pseudo values, followed by balancing without pseudo values; or (iv) balancing without pseudo values. Afterward, the user can choose a rate law from the list of modular rate laws(9) that will be inserted into the SBML file. The parameter values represent either median values or random samples from the posterior distribution. Finally, the balanced quantities and the completed model are exported, respectively, as a data table and as a fully parametrized SBML file.
We have tested parameter balancing with medium-scale models of central metabolism: yeast glycolysis models by Teusink et al.(22) and Hynne et al.(23) and a model of metabolism in pancreatic beta cells.(24) The original models, the collected data, and the balancing results for all models can be found at www.semanticsbml.org in the online documentation. For simplicity, we consider here a small model of the phosphofructokinase reaction, a key step in upper glycolysis:
The enzyme phosphofructokinase (PFK), which transfers a phosphate group from ATP (adenosine triphosphate) to fructose 6-phosphate (see Figure Figure1),1), has been studied extensively, but the databases Brenda,(11) NIST,(13) and Sabio-RK(10) do not contain a complete kinetic data set for a reversible rate law. Therefore, we pretend that the kinetic law of PFK is unknown, apply parameter balancing to the pooled data from several organisms, and compare the results to parameters from published kinetic models for the Baker’s yeast Saccharomyces cerevisiae.22,25,26
Our SBML model for the PFK reaction, including MIRIAM-compliant annotations, was automatically constructed from the KEGG(27) reaction identifier R04779 with the tool semanticSBML(28) (accessible at www.semanticsbml.org). Then, we collected data from the databases Sabio-RK,(10) Brenda,(11) NIST,(13) and yeastGFP(29) and from publications by Nissler et al.,(30) Albe et al.,(31) and Alberty.(20) The preprocessed data are shown in Table Table1.1. For the concentrations of fructose 6-phosphate and fructose 1,6-bisphosphate, we inserted pseudo values (arithmetic mean values arising from geometric mean values 0.5 and 1 mM, respectively, with a broad standard deviation of 0.5 for the decadic logarithms). Some of the values were obtained by averaging over several data values measured in different organisms. The set of balanced parameters, obtained with the default workflow settings, is shown in Table Table22.
A comparison with values from the literature and existing models is shown in Tables Tables33 and and4.4. The maximal velocity of the phosphofructokinase, mainly determined by a broad prior on catalytic constants and by an experimentally derived count number of the enzyme molecules, remains within a sensible range of the values from the literature (0.006 mM/s). We find the equilibrium constant to be much lower (0.072) than the one estimated by Teusink et al.(22)(80), which is clearly due to the small input value (0.08) from Nissler et al.(30) Finally, the balanced inhibition constant for ATP (0.376 mM) lies within a sensible range of the one found in Sabio-RK(10) (1.09 mM), but is nonetheless a little lower, since the input data value (0.396 mM) is lower, as well.
The equilibrium constants play a central role in parameter balancing and their numerical values primarily arise from measured values and from known standard chemical potentials. To incorporate their dependence on pH values, we reran the balancing with input data containing equilibrium constants measured at different temperatures and pH values (see Table Table5).5). Instead of just averaging over them (as for other duplicate values), parameter balancing can adjust the values to a certain target temperature and pH value chosen by the user. The aim is, of course, to describe in vivo conditions considered in the model. When choosing a target temperature of 300 K and a target pH value of 7, we obtained a balanced equilibrium constant keq = 0.19397. Since the input measurement conditions (average pH, 7.7; average temperature, 303.82 K) differ from the desired conditions (pH, 7; temperature, 300 K), the input value for the equilibrium constant (keq = 0.02923) changes significantly after parameter balancing. When we choose target conditions closer to the data measurement conditions (pH, 7.7; temperature, 304 K), the balanced value keq = 0.19393 moves slightly closer to the data value.
Finally, we tested what would happen without any direct information on equilibrium constants. When we remove all data and pseudo values for the equilibrium constant, we obtain abnormally high values for the majority of the derived parameters. Even though equilibrium constants can in principle be obtained from standard chemical potentials, this result indicates that the remaining uncertainty may be extremely large, and pseudo values are an efficient way to limit the ranges of balanced values.
Parameter balancing can also be applied to models of bigger size. Due to the large amounts of data that are produced, the result tables are not included in this publication. Instead, the detailed kinetic data for the models of Teusink et al.(22) (17 reactions, computing time 0.29 s), Hynne et al.(23) (24 reactions, computing time 0.50 s), and Jiang et al.(24) (45 reactions, computing time 2.32 s) can be found, downloaded, and used on our Web site www.semanticsbml.org. Nevertheless, the rising resource demands of bigger models can be a limiting factor. In the Supporting Information, we discuss several possibilities to break down the calculations into manageable pieces.
The example of the phosphofructokinase reaction shows that parameter balancing can produce consistent estimates on kinetic constants in plausible ranges and close to available input data. In the spirit of Bayesian statistics, we do not estimate the unknowns by averaging over the data, but by fitting the data with a statistical model; in our case, produced from the dependence scheme. One advantage is that such a model can handle not only uncertainties of individual quantities, but also correlated uncertainties arising from parameter dependencies. Of course, this approach strongly depends on the collected input data and on the chosen prior distributions. In situations when input data are missing, we found that pseudo constants can significantly improve the results. As a side effect, balancing will shift all the values—even if data are available for a certain quantity—toward the center of the prior distribution and toward the pseudo values, as can be seen in the example shown in Tables Tables33 and and4.4. If this effect is unwanted, there are two possibilities to avoid it: on one hand, experimental values can be fixed by assigning small standard errors to them; on the other hand, our workflow also provides the possibility to replace unknown values by prior or pseudo values without further balancing. This, however, will lead to inconsistent parameter sets.
The median posterior values obtained from balancing can be used as point estimates, but we can also sample parameter sets from the posterior distribution. Studying models with sampled parameters can be an emergency solution if few data are available, but it is also a convenient way to explore the dynamics in metabolic networks for a wide range of kinetic parameters and to discern the influences of network structure and kinetics on the dynamic behavior.
Parameter balancing is a general approach that could be applied to any physical model as long as all parameters fit into a linear dependence scheme. For kinetic models, we could establish linear dependencies for most relevant quantities, considering some of them on logarithmic scale. The choice of model parameters to be balanced is not fixed, but can depend on the specific situation (for a variety of possible variants, see the Supporting Information). In the scheme presented here, the equilibrium constants are derived from standard chemical potentials, which allows us to incorporate data or predictions of Gibbs free energies of formation; a general alternative, with even fewer parameters, would be to express all equilibrium constants by a set of independent equilibrium constants (see the Supporting Information). However, since model identifiability is guaranteed by the usage of prior distributions, keeping the number of parameters small is usually not crucial.
Some of the constants (for instance, the inhibition constants) are independent of all other parameters and could therefore be balanced separately, but as long as the parameter set is not too large, it turned out to be practical to account for all kinetic constants and metabolic data in one large dependence matrix. The reaction rates would be the most important target but do not fit into the scheme. In the future, their signs and the individual forward and backward rates could be used as input data. Once a subset of the flux directions have been predefined, available data on metabolite levels can directly contribute to the balancing of kinetic constants and thereby improve the estimation results.
Arguably, the most critical point in our approach is the use of heterogeneous data from different sources. Ideally, all kinetic constants used in a model should stem from measurements under the same, standardized, nearly in vivo conditions. Since such data are rarely available, modelers often need to utilize heterogeneous data collected from literature and databases, acknowledging that this may cause various problems. First, kinetic constants measured in vitro and in vivo differ due to different pH values, temperatures, salt concentrations, or other factors. Although we attempt to take these conditions into account, it may be difficult to apply corrections, since the exact conditions in living cells are not known. Second, most kinetic models neglect the complexity of the cell (e.g., molecular crowding, channeling, inhomogeneous localization of enzymes). Since the kinetic parameters in such models describe an effective behavior (e.g., averaged over different cell compartments), they will differ from the values measured in vitro. Third, there are different conventions for reaction formulas (e.g., H+ ions and H2O molecules may or may not be listed) and about the standard concentration c0 (in our case, 1 mM). Since the definition of transformed equilibrium constants crucially depends on the reaction formulas, it is important that model and data sources use the same, appropriate conventions. Finally, if kinetic constants are taken from existing models, their values may become invalid out of this specific context. Thus, especially for poorly identifiable parameters, it is important to consider the uncertainties in the original estimations.
Facing these difficulties, parameter balancing follows a pragmatic approach: even if data have a low quality, we may still use them as clues about unknown parameters. This is why uncertainty ranges play a central role here. If a quantity has been measured under the wrong conditions, we may account for this by increasing its standard error. On one hand, this will decrease the weight of this data point in the balancing process; on the other hand, the uncertainty of all related balanced parameters will increase, reflecting our precautionary approach.
In some cases, we may need to assign very small uncertainties to some of the input data. For instance, if an SBML model contains kinetic laws and we intend to replace only some of them, we have to make sure that the new rate laws are compatible with those pre-existing rate laws that will remain in the model. As a key precondition, all equilibrium constants in the resulting model need to satisfy the Wegscheider conditions.(15) To ensure this, one can determine the equilibrium constants of the existing rate laws and use them as input data (with zero standard error) in parameter balancing.
Due to the large uncertainties in standard chemical potentials and metabolite concentrations, it is unlikely that models obtained from parameter balancing will directly show realistic stationary flux distributions. In the future, this could be enforced by a subsequent fit to “omics” data(16) or by predefining the signs of reaction affinities. These signs will induce dependencies between kinetic and metabolic quantities; for instance, between the standard chemical potentials and the metabolite concentrations. If several metabolic states are treated within the same dependence scheme, the resulting method would resemble network-embedded thermodynamic analysis(32) and allow to use the results of previous fluxome and metabolome analysis (by flux-balance methods including thermodynamic constraints32−35) as input data for parameter balancing.
Parameter balancing yields consistent and complete parameter sets for kinetic models, as a potential starting point for further modeling. It can integrate incomplete and contradictory input data and respects constraints implied by common modeling assumptions (standard rate laws; reactants in ideal, dilute solution). If few input data are available, parameter balancing can help to quantify the remaining uncertainties, whereas with more and higher-quality data, the predictions become more trustworthy and precise. The resulting posterior distribution can be used to define parameter ranges, to sample possible parameter sets, or to be reused as a prior for following rounds of parameter balancing.
This work was supported by the European Commission (BaSysBio, Grant no. LSHG-CT-2006-037469), the International Max Planck Research School (IMPRS-CBSC), and the German Research Foundation (CRC 618). We thank our anonymous reviewer for his/her very helpful comments.
Parameter balancing: method and formulas. The material is available free of charge via the Internet at http://pubs.acs.org.