One of the goals of systems biology is the construction of computational models that can accurately predict the response of a biological system to novel stimuli.
1-3 Such models serve to encapsulate our current understanding of biological systems, can indicate gaps in that understanding, and have the potential to provide a basis for the rational design of experiments,
4,5 clinical interventions,
6,7 and synthetic biological systems.
8 There are many varieties of computational models ranging from abstracted data-driven models to highly detailed molecular-mechanics ones. In this report we focus on the popular class of ordinary differential equation (ODE) models
9-16 typically used to describe systems at the biochemical and pharmacokinetic level but which are also appropriate at more abstract levels. Constructing an ODE model consists of writing kinetic rate equations that describe the time rate of change of the various chemical species (representing the model topology), and determining the unknown parameters in those equations (typically rate constants and initial concentrations). Unknown parameters are estimated from a variety of data that often includes time-course measurements of concentration or activity. In this study, we have focused on the estimation of parameters, which is often referred to as model calibration. Using computational modeling and experimental design methodology, we have found that the selection of a set of experiments whose members provide complementary information can lead to efficient model calibration.
It should be noted that the problem of model calibration is different from model construction, where increasing numbers of parameters can be used to improve the fit to any given set of measurements, although parameter uncertainty may remain large. There is a considerable body of work focused on the problem of model complexity as it relates to parameter uncertainty.
17-20 In general these methods attempt to balance the ability of a more complex model to reduce fitting errors against the increased likelihood that a more complicated model will be able to fit the data by chance. Here we fix the model structure and number of parameters and vary only the measurements taken to develop a strategy for fitting the constant number of parameters with as little uncertainty as possible.
A detailed treatment of the theory for the current study is presented in the Theory section. Here we outline that theory. The quality of fit between measurements and a model can be expressed as the weighted sum-of-squares of the disagreement between them, which is a chi-squared (χ2) metric. Finding parameter values for a fixed model topology that minimize χ2 gives the best-fit parameter values, but because of measurement uncertainty, different sets of parameter values may be consistent with any given set of measurements. A common approximation of this parameter uncertainty is to expand χ2 as a function of the parameters and truncate after the second-order term. The linear term vanishes because the first derivative of χ2 with respect to the parameters is zero when the expansion is carried out about the best-fit parameter values. χ2 is thus approximated as a constant plus the second-order terms involving the second derivative of the χ2 quality of fit with respect to the parameters, known as the Hessian. A given amount of measurement uncertainty leads to an ellipsoid-shaped envelope of constant χ2 in an appropriately scaled parameter space. Sets of parameters within the envelope are consistent with the measurements and their associated uncertainty.
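As a compact restatement of the expansion described above, the following uses notation assumed here for illustration (θ for the scaled parameter vector, θ* for the best-fit values, H for the Hessian); the Theory section remains the authoritative treatment.

```latex
\chi^2(\theta) \;\approx\; \chi^2(\theta^*)
  + \tfrac{1}{2}\,\Delta\theta^{\mathsf{T}} H \,\Delta\theta,
\qquad
\Delta\theta = \theta - \theta^*,
\qquad
H_{jk} = \left.\frac{\partial^2 \chi^2}{\partial\theta_j\,\partial\theta_k}\right|_{\theta=\theta^*}

% Envelope of constant chi-squared at a chosen increase \Delta\chi^2 above the minimum:
\Delta\theta^{\mathsf{T}} H \,\Delta\theta = 2\,\Delta\chi^2
```

The first-order term is absent because the gradient of χ2 is zero at θ*. When H is positive definite, the second relation describes an ellipsoid centered on the best-fit parameters.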
Longer axes of the ellipsoid correspond to parameter combinations of greater uncertainty (i.e., that are less well determined by the measurements), whereas shorter ellipsoidal axes correspond to parameter combinations of less uncertainty. The mathematics is such that the axis directions of the ellipsoid are given by the eigenvectors (νi) of the Hessian, and their associated uncertainty is given by the reciprocal of the square root of the corresponding eigenvalues (λi^(−1/2)). Thus, a set of parameters is well determined by a collection of measurements when the eigenvalues of the corresponding Hessian are all sufficiently large that they correspond to small relative parameter uncertainty.
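The relationship between the Hessian and the error ellipsoid can be illustrated with a minimal sketch. The function name `uncertainty_axes` and the 2×2 matrix below are hypothetical, chosen only for illustration, and the sketch assumes a symmetric, positive-definite Hessian.

```python
import numpy as np

def uncertainty_axes(hessian):
    """Return eigenvalues, eigenvectors, and relative axis lengths.

    Assumes `hessian` is symmetric and positive definite. Axis lengths are
    proportional to lambda**(-1/2); the overall scale depends on the chosen
    chi-squared threshold and is omitted here.
    """
    # eigh is for symmetric matrices and returns eigenvalues in ascending order,
    # with the matching eigenvectors as columns.
    eigvals, eigvecs = np.linalg.eigh(hessian)
    axis_lengths = 1.0 / np.sqrt(eigvals)
    return eigvals, eigvecs, axis_lengths

# Hypothetical Hessian: one well-determined and one poorly determined direction.
H = np.array([[100.0, 0.0],
              [0.0, 1.0]])
eigvals, eigvecs, axis_lengths = uncertainty_axes(H)
```

In this toy case the direction with the smaller eigenvalue has an axis ten times longer than the other, which is what a hundredfold eigenvalue ratio implies under the λ^(−1/2) scaling.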
Recently Gutenkunst
et al.21 examined parameter uncertainty for 17 models in the EMBL BioModels Database.
22 In their study, the authors assumed noise-free measurements of every model species sampled continuously in time. The study found that the eigenvalues of the Hessian spanned a large range (> 10^6). From this they suggested that, while it may be possible to estimate some parameters from system-wide data, in practice it would be difficult or impossible to estimate most of the parameters even from an unrealistically high-quality data set.
21,23,24 Moreover, they pointed out that due to the high eccentricity and skewness of uncertainty ellipses in parameter space, system-wide data can define system behavior better than independent measurements of each parameter and may also produce better predictions in some circumstances.
Here we extend the previous work by more fully considering the effect of experimental perturbations on the parameter estimation problem and use experimental design to probe for particularly effective perturbation experiments. The
χ2 goodness-of-fit metric depends on both the model and the set of experimental conditions. Some experiments may be more helpful in calibrating the model than others. In the current work we use effectively continuous-time data, but many experiments require the selection of discrete time points at which measurements are taken.
25,26 It is well established in the systems biology literature that optimal experimental design can have an impact on the parameter estimation problem for a single experiment.
23,25,27-29 For example, work by Faller
et al. has shown for a small model of a mitogen-activated protein kinase (MAPK) cascade that the application of time-varying stimulation significantly improved the parameter estimation problem.
29 Essentially this corresponds to finding the time-varying input signal that gives the best-shaped error ellipsoid.
In this work, we apply a related approach and examine the extent to which multiple complementary experiments can be combined to improve the overall parameter estimation problem. The figure shows the result of combining data from two separate experiments. The parameter estimates from the individual data sets (blue and red ellipses) tightly constrain one parameter direction and weakly constrain the other. In one case, the weakly constrained parameter directions are very similar, so the parameter estimates from the combined data set (green ellipse) are about the same as the estimates from the individual experiments; by contrast, in the other case the experiments are complementary and together dramatically constrain the parameter estimates.
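The effect of combining experiments can be sketched numerically. The sketch below assumes independent measurement noise, so that the χ2 of the combined data set is the sum of the individual χ2 values and the Hessians therefore add. The helper names and the eigenvalues 100 and 0.01 are hypothetical and are not taken from the model studied here.

```python
import numpy as np

def hessian_from_axes(angle, strong=100.0, weak=0.01):
    """Build a 2x2 Hessian whose well-constrained direction lies at `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s],
                         [s, c]])
    return rotation @ np.diag([strong, weak]) @ rotation.T

def worst_axis_length(hessian):
    """Relative length of the longest ellipsoid axis, lambda_min**(-1/2)."""
    return 1.0 / np.sqrt(np.linalg.eigvalsh(hessian)[0])

H_a = hessian_from_axes(0.0)
H_redundant = hessian_from_axes(0.0)                   # same weak direction as H_a
H_complementary = hessian_from_axes(np.deg2rad(90.0))  # weak direction rotated by 90 degrees

single = worst_axis_length(H_a)
redundant = worst_axis_length(H_a + H_redundant)
complementary = worst_axis_length(H_a + H_complementary)
```

With these illustrative numbers, repeating an experiment with the same weak direction shrinks the longest axis only by a factor of √2, whereas adding the complementary experiment shrinks it roughly a hundredfold.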
Because complementary experiments can constrain the parameter space, we have developed an approach to identify sets of complementary experiments that minimize parameter uncertainty and tested it in a pathway model of signaling in response to EGF and NGF.
30 We have selected this model so that our results may be directly compared to the previously published analysis of this model performed by Gutenkunst
et al.21 For consistency, where possible we have used their methods and formalisms. In selecting sets of complementary experiments, we have explored a palette of candidate experiments consisting of overexpression or knockdown of single and multiple genes combined with different doses of EGF and NGF, either alone or in combination.
Computational experimental design methods determined all 48 free parameters to within 10% of their value using just five complementary experiments. Selection of complementary experiments was essential, as the same level of model calibration could not be achieved with arbitrary experiments or even with a larger number of “highly informative” experiments. Moreover, we argue that predictions that are sensitive to information complementary to that used to parameterize a model could be significantly in error. Experimental design methods can provide sufficient coverage for all parameter directions and thus guide model calibration for a given topology to maximize predictive accuracy. As systems biology models are applied to target identification and clinical trial design, the use of experimental design approaches to improve model prediction quality could be of crucial importance.
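One simple way to search for complementary experiments is a greedy selection over candidate Hessians. The sketch below is an illustrative assumption and not necessarily the procedure used in this study: it uses a log-determinant (D-optimality) score, a small ridge term to keep the score finite for singular matrices, and hypothetical 3-parameter candidate Hessians. All function names are invented for the example.

```python
import numpy as np

def log_det(matrix, ridge=1e-9):
    """Log-determinant with a small ridge so singular matrices stay finite."""
    _, value = np.linalg.slogdet(matrix + ridge * np.eye(matrix.shape[0]))
    return value

def greedy_select(candidate_hessians, max_experiments, target_uncertainty):
    """Greedily add the candidate that most increases the log-determinant.

    Stops once the longest ellipsoid axis, lambda_min**(-1/2), falls to or
    below `target_uncertainty`, or when the experiment budget is used up.
    """
    n = candidate_hessians[0].shape[0]
    total = np.zeros((n, n))
    chosen = []
    remaining = list(range(len(candidate_hessians)))
    for _ in range(max_experiments):
        if not remaining:
            break
        best = max(remaining, key=lambda i: log_det(total + candidate_hessians[i]))
        chosen.append(best)
        remaining.remove(best)
        total = total + candidate_hessians[best]
        min_eig = np.linalg.eigvalsh(total)[0]
        if min_eig > 0 and 1.0 / np.sqrt(min_eig) <= target_uncertainty:
            break
    return chosen, total

# Hypothetical candidates: 0 and 1 constrain the same direction, 2 and 3 constrain others.
candidates = [np.diag([400.0, 1.0, 1.0]),
              np.diag([500.0, 1.0, 1.0]),
              np.diag([1.0, 400.0, 1.0]),
              np.diag([1.0, 1.0, 300.0])]
chosen, total = greedy_select(candidates, max_experiments=4, target_uncertainty=0.1)
worst = 1.0 / np.sqrt(np.linalg.eigvalsh(total)[0])
```

In this toy palette the two individually most informative candidates (0 and 1) constrain the same direction, so the greedy search picks only one of them and then turns to the candidates that cover the remaining directions.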
Previous work on the model calibration problem has focused on optimization within the scope of a single experiment.
31,32 Examples include selecting optimal time points,
33,34 species,
35-38 or stimulus conditions
5,39,40 that would be most effective in reducing parameter uncertainty. However, even highly optimized single experiments are generally insufficient for model calibration. For this reason, such methods have largely been applied to smaller scale problems. The current work is different in spirit in that it addresses the question of how improved model calibration might result from combinations of experiments that could collectively define all of the parameters. By design, the individual experiments may be easier to implement, yet relatively small combinations of simple experiments can determine all parameters in a medium-sized pathway model.