Existing approaches for prediction confidence intervals, such as MCMC or bootstrap procedures, are based on forward evaluations of the model for many parameter values. This works reasonably well if the parameter space is low-dimensional and the target density function, i.e. the density over the parameter space to be sampled, is well-behaved. However, sampling nonlinear high-dimensional spaces densely is impractical, and it is almost impossible to ensure that the samples cover all prediction scenarios. Especially in biological applications, the target distributions frequently exhibit strong nonlinear functional relations. In the case of non-identifiability, the parameter space to be sampled is not restricted, which makes convergence nearly impossible.
In this paper, we present the opposite approach. The model prediction space is sampled directly, and the corresponding model parameters are determined by constrained maximum likelihood estimation to check the agreement of the predictions with the data. This concept yields the prediction profile likelihood, which propagates the uncertainty from the experimental data to the predictions.
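The procedure can be sketched on a toy problem. The following is a minimal illustration under stated assumptions, not the implementation used in this work: the exponential decay model, the data values, the noise level `sigma`, the prediction time `t_pred` and all function names are hypothetical, the noise is assumed to be Gaussian with known standard deviation, and SciPy's SLSQP solver stands in for any constrained optimizer.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data for y(t) = A * exp(-k * t) with known noise level.
t_obs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_obs = np.array([10.3, 5.8, 3.9, 2.1, 1.5])
sigma = 0.3
t_pred = 6.0  # prediction condition outside the measured time range


def model(theta, t):
    A, k = theta
    return A * np.exp(-k * t)


def chi2(theta):
    # -2 log-likelihood up to a constant, assuming Gaussian noise.
    return np.sum(((y_obs - model(theta, t_obs)) / sigma) ** 2)


def predict(theta):
    # The prediction z is the model response at t_pred.
    return model(theta, t_pred)


# Unconstrained maximum likelihood fit.
fit = minimize(chi2, x0=[8.0, 0.3])
theta_hat, chi2_min = fit.x, fit.fun
z_hat = predict(theta_hat)


def prediction_profile(z):
    # Best agreement with the data among all parameters that predict exactly z.
    constraint = {"type": "eq", "fun": lambda theta: predict(theta) - z}
    res = minimize(chi2, x0=theta_hat, method="SLSQP", constraints=[constraint])
    return res.fun


# Sample the one-dimensional prediction space directly.
z_grid = np.linspace(0.5 * z_hat, 1.5 * z_hat, 41)
profile = np.array([prediction_profile(z) for z in z_grid])

# Predictions consistent with the data at the 95% level
# (3.84 is the 95% quantile of the chi-square distribution with 1 degree of freedom).
inside = z_grid[profile - chi2_min <= 3.84]
pci = (inside.min(), inside.max())
print(f"prediction {z_hat:.3f}, interval on the grid [{pci[0]:.3f}, {pci[1]:.3f}]")
```

The scan runs along the single prediction dimension, and each grid point costs one constrained fit. The interval bounds are only as fine as the grid, so a root search on the threshold crossing would be needed for more precise limits.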
If a comprehensive prior, i.e. one for all parameters, were available, a Bayesian procedure such as MCMC, in which marginalization, i.e. integration over the nuisance dimensions, is feasible, could have superior performance. However, in cell biology applications, prior knowledge is very restricted because kinetic rates and concentrations depend strongly on the cell type and biological context, e.g. on the cellular environment and the biochemical state of a cell. Therefore, prior information is usually available for at most a few parameters. Such prior information can be incorporated in our procedure without restricting its applicability by generalizing maximum likelihood estimation to maximum a posteriori estimation, as discussed in the Additional file.
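As an illustration of this generalization, and not necessarily the formulation given in the Additional file, one common form assumes independent Gaussian priors on a subset P of the parameters. The symbols z (prediction value), g (prediction as a function of the parameters), π_j (prior density), μ_j and s_j (prior mean and standard deviation) are notation introduced for this sketch, and χ² denotes -2 log L up to a constant.

```latex
\begin{aligned}
\mathrm{PPL}(z) &= \max_{\theta \,:\, g(\theta) = z} L(y \mid \theta)
  && \text{(maximum likelihood)} \\
\mathrm{PPL}_{\mathrm{MAP}}(z) &= \max_{\theta \,:\, g(\theta) = z} L(y \mid \theta) \prod_{j \in P} \pi_j(\theta_j)
  && \text{(maximum a posteriori)} \\
-2 \log \mathrm{PPL}_{\mathrm{MAP}}(z) &= \min_{\theta \,:\, g(\theta) = z}
  \Big[ \chi^2(\theta) + \sum_{j \in P} \Big( \frac{\theta_j - \mu_j}{s_j} \Big)^2 \Big] + \text{const}
  && \text{(Gaussian priors)}
\end{aligned}
```

Parameters outside P contribute no penalty term, so in this form the constrained optimization stays applicable when priors exist for only a few parameters.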
In general, generating prediction confidence intervals given the uncertainty in a high-dimensional nonlinear parameter space requires large numerical effort. However, this complication primarily originates from the complexity of the problem itself rather than from the choice of method. In fact, the prediction profile likelihood approaches this aim very efficiently, because scanning the parameter space by constrained optimization to explore the data-consistent predictions is more efficient than sampling the parameter space without considering the predictions, as is done in MCMC. Instead of sampling a high-dimensional parameter space, only the prediction space has to be explored to calculate a prediction profile likelihood, i.e. the optimization of the parameters reduces the high-dimensional sampling problem to the exploration of a single dimension.
The prediction confidence regions introduced above have to be interpreted point-wise. This means that the confidence level α controls type 1 errors: it is the probability that the model response for the true parameters lies inside the prediction confidence interval for a single prediction condition, if many realizations of the experimental data and the corresponding prediction confidence intervals are considered.
In contrast, if a single data set is used to generate many prediction intervals, e.g. predictions for several points in time as performed above, the results are statistically dependent, i.e. the realization of the PCI at a neighboring time point is very similar and therefore correlated. Consequently, the prediction confidence intervals of a compound at two adjacent points in time very likely either both contain the true value or both miss it. In such an example, a common prediction confidence region for the two statistically dependent predictions would require a two-dimensional prediction profile likelihood. This topic, however, is beyond the scope of this article.
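The point-wise interpretation can be checked by simulation. The sketch below uses a hypothetical straight-line model with known Gaussian noise, chosen only because the likelihood-based interval then has the closed form of the estimate plus or minus 1.96 standard errors. All names and numbers are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical straight-line model y = a + b * t with known noise level.
t_obs = np.linspace(0.0, 4.0, 5)
X = np.column_stack([np.ones_like(t_obs), t_obs])
theta_true = np.array([1.0, 0.5])
sigma = 0.3

# Two adjacent prediction conditions.
t_pred = np.array([5.0, 5.5])
X_pred = np.column_stack([np.ones_like(t_pred), t_pred])
z_true = X_pred @ theta_true

# Half-width of the 95% interval for each prediction (exact for this linear model).
cov_theta = sigma**2 * np.linalg.inv(X.T @ X)
half_width = 1.959964 * np.sqrt(np.einsum("ij,jk,ik->i", X_pred, cov_theta, X_pred))

# Many realizations of the experimental data, each fitted by least squares.
n_rep = 4000
Y = X @ theta_true + sigma * rng.standard_normal((n_rep, t_obs.size))
theta_hat = np.linalg.lstsq(X, Y.T, rcond=None)[0].T  # shape (n_rep, 2)
z_hat = theta_hat @ X_pred.T                          # shape (n_rep, 2)

covered = np.abs(z_hat - z_true) <= half_width
pointwise = covered.mean(axis=0)    # coverage of each interval on its own
joint = covered.all(axis=1).mean()  # both intervals cover simultaneously
print("point-wise coverage:", pointwise, "joint coverage:", joint)
```

Each interval on its own should cover the true response in roughly 95% of the realizations, while the fraction of realizations in which both intervals cover cannot exceed either point-wise value. Because the two predictions are strongly correlated, the joint coverage is expected to lie well above the value that independent intervals would give.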
The prediction profile likelihood also provides a concept for experimental planning. Experimental conditions with a very narrow prediction confidence interval are very accurately specified by the available data. On the one hand, new measurements for such a condition do not provide much additional information for calibrating the model parameters, so from this point of view they are a poor choice for additional measurements. On the other hand, the model behavior under these conditions is predicted very precisely, which makes them a very powerful setting for validating the model structure. Conversely, large prediction confidence intervals indicate conditions which are weakly specified by the existing data and which therefore constitute informative experimental designs for better calibrating the model. Because design optimization on the basis of the prediction profile likelihood does not require any linearity approximation, unlike common experimental design techniques, e.g. those based on the Fisher information, the presented procedure is very valuable for ODE models, which are typically highly nonlinear.
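The selection rule described here can be sketched as a ranking of candidate conditions by interval width. The example is deliberately simplified: it uses a hypothetical straight-line model with known Gaussian noise so that the interval widths have a closed form, whereas for a nonlinear ODE model the widths would come from the prediction profile likelihood at each candidate condition. All names and numbers are assumptions.

```python
import numpy as np

# Hypothetical existing measurements for a straight-line model y = a + b * t.
t_obs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
sigma = 0.3
X = np.column_stack([np.ones_like(t_obs), t_obs])
cov_theta = sigma**2 * np.linalg.inv(X.T @ X)

# Candidate time points for the next experiment.
t_candidates = np.arange(0.0, 9.0, 1.0)
X_cand = np.column_stack([np.ones_like(t_candidates), t_candidates])

# Width of the 95% prediction confidence interval at each candidate.
pci_width = 2 * 1.959964 * np.sqrt(
    np.einsum("ij,jk,ik->i", X_cand, cov_theta, X_cand)
)

# Narrowest interval: precisely predicted, suited for validating the model.
t_validation = t_candidates[np.argmin(pci_width)]
# Widest interval: weakly specified by the data, suited for calibration.
t_calibration = t_candidates[np.argmax(pci_width)]
print("validate at t =", t_validation, "calibrate at t =", t_calibration)
```

In this toy setting the narrowest interval lies at the center of the existing measurements and the widest at the most distant extrapolation.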
Another potential of the prediction profile likelihood shown in this article is its interpretation in terms of observability. This term is commonly used in control theory to characterize whether the dynamics of unobserved variables can be inferred from the set of feasible experiments. The theory in this field is based on analytical calculations, i.e. the limited amount and the inaccuracy of the data are usually not considered. In this article, it has been shown that the prediction profile likelihood allows for a general data-based approach to check whether the given experimental design and realization of measurements contain enough information about unobserved dynamic states. Therefore, in analogy to the terminology of practical identifiability, we suggest the term practical observability for observability given a data set, i.e. for a restricted prediction confidence interval.
Finally, it should be noted that a prediction could be any function of the compounds and the parameters. In applications, for example, a ratio of two compound concentrations can be the characteristic of interest. In principle, integrals, peak positions and other functions of the dynamic states can also be considered as predictions, which could be targets for observability considerations as well as for the calculation of prediction and validation confidence intervals. This flexibility makes the prediction profile likelihood a promising concept for resolving one bottleneck in computer-aided simulations of complex systems: the generation of reliable confidence intervals for predictions.