Dynamic models in systems biology as well as in other fields become increasingly complex as more detailed knowledge is incorporated. The massive presence of nonlinear relationships between their high-dimensional parameter- and solution- spaces is a key characteristic of such systems. Moreover, dynamic models typically generate multidimensional blocks of temporal data. Clearly it is very challenging to obtain a comprehensive overview of the behavioural repertoires of such models across the high-dimensional input parameter space, including the sensitivity of the model output to changes in the various input parameters, as well as interactions between input parameters and correlation patterns between model outputs. For dynamic model construction and validation, sound handling of such information is crucial. Since most of the existing methods for parameter estimation and sensitivity analysis are appropriate only for systems of relatively low output dimensionality and typically focus on one output variable at a time 1
, a generic methodology for analysis of model behaviour that is able to handle the entire range of model complexities and give a comprehensive overview of the relationships between the input parameters and all model outputs, is sorely needed.
Statistical approaches are gaining acceptance as a means for analysis of input-output relationships of complex dynamic models 2
, and statistical emulation of dynamic models (metamodelling 11
) has been demonstrated to be a useful tool both for speeding up computations 12
and as a basis for sensitivity analysis 2
and uncertainty assessment 14
. Multi-way (N-way) methods have previously been shown to be effective for data integration in e.g. systems biology 17
. We therefore hypothesise that N-way approaches will be especially advantageous for metamodelling of dynamic models due to the capability of integrating temporal data from several output state variables simultaneously while retaining the information about which state trajectory that corresponds to which state variable throughout the analysis (with 2-way methods, this information is lost when concatenating the trajectories for the different state variables prior to the analysis). Consequently, a more detailed exploration of the temporal dimension of dynamic models is possible. This is important in order to obtain a comprehensive overview of how variation in the input parameters is manifested in the model output. Moreover, methods utilising several model outputs simultaneously have already been demonstrated to reduce the model sloppiness by imposing more constraints on the system 5
The N-way Hierarchical Cluster-based Partial Least Squares Regression (N-way HC-PLSR) presented here is designed for efficient handling of block-wise nonlinear
data structures, and works by combining several regional N-way Partial Least Squares Regression (NPLSR) 19
models within which the mappings between input parameters and output state variables can be more adequately represented than in a global NPLSR model. The nonlinear capabilities of this metamodelling approach are obtained by combining clustering and generation of local linear metamodels for the various cluster regions. This is an N-way extension of our previously published method Hierarchical Cluster-based Partial Least Squares Regression (HC-PLSR) 8
. HC-PLSR is based on separate (2-way) PLSR 20
analyses of distinct regions of the parameter space (where the resulting regression coefficients are measures of the model sensitivity to the different input parameters), while in N-way HC-PLSR, the separated regions are defined according to the dynamic behaviour of the output state variables and the output from the dynamic model is represented as an N-way array (in our example the number of ways or modes N=3; observations×state variables×time points of the state trajectories (see Figure )). A common metamodel based on all state variable trajectories can thereby be generated. This allows simultaneous analysis of nonlinear relationships between all model outputs and input parameters of complex dynamic models, in a low-dimensional subspace spanned by estimated latent variables (called NPLSR factors). The NPLSR factors represent the features that are most important for the covariance between the inputs and outputs (see Additional file 1
: Section S1 for a description of the NPLSR methodology). The method is therefore suited for visualising covariance structures both within and between the input parameters and the model outputs, and thereby also useful for finding and removing possible redundancies (leading to model reduction) and prioritizing experiments for e.g. model validation by identification of key inputs and outputs describing the system behaviour. Hence, this can also guide experimentalists in the choice of quantities to measure when studying a biological system. Moreover, the NPLSR approach provides considerable dimension reduction possibilities through projection of the data into a low-dimensional subspace. In our opinion, this methodology should be considered as particularly useful in future multivariate metamodelling for analysis of complex spatiotemporal models. In N-way metamodelling, the three spatial dimensions can be included as separate modes in the N-way analysis, in addition to the temporal dimension on which we focus in this paper. The spatial structure of the data can thereby be kept throughout the analysis. We hypothesise that this will be a great advantage in the analysis of spatiotemporal models.
Figure 1 Illustration of the N-way data structure used in N-way HC-PLSR. Illustration of the data structure used in N-way metamodelling. Here the number of modes (ways) N=3, where the first mode is the different simulations carried out using (more ...)
Traditionally, metamodelling is carried out in the causal direction, predicting model outputs as functions of the input parameters using e.g. regression methods. Application of metamodelling in the reverse direction is, however, also of potential interest 5
. The two modelling directions can be understood as extensions of the classical/inverse calibration modelling 22
. Accordingly, we refer to the causal direction as classical metamodelling
, and the reverse direction as inverse metamodelling
, and in our application of the N-way HC-PLSR we demonstrate how their combination provides more detailed insight into the complexity of the mapping between input parameters and model outputs. Inverse metamodelling may also facilitate fitting of nonlinear models to large amounts of experimental data. Given that the results from the computations can be substituted with relevant experimentally measured data or quantities calculated from measurements, these can be used to predict corresponding parameters. Moreover, combinations of classical and inverse metamodelling can identify the key metrics to measure in order to validate the models. A more comprehensive introduction to the multivariate metamodelling methodology is given in Additional file 1
: Section S1.
As long as they handle high-dimensional data with nonlinear relationships and yield interpretable representations, a wide variety of statistical methods can be effectively used for multivariate metamodelling. We have recently shown that multivariate metamodelling based on PLSR and our nonlinear extension HC-PLSR 8
provides good approximations of the input-output mappings 8
as well as informative insight into complex interaction patterns between parameters 9
of advanced nonlinear dynamic models. PLSR can use multiple response variables simultaneously and utilise inter-correlations between them for model stabilisation. PLSR analysis has been shown to effectively reveal covariation patterns in large and complex data sets, and extract correlations between possibly noisy and partially redundant input variables and outputs 6
. The success of PLSR in the context of sensitivity analysis and for constraining input parameter values from dynamic model outputs has also been demonstrated by Sobie et al. 5
. Highly nonlinear input-output structures may, however, be difficult to model adequately with linear models such as PLSR, even with polynomial extensions. To confront these problems, HC-PLSR was introduced 8
. Heterogeneity in model sensitivity to certain parameters between various regions in the parameter space of a dynamic model of the mouse ventricular myocyte was identified by HC-PLSR-based sensitivity analysis in 9
. Similarly, zooming into different regions of the state variable behavioural domain provides the opportunity to identify regions where the relationship between certain parameters and the model output is less ambiguous, indicating that these parameters are especially important for defining a specific type of temporal model behaviour. In cases where variation in the input parameters can be directly related to genotypic variations, this may provide valuable information about how a specific genotype can be of particular importance for the manifestation of certain phenotypic characteristics.
Here, we combine three different aspects of multivariate metamodelling: 1) Description of highly nonlinear input-output relationships by regional metamodelling, 2) NPLSR, allowing a retention of a tensor data structure throughout the analysis and 3) Inverse metamodelling in addition to the classical approach, providing more confident conclusions and a more comprehensive model overview. Moreover, particularly complex details are pursued by more detailed metamodelling of individual outputs and their relationships to the varied input parameters. Altogether, this provides a powerful, robust and efficient approach to exploration of the behavioural repertoire of complex dynamic models.
We illustrate our methodology by an application to a complex dynamic model of the mammalian circadian clock developed by Leloup and Goldbeter 24
, which is a well-established and validated model. Models of circadian rhythms have e.g. been used for identifying mechanisms of chronotolerance and chronoefficacy for anticancer drugs 25
. The dynamic model we analyse describes circadian oscillations of cellular activity in conditions of continuous darkness, and consists of 16 coupled ordinary differential equations (ODEs) describing the dynamics of three genes through intertwined positive and negative feedback loops. By combining the classical and inverse approaches of the N-way HC-PLSR, we capture several interesting parts of the present complex input-output relationships, which are difficult to deduce directly from the model’s differential equations.