|Home | About | Journals | Submit | Contact Us | Français|
To date, parameters defining biological properties in multiscale disease models are commonly obtained from a variety of sources. It is thus important to examine the influence of parameter perturbations on system behavior, rather than to limit the model to a specific set of parameters. Such sensitivity analysis can be used to investigate how changes in input parameters affect model outputs. However, multiscale cancer models require special attention because they generally take longer to run than does a series of signaling pathway analysis tasks. In this article, we propose a global sensitivity analysis method based on the integration of Monte Carlo, resampling, and analysis of variance. This method provides solutions to (1) how to render the large number of parameter variation combinations computationally manageable, and (2) how to effectively quantify the sampling distribution of the sensitivity index to address the inherent computational intensity issue. We exemplify the feasibility of this method using a two-dimensional molecular-microscopic agent-based model previously developed for simulating non-small cell lung cancer; in this model, an epidermal growth factor (EGF)-induced, EGF receptor-mediated signaling pathway was implemented at the molecular level. Here, the cross-scale effects of molecular parameters on two tumor growth evaluation measures, i.e., tumor volume and expansion rate, at the microscopic level are assessed. Analysis finds that ERK, a downstream molecule of the EGF receptor signaling pathway, has the most important impact on regulating both measures. The potential to apply this method to therapeutic target discovery is discussed.
Despite advances in cancer therapies such as molecular-targeted therapy (Sawyers, 2008), the clinical outcome of highly malignant tumors remains disappointing, with one in four deaths in the United States attributed to this disease (Jemal et al., 2010). In recent years, interdisciplinary approaches using data-driven mathematical and computational modeling have garnered much attention, gaining recognition for their potential to simulate, and analyze the complex behavior of cancer systems (Khalil and Hill, 2005). Moreover, tumor models across different biological scales, i.e., multiscale cancer models, have begun to play an increasingly important role in moving the field of integrative cancer systems biology toward clinical implementation (Sanga et al., 2007; Wang and Deisboeck, 2008). A multiscale approach recognizes the fact that cancer growth indeed spans multiple spatial scales (from nanometers to centimeters) as well as temporal scales (from milliseconds to years), and in some cases, it is more informative to incorporate this aspect into models than to remain at a scale-specific level.
A multiscale cancer model’s need to quantify parameters on, and relationships between, different biological scales significantly increases the complexity of model development. It is quite common that model parameters defining biological properties at different scales are not produced by a single laboratory, but instead are supplemented with those obtained from the literature or have to be estimated. Given the inherent uncertainties in their values, parameters should not be fixed when building a model, but should be assigned statistical distributions that reflect the degree of uncertainty. Hence, it is crucial to not only study the dynamical system behavior governed by a specific set of parameters, but also to further investigate the influence of their perturbations on the overall system behavior (Zi et al., 2005). Sensitivity analysis has been widely accepted as a useful tool for this purpose, especially when it is not possible or practical to conduct numerous wet-lab experiments (van Riel, 2006).
Sensitivity analysis methods fall into two categories: local sensitivity analysis (LSA) and global sensitivity analysis (GSA). LSA (one-at-a-time parameter variation) evaluates the slopes of system outputs with respect to each parameter at a particular point, resulting in a narrow coverage of the entire parameter space (i.e., all possible parameter variation combinations). GSA, on the other hand, addresses model behavior over a wide range of parameter operating conditions (Mishra et al., 2009). In complex biosystems, such as cancer, model parameter values may vary within a large range depending on the specific cell types and cellular environments. Hence, LSA can potentially be misleading. A number of GSA techniques have been developed, especially in the engineering field (Helton et al., 2006), but only recently have GSA methods been applied to systems biology models (Bentele et al., 2004; Zi et al., 2005; Zhang and Rundell, 2006; Marino et al., 2008; Sahle et al., 2008; Zhang et al., 2009). Specific examples are given in the following, and interested readers should refer to the corresponding original articles for more detail. Bentele et al. (2004) proposed a GSA approach by directly building on LSA to measure parameter sensitivities in a CD95-induced apoptotic pathway model. In their approach, a parameter’s global sensitivity index is quantified by calculating a weighted average of local sensitivity indices of the parameter at multiple random points within the parameter space. In Sahle et al. (2008), the authors presented a new approach that integrates the use of optimization techniques with sensitivity analysis to find the global optimum of the sensitivities of control coefficients in a glycolysis model. However, their approach may prove insufficient if the flux control coefficient (used to evaluate the control exerted by each reaction on the glycolytic flux) of a parameter has a very high value. A probabilistic variance-based GSA, based on Sobol’s method (Sobol, 1993), was developed to study a mitogen-activated protein kinase (MAPK) signaling model (Zhang et al., 2009), but the method is very computationally intensive because it requires a large number of system simulations in a Monte Carlo framework.
So far, most of the GSA methods proposed in the systems biology field still focus on signaling pathway analysis. We believe that providing an applicable GSA method for multiscale cancer models, and for multiscale systems biology models at large, is an important step for the assessment of the context-dependent relationship between different biological scales of interest. In this article, we present such a method for performing GSA, based on the integration of Monte Carlo and resampling methods as well as analysis of variance (ANOVA). ANOVA is a model-independent method with no assumption regarding the functional relationships between input and output (Frey and Patil, 2002), a feature which is of particular interest because, in most cases of model analysis, such relationships are not known a priori or cannot be quantified mathematically. Although ANOVA has been used as a sensitivity analysis method in other scientific disciplines, such as food quality (Pet-Armacost et al., 1999; Gerken et al., 2000), material testing (Golinkin et al., 1997), and health risk assessment (Ashraf et al., 1999; Mokhtari and Frey, 2005), to our knowledge this is the first such systematic application to the theoretical modeling of cancer. We exemplify the feasibility of the method using a two-dimensional (2D) multiscale agent-based model previously developed for simulating non-small cell lung cancer (NSCLC; Wang et al., 2007).
We only briefly introduce the concept behind and key development methods of the 2D agent-based NSCLC model (Wang et al., 2007), which spans both molecular and microscopic (i.e., multi-cellular) scales. At the molecular scale, an epidermal growth factor (EGF)-induced, EGF receptor (EGFR)-mediated signaling pathway specific to NSCLC is implemented (Figure (Figure1).1). The pathway model is composed of 20 EGF downstream signaling components and 38 corresponding rate constants. Each cell (or agent) in the model carries such a self-maintained signaling pathway. At the microscopic scale, a lattice-based 2D biochemical microenvironment is constructed and populated with diffusive chemical cues including EGF, glucose, and oxygen (see Figure Figure22 for model environment setup). In each simulation run, a total of 49 seed cells, arranged in a 7×7 square, are initially positioned in the center of a 2D lattice. As a simulation progresses, cancer cells constantly sense changes in environmental factors, interact with other cells, and adjust their behavior according to a set of pre-defined biological rules. A cellular phenotypic decision algorithm is established to determine cell phenotypic transitions upon molecular changes: PLCγ-dependent migration and ERK-dependent proliferation. Experimental studies have shown that the transient acceleration of accumulating PLCγ levels leads to cell migration (Dittmar et al., 2002), while that of ERK leads to cell replication (Santos et al., 2007). Therefore, in the algorithm, the rate of change of PLCγ determines the cellular migration decision, and the rate of change of ERK dictates the cellular proliferation fate. If a cell decides to migrate or proliferate, it will search for a suitable neighborhood to move to or for its offspring to occupy. A main feature of the model is that, in each simulation, tumor growth and expansion patterns due to cell proliferation and migration are neither pre-defined nor intuitive, but rather are the accumulated result of dynamic interactions between individual cells, and between cells and their biochemical microenvironment. The model can be used to quantify the relationship between extracellular stimuli, intracellular signaling dynamics, and multi-cellular tumor growth and expansion. Thus, in this study the above-described NSCLC model is employed as the modeling platform to investigate the cross-scale effects of simultaneous pathway parameter variations on tumor outcome at the microscopic scale.
Analysis of variance evaluates the effects of the inputs (i.e., independent variables) on the response (i.e., output or dependent variable) by decomposing the response into an overall mean (μ), main factor effects, interaction effects, and an error term (ε), and then provides their corresponding estimates (Winter et al., 1991). In ANOVA, each input factor has to be assigned specific ranges of values (also termed “factor levels”). Here is an example of a two-way (i.e., two input factors) ANOVA with a single response variable (Y):
where i refers to the level of factor A, j refers to the level of factor B, and k refers to the kth value of the response variable. Ai, Bj, and (A×B)i,j represent the main effect of the ith level of factor A, the main effect of the jth level of factor B, and the interaction effect between the two factors, respectively. Corresponding equations can be deduced in the same way for three or more input factors. ANOVA uses the F-test to examine the significance of each main or interaction effect, thus determining whether a statistically significant difference exists between mean responses for main effects or interactions between input factors. Thus, an F value is used as the sensitivity index to evaluate the effects of a factor (input parameter) on model output and then to rank the parameters. The higher the F value, the more a parameter contributes to changes in the output, and thus the more critical the parameter is in the model.
We propose an integrated GSA method that includes a process for preparing the basic input data samples, and a process for performing the analysis and quantifying the distribution of the sensitivity index (i.e., F value). We exemplify each process using the aforementioned 2D NSCLC multiscale model.
Continuous input parameters (in our case, the initial concentrations of pathway components) are first partitioned into mutually exclusive ranges of values, and each individual range is termed a parameter level. Exploring the entire parameter space will provide the most accurate, comprehensive understanding of a system, but is computationally impractical unless the number of parameters or parameter levels is small. For example, suppose we have M parameters and for each parameter we have N levels; this will generate a total of NM combinations. The number of simulations (each corresponding to a mutually independent parameter set) grows exponentially as M or N increases. The other reason for its impracticality is that the 2D multiscale cancer model takes a relatively long time to finish, depending on the final size of the simulated tumor, because each cell has to undergo a series of pathway analyses throughout the course of the simulation (with 3D discrete models taking even longer). For this reason, we use a random sampling of input parameters to make the large number of variation combinations computationally manageable. Specifically, we use the Latin hypercube sampling (LHS) method (McKay et al., 1979; Iman and Conover, 1980) to generate 2000 random sets of parameter values which evenly cover individual parameter ranges simultaneously. The LHS is a widely used variant of the standard Monte Carlo method, which allows for an unbiased estimate of the average model output. The advantage of this approach is that the random samples are generated from all of the ranges of possible values, thus giving insight into the extremes of the probability distributions of the outputs. That is, the use of LHS helps to ensure maximal sampling through each parameter dimension.
With the 2000 random sets of parameter values, 2000 simulation runs will be carried out accordingly, and hence will generate 2000 sets of simulation results. Note that, with our multiscale cancer model, to investigate the effects of how molecular changes in individual cancer cells percolate throughout and across the scales of a cancer system, the model output (i.e., biological response of the tumor) is no longer the behavior of output signals or signal activation patterns (as is the case in most current signaling pathway studies Murphy and Blenis, 2006); but rather, they are the tumor’s growth and expansion rate, two phenotypic behaviors at the microscopic level. Similar to previous studies (Wang et al., 2008, 2009), we use the number of elapsed time steps as a measure for “tumor expansion rate,” and the final number of live cells for “tumor (area or, more generally) volume.” We note here that a greater number of time steps until termination indicates a slower tumor expansion rate in a given simulation. For simplicity, we call each set of parameter values along with the corresponding two tumor output values an observation (see Figure Figure3A3A for an illustration). Hence, at the end of this data preparation process, we have 2000 observations. Hereafter, the 2000 observations (consisting of input parameter values and corresponding output responses) are referred to as the original sample.
We plan to only consider the main effect for input (independent) parameters because it has been found that interaction effects contribute almost insignificantly when compared to the main effect (Pant and Ghosh, 2006); also, doing so is a reasonable way to reduce the initial complexity of the system. As mentioned above, the F value calculated by the F-test is the sensitivity index for ANOVA. We note here that, based on the original sample (containing 2000 observations) generated in the previous data preparation process, ANOVA can already yield a parameter sensitivity ranking. The specific procedure for performing ANOVA on an input parameter is as follows. Assume parameter A has N treatment levels (corresponding to treatment groups in biological experiments). For each treatment level, we perform a number of 2000/N “virtual” experiments (this number, dependent on the LHS method, should roughly be 2000/N but may be different). Thus, we perform a total of 2000 “virtual” experiments based on these treatment levels for parameter A. In the end, with respect to each dependent variable, we have 2000 lines of measured experimental data. ANOVA is then used to compare means of these treatment levels using the F-test.
However, although the original sample is generated with the sophisticated LHS method, there is still the possibility that the sample data is biased, which would make the resultant ranking incorrect. To produce more convincing results, we seek to further quantify the sampling distribution of the sensitivity index (i.e., F value). A straightforward approach to understand such distributions of F values is as follows: create more, different sets of sample data (each sample again contains as many parameter combinations as the original sample), run the multiscale model with each parameter set in each sample, perform an F-test to calculate the F value for each parameter for each sample, and then, using all of the obtained results, quantify the distribution of F values. Since this approach is computationally costly, we employ a bootstrap method (Henderson, 2005). In brief, bootstrapping repeatedly samples the original sample with replacement and forms a new sample that is the same size as the original sample. The most attractive feature of bootstrapping is that we do not have to run the multiscale cancer model again, saving a great deal of time. In our study, we generate 1000 bootstrap samples (including the original sample) and then apply ANOVA to each bootstrap sample to calculate corresponding F values (Figure (Figure3B).3B). As a result, with respect to each model output (tumor volume or expansion rate), there will be 1000 sensitivity index values (each corresponding to a bootstrap sample) generated for each input parameter. Probability distributions of sensitivity indices can then be drawn from these results.
The first 2000 sets of parameter combinations (i.e., the original sample) were created with Matlab 2008 (Mathworks, Inc.). All of the statistical analysis programs for running ANOVA and bootstrap resampling were developed with SAS/STAT 9.2 (SAS Institute). The 2D agent-based model was developed in C/C++. All simulation runs were carried out on a 19-node dual-CPU supercomputer, and each simulation run took about 10min to finish. In this study, we only considered the initial concentrations of seven EGFR pathway components (EGF, EGFR, PLCγ, PKC, Raf, MEK, and ERK; see Figure Figure1)1) as input parameters. Parameter variation ranges were set large enough to cover the entire possible parameter space. Moreover, a variation range was partitioned into levels by means of evenly spaced intervals for each parameter. Table Table11 summarizes the input parameter variation ranges and corresponding levels. In a parameter ranking, a rank of 1 was assigned to an input with the highest sensitivity index, and the largest value of rank was assigned to the input with the least importance (i.e., lowest sensitivity index).
Table Table22 summarizes the ranking results when using the ANOVA method with 1000 bootstrap samples. For tumor volume evaluation (Part I of Table Table2),2), ERK appears at the top of the ranking of the seven inputs, which is not surprising since ERK determines a cell’s proliferation fate in our phenotypic decision algorithm of the 2D multiscale model (Wang et al., 2007). MEK holds the second position, but its F value is only approximately 1/10 of that of ERK, implying that ERK is far more important than MEK in influencing the emergent tumor volume. The remaining five molecules all have F values that are small when compared to the aforementioned two. In the tumor expansion rate evaluation (Part II of Table Table2),2), ERK is again the top input, followed by PLCγ and EGFR. This is somewhat surprising since we assumed that PLCγ would have the most significant impact on tumor expansion because it is the determinant of cell migration fate (Wang et al., 2007). The differences in F values for the top three parameters are not substantial, but the F value starts to drop dramatically from EGFR to MEK (the number 4 parameter), indicating that the last four parameters (including MEK) are likely not as important as the top three.
For each input parameter, we also quantify the arithmetic mean of the 1000 individual ranks (corresponding to the 1000 bootstrap samples) and the range of rank (last two columns of Table Table2).2). ERK is the first-ranked parameter for all bootstrap samples in the tumor volume evaluation. However, tumor expansion rate evaluation using sample data shows that ERK is not always the most sensitive parameter (the range of rank for ERK is 1–3). Moreover, all other parameters also exhibit different ranks across different bootstrap samples. This result indicates that the introduction of bootstrapping into the GSA workflow is necessary to produce non-biased parameter rankings, especially when only a relatively small or moderate size of data samples is available.
We performed an analysis on the multiscale model using the LSA method adopted in Wang et al. (2008). In brief, a sensitivity coefficient (SC) was used as an index to evaluate how a change in a single pathway component affects the overall system response at the microscopic level. This coefficient was calculated by the following equation:
where p represents the parameter that is varied in a simulation and M, the response of the system; M0 is obtained by setting all parameters to their reference values, and thus (Mi−M0) is the change in M due to the change in p, i.e., (pi−p0). Similar to the ANOVA-based sensitivity analysis, the system response M here also corresponds to either tumor volume or tumor expansion rate, i.e., tumor behavior at the multicellular level. Note here that two conditions result in a positive SC (taking tumor volume evaluation as an example): an increased tumor volume with increasing levels of a parameter, and a decreased tumor volume with decreasing levels of a parameter. For instance, if the tumor volume is increased (i.e., Mi>M0) and a certain parameter value is increased as well (i.e., pi>p0), then by using Eq. 1, the resultant SC will be positive. Similarly, two conditions result in a negative SC: an increased tumor volume with decreasing levels of a parameter, and a decreased tumor volume with increasing levels of a parameter.
Parameter variation ranges for this LSA analysis were the same as those listed in Table Table1.1. For each parameter, we created 101 variations, from 0 to its maximum value with increments of 1.0% of the maximum value. Taking EGF variation as an example, variation values were [0, 0.1, 0.2, …, 10.0]-fold of the parameter’s reference value. Note that the reference value for each parameter, i.e., a variation of 1.0-fold, was discarded when calculating SC values using Eq. 2. Only one parameter was varied when running a simulation, and all other parameters were held fixed at their reference values. Figure Figure44 illustrates the SCs of each input parameter. In tumor volume evaluation, MEK appears to be the most sensitive parameter in the model, with the maximum absolute value of a SC of 2.36 occurring at a variation of 0.96-fold of MEK’s reference value. In tumor expansion rate evaluation, three parameters, EGFR, PLCγ, and PKC, are identified to have the most significant impact on tumor expansion. A variation of 1.02-fold of the reference values of the three parameters results in the maximum absolute SC value, i.e., 4.48. In fact, ERK with a variation of 1.02-fold of its reference value also results in a SC value of 4.48 (data not shown). This result signifies that, with respect to tumor expansion rate evaluation, the LSA cannot discriminate between the importance of these parameters to the model. The overall ranking results from this LSA differed from what we obtained with the ANOVA-based GSA.
Sobie (2009) developed a method using multivariate linear regression analysis combined with parameter randomization to quantify parameter sensitivities in electrophysiological models, and then demonstrated the power of this method to compare the relative effects of different parameters in determining outputs of complex non-linear computational models. Because it allows multiple parameters to vary at the same time, this method belongs to the GSA category. We used Sobie’s regression-based method to validate our ANOVA-based ranking results by performing another set of analysis on the original sample (containing 2000 sets of parameter combinations). Ranking results are shown in Figure Figure5.5. The regression coefficients indicate how changes in input parameters lead to changes in outputs, and examining these coefficients allows for an assessment of the relative contributions of the various parameters. In tumor volume evaluation (Figure (Figure5A),5A), ERK is the most sensitive parameter, followed by MEK. In tumor expansion rate evaluation (Figure (Figure5B),5B), ERK, EGFR, and PLCγ are the top 3 parameters in the order of their importance. Comparing Figure Figure55 with Table Table22 confirms that the ranking results produced by both regression- and ANOVA-based analyses are in good agreement.
We have presented an applicable GSA method for multiscale cancer models with the use of a number of support techniques, including LHS sampling, Bootstrap resampling, and ANOVA. We exemplified the applicability of this method using a previously developed 2D multiscale NSCLC model (Wang et al., 2007). The effects of the input parameters on model outputs were quantified by simultaneously varying all of the parameters over the entire parameter space. Moreover, the computing time required for prioritizing the importance of parameters was considerably reduced by the introduction of Bootstrap resampling. The final parameter ranking results indicate that ERK is the most critical parameter at the molecular scale, chiefly regulating the two tumor growth indices, i.e., tumor volume and expansion rate, on the multi-cellular level. The ranking results from the GSA and LSA were different, but we believe the GSA-based analysis provides more convincing and realistic results because, unlike with LSA, multiple parameters are allowed to change simultaneously. This finding therefore supports therapeutic efforts that seek to target ERK to control overall tumor growth in NSCLC.
ERK, the decision molecule for cell proliferation, appears to have the most significant impact on both tumor growth indices. It is counterintuitive that ERK is more important than PLCγ in regulating the tumor’s spatial expansion rate because PLCγ is used as the key component for determining a migratory phenotypic switch. However, many experimental studies have observed that the activation of ERK signaling is directly associated with both cancer growth and metastasis in a number of cancer types, including lung cancer (Tan et al., 2004; Cheng et al., 2008; Ming et al., 2009). Hence, our in silico analysis results, to some degree, are in agreement with these experimental findings. As noted before, a sensitivity analysis method may produce different parameter rankings with respect to either tumor growth index, and this is precisely what we find in our cross-scale analysis (Table (Table2).2). In tumor expansion rate evaluation, PLCγ and EGFR have demonstrated their importance with the ANOVA-based GSA, and thus they are also (in addition to ERK) considered important parameters. It is different from tumor volume evaluation, where only ERK demonstrates a dominating position.
In both tumor volume and expansion rate evaluation, PKC is assigned the lowest ranking with the ANOVA-based GSA, and thus is deemed to be a less-important parameter. However, this finding conflicts with the LSA analysis, where PKC was observed to be one of the most sensitive parameters affecting tumor expansion rate. Since the LSA method only varied a single parameter at a time while keeping all others fixed, we believe it only accessed the baseline of the effect of perturbations in each individual parameter. Furthermore, the ranking results produced by the ANOVA-based GSA were mostly consistent with the results of the multivariate regression analysis (Figure (Figure5).5). Thus, very cautiously extrapolating the findings of this theoretical study, ERK appears to be a more valuable therapeutic target than PKC in regulating tumor expansion.
The LSA carried out in this study is not a thorough analysis. Rather, the main purpose of performing LSA here is to demonstrate that the LSA and our ANOVA-based GSA produce different parameter ranking results. There are discontinuities in SC values for some parameters, e.g., for the “sudden” peak for EGFR and PLCγ in Figure Figure4B.4B. This outcome is not a system error, but indicates that the parameter is highly sensitive around that concentration value. In fact, other systems models also exhibit this behavior – for example (Hornberg et al., 2005). Moreover, a small variation in a parameter’s reference value resulted in a relatively big change on model output (e.g., PKC and MEK in Figure Figure4A,4A, and EGFR, PLCγ, PKC, and MEK in Figure Figure4B).4B). This result is not surprising either, because many other biological systems have shown this response behavior to system input (Calabrese, 2005).
In summary, we have presented a readily applicable, integrated GSA method to identify and rank critical parameters at the molecular level that have significant impact on tumor volume and expansion rate at the microscopic level. Applying the method to a previously developed multiscale lung cancer model, ERK is found to be the most critical molecule in regulating both tumor growth indices, thus indicating its potential to serve as a therapeutic target in treating NSCLC. In the future, we plan on taking into account other types of GSA methods since there is no single ultimate method that best fits all types of systems biology models. Then, a consistently high ranking confirmed by different GSA methods may even more convincingly confirm the value of a molecule to serve as a therapeutic target in pharmacological drug discovery efforts. Such an approach is helpful and necessary, especially when the resulting parameter rankings differ from each other. Additionally, kinetic rate constants will also be considered as molecular parameters, and their cross-scale effects will be examined together with the signaling pathway component concentrations.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This work has been supported in part by NIH grant CA 113004 and by the Harvard-MIT (HST) Athinoula A. Martinos Center for Biomedical Imaging and the Department of Radiology at Massachusetts General Hospital.
ANOVA, Analysis of Variance; EGF, epidermal growth factor; EGFR, EGF receptor; ERK, extracellular signal-regulated kinase; GSA, global sensitivity analysis; LSA, local sensitivity analysis; MAPK, mitogen-activated protein kinase; MEK, mitogen-activated protein kinase kinase; PKC, protein kinase C; PLCγ, phospholipase Cγ.