The overall hypothesis of our study was that if several physiologically-relevant characteristics of a model's behavior were known, this information would be sufficient to constrain some or all of the model's parameters. We tested this idea using two approaches: one based on multivariable regression and the other based on Bayes's theorem. We began by generating a database of candidate models. The parameters that define maximal conductances and rates of ion transport in the TNNP model

[15] were varied randomly, and several simulations, defining how the candidate model responded to altered experimental conditions, were performed with each new set of parameters. In general, the simulations reflected experimental tests commonly performed on ventricular myocytes, such as determining the threshold for excitation or changing the rate of pacing.

For the first approach, the results of these simulations were collected in “input” and “output” matrices

**X** and

**Y**, respectively. Each column of

**X** corresponded to a model parameter, and each row corresponded to a candidate model (n

=

300). The columns of

**Y** were the physiological outputs extracted from the simulation results, such as action potential duration (APD) and Ca

^{2+} transient amplitude. Complete descriptions of the randomization procedure and simulation protocols are provided in the

Methods and

Text S1. Outputs are listed in and described in detail in

Text S1.

| **Table 1**Physiological outputs in simulations with TNNP model. |

Multivariable regression techniques were used to quantitatively relate the inputs to the outputs. In the “forward problem,” a matrix of regression coefficients

**B** was derived such that the predicted output

**Y** ^

=

**XB** was a close approximation of the actual output

**Y**. This method has recently been proven useful for characterizing the parameter sensitivity of electrophysiological models

[16]. We reasoned that a similar approach could be used to address the question: if the measurable physiological characteristics of a cardiac myocyte are known, can this information be used to uniquely specify the magnitudes of the ionic currents and Ca

^{2+} transport processes? Specifically, we hypothesized that if: 1)

**Y** ^

=

**XB** was a close approximation of the true output

**Y**, and 2)

**B** was a square matrix of full rank, then

**X**_{predicted}=

**YB**^{−1} should be a close approximation of the true input matrix

**X**. This argument is illustrated schematically in .

demonstrates the accuracy of the reverse regression method. For four chosen conductances, the scatter plots show the “actual” values, generated by randomizing the baseline parameters in the published TNNP model, versus the “predicted” values calculated with the regression model. The large R^{2} values (>0.9) indicate that the predictions of the regression method are quite accurate. Of the 16 conductances in the TNNP model, 12 could be predicted with R^{2}>0.7. The four that were less well-predicted were the Na^{+} background conductance (G_{Nab}), the rapid component of the K^{+} delayed rectifier conductance (G_{Kr}), the sarcolemmal Ca^{2+} pump (K_{pCa}) and the second SR Ca^{2+} release parameter (K_{rel2}).

To verify that these encouraging results were not specific to the TNNP model, we performed similar analyses on additional models, the human ventricular myocyte model of Bernus et al.

[17], and the “Phase 1” ventricular cell model of Luo and Rudy

[18]. In either case (

Figures S3 and

S4, respectively), the reverse regression was highly predictive of most parameters, indicating that this approach is generally applicable. The outputs used for these analyses, listed in

Text S1, differed somewhat from those used for the TNNP simulations because the Bernus et al.

[17] and Phase 1 Luo and Rudy

[18] models are relatively simple and do not consider intracellular Ca

^{2+} handling in detail.

illustrates how the quantity and identity of the outputs in

**Y** affected the accuracy of the predictions. Bar graphs show R

^{2} values for prediction of each model parameter obtained by performing the reverse regression in three ways: 1) using all 32 outputs (blue), 2) matrix inversion (green), with the 16 best outputs as identified by the output elimination algorithm (see

Methods), and 3) using only the 16 rejected outputs (red). The R

^{2} values computed using the 16 best outputs were virtually identical to those obtained when all 32 outputs were used whereas R

^{2} values for most conductances were substantially lower when only the 16 rejected outputs were included. These tests validate the algorithm which selected the outputs for matrix inversion. Moreover, since the 16 best outputs performed essentially as well as the full set of 32 outputs, this result implies that the model outputs were not fully linearly independent, and the 16 rejected outputs contained redundant information.

displays, as heat maps, the coefficients for both the forward and reverse regression problems. The former indicate how model parameters influence outputs, whereas the latter specify how changes in model outputs restrict the parameters. Parameter sensitivities for selected outputs and conductances are shown as bar graphs to the right. As previously argued for the case of forward regression

[16], these parameter sensitivities help to illustrate the relationships between parameters and outputs. For instance, forward regression coefficients indicate that diastolic [Ca

^{2+}] is determined primarily by a balance between SR Ca

^{2+} uptake and SR Ca

^{2+} leak, with other parameters making only minimal contributions. Conversely, for reverse regression, the maximal conductance of L-type Ca

^{2+} current (G

_{Ca}) depends on many model outputs including action potential duration, Ca

^{2+} transient amplitude, and, in particular, how these are altered with changes in extracellular potassium. This result underscores the centrality of intracellular Ca

^{2+} regulation to many cellular processes.

The results shown in demonstrated that most of the model parameters used to generate the dataset could be reconstructed using the reverse regression procedure. To provide evidence that this procedure may be more broadly useful, we applied the method to a novel test case by performing simulations with the most recent version of the Hund & Rudy canine ventricular model

[19]. Specifically, we considered changes in seven parameters corresponding to the condition of heart failure, as previously modeled by Shannon et al

[20]. shows that implementing these parameter changes dramatically alters both AP shape and Ca

^{2+} transient amplitude. After performing simulations under a range of conditions with both normal, healthy cells and pathological, failing cells (see

Methods and

Text S1), we asked how well the reverse regression matrix could calculate the parameter changes in the failing cells. We found that this method constrained 5 out of 7 parameters with excellent accuracy, while changes in two parameters (G

_{Ks} and K

_{leak}) were overestimated somewhat by the regression algorithm. This novel test cases validates our approach and suggests that it may indeed prove a useful method for developing new models based on experimental measurements.

The second approach for constraining model parameters is based on Bayes's theorem. In statistics, this celebrated result describes the conditional probability of one event given another in terms of: 1) the conditional probability of the second event given the first, and 2) the marginal probabilities of the two events:

In this context, we consider event A that a model conductance lies within a given range, and event B that a model output is within a particular range. When many simulations are performed with randomly varying parameters, the probability P(A) is fixed by the user, while the probabilities P(B) and P(B|A) can be estimated from the results. This allows us to approximate P(A|B), which reflects how well a model parameter is constrained by a particular simulation result.

Since our hypothesis was that multiple outputs needed to be considered to constrain model parameters, we were interested in extensions of Bayes's theorem to more than two variables, e.g. P(A|B∩C), where B and C are events related to two model outputs. For instance, B and C could represent, respectively, that APD and Ca^{2+} transient amplitude are within particular ranges. If the conditional probability of the parameter increases as additional outputs are considered, this validates the thinking underlying the approach.

The application of this strategy to our data set is illustrated in . The two rows of histograms display distributions of G_{Na} and G_{Ca}, which are typical of the 16 model parameters considered. The leftmost histogram in each row shows the distribution of conductance values in the entire population, and the remaining columns show conductance values for sub-populations that satisfy constraints on one or more model outputs. Successive columns from left to right show distributions with additional model outputs considered, as noted. In either case, the distributions become progressively narrower, and the conditional probability is unity once 3 outputs are considered.

This procedure also provides insights into which specific outputs provide the greatest information about particular model parameters. For instance, the distribution of G_{Na} given a certain range of APD appears similar to the overall distribution of G_{Na} because these two variables are not strongly correlated (i.e. P(B|A) ≈ P(B)). In contrast, inclusion of V_{peak}, an output highly dependent on G_{Na}, narrows the distribution significantly. In the case of G_{Ca}, restricting APD to a particular range makes the distribution narrower, which is to be expected given the relatively strong correlation between the parameter and the output. Thus, an approach based on Bayes's theorem also supports the idea that model parameters can successfully be constrained if multiple model outputs are considered.