Predicting information about human physiology and pathophysiology from genomic data is a compelling but unfulfilled goal of post-genomic biology. This is the aim of the so-called Physiome Project, and it is undeniably ambitious. Yet if we can exploit even a small proportion of the rich and varied experimental data currently available, significant insights into clinically important aspects of human physiology will follow. Achieving this requires integrating data from disparate sources into a common framework, and extrapolating the available data across species, laboratory techniques and experimental conditions requires a quantitative approach. Mathematical models allow us to integrate molecular information into cellular, tissue and organ-level descriptions, and ultimately up to clinically relevant scales. In this paper we argue that biophysically detailed computational modelling provides the essential tool for this process and, furthermore, that an appropriate framework for annotating, databasing and critiquing these models will be essential for the development of integrative computational biology.
Coupling genomic data to physiological function is a central aim of biology in the post-genomic era, and quantitative descriptions of biological processes using mathematical modelling are an important tool for achieving it. Empirically derived relationships have commonly been used in modelling biological processes. While such data-driven models often give useful descriptions of, and insights into, specific data sets, they frequently fail when combined to study interactions between multiple processes. Physics-based models, on the other hand – models built on principles including the laws of mechanics and thermodynamics, in which assumptions and approximations are made explicit – operate with a common currency of mass, charge, energy and momentum (Bassingthwaighte et al., 2001; Qian et al., 2003). With care, such models may be naturally integrated to form comprehensive models of biological systems.
It is our conviction that much of the true complexity of biological mechanisms must be represented in models if clinically applicable insights are to be gained from model simulations. There are, however, significant mathematical and computational challenges to be overcome. Multi-scale models must incorporate nontrivial biological complexity while remaining computationally tractable. Furthermore, while representing this complexity, models must still be capable of yielding insight through mathematical analysis when simulations do not behave as expected (as must sometimes happen if we are to learn anything new!). This requires new approaches for handling model complexity and parameterization, together with better communication and information sharing between model developers.
One approach to handling complexity across multiple spatial and temporal scales is to adopt a modular and hierarchical approach to modelling biological systems. In this approach, mathematical representations of biological components are brought together and tuned appropriately to produce a model of a specific cell or tissue type. The most transparent way of achieving this goal is to retain biophysical detail at each level in a modelling hierarchy, while employing simplifying assumptions to move to higher level descriptions (Smith et al., 2004; Smith et al., 2000). This often requires the coupling of models governed by different physical equations, representing physiologically discrete functions (Nickerson et al., 2005). Such a hierarchical and multi-physics approach provides an obvious mechanism for revision or improvement of selected parts of a large-scale simulation as new data are collected. Furthermore, this biophysical approach provides greater confidence in the ability of a model to extrapolate from the data used for parameterization and to provide detailed, even patient-specific, predictions when data from an individual are available.
The integration of biophysically based models covering the breadth of physiological function, across spatial and temporal scales, is the approach and philosophy driving the IUPS-sponsored Physiome Project (Crampin et al., 2004; Hunter and Borg, 2003). Under this umbrella project, the multiscale modelling approach has been applied with demonstrable success to the gastro-intestinal (Buist et al., 2006), renal (Ribba et al., 2006) and musculo-skeletal (Hunter et al., 2005) organ systems and, in what is arguably the most sophisticated exemplar, to the heart or ‘cardiome’ (Hunter and Borg, 2003). It is from this cardiac work that we draw our examples below; however, the principles we illustrate are relevant across the full range of organ systems.
Typically, as our knowledge and understanding of biological processes has grown, models of increasing detail and comprehensiveness have been developed, often by piecing together existing model components, in order to incorporate more and more of the available data. However, the strength of building on existing work can also be the greatest weakness of this approach. Errors and implicit assumptions contained in the foundation elements of models can, as we will demonstrate below, propagate through successive, more complete models. It is, therefore, vital that the assumptions used to develop models are made explicit and that the propagation of errors is prevented. This imposes an extremely high duty of care on both authors and reviewers of new models; in particular, it is unreasonable to expect such problems to come to light during the conventional reviewing process. We assert that new and innovative processes and criteria must be developed to augment standard peer review, so that not only are errors in models eliminated, but the conditions of appropriate model use and the connection with the experimental data are also made transparent to the user community. If these issues can be addressed, we believe the scientific community at large will have greater confidence in the fidelity of individual models and in the utility of computational biology as a whole. This will be essential if computational modelling is to achieve its promise, both in the laboratory and in the clinic.
Work in a number of groups is already progressing towards the development of tools and ontologies (Cuellar et al., 2003; Schilstra et al., 2006) to facilitate the unambiguous machine-readable representation of biological models. Most recently this concept has been taken further with the proposal of a set of rules (termed MIRIAM, Minimum Information Requested In the Annotation of biochemical Models) for curating quantitative models of biological systems (Le Novère et al., 2005). This community effort defines procedures for encoding and annotating models represented in machine-readable form which, if adopted, should (i) ensure consistency between curated models and their reference description, (ii) provide searchable databases of models using biological terms from accepted ontologies, and (iii) facilitate model reuse and development in the manner that we have described. These annotation rules do not, however, comment on the nature of the models themselves or on their suitability for any specific modelling purpose (indeed, this is not the intention of the MIRIAM initiative); yet it is apparent that additional constraints on the structure of models will also be useful when combining them. Below, we briefly review the development of cardiac models, focusing in more detail on four of our own published models. We then highlight two specific examples in the cardiac field where the reuse of model elements has severed the link between model parameters and the experimental measurements from which they were derived. These examples motivate our proposal of additional criteria for biophysically based models to address the issues discussed above, and we then analyse our four published models against these proposed criteria.
The last 40 years have seen the development of increasingly detailed biophysically based cell models of cardiac electrophysiology (Luo and Rudy, 1991; McCulloch et al., 1998). These models currently provide detailed representations of membrane-bound channels and transporters, and of ion fluxes between the cytosol and intracellular organelles. One example of a transporter model is our recent study characterising the kinetics of the sodium pump (Smith and Crampin, 2004) (Fig. 1A). The function of this pump is to maintain both the sodium and potassium gradients across the myocyte membrane. Its kinetics were represented using an enzymatic cycle, formulated to be thermodynamically consistent in coupling the free energy of ATP hydrolysis to the movement of ions against their electrochemical gradients, and fitted to experimentally observed pump cycling rates at different extracellular sodium concentrations.
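The thermodynamic constraint on such an enzymatic cycle can be illustrated with a short sketch. For any closed kinetic cycle, the product of the forward rate constants divided by the product of the backward rate constants must equal exp(−ΔG/RT), where ΔG is the free-energy change of one complete turnover; solving for one rate constant from the others is one common way of imposing this. The Python below assumes a hypothetical four-state cycle with apparent (pseudo-first-order) rate constants and a single assumed ΔG; it is not the published sodium pump model, in which ΔG depends on ATP hydrolysis, the ion gradients and the membrane potential.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def cycle_ratio(k_forward, k_backward):
    """Product of forward rate constants divided by product of backward ones."""
    return math.prod(k_forward) / math.prod(k_backward)


def required_ratio(delta_g_cycle, temperature_k):
    """exp(-dG/RT): the value cycle_ratio must take for a cycle whose net
    free-energy change per turnover is delta_g_cycle (J/mol)."""
    return math.exp(-delta_g_cycle / (R * temperature_k))


def close_the_cycle(k_forward, k_backward_known, delta_g_cycle, temperature_k):
    """Solve for the one remaining backward rate constant so that the
    cycle is thermodynamically consistent."""
    return math.prod(k_forward) / (
        math.prod(k_backward_known) * required_ratio(delta_g_cycle, temperature_k)
    )


def is_consistent(k_forward, k_backward, delta_g_cycle, temperature_k, rtol=1e-9):
    """True if the rate constants satisfy the cycle constraint to within rtol."""
    return math.isclose(
        cycle_ratio(k_forward, k_backward),
        required_ratio(delta_g_cycle, temperature_k),
        rel_tol=rtol,
    )


# Hypothetical four-state cycle (apparent rates in s^-1, illustration only).
k_fwd = [1000.0, 200.0, 50.0, 30.0]
k_bwd = [10.0, 20.0, 5.0]
dG = -20e3  # assumed net free-energy change per cycle, J/mol
T = 310.0   # K
k_bwd.append(close_the_cycle(k_fwd, k_bwd, dG, T))
print(is_consistent(k_fwd, k_bwd, dG, T))
```

Fixing the last backward rate this way means that any fitting procedure only ever varies the remaining free rate constants, so the constraint holds for every candidate parameter set.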
The known details of channels, pumps and exchangers have enabled analysis of the role that each functional element plays in health and disease (Shaw and Rudy, 1997). Further, they have provided a successful paradigm for integrating individual data sets on the different molecular components of the cell into a common framework. This allows trans-membrane ion transport to be linked to action potential recordings in the whole myocyte under altered ionic conditions, across a range of species from rat to human (Pandit et al., 2001; ten Tusscher et al., 2004). We recently published a model of the myocyte (shown schematically in Fig. 1B) that builds on the existing Luo–Rudy dynamic (LRd) electrophysiology model (Hund and Rudy, 2004). The LRd model was developed to study myocyte electrophysiology over a single heart beat. Our study considered the effect of acidosis (a drop in pH associated with impaired metabolism) on excitation–contraction coupling in the heart cell over multiple beats (Crampin and Smith, 2006). Simulating multiple beats imposes a new set of requirements on the model: it was necessary to ensure conservation of mass and charge and, under normal conditions, that the time courses of the state variables (ionic concentrations and membrane potential) were maintained from one beat to the next. Our model uses thermodynamically constrained cycles to represent acid transporters and includes proton inhibition of many of the calcium-handling processes in the cell, fitted to available experimental data.
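A requirement that state variables repeat from one beat to the next is often checked by sampling them at the same phase of successive beats and testing whether the beat-to-beat change has fallen below a tolerance. The Python sketch below shows one way of doing this; the function names, the tolerance, and the toy one-beat update (which stands in for integrating the full cell model over a beat) are all hypothetical.

```python
def beat_to_beat_drift(snapshots):
    """Largest relative change of any state variable between the last two beats.

    snapshots: list of dicts (one per beat) mapping state-variable names to
    values sampled at the same phase of every beat, e.g. at stimulus onset.
    """
    prev, last = snapshots[-2], snapshots[-1]
    return max(abs(last[name] - prev[name]) / max(abs(prev[name]), 1e-12)
               for name in last)


def at_periodic_steady_state(snapshots, tol=1e-4):
    """True once the beat-to-beat drift is below tol (needs at least two beats)."""
    return len(snapshots) >= 2 and beat_to_beat_drift(snapshots) < tol


# Hypothetical stand-in for "simulate one beat": each state variable relaxes
# geometrically towards a limit-cycle value. A real model would integrate its ODEs.
LIMIT = {"Na_i": 10.0, "Ca_SR": 700.0}  # illustrative values only


def toy_beat(state, rate=0.8):
    return {k: LIMIT[k] + rate * (v - LIMIT[k]) for k, v in state.items()}


snapshots = [{"Na_i": 8.0, "Ca_SR": 500.0}]
for _ in range(60):
    snapshots.append(toy_beat(snapshots[-1]))

print(at_periodic_steady_state(snapshots[:6]), at_periodic_steady_state(snapshots))
```

Slowly equilibrating variables such as intracellular sodium can take many beats to settle, which is one reason a model built for single-beat simulations may drift when run over many beats.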
While initially lagging behind developments in electrophysiology, cellular models of myocardial contraction have now progressed to the point where myocardial mechanics can be simulated computationally. Detailed models of Ca2+-induced thin-filament activation have been combined with representations of cross-bridge tension generation, describing the length- and tension-dependent Ca2+-induced activation of cellular contraction. Excitation–contraction coupling driven by the Ca2+ transient has been characterized by coupling electrophysiological and mechanical models (Nickerson et al., 2001), thus enabling simulations of activation-induced contraction. Building on an existing framework (Hunter et al., 1998), we recently developed a model of active contraction of the myocyte that uses mass-action kinetics to model calcium binding to troponin C (TnC) and tropomyosin kinetics (Niederer et al., 2006). These elements were combined with a phenomenological representation of actin–myosin binding kinetics, and the force and length dependence of each process was characterized in detail. In this study, each parameter was rationalized from numerous sources and, where possible, multiple experimental modalities, through an extensive review of the literature (Fig. 2A is shown as an example). Issues of species consistency and experimental conditions, in particular temperature, were explicitly addressed in choosing parameters to represent a rat myocyte at room temperature.
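The mass-action description of Ca2+ binding to TnC, and the adjustment of rate constants for temperature, can be sketched together. Rescaling rates with a Q10 coefficient is a common way of correcting for temperature; we do not claim it is the exact procedure used by Niederer et al. (2006). The rate constants, Q10 value and TnC concentration below are hypothetical, and [Ca2+] is clamped rather than supplied by an electrophysiology model. Note that applying the same Q10 to both k_on and k_off leaves the dissociation constant Kd = k_off/k_on unchanged; rates with different Q10 values would shift it.

```python
def q10_scale(k_ref, temp_ref_c, temp_target_c, q10):
    """Rescale a rate constant measured at temp_ref_c to temp_target_c using a Q10 coefficient."""
    return k_ref * q10 ** ((temp_target_c - temp_ref_c) / 10.0)


def tnc_occupancy(ca, k_on, k_off, tnc_total, t_end, dt=1e-4):
    """Forward-Euler integration of Ca2+ + TnC <-> CaTnC at a clamped [Ca2+].

    Mass action: d[CaTnC]/dt = k_on [Ca2+] ([TnC]tot - [CaTnC]) - k_off [CaTnC]
    Units: concentrations in uM, k_on in uM^-1 s^-1, k_off in s^-1, times in s.
    """
    catnc = 0.0
    for _ in range(int(round(t_end / dt))):
        catnc += dt * (k_on * ca * (tnc_total - catnc) - k_off * catnc)
    return catnc


# Hypothetical rate constants "measured" at 37 C, rescaled to 22 C (room
# temperature) with an assumed Q10 of 2 for both rates.
k_on = q10_scale(100.0, 37.0, 22.0, 2.0)   # uM^-1 s^-1
k_off = q10_scale(200.0, 37.0, 22.0, 2.0)  # s^-1
occupied = tnc_occupancy(ca=1.0, k_on=k_on, k_off=k_off, tnc_total=70.0, t_end=0.2)

# At equilibrium the bound fraction is [Ca2+] / ([Ca2+] + Kd), with Kd = k_off / k_on.
kd = k_off / k_on
print(round(kd, 6), round(occupied, 3), round(70.0 * 1.0 / (1.0 + kd), 3))
```

Comparing the simulated occupancy with the analytical equilibrium value is a simple check that the integration and the units are mutually consistent.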
In parallel work, we have developed a computational model of muscle cell oxidative energy metabolism (illustrated in Fig. 2B), which we have applied to analyze cardiac and skeletal muscle energetics (Bassingthwaighte et al., 2001; Wu et al., 2007). In these studies, ATP consumption is treated as a forcing function, and the ATP-consuming processes associated with contraction and electrophysiology are not explicitly modelled. In current work, the energy metabolism model is being integrated with the electrophysiology and mechanics models, leading to an increasingly detailed model of cardiomyocyte biophysics.
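Treating ATP consumption as a forcing function means that demand enters the metabolic model as a prescribed function of time rather than being computed by models of contraction and ion transport. The following toy Python sketch illustrates the idea only: the single ATP/ADP pool, the Michaelis–Menten supply term, the function name `simulate_atp` and all parameter values are assumptions, not the published metabolism model.

```python
def simulate_atp(j_use, a_total=10.0, v_max=2.0, k_m=0.1, atp0=9.0, t_end=5.0, dt=1e-3):
    """Toy adenine-nucleotide balance driven by a prescribed ATP demand.

    d[ATP]/dt = J_syn([ADP]) - j_use(t),   [ADP] = a_total - [ATP]
    J_syn is an assumed Michaelis-Menten stand-in for oxidative ATP synthesis;
    j_use(t) is the forcing function (mM/s). Concentrations in mM, time in s.
    Returns ([ATP], [ADP]) at t_end.
    """
    atp = atp0
    for n in range(int(round(t_end / dt))):
        adp = a_total - atp
        j_syn = v_max * adp / (k_m + adp)
        atp += dt * (j_syn - j_use(n * dt))  # forward Euler step
    return atp, a_total - atp


# Constant demand: ADP should settle where J_syn = J_use, i.e. k_m*J/(v_max - J) = 0.1 mM.
atp_c, adp_c = simulate_atp(lambda t: 1.0)
# Periodic demand: assumed 0.5 mM/s baseline plus 1.0 mM/s during the first 30% of each 1 s beat.
atp_p, adp_p = simulate_atp(lambda t: 0.5 + (1.0 if t % 1.0 < 0.3 else 0.0))
print(f"constant demand: ADP = {adp_c:.4f} mM; periodic demand: ADP at t_end = {adp_p:.4f} mM")
```

Because the demand is an argument rather than part of the model, replacing the prescribed `j_use` with fluxes computed by electrophysiology and mechanics models is, in this sketch, the natural point of coupling.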
Despite the increasing complexity, rapid improvements in the performance per unit cost of high-performance computing have more than offset the computational demands of solving the systems of ordinary differential equations that represent these cellular and sub-cellular models. This has enabled the development of models of cardiac tissue, in which the cellular models are embedded in a continuum description of tissue geometry. These models incorporate data from confocal microscopy detailing the myocyte, fibroblast and collagen microstructure within the tissue. These microstructural data can be used to determine the conductivity and stiffness tensors within the continuum model, in order to predict the electrical conductivity and mechanical stiffness of cardiac tissue (Trew et al., 2006). By applying the monodomain or bidomain equations, tissue-level models have been used to predict the spread of activation in two- and three-dimensional simulations (Smith et al., 2004; Tomlinson et al., 2002). Using the tension transients calculated in the cellular models, tissue deformation can be predicted by solving the equations of finite deformation (Pullan et al., 2001). Linking the calcium transient of the cellular electrophysiology model to cellular tension generation enables activation and contraction to be coupled. At the tissue level, this coupling is achieved by combining the numerical solution techniques for the different physical processes in a way that preserves computational efficiency (Nickerson et al., 2005; Smith et al., 2003) (Fig. 3).
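The spread of activation predicted by the monodomain equation can be illustrated in one dimension. The Python sketch below assumes a dimensionless cable, substitutes the two-variable FitzHugh–Nagumo kinetics for a biophysically detailed cell model, and uses explicit finite differences with zero-flux ends. The diffusion coefficient D plays the role of the tissue conductivity (in two or three dimensions it becomes the conductivity tensor). All parameter values and the function name are illustrative assumptions.

```python
def simulate_cable(n_nodes=100, dx=0.5, dt=0.05, t_end=200.0, diffusion=1.0,
                   a=0.1, eps=0.01, gamma=0.5, stim_nodes=10, threshold=0.5):
    """1D monodomain cable in dimensionless form with FitzHugh-Nagumo kinetics:

        dv/dt = D d2v/dx2 + v (v - a)(1 - v) - w
        dw/dt = eps (v - gamma w)

    Explicit finite differences, zero-flux ends. Returns the activation time of
    each node (first time v crosses `threshold`), or None if it never activates.
    """
    coef = diffusion * dt / dx ** 2
    if coef > 0.5:
        raise ValueError("explicit diffusion step unstable: reduce dt or increase dx")
    v = [1.0 if i < stim_nodes else 0.0 for i in range(n_nodes)]  # stimulus: depolarize left end
    w = [0.0] * n_nodes
    act = [0.0 if i < stim_nodes else None for i in range(n_nodes)]
    for step in range(1, int(round(t_end / dt)) + 1):
        new_v = []
        for i in range(n_nodes):
            left = v[i - 1] if i > 0 else v[1]  # mirror node gives zero flux
            right = v[i + 1] if i < n_nodes - 1 else v[n_nodes - 2]
            reaction = v[i] * (v[i] - a) * (1.0 - v[i]) - w[i]
            new_v.append(v[i] + coef * (left - 2.0 * v[i] + right) + dt * reaction)
        w = [wi + dt * eps * (vi - gamma * wi) for vi, wi in zip(v, w)]
        v = new_v
        for i, vi in enumerate(v):
            if act[i] is None and vi >= threshold:
                act[i] = step * dt
    return act


times = simulate_cable()
dx = 0.5
speed = (90 - 30) * dx / (times[90] - times[30])  # dimensionless conduction velocity
print(f"activation at last node: t = {times[-1]:.1f}, conduction velocity ~ {speed:.2f}")
```

The explicit scheme is chosen here for transparency; tissue-scale solvers typically use implicit or operator-splitting schemes so that stiff cell models and fine meshes remain tractable.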
In this way, cellular and sub-cellular modelling provides a framework for capturing mechanisms at their own spatial scale and for extrapolating these responses to determine behaviour at the tissue level. The parameters of each of these cellular models are typically determined either directly (a single measurable parameter) or indirectly (fitting a data set) from experimental data.
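Indirect parameter estimation can be illustrated with a single-site binding curve. The sketch below (Python, with hypothetical function names) recovers a dissociation constant Kd from fractional-saturation data using the linear rearrangement 1/θ − 1 = Kd/[Ca2+]. The data are synthetic and noise-free, generated from an assumed Kd, so the fit simply returns that value; with real, noisy data, direct nonlinear least squares is generally preferable because the linearization distorts the error structure.

```python
def fraction_bound(ca, kd):
    """Fractional saturation of a single-site binder: theta = [Ca] / (Kd + [Ca])."""
    return ca / (kd + ca)


def fit_kd(ca_values, theta_values):
    """Least-squares Kd from saturation data via the linear rearrangement
    1/theta - 1 = Kd * (1/[Ca])   (a line through the origin with slope Kd)."""
    xs = [1.0 / ca for ca in ca_values]
    ys = [1.0 / th - 1.0 for th in theta_values]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Synthetic, noise-free "data" generated from an assumed Kd of 0.6 uM,
# used only to check that the fit recovers the parameter.
ca_grid = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0]  # uM
theta_obs = [fraction_bound(ca, 0.6) for ca in ca_grid]
print(round(fit_kd(ca_grid, theta_obs), 6))
```

Recording which data set, and which transformation, produced a fitted value is exactly the information that is lost when such parameters are later copied between models.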
It is critical to preserve this link to experimental data, both for appropriate parameterisation and for validation of model function. The potential provided by the ability to reuse and integrate existing model components can, however, be a double-edged sword. Model integration leads to the reuse of parameters, which is a necessary and efficient means to generate new, more complex models. Even if all model parameters are determined using the best currently available experimental data, they may still be superseded in time. The parameter set for a model component can, however, become obscured from further reviewer scrutiny once it is reused in later models, and the original explicit connection with experimental data is lost.
Two specific cases of this phenomenon, covering the propagation of two common cardiac myocyte model parameters over 25–30 years of modelling, are shown in Fig. 4A,B: the binding affinity of Ca2+ to troponin C (Crampin and Smith, 2006; Faber and Rudy, 2000; Hilgemann and Noble, 1987; Holroyde et al., 1980; Hunter et al., 1998; Jafri et al., 1998; Luo and Rudy, 1994; Nickerson et al., 2001; Noble et al., 1998; Pandit et al., 2001; Robertson et al., 1981; Rodriguez et al., 2002; Winslow et al., 1999; Zeng et al., 1995) and to calsequestrin (Bondarenko et al., 2004; Cannell and Allen, 1984; Crampin and Smith, 2006; Faber and Rudy, 2000; Hund and Rudy, 2004; Iyer et al., 2004; Jafri et al., 1998; Luo and Rudy, 1994; Ostwald and MacLennan, 1974; Pandit et al., 2001; ten Tusscher et al., 2004; Winslow et al., 1999; Zeng et al., 1995). In both cases, an early model (Cannell and Allen, 1984; Robertson et al., 1981) provided a foundation component for a number of the current cardiac models. Since the original models were published, there has been a consistent flow of new and arguably more reliable experimental data sets, which have been largely ignored by the modelling community. The vast majority of cardiac models, including our own (Crampin and Smith, 2006), are guilty of building on existing models without considering the source of all the model parameters. To address this issue, in our recent model of active contraction (Niederer et al., 2006) we performed an extensive literature search for each model parameter and noted the experimental conditions under which the parameter was measured. We believe that establishing such clear links between model parameters and experimental results is an important step in maintaining the credibility of cardiac modelling.
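One way of making the parameter-to-data link explicit is to store, alongside each parameter value, the experimental measurements that support it and the conditions under which they were made. The Python sketch below is a hypothetical schema, not an existing database format: the class names, fields and the 2 °C temperature tolerance are assumptions, and the parameter values, citations and model names are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Measurement:
    """One experimental determination of a parameter value."""
    citation: str          # reference for the experimental study
    value: float
    species: str
    temperature_c: float
    method: str            # experimental technique


@dataclass
class Parameter:
    name: str
    value: float
    units: str
    measurements: List[Measurement] = field(default_factory=list)
    copied_from_model: Optional[str] = None  # earlier model the value was inherited from

    def traceable(self) -> bool:
        """True if at least one experimental measurement is recorded."""
        return bool(self.measurements)


def audit(params, species, temperature_c, temp_tol=2.0):
    """Flag parameters with no experimental source, or whose sources were
    obtained in a different species or at a different temperature."""
    issues = {}
    for p in params:
        if not p.traceable():
            issues[p.name] = "no experimental source" + (
                f" (inherited from {p.copied_from_model})" if p.copied_from_model else "")
        elif not any(m.species == species and abs(m.temperature_c - temperature_c) <= temp_tol
                     for m in p.measurements):
            issues[p.name] = "no source matching species/temperature"
    return issues


# Placeholder entries for a model intended to represent a rat myocyte at 22 C.
params = [
    Parameter("K_TnC", 0.5, "uM",
              [Measurement("placeholder citation A", 0.5, "rat", 22.0, "fluorescence")]),
    Parameter("K_CSQN", 800.0, "uM", copied_from_model="placeholder model B"),
    Parameter("k_off", 50.0, "1/s",
              [Measurement("placeholder citation C", 50.0, "rabbit", 37.0, "stopped flow")]),
]
print(audit(params, species="rat", temperature_c=22.0))
```

An audit of this kind would flag a value inherited from an earlier model without a recorded measurement, which is the pattern illustrated by the troponin C and calsequestrin examples.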
Systematically validating, against experimental data, models that link detailed cellular biophysics to tissue function remains challenging. As outlined above, this is in part due to the technical difficulty of managing and maintaining the links to experimental data required for each mechanism in the excitation–contraction–metabolism process. Nonetheless, validation is essential before these promising simulation techniques can provide real value to the clinician.
The specific difficulties outlined above are as follows. (1) Models are rarely implemented and tested as part of the peer-review process for journal publications, so the published manuscript may contain errors. (2) The connection between model parameters and data is often ambiguous; making this link transparent is fundamental to building large-scale models that integrate different physiological subsystems. (3) The functional limitations of a model do not become apparent until significant time and effort have been invested in model implementation, application and coupling. (4) There are few public forums where feedback, experiences and critique of existing published models can be shared. (5) The experimental data used to parameterize and validate computational models are rarely available to the community in convenient, usable formats.
Each of these issues undermines confidence and impairs the application and extension of models by people other than the developers or those with specific expertise in model development. As discussed above, a number of cell modelling languages and tools have been developed (CellML, SBML, JSim), and using these, and other established programming languages, cell models can be made freely available. Furthermore, there is ongoing discussion of the development of FieldML (http://www.physiome.org.nz/fieldml/pages/), a mark-up language that will enable the representation of structural and continuum information about biological and physical entities. This will allow the unambiguous machine-readable representation of structural and tissue-based models. Running versions of models, provided by their authors in these formats, represent a significant step towards overcoming issue 1. Furthermore, a model that is compliant with the MIRIAM rules guarantees machine readability, an unambiguous description of the model, consistency with the published model, and consistency between published results and simulation output.
Addressing issues 2–5 will require the community to build on these initiatives and to develop openly available resources that disseminate models linked to the data sets used to parameterize them. We suggest that two types of entities should be collected and published online in a physiome database: published models, including complete code for simulation, and peer-reviewed published data sets in accessible electronic formats. The first of these is the domain of the MIRIAM standard. Model entries in the database will be annotated using established ontologies and will include working, executable code, using either freely available tools or an established programming language (C, Matlab, Fortran, Pascal). These marked-up executables, together with digitized data sets (see point 1 below), will ideally be available as part of the review process. This will enable the reviewer and user community to curate entries in the database with the following tools and criteria:
Below is the list of objective criteria that we propose for the classification of computational models of cellular function. Each model is classified in each of the following categories as: (A) satisfies, (B) does not satisfy, or (NA) not applicable. This classification is not intended as a judgment on the validity of a given model or approach, but rather to help define the scope and applicability of a model for potential users.
We now consider the models, from our own work, described above. The classification of each of the models against these criteria is given in Table 1.
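A table of ratings such as Table 1 maps naturally onto a small data structure. The Python sketch below encodes the three ratings (A, B, NA) and renders a models-by-criteria table; the criterion labels C1–C5 are placeholders for the proposed criteria, and the models and ratings shown are hypothetical, not those reported in Table 1.

```python
from enum import Enum


class Rating(Enum):
    SATISFIES = "A"
    DOES_NOT_SATISFY = "B"
    NOT_APPLICABLE = "NA"


def classification_table(models, criteria):
    """Render models x criteria as a plain-text table.

    models: dict mapping model name -> dict mapping criterion -> Rating.
    Raises KeyError if any model lacks a rating for a criterion, so that
    every model is classified in every category.
    """
    rows = [["Model"] + list(criteria)]
    for name, ratings in models.items():
        rows.append([name] + [ratings[c].value for c in criteria])
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join("  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
                     for row in rows)


# Placeholder criterion labels and hypothetical ratings, for illustration only.
criteria = ["C1", "C2", "C3", "C4", "C5"]
A, B, NA = Rating.SATISFIES, Rating.DOES_NOT_SATISFY, Rating.NOT_APPLICABLE
models = {
    "Model X": dict(zip(criteria, [A, A, B, NA, A])),
    "Model Y": dict(zip(criteria, [B, A, A, A, NA])),
}
print(classification_table(models, criteria))
```

Using an enumeration rather than free text restricts each entry to the three permitted ratings, which keeps classifications comparable across database entries.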
In the above section we have proposed a set of criteria for models in physiome databases, in addition to MIRIAM compliance, which we hope will increase confidence in the use and reuse of biophysically based models of biological and physiological systems. These criteria require a transparent connection between experimental data and model representation, and define a set of objective model characteristics that will help to quantify the scope of a given model.
It would be naïve, however, not to consider the difficulties with implementing such a process. The culture of scientific publishing rewards the creation and publishing of new models rather than critiquing or reviewing existing work. The classification of models according to a set of criteria, as proposed above, may require significant investment of resources and, perhaps, requires new ways to recognize and to provide incentives for individual involvement.
As suggested in the MIRIAM proposal, an initial curation process will be most effective if performed by the model author, rather than post hoc by a separate curator. However, if models are to fulfil their role of providing both qualitative (mechanistic) and quantitative (experimental data) understanding, it will be vital that there is a forum for open and robust critique of models. This debate could take the form of challenging models with new data sets as they become available, or of critiquing the modelling assumptions or approaches used in deriving a model. Developing a forum that encourages open debate amongst experts and users and provides useful information for non-experts, while minimizing unproductive conflict, would clearly require skilled mediation and a well-established code of conduct. However, as argued in the Introduction, we believe this type of curation will be an essential process for the ongoing development of integrated computational models.
We have outlined a preliminary plan that expands the currently proposed criteria for model curation, and we have assessed four models from our own work against the proposed criteria. We hope that this proposal will itself generate dialogue and debate within the biological modelling community. Our five criteria for model assessment have been selected for their primary relevance to metabolic and electrophysiological models. However, any ‘final’ set of criteria must of course be selected and adopted by the community, and may require the formulation of additional criteria, or even of alternative lists for classifying models based on other frameworks, e.g. network inference models for gene–gene interactions, or signalling pathways. We see this goal as falling firmly under the aegis of the Physiome Project, motivated by the pressing need to establish standards that facilitate communication and debate about models, and to accelerate the use, implementation and review of models, and their connection with data, by the scientific community.
The authors would like to thank Professor Peter Hunter for helpful discussions. This work was supported by the Marsden Fund of the Royal Society of New Zealand through grant No. 04-UOA-177 and by National Institutes of Health grant No. EB005825.
Glossary available online at http://jeb.biologists.org/cgi/content/full/210/9/1576/DC1