Multiscale modeling is essential to integrating knowledge of human physiology starting from genomics, molecular biology, and the environment through the levels of cells, tissues, and organs all the way to integrated systems behavior. The lowest levels concern biophysical and biochemical events. The higher levels of organization in tissues, organs, and organism are complex, representing the dynamically varying behavior of billions of cells interacting together. Models integrating cellular events into tissue and organ behavior are forced to resort to simplifications to minimize computational complexity, thus reducing the model’s ability to respond correctly to dynamic changes in external conditions. Adjustments at protein and gene regulatory levels shortchange the simplified higher-level representations. Our cell primitive is composed of a set of subcellular modules, each defining an intracellular function (action potential, tricarboxylic acid cycle, oxidative phosphorylation, glycolysis, calcium cycling, contraction, etc.), composing what we call the “eternal cell,” which assumes that there is neither proteolysis nor protein synthesis. Within the modules are elements describing each particular component (i.e., enzymatic reactions of assorted types, transporters, ionic channels, binding sites, etc.). Cell subregions are stirred tanks, linked by diffusional or transporter-mediated exchange. The modeling uses ordinary differential equations rather than stochastic or partial differential equations. This basic model is regarded as a primitive upon which to build models encompassing gene regulation, signaling, and long-term adaptations in structure and function. During simulation, simpler forms of the model are used, when possible, to reduce computation. However, when this results in error, the more complex and detailed modules and elements need to be employed to improve model realism. 
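The "stirred tank" formulation above can be sketched in a few lines: two well-mixed subregions exchanging a single solute by passive diffusion reduce to a pair of ordinary differential equations. The volumes and exchange conductance below are illustrative placeholders, not values from the model.

```python
from scipy.integrate import solve_ivp

# Two well-stirred subregions (e.g., cytosol and an organelle space)
# exchanging one solute by passive diffusion. Volumes V1, V2 (mL) and
# the exchange conductance PS (mL/s) are illustrative values only.
V1, V2, PS = 1.0, 0.5, 0.2

def dcdt(t, c):
    c1, c2 = c
    flux = PS * (c1 - c2)            # net diffusional flux, region 1 -> 2
    return [-flux / V1, flux / V2]   # concentration changes in each tank

sol = solve_ivp(dcdt, (0.0, 60.0), [1.0, 0.0], rtol=1e-10, atol=1e-12)
c1, c2 = sol.y[:, -1]
total_mass = V1 * c1 + V2 * c2       # conserved: equals initial V1 * 1.0
```

The same pattern generalizes to transporter-mediated exchange by replacing the linear flux term with a saturable one.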
The processes of error recognition and of mapping between different levels of model complexity are challenging but are essential for successful modeling of large-scale systems in reasonable time. No established methodology from the computational sciences yet exists for this purpose.
Multiscale modeling of human physiology serves to integrate knowledge from genomics and molecular biology, through the levels of cell, tissue, and organ modeling, to integrated systems behavior. Biochemical system models need to be expressed by fully detailed dynamic equations rather than in matrix forms for steady states. They must account for the thermodynamics, meaning that every reaction is reversible and obeys Haldane constraints, and they must account for appropriate balances of mass, energy, and constituent fluxes, including the solute mass bound to enzymes. Tissue models are composed of these cellular models with appropriate anatomic measures of volumes and distances. Reduced forms are in use for steady-state tracer analysis of images from positron emission tomography and magnetic resonance imaging. Physiological modeling, however, requires handling transients in response to external changes, and reductions to steady-state kinetics are not in general satisfactory.
Models of cell-blood solute exchanges and reactions were necessarily simplified to be computed fast enough to aid in devising hypotheses and designing critical experiments. Reduced-form models must be parameterized to reproduce the response functions of the full model for any given element or module—an optimization that results in a descriptor that is correct over only a limited range of operation. Each successive reduction degrades adaptability. For long-term time-dependent solutions to changes in state or external conditions, the reduced models err. Error correction requires resorting to the more complex and detailed forms of the modules and elements. The process of recognition—going down the hierarchy of model forms and back up—is difficult, prone to error, and is time consuming, but it is essential for modeling large-scale systems in reasonable time. Algorithms for this are still primitive.
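The parameterization of a reduced-form module against a full model's response function is, at bottom, an optimization over a limited operating range. A minimal sketch, with a two-exponential response standing in for an expensive full module and a one-exponential surrogate as the reduced form:

```python
import numpy as np
from scipy.optimize import least_squares

t = np.linspace(0.0, 10.0, 200)

def full_model(t):
    # Stand-in for an expensive, detailed module: two-exponential response.
    return 0.7 * np.exp(-0.5 * t) + 0.3 * np.exp(-3.0 * t)

def reduced_model(p, t):
    # Cheaper one-exponential surrogate with free amplitude and rate.
    a, k = p
    return a * np.exp(-k * t)

# Fit the surrogate to the full model's response over this operating range.
fit = least_squares(lambda p: reduced_model(p, t) - full_model(t), x0=[1.0, 1.0])
max_err = np.max(np.abs(reduced_model(fit.x, t) - full_model(t)))
# The fit is good but not exact: the surrogate is valid only over the
# range (and the kind of input) it was fitted to.
```

The residual error `max_err` is small but nonzero, which is exactly the point: each such reduction trades accuracy, and hence adaptability, for speed.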
One useful example is the physiological response to exercise, which shows rather tight regulation of blood pressure in spite of large changes in body metabolism, cardiac output, and vascular resistance. In general, in the adult, there is a nested hierarchical control, the lowest level being the regulation of transcription and the generation and degradation of proteins. The next levels concern the biophysics and biochemistry of cells and tissues, which is where we start our cell-to-organ-to-cardiorespiratory system modeling, defined in Figure 1. The higher levels of organization (tissues, organs, and organism) are increasingly complex, representing the dynamically varying behavior of the billions of cells of the body operating together.
Currently, computational models integrating cellular events into tissue and organ behavior utilize simplifications of the basic biophysical and biochemical models in order to reduce the complexity to practical computable levels. This compromises the ability of the models to respond correctly to dynamic changes in external inputs, for these normally require adjustments in rates at the biophysical, biochemical, and gene regulatory levels.
This is a novel approach to carrying out simulations, with continuous monitoring and dynamic control of the computing carried out at subsidiary modules in a multi-scale model system. The idea is to treat a computational simulation process as no different from other complex engineering processes, such as those at Boeing or DuPont. The procedure is based on established methodologies from systems and control engineering, but even so is not definable as a recipe. This is a work in progress, and many others will contribute to advancing this technology. A key difference between simulations in physical sciences and in physiology is that the physiological processes are gradually being discovered and remain incompletely known, while a physical process, such as building an airplane, is composed of exactly defined parts. However, in both cases a common principal goal is to determine integrative behavior. The risk of error in physiological modeling is great, simply because each hypothesized descriptor for a biological function serves not only as the "output" of one computational module but is the "input" of another. Error propagates. The adaptive approach to be described strives to control the complex computational processes in a simulation, but it is not yet clear that any one completely general strategy can be prescribed.
This integrative system approach to multiscale modeling extends from four complementary areas of research: cardiovascular physiology and medicine, systems optimization and parameterization, signal analysis and decision control, and molecular and cellular biophysics and mathematical biology. These can be applied together to formulate computationally feasible multiscale system descriptions, allowing adaptation to interventions or to changing external driving forces. The approach encompasses four principles of modern, “open” science: (1) it uses new computational methods spanning multiple scales (regulation of transcription, cellular energetic metabolism, cell-to-cell interactions, integrated organ contractile function); (2) it bridges current levels of modeling from the level of cardiovascular and respiratory system responses to exercise, trauma, and activity with the cellular levels of receptor signaling, cellular metabolic dynamics, and the thermodynamics of energy transformation and transduction, and provides a framework for guiding research into the regulation of transcription in muscular tissues (e.g., cardiac remodeling, training in athletes); (3) it results in highly integrated models of cardiac and skeletal muscle systems showing adaptive responses to lowered or raised oxygen and substrate supply, changed demand for contractile work, sympathetic and parasympathetic neuronal input levels, and encompasses the remarkable heterogeneity of normal regional flows and metabolic function that exists in both heart and skeletal muscles; (4) it will produce models for research and education when operated under an open source philosophy. The results should be models with manuals and tutorials, source code, a modeling system for their operation, a Web site for their dissemination, and truly full publication, allowing complete reproducibility of the models by others. This is entirely in tune with the developing emphasis on full cooperativity in scientific research.
An adaptive simulation methodology appears to be more or less essential to furthering multiscale modeling of complex physiological systems during the foreseeable future, simply because computing resources are not infinite. It is vital to compute “at the speed of thought,” that is, to display model solutions so fast that they are useful as “mind-expanders,” projecting the results of “what if?”, and the queries exploring the behavior of the system. Consequently, models must be composed of modules of reduced complexity: one does not build a body out of atoms or a truck out of quarks. The term “adaptive” has two implications: In the biological context, it means that the physiological responses must be adaptive to the changing environment of a cell or an organ, and therefore it means that any computational model should be able to deal with this challenge. In the engineering context, our approach is to use adaptive simulation technology, wherein the multiscale simulation is treated as a complex computing process—not very different from a chemical factory or an aircraft plant with an engineering control approach applied to it.
The overall strategy summarized in Table 1 can be organized into three sets of tasks:
The identification of relationships between subcellular module parameters and overall systems behavior—linking, for example, cardiac cellular energetics with contractile force development, and beat-to-beat ejection of blood from the cardiac chambers with the regulation of blood pressure and cardiac output in response to perturbations—will prove useful in defining conditions under which specific subsets of lower-level modules might best be replaced simultaneously, each by an alternative reduced-form module. Defining such associations will yield computational strategies and technologies and guide their application to cardiac and muscle physiology.
The central product of such an effort provides examples of all of the techniques in a form that can be applied to different multiscale modeling applications, and it can be used by other investigators. The specific systems model of the cardiorespiratory system serves as a vehicle for research and teaching and for the training of physicians and other medical personnel. The current version is limited to a higher-level representation giving appropriate pressures and flows in the circulatory and respiratory systems, gas exchange including oxygen and carbon dioxide binding to hemoglobin, bicarbonate buffering in the blood, H+ production in the tissues, control of the respiratory quotient (RQ), baroreceptor and chemoreceptor feedback control of heart rate (HR), respiratory rate, and peripheral resistance. This is a large-scale model that summarizes the information from key studies done over the past century and adapts to changing conditions in a limited way. It is available now for public use as the model CVRESP (at http://nsr.bioeng.washington.edu/software/models/) and it serves as a basis not merely for the understanding of physiology, but for pathophysiological processes and their consequences. It is not yet a good framework for pharmacokinetics and pharmacodynamics, as the tissue processes are not described broadly enough to handle blood-tissue transcapillary exchange and cellular metabolism.
The next stage is the linking of a representative cell model with the overall circulatory system model in order to characterize the relationships between tissue events (varying requirements for blood flow, generation of metabolites, substrate exchanges). While this can be done at a fairly simple level for handling a few solutes at a time, as we have done using axially distributed capillary-tissue exchange models accounting for gradients along capillaries,1,2 these models are themselves computationally demanding when used in their fully expanded forms, accounting for intra-organ flow heterogeneity and for exchange, metabolism, and binding in red blood cells, plasma, endothelial cells, interstitium, and the organ’s parenchymal cells, requiring up to the equivalent of 80,000 differential equations.
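A method-of-lines sketch illustrates how such axially distributed capillary-tissue exchange models are built: the capillary axis is discretized into well-mixed segments, each exchanging with an adjacent tissue region, so that even a small version already requires 2N coupled ODEs (the fully expanded forms run to tens of thousands). All parameter values here are illustrative only.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Axially distributed capillary-tissue exchange by the method of lines:
# plasma concentration is convected through N segments (upwind scheme),
# each exchanging passively with a stationary interstitial region.
# F (flow), Vp, Visf (volumes), and PS (permeability-surface area
# product) are illustrative values, not fitted parameters.
N = 40
F, Vp, Visf, PS = 1.0, 0.05, 0.15, 0.5
Cin = 1.0                       # constant (step) inflow concentration

def rhs(t, y):
    Cp, Ci = y[:N], y[N:]
    Cup = np.concatenate(([Cin], Cp[:-1]))    # upstream neighbor
    dCp = (F * N / Vp) * (Cup - Cp) - (PS / Vp) * (Cp - Ci)
    dCi = (PS / Visf) * (Cp - Ci)
    return np.concatenate([dCp, dCi])

sol = solve_ivp(rhs, (0.0, 5.0), np.zeros(2 * N), method="BDF",
                rtol=1e-7, atol=1e-9)
Cout = sol.y[N - 1, -1]         # outflow concentration: -> Cin at steady state
```

Adding red cell, endothelial, and parenchymal regions, reactions, and flow heterogeneity multiplies the equation count rapidly, which is how the full models reach the scale cited above.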
To characterize cellular support for a high-level, adaptive model3 of the contracting heart, one must relate metabolic processes to contractile performance. To do this requires a basic cell unit accounting for metabolism, ionic regulation, and excitation–contraction (E-C) coupling, as well as contractile force development. It thus must be a more comprehensive model than the “eternal cell” model4 that accounts only for energetics and ionic balance. While a useful cell-level model can lack elements on regulation of transcription and translation to form protein, and lack inflammatory and immune response systems, it will be a basic fundamental structure upon which such features may be added.
A basic cell model adequate to serve as a component of a contracting heart (level 3) contains cellular metabolic networks, oxidative phosphorylation and the redox system, response systems for adrenergic stimuli and insulin, stretch-activated calcium fluxes and their modulation of metabolic fluxes, and contractile force development in accord with heterogeneities of electrical activation times and sarcomere lengths. Control of contractile performance is defined by influences external to the cardiomyocyte, namely levels of sympathetic and parasympathetic stimulation and circulating hormones, and cardiac filling pressures (preload), the arterial pressure against which the heart ejects blood (afterload), the HR, and the peripheral vascular resistances. Integration of these events into a whole organ model accounting for regional heterogeneities in flow5,6 and contractile stress7,8 gives rise to level 4 organ models with patterns of regional flows and oxygen consumption rates that are fractals in space but are stable over time. When put into the context of a fully developed three-dimensional finite element heart with all the correct fiber and sheet directions, one should be able to discern whether the spatial heterogeneity of local flows is governed by the local needs for energy for contractile force development. Because all organs demonstrate marked flow heterogeneity, what is learned from cardiac muscle can be applied to other organs.
Several hundred published models relate to our efforts in composing a five-level model with full sets of equations—some in texts9–12 and most as articles. At the cardiorespiratory system level, the classic model of Guyton, Coleman, and Granger13 was the pioneer. Of the many recent models at that level, there are several articles by Clark’s group (e.g., Lu et al.)14 that illustrate a coherent and extensive body of work. Exemplary models—meaning carefully documented, published with full sets of equations, parameters, and initial conditions, and reproducible—are remarkably few, but include Hodgkin and Huxley,15 Beeler and Reuter,16 Winslow et al.,17–19 Noble,20 Rideout,12 and Sherman,21 some of which can be found at our Web site (http://nsr.bioeng.washington.edu) along with many subcellular level, transport, and electrophysiological models.
Cellular tissue and organ level models will serve as the vehicles for systems integration22 of knowledge coming from molecular, cellular, and organ level observations of many sorts, including mRNA arrays or array sequences, gas chromatography/mass spectrometry (GCMS), metabolomic arrays, nuclear magnetic resonance (NMR), proteomic arrays,23 gene regulation, electrophysiology, NMR phosphoenergetics, membrane potential, ion regulation, and so on. While the “eternal cell” lacks the key ingredients for gene regulation, it is precisely positioned to accept recent developments in the synthesis of signaling networks and to carry the developments in specific genetic variants, as in the studies by Saucerman et al.24,25 The subsequent phase is the joining of signaling and genetic regulatory networks.26 Beyond the metabolic biochemical network, the signaling network (i.e., ligand-receptor binding, guanosine triphosphatase (GTPase) activation, intracellular phosphorylations) and genetic regulatory networks (gene transcription initiation, activations, repression, etc.) have been extensively studied in recent years. In recent work, Palsson and his colleagues have developed interesting approaches to signaling and regulatory networks from the constraint-based approach that integrates these models.23,27,28 In our own work we have an ideal preparation for studying the regulation of transcription of contractile proteins, namely the chronically paced heart of ambulatory dogs, in which one gets hypertrophy in predictable regions of increased work and atrophy in regions of diminished work.29,30 The “eternal cell” model, by providing a basic energetically and metabolically based picture, is in our view a key step to augmenting the powers of genomic and molecular biology. It is also a key to determining how genetic and cellular interventions can affect the physiology of the whole body (which is a long way beyond this project).
Constructing a multiscale model is currently being done by brute force, putting all the pieces into a full “reference” model (Step 2 of Table 1). This is doomed to be computationally slow. The models are to be used both as “mind-expanders” in exploring systems behavior and providing insight and prediction, and as data analysis tools, so it is vital “to compute at the speed of thought.” Thus, in aggregating large sets of models at a lower tier to form the next higher tier model, one must reduce the complexity of the individual modules in order to minimize the increase in computation time.
To meet the objectives of Step 3 of Table 1, traditional methods of optimization and parameter identification are being used to fit computationally simpler, reduced forms of the mathematical models: components at each level are fitted to their fully developed forms. For each component subsystem at each of the five levels of modeling, one or several reduced models are being developed (the "several" indicating that differing levels of approximation are used). Over a five-level hierarchical system, one must use successive levels of reduction, each time coalescing a few lower-level modules into a computationally more efficient composite model. The big problem with this approach is that each up-the-scale reduction limits the range of operational validity of the composite model.
How to determine when one should modify a reduced submodel or resort to the full submodel is important, is difficult, and is the part of any multiscale code that one would like to minimize, since it does not contribute directly to obtaining a good model solution. The tricky aspect of this is that when one is using external driving conditions or changes in internal conditions over time to govern the model solution, then decisions concerning the adequacy of the algorithms need to be made as the solution progresses. New research into devising algorithms is needed for detecting when the reduced model forms begin to lose accuracy in representing the physiology because of changes in external conditions or the shifting of the internal conditions, causing a reduced-form module to move out of its local range of validity. An adjustment requires either making changes in the parameters of the reduced-form module or resorting to an alternative reduced-form module or, as a last resort, reincorporating the full-form module into the computation. The basic decision control here is experiential, as it uses prior determination of the ranges of validity for detection of the need to use an alternative computation.
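In miniature, such experiential decision control amounts to consulting a precomputed validity range before trusting the reduced form. The two "modules" below are toy stand-ins (a saturating full form and its linearization at zero), used only to show the switching logic:

```python
# Experiential decision control in miniature: a reduced-form module is
# trusted only inside the precomputed validity range of its input; outside
# it, the more expensive full-form module is substituted. Both "modules"
# are toy stand-ins: a saturating full form and its linearization at 0.

def full_module(s):
    return s / (1.0 + s)          # detailed (saturating) form

def reduced_module(s):
    return s                      # linearization, fitted near s = 0

VALID_RANGE = (0.0, 0.2)          # determined beforehand, experientially

def evaluate(s):
    if VALID_RANGE[0] <= s <= VALID_RANGE[1]:
        return reduced_module(s), "reduced"
    return full_module(s), "full"

y_in, used_in = evaluate(0.1)     # inside the range: cheap form suffices
y_out, used_out = evaluate(5.0)   # outside: full form must be used
```

In a real multiscale code the "input" being monitored is a vector of external driving conditions and internal state variables, and the fallback hierarchy has several tiers, but the structure of the decision is the same.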
An alternative approach to avoiding errors due to model reduction is to use signal analysis to detect changes in the character of solution variable values that indicate lessening accuracy or deviation from a range of validity. “Lessening accuracy” implies failure in numerical methods for obtaining the solution. “Deviation from validity” implies that the solution variables show a change in a quantifiable characteristic that is not the variable value itself, but a change in the frequency or power content of a variable or set of variables that is associated with the overall systems model pushing the limits of validity. A detection method might also be based on system stability (e.g., in terms of Lyapunov exponents for subsystems).
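A sketch of such signal-based detection: monitor the high-frequency power of a solution variable in sliding windows and flag windows whose power departs from a baseline by more than a threshold. The signal here is synthetic (quiet noise that then acquires a new oscillatory component), and the tenfold threshold is an arbitrary choice for illustration:

```python
import numpy as np

# Signal-based deviation detection: track the high-frequency power of a
# solution variable in sliding windows and flag windows whose power
# exceeds a multiple of the baseline. The signal is synthetic: quiet
# noise at first, then a new oscillatory component appears at t = 10.
rng = np.random.default_rng(0)
t = np.arange(0.0, 20.0, 0.01)
x = 1.0 + 0.001 * rng.standard_normal(t.size)
x[t > 10.0] += 0.05 * np.sin(2 * np.pi * 5.0 * t[t > 10.0])

def band_power(seg):
    seg = seg - seg.mean()
    spec = np.abs(np.fft.rfft(seg)) ** 2
    return spec[5:].sum() / seg.size      # power above the lowest bins

win = 200                                  # 2-second windows
powers = np.array([band_power(x[i:i + win])
                   for i in range(0, x.size - win, win)])
baseline = powers[:4].mean()               # known-good early windows
flagged = np.nonzero(powers > 10.0 * baseline)[0]
# Window 5 (t = 10-12 s) is the first to be flagged.
```

A production detector would use a validated band definition and threshold, or a stability measure such as the Lyapunov exponents mentioned above, rather than these ad hoc choices.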
It is essential to develop strategies for efficient computation of adaptive multi-scale models in order to solve a major problem in multilevel simulation in biology: retaining adaptability during the computing of whole-system responses to changes in activity and environment. Extending this concept within the context of "open source science," it would be reasonable to project combining all of these algorithms into a well-documented software package to provide to the user community. This has been a central aspect of our Simulation Resource facility, but even there we have found it difficult to bring all the research models up to a level fit for public distribution. More than 90% of the effort (cleaning up, protecting against incorrect parameterization, validating against data, archiving, and documenting) comes after proving out an adequate research-level model. This is an expensive and time-consuming process that is difficult to support financially.
The multiscale cardiac and cardiovascular-respiratory system model brings together many transport models1,9,31–50 with work by us on cardiac cell modeling3,4,51–62 and by many others,7,8,17,18,25,63–68 in models incorporating both cardiac and endothelial cell uptake and metabolism;69–75 and for accounting for regional flow heterogeneities.5,6,37,76–80 Many of these are available at our Web site.
Transport models have been developed using both analytic and numerical methods of solution. The more detailed ones account for both passive diffusional transport and carrier-facilitated transport across capillary endothelial and parenchymal cell membranes, intracellular and extracellular binding, the different velocities of erythrocytes and plasma, concentration gradients between inflow and outflow, and enzymatically facilitated reactions in all cell types and extracellular regions.1,2,31,36,47,59,81,82 One general blood-tissue exchange model, GENTEX, accounts for most of these processes (for up to seven solute species simultaneously), and it is written in finite element form. Its code has the virtue that all processes not used by a specific solute are automatically removed from the computation, with the capability of being used as a one-compartment model (single exponential impulse response) or as a fully distributed model that accounts for flow heterogeneity for all species using up to a maximum of 80,000 differential equations.83 This model is used for analyzing image sequence data from PET (positron emission tomography), obtaining parametric images (e.g., regional blood flows, regional oxygen consumption, regional receptor densities). Under our simulation system XSIM, we have carried out the parameter optimization on a supercomputer while displaying the results on a laptop or desktop in one's office; this capability is being implemented within our simulation system JSim as well.
Our “eternal cell” model,4,57 a virtual or computational cell model, is “eternal” in that it assumes there is no proteolysis and no protein synthesis. The initial configuration centers on energetics, fatty acid and glucose metabolism, the tricarboxylic acid cycle, the electrophysiology of time-dependent and voltage-dependent ionic currents, the ionic pumps and exchangers, the calcium cycling and the processes of calcium release and reuptake by the sarcoplasmic reticulum during E-C coupling, purine nucleoside and nucleotide metabolism in myocytes and endothelial cells, oxygen and CO2 metabolism, and mitochondrial oxidative phosphorylation.
Reaction schemes for biochemical reactions in energy metabolism, biosynthesis, transport, etc., are well-developed components of the biochemical networks (see Table 2). The constraints on the system are physicochemical: balances of mass, charge, osmolarity and volume, oxidoreductive state, and energy balance. The cell system is computed as microcompartmentalized sets of (usually ordinary) differential and algebraic equations. The modules run under JSim. Some components are incomplete (e.g., the ion exchangers and the Ca pump). For many reactions the equilibrium is known, providing exact thermodynamic (Haldane) constraints.84 A completely configured model should be ready for distribution in 2005 and will then become a component of the multiscale cardiovascular respiratory systems model. While most of the enzymes involved in these reactions are well characterized by reduced-form Michaelis-Menten-type equations, these account only for fluxes and do not give an exact accounting for mass balance, since they omit the amount of substrate bound to enzyme and must therefore (in most cases) be augmented to account for the additional mass. This is essential for analyzing tracer experiments.
At the biochemical network level, we use a thermodynamically based flux-balance analysis of steady-state metabolic networks. This approach is capable of predicting metabolic fluxes in large networks with hundreds of reactions and biochemical species. On the whole-cell level, steady-state metabolic network models have been extensively characterized via flux balance analysis and constraint-based optimization.85 Recently this approach was augmented by adding thermodynamic constraints, which establishes a conduit between standard metabolic control analysis (MCA) and biochemical kinetic modeling,86 not yet taken up by others.87 More specifically, constraint-based modeling with thermodynamics is able to provide estimates of the steady-state concentrations of metabolites, as well as rate constants in terms of biochemical conductances, and it provides flux estimates with much narrower constraints than does simple stoichiometric flux balance analysis. The difference in the equations used is not great, the essential feature being to account for the reversibility of each reaction that is implicit in having an equilibrium condition: the Haldane constraint (Figure 2).
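A toy stoichiometric flux-balance problem shows the skeleton of the approach: mass balance S v = 0 at steady state, capacity bounds on each flux, an irreversibility constraint standing in for a thermodynamic one, and a linear objective. Constraint-based modeling with full thermodynamics goes well beyond this, but the linear-programming core is the same. The network and all numbers below are invented for illustration:

```python
import numpy as np
from scipy.optimize import linprog

# Toy flux-balance problem: uptake -> A -v1-> B -v2-> C -> export.
# Steady state requires S v = 0; v2 is made irreversible (lower bound 0)
# as a stand-in for a thermodynamic constraint (Delta G < 0 assumed).
#             v_in  v1    v2   v_out
S = np.array([[1.0, -1.0, 0.0, 0.0],    # species A
              [0.0, 1.0, -1.0, 0.0],    # species B
              [0.0, 0.0, 1.0, -1.0]])   # species C
bounds = [(0.0, 10.0),     # uptake capacity
          (-10.0, 10.0),   # v1 left reversible
          (0.0, 10.0),     # v2 forward-only (thermodynamic constraint)
          (0.0, 10.0)]     # export capacity
res = linprog(c=[0.0, 0.0, 0.0, -1.0],  # maximize v_out
              A_eq=S, b_eq=np.zeros(3), bounds=bounds)
v = res.x                                # optimal steady-state flux vector
```

The thermodynamic augmentation described above tightens such bounds reaction by reaction and yields concentration estimates as well, which plain stoichiometric FBA cannot.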
The existence of a catalyst for the reaction has the life-providing benefit of lowering the energy of activation, increasing the probability of reaction, often by a million to a billion times, but it does not affect the free energies of either substrate or product (as shown in Figure 3).
Given that all reactions are reversible, our view of the equations for fluxes needs to be matched to this concept. No longer can we think of an enzyme-facilitated reaction as being only forward, S → P; the backward reaction must be incorporated into the calculation. Thus, instead of J_S→P = (Vmax · S)/(Km + S), where J_S→P is the forward flux in moles/s, we calculate a net forward flux. The general expression is

J_net = Vforward · (S/KS) · [1 − P/(Keq · S)] / (1 + S/KS + P/KP),

where Vforward is the maximal forward reaction flux at high [S] and zero [P], KS is k−1/k1, KP is k2/k−2, and Keq is the equilibrium ratio [P]eq/[S]eq fixed by the Haldane relation Keq = (Vforward · KP)/(Vreverse · KS). The term in brackets accounts for the reverse reaction rate in accord with the Haldane conditions and therefore accounts for the free energies and the reversibility of the reaction.
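The Haldane constraint can be checked numerically: with rate constants for S + E ⇌ ES ⇌ E + P chosen arbitrarily (illustrative values only), the net flux computed from the reversible expression is positive with no product present and vanishes exactly when P/S reaches Keq, as thermodynamics requires.

```python
# Numerical check of the Haldane constraint for S + E <-> ES <-> E + P.
# Rate constants (1/s or 1/(M*s)) and Etot are illustrative values only.
k1, km1, k2, km2 = 1.0e6, 1.0, 10.0, 1.0e4
Etot = 1.0e-3
KS, KP = km1 / k1, k2 / km2
Vf, Vr = k2 * Etot, km1 * Etot
Keq = (Vf * KP) / (Vr * KS)      # Haldane relation fixes the equilibrium

def j_net(S, P):
    # Net reversible Michaelis-Menten flux: forward term minus reverse
    # term over the shared saturation denominator.
    return (Vf * S / KS - Vr * P / KP) / (1.0 + S / KS + P / KP)

S = 1.0e-4
forward = j_net(S, 0.0)          # positive: no product present
at_eq = j_net(S, Keq * S)        # zero: net flux vanishes at equilibrium
```

Past the equilibrium ratio the same expression turns negative, so a single formula covers both directions of the reaction.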
While the flux in a metabolic network depends on the network structure, the input and output, as well as the level of gene expression (enzyme activity), the biochemical conductance through a given reaction is related to the rate constants for the reaction, which include the forward flux term, Vforward. This is the product of two inseparable terms: the total enzyme concentration times the forward reaction rate to form product. Thus, the flux is dependent on the rate of expression of the protein minus its rate of proteolysis, hence the level of gene expression. The proteolytic rate may be variable: many enzymes are degraded more slowly when substrate is bound, protecting them.88
Consideration of kinetic models for cellular signaling and the subsequent phosphorylation-dephosphorylation cycles with GTPase activation has implications not only for energy utilization but also for the accuracy, efficiency, and robustness of the cell's biological functions through its switches and signal relays.84,89,90 These studies, based on kinetic models of a protein network, revealed cooperativity different from traditional allosterism and showed that sharp responses (high cooperativity) can be achieved through monomeric protein-protein interactions using cellular energy derived from the hydrolysis of adenosine triphosphate (ATP) or GTP.
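The sharpness of such switch-like responses can be illustrated with the classic zero-order ultrasensitivity calculation for a kinase/phosphatase (push-pull) cycle (Goldbeter-Koshland); this is an accessible stand-in for, not a reproduction of, the energy-accounting kinetic models cited above. When both converting enzymes operate near saturation (small scaled Michaelis constants), a 10% change in the kinase/phosphatase activity ratio flips the phosphorylated fraction from near 0 to near 1:

```python
from scipy.optimize import brentq

# Zero-order ultrasensitivity in a kinase/phosphatase (push-pull) cycle:
# the classic Goldbeter-Koshland steady-state calculation.
def frac_phos(u, J1=0.01, J2=0.01):
    # W = phosphorylated fraction; u = ratio of kinase to phosphatase
    # maximal rates; J1, J2 = Michaelis constants scaled by total protein.
    # Steady state balances u*(1-W)/(J1+1-W) against W/(J2+W).
    f = lambda W: u * (1.0 - W) / (J1 + 1.0 - W) - W / (J2 + W)
    return brentq(f, 0.0, 1.0)

w_low = frac_phos(0.9)     # just below the switch: mostly unphosphorylated
w_high = frac_phos(1.1)    # just above it: mostly phosphorylated
# In the first-order regime (large J's) the same 10% change barely moves W.
w_shallow = frac_phos(1.1, 10.0, 10.0) - frac_phos(0.9, 10.0, 10.0)
```

The continuous ATP or GTP hydrolysis that keeps such a cycle away from equilibrium is what pays for the sharp response, which is the thermodynamic point of the cited studies.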
Cellular biochemical reactions often involve species with low copy numbers. Hence, a realistic model of an intracellular signaling network requires stochastic treatment of the biochemical reactions. Stochastic kinetics has become increasingly important in systems biology.91,92 Qian's recent work84,93,94 revealed that chemical oscillation is also intimately related to the energy input to the reactions. With the utilization of energy, the stochasticity of the biochemical reactions in a cell can be suppressed, and more precise temporal behavior is achieved. Different computational methods for carrying out stochastic simulations with improved computational speed have also been studied.
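A minimal Gillespie stochastic simulation (the standard exact algorithm for such low-copy-number kinetics) of a birth-death process, 0 → X and X → 0, illustrates the treatment; the rate constants are illustrative. The stationary copy-number distribution is Poisson with mean k_syn/k_deg:

```python
import numpy as np

# Minimal Gillespie stochastic simulation of a birth-death reaction
# (0 -> X at rate k_syn; X -> 0 at rate k_deg * n), the simplest case in
# which low copy numbers make a deterministic ODE treatment inadequate.
rng = np.random.default_rng(1)
k_syn, k_deg = 10.0, 1.0
n, t, t_end, t_burn = 0, 0.0, 2000.0, 100.0
tw_sum, tw_time = 0.0, 0.0
while t < t_end:
    a_syn, a_deg = k_syn, k_deg * n      # reaction propensities
    a0 = a_syn + a_deg
    dt = rng.exponential(1.0 / a0)       # time to the next reaction
    if t > t_burn:                       # time-weighted average after burn-in
        tw_sum += n * dt
        tw_time += dt
    t += dt
    if rng.random() * a0 < a_syn:        # choose which reaction fires
        n += 1
    else:
        n -= 1
mean_n = tw_sum / tw_time                # should be close to k_syn / k_deg
```

The faster approximate schemes alluded to above (tau-leaping and related methods) trade this event-by-event exactness for speed when propensities are large.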
Sigg, Qian, and Bezanilla95 developed a biophysical model of the thermodynamics and kinetics of voltage-dependent channels based on insight from the field of protein folding. The model is able to fit a wide range of experimental data from K- and Na-channels: in particular, the model fits both the fast and slow kinetics across several orders of time scale and may introduce a new approach to describing channel kinetics to contrast with the traditional Markovian models, just as Qian’s stochastic model for heterogeneous flow distribution in vascular networks contrasts with continuous flow spatial forms.80
These examples serve to show the level of detail that is required to develop models that are mathematically and scientifically valid. The devil is in the details, and the consequence of simplification is that it is difficult to maintain either mathematical accuracy or scientific validity through successive reductions. This argument would oppose using model reduction to gain speed, but the other side of the coin is that models will not, almost cannot, be used unless the results can be seen rapidly. The trade-offs are summarized in Table 3.
The first stage is to define the structure and content of a cardio-respiratory-skeletal muscle system model for oxygen exchange and substrate delivery at five levels of scale, defining components from the biophysical/biochemical level through to the fifth, the system level, using modular elements in a hierarchical fashion. Modules at each level need to be verified (for correct computation), validated (against experimental data), completely defined and documented (for reproducibility), and made publicly available for download on a simulation platform that runs under Linux, Unix, Windows, and Macintosh operating systems. The models and subsidiary modules provide solutions that give the time courses of the variables and the responses to perturbations, such as changes in blood pressure, HR, respiratory minute volume, cardiac contractility, cellular substrate usage, and mitochondrial membrane potential, in response to a demand for increased blood flow to the body such as during exercise. Our most recent models are available at http://nsr.bioeng.washington.edu/PLN/models/.
Many lower-level modules have been developed and validated. For example, we are using, in a cellular level model not yet available on our Web site, the NaK ATPase model of Chapman et al.,96 for which Kootsey provided his code in ScOP (an ordinary differential equation [ODE]-based language); we translated it into the simple equation-based mathematical modeling language of JSim and added the membrane potential influences. This was embedded in a composite model of cardiac cell electrophysiology built upon that of Rudy’s group,97 Winslow et al.,17–19 and Michailova and McCulloch.66 The NaK ATPase model was validated against the data provided by Skou98,99 and compared to the data provided by Lauger11 and to the solutions of the Post-Albers model, an alternative NaK ATPase model that Lauger described fully. On incorporation into the cell model, the parameters for the numbers of transporters were adjusted to give appropriate balances for Na and K in the whole cardiac cell beating at 1/s.
An example of a desired multilevel model is one for oxygen transport and consumption in the heart. In papers over the past twenty years there has been debate concerning the length and cause of a delay in the venous oxygen consumption response to an increase in metabolic rate, as occurs following an increase in HR or mitochondrial decoupling. The delay has been attributed to transport time through the blood, myoglobin buffering of O2, slow adenosine diphosphate (ADP) generation, or slow mitochondrial cytochrome oxidase activation. These are all probable causes, but their relative influences should be apportioned via a “complete” mathematical model incorporating the following elements going from the whole organ down to the biochemical/biophysical level:
This four-level model becomes a five-level model at the whole contracting heart level when intramural gradients in intratissue pressures and subendocardial to subepicardial flows are considered, especially at reduced perfusion pressures where subendocardial ischemia comes into play. The current contracting heart models of the investigators account for this compression and flow reduction but do not extend it to the compromising of cellular metabolism preferentially in the subendocardium.
The cellular level model, the “eternal cell,” a useful element in itself, covers three levels, from biophysical/biochemical models to cellular models. Major pieces already exist, due to the work of the past decades by many investigators. Cell models (myocyte, smooth muscle cell, endothelial cell) all interact in governing and in being influenced by interstitial fluid (ISF) concentrations. For example, in hypoxia where ATP is released from myocytes, the extracellular hydrolysis of ATP to adenosine 5′-monophosphate (AMP) is due to the ISF alkaline phosphatase, which is abundant at the arteriolar end of capillaries, not at the venous end. This localization requires using axially distributed models2,31,35–37,39,42,71,72,74,83,100 for representation of the convection-diffusion-reaction events. Next, the enzyme ecto-5′-nucleotidase, like its intracellular analog, hydrolyzes AMP to adenosine, which in turn relaxes smooth muscle by binding to a receptor or is rapidly taken up by endothelial and cardiac cells or degraded to inosine.2,59 While we hypothesized and accounted for the ecto-5′-nucleotidase in our modeling analysis,2,59 only recently has the importance of the enzyme been thoroughly demonstrated.106,107
The verification of such models is difficult because there are no analytical solutions with which to make comparisons. Partial approaches are to check numerical stability at different time step lengths and under different conditions of stiffness, to check conservation (of mass, charge, etc.), verifying steady-state behavior in limiting cases and where possible determining agreement with known analytical results or simplified solutions. Such procedures do not guarantee but help to ensure the correctness of the mathematical model (irrespective of its validity regarding experimental data), computational algorithms, and computer code.
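These partial approaches can be mechanized. The following is a minimal sketch (a hypothetical two-compartment exchange model, not one of the models discussed above) of three such checks: conservation of total mass, agreement under step halving, and comparison against a known analytical steady state.

```python
import numpy as np

def rk4(f, y0, t_end, dt):
    """Fixed-step RK4 integrator (illustrative, not production code)."""
    y, t = np.array(y0, float), 0.0
    while t < t_end - 1e-12:
        h = min(dt, t_end - t)
        k1 = f(y); k2 = f(y + 0.5*h*k1); k3 = f(y + 0.5*h*k2); k4 = f(y + h*k3)
        y = y + (h/6.0)*(k1 + 2*k2 + 2*k3 + k4)
        t += h
    return y

# Hypothetical two-compartment exchange: solute moves between stirred tanks.
k12, k21 = 0.3, 0.1
def f(y):
    return np.array([-k12*y[0] + k21*y[1], k12*y[0] - k21*y[1]])

y_coarse = rk4(f, [1.0, 0.0], 10.0, 0.1)
y_fine = rk4(f, [1.0, 0.0], 10.0, 0.05)

# Check 1: conservation -- total mass must stay at its initial value.
assert abs(y_fine.sum() - 1.0) < 1e-9

# Check 2: step halving -- solutions at dt and dt/2 should agree closely.
assert np.allclose(y_coarse, y_fine, atol=1e-6)

# Check 3: the steady state is known analytically: y1/y2 = k21/k12.
y_ss = rk4(f, [1.0, 0.0], 200.0, 0.1)
assert abs(y_ss[0]/y_ss[1] - k21/k12) < 1e-6
```

None of these checks alone proves correctness; together they raise confidence that equations, algorithm, and code are mutually consistent.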
Validation, the demonstration by comparison of model solutions with data that the model is a reasonable representation of the experimental data within its physicochemical, anatomical, and mass constraints, is the scientific core of the project; without it, the modeling would be a trivial exercise in defining a concept. True “validation” requires testing the model, component-by-component and level-by-level (all the way to the integrated systems level), against high-quality, carefully chosen data. Many of these data sets have already been selected implicitly, for each of the basic biophysical models has been built on experimental data. At this point we do not have all of those data sets in our computer system in digital form. Obtaining them is difficult and tedious, as it often requires going back to the original authors to ask them for their data. This often fails, for many investigators do not store their original data, feeling they have summarized them adequately in their papers; the papers, however, seldom provide enough of the raw observations and leave us with estimated means and variances. This is a big problem and is inspiring the National Institutes of Health (NIH) to demand that authors preserve data. (At the same time, however, the NIH is failing the research community by failing to provide repositories for physiological, pharmacological, immunological, or higher-level clinical information, except that which comes out of clinical trials.)
The digitized data that are available are mostly of time course responses to stimuli such as changes in concentrations, pH, tracer addition, oxygen deprivation, and so on. Other data are observations of enzyme levels, metabolite concentrations, anatomic measures, and composition. Fully detailed metabolic profiles that provide data during transients are needed and await technical developments in the field. We have shown that the use of constrained mass, density, and volume considerations powerfully resolves multiple observations into a self-consistent mass-conservative context for biochemical systems analysis.108
For validation we fit dynamical model solutions to the data using a priori constraints wherever possible from the anatomy and the thermodynamics. A model is considered a viable “working hypothesis” when the model fit is good for a varied set of observations, preferably from more than one laboratory. (Examples of validated models from our own lab are found in Refs. 36, 49, and 59.)
Given that one must reduce model complexity, the questions are, “Where are you starting in your thinking about the system?” and “What are the possible approaches?” At the molecular level we know that everything is stochastic: molecules diffuse by jumps with collisions and react when they collide with an appropriate molecule. In 1977, Gillespie109 gave a clear approach to the mathematics of simulating the system at this level; despite later improvements, the computation is orders of magnitude slower than using the equivalent continuous expressions, assuming stationary or Markovian rate constants. The choice depends on copy numbers, the numbers of molecules within the volume being considered, and whether averaging over many such regions puts one in a situation where the continuous methods are valid.
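The trade-off can be made concrete with a deliberately simple, hypothetical example: Gillespie’s direct method applied to a first-order conversion A → B, whose ensemble mean should approach the continuous (ODE) solution when copy numbers are large. The rate constant and copy numbers here are arbitrary.

```python
import math, random

# Gillespie's direct method for A -> B with rate constant k, compared with
# the continuous-limit (ODE) solution A(t) = A0 * exp(-k t).
k = 1.0

def gillespie(n_a, t_end, rng):
    """Direct-method SSA for A -> B: returns copies of A remaining at t_end."""
    t = 0.0
    while n_a > 0:
        propensity = k * n_a
        t += rng.expovariate(propensity)   # exponential waiting time
        if t > t_end:
            break
        n_a -= 1                           # one A converts to B
    return n_a

rng = random.Random(42)
n0, t_end = 1000, 1.0
runs = [gillespie(n0, t_end, rng) for _ in range(200)]
mean_ssa = sum(runs) / len(runs)
mean_ode = n0 * math.exp(-k * t_end)

# With ~1000 copies the ensemble mean tracks the ODE within a few percent.
assert abs(mean_ssa - mean_ode) / mean_ode < 0.05
```

At small copy numbers the run-to-run fluctuations become comparable to the mean, and the continuous description ceases to be a faithful summary.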
One approach to capturing the behavior of systems of this type is to represent them as switching between different models, due to an underlying Markov process. Stochastic control theory for this is well developed and can be applied to systems that abruptly “jump” between different models.110–118 An alternative is to apply methods of adaptive control to automatically adjust models to reflect underlying changes. For example, this approach has been applied in the control of functional electrical stimulation for neural prostheses.119–131
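A toy sketch of such a jump system, with arbitrary rates not tied to any cited application, is a scalar state whose decay rate switches between two modes according to a two-state Markov chain:

```python
import random

# Jump-Markov sketch: the dynamics x' = rate*x switch between two modes,
# with mode transitions governed by a two-state Markov chain.
rng = random.Random(3)
rates = {0: -1.0, 1: -5.0}       # mode-dependent decay rates (illustrative)
p_switch = 0.02                  # per-step probability of jumping modes

x, mode, dt = 1.0, 0, 0.01
for _ in range(1000):
    if rng.random() < p_switch:
        mode = 1 - mode          # Markov jump to the other model
    x += dt * rates[mode] * x    # integrate the currently active model

# Both modes are stable, so the state decays regardless of the jump path.
assert 0.0 < x < 1.0
```

Estimating the active mode from noisy observations of x, rather than assuming it known, is where the stochastic control and adaptive identification machinery cited above comes in.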
For the mathematical modeling, system identification, and adaptive control of large-scale biomedical systems, it is necessary to define mathematically, code, and verify the computational accuracy of computationally efficient reduced-complexity models for the components at each of the levels of this five-level model. These reduced-form models must (1) accurately match the behavior of the full models (for the components of interest, at that level) over a prescribed range of conditions, (2) be computationally more efficient than the full model, (3) interact correctly with other components at the same level, and (4) allow for correct incorporation into the next higher-level models, with the goal being to attain accurate simulations of the desired behavior while minimizing computation.
Cellular models are chosen as the central target for testing model reductions, since they already exist and are detailed, mechanistic, and extensively studied in computational biology. The reduced model forms use physically meaningful quantities under the constraint that they remain descriptive of the behavior of the full models within a defined margin of error over a prescribed range of conditions. Each is an approximate representation that improves computational efficiency but sacrifices some accuracy in being applicable only over a restricted set of conditions.
Reducing a complex chemical reaction system into a simplified kinetic model has had a long history in chemical dynamics, a subfield of chemical physics.132 One standard approach is to reduce a kinetic model by its time scales. Relative to a chosen time scale of interest, processes on faster time scales are treated as being in quasi-steady state (QSS), while variables on slower time scales are treated as constant. This is formally known as singular perturbation in applied mathematics.133,134 Well-known examples are the studies of the Belousov-Zhabotinsky reaction (the Oregonator) and of Hodgkin-Huxley theory (the FitzHugh-Nagumo model).135 The advantage of this approach is that the reduced model usually preserves a high level of connection with the original mechanistic model; hence, the model parameters are physically meaningful. The drawback is that this is a task for trained specialists.
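A minimal sketch of the idea, using an artificial two-variable system rather than any of the named reaction models: a fast variable is slaved to a slow one, and the singular-perturbation (QSS) reduction replaces the stiff pair with a single slow equation.

```python
import math

# Fast-slow sketch: eps*x' = y - x (fast relaxation onto x = y) coupled to
# y' = -x (slow). The QSS reduction sets x = y, leaving just y' = -y.
eps = 1e-3

def full_model(y0, t_end, dt):
    """Explicit Euler on the stiff pair; dt must resolve the fast scale."""
    x, y = y0, y0                 # start on the slow manifold
    for _ in range(int(t_end / dt)):
        dx = (y - x) / eps
        dy = -x
        x += dt * dx
        y += dt * dy
    return y

def reduced_model(y0, t_end):
    """Analytic solution of the reduced slow equation y' = -y."""
    return y0 * math.exp(-t_end)

y_full = full_model(1.0, 1.0, 1e-4)   # dt << eps, forced by stiffness
y_qss = reduced_model(1.0, 1.0)       # no stiffness left to resolve
assert abs(y_full - y_qss) < 1e-2     # agreement to O(eps)
```

The reduced form trades an O(eps) error for the freedom to take time steps thousands of times larger, which is precisely the computational bargain discussed above.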
The general approaches to reduced model derivation that will be pursued involve kinetic model reductions. Essentially all the computational cellular biology models that are based on nonlinear chemical kinetics involve enzymes. There is a large literature (that is still growing) on model reduction in enzyme kinetics. This reflects the difficulty of multiscale model reduction, as well as the central role of enzyme kinetics in computational biology.136 A careful biochemical analysis of enzyme catalysis or channel or ion pump function usually leads to a detailed mechanistic scheme with multiple steps and multiple intermediate conformational states (species). While molecular biochemists strive for a more complete description of the details, not all of these details are important to the description of the dynamics and physiological functions.91,92 Because it is impractical to compute large networks of protein reactions at the level of detailed molecular motions, model reduction is useful to move from biophysical to cellular-level integration. The key to faithfully modeling physiological function through model reduction is to know which aspects of the model require preservation at higher levels, and which aspects can be relatively crudely approximated without compromising the robustness of the system.137
Discussing the kinetics of just one enzyme serves as an illustration. The full model is based on the law of mass action: the full equations enforce mass conservation and are constrained also by the thermodynamics governing the reversibility. A widely used approximation in the biochemical community138 is the QSS approximation, which neglects the rapid loading of the enzyme by substrate,139 assuming that the enzyme concentration is negligible compared to that of substrate. However, when enzyme and substrate concentrations are similar, one must account for the binding. Likewise, if the enzyme-substrate complex influences another reaction, then the QSS approximation is incorrect or inappropriate as a reduced model.140 One can often, however, reduce the model by assuming that substrate levels change slowly relative to the fast binding kinetics (what applied mathematicians call the “inner solution,” as opposed to the “outer solution”). A further reduction is to assume rapid binding and augment the QSS Michaelis-Menten expression with a term accounting exactly for single-site binding. The proposed approach to integrate the QSS and the full equation models with constraint-based flux balance and energy balance analysis is to apply the idea from MCA, in which we determine the sensitivities of the steady-state concentrations as functions of enzyme activities, which are the “control coefficients” of MCA.85 Since some of the control coefficients are experimentally measured, this gives us the possibility of integrating the Michaelis-Menten flux equations and the MCA approaches with the experimental data as constraints.
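As an illustration, with arbitrary rate constants rather than those of any particular enzyme, the full mass-action scheme E + S ⇌ ES → E + P can be compared directly with its QSS reduction; the reduction holds when enzyme is scarce and fails when enzyme and substrate concentrations are comparable.

```python
# Full mass-action kinetics for E + S <-> ES -> E + P versus the QSS
# (Michaelis-Menten) rate v = k2*E0*S/(Km + S). Rate constants illustrative.
k1, km1, k2 = 100.0, 100.0, 10.0      # binding, unbinding, catalysis
Km = (km1 + k2) / k1

def full(E0, S0, t_end, dt):
    """Explicit-Euler integration of the full scheme; returns free substrate."""
    S, ES = S0, 0.0
    for _ in range(int(t_end / dt)):
        E = E0 - ES
        dES = k1*E*S - (km1 + k2)*ES
        dS = -k1*E*S + km1*ES
        ES += dt*dES; S += dt*dS
    return S

def qss(E0, S0, t_end, dt):
    """QSS reduction; returns substrate remaining."""
    S = S0
    for _ in range(int(t_end / dt)):
        S -= dt * k2*E0*S/(Km + S)
    return S

# Enzyme scarce (E0 << S0): the reduction is accurate.
err_small = abs(full(0.01, 10.0, 5.0, 1e-4) - qss(0.01, 10.0, 5.0, 1e-4))
# Enzyme comparable to substrate: the QSS form ignores substrate
# sequestered in ES and badly misstates the free substrate early on.
err_large = abs(full(5.0, 10.0, 0.01, 1e-6) - qss(5.0, 10.0, 0.01, 1e-6))
assert err_small < 0.02 and err_large > 1.0
```

The second case is exactly the situation flagged in the text: when the enzyme-substrate complex holds an appreciable fraction of the substrate, the reduced form is no longer a valid stand-in for the full model.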
For each component subsystem, at each of the levels of the five-level model, there may be several reduced-order models fit from observed inputs and outputs to the component subsystem. These families of identified models will generally be nested and of increasing complexity (like a series of Russian matryoshka nesting dolls). Determining when, or whether, resorting to lower-level models is important, and deciding which reduced-complexity model is the best choice, are quite difficult.
Parameter identification for the reduced models will be individual to each reduced form. All require the fitting of the reduced form to the original full model. Reduction in biological systems analysis has many variants. Standard pharmacokinetic analysis usually assumes lumped (instantaneously mixed) volumes of distribution of agents, an oversimplified approach that yields parameters having statistical meaning but having little physiological meaning except to determine associations.141 These can be improved by accounting for continuously changing volumes where that is appropriate.142–145 Parameter estimation in nonlinear systems such as muscle and joint dynamics146,147 is often trickier or requires more data. Generalized kernel approaches, as described by Marmarelis,148 tend to be descriptive only and use a minimal number of parameters, and therefore lose sight of the biophysical level complexity. Discrete time representation and system identification of electrically stimulated muscle has been used in real-time adaptive control and detection of changes in muscle properties such as fatigue.120,121,149,150,151 At the molecular level, dynamic algebraic system representation allows for the prediction of local variations of DNA helical parameters, with the base pair sequence (as one goes from one end of the helix to the other) considered as the system input signal, and geometric features of the DNA molecule (at each location along the helix) considered as the output.152 Real-time gait event detection in spinal cord injury patients during paraplegic walking, using machine intelligence methods,153,154 bases control decisions upon observations. 
The simultaneous real-time system identification of nonlinear recruitment properties in the quadriceps muscle of spinal cord injury subjects117 involved the use of an identification algorithm that imposed equality constraints on the identified parameters, which is a key aspect of the work proposed here in mapping between different levels of model complexity.
Optimization of the fits of models to data for parameter identification should be viewed as a combination of art and science. One thinks automatically of least squares minimization as the method of choice, but this is not a uniquely determined choice. The art, or personal bias, comes into the choices of domain (time, frequency, or some other space of representation of data and model), of distance function (linear or log spaces, evenly weighted or points individually weighted to account for variations in experimental error), the choices of which experimental data are most reliable or most consistent with the state to be represented by the model, and so on. For any given situation it is worthwhile to develop guidelines to assist in weighting the decisions, but there are no guaranteed effective algorithms. Realization of the nature of the art is particularly important in large systems where the model is necessarily a compromise developed to reconcile differences in observations and to render a view that is self-consistent. Choosing amongst variants of weighted recursive least squares algorithms is a part of the art. The parameter identification algorithms can be used to fit models that are linear in parameters, which includes linear time series models, linear differential or difference equation models, and linear systems combined with memoryless nonlinearities (which can be expressed as polynomials or other functions of the model parameters). These algorithms can be extended to situations that are not linear in parameters, through use of methods such as the extended least squares and generalized least squares algorithms.155
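A small sketch of one such choice, with synthetic data and hypothetical weights: points with smaller experimental error are weighted more heavily in a least squares fit of a model that is linear in its parameters.

```python
import numpy as np

# Weighted least squares for y = a*x + b, linear in its parameters, with
# per-point weights reflecting unequal experimental error (synthetic data;
# noise levels and weights are hypothetical).
rng = np.random.default_rng(0)
a_true, b_true = 2.0, -1.0
x = np.linspace(0.0, 10.0, 50)
sigma = np.where(x < 5.0, 0.1, 1.0)        # later points are ten times noisier
y = a_true*x + b_true + rng.normal(0.0, sigma)

X = np.column_stack([x, np.ones_like(x)])  # design matrix for [a, b]
W = np.diag(1.0 / sigma**2)                # inverse-variance weighting

# Normal equations: theta = (X^T W X)^-1 X^T W y
theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
a_hat, b_hat = theta
assert abs(a_hat - a_true) < 0.1 and abs(b_hat - b_true) < 0.2
```

The weighting matrix is exactly where the “art” enters: a different judgment about which observations to trust yields a different W and hence different parameter estimates.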
Constraints in parameter estimation in large models, when substantial data are available, are often so severe that one would have to consider the parameters to be overdetermined. This is not always the case, as many published models have not been subjected to extensive validity testing against carefully obtained quantitative data. But large models tend to take years to put together and are put together for the purpose of integrating diverse data sets and for reconciling disparities amongst conflicting data sets and mutually contradictory hypotheses. In science, alternative models must be considered, in line with Platt’s remonstrance156 to develop an alternative to one’s favorite hypothesis and to design the experiment that distinguishes between two hypotheses: the experiment must disprove one of them and advance science. Thus, the working hypothesis defined by the large model is usually a parsimonious but accurate portrayal of the system compatible with the best data available, and with the contradictions reconciled and removed. Removal of contradictions means that judgment was used; publications should report on how such decisions were made. (This view assumes that the definer of the working hypothesis is determined to provide the most accurate and predictive representation of the system under study.)
The result of this kind of effort is that usually the numbers of data sets available for validation and parameter estimation by far exceed the numbers of free parameters. In complex interconnected systems like the circulatory and respiratory systems or large biochemical networks, there are many fewer degrees of freedom than there are equations and parameters. This is because in interactive networks the behavior of the system is most heavily influenced by the combined behavior of the set of equations, rather than by individual equations. One can make number counts to push this argument: count the number of data points in the multiple sets of experiments versus the number of free parameters in the model equations. This is not a properly refined way of estimating degrees of freedom (something statisticians would like us to do), but it is in the right direction. Determining formal degrees of freedom is truly difficult, because in both the models and the data sets, neither the individual equations nor the individual points are truly independent of each other. For example, in analysis of a time series (e.g., a blood pressure response to hemorrhage or a change in glycolytic rate with exercise), the points at each time are correlated with their near neighbors. Likewise, the equations for blood pressure in neighboring segments of the circulation are necessarily related to each other, both in parameter values and in variable values. Extending this view, one can see that the actual degrees of freedom in a closed-loop integrated system are many fewer than in an open-loop model of a part of a system.
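This point can be illustrated with a synthetic time series. For an AR(1) series with lag-one autocorrelation rho, a commonly used approximation (an assumption adopted here for illustration, not a rule from the text) is N_eff = N(1 − rho)/(1 + rho).

```python
import numpy as np

# Effective sample size of a correlated series: generate an AR(1) process
# and compare the naive and correlation-corrected standard errors of the
# sample mean, using N_eff = N*(1 - rho)/(1 + rho).
rng = np.random.default_rng(1)
N, rho = 10000, 0.9
x = np.empty(N)
x[0] = rng.normal()
for i in range(1, N):
    x[i] = rho*x[i-1] + np.sqrt(1.0 - rho**2)*rng.normal()

n_eff = N * (1 - rho) / (1 + rho)            # ~526 of 10000 points for rho=0.9

# The variance of the sample mean behaves like sigma^2/N_eff, not sigma^2/N.
se_naive = x.std(ddof=1) / np.sqrt(N)
se_corrected = x.std(ddof=1) / np.sqrt(n_eff)
assert se_corrected > 4 * se_naive           # ratio is sqrt((1+rho)/(1-rho))
```

Ninety-five percent of the nominal data points contribute no independent information here, which is the same arithmetic that deflates naive point-versus-parameter counts in the physiological setting.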
The optimization routines used to adjust model parameters to data are useful in getting an idea of the accuracy of parameter estimates. Most provide an estimate of the covariance matrix that gives numerical measures of the correlation among the different free parameters, and they provide confidence limits on parameters when fitting a given data set. How one arrives at these estimates always involves judgment calls; the confidence limits for a parameter are narrowed by imposing constraints upon, or fixing the values of, other parameters. These parameters can be provided by prior information. An example is that the use of anatomic or mass or energy balance constraints on a system reduces the degrees of freedom in the fitting of data and narrows confidence limits on the individual parameters.
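A minimal sketch for a linear model with made-up numbers: the covariance of the least squares estimates is σ²(XᵀX)⁻¹, and fixing one parameter from prior information demonstrably narrows the confidence limit on the other.

```python
import numpy as np

# Parameter covariance for least squares: cov(theta) = sigma^2 (X^T X)^-1.
# Fixing one parameter (say, from prior anatomic data) removes its column
# and narrows the confidence limits on those remaining.
sigma = 0.5
x = np.linspace(0.0, 10.0, 20)
X = np.column_stack([x, np.ones_like(x)])    # parameters: slope a, intercept b

cov_free = sigma**2 * np.linalg.inv(X.T @ X)
sd_a_free = np.sqrt(cov_free[0, 0])          # slope SD with b also free

Xa = X[:, :1]                                # b fixed: only the slope column
cov_fixed = sigma**2 * np.linalg.inv(Xa.T @ Xa)
sd_a_fixed = np.sqrt(cov_fixed[0, 0])

# The off-diagonal term is what widens the unconstrained limits: slope and
# intercept trade off against each other (negative correlation here).
corr_ab = cov_free[0, 1] / np.sqrt(cov_free[0, 0]*cov_free[1, 1])
assert sd_a_fixed < sd_a_free
assert corr_ab < 0
```

The same mechanics apply to the optimizer-reported covariance matrices described above; constraints act by zeroing out rows and columns of the parameter space.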
The validation of reduced-form models has to take into account that there may be several reduced-form models, each an appropriate analog to the full-complexity parent model over a limited range of conditions and parameter settings, and that therefore each version has to be validated. These different forms can be either completely different mathematical equations or different sets of parameter settings. The validation therefore requires several steps: validation against the full-complexity model, definition of the range of conditions over which the approximation is valid, and reaffirmation of the validation against experimental data, seeking data over wide ranges of experimental or physiological conditions.
When using the full-form systems model, one could presumably simply let it run, changing parameters or variables on the fly to explore the effects of particular interventions or events. This implies that the models have been built to accommodate unsteady states and are not tightly constrained in their validity.
However, with the inclusion of reduced-form modules or elements, adaptation to changing conditions necessitates consideration of the limited ranges of validity of the modules, and one would like to minimize substitutions in the interest of computational efficiency. How to do this is an undeveloped technology, a new art form. One would like to substitute one module for another when the first reaches the edge of its range of validity. This raises the question of the smoothness of the transition from one reduced model form to another. Taking again enzyme kinetics as an example, after finding the two reduced forms for the fast and slow time scales (the inner and outer solutions) to the full mass-action kinetics, one needs to establish the “time zone” in which these two solutions join smoothly. In applied mathematics this is known as “matching asymptotics.”132
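For the enzyme example the joining can be sketched explicitly (with illustrative rate constants): the inner solution captures the fast loading transient, the outer (QSS) solution the slow depletion, and the composite, outer plus inner minus their common limit, tracks the full model on both time scales.

```python
import math

# Matched asymptotics for E + S <-> ES -> E + P with scarce enzyme.
# Inner solution: loading transient at frozen substrate. Outer solution:
# QSS complex tracking the slowly depleting substrate.
k1, km1, k2, E0, S0 = 100.0, 100.0, 10.0, 0.01, 10.0
Km = (km1 + k2) / k1
lam = k1*S0 + km1 + k2          # fast relaxation rate of the loading phase

def full_ES(t_end, dt=1e-6):
    """Explicit-Euler solution of the full scheme; returns the complex ES."""
    S, ES = S0, 0.0
    for _ in range(int(t_end / dt)):
        dES = k1*(E0 - ES)*S - (km1 + k2)*ES
        dS = -k1*(E0 - ES)*S + km1*ES
        ES += dt*dES; S += dt*dS
    return ES

def outer_S(t_end, dt=1e-4):
    """Slow (QSS) substrate depletion."""
    S = S0
    for _ in range(int(t_end / dt)):
        S -= dt * k2*E0*S/(Km + S)
    return S

def composite_ES(t):
    """Outer + inner - common limit: valid across both time scales."""
    S = outer_S(t)
    ES_outer = E0*S/(Km + S)
    ES_star = E0*S0/(Km + S0)   # matching value shared by inner and outer
    return ES_outer - ES_star * math.exp(-lam*t)

# The composite tracks the full solution through the fast transient...
assert abs(full_ES(0.001) - composite_ES(0.001)) < 1e-4
# ...and on the slow time scale as well.
assert abs(full_ES(1.0, dt=1e-5) - composite_ES(1.0)) < 1e-4
```

The “time zone” where the exponential has decayed but the substrate has barely moved is where the two reduced forms overlap, and hence where a module substitution could be made smoothly.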
Methods used for system identification and adaptive control in pharmacological regulation of physiological processes are useful in thinking about model control. For example, self-tuning adaptive control of arterial pressure in dogs under anesthesia can be as good as that of an anesthesiologist,157 using only the simultaneous regulation of mean arterial pressure and cardiac output with two drugs (sodium nitroprusside and dopamine/dobutamine).158–160 The method was similar to a detection and control algorithm for a pharmacological stress test.161 Similar control strategies are built into humans (e.g., adaptive buffering of breath-by-breath variations of end-tidal CO2)162 using physiological information to prevent instabilities.163 Refinements in algorithms for adaptive control help where the identified parameters are subject to inequality constraints,164 or more generally subject to statistical constraints.165
Detecting when the reduced model forms begin to lose accuracy because of changes in conditions (external or internal), and revising and implementing mechanisms for either resorting to the use of the full subsidiary model or shifting to an alternative form of the reduced model, are a part of the required new art form. Changes in the character of the behavior of the variables (signals) provide information about potential submodel inadequacy.
Time-frequency density functions have been applied in acoustics, radar, machine monitoring, and biomedical signal analysis as a framework for multiscale combinations of density-type functions, such as concentrations and energies, for example, in examining molecular signal processing and detection in T-cells.166 Recent work in multiscale modulation spectral decompositions167,168 offers a new approach to signal decomposition and has been applied to audio coding and detection of acoustic signal change. This modulation spectral approach provides a means for multiscale decomposition of potential-like signals, where phase details and shift-invariance properties need to be maintained. In this approach, a signal’s components are treated as a product of a slowly varying modulator and a higher-frequency carrier.167 This allows recurrent decompositions into modulators and carriers, thus providing a representation of signal change that integrates across multiple scales.169
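A minimal sketch of the core idea, with a synthetic signal rather than any physiological recording: a slow, positive modulator multiplies a fast carrier, and the modulator is recovered as the magnitude of the analytic signal. (The modulation-spectral methods cited above go well beyond this single-band envelope extraction.)

```python
import numpy as np

# Modulator/carrier sketch: build x(t) = m(t)*cos(2*pi*fc*t) with a slow,
# positive modulator m(t), then recover m(t) as |analytic signal of x|.
fs, T = 1000.0, 1.0
t = np.arange(0.0, T, 1.0/fs)
modulator = 1.0 + 0.5*np.sin(2*np.pi*2.0*t)      # slow (2 Hz), positive
carrier = np.cos(2*np.pi*100.0*t)                # fast (100 Hz)
x = modulator * carrier

def analytic_signal(x):
    """Analytic signal via the frequency-domain Hilbert construction."""
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)
    h[0] = 1.0
    h[1:N//2] = 2.0          # keep positive frequencies, doubled
    h[N//2] = 1.0            # assumes N even
    return np.fft.ifft(X * h)

envelope = np.abs(analytic_signal(x))

# Away from the record edges the recovered envelope matches the modulator.
assert np.max(np.abs(envelope[50:-50] - modulator[50:-50])) < 0.02
```

The recovered modulator can itself be decomposed again, which is the recurrent, multiscale aspect of the approach described in the text.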
These methods for recognizing change are almost surely stronger than more conventional techniques (such as correlation dimension; Kolmogorov entropy; approximate entropy; auto and cross correlation; autoregressive, autoregressive moving average, autoregressive integrated moving average, or fractional autoregressive integrated moving average; fractal dimension; or wavelet analysis170), but have not yet been proven out in the setting of adaptive model modification.
The needs for biological systems models are quite different from engineering uses of change detection. For example, system variables are usually potentials or energy levels; these require different representations and mathematics. A pressure or voltage potential can be either positive or negative, relative to some reference level. Sums of potentials are well behaved, and the linear combinations needed for calculations of usual representations for change detection are thus acceptable mathematical operations. However, energy levels, concentrations, and other observed quantities such as volumes do not have similar mathematical combination properties. For example, energies and concentrations are never legitimately negative. By carefully considering these quantities as densities, which are non-negative and normalized to range between zero and one, estimators for change detection are possible. But the underlying mathematics used for these estimators needs to change to be consistent with density-type quantities.171 An analogous problem exists in combinations of time-frequency density functions, which have been extensively studied (see, for example, Ref. 172). Multiscale time-frequency density functions have been previously proposed.173 These techniques identify deviations of model solutions from expected behavior and hand the task of revising the model over to automated methods.
On detecting a change in “texture” or signal characteristic amongst the variables, the next question is what to do. Methods for either resorting to the use of the full subsidiary model or shifting to an alternative form of the reduced model are complicated if there is more than one possible improved form. Decisions are balanced between demands for speed versus validity. Also, one must keep in mind that the systems-level models demonstrate emergent behavior characteristic of nonlinear dynamical systems, although this is not commonly found to be characterizable as a low-order system with a measurable Lyapunov exponent.77 For example, if the changed behavior is due to a bifurcation, the representation given by the reduced model forms might still be entirely correct, in which case one should not switch to a more complex model. The goal is automated reconfiguration of the model when changes in conditions occur. Which model version to choose might also depend on the resolution and accuracy of the data against which the solution is being matched, but in principle this dependency ought to be secondary.
Three different approaches can be applied to adaptive model modification:
For all of these parameter identification algorithms, a “weighting factor” can be chosen that allows the model parameters to track slowly varying system changes (by discounting past input and output data, relative to newer information). This use of weightings in the identification algorithms allows the identified reduced-complexity models to be adaptively fit to changing conditions.
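A scalar sketch with hypothetical numbers: recursive least squares with an exponential forgetting factor tracks a gain that drifts slowly during the simulated experiment.

```python
import numpy as np

# Scalar recursive least squares with forgetting factor lam < 1 for
# y = theta*u + noise, where the true theta drifts slowly. Old data are
# discounted geometrically, so the estimate can follow the drift.
rng = np.random.default_rng(2)
lam = 0.95                     # lam = 1 would weight all history equally
theta_hat, P = 0.0, 100.0      # initial estimate and (scalar) covariance

theta_true = 1.0
for k in range(400):
    theta_true += 0.002                      # slow drift in the real system
    u = rng.normal(0.0, 1.0)                 # excitation input
    y = theta_true*u + rng.normal(0.0, 0.05) # noisy observation
    # RLS update with exponential forgetting.
    K = P*u / (lam + u*P*u)
    theta_hat += K*(y - theta_hat*u)
    P = (P - K*u*P) / lam

# After 400 steps theta_true has drifted from 1.0 to 1.8; the forgetting
# factor keeps the estimate close despite the drift.
assert abs(theta_hat - theta_true) < 0.15
```

The choice of forgetting factor is the tracking-versus-noise trade-off: closer to 1 averages more data but lags a drifting system; smaller values track faster but amplify measurement noise.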
Multiscale multilevel systems modeling is in its infancy, but formulating and operating such models is critical to future success in integrative systems biology. The problems associated with using reduced-form components within large systems models stem primarily from the limitations to their validity. Developing high-level models with such components thus encourages the formulation of multiple subsidiary or component modules that can be substituted for one another when conditions demand it. Such adaptive modeling is presaged by substantial developments in engineering practices, and though it is not at all straightforward, it can be anticipated to apply to the engineering analysis and design of biological systems.
This research was supported by grants from NIH EB 1973-24 and HL 073598-1, and DARPA W81XWH-04. We appreciate the assistance of J.E. Lawson in the preparation of the manuscript.