Biomedical research frequently involves performing experiments and developing hypotheses that link different scales of biological systems such as, for instance, the scales of intracellular molecular interactions to the scale of cellular behavior and beyond to the behavior of cell populations. Computational modeling efforts that aim at exploring such multi-scale systems quantitatively with the help of simulations have to incorporate several different simulation techniques due to the different time and space scales involved. Here, we provide a non-technical overview of how different scales of experimental research can be combined with the appropriate computational modeling techniques. We also show that current modeling software permits building and simulating multi-scale models without having to become involved with the underlying technical details of computational modeling.
For many scientists, the first contact with a mathematical description of their data occurs when they investigate the significance of an assumed correlation between two parameters by fitting a line or simple curve to a collection of two-dimensional data points. Fitting a straight line (linear regression) with slope 2, for instance, implicitly creates a mathematical model that assumes that some mechanism in the observed system causes the property Y (representing the y-axis) to increase by two units whenever the property X (representing the x-axis) increases by one unit. In many cases, this kind of modeling is phenomenological and limited to simply demonstrating the significance of the slope. However, once the correlation between X and Y has been discovered, one may want to find an explanation for the observed relationship that clarifies not only why the two aspects (or parameters) are related (as opposed to being independent of each other) but also why the relationship is described sufficiently well (for statistical significance) by a linear equation. The hypotheses that are then formulated as tentative explanations are model descriptions of the system the measurements were performed on. Biologists formulating such models create representations of molecules, cells or cell populations in their minds or on a piece of paper and interconnect them through the influences these components are assumed to have on each other. The first tests of such a model are frequently gedanken experiments: ‘If this is how my system works, what would I expect to observe experimentally?’ At this point, it is obviously important that the model can make predictions in terms of experimentally accessible parameters. Problems can arise when the model is too abstract, that is, when too many components of the model cannot directly represent any experimentally measurable parameters, or when the model incorporates mechanisms that act on a different scale than the scale of possible experiments. 
The latter could be the case, for instance, when the modeler tries to explain cell population dynamics with the help of a model of intracellular molecular interactions. One would then need a hypothesis about how the molecular interactions influence higher-level parameters such as cellular proliferation or death rates. At that point, the model becomes a multi-scale model. Qualitative diagrammatic multi-scale models are very common in biomedical research. Ultimately, all tissue- or organ-level phenomena are based on molecular interactions occurring within or on the surface of cells. For the purpose of depicting a hypothetical role that a specific molecular mechanism may play in a tissue level disease phenomenon, a diagram with an arrow connecting a molecule to a higher scale correlate of the disease (for instance a graphical symbol for increased cellular proliferation) is in order. However, if one wants to subject the proposed causal relationships to a stringent quantitative exploration one needs to transform the knowledge embodied in the arrow-based diagram and, importantly, the implicit assumptions that it entails, into a formal description suitable as input for computer simulations.
In contrast to reviews that focus on the technical computational challenges associated with such simulations [1, 2] this article discusses the concepts, prerequisites and ingredients for multi-scale modeling from a biological point of view and explains how existing software can greatly facilitate the transformation of a qualitative into a quantitative model.
Multi-scale models that link different spatial or temporal scales of experimentation and hypothesis can traverse and connect those scales with different strategies. Approaches that start with observed features on a high level of a system and then attempt to deduce what kinds of mechanisms on lower, more fundamental scales could account for those observations are called ‘Top-down’. Top-down models have the advantage that hypotheses can stepwise increase their level of detail with the starting level directly backed-up by the data. The disadvantage of such models is that in the direction of increasing details adjacent scales of modeling do not unambiguously emerge from one another because, typically, a higher scale phenomenon may have multiple different potential underlying explanations on more fundamental scales. ‘Bottom-up’ models, in contrast, aim at deriving a system’s behavior on higher spatial or temporal scales from the dynamics and interactions of model components ‘living’ on lower, more detailed scales. The coarse-graining that connects the different scales involves identifying which types of collective behavior on a fundamental scale give rise to a coherent phenomenon on a higher scale.
Consider, for example, the signaling processes that result in biochemical and morphological polarization of a cell responding to chemotactic stimuli [3–5]. On the sub-cellular scale of detailed biochemistry, this process may be described as a network of interactions between trans-membrane receptors, adaptors, phospholipids, kinases, phosphatases and structural proteins. Mapping the network dynamics onto the cellular scale will typically involve dropping many of the molecular details and describing a relationship between the strength and direction of a chemotactic stimulus and the polarization response of the cell, the latter quantified, for instance, as the difference in actin turnover between the morphological front and the back of the cell.
As part of a bottom-up approach, one would try to build a model of the signaling network that translates the receptor stimulus into a spatially polarized activation of actin polymerization. The relationship between extra-cellular stimulus and actin polymerization could then be directly read off from detailed simulations based on the model and could be compared to the experimental data. However, assembling a network model that includes all molecular components involved in phenomena as complex as chemosensing confronts the modeler with several challenges. First, there may be gaps in the available knowledge of the biochemistry of the cell that make a reliable identification of the necessary network components very difficult, forcing the modeler to speculate not only on signaling mechanisms but also on their molecular components. Second, assembling a ‘minimal’ network that would be capable of generating the observed response does not necessarily result in a realistic representation of the real cellular biochemistry. One may miss important components that modulate the cellular response or represent alternative signaling pathways that render the cell’s behavior robust towards mutations of single components of the main pathway. Bottom-up models thus may suffer from ambiguity on the fundamental level. The great disadvantage of these models, namely the requirement to invest much care into the construction of the fundamental modeling layer, is at the same time their main advantage: the process of assembling the model unveils gaps in our knowledge and points out new directions for experimental studies that would be less apparent without the modeling effort.
A top-down approach, on the other hand, would analyze the observed actin dynamics, its spatial properties and their dependence on the applied concentrations of the chemoattractant, and then speculate on the signaling mechanisms that could account for the observations. One would, for instance, identify the need for a mechanism that translates receptor activation into intracellular signaling events. Linked to this signal transduction there would have to be a signal amplification mechanism, since chemotactic cells respond with steep intracellular gradients of actin polymerization even to shallow extra-cellular gradients of chemoattractant. Top-down approaches thus try to reverse-engineer underlying mechanisms from higher scale observations. They play important roles whenever the observed phenomena represent the edge of the current understanding of the biological system at hand. The top-down models that extrapolate into the unknown regions of potential underlying (more fundamental) mechanisms in such situations can thus, similar to bottom-up approaches, provide valuable, even if less detailed, frameworks for further experimentation aiming at filling fundamental gaps in our knowledge.
In addition to choosing whether a multi-scale approach should be top-down or bottom-up the modeler has to search for the most useful definition of the different scales, their functional components and, in particular, of the ways information is exchanged between different scales of a model. In fact, the process of encapsulation, of dividing a complex biological system into different scales or functional units with limited exchange of information between the ‘inner’ components of separate scales or units can be the most difficult part of creating a model. For instance, using flow cytometry to analyze a process that involves communication between multiple cells of different types one may identify certain cellular markers that correspond to specific phenotypes which, in turn, may correlate with some characteristics (proliferation, death or differentiation rates) of the dynamics of the observed cell populations. For a mathematical single-scale model that only aims at investigating possible scenarios of cell population dynamics it may be in order to define communication between cell types in terms of changes in the expression of those markers: when population A expresses more of some marker X the proliferation rate of population B will increase. For a model that aims at understanding the flow cytometric observations on a single cell level one would simulate cellular decision processes (leading to division, death or differentiation) based on how intracellular signaling pathways are activated by the signals cells receive through their receptors. The next higher modeling scale, the cell population scale, would be constructed by encapsulating the biochemistry of single cells. It would only describe the resulting changes in the numbers of cells of the different populations over time.
The next section will discuss how to combine detailed mechanistic model components with phenomenological elements to encapsulate and link scales in models of cell biological processes.
The linear regression ‘model’ mentioned in the beginning simply quantifies a correlation between two sets of measurements. It is an example of a purely phenomenological model of the underlying biological system. The model does not assume any particular biological mechanism that would account for the data, nor does it explain anything about the biology¹. Multi-scale models are frequently hybrids containing phenomenological elements as well as detailed mechanistic parts. Sometimes, the phenomenological elements are inserted as functional placeholders for not yet understood mechanisms, but in many cases models contain phenomenological shortcuts to avoid the effort of explicitly modeling causal relationships or processes that lie outside of the focus of the overarching biological question. Consider, for instance, a hypothesis linking signal-induced cell differentiation and proliferation that includes feedback between more and less differentiated cell populations. A model that aims at understanding the relationships between the signals the cells receive and their intra-cellular decision processes would describe in detail the signaling pathways leading to the activation of transcription factors inducing differentiation or proliferation. In the model, once a cell has reached that stage, it would be assumed that, after some delay, the cell will have created two daughter cells and/or will have differentiated into another phenotype. Many intracellular signaling pathways process information within minutes and involve only a relatively small subset of the components of the cellular biochemistry. Eukaryotic division or significant phenotypic differentiation, on the other hand, takes hours and involves duplicating or modifying major parts of the cell. 
Modeling the cell biology of division or phenotypic transformation in more detail would not deepen the understanding of the cellular decision processes but would considerably increase the size of the model and the computational cost associated with computer simulations. Here, introducing a phenomenological model component that links a supra-threshold activation of transcription factors to the resulting division or differentiation and the products of these processes does not just reduce the footprint of the model, it represents an important strategy for linking physical and functional model components that are separated with regard to their spatial and temporal scale.
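The kind of phenomenological shortcut described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the single-variable activation kinetics, the function name `simulate_division`, and all parameter values (rate constants, threshold, delay) are hypothetical and are not taken from any published model. The transcription factor activation is treated mechanistically (a simple rate equation), whereas division itself is reduced to a rule: once activation exceeds a threshold, the cell is replaced by two daughter cells after a fixed delay.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    tf_active: float = 0.0                # fraction of active transcription factor (0..1)
    committed_at: Optional[float] = None  # time at which the threshold was first crossed


def simulate_division(signal: float, t_end: float, dt: float = 0.01,
                      k_act: float = 1.0, k_deact: float = 0.5,
                      threshold: float = 0.6, delay: float = 5.0) -> int:
    """Return the number of cells at t_end, starting from a single cell.

    All parameter values are illustrative assumptions (arbitrary time units).
    """
    cells: List[Cell] = [Cell()]
    for step in range(int(round(t_end / dt))):
        t = step * dt
        next_generation: List[Cell] = []
        for cell in cells:
            # Mechanistic part: signal-driven activation and spontaneous deactivation.
            cell.tf_active += dt * (k_act * signal * (1.0 - cell.tf_active)
                                    - k_deact * cell.tf_active)
            # Phenomenological part: commitment once the threshold is exceeded ...
            if cell.committed_at is None and cell.tf_active >= threshold:
                cell.committed_at = t
            # ... followed, after a fixed delay, by replacement with two daughters.
            if cell.committed_at is not None and t - cell.committed_at >= delay:
                next_generation.extend([Cell(), Cell()])
            else:
                next_generation.append(cell)
        cells = next_generation
    return len(cells)


if __name__ == "__main__":
    print(simulate_division(signal=1.0, t_end=10.0))  # strong signal
    print(simulate_division(signal=0.5, t_end=10.0))  # weak signal, stays below threshold
```

The delay stands in for the hours-long machinery of division, so the simulation never has to represent it explicitly.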
As depicted in Fig. 1, each scale of cell biology not only has its characteristic types of data but also typical modeling and simulation approaches associated with it. The fundamental scale we have chosen is the scale of molecular interactions. Computational modeling on this scale typically aims at understanding or predicting the structure of molecules and the complementarity of potential binding partners with regard to shape and charge distribution as well as the conformational changes the molecules may undergo in the course of an interaction [11, 12]. The results of theoretical alignment studies or molecular dynamics simulations are becoming increasingly important for our understanding of the fundamental scale of molecular signaling mechanisms. Molecular dynamics simulations are usually governed by the far sub-microsecond timescale. Due to the wide gap between this time scale of intra-molecular dynamics and the time scale (10^-3 to 10^2 seconds) of most of the chemical aspects of molecular interactions, there exist to our knowledge no modeling efforts that directly combine simulations across these different scales. The insights gained from the structural studies can, however, be used by modeling approaches that describe molecular reaction networks not just as systems of structureless chemical entities whose interactions are sufficiently well characterized by the laws of mass-action, but describe molecular interactions such as the ligation of receptors by their ligands as mediated by specific functional binding sites [6, 13–15]. Tools that implement these approaches are capable of automatically generating the full network of (multi-) molecular complexes that can form based on the specified bimolecular interactions and of creating mathematical descriptions of the reaction kinetics, thereby linking the scale of molecular binding sites to the scale of chemistry. 
Obviously, the generation of chemical reaction kinetics requires quantitative input data, namely association, dissociation and enzymatic transformation rates of the interactions mediated by the binding sites. Currently, the scarcity of such data is a major bottleneck for quantitative simulations of cellular signaling mechanisms.
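To make the role of association and dissociation rates concrete, the following Python sketch integrates the mass-action kinetics of a single reversible binding step, receptor + ligand ⇌ complex, and compares the result with the analytical equilibrium occupancy. It is a minimal illustration, not the output of any of the cited tools: the function names and the numerical values (an association rate of 10^6 per molar per second, a dissociation rate of 0.01 per second, 10 nM ligand held constant) are assumptions chosen for the example.

```python
def bound_receptors(r_total: float, ligand: float, k_on: float, k_off: float,
                    t_end: float, dt: float = 0.01) -> float:
    """Integrate dC/dt = k_on * L * (R_total - C) - k_off * C with explicit Euler steps.

    The ligand concentration is assumed constant (ligand in large excess).
    """
    complexes = 0.0
    for _ in range(int(round(t_end / dt))):
        free = r_total - complexes
        complexes += dt * (k_on * ligand * free - k_off * complexes)
    return complexes


def equilibrium_bound(r_total: float, ligand: float, k_on: float, k_off: float) -> float:
    """Analytical steady state: C = R_total * L / (L + K_d), with K_d = k_off / k_on."""
    k_d = k_off / k_on
    return r_total * ligand / (ligand + k_d)


if __name__ == "__main__":
    # Illustrative values only: K_d = 10 nM, ligand at 10 nM, 10,000 receptors per cell.
    params = dict(r_total=10_000.0, ligand=10e-9, k_on=1e6, k_off=0.01)
    print(bound_receptors(t_end=500.0, **params))
    print(equilibrium_bound(**params))
```

With the ligand concentration equal to the dissociation constant, both functions should report that about half of the receptors are occupied, which is the kind of consistency check that only becomes possible once measured rates are available.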
Many software tools nowadays make it possible to formulate and simulate quantitative models of cellular signaling processes without having to invest effort into the more technical aspects (such as integrating differential equations) of calculating time courses of biochemical concentrations [6, 14–23]. Reflecting the growing appreciation of the fact that molecular interactions are stochastic events, many of these tools contain algorithms that allow the modeler to simulate the temporal evolution of a model not just as a deterministic process but, if necessary, also as a sequence of discrete stochastic molecular events. In addition to approaches that use some implementation of Gillespie’s Monte Carlo algorithm [24, 25], which provides a method to simulate the fluctuations in species abundances, there are approaches that explicitly simulate the stochastic dynamics of single molecules. Neglecting the inherently stochastic nature of molecular interactions is justified when the contributions of stochastic fluctuations are too small to have a significant influence on the simulated system’s behavior. A useful rule of thumb is that the fluctuations around the mean of a molecular binding state, for instance the number of occupied receptors of a given type, are typically of the order of the square root of the number of available molecules of that type. For 100 receptors, the number of occupied receptors would thus stochastically fluctuate by roughly 10%. If the signaling network downstream of the receptors is sensitive enough, these fluctuations may have biological consequences. For 10,000 receptors and a high enough concentration of the ligand, the receptor signal would fluctuate by only about 1%. Depending on the sensitivity of the system, a deterministic simulation, which is far less time consuming than a stochastic simulation, may then be in order. It should be noted that stochastic effects are not only important on the molecular scale. 
Depending on the accessibility of their DNA different individual cells of a given phenotype may express varying numbers of copies of their proteins . These random variations translate into variations in the responsiveness of the cells toward external signals.
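The square-root rule of thumb for receptor occupancy can be checked with a small stochastic simulation. The Python sketch below applies Gillespie's direct method to the simplest possible case, independent receptors that bind and release ligand; the function name, the rate values and the choice of equal binding and unbinding rates (half occupancy) are assumptions made for illustration. Under these assumptions the relative fluctuation (standard deviation divided by mean) is expected to be close to one over the square root of the receptor number.

```python
import math
import random
from typing import Tuple


def occupancy_fluctuations(n_receptors: int, k_bind: float, k_unbind: float,
                           t_end: float, seed: int = 0) -> Tuple[float, float]:
    """Gillespie (direct method) simulation of receptor occupancy.

    k_bind is the binding rate per free receptor (ligand concentration folded in),
    k_unbind the release rate per occupied receptor. Returns the time-averaged
    mean and standard deviation of the number of occupied receptors.
    """
    rng = random.Random(seed)
    bound = n_receptors // 2          # start at the expected occupancy when k_bind == k_unbind
    t = 0.0
    weight = sum_x = sum_x2 = 0.0     # time-weighted accumulators
    while True:
        a_bind = k_bind * (n_receptors - bound)
        a_unbind = k_unbind * bound
        a_total = a_bind + a_unbind
        tau = rng.expovariate(a_total)            # waiting time to the next event
        dwell = min(tau, t_end - t)               # do not count time beyond t_end
        weight += dwell
        sum_x += dwell * bound
        sum_x2 += dwell * bound * bound
        if t + tau >= t_end:
            break
        t += tau
        # Choose which event happens, in proportion to its propensity.
        if rng.random() * a_total < a_bind:
            bound += 1
        else:
            bound -= 1
    mean = sum_x / weight
    variance = max(sum_x2 / weight - mean * mean, 0.0)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    for n in (100, 400):
        mean, sd = occupancy_fluctuations(n, k_bind=1.0, k_unbind=1.0, t_end=500.0)
        print(n, round(sd / mean, 3), round(1.0 / math.sqrt(n), 3))
```

Because the number of simulated events grows with the receptor number, the same run becomes much slower for 10,000 receptors, which illustrates why a deterministic treatment is attractive when fluctuations are small.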
With increasing complexity of diagrams depicting molecular interaction networks, extracting information such as the temporal or causal hierarchy of signaling events becomes increasingly difficult. Molecular interaction maps with well-defined symbols for specific types of molecular species, phosphorylation states and interactions aim at establishing a framework for visualizing the structure of signaling processes. Most modeling tools for intracellular signaling networks assume well-mixed biochemical systems without taking into account spatial aspects such as the non-homogeneous intracellular distribution of signaling components. Mechanisms like membrane recruitment of signaling components and the interplay between membrane proximal activation and cytosolic deactivation, in addition to intracellular compartmentalization, can make the assumption of a well-mixed homogeneous cellular biochemistry rather unrealistic. However, due to the greater conceptual and computational effort required for spatially resolved simulations, only few generic modeling tools are capable of simulating such spatial aspects, thereby linking the scales of signaling networks and sub-cellular distribution of molecular components on the multi-scale map (Fig. 1).
Some tools simulate intracellular reaction-diffusion with the help of partial differential equations [6, 18] that, in addition to time, include space as a variable. For these approaches, the intracellular space is divided into volume elements (this process is frequently called ‘discretization’ of the space). Molecular species in different volume elements are (computationally) treated like different species with the diffusion between volume elements being equivalent to inter-species transitions. With increasing computer power, it is becoming feasible to simulate stochastic changes of particle numbers in the volume elements representing intracellular space through generalizations of Gillespie’s algorithm that include molecular diffusion events (transitions from one volume element to another) in addition to chemical reactions. For not too large regions of intracellular space, cellular membranes or cell-cell contact regions such as neuronal synapses, the stochastic diffusional trajectories of single molecular particles and their interactions may be simulated [32, 33].
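The idea of treating diffusion as transitions between volume elements can be sketched as follows. This Python example is a deliberately simple, deterministic, one-dimensional illustration: the function name, the closed (reflecting) boundaries and the numerical values are assumptions, and real tools handle three-dimensional geometries and reactions as well. Each molecule hops to a neighboring element with a rate D/h², where D is the diffusion coefficient and h the element width, exactly as if the contents of neighboring elements were different species interconverting by first-order transitions.

```python
from typing import List, Sequence


def diffuse(amounts: Sequence[float], diffusion_coeff: float, width: float,
            t_end: float, dt: float) -> List[float]:
    """Relax molecule amounts in a 1D row of volume elements with closed ends.

    The hop rate between adjacent elements is D / h**2. The explicit scheme is
    only stable if dt * D / h**2 is well below 0.5.
    """
    hop_rate = diffusion_coeff / width ** 2
    n = list(amounts)
    for _ in range(int(round(t_end / dt))):
        # Net flow from element i to element i + 1, computed from the current state.
        flows = [hop_rate * (n[i] - n[i + 1]) for i in range(len(n) - 1)]
        for i, flow in enumerate(flows):
            n[i] -= dt * flow
            n[i + 1] += dt * flow
    return n


if __name__ == "__main__":
    # Illustrative values: 10 elements of 1 micrometer, D = 1 square micrometer per second,
    # all 1000 molecules initially in the first element.
    start = [1000.0] + [0.0] * 9
    print([round(x, 2) for x in diffuse(start, 1.0, 1.0, t_end=5.0, dt=0.01)])
    print([round(x, 2) for x in diffuse(start, 1.0, 1.0, t_end=200.0, dt=0.01)])
```

Because every flow is subtracted from one element and added to its neighbor, the total number of molecules is conserved, and at long times the distribution approaches uniformity. A stochastic variant would replace the deterministic flows by randomly timed single-molecule hops.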
Models that focus on cell state transitions and their consequences for inter-cellular communication, as opposed to details of intracellular biochemistry, are frequently formulated in terms of finite state automata. Finite state automata models consist of a set of states and rules specifying how each state reacts to input signals, for instance by switching from one state to another. Adding a spatial aspect to automata models, cellular automata consist of grids of ‘cells’ that switch between states based on the states of their neighbor ‘cells’. Generalizations of the simple structure of cellular automata treat the single ‘cells’ as agents that can carry their states with them as they move on the grid representing extracellular space. Such agent-based models have been used for computer simulations of multi-cellular systems such as the adaptive immune system [35–37], excitable tissues or populations of migrating cells, while others aim to provide generic platforms for studying how tissue-, organ- and organism-level biological phenomena may emerge from the behavior of simple, locally interacting cells. While cellular automata models treat the single ‘cells’ in their simulations as entities with fixed shape and size, Potts model simulations aim at reproducing the shape changes cells undergo due to mechanical contact with neighbor cells or extracellular matrices [41–43]. The boundary between (frequently agent-based) simulations of spatially confined multicellular systems and dynamic models of cell populations characterized mainly by their phenotype is somewhat diffuse. In those situations in which a model does not have to keep track of single-cellular states, cell population simulations are more efficiently performed based on differential equations than based on discrete agents. 
Differential equation models of cell populations describe how the change per unit time of the size of each population (that is, the number of cells in the population) depends on cell proliferation, death, differentiation and on interactions with other cell types or infectious agents [44–46]. Sometimes, such models include translocation of cells between spatial compartments, which, in the mathematical description, is very similar to state transitions. Dynamic differential equation based models have their roots in general population dynamical modeling. Predator-prey models, in particular, have been used to numerically investigate the influences of negative and positive interactions between cell populations.
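A predator-prey description of two interacting cell populations can be written down and integrated in a few lines. The Python sketch below uses the classical Lotka-Volterra equations with a fourth-order Runge-Kutta integrator; interpreting the 'prey' as proliferating target cells and the 'predators' as effector cells that expand upon encountering them is an assumption made for illustration, as are the function names and all parameter values.

```python
import math
from typing import Tuple

Params = Tuple[float, float, float, float]  # (a, b, c, d)


def rates(x: float, y: float, a: float, b: float, c: float, d: float) -> Tuple[float, float]:
    """Lotka-Volterra right-hand side.

    x: target ('prey') cells, proliferating at rate a and killed at rate b * y.
    y: effector ('predator') cells, expanding at rate c * x and dying at rate d.
    """
    return a * x - b * x * y, c * x * y - d * y


def integrate(x: float, y: float, params: Params, t_end: float, dt: float) -> Tuple[float, float]:
    """Classical fourth-order Runge-Kutta integration up to t_end."""
    for _ in range(int(round(t_end / dt))):
        k1 = rates(x, y, *params)
        k2 = rates(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], *params)
        k3 = rates(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], *params)
        k4 = rates(x + dt * k3[0], y + dt * k3[1], *params)
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return x, y


def conserved_quantity(x: float, y: float, a: float, b: float, c: float, d: float) -> float:
    """Quantity that stays constant along exact Lotka-Volterra trajectories."""
    return c * x - d * math.log(x) + b * y - a * math.log(y)


if __name__ == "__main__":
    params = (1.0, 0.1, 0.02, 0.5)   # illustrative values, arbitrary time units
    x0, y0 = 40.0, 5.0
    x1, y1 = integrate(x0, y0, params, t_end=20.0, dt=0.001)
    print(x1, y1)
    print(conserved_quantity(x0, y0, *params), conserved_quantity(x1, y1, *params))
```

Comparing the conserved quantity before and after integration is a simple way to check that the numerical solution has not drifted. Models of real cell populations typically add terms, for example for differentiation or for translocation between compartments, and then no longer have such an invariant.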
The two paradigmatic fields with a long-standing tradition of multi-scale modeling efforts are heart and brain research. Performing computational studies in these disciplines typically means facing the challenges of multi-scale models because the functions of heart and brain can only be understood by investigating the cooperative actions of large numbers of cells. Fortunately, for the cell types and their modes of communication in heart and brain tissue, useful models can be formulated and valuable insights gained based on strong simplifications and abstractions, many of which include separation of biochemical from electrophysiological (and in the case of the heart: mechanical) cellular behavior. Recent modeling advances in neurobiology have produced models comprising millions of neurons and much larger numbers of synapses with different classes of functionality, and models of cardiac dynamics are now capable of reproducing and analyzing many aspects of whole heart behavior [49, 52–54]. Among the challenges that the disciplines of computational neurology and cardiology will now face is to develop models that use realistic couplings between intracellular biochemistry and cellular electrophysiology.
The modeling and simulation software Simmune [6, 13] was developed to connect the scale of interactions between molecular binding sites to the scale of (spatially resolved) intracellular biochemistry, to the scale of whole-cell behavior and, beyond that, to the scale of multi-cellular systems by combining reaction network simulations with rules (or mechanisms) for state transitions such as those used in automata models. The software performs most of the bridging between the scales automatically. The user defines molecular properties, such as diffusion coefficients and whether a molecule type represents a trans-membrane receptor, a membrane-anchored adaptor or a freely diffusing cytosolic component, as well as the interactions between molecular binding sites and the (enzymatic) transformations they mediate. Based on these inputs, the software automatically generates the resulting network of interacting molecular complexes and their kinetic relationships. Defining these properties does not require writing computer scripts or reaction equations. Instead, iconographic representations of molecules and molecular complexes are used (see Fig. 2a). After the building blocks of the extra- and intra-cellular biochemistry have been defined, they can be used to specify the phenotype and the behavior of cells by defining the initial molecular contents of cells and, importantly, by defining cellular stimulus-response mechanisms that link sets of conditions (stimuli) to sets of actions (responses) that will be performed by the cells. A simple stimulus-response mechanism could, for instance, link a supra-threshold ligation of a specific receptor in a user-defined region of the cytoplasmic membrane to cellular proliferation, death or differentiation. In this way, the scale of (sub-)cellular biochemistry can be coupled to the scale of whole cell behavior using phenomenological shortcuts (see section 2). 
During a simulated experiment, cells can be positioned into simulated 3D extracellular compartments and can interact through their surface receptors and can be exposed to extracellular molecular stimuli. The intracellular biochemistry of each cell can be investigated in detail and can be linked to the history of its movements and of the stimuli it received through other cells or extracellular molecules (see Fig. 2b).
Practically speaking, taking a ‘systems’ approach to a biological problem requires the integration of data collected over multiple scales in an effort to more comprehensively describe a biological process. The aim in such research efforts is to establish an infrastructure for data collection, storage and analysis that provides the basis for computational modeling of a biological system [55, 56]. Such models could in essence be limited to a single scale, but the potential of systems biology will only be realized with the development of multi-scale quantitative models capable of predicting the physiological consequences of interventions at the molecular level.
Almost any biological experiment will produce data that can potentially be incorporated into a multi-scale model, but developing an appreciation of where various types of data fit into a multi-scale modeling environment is a valuable exercise. The following descriptions of data types begin at the finest level of detail, describing molecular interactions, and work towards data that provide limited or no molecular details but more accurately reflect a physiological function.
Detailed studies of molecular interactions between known components of a biological process might involve assessment of binding and reaction kinetics (SPR, ligand-receptor association), determination of the structure of both single proteins and complexes, measurement of post-translational modifications, or accurate quantitation of component concentrations. The number of components involved in a defined process may be expanded through literature survey and/or protein-protein interaction screens.
Any effort to connect a detailed molecular interaction model to a description of the contribution of such interactions in a cellular network requires an appreciation that the significance of an in vitro determination is context dependent. Assuming that co-expression of the said components in any cell system under study is a given, it is important to establish that the components can interact in the cell (biochemical fractionation, co-immunoprecipitation) and that they have overlapping spatial distribution (fluorescence microscopy). Stimulus dependent regulation of component levels (expression arrays, next-generation sequencing) is also a key determinant of network structure and dynamics, as is the activity state of components (associated inhibitors, activity dependent post-translational modifications).
Expression arrays, proteomic and flow cytometry approaches and quantitative microscopy provide important data sets for prediction of network function in different cellular contexts. Testing of such predictions ultimately rests on perturbations that can establish network sensitivity. Use of small molecules, genetic knockouts, RNA interference and expression of dominant negatives are currently popular approaches that can identify key nodes in a network. Epistasis mapping can also be used to establish whether redundancy in key components confers robustness to the system.
Can a model describing network structure and sensitivity nodes established using in vitro or cell-based assays predict higher order cell-cell interactions and physiological phenomena? The immune system is especially well-suited to address such questions as it can be reconstituted with specific cell types harboring specific genetic modifications. Thus it allows us to ask whether computational models derived from molecular data linked through the multiple scales described here can explain or predict cellular function and cell-cell interactions in vivo. The ultimate goal in this regard would be to develop a predictive multi-scale model of how an organism interacts with an infectious agent, and how the genetics of both the pathogen and the host affect the outcome of such an infection.
Due to the inherent heterogeneity of their components, developing successful multi-scale models is, above all, an exercise in integrating various simulation techniques, various experimental techniques and data from heterogeneous sources. Projects like Cytoscape (http://www.cytoscape.org) or the geWorkbench of the National Center for the Multiscale Analysis of Genomic and Cellular Networks (MAGNet) (http://magnet.c2b2.columbia.edu/index.php) aim at providing frameworks for such integration efforts.
On the computational side, model sharing standards such as SBML and CellML are evolving to accommodate new modeling capabilities (generation of reaction networks from molecular binding site interactions, spatially resolved computational representations of intracellular biochemistry etc.). While variations in experimental protocols and measurements are unavoidable, increasing emphasis will have to be put in the future on improving standardization of ontology and data reporting to take us closer to the grand vision for multi-scale modeling and systems biology: seamless integration of models, simulation techniques and the data necessary to build and test the models.
This work was supported by the Intramural Research Program of the National Institutes of Health, National Institute of Allergy and Infectious Diseases.
We thank the reviewers for their comments and for suggesting that we cover some additional aspects of multi-scale modeling we had previously omitted.
Fig. 1 contains an image created by Thomas Splettstoesser, published at: (http://commons.wikimedia.org/wiki/Image:Arp2_3_complex.png).
¹ For a more philosophical discussion on the explanatory value of phenomenological versus mechanistic models see, for example, reference 9: Craver, C., When mechanistic models explain. Synthese, 2006. 153(3): p. 355–376.