|Home | About | Journals | Submit | Contact Us | Français|
Computational models of signal transduction face challenges of scale below the resolution of a single cell. Here, we organize these challenges around three key interfaces for multiscale models of cell signaling: molecules to pathways, pathways to networks, and networks to outcomes. Each interface requires its own set of computational approaches and systems-level data, and no single approach or dataset can effectively bridge all three interfaces. This suggests that realistic “whole-cell” models of signaling will need to agglomerate different model types that span critical intracellular scales. Future multiscale models will be valuable for understanding the impact of signaling mutations or population variants that lead to cellular diseases such as cancer.
Cells sense their environment, process information, and respond through the molecular biology of signal transduction.22 One amino acid mutation in a signaling protein can have profoundly adverse consequences for diseases such as cancer.14 Thus, signaling is fundamentally a multiscale problem in need of modeling approaches that can bridge across scales.
There are many reviews in this issue that focus on multiscale biological models connecting molecular-cell, cell-tissue, and tissue-organ scales. Here, we restrict the scope of our review to the different molecular scales that exist within a single cell (Fig. 1). We use “molecular scales” to refer to the number and granularity of the subcellular or submolecular components that make up a system. At the lowest scale, enzymes regulate posttranslational modifications on individual proteins to transmit information via intracellular pathways. These pathways are wired together as intracellular networks that enable crosstalk and feedback control. Ultimately, the networks converge upon a key set of effector proteins, which mediate discrete cellular outcomes, such as proliferation, differentiation, and death. This review focuses on models that make predictions across the molecule-pathway, pathway-network, and network-outcome scales. We conclude with some pragmatic simplifications for future models and look ahead to the new frontiers of modeling signal transduction across multiple scales.
Signaling proteins relay information by changing their abundance, activity, localization, or binding partners. These rapid signaling events are largely triggered by posttranslational modifications (phosphorylation and ubiquitylation) on the target protein. There are many techniques for cataloging specific protein modifications and for making predictions about the protein targets of modifying enzymes.23,39,40,52,78 However, it is difficult to integrate these lists of modification sites and candidate targets in a way that makes clear predictions about new signaling pathways.
Linding et al.58 developed an integrated computational approach, called NetworKIN, for predicting candidate protein kinases for catalyzing an observed phosphorylation event. NetworKIN takes individual phosphorylation sites from proteomic data and maps the flanking amino acid residues onto position-specific scoring matrices64 to search for consensus substrate motifs of protein kinases. The algorithm is improved by incorporating contextual information about the modified protein based upon its reported protein interactions.80 Then, the local interaction network is surveyed for protein kinases with the closest sequence similarity to the consensus substrate motif. Using known kinase-substrate pairs, the authors found that incorporating contextual information doubled the accuracy of their predictions.58 The importance of cellular context was further reinforced by a recent study that incorporated subcellular localization when considering mitotic kinases with overlapping substrate specificity.4 The tools for pathway prediction are most advanced for signals mediated by protein phosphorylation. However, similar approaches could soon be adapted to other classes of posttranslational modifications as we gain a greater understanding of the sequence specificity of the modifying enzymes.19
A major complication for modeling signaling pathways is that proteins are often modified on multiple sites. This creates a challenge for experimentally monitoring n different modification sites and for modeling each of the 2n possible modification states of a signaling protein. Sometimes, the biology reveals its own simplifications that can streamline a multiscale model. For example, autophosphorylation of the fibroblast growth factor receptor (FGFR) occurs on seven distinct tyrosine residues, creating 27 = 128 possible modification states. However, Furdui et al.27 found by mass spectrometry that FGFR autophosphorylation occurs as an ordered sequence: Y653 first, then Y583, then Y463, etc. This stereotyped cascade allowed the authors to build a kinetic model of FGFR signal propagation that was far more tractable than the combinatorics originally implied.
Other simplifications can stem from the architecture of a multisite model itself. Thomson and Gunawardena79 showed that a generic multisite model of protein phosphorylation simplifies dramatically when the steady-state distribution of phospho-states is considered. The authors found that the number of possible steady-state distributions increases with the number of phosphorylation sites. The different distributions create a platform for complex information processing by downstream pathways33 and suggest an explanation for why multisite phosphorylation has expanded during evolution. Computationally, the approximation79 should prove valuable for modeling receptor tyrosine kinases, which autophosphorylate rapidly and may operate at quasi-steady state relative to their downstream pathways.
Specific amino-acid modifications can be tracked empirically by various methods, but monitoring protein–protein associations is much more difficult. For association-driven signaling pathways, mathematical models can be particularly useful to assess the relative importance of competing molecular mechanisms. One pathway where protein–protein associations are critical is apoptotic signaling induced by cytokines. Albeck et al.1 modeled the intersection of apoptotic caspases and Bcl2-family proteins upon pathway activation by TNF-related apoptosis-inducing ligand (TRAIL). The authors paid particular attention to the protein–protein associations among Bcl2-family members and their ability to form pores in mitochondria. Mitochondrial permeabilization was known to be associated with apoptosis, but the authors’ model suggested that puncturing this organelle was critical for eliciting the “snap-action” kinetics of cell death. Surprisingly, positive feedback from the enzyme-catalyzed branches of the TRAIL network was insufficient to yield snap-action behavior. The model thus enabled interrogation of specific molecular processes that remain empirically inaccessible.
Combining protein–protein interactions and post-translational modifications often results in a combinatorial explosion of species and kinetic parameters that is difficult to handle analytically.36 Blinov et al.12 and Lok and Brent59 developed two computational strategies for dealing with this complexity of scale. The software BioNetGen12 implements “rule-based modeling” in which individual reaction rules are specified for each molecular state transition. The key simplification here is that transition rates are assumed to be independent of one another. Independence allows the parameters to scale with the number of molecular species rather than with the number of molecular states of those species. Faeder et al.24 used BioNetGen to model FcεRI signaling triggered by dimeric IgE. The authors found that state trajectories varied wildly depending on the expression levels of key signaling proteins, suggesting that a spectrum of combinatorial states could confer signaling specificity.
The Moleculizer approach from Lok and Brent59 implements a different modeling strategy by building reaction networks on demand. When the combinatorics of molecular states dramatically exceeds the number of signaling molecules in a cell, one can model the stochastic behavior of individual molecules more readily than the network itself. Moleculizer exploits this property by following the trajectory of single molecules through the reaction network and then summarizing the results at the end of the simulation. This strategy solves the problem of combinatorics but creates others with respect to inefficiencies in parameter estimation.13 Nevertheless, BioNetGen software now accommodates Moleculizer-like approaches, so the relative performance of the two can be compared for specific combinatorial applications.
Various intracellular pathways have now been encoded as detailed biochemical reaction networks with plausible model parameters.1,6,17,37,56,62,67 These models can explain and predict experimental data, but it is unclear how scalable they are when multiple interacting pathways are considered. The earliest efforts at wiring together pathways coincided with the first mechanistic models themselves. Bhalla and Iyengar11 assembled a library of signaling pathways and showed that inter-pathway connections could lead to “emergent properties” within the network. For example, crosstalk between the Ras-MAPK and PLCγ-PKC pathways through cytosolic phospholipase A2 creates a positive feedback loop and the emergence of MAPK-PKC bistability. These and other network architectures were shown to provide dynamical-systems properties that could give rise to sustained signaling activity, even when the initial stimulus was withdrawn. This thought-provoking study was among those that laid the groundwork for defining systems biology as a discipline.41
More recently, others have built upon established models of signaling pathways to incorporate multiple input stimuli. Borisov et al.15 began with an existing biochemical-reaction model of PI3K-MAPK signaling induced by the EGF receptor and then overlaid the receptor-adaptor proteins for insulin. The authors’ network reconstruction, model training, and experimental predictions pointed to the adaptor protein GAB1 as an important site of crosstalk for EGF+ insulin-induced MAPK activation. Basak et al.8 achieved the same goals for NF-κB signaling but in reverse. They began with the biochemistry of the noncanonical IκB protein p100/IκBδ, uncovering an essential role in developmental NF-κB signaling from the lymphotoxin β receptor. Then, the authors built upon a pre-existing model of canonical NF-κB signaling37,82 to include a noncanonical arm containing p100/IκBδ. The revised model accurately predicted NF-κB “cross-priming,” whereby prestimulation of cells with TNF augmented the subsequent response to lymphotoxin β.8 Together, the Borisov et al.15 and Basak et al.8 studies illustrate how existing biochemical-pathway models can be intelligently expanded to networks that receive multiple inputs.
Sometimes, downstream signaling mechanisms are too entangled to attempt a complete biochemical-reaction model with multiple input stimuli. However, there is often a basic understanding of the rules that different signaling pathways obey, and a rudimentary sense of how they may intersect. Here, Boolean network models have been found to be particularly valuable. Starting with the Ingenuity repository of literature-derived connections, Saez-Rodriguez et al.73 assembled a provisional Boolean network shared by seven cytokine receptors. The authors pruned their initial network by comparing the model predictions to a large set of phospho-protein measurements collected in HepG2 hepatocellular carcinoma cells. This step reduced the number of network edges by several fold, raising the possibility that only a small set of possible interactions are active in a specific cellular context. Moreover, the authors found a handful of edges that improved model performance but were absent in the initial network. Similarly novel edge requirements were also reported by Aldridge et al.,3 who used fuzzy logic (rather than discrete Boolean logic) modeling to reconstruct the TNF–EGF–insulin signaling network. Thus, simple rule-based network models are capable of proposing new experimentally testable interactions, even though the mechanisms themselves are not hard-coded as formal biochemical reactions.
There remain many biological scenarios where pathway connections are too speculative to assemble a plausible rule set for logic-based models. When scaling such pathways into networks, it is preferable to use modeling approaches that are purely data driven. Some data-driven approaches are incredibly simple but useful. For example, using signaling data from a two-ligand screen of cytokines and GPCR agonists, Natarajan et al.63 devised a straightforward “interaction score” to gauge the nonadditivity between input stimuli. Combined with a literature-based reconstruction, the score allowed the authors to hone in on a handful of surprising points of crosstalk that merited further investigation.
Other data-driven approaches are based on associations within a dataset. Garmaroudi et al.28 assessed specific pairwise correlations between phosphoproteins activated in cardiomyocytes upon infection with coxsackievirus B3. The authors built a data-driven phosphoprotein network by using partial correlation coefficients, which account for the confounding correlations among other measurements within the dataset. This reconstruction revealed that NF-κB signaling induced by inflammatory autocrine cytokines45 was a critical component of the host cell-death response.28 An elegant, but data-intensive, alternative to frequentist approaches is to infer network associations de novo with Bayesian methods.26 Sachs et al.72 exploited the single-cell “experiments” inherent to flow cytometry and reconstructed a network of nine phosphoproteins in CD4+ T cells. The application of Bayesian networks should become more widespread with the advent of mass cytometry, which can measure dozens of signaling proteins concurrently in single cells (see below).9 Given the many analytical approaches now available for network reconstruction, one must look closely at the pressing biological questions and available information before determining a strategy for multiscale modeling.46
Signaling networks carry information that determines key cellular behaviors. The approaches described above give detailed predictions about individual components or pathways, but they are inadequate for capturing how the pieces fit together to influence cell phenotype. This is because of the long time delay between signaling and most cell phenotypes, which creates a problem for network models that usually operate on shorter time scales. To bridge this gap, data-driven statistical models have been very successful and are now widely used.49
One context where data-driven models are clearly advantageous is when considering multiple cellular inputs. Although mechanistic approaches are possible,8,15 such biochemical-reaction models usually stop at effector signals and cannot move further downstream to explicit cell outcomes. Janes et al.44 took a data-driven approach to apoptosis and measured a panel of signal transduction and effector proteins that became activated during combinatorial stimulation with conflicting stimuli (TNF, pro-apoptotic and EGF or insulin, anti-apoptotic). Using nine combinations of cytokines, they measured nineteen intracellular parameters at multiple time points to generate ~8000 measurements, which were directly paired with multiple readouts of apoptosis. To interpret the high-dimensional dataset, the authors used partial least squares regression (PLSR) as a means to reduce the measurements to two axes that mapped to apoptotic response. The PLSR model accurately predicted cell-death outcomes based on how strongly the intracellular measurements projected onto the reduced two-dimensional space. Notably, the two axes were associated with separate combinations of survival and stress-death pathways, suggesting biologically distinct dimensions of signal transduction. Projections along these axes revealed different strategies that cytokine stimuli use to modulate the apoptotic response. For example, EGF reduced cell death by antagonizing apoptotic signaling induced by TNF, whereas insulin antagonized TNF-driven signaling and also supplemented the network with additional pro-survival signaling. This study established a framework in which complex extracellular stimuli and intracellular networks can be condensed into a simpler dimensional space that directly and mechanistically maps to cell phenotype across longer time scales.
A major challenge in cell signaling is that phenotypic responses to environmental stimuli are usually cell-type specific. These observations conflict with models of signaling pathways, which often imply that cells have a common intracellular circuitry.66 To hone in on the origin of cell-type specificity, Miller-Jensen et al.60 sought to examine the “effector layer” of signaling. Signaling networks often have an “hourglass” topology,66 and effectors lie at the waist where signaling inputs converge and outputs are disseminated. The authors selected three epithelial cell types with profoundly different cell death and cytokine responses to adenovirus infection and TNF. As expected, effector signaling was dramatically different amongst the cell lines. But surprisingly, the difference in responses could be captured by a common “effector processing” model that linked cell-specific effector signals to cell-specific outcomes. This study thus suggested that cell-type specificity arises at the transducer level between adaptor proteins33 and effector proteins.60
Like apoptosis, proliferation is promoted or suppressed by various external cues and has been shown to be amenable to data-driven modeling. Two different studies constructed PLSR models of proliferation, each illustrating different strategies for model training. Kumar et al.53 measured downstream phosphorylation states in the context of siRNA perturbation to construct a model of B-cell proliferation induced by the B-cell antigen receptor. In this strategy, the authors used loss-of-function perturbations directed at known network components to decompose input–output relationships in B cells. By contrast, Kumar et al.54 took an approach that was less biased but more correlative. The authors used mass spectrometry data of the phosphoproteome to identify a subset of nine phosphorylation sites on six proteins that accurately predict cell proliferation in response to EGF-family cytokines. Together, these two studies show how the choice of training influences the resulting value of a multiscale network-phenotype model.
Occasionally, intracellular signals are so tightly associated with certain cell functions that the signal itself can be considered as a phenotype. For the fastest cellular responses, second messengers such as Ca2+ and cAMP often fill this role. Saucerman et al.76 constructed a differential equations-based kinetic model of cAMP production in cardiomyocytes, where cAMP controls cell contractility. By perturbing the in silico network, they uncovered potential mechanisms for maintaining cAMP concentration in response to iso-proterenol stimulation. Cardiomyocyte contractility may be a special case where detailed molecular models of signaling can scale to cell and tissue function.18
Like other cell phenotypes, second-messenger signaling is subject to convergent stimuli from the environment. Chatterjee et al.16 examined calcium-signaling dynamics in platelets activated by six different agonists. The authors collected calcium transients triggered by individual or paired stimuli and used these data to build a neural-network model linking inputs to calcium mobilization. The model accurately predicted the calcium trajectory of platelets activated by higher-order combinations of stimuli despite that the model was trained with data from pairwise stimulation. These results suggest that pairs of extracellular stimuli are sufficient to capture how complex environmental inputs modulate signaling outcomes, which may provide an important simplification for future studies.43
Can models of signal transduction be scaled effectively beyond the level of a single cell? Achieving this long-term goal will require models that can accommodate many more diverse inputs while keeping track of individual cell fates across longer spatial and time scales. Future modeling efforts will also need to invoke simplifications at the molecular level that are acceptable for tissue- and organ-level simulations.
Paracrine-juxtacrine signaling plays an important role in the development and maintenance of normal tissues.81 How multiple cells coordinate and respond to these signals is difficult to decompose experimentally because they are highly iterative. During development, for example, initial morphogen gradients drive tissue polarity, which leads to subsequent molecular patterning.55 This complexity can be made more tractable when examining the earliest cell-to-cell communication events in model organisms. In the Drosophila syncytium, multiple nuclei share a common cytoplasm, and signaling can be dynamic and compartmentalized during development. Sample and Shvartsman75 have begun to model the reaction–diffusion dynamics of morphogens within this system, raising the possibility that outputs from this model could serve as inputs for a multicell model after cellularization. Syncytial modeling could provide a bridge between the single-cell molecular scales and multicell spatial-time scales, much like how Xenopus biochemical reconstitutions have helped to connect signaling pathways to higher-level network properties through modeling.25,74
Cell-to-cell heterogeneity occurs in many normal and disease contexts, such as development and cancer. Accounting for heterogeneity is important—there may be variation in molecular states on a single-cell level so profound that the “average” cell in a computational model does not exist.57 Recently, there have been several technical advances that allow more in-depth monitoring of single-cell states. Bendall et al.9 reported the combination of flow cytometry with mass spectrometry (“mass cytometry”) as a means for quantifying dozens of proteins within single cells. In this technique, antibodies are labeled with elements not normally present in cells. These elements provide an isotopic tag for mass spectrometry measurements that quantify the amount of protein of interest, and tags can be readily multiplexed with antibody cocktails. Mass cytometry should be invaluable for identifying heterogeneously activated signaling pathways in suspension cells, though sample-processing challenges remain for adherent cells that must be dissociated before antibody staining.
Sampling strategies from multiple cells can also give single-cell information when combined with quantitative analysis. Janes et al.48 reported a technique called “stochastic profiling,” which uses repeated ten-cell samplings to identify heterogeneously expressed transcripts. Ten-cell samplings avoid population averaging and allow for accurate nucleic acid amplification that would not otherwise be possible from single cells. The analysis uses statistical fluctuations in the 10-cell samplings to highlight genes that are heterogeneously expressed. While the authors restricted their analysis to gene measurements, the general sampling strategy could be extended to signaling measurements that are sensitive down to tens of cells.42,65
Single-cell methods will help to guide our knowledge of how cells interact with one another but raise an important challenge of how to synthesize and model these interactions. With this goal, agent-based models have been successfully employed to interrogate cell–cell and cell-environment interactions across longer spatial-time scales.68,69,71 These models treat individual entities (e.g., cells) as “agents” that follow a set of biologically relevant rules. Allowing the agents to iterate through the rules can reveal emergent phenomena of the system. Agent-based models are powerful for examining the influence of the cellular microenvironment,7 proposing critical cell-environment interactions that could easily be overlooked.
The compounding complexity of signaling across molecular scales raises the question of whether any simplifications can be invoked during the process of model building. Early on, for example, modelers went to great pains to assign absolute number to concentrations of signaling proteins in a network.11,17,51 The feeling was that such network models were not “complete” unless they accurately captured each molecular detail that was encoded. However, it soon became clear that systems-level properties of signaling networks were robust to fluctuations in most of the constituent proteins.5,10,37,51 In addition, multiple groups showed that the functional response of cells correlates more strongly with relative changes in signaling rather than absolute levels.20,32,47,61 Consequently, there is now a greater focus on the kinetic properties of signaling and the changes relative to a semiquantitative baseline value.17,62 It is far easier to quantify relative fold changes in signaling proteins compared to absolute levels.2 Adopting the fold-change simplification therefore allows multiscale models to be constrained by a larger set of experimental data. Similar results have since been reported for kinetic parameters,77 suggesting that network topology may be the critical determinant of systems-level properties.
Another recently uncovered simplification involves signal processing and the superposition of multiple environmental inputs.43 A decade ago, the first consortium-wide efforts for signal transduction assumed that crosstalk and nonlinearities would abound among receptors.31 Various instances of synergy and antagonism were uncovered in this work,63 but they were surprisingly rare compared to the number of stimulus pairs considered. Since then, multiple independent studies have found that synergy or antagonism beyond pairs of inputs is virtually nonexistent.16,30,38,43 Higher-level combinations of input stimuli are transduced as linear combinations, providing a dramatic simplification for modeling the complexities of a cell’s microenvironment.43 One can efficiently survey the input–output properties of a signaling network by using stimuli29,44 or perturbations28 in pairs to infer other higher-order combinations by linear superposition.
There is no “one size fits all” approach for modeling signal transduction across multiple scales (Table 1). For example, ordinary-differential-equation models thrive at capturing the molecule-to-pathway scale, struggle with the pathway-to-network scale, and can only model the networks-to-outcomes scale if the outcome is a molecular readout. Therefore, we predict that true multiscale modeling in the future will require hybrid models that integrate completely different model types in a coherent way.35 If successful, the clinical impact of such models could be tremendous. Already, many signaling diseases are diagnosed and treated at the molecular level. Our models must keep pace and strive to push the field forward.
We thank Eric Greenwald for critically reviewing this manuscript. Work in the Janes lab is supported by the National Institutes of Health Director’s New Innovator Award Program (1-DP2-OD006464), the American Cancer Society (120668-RSG-11-047-01-DMC), the Pew Scholars Program in the Biomedical Sciences, and the David and Lucile Packard Foundation.