We specified a rule-based model for molecular interactions in the ERBB receptor signaling network (see Methods). The model specification, including nominal parameter values, is provided in the form of a BioNetGen input file
[
28], which is a plain-text file. The file comprises the “Full Model Specification” Tiddler of our TiddlyWiki, which is available online (
https://modeling.tgen.org). The BioNetGen input file, which is named “ERBB_model.bngl,” is also provided separately (Additional file
1). The collection of online materials is included in the Supplementary Material as an archive file (Additional file
2). The model is composed of 544 rules. It accounts for 18 proteins, over 30 protein domains, several linear motifs, and 56 sites of lipid and protein phosphorylation. The rules of the model represent interactions of ligands with ERBB receptors, receptor dimerization, phosphorylation-dependent interactions of adapter proteins with receptors, the MAPK cascade downstream of Ras, PI3K signaling events that regulate phosphorylation of Akt, multiple feedback loops, and phosphorylation events that regulate the above processes. Dephosphorylation reactions are included in the model, but the phosphatases that mediate these reactions (e.g., PTEN and SHP-2) are not explicitly considered.
The model accounts
implicitly for more chemical species (»10
100) than there are molecules available to populate these species (Figure
). The model is able to provide this description because of simplifying assumptions embedded in its rules
[
10], which derive from assumptions of modularity. We view such assumptions as reasonable because proteins are composed of modular parts
[
33]. The trade-off for concise model specification is coarse-graining of chemical kinetics, meaning that all reaction events implied by a rule are taken to have the kinetic rate law associated with the rule. This coarse graining is controllable, as rules can be refined as needed to capture empirical data. Indeed, the only essential difference between a rule-based model and an ODE model lies in the means of model specification; both types of models provide representations of chemical kinetics
[
34]. To specify an ODE model, a modeler must state which chemical species in a system are populated and how these species are connected and influence each other. In contrast, to specify a rule-based model, a modeler must state which interactions are occuring in a system and the contextual dependencies of these interactions. The latter approach is more convenient when interactions depend mostly on local properties of proteins, such as whether a site is phosphorylated and free. Rules for interactions, together with rate laws and parameter estimates, can be used to predict which of the potentially populated chemical species are populated, regardless of the number of potentially populated chemical species
[
11-
13]. The main point of Figure
is to illustrate that known interactions and post-translational modifications of EGFR imply a number of potentially populated chemical species that is so large as to confound intuition and the ODE modeling approach, because the subset of populated chemical species, which is the information required to specify a mechanistic ODE model incorporating site-specific details about EGFR interactions, is impossible to identify through measurement or inference based on simple reasoning.
A challenge of developing a large model is communicating the substance of the model in such a way that it can be understood. In Figure
, we present an extended contact map
[
22], which shows the proteins, protein components, and sites of phosphorylation as well as the direct interactions and enzyme-substrate relationships considered in the model. Proteins are represented as boxes and arranged in layers to suggest the causality of signaling events, with the top layer corresponding to ligands, the layer below corresponding to ERBB receptors, etc. Most of the 544 rules of the model can be mapped to one of the 31 interactions represented by arrows in Figure
. The rules corresponding to a given arrow represent a common interaction but in different contexts. The correspondence between rules and arrows is indicated in a model guide
[
22], which is described below.
Making a large model reusable and extensible requires not only a means to understandably visualize the model but also annotation so that the basis of the model can be evaluated and updated as new knowledge is generated. To annotate our model, we prepared a model guide
[
22] (see Methods and Additional Materials). The guide links formal elements of the model (viz. graphs used to represent proteins and their component parts) to information about these components available in online resources, such as UniProt
[
35], OMIM (
http://omim.org), and Pfam
[
36]. This ability to easily connect formal model elements to information available in online resources, including sequences, is one of the advantageous and innovative features of rule-based modeling. For each protein included in the model, the guide includes a brief summary of available knowledge that was considered in the formulation of the model. Finally, as mentioned above, the guide links the arrows of Figure
to specific rules.
The parameters of our model, rate constants and protein copy numbers, are largely unknown. Identifying the values of these parameters to obtain a validated, predictive model would be a formidable challenge. Here, our intention is simply to demonstrate the feasibility of specifying, visualizing, annotating and simulating a model that captures the site-specific details of protein-protein interactions in a signaling network. Such a model can make predictions about time courses of phosphorylation for individual serine, threonine, and tyrosine (S/T/Y) residues, which is essential for mechanism-based interpretation of multiplex temporal phosphoproteomic data
[
37]. For the purpose of demonstrating that the model can be simulated, we divided the model parameters into several classes and estimated a range of feasible values for each class (Table
). We then sampled within these ranges to randomly specify nominal parameter values (see Methods).
| Table 1Ranges considered for six classes of model parameters |
As stated above, the sheer size of the network captured (implicitly) in our model (in excess of 10
100 reachable chemical species) has posed a barrier to simulation using conventional methods. Obviously if the cost of simulation scales with network size, then simulation of such large-scale reaction networks becomes impractical. On-the-fly stochastic simulations algorithms are an alternative to numerical integration of ODEs
[
38,
39] but on-the-fly simulation also becomes prohibitively slow as the number of populated states increases and the number of reachable states explodes
[
40]. The CPU time required for simulation of the model using such a method increases exponentially as the number of reactions in a network grows (exponentially). In contrast, network-free simulation methods
[
11,
12,
40,
41] have a constant cost of simulation per reaction event and hence the CPU time increases linearly with the number of reaction events in a system (Figure
A).
Figure
B illustrates that in our model a large number of chemical species quickly become populated after initiation of ERBB receptor signaling. Within 1 second after initiation of signaling, over 1,500 chemical species are populated. This number of species exceeds the number that can be practically considered in a manually specified ODE model in which one equation would be required for each reachable species. The results of Figure
B suggest that dispersion of mass into a large number of chemically distinct states is an inherent feature of cell signaling networks and explains why the on-the-fly method becomes impractical (Figure
A). It should be noted that the simulations of Figure
are not physiological, as the initial condition is artificial. The point of these simulations is simply to demonstrate that interactions of signaling proteins can be expected to lead to the population of more chemical species that can be practically tracked in an ODE model.
To simulate the model we use NFsim
[
13], which implements a network-free simulation algorithm
[
40]. The simulation results are shown in Figure
. The heat map of Figure
reports time courses of phosphorylation for the 55 S/T/Y residues considered in the model. The time courses, which are clustered by similarity, are representative of results obtained with other parameter values, in that the model consistently predicts distinct kinetics for different sites of phosphorylation. Thus, multiple sites of phosphorylation can be lumped together only with careful consideration, because in general, the kinetics of phosphorylation can be site specific. It is possible to generate the results of Figure
because the cost of network-free simulation, which was applied to obtain these results, depends on the number of rules in a model, not the number of reactions or chemical species implied by the rules.