Highly complex molecular networks, which play fundamental roles in almost all cellular processes, are known to be dysregulated in a number of diseases, most notably in cancer. As a consequence, there is a critical need to develop practical methodologies for constructing and analysing molecular networks at a systems level. Mathematical models built with continuous differential equations are an ideal methodology because they can provide a detailed picture of a network’s dynamics. To be predictive, however, differential equation models require that numerous parameters be known a priori and this information is almost never available. An alternative dynamical approach is the use of discrete logic-based models that can provide a good approximation of the qualitative behaviour of a biochemical system without the burden of a large parameter space. Despite their advantages, there remains significant resistance to the use of logic-based models in biology. Here, we address some common concerns and provide a brief tutorial on the use of logic-based models, which we motivate with biological examples.
The emergence of molecular biology has produced a vast literature on the cellular function of individual genes and their protein products. It has also generated massive amounts of molecular interaction data derived from high-throughput methods as well as more classical low-throughput methods, such as immunoprecipitation, immunoblotting, and yeast two-hybrid systems. From this accumulation of interaction data, researchers can now attempt to reconstruct and analyse the highly complex molecular networks involved in cellular function.
Intracellular molecular networks are known to be highly dysregulated in a number of diseases, most notably in cancer, and targeted molecular inhibitors have emerged as a leading anti-cancer strategy. Despite promising pre-clinical studies, many targeted inhibitors are beset by harmful off-target effects and/or lower than expected efficacy in the clinic. The large number of off-target effects associated with molecular inhibitors was recently termed the “whack a mole problem”1 because inhibiting one molecular target often results in the activation of another non-targeted molecule. It is increasingly clear that the inability of many targeted therapies to keep a disease in check is related to the complex interactions and emergent, non-linear behaviours found in intracellular networks. As a consequence, there is a critical need to develop practical methodologies for constructing and analysing molecular networks at a systems level.
The objective of systems biology is to integrate experimental data with theoretical methods to build predictive models of complex biological processes across a variety of spatial and temporal scales. Two very different paradigms of systems biology are frequently used to construct and analyse network models of molecular interactions inferred from experimental data: structural network analysis methods and mathematical models based on differential equations. A third increasingly important network analysis paradigm in systems biology is the application of logic-based methods to generate predictive output.2,3 Although qualitative in nature, logic-based methods have the capacity to provide insights into the dynamics of highly complex gene regulatory and signal transduction networks without the burden of large parameter spaces.
Understanding the networks associated with neoplastic diseases offers especially difficult challenges. Fundamental problems in understanding the transition from the normal to near normal to dysplastic to neoplastic to metastatic states of cancer progression can theoretically be modelled by longitudinal comparisons of networks in which, as progression occurs, certain molecular interactions are rendered stronger (for instance through gene amplification) or lost (through mutation, deletion, down-regulation, or methylation). Logic models provide a framework in which these types of network comparisons are possible. Multi-state logic models can simulate signal amplification and random order asynchronous logic models can simulate the heterogeneous response in a population of cells to diverse stimuli. Logic models are also well suited for performing in silico molecular perturbations, which could be used to predict a population level response to a targeted therapy or a combination of therapies. In this review, we provide a tutorial on the use of logic-based methods as well as a discussion of their limitations, using biologically motivated examples.
Typically, knowledge of molecular interactions is summarized in diagrams of varying complexity, commonly known as interaction networks.4 In an interaction network diagram, each node represents a molecule and a line drawn between two nodes represents a molecular interaction, also referred to as an edge in graph theory. If the nature of an interaction between two nodes is known (e.g., which molecule is the regulator that activates or inhibits the other molecule), the edge is said to have directionality. If a correlation between the activities or expression levels of two nodes is known but the causal relationship underlying their interaction is not, the edge is said to lack directionality.
Structural network analysis provides a picture of the correlations between molecules in very large networks. In structural network models, which are usually derived from high-throughput genomic or proteomic experimental methods, the directionality of interactions in the network is generally not known and it is the static correlations in expression patterns that are important. The primary objective of these methods is to infer functional patterns in large networks using statistical methods.5 These methods are also used to construct species-specific interactomes.4 A limitation of these methods, however, is that they generally provide only a static view of molecular interactions in a network at a single point in time. Additionally, the current experimental methods used to generate data for structural network models are extremely noisy,6 which further limits the predictive power of this method.
On the other hand, systems of ordinary differential equations (ODEs) are frequently used to model biochemical reactions involved in gene and protein regulation. In these models, information about the mechanistic nature of the interaction is essential and edges between species must be directional. ODE models are built from underlying biophysical principles, such as biochemical rate laws and the conservation of mass and energy. Consequently, ODE models have the capacity to be highly predictive.7–9 This predictive power translates into the ability to generate a dynamic view of the concentration of each interacting species in the network over time as well as the ability to identify biologically realistic steady states.8 The predictive power of ODE systems is dependent, however, on large numbers of kinetic parameters that are rarely known with any degree of certainty. These powerful methods are, therefore, limited both by the enormous parameter spaces involved in even a relatively simple network and by their need for detailed mechanistic knowledge a priori.
Logic-based network models were pioneered in the biomedical sciences by Kauffman10,11 and represent a compromise between structural analysis and ODE methods in terms of precision and complexity.7 While logic models do not require mechanistic knowledge of interactions, they do require knowledge of edge directionality. In their simplest form, logic-based models permit each biochemical species (represented as nodes in a network) to be in one of two discrete states: ON or OFF. The state of a logic network evolves in a dynamic fashion as nodes in the network are switched ON and OFF according to the state of other nodes in the network, until the network settles into an unchanging state, often referred to as an attractor.3 Logic-based models with only two binary states are generally known as Boolean models. While there is no explicit notion of time in a logic model, each round of updating can be considered an arbitrary time unit.
The assumption that a molecule can have only two possible states is a simplification of biological complexity. It is a reasonable regulatory approximation, however, given the switch-like sigmoidal relationship often observed between an affector molecule and its target molecule (Fig. 1A, B).3,12–14 It is important to emphasize that when a molecule’s node is OFF in a discrete logic model, it does not imply that the molecule has zero concentration in the system. Instead, it implies that the molecule is not present at a high enough level to induce a change in the molecules it directly regulates.14 When a molecule’s node is ON in a logic model, it means the molecule has reached a threshold of functional activation that is high enough to affect the state of the molecules it directly regulates. More specifically, a target molecule will remain OFF in a logic model until its activator reaches a specific threshold of activity (Fig. 1B). Likewise, a target molecule will remain ON in a logic model until its inhibitor reaches a specific threshold of activity (Fig. 1A).14 As a consequence, logic models can only provide qualitative approximations of molecular regulation. While this represents a limitation of the methodology, in reality, the majority of experimental data available on molecular regulations are also qualitative in nature.15
More complex logic-based methods have been developed, such as multi-state and fuzzy logic methods, which permit nodes to be in more than two discrete states. In addition, logic models that allow node states to vary continuously between states (e.g., from 0 to 1) have also been developed.7,12,16,17 Although theoretically able to more precisely simulate biochemical regulation,16 these more complicated approaches require parameter value estimates that are rarely known and, in some cases, are difficult to correlate with biophysical chemistry theory. Thus, discrete two-state logic models (Boolean models) are an intuitive and predictive method for describing biochemical interactions without requiring prior knowledge of complex mechanistic details of reaction kinetics (needed for ODE systems) or degrees of membership (needed for multi-state fuzzy logic systems).
Moreover, when well constructed, Boolean models can produce the same qualitative output as more quantitatively precise ODE models. For example, Albert and Othmer18 used a Boolean model of the genetic regulation of segmentation patterns in Drosophila to produce results that were in close agreement with an earlier ODE model of the same system.19 Fauré et al.15 analysed a simple Boolean model of cell cycle regulation and found qualitative steady state agreement with a complex ODE model.20 More recently, Akman et al.21 demonstrated that a series of Boolean models produce the same qualitative output as a series of ODE models of circadian clock regulation. In addition, Boolean models are well suited for the testing of hypothesized regulatory mechanisms18,22 and for helping to direct future experiments.23–26 They are also useful for performing a preliminary network analysis25 prior to developing more detailed experimental or theoretical models. For all these reasons, the development and analysis of two-state Boolean models will be the primary focus of this review.
In addition to the switch-like regulatory dependence described above, another switch-like behaviour in biochemical systems is related to the substrate concentration (S) needed to reach half the maximum velocity (Vmax) of a reaction, commonly represented by K0.5. Enzymologists routinely use saturation curves depicting how reaction velocity (v) varies with S (Fig. 1C) to estimate the K0.5 of a reaction. In reality, v is not dependent on S but is, instead, dependent on the specific substrate concentration, defined as the ratio of S to K0.5 (S/K0.5). The well-known constant Km is the substrate concentration needed to reach half of Vmax under Michaelis–Menten (MM) kinetics.27 While K0.5 is sometimes called the “apparent Km”, K0.5 is not restricted to kinetic mechanisms that follow the MM approximation.
The standard MM expression for the velocity (vMM) of a non-reversible enzyme catalysed reaction is presented in eqn (1). This expression is similar to the generalized expression for the velocity (v) of any enzyme catalysed reaction presented in eqn (2), where S is converted to a product. Here, n refers to the Hill coefficient describing the degree of cooperativity in the reaction (i.e., positive, negative, or no cooperativity). Dividing both sides of eqn (2) by Vmax and factoring K0.5 out of the right-hand side gives eqn (3), which is now in the form commonly used to normalize reaction velocities (v′). Eqn (3) also resembles a standard Hill equation.
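Eqn (1)–(3) are not reproduced in the text above. Based on the description in this paragraph, and assuming the standard Michaelis–Menten and Hill forms, they can be reconstructed as follows. The symbols follow the paragraph; the exact typesetting of the original equations is an assumption.

```latex
% eqn (1): Michaelis-Menten velocity of a non-reversible reaction
v_{MM} = \frac{V_{max}\, S}{K_{m} + S} \tag{1}

% eqn (2): generalized velocity with Hill coefficient n
v = \frac{V_{max}\, S^{n}}{K_{0.5}^{\,n} + S^{n}} \tag{2}

% eqn (3): divide (2) by V_max and factor K_{0.5}^n out of the denominator
v' = \frac{v}{V_{max}} = \frac{\left(S/K_{0.5}\right)^{n}}{1 + \left(S/K_{0.5}\right)^{n}} \tag{3}
```

With n = 1 and K0.5 = Km, this form of eqn (2) reduces to eqn (1).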
In eqn (3), when S/K0.5 = 1, then v′ = 1/2. Consequently, there are two distinct regions where the normalized reaction velocity responds in a characteristic way to S/K0.5 (Fig. 1D). When S/K0.5 ≪ 1, then S < (0.01 × K0.5), meaning the enzyme is not saturated and the reaction rate is linear with respect to the substrate concentration. In contrast, when S/K0.5 ≫ 1, then S > (100 × K0.5), meaning the enzyme is saturated, the reaction rate is independent of the substrate concentration, and the reaction has reached (or is very near to) its maximum velocity. Thus, the specific substrate concentration serves as the on-off switch of a reaction. In logical terms, when the node representing S is OFF (or 0), then the specific substrate concentration will be less than 1 and the reaction cannot proceed. Likewise, when the node representing S is ON (or 1), then the specific substrate concentration will be greater than 1 and the reaction will proceed at a rate near Vmax (Fig. 1D).
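As a rough numerical check of the two regimes described above, the following sketch evaluates the normalized velocity, assuming the Hill form v′ = x^n/(1 + x^n) with x = S/K0.5. The function name and the sample ratios are illustrative choices rather than part of the original analysis.

```python
def normalized_velocity(s_over_k, n=1.0):
    """Normalized velocity v' = x^n / (1 + x^n), where x = S/K0.5.

    Assumes the Hill form of eqn (3); n is the Hill coefficient.
    """
    x_n = s_over_k ** n
    return x_n / (1.0 + x_n)


# Sample the unsaturated regime, the midpoint, and the saturated regime.
for ratio in (0.01, 1.0, 100.0):
    print(f"S/K0.5 = {ratio}: v' = {normalized_velocity(ratio):.4f}")
```

With n = 1, the velocity is roughly 1% of Vmax at S/K0.5 = 0.01, exactly half of Vmax at S/K0.5 = 1, and roughly 99% of Vmax at S/K0.5 = 100, which is the switch-like behaviour the Boolean OFF and ON states approximate.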
The use of logic to describe chemical change is not limited to the rate of biochemical reactions. Molecular substrates have been described as computational devices that process physical and chemical inputs into outputs according to Boolean logic.28 Molecular logic operations are achieved by leveraging observable chemical changes to create a computational device. These devices, for example, can be used to solve arithmetic or logic operations by exploiting changes in the conformations of chemical components. In the laboratory, molecular logic functions have been developed that rely on charge transfer, which affects the fluorescent state of a molecule.29,30 Logical functions have also been created to exploit charge transfers in cascades of coupled enzymes.31,32 In fact, molecular logic operations serve as the basis for many nanosensors currently used in the basic sciences, industry, and medicine.28
Although the theoretical underpinnings of logic models provide qualitative approximations of molecular and biochemical regulation, in reality these models can only generate predictive output when the logic model is well constructed. Several examples of well-constructed logic models providing good agreement with experimental data exist in the literature.16–18,26,33–36
For example, Li et al.34 constructed a Boolean model of the genetic network controlling the cell cycle in Saccharomyces cerevisiae. The authors found that 86% of the 2048 possible states in the network settled into a steady state (also known as a stable attractor) that corresponded to the G1 stationary phase of the cell cycle. Their analysis suggested that the regulatory network controlling the yeast cell cycle is resistant to stochastic perturbations. The authors interpreted these findings to mean that robustness in the underlying network is advantageous for the organism because, under normal conditions, there is a high probability the regulatory network dynamics will settle into the G1 state regardless of the current state of the network. Once in the G1 state, the network will remain in that state until a significant external signal perturbs the network and initiates another round of cellular division. Subsequent work by Davidich and Bornholdt37 used a similar Boolean approach to study the cell cycle regulation in Schizosaccharomyces pombe. These authors found that the majority of S. pombe network states settle into a steady state corresponding to the G1 stationary phase, which is in agreement with the results of the S. cerevisiae model. However, they also found significant differences in the regulatory network of S. pombe compared to S. cerevisiae, which yielded very different network dynamics.
Using a somewhat different logic-based approach, Bolouri and colleagues25,38 constructed an a priori gene regulatory network of endomesoderm specification control in sea urchin embryos. The network was logic-based and generated a series of testable predictions. Using computational methodologies and large-scale perturbation analyses, the authors iteratively tested and revised their model by comparing model output to biological readouts. Their use of a regulatory network to logically map inputs and outputs for cis-regulatory elements identified system level properties that would not otherwise have been observable. From this information, the authors were able to draw important conclusions about the developmental features of endomesoderm specification.
In addition to gene regulatory networks, logic models can also be used to model signal transduction networks. Li et al.26 developed a Boolean model of the signal transduction network controlling abscisic acid regulation of stomatal closure in plants. The authors employed a network construction approach that inferred indirect molecular relationships from data to build the sparsest logic network possible that was compatible with available experimental data. A random order asynchronous Boolean approach was then used to simulate the heterogeneity in a population of cells. The model results were in good agreement with previous experimental findings and generated novel predictions about the conditions likely to have the strongest effect on stomatal closure. In a subsequent manuscript, which also serves as an excellent tutorial, Albert et al.12 contrasted the asynchronous approach used in the Li et al.26 model with a continuous piecewise Boolean model of the same system that allowed node states to vary between 0 and 1. They reported that the asynchronous discrete Boolean model produced the same qualitative results as the continuous piecewise Boolean model.
For additional examples of predictive logic-based models in the literature, we direct the reader to comprehensive reviews by Morris et al.7 and Albert et al.12 Both reviews provide detailed case studies of logic model implementations and demonstrate the variety of ways logic-based models can be applied to answer biological questions.
Each time a network is updated in a standard two-state Boolean model, signals are transferred according to logic functions in a synchronous and deterministic manner. In these models, all nodes are updated instantaneously in the same time step so that the state of the network is always fully determined by the state of all nodes in the previous time step. Thus, the underlying assumption is that all molecular interactions in the network take the same arbitrary amount of time to complete. In reality, the time it takes for molecular interactions to complete varies widely.
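A minimal sketch of synchronous updating follows. The three-node network (nodes A, B, C and their rules) is hypothetical and was invented purely for illustration; the key point is that every rule reads the previous state, so the next state is fully determined.

```python
def synchronous_step(state, rules):
    """Return the next state; every rule reads only the previous state."""
    return {node: rule(state) for node, rule in rules.items()}


# Hypothetical rules: A is inhibited by C, B is activated by A,
# and C requires both A and B.
rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}

state = {"A": True, "B": False, "C": False}
trajectory = [state]
for _ in range(4):
    state = synchronous_step(state, rules)
    trajectory.append(state)

for step, snapshot in enumerate(trajectory):
    print(step, snapshot)
```

Because the new dictionary is built entirely from the old one, the order in which nodes are listed has no effect on the result.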
An alternate updating method involves asynchronous updating where one node is selected at random and instantly updated according to the current state of the network. In these models, the next state of the network is non-deterministic. The random and instantaneous updating of a node is repeated many times with each random update representing a time step in the model. This non-deterministic updating scheme is thought to more closely resemble biological variation by eliminating temporal uniformity in the model.2,12,14,39,40 Typically, implementation of this type of scheme involves running a large number of model simulations to calculate a probability that any given node will end up ON or OFF for any set of initial conditions.12,26 Synchronous and asynchronous updating methods are discussed in more detail in later sections of this review.
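The asynchronous scheme can be sketched as below, assuming a hypothetical two-node toggle switch in which A and B inhibit each other. The function names, the number of steps, and the number of runs are illustrative assumptions, not values taken from the cited models.

```python
import random


def asynchronous_step(state, rules, rng):
    """Pick one node at random and update it from the current state."""
    node = rng.choice(sorted(rules))
    updated = dict(state)
    updated[node] = rules[node](state)
    return updated


def probability_on(initial, rules, node, steps=20, runs=1000, seed=0):
    """Estimate how often `node` ends up ON over many random simulations."""
    rng = random.Random(seed)
    on_count = 0
    for _ in range(runs):
        state = dict(initial)
        for _ in range(steps):
            state = asynchronous_step(state, rules, rng)
        on_count += state[node]
    return on_count / runs


# Hypothetical toggle switch: A and B inhibit each other.
toggle = {
    "A": lambda s: not s["B"],
    "B": lambda s: not s["A"],
}

p_a = probability_on({"A": False, "B": False}, toggle, "A")
print(f"Estimated probability that A ends up ON: {p_a:.2f}")
```

Starting with both nodes OFF, whichever node happens to be updated first switches ON and locks the other OFF, so the estimate should be close to one half. This is the kind of population-level probability the paragraph describes.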
Creating a logic model is relatively straightforward. Building a logic model that generates predictive output experimentalists can use, however, requires considerably more effort. Specifically, building predictive logic-based models entails two primary steps: building a detailed interaction network and translating the interaction network into a logic network. The development of a logic network includes the careful derivation of the logic functions that will drive the network’s dynamics.
The first step in developing a logic model is to construct an interaction network of the system to be modelled (see, for example, Fig. 2A). To do so will typically require a thorough literature or database search. This is a critical step and should be performed by, or in close collaboration with, someone who is well acquainted with the biology of the system. Once the interactions involved in the network have been identified, it is often desirable to perform a node reduction to reduce network complexity, especially for very large networks. The formulation of any theoretical model requires sound judgments about which approximations are appropriate for simplifying model complexity without losing essential elements of the underlying mechanism(s).41 This is certainly true of logic models where decisions must be made about whether some complex interactions can be lumped into a smaller subset of nodes and interactions (Fig. S1, ESI†). In general, the objective is to use the simplest network possible that still agrees with known experimental data. This may be done manually or with the assistance of computational algorithms.26,42,43 If automated tools are used, it is useful to validate that the reduced network generated includes a suitable amount of complexity for the system and problem considered.
A common manual approach is to eliminate redundant linear regulations. For example, in Fig. 2A, nodes B and C are both activators of E because an even number of inhibitions produces an activating regulation. In contrast, E is an inhibitor of B because there is one inhibitory regulation between E and B. A reduced version of this network is presented in Fig. 3. As shown in Fig. S2 (ESI†), both forms of the network produce the same qualitative output.
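The parity rule used in this reduction (an even number of inhibitions along a linear path yields a net activation) can be sketched as a product of edge signs. The encoding of edges as +1 and -1 and the example paths are assumptions made for illustration; they do not reproduce the actual intermediates in Fig. 2A.

```python
from math import prod

ACTIVATION, INHIBITION = 1, -1


def net_regulation(edge_signs):
    """Collapse a linear path of signed edges into a single net regulation."""
    return "activation" if prod(edge_signs) > 0 else "inhibition"


# Two inhibitions in series behave like one activation.
print(net_regulation([INHIBITION, INHIBITION]))
# A single inhibition anywhere on the path keeps the net effect inhibitory.
print(net_regulation([ACTIVATION, INHIBITION]))
```

This shortcut applies only to linear chains; nodes that receive several inputs need an explicit logic function, as discussed in the following paragraphs.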
Ultimately, the interaction network must be translated into a set of logic functions (also referred to as transfer functions or logic gates) that will be used to transfer information (or signals) between nodes each time the model is updated. Logic functions often contain one or more Boolean operators. The AND and OR operators are used to define how distinct signals acting on the same node (which may be stimulatory and/or inhibitory) will be processed. The NOT operator is used to negate the state of the node it precedes. The derivation of logic functions is discussed in detail in the next section.
A justification from the literature (or evidence from experimental data) should be provided to support each logic function. When individual logic functions qualitatively agree with experimental data, it is more likely that the model, as a whole, will be predictive. Ideally, a table or appendix summarizing each logic function’s rationale will be included with all published models (see Table 1 for an example).15,17,18,26
From an interaction network diagram alone, it is not possible to infer how multiple signals acting on the same node should be processed. Therefore, the use of descriptive logic network diagrams is recommended to graphically depict a logic model. The information contained in the full set of logic functions should be equivalent to the information contained in a logic network diagram (compare Fig. 2B to C and Fig. 3B to C). The use of descriptive diagrams to represent a biological network is not a novel concept.2,44 Albert and colleagues18 proposed the use of “pseudo-nodes” and “complementary pseudo-nodes” to clarify the functional nature of edge interactions in a logic network diagram. In a large-scale Boolean model of EGFR and ERB2 signalling, Samaga et al.36 used a graphical representation where AND interactions were depicted as small blue circles and all other interactions were assumed to be OR interactions. More recently, Morris et al.7 presented logic network diagrams that used “logic gate” notation similar to that used in an engineering diagram.
Throughout this review, we have adopted a modified version of the notation used by Morris et al.7 Our notation graphically illustrates how multiple edges regulating the same node will be integrated by explicitly identifying where AND and OR operators are used in the network. In addition, all activating interactions in our logic network diagrams are indicated with a black arrow and all inhibiting interactions are indicated with a red line and a blunt edge. Regardless of the graphical method used, the use of diagrammatic logic networks is strongly encouraged to remove ambiguity from interaction networks.
When translating a set of interactions into a logic model, the implicit assumptions underlying all logic functions must be carefully considered. We recommend the construction of truth tables for each logic function to confirm the logical output of each function is in agreement with experimental data (or that of a hypothesized regulatory mechanism). A truth table provides the logical output of all possible combinations of input values a logic function may receive. In Boolean models with only two discrete states, there are 2^r possible combinations of regulatory inputs in a truth table, where r is the number of regulators (or edges) leading into the regulated node. We have provided truth tables for two biologically motivated network examples in S1 (ESI†) for reference.
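A truth table of this kind can be generated mechanically by enumerating every combination of regulator states. The sketch below assumes a hypothetical node with three regulators (activators A and B, inhibitor I); the rule itself is invented for illustration.

```python
from itertools import product


def truth_table(rule, regulators):
    """List the output of `rule` for all 2^r combinations of regulator states."""
    rows = []
    for values in product((False, True), repeat=len(regulators)):
        inputs = dict(zip(regulators, values))
        rows.append((values, rule(inputs)))
    return rows


def example_rule(x):
    """Hypothetical rule: ON if either activator is ON and the inhibitor is OFF."""
    return (x["A"] or x["B"]) and not x["I"]


table = truth_table(example_rule, ["A", "B", "I"])
for values, output in table:
    print(values, "->", output)
```

With r = 3 regulators the table has 2^3 = 8 rows, each of which can be compared against the experimental evidence for the regulated node.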
For nodes with one regulator, derivation of the logic function is straightforward: the next state of the regulated node is fully determined by the current state of its only regulator. The two ways a single molecule can regulate another molecule in a Boolean model are presented in Fig. 4A, along with the corresponding truth tables. In the case where C is activated by A, the logic rule is represented as Ct+1 = At, which means the value of C in the next arbitrary time step (t + 1) will be the current value of A. In the case where C is inhibited by B, the rule is represented as Ct+1 = NOT Bt, which means the value of C in the next arbitrary time step (t + 1) will be the inverse of the current value of B. Importantly, because C will always be ON whenever B is OFF in this inhibition example, the implicit meaning of this function is that of constitutive activation of C in the absence of B. If this turned out to be an inappropriate assumption for the regulation of C, then additional activators of C would need to be added to the model in order to generate a more complex and accurate regulatory logic function for C.
For nodes that have multiple regulators, the development of a logic function can be more challenging. In a logic model, there are three possible mechanisms by which two nodes can regulate another node: regulation by two activators (Fig. 4B), regulation by an activator and an inhibitor (Fig. 4C), or regulation by two inhibitors (Fig. 4D). The simple 3 node interaction networks presented in the first column of Fig. 4B–D clearly indicate that 2 nodes regulate another node but do not provide precise information about how the signals will be integrated to produce a response in the regulated node. In contrast, the second and third columns of Fig. 4B–D provide logic network diagrams and truth tables illustrating the logical output produced when an AND or OR operator is used, respectively. In general, the use of an AND operator with 2 regulators results in the regulated node turning ON in one of the four possible input conditions. In contrast, the use of an OR operator with two regulators results in the regulated node turning ON in three of the four possible input conditions. Although the examples provided are simple, there are important underlying assumptions that should be emphasized.
When an activator and an inhibitor are joined by an AND operator (Fig. 4C: AND column), the function is referred to as an AND NOT function. In the logic function Ct+1 = At AND NOT Bt, the inhibitor is dominant because the state of C is OFF in 3 of the 4 possible input conditions. The presence of the inhibitor also trumps the presence of the activator when both are ON (refer to the last row in the truth table).
Similarly, when an activator and an inhibitor are joined with an OR operator (Fig. 4C: OR column), the function is referred to as an OR NOT function. In this case, the activator is dominant because the state of C is ON in three of the four input conditions. The presence of the activator trumps the presence of the inhibitor when both are ON (refer to the last row in the truth table). Importantly, when both the activator and inhibitor are OFF, C will turn ON (refer to the first row in the truth table). The implicit meaning of Ct+1 = At OR NOT Bt, therefore, is that C becomes activated when both regulators are absent or below their functional threshold, even if C was OFF in the previous time step. At first glance this may seem counterintuitive and biologically implausible. However, what the OR NOT function actually simulates is the condition where the regulated node is ubiquitously expressed at a functionally active level such that it can only be deactivated by the presence of a direct inhibitor and, importantly, this direct inhibitor can be overridden by an activator (refer to the truth table). Later, we will consider a biological example where OR NOT is the correct function to model a biological interaction.
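The contrast between the AND NOT and OR NOT functions can be checked directly by enumerating the four input conditions. This is a minimal sketch; the function names are illustrative, with `a` standing for the activator A and `b` for the inhibitor B.

```python
from itertools import product


def and_not(a, b):
    """C(t+1) = A AND NOT B: the inhibitor is dominant."""
    return a and not b


def or_not(a, b):
    """C(t+1) = A OR NOT B: the activator is dominant."""
    return a or not b


inputs = list(product((False, True), repeat=2))
and_not_on = [(a, b) for a, b in inputs if and_not(a, b)]
or_not_on = [(a, b) for a, b in inputs if or_not(a, b)]

print("AND NOT turns C ON for:", and_not_on)
print("OR NOT turns C ON for:", or_not_on)
```

AND NOT switches C ON only when the activator is ON and the inhibitor is OFF, whereas OR NOT leaves C OFF only when the inhibitor is ON and the activator is OFF. In particular, OR NOT turns C ON when both regulators are OFF, which is the constitutive-activity assumption discussed above.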
As illustrated in Fig. 4 and described above, when more than one node regulates the same target node, the AND and OR logical operators produce very different outcomes. It is important to understand the underlying assumptions of all logic functions included in a logic model to ensure they provide a reasonable biological approximation.40 Of course, the exact nature of an interaction may not be known (e.g., what is necessary and what is sufficient for a molecular activation). In these cases, logic models can also be used to test hypothetical interactions and check whether the results match experimental data.
The network state of a logic model can be uniquely represented in a variety of ways, including by binary or decimal notation (Fig. 2D). In synchronously updated Boolean logic models (where all nodes are updated at the same instant each time step), each state deterministically gives rise to another state according to the model’s logic functions. Eventually, all states will settle into one or more stable states known as attractors. If an attractor consists of a series of states that oscillate in a cycle, the attractor is called a cycle attractor or a limit cycle. If an attractor consists of a single fixed state (which will be the case when a state always gives rise to itself), the attractor is called a fixed point attractor.3,45 Examples of each type of attractor are presented in Fig. 2E and 3D. In asynchronously updated Boolean logic models (where a single node is selected randomly and updated instantly), the next state of the network is non-deterministic. Nevertheless, the network will eventually settle into an attractor regardless of the initial state of the network.2,14,40
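For small synchronous models, attractors can be found by following every initial state until a state repeats. The sketch below assumes a hypothetical two-node toggle switch; the helper names and the binary-to-decimal encoding (first node as the most significant bit) are illustrative assumptions.

```python
from itertools import product


def find_attractors(rules, nodes):
    """Follow every initial state to its attractor under synchronous updating."""
    def step(state):
        values = dict(zip(nodes, state))
        return tuple(rules[n](values) for n in nodes)

    attractors = set()
    for start in product((False, True), repeat=len(nodes)):
        seen = []
        state = start
        while state not in seen:
            seen.append(state)
            state = step(state)
        cycle = seen[seen.index(state):]
        # Rotate to a canonical first state so each cycle is stored only once.
        pivot = cycle.index(min(cycle))
        attractors.add(tuple(cycle[pivot:] + cycle[:pivot]))
    return attractors


def to_decimal(state):
    """Encode a state as a decimal number via its binary string."""
    return int("".join("1" if v else "0" for v in state), 2)


# Hypothetical toggle switch: A and B inhibit each other.
nodes = ["A", "B"]
rules = {"A": lambda s: not s["B"], "B": lambda s: not s["A"]}

attractors = find_attractors(rules, nodes)
for attractor in sorted(attractors):
    kind = "fixed point" if len(attractor) == 1 else "limit cycle"
    print(kind, [to_decimal(s) for s in attractor])
```

In this toy network the two states with exactly one node ON are fixed point attractors, and the states with both nodes OFF and both nodes ON alternate as a limit cycle. Exhaustive enumeration of this kind is only practical for small networks.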
In small to moderate sized network models that have been well constructed, attractors often empirically agree with biological phenotypes.3,34 In the model previously discussed of yeast cell cycle regulation developed by Li et al.,34 for example, there were 11 nodes which generated 2048 possible network states and 7 attractors. The vast majority of all possible states (1764 out of 2048) settled into a fixed point attractor that represented the stationary phase of the cell cycle. The authors found that the trajectory of states (also referred to as a basin of attraction45) leading to this point attractor followed expected molecular changes observed during cell cycle progression.
In any binary Boolean logic model consisting of n nodes, there are 2^n possible network states. In the hypothetical 12 node network presented in Fig. 2 there are 4096 possible states and in the synchronous form of this model there are 10 attractors (Fig. 2E). In a 21 node model we recently developed, there were 2 097 152 possible states that settled into 1 of 52 possible attractors (data not shown). Clearly, as the network size increases, it becomes difficult to draw biological inferences from the full attractor space. In addition, for large-scale networks, the enumeration of all attractors quickly becomes computationally intractable owing to the exponential relationship between the number of nodes in the network and the number of possible network states.46 One approach for analysing large attractor spaces is to measure the robustness of each attractor.47–49 Unfortunately, these methods are generally more useful to a theoretician than a biologist. Another approach for analysing large state spaces with many attractors is to use asynchronous updating methods.12,26 These methods can generate a probability that a given network node will be ON or OFF under a particular set of conditions. Because asynchronous methods represent a repeated sampling of many different timescales,14,40,50 they are also useful for modelling the heterogeneity of a population of cells.26,51 Asynchronous methods can also facilitate in silico molecular perturbations (such as knock-downs or constitutive activations) to generate output that is readily comparable to biological data.26
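The state counts quoted in this review follow directly from the exponential relationship between node number and state number. A quick check in Python (the helper name is illustrative):

```python
def n_states(n_nodes: int) -> int:
    """Number of possible network states in a binary Boolean model."""
    return 2 ** n_nodes


# Node counts of the 10, 11, 12 and 21 node models mentioned in the text.
for n in (10, 11, 12, 21):
    print(f"{n} nodes -> {n_states(n):,} states")
```

Each added node doubles the state space, which is why exhaustive attractor enumeration becomes impractical for large networks.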
Finally, not all stable attractors in a deterministic synchronous logic model remain stable under nondeterministic asynchronous update conditions. The random perturbations associated with an asynchronously updated model result in the disappearance of some synchronous attractors when the assumption of timescale uniformity is removed.3,14,45 Thus, the asynchronous method is more likely to identify attractors that are robust to the typical stochastic variations observed in molecular interactions52,53 under physiological conditions.
In Fig. 5, we present a 10 node network of the regulation of cellular proliferation and apoptosis. This model is adapted from the Boolean model used by Ribba et al. to control cell division and cell death in a multiscale model of colorectal cancer.54 For simplicity, our version of the model eliminates linear regulations. In the original Ribba model, for example, P53 activated BAX and BAX, in turn, activated apoptosis. BAX was eliminated in our model so that P53 is a direct activator of apoptosis. In this example, the linear node reduction results in the same qualitative network behaviour and output. The interaction network for this model is presented in Fig. 5A and the logic network is depicted in Fig. 5B. There are 4 input nodes representing the signals a cell may respond to in the model: Growth Factor, Over-population, Hypoxia, and DNA Damage. The output nodes are proliferation and apoptosis. The 4 internally regulated nodes are MYC, P27, Cyc-CDK, and P53. The logic functions used in this model are listed in Table 1 along with a biological justification for each function and a statement of any relevant assumptions. Truth tables for each logic function are provided in S1 (ESI†).
In this model there are 1024 (2^10) possible network states and 16 (2^4) possible input conditions (Fig. 5C). For the logic network referred to as Model I (Fig. 5B), under synchronous updating each of the 1024 possible states eventually settles into one of 16 fixed point attractors corresponding to one of three biological states (Fig. 5C, left). In Model I, when all input signals are OFF, apoptosis and proliferation are also OFF in the attractor, indicating cellular homeostasis (first row in table). When all four input signals are ON, the combination of these signals leads to apoptosis ON and cell death (last row in the table). The only input condition leading to proliferation ON in Model I is Growth Factor ON and all other inputs OFF.
In another version of this model referred to as Model II, the logical operator controlling the regulation of P27 by MYC inhibition and Hypoxia activation is changed to AND NOT from OR NOT (Fig. 5C, right). This seemingly small change produces very different output. In Model II, under synchronous updating when all input signals are OFF, all states settle into an attractor that leads to proliferation ON and population growth (first row in table), which is not the behaviour expected from a normal cell. Moreover, there are now 4 input conditions that give rise to proliferation ON, including the state where all input signals are OFF. In this network example, the use of OR NOT for the regulation of P27, rather than AND NOT, is essential for obtaining the expected readout.
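The difference between the two models can be made concrete by writing out the truth table for P27 under each operator. The sketch below assumes the rules take the form `Hypoxia OR NOT MYC` (Model I) and `Hypoxia AND NOT MYC` (Model II), as the description above suggests. The authoritative functions are those listed in Table 1, which is not reproduced here, and the function names are illustrative.

```python
from itertools import product


def p27_model_1(hypoxia: bool, myc: bool) -> bool:
    # Assumed form of the Model I rule: activator OR NOT inhibitor.
    return hypoxia or not myc


def p27_model_2(hypoxia: bool, myc: bool) -> bool:
    # Assumed form of the Model II rule: activator AND NOT inhibitor.
    return hypoxia and not myc


print("Hypoxia MYC | OR NOT  AND NOT")
for hypoxia, myc in product([False, True], repeat=2):
    or_not = int(p27_model_1(hypoxia, myc))
    and_not = int(p27_model_2(hypoxia, myc))
    print(f"{int(hypoxia):^7} {int(myc):^3} | {or_not:^6}  {and_not:^7}")
```

Under these assumed forms, the two rules disagree in two of the four rows: when Hypoxia and MYC are both OFF, and when both are ON. In each case P27 is ON under OR NOT and OFF under AND NOT.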
Under asynchronous updating, Models I and II each settle into the same attractors that they produced under synchronous updating (Fig. 5D). Empirically, this will be the case when only point attractors are possible under synchronous updating because no regulatory feedback loops are included in the network. For more complex networks with multiple feedback loops, the attractor space produced by synchronous and asynchronous updating methods will not always be identical.
Logic models have been used to model oscillations in a number of biological systems.18,21,24 Oscillations play important roles in many biological processes including the cell cycle, circadian rhythms, developmental processes, and the cellular response to stress.55 It is easy to generate sustained oscillations (which are, by definition, a cycle attractor) with a synchronous logic model. All that is required is the presence of a negative feedback between one or more nodes. Even a single node can generate oscillations (Fig. 6A).56 In contrast, to generate sustained oscillations in chemical systems with an ODE model requires at least 3 distinct species represented by 2 or more ODEs.57,58 Physically realistic sustained oscillations are possible with only 2 chemical species when delay differential equations are used because these equations introduce an explicit time delay into the system.59,60 It must be emphasized that simply generating oscillations in a mathematical model of any type does not imply the underlying mechanism driving the oscillations in the model is equivalent to that driving the experimentally observed oscillations.
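The single-node case can be sketched in a few lines of Python. The snippet assumes a node that inhibits itself, so that its next state is the negation of its current state. This is our reading of the single-node example, and the exact wiring of Fig. 6A is not reproduced here.

```python
def step(x: int) -> int:
    """Self-inhibition under synchronous updating: next state = NOT current state."""
    return 1 - x


trajectory = [0]
for _ in range(5):
    trajectory.append(step(trajectory[-1]))

print(trajectory)  # [0, 1, 0, 1, 0, 1]
```

The node alternates between OFF and ON indefinitely, which is a limit cycle of length 2.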
Given the ease with which oscillations can be generated with a logic model, it is essential to ensure that the implicit assumptions underlying the logic functions in a model are appropriate for the oscillatory system modelled. The protein P53 has been dubbed the “guardian of the genome” for its role in maintaining genome integrity and tumour suppression in normal cells.61 P53 plays critical regulatory roles in both cell cycle progression and apoptosis. In response to stress, such as DNA damage from ionizing radiation, P53 protein is known to mediate the transcription of MDM2, which targets P53 for degradation via ubiquitination.55,62–64 In response to DNA damage, this antagonistic relationship induces important cellular oscillations in P53 and MDM2 expression, which have been described as a digital behaviour.55
If we first consider the simple 2 node network in Fig. 6B consisting of only P53 and MDM2 nodes, we can see that, because only one edge leads into each node, the interaction network is equivalent to its representation as a logic network (using the logic network notation described previously). In this simple network, all 4 possible states make up the limit cycle attractor. Thus, if no other inputs are added to this network, it will perpetually oscillate between these 4 states. A slightly more realistic model includes adding a DNA Damage node as the input signal (Fig. 6C). Because two nodes now regulate P53 (an activator and an inhibitor), we must decide whether AND NOT or OR NOT is appropriate for the regulation of P53. In the AND NOT case (Fig. 6D), 4 of the 8 possible network states settle into a fixed point attractor where everything is OFF. The remaining 4 states settle into a limit cycle where P53 and MDM2 oscillate and DNA Damage is fixed to ON. In contrast, in the OR NOT case (Fig. 6E), 4 of the 8 possible states settle into a point attractor where everything is ON and the remaining 4 states settle into a limit cycle where P53 and MDM2 oscillate and DNA Damage is fixed to OFF. Because there is nothing in the network to regulate DNA Damage, it will never oscillate in either version of the model.
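The attractor counts described above can be reproduced by exhaustively enumerating the 8 states. The sketch below assumes the rules implied by the text: MDM2 is activated by P53, DNA Damage is an unregulated input, and P53 is set by either `DNA Damage AND NOT MDM2` or `DNA Damage OR NOT MDM2`. Function names are illustrative.

```python
from itertools import product
from typing import Callable, Dict, Tuple

State = Tuple[int, int, int]  # (P53, MDM2, DNA_Damage)


def step_and_not(state: State) -> State:
    p53, mdm2, dna = state
    # P53 = DNA Damage AND NOT MDM2; MDM2 = P53; DNA Damage is held constant.
    return (int(dna and not mdm2), p53, dna)


def step_or_not(state: State) -> State:
    p53, mdm2, dna = state
    # P53 = DNA Damage OR NOT MDM2; MDM2 = P53; DNA Damage is held constant.
    return (int(dna or not mdm2), p53, dna)


def attractor_of(state: State, step: Callable[[State], State]) -> frozenset:
    """Iterate synchronously until a state repeats; return the recurring states."""
    path = []
    while state not in path:
        path.append(state)
        state = step(state)
    return frozenset(path[path.index(state):])


def basins(step: Callable[[State], State]) -> Dict[frozenset, int]:
    """Map each attractor to the number of start states that end up in it."""
    counts: Dict[frozenset, int] = {}
    for start in product((0, 1), repeat=3):
        key = attractor_of(start, step)
        counts[key] = counts.get(key, 0) + 1
    return counts


for name, step in [("AND NOT", step_and_not), ("OR NOT", step_or_not)]:
    print(name)
    for attractor, size in basins(step).items():
        kind = "fixed point" if len(attractor) == 1 else "limit cycle"
        print(f"  {kind} {sorted(attractor)} reached from {size} of 8 states")
```

With these rules, each version has one fixed point and one 4-state limit cycle, each reached from 4 of the 8 states. The fixed point is all OFF in the AND NOT case and all ON in the OR NOT case.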
P53 and MDM2 proteins are known to exist at low endogenous levels in the absence of DNA damage.55 We, therefore, consider them to be at levels below their functional threshold in the absence of DNA damage. However, when DNA damage is present, P53 becomes functionally activated,55 which triggers functional MDM2 expression levels and oscillations. Thus, we conclude that the AND NOT function is appropriate for this regulation (Fig. 6D). All model results presented in Fig. 6D, E relied on a synchronous updating scheme. In Fig. 6F, the synchronous and asynchronous updating schemes are compared for the AND NOT form of the model using the following initial conditions: P53 OFF, MDM2 OFF, and DNA Damage ON. In the synchronous approach, it can clearly be seen that DNA Damage stays constant, while P53 and MDM2 oscillate out of phase between 0 and 1 (or OFF and ON). In the asynchronous model, which represents an average of 200 simulations using a random update order, over time both P53 and MDM2 are roughly 50% likely to be ON. This is the expected probability when all 200 simulations settle into a limit cycle attractor where P53 and MDM2 oscillate, which is also suggestive of the average signal across a population of cells exposed to DNA damage.
In the previous model of cellular proliferation and apoptosis (Fig. 5) only fixed point attractors were possible because no regulatory feedback loops were present in the network model. If the feedback loop in Fig. 6D is added to the network in Fig. 5, then the 8 fixed point attractors with DNA Damage ON become cycle attractors with P53 and MDM2 oscillations (data not shown). In addition to adding an MDM2 node, we expanded Model I (Fig. 5) to include 6 additional nodes (Fig. 7A). An AKT node was included because AKT related signalling plays an important role in activating cellular growth and inhibiting apoptosis. In addition, PTEN, a powerful tumour suppressor and upstream inhibitor of AKT activation was included.65 RAS, FOXO, BAD, and BAX were also added. Truth tables for each logic function are provided in S1 (ESI†).
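The asynchronous average can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used to produce Fig. 6F. It assumes that "random update order" means every node is updated once per time step in a freshly shuffled order, with each update seeing the changes made earlier in the same step. The seed, the number of steps, and all names are illustrative choices.

```python
import random
from typing import Dict, List

NODES = ["P53", "MDM2", "DNA_Damage"]

RULES = {
    "P53": lambda s: s["DNA_Damage"] and not s["MDM2"],  # AND NOT form
    "MDM2": lambda s: s["P53"],
    "DNA_Damage": lambda s: s["DNA_Damage"],  # input node, never changes
}


def async_run(init: Dict[str, bool], n_steps: int,
              rng: random.Random) -> List[Dict[str, bool]]:
    """One simulation: at each time step every node is updated once, in a
    freshly shuffled order, and each update sees the changes made before it."""
    state = dict(init)
    history = [dict(state)]
    for _ in range(n_steps):
        order = NODES[:]
        rng.shuffle(order)
        for node in order:
            state[node] = bool(RULES[node](state))
        history.append(dict(state))
    return history


def on_fraction(n_runs: int = 200, n_steps: int = 30,
                seed: int = 0) -> Dict[str, List[float]]:
    """Fraction of runs in which each node is ON at each time step."""
    rng = random.Random(seed)
    init = {"P53": False, "MDM2": False, "DNA_Damage": True}
    runs = [async_run(init, n_steps, rng) for _ in range(n_runs)]
    return {
        node: [sum(run[t][node] for run in runs) / n_runs
               for t in range(n_steps + 1)]
        for node in NODES
    }


fractions = on_fraction()
for node in NODES:
    late = fractions[node][10:]
    print(f"{node}: mean ON fraction over steps 10-30 = {sum(late) / len(late):.2f}")
```

Under this update scheme, DNA Damage stays at 1.0 throughout, while the ON fractions of P53 and MDM2 fluctuate around 0.5 once the initial transient has passed.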
In Fig. 7B, C, we performed asynchronous simulations using two distinct conditions: (1) DNA Damage ON and all other inputs OFF and (2) Growth Factor ON, PTEN ON, and all other inputs OFF. We also perturbed each of these conditions by preventing P27 from turning ON in the simulations, which effectively served as an in silico knock down (KD) of this node. In the DNA Damage ON condition, the result is as expected: all states settle into attractors where there is 0% probability for population growth (proliferation) and approximately 50% probability for cell death (apoptosis). This intermediate cell death probability, which results from the fact that P53, MDM2, and apoptosis nodes are oscillating in the attractor, is representative of an average between the oscillating states. A very different outcome is found, however, when the DNA Damage ON condition is tested with the P27 KD. In this case, proliferation also has approximately 50% probability to be ON (Fig. 7B).
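The knock-down mechanism itself (holding a node OFF for the entire simulation) can be illustrated on a much smaller network. Because the logic functions of the expanded model are given only in the ESI, the sketch below uses the three-node P53/MDM2/DNA Damage network in its AND NOT form as a stand-in and knocks down MDM2 rather than P27. The update scheme, seed, and names are assumptions made for illustration.

```python
import random
from typing import Optional

RULES = {
    "P53": lambda s: s["DNA_Damage"] and not s["MDM2"],
    "MDM2": lambda s: s["P53"],
    "DNA_Damage": lambda s: s["DNA_Damage"],
}


def p53_on_probability(knock_down: Optional[str] = None, n_runs: int = 200,
                       n_steps: int = 30, seed: int = 0) -> float:
    """Fraction of runs with P53 ON at the final step, with an optional
    node held OFF for the whole simulation (in silico knock-down)."""
    rng = random.Random(seed)
    hits = 0
    for _run in range(n_runs):
        state = {"P53": False, "MDM2": False, "DNA_Damage": True}
        for _step in range(n_steps):
            order = list(RULES)
            rng.shuffle(order)
            for node in order:
                if node == knock_down:
                    state[node] = False  # a knocked-down node never turns ON
                else:
                    state[node] = bool(RULES[node](state))
        hits += state["P53"]
    return hits / n_runs


print("unperturbed:", p53_on_probability())
print("MDM2 KD:    ", p53_on_probability(knock_down="MDM2"))
```

In this stand-in network, removing the inhibitor abolishes the oscillation: with MDM2 held OFF and DNA Damage ON, P53 is ON in every run, compared with roughly half of the runs in the unperturbed network.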
When both Growth Factor (which should trigger proliferative signals) and PTEN (which should suppress proliferative signals) are ON, then cellular homeostasis (or quiescence) occurs because all states end up in an attractor where both population growth (proliferation) and cell death (apoptosis) have a 0% probability to be ON. When the same conditions are tested with the P27 KD, however, population growth is 100% likely to be ON in the attractor (Fig. 7C). While these results are not surprising given the importance of P27 in regulating cell cycle progression and proliferation, they do demonstrate how models of this type may be used to make predictions related to network dynamics that can then be verified experimentally.
We have seen how Boolean models can be used to simulate network dynamics and have also discussed that Boolean models are capable of qualitative agreement with more mechanistically precise ODE models. It must be stressed, however, that a Boolean model will not always be appropriate for modelling network dynamics. In complex biochemical pathways, the time evolution of the concentration of interacting species is governed by nonlinear feedback loops in which the output of a pathway is not proportional to its input.66 Examples of network behaviour that a two-state Boolean model cannot approximate include retroactive signalling,67,68 load-induced modulation,69 and bifurcations associated with nonlinear equations.70 In the case of retroactivity and load-induced modulation, which involve upstream signal propagation in covalently modified signalling cascades, a two-state Boolean model is too qualitative to predict this behaviour because these signalling processes arise from enzyme sequestration mechanisms that are out of reach of two-state Boolean models. In the case of bifurcations, which occur when the qualitative behaviour of the solution of a nonlinear system changes as a parameter changes, two-state Boolean models are unable to predict bifurcations because they lack parameters. It is possible that multistate or continuous piecewise logic models (which are parameter driven) are capable of predicting bifurcations. To our knowledge, however, this has not yet been explored systematically.
In a recent paper by Batchelor et al.,71 the mechanisms regulating P53 response to different perturbations were investigated. The authors employed ODE models as part of an analysis into the amplitude, duration, and frequency of individual P53 pulses in response to varying amounts of ultraviolet radiation. While a synchronous Boolean model would not be able to elucidate mechanisms driving the degrees of response in a system like this, multistate fuzzy logic16 and continuous piecewise logic models12 have been used for similar purposes in other systems. In addition, asynchronous Boolean models may be used for these types of responses. In the model of guard cell aperture closure previously discussed,26 the authors chose to use an asynchronous Boolean model to generate a probability of stomate closure because stomate aperture responses are known to be graded and cannot be represented as simply open or closed. While the predictive power of an ODE model is preferred for a dose dependent response, an ODE model may not be practical for modelling such responses in large networks for computational and parameter space reasons. In such cases when a binary response is not sufficient, the use of more complex logic-based methods, such as asynchronous, multistate, fuzzy, or piecewise models, may be a reasonable alternative.
Logic-based models are predictive tools that can be leveraged in the absence of reliable parameter information or mechanistic details needed for more quantitatively precise methods, such as ODE models. Importantly, the predictive power of logic methods is dependent on the nature of the logic network model constructed. In this review, we have pointed out important factors to consider when building predictive logic-based models. We have emphasized the importance of using descriptive logic network diagrams and provided biologically motivated example networks. Most significantly, we have emphasized the need to properly characterize the nature of all interactions in the network and to understand the implicit meaning of logic functions used to integrate multiple input signals.
As we have seen, the use of AND and OR logical operators produces very different results for the same input conditions (Fig. 4B–D, Fig. 5C, D and Fig. 6D, E). We strongly encourage the creation of truth tables to verify that the output of each logic function is reasonable and in qualitative agreement with experimental data, if available. When the nature of the interaction modelled by a logic function is not known (e.g., whether an activator will trump an inhibitor, if both are active, or vice versa), then the logic model can be used to test hypothesized mechanisms for the uncertain interaction. The use of “incomplete truth tables”, a computational approach for analysing the effect of logical uncertainty in a logic network, has also been proposed for these cases.72
Despite their advantages, resistance to the use of logic-based models in biology exists. Some resistance is related to the idea that a molecule’s state can be reduced to discrete ON and OFF values. In actuality, experimental molecular states are often qualitatively described in binary terms. Genes may be characterized as up-regulated or down-regulated in microarray experiments and proteins are often referred to as activated or inactivated to indicate their functional state. Given the stochastic variation in gene and protein expression across cells, biological molecular networks are remarkably robust.73 The presence of growth factors in the local environment, for example, will almost invariably result in the induction of proliferative pathways within a population of cells, despite the heterogeneity in the molecular expression across individual cells in the population. This deterministic output from a given cellular input has been compared to cellular digital computation.74 Fundamentally, the basis of digital readouts is 0’s and 1’s, at least at the computational level. Another point of concern with logic-based Boolean models is that time in the model is unrelated to physiological time, so a model can provide only a qualitative chronology of molecular activations.3 While this is true, Boolean models can provide qualitative predictive values, which allow biomedical scientists to gain unique insights into molecular network dynamics that may otherwise be out of reach.
For those interested in using logic models to study large networks, the use of asynchronous updating is generally recommended.2,45 A variety of algorithms exist for introducing asynchronous updates in a logic model.45 For most purposes, the repeated random order asynchronous45 update method (which is similar to a statistical Monte Carlo simulation) will be sufficient. This is the algorithm used for the asynchronous simulations in this review. Some attractors found with the simpler synchronous updating scheme may be artifacts of uniform timescales. In contrast, an asynchronous scheme introduces stochastic variation in timescales. Moreover, asynchronous methods can produce qualitative readouts that are more representative of biological readouts (Fig. 5D and Fig. 7B, C) and can easily facilitate in silico perturbations, such as knock downs and constitutive activations.
We view logic models as complementary to other network analysis methods in systems biology and consider them to be an important tool for making biological inferences about the dynamics of intracellular networks. A number of software tools for logic-based network analysis are available.12,43,72,75 The appropriate software tool to use will depend on the nature of the network model and objectives of the analysis.7 For asynchronous simulations and in silico molecular perturbation studies, we recommend Boolean Net,12 which is a relatively easy to use open source tool developed in Python. Needless to say, the implementation of logic-based models requires computational and mathematical proficiency. As a consequence, collaboration between integrative biologists and computational scientists will play a pivotal role in the successful development, analysis, and interpretation of logic-based models.
Importantly, logic-based models are also a powerful approach for constructing models of biological networks that can ultimately be integrated into multiscale models (models that consider the integration between different scales and phenomena in a biological system or process) to provide an integrative view of biological systems.76 In the literature, multiscale models of cancer growth have been developed that account for the cellular, genetic, and environmental factors regulating tumour growth.54,77 These models have implemented genetic and signalling networks as Boolean models to regulate cell cycle progression where the response to signals from the intracellular gene network determines whether a cell will proliferate or die and, therefore, directly influences the cellular and the extracellular tissue level of the model.
In conclusion, it is never feasible to create a model that is an exact replica of a complex system and, as a consequence, compromises must be made between the predictive power of a model and the complexity of a model. The discrete nature of a Boolean model sacrifices quantitative dynamics for qualitative dynamics. In exchange, a parameter-free modelling framework can be used to investigate complex intracellular networks.
The adaptations that living systems continually make are carried out through the transmission of information across large gene regulatory and protein signal transduction networks. These intracellular networks exhibit an enormous range of adaptive behaviours, termed emergent properties. The study of networks from first principles is challenging because quantitative data needed to estimate model parameters are seldom available. Here we show how parameter-free logic-based models are intuitive, predictive, and robust tools for qualitatively describing the integration of complex biochemical interactions without prior knowledge of mechanistic details. Logic models can provide robust predictions of emergent behaviours in networks and can be used to help biomedical scientists unravel fundamental properties of molecular networks.
This work was partially supported by the University of Michigan Center for Computational Medicine & Bioinformatics Pilot Grant 2010. MLW acknowledges support from the Rackham Merit Fellowship, NIH T32 CA140044, and the Breast Cancer Research Foundation. SDM acknowledges support from the Burroughs Wellcome Fund, Breast Cancer Research Foundation, and NIH R01 CA77612. NC and SS were partially supported by the NIH R25 DK088752. SS also acknowledges support from NIH R01 DK053456. The authors thank Tanya Salyers for helpful discussions and an algorithm for the enumeration of synchronous attractors.
†Electronic supplementary information (ESI) available. See DOI: 10.1039/c2ib20193c