PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
Biosystems. Author manuscript; available in PMC 2017 April 1.
Published in final edited form as:
PMCID: PMC5149109
NIHMSID: NIHMS768491

Relative Stability of Network States in Boolean Network Models of Gene Regulation in Development

Abstract

Progress in cell type reprogramming has revived the interest in Waddington’s concept of the epigenetic landscape. Recently researchers developed the quasi-potential theory to represent the Waddington’s landscape. The Quasi-potential U(x), derived from interactions in the gene regulatory network (GRN) of a cell, quantifies the relative stability of network states, which determine the effort required for state transitions in a multi-stable dynamical system. However, quasi-potential landscapes, originally developed for continuous systems, are not suitable for discrete-valued networks which are important tools to study complex systems. In this paper, we provide a framework to quantify the landscape for discrete Boolean networks (BNs). We apply our framework to study pancreas cell differentiation where an ensemble of BN models is considered based on the structure of a minimal GRN for pancreas development. We impose biologically motivated structural constraints (corresponding to specific type of Boolean functions) and dynamical constraints (corresponding to stable attractor states) to limit the space of BN models for pancreas development. In addition, we enforce a novel functional constraint corresponding to the relative ordering of attractor states in BN models to restrict the space of BN models to the biological relevant class. We find that BNs with canalyzing/sign-compatible Boolean functions best capture the dynamics of pancreas cell differentiation. This framework can also determine the genes' influence on cell state transitions, and thus can facilitate the rational design of cell reprogramming protocols.

Keywords: Gene regulatory network (GRN), Cell differentiation, Multistable dynamical system, Boolean network (BN), Attractor states, Epigenetic landscape

1. Introduction

A hallmark of multicellular organisms is the co-existence of distinct differentiated cell types with different functions and stable gene expression patterns. A less specialized cell, a stem or progenitor cell, spawns a variety of more specialized progeny cells through cell differentiation. Once differentiated, a specialized cell’s gene expression pattern is relatively robust against perturbations emanating from a noisy environment. Where does this stability come from? How do gene expression patterns change as cells differentiate in response to external cues, and thereby, transition from one stable gene expression pattern to another? In principle, such questions can be answered by understanding the interactions between the genes in the underlying gene regulatory network (GRN), which constrain the changes in the gene expression patterns, producing stable and unstable steady states. The dynamical system associated with GRNs can be modelled by a system of ordinary differential equations (ODEs) where continuous variables represent the expression levels of individual genes. However, with ODEs one is quickly limited by the number of configurations of the networks due to the exponential growth of complexity with the number of genes as well as the general lack of information on the parameters that characterize the interactions between genes. A widely used alternative approach to study GRNs is Boolean networks (BNs), a framework that enables modelling of networks with hundreds of genes or analyze large statistical ensembles of networks of random structure [1,2]. Analysis of an ensemble of BNs can yield insights on the relationship between structure and dynamics of GRNs [13].

In 1969 Kauffman introduced BNs to study the dynamics of GRNs [1]. Since then BNs have been used to model a wide range of biological phenomena such as cell cycle, cellular differentiation and evolution of GRNs [417]. Specifically, BNs have been extensively used to study developmental processes. Villani et al [13] have developed a BN framework for cell differentiation. Krumsiek et al [14] have developed a BN model to recapitulate hematopoiesis. Chang et al [15] employed a BN model to explain human embryonic stem cell differentiation and the generation of induced pluripotent stem cells (iPSCs). Klipp et al [16] used a BN model to study the influence of gene regulation, methylation and histone modifications on cell differentiation. Alvarez-Buylla et al [617] used BNs to explain cell differentiation and developmental ordering in the floral organ of Arabidopsis. An important limitation of these reconstructed BN models [417] for different biological processes is their specification of one defined set of Boolean functions for genes in the network out of a multitude of possible choices [18] that can reproduce the biologically relevant cell states as network attractors, and the reason for the chosen set of Boolean functions often remains elusive. Also experimental observations in cell differentiation systems usually are consistent with a large number of possible Boolean functions rather than suggesting a single well-defined set, giving rise to a set of possible BNs that can describe the observed gene expression patterns of the attractors [18]. Thus, one always wonders whether the reported results would still hold for other choices of functions and how structurally robust the predicted dynamics is for the observed attractor states.

A more stringent requirement on a model capturing the development of multicelluar organisms is the following constraint. In addition to recapitulating the multiple observed attractors of the network, the model of the developmental GRN should also reproduce the experimentally observed relative stabilities of attractors, i.e, the model has to relate the different attractor states to each other based on their relative stabilities. By that we mean the relative ease for transitioning from one attractor state (A) to another state (B) which epitomizes the developmental process. More formally, in a stochastic system, the relative ease of transitioning from state A to state B would be given by the probability P(AB) for transition from A to B (given random flutuations in gene expression). Note that such transition probabilties are typically asymmetric (i.e., P(AB) ≠ P(BA)) – a property that ultimately accounts for the directionality (irreversibility) of development.

Interactions between genes collectively produce the developmental ordering of different cell types which is robust and repeatable during embryogenesis. Therefore, once the multiple attractors of the dynamical system are determined, it is necessary to evaluate their relative stabilities in order to derive a consistent relative ordering for all attractors in a developmental process (if one exists). Recently, some of us have derived a framework to calculate the relative stabilities of cell attractors in continuous ODE-based GRN models using least action principles [19]. However, ODE-based GRN models are not well-suited to model large networks, let alone ensembles of networks, for which BNs are commonly used [13].

In this paper, we present a mathematical framework for calculating the relative stabilities of cell attractors and transitions, and hence deriving the notion of a landscape in BN models of development. We use a minimal GRN for pancreas development as an example to demonstrate the utility of our method. Imposing the observed relative ordering of attractors as a novel phenotypic constraint affords evaluation of ensembles of BNs (with a given network structure but different sets of Boolean functions) that are compatible with multiple observed attractors of the GRN. Our method can be used to reconstruct simple BN models for developmental processes from available information on GRN architecture and relative stability of attractor states, and thus, can predict the efforts associated with particular state transitions of interest which in turn can facilitate the rational protocol design for cell reprogramming in regenerative medicine.

2. Modelling Framework

2.1 Boolean network (BN) model

BN model for a GRN is specified by its set of nodes, directed edges and Boolean functions. In a BN, the nodes represent genes while the edges represent interactions among genes in the network. Any gene i in a BN at a given time can be in one of two expression states: on if its state xi = 1 and off if its state xi = 0. For a m-gene BN, the state vector Xt = (x1(t), x2(t), …, xm(t)) gives the expression of all genes at discrete time t in the network. For each gene i in a BN, a Boolean function Fi determines the output value xi at time t + 1 given the state of its input genes at time t. Thus, the gene expression state of a BN at any time step is governed by the recursive equation:

Xt+1=F(Xt)
(1)

where Xt is a m-dimensional binary vector (0 or 1) that gives the expression of all genes at time step t. F encapsulates both the network topology and Boolean functions at all nodes, and thus, contains the information determining the dynamics of the BN.

For a m-gene BN, there are 2m possible states. A sequence of states X0, …, Xt, Xt+1, … forms a trajectory in the state space. Trajectories converge in a deterministic (noise-free) system. Since the state space is finite, the trajectories eventually coverge either to a single state (point attractor) or a cycle of states (cyclic attractor). In the extreme case, a cyclic attractor encompasses all or almost all possible network states, and given the large number of states 2m, such behavior will appear chaotic. For any given attractor, its associated basin of attraction is the set of initial states that will converge to that attractor. Attractors of a BN are charaterized by the size (and shape) of their associated basin of attraction. The network topology (i.e., the set of nodes and edges) and the Boolean functions at each node fully determine the attractor structure – which consists of attractors, trajectories and basins of attraction. The attractor structure can be determined by explicitly evaluating all state transitions X1 = F(X0) for all 2m possible initial states X0. An example of a 4-gene GRN with Boolean functions and resulting attractor structure is shown in Fig. 1.

Figure 1
Example of a BN model and its attractor landscape

2.2 Transition matrix and BN dynamics

Spontaneous transitions between attractors that underlie the epigenetic landscape or the quasipotential landscape requires probabilistic (noise-driven) dynamics. A discrete Markov model can be used to describe the BN dynamics. Let pi give the probability for the occupation of a state si, and Tij give the transition probability from state sj to state si. In a BN with m genes, there are 2m possible states, and the occupation probability of different possible states at time t can be represented by the probability distribution pt=(p1t,,pkt)T, (k = 2m). The evolution of the discrete-time Markov model associated with the BN of m genes is governed by:

pt+1=Tpt=[T11T1kTk1Tkk](p1pk)
(2)

The steady state distribution is then given by eigenvector p* of Markov transition matrix T corresponding to the eigenvalue λ = 1. Thus, BN dynamics can be characterized by the eigenvectors of Markov matrix T. If p* has only one non-zero element, then the network has a point attractor. If p* has ω non-zero elements with 1 < ω < 2m, then the network has cyclic attractors. If p* has 2m non-zero elements, it means that every state can transition to every other state, and the network dynamics is ergodic.

2.3 Mapping BN to a Markov model

We describe here the procedure to map a BN model to a Markov model with a simple example. Consider the toggle switch between two genes (x and y) with both cross-inhibitory interactions and self-activation as shown in Fig. 2. The BN model of the 2-gene toggle switch can have 22 = 4 possible states: s1 = (0,0), s2 = (0,1), s3 = (1,0), and s4 = (1,1). The Boolean functions for the 2 genes in the BN model of the toggle switch are given by:

xt+1=xt OR NOT(xt OR yt)

yt+1=yt OR NOT(xt OR yt)
(3)

For the BN model of the toggle switch, the occupation probability of the 4 possible states is represented by a distribution vector p = (p1, p2, p3, p4). Based on Boolean functions in Eq. 3, the Markov transition matrix for the BN model of the toggle switch is:

T=(0000010000101001)
(4)

where the element Tij gives the transition probability from state sj to state si. In the case of the toggle switch, the Markov matrix T has three eigenvectors with eigenvalue λ = 1. The corresponding steady states are represented by the eigenvectors p* [set membership] {(0,1,0,0), (0,0,1,0), (0,0,0,1)} and the three corresponding point attractors are s2, s3, and s4.

Figure 2
BN model for a toggle switch of two genes

2.4 Relative stability of states in BN model with noise

In a deterministic BN model, there cannot be any transitions between two states in different basins of attraction, and thus the relative stabilities of attractors have no meaning for such a Markov model. However, introduction of noise into deterministic BNs will render the possibility of transitions between two states in different basins of attraction. Thus, the notion of relative stability of the attractors can be defined in BNs with noise where there is possibility of transitions between attractors. Also, the Markov transition matrix for BNs with noise is ergodic because there is a non-zero probability of transition between any two states of the network.

In a BN with m genes, any state si in the Markov model can be represented by a m-bit binary vector. Let η be the probability for randomly flipping one bit due to noise in the m-bit binary vector corresponding to any state si. A perturbation matrix P can then be constructed using the Hamming distance dij = ‖sisjH between two states si and sj as follows [20]:

Pij={Cdijmηdij(1η)mdij(ij)0(i=j)
(5)

where Cdijn are binomial coefficients, which guarantee that each column of matrix P sums up to 1 − (1 − η)m. The BN model with noise is then constructed by adding the perturbation matrix P to the Markov matrix T:

T˜=(1η)mT+P
(6)

The equation for the dynamics of the BN model with noise is then:

pt+1=T˜pt
(7)

Note that the Markov matrix T for the deterministic BN model is such that each column of T adds up to 1. Similarly, in Eq. 6, a normalizing coefficient is used to ensure that T also satisfies the conservation principle, i.e., each column of T adds up to 1. Since the Markov matrix T is ergodic for η > 0, the BN model with noise will have one and only one steady state probability distribution p* [21]. Equations 5 and 6 can be used to construct a BN model with noise which will always converge to a steady state probability distribution p* that can be directly calculated from Eq. 7. With the introduction of noise, the original dynamical system corresponding to the deterministic BN relaxes into an ergodic dynamical system, and we can quantify the relative stabilities of distinct attractors for such a system via two measures: steady states probability distribution p* and mean first passage time (MFPT).

2.5 Mean First Passage Time (MFPT)

In an ergodic dynamical system, it is possible to reach every possible state from any arbitrary state. A transition barrier that epitomizes the ease of transition from one attractor to another attractor offers a notion of relative stability. Relative stability can be measured via mean first passage time (MFPT) which evaluates the average number of time steps Mij that are necessary for the system to transition from state si to state sj [22].

One can compute the matrix Mij containing the MFPT for the transition between any two states si and sj in the dynamical systems as follows. Let fij(n) denote that the probability that the first passage time from si to sj is n. Then we have the equations:

Pij(1)=fij(1)

Pij(2)=fij(2)+fij(1)·pjj(1)Pij(n)=fij(n)+fij(n1)·pjj(1)+fij(n2)·pjj(2)++fij(1)·pjj(n1)
(8)

where Pij(n) is the probability of transition between states si and sj in n steps. Thus, we have the following recursive equation:

fij(n)=Pij(n)k=1n1fij(nk)·pjj(k),n=1,2,
(9)

for the first passage time, and the MFPT Mij is then calculated as:

Mij=1+kjT˜ikMkj
(10)

Note that the shorter the number of time steps Mij to transition between two states si and sj, the lower is the transition barrier between two states si and state sj. There exists a particular distribution of time steps required for the transition from state si to state sj and MFPT is a mean value of this distribution.

2.6 Relative transitive ordering in a developmental GRN

Cell differentiation is a largely (spontaneously) irreversible process where the zygote differentiates robustly into other cell types in a pre-determined order. This programmed behavior that allows the zygote to robustly differentiate into other cell types in a well-defined sequence in the presence of molecular noise requires that the GRN orchestrates a consistent relative ordering of different attractors and intermediate transient states. By consistent relative ordering of different attractors, we mean that in the directed graph with each node as attractor state and edges representing spontaneous attractor transitions, there is little or ideally no circular (non-transitive) structure. Based on the relative stabilities of cell states measured via MFPT, we propose below a mathematical formula to define a consistent relative ordering of all attractors in the GRN.

Suppose we have a relative stability matrix M which gives the transition barrier between any two states based on MFPT, then we can define a relative ordering as follows. First, we define the net transition rate Dij between two attractors, i and j, as follows:

Dij=1Mij1Mji
(11)

Note that Dij > 0 implies that attractor i is more stable than attractor j. This relationship can be extended to n different attractors of the network. From the structure of the skew-symmetric matrix D, we determine whether the set of n attractors is ordered with respect to the relative stability of attractors i and j, or whether cycles are present as follows. If there exists an even permutation π = (π1, …, πn): {1, …, n} → {1, …, n} with sgn(π) = 1 and ∀ij: πi ≠ πj, such that ∀πi < πj: Dπiπj ≥ 0, then π defines a consistent relative ordering. Note that when a set of states has consistent ordering then there are no cycles in terms of relative stability between the states.

2.7 Canalyzing, Nested canalyzing and Sign-compatible Boolean functions

Information on the topology of a GRN is increasingly available from direct experimental determination of gene regulatory mechanism or such information can be inferred from experimental data [2327]. However, the Boolean function that controls the expression state of each gene based on the state of its input genes is more difficult to determine than network topology. In practice, a vast number of Boolean functions are plausible for a given network topology based on available experimental data [18]. Thus, it is desirable to apply biologically motivated constraints to limit the number of possible Boolean functions. We propose to use canalyzing, nested canalyzing and unate Boolean functions to charaterize the gene interactions and limit the set of possible Boolean functions for biological systems.

Kauffman proposed the concept of canalyzing Boolean functions (CFs) [2]. A Boolean function F(x1, …, xn) is CF if there exists an input xi with canalyzing Boolean value a such that F(x1, …, xi = a, xn) = b is constant. Thus, the input value xi = a determines the output value of the Boolean function F(x1, …, xn) regardless of the state of other inputs. The output value b is called the canalyzed value.

Later Kauffman extended this concept to nested canalyzing Boolean functions (NCFs) which are a subset of CFs [27]. A Boolean function F(x1, …, xn) is NCF if there exists a permutation of inputs (σ1, …, σn) such that it satisfies the following conditions: F(xσ1 = aσ1, …, xσn) = bσ1; F(xσ1aσ1, xσ2 = aσ2 …, xσn) = bσ2; ………; F(xσ1aσ1, xσ2aσ2 …, xσn−1 = aσn−1, xσn) = bσn−1; F(xσ1aσ1, xσ2aσ2 …, xσn−1aσn−1, xσn = aσn) = bσn where bσi is the canalyzed Boolean value of input xσi. The presence of CFs in BNs results in lesser number of effective upstream regulators for genes than it may appear in the network topology [27].

Furthermore, it has been shown that CFs improve the robustness of the attractor states in BNs to both mutations (in the form of changes in architecture, such as rewiring of edges or deletion of nodes) as well as perturbations (in the form of instant bit flips of the node states), and thus, CFs push the BN dynamics towards an ordered regime [27,28].

Another class of Boolean functions that influence the network dynamics comprises the unate functions [28]. An unate function is a type of Boolean function which has monotonic properties. Since in biology a gene interaction is often characterized as either activating or inhibiting, unate functions can be used to capture the monotonic properties of the Boolean functions. A Boolean function F(x1, x2, …, xn) is said to be positive unate in input xi if ∀xj [set membership] {0,1} with ji:

F(,xi1,1,xi+1,)F(,xi1,0,xi+1,)
(12)

and negative unate in input xi if ∀xj [set membership] {0,1} with ji:

F(,xi1,1,xi+1,)F(,xi1,0,xi+1,)
(13)

Positive unateness in input xi means that when the input gene xi is on, the output of the Boolean function is greater than or equal to that when the input gene xi is off. Negative unateness in input xi means that when the input gene xi is on, the output of the Boolean function is less than or equal to that when the input gene xi is off. Note that it has been shown earlier that NCFs are equivalent to unate cascade functions [30].

Although the Boolean function controlling the expression state of an output gene based on the state of its input genes is difficult to determine, in many cases, the nature of gene interactions (activation or inhibition) between the input gene(s) and the output gene may be known from experimental data. Such information on the nature of gene interactions can be used to constrain the set of Boolean functions. Specifically, NCFs can be limited to a subset of sign-compatible functions (SGNs) based on such information on the nature of gene interactions. A NCF F(x1, x2, …, xn) is said to be a SGN if every input xi satisfies either positive unateness or negative unateness based on the known nature of interaction between input gene and output gene. If the output function for gene xj is a SGN then its input xi will satisfy positive unateness if xi is known to activate xj or its input xi will satisfy negative unateness if xi is known to inhibit xj.

In summary, a hierarchy exists among the different types of Boolean functions considered here. CFs are a subset of all possible Boolean functions, NCFs are a subset of CFs and SGNs are a subset of NCFs. We show later that the sign-compatiblity plays an important role in constraining the set of possible BNs compatible with multiple observed attractors and in determining the relative stabilities of cell attractors.

2.8 Restricting the space of BNs by imposing specific types of Boolean functions and dynamical constraints

Building the BN model for a GRN involves two important steps. Firstly, the topology of the underlying network has to be specified. This task is usually accomplished through synthesis of large amounts of experimental data from the literature on the specific biological process in question. Secondly, appropriate Boolean functions need to be specified for each gene in the network which determines the output expression state of genes based on the expression state of the input genes. This step is much more challenging compared to the task of specifying the topology of the network as the space of possible Boolean functions for a given network topology can be astronomical [18]. A possible strategy to restrict the space of possible Boolean functions for a given network topology is through imposition of dynamical constraints associated with known attractors of the GRN and the choice of specific type of Boolean functions. Firstly, through imposition of dynamical constraints, one demands that the BN with the chosen functions recapitulate the known attractors of system that correspond to cell types in a developmental GRN. Secondly, one can restrict the Boolean functions to certain desired mathematical forms [2,27,28,30,31] which may render the dynamics to be ordered and directional. Here, we hypothesized that canalyzing, nested canalyzing and sign-compatible functions may render the dynamics of the developmental GRN to be an ordered and directional process. We apply this framework next to model pancreas cell differentiation.

3. Results and Discussion

3.1 BN model for pancreas differentiation with restriction of Boolean functions

In this work, we apply the concept of relative stability of attractor states to study cell differentiation in BN models of human pancreas development.

The pancreas is an exocrine gland that secretes various digestion enzymes into the intestinal lumen. It is also an important endocrine organ: β cells of the pancreas islets secrete insulin into the circulatory system that regulates blood glucose levels and metabolism as shown in Fig. 3A. As deficiency or malfunction of β cells leads to diabetes, a major focus of research has been on cell regeneration or reprogramming to produce β cells. This line of research has led to the characterization of cell lineage determining factors in the pancreas development as well as reprogramming experiments to β cells. Pdx1 is the first gene in embryonic development that marks the onset of the pancreas cell lineage. The cells with high Ptf1a expression are fated to the exocrine pancreas. A few cells that display temporarily high Ngn3 expression will form the endocrine cell lineages. The expression of Pax4 in some cells further specifies the β, δ lineages while Arx specifies the α,PP lineages (Fig. 3B).

Figure 3
BN model of the minimal GRN for Pancreas cell differentiation

Based on experimental literature, we reconstruct a minimal GRN for pancreas development shown in Fig. 3C that determines three cell types: the exocrine cells, the β/δ progenitor and the α/PP progenitor. The set of interactions between the five genes in the minimal GRN for pancreas development shown in Fig. 3C were deduced from the literature on ChIP-chip or knockout or overexpression experiments [3237]. Supplementary Table S1 lists literature evidence in support of interactions between genes contained in the minimal GRN for pancreas development. Note that in Fig. 3C the interactions shown with continuous lines have known signs (activation or inhibition) based on experimental data while interactions shown with dashed lines have unknown signs. In principle, interactions with unknown signs can either mediate activation or inhibition.

We decided to build a BN model for pancreas development based on the structure of the minimal GRN of 5 genes along with known nature of interactions as shown in Fig. 3C. Since, the number of possible Boolean functions for a node with k inputs is 22k, the total number of all possible combinations of Boolean functions for the minimal GRN of 5 genes shown in Fig. 3C is 221 × 223 × 223 × 223 × 223 ≈ 1.7 × 1010 (Table 1), and this number is too large for any practical computational purpose. Thus, we decided to impose (structural) constraints on the type of functions to limit the set of possible Boolean functions in the BN model of the pancreas development.

Table 1
Number of possible combinations of Boolean functions satisfying structural and dynamical constraints associated with BN model of pancreas development.

As a first constraint, we explicitly adopt Boolean functions of the following type: CFs [2], NCFs [27] and SGNs (See Section 2.7). Such functions may render pancreas development to be an ordered and directional process [27,28]. Since, the number of possible CFs for a node with k=1 input is 4 and with k=3 inputs is 120, the total number of all possible combinations of CFs for the minimal GRN of 5 genes is 4 × 120 × 120 × 120 × 120 ≈ 8.3 × 108 (Table 1). Similarly, the number of possible NCFs for a node with k=1 input is 2 and with k=3 inputs is 64, and the total number of all possible combinations of NCFs for the minimal GRN of 5 genes is 2 × 64 × 64 × 64 × 64 ≈ 3.4 × 107 (Table 1). For the minimal GRN of 5 genes shown in Fig. 3C, we find the number of possible SGNs for the node with k=1 input to be 1, and for each of the 4 nodes with k=3 inputs to be 16. Thus, the total number of all possible combinations of SGNs for the minimal GRN of 5 genes is 1 × 16 × 16 × 16 × 16 = 65536 (Table 1). Note that while assigning SGNs to nodes which have some inputs with unknown nature of interaction (activation or repression), we allow both possibilities (positive or negative unateness) for such interactions with unknown signs in the SGN. In summary, as reported in Table 1, the number of possible Boolean functions for the minimal GRN of 5 genes decreases from 1.7 × 1010 to 8.3 × 108 when restricting Boolean functions to CFs, to 3.4 × 107 when restricting to NCFs, and down to 6.5 × 104 when restricting to SGNs. Thus, restriction to SGNs significantly constrains the number of possible Boolean functions for the minimal GRN of 5 genes by reducing the functions to a fraction 3.8 × 10−6 of the complete space (Table 1).

Based on experimental measurements [3236], we can define three cell states (corresponding to the exocrine cells, the β/δ progenitor and the α/PP progenitor) for the minimal GRN of 5 genes with each gene adopting binary values: 0 when the gene is not expressed in a cell lineage and 1 when the gene is expressed in a cell lineage, as shown in Fig. 3D. These three cell states or lineages (which are labelled attractor 1, 2, and 3 in Fig. 3D) must be stable attractor states of the BN model for pancreas development, and thus, represent a dynamical constraint on the network. Therefore, as a second constraint, we impose that the BN models for pancreas development based on topology of the minimal GRN of 5 genes must have at least three stable attractors with the gene expression patterns as defined in Fig. 3D. Here we further refine this condition into two cases: (i) BNs that have at least the three defined attractors; (ii) BNs that have exactly three defined attractors. In the first case, for at least three attractors, we find that the number of combinations of Boolean functions satisfying the dynamical constraint to be ~1.0×106. Among these ~1.0×106 combinations, there are 78,400 CFs, within which there are 9,216 NCFs, and 3,600 SGNs (Table 1). In the second case, for exactly three attractors, we find the number of combinations of Boolean functions satisfying the dynamical constraint to be 86,042. Among these 86,042 combinations, there are 3,741 CFs, 219 NCFs, and 109 SGNs (Table 1). Supplementary Table S2 lists the 109 combinations of SGNs for the minimal GRN of 5 genes satisfying the dynamical constraint of exactly three attractors.

Our results reported in Table 1 show that the condition of sign-compatible functions is the strictest constraint leading to significant decrease in the number of possible Boolean functions. We remark that our method of applying successive structural constraints (CFs, NCFs and SGNs) to shrink the space of possible Boolean functions can serve as a powerful tool to reconstruct large-scale BNs from experimental datasets. The logic relationship of various constraints for the nested subsets of Boolean functions is shown in the Venn diagram in Fig. 3E.

3.2 Imposing structural constraints enriches the biologically correct relative ordering of attractors

From experimental observations, it is known that among the cell types of pancreas studied here, the exocrine cells are the least stable while the β, δ progenitors are more stable than the α, γ progenitors [38]. In order to study the relative stability of cell states in pancreas differentiation, we demand that the corresponding BN model not only recapitulates the correct (observed) gene expression patterns for the three cell types, but also that it reproduces the correct developmental ordering of the three cell types, that is, of the three attractors. Thus, we impose one additional functional constraint for BN model determination: transition rates between the attractors should reflect the developmental ordering. Consideration of this additional functional constraint of developmental ordering within BN framework to build models raised a central question: when one imposes structural and dynamical constraints as in the last section to shrink the space of possible BNs, does it lead to the enrichment of correct relative ordering of cell attractors? Here we define the relative ordering of cell attractors in BN models of development based on MFPT which is used to measure the relative stability of attractor states (See sections 2.5 and 2.6).

We employ an ensemble approach to determine whether imposition of increasing level of constraints on the set of possible BNs will increase the probability of obtaining the BNs with correct relative ordering. In other words, we decided to check if the set of BNs with correct relative ordering will be enriched when more stringent constraints are put on the ensemble of possible Boolean functions. To estimate the statistical significance, a null model was constructed to bootstrap the same number of Boolean functions as the ones defined by the structural constraints (CFs, NCFs and SGNs). For example, there are 9,216 NCFs among the set of 78,400 CFs satisfying the dynamical constraint of at least three attractors. In the null model, we use bootstrapping to repeatedly randomly choose 9,216 functions from the set of 78,400 CFs and calculated their ordering distribution based on MFPT to generate a Boxplot (Fig. 4). In Fig. 4, only the second panel represents the correct biological ordering (Stability of Attractor 1 < Stability of Attractor 3 < Stability of Attractor 2) for pancreas differentiation. From the Boxplot in Fig. 4, it is seen that when we impose increasing constraints on the type of Boolean functions in the order of CFs, NCFs and SGNs, the ratio of the BNs with correct relative ordering (represented by black dots), always outperforms the random sampling. Bootstrapping for other possible orderings show mixed results compared with random sampling. In fact, the opposite ordering (Stability of Attractor 2< Stability of Attractor 3 < Stability of Attractor 1) is diluted as can be seen in panel 5 of Fig. 4. These observations further validate our choice of the sign-compatible functions, which not only significantly decrease the number of possible BNs, but also promote correct ordering of the relative stabilities of cell attractors in the GRN for pancreas development.

Figure 4
Distribution of BN models satisfying increasing structural constraints and the dynamical constraint of exactly three defined attractors across different possible relative stability orderings of cell attractors

3.3 Bimodal distribution of Landscape between Flatness and Ruggedness

Besides building a BN model for the correct relative ordering of cell attractors, it is of general interest to characterize the level of effort required for a cell to transition from one state to another state, i.e., is the BN landscape of cell state transitions rather flat or rugged [2,39,40]? A simple measure [41] to evaluate the ruggedness of the BN landscape is the entropy S=Σipi* ln pi* associated with the steady state probability distribution p* of the BN model with noise where elements pi* give the probability for the system to be in state si. The larger the entropy S, the flatter is the landscape and vice versa. Since the entropy of the steady state probability distribution of the BN model varies with the noise level η, we computed the entropies of the 86,024 combinations of Boolean functions (producing exactly three attractors) for different noise levels η [set membership] [0.001, 0.009] in increments of 0.001, η [set membership] [0.01, 0.09] in increment 0.01, and η [set membership] [0.1, 0.7] in increments of 0.1, as plotted in Fig. 5A. We find that the variance of entropies of the Boolean function ensemble becomes smaller and the landscape becomes flatter with increasing noise. This implies that presence of large amplitude of noise in BN models diminishes the influences of constraints specified by GRN. Since all noise levels were generally found to have the same trend, we chose the noise level η=0.01 to plot the entropies of 86,024 steady state distributions corresponding to all combinations of Boolean functions with exactly three attractors in Fig. 5B (where different colors correspond to different relative ordering). From Fig. 5B, it is seen that the patterns of entropy across the combinations of Boolean functions with exactly three attractors is almost repeatable, but interestingly this is not due to degeneracy in steady state distributions. We have explicitly checked that each steady state distribution p* is unique, with the overall trend that the entropy hops between two peaks when we sample all combinations of Boolean functions with exactly 3 attractors.

Figure 5
Entropy associated with the steady state probability distribution of BN models satisfying structural and dynamical constraints

From Fig. 5B and 5C, we can see that the entropy distributions for different relative ordering schemes appear to be bimodal. Since the state space of the minimal GRN of 5 genes has 32 states, the absolute random case corresponds to the entropy S=Σipi* ln pi*=132 ln 132·323.46. From Fig. 5B, it is seen that the maximum entropy for BNs of any relative ordering is below 1.8, suggesting that the selected networks were all in the ordered regime [2], far away from pure randomness. Note that smaller the value of entropy, greater is the level of constraints on a BN model. Thus, in Fig. 5B, the peak near zero reflects more rugged landscapes while the peak far away from zero denotes the flat landscapes. It is interesting to notice that even two dynamical systems with exactly the same attractors and the same relative ordering, can display a distinct overall landscape, either flat or rugged. Cell reprogramming is much more difficult in the rugged landscape than in the flat one because of the many intermediate attractors between the origin and the destination attractors. Thus, the above described scheme could be exploited for the analysis, and perhaps, rational protocol design of cell reprogramming. Specifically, one may be able to perturb the GRN such as to make the landscape flatter while maintaining its biological ordering of cell attractors.

We next tried to identify common features across the set of Boolean functions leading to rugged landscapes in the minimal GRN of 5 genes. Given any Boolean function, we define a quantity called rule number associated with the function that gives the number of Boolean rules separated by logical OR in the disjunctive normal form of the Boolean function. For example, the Boolean function A AND NOT B AND C has rule number 1; the Boolean function (NOT A AND B) OR (B AND NOT C) has rule number 2. In Figs. 6A and 6B, we plot the distribution of rule numbers for Boolean functions at each node of the minimal GRN of 5 genes for both the flat and the rugged landscapes. The main difference between the two types of landscapes we found was that the Boolean functions of the rugged landscape had, on average, higher rule numbers than those of the flat ones. One possible explanation for this finding is as follows: In case of dynamical systems with the same set of attractors, more rule numbers in Boolean functions leads to a more rugged landscape, which in turn renders more difficult the cell state transitions.

Figure 6
Distribution of rule numbers for Boolean functions at each node of the minimal GRN of 5 genes for flat and rugged landscapes

4. Conclusions

In the post-genome era, understanding of biological processes has shifted from a notion of causation focused on few genes or linear regulatory pathways to one that takes into account the high-dimensional dynamics of complex nonlinear systems, prosaically represented by GRNs [417,4244]. Many higher-level cell functions, such as cell differentiation, cell cycle, immune responses or neuronal activities can only be explained by the dynamics of complex biological networks [417]. In this paper, we built on the old elementary concept of the cell attractor that is now gaining attention and showed that Waddington’s epigenetic landscape [39] has a formal basis even in the BN framework [13]. Porting the landscape concept to the realm of discrete networks allowed us to use computational tools to study the landscape as additional information derived from the GRN. We defined the relative stability of network states on the BN landscape, thus providing a formal and quantifiable basis to the elevation in the landscape for Boolean network. We proposed methods to enforce the relative ordering of attractor states in BN models as a novel requirement for constraining large ensembles of BNs to the biologically relevant class. We showed in our example that even with incomplete information about network structures, the use of BN can capture the essential dynamics of cell fate change and permit the estimation of relative stability and transition barriers between the cell attractors. This mathematical framework has the potential to assist the rational design of perturbation protocols for directing cell reprogramming in order to generate the desired cell lineages in regenerative medicine. As the knowledge of the structure of GRNs governing the development of various tissues increases in the next decade, the utilization of such network information for therapeutic reprogramming may benefit from the concepts developed here.

Supplementary Material

Acknowledgments

Research reported in this publication was supported by the Center for Systems Biology/2P50GM076547 of the National Institutes of Health under award R01GM987654. This research was also supported by the National Science Foundation Grant PHY11-25915. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.

Abbreviations

BN
Boolean network
CF
Canalyzing function
GRN
Gene regulatory network
MFPT
Mean first passage time
NCF
Nested canalyzing function
ODE
Ordinary differential equation
SGN
Sign-compatible function

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Supplementary Material

Supplementary Table S1: Literature evidence in support of interactions between genes in the minimal GRN for pancreas development.

Supplementary Table S2: List of 109 combinations of sign-compatible functions (SGNs) for the 5 gene minimal GRN for pancreas development satisfying the dynamical constraint of exactly three attractors.

References

1. Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theor. Biol. 1969;22:437–467. [PubMed]
2. Kauffman SA. The Origins of Order. New York: Oxford Univ. Press; 1993.
3. Shmulevich I, Kauffman SA. Activities and sensitivities in Boolean network models. Phys Rev Lett. 2004;93:48701. [PMC free article] [PubMed]
4. Huang S, Ingber DE. Shape-dependent control of cell growth, differentiation, and apoptosis: switching between attractors in cell regulatory networks. Exp. Cell Res. 2000;261:91–103. [PubMed]
5. Klemm K, Bornholdt S. Topology of biological networks and reliability of information processing. Proc. Natl. Acad. Sci. U. S. A. 2005;102:18414–18419. [PubMed]
6. Balleza E, Alvarez-Buylla ER, Chaos A, Kauffman A, Shmulevich I, Aldana M. Critical dynamics in genetic regulatory networks: examples from four kingdoms. PLoS One. 2008;3:e2456. [PMC free article] [PubMed]
7. Torres-Sosa C, Huang S, Aldana M. Criticality as an emergent property of genetic networks that exhibit evolvability. PLoS Comput. Biol. 2012;8:e1002669. [PMC free article] [PubMed]
8. Espinosa-Soto C, Martin OC, Wagner A. Phenotypic plasticity can facilitate adaptive evolution in gene regulatory circuits. BMC Evol. Biol. 2011;11:5. [PMC free article] [PubMed]
9. Lau KY, Ganguli S, Tang C. Function constrains network architecture and dynamics: A case study on the yeast cell cycle Boolean network. Phys. Rev. E - Stat. Nonlinear, Soft Matter Phys. 2007;75 [PubMed]
10. Samal A, Jain S. The regulatory network of E. coli metabolism as a Boolean dynamical system exhibits both homeostasis and flexibility of response. BMC Syst. Biol. 2008;2:21. [PMC free article] [PubMed]
11. Li F, Long T, Lu Y, Ouyang Q, Tang C. The yeast cell-cycle network is robustly designed. Proc. Natl. Acad. Sci. U. S. A. 2004;101:4781–4786. [PubMed]
12. Albert R. The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster. J. Theor. Biol. 2003;223:1–18. [PubMed]
13. Villani M, Barbieri A, Serra R. A dynamical model of genetic networks for cell differentiation. PLoS One. 2011;6:e17703. [PMC free article] [PubMed]
14. Krumsiek J, Marr C, Schroeder T, Theis FJ. Hierarchical Differentiation of Myeloid Progenitors Is Encoded in the Transcription Factor Network. PLoS One. 2011;6:e22649. [PMC free article] [PubMed]
15. Chang R, Shoemaker R, Wang W. Systematic Search for Recipes to Generate Induced Pluripotent Stem Cells. PLoS Comput. Biol. 2011;7:e1002300. [PMC free article] [PubMed]
16. Flöttmann M, Scharp T, Klipp E, Floettmann M. A Stochastic Model of Epigenetic Dynamics in Somatic Cell Reprogramming. Front. Physiol. 2012;3:216. [PMC free article] [PubMed]
17. Villarreal C, Padilla-Longoria P, Alvarez-Buylla E. General Theory of Genotype to Phenotype Mapping: Derivation of Epigenetic Landscapes from N-Node Complex Gene Regulatory Networks. Phys. Rev. Lett. 2012;109:1–5. [PubMed]
18. Henry A, Monéger F, Samal A, Martin OC. Network function shapes network structure: the case of the Arabidopsis flower organ specification genetic network. Mol. Biosyst. 2013;9:1726–1735. [PubMed]
19. Zhou J-XX, Aliyu MDS, Aurell E, Huang S. Quasi-potential landscape in complex multi-stable systems. J. R. Soc. Interface. 2012;9:3539–3553. [PMC free article] [PubMed]
20. Shmulevich I, Gluhovsky I, Hashimoto RF, Dougherty ER, Zhang W. Steady-State Analysis of Genetic Regulatory Networks Modelled by Probabilistic Boolean Networks. Comp. Funct. Genomics. 2003;4:601–608. [PMC free article] [PubMed]
21. Latouche G, Ramaswamy V. Introduction to Matrix Analytic Methods in Stochastic Model. ASA/SIAM Series on Statistics and Applied Probability. 1999
22. Kemeny JG, Snell JL. Finite Markov chains. 1960
23. Lee TI, et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science. 2002;298:799–804. [PubMed]
24. Herrgård MJ, Covert MW, Palsson BØ. Reconstruction of microbial transcriptional regulatory networks. Curr. Opin. Biotechnol. 2004;15:70–77. [PubMed]
25. Blais A, Dynlacht BD. Constructing transcriptional regulatory networks. Genes Dev. 2005;19:1499–1511. [PubMed]
26. Hu Z, Killion PJ, Iyer VR. Genetic reconstruction of a functional transcriptional regulatory network. Nat. Genet. 2007;39:683–687. [PubMed]
27. Hecker M, Lambeck S, Toepfer S, Van Someren E, Guthke R. Gene regulatory network inference: data integration in dynamic models—a review. Biosystems. 2009;96(1):86–103. [PubMed]
28. Kauffman S, Peterson C, Samuelsson B, Troein C. Random Boolean network models and the yeast transcriptional network. Proc. Natl. Acad. Sci. U. S. A. 2003;100:14796–14799. [PubMed]
29. Nikolajewa S, Friedel M, Wilhelm T. Boolean networks with biologically relevant rules show ordered behavior. Biosystems. 2007;90(1):40–47. [PubMed]
30. Jarrah AS, Raposa B, Laubenbacher R. Nested Canalyzing, Unate Cascade, and Polynomial Functions. Physica D. 2007;233:167–174. [PMC free article] [PubMed]
31. Ebadi H, Klemm K. Boolean networks with veto functions. Physical Review E. 2014;90(2):022815. [PubMed]
32. Zhou J-XX, Brusch L, Huang S, Zhou J-XX. Predicting pancreas cell fate decisions and reprogramming with a hierarchical multi-attractor model. PLoS One. 2011;6:e14752. [PMC free article] [PubMed]
33. Chiang M-K, Melton DA. Single-Cell Transcript Analysis of Pancreas Development. Dev. Cell. 2003;4:383–393. [PubMed]
34. Gittes GK. Developmental biology of the pancreas: a comprehensive review. Dev. Biol. 2009;326:4–35. [PubMed]
35. Murtaugh LC. Pancreas and beta-cell development: from the actual to the possible. Development. 2007;134:427–438. [PubMed]
36. Jensen J. Gene regulatory factors in pancreatic development. Dev. Dyn. 2004;229:176–200. [PubMed]
37. Habener JF, Kemp DM, Thomas MK. Minireview: transcriptional regulation in pancreatic development. Endocrinology. 2005;146:1025–1034. [PubMed]
38. Collombat P, et al. The ectopic expression of Pax4 in the mouse pancreas converts progenitor cells into alpha and subsequently beta cells. Cell. 2009;138:449–462. [PMC free article] [PubMed]
39. Waddington CH. The Strategy of the Genes. A Discussion of Some Aspects of Theoretical Biology (Alen & Unwin) 1957
40. Goldberg AD, Allis CD, Bernstein E. Epigenetics: a landscape takes shape. Cell. 2007;128(4):635–638. [PubMed]
41. Malan KM, Engelbrecht AP. A survey of techniques for characterising fitness landscapes and some possible ways forward. Information Sciences. 2013;241:148–163.
42. Kitano H. Systems biology: a brief overview. Science. 2002;295:1662–1664. [PubMed]
43. De Jong H. Modeling and simulation of genetic regulatory systems: a literature review. Journal of computational biology. 2002;9(1):67–103. [PubMed]
44. Davidson E, Levin M. Gene regulatory networks. Proceedings of the national academy of sciences of the United States of America. 2005;102(14):4935–4935. [PubMed]