Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC5149109

Formats

Article sections

- Abstract
- 1. Introduction
- 2. Modelling Framework
- 3. Results and Discussion
- 4. Conclusions
- Supplementary Material
- References

Authors

Related links

Biosystems. Author manuscript; available in PMC 2017 April 1.

Published in final edited form as:

Published online 2016 March 7. doi: 10.1016/j.biosystems.2016.03.002

PMCID: PMC5149109

NIHMSID: NIHMS768491

Joseph Xu Zhou,^{1,}^{2,}^{*} Areejit Samal,^{1,}^{3,}^{4,}^{*} Aymeric Fouquier d’Hérouël,^{1,}^{5} Nathan D. Price,^{1} and Sui Huang^{1,}^{§}

The publisher's final edited version of this article is available at Biosystems

See other articles in PMC that cite the published article.

Progress in cell type reprogramming has revived the interest in Waddington’s concept of the *epigenetic landscape*. Recently researchers developed the quasi-potential theory to represent the Waddington’s landscape. The Quasi-potential *U( x)*, derived from interactions in the gene regulatory network (GRN) of a cell, quantifies the relative stability of network states, which determine the

A hallmark of multicellular organisms is the co-existence of distinct differentiated cell types with different functions and stable gene expression patterns. A less specialized cell, a stem or progenitor cell, spawns a variety of more specialized progeny cells through cell differentiation. Once differentiated, a specialized cell’s gene expression pattern is relatively robust against perturbations emanating from a noisy environment. Where does this stability come from? How do gene expression patterns change as cells differentiate in response to external cues, and thereby, transition from one stable gene expression pattern to another? In principle, such questions can be answered by understanding the interactions between the genes in the underlying gene regulatory network (GRN), which constrain the changes in the gene expression patterns, producing stable and unstable steady states. The dynamical system associated with GRNs can be modelled by a system of ordinary differential equations (ODEs) where continuous variables represent the expression levels of individual genes. However, with ODEs one is quickly limited by the number of configurations of the networks due to the exponential growth of complexity with the number of genes as well as the general lack of information on the parameters that characterize the interactions between genes. A widely used alternative approach to study GRNs is Boolean networks (BNs), a framework that enables modelling of networks with hundreds of genes or analyze large statistical ensembles of networks of random structure [1,2]. Analysis of an ensemble of BNs can yield insights on the relationship between structure and dynamics of GRNs [1–3].

In 1969 Kauffman introduced BNs to study the dynamics of GRNs [1]. Since then BNs have been used to model a wide range of biological phenomena such as cell cycle, cellular differentiation and evolution of GRNs [4–17]. Specifically, BNs have been extensively used to study developmental processes. Villani *et al* [13] have developed a BN framework for cell differentiation. Krumsiek *et al* [14] have developed a BN model to recapitulate hematopoiesis. Chang *et al* [15] employed a BN model to explain human embryonic stem cell differentiation and the generation of induced pluripotent stem cells (iPSCs). Klipp *et al* [16] used a BN model to study the influence of gene regulation, methylation and histone modifications on cell differentiation. Alvarez-Buylla *et al* [6–17] used BNs to explain cell differentiation and developmental ordering in the floral organ of Arabidopsis. An important limitation of these reconstructed BN models [4–17] for different biological processes is their specification of one defined set of Boolean functions for genes in the network out of a multitude of possible choices [18] that can reproduce the biologically relevant cell states as network attractors, and the reason for the chosen set of Boolean functions often remains elusive. Also experimental observations in cell differentiation systems usually are consistent with a large number of possible Boolean functions rather than suggesting a single well-defined set, giving rise to a set of possible BNs that can describe the observed gene expression patterns of the attractors [18]. Thus, one always wonders whether the reported results would still hold for other choices of functions and how *structurally robust* the predicted dynamics is for the observed attractor states.

A more stringent requirement on a model capturing the development of multicelluar organisms is the following constraint. In addition to recapitulating the multiple observed attractors of the network, the model of the developmental GRN should also reproduce the experimentally observed *relative stabilities of attractors*, i.e, the model has to relate the different attractor states to each other based on their *relative stabilities*. By that we mean the *relative ease* for transitioning from one attractor state (*A*) to another state (*B*) which epitomizes the developmental process. More formally, in a stochastic system, the relative ease of transitioning from state *A* to state *B* would be given by the probability *P*(*A*→*B*) for transition from *A* to *B* (given random flutuations in gene expression). Note that such transition probabilties are typically asymmetric (i.e., *P*(*A*→*B*) ≠ *P*(*B*→*A*)) – a property that ultimately accounts for the directionality (irreversibility) of development.

Interactions between genes collectively produce the developmental ordering of different cell types which is robust and repeatable during embryogenesis. Therefore, once the multiple attractors of the dynamical system are determined, it is necessary to evaluate their relative stabilities in order to derive a consistent relative ordering for all attractors in a developmental process (if one exists). Recently, some of us have derived a framework to calculate the relative stabilities of cell attractors in continuous ODE-based GRN models using least action principles [19]. However, ODE-based GRN models are not well-suited to model large networks, let alone ensembles of networks, for which BNs are commonly used [1–3].

In this paper, we present a mathematical framework for calculating the relative stabilities of cell attractors and transitions, and hence deriving the notion of a landscape in BN models of development. We use a minimal GRN for pancreas development as an example to demonstrate the utility of our method. Imposing the observed relative ordering of attractors as a novel phenotypic constraint affords evaluation of ensembles of BNs (with a given network structure but different sets of Boolean functions) that are compatible with multiple observed attractors of the GRN. Our method can be used to reconstruct simple BN models for developmental processes from available information on GRN architecture and relative stability of attractor states, and thus, can predict the efforts associated with particular state transitions of interest which in turn can facilitate the rational protocol design for cell reprogramming in regenerative medicine.

BN model for a GRN is specified by its set of nodes, directed edges and Boolean functions. In a BN, the nodes represent genes while the edges represent interactions among genes in the network. Any gene *i* in a BN at a given time can be in one of two expression states: *on* if its state *x _{i}* = 1 and

$${\mathit{X}}^{t+1}=\mathit{F}({\mathit{X}}^{t})$$

(1)

where **X**^{t} is a *m*-dimensional binary vector (0 or 1) that gives the expression of all genes at time step *t*. **F** encapsulates both the network topology and Boolean functions at all nodes, and thus, contains the information determining the dynamics of the BN.

For a *m*-gene BN, there are 2^{m} possible states. A sequence of states **X**^{0}, …, **X**^{t}, **X**^{t+1}, … forms a trajectory in the state space. Trajectories converge in a deterministic (noise-free) system. Since the state space is finite, the trajectories eventually coverge either to a single state (point attractor) or a cycle of states (cyclic attractor). In the extreme case, a cyclic attractor encompasses all or almost all possible network states, and given the large number of states 2^{m}, such behavior will appear *chaotic*. For any given attractor, its associated basin of attraction is the set of initial states that will converge to that attractor. Attractors of a BN are charaterized by the size (and shape) of their associated basin of attraction. The network topology (i.e., the set of nodes and edges) and the Boolean functions at each node fully determine the *attractor structure* – which consists of attractors, trajectories and basins of attraction. The attractor structure can be determined by explicitly evaluating all state transitions **X**^{1} = **F**(**X**^{0}) for all 2^{m} possible initial states **X**^{0}. An example of a 4-gene GRN with Boolean functions and resulting attractor structure is shown in Fig. 1.

Spontaneous transitions between attractors that underlie the *epigenetic landscape* or the quasipotential landscape requires probabilistic (noise-driven) dynamics. A discrete Markov model can be used to describe the BN dynamics. Let *p _{i}* give the probability for the occupation of a state

$${\mathit{p}}^{t+1}=\mathit{T}{\mathit{p}}^{t}=\left[\begin{array}{ccc}\hfill {T}_{11}\hfill & \hfill \dots \hfill & \hfill {T}_{1k}\hfill \\ \hfill \vdots \hfill & \hfill \ddots \hfill & \hfill \vdots \hfill \\ \hfill {T}_{k1}\hfill & \hfill \dots \hfill & \hfill {T}_{kk}\hfill \end{array}\right]\left(\begin{array}{c}\hfill {p}_{1}\hfill \\ \hfill \vdots \hfill \\ \hfill {p}_{k}\hfill \end{array}\right)$$

(2)

The steady state distribution is then given by eigenvector **p*** of Markov transition matrix **T** corresponding to the eigenvalue λ = 1. Thus, BN dynamics can be characterized by the eigenvectors of Markov matrix **T**. If **p*** has only one non-zero element, then the network has a point attractor. If **p*** has ω non-zero elements with 1 < ω < 2^{m}, then the network has cyclic attractors. If **p*** has 2^{m} non-zero elements, it means that every state can transition to every other state, and the network dynamics is ergodic.

We describe here the procedure to map a BN model to a Markov model with a simple example. Consider the toggle switch between two genes (*x* and *y*) with both cross-inhibitory interactions and self-activation as shown in Fig. 2. The BN model of the 2-gene toggle switch can have 2^{2} = 4 possible states: *s*_{1} = (0,0), *s*_{2} = (0,1), *s*_{3} = (1,0), and *s*_{4} = (1,1). The Boolean functions for the 2 genes in the BN model of the toggle switch are given by:

$${x}^{t+1}={x}^{t}\text{OR NOT}({x}^{t}\text{OR}{y}^{t})$$

$${y}^{t+1}={y}^{t}\text{OR NOT}({x}^{t}\text{OR}{y}^{t})$$

(3)

For the BN model of the toggle switch, the occupation probability of the 4 possible states is represented by a distribution vector **p** = (*p*_{1}, *p*_{2}, *p*_{3}, *p*_{4}). Based on Boolean functions in Eq. 3, the Markov transition matrix for the BN model of the toggle switch is:

$$\mathit{T}=\left(\begin{array}{cccc}\hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill & \hfill 0\hfill \\ \hfill 1\hfill & \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill \end{array}\right)$$

(4)

where the element *T _{ij}* gives the transition probability from state

In a deterministic BN model, there cannot be any transitions between two states in different basins of attraction, and thus the relative stabilities of attractors have no meaning for such a Markov model. However, introduction of noise into deterministic BNs will render the possibility of transitions between two states in different basins of attraction. Thus, the notion of relative stability of the attractors can be defined in BNs with noise where there is possibility of transitions between attractors. Also, the Markov transition matrix for BNs with noise is ergodic because there is a non-zero probability of transition between any two states of the network.

In a BN with *m* genes, any state *s _{i}* in the Markov model can be represented by a

$${P}_{ij}=\{\begin{array}{cc}{C}_{{d}_{ij}}^{m}{\eta}^{{d}_{ij}}{(1-\eta )}^{m-{d}_{ij}}\hfill & (i\ne j)\hfill \\ 0\hfill & (i=j)\hfill \end{array}$$

(5)

where ${C}_{{d}_{ij}}^{n}$ are binomial coefficients, which guarantee that each column of matrix **P** sums up to 1 − (1 − η)^{m}. The BN model with noise is then constructed by adding the perturbation matrix **P** to the Markov matrix **T**:

$$\tilde{\mathbf{T}}={(1-\eta )}^{m}\mathbf{T}+\mathbf{P}$$

(6)

The equation for the dynamics of the BN model with noise is then:

$${\mathbf{p}}^{t+1}=\tilde{\mathbf{T}}{\mathbf{p}}^{t}$$

(7)

Note that the Markov matrix **T** for the deterministic BN model is such that each column of **T** adds up to 1. Similarly, in Eq. 6, a normalizing coefficient is used to ensure that also satisfies the conservation principle, i.e., each column of adds up to 1. Since the Markov matrix is ergodic for η > 0, the BN model with noise will have one and only one steady state probability distribution **p*** [21]. Equations 5 and 6 can be used to construct a BN model with noise which will always converge to a steady state probability distribution **p*** that can be directly calculated from Eq. 7. With the introduction of noise, the original dynamical system corresponding to the deterministic BN relaxes into an ergodic dynamical system, and we can quantify the relative stabilities of distinct attractors for such a system via two measures: steady states probability distribution **p*** and mean first passage time (MFPT).

In an ergodic dynamical system, it is possible to reach every possible state from any arbitrary state. A transition barrier that epitomizes the ease of transition from one attractor to another attractor offers a notion of relative stability. Relative stability can be measured via mean first passage time (MFPT) which evaluates the average number of time steps *M _{ij}* that are necessary for the system to transition from state

One can compute the matrix M_{ij} containing the MFPT for the transition between any two states **s**_{i} and **s**_{j} in the dynamical systems as follows. Let ${f}_{ij}^{(n)}$ denote that the probability that the first passage time from **s**_{i} to **s**_{j} is *n*. Then we have the equations:

$${\mathrm{P}}_{ij}^{(1)}={f}_{ij}^{(1)}$$

$$\begin{array}{c}\hfill {\mathrm{P}}_{ij}^{(2)}={f}_{ij}^{(2)}+{f}_{ij}^{(1)}\xb7{\mathrm{p}}_{jj}^{(1)}\hfill \\ \hfill \dots \hfill \\ \hfill {\mathrm{P}}_{ij}^{(n)}={f}_{ij}^{(n)}+{f}_{ij}^{(n-1)}\xb7{\mathrm{p}}_{jj}^{(1)}+{f}_{ij}^{(n-2)}\xb7{\mathrm{p}}_{jj}^{(2)}+\cdots +{f}_{ij}^{(1)}\xb7{\mathrm{p}}_{jj}^{(n-1)}\hfill \end{array}$$

(8)

where ${\mathrm{P}}_{ij}^{(n)}$ is the probability of transition between states **s**_{i} and **s**_{j} in *n* steps. Thus, we have the following recursive equation:

$${f}_{ij}^{(n)}={\mathrm{P}}_{ij}^{(n)}-\sum _{k=1}^{n-1}{f}_{ij}^{(n-k)}\xb7{p}_{jj}^{(k)},n=1,2,$$

(9)

for the first passage time, and the MFPT M_{ij} is then calculated as:

$${\mathrm{M}}_{ij}=1+\sum _{k\ne j}{\tilde{\mathrm{T}}}_{ik}{\mathrm{M}}_{kj}$$

(10)

Note that the shorter the number of time steps M_{ij} to transition between two states **s**_{i} and **s**_{j}, the lower is the transition barrier between two states s_{i} and state s_{j}. There exists a particular distribution of time steps required for the transition from state s_{i} to state s_{j} and MFPT is a mean value of this distribution.

Cell differentiation is a largely (spontaneously) irreversible process where the zygote differentiates robustly into other cell types in a pre-determined order. This programmed behavior that allows the zygote to robustly differentiate into other cell types in a well-defined sequence in the presence of molecular noise requires that the GRN orchestrates a consistent relative ordering of different attractors and intermediate transient states. By consistent relative ordering of different attractors, we mean that in the directed graph with each node as attractor state and edges representing spontaneous attractor transitions, there is little or ideally no circular (non-transitive) structure. Based on the relative stabilities of cell states measured via MFPT, we propose below a mathematical formula to define a consistent relative ordering of all attractors in the GRN.

Suppose we have a relative stability matrix **M** which gives the transition barrier between any two states based on MFPT, then we can define a relative ordering as follows. First, we define the net transition rate *D _{ij}* between two attractors,

$${D}_{ij}=\frac{1}{{M}_{ij}}-\frac{1}{{M}_{ji}}$$

(11)

Note that *D _{ij}* > 0 implies that attractor

Information on the topology of a GRN is increasingly available from direct experimental determination of gene regulatory mechanism or such information can be inferred from experimental data [23–27]. However, the Boolean function that controls the expression state of each gene based on the state of its input genes is more difficult to determine than network topology. In practice, a vast number of Boolean functions are plausible for a given network topology based on available experimental data [18]. Thus, it is desirable to apply biologically motivated constraints to limit the number of possible Boolean functions. We propose to use *canalyzing, nested canalyzing* and *unate* Boolean functions to charaterize the gene interactions and limit the set of possible Boolean functions for biological systems.

Kauffman proposed the concept of *canalyzing* Boolean functions (CFs) [2]. A Boolean function **F**(*x*_{1}, …, *x _{n}*) is CF if there exists an input

Later Kauffman extended this concept to *nested canalyzing* Boolean functions (NCFs) which are a subset of CFs [27]. A Boolean function **F**(*x*_{1}, …, *x _{n}*) is NCF if there exists a permutation of inputs (σ

Furthermore, it has been shown that CFs improve the robustness of the attractor states in BNs to both mutations (in the form of changes in architecture, such as rewiring of edges or deletion of nodes) as well as perturbations (in the form of instant *bit flips* of the node states), and thus, CFs push the BN dynamics towards an *ordered* regime [27,28].

Another class of Boolean functions that influence the network dynamics comprises the *unate* functions [28]. An unate function is a type of Boolean function which has monotonic properties. Since in biology a gene interaction is often characterized as either activating or inhibiting, *unate* functions can be used to capture the monotonic properties of the Boolean functions. A Boolean function **F**(*x*_{1}, *x*_{2}, …, *x _{n}*) is said to be

$$\mathit{F}(\dots ,{x}_{i-1,}1,{x}_{i+1},\dots )\ge \mathit{F}(\dots ,{x}_{i-1,}0,{x}_{i+1},\dots )$$

(12)

and *negative unate* in input *x _{i}* if ∀

$$\mathit{F}(\dots ,{x}_{i-1,}1,{x}_{i+1},\dots )\le \mathit{F}(\dots ,{x}_{i-1,}0,{x}_{i+1},\dots )$$

(13)

*Positive unateness* in input *x _{i}* means that when the input gene

Although the Boolean function controlling the expression state of an output gene based on the state of its input genes is difficult to determine, in many cases, the nature of gene interactions (activation or inhibition) between the input gene(s) and the output gene may be known from experimental data. Such information on the nature of gene interactions can be used to constrain the set of Boolean functions. Specifically, NCFs can be limited to a subset of *sign-compatible* functions (SGNs) based on such information on the nature of gene interactions. A NCF **F**(*x*_{1}, *x*_{2}, …, *x _{n}*) is said to be a SGN if every input

In summary, a hierarchy exists among the different types of Boolean functions considered here. CFs are a subset of all possible Boolean functions, NCFs are a subset of CFs and SGNs are a subset of NCFs. We show later that the *sign-compatiblity* plays an important role in constraining the set of possible BNs compatible with multiple observed attractors and in determining the relative stabilities of cell attractors.

Building the BN model for a GRN involves two important steps. Firstly, the topology of the underlying network has to be specified. This task is usually accomplished through synthesis of large amounts of experimental data from the literature on the specific biological process in question. Secondly, appropriate Boolean functions need to be specified for each gene in the network which determines the output expression state of genes based on the expression state of the input genes. This step is much more challenging compared to the task of specifying the topology of the network as the space of possible Boolean functions for a given network topology can be astronomical [18]. A possible strategy to restrict the space of possible Boolean functions for a given network topology is through imposition of dynamical constraints associated with known attractors of the GRN and the choice of specific type of Boolean functions. Firstly, through imposition of dynamical constraints, one demands that the BN with the chosen functions recapitulate the known attractors of system that correspond to cell types in a developmental GRN. Secondly, one can restrict the Boolean functions to certain desired mathematical forms [2,27,28,30,31] which may render the dynamics to be ordered and directional. Here, we hypothesized that canalyzing, nested canalyzing and sign-compatible functions may render the dynamics of the developmental GRN to be an ordered and directional process. We apply this framework next to model pancreas cell differentiation.

In this work, we apply the concept of relative stability of attractor states to study cell differentiation in BN models of human pancreas development.

The pancreas is an exocrine gland that secretes various digestion enzymes into the intestinal lumen. It is also an important endocrine organ: β cells of the pancreas islets secrete insulin into the circulatory system that regulates blood glucose levels and metabolism as shown in Fig. 3A. As deficiency or malfunction of β cells leads to diabetes, a major focus of research has been on cell regeneration or reprogramming to produce β cells. This line of research has led to the characterization of cell lineage determining factors in the pancreas development as well as reprogramming experiments to β cells. *Pdx1* is the first gene in embryonic development that marks the onset of the pancreas cell lineage. The cells with high *Ptf1a* expression are fated to the exocrine pancreas. A few cells that display temporarily high *Ngn3* expression will form the endocrine cell lineages. The expression of *Pax4* in some cells further specifies the β, δ lineages while *Arx* specifies the α,PP lineages (Fig. 3B).

Based on experimental literature, we reconstruct a minimal GRN for pancreas development shown in Fig. 3C that determines three cell types: the exocrine cells, the β/δ progenitor and the α/PP progenitor. The set of interactions between the five genes in the minimal GRN for pancreas development shown in Fig. 3C were deduced from the literature on ChIP-chip or knockout or overexpression experiments [32–37]. Supplementary Table S1 lists literature evidence in support of interactions between genes contained in the minimal GRN for pancreas development. Note that in Fig. 3C the interactions shown with continuous lines have known signs (activation or inhibition) based on experimental data while interactions shown with dashed lines have unknown signs. In principle, interactions with unknown signs can either mediate activation or inhibition.

We decided to build a BN model for pancreas development based on the structure of the minimal GRN of 5 genes along with known nature of interactions as shown in Fig. 3C. Since, the number of possible Boolean functions for a node with *k* inputs is 2^{2k}, the total number of all possible combinations of Boolean functions for the minimal GRN of 5 genes shown in Fig. 3C is 2^{21} × 2^{23} × 2^{23} × 2^{23} × 2^{23} ≈ 1.7 × 10^{10} (Table 1), and this number is too large for any practical computational purpose. Thus, we decided to impose (structural) constraints on the type of functions to limit the set of possible Boolean functions in the BN model of the pancreas development.

Number of possible combinations of Boolean functions satisfying structural and dynamical constraints associated with BN model of pancreas development.

As a first constraint, we explicitly adopt Boolean functions of the following type: CFs [2], NCFs [27] and SGNs (See Section 2.7). Such functions may render pancreas development to be an ordered and directional process [27,28]. Since, the number of possible CFs for a node with *k*=*1* input is 4 and with *k*=*3* inputs is 120, the total number of all possible combinations of CFs for the minimal GRN of 5 genes is 4 × 120 × 120 × 120 × 120 ≈ 8.3 × 10^{8} (Table 1). Similarly, the number of possible NCFs for a node with *k*=*1* input is 2 and with *k*=*3* inputs is 64, and the total number of all possible combinations of NCFs for the minimal GRN of 5 genes is 2 × 64 × 64 × 64 × 64 ≈ 3.4 × 10^{7} (Table 1). For the minimal GRN of 5 genes shown in Fig. 3C, we find the number of possible SGNs for the node with *k*=*1* input to be 1, and for each of the 4 nodes with *k*=*3* inputs to be 16. Thus, the total number of all possible combinations of SGNs for the minimal GRN of 5 genes is 1 × 16 × 16 × 16 × 16 = 65536 (Table 1). Note that while assigning SGNs to nodes which have some inputs with unknown nature of interaction (activation or repression), we allow both possibilities (positive or negative unateness) for such interactions with unknown signs in the SGN. In summary, as reported in Table 1, the number of possible Boolean functions for the minimal GRN of 5 genes decreases from 1.7 × 10^{10} to 8.3 × 10^{8} when restricting Boolean functions to CFs, to 3.4 × 10^{7} when restricting to NCFs, and down to 6.5 × 10^{4} when restricting to SGNs. Thus, restriction to SGNs significantly constrains the number of possible Boolean functions for the minimal GRN of 5 genes by reducing the functions to a fraction 3.8 × 10^{−6} of the complete space (Table 1).

Based on experimental measurements [32–36], we can define three cell states (corresponding to the exocrine cells, the β/δ progenitor and the α/PP progenitor) for the minimal GRN of 5 genes with each gene adopting binary values: 0 when the gene is not expressed in a cell lineage and 1 when the gene is expressed in a cell lineage, as shown in Fig. 3D. These three cell states or lineages (which are labelled attractor 1, 2, and 3 in Fig. 3D) must be stable attractor states of the BN model for pancreas development, and thus, represent a dynamical constraint on the network. Therefore, as a second constraint, we impose that the BN models for pancreas development based on topology of the minimal GRN of 5 genes must have at least three stable attractors with the gene expression patterns as defined in Fig. 3D. Here we further refine this condition into two cases: (i) BNs that have at least the three defined attractors; (ii) BNs that have exactly three defined attractors. In the first case, for *at least three attractors*, we find that the number of combinations of Boolean functions satisfying the dynamical constraint to be ~1.0×10^{6}. Among these ~1.0×10^{6} combinations, there are 78,400 CFs, within which there are 9,216 NCFs, and 3,600 SGNs (Table 1). In the second case, for *exactly three attractors*, we find the number of combinations of Boolean functions satisfying the dynamical constraint to be 86,042. Among these 86,042 combinations, there are 3,741 CFs, 219 NCFs, and 109 SGNs (Table 1). Supplementary Table S2 lists the 109 combinations of SGNs for the minimal GRN of 5 genes satisfying the dynamical constraint of *exactly three attractors*.

Our results reported in Table 1 show that the condition of sign-compatible functions is the strictest constraint leading to significant decrease in the number of possible Boolean functions. We remark that our method of applying successive structural constraints (CFs, NCFs and SGNs) to shrink the space of possible Boolean functions can serve as a powerful tool to reconstruct large-scale BNs from experimental datasets. The logic relationship of various constraints for the nested subsets of Boolean functions is shown in the Venn diagram in Fig. 3E.

From experimental observations, it is known that among the cell types of pancreas studied here, the exocrine cells are the least stable while the β, δ progenitors are more stable than the α, γ progenitors [38]. In order to study the relative stability of cell states in pancreas differentiation, we demand that the corresponding BN model not only recapitulates the correct (observed) gene expression patterns for the three cell types, but also that it reproduces the correct *developmental ordering* of the three cell types, that is, of the three attractors. Thus, we impose one additional functional constraint for BN model determination: transition rates between the attractors should reflect the developmental ordering. Consideration of this additional functional constraint of developmental ordering within BN framework to build models raised a central question: when one imposes structural and dynamical constraints as in the last section to shrink the space of possible BNs, does it lead to the enrichment of correct relative ordering of cell attractors? Here we define the relative ordering of cell attractors in BN models of development based on MFPT which is used to measure the relative stability of attractor states (See sections 2.5 and 2.6).

We employ an ensemble approach to determine whether imposition of increasing level of constraints on the set of possible BNs will increase the probability of obtaining the BNs with correct relative ordering. In other words, we decided to check if the set of BNs with correct relative ordering will be enriched when more stringent constraints are put on the ensemble of possible Boolean functions. To estimate the statistical significance, a null model was constructed to bootstrap the same number of Boolean functions as the ones defined by the structural constraints (CFs, NCFs and SGNs). For example, there are 9,216 NCFs among the set of 78,400 CFs satisfying the dynamical constraint of *at least three attractors*. In the null model, we use bootstrapping to repeatedly randomly choose 9,216 functions from the set of 78,400 CFs and calculated their ordering distribution based on MFPT to generate a Boxplot (Fig. 4). In Fig. 4, only the second panel represents the correct biological ordering (Stability of Attractor 1 < Stability of Attractor 3 < Stability of Attractor 2) for pancreas differentiation. From the Boxplot in Fig. 4, it is seen that when we impose increasing constraints on the type of Boolean functions in the order of CFs, NCFs and SGNs, the ratio of the BNs with correct relative ordering (represented by black dots), always outperforms the random sampling. Bootstrapping for other possible orderings show mixed results compared with random sampling. In fact, the opposite ordering (Stability of Attractor 2< Stability of Attractor 3 < Stability of Attractor 1) is diluted as can be seen in panel 5 of Fig. 4. These observations further validate our choice of the *sign-compatible* functions, which not only significantly decrease the number of possible BNs, but also promote correct ordering of the relative stabilities of cell attractors in the GRN for pancreas development.

Besides building a BN model for the correct relative ordering of cell attractors, it is of general interest to characterize the level of effort required for a cell to transition from one state to another state, i.e., is the BN landscape of cell state transitions rather flat or rugged [2,39,40]? A simple measure [41] to evaluate the ruggedness of the BN landscape is the entropy $S=-{\mathrm{\Sigma}}_{i}{p}_{i}^{*}\text{ln}{p}_{i}^{*}$ associated with the steady state probability distribution **p*** of the BN model with noise where elements ${p}_{i}^{*}$ give the probability for the system to be in state *s _{i}*. The larger the entropy

From Fig. 5B and 5C, we can see that the entropy distributions for different relative ordering schemes appear to be bimodal. Since the state space of the minimal GRN of 5 genes has 32 states, the absolute random case corresponds to the entropy $S=-{\mathrm{\Sigma}}_{i}{p}_{i}^{*}\text{ln}{p}_{i}^{*}=-\frac{1}{32}\text{ln}\frac{1}{32}\xb732\approx 3.46$. From Fig. 5B, it is seen that the maximum entropy for BNs of any relative ordering is below 1.8, suggesting that the selected networks were all in the ordered regime [2], far away from pure randomness. Note that smaller the value of entropy, greater is the level of constraints on a BN model. Thus, in Fig. 5B, the peak near zero reflects more rugged landscapes while the peak far away from zero denotes the flat landscapes. It is interesting to notice that even two dynamical systems with exactly the same attractors and the same relative ordering, can display a distinct overall landscape, either flat or rugged. Cell reprogramming is much more difficult in the rugged landscape than in the flat one because of the many intermediate attractors between the origin and the destination attractors. Thus, the above described scheme could be exploited for the analysis, and perhaps, rational protocol design of cell reprogramming. Specifically, one may be able to perturb the GRN such as to make the landscape flatter while maintaining its biological ordering of cell attractors.

We next tried to identify common features across the set of Boolean functions leading to rugged landscapes in the minimal GRN of 5 genes. Given any Boolean function, we define a quantity called *rule number* associated with the function that gives the number of Boolean rules separated by logical OR in the disjunctive normal form of the Boolean function. For example, the Boolean function *A* AND NOT *B* AND *C* has rule number 1; the Boolean function (NOT *A* AND *B*) OR (*B* AND NOT *C*) has rule number 2. In Figs. 6A and 6B, we plot the distribution of rule numbers for Boolean functions at each node of the minimal GRN of 5 genes for both the flat and the rugged landscapes. The main difference between the two types of landscapes we found was that the Boolean functions of the rugged landscape had, on average, higher rule numbers than those of the flat ones. One possible explanation for this finding is as follows: In case of dynamical systems with the same set of attractors, more rule numbers in Boolean functions leads to a more rugged landscape, which in turn renders more difficult the cell state transitions.

In the post-genome era, understanding of biological processes has shifted from a notion of causation focused on few genes or linear regulatory pathways to one that takes into account the high-dimensional dynamics of complex nonlinear systems, prosaically represented by GRNs [4– 17,42–44]. Many higher-level cell functions, such as cell differentiation, cell cycle, immune responses or neuronal activities can only be explained by the dynamics of complex biological networks [4–17]. In this paper, we built on the old elementary concept of the cell attractor that is now gaining attention and showed that Waddington’s epigenetic landscape [39] has a formal basis even in the BN framework [1–3]. Porting the landscape concept to the realm of discrete networks allowed us to use computational tools to study the landscape as additional information derived from the GRN. We defined the relative stability of network states on the BN landscape, thus providing a formal and quantifiable basis to the elevation in the *landscape for Boolean network*. We proposed methods to enforce the relative ordering of attractor states in BN models as a novel requirement for constraining large ensembles of BNs to the biologically relevant class. We showed in our example that even with incomplete information about network structures, the use of BN can capture the essential dynamics of cell fate change and permit the estimation of relative stability and transition barriers between the cell attractors. This mathematical framework has the potential to assist the rational design of perturbation protocols for directing cell reprogramming in order to generate the desired cell lineages in regenerative medicine. As the knowledge of the structure of GRNs governing the development of various tissues increases in the next decade, the utilization of such network information for therapeutic reprogramming may benefit from the concepts developed here.

Research reported in this publication was supported by the Center for Systems Biology/2P50GM076547 of the National Institutes of Health under award R01GM987654. This research was also supported by the National Science Foundation Grant PHY11-25915. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.

- BN
- Boolean network
- CF
- Canalyzing function
- GRN
- Gene regulatory network
- MFPT
- Mean first passage time
- NCF
- Nested canalyzing function
- ODE
- Ordinary differential equation
- SGN
- Sign-compatible function

**Publisher's Disclaimer: **This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

**Supplementary Material**

**Supplementary Table S1:** Literature evidence in support of interactions between genes in the minimal GRN for pancreas development.

**Supplementary Table S2:** List of 109 combinations of sign-compatible functions (SGNs) for the 5 gene minimal GRN for pancreas development satisfying the dynamical constraint of exactly three attractors.

1. Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theor. Biol. 1969;22:437–467. [PubMed]

2. Kauffman SA. The Origins of Order. New York: Oxford Univ. Press; 1993.

3. Shmulevich I, Kauffman SA. Activities and sensitivities in Boolean network models. Phys Rev Lett. 2004;93:48701. [PMC free article] [PubMed]

4. Huang S, Ingber DE. Shape-dependent control of cell growth, differentiation, and apoptosis: switching between attractors in cell regulatory networks. Exp. Cell Res. 2000;261:91–103. [PubMed]

5. Klemm K, Bornholdt S. Topology of biological networks and reliability of information processing. Proc. Natl. Acad. Sci. U. S. A. 2005;102:18414–18419. [PubMed]

6. Balleza E, Alvarez-Buylla ER, Chaos A, Kauffman A, Shmulevich I, Aldana M. Critical dynamics in genetic regulatory networks: examples from four kingdoms. PLoS One. 2008;3:e2456. [PMC free article] [PubMed]

7. Torres-Sosa C, Huang S, Aldana M. Criticality as an emergent property of genetic networks that exhibit evolvability. PLoS Comput. Biol. 2012;8:e1002669. [PMC free article] [PubMed]

8. Espinosa-Soto C, Martin OC, Wagner A. Phenotypic plasticity can facilitate adaptive evolution in gene regulatory circuits. BMC Evol. Biol. 2011;11:5. [PMC free article] [PubMed]

9. Lau KY, Ganguli S, Tang C. Function constrains network architecture and dynamics: A case study on the yeast cell cycle Boolean network. Phys. Rev. E - Stat. Nonlinear, Soft Matter Phys. 2007;75 [PubMed]

10. Samal A, Jain S. The regulatory network of E. coli metabolism as a Boolean dynamical system exhibits both homeostasis and flexibility of response. BMC Syst. Biol. 2008;2:21. [PMC free article] [PubMed]

11. Li F, Long T, Lu Y, Ouyang Q, Tang C. The yeast cell-cycle network is robustly designed. Proc. Natl. Acad. Sci. U. S. A. 2004;101:4781–4786. [PubMed]

12. Albert R. The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster. J. Theor. Biol. 2003;223:1–18. [PubMed]

13. Villani M, Barbieri A, Serra R. A dynamical model of genetic networks for cell differentiation. PLoS One. 2011;6:e17703. [PMC free article] [PubMed]

14. Krumsiek J, Marr C, Schroeder T, Theis FJ. Hierarchical Differentiation of Myeloid Progenitors Is Encoded in the Transcription Factor Network. PLoS One. 2011;6:e22649. [PMC free article] [PubMed]

15. Chang R, Shoemaker R, Wang W. Systematic Search for Recipes to Generate Induced Pluripotent Stem Cells. PLoS Comput. Biol. 2011;7:e1002300. [PMC free article] [PubMed]

16. Flöttmann M, Scharp T, Klipp E, Floettmann M. A Stochastic Model of Epigenetic Dynamics in Somatic Cell Reprogramming. Front. Physiol. 2012;3:216. [PMC free article] [PubMed]

17. Villarreal C, Padilla-Longoria P, Alvarez-Buylla E. General Theory of Genotype to Phenotype Mapping: Derivation of Epigenetic Landscapes from N-Node Complex Gene Regulatory Networks. Phys. Rev. Lett. 2012;109:1–5. [PubMed]

18. Henry A, Monéger F, Samal A, Martin OC. Network function shapes network structure: the case of the Arabidopsis flower organ specification genetic network. Mol. Biosyst. 2013;9:1726–1735. [PubMed]

19. Zhou J-XX, Aliyu MDS, Aurell E, Huang S. Quasi-potential landscape in complex multi-stable systems. J. R. Soc. Interface. 2012;9:3539–3553. [PMC free article] [PubMed]

20. Shmulevich I, Gluhovsky I, Hashimoto RF, Dougherty ER, Zhang W. Steady-State Analysis of Genetic Regulatory Networks Modelled by Probabilistic Boolean Networks. Comp. Funct. Genomics. 2003;4:601–608. [PMC free article] [PubMed]

21. Latouche G, Ramaswamy V. Introduction to Matrix Analytic Methods in Stochastic Model. ASA/SIAM Series on Statistics and Applied Probability. 1999

22. Kemeny JG, Snell JL. Finite Markov chains. 1960

23. Lee TI, et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science. 2002;298:799–804. [PubMed]

24. Herrgård MJ, Covert MW, Palsson BØ. Reconstruction of microbial transcriptional regulatory networks. Curr. Opin. Biotechnol. 2004;15:70–77. [PubMed]

25. Blais A, Dynlacht BD. Constructing transcriptional regulatory networks. Genes Dev. 2005;19:1499–1511. [PubMed]

26. Hu Z, Killion PJ, Iyer VR. Genetic reconstruction of a functional transcriptional regulatory network. Nat. Genet. 2007;39:683–687. [PubMed]

27. Hecker M, Lambeck S, Toepfer S, Van Someren E, Guthke R. Gene regulatory network inference: data integration in dynamic models—a review. Biosystems. 2009;96(1):86–103. [PubMed]

28. Kauffman S, Peterson C, Samuelsson B, Troein C. Random Boolean network models and the yeast transcriptional network. Proc. Natl. Acad. Sci. U. S. A. 2003;100:14796–14799. [PubMed]

29. Nikolajewa S, Friedel M, Wilhelm T. Boolean networks with biologically relevant rules show ordered behavior. Biosystems. 2007;90(1):40–47. [PubMed]

30. Jarrah AS, Raposa B, Laubenbacher R. Nested Canalyzing, Unate Cascade, and Polynomial Functions. Physica D. 2007;233:167–174. [PMC free article] [PubMed]

31. Ebadi H, Klemm K. Boolean networks with veto functions. Physical Review E. 2014;90(2):022815. [PubMed]

32. Zhou J-XX, Brusch L, Huang S, Zhou J-XX. Predicting pancreas cell fate decisions and reprogramming with a hierarchical multi-attractor model. PLoS One. 2011;6:e14752. [PMC free article] [PubMed]

33. Chiang M-K, Melton DA. Single-Cell Transcript Analysis of Pancreas Development. Dev. Cell. 2003;4:383–393. [PubMed]

34. Gittes GK. Developmental biology of the pancreas: a comprehensive review. Dev. Biol. 2009;326:4–35. [PubMed]

35. Murtaugh LC. Pancreas and beta-cell development: from the actual to the possible. Development. 2007;134:427–438. [PubMed]

36. Jensen J. Gene regulatory factors in pancreatic development. Dev. Dyn. 2004;229:176–200. [PubMed]

37. Habener JF, Kemp DM, Thomas MK. Minireview: transcriptional regulation in pancreatic development. Endocrinology. 2005;146:1025–1034. [PubMed]

38. Collombat P, et al. The ectopic expression of Pax4 in the mouse pancreas converts progenitor cells into alpha and subsequently beta cells. Cell. 2009;138:449–462. [PMC free article] [PubMed]

39. Waddington CH. The Strategy of the Genes. A Discussion of Some Aspects of Theoretical Biology (Alen & Unwin) 1957

40. Goldberg AD, Allis CD, Bernstein E. Epigenetics: a landscape takes shape. Cell. 2007;128(4):635–638. [PubMed]

41. Malan KM, Engelbrecht AP. A survey of techniques for characterising fitness landscapes and some possible ways forward. Information Sciences. 2013;241:148–163.

42. Kitano H. Systems biology: a brief overview. Science. 2002;295:1662–1664. [PubMed]

43. De Jong H. Modeling and simulation of genetic regulatory systems: a literature review. Journal of computational biology. 2002;9(1):67–103. [PubMed]

44. Davidson E, Levin M. Gene regulatory networks. Proceedings of the national academy of sciences of the United States of America. 2005;102(14):4935–4935. [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |