
Curr Genomics. 2009 November; 10(7): 511–525.

PMCID: PMC2808677

Received 2008 December 4; Revised 2009 May 11; Accepted 2009 May 11.

Copyright ©2009 Bentham Science Publishers Ltd.

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.5/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Driven by the desire to understand genomic functions through the interactions among genes and gene products, research in gene regulatory networks has become an active area in genomic signal processing. Among the most studied mathematical models are Boolean networks and probabilistic Boolean networks, which are rule-based dynamic systems. This tutorial provides an introduction to the essential concepts of these two Boolean models, and presents up-to-date analysis and simulation methods developed for them. In the Analysis section, we will show that Boolean models are Markov chains, based on which we present a Markovian steady-state analysis of attractors, and also reveal the relationship between probabilistic Boolean networks and dynamic Bayesian networks (another popular genetic network model), again *via* Markov analysis; we dedicate the last subsection to structural analysis, which opens a door to other topics such as network control. The Simulation section will start from the basic tasks of creating state transition diagrams and finding attractors, proceed to the simulation of network dynamics and the computation of steady-state distributions, and finally come to an algorithm for generating artificial Boolean networks with prescribed attractors. The contents are arranged in a roughly logical order, such that the Markov chain analysis lays the basis for most of the Analysis section, and also prepares readers for the topics in the Simulation section.

In most living organisms, the genome carries the hereditary information that governs their life, death, and reproduction. Central to genomic functions are the coordinated interactions between genes (both protein-coding DNA sequences and regulatory non-coding DNA sequences), RNAs, and proteins, forming the so-called gene regulatory networks (or genetic regulatory networks).

The urgency of understanding gene regulation at the systems level has increased tremendously since the early stages of genomics research. A driving force is that, if we can build good gene regulatory network models and apply intervention techniques to control the genes, we may find better treatments for diseases resulting from aberrant gene regulation, such as cancer. In the past decade, the invention of high-throughput technologies has made it possible to harvest large quantities of data efficiently, which is turning the quantitative study of gene regulatory networks into a reality. Such study requires the application of signal processing techniques and fast computing algorithms to process the data and interpret the results. These needs in turn have fueled the development of genomic signal processing and the use of mathematical models to describe the complex interactions between genes.

The roles of mathematical models for gene regulatory networks include:

- Describing genetic regulations at a system level;
- Enabling artificial simulation of network behavior;
- Predicting new structures and relationships;
- Making it possible to analyze or intervene in the network through signal processing methods.

Among various mathematical endeavors are two Boolean models, Boolean networks (BNs) [1] and probabilistic Boolean networks (PBNs) [2], in which each node (gene) takes on two possible values, ON or OFF (or 1 and 0), and the way genes interact with each other is formulated by standard logic functions. They constitute an important class of models for gene regulatory networks, in that they capture some fundamental characteristics of gene regulations, are conceptually simple, and their rule-based structures bear physical and biological meanings. Moreover, Boolean models can be physically implemented by electronic circuits, and demonstrate rich dynamics that can be studied using mathematical and signal processing theory (for instance, Markov chains [2, 3]).

In practice, Boolean models have been successfully applied to describe real gene regulatory relations (for instance, the *drosophila* segment polarity network [4]), and the attractors of BNs and PBNs have been associated with cellular phenotypes in the living organisms [5]. The association of network attractors and actual phenotypes has inspired the development of control strategy [6] to increase the possibility of reaching desirable attractors (“good” phenotypes) and decrease the likelihood of undesirable attractors (“bad” phenotypes such as cancer). The effort of applying control theory to Boolean models is especially appealing in the medical community, as it holds potential to guide the effective intervention and treatment in cancer.

The author would like to bring the fundamentals of Boolean models to a wider audience in light of their theoretical value and pragmatic utility. This tutorial will introduce the basic concepts of Boolean networks and probabilistic Boolean networks, present the mathematical essentials, and discuss some analyses developed for the models and the common simulation issues. It is written for researchers in the genomic signal processing area, as well as researchers with general mathematics, statistics, engineering, or computer science backgrounds who are interested in this topic. It intends to provide a quick reference to the fundamentals of Boolean models, allowing the readers to apply those techniques to their own studies. Formal definitions and mathematical foundations will be laid out concisely, with some in-depth mathematical details left to the references.

In Boolean models, each variable (known as a *node*) can take two possible values, 1 (ON) and 0 (OFF). A node can represent a gene, RNA sequence, or protein, and its value (1 or 0) indicates its measured abundance (expressed or unexpressed; high or low). In this paper, we use “node” and “gene” interchangeably.

A *state* in Boolean models is a binary vector of all the gene values measured at the same time, and is also called the gene activity (or expression) profile (GAP). The *state space* of a Boolean model consists of all the possible states, and its size is $2^n$ for a model with *n* nodes.

**Definition 1** [2, 7] A Boolean network *β*(*V*, **f**) is defined on a set of *n* binary-valued nodes (genes) $V=\left\{{x}_{1},\cdots,{x}_{n}\right\},{x}_{i}\in \left\{0,1\right\}$, where each node *x _{i}* is assigned a Boolean function *f _{i}*, and all nodes update their values synchronously according to

$${x}_{i}\left(t+1\right)={f}_{i}\left({x}_{i1}\left(t\right),{x}_{i2}\left(t\right),\ldots,{x}_{i{k}_{i}}\left(t\right)\right),\quad\left\{i1,\ldots,i{k}_{i}\right\}\subseteq \left\{1,\ldots,n\right\}$$

(1)

*k _{i}* is called the input degree (or connectivity) of node *x _{i}*, and the collection **f** = (*f*_{1},...,*f _{n}*) constitutes the network function.
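As a concrete illustration of Eq. (1), a minimal Python sketch of one synchronous update step follows; the three regulatory functions are invented for illustration, not taken from the text.

```python
# Synchronous update of a hypothetical three-node Boolean network (Eq. 1).

def f1(x):          # x1(t+1) = x2(t) AND x3(t)
    return x[1] & x[2]

def f2(x):          # x2(t+1) = NOT x1(t)
    return 1 - x[0]

def f3(x):          # x3(t+1) = x1(t) OR x2(t)
    return x[0] | x[1]

def step(x):
    """One synchronous transition: every node reads x(t), writes x(t+1)."""
    return (f1(x), f2(x), f3(x))

# iterate a few steps of a trajectory
x = (1, 0, 1)
for t in range(4):
    x = step(x)
```

Because every node reads the same time-*t* state, the update order within `step` does not matter.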

In Boolean networks, genetic interactions and regulations are hard-wired under an assumption of biological determinism. However, no gene regulatory network is a closed system: it interacts with its environment and with other genetic networks, and genetic regulation is likely inherently stochastic; therefore, Boolean networks have limitations in their modeling power. Probabilistic Boolean networks were introduced to address this issue [2, 7]: they are composed of a family of Boolean networks, each of which is considered a context [8]. At any given time, gene regulation is governed by one component Boolean network, and network switches are possible, such that at a later time instant genes can interact under a different context. In this sense, probabilistic Boolean networks are more flexible in modeling and interpreting biological data.

**Definition 2** [2, 3, 7] A probabilistic Boolean network is defined on $V=\left\{{x}_{1},\cdots,{x}_{n}\right\},{x}_{i}\in \left\{0,1\right\}$, and consists of *r* Boolean networks *β*_{1}(*V*,**f**_{1}),...,*β _{r}*(*V*,**f**_{r}), each called a context. At each time step the network operates under one context; with probability *q* a switch occurs, and the new context *β _{j}* is selected with probability *c _{j}*. Under context *β _{j}*, with each gene subject to independent perturbation with probability *p*, the state evolves as

$$\text{x}\left(t+1\right)=\begin{cases}{\text{f}}_{j}\left(\text{x}\left(t\right)\right), & \text{with probability }{\left(1-p\right)}^{n},\\ \text{x}\left(t\right)\oplus \mathrm{\gamma}, & \text{with probability }1-{\left(1-p\right)}^{n},\end{cases}$$

(2)

where ⊕ is bit-wise modulo-2 addition, and *γ*=(*γ*_{1},...,*γ _{n}*) is a random perturbation vector whose components are independent with Pr{*γ _{i}* = 1} = *p*.

Alternatively, the PBN can be represented as *G*(*V*,Ψ,*α*,*p*), with Ψ = {Ψ_{1},...,Ψ_{n}} and *α*={*α*_{1},...,*α _{n}*}. In this representation, each node *x _{i}* has a set of *l*(*i*) candidate functions ${\mathrm{\Psi}}_{i}=\left\{{\mathrm{\psi}}_{1}^{\left(i\right)},...,{\mathrm{\psi}}_{l\left(i\right)}^{\left(i\right)}\right\}$ with selection probabilities ${\mathrm{\alpha}}_{i}=\left\{{\mathrm{\alpha}}_{1}^{\left(i\right)},...,{\mathrm{\alpha}}_{l\left(i\right)}^{\left(i\right)}\right\}$; a network function **f**_{j} is formed by choosing one candidate function for each node,

$${\text{f}}_{j}=\left({\mathrm{\psi}}_{{j}_{1}}^{\left(1\right)},...,{\mathrm{\psi}}_{{j}_{n}}^{\left(n\right)}\right),{j}_{i}\in \left\{1,...,l\left(i\right)\right\}.$$

(3)

Moreover, if it is an independent PBN, namely $\mathrm{Pr}\left\{{\mathrm{\psi}}_{{j}_{1}}^{\left(1\right)},...,{\mathrm{\psi}}_{{j}_{n}}^{\left(n\right)}\right\}={\prod}_{i=1}^{n}\mathrm{Pr}\left\{{\mathrm{\psi}}_{{j}_{i}}^{\left(i\right)}\right\}$
, **c** and *α* are related by

$${c}_{j}=\prod _{i=1}^{n}{\mathrm{\alpha}}_{{j}_{i}}^{\left(i\right)}.$$

(4)

**Remark 1** *q* does not appear in the PBN representation because, under the network switching scheme described above, it can be shown that the probability of operating in context *β _{j}* at any time is equal to *c _{j}*, independent of *q*.
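The switching/perturbation dynamics can be sketched in Python. The step function below is an illustrative scheme, assuming a context switch occurs with probability *q* and, per Eq. (2), the network function is applied only when no gene is perturbed; the function names and interface are hypothetical.

```python
import random

def pbn_step(x, nets, c, j, p, q):
    """One PBN time step under component network j (cf. Eq. 2).
    nets: list of network functions f_k, each mapping a state tuple to the
    next state; c: context selection probabilities; p: per-gene perturbation
    rate; q: probability of a context switch at this step."""
    if random.random() < q:                      # possible context switch
        j = random.choices(range(len(nets)), weights=c)[0]
    gamma = [1 if random.random() < p else 0 for _ in x]
    if any(gamma):                               # x(t+1) = x(t) XOR gamma
        x = tuple(xi ^ gi for xi, gi in zip(x, gamma))
    else:                                        # x(t+1) = f_j(x(t))
        x = nets[j](x)
    return x, j
```

With *p* = *q* = 0 the step reduces to a deterministic BN transition under the current context.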

A Boolean model with a finite number of nodes has a finite state space. From the definition of a Boolean network, it follows that its state transitions are deterministic, that is, given a state, its successor state is unique. Naturally, if we represent the whole state space and the transitions among the states of a BN graphically, we obtain a *state transition diagram*.

**Definition 3** The state transition diagram of an *n*-node Boolean network *β*(*V*, **f**) is a directed graph *D*(*S*,*E*). *S* is a set of $2^n$ vertices, each representing a possible state of the network; *E* is the set of directed edges, with an edge from state **s** to state **w** if and only if **w** = **f**(**s**).

Fig. (**1**) is an example of the state transition diagram of a three-node BN. Like BNs, a PBN also has a finite state space. Although state transitions in a PBN are not deterministic, they can be represented probabilistically. We will show how to construct the state transition diagram of a PBN in the Simulation section.

With the help of a state transition diagram, such as the one in Fig. (**1**), we can easily visualize that in a BN, any state trajectory in time **x**(0)→ **x**(1)→ **x**(2)→ ... must end up in a “trap”, and stay there forever unless a gene perturbation occurs. Similarly, if neither gene perturbation nor network switching occurs, a time trajectory in a PBN will also end up in a “trap” in one of the component BNs; however, either a gene perturbation or a network switch may cause it to escape. In spite of this, when gene perturbations and network switches are rare, a PBN is most likely to reach a “trap” before either occurs, and will spend a reasonably long time there.

**Definition 4 **Starting from any initial state in a finite Boolean network, when free of gene perturbation, state transitions will allow the network to reach a finite set of states {**a**_{1},...,**a**_{m}} and cycle among them in a fixed order forever. The set of states is called an attractor, denoted by *A*. If *A* contains merely one state, it is a singleton attractor; otherwise, it is an attractor cycle. The set of states from which the network will eventually reach an attractor *A* constitutes the basin of attraction of *A*. A BN may have more than one attractor.

The attractors of a PBN are defined as the union of the attractors of its component BNs. In particular, if a PBN is composed of *r* BNs, and the *k*-th BN has *m _{k}* attractors, then the PBN has at most ${\sum}_{k=1}^{r}{m}_{k}$ attractors.

In a BN, different basins of attraction are depicted in the state transition diagram as disjoint subgraphs. In Fig. (**1**), *D*(*S*, *E*) is composed of three disjoint subgraphs, *D*_{1}(*S*_{1},*E*_{1}), *D*_{2}(*S*_{2},*E*_{2}), and *D*_{3}(*S*_{3},*E*_{3}). 110 and 101 are singleton attractors, while 100 and 111 constitute a cycle. Their respective basins of attraction are *S*_{1} = {000,010,110}, *S*_{2} = {101} and *S*_{3} = {001,011,100,111}.

We are interested in the attractors of a Boolean model for at least two reasons: (1) Attractors represent the stable states of a dynamic system, thus they are tied to the long term behavior of Boolean models; (2) Earlier researchers demonstrated the association of cellular phenotype with attractors [5], thus giving a biological meaning to the attractors. Intuitively, when an attractor has a large basin of attraction, the corresponding phenotype is more likely than that of an attractor with much smaller basin of attraction. To develop intervention strategies that change the long term behavior of Boolean models, it is important to study the attractors.
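A basic way to locate attractors and basins is to walk forward from every state until the trajectory revisits itself; the states visited infinitely often form the attractor, and every state on the walk belongs to its basin. A Python sketch follows, using a successor map whose individual edges are invented but consistent with the attractors and basins of the Fig. (1) example.

```python
# Finding the attractors and basins of a BN from its successor map.
# Illustrative edges: 110 and 101 are singleton attractors, {100, 111}
# is an attractor cycle, matching the basins stated in the text.

succ = {
    '000': '010', '010': '110', '110': '110',
    '101': '101',
    '001': '011', '011': '100', '100': '111', '111': '100',
}

def find_attractors(succ):
    """Walk forward from every state; the cycle the walk falls into is the
    attractor, and all states on the walk belong to its basin."""
    attractors, basins = [], {}
    for s in succ:
        path, seen = [], {}
        while s not in seen:
            seen[s] = len(path)
            path.append(s)
            s = succ[s]
        cycle = frozenset(path[seen[s]:])      # the "trap" that was reached
        if cycle not in basins:
            attractors.append(cycle)
            basins[cycle] = set()
        basins[cycle].update(path)
    return attractors, basins

attractors, basins = find_attractors(succ)
```

This brute-force walk visits all $2^n$ states, so it is only feasible for small networks; the Simulation section discusses such computational issues.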

Although analysis and simulation are two parallel subjects for Boolean models, the former includes some essential results that lay a foundation for the latter. In this section, we visit Boolean model analysis first.

One of the central ideas with Boolean models is their connection with Markov chains (subsection 3.1). Because of this, Boolean models, under certain conditions, possess steady-state distributions. The steady-state probabilities of attractors, which indicate the long-run trend of network dynamics, can be found analytically *via *Markov chain analysis (subsection 3.2). Moreover, the relationship between PBNs and Bayesian networks (another class of gene regulatory network models) can be established in a similar manner (subsection 3.3). Lastly, a subsection will be dedicated to structural analysis, which opens a door to other topics beyond this tutorial (such as control of genetic networks).

As readers will find out soon, the transition probability matrix introduced below is not only a convenience in Markov chain analysis, but also finds itself useful in simulation, to be discussed in Section 4.

On a Boolean model of *n* nodes, a *transition probability matrix* $T={\left[{t}_{ij}\right]}_{{2}^{n}\times {2}^{n}}$ can be defined, where *t _{ij}* is the probability of a transition from the state with decimal value *i*-1 to the state with decimal value *j*-1.

In a Boolean network *β*(*V*,**f**), *t _{ij}* can be computed by

$${t}_{ij}=\begin{cases}1, & \exists\, s\in {\left\{0,1\right\}}^{n}\text{ such that }\mathrm{dec}\left(s\right)=i-1,\ \mathrm{dec}\left(\text{f}\left(s\right)\right)=j-1,\\ 0, & \text{otherwise},\end{cases}$$

(5)

where dec(·) converts a binary vector to an integer; for instance, dec(00101) = 5. Since a BN is deterministic, each row of *T* contains exactly one 1, and all other elements are 0.
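Eq. (5) can be sketched directly in Python; the helper names below are illustrative, with 0-based row indices (index *i* corresponds to the state with dec(s) = *i*).

```python
# Building the 2^n x 2^n transition matrix T of a BN (cf. Eq. 5).

def dec(s):
    """Binary tuple -> integer, e.g. dec((0,0,1,0,1)) == 5."""
    return int(''.join(map(str, s)), 2)

def bn_transition_matrix(f, n):
    """f maps a state tuple to its successor; the result has one 1 per row."""
    N = 2 ** n
    T = [[0] * N for _ in range(N)]
    for i in range(N):
        # recover the state tuple whose decimal value is i
        s = tuple((i >> (n - 1 - b)) & 1 for b in range(n))
        T[i][dec(f(s))] = 1
    return T
```

For example, the one-gene negation network x(t+1) = NOT x(t) yields the permutation matrix [[0,1],[1,0]].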

In a PBN consisting of *r* BNs *β*_{1}(*V*,**f**_{1}),...,*β _{r}*(*V*,**f**_{r}) with selection probabilities *c*_{1},...,*c _{r}* and gene perturbation rate *p*, *t _{ij}* can be computed by conditioning on which component network is selected.

Writing **s** and **w** for the states with dec(**s**) = *i*-1 and dec(**w**) = *j*-1,

$$\begin{array}{rl}{t}_{ij}&=\displaystyle\sum _{k=1}^{r}\mathrm{Pr}\left\{{\mathrm{\beta}}_{k}\text{ is selected}\right\}\cdot \mathrm{Pr}\left\{\text{s}\to \text{w}\left|{\mathrm{\beta}}_{k}\text{ is selected}\right.\right\}\\ &=\displaystyle\sum _{k=1}^{r}{c}_{k}\left[\mathrm{Pr}\left\{\text{s}\to \text{w}\text{ by state transition}\left|{\text{f}}_{k}\right.\right\}+\mathrm{Pr}\left\{\text{s}\to \text{w}\text{ by random gene perturbation}\left|{\text{f}}_{k}\right.\right\}\right]\\ &=\displaystyle\sum _{k=1}^{r}{c}_{k}\left[{\left(1-p\right)}^{n}{\mathbf{1}}_{\left\{\mathrm{dec}\left({\text{f}}_{k}\left(\text{s}\right)\right)=j-1\right\}}+{p}_{\mathrm{\gamma}}{\mathbf{1}}_{\left[i\ne j\right]}\right],\end{array}$$

(6)

where the **1**'s are indicator functions, ${p}_{\mathrm{\gamma}}={p}^{l}{\left(1-p\right)}^{n-l}$ is the probability of the particular perturbation vector *γ* = **s** ⊕ **w**, and *l* is the number of 1's in *γ*=(*γ*_{1},...,*γ _{n}*). (No combinatorial factor appears because, given the pair of states, *γ* is uniquely determined; with this form each row of *T* sums to 1.)

Taking a closer look at Eq. (6), we find that *T* is the sum of a fixed transition matrix $\overline{T}$ and a perturbation matrix $\tilde{T}$,

$$\overline{T}={\left(1-p\right)}^{n}\sum _{j=1}^{r}{c}_{j}{T}_{j},$$

(7)

$$\tilde{T}=\left[{\tilde{t}}_{ij}\right],\quad{\tilde{t}}_{ij}={p}^{{\mathrm{\eta}}_{ij}}{\left(1-p\right)}^{n-{\mathrm{\eta}}_{ij}}{\mathbf{1}}_{\left[i\ne j\right]},$$

(8)

where *T _{j}* is the transition probability matrix of the component network *β _{j}*, and *η _{ij}* is the Hamming distance between the binary state vectors with decimal values *i*-1 and *j*-1.

*T _{j}* is sparse, with only $2^n$ nonzero entries (a single 1 in each row), whereas $\tilde{T}$ has zeros only on its diagonal.
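Assembling *T* per Eqs. (7) and (8) can be sketched as follows; the perturbation entry uses $p^{\eta}(1-p)^{n-\eta}$ for the Hamming distance *η* between the two states, under which each row of *T* sums to 1. The function interface is illustrative.

```python
# PBN transition matrix T as fixed part (Eq. 7) plus perturbation part
# (Eq. 8), assuming the component matrices T_j were built as for a BN.

def pbn_transition_matrix(Ts, c, p, n):
    """Ts: list of 2^n x 2^n component BN matrices; c: selection probs."""
    N = 2 ** n
    no_pert = (1 - p) ** n                     # probability of no flip
    T = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            fixed = no_pert * sum(ck * Tk[i][j] for ck, Tk in zip(c, Ts))
            eta = bin(i ^ j).count('1')        # Hamming distance of states
            pert = (p ** eta) * ((1 - p) ** (n - eta)) if i != j else 0.0
            T[i][j] = fixed + pert
    return T
```

A quick sanity check is that every row sums to 1, since the fixed part carries mass $(1-p)^n$ and the perturbation part carries $1-(1-p)^n$.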

Given the definition of the *T* matrix in Section 3.1.1, we can see that a Boolean model with *n* genes is a homogeneous Markov chain of *N* = $2^n$ states, with transition probability matrix *T*.

What is the use of matrix *T*? Letting *W* = *T ^{k}*, one can show that the (*i*,*j*)-th element of *W* equals the probability of going from state *i* to state *j* in exactly *k* steps.

The proof is left as an exercise to the reader.

An *N* -state Markov chain possesses a *stationary distribution* (or invariant distribution) if there exists a probability distribution *π* = (*π*_{1},...,*π _{N}*) such that

*π* = *πT*.

*π* = *πT* implies $\mathrm{\pi}={\mathrm{\pi}T}^{k},\forall k$. Thus in a Markov chain with stationary distribution *π*, if we start from the *i*-th state with probability *π _{i}*, the probability of being in any state *j* at any later time remains *π _{j}*.

An *N*-state Markov chain possesses a *steady-state distribution* ${\mathrm{\pi}}^{\ast}=\left({\mathrm{\pi}}_{1}^{\ast},...,{\mathrm{\pi}}_{N}^{\ast}\right)$ if, starting from any initial distribution *π*,

${\mathrm{\pi}}^{\ast}=\underset{k\to \mathrm{\infty}}{\text{lim}}\mathrm{\pi}{T}^{k},$

it means that regardless of the initial state, the probability of a Markov chain being in the *i* -th state in the long run is ${\mathrm{\pi}}_{i}^{\ast}$
. A Markov chain possessing a stationary distribution does not necessarily possess a steady-state distribution.

Why should it be of our concern if the Markov chain has a steady-state distribution or not? This is because we are interested in the Boolean model associated with the Markov chain, and would like to know how it behaves in the long-run. As a reminder, the attractors of a Boolean model are often associated with cellular phenotype, and by finding out the steady-state probabilities of a given attractor, we can have a general picture of the likelihood of a certain phenotype. When a Boolean model possesses (namely, its Markov chain possesses) a steady-state distribution, we can find those probabilities by simulating the model for a long time, starting from an arbitrary initial state **x**(0). In fact, this implies the equivalence of “space average” and “time average”, as is a common concept in stochastic processes.

When will a Markov chain possess a steady-state distribution? It turns out that an *ergodic* Markov chain will do. A Markov chain is said to be ergodic if it is irreducible and aperiodic [10].

**Definition 5 **A Markov chain is irreducible if it is possible to go from every state to every state (not necessarily in one move).

**Definition 6** In a Markov chain, a state has period *d* if any return to it can occur only in a number of steps that is a multiple of *d*. A state is periodic if it has period > 1. A Markov chain is aperiodic if none of its states is periodic.

A Boolean network possesses a stationary distribution, but not a steady-state distribution unless it has a single singleton attractor and no other attractors. Here we show how to find a stationary distribution. Assume a BN has *m* singleton attractors **a**_{1},...,**a*** _{m}*; then any distribution that places all its mass on these attractor states, e.g. *π*(**a*** _{i}*) = 1/*m*, satisfies *π* = *πT* and is therefore stationary. For an attractor cycle {**a**_{1},...,**a*** _{m}*}, the uniform distribution over the cycle states is stationary, since one transition merely permutes them.

When *p*,*q* > 0, a PBN possesses a steady-state distribution, because the Markov chain corresponding to the PBN is ergodic; interested readers can find the proof in [3]. Now that a PBN has a steady-state distribution, we can obtain it in two ways: (1) solving the linear equations $\mathrm{\pi}\left(T-I\right)=0,{\sum}_{i=1}^{N}{\mathrm{\pi}}_{i}=1$ (*I* is the identity matrix), for which interested readers can consult books on linear algebra; (2) using the empirical methods in Section 4.3. If we are interested in the steady-state probabilities of the attractors only, an analytic method exists, to be discussed next.
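For an ergodic chain, route (1) can also be carried out by power iteration, repeatedly applying *π* ← *πT* until convergence, which yields the same distribution as solving the linear system; a pure-Python sketch under that assumption:

```python
# Steady-state distribution of an ergodic Markov chain by power iteration.

def steady_state(T, tol=1e-12, max_iter=100000):
    """T: row-stochastic transition matrix as a list of lists."""
    N = len(T)
    pi = [1.0 / N] * N                         # arbitrary initial distribution
    for _ in range(max_iter):
        # one step of pi <- pi * T
        new = [sum(pi[i] * T[i][j] for i in range(N)) for j in range(N)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi
```

For the two-state chain with T = [[0.9, 0.1], [0.2, 0.8]], detailed balance gives the steady state (2/3, 1/3), which the iteration reproduces.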

Recall from Section 2 that attractors are important to the long-term behavior of Boolean models because they are associated with cellular phenotypes; now we also know that PBNs possess steady-state distributions, which means that a PBN has a unique long-term trend independent of initial state. Therefore, we would naturally ask the question, how can we find the long-term probabilities of these attractors which are so important to us?

In the following, we will present a Markov chain based analytic method that answers this question, and more details, including proofs, can be found in [11].

First consider a special case of PBN, Boolean network with perturbations (BNp), in which any gene has a probability *p* of flipping its value. A BNp inherits all the attractors and corresponding basins of attraction from the original BN. Because of the random gene perturbations, BNps possess steady-state distributions (the proof is similar to that of PBN, and it is left as an exercise to the reader).

A BNp defined on *V* = {*x*_{1},...,*x _{n}*} with gene perturbation rate *p* > 0 induces a Markov chain {**X*** _{t}*} on the state space ${\left\{0,1\right\}}^{n}$.

For **X*** _{t}*, there exists a unique steady-state distribution *π*.

Assume the BNp has attractors *A*_{1},...,*A _{m}*, with corresponding basins of attraction (or simply basins) *B*_{1},...,*B _{m}*. Since *A _{k}* ⊆ *B _{k}*, the steady-state probability of an attractor factors as

$$\mathrm{\pi}\left({A}_{k}\right)=\mathrm{\pi}\left({A}_{k}\left|{B}_{k}\right.\right)\mathrm{\pi}\left({B}_{k}\right).$$

(9)

Therefore, we can compute the steady-state probability of any attractor *A _{k}* in two steps: (1) find the steady-state probability of the basin, *π*(*B _{k}*); (2) find the conditional probability *π*(*A _{k}*|*B _{k}*).

Define a random variable *τ*(*t*) which measures the time elapsed between the last perturbation and the current time *t*. *τ*(*t*) = 0 means a perturbation occurs at *t*. For any starting state **h**, let

$${P}_{{B}_{i}}^{\ast}\left({B}_{k}\right)=\underset{t\to \mathrm{\infty}}{\mathrm{lim}}\mathrm{Pr}\left\{{\text{X}}_{t}\in {B}_{k}\left|{\text{X}}_{t-1}\in {B}_{i},{\text{X}}_{0}=\text{h},\mathrm{\tau}\left(t\right)=0\right.\right\},$$

(10)

and define the conditional probability of being in state $\text{x}\in B$
given that the system is inside a set *B*, prior to a perturbation,

$${\mathrm{\pi}}^{\ast}\left(\text{x}\left|B\right.\right):=\underset{t\to \mathrm{\infty}}{\mathrm{lim}}\mathrm{Pr}\left\{{\text{X}}_{t-1}=\text{x}\left|{\text{X}}_{t-1}\in B,{\text{X}}_{0}=\text{h},\mathrm{\tau}\left(t\right)=0\right.\right\}.$$

(11)

The following theorem represents the steady-state distribution of the basins as the solution of a group of linear equations, where the coefficients are ${P}_{{B}_{i}}^{\ast}\left({B}_{k}\right)$ 's. The lemma that follows gives the formula for the coefficients.

**Theorem 1**

$$\mathrm{\pi}\left({B}_{k}\right)=\sum _{i=1}^{m}{P}_{{B}_{i}}^{\ast}\left({B}_{k}\right)\mathrm{\pi}\left({B}_{i}\right).$$

(12)

**Lemma 1**

$${P}_{{B}_{i}}^{\ast}\left({B}_{k}\right)=\sum _{\text{x}\in {B}_{k}}\sum _{\text{y}\in {B}_{i}}{P}_{\text{y}}^{\ast}\left(\text{x}\right){\mathrm{\pi}}^{\ast}\left(\text{y}\left|{B}_{i}\right.\right),$$

(13)

where ${P}_{\text{y}}^{\ast}\left(\text{x}\right)$
is the probability that state transition goes from **y** to **x** in one step by gene perturbation.

Now the only unknown is *π*^{*}(**y**|*B _{i}*). When *p* is small, the chain spends almost all of its time inside the attractors before a perturbation occurs, so

$${\mathrm{\pi}}^{\ast}\left(\text{y}\left|{B}_{i}\right.\right)\approx \frac{1}{\left|{A}_{i}\right|}{1}_{\left[\text{y}\in {A}_{i}\right]},$$

(14)

where |*A _{i}*| is the cardinality of attractor *A _{i}*. Substituting Eq. (14) into Eq. (13) gives

$${P}_{{B}_{i}}^{\ast}\left({B}_{k}\right)\approx \frac{1}{\left|{A}_{i}\right|}\sum _{\text{x}\in {B}_{k}}\sum _{\text{y}\in {A}_{i}}{P}_{\text{y}}^{\ast}\left(\text{x}\right).$$

(15)

**Lemma 2** For basin *B _{k}*, any initial state **h**, and *j* ≥ 0,

$$\underset{t\to \mathrm{\infty}}{\mathrm{lim}}\mathrm{Pr}\left\{{\text{X}}_{t-j}=\text{x}\left|{\text{X}}_{t-j}\in {B}_{k},{\text{X}}_{0}=\text{h},\mathrm{\tau}\left(t\right)=j\right.\right\}=\frac{1}{\mathrm{\pi}\left({B}_{k}\right)}\sum _{i=1}^{m}\sum _{\text{y}\in {B}_{i}}{P}_{\text{y}}^{\ast}\left(\text{x}\right){\mathrm{\pi}}^{\ast}\left(\text{y}\left|{B}_{i}\right.\right)\mathrm{\pi}\left({B}_{i}\right).$$

(16)

**Lemma 3** If *δ*(**x**,*A _{k}*) is the number of iterations of the network function needed to reach attractor *A _{k}* from state **x**, then for any *b* ∈ (0,1),

$$\sum _{j=\mathrm{\delta}\left(\text{x},{A}_{k}\right)}^{\mathrm{\infty}}\left(1-b\right){b}^{j}={b}^{\mathrm{\delta}\left(\text{x},{A}_{k}\right)}.$$

(17)

Applying the two lemmas and letting *b* = (1-*p*)* ^{n}*, we obtain the steady-state probability of attractor *A _{k}*:

**Theorem 2**

$$\mathrm{\pi}\left({A}_{k}\right)=\sum _{i=1}^{m}\left[\sum _{\text{x}\in {B}_{k}}\sum _{\text{y}\in {B}_{i}}{P}_{\text{y}}^{\ast}\left(\text{x}\right){\mathrm{\pi}}^{\ast}\left(\text{y}\left|{B}_{i}\right.\right){\left(1-p\right)}^{n\mathrm{\delta}\left(\text{x},{A}_{k}\right)}\right]\mathrm{\pi}\left({B}_{i}\right).$$

(18)

When *p* is small, using the approximation in Eq. (14), we have

$$\mathrm{\pi}\left({A}_{k}\right)\approx \sum _{i=1}^{m}\frac{1}{\left|{A}_{i}\right|}\left[\sum _{x\in {B}_{k}}\sum _{y\in {B}_{i}}{P}_{y}^{\ast}\left(\text{x}\right){\left(1-p\right)}^{n\mathrm{\delta}\left(x,{A}_{k}\right)}\right]\mathrm{\pi}\left({B}_{i}\right).$$

(19)
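The whole procedure (Theorem 1 with the approximate coefficients of Eq. (15), followed by Eq. (19)) can be sketched in Python. One normalization assumption is made: ${P}_{\text{y}}^{\ast}\left(\text{x}\right)$ is divided by $1-(1-p)^n$, i.e. conditioned on at least one gene flipping, so that the coefficient matrix of Eq. (12) is stochastic. The function name and integer state encoding are illustrative.

```python
# Approximate steady-state probabilities of BNp attractors (Eqs. 12, 15, 19).
# States are integers 0 .. 2^n - 1; succ is the deterministic successor map.

def attractor_probs(succ, attractors, n, p):
    m = len(attractors)
    # basins and delta(x, A_k): forward-walk every state to its attractor
    basin_of, delta = {}, {}
    for x in succ:
        s, d = x, 0
        while all(s not in A for A in attractors):
            s, d = succ[s], d + 1
        basin_of[x] = next(k for k, A in enumerate(attractors) if s in A)
        delta[x] = d
    basins = [[x for x in succ if basin_of[x] == k] for k in range(m)]

    norm = 1 - (1 - p) ** n                    # Pr{at least one flip}
    def P_star(y, x):                          # perturbation y -> x, x != y
        eta = bin(y ^ x).count('1')
        return p ** eta * (1 - p) ** (n - eta) / norm

    # Eq. (15) coefficients; Theorem 1 solved by power iteration
    M = [[sum(P_star(y, x) for x in basins[k] for y in attractors[i] if x != y)
          / len(attractors[i]) for k in range(m)] for i in range(m)]
    pib = [1.0 / m] * m
    for _ in range(10000):
        new = [sum(pib[i] * M[i][k] for i in range(m)) for k in range(m)]
        z = sum(new)
        new = [v / z for v in new]
        if max(abs(a - b) for a, b in zip(new, pib)) < 1e-12:
            break
        pib = new

    # Eq. (19): discount each landing state x by (1-p)^(n * delta(x, A_k))
    return [sum((1 / len(attractors[i]))
                * sum(P_star(y, x) * (1 - p) ** (n * delta[x])
                      for x in basins[k] for y in attractors[i] if x != y)
                * pib[i]
                for i in range(m))
            for k in range(m)]

# three-node example: 110 and 101 are fixed points, {100, 111} is a cycle
succ = {0b000: 0b010, 0b010: 0b110, 0b110: 0b110, 0b101: 0b101,
        0b001: 0b011, 0b011: 0b100, 0b100: 0b111, 0b111: 0b100}
probs = attractor_probs(succ, [{0b110}, {0b101}, {0b100, 0b111}], 3, 0.01)
```

For small *p* the resulting probabilities are slightly below the basin probabilities, reflecting the time spent traveling back to the attractor after a perturbation.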

In a PBN, we represent the pair (**x**,**f**) as the state of a homogeneous Markov chain (**X*** _{t}*, **F*** _{t}*), with transition probabilities

$${P}_{y,g}\left(\text{x},\text{f}\right)=\mathrm{Pr}\left\{{\text{X}}_{t+1}=\text{x},{\text{F}}_{t+1}=\text{f}\left|{\text{X}}_{t}=\text{y},{\text{F}}_{t}=\text{g}\right.\right\}$$

(20)

Assume the PBN is composed of *r* BNs *β*_{1}(*V*,**f**_{1}),...,*β _{r}*(*V*,**f**_{r}), where the *k*-th BN has *m _{k}* attractors ${A}_{k1},...,{A}_{k{m}_{k}}$ with corresponding basins ${B}_{k1},...,{B}_{k{m}_{k}}$.

The computations in steps (1) and (2) are similar to those for the BNp, with the pair (*B _{ki}*, **f**_{k}) playing the role of the basin *B _{i}*.

From steps (1) and (2), we can obtain *π*(*A _{ki}*, **f**_{l}) for every context **f**_{l}; the steady-state probability of attractor *A _{ki}* is then

$$\mathrm{\pi}\left({A}_{\mathit{ki}}\right)=\sum _{l=1}^{r}\mathrm{\pi}\left({A}_{\mathit{ki}},{\text{f}}_{l}\right).$$

(21)

Since *π*(*A _{ki}*, **f**_{l} | *A _{lj}*, **f**_{l}) is difficult to compute exactly, it is approximated by the fractional overlap of the two attractors:

$$\mathrm{\pi}\left({A}_{ki},{\text{f}}_{l}\left|{A}_{lj},{\text{f}}_{l}\right.\right)\approx \frac{\left|{A}_{ki}\cap {A}_{lj}\right|}{\left|{A}_{lj}\right|}.$$

(22)

Thus,

$$\mathrm{\pi}\left({A}_{\mathit{ki}},{\text{f}}_{l}\right)\approx \sum _{j=1}^{{m}_{l}}\mathrm{\pi}\left({A}_{\mathit{ki}},{\text{f}}_{l}\left|{A}_{\mathit{lj}},{\text{f}}_{l}\right.\right)\mathrm{\pi}\left({A}_{\mathit{lj}},{\text{f}}_{l}\right)\approx \sum _{j=1}^{{m}_{l}}\frac{\left|{A}_{\mathit{ki}}\cap {A}_{\mathit{lj}}\right|}{\left|{A}_{\mathit{lj}}\right|}\mathrm{\pi}\left({A}_{\mathit{lj}},{\text{f}}_{l}\right),$$

(23)

and

$$\mathrm{\pi}\left({A}_{\mathit{ki}}\right)\approx \sum _{l=1}^{r}\sum _{j=1}^{{m}_{l}}\frac{\left|{A}_{\mathit{ki}}\cap {A}_{\mathit{lj}}\right|}{\left|{A}_{\mathit{lj}}\right|}\mathrm{\pi}\left({A}_{\mathit{lj}},{\text{f}}_{l}\right).$$

(24)

Bayesian networks (BaNs) are graphical models that describe the conditional probabilistic dependencies between variables, and have been used to model genetic regulatory networks [12]. An advantage of BaNs is that they involve model selection to optimally explain the observed data [2]; BaNs can use either continuous or discrete variables, which makes them more flexible for modeling. In comparison, Boolean models have explicit regulatory rules that carry biological information, which can be more appealing to biologists than the statistical representation of BaNs. Although Boolean models use binary-quantized variables, which limits the data they can exploit, they are computationally less complex than BaNs when learning the network structure from data (see Section 3.3 of [2] for a more detailed discussion and references). Since network structure learning is beyond the scope of this article, interested readers can refer to [12] for Bayesian learning, [13] for Boolean network learning, and [8, 14] for PBN learning.

While BNs are deterministic, PBNs and BaNs are related by their probabilistic nature; like PBNs, dynamic BaNs can also be considered Markov chains. In the following analysis, we will show that an equivalence between PBNs and BaNs can be established under certain conditions [15]. In this analysis, the random gene perturbation rate *p* of the PBN is assumed to be 0.

A BaN with *n* random variables *X*_{1},...,*X _{n}* (not necessarily binary) is represented by a directed acyclic graph together with local conditional distributions: each variable *X _{i}* has a set of parent variables Pa(*X _{i}*), and the joint distribution factorizes as

$\mathrm{Pr}\left\{{x}_{1},...,{x}_{n}\right\}=\prod _{i=\mathit{1}}^{n}\mathrm{Pr}\left\{{x}_{i}\left|\mathrm{Pa}\right.\left({X}_{i}\right)\right\}.$

A dynamic Bayesian network (DBN) is a temporal extension of a BaN, consisting of two parts: (1) an initial BaN *Ba*_{0} = (*H*_{0},Θ_{0}) that defines the joint distribution of the variables *x*_{1}(0),...,*x _{n}*(0); and (2) a transition BaN *Ba*_{1} = (*H*_{1},Θ_{1}) that defines the conditional distribution Pr{**x**(*t*) | **x**(*t*-1)}. The joint distribution over the time span [0,*T*] is then

$$\mathrm{Pr}\left\{\text{x}\left(0\right),...,\text{x}\left(T\right)\right\}=\mathrm{Pr}\left\{\text{x}\left(0\right)\right\}\prod _{t=1}^{T}\mathrm{Pr}\left\{\text{x}\left(t\right)\left|\text{x}\left(t-1\right)\right.\right\}=\prod _{i=1}^{n}\mathrm{Pr}\left\{{x}_{i}\left(0\right)\left|\mathrm{Pa}\left({X}_{i}\left(0\right)\right)\right.\right\}\cdot \prod _{t=1}^{T}\prod _{j=1}^{n}\mathrm{Pr}\left\{{x}_{j}\left(t\right)\left|\mathrm{Pa}\left({X}_{j}\left(t\right)\right)\right.\right\}.$$

(25)

In a PBN *G*(*V*,**F**,**c**), where $V=\left\{{x}_{1},...,{x}_{n}\right\},{x}_{i}\in \left\{0,1\right\}$
and **F** = {**f**_{1},...,**f**_{r}}, the joint probability distribution of states over the time period [0,*T*] can be expressed as

$\mathrm{Pr}\left\{\text{x}\left(0\right),...,\text{x}\left(T\right)\right\}=\mathrm{Pr}\left\{\text{x}\left(0\right)\right\}\prod _{t=1}^{T}\mathrm{Pr}\left\{\text{x}\left(t-1\right)\to \text{x}\left(t\right)\right\}.$

For an independent PBN,

$$\mathrm{Pr}\left\{\text{x}\left(0\right),...,\text{x}\left(T\right)\right\}=\mathrm{Pr}\left\{\text{x}\left(0\right)\right\}\prod _{t=1}^{T}\prod _{i=1}^{n}\mathrm{Pr}\left\{\text{x}\left(t-1\right)\to {x}_{i}\left(t\right)\right\}.$$

(26)

Let the independent PBN be *G*(*V*,Ψ,*α*) (the alternative representation; see what follows Definition 2). First, since a BaN can represent an arbitrary joint distribution, the distribution of the initial state of the PBN, Pr{**x**(0)}, can be represented by some *Ba*_{0}. Second, to construct *Ba*_{1}(*H*_{1},Θ_{1}) from the PBN, we let the set ${X}_{j}^{\left(i\right)}\subseteq V$ denote the regulators of gene *x _{i}* in function ${\mathrm{\psi}}_{j}^{\left(i\right)}$, and define

$$\mathrm{Pa}\left({x}_{i}\right)={\cup}_{j=1}^{l\left(i\right)}{\text{X}}_{j}^{\left(i\right)}.$$

(27)

We construct graph *H*_{1} with two layers of nodes: the first layer contains *X*_{1}(*t*-1),...,*X _{n}*(*t*-1), the second layer contains *X*_{1}(*t*),...,*X _{n}*(*t*), and directed edges run from the first-layer nodes in Pa(*X _{i}*) to the second-layer node *X _{i}*(*t*).

Let *D _{i}* be the joint distribution of the variables in Pa(*X _{i}*). Conditioning on which candidate function is used,

$$\mathrm{Pr}\left\{{X}_{i}=1\right\}=\sum _{j=1}^{l\left(i\right)}\mathrm{Pr}\left\{{X}_{i}=1\left|{\mathrm{\psi}}_{j}^{\left(i\right)}\right.\text{is used}\right\}.{\mathit{\alpha}}_{j}^{\left(i\right)}$$

(28)

$$=\sum _{j=1}^{l\left(i\right)}\left[\sum _{{\text{x}\in \left\{0,1\right\}}^{\left|\mathrm{Pa}\left({X}_{i}\right)\right|}}{D}_{i}\left(\text{x}\right){\mathrm{\psi}}_{j}^{\left(i\right)}\left(\text{x}\right)\right].{\mathrm{\alpha}}_{j}^{\left(i\right)}$$

(29)

$$=\sum _{{\text{x}\in \left\{0,1\right\}}^{\left|\mathrm{Pa}\left({X}_{i}\right)\right|}}{D}_{i}\left(\text{x}\right)\left[\sum _{j=1}^{l\left(i\right)}{\mathrm{\psi}}_{j}^{\left(i\right)}\left(\text{x}\right){\mathrm{\alpha}}_{j}^{\left(i\right)}\right],$$

(30)

and we have

$$\mathrm{Pr}\left\{{X}_{i}\left(t\right)=1\left|\mathrm{Pa}\left({X}_{i}\left(t\right)\right)=\text{z}\right.\right\}=\sum _{j=1}^{l\left(i\right)}{\mathrm{\psi}}_{j}^{\left(i\right)}\left(\text{z}\right){\mathit{\alpha}}_{j}^{\left(i\right)}.$$

(31)

Eq. (31) defines Θ_{1} (induced by *H*_{1}) for each node; thus any independent PBN *G*(*V*,Ψ,*α*) can be expressed as a binary DBN (*Ba*_{0},*Ba*_{1}).

**Remark 2** Strictly speaking, the input variables for ${\mathrm{\psi}}_{j}^{\left(i\right)}$ are a subset of Pa(*x*_{i}), so the notations in Eqs. (29-31) are not accurate when we feed the same vector **x** (or **z**) to every ${\mathrm{\psi}}_{j}^{\left(i\right)}$; each function should be understood to act only on its own input variables ${\text{X}}_{j}^{\left(i\right)}$.
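As a small numerical illustration of Eq. (31), the sketch below (Python; the two predictors, their parents, and the selection probabilities are invented for the example) computes Pr{*X*_{i}(*t*)=1 | Pa(*X*_{i}(*t*)) = **z**} as the *α*-weighted sum of the predictor outputs:

```python
def conditional_prob_one(predictors, alphas, z):
    """Eq. (31): Pr{X_i(t)=1 | Pa(X_i(t)) = z} = sum_j psi_j(z) * alpha_j."""
    return sum(psi(z) * a for psi, a in zip(predictors, alphas))

# Hypothetical gene with two parents and two predictors, AND and OR,
# selected with (invented) probabilities 0.3 and 0.7.
predictors = [lambda z: z[0] & z[1], lambda z: z[0] | z[1]]
alphas = [0.3, 0.7]

cpt = {z: conditional_prob_one(predictors, alphas, z)
       for z in [(0, 0), (0, 1), (1, 0), (1, 1)]}
# cpt[(0, 0)] = 0.0; cpt[(1, 0)] = 0.7; cpt[(1, 1)] = 1.0
```

The dictionary `cpt` is exactly the conditional probability table Θ_{1} assigns to the corresponding node of *Ba*_{1}.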

Assume a DBN (*Ba*_{0},*Ba*_{1}) defined on **X** = (*X*_{1},...,*X*_{n}) is given; we now construct an independent PBN that represents the same joint distribution.

In *Ba*_{1}(*H*_{1},Θ_{1}), assume Pa(*X*_{i}(*t*)) contains *k*_{i} variables, and let ${\text{y}}_{1},...,{\text{y}}_{{2}^{{k}_{i}}}$ denote the ${2}^{{k}_{i}}$ possible value assignments of Pa(*X*_{i}(*t*)), ordered so that ${p}_{j}=\mathrm{Pr}\left\{{X}_{i}\left(t\right)=1\left|\mathrm{Pa}\left({X}_{i}\left(t\right)\right)={\text{y}}_{j}\right.\right\}$ is nondecreasing in *j*.

Letting *l*(*i*) = ${2}^{{k}_{i}}$+1, we define the set of predictor functions $\left\{{\mathrm{\psi}}_{1}^{\left(i\right)},...,{\mathrm{\psi}}_{l\left(i\right)}^{\left(i\right)}\right\}$ as follows: let ${\tilde{x}}_{j}$ denote the conjunction (minterm) of the *k*_{i} parent variables that equals 1 exactly when Pa(*X*_{i}(*t*)) = ${\text{y}}_{j}$; then

$${\mathrm{\psi}}_{m}^{\left(i\right)}={\tilde{x}}_{m}\vee {\tilde{x}}_{m+1}\vee \cdots \vee {\tilde{x}}_{l\left(i\right)-1},\quad \text{for }1\le m\le l\left(i\right)-1$$

(32)

is a disjunction of conjunctions, and ${\mathrm{\psi}}_{l\left(i\right)}^{\left(i\right)}$ is the zero function. Define the corresponding function selection probabilities ${\mathrm{\alpha}}_{1}^{\left(i\right)}={p}_{1}$, ${\mathrm{\alpha}}_{m}^{\left(i\right)}={p}_{m}-{p}_{m-1}$ for $1<m\le l\left(i\right)-1$, and ${\mathrm{\alpha}}_{l\left(i\right)}^{\left(i\right)}=1-{p}_{l\left(i\right)-1}$; it can then be verified that

$$\mathrm{Pr}\left\{{X}_{i}\left(t\right)=1\left|\mathrm{Pa}\left({X}_{i}\left(t\right)\right)={\text{y}}_{j}\right.\right\}=\sum _{m=1}^{l\left(i\right)}{\mathrm{\psi}}_{m}^{\left(i\right)}\left({\text{y}}_{j}\right){\mathrm{\alpha}}_{m}^{\left(i\right)}=\sum _{m=1}^{j}{\mathrm{\psi}}_{m}^{\left(i\right)}\left({\text{y}}_{j}\right){\mathrm{\alpha}}_{m}^{\left(i\right)}={p}_{1}+\sum _{m=2}^{j}\left({p}_{m}-{p}_{m-1}\right)={p}_{j}.$$

(33)

Therefore, a binary DBN can be represented as a PBN *G*(*V*,Ψ,*α*), where Ψ = {Ψ_{1},...,Ψ_{n}} and *α* = {*α*_{1},...,*α*_{n}} collect, for each gene, the predictor functions and selection probabilities constructed above.
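The construction in Eqs. (32)-(33) can be checked numerically. The following sketch (a minimal Python verification, with invented conditional probabilities *p*_{j}, assumed ordered nondecreasingly as in the construction) builds the *α*'s from the *p*_{j}'s and confirms that the weighted sum of the ψ outputs reproduces each *p*_{j}:

```python
import itertools

k = 2                                              # number of parent variables
inputs = list(itertools.product([0, 1], repeat=k))  # y_1, ..., y_{2^k}
p = [0.1, 0.4, 0.4, 0.9]                           # nondecreasing p_j (example)
l = 2**k + 1                                       # l(i) = 2^{k_i} + 1

def psi(m, y):
    """psi_m = minterm_m OR ... OR minterm_{l-1}; psi_l is the zero function.
    Since y matches exactly one minterm y_j, psi_m(y_j) = 1 iff m <= j <= l-1."""
    j = inputs.index(tuple(y)) + 1
    return 1 if m <= j <= l - 1 else 0

# Selection probabilities from the construction above Eq. (33).
alpha = [p[0]] + [p[m] - p[m - 1] for m in range(1, l - 1)] + [1 - p[l - 2]]

for j, y in enumerate(inputs, start=1):
    prob = sum(psi(m, y) * alpha[m - 1] for m in range(1, l + 1))
    assert abs(prob - p[j - 1]) < 1e-12            # Eq. (33): the sum telescopes to p_j
```

The telescoping works because ψ_{m}(**y**_{j}) = 1 precisely for *m* ≤ *j*, which is why the nondecreasing ordering of the *p*_{j} (guaranteeing *α* ≥ 0) matters.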

Summarizing subsections 3.3.1 and 3.3.2, we have the following theorem [15].

**Theorem 3** Independent PBNs *G*(*V*,Ψ,*α*) and binary-valued DBNs (*Ba*_{0},*Ba*_{1}), where the initial BN *Ba*_{0} is assumed to have only within-slice connections and the transition BN *Ba*_{1} only connections between consecutive slices, can represent the same joint distribution over their common variables.

Boolean models, like any other networks, raise two questions of interest: Is the model robust? Is the model controllable? From the standpoint of system stability, we require the model to be robust, namely resistant to small changes in the network; from the standpoint of network intervention, we desire that the network be controllable, such that it will respond to certain perturbations. The two properties must be balanced. These two questions encourage researchers to do the following: (1) find structural properties of the network that are related to robustness and controllability; (2) seek ways to analyze the effect of perturbations and to design control techniques.

In 3.4.1, (1) is addressed: we review some structural measures of Boolean models that quantify the propagation of expression level changes from one gene to others (or vice versa). In 3.4.2, (2) is partly addressed: we review structural perturbations and present a methodology that analyzes perturbations on Boolean functions. Since control techniques are beyond the scope of this paper, interested readers can find more information in the review articles [16, 17].

In gene regulatory networks, the interactions among genes are reflected by two facts: the connections among genes, and the Boolean functions defined upon those connections. Whether we are interested in robustness or in controllability, it all boils down to one central question: how a change in the expression level of one gene leads to changes in other genes in the network, and vice versa. Here, we introduce three measures of structural properties related to this question: canalization, influence and sensitivity.

When a gene is regulated by several parent genes through a function *f*, some parent genes can be more important in determining its value than others. An extreme case is a canalizing function, in which one variable (the canalizing variable) can determine the function output regardless of the other variables.

**Definition 7** [18] A Boolean function *f* : {0,1}^{n} → {0,1} is said to be canalizing if there exist an $i\in \left\{1,...,n\right\}$ and $u,v\in \left\{0,1\right\}$ such that for all ${x}_{1},...,{x}_{n}\in \left\{0,1\right\}$, if ${x}_{i}=u$ then $f\left({x}_{1},...,{x}_{n}\right)=v$.

In gene regulatory networks, canalizing variables are also referred to as the master genes. Canalization is commonly observed in real organisms, and it plays an important role in the stability of genetic regulation, as discussed in [19, 20]. Mathematically, researchers have shown that canalization is associated with the stability of Boolean networks. For more theoretical work, see [21-23].

Other than canalization, the degree of gene-gene interaction can be described in more general terms, and we define two quantitative measures, influence and sensitivity, as follows.

Consider a Boolean function *f* with input variables *x*_{1},...,*x*_{n}. Letting *D* denote a probability distribution on the input vectors $\text{x}\in {\left\{0,1\right\}}^{n}$, we have the following definition.

**Definition 8** [2] The influence of a variable *x*_{j} on the Boolean function *f*, with respect to distribution *D*, is

$${I}_{j}\left(f\right)={E}_{D}\left[\frac{\partial f\left(\text{x}\right)}{{\partial x}_{j}}\right]=\mathrm{Pr}\left\{\frac{\partial f\left(\text{x}\right)}{{\partial x}_{j}}=1\right\}=\mathrm{Pr}\left\{f\left(\text{x}\right)\ne f\left({\text{x}}^{\left(j\right)}\right)\right\}$$

(34)

Note that the partial derivative of *f* with respect to *x*_{j} is

$$\frac{\partial f\left(\text{x}\right)}{{\partial x}_{j}}=\left|f\left(\text{x}\right)-f\left({\text{x}}^{\left(j\right)}\right)\right|,$$

(35)

in which ${\text{x}}^{\left(j\right)}=\left({x}_{1},...,{x}_{j-1},1-{x}_{j},{x}_{j+1},...,{x}_{n}\right)$, i.e., **x** with its *j*-th component flipped.
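As an illustration, Eqs. (34) and (39) can be computed by brute force over all 2^{n} inputs. The sketch below (Python, assuming the uniform distribution for *D*; the XOR and AND examples are invented for illustration) is one way to do it:

```python
import itertools

def influence(f, n, j):
    """Eq. (34) under the uniform distribution: the fraction of inputs x
    for which flipping the j-th bit (1-indexed) changes f."""
    count = 0
    for x in itertools.product([0, 1], repeat=n):
        flipped = list(x)
        flipped[j - 1] ^= 1                 # x^(j): flip the j-th component
        count += f(tuple(x)) != f(tuple(flipped))
    return count / 2**n

def average_sensitivity(f, n):
    """Eq. (39): s(f) = sum_j I_j(f)."""
    return sum(influence(f, n, j) for j in range(1, n + 1))

xor2 = lambda x: x[0] ^ x[1]                # every bit flip changes the output
# influence(xor2, 2, 1) -> 1.0; average_sensitivity(xor2, 2) -> 2.0
```

For comparison, the two-input AND has influence 0.5 for each variable (a flip matters only when the other input is 1), so its average sensitivity is 1.0.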

In a BN, since each node *x*_{i} has one regulatory function *f*_{i}, the influence of *x*_{j} on *x*_{i} is simply *I*_{j}(*f*_{i}). In a PBN, gene *x*_{i} has *l*(*i*) predictor functions, and the influence of *x*_{j} on *x*_{i} is the weighted sum

$${I}_{j}\left({x}_{i}\right)=\sum _{k=1}^{l\left(i\right)}{I}_{j}\left({\mathrm{\psi}}_{k}^{\left(i\right)}\right)\cdot {\mathrm{\alpha}}_{k}^{\left(i\right)}.$$

(36)

Thus for a Boolean model with *n* genes, an *influence matrix* Γ of dimension *n* × *n* can be constructed, whose (*i*,*j*) element is Γ_{ij} = *I*_{i}(*x*_{j}), the influence of gene *x*_{i} on gene *x*_{j}. The overall influence of gene *x*_{i} on the rest of the network is then

$$r\left({x}_{i}\right)=\sum _{j=1}^{n}{\mathrm{\Gamma}}_{\mathit{ij}}.$$

(37)

Related to influence, we define the sensitivity of a function *f* at input **x** as

$${s}_{\text{x}}\left(f\right)=\sum _{j=1}^{n}\left|f\left(\text{x}\right)-f\left({\text{x}}^{\left(j\right)}\right)\right|.$$

(38)

Then the average sensitivity of *f* with respect to distribution *D* is

$$s\left(f\right)={E}_{D}\left[{s}_{\text{x}}\left(f\right)\right]=\sum _{j=1}^{n}{E}_{D}\left[\left|f\left(\text{x}\right)-f\left({\text{x}}^{\left(j\right)}\right)\right|\right]=\sum _{j=1}^{n}{I}_{j}\left(f\right).$$

(39)

The average sensitivity measures how much, on average, the function *f* changes between Hamming-distance-one neighbors (i.e., input vectors that differ by one bit). For PBNs, the average sensitivity of gene *x*_{i} is (cf. Eq. (37))

$$s\left({x}_{i}\right)=\sum _{j=1}^{n}{I}_{j}\left({x}_{i}\right)=\sum _{j=1}^{n}{\mathrm{\Gamma}}_{\mathit{ji}}.$$

(40)

Biologically, the influence of a gene indicates its overall impact on other genes. A gene with high influence has the potential to regulate the system dynamics and its perturbation has significant downstream effect. The sensitivity of a gene measures its stability or autonomy. Low sensitivity means that other genes have little effect on it, and the “house-keeping” genes usually have this property [2]. It is shown that such quantitative measures (or variants) can help guide the control of genetic networks [24] and aid in the steady-state analysis [25].

There are two types of perturbation on Boolean models: perturbation on network states and perturbation on network structure. The former refers to a sudden (forced or spontaneous) change in the current state from **x** to **x**' , which causes the system dynamics to be disturbed temporarily. Such disturbance is transient in nature, because the network nodes and connections are intact, and the underlying gene regulation principles do not change. Therefore, the network attractors and the basins of attraction remain the same. However, if the perturbed Boolean model has multiple attractors, state perturbations may cause convergence to a different attractor than the original one, and may change the steady-state distribution of the network. This type of perturbation has been studied extensively (e.g. [26]), and finds its use in network control (e.g. [6]).

Perturbation on network structure refers to any change in the “wiring” or functions of the network. For instance, we may remove or add a gene to the network, change connections among genes, change the Boolean functions, or even change the synchronous Boolean network to an asynchronous model (where not all the genes are updated at the same time). Structural perturbation is more complex and less studied, compared to state perturbation. When network structure is perturbed, the network attractors and basins of attraction will be impacted, therefore the long-term consequence is more difficult to gauge than that of state perturbation.

The reasons for studying structural perturbation are: (1) the modeling of gene regulatory networks is subject to uncertainty, and it is desirable to study the effect of small differences in network models on the network's dynamic behavior; (2) it is likely that gene regulation, like other biological functions, has intrinsic stochasticity, and it is of interest to predict the consequences of any perturbation in regulation; (3) changing the network structure can alter the network's steady-state distribution, thus structural perturbation can be an alternative way (with respect to state perturbation) of network control [25, 27, 28].

In [8], the authors developed theories to predict the impact of function perturbations on network dynamics and attractors, and the main results are presented below. For more applications, see [28]. For further analysis in terms of the steady-state distribution and applications to network intervention, see [25].

**Problem formulation.** Given a Boolean network *β*(*V*,**f**), *V* = {*x*_{1},...,*x*_{n}}, **f** = {*f*_{1},...,*f*_{n}}, suppose the truth table of one function *f*_{i} is altered in a single entry; we wish to predict the effect of this one-bit function perturbation on the state transitions and attractors.

Assume gene *x*_{i} has *k*_{i} regulator genes ${x}_{i1},...,{x}_{{ik}_{i}}$, so that its truth table has ${2}^{{k}_{i}}$ rows, as shown below. Let ${\text{a}}_{j}^{i}$ denote the input combination in the *j*-th row, and let ${f}_{i}^{\left(j\right)}$ denote the function obtained from *f*_{i} by flipping the output bit of the *j*-th row.

Row label | ${x}_{i1}{x}_{i2}\cdots {x}_{{ik}_{i}}$ | *f*_{i}(·) |
---|---|---|
1 | $00\cdots 0$ | 0 |
2 | $00\cdots 1$ | 1 |
⋮ | ⋮ | ⋮ |
${2}^{{k}_{i}}$ | $11\cdots 1$ | 0 |

Any state transition **s** → **w** comprises *n* mappings ${f}_{i}:{\mathrm{In}}_{i}\left(\text{s}\right)\to {w}_{i}$, *i* = 1,...,*n*, where ${\mathrm{In}}_{i}\left(\text{s}\right)$ denotes the values taken in state **s** by the regulators of *x*_{i}, and *w*_{i} is the *i*-th digit of **w**.

The following proposition and corollaries state the basic effects of one-bit function perturbation on the state transitions and attractors. Proofs and extensions to two-bit perturbations can be found in [28].

**Proposition 1** A state transition **s** → **w** is affected by the one-bit perturbation ${f}_{i}\to {f}_{i}^{\left(j\right)}$ if and only if ${\mathrm{In}}_{i}\left(\text{s}\right)={\text{a}}_{j}^{i}$. If the state transition is affected, the new state transition will be **s** → **w**^{(i)}, where **w**^{(i)} is defined to be the same as **w** except that the *i*-th digit is flipped.

**Corollary 1** If *x*_{i} has *k*_{i} regulator genes, then the one-bit perturbation ${f}_{i}\to {f}_{i}^{\left(j\right)}$ affects exactly ${2}^{n-{k}_{i}}$ state transitions.

**Corollary 2** (**Invariant singleton attractor**) Suppose state **s** is a singleton attractor. It will no longer be a singleton attractor following the one-bit perturbation ${f}_{i}\to {f}_{i}^{\left(j\right)}$ if and only if ${\mathrm{In}}_{i}\left(\text{s}\right)={\text{a}}_{j}^{i}$.

**Corollary 3** (**Emerging singleton attractor**) A non-singleton-attractor state **s** becomes a singleton attractor as a result of the one-bit perturbation ${f}_{i}\to {f}_{i}^{\left(j\right)}$ if and only if the following are true: (1) ${\mathrm{In}}_{i}\left(\text{s}\right)={\text{a}}_{j}^{i}$, and (2) absent the perturbation, **s** → **s**^{(i)}.

We use the following toy example to demonstrate the above results. From these results, more applications can be derived, such as controlling the network steady-state distribution through function perturbation, or identifying functional perturbation by observing phenotype changes [28].

**Example 1** Consider a BN with *n* = 3 genes,

$${x}_{1}\left(t+1\right)={x}_{3}\left(t\right),$$

(41)

$${x}_{2}\left(t+1\right)=0,$$

(42)

$${x}_{3}\left(t+1\right)=\stackrel{-}{{x}_{1}}\left(t\right){x}_{2}\left(t\right)+{x}_{1}\left(t\right)\stackrel{-}{{x}_{2}}\left(t\right),$$

(43)

where the truth table of *f*_{3} is shown below and the state transition diagram is shown in Fig. (2).

Row label | x_{1} | x_{2} | f_{3}(·) |
---|---|---|---|
1 | 0 | 0 | 0 |
2 | 0 | 1 | 1 |
3 | 1 | 0 | 1 |
4 | 1 | 1 | 0 |

If a one-bit perturbation forces *f*_{3} to become ${f}_{3}^{\left(3\right)}$, then since *k*_{3} = 2, two state transitions will be affected. By Proposition 1, states 100 and 101 no longer transit to 001 and 101, but to 000 and 100, respectively. Because of that, the attractor cycle {001, 100} will be affected. Moreover, Corollary 2 predicts that the singleton attractor 000 is robust to the perturbation while 101 is not. The predictions are confirmed by the new state transition diagram shown in Fig. (3).

Finally, the author would like to remind readers that other works on (various types of) structural perturbation are available. For instance, in [29], the authors added a redundant node to a Boolean network, such that the bolstered network is more resistant to a one-bit function perturbation (as defined above). In [30], the effect of asynchronous updates on a *Drosophila* segment polarity network model is examined in terms of the phenotypes (steady states). In [25], the authors derived analytical results on how function perturbations affect network steady-state distributions and applied them to structural intervention. In [31], the author modeled gene knockdown and broken regulatory pathways in Boolean networks, and analyzed the effects.

Recall from Section 2 that a Boolean model of *n* genes has a finite state space, and a BN has deterministic dynamic behavior which can be fully captured by the state transition diagram. A PBN is probabilistic in nature, therefore its state transition is also probabilistic. For both BNs and PBNs, attractors are characteristic of their long-term behavior. Given the above knowledge, if we would like to know anything about a Boolean model, we should find out its state transition diagram and attractors first. This is to be discussed in Section 4.1.

For Boolean models, the most commonly encountered simulation issues include: (1) how to generate the time sequence data of a network, **x**(0),**x**(1),...,**x**(*t*),...; (2) how to find the network steady-state distribution if it exists; and (3) how to produce artificial Boolean models with prescribed attractors to facilitate other studies. Among them, (1) is a basic practice that can be utilized in (2) and (3); we deal with the three issues in Sections 4.2, 4.3 and 4.4, respectively. Note that the techniques in 4.1 are crucial to all three.

To obtain the state transition diagram of a BN, we first compile a state transition table. Assuming *n* nodes in the network, *x*_{1},...,*x*_{n}, we evaluate, for each of the 2^{n} possible current states **x**(*t*), the next state **x**(*t*+1) by applying the *n* Boolean functions, and record each pair of current and next states as one row of the table.

To draw the state transition diagram, we lay out 2^{n} vertices, each representing a possible state, and connect two vertices by a directed edge if one state transits to the other based on the state transition table. If a state transits to itself, the edge points to itself. Fig. (4) is the state transition diagram based on Table 1.
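A state transition table is straightforward to compute by exhaustive evaluation. The sketch below (Python) builds the table for the three-gene BN of Example 1 (Eqs. (41)-(43)), encoding each state *x*_{1}*x*_{2}*x*_{3} as an integer:

```python
import itertools

def transition_table(funcs):
    """Map each of the 2^n states (encoded as integers) to its successor,
    where funcs[i] computes x_{i+1}(t+1) from the current state tuple."""
    n = len(funcs)
    table = {}
    for bits in itertools.product([0, 1], repeat=n):
        nxt = tuple(f(bits) for f in funcs)
        table[int(''.join(map(str, bits)), 2)] = int(''.join(map(str, nxt)), 2)
    return table

# The BN of Example 1: x1' = x3, x2' = 0, x3' = x1 XOR x2.
funcs = [lambda x: x[2], lambda x: 0, lambda x: x[0] ^ x[1]]
table = transition_table(funcs)
# table[0b100] == 0b001 (100 -> 001) and table[0b101] == 0b101 (101 -> 101)
```

The resulting dict is exactly the state transition table; drawing the diagram amounts to adding one directed edge per entry.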

Similarly for a PBN, when the gene perturbation rate *p* = 0, we can draw its state transition diagram by combining the state transition diagrams of its component BNs. Now each edge has a probability attached to it, representing the probability of one state transiting to the other. For example, if a PBN is composed of two BNs, where the first BN has the state transitions shown in Table 1 and the second BN has the state transitions shown in Table 2, with selection probabilities *c*_{1} and *c*_{2} = 1–*c*_{1} respectively, then when *p* = 0, the PBN's state transition diagram is shown in Fig. (5). When *p* > 0, a state transition can be driven either by some network function or by random gene perturbations, and we may refer to the state transition matrix *T* when constructing the state transition diagram. It should be noted that the sum of the probabilities of all edges exiting a vertex should always be 1.

Example of state transition diagram of a probabilistic Boolean network when gene perturbation rate *p* = 0.

The following is a simple algorithm for finding the attractors of BN based on the state transition table (using integer representation of the states).

**Algorithm 1 **(Finding attractors)

- Generate an array *a* of size 2^{n}, and initialize all *a*_{i} to 0; *a*_{i} corresponds to state *i*–1.
- Search for singleton attractors. For each state *i* between 0 and 2^{n}–1, look up the (*i*+1)-th entry in the state transition table for its next state *j*. If *j* = *i*, then *i* is a singleton attractor; set *a*_{i+1} := 1.
- Search for attractor cycles. For each state *i* between 0 and 2^{n}–1, if *a*_{i+1} = 0, look up the state transition table repeatedly for the successor states of *i*, such that *i* → *j* → *k* → ... until a singleton attractor or an attractor cycle is reached. If an attractor cycle is reached, save the cycle states and set the corresponding elements in *a* to 1.

A common practice with a Boolean model defined on *V* = {*x*_{1},...,*x*_{n}} is to generate time sequence data **x**(0),**x**(1),...,**x**(*t*),.... The straightforward way is to evaluate the Boolean functions on the current state at every time step.

An alternative way, which is more efficient when the simulation time is long $\left(t\gg {2}^{n}\right)$, is to utilize the information of the state transition diagram (encoded in the state transition table) or the transition probability matrix *T*. For a BN, this entails converting the current state **x**(*t*) to an integer, and looking up the state transition table or matrix *T* for the next state **x**(*t*+1). For a PBN, one can start from a randomly chosen initial state and a randomly chosen initial network (from the *r* BNs), and follow either of the two protocols below. Note that we follow the notations in Definition 2 and use *p*, *q* to denote the random gene perturbation rate and the network switching probability, respectively. Network selection probabilities are denoted by *c*_{1},...,*c*_{r}.

- **Table-lookup and real-time computation based method.** Construct *r* state transition tables for the *r* component BNs respectively (letting gene perturbation rate *p* = 0). At any time *t*, if at the *k*-th network, generate *n* independent [0,1] uniformly distributed random numbers *p*_{1},...,*p*_{n}. If *p*_{i} < *p*, flip *x*_{i}(*t*) to get *x*_{i}(*t*+1); if *p*_{i} > *p* for all *i* (no gene perturbation), convert **x**(*t*) to an integer and look up the *k*-th state transition table to find **x**(*t*+1). Finally, generate a [0,1] uniformly distributed random number *q*_{s} and compare it to *q* to decide whether the system will switch network at *t*+1; if a switch occurs, choose from the *r* networks according to the selection probabilities.
- ***T* matrix based method.** Compute the transition probability matrix *T*. If dec(**x**(*t*)) = *i*–1, generate a [0,1] uniformly distributed random number *p*_{t}. If ${\sum}_{l=1}^{j-1}{t}_{il}\le {p}_{t}<{\sum}_{l=1}^{j}{t}_{il}$ (*t*_{il} is the (*i*,*l*) element of *T*), then convert *j*–1 to an *n*-bit vector **s** (dec(**s**) = *j*–1), and the next state is **x**(*t*+1) = **s**.
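The *T* matrix based method reduces each step to sampling from one row of *T*. A minimal sketch (Python; the two-state chain is an invented toy example):

```python
import random

def next_state_index(T, i, rng=random.random):
    """Sample the next state index from row i of the transition matrix T
    by inverting the cumulative distribution of the row."""
    pt, acc = rng(), 0.0
    for j, tij in enumerate(T[i]):
        acc += tij
        if pt < acc:
            return j
    return len(T[i]) - 1                 # guard against floating-point round-off

# Toy two-state chain: from state 0 go to state 1 with prob. 0.9; 1 -> 0 surely.
T = [[0.1, 0.9], [1.0, 0.0]]
# next_state_index(T, 1) is always 0
```

Repeatedly calling `next_state_index` and converting the index back to an *n*-bit vector yields the sequence **x**(0),**x**(1),....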

One other issue in simulating a PBN is the choice of the parameters *p* and *q*. As stated in Section 2, the network switching probability *q* does not affect the probability of being at any constituent BN, and in theory we can choose any value for *q*; however, we prefer small *q* because in a biological system, switching networks corresponds to a change of context (reflecting a change of regulatory paradigm, caused either by environmental change or by internal signals), which should not occur very often. Moreover, if *q* is large, or even *q* = 1, then network switching is frequent, and a short time sequence of data **x**(*t*_{1}),**x**(*t*_{1}+1),...,**x**(*t*_{2}) is more likely to come from several BNs instead of from one single BN. This may pose a difficulty if we try to identify the underlying PBN and its component BNs from the sequence data [32]. On the other hand, if *q* is too small and the number of BNs in the PBN is large, it will take too long to obtain the steady-state distribution by simulation. Usually *p* should be small to reflect the rarity of random gene perturbation, and we let *p* << *q*. Also, small *p* is helpful if the generated sequence data will be used as artificial time-series data for the identification of the underlying PBN and its component BNs. However, if *p* is too small, it will take longer to obtain the steady-state distribution. Usually, we can choose *q* = 0.01 ~ 0.2 and *p* = 0.01% ~ 0.5%.

As discussed in Section 3.1, a PBN possesses a steady-state distribution when *q*, *p* > 0 [3]. By definition, this distribution *π*^{*} is the unique solution to the linear equations *π* = *π*·*T* with the constraint ${\sum}_{i}{\mathrm{\pi}}_{i}=1$, and it can be estimated by iteration, given the transition probability matrix *T* (assuming *n* genes, and *N* = 2^{n}).

**Algorithm 2 **(Finding steady-state distribution)

- Set *δ*^{*} and generate an initial distribution ${\mathrm{\pi}}^{\left(0\right)}=\left({\mathrm{\pi}}_{1}^{\left(0\right)},...,{\mathrm{\pi}}_{N}^{\left(0\right)}\right)$; let *k* := 0.
- DO: compute *π*^{(k+1)} := *π*^{(k)}·*T*; $\mathrm{\delta}:=\Vert {\mathrm{\pi}}^{\left(k+1\right)}-{\mathrm{\pi}}^{\left(k\right)}\Vert$; *k* := *k*+1; UNTIL (*δ* < *δ*^{*}).
- *π*^{*} := *π*^{(k)}.

Note that ||.|| can be any norm, such as ||.||_{∞}.
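Algorithm 2 is ordinary power iteration on the row-stochastic matrix *T*. A plain-Python sketch, using the infinity norm (the 2-state matrix is an invented example whose exact steady state is (1/3, 2/3)):

```python
def steady_state(T, delta_star=1e-12, max_iter=1_000_000):
    """Algorithm 2: iterate pi^(k+1) = pi^(k) . T until the infinity-norm
    change falls below delta_star."""
    N = len(T)
    pi = [1.0 / N] * N                   # initial distribution (uniform)
    for _ in range(max_iter):
        new = [sum(pi[i] * T[i][j] for i in range(N)) for j in range(N)]
        if max(abs(a - b) for a, b in zip(new, pi)) < delta_star:
            return new
        pi = new
    return pi

T = [[0.5, 0.5], [0.25, 0.75]]
pi_star = steady_state(T)
# pi_star is approximately [1/3, 2/3]
```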

When the number of BNs in a PBN is large and some BNs have small selection probabilities, an approximation method for constructing *T* is proposed in [33]. In the approximation, $\hat{T}$ is computed instead of *T*, ignoring the *r*_{0} BNs whose selection probabilities ${c}_{{k}_{1}},...,{c}_{{k}_{{r}_{0}}}$ are each smaller than some threshold ε:

$$\hat{T}=\overline{T\text{'}}+\tilde{T},$$

(44)

$$\overline{T\text{'}}={\left(1-p\right)}^{n}\cdot \sum _{j=1,j\ne {k}_{1},...,{k}_{{r}_{0}}}^{r}{c}_{j}{T}_{j}/\left(1-\sum _{i=1}^{{r}_{0}}{c}_{{k}_{i}}\right)$$

(45)

where *T*_{j} (1 ≤ *j* ≤ *r*) is the transition matrix of the *j*-th component BN, and $\tilde{T}$ accounts for the transitions caused by random gene perturbations. The approximation error is bounded by

$$E\left[\frac{{\Vert \hat{{\mathrm{\pi}}^{\ast}}-\hat{{\mathrm{\pi}}^{\ast}}T\Vert}_{\mathrm{\infty}}}{{\Vert {\mathrm{\pi}}^{\ast}\Vert}_{\mathrm{\infty}}}\right]<\left(2+2n\right)\sum _{i=1}^{{r}_{0}}{c}_{{k}_{i}}<2\left(n+1\right){r}_{0}\mathrm{\epsilon}.$$

(46)

An alternative method of obtaining the steady-state distribution is by simulation, described next. If we are interested in the attractors only, knowing that the majority of the steady-state probability mass lies on the attractors when *p* is small, we may instead apply the Markov chain based analytic method of Section 3.2.

The simulation method requires generating a long time sequence of data, **x**(0),**x**(1),...,**x**(*T*), such that the frequencies of all 2^{n} possible states approach the steady-state distribution. In a given *n*-gene PBN with gene perturbation rate *p*, the smaller *p* is and the larger *n* is, the longer it takes to converge to the steady-state distribution. In general, we need to simulate at least 10·2^{n}·*p*^{−1} steps.

To estimate when the PBN has converged to its steady-state distribution, we can use the Kolmogorov-Smirnov test. The basic idea of Kolmogorov-Smirnov test is to measure the closeness of an empirical probability distribution to the theoretical distribution. Since the latter (steady-state distribution) is unknown in this case, we will test the closeness of two empirical distributions.

To get two quasi-i.i.d. (independently and identically distributed) samples from a PBN, we select two samples $\text{x}\left({t}_{1}\right),\text{x}\left({t}_{1}+\mathrm{\Delta}\right),...,\text{x}\left({t}_{1}+\left(M-1\right)\mathrm{\Delta}\right)$ and $\text{x}\left({t}_{2}\right),\text{x}\left({t}_{2}+\mathrm{\Delta}\right),...,\text{x}\left({t}_{2}+\left(M-1\right)\mathrm{\Delta}\right)$ (with ${t}_{1}\ne {t}_{2}$ and ${t}_{2}-{t}_{1}\ne m\mathrm{\Delta}$ for all $0\le m<M$), and the Kolmogorov-Smirnov statistic is defined as

$$K=\frac{1}{M}\underset{\text{s}}{\mathrm{max}}\left|\sum _{m=0}^{M-1}{1}_{\left[00\cdots 0,\text{s}\right]}\left(\text{x}\left({t}_{1}+m\mathrm{\Delta}\right)\right)-\sum _{m=0}^{M-1}{1}_{\left[00\cdots 0,\text{s}\right]}\left(\text{x}\left({t}_{2}+m\mathrm{\Delta}\right)\right)\right|.$$

(47)

In the definition, the maximum is over the state space {0,1}^{n}, and **1**_{[00...0,s]}**(x)** is an indicator function whose output equals 1 if and only if $\text{x}\in \left\{00\cdot \cdot \cdot 0,\cdot \cdot \cdot ,\text{s}\right\},\text{s}\in {\left\{0,1\right\}}^{n}$
, and the output equals 0 otherwise.
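Eq. (47) compares the cumulative counts of the two samples over the ordered state space. A small Python sketch, encoding each state as an integer in {0,...,2^{n}−1} (restricting the maximum to observed states is sufficient, since the count difference only changes at those values):

```python
def ks_statistic(sample1, sample2):
    """Eq. (47): (1/M) * max over s of | #{x in sample1 : x <= s} -
    #{x in sample2 : x <= s} |, with states encoded as integers."""
    M = len(sample1)
    assert len(sample2) == M
    best = 0
    for s in sorted(set(sample1) | set(sample2)):
        c1 = sum(1 for x in sample1 if x <= s)
        c2 = sum(1 for x in sample2 if x <= s)
        best = max(best, abs(c1 - c2))
    return best / M

# Identical samples give K = 0; disjoint extremes give K = 1.
# ks_statistic([0, 1, 2, 3], [0, 1, 2, 3]) -> 0.0
# ks_statistic([0, 0, 0, 0], [3, 3, 3, 3]) -> 1.0
```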

In a simulation study of Boolean models, it is often necessary to create artificial networks with certain properties. Of special interest is the problem of generating artificial BNs with a given set of attractors, since attractors are hypothesized to correspond to cellular phenotypes and play an important role in the long term behavior of Boolean models.

First, note that the state transition diagram can be partitioned into level sets, where level set *l*_{j} consists of all states that transit to one of the attractors in exactly *j* time steps.

**Problem formulation** [35] Given a set of *n* nodes *V* = {*x*_{1},...,*x*_{n}}, a family of *d* states designated as attractors, a range [*k*, *K*] for the number of predictor genes per node, and a range [*l*, *L*] for the number of level sets, find a BN *β*(*V*,**f**) whose attractors are exactly the designated states and whose state transition diagram satisfies the constraints.

Specifically, if we are interested in constructing a BN with only singleton attractors, its state transition diagram will be a *d* -forest (containing *d* single-rooted trees) if the BN has *d* singleton attractors. The following theorem gives the number of all possible state transition diagrams that only contain singleton attractors (the proof can be found in [35]).

**Theorem 4** The cardinality of the collection of all forests on *N* vertices is (*N*+1)^{N-1}.

Since *N* = 2^{n} and the number of all possible state transition diagrams is *N*^{N}, when *n* is large the ratio (*N*+1)^{N-1}/*N*^{N} is asymptotically *e*/2^{n}; thus a brute-force search has a low success rate.
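The asymptotic claim is easy to check numerically, working in log space since integers of size (2^{n}+1)^{2^{n}−1} overflow floating point. A quick Python check, under the identification *N* = 2^{n}:

```python
from math import exp, log, e

def forest_ratio(n):
    """(N+1)^(N-1) / N^N for N = 2^n, computed via logarithms."""
    N = 2**n
    return exp((N - 1) * log(N + 1) - N * log(N))

# The ratio times 2^n approaches e as n grows:
# forest_ratio(8) * 2**8 is about 2.70, forest_ratio(12) * 2**12 about 2.717
```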

Assuming only singleton attractors are allowed, the following algorithm is for solving the search problem formulated above. A second algorithm is also given in [35], but shown to be less efficient.

**Algorithm 3 **(Generating artificial Boolean network)

1. Randomly generate or give in advance a set *A* of *d* states (as singleton attractors).
2. Randomly generate a predictor set *P*, where each *P*_{i} has *k* to *K* nodes. If Step 2 has been repeated more than a pre-specified number of times, go back to Step 1.
3. Check if the attractor set *A* is compatible with *P*, i.e., only the attractors (each transits to itself) of the state transition diagram are checked for compatibility against *P*. If not compatible, go back to Step 2.
4. Fill in the entries of the truth tables that correspond to the attractors generated in Step 1, using the predictor set *P*, and randomly fill in the remaining entries of the truth tables. If Step 4 has been repeated more than a pre-specified number of times, go back to Step 2.
5. Search for cycles of any length in the state transition diagram *D* based on the truth table generated in Step 4. If a cycle is found, go back to Step 4; otherwise continue to Step 6.
6. If *D* has fewer than *l* or more than *L* level sets, go back to Step 4.
7. Save the generated BN and terminate the algorithm.

This paper has presented the following analysis and simulation issues of Boolean networks and probabilistic Boolean networks, which are models for gene regulatory networks.

- **Analysis**. An important aspect of Boolean models is that they can be viewed as homogeneous Markov chains; for a PBN, when the network switching probability *q* > 0 and gene perturbation rate *p* > 0, it possesses a steady-state distribution. Markov analysis serves as a basis for finding the steady-state probabilities of attractors and for proving the equivalence of PBNs and dynamic Bayesian networks. Finally, a structural analysis is provided, where quantitative measures of gene-to-gene relationships are introduced, and the effect of perturbations on Boolean functions is analyzed.
- **Simulation**. Central to the simulation of Boolean models is the use of the state transition diagram and the transition probability matrix. For network simulation, different methods are presented and simple guidelines for parameter selection are provided. To test the convergence of a simulated PBN to its steady-state distribution, we can employ the Kolmogorov-Smirnov statistic. Lastly, an algorithm for generating artificial BNs with prescribed attractors is presented.

To find more references on Boolean models, and obtain a MATLAB toolbox for BN/PBN, readers can go to the following website, http://personal.systemsbiology.net/ilya/PBN/PBN.htm. Another online source of papers is http://gsp.tamu.edu/Publications/journal-publications.

The author thanks Yuping Xiao for giving valuable critique on the initial manuscript. The anonymous reviewers' insightful comments have helped the author's revision considerably.

1. Kauffman SA. The origins of order self-organization and selection in evolution. New York: Oxford University Press; 1993.

2. Shmulevich I, Dougherty ER, Kim S, Zhang W. Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics. 2002;18:261–274. [PubMed]

3. Shmulevich I, Dougherty ER, Zhang W. Gene perturbation and intervention in probabilistic Boolean networks. Bioinformatics. 2002;18:1319–1331. [PubMed]

4. Albert R, Othmer HG. The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster. J. Theor. Biol. 2003;223:1–18. [PubMed]

5. Huang S. Gene expression profiling, genetic networks, and cellular states: an integrating concept for tumorigenesis and drug discovery. J. Mol. Med. 1999;77:469–480. [PubMed]

6. Datta A, Choudhary A, Bittner ML, Dougherty ER. External control in Markovian genetic regulatory networks. Mach. Learn. 2003;52:169–191. [PubMed]

7. Shmulevich I, Dougherty ER, Zhang W. From Boolean to probabilistic Boolean networks as models of genetic regulatory networks. Proc. IEEE. 2002;90:1778–1792.

8. Dougherty ER, Xiao Y. Design of probabilistic Boolean networks under the requirement of contextual data consistency. IEEE Trans. Signal Processing. 2006;54:3603–3613.

9. Zhang S-Q, Ching W-K, Ng MK, Akutsu T. Simulation study in probabilistic Boolean network models for genetic regulatory networks. Int. J. Data Min. Bioinform. 2007;1(3):217–240. [PubMed]

10. Çinlar E. Introduction to stochastic processes. 1st ed. Englewood Cliffs NJ: Prentice Hall; 1997.

11. Brun M, Dougherty ER, Shmulevich I. Steady-state probabilities for attractors in probabilistic Boolean networks. EURASIP J. Signal Processing. 2005;85:1993–2013.

12. Friedman N, Linial M, Nachman I, Pe’er D. Using Bayesian networks to analyze expression data. J. Comput. Biol. 2000;7:601–620.

13. Ideker T, Thorsson V. Discovery of regulatory interactions through perturbation: inference and experiment design. Pac. Symp. Biocomput. 2000;5:302–313.

14. Xiao Y, Dougherty ER. Optimizing consistency-based design of context-sensitive gene regulatory networks. IEEE Trans. Circ. Syst. I. 2006;53:2431–2437.

15. Lähdesmäki H, Hautaniemi S, Shmulevich I, Yli-Harja O. Relationship between probabilistic Boolean networks and dynamic Bayesian networks as models for gene regulatory networks. Signal Processing. 2006;86:814–834.

16. Datta A, Pal R, Dougherty ER. Intervention in probabilistic gene regulatory networks. Curr. Genomics. 2006;7(2):167–184.

17. Datta A, Pal R, Choudhary A, Dougherty ER. Control approaches for probabilistic gene regulatory networks. IEEE Signal Processing Mag. 2007;24(1):54–63.

18. Shmulevich I, Lähdesmäki H, Dougherty ER, Astola J, Zhang W. The role of certain Post classes in Boolean network models of genetic networks. PNAS. 2003;100(19):10734–10739.

19. Szallasi Z, Liang S. Modeling the normal and neoplastic cell cycle with realistic Boolean genetic networks: their application for understanding carcinogenesis and assessing therapeutic strategies. Pac. Symp. Biocomput. 1998;3:66–76.

20. Martins DC Jr, Hashimoto RF, Braga-Neto U, Bittner ML, Dougherty ER. Intrinsically multivariate predictive genes. IEEE J. Select. Topics Signal Processing. 2008;2:424–439.

21. Kauffman S, Peterson C, Samuelsson B, Troein C. Genetic networks with canalizing Boolean rules are always stable. PNAS. 2004;101(49):17102–17107.

22. Aldana M, Cluzel P. A natural class of robust networks. PNAS. 2003;100(15):8710–8714.

23. Reichhardt CJO, Bassler KE. Canalization and symmetry in Boolean genetic regulatory networks. J. Phys. A: Math. Theor. 2007;40:4339–4350.

24. Choudhary A, Datta A, Bittner ML, Dougherty ER. Assignment of terminal penalties in controlling genetic regulatory networks. American Control Conference; Portland, OR; 2005.

25. Qian X, Dougherty ER. Effect of function perturbation on the steady-state distribution of genetic regulatory networks: optimal structural intervention. IEEE Trans. Signal Processing. 2008;56(10):4966–4975.

26. Qu X, Aldana M, Kadanoff LP. Numerical and theoretical studies of noise effects in the Kauffman model. J. Stat. Phys. 2002;109(5/6):967–985.

27. Shmulevich I, Dougherty ER, Zhang W. Control of stationary behavior in probabilistic Boolean networks by means of structural intervention. J. Biol. Syst. 2002;10:431–445.

28. Xiao Y, Dougherty ER. The impact of function perturbations in Boolean networks. Bioinformatics. 2007;23:1265–1273.

29. Gershenson C, Kauffman SA, Shmulevich I. The role of redundancy in the robustness of random Boolean networks. In: Artificial Life X: Proceedings of the Tenth International Conference on the Simulation and Synthesis of Living Systems; 2006. Available online at http://arxiv.org/abs/nlin/0511018.

30. Chaves M, Albert R, Sontag ED. Robustness and fragility of Boolean models for genetic regulatory networks. J. Theor. Biol. 2005;235:431–449.

31. Xiao Y. Modeling gene knockdown and pathway blockage in gene regulatory networks. Proc. IEEE Workshop on Systems Biology and Medicine; Philadelphia, PA; November 2008.

32. Marshall S, Yu L, Xiao Y, Dougherty ER. Inference of a probabilistic Boolean network from a single observed temporal sequence. EURASIP J. Bioinform. Syst. Biol. 2007; Article ID 32454, 15 pages.

33. Ching W-K, Zhang S, Ng MK, Akutsu T. An approximation method for solving the steady-state probability distribution of probabilistic Boolean networks. Bioinformatics. 2007;23:1511–1518.

34. Shmulevich I, Gluhovsky I, Hashimoto R, Dougherty ER, Zhang W. Steady-state analysis of genetic regulatory networks modeled by probabilistic Boolean networks. Comp. Funct. Genomics. 2003;4:601–608.

35. Pal R, Ivanov I, Datta A, Bittner ML, Dougherty ER. Generating Boolean networks with a prescribed attractor structure. Bioinformatics. 2005;21:4021–4025.

