Phase Transition in the Boolean Network Model

In this section we present the mean-field theory results that show the existence of a dynamical phase transition from ordered to chaotic dynamics in the Boolean network model

[26],

[38],

[44]. This allows us to introduce the tools that we use to characterize the dynamical regime in which the network operates. Although the phase transition was first obtained within the context of the mean-field approximation, we will show in the next section the remarkable result that the phase transition also occurs, almost identically, in networks with realistic topologies, for which the mean-field assumption does not necessarily apply. The phase transition is characterized by the temporal evolution of the Hamming distance

*x*(

*t*) between two different dynamical trajectories produced by two slightly different initial conditions. From a biological point of view, the Hamming distance

*x*(

*t*) is the average normalized size at time

*t* of the avalanche of perturbations in the gene expression profile, produced by the perturbation (e.g. gene knockout or gene over expression) of a small fraction

*x*(0) of genes at time

*t*=

0. The temporal evolution of

*x*(

*t*) is given by a dynamical mapping

*x*(

*t*+1)

=

*M*(

*x*(

*t*)) which relates the size of the perturbation avalanche at two consecutive time steps

[26],

[38],

[44]. Given an initial perturbation

*x*(0) at time

*t*=

0, successive iterations of this mapping will eventually converge to a stable fixed point

, which is the final size of the perturbation avalanche. The value

*x*_{∞} is the order parameter that characterizes the dynamical regime in which the network operates. Thus, if

*x*_{∞}=

0 (ordered regime), all the initial perturbations die out over time. On the contrary, if

*x*_{∞}>0 (chaotic regime), the initial perturbation of even a small fraction of genes propagates across the entire system, finally altering the expression of a finite fraction

*x*_{∞} of genes in the genome. It turns out that

*M*(

*x*) is a continuous convex monotonically increasing function of

*x* (in the mean field theory

*M*(

*x*) is a polynomial), with the property that

*M*(0)

=

0 and

*M*(1)<1. Therefore, there is only one parameter that controls the phase transition, the so-called average network sensitivity

*S*, which is given by

*S*=

[

*dM*(

*x*)/

*dx*]

_{x}_{=0}. If

*S*<1 then the only fixed point is

*x*_{∞}=

0, whereas if

*S*>1 then

*x*_{∞}>0. The phase transition occurs at

*S*=

1, for which the fixed point

*x*_{∞}=

0 is only marginally stable. In general,

*S* depends on

*p* and on the topology of the network. The dynamical mapping

*M*(

*x*) contains all the information about the dynamical regime in which the network operates. This is true even if

*M*(

*x*) cannot be obtained through a mean-field computation, which is the case for real networks. In the Supporting Information (

Text S1) we provide a Java applet with the animation of the perturbation dynamics in networks with homogeneous random topology.

Critical Dynamics in Real Genetic Networks

To determine the dynamical regime in which the genetic network of a real organism operates, we have to compute the dynamical mapping

*M*(

*x*) directly from experimental data, without any mean-field assumptions. The actual form of

*M*(

*x*) depends on both the network topology and the particular set of Boolean functions. We computed numerically

*M*(

*x*) for five genetic networks: The network of flower morphogenesis in

*A. thaliana* (15 genes;

[41]); the network of segment polarity genes in

*D. melanogaster* (60 genes;

[42]); and the gene transcription networks of

*E. coli* (1481 genes;

[46]),

*S. cerevisiae* (3459 genes;

[47]), and

*B. subtillis* (830 genes;

[48]). In the first two cases the Boolean functions are already known. However, the overwhelming majority of regulatory phrases for the gene transcription networks of

*E. coli*,

*S. cerevisiae* and

*B. subtilis* are still unknown. Due to this lack of information, to implement the Boolean dynamics on the GRN of these three organisms we used biased random Boolean functions generated with a gene expression probability

*p* inferred from microarray experiments. (In the next section we show that the map

*M*(x) does not change significantly for networks with a large fraction of canalizing functions.)

*Given* the network topology,

*p* can be estimated from microarray experiments by standard Bayessian parametric inference with two states (see

Methods). Using 223 microarrays to sample the gene expression space in

*S. cerevisiae* [49], 152 microarrays for

*E. coli* [49]–

[51], and 69 microarrays for

*B. subtillis* [52], we inferred several regulatory phrases for each of these organisms. All the experiments used for this analysis are listed in the Supporting Information accompanying this article (

Text S1). We only used regulatory phrases for which the average of the

*a posteriori* probability distribution was greater than 70% (activatory phrase) or smaller than 30% (inhibitory phrase). The inference technique is explained below in the

Materials and Methods section. With this technique, we obtained the following gene expression probabilities:

*p*=

0.576±0.038 for

*E. coli* (from 264 inferred phrases),

*p*=

0.495±0.055 for

*S. cerevisiae* (from 196 phrases), and

*p*=

0.0531±0.035 for

*B. subtilis* (from 307 phrases). We then constructed random Boolean functions with internal bias given by these probabilities.

We report in the Derrida curves (i.e. the graphs of

*M*(

*x*)) for the five genetic networks under consideration. It is apparent from this figure that in all five cases the Derrida curves arrive almost tangent to the identity close to the origin. The above is consistent with critical behavior at the genetic level. A polynomial regression analysis allows us to estimate the average network sensitivity

*S*=

[

*dM*(

*x*)/

*dx*]

_{x}_{=0} by computing the slope at the origin of the best-fit polynomial with a degree equal to the maximum number of regulators per gene in each case (this is the polynomial predicted by the mean-field theory). In all five cases the Regression Sum of Squares is below 10

^{−4}, and the average network sensitivities obtained in this way are:

*S*=

1.028 for

*E. coli*;

*S*=

1.036 for

*S. cerevisiae*;

*S*=

0.826 for

*B. subtilis*;

*S*=

0.914 for

*D. melanogaster*; and

*S*=

1.127 for

*A. thaliana*. Within numerical accuracy, these sensitivities show that the dynamics of these networks are very close to criticality. Note that this is particularly true for the two largest and most complete networks of

*E. coli* and

*S. cerevisiae*.

Criticality and its robustness revealed by varying the degree of canalization

We have used random Boolean functions in our simulations because the overwhelming majority of the logical rules for

*E. coli*,

*S. cerevisiae* and

*B. subtilis*, are still unknown. The caveat is that real biological functions are not random. It has been pointed out that canalizing functions are more realistic from a biological point of view

[27],

[28]. A formal definition of canalizing functions can be found in Ref.

[31]. Here, it suffices to define a canalizing function as follows. Let

*F*(

*g*_{1},

*g*_{2}, …,

*g*_{k}) be a Boolean function of

*k* arguments. We will say that

*F* is canalizing on one of its arguments

*g*_{i}, if the value of

*F* is determined by fixing the value of

*g*_{i} either to 0 or to 1. To illustrate this concept, shows a function

*F*(

*g*_{1},

*g*_{2},

*g*_{3}) that depends on three arguments. In this example,

*F*=

0 whenever

*g*_{2}=

1, regardless of the values of

*g*_{1} and

*g*_{3}. Therefore,

*F* is canalizing on

*g*_{2}. The biological significance of canalizing functions is based on the existence of dominant regulators. Thus, in the example shown in ,

*g*_{2} could represent a dominant repressor (like

*crp* in

*E. coli*) which, when present, blocks the transcription of the target gene regardless of the presence or absence of the activators.

| **Table 1**Example of canalizing function. |

It is known that the amount of canalizing functions in the system can change the dynamical regime in which the network operates

[29],

[30]. Therefore, it is important to determine if the dynamics of the large networks of

*E. coli*,

*S. cerevisiae* and

*B. subtilis* are still critical when more realistic Boolean functions are used. However, from the microarray experiments that we analyzed it is impossible to know the fraction of canalizing functions present in these organisms, or if such functions have one, two or more canalizing inputs. For instance, according to the

*regulonDB*, which gives the topology of the transcription regulatory network of

*E. coli* [46], there are at least 817 genes with two or more regulators. These 817 genes are all regulated by a subset of 160 genes. Therefore, to determine whether or not the Boolean functions associated with these genes are canalizing, one would have to analyze microarray experiments probing at least 2

^{160} different configurations for these genes. This is impossible for many reasons, but mainly because not all the 2

^{160} configurations of the regulators are biologically attainable. (Consider for instance the configuration 00000….00 in which all the regulators are “off”, or the configuration 11111…11 in which all the regulators are “on”. Clearly, these two configurations are not attainable under any biological condition—without killing the organism.) Therefore, the 152 microarrays used to sample the gene expression space in

*E. coli* represent a very tiny fraction of all the possible configurations necessary to determine the whole set of Boolean functions. However, these 152 experiments are absolutely relevant because they represent the

*observable and biologically attainable* gene expression configurations of the organism. Thus, it might be irrelevant if the whole Boolean function of a given gene is canalizing because neither us nor the organism are sampling its whole set of (mathematically possible) configurations. For this reason, we do believe that the important quantity is the

*observed* gene expression probability for the biologically attainable configurations.

Nonetheless, Random Boolean functions already contain canalizing functions. gives the probability

*P*_{c}(

*K*) for a randomly generated Boolean function with

*K* inputs to be canalizing on at least one of its inputs (data taken from

[31]). As we can see from , the probability for a randomly generated Boolean function to be canalizing is high for

*K*=

1, 2 and 3 (Boolean functions with

*K*=

1 are canalizing, by definition). On the other hand, shows the distribution

*P*_{E}(

*K*) of the number of genes with

*K* inputs in the genetic network of

*E. coli*, according to the last update of the regulonDB

[46].

| **Table 2**Fraction of canalizing functions. |

| **Table 3**Distribution of regulators per gene in E. coli. |

From and we obtain that, if the Boolean functions for

*E. coli* are generated at random, just by chance about

functions out of 1482 (70%) would be canalizing on at least one input. If we do not consider the 615 genes with only one regulator (because for such genes the Boolean function is trivially canalizing), and the 50 genes with no inputs, then there are 817 genes with two or more regulators. From the data listed in and one obtains that, for the genes with

*K*≥2, about

out of 817 of the randomly generated functions would be canalizing just by chance. It is important to note that these 421 canalizing functions mostly come from the genes with

*K*=

2 and

*K*=

3. Thus, combining the results presented in and we obtain that the fraction

*f*_{c} of canalizing functions present just by chance for the genes with

*K*≥2 is given by

*f*_{c}≈421/817≈0.51.

Therefore, in the simulations with random Boolean functions there is already a large fraction of canalizing functions. To further investigate the effect that canalizing functions might have on the dynamics of the genetic networks, we have added and removed canalizing functions to and from the ones already present just by chance. Our results indicate that the critical behavior observed in the genetic dynamics still exists even when the fraction of canalizing functions substantially deviates from the value expected for fully random functions.

shows the map

*M*(

*x*) for the

*E. coli* network for several values of the fraction

*f*_{c} of genes with

*K*≥2 regulated by canalizing functions. corresponds to the case in which there are more canalizing functions than the ones present just by chance, whereas shows the opposite case in which there are less canalizing functions. To compute

*f*_{c} we have ignored the genes with only one regulator, taking into account only the 817 genes with two or more regulators. We have already mentioned that in the

*E. coli* network,

*f*_{c}=

0.51 for fully random Boolean functions. Additionally, from we also see that for

*K*=

2 and

*K*=

3 the probability

*P*_{c}(

*K*) for a random Boolean function to be canalizing is relatively high (

*P*_{c}(2)

=

0.875 and

*P*_{c}(3)

=

0.468), whereas for

*K*≥4 the probability is extremely low. Therefore, what we did to increase the value of

*f*_{c} was to add canalizing functions only to the genes with

*K*≥4 in such a way as to preserve the overall value

*p*=

0.576±0.038 of the gene expression probability observed in microarray experiments. There are 230 genes with

*K*≥4, and we can consider that none of them are regulated by canalizing functions just by chance (the probability is very low). Therefore, to increase

*f*_{c} we

*forced* a fraction

*q* of these 230 genes to be regulated by canalizing functions. In we present the Derrida maps for

*q*=

0.1, 0.3, 0.5, 0.7 and 0.9. When considering the other 421 genes already regulated by canalizing functions just by chance, these values of

*q* correspond to

*f*_{c}=

0.544, 0.600, 0.656, 0.712, and 0.768, respectively. (We computed

*f*_{c} as

*f*_{c}=

(421+230×

*q*)/816). As we can see from , the map

*M*(

*x*) is practically the same even for

*f*_{c}=

0.656, namely, even when two thirds of the genes with two or more inputs are regulated by canalizing functions. Only for

*f*_{c}>0.7 significant deviations from criticality are observed (dotted-dashed curves in ).

Analogously, to decrease the value of

*f*_{c} we removed a fraction

*q* of the canalizing functions from the 421 genes which originally are regulated by canalizing functions just by chance. In other words, we forced a fraction

*q* of these 421 genes to be regulated by non-canalizing functions. In this case,

*f*_{c} is given by

*f*_{c}=

421(1−

*q*). shows the Derrida maps for

*q*=

0.1, 0.3, 0.5, 0.7 and 0.9.

From it is clear that the Derrida map is less robust to the removal than to the addition of canalizing functions. However, in the region 0.4≤*f*_{c}≤0.66 the Derrida map does not seem to change substantially.

To indicate the significance of the above results we present in the map

*M*(

*x*) for homogeneous random networks with different fractions

*f*_{c} of canalizing functions. In these networks each gene has exactly

*K*=

2 regulators chosen randomly from anywhere in the system, and the bias

*p* of the Boolean functions is

*p*=

0.5. With fully random functions the network operates in the critical regime. corresponds to the addition and to the removal of canalizing functions. In both cases we used networks with

*N*=

1481 (the same number of genes as in the

*E. coli* network). In a homogeneous random network with

*N* genes and connectivity

*K*, just by chance there are

*N*_{c}=

*N*×

*P*_{c}(

*K*) genes regulated by canalizing functions, and

*N*_{nc}=

*N*×(1−

*P*_{c}(

*K*)) genes regulated by non-canalizing functions. By forcing a fraction

*q* of these

*N*_{nc} genes to be regulated by canalizing functions, the overall fraction

*f*_{c} of canalizing functions in the network increases according to

*f*_{c}=

(

*N*_{c}+

*q*×

*N*_{nc})/

*N*. Analogously, by removing a fraction

*q* of the

*N*_{c} canalizing functions present just by chance, the fraction

*f*_{c} of remaining canalizing functions is

*f*_{c}=

*N*_{c}(1−

*q*). In and 4(b) we present results for

*q*=

0.1, 0.3, 0.5, 0.7 and 0.9, which produce the corresponding values of

*f*_{c} displayed in the Figure. As we can see from , for homogeneous random networks the addition or removal of even a small fraction of canalizing functions, on top of the ones that are already present by chance, has a much bigger effect on the dynamics than for the

*E. coli* network.

These results strongly suggest that the critical behavior observed in the dynamics of the genetic networks of real organisms, and its robustness to changes in the fraction of canalizing functions, are most probably due to the network architecture rather than to the logical rules that regulate the expression of the genes. If, for instance, there were a large fraction of genes with 5 inputs in the *E. coli* network, then the presence of canalizing functions would certainly affect the dynamical regime in which the network operates. But given that most of the genes have 1, 2 or 3 inputs, there is already a large fraction of canalizing functions just by chance. Adding more does not substantially affect the dynamical behavior.