Home | About | Journals | Submit | Contact Us | Français |

**|**Proc Biol Sci**|**v.276(1666); 2009 July 7**|**PMC2690470

Formats

Article sections

- Abstract
- 1. Introduction
- 2. Results
- 3. Discussion
- 4. Conclusion
- Appendix A. simulations of network evolution
- References

Authors

Related links

Proc Biol Sci. 2009 July 7; 276(1666): 2493–2501.

Published online 2009 April 8. doi: 10.1098/rspb.2009.0210

PMCID: PMC2690470

Received 2009 February 6; Accepted 2009 March 17.

Copyright © 2009 The Royal Society

This article has been cited by other articles in PMC.

Transcription networks have an unusual structure. In both prokaryotes and eukaryotes, the number of target genes regulated by each transcription factor, its *out*-degree, follows a broad tailed distribution. By contrast, the number of transcription factors regulating a target gene, its *in*-degree, follows a much narrower distribution, which has no broad tail. We constructed a model of transcription network evolution through *trans*- and *cis*-mutations, gene duplication and deletion. The effects of these different evolutionary processes on the network structure are enough to produce an asymmetrical *in*- and *out*-degree distribution. However, the parameter values required to replicate known *in*- and *out*-degree distributions are unrealistic. We then considered variation in the rate of evolution of a gene dependent upon its position in the network. When transcription factors with many regulatory interactions are constrained to evolve more slowly than those with few interactions, the details of the *in*- and *out*-degree distributions of transcription networks can be fully reproduced over a range of plausible parameter values. The networks produced by our model depend on the relative rates of the different evolutionary processes. By determining the circumstances under which the networks with the correct degree distributions are produced, we are able to assess the relative importance of the different evolutionary processes in our model during evolution.

Transcription regulation plays a key role in determining cellular function, response to external stimuli and development. Regulatory proteins orchestrate gene expression through thousands of interactions resulting in a system too complex to be easily understood in detail. This makes elucidation of gene regulation from a global perspective—that of the transcription network as a whole—an important challenge.

Genes in a transcription network either have outgoing edges, incoming edges or both. Outgoing edges from a gene represent the different targets that it regulates, while incoming edges at a gene represent the different transcription factors that regulate it. A number of studies (Thieffry *et al*. 1998; Guelzim *et al*. 2002; Maslov & Snepen 2005) have established that, in both prokaryotes and eukaryotes, the degree distributions for outgoing and incoming edges are very different. The *out*-degree distribution, *n*_{out}(*k*), follows a broad tailed distribution that is best described by a power-law:

$${n}_{out}(k)\propto {k}^{-\gamma}.$$

(1.1)

The exponent *γ* is observed to be in the range 1<*γ*<2 (Guelzim *et al*. 2002; Maslov & Snepen 2005). A power-law distribution indicates that there are a small number of hub transcription factors that regulate a large number of genes (Barabási & Oltvai 2004). Interpretation of power-law degree distributions, and the small world structure they confer, has been the focus of a great deal of attention (Barabási & Albert 1999; Bahn *et al*. 2002; Pastor-Sorras *et al*. 2002; Chung *et al*. 2003; Wagner 2003; Barabási & Oltvai 2004; Pagel *et al*. 2007). In particular, it has been suggested that a power-law distribution may deliver an evolutionary advantage through increased mutational robustness and evolvability (Barabási & Oltvai 2004).

However, the *in*-degree distribution of transcription networks is much narrower than a power-law and has no broad tail (Thieffry *et al*. 1998; Guelzim *et al*. 2002; Maslov & Snepen 2005). It is best described by an exponential distribution

$${n}_{in}(k)\propto {\text{e}}^{-\alpha k}.$$

(1.2)

The exponential *in*-degree distribution reflects the fact that only a few transcription factors combinatorially regulate any one gene. There exist no hub target genes. For example, in the yeast transcription network, 93 per cent of target genes are regulated by less than five transcription factors (Guelzim *et al*. 2002).

The extent to which the *in*- and *out*-degree distributions of transcription networks are different is intriguing, and the cause unknown. In this paper, we develop a model to explain the evolution of the asymmetrical transcription network degree distribution observed in yeast and other organisms. We focus on the different types of mutation through which the network evolves. Changes to the outgoing and incoming edges at a gene may occur as the result of mutation to a regulatory protein (*trans*-mutation) or as the result of mutation to transcription factor-binding sites (*cis*-mutation). These two processes change the network structure in different ways, but both result in either the loss or gain of regulatory interactions between existing genes. In addition, genes themselves may be lost or gained in the network through deletion and duplication.

The rates at which a gene evolves may vary according to its connectivity in the transcription network (Maslov *et al*. 2004; Wagner & Wright 2007). We investigate two types of connectivity-dependent evolution. It is often argued (Barabási & Oltvai 2004) that hub genes, which participate in many regulatory interactions, are particularly important for the proper functioning of the network, and are therefore constrained to evolve more slowly. This leads to the expectation of a slower rate of evolution among genes that regulate many downstream targets and a faster rate of evolution among genes that regulate only a few targets. It has also been suggested that a process of preferential attachment may occur in biological networks (Barabási & Oltvai 2004). Under preferential attachment, new interactions are gained in proportion to the number of interactions a node already participates in. Such a process has been shown to occur in protein–protein interactions networks (Wagner 2003; Pagel *et al*. 2007).

We construct a model incorporating evolution through *trans*- and *cis*-mutations, gene duplication and deletion along with variation in evolutionary rates depending on the connectivity of a gene. We use our model to unravel the relationship between the rates of evolution of genes through different processes in relation to the network structure.

There are four types of network mutation in out model—gene deletion and duplication, plus *cis*- and *trans*-mutation. The *in*- and *out*-degree distributions of the network are determined by the rates at which these different types of mutation become fixed in the transcription network of a population. Since there is a clear functional difference between genes that code for transcription factors and those that code for other types of protein, we separate genes into two groups. Those with regulatory functions are labelled transcription factors (TFs) and those that are only regulated are labelled target genes (TGs). TGs have only incoming edges, while TFs may have either outgoing or incoming edges.

We establish the equilibrium *in*- and *out*-degree distributions for four different versions of our model. In the first version, the rates of evolution are independent of a gene's connectivity. We then consider two types of connectivity dependence in TF evolution. In the second version of our model, there is connectivity dependence such that the TFs with a large number of interactions undergo *trans*-evolution more slowly than those with few interactions. This is referred to as *degree dependence* in the rate of *trans*-evolution. In the third version of our model, there is connectivity dependence such that TFs gain new targets at a rate proportional to the number of targets they regulate. This is referred to as *preferential attachment*. The final version of our model includes both degree dependence in the rate of *trans*-evolution and preferential attachment.

We assume that when genes are duplicated they inherit all the regulatory interactions of their parent. Evolution through duplication occurs at rate *D*^{+} and deletion occurs at rate *D*^{−} per gene (figure 1*a*). A TF of out-degree *k* gains outgoing edges due to duplication of its targets at rate *kD*^{+}, and loses outgoing edges due to deletion of its targets at rate *kD*^{−}. Similarly, a gene of *in*-degree *j* gains incoming edges due to duplication of TFs at rate *jD*^{+}, and loses incoming edges due to deletion of TFs at rate *jD*^{−}. If the rates of gene deletion and duplication are different, this will result in either growth (if the rate of duplication is greater than the rate of deletion), or decline (if the rate of deletion is greater than the rate of duplication) in the size of the network. We assume that the rate of growth (or decline) of the network is small compared to the rate of rewiring of regulatory interactions through *trans*- and *cis*-mutation (Gao & Innan 2004; Doniger & Fay 2007; Ward & Thornton 2007). Thus, we consider only networks of constant size, and therefore assume that ${D}^{+}={D}^{-}=D$.

A *trans*-mutation results in a change in the ability of TFs to bind to the promoter region of a gene. This may occur through a change in the binding affinity of a TF for a regulatory site. Alternatively, it may be the result of a TF gaining or losing an interaction with another TF, which helps it bind to the promoter region of a target (Tuch *et al*. 2008). Therefore, a *trans*-mutation in our model refers to a mutation affecting a transcription factor protein only. It does not refer to mutations affecting the *cis*-regulatory regions of *trans*-acting genes.

Following fixation of such a *trans*-mutation, a TF can cease to control some of the genes it currently regulates and can gain control over new genes. We assume that *trans*-evolution resulting in a TF potentially losing targets occurs at a constant rate ${\mu}_{trans}^{-}$. In this process, an existing target is lost with probability *m*. The probability, ${P}_{k,\mathrm{\Delta}k}^{-}$, that a TF with *k out*-edges loses Δ*k* of its targets following a *trans*-mutation is given by

$${P}_{k,\mathrm{\Delta}k}^{-}=\frac{k!}{\mathrm{\Delta}k!(k-\mathrm{\Delta}k)!}{m}^{\mathrm{\Delta}k}{(1-m)}^{k-\mathrm{\Delta}k}.$$

(2.1)

Similarly, we assume that *trans*-evolution resulting in the gain of new targets by a TF occurs at a constant rate ${\mu}_{trans}^{+}$, which is independent of the *out*-degree of the TF. Overall, *trans*-evolution results in a gene losing incoming edges at rate $m{\mu}_{trans}^{-}$ (per edge) and gaining a new incoming edge at rate ${\mu}_{trans}^{+}(1-(k/N))$ (figure 1*b*). The factor $(1-(k/N))$ gives the probability that the gene gaining the new incoming edge is not one of the *k* genes currently regulated by the mutated TF.

A *cis*-mutation results in the gain of a new binding site or the loss of an existing binding site in the promoter region of a gene. The rate at which binding sites are lost is ${\mu}_{cis}^{-}$. The probability that a gene, which is regulated by *k* TFs, loses an interaction through loss of a TF binding site is $k{\mu}_{cis}^{-}$. A gene may also gain a new regulatory binding site for any TF in the network to which it is not currently connected, at rate ${\mu}_{cis}^{+}$ (figure 1*c*). Therefore, a gene currently regulated by *k* TFs gains an incoming edge through *cis*-evolution at a rate ${\mu}_{cis}^{+}(1-(k/N))$. Throughout, we assume that the size of the network *N* is large compared to any realistic *in*- or *out*-degree *k*, so that the terms *k*/*N* may be neglected. Thus, new incoming edges are gained at constant rates ${\mu}_{trans}^{+}$ through *trans*-evolution and ${\mu}_{cis}^{+}$ through *cis*-evolution.

We also develop a model in which degree dependence in the rate of *trans*-evolution occurs. In this model, a *trans*-mutation, which results in TF-losing interactions, is fixed with a probability that depends on its *out*-degree. Since a *trans*-mutation affects the functioning of the transcription factor itself, it potentially alters all of the interactions in which a TF takes part. We assume that a *trans*-mutation at a TF with *k* targets has a deleterious effect on the functioning of the network that is proportional to *k*. Therefore, we assume that a *trans*-mutation resulting in the loss of edges from the network is fixed with probability proportional to 1/*k*. In this way, the rates of evolution of TFs are degree dependent.

A summary of all the parameters used in the model is given in table 1.

We allow evolution of the network by updating it at time intervals Δ*t*, taken so that at most one mutation occurs and goes to fixation within each interval. Hence, the mean field equation for the expected number of genes with *in*-degree *k* at time *t*, changes in the time interval Δ*t* by,

$$\begin{array}{l}\mathrm{\Delta}{n}_{in}=\left({\Pi}_{\text{TG}+}^{in}(k-1)+{\Pi}_{\text{TF}+}^{in}(k-1)\right){n}_{in}(k-1,t)\\ +\left({\Pi}_{\text{TG}-}^{in}(k+1)+{\Pi}_{\text{TF}-}^{in}(k+1)\right){n}_{in}(k+1,t)\\ -\left({\Pi}_{\text{TG}+}^{in}(k)+{\Pi}_{\text{TF}+}^{in}(k)+{\Pi}_{\text{TG}-}^{in}(k)+{\Pi}_{\text{TF}-}^{in}(k)\right){n}_{in}(k,t)\end{array}$$

(2.2)

where ${\Pi}_{\text{TG}+}^{in}(k)$ and ${\Pi}_{\text{TG}-}^{in}(k)$ are the probabilities of a gene with *in*-degree *k* gaining or losing an edge through mutation at the regulated gene; and ${\Pi}_{\text{TF}+}^{in}(k)$ and ${\Pi}_{\text{TF}-}^{in}(k)$ are the probabilities of a gene with *in*-degree *k* gaining or losing an edge through mutation at a TF regulating it, in the time interval Δ*t*. Similarly, the expected number of genes with *out*-degree *k* at time *t* changes in the time interval Δ*t* by

$$\mathrm{\Delta}{n}_{out}=\left({\Pi}_{\text{TG}+}^{out}(k-1)+{\Pi}_{\text{TF}+}^{out}(k-1)\right){n}_{out}(k-1,t)+{\Pi}_{\text{TG}-}^{out}(k+1){n}_{out}(k+1,t)-\left({\Pi}_{\text{TG}+}^{out}(k)+{\Pi}_{\text{TG}-}^{out}(k)+{\Pi}_{\text{TF}+}^{out}(k)\right){n}_{out}(k,t)+\sum _{j=k}^{N}{\Pi}_{\text{TF}-}^{out}(j,k){n}_{out}(j,t)-\sum _{j=0}^{k}{\Pi}_{\text{TF}-}^{out}(k,j){n}_{out}(k,t),$$

(2.3)

where *N* is the number of genes (TFs and TGs) in the network; ${\Pi}_{\text{TG}+}^{out}(k)$ and ${\Pi}_{\text{TG}-}^{out}(k)$ are the probabilities of a gene with *out*-degree *k* gaining or losing an edge through mutation at one of its targets; ${\Pi}_{\text{TF}+}^{out}(k)$ is the probability that a TF with *out*-degree *k* gains a target through mutation at the TF; and ${\Pi}_{\text{TF}-}^{out}(j,k)$ is the probability that a TF with *out*-degree *j*≥*k* loses interactions to become a TF with *out*-degree *k* due to mutation at the TF.

The equilibrium *in*-and *out*-degree distributions for the model can be found from equations (2.2) and (2.3), by setting the left-hand sides of both equations to 0 (see the electronic supplementary material). The equilibrium *in*-degree distribution satisfies

$$\left({\Pi}_{\text{TF}-}^{in}(k+1)+{\Pi}_{\text{TG}-}^{in}(k+1)\right){n}_{in}(k+1)=\left({\Pi}_{\text{TF}+}^{in}(k)+{\Pi}_{\text{TG}+}^{in}(k)\right){n}_{in}(k).$$

(2.4)

After making a number of approximations (see the electronic supplementary material), the equilibrium *out*-degree distribution satisfies

$$\left({\Pi}_{\text{TG}-}^{out}(k+1)+(k+1){\mu}_{trans}^{-}K(\gamma ,m)\right){n}_{out}(k+1)=\left({\Pi}_{\text{TG}+}^{out}(k)+{\Pi}_{\text{TF}+}^{out}(k)-k{\mu}_{trans}^{-}K(\gamma ,m)\right){n}_{out}(k),$$

(2.5a)

for the model excluding degree dependence in the rate of *trans*-evolution and

$$\left({\Pi}_{\text{TG}-}^{out}(k+1)+{\mu}_{trans}^{-}K(\gamma ,m)\right){n}_{out}(k+1)=\left({\Pi}_{\text{TG}+}^{out}(k)+{\Pi}_{\text{TF}+}^{out}(k)-{\mu}_{trans}^{-}K(\gamma ,m)\right){n}_{out}(k),$$

(2.5b)

for the model including degree dependence in the rate of *trans*-evolution. The positive parameter *γ* arises from the approximations used to obtain equations (2.5*a*) and (2.5*b*) (see the electronic supplementary material), and the functions *K*(*γ*,*m*) are specific to each of the models we consider and will be described below.

We now solve equations (2.4), (2.5*a*) and (2.5*b*) for the *in*- and *out*-degree distributions for four specific models of transcription network evolution. We start using a simple model and then investigate different models including degree dependence in the rate of *trans*-evolution and preferential attachment, to ask what conditions are required to explain the observed difference between the *in*- and *out*-degree distributions of transcription networks.

In the first model, we assume there is neither any degree dependence nor any preferential attachment in the rate of *trans*-evolution. The event probabilities for the *in*- and *out*-degree distributions in this model are given in table 2*a*. Substituting these in equations (2.4) and (2.5*a*), we find for the *in*-degree distribution (see the electronic supplementary material)

$${n}_{in}(k)\propto {k}^{-\lambda}\phantom{\rule{0.25em}{0ex}}{\text{e}}^{-\alpha k},$$

where

$$\begin{array}{l}\alpha =\text{ln}\left(1+\frac{{\mu}_{cis}^{-}+m{\mu}_{trans}^{-}}{D}\right),\\ \lambda =1-\frac{{\mu}_{cis}^{+}+{\mu}_{trans}^{+}}{D}.\end{array}$$

(2.6)

This is approximately an exponential distribution, characterized by *α*, unless *α* is small, which occurs if $D\gg {\mu}_{cis}^{-}+{\mu}_{trans}^{-}m$, or *λ* is large and negative, which occurs if $D\ll {\mu}_{cis}^{+}+{\mu}_{trans}^{+}$.

The equilibrium *out*-degree distribution for this model obtained from equation (2.5*a*) is

$${n}_{out}(k)\propto {k}^{-\gamma}\phantom{\rule{0.25em}{0ex}}{\text{e}}^{-\beta k},$$

where

$$\begin{array}{l}\beta =\text{ln}\left(\frac{{\mu}_{cis}^{-}+D+{\mu}_{trans}^{-}K(\gamma ,m)}{D-{\mu}_{trans}^{-}K(\gamma ,m)}\right),\\ \gamma =1-\frac{{\mu}_{cis}^{+}+{\mu}_{trans}^{+}}{D-{\mu}_{trans}^{-}K(\gamma ,m)},\end{array}$$

(2.7)

and *K*(*γ*,*m*) is as in table 2*b*. This distribution is a power-law characterized by *γ* only if *β* is 0. This occurs if ${\mu}_{cis}^{-}=-2{\mu}_{trans}^{-}K(\gamma ,m)$. However, as the rates, ${\mu}_{cis}^{-}$ and ${\mu}_{trans}^{-}$, are both positive constants, and $K(\gamma ,m)>0$ (table 2*b*), this condition cannot be met. Therefore, this model cannot produce a power-law *out*-degree distribution.

In this model, we allow degree dependence in the rate of *trans*-evolution. Substituting the event probabilities for this model (table 2*a*) into equations (2.4) and (2.5*b*), we find for the *in*-degree distribution

$${n}_{in}(k)\propto {k}^{-\lambda}\phantom{\rule{0.25em}{0ex}}{\text{e}}^{-\alpha k},$$

where

$$\begin{array}{l}\alpha =\text{ln}\left(1+\frac{{\mu}_{cis}^{-}+m\langle \frac{1}{k}\rangle {\mu}_{trans}^{-}}{D}\right),\\ \lambda =1-\frac{{\mu}_{cis}^{+}+{\mu}_{trans}^{+}}{D},\end{array}$$

(2.8)

and $\langle 1/k\rangle ={\sum}_{\mathit{j}=1}^{N}{n}_{out}(j)/j$ determines the mean rate of *trans*-evolution across the network. Following the same procedure as for model 1, this distribution will be approximately exponential unless *α* is small or *λ* is large and negative, which occurs when $D\gg {\mu}_{cis}^{-}+\langle 1/k\rangle {\mu}_{trans}^{-}m$ or $D\ll {\mu}_{cis}^{+}+{\mu}_{trans}^{+}$, respectively.

The equilibrium *out*-degree distribution for this model is

$${n}_{out}(k)\propto {k}^{-\gamma}\phantom{\rule{0.25em}{0ex}}{\text{e}}^{-\beta k},$$

where

$$\begin{array}{l}\beta =\text{ln}\left(1+\frac{{\mu}_{cis}^{-}}{D}\right),\\ D(\gamma -1)+{\mu}_{cis}^{+}+{\mu}_{trans}^{+}={\mu}_{trans}^{-}K(\gamma ,m)\left(1+\frac{D}{D+{\mu}_{cis}^{-}}\right),\end{array}$$

(2.9)

and *K*(*γ*,*m*) is as in table 2*b*. This distribution is a power-law characterized by *γ* only if *β* is 0. This occurs if $D\gg {\mu}_{cis}^{-}$. Under this condition, equation (2.9) has solutions with *γ*>1 provided $m{\mu}_{trans}^{-}>{\mu}_{cis}^{+}+{\mu}_{trans}^{+}$.

This model includes preferential attachment, but excludes degree dependence in the rate of *trans*-evolution (considered in model 2). In preferential attachment models, the rate at which nodes gain new edges is proportional to the number of edges already attaching to them. Preferential attachment has been discussed widely in the study of other biological networks (Wagner 2003; Barabási & Oltvai 2004; Pagel *et al*. 2007), including in the protein–protein interaction network of yeast (Wagner 2003).

We model preferential attachment of incoming and outgoing edges separately. For incoming edges our model is as follows: new edges arise due to *trans*-evolution at rate ${\mu}_{trans}^{+}$. When such a new edge arises, it may be either through preferential attachment or through random attachment (i.e. the new edge attaches to each gene with equal probability) at the gene that is regulated. In the case of preferential attachment, the probability that a gene gains a new incoming edge is proportional to its *in*-degree. In the case of random attachment, the probability that a gene gains a new incoming edge is independent of its *in*-degree. We assume that such new edges undergo preferential attachment to a gene at rate ${\mu}_{trans}^{\text{P}}$, and undergo random attachment at a rate ${\mu}_{trans}^{\text{R}}$. The rate at which a gene of *in*-degree *k* gains a new edge due to preferential attachment is $k{\mu}_{trans}^{\text{P}}$, and the rate at which it gains a new edge due to random attachment is ${\mu}_{trans}^{\text{R}}$. The total rate at which TFs gain new outgoing edges is then ${\mu}_{trans}^{+}=(E/{N}_{\text{TF}}){\mu}_{trans}^{\text{P}}+(N/{N}_{\text{TF}}){\mu}_{trans}^{\text{R}}$, where *E* is the total number of edges in the network.

Our model of preferential attachment for outgoing edges is of the same form: new edges arise due to *cis*-evolution at rate ${\mu}_{cis}^{+}$. The rate at which a TF of *out*-degree *k* gains new outgoing edges due to preferential attachment is then $k{\mu}_{cis}^{\text{P}}$, and the rate at which it gains new edges due to random attachment is ${\mu}_{cis}^{\text{R}}$. The total rate at which genes gain new incoming edges is then ${\mu}_{cis}^{+}=(E/N){\mu}_{cis}^{\text{P}}+({N}_{\text{TF}}/N){\mu}_{cis}^{\text{R}}$. Our model of preferential attachment is illustrated in figure 2*a*.

(*a*) Preferential attachment: (i) preferential attachment of incoming edges. A TF choosing between a TG with three incoming edges and a TG with two incoming edges gains an interaction with the first with probability 0.6 and with the second with probability **...**

The event probabilities for the *in*- and *out*-degree distributions in this model are given in table 2*a*. Substituting these into equations (2.4) and (2.5*a*), the *in*-degree distribution is

$${n}_{in}(k)\propto {k}^{-\lambda}\phantom{\rule{0.25em}{0ex}}{\text{e}}^{-\alpha k},$$

where

$$\alpha =\text{ln}\left(\frac{D+{\mu}_{cis}^{-}+m{\mu}_{trans}^{-}}{D+{\mu}_{trans}^{\text{P}}}\right),$$

$$\lambda =1-\frac{{\mu}_{cis}^{+}+{\mu}_{trans}^{\text{R}}}{D+{\mu}_{trans}^{\text{P}}}.$$

(2.10)

This distribution will be approximately exponential unless *α* is small or *λ* is large and negative. That is, unless $D+{\mu}_{trans}^{\text{P}}\gg {\mu}_{cis}^{-}+{\mu}_{trans}^{-}m$, or $D+{\mu}_{trans}^{\text{P}}\ll {\mu}_{cis}^{+}+{\mu}_{trans}^{\text{R}}$. Therefore, we require ${\mu}_{trans}^{\text{P}}\sim {\mu}_{cis}^{-}+m{\mu}_{trans}^{-}$. The equilibrium *out*-degree distribution for this model is

$${n}_{out}(k)\propto {k}^{-\gamma}\phantom{\rule{0.25em}{0ex}}{\text{e}}^{-\beta k},$$

where

$$\beta =\text{ln}\left(\frac{{\mu}_{cis}^{-}+D+{\mu}_{trans}^{-}K(\gamma ,m)}{D+{\mu}_{cis}^{\text{P}}-{\mu}_{trans}^{-}K(\gamma ,m)}\right),$$

$$\gamma =1-\frac{{\mu}_{cis}^{\text{R}}+{\mu}_{trans}^{+}}{D+{\mu}_{cis}^{\text{P}}-{\mu}_{trans}^{-}K(\gamma ,m)},$$

(2.11)

and *K*(*γ*,*m*) is as in table 2*b*. This distribution is a power-law characterized by *γ* only if *β* is 0. This occurs if ${\mu}_{cis}^{-}={\mu}_{cis}^{\text{P}}-2{\mu}_{trans}^{-}K(\gamma ,m)$. Equation (2.11) then gives $\gamma =1-(({\mu}_{cis}^{\text{R}}+{\mu}_{trans}^{+})/(D+(1/2)({\mu}_{cis}^{-}+{\mu}_{cis}^{\text{P}})))$, and the only solutions have *γ*<1.

In the final model, we include degree dependence (as described in model 2) and preferential attachment (as described in model 3). The event probabilities for this model are given in table 2*a*. Using these with equations (2.4) and (2.5*b*), we find for the *in*-degree distribution

$${n}_{in}(k)\propto {k}^{-\lambda}\phantom{\rule{0.25em}{0ex}}{\text{e}}^{-\alpha k},$$

where

$$\alpha =\text{ln}\left(\frac{D+{\mu}_{cis}^{-}+m\langle \frac{1}{k}\rangle {\mu}_{trans}^{-}}{D+{\mu}_{trans}^{\text{P}}}\right),$$

$$\lambda =1-\frac{{\mu}_{cis}^{+}+{\mu}_{trans}^{\text{R}}}{D+{\mu}_{trans}^{\text{P}}}.$$

(2.12)

This distribution is approximately exponential unless *α* is small or *λ* is large and negative. That is, unless $D+{\mu}_{trans}^{\text{P}}\gg {\mu}_{cis}^{-}+\langle 1/k\rangle {\mu}_{trans}^{-}m$, or $D+{\mu}_{trans}^{\text{P}}\ll {\mu}_{cis}^{+}+{\mu}_{trans}^{\text{R}}$. Therefore, we require ${\mu}_{trans}^{\text{P}}\sim {\mu}_{cis}^{-}+m\langle 1/k\rangle {\mu}_{trans}^{-}$. The equilibrium *out*-degree distribution for this model is

$${n}_{out}(k)\propto {k}^{-\gamma}\phantom{\rule{0.25em}{0ex}}{\text{e}}^{-\beta k},$$

where

$$\begin{array}{l}\beta =\text{ln}\left(\frac{D+{\mu}_{cis}^{-}}{D+{\mu}_{cis}^{\text{P}}}\right),\\ \left(D+{\mu}_{cis}^{\text{P}}\right)(\gamma -1)+{\mu}_{cis}^{\text{R}}+{\mu}_{trans}^{+}={\mu}_{trans}^{-}K(\gamma ,m)\left(1+\frac{D+{\mu}_{cis}^{\text{P}}}{D+{\mu}_{cis}^{-}}\right),\end{array}$$

(2.13)

and *K*(*γ*,*m*) is as in table 2*b*. This distribution is a power-law characterized by *γ* only if *β* is 0. This requires ${\mu}_{cis}^{-}={\mu}_{cis}^{\text{P}}$. Under this condition, the third term in equation (2.13) has solutions with *γ*>1 provided $m{\mu}_{trans}^{-}>{\mu}_{cis}^{\text{R}}+{\mu}_{trans}^{+}$.

To assess the four models we have presented, we compare their results to empirical observations from the yeast transcription network. The *out*-degree distribution of the *Saccharomyces cerevisiae* transcription network is best described by a power-law distribution with an exponent *γ*=1.5, while the *in*-degree distribution is best described by an exponential distribution with exponent *α*=0.4 (Maslov & Snepen 2005).

Since the exponent of the *out*-degree distribution for yeast is greater than 1, we conclude that models 1 and 3, which do not include degree dependence in the rate of *trans*-evolution, cannot account for the observed *out*-degree distribution of the *S. cerevisiae* transcription network. However, models 2 and 4, which include degree dependence in the rate of transcription factor evolution, can both produce networks with power-law *out*-degree distributions whose exponent is *γ*>1. Therefore, we conclude that degree dependence in the rate of transcription factor evolution is necessary to reproduce the structure of the yeast transcription network.

We can further distinguish between models 2 and 4 by referring to empirical data on the rates of evolution in the yeast transcription network. The rate of gene duplication in yeast is found to be in the range 1×10^{−5}–6×10^{−5}Myr^{−1} (Gao & Innan 2004). The rate of evolution (gain or loss) of regulatory interactions is an order of magnitude higher, approximately 36×10^{−5}Myr^{−1} (Gu *et al*. 2005). Evolution of regulatory interactions may occur due to changes in regulatory proteins (*trans*-mutations in our model) or due to changes in *cis*-regulatory elements. A *trans*-mutation in our model refers to a mutation affecting a transcription factor protein only. It does not refer to mutations affecting the *cis*-regulatory regions of *trans*-acting genes. In practice, it is difficult to distinguish between the effects of the *trans*- and *cis*-mutations of our model without much more detailed comparative data. Studies on the contribution of the evolution of *cis*-regulatory elements and of *trans*-acting proteins to the evolution of gene expression have mixed findings. Variation between yeast strains have been found to be mainly due to variation in *trans*-acting proteins by some studies (Yvert *et al*. 2003; Zhang *et al*. 2004; Wang *et al*. 2007), while this has been contradicted by others (Ronald *et al*. 2005).

In model 2, a power-law *out*-degree distribution is only produced if $D\gg {\mu}_{cis}^{-}$. If we consider the case in which *trans*-evolution is more rapid than *cis*-evolution, then, given a rate of evolution of regulatory interactions of 36×10^{−5}Myr^{−1} (Gu *et al*. 2005) and a rate of gene duplication of range 1×10^{−5}–6×10^{−5}Myr^{−1} (Gao & Innan 2004), model 2 suggests that the loss of regulatory interactions must be approximately 99 per cent due to *trans*-evolution. Such a disproportionate rate is not consistent with empirical data on the relative contributions of *trans*- and *cis*-change to the evolution of gene expression in yeast (Yvert *et al*. 2003; Zhang *et al*. 2004; Ronald *et al*. 2005; Wang *et al*. 2007). Therefore, we can reject model 2, as inadequate to explain the structure of the yeast transcription network.

Model 4 can produce a power-law *out*-degree distribution provided ${\mu}_{cis}^{-}={\mu}_{cis}^{\text{P}}$. This requirement means that the rate at which transcription factors lose connections to target genes through *cis*-mutations must be balanced by the rate at which they gain new targets through preferential attachment. From this we also conclude that preferential attachment for outgoing edges is necessary to produce the observed yeast transcription network. The condition ${\mu}_{cis}^{-}={\mu}_{cis}^{\text{P}}$ is identical to a model in which transcription factors undergo rewiring (figure 2*b*), and suggests that transcription factors undergo a constant turnover of targets, without net gain or loss.

In order to determine whether preferential attachment among incoming edges occurs, we must consider the *in*-degree distribution of model 4. This is given by equation (2.12), with an exponential exponent, *α*, of approximately 0.4. Given a low rate of duplication, equation (2.12) suggests that preferential attachment of incoming edges at target genes is also necessary to reproduce the structure of the yeast transcription network. Figure 3 shows the result of simulations using model 4, which confirm that this model can reproduce the observed structure of the yeast transcription network.

Our model for loss of interactions through *trans*-evolution includes two parameters, ${\mu}_{trans}^{-}$, the rate at which *trans*-mutations are fixed, and *m*, the probability each interaction is lost given that a *trans*-mutation is fixed. This means that following a *trans*-mutation a transcription factor will retain, on average, a fraction 1−*m* of its interactions. As it is difficult to estimate *m*, we consider two important cases: *m*→0 and *m*=1. In the first case, transcription factors evolve by small changes, one interaction at a time. In the second case, transcription factors lose all their existing interactions, and subsequently gain new ones through both *cis*- and *trans*-evolution. In this case, the TF may be seen as completely losing its old function before acquiring a new function.

When *m*→0, equation (2.13) for the *out*-degree distribution in model 4 may be used to obtain the approximation

$$\gamma =1-\frac{{\mu}_{cis}^{\text{R}}+{\mu}_{trans}^{+}}{D+{\mu}_{cis}^{\text{P}}}+\frac{m{\mu}_{trans}^{-}}{D+{\mu}_{cis}^{\text{P}}}.$$

(3.1)

Similarly, if *m*=1, equation (2.12) may be used to obtain

$$\gamma =\frac{1}{2}\left(1-\frac{{\mu}_{cis}^{\text{R}}+{\mu}_{trans}^{+}}{D+{\mu}_{cis}^{\text{P}}}+\sqrt{{\left(1-\frac{{\mu}_{cis}^{\text{R}}+{\mu}_{trans}^{+}}{D+{\mu}_{cis}^{\text{P}}}\right)}^{2}+\frac{4{\mu}_{trans}^{-}}{D+{\mu}_{cis}^{\text{P}}}}\right).$$

(3.2)

Therefore, given measurements of the relative rates of *cis*- and *trans*-evolution, it would be possible to distinguish between these two cases.

Given values for the other parameters, the value of ${\mu}_{trans}^{-}$ required in equation (3.2) to produce *γ*=1.5 will be greater than the value of $m{\mu}_{trans}^{-}$ required in equation (3.1) to produce the same distribution. The rate at which interactions are lost through *trans*-evolution is proportional to $m{\mu}_{trans}^{-}$. Therefore, the case *m*→0 is consistent with a slower rate of loss of interactions though *trans*-evolution than the case *m*=1. This can be compared with recent work (Gu 2009), suggesting that gene network evolution may be characterized by a 2-2-1 pattern (net gain of two genes and two edges along with loss of one edge). In our model, the ratio of gain of two edges to loss of one edge is more consistent with the case *m*→0 than with the case *m*=1.

We have considered networks in which the rates of gene duplication and deletion balance. However, it is well known that duplication growth models of networks can produce power-law distributions (Bahn *et al*. 2002; Pastor-Sorras *et al*. 2002; Chung *et al*. 2003; Gu 2009). We have not considered growing networks for two reasons. First, the observed low rate of gene duplication in yeast means that genes will undergo rewiring events at a rate that is 10-fold greater than the rate of duplication events. Second, the observed rates of gene duplication and deletion are comparable (Gao & Innan 2004) and suggests that the yeast transcription network is not undergoing constant growth. Therefore, any model that relies on network growth by duplication to reproduce the observed degree distributions in the yeast transcription network is not consistent with the data.

We have also investigated the case of shrinking networks. Although it is obvious that real networks cannot be continuously shrinking, the recent whole genome duplication in yeast (Kellis *et al*. 2004) means that there have been a great many redundant genes that have been lost resulting in an increased rate of gene deletion. Thus, the network has recently been undergoing a period of evolution in which it has been shrinking. We have considered a model in which a network is shrinking (see the electronic supplementary material). We show that this model is not able to reproduce the observed structure of the yeast transcription network without both degree dependence in the rate of *trans*-evolution and preferential attachment. Therefore, this model does not alter our conclusions.

In the analysis above, we neglected autoregulation of transcription factors. Autoregulation alters the consequences of transcription factor duplication. When an autoregulating transcription factor with *k* outgoing edges is duplicated, it gains an edge and becomes a transcription factor with *k*+1 outgoing edges. In our model, we assume that the transcription factors regulate each of the possible *N* targets with equal probability. Therefore, the probability that a transcription factor of *out*-degree *k* autoregulates is *k*/*N*. So the rate at which new transcription factors with *out*-degree *k* are produced due to duplication of autoregulators is $(k-1)(D/N){n}_{out}(k-1)$, and the rate at which transcription factors with *out*-degree *k* are lost due to duplication of autoregulators is $k(D/N){n}_{out}(k)$. Therefore, duplication of autoregulators provides a mechanism for a form of preferential attachment, since it results in transcription factors gaining new outgoing edges at a rate proportional to their *out*-degree. However, the rate at which this preferential attachment occurs is ~(1/*N*) times the rate of gene duplication, *D*. Since *N* is large, duplication of autoregulating transcription factors is therefore expected to have little impact on the equilibrium degree distributions produced by our models. To verify these arguments, we carried out simulations in which autoregulation was permitted in each of the four models (data not shown). The results showed that autoregulation had only a minor quantitative effect on the outcome of the models provided the rate of duplication *D* was not high. We also note empirical findings in the yeast transcription network, which show that only 12 out of 131 (9%) of transcription factors admit autoregulation (Milo *et al*. 2004). Given this, the rate at which new edges are produced through duplication of autoregulating transcription factors is approximately an order of magnitude less than the rate of gene duplication. Even at this rate of autoregulation, duplication of autoregulating transcription factors will not have a significant impact on the degree distribution of the network.

We have compared four simple models for the evolution of transcription networks. Genes are separated into regulatory transcription factors and non-regulatory target genes, which evolve through mutation of *trans*- and *cis*-elements, as well as through deletion and duplication. When rates of evolution are constant across the network, our model can reproduce the exponential *in*-degree and power-law *out*-degree distributions characteristic of transcription networks. However, this model cannot produce networks with the power-law exponent observed in the *out*-degree of the yeast transcription network. It is only when the effects of variation in the rate of protein evolution are taken into account that the correct degree distributions are fully reproduced. This variation takes two forms. First, degree dependence in the rate of *trans*-evolution, meaning that the more regulatory interactions a transcription factor participates in, the more slowly it undergoes *trans*-evolution. Second, preferential attachment, meaning that genes gain new interactions at a rate proportional to the number of interactions they already participate in. The requirement for preferential attachment can be relaxed if the rate of evolution through gene duplication and deletion is high compared to the rate of *cis*-evolution.

We have proposed a model in which the rate of *trans*-evolution among transcription factors varies in inverse proportion to the number of targets they regulate. The true rate of *trans*-evolution depends on the rate of evolution of gene sequence and gene expression (Tirosh & Barkai 2008). The relationship between the evolution of gene expression, gene sequence and position in the transcription network is likely to be complex and is not fully understood (Wagner 2003). Our model suggests that variation in the rate of *trans*-evolution with the position of a gene in an interaction network significantly affects the structure of that network. We have considered these effects in relation to the structure of transcription networks, although they may also play a role in shaping the structure of protein interaction networks and metabolic networks.

Simulations were carried out using ensembles of 1000 networks, each with an expected size of 100 TFs and 100 TGs. Networks were subject to 10^{6} mutations after which the average degree distributions were taken over the ensemble, and the mean degree distributions determined. The evolutionary algorithm used allowed networks to vary in size between a lower and upper boundary of 50 and 150 nodes, for both TFs and TGs. Loss of interactions through *trans*-mutation was executed by deleting each of a TF's outgoing edges with probability *m*. For gain of new interactions, random attachment was executed by selecting a gene and a TF at random and adding an edge between them. Preferential attachment of incoming edges was executed by selecting a gene with a probability proportional to its *in*-degree and a TF at random. A new edge was then added between them. Similarly for preferential attachment of outgoing edges, a TF was selected with probability proportional to its *out*-degree, and another gene was selected at random. An interaction was then added between them. Simulations were run for a range of parameter values. Data shown are for *m*=0.01, corresponding to the case *m*→0 (equation (3.1)). The rate of duplication used is *D*=0.26, the rate of gain of interactions through *trans*-evolution is ${\mu}_{trans}^{+}=0.04$, and through *cis*-evolution is ${\mu}_{cis}^{+}=0.31$. The rate of loss of interactions through *trans*-evolution is $m{\mu}_{trans}^{-}=0.25$ and through *cis*-evolution is ${\mu}_{cis}^{-}=0.14$.

This work was supported by an EPSRC Studentship (A.J.S.) awarded as part of the CoMPLEX Doctoral Training Centre.

In addition it contains details of a model in which a network is shrinking, corresponding to the evolution of the yeast transcription network following whole genome duplication

Click here to view.^{(392K, doc)}

- Bahn A., Galas D., Dewey T. A duplication growth model of gene expression networks. Bioinformatics. 2002;18:1486–1493. doi:10.1093/bioinformatics/18.11.1486 [PubMed]
- Barabási A., Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512. doi:10.1126/science.286.5439.509 [PubMed]
- Barabási A., Oltvai Z. Network biology: understanding the cell's functional organization. Nat. Genet. 2004;5:101–113. doi:10.1038/nrg1272 [PubMed]
- Chung F., Lu L., Dewey G. Duplication models for biological networks. J. Comput. Biol. 2003;10:677–687. doi:10.1089/106652703322539024 [PubMed]
- Doniger S., Fay J. Frequent gain and loss of functional transcription factor binding sites. PLoS Comput. Biol. 2007;3:0932–0942. doi:10.1371/journal.pcbi.0030099 [PMC free article] [PubMed]
- Gao L., Innan H. Very low gene duplication rate in the yeast genome. Science. 2004;306:1367–1370. doi:10.1126/science.1102033 [PubMed]
- Gu X. An evolutionary model for the origin of modularity in a complex gene network. J. Exp. Zool. B: Mol. Dev. Evol. 2009;312:75–82. doi:10.1002/jez.b.21249 [PubMed]
- Gu X., Zhang Z., Wei H. Rapid evolution of expression and regulatory divergences after yeast gene duplication. Proc. Natl Acad. Sci. USA. 2005;102:707–712. doi:10.1073/pnas.0409186102 [PubMed]
- Guelzim N., Bottani S., Bourgine P., Képès F. Topological and causal structure of the yeast transcriptional regulatory network. Nat. Genet. 2002;31:60–63. doi:10.1038/ng873 [PubMed]
- Kellis M., Birren B., Lander E. Proof and evolutionary analysis of ancient genome duplication in the yeast
*Saccharomyces cerevisiae*. Nature. 2004;428:617–624. doi:10.1038/nature02424 [PubMed] - Maslov S., Snepen K. Computational architecture of the yeast regulatory network. Phys. Biol. 2005;2:S94–S100. doi:10.1088/1478-3975/2/4/S03 [PubMed]
- Maslov S., Sneppen K., Eriksen K., Yan K. Upstream plasticity and downstream robustness in evolution of molecular networks. BMC Evol. Biol. 2004;4:9. doi:10.1186/1471-2148-4-9 [PMC free article] [PubMed]
- Milo R., Itzkovitz S., Kashtan N., Levitt R., Shen-Orr S., Ayzenshtat I., Sheffer M., Alon U. Superfamilies of evolved and designed networks. Science. 2004;303:1538–1542. doi:10.1126/science.1089167 [PubMed]
- Pagel M., Meade A., Scott D. Assembly rules for protein networks derived from phylogenetic–statistical analysis of whole genomes. BMC Evol. Biol. 2007;7:S16. doi:10.1186/1471-2148-7-S1-S16 [PMC free article] [PubMed]
- Pastor-Sorras R., Smith E., Sole R. Evolving protein interaction networks through gene duplication. J. Theor. Biol. 2002;222:199–210. doi:10.1016/S0022-5193(03)00028-6 [PubMed]
- Ronald J., Brem R., Whittle J., Kruglyak L. Local regulatory variation in
*Saccharomyces cerevisiae*. PLoS Genet. 2005;1:213–222. doi:10.1371/journal.pgen.0010025 [PubMed] - Thieffry D., Huerta A., Pérez-Rueda E., Collado-Vides J. From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in
*Escherichia coli*. Bioessays. 1998;20:433–440. doi:10.1002/(SICI)1521-1878(199805)20:5<433::AID-BIES10>3.0.CO;2-2 [PubMed] - Tirosh I., Barkai N. Evolution of gene sequence and gene expression are not correlated in yeas. Trends Genet. 2008;24:109–113. doi:10.1016/j.tig.2007.12.004 [PubMed]
- Tuch B., Li H., Johnson A. Evolution of eukaryotic transcription circuits. Science. 2008;319:1797–1799. doi:10.1126/science.1152398 [PubMed]
- Wagner A. How the global structure of protein interaction networks evolves. Proc. R. Soc. Lond. B. 2003;270:457–466. doi:10.1098/rspb.2002.2269 [PMC free article] [PubMed]
- Wagner A., Wright J. Alternative routes and mutational robustness in complex regulatory networks. Biosystems. 2007;88:163–172. doi:10.1016/j.biosystems.2006.06.002 [PubMed]
- Wang D., Sung H., Wang T. Expression evolution in yeast genes of single-input modules is mainly due to changes in
*trans*-acting factors. Genome Res. 2007;17:1161–1169. doi:10.1101/gr.6328907 [PubMed] - Ward J., Thornton J. Evolutionary models for formation of network motifs and modularity in the
*Saccharomyces*transcription factor network. PLoS Comput. Biol. 2007;3:1993–2002. doi:10.1371/journal.pcbi.0030198 [PMC free article] [PubMed] - Yvert G., Brem R., Whittle J., Akey J., Foss E., Smith E., Mackelprang R., Kruglyak L.
*Trans*-acting regulatory variation in*Saccharomyces cerevisiae*and the role of transcription factors. Nat. Genet. 2003;35:57–64. doi:10.1038/ng1222 [PubMed] - Zhang Z., Gu J., Gu X. How much expression divergence after yeast gene duplication could be explained by regulatory motif evolution? Trends Genet. 2004;20:403–407. doi:10.1016/j.tig.2004.07.006 [PubMed]

Articles from Proceedings of the Royal Society B: Biological Sciences are provided here courtesy of **The Royal Society**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |