Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC3115574

Formats

Article sections

- Abstract
- I. INTRODUCTION
- II. BURSTS OF GENE EXPRESSION
- III. GENE REGULATION
- IV. REGULATION WITH BURSTS
- V. CONCLUSIONS
- References

Authors

Related links

Phys Rev E Stat Nonlin Soft Matter Phys. Author manuscript; available in PMC 2011 June 15.

Published in final edited form as:

Phys Rev E Stat Nonlin Soft Matter Phys. 2009 October; 80(4 Pt 1): 041921.

Published online 2009 October 20. PMCID: PMC3115574

NIHMSID: NIHMS189350

Andrew Mugler, Department of Physics, Columbia University, New York, New York 10027, USA;

See other articles in PMC that cite the published article.

Signal-processing molecules inside cells are often present at low copy number, which necessitates probabilistic models to account for intrinsic noise. Probability distributions have traditionally been found using simulation-based approaches which then require estimating the distributions from many samples. Here we present in detail an alternative method for directly calculating a probability distribution by expanding in the natural eigenfunctions of the governing equation, which is linear. We apply the resulting *spectral* method to three general models of stochastic gene expression: a single gene with multiple expression states (often used as a model of bursting in the limit of two states), a gene regulatory cascade, and a combined model of bursting and regulation. In all cases we find either analytic results or numerical prescriptions that greatly outperform simulations in efficiency and accuracy. In the last case, we show that bimodal response in the limit of slow switching is not only possible but optimal in terms of information transmission.

Signals are processed in cells using networks of interacting components, including proteins, mRNAs, and small signaling molecules. These components are usually present in low numbers [1-6], which means the size of the fluctuations in their copy counts is comparable to the copy counts themselves. Noise in gene networks has been shown to propagate [7] and therefore explicitly accounting for the stochastic nature of gene expression appears important when predicting the properties of real biological networks.

Although summary statistics such as mean and variance are sometimes sufficient for answering questions of biological interest [8], calculating certain quantities, such as information transmission [9-14], requires knowing the full probability distribution. Full knowledge of the probability distribution can also be used to discern different molecular models of the noise sources based on recent exact measurements of probability distributions [15-18].

Much analytical and purely computational effort has gone into detailed models of noise in small genetic switches [8,19-21]. The most general description is based on the master equation describing the time evolution of the joint probability distribution over all copy counts [22]. Some progress has been made by applying approximations to the master equation [24-26]. For example, a wide class of approximations focuses on limits of large concentrations or small switches [8,19,23]. More often, modelers resort to stochastic simulation techniques, the most common of which is based on the varying-step Monte Carlo method [27,28]. This method requires a computational challenge (generating many sample trajectories) followed by an even more difficult statistical challenge (parametrizing or otherwise estimating the probability distribution from which the samples are drawn) [29]. Recently there have been significant advances in simulation-based methods to circumvent these problems [20,21,30]. Simulation techniques are especially useful for more detailed studies of experimentally well-characterized systems, including those incorporating DNA looping, non-specific binding [30], and explicit spatial effects [20,21]. However, it is often beneficial to first gain intuition from simplified analytical models.

In a recent paper [31] we introduce an alternative method for calculating the steady-state distributions of chemical reactants, which we call the spectral method. The procedure relies on exploiting the natural basis of a simpler problem from the same class. The full problem is then solved numerically as an expansion in the natural basis [32]. In the spectral method we use the analytical guidance of a simple birth-death problem to reduce the master equation for a cascade to a set of algebraic equations. We break the problem into two parts: a parameter-independent preprocessing step and the parameter-dependent step of obtaining the actual probability distributions. The spectral method allows huge computational gains with respect to simulations. In prior work [31] we illustrate the method in the example of gene regulatory cascades. We combine the spectral method with a Markov approximation, which exploits the observation that the behavior of a given species should depend only weakly on distant nodes given the proximal nodes.

In this paper we expand upon the application of the spectral method to more biologically realistic models of regulation: (i) a model of bursting in gene expression and (ii) a model that includes both bursts and explicit regulation by binding of transcription factor proteins. In both cases we demonstrate how the spectral method gives either analytic results or reduced algebraic expressions that can be solved numerically in orders of magnitude less time than stochastic simulations.

We note that although information propagation in biological networks is impeded by numerous mechanisms collectively modeled as noise, here we focus on the inescapable or “intrinsic” noise resulting from the finite copy number of the molecules. One should consider these results, then, as an upper bound on information propagation, further hampered by, for example, cell division [33]; spatial effects [20,21]; active degradation of the constituents (here modeled via a constant degradation rate); and other complicating mechanisms particular to specific systems. Additionally, here we focus on steady-state solutions, but extensions to dynamics are straightforward and currently being pursued.

We begin with a model of a multistate birth-death process, a special case of which has been used to describe transcriptional bursting [17,34]. We illustrate how the spectral method reduces the model to a simple iterative algebraic equation, and in the appropriate limiting case recovers the known analytic results. We also use this section to introduce the basic notation used throughout the paper. Next we explore the problem of gene regulation in detail. The main idea behind the spectral method is the exploitation of an underlying natural basis for a problem which we can solve exactly. We explore four different spectral representations of the regulation model used in previous work [31] that arise from four natural choices of eigenbasis in which to expand the solution (cf. Sec. II A and Fig. 2). All representations reduce the master equation to a set of linear algebraic equations, and one admits an analytic solution by virtue of the tridiagonal matrix algorithm. We compare the efficiencies of the representations’ numerical implementations and show that all outperform simulation. Lastly, we apply the spectral method to a model that combines bursting and regulation. We obtain a linear algebraic expression that permits large speedup over simulation and thus admits optimization of information transmission. Optimization reveals two types of solutions: a unimodal response when the rates of switching between expression states are comparable to degradation rates and a bimodal response when switching rates are much slower than degradation rates.

We first consider a model of gene expression in which a gene exists in one of *Z* stochastic “states,” i.e., protein production obeys a simple birth-death process but with a state-dependent birth rate. In the special case of *Z*=2, this corresponds to a gene existing in an on or an off state due, for example, to the binding and unbinding of the RNA polymerase. Such a model has been used to describe transcriptional bursting [17,34], and we specialize to this case in Sec. II C.

For a general *Z*-state process, the master equation

(1)

describes the time evolution of the joint probability distribution , where *z* specifies the state (1≤*z*≤*Z*), *n* is the number of proteins, *g _{z}* is the production rate in state

(2)

The relationship between the transition rates in Ω_{zz’} and the probabilities of being in the *z*th state can be seen by summing Eq. (1) over *n*; at steady state one obtains

(3)

and normalization requires

(4)

In the following sections, we introduce the spectral method and demonstrate how it can be used to solve for the full joint distribution .

We begin our solution of Eq. (1) by defining the generating function [22] over complex variable *x* [35] (note that superscript *z* is an index, while superscript *n* on *x ^{n}* is a power). It will prove more convenient to rewrite the generating function in a more abstract representation using states indexed by protein number |

(5)

with inverse transform

(6)

The more familiar form can be recovered by projecting the position space *x*| onto Eq. (5), with the provision that

(7)

Concurrent choices of conjugate state

(8)

and inner product

(9)

ensure orthonormality of the states, *n*|*n*’=*δ*_{nn’}, as can be verified using Cauchy’s theorem,

(10)

where the convention *θ*(0)=0 is used for the Heaviside function.

With these definitions, summing Eq. (1) over *n* against |*n* yields

(11)

where the operators *â*^{+} and *â*^{−} raise and lower protein number, respectively, i.e.,

(12)

(13)

with adjoint operations

(14)

(15)

As in the operator treatment of the simple harmonic oscillator in quantum mechanics, the raising and lower operators satisfy the commutation relation [*â*^{−},*â*^{+}]=1 and *â*^{+}*â*^{−} is a number operator, i.e., *â*^{+}*â*^{−}|*n*=*n*|*n* [36]. This operator formalism for the generating function was introduced in the context of diffusion independently by Doi [37] and Zel’dovich [38] and developed by Peliti [39]. We note that all results can equivalently be obtained by remaining in *x* space (for example, *â*^{+}↔*x* and *â*^{−}↔* _{x}*). However, the raising and lowering operators define an extremely simple algebra which allows us to calculate all projections without explicitly computing overlap integrals (as shown, for example, in Appendix A). A review by Mattis and Glasser [40] introduces and discusses the applications of the formalism for diffusion.

The factorized form of the birth-death operator in Eq. (11) suggests the definition of shifted raising and lowering operators

(16)

(17)

making Eq. (11)

(18)

Since ^{+} and are simply shifted from *â*^{+} and *â*^{−} by constants, is a new number operator, and its eigenvalues are nonnegative integers *j*, i.e.,

(19)

where *j* indexes (*z*-dependent) eigenfunctions |*j _{z}*. In position space,

(20)

with conjugate

(21)

such that orthonormality is satisfied under the inner product in Eq. (9). Note that ^{+} and raise and lower eigenstates |*j _{z}* as in Eqs. (12)–(15), i.e.,

(22)

(23)

(24)

(25)

As we will see in this and subsequent sections, going between the protein number basis |*n* and the eigenbasis |*j _{z}* requires the mixed product

(26)

(27)

where the latter is the Poisson distribution.

We now demonstrate how the spectral method exploits the eigenfunctions |*j _{z}* to decompose and simplify the equation of motion. Expanding the generating function in the eigenbasis,

(28)

and projecting the conjugate state *j _{z}*| onto Eq. (18) yields the equation of motion

(29)

for the expansion coefficients [where the dummy index *j* in Eq. (28) has been changed to *j*’ in Eq. (29)]. Using Eqs. (9), (10), (20), and (21), the product evaluates to

(30)

where Δ_{zz’}=*g _{z}*−

(31)

The last term in Eq. (31) makes clear that each *j*th term is slaved to terms with *j*’<*j*, allowing the to be computed iteratively in *j*. The lower-triangular structure of the equation is a consequence of rotating to the eigenspace of the birth-death operator; this structure was not present in the original master equation.

At steady state, obeys

(32)

from which it is clear that can be computed successively in *j*. Since

(33)

[cf. Eq. (26)], the computation is initialized using Eqs. (3) and (4), i.e.,

(34)

(35)

Recalling Eqs. (6) and (28), the probability distribution is retrieved via

(36)

where the mixed product *n*|*j _{z}* can be computed as described in Appendix A.

There is an alternative way to decompose the master equation spectrally. Instead of expanding the generating function in eigenfunctions |*j _{z}*, which depend on the production rates

(37)

where

(38)

(39)

The parameter *g‾* is arbitrary and thus acts as a “gauge” freedom.

We may now partition the birth-death operator as

(40)

where *b‾*^{−}=*â*^{−}−*g‾* such that the |*j* are the eigenstates of ^{+}*b‾*^{−}, i.e.,

(41)

and Γ_{z}=*g‾*−*g _{z}* describes the deviation of each state’s production rate from the constant

Projecting the conjugate state *j*| onto Eq. (18) and using Eq. (37) (with dummy index *j* changed to *j*’) gives

(42)

Recalling Eq. (24), Eq. (42) at steady state becomes

(43)

Equation (43) is subdiagonal in *j*, meaning computation of the *j*th term requires only the previous (*j*−1)th term and the inversion of the *Z*-by-*Z* matrix (**Ω**−*j***I**) (where **I** is the identity matrix). It is initialized with Eqs. (34) and (35) and solved successively in *j*. The probability distribution is retrieved via

(44)

where *n*|*j* is computed as described in Appendix A.

As an example of a simple computation employing the spectral method, Fig. 1 shows probability distributions for the case of *Z*=3 states, corresponding to a gene that is either off, producing proteins at a low rate, or producing proteins at a high rate. For simplicity we set the rates of switching among all states equal to a constant *ω*, making the stochastic matrix

(45)

As seen in Fig. 1, when *ω*1 (corresponding to a switching rate much slower than the degradation rate) the dwell time in each expression state lengthens. The slow switching gives the protein copy number time to equilibrate in any of the three expression states, resulting in a trimodal marginal distribution *p _{n}*. When

For the special case of *Z*=2 states, as when a gene is either “on” (*z*=+) or “off” (*z*=−), it is useful to demonstrate how the spectral method reproduces known analytic results [17,34]. The probability distribution can be written in vector form as

(46)

and defining *ω*_{+} and *ω*_{−} as the transition rates to and from the on state, respectively, the stochastic matrix takes the form

(47)

Note that Eq. (3) implies

(48)

which makes clear that increasing the rate of transition to either state increases the probability of being in that state.

From Eq. (32) the spectral expansion coefficients obey

(49)

where Δ=Δ_{+−}=−Δ_{−+}. Initializing with and computing the first few terms reveals the pattern

(50)

(51)

where in the second line the products are written in terms of the Gamma function.

For comparison with known results [17,34] we write the total generating function |*G*=Σ_{±}|*G*_{±} in position space,

(52)

(53)

(54)

where

(55)

is the confluent hypergeometric function. As shown in Appendix B, in the limit *g*_{−}=0, Eq. (54) reduces to

(56)

and the marginal *p _{n}* is given by

(57)

in agreement with the results of Iyer-Biswas *et al.* [34] and Raj *et al.* [17]. We remind the reader that in addition to reducing to known results in the special case of *Z*=2 states with a vanishing off-rate, the spectral method is valid for any number of states with arbitrary production rates.

Next we consider a two-gene regulatory cascade, in which the production rate of the second gene is a function of the number of proteins of the first gene. As shown in previous work [31], a cascade of any length can be reduced to such a generalized two-dimensional system using the Markov approximation, which asserts that the probability distribution for a given node of the cascade should depend only weakly on the probability distributions of the distant nodes given the proximal nodes.

In the present section, we consider only the generalized two-dimensional equation and explore different approaches to solving it. The equation describes two genes, each with one expression state, with regulation encoded by a functional dependence of the downstream protein production rate on the upstream protein copy number. In Sec. III D we make an explicit connection between the on/off gene discussed in Sec. II C and the case when the functional dependence is a threshold. Finally, in Sec. III we combine the two types of models and consider a system with regulation and bursts.

Defining *n* and *m* as the numbers of proteins produced by the first and second gene, respectively, the master equation describing the time evolution of the joint probability distribution *p _{nm}* is [31]

(58)

The function *q _{n}* describes the regulation of the second species by the first, and the function

Summing Eq. (58) over *m* gives a simple recursion relation between *g _{n}* and

(59)

If on the other hand *g _{n}* is known,

(60)

with *p*_{0} set by normalization. Note that if the first species obeys a simple birth-death process, *g _{n}*=

In the current representation [Eq. (58)], which we denote the |*n,m* basis, finding the steady-state solution for the joint distribution *p _{nm}* means finding the null space of an infinite (or, effectively for numerical purposes, very large) locally banded tridiagonal matrix. More precisely, defining

In order to solve Eq. (58) more efficiently we will employ the spectral method. We begin as before by defining the generating function [22] *G*(*x, y*)=Σ_{nm} *p _{nm}x^{n}y^{m}* over complex variables

(61)

with inverse transform

(62)

Summing Eq. (58) over *n* and *m* against |*n,m* and employing the same operator notation as in Eqs. (12)–(15) yields

(63)

where

(64)

and

(65)

(66)

(67)

(68)

Here the regulation functions have been promoted to operators obeying *ĝ _{n}*|

Equation (64) makes clear that if not for the *n* dependence of the operators the Hamiltonian *Ĥ* would be diagonalizable, or, equivalently, if *g _{n}* and

(69)

where *Ĥ*_{0} is a diagonalizable part (and *Ĥ*_{1} is the corresponding deviation from the diagonal form), and expand |*G* in the eigenbasis of *Ĥ*_{0} to exploit the diagonality.

As with the multistate system in Sec. II B, where we expand the solution in two different bases, there are several natural choices of eigenbasis of *Ĥ*_{0}. Figure 2 summarizes these choices diagrammatically: starting in the |*n,m* basis (at the top of Fig. 2), one may expand in eigenfunctions either the first species, yielding the |*j,m* basis (left), or the second species, yielding the |*n,k _{n}* basis (right; in general we allow the parameter defining the second species’ eigenfunctions to depend on

The |*n,m* and |*j,m* bases are less numerically useful than the other bases: as discussed above and detailed in Fig. 3, the |*n,m* basis is numerically inefficient; and the |*j,m* basis does not exploit the natural structure of the problem, since, unlike the other bases, it neither retains the tridiagonal structure in *n* nor gains a lower triangular structure in *k* (see the sections below). Each of the remaining bases, however, has preferable properties in terms of numerical stability and ability to represent the function sparsely yet accurately (either using a few values of n in the |*n,k _{n}* basis or a few values of

For expository purposes we start by recalling the spectral representation used in previous work [31]. We choose the diagonal part of the Hamiltonian to correspond to two birth-death process with constant production rates *g‾* and *q‾*,

(70)

where and . As in Sec. II B, the gauge parameters *g‾* and *q‾* can be set arbitrarily without affecting the final solution; however, their values can affect the numerical stability of the method. For example, when the regulation *q _{n}* is a threshold function, a large discontinuity can narrow the range of

(71)

captures the deviations _{n}=*g‾*−*ĝ*_{n} and _{n}=*q‾*−_{n} of the regulation functions from the constant rates. We expand the generating function as

(72)

where |*j,k* is the eigenbasis of *Ĥ*_{0}, i.e.,

(73)

The eigenbasis is parametrized by the rates *g‾* and *q‾*, meaning in position space *x*|*j* is as in Eq. (38) and similarly for *y*|*k* with *x*→*y, j*→*k*, and *g‾*→*q‾*.

With Eqs. (70)–(72), projecting the conjugate state *j,k*| onto Eq. (63) yields

(74)

[where the dummy indices *j* and *k* in Eq. (72) have been changed to *j*’ and *k*’, respectively, in Eq. (74)]. Recalling Eq. (24) and restricting attention to steady state, Eq. (74) becomes

(75)

where the deviations have been rotated into the eigenbasis as

(76)

(77)

Equation (75) is subdiagonal in *k* and is therefore similar to Eq. (43) in that the last term acts as a source term. The subdiagonality is a consequence of the topology of the two-gene network: the first species regulates only itself (effectively) and the second species. Although the spectral method is fully applicable to systems with feedback, the subdiagonal structure is not preserved.

Equation (75) is initialized using

(78)

(79)

[cf. Eq. (26)] with known *p _{n}* [cf. Eq. (60)], then solved at each subsequent

(80)

where is a vector over *j*, bold denotes matrices, and and are diagonal and subdiagonal matrices, respectively. Equation (80) makes clear that the solution involves only matrix multiplication and the inversion of a *J*-by-*J* matrix *K* times, where *J* and *K* are cutoffs in the eigenmode numbers *j* and *k*, respectively. In fact, if the first species obeys a simple birth-death process, i.e., *g _{n}*=

Recalling Eqs. (62) and (72), the joint distribution is retrieved from *G _{jk}* via the inverse transform

(81)

a computation again involving only matrix multiplication. The mixed product matrices *n*|*j* and *m*|*k* are computed as described in Appendix A.

The |*j,k* basis treats both genes similarly by expanding each around a constant production rate. We may instead imagine an eigenbasis that more closely reflects the underlying asymmetry imposed by the regulation and make the basis of the second gene a function of that of the first. That is, we expand the first gene in a basis |*j* with gauge *g‾* as before, but now we expand the second gene in a basis |*k _{j}* with a

(82)

and Eq. (63) at steady state becomes

(83)

where we have partitioned the Hamiltonian for each *j* as

(84)

(85)

with and _{n}=*g‾*−*ĝ _{n}* as before and now and

(86)

Projecting the conjugate state *j,k _{j}*| onto Eq. (83) yields, after some simplification (cf. Appendix C), the equation of motion

(87)

where Γ_{jj’} is as in Eq. (76) and

(88)

(89)

with *Q*_{jj’}=*q‾ _{j}*−

(90)

where and are defined as before [cf. Eq. (80)], and * denotes an element-by-element matrix product. Once again, Eq. (90) is lower-triangular in *k* and requires only matrix multiplication and the inversion of a *J*-by-*J* matrix *K* times. is initialized as in Eq. (79), and the *k*th term is computed from the previous *k*’<*k* terms. The joint distribution is retrieved via the inverse transform

(91)

Expansion in either the |*j,k* or the |*j,k _{j}* basis conveniently turns the master equation into a lower-triangular linear algebraic equation and replaces cutoffs in protein number with cutoffs in eigenmode number (which can be smaller with appropriate choices of gauge). However, these bases sacrifice the original tridiagonal structure of the master equation in the copy number of first gene,

(92)

If the rate parameter of the |*k _{n}* basis were the regulation function

(93)

where

(94)

(95)

with and _{n}(*n*)=*q‾ _{n}*−

(96)

where Eqs. (12), (13), (65), and (67) are recalled in applying the first term of *Ĥ*_{0}(*n*).

Projecting the conjugate state *n,k _{n}*| onto Eq. (93) yields, after some simplification (cf. Appendix C), the equation of motion

(97)

where

(98)

(99)

and . Equation (97) can be written linear algebraically as

(100)

where **X**^{T} indicates the transpose of **X**, * denotes an element-by-element product, are super-(+) and subdiagonal (−) matrices, is a tridiagonal matrix, and , which is **G** with the columns reversed (i.e., ), is built incrementally in *k*. As with the |*j,k* and |*j,k _{j}* bases, the

We now consider a basis which employs both the constant-rate eigenfunctions |*j* in the *n* sector and the *n*-dependent eigenfunctions |*k _{n}* in the

(103)

where, as in the |*j,k* and |*j,k _{j}* bases, the |

(104)

Substituting Eq. (103) into Eq. (58) at steady state gives, after some simplification (cf. Appendix C),

(105)

where *Ĥ* is defined as in Eq. (64), and act as in Eqs. (12)-(15), and

(106)

We may now partition *Ĥ*=*Ĥ*_{0}+*Ĥ*_{1} as

(107)

(108)

with and _{n}=*g‾*−*ĝ _{n}* as in the |

(109)

Equation (105) becomes, after more simplification (cf. Appendix C),

(110)

where Γ_{jj}’ is as in Eq. (76),

(111)

(112)

(113)

and is as in Eq. (99). Linear algebraically,

(114)

with and defined as before [cf. Eq. (80)], revealing once again a lower-triangular equation (i.e., each *k*th term is slaved to the previous *k*’<*k* terms) requiring only matrix multiplication and the inversion of a *J*-by-*J* matrix *K* times. Recalling Eq. (104), the scheme is initialized using

(115)

[cf. Eq. (26)] with known *p _{n}* [cf. Eq. (60)], and the joint distribution is retrieved using Eq. (103).

The spectral representations in Sec. II A produce equations of motion with similar levels of numerical complexity. In all cases, the original two-dimensional master equation has been reduced by the lower-triangular structure in the second gene’s eigenmode number *k* to a hierarchy of evaluations of one-dimensional problems. The bases differ in the rate parameters or equivalently gauge freedoms that one is free to choose: the |*j,k* basis requires two constants *g‾* and *q‾*; the |*j,k _{j}* basis requires

The bases also differ in the types of problems for which they are most suitable. For example, the |*j,k*, |*j,k _{j}*, and |

As indicated in Fig. 2, the |*j,k* basis can be viewed as a special case of either the |*j,k _{j}* basis with

The strength of all the spectral bases discussed in this section and of the proposed spectral method in general is that it allows for a fast and accurate calculation of full steady-state probability distributions of the number of protein molecules in a gene regulatory network. In Fig. 3 we demonstrate this property for the two-gene system by plotting error versus computational runtime for each spectral basis, as well as for a stochastic simulation using a varying step Monte Carlo procedure [28]. For error we use the Jensen-Shannon divergence [41] (a measure in bits between two probability distributions) between the distribution *p _{nm}* computed in the |

In general, the equations of motion in the spectral representations [Eqs. (75), (87), (97), and (110)] need to be evaluated numerically. In the case of the |*n,k _{n}* basis, however, we can exploit the tridiagonal structure of Eq. (97) to find an exact analytic solution. Specifically, in the case of a Poisson parent (

(116)

setting *q‾ _{n}*=

(117)

where

(118)

(119)

Δ=*q*_{+}−*q*_{−}, and *n*_{1}=*n*_{0}+1. Equation (117) is solved using the tridiagonal matrix algorithm (also called the Thomas algorithm [42]), as described in detail in Appendix D. The result is an analytic expression for the *k*th column of *G _{nk}* in terms of its previous columns (i.e., the matrix inversion has been performed explicitly),

(120)

where

(121)

(122)

(123)

(124)

(125)

and *N* is the cutoff in protein number *n*. Along with the analytic form of the mixed product

(126)

(cf. Appendix A), Eq. (120) in the limit *N* → ∞ constitutes an exact analytic solution for the joint distribution *p _{nm}* as calculated using Eq. (102).

If a gene is regulated via a threshold function [cf. Eq. (116)], its steady-state protein distribution *p _{m}* can be well approximated by the two-state process discussed in Sec. II C. To make the connection clear, we first observe that the off-state (

(127)

and the on state (*z*=+) corresponds to the first gene expressing more proteins than the threshold, i.e.,

(128)

The dynamics of are then obtained by summing the master equation for two-gene regulation [Eq. (58)] over either all *n*≤*n*_{0} or all *n*>*n*_{0}, giving

(129)

where *n*_{1}=*n*_{0}+1. Making the approximations

(130)

(131)

where

(132)

(133)

are the total probabilities of being in the off and on states, respectively, and noting from Eq. (59) that *g*_{n0}=*n*_{1}*p*_{n1}/*p*_{n0}, Eq. (129) at steady state becomes

(134)

with *z*=± and

(135)

where

(136)

Equations (134) and (135) have the same form as Eqs. (1) and (47) at steady state with *n*→*m* and *g*→*q*, and Eq. (136) relates the effective switching rates *ω*_{±} to input and regulation parameters *p*_{n1}, *π*_{±}, and *n*_{1}, and the ratio *ρ* of the degradation rate of the second gene to that of the first. Note that Eq. (136) satisfies

(137)

in agreement with Eq. (48), and exhibits the intuitive behavior that increasing *ρ* (i.e., decreasing the relative response rate of the first gene) is equivalent to decreasing the switching rates *ω*_{±}.

A comparison of the distributions of a threshold-regulated gene with those of an on/off gene for various parameter settings reveals that Eqs. (130) and (131) are a good approximation. Figure 4 shows a demonstration for a threshold-regulated system with a Poisson input distribution. In the first column, the mean *g* of the input lies above the threshold *n*_{0}, making the output more likely to be in the on-state, i.e., *π*_{+}>*π*_{−}; in the second column, *g*<*n*_{0}, making *π*_{+}<*π*_{−}. In the first row *ρ*<1; in the second row *ρ*>1, corresponding to lower effective switching rates *ω*_{±} and producing bimodal distributions with peaks near the on/off rates *q*_{±}. In all examples, the approximation as a two-state process with switching rates given by Eq. (136) agrees well with the actual output from threshold regulation.

The final system we consider combines the multistate process used to model bursts of expression in Sec. I with gene regulation as discussed in Sec. II. Specifically we consider a system of two species, with protein numbers *n* and *m*, existing in *Z* possible states, distinguished by the settings of the two production rates *g _{z}* and

(138)

where the dependence of the stochastic matrix (Ω_{zz’} on *n* incorporates the regulation.

As with the previously discussed models, Eq. (138) benefits from spectral expansion, and for simplicity we present only the formulation in the |*j,k* basis, parametrized by constant rates *g‾* and *q‾* respectively, as in Secs. II B and III A 2. As before the first step is to define the generating function

(139)

with which Eq. (138), upon summing over *n* and *m* against |*n,m*, becomes

(140)

where

(141)

(142)

(143)

(144)

(145)

and _{zz’} is Ω_{zz’}(*n*) with every instance of *n* replaced by the number operator . Defining and , we partition the Hamiltonian as , with

(146)

as the operator of which |*j,k* is the eigenbasis, i.e.,

(147)

and

(148)

capturing the deviations Γ_{z}=*g‾*−*g _{z}* and Δ

(149)

and taking dummy indices *j*→*j*’ and *k*→*k*’, projecting the conjugate state *j,k*| onto Eq. (140) gives

(150)

where the components of _{zz}’ need only be evaluated in the *j* sector, not the *k* sector, because the transition rates depend on only *n*, not *m* [cf. Eq. (138)]. Like Eqs. (43) and (75), Eq. (150) is subdiagonal in *k* and thus far more efficient to solve than the original master equation [Eq. (138)] as we demonstrate for a special case in the next section. The joint distribution is retrieved from via inverse transform,

(151)

with the mixed products calculated as in Appendix A.

As a simple example of the model in Eq. (138), we consider a system in which each of the two species has an on state and an off state, and the transition rate of the second species to its on state is a function of the number of copies of the first species. This system models both (i) a single gene for which the production of proteins depends on the number of transcripts, and each is produced in on and off states by the binding and unbinding of ribosomes and RNA polymerase, respectively, and (ii) one gene regulating another with each undergoing burstlike expression.

There are a total of *Z*=4 states, i.e.,

(152)

where the first signed index denotes the state of the first gene (with protein count *n*) and the second signed index denotes the state of the second gene (with protein count *m*). Defining *g*_{±} as the production rates of the first species in its on (+) and off states (−), and similarly *q*_{±} for the second species, the production rates of the *Z*=4 states are

(153)

(154)

Defining *ω*_{±} as the transition rates of the first species to (+) and from (−) its on state, and similarly *α*_{±} for the second species, the transition matrix takes the form

(155)

The simple form *α*_{+}(*n*)=*cn ^{ν}* for constant

(156)

Since and raise and lower |*j*’ states respectively [cf. Eqs. (22) and (23)], the modified transition matrix *j*|(_{zz’}|*j*’ is nearly diagonal, with nonzero terms only for |*j*−*j*’|≤*ν*.

Equation (150) at steady state,

(157)

is solved successively in *k*, requiring the inversion of a 4*J*-by-4*J* matrix *K* times. It is initialized at *k*=0 by computing the null space of the left-hand side and normalizing with [cf. Eq. (35)]. The joint distribution is retrieved via inverse transform [Eq. (151)].

With *ν*=2, a typical solution of Eq. (157) takes a few seconds (in matlab), which, depending on the cutoff *N*, is ~10^{2}–10^{3} times faster than direct solution of the master equation [Eq. (138)] by iteration for equivalent accuracy. The advantage of such a large efficiency gain is that it allows repeated evaluations of the governing equation, necessary for parameter inference or optimization [31]. We demonstrate this possibility in the next section by finding and interpreting the solutions that optimize the information flow from the first to the second species.

Cells use regulatory processes to transmit relevant information from one species to the next [44-48]. Information processing is quantified by the mutual information *I*, which, between the first and second species in the four-state process, is

(158)

where the distributions *p _{nm}, p_{n}*, and

Upon optimization of *I* for the four-state process, two distinct types of optimal solutions become clear: those in which the distribution *p _{nm}* has one peak and those in which

To quantify the transition between the two types of solutions, we numerically optimized mutual information over parameters *g*_{+}, *q*_{+}, *ω*_{−}, *ω*_{+}, *α*_{−}, and *c* (the off-rates *g*_{−} and *q*_{−} were fixed at 0; the cooperativity *ν* was fixed at 2; and the decay rate ratio *ρ* was fixed at 1). Information may always be trivially optimized by allowing infinite copy number or arbitrary separation of relevant time scales. We limit copy number by constraining the gain

(159)

defined as the average of the parent gain =*g*_{+}−*g*_{−} and the child gain Δ=*q*_{+}−*q*_{−}. Since *g*_{−}=*q*_{−}=0, the maximum number of particles is dictated by the on rates *g*_{+} and *q*_{+} and thus constraining *γ* limits the copy number. We limit separation between the switching time scales and the decay time scales by constraining the stiffness

(160)

where the average *n ^{ν}* is taken over

(161)

for a given value of the Lagrange multiplier *λ*.

As shown in Fig. 5(A), one-peaked solutions are more informative at low stiffness, while two-peaked solutions are more informative at high stiffness. We compute the convex hulls of the one- and two-peaked data to remove suboptimal solutions, and the transition occurs at the stiffness value at which the convex hulls intersect [cf. Fig. 5(A)]. Repeating this procedure for many choices of gain allows one to trace out the phase transition shown Fig. 5(B), which makes clear that one-peaked solutions are most informative at low stiffness, two-peaked solutions are most informative at high stiffness, and the critical stiffness decreases weakly with increasing gain.

Examining the two-peaked solutions, which are optimal at high stiffness, we find that the marginal probability to be in the (−,+) state, where the first gene is off and the second gene is on, , is much smaller than the marginal probabilities to be in the other three states [*π*^{−,+}/*π ^{z}*<0.01, where

The presented spectral method exploits the linearity of the master equation to solve for probability distributions directly by expanding in the natural eigenfunctions the linear operator. We demonstrate the method on three models of gene expression: a single gene with multiple expression states, a gene regulatory cascade, and a model that combines multistate expression with explicit regulation through binding of transcription factor proteins.

The spectral method permits huge computational gains over simulation. As demonstrated for all spectral expansions of the two-gene cascade (cf. Fig. 3), directly solving for the distribution via the spectral method is ~10^{7}–10^{8} times faster than building the distribution from samples using a simulation technique. This massive speedup makes possible optimization and inference problems requiring full probability distributions that were not computationally feasible previously. For example, by optimizing information flow in a two-gene cascade in which both parent and child undergo two-state production, we reveal a transition from a one-peaked to a two-peaked joint probability distribution when constraints on protein number and time scale separation are relaxed. We emphasize that this optimization would not have been possible without the efficiency of the spectral method.

The spectral method also makes explicit the linear algebraic structure underlying the master equation. In many cases, such as in two-state bursting and the two-gene threshold regulation problem, this leads to analytic solutions. In general, such as shown in the case of the linear cascade, this leads to a set of natural bases for expansion of the generating function and reveals the features of each basis that are better suited to different types of problems. Specifically, bases in which the parent species is expanded in eigenfunctions are best when the parent distribution is Poissonian, and bases in which the parent is left in protein number space are best when the parent distribution is concentrated at low protein number. As well, bases in which the eigenfunctions of the child depend on the number of copies of the parent’s protein are best suited for smooth regulation functions, whereas a basis in which the eigenfunctions of the child are parametrized by a constant is more numerically robust for sharp regulation functions such as thresholds. In all cases the linear algebraic structure of the spectral decomposition yields numerical prescriptions that greatly outperform simulation techniques. We anticipate that the computational speedup of the method, as well as the removal of the statistical obstacle of density estimation inherently limiting simulation-based approaches, will make spectral methods such as those demonstrated here useful in addressing a wide variety of biological questions regarding accurate and efficient modeling of noisy information transmission in biological systems.

A.M. was supported by National Science Foundation Grant No. DGE-0742450. A.M. and C.W. were supported by National Science Foundation Grant No. ECS-0332479. A.M.W. was supported by the Princeton Center for Theoretical Science and by Columbia’s Professional Schools Visiting Scholar Grant. C.W. was supported by National Institutes of Health Grants No. 5PN2EY016586-03 and No. 1U54CA121852-01A1.

In this appendix we describe two ways to compute the mixed products *n*|*j* and *j* |*n* between the protein number states |*n* and the eigenstates |*j*: by direct evaluation and by recursive updating.

The direct evaluation follows from Eqs. (8)–(10) and (20), and the fact that repeated derivatives of a product follow a binomial expansion. Introducing *g* as the rate parameter for the |*j* states,

(A1)

(A2)

(A3)

(A4)

(A5)

(A6)

where

(A7)

Similarly, noting Eqs. (7) and (21),

(A8)

with *ζ _{nj}* as in Eq. (A7). Equations (A6) and (A8) clearly reduce to Eqs. (26) and (27) for the special case

It is more computationally efficient to take advantage of the selection rules in Eqs. (12)–(15) and (22)–(25) to compute the mixed products recursively. For example, using Eqs. (14), (16), and (22),

(A9)

which can be initialized using *n*|0=*e*^{−g}*g ^{n}*/

Alternatively, Eqs. (15), (17), and (23) give

(A10)

which can be initialized using 0|*j*=(−1)^{j}*e*^{−g} [cf. Eq. (A6)] and updated recursively in *n*.

One may similarly derive recursion relations for *j* |*n*, i.e.,

(A11)

(A12)

initialized with *j* |0=(−*g*)^{j} / *j*! or 0|*n*=1, respectively [cf. Eq. (A8)] and updated recursively in *n* or *j*, respectively.

One may also use the full birth-death operator ^{+}^{−} to derive the recursion relations

(A13)

(A14)

initialized with 0|*j*=(−1)^{j}*e*^{−g} and 1|*j*=(−1)^{j}*e*^{−g}(*g*− *j*) [cf. Eq. (A6)], and *j* |0=(−*g*)^{j} / *j*! and *j* |1=(−*g*)^{j}(1−*j* /*g*)/ *j*! [cf. Eq. (A8)], respectively, and updated recursively in *n*.We find Eqs. (A13) and (A14) are more numerically stable than Eqs. (A9)–(A12), as the former are two-term recursion relations while the latter are one-term recursion relations.

In this appendix, we fill in the details of the derivations of the equations of motion for the latter three of the four spectral bases discussed in Sec. II A.

Projecting the conjugate state *j,k _{j}*| onto Eq. (83) (in which dummy indices

(C1)

From the orthonormality of states, the first term of Eq. (C1) simplifies to

(C2)

Recalling Eq. (30), the product simplifies to

(C3)

with *Q _{jj}*

(C4)

with Γ_{jj’} as in Eq. (76) and

(C5)

(C6)

(C7)

[where the orthonormality of |*j* states is used in going from Eq. (C5) to Eq. (C6)]. Defining =*k*−*k*’ and

(C8)

Projecting the conjugate state *n,k _{n}*| onto Eq. (93) (in which dummy indices

(C9)

where Δ_{n}=*q‾ _{n}*−

(C10)

where , Eq. (C9) becomes

(C11)

Separating the parts of the second and third term that are diagonal in *k* and defining =*k*−*k*’ and

(C12)

Substituting Eq. (103) into Eq. (58) at steady state gives

(C13)

or in terms of raising and lowering operators [cf. Eqs. (14)]

(C14)

Using the definitions in Eqs. (64)–(68), Eq. (C14) can be written as Eq. (105).

Using Eqs. (107)–(109), Eq. (105) can be written

(C15)

where

(C16)

and the dummy indices *j* and *k* have been changed to *j*’ and *k*’ respectively. Using Eq. (C10) to note that

(C17)

where , we multiply Eq. (C15) by *k _{n}*|

(C18)

in which we exploit the completeness of |*m* states, i.e., Σ_{m}|*m**m*|=1, and is as in Eq. (99). Multiplying Eq. (C18) by *j* |*n*, summing over *n*, and exploiting Σ_{n}|*n**n*|=1 for the first two terms, we obtain Eq. (110).

In this appendix we explicitly solve for *G _{nk}* in Eq. (117) using the tridiagonal matrix or Thomas [42] algorithm. We start by identifying the subdiagonal, diagonal, superdiagonal, and right-hand side elements of Eq. (117), respectively, as

(D1)

(D2)

(D3)

(D4)

where *N* is the cutoff in protein count *n* and *n*_{1}=*n*_{0}+1. Auxiliary variables are defined iteratively as

(D5)

(D6)

(D7)

(D8)

and the solution is obtained by backward iteration with

(D9)

(D10)

(where *k* has been moved from subscript to superscript for ease of reading).

Computing the first few terms of Eq. (D6) reveals the pattern

(D11)

where

(D12)

with the convention that if *a*>*b*. Note that since , we may also use the integral representation of the Gamma function to write

(D13)

(D14)

(D15)

Using Eq. (D8) it is immediately clear that

(D16)

The first nonzero term is

(D17)

where we have defined

(D18)

Further iteration of Eq. (D8) makes clear that

(D19)

(D20)

where

(D21)

Computing the first few terms of Eq. (D10) reveals that

(D22)

(D23)

where

(D24)

At the threshold Eq. (D10) gives

(D25)

and since , the solution is easily completed using Eq. (D10), giving

(D26)

(D27)

PACS number(s): 87.10.Mn, 82.20.Fd, 87.10.Vg, 02.70.Hm

Andrew Mugler, Department of Physics, Columbia University, New York, New York 10027, USA.

Aleksandra M. Walczak, Princeton Center for Theoretical Science, Princeton University, Princeton, New Jersey 08544, USA.

Chris H. Wiggins, Department of Applied Physics and Applied Mathematics, Center for Computational Biology and Bioinformatics, Columbia University, New York, New York 10027, USA.

[1] Hooshangi S, Thiberge S, Weiss R. Proc. Natl. Acad. Sci. U.S.A. 2005;102:3581. [PubMed]

[2] Thattai M, van Oudenaarden A. Biophys. J. 2002;82:2943. [PubMed]

[3] Elowitz MB, Levine AJ, Siggia ED, Swain PS. Science. 2002;297:1183. [PubMed]

[4] Ozbudak EM, Thattai M, Kurtser I, Grossman AD, van Oudenaarden A. Nat. Genet. 2002;31:69. [PubMed]

[5] Swain PS, Elowitz MB, Siggia ED. Proc. Natl. Acad. Sci. U.S.A. 2002;99:12795. [PubMed]

[6] Paulsson J, Berg OG, Ehrenberg M. Proc. Natl. Acad. Sci. U.S.A. 2000;97:7148. [PubMed]

[7] Pedraza JM, van Oudenaarden A. Science. 2005;307:1965. [PubMed]

[8] Paulsson J. Nature (London) 2004;427:415. [PubMed]

[9] Tkačik G, Callan CG, Bialek W. Phys. Rev. E. 2008;78:011910. [PMC free article] [PubMed]

[10] Tkacik G, Callan CG, Jr., Bialek W. Proc. Natl. Acad. Sci. U.S.A. 2008;105:12265. [PubMed]

[11] Ziv E, Nemenman I, Wiggins CH. PLoS ONE. 2007;2:e1077. [PMC free article] [PubMed]

[12] Mugler A, Ziv E, Nemenman I, Wiggins CH. IET Sys. Bio. 2009;3:379. [PMC free article] [PubMed]

[13] Emberly E. Phys. Rev. E. 2008;77:041903. [PubMed]

[14] Tostevin F, ten Wolde PR. Phys. Rev. Lett. 2009;102:218101. [PubMed]

[15] Elf J, Li G-W, Xie XS. Science. 2007;316:1191. [PMC free article] [PubMed]

[16] Choi PJ, Cai L, Frieda K, Xie XS. Science. 2008;322:442. [PMC free article] [PubMed]

[17] Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. PLoS Biol. 2006;4:e309. [PMC free article] [PubMed]

[18] Golding I, Paulsson J, Zawilski SM, Cox EC. Cell. 2005;123:1025. [PubMed]

[19] Tǎnase-Nicola S, Warren PB, Rein ten Wolde P. Phys. Rev. Lett. 2006;97:068102. [PubMed]

[20] van Zon JS, Morelli MJ, Tǎnase-Nicola S, ten Wolde PR. Biophys. J. 2006;91:4350. [PubMed]

[21] van Zon JS, ten Wolde PR. J. Chem. Phys. 2005;123:234910. [PubMed]

[22] van Kampen NG. Stochastic Processes in Physics and Chemistry. North-Holland; Amsterdam: 1992.

[23] Walczak AM, Sasai M, Wolynes PG. Biophys. J. 2005;88:828. [PubMed]

[24] Lan Y, Papoian GA. J. Chem. Phys. 2006;125:154901. [PubMed]

[25] Lan Y, Wolynes PG, Papoian GA. J. Chem. Phys. 2006;125:124106. [PubMed]

[26] Lan Y, Papoian GA. Phys. Rev. Lett. 2007;98:228301. [PubMed]

[27] Bortz AB, Kalos MH, Lebowitz JL. J. Comput. Phys. 1975;17:10.

[28] Gillespie DT. J. Phys. Chem. 1977;81:2340.

[29] Bellman RE. Adaptive Control Processes. Princeton University Press; Princeton, NJ: 1961.

[30] Morelli MJ, ten Wolde PR, Allen RJ. Proc. Natl. Acad. Sci. U.S.A. 2009;106:8101. [PubMed]

[31] Walczak AM, Mugler A, Wiggins CH. Proc. Natl. Acad. Sci. U.S.A. 2009;106:6529. [PubMed]

[32]
A master equation in which the coordinate appears explicitly in the rates [e.g. *g*_{n} or *q*_{n} in Eq. (58)] is sometimes mistakenly termed “nonlinear” in the literature, perhaps discouraging calculations which exploit its inherent linear algebraic structure. We remind the reader that the master equation is perfectly linear.

[33] Klumpp S, Hwa T. Proc. Natl. Acad. Sci. U.S.A. 2008;105:20245. [PubMed]

[34] Iyer-Biswas S, Hayot F, Jayaprakash C. Phys. Rev. E. 2009;79:031911. [PubMed]

[35]
Note that setting *x*=*e*^{ik} makes clear that the generating function is simply the Fourier transform.

[36]
Note here, however, the difference with respect to the normalization convention commonly used in quantum mechanics in the prefactors of the creation and annihilation operations.

[37] Doi M. J. Phys. A. 1976;9:1465.

[38] Zel’dovich Ya. B., Ovchinnikov AA. JETP. 1978;47:829.

[39] Peliti L. J. Phys. A. 1986;19:L365.

[40] Mattis DC, Glasser ML. Rev. Mod. Phys. 1998;70:979.

[41] Lin J. IEEE Trans. Inf. Theory. 1991;37:145.

[42] Thomas L. Watson Sci. Columbia University; New York: 1949.

[43]
Because this basis is a function of three coordinates (*j, k*, and *n*), we cannot abstractly define the generating function as in Eq. (61); we must instead work directly with the joint probability distribution.

[44] Gomperts BD, Kramer IM, Tantham PER. Signal Transduction. Academic Press; San Diego: 2002.

[45] Ting AY, Endy D. Science. 2002;298:1189. [PubMed]

[46] Detwiler PB, Ramanathan S, Sengupta A, Shraiman BI. Biophys. J. 2000;79:2801. [PubMed]

[47] Bolouri H, Davidson EH. Proc. Natl. Acad. Sci. U.S.A. 2003;100:9371. [PubMed]

[48] Bassler BL. Curr. Opin. Microbiol. 1999;2:582. [PubMed]

[49] Koepf W. Hypergeometric Summation: An Algorithmic Approach to Summation and Special Function Identities. Vieweg; Braunschweig, Germany: 1998.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's Canada Institute for Scientific and Technical Information in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |