In order to investigate how promoter architecture affects cell-to-cell variability in gene expression, we use a model based on classical chemical kinetics (illustrated in ), in which a promoter containing multiple operators may exist in as many biochemical states as allowed by the combinatorial binding of transcription factors to its operators. The promoter transitions stochastically between the different states as transcription factors bind and fall off. Synthesis of mRNA is assumed to occur stochastically at a constant rate that is different for each promoter state. Further, transcripts are assumed to be degraded at a constant rate per molecule.
This kind of model is the kinetic counterpart of the so-called “thermodynamic model” of transcriptional regulation 
, and it is the standard framework for interpreting the kinetics of gene regulation in biochemical experiments, both in vivo 
and in vitro 
. This class of kinetic models can easily accommodate stochastic effects, and it leads to a master equation from which the probability distribution of mRNA and protein copy number per cell can be computed. It is often referred to as the standard model of stochastic gene expression 
. The degree of cell-to-cell variability in gene expression can be quantified by the stationary variance, defined as the ratio of the standard deviation and the mean of the probability distribution of mRNA or protein copy number per cell 
, or else by the Fano factor, the ratio between the variance and the mean. These two are the two most common metrics of noise in gene expression, and the relation between them will be discussed later.
In order to compute the noise strength from this class of models, we follow the same approach as in a previous article 
, which extends a master equation derived elsewhere 
to promoters with arbitrary combinatorial complexity. The complexity refers to the existence of a number of discrete promoter states corresponding to different arrangements of transcription factors on the promoter DNA. Promoter dynamics are described by trajectories involving stochastic transitions between promoter states which are induced by the binding and unbinding of transcription factors. A detailed derivation of the equations which describe promoter dynamics can be found in the Text S1
, but the essentials are described below.
There are only two stochastic variables in the model: the number of mRNA transcripts per cell, which is represented by the unitless state variable m
, and the state of the promoter, which is defined by the pattern of transcription factors bound to their operator sites. The promoter state is described by a discrete and finite stochastic variable (
) (for an example, see ). The example in illustrates the simplest model of transcriptional activation by a transcription factor. When the activator is not bound (state 1), mRNA is synthesized at rate
. When the activator is bound to the promoter (state 2), mRNA is synthesized at the higher rate
. The promoter switches stochastically from state 1 to state 2 with rate
, and from state 2 to state 1 with rate
. Each mRNA molecule is degraded with rate
The time evolution for the joint probability of having the promoter in states 1 or 2, with
mRNAs in the cell (which we write as
, respectively), is given by a master equation, which we can build by listing all possible reactions that lead to a change in cellular state, either by changing
or by changing
(). The master equation takes the form:
Inspecting this system of equations, we notice that by defining the vector:
and the matrices
we can rewrite the system of equations (1) in matrix form.
This has several advantages, but the most important one is that the matrix approach reduces the task of obtaining analytical expressions for the moments of the steady state mRNA distribution for an arbitrarily complex promoter to solving two simple linear matrix equations (more details are given in the Text S1
The matrices appearing in equation (4) all have simple and intuitive interpretations. The matrix
describes the stochastic transitions between promoter states: The off-diagonal elements of the matrix
are the rates of making transitions from promoter state
to promoter state
.The diagonal elements of the matrix
are negative, and they represent the net probability flux out of state
. The matrix
is a diagonal matrix whose element
gives the rate of transcription initiation when the promoter is in state
. Finally, the matrix
is the identity matrix.
An example of matrices
is presented pictorially in in Text S1
. It is straightforward to see that even though equation (4) has been derived for a two-state promoter, it also applies to any other promoter architecture. What will change for different architectures are the dimensions of the matrices and vectors (these are given by the number of promoter states) as well as the values of the rate constants that make up the matrix elements of the various matrices.
An important limit of the master equation, which is often attained experimentally, is the steady state limit, where the probability distribution for mRNA number per cell does not change with time. Although the time dependence of the moments of the mRNA distribution can be easily computed from our model, for the sake of simplicity and because most experimental studies have been performed on cells in steady state, we focus on this limit. As shown in Text S1
, analytic expressions for the first two moments of the steady state mRNA probability distribution are found by multiplying both sides of equation (4) by
respectively, and then summing
from 0 to infinity. After some algebra (elaborated in an earlier paper and in Text S1
), we find that the first two moments can be written as:
contains the ordered list of rates of transcription initiation for each promoter state. For the two-state promoter shown in ,
. The vector
contains the steady state probabilities for finding the promoter in each one of the possible promoter states, while
is the steady-state mean mRNA number in each promoter state. The vector
is the solution to the matrix equation
while the vector
is obtained from
illustrates the following algorithm for computing the intrinsic variability of mRNA number for promoters of arbitrarily complex architecture:
- Make a list of all possible promoter states and their kinetic transitions ()
- Construct the matrices and , and the vector , ( in Text S1).
- Solve equations (7–8) to obtain and
- Plug solutions of (7–8) into equations (5–6) to obtain the moments.
The normalized variance of the mRNA distribution in steady state is then computed from the equation:
Equation (9) reveals that, regardless of the specific details characterizing promoter architecture, the intrinsic noise is always the sum of two components, and it can be written as
The first component is due to spontaneous stochastic production and degradation of single mRNA molecules, it is always equal to the Poissonian expectation of
, and is independent of the architecture of the promoter. For an unregulated promoter that is always active and does not switch between multiple states (or does so very fast compared to the rates of transcription and mRNA degradation), the mRNA distribution is well described by a Poisson distribution 
, and the normalized variance is equal to
. The second component (“promoter noise”) results from promoter state fluctuations, and captures the effect of the promoter's architecture on the cell-to-cell variability in mRNA:
In order to quantify the effect of the promoter architecture in the level of cell-to-cell variability in mRNA expression, we define the deviation in the normalized variance caused by gene regulation relative to the baseline Poisson noise for the same mean (see ):
Therefore, the deviation in the normalized variance caused by gene regulation is equal to the ratio between the variance and the mean. This parameter is also known as the Fano factor. Thus, for any given promoter architecture, the Fano factor quantitatively characterizes how large the mRNA noise is relative to that of a Poisson distribution of the same mean (i.e. how much the noise for the regulated promoter elevates with respect to the Poisson noise). This is the parameter that we will use throughout the paper as the metric of cell-to-cell variability in gene expression.
Promoter noise and variability of mRNA and protein numbers
For proteins, the picture is only slightly more complicated. As shown in the Text S1
, in the limit where the lifetime of mRNA is much shorter than that of the protein it encodes for (a limit that is often fulfilled 
), the noise strength of the probability distribution of proteins per cell takes the following form (where we define n
as a state variable that represents the copy number of proteins per cell):
stands for the protein degradation rate, and the constant b
is equal to the protein burst size (the average number of proteins produced by one mRNA molecule). The mean protein per cell is given by
and the vector
is the solution of the algebraic equation:
The reader is referred to the Text S1
for a detailed derivation and interpretation of these equations. In the previous section we have shown that the noise for proteins and mRNA take very similar analytical forms. Indeed, if we define
, as the vector and matrix containing the average rates of protein synthesis for each promoter state, it is straightforward to see that equations (8) and (15) are mathematically equivalent, with the only difference being that in equation (15) the matrix
represents the rates of protein synthesis, so all the rates of transcription are multiplied by the translation burst size b
. Therefore, the vectors
are only going to differ in the prefactor b
multiplying all the different transcription rates. We conclude that the promoter contribution to the noise takes the exact same analytical form both for proteins and for mRNA, with the only other quantitative difference being the different rates of degradation for proteins and mRNA. Therefore, promoter architecture has the same qualitative effect on cell-to-cell variability in mRNA and protein numbers. All the conclusions about the effect of promoter architecture on cell-to-cell variability in mRNA expression are also valid for proteins, even though quantitative differences do generally exist. For the sake of simplicity we focus on mRNA noise for the remainder of the paper.
Parameters and assumptions
In order to evaluate the equations in our model, we use parameters that are consistent with experimental measurements of rates and equilibrium constants in vivo
and in vitro
, which we summarize in . Although these values correspond to specific examples of E. coli
promoters, like the Plac
or the PRM
promoter, we extend their reach by using them as “typical” parameters characteristic of bacterial promoters, with the idea being that we are trying to demonstrate the classes of effects that can be expected, rather than dissecting in detail any particular promoter. The rate of association for transcription factors to operators in vivo
is assumed to be the same as the recently measured value for the Lac repressor, which is close to the diffusion limited rate 
. In order to test whether the particular selection of parameters in is biasing our results, we have also done several controls (See – in Text S1
) in which the kinetic parameters were randomly sampled. We found that the conclusions reached for the set of parameters in are valid for other parameter sets as well.
Kinetic parameters used to make the quantitative estimates in the text and plots in the figures.
Operator strength reflects how tightly operators bind their transcription factors, and it is quantitatively characterized by the equilibrium dissociation constant
. The dissociation constant has units of concentration and is equal to the concentration of free transcription factor at which the probability for the operator to be occupied is 1/2.
is related to the association and dissociation rates by
, where koff
is the rate (i.e., the probability per unit time) at which a transcription factor dissociates from the promoter, and
is a second order rate constant, which represents the association rate per unit of concentration of transcription factors, i.e.,
. Note that in the last formula
, which has units of s−1
, is written as the product of two quantities:
, which is the concentration (in units of (mol/liter)) of transcription factors inside the cell, and
, a second order rate constant that has units of (mol/liter)−1
. For simplicity, we assume that the binding reaction is diffusion limited, namely,
is already close to its maximum possible value, so the only parameter that can differ from operator to operator is the dissociation rate: strong operators have slow dissociation rates, and weak operators have large dissociation rates.
Throughout this paper, we also make the assumption that the mean expression level is controlled by varying the intracellular concentration of transcription factors, a scenario that is very common experimentally 
. We also assume that changing the intracellular concentration of transcription factors only affects the association rate of transcription factors to the operators, but the dissociation rate and the rates of transcription at each promoter state are not affected. In other words,
is a constant parameter for each operator, and it is not changed when we change the mean by titrating the intracellular repressor level. All of these general assumptions need to be revisited when studying a specific gene-regulatory system. Here our focus is on illustrating the general principles associated with different promoter architectures typical of those found in prokaryotes.
To generate mRNA time traces, we applied the Gillespie algorithm 
to the master equation described in the text. A single time step of the simulation is performed as follows: one of the set of possible trajectories is chosen according to its relative weight, and the state of the system is updated appropriately. At the same time, the time elapsed since the last step is chosen from an exponential distribution, whose rate parameter equals the sum of rate parameters of all possible trajectories. This process is repeated iteratively to generate trajectories that exactly reflect dynamics of the underlying master equation. For the figures, simulation lengths were set long enough for the system to reach steady state and for a few promoter state transitions to occur.
To generate the probability distributions, it is convenient to reformulate the entire system of mRNA master equations in terms of a single matrix equation. To do this, we first define a vector
is the joint probability of having
mRNAs while in the
th promoter state. Then the master equation for time evolution of this probability vector is
where each element of the matrix is itself an N by N matrix as described in the text. Then finding the steady-state distribution
is equivalent to finding the eigenvector of the above matrix associated with eigenvalue 0. To perform this calculation numerically, one must first choose an upper bound on mRNA copy number in order to work with finite matrices. In this work, we chose an upper bound six standard deviations above mean mRNA copy number as an initial guess, and then modified this bound if necessary. Computations were performed using the SciPy (Scientific Python) software package.