Edited by Prof. Kenta Nakai
We investigate the hypothesis that, in Escherichia coli, while the concentration of RNA polymerases differs in different growth conditions, the fraction of RNA polymerases free for transcription remains approximately constant within a certain range of these conditions. After establishing this, we apply a standard model-fitting procedure to fully characterize the in vivo kinetics of the rate-limiting steps in transcription initiation of the Plac/ara-1 promoter from distributions of intervals between transcription events in cells with different RNA polymerase concentrations. We find that, under full induction, the closed complex lasts ~788 s while subsequent steps last ~193 s, on average. We then establish that the closed complex formation usually occurs multiple times prior to each successful initiation event. Furthermore, the promoter intermittently switches to an inactive state that, on average, lasts ~87 s. This is shown to arise from the intermittent repression of the promoter by LacI. The methods employed here should be of use to resolve the rate-limiting steps governing the in vivo dynamics of initiation of prokaryotic promoters, similar to established steady-state assays to resolve the in vitro dynamics.
Gene expression has been intensively studied with the relatively new tools provided by fluorescent proteins and microscopy techniques with single-molecule resolution, in both prokaryotic1–5 and eukaryotic6,7 systems. These studies have established that this process cannot be fully characterized by the mean protein production rate,8–12 since cells exhibit fluctuations (i.e. noise) over time and diversity in numbers across populations,13 which, among other things, generates phenotypic diversity.8 The noise has generally been investigated through indirect means, such as by observing the diversity in RNA and protein numbers in cell populations.2,3,10,11,14 Other, more direct means consist of observing the distribution of intervals between RNA productions2,4,5 and between protein bursts in individual cells.3,15
From these observations, a wide range of gene expression behaviours have been reported and, therefore, significantly different probabilistic models of transcription have been proposed.2,4,16–18 In general, higher-than-Poissonian variability in RNA numbers has been explained by models in which the promoter intermittently switched into an inactive state, resulting in bursty RNA production dynamics.2,16,19 Meanwhile, lower-than-Poissonian variability appears to be more consistent with models assuming multiple rate-limiting steps.4,5,16,20,21
There is direct experimental evidence for the existence of both mechanisms. Recently, Chong et al.19 showed that bursts of RNA production can emerge due to positive supercoiling build-up on a DNA segment, which eventually stops transcription initiation for a short period until the release of the supercoiling by gyrase. On the other hand, the existence of rate-limiting steps was established by studies using steady-state assays.22–24 Also, more recently, by fitting a monotone piecewise-constant function to the fluorescence signal from MS2-GFP tagged RNAs in individual cells, it was shown that in vivo RNA production can be a sub-Poissonian process.4,5,20,21
Recent studies have considered the possibility that both mechanisms can be present in a single promoter.16,25 In ref. 25, a model including both mechanisms was proposed, and statistical methods were developed to select the relevant components and estimate the kinetics of the intermediate steps in initiation based on empirical data. However, this method cannot distinguish the order of the steps which occur after the start of transcription initiation, nor can it determine their reversibility, which recent evidence suggests may play a significant role in the dynamics of RNA production.26
A complete model for transcription in prokaryotes must account, apart from the genome-wide variability in noise levels,17,27,28 for the well-established genome-wide variability in mean transcription rate2,3,8 and in fold change (ratio of production rate between zero and full induction)29 in response to induction found, e.g. in Escherichia coli promoters. For example, in vitro measurements on fully induced variants of the lar promoter showed that the mean interval between transcription events of these variants differs by hundreds of seconds.29 Promoters also differ widely in range of induction, even when differing only by a couple of nucleotides.29,30 For example, while PlarS17 has an induction range of 500 fold, PlarconS17 has an induction range of 4.5-fold, even though it only differs by 3 point mutations.29 This wide behavioural diversity is likely made possible by the sequence dependence of each step in transcription initiation.29
Thus far, the strategies used in vitro to characterize the kinetics of the steps involved in transcription initiation22,26 have not been applied in vivo since they rely on measuring transcription for different RNA polymerase (RNAp) concentrations. Such a change in cells is expected to have a multitude of unforeseen effects31 (in addition to the side effects of the means used to alter RNAp concentrations), which hampers the assessment of its consequences to the duration of the closed complex formation of a specific promoter. However, it is reasonable to hypothesize that, for certain small ranges of RNAp concentrations, these side effects will be negligible and thus, in such ranges, the inverse of the rate of transcription will be linear with respect to the inverse of the free RNAp concentration.
Importantly, in E. coli, RNAp concentrations have been shown to vary widely with differing growth conditions.32 As such, here we make use of different media richness to achieve different RNAp concentrations and test whether within this range of conditions, the RNA production rate changes hyperbolically with the RNAp concentrations (i.e. if the inverse of this rate changes linearly with the inverse of the RNAp concentration). Having established this relationship, we make use of it to study the in vivo kinetics of transcription initiation of Plac/ara-1. In particular, we perform measurements of the time intervals between RNA productions at the single molecule level in different intracellular RNAp and inducer concentration conditions, which we use to derive a more detailed model of transcription initiation of Plac/ara-1. For this, we first extrapolate the mean interval between production events to the limit of infinite RNAp concentration, so as to estimate the in vivo durations of the open and closed complex formations of this promoter. Next, we examine the significance of an intermittent inactive promoter state, and the role of LacI in the emergence of this state. Finally, for the first time in vivo, we determine the reversibility of the closed complex formation.
For single-cell RNAp fluorescence measurements, we used E. coli W3110 and RL1314,33 generously provided by Robert Landick, University of Wisconsin-Madison. For single-cell transcription interval measurements, we used E. coli DH5α-PRO (generously provided by Ido Golding, Baylor College of Medicine, Houston). The strain information is: deoR, endA1, gyrA96, hsdR17(rK- mK+), recA1, relA1, supE44, thi-1, Δ(lacZYA-argF)U169, Φ80δlacZΔM15, F-, λ-, PN25/tetR, PlacIq/lacI and SpR. This strain contains two constructs: a high-copy reporter plasmid vector PROTET-K133 (carrying MS2d-GFP under the control of PLtetO-1) and a single-copy plasmid vector pIG-BAC carrying the target transcript (mRFP1 followed by 96 MS2-binding sites) under the control of Plac/ara-1.2 This promoter is located approximately 2 and 9 kb from the origin of replication (Ori2) and the plasmid size is 11.5 kb.2 This system has been used to measure the distribution of time intervals between RNA production events due to its ability to detect individual target RNA molecules consisting of numerous MS2 coat protein binding sites, which are rapidly bound by fluorescently tagged MS2 coat proteins. These can be seen as they are produced under a fluorescence microscope as fluorescent foci.2,4,5,20,21 Finally, we used the plasmid pAB332 carrying hupA-mCherry to visualize nucleoids (generously provided by Nancy Kleckner, Harvard University, Cambridge, MA, USA). For our measurements, we inserted this plasmid into DH5α-PRO cells so as to detect nucleoids in individual cells during the live cell microscopy sessions. HupA is a major nucleoid associated protein (NAP) that participates in its structural organization.34
The components of Lysogeny Broth (LB) were purchased from LabM (UK), and antibiotics from Sigma-Aldrich (USA). For RT-PCR, cells were fixed with RNAprotect bacteria reagent (Qiagen, USA). Tris and EDTA for lysis buffer were purchased from Sigma-Aldrich and lysozyme from Fermentas (USA). The total RNA extraction was done with the RNeasy RNA purification kit (Qiagen). RNase-free DNase I for RNA purification was purchased from Promega (USA). iScript Reverse Transcription Supermix for cDNA synthesis and iQ SYBR Green supermix for RT-PCR were purchased from Biorad (USA). Agarose, isopropyl β-d-1-thiogalactopyranoside (IPTG), arabinose, and anhydrotetracycline (aTc) were purchased from Sigma-Aldrich.
To achieve different RNAp concentrations in cells, we altered their growth conditions as in ref. 35. For this, we used modified LB media which differed in the concentrations of some of their components. The media used are denoted as m×, where the composition per 100 ml is: m g of tryptone, m/2 g of yeast extract and 1 g of NaCl (pH = 7.0). For example, 0.25× media has 0.25 g of tryptone and 0.125 g of yeast extract per 100 ml.
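The m× naming rule can be illustrated with a short sketch. The function name and the `volume_ml` parameter are hypothetical helpers introduced here for illustration; only the per-100-ml amounts come from the text.

```python
def modified_lb_composition(m, volume_ml=100.0):
    """Grams of each component for an 'm x' modified LB medium.

    Per 100 ml: m g tryptone, m/2 g yeast extract, 1 g NaCl (from the text).
    The volume scaling is an illustrative assumption.
    """
    scale = volume_ml / 100.0
    return {
        "tryptone_g": m * scale,
        "yeast_extract_g": (m / 2.0) * scale,
        "nacl_g": 1.0 * scale,
    }


# The four media used in this study.
for m in (0.25, 0.5, 0.75, 1.0):
    print(f"{m}x:", modified_lb_composition(m))
```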
We measured relative RNAp concentrations in cells using four different methods. First, relative RNAp concentrations in the strains W3110 and DH5α-PRO were measured from the relative rpoC transcript levels obtained using RT-PCR. Cells containing the target plasmid with Plac/ara-1-mRFP1-96BS and the reporter plasmids were grown overnight in respective media. Cells were diluted into fresh media to an OD600 of 0.05. After 110 min, cells were re-diluted to an OD600 of 0.05 into respective media containing IPTG (1 mM) and arabinose (1%). After 70 min, RNA protect reagent was added to fix the cells, followed by enzymatic lysis with Tris–EDTA lysozyme buffer (pH 8.3). RNA was isolated from cells using RNeasy mini-kit (Qiagen). One microgram of RNA was used as the starting material. The RNA samples were treated with RNase-free DNase to remove residual DNA. Next, RNA was reverse transcribed into cDNA using iSCRIPT reverse transcription super mix (Biorad). RT-PCR was performed using Power SYBR-green master mix (Life Technologies) with primers for the amplification of the target gene at a concentration of 200 nM. Reactions were carried out in triplicate with 500 nM per primer with a total reaction volume of 20 µl. The following primers were used for quantification: RpoC-F: CGTCAGATGCTGCGTAAAGC, RpoC-R: GCGATCTTGACGCGAGAGTA, mRFP1-F: TACGACGCCGAGGTCAAG, mRFP1-R: TTGTGGGAGGTGATGTCCA. Estimated relative RNAp concentrations in each condition m, and their standard uncertainties, were calculated according to the method in ref. 36.
Second, E. coli RL1314 cells with fluorescently tagged β′ subunits were grown overnight in respective media. A pre-culture was prepared by diluting cells to an OD600 of 0.1 with fresh specific medium, and grown to an OD600 of 0.5 at 37°C at 250 rpm. Cells were pelleted by centrifugation and re-suspended in saline. Fluorescence from the cell population was measured using a fluorescent plate-reader (Thermo Scientific Fluoroskan Ascent Microplate Fluorometer).
Third, relative RNAp concentrations were also estimated based on the growth rates of DH5α-PRO cells in Supplementary Fig. S1. First, we fit a power law function to the ‘RNA polymerase molecules per cell’ row of Table 3 from ref. 32, which we found to be $R = 10^{6}\mu^{-1.426}$, where µ is the cell doubling time. Relative RNAp concentrations were then estimated from the measured cell doubling times.
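As a minimal sketch of this estimate, the snippet below reads the fitted power law as R = 10^6 · µ^(−1.426) and assumes that µ is expressed in minutes (the unit is not stated in the text). The function names are hypothetical, and any conversion from molecules per cell to concentration is ignored here.

```python
def rnap_per_cell(doubling_time_min):
    """Power-law fit quoted in the text, read as R = 1e6 * mu**(-1.426).

    Assumption: the doubling time mu is given in minutes.
    """
    return 1e6 * doubling_time_min ** -1.426


def relative_rnap(doubling_time_min, reference_doubling_time_min):
    """RNAp level in one condition relative to a reference condition."""
    return rnap_per_cell(doubling_time_min) / rnap_per_cell(reference_doubling_time_min)


# Hypothetical doubling times, for illustration only.
print(relative_rnap(60.0, 30.0))
```

Because only the ratio of two conditions is used, the prefactor of the power law cancels and the relative estimate depends on the exponent alone.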
Lastly, we measured the relative RNAp concentrations in RL1314 cells under the microscope using fluorescently tagged RpoC (described in the next section).
DH5α-PRO cells containing the target and the reporter plasmids were grown as described previously. Briefly, cells were grown overnight in respective media, diluted into fresh media to an OD600 of 0.1, and allowed to grow to an OD600 of ~0.3. For the reporter plasmid induction, aTc (100 ng/ml) was added 1 h before the start of the measurements. For the target plasmid, arabinose (1%) was added at the same time as aTc (following the protocol in ref. 2), and IPTG (1 mM) was added 10 min before the start of the measurements. Cells were pelleted and resuspended in fresh medium. A few microliters of cells were placed between a coverslip and an agarose gel pad (2%), which contains the respective inducers, in a thermal imaging chamber (FCS2, Bioptechs), heated to 37°C. The cells were visualized using a Nikon Eclipse (Ti-E, Nikon, Japan) inverted microscope with a C2+ confocal laser-scanning system using a 100× Apo TIRF objective. Images were acquired using the Nikon Nis-Elements software. GFP fluorescence was measured using a 488 nm argon ion laser (Melles-Griot) and 514/30 nm emission filter. Phase-contrast images were acquired with the external phase contrast system and a Nikon DS-Fi2 camera. Fluorescence images were acquired every 1 min for a total duration of 2 h. Phase-contrast images were acquired simultaneously every 5 min during the measurements.
We tested for phototoxicity due to the fluorescence and the phase-contrast imaging in these measurements. Supplementary Results suggest that there is no significant phototoxicity. Additionally, we verified that the relative RNAp concentrations under the microscope are similar to those measured in the previous section by repeating the above procedure with RL1314 cells and imaging RpoC::GFP fluorescence, 1 h after being placed in the thermal imaging chamber (see Supplementary Fig. S4). The relative RNAp concentration was estimated from the mean fluorescence concentrations of cells growing in each media.
Cells were detected from the phase contrast images as described in ref. 37. First, the images were temporally aligned using cross-correlation. Next, an automatic segmentation of the cells was performed by MAMLE,38 which was checked and corrected manually. Next, cell lineages were constructed by CellAging.39 Alignment of the phase-contrast images with the confocal images was done by manually selecting 5–7 landmarks in both images, and using thin-plate spline interpolation for the registration transform. Fluorescent spots and their intensities were detected from the confocal images using the Gaussian surface-fitting algorithm from ref. 40.
Jumps were detected in each cell's spot intensity timeseries using a least-deviation jump-detection method.41 Given the level of noise in the timeseries, jump sizes, i.e. the intensity of ‘one RNA’, were selected by manual inspection of the timeseries of total foreground spot intensities within cells of a given timeseries, and cross-referencing these values with the observed numbers of spots in the cells. After performing the jump detection process making use of the complete timeseries, jumps occurring within 5 min of the beginning or end of a cell's lifetime were disregarded due to our observation that the jump detection method tends to produce spurious jumps in these regions due to insufficient data. The remaining jumps were interpreted as RNA production times, from which intervals between transcription events were calculated. Finally, censored intervals were calculated as the time from the last RNA production in a cell until the last time at which a jump could have been observed (i.e. until 5 min prior to cell division or the end of the timeseries). This removes the possibility of false positives while not affecting the distribution of intervals.
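The bookkeeping that turns detected jumps into intervals and censored intervals can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: times are in minutes, the function name is hypothetical, and a cell with no retained jump yields no censored interval here (the text does not specify that case).

```python
def intervals_from_jumps(jump_times, cell_start, cell_end, margin=5.0):
    """Convert jump times of one cell into intervals and a censored interval.

    Jumps within `margin` minutes of the start or end of the cell's
    lifetime are discarded, as described in the text.
    """
    first_valid = cell_start + margin
    last_valid = cell_end - margin
    kept = sorted(t for t in jump_times if first_valid <= t <= last_valid)

    # Complete intervals: time between consecutive retained RNA productions.
    intervals = [later - earlier for earlier, later in zip(kept, kept[1:])]

    # Censored interval: from the last retained production to the last time
    # at which a jump could still have been observed.
    censored = (last_valid - kept[-1]) if kept else None
    return intervals, censored


# Hypothetical cell observed from t = 0 to t = 60 min.
print(intervals_from_jumps([2, 10, 25, 58], cell_start=0, cell_end=60))
```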
This method, when first proposed, made two assumptions on the fluorescence of MS2-GFP tagged RNAs (named ‘spots’). Importantly, both assumptions were recently shown to be valid.42 First, an individual spot is bound sufficiently rapidly by MS2-GFPs such that its fluorescence intensity, when first detected, is already within the range of fluorescence of fully formed MS2-GFP-RNA spots (when taking one image per minute). In other words, the spot intensity of a newly transcribed RNA jumps from 0 to ‘full’ in <1 min, rather than slowly ramping up. Namely, since the transcription elongation rate of mRNA in E. coli is ~50 nt/s (ref. 32) and the target gene is ~3,200 bp long (ref. 1), the time to elongate the MS2-binding site region of the target RNA is ~60 s. Provided that MS2-GFP binding to its RNA-binding sites is fast, there will therefore be a maximum of one timepoint at which the fully transcribed target RNA may have reduced fluorescence. Since MS2-GFP is produced in excess in the cell and its binding affinity is strong (dissociation constant of ~0.04 nM, ref. 43), most binding sites will be saturated very shortly after being produced. In agreement with ref. 42, no gradual increase in spot fluorescence was observed around the time of the first appearance of a spot.
Second, once formed, MS2-GFP-RNA spots, as well as their fluorescence, are resistant to degradation for the duration of our measurements (2 h). This was shown by measurements of the dissociation rate of MS2 coat proteins from their RNA binding sites (on the order of several hours43), and by measurements of the lifetimes of the fluorescence of MS2-GFP tagged RNAs kept under observation for more than 2 h.1,2,5,42,44 Relevantly, no detectable decrease in fluorescence was observed during this time.42
We first consider a model that allows for RNA production dynamics to range from sub-Poissonian to super-Poissonian, given the results from genome-wide studies of the variability in RNA numbers27,45 and from studies of the transcription dynamics of individual genes.2,4,5,17,20 The features of the model that allow it to reproduce these numbers are based on processes known to occur during transcription initiation in E. coli (e.g. the open complex formation16,22,23 and an ON/OFF mechanism16,19). Then, based on our novel empirical data and methodology, we aim to obtain the most parsimonious version of the model that fits the data for a given promoter. We expect this procedure to be applicable to any promoter, and to result in slightly different models due to their differing dynamics and regulatory mechanisms.
The full model of transcription initiation considered here consists of the following set of reactions:
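A sketch of the reaction scheme, written to be consistent with the description that follows. The rate constants k1, k−1, k2 and k3 and the state names are those used in the text; the names k_ON and k_OFF for the rates of Reaction (2) are assumptions introduced here.

```latex
\begin{align}
  P_{ON} + R \;
  \underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}} \;
  RP_{c} \;
  \xrightarrow{\;k_{2}\;} \;
  RP_{o} \;
  \xrightarrow{\;k_{3}\;} \;
  P_{ON} + R + RNA \tag{1} \\[4pt]
  P_{ON} \;
  \underset{k_{ON}}{\overset{k_{OFF}}{\rightleftharpoons}} \;
  P_{OFF} \tag{2}
\end{align}
```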
Reaction (1) represents the multi-step process of transcription initiation of an active promoter in prokaryotes.23,24,46,47 It begins with the formation of the closed complex (RPc), i.e. the binding of the RNA polymerase (R) to a free promoter (PON). Once at the start site, the polymerase must open the DNA double helix, a process that includes several long-lived intermediate states,23,26,46,48 resulting in the open complex (RPo). Finally, the polymerase begins RNA elongation, though before clearing the promoter, it may engage in abortive RNA synthesis in which short RNA transcripts (<10 nt) are produced.47,49 The reactions in (1) should not be interpreted as elementary transitions. Rather, they represent the effective rates of the rate-limiting steps in the process, thus defining the promoter strength, and have been shown to be sequence-dependent.50
Specifically, k1 represents the rate at which polymerases find and bind to the promoter region, which is the overall result of the promoter search process which includes non-specific binding of the polymerases to the DNA, followed by a 1D diffusive search,51,52 collectively referred to here as the closed complex formation. Subsequently, several rapid, possibly reversible isomerization reactions occur until the polymerase melts the DNA and forms the transcription ‘bubble’.51 In Reaction (1), the RPc state represents all substates until the first irreversible reaction in this chain. Consequently, k2 and k−1 should be interpreted as the product of the rates of the elementary reactions which exit from this group of substates, and the steady-state probability of being in the appropriate substates for these reactions to occur.
Similarly, the RPo state may represent numerous substates between the first state after which the complex is committed to initiation, and successful initiation. However, after this point, we cannot distinguish the reversibility of any of the following steps, since the time-interval distribution of a sequence of elementary reversible reactions of arbitrary rates is observationally equivalent to a sequence of irreversible reactions.25 The remaining steps (here, only k3) therefore represent the rates of the slowest of these irreversible reactions. Such steps may include additional isomerization reactions, abortive RNA synthesis and promoter escape and clearance.35
Reaction (2) represents the promoter intermittently transitioning to a transcriptionally inactive state (POFF). Experimentally verified mechanisms by which this can occur are the binding and unbinding of repressors and activators,29 and the accumulation of positive supercoiling in the DNA.19 Additional mechanisms have also been hypothesized, such as transcriptional pausing53,54 and others.55
For a given concentration of R, the interval distribution between transcription events described by Reactions (1) and (2) (i.e. the first-passage time distribution to reach the final state, starting in the PON state) is observationally equivalent to the interval distribution described by a model of the form:
where the system starts in state S0. The relationship between the parameters of these two models is described in Supplementary Table S1. Note that the states Si do not correspond to the promoter states in Reactions (1) and (2). For details on how to derive and evaluate the distribution function for this model, see Supplementary Material and ref. 25.
It is noted that this model assumes that only one copy of the promoter is present in each cell at any given time. In the experiments performed here, in all conditions tested, the bacteria divided sufficiently slowly such that they spent most lifetime with only one chromosome. Specifically, cells spent no more than 11.4 ± 1.0% of their lifetime with two copies of the target promoter (Supplementary Material).
Finally, it is noted that the present model does not consider the influence of σ factors' numbers on the dynamics of transcription initiation, focussing instead solely on the concentration of RNA polymerases (in particular, on the concentration of holoenzymes containing a σ70, i.e. Eσ70, since our promoter of interest can only be transcribed by Eσ70). This is based on the fact that, in all conditions tested, most RNA polymerases are occupied by σ factors.56,57 Further, this occupation is made largely by σ70 since, first, when altering media richness, only σ32's concentration is significantly altered56 and, second, the binding affinity of σ70 to E is much higher than that of any other σ factor (e.g. it is approximately 9 times higher than that of σ32).57
Parameter estimates in Tables 1–3 were obtained by a maximum likelihood fit using the samples of the distribution of time intervals between production events obtained above (the intervals and censored intervals), as in ref. 25. The complete model-fitting procedure is detailed in the Supplementary Material. The uncertainty of the fit of the model parameters was estimated using the negative of the Hessian of the log-likelihood surface, evaluated at the maximum likelihood estimate.
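To illustrate how censored intervals enter a maximum likelihood fit and how the Hessian gives an uncertainty, the sketch below uses a deliberately simplified stand-in: a single exponential interval distribution, not the multi-step model fitted in this work. Complete intervals contribute the density and censored intervals contribute the survival probability, which gives a closed-form estimate. The function name is hypothetical.

```python
import math


def fit_exponential_censored(intervals, censored):
    """Maximum likelihood rate of an exponential model with right-censoring.

    log L(rate) = n * log(rate) - rate * (sum(intervals) + sum(censored)),
    where n is the number of complete (uncensored) intervals.
    """
    n = len(intervals)
    total_time = sum(intervals) + sum(censored)
    rate = n / total_time  # maximizer of log L

    # Negative second derivative of log L at the maximum: n / rate**2.
    neg_hessian = n / rate ** 2
    std_err = math.sqrt(1.0 / neg_hessian)  # standard uncertainty of the rate
    return rate, std_err


# Hypothetical data (seconds): three complete intervals, one censored.
rate, err = fit_exponential_censored([100.0, 200.0, 300.0], [400.0])
print(f"mean interval ~ {1.0 / rate:.1f} s, rate = {rate:.4f} +/- {err:.4f} 1/s")
```

Note that dropping the censored interval from this example would give a shorter mean interval, which is the bias that including censored observations is meant to avoid.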
The mean of the time interval distribution between transcription initiation events, $I(R)$, predicted by Reactions (1) and (2) is, for a given RNAp concentration $R$:
$$I(R) = \tau_{cc}(R) + \tau_{oc}$$
where $\tau_{cc}(R)$ is the mean time taken by the initial binding of RNAp for a given RNAp concentration, and $\tau_{oc}$ is the mean time taken by the steps occurring after the polymerase has committed to transcription until the clearance of the promoter region (due to the initiation of elongation). As such, we expect the majority of the duration of $\tau_{oc}$ to consist of the open complex formation as defined in ref. 46. The remainder of its duration we attribute to failures in promoter escape.59
Estimates of $\tau_{cc}(R)$ and $\tau_{oc}$ were obtained from the best-fit parameters of the most parsimonious model, as given in Table 3. The standard uncertainties of these estimators were obtained using the Delta Method60 from the uncertainties of the model parameters.
Finally, the mean duration of intervals between transcription events in each media condition was estimated by fitting the model in Reaction (3) to the data from only that condition, and taking the mean of the distribution. This procedure was followed to include the censored intervals in the estimate, so as to avoid underestimating the mean interval duration due to the limited observation times. The standard uncertainty of each estimate was obtained using the Delta Method.60
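For reference, the mean first-passage time of the scheme described by Reactions (1) and (2) can be written in closed form. The sketch below assumes that Reaction (2) is a reversible switch between the free promoter and the inactive state, with rates named k_OFF (inactivation) and k_ON (reactivation); these two names are assumptions, while k1, k−1, k2 and k3 follow the text.

```latex
\begin{equation}
  I(R) \;=\;
  \underbrace{\frac{1}{k_{1} R}
    \left(1 + \frac{k_{OFF}}{k_{ON}}\right)
    \left(1 + \frac{k_{-1}}{k_{2}}\right)}_{\text{depends on } R}
  \;+\;
  \underbrace{\frac{1}{k_{2}} + \frac{1}{k_{3}}}_{\text{independent of } R}
\end{equation}
```

Under these assumptions, the first term scales with 1/R and vanishes in the limit of infinite RNAp concentration, while the second term is what remains in that limit. Both the inactive state and the reversibility of the closed complex only rescale the slope with respect to 1/R, so they cannot be separated from k1 using the mean alone.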
We verified the slope of the τ-plot in Fig. 4 using the RT-PCR measurements from Fig. 3. These measurements are both linear with respect to the inverse of the RNAp concentration, but differ by an unknown scaling factor. We found this scaling factor by fitting the parameter c that relates the RT-PCR measurements in each media condition m to the corresponding mean interval estimates by weighted total least squares61 (WTLS), with the measurements weighted by the inverse of their uncertainty. This method was chosen since it accounts for the uncertainty in both of the measurements. The dashed line in Fig. 4 was obtained by fitting the scaled points against the inverse of the RNAp concentration by WTLS. The uncertainty shown includes both the uncertainty in the WTLS fit of this line and the uncertainty in the estimate of c.
The method to infer the kinetics of transcription initiation in vivo is illustrated in Fig. 1. First, conditions are selected that differ widely in free intracellular RNAp concentrations (step A in Fig. 1). Next, an in vivo single-molecule detection technique is used to sample the time interval distribution between consecutive transcription events in individual cells in each of the conditions (step C in Fig. 1). To obtain these intervals, here we used the MS2d-GFP single RNA detection system4 (step B in Fig. 1). Then, we fit a general model of transcription initiation to the empirical data (see above), which includes both the multi-step nature of transcription initiation as well as the possibility of an intermittently inactive promoter state25 (Reactions (1) and (2)). From this fit, we obtain an estimate of the in vivo mean duration of the open complex formation by extrapolating the duration of intervals between transcription events to infinite RNAp concentrations, similar to the in vitro extrapolation presented in ref. 22 (step D in Fig. 1). The model fit will also assess the importance of an intermittent inactive promoter state and the reversibility and kinetics of the closed complex formation.
We first verified that it is possible to change intracellular RNAp concentration by a wide range by changing the growth conditions of the cells.32,35,62 As such, we grew cells in four media (described in the Materials and methods), labelled 1×, 0.75×, 0.5×, and 0.25×, which solely differ in richness of two components (tryptone and yeast extract). We then measured the relative RNAp concentrations in cells grown in these four media using RT-PCR of the rpoC gene, i.e. the gene coding for the β′ subunit, which is the limiting factor in the assembly of the RNAp holoenzyme.48,57,62 Results in Fig. 2 (dark grey bars) show that, in the range tested, the RNAp concentration in the cells increases significantly with increasing media richness.
To validate this result, we measured the relative RNAp concentrations by plate reader in cells expressing fluorescently tagged RpoC in the strain RL1314 (derived from W3110),33 in the same four media. In addition, we also measured the levels of the rpoC transcripts in the strain W3110 by RT-PCR in the 0.5× and 1× conditions. Results (Fig. 2) show that the relative changes in the protein and mRNA levels of rpoC match the measurements by RT-PCR of the rpoC gene in DH5α-PRO.
Note that, even though the experimental procedures and strains differ, our measurements are in agreement with the relative changes in RNAp concentrations reported in ref. 32, for the difference in growth rates observed here between the 0.25× and 1× conditions (Supplementary Fig. S1), which we estimate to be ~0.48 (Materials and methods). In this regard, given that the same result applies to (at least) three different strains, we expect it to be significantly strain-independent.
Finally, to verify that the relative RNAp concentrations measured in Fig. 2 are maintained under the microscope, we measured the relative RNAp concentration in the RL1314 cells expressing fluorescently tagged RpoC under the microscope between the two extreme conditions (0.25× and 1×), after 1 h in the thermal imaging chamber (Materials and methods). The relative RNAp concentration between the conditions was measured to be 0.367 ± 0.012, which is consistent with the measurements in Fig. 2. Lastly, from these images, we did not observe significant cell-to-cell variability in the RNAp concentrations (Supplementary Fig. S4), indicating that the mean concentrations reported in Fig. 2 are representative of the populations.
These measurements show that the relative RNAp concentration changes widely between the selected growth conditions. However, the variable affecting transcription kinetics is the relative free RNAp concentration. As such, we must verify whether the relative total RNAp concentration can be used as a proxy for the relative free RNAp concentrations. If this holds true and there are no other factors affecting the production rate of the promoter of interest in these conditions, then the RNA production rate should be hyperbolic with respect to the RNAp concentration. That is, the reciprocal of the RNA production rate from this promoter should be linear when plotted against the reciprocal of the measured relative RNAp concentrations, and one should obtain a line on a Lineweaver–Burk plot.
There are several reasons why this plot may not be linear. If, for example, the ratio of free RNAp to total RNAp is not constant in this range of growth conditions, with a higher fraction of free RNAp in the poorer growth conditions due to increased ppGpp,31 then we expect a curve with positive curvature on this plot. Meanwhile, a negative curvature would be obtained if the promoter of interest could be induced by increased cAMP in the poorer growth conditions, or if the cells spent, on average, a significantly increased amount of time with multiple copies of the plasmid in the richer growth conditions, among other possibilities. In these cases, to dissect the transcription initiation kinetics of such promoters, another method of modifying the free RNAp concentration will be required.
Given the above, we interpret a straight line on the Lineweaver–Burk plot as evidence that, for the conditions tested, (i) the relative free RNAp concentrations can be assessed from the total RNAp concentrations, and (ii) no factors other than the changes in the free RNAp concentration affect the target promoter.
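The extrapolation behind such a plot can be sketched with an ordinary least-squares line through mean interval versus inverse relative RNAp concentration. This is a simplification: it ignores measurement uncertainties, whereas the analysis in this work uses weighted total least squares. The function names and the numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate_to_infinite_rnap(relative_rnap, mean_intervals):
    """Fit mean interval against 1/RNAp.

    The intercept is the mean interval in the limit of infinite RNAp
    (1/RNAp -> 0); the slope is the RNAp-dependent time at relative
    RNAp concentration 1.
    """
    inverse_rnap = [1.0 / r for r in relative_rnap]
    return fit_line(inverse_rnap, mean_intervals)


# Hypothetical measurements: relative RNAp levels and mean intervals (s).
slope, intercept = extrapolate_to_infinite_rnap([1.0, 0.5, 0.25],
                                                [1000.0, 1800.0, 3400.0])
print(f"RNAp-dependent part at relative RNAp = 1: {slope:.0f} s")
print(f"RNAp-independent part (intercept): {intercept:.0f} s")
```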
Here, we tested this by measuring the RNA production rate from Plac/ara-1 in E. coli DH5α-PRO by RT-PCR in the same four media conditions as in Fig. 2. We selected this promoter, since its dynamics has been extensively characterized2,21,29,63–67 and because it has the same logical structure as the lac promoter, with an activator and a repressor.63 The resulting Lineweaver–Burk plot is shown in Fig. 3, where a linear relationship is clearly observed between these points (black points). To determine whether the small deviations from linearity are statistically significant, we performed a likelihood ratio test between a linear fit by WTLS61 (shown as a line in Fig. 3), and fits with higher order polynomials (also by WTLS by minimizing χ2 as in ref. 61). No test rejected the linear model (all P > 0.25). As noted earlier, this relationship is only expected to occur in a limited range of growth conditions. To illustrate this, we repeated the same measurements in 1.5× media (grey point in Fig. 3). The result shows that this hyperbolic relationship is lost in very rich media (including this point causes the likelihood ratio test to reject the linear model, P = 0.0014). We conclude that, for the growth conditions in Fig. 2, the relative free RNAp concentrations are well-approximated by the total RNAp concentrations, and there are no significant other factors affecting the initiation dynamics of Plac/ara-1.
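A likelihood ratio test of this kind can be sketched as a comparison of χ2 values between nested polynomial fits. The example below is a simplified stand-in: it uses ordinary weighted least squares with uncertainties in y only, rather than the weighted total least squares used in this work, and it only compares a line against a quadratic. All function names and data are hypothetical.

```python
import math


def _solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def weighted_chi2(x, y, sigma, degree):
    """Chi-square of a weighted polynomial least-squares fit."""
    w = [1.0 / s ** 2 for s in sigma]
    k = degree + 1
    a = [[sum(wi * xi ** (i + j) for wi, xi in zip(w, x)) for j in range(k)]
         for i in range(k)]
    b = [sum(wi * yi * xi ** i for wi, xi, yi in zip(w, x, y)) for i in range(k)]
    coef = _solve(a, b)  # coefficients in increasing order of power
    return sum(wi * (yi - sum(c * xi ** p for p, c in enumerate(coef))) ** 2
               for wi, xi, yi in zip(w, x, y))


def lr_test_linear_vs_quadratic(x, y, sigma):
    """Return (statistic, p-value); a small p-value rejects the linear model."""
    stat = max(0.0, weighted_chi2(x, y, sigma, 1) - weighted_chi2(x, y, sigma, 2))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical points that curve upward, with unit uncertainties.
print(lr_test_linear_vs_quadratic([1, 2, 3, 4, 5], [1, 4, 9, 16, 25], [1.0] * 5))
```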
Given this, it is possible to apply a standard model-fitting procedure to fully characterize the in vivo kinetics of the rate-limiting steps in transcription initiation of the Plac/ara-1 promoter from distributions of intervals between transcription events in cells with different RNA polymerase concentrations.
We measured the distribution of time intervals between transcription events (hereafter referred to as ‘intervals’) for Plac/ara-1 in each cell growth condition using the MS2d-GFP single-RNA detection system,1 with a least-deviation jump-detection procedure41 (Materials and methods). This measurement results in samples from the interval distribution as well as ‘censored’ intervals, i.e. intervals for which we only observe the beginning due to cell division or the end of the time series. Both censored and uncensored intervals were accounted for in all parameter estimates to avoid biasing the estimates. For example, note that taking the mean of the uncensored intervals alone would underestimate the mean of the true interval distribution since long unobservable intervals would be absent from the estimate. Including the censored intervals balances this by considering long intervals that are at least as long as the censored interval length.25
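The bias described above can be made concrete with a short Python sketch. It assumes, purely for illustration, that the intervals are exponentially distributed, in which case the maximum-likelihood estimate of the mean with right-censored samples is the total observed time divided by the number of uncensored intervals. This is a textbook estimator and not necessarily the one used in Materials and methods; the function name and all interval values are hypothetical.

```python
def mean_interval_mle(observed, censored):
    """MLE of the mean of an exponential distribution with right-censored samples.

    observed: fully observed interval durations (s)
    censored: durations (s) of intervals whose end was not observed
    """
    if not observed:
        raise ValueError("at least one uncensored interval is needed")
    total_time = sum(observed) + sum(censored)
    return total_time / len(observed)

# Hypothetical data: three complete intervals and two censored ones.
observed = [400.0, 900.0, 1500.0]
censored = [2000.0, 2600.0]

naive_mean = sum(observed) / len(observed)            # ignores censored intervals
corrected_mean = mean_interval_mle(observed, censored)
print(f"naive: {naive_mean:.0f} s, censoring-aware: {corrected_mean:.0f} s")
```

The censoring-aware estimate is larger than the naive mean because the censored intervals contribute observation time without contributing a completed event.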
From these distributions, we estimated the true mean and the squared coefficient of variation (CV2, defined as the variance over the squared mean) of the interval distributions (Materials and methods). We chose CV2 for quantifying the noise in the interval distribution since, to a good approximation, this quantity reflects the level of noise in the protein levels regardless of the actual shape of the transcription interval distribution.68 Further, this variable equals 1 for the interval distribution of a Poisson process (i.e. an exponential distribution), regardless of the mean rate. These results, along with the amount of empirical data used, are shown in Table 1.
From Table 1, the mean interval decreases significantly with increasing media richness, as expected from the increased RNAp concentrations. Meanwhile, the CV2 does not exhibit the same dependence on the media richness, and remains slightly >1 in all conditions tested.
From the data in Table 1, we next recreate the Lineweaver–Burk plot in Fig. 3 (white circles in Fig. 4), using the mean interval durations between RNA productions, as this quantity is an absolute measure of the inverse rate of RNA production (this plot is called a τ-plot).
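As a minimal illustration of the definition of CV2, the Python sketch below computes it for simulated exponential intervals with two different means and shows that it stays close to 1 in both cases. The sketch ignores censoring, the sample sizes and means are arbitrary, and the function name is hypothetical.

```python
import random
import statistics

def cv2(intervals):
    """Squared coefficient of variation: variance divided by the squared mean."""
    mean = statistics.fmean(intervals)
    return statistics.pvariance(intervals, mu=mean) / mean**2

rng = random.Random(0)  # fixed seed so that the run is reproducible

# Simulated intervals of a Poisson process with mean 1000 s and mean 100 s.
slow = [rng.expovariate(1.0 / 1000.0) for _ in range(20_000)]
fast = [rng.expovariate(1.0 / 100.0) for _ in range(20_000)]

print(f"CV2 (mean 1000 s): {cv2(slow):.2f}")
print(f"CV2 (mean 100 s):  {cv2(fast):.2f}")
```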
Previously, using in vitro techniques, it has only been possible to extract from a τ-plot the mean duration of the open complex formation (the y-intercept of the plot), because the plot is based on the steady-state assay which only measures the mean rate of abortive transcription initiations. However, the distributions of time intervals between RNA productions contain information about the stochasticity of the process (i.e. the variability between intervals). As such, it is possible to extract a more complete model of the process of transcription. Namely, aside from the open complex formation, as mentioned in Materials and methods, it is possible to extract information on the closed complex and on an intermittent state prior to the closed complex formation.
In particular, we consider the detailed model of transcription initiation presented in Materials and methods (Reactions (1) and (2)), along with simplified models that can be considered if certain steps of the more detailed model do not influence the distribution of intervals. This model assumes that only one copy of the promoter is present in each cell at any given time, since in all conditions, the bacteria divided slowly, which suggests that they spent most of their lifetime with only one chromosome. We then consider three simplified models. First, if the time spent in the OFF state is very small, or if the system switches between OFF and ON very rapidly when compared with the forward reaction, then Reaction (2) will not affect the RNA production dynamics. A sufficient condition for both of these situations is that kON >> k1. The other two simplifications are two limits of the closed complex formation, first considered in ref. 22: (i) k−1 >> k2, i.e. it is reversible (Limiting Mechanism I), and (ii) k2 >> k−1, i.e. irreversible (Limiting Mechanism II). Limiting Mechanism I was found to be more likely in several in vitro measurements of various promoters.22,23,26
While all three simplifications are consistent with a line on a τ-plot, they produce significantly different distributions of intervals between RNA production events. For example, a significant ON/OFF mechanism will result in a more noisy distribution (a higher CV2).25 Similarly, Limiting Mechanism I effectively eliminates one limiting step, which also results in higher noise when compared with Limiting Mechanism II (Supplementary Fig. S2).
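The effect of the number of limiting steps on the noise can be illustrated with a short Python sketch. It assumes a purely sequential process of independent, exponentially distributed steps, with no ON/OFF switching and no reversibility, which is a simplification of the models considered here. For such a process the variance of the interval is the sum of the squared mean step durations, so CV2 is that sum divided by the squared total mean. The function name and the step durations are hypothetical.

```python
def cv2_sequential(step_means):
    """CV2 of the sum of independent exponential steps with the given mean durations."""
    total = sum(step_means)
    return sum(t**2 for t in step_means) / total**2

# Same total mean interval (1000 s), split over different numbers of limiting steps.
one_step = cv2_sequential([1000.0])             # single limiting step
two_steps = cv2_sequential([500.0, 500.0])      # two equal limiting steps
uneven_steps = cv2_sequential([900.0, 100.0])   # one step dominates

print(one_step, two_steps, uneven_steps)
```

Removing a limiting step, or making one step dominate, moves CV2 towards 1, which is the sense in which fewer effective limiting steps give a noisier interval distribution.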
We fit the full and simplified models of transcription initiation to the observed dynamics of Plac/ara-1 from all media conditions (Materials and methods). We used the Bayesian Information Criterion70 (BIC) to compare the fits. The BIC is a model selection criterion which balances goodness-of-fit with the number of parameters to determine which model is most likely the ‘truth’. The difference between BIC values (ΔBIC) can be interpreted as evidence against the model with higher BIC, with a ΔBIC > 5 being interpreted as strong evidence.58 Results are shown in Table 2. Since, for several of the models, the optimal fit assigned a negligible duration to the step following the open complex formation, we also considered models that do not include another rate-limiting step after the open complex formation.
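For reference, the BIC comparison can be sketched in a few lines of Python using the standard definition BIC = k ln(n) − 2 ln(L), where k is the number of free parameters, n the number of samples and L the maximized likelihood. The log-likelihoods, parameter counts and sample size below are hypothetical and are not taken from Table 2.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion; lower values indicate a preferred model."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood

# Hypothetical fits of two competing models to the same 400 samples.
bic_simple = bic(log_likelihood=-1500.0, n_params=3, n_samples=400)
bic_complex = bic(log_likelihood=-1498.0, n_params=5, n_samples=400)

# Evidence against the model with the higher BIC.
delta_bic = bic_complex - bic_simple
print(f"BIC simple = {bic_simple:.2f}, BIC complex = {bic_complex:.2f}, "
      f"delta = {delta_bic:.2f}")
```

In this hypothetical case the more complex model fits slightly better, but not by enough to offset the penalty for its two additional parameters. Because the penalty grows with ln(n), the value chosen for n also shifts ΔBIC.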
From Table 2, the initiation kinetics of Plac/ara-1 is best fit by Limiting Mechanism I (i.e. a reversible closed complex), with very high certainty (ΔBIC of all other models >8). We also find evidence for a significant ON/OFF mechanism. Though the time spent in each OFF state is short (~87 s), the promoter will turn OFF, on average, ~9.1 times before committing to transcription in the 1× case (see Supplementary Material). This results in an interval distribution which is only slightly more noisy than what would be expected if the production process were Poissonian (i.e. a CV2 of the interval distribution of 1; see the CV2 values in Table 1). Interestingly, this implies that the noise in transcription of this promoter is representative of the behaviour of the majority of promoters in E. coli.27 Finally, the steps after the commitment to transcription are fast, indicating that abortive initiation events do not play a significant role in the dynamics of RNA production by Plac/ara-1. This model is depicted graphically in Fig. 5.
In addition, from Table 2, we find that the y-intercept of the τ-plot, i.e. the mean duration of the steps following the closed complex formation, is 193 ± 49 s. Meanwhile, the slope of the line on the τ-plot is 788 ± 59 R·s (R is the polymerase concentration such that R = 1 is the polymerase concentration in 1× media). The line given by these values is shown in Fig. 4 (solid line). As a side note, the uncertainties of these estimates exaggerate the uncertainty of the inference, since the estimates are highly correlated (correlation coefficient of −0.6). This correlation is responsible for the hyperbolic shape of the confidence bounds (grey region in Fig. 4).
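As a worked example, the line on the τ-plot can be evaluated directly from the two point estimates given above: the mean interval at relative polymerase concentration R is the intercept plus the slope divided by R. The Python sketch below uses only those two values; the uncertainties and their correlation are ignored, and the function and constant names are hypothetical.

```python
TAU_INTERCEPT_S = 193.0   # y-intercept of the tau-plot (s)
TAU_SLOPE_RS = 788.0      # slope of the tau-plot (R*s), with R = 1 in 1x media

def mean_interval(r):
    """Mean interval between transcription events (s) predicted by the fitted line."""
    if r <= 0:
        raise ValueError("relative RNAp concentration must be positive")
    return TAU_INTERCEPT_S + TAU_SLOPE_RS / r

at_1x = mean_interval(1.0)      # 1x media, R = 1 by definition
at_half = mean_interval(0.5)    # hypothetical condition with half the RNAp of 1x media
share_r_dependent = (TAU_SLOPE_RS / 1.0) / at_1x

print(f"R = 1:   {at_1x:.0f} s")
print(f"R = 0.5: {at_half:.0f} s")
print(f"R-dependent share of the interval at R = 1: {share_r_dependent:.0%}")
```

At R = 1 the line gives 981 s, of which roughly 80% depends on the polymerase concentration.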
We verified the slope of the solid line in Fig. 4 using the RT-PCR measurements presented in Fig. 3, scaled to match the timescale of the intervals (Materials and methods). The resulting line is shown in Fig. 4 (dashed line), and is in good agreement with both the line given by our estimates of the intercept and the slope (solid line), and the inferred interval means (white circles).
Lastly, we note that the BIC depends on the number of samples used to calculate the likelihood. Thus, BIC values calculated assuming that each censored interval is ‘one sample’ will over-penalize models with more parameters, while removing them will under-penalize them. Both sets of ΔBIC values are presented in Table 2 and, in our case, both result in the same conclusion, and thus the distinction does not affect the results for Plac/ara-1. If, for another promoter, the two sets lead to different conclusions, additional measurements will be required to distinguish between the models.
Our results are in agreement with previous measurements of the kinetics of this and similar promoters. For example, a previous study reported that, under full induction in LB media (1× media here), Plac/ara-1 expresses ~4 RNA/h (ref. 2), i.e. 1 RNA every ~900 s, while we inferred the time between transcription events to be ~980 s. Using the steady-state assay, the mean duration of the open complex formation was measured to be ~330 s for Plac (ref. 71; with or without CRP-cAMP), while we obtained ~193 s.
We identified the presence of an ON/OFF mechanism in the dynamics of Plac/ara-1. It is worth noting that this ON/OFF phenomenon differs from the one reported in refs 2 and 19. First, we only observe OFF periods on the order of ~87 s, while in ref. 2 the OFF periods reported for Plac/ara-1 were on the order of 37 min. In addition, both here and in ref. 2, the promoter of interest is integrated in a single-copy plasmid, and thus the OFF periods cannot be explained by the buildup of positive supercoiling, since the plasmid is not topologically constrained.19 We therefore hypothesized that the OFF periods observed here more likely result from the intermittent formation of a DNA loop, due to the transient binding of LacI, which exists in high concentration in DH5α-PRO (~3,000 copies vs. ~20 in wild type63).
If LacI is responsible for the ON/OFF behaviour, then reducing the concentration of IPTG should affect the ON/OFF dynamics, and not change the dynamics following the closed complex formation.29 To test this prediction, and demonstrate the utility of the model-fitting approach, besides considering the interval measurements in 1× in Table 1, we also measured the interval distribution of Plac/ara-1 using MS2d-GFP in the 1× media without induction by IPTG. From 130 cells, we extracted 57 intervals and 117 censored intervals between transcription events. From these, we inferred a mean interval of 3,374 ± 462 s, and a CV2 of 1.03. This mean is significantly greater than the mean measured in the fully induced condition (1,005 ± 112 s), consistent with the much stronger repression of the promoter by LacI in this condition.
Given the wide difference in dynamics of RNA production between the induced and non-induced cases, we used the model-fitting procedure to determine which steps are significantly affected by LacI. For this, we performed independent fits of a reduced model of initiation to the induced and the non-induced conditions. This model is observationally equivalent to the full model of initiation (Reactions (1) and (2)) for a single value of R, and is presented in Reaction (3). This reduced model is necessary since we do not have measurements of the uninduced case at multiple values of R with which to fit all parameters of the full model. The reduced model's parameters are denoted by λx, which are related to, but are not equal to, the values of kx. Their relationship is presented in Supplementary Table S1. The fitting results are shown in Table 3 (labelled ‘Independent’). We also considered joint models where parameters were fixed between conditions, and used the BIC to select the most likely model.
The first three models with joint parameters test for whether or not the parameters controlling the ON/OFF mechanism change with induction strength. Consistent with this hypothesis, the models with joint are strongly rejected (ΔBIC much higher than that of the Independent model). Surprisingly, the model with only joint was also rejected, implying that the mean OFF times might also vary with induction strength. Additional studies are needed to elucidate why such OFF times depend on the induction strength.
Having established that and differ between conditions, we next assessed whether only these parameters differ. For that, we fixed and , and verified that this model is the most parsimonious model (ΔBIC relative to the Independent model of −14.3). We conclude that only and differ between conditions, confirming the prediction that LacI is responsible for the ON/OFF mechanism affecting the RNA production dynamics.
Finally, other models were considered, e.g. the hypothesis that , , and/or do not differ between conditions. These models were also strongly rejected in favour of the parsimonious model, and are not shown for brevity.
We define the precision of the estimates of and as the ratio between the timescale of the intervals (i.e. the mean interval in the condition with greatest R) and the standard uncertainties of and , respectively. Specifically, the precision of 's estimate is , and the precision of 's estimate is . Given this, here, with the volume of data in Table 1, we achieved and PCC = 17.0, corresponding to errors of ~5 and ~6%, respectively.
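The arithmetic behind this definition can be sketched in Python. The sketch assumes that the relevant timescale is the mean interval of 1,005 s reported for the fully induced 1× condition (taken here to be the condition with greatest R), and that the standard uncertainties are the ± values reported for the τ-plot intercept (49 s) and slope (59 R·s, i.e. 59 s at R = 1). These pairings are our reading of the text rather than values stated explicitly, and all names are hypothetical.

```python
TIMESCALE_S = 1005.0        # assumed: mean interval in fully induced 1x media (s)
SIGMA_INTERCEPT_S = 49.0    # standard uncertainty of the tau-plot intercept (s)
SIGMA_SLOPE_S = 59.0        # standard uncertainty of the tau-plot slope at R = 1 (s)

def precision(timescale, sigma):
    """Ratio between the timescale of the intervals and a standard uncertainty."""
    return timescale / sigma

p_intercept = precision(TIMESCALE_S, SIGMA_INTERCEPT_S)
p_slope = precision(TIMESCALE_S, SIGMA_SLOPE_S)

# The relative error is the reciprocal of the precision.
print(f"intercept: precision {p_intercept:.1f}, error {1.0 / p_intercept:.1%}")
print(f"slope:     precision {p_slope:.1f}, error {1.0 / p_slope:.1%}")
```

Under these assumptions the slope precision evaluates to 17.0, matching the value quoted above, and the two relative errors are close to 5 and 6%.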
In addition, we found that this precision is highly dependent on the dynamic range of RNAp concentrations. For example, for a small dynamic range of 1.5 (our measurements in Fig. 2 have a range of ~2.4), the precisions (in ) and PCC (in ) would have been reduced to ~11.2 and ~6.7, respectively. Losses in precision due to reduced dynamic ranges can, however, to some extent, be offset by collecting more samples for the interval distributions (see estimation of precision in Supplementary Material).
We established that, in E. coli, the concentration of free RNA polymerases differs significantly within a certain range of growth conditions, and that the inverse of the target RNA production rate under the control of Plac/ara-1 varies linearly with the inverse of the free RNAp concentration (which are the conditions imposed in the in vitro measurements of the open complex formation by steady-state assays22,24,72). Thus, we were able to apply a standard model-fitting procedure to fully characterize the in vivo kinetics of the rate-limiting steps in transcription initiation of the Plac/ara-1 promoter from distributions of intervals between transcription events in cells with different RNA polymerase concentrations. This revealed that this promoter has two rate-limiting steps: a reversible closed complex formation and a significant open complex formation. Further, it also intermittently switches to a short-lived inactive state. Based on the inferred timescale of this inactive state, we predicted that this state is the result of the intermittent binding of the repressor LacI, which we verified by measuring the interval distribution when the promoter is not induced by IPTG. We believe that the complexity of this process is the reason why it has not been reported before. Namely, previous studies only considered either multiple rate-limiting steps,4,5,22,23,66 or an ON/OFF process,2,17,19,73,74 while this promoter exhibits both.
We note that, provided that the promoter has a reversible closed complex formation, the model fitting procedure proposed here allows the duration and order of two steps following the closed complex to be obtained (specifically, the ratio between k2 and k3 can be determined from how the CV2 of the interval distribution changes with R; see Supplementary Fig. S2). Here, this additional step was not found. However, we expect that, for other promoters, or in different conditions (e.g. low temperatures72), this step may be significant. Meanwhile, if Limiting Mechanism II is found to be the best-fitting model, the order of the last two steps will remain ambiguous due to the lack of reversibility.
Finally, it is worth noting that, in previous works, we did not find evidence for an ON/OFF mechanism for Plac/ara-1, due to the low levels of noise detected in the time intervals between transcription events.4,21,66 This can be explained by two factors. First, we did not consider censored intervals, which contribute significantly to the tail of the distribution of intervals.25 Second, the OFF period is quite short, and thus its detection requires a large volume of data and a sensitive inference methodology.25 Our results show that, by solving these two issues (by applying the methods in refs 41 and 25), our methodology can identify and characterize many relevant steps in transcription initiation, including those with lesser influence.
In the future, it would be of interest to extend the model to consider what occurs when more than one copy of a promoter is present in the cell. We expect that variations in the promoter copy numbers would, in that case, explain some of the variance of the data, instead of this variance being solely determined by the ON/OFF mechanism and the sequential steps.
We expect the methodology employed here to be applicable to promoters, native or synthetic, whose changes in the inverse of the transcription rate are linear with the inverse of the free RNAp concentrations. Also, it should be applicable to promoters evolved to interact with multiple transcription factors (TFs), provided that these bind and unbind quickly (compared with competing events), as they could be accounted for by tuning the rate constants of some of the reactions of the model. Further, multiple slow TFs, including activators, can be accounted for by adding appropriate TF-bound states, with differing production rates, in a similar manner to the ON/OFF model. As such, the methodology should be applicable at a genome-wide scale. It should also be applicable to eukaryotes, provided suitable means to alter polymerase concentrations. Lastly, it should be useful in detecting differences in transcription initiation kinetics of a promoter subject to different intra- or extra-cellular conditions.
This work was supported by the Academy of Finland (257603 to A.S.R.); Centre of International Mobility (13.1.2014/TM-14-91361/CIMO to A.S.R.); Jenny and Antti Wihuri Foundation (to A.H.); and the Tampere University of Technology President's Graduate Programme (to J.L.-P. and S.S.). Funding to pay the Open Access publication charges for this article was provided by Academy of Finland (257603 to A.S.R.).
We thank Axel Oikari, Abhishekh Gupta, and Antti Martikainen for valuable advice.