Proper regulation of gene expression is essential for the differentiation, development and survival of all cells and organisms. Recent work demonstrates that transcription of many genes, including key developmental and stimulus-responsive genes, is regulated after the initiation step, by pausing of RNA polymerase II during elongation through the promoter-proximal region. Thus, there is great interest in better understanding the events that follow transcription initiation and the ways in which the efficiency of early elongation can be modulated to impact expression of these highly regulated genes. Here we describe our current understanding of the steps involved in the transition from an unstable initially transcribing complex into a highly stable and processive elongation complex. We also discuss the interplay between factors that affect early transcript elongation and the potential physiological consequences for genes that are regulated through transcriptional pausing.
Transcription is the first - and most highly regulated - step in eukaryotic gene expression. The transcription cycle of RNA polymerase II (Pol II) is customarily divided into three phases: initiation, when Pol II is recruited to the promoter and begins to synthesize RNA; elongation, during which the polymerase extends the RNA transcript; and termination, when both polymerase and the transcript disengage from the DNA template. For many years, coordinated recruitment of the transcription machinery to form the pre-initiation complex was taken to be the principal step at which transcription was controlled , not least because the classic transcriptional systems, such as the Gal4 promoter in S. cerevisiae and the Lac operon in E. coli, are regulated primarily through polymerase recruitment. Recent data, however, suggest that this conventional view of transcription regulation is incomplete and that the elongation phase of transcription is highly regulated in metazoan cells.
Studies in higher organisms indicate the existence of a conserved slow step in Pol II early elongation, during synthesis of the first ~100 nucleotides (nt) of mRNA. Originally described at the Drosophila heat shock and mammalian c-myc genes [2–7], but long regarded as an isolated phenomenon, this promoter-proximal pausing of polymerase is now recognized to be prevalent in metazoa and is increasingly appreciated as an important step in regulating transcriptional output [8–13]. Thus, early elongation, punctuated by promoter-proximal pausing, represents a distinct step in Pol II transcription that involves dedicated regulatory factors which mediate the transition towards processive, productive elongation (Figure 1A; [14, 15]).
Despite the keen recent interest in early elongation, many important questions remain concerning the molecular mechanisms and functional roles of promoter-proximal pausing. This review summarizes our current understanding of the transition from an initiating into a productive elongation complex, and how this transition might be subject to regulation through the coordinated action of negative and positive elongation factors. We also discuss the potential physiological roles of post-initiation control of gene expression and identify target areas for future research.
Transcription initiation is a complex, multistep process that involves the recruitment of RNA polymerase to a promoter, local melting of the DNA around the transcription start site (TSS), and formation of the first few phosphodiester bonds of mRNA (Figure 1B). Recognition of promoters begins with the assembly of a large protein complex containing Pol II and multiple General Transcription Factors (GTFs) on the promoter. The minimal set of factors required for the formation of this pre-initiation complex (PIC) includes Pol II, the GTFs TFIIB, TFIID (which includes the TATA-binding protein, TBP), TFIIE, TFIIF and TFIIH. Extensive interactions between the polymerase and GTFs increase the affinity of Pol II for the promoter region. In addition to the GTFs, recruitment of Pol II to promoters is greatly influenced by the Mediator complex, DNA-binding transcription activators, and a vast repertoire of nucleosome remodeling and modifying complexes (reviewed in [16, 17]). While these activities have been reviewed in detail elsewhere, we note that the involvement of multiple factors during PIC formation provides numerous opportunities for differential regulation.
While the exact mechanisms of TSS selection by Pol II are not completely clear, its positioning on the promoter may largely depend on the sequence specificity of GTF interactions with promoter DNA. Indeed, while transcription initiation from promoters that contain distinct sequence elements such as the TATA box, Initiator, or Downstream Promoter Element (DPE) is often very focused and likely to arise from a single nucleotide position, initiation from promoters that lack these motifs is much more dispersed (reviewed in ).
Following establishment of the pre-initiation complex, several steps have been shown in vitro to be slow or inefficient. First, work in purified transcription systems indicates that the initiating Pol II complex is susceptible to abortive initiation, during which the polymerase repeatedly synthesizes and releases short (2–3 nt) transcripts while still associated with the promoter . In vitro, abortive initiation can be rate-limiting for transcription, and the escape from abortive initiation is stimulated by several GTFs (TFIIB, TFIIE, TFIIF, TFIIH) [20, 21]. However, production of abortive transcripts in vivo has been so far demonstrated only in bacteria , and so the role of abortive initiation in higher organisms remains unknown. Second, the initially transcribing complex must undergo gross rearrangements of its contacts with GTFs. In particular, in the initiating complex, the N-terminal “B-finger” of TFIIB reaches through the Pol II RNA exit channel towards the enzyme’s catalytic site. Although this interaction helps to stabilize short RNA transcripts, TFIIB must be displaced as the nascent RNA extends beyond 4 nt in length [23–25]. Accordingly, the growing RNA helps drive conformational changes and increases the stability of the initiating complex, committing the polymerase to promoter escape . Third, at ~8–9 nt the length of the nascent transcript reaches that of the full-length DNA-RNA hybrid in the elongating complex. The formation of this stable RNA-DNA hybrid decreases the likelihood of RNA release [27–29]. The upstream portion of the extended transcription bubble also collapses at this point, further stabilizing the complex and releasing accumulated energy that may help drive promoter clearance . Although elegant biochemical assays have revealed much about these steps in vitro, they remain difficult to study in vivo, and thus it is currently unclear whether the processes of transcription initiation or promoter escape can be rate-limiting in cells.
The transition from initiation to early elongation is accompanied by a tightly controlled exchange of factors that is orchestrated in part through phosphorylation of the polymerase within the C-terminal domain (CTD) of its largest subunit (reviewed in [30, 31]). The CTD contains multiple heptad repeats with a consensus sequence YSPTSPS that can be phosphorylated at several sites (Figure 1B; reviewed in ). The CTD is mostly unphosphorylated during initial promoter binding, which favors interactions between the CTD and factors that stabilize the PIC, such as the Mediator complex . Pol II is then phosphorylated at Serine-5 by the cdk7 subunit of TFIIH, which is thought to destabilize the interactions between Pol II and promoter-bound factors, facilitating promoter escape. In addition, a number of factors involved in early elongation and modification of promoter-proximal histones preferentially bind Pol II with a Serine-5 phosphorylated CTD.
Then, as described in more detail below (section 2.2), the transition into productive elongation is triggered by recruitment of the P-TEFb kinase, which phosphorylates Serine-2 residues on the CTD, among other targets. This Serine-2 phosphorylation changes the entourage of Pol II-associated factors to favor processive transcription through chromatin and RNA processing . In addition to P-TEFb, recent evidence demonstrates that a second Serine-2 CTD kinase, comprised of Cyclin K and Cdk12 or Cdk13, also targets elongating Pol II, likely increasing CTD phosphorylation levels towards the 3′-ends of genes .
Phosphorylation of Serine-7 has also been described , and is important for transcription and 3′-end processing of snRNA genes . Although the roles of this modification for mRNA production are currently unclear, the involvement of TFIIH kinase in Serine-7 phosphorylation  and the identification of this modification near promoters of protein-coding genes suggest a potentially general function in gene expression.
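As a compact restatement of the CTD modifications described above, the sketch below indexes the consensus heptad and lists the kinases that the text associates with each serine. The names `SERINE_POSITIONS` and `CTD_KINASES` and the dictionary layout are illustrative assumptions, not a published data model, and the entries only restate what this review says.

```python
# Consensus heptad repeat of the Pol II CTD, as given in the text.
HEPTAD = "YSPTSPS"

# 1-based positions of the serine residues within one repeat.
SERINE_POSITIONS = [i for i, aa in enumerate(HEPTAD, start=1) if aa == "S"]

# Kinases the review links to each serine (illustrative summary only).
CTD_KINASES = {
    2: ["P-TEFb", "Cyclin K with Cdk12 or Cdk13"],
    5: ["TFIIH (cdk7 subunit)"],
    7: ["TFIIH kinase"],
}

for pos in SERINE_POSITIONS:
    print(f"Serine-{pos}: {', '.join(CTD_KINASES[pos])}")
```

Indexing the heptad from 1 matches the Serine-2, Serine-5 and Serine-7 naming used throughout the text.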
By the time the nascent transcript reaches ~20 nucleotides, its 5′-end is modified through the addition of the 7-methyl-guanosine cap , which is critical for RNA stability, further RNA processing, export from the nucleus and protein translation . While co-transcriptional capping appears to take place in a precisely controlled fashion and is stimulated by Serine-5-phosphorylation [39, 40], RNA capping may also be facilitated by inefficient early elongation, which could provide a kinetic window for this step to take place .
Indeed, early elongation is a slow, inefficient process. In vitro, the early elongation complex displays a strong tendency to pause, arrest and terminate transcription (see Text box 1), with a substantial fraction of initiated Pol II complexes failing to generate full-length RNA . This property of early elongation likely has several causes, including negative elongation factors that inhibit synthesis through the promoter-proximal region (discussed below, section 2.1) and the fact that the early elongation complex has yet to undergo the final conformational changes that render it fully stable and processive.
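The transcript-length landmarks cited in the preceding paragraphs can be gathered into a small lookup. This is only a sketch: the cutoffs are rounded from approximate values in the text (for example, 5 nt stands in for "beyond 4 nt"), and the names `LANDMARKS` and `landmarks_reached` are hypothetical.

```python
# (minimum nascent RNA length in nt, event placed at that length by the text).
# All cutoffs are approximate and for illustration only.
LANDMARKS = [
    (5, "TFIIB B-finger displaced (RNA extends beyond 4 nt)"),
    (8, "full-length RNA-DNA hybrid formed (~8-9 nt)"),
    (20, "5' cap added (~20 nt)"),
]


def landmarks_reached(rna_length_nt):
    """Return the events whose length cutoff the nascent RNA has reached."""
    if rna_length_nt < 0:
        raise ValueError("RNA length cannot be negative")
    return [event for cutoff, event in LANDMARKS if rna_length_nt >= cutoff]


for length in (3, 9, 25):
    print(length, landmarks_reached(length))
```

A 3-nt transcript, which falls in the abortive range, has passed none of these landmarks, whereas a 25-nt transcript has passed all three.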
Each of these terms has been used to describe Pol II that is detected near promoters, but does not efficiently transcribe into the gene; however, these seemingly interchangeable terms in fact refer to different subsets of transcription complexes. Below we provide definitions of each that we believe agree best with previous work.
Poised Pol II refers to polymerases detected near a gene promoter without distinguishing preinitiation and elongating complexes. This term is useful when the data available are exclusively from ChIP assays, which cannot determine whether Pol II is engaged in transcription.
Paused Pol II generally refers to elongation complexes that have halted transcription temporarily but remain transcriptionally competent and are expected to eventually resume transcription. However, we note that pausing can be a precursor to transcription termination, at least in bacterial transcription systems.
Backtracked Pol II describes elongation complexes that have halted synthesis, then moved backward along the DNA template by several nucleotides, such that the growing 3′-end of the nascent RNA is displaced from the active center. A subset of these backtracked complexes becomes arrested, meaning that they cannot spontaneously restart transcription in the absence of additional factors that induce cleavage of the RNA. Formation of a new RNA 3′-end that is properly aligned with the active site allows for resumption of transcription, but these rescued elongation complexes can also pause before restarting RNA synthesis.
Stalled Pol II is a broad term referring to elongation complexes that have stopped RNA synthesis, without making any assumptions about the transcriptional status of the polymerase. As such, the term stalled encompasses complexes that are paused, backtracked, arrested, or undergoing termination. Stalled complexes can also be generated artificially in vitro by withholding nucleotides from Pol II. This term is appropriate when one has confirmed that promoter-proximal Pol II is engaged in transcription, but it is unclear whether the polymerase will resume transcription.
In this review, we refer to regulation of early elongation as involving complexes that are paused, since the majority of promoter-proximal elongation complexes detected in vivo appear to be in a transcriptionally competent state [10, 12], with other states representing short-lived or minor species.
In addition to detailed in vitro analysis, there is strong evidence that Pol II pauses during early elongation in vivo. This phenomenon has been best documented at the uninduced Drosophila heat shock genes, where Pol II synthesizes a 25–50 nt RNA before pausing in elongation [4, 7]. This paused elongation complex is phosphorylated at Serine-5 and the nascent RNA is largely capped [43, 44]. Pol II pausing at the hsp70 gene can be quite long-lived and is rate-limiting for gene expression, with release of polymerase into the gene estimated to occur approximately once every 10 minutes prior to heat shock . Importantly, the duration of pausing is significantly reduced during heat shock, when the polymerase pauses for only ~4 seconds before proceeding downstream. Although many questions remain concerning the establishment and release of paused Pol II at the hsp and other genes, it is clear that this process involves regulated association of both negative and positive factors with early elongation complexes. As described below (section 3.1), recent work demonstrates that many, if not most, metazoan genes display hallmarks of Pol II pausing. These data underscore that, in addition to pausing being a prominent rate-limiting step in transcription, the duration of pausing can be regulated to modulate transcription output, raising considerable interest in understanding the mechanisms underlying this process.
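To put the two hsp70 figures quoted above on a common scale, the short calculation below converts them into release events per hour and a fold change. It assumes, purely for illustration, that "once every 10 minutes" and "~4 seconds" can both be treated as mean pause durations for a single polymerase. The function and variable names are hypothetical.

```python
def releases_per_hour(mean_pause_seconds):
    """Pause-release events per hour if each pause lasts mean_pause_seconds."""
    if mean_pause_seconds <= 0:
        raise ValueError("pause duration must be positive")
    return 3600.0 / mean_pause_seconds


uninduced_pause_s = 10 * 60   # ~once every 10 minutes before heat shock
heat_shock_pause_s = 4        # ~4 seconds during heat shock

fold_change = uninduced_pause_s / heat_shock_pause_s

print(releases_per_hour(uninduced_pause_s))   # 6.0
print(releases_per_hour(heat_shock_pause_s))  # 900.0
print(fold_change)                            # 150.0
```

Under these assumptions, shortening the pause alone would account for a roughly 150-fold increase in the rate at which polymerases enter the gene.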
While a number of factors interact with Pol II during the transition into productive elongation, the Positive Transcription Elongation Factor-b (P-TEFb) is the central player in this process [33, 46]. P-TEFb phosphorylates the Pol II CTD at Serine-2 residues, which relieves inhibition of early elongation caused by negative, pause-inducing transcription factors [46–48]. In addition, Serine-2 phosphorylation of the CTD provides a platform for assembly of complexes that travel with the polymerase into the gene. This includes factors that regulate transcription elongation, RNA processing and termination , as well as the modification and remodeling of histones (reviewed in [17, 49]). After transitioning to productive elongation, the mature Pol II complex is remarkably stable, and can transcribe tens, or even hundreds of kilobases without dissociating from the DNA template .
Termination of Pol II transcription can take place via distinct pathways (reviewed in ). Whereas most full-length Pol II transcripts terminate using the canonical cleavage and poly-adenylation machinery, several shorter transcripts, including those of snoRNA genes , use a distinct, poly-A independent pathway. Recently, components of both of these pathways have been detected near 5′-ends of genes [52, 53], suggesting that they might coordinate premature termination of incomplete transcripts under specific circumstances. We note that transcription attenuation through termination has been well established as a method of gene regulation in bacteria , and it will be interesting to determine if this is also a common regulatory mechanism in eukaryotes (discussed more below).
Following cleavage and termination of full-length transcripts, Pol II is released from the template DNA. Although this polymerase may dissociate completely from the locus, it has been suggested that the released Pol II may be “recycled” back to the promoter at genes where multiple rounds of transcription take place in rapid succession, thereby facilitating subsequent rounds of productive transcription .
Much of what we know about the control of early elongation stems from work with the nucleoside analog 5,6-dichloro-1-β-D-ribofuranosylbenzimidazole (DRB). Treatment of human cells with DRB (75 μM) inhibited mRNA transcription by ~95% by causing a significant decrease in polymerase processivity [56, 57]. Notably, DRB inhibited the production of long RNAs but did not affect the generation of short RNA species [14, 58], indicating that DRB affects the transition between early and productive elongation. Importantly, DRB was ineffective in inhibiting elongation in a purified system, indicating that it does not act on Pol II directly, but instead inhibits another factor . Subsequent studies identified the target of DRB as the kinase P-TEFb . However, the addition of P-TEFb directly to purified Pol II elongation complexes also had no effect on transcription, suggesting that the stimulatory effect of P-TEFb on Pol II elongation involved counteracting other, negatively acting elongation factor(s). These factors were later identified as the heterodimeric complex DRB Sensitivity-Inducing Factor (DSIF), which comprises the homologues of the yeast Spt4 and Spt5 proteins , and the Negative ELongation Factor (NELF) complex .
We now know that DSIF is required for NELF to associate with the early elongation complex, and the presence of both proteins together is required to inhibit transcription elongation . During the transition to productive elongation, P-TEFb triggers the release of NELF from the elongation complex, likely through phosphorylation of DSIF and/or NELF itself [33, 61, 62], stimulating escape from pausing and Pol II movement into the gene. Thus, early elongation involves coordinated interactions of both negative and positive factors with the transcription machinery [46, 48]. Below we describe our current understanding of the interplay between factors that establish pausing and those that stimulate pause release, and how this interplay regulates the transcriptional activity of many genes.
NELF is a complex of four subunits, NELF-A, B, C/D and E, that is conserved in higher eukaryotes, but has not been reported in C. elegans, S. cerevisiae, or Arabidopsis thaliana . The mammalian NELF-A protein, also called WHSC2, is thought to anchor the NELF complex to the polymerase . The NELF-B subunit, referred to as COfactor of BRCA1 (COBRA-1), has been reported to bind BRCA1, ER-α and AP-1 family members, suggesting multiple potential mechanisms for NELF recruitment to gene promoters [65–67]. NELF-C and NELF-D (also called TH1-like) are translation variants of the same mRNA, and their function is unknown. The NELF-E subunit contains an RNA recognition motif, which has been used as a basis to propose that NELF inhibits elongation by interacting with structures in the nascent RNA . Consistent with this idea, NELF is able to inhibit transcription only after the RNA is sufficiently long to emerge from Pol II. However, NELF was later shown to interact with RNA with very low affinity in a sequence- and structure-independent manner . Interestingly, recent findings showing that nascent RNA is also necessary for the association of DSIF with Pol II indicate that it may be DSIF, rather than NELF, that recognizes the transcript [69, 70]. In addition to DSIF, NELF has been shown to interact with the 5′-RNA cap binding complex as well as several 3′-processing factors, and to affect RNA processing at the short, non-polyadenylated histone transcripts .
Immunofluorescence imaging of polytene chromosomes showed that NELF broadly co-localizes with hypo-phosphorylated Pol II . This result agrees with higher resolution analysis of NELF distribution by ChIP-chip in both Drosophila and mammalian cells, which indicated that NELF generally occupies Pol II-bound genes [11, 13]. Indeed, ChIP-chip analysis of Pol II binding in Drosophila cells depleted of NELF using RNAi revealed that loss of NELF-mediated pausing reduces Pol II promoter occupancy at most promoters , globally implicating NELF in inhibition of early elongation. ChIP assays show that NELF is present promoter-proximally, but does not travel into the gene with the elongating Pol II, consistent with the idea that NELF dissociation accompanies pause release [11, 13, 72, 73].
Unlike NELF, DSIF remains with polymerase after its transition into productive elongation [74, 75]. DSIF has been shown to elicit both negative and positive effects on transcription , and so its effect is likely to be influenced by the activity of DSIF-interacting factors. For example, the “negative” role of DSIF in early elongation may be simply in mediating the interactions between NELF and the polymerase. Phosphorylation of the Spt5 subunit of DSIF by P-TEFb, and the dissociation of NELF from Pol II during the transition to productive elongation could thus free up DSIF to play a positive role by interacting with other factors. For instance, Spt5 was shown to associate with the capping enzyme and stimulate its activity in vitro .
Interestingly, the DSIF complex is present in all eukaryotes and archaea, and shares homology with the bacterial transcription factor NusG . Thus, DSIF is likely to be fundamental for Pol II transcription and to serve as a general platform for regulatory factors, some of which may differ between clades, whereas NELF may play a role in establishing pausing during early elongation specifically in metazoans.
P-TEFb is a heterodimer of the Cdk9 kinase and Cyclin T (in mammals, it may contain cyclin T1 or T2;  and references therein). As described above, several decades of study using inhibitors of P-TEFb have provided evidence that P-TEFb activity is essential for the vast majority of mRNA transcription [56, 78]. In addition to its role in phosphorylation of Serine-2 of the CTD, P-TEFb has also been shown to phosphorylate NELF and DSIF, which is suggested to release NELF, either directly or indirectly by altering its interactions with the elongation complex, and to convert DSIF from a negative to a positive elongation factor [61, 62]. Accordingly, in vitro transcription elongation assays indicate that the kinase activity of P-TEFb relieves transcription repression by NELF [33, 48].
Because recruitment of P-TEFb leads to release of pausing factors and association of factors that promote productive elongation, it is a pivotal point of regulation in Pol II transcription. P-TEFb can be directly recruited to genes by transcription activators like NF-κB and c-myc [13, 79, 80], or the Bromodomain protein Brd4 [81, 82]. Moreover, artificial recruitment of P-TEFb using a DNA-binding domain fusion enhanced transcription from the Drosophila hsp70 gene, suggesting that delivery of P-TEFb to promoters may be a key role of transcription activators . Recently, a Super Elongation Complex has been described , which contains P-TEFb together with elongation factors like ELL and a number of chromatin remodeling factors, such as Paf1 , suggesting that P-TEFb may often associate with the elongating polymerase as part of a large, multi-protein complex. These findings highlight the importance of regulated recruitment of P-TEFb to promoters and suggest that the cell can accomplish this goal through a number of distinct mechanisms.
In addition to regulating its localization, the amount and availability of active P-TEFb in the cell is further controlled through sequestering of P-TEFb into an inactive complex with 7SK RNA and HEXIM protein (reviewed in [33, 86]). Conditions that broadly inhibit transcription lead to the release of P-TEFb from this inactive complex, resulting in a rapid increase in P-TEFb activity. Intriguingly, general P-TEFb inhibitors are present in the germ lines of both Drosophila and C. elegans [87, 88], suggesting that blocking P-TEFb function may broadly suppress germ line transcription prior to the appropriate developmental stage. In summary, mounting evidence points to P-TEFb as a major player in transcriptional regulation, further implicating the transition between early and productive elongation as the central step in regulation of gene expression.
While DNA sequence must contain all information needed for regulation of genes, our understanding of how the protein machinery interprets this information is limited. In particular, it is unclear what sequence features specify the location and duration of promoter-proximal Pol II pausing on a given gene. Transcription elongation is known to be non-uniform and RNA polymerases are inherently prone to transient pausing that is dependent on the local sequence context [89–91]. Two aspects of sequence are known to influence elongation profoundly: first, the thermodynamic stability of the 9-nt RNA-DNA hybrid formed by nascent RNA and the DNA template, with weaker hybrids (e.g. more A+T-rich) increasing the likelihood of pausing ; and second, structures formed by the nascent RNA facilitate pausing by displacing the 3′-end of RNA from the polymerase catalytic site .
Notably, transcription through A+T-rich regions with low hybrid stability has also been shown to promote transcriptional arrest, a characteristic common among early elongation complexes in vitro. Arrest occurs when Pol II that has paused during synthesis of a region with low RNA-DNA hybrid stability slides backwards on the DNA towards a more thermodynamically stable sequence (e.g. more G+C-rich; ). This backward movement of polymerase dislodges the 3′-end of the RNA from the polymerase active site, requiring realignment of RNA in the catalytic site before synthesis can resume (Figure 2A).
Based on the HIV system, where a stem loop formed by the nascent RNA (TAR) was found to affect pausing and interact with NELF in vitro, it was proposed that structures formed by the nascent RNA may trigger promoter-proximal pausing at other genes . However, subsequent studies demonstrated that pausing at HIV does not require the formation of the TAR RNA hairpin  and argued against the formation of stable RNA structures at most paused genes in Drosophila , suggesting that pausing is not a consequence of structures formed by the nascent RNA.
The idea that promoter-proximal DNA sequences play a role in pausing was suggested by early biochemical assays, which established that DSIF and NELF repress early transcription elongation – not by inducing pausing de novo– but instead by increasing the duration of intrinsic, sequence-specified pauses [47, 60]. Thus, pausing involves cooperation between DNA sequence and pause-enhancing protein factors. Finding sequences that specifically influence Pol II pausing is complicated by the general abundance of sequence motifs near promoters, including those involved in formation of pre-initiation complexes. Nevertheless, the Downstream Promoter Element (DPE) and Pause button motifs are clearly enriched at genes where Pol II pausing is prominent (; Figure 2B). Interestingly, we recently found that both of these G+C-rich sequences are preferentially located between positions +26 and +33 with respect to the TSS, precisely aligned with the peak of Pol II pausing (Figure 2). Downstream of this G+C-rich region, the sequence of paused genes is characterized by stretches of A+T, leading to the suggestion that the initially transcribed sequence contains a signal that specifies pausing and backtracking to Pol II . According to this model, promoter-proximal pausing comprises two stages: first, an initial halt in transcription is induced by an A+T-rich, weak RNA-DNA hybrid; and second, the polymerase backtracks to reach the more thermodynamically stable, G+C-rich sequence upstream, which coincides with the DPE or pause button motif (Figure 2A; ). Thus, we envision that the intrinsic inefficiency of early elongation and the propensity of Pol II to undergo backtracking at these genes provide a kinetic window of opportunity for the binding of NELF and DSIF, and the establishment of a stably paused state. 
In this view, the static information within DNA can influence the dynamic competition between Pol II pausing and productive elongation, thereby impacting the expression properties of the gene.
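The two-stage model above turns on local base composition within a hybrid-length window. As a rough illustration, the sketch below slides a 9-nt window along a sequence and reports the G+C fraction of each window. G+C fraction is used here only as a crude stand-in for RNA-DNA hybrid stability (a real analysis would use thermodynamic parameters), the toy sequence is invented, and the function names are hypothetical.

```python
HYBRID_LEN = 9  # length of the RNA-DNA hybrid cited in the text


def gc_fraction(seq):
    """Fraction of G or C bases in seq (case-insensitive)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def hybrid_gc_profile(seq, window=HYBRID_LEN):
    """G+C fraction of every window-length stretch; entry i covers seq[i:i+window]."""
    if len(seq) < window:
        raise ValueError("sequence shorter than window")
    return [gc_fraction(seq[i:i + window]) for i in range(len(seq) - window + 1)]


# Invented example: a G+C-rich block followed by an A+T-rich stretch,
# loosely mimicking the arrangement described for paused genes.
toy_sequence = "GCGGCGCGC" + "ATTATAATT"
profile = hybrid_gc_profile(toy_sequence)

for start, value in enumerate(profile):
    print(start, round(value, 2))
```

In this toy case the G+C fraction falls steadily as the window moves downstream into the A+T-rich stretch, which is the kind of composition gradient the model proposes as a signal for halting and then backtracking.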
Rescue of polymerase that has backtracked and arrested can be achieved through inducing internal cleavage of the nascent RNA to create a new 3′-end properly aligned with the site of catalysis (reviewed in ), a task that is accomplished by the transcript cleavage factor TFIIS. That Pol II is particularly susceptible to arrest within the promoter-proximal region  suggests that TFIIS might be involved in regulating the transcriptional activity of early elongation complexes. Indeed, TFIIS is required to maintain paused Pol II at the Drosophila hsp70 gene in a transcriptionally competent state , and high-throughput RNA-seq analysis of paused Drosophila genes revealed that backtracking and TFIIS-mediated transcript cleavage are common among these early elongation complexes .
However, it is important to note that loss of TFIIS activity did not significantly alter RNA expression profiles in Drosophila cells , nor did it block hsp70 induction following heat shock, instead resulting in a kinetic delay in gene activation. These results suggest that the cell possesses additional mechanisms for re-starting arrested Pol II. In agreement with this idea, recent work in yeast has shown that while TFIIS is dispensable for viability, transcript cleavage activity within the polymerase is essential .
A number of highly paused genes, including the Drosophila hsp genes, are depleted of nucleosomes within the initially transcribed 200–300 base pairs, leading to the view that nucleosomes do not play a role in establishing pausing [101, 102]. Moreover, pausing can be recapitulated in vitro on DNA templates lacking nucleosomes . However, several recent genome-wide studies have shown that nucleosomes are present just downstream of paused polymerases, suggesting that nucleosomes might cause pausing by presenting physical barriers to polymerase elongation [104, 105].
In contrast to this idea, a recent study demonstrated that promoters with high levels of Pol II contained promoter-proximal nucleosomes that were readily dissociated by low salt challenge, indicating that these nucleosomes are not static barriers to transcription, but instead are relatively unstable species . The idea that nucleosomes exist in a dynamic equilibrium with the transcription machinery is consistent with earlier work which indicated that there was ongoing competition between paused Pol II and nucleosomes for promoter occupancy . Importantly, this work showed that loss of Pol II pausing due to NELF RNAi significantly increased nucleosome assembly at several genes investigated, suggesting that it is the paused polymerase that may determine the occupancy of promoter-proximal nucleosomes, rather than nucleosomes dictating the behavior of Pol II .
While histone modifications have not been unequivocally connected to pausing, recent work hints at this possibility. For example, acetylation of histones has been shown to regulate the recruitment of P-TEFb via Bromodomain protein Brd4 [81, 82]. Also, promoter-proximal nucleosomes are specifically enriched in variant histone H2A.Z [104, 105] and methylation of histone H3 at Lysine 4 , suggesting that specific chromatin markers might impact early elongation.
A number of other protein factors are implicated in early elongation, although their mechanisms of action are not yet clear. For example, the GAGA (GAF) factor is present at many Drosophila genes with paused polymerase , but the manner in which GAF influences pausing remains to be determined. One possibility is that GAF may maintain nucleosome-free promoter regions through its ability to recruit chromatin remodelers , thereby enhancing the rate of assembly of pre-initiation complexes and increasing promoter Pol II levels.
Additional factors that facilitate elongation and thus potentially influence paused Pol II include ELL and the Elongins, which have been shown biochemically to stimulate elongation and reduce pausing, and are also found on Drosophila heat shock genes in vivo [110, 111]. Recently, Gdown1 has been identified as a tightly bound component of Pol II and was suggested to play a role in determining elongation efficiency . Further, components known to play roles at other steps in the transcription cycle are also implicated in early elongation. One example is TFIIF, which stimulates transcription initiation and also reduces the rate of pausing . Interestingly, recent work has shown that TFIIF is unable to elicit these positive effects on early elongation prior to P-TEFb-mediated phosphorylation events . Further, the Mediator complex, which was traditionally considered to be uniquely an initiation factor, has been shown to play a role in promoter escape .
However, of all the transcription elongation factors, the Polymerase Associated Factor complex (Paf1C) has recently gained the most attention for its possible roles in controlling early elongation. Paf1C enhances elongation rates on both naked DNA and chromatin templates [114, 115], and functions in concert with a number of other factors that influence histone modifications and RNA processing (reviewed in ). Although Paf1 can clearly impact productive elongation, evidence supports a role for Paf1 in early elongation as well: for example, Paf1 was suggested to play a role in vivo during activation of heat shock hsp70 genes in Drosophila  and has recently been found to interact genetically and physically with a number of factors that regulate early elongation, including DSIF, TFIIS and P-TEFb [85, 115, 118, 119]. Moreover, Paf1C is required for efficient Serine-2 phosphorylation of the Pol II CTD , suggesting that it might aid in the transition between early and productive elongation. Overall, these data indicate that much remains to be discovered about the known elongation factors and suggest that the repertoire of proteins that affect early elongation has not yet been exhausted.
A critical question raised by the prevalence of Pol II pausing concerns the functional impact of pausing on gene regulation. Insight into the role of paused Pol II may be gleaned from a classic example of regulated pausing that occurs during transcription of the late genes in E. coli bacteriophage lambda . There, promoter-proximal pausing of polymerase allows time for the binding of a phage-encoded anti-termination factor that renders the polymerase insensitive to termination signals further downstream. Accordingly, one can envision that pausing in eukaryotic Pol II transcription may also serve to adjust the elongation properties of Pol II, although it is highly likely that this “licensing” is more complex and involves multiple and diverse interactions. Although many questions remain concerning the molecular mechanisms and biological roles of Pol II pausing, our current understanding of these problems is discussed below.
Early ChIP-chip and microarray analysis in human fibroblast cells revealed that Pol II, TFIID, and active histone marks were detected at many gene promoters that neither produced full-length transcripts nor displayed significant levels of Pol II within the gene . This study provided the first genome-wide evidence that regulation of many genes could take place after recruitment of the transcription machinery. In the absence of evidence to the contrary, however, these Pol II molecules were presumed to be in inactive preinitiation complexes, rather than being paused in elongation. Subsequent work established that most protein-coding genes bound by Pol II displayed histone modifications characteristic of transcription initiation, though not all of these genes were actively transcribed . Moreover, this study showed that a number of these seemingly inactive genes generated short RNA transcripts, implying that regulation of these genes occurred post-initiation .
Evidence that the slow step observed between initiation and productive elongation was indeed promoter-proximal pausing came from Drosophila, where a combination of ChIP-chip, permanganate footprinting (see Text box 2) and genetics was used to show that Pol II was broadly and stably engaged in transcription within the promoter-proximal region, under the influence of the NELF complex [8, 9, 11]. Moreover, these studies revealed that pausing was enriched at genes involved in developmental and stimulus-responsive pathways [8, 9], and that paused Pol II was present at active as well as inactive genes . Thus, this work revealed that pausing during early elongation was, in fact, a common phenomenon and raised the possibility that it represented an important gene regulatory strategy.
Detects the presence of Pol II bound to DNA. Advantages: A snapshot of Pol II distribution can be achieved through rapid cross-linking of whole cells. Analysis of individual genes is straightforward. Readily adapted for high-throughput genome-wide studies using ChIP-chip and ChIP-seq. Disadvantages: Low spatial resolution, complicating the distinction between engaged Pol II and polymerase in pre-initiation complexes. ChIP may be difficult to perform in samples that are not in a homogeneous suspension, such as tissues.
Detects locally melted regions of DNA, including those arising from paused polymerase, by selectively modifying unpaired Thymines within a stable, open transcription bubble. Advantages: Can be performed directly on whole cells or tissues. Achieves essentially nucleotide-level resolution for mapping paused polymerase. Does not require antibodies. Disadvantages: Low throughput (requires Ligation-Mediated (LM) PCR on individual genes - no genome-wide application as of yet). Application is limited to genes where good primers for primer extension and LM PCR can be designed; because of that, permanganate probing in mammalian systems has been markedly less successful than in Drosophila. Since actively elongating polymerase generates very transient melting of DNA at a given location, permanganate is inefficient at detecting productive elongation complexes.
Detects elongation-competent RNA polymerases through their ability to incorporate a label into nascent RNA. Advantages: Specifically reveals transcriptionally engaged polymerase, with extremely high sensitivity and low background. Adaptable for high-throughput genome-wide applications. Can be used in various organisms. Disadvantages: Requires preparation of nuclei to detect paused polymerase. Resolution for mapping of paused polymerase is reduced by the necessity to allow polymerase to run on and incorporate labeled nucleotides into RNA.
Directly detects short RNA species derived from paused Pol II. Advantages: Sequencing of RNAs from the 3′-end reveals the positions of promoter-proximally paused polymerase at nucleotide-level resolution. Designed for high-throughput genome-wide applications. Does not require antibodies, cell treatment or labeling. Can be used in various organisms. Disadvantages: Requires enzymes available from only one commercial source. Cannot distinguish between RNA species that remain associated with paused Pol II and those that have been released.
However, important questions remained about the transcriptional competence of promoter-proximal Pol II. While studies on individual genes in vitro showed that paused complexes remained transcriptionally active [14, 42], and nuclear run-on assays at a number of genes confirmed this idea in vivo (e.g. [4, 6]), none of the above described genome-wide techniques could determine whether paused elongation complexes were broadly capable of resuming RNA synthesis in living cells. Fortunately, an adaptation of traditional run-on analysis was developed for genomic use (termed Global Run-on-Sequencing, or GRO-seq ), which demonstrated that promoter-proximal Pol II could readily resume elongation following removal of negative elongation factors with detergents. Moreover, the high sensitivity of this technique allowed for detection of full-length RNA synthesis at the vast majority of genes that harbor paused Pol II. The fact that even highly active genes undergo transient pausing was underscored by isolation and sequencing of short, capped RNA species derived from paused Pol II : this work demonstrated that nearly all active genes produced significant levels of these short transcripts.
Taken together, these data support the view that Pol II recruited to gene promoters is generally not released efficiently into productive elongation. Moreover, they suggest that transient, NELF-mediated pausing may be a general feature of the transcription cycle, perhaps serving as a “checkpoint” to ensure that only properly matured elongation complexes proceed into the gene . Notably, this would imply that it is the efficiency of pause release, rather than the initial establishment of pausing, that would play a major role in determining levels of productive elongation. In this view, regulated recruitment of P-TEFb to promoters could be a general mechanism for transcriptional control, fully consistent with recent work broadly implicating the transcription factor c-myc in P-TEFb-mediated transcription activation .
Apart from mammals and Drosophila, evidence also points to the existence of pausing in C. elegans . Interestingly, homologues of NELF complex components are yet to be identified in C. elegans, raising the possibility that it employs a distinct mechanism to establish Pol II pausing. The yeast S. cerevisiae also lacks discernible NELF homologues, and ChIP experiments fail to detect promoter-proximal enrichment of Pol II. Moreover, to date, no biochemical assays (e.g. run-on or permanganate analysis) have documented the presence of a stably engaged, paused Pol II in yeast, although ChIP-chip studies suggest that Pol II may accumulate upstream of promoters under certain conditions, such as in stationary phase . Thus, currently available data favor the notion that promoter-proximal Pol II pausing is restricted to multicellular organisms.
Several lines of evidence indicate that the promoter-proximal polymerase detected by ChIP represents stably paused Pol II rather than the pre-initiation complex. First, the location of Pol II peaks, as judged by ChIP in either Drosophila or mammals, is shifted by ~50 nucleotides downstream from the TSS, which corresponds to the average location of paused Pol II . In this regard, it is worth noting that although earlier work in mammals reported localization of Pol II ChIP-chip peaks directly over promoters, subsequent high-resolution ChIP-seq analysis revealed that this broad “promoter” Pol II peak can be resolved into two distinct peaks flanking the TSS, corresponding to polymerase transcribing in both the sense and antisense directions [10, 126]. We also note that prior studies that attempted to distinguish PICs from elongation complexes using ChIP with anti-CTD antibodies alone are significantly complicated by the ambiguity of antibody specificity: for example, the H14 antibody, often thought to recognize Serine-5 phosphorylation characteristic of paused Pol II, was recently shown to preferentially bind a CTD phosphorylated both at Serine-5 and Serine-2 .
Second, Pol II promoter occupancy is generally accompanied by histone modifications characteristic of transcription initiation, such as acetylation of histone H3 . Third, NELF and DSIF broadly localize with promoter-proximal Pol II, and depletion of these factors shifts Pol II distribution [8, 13]. Fourth, GRO-seq analysis clearly demonstrates a strong enrichment of promoter-proximal Pol II that is engaged in elongation . Finally, there is remarkable correspondence between the level of short RNAs derived from a given Drosophila promoter and the Pol II ChIP-seq signal at that promoter , arguing strongly that most of the promoter-proximal Pol II is involved in the synthesis of short RNA species.
However, while there is good evidence that promoter-proximal Pol II has initiated RNA synthesis, whether these early elongation complexes are uniformly destined to produce full length transcripts remains an area of active debate. There are two alternative scenarios for the fate of paused Pol II that current techniques cannot distinguish: first, paused Pol II remains stably associated with the DNA template and is eventually released into productive elongation; or second, Pol II pausing is followed by premature termination and subsequent transcription re-initiation. Although both possibilities would involve transient pausing, and thus allow for the detection of paused polymerases at steady-state, they would entail substantially different mechanisms for regulating productive elongation. In the first scenario, the overall level of gene transcription could be determined by the recruitment of release factors such as P-TEFb, whereas in the second scenario, transcription levels could be dictated by altering the proportion of polymerases that undergo termination vs. productive elongation.
Although there are many examples where premature transcription termination is regulatory in bacterial systems, and the balance between transcription read-through vs. termination has been shown to regulate expression of the HIV provirus , it is not yet clear if promoter-proximal termination regulates expression of cellular genes in metazoans . Intriguingly, evidence from yeast suggests that a lack of Serine-2 CTD phosphorylation might help recruit the termination machinery , suggesting that termination efficiency, like pause release, might be modulated through CTD phosphorylation.
Evidence that paused complexes are largely released into productive elongation comes from classic studies of the Drosophila heat shock genes , and from more recent GRO-seq analysis which demonstrates that promoter Pol II is generally capable of resuming elongation in isolated nuclei . However, run-on analysis involves treatment of nuclei with the detergent sarkosyl, which some argue may allow for elongation from complexes that would have remained moribund in vivo. Importantly, though, sarkosyl is not capable of re-starting polymerases that have become arrested or have terminated transcription , ruling out the possibility that either of these species contributes to GRO-seq signal in cells. Moreover, comparison of short RNA species in Drosophila cells with or without TFIIS RNAi demonstrated that backtracked Pol II is readily rescued by TFIIS, such that early elongation complexes are maintained in a transcriptionally active state, poised to resume elongation .
Nonetheless, it remains possible that a certain fraction of paused Pol II terminates transcription, and that the steady-state levels of promoter Pol II result from iterative rounds of transcription initiation, pausing and termination. It is anticipated that future, quantitative analysis of short RNA species will shed light on this question. Ultimately, however, further advances in live, single-cell imaging techniques may be necessary to definitively determine the fate of paused Pol II.
The enrichment of highly paused Pol II at signal-responsive genes raises the question of its functional role. A common rationale is that pausing represses basal transcription of inducible genes. This hypothesis predicts that pausing should be more prevalent on genes with low expression levels. However, experiments do not support this idea: genome-wide analyses demonstrate that there is no correlation between the pausing status of a gene and the level of its expression . In fact, Pol II pausing and NELF binding are readily detected at active genes [11–13].
Thus, the available data argue strongly that NELF-mediated pausing, rather than repressing gene expression, serves to modulate transcription kinetics, output, or the coordination of gene activation. One can readily envision that the step-wise, stochastic assembly of a large pre-initiation complex would inherently introduce noise into the kinetics of gene activation. In contrast, having Pol II engaged and awaiting the signal for pause release could reduce the variability in transcription output. Indeed, pausing has been suggested to enable more synchronous gene activation during Drosophila development . Accordingly, paused Pol II is prevalent at Drosophila genes involved in development and stress responses, where synchronous gene induction within a large group of cells may be beneficial [8, 9].
While recent work supports the idea of paused Pol II accumulating on genes before activation [13, 124], a reduction in pausing has not yet been shown to impact the rate of gene activation . Interestingly, several reports suggest that paused Pol II plays a role in the rapid shutoff of transcription following induction [73, 130, 131]. Thus, pausing could allow a gene to experience a burst of transcription followed rapidly by rest, since release from the paused state and its subsequent reinstatement could be highly dynamic. In agreement with this, genes with paused Pol II are known for transcriptional responses that are both rapid and transient (e.g. hsp70, c-myc, c-fos, junB ). Therefore, accumulating evidence supports a model wherein paused Pol II is not a stable “off switch” that represses transcription, but instead that pausing is a highly dynamic state, whose effective duration is regulated to serve as a “volume knob” that allows for fine-tuning of gene expression levels in response to a changing environment.
Data also support a role for paused Pol II in stabilizing permissive chromatin architecture around highly regulated genes, where the polymerase prevents gene repression by competing with nucleosomes for promoter occupancy . At these genes, depletion of NELF and reduction in pausing led to chromatin assembly over the promoter, and gene down-regulation . Thus, we envision that a critical role of pausing is to protect promoter regions from the formation of repressive chromatin structures, thereby maintaining them in an active or activatable state. Further supporting this notion, paused Pol II was suggested to play a structural role in blocking nucleosome assembly and maintaining chromatin boundaries near the HOX genes in Drosophila . Moreover, paused Pol II may serve to stabilize components of the pre-initiation complex, providing a scaffold of factors to facilitate more efficient re-initiation of transcription through enhanced polymerase recycling.
Although recent genome-wide studies showed that Pol II pausing is widespread, they have not answered the question of whether pausing duration is generally regulated to affect transcription output. However, studies of individual stimulus-responsive genes suggest this possibility. For example, the duration of pausing is known to be modulated during activation of the Drosophila heat shock genes, and at several genes involved in the DNA damage response [8, 43]. At these genes, pausing was shown to be rate-limiting for gene expression, and thus reduction in pause duration would lead directly to an increase in transcription output. Moreover, recent work implicates transient reduction in pausing in the burst of transcription from the murine TNF-α gene during bacterial challenge .
Several findings also support the idea that the dynamics of pausing are regulated during metazoan development. For example, the abundance of paused Pol II on developmental genes in early Drosophila embryos  and mouse embryonic stem cells  suggests that pausing is established as an initial, “primed” state that then changes during development. Further, NELF knockdown has been shown to disrupt the developmental program in mouse ES cells  and in Drosophila [134, 135], supporting the importance of Pol II pausing in development.
Although genome-wide demonstrations of regulated pause release are currently lacking, the emerging view is that both Pol II recruitment and pausing are important regulatory steps in metazoan transcription, and that gene expression represents the balance between these two rates. We show a simulated example of gene activation that illustrates this point (Figure 3). We depict a situation where Pol II recruitment is mediated by a DNA-binding transcription activator, and pause release is mediated by P-TEFb recruitment. At this hypothetical gene, pause release is slow in the uninduced state (0.002 molecules/s, corresponding to Pol II release every ~8 minutes) in comparison to the maximum rate of Pol II recruitment (0.01 molecules/s, or 1 polymerase recruited every 100 s) and thus pausing is rate-limiting for overall transcription. Gene induction results in a dramatic increase of polymerase recruitment rate triggered by binding of the transcription activator (to 1 polymerase recruited each second) and also a substantial decrease in pause duration (to 0.25 molecules/s, or a pause duration of 4 s), for example due to an increase in histone acetylation and recruitment of P-TEFb through binding of Brd4. In this scenario, although both rates were increased in order to significantly increase transcription output, Pol II pausing remained rate-limiting during gene activation. We then consider what would happen at this gene under conditions where histone acetylation and P-TEFb recruitment were stimulated in the absence of the transcription activator. If this event increased the rate of pause release ten-fold (to 0.02 molecules/s, for a pause duration of 50 s), then that would render Pol II recruitment rate-limiting at this gene, effectively switching the gene regulatory strategy. Although this is just a hypothetical example, we suggest that the interplay between regulation of transcription at these two steps might be common on many genes in metazoans.
By imposing dynamic limitations at two different steps in the transcription cycle, each controlled by distinct and perhaps independent sets of factors, precise and robust regulation of transcription output can be achieved.
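The arithmetic behind this hypothetical gene can be checked with a minimal sketch. It is not the simulation used for Figure 3: the rates are taken from the text, but the names (`Condition`, `rate_limiting_step`, `steady_state_output`) are illustrative, and the output estimate rests on an added simplifying assumption that the promoter holds at most one Pol II at a time and simply alternates between an empty state and a paused state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    name: str
    k_recruit: float  # Pol II recruitment rate, events per second (from the text)
    k_release: float  # pause release rate, events per second (from the text)


def rate_limiting_step(c: Condition) -> str:
    """Return the slower of the two steps, which limits overall output."""
    return "pause release" if c.k_release < c.k_recruit else "recruitment"


def steady_state_output(c: Condition) -> float:
    """Transcripts per second for a two-state cycle (ASSUMPTION, not from the source).

    The promoter is either empty (mean wait 1/k_recruit) or holds one paused
    Pol II (mean wait 1/k_release), so one full cycle takes the sum of the two.
    """
    return 1.0 / (1.0 / c.k_recruit + 1.0 / c.k_release)


# The three scenarios described in the text.
CONDITIONS = [
    Condition("uninduced", k_recruit=0.01, k_release=0.002),
    Condition("induced (activator + P-TEFb)", k_recruit=1.0, k_release=0.25),
    Condition("P-TEFb only", k_recruit=0.01, k_release=0.02),
]

if __name__ == "__main__":
    for c in CONDITIONS:
        print(
            f"{c.name}: mean pause {1.0 / c.k_release:.0f} s, "
            f"limited by {rate_limiting_step(c)}, "
            f"output ~{steady_state_output(c):.4f} transcripts/s"
        )
```

Under these assumptions, pause release is the slower step in the uninduced and fully induced conditions, whereas recruitment becomes the slower step when only pause release is accelerated, matching the switch in regulatory strategy described above.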
The principal conclusion based on recent studies is that what was once generally dismissed as a peculiar oddity has become recognized as one of the central, if not the central, steps of metazoan transcription regulation. Fully befitting the novelty of the field, the latest advances made using genomic techniques have revolutionized the way we look at regulation of gene expression, but at the same time ultimately raised more questions than they answered. Nevertheless, while the detailed mechanisms and regulatory roles of pausing remain to be determined, it is already safe to say that what is currently seen as polymerase pausing likely betrays multiple mechanisms, which bring together regulation of both individual genes and major pathways and networks. The pervasive nature of pausing highlights the complexity and sophistication of eukaryotic transcription, and recent advances in both technology and concepts provide unprecedented opportunities to address these questions by experiment. Next-generation sequencing technologies are expected to provide continually deeper coverage and higher resolution to global analysis. On the other hand, further advances in single-cell manipulation should enable analysis of global gene expression dynamics in individual cells.
We thank Daniel A. Gilchrist and Guang Hu for critical reading of the manuscript, Gilberto dos Santos for help with the figures, and J. Lis, D. Price and D. Gilmour for thought-provoking discussions that helped shape this review. This research was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01 ES101987).