Transcription is the first and most regulated step of gene expression. RNA polymerase (RNAP) is the heart of the transcription machinery and a major target for numerous regulatory pathways in living cells. The crystal structures of transcription complexes formed by bacterial RNAP in various configurations have provided a number of breakthroughs in understanding basic, universal mechanisms of transcription and have revealed regulatory “hot-spots” in RNAP that serve as targets and anchors for auxiliary transcription factors. In combination with biochemical analyses, these structures allow feasible modeling of the regulatory complexes for which experimental structural data are still missing. The available structural information suggests a number of general mechanistic predictions that provide a reference point and direction for future studies of transcription regulation.
In all organisms, transcription is carried out by DNA-dependent RNA polymerases (RNAP) and can be divided into three mechanistically and structurally distinct stages: initiation, elongation and termination. In bacteria and eukaryotes, each phase of transcription is a target for numerous regulatory factors. The interplay among the unstable initiation phase, transition to a stable elongation complex (EC) followed by processive synthesis, transient halting at numerous “roadblocks”, and RNA release depends on an intricate network of interactions between RNAP, nucleic acid (NA) signals, and/or auxiliary transcription factors. The past few years have seen an “explosion” of structural studies that resulted in detailed structural characterization of several key transcription intermediates formed by both bacterial and eukaryotic multi-subunit RNAPs[1-12]. These structures shed significant light on such general mechanisms as downstream (dw) DNA and RNA/DNA hybrid strand separation[3,6,7], substrate selection and loading[4,6,9], DNA translocation[4,13,14], formation and rescue of the paused and backtracked/arrested complexes[6,11,12], and inhibition by small molecules[4,13-21].
This review is focused on the mechanistic insights gained from recent crystallographic analysis of the bacterial system that revealed regulatory “hot spots” in RNAP and implied common mechanisms utilized by structurally and/or functionally divergent transcription factors.
The high-resolution structure of the T. thermophilus (tt) EC provided the first detailed view of the bacterial EC and identified the determinants of its stability, processivity, and response to nucleic acid signals, effectors, and auxiliary proteins. In the ttEC, the downstream (dw) DNA binding cavity accommodates 13 bps of the dwDNA duplex. The 9 bp RNA/DNA hybrid resides in the RNAP main channel, and the nascent RNA transcript, which is displaced from the template (T) DNA, is threaded through the RNA exit channel (Figures 1a, 1b). Access to the RNAP active site through the main channel is blocked, suggesting that the widely open secondary channel (SC) is the major substrate entry pore. In the crystals, the complex is in the post-translocated state with the acceptor DNA template base (register +1) available for base pairing with the incoming NTP.
The dwDNA duplex is melted immediately upstream of the active site (register +2) implying that only one substrate at a time may bind to the EC[3,4,22]. The +2 dwDNA base pair stacks on the fork loop-2 (fork-2), which most likely plays a crucial role in dwDNA strand separation and proper positioning of the open acceptor T-base in the active site.
At the upstream edge of the transcription bubble, the last (9th) bp of the RNA/DNA hybrid stacks on the β’-subunit “lid” loop that sterically blocks the nascent RNA/DNA duplex, reminiscent of the downstream fork-2[24,25]. The first displaced RNA base is trapped in the hydrophobic pocket formed by the β-subunit switch-3 segment implying the DNA-dependent mechanism of RNA displacement. The switch-3 pocket may possess certain sequence specificity and emerge as a critical checkpoint mediating initiation, translocation, pausing, and/or termination[24-26]. It is also possible that a similar, yet unknown, pocket accommodates the first displaced non-template (NT) DNA base (+1) additionally stabilizing the bubble. The mechanisms of the dwDNA and RNA/DNA hybrid strand separation are most likely universal for all multi-subunit RNAPs [6,7].
Modeling based on the ttEC structure shows that the intact RNA exit channel may accommodate and stabilize the short (5 bp) RNA hairpins that are characteristic of paused transcription complexes. On the other hand, there does not appear to be enough space for the bulky (8-12 bp) termination hairpins, suggesting that termination requires substantial widening of the RNA exit channel. The recently proposed allosteric model of termination suggests that these alterations open the passage to the main channel and allow hairpins to invade the main channel, partially (4-5 bps) melt the upstream RNA/DNA hybrid and travel ~70 Å towards the RNAP active site, where they interact with the catalytic trigger loop (TL). Altogether, migration of the RNA hairpin compromises the hybrid and induces conformational changes in RNAP that trigger NA dissociation; DNA is static and plays no active role. This RNA-dependent mechanism is thus at odds with the earlier “translocation” model in which multiple steps of unproductive (no RNA synthesis) forward DNA translocation result in transcript release and complex dissociation.
A combined, “DNA displacement” model may reconcile these conflicting mechanisms (Figures 1c, 1d). We suggest that positioning of the hairpin near both the RNAP active site and the dwDNA binding cavity, as specified in the allosteric model, would result in competition with and displacement of the dwDNA duplex. Displacement presumably results from high affinity interactions of the hairpin head with the TL; these interactions are not evident if the dwDNA is not displaced because it is located between the hairpin and TL in the crystal structure. Displacement of dwDNA most likely dislodges the acceptor T-base from the active center to produce a catalytically inactive complex in which the remaining downstream hybrid (4-5 bps) can be unzipped as specified by the translocation model. Notably, the translocation model alone does not explain how forward translocation can proceed without RNA synthesis.
Our two structures of substrate-bound ttECs revealed that NTP-induced refolding of the TL mediates formation of the closed, catalytically active intermediate (Figure 1e). This finding allowed us to propose the nucleotide addition cycle (NAC) model that is most likely relevant for all multi-subunit RNAPs[6,9,13] (Figure 1f). Prior to NTP loading, the EC exists in equilibrium between pre- and post-translocated states. Substrate loading occurs in two steps. First, the NTP binds to the open (unfolded TL) post-translocated EC in a template-dependent manner, forming an inactive, preinsertion intermediate. Second, NTP-induced displacement of the bridge helix (BH) and fork-2 facilitates folding of the TL into an α-helical hairpin. Upon TL folding, the complex isomerizes to the catalytically competent, closed “insertion” state. The NAC culminates with the catalytic reaction that results in transcript extension and pyrophosphate release. The antibiotic streptolydigin (Stl) binds in a pocket formed by the BH and fork-2 and blocks their NTP-dependent displacement, thereby preventing TL folding and freezing the substrate in the inactive preinsertion state[4,17].
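The sequence of states in the NAC model can be summarized as a small state machine. The following Python sketch is purely illustrative: the state names, the function `nucleotide_addition_cycle`, and the `stl_bound` flag are hypothetical labels introduced here, and the pre-/post-translocation equilibrium is simplified to a single forward step. It encodes only the ordering of intermediates described above, not any kinetic or structural detail.

```python
from enum import Enum


class State(Enum):
    # Labels are illustrative names for the intermediates described in the text.
    PRE_TRANSLOCATED = "pre-translocated"
    POST_TRANSLOCATED = "post-translocated, TL unfolded (open)"
    PRE_INSERTION = "preinsertion: NTP bound, TL unfolded (inactive)"
    INSERTION = "insertion: TL folded (closed, catalytically competent)"


def nucleotide_addition_cycle(rna_length, stl_bound=False):
    """Walk through one simplified cycle.

    Returns (trace, new_rna_length), where trace is the list of states visited.
    Assumption: the pre/post equilibrium is modeled as one forward step.
    """
    trace = [State.PRE_TRANSLOCATED, State.POST_TRANSLOCATED]

    # Step 1: template-dependent NTP binding to the open, post-translocated EC.
    trace.append(State.PRE_INSERTION)

    # Stl blocks BH/fork-2 displacement, so the TL cannot fold and the
    # substrate stays frozen in the inactive preinsertion state.
    if stl_bound:
        return trace, rna_length

    # Step 2: BH/fork-2 displacement lets the TL fold; the complex closes.
    trace.append(State.INSERTION)

    # Catalysis: transcript extended by one nucleotide, pyrophosphate released.
    return trace, rna_length + 1


if __name__ == "__main__":
    for stl in (False, True):
        trace, length = nucleotide_addition_cycle(9, stl_bound=stl)
        print("Stl bound:", stl, "| RNA length:", length)
        for state in trace:
            print("   ", state.value)
```

Running the sketch with and without `stl_bound` shows the one point at which the two paths diverge: the Stl-bound complex stops at the preinsertion intermediate and the transcript is not extended.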
Structural studies have revealed that alterations (displacement/refolding) of several crucial RNAP domains modulate activity and/or stability of the transcription complexes. The domains involved in these conformational switches are likely targets of auxiliary transcription factors that enhance or inhibit these switches (Figure 2).
The TL, which undergoes dramatic substrate-induced refolding, emerges as a central regulatory element, as well as a key determinant for the fidelity and processivity of transcription in multi-subunit RNAPs[3,4,9,13,14,28] (Figures 1e, 1f, 2a, 2b). Given the high sensitivity of its structure to even subtle alterations of adjacent structural domains and the possibility that it may adopt multiple conformations, the modulation of TL refolding by various transcription factors provides numerous degrees of freedom in transcription regulation. In addition to Stl, the TL is a plausible target for tagetitoxin (Tgt), the DksA/ppGpp system, Gre-factors and Gfh1.
The fork-2 loop appears to play multiple functional roles, mediating dwDNA melting and substrate loading and maintaining bubble stability through interactions with the RNA/DNA hybrid (Figures 2a, 2b). This loop, which bridges the upstream and downstream RNAP domains and forms a part of the catalytic center, may also serve as a circuit for transmitting allosteric signals generated by remotely bound transcription factors (RfaH, NusG, NusA, etc.) to the RNAP active site. Interestingly, fork-2 forms part of the binding pocket for rifamycins (Rifs), suggesting that Rifs most likely block the functionally significant NTP-induced displacement of this loop in a Stl-like manner, thereby allosterically affecting active site configuration and/or substrate loading, in agreement with the proposed allosteric model.
Recently, specific conformations of the BH and TL induced by a toxin, α-amanitin, were shown to stabilize a pre-templated intermediate in the eukaryotic enzyme in which the overall complex is in the post-translocated state, while the open acceptor T-base occupies an inactive site over the BH between pre- and post-translocated registers. The two conformations of the BH that have been observed in bacterial RNAP (straight and “flipped”) were also proposed to mediate DNA translocation[1,16]. In principle, the flipped BH may indeed stabilize the pre-templated state. Both Stl and α-amanitin affect the BH and TL conformations; thus, in the absence of the substrate, Stl may theoretically trap the pre-templated bacterial EC[4,13,17]. However, an intact post-translocated complex is evident in the substrate-free ttEC/Stl structure (DGV, unpublished data). While the exact role of the bacterial BH in translocation remains to be elucidated, this helix stacks on the “active” acceptor T-base, mediates TL folding and is accessible from both the secondary and main channels, suggesting that its configuration may be modulated by auxiliary factors to regulate the NAC.
The lid loop is located at the junction between the main channel and the RNA exit channels, and the upstream DNA pore. In addition to its commonly accepted role in RNA/DNA hybrid strand separation, the lid loop may also function as a “valve” that opens a passageway between these functional “chambers” during the major structural transitions of the transcription cycle. Indeed, according to the allosteric model of termination, displacement of the lid seems indispensable for intrusion of the termination hairpin in the main channel. On the other hand, in the holoenzyme, the lid locks the extended inter-domain linker of the σ-subunit (region 3.1) inside the core enzyme structure suggesting that opening of the lid accompanies/triggers the σ-factor release during transition from the initiation to elongation phase. Interestingly, the structure of the lid resembles that of the specificity loop in the single-subunit T7 RNAP which undergoes drastic structural rearrangement (repositioning and refolding) upon transition from initiation to elongation complexes and plays essential, but distinct roles in both complexes[29-31]. Therefore, it is possible that the lid also possesses “chameleon” properties and that these reconfigurations are regulated by external transcription factors.
Most recently, we demonstrated that the antibiotic myxopyronin stabilizes refolding of the β’-subunit switch-2 segment and, therefore, may sterically block downstream propagation of the nascent transcription bubble during open complex formation (Figures 2c, 2d, 2e). Mutations in switch-2 mimic the antibiotic effects on promoter complexes, suggesting that this region may serve as a natural molecular checkpoint for DNA loading in response to regulatory factors/signals. Consistent with this hypothesis, transcription factor DksA appears to potentiate this structural switch. While switch-2 refolding seems unlikely to affect a stable EC, it may play a regulatory role in unstable termination complexes and be a target for some termination/anti-termination factors.
A second, low affinity Mg2+ ion bound to the RNAP active site is known to be a “catalytic” metal required for all reactions catalyzed by RNAP. The “catalytic” transcription factors (Gre-factors, non-template NTPs, pyrophosphate) stimulate intrinsic catalytic activities of RNAP presumably through direct coordination of this catalytic ion. On the other hand, structural and biochemical data indicate that the antibiotic tagetitoxin and ppGpp appear to convert the second catalytic ion into an inhibitory one in essentially the same binding site[15,18]. These results suggest that other “inhibitory” transcription factors may modulate transcription by targeting this “regulatory” metal. The competitive interplay of these catalytic and inhibitory factors may provide exquisite regulatory control of the transcription process.
In the absence of direct structural data on RNAP complexes with protein transcription factors, localization of the major binding sites on RNAP for these factors and elucidation of the mechanisms of their recruitment to the cognate transcription complexes are of central importance for understanding general and specific mechanisms of transcription regulation. To this end, two solid “anchors” for transcription factors were identified. The protruding coiled-coils (CCs) in the β’-subunit of RNAP, the “upstream” clamp helices (CH) and the C-terminal β’CC at the rim of the SC appear to serve as the major binding sites for distinct (“upstream” and SC) subsets of competing transcription factors. Notably, while these factors possess different regulatory mechanisms, most of them are recruited to the two RNAP CCs in a very similar fashion – the hydrophobic tips of the CCs are inserted in the open hydrophobic cavities of the proteins.
So far, the three major “upstream” transcription factors were shown to bind to the CH in a competitive manner: the initiation factor σ, and the two elongation factors, RfaH and NusG[1,33-35]. RfaH and NusG are paralogs that regulate transcriptional pausing and termination and possess sequence and structural similarities[34,36]. The key difference between NusG and RfaH is that NusG acts as a sequence-independent general elongation factor, while RfaH is operon specific and its action depends on the ops site in DNA; this sequence alone induces transcription pausing. The structures of RfaH and NusG revealed two (N- and C-terminal) domains. The N-domains displayed high similarity, while the C-domains, retaining sequence homology, appeared strikingly different; the β-barrel in NusG, and an α-helical CC in RfaH (Figures 3a, 3b). Both N-domains possess a vast hydrophobic cavity that is closed by the C-domain in RfaH but is exposed in NusG. This cavity most likely constitutes the RNAP-binding site in both proteins. This cavity in RfaH becomes unmasked only upon sequence-specific binding to the ops NT-DNA that triggers domain dissociation. Identification of CH as a common anchor for RfaH and NusG allowed structural modeling of the RfaH- and NusG-bound ECs (Figures 3c, 3d). An interesting implication from the RfaH/EC model is that RfaH may interact with only a few (2-3) NT-DNA bases, while the ops element consists of 9 essential nts. One possible scenario is that the ops-induced pause, which presumably requires the entire ops sequence, is accompanied by DNA scrunching and that scrunching exposes the 2-3 NT bases recognized by RfaH (Figure 3e). Another important prediction is that, in addition to binding to the major, functionally active “insertion” binding site on RNAP, RfaH also binds to the “preinsertion” site in an “inactive” configuration and is converted to an “active” configuration when it meets and recognizes the specific DNA target.
The global transcription factor NusA is recruited to transcription complexes upon formation of RNA hairpins. Biochemical data suggest that the N-terminal domain of NusA is essential for binding to RNAP and that binding occurs near the RNA exit channel, presumably in the β-subunit flap domain[33,37]. Following a general prediction concerning the hydrophobic mode of binding of regulatory factors to RNAP, we have identified in the NusA structure a prominent hydrophobic cavity in the N-terminal domain that is blocked in the structure by the N-terminal amphiphilic α-helix. Thus, RfaH-like activation may be required to displace this N-terminal helix and to open the cavity for the RNAP target (Figures 3f, 3g). We speculate that upon activation this groove will accommodate some RNAP α-helix resembling the “self-inhibitory” NusA α-helix. The protruding amphiphilic flap-tip helix from the β-subunit flap domain is a promising candidate; this helix is located at the rim of the RNA exit channel where the RNA hairpins fold. Similar to RfaH, “preinsertion” recognition of the hairpin may trigger NusA activation. In the holoenzyme, the flap-tip helix is trapped in the hydrophobic groove formed by region 4 of the σ-subunit; consistently, the σ-factor and NusA compete for binding to RNAP. The flap-tip helix is thus another potential regulatory anchor for the “upstream” group of the transcription regulators.
All structurally characterized SC protein transcription factors with known function (Gre-factors, DksA, Gfh1)[33,39,40] possess a two-domain (globular and CC) architecture (Figure 4a) and are thought to directly modulate the RNAP catalytic site through their “functional” CC domains. The CC domains presumably penetrate the SC, while the primary role of the “structural” globular domains is thought to form a stable and specific complex with RNAP.
Bacterial Gre-factors (GreA/B) stimulate the intrinsic endonucleolytic activity of RNAP and assist RNAP in rescuing backtracked and/or arrested ECs. A commonly accepted mechanism suggests that the two invariant acidic side chains at the tip of the CC domain coordinate the second catalytic Mg2+ ion. However, several conflicting and controversial structural models have been proposed for the Gre/RNAP complex[42-44]. The X-ray structure of the GreB protein and mutational analysis identified a vast open hydrophobic cavity in the GreB globular domain and a complementary hydrophobic patch at the tip of the β’CC as the major binding partners; these results allowed plausible modeling of the RNAP/GreB complex (Figure 4b). This model is now being confirmed by the crystal structure of GreB complexed with isolated β’CC (DGV, in preparation).
The T. thermophilus Gfh1 protein belongs to the Gre-family of transcription factors, yet possesses inhibitory, rather than catalytic activity. The Gfh1 structure reveals two Gre-like domains with strikingly distinct inter-domain orientation[46-48] (Figure 4c). Similar to the Gre-factors, Gfh1 most likely binds to the β’CC and accesses the catalytic center of RNAP. Its recruitment to RNAP, however, appears to require activation through domain rearrangement to adopt an active, Gre-like conformation. By analogy with RfaH, this implies the preinsertion binding mode and a specific target(s) that triggers activation.
DksA binds to RNAP and greatly stimulates the activity of the “magic spot”, ppGpp, during stringent control. DksA possesses a two-domain (globular and CC) architecture reminiscent of the Gre-factors. On the other hand, ppGpp was shown to bind near the active site of ttRNAP. Together, the structural and biochemical data[39,40] suggest a synergistic model in which the accessory DksA protein folds around the β’CC (but does not bind to the β’CC tip) and stabilizes ppGpp binding to RNAP through its CC-domain to enhance the effect of magic spot. However, two DksA variants with single substitutions in the globular and CC-domain were recently reported to possess high ppGpp-independent activity. There are two major alternative interpretations of these results: (i) DksA is the major player in the DksA/ppGpp synergistic tandem, i.e. ppGpp is not a “magic spot” but rather an accessory molecule (DksA “activator”); (ii) both regulators possess specific, but somewhat distinct, effects on transcription, suggesting that their mutual activation is triggered by direct interactions. Both mechanisms assume activation of the system that most likely occurs through ppGpp-induced/stabilized conformational alterations of DksA and/or DksA-induced/stabilized reconfiguration of the ppGpp binding site. One possible scenario is that DksA binds to the β’CC through the hydrophobic cavity of its globular domain in a Gre-like fashion. However, in DksA this cavity is masked by a CC domain similar to that of the inactive, apo-RfaH protein. If this prediction is correct, a ppGpp-dependent domain opening would mediate DksA recruitment to RNAP, which may occur via the two-step, preinsertion-to-insertion mode.
Two other SC transcription factors have recently been characterized. The Rnk protein conserves the Gre-like globular domain, but has a short, rudimentary CC, thereby suggesting a similar binding mechanism but distinct functional targets in RNAP. While the TraR protein shares significant sequence and functional similarities with DksA, there are two striking differences: (i) TraR effects on transcription are ppGpp-independent, and (ii) the TraR sequence is truncated from the N-terminus, resulting in deletion of a portion of the globular domain and the first α-helix of the CC, without which the protein can hardly adopt a stable DksA-like conformation. Thus, TraR most likely functions as a dimer.
Finally, in E. coli a bulky (188 residues, SI3) β’-subunit domain, which is dispensable, is inserted in the catalytic TL. Although the structure of the isolated SI3 domain was recently determined, the location of this domain in the RNAP structure and its functional role remain elusive. One intriguing possibility is that this domain may reversibly bind to the β’CC in a Gre-like fashion (Figure 4d) and the on- and off-bound states may confer the regulatory effects through competition/interactions with the SC factors and/or via modulation of TL conformation.
Analysis of the available structural information allows us to make several provocative mechanistic predictions concerning general principles of transcription regulation. These predictions may provide a foundation for future studies of the bacterial transcription machinery. First, there are a limited number of regulatory anchors in RNAP; multiple functionally distinct transcription factors compete for binding to a single anchor. One important role of such competition for the “upstream” anchors may be to demarcate the different phases of transcription. Second, the known anchors contain protruding structural elements with hydrophobic patches complementary to the hydrophobic cavities of the cognate transcription factors. Third, structural (inactive-to-active) isomerization of the transcription factors, presumably triggered by recognition of the target complexes, in many cases appears to mediate recruitment to RNAP. Therefore, recruitment may occur in two, preinsertion-to-insertion steps. Finally, a number of flexible structural “switches” in RNAP may serve as the regulatory targets for various auxiliary transcription factors. Regulatory events are thus often accompanied by unpredictable structural alterations/refolding of RNAP and/or transcription factors suggesting that structural studies are the major, and in some cases the only, tool to elucidate basic mechanisms of transcription and regulation of gene expression.
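The first prediction, that several factors compete for a limited set of anchors, can be written down as a simple lookup table. The Python sketch below is an illustrative summary only: the anchor labels, the table `ANCHORS`, and the helper `can_coexist` are hypothetical names introduced here, and the assignments mix experimentally supported binding sites with those that are only proposed or modeled in this review (for example NusA on the flap-tip helix, and Gfh1 and DksA on the β’CC). It assumes, as a deliberate simplification, that two factors sharing an anchor are mutually exclusive.

```python
# Anchor -> factors reported or proposed in this review to use that anchor.
# Assumption: membership is simplified; several assignments are predictions.
ANCHORS = {
    "clamp helices (CH)": {"sigma", "RfaH", "NusG"},
    "beta' coiled-coil at the SC rim": {"GreA", "GreB", "Gfh1", "DksA"},
    "flap-tip helix": {"sigma", "NusA"},
}


def shared_anchors(factor_a, factor_b):
    """Return the sorted list of anchors that both factors are listed under."""
    return sorted(
        anchor
        for anchor, factors in ANCHORS.items()
        if factor_a in factors and factor_b in factors
    )


def can_coexist(factor_a, factor_b):
    """True if the two factors share no anchor in the simplified table."""
    return not shared_anchors(factor_a, factor_b)


if __name__ == "__main__":
    for pair in [("RfaH", "NusG"), ("sigma", "NusA"), ("NusG", "GreB")]:
        print(pair, "can coexist:", can_coexist(*pair),
              "| shared anchors:", shared_anchors(*pair))
```

In this simplified picture the σ-factor appears under two “upstream” anchors, which is one way to read the suggestion that competition for these anchors helps demarcate the initiation and elongation phases.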
I thank Dr. T. Townes for critical reading of the manuscript. The work of DGV was supported in part by NIH grants R01GM74252 and R01GM74840.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.