Eukaryotic transcription initiation requires the assembly of general transcription factors into a pre-initiation complex that ensures the accurate loading of RNA polymerase II at the transcription start site. The molecular mechanism and function of this assembly have remained elusive due to lack of structural information. We have used an in vitro reconstituted system to study the stepwise assembly of human TBP, TFIIA, TFIIB, Pol II, TFIIF, TFIIE, and TFIIH onto promoter DNA using cryo-electron microscopy. Our structural analyses provide pseudo-atomic models at various stages of transcription initiation that illuminate critical molecular interactions, including how TFIIF engages Pol II and promoter DNA to stabilize both the closed PIC and the open-promoter complex and regulate start site selection. Comparison of open versus closed pre-initiation complexes, combined with the localization of the TFIIH helicases XPD and XPB, supports a DNA translocation model of XPB and explains its essential role in promoter opening.
Accurate and regulated initiation of eukaryotic gene transcription represents a major step in gene regulation, requiring the coordinated activity of a large number of proteins and protein complexes. The basal transcriptional machinery includes RNA polymerase II (Pol II) along with a series of general transcription factors (GTFs) (TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH) that assemble into a ~2 MDa complex on core promoter DNA. This pre-initiation complex (PIC) is essential to direct accurate transcription start site (TSS) selection, promoter melting, and Pol II promoter escape1–3. Despite recent structural advances on Pol II4,5 and subcomplexes of the PIC6, the molecular assembly details of this essential complex remain elusive.
In vitro reconstitution of this process has provided a model for the sequential assembly pathway of transcription initiation. TFIID is the first factor specifically recruited to the promoter. This megadalton complex includes the TATA binding protein (TBP), which is sufficient for basal transcription on TATA box-containing promoters2,3,7. TFIIA and TFIIB are then recruited, further stabilizing the interaction between TBP and promoter DNA. Next, Pol II, likely in association with TFIIF, joins the growing PIC. Finally, TFIIE and TFIIH, the latter of which is required for DNA melting, are recruited to form the transcriptionally competent PIC2,3.
Structural characterization of PIC assemblies is challenging and has been limited to a small number of electron microscopy (EM) studies8–10. Crystallographic structures of individual components, combined with biochemical data, have led to a number of structural models for PIC subcomplexes, in either a closed or open-promoter conformation6,9,11,12. In spite of this progress, important questions remain unanswered, such as how TFIIB and TFIIF serve complementary roles during the promoter opening process or how TFIIE positions TFIIH in a configuration capable of melting the DNA.
Here we present cryo-EM snapshots of PIC intermediates during sequential assembly. A reconstitution system allowed us to localize each GTF within the cryo-EM structures, track the effect of each additional factor on the PIC, and ultimately reveal the network of protein-protein and protein-DNA interactions governing PIC assembly. Furthermore, by visualization of an open-promoter complex (OC) mimic, we have obtained new mechanistic details concerning promoter melting. Altogether, our structures provide unprecedented insights into the molecular assembly, organization, and functional roles of different GTFs during transcription initiation.
In order to structurally characterize the sequential assembly of GTFs necessary for human transcription initiation, we developed an in vitro system for reconstitution and purification of a simplified PIC, in which TBP substituted for TFIID, and that ultimately contained 31 polypeptides. Our promoter DNA contained TATA, BRE and INR core promoter elements and was immobilized on streptavidin beads (Fig. 1a). After stepwise assembly of PIC intermediates by sequential incubation with the desired GTFs, stable complexes were released by restriction enzyme digestion. The effectiveness of this approach for structural characterization of the PIC intermediates was initially tested by single particle EM of negatively stained samples (Supplementary Fig. 1). This initial analysis allowed us to localize each GTF within the context of the full assembly (Fig. 1b–e), although it precluded the visualization of DNA. The stepwise purification approach enabled us to describe the effect of factor addition on the rest of the PIC, which cannot be achieved by studying individual factors or the complete PIC. The negative stain structures were then used as starting references to generate cryo-EM reconstructions of the PIC subcomplexes with improved resolution that allowed visualization of the DNA and accurate docking of existing crystal structures (Supplementary Figs 2–5).
To start, we obtained the cryo-EM structure of a PIC subcomplex containing TBP, TFIIA, TFIIB, Pol II, and core promoter DNA (Fig. 2a). Crystal structures of TBP-TFIIA-DNA13, TBP-TFIIB-DNA14 and yeast Pol II-TFIIB11,12,15 could be unambiguously docked into our density map as rigid bodies (Supplementary Fig. 2e). This procedure validated our cryo-EM structure while also allowing the localization of each factor to generate a pseudo-atomic model of the assembly. The visible portion of DNA accounts only for the upstream core promoter elements, which are stabilized by protein-DNA interactions, whereas the DNA downstream of the BREd lacks contact with the PIC and was not visualized due to its flexibility (Fig. 2a).
A yeast PIC model has been previously proposed based on superimposing crystal structures using the common protein as an anchor point12. Our data show that a simple pivoting of the C-terminal cyclin fold domain of TFIIB around the N-terminal one can explain the position of the TBP/TFIIA module in our map using the available crystal structures, without disrupting the interaction between the N-terminal cyclin fold of TFIIB and Pol II (Supplementary Fig. 7). This small discrepancy with the previous piece-wise model is unlikely to be due to differences between the human and yeast systems; rather, it likely reflects a re-organization with respect to individual crystal structures upon interaction of GTFs on the core promoter.
According to the sequential assembly pathway, TFIIF is recruited to the promoter in association with Pol II2. In order to understand its structural role during PIC assembly, we added TFIIF separately to our reconstituted system. By comparing the cryo-EM structures of PIC subcomplexes in the absence and presence of TFIIF, we identified additional protein densities appearing at two nearby locations, by the lobe and protrusion domains of Pol II (Fig. 2b). Importantly, the addition of TFIIF also resulted in the stabilization of the downstream DNA along the cleft of Pol II, in a position that is distinct from a previously proposed model12 (Supplementary Fig. 8). Thus, TFIIF is required for the engagement of DNA by Pol II within the context of a closed PIC.
The crystal structure of the human TFIIF dimerization domain16 could be unambiguously fitted into the new density ascribed to TFIIF by the lobe domain of Pol II using rigid-body docking, only slightly shifted from previous models that were based on crosslinking data17,18 (Fig. 2c and Supplementary Fig. 3e). No obvious density was observed for the arm domain of RAP74, which extends about 45 Å from the end of the RAP74 barrel16, suggesting it is mobile at this stage of PIC assembly. A small clash between the RPB2 lobe and the RAP74 α1 helix can be explained by a reorganization of this element in the context of the PIC (see Supplementary Fig. 9 for details).
In addition to interacting with Pol II, our cryo-EM structure indicates that a region of TFIIF directly contacts the BREd (Fig. 2c,d). We propose that this region corresponds to the C-terminal winged-helix (WH) domain of RAP30, based upon the following: 1) the size of this additional TFIIF density is consistent with the RAP30 WH domain; 2) RAP30 has been shown to crosslink to BREd19 and its C-terminal WH domain has been identified to be in direct contact with the protrusion domain of Pol II18; 3) RAP30, rather than RAP74, is required for accurate transcription initiation20 and deletion mutants of RAP30’s WH domain are lethal in yeast18,21. This WH domain therefore contributes to a unique nucleoprotein complex formed by TBP, TFIIA, and the TFIIB cyclin fold, as they contact the core promoter elements upstream of the INR, which is further stabilized by the protrusion domain of Pol II (Fig. 2d).
Comparison of structures shown in Fig. 2a and Fig. 2b strongly suggests that the overall effect of TFIIF on the assembling PIC is a clear stabilization of the DNA along the Pol II cleft. Given its position, we propose that the RAP30 WH domain plays an essential role in positioning the flexible promoter DNA downstream of BREd along the Pol II cleft, thus facilitating subsequent steps in the promoter melting process. However, a contribution to this DNA stabilization by the dimerization domain is also possible. Correct positioning of the DNA by TFIIF is consistent with its role in promoter opening and TSS selection22–24. Our structures also revealed the opening of the Pol II clamp domain as it accommodates the downstream DNA (Fig. 2e and Supplementary Fig. 11). In addition, we observe further rotation of the TBP-TFIIA-TFIIB subcomplex with respect to previously proposed models, positioning it even closer to Pol II (Fig. 2e and Supplementary Fig. 7).
The addition of TFIIE to the growing PIC resulted in new protein density that connects TFIIF with Pol II’s stalk domain (Fig. 3a). The resolution of this reconstruction (11 Å) was the highest obtained for any of the complexes studied, suggesting that TFIIE stabilizes the PIC. The density corresponding to TFIIE, however, was the least well-defined element according to local resolution calculations (Supplementary Fig. 4f), which may be due to flexible connections between the WH domains predicted within the TFIIE structure6 (Fig. 3a,b). One end of TFIIE associates with the stalk of Pol II by interacting with the RPB7 L45 loop (Fig. 3c), which has been predicted to stabilize the OC and whose deletion completely abolished transcription25. Also consistent with the positioning of TFIIE in our PIC structure, a zinc ribbon domain within the archaeal homolog of TFIIE was found to be located near the base of the stalk domain of the polymerase26. Away from the stalk, the TFIIE density contacts the Pol II clamp domain to interact ultimately with the WH domain of TFIIF. A model of the three WH domains within TFIIE interacting with elements of the clamp head has been proposed based on crosslinking studies6 (Fig. 3c). Although the model cannot fit the EM density perfectly, the overall path of the three tandem WH domains in the model follows the elongated TFIIE cryo-EM density and ends by directly contacting the RAP30 WH domain (Fig. 3b). Therefore, a continuous chain of four WH domains appears to link the Pol II clamp region with the TBP-TFIIA-TFIIB-DNA subcomplex, preventing DNA from leaving the cleft.
Our 11 Å resolution reconstruction of the PIC containing TFIIE starts to reveal the major and minor grooves of the promoter DNA (Fig. 3d), allowing us to model its path. We found that linear B-form DNA could not be accommodated into the DNA density (Supplementary Fig. 12), requiring instead a smooth bend of 18° between positions −23 and +7 that fitted both the path and groove features of the EM density. Interestingly, a hypersensitivity region around −6 position27 locates at one of the downstream DNA-Pol II interfaces as discussed below.
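To give a sense of how gentle an 18° bend over this stretch of promoter is, the following sketch estimates the arc length and the radius of curvature of the modeled DNA. It is an illustration only: it assumes a canonical B-form rise of 3.4 Å per base-pair step and a uniform circular arc, neither of which is stated in the text, and the function and constant names are hypothetical.

```python
import math

# Assumption: canonical B-form rise per base-pair step (not stated in the text).
RISE_PER_STEP_ANGSTROM = 3.4

def base_pair_steps(start: int, end: int) -> int:
    """Count base-pair steps between two promoter positions.

    Promoter numbering has no position 0 (-1 is followed by +1),
    so one step is subtracted when the interval crosses the TSS.
    """
    steps = end - start
    if start < 0 < end:
        steps -= 1
    return steps

# Values taken from the text: an 18 degree bend between -23 and +7.
steps = base_pair_steps(-23, 7)
arc_length = steps * RISE_PER_STEP_ANGSTROM

# Assumption: the bend is spread evenly, so the axis follows a circular arc.
radius_of_curvature = arc_length / math.radians(18.0)
bend_per_step = 18.0 / steps

print(f"{steps} steps, arc length {arc_length:.1f} A")
print(f"radius of curvature {radius_of_curvature:.0f} A, "
      f"{bend_per_step:.2f} degrees per step")
```

Under these assumptions the bend amounts to well under one degree per base-pair step, which is consistent with the description of a smooth deformation rather than a localized kink.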
We observed two protein contacts with the downstream DNA. One connection involves the 3-strand β sheet below the clamp head while the other is mediated by a 2-helix bundle at the tip of the RPB5 jaw (Fig. 3d). Interestingly, these are the only two positively charged protein surfaces on Pol II along the path of the downstream DNA (Supplementary Fig. 12). The INR element is sandwiched precisely between these two protein-DNA contacts, an arrangement that may be relevant in promoter melting at the correct position in the DNA. The slightly open clamp conformation seen upon DNA placement onto the cleft following TFIIF addition is likely due to the interaction of the DNA with the clamp head β sheet (Fig. 3d and Supplementary Fig. 11b,c).
Whereas the spacing between the TATA box and the TSS can vary between species, the region within promoter DNA that is melted during transcription initiation is ~20 bp downstream of TATA28. We inferred the approximate position of flexible elements within TFIIB and TFIIF by docking their crystal structures as rigid bodies within our cryo-EM density. Importantly, we find that both the TFIIB linker helix and the TFIIF arm domain align with the promoter melting start site (Supplementary Fig. 13). This arrangement is consistent both with the proposed role of the linker helix of TFIIB in promoter opening12 and with the crosslinking of the arm domain of RAP74 to the TFIIB linker near the active site29,30, as well as with the suppression of the TSS defect of TFIIB mutations by a mutant within the arm domain of TFIIF31. In our rigid-body fitting, the linker helix of TFIIB overlaps with the DNA in our model, suggesting a rearrangement of the helix relative to the clamp domain at this stage in the PIC assembly. Finally, the tip of the TFIIF arm domain contains seven positively charged residues, whereas four positively charged residues are present on the side of the TFIIB linker helix that faces the DNA (Supplementary Fig. 13). The juxtaposition of these domains within the melting start site is consistent with their direct role in DNA interactions.
The structural features of our Pol II based PIC model are likely conserved with Pol I and Pol III, the two other RNA polymerases in eukaryotes. A side-by-side comparison of our Pol II-based PIC model with a cryo-EM structure of native Pol III agrees with this hypothesis32 (Supplementary Fig. 14).
It is well established that the PIC remains stably associated during transcription initiation until Pol II undergoes promoter escape2. Preceding this step, however, Pol II needs to transition into an OC in which the melted single-stranded DNA is inserted into the active site. To gain structural insight into the transition from a closed to an open promoter complex, we generated a “functional mimic” of the PIC in its open conformation by modifying the promoter substrate used to form the closed PIC (Fig. 4a). We replaced the segment of DNA containing the INR element with a 3’-tailed sequence previously used to create an arrested transcription state in yeast Pol II33. We matched the arrested position of Pol II on the template exactly to the TSS used in our studies, thereby creating a Pol II-nucleic acid complex containing only ~5 nucleotides at the active site, while still containing upstream core promoter elements available for assembling the rest of the PIC. We found that TFIIE had a higher affinity for the OC, as excess TFIIE had to be used to saturate the closed PIC, but not the open state mimic. Interestingly, excess TFIIE was no longer required in the context of the closed PIC when TFIIH was also included (see later), in agreement with previous studies suggesting cooperative binding of TFIIE and TFIIH within the PIC2,34,35.
The reconstruction of the OC mimic resembled that of the PIC in the closed conformation, with all the GTFs remaining at identical positions (Fig. 4b and Supplementary Fig. 4e, 5e, 10b–d). This finding is consistent with the prevalent hypothesis that the PIC assembled at the promoter remains intact until promoter escape2. In contrast, the downstream DNA adopted a conformation previously observed for elongating Pol II, indicating that the template strand was inserted through the positively charged cleft into the active site4 (Fig. 4b,c). The single stranded segments are invisible at our resolution or not present (non-template strand). As a reasonable model, the bubble depicted has been derived from a previous model based on FRET studies on the yeast system36.
When the position and orientation of the downstream DNA is compared between the closed PIC and the OC mimic, it is clear that there is a change in orientation concomitant with the insertion of the downstream DNA into the active site (Fig. 4c), indicating that the DNA rotates on a plane as it translates, while maintaining a point of contact between the DNA and RPB5 that corresponds to one of the two contacts present in the closed state (the one downstream of the INR).
Other than the repositioning of the DNA within the active site, two main differences were observed upon comparison of the OC mimic and the closed PIC structures. First, the clamp domain in the open state moves down to engage the open DNA bubble, adopting the conformation observed in the elongation state37 (Fig. 4d and Supplementary Fig. 11d). Thus, the clamp domain completes an open to closed transition throughout the process of PIC assembly and promoter opening (Supplementary Fig. 11), a cycle also reported for the bacterial system38. Second, an additional protein density now extends from the bottom of the clamp and connects to the dimerization domain of TFIIF (Fig. 4e). Rigid body fitting of crystal structures suggests that this density corresponds to the stabilized rudder of Pol II and the arm domain of TFIIF. We propose that these elements interact with each other as the clamp closes down over the melted DNA. Interestingly, this proposed interaction would prevent re-annealing of the melted DNA. The TFIIB linker helix is near this position and likely participates in the promoter melting process as well. This proposal is consistent with our hypothesis that the flexible TFIIB linker helix and the TFIIF arm domain act together in promoter opening (Supplementary Fig. 13). Thus, our structure and pseudo-atomic model provide a possible explanation for the enigmatic role of TFIIF in promoter opening and TSS selection22,23.
To gain insight into the natural promoter opening process carried out by TFIIH, we utilized the same purification strategy used for the previous closed PIC subcomplexes but included the purified, endogenous 10-subunit human TFIIH complex as a last step before elution. Given the scarcity of purified human TFIIH, this study was limited to negatively stained samples, which require less material. The 3D reconstruction of the TFIIH-containing PIC showed a substantial additional density extending away from Pol II, consistent with the large molecular weight of TFIIH (0.45 MDa) (Fig. 1e, Fig. 5a and Supplementary Fig. 6e). Surprisingly, only two contacts are observed between TFIIH and the rest of the PIC. One is with the Pol II stalk domain, at the site of interaction with TFIIE. The other contact likely involves the interaction of TFIIH directly with the downstream DNA. While the DNA is not visible in this negative stain reconstruction, its position can be extrapolated from the cryo-EM structure of the PIC containing TFIIE (Fig. 5a).
The CAK subcomplex (Cdk7/CyclinH/Mat1) of TFIIH, which phosphorylates the C-terminal domain (CTD) of RPB1, is missing from our PIC reconstruction based on comparison with a recent EM study of yeast TFIIH39. When we analyzed images of free human TFIIH, an additional density that could accommodate the mass of the CAK subcomplex appeared highly mobile, in agreement with the yeast TFIIH data39 (Supplementary Fig. 15, 16). Interestingly, when this new density, which fits the crystal structures of Cdk7 and Cyclin H, is placed in the context of the full PIC, it faces towards the CTD of RPB1 (Supplementary Fig. 16).
The reconstruction of the TFIIH-containing PIC allowed us to dock the crystal structure of XPD40 and a homology model of XPB6 (Fig. 5a and Supplementary Fig. 6e). XPD is positioned in close proximity to TFIIE and the Pol II stalk, but away from DNA, consistent with a scaffolding role in transcription initiation7. On the other hand, XPB docked directly on the downstream DNA path, between the +10 and +20 bp position relative to the TSS (Fig. 5a). This position is consistent with previous crosslinking data using purified TFIIH41, but inconsistent with a recent crosslinking study using overexpressed XPB in extracts, in which XPB was proposed to be positioned closer to the TFIIE WH domains and the INR element6. This result might reflect an alternative position of this protein during the assembly of the PIC, a distinct position of XPB on the DNA when out of the TFIIH complex, or the effect of other factors like TFIID and Mediator on PIC organization. The position of XPB we observed within the TFIIH density, together with the movement of the downstream DNA inferred from comparison of our reconstructions of the closed and open states of Pol II, suggests how XPB could act as a DNA translocase. A translocase model for XPB has previously been proposed6, but our structure now shows XPB positioned further downstream, leaving enough space around the INR element for it to be melted during this process.
We believe that the position of XPB suggests a DNA insertion process in which, as XPB walks on the DNA away from the rest of the PIC, the DNA would be translocated in the opposite direction and pushed into the Pol II cleft while maintaining a point of contact with RPB5 (which, in the closed complex, involves the DNA just downstream of the INR). This happens concomitantly with a rotation of the DNA, with the RPB5 contact likely serving as a pivot point. As XPB walks on the DNA helix, it would generate supertwist that would be relaxed by unwinding. While this unwinding cannot happen in the DNA that is tightly wrapped and stabilized by the TBP-TFIIA-TFIIB-RAP30-protrusion module, it would be facilitated and/or stabilized in the DNA region between the BREd and the INR, where the arm domain of RAP74 makes contact with the Pol II rudder and regions of TFIIB.
The combination of structures described here provides unprecedented mechanistic insight into the stepwise assembly of the human PIC, defining key protein-protein and protein-DNA interactions important for PIC function (Fig. 5b). Our structures reveal the location and role of the RAP30 WH domain within an essential upstream nucleoprotein subcomplex. Its critical function in structurally stabilizing the whole PIC is highlighted by our direct visualization of the DNA as it positions along the Pol II cleft upon TFIIF binding. We also show a direct interaction between the arm module of the TFIIF dimerization domain and the rudder domain of Pol II upon formation of the OC, leading to a direct mechanistic model of how this TFIIF element facilitates and/or maintains strand separation concomitant with the closing down of the clamp domain of Pol II. Our structures show how two essential factors, TFIIB and TFIIF, come together at critical locations for their activity in the context of a full PIC. Our studies also reveal how TFIIH, because of its large size, can simultaneously interact with TFIIE at the base of the Pol II stalk and position XPB on downstream DNA.
Our studies of the closed PIC and an OC mimic illuminate the structural transitions necessary during the process of promoter melting. The apparent movement of downstream DNA observed when comparing the closed PIC and OC structures, together with the positioning of XPB on the downstream DNA, suggests how XPB could act as a DNA translocase to thread approximately 10 bp of downstream double-stranded DNA into the cleft. This translocating activity would push against the stably bound upstream DNA around the TATA box to induce negative supercoiling near the TSS. We find that the TFIIB linker helix and the TFIIF arm domain align with each other at the promoter melting start site, likely to facilitate the separation of the two strands. Once promoter DNA melting is further extended and the Pol II clamp closes down, the TFIIB linker helix and the TFIIF arm domain work together with the Pol II rudder to maintain the upstream edge of the DNA bubble.
Finally, the arrangement of components within our PIC structure is compatible with existing structural models that include the large, multi-subunit Mediator and TFIID complexes10. Future structural studies with Mediator and/or TFIID will yield further insight regarding the regulation of PIC assembly and function. In summary, this work provides the structural framework needed to integrate biochemical and structural data into a unified mechanistic understanding of transcription initiation.
TBP, TFIIA, TFIIB, TFIIE, and TFIIF were recombinantly expressed and purified from Escherichia coli. Pol II and TFIIH were immunopurified from HeLa cell nuclear extracts42. The design of the DNA construct was based on the SCP43, with a SalI restriction enzyme site introduced downstream of the INR element. PIC complexes were assembled according to an in vitro transcription protocol42 with minor modifications (see extended Supplementary Methods). The reactions were incubated with magnetic streptavidin T1 beads (Invitrogen) and the desired complexes were eluted by SalI digestion.
Data collection and image processing were conducted using the Leginon data collection software44 and the Appion electron microscopy processing environment45, respectively. Three-dimensional maps were calculated using libraries from the EMAN2 and SPARX software packages46,47. Volume segmentation, automatic rigid body docking, figure and movie generation were performed using UCSF Chimera48.
TBP, TFIIA, TFIIB, TFIIE, and TFIIF were recombinantly expressed and purified from Escherichia coli. Pol II, TFIID, and TFIIH were immunopurified from HeLa cell nuclear extracts following previously established protocols42,51. The design of the DNA construct was based on the SCP43, except that a BREu element was introduced upstream of the TATA box14 and a SalI restriction enzyme site was included downstream of the INR element for purification purposes (Template1, 5’-ACTGGGAAGTCGACCGGTCCGTAGGCACGTCTGCTCGGCTCGAGTGTTCGATCGCGACTGAGGACGAACGCGCCCCCACCCCCTTTTATAGGCGCCCTTC; Nontemplate1, 5’-GAAGGGCGCCTATAAAAGGGGGTGGGGGCGCGTTCGTCCTCAGTCGCGATCGAACACTCGAGCCGAGCAGACGTGCCTACGGACCGGTCGACTTCCCAGT). The nucleic acid scaffold that was used to generate the PIC in the open conformation was designed by modification of the promoter substrate used to form the closed PIC. An RNA-DNA duplex beyond 7 bp has been proposed to be the trigger for TFIIB release and promoter escape12,52. Thus, we replaced the segment of DNA containing the INR element with a 3’-tailed sequence previously used to create an arrested transcription state in yeast Pol II33. We matched the arrested position of Pol II on the template exactly to the TSS used in our studies, thereby creating a Pol II-nucleic acid complex containing only about 5 nucleotides at the active site while still containing upstream core promoter elements available for assembling the rest of the PIC (Template2, 5’-ACTGGGAAGTCGACCGGTCCGTAGGCACGTCTGCTCGGCTCGAGTGAGCTAGCTTACCTGGTGTTGCTCTAACCCCCACCCCCTTTTATAGGCGCCCTTC; Nontemplate2, 5’-GAAGGGCGCCTATAAAAGGGGGTGGGGGTT; Nontemplate3, 5’-GAGGTAAGCTAGCTCACTCGAGCCGAGCAGACGTGCCTACGGACCGGTCGACTTCCCAGT). A biotin tag was engineered at the 5’ end of both template strands (Integrated DNA Technologies). The duplexed DNA was generated by annealing the template strand with equimolar amounts of single stranded non-template DNA at a final concentration of 50 µM in water. 
The annealing reaction was carried out at 100°C for 5 min and gradually cooled down to room temperature within 1 h.
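As a quick consistency check on the closed-complex oligonucleotides listed above, the sketch below confirms that Nontemplate1 is the exact reverse complement of Template1, and locates the SalI recognition sequence (GTCGAC) and the TATA box on the non-template strand. The sequences are copied from the text; the helper and variable names are illustrative and are not part of the original protocol. Offsets are 0-based from the 5' end.

```python
# Illustrative check of the closed-complex promoter oligonucleotides.
# Sequences are copied from the text; all names here are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

TEMPLATE1 = (
    "ACTGGGAAGTCGACCGGTCCGTAGGCACGTCTGCTCGGCTCGAGTGTTCG"
    "ATCGCGACTGAGGACGAACGCGCCCCCACCCCCTTTTATAGGCGCCCTTC"
)
NONTEMPLATE1 = (
    "GAAGGGCGCCTATAAAAGGGGGTGGGGGCGCGTTCGTCCTCAGTCGCGAT"
    "CGAACACTCGAGCCGAGCAGACGTGCCTACGGACCGGTCGACTTCCCAGT"
)

# A fully base-paired duplex requires the two strands to be
# reverse complements of each other.
is_duplex = reverse_complement(TEMPLATE1) == NONTEMPLATE1

# Locate landmark elements on the non-template (coding) strand.
tata_offset = NONTEMPLATE1.find("TATAAAA")
sali_offset = NONTEMPLATE1.find("GTCGAC")

print(f"length: {len(TEMPLATE1)} nt, fully complementary: {is_duplex}")
print(f"TATA box starts at offset {tata_offset}")
print(f"SalI site starts at offset {sali_offset}")
```

The SalI site falls near the 3' end of the non-template strand, downstream of the TATA box, in keeping with the design in which SalI digestion releases the assembled complex from the biotinylated end bound to the beads.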
PICs in the closed conformation were assembled according to an in vitro transcription protocol42 with minor modifications. The assembly buffer contained 12 mM HEPES, pH 7.9, 0.12 mM EDTA, 12% glycerol, 8.25 mM MgCl2, 60 mM KCl, 1 mM DTT, 0.05% NP-40, 2.5 ng/µl dI-dC, 10 µM ZnCl2. The following purified proteins and nucleic acids were sequentially added into the assembly buffer: Pol II, TFIIB, TBP/TFIIA, DNA (template1-nontemplate1), TFIIF, and TFIIE at final concentrations of 185 nM, 3.6 µM, 370 nM, 50 nM, 289 nM, and 370 nM, respectively. The assembly reaction was kept at 37°C for an additional 5 min whenever a new factor was added. The reaction was incubated at 28°C for 15 min using a 1:10 dilution of the magnetic streptavidin T1 beads (Invitrogen) which had been equilibrated with the assembly buffer. Following washing of the beads three times using a washing buffer (10 mM HEPES, 10 mM TRIS, pH 7.9, 5% glycerol, 5 mM MgCl2, 50 mM KCl, 1 mM DTT, 0.05% NP-40, 5 µM ZnCl2), TFIIH at a final concentration of 100 nM was incubated with the beads in assembly buffer at 37°C for an additional 5 min. Following a single additional wash of the beads using washing buffer, the desired complex was eluted by incubating the beads at 28°C for 1 h with digestion buffer containing 10 mM HEPES, pH 7.9, 5% glycerol, 10 mM MgCl2, 50 mM KCl, 1 mM DTT, 0.05% NP-40, 1 unit/µl BSA-free SalI-HF (New England Biolabs). The various PIC intermediates were generated by including just the factors of interest during the assembly process described above. To prepare the TBP-TFIIA-TFIIB-DNA-PolII-TFIIF-TFIIE complex, extra TFIIE was added afterwards to the purified PIC at a final concentration of 100 nM.
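The final concentrations listed above imply that every protein factor is in molar excess over the promoter DNA. A minimal sketch of that arithmetic follows; the concentrations are taken from the text, while the dictionary and function names are illustrative only.

```python
# Final concentrations (nM) in the closed-PIC assembly reaction, from the text.
FINAL_CONCENTRATION_NM = {
    "Pol II": 185.0,
    "TFIIB": 3600.0,      # 3.6 uM
    "TBP/TFIIA": 370.0,
    "DNA": 50.0,
    "TFIIF": 289.0,
    "TFIIE": 370.0,
}

def molar_excess_over_dna(concentrations_nm: dict) -> dict:
    """Return each factor's concentration as a multiple of the DNA concentration."""
    dna = concentrations_nm["DNA"]
    return {
        name: conc / dna
        for name, conc in concentrations_nm.items()
        if name != "DNA"
    }

excess = molar_excess_over_dna(FINAL_CONCENTRATION_NM)
for name, ratio in excess.items():
    print(f"{name}: {ratio:.1f}-fold over DNA")
```

With DNA as the limiting component, unbound protein is removed in the bead-washing steps, so the excess mainly serves to drive occupancy of the immobilized template.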
We were unable to reconstitute an open complex using either a mismatch DNA bubble, probably owing to a failure to position Pol II efficiently and specifically on the bubble, or a nucleic acid scaffold containing an RNA primer, probably because an RNA-DNA duplex of over 7 bp in length has been proposed to be the trigger for TFIIB release and promoter escape12,52. The PIC in the open conformation was assembled similarly to the closed PIC, except for the following changes. An arrested Pol II on the open promoter nucleic acid scaffold was first prepared by incubation at 28°C for 1 h of Pol II and DNA (template2-nontemplate3) at final concentrations of 300 nM and 80 nM, respectively, in the arresting buffer containing 12 mM HEPES, pH 7.9, 0.12 mM EDTA, 12% glycerol, 8.25 mM MgCl2, 60 mM KCl, 1 mM DTT, 0.05% NP-40, 2.5 ng/µl dI-dC, 1:100 dilution of RNasin Ribonuclease inhibitor (Promega), and 2 mM CTP. The following purified proteins and nucleic acid were sequentially added into the arrested Pol II reaction above: nontemplate2, TBP/TFIIA, TFIIB, TFIIF, and TFIIE at final concentrations of 200 nM, 370 nM, 3.6 µM, 289 nM, and 370 nM, respectively. The desired open promoter complex was then purified in the same manner as the closed complexes above.
Purified PIC complexes were crosslinked after elution by incubation with glutaraldehyde at a final concentration of 0.05%, on ice and under very low illumination conditions, for 5 min, then immediately used for EM sample preparation (either negative stain or cryo-plunging).
Negative stain samples of PIC complexes and of free TFIIH were prepared using 400 mesh copper grids containing a continuous carbon supporting layer. The grid was plasma cleaned for 10 s immediately prior to sample deposition using a Solarus plasma cleaner (Gatan) equipped with 75% argon/25% oxygen. An aliquot (3 µl) of the purified sample (~50 nM) was placed onto the grid and allowed to adsorb for 5 min at 100% humidity in a homemade humidity chamber kept under very low illumination conditions. It was subsequently stained by five successive 75 µl drops of 2% (w/v) uranyl formate solution, rocking 10 s on each drop, followed by blotting to dryness. Data collection was performed using a Tecnai F20 Twin transmission electron microscope operating at 120 keV at a nominal magnification of ×80,000 (1.37 Å per pixel). The data were collected using the Leginon data collection software44 on a Gatan 4k×4k camera using low-dose procedures (20 e− Å−2 exposures) and a range of defocus values (from −0.5 to −1.2 µm). Between 300 and 600 images were acquired for each of the negative stain data sets.
Preparation of PIC samples for cryo-EM observation was carried out using 400-mesh C-flat grids containing 4 µm holes with 4 µm spacing (Protochips). A thin carbon film was floated onto the grid before it was plasma cleaned for 5 s, immediately prior to sample deposition, using a Solarus plasma cleaner (Gatan) equipped with 75% argon/25% oxygen gas. An aliquot (3 µl) of the purified sample (~100 nM) was placed onto the grid and loaded into a Vitrobot (FEI) at 100% humidity and 4 °C. The sample was allowed to adsorb for 5 min (under low illumination conditions), then was blotted for 4 s and immediately plunged into liquid ethane. The frozen grids were stored in liquid nitrogen until loaded into a Tecnai F20 Twin transmission electron microscope operating at 120 keV using a 626 single-tilt cryotransfer system (Gatan). Data were acquired at a nominal magnification of ×100,000 (1.05 Å per pixel) using low-dose procedures (20 e− Å−2 exposures) and a range of defocus values (from −1.2 to −2.4 µm). Between 1500 and 3200 images for each of the cryo data sets were collected using the MSI-T application of the Leginon data collection software44.
Negative stain data pre-processing was performed using the Appion processing environment45. Particles were automatically selected from the micrographs using a difference of Gaussians (DoG) particle picker53. The contrast transfer function (CTF) of each micrograph was estimated using both the ACE2 and CTFFind programs during data collection54,55, and the phases were flipped using CTFFind. Particle stacks were extracted using a box size of 256×256 pixels (except for the TFIIH-containing PIC complex and free TFIIH samples, which used 320×320 pixel boxes) from images whose ACE2 confidence value was greater than 0.8, followed by normalization using the XMIPP program to remove pixels that were above or below 4.5 σ of the mean value56. The particle stack was binned by a factor of two, and two-dimensional classification was conducted using iterative multivariate statistical analysis and multireference alignment analysis (MSA-MRA) within the IMAGIC software57. Class averages containing properly assembled complexes were manually selected, and their particles were re-extracted to create a new particle stack for reconstruction.
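The outlier-removal and normalization step can be illustrated with a minimal sketch. It is not the XMIPP implementation: the function name `normalize_particle` is hypothetical, the particle is treated as a flat list of pixel values, and replacing outliers by the mean is an assumption made only for illustration. The 4.5 σ cutoff is the value given in the text.

```python
from statistics import mean, pstdev


def normalize_particle(pixels, sigma_cutoff=4.5):
    """Suppress outlier pixels, then scale to zero mean and unit standard deviation.

    `pixels` is a flat list of floats. Replacing outliers by the mean is an
    assumption for illustration, not the documented behaviour of XMIPP.
    """
    mu = mean(pixels)
    sd = pstdev(pixels)
    if sd == 0:
        return [0.0] * len(pixels)

    # Pixels further than sigma_cutoff standard deviations from the mean
    # are treated as outliers (e.g. hot pixels or dust) and replaced.
    low, high = mu - sigma_cutoff * sd, mu + sigma_cutoff * sd
    cleaned = [p if low <= p <= high else mu for p in pixels]

    # Re-estimate the statistics without the outliers and normalize.
    mu_clean = mean(cleaned)
    sd_clean = pstdev(cleaned)
    if sd_clean == 0:
        return [0.0] * len(cleaned)
    return [(p - mu_clean) / sd_clean for p in cleaned]
```

A single extreme pixel in an otherwise flat image would be replaced before scaling, so it no longer dominates the contrast of the normalized particle.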
Cryo data processing was performed in a manner similar to that used for the negative stain data. Particle stacks were extracted from phase-flipped images using a box size of 384×384 pixels and binned by a factor of two.
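As a worked check of the binning step, the sketch below computes the box size, pixel size, and Nyquist limit after binning. The box sizes and pixel sizes are those stated in the text; the function name `binned_geometry` is hypothetical, and the Nyquist limit is taken as twice the binned pixel size.

```python
def binned_geometry(box_px, pixel_size_A, bin_factor):
    """Return (binned box size in pixels, binned pixel size in Å, Nyquist limit in Å)."""
    if box_px % bin_factor != 0:
        raise ValueError("box size must be divisible by the binning factor")
    binned_box = box_px // bin_factor
    binned_pixel = pixel_size_A * bin_factor
    nyquist = 2.0 * binned_pixel  # finest resolvable spacing at this sampling
    return binned_box, binned_pixel, nyquist


# Values from the text: cryo data (384-pixel box, 1.05 Å per pixel) and
# negative stain data (256-pixel box, 1.37 Å per pixel), both binned by two.
cryo = binned_geometry(384, 1.05, 2)
negative_stain = binned_geometry(256, 1.37, 2)
```

For the cryo data this gives a 192-pixel box at 2.10 Å per pixel and a Nyquist limit of 4.2 Å, well beyond the 11–13 Å resolution reported for the cryo-EM reconstructions, so binning by two should not be resolution limiting.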
The cryo-negative staining structure of free Pol II58, after low-pass filtering to 60 Å, was used as the initial model for reconstruction of all the negatively stained PIC samples. For reconstruction of the TFIIH-containing PIC, which has a substantial extra mass with respect to Pol II, the negative stain reconstruction of the TBP-TFIIA-TFIIB-DNA-Pol II-TFIIF-TFIIE sample, after low-pass filtering to 60 Å, was instead used as the initial reference (Supplementary Fig. S1). For reconstruction of free TFIIH, the core TFIIH density segmented from the refined negative stain TBP-TFIIA-TFIIB-DNA-Pol II-TFIIF-TFIIE-TFIIH model was used as the initial reference, after low-pass filtering to 60 Å resolution. Three-dimensional reconstruction was conducted using an iterative multi-reference projection-matching approach that uses libraries from the EMAN2 and SPARX software packages46,47, with two identical copies of the initial model as references. This step allowed us to further eliminate contaminating, aggregated, or damaged complexes, which became enriched in one of the reconstructions. Refinement began at an angular step of 25° and progressed down to 4° angular increments. Refinement proceeded to the next angular step only once >95% of the particles had a pixel error of <1 pixel. The particle numbers contributing to the final negative stain reconstructions were 11,880 for TBP-TFIIA-TFIIB-DNA-Pol II, 13,770 for the same complex plus TFIIF, 15,656 for the complex with TFIIF and TFIIE, 64,712 for the complex with TFIIF, TFIIE, and TFIIH, and 13,023 for free TFIIH. The resolution of the reconstructions was estimated using the 0.143 Fourier shell correlation (FSC) criterion to be about 15 Å for PIC complexes and 20 Å for free TFIIH.
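The rule for advancing the angular step can be sketched as follows. This is an illustration only, not the EMAN2/SPARX code used here: the function names are hypothetical, and the intermediate values in `SCHEDULE` are placeholders, since the text gives only the 25° starting step and the 4° final step.

```python
# Hypothetical angular schedule in degrees; only 25 and 4 come from the text.
SCHEDULE = [25, 20, 15, 10, 8, 6, 4]


def converged(pixel_errors, max_error=1.0, required_fraction=0.95):
    """True once more than `required_fraction` of particles have error < `max_error`."""
    if not pixel_errors:
        return False
    n_ok = sum(1 for e in pixel_errors if e < max_error)
    return n_ok / len(pixel_errors) > required_fraction


def next_angular_step(current_step, pixel_errors, schedule=SCHEDULE):
    """Move to the next finer step only if the current step has converged."""
    i = schedule.index(current_step)
    if converged(pixel_errors) and i + 1 < len(schedule):
        return schedule[i + 1]
    return current_step
```

With 100 particles, 96 below one pixel of error would allow the step to advance, whereas exactly 95 would not, because the criterion is a strict ">95%".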
Cryo-EM reconstructions were performed in a similar manner. The negative stain reconstruction of TBP-TFIIA-TFIIB-DNA-Pol II-TFIIF was used as the initial reference for all the cryo reconstructions, except for that of TBP-TFIIA-TFIIB-DNA-Pol II, which used its corresponding negative stain model as the initial reference (Supplementary Figs 2–5). All initial models were low-pass filtered to 60 Å resolution. The particle numbers contributing to the final reconstructions were 122,480 for TBP-TFIIA-TFIIB-DNA-Pol II, 43,785 for the same complex plus TFIIF, 51,043 for the complex with TFIIF and TFIIE, and 53,505 for the complex with TFIIF and TFIIE in the open conformation. To dampen low-resolution amplitudes of the final maps, the Fourier amplitudes were adjusted to match an experimental GroEL SAXS curve using the SPIDER software59. The estimated resolution, using the 0.143 FSC criterion, was between 11 and 13 Å for the cryo-EM reconstructions of PIC assembly intermediates (Supplementary Figs 2–5).
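To illustrate how a resolution estimate follows from the 0.143 FSC criterion, the sketch below finds the first shell at which an FSC curve drops below the threshold and converts the interpolated spatial frequency to a resolution in Å. The function name `fsc_resolution` is hypothetical, linear interpolation between shells is an assumption, and any curve passed to it in an example would be synthetic rather than data from this study.

```python
def fsc_resolution(spatial_freqs, fsc_values, threshold=0.143):
    """Resolution (Å) at which the FSC curve first drops below `threshold`.

    `spatial_freqs` are shell frequencies in 1/Å, in ascending order, and
    `fsc_values` are the matching correlation values. The crossing point is
    found by linear interpolation between the two bracketing shells.
    Returns None if the curve never crosses the threshold from above.
    """
    for i in range(1, len(fsc_values)):
        prev, cur = fsc_values[i - 1], fsc_values[i]
        if prev >= threshold > cur:
            f0, f1 = spatial_freqs[i - 1], spatial_freqs[i]
            frac = (prev - threshold) / (prev - cur)
            crossing_freq = f0 + frac * (f1 - f0)
            return 1.0 / crossing_freq
    return None
```

A curve that crosses 0.143 at a spatial frequency of about 0.09 Å⁻¹ would correspond to roughly 11 Å, the upper end of the range reported for the cryo-EM reconstructions.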
Local resolution calculation was performed for all reconstructions using the “blocres” function in the Bsoft package60,61 (Supplementary Figs 2–6). Volume segmentation, automatic rigid-body docking, and figure and movie generation were performed using UCSF Chimera48. The globally bent DNA model was generated using the 3D-DART online server62 and the 3DNA software package63.
We thank Carla Inouye for providing us with recombinant TFIIF and TFIIE, Patricia Grob and Tom Houweling for electron microscopy and computer support, respectively, Tom Goddard for help with Chimera, and members of the Nogales lab for technical advice on image processing. We are thankful to James Kadonaga, James Goodrich, and Michael Cianfrocco for their comments on the manuscript. We thank Priscilla Cooper both for biochemical advice and for her comments on the manuscript. This work was funded by NIGMS (GM63072 to E.N.) and by NCI (CA127364 to D.T.). E.N. is a Howard Hughes Medical Institute Investigator.
Author Contributions Y.H. designed and carried out the experiments; J.F. and D.J.T. provided essential reagents; and Y.H. and E.N. analyzed the data and wrote the paper.
Author Information Cryo-EM density maps have been deposited in the Electron Microscopy Data Bank (EMDB) under accession numbers EMD-2304 (TBP-TFIIA-TFIIB-DNA-Pol II), EMD-2305 (TBP-TFIIA-TFIIB-DNA-Pol II-TFIIF), EMD-2306 (TBP-TFIIA-TFIIB-DNA-Pol II-TFIIF-TFIIE), and EMD-2307 (TBP-TFIIA-TFIIB-DNA-Pol II-TFIIF-TFIIE in the OC mimic state). Negative stain EM density maps have been assigned accession numbers EMD-2308 (TBP-TFIIA-TFIIB-DNA-Pol II-TFIIF-TFIIE-TFIIH), and EMD-2309 (apo TFIIH).