In vertebrates, the 7SK RNA forms the scaffold of a complex, which regulates transcription pausing of RNA-polymerase II. By binding to the HEXIM protein, the complex comprising proteins LARP7 and MePCE captures the positive transcription elongation factor P-TEFb and prevents phosphorylation of pausing factors. The HEXIM-binding site embedded in the 5΄-hairpin of 7SK (HP1) encompasses a short signature sequence, a GAUC repeat framed by single-stranded uridines. The present crystal structure of HP1 shows a remarkably straight helical stack involving several unexpected triples formed at a central region. Surprisingly, two uridines of the signature sequence make triple interactions in the major groove of the (GAUC)2. The third uridine is turned outwards or inward, wedging between the other uridines, thus filling the major groove. A molecular dynamics simulation indicates that these two conformations of the signature sequence represent stable alternatives. Analyses of the interaction with the HEXIM protein confirm the importance of the triple interactions at the signature sequence. Altogether, the present structural analysis of 7SK HP1 highlights an original mechanism of swapping bases, which could represent a possible ‘7SK signature’ and provides new insight into the functional importance of the plasticity of RNA.
In the nucleus of higher eukaryotes, the non-coding RNA 7SK participates in the regulation of transcription by RNA-polymerase II (1–3). It assists in capturing the positive transcription elongation factor P-TEFb, inhibits its kinase activity and thereby prevents P-TEFb's function, which is to alleviate transcription pauses (4,5). This requires binding to a HEXIM protein (HEXIM1 or the minor HEXIM2 in human, which seem to behave similarly with respect to RNA binding), which in turn binds the cyclin moiety of P-TEFb (6,7). Since P-TEFb is the cellular factor hijacked by human immunodeficiency virus (HIV) to enhance its transcription, 7SK is indirectly linked to the transcription of HIV RNA by monitoring P-TEFb availability (8).
The 7SK RNA belongs to the 7SKsnRNP core particle, together with proteins LARP7 and MePCE, which contribute to its protection against nucleases and its packing into a functional molecule (9–12). Alternative models of the secondary structure of 7SK have been described in the literature. A fold into four domains was proposed after probing the accessibility of the nucleotides of 7SK extracted from human cells (13). Another 2D structure (Figure 1B) was based on the analysis of about a hundred 7SK sequences spanning vertebrates, fly and mollusk species (14). It similarly contains a limited number of domains and linkers, but is closed through base-pairing of conserved sequences. The closure was recently proposed to be facilitated by LARP7 and thus impact 7SKsnRNP assembly (15). The most conserved domains of the 7SK RNA are the 5΄- and the 3΄-hairpins, present in the two 2D structures (16). Both were shown to participate in the function of 7SK (17). The 3΄-hairpin (HP4; 300–331 in human 7SK) was identified as involved in P-TEFb inhibition (17,18) and recently shown to contribute to LARP7 binding (15,18). The 5΄-hairpin (HP1; residues 24–87 in human) was shown to be essential for HEXIM recognition (17,19). It is also able to bind the HIV trans-activator protein Tat (20), which, in HIV-infected cells, recruits P-TEFb after anchoring to the TAR domain of the early transcript (21). Both Tat and HEXIM thus bind the cyclin T1 moiety of P-TEFb following interaction with RNA. A major difference is that the HEXIM proteins are dimeric. The dimer interface encompasses a long helix at the C-terminal third of the sequence, which forms an interstrand coiled-coil (22,23). Apart from this domain, HEXIM is an intrinsically disordered protein (24).
The HP1 5΄-hairpin comprises a strictly conserved sequence signature, a GAUC repeat framed by single-stranded nucleotides, uridines in most sequences (Figure 1B). Several studies by the group of P. Stadler showed that, in a gene with a Polymerase III promoter and terminated by poly-thymidine, the presence of two GAUC sequences separated by a linker of sufficient length to make a hairpin is a key feature that characterizes 7SK (14,16,25). It is thus named the 7SK-motif in the following text. In a previous study, we showed that the (GAUC)2 forms a short helix of four base-pairs (26). It was identified as belonging to the HEXIM binding site by monitoring, in an NMR experiment, the spectral perturbations of the imino-proton signals upon titration with a peptide comprising the RNA-binding sequence of HEXIM proteins. This sequence, an Arginine-Rich Motif (ARM), is the same for HEXIM1 and the minor HEXIM2, and similar to that of Tat (27). We further identified the bulged uridines, together with the (GAUC)2, as specifically required for HEXIM-binding by a mutational analysis monitored by binding assays with the full-length protein (26).
The 7SK-motif is intriguing since it is very small yet nonetheless characterizes 7SK. Moreover, although it is a symmetrical sequence, it binds only one ARM-motif of a HEXIM dimer (26,28). What the second ARM does is still an open issue, but it probably has consequences on the functional mechanism converting HEXIM into a P-TEFb inhibitor (28). It is expected that this mechanism relies on conformational changes of 7SK upon HEXIM binding (29), thus making the highly flexible and modular nature of the 7SK RNA a functional requirement. Interestingly, some of the molecular flexibility may also be expected in the 5΄-domain, which encompasses bulges and internal loops in all 7SK sequences (16,25). Hairpin flexibility was shown to be involved in Tat binding, although another RNA hairpin was considered in that study (30).
In order to uncover the structural characteristics of the 7SK-motif, and with the hope of better understanding its recognition by HEXIM, we launched a structural investigation of the 5΄-domain of 7SK. We present here a crystallographic investigation of the structure of the 5΄-hairpin of 7SK, HP1, modified at the apical loop to facilitate crystal formation. This 57-mer RNA forms a surprisingly straight and compact helical stack with most of the un-paired bases interacting inside the helix. In particular, two un-paired uridines of the 7SK signature fill the groove formed by the (GAUC)2 short helix. However, the structure is flexible, as indicated by the observation of two conformations at the 7SK-motif, with a third uridine either flipped in or out of the molecule. This conformational variability was further investigated by molecular dynamics (MD) simulation. It indicated that the two conformations represent stable alternatives. To further analyze how the structural elements highlighted in the crystal structure impact HEXIM-binding, we performed interaction assays with a series of mutants. These highlighted the importance of the nucleotides framing the (GAUC)2 motif, with a major contribution of the uridine on the 5΄-side of the strand (U40 in human) to recognition by HEXIM. Altogether, the structure of 7SK HP1 provides new insight into the functional importance of the plasticity of RNA molecules.
The RNA was produced by in vitro transcription with T7 polymerase from a PCR-generated template, purified by denaturing gel-electrophoresis followed by ion exchange chromatography and stored in a buffer containing 10 mM Na cacodylate, 2 mM MgCl2 and 0.25 mM EDTA. Monoclinic crystals grew at 4°C in 30% polyethylene glycol PEG 1000, 50 mM Tris pH 7.5, 50 to 75 mM NaCl, 50 mM MgCl2. The mutational analysis was performed as described previously (26). Data sets were collected from the native crystals and crystals soaked for several hours either with osmium hexamine (2 mM) or with gold (KAu[III]Cl4, 10 mM) in a solution containing 40% PEG 1000 prior to flash-freezing. The structure was solved by molecular replacement using PHASER with a model built using RNAcomposer (31) and the Au-derived crystal data set, exploiting the approach described by Robertson et al. (2010) (32). After refinement with BUSTER (33), the model at 2.3 Å resolution was used to phase the native crystals and osmium-derived crystals, to give three structures, as summarized in Table 1. The molecular dynamics simulation was performed with the Amber force field (34) in a neutral system comprising potassium and sodium ions. More experimental details are provided as Supplementary Text.
The HP1 domain (boundaries 24–87) used in the structural work (Figure 1A and B) corresponds to the largest domain common to the two published secondary structures of 7SK (13,14). It was previously shown to fold as a hairpin in the same way as in the full-length 7SK context, and to be sufficient to bind one HEXIM dimer (19,26,28,35).
No crystal was ever obtained with the wild-type hairpin comprising an apical loop of 11 nucleotides. Since this loop is not conserved across species, in either length or sequence, we exchanged it with a small tetraloop of sequence UUCG, known to fold as a tight stable structure (36–38). The mutant hairpin binds HEXIM with an affinity similar to that of the wild-type, as shown by electrophoretic mobility shift assays (Figure 1C and D). The shortening of the loop favored formation of monoclinic crystals, which appeared weeks after the crystallization tray was set up. The crystals used for the structure determination were soaked with osmium or gold salts, but the structure was finally solved by the molecular replacement method since the very low derivation with these salts did not allow exploiting these data sets by the SAD or SIRAS methods. The molecular replacement was achieved using a model constructed with RNAcomposer (31) and the data from the crystal soaked with the osmium salt. The native structure was obtained by phasing the non-isomorphous native data (from crystals obtained in an earlier stage of the project) with the Os model. Altogether, we obtained three independent models at 2.2–2.3 Å resolution (Table 1).
The RNA molecule is composed of four regions (Figure 2A): the stem (24–33 and 78–87) comprises 10 base-pairs in a stack including three U•G pairs; the central region (34–39 and 68–77) contains one internal loop (34, 75–77) and a bulge (71–72) framed by two stacks of two C-G base-pairs; the 7SK-motif (40–45 and 63–67) comprises four base-pairs and the bulged uridines; the apical region comprises a stem of three C-G base-pairs and the tetraloop UUCG.
All crystals contain two molecules (A and B) in the asymmetric unit, leading to a redundant determination of six RNA structures, which are globally similar (average rmsd = 0.89 Å). However, the structures from the Au and Os data sets differ slightly from that obtained from the native crystals (rmsd 1.35 and 1.23 Å, respectively, for the A molecules, rmsd = 0.38 Å between molecules B of the Au and Os structures).
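The rmsd values quoted above measure the average displacement between equivalent atoms of two models. As a minimal sketch of that quantity, assuming the two coordinate sets have already been superimposed and list the same atoms in the same order, it could be computed as follows. The function name `rmsd` and the toy coordinates are illustrative assumptions, not values from the deposited structures or the software used in this work.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points.

    Assumes the two sets are already superimposed and atom-matched.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical toy coordinates (in Å): every atom is shifted by 1 Å along z.
mol_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
mol_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(round(rmsd(mol_a, mol_b), 2))  # 1.0
```

In practice an optimal superposition step precedes this calculation; it is omitted here to keep the sketch short.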
The global structure (Figure 2B) confirms the 2D base-pairing scheme shown in Figure 1A. It is surprisingly straight, with the four domains piled up in a stack. Of the nine nucleotides represented as bulges in the topology diagram, six are turned inside the helical stack in the crystal structure and form ‘triple’ interactions with base-pairs. Interestingly, these interactions often involve an unexpectedly distant base-pair. This is summarized by green dotted lines in Figure 2A and will be detailed below. As a consequence, the structure has a length similar to a model molecule without bulges or internal loops (Supplementary Figure S1).
The structure is however quite different from a standard helix. The un-paired nucleotides induce a twist, which strongly impacts the orientation of the 7SK-motif. When the basal stem regions of the HP1-RNA and a standard helix are superimposed, the (GAUC)2 are found at a similar distance from the end, but are located on opposite sides, as shown in Supplementary Figure S1A.
All six structures show extruded bases. In the central region, U72 and U76 face the solvent, where they occupy various positions (Figure 2C). Base U41 is also extruded in most structures but can be found turned into the helical stack (Figure 2C, Supplementary Figure S2A). This variation is drastically different from the conformational flexibility observed in bases U72 and U76, which occupy a more continuous range of positions (Figure 2C, Supplementary Figure S2B). In the following text, the conformation with U41 flipped out, shown in Figure 2D, is named OUT, and the other IN, with the base stacked in the helix (Figure 2E). The IN conformation is observed as an alternate conformation with an estimated occupancy of 50% in molecules A in both Os- and Au-derived crystals. The electron density clearly shows two alternate positions for the phosphate group (Supplementary Figure S3A). The situation is exactly the same in the two maps obtained from the Au and Os data sets. Observation of two alternate conformations rules out the hypothesis that the IN conformation could be constrained by the crystal environment. For the extruded conformation, base U41 can be found either wedged into the minor groove of a symmetry-related molecule (Supplementary Figure S1B), or free in the solvent, due to the different environment of the native crystal. The two conformations thus seem to reflect a flexibility of this part of the molecule, which interestingly is located at the 7SK-motif.
The 7SK-motif comprises four base-pairs (G42-A43-U44-C45 paired with G64-A65-U66-C67) and three uridines (U40-U41 on one strand and U63 on the other strand). Uridines U40 and U63 are nested into the major groove of the regular helix formed by the four base-pairs (GAUC)2. The uridine on the apical side of the 7SK-motif, U63 (in orange in Figure 3), binds at the level of base-pair G42-C67, and forms a trans Watson–Crick/Hoogsteen pair with G42. The imino proton of U63 contributes a H-bond (2.9 Å) with the carbonyl group of G42. Another, more distant, H-bond is possibly made between the O2 oxygen of U63 and the amino group of the base C67 (3.4 Å). In the OUT conformation, U40 lies at the level of base-pair A43-U66 of the (GAUC)2 motif, and interacts with the Hoogsteen face of the adenine (Figure 3A and C). In the IN conformation, U41 occupies this position and makes the same cis Watson–Crick/Hoogsteen interaction (Figure 3D). This induces a displacement of U40 toward the apex of the molecule, at the level of the C45-G64 base-pair, where it is then involved in a trans Watson–Crick/Hoogsteen pair with G64 (Figure 3B). This corresponds to a change of register of two steps in the helical stack. Apart from the exchange of position of U40 and U41, only a few changes are observed for the (GAUC)2 helix and uridine 63 (Supplementary Figure S2). The rmsd between the two conformations is 0.68 Å for U63 and less than 0.3 Å for the AUC sequence. It is slightly higher for guanines G42 and G64 with rmsd values of 1.02 and 0.62 Å, respectively. Interestingly, in the Au-soaked crystal, a low occupancy gold ion is observed at the level of C45-G64, at the position occupied by base U40 in the IN alternate conformation.
The nucleotide A34 of the 5΄ strand of the internal loop lies between the 7SK-motif and the central region (Figure 4A and C). It inserts into the major groove, in the plane of base-pair G69-C38, resulting in an A34/G69 trans Watson–Crick/Hoogsteen interaction. The amino group of A34 establishes a H-bond with N7 of G69. This special position suggests that A34 could monitor the relative orientation of the blocks formed by the 7SK-motif on one side and the central region on the other. The major groove width is quite small at this level, with a distance of 3.1 Å across the groove, as compared with the typical 4.5 Å. The groove width was obtained with Curves+ (39) and represents the minimal distance between the backbone spline curves passing through the phosphorus atoms.
The apical part of the central region comprises a bulge of two unpaired nucleotides (C71-U72), among which only U72 is extruded (Figure 4A). C71 is packed in the major groove of the helix, where it makes a cis Watson–Crick/Hoogsteen interaction with the G74 of the C35-G74 base-pair (Figure 4A and CB). One H-bond is formed with a distance of 2.9 Å, between the amino group of C71 and the carbonyl of G74. Another short distance (2.7 Å) is observed between C71(N3) and G74(N7) atoms, but formation of an H-bond requires one nitrogen to be protonated. The insertion of C71 coincides with a widening of the major groove and enlargement of the helix diameter. Interestingly, a magnesium ion was found in the vicinity. It is located in the groove, between bases A34 and C71 (Figure 4A and B). This was observed in molecules B of all three crystals. In the native A molecule, density was also present, but more distorted. It was attributed to a sodium ion, on the basis of larger distances with the coordinated water molecules (around 2.6 Å compared with 2.0 Å for magnesium).
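The distance argument used above to distinguish sodium from magnesium can be illustrated with a small sketch: the mean ion-water distance is compared with the two reference values quoted in the text (about 2.0 Å for magnesium, about 2.6 Å for sodium) and the closer one is chosen. The function name `closest_ion` and the example distances are hypothetical; the actual assignment also relied on inspection of the density, which this sketch does not model.

```python
# Reference ion-water coordination distances (Å) as quoted in the text.
REFERENCE_DISTANCES = {"Mg": 2.0, "Na": 2.6}

def closest_ion(water_distances):
    """Return the ion whose reference distance is closest to the mean observed distance."""
    if not water_distances:
        raise ValueError("at least one ion-water distance is required")
    mean_distance = sum(water_distances) / len(water_distances)
    return min(
        REFERENCE_DISTANCES,
        key=lambda ion: abs(REFERENCE_DISTANCES[ion] - mean_distance),
    )

# Hypothetical coordination distances (Å), for illustration only.
site_b = [2.0, 2.1, 1.9, 2.0, 2.1, 2.0]
site_a = [2.5, 2.6, 2.7, 2.6]
print(closest_ion(site_b), closest_ion(site_a))
```

A nearest-reference rule is the simplest possible choice; a site with a mean distance midway between the two references would need additional evidence to be assigned.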
The lower part of the central region comprises nucleotides C75, U76 and A77. While in the sequence, these nucleotides form an internal loop with A34, they behave as a bulge of three in the structure (Figure 4B). Only U76 is extruded. The two other nucleotides pack inside the helix and interact with base-pairs in the stack (Figure 4E). Nucleotide C75 interacts with base-pair C33-G78. The distance (2.8 Å) and angle between the cytidine C75 amino group and the guanine carbonyl group are compatible with H-bond formation. As in the case of C71, a short distance (2.7 Å) is observed with N7 of the guanine, suggesting formation of a C+G Watson–Crick/Hoogsteen pair (40). This requires the cytidine to be protonated at the N3 position. Similar cases have been observed in RNA structures (41), and their importance in RNA folding was stressed (42). The single-stranded adenine A77 lies between the planes of the G31-C80 and U32-G79 base-pairs. It is stabilized by an interaction of its Watson–Crick edge with the Hoogsteen side of G31, with one H-bond formed between its amino group and the N7 nitrogen of G31 (Figure 4D). A distance of 2.7 Å is also observed between the N1 nitrogen atom of A77 and the O6 carbonyl of G31, suggesting the formation of a second H-bond. Such an interaction suggests protonation of the adenine (42). The same situation was observed in the core of yeast tRNAAsp (43). The observation of protonated bases stabilizing triple interactions was a surprise, since the crystals were grown in neutral conditions at pH 7.5. However, crystals grew after several weeks, and the polyethylene glycol used as precipitant is known to acidify upon ageing (44) (https://hamptonresearch.com/documents/growth_101/27.pdf). Together with the formation of two triples, the CUA bulge introduces a surprising twist in the hairpin. This is materialized in Figure 4D, showing that the base-pairs on either side of the bulge, G74-C35 and C33-G78, are almost perpendicular. This figure also highlights the symmetry of the central region, in particular the position of bases C71 and C75. On the whole, C71 from the bulge and C75 and A77 from the internal loop form a remarkable stack of triple interactions stabilized by protonation, as schematized in Figure 1A and Supplementary Figure S5. The three triples lock the central region into a tight structure comprising a magnesium ion and introduce a tight twist of the helix.
The observation of U41 alternating between OUT and IN positions in the helical stack, as well as the straightness of the helix, prompted us to launch an MD simulation to analyze whether this reflects the molecular plasticity of this RNA or was due to some influence of the crystal environment. The MD simulation was performed in a classical fashion using the latest AMBER force field, including all atoms and solvent with magnesium and sodium ions. Two MD simulations were performed in parallel, starting either from structures OUT or IN from the Os crystals. After a first minimization step, dynamics simulation was performed for 80 ns, a time-lapse that allows the evaluation of the stability of the experimental structures. The two structures appeared to be globally stable, as indicated by the variations of rmsd along the trajectory reported in Supplementary Figures S4A and S4B.
The simulation was coupled with an analysis of the helical parameters with the CURVES+ program, which allows the handling of all parameters for the full trajectory (39,47). Analysis of the bending of the helical axes along the trajectory showed the molecule to be most often more bent in the simulated models than in the crystal (Supplementary Figures S4C and S4D). The angle between the axis of the stem helix (base-pairs 1 to 10) and that of the upper stem (base-pairs 14 to the loop) observed in the crystal structures, 26° on average, does not correspond to the highest frequency, but is clearly observed in the distribution along the trajectory (Supplementary Figures S4C and S4D). Interestingly, the bending angle observed in the structure seems to be more frequently observed in the trajectory starting from the IN conformation.
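The bending angle discussed above is the angle between two helical axes. As a rough sketch, assuming each helical segment is approximated by a single straight direction vector (a simplification; this is not how CURVES+ derives its curvilinear axis), the angle follows from the dot product. The function name `axis_angle_deg` and the example vectors are illustrative assumptions.

```python
import math

def axis_angle_deg(u, v):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("axis vectors must be non-zero")
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))

# Hypothetical axes: the upper stem is tilted by 26 degrees from the basal stem.
stem_axis = (0.0, 0.0, 1.0)
upper_axis = (math.sin(math.radians(26.0)), 0.0, math.cos(math.radians(26.0)))
print(round(axis_angle_deg(stem_axis, upper_axis), 1))  # 26.0
```

Applied frame by frame to a trajectory, such a function would yield the kind of angle distribution against which the crystal value can be compared.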
Visual inspection of the trajectories did not reveal conformation exchange in the time lapse of the simulation. In particular, U41 did not move out of the groove in the simulation starting from the IN conformation. Visual inspection of the simulation suggests fluctuations of the U41 base orientation, which turns perpendicularly to the A43-U66 plane and loses the H-bonds. Nonetheless, U41 stays in the groove at the same level (Figure 5A and B). Starting from the OUT conformation, the MD simulation shows that the position of U40 is stable. U41 did not insert into the helical stack in the time lapse of our experiment, as shown in Figure 5C and D.
The position of U63 at the bottom of the uridines-stack in the major groove of the (GAUC)2 is also remarkably stable when starting from the OUT conformation. This is highlighted by the minimal variations of the distance between the imino proton of U63 and the carbonyl group of G42 along the trajectory (Figure 5E and F). When starting from the IN conformation, U63 moves concomitantly with U41. It adopts a tilted position, breaks its H-bonds, but stays in the major groove.
Thus, both MD simulations confirm that nesting two uridines (U40 and U63) in the major groove at the (GAUC)2 forms a stable structure, and that the stacking of the third uridine (U41) into the same groove is also a viable conformation. The MD experiment does not support the hypothesis that nucleotide U41 might have been trapped inwards or captured outward by contact with another molecule in the crystal. The two positions of U41 observed in the crystal seem thus to represent two alternative structures of the 7SK-motif.
Our MD simulations showed, in contrast, more variations of the structure in the central region below the 7SK-motif.
MD simulation indicates that the nucleotide A34 stays in the major groove, as seen in the crystal structures (Supplementary Figure S5C). It stays extended far in the apical direction although the interaction with the base G69 is lost in favor of an interaction with the backbone. Comparison of Supplementary Figures S5C with S4A suggests that the rotation of the base A34 could impact the major groove width and perhaps the bending of the molecule. This hypothesis was tested by performing a new MD simulation after changing the sequence at A34 for uridine (mutant A34U). However, even after 50 ns simulation, this change did not seem to impact the global bending of the hairpin. The U34 residue stayed in the groove, nevertheless.
The largest changes observed in the MD simulation were located in the central region. The triple interaction of C71 with the G74-C35 base-pair is lost after about 20 ns in favor of an interaction of C71 with the backbone at G70. C71 thus moves about one step in the apical direction to find a position which seems to relax the backbone, as schematized in Supplementary Figure S5B. The interaction of A77 with base-pair G31-C80 is already lost at the initial minimization step. A77 moves away from the plane of the G31-C80 base-pair, and re-localizes after 10 ns to the level of base-pair C35-G74, three steps away in the apical direction, a considerable change of register (Supplementary Figure S5D). It could be stabilized there by H-bonds with G74 or C35. Similarly, C75 quickly leaves the level of base-pair C33-G78 to reach the upper level of base-pair C36-G73. This corresponds to a +2 change of register. It seems to be stabilized by catching the ribose O2’ and the base O2 oxygens of C36 in a bifurcated H-bond.
These major changes suggested by the MD simulation involve nucleotides C71, C75 and A77, for which the location in the crystal structure depends on protonation. This is not surprising, since the MD was performed at neutral pH.
The 7SK-motif is embedded in a larger sequence, named the HEXIM-binding site with reference to our previous analysis by NMR-footprinting (26). Titration with the ARM peptide of the HEXIM RNA-binding sequence showed spectral environmental changes of the imino-protons of several residues at the 7SK-motif and the two stems of three base-pairs on either side. Figure 6A summarizes these previous findings. Besides specific binding, the analysis revealed an increase of solvent accessibility at U44 and G64 and stabilization of the structure at G69, G70 and at the base-pair A39-U68.
Binding of RNA variants with the purified recombinant human HEXIM1 protein was analyzed by electrophoretic mobility shift assays. Several positions were mutated (Figure 6). Some did not impact HEXIM binding, such as deletion of U72 or change of A34 to uridine (A34U, Figure 6D, dark blue curve). Larger impact was observed for the mutations at the 7SK signature. The mutation of both uridines U40 and U41 into cytidines results in a severe loss of capacity to bind HEXIM, as shown in Figure 6C (dark red curve). These two residues were also shown to be important in our previous study (26), where their simultaneous deletion had a deleterious effect on HEXIM binding. The present subtler change into cytidines indicates that the nature of the bases is important. Furthermore, the individual mutations into cytidine were analyzed. Mutation U41C impacts the binding efficiency (Figure 6C, red curve). The impact is in the range observed previously for the deletion of U63, which was measured again in the present analysis (Figure 6C, orange curve). Mutation U40C shows a much stronger effect, suggesting this residue to be of prominent importance (Figure 6C, light blue curve). On the whole, the contribution to HEXIM-binding of each uridine correlates well with its position in the stack shown in Figure 6B, with a major contribution for the central U40 (cyan in Figure 6B) facing the core base-pairs of the (GAUC)2. The two central base-pairs of the (GAUC)2 were shown previously to be essential for HEXIM recognition, since changing to (GGCC)2 abolished binding (26). Such a modification impacts both grooves of an RNA helix. We thus changed these two base-pairs again, but reversed them in (GUAC)2. The new mutation impacts HEXIM-binding as strongly as observed previously for (GGCC)2 (Figure 6C, dark blue and green curves).
Interestingly, mutant A39G shows a reduced capacity to bind HEXIM, while A39G-U68C restores a full capacity to bind (Figure 6D, red and orange curves). The same result was observed with A39U, which showed less binding, and the double mutant A39U-U68A, which restored binding (Figure 6D, blue and green curves). This suggests that a standard base-pair must be present.
The position of nucleotide A34 just below the 7SK-motif, on the minor groove face, suggested this nucleotide to belong structurally to the HEXIM-binding site. However, the mutation A34U does not impact HEXIM binding.
The straightness of the hairpin conformation rapidly appeared to be a singularity of the crystal structure. During the molecular replacement process, many models were built with available web facilities, such as Assemble (48), RNAcomposer (49), Vfold-3D (50) and MC-fold (51). All those models were more bent than the observed structure in the present crystals, some of them as much as an ‘L’. This is also the case of a solution structure obtained in parallel with NMR (52), which was started when we faced phasing difficulties, and which will be compared in detail in a future manuscript. The straightness observed seems to rely essentially on the interactions taking place at the central region. Among them, the insertion of A34 on top of the central region seems to add tension, like a brace restraining the helical flexibility. The extrusion of A34 favors the stacking of the short stem formed by the two base-pairs C35-G74 and C36-G73 on top of the basal stem, thus forming a continuous helix. On the other strand, the ribose–phosphate chain makes a sharp turn allowing C71 of the C71-U72 bulge to insert into the groove. This is stabilized by an interaction between the protonated cytidine C71 and G74, the nucleotide at N+3, reminiscent of a protonated U-turn (53), but with four nucleotides. The nucleotides of the ‘internal loop’, A77 to C75, form another tight loop pinned onto the helical stack. This again maximizes the stacking of bases and maintains the regular spacing of the base-pairs unaffected by the bulged residues. However, this introduces a remarkable twist. As a result, the base-pairs on either side, C33-G78 and C35-G74, are almost perpendicular (Figure 4D).
The observation of triple interactions formed with base-pairs as distant as three steps in the helical register came as a surprise. The triple interactions at the central region are all formed with G-C base pairs and depend on protonation. They do not hold in the MD simulation, which was performed at neutral pH, closer to physiological conditions. The relaxation observed upon MD simulation corresponds to a change of register of residues of the internal loop (C75, A77) and the bulge (C71) that could be sustained by the presence of the same (G-C)2 sequence on each side of the bulge (Supplementary Figure S5). Toggling between the two situations may be favored by such a symmetrical sequence with five stacked Cs on one strand and Gs on the other, which may sum up as an original source of flexibility.
On the whole, the observed straight hairpin corresponds to a particular conformation favoring stacking of several bases from the bulges into the helix, where they may be trapped upon protonation. Protonation of adenine and cytidines as a potential contribution for hydrogen bond formation and stacking has already been observed in the internal loop of a hairpin ribozyme and proposed to be of biological significance (54,55). The observed straight helix may represent a rare conformation trapped in the crystal, but seems however to depend on peculiar properties of the sequence at the central region.
Interestingly, a magnesium hexahydrate was found in the groove between the bases C71 and A34 (Figure 4A). This suggests that magnesium binding could participate in stabilization of the special conformation observed at the central region. The solution structure determined with the same RNA by NMR at low ionic strength in the absence of magnesium showed a different conformation of the RNA, in particular at the central region (52). Interestingly, the NMR study reported effects following magnesium addition, by monitoring the exchangeable protons, and revealed that the largest changes were obtained in the central region. Although tight, this remarkable packing of the middle bulges still leaves two uridines extruded toward the solvent. They might be involved in protein recognition, or RNA packing into a stable tertiary structure, as suggested by the interaction represented in Supplementary Figure S7. However, none of our experiments supported that idea.
The (GAUC)2 motif forms a standard helix of four base-pairs. The main surprise was to find the major groove remarkably narrow and filled with uridines. Uridine U63 lies at the level of the bottom base-pair G42-C67. The next base pair of the short helix, A43-U66, is the recipient of a Hoogsteen interaction with one uridine of the lower bulge. This uridine may be either U40 or U41, but when uridine U41 is inserted in the groove, it pushes the other (U40) in the apical direction, where it sits at the level of the top base-pair of the (GAUC)2. Thus, U40 seems always buried in the groove, which imparts to this base a prominent role in the structural motif. Besides the packing of the uridines into the major groove, this dynamic behavior provides another layer of originality. The two positions of U41 represent two snapshots of a peculiar swapping mechanism, which may correspond to a functional distribution. Indeed, over the time course of the MD simulation, both positions of U41, inside or outside of the groove, seem stable.
Uridines at positions equivalent to 40, 41 and 63 are conserved in 7SK sequences from vertebrates. Looking wider in evolution, including arthropods, the U40-U41 bulge may be extended to three residues by U, C or A, but one uridine is always present at the junction with the lower stem (25). This conserved uridine could insert in the groove to interact with the constant AU base-pair of the (GAUC)2 motif. The prominent contribution of one uridine of the lower bulge (here, U40), observed as a particular feature of the present structure, suggests expanding the sequence of the 7SK-motif to UuGAUC/uGAUC, where capital letters stand for sequences conserved throughout evolution (as far as 7SK has been identified to date) and lowercase letters stand for sequences conserved in vertebrates.
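As an illustration of how the expanded UuGAUC/uGAUC motif could be looked for in a sequence, the following is a minimal sketch in Python. It is not part of the analysis reported here. It searches only for the vertebrate form of the motif (UUGAUC on the 5΄ strand, followed downstream by UGAUC on the 3΄ strand), ignores the conservation distinction carried by upper and lower case, and does not check base-pairing. The function name, the toy sequence and the bounds on the number of intervening residues are all hypothetical choices made for the example.

```python
import re

def find_7sk_motif_candidates(rna, min_gap=3, max_gap=40):
    """Return (start5, start3) pairs of 0-based indices where UUGAUC is
    followed downstream by UGAUC, separated by min_gap..max_gap residues.

    min_gap and max_gap are illustrative assumptions, not values from the paper.
    """
    seq = rna.upper().replace("T", "U")  # accept DNA-style input as well
    five_prime = "UUGAUC"   # vertebrate 5' half of the motif (U40-C45 in HP1)
    three_prime = "UGAUC"   # vertebrate 3' half of the motif (U63-C67 in HP1)
    # Lookahead patterns so that overlapping occurrences are also reported.
    starts5 = [m.start() for m in re.finditer(f"(?={five_prime})", seq)]
    starts3 = [m.start() for m in re.finditer(f"(?={three_prime})", seq)]
    hits = []
    for s5 in starts5:
        end5 = s5 + len(five_prime)
        for s3 in starts3:
            gap = s3 - end5  # residues between the two halves
            if min_gap <= gap <= max_gap:
                hits.append((s5, s3))
    return hits

# Toy sequence (hypothetical, not a real 7SK sequence).
toy = "GGUUGAUCAAAAAAUGAUCCC"
print(find_7sk_motif_candidates(toy))
```

A sequence match of this kind only flags candidates; whether the uridines actually pack into the major groove depends on the structural context discussed in the text.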
The 7SK-motif always tops a central region with bulges and a short stem ending with a Watson-Crick A-U base-pair equivalent to A39-U68. This base-pair was suggested to be dynamic in a previous NMR investigation, and to be stabilized upon peptide binding (26). The present crystal structure shows it formed as a standard pair. However, it also reveals a large propeller twist: 19.3° on average, as compared with the standard value of 11.6° or the average value of 11.4° for all base-pairs in the six structures. The MD simulation indicates that the A39-U68 base-pair, while keeping the Watson-Crick H-bonds, is susceptible to distortion. Indeed, large variations of the helical parameters (opening, shear and stretch) were observed at the end of the simulation, after ~70 ns (Supplementary Figure S6). These variations were not observed at the central base-pairs of the (GAUC)2, for example the A43-U66 base-pair (Supplementary Figure S6B, middle panels). The only other variations were observed at the C45-G64 base pair, on the apical side of the (GAUC)2. The global picture suggested by these observations is that the rigid (GAUC)2 recipient for the uridines functions like a knuckle within a more dynamic environment. The internal loop would command the orientation and distance of this knuckle with respect to the core of 7SK, which is maintained by the associated proteins LARP7 and MePCE. The two positions of U41 could then represent a topping conformational signal to be exploited for 7SK functions. In our functional analysis we did not observe a direct effect of point mutations in the central region. However, suppressing both the bulge and the internal loop had an impact on HEXIM-binding, suggesting a need for dynamics at the central region (26).
What makes the small UuGAUC/uGAUC motif so special thus seems to be the peculiar packing of the single-stranded uridines in the groove, as well as the dynamic nature of the motif and its environment. The importance of these features was analyzed by studying the impact of local mutations on one function of this RNA, namely HEXIM binding. For the 7SK motif, a clear correlation was observed between the position of the mutation in the motif and the strength of the impact, with major effects observed for changes at the most internal part of the structural motif, i.e. at the two central base-pairs A43-U66 and U44-A65, as well as at U40. The formation of the U:AU triple fully explains this result, since a cytidine would not be stabilized on the Hoogsteen face of the A43-U66 base-pair. The impact observed with the U41C mutant is clearly smaller than for the U40C mutant. This difference suggests that the extruded position of U41 is favored. It is indeed observed more often in the different crystals.
Considering what is known about how ARM-peptides bind their RNA targets allows some speculation about HEXIM recognition. In most cases observed thus far, the ARM peptide penetrates into the major groove, which is consequently widened, thus allowing the basic residues of the peptide to bind the phosphates at the rim of the groove (56,57). A closely related example comes from the protein Tat binding to the TAR RNA from HIV (58,59), BIV (57,60) or EIAV (61). The TAR sequence in EIAV is different from the 7SK sequence bound by Tat. Nevertheless, the ARM peptide of EIAV Tat folds as a helix in the deep major groove of the RNA, where it interacts specifically with several bases and stabilizes a UoG enol conformation, in addition to binding phosphates at the rim of the groove (61). In HIV-TAR the sequence recognized by Tat is similar to that recognized in 7SK by HEXIM and Tat (20). A bulged U between short stems and a base pair equivalent to A43-U66 are essential. Structural studies of HIV-TAR recognition by argininamide or a peptide from Tat showed the bulged U to be pushed into the major groove of the apical stem, in the vicinity of the A-U base-pair. An early work predicted the A-U base pair to form a triple interaction with the U, stabilized by the interaction with argininamide (58,62). This was not observed in later structures of HIV complexes (63), but such a triple was formed in BIV-TAR. The Tat peptide stabilizes the TAR conformation around the triple by specific interactions with the bases and backbone of the RNA. In the present structure of the 7SK RNA, a striking difference is that the triples are observed in the free form of the RNA. Moreover, with the major groove almost filled with the uridine bases, the penetration of a peptide, folding in the groove as a β-hairpin in BIV (60) or as an α-helix in EIAV (61), seems prevented. This might suggest recognition on the minor groove, which is more accessible, although poorer in terms of H-bond donor/acceptor variety.
However, the experiment where the central AU/UA base-pairs were reversed strongly supports involvement of the major groove. Such a mutation does not much change the distribution of H-bond donors/acceptors in the minor groove, but it clearly does in the major groove (64,65). The strong effect of the mutation thus indicates that the major groove is important for HEXIM recognition. In fact, when U41 is extruded, the major groove shows some accessibility, as it opens like a mouth on the apex side (Figure 6D). There are several solvent molecules located in that area (conserved in all molecules) and this is the place where a gold ion was bound in the short soaking process. As often observed, these solvent molecules may indicate the protein binding site. Perhaps the ARM peptide of HEXIM does not fold deep in the groove, but wraps near the rim and around the structure defined by the uridine filling, a mode of binding that would be facilitated in the OUT conformation. It cannot be excluded that the external, more hydrophobic, sides of the uridines (C5-C6 atoms) participate in the recognition, as was observed for BIV-TAR (1,57,60). On the other hand, the present structure, and more particularly the IN conformation, could represent a ‘switched-off’ step of the HEXIM-induced regulation, or another functional state of 7SK, independent of HEXIM-binding and P-TEFb regulation. Indeed, possible involvement of 7SK in other regulation processes, such as translation regulation, was raised (66). Altogether, this merits further investigation, as it provides an interesting case to further explore the slightly different concepts of recognition and binding.
Atomic coordinates have been deposited in the Protein Data Bank under accession codes 5LYS (Au), 5LYU (native), 5LYV (Os).
Diffraction data were collected at the following synchrotron beam lines: BM30A at the European Synchrotron Radiation Facility (ESRF) (Grenoble, France), and Proxima 1 (proposal ID 20090535) at SOLEIL synchrotron (Saint-Aubin, France). The authors are most grateful to the machine and beam line groups for making these experiments possible.
The authors thank Adam Ben Shem, Pierre Poussin-Courmontagne and Valérie Lamour for their helpful suggestions, Fabrice Jossinet for help with modelling, Isabelle Lebars for discussions and particularly Eric Westhof for invaluable comments on the structure.
Supplementary Data are available at NAR Online.
Centre National de la Recherche Scientifique; University of Strasbourg; Fondation pour la Recherche Médicale; Sorbonne Universités (UPMC); French National Agency for Research [TrscrREGsnRNP ANR-06-BLAN-0072]; DynamIC [ANR-12-BSV5-0018]; French Infrastructure for Integrated Structural Biology (FRISBI) [ANR-10-INSB-05-01]; INSTRUCT as part of the European Strategy Forum on Research Infrastructures (ESFRI); DMZ benefitted from doctoral fellowships from CONACyT (Mexico) and Association pour la Recherche contre le Cancer. Funding for open access charge: CNRS-UPMC.
Conflict of interest statement. None declared.