The double hairpin exists in solution as interconverting conformers
The conformational behavior of the double hairpin motif was initially characterized in the context of a larger ΨCES construct that included residues of DIS-2. Previous studies showed that ΨCES exists predominantly as a monomer at low RNA concentrations and low ionic strength, and that the DIS-2 hairpin adopts two slowly interconverting conformers.41 We now show that the non-kissing form of SL-C also exists in two slowly interconverting conformations: one in which conserved residues G329-G332 form a tetraloop and G338 to A341 form a bulge, and another in which several of these residues form Watson-Crick base pairs. Both of these structures (and only these structures) are predicted to occur based on free energy calculations with MFOLD.47
Analysis of 2D NOESY spectra obtained for an isolated SL-C RNA confirmed the presence of the predicted structures (Supplementary Figure S1), and NMR spectra obtained for native ΨCES exhibited exchange cross peaks consistent with those observed for SL-C and indicative of this conformational equilibrium. The exchange peaks persisted in NMR spectra obtained for ΨCES under physiological concentrations of salts that favor dimerization (140 mM KCl, 10 mM NaCl, 1 mM MgCl2), even though the intensities of the resolved diagonal peaks indicated that the population of the non-kissing conformer is low (< 5%).
MFOLD calculations indicated that substitution of the U328-A333 base pair preceding the SL-C GACG tetraloop by an A-U base pair would prevent formation of the minor, non-kissing conformer. Interestingly, a comparison of the nucleotide sequences of the gammaretroviruses revealed that this substitution occurs naturally. We therefore prepared RNA constructs in which these residues were swapped. As predicted, these conservative substitutions (U328A/A333U) precluded formation of the alternate, non-kissing SL-C conformer and gave rise to NMR spectra that lacked the associated conformational exchange peaks (Figure 2 and Supplementary Figure S1). Therefore, to reduce crowding in the NMR spectra and simplify analyses, subsequent NMR studies were conducted with RNAs containing the U328A/A333U substitutions.
Figure 2 Effects of a U328-A333 to A328-U333 base pair substitution on the RNA conformational equilibrium and gRNA packaging into the virus. (a) A portion of the 2D NOESY spectrum obtained for wild type [ΨCES]2. Exchange peaks associated with the alternate …
To ensure that the conservative U328A/A333U substitution did not have unexpected effects on RNA packaging in vivo, RNA encapsidation efficiencies were measured for RNAs containing mutant and wild type leader sequences. Virions were produced by transient transfection of human 293T cells and quantified, with values for wild type particles set to 100%. Previous work has demonstrated that virion quantification by virion protein (e.g., reverse transcriptase activity48) or by host 7SL RNA yields indistinguishable values.49,50 7SL is a host RNA that is incorporated into retroviruses in proportion to virion proteins, in a manner independent of viral genomic RNA.49-51 Therefore, the amount of 7SL RNA in a cell-free virus sample is directly proportional to virion proteins and can be used to normalize for the amount of virus in that sample. The RNase protection assay used to quantify viral RNA packaging, including the riboprobe (which allowed simultaneous quantification of encapsidated 7SL and virion genomic RNA) and the RNase digestion products, is shown in Figure 2. Analysis of RNA isolated from virions produced by transient transfection revealed that the U328A/A333U mutant RNA was packaged at levels comparable to wild type genomic RNA, whereas a mutant lacking the regions classically defined as sufficient to promote packaging of a heterologous RNA8 was packaged more than 200-fold less efficiently than either wild type or the U328A/A333U mutant.
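The 7SL normalization described above amounts to a ratio of ratios. The following is a minimal sketch in Python, assuming that band intensities from the RNase protection assay are available as plain numbers. The function name and all intensity values are hypothetical and are not taken from the study.

```python
def packaging_efficiency(genomic, seven_sl, wt_genomic, wt_seven_sl):
    """Return genomic RNA packaging as a percentage of wild type.

    Each genomic RNA signal is divided by the 7SL signal from the same
    virus sample (a proxy for the amount of virus), and the resulting
    ratio is expressed relative to the wild type ratio (set to 100%).
    """
    if min(seven_sl, wt_genomic, wt_seven_sl) <= 0:
        raise ValueError("7SL and wild type signals must be positive")
    sample_ratio = genomic / seven_sl
    wt_ratio = wt_genomic / wt_seven_sl
    return 100.0 * sample_ratio / wt_ratio


# Hypothetical band intensities (arbitrary units), for illustration only.
wild_type = {"genomic": 1000.0, "seven_sl": 500.0}
deletion_mutant = {"genomic": 4.0, "seven_sl": 400.0}

efficiency = packaging_efficiency(
    deletion_mutant["genomic"], deletion_mutant["seven_sl"],
    wild_type["genomic"], wild_type["seven_sl"],
)
print(f"{efficiency:.2f}% of wild type")
```

With these made-up numbers the mutant ratio is 200-fold lower than the wild type ratio, which is the kind of difference reported for the deletion mutant.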
Confirmation of cross-kissing interactions by segmental labeling
NMR spectra obtained for ΨCES under conditions that favor dimerization exhibit signals diagnostic of kissing interactions involving the GACG tetraloops of SL-C and SL-D. Unfortunately, the chemical shift differences between SL-C and SL-D loop residues are very small, and it was not possible to unambiguously differentiate between SL-C to SL-C, SL-D to SL-D, and SL-C to SL-D kissing modes on the basis of the 2D NOESY spectra obtained for fully protonated or nucleotide-specifically protonated/perdeuterated [ΨCES]2 RNAs. We therefore obtained 2D NOESY data for a ΨCES sample prepared by segmentally ligating differentially deuterated fragments using T4 RNA ligase. Specifically, we ligated an SL-BC fragment, in which the guanosines were fully protonated and all other nucleotides were perdeuterated (GH-SL-BC), with an SL-D fragment that contained protonated adenosines and deuterated G, C and U nucleotides (AH-SL-D). The AH-SL-D fragment was prepared using a plasmid that encoded the hammerhead and HDV ribozymes at the 5′- and 3′-ends of the RNA, respectively, in order to obtain homogeneous products with adenosine and 2′-3′-cyclic phosphate groups at the 5′- and 3′-termini, respectively.52 After ribozyme cleavage, the SL-D fragment was treated with polynucleotide kinase to produce the desired 5′-monophosphorylated (donor) terminus and was used in the ligation reaction with excess amounts of the SL-BC fragment. Efficient ligation was achieved without the use of splints, with typical yields (based on the limiting SL-D fragment) of ~85%.
Figure 3 Evidence for inter-molecular cross-kissing interaction between SL-C and SL-D’. (a) Cartoon representation of the ΨCES sample employed, prepared by ligating G-protonated, A,C,U-perdeuterated SL-BC with A-protonated, G,C,U-perdeuterated …
2D NOESY spectra obtained for the [SL-C]2 kissing dimer exhibited spectral features similar to those reported previously for [SL-D]2. In particular, the H4′ proton of G334 (corresponding to C368 of SL-D) gave rise to an unusual upfield-shifted NMR signal (2.8 ppm) and exhibited an inter-molecular NOE with the aromatic H2 proton of A330. A similar pattern of NOEs was observed in the 2D NOESY spectrum obtained for the fully protonated [ΨCES]2 sample (data not shown). Although these data confirmed that SL-C and SL-D both participate in kissing interactions, they did not allow unambiguous differentiation between SL-C:SL-C/SL-D:SL-D and SL-C:SL-D kissing modes. However, the observation of an A364-H2 to G334-H4′ NOE cross peak in the spectrum obtained for the segmentally labeled [ΨCES]2 sample (Figure 3) provides clear evidence that the kissing interface is formed by the loop residues of SL-C and SL-D.
NMR assignment of [ΨCD]2 by 2H-editing
To date, only a small handful of structures have been reported for RNAs comprising more than 50 nucleotides.53 We therefore focused our initial structural studies on the 132-nucleotide [ΨCD]2 dimer. A major impediment to NMR studies of larger RNAs is severe signal degeneracy resulting from the presence of only four different common nucleotides. Although spectral resolution of smaller RNAs can be increased through the use of multi-dimensional 13C-edited NMR experiments, this approach is problematic when applied to larger RNAs due to signal losses and broadening resulting from strong 1H-13C dipolar coupling of aromatic C-H groups that are critical for assignment and structure determination.54 We therefore employed a strategy that involves the collection and analysis of 2D NOESY spectra for RNA samples prepared by in vitro transcription using different combinations of fully protonated, fully deuterated, and partially deuterated nucleotides. This approach was intended to avoid sensitivity and resolution problems associated with 13C labeling. Throughout this paper, we use a nomenclature in which only the proton-containing nucleotides are denoted, with superscripts indicating the positions of protons in partially deuterated nucleotides (8, 2, and R = protonated on the C-8, C-2, and ribose carbons, respectively) and a lack of superscripts denoting full protonation. For example, a sample of [ΨCD]2 that contains fully protonated guanosines, perdeuterated cytidines and uridines, and adenosines with deuterons on the ribose and aromatic C-2 carbons and a proton on the aromatic C-8 carbon, is denoted G,A8-[ΨCD]2. Differentially deuterated samples utilized in these studies included A,G8-[ΨCD]2 and G,A2,R-[ΨCD]2, among others.
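The labeling nomenclature can be made concrete with a small parser. This is an illustrative sketch only: it assumes the superscripts have been flattened into plain text (so that G,A8 means fully protonated G plus A protonated at C-8), and the function name, the dictionary layout and the "all" marker are hypothetical conventions introduced here. The site code 6 (C-6 of pyrimidines) is included because C6-protonated pyrimidines are used later in the text.

```python
NUCLEOTIDES = "ACGU"
# Superscript codes: 8, 2 and R are defined in the text; 6 is used for
# pyrimidines protonated only at C-6.
SITES = {"8": "C-8", "2": "C-2", "6": "C-6", "R": "ribose"}


def parse_label(label):
    """Map each nucleotide type to its protonated sites.

    [] means perdeuterated (nucleotide not named in the label),
    ["all"] means fully protonated (named without superscripts),
    otherwise the list holds the individually protonated sites.
    """
    protonation = {nt: [] for nt in NUCLEOTIDES}
    current = None
    for ch in label:
        if ch in (",", " "):
            continue  # separators carry no information of their own
        if ch in NUCLEOTIDES:
            current = ch
            protonation[current] = ["all"]
        elif ch in SITES and current is not None:
            if protonation[current] == ["all"]:
                protonation[current] = []
            protonation[current].append(SITES[ch])
        else:
            raise ValueError(f"unexpected character {ch!r} in {label!r}")
    return protonation


for label in ("G,A8", "G,A2,R", "A2R,G8C6"):
    print(label, parse_label(label))
```

Because a site code always attaches to the most recently named nucleotide, the parser treats G,A2,R and G,A2R identically, which matches how both spellings appear in the text.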
As expected, the 2D NOESY spectrum obtained for fully protonated [ΨCD]2 exhibited broad, overlapping cross peaks and was not assignable (Supplementary Figure S2). However, spectra exhibiting good resolution and sensitivity were obtained for several selectively deuterated [ΨCD]2 RNAs. Portions of the 2D NOESY spectrum obtained for G,A8-[ΨCD]2 are shown in Figure 4. In this spectrum, intra-residue NOE cross peaks for all guanosines, G-H1′ to A-H8 NOEs for all sequential G(i)-A(i+1) pairs, and most of the expected H8-to-H8 NOEs involving sequential purines were readily detected. Because all of the adenosine ribose protons were substituted by deuterium, inter-residue A(i+1)-H8 to G(i)-H2′/3′ NOEs were also readily identified. Assignments of partially overlapping H2′ and H3′ signals were confirmed by comparison with spectra obtained for the isolated RNA stem loops. Note that some of the H8-to-H8 NOEs could not be unambiguously assigned due to small chemical shift differences between the neighboring aromatic protons and/or the proximity of the associated NOE cross peaks to the intense diagonal. The spectrum also exhibited exchange cross peaks corresponding to non-kissing SL-C and SL-D species. Although the intensities of the exchange peaks were significant, intensities of the diagonal and NOE cross peaks associated with the non-kissing species were small, when measurable, and no intra- or inter-residue NOEs were detected for the non-kissing species. Based on a qualitative assessment of the signal intensities, we estimate that, under the conditions of the NMR experiments, the population of the non-kissing species is ca. 5 ± 3% of that of the kissing species.
Figure 4 Portions of 2D NOESY spectra obtained for partially deuterated [ΨCD]2 samples. Sequential aromatic to ribose (and H5) and aromatic to aromatic proton connectivities are shown in the upper and lower panels, respectively. (a) Data obtained for G,A …
Sequential and long-range NOEs involving the adenosine-H2 protons were assigned primarily from spectra obtained for G,A2,R-[ΨCD]2. Intense, well resolved cross peaks were observed for all of the expected sequential and cross-strand A-H2 to G/A-H1′ proton pairs, in addition to the expected A(i)-H1′ to G(i+1)-H8 NOEs. Notably, residues A353 and A354 of the linker that connects SL-C and SL-D exhibited sequential inter-residue aromatic-to-ribose proton NOEs consistent with A-form helical stacking, with A353-H2 exhibiting intense NOEs with the H1′ and H8 protons of A354 and G310, and with A354-H2 exhibiting NOEs with the G309-H1′ and H8 and C355-H6 protons (observed in the A2R,G8C6-[ΨCD]2 spectra). These data indicate that SL-C and SL-D are arranged in an end-to-end manner, in which the 3′-residue of SL-C, the linker adenosines, and the 5′-residues of SL-D are stacked in an extended, A-like geometry.
NMR signals observed for pyrimidine H6 and H5 protons in spectra obtained for [ΨCD]2 samples containing fully protonated cytosine or uracil residues were significantly broader than those associated with the purine H8 protons. For example, 1H NMR linewidths associated with U-H6 protons in 2D NOESY spectra obtained for U,A8,G8-[ΨCD]2 were typically more than three times greater than those associated with the G-H8 and A-H8 protons (the exception being U319, which forms an unstructured bulge; see below). Several purine H8-to-uracil proton NOEs could be assigned from these spectra, mainly because ΨCD contains only seven uracil residues. Analogous spectra obtained for ΨCD RNAs containing fully protonated cytosines were largely not assignable due to severe signal overlap (data not shown). To eliminate the relatively strong H5-to-H6 dipolar coupling that appears to be primarily responsible for the broader NMR signals, ΨCD samples were prepared using pyrimidine nucleotides containing a proton on the C6 carbon and deuterons on all other carbons. The quality of the NMR spectra obtained for these samples was significantly improved, enabling assignment of all expected purine-H1′ to pyrimidine-H6 proton connectivities and most purine-H8 to pyrimidine-H6 connectivities. Representative spectra and assignments made for G,U6,A8-[ΨCD]2 are shown in Figure 4.
The NMR spectra obtained for the highly deuterated samples also exhibited cross-strand NOEs (i.e., between protons on two different strands of a given helix) for protons that are likely to be separated by more than 5.0 Å. For example, in the NMR spectrum obtained for A,G8-[ΨCD]2, relatively intense G359-H8 to A371-H2, G323-H8 to A343-H2, and G334-H8 to A328-H2 NOEs were readily detected (Supplementary Figure S3). Analogous cross peaks observed in spectra obtained for G,A2,R-[ΨCD]2 were originally attributed to spin diffusion, due to the close proximity of the A-H2 and G-H1′ protons. Since the G-H1′ protons are substituted by deuterium in the A,G8-labeled sample, cross-strand A-H2 to G-H8 NOEs observed in these spectra can only be attributed to direct, long-range dipolar interactions.
Using the above approach, 100% of the non-exchangeable aromatic and H1′ signals were assigned. Signal overlap precluded independent assignment of a majority of the H2′ and H3′ signals, but by comparisons with higher resolution 2D NOESY and 2D 1H-13C correlated NMR spectra obtained for isolated SL-C and SL-D hairpins, signals for > 90% of the purine H2′ and H3′ protons and > 70% of the pyrimidine H2′ and H3′ protons of [ΨCD]2 could be identified.
The improved resolution provided by the partially deuterated samples enabled us to correct a previously misassigned NOE associated with A340. Specifically, NMR spectra obtained for C,G8-[ΨCD]2 revealed that an A340-H8 to C342-H1′ cross peak, which was previously attributed to spin diffusion via the intervening A341-H1′ and/or –H2 protons,44 was actually due to a direct dipole-dipole interaction. The A340-H8 proton also exhibited an intense intra-residue H8-to-H1′ NOE that was obscured by signal overlap in previously obtained NMR spectra, indicating that this residue adopts a syn conformation. As described below, these assignment corrections led to a change in the orientation of A340, but did not significantly affect other residues of the GGAA bulge in structures calculated for [ΨCD]2.
The 1H NMR chemical shifts measured for the stem loops of [ΨCD]2 were nearly indistinguishable from those observed for the isolated SL-C and SL-D hairpin RNAs, indicating that the structures should also be similar. The 1H NMR chemical shifts of the adenosine residues that link SL-C with SL-D differed by 0.02-0.83 ppm (Δδ) relative to the shifts observed previously for a 101-nucleotide RNA containing mutations in the loops of SL-B, SL-C and SL-D that prevented dimerization (Δδ for the H8, H2 and H1′ protons of 0.052, 0.271, 0.019 ppm for A353 and 0.325, 0.831 and 0.455 ppm for A354). This construct (SL-BmCmDm-UU) contained two additional non-native 3′-uridines that formed base pairs with A353 and A354 of the linker, and this is likely responsible for the observed chemical shift differences. Interestingly, the NOE cross peak patterns observed for A353 and A354 of both [ΨCD]2 and SL-BmCmDm-UU are consistent with A-form helical stacking. Thus, relatively intense NOEs were observed between A353-H2 and protons on the following (A354-H1′, -H2, and –H8) and cross-strand (G310-H1′) residues, and between A354-H2 and the H1′ proton on the following residue (C355). In addition, residues C352 through C355 exhibited sequential ribose(i) to H8/6(i+1) and H8/6(i) to H5(i+1) NOEs consistent with a structure in which residues C352 through C355 stack in a continuous, A-form like manner.
A total of 1248 unique and functionally non-redundant NOE-derived 1H distance restraints were obtained from the 2H-edited NOESY spectra. Hydrogen bond restraints and torsion angle restraints were employed as flat-well potentials with values centered at those observed in idealized A-form helices for segments exhibiting NOE cross peak patterns and intensities consistent with A-form helical conformations (residues G310-G313, C316-G318, G320-G323, G324-A328, U333-C337, C342-C352 of SL-C and C355-A362, U367-G374 of SL-D). Major groove inter-phosphate distances were loosely restrained for these residues using database potentials derived from high-resolution X-ray crystal structures.53 These restraints were required to avoid collapse of the major groove that can result from the asymmetric distribution of distance restraints in A-form helical segments.44,53,55,56 Inter-molecular cross-kissing H-bond restraints consistent with the NOE results obtained for the segmentally-labeled sample were employed (see above), but no intra-molecular H-bond or torsion angle restraints were employed for the residues of any of the loops or bulges. Because the NOEs associated with A353 and A354 are consistent with A-form helical stacking, loose inter-phosphate distance and torsion angle restraints (ideal ± 50°) were employed for these residues as well. In addition, weak restraints were employed for the phosphorus, ribose and aromatic carbon atoms to enforce symmetry between the two molecules of the dimer.
NMR Restraint and Structure Statistics
An ensemble of 20 structures with the lowest target function (4.11 ± 0.05 Å²) was obtained from an initial pool of 160 structures generated using Cyana.57 The structure is well defined by the NMR data, with best-fit superpositions of all heavy atoms affording pairwise RMS deviations (relative to mean atomic coordinates) of 0.40 ± 0.12 Å. Residues G309 and U319 (and the corresponding residues in the symmetrical dimer) were not experimentally restrained and therefore exhibit poorer convergence. These residues give rise to intense and relatively narrow 1H NMR signals and are therefore likely to be disordered. Statistical information regarding the restraints employed, restraint violations, and the structure convergence is provided in the table of NMR restraint and structure statistics.
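The ensemble selection step (keeping the 20 structures with the lowest target function out of a pool of 160) can be sketched as follows. This is a generic illustration, not a description of how Cyana performs or reports the selection: the function name is hypothetical and the target function values are placeholders generated in the script.

```python
import statistics


def select_ensemble(target_functions, size=20):
    """Return indices of the `size` structures with the lowest target function."""
    if size > len(target_functions):
        raise ValueError("ensemble size exceeds the number of structures")
    ranked = sorted(range(len(target_functions)),
                    key=lambda i: target_functions[i])
    return ranked[:size]


# Placeholder target function values for a pool of 160 structures.
# These numbers are made up for illustration; they are not Cyana output.
pool = [4.0 + 0.01 * ((37 * i) % 160) for i in range(160)]

ensemble = select_ensemble(pool, size=20)
selected = [pool[i] for i in ensemble]
mean_tf = statistics.mean(selected)
sd_tf = statistics.stdev(selected)
print(f"{len(ensemble)} structures, target function {mean_tf:.2f} ± {sd_tf:.2f}")
```

The same ranked list of indices can then be used to pick the coordinate sets that enter the superposition and RMS deviation analysis.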
Figure 5 Structure of [ΨCD]2 determined by NMR. (a,b) Orthogonal views of the ensemble of 20 structures calculated with Cyana showing the degree of convergence and overall shape and asymmetry of the dimer. The 5′- and 3′-terminal residues …
Description of the [ΨCD]2 NMR structure
Many features of the [ΨCD]2 structure are consistent with those observed in NMR structures calculated previously for native and mutant fragments of ΨCES. Thus, residues G310-G323 adopt an A-form helical “lower stem” in which unpaired residues A314 and C315 are internally stacked, U319 forms an unstructured extra-helical bulge, and residues G338-A341 form an A-minor K-turn type structure that connects the lower stem with the upper stem (G324-C337). As observed previously for a mutant ΨCES RNA (in which residues of the GACG tetraloop were mutated to prevent dimerization44), the nucleobase of G338 stacks against C337 in an A-like manner, G339 adopts a syn conformation and packs against the nucleobase of A340, and A341 packs against the nucleobase of C342 and forms an A-minor like interaction with A324 and A325 of the upper stem. In addition, we now have evidence that A340 adopts a syn conformation, and as indicated above, NOE spectra obtained for the C,G8-[ΨCD]2 sample provide clear evidence for direct A340-H8 to C342-H1′/–H4′ dipolar interactions (which were originally erroneously attributed to spin diffusion via A341). As such, the orientation of A340 in the [ΨCD]2 structure differs from that reported earlier, and is now shown to pack against the nucleobase of G339 and the nucleobase and ribose moieties of A341 and C342, respectively. The structures and interactions of the two symmetrical kissing interfaces (SL-C to SL-D’ and SL-C’ to SL-D) are essentially indistinguishable from those observed previously for a dimeric [SL-D]2.
A surprising feature of the [ΨCD]2 structure is that the lower stems of SL-C and SL-D are stacked in an “end-to-end” orientation, in which the linker residues A353 and A354 are stacked in an A-helical manner between the G310-C352 base pair of SL-C and the C355-G374 base pair of SL-D. Although similar end-to-end stacking interactions were observed in the monomeric, mutant ΨCES RNA, those interactions were attributed to non-native base pairing between the linker adenosines and two 3′-uridines that were included to accommodate an HpaI restriction site used for sub-cloning.44 The RNA used in the present studies was transcribed from a template that encoded a 3′-hammerhead ribozyme which, after cleavage, afforded a product RNA that terminated at G374 and did not contain additional native or non-native residues. Thus, in the [ΨCD]2 structure, residues C352-C355 are stacked in a continuous, A-form like helical conformation as observed previously for ΨCES, even though A353 and A354 do not form Watson-Crick base pairs. As such, the structure of [ΨCD]2 is both elongated and flat, with overall dimensions of ca. 95 Å by 45 Å by 25 Å.
The 5′-guanosine residues do not form base pairs or stacking interactions and appear disordered in the structure. This is consistent with earlier NMR studies showing that this guanosine (G309) interacts directly with the zinc finger domain of the MoMuLV NC protein. The two guanosines of the symmetrical dimer (G309 and G309′) are separated by ~20 Å in the NMR structure, a finding that has important implications regarding the likely structure of the preceding DIS-2 duplex (see below).
Cryo-electron tomography of [ΨCD]2
Because of the inherent paucity of long-range structural information available in the NOE data, attempts were made to establish relative helix orientations and overall molecular shape using NMR-derived residual dipolar couplings (RDCs) and small angle X-ray scattering (SAXS), respectively. NMR-derived RDCs and residual chemical shift anisotropies (RCSAs) have been used previously to establish global structural features,44,53,58-85 including inter-helical orientations.44,63,64,68,86-91 Unfortunately, neither of these approaches was successful. As indicated above, ΨCD exists as a monomer at low ionic strength, and under the conditions of the NMR studies, a small amount of the monomer species (or extended dimer species, in which one of the hairpins does not participate in kissing interactions) persists and is detectable in the NMR spectra. Despite the low abundance of these species, their higher apparent rotational mobilities lead to significant contributions to the RDC spectra obtained, precluding quantitative analysis of the RDC data. In addition, dynamic light scattering (DLS) and SAXS data obtained for ΨCD under conditions of the NMR experiments were consistent with the presence of a small amount of one or more higher-MW species. Unfortunately, sample heterogeneity, and particularly the presence of even small amounts of high MW contaminants, can complicate quantitative analysis of the SAXS data. We therefore carried out structural studies by cryo-electron tomography (cryo-ET).
Cryo-ET allows for the direct detection of molecular assemblies within a given sample, and for heterogeneous samples, species of similar size and/or topological properties can be classified for further analysis. A representative projection of 15 slices of unfiltered cryo-ET data obtained for [ΨCD]2 is shown in Figure 6a. Numerous punctate and elongated dark densities are clearly visible in the tomogram. In contrast, cryo-ET data obtained for a sample containing only buffer (the same buffer used to prepare the [ΨCD]2 sample) and processed under identical conditions lacked these features, indicating that the distinct densities observed using the [ΨCD]2 samples are due to electron scattering by the RNA. As an additional control, cryo-ET data were also obtained for [ΨCES]2. The tomograms obtained for this larger RNA exhibited densities with volumes and topological features consistent with the side-by-side packing of the DIS-2 helix and the [ΨCD]2 tandem hairpin (Supplementary Figure S9). These findings collectively indicate that RNAs as small as [ΨCD]2 (132 residues) are amenable to structural studies by cryo-ET. Previously, the smallest reported molecule studied by cryo-EM was a 78 kDa, 252 residue DNA tetrahedron. The structure of this highly symmetric and rigid molecule was determined by single particle methods with symmetry enforced.92
Figure 6 (a) A projection through 15 slices of a tomogram of [ΨCD]2. The dark densities are proposed to correspond to molecules of [ΨCD]2. They consist of individual compact point-like densities, multiple adjacent densities, and extended multimers …
In addition to the compact densities discussed above, densities consistent with larger, elongated structures were observed in the [ΨCD]2 tomograms (red boxes in Figure 6). These more extended densities adopted random overall shapes. In view of the facts that (i) the NMR spectra obtained for the solution-state samples exhibited exchange cross peaks indicative of a minor population of monomeric, non-kissing species, and (ii) the DLS and SAXS data suggest the presence of a minor population of a higher molecular weight species, we attribute the smaller and extended densities in the cryo-ET tomograms to monomers, partly dissociated dimers (i.e., dimers linked by only a single kissing structure) and higher-order multimers. The [ΨCD]2 concentration used for cryo-ET was ~40-fold lower than that used for NMR measurements, which may explain why the non-kissing species were more abundant in the cryo-ET data than in the NMR data.
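The concentration argument can be illustrated with a simple two-state monomer-dimer equilibrium, 2 M ⇌ D, with Kd = [M]²/[D] and total strand concentration C = [M] + 2[D]. This is a deliberately simplified sketch: it ignores partly dissociated dimers and higher-order multimers, and the dissociation constant and concentrations below are hypothetical placeholders, since no Kd is reported here.

```python
import math


def monomer_fraction(total_strand_conc, kd):
    """Fraction of RNA strands present as monomer for 2 M <=> D.

    Uses Kd = [M]^2 / [D] and the mass balance C = [M] + 2[D], which gives
    2[M]^2 / Kd + [M] - C = 0; the positive root is returned as [M] / C.
    """
    if total_strand_conc <= 0 or kd <= 0:
        raise ValueError("concentration and Kd must be positive")
    monomer = kd * (math.sqrt(1.0 + 8.0 * total_strand_conc / kd) - 1.0) / 4.0
    return monomer / total_strand_conc


KD = 1.0                     # hypothetical dissociation constant (arbitrary units)
NMR_CONC = 400.0             # hypothetical NMR sample concentration, same units
CRYO_CONC = NMR_CONC / 40.0  # ~40-fold dilution, as stated in the text

for name, conc in (("NMR", NMR_CONC), ("cryo-ET", CRYO_CONC)):
    print(f"{name}: monomer fraction = {monomer_fraction(conc, KD):.3f}")
```

For any fixed Kd in this model, the monomer fraction rises as the total concentration falls, which is the qualitative trend invoked to explain the higher abundance of non-kissing species in the cryo-ET samples.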
Stereo views of an unfiltered subvolume of a representative cryo-ET tomogram show a typical compact density along with the relative noise level (movies of the raw data are provided in Supplementary Figures S5 and S6). For comparison, a representative [ΨCD]2 NMR structure has been fitted into the density. The signal-to-noise ratio of the unfiltered cryo-ET data is clearly sufficient for alignment and averaging, and the overall dimensions of the compact densities are consistent with the dimensions of the [ΨCD]2 NMR structure.
A total of 38 such subvolumes with compact densities were computationally extracted, classified, aligned and averaged (see Methods), affording an averaged cryo-ET density with an approximate length of 95 Å, a width of 45 Å and a thickness of 25 Å at its narrowest point. The averaged cryo-ET density map exhibits two-fold symmetry, even though no symmetry was assumed or applied in the reconstruction, alignment and averaging process, and appears to be fully consistent with the NMR ensemble. Although U319 resides outside of the averaged cryo-ET density, this residue is disordered in the NMR structure and appears to undergo rapid conformational averaging (based on the unusually narrow 1H NMR lineshapes and lack of significant inter-residue NOEs; see above). Interestingly, the cryo-ET average density exhibits a slightly concave shape about the approximate two-fold axis, which also appears consistent with the NMR structures.