We report a “top-down” method that uses mainly duplexes' global orientations and overall molecular dimension and shape restraints, which were extracted from experimental NMR and small angle X-ray scattering (SAXS) data respectively, to determine global architectures of RNA molecules consisting of mostly A-form like duplexes. The method is implemented in the G2G (from Global measurement to Global Structure) toolkit of programs. We demonstrate the efficiency and accuracy of the method by determining the global structure of a 71-nucleotide RNA using experimental data. The backbone root-mean-square-deviation (RMSD) of the ensemble of the calculated global structures relative to the X-ray crystal structure using the experimental data is 3.0 ± 0.3 Å, and the RMSD is only 2.5 ± 0.2 Å for the three duplexes that were orientation-restrained during the calculation. The global structure simplifies interpretation of multi-dimensional nuclear Overhauser spectra for high resolution structure determination. The potential general application of the method for RNA structure determination is discussed.
Among the greatest advances in biology today are the discoveries of the various roles played by RNA in biological functions. RNA is an active participant in the regulation of gene expression by interference 1 or by riboswitches, 2 in the processing of RNA introns, 3 in the maintenance of chromosome ends by telomerase, 4 and in protein synthesis by the ribosome. 5 RNA function is encoded in its dynamics and structures, 6,7 and determination of RNA structures remains a major goal in contemporary biology. Currently, despite significant advances in X-ray crystallography and solution NMR, structure determination of any given mid- to large-size RNA molecule with a complex fold remains a daunting task. This is because of the great difficulty in growing crystals and/or obtaining phase information in X-ray crystallography, and the severe size constraints on structure determination by solution NMR spectroscopy.
The prevailing approach for structure determination of RNA in solution is a “bottom-up” approach, similar to the approach used for determining protein structures, 8 despite vast differences in both structural features and chemical compositions between these two types of biomacromolecules. In RNA, the very high structural similarity among the basic building units leads to very similar chemical shift environments and thus to a very narrow chemical shift dispersion and severe NMR signal overlap, making it extremely difficult, if not impossible, to extract sufficient local and global structural information to construct global structures of mid- and large-size RNAs. 9,10 Consequently, the current “bottom-up” approach runs into a size barrier.
A survey of RNA X-ray crystal structures with resolution better than 3.0 Å in the Protein Data Bank reveals that A-form like duplexes are the predominant building blocks in RNA structures and that the A-form conformation, in terms of the sugar-phosphate backbone, is highly conserved (Table 1 and Table S1 in Supporting Information, SI). The other A-form structural parameters also vary within narrow ranges, except for the basepair tilt, which depends heavily on the basepair type (Table 1). Thus, RNA duplexes (stems) depicted in a secondary structure can be treated as A-form like and can be generated from RNA databases as approximate initial structures with acceptable accuracy. Therefore, in determining the global architecture of an RNA molecule that consists mostly of duplexes, there are essentially two problems: (1) the orientations and phases (the rotation of a duplex around its helical axis) of the duplexes, and (2) the relative positions of the duplexes in a global sense. Once the global relative orientations, phases and relative positions of these duplexes are determined, so is the approximate global structure of the RNA. We use RDC-structure periodicity correlation to derive the discrete relative orientations (DRO) of duplexes in terms of the polar angles of a duplex axis, Θ and Φ, and the phase angle ρ0 11 (Figure 1a, 1b & 1c), and utilize the SAXS-derived molecular envelope to identify the correct combination of duplex orientations and the approximate relative positions of the duplexes. The programs ORIENT, BLOCK and PACK in the G2G toolkit calculate DROs, generate duplex coordinates from an RNA structure database library, and pack duplexes together, respectively. In the next step, the starting structures generated by G2G are subjected to a rigid-body simulated annealing (SA) refinement, in which the orientations and phases of the duplexes and the overall shape of the RNA are restrained.
We demonstrated the accuracy and efficiency of the method by determining the global structure of the adenine-riboswitch (riboA) (71 nt) using experimental RDC and SAXS data. The structure of riboA with a different sequence (see Materials and Methods section), previously determined using X-ray crystallography, 12 was used as a benchmark for the G2G method. It is noteworthy that the combined use of RDCs and SAXS data to refine a known structure of tRNA has been reported in the literature. 13
The four DROs of a chiral molecule (a domain or a whole protein or RNA) are usually derived using the singular value decomposition (SVD) method if the coordinates are known 14 or using periodicity-RDC correlation 11 if the structure consists of periodic structural elements, as described briefly in the following. Duplexes belong to a unique class of chiral molecules that consist of repetitive structural units arranged in certain periodic patterns. This structural periodicity is reflected in and correlated with geometrical measurements, such as the dipolar coupling or residual dipolar coupling of particular types of spin pairs. 11,15 DROs are derived using an efficient nonlinear least-squares optimization routine to fit the RDC-structure periodicity correlation to experimental imino RDCs of duplexes. 11 The general RDC equation (eq 1) in terms of the spherical coordinates (θ, φ) of the bond vector connecting the two nuclei A and B
was rewritten to explicitly express the RDC as a function of a duplex orientation relative to an alignment tensor:
where C1 to C5 are functions of the duplex orientation (Θ, Φ), Da and R. 11 The latter two are the axial and rhombic components of the alignment tensor, respectively. The bond vector of the nth residue in the duplex is given in spherical coordinates by the bond length dAB, the angle δ to the duplex axis (defined in Figures 1d & 1e), and the angle ρn with the x-axis (in the x-y plane perpendicular to the helix axis). The phase angle is ρn = αn + ρ0, where ρ0 is the initial phase offset, αn = 2π(n-1)/T, n = 1, 2, 3, …, and T is the period of the duplex (A-RNA: T = 11).
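The displayed forms of eqs 1 and 2 are not reproduced in this text. As a hedged reconstruction only: eq 1 is written below in the form commonly used for RDCs with an axial component Da and rhombicity R, and eq 2 is written as the generic harmonic expansion that follows when a bond vector at fixed angle δ is rotated by ρn about the duplex axis. The assignment of the labels C1 to C5 to the individual harmonic terms is an assumption made here for illustration and should be checked against ref 11.

```latex
% Eq 1 (common form; prefactor conventions vary between papers)
D_{AB}(\theta,\phi) = D_a\left[\left(3\cos^{2}\theta - 1\right)
      + \tfrac{3}{2}\,R\,\sin^{2}\theta\,\cos 2\phi\right]

% Eq 2 (generic form; coefficient labeling assumed)
D_n = C_1\cos 2\rho_n + C_2\sin 2\rho_n + C_3\cos\rho_n + C_4\sin\rho_n + C_5,
\qquad \rho_n = \frac{2\pi\,(n-1)}{T} + \rho_0
```

The five-term form arises because the angular factors in eq 1 are quadratic in the Cartesian components of the bond vector, and those components are linear in cos ρn and sin ρn.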
Eq. 2 has five unknown variables, Da, R, Θ, Φ, ρ0, and these are extracted using the nonlinear least squares fitting 11 method in the program ORIENT and the best fits are selected based on the RDC root mean square deviation (RMSD):
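As an illustration of the quantities being fitted, the following minimal Python sketch evaluates the RDC of the nth periodic bond vector in a duplex and the RDC RMSD between calculated and observed values. It is not the ORIENT implementation. The function names, the common form assumed for the general RDC equation, the rotation convention Rz(Φ)·Ry(Θ) used to place the duplex axis in the alignment frame, and the RMSD normalization by the number of RDCs are all assumptions made for this example.

```python
import math

def helix_rdc(n, da, rh, big_theta, big_phi, rho0, delta, period=11.0):
    """RDC of the n-th periodic bond vector in a duplex (angles in radians).

    Assumed convention: the bond vector is defined in the duplex frame by
    (delta, rho_n) and rotated into the alignment frame by Rz(Phi) * Ry(Theta),
    so the duplex axis ends up at polar angles (Theta, Phi).
    """
    rho = 2.0 * math.pi * (n - 1) / period + rho0   # rho_n = alpha_n + rho_0
    ux = math.sin(delta) * math.cos(rho)
    uy = math.sin(delta) * math.sin(rho)
    uz = math.cos(delta)
    # Rotate about y by Theta.
    x1 = math.cos(big_theta) * ux + math.sin(big_theta) * uz
    y1 = uy
    z1 = -math.sin(big_theta) * ux + math.cos(big_theta) * uz
    # Rotate about z by Phi.
    vx = math.cos(big_phi) * x1 - math.sin(big_phi) * y1
    vy = math.sin(big_phi) * x1 + math.cos(big_phi) * y1
    vz = z1
    # Common RDC form: 3cos^2(theta) - 1 = 3vz^2 - 1,
    # sin^2(theta) cos(2 phi) = vx^2 - vy^2.
    return da * ((3.0 * vz * vz - 1.0) + 1.5 * rh * (vx * vx - vy * vy))

def rdc_rmsd(calculated, observed):
    """Root-mean-square deviation between calculated and observed RDCs (Hz)."""
    if len(calculated) != len(observed) or not observed:
        raise ValueError("RDC lists must be non-empty and of equal length")
    total = sum((c - o) ** 2 for c, o in zip(calculated, observed))
    return math.sqrt(total / len(observed))
```

In a fit, `helix_rdc` would be evaluated for every measured residue of a duplex and the five parameters varied until `rdc_rmsd` is minimized; the optimizer itself is omitted here.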
In theory, the Da and R values obtained by fitting RDCs in each RNA duplex should be consistent with each other. In practice, the quality of fits depends on quality of the data, number of RDCs per duplex, closeness of the duplexes to idealized A-form RNA and the rigidity of structures. The values of Da, R and orientations and phases, (Θ, Φ, ρ0), can be extracted by fitting each duplex individually with independent sets of tensors, or using a simultaneous fit where the same Da and R are assumed for all duplexes, assuming the system is rigid. In the individual fitting, each duplex is fit independently and produces its own parameter set of Da, R and (Θ, Φ, ρ0). The individual fit works well only when the duplexes are long. In the case where the amount of available RDC data is not sufficient to obtain a reliable parameter set for every duplex, the initial Da and R values are obtained from the longest duplex, and are assumed as initial values for all duplexes. The alternative method performs a simultaneous RDC fit for all duplexes in a molecule, under the rigid-body assumption, that produces a common Da and R for all duplexes, and orientations and phases for individual duplexes.
In principle, the unique orientation of each duplex can unambiguously be resolved by applying a second independent alignment tensor. 16 However, the dual alignment tensors may not be the most practical approach, partly because of the lack of a truly independent second alignment medium for RNA. 17 Therefore, we applied a combined approach by using both the shape information of a RNA molecule and the DROs to determine the unique orientation for each duplex in a molecular frame. This is done by restraining angles between duplexes to be consistent with the overall molecular envelope, which is calculated from SAXS data, in the program ORIENT. An example of an input to the program is shown in the Supporting Information (SI).
RNAs consisting of mostly duplexes have the following unique information and features, which must be considered in determining global structures: (1) Approximate secondary structures of RNAs are usually known prior to structure determination, based on either predictions, 18 experimental verification, 19 or both; (2) RNA bases tend to be stacked either sequentially or in tertiary interactions (more than 95% of bases in any given RNA structure are stacked in a folded state, as shown by our survey of the RNA database). Bases that are not stacked tend to be at the tip of loops or in bulges. RNA duplexes and well-determined tetraloop hairpins can be considered, to a first degree of approximation, as rigid building blocks for delineating the global architecture of a large RNA molecule.
We generate coordinates of these duplex building blocks and linker residues from MOSAICLIB in a database using the BLOCK program in the G2G toolkit. MOSAICLIB was built based on the accepted geometric nomenclature, classification and standards. 20,21 These duplex building blocks are then connected by linkers with arbitrary initial conformations. Similar structures can also be generated from non-experimental predictions using software 22,23 that appeared in the literature during this study. A more detailed description of the program BLOCK is given in SI. Packing the discretely oriented building blocks is accomplished by the program PACK. In principle, when RDCs are measured in only one alignment medium, each duplex can be determined only to within four possible discrete orientations. For an RNA containing three duplexes, such as riboA, there would be 4×4×4 possible combinations of these orientations, or four equivalent sets of 16 possible combinations. The actual number of viable combinations of orientations is lower, because of steric clashes between duplexes and linkers, as shown in the riboA case. A more detailed description of PACK is given in SI.
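The counting argument above can be sketched in Python as follows. This is not the PACK algorithm. It assumes that the four discrete orientations of a duplex axis are related by 180° rotations about the principal axes of the alignment tensor, which is the usual origin of this fourfold ambiguity. The function names and the index labeling are illustrative, and the accompanying change of the phase ρ0 is not modeled.

```python
import itertools

def degenerate_orientations(theta, phi):
    """Four axis orientations (degrees) related by 180-degree rotations
    about the z, x and y principal axes of the alignment tensor."""
    return [
        (theta, phi % 360.0),                      # identity
        (theta, (phi + 180.0) % 360.0),            # 180 degrees about z
        (180.0 - theta, (-phi) % 360.0),           # 180 degrees about x
        (180.0 - theta, (180.0 - phi) % 360.0),    # 180 degrees about y
    ]

def relative_classes(n_duplexes=3):
    """Group all index combinations into sets that differ only by one
    common 180-degree rotation of the whole molecule.

    With the labeling above (0 = identity, 1 = z, 2 = x, 3 = y), composing
    two rotations corresponds to XOR of their indices, so a combination is
    reduced to its representative by XOR-ing every index with the first.
    """
    classes = {}
    for combo in itertools.product(range(4), repeat=n_duplexes):
        key = tuple(index ^ combo[0] for index in combo)
        classes.setdefault(key, []).append(combo)
    return classes

classes = relative_classes(3)
n_combinations = sum(len(members) for members in classes.values())
print(n_combinations, len(classes))   # total combinations, distinct relative arrangements
```

Fixing the orientation of the first duplex therefore leaves 16 relative arrangements to test against steric and envelope criteria.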
SAXS data contains information about the overall dimension and shape of a biomacromolecule in solution. 24 This is particularly true for nucleic acids because a large electron density of the phosphate-sugar backbones dominates the SAXS intensity (Fig. S2 in SI). 25 Thus, the shape of a molecular envelope reflects the outline shape of an overall structure of a RNA backbone. This information complements NMR spectroscopy, which is not particularly useful in extracting the phosphate-sugar backbone conformation of large RNA molecules. The molecular envelope, determined from SAXS data using the program DAMMIN, 26 was utilized to define the approximate dimensions, which are indicative of packing of the RNA duplexes. In the case of riboA, the top width of the front view of the topological envelope is about 36 Å (Fig. S3, SI), a little less than the sum of the width of the two duplexes but much greater than that of a single duplex. This width suggests that the two duplexes are packed side by side and “intercalated”. The width of the side view of the envelope, about 22 Å, simply suggests that the two duplexes are packed in a parallel or an anti-parallel arrangement (see the analysis in the next section).
To evaluate and demonstrate the utility of the G2G method, we applied it to solve the global structure of the 71-nt riboA RNA, whose secondary structure is shown in Figure 2a. The NMR spectra used for assigning imino signals, identifying basepairs in both duplex and non-duplex regions, and measuring RDCs are shown in Figures 2b & 2c and 3, respectively. The dipolar waves of the three duplexes of riboA are shown in Figure 4.
Four possible discrete orientations were derived from the one set of RDC data for each duplex by fitting eq 2. Assuming duplex H1 takes any one of four orientations, there are four possible relative orientations of duplexes H2 (Figures 5a–d), and the same is true for duplexes H1 and H3 (Figure 5e–h). Thus there are eight possible combinations for the orientations of duplexes H1 and H2, and duplexes H1 and H3. Covalent bond linkages and steric hindrance between linkers and duplexes, detected by the PACK program, rule out combinations b and d, and f and g as non-viable conformations. The correct combination of the DROs of all three duplexes was identified by shape-aided orientation analyses (Figure 6), which led to the conclusion that the three duplexes in riboA are packed in either approximately a parallel or an anti-parallel arrangement (ae in Figure 6).
The quantitative determination of orientations and phases of duplexes was then carried out using program ORIENT. In the calculation, the angles between duplexes were restrained to either 0 ± 30° or 180 ± 30° for all three pairs of duplexes to account for parallel or anti-parallel arrangements. The best simultaneous fit yields axial component Da = -26.4 Hz, and rhombic component R = 0.35, with orientations and phases, (Θ,Φ,ρ0) of (151°, 281°, 101°), (22°, 97°, 259°) and (40°, 101°, 63°) for duplexes H1, H2 and H3, respectively, which represent angles of 173°, 169° and 18° between H1-H2, H1-H3, and H2-H3, respectively. The top fits with an overall RMSD no greater than 1.5 Hz in RDCs yield average values of Da=-25.5 ± 2.0, R = 0.28 ± 0.03, with average values of (Θ,Φ,ρ0) of (160 ± 9°, 288 ± 18°, 104 ± 4°), (13 ± 7°, 94 ± 6°, 264 ± 6°) and (20 ± 20°, 100 ± 4°, 64 ± 3°) for duplexes H1, H2 and H3, respectively, where the error ranges are standard deviations.
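The inter-duplex angles quoted above follow directly from the polar angles of the duplex axes. The short Python sketch below, with illustrative function names, converts each (Θ, Φ) pair from the best simultaneous fit into a unit vector, computes the angle between pairs of axes, and checks whether the angle falls inside the 0 ± 30° or 180 ± 30° windows used as restraints.

```python
import math

def axis_vector(theta_deg, phi_deg):
    """Unit vector of a duplex axis from its polar angles in degrees."""
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def interaxis_angle(orient_a, orient_b):
    """Angle in degrees between two duplex axes given as (Theta, Phi)."""
    va = axis_vector(*orient_a)
    vb = axis_vector(*orient_b)
    dot = sum(a * b for a, b in zip(va, vb))
    dot = max(-1.0, min(1.0, dot))      # guard against rounding error
    return math.degrees(math.acos(dot))

def within_restraint(angle_deg, tolerance=30.0):
    """True for near-parallel (0 +/- tol) or near-antiparallel (180 +/- tol)."""
    return angle_deg <= tolerance or angle_deg >= 180.0 - tolerance

# (Theta, Phi) from the best simultaneous fit reported in the text.
h1, h2, h3 = (151.0, 281.0), (22.0, 97.0), (40.0, 101.0)
angles = {
    "H1-H2": interaxis_angle(h1, h2),
    "H1-H3": interaxis_angle(h1, h3),
    "H2-H3": interaxis_angle(h2, h3),
}
for pair, angle in angles.items():
    print(pair, round(angle), within_restraint(angle))
```

The phase ρ0 does not enter this calculation, because it describes rotation about the duplex axis rather than the direction of the axis.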
We then derived the structural topology (Figure 7a, left), using the duplex orientation and phase in terms of (Θ,Φ,ρ0), and the overall envelope, and built a three-dimensional starting structure using the BLOCK and PACK programs of G2G (Figure 7a, right). In addition to the restraints imposed by the orientations, phases and the overall shape of the envelope, the degrees of freedom for the relative translational positions of the three duplexes are also constrained in part by linkers of short stretches of nucleotides, which were generated with arbitrary conformations using the MOSAIC library.
This structure was then used as a starting structure for regularization and a hybrid rigid-body simulated annealing (SA) refinement protocol similar to that used for refining the global structure of a dimeric RNA:RNA complex 27 using Xplor-NIH. 28 The regularization procedure in Xplor-NIH connects linkers with duplex building blocks and removes any gross covalent geometry distortions before SA refinement. During the refinement, the duplex orientations and phases were held fixed in space, but duplex translations and arbitrary linker motions were allowed using the IVM facility 29 of the Xplor-NIH package (version 2.22 or newer). In addition to the loose distance and torsion-angle restraints that were applied to maintain the conformation of approximate A-form duplexes, we added the following restraints to the calculation: (1) the experimental SAXS data and an explicit restraint on the radius of gyration that was extracted from the SAXS data; (2) uniform distance restraints to maintain neighboring base-stacking throughout the whole chain; (3) restraints for hydrogen bonds, extracted from imino HNN-COSY 30,31 and NOE experiments, in addition to those in the duplex regions; (4) imino RDC restraints for residues in both the duplex and non-duplex regions; and (5) approximate dimension restraints derived from the envelope of riboA in the form of approximate phosphorus-phosphorus distances. To avoid gross close contacts, we also added a minimum-distance repulsive restraint of 6 Å, which is the sequential phosphorus-phosphorus distance in an A-form duplex and is generally the shortest possible distance separating any two phosphate groups in RNA structures. The backbone RMSD of the average structure of the top 10% lowest-energy structures from the rigid-body SA calculation, for the overall structure excluding the flexible loops, is about 3.3 Å relative to the X-ray crystal structure (Figure 7b; see Table S3 in SI for the calculation statistics).
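Two of the geometric restraints listed above can be illustrated with a short Python sketch that operates on a list of phosphorus coordinates. This is not Xplor-NIH code. The function names are hypothetical, and the radius of gyration is computed here as an unweighted geometric value, whereas the SAXS-derived value is weighted by electron density contrast.

```python
import math

def radius_of_gyration(coords):
    """Unweighted radius of gyration (same units as the coordinates)."""
    n = len(coords)
    center = tuple(sum(p[k] for p in coords) / n for k in range(3))
    mean_sq = sum(math.dist(p, center) ** 2 for p in coords) / n
    return math.sqrt(mean_sq)

def close_contacts(p_coords, min_dist=6.0):
    """Index pairs of phosphorus atoms closer than min_dist (in angstroms)."""
    pairs = []
    for i in range(len(p_coords)):
        for j in range(i + 1, len(p_coords)):
            if math.dist(p_coords[i], p_coords[j]) < min_dist:
                pairs.append((i, j))
    return pairs
```

A starting model that returns an empty list from `close_contacts` would satisfy the 6 Å repulsive criterion described in the text, and `radius_of_gyration` gives a quick consistency check against the Rg restraint.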
The global structure, shown in Figure 7b, greatly facilitates assigning cross peaks in two-dimensional Nuclear Overhauser Effect (NOE) and 15N-HNN-COSY spectra and led to the identification of tertiary hydrogen bonding interactions/close contacts suggested by experiments (Figure 2). For example, the initial structure puts the two loops L2 and L3 facing each other in space (Figure 7b). This “closeness” in space between the two loops led to the assignment of two previously unassigned sequential GC pairs, G38:C60 and G37:C61 (Figure 2b & 2c), because they immediately follow the G59:C67 pair in H3 in the imino NOE walk path. In the second example, an imino signal of a Watson-Crick G:C pair, revealed by the HNN-COSY spectrum, has a cross peak to that of U25 in the imino NOE walk path from H2 through to H1 (Figure 2b & 2c) and was assigned as G46:C53 for the following reasons. G46 is in close vicinity of U25 according to the arrangement of the duplexes in the topology. Because of the arrangement among H1, H2 and H3, C53 is the only unpaired cytosine in the close vicinity of G46 and is likely to be the one forming this Watson-Crick base pair with G46. In addition, the imino of G46 has cross peaks with those of two other Us, which were assigned as U22 and U47 because they are the only two Us in the close vicinity of G46 in the initial global structure. The third example concerns the assignment of signals related to the ligand adenine. A previous biochemical study indicated that U74 interacts with the ligand, most likely via a Watson-Crick pairing. 32 We assigned the imino proton signal H9 of the ligand adenine base and identified its position using a combination of the two spectra and the initial topology structure of riboA as follows. U74 is located in the junction formed by the linkers between H1-H2, H2-H3 and H1-H3. The U74 imino was assigned by its strong cross peak to that of U75, which is at the end of H1 (Figure 2b & 2c).
The H9 signal of the ligand adenine base was identified by the peak at 11.43 ppm in the 1H-NOESY spectrum (Figure 2c) that has no corresponding signal at the same position in the 15N-HNN-COSY spectrum (Figure 2b), suggesting that the signal is from the non-isotope-labeled N9H9 imino of the adenine ligand. Furthermore, the U74 imino forms a hydrogen bond with the non-labeled ligand adenine, as suggested by the absence of the acceptor signal (Figure 2b and 2c). The ligand also forms a hydrogen bond with another U imino at 13.24 ppm (Figure 2c), again suggested by the absence of the acceptor signal (Figure 2b). This U imino also has strong cross peaks with the U74 and U75 imino protons (Figure 2c). Based on the initial structure, U51 is the only U in the close vicinity of the ligand, U75 and U74. We therefore assigned the imino at 13.24 ppm to U51. These long-range distance restraints were then applied to restrain the structure in the rigid-body SA calculation, in which the duplex orientations and phases were initially fixed to maintain the global fold; later during refinement, each duplex was allowed to rotate as a rigid body to achieve the best overall fit to the RDC data.
A comparison of the ensemble of the top 10% lowest-energy G2G structures and the X-ray crystal structure is shown in Figure 8a. The structure calculation statistics are listed in Table 2. The backbone RMSD of the ensemble of the calculated global structures relative to the X-ray crystal structure is 3.0 ± 0.3 Å for the whole molecule and 2.5 ± 0.2 Å for the three orientation- and phase-restrained duplexes. The backbone RMSD between the X-ray crystal structure and the average structure of the three duplexes is 2.5 Å. The agreement between the G2G structure and the experimental RDC and SAXS data was also examined. The correlation coefficient r between the RDCs back-calculated from the ensemble of the top 10% lowest-energy structures from the rigid-body calculation and the experimental data is about 0.83 (Figure 8b). The correlation coefficient between the RDCs back-calculated from the regularized average structure of the ensemble and the experimental RDCs is about 0.95 (Figure 8c). For comparison, the correlation between the RDCs calculated from the X-ray crystal structure and the experimental data is about 0.77 (Figure 8d). The relatively low correlation for the X-ray crystal structure may in part be due to differences between the structures in solution and in the crystalline state, as well as to the sequence difference in duplex H1, which might result in direct or indirect changes in the structure nearby or even in the entire structure (see the Materials and Methods). The quality of the G2G structure is also demonstrated by the comparison of the back-calculated SAXS curves with the experimental one; the RMSD between the two is about 0.29 ± 0.04 (Figure 8e). The PDDF comparison is shown in Figure 8f. With the current data, the approximate position of the adenine ligand was also determined (Figure 8g).
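The correlation coefficient r used above to compare back-calculated and experimental RDCs is taken here to be the Pearson coefficient, which is an assumption, since the text does not name the definition. A minimal Python sketch with a hypothetical function name:

```python
import math

def pearson_r(calculated, observed):
    """Pearson correlation coefficient between two equal-length RDC lists."""
    if len(calculated) != len(observed) or len(observed) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(observed)
    mean_c = sum(calculated) / n
    mean_o = sum(observed) / n
    cov = sum((c - mean_c) * (o - mean_o) for c, o in zip(calculated, observed))
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_c * var_o)
```

Because r is insensitive to an overall scale factor, it is usually reported together with an RMSD or Q-factor when judging how well a structure reproduces the measured couplings.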
The “top-down” approach of the G2G method presents a new strategy for determining global RNA structure in solution. In general, the RMSD of the G2G structure relative to the “true” structure may approximately be estimated using the following empirical formula:
where α is the possible RMSD between the “true” and the database-derived duplex structures in the context of the structure; β is the possible RMSD between the “true” and the G2G structures of non-duplex regions, such as long linkers and under-determined loops; and Pduplex is the percentage of duplex residues in the RNA. For A-form like duplexes, α of an individual duplex is well below 2.0 Å, based on RMSDs from the database (Table 1). The value of β can vary significantly, depending on the length of non-duplex regions such as linkers and loops. In the riboA case, duplexes make up more than 60% of the total residues and the linkers between H1 & H2 and between H2 & H3 are relatively short. The overall RMSD between the G2G and the “true” structure is therefore estimated to be about 3.3 Å or better, assuming α and β are about 2.5 and 4.0 Å for the duplexes and the long linker/loops, respectively.
The global structures may imply potential intricate networks of interactions, such as multi-base pairing 33 at junctions and among residues that are next to each other in a three-dimensional fold but otherwise far apart in sequence, as seen in the riboA RNA structure (an illustration identifying such tertiary basepairings is presented in the Results section). Incorporating those tertiary basepairings in the rigid-body calculation results in an improvement of the structure, as indicated by a small decrease, about 0.4 Å, in the overall backbone RMSD between the ensemble and the crystal structure (Figures 7b and 8a). Thus, the G2G global structures of RNAs may serve as structural scaffolds for rapid determination of high-resolution structures, in a way that is similar to using a structural model driven approach to determine the high-resolution structures of homologous proteins. 34 Furthermore, since there is virtually no size limitation in using SAXS data to derive the approximate molecular envelope, this method can potentially be used to study the structures of large RNAs in solution, provided that imino signals can be assigned and enough RDCs can be measured by using partial and/or segmental labeling schemes 10 in conjunction with a SAXS-aided divide-and-conquer strategy. The G2G method may also be valuable in deriving more accurate structures when used in combination with the MC-fold and MC-Sym pipeline. 22,23
The efficacy and accuracy of the G2G method depend heavily on how well the orientations and phases of duplexes can be determined, which depends in part on the quality of the RDCs and the number of available RDCs per duplex. When the orientations from the X-ray structure were used in place of the experimentally determined ones, the calculation with the same number of restraints yielded an ensemble of structures with backbone RMSDs of 0.9 and 1.7 Å for the three duplexes and for all residues, respectively (Figure S5, SI). In principle, the longer a duplex is, the better its orientation can be determined, because more RDCs per duplex can be extracted. Furthermore, in general, dynamics with large amplitude may complicate the interpretation of both SAXS and RDC data. Therefore, the G2G method is limited to rigid systems. In practice, segments and residues of an RNA may undergo motions on various time scales. 6,7 Both SAXS and RDCs are time-averaged measurements on time scales from a fraction of a millisecond to seconds. We examined the validity of the rigid-body assumption for riboA by comparing the duplex orientations calculated with and without the rigid-body assumption (Table 3). The angles between duplexes computed from the best simultaneous fit (under the rigid-body assumption) are 173° for H1 & H2, 169° for H1 & H3 and 18° for H2 & H3. These angles agree remarkably well with those calculated from the best individual fits (without the rigid-body assumption), which are 172°, 162° and 23° for H1 & H2, H1 & H3, and H2 & H3, respectively. This agreement indicates that riboA is well packed, with no large-amplitude internal motions among duplexes, and that the rigid-body assumption is approximately applicable to riboA. The rigid-body assumption for riboA is also consistent with the linearity of the Guinier plot (indicating uniform molecular size), the bell-shaped Kratky plot (well-folded conformation, 24 Figure S6), and the convergence and compactness of the molecular envelope derived from SAXS data.
A plasmid containing the template coding for the version of riboA used in this study was a gift from Professor David Draper at the Johns Hopkins University. This riboA molecule differs from that of the X-ray crystal structure 35 in four basepairs: G14-U82 => G14-C82, C15-G81 => G15-C81, U16-A80 => A16-U80 and U17-A79 => A17-U79. All these mutated basepairs are located in duplex H1 (Figure 2a) and have no tertiary interactions with other parts of the RNA. These changes made the RNA much more stable (personal communication with David Draper) for the structural study. The riboA RNA samples were prepared by in vitro transcription with T7 RNA polymerase, followed by polyacrylamide gel electrophoresis (PAGE) purification. 15N-isotope–labeled RNA samples were prepared from 15N-labeled NTPs (Sigma-Aldrich, St. Louis, MO). NMR samples were extensively dialyzed against pH = 6.8 buffer containing 50 mM potassium acetate, 2 mM MgCl2 and 2 mM adenine. For scattering experiments, Tris-HCl buffer was used in place of potassium acetate. The concentrations of riboA were 1-5 mg/ml for the X-ray scattering and 0.5–1.0 mM for all NMR experiments.
For assigning imino signals, we recorded two-dimensional 1H-1H NOESY 8 and HNN-COSY spectra 30,31 at 15 and 25 °C on a Varian Innova spectrometer operating at a proton frequency of 800 MHz and equipped with a triple-resonance cryo-probe. Mixing times of 100, 120 and 150 ms, depending on the temperature, were used for recording the NOESY experiments. All spectra were processed and analyzed using nmrPipe, nmrDraw 36 and NMRViewJ (One Moon Scientific, New Jersey).
The G2G methodology requires assigning imino signals and identifying hydrogen bonds involved in canonical and noncanonical basepairs in duplexes. The assignments of imino signals of the riboA RNA were accomplished by an NOE-walk of the two-dimensional NOESY spectrum of the imino region, aided by the two-dimensional [15N…H-15N]HNN-COSY spectrum. 30,31 The assignments of the imino signals in the regular duplex regions were straightforward (Figure 2). The initial fold, which was generated based on the information on the orientations and packing of duplexes (Figure 7), provided a visible model of the approximate relative locations of residues in a global sense and made it relatively easy to identify close contact and potential hydrogen bonds involved in tertiary interactions in the junction and loops (see the previous section).
The RDC data were derived from IPAP-HSQC 37 spectra, which were recorded using 15N-labeled RNA samples under isotropic and anisotropic conditions. The anisotropic riboA sample was prepared by adding about 9.7 mg/ml Pf1 phage (ASLA Biotech, Burlington, NC), which resulted in a splitting of 9.8 Hz in the deuterium signal. RDC values range from 2.2 to 22.7 Hz for the imino 15N-1H of riboA. The IPAP spectra showing the imino resonance splittings of riboA in the alignment medium are shown in Figure 3, and the RDC-structural periodicity correlation curves of the three duplexes are shown in Figure 4.
Both small-angle and wide-angle X-ray scattering (SAXS and WAXS) experiments were performed at beamlines 12-ID and 18-ID of the Advanced Photon Source (APS) at Argonne National Laboratory. The wavelength, λ, of the X-ray radiation was set to 1.033 Å and the scattered X-ray photons were recorded with a charge-coupled device X-ray detector (Mar). An X-ray flow cell made of a cylindrical quartz capillary with a diameter of 1.5 mm and a wall thickness of 10 μm was used. The X-ray beam, with a size of 0.1 × 0.2 mm², was adjusted to pass through the center of the cell. The exposure time was set to 0.5-1.0 seconds to avoid detector saturation and radiation damage. Potential radiation damage was further reduced by flowing the samples.
Twenty images were taken for each sample or buffer solution to get good statistics. The two-dimensional images were reduced to one-dimensional scattering profiles using the program MarDetector (Tiede, unpublished). In this program, the center of beam and the sample-to-detector distance, and thus q values, of individual detector pixels, were first calibrated using scattering data of silver behenate powder. The 2-D scattering images of buffers and samples were azimuthally averaged after solid angle correction and then normalized with the intensity of the incident X-ray beam. The resulting 1-D scattering data sets were averaged before buffer background subtraction. The scattering profile of a sample solute was calculated with the following equation:
where I(q) is the scattering intensity at q, and α is the scaling factor that denotes the relative contribution from the buffer. For a quick quality check on SAXS data, α was estimated as α≈1-cmassβ/1000, where cmass is the concentration of sample in mg/ml, and β is the partial specific volume of a solute, 0.54 for nucleic acids.
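The buffer subtraction described above can be sketched in Python as follows. The displayed form of eq 4 is not reproduced in this text; the sketch assumes the usual form, I_solute(q) = I_sample(q) − α·I_buffer(q), which is consistent with the description of α as the relative buffer contribution. The function names are illustrative and are not part of MarDetector.

```python
def estimate_alpha(c_mass_mg_per_ml, partial_specific_volume=0.54):
    """Quick estimate of the buffer scaling factor, alpha = 1 - c*beta/1000.

    c_mass_mg_per_ml: solute concentration in mg/ml.
    partial_specific_volume: 0.54 for nucleic acids, as given in the text.
    """
    return 1.0 - c_mass_mg_per_ml * partial_specific_volume / 1000.0

def subtract_buffer(i_sample, i_buffer, alpha):
    """Solute profile under the assumed form I_sample(q) - alpha * I_buffer(q)."""
    if len(i_sample) != len(i_buffer):
        raise ValueError("sample and buffer profiles must share the same q grid")
    return [s - alpha * b for s, b in zip(i_sample, i_buffer)]

# At the highest concentration used for scattering (5 mg/ml), alpha is
# 1 - 5 * 0.54 / 1000, i.e. the excluded-volume correction is below 0.3%.
alpha_5 = estimate_alpha(5.0)
print(alpha_5)
```

The small deviation of α from 1 explains why this estimate serves only as a quick check and why the WAXS solvent peak is used for the finer tuning described in the following paragraph.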
The range of momentum transfer q [= 4π sinθ/λ, where 2θ is the scattering angle] of the SAXS experiments was 0.006-0.250 Å⁻¹, and that of WAXS was 0.1 – 2.6 Å⁻¹. The SAXS data quality was evaluated by the linearity of the Guinier plot (eq. 5). More accurate background subtractions were performed in the following way. First, the WAXS profile was obtained by using eq. 4 and tuning the value of α to completely remove the scattering from the buffer, as indicated by the disappearance of the solvent peak at 2.0 Å⁻¹. Then, the resulting WAXS profile was used as a guide for the SAXS background subtraction, by tuning the value of α in the SAXS subtraction and overlaying the resulting SAXS profile with the WAXS profile in the overlapping q range, i.e., 0.1 – 0.25 Å⁻¹ in our experiments. The final scattering data were obtained by piecing the resulting SAXS and WAXS data together over the range of 0.006 – 2.5 Å⁻¹.
The radius of gyration (Rg) was calculated from data at low q values, in the range qRg < 1.2, using the Guinier approximation (eq. 5),
$$I(q) \approx I(0)\,\exp\!\left(-\frac{q^{2}R_{g}^{2}}{3}\right) \tag{5}$$
The Rg value obtained with this reciprocal space method was 19.2 ± 0.2 Å for riboA. The scattering intensities at and near q = 0 were extrapolated with the Guinier equation when needed.
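A Guinier analysis of this kind can be sketched as a linear fit of ln I(q) against q², whose slope is −Rg²/3 and whose intercept is ln I(0). The following is a minimal illustration, not the analysis code used here: the function name `guinier_fit` is hypothetical, the fit is unweighted, and the iterative trimming to the qRg < 1.2 window is one simple way to honor that limit. It assumes q is sorted in ascending order and all intensities are positive.

```python
import numpy as np

def guinier_fit(q, intensity, qrg_max=1.2, max_iter=50):
    """Estimate Rg and I(0) from ln I = ln I(0) - (Rg**2 / 3) * q**2.

    The fit window is shrunk iteratively until it contains only points
    with q * Rg < qrg_max (at least three points are always kept).
    """
    q = np.asarray(q, dtype=float)
    log_i = np.log(np.asarray(intensity, dtype=float))
    n = len(q)  # start from the full low-q data set
    for _ in range(max_iter):
        slope, intercept = np.polyfit(q[:n] ** 2, log_i[:n], 1)
        if slope >= 0:
            raise ValueError("non-negative Guinier slope; Rg is undefined")
        rg = np.sqrt(-3.0 * slope)
        n_new = max(int(np.sum(q * rg < qrg_max)), 3)
        if n_new == n:
            break
        n = n_new
    return rg, np.exp(intercept)
```

The fitted intercept also provides the extrapolated intensity at q = 0 mentioned above.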
The PDDF, p(r), in real space was calculated using GNOM, 38 an indirect Fourier transform program, using real space perceptual criteria based on a solid sphere. To avoid under-estimation of the molecular dimension and consequent distortion in the low resolution structural reconstruction, the parameter Rmax, the upper end of the distance r, was chosen such that the resulting PDDF has a short tail of near-zero values at large r. The maximum distance (Dmax) of the riboA molecule was estimated from the PDDF as 62 ± 5 Å. The Rg value of riboA calculated from p(r) using GNOM was 19.8 Å, in good agreement with that calculated directly from the Guinier approximation.
We used the program DAMMIN 26 to obtain an approximate molecular envelope, which outlines the phosphate-sugar backbone of an RNA structure. In DAMMIN, a spherical space with a radius of Rmax, read from the PDDF result, is built and initially filled with multiple-phase small beads or dummy atoms. To avoid distortion caused by possible under-estimation of Dmax, Rmax was usually set to be about 10-20 Å greater than Dmax. At each step, the envelope evolves by randomly phasing a dummy atom in (as part of the molecule) or out (as part of the solvent). DAMMIN employs a simulated annealing algorithm to drive the envelope evolution by reducing the discrepancy between the experimental and calculated scattering curves during the annealing process. All reconstructions for riboA were run in the “jagged” mode, which works well for middle size molecules, yielding enough dummy atoms to resolve RNA duplexes while being less computationally demanding than the “slow” mode.
The resulting structural models were subjected to averaging using the program package DAMAVER. 39 In this program, the normalized spatial discrepancy (NSD) values between each pair of models were computed. The model with the lowest average NSD with respect to the rest of the models was chosen as the reference model. The remaining models were superimposed onto the reference model using SUPCOMB, 40 except that possible outliers identified by the NSD criteria were discarded. The dummy atoms of these superimposed models were remapped onto a densely packed grid of atoms, with each grid point marked with its occupancy factor. The grid points with non-zero occupancies were chosen to generate a final consensus model with a volume equal to the average excluded volume of all the models.
Scattering data in a q range of 0.006-0.33 Å⁻¹, which reflect the global shape without significant undesired influence from the internal structure, were used in the DAMMIN calculations. Sixteen independent DAMMIN runs were performed, and the resulting bead models were subjected to averaging by DAMAVER. The Rf values, as defined in the following:
were in the range of 0.008 – 0.009, indicating a good match between the experimental and calculated scattering data for individual models, and the average NSD for those samples was 0.61 ± 0.01, indicating excellent convergence of both the individual DAMMIN fits and the overall bead model ensembles for each sample.
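The reference-model selection in the averaging step can be illustrated with a short sketch. It assumes that a symmetric matrix of pairwise NSD values with a zero diagonal has already been computed by the averaging software. The outlier rule used here (mean NSD above the ensemble mean plus two standard deviations) is an assumption for illustration and is not taken from the text, and the function name `select_reference` is hypothetical.

```python
import numpy as np

def select_reference(nsd, n_sigma=2.0):
    """Pick the reference model and flag outliers from a pairwise NSD matrix.

    nsd: (n, n) symmetric array, nsd[i, j] = NSD between models i and j,
         with zeros on the diagonal.
    Returns (reference_index, outlier_indices, mean_nsd_per_model).
    """
    nsd = np.asarray(nsd, dtype=float)
    n = nsd.shape[0]
    if nsd.shape != (n, n) or n < 2:
        raise ValueError("nsd must be a square matrix for at least 2 models")
    # Average NSD of each model against all the others (diagonal excluded).
    mean_nsd = nsd.sum(axis=1) / (n - 1)
    reference = int(np.argmin(mean_nsd))
    # ASSUMED outlier criterion: mean NSD > ensemble mean + n_sigma * std.
    cutoff = mean_nsd.mean() + n_sigma * mean_nsd.std()
    outliers = [i for i in range(n) if mean_nsd[i] > cutoff]
    return reference, outliers, mean_nsd
```

With 16 runs, as used here, the input would be a 16 × 16 matrix; models not flagged as outliers would then be superimposed onto the reference model.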
We thank Professor D. E. Draper for providing us with the plasmid template for in vitro transcription of the riboA RNA, for helpful discussions, and for reviewing the manuscript. This research was supported [in part] by the Intramural Research Program of the NIH, National Cancer Institute, to Y-X.W. and B.A.S., and by the Intramural Research Program of the NIH, the CIT Intramural Research Program, to C.D.S. Work at Argonne National Laboratory (DMT) and the Advanced Photon Source was supported by the Office of Basic Energy Sciences, Department of Energy, under contract DE-AC02-06CH11357. We thank Drs. L. Guo (BioCAT, sector 18-ID) and S. Seifert (BESSRC, sector 12-ID) at Argonne National Laboratory for their support of the synchrotron experiments. Use of the Advanced Photon Source was supported by the U.S. Department of Energy, Basic Energy Sciences, Office of Science, under contract No. W-31-109-ENG-38. BioCAT is a National Institutes of Health-supported Research Center RR-08630. The content is solely the responsibility of the authors and does not necessarily reflect the official views of the National Center for Research Resources or the National Institutes of Health.
Supporting Information Available: A more detailed description of the methods and materials, including a detailed description of the G2G toolkit, is provided in the Supporting Information. The calculation protocols, the G2G toolkit package, and all data files, together with the coordinates and restraint files, can also be downloaded from the authors' web sites http://xxx.