The growing number of high-resolution crystal structures of large RNA molecules provides much information for understanding the principles of structural organization of these complex molecules. Several in-depth analyses of nucleobase-centered RNA structural motifs and backbone conformations have been published based on this information, including a systematic classification of base pairs by Leontis and Westhof. However, hydrogen bonds involving sugar–phosphate backbone atoms of RNA have not been analyzed systematically until recently, although such hydrogen bonds appear to be common both in local and tertiary interactions. Here we review some backbone structural motifs discussed in the literature and analyze a set of eight high-resolution multi-domain RNA structures. The analyzed RNAs are highly structured: among 5372 nucleotides in this set, 89% are involved in at least one “long-range” RNA–RNA hydrogen bond, i.e., hydrogen bonds between atoms in the same residue or sequential residues are ignored. These long-range hydrogen bonds frequently use backbone atoms as hydrogen bond acceptors, i.e., OP1, OP2, O2′, O3′, O4′, or O5′, or as a donor (2′OH). A surprisingly large number of such hydrogen bonds are found, considering that neither single-stranded nor double-stranded regions will contain such hydrogen bonds unless additional interactions with other residues exist. Among 8327 long-range hydrogen bonds found in this set of structures, 2811, or about one-third, are hydrogen bonds entailing RNA backbone atoms; they involve 39% of all nucleotides in the structures. The majority of them (2111) are hydrogen bonds entailing ribose hydroxyl groups, which can be used either as a donor or an acceptor; they constitute 25% of all hydrogen bonds and involve 31% of all nucleotides. 
The phosphate oxygens OP1 or OP2 are used as hydrogen bond acceptors in 12% of all nucleotides, and the ribose ring oxygen O4′ and phosphodiester oxygens O3′ and O5′ are used in 4%, 4%, and 1% of all nucleotides, respectively. Distributions of geometric parameters and some examples of such hydrogen bonds are presented in this report. A novel motif involving backbone hydrogen bonds, the ribose–phosphate zipper, is also identified.
Advances in molecular biology and crystallographic methods have led to a dramatic increase in the number of high-resolution crystal structures of large RNA molecules solved in the past decade.1,2 These structures provide a wealth of information for understanding principles of organization of complex RNA structures and have motivated detailed analysis of nucleobase-centered RNA structural motifs and backbone conformations; for example, see ref. 3–12. This information can be used for knowledge-based prediction of three-dimensional (3D) RNA structures.13–15 In many instances, such methods can currently predict native structures of relatively small one-domain RNA within an atomic root mean square deviation (RMSD) of 2 Å,14 although the RMSD increases to 8–16 Å for larger multi-domain RNA.15 It is likely that the sugar–phosphate backbone plays important roles in inter-domain tertiary interactions of RNA, and systematic classification of such motifs may improve predictive methods. Indeed, many backbone motifs, especially those involving the ribose hydroxyl groups, have been identified.16–25 Yet, until recently, there were no attempts to systematically analyze hydrogen bonds (H-bonds) involving the backbone atoms; a recent paper reports compiled distributions of phosphate oxygens around individual RNA bases.26 Here, we review some of the previously identified structural motifs emphasizing the usage of the backbone H-bonds (BH-bonds) and also analyze a set of high-resolution multi-domain RNA structures. Consideration of BH-bonds greatly increases the number of structural motifs in RNA. Indeed, the analyzed set of RNA structures has 2444 base pairs stabilized by at least one H-bond when allowing donor and acceptor atoms only on nucleobases. The number of pairs of residues (as opposed to base pairs) is approximately doubled to 4294 when BH-bonds are included (Table 1). We do not attempt here to present a full systematic classification of structural motifs with BH-bonds. 
Instead, we present some typical examples, including some discussed previously in the literature. The purpose of this short report is to increase awareness of the roles of the sugar–phosphate backbone atoms in hydrogen bonding interactions, which contribute to structure, in RNA and to stimulate efforts to create a systematic classification of such interactions.
Table 1 lists the high-resolution multi-domain RNA crystal structures selected for the analysis as described in the Methods section. The structures include rRNA from small and large ribosomal subunits, 7S RNA from the signal recognition particle, hairpin ribozyme, P4–P6 domain from the group I intron, lysine and thiamine pyrophosphate riboswitches, and tRNA.
The analyzed RNAs are highly structured, which is perhaps not surprising since they were able to crystallize. In 5372 nucleotides (nt) analyzed, 8327 H-bonds have been found. Residues connected by at least one H-bond form 4294 hydrogen-bonded pairs, which involve 4796 nt and leave less than 11% of the residues unpaired. A surprisingly large number of these H-bonds are BH-bonds, which are defined as having an acceptor OP1 (pro-Sp), OP2 (pro-Rp), O3′, O4′, O5′, or O2′ or a donor 2′OH. It has been shown that base protons H2, H5, H6, and H8 can also form stabilizing interactions with phosphate oxygens OP1 and OP2,26 but these interactions were not included in the present analysis. BH-bonds constitute 34% of the total number of RNA–RNA H-bonds and involve 39% of all RNA residues. These statistics are, of course, strongly biased towards large multi-domain rRNA molecules such as the small and large ribosomal subunits (Table 1), but even the average percentage among the eight structures analyzed (25% of all H-bonds and 31% of all residues) is greater than we intuitively expected prior to this analysis. The lowest number of BH-bonds, 10% of all H-bonds, is observed in the hairpin ribozyme, which has few inter-domain contacts. The majority, 2111 BH-bonds, use the ribose 2′-hydroxyl group, either as donor (1766 times), acceptor (704 times), or both (359 times; see Table 2); 31% of all ribose hydroxyl groups participate in hydrogen bonding. Nevertheless, other backbone acceptor atoms are also used quite frequently. The frequencies of the phosphate oxygens OP1 and OP2, ribose ring oxygen O4′, and phosphodiester oxygen O3′ normalized to the total number of residues are comparable to those of guanine N3 and N7 normalized to the total number of guanines (Table 2). Together, BH-bonds with OP1 and OP2 acceptors account for 10% of all H-bonds and involve 12% of all nucleotides. 
The phosphodiester oxygen O5′ is the least used acceptor, both among BH-bonds overall and among BH-bonds with a 2′OH donor (Table 2); the reasons for this are currently not entirely clear. Purine amino groups are the most common donors in BH-bonds, although the cytosine amino group and guanine and uracil imino protons are also observed (Table 2).
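The counts quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (Table 2 itself is not reproduced); the variable names are illustrative and are not taken from the authors' tools.

```python
# Counts quoted in the text; names are illustrative, not from mf3d.
total_nt = 5372          # nucleotides analyzed
paired_nt = 4796         # nucleotides in at least one long-range H-bond
total_hbonds = 8327      # all long-range RNA-RNA H-bonds
bh_bonds = 2811          # H-bonds involving backbone atoms
oh_donor, oh_acceptor, oh_both = 1766, 704, 359

# Inclusion-exclusion: a bond with 2'OH as both donor and acceptor
# is counted once in each list, so subtract the overlap.
oh_bonds = oh_donor + oh_acceptor - oh_both


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


print("2'OH BH-bonds:", oh_bonds)
print("BH-bonds, % of all H-bonds:", round(percent(bh_bonds, total_hbonds)))
print("2'OH bonds, % of all H-bonds:", round(percent(oh_bonds, total_hbonds)))
print("unpaired nt, %:", round(percent(total_nt - paired_nt, total_nt), 1))
```

The donor, acceptor, and "both" counts combine to the 2111 hydroxyl BH-bonds given in the text, and the percentages agree with the rounded values quoted (34%, 25%, and less than 11%).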
The distributions of geometries of observed H-bonds are summarized in Fig. 1. A noticeable number of observed H-bonds appear not to have optimal geometries, owing either to short heavy-atom distances or to small H-bond angles; some of them should probably be classified as steric clashes rather than H-bonds. The reasons for the nonoptimal geometries are not clear; a possible explanation is refinement deficiencies due to the moderate resolution of these structures (Table 1). In any case, BH-bonds (Fig. 1H) do not appear to have this problem more severely than nucleobase–nucleobase H-bonds (Fig. 1G and I). A specific peculiarity of BH-bonds is a more frequently observed long heavy-atom distance. This feature is pronounced in both OH–O and NH–O BH-bonds (Fig. 1B and E), but, interestingly, not in the 2′OH–O2′ BH-bonds specifically (Fig. 1C).
Below, we will show some examples of BH-bonds, including reviewing some motifs identified previously. In discussing various types of base pairing, we will follow the classification of Leontis and Westhof,3 according to which base pairs are divided into 12 groups based on their interacting edges (Watson–Crick, Hoogsteen, or sugar) and orientation of glycosidic bonds (cis or trans).
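The count of 12 groups follows from combining three edges into six unordered edge pairs and two glycosidic bond orientations. The sketch below enumerates them; the numbering it produces is an assumption about the ordering, chosen so that it agrees with the four groups named explicitly in this report (groups 6, 8, 10, and 12). The function name is hypothetical.

```python
from itertools import combinations_with_replacement

EDGES = ("Watson-Crick", "Hoogsteen", "sugar")
ORIENTATIONS = ("cis", "trans")


def leontis_westhof_groups():
    """Enumerate edge-pair/orientation combinations as numbered groups."""
    groups = {}
    number = 1
    # Six unordered pairs of edges, each in cis and trans: 6 x 2 = 12.
    for edge_a, edge_b in combinations_with_replacement(EDGES, 2):
        for orientation in ORIENTATIONS:
            groups[number] = (orientation, edge_a, edge_b)
            number += 1
    return groups


GROUPS = leontis_westhof_groups()
for number, (orientation, edge_a, edge_b) in GROUPS.items():
    print(f"group {number:2d}: {orientation}-{edge_a}/{edge_b}")
```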
Ribose hydroxyl groups are frequently used in BH-bonds, either as donor or acceptor (Table 2). Indeed, several structural motifs involving the 2′OH group have been identified in the past. A-minor refers to a series of structural motifs where the ribose and minor groove edge of an adenosine interact with the sugar edge of a Watson–Crick pair, most commonly GC, within an RNA stem.18,27 In effect, the adenine forms a minor-groove triple in this motif. In the analyzed set of structures, there are 226 minor-groove triples (not necessarily all within the context of an RNA stem); in 184 of them, there are BH-bonds present. The majority of these triples, 108, have adenine as the third residue, and 89 of them are GC–A triples. In the A-minor I motif, the adenine interacts with both residues of the Watson–Crick GC pair of the stem, making H-bonds AN1–GO2′ and AN3–GN2 and bifurcated H-bonds AO2′–CO2 and AO2′–CO2′. Similar interactions, but without the AN3–GN2 H-bond, are observed when an adenine forms a minor-groove triple with a Watson–Crick AU pair within an RNA stem,27 but they are less common. The AG interaction within the A-minor I motif can also be classified as a Leontis–Westhof group 12 AG pair (trans-sugar edge/sugar edge). Such pairs are also found outside of the context of RNA stems and even outside of base triples. In the data set analyzed here, there are 69 occurrences of such pairs (an example is shown in Fig. 2A); they account for about one-third of all AN1–O2′ H-bonds (Table 2).
In the A-minor II motif, the adenine interacts only with the ribose of the cytosine of the GC base pair, with the adenine N3 and O2′ atoms making bifurcated H-bonds with the 2′OH of the cytosine.18 Such adenine–ribose interactions are also frequently found outside of the context of RNA stems or triples; they are not cytosine-specific, although cytosines are the most common partners. In the set of structures analyzed here, 106 such pairs are present (16 A, 47 C, 18 G, and 25 U). These interactions account for the majority of the 144 observed AN3–O2′ H-bonds; an example is shown in Fig. 2B.
The adenine amino group is one of the most common donors in BH-bonds, involving more than 20% of adenines (Table 2); in the majority of cases (59%), ribose hydroxyl groups serve as acceptors of such H-bonds. Fig. 2C shows one example of such an interaction; the AG is a group 6 pair (trans-Watson–Crick/sugar edge) in this case. Such pairs are relatively infrequent; out of 13 group 6 AG pairs found, eight have the AN6–O2′ H-bond. Fig. 2D shows a much more frequent group 10 AG pair (trans-Hoogsteen/sugar edge). Out of 95 such pairs in the data set analyzed, 67 pairs have the AN6–O2′ H-bond. It appears that this H-bond requires a C3′–endo sugar conformation in the G residue. If the sugar is C2′–endo, the AN6–O4′ H-bond is formed instead (Fig. 2E; observed in 13 out of 95 group 10 AG pairs).
The G-ribo motif defines a side-by-side arrangement of two RNA stems stabilized by the sugar edge interaction of a guanine in a GC pair of one stem with ribose atoms of another stem;24 such motifs are present in pseudoknot structures in rRNA.28 The G-ribo interaction is characterized by two BH-bonds, GN2–O4′ and GO2′–O2′ (Fig. 3A), and it can also be found outside of the context of RNA stem packing. Somewhat different variants of guanine–ribose interactions are also possible, one with the reverse order of hydrogen bonding, GN2–O2′ and GO2′–O4′ (Fig. 3B), and another with three BH-bonds: GN2–O3′, GN3–O2′, and GO2′–O2′ (Fig. 3C). These motifs are relatively infrequent; 11, 4, and 4 examples of the three variants, respectively, were found in the analyzed set of structures. Interestingly, they are present only in rRNA, both 16S and 23S.
Along-groove packing motifs (AGPM), or P-interactions, entail backbone interactions between two RNA helices packed against each other. Such motifs entail two base pairs interacting via their sugar edges, either two Watson–Crick GC pairs or a GC pair and a wobble GU pair.19,22 A number of specific arrangements have been identified for such pairs of base pairs.19,22,23,25 Two recently identified specific arrangements of a pair of GC pairs were termed ribo-base (see Fig. 8 in ref. 25). In ribo-base type 1, only guanines interact directly forming H-bonds GN2–GO2′ and GO2′–GN2. In ribo-base type 2, one GC pair is flipped 180° around the dyad axis and shifted such that G from the first pair interacts with both G and C in the second GC pair forming H-bonds GO2′–GN2 and GN2–CO2′. Two more arrangements have been described for a pair of GC pairs and two more for a combination of GC and GU pairs (see Fig. 2 in ref. 22). Various arrangements can be used in tandem (four base pairs altogether) to facilitate packing of helices; such motifs appear to be associated with a perpendicular arrangement of helices.25 To avoid confusion with terminology, we suggest reserving the terms AGPM or P-interactions to describe backbone-mediated packing of RNA helices and to use ribo-base to describe specific arrangements of two base pairs, especially because the latter can be potentially found outside of the helix packing context.
A special role for wobble GU pairs in P-interactions has been highlighted by Mokdad et al.23 who showed that participation in helix packing leads to a stronger conservation of the GU pairs in rRNA sequences. In addition, two local structural motifs involving BH-bonds with the shallow minor groove side of wobble GU pairs have been identified in this work (see Fig. 1 in ref. 23). One is the O2′-in-pocket interaction and another is the phosphate-in-pocket interaction (see also the next section on phosphate groups in H-bonds). Although relatively infrequent, these local motifs can be observed both within and outside of the helix packing context. We located 32 similar motifs (all of them in rRNA) among 147 wobble GU pairs present in the set of structures from Table 1: 16 are O2′-in-pocket motifs, 8 are phosphate-in-pocket motifs, and 8 more have both ribose and phosphate atoms forming BH-bonds with base atoms on the minor groove side of the GU pair (not shown). This count is somewhat different from that published previously,23 because of somewhat different criteria used: only base donor and acceptor atoms of GU pairs were considered, to exclude purely ribose–ribose interactions. Formation of BH-bonds with the major (non-glycosidic) groove side of the GU pairs is extremely rare: only two such cases are observed with the O2′ forming H-bonds with O4, O6 or N7 atoms in the analyzed structures, one in the 23S rRNA and another in the group I intron structure (not shown). It has been observed that sometimes protein side chains and Mg2+ ions form H-bonds with this side of GU pairs.23
The ribose zipper is a frequent structural motif defined as hydrogen bonding between the ribose 2′-hydroxyl groups of at least two consecutive residues and the 2′-hydroxyl groups of at least two other residues antiparallel to the first two.16,20,29 There are 61 ribose zipper motifs in the set of structures analyzed here, mostly in rRNA of large and small ribosomal subunits, and also in the P4–P6 domain of group I intron and in the hairpin ribozyme. These interactions account for about one-third of all O2′–O2′ H-bonds in these structures.
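The ribose zipper definition translates into a simple search over a list of O2′–O2′ H-bonds. The following is a minimal sketch, not the authors' mf3d implementation: it assumes a single chain with integer residue numbers, and the function name and input format are hypothetical.

```python
def find_ribose_zippers(o2_o2_bonds):
    """Find two-step antiparallel ribose zippers.

    o2_o2_bonds: iterable of (residue_a, residue_b) integer pairs, each
    an O2'-O2' H-bond between two distinct residues of one chain
    (a simplifying assumption of this sketch).
    Returns a sorted list of ((i, j), (i + 1, j - 1)) bond pairs.
    """
    # Store each bond once, with the lower residue number first.
    bonds = {(min(a, b), max(a, b)) for a, b in o2_o2_bonds}
    zippers = []
    for i, j in sorted(bonds):
        # The four residues i, i+1, j-1, j must all be distinct.
        if j - i < 3:
            continue
        # Antiparallel partner: the next residue on one strand bonds
        # to the previous residue on the other strand.
        if (i + 1, j - 1) in bonds:
            zippers.append(((i, j), (i + 1, j - 1)))
    return zippers


example_bonds = [(10, 50), (49, 11), (20, 80), (30, 31)]
print(find_ribose_zippers(example_bonds))
```

A zipper spanning three or more consecutive residues is reported here as overlapping two-step zippers; merging them into a single motif would be a natural extension.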
Although less frequently than ribose 2′-hydroxyl groups, oxygens on phosphoryl groups often participate in hydrogen bonding (Table 2). In this section, we present some examples of such BH-bonds. Group 8 symmetric AA pairs (trans-Hoogsteen/Hoogsteen), in addition to N7-amino interactions, can be further stabilized by amino-to-OP1/OP2 H-bonds (Fig. 4); see also ref. 3. There are twenty group 8 AA pairs in the analyzed structures; one AN6–AOP1/OP2 H-bond (Fig. 4A) is observed in 13 of them, and two such H-bonds (Fig. 4B) are observed in three cases. It appears that a C3′–endo sugar pucker is required for such hydrogen bonding. Most often, the OP2 oxygen is used as acceptor (in 17 out of 19 H-bonds).
Group 10 AG base pairs (trans-Hoogsteen/sugar edge) were discussed above relative to the amino-to-hydroxyl hydrogen bonding. In many instances (in 43 cases out of 67 AG pairs; shown in Fig. 2D), the phosphate oxygen OP2 of the residue immediately upstream of A is making additional bifurcated H-bonds with imino and amino protons of G (Fig. 5). This XA–G motif can be a part of a more complex structure called the sarcin–ricin loop motif,17,21,30 but the XA–G block is much more frequent than the complete sarcin–ricin loop.
Major-groove triples are less common than minor-groove triples in the analyzed RNA structures; 96 such triples were found with the third residue interacting with the purine Hoogsteen edge of a Watson–Crick pair. In 39 of them, there are additional BH-bonds stabilizing the triples, including 13 instances with phosphate oxygens, usually OP2, serving as acceptors (Fig. 6). Modeling has shown that such triples with additional O2′–OP2 H-bonds can form extended triple helices,31 although such H-bonds have not been found experimentally in RNA triple helices.
Ribose hydroxyl–phosphate oxygen interactions can frequently stabilize bulged-out conformations. We identified 34 stacked n and n + 2 bases with O2′–OP1/OP2 H-bonds; OP1 and OP2 acceptors are used with similar frequency (Fig. 7). Finally, by analogy to the ribose zipper, we identified a ribose–phosphate zipper. In this motif, consecutive residues from two strands form O2′–OP1/OP2 H-bonds. Fifteen such motifs are present in the analyzed set of structures, seven O2′–OP1 zippers (Fig. 8A), three O2′–OP2 zippers (Fig. 8B), and five mixed O2′–OP1/OP2 zippers (Fig. 8C). The orientation of strands in ribose–phosphate zippers can be either parallel (Fig. 8A) or antiparallel (Fig. 8B and C). Flexibility in the choice of OP1 or OP2 acceptor atom increases the number of feasible variants when packing together two RNA segments.
Hydrogen bonding entailing sugar–phosphate backbone donor and acceptor atoms is very common in multi-domain RNA structures. Such interactions create a multitude of structural motifs, some of which are reviewed here. Systematic classification of such motifs, which is still lacking, will help elucidate principles of organization of complex RNA conformations and improve methods predicting RNA structures.
A set of high-resolution multi-domain RNA crystal structures was selected from the Protein Data Bank (PDB)32 for the analysis. A resolution cutoff of 2.5 Å was chosen so as to include the highest-resolution structure of the small ribosomal subunit.33 To avoid redundancy, only one structure of each type of RNA was selected (Table 1).
The atomic coordinates of the RNA structures were downloaded from the PDB database32 and analyzed with tools written in the Python programming language (mf3d, Motif Finder in 3D structures). These tools represent a collection of functions searching for various structural features in 3D RNA structures, such as H-bonds, base pairs or triples, stems, apical and internal loops, etc. The flexibility of Python allows for quick development of additional functions tailored to specific research questions.
Imino and amino protons were added to RNA residues from purely geometric considerations, i.e., using the N–H distance of 1.01 Å, the H–N–H angle of 120° in amino groups, and assuming a planar geometry for amino groups. H-bonds involving imino and amino donor protons were defined as having the distance between the heavy atoms below 3.3 Å and the H-bond angle above 110°. No protons were added to the ribose 2′-hydroxyl groups, because their placement must depend on the energy and cannot be based on purely geometric considerations. H-bonds with the 2′OH donor were accepted if the distance between the heavy atoms was below 3.3 Å. Such H-bond definitions are rather arbitrary, because calculations with empirical force fields show that the H-bond configurations remain weakly stabilizing even with the heavy-atom distance beyond 3.3 Å, when the H-bond angle is close to 180° (data not shown). H-bonds formed within a residue or between sequential residues were excluded from the analysis; also, no attempts were made to analyze H-bonds between the RNA and proteins or other ligands present in the structures. The backbone hydrogen bonding statistical analysis, therefore, counts only H-bonds between residue n and residue n + 2 or greater.
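The acceptance criteria described above can be written as a short predicate. This is a minimal sketch under stated assumptions, not the mf3d code: the H-bond angle is taken to be the donor–H···acceptor angle measured at the proton, coordinates are plain (x, y, z) tuples in Å, and all names are hypothetical.

```python
import math

MAX_HEAVY_ATOM_DISTANCE = 3.3   # Å, cutoff quoted in the text
MIN_HBOND_ANGLE = 110.0         # degrees, cutoff quoted in the text
MIN_SEQUENCE_SEPARATION = 2     # residue n to residue n + 2 or greater


def angle_deg(a, vertex, c):
    """Angle a-vertex-c in degrees."""
    v1 = [p - q for p, q in zip(a, vertex)]
    v2 = [p - q for p, q in zip(c, vertex)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cosine = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def is_long_range_hbond(donor, acceptor, separation, hydrogen=None):
    """Apply the distance, angle, and sequence-separation criteria.

    hydrogen=None models the 2'OH donor, for which no proton is placed
    and only the heavy-atom distance is tested.
    """
    if separation < MIN_SEQUENCE_SEPARATION:
        return False
    if math.dist(donor, acceptor) >= MAX_HEAVY_ATOM_DISTANCE:
        return False
    if hydrogen is None:
        return True
    # Assumption: the angle is donor-H...acceptor, measured at H.
    return angle_deg(donor, hydrogen, acceptor) > MIN_HBOND_ANGLE


# Imino donor with a proton 1.01 Å away, pointing at the acceptor.
print(is_long_range_hbond((0, 0, 0), (2.9, 0, 0), 5, hydrogen=(1.01, 0, 0)))
```

Passing the proton as an optional argument keeps one function for both donor types, which mirrors the distinction the text draws between amino or imino donors and the 2′OH donor.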
Hydrogen-bonded pairs of nucleobases were classified following Leontis and Westhof3 by specifying the interacting edge for each base as “Watson–Crick edge”, “Hoogsteen edge”, or “sugar edge”. In addition to the interacting edges, base pairs were classified as cis or trans according to the orientation of the glycosidic bonds relative to the direction of H-bonds. We have not included in this analysis base protons H2, H5, H6 and H8 that have weak positive partial charges, although their interaction with oxygen and nitrogen atoms with negative partial charges can be considered as weak H-bonds (see, e.g., ref. 26, 34 and 35) and are also likely to contribute to stabilization of RNA conformations. Pseudo-rotation parameters of sugars were calculated with the Fitparam program.36 UCSF Chimera37,38 was used to prepare molecular graphics.
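For readers unfamiliar with pseudo-rotation parameters, the sketch below computes the phase angle P from the five endocyclic ribose torsions using the standard Altona–Sundaralingam relation. It is not the Fitparam algorithm, which is not described here; the function names and the pucker ranges used (0–36° for C3′-endo, 144–180° for C2′-endo) are assumptions based on the conventional definitions.

```python
import math

SIN_SUM = math.sin(math.radians(36)) + math.sin(math.radians(72))


def pseudorotation_phase(nu0, nu1, nu2, nu3, nu4):
    """Phase angle P in degrees, in [0, 360).

    nu0..nu4 are the endocyclic torsions in degrees:
    nu0 = C4'-O4'-C1'-C2', nu1 = O4'-C1'-C2'-C3', nu2 = C1'-C2'-C3'-C4',
    nu3 = C2'-C3'-C4'-O4', nu4 = C3'-C4'-O4'-C1'.
    """
    numerator = (nu4 + nu1) - (nu3 + nu0)
    denominator = 2.0 * nu2 * SIN_SUM
    # atan2 selects the correct quadrant when nu2 is negative.
    return math.degrees(math.atan2(numerator, denominator)) % 360.0


def pucker_class(phase):
    """Coarse pucker label from the phase angle (conventional ranges)."""
    if 0.0 <= phase <= 36.0:
        return "C3'-endo"
    if 144.0 <= phase <= 180.0:
        return "C2'-endo"
    return "other"


def torsions_from_phase(phase, amplitude=38.0):
    """Ideal torsions nu_j = amplitude * cos(P + 144 * (j - 2))."""
    return [amplitude * math.cos(math.radians(phase + 144.0 * (j - 2)))
            for j in range(5)]


for ideal_phase in (18.0, 162.0):
    phase = pseudorotation_phase(*torsions_from_phase(ideal_phase))
    print(round(phase, 1), pucker_class(phase))
```

The helper `torsions_from_phase` generates idealized torsions only to exercise the calculation; for real structures the torsions would be measured from the atomic coordinates.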
This work was supported in part by the National Institutes of Health grant AI46967 and Binational Science Foundation grant 2001065.
†This paper is dedicated to Professor Wojciech J. Stec on his 70th birthday. This article is part of a themed issue on Biophosphates.