RNA molecules exhibit complex structures in which a large fraction of the bases engage in non-Watson–Crick base pairing, forming motifs that mediate long-range RNA–RNA interactions and create binding sites for proteins and small molecule ligands. The rapidly growing number of three-dimensional RNA structures at atomic resolution requires that databases contain the annotation of such base pairs. An unambiguous and descriptive nomenclature was proposed recently in which RNA base pairs were classified by the base edges participating in the interaction (Watson–Crick, Hoogsteen/CH or sugar edge) and the orientation of the glycosidic bonds relative to the hydrogen bonds (cis or trans). Twelve basic geometric families were identified and all 12 have been observed in crystal structures. For each base pairing family, we present here the 4 × 4 ‘isostericity matrices’ summarizing the geometric relationships between the 16 pairwise combinations of the four standard bases, A, C, G and U. Whenever available, a representative example of each observed base pair from X-ray crystal structures (3.0 Å resolution or better) is provided or, otherwise, theoretically plausible models. This format makes apparent the recurrent geometric patterns that are observed and helps identify isosteric pairs that co-vary or interchange in sequences of homologous molecules while maintaining conserved three-dimensional motifs.
The past 10 years have witnessed an explosion in RNA structure determination at the atomic level. An increasing number of structures of important, functionally diverse RNA molecules have been determined, including the rRNAs (5S, 16S and 23S), many tRNAs, a variety of ribozymes, part of the SRP RNA, portions of viral RNA genomes and a variety of RNA aptamers bound to their ligands (Table 1). The complexity of many of these structures challenges the ability of individual scientists to understand and visualize the diversity of interactions. Nonetheless, careful examination reveals recurrent motifs (1–4). The most fundamental motif is the edge-to-edge hydrogen bonding interaction between two bases. The prototype is the standard (canonical) Watson–Crick base pair, in which two bases interact with their Watson–Crick edges, with the glycosidic bonds oriented cis relative to the axis of the interaction (Fig. 1). Yet even the early crystallographic studies of nucleic acids revealed other modes of interaction (5). In the 1980s, with only the atomic structures of tRNAs and crystal packing interactions of small oligonucleotides to work from, compilations of non-Watson–Crick pairs appeared (6). Such compilations grouped interactions according to base type (purine–purine, purine–pyrimidine and pyrimidine–pyrimidine) rather than geometry. Recently, we proposed a classification of RNA base pairs based on geometry (7). This approach is justified by the need to identify recurrent structural motifs in new crystal structures easily (and eventually automatically) and to predict the occurrence of motifs through comparative sequence analysis. This, in turn, will lead to higher quality sequence alignments of homologous RNA molecules. RNA homology modeling is based on two main assumptions (8–10). The first is that the secondary and tertiary structures are much more highly conserved than the primary sequence. 
The second is that, just as for Watson–Crick pairs in secondary structures, those compensatory base substitutions that retain the non-Watson–Crick pairs in three-dimensional structure elements and motifs are more likely to be observed than those that cannot be accommodated. These ideas have been applied by other workers to the problem of identifying non-Watson–Crick interactions and RNA motifs using comparative sequence analysis, especially in the context of base triples (11–13).
Here, we present matrices of observed and predicted edge-to-edge interactions based on exhaustive examination of medium to high resolution (<3.0 Å) RNA crystal structures, including the recently published structures of the ribosome. In several important cases, NMR structural work provided the first observations of non-Watson–Crick base pairs (14–19). However, NMR geometries are not always determined unambiguously, and since all of these base pairs have subsequently been found in X-ray crystal structures, we chose our examples from the latter. These data provide the basis for implementing algorithms to identify and classify automatically the motifs mediating tertiary interactions in complex RNA structures. The data we present should also assist in the interpretation of RNA interference (20), modification (21) and instant evolution data (22), i.e. the assignment of possible geometries for a given interaction identified through these types of experiments. Finally, these data are crucial for generating accurate structural alignments of homologous RNA sequences.
This work relied on visual examination of high resolution X-ray crystal structures to determine hydrogen bonding patterns. Structures were obtained from the Nucleic Acid Database (http://ndbserver.rutgers.edu/NDB) and the Protein Data Bank (http://www.rcsb.org/pdb/) and were manipulated with the Swiss PDB Viewer program, available from http://www.expasy.ch/spdbv/ (23). Hydrogen bonding diagrams were prepared using the Chem3D and ChemDraw Pro programs (CambridgeSoft Corporation). Diagrams were prepared using Canvas (Deneba Software).
Figures 2–12 are available on the Internet at either http://www.bgsu.edu/departments/chem/RNA/pages or http://www-ibmc.u-strasbg.fr/upr9002/westhof/. The BGSU website also provides interactive three-dimensional views of each base pair using the CHIME plug-in.
In previous work, we showed that RNA bases (purines and pyrimidines) interact edge-to-edge using any one of three edges (Fig. 1); consequently, all base pairs involving two or more edge-to-edge hydrogen bonds belong to one of 12 geometric families (7). Each family is identified by the edges involved in the interaction and the relative orientation of the glycosidic bonds of the interacting nucleotides, cis or trans (Table 2). When the glycosidic bonds of both bases assume the default anti configuration, the relative strand orientations are those given in the third column of Table 2 (24). In Figures 2–13, representative examples of observed base pairs are provided for each geometric family. When the two interacting edges are different (for example Watson–Crick and Hoogsteen), a historically based priority rule is invoked (Watson–Crick > Hoogsteen > sugar edge), so that the base identified with each row of a given matrix is the one interacting with the higher priority edge. Thus, in Family 3, cis Watson–Crick/Hoogsteen (Fig. 4), all the pairings in the first row involve adenine interacting with its Watson–Crick edge, while all the pairings in the first column involve adenine interacting with its Hoogsteen edge. In each panel of Figures 2–13, the higher priority base appears on the left, oriented so that its Watson–Crick edge faces right. A list of referenced NDB files with primary references is provided in Table 1.
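The family numbering and the edge priority rule described above can be captured in a few lines. The following Python sketch is illustrative only: the one-letter edge codes (W, H, S) and the functions `enumerate_families` and `normalize` are hypothetical and not part of any published tool. It numbers the 12 families in the order used in this paper (cis before trans for each edge combination) and orders a pair so that the base using the higher priority edge comes first.

```python
from itertools import combinations_with_replacement

EDGES = ("W", "H", "S")  # Watson-Crick, Hoogsteen, sugar edge, listed in priority order
PRIORITY = {edge: rank for rank, edge in enumerate(EDGES)}  # lower rank = higher priority

def enumerate_families():
    """Number the 12 families: edge combinations in priority order, cis before trans."""
    families = {}
    number = 1
    for edge1, edge2 in combinations_with_replacement(EDGES, 2):
        for orientation in ("cis", "trans"):
            families[(orientation, edge1, edge2)] = number
            number += 1
    return families

FAMILIES = enumerate_families()

def normalize(base1, edge1, base2, edge2, orientation):
    """Put the base using the higher-priority edge first; return ('X/Y', family number).

    When both bases use the same edge, the order given by the caller is kept.
    """
    if PRIORITY[edge2] < PRIORITY[edge1]:
        base1, edge1, base2, edge2 = base2, edge2, base1, edge1
    return f"{base1}/{base2}", FAMILIES[(orientation, edge1, edge2)]
```

For example, `normalize("G", "H", "U", "W", "cis")` returns `("U/G", 3)`: U uses its Watson–Crick edge, so it becomes the row base of the cis Watson–Crick/Hoogsteen matrix.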
For each base pair in Figures 2–13, the source (NDB filename) and resolution of the X-ray data (in Å), as well as the C1′–C1′ distance (also in Å), are provided in the lower right corner. As higher resolution examples of each base pair are obtained, they may be conveniently substituted for the pair shown. In those cases where an example of a base pair was not found in a crystal structure, the pair was modeled using known structures as templates and basic principles of hydrogen bonding. The pairs used as templates for modeled pairs are noted in the lower right of the panel. Blank spaces in Figures 2–13 indicate base combinations for which no example has been found and for which no reasonable model could be proposed based on current knowledge. Sugar ring atoms are drawn for those cases where the O2′ participates (or could potentially participate) in hydrogen bonding to the base (or ribose O2′) of the partner nucleotide. Otherwise the entire sugar moiety is designated with a closed circle.
The sugar edge of purine and pyrimidine nucleotides includes the 2′-OH when the glycosidic bond of the nucleotide is in the usual anti domain. Thus, when one or both bases interact through the sugar edge, hydrogen bonds can form with the 2′-OH group(s) acting as either donor(s) or acceptor(s). In fact, in some of the cis sugar edge/sugar edge pairs, no direct base–base hydrogen bonds occur at all. Since the position of the 2′-OH hydrogen cannot be inferred from X-ray structures of nucleic acids, the 2′-OH is drawn as a single unit in Figures 2–13. The C–H···O hydrogen bond is well established in structural chemistry on the basis of detailed analyses of small molecule crystallography (25). Thus, we also mark interactions involving adenine H2, purine (R) H8 and pyrimidine (Y) H5 or H6 as hydrogen bonds in Figures 2–13. For hydrogen bonds not involving a C–H group, the maximum distance between heavy atoms is 3.4 Å; for hydrogen bonds involving C–H groups, the distance between heavy atoms should be less than 3.9 Å. Bridging water molecules are integral elements of a number of non-Watson–Crick base pairs (26,27). Water acts as both hydrogen bond donor and acceptor in these structures but, again, the actual positions of the hydrogen atoms cannot be inferred from available crystal structures, so water molecules are simply designated by W in Figures 2–13. Information regarding the hydrogen bonding is provided in the lower left-hand corner of each panel in Figures 2–13. Three numbers are given: (i) the number of observed or potential hydrogen bonds between two nitrogen- or oxygen-containing groups (i.e. normal hydrogen bonds); (ii) the number of hydrogen bonds involving polarized C–H groups (i.e. AH2, RH8, YH5 or YH6); (iii) the number of bridging water molecules.
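The distance criteria and the three-number summary given in each panel can be expressed as a simple tally. This is a minimal sketch under stated assumptions: candidate contacts are assumed to have been enumerated already as donor/acceptor heavy-atom pairs with distances; a donor atom whose name begins with "C" is taken to be a polarized C–H group (AH2 on C2, RH8 on C8, YH5 on C5, YH6 on C6); and, as an additional assumption not stated in the text, a water is counted as bridging when it lies within 3.4 Å of an N or O atom on each base. The `Contact` class and all function names are hypothetical.

```python
from dataclasses import dataclass

# Distance cutoffs quoted in the text (heavy atom to heavy atom, in Å).
MAX_NO_DISTANCE = 3.4   # N/O donor to N/O acceptor: distance <= 3.4
MAX_CH_DISTANCE = 3.9   # polarized C-H donor carbon to acceptor: distance < 3.9

@dataclass
class Contact:
    """Hypothetical record of one candidate donor/acceptor contact between the two bases."""
    donor: str       # donor atom name, e.g. "N3", "O2'", or "C2" for adenine H2
    acceptor: str    # acceptor atom name, e.g. "O4"
    distance: float  # heavy-atom distance in Å

def classify(contact):
    """Return 'CH', 'normal', or None when the contact exceeds the relevant cutoff."""
    if contact.donor.startswith("C"):  # assumption: carbon donors are polarized C-H groups
        return "CH" if contact.distance < MAX_CH_DISTANCE else None
    return "normal" if contact.distance <= MAX_NO_DISTANCE else None

def count_bridging_waters(water_contacts):
    """water_contacts: (water_id, base_label, distance) tuples for water to base N/O contacts.

    Assumption: a water bridges the pair if it is within MAX_NO_DISTANCE of both bases.
    """
    partners = {}
    for water_id, base_label, distance in water_contacts:
        if distance <= MAX_NO_DISTANCE:
            partners.setdefault(water_id, set()).add(base_label)
    return sum(1 for bases in partners.values() if len(bases) >= 2)

def summarize(contacts, water_contacts=()):
    """Return the panel triple: (normal H-bonds, C-H H-bonds, bridging waters)."""
    kinds = [classify(c) for c in contacts]
    return kinds.count("normal"), kinds.count("CH"), count_bridging_waters(water_contacts)

if __name__ == "__main__":
    # Hypothetical distances, for illustration only.
    contacts = [Contact("N3", "O2", 2.9), Contact("C2", "O2", 3.6), Contact("N6", "O4", 3.5)]
    waters = [("W1", "first", 2.8), ("W1", "second", 3.0), ("W2", "first", 2.7)]
    print(summarize(contacts, waters))
```

With these made-up inputs the N6 contact (3.5 Å) is rejected by the 3.4 Å cutoff, the C2 contact counts as a C–H hydrogen bond, and only W1 touches both bases, so the summary is one normal bond, one C–H bond and one bridging water.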
It is well known that adenine can be protonated at the N1 position and cytidine at the N3 position (6). The proton cannot be directly observed in nucleic acid crystal structures, but a number of interactions cannot be readily rationalized without assuming protonation. In some rare instances, experimental or theoretical support has been obtained for protonation (28). Therefore, wherever it makes chemical sense, we have indicated protonated adenine and cytidine in Figures 2–13.
The three-dimensional structures of homologous RNA molecules change much more slowly than their sequences in the course of evolution (as is also true for homologous proteins). By definition, homologous molecules share a common biological origin and a conserved function. Random point mutations in structurally crucial parts of RNA molecules are tolerated by natural selection when they have little effect on the three-dimensional structure or when they are compensated by further mutations. Such co-variations, when they occur at positions that are cis Watson–Crick paired, have been applied with great success to predict the occurrence of conserved double helices in homologous RNA molecules. The fundamental property underlying this success is the isostericity of the standard base pairs, A-U, G=C, C=G and U-A (Figure 2). The C1′–C1′ distance is identical in each of these pairs (Fig. 2, lower right of each panel), as are the relative orientations of the glycosidic bonds, considered as vectors in three-dimensional space. When two base pairs display nearly the same C1′–C1′ distance and have their glycosidic bonds oriented in the same way, they can replace each other without drastically changing the three-dimensional path and relative geometric orientations of the phosphate–sugar backbones. We denote such base pairs as ‘isosteric’, although this does not necessarily imply that the two base pairs occupy the same total volume of space, and in many cases they do not.
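The two criteria named above, similar C1′–C1′ distances and similarly oriented glycosidic bonds, can be turned into a rough numerical test. The sketch below is a hypothetical illustration, not the procedure used to build the matrices (which were derived by visual inspection). It describes a pair by its C1′–C1′ distance and by the angle each glycosidic bond (C1′→N9 for purines, C1′→N1 for pyrimidines) makes with the C1′–C1′ line, so that pairs from different structures can be compared without superposition. The tolerances `max_dd` and `max_dtheta` are arbitrary placeholders, and because these angles do not distinguish cis from trans, the test assumes both pairs already belong to the same geometric family.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def _angle_deg(u, v):
    cos = sum(x * y for x, y in zip(u, v)) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error

def pair_descriptor(c1_first, n_first, c1_second, n_second):
    """Return (C1'-C1' distance in Å, angle of first glycosidic bond, angle of second) in degrees.

    c1_*: C1' coordinates; n_*: N9 (purine) or N1 (pyrimidine) coordinates.
    Pass the higher-priority base first so that different pairs are compared consistently.
    """
    axis = _sub(c1_second, c1_first)
    distance = _norm(axis)
    theta_first = _angle_deg(_sub(n_first, c1_first), axis)
    theta_second = _angle_deg(_sub(n_second, c1_second), _sub(c1_first, c1_second))
    return distance, theta_first, theta_second

def roughly_isosteric(desc1, desc2, max_dd=0.5, max_dtheta=20.0):
    """Hypothetical test: distances within max_dd Å and both angles within max_dtheta degrees.

    Assumes both pairs are in the same geometric family (cis/trans is not checked).
    """
    (d1, a1, b1), (d2, a2, b2) = desc1, desc2
    return abs(d1 - d2) <= max_dd and abs(a1 - a2) <= max_dtheta and abs(b1 - b2) <= max_dtheta
```

In practice the coordinates would be read from an NDB or PDB entry; the tolerances would need to be calibrated against pairs already known to be isosteric.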
Generally, base pairs belonging to the same geometric family exhibit very similar relative orientations of their glycosidic bonds, implying that the local orientations of the strands, and thus the three-dimensional organization, are maintained. However, in general, not all possible base pairs belonging to a single geometric family are isosteric to each other, because their C1′–C1′ distances may be quite different. Thus, the C1′–C1′ distance can be used to group the base pairs within each geometric family into isosteric subsets or subfamilies. Recognizing subsets of isosteric base pairs within a family identifies pairs that can substitute for each other while preserving the three-dimensional structure, which is crucial information for three-dimensional modeling of tertiary interactions, prediction of motifs, and the generation and refinement of accurate structural alignments. In the following, each geometric family is considered in turn, and the isosteric subsets of base pairs identified from Figures 2–13 are summarized in the form of isostericity matrices in Tables 3–5.
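As a first pass, grouping the pairs of one family by C1′–C1′ distance amounts to one-dimensional clustering. The sketch below is a hypothetical illustration: it sorts pairs by distance and starts a new subset whenever the gap to the previous pair exceeds an arbitrary threshold (`max_gap`, here 0.4 Å). It ignores the lateral shifts in hydrogen bonding that also separate some subfamilies in this paper, so its output would still need inspection.

```python
def group_by_c1_distance(distances, max_gap=0.4):
    """Split the pairs of one geometric family into candidate isosteric subsets.

    distances: {pair_label: C1'-C1' distance in Å}. max_gap is an arbitrary threshold.
    Returns a list of lists of labels, ordered by increasing distance.
    """
    groups = []
    previous = None
    for label, distance in sorted(distances.items(), key=lambda item: item[1]):
        if previous is None or distance - previous > max_gap:
            groups.append([])  # gap too large: start a new candidate subset
        groups[-1].append(label)
        previous = distance
    return groups
```

With placeholder labels and made-up distances, `group_by_c1_distance({"pair1": 10.5, "pair2": 10.6, "pair3": 8.9, "pair4": 12.3})` yields `[["pair3"], ["pair1", "pair2"], ["pair4"]]`. Because consecutive small gaps chain together, a slowly increasing series of distances can end up in one subset, which is another reason to treat the result only as a starting point.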
Cis Watson–Crick/Watson–Crick (Family 1). We begin with the base pairs belonging to the cis Watson–Crick/Watson–Crick geometric family, shown in Figure 2. The (canonical) Watson–Crick pairs, A-U, U-A, G=C and C=G, form an isosteric subfamily, which we designate I1 in the isostericity matrix for this family, shown in Table 3 (first row, left). Likewise, the wobble pairs G/U and A(+)/C form an isosteric subgroup, I2. However, unlike I1, the wobble pairs are not self-isosteric; thus, the wobble pairs U/G and C/A(+) comprise a third isosteric subset, which is related to I2 and is therefore designated i2. In certain contexts the wobble pairs can substitute for canonical cis Watson–Crick/Watson–Crick pairs within a helix; we can say that they are compatible with the canonical pairs. However, substitution of a G/U or A(+)/C pair for a U/G or C/A(+) pair results in a larger structural perturbation in a helical context (29), and thus R/Y wobble pairs are usually not compatible with Y/R wobble pairs.
The pairs A/G and G/A constitute a fourth subfamily, designated I3. Like the canonical pairs (I1), they are self-isosteric. I4 consists solely of the A/A pair, since the G/G combination cannot occur in this geometry. C/U and U/C are self-isosteric and comprise subset I5. Interestingly, in high resolution structures this pair is consistently observed with an inserted water molecule bridging the imino positions of the bases, perhaps because of repulsion between the O2 atoms of the interacting pyrimidines (30). Consequently, the C1′–C1′ distance for the water-inserted C/U pair is significantly larger than expected for a pyrimidine–pyrimidine pair, and close to that of cis Watson–Crick/Watson–Crick A/G. Indeed, U/C is observed to co-vary with A/G in the anticodon stem of tRNAs (27). Thus, in certain contexts C/U and A/G are compatible.
The isosteric wobble pairs C(+)/C and U/U, both of which have been observed, comprise the final isosteric subgroup of the cis Watson–Crick/Watson–Crick geometric family, designated I6. The C1′–C1′ distance in this subfamily is significantly smaller than that of any of the others, including the water-inserted U/C.
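The Family 1 subfamilies described above can be stored as a lookup table keyed by ordered base combinations, which makes the difference between self-isosteric and non-self-isosteric pairs explicit. The sketch below encodes only the assignments stated in the text: protonated A(+) and C(+) are written as plain A and C, and G/G is absent because it cannot form in this geometry. The names `CIS_WW`, `isosteric`, `self_isosteric` and `matrix_rows` are hypothetical.

```python
BASES = "ACGU"

# cis Watson-Crick/Watson-Crick subfamilies as stated in the text.
# Key = (first base, second base); protonation is not encoded.
CIS_WW = {
    ("A", "U"): "I1", ("U", "A"): "I1", ("G", "C"): "I1", ("C", "G"): "I1",
    ("G", "U"): "I2", ("A", "C"): "I2",   # wobble G/U and A(+)/C
    ("U", "G"): "i2", ("C", "A"): "i2",   # reversed wobbles U/G and C/A(+)
    ("A", "G"): "I3", ("G", "A"): "I3",
    ("A", "A"): "I4",                     # G/G cannot form in this geometry
    ("C", "U"): "I5", ("U", "C"): "I5",   # water-inserted pyrimidine pairs
    ("C", "C"): "I6", ("U", "U"): "I6",   # C(+)/C and U/U
}

def isosteric(pair1, pair2, family=CIS_WW):
    """True if both combinations form in this family and share a subfamily label."""
    label = family.get(pair1)
    return label is not None and label == family.get(pair2)

def self_isosteric(first, second, family=CIS_WW):
    """True if swapping the two bases keeps the pair in the same subfamily."""
    return isosteric((first, second), (second, first), family)

def matrix_rows(family=CIS_WW):
    """4 x 4 isostericity matrix as nested lists; '-' marks combinations that do not form."""
    return [[family.get((row, col), "-") for col in BASES] for row in BASES]

if __name__ == "__main__":
    for base, row in zip(BASES, matrix_rows()):
        print(base, " ".join(f"{cell:>2}" for cell in row))
```

Here `self_isosteric("A", "G")` is true while `self_isosteric("G", "U")` is false, mirroring the I2/i2 distinction; the lowercase label is kept distinct so that the related but non-identical wobble subsets are not merged.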
Trans Watson–Crick/Watson–Crick (Family 2). Representative base pairs belonging to the trans Watson–Crick/Watson–Crick geometric family are shown in Figure 3 and the isostericity matrix is shown in the right panel of the first row of Table 3. The trans orientation of the glycosidic bonds allows for a possible 2-fold axis perpendicular to and passing through the middle of the base pair. Unlike the corresponding cis pairs, the A/U (designated I1) and G/C (designated I2) pairs are not isosteric. However, these and all other trans Watson–Crick/Watson–Crick pairs are self-isosteric, and thus this matrix in Table 3 is symmetric with respect to the main diagonal. The pairs A/C and G/U are isosteric with each other, but not with A/U or G/C, and thus form a third group, I3. The homopurine pairs A/A and G/G are isosteric (I4), but A/G cannot form with two hydrogen bonds. As for the cis Watson–Crick/Watson–Crick family, all possible trans Watson–Crick/Watson–Crick pairs have been observed in crystal structures.
The trans Watson–Crick/Watson–Crick C/C pair shown in Figure 3 has three hydrogen bonds and requires protonation of one cytosine at N3. It is from a crystal structure of cysteinyl tRNA at 2.6 Å resolution (PR0004). An alternative hydrogen bonding pattern can be proposed that does not require protonation but involves only two hydrogen bonds (CN3–CN4 and CN4–CN3), which would make C/C isosteric with U/U rather than with U/C. This geometry is observed at lower resolution (3.5 Å) for the tertiary base pair C1773/C2565 in the structure of the 23S rRNA of Deinococcus radiodurans (RR0051). This pair corresponds to the tertiary interaction U1838/U2621 in the 23S rRNA of Haloarcula marismortui (U1782/U2586 in the Escherichia coli sequence) and was first identified by sequence analysis based on the co-variation of U/U and C/C at these positions (31). Thus we favor grouping U/C and C/U in one isosteric subgroup (I5) and C/C with U/U in another (I6). The observed U1432/C1394 pair (RR0033) has a sodium ion bridging UO4 and CO2 (compare with the water-inserted cis Watson–Crick/Watson–Crick C/U pair).
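Because every trans Watson–Crick/Watson–Crick pair is self-isosteric, only unordered combinations need to be stored; the full matrix follows by symmetry. This sketch encodes the subfamily assignments given above, including the favored I5/I6 grouping of the pyrimidine pairs; the names `TRANS_WW_GROUPS`, `expand_symmetric` and `is_symmetric` are hypothetical.

```python
# trans Watson-Crick/Watson-Crick subfamilies from the text, stored as unordered pairs.
TRANS_WW_GROUPS = {
    "I1": [("A", "U")],
    "I2": [("G", "C")],
    "I3": [("A", "C"), ("G", "U")],
    "I4": [("A", "A"), ("G", "G")],   # A/G cannot form with two hydrogen bonds
    "I5": [("U", "C")],
    "I6": [("C", "C"), ("U", "U")],   # favored grouping discussed in the text
}

def expand_symmetric(groups):
    """Build an ordered-pair lookup in which (x, y) and (y, x) share a label."""
    matrix = {}
    for label, pairs in groups.items():
        for x, y in pairs:
            matrix[(x, y)] = label
            matrix[(y, x)] = label
    return matrix

def is_symmetric(matrix):
    """True if every ordered pair has the same label as its reverse."""
    return all(matrix.get((y, x)) == label for (x, y), label in matrix.items())

TRANS_WW = expand_symmetric(TRANS_WW_GROUPS)
```

The expanded table has 14 entries (the 16 combinations minus A/G and G/A), and `is_symmetric(TRANS_WW)` holds by construction; the same check applied to a non-self-isosteric table, such as one containing G/U and U/G with different labels, returns false.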
Cis Watson–Crick/Hoogsteen (Family 3). Representative pairs in this family are shown in Figure 4 and the corresponding isostericity matrix in Table 3. U/A, U/G and C(+)/G have been observed and, together with C/U (modeled on C/G and A/G), are grouped into the isosteric subfamily I1. Modeled base pairs are indicated in parentheses in Tables 3 and 4. Cytosine requires protonation at N3 to form C(+)/G. C/C and U/U have both been observed and are grouped into subfamily I2, which is related to I1 by a lateral shift in the hydrogen bonds. A(+)/G has been observed at high resolution (1.9 Å) and requires protonation of AN1 to form. A(+)/G is grouped with G/A (observed) and A/U (modeled) in subfamily I3. G/G is related to A(+)/G and G/A by a lateral shift in the hydrogen bonding positions, and thus G/G is grouped separately (I4).
The cis Watson–Crick/Hoogsteen interaction often occurs as part of a base triple. The base that interacts with its Hoogsteen edge uses its Watson–Crick edge to pair with the third base. For example, the isosteric U/U and C/C pairs comprise tertiary interactions in the conserved L11-binding site of 23S rRNA as part of such a triple. C1072·C1092=G1099 (E.coli numbering) co-varies with U·U-A in the 23S rRNAs of all phylogenetic groups. This provides another example of sequence co-variation reflecting isosteric subgroups of the isostericity matrix.
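The co-variation argument can be applied mechanically to a pair of alignment columns: if every sequence places the two positions in the same subfamily, the substitutions are consistent with a conserved cis Watson–Crick/Hoogsteen interaction. The sketch below uses the Family 3 assignments given above (first base uses its Watson–Crick edge, second base its Hoogsteen edge; protonation not encoded). The function names and the example columns are hypothetical.

```python
# cis Watson-Crick/Hoogsteen subfamilies from the text.
# Key = (base using its Watson-Crick edge, base using its Hoogsteen edge).
CIS_WH = {
    ("U", "A"): "I1", ("U", "G"): "I1", ("C", "G"): "I1", ("C", "U"): "I1",  # C(+)/G; C/U modeled
    ("C", "C"): "I2", ("U", "U"): "I2",
    ("A", "G"): "I3", ("G", "A"): "I3", ("A", "U"): "I3",                    # A(+)/G; A/U modeled
    ("G", "G"): "I4",
}

def column_subfamilies(pairs, family=CIS_WH):
    """Subfamily labels seen across aligned sequences; None marks a combination not in the family."""
    return {family.get(pair) for pair in pairs}

def isosteric_covariation(pairs, family=CIS_WH):
    """True if every aligned pair falls in one and the same subfamily."""
    labels = column_subfamilies(pairs, family)
    return len(labels) == 1 and None not in labels
```

For a hypothetical column in which some sequences carry C·C and others U·U at the two positions, as in the L11-binding site triple, `isosteric_covariation([("C", "C"), ("U", "U"), ("C", "C")])` is true because both combinations are in I2, whereas mixing C·C with U·A is not.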
In summary, eight of the 10 pairs expected in this family have been observed. The R/R and R/Y pairs exhibit significantly longer C1′–C1′ distances than the Y/R and Y/Y pairs. In addition, isolated examples involving single hydrogen bonds and non-planar interactions have been observed (e.g. A2812/A2814 and A378/C271 in RR0033).
Trans Watson–Crick/Hoogsteen (Family 4). As for the corresponding cis geometry, the R/R and R/Y pairs of the trans Watson–Crick/Hoogsteen geometry exhibit significantly longer C1′–C1′ distances than the Y/R and Y/Y pairs (Fig. 5 and Table 3, second row, right). U/A and U/C are isosteric (subfamily I1) and are related by a lateral shift to C/A, C(+)/G and U/U (subfamily I2). In fact, I1 and I2 are mutually compatible; thus U/A and C/A are observed to co-vary in the loop E motifs of 5S rRNA and SRP (2,32). U/G is placed in its own group (I3) because it is rarely observed and does not co-vary with U/A or C/A, perhaps because of repulsion between UO2 and GO6, which may destabilize pairing in the standard geometry and favor hydrogen bonding between UO4 and GC8.
Three of the four R/R combinations form base pairs. A/A and A(+)/G are isosteric and with G/U comprise subfamily I4. G/G is related by a lateral shift to A/A and A(+)/G and is thus not exactly isosteric and so is grouped separately (I5). A(+)/G requires protonation of AN1 and has been observed in tRNA (e.g. TRNA07).
In summary, all 10 pairs expected for this family have been observed. As for the cis Watson–Crick/Hoogsteen family, isolated examples involving single hydrogen bonds and non-planar interactions also occur (e.g. A2577/C2555 and G345/A305 in RR0033).
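Family 4 shows that co-variation can follow a looser relation than strict isostericity: I1 and I2 differ by a lateral shift yet are mutually compatible. One way to model this, sketched below under the assumption that compatibility is a symmetric relation between subfamily labels, is to store the compatible label pairs alongside the subfamily table. Only the I1/I2 compatibility stated in the text is encoded; the names `TRANS_WH`, `COMPATIBLE` and `can_substitute` are hypothetical.

```python
# trans Watson-Crick/Hoogsteen subfamilies from the text.
# Key = (base using its Watson-Crick edge, base using its Hoogsteen edge).
TRANS_WH = {
    ("U", "A"): "I1", ("U", "C"): "I1",
    ("C", "A"): "I2", ("C", "G"): "I2", ("U", "U"): "I2",   # C(+)/G
    ("U", "G"): "I3",
    ("A", "A"): "I4", ("A", "G"): "I4", ("G", "U"): "I4",   # A(+)/G
    ("G", "G"): "I5",
}

# Subfamily pairs stated to be mutually compatible (assumed to be symmetric).
COMPATIBLE = {frozenset({"I1", "I2"})}

def can_substitute(pair1, pair2, family=TRANS_WH, compatible=COMPATIBLE):
    """True if the pairs are isosteric or belong to subfamilies listed as compatible."""
    a, b = family.get(pair1), family.get(pair2)
    if a is None or b is None:
        return False
    return a == b or frozenset({a, b}) in compatible
```

With this table, U/A and C/A (the loop E co-variation) are allowed to substitute, while U/A and U/G are not, and A/A and G/G, separated by a lateral shift with no stated compatibility, are also kept apart.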
Cis Watson–Crick/sugar edge (Family 5). The cis Watson–Crick/sugar edge family (Fig. 6 and Table 3, third row, left) comprises four main isosteric subfamilies that are defined by the base that pairs using its Watson–Crick edge. Thus, all four A/N pairs are isosteric and all have been observed (subfamily I1). All four C/N pairs have been observed and comprise a second group, I2. Three of the G/N pairs have been observed (I3). G/A was modeled using U/A as a template. It should be isosteric to G/C and G/U. G/G displays a significantly longer C1′–C1′ distance and is therefore placed in its own subgroup (I5). U/A and U/G have been observed, whereas U/C and U/U were modeled based on G/C and G/U. The four U/N pairs are also expected to form a single isosteric group (I4).
Trans Watson–Crick/sugar edge (Family 6). The base pairs belonging to the trans Watson–Crick/sugar edge family are shown in Figure 7 and the corresponding isostericity matrix in Table 3 (third row, right). Both A/A and A/G have been observed and are isosteric. The A/G pair is more common and probably more stable, as it involves two conventional base–base hydrogen bonds and a potential A(N6)–G(O2′) hydrogen bond. This interaction can occur as part of a base triple (for example A24·G7=C14 in UR0004) or as an isolated tertiary base pair (e.g. A629·G2070 or A2018·A1829 in H.marismortui 23S rRNA, RR0033). The A/Y interactions were modeled based on C/C, but these would involve only one base–base hydrogen bond (Fig. 7) and are expected to occur in the context of base triples. All four A·N interactions should be isosteric (I1, Table 3, third row, right).
The C/A, C/G and C/C interactions have been observed and C/U can be modeled using C/C as a template. Like the A/R interactions, the C/R interactions can occur as part of base triples (e.g. C46·G43=C37 in H.marismortui 5S rRNA, RR0033) or as isolated tertiary interactions (e.g. C1981·A1983, RR0033). The C963/C959 pair from 23S rRNA belongs to a base triple in which C959 is Watson–Crick paired to A1005. The C/G pair is the only C/N trans Watson–Crick/sugar edge interaction to feature two conventional base–base hydrogen bonds and is the most common. All the C/N and A/N pairs are grouped in a single isosteric subfamily, designated I1.
The G/U pair occurs most commonly as the closing base pair in UUCG-type hairpin loops, with the G in the syn configuration and the strands antiparallel (see Table 2 legend). The G/C trans Watson–Crick/sugar edge pair can also occur in a hairpin loop (e.g. G10/C7 in PR0022) and is isosteric with G/U; together they form the I2 subfamily. The G/R interactions are not expected to occur and have not been observed.
Examples of U/A, U/C and U/G have been observed and U/U can be modeled based on U/C (Fig. 7). (An example of U/U exists in a low resolution structure, U106/U258 in the Group I intron, UR0003.) The U/A interaction occurs as part of a base quadruple with C879=G871 in a three-way junction in 23S rRNA (RR0033). The U/C interaction occurs as part of a base triple in 16S rRNA of Thermus thermophilus (RR0015) and U/G as a tertiary interaction in 23S rRNA (RR0033) that involves a bridging water molecule. The hydrogen bonding patterns in the U/Y and G/Y pairs are similar but the C1′–C1′ distances are greater in the G/Y pairs, so these form different isosteric subgroups. U/A can be grouped with U/Y (I3), but U·G is distinct (I4).
Cis Hoogsteen/Hoogsteen (Family 7). The only examples from this family have been observed in the ribosome (Fig. 8 and Table 4, row one, left). They are very rare. The G2494/C2493 interaction involves adjacent nucleotides. C2493 is in the rare syn conformation and thus presents its Hoogsteen edge to interact with the Hoogsteen edge of G2494, allowing the CH6–GO6 hydrogen bond to form in place of the unfavorable CO2–GO6 repulsive interaction. The second example, G2616/G2617, also involves adjacent nucleotides, with G2616 also in the syn conformation. A1742/G2033 is a tertiary interaction with antiparallel strands. Kinks and sharp turns in the phosphodiester backbones of the antiparallel strands allow the two bases to approach each other to form the characteristic AN6–GO6 hydrogen bond.
Trans Hoogsteen/Hoogsteen (Family 8). The trans Hoogsteen/Hoogsteen pairs are shown in Figure 9 and the isostericity matrix in Table 4 (first row, right). Like the trans Watson–Crick/Watson–Crick pairs, these pairs are self-isosteric by symmetry. Notably, all the pairs in this family except one involve only a single hydrogen bond. The exception occurs in tRNA and in sarcin/ricin motifs. The sequence variations observed for these motifs correspond closely to the observed base pairs shown in Figure 8 (2,27).
Cis Hoogsteen/sugar edge (Family 9). The cis Hoogsteen/sugar edge interaction can involve the bases of adjacent or more distant nucleotides in the polynucleotide chain. Generally, only a single hydrogen bond can form between the interacting bases (Fig. 10). The best known examples are the A/A ‘platform’ (33) and the U/G ‘side-by-side’ pair of the sarcin/ricin loop motif (18). In addition, many other pairs of this type have been observed. Eleven examples involving immediately adjacent nucleotides have been observed and are shown in Figure 10. On the basis of the U/U pair, we can propose a model for U/C, and on the basis of the G/G pair we can propose G/A. In fact, cis Hoogsteen/sugar edge G/A is observed at lower resolution (~3.5 Å) in the 23S rRNA of D.radiodurans (G2035/A2034 in NDB file RR0051) at the position corresponding to G2093/G2092. Bases of non-adjacent nucleotides can form similar base pairs, but these are not isosteric to the adjacent pairs. All the pairs involving adjacent nucleotides are essentially isosteric (I1 in Table 4). Non-adjacent pairs form a second isosteric group (I2). Examples of non-adjacent cis Hoogsteen/sugar edge pairs exist for many of the adjacent pairs shown in Figure 10, but the adjacent pair is shown in preference. Examples of non-adjacent pairs include U2527/G2525, C2787/C2785 and C2575/U2473 from 23S rRNA (H.marismortui) and A56/A54 from 5S rRNA (H.marismortui).
One of the most remarkable cis Hoogsteen/sugar edge pairs is U832/U831, in which a water bridges between U832(O4) and U831(O2). U/U is observed to co-vary with cis Hoogsteen/sugar edge U/G in some sarcin loop motifs (N.B.Leontis and E.Westhof, manuscript in preparation).
Trans Hoogsteen/sugar edge (Family 10). The most common interaction of this type is the ‘sheared’ A/G, in which the Hoogsteen edge of A interacts with the sugar edge of G (Fig. 11). In fact, this is the most commonly occurring A/G base pair. This base pair occurs in loop E of 5S rRNA and in the sarcin/ricin motif of 23S rRNA. Co-variations at these positions include A/A, A/Y (Y = U or C), C/A (C Hoogsteen, A sugar edge) and C/Y. On the basis of these co-variations and the structures of the A/G and A/A pairs, models were proposed for A/Y, C/A and C/Y (32). Subsequently, all these pairs have been observed (see Fig. 11), just as modeled, and, interestingly, all are isosteric. Thus, the A/N, C/A and C/Y pairs are grouped into one isosteric subfamily, designated I1 in Table 4 (second row, right). The G/G, U/A and U/G pairs form a second isosteric subgroup (I2) that does not co-vary with the first.
Cis sugar edge/sugar edge (Family 11). As shown in Figure 12, examples of almost all possible cis sugar edge/sugar edge pairs have been observed, and all 16 combinations are expected to be isosteric (Table 4, third row, left). This interaction is not symmetric, as the O2′ of one nucleotide hydrogen bonds both to the base, at R(N3) or Y(O2), and to the O2′ hydroxyl of the other nucleotide. The former nucleotide is given priority (7). When that nucleotide is a pyrimidine (Y), there is in fact no direct base–base hydrogen bond. When it is a purine (R), there is a single base–base hydrogen bond (except for A·G, with two). This interaction occurs frequently between adjacent nucleotides belonging to two strands (with the 5′ nucleotide of one strand receiving a hydrogen bond from the hydroxyl group of the 3′ nucleotide of the other). Such a motif is referred to as the ‘ribose-zipper motif’ (33). Furthermore, the cis sugar edge/sugar edge interaction often occurs in combination with the trans sugar edge/sugar edge pair, forming the frequent and versatile recognition motif comprised of adjacent cis and trans sugar edge/sugar edge base pairs (3,27).
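The counting rule for cis sugar edge/sugar edge pairs depends only on which nucleotide donates its O2′ and on whether that nucleotide is a purine. A minimal sketch of the rule as stated above follows; the function name is hypothetical, and the first argument is assumed to be the priority nucleotide, i.e. the one whose O2′ hydrogen bonds to the partner's base and O2′.

```python
PURINES = {"A", "G"}

def direct_base_base_hbonds_cis_ss(first, second):
    """Expected number of direct base-base hydrogen bonds in a cis sugar edge/sugar edge pair.

    first: the priority nucleotide, whose O2' hydrogen bonds to the partner's base and O2'.
    """
    if first not in PURINES:
        return 0  # pyrimidine first: only O2'-mediated hydrogen bonds, no direct base-base bond
    if (first, second) == ("A", "G"):
        return 2  # the exception noted in the text
    return 1      # any other purine first: a single base-base hydrogen bond
```

Because the rule is asymmetric, `direct_base_base_hbonds_cis_ss("A", "G")` and `direct_base_base_hbonds_cis_ss("G", "A")` give different answers, which is why the priority assignment matters when annotating these pairs.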
Trans sugar edge/sugar edge (Family 12). The trans sugar edge/sugar edge base pair (Fig. 13 and Table 4, third row, right) usually involves at least one adenosine. Generally, such interactions occur as part of base triples in which the adenosine (and more rarely guanosine) interacts with the sugar edge of a standard base pair. The A·A, A·G and A·C examples are of this type: A306·A340-U325 (RR0033), A867·C880=G870 (RR0033) and A20·G4=C17 (UR0004). Of these, A·G is by far the most common, since it occurs in the frequent recognition motif made of adjacent cis and trans sugar edge/sugar edge pairs. The A·U pair is found in tRNAs as part of a base triple (A21·U8·A14). The other pairs involving G are much rarer. Examples of G·G include those in which one G is canonically paired as well as isolated tertiary pairs such as G315·G336 and G2428·G2466 (RR0033). The G·U shown in Figure 13 is a tertiary pair, whereas the G·C example is part of a base triple (G2617·C2542=G2617, RR0033). The A·N pairs form one group (I1) and the G·N pairs a second group (I2).
Bifurcated hydrogen bonding patterns. Bifurcated pairs are intermediate between two edge-to-edge geometries (Fig. 14 and Table 5). They involve interactions between an exocyclic functional group of one base and the edge of another. Bifurcated pairs may also show distinct patterns of co-variation and substitution. For example, the isosteric G·G and G·U cis bifurcated pairs, first observed at high resolution in the structure of loop E of bacterial 5S rRNA, were found to co-vary with each other and with A·C and A·A, both of which could be modeled in the same geometry (32). These pairs are intermediate to the cis Watson–Crick/Watson–Crick and the trans Watson–Crick/Hoogsteen families. The isostericity matrix (Table 5) was proposed for bifurcated pairs of this kind (27). Additional examples belonging to this family of bifurcated pairs have been observed in the ribosome, including C2502/C2518 (also part of a loop E type motif) and C930/A1040.
Bifurcated pairs intermediate to the trans Watson–Crick/Hoogsteen and trans sugar edge/Hoogsteen families occur in loop E-related motifs in 16S rRNA (G581·G760, E.coli numbering), 23S rRNA (G706·G722) and the SRP (G162-G149).
Intermediate and alternative hydrogen bonding patterns. In a small number of cases, alternative hydrogen bonding patterns have been observed for particular base pair combinations. These may be due to the limited resolution of the experimental data or refinement errors, or to the actual existence of distinct potential energy minima that depend on the local structural context. The symmetrical, cis Watson–Crick (wobble-like) U/U and C/C pairs provide trivial examples of the latter. For example, two uridines can pair with UO4–UN3 and UN3–UO2 hydrogen bonds or with UN3–UO4 and UO2–UN3 hydrogen bonds. Which set of hydrogen bonds occurs depends on the local context. Alternatively, U/U can open up and incorporate a bridging water molecule (34). Likewise, G and U can form a conventional wobble pair (Fig. 2) or, in certain contexts, a bifurcated pair involving two bridging water molecules (Fig. 14). Two possible hydrogen bonding patterns for trans Watson–Crick C/C were discussed above. Higher resolution structural work complemented by computation is needed to determine which pattern is favored and whether this is context-dependent.
Another example is provided by the cis Watson–Crick (wobble) C·A pair, for which hydrogen bonds may be proposed between C(N4) and A(N1) and between C(N3) and A(C2), in place of the hydrogen bonds between C(N3) and A(N6) and between C(O2) and protonated A(N1) that are usually observed. This alternative hydrogen bonding pattern is observed in the context of a base triple in 23S rRNA (C40·A441·A442 in RR0033). The triple consists of the A442·A441 cis Hoogsteen/sugar edge interaction and the alternative C40·A441 cis Watson–Crick/Watson–Crick pair. An additional hydrogen bond is observed between C40(O2) and A442(N6). Higher resolution is required to confirm this interaction.
In conclusion, it must be emphasized that base pairing results from multiple weak interactions, so a considerable degree of flexibility and deformation is expected. Thus, while base pairs can generally be classified into one of the 12 families discussed above, a particular base pair may form with a slightly different combination of hydrogen bonds, or lack one or more hydrogen bonds, depending on the structural context or on the resolution of the structure.
Interactions of a base with an ‘edge’ defined by two bases. A premise of our approach has been that complex interactions (base triples, quadruples, etc.) can be analyzed as combinations of base pairs. In a few cases this analysis breaks down and new patterns arise, which again reflect synergistic effects. An example is the interaction of the Watson–Crick edge of C with the Hoogsteen edge of a standard G=C base pair. Four interactions can be anticipated (cis or trans C·G Watson–Crick/Hoogsteen and cis or trans C·C Watson–Crick/Hoogsteen), and all four are in fact observed (see Tables 4 and 5). A fifth interaction, distinct from these, has also been observed: C113 interacting with the Hoogsteen edges of the C15=G66 base pair in 5S rRNA of H. marismortui (RR0033). This interaction is intermediate between the cis C·C Watson–Crick/Hoogsteen interaction seen in base triples such as C1072·C1092=G1099 in the L11-binding site of 23S rRNA (E. coli numbering, NDB file PR0015 or RR0009) and the trans C(N1+)·G Watson–Crick/Hoogsteen interaction seen in base triples such as C8(+)·G12=C26 in the frameshifting pseudoknot (UR0004). It is best described as an interaction of the Watson–Crick edge of C113 with the Hoogsteen edge of the C15=G66 base pair, since it involves hydrogen bonds to both G66(O6) and C15(N4).
The rapidly growing database of RNA crystal structures provides examples of nearly every type of base pair. Many of the base pairs presented in Figures 2–14 were first proposed on theoretical grounds and have now been observed by X-ray crystallography at better than 3.0 Å resolution. Generally, the observed base pairs are as predicted (27,32). The overwhelming majority of base–base interactions observed in the ribosome and the other recently determined structures can be unambiguously classified into one of the 12 families of Table 2. A small number are bifurcated pairs that are intermediate between two of the 12 families (7). Care must also be taken not to confuse the trans sugar edge/sugar edge and trans Watson–Crick/sugar edge interactions, because a Watson–Crick/2′-OH hydrogen bond frequently also occurs in the trans sugar edge/sugar edge geometry.
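Because each family is defined by an unordered pair of interacting edges (Watson–Crick, Hoogsteen/CH or sugar edge) and a cis or trans orientation of the glycosidic bonds, the count of 12 follows from simple enumeration: six edge combinations times two orientations. The sketch below generates the family names; the ordering and the string labels are our own choices and are not meant to reproduce the numbering of Table 2.

```python
from itertools import combinations_with_replacement

EDGES = ("Watson-Crick", "Hoogsteen", "Sugar edge")
ORIENTATIONS = ("cis", "trans")


def base_pair_families() -> list[str]:
    """Enumerate edge/edge families; edge pairs are unordered, so W/H and H/W coincide."""
    families = []
    # combinations_with_replacement yields the 6 unordered edge pairs (W/W, W/H, ...).
    for e1, e2 in combinations_with_replacement(EDGES, 2):
        for orientation in ORIENTATIONS:
            families.append(f"{orientation} {e1}/{e2}")
    return families


families = base_pair_families()
print(len(families))  # 12
print(families[0])    # cis Watson-Crick/Watson-Crick
```

Treating the edge pair as unordered is what reduces the nine ordered edge combinations to six; the base-specific differences within a family are then captured by its 4 × 4 isostericity matrix rather than by additional families.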
Complex RNA structures also contain other kinds of interactions that remain to be analyzed and catalogued, including additional bifurcated pairs, perpendicular edge-to-edge interactions, interactions exclusively involving the ribose moiety of one or both nucleotides, and base stacking interactions.
Preliminary analyses, some of which have been presented here, indicate that there is a close correspondence between the isosteric subfamilies identified on structural grounds and the patterns of co-variation and base substitution that are observed in homologous RNA, when they are properly aligned. The primary significance of this work is that it provides a basis for evaluating and refining structural alignments for homologous RNA molecules. Consideration of the isostericity matrix corresponding to each base pair is essential for producing correct alignments at positions involved in non-Watson–Crick base pairing, or for determining that one motif has in fact been replaced by another in a set of homologous sequences.
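As a rough illustration of how an isostericity matrix could be used when checking an alignment, the sketch below looks up the subfamily label of each pair observed at two aligned positions and reports whether they all agree. The matrix is deliberately partial and hypothetical: it contains only the four canonical pairs and the G·U wobble pair of the cis Watson–Crick/Watson–Crick family, and the labels are placeholders rather than the subfamily names used in this paper. Pairs missing from the matrix yield None instead of a guess.

```python
from typing import Iterable, Optional

# Hypothetical, partial encoding of one family's isostericity matrix:
# (base at position i, base at position j) -> subfamily label.
# Labels are placeholders; only well-established entries are filled in.
CIS_WC_WC_PARTIAL = {
    ("A", "U"): "canonical",
    ("U", "A"): "canonical",
    ("G", "C"): "canonical",
    ("C", "G"): "canonical",
    ("G", "U"): "G-U wobble",  # same family, but not isosteric with the canonical pairs
}


def isosteric_column(pairs: Iterable[tuple[str, str]],
                     matrix: dict[tuple[str, str], str]) -> Optional[bool]:
    """Return True if all pairs share one subfamily label, False if the labels
    differ, and None if any pair is absent from the (partial) matrix."""
    labels = set()
    for pair in pairs:
        if pair not in matrix:
            return None
        labels.add(matrix[pair])
    return len(labels) <= 1


print(isosteric_column([("G", "C"), ("A", "U"), ("C", "G")], CIS_WC_WC_PARTIAL))  # True
print(isosteric_column([("G", "C"), ("G", "U")], CIS_WC_WC_PARTIAL))              # False
print(isosteric_column([("G", "C"), ("C", "C")], CIS_WC_WC_PARTIAL))              # None
```

Returning None for unknown pairs keeps the sketch honest: a real analysis would need the complete matrix for the family in question before judging such a column.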
Here, we have emphasized the geometrical aspects of base pairing in order to aid in their classification. Clearly, depending on the edges involved, various groups or sites will be available for interactions with another RNA segment, a protein or a small molecule. For example, when the Watson–Crick sites are not engaged, they can be used for interaction with phosphate groups. Similarly, the Hoogsteen sites are used for interactions with amino acid side chains in complexes between proteins and helices. Besides conferring geometrical similarity, the isostericity matrices contain information on compensating changes that would occur between base pairs at the level of a given functional group or a set of functional groups.
The authors acknowledge fruitful discussions with Pascal Auffinger and Luc Jaeger. This work was supported by NSF REU grant CHE-9732563 and NIH grant 2R15-GM55898.