For each base pair in Figures –, the source (NDB filename) and resolution of the X-ray data (in Å), as well as the C1′–C1′ distance (also in Å) are provided in the lower right corner. As higher resolution examples are obtained of each base pair, they may be conveniently substituted for the pair shown. In those cases where an example of a base pair was not found in a crystal structure, the pair was modeled using known structures as templates and basic principles of hydrogen bonding. The pairs used as templates for modeled pairs are noted in the lower right of the panel. Blank spaces in Figures – indicate base combinations for which no example has been found and for which no reasonable model could be proposed based on current knowledge. Sugar ring atoms are drawn for those cases where the O2′ participates (or could potentially participate) in hydrogen bonding to the base (or ribose O2′) of the partner nucleotide. Otherwise the entire sugar moiety is designated with a closed circle.
Isostericity matrices
Generally, base pairs belonging to the same geometric family exhibit very similar relative orientations of their glycosidic bonds, implying the maintenance of the local orientations of the strands and thus of the three-dimensional organization. However, in the general case, not all base pairs belonging to a single geometric family are isosteric to each other, because the C1′–C1′ distances may be quite different. Thus, the C1′–C1′ distance can be used to group the base pairs within each geometric family into isosteric subsets or subfamilies. Recognizing subsets of isosteric base pairs within a family identifies the pairs that can substitute for each other while preserving the three-dimensional structure. This is crucial information for three-dimensional modeling of tertiary interactions, prediction of motifs, and the generation and refinement of accurate structural alignments. In the following, each geometric family is considered in turn and the isosteric subsets of base pairs identified from Figures – are summarized in the form of isostericity matrices in Tables –.
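The grouping step described above can be illustrated with a minimal Python sketch. It assumes that all pairs already belong to one geometric family (so that glycosidic bond orientations are similar) and uses the C1′–C1′ distance as the only remaining criterion. The function names, the 0.5 Å tolerance and the distances in `example` are hypothetical placeholders chosen for illustration, not values taken from the figures.

```python
import math

def c1_distance(c1_a, c1_b):
    """Euclidean distance (in Å) between two C1' positions given as (x, y, z)."""
    return math.dist(c1_a, c1_b)

def group_by_distance(distances, tolerance=0.5):
    """Group pairs of ONE geometric family into candidate isosteric subsets.

    Pairs are sorted by C1'-C1' distance; a pair joins the current group if
    its distance is within `tolerance` Å of the previous pair's distance.
    The tolerance is an assumed, illustrative threshold.
    """
    groups = []
    previous = None
    for pair, d in sorted(distances.items(), key=lambda item: item[1]):
        if previous is not None and d - previous <= tolerance:
            groups[-1].append(pair)
        else:
            groups.append([pair])
        previous = d
    return groups

# Hypothetical distances (Å), for illustration only.
example = {"A-U": 10.5, "G=C": 10.6, "A/G": 12.5, "U/U": 8.6}
print(group_by_distance(example))
```

Distance alone does not separate pairs such as G/U and U/G, which is why the matrices discussed below also record whether a pair is self-isosteric.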
Table 3. Isostericity matrices for base pairing Families 1–6
Table 5. Isostericity matrix for the cis bifurcated geometry
Cis Watson–Crick/Watson–Crick (Family 1). We begin with the base pairs belonging to the cis Watson–Crick/Watson–Crick geometric family, shown in Figure . The (canonical) Watson–Crick pairs, A-U, U-A, G=C and C=G, form an isosteric subfamily, which we designate I1 in the isostericity matrix for this family, shown in Table (first row, left). Likewise, the wobble pairs G/U and A(+)/C form an isosteric subgroup, I2. However, unlike the pairs of I1, the wobble pairs are not self-isosteric and, thus, the wobble pairs U/G and C/A(+) comprise a third isosteric subset, which is nevertheless related to I2 and is therefore designated i2. In certain contexts the wobble pairs can substitute for canonical cis Watson–Crick/Watson–Crick pairs within a helix; that is, they are compatible with the canonical pairs. However, substitution of a G/U or A(+)/C pair for a U/G or C/A(+) results in a larger structural perturbation in a helical context (29) and thus R/Y wobble pairs are usually not compatible with Y/R wobble pairs.
The pairs A/G and G/A constitute a fourth subfamily, designated I3. Like the canonical pairs (I1) they are self-isosteric. I4 consists solely of the A/A pair, since the G/G combination cannot occur in this geometry. C/U and U/C are self-isosteric and comprise subset I5. Interestingly, in high resolution structures this pair is consistently observed with an inserted water molecule, bridging between the imino positions of the bases, perhaps because of repulsion between the O2 atoms of the interacting pyrimidines (30). Consequently the C1′–C1′ distance for the water-inserted C/U pair is significantly larger than expected for a pyrimidine–pyrimidine pair, and close to that of cis Watson–Crick/Watson–Crick A/G. Interestingly, U/C is observed to co-vary with A/G in the anticodon stem of tRNAs (27). Thus, in certain contexts C/U and A/G are compatible.
The isosteric wobble pairs C(+)/C and U/U, both of which have been observed, comprise the final isosteric subgroup of the cis Watson–Crick/Watson–Crick geometric family, designated I6. The C1′–C1′ distance in this subfamily is significantly smaller than that of any of the others, including the water-inserted U/C.
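The subsets just described for Family 1 can be written down as a small lookup structure. The following Python sketch is only an illustration of how the text's assignments might be encoded: the names `FAMILY1`, `subfamily`, `isosteric` and `self_isosteric` are hypothetical, and protonation states such as A(+) or C(+) are ignored, so that pairs are identified by base letters alone.

```python
# Isosteric subsets of the cis Watson-Crick/Watson-Crick family, transcribed
# from the text. Pairs are (first base, second base); protonation is ignored.
FAMILY1 = {
    "I1": {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")},
    "I2": {("G", "U"), ("A", "C")},  # A is protonated in A(+)/C
    "i2": {("U", "G"), ("C", "A")},  # the reversed wobble pairs
    "I3": {("A", "G"), ("G", "A")},
    "I4": {("A", "A")},
    "I5": {("C", "U"), ("U", "C")},
    "I6": {("C", "C"), ("U", "U")},  # one C is protonated in C(+)/C
}

def subfamily(pair):
    """Return the subset label of a pair, or None if it is not listed."""
    for label, members in FAMILY1.items():
        if pair in members:
            return label
    return None

def isosteric(p, q):
    """True if both pairs are listed and fall in the same subset."""
    label = subfamily(p)
    return label is not None and label == subfamily(q)

def self_isosteric(pair):
    """True if a pair is isosteric with its own reversal (e.g. A/G and G/A)."""
    return isosteric(pair, (pair[1], pair[0]))

print(isosteric(("A", "U"), ("G", "C")))  # canonical pairs share I1
print(self_isosteric(("G", "U")))         # G/U (I2) versus U/G (i2)
```

Keeping I2 and i2 as separate keys preserves the distinction the text draws between a wobble pair and its reversal, while the shared numeral records that the two subsets are related.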
Trans Watson–Crick/Watson–Crick (Family 2). Representative base pairs belonging to the trans Watson–Crick/Watson–Crick geometric family are shown in Figure and the isostericity matrix is shown in the right panel of the first row of Table . The trans orientation of the glycosidic bonds allows for a possible 2-fold axis perpendicular to and passing through the middle of the base pair. Unlike the corresponding cis pairs, the A/U (designated I1) and G/C (designated I2) pairs are not isosteric. However, these and all trans Watson–Crick/Watson–Crick pairs are self-isosteric and thus Table is symmetric with respect to the main diagonal. The pairs A/C and G/U are isosteric, but not isosteric with A/U or G/C, and thus form a third group, I3. The homopurine pairs A/A and G/G are isosteric (I4) but A/G cannot form with two hydrogen bonds. As for the cis Watson–Crick/Watson–Crick family, all possible trans Watson–Crick/Watson–Crick pairs have been observed in crystal structures.
The trans Watson–Crick/Watson–Crick C/C pair shown in Figure has three hydrogen bonds and requires protonation of one cytosine at N1. It is from a crystal structure of cysteinyl tRNA at 2.6 Å resolution (PR0004). An alternative hydrogen bonding pattern can be proposed that does not require protonation but involves only two hydrogen bonds (CN1–CN4 and CN4–CN1), which would make C/C isosteric with U/U rather than U/C. This geometry is observed at lower resolution (3.5 Å) for the tertiary base pair (C1773/C2565) in the structure of the 23S rRNA of Deinococcus radiodurans (RR0051). This pair corresponds to the tertiary interaction U1838/U2621 in the 23S rRNA of Haloarcula marismortui (U1782/U2586 in the Escherichia coli sequence) and was first identified by sequence analysis based on the co-variation of U/U and C/C for these positions (31). Thus we favor grouping U/C and C/U in one isosteric subgroup (I5) and C/C with U/U in another (I6). The observed U1432/C1394 pair (RR0033) has a sodium ion bridging UO4–CO2 (compare with cis Watson–Crick/Watson–Crick).
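Because every trans Watson–Crick/Watson–Crick pair is self-isosteric, the matrix for this family can be stored on unordered pairs, which makes it symmetric by construction. The Python sketch below illustrates this with the subgroup assignments favored in the text; the names `GROUPS` and `build_matrix` are hypothetical, protonation is ignored, and "-" marks the A/G combination, which the text says cannot form with two hydrogen bonds.

```python
BASES = "ACGU"

# Unordered pairs mapped to the subgroup labels given in the text.
# frozenset("A") stands for the homo pair A/A, and so on.
GROUPS = {
    frozenset("AU"): "I1",
    frozenset("GC"): "I2",
    frozenset("AC"): "I3",
    frozenset("GU"): "I3",
    frozenset("A"): "I4",
    frozenset("G"): "I4",
    frozenset("UC"): "I5",
    frozenset("C"): "I6",
    frozenset("U"): "I6",
}

def build_matrix():
    """Return a 4x4 matrix of subgroup labels indexed by BASES order."""
    return [
        [GROUPS.get(frozenset((row, col)), "-") for col in BASES]
        for row in BASES
    ]

matrix = build_matrix()
for base, row in zip(BASES, matrix):
    print(base, *row)
```

Keying on `frozenset` rather than on ordered tuples is the design choice that encodes self-isostericity; a family such as cis Watson–Crick/Watson–Crick, where G/U and U/G differ, would need ordered keys instead.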
Cis Watson–Crick/Hoogsteen (Family 3). Representative pairs in this family are shown in Figure and the corresponding isostericity matrix in Table . U/A, U/G and C(+)/G have been observed and together with C/U (modeled on C/G and A/G) are grouped into the isosteric subfamily I1. Modeled base pairs are indicated in Tables and in parentheses. Cytosine requires protonation at N3 to form C(+)/G. C/C and U/U have both been observed and are grouped into subfamily I2, which is related to I1 by a lateral shift in the hydrogen bonds. A(+)/G has been observed at high resolution (1.9 Å) and requires protonation of AN1 to form. A(+)/G is grouped with G/A (observed) and A/U (modeled) in subfamily I3. G/G is related to A(+)/G and G/A by a lateral shift in the hydrogen bonding positions, and thus G/G is grouped separately (I4).
Table 4. Isostericity matrices for base pairing Families 7–12
The cis Watson–Crick/Hoogsteen interaction often occurs as part of a base triple. The base that interacts with its Hoogsteen edge uses its Watson–Crick edge to pair with the third base. For example, the isosteric U/U and C/C pairs comprise tertiary interactions in the conserved L11-binding site of 23S rRNA as part of such a triple. C1072·C1092=G1099 (E.coli numbering) co-varies with U·U-A in the 23S rRNAs of all phylogenetic groups. This provides another example of sequence co-variation reflecting isosteric subgroups of the isostericity matrix.
In summary, eight of the 10 pairs expected in this family have been observed. The R/R and R/Y pairs exhibit significantly longer C1′–C1′ distances than the Y/R and Y/Y pairs. In addition, isolated examples involving single hydrogen bonds and non-planar interactions have been observed (e.g. A2812/A2814 and A378/C271 in RR0033).
Trans Watson–Crick/Hoogsteen (Family 4). As for the corresponding cis geometry, the R/R and R/Y pairs of the trans Watson–Crick/Hoogsteen geometry exhibit significantly longer C1′–C1′ distances than the Y/R and Y/Y pairs (Fig. and Table , second row, right). U/A and U/C are isosteric (subfamily I1) and are related by a lateral shift to C/A, C(+)/G and U/U (subfamily I2). In fact, I1 and I2 are mutually compatible, thus U/A and C/A are observed to co-vary in the loop E motifs of 5S rRNA and SRP (2,32). U/G is placed in its own group (I3) because it is rarely observed and does not co-vary with U/A or C/A, perhaps because of the repulsion between UO2 and GO6, which may destabilize pairing in the standard geometry and favor hydrogen bonding between UO4 and GC8.
Three of the four R/R combinations form base pairs. A/A and A(+)/G are isosteric and with G/U comprise subfamily I4. G/G is related by a lateral shift to A/A and A(+)/G and is thus not exactly isosteric and so is grouped separately (I5). A(+)/G requires protonation of AN1 and has been observed in tRNA (e.g. TRNA07).
In summary, all 10 pairs expected for this family have been observed. As for the cis Watson–Crick/Hoogsteen family, isolated examples involving single hydrogen bonds and non-planar interactions also occur (e.g. A2577/C2555 and G345/A305 in RR0033).
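The distinction drawn for this family between isosteric pairs (same subfamily) and compatible pairs (different subfamilies that can nonetheless co-vary) can be made explicit in code. The sketch below is a hedged illustration covering only the pairs named in the text; `FAMILY4`, `COMPATIBLE` and `compatible` are hypothetical names, protonation is ignored, and the first base of each tuple is assumed to be the one using its Watson–Crick edge.

```python
# trans Watson-Crick/Hoogsteen pairs named in the text and their subfamilies.
# Tuple order: (Watson-Crick-edge base, Hoogsteen-edge base); this ordering is
# an assumption of the sketch. Protonation, e.g. C(+) or A(+), is ignored.
FAMILY4 = {
    ("U", "A"): "I1", ("U", "C"): "I1",
    ("C", "A"): "I2", ("C", "G"): "I2", ("U", "U"): "I2",
    ("U", "G"): "I3",
    ("A", "A"): "I4", ("A", "G"): "I4", ("G", "U"): "I4",
    ("G", "G"): "I5",
}

# Subfamilies related by a lateral shift that the text calls mutually compatible.
COMPATIBLE = {frozenset({"I1", "I2"})}

def compatible(p, q):
    """True if p and q are isosteric or belong to mutually compatible subfamilies."""
    a, b = FAMILY4.get(p), FAMILY4.get(q)
    if a is None or b is None:
        return False
    return a == b or frozenset({a, b}) in COMPATIBLE

print(compatible(("U", "A"), ("C", "A")))  # I1 with I2, co-vary in loop E
print(compatible(("U", "G"), ("U", "A")))  # I3 does not co-vary with I1
```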
Cis Watson–Crick/sugar edge (Family 5). The cis Watson–Crick/sugar edge family (Fig. and Table , third row, left) comprises four main isosteric subfamilies that are defined by the base that pairs using its Watson–Crick edge. Thus, all four A/N pairs are isosteric and all have been observed (subfamily I1). All four C/N pairs have been observed and comprise a second group, I2. Three of the G/N pairs have been observed (I3). G/A was modeled using U/A as a template. It should be isosteric to G/C and G/U. G/G displays a significantly longer C1′–C1′ distance and is therefore placed in its own subgroup (I5). U/A and U/G have been observed, whereas U/C and U/U were modeled based on G/C and G/U. The four U/N pairs are also expected to form a single isosteric group (I4).
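Since the subfamilies of this family are determined by the base that presents its Watson–Crick edge, with G/G as the single exception, the assignment reduces to a short rule. The following Python sketch is one possible encoding; the function name `family5_subfamily` and the `MODELED` set are hypothetical helpers that simply restate the assignments given above.

```python
# Subfamily determined by the base that pairs with its Watson-Crick edge.
BY_WC_BASE = {"A": "I1", "C": "I2", "G": "I3", "U": "I4"}

# Pairs the text describes as modeled rather than observed.
MODELED = {("G", "A"), ("U", "C"), ("U", "U")}

def family5_subfamily(wc_base, se_base):
    """Return the isosteric subfamily of a cis Watson-Crick/sugar edge pair.

    wc_base uses its Watson-Crick edge, se_base its sugar edge.
    G/G has a longer C1'-C1' distance and is placed in its own subgroup.
    """
    if wc_base == "G" and se_base == "G":
        return "I5"
    return BY_WC_BASE[wc_base]

for pair in [("A", "G"), ("G", "A"), ("G", "G"), ("U", "C")]:
    status = "modeled" if pair in MODELED else "observed"
    print(pair, family5_subfamily(*pair), status)
```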
Trans Watson–Crick/sugar edge (Family 6). The base pairs belonging to the trans Watson–Crick/sugar edge family are shown in Figure and the corresponding isostericity matrix in Table (third row, right). Both A/A and A/G have been observed and are isosteric. The A/G pair is more common and probably more stable as it involves two conventional base–base hydrogen bonds and a potential A(N6)–G(O2′) hydrogen bond. This interaction can occur as part of a base triple (for example A24·G7=C14 in UR0004) or as an isolated tertiary base pair (e.g. A629·G2070 or A2018·A1829 in H.marismortui 23S rRNA, RR0033). The A/Y interactions were modeled based on C/C, but these would only involve one base–base hydrogen bond (Fig. ) and are expected to occur in the context of base triples. All four A·N interactions should be isosteric (I1, Table , third row, right).
The C/A, C/G and C/C interactions have been observed and C/U can be modeled using C/C as a template. Like the A/R interactions, the C/R interactions can occur as part of base triples (e.g. C46·G43=C37 in H.marismortui 5S rRNA, RR0033) or as isolated tertiary interactions (e.g. C1981·A1983, RR0033). The C963/C959 pair from 23S rRNA belongs to a base triple in which C959 is Watson–Crick paired to A1005. The C/G pair is the only C/N trans Watson–Crick/sugar edge interaction to feature two conventional base–base hydrogen bonds and is the most common. All the C/N and A/N pairs are grouped in a single isosteric subfamily, designated I1.
The G/U pair occurs most commonly as the closing base pair in UUCG-type hairpin loops, with the G in the syn configuration and the strands antiparallel (see Table legend). The G/C trans Watson–Crick/sugar edge pair can also occur in a hairpin loop (e.g. G10/C7 in PR0022) and is isosteric with G/U, which together form the I2 subfamily. The G/R interactions are not expected to occur and have not been observed.
Examples of U/A, U/C and U/G have been observed and U/U can be modeled based on U/C (Fig. ). (An example of U/U exists in a low resolution structure, U106/U258 in the Group I intron, UR0003.) The U/A interaction occurs as part of a base quadruple with C879=G871 in a three-way junction in 23S rRNA (RR0033). The U/C interaction occurs as part of a base triple in 16S rRNA of Thermus thermophilus (RR0015) and U/G as a tertiary interaction in 23S rRNA (RR0033) that involves a bridging water molecule. The hydrogen bonding patterns in the U/Y and G/Y pairs are similar but the C1′–C1′ distances are greater in the G/Y pairs, so these form different isosteric subgroups. U/A can be grouped with U/Y (I3), but U·G is distinct (I4).
Cis Hoogsteen/Hoogsteen (Family 7). The only examples from this family have been observed in the ribosome (Fig. and Table , row one, left). They are very rare. The G2494/C2493 interaction involves adjacent nucleotides. C2493 is in the rare syn conformation and thus presents its Hoogsteen edge to interact with the Hoogsteen edge of G2494, thus allowing the CH6–GO6 hydrogen bond to form in place of the unfavorable CO2–GO6 repulsive interaction. The second example, G2616/G2617, also involves adjacent nucleotides with G2616 also in the syn conformation. A1742/G2033 is a tertiary interaction with antiparallel strands. Kinks and sharp turns in the phosphodiester backbones of the antiparallel strands allow the two bases to approach each other to form the characteristic AN6–GO6 hydrogen bond.
Trans Hoogsteen/Hoogsteen (Family 8). The trans Hoogsteen/Hoogsteen pairs are shown in Figure and the isostericity matrix in Table (first row, right). Like the trans Watson–Crick/Watson–Crick family, these pairs are self-isosteric due to symmetry. It is interesting to notice that in this family, except for one base pair, all the pairs involve a single hydrogen bond. This pair occurs in tRNA and in sarcin/ricin motifs. The sequence variations observed for these motifs correspond closely to the observed base pairs shown in Figure (2,27).
Cis Hoogsteen/sugar edge (Family 9). The cis Hoogsteen/sugar edge interaction can involve the bases of adjacent or more distant nucleotides in the polynucleotide chain. Generally, only a single hydrogen bond can form between the interacting bases (Fig. ). The best known examples are the A/A ‘platform’ (33) and the U/G ‘side-by-side’ pair of the sarcin/ricin loop motif (18). In addition to these, many other pairs of this type have been observed. Eleven examples involving immediately adjacent nucleotides have been observed and are shown in Figure . On the basis of the U/U pair, we can propose a model for U/C, and on the basis of the G/G pair we can propose G/A. In fact, cis Hoogsteen/sugar edge G/A is observed at lower resolution (~3.5 Å) in the 23S rRNA of D.radiodurans (G2035/A2034 in NDB file RR0051) at the position corresponding to G2093/G2092. Bases of non-adjacent nucleotides can form similar base pairs, but these are not isosteric to the adjacent pairs. All the pairs involving adjacent nucleotides are essentially isosteric (I1 in Table ). Non-adjacent pairs form a second isosteric group (I2). Examples of non-adjacent cis Hoogsteen/sugar edge pairs exist for many of the adjacent pairs shown in Figure , but the adjacent pair is shown by preference. Examples of non-adjacent pairs include U2527/G2525, C2787/C2785 and C2575/U2473 from 23S rRNA (H.marismortui) and A56/A54 from 5S rRNA (H.marismortui).
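For this family the isosteric group depends on sequence adjacency rather than on base identity, which is easy to check from the residue labels. The Python sketch below is an illustration under two assumptions that are not stated in the text: both nucleotides lie in the same chain, and residue numbers are consecutive integers without insertion codes. The name `family9_subgroup` is hypothetical.

```python
import re

# Labels of the form used in the text, e.g. "U2527/G2525".
PAIR_LABEL = re.compile(r"([ACGU])(\d+)/([ACGU])(\d+)")

def family9_subgroup(label):
    """Classify a cis Hoogsteen/sugar edge pair as I1 (adjacent) or I2.

    Assumes both residues are in the same chain and that sequence
    neighbors differ by exactly one in residue number.
    """
    match = PAIR_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized pair label: {label!r}")
    first, second = int(match.group(2)), int(match.group(4))
    return "I1" if abs(first - second) == 1 else "I2"

for label in ["G2035/A2034", "U2527/G2525", "C2575/U2473", "A56/A54"]:
    print(label, family9_subgroup(label))
```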
One of the most remarkable cis Hoogsteen/sugar edge pairs is U832/U831, in which a water bridges between U832(O4) and U831(O2). U/U is observed to co-vary with cis Hoogsteen/sugar edge U/G in some sarcin loop motifs (N.B.Leontis and E.Westhof, manuscript in preparation).
Trans Hoogsteen/sugar edge (Family 10). The most common interaction of this type is the ‘sheared’ A/G in which the Hoogsteen edge of A interacts with the sugar edge of G (Fig. ). In fact, this is the most commonly occurring A/G base pair. This base pair occurs in loop E of 5S rRNA and in the sarcin/ricin motif of 23S rRNA. Co-variations at these positions include A/A, A/Y (Y = U or C), C/A (C Hoogsteen, A sugar edge) and C/Y. On the basis of these co-variations and the structures of the A/G and A/A pairs, models were proposed for A/Y, C/A and C/Y (32). Subsequently, all these pairs have been observed (see Fig. ), just as modeled, and, interestingly, all are isosteric. Thus, the A/N, C/A and C/Y pairs are grouped into one isosteric subfamily, designated I1 in Table (second row, right). The G/G, U/A and U/G pairs form a second isosteric subgroup (I2) that does not co-vary with the first.
Cis sugar edge/sugar edge (Family 11). As shown in Figure , examples of almost all possible cis sugar edge/sugar edge pairs have been observed and all 16 combinations are expected to be isosteric (Table , third row, left). This interaction is not symmetric, as the O2′ of one nucleotide hydrogen bonds to the base, R(N3) or Y(O2), and to the hydroxyl O2′ of the other nucleotide. The former nucleotide is given priority (7). When that nucleotide is a pyrimidine (Y), there is in fact no direct base–base hydrogen bond. When it is a purine (R), there is a single base–base hydrogen bond (except for A·G, with two). This interaction occurs frequently between adjacent nucleotides belonging to two strands (with the 5′ nucleotide of one strand receiving from the hydroxyl group of the 3′ nucleotide of the other). Such a motif is referred to as the ‘ribose-zipper motif’ (33). Furthermore, the cis sugar edge/sugar edge interaction often occurs in combination with the trans sugar edge/sugar edge pair in the frequent and versatile recognition motif composed of adjacent cis and trans sugar edge/sugar edge base pairs (3,27).
Trans sugar edge/sugar edge (Family 12). The trans sugar edge/sugar edge base pair (Fig. and Table , third row, right) usually involves at least one adenosine. Generally, such interactions occur as part of base triples in which the adenosine (and more rarely guanosine) interacts with the sugar edge of a standard base pair. The A·A, A·G and A·C examples are of this type: A306·A340-U325 (RR0033), A867·C880=G870 (RR0033) and A20·G4=C17 (UR0004). Of these, A·G is by far the most common, since it occurs in the frequent recognition motif made of adjacent cis and trans sugar edge/sugar edge pairs. The A·U pair is found in tRNAs as part of a base triple (A21·U8·A14). The other pairs involving G are much rarer. Examples of G·G include those in which one G is canonically paired as well as isolated tertiary pairs such as G315·G336 and G2428·G2466 (RR0033). The G·U shown in Figure is a tertiary pair, whereas the G·C example is part of a base triple (G2617·C2542=G2617, RR0033). The A·N pairs form one group (I1) and the G·N pairs a second group (I2).
Bifurcated hydrogen bonding patterns. Bifurcated pairs are intermediate between two edge-to-edge geometries (Fig. and Table ). They involve interactions between an exocyclic functional group of one base and the edge of another. Bifurcated pairs may also show distinct patterns of co-variation and substitution. For example, the isosteric G·G and G·U cis bifurcated pairs, first observed at high resolution in the structure of loop E of bacterial 5S rRNA, were found to co-vary with each other and with A·C and A·A, both of which could be modeled in the same geometry (32). These pairs are intermediate to the cis Watson–Crick/Watson–Crick and the trans Watson–Crick/Hoogsteen families. The isostericity matrix (Table ) was proposed for bifurcated pairs of this kind (27). Additional examples belonging to this family of bifurcated pairs have been observed in the ribosome, including C2502/C2518 (also part of a loop E type motif) and C930/A1040.
Bifurcated pairs intermediate to the trans Watson–Crick/Hoogsteen and trans sugar edge/Hoogsteen families occur in loop E-related motifs in 16S rRNA (G581·G760, E.coli numbering), 23S rRNA (G706·G722) and the SRP (G162-G149).
Intermediate and alternative hydrogen bonding patterns. In a small number of cases, alternative hydrogen bonding patterns have been observed for particular base pair combinations. These may be due to the limited resolution of the experimental data or refinement errors or to the actual existence of distinct potential energy minima that depend on the local structural context. The symmetrical, cis Watson–Crick (wobble-like) U/U and C/C pairs provide trivial examples of the latter. For example, two uridines can pair with UO4–UN3 and UN3–UO2 hydrogen bonds or with UN3–UO4 and UO2–UN3 hydrogen bonds. Which set of hydrogen bonds occurs depends on the local context. Alternatively, U/U can open up and incorporate a bridging water molecule (34). Likewise, G and U can form a conventional wobble pair (Fig. ) or, in certain contexts, a bifurcated pair, involving two bridging water molecules (Fig. ). Two possible hydrogen bonding patterns for trans Watson–Crick C/C were discussed above. Higher resolution structural work complemented by computation is needed to determine which pattern is favored and whether this is context-dependent.
Another example is provided by the cis Watson–Crick (wobble) C·A pair, for which hydrogen bonding may be proposed between C(N4) and A(N1) and between C(N3) and A(C2) in place of hydrogen bonds between C(N1) and A(N6) and C(N3) and protonated A(N3), which are usually observed. An example with the alternative hydrogen bonding pattern is observed in the context of a base triple in 23S rRNA (C40·A441·A442 in RR0033). The triple consists of the A442·A441 cis Hoogsteen/sugar edge interaction and the alternative C40·A441 cis Watson–Crick/Watson–Crick pair. An additional hydrogen bond is observed between C40(O2) and A442(N6). Higher resolution is required to confirm this interaction.
In conclusion, it must be emphasized that base pairing is due to multiple weak interactions and thus a considerable degree of flexibility and deformation is expected. Thus, while one can generally classify base pairs into one of the 12 families discussed above, a particular base pair may form with a slightly different combination of hydrogen bonds or with the absence of one or more hydrogen bonds, depending on the structural context or on the resolution of the structure.
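The 12 families referred to above correspond to every unordered combination of the three interacting edges, each in the cis or trans orientation. As a compact reminder of that bookkeeping, the short Python sketch below enumerates them in the order in which the families are numbered in this section; the variable names are hypothetical.

```python
from itertools import combinations_with_replacement

EDGES = ("Watson–Crick", "Hoogsteen", "sugar edge")

# Six unordered edge combinations, each in cis and trans: 12 families.
FAMILIES = {}
number = 1
for edge_a, edge_b in combinations_with_replacement(EDGES, 2):
    for orientation in ("cis", "trans"):
        FAMILIES[number] = f"{orientation} {edge_a}/{edge_b}"
        number += 1

for family_number, name in FAMILIES.items():
    print(family_number, name)
```

A classification of this kind assigns the family only; as emphasized above, an individual pair may still deviate from the idealized hydrogen bonding pattern of its family.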
Interactions of a base with an ‘edge’ defined by two bases. A premise of the approach we have taken has been that complex interactions (base triples, quadruples, etc.) can be analyzed as combinations of base pairs. In a few cases this analysis breaks down and new patterns arise, which again reflect synergistic effects. An example is the interaction of the Watson–Crick edge of C with the Hoogsteen edge of a (standard) G=C base pair. Four interactions can be anticipated, cis or trans C·G Watson–Crick/Hoogsteen and cis or trans C·C Watson–Crick/Hoogsteen, and are in fact observed (see Tables and ). A fifth interaction, distinct from these, has also been observed. An example is provided by C113 interacting with the Hoogsteen edges of the C15=G66 base pair in 5S rRNA of H.marismortui (RR0033). This interaction is intermediate between the cis C·C Watson–Crick/Hoogsteen interaction seen in base triples such as C1072·C1092=G1099 in the L11-binding site of 23S rRNA (E.coli numbering, NDB file PR0015 or RR0009) and the trans C(N1+)·G Watson–Crick/Hoogsteen interaction seen in base triples such as C8(+)·G12=C26 in the frameshifting pseudoknot (UR0004). It can best be described as an interaction of the Watson–Crick edge of C113 with the Hoogsteen edge of the C15=G66 base pair, as it involves hydrogen bonds to both G66(O6) and C15(N4).