CAG repeats occur predominantly in the coding regions of human genes, which suggests their functional importance. In some genes, these sequences can undergo pathogenic expansions leading to neurodegenerative polyglutamine (poly-Q) diseases. The mutant transcripts containing expanded CAG repeats possibly contribute to pathogenesis in addition to the well-known pathogenic effects of mutant proteins. We have analysed two crystal forms of RNA duplexes containing CAG repeats: (GGCAGCAGCC)2. One of the structures has been determined at atomic resolution (0.95Å) and the other at 1.9Å. The duplexes include non-canonical A–A pairs that fit remarkably well within a regular A-helix. All the adenosines are in the anti-conformation and the only interaction within each A–A pair is a single C2-H2···N1 hydrogen bond. Both adenosines in each A–A pair are shifted towards the major groove, although to different extents; the A which is the H-bond donor stands out more (the ‘thumbs-up’ conformation). The main effect on the helix conformation is a local unwinding. The CAG repeats and the previously examined CUG structures share a similar pattern of electrostatic charge distribution in the minor groove, which could explain their affinity for the pathogenesis-related MBNL1 protein.
Trinucleotide repeats have received special attention in biomedical research because some of them are known to undergo pathogenic expansions leading to incurable triplet repeat expansion diseases (TREDs) (1). Several TREDs are triggered by expanded triplet repeats located in UTRs of the implicated genes. These include fragile X syndrome (FXS) caused by abnormally elongated CGG repeats located in the 5′-UTR of the fragile X mental retardation gene (FMR1), and myotonic dystrophy type 1 (DM1) caused by an expanded CUG repeat present in the 3′-UTR of the dystrophia myotonica protein kinase gene (DMPK). Most of the TREDs are triggered by expanded CAG repeat tracts that occur in protein-coding regions of specific single genes that are transcribed and translated into functionally unrelated proteins. The genetic diseases having this mutational basis include Huntington’s disease, spinal and bulbar muscular atrophy (SBMA), dentatorubral-pallidoluysian atrophy (DRPLA) and several spinocerebellar ataxias (SCAs). These disorders are also known under the common name of polyglutamine (poly-Q) diseases because the CAG repeat tracts encode polyglutamine, which results in altered protein structure, shown in numerous studies to be involved in pathogenesis (2,3).
Over this decade, several authors have proposed that mutant transcripts containing expanded CAG repeats may also contribute to the pathogenesis of polyglutamine diseases (4–9). Among the arguments used in favour of this possibility was the similarity of the RNA secondary structure formed by the CAG repeats and CUG repeats implicated in RNA-mediated pathogenesis of DM1 and SCA8 (4,10). The CAG and the CUG repeats in transcripts have been shown to interact with the same splicing regulator MBNL1 (7,11), whose sequestration by CUG tracts causes alternative splicing aberrations leading to DM1 (12) and SCA8 (13). Moreover, it was shown using repeats expressed from suitable genetic constructs that expanded non-translated CAG repeats were capable of triggering neurodegeneration in vivo in the Drosophila eye model of SCA3 (8).
The CAG repeat tracts composed of six or more repeat units are present in about 300 human genes and these sequences are strongly overrepresented in exons, which suggests their positive selection and functional importance (14). Little is known about physiological roles of CAG repeats in transcripts but their structural characterisation is more advanced. Most of the relevant structural studies have been carried out using biochemical methods on CAG repeats buried in sequence context of mRNAs of genes implicated in poly-Q disease (15–17). Short CAG repeat tracts were shown to be single-stranded, but longer repeats formed fairly stable hairpins in which alternating A–A interactions occurred between the G–C and C–G base pairs (18). Also structures formed by pure CAG repeats were compared with those formed by other triplet repeats, using both biochemical (10,20) and biophysical (19,20) methods. The CAG repeats were shown to form considerably more stable hairpins than CUG repeats of the same length (20) but the specific structural factor responsible for this difference could not be identified.
In this study, we have determined the crystal structure of an RNA duplex containing consecutive CAG repeats, based on two different crystal forms of GGCAGCAGCC. One of the structures has been determined at atomic resolution (0.95Å). The study addresses the issue of the detailed structure as well as similarities and differences between the CAG and CUG repeat structures.
The rGGCAGCAGCC oligomer was synthesized on an Applied Biosystems DNA/RNA synthesizer, using cyanoethyl phosphoramidite chemistry. Commercially available A, C and G phosphoramidites with 2′-O-tert-butyldimethylsilyl protection were used for the synthesis of RNA (Glen Research, Azco, Proligo). The details of deprotection and purification of oligoribonucleotides were described previously (21). The RNA oligomer was dissolved in 100mM KCl to the final concentration of 1mM and annealed for 10min at 65°C, then cooled slowly to ambient temperature within 2–3h. Two forms of crystals were obtained, rhombohedral and trigonal, by the hanging drop/vapour diffusion method. The crystallization drops initially contained 2μl of RNA and 2μl of the reservoir solution. The initial volume of the reservoir solution was 500μl. The rhombohedral crystal appeared after almost 1 year at 19°C in 25mM MgSO4, 50mM Tris–HCl pH 8.5 and 1.8M (NH4)2SO4. The size of the crystal was 0.4×0.4×0.5mm. The second form was obtained at 30°C in 10mM MgSO4, 50mM cacodylate-NaOH pH 6.5 and 2M (NH4)2SO4. Crystals appeared within 2–3 weeks and then were moved to 19°C.
X-ray diffraction data were collected for the rhombohedral crystal on the BL 14.1 beam line at the BESSY synchrotron in Berlin to a resolution of 0.95Å, and for the trigonal form on EMBL X13, DESY, Hamburg, to a resolution of 1.9Å. Both forms were cryoprotected by 20% glycerol (v/v) in the mother liquor. The data were integrated and scaled using the program suite DENZO/SCALEPACK (22). Although the cell parameters of the two crystal forms were similar, the space groups were different: R32:H and P32 (details in Supplementary Data). Solving the structures by molecular replacement, using PHASER (23), revealed different packing of the r(GGCAGCAGCC)2 oligomer in the same crystal cell. Early stages of the refinement were done using the program Refmac5 (24) from the CCP4 program suite (25); subsequent refinement was carried out with PHENIX (26). Approximately 1000 reflections (5%) were set aside for the Rfree statistic. The program Coot (27) was used for visualisation of the 2Fo–Fc and Fo–Fc electron density maps and for manual rebuilding of the atomic model. Solvent water molecules were added by ARP/wARP (28) working in the default solvent building mode. During the refinement of the rhombohedral structure, anisotropic temperature factors were applied and hydrogen atoms were added to the model. The last few cycles were performed using all data, including the Rfree set. The final cycles of the refinement were carried out without stereochemical restraints. The trigonal model was refined using isotropic B-factors and a TLS strategy.
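As a reading aid for the drop composition described above, the following minimal Python sketch estimates the starting concentrations in a drop made of equal volumes of RNA solution and reservoir solution. It assumes ideal volume additivity and ignores the subsequent vapour equilibration against the reservoir; the function name `drop_concentrations` is illustrative, and all concentrations are taken from the text and expressed in mM.

```python
def drop_concentrations(stock_a, vol_a_ul, stock_b, vol_b_ul):
    """Concentrations (mM) right after mixing two solutions.

    Assumes additive volumes; vapour diffusion afterwards is not modelled.
    """
    total = vol_a_ul + vol_b_ul
    species = sorted(set(stock_a) | set(stock_b))
    return {
        s: (stock_a.get(s, 0.0) * vol_a_ul + stock_b.get(s, 0.0) * vol_b_ul) / total
        for s in species
    }


# Values from the text, in mM (1.8 M ammonium sulphate = 1800 mM).
rna_solution = {"RNA": 1.0, "KCl": 100.0}
reservoir_rhombohedral = {"MgSO4": 25.0, "Tris-HCl": 50.0, "(NH4)2SO4": 1800.0}

if __name__ == "__main__":
    mixed = drop_concentrations(rna_solution, 2.0, reservoir_rhombohedral, 2.0)
    for name, conc in mixed.items():
        print(f"{name}: {conc:g} mM")
```

With a 1:1 mixture every component starts at half its stock concentration, so the drop then concentrates as water leaves it towards the reservoir.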
The helical parameters were calculated using 3DNA (29). Sequence-independent measures were used, based on vectors connecting the C1′ atoms of the paired residues, to avoid computational artefacts arising from non-canonical base pairing. Program PDB2PQR (30) was used to assign partial charges and radii to atoms of the models, according to the AMBER force field. Subsequently, the surface electrostatic potential for the RNA models was calculated with APBS (31). All pictures were drawn using PyMOL v0.99rc6 (32). The coordinates of both crystallographic models have been deposited with the Protein Data Bank (PDB). The accession codes are 3NJ6 and 3NJ7.
In the rhombohedral structure, the asymmetric unit contains one RNA strand (chain H). The second strand of the duplex is symmetry-related via a crystallographic 2-fold axis. In the trigonal structure, the asymmetric unit contains three duplexes: A+B, C+D, E+F. All the duplexes stack end-to-end, forming semi-infinite columns parallel to the c cell edge. The RNA interacts with ordered water molecules and sulphate anions. The models are summarized in Table 1 and Supplementary Table S1. Crystal lattice interactions are discussed in Supplementary Data.
The sequence-independent helical parameters, based on inter-strand vectors between C1′ atoms, were found to be a convenient, although simplified, measure of the helix properties (Supplementary Table S2). All the duplexes are in the A-form, with the Zp values (the displacement of the phosphorus atom from the xy-plane of the ‘middle frame’ between neighbouring base-pairs) in the range 2.3–3.1Å (33). The sugar conformation of most residues is 3′-endo, with the exception of 5G in chain A and 5G in chain E, which show the 2′-exo puckering. Values of the torsion angles fit in the typical range for the A-form (Supplementary Table S3). However, some distortions are observed, mainly for α- and γ-angles of all guanosines 5G. Instead of the common A-RNA conformation –gauche –sc, the α-angles for 5G in all seven strands are in different conformational regions: +sp for chain H, +ac for chain A, C and E, –ac for B, D and F. Only in chain B the γ-angle is in the typical +sc. It is +ap in H, –ap in A, C and E, +ac in D and F. This can be visualized as flipping of the O5′ atom due to a rotation of the O5′–C5′ bond. The effect on the conformation of the RNA strand is that the sugar rings of 5G and 4A become nearly co-planar (Figure 1). The corresponding helical twist shows unwinding of the duplex in the AG/CA steps, with helical twist values in the range 18–22°, compared to an average of 31° for other steps. Overall, the duplexes are underwound with average values of 12.5–12.9 base pairs per turn. The major groove opens up in the middle of each duplex to >20Å (Supplementary Table S4). The inter-strand distance measured between the C1′ atoms of the paired residues is typical for A-RNA—10.7Å, with standard deviation of 0.2Å. It is only slightly longer (11.0Å) for the paired adenosines.
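The link between the per-step twist values and the number of base pairs per turn quoted above is simple arithmetic, sketched below in Python. The function names are illustrative and are not part of 3DNA. The example assumes that two of the nine steps of the 10-bp duplex are the unwound AG/CA steps, taken at the midpoint (20°) of the reported 18–22° range, with the remaining seven steps at the reported average of 31°.

```python
def mean_twist_from_steps(twists_deg):
    """Arithmetic mean of per-step helical twist values (degrees)."""
    if not twists_deg:
        raise ValueError("at least one step is required")
    return sum(twists_deg) / len(twists_deg)


def bp_per_turn(mean_twist_deg):
    """Number of base pairs needed for a full 360-degree turn."""
    if mean_twist_deg <= 0:
        raise ValueError("twist must be positive")
    return 360.0 / mean_twist_deg


if __name__ == "__main__":
    # Assumed step pattern: 2 unwound AG/CA steps + 7 regular steps.
    steps = [20.0] * 2 + [31.0] * 7
    twist = mean_twist_from_steps(steps)
    print(f"mean twist: {twist:.1f} deg, {bp_per_turn(twist):.1f} bp/turn")
```

Under these assumptions the mean twist is about 28.6°, or roughly 12.6 bp per turn, which falls within the 12.5–12.9 range reported for the duplexes.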
All the base pairs are well defined in the electron density and the atomic temperature factors do not show any clear patterns of variability along the RNA sequence. The C–G pairs show the Watson–Crick interactions. All the adenosines are in the anti conformation and the only interaction within each A–A pair is a single C2-H2···N1 hydrogen bond (Figure 2). In the rhombohedral, atomic resolution structure, the distance between C2 and N1 is 3.41Å. When the H2 atom is included in the riding position, its distance to N1 is 2.44Å, which is shorter by 0.3Å than the sum of their van der Waals radii. The C, H and N atoms are almost co-linear (bond angle=176°). Consistent with the atomic resolution data, the distances between C2 and N1 in the trigonal structure are in the range 3.1–3.4Å. Of the two conformations that are possible in each A–A pair, the adenosine closer to the 3′-end is shifted towards the major groove, as indicated by the λ-angle (87° on average) between the bond C1′–N9 and the line between the C1′ atoms of the paired residues. The other adenosine is also upturned (λ=64°) compared to the other residues in both structures (average λ=55° in the range 51–58°).
Three kinds of stacking interactions are observed in both structures: one for the GC/GC step and two for the CA/AG step, depending on the conformation of the adenosines (Figure 3). The Watson–Crick pairs show extensive overlaps typical for canonical base pairs. Reduced stacking is observed for steps involving non-canonical pairing. The more upturned adenosines stack with adjacent cytosines on the 5′-side (average overlap area 2.2±1.1Å²), but are far removed from, and do not stack with, the guanosines adjacent on the 3′-side. The less upturned adenosines stack to a certain degree with both their adjacent residues (0.8±0.1Å² with C and 0.3±0.3Å² with G).
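The three numbers quoted for the C2-H2···N1 contact (C2···N1 = 3.41 Å, H2···N1 = 2.44 Å, angle at H2 = 176°) can be checked for internal consistency with the law of cosines. The Python sketch below solves for the implied C2–H2 distance and compares the H···N contact with a van der Waals sum. The radii used (H 1.20 Å, N 1.55 Å, Bondi values) are an assumption, since the text does not say which radii were used, and the function names are illustrative.

```python
import math

# Assumed van der Waals radii (Bondi), in angstroms; not stated in the text.
VDW_H = 1.20
VDW_N = 1.55


def donor_h_distance(d_donor_acceptor, d_h_acceptor, angle_at_h_deg):
    """Donor-H distance implied by D...A, H...A and the D-H...A angle.

    Law of cosines at H: DA^2 = x^2 + HA^2 - 2*x*HA*cos(angle),
    solved for the positive root x.
    """
    cos_t = math.cos(math.radians(angle_at_h_deg))
    b = -2.0 * d_h_acceptor * cos_t
    c = d_h_acceptor ** 2 - d_donor_acceptor ** 2
    disc = b * b - 4.0 * c
    if disc < 0:
        raise ValueError("inconsistent geometry")
    return (-b + math.sqrt(disc)) / 2.0


def vdw_shortening(d_h_acceptor):
    """How much shorter the H...N contact is than the assumed radii sum."""
    return (VDW_H + VDW_N) - d_h_acceptor


if __name__ == "__main__":
    print(f"implied C2-H2: {donor_h_distance(3.41, 2.44, 176.0):.2f} A")
    print(f"shortening vs vdW sum: {vdw_shortening(2.44):.2f} A")
```

The implied C2–H2 distance of about 0.97 Å is in the range typical of riding hydrogen positions, and the shortening of about 0.3 Å agrees with the value given in the text.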
The hydration of the A–A pairs forms a pattern similar in all the duplexes in the two crystal forms. In the minor groove, there is a single water molecule associated with each adenosine, bridging N3 with the O2′ atom of the ribose ring. In the major groove, there is usually a water molecule associated with N7. A sulphate ion is wedged between the paired adenine rings. One oxygen atom (O1) of the sulphate ion interacts simultaneously with N6 of the less upturned (towards the major groove) adenosine and with N1 of the other base. Another sulphate oxygen (O2) interacts with N6 of the same base (Figure 2). The interaction with the sulphate can be described as a merging of the anion binding sites described as ADE_WC_H and ADE_WC (34). The occupancy factor of the sulphate ion is ~0.5. In its absence, a water molecule occupies the position of O1.
The hydration of C–G pairs also shows regularity, especially in the high-resolution structure. In the major groove, guanosines interact with two or three water molecules. There are always two water molecules H-bonded with the O6 and N7 atoms. The third is associated with the phosphate group. There are one or two water molecules associated with the cytosines in the major groove. One always interacts with the exo-amino group and the other is bound to the phosphate. In the minor groove, the guanosines have the capacity to interact with two water molecules: one at the exo-amino group, the other between the N3 atom and the ribose ring. However, in two cases, one of the water molecules is displaced by H-bonds formed with the oxygen atom of a symmetry-related residue. Cytosines each have one water molecule in the minor groove, H-bonded to the O2 atom.
In the trigonal lattice, there are six ribose–ribose intermolecular interactions, each consisting of four CH···O hydrogen bonds: two C1′-H1′···O2′ and two C4′-H4′···O4′. Every 3C and every 8G is involved, on each strand. The paired sugars are either between two cytidines (two pairs), two guanosines (two pairs) or between a cytidine and a guanosine (two pairs). Similarly, in the rhombohedral structure, two symmetry-related pairs are observed of 3C with 8G (Supplementary Figure S2). The average donor-acceptor distance is 3.4Å with the standard deviation 0.1Å.
The calculated electrostatic potential shows a similar surface charge distribution for all duplexes. In the minor groove, the bands of positive and negative charge are arranged alternately along the direction of the helix axis (Figure 4). The positive bands are formed by the sugar rings and the exo-amino groups of G. The major groove is predominantly electronegative with patches of positive potential at the stacked A and C residues. The hydrogen atoms of the amino groups of each A–A pair form a stripe of electropositive potential across the major groove. These are the binding places of the sulphate ions. Adjacent cytosines generate additional positive patches on either side of the A–A pair, while the adjacent guanosines form electronegative niches in the surface of the major groove. The large shift between A and G, leading to their unstacking, results in characteristic surface features in both grooves: an indentation in the minor groove due to the protruding adenosine and a corresponding niche in the major groove at the adjacent (on the 3′-side) guanosine.
This work is part of a project to determine high-resolution crystal structures of all four CNG repeats, in order to identify their common and distinguishing features, which can then be interpreted in terms of their function. The atomic model, interpreted in the context of the known physicochemical properties such as the ligand affinity, surface features, electrostatic profile, hydrogen-bonding network and hydration, can be used in ligand design.
One distinguishing feature of the presented structures is the A–A wobble, which to our knowledge has not been observed before. One other example of A–A pairing is found within the ribosome model (pdb code 1FFK) (35), in which both residues are in the anti conformation and interact with their Watson–Crick edges within an irregular double-stranded structure. However, the details of their interactions are different: the N6 amino group is H-bonded with N1, the two adenine rings are far from co-planar and it is doubtful if there is any significant interaction between N1 and C2 (distance 3.7Å). The C1′–C1′ distance in the ribosomal structure is 12.4Å—a consequence of the large size of the two purines interacting vis-a-vis. In contrast, the A–A pairs presented here, embedded in CAG repeats, fit remarkably well within a regular A-helix. Although accommodation of the interacting purine rings seems to be sterically demanding, the inter-strand C1′–C1′ distances for the adenosine residues are only slightly larger than average. This ability to conform to the helical form is worth noting in view of the fact that in the literature the CNG double stranded forms are often referred to as ‘containing internal loops’. In terms of the 3D structure, the main consequence of an A–A pair seems to be a local unwinding of the duplex. For the adenosine that is inclined towards the minor groove, the α-torsion angle of the following G takes positive or high negative values, and γ takes high values (Figure 1, Supplementary Table S3). Consequently, the two sugar rings on either side of the phosphate group are almost co-planar, as opposed to the typical case, in which the successive sugar rings follow the helical twist. The values of the twist associated with A–A and the following pair are small (in the range 18–22°) compared with the average value of 31–32° for other steps in the duplexes.
The only interaction between the paired adenosine residues is the weak C2-H2···N1 bond. Carbon is a poorer donor than nitrogen or oxygen and the latter two clearly dominate in the H-bonding interactions in biological molecules. The energy of C-H···X bonds is estimated to be ~1kcal/mol or less, with the C-H···N bonds being weaker, less frequent and poorly studied compared to C-H···O. The energies, although small, are not negligible and correspond to measurable effects on the thermal stability of the duplex. Remarkably, in a thermodynamic study of related RNA sequences (19), bromination of one adenine, which is expected to force the adenosine residue into the syn conformation, results in a destabilisation of the helix, as indicated by an increase of the free energy of duplex formation by ~0.7kcal/mol, with a decrease of the melting temperature by a corresponding 4°C. The effect seems to be additive when more A–A pairs are modified.
In order to assess the biological significance, one needs to consider the present structure in the wider context of CNG repeats. Pathogenesis involving expanded runs of CAG repeats is well known to occur at the protein level. However, the role of the transcripts should also be considered. In one type of spinocerebellar ataxia the abnormal CAG run is found only in the UTR, the 5′-UTR promoter (36). In binding studies of MBNL1 protein, CAG repeats show an affinity similar to that of CUG runs, both in vitro and in vivo (7,11,37). A detailed and well-parameterized structural model is necessary to explain the physicochemical properties of CAG structures and to use them as targets for ligands to block the translation of poly-Q mutant proteins. When comparing the present structure of the CAG repeats and the previously described CUG repeats (38), one can observe both similarities and differences (Table 2 and Supplementary Figures S3–5). In terms of the overall helical twist the CAG-containing duplexes are underwound (12.5–12.9bp/turn), while CUG structures are more typical (11bp/turn). The major groove in the CAG helices is wide and shallow, while in CUG it is narrow and deep. Each A–A and U–U pair can assume two alternative relative positions, depending on which base is the H-bond donor or acceptor (as it happens, the acceptor in both types of repeat is more inclined towards the minor groove). The four CAG-containing duplexes in the two crystal structures are closely superposable and they all show the same order of A–A pairing conformations. The first adenosine, from the 5′-end, always acts as the H-bond acceptor within the A–A pair and the second A is in the ‘thumbs-up’ conformation, pointing towards the major groove, and acts as the H-bond donor. This is not a consequence of crystal symmetry, at least in the case of the trigonal structure, in which the three duplexes are crystallographically independent.
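To put the ~0.7 kcal/mol destabilisation in perspective, a free-energy difference can be converted into a ratio of duplex-formation equilibrium constants through the standard relation K1/K2 = exp(ΔΔG/RT). The Python sketch below does this conversion. The reference temperature of 37°C is an assumption, as the text does not state the temperature to which the free energies refer, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def equilibrium_ratio(ddg_kcal_per_mol, temp_celsius=37.0):
    """Fold change in equilibrium constant for a free-energy change.

    temp_celsius defaults to an assumed 37 C; adjust it if the
    thermodynamic data refer to another temperature.
    """
    temp_kelvin = temp_celsius + 273.15
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temp_kelvin))


if __name__ == "__main__":
    one_pair = equilibrium_ratio(0.7)
    # If the effect is additive, two modified A-A pairs double the free-energy change.
    two_pairs = equilibrium_ratio(2 * 0.7)
    print(f"one modified pair: ~{one_pair:.1f}-fold lower K")
    print(f"two modified pairs: ~{two_pairs:.1f}-fold lower K")
```

At the assumed temperature, 0.7 kcal/mol corresponds to roughly a threefold reduction in the equilibrium constant for duplex formation, and additive effects multiply the fold changes.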
The observed structures all correspond to one of three theoretically foreseeable arrangements of two consecutive A–A pairs (the other, unobserved, arrangements of the two adenosines within a strand would have alternating but reversed inclinations or similar inclinations). This can be contrasted with the structure of (CUG)n duplexes which show an apparently random order of two possible U–U pairing conformations within the CUG repeats (38).
One clear similarity between CAG- and CUG-duplexes is the pattern of stripes of alternating positive and negative electrostatic potential in the minor groove. The structural basis of the pattern is similar in both types of repeats except that the negative potential in the CAG structure is due to the imine groups of the adenosine residues, while in the CUG structure this is due to the carbonyl oxygen atoms of the uridines. This could explain some features of the repeat tracts, such as the reported affinity of the MBNL1 protein for both CUG and CAG (38). In the case of CCG repeats (whose structure is still unknown), which are also recognized by the MBNL1 protein, the C–C pairs should also contribute electronegative potential in the minor groove, due to their carbonyl groups. In contrast, the CGG-containing duplexes, which do not interact directly with the MBNL1 protein (40), are expected to present in the minor groove at least one prominent amine group. In the major groove, on the other hand, the A–A pair shows an affinity for sulphate ions. This is apparently due to the exposed Watson–Crick edge of the adenosine in the ‘thumbs-up’ conformation. The binding of the sulphate is a consequence of high concentration of the anions in the crystallization medium and is unlikely to take place to a significant degree in the cell, but the interaction in the crystals can be taken as an indication of an affinity of the exposed Watson–Crick edge of the adenine for negatively charged bidentate ligands.
The accession numbers are PDB 3NJ6, 3NJ7.
Supplementary Data are available at NAR Online.
Ministry of Science and Higher Education (Poland, N-N301-0171634, PBZ-MNiSW-07/I/2007, PBZ-MNiI-2/1/2005, PBZ-KBN-124/P05/2004); the European Community – Research Infrastructure Action under the FP6 ‘Structuring the European Research Area’ Programme (through the ‘Integrated Infrastructure Initiative’ Integrating Activity on Synchrotron and Free Electron Laser Science – Contract R II 3-CT-2004-506008); Fellowship of the Foundation for Polish Science (to R.K.). Funding for open access charge: Research Grant.
Conflict of interest statement. None declared.