We have determined the solution structure of the complex between an arginine-glycine-rich RGG peptide from the fragile X mental retardation protein (FMRP) and an in vitro-selected guanine-rich sc1 RNA. The bound RNA forms a novel G-quadruplex separated from the flanking duplex stem by a mixed junctional tetrad. The RGG peptide is positioned along the major groove of the RNA duplex, with the G-quadruplex forcing a sharp turn of R10GGGGR15 at the duplex-quadruplex junction. Arginines R10 and R15 form cross-strand specificity-determining intermolecular hydrogen-bonds with the major-groove edges of guanines of adjacent Watson-Crick G•C pairs. Filter binding assays on RNA and peptide mutations identify and validate contributions of peptide-RNA intermolecular contacts and shape complementarity to molecular recognition. These findings on FMRP RGG domain recognition by a combination of G-quadruplex and surrounding RNA sequences have implications for recognition of other genomic G-rich RNAs.
The regulation of gene expression by interactions between nucleic acid-binding proteins and G-quadruplexes is an area of intense interest. Bioinformatic analyses have predicted that there are potentially over 350,000 G-quadruplex-forming sequences in the human genome1,2, and recent studies of RNA G-quadruplexes in the transcriptome predict that they may be even more extensive than previously appreciated3,4. Functionally, G-quadruplex structures in RNA have been implicated in almost all aspects of pre-mRNA and mRNA metabolism including mRNA stability, IRES-dependent translation initiation5, translational repression3,6,7, alternative splicing8,9 and alternative polyadenylation/3´ end formation10,11 suggesting that the G-quadruplex may be an important regulatory motif in many aspects of gene expression.
The Fragile X mental retardation protein (FMRP) is a regulatory RNA binding protein that binds with high affinity to guanine-rich RNAs capable of G-quadruplex formation12,13. Loss of FMRP function leads to the Fragile X Syndrome, the most common form of inherited mental retardation, afflicting 1 in 2500 males, and is the leading single-gene cause of autism14. The Fragile X Syndrome almost always results from a triplet repeat expansion in the 5’-UTR, leading to abnormal methylation of the gene, repression of transcription, and hence complete loss of FMRP expression. However, one severely affected patient harbors a missense mutation in one of the RNA binding domains15, and mice harboring this mutation are indistinguishable from FMRP null mice in most behavioral and electrophysiologic assays and macroorchidism16, suggesting that loss of RNA binding activity may underlie the synaptic dysfunction observed in the disease.
Recent evidence suggests that FMRP functions to repress mRNA translation in neurons, leading to the widely held view that Fragile X Syndrome is a disease of “runaway translation” resulting in inappropriate gene expression with dire consequences for synaptic plasticity. This occurs despite the presence of two autosomal paralogs, FXR1P and FXR2P, which are also expressed in the brain. While all three proteins share a great deal of homology in their N-termini, KH domains and nuclear export signal, the C-termini containing the RGG box have diverged considerably. Consequently, the KH domains share RNA binding specificity but binding to G-rich sequences capable of forming G-quadruplexes is specific to FMRP17. These observations, together with interest in identifying the in vivo RNA ligands of FMRP suggest that understanding the role of the RGG box - G-rich RNA interaction will be important in understanding the human disease.
Binding of FMRP to G-rich RNAs requires only the RGG box RNA binding domain of FMRP, which is rich in arginines and glycines. In vitro selection identified guanine-rich RNA motifs that can bind tightly to FMRP, such as the 36-nt r(GCUGCGGUGUGGAAGGAGUGGUCGGGUUGCGCAGCG) sequence named sc112. Binding of FMRP to sc1 RNA was shown to depend on G-quadruplex formation based on the following observations: (i) the binding affinity was significantly increased in the presence of K+ as compared to Li+, and (ii) mutations of guanines within G-tracts abolished binding.
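The G-tracts referred to above can be located directly from the sc1 sequence. The following is a minimal sketch, not part of the original analysis: the function name `g_tracts` and the minimum run length of two guanines are illustrative assumptions, and residues are numbered from 1 at the 5' end.

```python
import re

# 36-nt sc1 RNA sequence as quoted in the text (5' to 3').
SC1 = "GCUGCGGUGUGGAAGGAGUGGUCGGGUUGCGCAGCG"

def g_tracts(seq, min_len=2):
    """Return (start, end, length) for each run of at least min_len guanines.

    Positions are 1-based and inclusive. The min_len threshold is an
    illustrative assumption, not a value taken from the paper.
    """
    pattern = re.compile("G{%d,}" % min_len)
    return [(m.start() + 1, m.end(), m.end() - m.start())
            for m in pattern.finditer(seq)]

tracts = g_tracts(SC1)
for start, end, length in tracts:
    print(f"G{start}-G{end}: {length} guanines")
```

Under these assumptions the sequence contains five runs of consecutive guanines; isolated guanines are not reported by this scan.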
To understand how G-rich RNAs could be recognized by the RGG domain of FMRP, we used NMR to characterize the 1:1 complex between the 36-nt sc1 RNA sequence with a 28-aa peptide from the RGG domain of FMRP (RGG peptide) (Fig. 1a). We prepared complexes with various RNA and peptide constructs, including unlabeled, uniformly 13C,15N-labeled, as well as residue-type-specific and site-specific 13C,15N-labeled sc1 RNA and RGG peptide molecules. The structure of the complex revealed a number of new motifs and recognition principles between the FMRP RGG peptide and the sc1 RNA duplex-quadruplex junction. Subsequently, the peptide-RNA intermolecular contacts and the contributions of shape complementarity to molecular recognition were validated following analysis of filter binding assays on structure-guided RNA and peptide mutations.
NMR spectra of both peptide and RNA in 50 mM K-acetate, pH 6.8 at 25 °C, clearly indicated that the RGG peptide binds to sc1 RNA to form a stable complex. From the peptide side, peak dispersion in the fingerprint region of the 1H-15N HSQC spectra of the free peptide (Fig. 1b) and RNA-bound peptide (Fig. 1c) revealed a transition from a ‘random coil’ to a well-ordered conformation of the peptide upon RNA binding. From the RNA side, the imino proton spectrum of the RNA in the free state (Fig. 1d) suggested that besides formation of a Watson-Crick duplex stem (indicated by six imino proton peaks between 12 and 14 ppm), the remainder of the molecule is either poorly structured or adopts multiple conformations, as indicated by broad imino proton resonances between 10 and 12 ppm. Upon binding to the RGG peptide, sc1 RNA becomes well-structured as indicated by a sharp and well-resolved imino proton spectrum containing well-dispersed additional resonances (Fig. 1e). Imino proton peaks from 12 to 14 ppm are characteristic of Watson-Crick base pairs, while those from 10 to 12 ppm are characteristic of G-tetrad formation, with eight imino protons between 10 and 12 ppm exchanging slowly and remaining observable even 16 hours after transfer to D2O solution (Fig. 1f).
The imino protons of the FMRP RGG peptide-sc1 RNA complex in K+-containing H2O buffer solution are exceptionally well-dispersed between 10.5 and 14.0 ppm (Fig. 1e; see also 1H-15N HSQC spectrum in Supplementary Fig. 1a). We have unambiguously assigned the guanine imino protons following site-specific incorporation of 2% 15N-labeled deoxyguanine at individual guanine positions in the sc1 RNA sequence18. An example of a 15N-edited NMR spectrum following incorporation of the label at the G31 position is shown below the control spectrum (Fig. 2a) and allows unambiguous assignment of the G31 imino proton resonance. The complete guanine imino proton assignments in the complex using this approach are listed over the spectrum in Fig. 1e. The imino (H1) protons were next correlated to H8 protons within individual guanines by through-bond connectivities via their intervening C5 carbons19. The well-resolved H1-C5 and H8-C5 cross-peaks in the 1H-13C HMBC spectrum allowed unambiguous assignment of all guanine H8 protons based on the known assignments of the corresponding guanine imino protons (Fig. 2b). Aromatic protons including guanine and adenine H8 protons and uracil and cytosine H6 protons were also independently assigned or confirmed with site-specific deoxyribose-for-ribose substitutions20. The aromatic protons were then correlated by through-bond connectivities with the sugar ring protons21. Ribose sugar protons were connected using COSY, TOCSY, and 3D HCCH-COSY and HCCH-TOCSY experiments22.
To overcome spectral crowding in the sugar proton region of the spectrum, we prepared sc1 RNA that was selectively 13C,15N-labeled solely at guanines, at adenines and at cytosines, and recorded 13C-edited and/or 15N-edited spectra of these selectively-labeled complexes (Supplementary Fig. 2). The base and sugar proton chemical shifts of sc1 RNA in the RGG peptide-bound state are tabulated in Supplementary Table 1.
We have applied through-bond 1H-15N HNN-COSY correlation experiments23,24 to monitor G•C (guanine imino proton to cytosine N3 nitrogen) and A•U (uracil imino proton to adenine N1 nitrogen) Watson-Crick pair formation (Supplementary Fig. 1b) within the sc1 RNA on complex formation with the FMRP RGG peptide. The data support formation of a five-base-pair duplex containing one A•U and four G•C base pairs as shown schematically in Fig. 1a, but also establish formation of an additional G7•C30 Watson-Crick pair, thereby unexpectedly extending the duplex by one base pair.
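The base-pair composition described above can be checked against the sc1 sequence. This is a minimal sketch under one stated assumption: the stem register pairs residue i with residue 36 − i (G1 to C5 with C35 to G31), inferred from the statement in the Discussion that G1 to C5 pair with G31 to C35. The helper name `pair_type` is illustrative.

```python
# 36-nt sc1 RNA sequence as quoted in the text (5' to 3').
SC1 = "GCUGCGGUGUGGAAGGAGUGGUCGGGUUGCGCAGCG"

# Watson-Crick combinations, keyed by the two bases read 5'-residue first.
WATSON_CRICK = {"GC": "G•C", "CG": "G•C", "AU": "A•U", "UA": "A•U"}

def pair_type(seq, i, j):
    """Return 'G•C' or 'A•U' if residues i and j (1-based) can form a
    Watson-Crick pair, otherwise None."""
    return WATSON_CRICK.get(seq[i - 1] + seq[j - 1])

# Assumed antiparallel register for the stem: residue i with residue 36 - i.
stem = [(i, 36 - i) for i in range(1, 6)]
stem_types = [pair_type(SC1, i, j) for i, j in stem]

# The additional pair reported on complex formation.
extra_pair = pair_type(SC1, 7, 30)

for (i, j), kind in zip(stem, stem_types):
    print(f"{SC1[i - 1]}{i}-{SC1[j - 1]}{j}: {kind}")
print(f"G7-C30: {extra_pair}")
```

The check only confirms sequence complementarity; it says nothing about whether a pair actually forms, which is what the HNN-COSY data establish.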
We have applied through-bond and through-space two-dimensional experiments to monitor G•G•G•G tetrad formation within the guanine-rich sc1 RNA sequence on complex formation with the FMRP RGG peptide (Fig. 3). We have used unambiguous through-bond HNN-COSY experiments to monitor hydrogen-bonding patterns between adjacent guanines around a G-tetrad, with the correlations between guanine N2 nitrogen and guanine H8 proton around individual G-tetrads25 plotted in Fig. 3a and between guanine amino protons and guanine N7 nitrogen around individual G-tetrads26 plotted in Fig. 3b. As an example, we observe connectivities 9/26, 26/18, 18/6 and 6/9 (Figs. 3a,b) identifying G9•G26•G18•G6 as one of the three G-tetrads (Fig. 3c) forming the G-quadruplex. Related through-bond connectivity patterns identify formation of additional G11•G15•G20•G24 and G12•G16•G21•G25 G-tetrads on complex formation (Fig. 3c).
We have also used through-space NOE connectivities between imino and H8 protons (involving spin diffusion via the amino protons) on adjacent guanines around individual G-tetrads, and these data in Fig. 3d provide independent support for formation of the same three G-tetrads listed in the previous paragraph on complex formation.
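As a bookkeeping aid, the three G-tetrads named above can be checked against the sequence, and the neighbor connectivities expected around each tetrad can be enumerated. This is an illustrative sketch only; the data structure and function names are assumptions, and the residue order within each tuple follows the order in which the text writes each tetrad.

```python
# 36-nt sc1 RNA sequence as quoted in the text (5' to 3').
SC1 = "GCUGCGGUGUGGAAGGAGUGGUCGGGUUGCGCAGCG"

# Tetrads as written in the text, residues numbered from 1.
TETRADS = {
    "G9•G26•G18•G6": (9, 26, 18, 6),
    "G11•G15•G20•G24": (11, 15, 20, 24),
    "G12•G16•G21•G25": (12, 16, 21, 25),
}

def cyclic_neighbors(residues):
    """Return the adjacent pairs around a tetrad, closing the cycle."""
    n = len(residues)
    return [(residues[k], residues[(k + 1) % n]) for k in range(n)]

def all_guanines(seq, residues):
    """True if every listed residue (1-based) is a guanine."""
    return all(seq[r - 1] == "G" for r in residues)

core = sorted(r for residues in TETRADS.values() for r in residues)

for name, residues in TETRADS.items():
    print(name, all_guanines(SC1, residues), cyclic_neighbors(residues))
```

For the G9•G26•G18•G6 tetrad, `cyclic_neighbors` reproduces the 9/26, 26/18, 18/6 and 6/9 connectivities quoted in the text.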
Backbone and side-chain protons in the FMRP RGG peptide bound to sc1 RNA were assigned using standard triple-resonance experiments27,28 (see Online Methods section) on a sample of the complex containing uniformly 13C,15N-labeled peptide and unlabeled sc1 RNA. The RGG peptide amide and side-chain proton chemical shifts in the sc1 RNA-bound state are tabulated in Supplementary Table 2a.
We observe a finite set of intermolecular NOEs between RGG peptide and sc1 RNA in the complex. Many of these involve arginine side-chain protons, particularly between peptide side chains of Arg10 and Arg15 and RNA guanine imino protons (Supplementary Fig. 3, left panel) and between peptide side chains of Arg4, Arg8 and Arg15 and RNA base protons (Supplementary Fig. 3, right panel). The complete set of intermolecular NOEs used as inputs in the structure determination computations is listed graphically in Supplementary Table 3.
We have used through-bond connectivities in HNN-COSY spectra to monitor intermolecular hydrogen bonding alignments within arginine•guanine pairs29 using samples containing uniform 13C,15N-labeling of both RGG peptide and sc1 RNA in the complex25,30. Using this approach, we have identified intermolecular through-hydrogen-bond correlations between the H8 proton of G31 from sc1 RNA and the Nε nitrogen (peak a, Fig. 3e) and NεH proton (peak a, Fig. 3f) of R10 from RGG peptide, associated with R10•G31 intermolecular pairing, and between the H8 proton of G7 from sc1 RNA and the Nη nitrogen (peak b, Fig. 3e) and NH2η protons (peak b, Fig. 3f) of R15 from RGG peptide, associated with R15•G7 intermolecular pairing in the complex.
Structure calculations were guided by 58 intramolecular peptide, 232 intramolecular RNA and 90 intermolecular peptide-RNA restraints. Further, six base pairs within the duplex segment and three G-tetrads within the G-quadruplex segment were aligned using hydrogen-bonding restraints consistent with experimental data. In addition, the side chains of R10 and R15 were aligned with the major groove edges of G31 and G7, respectively, based on observed through-bond connectivities. The molecular dynamics protocols that guided the docking and convergence of the structures of the complex, run initially without and subsequently with a water shell containing K+ cations, are outlined in the Online Methods and Supplementary sections. The experimental restraints and structural statistics are listed in Table 1.
A family of 10 superpositioned structures is shown in stereo in Fig. 4a, with a representative structure shown in Fig. 4b. The peptide is colored in red, while RNA bases within the duplex, quadruplex and junctional segments are colored in purple, cyan and orange, respectively. The RNA sugar-phosphate backbone is colored in silver, while backbone phosphorus atoms are in yellow. The side chains of R8, R10 and R15 are colored in green. A K+ cation in the vicinity of the duplex-quadruplex junction, anchored in place through interaction with oxygens from four phosphate groups, is shown as a yellow sphere in Figs. 4b,c.
The bound sc1 RNA contains an anti-parallel duplex composed of six base pairs and a G-quadruplex composed of three stacked G-tetrad layers, with these folded elements separated by a junctional mixed tetrad (Fig. 4b). The Watson-Crick pairs (Fig. 1a) of the duplex segment are extended through stacking with Watson-Crick G7•C30 and wobble U8•G29 base pairs (Figs. 5a,b), with the latter pair being part of the junctional U8•A17•U28•G29 tetrad (two alternate alignments between refined structures are shown in Figs. 5c,d and hydrogen bonding alignments are shown in Supplementary Fig. 4a).
The topology of the G-quadruplex scaffold (Figs. 5e,f) is unprecedented: all twelve guanines are anti; the bottom two G-tetrad layers (G11•G15•G20•G24 positioned below G12•G16•G21•G25) are similar to other parallel-stranded G-quadruplexes, with an anti-clockwise hydrogen-bond directionality31–33. The third G-tetrad layer (G9•G26•G18•G6), positioned on top of the other two G-tetrads (Figs. 5e,f), flips over with resulting clockwise hydrogen-bond directionality; three out of four guanines (G6, G9 and G18) from the latter G-tetrad are isolated (not directly connected in the strands that support the G-quadruplex core).
A range of novel loop topologies connect opposingly oriented guanines between the top inverted G-tetrad (G9•G26•G18•G6) layer and the bottom two G-tetrads (G12•G16•G21•G25 and G11•G15•G20•G24) that are part of the all-parallel-stranded G-quadruplex (Figs. 5e,f). Amongst these, single-residue loops connect G9 to G11 and G18 to G20, each of which are separated by a G-tetrad plane, while a single-residue loop connects G16 to G18 on adjacent G-tetrad planes (Supplementary Fig. 4b–d). Strikingly, no nucleotide connects G25 and G26 on adjacent G-tetrads, given that these guanines adopt opposing local strand orientations (Figs. 5e,f). Given the novelty of these loops and the presence of left-handed dinucleotide steps (Supplementary Fig. 5), these topologies are considered further in the Discussion and Supplementary sections.
All residues in the duplex stem adopted the C3’-endo sugar pucker conformation, as usually expected for RNA. By contrast, all residues in the G-tetrad core and loops adopted the C2’-endo sugar pucker conformation, except for G11, G25 and A14, which adopted the C3’-endo conformation. The sugar puckers in the refined structures were consistent with pucker geometries estimated from the magnitude of the COSY peaks between H1’ and H2’ sugar protons in the complex.
The conformation of the bound peptide is best defined for the R10GGGGR15 segment, whose positioning at the duplex-quadruplex junction (Fig. 6a), is defined by the largest number of intermolecular NOEs in the complex (Supplementary Table 3). This R10GGGGR15 segment adopts a turn conformation, centered opposite the duplex-quadruplex junction, with the side chains of R10 and R15 directed in opposite orientations (Fig. 6b). The backbone NH of G13 and CO of G11 point towards each other with the resultant hydrogen bond stabilizing the turn.
The turn conformation of the R10GGGGR15 segment of the RGG peptide recognizes the duplex-quadruplex junction through a combination of shape complementarity (Figs. 6a,b) and intermolecular hydrogen-bonding interactions. Intermolecular steric constraints mandate an absolute requirement for Gly residues at positions 12 and 13 and suggest that, at best, only small side chains could be tolerated at positions 11 and 14. The key intermolecular contacts in the complex are between Arg10 and Arg15 of the RGG peptide and the major groove of the Watson-Crick G•C base pairs of the duplex segment adjacent to the duplex-quadruplex junction. Thus, the side chains of Arg10 and Arg15 are directed in opposite directions (Fig. 6b), such that they target adjacent guanines G31 and G7, respectively, on partner strands of the duplex (Figs. 6c,d). Further, the structure of the complex revealed that the two arginines, Arg15 and Arg10, recognize O6 and N7 of guanines along the major-groove edges of G7•C30 and C5•G31 base pairs, respectively (Fig. 6d). In addition, the side chain of Arg8 is positioned in the vicinity of the U3 5´-phosphate (Fig. 4a), providing an important stabilizing interaction for the complex.
Unexpectedly, no intermolecular NOEs were observed between the RGG peptide and the G-quadruplex segment of the sc1 RNA in the complex.
The NMR-based structural studies reported above were undertaken on a 1:1 complex of 28-aa RGG peptide and 36-nt sc1 RNA sequence. Given that the R10GGGGR15 segment of the peptide and the duplex-quadruplex junction of the RNA were key to intermolecular recognition, we investigated the extent to which both components could be truncated and still retain complex formation. The observed dispersion pattern of imino proton resonances in the spectrum of the complex provides a useful set of markers for testing the integrity of complex formation (Fig. 1e). Thus, the length of the peptide could be reduced stepwise from a 28-mer to a 10-mer, with the resulting Arg8-to-Gln17 segment still forming a robust complex (Supplementary Fig. 6a–e). Similarly, the RNA could be reduced stepwise in length from a 36-mer to a 31-mer by truncating two terminal base pairs of the duplex, with the resulting U3-to-A33 segment still forming a robust complex (Supplementary Fig. 6f–h).
Based on the structure of the RGG peptide-sc1 RNA complex (Fig. 4), we investigated how mutations within the peptide sequence affect its structure and binding properties. These mutations were made in the context that the Arg8-to-Gln17 RGG peptide segment targeted the sc1 RNA at the duplex adjacent to the duplex-quadruplex junction.
Given that the guanidinium groups of both Arg10 and Arg15 form intermolecular hydrogen bonds with guanine major groove base edges in the complex (Fig. 6d), it was not surprising to observe that mutating Arg10 to either Lys or Leu resulted in complete loss of binding in filter binding assays (Supplementary Fig. 7a), as did mutating Arg15 to Lys, Leu or Ala (Fig. 7a). Similarly, mutating Gly11 to Asp, but not Ala, resulted in complete loss of binding (Supplementary Fig. 7b), while replacement of Gly12, Gly13 and Gly16 by Ala (Fig. 7b) or Asp (Supplementary Fig. 7c) resulted in either complete or significant loss of binding. Replacement of Gly14 within the R10GGGGR15 segment by Ala (Fig. 7b) or Leu (Supplementary Fig. 7d) resulted in a 13-fold and 16-fold loss in binding affinity, respectively.
We have also mutated Arg residues 3, 4 and 8 to Lys, and observe a modest 2.5-fold loss in binding affinity for the Arg3Lys and Arg4Lys mutants and a 9-fold loss for the Arg8Lys mutant (Supplementary Fig. 7e). Further, replacement of Gln17 by Asn results in a 3.5-fold loss in binding affinity (Supplementary Fig. 7f).
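To put these fold changes on an energy scale, one can convert each to an apparent change in binding free energy using ΔΔG = RT ln(fold loss). The sketch below is not part of the reported analysis and rests on two assumptions: that each fold loss equals the ratio of mutant to wild-type dissociation constants, and that the temperature is 25 °C (the value quoted for the NMR measurements; the filter-binding temperature is not stated here). Mutants reported as complete loss of binding have no defined fold change and are omitted.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 298.15      # assumption: 25 °C

# Fold losses in binding affinity quoted in the text.
FOLD_LOSS = {
    "Gly14Ala": 13.0,
    "Gly14Leu": 16.0,
    "Arg3Lys": 2.5,
    "Arg4Lys": 2.5,
    "Arg8Lys": 9.0,
    "Gln17Asn": 3.5,
}

def ddg_from_fold(fold, temperature=T_KELVIN):
    """Apparent ddG (kcal/mol) for a given fold loss in affinity,
    assuming fold = Kd(mutant) / Kd(wild type)."""
    return R_KCAL * temperature * math.log(fold)

ddg = {mutant: ddg_from_fold(fold) for mutant, fold in FOLD_LOSS.items()}

for mutant, value in sorted(ddg.items(), key=lambda kv: kv[1]):
    print(f"{mutant}: {value:.2f} kcal/mol")
```

On this scale a 10-fold loss corresponds to roughly 1.4 kcal/mol at 25 °C, so the quantified mutants listed here each cost less than about 2 kcal/mol under these assumptions.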
The effects of Arg10 and Arg15 RGG peptide mutations were also tested with various other in vitro-selected RNA sequences (sc2 to sc6 RNAs) identified based on their ability to bind FMRP12. The Arg15 to Lys mutation was more perturbing than the Arg10 to Lys mutation for binding to sc2 to sc6 RNA sequences (Supplementary Figs. 8a–f).
We have identified a pair of one-nucleotide loops and a pair of two-nucleotide loops connecting guanines involved in G-quadruplex formation in the complex. Thus, U10 connects G9 to G11 and U19 connects G18 to G20 (Figs. 5e,f), while A13–A14 connects G12 to G15 and C22–U23 connects G21 to G24 (Figs. 5e,f). Replacement of such loop bases U10, U19, C22 and U23 by adenine (Supplementary Fig. 9a) has little impact on binding affinity (Supplementary Fig. 9b). Further, converting two-nucleotide loops into one-nucleotide loops, as for ΔC21 and ΔA14 combined with mutation A13U (Supplementary Fig. 9a) also has little impact on binding affinity (Supplementary Fig. 9b). Finally, U27, which connects G26 of the G-tetrad with U28 of the mixed junctional tetrad, can be replaced by adenine without impacting on binding affinity (Supplementary Fig. 9b). Overall, these results imply that connecting loops can be as small as a single nucleotide, and their base composition is not critical for retention of complex formation.
The integrity of the mixed U8•A17•U28•G29 junctional tetrad (Figs. 5c,d; Supplementary Fig. 4a) is important for complex formation, since the single U8C and A17U mutants and the dual U27A/U28A mutant result in complete loss in binding affinity (Fig. 7c).
We also note that the composition of the G7•C30 and C5•G31 pairs that form intermolecular hydrogen bonds with the side chains of R15 and R10, respectively (Fig. 6d), is also critical, since replacement, for instance, of C5•G31 by U5•A31 (designated “long stem” sequence, Supplementary Fig. 9a) resulted in complete loss in binding activity (Supplementary Fig. 9c), while RNA integrity was confirmed by denaturing gel analysis.
Given that the Hoogsteen edges of G7 and G31 pair with the guanidinium groups of R15 and R10 respectively, in the complex (Fig. 6d), we investigated the binding properties of sc1 RNA containing single deaza-G7 and deaza-G31 mutants, as well as the dual deaza-G7/deaza-G31 mutant. These deaza-G7 and deaza-G31 mutants result in complete to pronounced loss in binding affinity (Fig. 7d).
Despite considerable interest in the binding of FMRP RGG domain to G-rich sequences capable of G-quadruplex formation, little structural information is available on the molecular events underlying this recognition12,3,17,34–36. Our NMR-based solution structure of the FMRP RGG peptide – sc1 RNA complex has defined how the R10GGGGR15 segment of the RGG peptide targets the duplex-quadruplex junction of the RNA. The structure of the complex explains the role of Gly and Arg residues in mediating shape complementarity and base-specific recognition, as well as key features of the duplex-quadruplex junction topology in shaping the binding pocket and dictating the trajectory of the bound peptide. The RNA quadruplex composed of three stacked G-tetrads adopts an unprecedented fold, with the structure of the bound RNA also defining how a mixed junctional tetrad mediates the facile connection between duplex and quadruplex folds.
Both the FMRP RGG peptide (Fig. 1b) and the hairpin loop segment of sc1 RNA (Fig. 1d) are essentially unstructured in the free state, but become structured (Figs. 1c,e) on complex formation, with spectral features characteristic of a single conformational species in solution. We have identified a novel RNA G-quadruplex and an unanticipated duplex-quadruplex junction in the structure of the FMRP RGG peptide - sc1 RNA complex (Figs. 5a,b). The key challenge centered on how to connect an anti-parallel five-base-pair duplex (G1 to C5 paired with G31 to C35, Fig. 1a) to an all-parallel-stranded G-quadruplex composed of two stacked G-tetrads (G11•G15•G20•G24 and G12•G16•G21•G25, Figs. 5e,f). Remarkably, this has been achieved by interjecting three layers at the junction between these two motifs (Figs. 5a,b), that include a base pair, a mixed tetrad and an additional inverted G-tetrad, thereby allowing a smooth transition between the anti-parallel duplex and the all-parallel-stranded quadruplex motifs.
In the first layer, the duplex is extended through formation of a G7•C30 Watson-Crick pair, which is stacked on the terminal C5•G31 Watson-Crick pair of the duplex (Figs. 5a,b). Though C30 and G31 are adjacent to each other in sequence, C5 and G7 are not, but they still retain a pseudo-continuous backbone (Figs. 5a,b).
In the second layer, the duplex is further extended by formation of a U8•G29 wobble pair, which is stacked on the G7•C30 pair mentioned above, and forms part of a U8•A17•U28•G29 mixed tetrad (Figs. 5a,b,c,d and Supplementary Fig. 4a). Formation of this mixed tetrad, with its U28-G29 platform37, is important since it contributes a bridging function linking the duplex and quadruplex folds.
The third layer is composed of an inverted G9•G26•G18•G6 tetrad, stacked between the mixed U8•A17•U28•G29 and G12•G16•G21•G25 tetrads (Figs. 5a,b and 5e,f). Strikingly, both the strand and hydrogen bond directionalities for the G9•G26•G18•G6 tetrad are the reverse of that observed for the G12•G16•G21•G25 and G11•G15•G20•G24 tetrads (Figs. 5e,f), thereby defining a completely new folding topology between three stacked G-tetrads within a G-quadruplex.
The G9•G26•G18•G6 tetrad is connected to the two-G-tetrad-layered all-parallel-stranded G-quadruplex by a novel set of connecting loops, the majority of which have not been observed previously33. Thus, single-residue linkers, U10 and U19, form a new type of loop, which is observed here for the first time, connecting guanines with opposing strand orientations and separated by a G-tetrad, but belonging to the same column of the G-quadruplex (Figs. 5e,f and Supplementary Figs. 4b,c). Single-residue linker A17 forms a second type of new loop connecting guanines with opposing strand orientations on two adjacent G-tetrad layers, while belonging to different columns of the G-quadruplex (Figs. 5e,f and Supplementary Fig. 4d). Linker G7–U8 forms the third type of new loop that connects two guanines G6 and G9 with similar strand orientations that occupy adjacent positions within a G-tetrad (Figs. 5e,f and Supplementary Fig. 4e).
Additional novel conformational features within junctional elements, including left-handed backbone connections (Supplementary Fig. 5), are outlined in the Supplementary section.
Our results establish that the R10GGGGR15 peptide adopts a turn conformation, with the side chains of R10 and R15 pointed in opposite directions (Figs. 6a,b), allowing for interactions with the major groove of flanking G•C pairs of the duplex segment adjacent to the duplex-quadruplex junction (Figs. 6c,d). Unexpectedly, the RGG peptide does not interact directly with the quadruplex, but this scaffold enforces the turn adopted by the R10GGGGR15 peptide, and facilitates its positioning within the binding pocket on the RNA in the complex.
In the case of sc1 RNA, the major groove of the duplex widens somewhat38–40 and opens adjacent to the duplex-quadruplex junction on complex formation as a consequence of three structural factors: widening of the groove itself, as measured by the distance between phosphate groups of constitutive strands; bending of its helical axis; and specific structural nucleotide arrangements. Two nucleotide arrangements leading to groove widening and generation of a defined ligand-binding pocket are the G29•U8 wobble pair, which is shifted towards the major groove as compared to a standard G•C base pair, and the presence of an extra residue G6 at the pseudo-continuous C5---G7 step, with resulting widening of the major groove at the junction with the quadruplex.
The role of shape complementarity and arginine-guanine pairing in peptide-RNA recognition is outlined in greater detail in the Supplementary section.
Regulation of gene expression through post-transcriptional mechanisms such as splicing, 3’-end formation, translational control and RNA turnover adds multiple layers of refinement to the metabolism of nascent transcripts. The widespread occurrence of G-quadruplex motifs near regulatory sites, and the identification of a number of RNA-binding proteins (RNABPs) that bind to, stabilize or unwind these structures, suggest that greater understanding of these interactions will shed light on these key regulatory steps. Moreover, as loss of similar interactions between DNA binding proteins such as WRN or telomere end-binding proteins and G-quadruplexes in DNA has been linked to human diseases such as Bloom’s and Werner’s syndromes, it is likely that understanding the subtleties of RNABP-RNA G-quadruplex interactions will contribute to the growing list of human diseases resulting from loss-of-function of RNABPs41,42.
Guanine-rich sequences are found in a number of RNAs, other than sc1 RNA, such as the 5’-UTR of certain oncogenes3, that are involved in distinct biological processes. A systematic bioinformatics search found the occurrence of the exact RGGGGR motif in several genes of the H. sapiens genome43–48. The RGG domain of nucleolin, an abundant protein in the nucleolus, has been reported to bind with high affinity to G-rich rDNA and other G-rich sequences49. Thus, the recognition principles reported in this paper for the FMRP RGG peptide - sc1 RNA complex may be applicable to other RGG-containing protein-RNA complexes.
We also discuss in some detail the significance of binding to the duplex-quadruplex junction in determination of FMRP target RNAs in the Supplementary text section.
In the current study we find that arginine interactions play a key role in this recognition. In particular, Arg15 is important for recognition of all six G-quadruplexes identified by RNA selection (sc1–6), while Arg10 is only important for a subset (Supplementary Fig. 8). While these observations suggest that there are multiple ways for the FMRP RGG box to bind G-quadruplexes with high affinity, the observation that Arg15 has at least a 10-fold effect on the binding to each G-quadruplex suggests that a common feature of RGG-box-RNA target recognition may lie in interaction with sequence-specific elements in the stem. Such interactions will be important in further defining RNA targets that are biologically relevant by virtue of their specific interaction with the FMRP RGG-box.
The RGG box is required for several functions of FMRP critical to its role in controlling local mRNA translation to effect activity-dependent synaptic plasticity in neurons, providing a critical link between RNA binding by this domain and protein function in vivo. Deletion of the RGG box resulted in failure of FMRP to nucleate stress granules, thought to be one of the important mechanisms through which FMRP regulates translation in cells50. A finer analysis of the determinants required for FMRP to associate with polyribosomes revealed that the two arginines we find to be directly involved in RNA recognition, Arg10 and Arg15 (R533 and R538 of murine FMRP), are necessary, directly implicating these arginines in proper translational control by FMRP36. Moreover, it has been reported that mono- and dimethylarginine modifications of the RGG box decrease binding to RNA, which may suggest a regulatory role for this modification in vivo. Intriguingly, Arg10 and Arg15 were two of the four arginines found to be modified in vivo by the arginine methyltransferase PRMT151. Finally, it has recently been shown that the RGG box is required for FMRP to mediate MEF2-dependent synapse elimination in response to neuronal activity in mice52. Loss-of-function of FMRP in this regard may cause the increases in spine density observed in both mouse models and human Fragile X patients, which may indicate a deficit in excitatory spine elimination and contribute directly to observed defects at the levels of both synapse and circuit function52.
Methods and any associated references are available in the Online version of the paper at http://www.nature.com/nsmb/.
We thank Yuying Gosser and Stephen Pitt for their participation at the early stage of the project, Olga Mostovetsky, Elizabeth F. Stone, and Ka Ying Sharon Hung for technical assistance, and Alex Lash for bioinformatics of RG4R motif in H. sapiens genome. This research was supported by NIH grant CA049982 to DJP, NIH grant R01 HD040647 to JCD, and Singapore BMRC grant 07/1/22/19/542 to ATP. DJP is a member of the New York Structural Biology Center supported by NIH grant GM66354.
The coordinates for the structure of the RGG-sc1 RNA complex have been deposited in the Protein Data Bank (accession code 2la5).
Note: Supplementary information is available on the Nature Structural & Molecular Biology web site.
AUTHOR CONTRIBUTIONS
ATP, AM and SI were responsible for NMR studies, VK undertook computations, while AS, TR and AP prepared labeled NMR samples, all under the supervision of DJP. CC, DC and JCD undertook filter-binding assays under the joint supervision of JCD and RBD. The paper was written jointly by DJP, ATP, VK, JCD and RBD.
COMPETING INTERESTS STATEMENT
The authors declare that they have no competing financial interests.
Reprints and permission information is available online at http://nature.com/reprintsandpermissions/.