Messenger RNAs interact with a number of different molecules that determine the fate of each transcript and contribute to the overall pattern of gene expression. These interactions are governed by specific mRNA signals, which in principle could represent a special mRNA recognition ‘code’. Both small molecules and proteins demonstrate a diversity of mRNA binding modes, often dependent on the structural context of the regions surrounding specific target sequences. In this review, we highlight recent structural studies that illustrate the diversity of recognition principles used by mRNA binders for timely and specific targeting and processing of the message.
Transcription does not simply transfer coding information required for protein biosynthesis from DNA to mRNA. In fact, transcription produces pre-mRNA and mRNA molecules, which carry multiple signals required for processing, modification, transport, translation and degradation of the message. These signals are recognized by mRNA-binding molecules in both a sequence-specific and a structure-dependent manner, and help define the spatial and temporal constraints for translation of mRNA species. The mRNA recognition signatures, therefore, could be considered a special ‘code’, contributing, along with other layers of gene expression control, to the final pattern of gene expression. This code, however, is unlikely to be universal due to dramatic differences in transcription and mRNA processing amongst evolutionarily distant groups, as well as the occurrence of species-specific mRNA-recognition systems necessary for adaptation to particular environmental cues. Due to the immense potential and opportunities for manipulation of gene expression at the post-transcriptional level, many structural biology groups have focused their ongoing research efforts on determining structures that would uncover the complex network of relationships between mRNA and its partners, thereby contributing to a comprehensive understanding of the principles underlying an ‘mRNA recognition code’.
Although the last decade of intensive research has provided us with molecular details of many interesting intermolecular interactions involving mRNA, the last two years have been especially informative, with over thirty structures reported of complexes containing mRNA. This review analyzes recently published structural data (spanning 2006–2007) on specific mRNA recognition events, and complements excellent earlier reviews on protein-RNA [2–5] and metabolite-mRNA [6–8] recognition.
For the purpose of this review, we have considered Protein Data Bank (PDB) entries that describe interactions of mRNA fragments or their mimetics with either small molecules or proteins. We propose dividing all such complexes into three categories (Table 1): (1) structure-specific recognition of folded RNAs (Figure 1a); (2) sequence-specific recognition of single-stranded RNAs (Figure 1b); (3) non-specific recognition of single-stranded RNAs (Figure 1c).
Most complexes belong to the first group and are often characterized by unique structure-specific aspects of mRNA recognition. About half of these complexes comprise long (70–150 nt) sensing domains of riboswitch mRNAs bound to their ligands (Figure 1a, left panel). The other half contain protein domain(s) bound to shorter RNAs, which typically adopt stem-loop scaffolds (Figure 1a, middle and right panels). The second group features protein domain(s) that interact sequence-specifically with short single-stranded mRNA fragments (Figure 1b). The third group, not discussed in the current review, mostly includes proteins and protein assemblies capable of binding various RNA species in a non-sequence-specific manner (Figure 1c).
Ribosensors are mRNA sequences that control gene expression in response to various stimuli, such as metabolites (riboswitches), cations (metallosensors), and temperature (thermosensors). Recently, the seminal structure determination of purine-sensing riboswitches [9,10] has been rapidly followed by structures of five more ribosensors: bacterial [11,12] and plant thiamine pyrophosphate (TPP) riboswitches, an S-adenosylmethionine type I (SAM-I) riboswitch, a glucosamine-6-phosphate-sensing glmS ribozyme [15,16], a thermosensor and a Mg2+ ribosensor. Here, we focus on the recent structure of the Mg2+ ribosensor, since it was not discussed in an earlier review.
Mg2+, the most abundant divalent cation, is critically required for both structure and function of many RNAs, including mRNAs. Therefore, it was surprising to find that Mg2+ homeostasis in Salmonella enterica and in Bacillus subtilis is controlled directly by two different RNA sensors located in the Mg2+ transporter MgtA and MgtE mRNAs, respectively [18,19]. Under high Mg2+ concentration, both sensors adopt one of two alternative conformations that arrest transcriptional elongation, using an as yet uncharacterized mechanism for the mgtA switch, and a transcription attenuation mechanism for the mgtE switch (Fig. 2a–b).
The X-ray structure of the Mg2+-sensing domain (M-box) of the mgtE switch provides, for the first time, insights into the molecular details of metalloregulation by an RNA sensor. The structure features close packing of the P1-P2-P6 helix against the stem-loop structures P5/L5 and P4-P3/L4 (Fig. 2c–d), which are oriented downward and stapled by extensive tertiary contacts that link helix P2 and the J2-1 junction to loops L4 and L5. Four (Mg1-4) of the six Mg2+ cations reside in this region, most likely comprising the cations that are key for metal sensing (Fig. 2e): Mg1 organizes the L5 structure for docking with P2 and L4, Mg2 bridges L5 and P2, and Mg3 and Mg4 stabilize the local conformations of P2 and L4. Similar to Mg2+ cations that mediate ligand binding in the TPP riboswitch and the glmS ribozyme [12,15], Mg2+ cations in the M-box structure predominantly utilize outer-sphere coordination for interactions with nucleobases. As in the TPP riboswitch, only two inner-sphere contacts with nucleobases are formed, while additional direct contacts are made with non-bridging phosphate oxygens. Since Mg1 uses four inner-sphere (one nucleobase and three sugar-phosphate backbone) contacts with RNA (Fig. 2e), a feature rarely observed in previous studies, this cation may provide a key contribution to the docking of the J2-1/P2 region with the L4 and L5 loops. These long-distance interactions facilitate the formation of tertiary base contacts and base stacking, which in turn sequester antiterminator nucleotides and, along with Watson-Crick base pairing, contribute to the stabilization of helix P1, leading to the formation of the terminator hairpin and the repression of gene expression (Fig. 2b). By contrast, in purine, TPP and SAM-I riboswitches, formation of the P1 helix is dependent on stabilization of the adjacent junction by bound ligand (Fig. 2a) [9,11–14].
Similar to riboswitches, some bacterial proteins inhibit translation by interactions with the mRNA region located adjacent to the ribosome-binding site (RBS), thereby preventing ribosome loading onto mRNA. The classical example of such a protein versus ribosome competition mechanism is highlighted for the autoregulation of ribosomal protein synthesis. If produced in excess over their rRNA targets, primary ribosomal proteins, such as L1, interact with their own mRNAs, repressing ribosome binding. This implies a preferential binding to rRNA and overlap of binding sites for rRNA and mRNA on L1, suggesting similarities between both RNA targets.
The structures of ribosomal protein L1 bound to mRNA and 23S rRNA fragments demonstrated that both RNAs indeed have a common structural determinant (upper panel, Fig. 3a) [20–22]. This binding region in mRNA is built by an asymmetrical internal loop that creates a sharp bend between two helices, thereby resembling the kink-turn motif. The RNA core includes the primary recognition determinant, a G-C pair and its neighboring uridine, that is specifically recognized in both complexes by invariant E42, T217, M218 and G219 of domain 1 (lower panel in Figure 3a). To increase binding affinity, rRNA additionally interacts with domain 2 of L1.
Ribosomal protein S15 represents an interesting deviation from the general trend outlined above. In thermophilic bacteria, mRNA and rRNA targets of S15 [23,24] are similar, and the protein represses translation by competition with the ribosome. In Escherichia coli, S15 recognizes a pseudoknot structure folded within its own mRNA [25–27], rather than the three-way junction architecture associated with thermophilic bacteria. Although the ribosome can interact with mRNA already bound to S15, it cannot initiate translation in E. coli. A long-awaited cryo-electron microscopy study showed that in the stalled ribosome, S15 positions itself along the mRNA pseudoknot, and the S15-mRNA complex is nested on a special platform of the small ribosomal subunit (Figure 3b). This precise positioning allows Shine-Dalgarno-rRNA interactions, but precludes the initiator tRNA from reaching the start codon. Therefore, S15 performs its repressor function by preventing the mRNA pseudoknot from unfolding and entering the ribosome, thereby trapping the ribosome in a translation-incompetent state.
The iron regulatory protein 1 (IRP-1) also has a dual function. The protein either binds iron-responsive elements (IREs) in mRNA to repress translation or degradation, or it binds an iron-sulfur cluster to become a cytosolic aconitase, catalyzing the conversion of citrate to isocitrate. In order to accommodate the stem-loop IRE, domains 3 and 4 are splayed apart in the aconitase bound state and their contacting surfaces are incorporated into two distinct and separated RNA-binding sites (upper panel in Figure 3c). IRE contacts IRP-1 using its lower stem and the terminal loop, which contains a conserved CAGUG motif. The specific recognition of the loop is accomplished by base-specific bonding of S371, K379 and R260 with a 5′-A15-G16-U17 pseudo-triloop, and is strengthened by van der Waals contacts with the exposed purines (lower right panel in Figure 3c). The other recognition determinant, the conserved bulged C8, is sandwiched between two arginines within a small pocket and is involved in base-specific hydrogen bonds with the side chain of S681 and backbone of P682, D781, and W782 (lower left panel in Figure 3c). The availability of two separated RNA-binding sites, which recognize the loop and bulged cytosine, greatly increases the selectivity of IRE recognition by IRP-1, thereby resembling the two-point recognition reported previously in tRNA-aminoacyl-tRNA synthetase and some RNA-ribosomal protein complexes [32,33].
Loop-specific recognition is utilized by several other proteins of the first group for the readout of certain nucleotide sequences (Table 1). Though some proteins contact the helical RNA segments that close RNA loop regions, the majority of specific interactions are observed with non-paired nucleotides within the loops. These RNA-protein interactions resemble sequence-specific recognition patterns observed in complexes between proteins and single-stranded RNA (discussed below), which often involve small canonical RNA-binding domains, such as zinc-finger domains, the K-homology (KH) domain and the RNA recognition motif (RRM). Not surprisingly, some proteins described here utilize canonical RNA-binding domains and motifs for RNA binding.
The structure of the translational repressor RsmE bound to the Shine-Dalgarno sequence of hcnA mRNA shows how a protein dimer specifically recognizes the consensus sequence 5′-A/UCANGGANGU/A (Figure 3d). The loop contains six unpaired nucleotides A8-C9-G10-G11-A12-U13, with U13 and the C9-G10-G11 segment bulged out. The protein specifically recognizes the Watson-Crick edges of A8, G10, G11, the Hoogsteen edge of A12, and the major-groove side of the C7-G14 and U6-A15 base pairs. In contrast to small canonical RNA-binding domains, the sequence-specific recognition of unpaired nucleotides is mediated primarily by β-strand backbone residues, implying that the protein fold itself is responsible for RNA-binding specificity.
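To make the slash notation of the RsmE consensus concrete, the following minimal Python sketch expands 5′-A/UCANGGANGU/A into a regular expression and scans an RNA string for it. The function names `consensus_to_regex` and `find_motifs` are illustrative and not taken from any published tool. The 10-nt core U6 to A15 is read off the residue numbering given above, whereas the flanking `GG` and `CC` nucleotides are hypothetical padding added only to show the search.

```python
import re

def consensus_to_regex(consensus):
    """Translate slash notation (e.g. 'A/UCANGGANGU/A') into a regex string."""
    parts = []
    i = 0
    while i < len(consensus):
        options = consensus[i]
        i += 1
        # Gather alternatives written as X/Y for a single position.
        while i + 1 < len(consensus) and consensus[i] == "/":
            options += consensus[i + 1]
            i += 2
        if "N" in options:
            parts.append("[ACGU]")              # N: any nucleotide
        elif len(options) == 1:
            parts.append(options)               # invariant position
        else:
            parts.append("[" + options + "]")   # either of the listed bases
    return "".join(parts)

def find_motifs(consensus, rna):
    """Return (0-based start, matched text) for every hit, overlaps included."""
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(rna.upper())]

RSME_CONSENSUS = "A/UCANGGANGU/A"
# U6-C7-A8-C9-G10-G11-A12-U13-G14-A15 from the text; flanks are hypothetical.
rna = "GG" + "UCACGGAUGA" + "CC"
hits = find_motifs(RSME_CONSENSUS, rna)
print(consensus_to_regex(RSME_CONSENSUS))
print(hits)
```

A match of this kind captures only the sequence consensus; it says nothing about the stem-loop context and bulged-out nucleotides that the structure shows to be part of recognition.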
The sequence-specific recognition of the bulged out nucleotides in apical loops is a recurrent theme in complexes of aptamer RNAs with the KH1 domain of NOVA-1 KH1/2 protein (PDB code: 2ANR) and the RNA recognition motif (RRM) domain of human RBMY protein. The NOVA (neuro-oncological ventral antigen) family of proteins is expressed in neurons where it plays a crucial role in the regulation of alternative splicing. The NOVA-1 protein contains three KH domains. An earlier structure has revealed details of the recognition between the KH3 domain and a UCAC tetranucleotide embedded within the hairpin loop of an in vitro-selected stem-loop RNA scaffold. However, it has not addressed the question of how multiple KH domains can target RNA. The structure of the first two KH domains (KH1/2) bound to tandem UCAN repeats of an in vitro-selected stem-loop RNA attempted to answer this question. These structural efforts revealed that the KH2 domain does not participate in RNA binding and only the KH1 domain interacts with a 5′-UCAG-UCAC-C loop closed by three non-canonical base pairs. This domain primarily binds to the second UCAN repeat in the cleft usually used by KH-domains for ss-DNA and ss-RNA recognition. Despite the Watson-Crick edges of all four nucleotides interacting with the protein, only cytosine and adenine form sequence-specific hydrogen bonds, thereby validating the YCAN sequence consensus found using the SELEX approach.
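The YCAN consensus uses degenerate nucleotide codes rather than slash notation. As a small illustration, the Python sketch below maps the standard IUPAC codes Y (C or U) and N (any base) onto character classes and locates YCAN tetranucleotides in the 5′-UCAG-UCAC-C loop quoted above. The helper name `scan` and the 1-based position reporting are choices made for this example only.

```python
import re

# Standard IUPAC codes needed here: Y = pyrimidine (C or U), N = any base.
IUPAC_RNA = {"A": "A", "C": "C", "G": "G", "U": "U",
             "Y": "[CU]", "N": "[ACGU]"}

def scan(motif, rna):
    """Return (1-based position, matched text) for each occurrence of motif."""
    regex = "".join(IUPAC_RNA[code] for code in motif)
    # A lookahead is used so that overlapping occurrences are not skipped.
    pattern = re.compile("(?=(" + regex + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(rna)]

loop = "UCAGUCACC"   # 5'-UCAG-UCAC-C loop from the text, hyphens removed
ycan_hits = scan("YCAN", loop)
print(ycan_hits)
```

The scan reports the two tandem repeats, UCAG at position 1 and UCAC at position 5; the second is the one described above as the primary KH1 binding site.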
Testes-specific RBMY (RNA-binding motif gene on Y chromosome) protein encoded by the human Y chromosome is important for sperm development. The protein is possibly involved in pre-mRNA processing and recognizes an in vitro-selected RNA hairpin with a 5′-CA/UCAA loop and a 5′-GUC-loop-GAY consensus element in the loop-closing part of the stem. In the structure, CAA nucleotides protrude from the CACAA pentaloop and are spread on the β-sheet surface of the RRM, similar to other proteins that utilize the RRM-RNA mode of recognition. All three nucleotides form base-specific contacts with main and side chain atoms; however, only adenines provide base-specific discrimination. Unexpectedly, the protein makes additional contacts with a major groove of the stem using its β-hairpin, thereby demonstrating dual sequence and shape-specific RNA-recognition, a duality that is generally unusual for RRM motifs.
Two proteins, elongation factor SelB and mRNA-binding factor Vts1p, interact with stem-loop structures, whose loop regions, though composed of different sequences, demonstrate conformational similarity to the UNCG tetraloop fold. SelB is essential for incorporation of selenocysteine, the 21st amino acid, into bacterial polypeptides. The factor binds the selenocysteine insertion sequence (SECIS) in mRNA with extremely high selectivity, and this binding serves as a signal for delivery of selenocysteyl-tRNA at a UGA stop codon upstream of the SECIS hairpin. The high binding specificity is achieved through base-specific interactions of a DNA- and RNA-binding winged-helix (WH) motif with consecutive bulged out guanine and unpaired uridine of the 5′-GGUC-U loop, and interactions with the RNA backbone, which are determined by shape complementarity and electrostatic properties of the protein surface [39,40].
Yeast Vts1p has been implicated in vesicular transport and sporulation; however, its precise role remains unknown. The protein is a homolog of the Drosophila protein Smaug, a translational repressor that mediates body patterning during embryogenesis by binding to an mRNA hairpin termed the Smaug recognition element (SRE). The SRE hairpin exhibits consensus sequences 5′-UNGA-N and 5′-GNGC-N, which are targeted by the α-helical sterile alpha motif (SAM) domain of Vts1p, a domain also implicated in protein-protein and DNA-protein interactions. Three structures of the Vts1p-SAM domain bound to two SRE variants show parallels with SelB-SECIS recognition, such as shape recognition of the loop region and base-specific binding to an unpaired nucleotide, guanosine in this case [42–44]. In contrast to the SelB-SECIS complex, the bulged out nucleotide does not play a significant role in recognition by Vts1p-SAM.
The majority of the RNA-protein complexes from the second group contain canonical RNA-recognition modules. Nevertheless, the structures of these complexes show interesting details and deviations from typical RNA-recognition modes. These structures illustrate the high complexity of mRNA recognition and significantly expand our knowledge of the code underlying mRNA recognition. Since the RRM domain is the most common RNA-binding motif and is typically used for recognition of specific sequences, it is not surprising that amongst the seven complexes assigned to the second group, only two, RNase Kid and the KH domain of poly(C)-binding protein-2 (PCBP-2), do not contain this motif (Table 1).
The Fox-1 protein regulates tissue-specific alternative splicing by binding to a 5′-GCAUG RNA element. Like the above-mentioned RNA complex of the RBMY protein, the structure of the RRM domain of Fox-1 in complex with U1-G2-C3-A4-U5-G6-U7 demonstrates both canonical and unique modes of RNA recognition (Figure 4a). The U5-G6-U7 segment is bound in a canonical way by the β-sheet of the protein. These interactions feature typical hydrophobic interactions of H120, F158 and F160 with U5 and G6 nucleotides. However, the binding platform is extended to the β1α1-loop of the RRM motif, where in an unprecedented manner, F126 is caged by the U1-G2-C3 segment. Since aromatic residues at equivalent positions are predicted in other RRMs, this RNA recognition feature may be shared with additional proteins. Binding specificity is provided by a dense network of hydrogen bonds to the bases of the first six nucleotides, while high binding affinity is achieved by numerous electrostatic and hydrophobic interactions. Several intramolecular hydrogen bonds additionally stabilize the RNA conformation, in contrast to its disordered topology in the unbound state.
The structure of another key regulator of alternative splicing, the SRp20 protein, bound to a 5′-CAUC sequence further expands the RNA-binding characteristics of the RRM motif. As anticipated from the consensus sequence 5′-A/UCA/UA/UC, SRp20-RRM binds RNA in a semi-sequence-specific mode. Although all bases participate in hydrogen bonding with RRM, only the invariant first cytosine is recognized sequence-specifically. In addition to nonspecific binding, the AUC segment adopts an unusual RNA topology, possibly preserved for recognition by related proteins.
Two other RRM-containing complexes illustrate another distinctive mode of RRM-RNA binding, namely interactions of RNA with tandem RRMs. In the structure of pre-mRNA splicing factor U2AF65 bound to a U7 strand, the protein utilizes a unique pattern of hydrogen bonds with uracil bases, spread over the surfaces of both RRMs. These hydrogen bonds are frequently formed with protein side chains, which may be rearranged upon RNA binding to accommodate other polypyrimidine sequences. The structure of polyadenylation factor Hrp1 in complex with polyadenylation enhancement element 5′-GUAUAUAUA reveals a sequence-specific mode of recognition of the AUAUAU motif by both main and side chains of RRM2, RRM1 and the linker region. Interestingly, the β1α1-loop of RRM1 contains a tryptophan that is important for RNA binding, and which occupies a position equivalent to F126 in the Fox-1-RNA complex. Related β1α1-loops in Sex-lethal-RNA and HuD-RNA complexes contain a tyrosine at a nearby position in the loop [48,49]. This observation further reinforces the suggestion that an aromatic amino acid outside of the RRM β-sheet can be a strong determinant of RNA recognition.
The structure of the N-terminal part of the multifunctional La protein in complex with 5′-U1-G2-C3-U4-G5-U6-U7-U8-U9 RNA has revealed an unexpected mode of RNA recognition. La protein contains RNA-binding RRM and La domains and interacts with certain pyrimidine-rich mRNAs, as well as with small RNA precursors, which typically bear a UUU-OH sequence at their 3′ termini. Unexpectedly, recognition of the 3′-terminal U7-U8-U9 segment occurred within a cleft between the La and RRM1 domains, involving contacts with one edge of a canonical β-sheet of the RRM domain and the backside of the winged-helix motif of the La domain (Figure 4b). The majority of the interactions involve conserved aromatic amino acids of the La domain with the U7-U8-U9 segment that adopts a reversed-turn stabilized by stacking of non-adjacent U7 and U9 residues. Hydrogen bonds to the U8 base determine sequence specificity, while interactions of an aspartate carboxylate with hydroxyls of the U9 sugar discriminate against phosphorylated 3′-ends. Since the canonical sites of La and RRM domains are not occupied, the protein could be involved in other RNA interactions.
Our understanding of RNA-recognition principles, which was largely based in previous years on the structures of protein-RNA complexes involved in translation or viral biogenesis, is currently being enhanced by structural information from complexes involving mRNA and other RNA species. Significant progress has been made in our understanding of the mechanism of action of metabolite-sensing mRNAs, based on the crystal structures of several important riboswitches in the metabolite-bound state. These structures, along with the structures of protein repressor-mRNA complexes, have stimulated functional studies aimed at elucidating the corresponding mechanisms of gene expression control. The ongoing structural studies illustrate the diversity and complexity of mRNA recognition and significantly expand our current understanding of the principles underlying an mRNA recognition code. The increasing sophistication and technical advances in X-ray, NMR and cryo-electron microscopy techniques should undoubtedly lead to the solution of more challenging problems, likely aimed at large multiprotein complexes containing longer mRNA fragments.
This work was supported by National Institutes of Health grant GM073618.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.