An emerging body of evidence indicates that post-transcriptional gene regulation relies not only on the linear sequence of messenger RNAs but also on their folding into intricate secondary structures and on chemical modification of their bases. These features, which are highly dynamic and interdependent, exert direct control over the transcriptome, thereby influencing many aspects of cell function. Here, we argue that the coupling of RNA modifications and structure actively shapes RNA–protein interactions at individual steps of gene expression.
The life of an mRNA involves intricate coordination of precursor mRNA (pre-mRNA) processing events, including capping, splicing, and polyadenylation, as well as export of the mature transcript to the cytoplasm for translation. How do cells exert tight spatiotemporal control over these multi-step processes? Nuclear and cytoplasmic events can be coupled through two distinct mechanisms. First, RNA-binding proteins (RBPs) that shuttle between the nucleus and the cytoplasm can remain bound to their target transcripts and influence downstream processes. Serine/arginine-rich (SR) proteins are an excellent example: initially identified for their roles in splicing1, they were later found to have important roles in transcription elongation, poly(A) site selection, mRNA export, and translation1–5. Second, nuclear processing, including splicing and polyadenylation, as well as dynamic changes in RNA chemical modifications and secondary structures (discussed below), can facilitate the downstream association of RBPs with the mRNA in the cytoplasm.
As RBPs link multiple steps of gene expression, the factors governing their binding to target RNAs must be considered to understand how coupling is achieved. The mRNA sequence alone is often insufficient to drive protein binding: RBP consensus motifs occur far more frequently in transcripts than they are actually bound6–8. What additional features determine the specificity of protein–RNA interactions in cells? Several recent studies have revealed that post-transcriptional modification of RNA through the addition or removal of chemical groups (collectively termed the epitranscriptome) can expand the information encoded by RNA9. As we discuss below, RNA modifications, which are dynamic and reversible, can modulate protein–RNA interactions and mediate rapid responses to environmental changes. In addition to this panoply of chemical modifications, mRNAs also fold into intricate secondary structures. Until recently, the structural dimension of the transcriptome was largely unexplored owing to technical limitations. However, new transcriptome-wide approaches for studying RNA structure in vivo have revealed its plasticity, complexity, and functionality10–12.
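A back-of-the-envelope calculation shows why a short consensus motif alone is a weak predictor of binding. The sketch below counts overlapping occurrences of a motif and compares them with the count expected if all four bases were equally frequent and independent, a simplifying assumption that real transcripts violate. The hexamer UGCAUG (a motif recognized by RBFOX-family proteins) and the random "transcript" are illustrative choices, not data from the cited studies.

```python
import random


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        pos = seq.find(motif, pos + 1)  # advance one base so overlaps are counted
    return count


def expected_count(seq_len: int, motif_len: int) -> float:
    """Expected occurrences of one specific k-mer in a uniform random sequence."""
    positions = max(0, seq_len - motif_len + 1)
    return positions * 0.25 ** motif_len


if __name__ == "__main__":
    random.seed(1)
    motif = "UGCAUG"  # illustrative hexamer
    # Hypothetical 100,000-nt sequence with uniform base composition
    seq = "".join(random.choice("ACGU") for _ in range(100_000))
    print("observed:", count_motif(seq, motif))
    print("expected:", round(expected_count(len(seq), len(motif)), 1))
```

Even in 100 kb of random sequence, a given hexamer is expected roughly 24 times (99,995 / 4,096), so features beyond the primary sequence must help decide which of these sites are actually bound.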
In this Opinion article, we discuss recent findings that demonstrate the functional coupling and interdependence of mRNA modifications and structures, and provide an updated view of gene expression that incorporates the dynamics and physiological relevance of these RNA features.
The epitranscriptome is shaped by the activity of evolutionarily conserved factors that encode (writers), decode (readers), and remove (erasers) various chemical modifications on RNA bases (Figure 1a & b). More than 100 distinct modifications have been found in cellular RNAs13, and the list continues to grow. While most modifications, including N6-methyladenosine (m6A), 5-methylcytosine (m5C), N1-methyladenosine (m1A), and pseudouridine (ψ), as well as adenosine-to-inosine and cytosine-to-uridine editing (not discussed here, see reviews for details14, 15), were originally identified in highly abundant rRNAs, tRNAs, and small nuclear RNAs (snRNAs)16, 17, recent advances in sequencing and mass spectrometry have allowed their detection and characterization in comparatively low-abundance mRNAs. Genome-wide analyses coupled with gain- and loss-of-function studies of readers, writers, and erasers have (i) expanded the repertoire of mRNA modifications in eukaryotic transcriptomes, (ii) annotated their abundance and distribution along transcripts, (iii) mapped the modification sites conserved between species, (iv) uncovered changes in modification abundance and position in response to environmental changes, (v) demonstrated their impact on mRNA processing, stability, and translation, and (vi) revealed the physiological consequences of modification dynamics (Figure 1c).
Amongst internal mRNA modifications, m6A is the most prevalent, accounting for ~80% of all RNA base methylations in eukaryotes18, 19. m6A is deposited by a writer complex consisting of methyltransferase-like 3 and 14 (METTL3 and METTL14), Wilms’ tumor 1-associated protein (WTAP), protein virilizer homolog (KIAA1429), and RNA-binding motif proteins 15 and 15B (RBM15 and RBM15B)18–26, and emerging evidence demonstrates that this modification has important roles in regulating pre-mRNA processing. In mammals, m6A is enriched in 3′ untranslated regions (UTRs), introns, and alternatively spliced exons18, 19, 27, and depletion of METTL3, WTAP, or the m6A eraser fat mass and obesity-associated protein (FTO)28 results in splicing defects in hundreds of genes18, 25, 29. A portion of these events occurs in 3′UTRs and last exons, suggesting that m6A also affects polyadenylation. Indeed, in a subset of mRNAs, higher m6A density near proximal poly(A) sites correlates with decreased usage of those sites, whereas m6A upstream of and near distal poly(A) sites correlates with their increased usage27.
The sizeable proportion (~30%) of m6A found in introns indicates that it is likely deposited co-transcriptionally, before splicing23, 30. Likewise, the co-localization of m6A methyltransferases and demethylases with pre-mRNA splicing factors in nuclear speckles suggests an intrinsic connection between m6A and splicing23, 31. Furthermore, deficiency of the m6A demethylase AlkB homolog 5 (ALKBH5) disrupts the recruitment of the splicing factors SRSF1 and SRSF2, as well as SRSF protein kinase 1 (SRPK1), to nuclear speckles31. As the phosphorylation of SR proteins by SRPKs and their localization are intimately linked with splicing32, 33, these results partially explain how changes in m6A could affect splice site choice globally. In another study, FTO-regulated m6A sites displayed a marked overlap (60–80%) with SRSF2-bound exonic splicing enhancer sequences29. Strikingly, SRSF2 binding and exon inclusion are inversely correlated with FTO expression, indicating that m6A potentiates SRSF2 binding and splicing regulation. However, it is unclear whether this is a direct effect or whether m6A-binding proteins modulate SRSF2 binding. The m6A reader YTH domain containing 1 (YTHDC1) provides an example supporting the second scenario: by binding to m6A, YTHDC1 modulates thousands of splicing events by promoting the binding of SRSF3 and inhibiting the binding of SRSF10 to target transcripts34. Splicing factors can also bind directly to m6A-containing sequences. HNRNPA2B1, a member of the heterogeneous nuclear ribonucleoprotein (hnRNP) family of splicing factors, binds directly to m6A-bearing regions near METTL3-regulated exons and, like METTL3, regulates their alternative splicing35. Thus, m6A serves as a docking site that enables protein–RNA interactions, which in turn directly or indirectly influence pre-mRNA processing.
In addition to affecting pre-mRNA processing, RNA modifications control mRNA decay. The development of chemical labeling techniques has enabled transcriptome-wide mapping of ψ in yeast and human mRNAs36–39. Pseudouridylation of mRNAs is catalyzed by the pseudouridylate synthase (PUS) family of enzymes. Upon heat shock in yeast, PUS7 catalyzes ψ formation at more than 200 distinct sites39, and PUS7 targets are more highly expressed in wild-type cells than in PUS7-deficient cells, suggesting that ψ actively regulates mRNA stability. Conversely, m6A appears to promote mRNA decay: depletion of the m6A writer METTL3 results in a global increase in the half-lives of its target mRNAs40, 41. In mouse and human embryonic stem cells, this stabilization affects pluripotency genes, increasing their abundance and causing a failure to exit naïve pluripotency41, 42. These effects are regulated, with opposing consequences, by the m6A reader YTH domain family 2 (YTHDF2) and by the RBP human antigen R (HuR). YTHDF2 binding to m6A recruits the CCR4-NOT deadenylase complex43 and facilitates mRNA transport to processing bodies44, sites of mRNA decay45. However, YTHDF2 was recently reported instead to stabilize HIV mRNAs and YTHDF2-tethered reporter constructs46, raising questions about its role in destabilizing mRNAs. By contrast, HuR stabilizes these mRNAs by binding to their 3′UTRs, thereby blocking miRNA binding30. Thus, decreased methylation following METTL3 depletion promotes HuR binding and stabilizes target transcripts by preventing miRNA-dependent mRNA decay.
The spatial proximity of m6A and miRNA-binding sites in 3′UTRs raises an interesting question: do miRNAs guide m6A writers to methylate specific adenosines? Recently, miRNAs were shown to modulate METTL3 binding to transcripts through a sequence-dependent pairing mechanism and to confer methylation specificity independently of the miRNA-interacting Argonaute (AGO) proteins47. Although intriguing, this result has yet to be validated by other groups. METTL3 also methylates many primary miRNAs (pri-miRNAs), and this methylation is required for their processing by DGCR8, a component of the Microprocessor complex48. This step is mediated by HNRNPA2B1, which recruits DGCR8 to pri-miRNAs48. Additionally, although the abundance of m5C in mRNA is low, transcriptome-wide profiling shows substantial overlap between m5C sites and AGO1–4 binding sites49, suggesting a role for m5C in miRNA-mediated silencing. Collectively, these studies highlight the dynamic cooperation between the epitranscriptome and miRNAs in modulating mRNA decay.
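Overlap statistics such as those cited above (m5C sites versus AGO1–4 binding sites, or FTO-regulated m6A sites versus SRSF2-bound enhancers) come down to asking what fraction of single-nucleotide modification sites falls within protein-binding intervals. The following is a minimal sketch under stated assumptions: coordinates are 0-based positions on a single transcript, binding intervals are half-open [start, end) and do not overlap one another, and the function name and example values are hypothetical.

```python
from bisect import bisect_right


def fraction_sites_in_intervals(sites: list[int], intervals: list[tuple[int, int]]) -> float:
    """Fraction of sites (0-based positions) lying in any half-open [start, end) interval.

    Assumes all coordinates refer to the same transcript and that the
    intervals do not overlap one another.
    """
    if not sites:
        return 0.0
    ordered = sorted(intervals)
    starts = [start for start, _ in ordered]
    hits = 0
    for pos in sites:
        # Index of the last interval whose start is <= pos (-1 if there is none).
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < ordered[i][1]:
            hits += 1
    return hits / len(sites)


if __name__ == "__main__":
    # Hypothetical modification sites and protein-binding intervals
    mod_sites = [12, 48, 130, 305, 410]
    binding = [(40, 60), (120, 140), (400, 420)]
    print(fraction_sites_in_intervals(mod_sites, binding))  # → 0.6
```

In practice, an observed fraction like this would be compared against a background, for example the fraction obtained for randomly placed sites on the same transcripts, before the overlap is interpreted as meaningful.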
Translational regulation is highly dynamic and, because pools of translation-competent mRNAs are already available, it allows rapid responses to environmental stimuli. Translation initiation can be cap-dependent or cap-independent50, and the epitranscriptome has profound effects on both mechanisms.
In human cells, the m6A reader YTHDF1 promotes cap-dependent translation initiation through associations with eukaryotic initiation factor 3 (eIF3) and other RBPs51. However, YTHDF1 has also been shown to be interchangeable with YTHDF2 and YTHDF3 in regulating mRNA decay43, 46, 52, suggestive of a role outside of translation. Although m6A levels are low in 5′UTRs, heat shock significantly increases m6A in 5′UTRs of heat-shock responsive genes in mammalian cells53, 54. Heat shock inhibits cap-dependent translation55 but, remarkably, a single m6A residue in 5′UTRs is sufficient to facilitate cap-independent translation of heat-shock responsive genes, including heat shock protein 70 (HSP70)53 (Figure 2). Under normal conditions, FTO removes m6A from the 5′UTRs of HSP70 and other heat shock-responsive genes, preventing their cap-independent translation. During heat shock, YTHDF2 translocates from the cytoplasm to the nucleus and binds m6A residues in the 5′UTR of heat shock-responsive genes, preventing demethylation by FTO and promoting rapid cap-independent translation54. In this fashion, 5′UTR m6A deposition ensures the production of proteins that are essential for survival during heat shock when general translation is halted.
m5C and its oxidized derivative 5-hydroxymethylcytidine (hm5C) are also implicated in translational regulation. Expression of the m5C writer NOP2/Sun RNA methyltransferase family member 2 (NSUN2) is tightly regulated during the cell cycle, and its overexpression delays replicative senescence through methylation of the transcripts encoding cyclin-dependent kinase 1 (CDK1) and the CDK inhibitor 1B (p27KIP1). Deposition of m5C in the CDK1 3′UTR enhances its translation, whereas m5C in the 5′UTR of p27KIP1 represses its translation, together resulting in increased cellular proliferation56, 57. Deposition of hm5C in mRNA by methylcytosine dioxygenases58 correlates with increased association with polyribosomes in Drosophila melanogaster59. Recently, the presence of another modification, m1A, was discovered in mRNA60, 61. Interestingly, m1A displays tissue specificity and plasticity in response to nitrogen deprivation in yeast60 as well as to multiple stimuli in mammalian cells60, 61. Most modified transcripts harbor a single m1A in their 5′UTRs, and m1A-bearing transcripts have higher translation efficiency and protein levels than unmodified transcripts. Finally, pseudouridylation can affect the translation of some transcripts through alternative decoding of pseudouridylated codons, which may improve stress tolerance62.
While much work remains, these studies clearly demonstrate the impact of RNA modifications on different steps of gene expression. A natural question is how modifications link various nuclear and cytoplasmic processes. For m6A, the distinct subcellular localization of readers, together with their distinct protein–protein interaction domains, appears to allow them to interact with different downstream effector proteins and thereby regulate specific events in the nucleus and cytoplasm29, 34, 35, 44, 51, 54.
Unlike DNA, which predominantly adopts a double-helical conformation, RNA forms extensive intramolecular interactions that allow it to fold into a vast array of complex structures63. RNA structure is highly dynamic and is governed by factors such as temperature, cellular energy state, ATP-dependent RNA helicases, chaperone proteins, and other RBPs64. RNA structures enable a myriad of functions, including encoding genetic information and catalyzing chemical reactions65, 66. The dynamic nature of RNA structure is illustrated by splicing, during which structural rearrangements of snRNAs permit the recognition of splice sites and branch point sequences in pre-mRNAs and facilitate the stepwise assembly of the spliceosome67. Numerous studies have confirmed that mRNAs also contain structured regions, although they are significantly less folded than noncoding RNAs. The advent of chemical modification-based probing and RBP–RNA cross-linking methods11, 12, 68, 69 has allowed transcriptome-wide mapping of RNA structures in different species. These studies have unveiled a previously unappreciated and complex layer of gene regulation.
RNA folding begins as the transcript is being synthesized by RNA polymerase II (Pol II)70–72. Transcript elongation by Pol II is not continuous, but rather entails periods of active elongation interrupted by pauses and backtracking. The formation of RNA structures, such as hairpins, on the nascent transcript can create physical barriers that prevent backtracking and promote forward elongation by Pol II70, 71. As splicing and polyadenylation are dependent upon Pol II elongation kinetics73, 74, RNA structure–elongation coupling likely affects downstream RNA processing.
RNA structures may also regulate RNA processing directly through three distinct mechanisms. First, local variations in RNA structure can promote or inhibit the binding of RBPs8. Secondary structure surrounding the consensus motifs for the splicing factors RNA binding protein, Fox-1 homolog 2 (RBFOX2) and Muscleblind-like 1 (MBNL1) is a key determinant of whether these sites are bound in vivo. Moreover, motifs near evolutionarily conserved alternative exons are more single-stranded and exhibit stronger RBP binding than those near species-specific alternative exons or constitutive exons8. Distinct structural features also occur at 5′ and 3′ splice sites, at strong and weak splice sites, and at polyadenylation signals, indicating that local structures regulate RNA processing globally68, 75, 76. Second, longer-range RNA structures could facilitate splicing regulation by bringing distal regulatory regions into close proximity to target exons. RBFOX2 provides an example of this mechanism: more than half of RBFOX2 binding sites lie more than 500 nucleotides (nt) away from any annotated exon77. Regulation from these deep intronic sites is enabled by the formation of long-range intronic structures that deliver RBFOX2 close to target exons (Figure 3A). Moreover, recent studies using methods that detect long-range RNA duplexes have revealed that up to 40% of transcript structures span more than 300 nt10, 78, 79. Thus, the structure-based proximity mechanism observed for RBFOX2 may be common to other RBPs. Third, RNA modifications may stabilize or disrupt structural elements and thereby influence RBP accessibility. Such a mechanism regulates the binding of HNRNPC80, an RBP involved in mRNA stability and processing7, 81, 82. U-tract sequences recognized by HNRNPC are often buried within stem structures that prevent binding. Adenosines in these stems are often N6-methylated, which destabilizes the stem structure, exposes the U-tract, and allows HNRNPC binding (Figure 3B). Thousands of such m6A structural switches were identified, and the expression of nearly 2,000 genes and more than 100 splicing events were found to be regulated by both HNRNPC and the METTL3–METTL14 complex80. Taken together, these studies indicate that the structural landscape of the transcriptome is an essential mediator of RNA processing, and yet another factor that must be fully understood to decipher the splicing code.
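Long-range duplexes of the kind described above can be quantified from a secondary structure written in dot-bracket notation by matching each ")" with its "(" partner and measuring the distance between them. The sketch below assumes a simple, pseudoknot-free dot-bracket string and counts long-range contacts per base pair rather than per structural element; the 300-nt threshold mirrors the figure quoted in the text, and the toy structure and function names are hypothetical.

```python
def base_pairs(dot_bracket: str) -> list[tuple[int, int]]:
    """Return 0-based (i, j) pairs, i < j, from a pseudoknot-free dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(pos)  # opening partner waits for its match
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # closes the most recent open base
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def long_range_fraction(dot_bracket: str, min_span: int = 300) -> float:
    """Fraction of base pairs whose partners lie more than min_span nt apart."""
    pairs = base_pairs(dot_bracket)
    if not pairs:
        return 0.0
    n_long = sum(1 for i, j in pairs if j - i > min_span)
    return n_long / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy structure: a three-base-pair helix closing a 400-nt loop,
    # followed by a short local hairpin.
    structure = "(((" + "." * 400 + ")))" + "((....))"
    print(long_range_fraction(structure))  # → 0.6 (3 of 5 pairs span more than 300 nt)
```

A stack is the natural data structure here because, without pseudoknots, every closing bracket pairs with the most recently opened, still-unpaired base.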
RNA structure has also been linked to translational regulation. Across plants, animals, and fungi, the ~5-nt region surrounding the start codon displays a significant lack of structure11, 68, 76, 83, 84. This feature is enriched in genes with high translation efficiency and absent in those with low efficiency11, 68, 76, 83, 84, indicating that start codon accessibility promotes translation. RNA secondary structure is also highly dynamic in response to external stimuli. When yeast are shifted from 30 °C to 37 °C, more than 25,000 bases, clustered at ~2,000 sites in mRNAs, specifically become unpaired84. These heat-sensitive bases are enriched in 5′UTRs, regions in which structure is known to influence translation85. Future studies may determine whether temperature-sensitive RNA structural elements indeed regulate eukaryotic gene expression.
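Start-codon accessibility of this kind is usually summarized from chemical probing data, in which higher per-nucleotide reactivity generally indicates an unpaired base. The sketch below assumes a hypothetical, already normalized reactivity profile aligned to the transcript (None for missing values) and averages it over the AUG plus one nucleotide on each side, our reading of the ~5-nt window mentioned above; the function names and numbers are illustrative, not taken from the cited studies.

```python
from statistics import mean
from typing import Optional, Sequence

Reactivity = Sequence[Optional[float]]


def window_mean(reactivity: Reactivity, aug_start: int, flank: int = 1) -> float:
    """Mean reactivity over the AUG plus `flank` nt on each side, skipping missing values."""
    lo = max(0, aug_start - flank)
    hi = min(len(reactivity), aug_start + 3 + flank)  # AUG occupies aug_start..aug_start+2
    values = [r for r in reactivity[lo:hi] if r is not None]
    if not values:
        raise ValueError("no reactivity values in the start-codon window")
    return mean(values)


def relative_accessibility(reactivity: Reactivity, aug_start: int, flank: int = 1) -> float:
    """Window mean divided by the transcript-wide mean (>1: more accessible than average)."""
    background = mean(r for r in reactivity if r is not None)
    if background == 0:
        raise ValueError("transcript-wide mean reactivity is zero")
    return window_mean(reactivity, aug_start, flank) / background


if __name__ == "__main__":
    # Hypothetical normalized reactivities; the A of the AUG is at index 4.
    profile = [0.1, 0.2, None, 1.4, 1.2, 1.6, 1.3, 1.5, 0.2, 0.1]
    print(round(window_mean(profile, aug_start=4), 2))
    print(round(relative_accessibility(profile, aug_start=4), 2))
```

Normalizing to the transcript-wide mean is one simple way to compare transcripts with different overall reactivity; other normalizations are equally plausible.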
Another structural element that has gained attention in the past decade is the RNA G-quadruplex (RGQ), a four-stranded structure formed by the stacking of multiple planar G-quartets (Figure 3C), which has been implicated in the regulation of splicing, stability, and translation. Although RGQs are prevalent in vitro, especially in 5′ and 3′ UTRs, a recent in vivo genome-wide analysis revealed that they are globally unfolded in mammalian cells86. Nonetheless, many individual examples of functionally important RGQs have been published85, 87. Moreover, inhibition of the RGQ-resolving RNA helicase eIF4A in T-cell acute lymphoblastic leukemia (T-ALL) cells induces apoptosis and delays tumor growth owing to decreased translation of MYC, NOTCH, BCL2, and other oncogenes88. Transcripts affected by eIF4A inhibition have longer 5′UTRs on average and display a strong enrichment for the 12-nt RGQ-forming motif (CGG)4, that is, four tandem CGG repeats. Thus, the tumorigenic properties of eIF4A may stem from its ability to unwind translation-inhibiting RGQs in the 5′UTRs of oncogenes. In addition to 5′UTR structures, RGQs in the 3′UTRs of mRNAs can also affect mRNA translation and stability by occluding miRNA binding sites89.
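Locating the (CGG)4 motif in a 5′UTR is a simple pattern search. The sketch below uses Python's `re` module with a zero-width lookahead so that overlapping occurrences inside longer CGG repeat tracts are all reported; the UTR fragment and function name are hypothetical. Finding the motif says nothing about whether an RGQ actually folds in cells, which, as noted above, is often not the case, and the search covers only this exact motif rather than G-quadruplex-forming sequences in general.

```python
import re

# Zero-width lookahead: reports every start position of (CGG)4, including
# overlapping occurrences inside longer CGG repeat tracts.
CGG4 = re.compile(r"(?=(?:CGG){4})")


def find_cgg4(utr: str) -> list[int]:
    """Return 0-based start positions of the (CGG)4 motif in a sequence."""
    return [match.start() for match in CGG4.finditer(utr.upper())]


if __name__ == "__main__":
    # Hypothetical 5'UTR fragment containing five tandem CGG repeats
    utr = "GCCAUCGGCGGCGGCGGCGGAUUCC"
    print(find_cgg4(utr))  # → [5, 8]
```

A tract of five CGG repeats yields two overlapping (CGG)4 hits; a non-overlapping search would report only the first.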
The degradation of mRNAs is primarily controlled by the exosome, a multiprotein exonuclease complex90. Exosome-mediated degradation requires a single-stranded region at the 3′ end of the mRNA, providing an opportunity for RNA structure to affect transcript stability. Indeed, 3′UTR structures involving the poly(A) tail are the primary determinants of transcript stability in yeast, with alternative polyadenylation isoforms of the same gene displaying drastically different half-lives91, 92, although stability differences between 3′UTR isoforms are subtler in mammalian cells93. Likewise, the formation of 3′ triple-helical structures in viral lncRNAs and in nonpolyadenylated mammalian long noncoding RNAs (lncRNAs) drastically increases their half-lives by preventing exonuclease access to the 3′ end (BOX 1). During heat shock in yeast, RNA structures with low melting temperatures are rapidly degraded by the exosome as their 3′ ends become accessible84. As discussed above, ψ deposition increases upon heat shock and increases target mRNA stability39. As ψ-A base pairs are more stable than U-A base pairs62, ψ might increase mRNA half-lives by preventing the melting of RNA structures that protect the 3′ end at elevated temperatures. In contrast to ψ, m6A acts as a “spring-loaded” modification that disrupts RNA duplexes because it must adopt an unfavorable, high-energy conformation to base pair94, 95. Indeed, transcriptome-wide structural analysis revealed a distinct structural profile at m6A sites consistent with unpaired RNA11. In addition to its role in miRNA-mediated decay, m6A, which is strongly enriched in 3′UTRs, may therefore promote transcript degradation by destabilizing 3′-protecting structural elements. Interestingly, 3′UTR structures are not always protective. Transcriptome-wide analysis of Staufen 1 (STAU1), an RBP that binds double-stranded RNA and regulates mRNA stability96, revealed that it binds thousands of duplexes in 3′UTRs12. Depletion of STAU1 significantly increased the stability of mRNAs containing these 3′ duplexes, demonstrating that these structures promote transcript degradation.
Canonical transcript maturation involves 3′ end cleavage and the addition of a poly(A) tail of up to 250 nucleotides. A poly(A) tail is present in the majority of eukaryotic mRNAs; it is essential for proper nucleocytoplasmic export and translation, and it confers stability by protecting the coding information from degradation by exonucleases100. Interestingly, a sizeable portion (15–25%) of annotated mRNAs and long noncoding RNAs (lncRNAs) are not polyadenylated or are bimorphic, with some copies containing and others lacking a poly(A) tail101, raising the question of how these transcripts are stabilized and, in the case of mRNAs, translated. One such group of nonpolyadenylated mRNAs is transcribed from the intronless, replication-dependent histone genes. These mRNAs are instead stabilized by highly conserved stem-loop structures in their 3′UTRs, which are also required for their translation102. This provides direct evidence that structures at the 3′ end of transcripts can functionally replace the poly(A) tail in both RNA turnover and translation.
The triple-helix structure that forms at the expression and nuclear retention element (ENE) at the 3′ end of PAN (polyadenylated nuclear), a lncRNA of Kaposi’s sarcoma-associated herpesvirus, protects the transcript from rapid, deadenylation-dependent nuclear decay103–105. Recently, a pair of studies discovered similar ENE-like elements in two mammalian nonpolyadenylated lncRNAs, metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) and multiple endocrine neoplasia-β (MENβ)106, 107. Neither MALAT1 nor MENβ is processed by the canonical cleavage and polyadenylation machinery; instead, both contain 3′-terminal tRNA-like structures that are recognized and cleaved by RNase P108, 109. Following cleavage, the genomically encoded A-rich tracts at the 3′ ends of both lncRNAs form triple-helical interactions nearly identical to those formed in PAN RNA106, 107. These structures are essential for MALAT1 and MENβ stability, as deletions or destabilizing mutations in them drastically reduce the transcripts’ half-lives. Strikingly, placing the MALAT1 or MENβ 3′ structures onto polyadenylation-deficient reporter genes not only significantly stabilizes the reporter mRNAs but also supports their efficient translation107, suggesting that such RNA structures may have a wider role in the translation and/or stabilization of other poly(A)-lacking transcripts. More recently, bioinformatic analysis identified similar stabilizing structures in hundreds of transposon transcripts in fungi, providing the first evidence of triple helices in mRNAs110.
An increasing number of studies have recognized that dynamic changes in RNA structures and modifications are indispensable features of eukaryotic gene expression. Recent findings support a new paradigm in which these features coordinate individual gene expression processes by modulating the access of specific RBPs to their binding sites and by enabling active reprogramming of the transcriptome in response to changing cellular conditions. Current RNA structural data represent only brief snapshots in time. However, mRNAs are likely refolded dynamically as they progress through their life cycle, entering distinct subcellular compartments and associating with different RBPs. These structural rearrangements may affect downstream processes, remodeling the mRNA–protein complex through the occlusion, exposure, or formation of RBP binding sites. Thus, tracking the structures of individual transcripts throughout their life cycle will be necessary to fully understand the impact of mRNA structure. Future studies should determine how external cues fine-tune the structural and chemical variations within transcripts to produce a coherent biological response. For instance, do different signaling pathways recruit distinct epitranscriptome readers, writers, or erasers as effectors to influence the fates of individual mRNAs? Efforts to map the epitranscriptome at single-nucleotide resolution must also continue, in order to identify the exact nucleotides modified in each transcript, define how these sites are selected, and determine whether modification dynamics are achieved mainly through active control of writers or of erasers.
Another important aspect yet to be addressed is the dynamics of RNA modification stoichiometry. Current epitranscriptome studies deal mostly with the location of modification sites, not with the fraction of transcripts modified at each site. Low-throughput analyses of m6A sites in mRNA and viral RNA have found that none of the sites examined is 100% modified97–99. Changes in the modified fraction at individual sites may therefore represent an additional dynamic parameter of RNA modification biology. Because modifications can affect mRNA structure and/or the recruitment of RBPs, partial modification at a single site would generate two distinct mRNA species that differ only in their structures or bound reader proteins (Figure 4). Fractional modification could thus represent another mechanism for generating functional diversity from the same RNA transcript. High-throughput methods that can determine the modified fraction at each site are urgently needed to address this aspect of the epitranscriptome. This is especially important for identifying the functional residues in heavily modified transcripts. For instance, the X-inactive specific transcript (XIST) lncRNA is m6A-methylated at nearly 80 distinct sites, more than any other RNA, and these modifications are necessary for recruiting YTHDC1 and for transcriptional silencing of genes on the X chromosome24. However, it is unknown what fraction of each of these adenosines is modified and, in turn, which modified bases, or combinations of them, are required for this function.
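If an assay yielded, for a given site, counts of reads from modified and unmodified copies (a hypothetical input here; the counts below are invented), the modified fraction and its uncertainty could be estimated as in the sketch below. The Wilson score interval is one standard choice for a binomial proportion and is used purely for illustration, not as a method from the cited studies. The hypothetical helper `split_species` makes the "two distinct mRNA species" idea concrete by partitioning a number of transcript copies.

```python
from math import sqrt


def modification_fraction(modified: int, unmodified: int, z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate and Wilson score interval for the modified fraction at one site."""
    if modified < 0 or unmodified < 0:
        raise ValueError("read counts must be non-negative")
    n = modified + unmodified
    if n == 0:
        raise ValueError("no reads cover this site")
    p = modified / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    # Clamp to [0, 1] to absorb floating-point error at the boundaries.
    return p, max(0.0, center - half_width), min(1.0, center + half_width)


def split_species(copies: int, fraction: float) -> dict[str, int]:
    """Partition transcript copies into modified and unmodified species at one site."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    n_modified = round(copies * fraction)
    return {"modified": n_modified, "unmodified": copies - n_modified}


if __name__ == "__main__":
    # Hypothetical read counts at a single m6A site
    p, low, high = modification_fraction(modified=30, unmodified=70)
    print(f"modified fraction {p:.2f} (95% CI {low:.2f}-{high:.2f})")
    print(split_species(1000, p))
```

The interval matters as much as the point estimate: a site covered by few reads can look partially modified by chance, which is one reason deep, site-resolved measurements are needed.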
Finally, we also need to explore further how structural and chemical alterations cooperate to allow functional coupling of different steps of the mRNA life cycle. In addition to the functions provided by direct reader proteins, RNA modifications may be one of the primary drivers of structural plasticity within the transcriptome by stabilizing or disrupting base-pairing interactions. As such, the phenotypes observed upon perturbations to modification writers or erasers may be—at least in part—due to global changes in the RNA structural topography.
A.K. is supported by grants from the US National Institutes of Health (R01HL126845), March of Dimes (5-FY14-112), and the Center for Advanced Study at the University of Illinois. T.P. is supported by the US National Institutes of Health (R01GM113194). The authors apologize to those whose work could not be cited because of lack of space.
Competing interests statement
The authors declare no competing financial interests.