Decoding post-transcriptional regulatory programs in RNA is a critical step in the larger goal of developing predictive dynamical models of cellular behavior. Despite recent efforts1–3, the vast landscape of RNA regulatory elements remains largely uncharacterized. A longstanding obstacle is the contribution of local RNA secondary structure in defining interaction partners in a variety of regulatory contexts, including but not limited to transcript stability3, alternative splicing4 and localization3. There are many documented instances where the presence of a structural regulatory element dictates alternative splicing patterns (e.g. human cardiac troponin T) or affects other aspects of RNA biology5. Thus, a full characterization of post-transcriptional regulatory programs requires capturing information provided by both local secondary structures and the underlying sequence3,6. We have developed a computational framework based on context-free grammars3,7 and mutual information2 that systematically explores the immense space of small structural elements and reveals motifs that are significantly informative of genome-wide measurements of RNA behavior. The application of this framework to genome-wide mammalian mRNA stability data revealed eight highly significant elements with substantial structural information, for the strongest of which we showed a major role in global mRNA regulation. Through biochemistry, mass-spectrometry, and in vivo binding studies, we identified HNRPA2B1 as the key regulator that binds this element and stabilizes a large number of its target genes. Ultimately, we created a global post-transcriptional regulatory map based on the identity of the discovered linear and structural cis-regulatory elements, their regulatory interactions and their target pathways. This approach can also be employed to reveal the structural elements that modulate other aspects of RNA behavior.
To isolate stability from other aspects of mRNA behavior, we performed whole-genome mRNA stability measurements by incubating MDA-MB-231 cells in the presence of 4-thiouridine (4sU), which is efficiently incorporated into cellular RNA. Subsequently, 4sU-labeled transcripts were captured and quantified at different time-points after the removal of 4sU from the growth medium. We calculated a relative decay rate for each transcript based on the rate at which 4sU-labeled transcripts, in the absence of 4sU in the media, are replaced by newly synthesized unlabeled mRNAs in the population (Supplementary Fig. 1). These measurements were then used to identify the putative cis-regulatory elements (linear and structural) that underlie transcript stability. A number of methods have been previously introduced for discovering structural motifs mainly based on free energy minimization, local sequence alignments or a combination of both alignments and secondary structure predictions 3,6,8. However, the extent to which these in silico predictions reflect stable in vivo molecular conformations has not been fully explored9. In fact, the RNA binding proteins and complexes that interact with their target transcripts may facilitate the formation of secondary structures in vivo. Thus, we sought to bypass the need for predicting thermodynamically stable secondary structures by efficiently enumerating a large space of potential structural motifs. We developed TEISER (Tool for Eliciting Informative Structural Elements in RNA), a framework for identifying the structural motifs that are informative of whole-genome measurements across all the transcripts. In this approach, structural motifs are defined in terms of context-free grammars7 (CFGs) that represent hairpin structures as well as primary sequence information (see Methods and Supplementary Fig. 2). 
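To make the idea of a hairpin motif that combines structure and sequence concrete, the following is a minimal sketch and not TEISER's actual grammar or implementation. It assumes a deliberately simplified motif: a stem of fixed length whose opposing positions must be able to pair (Watson-Crick or G-U wobble), enclosing a loop that follows a pattern in which `N` means any nucleotide. The names `PAIRS`, `matches_at` and `has_motif` are hypothetical and introduced only for illustration.

```python
# Hypothetical, simplified stand-in for a structural seed motif:
# a stem whose positions must be able to base-pair, plus a loop pattern.
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def matches_at(seq, start, stem_len, loop_pattern):
    """Return True if a hairpin (stem, loop, stem) fits at `start`."""
    loop_len = len(loop_pattern)
    end = start + 2 * stem_len + loop_len
    if end > len(seq):
        return False
    # The i-th base of the 5' stem arm must pair with the i-th base
    # counted backwards from the end of the 3' stem arm.
    for i in range(stem_len):
        if (seq[start + i], seq[end - 1 - i]) not in PAIRS:
            return False
    # Loop positions must match the pattern; "N" accepts any nucleotide.
    loop = seq[start + stem_len:start + stem_len + loop_len]
    return all(p == "N" or p == c for p, c in zip(loop_pattern, loop))


def has_motif(seq, stem_len, loop_pattern):
    """Return True if the motif occurs anywhere in the sequence."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    return any(matches_at(seq, i, stem_len, loop_pattern)
               for i in range(len(seq)))


if __name__ == "__main__":
    utr = "AAGGCCUUCGGGCCAA"
    print(has_motif(utr, stem_len=4, loop_pattern="UNCG"))
```

Scanning every transcript with a function like `has_motif` yields one presence/absence value per transcript, which is the kind of binary profile that can then be compared against genome-wide measurements. Note that this sketch checks only whether pairing is possible, in keeping with the goal of not requiring a predicted thermodynamically stable fold.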
TEISER employs mutual information to measure the regulatory consequences of the presence or absence of each of roughly 100 million different seed CFGs (see Methods). Mutual information is a robust non-parametric measure that reveals general dependencies across discrete or continuous measurements2,10. For example, when applied to the transcript stability data, TEISER captures the dependency between the stability of each mRNA and the presence or absence of a given structural motif in its 5’ and 3’ untranslated regions (UTRs). TEISER then uses these measurements to select and further refine the most informative motifs, and performs a series of statistical tests (e.g. randomization-based statistics and jackknifing tests) to achieve very low (<0.01) false-discovery rates (see Methods and Supplementary Fig. 2).
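The dependency measure described above can be illustrated with a small sketch. This is not the TEISER code: it assumes that stability values are discretized into equal-population bins, that the motif profile is a 0/1 vector per transcript, and that significance is summarized as a z-score against shuffled profiles. The function names, the number of bins and the number of shuffles are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def quantize(values, n_bins):
    """Equal-population binning: rank the values, then split the ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def permutation_z(presence, bins, n_shuffles=200, seed=0):
    """Observed MI and its z-score relative to shuffled bin labels."""
    rng = random.Random(seed)
    observed = mutual_information(presence, bins)
    shuffled = list(bins)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.append(mutual_information(presence, shuffled))
    mean = sum(null) / len(null)
    sd = math.sqrt(sum((v - mean) ** 2 for v in null) / (len(null) - 1))
    z = (observed - mean) / sd if sd > 0 else float("inf")
    return observed, z


if __name__ == "__main__":
    # Toy data: transcripts carrying the motif tend to be more stable.
    stability = [i / 60 for i in range(60)]
    presence = [0] * 30 + [1] * 30
    print(permutation_z(presence, quantize(stability, 2)))
```

Because the null distribution is built by shuffling, the score makes no parametric assumption about how stability depends on the motif, which is the property the text attributes to mutual information.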
Application of TEISER to the mRNA stability measurements in MDA-MB-231 cells revealed eight strong structural motif predictions that passed our statistical tests aimed at finding the most likely elements causally involved in mRNA stability (Fig. 1 and Supplementary Fig. 3). Apart from being highly informative of mRNA stability measurements, these putative regulatory elements show a variety of other characteristics that support their functionality. For example, four of the discovered motifs are also informative of transcript stability measurements in mouse11 (Supplementary Fig. 4a). Furthermore, these motifs are highly conserved between human and mouse genomes (see Methods and Supplementary Fig. 3) and are also informative of co-expression clusters discovered across independent whole-genome datasets (Supplementary Fig. 4b).
Among the putative structural motifs discovered by TEISER, we chose sRSM1 (structural RNA Stability Motif-1)—the most statistically significant 3’ UTR element (z-score=122)—for further analysis. In order to probe the functionality of sRSM1 instances across the genome, we performed in vivo titration experiments using synthetic oligonucleotides10,12. Upon transfecting MDA-MB-231 cells with decoy RNA molecules harboring sRSM1 instances (Supplementary Fig. 5), we observed a notable reduction in the level of endogenous transcripts that carried this motif, in comparison to their level in the control cells transfected with scrambled RNA molecules (Fig. 2). This global down-regulation points to the presence of a trans-acting factor that, upon interaction with sRSM1, stabilizes its target transcripts. The decoy (synthetic) sRSM1 elements compete with endogenous mRNAs for the putative trans-acting factor, which results in the observed reduction in the level of its target mRNAs. Furthermore, reporter constructs carrying instances of sRSM1 showed a marked decrease in transcript decay rate in comparison to scrambled controls, further suggesting a direct role for this structural element in transcript stability (Supplementary Fig. 6).
We used streptomycin-binding RNA aptamer immobilization coupled with mass spectrometry13 to discover candidates that bind, in vitro, to the decoy instances of sRSM1, but not to the scrambled versions (Supplementary Fig. 7). After isolation under stringent conditions and in-solution digestion of RNA-bound proteins followed by nanoliquid chromatography-tandem mass spectrometry, we identified HNRPA2B1 as a promising candidate (Supplementary Table 1). This RNA-binding protein is a member of the A/B subfamily of heterogeneous nuclear ribonucleoproteins (hnRNPs)14 and carries two repeats of quasi-RRM RNA binding domains (Supplementary Fig. 8). Moreover, the established roles of other members of this family, namely HNRNPD and HNRNPA1, in regulating RNA stability15 and binding terminal stem-loops16 further suggest HNRPA2B1 as a functional regulator. Also, more than 4,000 transcripts carry potentially functional instances of sRSM1 (see Methods), implicating this motif as a major global regulator of mRNA stability. The HNRPA2B1 transcript, at the same time, is highly abundant in the cell (one standard deviation higher than average17), thus making it a promising candidate for global modulation of mRNA stability through sRSM1.
In order to directly assess the regulatory consequences of modulating HNRPA2B1, we performed knock-down experiments followed by gene expression profiling. Consistent with our prior observations, HNRPA2B1 knock-down caused a significant decrease in the expression level of transcripts carrying sRSM1 (Fig. 3a). Stability measurements in the knock-down cells confirmed that the observed down-regulation of these transcripts was in fact due to changes in stability (see Methods), with the transcripts carrying sRSM1 elements showing a marked increase in their corresponding relative decay rates (Fig. 3b).
In principle, our observations are consistent with a possible indirect role for HNRPA2B1, brought about, for instance, by a common partner that binds both HNRPA2B1 and sRSM1 sites. The direct interaction between HNRPA2B1 and its potential target genes can be tested through cross-linking and immunoprecipitation of HNRPA2B1, which, through local UV photoreactivity of bases and amino acids, can detect direct physical interactions18. We expressed a tagged clone of HNRPA2B1 in MDA-MB-231 cells, and after UV-crosslinking, immunoprecipitated this protein and the target mRNA molecules that were bound to it. We then labeled the isolated RNA population and hybridized it to microarrays with the input total RNA as control (a method called RIP-chip19). We observed a highly significant enrichment of sRSM1 in the immunoprecipitated population (Fig. 3c). In order to reduce the background and better pinpoint the HNRPA2B1 binding sites, we treated the samples with nuclease prior to immunoprecipitation under denaturing conditions and sequenced the HNRPA2B1-bound RNA population (HITS-CLIP20). We observed that sRSM1 elements were significantly enriched in the identified putative binding sites, in comparison with randomly selected sequences21 (Fig. 3d). These observations demonstrate that HNRPA2B1 directly interacts with sRSM1 in vivo and functions to stabilize its target transcripts through this regulatory element. These transcripts, in turn, modulate a variety of cellular processes and pathways. For example, we observed a significant positive correlation between sRSM1 target transcripts and doubling time in NCI-60 breast cancer cell lines (Fig. 4a). Indeed, knocking down HNRPA2B1 resulted in a slight but significant increase in growth rate (by 10%, p-value < 10^-8), further highlighting the regulatory role of this global modulator in a key cellular process (Fig. 4b).
Revealing the detailed post-transcriptional regulatory code relies on the discovery of all the cis-regulatory elements that contribute to changes in transcript abundance. In addition to the sRSMs identified through TEISER, we also discovered a large diverse set of lRSMs (linear RNA Stability Motifs), including six known miRNA recognition sites, that are informative of transcript stability measurements (Supplementary Fig. 9). These motifs were identified by FIRE2, a framework for discovering informative linear motifs. Combining these two approaches provided us with an extensive set of putative regulatory elements that cover both structural and primary sequence components. The next step in deciphering the post-transcriptional regulatory program involves the identification of target pathways that are potentially modulated by each element. Using iPAGE10, for pathway analysis of gene expression, we showed that our discovered elements likely target a diverse array of cellular processes and pathways (Supplementary Fig. 10). For example, the sRSM1 structural element is significantly enriched in the 3’ UTRs of the genes involved in “Notch signaling”, while avoiding the UTRs of other pathways such as “nucleosome assembly” (Supplementary Fig. 11). These results demonstrate that while post-transcriptional regulatory mechanisms are poorly characterized, they have potentially far-reaching impact on specific cellular processes.
Regulatory programs often employ combinatorial interactions between various cis-regulatory elements to modulate gene expression2,22. We utilized mutual information to reveal such potential interactions in the post-transcriptional regulatory programs governing mRNA stability (Supplementary Figs. 12 and 13). For example, sRSM1 showed significant interactions with a number of structural and linear motifs, including sRSM8 and sRSM3 (Supplementary Fig. 11). These observed interactions might reflect cross talk, or insulation, between the underlying regulatory processes that act upstream of these elements. The full map of such interactions (Supplementary Figs. 14 and 15) reveals a complex network of motif-pathway relationships that sets the stage for molecular dissection and predictive modeling of post-transcriptional regulation from sequence.
While we have studied mRNA stability under normal and static conditions in a single cell line, the full regulatory program that governs mRNA stability likely involves a much richer repertoire of cis-regulatory elements operating within a more complex regulatory network. Also, while we have focused on transcript stability, our framework is general in concept and can be employed to study complex regulatory programs governing other aspects of RNA biology. For example, the established role of local secondary structures in shaping the splicing code4,23 suggests alternative splicing as a prominent area for analysis using this framework. The large repertoire of publicly available whole-genome expression datasets similarly offers a rich resource for identifying the post-transcriptional regulatory modules that underlie steady-state measurements.
TEISER relies on calculating mutual information values between whole-genome measurements and millions of predefined structural motifs. The statistically significant motifs are then optimized and elongated through a greedy algorithm. The mRNA stability measurements were performed using a previously published method1. The decoy/scrambled experiments and siRNA knock-downs were performed using Lipofectamine 2000 reagent (Invitrogen). For hybridizations, we used human 4×44k whole-genome arrays (Agilent). Isolation and identification of RNA-binding proteins were based on previously published protocols13,24. HNRPA2B1 target transcripts were isolated based on the CLIP protocol18.
Full Methods and any associated references are available in the online version of the paper at www.nature.com/nature.
We thank the members of the Tavazoie laboratory for helpful comments on the project and manuscript. We are also grateful to Nora Pencheva, Bambi Tsui, Sohail Tavazoie and Lars Dölken for their intellectual and technical contributions. S.T. was supported by grants from NHGRI (2R01HG003219) and the NIH Director's Pioneer Award.
Supplementary Information is linked to the online version of the paper at www.nature.com/nature.
Author Contributions: HG, HSN and ST conceived and designed the study. HG and HSN developed TEISER. RS contributed to the execution of the study. HG, HSN, TMG, PO, IMC and ST designed the experiments. HG, PO, LF and TMG performed the experiments. HG, HSN and TMG analyzed the results. HG, HSN and ST wrote the paper.
Reprints and permissions information is available at www.nature.com/reprints. The microarray and high-throughput sequencing data are deposited at GEO under the umbrella accession number GSE35800.