To isolate stability from other aspects of mRNA behavior, we performed whole-genome mRNA stability measurements by incubating MDA-MB-231 cells in the presence of 4-thiouridine (4sU), which is efficiently incorporated into cellular RNA. Subsequently, 4sU-labeled transcripts were captured and quantified at different time points after the removal of 4sU from the growth medium. We calculated a relative decay rate for each transcript based on the rate at which 4sU-labeled transcripts, in the absence of 4sU in the medium, are replaced by newly synthesized unlabeled mRNAs in the population (
Supplementary Fig. 1). These measurements were then used to identify the putative
cis-regulatory elements (linear and structural) that underlie transcript stability. Several methods have previously been introduced for discovering structural motifs, mainly based on free energy minimization, local sequence alignments, or a combination of alignments and secondary structure predictions
3,6,8. However, the extent to which these
in silico predictions reflect stable
in vivo molecular conformations has not been fully explored
9. In fact, the RNA binding proteins and complexes that interact with their target transcripts may facilitate the formation of secondary structures
in vivo. Thus, we sought to bypass the need for predicting thermodynamically stable secondary structures by efficiently enumerating a large space of potential structural motifs. We developed TEISER (Tool for Eliciting Informative Structural Elements in RNA), a framework for identifying the structural motifs that are informative of whole-genome measurements across all the transcripts. In this approach, structural motifs are defined in terms of context-free grammars
7 (CFGs) that represent hairpin structures as well as primary sequence information (see
Methods and
Supplementary Fig. 2). TEISER employs mutual information to measure the regulatory consequences of the presence or absence of each of roughly 100 million different seed CFGs (see
Methods). Mutual information is a robust non-parametric measure that reveals general dependencies across discrete or continuous measurements
2,10. For example, when applied to the transcript stability data, TEISER captures the dependency between the stability of each mRNA and the presence or absence of a given structural motif in its 5’ and 3’ untranslated regions (UTRs). TEISER then uses these measurements to choose and further refine the most informative motifs, and performs a series of statistical tests,
e.g. randomization-based statistics and jackknifing tests, to achieve very low (<0.01) false-discovery rates (see
Methods and
Supplementary Fig. 2).
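The dependency described above, between a motif's presence or absence and a per-transcript measurement, can be illustrated with a minimal Python sketch. It is not the TEISER implementation. The equal-frequency binning, the number of bins, the number of shuffles and all function names are assumptions made for illustration. The z-score is computed from a simple label-shuffling null, standing in for the randomization-based statistics mentioned in the text.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins roughly equally populated bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[a] * py[b]))
    return mi


def shuffle_z_score(presence, bins, n_shuffles=1000, seed=0):
    """Observed MI and its z-score against a null built by shuffling motif labels."""
    rng = random.Random(seed)
    observed = mutual_information(presence, bins)
    shuffled = list(presence)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.append(mutual_information(shuffled, bins))
    mean = sum(null) / len(null)
    var = sum((v - mean) ** 2 for v in null) / (len(null) - 1)
    sd = math.sqrt(var)
    z = (observed - mean) / sd if sd > 0 else float("inf")
    return observed, z


if __name__ == "__main__":
    # Hypothetical toy data: 100 transcripts with decay rates 0..99,
    # and a motif present only in the 20 transcripts with the lowest rates.
    decay_rates = list(range(100))
    presence = [1 if r < 20 else 0 for r in decay_rates]
    bins = equal_frequency_bins(decay_rates, n_bins=5)
    mi, z = shuffle_z_score(presence, bins)
    print(f"MI = {mi:.4f} bits, z = {z:.1f}")
```

Because the measurement is only ranked and binned, the score does not assume any particular functional form for the relationship between motif presence and stability, which is the sense in which mutual information is non-parametric.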
Application of TEISER to the mRNA stability measurements in MDA-MB-231 cells revealed eight strong structural motif predictions that passed our statistical tests aimed at finding the most likely elements causally involved in mRNA stability ( and
Supplementary Fig. 3). Apart from being highly informative of mRNA stability measurements, these putative regulatory elements show a variety of other characteristics that support their functionality. For example, four of the discovered motifs are also informative of transcript stability measurements in mouse
11 (
Supplementary Fig. 4a). Furthermore, these motifs are highly conserved between human and mouse genomes (see
Methods and
Supplementary Fig. 3) and are also informative of co-expression clusters discovered across independent whole-genome datasets (
Supplementary Fig. 4b).
Among the putative structural motifs discovered by TEISER, we chose sRSM1 (structural RNA Stability Motif-1)—the most statistically significant 3’ UTR element (
z-score=122)—for further analysis. In order to probe the functionality of sRSM1 instances across the genome, we performed
in vivo titration experiments using synthetic oligonucleotides
10,12. Upon transfecting MDA-MB-231 cells with decoy RNA molecules harboring sRSM1 instances (
Supplementary Fig. 5), we observed a notable reduction in the level of endogenous transcripts that carried this motif, in comparison to their level in the control cells transfected with scrambled RNA molecules (). This global down-regulation points to the presence of a
trans-acting factor that, upon interaction with sRSM1, stabilizes its target transcripts. The decoy (synthetic) sRSM1 elements compete with endogenous mRNAs for the putative
trans-acting factor, which results in the observed reduction in the level of its target mRNAs. Furthermore, reporter constructs carrying instances of sRSM1 showed a marked decrease in transcript decay rate in comparison to scrambled controls, further suggesting a direct role for this structural element in transcript stability (
Supplementary Fig. 6).
We used streptomycin-binding RNA aptamer immobilization coupled with mass spectrometry
13 to discover candidates that bind,
in vitro, to the decoy instances of sRSM1, but not to the scrambled versions (
Supplementary Fig. 7). After isolation under stringent conditions and in-solution digestion of RNA-bound proteins followed by nanoliquid chromatography-tandem mass spectrometry, we identified HNRPA2B1 as a promising candidate (
Supplementary Table 1). This RNA-binding protein is a member of the A/B subfamily of heterogeneous nuclear ribonucleoproteins (hnRNPs)
14 and carries two repeats of quasi-RRM RNA binding domains (
Supplementary Fig. 8). Moreover, the established roles of other members of this family, namely HNRNPD and HNRNPA1, in regulating RNA stability
15 and binding terminal stem-loops
16 further suggest HNRPA2B1 as a functional regulator. Also, more than 4,000 transcripts carry potentially functional instances of sRSM1 (see
Methods), implicating this motif as a major global regulator of mRNA stability. The HNRPA2B1 transcript, at the same time, is highly abundant in the cell (one standard deviation higher than average
17), thus making it a promising candidate for global modulation of mRNA stability through sRSM1.
In order to directly assess the regulatory consequences of modulating HNRPA2B1, we performed knock-down experiments followed by gene expression profiling. Consistent with our prior observations, HNRPA2B1 knock-down caused a significant decrease in the expression level of transcripts carrying sRSM1 (). Stability measurements in the knock-down cells confirmed that the observed down-regulation of these transcripts was in fact due to changes in stability (see
Methods), with the transcripts carrying sRSM1 elements showing a marked increase in their corresponding relative decay rates ().
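To make the notion of a decay rate concrete, the following minimal sketch estimates a first-order decay constant from a labeled-transcript time course. It assumes simple exponential decay of the 4sU-labeled fraction after label removal and fits a straight line to the log-transformed abundances. This is an illustrative assumption rather than the procedure described in Methods, and the function names and time units are hypothetical.

```python
import math


def decay_rate(times, labeled_abundance):
    """Estimate k in A(t) = A0 * exp(-k * t) by least squares on ln(A) versus t."""
    logs = [math.log(a) for a in labeled_abundance]
    n = len(times)
    mean_t = sum(times) / n
    mean_log = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (l - mean_log) for t, l in zip(times, logs))
    # The fitted slope is -k, so negate it to report a positive decay rate.
    return -sxy / sxx


def half_life(k):
    """Half-life implied by a first-order decay constant k (same time units as k)."""
    return math.log(2) / k


if __name__ == "__main__":
    # Hypothetical time course: hours after 4sU removal and labeled abundance.
    times = [0.0, 2.0, 4.0, 8.0]
    abundance = [100.0 * math.exp(-0.25 * t) for t in times]
    k = decay_rate(times, abundance)
    print(f"k = {k:.3f} per hour, half-life = {half_life(k):.2f} hours")
```

Under this model, a destabilized transcript shows a larger k and a shorter half-life, which corresponds to the increase in relative decay rate reported for sRSM1-carrying transcripts after the knock-down.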
In principle, our observations are consistent with a possible indirect role for HNRPA2B1, for instance through a common partner that binds both HNRPA2B1 and sRSM1 sites. The direct interaction between HNRPA2B1 and its potential target genes can be tested through cross-linking and immunoprecipitation of HNRPA2B1, which, through local UV photoreactivity of bases and amino acids, can detect direct physical interactions
18. We expressed a tagged clone of HNRPA2B1 in MDA-MB-231 cells, and after UV-crosslinking, immunoprecipitated this protein and the target mRNA molecules that were bound to it. We then labeled the isolated RNA population and hybridized it to microarrays with the input total RNA as control (a method called RIP-chip
19). We observed a highly significant enrichment of sRSM1 in the immunoprecipitated population (). In order to reduce the background and better pinpoint the HNRPA2B1 binding sites, we treated the samples with nuclease prior to immunoprecipitation under denaturing conditions and sequenced the HNRPA2B1-bound RNA population (HITS-CLIP
20). We observed that sRSM1 elements were significantly enriched in the identified putative binding sites, in comparison with randomly selected sequences
21 (). These observations demonstrate that HNRPA2B1 directly interacts with sRSM1
in vivo and functions to stabilize its target transcripts through this regulatory element. These transcripts, in turn, modulate a variety of cellular processes and pathways. For example, we observed a significant positive correlation between sRSM1 target transcripts and doubling-time in NCI-60 breast cancer cell lines (). Indeed, knocking down HNRPA2B1 resulted in a slight but significant increase in growth rate (by 10%, p-value < 10⁻⁸), further highlighting the regulatory role of this global modulator in a key cellular process ().
Revealing the detailed post-transcriptional regulatory code relies on the discovery of all the
cis-regulatory elements that contribute to changes in transcript abundance. In addition to the sRSMs identified through TEISER, we also discovered a large diverse set of lRSMs (linear RNA Stability Motifs), including six known miRNA recognition sites, that are informative of transcript stability measurements (
Supplementary Fig. 9). These motifs were identified by FIRE
2, a framework for discovering informative linear motifs. Combining these two approaches provided us with an extensive set of putative regulatory elements that cover both structural and primary sequence components. The next step in deciphering the post-transcriptional regulatory program involves the identification of target pathways that are potentially modulated by each element. Using iPAGE
10, for pathway analysis of gene expression, we showed that our discovered elements likely target a diverse array of cellular processes and pathways (
Supplementary Fig. 10). For example, the sRSM1 structural element is significantly enriched in the 3’ UTRs of the genes involved in “Notch signaling”, while avoiding the UTRs of other pathways such as “nucleosome assembly” (
Supplementary Fig. 11). These results demonstrate that while post-transcriptional regulatory mechanisms are poorly characterized, they have potentially far-reaching impact on specific cellular processes.
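The idea of a motif being enriched in, or avoiding, the UTRs of a pathway's genes can be illustrated with a hypergeometric tail test. This is a swapped-in, simpler statistic chosen for illustration. iPAGE itself is described in its own publication and is not reproduced here, and the function names and toy counts below are hypothetical.

```python
from math import comb


def _hypergeom_pmf_numerators(n_total, n_motif, n_pathway):
    """Unnormalized probabilities of each possible overlap size k."""
    upper = min(n_motif, n_pathway)
    return [
        comb(n_motif, k) * comb(n_total - n_motif, n_pathway - k)
        for k in range(upper + 1)
    ]


def enrichment_p(n_total, n_motif, n_pathway, n_overlap):
    """P(overlap >= n_overlap): motif over-represented among pathway genes."""
    nums = _hypergeom_pmf_numerators(n_total, n_motif, n_pathway)
    return sum(nums[n_overlap:]) / comb(n_total, n_pathway)


def depletion_p(n_total, n_motif, n_pathway, n_overlap):
    """P(overlap <= n_overlap): motif under-represented among pathway genes."""
    nums = _hypergeom_pmf_numerators(n_total, n_motif, n_pathway)
    return sum(nums[: n_overlap + 1]) / comb(n_total, n_pathway)


if __name__ == "__main__":
    # Hypothetical counts: 10 genes in total, 4 carry the motif,
    # 3 belong to the pathway, and all 3 pathway genes carry the motif.
    print(enrichment_p(n_total=10, n_motif=4, n_pathway=3, n_overlap=3))
    print(depletion_p(n_total=10, n_motif=4, n_pathway=3, n_overlap=0))
```

The upper tail corresponds to enrichment of a motif in a pathway and the lower tail to avoidance. In a genome-wide analysis the resulting p-values would also need correction for the number of motif and pathway pairs tested.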
Regulatory programs often employ combinatorial interactions between various
cis-regulatory elements to modulate gene expression
2,22. We utilized mutual information to reveal such potential interactions in the post-transcriptional regulatory programs governing mRNA stability (
Supplementary Figs. 12 and 13). For example, sRSM1 showed significant interactions with a number of structural and linear motifs, including sRSM8 and sRSM3 (
Supplementary Fig. 11). These observed interactions might reflect cross talk, or insulation, between the underlying regulatory processes that act upstream of these elements. The full map of such interactions (
Supplementary Figs. 14 and 15) reveals a complex network of motif-pathway relationships that sets the stage for molecular dissection and predictive modeling of post-transcriptional regulation from sequence.
While we have studied mRNA stability under normal and static conditions in a single cell line, the full regulatory program that governs mRNA stability likely involves a much richer repertoire of
cis-regulatory elements operating within a more complex regulatory network. Also, while we have focused on transcript stability, our framework is general in concept and can be employed to study complex regulatory programs governing other aspects of RNA biology. For example, the established role of local secondary structures in shaping the splicing code
4,23 suggests alternative splicing as a prominent area for analysis using this framework. The large repertoire of publicly available whole-genome expression datasets similarly offers a rich resource for identifying the post-transcriptional regulatory modules that underlie steady-state measurements.