|Home | About | Journals | Submit | Contact Us | Français|
The structures of RNA molecules are often important for their function and regulation1-6, yet there are no experimental techniques for genome-scale measurement of RNA structure. Here, we describe a novel strategy termed Parallel Analysis of RNA Structure (PARS), which is based on deep sequencing fragments of RNAs that were treated with structure-specific enzymes, thus providing simultaneous in-vitro profiling of the secondary structure of thousands of RNA species at single nucleotide resolution. We apply PARS to profile the secondary structure of the mRNAs of the budding yeast S. cerevisiae and obtain structural profiles for over 3000 distinct transcripts. Analysis of these profiles reveals several RNA structural properties of yeast transcripts, including the existence of more secondary structure over coding regions compared to untranslated regions, a three-nucleotide periodicity of secondary structure across coding regions, and a relationship between the efficiency with which an mRNA is translated and the lack of structure over its translation start site. PARS is readily applicable to other organisms and to profiling RNA structure in diverse conditions, thus enabling studies of the dynamics of secondary structure at a genomic scale.
Existing experimental methods for measuring RNA structure can only probe a single RNA structure per experiment and are typically limited in the length of the probed RNA (Supplemental note 1). To simultaneously measure structural properties of many different RNAs, we extracted poly-adenylated transcripts from log-phase growing yeast, renatured the transcripts in vitro, and treated the resulting pool with RNase V1 and separately, with RNase S1. RNase V1 preferentially cleaves phosphodiester bonds 3′ of double-stranded RNA, while RNase S1 preferentially cleaves 3′ of single-stranded RNA7. Thus, data from these two complementary enzymes should allow us to measure the degree to which each nucleotide was in a single- or double-stranded conformation (Fig. 1). We chose renaturation and enzymatic cleavage conditions under which the cleavage reactions occur with single-hit kinetics (Supplementary Fig. 1a,b), and where intramolecular, but not intermolecular, RNA-RNA interactions are dominant (Supplementary Fig. 1c,d). As a control, we also added two short RNA domains from HOTAIR, a human non-coding RNA8, and from the structurally known Tetrahymena group I intron ribozyme9.
We devised a ligation method to specifically ligate V1- and S1-cleaved RNA to adaptors, and converted them into cDNA libraries suitable for deep sequencing (Supplementary Fig. 2). As both enzymes leave a 5′ phosphate at the cleavage point and since only 5′ phosphoryl-terminated RNA are capable of ligating to our adaptors, we enrich for V1- and S1-cleaved fragments and select against random fragmentation and degradation products that typically have 5′ hydroxyl (Supplementary Fig. 3). Thus, each observed cleavage site provides evidence that the cut nucleotide was in a double-stranded (for V1-treated samples) or single-stranded (for S1-treated samples) conformation. As a quantitative measure at nucleotide resolution representing the degree to which a nucleotide was in a double- or single-stranded conformation, we take the log-ratio between the number of sequence reads obtained for each nucleotide in the V1 and S1 experiments. A higher (lower) log-ratio, or PARS score, thus denotes a higher (lower) probability for a nucleotide to be in a double-stranded conformation.
We performed four independent V1 experiments and three independent S1 experiments, which were highly reproducible across replicates (correlation=0.60-0.93, Supplementary Table 1), resulting in a total of ~85 million sequence reads that map to the yeast genome, of which ~97% mapped to annotated transcripts (Supplementary Table 2). At an average nucleotide coverage above 1.0, we obtained structural information for over 3000 yeast transcripts (Supplementary Table 3, Supplementary Fig. 4a), covering in total over 4.2 million transcribed bases, which is ~100-fold more than all published RNA footprints to date.
We used several tests to check for biases in our method, and found that RNase cleavage, adaptor ligation, and cDNA conversion do not introduce significant sequence biases (Supplementary Fig. 5), that our protocol has a very small bias towards particular regions along the transcript (Supplementary Fig. 6), and that we capture RNA fragments in proportion to their abundance in the initial pool (Supplementary Fig. 4b,c). Finally, we confirmed that signals generated by RNase V1 are highly distinct from those generated by RNase S1. Global inspection across all transcripts revealed that ~7% of the V1 and S1 peaks are shared (Methods and Supplementary Table 4 and Supplementary Fig. 7). Those joint peaks could be the result of experimental noise introduced by nonspecific enzymatic activity, but could also correspond to dynamic RNA regions or transcripts that fold into more than one stable conformation.
To test whether PARS accurately measures RNA structures, we first confirmed that its signals are similar to those obtained with traditional footprinting. To this end, we carried out ten separate footprinting experiments with either RNase V1 or S1, on two domains from the Tetrahymena ribozyme, two domains from the human HOTAIR non-coding RNA, which we doped into our samples and two domains of endogenous yeast mRNAs. In all cases, we found high agreement between our PARS signals and footprinting (correlations=0.63-0.97, Fig. 2 and Supplementary Fig. 8-10). Notably, due to length limitations of footprinting, we had to select short domains from each of the above transcripts, in vitro transcribe them, and then apply footprinting. Thus, footprinting may be inaccurate, since due to long-range interactions, the excised fragment could fold differently when taken out of context. In contrast, PARS can probe RNAs in their full-length context.
Next, we compared PARS to reported structures of yeast coding and non-coding RNAs, and found that it correctly reproduces the known secondary structure of three structured RNA domains of ASH110, of a structural element in URE2 mRNA11 and of the glu-tRNA (Fig. 2e-f and Supplementary Fig. 11-12). This suggests that PARS can provide structural information of transcripts in their full-length context and endogenous abundance from within a complex RNA pool. Taken together, our analyses demonstrate that PARS recapitulates results obtained by low-throughput methods with high accuracy, and also has advantages over existing methods, stemming from its ability to probe structures of long RNAs.
As another independent validation of PARS, we compared it to computational predictions of RNA structure, by applying the Vienna package12 to the 3000 transcripts that we analyzed. We found a significant correspondence between these predictions and our PARS scores, whereby nucleotides with high (low) double-stranded PARS score had a significantly higher (lower) average predicted pairing probability (p<10−200, Fig. 3a and Supplemental Fig. 13). Despite this significant global correspondence, there are large differences between PARS and predictions, in part due to noise in our approach but also due to known inaccuracies of folding algorithms. We thus suggest that genome-wide PARS data can be used to constrain folding algorithms and improve their accuracy, as previously shown for specific RNAs13,14 (Supplementary Fig. 15).
We used the obtained structural profiles to investigate five global properties of yeast transcripts. First, examining the average PARS score across the coding regions and untranslated regions (UTRs), we found that coding regions exhibit significantly more pairing than 5′ and 3′ UTRs (p<10−30 and p<10−50 respectively, Fig. 3c). Notably, the start and stop codons each exhibit local minima of PARS scores, indicating reduced tendency for double-stranded conformation and increased accessibility. These findings agree with previous computational predictions for mouse and human genes15. The evolutionary conservation of this global organization of mRNA secondary structure suggests that it may have functional importance. An overall decreased pairability in UTRs may allow functional elements to stand out and conversely, highly paired domains along coding regions may protect against ectopic translation initiation, or regulate ribosome translocation and protein folding, as recently postulated13.
Second, aligning our measured transcripts about their start codon and applying a discrete Fourier transform analysis to the average PARS signal, we detected a periodic structure signal across coding regions with a cycle of three nucleotides, such that on average, the first nucleotide of each codon is least structured and the second nucleotide is most structured. Notably, this periodic signal is only found in coding regions, and not in UTRs (Fig. 3b), and the degree of three-nucleotide periodicity in transcripts is significantly associated with ribosome density in vivo16 (Supplementary Fig. 14), suggesting that the three-nucleotide periodicity may directly or indirectly facilitate translation.
Third, we tested whether there is a correlation between mRNA structure around the translation start site and translation efficiency. Such a relationship has long been hypothesized17 and recently shown for one reporter protein in E. coli18. We found a small but significant anti-correlation between PARS scores at the region located ~10bp upstream of the translation start site and ribosome density throughout the transcript16, a proxy for translational efficiency (correlation=-0.1, p<10−4, Fig. 4a). Intriguingly, the −10bp region corresponds to the 5′ position of the first ribosome on yeast mRNAs16. To examine this relationship in more detail, we applied k-means clustering (k=4) to the PARS structural profile of the ±40bp surrounding the translation start site. Notably, genes found in clusters 3 and 4, exhibit significantly less structure in their 5′ UTR than in the beginning of their coding region, as well as a higher ribosome density (Fig. 4b). Overall, these results provide the first genome-wide experimental validation for the suggestion that mRNA secondary structure around the start codon may reduce translational efficiency17, although the low correlation we found implies that in vivo, translational efficiency is determined by additional factors.
Fourth, we asked whether genes with shared biological functions or cytotopic localizations19 tend to have similar PARS scores, indicative of similar degrees of secondary structures. We found a rich picture of biological coordination (Supplementary Fig. 16 and Supplementary Table 5), including increased RNA structure, especially in coding regions, in transcripts whose encoded proteins localize to distinct cellular domains or participate in distinct metabolic pathways, and found that mRNAs with the least secondary structure in their 5′ UTR and CDS encode subunits of the ribosome.
Finally, we examined the PARS score of transcripts predicted to encode a signal peptide, since a recent study showed that RNA sequences encoding the signal sequence (termed the SSCR) of secretory proteins function as RNA elements that promote RNA nuclear export20. We found that the 5′ UTR region and first ~30 coding nucleotides of signal peptide transcripts have lower PARS signal, indicating increased single-stranded propensity, as compared to other transcripts (p<10−11, Fig. 4c). Since SSCRs typically reside in the beginning of the coding region, these results suggest that specific secondary RNA structure around gene starts may assist in the cytotopic localization of mRNAs and their resulting proteins. More generally, we suggest that PARS can be used to both generate and test hypotheses regarding signals of secondary structure that may characterize and have functional importance for classes of mRNAs.
In summary, we introduced PARS, the first high-throughput approach for genome-wide experimental measurement of RNA structural properties, and showed that it recovers structural profiles with high accuracy and at nucleotide resolution. Like most existing methods, one limitation of PARS is that it maps RNA structures in vitro, and its reported structures may thus differ significantly from the in vivo conformations. This may be addressed in the future using reagents that can probe RNA structure in living cells7, but will require new methods to adapt to deep sequencing. Overall, PARS transforms the field of RNA structure probing into the realm of high-throughput, genome-wide analysis and should prove useful both in determining the structure of entire transcriptomes of other organisms as well as in systematically measuring the effects of diverse conditions on RNA structure. Probing RNA structure in the presence of different ligands, proteins, or in different physical or chemical conditions may provide further insights into how RNA structures control gene activity.
Total RNA was extracted from yeast grown at 30C to exponential phase in YPD medium by using hot acid phenol. Poly(A)+ RNA was obtained by purifying twice using the Poly(A) purist Kit. A diagram showing the PARS protocol is provided in Supplementary Fig. 2.
RNA was folded and probed for structure using 0.01U of RNase V1 (Ambion), or 1000U of S1 nuclease (Fermentas), in a 100ul reaction volume. A modified version (see supplementary methods) of the SOLiD Small RNA Expression Kit was used to convert fragments into a sequencing library.
cDNA libraries were amplified onto beads and subjected to emulsion PCR, according to the standard protocol described in the SOLiD Library Preparation Guide. Obtained sequences were truncated to 35bp, and required to map uniquely to either the yeast genome or transcriptome, allowing up to one mismatch and no insertions or deletions.
The PARS Score is defined as the log2 of the ratio between the number of times the nucleotide immediately downstream to the inspected nucleotide was observed as the first base when treated with RNase V1 and the number of times it was observed in the RNase S1 treated sample. To account for differences in overall sequencing depth between the V1- and S1- treated samples, the number of reads for each nucleotide is normalized prior to the computation of the ratio.
Periodicity analysis was done applying a Discrete Fourier Transform to the average PARS score collected from the following genomic features: last 100 bases of the 5′ UTR, first 200 bases of the coding sequence, 100 first bases of the 3′ UTR.
We thank D. Herschlag’s group, A. Adler, A. Fire, M. Kay, Life Technologies SOLiD team, M. Rabani, G. Sherlock, and A. Weinberger for assistance and critiques. Supported by NIH grant (RO1HG004361). Y.W. is funded by the Agency of Science, Technology and Research of Singapore. H.Y.C. is an Early Career Scientist of the Howard Hughes Medical Institute. E.S. is the incumbent of the Soretta and Henry Shapiro career development chair.
Online Resources. Nucleotide-resolution raw reads and PARS scores for the 3000 genes included in our analysis can be visualized and downloaded at http://genie.weizmann.ac.il/pubs/PARS10
Supplementary Information is linked to the online version of the paper at www.nature.com/nature.
Author Contributions M.K., J.R., H.C. and E.S. conceived the project; Y.W. and H.C. developed the protocol and designed the experiments; Y.W. performed all experiments; M.K., E.M. and E.S. planned and conducted the data analysis; J.R. and R.N. helped with sequencing; M.K., Y.W., E.M., H.C. and E.S. wrote the paper with contribution from all authors.
Author Information Sequencing data has been deposited in the Gene Expression Omnibus (GEO) under accession number GSE22393.
Reprints and permissions information is available at www.nature.com/reprints.
The authors declare no competing financial interests.