Due to read length limitations, most RNA-Seq studies turn to random hydrolysis of the sample before sequencing [21]. Instead, we fragmented RNA in a structure-specific manner, reporting on nuclease susceptibility along each transcript. FragSeq will not generate the uniform coverage across a transcript needed for accurate abundance estimates or alternative splicing characterization. Rather, quantitative comparisons along each transcript, expressed as cutting scores, are made between enzyme-treated and control samples, yielding information about RNA structure. For analysis of a novel transcriptome, the FragSeq preparation can be done in parallel with other preparations that quantify abundance, barcoding the samples for analysis in a single sequencing run.
By using nuclease P1, we were able to specifically enrich for its products and avoid products of spontaneous or canonical RNase degradation. The parallel PNK treatment, in which these latter products were converted to clonable RNAs, showed how sequencing multiple treatments yields insights into naturally labile sites.
In parallel with this manuscript, a similar technique for high-throughput RNA structure probing was introduced [22]. That study utilized nuclease S1, which has similar properties to P1, and RNase V1, which cleaves stacked bases. Their readout of structure is reported as a ratio of susceptibilities of each RNA site to the two nucleases, whereas FragSeq monitors one nuclease with respect to a control run without nuclease. We favor cutting scores that are log ratios of data from nuclease versus control treatments because they describe, for each site, its nuclease susceptibility relative to its natural degradation susceptibility in the cell or during the preparation. Cut counts per site in the nuclease-treated sample alone do not provide data as informative as cutting scores (compare with Supplementary Fig. 4).
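The log-ratio idea can be illustrated with a minimal sketch. This is not the FragSeq software itself: the function name `cutting_scores`, the pseudocount, and the normalization of counts to per-sample frequencies are assumptions made for the example, and the published scoring scheme may differ in its details.

```python
import math

def cutting_scores(nuclease_counts, control_counts, pseudocount=1.0):
    """Per-site log2 ratio of nuclease to control cut frequencies.

    Hypothetical sketch: the pseudocount and the normalization by total
    counts are assumptions, not the published FragSeq formula.
    """
    if len(nuclease_counts) != len(control_counts):
        raise ValueError("count vectors must cover the same sites")
    n_sites = len(nuclease_counts)
    # Totals include the pseudocounts so that each sample's frequencies sum to 1.
    nuc_total = sum(nuclease_counts) + pseudocount * n_sites
    ctl_total = sum(control_counts) + pseudocount * n_sites
    scores = []
    for nuc, ctl in zip(nuclease_counts, control_counts):
        nuc_freq = (nuc + pseudocount) / nuc_total
        ctl_freq = (ctl + pseudocount) / ctl_total
        scores.append(math.log2(nuc_freq / ctl_freq))
    return scores

# Toy transcript with five sites (made-up counts for illustration only).
# Site 2 is cut mostly in the nuclease sample; site 4 is a naturally labile
# site that is cut mostly in the control.
nuclease = [2, 1, 40, 3, 4]
control = [3, 2, 4, 3, 38]
scores = cutting_scores(nuclease, control)
```

In this toy example, the site that is cut heavily only in the nuclease-treated sample receives a positive score, whereas the site that is already labile in the control receives a negative score even though its raw nuclease count is not the lowest. Raw counts alone would not separate these two cases.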
We provide configurable software to compute cutting scores from mapped sequencing reads, outputting them and intermediate analysis data in formats compatible with the UCSC Genome Browser (http://genome.ucsc.edu), allowing visualization of structure data in a genomic context. This allows straightforward application of our analysis tools to future sequencing runs. We also modified the well-established RNAstructure software [23] to allow input of FragSeq data to guide computational structure prediction (Supplementary Discussion).
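As an illustration of what browser-compatible output can look like, the sketch below writes per-site scores in bedGraph, one of the track formats accepted by the UCSC Genome Browser. The text above does not state which formats the software actually emits, so the choice of bedGraph, the function name `to_bedgraph`, the track name, and the example coordinates are all assumptions.

```python
def to_bedgraph(chrom, start, scores, track_name="cutting_scores"):
    """Format per-site scores as bedGraph text.

    Hypothetical helper, not part of the FragSeq software. bedGraph uses
    0-based, half-open intervals, so each site spans [pos, pos + 1).
    """
    lines = [f'track type=bedGraph name="{track_name}"']
    for offset, score in enumerate(scores):
        pos = start + offset
        lines.append(f"{chrom}\t{pos}\t{pos + 1}\t{score:.3f}")
    return "\n".join(lines)

# Placeholder chromosome and start coordinate, for illustration only.
print(to_bedgraph("chr1", 100, [1.5, -0.25]))
```

The resulting text can be loaded as a custom track so that scores appear alongside gene annotations at the corresponding genomic positions.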
We do not observe single-hit kinetics for which probing studies generally aim, as many ncRNA reads do not contain the native 3′ ends of the RNA from which they originate (Supplementary Fig. 9). We also do not observe native 5′ ends for those RNAs, but that is due to the trimethylguanosine cap blocking adapter ligation. We have not determined whether multiple cuts by P1 in solution are indeed the general case, or whether our size selection step enriches for products of multiple hits. Perhaps calibrating P1 for single-hit kinetics on in vitro transcribed test RNAs did not translate to single-hit kinetics in the nuclear transcriptome where many ncRNAs are highly modified. In addition, the test RNAs in our probing experiment were all intact at the beginning of digestion, whereas a portion of the ncRNAs in the nuclear sample may be partially degraded. In any case, it is clear that reads produced by multiple cuts are providing reliable structure data. This is likely because P1 prefers to cut in stem-loops or hinge regions and these cuts are unlikely to cause the closing helix to denature under our salt conditions, so the original structure may not change before subsequent cuts. As hinge regions often connect domains that fold separately, cuts there would not lead to refolding of those independent domains. This may not be true for larger structured RNAs with long-range tertiary interactions, but these RNAs fall outside of the scope of our current method. Rather than comparing to conventional single-hit probing, it is more fitting to liken FragSeq nuclease data to DNase hypersensitivity assays on chromatin in that it gives a global perspective of RNA structure (e.g. stem-loop positions) rather than fine details (e.g. bulges in a helix).
We envision several areas of RNA biology where refinement of a FragSeq protocol might prove fruitful. One topic of particular interest is riboswitches, RNA molecules that change structure upon the binding of a metabolite ligand [24]. Parallel sequencing runs with and without the ligand of interest could yield a differential pattern of cutting scores along such RNAs that would serve as a signature of a conformational change.
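One simple way to look for such a differential pattern is to subtract per-site cutting scores between the two conditions and flag sites with large shifts. The sketch below is a hypothetical illustration: the function name `differential_sites`, the threshold of 1.0, and the example scores are assumptions, and a real analysis would also need replicates and a statistical test.

```python
def differential_sites(scores_with_ligand, scores_without_ligand, threshold=1.0):
    """Return (site index, score difference) for sites whose cutting score
    changes by at least `threshold` between the two conditions.

    Hypothetical sketch: the fixed threshold is an arbitrary assumption.
    """
    if len(scores_with_ligand) != len(scores_without_ligand):
        raise ValueError("score vectors must cover the same sites")
    flagged = []
    pairs = zip(scores_with_ligand, scores_without_ligand)
    for site, (with_lig, without_lig) in enumerate(pairs):
        diff = with_lig - without_lig
        if abs(diff) >= threshold:
            flagged.append((site, diff))
    return flagged

# Made-up scores for three sites: site 1 becomes more accessible with ligand,
# site 2 becomes protected, and site 0 is unchanged.
with_ligand = [0.5, 2.0, -1.0]
without_ligand = [0.5, 0.0, 1.0]
changed = differential_sites(with_ligand, without_ligand)
```

Runs of adjacent flagged sites with the same sign would be the more convincing signal, since a conformational change is expected to alter the accessibility of whole loops or helices rather than isolated nucleotides.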
Additionally, nuclease protection assays [25] could be scaled up to whole transcriptomes by performing parallel nuclease digestions with and without an RNA-binding protein pre-incubated with the whole-cell RNA. Identifying differentially protected regions would home in on the RNA-binding protein's specificity for sequence or structural context. Likewise, such digestions could be carried out on whole-cell or nuclear extracts with proteins still bound. Nuclease P1 would be a good candidate for these digestions since the buffer conditions for extracts are usually similar to the relatively physiological pH and salt concentrations used in this study.
Nuclease P1 is also stable at high temperatures, so we envision that FragSeq could be another way to monitor thermal denaturation of RNA domains. By sequencing nuclease reactions performed in parallel at different temperatures, the single-stranded character of a given transcript could be monitored as a proxy for unfolding.
Though we focused on one enzyme here, our experimental pipeline and software could be easily adapted to other enzymatic or chemical probes, so long as a proper control is carried out in parallel. FragSeq, combined with methods developed in previous RNA-Seq studies, enables researchers to take high-throughput transcriptome analysis beyond one-dimensional sequence to reveal structural features of RNAs and provide clues to their underlying biology.