With a dataset of more than 600 million small RNAs deeply sequenced from mouse hippocampal and staged sets of mouse cells that underwent reprogramming to induced pluripotent stem cells, we annotated the stem–loop precursors of the known miRNAs to identify isomoRs (miRNA-offset RNAs), loops, non-preferred strands, and guide strands. Products from both strands were readily detectable for most miRNAs. Changes in the dominant isomiR occurred among the cell types, as did switches of the preferred strand. The terminal nucleotide of the dominant isomiR aligned well with the dominant off-set sequence suggesting that Drosha cleavage generates most miRNA reads without terminal modification. Among the terminal modifications detected, most were non-templated mono- or di-nucleotide additions to the 3′-end. Based on the relative enrichment or depletion of specific nucleotide additions in an Ago-IP fraction there may be differential effects of these modifications on RISC loading. Sequence variation of the two strands at their cleavage sites suggested higher fidelity of Drosha than Dicer. These studies demonstrated multiple patterns of miRNA processing and considerable versatility in miRNA target selection.
MicroRNAs (miRNAs) are ~21nt non-coding RNAs that serve as key post-transcriptional regulators of gene expression in the specialized cells of multi-cellular organisms (1–3). In animals, miRNAs are typically transcribed from the genomes as primary (pri-miRNA) transcripts by RNA polymerase II (4), and less often as mirtrons from spliceosome-mediated processing (5,6). The pri-miRNAs fold into hairpins and are cleaved at their base into pre-miRNAs by the RNase III family member, Drosha (7–9). Pre-miRNAs are ~70nt stem-loop structures, which are exported to the cytoplasm (10–12), where another RNase III family member, Dicer, cleaves the pre-miRNA loop to make a 21-bp duplex (9,13,14) that associates tightly with Argonaute (Ago) proteins (15,16). miR-451 is processed in a Dicer-independent alternative pathway in which the pre-miRNAs are cleaved by AGO2 and then are processed to maturity by exonucleolytic cleavage (17–19). Generally, one strand of the duplex is preferentially selected for incorporation into the RNA-induced silencing complex (RISC) (14,20) where it represses gene expression through interaction with the 3′-untranslated regions (3′-UTRs) of mRNAs [reviewed in (1,3)] and the other strand is released and degraded (21,22). The preferred strand has been called the guide strand or the mature miRNA and non-preferred strand has been called the passenger or the star strand (miRNA*). In general, the retained strand is the one with a less stably base-paired 5′-end in the miRNA/miRNA* duplex (2,20,21,23). The passenger strand may also be incorporated into a functional RISC (24–27).
Depending upon the depth of sequencing, low-copy-number by-products of miRNA processing can be detected (28–31). Importantly, the complex processing of miRNAs greatly expands their targeting potential through the use of both strands of the miRNA:miRNA* duplex, length heterogeneity, sequential Dicer cleavage and RNA editing.
We collected small RNA deep sequencing data from two large groups of cell and tissue types and annotated each known mouse miRNA to reveal general patterns of miRNA biogenesis and expression. Based on the large scale and depth of the integrated data, we were able to collect phased reads and annotate extensively the stem-loop precursors of each known mouse miRNA expressed in these samples. The snapshot quantification of pre-miRNA fragments suggests a high differential stability of the fragment categories (5p strand, 3p strand, loop regions and moRs), as well as extensive isoform variation that may contribute to the functional versatility of miRNAs.
The small RNA samples and datasets are listed in Supplementary Table S1. Six hippocampus samples were used for total RNA isolation with the mirVana™ miRNA Isolation kit (Ambion, Austin, TX) following the manufacturer’s protocol. Four samples came from four different mice and were used to isolate small RNAs (designated Group I: Hippocampus-total RNA samples). Two samples from the same source were separately immunoprecipitated with three different Ago antibodies, AGO1, AGO2 and 28A [generously donated by Z. Mourelatos (32); designated Hippocampus-AGO-IP data], using the Dynabeads Protein G kit (Invitrogen #100.03D). Eight samples came from mouse cells at various stages of reprogramming with four factors (33,34) (designated Group II: Reprogramming-total RNA samples). These samples included uninfected Oct4-GFP MEFs (mouse embryonic fibroblasts), Thy1- cells harvested from Day 5 post-transduction cultures, SSEA1+ cells [identified by APC-conjugated anti-SSEA1 (R&D Systems, Minneapolis, MN; Cat.FAB2155A)], GFP+ iPSC colonies purified by cell sorting, and induced pluripotent stem cell lines (iPS cells). Another three sets of cells were mouse embryonic stem cells (mES) and two partially induced pluripotent stem cell lines (piPS_4 and piPS_5). The libraries were prepared with the SOLiD™ Small RNA Expression Kit and sequenced on SOLiD™ sequencers versions 3 and 4. The samples were sequenced at a read length of 25 bp for some datasets and then at 35 bp.
Using the SOLiD™ small RNA Analysis pipeline (http://solidsoftwaretools.com), the raw data were filtered for tRNA, rRNA and adaptor sequences and then aligned against miRNA precursors and the mouse genome. To identify the adaptor sequences at the 3′-end of reads, the first 18nt with a maximum of two or three mismatches were aligned to find the candidate matches. Candidate matches were then extended to identify the best alignments and locate adaptors, producing high quality data for the exact size of the small RNAs (Supplementary Figure S1). Reads were first mapped to known mouse miRNAs (miRBase version 16), and then reads that did not align were mapped against the mouse genome for further analysis. A sequencing error rate of <1% was calculated by determining the percentage of tags that did not match the first 10nt of the mature miRNA.
The coordinates of known miRNA loci from miRBase were used to make a miRNA precursor database from the UCSC genome browser, and 50nt were added at both ends to identify moR sequences. Reads were aligned against the precursor database to obtain matches for 5p strands, 3p strands, moRs and loop regions. Only perfect matches were used for further analysis. The most frequently sequenced isomiRs of both arms were chosen as a reference to classify other isomiRs. IsomiRs with no more than a 3nt difference at either end from the most frequent isomiR were counted and included in Supplementary Datasets (Supplementary Datasets S1, S2 and S3). moRs were identified by surveying clustered reads outside the dominant isomiRs; a gap of 3nt at most between moRs and dominant isomiRs was allowed. Only very rarely did reads span the cleavage sites of the hairpin. Most of the analysis was done with in-house written Perl scripts. Z-tests were used to determine the significance of isomiR variation.
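The isomiR classification rule described above can be illustrated with a short Python sketch. This is not the authors' in-house Perl code; the function name `classify_isomirs`, the representation of each read as a (start, end) coordinate pair on the precursor, and the example counts are all hypothetical. The sketch picks the most frequently sequenced species on one arm as the reference and keeps only species whose ends each lie within 3nt of it.

```python
from collections import Counter


def classify_isomirs(read_coords, max_offset=3):
    """Group perfectly matching reads from one arm into isomiRs.

    read_coords: iterable of (start, end) precursor coordinates, one per read
                 (a hypothetical input format chosen for this sketch).
    Returns the dominant (start, end) pair and a dict mapping
    (5'-offset, 3'-offset) relative to the dominant isomiR -> read count.
    """
    counts = Counter(read_coords)
    if not counts:
        return None, {}
    # The most frequently sequenced species serves as the reference isomiR.
    dominant = counts.most_common(1)[0][0]
    dom_start, dom_end = dominant
    isomirs = {}
    for (start, end), n in counts.items():
        d5 = start - dom_start  # shift of the 5'-end
        d3 = end - dom_end      # shift of the 3'-end
        # Keep species differing by no more than max_offset nt at either end.
        if abs(d5) <= max_offset and abs(d3) <= max_offset:
            isomirs[(d5, d3)] = n
    return dominant, isomirs


# Toy example: the last species starts 10nt away and is therefore excluded.
example_reads = [(10, 31)] * 5 + [(11, 31)] * 2 + [(10, 33)] + [(20, 41)] * 3
dominant, isomirs = classify_isomirs(example_reads)
print(dominant, isomirs)
```

Reads falling outside the window would, under the scheme in the text, be candidates for the moR or loop categories rather than isomiRs.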
Non-templated nucleotides at the termini of mature miRNAs were identified by checking sequences before adaptors that did not exactly match the genome. The accuracy of adaptor identification was very important for correctly identifying non-templated nucleotides. Therefore, only data with 35-bp reads, where much of the adaptor was sequenced, were used.
A total of 34 datasets from 17 samples and more than 626 million (M) reads were analyzed (see Materials and Methods section and Supplementary Table S1). Of these, 188M reads matched known mouse miRNA precursors from miRBase (Figure 1) and an additional 128M matched exons, introns and intergenic regions in the mouse genome. Reads that matched multiple miRNAs were rejected and only perfect matches were kept for downstream analysis. miRBase version 16 contains 582 well-authenticated mouse miRNA species and we identified 526 miRNA species in our samples (416 for Group I and 489 for Group II) by requiring detection of both strands with at least ten reads on the non-preferred strand. Those miRNAs not validated in our samples may be expressed in tissues not used in our analysis. miRNA expression in Group I and Group II showed both distinct and overlapping patterns (Supplementary Figure S2, Supplementary Tables S2 and S3). The miRNAs in AGO-IP data from the mouse hippocampus (Supplementary Table S4) correlated very well with total hippocampus small RNA profiling data, whereas Group I and Group II miRNAs were not significantly correlated. With a cutoff of >0.1% of miRNA percent in the sample, half of the expressed known miRNAs overlapped between Group I and II.
Of those reads that perfectly matched miRNA precursors, 180M reads corresponded to the more abundant strands and 8M reads to the less abundant strands from each duplex. The preferred strand was considered the one with more reads. Strands were also named 5p or 3p without regard to the preferred strand. Mature miRNAs can be processed from either arm of the pre-miRNA hairpins, but in most cases, the discrimination of the preferred strand was very high. Although a bias toward preferential stabilization of the 5′-arm in mammalian species has been reported (35,36), our data show more equal strand stabilization (Supplementary Figure S3) as seen in Caenorhabditis elegans (37).
The products of the preferred strand were 25 times that of the non-preferred strand in Group I and 18 times in Group II cells. We calculated the strand ratio for the 416 miRNAs with more than 50 reads on both arms. Sixty-five miRNAs had preferred strand reads that were < 2-fold higher than the non-preferred strand including some highly expressed miRNAs such as miR-292, miR-126, let-7d and miR-22. A total of 200 miRNAs had >10-fold difference between the two strands of the hairpin structures. The expression of 5p and 3p strands did not correlate well (R2=0.093) and large variations existed among different miRNAs (Supplementary Figure S4).
In the AGO-IP samples, the preferred strand was 39 times as abundant as the opposite strand. The correlations of the preferred/non-preferred ratio in the hippocampus between the AGO-IP samples and the total small RNA data (Supplementary Figure S5) indicated that most miRNAs had higher ratios in the AGO-IP samples. Thus preferred strands were relatively more enriched in the AGO-IP sample, a finding that supports the preferred strand bias for entry to the RISC. Nevertheless, non-preferred strands were often highly represented and may either associate with AGO, or alternatively, even with relatively high reads, fail to enter the RISC. If they are captured by a RISC, they could play a physiological role. A total of 90% of the miRNAs that aligned to miRBase version 16 had more than 10 reads from the non-preferred strand and 71% of them had more than 50 reads from the non-preferred strand. miR-7a, miR-9, miR-129, miR-136, miR-132, miR-126 and let-7d in the mouse hippocampus (Supplementary Table S2), and miR-24, miR-199, miR-130 and the miRNA cluster on chromosome 7 that includes miR-292, miR-293, miR-294 and miR-295 in Group II cells all had both strands highly represented (Supplementary Table S3).
Characteristically, miRNAs are differentially expressed among tissue types. Most miRNAs showed a consistent strand preference across the cell and tissue types. The ratio of levels of 5′- to 3′-arms was well correlated between groups I and II (Figure 2). However, some miRNAs switched the preferred strand when comparing the two sample groups (Table 1). Most strand switches occurred among those miRNAs with read counts that did not differ greatly between the arms in the two sets. For example, hippocampal miR-22 had 5′-read counts of 69071 and 3′-read counts of 49670; however, in Group II cells the reads were 41172 and 60029 respectively. On the other hand, major strand switching occurred in the case of miR-1-2, which had 8368 and 22642 reads on 5′-and 3′-arms in hippocampus and the corresponding reads in Group II cells were 10356 and 903 (Table 1).
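The two examples above can be worked through in a few lines of Python. The sketch below is only an illustration: the helper names (`preferred_arm`, `strand_ratio`, `summarize`) and the dictionary layout are hypothetical, while the read counts are the ones quoted in the text for miR-22 and miR-1-2. It applies the rule that the preferred strand is the arm with more reads, then reports whether that arm differs between the two sample groups.

```python
def preferred_arm(reads_5p, reads_3p):
    """Return the arm with more reads (ties go to 5p in this sketch)."""
    return "5p" if reads_5p >= reads_3p else "3p"


def strand_ratio(reads_5p, reads_3p):
    """Fold difference between the more and the less abundant arm."""
    high, low = max(reads_5p, reads_3p), min(reads_5p, reads_3p)
    return float("inf") if low == 0 else high / low


# (5'-arm reads, 3'-arm reads) as quoted in the text.
counts = {
    "miR-22": {"hippocampus": (69071, 49670), "group_II": (41172, 60029)},
    "miR-1-2": {"hippocampus": (8368, 22642), "group_II": (10356, 903)},
}


def summarize(by_sample):
    """Preferred arm and fold ratio per sample, plus a switch flag."""
    arms = {s: preferred_arm(*c) for s, c in by_sample.items()}
    ratios = {s: strand_ratio(*c) for s, c in by_sample.items()}
    return {"arms": arms, "ratios": ratios,
            "switched": len(set(arms.values())) > 1}


for name, by_sample in counts.items():
    print(name, summarize(by_sample))
```

With these counts, miR-22 switches arms while both fold ratios stay below 2, whereas miR-1-2 switches with a more than 10-fold preference for the 5′-arm in Group II cells, which is the contrast the paragraph draws.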
For each miRNA in our samples we determined the reads for each of the five phased products of miRNA processing: 5′-and 3′-moRs, loops, 5′-and 3′-miRNA strands. The sum of the lengths of the 5′-and 3′-miRNA strand and the loop sharply peaked in the range 55–65nt (Supplementary Figure S6) and corresponded to the length distribution of the sequences in miRBase. A total of 65000 reads matched moRs sequences, and 20000 reads matched loop sequences within miRNA precursors (Figure 1). Relative to total small RNA profiling samples, reads from the mature miRNAs dominated (Figure 1).
Reads of moRs were very common for most of the miRNAs (Figure 3, Supplementary Tables S2, S3, S4, Supplementary Datasets S1, S2 and S3). Although more highly expressed miRNAs tended to have higher loop and moR reads, their levels were not strictly correlated with the expression level of mature miRNAs. A total of 77% of known miRNAs had >10 moR reads. Relatively fewer moR reads were sequenced in Ago-IP samples, compared to data from total small RNA profiling (Figure 1). 5′-moR reads exceeded 3′-moR reads by >3-fold in Group I and 5-fold in Group II samples. 5′-moRs were also reported to be more abundant than 3′-moRs in Drosophila (30), and therefore moR processing may be a highly conserved feature of the miRNA biogenesis pathway. The levels of 5′-moRs and 3′-moRs did not correlate (R2=0.025), suggesting some degree of independence in their generation. In Group II cells, miR-27b, miR-503, miR-21, miR-24 and miR-16 had the most abundant 5′-moRs, while miR-294 and miR-30e had the most 3′-moRs. In the hippocampus, miR-134 and miR-503 had abundant 5′-moRs (Supplementary Table S2, Supplementary Datasets S1, S2 and S3). Generally, the dominant moRs aligned perfectly with the dominant isomiRs (e.g. for a 3′-miR, the last base of the dominant isomiR was the base preceding the first base of the dominant moR), thereby suggesting that Drosha products without terminal modifications predominate among the phased fragments (Figure 3, Supplementary Datasets S1, S2 and S3). These findings also suggest that Drosha generates the ends of the 5′-and 3′-moRs that are adjacent to the pre-miR stem.
Although >90% of mouse miRNA precursors found in miRBase version 16 (31) had detectable loop regions, only 15% of the miRNAs had loop read numbers that exceeded 10. Therefore, loop sequences were the most unstable among all of the phased fragments. For comparably expressed mature miRNAs, the loop levels could differ greatly in different tissues. For example, some miRNAs that were highly expressed in Group II had thousands of loop reads, such as miR-34a (2927 loop reads, 0.26% of the dominant strand), miR-106b (2359 loop reads, 5.34% of miRNAs from the dominant strand), miR-182 (2285 loop reads, 0.21% of the dominant strand) and miR-27a (2189 loop reads, 0.57% of the dominant strand), whereas in the mouse hippocampus these miRs were also highly expressed, but had fewer than 10 loop reads (Supplementary Tables S2, S3, Supplementary Datasets S1 and S2). In mouse hippocampus, miRNAs with the most abundant loops were miR-138-2 (1018 loop reads, 0.06% of the dominant strand), let-7c-1 (745 loop reads, 0.21% of the dominant strand) and miR-219-2 (380 loop reads, 0.43% of the dominant strand).
Deep sequencing revealed variation in the Drosha and Dicer cleavage sites, which generate pre-miRNAs and RNA duplexes. These products are termed isomiRs (28,38). Some highly expressed miRNA genes had over a hundred isomiRs. The dominant isomiR was considered the one with the most reads. IsomiRs in our data set that exceed the length of the dominant isomiR are unlikely to be explained by the addition of non-templated nucleotides that fortuitously match the genome sequence (39–41). IsomiRs that are shorter than the dominant isomiR may arise in part due to exonucleolytic cleavage, as occurs in plants (42), but not, as yet, observed in animals (43). The variation is unlikely due to degradation or sequencing error (44,45), because species that were shorter than the dominant isomiR were approximately as frequent as those that were longer (39) and their numbers exceeded the rate of sequencing error. When considering the most abundant isomiRs, they differed by only 1 or 2nt at the 5′-or 3′-end of their sequences. Consistent with a previous proposal (46), the distribution of isomiRs in the cell is probably not random. However in our samples, the typical size distribution of isomiRs was nearly normal, especially for those highly expressed miRNAs (Figure 4A). When the isomiRs were grouped by only their 5′-ends or by their 3′-ends, they remained normally distributed (data not shown). This pattern was more obvious for highly expressed miRNAs. Our data showed that both strands exhibited higher 5′-fidelity, which agrees with previous data in several other species (30,36). Figure 4B shows the distributions of all isomiRs grouped by their 5′-ends and 3′-ends, rather than by the preferred strand. We further observed that both the 5′- and 3′-ends of the preferred strands showed a higher percentage of dominant isomiRs, which indicated less variation than occurs on the non-preferred strand (Figure 4B).
Much of the isomiR variability can be explained by variability in Dicer and Drosha cleavage positions. Variation in the termini of isomiRs produced by Dicer and Drosha reflects the fidelity of Dicer and Drosha (Figure 5), quantified as the percent of a set of isomiRs represented by the dominant isomiR. We grouped all the isomiRs based on their locations on the precursor hairpins as isomiRs derived from either the 5′- or 3′-arms, and compared the sequence variation of the enzyme cleavage positions. Because variation differed greatly at the 5′-ends compared to the 3′-ends of isomiRs, in order to compare directly the fidelity of Drosha to Dicer, we set up comparisons for both ends produced by the two enzymes. We found that the dominant isomiRs were 95.2% of the total isomiRs at the 5′-end of the 5p strand (Drosha cleavage), compared to 91.3% at the 5′-end of the 3p strand (Dicer1 cleavage) (Z-value=1004.4 and P-value<0.0001). The dominant isomiRs were 60.5% of the total isomiRs at the 3′-end of the 3p strand (Drosha cleavage) compared to 56.2% at the 3′-end of the 5p strand (Dicer1 cleavage) (Z-value=592.0 and P-value<0.0001). Overall, for both 5′- and 3′-ends of isomiRs, variation at Drosha cleavage positions was lower than at Dicer cleavage positions (Figure 5), a finding that implies a higher fidelity of Drosha than Dicer. As a result, more variation in the cleavage sites occurs near the loop than at the base of the stem in the pre-miRNA hairpins. Although Dicer functions as a precise ‘measuring stick’ from the Drosha cleavage site, several possible RNA folding structures with variation in the nucleotide at the terminus of the stem could serve as possible Dicer substrates. An earlier report in mouse suggested greater fidelity of Dicer (36); however, those authors restricted their analysis to cases in which Dicer produced the 5′-terminus of the miRNA, and fidelity was determined by the percent of offset reads. A study in C. elegans (37) showed approximately equal fidelity of Dicer and Drosha after they removed a single miRNA (miR-58) with an abundance that greatly exceeded all the other miRNAs.
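The fidelity comparison above rests on two quantities: the fraction of reads carried by the dominant isomiR at a given end, and a Z-test between two such fractions. The text does not specify which form of Z-test was used, so the sketch below assumes a standard pooled two-proportion Z-test; the function names and the read counts in the example are hypothetical and chosen only to mirror the 95.2% versus 91.3% contrast, not to reproduce the published Z-values.

```python
import math


def fidelity(dominant_reads, total_reads):
    """Fraction of reads at a cleavage site carried by the dominant isomiR."""
    return dominant_reads / total_reads


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z-test (an assumed form of the test).

    x1, x2: reads matching the dominant end position; n1, n2: total reads.
    Returns (z, two-sided p-value under the normal approximation).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; proportions are all 0 or 1")
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 952/1000 reads at a Drosha-generated 5'-end versus
# 913/1000 reads at a Dicer-generated 5'-end.
z, p = two_proportion_z(952, 1000, 913, 1000)
print(f"fidelity: {fidelity(952, 1000):.3f} vs {fidelity(913, 1000):.3f}")
print(f"z = {z:.2f}, p = {p:.4g}")
```

Because the Z statistic scales with the square root of the read count, differences of a few percentage points become very large Z-values at the depth of millions of reads, which is consistent in kind with the large values reported in the text.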
Although most miRNAs showed typical symmetrical distributions of isomiRs, the proportion of sub-dominant isomiRs could vary greatly for different miRNA loci. Alternative miRNA isomiRs expressed at sufficiently high levels could direct the repression of a distinct target set (36). Therefore, we identified miRNAs with relatively abundant secondary isomiRs and miRNAs with differentially expressed isomiRs in different samples. As shown previously (37,46), the extent of variation at the ends of isomiRs varied among different miRNAs. Some miRNAs had high variation at their sequence ends with nearly equal proportions of their isomiRs. For example, the dominant isomiR of miR-1982 was only 49.55% of the total reads and the secondary isomiR, which differed by 2nt, accounted for 45.67% of total reads. To measure the isomiR variation, we calculated the weighted average size of nucleotide variation (WAZNV) according to the dominant sequences as:
where p_i was the percentage of isomiR i in the total observed isomiRs and d_i was the relative distance of the ends of isomiR i compared to the dominant one. WAZNV can be calculated based on either the 5′- or 3′-end for both the preferred and non-preferred strands. Because miRNA targeting depends heavily on the 5′-end, we mainly focused on WAZNVs based on the 5′-end of miRNAs. WAZNV measured the variation among isomiRs. The top 10 sequences, whether preferred or non-preferred strands, with high isomiR variation based on all data sets are shown in Figure 6A. At the top of the list is miR-485-3p, which had shown significant variation in the literature (47–49). In our dataset, the dominant isomiR (isomiR I) was on the 3′-arm and accounted for 48.9% of miR-485-3p reads. The secondary and tertiary isomiRs accounted for 28.1% (isomiR II) and 19.7% (isomiR III) of total reads and shifted the mature sequence 2 and 3 nt in the 5′-direction. Although the isomiR distribution around the dominant isomiR is usually normal, some isomiR distributions shift the secondary and tertiary isomiRs 2 and 3nt from the dominant isomiR (Figure 6B).
By comparing isomiR variation between different samples, WAZNV can detect differentially expressed isomiRs. By defining the position of dominant isomiRs within the whole data set, we calculated the WAZNV of miRNAs in both Group I and II samples as well as the AGO-IP sample separately. Switching of dominant isomiRs was uncommon between hippocampus and Group II cells (Supplementary Datasets S1 and S2), and also very rare between AGO-IP and total small RNA samples. However, we identified several cases in which miRNAs switched the dominant isomiR between hippocampus and Group II cells, such as miR-485-3p, or had significant differences in isomiR distributions (Table 2 and Figure 7). In addition to miR-485-3p (Figure 7A), miR-543-5p (Figure 7B), miR-592-5p (Figure 7C) and mmu-miR-137-3p (Figure 7D) also switched dominant isomiRs between Group I and II cells. miR-125b-2-3p (Figure 7E) was highly expressed in hippocampus, in which the dominant isomiR accounted for 78.9% of total reads; in contrast, the secondary isomiR, which shifted 2nt in the 5′-direction compared to the dominant isomiR, accounted for 40.0% of the total 3261 reads of miR-125b-2-3p in Group II cells. A few cases showed large differences between AGO-IP and total small RNA samples. miR-690 had 1010 reads, 3162 reads and 2196 reads in AGO-IP, Group I and Group II cells respectively, but its dominant isomiR in AGO-IP samples shifted 2nt in the 3′-direction compared to the location of the dominant isomiR in the other two groups, which showed identical distributions of isomiRs (Figure 7F). The dominant isomiR of miR-34a-3p in AGO-IP samples shifted 1nt compared to the dominant ones in the other two samples. These examples could indicate an isomiR preference of the AGO complex that maintains greater uniformity in the mature miRNA that is guided to the mRNA.
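A WAZNV-style score can be sketched from the verbal definition alone. The sketch below assumes the score is the sum, over isomiRs, of each isomiR's share of reads multiplied by the absolute distance of its end from the dominant isomiR's end; whether the original formula uses absolute or signed distances is an assumption here, and the function name `waznv` and the toy read counts are hypothetical.

```python
def waznv(reads_by_offset):
    """Weighted average size of nucleotide variation, under assumptions.

    reads_by_offset: dict mapping the end offset (in nt, relative to the
    dominant isomiR, which has offset 0) to the read count at that offset.
    Assumes WAZNV = sum_i p_i * |d_i|, with p_i the read fraction of
    isomiR i and d_i its end distance from the dominant isomiR.
    """
    total = sum(reads_by_offset.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return sum(n * abs(d) for d, n in reads_by_offset.items()) / total


# Toy 5'-end distributions: a tightly processed miRNA versus one whose
# secondary isomiRs sit 2 and 3 nt from the dominant 5'-end.
tight = {0: 950, 1: 30, -1: 20}
spread = {0: 50, -2: 30, -3: 20}
print(waznv(tight))   # small value: little 5'-end variation
print(waznv(spread))  # larger value: substantial 5'-end variation
```

Under this reading, a miRNA with a single sharply defined 5′-end scores close to 0, while a locus such as the second toy case, with large secondary isomiRs displaced by 2 to 3nt, scores above 1.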
Sequence alterations of miRNAs can occur by addition of non-templated nucleotides to miRNA termini (38,40,49). Most miRNAs in our data set did not have non-templated nucleotide modifications. Of the total reads, 16.00%, 14.10% and 12.67% were extended by 1nt in the Ago-IP, Group I, and Group II samples, respectively. Of the total reads, 7.81%, 6.85% and 3.18% were extended by 2nt in the Ago-IP, Group I and Group II samples, respectively. The nucleotides most frequently added to murine miRNAs were U and A (Figure 8 and Table 3). However, in the Ago-IP sample from hippocampus the addition of C was most common; C also showed a small increase in the Group I samples compared to the Group II samples. Di-nucleotide extensions not infrequently consisted of two different nucleotides (Supplementary Table S5). Half of the miR-143-3p reads in our samples were extended, similar to the very high proportion of extended reads previously described for this miRNA (36). Among highly expressed miRNAs, 32% of the miR-124 reads were extended and 33% of the miR-24 reads were extended (Supplementary Table S6).
Nucleotide addition can occur on the mature miRNA (50) or on the precursor species (43,51). For our entire data set nucleotide addition on the 5′-end of both the 5p (0.32%) and 3p (0.31%) strand was infrequent. The pattern of non-templated extension differs between Group I hippocampal tissue and Group II cultured cells undergoing reprogramming (Figure 8). Group II cells also showed a greater tendency than Group I toward 3′-mono-uridylation on the 3p arm relative to the 5p arm (Table 3, 4.88%/3.88% in Group I and 8.21%/1.84% in Group II). In general, most of the nucleotide additions occurred on the 3′-end with 19.85% on the 3p strand and 16.07% on the 5p strand. Whether the bias toward the 3′-end of the 3p strand represents nucleotide addition on the pre-miRNA or a bias to the 3p strand after Dicer cleavage cannot be discerned from these data. A further breakdown by the specific nucleotide added shows addition of A, U and C, but very low values of G as previously noted (46). The specific extended nucleotide at the 3′-ends shows the expected preference for mono-uridylation on the 3p arm, i.e. the extreme end of the precursor; however, the 5p arm has a bias toward the addition of adenine (Table 3). One possible reason for this asymmetry is a nucleotide-specific tendency to extend the pre-miRNA versus the mature miRNA. There is a hint in the data that the choice of U or A may have an effect on promoting or hindering RISC entry. Comparing total miRNAs from the hippocampus (Group I) to miRNAs immunoprecipitated with Ago from the hippocampus, there is a relative increase in mono-U extension and a relative decrease in mono-A extension in the Ago-IP fraction.
Several studies have used deep sequencing for detailed miRNA annotations in a range of model organisms including D. melanogaster (30), C. elegans (37) and mouse (29,36). By sampling diverse tissues and cells and sequencing at a great depth, several novel observations emerge here and other findings were confirmed. More than 90% of mouse miRNAs had more than 10 reads from both arms of the precursor, and therefore both strands could potentially play a biological role in target repression. Taking advantage of the tight association between miRNAs and AGO, we compared total small RNA (Group I) and a pan-Ago IP both from the mouse hippocampus to profile miRNAs which entered RISC. These data showed fewer non-preferred strands present in the RISC even in cases with high read counts. Nevertheless, not infrequently, the non-preferred strand also associated with AGO, and therefore possibly with the RISC. It seems likely that in some cases, a pool of pre-miRNAs sort their preferred and non-preferred strands into different RISCs without degradation of one of the strands. Although usually the preferred strand is readily recognizable by a many-fold enrichment, some miRNAs show quite similar read counts from both the preferred and non-preferred strands. 10% of miRNA loci showed a comparable expression of both strands with <2-fold difference.
Data from several large-scale miRNA sequencing studies have demonstrated that the arm from which the preferred miRNA is processed can switch in different tissues and at different developmental times (36,38,52–55). Among a variety of human samples, Cloonan et al. (56) found that 12.9% of miRNAs switched the dominant strand in at least one tissue. Utilization of opposite strands for target binding would result in significant changes in the miRNA targeting. However, often miRNA loci with comparable expression levels of both arms show switching of the preferred strand in different tissues, and in these cases both strands may be used in different proportions rather than a more dramatic switch in the target field if one or the other strand were used exclusively. When strand choice is closer to equilibrium different tissues appear to tilt the balance toward one strand or the other. The use of both strands among a sizable fraction of miRNAs is consistent with the demonstration of selection at the 5′-end of both strands (57). Other reports proposed that star sequences would not be excluded from functional complexes because they are present in substantial levels or sorted to different Argonaute complexes, in which the dominant arm directs translational repression (by means of Ago1) and the miR* sequence directs transcriptional degradation by means of Ago2 (25–27,36,38,58–63). Star sequences, such as miRNAs like miR-19* (64) and miR-223* (65), can have a detectable impact on target networks in Drosophila and mammals (25–27,54,66).
Although isomiRs are frequently observed, their origin has been attributed to sequencing or alignment artifacts (67–69). However, non-random features of their distributions suggest otherwise (29,70) and recently it was suggested that isomiRs with different repertoires of mRNA targets would distribute the ‘off-target’ hits while still targeting core biological networks (56). Among the non-random features of isomiRs is variation in the choice of the dominant isomiR. Furthermore, a potentially disruptive shift of the dominant isomiR by 2nt (36) was observed. In particular, variation at 5′-end of the mature isomiR will presumably have greater effects on targets (30,36). Secondary isomiRs could be explained by differential processing of the two paralogous hairpins (36,38), or alternative Drosha and/or Dicer1 cleavages (30). Finally, miRNAs can be modified by the non-templated addition of nucleotides almost always at the 3′-end. Reports in the literature vary on the prevalence of non-templated nucleotide addition among miRNAs (36,71,72) with patterns described in mammals (36,49), worms and flies (38,40). A bias toward mono-uridylation and away from mono-adenylation among miRNAs in Ago IP fractions compared to the total small RNA fraction suggests that these two additions may serve to promote or hinder association with the RISC. The methods here preclude the discovery of long 3′-terminus poly(U)-tailing and those reports that do use deep sequencing to annotate miRNAs also do not report extensive poly(U)-tailing (46).
The origin of moRs is unclear. They may arise by exonucleolytic activity on precursor transcripts (38,73) or double-stranded cleavage of extended hairpin regions on pri-miRNA transcripts via secondary DROSHA processing (30,74). However, the broad length distributions of moRs derived from 5′-and 3′-regions flanking pre-miRNAs are inconsistent with a DROSHA-based cleavage mechanism (72). Nevertheless, our data show that the ends of both the dominant 5′-and the 3′-moRs fit very well with the dominant termini of the 3p and 5p strands arising from Drosha cleavage. It therefore seems likely that at least one end of the moR sequences is a Drosha product.
The sequencing data reported in this study can be obtained from the Sequence Read Archive (SRA) at NCBI under accession numbers SRP010127, SRP010168, SRP010169 and SRP010170.
Supplementary Data are available at NAR Online: Supplementary Tables 1–6, Supplementary Figures 1–6 and Supplemental Datasets 1–3.
California Institute for Regenerative Medicine (CIRM) (to K.S.K.); National Institutes of Health (to T.M.R.); a postdoctoral fellowship from California Institute for Regenerative Medicine to E.J.L. Funding for open access charge: California Institute for Regenerative Medicine and the NIH.
Conflict of interest statement. None declared.
The authors wish to thank the Kosik lab members for helpful comments.