mRNAs regulated by direct Microprocessor cleavage should be upregulated in cells deficient for the Microprocessor, but not in Dicer-deficient cells. Therefore, we evaluated coding mRNA profiling data from wild-type, Dgcr8 KO and Dicer KO mouse ES cells. Normalized mRNA levels in Dgcr8 KO and Dicer KO cells were compared to those in wild-type ES cells (). Most mRNAs that were upregulated or downregulated were similarly altered in both mutants. However, similar to previous studies, we found multiple mRNAs whose expression was specifically altered in cells that lacked Dgcr8. At a false discovery rate of 5%, 778 transcripts were upregulated in Dgcr8 KO cells relative to both wild-type and Dicer KO cells, and 843 transcripts were downregulated.
Transcripts differentially regulated in Dgcr8 KO relative to WT and Dicer KO ES cells.
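The selection of Dgcr8-specific transcripts can be sketched as simple set logic. This is a minimal illustration rather than the pipeline used in the study: it assumes each pairwise comparison has already been reduced to a per-transcript log2 fold change and an FDR-adjusted q-value, and the names `Comparison`, `is_up` and `dgcr8_specific_up`, as well as the toy values, are hypothetical.

```python
from dataclasses import dataclass

FDR_CUTOFF = 0.05  # the 5% false discovery rate quoted in the text


@dataclass(frozen=True)
class Comparison:
    """One transcript in one pairwise comparison (hypothetical container)."""
    log2_fold_change: float  # positive means higher in Dgcr8 KO
    q_value: float           # FDR-adjusted p-value, assumed precomputed


def is_up(c: Comparison) -> bool:
    """Significantly higher in Dgcr8 KO than in the reference sample."""
    return c.q_value <= FDR_CUTOFF and c.log2_fold_change > 0


def dgcr8_specific_up(vs_wt: dict, vs_dicer: dict) -> set:
    """Transcripts upregulated in Dgcr8 KO relative to BOTH wild type and Dicer KO."""
    shared = vs_wt.keys() & vs_dicer.keys()
    return {t for t in shared if is_up(vs_wt[t]) and is_up(vs_dicer[t])}


# Toy data: A is Dgcr8-specific, B is altered similarly in both mutants
# (so it is not higher in Dgcr8 KO than in Dicer KO), C is unchanged.
vs_wt = {
    "A": Comparison(1.5, 0.01),
    "B": Comparison(1.2, 0.02),
    "C": Comparison(0.1, 0.80),
}
vs_dicer = {
    "A": Comparison(1.1, 0.03),
    "B": Comparison(0.0, 0.90),
    "C": Comparison(0.2, 0.70),
}
print(sorted(dgcr8_specific_up(vs_wt, vs_dicer)))  # ['A']
```

Requiring significance against Dicer KO as well as wild type is what separates candidate direct Microprocessor targets from transcripts that change simply because mature miRNAs are lost.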
If genes specifically upregulated in Dgcr8 KO cells are normally cleaved by the Microprocessor, there should be hairpin substrates for the complex within these mRNAs. Therefore, we searched for evolutionarily conserved hairpins within these mRNAs using predictions generated by the EvoFold algorithm. The 5′UTR hairpin in Dgcr8 was first identified by this method. EvoFold predictions are grouped based on their location in CDS, 5′UTR, 3′UTR, intron and intergenic regions. We determined mouse genome coordinates for EvoFold hairpins in CDS, 5′UTR and 3′UTR regions (see Methods), mapped them to the coding mRNA database, and compared the relative expression levels of all positive hits in Dgcr8 KO, Dicer KO, and wild-type ES cells (). A total of 824 out of 23805 (3.5%) coding mRNAs contained predicted hairpins. Of these 824, 43 mRNAs were specifically upregulated in Dgcr8 KO cells, while 24 mRNAs were specifically downregulated. Therefore, a subset of the genes specifically upregulated in Dgcr8 KO cells contain predicted hairpins and hence could be direct targets of the Microprocessor.
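Mapping predicted hairpins onto annotated mRNAs amounts to an interval-containment test. The sketch below is one way to express it, under stated assumptions: coordinates are 0-based and half-open, a hairpin counts only if it lies entirely within a single exon on the same strand, and the `Interval` type, function names and toy coordinates are all hypothetical rather than taken from the study.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str
    name: str    # transcript name for exons, hairpin ID for hairpins


def contains(exon: Interval, hairpin: Interval) -> bool:
    """True when the hairpin lies entirely within the exon on the same strand."""
    return (
        exon.chrom == hairpin.chrom
        and exon.strand == hairpin.strand
        and exon.start <= hairpin.start
        and hairpin.end <= exon.end
    )


def transcripts_with_hairpins(exons, hairpins):
    """Map each transcript name to the hairpins found within its exons."""
    hits = {}
    for exon in exons:          # quadratic scan: fine for a sketch,
        for hp in hairpins:     # an interval index would scale better
            if contains(exon, hp):
                hits.setdefault(exon.name, []).append(hp.name)
    return hits


# Toy coordinates, not real annotations.
exons = [
    Interval("chr1", 100, 400, "+", "txA"),
    Interval("chr1", 900, 1200, "+", "txB"),
]
hairpins = [
    Interval("chr1", 150, 230, "+", "hp1"),   # inside txA
    Interval("chr1", 950, 1020, "-", "hp2"),  # wrong strand for txB
]
print(transcripts_with_hairpins(exons, hairpins))  # {'txA': ['hp1']}
```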
If hairpins within the Dgcr8 KO-upregulated gene set are indeed cleaved by the Microprocessor, we hypothesized that there would be Dgcr8-dependent small RNAs that map to these hairpins. Therefore, we evaluated ultra-high-throughput deep sequencing data representing small RNAs of 18–32 nucleotides from wild-type, Dgcr8 KO and Dicer KO ES cells. As expected, multiple sequence reads mapped to the EvoFold-predicted 5′UTR and coding region hairpins of Dgcr8 mRNA in WT cells (). None of the reads mapping to the coding region hairpin were found in either the Dgcr8 KO or the Dicer KO library, confirming their Dgcr8- and Dicer-dependence (). Interestingly, two sequence reads mapping to the 5′UTR hairpin were found in the Dicer KO library (). One of these reads mapped just 5′ to the hairpin. Such Dgcr8-dependent, Dicer-independent reads have been previously observed at miRNA loci in Drosophila and mouse small RNA sequencing studies and appear to be a 5′ remnant of Drosha cleavage that is further degraded by an unknown 5′-3′ exonuclease. The other read found in the Dicer KO library had a 5′ end that did not map to the 5′ or 3′ end of the hairpin, suggesting that it was a degradation product of the full-length hairpin. Analysis of all EvoFold-predicted hairpins in the Dgcr8 KO-upregulated set of coding mRNAs failed to identify a single other hairpin with corresponding small RNAs.
The distribution of reads across hairpins in the first exon of Dgcr8 in mES cells.
Analysis of only EvoFold-predicted loci could miss poorly conserved hairpins. Therefore, to extend the analysis, sequencing reads from WT ES cells were mapped to all exons of the transcripts whose expression was altered in Dgcr8 KO versus WT and Dicer KO cells. Seven of the 778 Dgcr8 KO-upregulated transcripts and 15 of the 844 downregulated transcripts had at least 5 small RNA reads that overlapped with their exons (). As Microprocessor activity is predicted to destabilize the mRNAs, we looked more closely at the 7 transcripts upregulated in Dgcr8 KO cells. The small RNAs that mapped within exonic regions of these annotated transcripts fell into two groups based on their distribution. Three regions had multiple small RNAs with a similar 5′ or 3′ end, consistent with specific endonuclease cleavage ( and Figure S1, S2). The remaining five regions (two from the same transcript, Arrdc-3) had small RNAs mapping across the exon without shared 5′ or 3′ ends, consistent with degradation ( and Figure S3, S4, S5, S6). All of these small RNAs were present in the Dgcr8 null background ( and Figure S1–S6). Hence, they are not products of Microprocessor cleavage.
Representative examples of read distribution in exons with >5 reads in WT cells.
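The distinction drawn above between reads that share an end (cleavage-like) and reads scattered across an exon (degradation-like) can be made concrete with a small heuristic. This is only a sketch, not the procedure used in the study: it assumes plus-strand reads given as `(start, end)` tuples so that `start` is the 5′ end, reuses the minimum of 5 reads from the text, and introduces a hypothetical `min_shared` threshold of 0.5 purely for illustration.

```python
from collections import Counter


def dominant_end_fraction(positions):
    """Fraction of reads that share the single most common end position."""
    if not positions:
        return 0.0
    return Counter(positions).most_common(1)[0][1] / len(positions)


def classify_reads(reads, min_reads=5, min_shared=0.5):
    """Label a set of (start, end) reads by how their ends are distributed.

    min_shared is an illustrative assumption, not a value from the study.
    """
    if len(reads) < min_reads:
        return "too few reads"
    five_prime = dominant_end_fraction([start for start, _ in reads])
    three_prime = dominant_end_fraction([end for _, end in reads])
    if max(five_prime, three_prime) >= min_shared:
        return "shared end (cleavage-like)"
    return "dispersed (degradation-like)"


# Toy reads: four of five start at position 100.
clustered = [(100, 122), (100, 121), (100, 123), (100, 122), (104, 125)]
# Toy reads: every start and end is different.
dispersed = [(100, 120), (107, 129), (115, 136), (123, 145), (131, 150)]

print(classify_reads(clustered))  # shared end (cleavage-like)
print(classify_reads(dispersed))  # dispersed (degradation-like)
```

A shared end alone does not implicate the Microprocessor. As in the text, the decisive test is whether the reads disappear in the Dgcr8 null background.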
A small number of annotated miRNAs map to exonic regions of coding genes (~37 in mice). Therefore, analogous to Dgcr8, the host genes for these miRNAs might be expected to be downregulated by Microprocessor-induced cleavage. Upon examination of the exonic miRNAs, we found only 10 that lie fully within annotated exons (Table S1). We found small RNA reads mapping to three of these exonic miRNAs (mmu-miR-21, mmu-miR-671, mmu-miR-147). However, the mRNA levels of the host genes of these three miRNAs were not altered in the Dgcr8 and Dicer KO ES cells. Therefore, production of these miRNAs does not appear to influence the overall levels of the annotated host mRNAs. Together, these detailed analyses of both mRNA expression profiling and small RNA sequencing data from ES cells failed to uncover any genes other than Dgcr8 that are directly destabilized by the Microprocessor.
It is possible that 18–32 nucleotide small RNA sequencing missed Microprocessor-cleaved exonic hairpins that are sequestered and/or are not processed by Dicer. Microprocessor-cleaved hairpins are typically 60–75 nucleotides in length. Therefore, to directly identify these hairpins, we analyzed ultra-high-throughput sequencing data sets produced from small RNAs less than 200 nucleotides in length from HeLa and HepG2 cells. The forty small RNA libraries generated in the study were derived from whole cell, cytoplasmic and nuclear fractions, as well as from cells following enzymatic treatments that enrich for mono-, di- or tri-phosphate modified or 5′ capped RNAs. Sequence reads from all forty libraries were mapped to exonic EvoFold hairpins. The largest number of hits, 184, mapped to the Dgcr8 5′UTR hairpin, and 4 mapped to the coding region hairpin (). Most of these reads had a uniform 5′ end, consistent with Microprocessor cleavage. There was an additional read just 5′ to the hairpin, a likely remnant of Microprocessor cleavage, similar to that seen in the ES cell small RNA libraries (). A large majority (166 out of 184) of the 5′UTR reads were derived from nuclear libraries, consistent with previous work showing that the cleaved 5′UTR hairpin is confined to the nuclear fraction. When reads from the libraries were mapped to known pre-miRNA hairpins, many extended beyond the known mature miRNA into the loop region of the hairpin (Figure S7), confirming that these libraries contain hairpin products of Microprocessor cleavage. These findings show that analysis of the HeLa and HepG2 small RNA data sets should identify other hairpins that are cleaved by the Microprocessor even if they are not further processed.
Read distribution across hairpins in the first exon of Dgcr8 in <200 nt small RNA sequencing data from HeLa and HepG2 cells.
To identify any other potential mRNA substrates, we next mapped the HeLa and HepG2 datasets to all UTR and CDS EvoFold loci. There were 106 additional EvoFold hairpins containing overlapping small RNAs, although the number of reads mapping to any one of these hairpins was far lower than that seen for Dgcr8 (Table S2). Only four of these hairpins had at least 5 sequence reads. Furthermore, none of the small RNA reads in these hairpins mapped in a manner consistent with Microprocessor cleavage. That is, they had heterogeneous 5′ and 3′ ends and/or the ends went beyond the extremes of the hairpins (). For example, the second highest-ranking hairpin, which mapped to the gene RPS3, had 14 reads. However, unlike the reads mapping to the Dgcr8 hairpins, they did not have a defined 5′ end, but instead mapped across the locus, more consistent with RNA degradation than Microprocessor cleavage. Therefore, analysis of small RNAs less than 200 nucleotides failed to identify any EvoFold loci within exons other than Dgcr8 that are cleaved in a Microprocessor-like fashion.
Read distribution across hairpins positive for >5 small RNA reads in HeLa cell <200 nt small RNA sequencing data.
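One of the criteria used above, whether read ends go beyond the extremes of the predicted hairpin, can be expressed as a simple boundary check. The sketch assumes half-open `(start, end)` coordinates for both reads and hairpin; the function name and the toy numbers are hypothetical.

```python
def fraction_outside(reads, hairpin_start, hairpin_end):
    """Fraction of reads with at least one end beyond the hairpin boundaries."""
    if not reads:
        return 0.0
    outside = sum(
        1 for start, end in reads
        if start < hairpin_start or end > hairpin_end
    )
    return outside / len(reads)


# Toy hairpin and reads: the first read starts upstream of the hairpin and
# the third read ends downstream of it.
hairpin = (200, 280)
reads = [(190, 215), (205, 228), (260, 290), (210, 232)]
print(fraction_outside(reads, *hairpin))  # 0.5
```

A high fraction suggests that the reads derive from the surrounding transcript rather than from an excised hairpin.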
Again, limiting the analysis to EvoFold-predicted hairpins would miss non-conserved hairpins. Therefore, we mapped small RNAs from HeLa and HepG2 libraries to exons of transcripts upregulated over 2-fold with siRNA-mediated knockdown of both Drosha and Dgcr8 relative to siGFP. Expression information was extracted from recently published microarray data in HeLa cells (see Methods). As expected, Dgcr8, which was upregulated in the Drosha knockdown sample, had 188 small RNAs mapping to the first exon. Upon examining protein-coding genes upregulated in both Drosha and Dgcr8 knockdown samples, 31 transcripts had >10 small RNAs mapping to at least one exon (45 exons total, Table S3). Notably, 15 of the 31 were genes that encode ribosomal protein subunits, which are highly abundant in cells. Of the 31, 11 transcripts had small RNA reads distributed over the exon, as would be expected for degradation products. The remaining 20 transcripts had small RNA reads clustering in small window(s) within exons. However, further examination of the regions in these 20 transcripts using RNAfold did not reveal any convincing hairpin structures, in contrast to the regions of Dgcr8 to which small RNAs mapped. In summary, analysis of the ultra-high-throughput sequence reads of RNAs less than 200 nucleotides, like the ES cell small RNA dataset, showed that a role of the Microprocessor in direct mRNA regulation is likely limited to Dgcr8.