Computational Survey for Well-Conserved Mammalian Mirtrons
At least some invertebrate mirtrons have been well conserved during fly or worm evolution. These exhibit characteristic features that reflect their status as microRNA-class genes (Lai et al., 2003): they are short, straight hairpin introns whose 5′ and 3′ terminal segments are preferentially conserved relative to the central intronic region (Okamura et al., 2007; Ruby et al., 2007a). In other words, the miRNA/miRNA* sequences of mirtron hairpins are much more conserved than their terminal loops. A forward analysis of all Drosophila introns that exhibit these properties across eight or more sequenced Drosophilids recovered only mirtrons that had already been cloned (W.-J.C. and E.C.L., unpublished data), suggesting that flies have a fairly limited repertoire of well-conserved mirtrons (Okamura et al., 2007; Ruby et al., 2007a).
We asked whether these simple features might identify candidate mammalian mirtrons. In brief, we extracted 25,935 RefSeq/Ensembl introns 50–200 nt in length from the UCSC Genome Browser (Kuhn et al., 2007), identified conserved mammalian introns that exhibit a “saddle-shaped” conservation profile (conserved termini flanking a less conserved center), and then used RNAfold (Hofacker, 2003) and RNAshapes (Steffen et al., 2006) to identify those introns that fold into straight hairpins in both primate and nonprimate orthologs (see Experimental Procedures). This yielded 13 candidates for well-conserved mammalian mirtrons (see Figures S1 in the Supplemental Data available with this article online). Some of these were less compelling than others because their hairpins were conserved in relatively few species and/or had relatively high (less stable) folding free energies.
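The three filters described above (length, saddle-shaped conservation, straight hairpin) can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: it assumes that per-base conservation scores (for example, phastCons-style values between 0 and 1) and a dot-bracket structure (such as RNAfold output) are already available for each intron, and the 20 nt arm length, 0.2 saddle threshold, and 15 bp minimum are hypothetical parameters. It also checks a single structure, whereas the study required straight hairpins in both primate and nonprimate orthologs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Intron:
    """Hypothetical per-intron record; field names are illustrative only."""
    name: str
    conservation: list  # per-base conservation scores, assumed to lie in [0, 1]
    structure: str      # dot-bracket secondary structure, e.g. from RNAfold


def saddle_score(scores, arm=20):
    """Mean conservation of both terminal arms minus that of the central region.

    A positive value means the termini (putative miRNA/miRNA* arms) are more
    conserved than the middle (putative terminal loop): a "saddle" profile.
    """
    if len(scores) <= 2 * arm:
        raise ValueError("intron is too short for the chosen arm length")
    ends = list(scores[:arm]) + list(scores[-arm:])
    middle = scores[arm:-arm]
    return mean(ends) - mean(middle)


def is_straight_hairpin(structure, min_pairs=15):
    """True if the fold is a single unbranched stem-loop with enough base pairs.

    In a balanced dot-bracket string, the fold is unbranched exactly when
    every '(' occurs before every ')'.
    """
    opens = [i for i, c in enumerate(structure) if c == "("]
    closes = [i for i, c in enumerate(structure) if c == ")"]
    if len(opens) < min_pairs or len(opens) != len(closes):
        return False
    return opens[-1] < closes[0]


def candidate_mirtrons(introns, min_len=50, max_len=200, min_saddle=0.2):
    """Names of introns passing the length, saddle-profile, and hairpin filters."""
    hits = []
    for intron in introns:
        length = len(intron.conservation)
        if not (min_len <= length <= max_len):
            continue
        if saddle_score(intron.conservation) < min_saddle:
            continue
        if is_straight_hairpin(intron.structure):
            hits.append(intron.name)
    return hits


# Toy example with made-up values: conserved 20 nt arms around a weakly conserved loop.
toy = Intron(
    name="toy_hairpin",
    conservation=[1.0] * 20 + [0.2] * 20 + [1.0] * 20,
    structure="(" * 20 + "." * 20 + ")" * 20,
)
branched = Intron(
    name="toy_branched",
    conservation=[1.0] * 20 + [0.2] * 20 + [1.0] * 20,
    structure="(" * 10 + "." * 10 + ")" * 10 + "(" * 10 + "." * 10 + ")" * 10,
)
print(candidate_mirtrons([toy, branched]))  # ['toy_hairpin']
```

The branched toy structure has the same conservation profile but two stem-loops, so only the straight hairpin survives the final filter.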
We then asked whether cloned products of any of these candidate mirtron hairpins were present in collections of mammalian small RNAs (Berezikov et al., 2006a). Indeed, for three loci (mir-877, and mir-1225; Figures S1), multiple reads corresponding precisely to both the 5′ and 3′ ends of the host introns (i.e., miRNA/miRNA*) were found in human, chimpanzee, rat, and/or mouse small RNA data sets. As with invertebrate mirtrons, mammalian mirtrons generally lacked the pairing between their flanking exons that is needed for recognition by the Drosha/DGCR8 complex (see Figure S1); where such pairing was found, it was typically not conserved and followed codon wobble rules.
Nineteen Confidently Annotated Mammalian Mirtron Loci
The mirtrons mir-877, and mir-1225 were clearly maintained as hairpins in mammals as diverse as rodents, dog, and horse, indicating that they have persisted over at least ~80 million years of eutherian evolution (Figures S1). We note that small RNAs from the mir-877 locus were recently cloned independently by Tuschl and colleagues, who annotated it as a canonical miRNA gene (Landgraf et al., 2007). Its reclassification as a mirtron parallels that of nematode mir-62, which was only recently recognized as a mirtron gene (Ruby et al., 2007a). We also note that two of the most abundantly cloned mirtron products were derived from mir-877, which were also two of the most perfectly conserved predicted mirtrons. This parallels the finding that the most highly expressed invertebrate mirtrons are also the most highly conserved ones (Okamura et al., 2007; Ruby et al., 2007a), as is also generally true of canonical animal miRNAs (Berezikov et al., 2006b; Ruby et al., 2007b).
A Plethora of Primate-Specific Mirtrons
Although some are well conserved, most invertebrate mirtrons arose quite recently during the Drosophilid and nematode radiations (Okamura et al., 2007; Ruby et al., 2007a); thus, evolutionary conservation does not aid their computational identification. However, newly evolved miRNAs have been identified through high-throughput small RNA sequencing. In D. melanogaster, adult heads expressed a high diversity of mirtrons and canonical miRNAs (Ruby et al., 2007b). This is consistent with the fact that brains harbor an exceptional diversity of neurons, a cell type with intrinsically extensive needs for translational regulation. We therefore mined a data set of 30 additional small RNA libraries from 15 matched anatomical regions of human and rhesus macaque brains (Figure S3), each represented by 18,000–45,000 sequences (E.B. and E.C., unpublished data).
In addition to providing cloned evidence for mirtrons mir-877, and mir-1225 in macaque, analysis of these small RNA data sets yielded another 16 mirtrons expressed in primate brains with evidence sufficient to justify official nomenclature (see Figure S4). As minimum evidence, we required the recovery of clones from independent libraries, or at least three clones from any individual library. In several cases the evidence exceeded this minimum, including cloning from multiple species (e.g., mir-1226, from both human and macaque), the isolation of many clones (e.g., mir-1229, with 16 clones from 12 different libraries), and/or the isolation of both miRNA and miRNA* species (e.g., mir-1227). These mirtrons appeared to be phylogenetically restricted to primates; some have conserved hairpin structures in human, chimpanzee, and rhesus, whereas others are restricted to a subset of primates. The sequences and secondary structures of the orthologous primate mirtronic introns are summarized in Figure S5.
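The minimum-evidence criterion above is simple enough to state as a predicate. The sketch below assumes that read counts are tallied per library in a dictionary and that every distinct library key counts as an independent library; the function and library names are hypothetical.

```python
def meets_minimum_evidence(reads_per_library):
    """Minimum-evidence rule as stated in the text.

    A locus qualifies if it was cloned from at least two independent libraries,
    or if any single library yielded at least three clones. Treating every
    distinct library key as independent is an assumption of this sketch.
    """
    libraries_with_reads = sum(1 for n in reads_per_library.values() if n > 0)
    return libraries_with_reads >= 2 or any(n >= 3 for n in reads_per_library.values())


# Hypothetical read tallies keyed by library name.
print(meets_minimum_evidence({"lib_A": 3}))              # True: three clones, one library
print(meets_minimum_evidence({"lib_A": 1, "lib_B": 1}))  # True: two libraries
print(meets_minimum_evidence({"lib_A": 2, "lib_B": 0}))  # False
```

This rule is necessary rather than sufficient: as described for the candidate set, loci with three reads can still be treated as tentative when their hairpin structure is atypical.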
Finally, we classified 46 additional hairpin introns from human (23 loci), macaque (16 loci), chimpanzee (3 loci), or mouse (4 loci) as mirtron candidates (Figure S6). The greater number of human and macaque candidates was due in part to the deeper sampling of human and macaque brains. A few of these candidates were cloned three or more times, but we considered them tentative because of an atypical intronic extension of 8–10 nt on one side of the hairpin (i.e., macaque_block210826 [3 reads/2 libs] and human_block172399 [3 reads/1 lib]). In Drosophila, at least one conserved mirtron-like locus (mir-1017) exhibits a long intronic extension on one side of the hairpin (Ruby et al., 2007a), suggesting that such “half-mirtron” loci might have one end defined by splicing and the other by exonucleolytic digestion. Of the remaining candidates, five (human_block107544, chimp_block23965, macaque_block550558, macaque_block137121, and mouse_block283) were each cloned twice, while the rest were supported by single reads. Many of these candidate mirtrons exhibit compelling extended hairpin structures; thus, we anticipate that at least some of them (along with some of the uncloned, conserved computational candidates) will eventually be validated by additional sequencing.
Most Short RNAs from Mammalian Intron Termini Derive from Mirtrons
The fact that at least three cloned mirtron loci have been highly conserved during mammalian evolution is evidence that vertebrate mirtrons can have regulatory functions subject to stringent constraint. Still, because mammalian mirtrons had not been reported in previous sequencing efforts, we asked whether some of these sequences might trivially represent intron degradation products rather than bona fide regulatory RNAs. This concern applies especially to some members of our tentative candidate set. However, several lines of evidence argue against degradation being a major explanation.
First, our libraries were constructed to select for 5′ phosphates and therefore against degradation products. Second, the size bias for 21–24 nt RNAs and multiple instances of cloned miRNA/miRNA* pairs were indicative of Dicer cleavage. Third, the number of mirtron clones recovered was not strictly proportional to the number of host ESTs (Figure S7). Abundant mirtrons such as mir-877 had many host ESTs, as might be expected if intronic small RNAs are coexpressed with their hosts (Baskerville and Bartel, 2005). In contrast, mir-1225, which has been highly conserved over mammalian evolution and was cloned cross-species, had relatively few small RNA clones compared with its host ESTs (i.e., it was underrepresented). Conversely, mir-1224, another very highly conserved locus cloned cross-species, had a number of reads similar to that of mir-877 but many fewer host ESTs (i.e., it was overrepresented). The lack of a strict correlation argues that mirtronic RNAs are not recovered simply as degradation byproducts of the splicing of abundant mRNAs. Instead, it is consistent with the notion that the half-life of mirtronic small RNAs is influenced by their association with effector complexes and thus may differ from that of their host mRNAs.
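One way to make "overrepresented" and "underrepresented" concrete is to compare each locus's reads with the number expected if reads simply tracked host EST counts. This is an illustrative sketch, not the analysis behind Figure S7; the function name and the default pseudocount of 0.5 are assumptions, and the counts in the example are hypothetical.

```python
import math


def log2_representation(counts, pseudocount=0.5):
    """log2(observed / expected) small RNA reads for each locus.

    counts maps locus -> (small_rna_reads, host_ests). Expected reads assume
    reads are strictly proportional to host EST counts, i.e. the null model in
    which mirtron reads simply track host expression. Positive values mean
    overrepresentation, negative values underrepresentation. The pseudocount
    avoids log(0) for loci with no reads or no host ESTs.
    """
    total_reads = sum(reads for reads, _ in counts.values())
    total_ests = sum(ests for _, ests in counts.values())
    if total_ests == 0:
        raise ValueError("at least one host EST is required")
    scores = {}
    for locus, (reads, ests) in counts.items():
        expected = total_reads * ests / total_ests
        scores[locus] = math.log2((reads + pseudocount) / (expected + pseudocount))
    return scores


# Hypothetical counts, not data from this study.
example = {"locus_A": (30, 10), "locus_B": (10, 30)}
result = log2_representation(example, pseudocount=0.0)
print(result)
```

Under this null model, both example loci deviate from strict proportionality by the same factor (threefold) in opposite directions.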
We probed this further by comparing, across 100 nt length increments, the number of annotated human and macaque introns with the number of human or macaque reads corresponding to the 5′ or 3′ termini of introns (“boundary reads”). We found that short introns (1–100 nt and, to a lesser extent, 101–200 nt) were highly enriched for boundary reads. In particular, 138 short human introns 1–200 nt in length generated 55% of all boundary reads, while the remaining reads derived from 251 other loci. This represented a 2.26-fold enrichment in reads per cloned locus for short introns relative to introns of other sizes. Moreover, because short introns comprise only 16% of all introns, this represented a 7.7-fold enrichment in reads per short intron versus all other introns. Analysis of macaque produced a similar picture: short introns generated 60.3% of all boundary reads, yielding a 2.51-fold enrichment when normalized as reads per cloned locus and a 6.37-fold enrichment when normalized for the number of short introns. We also observed that in both human and macaque, ~60% of all boundary reads from short introns derive from our officially annotated or candidate mirtron loci. Therefore, cloned intron boundary RNAs are preferentially associated with short hairpin introns.
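The per-locus normalization can be approximately reproduced from the human numbers given above. Assuming enrichment is defined as reads per cloned short-intron locus divided by reads per other cloned locus (an interpretation of the text, not a stated formula), the rounded 55% figure gives about 2.22; the small difference from the reported 2.26 presumably reflects unrounded read counts, which are not given here. The function name is hypothetical.

```python
def per_locus_enrichment(frac_reads_short, n_short_loci, n_other_loci):
    """Reads per cloned short-intron locus divided by reads per other cloned locus."""
    reads_per_short_locus = frac_reads_short / n_short_loci
    reads_per_other_locus = (1.0 - frac_reads_short) / n_other_loci
    return reads_per_short_locus / reads_per_other_locus


# Human figures quoted in the text: 138 short-intron loci gave 55% of boundary
# reads, and the remaining 45% came from 251 other loci.
print(round(per_locus_enrichment(0.55, 138, 251), 2))  # 2.22
```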
Short Hairpin Introns Are the Predominant Source of Cloned Intron-Terminal Small RNAs in Diverse Mammals
Similar trends were evident in chimp and mouse, although the smaller number of mirtronic small RNAs in these species limited our ability to assess enrichment values confidently. Taken together, these data indicate that short introns are significantly biased toward generating cloned small RNAs in different mammals and that the majority of these small RNAs derive from hairpin precursors. We do not claim that all the cloned mirtrons have functional endogenous targets; indeed, many of the tentative candidates could be the result of fortuitous processing. Nevertheless, the cloning, size distribution, evolutionary properties, and preferred derivation from short hairpins all support the idea that mirtrons are miRNA-pathway-derived regulatory RNAs in mammals.
Differences between Mammalian and Invertebrate Mirtrons
Our studies reveal that primates have more mirtrons than do worms or flies; thus, mirtrons are a substantial source of regulatory RNAs in mammals. However, mammalian mirtrons exhibit several differences from invertebrate mirtrons, which collectively have implications for the genesis of mirtrons.
3′ versus 5′ miRNA
All invertebrate mirtrons with more than two cloned products generate 3′ dominant miRNAs (Ruby et al., 2007a). In contrast, several of the most highly expressed mammalian mirtrons clearly produce 5′ dominant species, with the 3′ miRNA* species of some hairpins representing only a few percent of clones (e.g., mir-877; see also Figure S4). We note that the 3′ species of 5′ dominant loci are often extremely pyrimidine rich. For example, miR-877* contains 19 consecutive pyrimidines before its terminal AG splice acceptor. This composition is expected for sequences at 3′ intron ends, which are typically pyrimidine rich, but it is at odds with the sequence complexity typical of miRNAs. Therefore, at least in some loci, the 5′ mirtron product rather than its low-complexity 3′ partner is likely to be the functional species.
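A pyrimidine tract like the one described for miR-877* can be measured with a simple scan upstream of the splice acceptor. The function below is a hypothetical helper, and the sequence in the example is invented, since the miR-877* sequence is not reproduced in this section.

```python
def pyrimidines_before_terminal_ag(seq):
    """Count consecutive pyrimidines (C/U, or C/T in DNA) immediately 5' of a
    terminal AG splice acceptor. Returns 0 if the sequence does not end in AG."""
    s = seq.upper().replace("T", "U")
    if not s.endswith("AG"):
        return 0
    count = 0
    # Walk leftward from the base just before the AG until a purine is hit.
    for base in reversed(s[:-2]):
        if base not in "CU":
            break
        count += 1
    return count


# Made-up sequence for illustration (not a real mirtron arm).
print(pyrimidines_before_terminal_ag("GAUCUCCUUCUAG"))  # 9
```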
Importantly, the asymmetry of mammalian mirtron strand selection generally follows the thermodynamic rules proposed for canonical miRNA duplexes (Khvorova et al., 2003; Schwarz et al., 2003), providing further evidence that mirtrons transit the miRNA biogenesis pathway. These analyses are summarized in Figure S8. A curious exception is mir-1226, which preferentially generates a 5′ miRNA even though its 3′ arm was predicted to predominate. Other factors may therefore be able to override thermodynamic strand selection.
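The thermodynamic asymmetry rule can be illustrated with a deliberately crude proxy. This sketch is not the analysis summarized in Figure S8: it replaces duplex-end free energies with a G/C count over the first four nucleotides of each arm (a hypothetical window), assumes those nucleotides are paired, and uses invented arm sequences. A fuller treatment would use nearest-neighbor free energies.

```python
from typing import Optional


def five_prime_gc(strand, window=4):
    """Number of G/C residues among the first `window` nucleotides of a strand."""
    return sum(base in "GC" for base in strand.upper()[:window])


def predicted_dominant_arm(arm5, arm3, window=4) -> Optional[str]:
    """Crude proxy for thermodynamic strand selection.

    The canonical rule favors the strand whose 5' end is less stably paired
    in the duplex. Here stability is approximated only by the G/C count at
    each strand's 5' end. Returns "5p", "3p", or None on a tie.
    """
    gc5 = five_prime_gc(arm5, window)
    gc3 = five_prime_gc(arm3, window)
    if gc5 < gc3:
        return "5p"
    if gc3 < gc5:
        return "3p"
    return None


# Made-up duplex arms for illustration only.
print(predicted_dominant_arm("GUAAUCUGAGG", "CCCGGAUUACU"))  # 5p
```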
5′ nt Identity
The 3′ products of mammalian mirtrons are about equally likely to begin with either pyrimidine, in contrast to the strong 5′ uridine bias of invertebrate mirtrons. Approximately equal numbers of mammalian 3′ mirtron products start with U versus C, regardless of whether the 5′ or 3′ product was dominant (see Figure S4). Curiously, none of the 3′ mirtron species (cloned from 17 different loci) begins with an A or G, indicating a strong bias against 3′ mirtron products beginning with a purine, even where the 3′ arm is not the dominant species (see Figure S4). Animal mirtrons are nonetheless united in that no cloned 3′ mirtron product from flies, worms, or mammals has thus far been found to begin with a G. Animal miRNAs are generally, but not exclusively (Figure S9), biased against 5′ G residues. Because 5′ mirtron products begin with the G of the GU splice donor, their selection as miRNAs in mammals is noteworthy.
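Tallies of first-nucleotide identity like those above amount to counting the 5′-most base of each cloned read. A minimal sketch, assuming reads are plain strings already assigned to the 3′ arm; the function names are hypothetical and the reads shown are invented.

```python
from collections import Counter


def first_nucleotide_counts(reads):
    """Tally the 5'-most nucleotide of each read (DNA T is treated as U)."""
    return Counter(read.upper().replace("T", "U")[0] for read in reads if read)


def purine_fraction(counts):
    """Fraction of reads whose first nucleotide is a purine (A or G)."""
    total = sum(counts.values())
    return (counts["A"] + counts["G"]) / total if total else 0.0


# Hypothetical 3' mirtron reads, invented for illustration.
reads_3p = ["UCCAGCUGA", "CUUGCAGAU", "ucgaccuag", "TAGCCUGAA"]
counts = first_nucleotide_counts(reads_3p)
print(counts)                   # Counter({'U': 3, 'C': 1})
print(purine_fraction(counts))  # 0.0
```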
Sequence and Structural Features of Mammalian and Invertebrate Mirtrons
Hairpin End Structure
None of the most highly cloned mammalian mirtrons exhibits a stem structure with a precise AG 3′ overhang, as is typical of highly expressed Drosophila and nematode mirtrons. In fact, of the 19 confidently annotated mammalian mirtrons, only three had precise AG overhangs adjacent to a terminal duplex. Instead, the most frequent configuration was a single nucleotide overhang at each end (seven loci, Figure S4), in which the U of the GU splice donor pairs with the A of the AG splice acceptor. The distinct preferred end configurations of mammalian and invertebrate mirtrons were evident from their sequence logos. An unusual configuration, a 3 nt 5′ overhang combined with a 2 nt 3′ overhang, also seemed to be compatible with efficient processing of mammalian mirtrons (e.g., mir-1226). At the loop end, however, the miR-1226/miR-1226* duplex exhibits a 2 nt 3′ overhang, as expected for Dicer cleavage of this otherwise atypical hairpin.
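The end configurations compared here (one unpaired nucleotide at each end, a 2 nt 3′ overhang, or a 3 nt 5′ plus 2 nt 3′ overhang) can be read directly off a dot-bracket fold. The helper below is a hypothetical sketch that reports overhang lengths at the outermost base pair; it assumes a single hairpin and does not check the identity of the overhanging nucleotides (such as an AG acceptor). The structures in the example are toy strings.

```python
def end_overhangs(structure):
    """Return (5' overhang, 3' overhang) lengths at the open end of a hairpin.

    The outermost base pair is the first '(' and its matching ')'; unpaired
    nucleotides outside that pair form the overhangs. Input is a balanced
    dot-bracket string such as RNAfold produces.
    """
    stack, partner = [], {}
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket structure")
            partner[stack.pop()] = i
    if stack:
        raise ValueError("unbalanced dot-bracket structure")
    first = structure.find("(")
    if first == -1:
        raise ValueError("structure contains no base pairs")
    return first, len(structure) - 1 - partner[first]


# Toy structures for the end configurations discussed in the text.
print(end_overhangs(".((((....))))."))     # (1, 1): 1 nt at each end
print(end_overhangs("((((....)))).."))     # (0, 2): 2 nt 3' overhang
print(end_overhangs("...((((....)))).."))  # (3, 2): 3 nt 5' plus 2 nt 3'
```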
These observations appear to extend the potential range of endogenous Dicer substrates, which previously consisted mostly of Drosha products (pre-miRNA hairpins), Drosha mimics (mirtrons), or other Dicer products, all of which exhibit signature 2 nt 3′ overhangs. Still, our presumption that mammalian mirtrons require the canonical pre-miRNA export machinery, as shown for Drosophila mirtrons (Okamura et al., 2007), led us to investigate the structural constraints on pre-miRNA hairpin ends. We analyzed all miRBase miRNAs with annotated miRNA* species and calculated their hairpin end structures. With the caveat that the ends of some miRNA* species might be incorrectly annotated, this analysis showed that a number of deduced pre-miRNA hairpins are not predicted to have perfect 2 nt 3′ overhangs (Figure S10). Therefore, Exportin-5 may accept a broader range of small RNA hairpins than is often considered. Indeed, gel-shift analyses support the ability of Exportin-5 to bind certain hairpins with noncanonical ends (Zeng and Cullen, 2004). Alternatively, other factors might participate in the export of both canonical pre-miRNAs and mirtrons.
Mammalian mirtrons exhibited much higher GC content, and correspondingly lower (more stable) folding free energy, than either invertebrate mirtrons or bulk human short introns. Comparison of the 18 invertebrate mirtrons with the 29,120 D. melanogaster introns 50–120 nt in length showed that their GC content was similar to that of bulk D. melanogaster short introns. In contrast, comparison of the 19 cloned primate mirtrons with all 13,453 human introns 50–120 nt in length showed that mammalian mirtrons are significantly enriched for high GC content relative to bulk human short introns. These findings held when the miRNA/miRNA* portions of mirtrons were compared with matched lengths of the 5′ and 3′ termini of short introns. In addition, the GC content of mammalian mirtrons was much higher than that of canonical human miRNAs or invertebrate miRNAs. It is conceivable that these characteristics compensate in some way for the fact that, in terms of hairpin end structure, mammalian mirtrons are frequently suboptimal mimics of Drosha products.
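A comparison of GC content between two sets of sequences can be sketched as below. The statistical test used in the study is not specified in this section; a one-sided permutation test on mean GC fraction is swapped in here purely for illustration, and the function names, seed, and example values are hypothetical.

```python
import random
from statistics import mean


def gc_content(seq):
    """Fraction of G/C residues in a nucleotide sequence."""
    if not seq:
        raise ValueError("empty sequence")
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation p-value that mean(group_a) exceeds mean(group_b).

    Group labels are shuffled; the p-value is the fraction of shuffles whose
    mean difference is at least the observed one, with a +1 correction so the
    estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Invented GC fractions, for illustration only.
mirtron_like = [0.70, 0.72, 0.75, 0.80]
bulk_short_introns = [0.30, 0.35, 0.40, 0.38, 0.33, 0.36]
p = permutation_p_value(mirtron_like, bulk_short_introns, n_permutations=2000)
print(f"p = {p:.4f}")
```

In practice, the GC fractions fed to the test would come from applying `gc_content` to each intron or mirtron sequence.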
On the Evolutionary Emergence of Mirtrons and the Effect of Mirtrons on Evolution
The many differences between plant and animal miRNAs have been taken to indicate convergent evolution of miRNA pathways among divergent eukaryotes that share an ancestral RNA interference pathway. Similarly, the many distinctions between mammalian and invertebrate mirtrons might reflect independent acquisition of mirtron pathways in different animal clades. Consistent with this, although several mirtrons are highly conserved among Drosophilids (Okamura et al., 2007; Ruby et al., 2007a), nematodes (Ruby et al., 2007a), and mammals (this work), these lineages do not share any mirtrons that are clearly related by ancestry. This does not exclude a model in which mirtrons facilitated the evolution of a canonical animal miRNA pathway before the evolution of a Drosha-type activity (Ruby et al., 2007a). In this scenario, however, one must posit either that none of these ancient mirtrons evolved substantial functions, so that all were lost during evolution, or that all of them accumulated so many sequence changes that their ancestry is no longer apparent from sequence alignment. These scenarios are not easily reconciled with the fact that highly conserved mirtrons have subsequently emerged in three different animal lineages, nor with the fact that many canonical miRNAs have been retained completely unchanged since the bilaterian ancestor of invertebrates and vertebrates (Prochnik et al., 2007).
Our findings also do not clearly support a model in which mirtrons arise in genomes strictly in proportion to the fraction of short introns whose size is comparable to that of pre-miRNA hairpins (Ruby et al., 2007a). The current evidence shows that primate brains express a greater number of mirtrons than do flies and worms combined, even though these invertebrates have more short introns (Lim and Burge, 2001; Yandell et al., 2006). In addition, because mammalian mirtrons have very high GC content relative to bulk mammalian short introns, they evidently are not a random sample of mammalian short introns. Indeed, the differences in sequence composition and structure between mammalian mirtron hairpins and pre-miRNA hairpins further suggest that mammalian mirtrons are not simply pre-miRNA mimics, as appears to be the case for their invertebrate counterparts.
Overall, the observation of cloned products from many newly evolved mirtrons in diverse animal species suggests that mirtrons might represent an evolutionarily opportunistic and facile strategy for the birth of regulatory RNAs in animal species with a preexisting canonical miRNA pathway. This is conceptually similar to the notion that animals and plants may have evolved miRNA genes independently, each building its pathway on an ancestral RNA interference pathway. The fact that a majority of D. melanogaster mirtrons arose quite recently during Drosophilid evolution, combined with the observation that miRNAs have relatively minimal requirements for target recognition, suggested that mirtrons could have an appreciable effect on insect speciation. Our parallel observation that primates, and specifically primate brains, express a great diversity of processed mirtrons similarly suggests that mirtrons might also contribute to primate evolution and/or primate-specific behavior.