MicroRNAs (miRNAs) derive from endogenous transcripts that contain intramolecular double-stranded RNA (dsRNA) and are processed by Dicer. Their mature products are ~21-24 nucleotides in length, and they collectively regulate a broad network of endogenous transcripts. A subset of animal miRNAs are produced from mirtrons, short hairpin introns whose splicing bypasses the normal nuclear processing of canonical miRNAs. Recent studies revealed novel, extended intramolecular dsRNAs produced by defined transcription units in flies and mammals, termed hairpin RNAs (hpRNAs). Detailed biogenesis studies in Drosophila showed that hpRNAs are not merely “long” miRNAs, but are instead processed by a distinct biogenesis pathway related to the canonical RNA interference pathway. In this review we compare and contrast the miRNA and hpRNA pathways, and describe some of the key questions raised by the recognition of this novel pathway.
Three well-conserved classes of small RNAs associate with Argonaute (Ago) proteins in animals.1 Small interfering RNAs (siRNAs) are ~21 nt, 3' methylated RNAs that reside in a “Slicer” Ago protein specialized for target cleavage. These species mediate classically defined RNA interference (RNAi), in which exogenous long double-stranded RNA is processed into siRNAs that cleave perfectly complementary transcripts.2,3 MicroRNAs (miRNAs) are endogenous ~22 nt RNAs with 2', 3' hydroxy termini that derive from endogenous transcripts bearing inverted repeats.4-8 Most miRNAs reside in Ago-class proteins that have modest or no cleavage activity; instead, they exert their main effects via translational inhibition and/or target deadenylation.9 miRNAs collectively regulate thousands of endogenous target transcripts via as little as 7 nt of “seed” pairing with nucleotides 2-8 of the miRNA. Piwi-interacting RNAs (piRNAs) are ~24-30 nt, 3' methylated RNAs that reside in Piwi-class Argonautes in flies and vertebrates, whose tested members are all Slicers.10 piRNAs are found predominantly in the germline, and one of their essential functions is to restrict the transposition of selfish genetic elements.
Until recently, the dogma was that siRNAs and piRNAs are produced from intermolecular double-stranded RNA (dsRNA), whereas miRNAs are produced from intramolecular dsRNA. In particular, siRNAs are produced by Dicer-mediated cleavage of perfectly double-stranded RNA;11 a specialized nematode pathway also generates secondary siRNAs via unprimed synthesis off the 3' end of cleaved transcripts by an RNA-dependent RNA polymerase.12,13 piRNA production involves a feed-forward loop in which piRNAs anneal to complementary transposon-related transcripts.14-16 “Antisense” piRNAs derived from piRNA master transcripts guide Piwi-class complexes to cleave active transposon transcripts and generate “sense” piRNAs; these reciprocally generate more antisense piRNAs through the cleavage of piRNA master transcripts. (Note that the biogenesis mechanism for the abundant non-transposon-derived piRNAs present in pachytene-stage mammalian spermatocytes is unclear. Their generation involves long single-stranded precursors, but their antisense counterparts, if any, remain to be found.) In contrast, miRNAs are produced from intramolecular dsRNA in the form of an inverted repeat transcript. As such, miRNAs were generally believed to be the only class of Argonaute-bound small RNA that could be generated by conventional transcription units without the need for a trans-annealing mechanism (i.e., via complementary transcripts) or a copying mechanism (i.e., via an RNA-dependent RNA polymerase that could synthesize a complementary strand).
Recently, a new pathway was found to generate siRNAs from long inverted-repeat transcripts, termed hairpin RNAs (hpRNAs), in flies17-19 and mice.20-21 Best characterized in Drosophila, these longer cousins of miRNA transcripts are processed by a distinct pathway that includes many canonical RNAi factors but also a canonical miRNA factor. The discovery of this pathway blurs the distinction between the miRNA and RNAi pathways, and raises new questions about the genesis and function of inverted-repeat RNA genes in animals.
Animal miRNAs are processed from short hairpins termed pre-miRNAs, which are themselves the products of much longer precursors known as pri-miRNAs.22 The nuclear RNase III enzyme Drosha is responsible for “cropping” pre-miRNA hairpins out of pri-miRNAs (Figs. 1 and 2A).23 Most pri-miRNAs are non-coding transcripts, and the pre-miRNA can derive from either the introns or exons of such transcripts. However, about a third of pre-miRNAs are located in the introns of protein-coding genes.24
A subset of miRNAs derive from short hairpin introns termed “mirtrons” (Fig. 1).25,26 The splicing machinery directly defines the mirtron ends, and following their linearization by lariat debranching enzyme they adopt hairpin structures typical of pre-miRNAs (Fig. 2B). Mirtrons constitute 5-10% of miRNA genes in invertebrates25,26 and vertebrates,27,28 but as typical mirtrons are expressed at much lower levels than typical canonical miRNAs, their regulatory influence is presumed to be comparatively narrower.
Pre-miRNA hairpins produced from canonical precursors and mirtrons are both exported from the nucleus via Exportin-5. In the cytoplasm, pre-miRNAs are cleaved by the RNase III enzyme Dicer to yield a ~21-22 nt duplex. One strand of the duplex, termed the miRNA, is preferentially selected for entry into a silencing complex that includes an Argonaute (AGO) protein.29,30 The miRNA guides the AGO complex to complementary sites on target transcripts, which are mostly (although not exclusively) located in 3' UTRs. The partner strand of the duplex, known as the miRNA* species, is preferentially degraded; however, it was recently recognized that many miRNA* species can also access AGO complexes and regulate targets.31
If one introduces an artificial target that is perfectly complementary to an endogenous miRNA, it will usually be readily sliced at a position that lies opposite nucleotides 10 and 11, measured from the 5' end of the miRNA,32 just as mRNAs perfectly complementary to artificial siRNAs are.33,34 In Drosophila, miRNAs are preferentially sorted into the poor slicer AGO1, with a small portion of miRNAs sorted to the main Slicer AGO2.17,19,35-37 It is presently unclear what endogenous regulatory purpose the slicing capability of miRNAs serves, since with few exceptions,38,39 animal miRNAs pair imperfectly to endogenous mRNAs. In fact, as little as 7 nucleotides of Watson-Crick base pairing to positions 2-8 from their 5' ends (the miRNA “seed”) can be sufficient to direct substantial repression.40-43 Transcripts bearing highly conserved seed matches are inferred to maintain functional regulatory interactions with cognate miRNAs, and comprise at least 30% of mammalian transcripts.44 Additional functional miRNA targets may have imperfect seed matching40,45,46 or be poorly conserved.47-49 The mechanism(s) by which imperfectly paired targets are repressed by miRNAs are controversial at present, with experimental support for many models published in recent years. These include sequestration from ribosomes (by relocalization into P bodies), blockage of translational initiation, translational repression after initiation, co-translational degradation, and target deadenylation coupled to transcript degradation.9
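The seed and slicing rules above are simple enough to state in code. The Python sketch below is an illustration under stated assumptions, not a target-prediction method: it scans a 3' UTR for 7 nt sites complementary to miRNA nucleotides 2-8 and, for a perfectly complementary site, reports where the target would be cut (between the target nucleotides opposite miRNA positions 10 and 11). The function names, the toy UTR and the restriction to Watson-Crick pairs (no G:U wobble, no bulges) are assumptions of this sketch; the let-7a sequence is used only as a familiar example input.

```python
from typing import List, Optional

# Watson-Crick complements for RNA; G:U wobble is deliberately ignored (assumption).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the RNA reverse complement of seq (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def seed_site(mirna: str) -> str:
    """Target sequence (5'->3') that pairs with miRNA nucleotides 2-8."""
    return reverse_complement(mirna[1:8])


def find_seed_matches(mirna: str, utr: str) -> List[int]:
    """Return 0-based start positions of every 7 nt seed match in utr."""
    site = seed_site(mirna)
    utr = utr.upper()
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


def slicing_position(mirna: str, target: str) -> Optional[int]:
    """For a perfectly complementary site, return i such that the target is cut
    into target[:i] and target[i:], i.e. between the target nucleotides paired
    to miRNA nucleotides 11 and 10. Returns None if no perfect site exists."""
    start = target.upper().find(reverse_complement(mirna))
    if start == -1:
        return None
    # Site index j pairs with miRNA nucleotide len(mirna) - j (1-based), so
    # miRNA nucleotide 10 pairs with site index len(mirna) - 10.
    return start + len(mirna) - 10


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"  # let-7a, used only as an example input
print(seed_site(LET7A))                                     # CUACCUC
print(find_seed_matches(LET7A, "AAACUACCUCAAAGGCUACCUCA"))  # [3, 15] (toy UTR)
```

Real target predictors also score site types (e.g. flanking matches), site context and conservation; this sketch only captures the core seed rule stated in the text.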
Although a few miRNAs were found purely by genetic means, most were found by cloning and/or by computational means. In the cloning method, short RNAs are prepared either by size fractionation by polyacrylamide gel electrophoresis,4,6,7 or by co-immunoprecipitation with Argonaute-containing complexes.50-52 Following reverse transcription and sequencing, one attempts to map cloned ~21-24 nt RNAs to a local genomic inverted repeat. In the computational method, one identifies evolutionarily conserved hairpins that exhibit characteristic patterns of nucleotide divergence, namely candidates for which the terminal loop evolves more quickly than either hairpin arm.53,54 Consideration of several structural and/or sequence details can further enrich for likely miRNAs, although the improvement in specificity always comes at some cost in sensitivity, since some bona fide miRNA hairpins inevitably fail to pass certain criteria.55-59
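The loop-versus-arm divergence criterion can be sketched as follows. This minimal Python version assumes a gapless, equal-length alignment of two orthologous hairpins and user-supplied (hypothetical) coordinates for the 5' arm, terminal loop and 3' arm; published pipelines work from multiple alignments and combine this signal with folding and other criteria.

```python
from typing import Tuple

Region = Tuple[int, int]  # half-open [start, end) in alignment coordinates


def divergence(seq_a: str, seq_b: str, region: Region) -> float:
    """Fraction of aligned positions within region where the sequences differ."""
    start, end = region
    a, b = seq_a[start:end], seq_b[start:end]
    if not a:
        raise ValueError("region is empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def loop_evolves_faster(seq_a: str, seq_b: str,
                        arm5: Region, loop: Region, arm3: Region) -> bool:
    """True if the terminal loop has diverged more than either hairpin arm."""
    if len(seq_a) != len(seq_b):
        raise ValueError("expected a gapless alignment of equal-length sequences")
    d5 = divergence(seq_a, seq_b, arm5)
    d_loop = divergence(seq_a, seq_b, loop)
    d3 = divergence(seq_a, seq_b, arm3)
    return d_loop > d5 and d_loop > d3


# Toy orthologous hairpins (hypothetical): arms identical, loop diverged.
print(loop_evolves_faster("AAAAACCCCCUUUUU", "AAAAAGGGCCUUUUU",
                          (0, 5), (5, 10), (10, 15)))  # True
```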
Current computational approaches exploit the powerful comparative analysis now possible with tens of assembled animal genomes, and cloning approaches benefit tremendously from the capacity of next-generation sequencing strategies that produce millions of short RNA sequences in a single run. These combined approaches have now yielded many thousands of miRNAs.60 Does this depth of miRNA analysis mean that we have a good handle on what a miRNA gene looks like? In fact, we do not. In contrast to protein-coding genefinders, there is presently no effective method to perform miRNA genefinding across a complex genome without taking comparative genomics into account. Another serious issue is that most computational efforts depend on a fairly narrow set of criteria based on the initial set of cloned miRNAs.61 For instance, a recent study used deep-sequencing data to confidently identify some unexpectedly long miRNA hairpins in Drosophila that were twice the length of typical pre-miRNAs,58 a range that is all but ignored by most miRNA annotators. In addition, the recent discovery that miRNAs can derive from mirtrons, whose hairpins are too short to qualify as canonical miRNAs,25-27 further suggests that miRNA genes of atypical structure may remain to be identified.
A few years ago, transgenic RNAi was induced in flies using inverted repeat snapback constructs,62-64 although these were difficult to clone and the knockdowns were in some cases variable. It was subsequently realized that the inclusion of a spacer between the hairpin stems65 could improve the stability of such inverted repeats in bacteria, and potentially in transgenic animals,66 although the knockdowns remained variable in some cases. A third innovation was the inclusion of an intron67 or the incorporation of genomic fragments spanning introns;68 these were suggested to enhance knockdown activity. Interestingly, an early study aimed at annotating putative non-coding mRNA-like (pncr) molecules in Drosophila identified a spliced, polyadenylated transcript named pncr009.69 Although this was not reported at the time, in some respects it closely resembles the design of typical RNAi transgenes: it has ~400 bp of stem and a long, spliced terminal loop. The only difference is that its stem is imperfectly paired, akin to a long miRNA hairpin.
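To make the design of an inverted-repeat “snapback” transgene concrete, the hypothetical helpers below assemble a stem, an optional spacer and the reverse complement of the stem, then check that the two arms are complementary. This is only a sequence-level sketch assuming a DNA alphabet; it does not predict folding, and in practice the spacer may be an intron or other sequence chosen to stabilize the construct.

```python
# Hypothetical helpers for illustration; DNA alphabet assumed.
DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of seq."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def build_snapback(stem: str, spacer: str = "") -> str:
    """Concatenate stem, an optional spacer and the reverse complement of stem,
    so that the transcribed RNA can fold back into a hairpin."""
    return stem + spacer + reverse_complement(stem)


def arms_pair(construct: str, stem_len: int) -> bool:
    """Check that the last stem_len bases are the reverse complement of the first."""
    return construct[-stem_len:] == reverse_complement(construct[:stem_len])


construct = build_snapback("ATGGCT", spacer="AAAAAA")  # toy stem and spacer
print(construct)                # ATGGCTAAAAAAAGCCAT
print(arms_pair(construct, 6))  # True
```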
Computational methods can be employed to identify other potential examples of long inverted-repeat transcripts. This strategy was used effectively in Arabidopsis to identify miRNA genes.70,71 However, in the absence of supporting experimental evidence, lists of genomic inverted repeats are of limited utility. Thousands of long inverted repeats are easily found, but they are not necessarily contained within a defined transcription unit. The vast majority involve the long terminal repeats of transposons, and a substantial number of the remaining loci comprise tandemly duplicated tRNA or protein-coding genes arranged in convergent or divergent orientation.18 However, a handful of these are contained within sequenced, spliced cDNAs, indicating that they may also belong to the pncr009 class. In fact, several of them are clustered in the genome with pncr009.18
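A naive version of the genomic search for long inverted repeats can be written with a k-mer index: each k-mer is paired with downstream occurrences of its reverse complement, and consecutive hits on the same diagonal are merged into one stem. The Python sketch below assumes perfectly matched stems and arbitrary default thresholds (k = 8, a 20 bp minimum stem, a 5 kb maximum span); because hpRNA stems are imperfectly paired, a practical search would need an alignment step that tolerates mismatches and bulges.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
Arm = Tuple[int, int]  # half-open [start, end) coordinates


def reverse_complement(seq: str) -> str:
    return seq.translate(DNA_COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, k: int = 8, min_stem: int = 20,
                          max_span: int = 5000) -> List[Tuple[Arm, Arm]]:
    """Report perfectly matched inverted-repeat stems of at least min_stem bp.

    A k-mer at i hits a k-mer at j if seq[j:j+k] is the reverse complement of
    seq[i:i+k]. Along one stem i rises as j falls, so i + j is constant; runs of
    consecutive i on the same diagonal are merged into a single stem.
    """
    seq = seq.upper()
    index: Dict[str, List[int]] = defaultdict(list)
    for j in range(len(seq) - k + 1):
        index[seq[j:j + k]].append(j)

    diagonals: Dict[int, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        for j in index.get(reverse_complement(seq[i:i + k]), []):
            if i + k <= j <= i + max_span:  # non-overlapping, nearby arms only
                diagonals[i + j].append(i)

    repeats: List[Tuple[Arm, Arm]] = []
    for d, lefts in diagonals.items():
        lefts.sort()
        runs = []
        run_start = prev = lefts[0]
        for i in lefts[1:]:
            if i == prev + 1:
                prev = i
            else:
                runs.append((run_start, prev))
                run_start = prev = i
        runs.append((run_start, prev))
        for a, b in runs:
            if b - a + k >= min_stem:
                repeats.append(((a, b + k), (d - b, d - a + k)))
    return sorted(repeats)


left = "GATTACAGGCTTCAGTACCAGTTGACAATG"  # 30 nt toy stem (hypothetical)
hairpin = left + "A" * 10 + reverse_complement(left)
print(find_inverted_repeats(hairpin))  # [((0, 30), (40, 70))]
```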
Recently, the availability of large catalogs of cloned Drosophila small RNAs allowed lists of genomic inverted repeats to be evaluated rationally. This revealed a limited number of inverted repeats that generated small RNAs and did not map to transposable elements or satellite sequences.18 The possibility that some of these loci were simply degraded into cloned breakdown fragments, as opposed to generating processed small RNAs, was assessed by examining the size distribution of RNAs that mapped to each locus. By demanding that the majority of cloned small RNAs were 21-22 nt in length, relative to all other sizes, this analysis narrowed the candidates to 7 genomic loci that were predicted as long inverted repeats and generated primarily 21-22 nt RNAs (e.g., Fig. 2C); one of the loci (hp-CG4068) consisted of 20 tandem repeats.18 Several of these loci were represented by full-length polyadenylated cDNAs, indicating that they derive from conventional RNA polymerase II transcripts; some of these were also spliced. These were collectively designated as hairpin RNAs (hpRNAs).
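The size-distribution filter can be expressed as a short function. In this hypothetical Python sketch, each candidate locus is represented simply by the lengths of the reads mapped to it, and a locus is kept when 21-22 nt reads make up more than half of all reads; the locus names and read lengths are invented for illustration and are not data from the study.

```python
from collections import Counter
from typing import Dict, Iterable, List


def fraction_21_22(lengths: Iterable[int]) -> float:
    """Fraction of reads that are 21 or 22 nt long (0.0 for a locus with no reads)."""
    counts = Counter(lengths)
    total = sum(counts.values())
    return (counts[21] + counts[22]) / total if total else 0.0


def dicer_like_loci(reads_by_locus: Dict[str, List[int]],
                    min_fraction: float = 0.5) -> List[str]:
    """Loci whose 21-22 nt reads outnumber reads of all other sizes combined."""
    return sorted(locus for locus, lengths in reads_by_locus.items()
                  if fraction_21_22(lengths) > min_fraction)


toy_reads = {  # hypothetical read-length lists, not real data
    "ir_A": [21] * 6 + [22] * 2 + [19, 25],
    "ir_B": [18, 19, 20, 21, 23, 24, 25],
}
print(dicer_like_loci(toy_reads))  # ['ir_A']
```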
Despite the fact that miRNA precursors and hpRNAs are both imperfect inverted repeats, these RNAs are channeled into distinct biogenesis pathways. Similar to exogenous siRNAs generated from artificial substrates, hpRNAs are processed by Dcr-2 instead of Dcr-1, are loaded into AGO2, and are modified at their 3' ends by the Hen1 methyltransferase (Fig. 1).17-19 Like siRNAs from artificial long inverted repeats,72 the siRNA duplexes derived from hpRNAs are also phased (Fig. 2C). Tests of artificial and endogenous substrates demonstrated that hpRNAs could repress and directly cleave targets.17,18 Thus, hpRNAs generate siRNAs. Curiously, however, the accumulation of hpRNA-derived siRNAs requires Loquacious (Loqs), a dsRBD protein that assists Dcr-1 in the cleavage of pre-miRNA hairpins (Fig. 1).1
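Phasing means that the 5' ends of siRNAs from one hairpin arm fall at regular intervals set by successive Dicer cuts. One simple way to quantify this (an assumed measure, not one defined in the text) is to take 5' end coordinates modulo the expected product length and report the most populated register, as in the Python sketch below; the 21 nt default period and the coordinates are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def dominant_register(starts: Iterable[int], period: int = 21) -> Tuple[int, float]:
    """Return (register, fraction): the most common value of start % period and
    the fraction of reads whose 5' ends fall in that register."""
    registers = Counter(s % period for s in starts)
    total = sum(registers.values())
    if total == 0:
        raise ValueError("no reads supplied")
    register, count = registers.most_common(1)[0]
    return register, count / total


# Hypothetical 5' end coordinates of siRNAs mapped to one hairpin arm.
print(dominant_register([100, 121, 142, 163, 101]))  # (16, 0.8)
```

A high fraction in one register suggests phased processing; unphased degradation products would spread roughly evenly across registers.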
In plants, most miRNAs exhibit extended complementarity to one or a few mRNAs. Quite a few have now been validated to direct the cleavage of these targets,70 although it should be noted that plant miRNAs can also mediate translational repression of highly complementary targets.73 miR-196 is a rare example of an animal miRNA that is essentially perfectly complementary to a target (HoxB8) and directs its cleavage.38,39 Several Drosophila hpRNA-derived siRNAs are nearly perfectly complementary to endogenous transcripts. For example, one of the siRNA products of hp-CG4068 is antisense to the coding region of mus308, a DNA polymerase involved in the DNA damage response. This is the best-characterized hpRNA target thus far, having been validated by sensor tests in cultured cells and by cleavage assays that tested both endogenous hp-CG4068 and endogenous mus308.17,18 Another hpRNA, hp-CG18854, is a pseudogene of CG8289, which encodes a chromodomain protein.17,18 The endogenous regulatory activity of hp-CG18854-derived siRNAs was detected in cultured cells, and elevated hp-CG18854 could repress a CG8289-GFP fusion target in trans.18
These observations indicate that hpRNA-derived siRNAs can regulate endogenous transcripts by cleaving them. However, it is unclear whether hpRNA-derived siRNAs can also regulate endogenous transcripts with which they share only limited complementarity. In the one test performed thus far, the mus308-targeting hp-CG4068 siRNA did not appear to substantially repress a seed-matched target.18 However, as Drosophila siRNAs are able to cleave targets that are less than perfectly complementary,74 it is conceivable that additional hpRNA targets exist.
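The distinction between near-perfect and seed-limited pairing can be made explicit by listing the mismatched positions along an aligned small RNA:target duplex. The Python sketch below is hypothetical: it counts only Watson-Crick pairs, ignores bulges and G:U wobble, and its category names and three-mismatch threshold are illustrative choices rather than established cutoffs.

```python
from typing import List

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def mismatch_positions(srna: str, site: str) -> List[int]:
    """1-based small RNA positions not Watson-Crick paired to site, where site is
    the target segment (5'->3') of the same length, aligned antiparallel."""
    if len(srna) != len(site):
        raise ValueError("site must be the same length as the small RNA")
    n = len(srna)
    return [k for k in range(1, n + 1)
            if (srna[k - 1], site[n - k]) not in WATSON_CRICK]


def classify_site(srna: str, site: str, max_mismatches: int = 3) -> str:
    """Hypothetical categories: 'perfect', 'extensive', 'seed-only' or 'other'."""
    mismatches = mismatch_positions(srna, site)
    seed_intact = not any(2 <= k <= 8 for k in mismatches)
    if not mismatches:
        return "perfect"
    if seed_intact and len(mismatches) <= max_mismatches:
        return "extensive"
    if seed_intact:
        return "seed-only"
    return "other"


SIRNA = "UAGCUUACGGAUCCAGUAACG"  # hypothetical 21 nt siRNA
print(classify_site(SIRNA, reverse_complement(SIRNA)))  # perfect
```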
Concurrent studies of mouse oocytes revealed four loci that were designated as hairpin RNAs. In fact, one of the most abundant siRNA loci was an hpRNA comprising a long inverted-repeat pseudogene of the Rangap1 gene,20,21 similar to the case of hp-CG18854:CG8289 in Drosophila. As with Drosophila hpRNAs, the accumulation of siRNAs from hp-Rangap1 (also known as Au76) is Dicer-dependent, and Dicer-mutant oocytes exhibit substantial upregulation of cognate Rangap1 transcripts. It is unclear whether this regulatory strategy is conserved or convergent, but these findings suggest that analogous systems to generate siRNAs from long inverted repeats operate in both Drosophila and mammals.
Curiously, mammalian oocytes exhibit a much more complex network of pseudogene-derived siRNAs that involves at least 20-30 antisense-transcribed pseudogenes and their cognate progenitors. It was proposed that these siRNAs are generated from dsRNA formed by annealed gene:pseudogene complementary pairs, and their regulatory impact was evidenced by the upregulation of many of the siRNA-complementary transcripts in Dicer-mutant oocytes.20,21 Thus, while pseudogenes have generally been assumed to be nonfunctional loci, both intramolecular and intermolecular mechanisms can produce regulatory siRNAs from pseudogenes.
The discovery of the hpRNA pathway raises many questions that will undoubtedly keep small RNA researchers busy for many years. One key question regards how hpRNA and miRNA precursors are segregated. Just as the shortest giants and the tallest dwarves overlap in height, there are “short” hpRNAs18 and “long” miRNAs58 that appear indistinguishable. Yet these are sorted into distinct biogenesis pathways, since long miRNA hairpins still generate only a single small RNA duplex, whereas short hpRNAs generate multiple small RNA duplexes. Because size alone cannot account for this sorting, there is presumably a more sophisticated mechanism that prevents hpRNAs from being processed by Dcr-1 and/or prevents pre-miRNAs from being cleaved by Dcr-2. The nature of either putative sorting mechanism remains to be understood.
More typically, though, hpRNA stem-loops are much longer than miRNA stem-loops. What is the upper limit on hpRNA stem length? Could there be read-through transcripts across duplicated inverted repeat genes that generate hpRNA-like foldbacks many kilobases in length? Might this sort of pathway generate some fraction of transposon-derived siRNAs?17,19,36,75 This might conceivably be akin to the generation of nematode siRNAs from transcriptional readthrough across Tc1 elements, which form intramolecular dsRNA across their terminal inverted repeats.76 Finally, several of the hpRNAs derive from spliced primary transcripts.17,18 Is there any limit to the size of their introns? If not, might there exist hpRNAs whose stems are separated by long introns, perhaps tens of kilobases in length? These questions remain for future studies.
The finding that the canonical miRNA factor Loquacious (Loqs) plays a substantial role in hpRNA processing raises another mystery. Loqs is a dsRBD protein that is usually portrayed as a core miRNA biogenesis factor that aids Dicer-1 in cleaving pre-miRNA hairpins.77-79 However, careful scrutiny of loqs null mutant animals and biochemical reconstitution experiments recently suggested that Loqs has only an auxiliary role.80 Although it is certainly involved in miRNA biogenesis, the maturation of a substantial number of miRNA loci is reasonably normal in the absence of Loqs. Surprisingly, then, Loqs proved to be very strongly required for the accumulation of many hpRNA-derived siRNAs (Fig. 1).17,18 Does Loqs partner directly with Dicer-2 to generate some kinds of siRNAs? Addressing this will likely require in vitro reconstitution of hpRNA processing with defined factors.
The biogenesis of siRNAs from perfectly double-stranded substrates was previously shown to require the dsRBD protein R2D2, which directs Dicer-2 to load siRNAs into AGO2.81,82 R2D2 binds Dicer-2 directly, just as Loqs binds Dicer-1 directly. One might have thought that the imperfect dsRNA found in pre-miRNAs and hpRNAs could explain their shared requirement for Loqs. However, this does not seem to be the case, since several transposon-derived siRNAs and cis-NAT-siRNAs similarly require Loqs for their accumulation.17,75,83 Although not revealed by earlier protein interaction studies, proteomic analysis of Dicer-2 complexes suggests that Dicer-2/Loqs complexes may exist.17 Do these proteins work directly as a team to process endo-siRNAs, and if so, are they required for dicing or for loading of siRNAs? Clearly, there is much to be learned about the functional partnerships of RNase III enzymes and their dsRBD partners.
Perhaps most importantly, what are the endogenous functions of hpRNAs? The best-characterized targets of hpRNAs encode a chromodomain protein (CG8289) and a nuclease (mus308).17,18 The generation and analysis of hpRNA deletion mutants will help to address the endogenous significance of these regulatory interactions. Curiously, the targets of Drosophila cis-natural antisense transcripts that generate siRNAs are also strongly enriched for genes encoding nucleases and transcription cofactors.83 Perhaps these findings point to a shared molecular axis for the Drosophila endo-siRNA network.
K.O. was supported by the Charles Revson Foundation. E.C.L. was supported by the V Foundation for Cancer Research, the Sidney Kimmel Foundation for Cancer Research, the Alfred W. Bressler Scholar's Fund and the National Institutes of Health (GM083300).