Since the discovery in 1993 of the first small silencing RNA, a dizzying number of small RNA classes have been identified, including microRNAs (miRNAs), small interfering RNAs (siRNAs) and Piwi-interacting RNAs (piRNAs). These classes differ in their biogenesis, modes of target regulation and in the biological pathways they regulate. There is a growing realization that, despite their differences, these distinct small RNA pathways are interconnected and that small RNA pathways compete and collaborate as they regulate genes and protect the genome from external and internal threats.
The defining features of small silencing RNAs are their short length (~20–30 nt), and their association with members of the Argonaute (Ago) family of proteins, which they guide to their regulatory targets, typically resulting in reduced expression of target genes. Beyond these defining features, different small RNA classes guide diverse and complex schemes of gene regulation. Some small silencing RNAs, such as siRNAs, derive from double-stranded (ds) RNA, whereas others, such as piRNAs, do not. These different classes of regulatory RNAs also differ in the proteins required for their biogenesis, the constitution of the Argonaute-containing complexes that execute their regulatory functions, their modes of gene regulation, and the biological functions in which they participate. New small RNA classes and new examples of existing classes continue to be discovered, such as the recent identification of endo-siRNAs in flies and mammals. Here, we provide an overview of small silencing RNAs from plants and metazoan animals. For each class, we describe its biogenesis, function, and mode of target regulation, providing examples of the regulatory networks in which each participates (TABLE 1). Finally, we highlight several examples of unexpected, and often unexplained, complexity in the interactions between distinct small RNA pathways.
In 1998, Fire and Mello established dsRNA as the silencing trigger in C. elegans1. Their experiments overturned the contemporary view that antisense RNA induced silencing by base pairing to its mRNA counterpart, thereby preventing its translation into protein. In worms and other animals, siRNA-mediated silencing is known as RNA interference (RNAi). Remarkably, RNAi is systemic in both plants and nematodes, spreading from cell to cell2. In C. elegans, RNAi is also heritable: silencing can be transferred to the progeny of the worm originally injected with the trigger dsRNA3. Viral infection, inverted repeat transgenes, or aberrant transcription products, all lead to the production of dsRNA. dsRNA is converted to siRNAs that direct RNAi. siRNAs were discovered in plants4 and later shown in animal extracts to serve as guides that direct endonucleolytic cleavage of their target RNAs5,6. siRNAs can be classified according to the proteins involved in their biogenesis, their mode of regulation or their size. Here, we differentiate the major types of siRNAs according to the molecules that trigger their production, a classification scheme that best captures the biological distinctions among small silencing RNAs.
Early examples of RNAi were triggered by exogenous dsRNA. In these cases, long, exogenous dsRNA is cleaved into double-stranded siRNAs by Dicer (Dcr), a dsRNA-specific RNase III family ribonuclease7 (FIG. 1). siRNA duplexes produced by Dicer comprise two ~21 nt strands, each bearing a 5′ phosphate and 3′ hydroxyl group, paired so as to leave 3′, two-nucleotide overhangs5,8,9. The strand that directs silencing is called the guide, whereas the other strand, which is ultimately destroyed, is the passenger. Target regulation by siRNAs is mediated by the RNA-induced silencing complex (RISC), the generic name for an Argonaute-small RNA complex6. In addition to an Argonaute protein and a small RNA guide, RISC may also contain auxiliary proteins that extend or modify its function, for example, proteins that re-direct the target mRNA to a site of general mRNA degradation10.
While mammals and C. elegans each have a single Dicer that makes both miRNAs and siRNAs11–14, Drosophila has two Dicers: Dcr-1 makes miRNAs, whereas Dcr-2 is specialized for siRNA production15. The fly RNAi pathway defends against viral infection, and Dicer specialization may reduce competition between pre-miRNAs and viral dsRNAs for Dicer. Alternatively, Dcr-2 and Ago2 specialization might reflect the evolutionary pressure on the siRNA pathway to counter rapidly evolving viral strategies to escape RNAi. In fact, dcr-2 and ago2 are among the most rapidly evolving Drosophila genes16. C. elegans may achieve similar specialization with a single Dicer by using the double-stranded RNA-binding protein, RDE-4, as the gatekeeper for entry into the RNAi pathway17. However, no natural virus infection has been documented in C. elegans18. In contrast, mammals may not use the RNAi pathway to respond to viral infection, having evolved an elaborate, protein-based immune system19–21.
The relative thermodynamic stabilities of the 5′ ends of the two siRNA strands in the duplex determine the identity of the guide and passenger strands22–24. In flies, this thermodynamic difference is sensed by the dsRNA-binding protein R2D2, the partner of Dcr-2 and a component of the RISC Loading Complex (RLC)25,26. The RLC recruits Argonaute2 (Ago2), to which it transfers the siRNA duplex. Ago2 can then cleave the passenger strand as if it were a target RNA27–31. (Ago2 always cleaves its RNA target at the phosphodiester bond that lies between the nucleotides paired to guide nucleotides 10 and 11; Refs. 8,9.) Release of the passenger strand after its cleavage converts pre-RISC to RISC: mature RISC contains only single-stranded guide RNA. In flies, the guide strand is 2′-O-methylated at its 3′ end by the S-adenosyl methionine-dependent methyltransferase, Hen1, completing RISC assembly32,33. In plants, both miRNAs and siRNAs are terminally methylated, which is crucial for their stability34–36.
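The position of the scissile bond follows directly from the rule stated above. The sketch below assumes a target that is perfectly complementary to the guide and counts guide nucleotides from the guide's 5′ end; the function names and the 21 nt guide sequence are hypothetical.

```python
# Sketch: locate the Ago2 cleavage site on a target RNA.
# The scissile bond lies between the target nucleotides paired to guide
# nucleotides 10 and 11 (counted from the guide 5' end).
# Assumptions: perfect guide-target complementarity; illustrative names.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return rna.translate(COMPLEMENT)[::-1]


def cleave_target(guide: str, target: str):
    """Return the (5' fragment, 3' fragment) of the target, or None if no site."""
    site = reverse_complement(guide)
    start = target.find(site)
    if start < 0:
        return None
    # Guide nucleotide k pairs with target index start + len(guide) - k.
    # The first nucleotide of the 3' fragment is the one paired to guide nt 10.
    cut = start + len(guide) - 10
    return target[:cut], target[cut:]


guide = "UUCGAAGUAUUCCGCGUACGU"  # hypothetical 21 nt guide strand
target = "AAAA" + reverse_complement(guide) + "CCCC"
five_prime, three_prime = cleave_target(guide, target)
print(five_prime, three_prime)
```

The same function applies to the passenger strand, which Ago2 cleaves as if it were a target RNA.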
Plants exhibit a surprising diversity of small RNA types and the proteins that generate them. The diversification of RNA silencing pathways in plants may reflect the need of a sessile organism to cope with biotic and abiotic stress. The number of RNA silencing proteins can vary enormously among animals, too, with C. elegans producing 27 distinct Argonaute proteins compared to five in flies. Phylogenetic data suggest that nearly all of these “extra” C. elegans Argonautes act in the secondary siRNA pathway, perhaps because endogenous, secondary siRNAs are so plentiful in worms37. Arabidopsis thaliana has four Dicer-like (DCL) proteins and 10 Argonautes, with both unique and redundant functions. In plants, inverted repeat transgenes or co-expressed sense and antisense transcripts produce two sizes of siRNAs, 21 nt and 24 nt38,39. The 21 nt siRNAs are produced by DCL4, but in the absence of DCL4, DCL2 can substitute, making 22 nt siRNAs40–44. The DCL4-produced 21-mers typically associate with AGO1 and guide mRNA cleavage. The 24-mers associate with AGO4 (major) and AGO6 (surrogate), and promote the formation of repressive chromatin45.
In plants, exogenous sources of siRNAs are not confined to dsRNAs. Single-stranded sense transcripts from tandemly repeated or highly expressed single-copy transgenes are converted to dsRNA by RDR6, a member of the RNA-dependent RNA polymerase (RdRP) family of enzymes, which transcribe single-stranded RNA from an RNA template46 (Box 1). RDR6 and RDR1 also convert viral single-stranded RNA into dsRNA, initiating an anti-viral RNAi response47. The resulting dsRNA is cleaved by Dicer into siRNAs that are terminally 2′-O-methylated by HEN136. Why plant RNAs expressed from transgenes are converted by RDR6 into dsRNA, but abundant, endogenous mRNAs are not, is poorly understood. Recent evidence that some housekeeping exonucleases compete with plant RNA silencing pathways for aberrant RNAs suggests that substandard RNA transcripts (e.g., those lacking a 5′ cap or 3′ poly(A) tail) act as substrates for RdRPs. Highly expressed transgenes might overwhelm normal RNA quality control pathways, escape destruction, and be converted to dsRNA by RdRPs47–49.
RNA-dependent RNA polymerases (RdRPs) amplify the silencing response. Primary siRNAs derived from exogenous triggers by Dicer processing bind their mRNA targets and direct cleavage by AGO complexes203. In plants, RdRPs use these cleaved transcript fragments as templates to synthesize long dsRNA; the dsRNA is then diced into secondary siRNAs41,42,44,46,204–206. Secondary siRNAs are formed both 5′ and 3′ of the primary targeted interval, suggesting that mRNA cleavage per se, rather than priming of RdRP by primary siRNAs, is the signal for siRNA amplification. Other data suggest that production of secondary siRNAs in Arabidopsis may sometimes be primed207. RdRP amplification of siRNAs is especially important in defending plants against viral infection.
In C. elegans, primary siRNAs are amplified into secondary siRNAs by a different mechanism203. In worms, primary siRNAs are bound to RDE-1, a “primary Argonaute”208,209. The primary siRNAs guide RDE-1 to the target mRNA, to which it recruits RdRPs that synthesize secondary siRNAs210,211. Worm secondary siRNAs have a 5′ di- or triphosphate, indicating that they are produced by transcription rather than dicing208,209,212, and, at least in vitro, secondary siRNA production does not require Dicer211. How the length of siRNA transcription is controlled is perplexing, but in vitro, the Neurospora RdRP, QDE1, can directly transcribe short RNA oligomers ~22 nt long from a much longer template212. As a consequence of their production by an RdRP, secondary siRNAs in C. elegans are exclusively antisense to their mRNA targets208,213. Secondary siRNAs act bound to secondary Argonautes, such as CSR-1, which can cleave its mRNA targets just like fly and human Ago2 proteins211.
The presence of siRNA amplification in plants, worms, fungi, and according to some early reports, in flies, led to the speculation that RdRPs are a universal feature of RNAi. An amplification step in human RNAi could produce secondary siRNAs bearing homology to other genes, a significant impediment to the use of RNAi as a target discovery tool or as therapy for human diseases. However, the success of allele-specific RNAi in cultured human cells and in mice makes it unlikely that an RdRP-catalyzed amplification step occurs in mammals214–218. Similarly, extensive biochemical and genetic studies have demonstrated that the fly RNAi pathway does not use an RdRP enzyme5,39,219–222.
The first endo-siRNAs were detected in plants and C. elegans38,50,51. Plants produce a variety of endo-siRNAs, and the recent discovery of endo-siRNAs in flies and mammals suggests that endo-siRNAs are ubiquitous among higher eukaryotes.
In plants, cis-acting siRNAs (casiRNAs) originate from transposons, repetitive elements, and tandem repeats such as 5S rRNA genes, and comprise the bulk of endo-siRNAs44 (FIG. 2). CasiRNAs are predominantly 24 nt long and methylated by HEN1. Their accumulation requires DCL3 and the RNA polymerases, RDR2 and POL IV, and either AGO4 (primarily) or AGO6, which act redundantly44,51–60. CasiRNAs promote heterochromatin formation by directing DNA methylation and histone modification of the loci from which they originate38,44,51,52,61–63.
Another class of plant endo-siRNAs illustrates how distinct small RNA pathways interact. Trans-acting siRNAs (tasiRNAs) are endo-siRNAs generated by the convergence of the miRNA and siRNA pathways in plants64–68(FIG. 2). miRNA-directed cleavage of certain transcripts recruits the RdRP enzyme, RDR6. RDR6 then copies the cleaved transcript into dsRNA, which DCL4 dices into tasiRNAs that are phased. This phasing suggests that DCL4 begins dicing precisely at the miRNA cleavage site, making a tasiRNA every 21 nt68. The site of miRNA cleavage is critical, because in determining the entry point for Dicer, it establishes the target specificity of the tasiRNAs produced. One of the determinants that seems to predispose a transcript to produce tasiRNAs after its cleavage by a miRNA is the presence of a second miRNA or siRNA complementary site on the transcript. Of special mention is the TAS3 locus, whose RNA transcript has two binding sites for miR-390. Only one of these sites is efficiently cleaved by miR-390, but binding of the miRNA to both appears to be required to initiate conversion of the TAS3 transcript to dsRNA by RDR669,70.
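The phasing described above amounts to simple modular arithmetic. The sketch below assumes that dicing proceeds in 21 nt steps from the miRNA cleavage site toward the 3′ end of the transcript, which is a simplification made for illustration; coordinates are 0-based and the function names are hypothetical.

```python
# Sketch: 21 nt phasing of tasiRNAs relative to a miRNA cleavage site.
# Assumptions: dicing runs from the cleavage site toward the 3' end of the
# transcript; coordinates are 0-based, half-open intervals; names are
# illustrative only.


def phased_registers(cleavage_site: int, transcript_length: int, phase: int = 21):
    """List the (start, end) intervals of successive phased siRNAs."""
    registers = []
    start = cleavage_site
    while start + phase <= transcript_length:
        registers.append((start, start + phase))
        start += phase
    return registers


def in_phase(small_rna_start: int, cleavage_site: int, phase: int = 21) -> bool:
    """True if a small RNA begins a whole number of phases from the cleavage site."""
    return (small_rna_start - cleavage_site) % phase == 0


print(phased_registers(100, 170))
print(in_phase(142, 100), in_phase(150, 100))
```

Shifting the cleavage site by even a single nucleotide changes every interval, which is why the site of miRNA cleavage fixes the sequences, and hence the targets, of the tasiRNAs produced.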
Natural antisense transcript-derived siRNAs (natsiRNAs) are produced in response to stress in plants71,72 (FIG. 2). They are generated from a pair of convergently transcribed RNAs: typically, one transcript is expressed constitutively, whereas the complementary RNA is transcribed only when the plant is subject to environmental stress, such as high salt. Production of 24 nt siRNAs from the region of overlap of the two transcripts requires DCL2 and/or DCL1, RDR6, and SGS3 (SUPPRESSOR OF GENE SILENCING3, probably an RNA-binding protein)73 and Pol IV71,72. The 24 nt natsiRNAs then direct cleavage of one of the mRNAs of the pair, and in one such case, trigger the DCL1-dependent production of 21 nt secondary siRNAs72. In addition to natsiRNAs, “long” siRNAs (lsiRNAs) in Arabidopsis also originate from NAT pairs and are stress-induced. Unlike natsiRNAs, lsiRNAs are 30–40 nt long and require DCL1, DCL4, AGO7, RDR6 and POL IV for their production74.
Plant and worm endo-siRNAs are typically produced through the action of RdRPs (Box 1). The genomes of flies and mammals seem not to encode such RdRP proteins, so the recent discovery of endo-siRNAs in flies and mice was unexpected.
The first mammalian endo-siRNAs to be reported corresponded to the LINE-1 retrotransposon and were detected in cultured human cells75. Full length LINE-1 (L1) contains both sense and antisense promoters in its 5′ untranslated region (UTR) that could, in principle, drive bi-directional transcription of L1, producing overlapping, complementary transcripts to be processed into siRNAs by Dicer, but the precise mechanism by which transposons trigger siRNA production in mammals remains unknown.
More recently, endogenous siRNAs have been detected in Drosophila somatic and germ cells and in mouse oocytes. High throughput sequencing of small RNAs from germ-line and somatic tissues of Drosophila and of Ago2 immunoprecipitates revealed a small RNA population that could readily be distinguished from miRNAs and piRNAs76–81. These small RNAs are nearly always exactly 21 nt long, are present in both sense and antisense orientations, have modified 3′ ends, and, unlike miRNAs and piRNAs, are not biased toward beginning with uracil. Production of the 21-mers requires Dcr-2, although in the absence of Dcr-2 a remnant of the endo-siRNA population inexplicably persists.
Fly endo-siRNAs derive from transposons, heterochromatic sequences, intergenic regions, long RNA transcripts with extensive structure, and, most interestingly, from mRNAs (FIG. 3). Expression of transposon mRNAs increases in both dcr-2 and ago2 mutants, implicating an endogenous RNAi pathway in the silencing of transposons in flies, as reported previously for C. elegans82,83. siRNAs derived from mRNAs are >10 times more likely to come from regions predicted to produce overlapping, convergent transcripts than expected by chance76, suggesting that endo-siRNAs originate from endogenous dsRNA formed when these complementary transcripts pair.
A subset of fly endo-siRNAs derive from “structured loci” whose RNA transcripts can fold into long, intramolecularly paired hairpins77–79. Accumulation of these siRNAs requires Dcr-2 and the dsRNA-binding protein Loquacious (Loqs)—typically considered the partner of Dcr-1, the dicer that produces miRNA—rather than R2D284, the usual partner of Dcr-2 (FIG. 1). While surprising, a role for Loqs in the biogenesis of endo-siRNAs from structured loci was anticipated by the earlier finding that Loqs plays a role in the production of siRNAs from transgenes designed to produce long, intramolecularly paired inverted repeat transcripts so as to trigger RNAi in flies85.
Endo-siRNAs have also been identified in mouse oocytes86,87. As in flies, mouse endo-siRNAs are 21 nt long, Dicer-dependent, and derived from a variety of genomic sources (FIG. 3). The mouse endo-siRNAs were bound to Ago2, the sole mammalian Ago protein thought to mediate target cleavage, although it is not known if they also associate with any of the other three mouse Ago proteins. (Mammalian Ago2 is not, however, the ortholog of fly Ago2, whose sequence is considerably diverged from other Ago proteins.)
A subset of mouse oocyte endo-siRNAs maps to regions of protein-coding genes capable of pairing to their cognate pseudogenes and to regions of pseudogenes capable of forming inverted-repeat structures (FIG. 3). Pseudogenes can no longer encode proteins, yet they drift from their ancestral sequence more slowly than would be expected if they were simply junk. Perhaps some pseudogene sequences are under evolutionary selection to retain the ability to produce antisense transcripts that can pair with their cognate genes so as to produce endo-siRNAs88.
A key challenge for the future will be to understand the biological function of endo-siRNAs, especially those that can pair with protein-coding mRNAs. Do they regulate mRNA expression? Can endo-siRNAs act like miRNAs, tuning the expression of large numbers of genes?
The first microRNA, lin-4, was identified in a screen for genes required for post-embryonic development in C. elegans89,90. The lin-4 locus produces a 22 nt RNA that is partially complementary to sequences in the 3′ UTR of its regulatory target, the lin-14 mRNA89,91. miRNA binding to partially complementary sites in mRNA 3′ UTRs is now considered a hallmark of animal miRNA regulation. In 2001, tens of miRNAs were identified in humans, flies, and worms by small RNA cloning and sequencing, establishing miRNAs as a new class of small silencing RNAs92–94. miRBase (Release 12.0), the registry that coordinates miRNA naming, now lists 1,638 distinct miRNAs in plants and 6,930 in animals and their viruses95.
miRNAs derive from precursor transcripts called pri-miRNAs, which are typically transcribed by RNA polymerase II96–99. Several miRNAs are present as clusters in the genome and likely derive from a common pri-miRNA transcript. Liberating a 20–24 nt miRNA from its pri-miRNA requires the sequential action of two RNase III endonucleases, assisted by their double-stranded RNA-binding domain (dsRBD) partner proteins (FIG. 1). First, the pri-miRNA is processed in the nucleus into a 60–70 nt long pre-miRNA by Drosha, acting with its dsRBD partner, called DGCR8 in mammals and Pasha in flies96,100–104. The resulting pre-miRNA has a hairpin structure: a loop flanked by base-paired arms that form a stem. Pre-miRNAs have a two nt overhang at their 3′ ends and a 5′ phosphate group, reflecting their production by an RNase III. The nuclear export protein Exportin-5, bound to Ran-GTP, carries the pre-miRNA to the cytoplasm; Ran is a GTPase that moves RNA and proteins through the nuclear pore105–108.
In the cytoplasm, Dicer and its dsRBD partner protein, TRBP (mammals) or Loqs (flies), cleave the pre-miRNA7,11–13,85,109–112. Drosha and Dicer differ in that Dicer, like Argonaute proteins but unlike Drosha, contains a PAZ domain, presumably allowing it to bind the two-nucleotide, 3′ overhanging end left by Drosha. Dicer cleavage generates a duplex containing two strands, the miRNA and miRNA*, corresponding to the two sides of the base of the stem. These roughly correspond to the guide and passenger strands of an siRNA, and similar thermodynamic criteria influence the choice of miRNA versus miRNA*22,23. miRNAs can arise from either arm of the pre-miRNA stem, and some pre-miRNAs produce mature miRNAs from both arms, whereas others show such pronounced asymmetry that the miRNA* is rarely detected even in high throughput sequencing experiments113.
In flies, worms and mammals, a few pre-miRNAs are produced by the nuclear pre-mRNA splicing pathway instead of Drosha processing114–118. These pre-miRNA-like introns, “mirtrons,” are spliced out of mRNA precursors whose sequence suggests they encode proteins. The spliced introns first accumulate as lariat products that require 2′-5′ debranching by the lariat debranching enzyme. Debranching yields an authentic pre-miRNA, which can then enter the standard miRNA biogenesis pathway.
In plants, DCL1 fills the roles of both Drosha and Dicer, converting pri-miRNAs to miRNA/miRNA* duplexes44,119–121. DCL1, assisted by its dsRBD partner HYL1, converts pri-miRNAs to miRNA/miRNA* duplexes in the nucleus, after which the miRNA/miRNA* duplex is thought to be exported to the cytoplasm by HASTY, an Exportin-5 homolog (HASTY mutants develop precociously, hence their name)65,121–123. Unlike animal miRNAs, plant miRNAs are 2′-O-methylated at their 3′ ends by HEN134,119,124. HEN1 protects plant miRNAs from 3′ uridylation, thought to be a signal for degradation36. HEN1 likely acts before miRNAs are loaded into AGO1, because both miRNA* and miRNA strands are modified in plants34.
The mechanism by which a miRNA regulates its mRNA target reflects both the specific Argonaute protein into which the small RNA is loaded and the extent of complementarity between the miRNA and the mRNA125–127. A few miRNAs in flies and mammals are nearly fully complementary to their mRNA targets; these direct endonucleolytic cleavage of the mRNA128–132. Such extensive complementarity is considered the norm in plants, as target cleavage was thought to be the main mode of target regulation in plants39,62,133. However, in flies and mammals, most miRNAs pair with their targets through only a limited region of sequence at the 5′ end of the miRNA, the “seed”; these repress translation and direct degradation of their mRNA targets134–139. The “seed” region of all small silencing RNAs contributes most of the energy for target binding140,141. Thus, the seed is the primary specificity determinant for target selection. The small size of the seed means that a single miRNA can regulate many, even hundreds, of different genes142,143. Intriguingly, recent data suggest that the nuclear transcriptional history of an mRNA influences whether a miRNA represses its translation at the initiation or the elongation step144.
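Why so short a seed yields so many potential targets can be illustrated by scanning a 3′ UTR for seed-complementary sites. The sketch below assumes that the seed comprises miRNA nucleotides 2 to 8; the text above specifies only a limited region at the 5′ end, so these boundaries, like the function names and the example sequences, are assumptions made for illustration.

```python
# Sketch: find seed-complementary sites in a 3' UTR.
# Assumptions: the seed is taken as miRNA nucleotides 2-8 (1-based, counted
# from the 5' end); a site is an exact Watson-Crick match to the seed;
# names and sequences are illustrative only.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna: str, utr: str, seed_start: int = 2, seed_end: int = 8):
    """Return 0-based start positions of seed matches in the UTR."""
    seed = mirna[seed_start - 1:seed_end]
    site = reverse_complement(seed)
    return [
        i
        for i in range(len(utr) - len(site) + 1)
        if utr[i:i + len(site)] == site
    ]


mirna = "UGAGGUAGUAGGUUGUAUAGUU"      # example 22 nt guide
utr = "AAACUACCUCAAAGGGCUACCUCUU"     # hypothetical 3' UTR fragment
print(seed_sites(mirna, utr))
```

Under these assumptions a site is only seven nucleotides long, so exact matches arise by chance in many unrelated 3′ UTRs.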
Since plant miRNAs are highly complementary to their mRNA targets, they can direct mRNA target cleavage. Nonetheless, AGO1-loaded plant miRNAs can also block translation, suggesting a common mechanism between plant and animal miRNAs, despite the absence of specific miRNAs shared between the two kingdoms145.
Like transcription factors, miRNAs regulate diverse cellular pathways, and are widely believed to regulate most biological processes, in both plants and animals, ranging from housekeeping functions to responses to environmental stress. Covering this vast body of work is beyond the scope of this review; the cited reviews provide valuable insight146–148.
The study of miRNA pathway mutants provided early evidence for the influence of miRNAs on biological processes in both plants and animals. Loss of Dicer or miRNA-associated Argonaute proteins is nearly always lethal in animals, and such mutants show severe developmental defects in both plants and animals. In Drosophila, dcr-1 mutant germ-line stem cell clones divide slowly; in Arabidopsis, embryogenesis is abnormal in dcl1 mutants; in C. elegans, dcr-1 mutants display defects in germ-line development and embryonic morphogenesis; zebrafish lacking both maternal and zygotic Dicer are similarly defective in embryogenesis; and mice lacking Dicer die as early embryos, apparently devoid of stem cells14,119,149–152. Loss of Dicer in mouse embryonic fibroblasts causes increased DNA damage and consequently, the up-regulation of p19Arf and p53 signaling that induces premature senescence153.
Many miRNAs function in specific biological processes, in specific tissues, and at specific times154. The importance of small silencing RNAs goes far beyond the RNA silencing field: long-standing questions about the molecular basis of pluripotency, tumorigenesis, apoptosis, cell identity, etc. are finding answers in small RNAs146,155.
Piwi-interacting RNAs (piRNAs) are the most recently discovered class of small RNAs, and, as their name suggests, they bind to the Piwi clade of Argonaute proteins. (Animal Argonaute proteins can be subdivided by sequence relatedness into Ago and Piwi sub-families.) The Piwi clade comprises Piwi, Aubergine (Aub) and Ago3 in flies, MILI, MIWI and MIWI2 in mice, and HILI, HIWI1, HIWI2 and HIWI3 in humans.
piRNAs were first proposed to ensure germ line stability by repressing transposons when Aravin and colleagues discovered in flies a class of longer small RNAs (~25–30 nt) associated with silencing of repetitive elements156. Later, these “repeat associated small interfering RNAs”—subsequently renamed piRNAs—were found to be distinct from siRNAs: they bind Piwi proteins and do not require Dcr-1 or Dcr-2 for their production, unlike miRNAs and siRNAs33,157,158. Moreover, they are 2′-O-methylated at their 3′ termini, unlike miRNAs, but like siRNAs in flies32,157,159–161.
High throughput sequencing of vertebrate piRNAs revealed a class of piRNAs unrelated to repetitive sequences, hence their name change87,162–168. Mammalian piRNAs can be divided into pre-pachytene and pachytene piRNAs, according to the stage of meiosis at which they are expressed in developing spermatocytes. Like piRNAs in flies, pre-pachytene piRNAs predominantly correspond to repetitive sequences and are implicated in silencing transposons, such as L1 and IAP165. In male mice, gametic methylation patterns are established when germ cells arrest their cell cycle 14.5 days postcoitum, resuming cell division 2–3 days after birth169,170. Both MILI and MIWI2 are expressed during this period, and miwi2 and mili deficient mice lose DNA methylation marks on transposons171. The pre-pachytene piRNAs, which bind MIWI2 and MILI, may serve as guides to direct DNA methylation of transposons. In contrast to pre-pachytene piRNAs, the pachytene piRNAs mainly arise from unannotated regions of the genome, not transposons, and their function remains unknown165.
Three recent studies report that the previously discovered germ-line “21U” RNAs in C. elegans are piRNAs113,172–174. These small RNAs were initially identified by high throughput sequencing113. They are precisely 21 nt long, begin with a uridine 5′-monophosphate, and are 3′ modified. They bind Piwi-Related Gene-1 (PRG-1), a C. elegans Piwi protein. Each 21U-RNA may be transcribed separately, as all are flanked by a common upstream motif. Like piRNAs in Drosophila, the 21U-RNAs are required for maintenance of the germ line and fertility, and like Drosophila Aub and other piRNA pathway components, PRG-1 is found in specialized granules, P granules, associated with germ-line function, in a cytoplasmic, perinuclear ring called “nuage.” Worm piRNAs resemble pachytene piRNAs in mammals: their targets and functions are largely unknown.
piRNA sequences are stunningly diverse, with more than 1.5 million distinct piRNAs identified thus far in flies, but collectively they map to a few hundred genomic clusters77,79,81,158,175–177. The best studied cluster is the flamenco locus. flamenco was identified genetically as a repressor of the gypsy, ZAM and Idefix transposons33,178–182. Unlike siRNAs, flamenco piRNAs are mainly antisense, suggesting that piRNAs arise from long, single-stranded precursor RNAs. In fact, disruption of flamenco by insertion of a P-element near the 5′ end of the locus blocks the production of distal piRNAs up to 168 kbp away. Thus, an enormously long, single-stranded RNA transcript appears to be the source of those piRNAs that derive from the flamenco locus176.
The current model for piRNA biogenesis was inferred from the sequences of piRNAs bound to Piwi, Aubergine and Ago3176,183. piRNAs bound to Piwi and Aubergine are typically antisense to transposon mRNAs, whereas Ago3 is loaded with piRNAs corresponding to the transposon mRNAs themselves (FIG. 1). Moreover, the first 10 nucleotides of antisense piRNAs are frequently complementary to the sense piRNAs found in Ago3. This unexpected sequence complementarity has been proposed to reflect a feed-forward amplification mechanism, “piRNA ping-pong,” that is activated only upon transposon mRNA transcription (FIG. 4)176,183. A similar amplification loop has been inferred from high throughput piRNA sequencing in vertebrates, implying its conservation through evolution162,171. Many aspects of the ping-pong model remain speculative. Why Ago3 appears to bind only sense piRNAs derived from transposon mRNAs is unknown. An untested idea is that different forms of RNA Pol II transcribe primary piRNA transcripts and transposon mRNAs and that the specialized RNA Pol II that transcribes the primary piRNA precursor recruits Piwi and Aub, but not Ago3. How the 3′ ends of piRNAs are made is also not known.
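The sequence signature on which the ping-pong model rests can be expressed as a simple test on pairs of piRNAs. The sketch below assumes that the complementarity involves the first 10 nucleotides of both partners, so that their 5′ ends lie 10 nt apart on opposite strands; the function names and the sequences are hypothetical.

```python
# Sketch: test a sense/antisense piRNA pair for the 10 nt 5' complementarity
# that underlies the ping-pong model.
# Assumptions: the first 10 nt of each partner are involved and pairing is
# perfect; names and sequences are illustrative only.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return rna.translate(COMPLEMENT)[::-1]


def is_ping_pong_pair(sense: str, antisense: str, overlap: int = 10) -> bool:
    """True if the first `overlap` nt of the two piRNAs are complementary."""
    if len(sense) < overlap or len(antisense) < overlap:
        return False
    return reverse_complement(antisense[:overlap]) == sense[:overlap]


sense_pirna = "UAGCUAGGAUCCGAUCGAUCGAUCG"  # hypothetical Ago3-bound piRNA
antisense_pirna = reverse_complement(sense_pirna[:10]) + "ACGUACGUACGUACG"
print(is_ping_pong_pair(sense_pirna, antisense_pirna))
```

Applied to all pairs in a sequencing library, a test of this kind would count how often the 10 nt offset occurs relative to other offsets.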
Piwi family proteins are indispensable for germ-line development in many, perhaps all, animals; but they have thus far been most extensively studied in Drosophila. Piwi is restricted to the nucleoplasm of Drosophila germ cells and adjacent somatic cells. Piwi is required to maintain germ line stem cells and to promote their division; the protein is required in both the somatic niche cells that support germ-line stem cells and in the stem cells themselves184,185. In the male germ line, Aub is required for the silencing of the repetitive Stellate locus, which would otherwise cause male sterility. Expression of Stellate is controlled by the related, repetitive Suppressor of Stellate locus, the source of antisense piRNAs that act through Aub to repress Stellate156,157,186.
aub was originally identified because it is required for specification of the embryonic axes187. The loss of anterior-posterior and dorsal-ventral patterning in embryos from mothers lacking Aub is an indirect consequence of the double-stranded DNA breaks that occur in the oocyte in its absence188. The breaks appear to activate a DNA-damage checkpoint that disrupts patterning of the oocyte and, consequently, of the embryo. The defects in patterning, but not in silencing repetitive elements, are rescued by mutations that bypass the DNA damage signaling pathway, suggesting the breaks are caused by transposition. That activation of a DNA damage checkpoint should inappropriately reorganize embryonic polarity was most unexpected, but further underscores the vital role piRNAs play in germ line development.
The role of piRNAs in the fly soma is hotly debated. Piwi and Aub are required to silence tandem arrays of white, a gene required to produce red eye pigment189. It is not understood whether piRNAs are produced in the soma as well as in the germ line, or whether piRNAs present during germ-line development deposit long-lived chromatin marks that exert their effects days later.
Both piRNAs and endo-siRNAs repress transposons in the germ line, where mutations caused by transposition, of course, would propagate to the next generation. siRNAs—that is, the RNAi pathway—likely provide a rapid response to the introduction of a new transposon into the germ line, a challenge not dissimilar to a viral infection. In contrast, the piRNA system appears to provide a more robust, permanent solution to the acquisition of a transposon. In the soma, however, endo-siRNAs are the predominant transposon-derived small RNA class, and their loss in dcr-2 and ago2 mutants increases transposon expression76,77,79,81. Somatic piRNA-like small RNAs have been observed in ago2 mutant flies76. Perhaps, in the absence of endo-siRNAs, piRNAs are produced somatically and resume transposon surveillance. Such a model implies significant cross-talk between the piRNA and endo-siRNA–generating machineries.
The RNAi, miRNA and piRNA pathways were initially believed to be independent and distinct. However, the lines distinguishing them continue to fade. These pathways interact and rely on each other at several levels: they compete for and share substrates and effector proteins, and they cross-regulate each other.
Both the siRNA and miRNA pathways load dsRNA duplexes containing a 19 bp double-stranded core flanked by 2 nt 3′ overhangs. An siRNA duplex contains guide and passenger strands and is complementary throughout its core; a miRNA/miRNA* duplex contains mismatches, bulges and G:U wobble pairs. In Drosophila, biogenesis of small RNA duplexes is uncoupled from their loading into Ago1 or Ago2190,191. Instead, loading is governed by the structure of the duplex: duplexes bearing bulges and mismatches are sorted into the miRNA pathway and hence loaded into Ago1; duplexes with greater double-stranded character partition into Ago2, the Argonaute protein associated with RNAi.
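The structural distinction described above can be illustrated with a minimal sketch that scores how much of the 19 bp core of a duplex is Watson-Crick paired. It is a crude, ungapped measure of "double-stranded character" only: it treats G:U wobble pairs as unpaired, cannot represent bulges, and makes no attempt to predict actual Ago1 or Ago2 loading. The function name `core_pairing_fraction` and the default `core_len` parameter are hypothetical, introduced here for illustration.

```python
# Toy measure of double-stranded character for a small RNA duplex.
# Assumption: both strands are written 5'->3', and the first 19 nt of each
# strand form the core, so guide position i pairs with passenger position
# (core_len - 1 - i). The last 2 nt of each strand are the 3' overhangs.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def core_pairing_fraction(guide, passenger, core_len=19):
    """Return the fraction of core positions that are Watson-Crick paired."""
    g = guide.upper().replace("T", "U")
    p = passenger.upper().replace("T", "U")
    if len(g) < core_len or len(p) < core_len:
        raise ValueError("both strands must be at least core_len nt long")
    paired = sum(
        (g[i], p[core_len - 1 - i]) in WATSON_CRICK
        for i in range(core_len)
    )
    return paired / core_len
```

A fully complementary siRNA-like duplex scores 1.0 under this measure, and each mismatch or wobble pair in the core lowers the score by 1/19.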
The partitioning of small RNAs between Ago1 and Ago2 also has implications for target regulation. Ago1 primarily represses translation whereas Ago2 represses by target cleavage, reflecting the faster rate of target cleavage by Ago2 compared to Ago1191. Sorting creates competition between the two pathways for substrates190,191. In Drosophila, loading of a small RNA duplex into one pathway decreases its association with the other pathway.
Different double-stranded RNA precursors require distinct combinations of proteins to produce small silencing RNAs. For example, Drosophila endo-siRNAs derived from structured loci require Loqs, rather than R2D277–79. We presume that under some circumstances the endo-siRNA and miRNA pathways might therefore compete for Loqs. The endo-siRNA and RNAi pathways likely also compete for shared components.
In contrast to Drosophila, plants load small RNAs into Argonautes according to the identity of the 5′ nt of the small RNA69,192. AGO1 is the main effector Argonaute for miRNAs, the majority of which begin with uracil, whereas AGO4 is the major effector of the heterochromatic pathway and is predominantly loaded with small RNAs beginning with adenosine193. AGO2 and AGO5, however, have no characterized function in plants193. Changing the 5′ nt from A to U shifts the loading bias of a plant small RNA from AGO2 to AGO1, and vice versa. Similarly, Arabidopsis AGO4 binds small RNAs that begin with adenosine, while AGO5 prefers cytosine.
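The 5′ nucleotide preferences stated above can be summarized as a simple lookup. The sketch below only encodes the preferences named in this paragraph (U for AGO1, A for AGO2 and AGO4, C for AGO5); the table name `FIVE_PRIME_PREFERENCE` and the function `preferred_agos` are hypothetical, and the lookup is an illustration of the sorting rule, not a predictor of loading in vivo.

```python
# Toy lookup of plant Argonaute 5'-nucleotide preferences as stated in the text.
# No preference is stated for a 5' G, so it maps to an empty list.
FIVE_PRIME_PREFERENCE = {
    "U": ["AGO1"],
    "A": ["AGO2", "AGO4"],
    "C": ["AGO5"],
}


def preferred_agos(small_rna):
    """Return the Argonautes whose stated 5' preference matches the sequence."""
    if not small_rna:
        raise ValueError("empty sequence")
    # Accept DNA-style reads by treating T as U.
    first = small_rna[0].upper().replace("T", "U")
    return FIVE_PRIME_PREFERENCE.get(first, [])
```

For example, `preferred_agos("UGACAGAAGAGAGUGAGCAC")` returns `["AGO1"]`, and replacing the first nucleotide with A returns `["AGO2", "AGO4"]`, mirroring the A-to-U swap described above.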
Aub- and Piwi-bound piRNAs typically begin with U, whereas those bound to Ago3 show no 5′ nucleotide bias. It remains to be determined if this reflects a 5′ nucleotide preference like the situation for the plant AGOs or some feature of an as-yet-undiscovered piRNA loading machinery that sorts piRNAs between Piwi proteins.
Small RNA pathways are often entangled. TasiRNA biogenesis in Arabidopsis is a classic example of such cross-talk between pathways. miRNA-directed cleavage of tasiRNA-generating transcripts initiates tasiRNA production and subsequent regulation of tasiRNA targets64–68. In C. elegans, at least one piRNA has been implicated in initiating endo-siRNA production172,173, and in flies, the endo-siRNA pathway may repress expression of piRNAs in the soma76. Moreover, small RNA levels may be buffered by negative feedback loops in which small RNAs from one pathway alter the expression levels of RNA silencing proteins that act in the same or in other RNA silencing pathways194–198.
Despite our growing understanding of the mechanism and function of small RNAs, their evolutionary origins remain obscure. siRNAs are present in all three eukaryotic kingdoms (plants, animals, and fungi) and provide anti-viral defense in at least plants and animals. Thus, the siRNA machinery was present in the last common ancestor of plants, animals and fungi. In contrast, miRNAs have only been found in land plants, the unicellular green alga, Chlamydomonas reinhardtii, and metazoan animals, but not in unicellular choanoflagellates or fungi199–201. Deep sequencing experiments have found no miRNAs shared by plants and animals, suggesting that miRNA genes, unlike the miRNA protein machinery, arose independently at least twice in evolution. Finally, piRNAs appear to be the youngest major small RNA family, having been found only in metazoan animals201. While Dicer proteins have been identified only in eukaryotes, Argonaute proteins can also be found in eubacteria and archaea, raising the prospect that small nucleic acids may have served as guides for proteins at the very dawn of cellular life. Although the machinery might be ancient, the small RNA guides diversified over time to acquire specialized roles.
The history of small silencing RNAs makes predicting the future particularly daunting, as new discoveries have come at a breakneck pace, with each new small RNA mechanism or function forcing a re-evaluation of cherished models and “facts.” Several longstanding but unanswered questions, however, are worth highlighting. First, does RNAi, in the sense of an siRNA-guided defense against external nucleic acid threats such as viruses, exist in mammals? Second, how do miRNAs repress gene expression? Do several parallel mechanisms co-exist in vivo, or will the current, apparently contradictory, models for miRNA-directed translational repression and mRNA decay ultimately be unified in a larger mechanistic scheme? Third, can miRNA-regulated genes ever be identified by computation alone, or will computational predictions ultimately give way to high-throughput experimental methods for associating individual miRNA species with their regulatory targets? Will network analysis uncover themes in miRNA-target relationships that reveal why miRNA regulation is so widespread in animals? Fourth, how are piRNAs made? The feed-forward amplification “ping-pong” model is appealing, but likely underestimates the complexity of piRNA biogenesis mechanisms. We do not yet know how piRNA 3′ ends are generated. Nor do we have a coherent model for how long, antisense transcripts from piRNA clusters are fragmented into piRNAs. Finally, will the increasing number of examples of small RNAs carrying epigenetic information across generations3,202 ultimately force us to reexamine our Mendelian view of inheritance?
Much of the credit for the identification of small RNAs rests with advances in high throughput sequencing. Presently, there are three commercial “high depth” sequencing systems: Roche's 454 GS FLX Genome Analyzer, Illumina's Solexa Analyzer and, most recently, Applied Biosystems' SOLiD System. Reference 223 describes how each method works. Whereas 454 has the advantage of sequencing >250 bp per read, compared to ~35–50 bp for Solexa and SOLiD, the latter two platforms provide 70- to 400-fold greater sequencing depth. All three platforms have been used successfully to identify novel small RNA species and to discover new small RNA classes in mutant plants and animals. Using less than 10 µg total RNA, high throughput sequencing, together with advances in small RNA library preparation, has revealed the length distribution, sequence identity, terminal structure, sequence and strand biases, isoform prevalence, genomic origins, and mode of biogenesis for millions of small RNAs. Initial small RNA sequencing experiments sought simply to identify novel small RNA species and classes. Increasingly, high throughput sequencing is being used to profile small RNA expression across the stages of development and in different tissues and disease states. Profiling by deep sequencing provides quantitative information about small RNA expression, like PCR- or microarray-based approaches, but can also precisely detect subtle changes in small RNA sequence or length.
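Two of the properties listed above, length distribution and 5′ nucleotide bias, are straightforward to tabulate once reads have been trimmed of adapters. The following is a minimal sketch under stated assumptions: reads are supplied as plain sequence strings, and the 18–32 nt size window is an arbitrary illustrative choice bracketing the ~20–30 nt range of small silencing RNAs. The function name `profile_reads` and its parameters are hypothetical and do not correspond to any published pipeline.

```python
from collections import Counter


def profile_reads(reads, min_len=18, max_len=32):
    """Tabulate read lengths and 5' nucleotides for size-selected reads.

    Returns (lengths, first_nt, total), where lengths and first_nt are
    Counters and total is the number of reads inside the size window.
    """
    lengths = Counter()
    first_nt = Counter()
    for read in reads:
        # Normalize case and report nucleotides as RNA (T -> U).
        seq = read.strip().upper().replace("T", "U")
        if not (min_len <= len(seq) <= max_len):
            continue  # outside the assumed small RNA size window
        lengths[len(seq)] += 1
        first_nt[seq[0]] += 1
    total = sum(lengths.values())
    return lengths, first_nt, total
```

Dividing each count by `total` gives the fraction of reads at each length or with each 5′ nucleotide, which is the form in which such biases are usually compared between libraries.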
Perhaps the most problematic step in small RNA sequencing is preparing the small RNA library. The most frequently employed cloning protocols require the small RNAs to have a 5′ phosphate and a 3′ hydroxyl group, the hallmarks of Dicer products. This approach identifies small RNAs with the expected termini, but alternative methods must be used to find small RNAs, such as C. elegans secondary siRNAs, with other terminal structures. Additionally, finding every possible small RNA in a cell using exhaustive deep sequencing is a game with diminishing returns. For example, while many miRNAs have been sequenced hundreds of thousands or even a million times, the C. elegans miRNA lsy-6, which is apparently expressed in fewer than ten cells of the adult, has so far eluded high depth sequencing224.
We thank Hervé Seitz for bioinformatic assistance and Xuemei Chen, Ira Pekker and Stefan Ameres for helpful discussions. This work was supported, in part, by grants from the NIH to PDZ (GM065236 and GM062862).