Endogenous small RNAs (endo-siRNAs) interact with Argonaute (AGO) proteins to mediate sequence-specific regulation of diverse biological processes. Here, we combine deep-sequencing and genetic approaches to explore the biogenesis and function of endo-siRNAs in C. elegans. We describe conditional alleles of the dicer-related helicase, drh-3, that abrogate both RNA interference and the biogenesis of endo-siRNAs, called 22G-RNAs. DRH-3 is a core component of RNA-dependent RNA polymerase (RdRP) complexes essential for several distinct 22G-RNA systems. We show that in the germ-line, one system is dependent on worm-specific AGOs, including WAGO-1, which localizes to germ-line nuage structures called P-granules. WAGO-1 silences certain genes, transposons, pseudogenes and cryptic loci. Finally, we demonstrate that components of the nonsense-mediated decay pathway function in at least one WAGO-mediated surveillance pathway. These findings broaden our understanding of the biogenesis and diversity of 22G-RNAs and suggest novel regulatory functions for small RNAs.
Regulatory pathways related to RNA interference (RNAi) utilize small RNAs to guide the sequence-specific modulation of gene expression, chromatin structure and innate immune function (Ding and Voinnet, 2007; Moazed, 2009). Small RNA classes can be distinguished based on a number of factors including mechanism of biogenesis, mode of regulation or function, and the Argonaute (AGO) proteins with which they interact (Ghildiyal and Zamore, 2009). AGO family members are structurally related to ribonuclease (RNase) H and bind to the ends of single-stranded small RNAs, presenting the central residues for base-pairing interactions with target nucleic acids (Parker et al., 2005; Song et al., 2004).
Diverse pathways have been implicated in the biogenesis and loading of various small RNA species onto their respective AGO proteins (Siomi and Siomi, 2009). Double-stranded (ds) RNA, including the stem-loop precursors of miRNAs, are processed into mature 5′ mono-phosphorylated short RNAs by Dicer, an RNaseIII-like enzyme. The biogenesis of PIWI-interacting RNAs (piRNAs) is less well understood and appears to be independent of Dicer. In flies, PIWI-mediated cleavage events appear to define the 5′ ends of new piRNAs (Brennecke et al., 2007; Gunawardane et al., 2007), whereas 3′ end maturation occurs through an undefined mechanism that ultimately results in the 2′-O-methylation of the 3′ residue (Klattenhoff and Theurkauf, 2008). In plants, fungi and nematodes, silencing signals are amplified from target RNA by RNA-dependent RNA polymerases (RdRPs). In some cases, RdRPs and Dicer function in a concerted manner to synthesize and process dsRNA, producing siRNAs with 5′ monophosphate residues (Colmenares et al., 2007; Lee and Collins, 2007). In nematodes, RdRPs also catalyze the unprimed, de novo synthesis of 5′ triphosphorylated RNAs that appear to be loaded directly, without Dicer processing, onto members of an expanded clade of worm-specific AGO proteins (WAGOs) (Aoki et al., 2007; Pak and Fire, 2007; Sijen et al., 2007; Yigit et al., 2006).
In plants, flies and mammals, several classes of small RNA species are derived from transposable elements and repeat sequences as well as a subset of non-repetitive protein coding sequences and pseudogenes (Czech et al., 2008; Ghildiyal et al., 2008; Kasschau et al., 2007; Okamura et al., 2008; Tam et al., 2008; Watanabe et al., 2008). In many cases, these small RNAs are derived from loci capable of dsRNA formation including transposable elements, inverted repeats and bi-directionally transcribed regions. Naturally occurring small RNAs are often coincident with pericentric heterochromatin and have been implicated in the establishment and/or maintenance of heterochromatin and in centromere function (Hall et al., 2003; Verdel et al., 2004). In C. elegans, endogenous small RNAs have been reported to target several hundred loci including protein-coding as well as non-coding loci (Ambros et al., 2003; Lim et al., 2003; Ruby et al., 2006).
Here, we demonstrate that the Dicer-Related Helicase DRH-3 is essential for the biogenesis of RdRP-derived small RNAs in C. elegans. We have named these small RNAs 22G-RNAs based on their strong propensity for a 5′G residue and a length of 22nt. 22G-RNAs are abundantly expressed in the germline and are maternally deposited in embryos. Surprisingly, the majority of 22G-RNAs target unique genome sequences including ~50% of the annotated coding genes in C. elegans. In addition to DRH-3, we show that the RdRPs, RRF-1 and EGO-1, and the tudor-domain protein EKL-1 are required for the biogenesis of 22G-RNAs. 22G-RNAs can be divided into two major systems based on the associated AGOs and their cofactors. One of these systems is dependent on RDE-3, MUT-7 and members of the worm-specific AGOs (WAGOs), including WAGO-1, while the second is dependent on the AGO CSR-1 and the nucleotidyl transferase CDE-1. The WAGO 22G-RNA system silences transposons, pseudogenes and cryptic loci as well as certain genes, while the CSR-1 system functions to promote chromosome segregation (see Claycomb et al.). Finally, we demonstrate a role for components of the nonsense-mediated mRNA decay (NMD) pathway in 22G-RNA biogenesis. Our findings uncover a surprisingly rich maternal inheritance of small RNAs and raise many questions about the potential significance of these RNA species in the transmission of epigenetic information and genome surveillance.
A previous study identified DRH-3 as a Dicer-interacting factor required for germline RNAi and for viability (Duchaine et al., 2006). Animals homozygous for the drh-3(tm1217) deletion (a putative null allele) are infertile and RNAi targeting drh-3 results in a penetrant embryonic lethal phenotype with defects in chromosome segregation and in the production of both Dicer-dependent and Dicer-independent small RNA populations (Duchaine et al., 2006; Nakamura et al., 2007).
Our screens for RNAi-deficient (Rde) strains identified three additional alleles of drh-3. Mutants bearing these non-null alleles are homozygous viable at 20°C but are infertile at 25°C. Each allele alters a distinct amino acid within the HELICc domain of the putative helicase (Figure 1A), and, based on genetic tests, each behaves like a partial loss-of-function mutation. Consistent with previous work demonstrating that DRH-3 is required for germline RNAi (Duchaine et al., 2006), these drh-3 point mutants were defective for RNAi targeting the maternal gene pos-1 (Figure 1B), and exhibited varying degrees of somatic RNAi deficiency. For example, the drh-3(ne3197) mutant was strongly resistant to RNAi targeting the somatically-expressed basement membrane collagen let-2, while the ne4253 and ne4254 alleles are partially and fully sensitive, respectively (Figure 1B).
Although the drh-3 point mutants exhibit a range of phenotypes that increase in penetrance at 25°C, they do not appear to be classic temperature-sensitive mutants. Null alleles of many germline factors including several RNAi-pathway genes cause similar conditional-sterile phenotypes that likely reflect an underlying temperature-dependent process in the germline that is uncovered in these mutant backgrounds. The spectrum of phenotypes observed in the drh-3 point mutant strains, including sterility, embryonic lethality and an increased frequency of spontaneous males (Figure 1C and D), are similar to those observed for null alleles of several mutator-class, Rde strains. And, as observed with mutator strains, the RNAi defect associated with the drh-3 alleles was not affected by temperature: each allele was fully resistant to pos-1 RNAi at the permissive temperature of 20°C (Figure 1A and not shown).
To determine whether the drh-3 point mutants showed small RNA defects similar to the deletion mutant (Duchaine et al., 2006), small RNA populations were isolated and analyzed by Northern blot to detect previously characterized endogenous small RNAs (Figure S1). All of the previously examined DRH-3 dependent endogenous small RNAs were dramatically reduced or undetected in the drh-3 point-mutant samples, while miRNA biogenesis was unaffected. These results are consistent with the notion that these novel alleles of drh-3 cause a partial loss of drh-3 function.
In the course of analyzing the small RNA phenotypes associated with the drh-3 mutants, we observed that a prominent small RNA species of ~22nt was virtually absent in small RNA samples prepared from each of the drh-3 mutants (Figure 2A). A second prominent small RNA species of ~21nt appeared to be unaffected in drh-3. Neither small RNA species was altered in samples prepared from a dcr-1(ok247) deletion mutant (Figure 2A). Thin-layer chromatography experiments indicated that the 22nt small RNAs have a 5′ guanosine (5′G) and are resistant to terminator exonuclease, suggesting the presence of a 5′-cap or polyphosphate (Figure 2B). The 21nt small RNAs are comprised of both 5′G and 5′ uracil (5′U) species, the latter of which we recently identified as the 21U-RNAs, piRNAs associated with PRG-1 (Batista et al., 2008). In contrast to 21U-RNAs, the majority of the 22nt small RNAs were sensitive to periodate (data not shown), indicating that the 3′ end is not modified (Ruby et al., 2006). Both 22nt and 21nt 5′G small RNAs were dramatically reduced in the drh-3 mutants, whereas 5′U small RNAs were unaffected. These data are consistent with the notion that the 22nt small RNAs represent an abundant pool of 5′-triphosphorylated products of endogenous RNA-dependent RNA polymerase (RdRP) (Aoki et al., 2007; Pak and Fire, 2007; Sijen et al., 2007).
To identify and characterize the DRH-3 dependent small RNAs on a genome-wide level, small RNAs between 18 and 26nt were cloned from wild-type and drh-3 mutant animals using a protocol compatible with cloning small RNAs bearing 5′-triphosphates. Illumina sequencing of wild-type and drh-3 libraries yielded 2.34 million and 4.33 million reads that perfectly match the C. elegans genome, respectively. Consistent with our biochemical analyses, both the size distribution and first nucleotide composition of small RNAs were dramatically altered in the drh-3 sample (Figure 2C). First, wild-type reads peaked sharply at 21 and 22nt, comprising 25% and 36% of the total reads, respectively. Whereas 21nt reads had similar levels of 5′U and 5′G residues (11% and 10% of total reads, respectively), ~60% of 22nt reads started with 5′G (~21% of total reads). Strikingly, reads with 5′G were strongly depleted from the drh-3 mutant sample, resulting in a dramatic enrichment of reads with 5′U.
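The size and first-nucleotide tabulation described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `length_first_nt_profile`, the toy sequences, and the choice to report each (length, 5′ nucleotide) class as a fraction of all reads in the 18 to 26nt window are assumptions made for illustration only.

```python
from collections import Counter


def length_first_nt_profile(reads, min_len=18, max_len=26):
    """Return {(length, first_nt): fraction of reads} within a size window.

    Reads are treated as DNA-alphabet sequencing reads and converted to
    RNA letters (T -> U) so that 5' U reads are reported as "U".
    """
    kept = [r.upper().replace("T", "U") for r in reads
            if min_len <= len(r) <= max_len]
    counts = Counter((len(r), r[0]) for r in kept)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}


# Toy reads (hypothetical sequences, not data from the study).
toy_reads = [
    "G" + "A" * 21,  # 22 nt, 5' G
    "G" + "C" * 21,  # 22 nt, 5' G
    "T" + "A" * 20,  # 21 nt, 5' U
    "G" + "A" * 20,  # 21 nt, 5' G
    "ACGT",          # outside the 18-26 nt window, ignored
]
profile = length_first_nt_profile(toy_reads)
for (length, nt), fraction in sorted(profile.items()):
    print(f"{length} nt, 5'{nt}: {fraction:.2f}")
```

Comparing such a profile between a wild-type and a mutant library would show a loss of 5′G reads as a drop in the (22, "G") and (21, "G") fractions and a proportional rise in the 5′U classes.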
After removing structural RNA degradation products from the dataset, about one-third of the wild-type reads matched to miRNAs (24.7%) and 21U-RNAs (11%) (Figure 2D). The most abundant class of small RNA reads from wild-type samples (34%) were antisense to protein-coding genes. The remaining small RNA reads were derived from transposons and other repetitive loci (~10%), as well as non-annotated loci (~20%). In contrast, the drh-3 sample was strongly depleted of endogenous small RNAs targeting protein coding genes, pseudogenes, repeats and non-annotated loci, but enriched proportionately for reads matching miRNAs and 21U-RNAs. Together, our biochemical and deep sequencing data indicated that DRH-3 is essential for the biogenesis of an abundant class of endogenous small RNAs expressed in C. elegans. Based on the propensity for 22nt length and 5′G residue, we refer to these small RNAs as 22G-RNAs, which include what were previously identified as endogenous siRNAs (Ambros et al., 2003; Ruby et al., 2006).
While examining the distribution of 22G-RNAs targeting protein-coding loci, we observed that 22G-RNAs were most abundant toward the 3′ end of many protein-coding transcripts. For example, 22G-RNAs were clearly enriched at the 3′ ends of both rrf-1 and ama-1 transcripts in the wild-type sample and tapered toward the 5′ end (Figure 2E). Interestingly, 22G-RNAs were also enriched at the 5′ end of rrf-1. Remarkably, the remnant rrf-1 and ama-1 22G-RNAs that were cloned from drh-3 mutants mapped almost exclusively to the 3′ end of these loci. This pattern might be expected if RdRP initiates 22G-RNA biogenesis at or near the 3′ end of a transcript.
To examine this on a larger scale, each transcript for which 22G-RNA reads occur in both wild-type and drh-3 mutant samples was divided into 20 consecutive intervals of equal size. The total number of 22G-RNAs that map to each interval was plotted for both wild-type and drh-3. In wild-type, 22G-RNAs were more abundant toward both the 5′ and 3′ termini of predicted transcripts (Figure 2F). This terminal distribution of 22G-RNAs represented a general trend for most target genes, and was not caused by a few genes with a high number of small RNAs at either end (Figure S2).
Markedly different results were obtained when this analysis was applied to the drh-3 mutant data. In drh-3 mutants, while 22G-RNAs were greatly depleted, they were not depleted uniformly across their targets (Figure 2F and S2). The levels of 22G-RNAs within the first 15 bins (representing 75% transcript length) were disproportionately depleted, resulting in a marked exponential trend in 22G-RNA levels over the remaining 5 bins peaking at the 3′ end. This 3′ end enrichment was evident for about 86% of the 22G-RNA targets. Together, these results raise the possibility that 22G-RNA biogenesis is initiated at the 3′ end of most targets and the wild-type activity of drh-3 promotes the propagation of 22G-RNA biogenesis by RdRP along the template RNA.
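The interval analysis described above (20 equal-sized bins per transcript, summed across transcripts) can be sketched as follows. This is an illustrative sketch rather than the published method: the function names, the representation of each read as a single 0-based position measured from the 5′ end of the transcript, and the toy input are all assumptions.

```python
def bin_counts(transcript_length, positions, n_bins=20):
    """Count reads in n_bins equal-sized intervals along one transcript.

    positions are 0-based offsets from the transcript 5' end, so bin 0 is
    the 5'-most interval and bin n_bins - 1 is the 3'-most interval.
    """
    counts = [0] * n_bins
    for pos in positions:
        if 0 <= pos < transcript_length:
            counts[pos * n_bins // transcript_length] += 1
    return counts


def aggregate_bins(transcripts, n_bins=20):
    """Sum per-bin counts over (transcript_length, positions) pairs."""
    total = [0] * n_bins
    for length, positions in transcripts:
        for i, c in enumerate(bin_counts(length, positions, n_bins)):
            total[i] += c
    return total


# Toy input (hypothetical positions, not data from the study).
toy_transcripts = [
    (100, [0, 5, 50, 95, 99]),
    (200, [10, 190, 199]),
]
totals = aggregate_bins(toy_transcripts)

# Fraction of reads in the last 5 bins (the 3'-most 25% of each transcript).
three_prime_fraction = sum(totals[15:]) / sum(totals)
print(totals, three_prime_fraction)
```

Because each transcript is rescaled to the same 20 intervals, transcripts of different lengths contribute to a common 5′ to 3′ axis; summing raw counts, as here, lets highly targeted genes dominate, which is why a per-gene check of the trend is a useful complement.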
To characterize the genetic pathways required for the biogenesis of 22G-RNAs, we examined a panel of RNAi-related mutants by Northern blot analysis to detect small RNAs derived from representative abundant 22G-RNA loci. In addition, we performed small RNA cloning experiments to identify 22G-RNA populations that are enriched in the germline, soma or oocyte (Figure S3; Tables S1 and S2; Supplemental Discussion). This exercise revealed that most 22G-RNAs are germline expressed and/or maternal (Figure S3). As expected from our deep sequencing experiments, we found that 22G-RNAs enriched in both the soma (Y47H10A.5) and germline (F37D6.3 and Tc1) were dependent on DRH-3. Likewise, RDE-3 and MUT-7 were required for 22G-RNAs targeting all three loci (Figure 3A), indicating that at least one aspect of the mechanism of 22G-RNA expression is shared by these somatic and germline loci.
Previous work has shown that some somatic 22G-RNA loci (e.g. K02E2.6 and X-cluster) are dependent on the ERI endo-RNAi complex as well as multiple AGO proteins that interact with secondary siRNAs produced by RdRP (Duchaine et al., 2006; Yigit et al., 2006). However, somatic 22G-RNAs derived from Y47H10A.5 (Figure 3A) were independent of the ERI pathway genes rrf-3 and ergo-1. Instead, Y47H10A.5 22G-RNAs fail to accumulate in exogenous (exo) RNAi pathway mutants, including rde-4, rde-1 and rrf-1 as well as a derivative of the multiple AGO mutant MAGO (Yigit et al., 2006) with two additional AGO mutations (MAGO+2), suggesting that Y47H10A.5 22G-RNAs could be triggered by dsRNA.
Germline 22G-RNAs appear to be independent of the exo-RNAi and ERI pathways. For example, mutations in dcr-1, rde-4 or ergo-1 caused no visible depletion in 22G-RNA populations based on ethidium bromide staining of small RNA (Figure 2A and data not shown), and these genes were not required for the production of F37D6.3 or Tc1 22G-RNAs (Figures 3A and S4). These findings suggest that 22G-RNA biogenesis is not triggered by dsRNA at these and many other targets. Despite the biochemical evidence indicating that germline 22G-RNAs are the products of RdRP, we observed near wild-type levels of 22G-RNAs in each of the individual RdRP mutants (Figure 3A and 3B) as well as in the MAGO+2 mutant, suggesting additional redundancy within the respective RdRP and AGO families of proteins.
RRF-1 is required for RNAi in somatic tissues (Sijen et al., 2001), while EGO-1 is required for fertility and has been implicated in RNAi targeting some, but not all, germline-expressed genes (Smardon et al., 2000). The latter finding could be explained if RRF-1 is functionally redundant with EGO-1 and is expressed within an overlapping domain in the germline. This might also account for the persistence of germline 22G-RNA expression in the RdRP single mutants analyzed (Figure 3). A non-complementation screen to generate the rrf-1 ego-1 double mutant yielded a rearrangement, neC1, that disrupts rrf-1 and results in a putative null allele of rrf-1 linked to the ego-1(om97) nonsense allele (Figure S5). Northern blot analyses of small RNAs prepared from the rrf-1(neC1) ego-1(om97) double mutant revealed that germline 22G-RNAs fail to accumulate in animals null for both rrf-1 and ego-1 (Figure 3B), demonstrating that RRF-1 and EGO-1 function redundantly in the germline to produce 22G-RNAs.
Consistent with the overlapping functions of RRF-1 and EGO-1 for germline 22G-RNA biogenesis, both RRF-1 and EGO-1 interacted with DRH-3 in immunoprecipitation (IP) experiments (Figure 3C). A recent study demonstrated that DRH-3 interacts with RRF-1 and is required for RdRP activity in vitro (Tabara et al. 2007). We previously identified DRH-3 as a component of the ERI complex, which includes the Tudor-domain protein ERI-5 (Duchaine et al., 2006). EKL-1 is a close homolog of ERI-5 that is required for fertility, RNAi and chromosome segregation (Claycomb et al., cosubmitted). Consistent with the phenotypic similarities between drh-3, ego-1 and ekl-1 mutants (Duchaine et al., 2006; Rocheleau et al., 2008; Vought et al., 2005), EKL-1 also interacted with DRH-3 (Figure 3C). In addition, both EGO-1 and EKL-1 were among the most enriched proteins in DRH-3 IPs as assessed by Multidimensional Protein Identification Technology (MudPIT) (Figure S6). Although DCR-1 was detected in the DRH-3 IP by Western blot, DCR-1 peptides were not identified in DRH-3 IP-MudPIT experiments. Combined with our deep-sequencing data (below), these data suggest that DRH-3, EKL-1 and RdRP form a core RdRP complex that is essential for the biogenesis of 22G-RNAs in C. elegans.
Previously, we demonstrated that the accumulation of secondary siRNAs generated by RdRP is dependent upon multiple, redundant worm-specific AGO (WAGO) proteins. The previously described MAGO mutant was strongly defective for RNAi and failed to accumulate certain endogenous small RNAs (Yigit et al., 2006). However, the MAGO+2 derivative, containing the additional WAGO mutations ppw-2(tm1120) and C04F12.1(tm1637), continued to express normal levels of germline 22G-RNAs targeting F37D6.3 and Tc1 (Figure 3A), indicating that additional WAGOs interact with germline 22G-RNAs.
Therefore, we generated additional combinations of mutants within the WAGO clade (Figure 4A). This analysis identified other WAGO mutant combinations with germline RNAi defects. For example, ppw-2(tm1120); f58g1.1(tm1019) double mutants were resistant to pos-1(RNAi) (data not shown), whereas the individual alleles are sensitive to pos-1(RNAi) (Yigit et al., 2006). A mutant lacking four of the branch III WAGOs, including ppw-2(tm1120), F55A12.1(tm2686), F58G1.1(tm1019) and ZK1248.7(tm1113), was resistant to germline RNAi (data not shown), but still produced normal levels of F37D6.3 germline 22G-RNAs (Figure 4B, Quadruple). Deletion of the branch III WAGO, wago-1(tm1414), resulted in a Quintuple AGO mutant with dramatically reduced germline 22G-RNAs (Figure 4B and 4C). Furthermore, the wago-1(tm1414) mutant alone showed a reduction in F37D6.3 germline 22G-RNAs that was comparable to the Quintuple AGO mutant (Figure 4C), demonstrating that WAGO-1 plays a key role in germline 22G-RNA function. Transgenic lines expressing a GFP::WAGO-1 fusion, under the control of the wago-1 promoter, revealed that WAGO-1 is expressed in the germline and localizes to perinuclear foci that resemble P-granules (Figure 4D and 4E).
Finally, we generated a strain lacking all 12 of the WAGO genes (not including predicted pseudogenes). This duodecuple mutant (MAGO12) is viable, resistant to RNAi, and exhibits a high frequency of spontaneous males and temperature-dependent sterility at 25°C (data not shown). Germline 22G-RNAs were undetectable by Northern blot analysis in the MAGO12 strain (Figure 4C), demonstrating a clear dependence of the 22G-RNAs on WAGOs.
To gain insight into the function of germline 22G-RNAs, we performed deep-sequencing of small RNA populations from mutants with germline 22G-RNA defects. In addition, we generated transgenic animals that express a 3xFLAG::WAGO-1 fusion protein and deep-sequenced the small RNAs that coprecipitate with WAGO-1. For each mutant, the fraction of reads matching coding genes, non-annotated loci and repeat elements was reduced with concomitant increases in the fraction of miRNA and 21U-RNA reads (Figure S7). Conversely, the fraction of reads matching to a particular set of coding genes, non-annotated loci and repeat elements were enriched in the small RNA library prepared from the WAGO-1 immunoprecipitate, while miRNAs and 21U-RNAs were severely depleted (Figure 5A).
We next asked whether the reduction of 22G-RNAs in each mutant occurred globally or at particular loci. 22G-RNAs targeting protein-coding loci were nearly completely eliminated in the drh-3 and ekl-1 single mutants and in the rrf-1 ego-1 double mutant. Gene-targeted 22G-RNAs were far less likely to be depleted in the rde-3, mut-7 and MAGO12 mutant samples (Figure 5B), with the notable exception that 22G-RNA species targeting a subset of genes with normally very high 22G-RNA levels were strongly depleted in each of these mutants (Figure S8). In contrast, 22G-RNAs were largely unaffected in an rde-4 mutant, which is required for ERI-class small RNAs (Duchaine et al., 2006; Lee et al., 2006; J.V., W.G. and C.C.M., unpublished). The 22G-RNAs depleted in the rde-3, mut-7 and MAGO12 mutants were almost completely overlapping (Figure 5C and Table S3). Despite an overall reduction of 22G-RNA reads, a subset of 22G-RNA species was not depleted in rde-3, mut-7 and MAGO12 mutants (Figure 5B and S8); in fact, some were increased in proportion. These WAGO-independent 22G-RNA populations are associated with and dependent upon another germline-expressed AGO, CSR-1 (Figure S9; Claycomb et al., cosubmitted). The bimodal distribution of 22G-RNA loci indicates that at least two qualitatively distinct 22G-RNA pathways exist in the germline that depend on a core set of factors (DRH-3, EKL-1 and RdRP), whose small-RNA products interact with distinct AGOs.
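A locus-by-locus comparison of this kind can be sketched in Python. This is a hedged illustration only: the source does not state the normalization or cutoff used, so the reads-per-million scaling, the two-fold depletion threshold, the minimum wild-type read count, the function names and the toy gene counts are all assumptions.

```python
def depleted_genes(wt_counts, mut_counts, wt_total, mut_total,
                   fold=2.0, min_wt_reads=10):
    """Return genes whose normalized 22G-RNA level drops >= fold in a mutant.

    Counts are scaled to reads per million of each library (an assumed
    normalization) before comparison; genes with fewer than min_wt_reads
    wild-type reads are skipped as too sparse to call.
    """
    depleted = set()
    for gene, wt in wt_counts.items():
        if wt < min_wt_reads:
            continue
        wt_rpm = wt / wt_total * 1e6
        mut_rpm = mut_counts.get(gene, 0) / mut_total * 1e6
        if mut_rpm * fold <= wt_rpm:
            depleted.add(gene)
    return depleted


def jaccard(a, b):
    """Overlap between two gene sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy counts (hypothetical genes and numbers, not data from the study).
wt = {"geneA": 100, "geneB": 100, "geneC": 5}
mutant_x = {"geneA": 10, "geneB": 90}
mutant_y = {"geneA": 20, "geneB": 40}

dep_x = depleted_genes(wt, mutant_x, 1_000_000, 1_000_000)
dep_y = depleted_genes(wt, mutant_y, 1_000_000, 1_000_000)
print(sorted(dep_x), sorted(dep_y), jaccard(dep_x, dep_y))
```

A Jaccard index near 1 across mutant pairs would correspond to the almost complete overlap described above, while loci absent from every depleted set would be candidates for a separate, WAGO-independent population.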
Consistent with the requirement for WAGO-pathway components in exo-RNAi, the WAGO-associated 22G-RNAs appear to be involved in silencing their respective targets. Loci with the highest levels of 22G-RNAs in wild-type were consistently derepressed in the drh-3 mutant as assessed by semi-quantitative, polymerase chain reaction with reverse transcription (qRT-PCR, Figure 5E) and Affymetrix tiling arrays (Figure S10). In contrast, CSR-1 associated 22G-RNAs do not appear to silence their targets (Claycomb et al., cosubmitted), consistent with the biological distinction between these pathways.
Previous work has shown that Tc elements are silenced in the germline by an RNAi mechanism (Ketting et al., 1999; Sijen and Plasterk, 2003; Tabara et al., 1999). Individual Repbase annotations, which include all major classes of transposons in C. elegans were uniformly depleted of 22G-RNAs in drh-3, ekl-1, RdRP, rde-3, mut-7 and MAGO12 mutant samples and most transposon 22G-RNAs were enriched in the WAGO-1 IP sample (Figure 5B). Transposon loci showed normal levels of 22G-RNAs in an rde-4 mutant sample (Figure 5B; Tabara et al., 1999). Thus, the transposon-silencing pathway in C. elegans consists of DRH-3, EKL-1, RdRPs, RDE-3, MUT-7 and multiple WAGOs, including WAGO-1.
The drh-3 alleles described here display the hallmarks of mutator class, Rde mutants (Figure 1). Indeed, spontaneous mutants with phenotypes that revert at high frequency were cloned from the drh-3 mutants, including a dpy-5::Tc5 insertion. The frequency of reversion from Dumpy to wild type, upon excision of Tc5 from dpy-5 in drh-3(ne4253), was similar to an allele of mut-7(ne4255) that was isolated in the same screen, and almost 5-fold higher than the nonsense allele mut-7(pk204) (Figure 5D). Similar results were obtained with an unc-22::Tc1 insertion (data not shown). Furthermore, Tc1 and Tc3 transcripts were derepressed in the drh-3 mutant (Figure 5E), demonstrating that DRH-3 is required for transposon silencing.
Approximately 15% of 22G-RNA reads were derived from non-annotated loci and were dependent on RDE-3, MUT-7 and MAGO12. These loci primarily correspond to unique intergenic sequences and could represent pseudogenes or cryptic loci that lack open reading frames and are unrecognizable by current bioinformatic approaches. In some cases, we could predict potential splicing patterns based on anti-sense reads spanning these non-annotated regions (Figure S11A). Consistent with this notion, 22G-RNAs derived from many loci annotated as pseudogenes were also depleted in rde-3, mut-7 and MAGO12 mutants (Figure 6A). As with annotated genes targeted by WAGO-associated 22G-RNAs, qRT-PCR and microarray analysis demonstrated that both pseudogene and cryptic loci targeted by 22G-RNAs were desilenced in the drh-3 mutant (Figures 6B and S12).
Upon closer inspection of protein coding loci targeted by the WAGO pathway, we noted that the 22G-RNA profile often did not correspond to the annotated gene prediction (Figure S11B and Table S4). In many cases, 22G-RNAs mapped within predicted introns, suggesting that the corresponding introns were not spliced in the target RNA. In other cases, 22G-RNAs started or ended abruptly in the middle of the annotation and extended well upstream or downstream of the gene prediction, suggesting that the annotation is incomplete or incorrect. Lastly, we noticed a number of WAGO-target genes with intron annotations in 3′UTRs.
Because pseudogenes and genes with 3′UTR introns are expected to be targets of the nonsense-mediated decay (NMD) pathway, we asked whether 22G-RNA biogenesis was dependent on the PIN domain protein SMG-5, the Upf1 helicase SMG-2, and the phosphatidylinositol-kinase SMG-1 (Anders et al., 2003; Glavan et al., 2006; Grimson et al., 2004; Page et al., 1999). 22G-RNAs derived from X-loci and K02E2.6, which has a 3′UTR intron, were reduced in the null mutant smg-5(r860) and to a lesser extent in the non-null smg-2(r863) (Figure 6C). K02E2.6 and X-loci 22G-RNAs were unchanged in the temperature-sensitive mutant smg-1(cc546) at both permissive and non-permissive temperatures (Figure 6C and data not shown). These data suggest a role for SMG-2 and SMG-5 in 22G-RNA biogenesis that is distinct from their recognized role in NMD and that NMD per se is not required for 22G-RNA biogenesis.
Deep-sequence analysis of smg-5 mutant small RNAs revealed that SMG-5 is required for the biogenesis of 22G-RNAs targeting 15% of WAGO-dependent 22G-RNA target genes (Figure 6C). Interestingly, SMG-5 was not required for most pseudogene-derived 22G-RNAs (data not shown). Furthermore, the few published endogenous targets of NMD do not appear to be 22G-RNA targets (data not shown). Roughly half of the SMG-5 dependent 22G-RNA loci overlap with RDE-4 dependent 22G-RNA loci (Figure 6D), which includes both ERI-dependent and ERI-independent 22G-RNA loci (J.V., W.G. and C.C.M., unpublished). These findings indicate that multiple WAGO-dependent 22G-RNA pathways exist, which together define a general surveillance system that silences transposons and aberrant transcripts.
In this study, we have combined deep-sequencing with the powerful genetics of C. elegans to identify and characterize an abundant class of endogenous small RNAs that we call 22G-RNAs. In adult animals, 22G-RNAs are primarily germline-expressed and are derived from unique sequences in the genome, including coding genes, transposons, pseudogenes and non-annotated loci. Combining data from three small RNA libraries, including an AGO IP sample, we have identified 22G-RNAs antisense to over 50% of the annotated protein-coding genes.
DRH-3, EKL-1 and the partially redundant RdRPs, RRF-1 and EGO-1, form a core RdRP complex that functions in multiple 22G-RNA pathways (Figure 7). RDE-3, MUT-7 and members of the WAGO clade, in particular WAGO-1, define a general 22G-RNA surveillance system that silences transposable elements and aberrant transcripts. A second pathway is dependent on CSR-1, which promotes kinetochore structure and chromosome segregation (Figure 7; Claycomb et al., cosubmitted). Taken together, 22G-RNAs appear to engage targets that derive from both the actively expressed regions (CSR-1 associated) and the ‘silent’ regions of the genome (WAGO-1 associated). These findings support a model in which the 22G-RNA pathways exert genome-scale surveillance important for maintenance of the germline.
It is formally possible that maternal DCR-1 is sufficient to generate primary siRNAs and that subsequent recruitment of the secondary RNAi machinery, dependent on RdRPs and WAGOs, results in a self-sustaining amplification cycle to produce 22G-RNAs in dcr-1 zygotic mutants. However, extensive Northern blot experiments demonstrate that ERI-dependent 22G-RNAs and miRNAs are depleted in the dcr-1 mutant despite the maternal contribution of DCR-1, while germline 22G-RNAs are present at normal or in some cases elevated levels (Figure S4). Aside from miRNA and exo-RNAi complexes, the ERI complex appears to be the primary DCR-1 complex in C. elegans (Duchaine et al., 2006). We have shown that 22G-RNAs are largely independent of RDE-4, a DCR-1 cofactor in both exo-RNAi and ERI pathways (Tabara et al., 2002; Duchaine et al., 2006; Lee et al., 2007). Thus, we favor the model that the major 22G-RNA pathways are initiated in a DCR-1 independent fashion. New alleles of dcr-1 will be important in order to resolve this issue in the future.
How might Dicer-independent 22G-RNA biogenesis be triggered and what role does DRH-3 play in this process? A recent report has shown that drh-3 null mutant extracts are deficient in the in vitro synthesis of antisense small RNA by RRF-1 (Aoki et al., 2007). Our analysis of 22G-RNA levels in hypomorphic drh-3 point mutant alleles is consistent with a role for the conserved DRH-3 helicase domain in 22G-RNA biogenesis. Two of the three drh-3 missense alleles alter highly conserved residues within the HELICc domain. The drh-3(ne4253) lesion (T834M) alters a residue that contacts RNA in the Vasa crystal structure; the drh-3(ne3197) lesion (G840D) alters a residue that maps to the interface between the HELICc and DExH domains and coordinates the water molecule that is thought to be required for ATP hydrolysis (Sengoku et al., 2006). Both lesions are likely to abrogate ATPase and/or unwinding activity based on structural and biochemical studies with related proteins (Liang et al., 1994; Sengoku et al., 2006).
The enrichment of 22G-RNAs at the 3′ end of transcripts suggests that 22G-RNA biogenesis begins at the 3′ end of target RNAs, followed by cycles of 22G-RNA synthesis by RdRP that proceed along the template toward the 5′ end. Interestingly, the 3′ localized 22G-RNAs are least diminished in the drh-3 mutant, suggesting that these lesions do not prevent the initial loading of RdRP onto the template, but rather interfere with the processivity of RdRP. DRH-3 could remove secondary structure from the template or facilitate transfer of the 22G-RNA to downstream WAGOs, allowing RdRP to initiate a second round of synthesis at the next available C residue in the template RNA.
Whatever the mechanism by which DRH-3 promotes 22G-RNA biogenesis, our data support the idea that RdRP is recruited to the 3′ end of target transcripts. Templates lacking a poly(A) tail are better substrates for RdRP in vitro (Aoki et al., 2007), suggesting that defective 3′ end formation may be one trigger for 22G-RNA biogenesis. In Arabidopsis, decapped, mis-spliced and mis-terminated transcripts are recognized by, and activate, the RNA-silencing machinery (Gazzani et al., 2004; Herr et al., 2006). In fission yeast, two β-nucleotidyl transferases, Cid12 and Cid14, determine whether transcripts are recognized by the RdRP complex (Cid12) or the TRAMP/exosome complex (Cid14) (Buhler et al., 2007; Buhler et al., 2008; Motamedi et al., 2004). In cid14 mutants, transcripts that are normally turned over by the TRAMP surveillance pathway become substrates for the RdRP complex (Buhler et al., 2008), indicating that these pathways recognize a common feature. Perhaps the β-nucleotidyl transferase RDE-3 and the 3′-to-5′ exonuclease MUT-7 function in an exosome-like pathway that recognizes and processes the 3′ end of aberrant transcripts, providing a signal that recruits the RdRP complex (Figure 7) (Chen et al., 2005; Ketting et al., 1999; Lee et al., 2006; Tabara et al., 1999).
Based on studies in other model systems, we expected a significant fraction of germline 22G-RNAs to be derived from transposons. Indeed, transposable elements are targets of 22G-RNAs. However, 22G-RNAs derived from unique sequences, both genic and intergenic, comprise a major fraction of the 22G-RNAs that interact with WAGO-1. Furthermore, loci that produce the highest levels of 22G-RNAs appear to interact with WAGO-1 and are the most desilenced in the drh-3 mutant. Remarkably, CSR-1 interacts with a non-overlapping population of 22G-RNAs derived almost exclusively from protein coding loci (Figure S9; Claycomb et al., cosubmitted). Loci targeted by CSR-1 produce fewer 22G-RNAs than WAGO-1 loci and are not desilenced in either csr-1 or drh-3 mutants (Claycomb et al., cosubmitted). These findings indicate that these pathways are mechanistically or functionally distinct (Figure 7), a conclusion that is consistent with the genetically defined functions of CSR-1 and WAGO-1. Although both pathways seem to be important for an efficient response to foreign dsRNA (Yigit et al., 2006; Claycomb et al., cosubmitted), it seems likely that the WAGO surveillance system is primarily involved, as WAGOs were shown to be limiting for RNAi and to interact directly with the secondary 22G-RNAs in the amplification cycle (Yigit et al., 2006). csr-1 mutants disrupt the perinuclear localization of the germline nuage (Claycomb et al., cosubmitted) wherein WAGO-1 resides, and hence could indirectly affect WAGO-1 function by interfering with its proper localization within these germline structures. Further genetic and biochemical studies will be necessary to dissect the relative contributions of the WAGO-1 and CSR-1 pathways to RNAi.
At least some of the specificity of CSR-1 and WAGO 22G-RNA pathways can be attributed to the involvement of distinct RdRP complexes. We have uncovered a role for RRF-1 in the germline, where it is redundant with EGO-1 in a surveillance pathway that regulates transposons, pseudogenes and cryptic loci. However, EGO-1 alone is required for the 22G-RNAs that associate with CSR-1 (Claycomb et al., cosubmitted). In addition, van Wolfswinkel et al. (cosubmitted) implicate the β-nucleotidyl transferase CDE-1 as a specificity factor for EGO-1 in the chromosome segregation pathway, as EGO-1 can also function with RDE-3 (a CDE-1 homolog) for the biogenesis of WAGO-associated 22G-RNAs (Figure 7).
Despite clear genetic redundancy among the WAGOs, we expect that individual WAGOs normally function in distinct pathways (Figure 7). Consistent with this idea, RDE-4 and SMG-5 are required for the biogenesis of distinct and overlapping subsets of WAGO-dependent 22G-RNAs. Domeier et al. (2000) showed that the exo-RNAi response is short-lived in smg-2, -5 and -6 mutants. Our findings extend their work and provide molecular insight into the link between the NMD pathway and RNAi. Both studies connect 22G-RNA biogenesis to the translation apparatus and suggest that an alternative branch of the NMD pathway exists (see Behm-Ansmant et al., 2007). Perhaps SMG proteins recognize a particular characteristic of 22G-RNA target transcripts and recruit the RdRP machinery. Our findings indicate that the signal is unlikely to be premature termination codons. Alternatively, NMD components could function as WAGO cofactors. WAGOs lack the catalytic residues important for Slicer activity and are not expected to cleave a target in vivo, suggesting that alternative turnover mechanisms are involved in silencing. It is interesting to note that the PIN domains of both SMG-5 and SMG-6 are structurally related to RNase H (Glavan et al., 2006), but only SMG-6 retains the catalytic residues important for the endonucleolytic cleavage that initiates NMD (Eberle et al., 2009; Huntzinger et al., 2008). If WAGOs do indeed lack catalytic activity, perhaps SMG-6 could provide the endonuclease activity that reinforces the 22G-RNA amplification cycle for this set of targets. Additional complexity and bifurcation/convergence of WAGO-dependent 22G-RNA pathways are likely to emerge as we identify new factors required for 22G-RNA biogenesis.
The Mutator phenotypes of mut-7, rde-3 and drh-3 could, in part, result from defects in WAGO-dependent chromatin silencing (Figure 7). WAGO-12/NRDE-3 is a nuclear WAGO required for co-transcriptional silencing (Guang et al., 2008), but not for the accumulation of 22G-RNAs. However, mutants that block the biogenesis of 22G-RNAs prevent nuclear localization of NRDE-3 and exacerbate derepression of NRDE-3 targets, suggesting additional WAGOs are involved in a parallel, post-transcriptional silencing pathway. It will be of interest in the future to dissect the potential role of 22G-RNAs in different chromatin-mediated silencing pathways.
In several respects, including transposon control and DCR-1-independent small RNA biogenesis, the WAGO 22G-RNA system is analogous to the Drosophila and vertebrate piRNA pathways. Furthermore, we have shown that the WAGO and CSR-1 22G-RNA systems are maternal and that factors involved in these pathways localize to germline P-granules (Claycomb et al., cosubmitted), which are thought to function in the repression and storage of maternal mRNAs (Rajyaguru and Parker, 2009). This appears to be a common feature of germline small RNA pathways in animals, as PIWI family members also localize to P-granules or nuage (Batista et al., 2008; Li et al., 2009; Malone et al., 2009). In each case, the localization of AGO proteins to P-granules appears to be dependent on small RNA biogenesis. For the CSR-1 pathway, the P-granule structure itself seems to be dependent on small RNA biogenesis (Claycomb et al., cosubmitted). The close association between cytoplasmic P-granules and nuclear pores would allow AGO systems to survey the entire transcriptome as RNAs exit the nucleus and enter the P-granule, reinforcing both the biogenesis and regulatory functions of small RNAs.
Maternal small RNAs function in a number of epigenetic programs from transposon silencing (Brennecke et al., 2008; Tam et al., 2008; Watanabe et al., 2008) to imprinting (Davis et al., 2005) to the maternal-zygotic transition (Giraldez et al., 2006; Lykke-Andersen et al., 2008). Our findings suggest that 22G-RNAs in C. elegans mirror the expression of many germline-expressed RNAs, including those destined for expression as well as silencing. Thus 22G-RNAs and their AGO partners provide versatile regulators of both physical and epigenetic inheritance.
Standard methods were used for preparing and processing samples for RNA and protein analyses. Antibodies used in this study include: (1) affinity-purified, anti-DRH-3 polyclonal; (2) anti-DCR-1 polyclonal (Duchaine et al., 2006); (3) affinity-purified, anti-EKL-1 (Claycomb et al., accompanying); (4) affinity-purified, anti-RRF-1 polyclonal; (5) affinity-purified, anti-EGO-1 polyclonal; (6) anti-FLAG M2 monoclonal (Sigma); (7) HRP-conjugated, anti-rabbit IgG secondary antibodies (Jackson ImmunoResearch). Details are provided in Supplemental Materials.
Small RNA amplicons were prepared essentially as described (Ambros et al., 2003; Lim et al., 2003) with some modifications. Libraries were sequenced using an Illumina 1G Genome Analyzer at the Center for AIDS Research, UMass Medical School or at the Center for Genome Research and Biocomputing, Oregon State University. Detailed methods for cloning small RNAs are provided in Supplemental Materials.
Small RNA sequences were processed and mapped to the C. elegans genome (Wormbase release WS192) as well as Repbase (13.07) using Perl (5.8.6). Details are included in Supplemental Materials.
Samples for tiling array analysis were prepared as described (Batista et al., 2008). Probe signals were calculated using Affymetrix Tiling Analysis Software 1.1.2 (bandwidth: 30; intensities: PM/MM) with three replicates each of drh-3(ne4253) (experimental) and wild-type (control). Affymetrix probe coordinates (release WS170) were converted to release WS192 coordinates using a Perl script. Gene expression values were defined as the geometric mean of all probe signals within a gene that had a P-value of <0.1 and were present in both datasets. Actin was used to normalize expression values prior to comparison.
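The gene-level summary described above can be sketched as follows. This is an illustrative reconstruction rather than the script used in this study: the function names, the dictionary layout, and the reading that a probe must pass the P-value cutoff in both the mutant and wild-type datasets are assumptions, as is the use of positive, linear-scale probe signals.

```python
import math

P_CUTOFF = 0.1  # probe-level P-value threshold stated in the text


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def gene_expression(probes_mut, probes_wt, p_cutoff=P_CUTOFF):
    """Summarize one gene in two datasets.

    probes_mut, probes_wt: dict mapping a probe ID to (signal, p_value)
    for the probes that fall within the gene (hypothetical layout).
    Only probes found in both datasets with P < p_cutoff in each are used
    (an assumed reading of "present in both datasets").
    Returns (mutant_value, wild_type_value), or None if no probe qualifies.
    """
    shared = [
        pid for pid in probes_mut
        if pid in probes_wt
        and probes_mut[pid][1] < p_cutoff
        and probes_wt[pid][1] < p_cutoff
    ]
    if not shared:
        return None
    mut = geometric_mean([probes_mut[pid][0] for pid in shared])
    wt = geometric_mean([probes_wt[pid][0] for pid in shared])
    return mut, wt


def normalize(expression, actin_expression):
    """Scale a gene's value by the actin value from the same dataset."""
    return expression / actin_expression


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    mutant = {"p1": (2.0, 0.01), "p2": (8.0, 0.05), "p3": (100.0, 0.5)}
    wild_type = {"p1": (4.0, 0.02), "p2": (4.0, 0.01), "p4": (1.0, 0.01)}
    print(gene_expression(mutant, wild_type))
```

The geometric mean is computed in log space to limit the influence of a few very bright probes; dividing each value by the actin value from the same dataset puts the mutant and wild-type values on a common scale before they are compared.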
We thank E. Kittler and UMass Deep Sequencing Core facility for Solexa-sequencing; M. Hammell for help with statistics; R. Ketting for sharing unpublished data; and the CGC for providing strains. P.J.B. and D.A.C. were supported by SFRH/BD/11803/2003 (P.J.B.) and SFRH/BD/17629/2004/H6BM (D.A.C.) from Fundação para Ciência e Tecnologia, Portugal. J.M.C. was an HHMI fellow of the LSRF. J.J.M. is supported by NIH grant DK074798. E.M.Y. is a Damon Runyon Fellow supported by the DRCRF (DRG-1983-08). J.R.Y. is supported by R41 RR011823 from the Yeast Resource Center. C.C.M. is a Howard Hughes Medical Institute Investigator. This work was supported in part by Ruth L. Kirschstein N.R.S.A. GM63348 (D.C.) and R01 grant GM58800 (C.C.M.) from the NIGMS.