Novel alleles of drh-3 disrupt RNAi
A previous study identified DRH-3 as a Dicer-interacting factor required for germline RNAi and for viability (
Duchaine et al., 2006). Animals homozygous for the
drh-3(tm1217) deletion (a putative null allele) are infertile, and RNAi targeting
drh-3 results in a penetrant embryonic lethal phenotype with defects in chromosome segregation and in the production of both Dicer-dependent and Dicer-independent small RNA populations (
Duchaine et al., 2006;
Nakamura et al., 2007).
Our screens for RNAi-deficient (Rde) strains identified three additional alleles of
drh-3. Mutants bearing these non-null alleles are homozygous viable at 20°C but are infertile at 25°C. Each allele alters a distinct amino acid within the HELICc domain of the putative helicase (), and, based on genetic tests, each behaves like a partial loss-of-function mutation. Consistent with previous work demonstrating that DRH-3 is required for germline RNAi (
Duchaine et al., 2006), these
drh-3 point mutants were defective for RNAi targeting the maternal gene
pos-1 (), and exhibited varying degrees of somatic RNAi deficiency. For example, the
drh-3(ne3197) mutant was strongly resistant to RNAi targeting the somatically expressed basement membrane collagen
let-2, while the
ne4253 and
ne4254 alleles were partially and fully sensitive, respectively ().
Although the drh-3 point mutants exhibit a range of phenotypes that increase in penetrance at 25°C, they do not appear to be classic temperature-sensitive mutants. Null alleles of many germline factors, including several RNAi-pathway genes, cause similar conditional-sterile phenotypes that likely reflect an underlying temperature-dependent process in the germline that is uncovered in these mutant backgrounds. The spectrum of phenotypes observed in the drh-3 point mutant strains, including sterility, embryonic lethality and an increased frequency of spontaneous males (), is similar to that observed for null alleles of several mutator-class Rde strains. As observed with mutator strains, the RNAi defect associated with the drh-3 alleles was not affected by temperature: each allele was fully resistant to pos-1 RNAi at the permissive temperature of 20°C ( and not shown).
To determine whether the
drh-3 point mutants showed small RNA defects similar to the deletion mutant (
Duchaine et al., 2006), small RNA populations were isolated and analyzed by Northern blot to detect previously characterized endogenous small RNAs (
Figure S1). All of the previously examined DRH-3 dependent endogenous small RNAs were dramatically reduced or undetected in the
drh-3 point-mutant samples, while miRNA biogenesis was unaffected. These results are consistent with the notion that these novel alleles of
drh-3 cause a partial loss of
drh-3 function.
DRH-3 is essential for the biogenesis of 22G-RNAs
In the course of analyzing the small RNA phenotypes associated with the
drh-3 mutants, we observed that a prominent small RNA species of ~22nt was virtually absent in small RNA samples prepared from each of the
drh-3 mutants (). A second prominent small RNA species of ~21nt appeared to be unaffected in
drh-3. Neither small RNA species was altered in samples prepared from a
dcr-1(ok247) deletion mutant (). Thin-layer chromatography experiments indicated that the 22nt small RNAs have a 5′ guanosine (5′G) and are resistant to terminator exonuclease, suggesting the presence of a 5′-cap or polyphosphate (). The 21nt small RNAs comprise both 5′G and 5′ uracil (5′U) species, the latter of which we recently identified as the 21U-RNAs, piRNAs associated with PRG-1 (
Batista et al., 2008). In contrast to 21U-RNAs, the majority of the 22nt small RNAs were sensitive to periodate (data not shown), indicating that the 3′ end is not modified (
Ruby et al., 2006). Both 22nt and 21nt 5′G small RNAs were dramatically reduced in the
drh-3 mutants, whereas 5′U small RNAs were unaffected. These data are consistent with the notion that the 22nt small RNAs represent an abundant pool of 5′-triphosphorylated products of endogenous RNA-dependent RNA polymerase (RdRP) (
Aoki et al., 2007;
Pak and Fire, 2007;
Sijen et al., 2007).
To identify and characterize the DRH-3 dependent small RNAs on a genome-wide level, small RNAs between 18 and 26nt were cloned from wild-type and drh-3 mutant animals using a protocol compatible with cloning small RNAs bearing 5′-triphosphates. Illumina sequencing of wild-type and drh-3 libraries yielded 2.34 million and 4.33 million reads that perfectly match the C. elegans genome, respectively. Consistent with our biochemical analyses, both the size distribution and first nucleotide composition of small RNAs were dramatically altered in the drh-3 sample (). In the wild-type sample, reads peaked sharply at 21 and 22nt, which comprised 25% and 36% of the total reads, respectively. Whereas 21nt reads had similar levels of 5′U and 5′G residues (11% and 10% of total reads, respectively), ~60% of 22nt reads started with 5′G (~21% of total reads). Strikingly, reads with 5′G were strongly depleted from the drh-3 mutant sample, resulting in a dramatic enrichment of reads with 5′U.
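The size and first-nucleotide tabulation described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the toy reads, and the choice to express each class as a fraction of all reads inside the 18-26nt window are assumptions made for illustration only.

```python
from collections import Counter

def length_and_first_nt_profile(reads, min_len=18, max_len=26):
    """Return {(length, first_nt): fraction of reads} for reads in the size window.

    `reads` is any iterable of sequence strings (hypothetical input format).
    DNA-alphabet reads are converted to RNA so that 5'T is reported as 5'U.
    """
    counts = Counter()
    for seq in reads:
        seq = seq.upper().replace("T", "U")
        if min_len <= len(seq) <= max_len:
            counts[(len(seq), seq[0])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}

def fraction_of_length(profile, length):
    """Sum the fractions of all first-nucleotide classes at one read length."""
    return sum(frac for (size, _), frac in profile.items() if size == length)

# Toy example (invented sequences, not data from this study):
toy_reads = [
    "G" + "A" * 21,  # 22nt, 5'G
    "G" + "C" * 21,  # 22nt, 5'G
    "U" + "A" * 20,  # 21nt, 5'U
    "G" + "A" * 20,  # 21nt, 5'G
    "ACGU",          # too short, ignored
]
toy_profile = length_and_first_nt_profile(toy_reads)
```

Here `toy_profile[(22, "G")]` gives the share of all in-window reads that are 22nt with a 5′G, the quantity reported as a percentage of total reads in the text.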
After removing structural RNA degradation products from the dataset, about one-third of the wild-type reads matched miRNAs (24.7%) or 21U-RNAs (11%) (). The most abundant class of small RNA reads from wild-type samples (34%) was antisense to protein-coding genes. The remaining small RNA reads were derived from transposons and other repetitive loci (~10%), as well as non-annotated loci (~20%). In contrast, the
drh-3 sample was strongly depleted of endogenous small RNAs targeting protein coding genes, pseudogenes, repeats and non-annotated loci, but enriched proportionately for reads matching miRNAs and 21U-RNAs. Together, our biochemical and deep sequencing data indicated that DRH-3 is essential for the biogenesis of an abundant class of endogenous small RNAs expressed in
C. elegans. Based on the propensity for 22nt length and 5′G residue, we refer to these small RNAs as 22G-RNAs, which include what were previously identified as endogenous siRNAs (
Ambros et al., 2003;
Ruby et al., 2006).
22G-RNAs are enriched at transcript termini
While examining the distribution of 22G-RNAs targeting protein-coding loci, we observed that 22G-RNAs were most abundant toward the 3′ end of many protein-coding transcripts. For example, 22G-RNAs were clearly enriched at the 3′ ends of both rrf-1 and ama-1 transcripts in the wild-type sample and tapered toward the 5′ end (). Interestingly, 22G-RNAs were also enriched at the 5′ end of rrf-1. Remarkably, the residual rrf-1 and ama-1 22G-RNAs that were cloned from drh-3 mutants mapped almost exclusively to the 3′ end of these loci. This pattern might be expected if RdRP initiates 22G-RNA biogenesis at or near the 3′ end of a transcript.
To examine this on a larger scale, each transcript for which 22G-RNA reads occur in both wild-type and
drh-3 mutant samples was divided into 20 consecutive intervals of equal size. The total number of 22G-RNAs that map to each interval was plotted for both wild-type and
drh-3. In wild-type, 22G-RNAs were more abundant toward both the 5′ and 3′ termini of predicted transcripts (). This terminal distribution of 22G-RNAs represented a general trend for most target genes, and was not caused by a few genes with a high number of small RNAs at either end (
Figure S2).
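The interval analysis described above can be sketched in Python as follows. This is an illustrative outline, not the authors' code: it assumes each 22G-RNA read has already been assigned a single 0-based position in transcript coordinates, counted from the 5′ end of the target transcript, and the function and variable names are hypothetical.

```python
def bin_index(position, transcript_length, n_bins=20):
    """Map a 0-based transcript position (5'->3') to one of n_bins equal intervals."""
    if not 0 <= position < transcript_length:
        raise ValueError("position lies outside the transcript")
    # Integer arithmetic avoids floating-point edge effects at bin boundaries.
    return position * n_bins // transcript_length

def metagene_profile(read_positions, transcript_lengths, n_bins=20):
    """Sum read counts per relative interval over all transcripts.

    read_positions: iterable of (transcript_id, position) pairs.
    transcript_lengths: dict mapping transcript_id to length in nt.
    """
    profile = [0] * n_bins
    for transcript_id, position in read_positions:
        length = transcript_lengths[transcript_id]
        profile[bin_index(position, length, n_bins)] += 1
    return profile

# Toy example (invented transcripts and positions):
toy_lengths = {"tx_a": 1000, "tx_b": 200}
toy_positions = [("tx_a", 0), ("tx_a", 750), ("tx_a", 999),
                 ("tx_b", 10), ("tx_b", 199)]
toy_bins = metagene_profile(toy_positions, toy_lengths)
```

With 20 intervals, indices 0-14 cover the first 75% of each transcript and indices 15-19 cover the 3′-most 25%, regardless of transcript length. Summing raw counts lets highly targeted genes dominate the profile, so a per-gene normalization would be needed to ask whether the trend holds for most targets.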
Markedly different results were obtained when this analysis was applied to the
drh-3 mutant data. In
drh-3 mutants, while 22G-RNAs were greatly depleted, they were not depleted uniformly across their targets (Figure and
S2). The levels of 22G-RNAs within the first 15 bins (representing 75% of the transcript length) were disproportionately depleted, resulting in a marked exponential trend in 22G-RNA levels over the remaining 5 bins, peaking at the 3′ end. This 3′ end enrichment was evident for about 86% of the 22G-RNA targets. Together, these results raise the possibility that 22G-RNA biogenesis is initiated at the 3′ end of most targets and that the wild-type activity of
drh-3 promotes the propagation of 22G-RNA biogenesis by RdRP along the template RNA.
Distinct genetic requirements for 22G-RNAs
To characterize the genetic pathways required for the biogenesis of 22G-RNAs, we examined a panel of RNAi-related mutants by Northern blot analysis to detect small RNAs derived from representative abundant 22G-RNA loci. In addition, we performed small RNA cloning experiments to identify 22G-RNA populations that are enriched in the germline, soma or oocyte (
Figure S3; Tables
S1 and
S2;
Supplemental Discussion). This exercise revealed that most 22G-RNAs are germline expressed and/or maternal (
Figure S3). As expected from our deep sequencing experiments, we found that 22G-RNAs enriched in both the soma (
Y47H10A.5) and germline (
F37D6.3 and Tc1) were dependent on DRH-3. Likewise, RDE-3 and MUT-7 were required for 22G-RNAs targeting all three loci (), indicating that at least one aspect of the mechanism of 22G-RNA expression is shared by these somatic and germline loci.
Previous work has shown that some somatic 22G-RNA loci (e.g.
K02E2.6 and X-cluster) are dependent on the ERI endo-RNAi complex as well as multiple AGO proteins that interact with secondary siRNAs produced by RdRP (
Duchaine et al., 2006;
Yigit et al., 2006). However, somatic 22G-RNAs derived from
Y47H10A.5 () were independent of the ERI pathway genes
rrf-3 and
ergo-1. Instead,
Y47H10A.5 22G-RNAs fail to accumulate in exogenous (exo) RNAi pathway mutants, including
rde-4,
rde-1 and
rrf-1 as well as a derivative of the multiple AGO mutant MAGO (
Yigit et al., 2006) with two additional AGO mutations (MAGO+2), suggesting that
Y47H10A.5 22G-RNAs could be triggered by dsRNA.
Germline 22G-RNAs appear to be independent of the exo-RNAi and ERI pathways. For example, mutations in
dcr-1,
rde-4 or
ergo-1 caused no visible depletion in 22G-RNA populations based on ethidium bromide staining of small RNA ( and data not shown), and these genes were not required for the production of
F37D6.3 or Tc1 22G-RNAs (Figures and
S4). These findings suggest that 22G-RNA biogenesis is not triggered by dsRNA at these and many other targets. Despite the biochemical evidence indicating that germline 22G-RNAs are the products of RdRP, we observed near wild-type levels of 22G-RNAs in each of the individual RdRP mutants () as well as in the MAGO+2 mutant, suggesting additional redundancy within the respective RdRP and AGO families of proteins.
DRH-3, the Tudor-domain protein EKL-1 and two partially redundant RdRPs form a core complex essential for 22G-RNA biogenesis
RRF-1 is required for RNAi in somatic tissues (
Sijen et al., 2001), while EGO-1 is required for fertility and has been implicated in RNAi targeting some, but not all, germline-expressed genes (
Smardon et al., 2000). The latter finding could be explained if RRF-1 is functionally redundant with EGO-1 and is expressed within an overlapping domain in the germline. This might also account for the persistence of germline 22G-RNA expression in the RdRP single mutants analyzed (). A non-complementation screen to generate the
rrf-1 ego-1 double mutant yielded a rearrangement,
neC1, that disrupts
rrf-1 and results in a putative null allele of
rrf-1 linked to the
ego-1(om97) nonsense allele (
Figure S5). Northern blot analyses of small RNAs prepared from the
rrf-1(neC1) ego-1(om97) double mutant revealed that germline 22G-RNAs fail to accumulate in animals null for both
rrf-1 and
ego-1 (), demonstrating that RRF-1 and EGO-1 function redundantly in the germline to produce 22G-RNAs.
Consistent with the overlapping functions of RRF-1 and EGO-1 for germline 22G-RNA biogenesis, both RRF-1 and EGO-1 interacted with DRH-3 in immunoprecipitation (IP) experiments (). A recent study demonstrated that DRH-3 interacts with RRF-1 and is required for RdRP activity
in vitro (Tabara et al. 2007). We previously identified DRH-3 as a component of the ERI complex, which includes the Tudor-domain protein ERI-5 (
Duchaine et al., 2006). EKL-1 is a close homolog of ERI-5 that is required for fertility, RNAi and chromosome segregation (Claycomb et al., cosubmitted). Consistent with the phenotypic similarities between
drh-3,
ego-1 and
ekl-1 mutants (
Duchaine et al., 2006;
Rocheleau et al., 2008;
Vought et al., 2005), EKL-1 also interacted with DRH-3 (). In addition, both EGO-1 and EKL-1 were among the most enriched proteins in DRH-3 IPs as assessed by Multidimensional Protein Identification Technology (MudPIT) (
Figure S6). Although DCR-1 was detected in the DRH-3 IP by Western blot, DCR-1 peptides were not identified in DRH-3 IP-MudPIT experiments. Combined with our deep-sequencing data (below), these data suggest that DRH-3, EKL-1 and RdRP form a core RdRP complex that is essential for the biogenesis of 22G-RNAs in
C. elegans.
WAGO-family AGOs function redundantly and interact with germline 22G-RNAs
Previously, we demonstrated that the accumulation of secondary siRNAs generated by RdRP is dependent upon multiple, redundant worm-specific AGO (WAGO) proteins. The previously described MAGO mutant was strongly defective for RNAi and failed to accumulate certain endogenous small RNAs (
Yigit et al., 2006). However, the MAGO+2 derivative, containing the additional WAGO mutations
ppw-2(tm1120) and
C04F12.1(tm1637), continued to express normal levels of germline 22G-RNAs targeting
F37D6.3 and Tc1 (), indicating that additional WAGOs interact with germline 22G-RNAs.
Therefore, we generated additional combinations of mutants within the WAGO clade (). This analysis identified other WAGO mutant combinations with germline RNAi defects. For example,
ppw-2(tm1120);
F58G1.1(tm1019) double mutants were resistant to
pos-1(RNAi) (data not shown), whereas the individual alleles are sensitive to
pos-1(RNAi) (
Yigit et al., 2006). A mutant lacking four of the branch III WAGOs, including
ppw-2(tm1120),
F55A12.1(tm2686),
F58G1.1(tm1019) and
ZK1248.7(tm1113), was resistant to germline RNAi (data not shown), but still produced normal levels of
F37D6.3 germline 22G-RNAs (). Introducing a deletion of a fifth branch III WAGO,
wago-1(tm1414), into this background resulted in a Quintuple AGO mutant with dramatically reduced germline 22G-RNAs (). Furthermore, the
wago-1(tm1414) mutant alone showed a reduction in
F37D6.3 germline 22G-RNAs that was comparable to the Quintuple AGO mutant (), demonstrating that WAGO-1 plays a key role in germline 22G-RNA function. Transgenic lines expressing a GFP::WAGO-1 fusion, under the control of the
wago-1 promoter, revealed that WAGO-1 is expressed in the germline and localizes to perinuclear foci that resemble P-granules ().
Finally, we generated a strain lacking all 12 of the WAGO genes (not including predicted pseudogenes). This duodecuple mutant (MAGO12) is viable, resistant to RNAi, and exhibits a high frequency of spontaneous males and temperature-dependent sterility at 25°C (data not shown). Germline 22G-RNAs were undetectable by Northern blot analysis in the MAGO12 strain (), demonstrating a clear dependence of the 22G-RNAs on WAGOs.
WAGOs, RDE-3 and MUT-7 are required for germline 22G-RNA silencing pathways
To gain insight into the function of germline 22G-RNAs, we performed deep-sequencing of small RNA populations from mutants with germline 22G-RNA defects. In addition, we generated transgenic animals that express a 3xFLAG::WAGO-1 fusion protein and deep-sequenced the small RNAs that coprecipitate with WAGO-1. For each mutant, the fraction of reads matching coding genes, non-annotated loci and repeat elements was reduced with concomitant increases in the fraction of miRNA and 21U-RNA reads (
Figure S7). Conversely, reads matching a particular set of coding genes, non-annotated loci and repeat elements were enriched in the small RNA library prepared from the WAGO-1 immunoprecipitate, while miRNAs and 21U-RNAs were severely depleted ().
We next asked whether the reduction of 22G-RNAs in each mutant occurred globally or at particular loci. 22G-RNAs targeting protein-coding loci were nearly completely eliminated in the
drh-3 and
ekl-1 single mutants and in the
rrf-1 ego-1 double mutant. Gene-targeted 22G-RNAs were far less likely to be depleted in the
rde-3,
mut-7 and MAGO12 mutant samples (), with the notable exception that 22G-RNA species targeting a subset of genes with normally very high 22G-RNA levels were strongly depleted in each of these mutants (
Figure S8). In contrast, 22G-RNAs were largely unaffected in an
rde-4 mutant; RDE-4 is required for ERI-class small RNAs (
Duchaine et al., 2006;
Lee et al., 2006; J.V., W.G. and C.C.M., unpublished). The 22G-RNAs depleted in the
rde-3,
mut-7 and MAGO12 mutants were almost completely overlapping (
Figure 5C and
Table S3). Despite an overall reduction of 22G-RNA reads, a subset of 22G-RNA species was not depleted in
rde-3,
mut-7 and MAGO12 mutants ( and
S8); in fact, some were increased in proportion. These WAGO-independent 22G-RNA populations are associated with and dependent upon another germline-expressed AGO, CSR-1 (
Figure S9; Claycomb et al., cosubmitted). The bimodal distribution of 22G-RNA loci indicates that at least two qualitatively distinct 22G-RNA pathways exist in the germline that depend on a core set of factors (DRH-3, EKL-1 and RdRP), whose small-RNA products interact with distinct AGOs.
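One way to ask whether 22G-RNAs are lost globally or at particular loci is to compare normalized per-gene read counts between a mutant and wild type. The Python sketch below illustrates the idea only. The normalization to total library size, the pseudocount, the two-fold cutoff and all names are assumptions for illustration, not parameters taken from this study.

```python
import math

def reads_per_million(gene_counts, total_reads):
    """Scale raw per-gene 22G-RNA counts to reads per million library reads."""
    return {gene: count * 1e6 / total_reads for gene, count in gene_counts.items()}

def log2_ratios(mutant_rpm, wild_type_rpm, pseudocount=1.0):
    """log2(mutant / wild type) per gene; the pseudocount avoids division by zero."""
    genes = set(mutant_rpm) | set(wild_type_rpm)
    return {
        gene: math.log2((mutant_rpm.get(gene, 0.0) + pseudocount)
                        / (wild_type_rpm.get(gene, 0.0) + pseudocount))
        for gene in genes
    }

def depleted_genes(ratios, threshold=-1.0):
    """Genes whose 22G-RNA level drops at least two-fold (hypothetical cutoff)."""
    return {gene for gene, ratio in ratios.items() if ratio <= threshold}

# Toy example (invented gene names and counts):
wt_rpm = reads_per_million({"geneA": 100, "geneB": 100}, total_reads=1_000_000)
mutant_x = log2_ratios(
    reads_per_million({"geneA": 10, "geneB": 120}, 1_000_000), wt_rpm)
mutant_y = log2_ratios(
    reads_per_million({"geneA": 5, "geneB": 90}, 1_000_000), wt_rpm)
shared = depleted_genes(mutant_x) & depleted_genes(mutant_y)
```

Intersecting the depleted sets from several mutants, as in `shared`, corresponds to asking how far the affected loci overlap. Because a mutant that loses most 22G-RNAs shifts the library toward miRNAs and 21U-RNAs, normalizing to total reads can distort the ratios, and a reference class such as miRNA reads may be a better denominator in practice.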
Consistent with the requirement for WAGO-pathway components in exo-RNAi, the WAGO-associated 22G-RNAs appear to be involved in silencing their respective targets. Loci with the highest levels of 22G-RNAs in wild-type were consistently derepressed in the
drh-3 mutant as assessed by semi-quantitative, polymerase chain reaction with reverse transcription (qRT-PCR, ) and Affymetrix tiling arrays (
Figure S10). In contrast, CSR-1 associated 22G-RNAs do not appear to silence their targets (Claycomb et al., cosubmitted), consistent with the biological distinction between these pathways.
Previous work has shown that Tc elements are silenced in the germline by an RNAi mechanism (
Ketting et al., 1999;
Sijen and Plasterk, 2003;
Tabara et al., 1999). Individual Repbase annotations, which include all major classes of transposons in
C. elegans, were uniformly depleted of 22G-RNAs in
drh-3,
ekl-1, RdRP,
rde-3,
mut-7 and MAGO12 mutant samples and most transposon 22G-RNAs were enriched in the WAGO-1 IP sample (). Transposon loci showed normal levels of 22G-RNAs in an
rde-4 mutant sample (;
Tabara et al., 1999). Thus, the transposon-silencing pathway in
C. elegans consists of DRH-3, EKL-1, RdRPs, RDE-3, MUT-7 and multiple WAGOs, including WAGO-1.
The drh-3 alleles described here display the hallmarks of mutator-class Rde mutants (). Indeed, spontaneous mutants with phenotypes that revert at high frequency were cloned from the drh-3 mutants, including a dpy-5::Tc5 insertion. The frequency of reversion from Dumpy to wild type, upon excision of Tc5 from dpy-5 in drh-3(ne4253), was similar to that in mut-7(ne4255), an allele isolated in the same screen, and almost 5-fold higher than that in the nonsense allele mut-7(pk204) (). Similar results were obtained with an unc-22::Tc1 insertion (data not shown). Furthermore, Tc1 and Tc3 transcripts were derepressed in the drh-3 mutant (), demonstrating that DRH-3 is required for transposon silencing.
22G-RNAs and surveillance
Approximately 15% of 22G-RNA reads were derived from non-annotated loci and were dependent on RDE-3, MUT-7 and the WAGOs deleted in the MAGO12 strain. These loci primarily correspond to unique intergenic sequences and could represent pseudogenes or cryptic loci that lack open reading frames and are unrecognizable by current bioinformatic approaches. In some cases, we could predict potential splicing patterns based on antisense reads spanning these non-annotated regions (
Figure S11A). Consistent with this notion, 22G-RNAs derived from many loci annotated as pseudogenes were also depleted in
rde-3,
mut-7 and MAGO12 mutants (). As with annotated genes targeted by WAGO-associated 22G-RNAs, qRT-PCR and microarray analysis demonstrated that both pseudogene and cryptic loci targeted by 22G-RNAs were desilenced in the
drh-3 mutant (Figures and
S12).
Upon closer inspection of protein coding loci targeted by the WAGO pathway, we noted that the 22G-RNA profile often did not correspond to the annotated gene prediction (
Figure S11B and
Table S4). In many cases, 22G-RNAs mapped within predicted introns, suggesting that the corresponding introns were not spliced in the target RNA. In other cases, 22G-RNAs started or ended abruptly in the middle of the annotation and extended well upstream or downstream of the gene prediction, suggesting that the annotation is incomplete or incorrect. Lastly, we noticed a number of WAGO-target genes with intron annotations in 3′UTRs.
Because pseudogenes and genes with 3′UTR introns are expected to be targets of the nonsense-mediated decay (NMD) pathway, we asked whether 22G-RNA biogenesis was dependent on the PIN domain protein SMG-5, the Upf1 helicase SMG-2, and the phosphatidylinositol-kinase SMG-1 (
Anders et al., 2003;
Glavan et al., 2006;
Grimson et al., 2004;
Page et al., 1999). 22G-RNAs derived from
X-loci and
K02E2.6, which has a 3′UTR intron, were reduced in the null mutant
smg-5(r860) and to a lesser extent in the non-null
smg-2(r863) (). K02E2.6 and X-loci 22G-RNAs were unchanged in the temperature-sensitive mutant
smg-1(cc546) at both permissive and non-permissive temperatures ( and data not shown). These data suggest that SMG-2 and SMG-5 have a role in 22G-RNA biogenesis that is distinct from their recognized role in NMD, and that NMD
per se is not required for 22G-RNA biogenesis.
Deep-sequence analysis of smg-5 mutant small RNAs revealed that SMG-5 is required for the biogenesis of 22G-RNAs targeting 15% of WAGO-dependent 22G-RNA target genes (). Interestingly, SMG-5 was not required for most pseudogene-derived 22G-RNAs (data not shown). Furthermore, the few published endogenous targets of NMD do not appear to be 22G-RNA targets (data not shown). Roughly half of the SMG-5 dependent 22G-RNA loci overlap with RDE-4 dependent 22G-RNA loci (), which include both ERI-dependent and ERI-independent 22G-RNA loci (J.V., W.G. and C.C.M., unpublished). These findings indicate that multiple WAGO-dependent 22G-RNA pathways exist, which together define a general surveillance system that silences transposons and aberrant transcripts.