|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: MAF ACB JKK. Performed the experiments: ACB AMD VK. Analyzed the data: MAF SYC. Wrote the paper: ACB MAF JKK.
Piwi-interacting RNAs (piRNAs) fulfill a critical, conserved role in defending the genome against foreign genetic elements. In many organisms, piRNAs appear to be derived from processing of a long, polycistronic RNA precursor. Here, we establish that each Caenorhabditis elegans piRNA represents a tiny, autonomous transcriptional unit. Remarkably, the minimal C. elegans piRNA cassette requires only a 21 nucleotide (nt) piRNA sequence and an ~50 nt upstream motif with limited genomic context for expression. Combining computational analyses with a novel, in vivo transgenic system, we demonstrate that this upstream motif is necessary for independent expression of a germline-enriched, Piwi-dependent piRNA. We further show that a single nucleotide position within this motif directs differential germline enrichment. Accordingly, over 70% of C. elegans piRNAs are selectively expressed in male or female germline, and comparison of the genes they target suggests that these two populations have evolved independently. Together, our results indicate that C. elegans piRNA upstream motifs act as independent promoters to specify which sequences are expressed as piRNAs, how abundantly they are expressed, and in what germline. As the genome encodes well over 15,000 unique piRNA sequences, our study reveals that the number of transcriptional units encoding piRNAs rivals the number of mRNA coding genes in the C. elegans genome.
Across the animal kingdom, Piwi-interacting small RNAs (piRNAs) protect genome integrity and promote fertility. While the functions of piRNAs are well-characterized, far less is known about how they are generated and how their expression is regulated. In the Caenorhabditis elegans genome, a conserved sequence motif lies upstream of many piRNA loci and appears to regulate their expression. We combined computational and experimental approaches to investigate the role of this motif in the expression of C. elegans piRNAs. We discovered that >70% of piRNAs are differentially enriched in male versus female germline, and these male and female piRNAs show different upstream motifs. Using a transgenic system for expressing synthetic piRNAs in vivo, we demonstrate that variation of a single nucleotide within this motif influences piRNA germline enrichment. We further show that the conserved motif is capable of driving piRNA expression in genomic isolation. Accordingly, the genomic distribution of these motifs determines which sequences are expressed as piRNAs in C. elegans. Our results suggest that each C. elegans piRNA represents an independent transcript whose sequence, abundance, and germline enrichment are encoded by a variant upstream motif, defining a novel modality for expression of piRNAs.
piRNAs and Piwi clade Argonautes arose in the primordial metazoan ancestor  and are generally restricted to the germline, where they act in an RNA-induced silencing complex (RISC) to silence foreign genetic elements. From protozoa to mammals, loss of Piwi proteins, and consequently piRNAs, results in abnormal fertility phenotypes or sterility, revealing their highly conserved and essential role in animal reproduction 2–8. piRNAs are incredibly diverse, with tens of thousands of unique sequences expressed in any single organism. While piRNAs in many organisms map to large, broadly syntenic genomic clusters, the sequences are not conserved among even closely related species, and no unifying sequence features have been identified beyond a bias among primary piRNAs for a 5′ uridine –.
The mechanisms of de novo piRNA biogenesis remain elusive. In fly and mouse, primary piRNAs appear to be processed from long, single-stranded RNA precursors , , . This long transcript is cleaved by the endoribonuclease Zucchini with little or no sequence specificity to generate candidate piRNA 5′ ends , , which are likely subsequently purified according to the binding preferences of the Piwi proteins that bind primary piRNAs . Silkworm data suggest that the 3′ ends of these piRNA precursors are then trimmed by a 3′ to 5′ exonuclease until the 3′ end is sufficiently short for anchoring by Piwi to protect against further trimming . The 3′ end is then methylated to prevent degradation –. While recent studies have shed light on the biogenesis of primary piRNAs in many animal models, little is known in any organism about how primary piRNA expression is regulated or how specific sequences are designated as piRNAs.
21U RNAs, a class of germline-enriched small RNAs, represent the piRNAs of Caenorhabditis elegans. They are terminally methylated –, show a 5′ uridine bias , and are dependent upon and bound by the Piwi Argonaute PRG-1 , , which is required for normal fertility . Yet C. elegans piRNAs exhibit some unusual features. While the vast majority of 21U RNAs map to two large genomic clusters on chromosome IV, the loci do not exhibit prominent strand biases . The 21U RNAs also do not appear to play a prominent role in silencing transposable elements, a main function of mouse and fly piRNAs, nor do they engage a ping-pong amplification mechanism , . Rather, PRG-1 and the 21U RNAs target aberrant and coding transcripts broadly via imperfect complementarity, triggering production of secondary endogenous siRNAs –. These 21U RNA-dependent 22G RNAs can induce chromatin changes to establish dominant, heritable target silencing –. 21U RNAs evolve rapidly, presumably constrained only by selection against sequences that silence mRNAs; thus, mismatch-tolerant 21U RNAs constitute an epigenetic memory of self versus non-self. Finally, a conserved motif lies upstream of 21U RNA genomic loci . This stretch of sequence, which includes an eight-nucleotide (nt) core motif approximately 40 nt upstream of the 21U RNA locus, is conserved across divergent nematodes , . Recently, Cecere et al. found that this motif is bound by forkhead family transcription factors and that deletion of the core motif abrogates 21U RNA expression , but it is still unknown how 21U RNA sequences are defined and how their expression is regulated.
Here, we demonstrate that piRNAs are expressed autonomously in C. elegans. Combining computational and transgenic approaches, we find that the conserved core motif defines the piRNA transcriptional cassette, specifying expression of 21U RNAs from genomic thymidines situated at an optimal distance downstream to determine which genomic sequences are expressed as C. elegans piRNAs. Core motifs also encode information dictating germline-specific expression of 21U RNAs. We show that more than 70% of C. elegans piRNAs are preferentially enriched in male or female germline. Unexpectedly, this germline enrichment appears to be enforced by a single nucleotide position within the core motif. We demonstrate autonomous expression of synthetic 21U RNAs from multiple minimal transgenic cassettes consisting only of the 8 nt core motif, the ~40 nt intervening genomic spacer, the 21U RNA sequence, and ~50–100 nt of flanking genomic context. Finally, we use single-copy transgenes integrated in genomic isolation to show that the clustered organization of endogenous piRNA loci is entirely dispensable for robust piRNA expression. Together, our results suggest that each 21U RNA locus encodes all of the information necessary for driving independent, autonomous transcription from more than 15,000 unique piRNA loci in C. elegans.
To investigate the mechanisms regulating piRNA expression, we first identified 21U RNA subclasses by performing a meta-analysis of over 50 million reads from published small RNA deep sequencing datasets , – (Table S1). Using the pipeline shown in Figure 1A, we determined that a majority of the 13,711 21U RNAs represented in our composite dataset show differential germline enrichment, distinguishing 7,677 (56.0%) unique male and 2,171 (15.8%) unique female germline-enriched 21U RNAs (hereafter, male and female 21U RNAs) (See Materials and Methods). The distribution of 21U RNA Enrichment scores is skewed toward the male (Figure S1A), whereas randomly generated 21U RNA count data show no significant skewing (Binomial test, p=0.245) and define a false discovery rate below 1% (Figure S1B). To assess the reliability of the Enrichment score in classifying germline enrichment, we quantified the average relative abundance of every male 21U RNA between each pair of male and female libraries (Figure S1C); the reciprocal calculation was performed for female 21U RNAs (Figure S1D). On average, the abundance of male 21U RNAs is 6.8-fold higher in male libraries than female, whereas the abundance of female 21U RNAs is 2.4-fold higher in female libraries than male. Average abundance of 21U RNAs not classified as male or female (hereafter, non-enriched 21U RNAs) is approximately equal in male and female libraries (Figure S1E). Taqman RT-qPCR of select 21U RNAs in fem-1(hc17) adult female versus him-8(e1489) or fog-2(q71) adult male animals shows segregation of 21U RNAs according to germline enrichment classification (Figure 1B, 1C), endorsing our computational discovery of germline-enriched piRNA subclasses in C. elegans.
Our meta-analysis also revealed a subpopulation of 21U RNAs highly abundant in embryo. Comparison of the abundances of male and female 21U RNAs in mixed stage embryo sequencing libraries showed that female 21U RNAs were overrepresented in embryo relative to male. A higher proportion of unique female 21U RNAs were detected in embryo (χ2 test, p=9.2e−45) (Figure S2A, S2B). Furthermore, unique female 21U RNAs were on average 4.4-fold more abundant in embryo than unique male species (Welch's t-test, p=3.4e−148). The trend is corroborated by Taqman analysis showing depletion of male 21U RNAs and enrichment of female 21U RNAs in embryo (Figure S2C–S2E). These data suggest that female piRNAs are preferentially inherited into C. elegans embryo, consistent with previous observations in fly –. Parallel classification and embryonic enrichment analysis of 26G RNAs, germline-enriched primary endo-siRNAs, recapitulated previously observed inheritance patterns  and validated the ability of our pipeline to identify germline-enriched small RNA subclasses (Figure S2F, S2G).
21U RNAs target transcripts with imperfect complementarity of up to three mismatches to trigger production of antisense 22G RNAs proximal to the targeting site , , , . The lax complementarity requirement for piRNA-mediated silencing predicts widespread targeting capacity. Compartmentalization of piRNA expression to the male and female germline may help to confer specificity. To investigate the biological significance of germline-enriched 21U RNA subclasses, we first examined whether male and female 21U RNAs target distinct subsets of genes. We analyzed the overlap between their respective dependent 22G RNAs by identifying 22G RNAs that map antisense to within 40 nt of 21U RNA target sites  (See Materials and Methods). Ignoring 22G RNAs detected in prg-1(n4357) deep sequencing datasets, as these are likely not 21U RNA-dependent, we identified 11,377 (72.3%) unique 22G RNAs that are likely male 21U RNA-dependent and 3,855 (24.5%) unique 22G RNAs that are likely female 21U RNA-dependent (Figure S3A). Only 494 (3.1%) unique 22G RNAs lie within 40 nt of both a male and female 21U RNA target site, precluding assignment to either category. This overlap is less than expected when 22G RNAs from random but similarly sized sets of 21U RNAs are compared (χ2 test, p=0.012). We then compared the 5,956 male and 1,387 female 21U RNA targets identified in young adult  and gravid  animals, respectively. Overlap between targets (149 overlapping targets) is significantly lower compared to random sets of genes (294 overlapping and 6,756 non-overlapping targets; χ2 test, p=7.7e−13) (Figure S3B).
Because targets of 21U RNAs are subject to transgenerational silencing –, 21U RNAs are unlikely to evolve to target transcripts required in the germline. Similarly, male 21U RNAs would not be expected to target transcripts required for spermatogenesis; however, temporal separation of the spermatogenic and oogenic gonads might permit evolution of male 21U RNAs capable of targeting transcripts required for oogenesis. We examined our data for evidence of this evolutionary signature. As comprehensive lists of genes required for spermatogenesis and oogenesis have yet to be assembled, we used as a proxy lists of transcripts identified by microarray studies as enriched during spermatogenesis (865 transcripts) or oogenesis (1,030) . Comparing male 21U RNA targets to randomly generated gene lists, we found that male 21U RNA targets are indeed depleted of spermatogenesis transcripts (χ2 test, p=0.044), but neither enriched nor depleted for oogenesis transcripts (χ2 test, p=0.76) (Figure S3C). Curiously, we do not observe the same signature for female 21U RNAs (Figure S3D). Their targets are neither enriched nor depleted for spermatogenesis transcripts (χ2 test, p=0.27), as expected, but female 21U RNA targets are significantly enriched for oogenesis transcripts (χ2 test, p=0.0017). These differences between male and female 21U RNA targeting suggest that the evolutionary pressures acting on male and female 21U RNA sequences may differ (See Discussion).
To investigate how 21U RNA germline enrichment information is genetically encoded, we analyzed the genomic loci of the 13,387 21U RNAs that map uniquely to the genome. Comparison of male and female 21U RNA sequences identified no differences in content; therefore, we evaluated the 21U RNA upstream region. The 8 nt core motif, with consensus sequence CTGTTTCA, is separated from the 21U RNA locus by an A/T-rich spacer of ~35 to 42 nt . Scanning the 60 nt upstream of each 21U RNA for the best conserved central GTTTC of the core motif, we found that 6,615 of 7,677 (88%) male 21U RNAs show a canonical, GTTTC-containing core motif, compared to only 1,119 of 2,171 (54%) female 21U RNAs. While the length of the A/T-rich spacer does not differ between male and female 21U RNAs (Figure 2A), core motif sequence analysis revealed a striking difference: only the core motifs of male 21U RNAs are enriched for a 5′ cytidine. 5,765 of 7,677 (77%) male 21U RNAs are located downstream of canonical, GTTTC-containing core motifs with a 5′ cytidine, compared to only 443 of 2,171 (21%) female 21U RNAs (χ2 test, p=7.9e-137) (Figure 2B). To examine whether this 5′ core motif position influences 21U RNA expression, we calculated the average abundance of male and female 21U RNAs grouped by 5′ core motif nt. Male 21U RNAs with 5′ cytidine core motifs are significantly more abundant than all other male 21U RNAs (Figure 2C, Welch's t-test p-values in Table S2), consistent with the previous observation that 21U RNAs whose core motifs better match the consensus sequence are more highly expressed . No other subgroup differs significantly in abundance from all others among the male, female, and non-enriched 21U RNAs (Figure 2C and 2D, Table S2), suggesting that GTTTC-containing core motifs with a 5′ cytidine are overrepresented among male 21U RNAs and may drive male germline expression.
To explore the significance of variation at the 21U RNA upstream motif, we developed a transgenic system to express synthetic 21U RNAs from high-copy, integrated arrays in vivo (Figure 3A). 2–3 kilobase regions of genomic sequence from a chromosome IV piRNA cluster were cloned, and a central 21U RNA (male 21U RNA ♂21UR-1258 or female 21U RNA ♀21UR-2502) was mutated to a unique synthetic 21 nt sequence (21UR-synth) to distinguish transgenic from endogenous expression. The sequences were then further mutated to generate the panel of transgenes shown in Table 1. Transgenes are named for the endogenous 21U RNA replaced by 21UR-synth, with prefixes to indicate transgene type (e.g., ♀Tg2502 represents the otherwise wild-type transgene encoding 21UR-synth in place of ♀21UR-2502). These transgenes are carried by the vector pCFJ178 , which also expresses the C. briggsae unc-119 gene (Figure S4A), enabling gross normalization for variable array expression.
To validate our transgenic system, we examined whether 21UR-synth recapitulates all of the known features and genetic sensitivities of endogenous 21U RNAs. 21U RNAs are 2′-O-methylated at the 3′ terminus by the conserved methyltransferase HENN-1 –. Northern blot for 21UR-synth in transgenic strains identified a 21 nt species that is terminally methylated in a henn-1-dependent manner (Figure 3B). Robust, specific detection of the 3′ terminus by Taqman RT-qPCR  confirms that this species corresponds to 21UR-synth (Figure 3C). Levels of endogenous 21U RNAs ♂21UR-1258 and ♀21UR-2502 are largely unaffected by expression of the transgenes (Figure S4B). Endogenous 21U RNAs are generated in the germline and require PRG-1 for accumulation , . Accordingly, 21UR-synth is highly depleted by loss of prg-1 and in the glp-4(bn2) germline-deficient mutant (Figure 3C–3E). 21UR-synth and endogenous 21U RNAs are also specifically detected in immunoprecipitated PRG-1 complexes, while a microRNA control is not (Figure 3F and 3G, Figure S5). To rule out the unlikely possibility that transgenic products corresponding to the 21UR-synth sequence might be generated by an alternative, Dicer-dependent mechanism, we assayed 21UR-synth accumulation in a null mutant of rde-4. This gene encodes a dsRNA binding protein that is a key cofactor of Dicer in siRNA biogenesis –, but dispensable for 21U RNA production (Figure S4B). Loss of rde-4 does not impair 21UR-synth expression (Figure 3C–3E), suggesting that 21UR-synth does not represent an siRNA generated from the high-copy transgenic array.
Finally, we examined whether the core motif is required for 21UR-synth expression. We scrambled the core motif to eliminate any resemblance to the consensus sequence (♂Scram1258 and ♀Scram2502 transgenes; Table 1). 21UR-synth levels in these strains are depleted by more than 100-fold after normalization for array expression (Figure 3H, 3I), consistent with previous findings that deletion of the core motif depletes 21U RNA expression . Together, these data demonstrate that 21UR-synth represents a bona fide 21U RNA and support the use of this transgenic system for exploring 21U RNA biology in vivo.
We then used our transgenic system to test whether variation at the core motif 5′ position affects germline expression of 21UR-synth (Figure 4A). Endogenous male 21U RNA ♂21UR-1258, which lies downstream of a CTGTTTCA core motif, peaks in expression during spermatogenesis (52 h time point) and is highly expressed in him-8(e1489) male adult; in contrast, expression of endogenous ♀21UR-2502, with an ATGTTTCA core motif, peaks after the spermatogenesis-to-oogenesis transition in adulthood (~72 h) and is highly expressed in fem-1(hc17) female adult (Figure 4B). Accordingly, the ♂Tg1258 and ♀Tg2502 transgenes express 21UR-synth in similar male and female patterns, respectively (Figure 4C and 4D, colored lines/bars). Toggling the core motif from CTGTTTCA to ATGTTTCA (♂C>A1258 transgene) or ATGTTTCA to CTGTTTCA (♀A>C2502) disrupts these germline-specific expression patterns. Whereas 21UR-synth expression from ♂Tg1258 plummets after spermatogenesis, loss of the core motif 5′ cytidine in the ♂C>A1258 transgenic strain results in sustained 21UR-synth expression through oogenesis; the ♂C>A1258 transgene also preferentially expresses 21UR-synth in fem-1(hc17) female (Figure 4C). Thus, mutating the 5′ cytidine of a male 21U RNA core motif results in a failure to restrict 21U RNA expression to spermatogenesis. Similarly, introducing a 5′ cytidine into a female 21U RNA core motif impairs restriction of expression to oogenesis: while ♀Tg2502 expression of 21UR-synth increases dramatically during the spermatogenesis-to-oogenesis transition, gain of the motif 5′ cytidine in the ♀A>C2502 transgene dampens this increase (Figure 4D). These results suggest that this single nucleotide orchestrates the accurate switching of 21U RNA expression in the hermaphroditic germline. However, 21UR-synth expression from the ♀A>C2502 transgene is still high in fem-1(hc17) female, indicating that other elements contribute to female 21U RNA expression patterns. This is consistent with our finding that female 21U RNA core motifs show no bias at the 5′ nucleotide, and indeed ~21% of female 21U RNA core motifs show a 5′ cytidine (Figure 2B). As expected, 21UR-synth expression from the ♂C>A1258 and ♀A>C2502 transgenes is still dependent upon prg-1 (Figure 4E).
It is not yet known how individual genomic sequences are selected for expression as piRNAs. As the core motifs, but not the sequences, of 21U RNAs are conserved across Caenorhabditis species, it seemed possible that the core motifs themselves might determine what sequences are expressed as 21U RNAs by directing their expression from genomic thymidines located an optimal distance downstream. We explored this hypothesis by mutating the genomic thymidines encoding the first nucleotide of 21UR-synth to adenosine (21U>A transgenes) or guanosine (21U>G transgenes), such that the transgenes encode 21[U>A]R-synth or 21[U>G]R-synth, respectively (Figure 5A, Figure S6A). These putative products emulate the 5′ nucleotide identity of microRNAs (predominantly 5′ uridine and adenosine) and endo-siRNAs (predominantly 5′ guanosine). Small RNAs expressed from these transgenes and recognized by the 21UR-synth northern blot probe differ in size from and are less abundant than wild-type 21UR-synth (Figure 5B, Figure S6B). By Taqman analysis, 21[U>A]R-synth and 21[U>G]R-synth are detected at levels more than 150-fold lower than 21UR-synth after normalization for array expression (Figure 5C, Figure S6C), suggesting that 21[U>A]R-synth and 21[U>G]R-synth are poorly transcribed, stabilized, or both.
We hypothesized that 21U RNA expression from a particular genomic thymidine may simply be a function of distance from a core motif (i.e., length of the intervening genomic spacer). Therefore, the presence of multiple thymidines within the optimal genomic window downstream of a core motif might result in expression of multiple, overlapping 21U RNAs. Indeed, many C. elegans piRNAs map to proximal genomic thymidines as members of “miniclusters” of overlapping 21U RNAs that appear to share an upstream core motif. To explore the relationship between core motif position and expression, we extracted read count information from deep sequencing of wild-type adult animals  for uniquely mapping 21U RNAs and analyzed their corresponding genomic loci. After separating 21U RNAs into those that share a core motif with at least one other uniquely mapping 21U RNA (“miniclustered”; 4,550 21U RNAs) and those that do not (“solitary”; 8,837 21U RNAs), we grouped 21U RNAs by length of genomic spacer and examined their abundance. For both miniclustered and solitary 21U RNAs, the resulting distributions peak at a 39 nt spacer length and decrease as the spacer lengthens or shortens (Figure 5D). The evident correlation between spacer length and robustness of expression explains previous observations that miniclustered 21U RNAs routinely show great variation in abundance .
We also observed that miniclustered 21U RNAs with 37–40 nt spacers are more abundant than solitary 21U RNAs at matched positions (Figure 5D, asterisks), suggesting that 21U RNA miniclusters may arise when expression is driven more robustly. To investigate this further, we compared the core motifs associated with miniclustered 21U RNAs (“shared” motifs) versus solitary 21U RNAs (“non-shared” motifs). We found that a significantly larger proportion of miniclustered 21U RNAs (3,580 of 4,550, 79%) than solitary 21U RNAs (5,667 of 8,837, 64%) are associated with canonical, GTTTC-containing core motifs (χ2 test, p=1.4e−66). Additionally, we observed significantly greater thymidine richness in the optimal genomic windows 35–42 nt downstream of shared GTTTC-containing motifs versus non-shared (Welch's t-test, p=4.0e−95) (Figure 5E). Therefore, particular sequences of 21U RNAs may not be specified intrinsically; rather, core motifs may simply direct expression of 21U RNAs from one or more downstream thymidines, depending on the strength of the motif and the number of optimally positioned thymidines.
To further confirm the association between the core motif and germline enrichment, we analyzed miniclusters consisting of two germline-enriched 21U RNAs (1,026 pairs). Random assortment of these 21U RNAs would predict 66% malemale, 4% femalefemale, and 31% malefemale pairs; however, we observed 73% malemale, 12% femalefemale, and only 15% malefemale pairs. Thus, 85% of pairs showed matching enrichment classification (Figure 5F), a significant departure from the 69% expected by random assortment (χ2 test, p=9.6e−28). We note that this paucity of mixed malefemale 21U RNA miniclusters likely contributes to the low number of 22G RNAs that can be attributed to both male and female 21U RNAs (Figure S3A).
The absence of long, unidirectional 21U RNA clusters in the C. elegans genome and the presence of the conserved upstream motif have generated speculation that 21U RNAs represent autonomously transcribed units , , . This is further suggested by our and others' findings that scrambling or deleting the core motif abrogates 21U RNA expression (Figure 3H, 3I and ). To test whether 21U RNAs express independently, we generated transgenes representing putative minimal 21U RNA transcriptional units. Each of these Min transgenes encodes only a single core motif, spacer, and 21U RNA, with limited 5′ and 3′ genomic context (Figure 3A). Strikingly, 21UR-synth expressed from this minimal context shows the same size, prg-1 dependence, rde-4 independence, and germline enrichment as endogenous 21U RNAs (Figure 6A, 6B), indicating that the sequence features conferring these 21U RNA characteristics are contained within a single 21U RNA transcriptional unit. To ensure that the 5′ nucleotide of the core motif still influences germline enrichment within this minimal context, we also generated and tested an independent set of minimal 21UR-synth transgenes with core motif intact (♂Min1415) or first nucleotide toggled (♂MinC>A1415). These transgenes also showed impaired male germline enrichment upon toggling of the core motif 5′ nucleotide (Figure 6C), reaffirming our conclusions that a core motif 5′ cytidine helps to orchestrate 21U RNA male germline enrichment.
To explore the autonomy of the 21U RNA transcriptional unit further, we generated additional transgenes carrying <300 nt of genomic sequence encoding two adjacent 21U RNA transcriptional units on the same strand (Figure 6D). To create the “wild-type” Tg1415-Tg2109 transgene, the upstream 21U RNA locus, corresponding to 21UR-1415, was mutated to encode 21UR-synth, and the downstream locus, corresponding to 21UR-2109, was mutated to encode a different unique synthetic 21U RNA (21UR-synthB). We then scrambled the core motif of the upstream 21U RNA locus to generate the Scram1415-Tg2109 transgene and measured relative expression of the two synthetic 21U RNAs from each transgene. Much as expression of 21UR-synth is vastly decreased by loss of the core motif in the ♂Scram1258 and ♀Scram2502 transgenes above (Figure 3H, 3I), 21UR-synth is expressed at far lower levels than 21UR-synthB from the Scram1415-Tg2109 transgene, whereas expression of the synthetic 21U RNAs from the Tg1415-Tg2109 transgene is comparable (Figure 6E). This experiment specifically pursues a recent finding by Cecere et al. that deletion of the core motif of one 21U RNA does not abrogate expression of neighboring 21U RNAs, although the species assessed were distant, separated by multiple 21U RNA loci, and encoded on both strands .
C. elegans 21U RNA loci, like the piRNA loci of mouse and fly , , , are genomically clustered. The overwhelming majority of 21U RNAs map to two large regions on chromosome IV, and GTTTC, the most highly conserved five nt of the core motif, occurs much more frequently on chromosome IV at these regions (4.0 occurrences per kilobase, occ/kb) than on chromosome IV outside these regions (0.4 occ/kb) or on other chromosomes (0.2 occ/kb). Furthermore, 21U RNAs encoded on chromosome IV are detected at much higher abundance (mean abundance: 148 RPM) than those encoded on other chromosomes (1 RPM) (Welch's t-test, p=2.4e-269). These observations suggest the possibility of a positional requirement for expression of 21U RNA loci: a privileged genomic environment might contribute to the expression of 21U RNAs. To investigate the significance of 21U RNA genomic organization, we carried out rough mapping of the genomic insertion sites of several of the high-copy transgenic arrays. None of the integration loci mapped to chromosome IV (Table 1), indicating that these strains are not expressing 21UR-synth from the context of the 21U RNA genomic clusters. Yet the transgenic arrays themselves could represent 21U RNA-rich genomic microenvironments, much like the chromosome IV 21U RNA clusters. We therefore tested the true autonomy of the 21U RNA by using the MosSCI technique  to insert single-copy transgenes at a locus on chromosome IV not contained within the 21U RNA genomic clusters. Local 21U RNA concentration at the integration site is low, and no 21U RNAs are annotated as mapping to the homology arms encoded on the pCFJ178 MosSCI plasmid. Unexpectedly, single-copy insertions of ♂Tg1258 and ♀Tg2502 transgenes express 21UR-synth at levels easily detectable, albeit tenfold lower than the high-copy arrays. As observed for the high-copy arrays, scrambling of the core motif severely diminishes expression of 21UR-synth from the single-copy transgenic insertions (Figure 6F). Finally, to exclude the remote possibility that chromosome IV origin itself is essential for 21U RNA expression, we used an alternative MosSCI plasmid to insert onto chromosome II a single copy of the ♀Min2502 transgene, which encodes no other 21U RNAs. Like the chromosome IV transgene insertions, ♀Min2502 expresses 21UR-synth robustly (Figure 6F), confirming that 21U RNAs can be autonomously transcribed.
Our data support a 21U RNA biogenesis mechanism wherein the upstream motif and 21U RNA sequence constitute a tiny, independent transcriptional unit that encodes regulated germline expression. The upstream motif as initially identified by Ruby et al.  is necessary for autonomous expression of a 21U RNA from one or more optimally situated downstream genomic thymidines. Importantly, this genomic thymidine may not represent a transcriptional requirement but rather reflect the binding preferences of the Argonaute PRG-1: a heterogeneous pool of candidate 21U RNA sequences may be transcribed and subsequently purified through preferential stabilization by PRG-1. Our transgenic studies showing greatly decreased expression when 21UR-synth is mutated to 21[U>A/G]R-synth cannot differentiate between a transcriptional or post-transcriptional requirement for a 5′ uridine; however, findings in other organisms support the latter mechanism. In mouse and fly, the prevailing model posits that Zucchini generates candidate primary piRNA 5′ ends with very little sequence specificity during the processing step, and then Piwi preferentially binds 5′ uridine piRNAs during the loading step , . This is consistent with in vitro data showing that Siwi, the silkworm ortholog of PRG-1, preferentially incorporates ssRNAs bearing a 5′ uridine .
The upstream motif differences of male and female 21U RNAs suggest that germline enrichment could be achieved through selective transcription in male versus female germlines. Recently, Cecere et al. reported that 21U RNA upstream regions are depleted of nucleosomes . They further observed that RNA polymerase II (Pol II) occupancy shows local peaks in this region, rising steadily over the interval of −300 nt to −50 nt from the genomic thymidine encoding the 5′ uridine of the 21U RNA. Analyzing the same ChIP-seq dataset as Cecere et al., we noticed that the amplitude of the changes in Pol II occupancy at 21U RNA loci is quite modest. Analyzing randomly generated intergenic windows from chromosome IV, we determined that the Pol II ChIP-seq background actually exceeds the “signal” at 21U RNA loci (Figure S7A, S7B), indicating relative Pol II depletion. This overall depletion of Pol II occupancy at 21U RNA loci may indicate that transcription of 21U RNAs is a more transient process than transcription of genes with canonical promoter elements. Thus the ChIP-seq might capture only a small fraction of interactions between Pol II and DNA. However, the Pol II occupancy profiles for the loci encoding the top 25% and bottom 25% of 21U RNAs by abundance are virtually indistinguishable (Figure S7C). Again, this is in stark contrast to mRNA coding loci, for which Pol II occupancy at the top 25% of mRNAs by abundance is much higher than at the bottom 25% (Figure S7D). An alternative possibility is that the open chromatin of the nucleosome-depleted regions upstream of 21U RNA loci is more susceptible to incidental binding by Pol II, causing the modest increase in local occupancy observed by Cecere et al. Should this be the case, the products of Pol II transcription at these loci could be unrelated to 21U RNAs. Cecere et al. also identify a transcript whose 5′ end extends 2 nt upstream of a 21U RNA locus and note that deep sequencing of 5′ capped RNAs reveals many more such transcripts. While these transcripts may represent 21U RNA precursors, they may also represent the products of incidental transcription from 21U RNA loci exposed due to local nucleosome depletion. The levels of such long putative precursors were below the threshold of our detection, precluding further study. Nevertheless, the uncertain 5′ nucleotide identity of the nascent 21U RNA transcript does not affect the interpretation of our results. Further studies, including identification of a cleavage mechanism for the 2 nt 5′ overhang, are needed to confirm these capped transcripts as bona fide 21U RNA precursors. The Zucchini endoribonuclease, thought to generate piRNA 5′ ends in mouse and fly , , is not a likely candidate, as it has no obvious homolog in C. elegans and shows very little sequence specificity, nor is there any evidence in C. elegans for processing of a long 21U RNA precursor into multiple species.
We show that the 5′ nucleotide of the conserved core motif influences germline enrichment of the dependent 21U RNA species (Figure 4, Figure 6). This differential expression of male and female 21U RNAs may be orchestrated by DNA-binding proteins that differ in germline expression patterns and/or binding affinity for 5′ cytidine core motifs. Recently, Cecere et al. demonstrated that the forkhead transcription factors UNC-130, FKH-3, and FKH-5 specifically bind a CTGTTTCA-containing substrate dsDNA probe in vitro . However, male and female 21U RNAs do not appear to be differentially sensitive to depletion of these forkhead proteins, nor do 21U RNAs with and without 5′ cytidine motifs (data not shown and ). Cecere et al. propose that these forkhead proteins play a redundant role in transcription of 21U RNAs. While these are dispensable for viability and fertility, other forkhead proteins are required for development of the germline, precluding testing for a role in transcribing 21U RNAs; these additional forkhead proteins could indeed represent germline-specific or motif-specific transcription factors (Figure S8).
The autonomy of the C. elegans piRNA gene raises the questions of why 21U RNA loci exhibit genomic clustering on chromosome IV and why 21U RNAs encoded on chromosome IV are expressed more robustly. Perhaps the high density of 21U RNAs within these genomic clusters evolved as such: 21U RNA loci, defined by 21U RNA core motifs flanked by A/T richness, accumulated randomly on ancestral chromosome IV. Targeting of any overlapping genes resulted in silencing, subjecting the coding sequences of these genes to drift and eventual elimination. This would deplete the region of genes, reducing selection upon the genomic sequence and thereby permitting further accumulation of 21U RNA loci. The lack of selective pressure related to conservation of protein-coding genes might also explain why chromosome IV loci express 21U RNAs most robustly: the high density of coding and regulatory elements on other chromosomes likely constrains the evolution of features such as flanking A/T-richness that might enhance 21U RNA expression. It is also possible that different transcriptional machineries or different chromatin configurations are required to transcribe 21U RNAs versus other elements.
Genomic clustering of piRNA loci has been proposed to provide a “trap” for mobile elements . In organisms such as mouse and fly where these clusters are transcribed to generate long precursors from which piRNAs are processed , , , the trapping function of the genomic piRNA cluster is readily apparent. Although the 21U RNAs are independently transcribed, Bagijn et al. have identified a similar mechanism acting in C. elegans: the genome shows evidence of recent transposon integration downstream of the conserved upstream 21U RNA motif, sometimes generating 21U RNAs that are antisense to the transposon 3′ end and capable of silencing it . Each conserved upstream motif can therefore serve as an independent trap, with the result that increased accumulation of motifs enhances protection against mobile elements. While retroelements comprise over 40% of the human genome, they appear to have been strongly counterselected in C. elegans, where they constitute only 0.2% of the genome . Perhaps the autonomous piRNA mechanism at play in C. elegans has rendered the animal less susceptible to this kind of mobile element over an evolutionary time scale. Intriguingly, however, C. elegans shows significantly higher rates of gene duplication than fly , and the C. elegans genome shows substantial expansions of gene families; for example, the C. elegans Argonaute family has expanded to over two dozen members, with the evolution of a worm-specific clade. As gene duplications, like mobile elements, may also be targeted by piRNAs, the preponderance of gene family expansions in C. elegans could suggest that this system confers enhanced protection against transposons at the expense of enhanced tolerance for gene duplications. Identification of additional organisms that use similar mechanisms for generating piRNAs will reveal whether this is a pattern or a peculiarity of C. elegans.
Gu et al. recently identified global candidate RNA polymerase II transcription start sites by deep sequencing of capped RNAs . For a large proportion of annotated 21U RNAs, the authors identified 5′ capped, ~26 nt putative precursors with a 2 nt 5′ overhang. Longer RNA reads (70–90 nt) were identified overlapping a very small minority of 21U RNA loci. Abundance of these longer reads correlated poorly with 21U RNA abundance, while the abundance of the short, ~26 nt reads correlated well, suggesting they are likelier to represent 21U RNA precursors. The 5′ cap structure of the putative 21U RNA precursor indeed suggests transcription by Pol II, although our analysis of Pol II occupancy data is inconclusive.
C. elegans were maintained according to standard procedures. The Bristol strain N2 was used as the standard wild-type strain. The alleles used in this study, listed by chromosome, are: unmapped: xkIs11[♂Scram1258 cb-unc-119(+)], xkIs12[♀Scram2502 cb-unc-119(+)], xkIs14[♂21U>A1258 cb-unc-119(+)], xkIs15[♂21U>G1258 cb-unc-119(+)], xkIs16[♂Min1258 cb-unc-119(+)], xkIs17[♀21U>A2502 cb-unc-119(+)], xkIs18[♀21U>G2502 cb-unc-119(+)], xkIs19[♀Min2502 cb-unc-119(+)], xkIs20[♂Min1415 cb-unc-119(+)], xkIs21[♂MinC>A1415 cb-unc-119(+)], xkIs22[Tg1415-Tg2109 cb-unc-119(+)], xkIs23[Scram1415-Tg2109 cb-unc-119(+)]; LGX: xkIs10[♀A>C2502 cb-unc-119(+)]; LGI: glp-4(bn2), prg-1(tm872), xkIs5[♀Tg2502 cb-unc-119(+)]; LGII: xkSi30 [♀Min2502 cb-unc-119(+)], xkIs6[♂C>A1258 cb-unc-119(+)]; LGIII: rde-4(ne301), henn-1(tm4477); LGIV: xkSi3[♂Tg1258 cb-unc-119(+)], xkSi13[♀Tg2502 cb-unc-119(+)], xkSi17[♂C>A1258 cb-unc-119(+)], xkSi20[♀A>C2502 cb-unc-119(+)], xkSi23[♂Scram1258 cb-unc-119(+)], xkSi28[♀Scram2502 cb-unc-119(+)], fem-1(hc17), him-8(e1489); LGV: fog-2(q71), xkIs1[♂Tg1258 cb-unc-119(+)]. Transgenic allele details and corresponding strain names are shown in Table 1.
C. elegans samples were generated as previously described . Samples for Taqman RT-qPCR validation of 21U RNA germline enrichment classification analysis were collected in biological duplicate. Samples collected for RNA-immunoprecipitation (RIP) analysis were collected in biological duplicate and analyzed in independent experiments with technical duplicates. All other samples were collected in biological triplicate. All samples analyzed represent adult animals unless otherwise stated.
RNA isolation, beta-elimination, northern blot analysis, Taqman RT-qPCR, and mRNA quantitation were performed as previously described . RIP analysis was performed as follows: A custom rabbit polyclonal anti-PRG-1 antibody was generated by Proteintech Group, Inc using an N-terminal peptide antigen (MASGSGRGRGRGSGSNNS (C)) conjugated to keyhole limpet hemocyanin (KLH) carrier protein. Antisera were affinity purified using Affi-Gel 10 gel (Bio-Rad). PRG-1 was purified from synchronized gravid animals using this anti-PRG-1 rabbit polyclonal antibody. For each IP, 10 µg of anti-PRG-1 antibody was cross-linked to Dynabeads Protein A (Invitrogen) and incubated with lysate prepared from 0.3 ml of frozen worms at 4°C for 1 hr. Beads were washed 4× with RIP wash buffer (50 mM Tris-HCL pH 7.5, 200 mM KCL and 0.05% NP-40). After final wash, beads were split into equal volumes for RNA extraction and western blot procedure. For western blot analysis: 30 ul of 1× Tris-glycine SDS sample buffer (Invitrogen) without DTT was added directly to beads and incubated at 50°C for 10 min. 0.1 M DTT was then added to samples and boiled for 5 min before loading on gel. Proteins immobilized on Immobilon-FL transfer membrane (Millipore) were probed with anti-PRG-1 rabbit polyclonal antibody or anti-gamma-tubulin rabbit polyclonal antibody (LL-17) (Sigma) (12,000). Peroxidase-AffiniPure goat anti-rabbit IgG secondary antibody was used at 110,000 (Jackson ImmunoResearch Laboratories) for detection using Pierce ECL Western Blotting Substrate (Thermo Scientific). For RNA extraction: 1 ml of TRI-Reagent (Ambion) was directly added to beads and incubated at room temperature for 5 min. RNAs were precipitated in isopropanol for 1 hr at −30°C followed by three washes with 70% ethanol.
Small RNA quantitation was performed as previously described . All 21U RNA qPCR data from transgenic studies were normalized to miR-1 levels. As a result of this normalization, some small RNAs whose levels are not detectable (cycle number >36) appear to be detected due to small variation in detection of miR-1. 21UR-synth is not detectable in non-transgenic animals at any stage at which it was assessed. All Cbr-unc-119 qPCR data were normalized to act-1 mRNA levels. The sequence of 21UR-synth is 5′ TGATATGCGATGTAGTAGACT 3′. The sequence of 21UR-synthB is 5′ TTAGTCGTATGTGACGCTGCC 3′. Full small RNA sequences were submitted to Applied Biosystems for design of Taqman assays. Northern blot probe sequences used for this study: miR-1 5′ TACATACTTCTTTACATTCCA /3StarFire/ 3′; ♀21UR-2502 5′ CAGCAGTCTACTACAATTTCA /3StarFire/ 3′; 21UR-synth 5′ AGTCTACTACATCGCATATCA /3StarFire/ 3′. RT-qPCR primer sequences used for this study are as follows: act-1 F 5′ CCAGGAATTGCTGATCGTATGCAGAA 3′, R 5′ TGGAGAGGGAAGCGAGGATAGA 3′; Cbr-unc-119 F 5′ AACGACGTTTTAGCACTTCCG 3′, R 5′ GGATTTGGAACTTGGTGAACTCG 3′.
To generate the base of the 1258 transgene, sequence spanning genomic coordinates IV:14390835–14393692 was used; IV:14392513–14392673 was used for the ♂Min1258 transgene. To generate the base of the 2502 transgene, sequence spanning genomic coordinates IV:15395699–15397722 was used; IV:15396667–15396886 was used for the ♀Min2502 transgene. To generate the base of the Min1415 transgene, sequence spanning genomic coordinates IV:16564187–16564395 was used. To generate the base of the Tg1415–Tg2109 transgene, sequence spanning genomic coordinates IV:16564133–16564395 was used; the Tg1415–Tg2109 and Scram1415-Tg2109 transgenes carry a 13 nt deletion downstream of both 21U RNA loci. Coordinates were taken from the C. elegans genome WS220. The mutations described in Table 1 were introduced through site-directed mutagenesis or inverse PCR with phosphorylated primers. Transgenes were then subcloned into the pCFJ178 (IV) or pCFJ151 (II) vector. The chromosome IV transgene insertion site lies outside the larger 21U RNA genomic clusters, and the homology arms of chromosome IV MosSCI vector pCFJ178 do not encode any annotated 21U RNAs. Transgenes were confirmed by sequencing and injected into animals with pharyngeal and/or body wall muscle coinjection markers to distinguish transgenic animals. High-copy arrays were integrated through ultraviolet irradiation. MosSCI single-copy insertions were generated as previously described .
Raw data files from 24 small RNA sequencing experiments , – were downloaded from NCBI Gene Expression Omnibus . Artificial linker sequences were removed using an in-house linker removal pipeline. We first searched each sequence for a perfect match to the linker. If a perfect match was not found, we searched for an alignment to the linker with 1 mismatch. If not found, we searched for a perfect alignment between the last 5 nt of the sequence and the first 5 nt of the linker. If not found, we repeated this search allowing 1 mismatch. We continued this pattern to align 4 and 3 nt. Sequences with no linker alignment were discarded (~20% of reads).
Reads were aligned to the reference C. elegans genome version WS220 using Bowtie  with the following parameters: -f -v 2 -k 50 --best --strata. Mapped read counts in each library were normalized to the number of total mapped reads in that library and to the number of mapped genomic loci. Sequence abundance is reported as reads per million mapped reads (RPM). To determine 21U RNA abundance, we first generated 21U RNA genomic coordinates by aligning 15,703 known 21U RNA sequences  to the C. elegans genome version WS220 using Bowtie. Perfect, full-length alignments for 15,093 of these sequences were considered valid 21U RNA coordinates. Reads mapping entirely within these coordinates were annotated to 21U RNAs.
Germline enrichment classifications of 21U RNAs were generated based on read counts in 17 germline libraries: 14 male germline libraries prepared from isolated spermatogenic cells, isolated spermatids, or whole adult males; and 3 female germline libraries prepared from purified oocytes or whole adult hermaphrodites defective in sperm production (Table S1). 1,198 21U RNAs had no read counts in any of these libraries and were removed from our analysis. 184 21U RNAs had higher read counts in a prg-1(tm872) young adult library compared to an N2 young adult library  and were removed from our analysis, leaving 13,711 21U RNAs for which we assessed germline enrichment. Libraries generated using a 5′-monophosphate-dependent (5 male, 1 female) versus -independent (9 male, 2 female) protocol were separated for calculation of the Enrichment Score as follows: For each 21U RNA, we calculated fold abundance difference between every male and female library, for a total of 23 comparisons. Each 21U RNA began with an Enrichment Score of 0. For every comparison, if the 21U RNA was more then 5-fold abundant in the male library, the Enrichment Score decreased by 1; if the 21U RNA was more than 5-fold abundant in the female library, the Enrichment Score increased by 1. Male 21U RNAs were defined as those with Enrichment Scores ≤−3, while female 21U RNAs were defined as those with Enrichment Scores ≥3. Remaining 21U RNAs were classified as non-enriched. To validate enrichment classifications, the fold abundance differences for each 21U RNA were averaged across all 23 comparisons. Less than 1% of 21U RNAs classified as male or female do not show enrichment by average fold abundance in their respective libraries. These 21U RNAs were reclassified as non-enriched for subsequent analyses. 21U RNA Enrichment scores and germline enrichment classifications are in Dataset S1.
To approximate the number of 21U RNAs falsely classified as male or female germline-enriched by our method, we performed Enrichment Score calculations on randomly generated count data modeled from an N2 young adult library . 11,458 21U RNAs are represented in this library. Because 17 germline libraries were used for the real analysis, we generated 17 control libraries as follows: For each 21U RNA, 17 random counts were generated from a Poisson distribution with λ=α (where α is set to the 21U RNA count in the N2 library) and assigned to one of 17 control libraries. After all counts were assigned, the 17 control libraries were randomly grouped to represent the number of male or female and 5′-monophosphate-dependent or -independent libraries used above. Enrichment Score calculations were then performed on these control libraries as described above, and the number of 21U RNAs classified as germline-enriched was calculated. This protocol was repeated 1,000 times. On the basis of this randomized data, we defined an Enrichment Score threshold of −/+3, inclusive, for classifying 21U RNAs as male or female germline-enriched, respectively. Application of this threshold to the randomized data resulted in classification of, on average, only 0.76% (101 of 11,458) of 21U RNAs as germline-enriched, corresponding to a false discovery rate below 1%. This value is consistent with the less than 1% of 21U RNAs classified as male or female that do not show enrichment by average fold abundance in their respective libraries.
26G RNA annotations were taken from Han et al., 2009 . The abundances of 4,002 26G RNAs were measured in 13 of the 17 libraries used for 21U RNA Enrichment Score calculations. Four male libraries (GSM465244, GSM503843, GSM459329, and GSM459331) were excluded because the animals used in preparation of the libraries carried mutations in genes required for 26G RNA expression , , . Enrichment Score calculations were performed on the 13 remaining libraries as above, for a total of 16 malefemale comparisons. We retained the Enrichment Score threshold for classifying 26G RNAs as male or female germline-enriched.
21U RNA target and 22G RNA information for young adult animals (N2 and prg-1(n4357)) was obtained from Bagijn et al. ; raw sequencing data files for gravid adult animals (N2 and prg-1(n4357)) were downloaded from GEO . Raw sequences were processed as described above, and reads 22 nt long and starting with guanosine were annotated as 22G RNAs. 21U RNA targets were defined as transcripts with 0–3 mismatches to a 21U RNA sequence. 21U RNA-dependent 22G RNAs were defined as 22G RNAs that map antisense to transcripts within 40 nt of a 21U RNA target site. The number of 22G RNAs that map to both male and female 21U RNA target sites was compared to a control number of 22G RNAs that map to both a random set of male and a random set of female 21U RNA target sites. These random target sites were defined as the target sites of 7,677 randomly selected 21U RNAs representing “male” 21U RNAs and the target sites of 2,171 randomly selected (and not overlapping random male) 21U RNAs to represent “female” 21U RNAs. This random selection was repeated 1,000 times. A similar randomization process was repeated to compare with the number of genes targeted by both male and female 21U RNAs.
Core motifs of 21U RNAs were visualized using WebLogo and correcting for C. elegans genome nucleotide composition . To account for variability in the location of core motifs relative to their 21U RNA loci, upstream regions were aligned by the central 3 Ts of the core motif. If no core motif was identified within 60 nt upstream of a 21U RNA, we aligned position −44 relative to the 21U RNA locus to the G of the core motif, corresponding to the previously identified most common position of the G . Only 21U RNAs that map to a single locus in the genome (13,387 of 13,711 21U RNAs, 97.6%) were analyzed since 21U RNAs that map to more than 1 locus may have different upstream sequences.
Nucleosome and Pol II occupancy profiling for 21U RNA loci was centered on the genomic thymidine encoding the 21U RNA 5′ uridine. Profiling for transcripts was centered on transcription start sites (TSS) defined as the start of 5′ UTRs annotated in the Ensembl66 database . Intergenic regions were defined as regions absent of an annotated 5′ UTR, exon, intron, 3′ UTR or small RNA transcript that were partitioned into randomly distributed, non-overlapping 1,000 nt windows. Profiling for intergenic regions was centered on these 1,000 nt windows.
Young adult TSS expression was calculated as fragments per kilobase per million mapped reads (FPKM) using biological replicates from a transcriptomic sequencing experiment . Transcriptome sequence data were removed of linkers and aligned to the C. elegans genome version WS220 using TopHat . Cufflinks  was used to calculate transcript isoform expression. Transcripts with an annotated 5′ UTR were extracted from the Ensembl66 database. Average transcript FPKM across the two libraries was calculated, and the isoform with the highest expression was chosen for nucleosome and Pol II occupancy analyses. For isoforms with equivalent expression, a single isoform was randomly chosen.
Published nucleosome occupancy data  were downloaded from UCSC, and the genomic coordinates were lifted over from WS170 to WS220. Adjusted nucleosome occupancy data centered on 21U RNAs, TSS, and an intergenic background control were averaged for each nucleotide. Pol II ChIP-seq data from young adult worms were downloaded from the modEncode repository . Pol II signal to input ratios on chromosome IV were averaged for each nucleotide. TSS were further filtered to only include transcripts with at least 5 FPKM as calculated above.
Pol II ChIP-seq data from young adult worms were downloaded from the modEncode repository. Pol II signal to input ratios were averaged for each nucleotide separately for male and female 21U RNAs on chromosome IV.
List of 21U RNAs analyzed in this study, their Enrichment scores, their germline enrichment classifications based on Enrichment scores (Classification by score), and final germline enrichment classifications after removal of 21U RNAs whose average fold abundance in the enriched germline was not higher than the non-enriched germline (Adj. for avg. fold abundance).
Computational identification of male and female germline-enriched 21U RNAs. (A) Enrichment Score calculations performed on 17 small RNA sequencing libraries classify a majority of 21U RNAs as male (blue) or female (red) germline-enriched. Non-enriched (NE) 21U RNAs, grey. Numbers indicate percent of 13,711 21U RNAs analyzed. (B) Enrichment Score calculations performed on control data classify <1% of 21U RNAs as male or female germline-enriched indicating a 1% false discovery rate. Numbers indicate percent of 11,458 21U RNAs analyzed. (C) Male 21U RNAs are more abundant in male libraries. Average relative abundance of each male 21U RNA was calculated between each of the 23 malefemale library comparisons. (D) Female 21U RNAs are more abundant in female libraries. Average relative abundance of each male 21U RNA was calculated between each of the 23 femalemale library comparisons. (E) Non-enriched 21U RNAs are equally abundant in male and female libraries.
Female 21U RNAs are preferentially abundant in embryo. (A) Relative male 21U RNA abundance is decreased in embryo. Average relative abundance of each male 21U RNA was calculated between each of 5 male and 4 mixed stage embryo libraries. Dotted line indicates equal male and embryo abundance. Pie chart depicts proportion of male 21U RNAs with reads in at least one embryo library (dark blue). (B) A population of female 21U RNAs shows increased abundance in embryo. Average relative abundance of each female 21U RNA was calculated between each of 1 female and 4 mixed stage embryo libraries. Pie chart depicts proportion of female 21U RNAs with reads in at least one embryo library (dark red). (C,D) Taqman RT-qPCR analysis corroborates male 21U RNA depletion in embryo. Expression of representative male 21U RNAs was assayed by Taqman in him-8(e1489) (C) and fog-2(q71) (D) male animals and N2 embryos. Error bars represent ±1 SD from two biological replicates. (E) Taqman RT-qPCR analysis corroborates female 21U RNA enrichment in embryo. Expression of representative female 21U RNAs was assayed by Taqman in fem-1(hc17) female animals and N2 embryos. (F) Male germline-enriched 26G RNAs are generally absent in embryo. Average relative abundance of each male 26G RNA was calculated between each of 4 male and 4 mixed stage embryo libraries. (G) Female germline-enriched 26G RNAs are robustly expressed in embryo. Average relative abundance of each female 26G RNA was calculated between each of 1 female and 4 mixed stage embryo libraries.
21U RNAs target significantly non-overlapping sets of genes. (A) 22G RNAs are almost exclusively derived from either male or female 21U RNAs, but not both. The number of unique 22G RNAs derived from both male and female 21U RNAs is significantly less than expected if 22G RNAs are selected at random (Fisher's exact test, p=1.2e−02). (B) Male and female 21U RNAs target significantly fewer overlapping genes compared to selecting random sets of genes (Fisher's exact test, p=7.7e−13). (C) 5,956 genes targeted by male 21U RNAs in young adult (YAd) animals are depleted of spermatogenesis genes compared to a random set of 5,956 genes. (D) 1,387 genes targeted by female 21U RNAs in gravid adult (GA) animals are enriched for oogenesis genes compared to a random set of 1,387 genes.
Transgenic array expression varies across transgenes. (A) Levels of Cbr-unc-119 mRNA in adult animals were assayed by RT-qPCR for all transgenes and normalized to act-1 mRNA levels. (B) Expression of transgenic 21UR-synth does not affect expression of endogenous 21U RNA counterparts. Endogenous ♂21UR-1258 and ♀21UR-2502 levels were assayed by Taqman RT-qPCR and normalized to microRNA miR-1 levels in all samples.
21U RNAs are specifically immunoprecipitated with PRG-1 complexes. (A) anti-PRG-1 antibody does not immunprecipitate microRNA miR-1. (B) 21UR-synth expression does not interfere with association of endogenous 21U RNAs with PRG-1.
21U RNA expression requires a 5′ genomic thymidine. (A) Schematic of transgenes encoding 21UR-synth with different 5′ nt. (B–C) Mutation of the 5′ genomic thymidine disrupts expression of 21UR-synth by northern blot (B) and Taqman assay (C). WT and ♀Tg2502 lanes in (B) are repeated from Figure 5B for clarity.
RNA polymerase II occupancy at 21U RNA loci is below background level. (A) Average Pol II occupancy in a young adult library of 21U RNA loci expressing 21U RNAs with at least 5 RPM (red), transcriptional start sites (TSS) expressing transcripts with at least 5 FPKM (green), and randomized intergenic regions (yellow). Only regions on ChrIV were assayed (B) Pol II occupancy as described in (A) but independently scaled for each transcript type and plotted with average nucleosome occupancy (black line). Grey error bands: SEM. (C) Average Pol II occupancy of 21U RNA loci as (B) but showing the top 25% 21U RNAs by abundance (1st quartile) and the bottom 25% (4th quartile) separately. 21U RNAs on all chromosomes are shown. (D) Same as (E) but showing top and bottom 25% of TSS by transcript abundance.
Model of 21U RNA expression.
Descriptions of small RNA sequencing libraries used in this study. GEO Accessions for datasets and libraries used are listed. Libraries generated using 5′-monophosphate-dependent (Dep) or -independent (Indep) RNA extraction protocols are indicated along with how the library was used in this study (“Use” column).
Welch's t-test p-values for all abundance comparisons between 21U RNAs with different core motifs. Highlighted are p-values <0.01. Identity of the 5′ nt corresponding to higher 21U RNA abundance is indicated below each significant p-value. All t-tests are two-tailed. Comparisons of abundances in 5′-monophosphate-dependent and -independent libraries were performed separately.
We thank T. Han, J. Moran, S. Kalantry, E. Fearon, and S. Camper for discussions and comments on the manuscript, and Germano Cecere and Alla Grishok for generously sharing methods. We also thank the Caenorhabditis Genetics Center for C. elegans strains.
ACB was supported by the NIH Genetics Training Grant graduate fellowship T32GM007544-34. MAF was supported by the National Science Foundation Open Data IGERT Grant 0903629. AMD was supported by the NIH Cellular and Molecular Biology Training Grant T32-GM007315. SYC was supported by the National Institutes of Health Bioinformatics Training Grant T32GM070449-06. VK and JKK were supported by grants from the National Institute of General Medical Sciences (NIGMS) R01GM088565 and the Pew Charitable Trusts (www.pewtrusts.org). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.