Distinct siRNA and piRNA pathways act upon 3' UTRs in Drosophila OSS cells
We recently showed that the Drosophila
OSS cell line not only expresses miRNAs, but generates abundant piRNAs and siRNAs from transposable elements (TEs) [11
]. Its accumulation of abundant piRNAs is shared with gonadal tissues and early embryos [13
]. On the other hand, most Drosophila
cell types are capable of mounting an RNAi response, either to endogenous or exogenous triggers [17
]. For example, somatic and germline cells generate endo-siRNAs from TEs and from the overlap regions of convergently transcribed protein-coding genes ("3' cis-NAT-siRNAs") [20
]. Indeed, we categorized >18,000 cis-NAT-siRNAs from OSS small RNA reads, of which 75% were accounted for by the overlap regions of 86 3' cis-NAT pairs that each generated >50 siRNAs ( and Supplementary Table S1A
Figure 1 Distinct spatial patterns of siRNAs and piRNAs mapped to mRNAs. (A) 3' cis-natural antisense transcript (cis-NAT) overlap regions generate nearly exclusively 21 nt reads (siRNAs) in S2 cells. (B) cis-NAT overlaps generate both siRNAs and piRNAs in OSS ...
Although S2 and OSS cells are divergent in their overall gene expression profile, many cis-NAT-siRNA loci were common to OSS cells and S2 cells [20
]. In fact, 17 of the 40 most highly expressed cis-NAT-siRNA loci were shared by these cell types (Supplementary Table 1B
), including CG5148/cher
(#4 in OSS, #8 in S2), MED21/CG40351
(#5 in OSS, #1 in S2), cenB1A/CG31365
(#6 in OSS, #14 in S2), and CG7739/AGO2
(#7 in OSS, #3 in S2). These data are consistent with the proposal that there exist preferred cis-NAT substrates for the endo-RNAi pathway [20
]. Curiously, the third most highly-expressed cis-NAT-siRNA locus in OSS cells was piwi/SCAR
(); thus, transcripts for two Argonaute effectors are leading substrates of endo-RNAi in Drosophila
While the reads from S2 cis-NAT overlaps were strictly 21 nt in length, OSS cis-NAT overlap reads exhibited an unexpectedly bimodal size distribution reflecting both siRNAs and piRNAs (). As with other cis-NATs, the SCAR/piwi locus illustrated that piRNA production was not confined to the region of double-stranded transcription, as is characteristic of endo-siRNA production. Instead, piRNAs preferentially mapped to the sense strands of both SCAR and piwi 3' UTRs, and to a lesser extent to the piwi coding region. Such patterns suggested that primary piRNAs derive from independent consumption of transcripts produced from either strand. The spatially segregated registers of piRNA and siRNA production within 3' UTRs indicated that, in addition to being translated, subpopulations of a transcript could enter multiple small RNA biogenesis pathways.
On the basis of these observations, we conducted a transcriptome-wide survey for piRNAs derived from annotated protein-coding transcripts, yielding 1.25 million such piRNAs out of 13.8 million total library reads (). Thus, the mRNA-directed primary piRNA pathway is highly active: the production of 3' UTR-piRNAs is ~22% of the amount of TE-piRNAs and ~30% of the amount of miRNAs in OSS cells [11
]; mRNA-derived piRNAs outnumbered cis-NAT-siRNAs by 40-fold. This pathway is also broadly active: at an expression cutoff of >50 piRNAs, we defined 2356 mRNAs that generated 1.1 million piRNAs (Supplementary Table S2
). This collection comprised >12% of Drosophila
transcripts, compared to ~2% of transcripts that generated cis-NAT-siRNAs.
Over 70% (877,808) of mRNA-derived piRNAs mapped to 3' UTRs (), even though the aggregate length of 3' UTRs was much smaller than coding regions (5.4Mb vs. 22.5Mb). The strong bias for 3' UTR-directed piRNA production was illustrated by plotting the cumulative relative density of piRNAs derived from 5' UTRs, CDS and 3' UTRs (). Despite this strong bias, a small subset of mRNAs preferentially generated piRNAs across their coding regions ().
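The length normalization behind this comparison can be sketched with the figures quoted above. This is a rough illustration rather than the analysis pipeline used in the study: the function name is hypothetical, the 1.25 million total is a rounded figure, and every read outside 3' UTRs is assigned to coding sequence, which overstates CDS density and so yields a conservative lower bound on the enrichment.

```python
def reads_per_mb(read_count: int, feature_length_mb: float) -> float:
    """Return read density as reads per megabase of feature sequence."""
    if feature_length_mb <= 0:
        raise ValueError("feature length must be positive")
    return read_count / feature_length_mb


# Figures quoted in the text (the total is rounded in the source).
TOTAL_MRNA_PIRNAS = 1_250_000
UTR3_PIRNAS = 877_808
UTR3_LENGTH_MB = 5.4
CDS_LENGTH_MB = 22.5

utr3_density = reads_per_mb(UTR3_PIRNAS, UTR3_LENGTH_MB)

# Assumption: all remaining reads are treated as CDS-derived (an upper bound,
# since some of them actually map to 5' UTRs).
cds_density_upper = reads_per_mb(TOTAL_MRNA_PIRNAS - UTR3_PIRNAS, CDS_LENGTH_MB)

fold_enrichment = utr3_density / cds_density_upper

print(f"3' UTR density: {utr3_density:,.0f} reads/Mb")
print(f"CDS density (upper bound): {cds_density_upper:,.0f} reads/Mb")
print(f"3' UTR enrichment: at least {fold_enrichment:.1f}-fold")
```

Under these assumptions the per-megabase density in 3' UTRs comes out close to tenfold higher than in coding regions, even though 3' UTRs account for far less sequence.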
Figure 2 Preferred production of piRNAs from 3' UTRs. (A) Normalized piRNA read density amongst the transcript features of 5’ UTRs, CDS and 3’ UTRs in genes with >50 piRNAs from OSS cells. Values greater than 1 indicate enrichment of piRNAs, ...
The upper levels of piRNA accumulation were considerable: 180 genes generated >1000 piRNAs (Supplementary Table S2 and Supplementary Figure S1
). The loci with the most abundant 3' UTR piRNAs were traffic jam
) and brain tumor
) with >67,000 and >22,000 piRNAs, respectively, and their piRNAs covered the entirety of these 3' UTRs (). tj
encodes a Maf-bZIP transcription factor that is specifically expressed in somatic gonadal cells and genetically required for ovary and testis development [24
is a translational regulator required for development and growth control [25
], and has also been implicated in the miRNA pathway [27
]. The 3' UTR-directed piRNA pathway produced levels of small RNAs from some transcripts that were equivalent to reasonably highly-expressed miRNAs (e.g., only 13 miRNA genes generated more reads than tj
, Supplementary Table S2
Drosophila 3' UTR piRNAs are produced by a primary pathway dependent on Piwi
To gain evidence for exonic piRNA production in the animal, we analyzed large-scale data of small RNAs from total ovaries and 0–2 hour embryos published by Hannon and colleagues [14
]. We tallied ~100,000 such piRNAs in the combined available ovary data, and ~65,000 piRNAs in the combined embryo data. When normalized for the total reads in each dataset, OSS cells expressed considerably higher levels of 3' UTR-derived piRNAs, ~56,800 reads per million (RPM) vs. 4,500 RPM in ovaries and 3,000 RPM in early embryos. We defined 316 and 253 transcripts in the ovary and embryo, respectively, which crossed a normalized threshold equivalent to the 50 piRNA cutoff for OSS cells. Importantly, the strong majority of 3' UTR piRNA-generating transcripts detected in the animal were shared by OSS cells (Supplementary Figure S2A
), indicating that this reflects a normal pathway rather than an aberrant feature of cultured OSS cells. The 12.6-fold elevation of 3' UTR-derived piRNAs in OSS cells was in line with our observation that follicle-cell specific flamenco
piRNAs are ~8-fold higher in OSS than in total ovaries [11
], suggesting that 3' UTR-piRNAs are preferentially generated in follicle cells. Similar to primary TE-piRNAs, 3' UTR-derived piRNAs exhibited a 5' U bias (66% of 3' UTR-piRNAs), which is also consistent with their production by a primary piRNA pathway.
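The normalization used in this comparison can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the RPM values are those quoted in the text, and the 20-million-read library in the cutoff example is an invented size used only to show how a threshold equivalent to the 50-piRNA OSS cutoff could be scaled.

```python
def reads_per_million(read_count: float, library_size: float) -> float:
    """Normalize a raw read count to reads per million (RPM) library reads."""
    if library_size <= 0:
        raise ValueError("library size must be positive")
    return read_count * 1_000_000 / library_size


def equivalent_cutoff(cutoff: float, reference_size: float, other_size: float) -> float:
    """Scale a raw-count cutoff from a reference library to another library."""
    return cutoff * other_size / reference_size


# RPM values quoted in the text for 3' UTR-derived piRNAs.
OSS_RPM = 56_800
OVARY_RPM = 4_500
EMBRYO_RPM = 3_000

oss_vs_ovary = OSS_RPM / OVARY_RPM
oss_vs_embryo = OSS_RPM / EMBRYO_RPM
print(f"OSS vs. ovary: {oss_vs_ovary:.1f}-fold")
print(f"OSS vs. embryo: {oss_vs_embryo:.1f}-fold")

# Hypothetical example: the 50-read cutoff in the 13.8 million read OSS library
# scaled to an invented library of 20 million reads.
scaled = equivalent_cutoff(50, 13_800_000, 20_000_000)
print(f"Equivalent cutoff in a 20M-read library: {scaled:.1f} reads")
```

The OSS-to-ovary ratio computed this way reproduces the 12.6-fold elevation cited in the text.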
The Hannon and Zamore groups recently sequenced ovarian small RNAs from heterozygous or homozygous mutants in a variety of piRNA pathway components [28
]. Hannon and colleagues observed that all piRNA pathway mutants exhibited decreased levels of TE-piRNAs, but that only piwi
specifically decreased levels of flamenco piRNAs [28
]. These and other observations supported a model in which germline cells (but not follicle cells) engage in strong ping-pong amplification, and that zucchini
are uniquely involved in a primary piRNA biogenesis pathway in somatic ovarian cells.
We collected the mRNA-derived reads from these datasets and observed that 3' UTR-derived piRNAs (and coding exonic piRNAs to a lesser degree) exhibited many similarities to TE-piRNAs across the mutants. For example, zucchini
mutants were strongly decreased for mRNA-derived piRNAs (), consistent with their role in primary TE-piRNA biogenesis. flamenco
did not substantially alter the 3' UTR piRNA population, befitting its status as a piRNA substrate rather than a biogenesis component per se
. In contrast, the ping-pong factors armi
exhibited increased proportions of 3' UTR piRNAs. However, this does not necessarily mean that these genes limit the accumulation of 3' UTR piRNAs. We also observed that the mutants with the strongest reduction in piRNAs from the ping-pong cluster 42AB--armi
]--correspondingly exhibited the highest proportions of 3' UTR-piRNAs (). We hypothesize that the absence of TE-piRNA amplification may incidentally lead to higher representation of 3' UTR piRNAs in the remaining small RNA pool.
Examination of the other Piwi-class mutants, aub
, was also informative [28
]. While AGO3
mutants had increased proportions of 3' UTR piRNAs, similar to most other ping-pong mutants, aub
actually exhibited decreased 3' UTR piRNAs. This suggested that Aub received some 3' UTR piRNAs via a primary biogenesis pathway operative in the germline, consistent with the notion that Aub and Piwi are both loaded with primary piRNAs while AGO3 mostly contains secondary piRNAs [2
]. Still, the piwi
mutation clearly had the greatest effect on 3' UTR piRNA accumulation. Indeed, piwi
exhibited substantial haploinsufficiency, since piwi
heterozygotes accumulated far fewer 3' UTR piRNAs than did aub
heterozygotes (). In summary, these analyses support the existence of a primary piRNA biogenesis pathway in vivo
, with specificity towards the 3' UTRs of protein-coding transcripts and marked dependence on PIWI.
Conservation of 3' UTR piRNAs in vertebrates
We next assessed whether the mRNA/3' UTR-directed piRNA pathway was apparent in mammalian gonads. We analyzed published mouse pre-pachytene testes (10 dpp) data [4
] and generated new small RNA data from post-pachytene (8 weeks adult) testes. The proportion of pre-pachytene spermatocytes in relation to other testicular cell types is highest at 10 dpp; subsequently, post-pachytene spermatocytes and spermatids dominate in the testes [34
]. At 10 dpp, Mili appears to be the sole Piwi protein expressed, whereas the adult testis expresses both Mili and Miwi [4
Exonic piRNAs have been noted [4
], but their regional preferences within protein-coding transcripts were not previously characterized in detail. Examination of previously published lists of the top 200 fetal, 100 pre-pachytene, and 94 post-pachytene clusters [4
], indicated that many of these clusters could be re-classified as 3’ UTR-directed piRNA clusters (Supplementary Table S3
; e.g. clusters ranked #34 and #2 in fetal and 10dpp testis, respectively, are piRNAs against the 3’ UTR of ELK4
, while adult testis cluster ranked #49 has piRNAs against the 3’ UTR of CBL
). As was the case in Drosophila
follicle cells, we observed substantial bias for 3' UTR-directed piRNA production in mouse testes (), although there were distinct populations of transcripts with apparent CDS or even 5' UTR enrichment for piRNA production. 3' UTR-piRNAs were seemingly much more abundant at 10dpp (~35% of all library reads) than in the adult (1.8% of all reads), although their representation in the adult library may incidentally be lowered due to the tremendous output of pachytene piRNA clusters [7
Figure 3 Mammalian 3' UTR-directed piRNAs and representative directed transcripts. (A) Size distribution of reads sequenced from total small RNAs (top graph), a MIWI IP (middle graph), and a MILI IP (bottom graph) from adult mouse testes extract. Dotted boxes ...
From combined data in 10dpp and adult testes total RNA libraries, we identified 829 transcripts with 50 or more sense-oriented, uniquely mapping piRNAs, with 83 transcripts generating >1000 piRNAs (Supplementary Figure S3 and Supplementary Table S5A and S5B
). For abundant piRNA-generating mRNAs, the strong majority of piRNAs were located in 3' UTRs (424,227 uniquely mapping reads) relative to 5' UTR and CDS (55,799 total uniquely mapping reads). Even following normalization for gene lengths, piRNA density was highest amongst 3' UTRs in both 10dpp and adult testes libraries (). The Ets-domain oncogene family gene, ELK4
, and the Abelson murine leukemia viral oncogene homolog 2, ABL2
, serve as typical examples of transcripts with 3' UTR-directed piRNA production (). The enrichment of 3' UTR piRNAs is actually more considerable since ~1/3 of the reads mapping outside of 3' UTRs derived from non-coding RNAs like Rnu1b2
, whose abundant reads came from point-sources (Supplementary Figure S4
); or genes like Fth1
, which are transcripts with abundant piRNA production from 5' UTR and CDS (Supplementary Figure S5
Analysis of a published 10 dpp library from mili
−/− testes [4
] revealed a strong depletion of genic piRNAs, including at 3' UTRs ( and Supplementary Figure S3
). This genetic dependence echoed the observation that 3' UTR-piRNAs are highly dependent on Drosophila piwi
. We observed a strong 5' U bias (>85%) for murine mRNA/3' UTR-derived piRNAs, suggesting that these are primary piRNAs as in flies.
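A 5' nucleotide bias of this kind is simply the fraction of reads that begin with uridine. The sketch below illustrates the calculation; the function name and the four toy reads are hypothetical and are not data from this study. It assumes reads may be given in either RNA (U) or DNA (T) notation.

```python
def five_prime_u_fraction(reads: list[str]) -> float:
    """Return the fraction of reads whose first base is U (or T in DNA notation)."""
    cleaned = [read.strip().upper() for read in reads if read.strip()]
    if not cleaned:
        raise ValueError("no reads supplied")
    starts_with_u = sum(1 for read in cleaned if read[0] in ("U", "T"))
    return starts_with_u / len(cleaned)


# Hypothetical toy reads, for illustration only.
toy_reads = [
    "TAGCTTGACCATGGCTAAGCTTGACCA",
    "UGGCAUUACGGAUCCAAGUUCGAUCGA",
    "TCCGATAGGCTTAACGGATCGATTAGC",
    "AGGCTTAACCGGATTCGATCGGATACC",
]

fraction = five_prime_u_fraction(toy_reads)
print(f"5' U fraction: {fraction:.0%}")
```

Applied to a real library, the same fraction would be compared against the values reported in the text (66% for Drosophila 3' UTR-piRNAs and >85% for the murine reads).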
Loading of 3' UTR-derived piRNAs into Piwi complexes in three animal clades
We next sought biochemical validation of the loading of 3' UTR piRNAs into Piwi complexes. Analysis of Piwi immunoprecipitates (IP) from OSS cells revealed that it carried piRNAs derived from the tj and brat 3' UTRs, but did not contain the microRNA miR-2 (). Reciprocally, AGO1-IP contained miR-2 but not 3' UTR piRNAs. In adult mouse testes, we detected Abl2 and Elk4 piRNAs in both Mili and Miwi complexes, but not in mAGO2 complexes. Equivalently-sized probes directed at coding exons of these genes did not yield signal (data not shown), consistent with the much lower abundance of coding piRNAs from these mRNAs.
Figure 4 Association of 3’ UTR-directed piRNAs in Piwi-protein complexes. (A) RNAs from PIWI, MILI, and MIWI IPs were probed on Northern blots with cloned 3’ UTR fragments. (B) The 3’ UTR probes are specific for piRNAs from Piwi-protein ...
Curiously, the peak size of 3' UTR piRNAs in Mili complexes (27–28 nt) was distinctly shorter than that in Miwi complexes (29–30 nt), reminiscent of the observation that different Piwi proteins are associated with characteristic sizes of piRNAs [3
]. The sizes of the Northern blot signals from the Miwi and Mili IP were in concordance with deep-sequencing analysis (). In addition, the observation that bulk 3' UTR piRNAs in total RNA were the same size as those in Miwi-IP suggested that Miwi is the predominant carrier of 3' UTR piRNAs in adult testes. On the other hand, Mili may be the primary carrier of 3’UTR piRNAs in 10dpp testes because Miwi is not yet expressed at this stage [4
]. Consistent with this, a previously reported 10 dpp Mili-IP library contained a 3' UTR-biased distribution of 26–27 nt piRNAs, similar to total RNA data from this stage [4
We confirmed the distribution of 3' UTR-derived piRNAs by sequencing 8.2 million Mili-IP and 5.5 million Miwi-IP mapped reads from adult testes, and comparing these to our matched total RNA reads. To our knowledge, Mili-IP has not previously been deeply sequenced at this stage, and no deep Miwi-IP data have been reported thus far. We observed that the total RNA library contains three distinct size peaks at 22 nt, 27 nt and 30 nt (). The former corresponds to miRNA reads, while the latter two peaks correspond to piRNA reads. Consistent with the Northern blots, analysis of the IP libraries showed that these peaks segregated precisely with the contents of Mili and Miwi complexes, respectively. In addition, the total RNA library exhibited greater representation of 3' UTR reads whose sizes corresponded to Miwi complexes, as we inferred from Northern analysis. When focusing on reads that matched to 3' UTRs, we observed similar trends across these libraries, indicating that 3' UTR piRNAs were loaded into distinct Piwi-class complexes in adult testes. Nevertheless, the genes hit by piRNAs were relatively similar between Mili- and Miwi-IPs (data not shown), suggesting their production via a common primary pathway.
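The size-class comparison described above amounts to tallying read lengths into windows around the observed peaks. The sketch below is illustrative only: the window boundaries are assumptions chosen around the 22, 27 and 30 nt peaks reported in the text, the labels and function names are hypothetical, and the toy lengths are not data from this study.

```python
from collections import Counter

# Assumed length windows (in nt) around the peaks reported in the text.
SIZE_CLASSES = {
    "miRNA-sized": range(21, 24),       # around the 22 nt peak
    "Mili-sized piRNA": range(26, 29),  # around the 27 nt peak
    "Miwi-sized piRNA": range(29, 32),  # around the 30 nt peak
}


def classify_length(length: int) -> str:
    """Assign a read length to one of the assumed size classes."""
    for label, window in SIZE_CLASSES.items():
        if length in window:
            return label
    return "other"


def size_class_counts(read_lengths: list[int]) -> Counter:
    """Count reads per size class."""
    return Counter(classify_length(length) for length in read_lengths)


# Hypothetical read lengths, for illustration only.
toy_lengths = [22, 22, 27, 27, 27, 30, 30, 30, 30, 19]
counts = size_class_counts(toy_lengths)
for label, count in counts.most_common():
    print(f"{label}: {count}")
```

Running the same tally separately on total RNA, Mili-IP and Miwi-IP reads would show which complex accounts for each piRNA-sized peak.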
To address whether 3' UTR piRNAs might be present in Piwi complexes of mature germ cells in an evolutionarily distant vertebrate outgroup species, we examined in Xenopus tropicalis
eggs piRNAs that associated with the Y12 antibody, which binds symmetrically methylated arginines [37
] that are present on diverse Piwi proteins [32
]. Despite incomplete genome annotation and the high proportion of reads from intergenic regions and TEs, we identified numerous annotated mRNAs with piRNAs corresponding to exons and 3' UTRs (, data not shown). These reads were less abundant than in mouse and Drosophila
, but KLHL7
serve as examples for piRNA enrichment at 3' UTRs. KLHL7 is implicated in ubiquitination and is associated with retinitis pigmentosa [39
], while DDX3X is a helicase homologous to Drosophila
Belle, which has been implicated as a component in the endo-RNAi pathway [40
]. The existence of 3' UTR piRNAs in Xenopus
provides evidence that this biogenesis pathway has been quite broadly conserved amongst animals.
Selective production of 3' UTR-directed piRNAs from gonadal mRNAs
We next assayed the relationship of piRNA-producing transcripts with Affymetrix gene expression data from OSS cells (this study) and murine testis [41
]. Comparing the expression of mRNAs generating 3’ UTR-piRNAs with that of the general population, we observed that transcripts with moderate to high levels of piRNAs were biased for higher expression (mean log2
expression=6.7 for all transcripts called present, 7.3 for mRNAs with >100 piRNAs and 7.4 for mRNAs with >1000 piRNAs, Supplementary Figure S2B
). Nevertheless, a large population of highly expressed transcripts (385 transcripts with log2
expression value of >8) did not generate piRNAs, indicating that piRNA production was not strictly determined by transcript abundance. This was illustrated by plotting the gene expression of all mRNAs vs. those that generated >100 piRNAs and >1000 piRNAs (). One might have expected transcripts with higher piRNA production to be greatly skewed towards higher expression levels; instead, the mRNAs with highest piRNA production spanned the full range of transcript accumulation.
Figure 5 (A) Histograms of genes by their ranked log10 expression levels and their corresponding count of 3’ UTR piRNAs from OSS cells, mouse 10dpp testes total RNA, and mouse adult testes total RNA libraries. Expressed genes are represented in white bars, ...
We found the same to be true for 10 dpp and adult mouse testes. Although there was a slight skew for piRNA-producing transcripts towards higher steady-state levels, they exhibited a similar distribution as the profile of all genes that are expressed in mouse testes (). These data suggest that the primary piRNA pathway selects the 3' UTRs of transcripts across a broad expression range, rather than acting non-specifically on transcripts in proportion to their abundance.
Selection of transcripts with specific GO enrichments by the primary piRNA pathway
Since there appeared to be selectivity for mRNA substrates by the primary piRNA pathway, we tested for enrichment of specific gene ontology (GO) categories. We first analyzed 646 OSS mRNAs that generated >200 3' UTR piRNAs and had GO terms (regardless of gene expression level), relative to a background set of ~3000 lowly-expressed OSS transcripts with GO terms. To assess whether piRNA-producing genes exhibited properties that were distinct from merely abundant transcripts, we analyzed the 675 most abundant OSS mRNAs that produced at most 10 piRNAs. These gene sets overlapped modestly in their GO terms, which included cytoskeleton, RNA and protein metabolism processes and various nucleic acid binding-related functions (). However, the themes amongst GO terms exclusively represented in either dataset were notably different: abundant genes lacking piRNAs were most highly enriched in "housekeeping" GO categories such as general biosynthesis, metabolism, translation and ribosome components, and general transcription machinery. In contrast, the most highly enriched GO terms amongst genes with abundant 3’UTR piRNAs included categories such as development, morphogenesis and regulatory processes, as well as DNA binding proteins, kinases and phosphatases. The specific enrichment of many regulatory and developmental functions amongst abundant piRNA-generating genes, but not amongst abundantly expressed genes per se, suggests possible regulatory coherence by the primary piRNA pathway.
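One common way to score enrichment of a GO term in a gene set against a background is a one-sided hypergeometric test. The text does not state which statistic was used, so the sketch below is an assumption rather than the method of this study. The function name is hypothetical, and the term counts (400 and 120) are invented for illustration; only the set sizes (646 piRNA-producing mRNAs and roughly 3000 background transcripts) come from the text.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, sample: int, hits: int) -> float:
    """One-sided hypergeometric p-value, P(X >= hits).

    population: genes in the combined set (test set plus background)
    annotated:  genes in the population that carry the GO term
    sample:     genes in the test set
    hits:       genes in the test set that carry the GO term
    """
    if not (0 <= hits <= sample <= population and 0 <= annotated <= population):
        raise ValueError("inconsistent counts")
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Set sizes from the text; GO term counts below are hypothetical.
PIRNA_GENES = 646
BACKGROUND_GENES = 3000
population = PIRNA_GENES + BACKGROUND_GENES

p_value = hypergeom_enrichment_p(
    population=population,
    annotated=400,  # hypothetical: genes carrying the term overall
    sample=PIRNA_GENES,
    hits=120,       # hypothetical: piRNA-producing genes carrying the term
)
print(f"enrichment p-value: {p_value:.3g}")
```

In practice such a test would be run once per GO term and the resulting p-values corrected for multiple testing before comparing the piRNA-producing and abundant-transcript cohorts.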
To analyze the murine data, we used cutoffs of ≥50 piRNAs/mRNA in the 10 dpp total RNA library, and ≥10 piRNAs/mRNA in the total RNA, Mili-IP and Miwi-IP libraries from adult testis (to account for lower 3’ UTR piRNA content in adult libraries). Equivalently-sized cohorts of mRNAs with high 3' UTR piRNAs and a non-overlapping set of the highest expressed mRNAs without piRNAs revealed no overlap in enriched GO Function categories and only a modest overlap in Process categories (), indicating that genes that generated abundant piRNAs were functionally distinct from abundant transcripts. Terms highly enriched in abundant transcripts without piRNAs at both 10dpp and adult testes included many housekeeping categories, such as catabolism, translation and ribosome components. On the other hand, terms that were highly and uniquely enriched amongst abundant piRNA transcripts included transcription and nucleic acid metabolic processes, and zinc ion-binding and kinase-related functions.
Taken together, these analyses clearly demonstrate that abundant piRNA-producing transcripts in Drosophila
OSS cells and murine testis comprise gene cohorts whose functions are very distinct from those of highly-expressed transcripts in either species. Interestingly, several of the GO categories enriched in piRNA-producing transcripts were shared between fly and mouse gonads, even though the actual genes involved were quite different. We take this as a strong suggestion that the selection of mRNAs encoding certain types of GO functions is important to the operation of the 3' UTR-directed piRNA pathway. Notably, posttranscriptional gene silencing by RNA was a category enriched amongst abundant piRNA-generating mRNAs in both OSS cells and adult murine testes, but not amongst abundant transcripts in either tissue. In Drosophila
, these genes included brat
, AGO1 gawky/GW182
(Supplementary Table S2
), while in mouse these included TNRC6b
, and piwil2/mili
(Supplementary Table S5A, S5B
Beyond the small subset of small RNAs that coincidentally derive from cis-NATs or are related to TEs (e.g. and ), the bulk of mRNA/3' UTR-derived piRNAs lack highly complementary sequences in the transcriptome. Although some mRNA/3' UTR-derived piRNAs conceivably guide the regulation of trans-encoded targets bearing imperfect matches, a more proximal effect is the removal of 3' UTRs from transcripts via abundant piRNA production. However, we did not observe substantial, consistent changes in the average expression of murine piRNA-producing mRNAs in mili
+/− vs. −/− testes (Supplementary Table S8
In principle, the results from whole gonads might be confounded if only a subset of cells have the capacity for piRNA generation, and/or if there is a discrepancy between mRNA and protein output. We examined this in more detail by generating piwi mutant clones in Drosophila ovaries and staining them for TJ, whose mRNA generated the most abundant 3' UTR piRNAs in OSS cells. We observed many clones in which TJ protein was upregulated in piwi mutant cells relative to neighboring control cells (). There appeared to be a temporal dependence in that younger clones did not exhibit increased levels of TJ, perhaps because of insufficient time for its deregulation. Nevertheless, these data support the notion that this pathway might influence target output.