A recent landmark discovery has identified a novel class of small RNAs in mammalian testes that is expressed during spermatogenesis [1
]. PIWI-interacting RNAs (piRNAs) are typically ~30 bases long, associate with PIWI proteins, and are organized into distinct genomic clusters (reviewed in [7
]). The function of piRNAs is currently unknown, but the homology of PIWI proteins to Argonaute proteins, key components of the small interfering RNA pathway, and the similarities of piRNAs to microRNAs and short-interfering RNAs (siRNAs), known as negative regulators of gene expression, suggest a role in RNA-dependent regulatory processes during meiosis. Furthermore, piRNAs are similar to repeat-associated small interfering RNA (rasiRNA), a class of small RNAs that are responsible for transposon silencing in the Drosophila
] (and recently identified in Zebrafish
]), suggesting analogies between rasiRNAs and mammalian piRNAs in terms of biogenesis and function. Note that the terms rasiRNA and piRNA are often used interchangeably. Here we refer to the PIWI-interacting small RNAs from Drosophila as rasiRNAs and the mammalian counterparts as piRNAs without discounting functional similarity.
To better understand the origin of piRNAs, we compared the three largest available mouse piRNA datasets (identified at the pachytene stage of spermatogenesis) in terms of sequence similarity and cluster organization. Given the comprehensive nature of these efforts and the focus on a common specific stage in mouse spermatogenesis, we expected close agreement between the datasets. Indeed, the three groups report similar location, size, and strand organization of the piRNA genomic clusters (A). However, the three sets of sequences are surprisingly dissimilar, suggesting a much larger underlying pool of potential piRNAs from which each group has independently sampled. We estimate the size of the pool to be about 2 × 10^5 potential piRNAs, based on the number of sequences in each dataset and their overlaps.
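One common way to estimate the size of a pool from independent samples and their overlap is the Lincoln-Petersen (capture-recapture) estimator. The sketch below illustrates that idea only; it is an assumption that it resembles the estimate described above, and it further assumes that every sequence in the pool is equally likely to be sampled, which may not hold if piRNA abundances differ. The sequence identifiers and counts are hypothetical and are not the values from Table S1.

```python
def lincoln_petersen(n_a: int, n_b: int, n_shared: int) -> float:
    """Estimate the pool size N from two independent samples.

    If sample A covers a fraction n_a / N of the pool, then about
    n_b * n_a / N sequences of sample B should also appear in A.
    Solving n_shared = n_a * n_b / N for N gives the estimator.
    """
    if n_shared <= 0:
        raise ValueError("no overlap: the pool size cannot be estimated")
    return n_a * n_b / n_shared


# Hypothetical datasets, for illustration only.
seqs_a = {f"pi{i}" for i in range(0, 4000)}      # 4,000 distinct sequences
seqs_b = {f"pi{i}" for i in range(3600, 5600)}   # 2,000 distinct sequences

shared = len(seqs_a & seqs_b)                    # sequences found in both sets
estimate = lincoln_petersen(len(seqs_a), len(seqs_b), shared)
print(f"shared = {shared}, estimated pool size = {estimate:.0f}")
```

A small overlap relative to the dataset sizes pushes the estimate far above the size of either dataset, which is the qualitative argument made in the text.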
Sequence and Cluster Overlaps between Datasets A, B, and C
We further show that 25% of piRNA clusters are bracketed by inverted repeats of varying length, suggesting that some of the long single-stranded piRNA precursors [1
] can form a double-strand RNA (dsRNA) intermediate from inverted repeats that may trigger piRNA biogenesis. Taking into account positional nucleotide frequencies and copy numbers of experimentally determined piRNAs, we conclude that piRNA precursors are processed by a quasi-random mechanism that generates large numbers of distinct piRNA sequences.
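To illustrate what "bracketed by inverted repeats" means at the sequence level, the following minimal sketch looks for k-mers in the left flank of a cluster whose reverse complement occurs in the right flank. Such a pair could base-pair within a single transcript and form a dsRNA stem. The function names, the k-mer length, and the toy flanks are assumptions made for illustration; this is not the repeat-detection procedure used in the analysis.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat_kmers(left_flank: str, right_flank: str, k: int = 8) -> set:
    """Return k-mers of left_flank whose reverse complement is in right_flank."""
    right_kmers = {right_flank[i:i + k] for i in range(len(right_flank) - k + 1)}
    hits = set()
    for i in range(len(left_flank) - k + 1):
        kmer = left_flank[i:i + k]
        if reverse_complement(kmer) in right_kmers:
            hits.add(kmer)
    return hits


# Toy flanks (hypothetical): a 10-nt arm on the left and its reverse
# complement on the right, each embedded in unrelated bases.
arm = "GACCGTAAGC"
left = "AT" + arm + "TT"
right = "CC" + reverse_complement(arm) + "GA"

hits = inverted_repeat_kmers(left, right, k=8)
print(sorted(hits))
```

In practice, inverted repeats of varying length and with mismatches would need an alignment-based search rather than exact k-mer matching.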
Discovery of piRNAs
Five groups reported the discovery of small RNAs expressed exclusively in mammalian testes (mouse, rat, and human) that bind MIWI (murine PIWI) or MILI proteins [1
]. Here, we focus on the three largest datasets (A–C, listed in decreasing number of piRNA sequences identified in [1
]) each with thousands of distinct piRNA sequences (a recent fourth comprehensive dataset of MILI-bound piRNAs identified in the pre-pachytene stage of spermatogenesis [6
] is not included in this analysis). The number of unique piRNA sequences ranges from 3,482 to 40,102 (Table S1
), as a result of the different methods used to identify the sequences. Overall, the length distributions of piRNAs peak at 29–31 nucleotides. However, the MILI-bound piRNAs (dataset C) [3
] are generally shorter (26–28 nt) than the MIWI-bound piRNAs (29–31 nt) [1
], possibly due to differences in binding modes of the two proteins.
The short length of piRNAs and the structural homology between PIWI and Argonaute proteins suggest functional similarities between piRNAs and microRNAs. However, the combined evidence indicates that both the biogenesis and function of these two classes of RNA are distinct. Primary differences are in genomic organization, sequence conservation, and the number of unique sequences: hundreds of microRNAs versus tens of thousands of piRNAs. The majority of the identified piRNAs have a uridine at the first position (78%–94%). A similar 5′ bias was observed in other types of small RNAs such as microRNAs and siRNAs, although to a lesser extent. The 5′ U is reminiscent of processing by RNase III enzymes [17
] but may also reflect preferential binding to the Argonaute-like proteins. Although microRNAs and piRNAs share similar 5′ termini, other aspects of their biogenesis pathways are noticeably distinct: (i) piRNAs undergo 2′-O-methylation at their 3′ end [22
], which animal microRNAs do not; (ii) microRNA precursors are characterized by a distinct hairpin structure whereas piRNA precursors have no apparent secondary structure; and (iii) in contrast to microRNAs, piRNA maturation is independent of Dicer enzymes [16
Comparison of microRNAs and piRNAs
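The two summary statistics quoted for piRNAs, the length distribution and the fraction of sequences that start with uridine, are straightforward to compute from a list of sequences. The sketch below shows one way to do this; the function names and the five toy sequences are hypothetical, and DNA-style "T" is counted as "U" on the assumption that datasets may be reported in either alphabet.

```python
from collections import Counter


def first_base_fractions(seqs):
    """Fraction of sequences starting with each base (T is counted as U)."""
    counts = Counter(s[0].upper().replace("T", "U") for s in seqs if s)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


def length_histogram(seqs):
    """Number of sequences observed at each length."""
    return Counter(len(s) for s in seqs)


# Toy sequences (hypothetical), 29 to 31 nt long.
toy = [
    "U" + "ACGU" * 7,          # 29 nt, starts with U
    "T" + "ACGT" * 7,          # 29 nt, DNA alphabet, counted as U
    "U" + "GCAU" * 7 + "A",    # 30 nt
    "U" + "CAGU" * 7 + "GG",   # 31 nt
    "A" + "CGUA" * 7,          # 29 nt, starts with A
]

fractions = first_base_fractions(toy)
lengths = length_histogram(toy)
print(fractions)
print(sorted(lengths.items()))
```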
The majority of piRNAs (81%–96%) is organized in clusters (Figure S1
) with distinct strand preference. The clusters range from 1 to 127 kb in size and are found predominantly on autosomes. Some of the clusters are organized in a bipartite arrangement with a stretch of piRNAs on one strand adjacent to a second stretch of piRNAs on the opposing strand. This organization is consistent, for a minority of the clusters, with bi-directional transcription from a common origin that generates two RNA precursors. The organization of piRNAs into clusters is common to mouse, human, and rat, with significant conservation of the cluster genomic locations (synteny) [2
]. In contrast, there is very little conservation at the level of individual piRNA sequences (unpublished data and previously reported by [1
]). Most reported piRNAs are in un-annotated intergenic regions and only a small fraction appears to be derived from mRNAs (5.7%–12%) or is coincident with other classes of RNAs such as snoRNAs, tRNAs, rRNAs, or miRNAs (0.2%–3.5%) [1
piRNAs bind MILI and MIWI proteins, which are members of the PIWI protein family, a subclass of the Argonaute family. In eukaryotes, Argonaute proteins are key components of the interfering RNA pathway in which they bind mature microRNAs or siRNAs to form the RNA-induced silencing complex (RISC) [27
]. All three murine PIWI members (MIWI, MILI, and MIWI2) are required for spermatogenesis as determined by knockout experiments and are predominantly expressed in testes in partially overlapping time intervals [28
]. Recent reports link mammalian MIWI protein to chromatoid bodies (also known as nuages in Drosophila
]. These are cytoplasmic structures found in all mammalian spermatogenic cells that physically associate with the nuclear membrane during spermatogenesis and contain an RNA helicase protein (VASA). The function of chromatoid bodies is unknown but they are presumed to be the site of post-transcriptional processing and storage of mRNAs analogous to processing bodies in somatic cells (P-bodies) [33
]. It is unknown if the co-localization of MIWI proteins to chromatoid bodies is linked in any way to their function with piRNAs.
Similarities between rasiRNAs and Mammalian piRNAs
rasiRNAs are a class of interfering RNA with a size distribution of 23–28 nucleotides that were identified in a number of organisms [17
]. They originate from repeat sequences related to transposable elements and heterochromatic regions [15
], and evidence supports their involvement in transposon silencing [13
]. rasiRNAs are found in both female and male germline where they bind members of the PIWI family (Piwi, Aub, and Ago3 in Drosophila
]. There are two distinct types of Drosophila rasiRNAs (there is evidence that similar classes exist in Zebrafish
]); the first type binds Piwi or Aub proteins, is mostly antisense to transposable elements, and is enriched for 5′ uridine. The second type binds Ago3 proteins, is mostly sense to the transposable elements, and is enriched in adenosine at position 10. The different strand-specificity and the U and A enrichments led to the hypothesis that the biogenesis of the two types of rasiRNAs is coupled [13
]. In this model the Piwi/Aub-associated rasiRNAs guide the 5′ cleavage of the Ago3-associated rasiRNAs by hybridization to the sense transcript. Similarly, the Ago3-bound rasiRNAs direct the 5′ cleavage of the Piwi/Aub-bound rasiRNAs by hybridization to the anti-sense transcripts. Thus, the two rasiRNA types are engaged in a mutual amplification loop that facilitates the silencing of multiple transposon copies.
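The coupling between the 5′ U and the A at position 10 follows from base pairing if the two partners overlap by 10 nt at their 5′ ends, which is the usual reading of this amplification model. Under that assumption, the sketch below tests whether two RNAs carry the signature: the first 10 nt of one must be the reverse complement of the first 10 nt of the other. The function names, the default overlap, and the sequences are hypothetical illustrations, not the complementarity test applied to the mouse datasets.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def is_ping_pong_pair(antisense: str, sense: str, overlap: int = 10) -> bool:
    """True if the 5' ends of the two RNAs pair over `overlap` nucleotides."""
    if len(antisense) < overlap or len(sense) < overlap:
        return False
    return reverse_complement(antisense[:overlap]) == sense[:overlap]


# Hypothetical Piwi/Aub-type RNA with a 5' U.
antisense = "UGACCGUAAGCUUAGGCAUCGAUGG"

# A partner built to pair with its first 10 nt, followed by an arbitrary tail.
sense = reverse_complement(antisense[:10]) + "GGAUCCAUGCUAGCU"

unrelated = "UAAAAAAAAAAAAAAAAAAAAAAAA"

print(is_ping_pong_pair(antisense, sense))   # pairs over the first 10 nt
print(sense[9])                              # base opposite the 5' U
print(is_ping_pong_pair(unrelated, sense))
```

Because position 10 of the partner lies opposite position 1 of the guide, a 5′ U on one RNA implies an A at position 10 of the other, which is the enrichment pattern described above.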
The length characteristics, testis-specific expression, PIWI interaction, genomic organization, and 5′ uridine enrichment suggest that piRNAs may be the mammalian equivalent of rasiRNAs. This would support the idea that mammalian piRNAs might be involved in silencing transposable elements. However, at present, there are a number of differences that cast doubt on this functional analogy. First, genomic annotation of piRNAs indicates that only 12%–20% are repeat derived [1
], which is lower than the frequency of repeat sequences in the mouse genome (37.5%) [35
], while Drosophila rasiRNAs originate preferentially from repeat regions. Second, mammalian piRNAs originate from one strand or the other, forming clusters with continuous strand bias, whereas rasiRNAs originate from both strands of the clusters with positional enrichment for “U” and “A.” We explored the analogy between rasiRNAs and piRNAs, but did not find significant 5′ partial complementarity between piRNA sequences as found in rasiRNAs [13
]. However, sequences associated with the third mouse testis-specific PIWI protein (MIWI2), also essential for spermatogenesis and linked to transposon silencing [31
], have not yet been identified. Future identification of MIWI2-bound piRNAs (in analogy to Ago3-bound Drosophila rasiRNAs) that are enriched for adenosine at position 10 and partially complementary to other piRNAs would strongly suggest functional similarity between rasiRNAs and piRNAs.
The discovery of large sets of piRNAs raises a number of important biological questions. In particular, what is the biochemical role and cellular function of PIWI-bound piRNAs during spermatogenesis? Are they involved in transposon silencing, chromosome rearrangements (as are 30-nt PIWI-bound RNAs in Tetrahymena
]), or chromosome pairing? What are the evolutionary constraints on piRNA sequences? Answers to these questions will primarily emerge from further experiments. Here, we focused on the basic questions of how many piRNA sequences there are and how they are produced. We reasoned that a detailed computational comparison of the three major datasets, representing independent discoveries of piRNAs, provides insight into the organization of genomic clusters, the number and distribution of sequences within the clusters, and, by implication, their biogenesis.