Pseudogenes populate the mammalian genome as remnants of artefactual incorporation of coding messenger RNAs into transposon pathways1. Here we show that a subset of pseudogenes generates endogenous small interfering RNAs (endo-siRNAs) in mouse oocytes. These endo-siRNAs are often processed from double-stranded RNAs formed by hybridization of spliced transcripts from protein-coding genes to antisense transcripts from homologous pseudogenes. An inverted repeat pseudogene can also generate abundant small RNAs directly. A second class of endo-siRNAs may enforce repression of mobile genetic elements, acting together with Piwi-interacting RNAs. Loss of Dicer, a protein integral to small RNA production, increases expression of endo-siRNA targets, demonstrating their regulatory activity. Our findings indicate a function for pseudogenes in regulating gene expression by means of the RNA interference pathway and may, in part, explain the evolutionary pressure to conserve argonaute-mediated catalysis in mammals.
Small-RNA-directed gene silencing pathways have been adapted to accept numerous inputs and to act on many types of downstream targets. In few places is this more apparent than in animal germ lines, where two classes of small RNAs, microRNAs (miRNAs) and Piwi-interacting RNAs (piRNAs), with distinct biogenesis mechanisms and biological functions have been reported. Although miRNAs, as a group, are ubiquitously expressed, piRNAs have thus far been found only in germ cells and in a few gonadal somatic cell types2. piRNAs repress the activity of mobile genetic elements, forming a small RNA-based, innate immune system with both genetically encoded and adaptive components2–9.
In mice, a homozygous mutation in any single Piwi family member causes male sterility accompanied by gonadal hypotrophy 5,10,11. In Mili and Miwi2 mutants, meiosis is not completed and germ cells are progressively lost5. This correlates with an activation of transposons, particularly the non-long terminal repeat (LTR) retrotransposon, L1 (refs 5, 12). DNA methylation of L1 elements is correspondingly lost. In contrast, females bearing homozygous mutations in individual Piwi genes are apparently normal and fertile 5,10,11. Because female germ cells must also control transposons, we sought to characterize their small RNA profiles to determine whether a piRNA system, similar to that operating in spermatocytes, also exists in oocytes.
Approximately 6,000 fully grown oocytes, arrested in prophase of meiosis I, were collected. Small RNA fractions from 19–24 nucleotides (lower fraction) and 24–30 nucleotides (upper fraction) were gel purified and used to prepare small RNA libraries. These were deeply sequenced3,4. A total of 1,037,355 sequences was obtained that could be mapped to the mouse genome (753,981 from the lower fraction and 283,374 from the upper fraction; Supplementary Table 1). In the lower fraction, 126,515 non-redundant sequences were obtained, falling into 24,271 non-overlapping clusters. In the upper fraction, 97,807 non-redundant sequences fell into 15,032 non-overlapping clusters.
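The bookkeeping behind these counts can be illustrated with a minimal sketch. It assumes that identical reads are collapsed into non-redundant sequences and that mapped intervals on the same chromosome are merged whenever they overlap; the clustering rule actually used is not stated here, and the function names `collapse_reads` and `merge_into_clusters` are hypothetical.

```python
from collections import Counter

def collapse_reads(reads):
    """Collapse raw reads into non-redundant sequences with their counts."""
    return Counter(reads)

def merge_into_clusters(hits):
    """Merge mapped intervals into non-overlapping clusters.

    hits: iterable of (chrom, start, end), 0-based half-open coordinates.
    Assumed rule: intervals on the same chromosome that overlap are merged.
    """
    clusters = []
    for chrom, start, end in sorted(hits):
        if clusters and clusters[-1][0] == chrom and start < clusters[-1][2]:
            # Overlaps the previous cluster: extend it.
            prev = clusters[-1]
            clusters[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            clusters.append((chrom, start, end))
    return clusters

# The mapped-read totals quoted in the text are internally consistent.
lower_fraction, upper_fraction = 753_981, 283_374
total_mapped = lower_fraction + upper_fraction
```

The two fraction counts sum to the reported total of 1,037,355 mapped sequences.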
An examination of the small RNAs in the upper fraction of the oocyte library revealed a piRNA population that resembled those found in early-stage spermatocytes3 (Fig. 1a, right). Roughly 62% of small RNAs correspond to annotated repeats (Supplementary Table 2), with 3% matching genic sequences and 3% matching unannotated, intergenic sites. The function of the latter species remains unknown. Roughly 30% of the library corresponded to presumed breakdown products of abundant, non-coding RNAs, such as ribosomal RNAs, transfer RNAs and small nucleolar RNAs.
As expected, oocyte piRNAs arise from discrete genomic loci in a strand-asymmetric fashion (Supplementary Table 3). A number of these loci share structural similarities to Drosophila piRNA loci, which act as master controllers of mobile elements3. One example (Fig. 1b) spans ~120 kb of chromosome 10 and contains an abundance of long interspersed elements (LINEs) and LTR elements. These have an orientation bias that results in the generation of predominantly antisense piRNAs (Fig. 1b, piRNA, weighted; see also Supplementary Fig. 1).
piRNAs have been proposed to act with transcripts from active transposons in a feed-forward amplification loop that confers signature features on a piRNA population that is mounting an ongoing transposon defence2–4,6,7. Primary piRNA-directed cleavage of transposon mRNAs creates the 5′ ends of secondary piRNAs4,6. This produces primary and secondary piRNA pairs that overlap by 10 nucleotides at their 5′ ends. The 5′ U bias of primary piRNAs thus leads to an enrichment of an A at position 10 of secondary piRNAs. These characteristics are prevalent in piRNA populations from mouse oocytes, particularly those that can be mapped to the L1 and intracisternal A particle (IAP) elements (Supplementary Fig. 2).
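The 10-nucleotide overlap signature can be tallied with a short sketch. It assumes reads are supplied as 0-based half-open genomic intervals split by strand; the function name `five_prime_overlaps` and the input layout are hypothetical, and the all-pairs loop is chosen for clarity rather than speed.

```python
from collections import Counter

def five_prime_overlaps(plus_reads, minus_reads):
    """Histogram of 5'-end overlap lengths between opposite-strand reads.

    plus_reads / minus_reads: lists of (start, end), 0-based half-open.
    The 5' end of a plus-strand read is `start`; the 5' end of a
    minus-strand read is its rightmost base, `end - 1`.
    """
    hist = Counter()
    for p_start, p_end in plus_reads:
        for m_start, m_end in minus_reads:
            # Bases shared between the two 5' ends: (m_end - 1) - p_start + 1.
            overlap = m_end - p_start
            # Only count overlaps that both reads are long enough to cover.
            if 1 <= overlap <= min(p_end - p_start, m_end - m_start):
                hist[overlap] += 1
    return hist

# Toy example: a 27-nt plus read and a 27-nt minus read sharing 10 bases.
hist = five_prime_overlaps([(100, 127)], [(83, 110)])
```

In a population engaged in the amplification loop, the histogram would be expected to peak at an overlap of 10; because position 1 of one partner pairs with position 10 of the other, a 5′ U on primary piRNAs implies an A at position 10 of secondary piRNAs.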
As expected, annotated miRNAs comprised the majority (69%) of 19–24-nucleotide RNAs (Fig. 1a, left; see also Supplementary Table 4). Among the highly abundant species are members of the let-7 family (let-7a/c/f), generally abundant miRNAs (miR-22, miR-16, miR-21, miR-26, miR-93 and miR-29a/b), and miRNAs abundant in ovary and placenta (miR-322, miR-503, miR-451). Finally, we detected miRNAs specifically expressed in male and female gonad (miR-103)13.
A substantial fraction of 19–24-nucleotide RNAs matched annotated transposons (Supplementary Table 2). Many that mapped uniquely to the genome could be traced to oocyte piRNA loci (Fig. 1b, siRNA, weighted). These species might represent piRNA degradation products, or oocyte piRNA clusters might generate both siRNAs and piRNAs.
Therefore, we independently mapped piRNAs and candidate siRNAs to consensus L1 and IAP sequences (Supplementary Fig. 3). Each gave characteristic profiles. Moreover, piRNAs and candidate siRNAs show distinctly different nucleotide biases, with piRNAs displaying their characteristic enrichment for a 5′ uridine residue and an A at position 10 (Supplementary Fig. 2). Candidate siRNAs lack a 10A bias and show enrichment for both A and U residues at their 5′ ends (Supplementary Fig. 2). Finally, we gel purified 19–30-nucleotide RNAs from mouse oocytes as a single fraction and deeply sequenced this population. A length distribution of small RNAs that match the piRNA cluster shown in Fig. 1b yields two distinct peaks (Fig. 1c). siRNAs 21–22 nucleotides in length apparently predominate over the piRNA population, which averages ~27 nucleotides. We conclude that transposon-rich loci in oocytes give rise to both siRNAs and piRNAs. Although siRNAs are apparently more abundant, piRNA cloning frequencies could be reduced by the 2′-O-methyl modification that occurs on their 3′ termini2. Our results raise the possibility that piRNA and siRNA systems may act redundantly to repress transposons in mouse oocytes, perhaps explaining the lack of substantial phenotypic consequences of individual Piwi mutations in females5,10,11.
Although many transposons were targeted by both piRNAs and siRNAs, some relied more heavily on a particular pathway. For example, MTB and MTC, both LTR retrotransposons, matched almost exclusively to siRNAs. Moreover, the most prominent cluster that produces MTB/MTC small RNAs contains an inverted repeat with strong potential to produce a Dicer substrate. Notably, this transposon class showed increased expression in Dicer-null oocytes, consistent with its being regulated predominantly if not exclusively by the siRNA system14.
Small RNA libraries often contain genic sequences. In other tissues, these correspond exclusively to sense sequences that probably represent contaminating degradation products. However, in oocytes, numerous sense and antisense siRNAs corresponding to protein-coding genes could be identified (Supplementary Table 5). As mammals lack any identifiable RNA-dependent RNA polymerase, this raised the question of how antisense siRNAs might be generated.
On the basis of polymorphisms, uniquely mapping sense siRNAs could often be assigned to the functional protein-coding copy of a gene, whereas antisense siRNAs mapped to a homologous pseudogene (Fig. 2a and Supplementary Table 5). Thus, oocyte endo-siRNAs might be processed from double-stranded (ds)RNAs that form by hybridization of transcripts derived from two unlinked loci. A similar process in which transcripts from active transposons hybridize to antisense transposon fragments transcribed from piRNA clusters could explain the genesis of transposon siRNAs.
siRNAs from gene–pseudogene pairs arise exclusively from regions of complementarity between the partners. Because many sense-oriented siRNAs cross exon–exon junctions, we propose that mature, spliced mRNAs from genes interact with antisense pseudogene transcripts to form Dicer substrates (Supplementary Fig. 4). In one case (Fig. 2b), both sense and antisense siRNAs to the GTPase-activating protein for Ran (Ran-GAP) were produced from a pseudogene locus containing a ~300-base pair (bp) inverted repeat with an intervening ~800-base loop. siRNAs were derived only from the potentially double-stranded portion of this locus.
In some cases, Dicer processing of dsRNA substrates proceeds in an apparently processive fashion from a discrete initiation site, producing ‘phased’ small RNAs with an ~21-nucleotide periodicity15. Transposon-derived and genic siRNAs showed this property only very weakly (Supplementary Fig. 2). Notably, piRNAs also show a similar, very weak phasing signal, although with a period of ~27 nucleotides rather than ~21 nucleotides.
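One simple way to look for such periodicity is to ask how strongly the 5′ ends on one strand concentrate in a single register modulo the period. The sketch below is an illustrative assumption, not the analysis behind Supplementary Fig. 2; the names `phasing_register` and `phasing_score` are hypothetical.

```python
from collections import Counter

def phasing_register(five_prime_positions, period=21):
    """Count 5' ends per register (position modulo period) on one strand."""
    return Counter(pos % period for pos in five_prime_positions)

def phasing_score(five_prime_positions, period=21):
    """Fraction of reads falling in the most populated register.

    Uniformly spread 5' ends give about 1/period; perfect phasing gives 1.0.
    """
    registers = phasing_register(five_prime_positions, period)
    total = sum(registers.values())
    return max(registers.values()) / total if total else 0.0
```

Running the same score with `period=27` would correspond to the weak piRNA signal mentioned above; a value only slightly above `1/period` would indicate weak phasing.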
Pseudogenes have often diverged substantially from their functional ancestors. Thus, we wished to examine the possibility that pseudogene-derived antisense siRNAs could regulate corresponding protein-coding genes. We mapped antisense siRNAs to potentially relevant regulatory targets. Many small RNAs aligned to their targets either with no mismatches or with mismatches lying outside regions essential for slicer cleavage16,17 (Fig. 2a, c). Thus, antisense, pseudogene-derived siRNAs might be capable of regulating homologous protein-coding genes through a conventional RNA-interference mechanism.
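The mismatch check can be sketched as follows. The window treated as essential for cleavage (guide positions 9–12, flanking the scissile bond opposite positions 10 and 11) is an assumption made for illustration and is exposed as a parameter; G:U wobble pairs are counted as mismatches, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def mismatch_positions(guide, target_site):
    """1-based guide positions that do not Watson-Crick pair with the target.

    guide: siRNA written 5'->3'.
    target_site: mRNA segment written 5'->3', same length as the guide;
    guide position 1 pairs with the last base of target_site.
    """
    if len(guide) != len(target_site):
        raise ValueError("guide and target_site must have equal length")
    return [
        i
        for i, (g, t) in enumerate(zip(guide, reversed(target_site)), start=1)
        if COMPLEMENT.get(g) != t
    ]

def may_support_cleavage(guide, target_site, critical=range(9, 13)):
    """True if no mismatch falls in the assumed cleavage-critical window."""
    return not any(p in critical for p in mismatch_positions(guide, target_site))
```

A pseudogene-derived siRNA whose only mismatches to the coding mRNA lie outside the chosen window would pass this filter, which mirrors the reasoning in the text without claiming the exact criteria of refs 16 and 17.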
To test the regulatory potential of pseudogene-derived siRNAs, we assessed the effects of Dicer loss on their putative targets. We have previously shown that deletion of Dicer in growing oocytes causes the production of non-functional gametes with defects in spindle organization and chromosome segregation14,18. We compared the expression of candidate endo-siRNA targets in wild-type and Dicer-null cells14. Many genes with abundant, pseudogene-derived siRNAs showed significant increases in expression following Dicer loss (Fig. 3a). We verified candidates derived from the array data by semi-quantitative polymerase chain reaction with reverse transcription (qRT–PCR, Fig. 3b).
Collectively, our data indicate that in mammalian oocytes, protein-coding mRNAs interact with pseudogene transcripts to form dsRNAs that are processed into endo-siRNAs. Examination of Dicer knockouts indicates a function for endo-siRNAs in gene regulation. At present, we cannot distinguish whether these small RNAs direct target cleavage or whether the act of siRNA production per se, which consumes the coding mRNA, is sufficient for repression. However, the specific case of HDAC1 may point to an RNA-induced silencing complex (RISC)-based mechanism. Few uniquely mapping siRNAs are generated from the Hdac1 gene itself, suggesting that it is not used prominently as a Dicer substrate. Instead, most uniquely mapping sense and antisense siRNAs can be assigned to a series of Hdac1 pseudogenes. On the basis of its increased expression in Dicer-null oocytes, we propose that pseudogene-derived, antisense siRNAs direct RISC to cleave Hdac1 mRNAs.
The catalytic potential of at least one argonaute protein has been conserved through mammalian evolution from platypus to humans19,20. However, mammalian miRNAs, with one known exception, act through translational mechanisms without the need for mRNA cleavage21. The discovery of endogenous siRNAs in mammalian oocytes not only expands the realm of mammalian small RNA classes but also provides one possible explanation for the evolutionary pressure to conserve argonaute enzymatic activity.
Pseudogenes have long been considered to be non-functional artefacts of transposition pathways acting on protein-coding mRNAs. In a few cases, regulatory roles have been posited for pseudogenes, largely through antisense mechanisms22–24. Our findings, and those of the accompanying paper25, provide a role for a subset of mammalian pseudogenes in the production of functional siRNAs. The production of dsRNAs by interaction between sense and antisense transcripts from distinct loci has not been observed in other tissues and may require the unique environment of oocytes, which substantially lack a protein kinase R response (a dsRNA-induced general translational repression pathway) and are geared for mRNA stabilization and storage26–28. The fact that many targets of this pathway are related to microtubule dynamics (including microtubule-based processes, P = 0; kinesin complex, P = 0; motor activity, P < 1 × 10^−254; spindle, P < 8 × 10^−239; and microtubule-associated complex, P < 3 × 10^−60; Supplementary Fig. 5) suggests that the regulatory circuits that we describe may have important biological roles, as the consequences of Dicer loss in growing oocytes are disruption of proper spindle formation and defects in chromosome segregation14,18.
Mouse oocytes were collected from primed mice and used to prepare small RNA fractions. These were cloned and deeply sequenced as previously described2,4. Bioinformatic analysis of the sequences was performed as described in the Methods. For semi-quantitative RT–PCR, RNA was extracted from fully grown oocytes from Dicer^flox/flox and Dicer^flox/flox Zp3-cre mice. Quantitative PCR was performed using TaqMan probes.
We thank members of the Hannon laboratory for discussions. O.H.T. is a Bristol-Myers Squibb fellow and A.G. is a Florence Gould Foundation Scholar of the Watson School of Biological Sciences. E.P.M. is supported by a fellowship from the Australian-American Association. This work was supported in part by grants from the NIH (R.M.S. and G.J.H.) and gifts from Kathryn W. Davis and the Stanley family (G.J.H. and E.H.). G.J.H. is an Investigator of the Howard Hughes Medical Institute.
Supplementary Information is linked to the online version of the paper at www.nature.com/nature.
Small RNA data sets can be accessed in GEO with the following accession numbers: GSM261957, GSM261958 and GSM261959.
Reprints and permissions information is available at www.nature.com/reprints.