N.L. performed the small RNA size analyses in Figure 1; R.M. performed some of the northern analysis in Figure 5; S.B. performed the cis-NAT bioinformatics; and K.O. constructed the small RNA libraries and performed the remaining functional tests. The contributions of S.B. and K.O. were considered equal. All authors contributed to the preparation of the manuscript.
Cis-natural antisense transcripts (cis-NATs) have been speculated to be substrates for endogenous RNA interference (RNAi), but little experimental evidence for such a pathway in animals has been reported. Analysis of massive Drosophila melanogaster small RNA data sets now reveals two mechanisms that yield endogenous small interfering RNAs (siRNAs) via bidirectional transcription. First, >100 cis-NATs with overlapping 3′ exons generate 21-nt, Dicer-2 (Dcr-2)–dependent, 3′-end modified siRNAs. The processing of cis-NATs by RNA interference (RNAi) seems to be actively restricted, and the selected loci are enriched for nucleic acid–based functions and include Argonaute-2 (AGO2) itself. Second, we report that extended intervals of the thickveins and klarsicht genes generate exceptionally abundant siRNAs from both strands. These siRNA clusters derive from atypical cis-NAT arrangements involving introns and 5′ or internal exons, but their biogenesis is similarly Dcr-2– and AGO2-dependent. These newly recognized siRNA pathways broaden the scope of regulatory networks mediated by small RNAs.
cis-NAT pairs, for which the same genomic segment is transcribed on both strands, are common in the genomes of most higher eukaryotes1–3. Such transcript arrangements have been proposed as a potentially substantial source of endogenous double-stranded RNA (dsRNA)4, which might conceivably be processed by RNAi into siRNAs. In plants, select cis-NAT pairs generate siRNAs that function in stress response5,6. However, there has been no direct evidence for this mechanism in animals; indeed, the available evidence argues against the generation of RNAi triggers by cis-NATs7.
This is not to say that cis-NAT arrangements are not of regulatory consequence. For example, the steric hindrance of colliding polymerase complexes might disfavor simultaneous transcription of both strands8, or influence alternative splicing or polyadenylation9. On the other hand, if cis-NATs readily formed dsRNA that entered the RNAi pathway, one might expect that forced expression of antisense transcripts would efficiently trigger gene knockdown. This technique is not actually widely applicable10,11, probably because transcripts are likely to be segregated in vivo as ribonucleoprotein complexes at most stages of their life cycle. Therefore, complementary transcripts, even if coexpressed in the same cell, are not necessarily expected to have the opportunity to meet and anneal as dsRNA in vivo.
In this study, we examined large collections of D. melanogaster small RNA sequences and confidently identified two classes of endogenous siRNAs that derive from bidirectional transcription. These include > 100 cis-NAT pairs, most of which overlap on their 3′ ends. siRNA cloning patterns suggest that processed mRNAs engage in dsRNA interactions to yield 3′-modified, 21-nt RNAs. This does not seem to be a default pathway, as we identified hundreds of coexpressed cis-NATs that do not generate siRNAs. We also identified two exceptional regions of the genome whose siRNAs seem to define a distinct mechanism, on the basis of their largely intronic origin, their ability to generate unusually large numbers of siRNAs across extended genomic intervals and their involvement with candidate antisense noncoding transcripts. Contrary to current understanding of how dsRNA is processed, the biogenesis of these siRNAs is, unexpectedly, Loquacious (Loqs)-dependent, rather than r2d2-dependent. These observations collectively reveal new pathways for endogenous siRNA-mediated regulation in D. melanogaster.
We accumulated > 16 million mapped D. melanogaster small RNA reads from ten 454 samples12, a female head Solexa library13, a Solexa library from mass-isolated imaginal discs and brains14, nine Solexa libraries from early embryos and adult heads and bodies15, and three new Solexa libraries prepared from S2 cells and Kc cells (this study). In searching for new classes of small RNAs in these data, we sought to distinguish genuine regulatory RNAs from functionally irrelevant degradation products. Genuine regulatory RNAs have distinct sizes reflecting their particular biogenesis history, whereas random degraded RNAs are not expected to have particular size tendencies. For example, the reads mapped to ribosomal protein genes spanned the 18–26-nt window selected for library cloning (Fig. 1a), consistent with their probable identity as degradation fragments. In contrast, microRNAs (miRNAs) and miRNA* species, which are processed via consecutive cleavages of a precursor hairpin by Drosha and Dcr-1, show obvious size preference. Analysis of the first 382 miRNA clones suggested that 22-nt miRNA lengths were actually the most common in D. melanogaster16, and this preference held true across > 3.7 million miRNA reads, which collectively had a mean length of 21.91 nt (Fig. 1b).
A new class of endogenous small RNA derives from long inverted repeats (IRs) termed hairpin RNAs (hpRNAs), including CG18854 and a repeat cluster overlapping CG4068 (ref. 14). These produce a diverse set of reads that show preference for a length of 21 nt (Fig. 1c). This size distinction suggested that, despite their common derivation from hairpin transcripts, hpRNA biogenesis differs from that of miRNAs. Indeed, we found that hpRNA biogenesis is mediated not by Dcr-1, but instead by Dcr-2 (ref. 14), and generates functional siRNAs. Thus, endogenous Dcr-2 products seem to be shorter than Dcr-1 products.
We proposed that the internal bulges in hpRNA stems might push their average length above 21 nt and that processing of perfectly double-stranded substrates might yield products closer to 21 nt in length. Previous small-scale cloning of Dcr-2 products could not distinguish a preference for 21 nt or 22 nt17, and indirect analysis using tiling microarrays suggested that a perfectly double-stranded white IR RNA trigger18 is processed in transgenic animals with 22-nt periodicity19. We re-examined this using small RNA cloning data from white-IR heads reported by Zamore and colleagues20 and observed that white siRNAs actually show a strong preference for 21 nt (Fig. 1d). That study also described RNAs cloned from cultured S2 cells transfected with perfectly double-stranded GFP RNA, and we found that GFP siRNAs were similarly 21 nt (Fig. 1e). Taken together, these observations of miRNAs, hpRNAs and siRNAs show how functional differences in small RNA biogenesis can be inferred from subtle patterns in RNA size distributions.
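The size comparisons above rest on simple summaries of read-length distributions. A minimal sketch of such a summary follows; the function name `length_profile` and the toy read lengths are illustrative assumptions, not data or code from this study.

```python
from collections import Counter

def length_profile(lengths):
    """Summarize read lengths (nt): modal length, mean length, per-length fraction."""
    counts = Counter(lengths)
    total = sum(counts.values())
    mode = max(counts, key=counts.get)                      # most frequent length
    mean = sum(length * n for length, n in counts.items()) / total
    fractions = {length: n / total for length, n in sorted(counts.items())}
    return mode, mean, fractions

# Toy example (hypothetical reads): a population dominated by 21-mers.
toy_reads = [21] * 6 + [22] * 3 + [20]
mode, mean, fractions = length_profile(toy_reads)
```

A degradation-like population would show a flat `fractions` profile across the 18–26-nt cloning window, whereas a Dicer product would show a sharp peak at a single length.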
About 1,000 Drosophila loci are arranged as cis-NAT pairs1. However, there is currently little data to support the coexpression of cis-NAT pairs in individual cells, which would presumably be necessary for them to be relevant to an RNAi-type mechanism. We reasoned that such evidence might be revealed by analyzing the distribution and size of small RNAs emanating from the non-overlap versus the overlap regions of cis-NAT pairs.
The two major classes of cis-NAT mRNA pairs in D. melanogaster are divergently transcribed loci that overlap on their initiating 5′ exons (100 loci) and convergently transcribed loci that overlap on their terminal 3′ exons (793 loci). A plausible explanation for the fact that there are more 3′ cis-NAT pairs than 5′ cis-NAT pairs is that transcriptional exclusion mechanisms are preferentially inhibitory to transcriptional initiation8. If there are indeed general mechanistic differences between 5′ and 3′ cis-NAT arrangements, then that might be reflected in the types of RNAs that emanate from these loci. We therefore examined the length, number and density of unique small RNA reads that mapped to the exons of top- and bottom-strand mRNA annotations of cis-NAT overlap regions. We compared these data to reads that mapped to the non-overlap regions of the same cis-NATs. The full data set is presented in Supplementary Table 1 online.
These analyses yielded four conclusions. First, there were extremely few small RNAs emanating from either strand of 5′ cis-NAT overlap regions (Fig. 1f, blue data). The corresponding non-overlap regions of 5′ cis-NATs generated RNAs with broad distribution across lengths of 18–26 nt (Fig. 1f, red data). We conclude that 5′ cis-NATs make, at most, only minor contributions to siRNA pathways. Second, the small RNAs derived from the exons of 3′ cis-NAT genes outside of annotated overlaps showed detectable 21-nt bias but were otherwise spread over a broad size range (Fig. 1g). Third, the overlap regions of 3′ cis-NAT pairs, in contrast, generated RNAs with a strong 21-nt bias (Fig. 1h), similar to other known Dcr-2 products. These RNAs came from the top and bottom strands in approximately equal numbers. Finally, the overlap regions of cis-NAT pairs that yielded 21-mers showed much greater small RNA density than did the non-overlap regions. This is evident in the gene schematics shown in Figure 2.
We propose from these data that 3′ cis-NATs are a substantial source of endogenous siRNAs in D. melanogaster. We estimated the minimum number of functional cis-NAT siRNA loci as follows. First, we selected cis-NAT overlap regions that produce a preponderance (> 67%) of 21-nt RNAs relative to all other sizes, which should reflect their preferred processing by an RNAi pathway. Second, we asked that there be at least 11 independent 21-nt reads generated by a given overlap. These criteria resulted in a set of 117 cis-NATs, with 40 pairs producing ≥40 siRNA reads (Supplementary Fig. 1a online). By comparison, we previously annotated 16 confident miRNAs on the basis of <40 reads12. In fact, this was a conservative estimate, as the annotated extent of many cis-NAT overlaps was incorrect (Supplementary Fig. 2a online and Methods). This contributed to the mild enrichment for 21-mers in annotated non-overlap regions of cis-NATs (Fig. 1f,g).
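The two selection criteria can be written as a short filter. This is a sketch only: the function name `is_sirna_locus`, its parameter names, and the treatment of "independent" reads as one list entry per unique read are assumptions made for illustration, not the pipeline used in this study.

```python
from collections import Counter

def is_sirna_locus(read_lengths, min_fraction=0.67, min_reads=11):
    """Apply the two criteria to one cis-NAT overlap region.

    read_lengths: lengths (nt) of the unique small RNA reads mapping to the
    overlap (assumed to be one entry per independent read).
    """
    counts = Counter(read_lengths)
    total = sum(counts.values())
    if total == 0:
        return False
    n21 = counts[21]
    # Criterion 1: 21-mers exceed min_fraction of all reads.
    # Criterion 2: at least min_reads independent 21-nt reads.
    return n21 / total > min_fraction and n21 >= min_reads

# Hypothetical overlaps:
strong = [21] * 12 + [22] * 3    # 80% 21-mers, 12 reads: passes both criteria
sparse = [21] * 10 + [22]        # 21-mer-rich, but only ten 21-nt reads
mixed = [21] * 11 + [20] * 6     # enough 21-nt reads, but only ~65% of the total
```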
The enrichment of 21-nt RNAs in 3′ cis-NAT overlap versus non-overlap regions was especially evident in these top 117 loci. Their aggregate overlap length comprised 27,668 bp and generated 7,486 unique reads (271 reads per kilobase), of which 5,953 (80%) were 21 nt in length. In contrast, the corresponding non-overlap exonic regions of these 117 loci spanned 488,508 bp but generated only 8,848 unique reads (18 reads per kilobase), only 1,942 of which (22%) were 21 nt in length. This represents > 50-fold greater density of 21-nt RNAs in 3′ cis-NAT overlap regions compared to their annotated non-overlap regions. Comparison of the 117 cis-NAT siRNA loci and the remaining 676 3′ cis-NATs further highlighted that siRNA loci comprise a specialized subset of cis-NATs with great propensity to generate 21-nt RNAs from cis-NAT overlap regions (Supplementary Fig. 1b). This is clearly visualized in the siRNA density chart shown in Figure 2b.
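The densities and the fold difference quoted above follow directly from the read counts and aggregate lengths. The short calculation below reproduces them; variable names are illustrative, and the figures are taken from the text.

```python
# Aggregate values for the 117 cis-NAT siRNA loci, as reported in the text.
overlap_bp, overlap_reads, overlap_21 = 27_668, 7_486, 5_953
nonoverlap_bp, nonoverlap_reads, nonoverlap_21 = 488_508, 8_848, 1_942

def per_kb(reads, length_bp):
    """Read density in reads per kilobase."""
    return reads / (length_bp / 1000)

overlap_density = per_kb(overlap_reads, overlap_bp)            # about 271 reads/kb
nonoverlap_density = per_kb(nonoverlap_reads, nonoverlap_bp)   # about 18 reads/kb

overlap_frac_21 = overlap_21 / overlap_reads                   # about 0.80
nonoverlap_frac_21 = nonoverlap_21 / nonoverlap_reads          # about 0.22

# Density of 21-nt reads only, overlap versus non-overlap.
fold_21 = per_kb(overlap_21, overlap_bp) / per_kb(nonoverlap_21, nonoverlap_bp)
```

The ratio of 21-nt read densities comes out at roughly 54, consistent with the stated > 50-fold difference.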
The relatively sharp demarcation of 21-mer mappings to 3′ cis-NATs suggested that the substrates were processed mRNAs with defined 3′ ends and precise complementary limits, instead of the heterogenous ends of primary transcripts before 3′ end cleavage. In support of this, we found rare examples in which a 3′ untranslated region (UTR) spanned several spliced exons on the other strand, in which case 21-nt RNAs were generated from exonic but not intronic sequence (for example, fbl/CG33969; Supplementary Fig. 2b). We also identified examples in which the genomic extent of siRNA-generating cis-NAT overlap differed between cell types. This was most evident when comparing cis-NAT siRNA patterns between the homogenous S2 and Kc cell populations (for example, for BRWD3/CG5728, CG5919/CG3308 and gry/CG14967 3′ cis-NATs; Supplementary Fig. 2c). Such variable siRNA patterns were probably due to alternative polyadenylation yielding distinct 3′ UTR limits in different cell types. We noticed that S2 cells yielded extended siRNA overlaps much more frequently than did Kc cells, and these uncertainties in particular were the basis of apparent ‘non-overlap’ 21-mer enrichments observed earlier (Fig. 1f,g).
To obtain further insight into the biogenesis of cis-NAT–derived siRNAs, we analyzed published small RNA data from Dcr-2 heterozygous and homozygous fly heads21. As reported, the overall miRNA content of these libraries was similar (Table 1). However, we observed a notable difference in the content of cis-NAT siRNAs generated from overlaps. When normalized for total library reads, we observed a > 12-fold decrease in cis-NAT siRNAs in Dcr-2 homozygous tissue. As a control, we analyzed the number of non–21-mer reads from cis-NAT regions outside of the overlap regions and observed no change in this population. These data indicate that, as suggested by their specific 21-nt size, the production of cis-NAT siRNAs is mediated by Dcr-2.
Exogenous D. melanogaster siRNAs are processed by Dcr-2 and loaded into AGO2, where they are modified at their 3′ end by the Hen1 methyltransferase22,23. This modification can be inferred by their resistance to β-elimination, which affects small RNAs with free 3′ hydroxyl groups such as AGO1-loaded miRNAs20. On this basis, the enrichment of 21–22-nt RNAs following cloning from β-eliminated RNA has been interpreted as evidence for residence in AGO2 (refs. 20,22,24). We calculated a > two-fold depletion and > eight-fold depletion of mature miRNAs when cloning from β-eliminated head RNA and S2 RNA20, respectively (Table 1). Conversely, there was strong enrichment of both white (13.6-fold) and GFP (14.6-fold) siRNAs (derived from RNAi of white in adult heads and of GFP in S2 cells) following cloning from β-eliminated RNA (Table 1). Thus, miRNAs were depleted following β-elimination, whereas exogenous siRNAs were correspondingly enriched. Similar results were recently reported21.
Analysis of cis-NAT overlap clones also clearly showed their enrichment in the appropriate libraries. There was a five-fold enrichment of cis-NAT overlap clones following β-elimination of S2 RNAs and a seven-fold enrichment of such clones following β-elimination of head RNAs (Table 1). The number of distinct cis-NAT loci that generated cloned RNAs was also strongly increased following β-elimination of head and S2 RNAs. In summary, the specific generation of 21-nt RNAs from mRNA overlaps, their selective loss in Dcr-2 homozygous tissue and their enrichment following β-elimination experiments, lead us to conclude that 3′ cis-NATs produce siRNAs via a Dcr-2 and AGO2 pathway. However, our observations do not exclude the possibility that some cis-NAT–derived small RNAs might be loaded into AGO1.
We selected the AGO2/CG7739 and tsunagi/Mys45A cis-NAT siRNA loci (Fig. 2b) for functional tests, as they generated the third- and seventh-highest number of overlap siRNAs among all libraries and were similarly ranked when considering only S2 reads. We tested the effect of loqs, Dcr-2 and AGO2 knockdown on the levels of these transcripts in S2 cells using quantitative PCR (qPCR) and normalization to transcript levels following treatment with GFP dsRNA. Certain transcripts appeared mildly upregulated in certain conditions, but we did not observe convincing evidence of consistent derepression under both Dcr-2 and AGO2 knockdown conditions (Supplementary Fig. 3a online).
As dsRNA affords only incomplete suppression of gene activity, we also tested null alleles of animal mutants. We performed qPCR of RNA isolated from the heads of Dcr-2 and AGO2 heterozygous and homozygous adult flies. We again did not observe significant changes in cis-NAT transcript levels (Supplementary Fig. 3b,c). It is possible that these animal tests were obscured because a proportion of tissue may not have coexpressed these cis-NATs on a cell-by-cell basis. However, as these tests were directed at some of the more highly expressed cis-NAT siRNA loci, a conservative conclusion is that cis-NAT siRNAs may have only subtle effects on the expression of their parental loci.
Despite the lack of regulatory evidence from qPCR tests, we sought evidence for the active restriction of cis-NATs into the siRNA pathway. It was reported that certain classes of D. melanogaster genes actively avoid the acquisition of miRNA binding sites25. Even though the endogenous function of individual miRNA binding sites often cannot be assessed in laboratory tests, their exclusion from certain types of genes can be inferred to reflect their regulatory capacity. Inspired by this line of reasoning, we sought to determine whether any coexpressed cis-NATs are denied entry into the RNAi pathway. By analogy with the phenomenon of miRNA target-site avoidance, the existence of coexpressed cis-NATs that do not generate siRNAs would provide evidence for the functional impact of cis-NAT siRNAs.
We tested this by analyzing S2 cell gene expression profiles using Affymetrix microarrays. Although the assessment of coexpression on a cell-by-cell basis is not trivial, it can be approximated by examining gene expression in a homogenous cell population. Of 13,632 annotated genes interrogated by the Affymetrix platform, 7,274 (53%) were reproducibly called present in S2 cells. Of the 117 annotated cis-NAT siRNA pairs, we observed 104 cases in which both top- and bottom-strand genes of a cis-NAT pair were called present in S2 cells. This enrichment is consistent with the idea that the genes involved in cis-NAT siRNA arrangements are more broadly expressed than average genes. However, of the 676 non-siRNA cis-NATs, 333 top- and bottom-strand gene pairs were also called present at a minimum level that captured almost all of the genuine cis-NAT siRNA loci (Fig. 3). In other words, there are roughly three times as many coexpressed cis-NATs in S2 cells that do not generate siRNAs as there are coexpressed cis-NATs that do. Indeed, many non-siRNA cis-NATs consisted of gene pairs that were highly expressed. These data provide compelling evidence that only a minority (~25%) of coexpressed cis-NAT pairs are competent to enter the siRNA pathway.
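The proportions in this paragraph can be checked from the counts given in the text. The following is a brief worked calculation with illustrative variable names.

```python
# Counts reported in the text for S2 cells.
genes_interrogated, genes_present = 13_632, 7_274
sirna_pairs, sirna_pairs_coexpressed = 117, 104
non_sirna_pairs, non_sirna_pairs_coexpressed = 676, 333

present_fraction = genes_present / genes_interrogated          # about 0.53

# Among all coexpressed 3' cis-NAT pairs, the share that yields siRNAs.
coexpressed_total = sirna_pairs_coexpressed + non_sirna_pairs_coexpressed
sirna_share = sirna_pairs_coexpressed / coexpressed_total      # about 0.24

# Coexpressed pairs without siRNAs relative to coexpressed pairs with siRNAs.
excess_ratio = non_sirna_pairs_coexpressed / sirna_pairs_coexpressed   # about 3.2
```

By this count, about 24% of coexpressed pairs generate siRNAs, which the text rounds to ~25%.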
Because of the apparently active selection of cis-NATs by the RNAi pathway, we wondered whether there was an underlying molecular logic to the genes regulated by this network. To test this, we used GOToolbox26 to ask whether there was a functional signature of Gene Ontology (GO) terms among cis-NAT siRNA loci. We were particularly interested to identify GO terms that distinguished the 117 cis-NAT siRNA loci from the 676 non-siRNA cis-NATs, which would provide evidence for the active selection of specific molecular functions by the siRNA pathway.
We found that terms such as ‘nucleus’ and ‘transferase’ were highly enriched among both cis-NAT siRNA genes and non-siRNA cis-NAT genes, relative to all other D. melanogaster genes (Table 2), consistent with the idea that some gene functions are preferentially encoded by cis-NAT gene arrangements1. However, the other most highly significant enrichments concerned intriguing sets of genes with nucleic acid–based functions, including nuclease activity (both DNase and RNase), transcription factor complexes and chromosome pericentric regions (Table 2 and Supplementary Table 2 online); lipid metabolism genes were also overrepresented. Therefore, genes with nucleic acid–based functions are preferentially incorporated into cis-NAT siRNA systems. Moreover, this selection is not a simple consequence of the types of gene functions that are generally arranged as cis-NATs in the genome. Taken together, these observations implicate the endogenous regulatory impact of cis-NAT siRNAs in D. melanogaster.
In our computational screen, we recovered an abundantly expressed cis-NAT siRNA locus in the thickveins (tkv) region (Fig. 4a). This locus was cloned specifically from male bodies but not from male heads or female bodies, suggesting that its expression in the adult is restricted to the male sexual apparatus or germ line. It was also abundant in a library constructed from mass-isolated imaginal discs and brains. This locus showed several unusual features. First, whereas the vast majority of cis-NAT siRNA loci overlap on their 3′ ends, the tkv/CG14033 cis-NAT pair overlap on their 5′ exons, a configuration that is otherwise highly unfavorable for entry into an siRNA pathway (Fig. 1f). Second, this locus produced bidirectional 21-nt RNAs (Fig. 1i) not only from annotated exons, but also from the annotated tkv-RC intron (Fig. 4a). The structure of this intron-exon junction was verified by the existence of multiple cDNA clones, none of which was unspliced27. The extent of the tkv/CG14033 cluster was also not coincident with the CG14033 transcript, and thus seemed unlikely to be an mRNA phenomenon as deduced for cis-NAT siRNA loci. Unexpectedly, CG14033 was a candidate for a noncoding locus, because its longest open reading frame was only 74 amino acids (aa) long and was poorly conserved in other Drosophilids27.
A search for other intronic-exonic, bidirectionally transcribed, siRNA loci revealed two massive clusters of small RNAs that map to the klarsicht (klar) locus (Fig. 4b). We recovered ~180,000 unphased 21-nt RNAs from our libraries (Fig. 1j), and these tiled densely across both strands of neighboring 3.7-kb and 4.4-kb regions (Fig. 4b). The sheer length of these siRNA clusters rendered them unlikely to be produced by the same system that generates cis-NAT siRNAs (Fig. 2b), which are almost exclusively < 500 bp. Similarly to the tkv locus, the klar siRNA clusters were mostly intronic, but overlapped short klar coding exons at the 5′ end of the cluster (Fig. 4b). The klar small RNAs were cloned most abundantly from S2 cells and Kc cells; however, both clusters were also expressed in 0–1-hour-old and 2–6-hour-old embryos. Thus, the klar siRNA system is endogenously active in the animal. Unlike tkv, there was not an annotated antisense transcript that included the siRNA clusters. However, an uncharacterized, possibly noncoding, EST (BK001845) mapped close to the 5′ end of the cluster on the klar antisense strand (Fig. 4b).
Notably, the genomic region between the two klar siRNA clusters contains two transcription units, CG34267 and CG34268, that comprise an inverted gene duplication (Fig. 4b). Although the predicted proteins are short (77 aa and 83 aa), they are associated with spliced cDNA evidence and encode highly related polypeptides. In theory, the primary klar transcript that traverses the siRNA clusters would produce a long inverted repeat across CG34267/CG34268 (Supplementary Fig. 4 online) that resembles artificial RNAi-inducing transcripts as well as endogenous hpRNAs14. Nevertheless, no siRNAs were produced by this dsRNA region. This makes the strategy by which klar siRNA clusters are selected for processing even more mysterious.
Overall, the tkv and klar siRNA loci show many coherent features. First, these loci generate large numbers of 21-nt small RNAs that tile with nearly single-nucleotide resolution across long genomic intervals up to several kilobases in length. Second, the 5′ ends of the siRNA clusters overlap short coding exons of protein-encoding genes, but are contiguous with substantial lengths of intronic sequence. Third, the transcript pairs consist of a protein-encoding mRNA and a possibly noncoding RNA. Altogether, these characteristics define a second class of siRNAs derived from bidirectional transcription.
Although most klar siRNAs were individually rare, two out of six probes tested detected endogenous 21-nt RNAs in S2 cells. This allowed us to analyze their biogenesis using knockdowns of candidate RNAi and miRNA factors. To do so, we collected RNA from cells treated with dsRNA against GFP, Drosha, Pasha, Dcr-1, Loqs, Dcr-2, r2d2, Exportin-5, AGO1, AGO2 and CG8273 (encoding a double-stranded RNA binding domain (dsRBD) protein of unknown function) and analyzed them by northern blot. Consistent with their classification as siRNAs, we found that their accumulation was most dependent on Dcr-2 and AGO2 (Fig. 5a). In addition, we observed that klar siRNAs are blocked at their 3′ termini (Fig. 5b), providing further evidence for their loading into AGO2. Notably, their accumulation showed a substantial requirement for Loqs, which was previously known as a miRNA-specific Dcr-1 cofactor28–30. Equally unexpected was the finding that r2d2, a Dcr-2 cofactor that loads siRNAs into AGO2 (refs. 31,32), was not apparently required for klar siRNA accumulation. qPCR tests verified comparable knockdown of loqs and r2d2 in these experiments (Supplementary Fig. 5 online). These biogenesis data provide direct evidence for the production of klar siRNAs by a canonical Dcr-2 and AGO2 pathway, but introduce unexpected complexity in their requirement for dsRBD partner proteins.
There has been substantial controversy over whether complementary transcripts produced by bidirectional transcription can anneal in vivo to yield dsRNA that is recognized by the RNAi machinery. In plants, there is strong evidence that select cis-NAT loci engage an siRNA-mediated autoregulatory loop, especially under stress conditions5,6. However, evidence has also been presented against the broad engagement of RNAi pathways by plant cis-NATs, in favor of perhaps influencing alternative splicing or polyadenylation9. Similarly, experimental studies in mammalian cells have argued against the involvement of RNAi by cis-NATs7 in favor of indirect modes of cis-NAT regulation8.
The frequent failure of forced antisense transcripts to engage the RNAi pathway makes it clear that complementary mRNAs probably do not efficiently associate as dsRNA in vivo. However, it has been difficult to assess whether any dsRNA at all is generated from cis-NATs. A major problem for their study lies in the challenge of determining confidently that a given genomic locus is transcribed on both strands in the same cell. Even if one can detect coexpression of top and bottom strands of a cis-NAT in RNA extracted from a seemingly homogenous cell line, it is hard to rule out that expression of the two strands might be inversely correlated on a cell-by-cell basis.
The study of large-scale small RNA sequence data allowed us to bypass the difficulty of detecting cis-NAT dsRNA by visualizing downstream siRNA production directly. On the basis of genomic patterns of small RNA enrichment, 21-nt specificity, dependence on Dcr-2 and 3′ end modifications, we were able to confidently infer that many 3′ cis-NATs transit a bona fide siRNA pathway. Although the RNAi pathway accepts at least 100 3′ cis-NAT pairs, gene expression analysis of S2 cells indicated that 75% of coexpressed cis-NAT pairs are rejected from the siRNA pathway. In addition, the highly significant enrichment for various nucleic acid–based (DNA and RNA) functions among cis-NAT siRNA genes indicated that this pathway is not a fortuitous consequence of frequent cis-NAT arrangements in the D. melanogaster genome. Rather, we infer that cis-NAT siRNA loci are actively selected from among bulk cis-NAT gene pairs. Notably, a concurrent study identified 17 cis-NAT siRNA loci expressed in mouse oocytes33, suggesting that this regulatory strategy is conserved in animals.
Our report of genuine endogenous siRNA loci now provides direction to efforts to understand the role of endogenous RNAi in D. melanogaster. We describe here some particularly compelling cis-NAT siRNA loci that are highly expressed and have RNA-based regulatory functions.
First, the AGO2/CG7739 cis-NAT (AGO2 being the very protein that binds siRNAs and effects siRNA-mediated gene regulation) generated the third-highest number of cis-NAT siRNAs. Although our experimental tests were mostly inconclusive, this potential for self-regulation is reminiscent of autoregulation of ARGONAUTE-1 and DICER-LIKE-1 by miRNAs34,35. Perhaps this regulatory arrangement has a specialized use in the fly.
Second, with regard to the CG18854/IP3K cis-NAT, we recently showed that the spliced mRNA of the CG18854 locus is processed into functional siRNAs by a Dcr-2– and AGO2-dependent mechanism termed the hpRNA pathway14. We noticed that the 3′ end of CG18854 overlaps extensively with the 3′ exon of inositol 1,4,5-triphosphate kinase 1 on the other strand, and generates the fifth-highest number of overlap siRNAs of all cis-NATs. Therefore, the CG18854 locus participates prominently in both cis-NAT siRNA and hpRNA pathways.
Third, the tsunagi/Mys45A and mago/Magi cis-NATs generated the seventh- and twelfth-highest number of overlap siRNAs amongst all cis-NATs. Notably, tsunagi and mago encode physically interacting components of the exon junction complex36. It is commonly recognized that proteins that function as components of a common machinery are co-transcriptionally regulated. These cis-NATs provide a curious example of the common post-transcriptional regulation of functionally related proteins.
In general, the fact that cis-NAT siRNA loci are strongly enriched for nucleic acid–based functions suggests this as a major molecular axis for siRNA-mediated gene regulation in D. melanogaster. In this regard, it is perhaps relevant that Dcr-2 mutants were reported to have abnormal nucleolar morphology37, and AGO2 mutants were found to have defects in chromosome condensation, centromeric function and nuclear division38. The most highly significantly enriched GO-terms for confident cis-NAT siRNA include genes that function in these pathways (Table 2). Even though Dcr-2 and AGO2 null mutants are reasonably healthy and do not wildly deregulate cis-NAT siRNA transcripts, directed genetic interaction studies may prove useful in gaining further insight into the regulation of RNA and DNA pathways by cis-NAT siRNAs.
We showed that in addition to cis-NAT siRNA loci, there are at least two exceptional regions of the genome that generate siRNAs via bidirectional transcription and a Dcr-2– and AGO2-dependent pathway. These loci, located within the klar and tkv genes, probably define an independent mechanism of siRNA generation. Several salient features support this idea: (i) they generate 1–2 orders of magnitude more siRNA clones than do the most highly expressed cis-NAT siRNA loci; (ii) they are not 3′ cis-NATs; (iii) they originate from much larger genomic intervals than do cis-NAT siRNA overlaps; (iv) their antisense partners are potentially noncoding RNAs; (v) their predominant genomic annotations are introns, not exons; (vi) both siRNA clusters overlap coding exons at their 5′ ends. A functional understanding of their impact on klar and tkv activity awaits genetic knockouts of these siRNA clusters.
In summary, our studies reveal distinct mechanisms for the production of bona fide endogenous siRNAs via bidirectional transcription in D. melanogaster. Together with concurrent studies of endogenous siRNAs in D. melanogaster21 and mice33,39, this work opens windows onto studies of host-directed functions of RNAi and novel mechanisms for small regulatory RNA biogenesis.
We used a published protocol40 to generate libraries from ~18–26-nt RNAs isolated from (i) S2 cells, (ii) Kc cells and (iii) a mixture of Kc cells and S2 cells before and after 48-h treatment with 5 × 10⁻⁶ M 20-hydroxyecdysone (Sigma). These were analyzed using the Solexa/Illumina platform, respectively yielding 2,265,651, 2,151,714 and 699,873 reads for which 3′ linker sequences could be confidently identified and clipped. The entire data sets of clipped reads were deposited in NCBI-GEO under the platform GPL6573 as GSM272652 (S2), GSM272653 (KC) and GSM272651 (S2+KC).
We used previously described templates28,41 to generate dsRNAs for S2 cell knockdowns. We soaked 2 × 10⁶ S2 R+ cells in six-well plates with 20 µg ml⁻¹ dsRNA41. Quantitative reverse-transcription PCR (qPCR) was performed on AGO2, CG7739, tsunagi and Mys45A using SYBR green. These raw data were normalized to rp49 values, and then each of the knockdown samples was normalized to the value from GFP knockdown. We performed a total of eight qPCR reactions on two different knockdown samples for each condition.
We tested the following locked nucleic acid (LNA) probes for their ability to detect klar siRNAs in S2 cell RNA: klar_2.1, 5′-TGGACACCCATTCGATAGAATCGGA-3′; klar_2.2, 5′-CAACCGAGGTGTAAGCACTCATGTT-3′; klarA_probe, 5′-AGAGACAGGCCCAACAAAAAGACGA-3′; klarB_probe, 5′-GCGACCTTTAATCAACACCTCAA-3′; klarC_probe, 5′-CTCATCATTAAGGCAAATCCGAAGA-3′; klarD_probe, 5′-AGAGGCACAGGAAGAAACGCTCGAA-3′. Of these, klar_2.1 and klarD_probe hybridized to endogenous 21-nt RNAs on northern blots and were subsequently used to analyze klar siRNA biogenesis across a panel of RNA samples from cells treated with various dsRNAs against miRNA and RNAi factors.
FlyBase release 5.5 annotations (January 2008) were used to identify cis-NATs in the D. melanogaster genome27. A set of overlaps between exons on the plus strand and those on the minus strand was first identified, and alternatively spliced transcript annotations were collapsed to single genes to avoid duplicate analyses. For each gene pair, those transcripts that contained overlapping exons were then identified. The length of the overlap region ($l_{in}$) was the sum of the overlapping exon regions from both strands. The length outside the overlap ($l_{out}$) was the sum of the lengths of distinct exon regions of the transcripts in the cis-NAT pair that lay outside the overlap region. Small RNAs of at least 18 nt that mapped within and outside the overlap region of a cis-NAT were identified from the 454 and Solexa libraries. The normalized number of reads of each small RNA was recorded as output; that is, if the clone count of a small RNA is c and its number of BLAST hits is b, then the normalized clone count is c/b.
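The overlap bookkeeping described above can be sketched in Python. This is an illustration under stated assumptions, not the code used for the analysis: exons are given as half-open (start, end) intervals, each overlapping base is counted once, and all function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals so each base is counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals: List[Interval]) -> int:
    return sum(end - start for start, end in intervals)


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def overlap_lengths(plus_exons: List[Interval], minus_exons: List[Interval]) -> Tuple[int, int]:
    """Return (l_in, l_out): exonic bases inside and outside the plus/minus overlap."""
    l_in = total_length(intersect(plus_exons, minus_exons))
    l_out = total_length(merge(plus_exons + minus_exons)) - l_in
    return l_in, l_out


def normalized_clone_count(c: int, b: int) -> float:
    """Clone count c divided by the number of genomic hits b."""
    if b < 1:
        raise ValueError("a mapped read must have at least one hit")
    return c / b
```

For example, plus-strand exons (100, 200) and (300, 500) paired with a minus-strand exon (450, 700) give an overlap of 50 bases and 450 exonic bases outside the overlap.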
Our cis-NAT read analysis made it evident that many annotations are either incorrect, with truncated 3′ UTRs, or are alternatively polyadenylated in an individual library, resulting in cell-specific cis-NAT overlap regions. We manually corrected a small number of loci that gave rise to substantial numbers of siRNAs in corrected overlaps. The FlyBase annotations and revised overlaps are as follows: (i) CG12016/CG11526, chr3L:3322274–3322491 to chr3L:3322000–3322491; (ii) CG31898/fy, chr2L:8401446–8401477 to chr2L:8401438–8401647; (iii) dmt/hyd, chr3R:5540344–5540397 to chr3R:5539735–5540500; (iv) BRWD3/CG5728, chr3R:20154175–20154348 to chr3R:20150000–20158919 only in S2 cells; (v) CG5919/CG3308, chr3R:17096644–17097129 to chr3R:17096119–17098531 only in S2 cells; (vi) gry/CG14967, chr3L:3211110–3211347 to chr3L:3209271–3218451 only in S2 cells; and (vii) CG8594/Sin3A, chr2R:8462743–8462813 to chr2R:8460808–8464448 only in S2 cells. We did not attempt to correct the many cis-NATs whose reannotation individually affected few reads; however, summed together, these were sufficient to account for the couple of hundred 21-nt non-overlap reads observed above background (Fig. 1f,g).
Another relevant cis-NAT revision concerned CG18854, a FlyBase-annotated protein-encoding gene that we have recently shown to be a hairpin RNA (hpRNA) transcript that generates endogenous siRNAs. CG18854 produces relatively abundant 3′ cis-NAT siRNAs with IP3K1. However, because the CG18854 non-overlap region generates thousands of 21-nt reads via the hpRNA pathway, it was necessary to exclude hpRNA-derived CG18854 siRNAs from the analysis.
Small RNA enrichments within and outside the cis-NAT overlap region were assessed on the basis of uniquely mapped small RNAs, as certain mRNA degradation fragments map to many (sometimes hundreds of) locations. Small RNA enrichment ($E_{sr}$) and 21-mer enrichment ($E_{21}$) of a cis-NAT pair were calculated as follows. Let $n_{sr,in}$ and $n_{sr,out}$ be the summed clone counts of unique small RNAs inside and outside the overlap, respectively, let $n_{21,in}$ and $n_{21,out}$ be the corresponding summed clone counts of unique 21-mers, and let $l_{in}$ and $l_{out}$ be the lengths of the overlap and non-overlap regions. Then $E_{sr}$ is the ratio of the small RNA density in the overlap (count divided by overlap length) to the density outside the overlap; that is, $E_{sr} = (n_{sr,in}/l_{in})/(n_{sr,out}/l_{out})$. Similarly, $E_{21}$ is the ratio of the 21-mer density in the overlap to that outside the overlap; that is, $E_{21} = (n_{21,in}/l_{in})/(n_{21,out}/l_{out})$. Note that enrichment values are inherently variable as a function of overlap and non-overlap lengths, which necessarily differ for each individual cis-NAT. Therefore, the enrichment value of any particular cis-NAT is slightly less meaningful than the overall distribution of enrichment values across cis-NAT sets.
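As a minimal sketch, the enrichment ratio defined above can be computed in Python as follows. The same function serves for both the small RNA and the 21-mer enrichment, depending on which counts are passed in. The handling of zero counts outside the overlap is an assumption of this sketch, since the text does not say how such loci were treated.

```python
import math


def enrichment(n_in: float, n_out: float, l_in: int, l_out: int) -> float:
    """Read density inside the overlap divided by read density outside it.

    n_in, n_out: summed clone counts of uniquely mapped reads inside/outside.
    l_in, l_out: lengths of the overlap and non-overlap regions.
    """
    if l_in <= 0 or l_out <= 0:
        raise ValueError("region lengths must be positive")
    if n_out == 0:
        # Assumption: no reads outside the overlap gives an unbounded ratio
        # (or an undefined one when there are no reads inside either).
        return math.inf if n_in > 0 else math.nan
    return (n_in / l_in) / (n_out / l_out)


# Example: 40 reads in a 160-nt overlap versus 10 reads over 80 nt outside.
example_e = enrichment(40, 10, 160, 80)
```

Because the ratio depends strongly on the two lengths, comparing distributions across sets of cis-NATs is more informative than interpreting a single value, as noted above.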
To extract cis-NATs enriched with 21-mers in the overlap, we adopted a threshold for the ratio $n_{21,in}/n_{sr,in} > 0.67$. Of the total 793 3′ cis-NATs found, 275 that satisfied the above threshold were then binned with respect to the number of 21-mers in the overlap. We empirically set cut-off values of >10 overlap 21-mers, for which >67% of all overlap reads were 21 nt in length. This resulted in 117 confident 3′ cis-NAT siRNA loci.
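The two-part selection described above (a 21-mer fraction above 0.67 and more than 10 overlap 21-mers) can be expressed as a simple filter. The `CisNat` record, its field names and the example loci below are hypothetical and serve only to illustrate the thresholds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CisNat:
    name: str
    n_sr_in: float  # summed clone count of all small RNAs in the overlap
    n_21_in: float  # summed clone count of 21-mers in the overlap


def is_confident_sirna_locus(
    locus: CisNat, min_fraction: float = 0.67, min_21mers: float = 10
) -> bool:
    """True if >67% of overlap reads are 21-mers and there are >10 such reads."""
    if locus.n_sr_in == 0:
        return False
    fraction_21 = locus.n_21_in / locus.n_sr_in
    return fraction_21 > min_fraction and locus.n_21_in > min_21mers


def select_confident(loci: List[CisNat]) -> List[CisNat]:
    return [locus for locus in loci if is_confident_sirna_locus(locus)]


# Made-up example loci, not data from this study.
example_loci = [
    CisNat("locusA", n_sr_in=50, n_21_in=45),  # passes both thresholds
    CisNat("locusB", n_sr_in=8, n_21_in=7),    # high fraction, too few reads
    CisNat("locusC", n_sr_in=100, n_21_in=30), # many reads, low fraction
]
example_selected = [locus.name for locus in select_confident(example_loci)]
```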
To assess enrichments of GO terms in cis-NAT siRNA genes, we used GOToolBox (http://burgundy.cmmt.ubc.ca/GOToolBox/)26. We used the hypergeometric test option of GO-Stat with the gene set and the reference set specified as follows: (i) genes from the 117 confident 3′ cis-NAT siRNA loci compared to all D. melanogaster genes and (ii) genes from the 676 non-siRNA cis-NATs compared to all D. melanogaster genes.
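For readers unfamiliar with the hypergeometric test, the following Python sketch shows the underlying calculation for a single GO term. It is not the GOToolBox implementation; the function name and argument names are hypothetical, and it uses only the standard library.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the reference set
    K: reference genes annotated with the GO term
    n: genes in the test set (drawn from the reference set)
    k: test-set genes annotated with the GO term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ... annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total
```

A small probability indicates that the term occurs in the test set more often than expected when drawing the same number of genes at random from the reference set.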
Affymetrix Drosophila Genome Arrays were interrogated using independent preparations of total RNA from S2 cells treated with GFP dsRNA28. The expression data were processed using the R software environment for statistical computing and graphics (http://www.r-project.org/) using GC-RMA normalization.
We thank E. Hodges and G. Hannon for Solexa sequencing, A. Viale and the Sloan-Kettering Genomics Core for hybridizing Affymetrix microarrays, and N. Socci for assistance with microarray analysis. K.O. was supported by a grant from the Charles Revson Foundation. E.C.L. was supported by grants from the Burroughs Wellcome Foundation, the V Foundation for Cancer Research, the Sidney Kimmel Cancer Foundation and the US National Institutes of Health (R01-GM083300 and U01-HG004261).
Accession codes. NCBI-GEO. The microarray data were deposited under platform GPL1322, accession number GSM286644. The entire data sets of clipped reads were deposited in NCBI-GEO under the platform GPL6573 as GSM272652 (S2), GSM272653 (KC) and GSM272651 (S2+KC).
Note: Supplementary information is available on the Nature Structural & Molecular Biology website.