The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact firstname.lastname@example.org
In contrast to mRNAs, which are templates for translating proteins, non-protein coding (npc) RNAs (also known as ‘non-coding’ RNA, ncRNA) exhibit various functions in different compartments and developmental stages of the cell. Small nucleolar RNAs (snoRNAs), one of the largest classes of npcRNAs, guide post-transcriptional modifications of other RNAs that are crucial for appropriate RNA folding as well as for RNA–RNA and RNA–protein interactions. Although snoRNA genes comprise a significant fraction of the eutherian genome, identifying and characterizing large numbers of them is not sufficiently accessible by classical computer searches alone. Furthermore, most previous investigations of snoRNAs yielded only limited indications of their evolution. Using data obtained by a combination of high-throughput cDNA library screening and computational search strategies based on a modified DNAMAN program, we characterized 151 npcRNAs, and in particular 121 snoRNAs, from Caenorhabditis elegans and extensively compared them with those in the related nematode Caenorhabditis briggsae. Detailed comparisons of paralogous snoRNAs in the two nematodes revealed, in addition to trans-duplication, a novel cis-duplication distribution strategy with insertions near the original loci. Some snoRNAs coevolved with their modification target sites, demonstrating the close interaction of complementary regions. Some target sites modified by snoRNAs were changed, added or lost, documenting a high degree of evolutionary plasticity of npcRNAs.
Two very surprising discoveries have arisen from the Human Genome Project. One, humans do not have significantly more protein-coding genes than other mammals; and two, sequences corresponding to protein open reading frames comprise only 1.5% of our genome (1). One conclusion that can be drawn from this is that the differences that separate humans from other species may reside in the remaining 98.5% of the genome, which encodes untranslated functional RNAs and regulatory regions or constitutes non-genic regions. The present work focuses on a defined population of non-protein coding RNAs (npcRNAs), often not quite correctly termed ‘non-coding’ RNA (ncRNA), derived from a Caenorhabditis elegans cDNA library generated with size-fractionated RNA (70–600 nt). The size limitation, while excluding mature microRNAs (miRNAs), short interfering RNAs and large ribosomal RNAs (rRNAs) that are well described elsewhere (2,3), yields predominantly small nucleolar RNAs (snoRNAs) and spliceosomal RNAs. snoRNAs are 60–300 nt long and guide the post-transcriptional modifications of ribosomal and other RNAs. Such modifications are crucial for appropriate RNA folding as well as for RNA–RNA and RNA–protein interactions (4). Furthermore, snoRNAs are thought to be involved in epigenetic mechanisms regulating gene expression. In this context, deletion of certain imprinted snoRNA clusters in the cerebral cortex is thought to play a causative role in the Prader–Willi Syndrome of mental retardation (5–7).
Based on structural motifs and function, the snoRNA family is divided into two subclasses: C/D-box snoRNAs (C-box consensus UGAUGA; D-box consensus CUGA) and H/ACA snoRNAs (H-box consensus ANANNA and box ACA), which interact directly by base complementarity with their target rRNA and spliceosomal RNA sequences to direct 2′-O-ribose methylation and pseudouridylation, respectively. The complementary regions, known as ‘antisense elements’, reside at the 5′ and/or 3′ ends of snoRNAs. Although snoRNA modifications were initially thought to be restricted to rRNA and to be localized strictly in the nucleolus, a growing list of npcRNAs, including transfer RNAs (tRNAs), is now known to be modified by snoRNAs (4,8), and snoRNAs have also been found in Cajal bodies, nucleoplasmic substructures involved in processing npcRNAs (9). The spectrum of snoRNA targets could potentially include even mRNAs, although it cannot be excluded yet that such existing base complementarities are simply fortuitous and without biological significance (5). Most vertebrate snoRNAs are derived from introns of pre-mRNA transcripts, especially those from ribosomal protein genes (RPGs) and other housekeeping proteins, and are processed in a complex sequence involving endonucleases, exonucleases and helicases (10,11). Interestingly, a growing number of host genes do not yield translatable mRNAs, and it appears that the main function of the corresponding genes and primary transcripts is the expression of snoRNAs (12–14). Many miRNAs are also hosted by npcRNAs (15).
Systematic searches using experimental RNomics, an EST-like approach tailored for small RNAs, have successfully identified large numbers of npcRNAs in mouse (16), Drosophila (17), Arabidopsis (18) and Archaea (8). To better elucidate the evolutionary pathways of snoRNAs, we have now extended this search to the nematode C.elegans, an extremely interesting model eukaryote with a simple body plan but complicated genomics including, for example, cis, trans and alternative splicing systems. As intermediates between single-celled organisms and ‘higher’ metazoan animals, nematodes offer an excellent system for studies on metazoan genome function and evolution. To provide a large enough dataset for exhaustive analysis of snoRNAs in C.elegans, we have combined high-throughput, experimental RNomic screening with computational methods focused on RPGs and on introns of other genes that harbor snoRNAs identified in our experimental approach. Furthermore, we have analyzed the phylogeny of snoRNAs by comparing the above results with those of Caenorhabditis briggsae, a nematode that shared a common ancestor with C.elegans some 100 million years ago (mya). Our three-pronged approach revealed possible mechanisms of how novel snoRNAs arose, spread in the genome, changed targets or were lost over the course of evolution.
The experimental procedures concerning construction and analysis of libraries are described in Hüttenhofer et al. (16). Detailed methods for constructing the C.elegans library are given in Supplementary Data.
The commercial software package DNAMAN was modified, in collaboration with the Lynnon Corporation, to computationally screen defined databases of intronic sequences for snoRNAs and to identify snoRNA modification target sites from compiled RNA databases. The modified DNAMAN version is available from http://www.lynnon.com (Mac OS X version 6018 or later). Note that additional freeware is available at http://lowelab.ucsc.edu/snoscan to analyze C/D-box snoRNAs (19), at http://lowelab.ucsc.edu/snoGPS for H/ACA-box snoRNAs (20) and at http://rna.tbi.univie.ac.at/cgi-bin/alifold.cgi to analyze secondary structural prediction (21).
The modified DNAMAN software allowed us to apply complex search profiles to find potential snoRNAs in a compilation of C.elegans introns of RPGs and genes that harbor experimentally identified snoRNAs. The following search was applied for C/D-box snoRNAs:
TGATGA(N9-35)CTGA(N4-35)TGATGA(N9-35)CTGA <mismatch=3 2 3 1>
A maximum of three mismatches were allowed in the first, two in the second, three in the third and one in the last sequence motif. N9-35 and N4-35 denote variable sequence stretches of at least 9 or 4, respectively, and a maximum of 35 nt. The search motif for H/ACA-box snoRNAs was ANA(NN)A(N50-100)ACA. No mismatches were allowed. Both searches were accompanied by intensive structural evaluations of the computationally predicted snoRNAs (Supplementary Figure 1).
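The C/D-box search profile described above can be illustrated with a minimal Python sketch. It is not the DNAMAN implementation; the function names and the exhaustive enumeration of gap lengths are assumptions made purely for illustration, and the structural evaluation that accompanied the real searches is not reproduced.

```python
from typing import Iterator, Sequence, Tuple

# Search profile taken from the text (DNA alphabet).
MOTIFS = ("TGATGA", "CTGA", "TGATGA", "CTGA")
MAX_MISMATCH = (3, 2, 3, 1)          # allowed mismatches per motif
GAPS = ((9, 35), (4, 35), (9, 35))   # min/max spacer length between motifs


def mismatches(window: str, motif: str) -> int:
    """Number of positions at which window and motif differ."""
    return sum(a != b for a, b in zip(window, motif))


def find_cd_candidates(
    seq: str,
    motifs: Sequence[str] = MOTIFS,
    max_mm: Sequence[int] = MAX_MISMATCH,
    gaps: Sequence[Tuple[int, int]] = GAPS,
) -> Iterator[Tuple[int, ...]]:
    """Yield tuples with the 0-based start of each motif for every
    arrangement in seq that satisfies the profile (hypothetical helper)."""
    seq = seq.upper()

    def extend(idx: int, pos: int, starts: Tuple[int, ...]):
        motif = motifs[idx]
        window = seq[pos:pos + len(motif)]
        # Reject truncated windows and windows with too many mismatches.
        if len(window) < len(motif) or mismatches(window, motif) > max_mm[idx]:
            return
        starts = starts + (pos,)
        if idx == len(motifs) - 1:
            yield starts
            return
        end = pos + len(motif)
        shortest, longest = gaps[idx]
        for gap in range(shortest, longest + 1):
            yield from extend(idx + 1, end + gap, starts)

    for start in range(len(seq)):
        yield from extend(0, start, ())


# Toy sequence with exact motifs and the shortest permitted spacers.
toy = "TGATGA" + "A" * 9 + "CTGA" + "A" * 4 + "TGATGA" + "A" * 9 + "CTGA"
print(list(find_cd_candidates(toy, max_mm=(0, 0, 0, 0))))  # [(0, 15, 23, 38)]
```

With the mismatch allowances given in the text the profile is deliberately permissive, so a sketch of this kind returns many overlapping candidates that then have to be filtered by structure and target complementarity.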
The pattern search of DNAMAN is implemented in C language. Details of the search procedure are provided by the Lynnon Corporation (Supplementary Data).
BLAST searches of the cDNA sequences were made against the C.elegans non-redundant (nr) NCBI database (http://www.ncbi.nlm.nih.gov/blast), the Santa Cruz server (http://genome.ucsc.edu/cgi-bin/hgBlat), the Sanger database (http://www.ensembl.org/Caenorhabditis_elegans/blastview) or the RPG databank (http://ribosome.miyazaki-med.ac.jp). Thus, the sequences absent in the truncated cDNAs (usually some 10 nt) were extended with the aid of genomic sequences. The mature 5′ ends were estimated by structural requirements of mature snoRNAs (Supplementary Figure 1).
A compiled library of all C.elegans rRNA, spliceosomal and tRNA genes was searched with the modified DNAMAN software for potential snoRNA target motifs. For C/D-box snoRNA target sites we allowed a maximum of three G–U pairs and a minimum length of 9 nt. For H/ACA-box snoRNAs we used a similar search profile but allowed a split of target sites in four or five contiguous nucleotides. The detailed search process is shown in Supplementary Figure 1.
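The C/D-box target criterion (at least 9 nt of complementarity, at most three G–U pairs) can be sketched as follows. This is an illustrative reading of the criterion, not the modified DNAMAN routine: the names are hypothetical, and the sketch assumes that the whole antisense element must pair without gaps or non-G–U mismatches, which the text does not state explicitly.

```python
from typing import List, Optional, Tuple

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def gu_count_if_paired(guide_3to5: str, window: str) -> Optional[int]:
    """Return the number of G-U pairs if every position pairs, else None."""
    gu = 0
    for g, t in zip(guide_3to5, window):
        if (g, t) in WATSON_CRICK:
            continue
        if (g, t) in WOBBLE:
            gu += 1
        else:
            return None
    return gu


def find_target_sites(
    antisense: str, target: str, min_len: int = 9, max_gu: int = 3
) -> List[Tuple[int, int]]:
    """Return (start, gu_pairs) for each window of target (5'->3') that is
    fully complementary to the antisense element (given 5'->3')."""
    antisense = antisense.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    if len(antisense) < min_len:
        return []
    # Pairing is antiparallel, so read the antisense element 3'->5'.
    guide = antisense[::-1]
    hits = []
    for start in range(len(target) - len(guide) + 1):
        gu = gu_count_if_paired(guide, target[start:start + len(guide)])
        if gu is not None and gu <= max_gu:
            hits.append((start, gu))
    return hits


print(find_target_sites("ACGUACGUAC", "AAAAGUACGUACGUAAAA"))  # [(4, 0)]
```

Returning the G–U count with each hit makes it possible to rank candidate sites afterwards, for example by preferring sites with fewer wobble pairs.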
The same database sources as mentioned above were used to computationally detect orthologous snoRNAs in C.briggsae.
The secondary structures of all experimentally and computationally identified snoRNAs were derived using the M-fold program (22); http://www.bioinfo.rpi.edu/applications/mfold/old/rna.
Following high-density array hybridization of 38400 cDNA sequence clones to exclude known small npcRNAs or fragments of degraded large rRNAs (Supplementary Figure 2), we selected 4673 clones for sequencing. Exclusion of unreadable or very short sequences, empty vectors, E.coli contaminations and other ambiguities yielded 3294 clones; among these we identified 15 known spliceosomal RNAs (294 sequences), 41 known tRNAs (322 sequences), 3 isoforms of SRP (signal recognition particle) RNA (736 sequences), 29 different parts of known rRNAs that escaped prior exclusion (1180 sequences), 22 known mRNAs (31 sequences), 7 splice leader RNA sequences (SL; 64 sequences) and two histone hairpin RNAs (2 sequences) all of which were excluded from a more detailed analysis (SL, SRP, histone hairpin and spliceosomal RNAs are listed in Supplementary Data). The remaining 665 sequences contained 120 npcRNAs including 91 snoRNAs (Figure 1).
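The clone bookkeeping above can be verified with a short tally. The dictionary keys are descriptive labels chosen here for illustration; the numbers are those given in the text.

```python
# Sequence counts of the excluded categories, as reported in the text.
readable_clones = 3294
excluded = {
    "spliceosomal RNAs": 294,
    "tRNAs": 322,
    "SRP RNA isoforms": 736,
    "rRNA fragments": 1180,
    "mRNAs": 31,
    "splice leader RNAs": 64,
    "histone hairpin RNAs": 2,
}

excluded_total = sum(excluded.values())
remaining = readable_clones - excluded_total
print(excluded_total, remaining)  # 2629 665
```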
In addition to those npcRNAs identified experimentally, computer searches based on the following arguments yielded another 23 snoRNAs (Figure 1). Yoshihama et al. (11) estimated that RPGs harbor about one-third of all snoRNAs in the human genome. In our experimentally identified snoRNAs we also observed that genes harboring one snoRNA in an intron are likely to encode additional snoRNAs in the same intron or in neighboring introns of the same gene. Consequently, we extracted and analyzed snoRNA candidates from introns of all known C.elegans RPGs and from other intronic sequences that were found in the proximity of our experimentally detected snoRNAs. We used the following stringent criteria to validate all computationally detected snoRNA candidates (Supplementary Figure 1): (i) presence of all snoRNA structural requirements and box motifs; (ii) identification of potential modification target site complementarities; and (iii) sequence conservation in C.briggsae and/or signals in northern blots. From >100 potential candidates (Supplementary Data), an additional 23 novel snoRNA candidates met these stringent criteria (Figure 1; comCe).
The reliability of the computational algorithm was confirmed in that we were also able to identify all but 17 of the experimentally found or previously predicted snoRNAs with these search criteria. Those snoRNAs not confirmed in the computer search were structurally modified and therefore did not match our search profile (data not shown). An additional BLAST search of Genbank genomic sequences revealed seven snoRNA paralogs (Figure 1, blCe, blCb) and one additional spliceosomal RNA (blCe378).
Of all 151 experimentally or computationally identified sequences, 56 are novel snoRNA candidates (Figure 1, I), while 65 of the recovered snoRNA candidates were recently confirmed experimentally (23) (59 candidates) or (24) (6 candidates) (Figure 1, II; Supplementary Data). For completeness, Figure 1 (III) also lists 20 other snoRNA candidates that were not recovered by our screen, but were identified previously by either Deng et al. (23) (18 candidates) or Wachi et al. (24) (2 candidates), and is thus now a compilation of all presently known C.elegans snoRNAs.
Altogether, we found 73 unambiguous snoRNA candidates with motifs, secondary structure elements and recognizable target modification complementarities characteristic of C/D-box snoRNAs. Based on their chromosomal locations, individual candidates could be described as either intronic or intergenic snoRNAs (Figure 1 and Supplementary Data). All but 16 are also potentially functional in C.briggsae and are located at orthologous loci; 10 of these 16 were recognizable, but diverged, at orthologous positions in C.briggsae (Supplementary Figure 3). Presumably, they became inactive pseudogenes that lack motifs and structures to function as bona fide snoRNAs. Interestingly, in all C/D-box snoRNA candidates (as well as the H/ACA-box snoRNAs) we identified a characteristic uridine-rich region adjacent to the mature 3′ ends. This sequence has previously been implicated in maturation of H/ACA-box snoRNAs only (25). We also found a C/D-box homodimer (Ce234) and a chimeric C/D-H/ACA-box snoRNA (Ce104). Northern blot analysis of both resulted in hybridization to only dimeric forms, indicating that the respective dimers are the mature forms of these snoRNAs. Interestingly, we detected only six C/D-box snoRNA candidates in RPG introns compared with 20 H/ACA-box snoRNAs (Figure 1).
We also identified 48 H/ACA-box snoRNAs that were localized to intronic and intergenic regions (Figure 1 and Supplementary Data). Only seven of those are probably not functional in C.briggsae (Figure 1). The sequences of three H/ACA-box snoRNA orthologs are apparently diverged pseudogenes in C.briggsae (Supplementary Figure 3; comCb17, comCb22, blCb176).
In keeping with their function, snoRNAs have dual binding capacity for both small RNA modifying proteins and, via specific sequence complementarity, for their target RNAs. We identified complementarities for potential modification targets in 5S, 5.8S, 18S and 26S rRNAs; in U1, U2, U4, U5 and U6 spliceosomal RNAs and in tRNAs. Twelve 26S rRNA target sites are supported by the presence of nucleotide modifications (26) (Supplementary Figure 3, black dots). This is the first time that C/D-box snoRNAs in eukaryotes have been identified with potential target sites in various tRNAs (Ce62-tRNAIle, Ce63-tRNASer, Ce94-tRNAAsn, Ce246-tRNAIle, comCe3-tRNAThr, comCe18-tRNAArg) (Figure 2). tRNA modifications guided by snoRNAs have been reported thus far only in Archaea (8,27). Another interesting observation was the presence of two antisense elements in some of our snoRNAs (e.g. Ce173.3, Ce251, Ce298, Ce23) with complementary regions suggesting the potential to modify target RNAs located in two different subcellular compartments. These snoRNAs are predicted to modify rRNAs that occur in the nucleolus, as well as U1, U4, U5 spliceosomal RNAs that are present in Cajal bodies. Even an individual antisense element has the potential to be complementary to more than one hypothetical target site (Supplementary Figure 3).
From our 121 experimentally and computationally identified snoRNAs in C.elegans, 98 potentially functional orthologs were identified in C.briggsae (Figure 1). Forty of these orthologous pairs contain matching sequence complementarities to the same RNA modification targets in C.briggsae and C.elegans (Supplementary Figure 3a and b). Surprisingly, the potential target sites for the majority of them changed over a period of 100 million years (Supplementary Figure 3c and d).
snoRNA paralogs, generated perhaps by gene duplication, have been observed frequently and are a potential source for the creation of novel snoRNAs (28). We identified 20 snoRNAs and their corresponding paralogs (11 pairs are orthologs in both C.elegans and C.briggsae, 6 pairs in C.elegans and 3 pairs in C.briggsae only; Figure 3a). To help determine whether the computationally identified H/ACA-box snoRNA paralogs are functional, we analyzed the compensatory nucleotide substitution patterns in their double-stranded stem structures. Compensatory changes tend to maintain the secondary structure of stem regions and indicate selection pressure for functionality. Characteristic compensatory changes could be found for all identified H/ACA-box snoRNA paralogs suggesting that they retained their functionality, at least for a sufficient period to form individual compensations after duplication (data not shown). Compensatory substitution pattern analyses in C/D-box snoRNAs are not of much help in determining their functionality because they do not possess sufficient amounts of double-stranded structures; thus, C/D-box snoRNAs were omitted from this analysis.
The chromosomal localization of snoRNA paralogs could be categorized based on two distinguishing events: those in which snoRNA paralogs inserted into different positions in the same gene (cis-duplication), and those in which snoRNA paralogs inserted into target genes (or intergenic regions) other than the original host gene or, in the event of host gene duplication, moved to a different chromosomal location along with the host gene (trans-duplication).
The presence and/or absence of 20 snoRNA paralogs were analyzed in C.elegans and C.briggsae. Figure 4a shows a C.elegans snoRNA (comCe12) that is conserved at the orthologous locus (intron 2 of the rps-29 gene) in C.briggsae. A paralog of this snoRNA (Ce236) was also present in our cDNA library. In C.elegans Ce236 is located in intron 1 of the same rps-29 host gene, while the orthologous position in the rps-29 gene of C.briggsae is empty. Since the probability of a clean excision of the snoRNA without parts of the flanking sequences at this position in C.briggsae is negligible, this indicates a duplication process involving integration into the adjacent intron (cis-duplication) after C.elegans split from a common ancestor with C.briggsae. Interestingly, the function of Ce236 in C.elegans may have been replaced in C.briggsae by another non-orthologous snoRNA (Cb309) that has the potential to modify the identical nucleotide in 26S rRNA, while the Cb309 ortholog in C.elegans (Ce309) modifies a target sequence in 18S rRNA. A similar scenario is shown in Figure 4b. We found a snoRNA (comCe7) present at orthologous positions of the rpl-24 gene in C.elegans, C.briggsae and Caenorhabditis remanei. A paralog (Ce80) could be detected in intron 1 of the same gene in C.elegans only. Figure 4c shows a snoRNA (comCe14) present in orthologous positions of the hypothetical protein gene K07C5 in C.elegans and C.briggsae. A corresponding paralog is found in intron 7 of the same gene in C.elegans (comCe15) but not in C.briggsae. In all presence/absence cases examined, the intronic sequences flanking the duplicated snoRNAs were recognizable at the corresponding, empty loci.
We could distinguish two forms of trans-duplications, both of which are exemplified in Figure 5. In some instances of segmental duplications of entire genes that harbor snoRNAs in one or more introns, the snoRNA did not move to another part of the host gene but hitchhiked with the host to a new location after duplication. In other cases, snoRNAs inserted into introns of a new host gene without traces of the original host gene, or into a new intergenic location. Figure 5a describes an example of segmental duplication of a hypothetical protein gene (C06A1.3) yielding a duplicated pseudogene (with respect to the protein-coding capacity) including the paralogous snoRNAs Ce173.1-3. Figure 5b shows two experimentally identified snoRNA paralogs; Ce254a is located in intron 1 of a hypothetical protein gene (Y53F4B.12). The paralog Ce254b duplicated, along with the 5′ (~100 nt) and 3′ (~50 nt) sequences of its original flanking intron, but without detectable surrounding exons, and migrated into a new location on a different chromosome. Interestingly, the separate left and right antisense elements of Ce254 (Figure 3b, bottom) modify bases in 26S rRNA that are shifted by only 6 nt. Hence, the sequences on 26S rRNA that are complementary to the two snoRNA antisense elements overlap by 2 nt (Figure 3b, top). This indicates that modification of the two methylation targets is not likely to occur at the same time. We found both paralogs at orthologous positions in C.briggsae, indicating that the duplication event took place in a common ancestor of both worms. Both forms retained their modification targets over 100 million years, demonstrating strong functional constraints. The fact that two conserved snoRNA paralogs modify the same targets suggests that one copy may not be enough to modify all rRNA molecules, and that quantitative aspects may play an important role in snoRNA function.
Figure 5c and d describe snoRNA paralogs that are located in totally different surroundings following duplication. In the latter case it is noteworthy that the comCe6 paralog moved from one RPG (rpl-7) to another (rps-13) as the Ce280 paralog or vice versa.
Data provided by both the experimental and computational searches, as well as comparisons of paralogous snoRNAs in both C.elegans and C.briggsae, enabled us to analyze target sites and hence function of duplicated snoRNA genes. We observed three different fates of the snoRNAs following duplication: (i) one of the paralogs apparently became inactive and decayed during the course of evolution; (ii) the new paralog maintained the same function as the original snoRNA and (iii) the new paralog either partially (one antisense element maintained the same target and the other acquired a new one) or fully diverged with respect to the complementary targets. Of the 20 pairs of paralogs, we found 4, 6 and 10 examples for the above three scenarios, respectively. One example of target site plasticity is illustrated in C/D-box snoRNA Ce246, which was detected experimentally in the C.elegans cDNA library and computationally in C.briggsae. In C.briggsae one paralog differs from the other mainly by a 2 nt deletion 5′ adjacent to the D′-box, shifting the methylation site by 2 nt (blCb246a-blCb246b). In 26S rRNA G860 is modified by one paralog and A862 by the other.
Coevolution is defined as a change in the genetic composition of one species in response to a genetic change in another (29,30). This definition can be adapted to molecular interactions within organisms. Biologically significant interactions within macromolecules [e.g. RNA secondary structure; (31)] or between macromolecules, [e.g. RNA and proteins], can be demonstrated by compensatory changes in one or the other (32). Two of the C/D-box snoRNAs (Ce138, Ce234.2) exemplify adaptive evolution of the snoRNA complementary region to their 26S rRNA target sequence (Figure 6). In the lineage leading to C.elegans, an A→U substitution occurred in the 26S rRNA target site of the Ce138 snoRNA. This base change is not present in C.briggsae or C.japonica 26S rRNA sequences (data not shown). Accordingly, we found a compensatory U→A substitution in the antisense element of the snoRNA ortholog in C.elegans (Figure 6a), but not in C.briggsae or C.japonica. At another 26S rRNA position we found an A→G substitution in C.briggsae but not in C.elegans or C.japonica (data not shown). The corresponding C.briggsae snoRNA Cb234.2 shows a compensatory change from U→C (Figure 6b).
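The logic of a compensatory change can be made explicit with a small sketch. The function name and its arguments are hypothetical and serve only to restate the two examples above; only Watson–Crick pairs are counted as pairing here, so a G–U wobble pair is deliberately treated as unpaired.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def is_compensatory(target_old: str, target_new: str,
                    guide_old: str, guide_new: str) -> bool:
    """True if a target substitution that breaks a Watson-Crick pair is
    matched by a guide substitution that restores one (hypothetical helper)."""
    paired_before = (guide_old, target_old) in WATSON_CRICK
    disrupted = (guide_old, target_new) not in WATSON_CRICK
    restored = (guide_new, target_new) in WATSON_CRICK
    return paired_before and disrupted and restored


# Ce138: A->U in the 26S rRNA target, U->A in the antisense element.
print(is_compensatory("A", "U", "U", "A"))  # True
# Cb234.2: A->G in the 26S rRNA target, U->C in the antisense element.
print(is_compensatory("A", "G", "U", "C"))  # True
```

Under a model that accepts G–U pairs, the second case would not count as disrupted by the target substitution alone, which is why the pairing rule has to be stated whenever such changes are scored.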
Our goal was to obtain as comprehensive a view as possible of cellular snoRNA expression in C.elegans. Creating a cDNA library based on size-fractionated, expressed RNAs, and enriched to remove large numbers of known RNAs, yielded highly efficient experimental search results. We present here a detailed analysis of 120 different npcRNA species (Figure 1, groups I, II) from 665 informative sequences selected from an initial 38400 clones. Moreover, to complement and validate the results of this experimental approach, we customized commercially available computer software to generate a search tool for identifying snoRNA candidates and their modification target sites according to a set of stringent criteria. From >100 potential intronic snoRNA candidates, 23 additional candidates fulfilled these conditions; another 8 npcRNAs were found by BLAST search. Thus, while computational approaches cannot supplant experimental work, they are a very useful complement to it. This was particularly exemplified by our ability to analyze experimentally identified snoRNAs that, although apparently still functional, had diverged from the canonical motifs used for the computer search. In fact, the pitfalls of not complementing experimental results with such careful computational analyses can be clearly seen in a recent experimental screen (23). Of the 56 novel C.elegans snoRNAs shown in Figure 1, 14 were also reported recently but were analyzed either incorrectly or not at all (23). As examples, Ce96 (CeN25-2) and Ce135 (CeN25-1) were described as members of a novel class of small nuclear-like RNAs (23) (see their Figure 3D). Nevertheless, we could discern clear characteristics of C/D-box snoRNAs for both of these npcRNAs using computational analyses. Ce173.1-3 (CeN128) was described as one single H/ACA-box snoRNA species. By comparative analyses of C.elegans and C.briggsae we could distinguish them as three independent C/D-box snoRNAs.
The same was true for Ce234.1-2 (CeN47) that they defined as one single snoRNA species. Ce110 (CeN42) is clearly a C/D-box snoRNA but they defined it as an H/ACA-box snoRNA even though part of the predicted H/ACA-box snoRNA would clearly overlap exonic sequences. They also identified six other unclassified npcRNAs [Ce86 (CeN35), Ce151 (CeN129), Ce254a (CeN23-1), Ce254b (CeN23-2), Ce282(CeN52), Ce105 (CeN66)] which we could clearly assign to specific snoRNA categories.
Our in silico target site complementarity search provided evidence of a high degree of plasticity in target site modification. In some cases we found evidence to suggest that the two complementary regions of particular snoRNAs modify targets in different compartments of the nucleus, namely rRNAs in the nucleolus and spliceosomal RNAs in the Cajal body. In addition to the classical modification targets, we also found snoRNA complementarities for target sites in five different tRNAs. Modification of tRNAs by snoRNAs has been demonstrated so far only for Archaea and not for Eukarya (8,27). There is evidence that, following duplication, several snoRNA paralogs evolved new target site complementarities. Comparing C.elegans and C.briggsae, we observed that several specific modification sites of rRNAs are targeted by otherwise unrelated snoRNAs in both species. Losing, gaining or changing target sites are frequent phenomena that document the plasticity of modification interactions. Another source of plasticity is the compensatory changes of snoRNA target site complementary sequences that arose following base substitutions in their targets as illustrated in the case of Ce138 and Ce234.2 (Figure 6). Although several of our predicted target sites were confirmed by experimental approaches (26), a more conclusive verification of other target sites is necessary.
Little is known about the origin and distribution of snoRNAs. Polycistronic clusters of snoRNAs are frequent in plants, and propagation due to cluster duplication is generated by polyploidization (33). However, polycistronic clusters of snoRNAs are the exception in vertebrates, as snoRNAs in those organisms are mainly singular and intron-encoded. To elucidate the process of snoRNA propagation in a ‘model’ eukaryote, we analyzed presence/absence patterns of snoRNA paralogs in C.elegans and compared them with those in C.briggsae. We identified three snoRNA paralogs with clear presence/absence patterns (Figure 4). These patterns suggest a copy/paste mechanism in the duplication of certain singular snoRNAs into neighboring introns of the same gene (cis-duplication). Cis-duplication seems to be a dominant process for H/ACA-box snoRNA propagation, but thus far we have not identified any C/D-box snoRNA paralogs generated by cis-duplication. We found that most genes harbor predominantly one type of intronic snoRNA (e.g. H/ACA-box snoRNAs in RPGs; Figure 1 and Supplementary Data), one notable exception being the C/D-box snoRNA comCe4, which is present in the midst of several H/ACA-box snoRNAs in the hypothetical protein gene K07C5.4 (Figure 4).
Our data also suggest that snoRNAs can be propagated by complete or partial gene duplication that includes the embedded snoRNAs, an event that has been purported to precede evolutionary novelties (Figure 5) (34,35). Brosius (36) suggested that snoRNAs could be propagated by retroposition, a mechanism that might be responsible for trans-duplicated snoRNAs, but, because insertions of retroposed sequences are virtually random and should not lead to accumulations in neighboring introns, seems not to be involved in cis-duplication. Local, unequal recombination is a more probable mechanism for cis-duplications, especially in C.elegans, because of the A/T-rich surroundings of snoRNA sequences.
In summary, the gain, loss and change of targets of snoRNAs over relatively short evolutionary times, possibly similar to the evolution of miRNAs (37–39), indicate that npcRNAs are not merely fossils from the long gone RNA/RNP world but continue to contribute to the changing needs of cells and genomes. This constitutes an astounding and unexpected level of plasticity for a primordial macromolecule such as RNA.
Supplementary Data are available at NAR Online.
We thank Yue Huang for implementing modifications of the DNAMAN software and Marsha Bundman for editorial assistance. This work was supported by the German Human Genome Project through the BMBF (#01KW9966), and grants from the Fonds der Chemischen Industrie from the European Union (EU; LSHG-CT-2003-503022) to J.B., and the Nationales Genomforschungsnetz (NGFN; 0313358A) to J.B. and J.S. Funding to pay the Open Access publication charges for this article was provided by NGFN.
Conflict of interest statement. None declared.