To investigate the taxonomic distribution of known snoRNAs and highlight where new discoveries are most likely to be made, we gathered data from the Pfam (protein families), Rfam (RNA families), Genomes OnLine Database (GOLD) and EMBL databases (Figure 2). Rfam uses experimentally validated ncRNA sequences deposited in EMBL to search all nucleotide sequences for homologs (see the red and pink bars in Figure 2). The results show that many major taxonomic clades have few or no annotated snoRNAs.
Figure 2. The taxonomic distribution of existing snoRNA annotations. The figure displays a tree derived from the top three levels of the National Center for Biotechnology Information (NCBI) taxonomy. Mapped onto this are counts of: (1) the snoRNP-associated Pfam ...
In the Archaea, annotated snoRNAs are notably absent from the taxon Halobacterium, for which a genome sequence has been available for nearly 10 years and which has been proposed to contain snoRNAs on the basis of the presence of the snoRNP-associated proteins fibrillarin and Nop56/58 [7]. More broadly, only 33% of the crenarchaeal and 60% of the euryarchaeal groups carry known or predicted snoRNAs, and snoRNA counts are very low throughout the Euryarchaeota. Also within the Archaea, snoRNAs have been annotated in some methanococcal genomes, predicted by homology to experimentally validated snoRNAs from members of the Thermoprotei [8].
Some eukaryotic taxa fare little better. For example, Rfam lists no snoRNA families for the unicellular diplomonads (Diplomonadida; Figure 2), such as Giardia lamblia, although putative snoRNA-like RNAs have been reported from G. lamblia [...]. Databases such as Rfam inevitably lag behind the current literature; we expect these missing snoRNAs to be included in future releases.
The case of the microsporidia (unicellular organisms allied to the fungi) is interesting: one genome sequence was published nearly a decade ago and eight further projects are in progress, yet despite this apparent wealth of information no snoRNAs have been identified. Like the diplomonads, however, microsporidia clearly possess components of the snoRNA machinery and almost certainly use snoRNAs. The absence of annotations is therefore most likely because no microsporidian snoRNAs have been determined experimentally, and current bioinformatic methods are not sensitive enough to identify snoRNAs reliably in these taxa from sequence alone, so none have been inferred by homology.
As expected, the Metazoa are comparatively well studied: there is abundant experimental and bioinformatic evidence for snoRNAs across the Metazoa, with the exception of the Cnidaria and the Platyhelminthes, which currently have only bioinformatically predicted snoRNAs based on sequence similarity to other metazoan snoRNAs.
The genome sequence of the parasitic protozoan Trichomonas vaginalis (a parabasalid; Figure 2) bears a single C/D-box snoRNA annotation, a homolog of the fungal snoRNA snR52/Z13. Moreover, this is a low-scoring hit (26.12 bits, E-value = 1.04e+02, meaning that roughly 100 hits scoring at least this well would be expected by chance) to an otherwise exclusively fungal family, and the Trichomonas sequence departs in places from the canonical C- and D-box motifs, suggesting that the prediction may be spurious (Additional file 1). In contrast, both main groups of green plants (Viridiplantae), the Streptophyta (multicellular green plants and some green algae) and the Chlorophyta (green algae) (Figure 2), have good snoRNA coverage, based on both bioinformatic prediction and intensive experimental study of green plant snoRNAs.
Finally, according to the GOLD database the Stramenopiles (Figure 2) have five completed genome projects and one draft. Both main lineages of stramenopiles, the Bacillariophyta and the Oomycetes, have reasonable numbers of snoRNAs predicted by homology to other lineages (9 and 75, respectively), although none has been experimentally validated. Whereas counts of Pfam domains and rRNAs indicate that the snoRNP machinery is present in all known taxa of Archaea and Eukaryota, it surprisingly appears to be absent from the Oomycetes. This apparent absence is more likely due to the relevant protein sequences not yet being included in the public sequence databases than to a bona fide loss of the snoRNP machinery.
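The reasoning applied across these taxa, comparing evidence for the snoRNP protein machinery with the number of annotated snoRNAs, can be summarized as a simple rule of thumb. The sketch below is a hypothetical illustration, not the pipeline used to build Figure 2: the names `TaxonSummary` and `classify` are invented here, the machinery flag is reduced to presence/absence, and the only values used are those stated in the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonSummary:
    """Hypothetical per-taxon summary (not a real database record)."""
    name: str
    machinery_detected: bool  # snoRNP-associated proteins (e.g. fibrillarin, Nop56/58) found
    snorna_count: int         # known or predicted snoRNAs annotated for the taxon


def classify(taxon: TaxonSummary) -> str:
    """Hypothetical rule of thumb mirroring the argument in the text."""
    if taxon.machinery_detected and taxon.snorna_count == 0:
        # Proteins present but no snoRNAs annotated: likely an annotation gap,
        # i.e. a target for snoRNA discovery (e.g. Halobacterium, microsporidia).
        return "candidate for snoRNA discovery"
    if not taxon.machinery_detected and taxon.snorna_count > 0:
        # snoRNAs present but proteins not detected: more plausibly missing
        # protein data than a genuine loss of the machinery (e.g. Oomycetes).
        return "snoRNP proteins probably missing from databases"
    return "consistent"


# Presence/absence and counts as stated in the text; nothing else is assumed.
TAXA = [
    TaxonSummary("Halobacterium", True, 0),
    TaxonSummary("Microsporidia", True, 0),
    TaxonSummary("Bacillariophyta", True, 9),
    TaxonSummary("Oomycetes", False, 75),
]

if __name__ == "__main__":
    for t in TAXA:
        print(f"{t.name}: {classify(t)}")
```

A rule based on counts alone would label Trichomonas vaginalis "consistent" because of its single annotation, even though that hit is weak. A more careful version would therefore also need to take hit scores or E-values into account.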