Despite our growing understanding of the mechanism and function of small RNAs, their evolutionary origins remain obscure. siRNAs are present in all three eukaryotic kingdoms (plants, animals and fungi) and provide antiviral defense in at least plants and animals. Thus, the siRNA machinery was present in the last common ancestor of plants, animals and fungi. In contrast, miRNAs have been found only in land plants, the unicellular green alga Chlamydomonas reinhardtii, and metazoan animals, but not in unicellular choanoflagellates or fungi [199–201]. Deep sequencing experiments have found no miRNAs shared by plants and animals, suggesting that miRNA genes, unlike the miRNA protein machinery, arose independently at least twice in evolution. Finally, piRNAs appear to be the youngest major small RNA family, having been found only in metazoan animals [201]. Whereas Dicer proteins have been identified only in eukaryotes, Argonaute proteins are also found in eubacteria and archaea, raising the prospect that small nucleic acids served as guides for proteins at the very dawn of cellular life. Although this machinery may be ancient, the small RNA guides appear to have diversified over time to acquire specialized roles.
The history of small silencing RNAs makes predicting the future particularly daunting: new discoveries have come at a breakneck pace, and each new small RNA mechanism or function has forced a re-evaluation of cherished models and “facts.” Several longstanding but unanswered questions, however, are worth highlighting. First, does RNAi, in the sense of an siRNA-guided defense against external nucleic acid threats such as viruses, exist in mammals? Second, how do miRNAs repress gene expression? Do several parallel mechanisms coexist in vivo, or will the current, apparently contradictory models for miRNA-directed translational repression and mRNA decay ultimately be unified in a larger mechanistic scheme? Third, can miRNA-regulated genes ever be identified by computation alone, or will computational predictions ultimately give way to high-throughput experimental methods for associating individual miRNA species with their regulatory targets? Will network analysis uncover themes in miRNA-target relationships that reveal why miRNA regulation is so widespread in animals? Fourth, how are piRNAs made? The feed-forward amplification, or “ping-pong,” model is appealing, but it likely underestimates the complexity of piRNA biogenesis. We do not yet know how piRNA 3′ ends are generated, nor do we have a coherent model for how long antisense transcripts from piRNA clusters are fragmented into piRNAs. Finally, will the growing number of examples of small RNAs carrying epigenetic information across generations [3, 202] ultimately force us to re-examine our Mendelian view of inheritance?
Box 2. High-throughput sequencing and small RNA discovery
Much of the credit for the identification of small RNAs rests with advances in high-throughput sequencing. Presently, there are three commercial “high-depth” sequencing systems: Roche's 454 Genome Sequencer FLX, Illumina's Genome Analyzer (based on Solexa technology) and, most recently, Applied Biosystems' SOLiD System. Reference 223 describes how each method works. Although 454 reads are longer (>250 bp, compared with ~35–50 bp for Solexa and SOLiD), the latter two platforms provide 70- to 400-fold greater sequencing depth, and because small silencing RNAs are only ~20–30 nt long, depth rather than read length is what matters for small RNA discovery. All three platforms have been used successfully to identify novel small RNA species and to discover new small RNA classes in mutant plants and animals. Starting from less than 10 µg of total RNA, high-throughput sequencing, together with advances in small RNA library preparation, has revealed the length distribution, sequence identity, terminal structure, sequence and strand biases, isoform prevalence, genomic origins and mode of biogenesis of millions of small RNAs. Initial small RNA sequencing experiments sought simply to identify novel small RNA species and classes; increasingly, high-throughput sequencing is being used to profile small RNA expression across developmental stages and in different tissues and disease states. Like PCR- or microarray-based approaches, profiling by deep sequencing provides quantitative information about small RNA expression, but it can also precisely detect subtle changes in small RNA sequence or length.
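Several of the read-level summaries listed above (length distribution, 5′ nucleotide bias, relative abundance and length isoforms) reduce to simple counting once reads are in hand. The Python sketch below is a minimal illustration, not a published pipeline: it assumes reads have already been adapter-trimmed and written in the DNA alphabet, and the function names `profile_reads` and `three_prime_isoforms`, the 18-nt minimum length and the toy reads are all hypothetical. Abundance is reported as reads per million, one common normalization among several.

```python
from collections import Counter


def profile_reads(reads):
    """Summarize adapter-trimmed small RNA reads (DNA alphabet, non-empty).

    Returns:
      length_counts: Counter mapping read length (nt) to number of reads
      first_nt: fraction of reads starting with each nucleotide (5' bias)
      rpm: reads per million for each distinct sequence
    """
    total = len(reads)
    if total == 0:
        raise ValueError("need at least one read")
    length_counts = Counter(len(r) for r in reads)
    first_nt = {nt: n / total for nt, n in Counter(r[0] for r in reads).items()}
    rpm = {seq: n * 1_000_000 / total for seq, n in Counter(reads).items()}
    return length_counts, first_nt, rpm


def three_prime_isoforms(reads, reference, min_len=18):
    """Count reads sharing the reference 5' end but differing in 3' length.

    A read counts if it is a prefix of the reference (3' truncation) or the
    reference is a prefix of it (3' extension). min_len (a hypothetical
    cutoff) skips very short fragments. Returns a Counter: length -> reads.
    """
    return Counter(
        len(r)
        for r in reads
        if len(r) >= min_len and (reference.startswith(r) or r.startswith(reference))
    )


# Toy, hypothetical reads: two identical 22-mers, a 21-nt 3' truncation of
# that sequence, and one unrelated 22-mer.
toy_reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGT",
    "ACGTACGTACGTACGTACGTAC",
]
lengths, first_nt, rpm = profile_reads(toy_reads)
print(dict(lengths))  # {22: 3, 21: 1}
print(first_nt)  # {'T': 0.75, 'A': 0.25}
print(dict(three_prime_isoforms(toy_reads, toy_reads[0])))  # {22: 2, 21: 1}
```

Exact-sequence counting is what lets deep sequencing resolve the 21- and 22-nt forms of the same small RNA separately, a distinction that a hybridization probe covering both would not make.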
Perhaps the most problematic step in small RNA sequencing is preparing the small RNA library. The most frequently employed cloning protocols require the small RNAs to have a 5′ phosphate and a 3′ hydroxyl group, the hallmarks of Dicer products. This approach identifies small RNAs with the expected termini, but alternative methods must be used to find small RNAs with other terminal structures, such as C. elegans secondary siRNAs, which carry 5′ triphosphates. Additionally, finding every possible small RNA in a cell by exhaustive deep sequencing is a game of diminishing returns. For example, while many miRNAs have been sequenced hundreds of thousands or even a million times, one C. elegans miRNA, apparently expressed in fewer than ten cells of the adult, has so far eluded high-depth sequencing [224].
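The diminishing returns of exhaustive sequencing follow from simple sampling arithmetic. If a small RNA makes up a fraction f of the molecules in a library and each of N reads is an independent draw, the chance that it is never sequenced is (1 − f)^N, so the probability of detecting it at least once is 1 − (1 − f)^N. A miRNA confined to a handful of cells is diluted by RNA from the rest of the animal, so f can be very small. The sketch below uses hypothetical function names and assumes idealized, unbiased sampling, which real libraries do not achieve given the ligation requirements described above; it is meant only to show why each ten-fold rarer species demands roughly ten-fold more reads.

```python
import math


def detection_probability(fraction, depth):
    """Probability that a species making up `fraction` of all library
    molecules appears at least once among `depth` reads, assuming each read
    is an independent, unbiased draw (no ligation or PCR bias)."""
    # P(never sampled) = (1 - f)^N; log1p keeps precision when f is tiny.
    return 1.0 - math.exp(depth * math.log1p(-fraction))


def reads_needed(fraction, probability=0.95):
    """Smallest depth N with detection_probability(fraction, N) >= probability."""
    if not 0.0 < fraction < 1.0 or not 0.0 < probability < 1.0:
        raise ValueError("fraction and probability must lie strictly between 0 and 1")
    # Solve 1 - (1 - f)^N >= p for N, then round up to a whole read.
    return math.ceil(math.log1p(-probability) / math.log1p(-fraction))


# A hypothetical species at one molecule per million: a million reads
# detects it only about 63% of the time.
print(round(detection_probability(1e-6, 1_000_000), 3))  # 0.632

# Each ten-fold drop in abundance needs roughly ten-fold more reads.
for f in (1e-6, 1e-7, 1e-8):
    print(f, reads_needed(f))
```

Because the required depth scales roughly as 1/f, sequencing effort grows without bound as one chases ever-rarer species, which is the sense in which exhaustive discovery by sequencing alone has diminishing returns.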