Double-stranded RNA has been shown to induce gene silencing in diverse eukaryotes and by a variety of pathways. We have examined the taxonomic distribution and the phylogenetic relationship of key components of the RNA interference (RNAi) machinery in members of five eukaryotic supergroups. On the basis of the parsimony principle, our analyses suggest that a relatively complex RNAi machinery was already present in the last common ancestor of eukaryotes and consisted, at a minimum, of one Argonaute-like polypeptide, one Piwi-like protein, one Dicer, and one RNA-dependent RNA polymerase. As proposed before, the ancestral (but non-essential) role of these components may have been in defense responses against genomic parasites such as transposable elements and viruses. From a mechanistic perspective, the RNAi machinery in the eukaryotic ancestor may have been capable of both small-RNA-guided transcript degradation as well as transcriptional repression, most likely through histone modifications. Both roles appear to be widespread among living eukaryotes and this diversification of function could account for the evolutionary conservation of duplicated Argonaute-Piwi proteins. In contrast, additional RNAi-mediated pathways such as RNA-directed DNA methylation, programmed genome rearrangements, meiotic silencing by unpaired DNA, and miRNA-mediated gene regulation may have evolved independently in specific lineages.
RNA-mediated silencing is an evolutionarily conserved mechanism(s) through which double-stranded RNA (dsRNA) induces the inactivation of cognate sequences. The role of dsRNA in triggering repression was initially characterized in Caenorhabditis elegans and termed RNA interference (Fire et al. 1998). However, silencing phenomena had already been described in a number of eukaryotes and the connection to dsRNA helped to unify several, apparently disparate, processes involving post-transcriptional RNA degradation, transcriptional gene silencing via heterochromatin formation and/or DNA methylation, DNA elimination, or meiotic silencing by unpaired DNA (Baulcombe 2004; Matzke and Birchler 2005; Meister and Tuschl 2004; Ullu et al. 2004; Zamore and Haley 2005). In plants and animals, the RNAi machinery is also involved in the production of microRNAs (miRNAs), by the processing of genome encoded imperfect RNA hairpins, which play a role in developmental regulation (Bartel 2004; Chen 2005; Wienholds and Plasterk 2005; Zamore and Haley 2005).
Currently, the most extensively characterized dsRNA-mediated mechanism is targeted mRNA degradation guided by small interfering RNAs (siRNAs). Genetic and biochemical studies from diverse species have revealed that long dsRNAs are processed into siRNAs by an RNaseIII-like endonuclease, named Dicer (Bernstein et al. 2001; Meister and Tuschl 2004; Sontheimer 2005; Tomari and Zamore 2005). siRNAs are then incorporated into a multiprotein complex, the RNA-induced silencing complex (RISC) (Pham et al. 2004; Tomari et al. 2004). Members of the Argonaute-Piwi (Ago-Piwi) family of proteins are core components of the RISC and some of these polypeptides function as siRNA-guided endonucleases (Baumberger and Baulcombe 2005; Hammond et al. 2001; Liu et al. 2004a; Meister and Tuschl 2004; Qi et al. 2005; Tomari and Zamore 2005). Recent evidence suggests that a siRNA duplex may be loaded into RISC and then Ago cleaves one of the siRNA strands (the passenger strand) triggering its dissociation from the complex (Matranga et al. 2005; Miyoshi et al. 2005; Rand et al. 2005). Activated RISC then functions as a multiple-turnover enzyme that recognizes and cleaves RNA molecules complementary to the incorporated single-stranded guide siRNA (Meister and Tuschl 2004; Sontheimer 2005; Tomari and Zamore 2005).
Members of the Argonaute-Piwi family fall into two main classes, one named after Arabidopsis thaliana Argonaute and the other after Drosophila melanogaster Piwi (Carmell et al. 2002). These proteins are highly basic, approximately 100-kD in size, and contain two conserved motifs, the PAZ (after Piwi/Argonaute/Zwille) and the PIWI domains (Cerutti et al. 2000; Lingel et al. 2004; Ma et al. 2004; Song et al. 2004; Yuan et al. 2005). A number of experiments have implicated certain argonautes, such as human Ago2, as the catalytic unit (‘slicer’) of the RISC (Liu et al. 2004a; Okamura et al. 2004; Rivas et al. 2005). However, several other Ago paralogs are not endonucleolytically active (Liu et al. 2004a; Meister et al. 2004; Rivas et al. 2005). In fact, in several species, there is evidence for functional specialization of Ago-Piwi proteins (Grishok et al. 2001; Lee et al. 2003; Matzke and Birchler 2005; Okamura et al. 2004).
RNA-dependent RNA polymerases (RdRPs) also play an important role in RNAi in some eukaryotes (Wassenegger and Krczal 2006). For instance, putative homologs of a tomato RdRP are required for post-transcriptional gene silencing (PTGS) triggered by sense transgenes in A. thaliana, for quelling (a phenomenon similar to PTGS) and for meiotic silencing by unpaired DNA in Neurospora crassa, as well as for RNAi in C. elegans and Dictyostelium discoideum (Baulcombe 2004; Cogoni and Macino 2000; Martens et al. 2002; Shiu et al. 2001; Sijen et al. 2001; Wassenegger and Krczal 2006). It has been proposed that RdRPs generate dsRNA from single-stranded transcripts either by de novo, primer independent second-strand synthesis (utilizing as template ‘aberrant’ RNAs, presumably lacking normal processing signals such as a 5′ cap or a polyA tail) or by using siRNAs as primers to synthesize RNA complementary to the target mRNA (Baulcombe 2004; Sijen et al. 2001; Wassenegger and Krczal 2006). Thus, RdRP activity may initiate RNAi (by producing the trigger dsRNA) or dramatically enhance the RNAi response (by amplifying the amount of dsRNA) (Baulcombe 2004). However, dsRNA-induced RNAi can occur in the absence of RdRP activity (Schwarz et al. 2002; Stein et al. 2003).
The biochemical and genetic studies briefly summarized above have led to the identification of three key components of the RNAi machinery, namely Dicer, Argonaute-Piwi, and RdRP. However, the taxonomic distribution of these proteins and their ancestry have not been explored in detail. In this review we have examined the phylogenetic relationship of Ago-Piwi, Dicer-like, and RdRP proteins present in members of five eukaryotic supergroups. On the basis of the parsimony principle we have attempted to infer the composition and function(s) of the RNAi machinery in the last common ancestor of eukaryotes. We have also assessed putatively derived RNAi functions that might have evolved in specific lineages. Our findings provide a framework for predicting the existence of RNAi-related mechanisms in uncharacterized eukaryotes.
Based on morphological, biochemical, and molecular phylogenetic approaches, eukaryotes have recently been classified into six supergroups: the Opisthokonta, including animals and fungi; the Amoebozoa, including most traditional amoebae and slime moulds; the Excavata, grouping diplomonads, several genera of heterotrophic flagellates, and possibly the Euglenozoa; the Rhizaria, including the Foraminifera and the Cercozoa; the Archaeplastida, grouping red algae, green algae, and plants; and the Chromalveolata, including dinoflagellates, apicomplexan parasites, and the Stramenopiles (brown algae, diatoms, and many zoosporic fungi) (Adl et al. 2005; Medina 2005). In order to evaluate the phyletic distribution of the RNAi machinery components, we have surveyed 25 complete or near-complete genomes that belong to five eukaryotic supergroups (with only Rhizaria remaining unsampled). Proteins containing conserved Argonaute-Piwi, Dicer, or RdRP domains were identified by either BLAST or PSI-BLAST searches of protein and/or translated genomic DNA databases. Since several of the examined genomes are in draft stage, an important caveat in our analyses is that some proteins may be missing from the databases whereas others may have errors in the predicted gene structure. However, we only considered as potential homologs proteins that exhibited enough sequence similarity to be aligned and used for phylogenetic tree construction.
Argonaute-Piwi, Dicer-like, and RdRP proteins occur in members of all the eukaryotic supergroups examined (Table 1). This widespread taxonomic distribution, as well as the direct demonstration of RNAi related phenomena in most of these organisms (Table 2), suggests that the main components of the RNAi machinery were already present in the last common ancestor of eukaryotes. However, Ago-Piwi, Dicer-like, and RdRP polypeptides (or a subset of these proteins) also appear to have been lost from specific lineages. The RNAi machinery seems to be entirely absent in Saccharomyces cerevisiae (Opisthokonta), Trypanosoma cruzi and Leishmania major (Excavata), Cyanidioschyzon merolae (Archaeplastida), and Plasmodium falciparum (Chromalveolata) (Table 1). For some of these organisms there is also convincing evidence that they are unable to utilize dsRNA to trigger degradation of target RNA (DaRocha et al. 2004; Robinson and Beverley 2003; Ullu et al. 2004). Thus, the RNAi mechanism appears to have been lost independently several times during eukaryotic evolution.
The greatest conservation among the examined polypeptides corresponded to Ago-Piwi proteins, which are clearly identifiable in all species where RNAi-related phenomena have been experimentally demonstrated (Tables 1, 2). Moreover, the dual domain structure of the Ago-Piwi polypeptides, namely a PAZ domain followed by a PIWI domain, has also been well conserved. The only exception among the RNAi-positive organisms listed in Table 1 is Giardia intestinalis, which encodes a protein with a well-defined PIWI domain fused to a highly divergent PAZ domain. However, Giardia protein-encoding genes are notoriously fast-evolving compared with those of most other eukaryotes (Richards and Cavalier-Smith 2005), which might explain the poor conservation of the PAZ domain. Several organisms, including a few archaea and eubacteria, encode proteins with a single PIWI domain recognizable by primary sequence comparisons (Anantharaman et al. 2002; Cerutti et al. 2000; Ullu et al. 2004). The crystal structures of the Pyrococcus furiosus and the Aquifex aeolicus Ago-like polypeptides nevertheless revealed that they also contain somewhat variant PAZ-like domains (Song et al. 2004; Yuan et al. 2005). Yet, since similar proteins are also present in species such as T. cruzi and L. major that are RNAi negative (Ullu et al. 2004) and the Trypanosoma brucei single PIWI polypeptide is not required for RNAi (Durand-Dubief and Bastin 2003), their functional role(s) is uncertain and they were not included in our analyses. Intriguingly, recent findings have indicated that the Archaeoglobus fulgidus Piwi and the A. aeolicus Ago bind ssDNA with greater affinity than ssRNA (Ma et al. 2005; Yuan et al. 2005) and the A. aeolicus protein has been postulated to function as a DNA-guided site-specific endoribonuclease (Yuan et al. 2005).
Dicer-like proteins are relatively well conserved among organisms that have retained the RNAi pathway (Table 1), albeit with significant variability in their primary sequence and domain organization (Fig. 1). The Dicer enzymes initially characterized in D. melanogaster and humans (Bernstein et al. 2001; Zhang et al. 2004) are multidomain proteins consisting of a SFII RNA helicase domain, a domain of unknown function (DUF283), a PAZ domain, two RNaseIII catalytic domains (RNaseIIIa and RNaseIIIb), and a dsRNA binding domain (DSRM) (Bernstein et al. 2001; Meister and Tuschl 2004). This overall organization is maintained in Dicer-like proteins from animals (H. sapiens), fungi (Schizosaccharomyces pombe), and plants (A. thaliana) (Fig. 1) with the greatest variability associated with the presence or absence of the DSRM and/or of the PAZ domains. In Tetrahymena thermophila there are three Dicer-like sequences: Dcr2, which contains only the helicase and the two RNaseIII domains (Fig. 1), and Dcr1 and Dcl1, which are more divergent (Lee and Collins 2006; Mochizuki and Gorovsky 2005). Indeed, Dcl1 only includes the RNaseIII motifs and a C-terminal DSRM. In the incomplete genome of Phytophthora sojae the only recognizable Dicer-like sequence consists of a poorly conserved helicase domain, the DUF283 motif, and the two RNaseIII domains (Fig. 1). In D. discoideum only the RNaseIII domains have been conserved, in association with a DSRM fused at the N-terminal end of the two Dicer-like proteins (Fig. 1). Interestingly, in this organism the RdRPs now contain SFII RNA helicase motifs homologous to that of Dicer (Martens et al. 2002). In G. intestinalis the sole Dicer-like protein is characterized by a PAZ domain and the RNaseIII motifs (Fig. 1), yet this polypeptide has recently been shown to work as a fully functional enzyme (MacRae et al. 2006).
Thus, the only Dicer domains that appear to be predominantly conserved as a fusion across the eukaryotic spectrum are the two RNaseIII catalytic motifs (Fig. 1). Intriguingly, both T. brucei and Entamoeba histolytica, species with demonstrated RNAi (Djikeng et al. 2001; Kaur and Lohia 2004), appear to encode only proteins with single RNaseIII domains (Abed and Ankri 2005). These enzymes may perhaps act as dimers to assume a catalytic core similar to that of Dicer (Zhang et al. 2004; MacRae et al. 2006). Alternatively, the T. brucei and E. histolytica genomes might be incomplete or the corresponding Dicer-like sequences might be so divergent that they are no longer recognizable by primary sequence searches.
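The domain comparison above can be made explicit with a small sketch. The Python snippet below encodes the domain compositions described in the text for several Dicer-like proteins and intersects them. The dictionary keys, the function names, and the simplified labels (for example, counting the poorly conserved P. sojae helicase as present) are illustrative assumptions rather than part of the original analysis, and domain order is ignored.

```python
from collections import Counter

# Domain content of Dicer-like proteins as described in the text (Fig. 1).
# Labels are simplified; a divergent domain is still counted as present.
DICER_DOMAINS = {
    "canonical (D. melanogaster / human)": {
        "helicase", "DUF283", "PAZ", "RNaseIIIa", "RNaseIIIb", "DSRM",
    },
    "T. thermophila Dcr2": {"helicase", "RNaseIIIa", "RNaseIIIb"},
    "T. thermophila Dcl1": {"RNaseIIIa", "RNaseIIIb", "DSRM"},
    "P. sojae": {"helicase", "DUF283", "RNaseIIIa", "RNaseIIIb"},
    "D. discoideum": {"DSRM", "RNaseIIIa", "RNaseIIIb"},
    "G. intestinalis": {"PAZ", "RNaseIIIa", "RNaseIIIb"},
}


def conserved_domains(architectures):
    """Return the domains shared by every protein in the mapping."""
    return set.intersection(*architectures.values())


def domain_frequency(architectures):
    """Count in how many proteins each domain occurs."""
    return Counter(d for domains in architectures.values() for d in domains)


if __name__ == "__main__":
    print("Shared by all:", sorted(conserved_domains(DICER_DOMAINS)))
    for domain, n in domain_frequency(DICER_DOMAINS).most_common():
        print(f"{domain}: {n} of {len(DICER_DOMAINS)}")
```

With these six entries, only the two RNaseIII domains are shared by all proteins, in line with the conclusion drawn in the text. The single-RNaseIII proteins of T. brucei and E. histolytica are deliberately left out because their relationship to Dicer is uncertain.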
RdRPs are not as widely distributed among eukaryotes as Ago-Piwi and Dicer-like proteins (Table 1). Even though RNAi occurs in animals (Schwarz et al. 2002; Stein et al. 2003), Chlamydomonas reinhardtii (a green alga) (Rohr et al. 2004; Schroda 2006), and T. brucei (Ullu et al. 2004), RdRPs were not detected in their genomes (Table 1), with the exception of C. elegans and Branchiostoma floridae (AAQ10792). In Aspergillus nidulans it has been experimentally demonstrated that RNAi induced by an inverted repeat transgene does not require either of the two RdRPs encoded in the genome (Hammond and Keller 2005). This is consistent with the postulated ancillary roles of RdRPs in generating the dsRNA trigger and/or in amplifying siRNA levels (Baulcombe 2004; Sijen et al. 2001; Wassenegger and Krczal 2006). If enough dsRNA is produced by other means, RdRPs might not be needed for the degradative RNAi pathway, which would explain their more widespread loss from specific eukaryotic lineages. Conversely, processes that depend on RdRPs such as transitive RNAi (i.e., the spreading of silencing to regions outside that initially targeted by dsRNA) and the generation of transacting siRNAs will be limited to eukaryotes encoding these enzymes in their genomes (Allen et al. 2005; Baulcombe 2004; Sijen et al. 2001; Yoshikawa et al. 2005).
The phylogeny of eukaryotic organisms has been difficult to resolve. The relationship among the supergroups, the potential lack of monophyly of Chromalveolata and Excavata, and the placement of the root of the tree have remained contentious (Adl et al. 2005; Simpson and Roger 2004). Earlier phylogenetic analyses indicated that diplomonads (including G. intestinalis) are among the deepest divergences in the eukaryotic lineage and the tree was ‘rooted’ with these mitochondrion-lacking unicellular eukaryotes (Sogin et al. 1989; Sogin 1991). Recent studies have suggested that this rooting may have been patterned by methodological artefacts (Philippe et al. 2000). Arisue et al. (2004) have argued that two possibilities seem to exist for the root of the eukaryotic tree, namely the branch leading to Opisthokonta (animals and fungi) or that leading to the common ancestor of Diplomonadida/Parabasalia (within Excavata). Moreover, combined protein phylogenies strongly suggest that Opisthokonta are most closely related to Amoebozoa (Richards and Cavalier-Smith 2005; Simpson and Roger 2004). This grouping has been called ‘unikonts’ (ancestrally monociliate). Based on similar evidence all the other major groups of eukaryotes (Archaeplastida, Chromalveolata, Rhizaria, and Excavata) might be related to each other and have been called ‘bikonts’ (ancestrally biciliate) (Richards and Cavalier-Smith 2005; Simpson and Roger 2004). Thus, an emerging hypothesis is that the earliest evolutionary divergence within eukaryotes (and the root of the eukaryotic tree) falls between unikonts and bikonts (Richards and Cavalier-Smith 2005; Simpson and Roger 2004; Stechmann and Cavalier-Smith 2003). If true, then comparisons between animals, fungi, and plants (which would include organisms derived from each branch of the earliest divergence) would be largely sufficient to diagnose the generalities of the ancestral RNAi machinery. 
However, because of the detected pattern of lineage specific losses of RNAi components (see above) and the possibility of an alternative eukaryotic rooting in the branch leading to Diplomonadida/Parabasalia, we have included proteins from species belonging to each of five eukaryotic supergroups (whenever possible) in our phylogenetic analyses.
The most likely point of origin of the Argonaute-Piwi, Dicer-like, and RdRP protein families was inferred from the patterns of phyletic distribution and phylogenetic tree topology and on the basis of the parsimony principle (Anantharaman et al. 2002). If a particular protein family is widely represented in all eukaryotic supergroups, the most parsimonious scenario points to its presence in the last common ancestor of eukaryotes. This conclusion is reinforced when the phylogenetic tree for the family in question conforms to the topology of the eukaryotic tree. However, none of the best phylogenetic trees for Ago-Piwi, Dicer-like, or RdRP polypeptides strictly coincided with the consensus tree of Eukaryota (Medina 2005) nor could reconstruct the monophyly of some higher-order eukaryotic groups, as previously reported for individual gene/protein trees (Arisue et al. 2004; Philippe et al. 2004). Besides the usual problems of weakness of phylogenetic signal, lateral gene transfers, hidden paralogy, and tree reconstruction artefacts (Philippe et al. 2004), incorrectly predicted protein models (since some sequences were extracted from draft genomes) may have contributed to greater than usual divergences making the relationship among deep branches difficult to resolve.
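The parsimony reasoning described above can be illustrated with a minimal sketch. The source invokes the parsimony principle without specifying an algorithm; the snippet below assumes a Dollo-style model (a protein family arises once and can afterwards only be lost), which is one possible reading. The tree topology follows the unikont/bikont rooting hypothesis discussed earlier, the tip labels are simplified ("ciliates" stands for T. thermophila and P. tetraurelia), Excavata are omitted because their Ago-Piwi sequences could not be reliably assigned to either class, and all function names are hypothetical.

```python
# Toy rooted tree as nested tuples; leaves are strings.
# Topology is an assumption (unikont/bikont rooting), not a result of the source.
TREE = (
    (("animals", "fungi"), "Amoebozoa"),                   # unikonts
    ("green algae and plants", ("P. sojae", "ciliates")),  # sampled bikonts
)

# Lineages reported in the text to encode each class of Ago-Piwi protein.
CARRIERS = {
    "Argonaute-like": {"animals", "fungi", "green algae and plants", "P. sojae"},
    "Piwi-like": {"animals", "Amoebozoa", "ciliates"},
}


def leaves(node):
    """Return the set of tip labels below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def origin(node, carriers):
    """Smallest clade containing every carrier (single gain under Dollo)."""
    if not isinstance(node, str):
        for child in node:
            if carriers <= leaves(child):
                return origin(child, carriers)
    return node


def count_losses(node, carriers):
    """Minimum number of losses below a node where the family was present."""
    if not (leaves(node) & carriers):
        return 1  # the whole clade lacks the family: one loss on its stem
    if isinstance(node, str):
        return 0
    return sum(count_losses(child, carriers) for child in node)


def ancestral_summary(tree=TREE, carriers=CARRIERS):
    """For each family, report whether it maps to the root and how many losses follow."""
    all_tips = leaves(tree)
    summary = {}
    for family, tips in carriers.items():
        clade = origin(tree, tips)
        summary[family] = {
            "at_root": leaves(clade) == all_tips,
            "losses": count_losses(clade, tips),
        }
    return summary


if __name__ == "__main__":
    for family, result in ancestral_summary().items():
        print(family, result)
```

On this toy tree both families map to the root, with two subsequent losses for Argonaute-like and three for Piwi-like proteins. The outcome depends on the assumed rooting and on treating independent gains as implausible; an equal-cost model such as Fitch parsimony could leave the root state ambiguous.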
Despite these caveats, the Argonaute-Piwi proteins in present day organisms fell into two relatively well supported, presumably paralogous, groups: the Argonaute-like and the Piwi-like polypeptides (Fig. 2). Fungi (Opisthokonta), green algae and plants (Archaeplastida), and P. sojae (Chromalveolata) appear to encode exclusively Argonaute-like proteins in their genomes. In contrast, Amoebozoa, and T. thermophila and Paramecium tetraurelia (Chromalveolata) seem to encode exclusively Piwi-like proteins. Lastly, animals (Opisthokonta) have representatives of both types of proteins whereas the Excavata sequences (G. intestinalis and T. brucei) could not be reliably resolved in terms of their grouping. A parsimonious interpretation of these data suggests that the last common ancestor of eukaryotes contained both Argonaute-like and Piwi-like proteins and that specific lineages independently lost either one or the other. Only animals appear to have retained both classes of proteins, although this conclusion may need to be reexamined as more sequences from diverse taxonomic groups become available. Interestingly, the Argonaute-Piwi duplication may have preceded the formation of a multidomain, PAZ-containing Dicer protein (see below) since in phylogenetic analyses the PAZ domains of Piwi-like and Dicer-like proteins cluster together whereas the PAZ domains of Argonaute-like proteins behave as an outgroup (data not shown). Thus, domain shuffling from an ancestral Piwi-like gene might have contributed the PAZ motif to Dicer.
Ago-Piwi proteins have also undergone a marked degree of expansion in certain eukaryotic lineages (Fig. 2), most prominently plants and metazoans, perhaps associated with more extensive diversification of function. In plants, several duplications of Argonaute-like proteins appear to have occurred both before and after the divergence of monocots and dicots, represented by Oryza sativa and A. thaliana, respectively (Fig. 2). Extensive expansion of Argonaute-Piwi proteins has also occurred in the animal lineage. Moreover, in certain species such as C. elegans and D. melanogaster some of these polypeptides are currently so divergent that they do not reconstruct the monophyly of animals (Fig. 2). At least one group of C. elegans Argonaute-like proteins (including PPW1 and PPW2) behaves as paralogous to all other Ago-like polypeptides in animals, fungi, plants, C. reinhardtii, and P. sojae (Fig. 2).
A phylogenetic tree of Dicer-like proteins, constructed based on the alignment of the dual RNaseIII domains, did not resolve the relationship among most of these proteins (Fig. 3). This is likely a reflection of the lower sequence (and domain structure) conservation of Dicer-like proteins relative to Ago-Piwi and RdRP polypeptides. However, the animal and plant Dicer-like sequences form a well-supported cluster and appear to be orthologous (Fig. 3). Interestingly, plant Dicer-like sequences underwent significant expansion largely prior to the divergence of monocots and dicots. In contrast, most animals appear to encode a single Dicer sequence, with the exception of insects that contain two. Whereas insect Dcr1 clusters with all other animal Dicers, Dcr2 is much more divergent and forms a paralogous clade (Fig. 3). Intriguingly, insect Ago2 is also much more divergent than Ago1 and does not cluster with most other animal Argonaute-like proteins (the Ago1 clade) (Fig. 2). It remains unclear whether this reflects an ancient RNAi pathway duplication in the animal lineage that was retained only in insects and/or the fast evolution of certain duplicated sequences within the insect lineage, although a recent report suggests that D. melanogaster Dcr2 and Ago2 are among the fastest evolving genes in this organism, perhaps as a result of a coevolutionary ‘arms race’ with viral pathogens (Obbard et al. 2006).
A monophyletic origin of animal and plant Dicers is also supported by the comparable domain organization of their sequences (Fig. 1). Moreover, if the eukaryotic tree is truly rooted between unikonts and bikonts, as already discussed, one of the ancestral forms of Dicer may have been similar to the multidomain protein now present in animals and plants. Domain deletion/truncation, domain fusion, as well as sequence divergence could explain the more variable Dicer-like proteins found in other living organisms (which contain various combinations of some of the putative ancestral motifs) (Fig. 1). However, a polyphyletic origin of Dicer-like sequences, and, potentially, the existence of more than one Dicer form in the eukaryotic ancestor, cannot be statistically ruled out (Fig. 3). Interestingly, Drosha, another type of RNaseIII enzyme involved in RNAi via the processing of miRNA precursors in animals (Bartel 2004; Wienholds and Plasterk 2005; Zamore and Haley 2005), is absent from the genome of all other eukaryotes examined (data not shown). Drosha polypeptides form an outgroup with respect to plant and animal Dicers and they seem to be somewhat better related (albeit weakly) to eubacterial RNaseIII enzymes (Fig. 3). Thus, this type of protein may have evolved, independently from Dicer, in the animal lineage.
RdRPs are not as widely distributed among eukaryotes as Ago-Piwi and Dicer-like sequences but a phylogenetic tree, constructed by aligning the RdRP domains, supports the monophyletic origin of the proteins found in C. elegans, fungi, Amoebozoa, P. tetraurelia, and a subset of plant RdRPs (Fig. 4). However, the evolutionary relationships among some of these polypeptides as well as the grouping of the G. intestinalis RdRP are not well defined. Besides the already discussed caveats associated with our analyses, the topology of the RdRP tree might also be affected by more prevalent lineage-specific losses of some of these proteins. For instance, there is experimental evidence that A. nidulans has lost, via DNA sequence degeneration, the putative ortholog of the N. crassa RdRP Qde-1 (Hammond and Keller 2005). Intriguingly, plants also contain a subset of RdRPs (including A. thaliana Rdr3, Rdr4, and Rdr5) that behaves as an outgroup to all other RdRPs (Fig. 4). Thus, a parsimonious interpretation of the data is consistent with the existence of at least one RdRP in the common eukaryotic ancestor, but the origin of the subset of more divergent plant RdRPs remains uncertain.
RNAi-related phenomena occur in all eukaryotic supergroups (with the possible exception of Rhizaria that remains to be sampled) (Table 1) and phylogenetic analyses suggest that the main components of the RNAi machinery can be traced back to the common ancestor of eukaryotes (Figs. 2, 3, 4). Moreover, several motifs typical of RNAi effectors such as the RdRP (which differs from viral RdRPs), PAZ, and DUF283 domains are restricted to the eukaryotic lineage and have been postulated to be eukaryotic innovations (Anantharaman et al. 2002; Wassenegger and Krczal 2006). However, a somewhat divergent PAZ-like structural fold, lacking primary sequence conservation with the eukaryotic PAZ domain, also occurs in some archaeal and eubacterial proteins (Song et al. 2004; Yuan et al. 2005). The remaining domains in Argonaute-Piwi and Dicer-like proteins such as the PIWI, the SFII RNA helicase, the RNaseIII, and the DSRM motifs appear to have originated in prokaryotic lineages prior to the divergence of eukaryotes (Anantharaman et al. 2002). Thus, the innovative evolution of certain domains as well as the fusion of diverse functional motifs in order to generate Argonaute-Piwi, Dicer-like, and RdRP proteins might have occurred ancestrally in the eukaryotic lineage.
Interestingly, the last common ancestor of living eukaryotes appears to have been a ‘complete’ eukaryotic cell (Richards and Cavalier-Smith 2005; Simpson and Roger 2004). It had a nucleus, endoplasmic reticulum, and Golgi apparatus, and underwent mitosis and meiosis (Ramesh et al. 2005; Simpson and Roger 2004). It also had an endosymbiont-derived mitochondrion, a cilium and centriole, and a complex eukaryotic cytoskeleton (Richards and Cavalier-Smith 2005). The only major eukaryotic features that appear to be of later origin are plastids (Simpson and Roger 2004). Given this complexity it is perhaps not surprising that the ancestral RNAi machinery may have already been well developed. Our analyses (see above) suggest that the last common ancestor of eukaryotes had, at a minimum, one Argonaute-like polypeptide, one Piwi-like protein, one Dicer, and one RdRP. Further, since in the case of Dicer the reconstructed phylogeny did not resolve the relationship among most of these polypeptides (Fig. 3), the existence of Dicer-like paralogs in the last common ancestor of eukaryotes cannot be ruled out.
The function(s) of this ancestral RNAi machinery is unknown. However, several considerations may provide a framework for hypotheses about its possible role(s). First, the ancestral RNAi function(s) very likely was not essential for life since the machinery appears to have been lost from a variety of taxonomically divergent eukaryotes, such as S. cerevisiae, T. cruzi, L. major, C. merolae, and P. falciparum (Table 1). The alternative explanation, that in these species an essential RNAi role(s) was compensated by another mechanism, would require the independent evolution of the latter multiple times. Moreover, in organisms that contain a single Dicer gene such as S. pombe and vertebrates, Dicer-null mutants are RNAi-defective but viable at the cellular level (Giraldez et al. 2005; Kanellopoulou et al. 2005; Martienssen et al. 2005; Murchison et al. 2005; Volpe et al. 2002). These mutants, however, commonly show reactivation of transposons and/or repetitive sequences, deficient (hetero)chromatin formation, and/or abnormal chromosome segregation, and vertebrate germ cells fail to differentiate (Fukagawa et al. 2004; Kanellopoulou et al. 2005; Martienssen et al. 2005; Murchison et al. 2005). Second, if a particular function is now widely represented in all examined eukaryotic supergroups, the most parsimonious scenario would suggest that it was already operative in the last common ancestor of eukaryotes. Interestingly, two RNA-mediated processes appear to be widespread among living eukaryotes: the post-transcriptional degradation of cognate RNAs and the transcriptional repression of homologous DNA sequences (Table 2). Third, the existence of duplicated Argonaute-Piwi proteins in the last common ancestor of eukaryotes (Fig. 2) suggests some degree of functional diversification tracing back to the ancestral RNAi machinery (since completely redundant genes are unlikely to be evolutionarily conserved; Ohno 1970; Force et al. 1999; Lynch and Conery 2000; Moore and Purugganan 2005; Presgraves 2004).
In present day eukaryotes both Argonaute-like proteins and Piwi-like proteins have been implicated in a variety of RNAi-related phenomena and several subfamily members appear to have distinct functions (Grishok et al. 2001; Lee et al. 2003; Okamura et al. 2004). For instance, a Piwi-like protein, Twi1, is required for the conjugation-induced accumulation of small RNAs involved in programmed DNA elimination in T. thermophila (Lee and Collins 2006; Liu et al. 2004b). The D. melanogaster Piwi-like genes Aubergine and Piwi have been implicated in both transcriptional gene silencing, partly associated with histone H3 lysine 9 (H3K9) methylation, and post-transcriptional gene silencing (Aravin et al. 2004; Kavi et al. 2005; Pal-Bhadra et al. 2004). Members of the Argonaute-like class of proteins have been identified as key components of the RISC (Hammond et al. 2001; Rivas et al. 2005; Zamore and Haley 2005). In humans, hAgo2 was shown to be responsible for target RNA cleavage whereas other Argonaute subfamily members (hAgo1, hAgo3, and hAgo4) can associate with both siRNAs and miRNAs but do not mediate cleavage (Liu et al. 2004a; Meister et al. 2004; Rivas et al. 2005). In D. melanogaster, both Ago2 and Ago1 have slicer functions but Ago2 is predominantly involved in siRNA-directed target RNA cleavage whereas Ago1 is necessary for miRNA-directed target RNA cleavage (Miyoshi et al. 2005; Okamura et al. 2004). In plants, specific members of the Argonaute subfamily also show distinct activities. Arabidopsis Ago4 has been implicated in DNA and histone methylation whereas Ago1 is an RNA slicer and much more pleiotropic in its functions (Baumberger and Baulcombe 2005; Matzke and Birchler 2005; Qi et al. 2005; Zilberman et al. 2003). Interestingly, in organisms that encode a single Ago-Piwi protein, such as S. pombe and T. brucei, its mutation results in defects in both transcriptional and post-transcriptional silencing (Shi et al. 2004; Sigova et al. 2004).
Given the above constraints, it seems reasonable to hypothesize, as previously suggested, an ancestral (but non-essential) role of the RNAi machinery in defense responses against genomic parasites such as transposable elements and viruses (Baulcombe 2004; Buchon and Vaury 2006; Li and Ding 2005; Matzke and Birchler 2005; Plasterk 2002; Waterhouse et al. 2001). This function has been widely conserved throughout the eukaryotic spectrum (Table 1). Moreover, barring a sampling bias given the relatively small number of taxonomically diverse eukaryotic genomes currently available, all known species that have lost the RNAi machinery are unicellular and possess relatively small genomes. These organisms may be affected by a limited number of genomic parasites and likely have alternative means to control them. Indeed, even in RNAi-positive eukaryotes, partly redundant, RNAi-independent pathways are also involved in the silencing of transposons and other repetitive sequences (Chicas et al. 2004; Jeong et al. 2002; Kuhlmann et al. 2005; Lippman et al. 2003; Lippman and Martienssen 2004; Martienssen et al. 2005; Robert et al. 2005; Tran et al. 2005; van Dijk et al. 2006; Yamada et al. 2005). From a mechanistic standpoint, the ancestral RNAi machinery may have been capable of both siRNA-guided transcript degradation as well as siRNA-guided transcriptional repression of homologous sequences. Again, both roles appear to be widespread among living eukaryotes (Table 2) and this diversification of function could account for the ancestral conservation of duplicated Argonaute-Piwi proteins. It is tempting to speculate that one Ago-Piwi protein might have been predominantly located in the nucleus, as D. melanogaster Piwi (Cox et al. 2000), and perhaps involved in transcriptional silencing whereas another Ago-Piwi protein might have been preferentially located in the cytoplasm, as D. melanogaster Ago2 (Findley et al. 2003; Rehwinkel et al. 2005), and perhaps involved in post-transcriptional silencing. Further roles of the ancestral RNAi machinery are certainly possible but many known RNA-mediated silencing processes show a limited phyletic distribution in present day eukaryotes (Table 2) and may have evolved independently in specific lineages (see below).
In Table 2 we have examined the taxonomic distribution of six, experimentally supported, RNA-mediated silencing pathways: dsRNA-induced RNAi, RNAi-mediated (hetero)chromatin formation, RNA-directed DNA methylation, programmed genome rearrangements (DNA elimination), meiotic silencing by unpaired DNA, and miRNA-mediated gene regulation. As discussed above, (degradative) dsRNA-induced RNAi appears to be widespread (Table 2) and likely one of the ancestral functions of the RNAi machinery. However, the sources of long dsRNA are quite variable resulting, in different species, in the silencing of diverse sequences from genomic parasites and repetitive DNA to specific genes (Baulcombe 2004; Chicas et al. 2004; Kavi et al. 2005; Matzke and Birchler 2005; Nakayashiki 2005; Yoshikawa et al. 2005; Zamore and Haley 2005).
The proposed ancestral role of the RNAi machinery in transcriptional gene silencing could have involved siRNA-mediated targeting of chromatin modifications and/or cytosine DNA methylation. RNAi-dependent transcriptional silencing has been demonstrated to entail histone modifications, such as H3K9 methylation, in eukaryotes belonging to at least three different supergroups (Table 2). Moreover, siRNA-triggered transcriptional repression occurs in organisms that lack cytosine DNA methylation such as S. pombe and C. elegans (Grishok et al. 2005; Martienssen et al. 2005; Ponger and Li 2005; Robert et al. 2005; Volpe et al. 2002) and in organisms with very limited DNA methylation such as T. brucei and D. melanogaster (Kavi et al. 2005; Pal-Bhadra et al. 2004; Ponger and Li 2005; Shi et al. 2004; Ullu et al. 2004). In contrast, RNA-directed DNA methylation has, thus far, only been demonstrated in plants and mammals (Kawasaki and Taira 2004; Matzke and Birchler 2005; Morris et al. 2004) and the role of DNA methylation in this type of gene silencing is somewhat debatable in mammals (Ting et al. 2005; Weinberg et al. 2006). Further, several A. thaliana proteins needed for this process, such as the RdRP Rdr2 and subunits of RNA polymerase IV, do not have mammalian counterparts (Chan et al. 2004; Herr et al. 2005; Kanno et al. 2005; Onodera et al. 2005; Xie et al. 2004). Thus, the ancestral RNAi machinery very likely had the capability to target histone modifications, given the widespread phyletic distribution of this function. Conversely, RNA-directed DNA methylation might have arisen independently in specific eukaryotic lineages. Alternatively, if RNA-directed DNA methylation did evolve in the last common ancestor of eukaryotes it appears to have been lost from many lineages and the molecular effectors now differ substantially between plants and mammals.
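The parsimony reasoning applied here is that a function found across several supergroups is explained more simply by its presence in the common ancestor, followed by occasional loss, than by repeated independent gains. It can be illustrated with a minimal sketch of Fitch's small-parsimony algorithm. The tree topology, the leaf names A to E, and the presence/absence states below are hypothetical placeholders rather than data from Table 2, and the sketch is not meant to reproduce the procedure used for the analyses in this paper.

```python
# Minimal sketch of Fitch small parsimony for one binary trait
# (True = function present, False = function absent).
# The tree and the leaf states are hypothetical, for illustration only.

def fitch(tree, states):
    """Return (candidate state set at this node, minimum number of changes below it).

    `tree` is either a leaf name (str) or a 2-tuple (left_subtree, right_subtree).
    `states` maps each leaf name to its observed state.
    """
    if isinstance(tree, str):          # leaf: its state is observed, no change needed
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:                         # children agree: keep the shared state(s)
        return shared, left_cost + right_cost
    # children disagree: either state is possible here, at the price of one change
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical rooted tree with five lineages; only lineage C lacks the function.
tree = ((("A", "B"), "C"), ("D", "E"))
states = {"A": True, "B": True, "C": False, "D": True, "E": True}

root_states, changes = fitch(tree, states)
print(root_states, changes)  # prints: {True} 1
```

In this toy case the most parsimonious reconstruction places the function in the root and requires a single change (a loss in lineage C). Real analyses depend on the assumed tree and on how gains and losses are weighted, which this equal-cost sketch ignores.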
Interestingly, the mechanism(s) of RNAi-mediated (hetero)chromatin formation also appears to have diverged in present day eukaryotes since, for instance, an RdRP has been implicated in this process in S. pombe (Martienssen et al. 2005; Verdel et al. 2004; Volpe et al. 2002) but it occurs in the absence of RdRPs in D. melanogaster and vertebrates (Fukagawa et al. 2004; Kanellopoulou et al. 2005; Kavi et al. 2005; Pal-Bhadra et al. 2004; Ting et al. 2005; Weinberg et al. 2006). Adding to this complexity, cytosine DNA methylation and histone modifications seem to be interconnected in self-reinforcing feedback loops in higher eukaryotes (Fuks 2005), although the role (if any) of the RNAi machinery in this cycle is not clear. Further, in S. pombe and chicken DT40 cells RNAi-mediated (hetero)chromatin formation may now play a critical role in determining chromosome structure and function during mitosis and/or meiosis (Fukagawa et al. 2004; Martienssen et al. 2005; Wong and Choo 2004). Conversely, in other organisms such as N. crassa the RNAi machinery appears to be dispensable for the methylation of both DNA and H3K9 associated with repetitive sequences (Chicas et al. 2004; Freitag et al. 2004), whereas in mouse there might be cell-type-specific differences in the mechanism(s) of (hetero)chromatin formation (Kanellopoulou et al. 2005; Murchison et al. 2005). Indeed, RNAi-independent pathways for (hetero)chromatin formation and DNA methylation appear to exist in several RNAi-positive eukaryotes (Chicas et al. 2004; Freitag et al. 2004; Goll and Bestor 2005; Jia et al. 2004; Kaller et al. 2006; Laayoun and Smith 1995; Yamada et al. 2005). It remains uncertain to what extent this functional diversity was already present in the unicellular eukaryotic ancestor.
RNAi has also been implicated in the programmed excision of excess DNA in ciliated protozoa such as T. thermophila and P. tetraurelia (Garnier et al. 2004; Mochizuki and Gorovsky 2005; Nowacki et al. 2005; Yao and Chao 2005). This process requires small RNAs, termed scan RNAs, and components of the RNAi machinery that direct H3K9 methylation of the chromatin associated with the sequences to be deleted (Garnier et al. 2004; Liu et al. 2004b; Mochizuki and Gorovsky 2005; Yao and Chao 2005). Many of the eliminated sequences appear to be derived from transposons (Lee and Collins 2006; Yao and Chao 2005) and RNA-mediated DNA elimination may have evolved as an extension of the role of the RNAi machinery in the transcriptional silencing of transposons and other repetitive sequences. Interestingly, DNA diminution phenomena have also been observed in Ascaris worms and in some species of crustaceans and fish, although it is not known whether these processes are RNAi mediated. Based on this phyletic pattern, Yao and Chao (2005) have recently proposed that programmed genome rearrangements may have arisen through the independent evolution, in some eukaryotic lineages, of a final (as yet uncharacterized) RNAi step: elimination of the (hetero)chromatin induced by small RNAs.
In N. crassa, as a zygotic cell undergoes meiosis (which involves pairing of homologous chromosomes), the presence of an unpaired copy of a gene triggers silencing of all homologous sequences in the genome. This phenomenon has been termed meiotic silencing by unpaired DNA (MSUD) and shown to require an RdRP (Sad1) and an Argonaute-like protein (Lee et al. 2003; Shiu et al. 2001). MSUD also requires RNA production from the unpaired DNA sequence, and these transcripts are presumably used as a template by Sad1 to synthesize dsRNA that enters the degradative RNAi pathway (Lee et al. 2004; Matzke and Birchler 2005; Nakayashiki 2005). Thus, even though MSUD originates in the nucleus, it ultimately seems to be a post-transcriptional process that does not involve detectable chromatin alterations at the target locus (Matzke and Birchler 2005; Nakayashiki 2005). MSUD-like phenomena have also been observed in mouse and C. elegans (Maine et al. 2005; Turner et al. 2005). However, both of these processes involve chromatin modifications and transcriptional repression of the unpaired loci. Moreover, (hetero)chromatin formation on unpaired DNA in C. elegans requires the RdRP Ego1 but occurs in the absence of several other RNAi pathway components (Maine et al. 2005). Thus, given its more limited taxonomic distribution and the mechanistic differences in various species, the silencing of unpaired DNA during meiosis is likely a derived, more recently evolved function of RNAi.
The RNAi machinery also plays an important role in gene regulation via microRNAs. However, miRNAs have, thus far, only been identified in multicellular plants and animals (Table 2). They appear to be absent from several unicellular eukaryotes such as S. pombe and T. brucei, where extensive libraries of small RNAs have been sequenced (Djikeng et al. 2001; Reinhart and Bartel 2002), and no miRNA-directed silencing pathway has been documented in fungi (Nakayashiki 2005). In contrast, miRNAs are essential for the development of animals and plants (Bartel 2004; Chen 2005; Kidner and Martienssen 2005; Wienholds and Plasterk 2005). For instance, Dicer-deficient vertebrate germ cells are viable but they fail to differentiate (Giraldez et al. 2005; Kanellopoulou et al. 2005; Murchison et al. 2005). Moreover, Dicer is required for morphogenesis (but not for cell fate specification) during zebrafish embryogenesis, and the absence of miRNAs is responsible, at least in part, for this phenotype (Giraldez et al. 2005). Similarly, null alleles of Dicer-like1, necessary for the generation of mature miRNAs, result in embryo lethality in A. thaliana (Chen 2005; Kidner and Martienssen 2005; Ray et al. 1996).
Despite these similarities, plant and animal miRNA pathways vary in multiple aspects (Bartel 2004; Llave et al. 2002; Reinhart et al. 2002). In animals, miRNAs are initially transcribed into long precursor transcripts that are processed to mature miRNAs in a series of steps involving two RNaseIII-like enzymes, Drosha and Dicer (Bartel 2004; Wienholds and Plasterk 2005). Plants lack a Drosha homolog and the production of miRNAs from precursor RNAs appears to be carried out by a single Dicer-like protein (Chen 2005; Kurihara and Watanabe 2004; Millar and Waterhouse 2005). Plant miRNAs are also methylated on the ribose of the last nucleotide, a modification presumably involved in protecting miRNAs from 3′ end uridylation and degradation, whereas animal miRNAs do not appear to be modified (Chen 2005; Li et al. 2005; Yu et al. 2005). Mechanistically, most animal miRNAs are only partly complementary to their targets and mediate silencing primarily by translational repression, although localization to the processing bodies may also affect RNA stability (Humphreys et al. 2005; Liu et al. 2005b; Pillai et al. 2005; Wienholds and Plasterk 2005; Zamore and Haley 2005). In contrast, most plant miRNAs have near-perfect complementarity to their targets and predominantly trigger mRNA cleavage (Carrington and Ambros 2003; Chen 2005; Schwab et al. 2005). These differences, the lack of conservation of particular miRNA genes between plants and animals, and the absence of miRNAs from many eukaryotes suggest that the miRNA pathway may have evolved independently in the lineages leading to multicellular plants and animals (Bartel 2004; Millar and Waterhouse 2005; Wienholds and Plasterk 2005). Conceivably, the appearance of miRNAs may have played a role in the evolution of organisms with complex body patterns (Bartel 2004; Millar and Waterhouse 2005; Wienholds and Plasterk 2005).
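The contrast between near-perfect complementarity (typical of plant miRNAs) and partial complementarity (typical of animal miRNAs) can be made concrete with a small sketch that scores Watson-Crick pairing between a small RNA and a target site. The sequences below are invented for illustration and do not correspond to any real miRNA or target. The function name `paired_fraction` is likewise hypothetical, and the model is deliberately crude: equal-length, gapless pairing with no G:U wobble pairs.

```python
# Minimal sketch: fraction of Watson-Crick paired positions between a small RNA
# and an equal-length target site. Sequences are hypothetical examples.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def paired_fraction(small_rna, target_site):
    """Both sequences are written 5'->3'; pairing is antiparallel and gapless."""
    if len(small_rna) != len(target_site):
        raise ValueError("sequences must have the same length")
    target_3_to_5 = target_site[::-1]  # read the target 3'->5' to align it antiparallel
    paired = sum(COMPLEMENT[base] == partner
                 for base, partner in zip(small_rna, target_3_to_5))
    return paired / len(small_rna)

small_rna = "UGACCAU"        # hypothetical small RNA
perfect_site = "AUGGUCA"     # its exact reverse complement
mismatched_site = "AUGGUCU"  # same site with one substitution

perfect_score = paired_fraction(small_rna, perfect_site)
mismatched_score = paired_fraction(small_rna, mismatched_site)
print(perfect_score, mismatched_score)
```

Here the first site pairs at every position and the second at six of seven. Which degree of pairing leads to cleavage rather than translational repression is an experimental question that a score of this kind cannot settle.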
Double-stranded RNA has been demonstrated to trigger gene silencing in eukaryotes, linking a variety of apparently dissimilar phenomena with RNA interference in animals. Biochemical and genetic studies have led to the identification of three conserved components of the RNAi machinery, namely Dicer, Argonaute-Piwi, and RdRP. We have analyzed the taxonomic distribution and the phylogenetic relationship of these proteins with the goal of inferring the composition and function(s) of the RNAi machinery in the last common ancestor of eukaryotes. This ancestral RNAi machinery likely consisted of, at least, one Argonaute-like polypeptide, one Piwi-like protein, one Dicer, and one RNA-dependent RNA polymerase. The original role of these components may have been non-essential for unicellular life, although important for defense responses against genomic parasites such as transposable elements and viruses. In fact, the RNAi machinery in the eukaryotic ancestor may have been able to target both transcript degradation and locus-specific histone modifications, resulting in the inactivation of extra-chromosomal and genome-integrated parasitic sequences. Other known RNAi-mediated processes show a limited taxonomic distribution in living eukaryotes and may have evolved more recently in specific lineages. RNA-directed DNA methylation, DNA elimination, and meiotic silencing by unpaired DNA possibly arose as an extension of the role of the RNAi machinery in controlling transposons and retroviruses. In contrast, miRNAs and several kinds of endogenous siRNAs appear to be flexible innovations that allowed the selectivity of RNAi to be harnessed for the regulation of gene expression.
Although the ancestral RNAi machinery seems to have been fairly complex, a considerable degree of functional diversification, as well as integration with RNAi-independent pathways for (hetero)chromatin formation and DNA methylation, seems to have occurred during eukaryotic evolution. Moreover, the great expansion of RNAi components, particularly Ago-Piwi proteins, in present day plants and animals suggests the possibility of further, still unrecognized, pathway specialization. Much remains to be learned about the extent of subfunctionalization, neofunctionalization, and partial redundancy of gene family members. In addition, the regulation of gene expression by miRNAs may be a relatively easy innovation, particularly when triggering transcript cleavage as in plants. As recently proposed (Allen et al. 2004; Smalheiser and Torvik 2005), at least some miRNA genes may arise from inverted duplications of target gene sequences and the initially produced double-stranded foldback transcripts may operate as regulators via the degradative RNAi pathway. Progressive sequence degradation, under selective pressure, may eventually result in the bulged structure typical of miRNA precursors. From this perspective, it seems reasonable to expect that miRNAs will be found in additional eukaryotic lineages, particularly in organisms with complex genomes where new repeats are tolerated and where the regulation conferred by miRNAs provides a selective advantage. Conversely, the RNAi machinery seems to have been entirely lost or extensively simplified in a number of unicellular eukaryotes with small genomes. In the latter, the presence of a recognizable Argonaute-Piwi protein appears to be diagnostic of a functional RNAi pathway, whereas Dicer-like proteins are less conserved and RdRPs may be absent.
We are grateful to members of the Cerutti lab for critical reading of the manuscript. This work was supported by a grant from the National Institutes of Health (GM62915).
Communicated by R. Bock