The phylogeny of eukaryotic organisms has been difficult to resolve. The relationships among the supergroups, the potential lack of monophyly of Chromalveolata and Excavata, and the placement of the root of the tree have remained contentious (
Adl et al. 2005;
Simpson and Roger 2004). Earlier phylogenetic analyses indicated that diplomonads (including
G. intestinalis) are among the deepest divergences in the eukaryotic lineage and the tree was ‘rooted’ with these mitochondrion-lacking unicellular eukaryotes (
Sogin et al. 1989;
Sogin 1991). Recent studies have suggested that this rooting may have resulted from methodological artefacts (
Philippe et al. 2000).
Arisue et al. (2004) have argued that two possibilities seem to exist for the root of the eukaryotic tree, namely the branch leading to Opisthokonta (animals and fungi) or that leading to the common ancestor of Diplomonadida/Parabasalia (within Excavata). Moreover, combined protein phylogenies strongly suggest that Opisthokonta are most closely related to Amoebozoa (
Richards and Cavalier-Smith 2005;
Simpson and Roger 2004). This grouping has been called ‘unikonts’ (ancestrally monociliate). Based on similar evidence all the other major groups of eukaryotes (Archaeplastida, Chromalveolata, Rhizaria, and Excavata) might be related to each other and have been called ‘bikonts’ (ancestrally biciliate) (
Richards and Cavalier-Smith 2005;
Simpson and Roger 2004). Thus, an emerging hypothesis is that the earliest evolutionary divergence within eukaryotes (and the root of the eukaryotic tree) falls between unikonts and bikonts (
Richards and Cavalier-Smith 2005;
Simpson and Roger 2004;
Stechmann and Cavalier-Smith 2003). If true, then comparisons among animals, fungi, and plants (which would include organisms derived from each branch of the earliest divergence) would be largely sufficient to infer the general features of the ancestral RNAi machinery. However, because of the detected pattern of lineage-specific losses of RNAi components (see above) and the possibility of an alternative eukaryotic rooting in the branch leading to Diplomonadida/Parabasalia, we have included proteins from species belonging to each of five eukaryotic supergroups (whenever possible) in our phylogenetic analyses.
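The sampling argument can be illustrated with a short sketch. It is only a schematic, not part of the analyses: each rooting hypothesis discussed above is reduced to the set of groups on one side of the root, and the names `ROOTINGS`, `LINEAGE`, and `spans_root` are hypothetical labels introduced here. The check asks whether a set of sampled organisms includes descendants of both branches of the earliest divergence.

```python
# Each rooting hypothesis from the text is encoded (as a simplifying assumption)
# by the groups lying on ONE side of the root; all other groups lie on the other side.
ROOTINGS = {
    "unikont/bikont": {"Opisthokonta", "Amoebozoa"},
    "Opisthokonta branch": {"Opisthokonta"},
    "Diplomonadida/Parabasalia branch": {"Diplomonadida/Parabasalia"},
}

# Group membership of the sampled organisms, as stated in the text.
LINEAGE = {
    "animals": "Opisthokonta",
    "fungi": "Opisthokonta",
    "plants": "Archaeplastida",
    "G. intestinalis": "Diplomonadida/Parabasalia",
}


def spans_root(sample, one_side):
    """True if the sample has members on both sides of the proposed root."""
    groups = {LINEAGE[taxon] for taxon in sample}
    return bool(groups & one_side) and bool(groups - one_side)


def coverage(sample):
    """Report, for every rooting hypothesis, whether the sample spans the root."""
    return {name: spans_root(sample, side) for name, side in ROOTINGS.items()}


core = ["animals", "fungi", "plants"]
print(coverage(core))
print(coverage(core + ["G. intestinalis"]))
```

Under these assumptions, animals, fungi, and plants span the root for the unikont/bikont and Opisthokonta rootings but not for a root on the Diplomonadida/Parabasalia branch; adding a diplomonad covers all three, which is the motivation for the broader sampling.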
The most likely point of origin of the Argonaute-Piwi, Dicer-like, and RdRP protein families was inferred from the patterns of phyletic distribution and phylogenetic tree topology and on the basis of the parsimony principle (
Anantharaman et al. 2002). If a particular protein family is widely represented in all eukaryotic supergroups, the most parsimonious scenario points to its presence in the last common ancestor of eukaryotes. This conclusion is reinforced when the phylogenetic tree for the family in question conforms to the topology of the eukaryotic tree. However, none of the best phylogenetic trees for Ago-Piwi, Dicer-like, or RdRP polypeptides strictly coincided with the consensus tree of Eukaryota (
Medina 2005), nor could they reconstruct the monophyly of some higher-order eukaryotic groups, as previously reported for individual gene/protein trees (
Arisue et al. 2004;
Philippe et al. 2004). Besides the usual problems of weakness of phylogenetic signal, lateral gene transfers, hidden paralogy, and tree reconstruction artefacts (
Philippe et al. 2004), incorrectly predicted protein models (since some sequences were extracted from draft genomes) may have contributed to greater-than-usual divergences, making the relationships among deep branches difficult to resolve.
Despite these caveats, the Argonaute-Piwi proteins in present-day organisms fell into two relatively well-supported, presumably paralogous, groups: the Argonaute-like and the Piwi-like polypeptides. Fungi (Opisthokonta), green algae and plants (Archaeplastida), and P. sojae (Chromalveolata) appear to encode exclusively Argonaute-like proteins in their genomes. In contrast, Amoebozoa, as well as T. thermophila and Paramecium tetraurelia (Chromalveolata), seem to encode exclusively Piwi-like proteins. Lastly, animals (Opisthokonta) have representatives of both types of proteins, whereas the Excavata sequences (G. intestinalis and T. brucei) could not be reliably assigned to either group. A parsimonious interpretation of these data suggests that the last common ancestor of eukaryotes contained both Argonaute-like and Piwi-like proteins and that specific lineages independently lost either one or the other. Only animals appear to have retained both classes of proteins, although this conclusion may need to be reexamined as more sequences from diverse taxonomic groups become available. Interestingly, the Argonaute-Piwi duplication may have preceded the formation of a multidomain, PAZ-containing Dicer protein (see below), since in phylogenetic analyses the PAZ domains of Piwi-like and Dicer-like proteins cluster together whereas the PAZ domains of Argonaute-like proteins behave as an outgroup (data not shown). Thus, domain shuffling from an ancestral Piwi-like gene might have contributed the PAZ motif to Dicer.
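The parsimony reasoning in this paragraph can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the procedure used for the analyses: the tree `TREE` is a deliberately simplified, hypothetical topology rooted between unikonts and bikonts; the unresolved Excavata sequences are left out; and a Dollo-style rule (a family arises once and can afterwards only be lost) is assumed here for simplicity. The function names are likewise hypothetical.

```python
# Simplified, hypothetical tree (nested tuples), rooted between unikonts and bikonts.
# "ciliates" stands for T. thermophila and P. tetraurelia; Excavata are omitted
# because their sequences could not be reliably assigned to either group.
TREE = (
    (("animals", "fungi"), "Amoebozoa"),           # unikonts
    ("Archaeplastida", ("P. sojae", "ciliates")),  # bikonts sampled here
)

# Phyletic distribution as described in the text.
PRESENCE = {
    "Argonaute-like": {"animals", "fungi", "Archaeplastida", "P. sojae"},
    "Piwi-like": {"animals", "Amoebozoa", "ciliates"},
}


def leaves(node):
    """Set of taxa below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def dollo_origin(node, present):
    """Smallest clade containing every taxon that carries the family."""
    if isinstance(node, str):
        return node
    for child in node:
        if present <= leaves(child):
            return dollo_origin(child, present)
    return node


def count_losses(node, present):
    """Minimum number of losses below a node where the family was present."""
    if not (leaves(node) & present):
        return 1  # the whole clade lacks the family: a single loss on its stem
    if isinstance(node, str):
        return 0
    return sum(count_losses(child, present) for child in node)


def infer(family):
    """Return (present at the root of TREE?, number of independent losses)."""
    present = PRESENCE[family]
    origin = dollo_origin(TREE, present)
    return origin is TREE, count_losses(origin, present)


for family in PRESENCE:
    print(family, infer(family))
```

With this toy tree, both families trace back to the root, with two independent losses implied for Argonaute-like proteins and three for Piwi-like proteins. A different topology or a model that allows independent gains could change these counts.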
Ago-Piwi proteins have also undergone marked expansion in certain eukaryotic lineages, most prominently plants and metazoans, perhaps associated with more extensive diversification of function. In plants, several duplications of Argonaute-like proteins appear to have occurred both before and after the divergence of monocots and dicots, represented by Oryza sativa and A. thaliana, respectively. Extensive expansion of Argonaute-Piwi proteins has also occurred in the animal lineage. Moreover, in certain species such as C. elegans and D. melanogaster, some of these polypeptides are now so divergent that they do not reconstruct the monophyly of animals. At least one group of C. elegans Argonaute-like proteins (including PPW1 and PPW2) behaves as paralogous to all other Ago-like polypeptides in animals, fungi, plants, C. reinhardtii, and P. sojae.
A phylogenetic tree of Dicer-like proteins, constructed from an alignment of the dual RNaseIII domains, did not resolve the relationships among most of these proteins. This likely reflects the lower sequence (and domain structure) conservation of Dicer-like proteins relative to Ago-Piwi and RdRP polypeptides. However, the animal and plant Dicer-like sequences form a well-supported cluster and appear to be orthologous. Interestingly, plant Dicer-like sequences underwent significant expansion largely prior to the divergence of monocots and dicots. In contrast, most animals appear to encode a single Dicer sequence, with the exception of insects, which contain two. Whereas insect Dcr1 clusters with all other animal Dicers, Dcr2 is much more divergent and forms a paralogous clade. Intriguingly, insect Ago2 is also much more divergent than Ago1 and does not cluster with most other animal Argonaute-like proteins (the Ago1 clade). It remains unclear whether this reflects an ancient RNAi pathway duplication in the animal lineage that was retained only in insects and/or the fast evolution of certain duplicated sequences within the insect lineage. Notably, a recent report suggests that
D. melanogaster Dcr2 and Ago2 are among the fastest evolving genes in this organism, perhaps as a result of a coevolutionary ‘arms race’ with viral pathogens (
Obbard et al. 2006).
A monophyletic origin of animal and plant Dicers is also supported by the comparable domain organization of their sequences. Moreover, if the eukaryotic tree is truly rooted between unikonts and bikonts, as already discussed, one of the ancestral forms of Dicer may have been similar to the multidomain protein now present in animals and plants. Domain deletion/truncation, domain fusion, as well as sequence divergence could explain the more variable Dicer-like proteins found in other living organisms (which contain various combinations of some of the putative ancestral motifs). However, a polyphyletic origin of Dicer-like sequences, and, potentially, the existence of more than one Dicer form in the eukaryotic ancestor, cannot be statistically ruled out. Interestingly, Drosha, another type of RNaseIII enzyme involved in RNAi via the processing of miRNA precursors in animals (
Bartel 2004;
Wienholds and Plasterk 2005;
Zamore and Haley 2005), is absent from the genomes of all other eukaryotes examined (data not shown). Drosha polypeptides form an outgroup with respect to plant and animal Dicers and seem to be somewhat more closely related (albeit weakly) to eubacterial RNaseIII enzymes. Thus, this type of protein may have evolved, independently from Dicer, in the animal lineage.
RdRPs are not as widely distributed among eukaryotes as Ago-Piwi and Dicer-like sequences but a phylogenetic tree, constructed by aligning the RdRP domains, supports the monophyletic origin of the proteins found in
C. elegans, fungi, Amoebozoa,
P. tetraurelia, and a subset of plant RdRPs. However, the evolutionary relationships among some of these polypeptides as well as the grouping of the
G. intestinalis RdRP are not well defined. Besides the already discussed caveats associated with our analyses, the topology of the RdRP tree might also be affected by more prevalent lineage-specific losses of some of these proteins. For instance, there is experimental evidence that
A. nidulans has lost, via DNA sequence degeneration, the putative ortholog of the
N. crassa RdRP Qde-1 (
Hammond and Keller 2005). Intriguingly, plants also contain a subset of RdRPs (including
A. thaliana Rdr3, Rdr4, and Rdr5) that behaves as an outgroup to all other RdRPs. Thus, a parsimonious interpretation of the data is consistent with the existence of at least one RdRP in the common eukaryotic ancestor, but the origin of the subset of more divergent plant RdRPs remains uncertain.