The explosion of molecular data has transformed hypotheses on both the origin of eukaryotes and the structure of the eukaryotic tree of life. Early ideas about the evolution of eukaryotes arose through analyses of morphology by light microscopy and later electron microscopy. Though such studies have proven powerful at resolving more recent events, theories on origins and diversification of eukaryotic life have been substantially revised in light of analyses of molecular data including gene and, increasingly, whole genome sequences. By combining these approaches, progress has been made in elucidating both the origin and diversification of eukaryotes. Yet many aspects of the evolution of eukaryotic life remain to be illuminated.
Eukaryotes – cells with nuclei – have inhabited Earth for ~1.2–1.8 billion years (Knoll et al. 2006), are major players in biogeochemical cycling, and include lineages that are the causative agents of numerous global diseases (e.g., malaria, African sleeping sickness, amoebic dysentery). The most familiar eukaryotes – plants, animals and fungi – dominate the visible landscapes of terrestrial systems. Yet, these three lineages represent only a small fraction of the estimated 35–55 eukaryotic lineages that may be of comparable age (figure 1; Patterson 1999). The small size of the organisms in many of the remaining lineages has traditionally made these groups recalcitrant to study. Moreover, the vast time scale of eukaryotic evolution obscures evolutionary events during their origin and early diversification. Despite these challenges, advances in molecular techniques are making these diverse lineages more accessible and hence transforming our views on eukaryotic evolution.
Here we focus on the recent transformation of hypotheses on the origin and diversification of eukaryotes that has occurred with the rise in molecular data (i.e. multigene sequencing, genomics). This increase in molecular data is the result of a confluence of factors including development of the polymerase chain reaction (PCR) and subsequent improvements in high-throughput sequencing of expressed sequence tags (ESTs) and whole genomes. We first focus on the origin of eukaryotic features, highlighting the compelling evidence for the chimeric nature of eukaryotic genomes. We then describe the state of knowledge on eukaryotic phylogeny (see box 1) where there has been considerable progress in establishing robust clades, though deep nodes remain elusive. Together, this synthesis highlights both the progress and challenges to our understanding of the evolution of eukaryotic life.
Archaea – microbial clade originally defined using culture-independent studies of extreme environments. Now known to be widespread, Archaea constitute one of the three domains of life and are defined by ether linkages in their membranes.
Bacteria – metabolically diverse, microbial domain that includes many familiar lineages such as Escherichia coli and Staphylococcus aureus. The ancestors of both mitochondria and chloroplasts are believed to belong to this domain.
Chloroplast (see plastid below) – a membrane-bound organelle in the cells of green algae (including plants) that contains chlorophyll.
Homology (homologous) – similarity in structure/sequence due to common ancestry.
Lateral inheritance (Lateral or horizontal gene transfer) – genetic inheritance between non-sister lineages; transfer of genes between species.
Mitochondrion – a double membrane-bound organelle primarily involved in oxidative phosphorylation (energy (ATP) production) in eukaryotes.
Molecular systematics – a branch of systematics (evolutionary relationships of organisms) that uses molecular data.
Monophyly (monophyletic group, also known as a clade) – a group of organisms that includes a common ancestor and all of its descendants.
Phylogenomics – in phylogenetic context, large scale reconstruction of a species tree using genomic data (usually 100+ genes).
Phylogeny – evolutionary relationship among organisms or their components (e.g. genes).
Plastid – a double membrane-bound organelle, generally involved in photosynthesis, storage (starch) and synthesis of several macromolecules (e.g. fatty acids). Plastids include chloroplasts and chromoplasts.
Heterogeneous rates of evolution (sequence rate heterogeneity) – the variation in rates of substitutions among species.
SSU-rRNA (small subunit rRNA) – an RNA molecule that is one component of ribosomes, which are involved in protein synthesis. SSU-rRNA is extensively used in phylogenetic studies due to its high sequence conservation across all domains of life.
Supertree – a tree assembled from several smaller phylogenies that share some, but not necessarily all, terminal taxa.
Synapomorphy – a derived character shared by two or more groups that originated in their last common ancestor.
Taxonomy – the practice or science of classification.
Ultrastructural identities – sub-cellular features (mitochondria, plastids, flagella, root system, nucleus, types of hairs etc.) observed using electron microscopy that have been used to define lineages of eukaryotes (e.g. alveolar sacs in alveolates and tripartite hairs in stramenopiles).
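The definition of monophyly in box 1 can be made concrete with a small sketch. The code below assumes a rooted tree written as nested tuples with taxon names at the tips; the function names and the toy topology are hypothetical illustrations and are not taken from any of the analyses cited here. A set of taxa is treated as monophyletic when it equals the complete set of tips descending from some node.

```python
# Minimal sketch: test monophyly on a rooted tree stored as nested tuples.
# Tips are strings; every tuple is an internal node. The tree is a toy example.

def leaves(node):
    """Return the set of tip names descending from a node."""
    if isinstance(node, str):
        return frozenset([node])
    members = frozenset()
    for child in node:
        members |= leaves(child)
    return members


def clades(node):
    """Return the tip set of every node in the tree, tips and root included."""
    if isinstance(node, str):
        return {frozenset([node])}
    found = {leaves(node)}
    for child in node:
        found |= clades(child)
    return found


def is_monophyletic(tree, taxa):
    """True if `taxa` is exactly the set of tips under some node of `tree`."""
    return frozenset(taxa) in clades(tree)


# Hypothetical topology, for illustration only.
toy_tree = (
    (("animals", "fungi"), "amoebae"),
    (("red_algae", "green_algae"), "ciliates"),
)

print(is_monophyletic(toy_tree, {"animals", "fungi"}))    # a clade in this toy tree
print(is_monophyletic(toy_tree, {"animals", "amoebae"}))  # excludes fungi, so not a clade
```

In practice, support for a clade is assessed statistically (for example with bootstrap values) rather than by a yes/no check on a single topology; the sketch only illustrates the definition.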
Hypotheses on the origin of eukaryotic cells must address the evolution of key features that distinguish eukaryotes from bacteria and archaea (see box 1). One such innovation, the presence of a nucleus, is the defining feature of eukaryotes that gives this clade its name: eu = ‘true’ and karyon = ‘kernel’ (referring to the nucleus). Hypotheses on the origin of eukaryotes must further explain the presence of a chimeric genome – a genome in which different genes share most recent common ancestry with varying archaeal and bacterial lineages. A further and perhaps more significant innovation of eukaryotic cells is the cytoskeleton, the complex set of protein-rich structures that underlie eukaryotic morphology and mobility. Bacteria contain homologs of some eukaryotic cytoskeletal proteins, as explained below, but the mechanism underlying the large divergences between these homologs is unclear. Two additional features found in many but not all eukaryotes are mitochondria and chloroplasts (see box 1); the endosymbiotic origins of these organelles compound the complexity of eukaryotic genomes by adding further sources for genes through endosymbiotic gene transfer (EGT).
Certainly, numerous insightful hypotheses on the origin of eukaryotes predate the recent flood of molecular data. These hypotheses can be divided into two major categories: 1) hypotheses involving endosymbiosis, which argue that components of the eukaryotic cell arose by engulfment of prokaryotic organisms (e.g., Sagan 1967, Taylor 1987), and 2) hypotheses for autogenous ('self-birth') pathways for eukaryotic cell components (e.g., Cavalier-Smith 1978). These earlier hypotheses have been extensively reviewed in the literature (e.g., Roger 1999). Hence, we focus here on the impact of molecular data on hypotheses on eukaryotic origins and evolution.
Eukaryotic genomes are chimeric as evidenced by varying eukaryotic genes that share common ancestry with diverse archaeal and bacterial lineages. Evidence of the chimeric nature of eukaryotic genomes represents one of the most significant transformations in theories on the origins and diversification of eukaryotes. Rather than evolving in a strictly tree-like manner, eukaryotic genes have complex histories that likely involve transfers from endosymbionts, from ingested microbes, and/or from the original partners in the evolution of the eukaryotic cell (figure 2).
If eukaryotic genomes arose strictly through vertical descent, then we might expect the bulk of eukaryotic genes to trace back to a common ancestor in a single non-eukaryotic lineage, depending on patterns of gene gain and loss. In contrast, if endosymbiotic hypotheses are correct and the chimerism of the eukaryotic genome resulted from fusion between two distinct genomes (i.e. one archaeal and one bacterial lineage), then we would expect eukaryotic genes to trace back to two, or another relatively small number, of host and donor lineages. However, analyses of individual genes (Brown and Doolittle 1997) and more recently whole genome data (figure 2; Dagan and Martin 2006) reveal a much more complex history of eukaryotic genes, with many potential donor lineages. The chimerism of eukaryotic genomes is argued to be due to both a fusion event at the time of the origin of eukaryotes and continuous lateral transfer of genes from prey organisms into the eukaryotic genome (figure 2; ‘you are what you eat,’ Doolittle 1998). As a result of these processes, different genes within eukaryotic genomes are most similar to different lineages of bacteria and archaea, as indicated by BLAST analysis of 5,833 human proteins (figure 2j; Dagan and Martin 2006).
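The contrast drawn above (one donor lineage under strict vertical descent, two under a simple fusion, many in the observed data) can be illustrated by tallying the prokaryotic lineage of the most similar sequence for each eukaryotic gene. The following is a minimal sketch under stated assumptions: the gene names, lineage labels and the function `donor_profile` are hypothetical, the input is assumed to have been summarized already from a similarity search, and the toy values are not data from Dagan and Martin (2006).

```python
# Minimal sketch: summarize which prokaryotic lineage is the closest match
# for each eukaryotic gene. All gene names and assignments are invented.
from collections import Counter


def donor_profile(best_hits):
    """Map each lineage to the fraction of genes whose best hit falls in it."""
    counts = Counter(best_hits.values())
    total = sum(counts.values())
    return {lineage: n / total for lineage, n in counts.items()}


# Hypothetical gene -> lineage-of-best-hit table.
toy_hits = {
    "geneA": "alpha-proteobacteria",
    "geneB": "cyanobacteria",
    "geneC": "archaea",
    "geneD": "alpha-proteobacteria",
}

profile = donor_profile(toy_hits)
for lineage, fraction in sorted(profile.items()):
    print(f"{lineage}: {fraction:.2f}")

# A profile dominated by one lineage would fit strict vertical descent, two
# lineages a simple fusion, and many lineages a more complex history.
print("distinct lineages:", len(profile))
```

The most similar sequence is only a rough proxy for ancestry, so a tally of this kind suggests candidate donor lineages rather than demonstrating them; phylogenetic analyses of each gene are needed to test them.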
One recent study using a supertree (see box 1) approach does find evidence of substantial input from specific prokaryotic lineages, listed in order of relative contribution: cyanobacteria, proteobacteria, and thermoplasmatales (an order within archaea) (Pisani et al. 2007). Aspects of this pattern are consistent with several of the hypotheses on the origin of eukaryotes that involve a merger between archaeal and proteobacterial genomes at the origin of eukaryotes (e.g., Margulis et al. 2000, Martin and Müller 1998, Zillig et al. 1989, Searcy 1992), coupled with subsequent transfer of genes from cyanobacterial-derived plastids (see box 1 and sections on mitochondria and chloroplast below). Interpretation of possible donor lineages to the chimeric eukaryotic genome is complicated by the high levels of lateral gene transfer among bacterial and archaeal lineages over the past ~1 billion years since eukaryotes evolved; this combined with the deep divergence times makes phylogenetic inference challenging. Nevertheless, analyses of completed genomes indicate that the evolutionary history of eukaryotic genes is complex (figure 2) and involves a combination of endosymbiotic gene transfers and aberrant lateral gene transfers (defined here as transfers that are not from hosts or symbionts but rather from food, the environment or another undetermined source).
Determining the origin of the nucleus is perhaps one of the most difficult aspects for theories on the origin of eukaryotic cells. The nucleus, which contains the eukaryotic genome, is bounded by the nuclear envelope (with the outer membrane contiguous with the endoplasmic reticulum), which contains numerous complex components such as nuclear pores, and encloses the transcriptional and spliceosomal machinery present in the nucleoplasm. As is the case for the cytoskeleton described below, hypotheses on the origin of the nucleus must explain the evolution of this complex structure that exists in all extant eukaryotes – there are no clear intermediate taxa for a transition from 'prokaryote' to eukaryote. As with virtually any eukaryotic feature, there are hypotheses that propose endosymbiotic (e.g., Hartman and Fedorov 2002, Margulis et al. 2000) and autogenous (e.g., Cavalier-Smith 2002, Jekely 2007) origins for the nucleus and its associated proteins. One intriguing recent hypothesis on why eukaryotes evolved a nucleus is that there was selection for a temporal separation between transcription and translation due to the invasion of self-splicing Group II introns, which in turn may be precursors to eukaryotic spliceosomal introns (introns removed by the spliceosome) (Martin and Koonin 2006). A comprehensive phylogenomic (see box 1) analysis of proteins in the nuclear envelope, including nucleoporins, lamins and karyopherins, revealed few clear homologs among bacteria or archaea (Mans et al. 2004). Hence, the challenge remains to develop hypotheses that can explain the intermediate steps that enabled the evolution of the highly complex eukaryotic nucleus.
Hypotheses on the eukaryotic cytoskeleton have shifted from a focus on the origin of cytoskeletal proteins to questions of how cytoskeletal proteins evolved from homologs in bacteria and archaea into the complex system present in all extant eukaryotes. Detailed comparisons of protein structures have yielded evidence of homologs for numerous cytoskeletal proteins. Most notably, bacterial homologs have been identified for both tubulin and actin, named FtsZ and MreB, respectively (reviewed in Erickson 2007). The bacterial and eukaryotic versions of these proteins are highly divergent, with amino acid similarities ranging from only ~10 to 40% (Erickson 2007; see Jenkins et al. 2002 for the special case of Prosthecobacter). Contrary to the theory that eukaryotic cytoskeletal proteins are derived from a spirochaete endosymbiont (Margulis 1993, Margulis et al. 2000, Margulis et al. 2006), there is no compelling evidence for any specific bacterial donor lineage for eukaryotic cytoskeletal proteins (Mitchell et al. 2007). Further, neither bacterial nor archaeal homologs have yet been identified for some cytoskeletal proteins, including the diverse myosins (Richards and Cavalier-Smith 2005). Hence, the challenge for theories on the origin of the eukaryotic cytoskeleton remains explaining the steps that generated the highly coordinated and complex cytoskeleton found in all extant eukaryotes (Doolittle 1995). Progress in this area will emerge as genome sequences are completed from diverse eukaryotes, coupled with developments in models of protein evolution that allow for identification of potential bacterial and archaeal homologs of additional cytoskeletal proteins.
There is compelling evidence that the mitochondrion resulted from endosymbiosis of an alpha-proteobacterium and that during the reduction of the mitochondrial genome, some genes were transferred to the nucleus, contributing to the chimerism of eukaryotic genomes (reviewed in Embley and Martin 2006, Roger 1999). In contrast, the timing of the acquisition of mitochondria has been subject to much recent debate. Under the 'Archezoa' hypothesis, the ancestral eukaryote lacked mitochondria and the organelle was acquired only after the evolution of numerous amitochondriate lineages (Cavalier-Smith 1983). Early analyses of SSU-rRNA (see box 1) yielded topologies that were consistent with the 'Archezoa' hypothesis with amitochondriate lineages such as diplomonads (e.g. Giardia lamblia), trichomonads (e.g. Trichomonas vaginalis) and microsporidians (e.g. Encephalitozoon cuniculi) falling at the base of the eukaryotic tree of life (e.g., Sogin et al. 1989). However, these early trees were based on only a single gene (SSU-rRNA) and are limited by both the number of characters available for analysis (as opposed to more recent multigene studies) and the methodological challenges of dealing with highly heterogeneous rates of evolution in phylogenies.
The taxonomic content of ‘Archezoa’ changed over time with the emergence of more data and improved analytical tools, which suggested that amitochondriate lineages were either not basal or are derived from mitochondrion-containing lineages (see Embley et al. 2003, Patterson 1999, Roger 1999). In addition, the 'Archezoa' hypothesis has been challenged by the presence of genes of mitochondrial origin within the nuclei of putative ‘Archezoa’ as well as the discovery of diminutive organelles such as hydrogenosomes (hydrogen-producing organelles) and mitosomes (degenerate mitochondria of uncertain function) in amitochondriate eukaryotes (Embley et al. 2003). Falsification of the ‘Archezoa’ hypothesis based on extant amitochondriate eukaryotes leads to the hypothesis that the acquisition of mitochondria occurred at or soon after the origin of eukaryotes (e.g. Gray 1999, Katz 1998, Roger 1999). Under such a scenario, this early acquisition of mitochondria contributed to the present-day chimerism of the eukaryotic genome.
Currently, many accept a model of a single primary endosymbiotic origin of plastids in the last common ancestor of red algae, green algae and glaucocystophytes (a clade of freshwater algae). Under this scenario, all other photosynthetic eukaryotic lineages evolved by subsequent secondary, tertiary and perhaps even quaternary endosymbiosis whereby a eukaryote engulfs another eukaryote. Evidence for a single primary acquisition, in which a eukaryote engulfed a cyanobacterium, includes monophyly of plastid genes from diverse eukaryotes (box 1) (Bhattacharya et al. 2004, McFadden and van Dooren 2004) and shared transport pathways (McFadden and van Dooren 2004, Steiner et al. 2005). However, other data complicate the hypothesis of a single primary acquisition of chloroplasts in eukaryotes, including the non-monophyly of plastid Rubisco genes (Delwiche and Palmer 1996), the non-monophyly of the putative hosts of this endosymbiosis (Yoon et al. 2008), and the diversity of pigments and light-harvesting complex proteins among photosynthetic eukaryotes (Larkum et al. 2007).
Similarly, there is debate between those who posit only a small number of secondary plastid transfers among eukaryotes and those who argue for a much more dynamic history of photosynthesis in eukaryotes. On the one hand, there is the hypothesis that there have only been three secondary transfers of plastids in eukaryotes, into the ancestors of euglenids, chlorarachniophytes and the supergroup 'Chromalveolata' (Archibald 2006, Cavalier-Smith 1999, Keeling 2004). However, these parsimonious hypotheses are not consistently recovered in analyses of host genomes (Sanchez-Puerta et al. 2007, Yoon et al. 2008). One alternative is articulated in models whereby plastids are particularly likely to be transferred because of their immediate selective benefit to hosts (Grzebyk et al. 2003). As genome-scale data from diverse photosynthetic eukaryotes emerge, hypotheses on origins of photosynthesis in eukaryotes will be rigorously tested, with emphasis on resolving the chimeric history of the genomes of photosynthetic lineages.
Beyond theories for the origin of eukaryotic cells, there remains the challenge of reconstructing the diversification of eukaryotic life through phylogenetic analyses of extant eukaryotes. Many theories have been proposed in the form of taxonomies (box 1) that seek to describe the organization of eukaryotic diversity (see Parfrey et al. 2006). In the pre-molecular era numerous taxonomies of eukaryotes were proposed based on morphological features observable through the light microscope and later ultrastructural characters (sub-cellular features) revealed by the electron microscope (e.g. Patterson 1999, Taylor 1999). Ultrastructural studies led to the recognition of about 70 different lineages of protists with robust patterns of ultrastructural identities or organization (see box 1) (Patterson 1999, Taylor 1999). However, evaluation of relationships among these groups has proven difficult, largely due to the lack of clearly homologous ultrastructural characters.
During the past three decades a wealth of molecular data have been employed in molecular systematics (box 1), greatly advancing our understanding of the diversity of and relationships among eukaryotic lineages (Baldauf et al. 2000, Cavalier-Smith 1983, Philippe et al. 2000, Sogin et al. 1989, Yoon et al. 2008). Systematic studies primarily use sequence data (amino acid and DNA/RNA nucleotides) as they contain numerous characters that can be analyzed once homologous positions are aligned. The amount of sequence data available has grown exponentially as technology for producing and analyzing sequences has become cheaper. The majority of the microbial lineages are now represented by some molecular data in the genetic databases. Analyses of these data are producing new hypotheses about the evolutionary histories among eukaryotes, and enabling the testing of existing hypotheses.
Analyses of emerging molecular phylogenies coupled with ultrastructural studies led to the hypothesis that eukaryotes can be classified into six major supergroups, though subsequent analyses indicate some of these clades are not robustly supported (e.g. Parfrey et al. 2006, Yoon et al. 2008). The six supergroups, Opisthokonta, ‘Amoebozoa’, ‘Chromalveolata’, ‘Plantae’ (also referred to as ‘Archaeplastida’), ‘Rhizaria’ and ‘Excavata’ (Adl et al. 2005), constitute the current classification of eukaryotic diversity that is now emerging in biology textbooks (figure 3). Here, we put the names of many groups in single quotes to indicate the uncertainties around these hypothesized eukaryotic relationships as the support for most of the supergroups is questionable (Parfrey et al. 2006).
We discuss the levels of support the supergroups receive in multigene and phylogenomic (see box 1) analyses as well as other forms of supporting evidence such as ultrastructural or molecular synapomorphies (see box 1), common organellar origin and unique molecular features. Groups with ultrastructural identities are by and large supported in molecular studies, and provide a standard for robust groups (Patterson 1999, Yoon et al. 2008). Explanations for the instability of the supergroups include the chimeric nature of the genome, lateral gene transfer (LGT), heterogeneous rates of evolution (see box 1), methodological challenges, and the limited and biased taxonomic sampling in many studies (see Philippe et al. 2005, Roger and Hug 2006). Others have suggested that the deepest eukaryotic relationships are unknowable because the radiation of eukaryotic lineages proceeded so rapidly that we cannot resolve it a billion or two years later (Koonin 2007).
Opisthokonta is the most robust supergroup and includes animals, fungi and their microbial relatives (e.g. choanoflagellates, nucleariids) (figure 1t–w). The animal-fungal clade first emerged in SSU-rRNA gene trees in 1993, and subsequent studies have provided additional support for the group and added new microbial members (see Parfrey et al. 2006, Steenkamp et al. 2006). Multiple lines of genomic evidence, including phylogenomic studies (figure 4; Burki et al. 2007, Rodriguez-Ezpeleta et al. 2007), also provide strong support for the Opisthokonta. This diverse clade shares morphological and molecular synapomorphies such as the presence of a single posterior flagellum in those lineages with flagella and a 12-amino acid insertion in the elongation factor 1-alpha gene (Steenkamp et al. 2006) as well as a shared haloarchaeal type tyrosyl-tRNA synthetase that was acquired through LGT (Huang et al. 2005).
‘Plantae’ and ‘Chromalveolata’, two supergroups with predominantly photosynthetic members, emerged as hypotheses postulating single plastid acquisition through endosymbiosis of a cyanobacterium in ‘Plantae’ and a red alga in ‘Chromalveolata’ (see above plastid origins section; Adl et al. 2005). Resolving the evolutionary history of these lineages is complicated, as their genomes are a chimera of the original host and plastid genomes. This chimerism is reflected in the incongruence observed between plastid-derived characters (genes of plastid origin, transport pathways into the plastid) that support these groups and host-derived characters (nuclear genes, host ultrastructure), which often do not (Parfrey et al. 2006). Elucidating the evolutionary history of the host is a key test of organelle-based hypotheses, as this will differentiate between a single origin of the organelle (host and plastid phylogenies congruent) and a single plastid source engulfed by multiple hosts (host and plastid phylogenies incongruent) (Bodyl 2005, Delwiche 1999, Grzebyk et al. 2003).
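The host-versus-plastid test described above amounts to asking whether two trees for the same lineages imply the same groupings. A minimal sketch follows, assuming rooted trees written as nested tuples whose tips are labeled by host lineage; the function names and the three toy topologies are hypothetical and do not represent any published phylogeny. Two trees are called congruent here only if they contain exactly the same set of clades.

```python
# Minimal sketch: compare a host tree and a plastid tree for the same
# lineages by checking whether they contain identical clades.
# Trees are nested tuples; tips are strings. All topologies are invented.

def collect_clades(node, out):
    """Add the tip set of every internal node to `out`; return this node's tips."""
    if isinstance(node, str):
        return frozenset([node])
    members = frozenset()
    for child in node:
        members |= collect_clades(child, out)
    out.add(members)
    return members


def internal_clades(tree):
    """Return the set of tip sets defined by the internal nodes of `tree`."""
    out = set()
    collect_clades(tree, out)
    return out


def congruent(host_tree, plastid_tree):
    """True if both trees define exactly the same clades."""
    return internal_clades(host_tree) == internal_clades(plastid_tree)


# Plastid genes group lineages A and B in this toy example.
plastid_tree = ((("lineage_A", "lineage_B"), "lineage_C"), "outgroup")

# Same groupings, different drawing order: congruent.
host_tree_same = ("outgroup", ("lineage_C", ("lineage_B", "lineage_A")))

# Host genes group A with C instead: incongruent.
host_tree_conflict = ((("lineage_A", "lineage_C"), "lineage_B"), "outgroup")

print(congruent(host_tree_same, plastid_tree))
print(congruent(host_tree_conflict, plastid_tree))
```

Exact topological identity is a deliberately strict criterion chosen for brevity. Published comparisons weigh statistical support for conflicting nodes, because poorly supported differences between trees do not by themselves indicate separate histories for host and plastid.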
The members of ‘Plantae’ are the green algae (including land plants), red algae and glaucocystophytes (figure 1a–c). Molecular genealogies of plastid-derived genes provide support for common ancestry of all ‘Plantae’ plastids, as do other similarities in transport machinery (McFadden and van Dooren 2004). However, as mentioned above, red and green algae have different rubisco protein complexes (Delwiche and Palmer 1996) and light harvesting compounds (i.e., chlorophyll) (Larkum et al. 2007). Support for the monophyly of ‘Plantae’ host genomes is generally high in analyses of large datasets (phylogenomics) with limited taxonomic sampling (e.g., Burki et al. 2007, Rodriguez-Ezpeleta et al. 2007, Rodriguez-Ezpeleta et al. 2005). Conversely, re-analyses of the Rodriguez-Ezpeleta et al. (2005) data including additional taxa of interest and with removal of fast evolving sites, which may produce spurious relationships in phylogenies, did not support the monophyly of ‘Plantae’ (Nozaki et al. 2007). Further, our taxon rich analyses of the four most sampled markers (SSU, actin, alpha- and beta-tubulin) similarly failed to resolve this clade (figure 4; Yoon et al. 2008). These results suggest incongruence due to the conflicting signals of the host and the plastid genomes.
The supergroup ‘Chromalveolata’ is a highly contentious group that unites the alveolates and the ‘Chromista’ (Adl et al. 2005). The alveolates are well-established by both molecular and ultrastructural evidence and include dinoflagellates, apicomplexans and ciliates (reviewed in Patterson 1999). The ‘Chromista’ are a polyphyletic collection of eukaryotes that encompass much of the marine eukaryotic phytoplankton diversity (kelps, diatoms, dinoflagellates, coccolithophorids) (e.g. Cavalier-Smith 2002). ‘Chromalveolata’ remains controversial as it has consistently failed to form a monophyletic group in nuclear gene analyses (Parfrey et al. 2006), even those with large numbers of genes (Burki et al. 2007, Hackett et al. 2007) and increased sampling of relevant taxa (Yoon et al. 2008). The support for this supergroup is largely limited to those studies that use genetic markers of plastid origin (e.g. McFadden and van Dooren 2004). However, these plastid genealogies are also consistent with other hypotheses, and a shift is underway from ‘Chromalveolata’ towards alternative hypotheses (Bodyl 2005, Burki et al. 2007, Grzebyk et al. 2003, Sanchez-Puerta et al. 2007).
One of the impacts of molecular data is the emergence of unsuspected relationships between morphologically dissimilar organisms, as in the case of the heterogeneous supergroups ‘Amoebozoa’ and ‘Rhizaria’ (see Adl et al. 2005). These groups lack defining morphological synapomorphies, but are generally supported by molecular analyses (see Parfrey et al. 2006). The membership of these diverse groups continues to expand as more and more unaffiliated microbial eukaryotes are sampled (e.g., Polet et al. 2004, Tekle et al. 2007, Tekle et al. 2008, Yoon et al. 2008). As there is some circularity in continuing to test these groups with the markers that were originally used to define them, robust tests must await additional data, such as phylogenomic studies with broad taxon sampling.
‘Rhizaria’ is a collection of amoebae, parasites and flagellates, including Foraminifera (testate marine amoebae with reticulating pseudopods), chlorarachniophytes (filose amoebae with green algal plastids), and plasmodiophorids (plant parasites) (figure 1j–m). ‘Rhizaria’ is frequently supported in molecular analyses of single or few genes with adequate taxonomic sampling (Nikolaev et al. 2004) and recently in phylogenomic analyses with few taxa (Burki et al. 2007, Rodriguez-Ezpeleta et al. 2007). Support for this supergroup is inconsistent in multigene genealogies with larger taxon sampling (Yoon et al. 2008). However, some members of this supergroup (e.g. Foraminifera) are characterized by highly divergent sequence evolution for some of the markers used, which might account for its failure to form a group (Habura et al. 2005).
‘Amoebozoa’ is composed of mostly amoeboid forms including naked free-living lobose amoebae (e.g. Amoeba proteus), organisms that cause disease in humans such as Entamoeba histolytica (amoebic dysentery), cellular slime molds (Dictyostelium) and lobose amoebae that secrete shells (figures 1q–s). This supergroup is generally supported in molecular analyses with more genetic data (e.g. Burki et al. 2007, Rodriguez-Ezpeleta et al. 2007) and taxonomic sampling (Tekle et al. 2008).
The supergroup ‘Excavata,’ hypothesized on the basis of a homologous ventral feeding groove and associated ultrastructural characters (Simpson and Patterson 1999), has not proven robust in most molecular analyses. This group includes the familiar parasites Trypanosoma, Giardia and Trichomonas as well as photosynthetic free-living lineages (e.g., Euglena gracilis) (figures 1n–p). Molecular analyses generally reject the monophyly of ‘Excavata’ even in studies with good sampling of genes and taxa (Rodriguez-Ezpeleta et al. 2007, Simpson et al. 2006, Yoon et al. 2008). ‘Excavata’ members generally fall into two separate, well-supported clades (figures 3 and 4). The non-monophyly of ‘Excavata’ suggests that the excavate groove structure has been lost or modified in intervening lineages, or that it evolved more than once by convergent evolution.
The supergroups are themselves in flux as groups shift with additional sampling of taxa and data. For example, several recent papers have reported a relationship between ‘Rhizaria’ and members of the ‘Chromalveolata’, leading to a hypothesis on the specific relationship between stramenopiles, alveolates and ‘Rhizaria’ – the ‘SAR’ hypothesis (figure 3; Burki et al. 2007, Hackett et al. 2007, Rodriguez-Ezpeleta et al. 2007). The emergence of ‘SAR’ highlights the complexity of the history of secondary endosymbiosis of green and red algal plastids, which are both found in these lineages. There are no morphological or molecular synapomorphies that support this group, illustrating both the power of molecules to point to unsuspected relationships and the difficulty of resolving the deepest relationships.
At a higher level, the eukaryotic supergroups have been classified as either ‘Unikonta’ or ‘Bikonta’ and the root has been drawn between them (Stechmann and Cavalier-Smith 2002). Members of the ‘Bikonta’ (‘Chromalveolata’, ‘Excavata’, ‘Plantae’, ‘Rhizaria’) were originally postulated to encompass organisms whose flagella are supported by two flagellar roots, while the ‘Unikonta’ have flagella bearing a single root (‘Amoebozoa’ and Opisthokonta). These concepts were based on only a few representative lineages and there are putative ‘bikonts’ that have one flagellar root, and ‘unikonts’ with two (Cavalier-Smith 2002). There is some evidence from genomic data that has also been cited in support of ‘Unikonta’ and ‘Bikonta’, including a gene fusion said to be ‘unikont’ specific (Stechmann and Cavalier-Smith 2002). However, the gene fusion was later discovered in the genome of a red alga, a putative ‘bikont’ (Nozaki et al. 2005). The bifurcation of eukaryotes into ‘unikonts’ and ‘bikonts’ requires further scrutiny based on analyses of larger taxonomic coverage and unequivocal morphological and molecular synapomorphies.
The availability of gene and genome sequences has greatly increased our understanding of the phylogenetic relationships among eukaryotes by 1) suggesting relationships between morphologically dissimilar organisms, 2) providing a means to test the robustness of groups, such as those with ultrastructural identities, and 3) providing homologous characters that enable comparison across the diversity of eukaryotes. Below the supergroup level, molecular data have increased support for numerous clades. Molecular data have also revealed some surprising alliances of organisms, such as the placement of Microsporidia – a lineage of parasites previously classified among the "protozoa" – within the fungi (Hirt et al. 1999). Molecular studies have also provided confirmation of relationships suspected based on other forms of evidence, such as the stramenopiles, a clade containing diatoms, kelps, golden algae, as well as many non-photosynthetic members including water molds (e.g., Phytophthora infestans, the agent of the Irish potato blight). These diverse lineages share the ultrastructural synapomorphy of tripartite tubular hairs on one flagellum (Patterson 1999). Both ultrastructure and molecular studies were instrumental in the emergence of the Alveolata (Taylor 1999). A relationship between ciliates and dinoflagellates was first suggested based on ultrastructural similarities, while molecular analyses added the Apicomplexa to this group. Subsequent work uncovered the ultrastructural synapomorphy of alveolar sacs located directly under the membrane (Patterson 1999).
We have seen tremendous progress in our understanding of the origin of eukaryotes with the accumulation of gene and genomic data in the past decade. In this era it has become possible to assess the different theories of the origin of eukaryotes with empirical data. Examination of genomic data has revealed that the eukaryotic genome is chimeric, as eukaryotic genes share ancestry with different archaeal and bacterial lineages (figure 2). Several lines of molecular evidence indicate an early acquisition of the mitochondrion during the evolution of the eukaryotic cell. Genomic data are also playing a role in unraveling the mysteries surrounding the origin of the key innovations of the eukaryotic cell.
Our understanding of the diversification of eukaryotes has also benefited from molecular analyses of genetic data. In particular, molecular data provide a myriad of homologous characters that can be used to reconstruct phylogenies of diverse microbial and macroscopic lineages whose morphological differences preclude comparison by other methods. Despite varying support, the current classification of eukaryotes into six putative supergroups is a step towards deciphering eukaryotic diversification and evolution. Complex patterns of molecular evolution, methodological problems, and inadequate and biased sampling of taxa and data have hindered inference of the deep interrelationships of eukaryotes. However, these challenges will diminish through inclusion of more lineages, incorporation of appropriate models of evolution and refinement of the methodological tools.
The work was supported by NSF ATOL (DEB 043115), NIH AREA (1R15GM081865-01) and NSF OCE (0648713) grants to LAK.
Author contributions: All authors contributed equally to this work.