A major remaining gap in our knowledge of the tree of life is the uncertainty about relationships among eukaryotes, including the many divergent microbial lineages plus plants, animals and fungi. Microbial eukaryotes, often referred to as protists, are an eclectic assemblage of lineages defined as eukaryotes that are not plants, animals, or fungi [1]. Clearly, knowledge of the phylogenetic positions of protists is key to understanding the origins of eukaryotes and where the ancestries of plants, animals and fungi lie within these microbial groups.
During the 1970s and 1980s a revolution in understanding eukaryotic diversity occurred as a result of ultrastructural studies. These data [2, 3] demolished traditional classifications in which algae, protozoa and fungi were considered discrete entities, and microbial eukaryotes were inappropriately lumped into one of four classes: amoebae, flagellates, ciliates, and sporozoans. Ultrastructural studies revealed distinct assemblages of organisms that are distinguished by their complement and organization of organelles, providing lineages with ultrastructural identities [1]. About 60 different, robust patterns of ultrastructural organization are recognized, but ~200 genera of uncertain affinities have yet to be examined [1, 4]. Determining relationships among groups using ultrastructure, however, has proven difficult, largely due to the lack of unambiguously homologous structures.
Early molecular analyses relied on comparisons of rDNAs from diverse protists and suggested that diplomonads, trichomonads, and microsporidia were basal lineages [5-7]. These analyses of rDNA sequences also produced a topology with a base and crown (putatively recently radiated) lineages [8-11], which is now argued to be an artifact of long-branch attraction. Given the well-known limitations of single-gene genealogies when inferring deep evolutionary relationships, the current trend is to focus on multigene datasets [12, 13]. However, taxon representation in many of these analyses is sparse. With such incomplete taxon sampling, distantly related groups may appear as sister taxa and many deep nodes are poorly supported [14].
The past decade has seen the emergence of six eukaryotic 'supergroups' that aim to portray evolutionary relationships between microbial and macrobial lineages. The supergroup concept is increasingly accepted as evidenced by several reviews [15, 16] and the recently proposed formal reclassification by the International Society of Protozoologists [17]. However, the support for supergroups is highly variable in the published literature [14].
The six putative supergroups have complex and often unstable histories. The supergroup 'Amoebozoa' was proposed in 1996 [18, 19] based largely on molecular genealogies. The controversial supergroup 'Chromalveolata' was proposed based on the assertion that the last common ancestor of the 'Chromista' (cryptophytes, haptophytes, stramenopiles) and the undisputed Alveolata (dinoflagellates, apicomplexans, ciliates) contained a common chlorophyll c-containing red algal plastid [20]. 'Excavata' is another controversial supergroup composed predominantly of heterotrophic flagellates whose ancestor is postulated to have had a synapomorphy of a conserved ventral feeding groove [21]. The supergroup 'Opisthokonta' includes animals, fungi, and their microbial relatives, and is supported by many molecular genealogies [10]. The 'Opisthokonta' is united by the presence of a single posterior flagellum in many constituent lineages [22]. The supergroup 'Plantae' was erected as a Kingdom in 1981 [23] to unite the three lineages with double-membrane primary plastids: green algae (including land plants), rhodophytes, and glaucophytes. Finally, the 'Rhizaria' emerged from molecular data in 2002 to unite a heterogeneous group of flagellates and amoebae including cercomonads, foraminifera, some of the diverse testate amoebae, and former members of the polyphyletic radiolaria [24].
We believe that comprehensive taxon sampling, coupled with gene-rich analyses, is critical for resolving accurate phylogenies [14]. This is particularly relevant for the eukaryotes, where only a tiny fraction of the >200,000 species of microbial eukaryotes has thus far been characterized for any gene sequence, and over one-half of identified protist groups [1] have yet to be subjected to any molecular study. Misleading results can also arise if a study addressing "deeper" splits in the eukaryotic tree does not include a broad diversity of lineages, including members of all six putative supergroups [14]. This is because the addition of diverse lineages is critical to break long single branches that pose a significant problem for robust phylogenetic inference. We know that the lack of adequate sampling and the use of highly derived (e.g., parasitic) taxa have created unstable tree topologies and led to inaccurate statements of sister-group relationships (e.g., in the creation of the now-abandoned supergroup Archezoa, whose history is described in [25]). Yet only a handful of studies have been published that take a multigene, taxon-rich approach for assessing the eukaryotic tree of life.
Here, we set out to accomplish two tasks: (1) place newly determined sequences from a diversity of microbial eukaryotes onto relatively well-sampled multigene eukaryote phylogenies, and (2) evaluate the support for the six supergroups. Our approach was to use phylogenetic analyses of four genes from two distinct taxon sets that included 61 newly characterized sequences. The two taxon sets represent (1) 105 diverse eukaryotic lineages and (2) a reduced 92-taxon set in which long-branch taxa were removed. The four loci, SSU-rDNA, actin, alpha-tubulin, and beta-tubulin, have a rich history in eukaryotic phylogenetics [7, 12, 26]. These genes have been used for more intensive studies of some groups such as 'Amoebozoa' [27], 'Rhizaria' [28] and 'Opisthokonta' [29] as well as for the establishment of many of the proposed supergroups [14]. Yet, there are few studies in which a multigene data set has been combined with extensive taxon sampling from all six supergroups [30, 31].
Our work contrasts with many past efforts that have used either single-gene data with a broad taxon sampling [8-11], or multigene data with a limited number of taxa [12, 13, 26, 32]. We performed individual and concatenated analyses of four genes. To assess rate heterogeneity and possible lateral gene transfers, we analyzed each gene individually prior to concatenation and then applied a variety of phylogenetic inference methods with both DNA and the inferred protein sequences. Use of a concatenated data set greatly reduces phylogenetic error in simulation studies [33] and the large number of characters that we have obtained for this study is expected to improve the accuracy of resulting phylogenetic trees [34].
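As an illustration of what concatenation involves, the following minimal Python sketch joins per-gene alignments into a single supermatrix and fills in missing data for taxa that lack a given gene. It is not the pipeline used in this study: the function name `concatenate_alignments`, the toy taxon and gene labels, and the use of `?` as the missing-data symbol are all assumptions made for illustration.

```python
from typing import Dict, List


def concatenate_alignments(
    genes: Dict[str, Dict[str, str]],
    gene_order: List[str],
    missing: str = "?",
) -> Dict[str, str]:
    """Join per-gene alignments (taxon -> aligned sequence) into one supermatrix.

    Hypothetical helper: taxa absent from a gene receive a run of the
    `missing` symbol as long as that gene's alignment.
    """
    gene_lengths: Dict[str, int] = {}
    for gene in gene_order:
        lengths = {len(seq) for seq in genes[gene].values()}
        if len(lengths) != 1:
            # Every sequence within one gene alignment must have equal length.
            raise ValueError(f"alignment for {gene!r} is not rectangular")
        gene_lengths[gene] = lengths.pop()

    # The taxon set of the supermatrix is the union over all genes.
    taxa = sorted({taxon for gene in gene_order for taxon in genes[gene]})

    return {
        taxon: "".join(
            genes[gene].get(taxon, missing * gene_lengths[gene])
            for gene in gene_order
        )
        for taxon in taxa
    }


if __name__ == "__main__":
    # Toy data only; real alignments are far longer.
    toy = {
        "ssu": {"taxonA": "ACGT", "taxonB": "AC-T"},
        "actin": {"taxonA": "MK", "taxonC": "ML"},
    }
    for taxon, row in concatenate_alignments(toy, ["ssu", "actin"]).items():
        print(taxon, row)
```

Padding with a missing-data symbol keeps every row of the supermatrix the same length, so taxa sampled for only some of the genes can still be included in the combined analysis.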
Seventy-two sequences were characterized for this study: 47 are newly characterized, 14 were previously characterized from other strains, 1 was available as an EST in public databases, and 10 were previously published and confirmed here (see Additional file 1). These sequences include representatives of all six 'Chromalveolata' groups, thereby sampling a sizable fraction of the diversity in this supergroup. This is critical with respect to overall eukaryotic diversity because 'Chromalveolata' contain about one-half of the recognized species of protists and algae [35]. In addition, eight of the ten 'Excavata' lineages were included in our study. Finally, we also added genes from several lineages within the 'Rhizaria', another poorly supported eukaryotic supergroup [14].