Understanding the early evolution and diversification of eukaryotes relies on a fully resolved phylogenetic tree. In recent years, most eukaryotic diversity has been assigned to six putative supergroups, but the evolutionary origin of a few major “orphan” lineages remains elusive. Two ecologically important orphan groups are the heterotrophic Telonemia and Centroheliozoa. Telonemids have been proposed to be related to the photosynthetic cryptomonads or stramenopiles and centrohelids to haptophytes, but molecular phylogenies have failed to provide strong support for any phylogenetic hypothesis. Here, we investigate the origins of Telonema subtilis (a telonemid) and Raphidiophrys contractilis (a centrohelid) by large-scale 454 pyrosequencing of cDNA libraries and including new genomic data from two cryptomonads (Guillardia theta and Plagioselmis nannoplanctica) and a haptophyte (Imantonia rotunda). We demonstrate that 454 sequencing of cDNA libraries is a powerful and fast method of sampling a high proportion of protist genes, which can yield ample information for phylogenomic studies. Our phylogenetic analyses of 127 genes from 72 species indicate that telonemids and centrohelids are members of an emerging major group of eukaryotes also comprising cryptomonads and haptophytes. Furthermore, this group is possibly closely related to the SAR clade comprising stramenopiles (heterokonts), alveolates, and Rhizaria. Our results link two additional heterotrophic lineages to the predominantly photosynthetic chromalveolate supergroup, providing a new framework for interpreting the evolution of eukaryotic cell structures and the diversification of plastids.
The phylum Telonemia encompasses only two formally described heterotrophic zooflagellate species, Telonema subtilis and Telonema antarcticum (Klaveness et al. 2005), but a study of environmental sequences identified a large number of unknown representatives of this phylum in marine plankton (Shalchian-Tabrizi et al. 2007). Telonemids are of pivotal evolutionary significance because they exhibit a unique combination of cellular structures that have only been found separately in different eukaryotic lineages belonging to the chromalveolate supergroup, suggesting that they may represent a transitional form between deeply diverging eukaryotes (Shalchian-Tabrizi et al. 2006). Thus far, molecular support for the position of telonemids relative to other eukaryotes remains weak, with a 3-gene analysis suggesting a position close to plastid-bearing cryptomonads (Shalchian-Tabrizi et al. 2006) and a 6-gene study favoring an association with stramenopiles (Reeb et al. 2009).
Heliozoans, on the other hand, are a large and ultrastructurally diverse array of axopodia-bearing, largely heterotrophic protists that have recently been confirmed to be polyphyletic (Nikolaev et al. 2004; Sakaguchi et al. 2005). Although some small groups of heliozoans are now known to belong to Rhizaria or stramenopiles (heterokonts), the most distinctive core group of Heliozoa, the invariably heterotrophic Centroheliozoa, is the last substantial eukaryotic group not yet clearly placed on the tree. Based on weakly supported 18S ribosomal RNA trees and some intriguing ultrastructural similarities, a possible relationship between centrohelids and haptophytes was recently suggested (Cavalier-Smith and von der Heyden 2007), but molecular trees (even based on as many as seven genes) are generally inconsistent and inconclusive (Sakaguchi et al. 2007).
In contrast to single-gene phylogenies, phylogenomics has proven very useful for placing important species of uncertain affinity on the tree of eukaryotes (e.g., the breviate amoebae and Ministeria; Minge et al. 2009 and Shalchian-Tabrizi et al. 2008). We thus opted for a genomic scale approach to infer the phylogenetic position of the telonemids and centrohelids. In order to generate sufficient amounts of data for this purpose, we used massively parallel 454 pyrosequencing on normalized cDNA libraries from these poorly understood protists. A total of 213,350 and 363,490 sequence reads from T. subtilis and R. contractilis were obtained, respectively, with read lengths for both data sets averaging 232 bp. Of these, 184,838 (87%) could be assembled into 26,013 contigs for T. subtilis and 327,570 (90%) into 30,120 contigs for R. contractilis (cutoff for assembling: 100 bp). The average length of the contigs was 302 bp for T. subtilis (median = 233; standard deviation [SD] = 197) and 299 bp for R. contractilis (median = 236; SD=180). In T. subtilis, 3,041 contigs (12%) were larger than 500 bp, of which 381 were larger than 1,000 bp; in R. contractilis, 3,276 contigs (11%) were larger than 500 bp, of which 329 were larger than 1,000 bp (fig. 1A). As expected from normalized libraries (see Supplementary Material online), most contigs were comprised of a small number of reads (10 or less); yet, the high throughput pyrosequencing approach yielded 4,372 (17%) and 7,525 contigs (25%) that contained 11 or more reads for T. subtilis and R. contractilis, respectively (fig. 1B). Interestingly, the 150,140 additional reads for R. contractilis seemed mostly to increase the number of contigs with many reads (fig. 1B). For example, we observed nearly 8 times the number of contigs with 100 reads or more in R. contractilis than we did in T. subtilis, whereas both species had a comparable number of contigs with 2–10 reads. 
This suggests that we may have sampled the majority of the expressed genes for R. contractilis. Yet, even for this species, the total number of contigs obtained (30,120) probably substantially overestimates the total number of genes present in the genome as many contigs are so short that many genes are represented by two or more contigs.
In addition to T. subtilis and R. contractilis, new data for the cryptomonads Guillardia theta and Plagioselmis nannoplanctica as well as the haptophyte Imantonia rotunda were also generated (see Supplementary Material online), leading to a much improved sampling for the cryptomonad/haptophyte group compared with earlier phylogenomic studies (Patron et al. 2007; Burki et al. 2008). These new sequences, together with publicly available data, were used to construct a multigene alignment (supermatrix) containing 127 genes (29,235 amino acid positions) and a taxon-rich sampling of 72 species belonging to all supergroups of eukaryotes. Importantly, all species were carefully selected to minimize the impact of heterogeneity in evolutionary rates by excluding long-branched taxa.
Our concatenated data set was first analyzed by Bayesian (phylobayes—CAT model; Lartillot and Philippe 2004) and maximum likelihood (ML) methods (RAxML—RTREV model; Stamatakis 2006). Figure 2 shows an unrooted Bayesian consensus tree, with ML bootstrap values (BPs) and Bayesian posterior probabilities (PPs) indicated. All main eukaryotic assemblages were recovered with moderate to maximum support and are consistent with the most recent published studies of eukaryote evolution (Burki et al. 2007, 2008; Rodriguez-Ezpeleta et al. 2007; Hampl et al. 2009; Minge et al. 2009). Amoebozoa and opisthokonts robustly grouped together (the unikonts) to the exclusion of excavates (only monophyletic in ML analyses) and a megagroup composed of all other eukaryotes (78% BP; 1.0 PP). Within this megagrouping, Plantae were monophyletic, as was the stramenopile, alveolate, and Rhizaria clade (the SAR group; Burki et al. 2007) and the haptophyte/cryptomonad clade. Remarkably, in all analyses, T. subtilis and R. contractilis grouped with cryptomonads and haptophytes in a moderately supported clade (node 1; fig. 2: 70% BP; 0.88 PP). Within this group (henceforth the group composed of cryptomonads, centrohelids, telonemids, and haptophytes is referred to as the CCTH group), relationships were essentially unresolved and the Bayesian and ML methods yielded different but unsupported branching patterns (fig. 2). In addition, all analyses placed the CCTH group as sister to the SAR group with moderate support (node 2; fig. 2: 65% BP; 0.99 PP), and this major assemblage branched with the plants (78% BP; 1.0 PP). In an attempt to evaluate further alternative positionings of T. subtilis and R. contractilis, a procedure that randomly sampled among the 127 genes to construct 200 bootstrap replicates was applied. Each of these concatenated alignments was then analyzed by the ML method. 
A clade containing telonemids, centrohelids, and haptophytes was obtained in the majority rule consensus tree and recovered in 41% of the replicates (supplementary fig. 1, Supplementary Material online). The CCTH group was inferred in 35 trees (18%); this low number was notably due to the tendency of cryptomonads to branch as sister to some excavates, a relationship that was not observed after excluding T. subtilis and R. contractilis from the analysis (cryptomonads grouped with haptophytes in 61% of the trees; supplementary fig. 2, Supplementary Material online). Of much interest, T. subtilis and R. contractilis branched together in 133 trees (67%) regardless of their association with other members of the CCTH group, supporting the affinity that exists between these lineages in our “standard” supermatrix approach. With the exception of cryptomonads, this analysis also placed T. subtilis, R. contractilis, and haptophytes as sister to the SAR group (66 trees, 33%). Thus, the bootstrapping of genes approach showed overall similar relationships compared with the original data set and did not reveal supported alternative placements in the tree for the CCTH members.
Previous studies have shown that phylogenomic analyses treating multigene data sets as concatenated alignments may not sufficiently account for the evolutionary specificities of each gene and potentially introduce tree reconstruction artifacts (Bapteste et al. 2002; Philippe et al. 2004; Patron et al. 2007). We therefore conducted a “separate” analysis that takes into account the difference in evolutionary tempos and modes across genes. This analysis specifically examined the relationships among 8 major groups: 1) T. subtilis, 2) R. contractilis, 3) cryptomonads, 4) haptophytes, 5) the SAR group, 6) Plantae, 7) excavates, and 8) unikonts (opisthokonts + Amoebozoa). Because not all genes in the original selection contained at least one representative taxon for each group of interest, a subset of 87 genes (amounting to 19,270 amino acids) was selected from the total 127 genes used in the concatenation (supplementary table S1, Supplementary Material online). This analysis resulted in the same relationships observed in the Bayesian analysis of the supermatrix (RELL BPs indicated on fig. 2), notably recovering a T. subtilis plus R. contractilis clade (65% RELL BP) that formed a group with haptophytes and cryptomonads (69% RELL BP; fig. 2). Furthermore, this approach was consistent with the concatenated analysis in positioning the CCTH group together with the SAR group (92% RELL BP). Topology comparisons using the approximate unbiased (AU) test strongly confirmed the monophyletic association between the CCTH and SAR lineages. Indeed, only 19 of 351 test trees were not rejected at the 5% level, among which 18 trees contained a clade comprising T. subtilis, R. contractilis, haptophytes, cryptomonads, and the SAR group, to the exclusion of all other eukaryotes (table 1).
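The RELL bootstrap proportions reported for the separate analysis can be illustrated with a minimal sketch. It assumes per-site log-likelihoods are already available for every test tree; the function name `rell_bootstrap`, the default number of replicates, and the choice of sites as the resampling unit are illustrative assumptions, not a description of how TotalML is implemented.

```python
import random


def rell_bootstrap(site_lnl, n_replicates=1000, seed=None):
    """Estimate RELL bootstrap proportions for a set of test trees.

    site_lnl: one list per tree, each holding that tree's per-site
    log-likelihoods (all lists must have the same length).
    Returns the fraction of replicates in which each tree had the
    highest resampled log-likelihood.
    """
    rng = random.Random(seed)
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    if any(len(tree) != n_sites for tree in site_lnl):
        raise ValueError("every tree needs the same number of sites")
    wins = [0] * n_trees
    for _ in range(n_replicates):
        # Resample site indices with replacement; likelihoods are reused,
        # not re-estimated, which is what makes RELL cheap.
        sample = rng.choices(range(n_sites), k=n_sites)
        totals = [sum(tree[i] for i in sample) for tree in site_lnl]
        best = max(range(n_trees), key=totals.__getitem__)
        wins[best] += 1
    return [w / n_replicates for w in wins]
```

Under this sketch, the support for a clade would be the sum of the proportions of all test trees that contain it.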
Telonema subtilis and R. contractilis are two heterotrophic unicellular eukaryotes that represent groups which are among the most difficult to place within the tree of eukaryotes (Cavalier-Smith and Chao 2003; Nikolaev et al. 2004; Shalchian-Tabrizi et al. 2006; Cavalier-Smith and von der Heyden 2007; Sakaguchi et al. 2007; Reeb et al. 2009). By generating large molecular data sets from both lineages, we have provided convincing and congruent evidence suggesting that telonemids and centrohelids both have evolutionary affinities with haptophytes and cryptomonads and more generally with the SAR group. Nevertheless, uncertainties remain. Despite the use of a very large data set, there are three important reasons why we failed to recover a more highly resolved topology. First, our analyses strongly confirm earlier indications from multigene trees that T. subtilis and R. contractilis are not closely related to any known eukaryotic lineage (Shalchian-Tabrizi et al. 2006; Sakaguchi et al. 2007; Reeb et al. 2009) and may have diverged soon after the origin of the CCTH-SAR grouping. If true, such early divergence would have allowed relatively few sequence synapomorphies to arise during the brief period of shared common ancestry and would have led to the loss of much of that phylogenetic signal during the far longer period of subsequent evolution. Second, one expects to observe a decrease in statistical support when early diverging species are added to a phylogeny (Sanderson and Wojciechowski 2000). In keeping with this expectation, after removing T. subtilis and R. contractilis from our multigene alignment, support for the haptophyte/cryptomonad/SAR group and its sister grouping with Plantae both increased substantially (supplementary fig. 3, Supplementary Material online). Finally, T. subtilis and R. contractilis are the only species from telonemids and centrohelids for which genomic data are available; similar samples from additional representatives of these and other related lineages are needed before their phylogenetic position can be determined conclusively. Of particular importance are the heterotrophic flagellate katablepharids that have been proposed to be sister to haptophytes or cryptomonads on the basis of a handful of genes (Okamoto and Inouye 2005; Kim and Graham 2008) or classified with cryptomonads and telonemids based on ultrastructure (Cavalier-Smith 2004) and the as-yet uncultured biliphytes that might be related to cryptomonads (Not et al. 2007; Cuvelier et al. 2008).
If the relationships between these lineages and the CCTH-SAR clade are confirmed, a new major assemblage is emerging with important implications for understanding the early evolution of eukaryotes. Recently, multigene analyses (Hackett et al. 2007; Patron et al. 2007) and a shared lateral transfer of bacterial rpl36 to their plastid genomes (Rice and Palmer 2006) suggested that cryptomonads and haptophytes form a clade. Taken together, the evidence that katablepharids and possibly biliphytes are related to cryptomonads, and our demonstration that telonemids and centrohelids may also be part of the cryptomonad/haptophyte clade, substantially increase the organismal diversity and importance of this novel phylogenetic group. Our results also provide additional confirmation of the monophyly of the SAR clade, and, because of our carefully chosen taxon sampling, the possibility that this grouping was a long-branch artifact (Cavalier-Smith 2009) is now reduced. Moreover, the monophyly of SAR was recently strengthened by the discovery of a shared paralogy in the Rab1A gene family, representing the first synapomorphy associated with the origin of the group (Elias et al. 2009). The Rab1A paralogy is found in each completely sequenced genome and several expressed sequence tag (EST) data sets of stramenopiles, alveolates, and chlorarachniophytes (Elias et al. 2009), as well as in at least two other rhizarians (Reticulomyxa filosa and Gromia sphaerica, data not shown).
Altogether, the phylogenomic data presented here suggest that the chromalveolate assemblage should be expanded to include Rhizaria and at least two additional poorly known lineages for which plastids have never been reported, telonemids and centrohelids. However, more data are urgently needed to rule out the possibility that these relationships are affected by undetected endosymbiotic gene transfers and replacements (Lane and Archibald 2008), which, given the very limited genomic data available for red algae, are currently hard to identify. In particular, the affinities between some members of the CCTH group and Plantae, as observed in several recent multigene phylogenies (Patron et al. 2007; Burki et al. 2008; Hampl et al. 2009; Minge et al. 2009), need to be further tested. This is particularly important because AU tests based on our supermatrix (contrasting with the separate analysis, see above) failed to reject that alternative relationship at the 5% level (P = 0.072 and P = 0.06 for the branching pattern within CCTH corresponding to the Bayesian tree or the ML tree, respectively) or a hypothetical grouping of haptophytes and cryptomonads alone with Plantae (P = 0.079). Other scenarios, such as the recent Plastidophila hypothesis (Kim and Graham 2008), also need to be specifically addressed. Essentially based on the phylogenetic signal present in just one protein (eukaryotic translation elongation factor 2, eEF2), that hypothesis challenged the monophyly of both Plantae and chromalveolates. However, Kim and Graham's interpretation of a two amino acid signature (SA) in the eEF2 protein as evidence supporting a grouping of green plants, red algae, haptophytes, cryptomonads, and katablepharids (to the exclusion of glaucophytes, alveolates, stramenopiles, and Rhizaria, which have GS) is weakened by its absence in both T. subtilis and R. contractilis, which possess GS and GA amino acid residues, respectively, as well as by other contradictions to the oversimplified signature sequence distribution they noted (i.e., Ustilago, Rhizopus, Schizosaccharomyces, Reclinomonas, Jakoba, and an Acanthamoeba all have AS instead of GS; Malawimonas californiana has AL and Spironucleus has GA).
Many members of cryptomonads, haptophytes, stramenopiles, and alveolates possess chlorophyll-c–containing plastids that are, under the chromalveolate hypothesis (Cavalier-Smith 1999), postulated to have originated by a single secondary endosymbiosis of a red alga in the ancestor of all these lineages. A photosynthetic ancestry for all chromalveolates is suggested by the history inferred from plastid phylogenies (e.g., Iida et al. 2007; Khan et al. 2007) and rare genomic events such as endosymbiotic gene replacements (Fast et al. 2001; Harper and Keeling 2003; Patron et al. 2004). The unexpected phylogenetic position of Rhizaria, for which no red algal-derived plastid-bearing lineages are known, most closely related to alveolates and stramenopiles caused some controversy over the chromalveolate hypothesis (Burki et al. 2007, 2008; Hackett et al. 2007; Rodriguez-Ezpeleta et al. 2007). Taking into account these new relationships as well as general difficulties associated with the chromalveolate hypothesis, in particular concerning the lack of clear evidence for the relative difficulty of plastid gain versus plastid loss, alternative models for the origin and spread of red algal-derived plastids have been proposed (Sanchez-Puerta and Delwiche 2008; Archibald 2009; Bodyl et al. 2009).
In this context, an important question raised by our results is whether expanding the group of eukaryotes with red algal-derived plastids to include additional nonphotosynthetic lineages is still compatible with the chromalveolate hypothesis. Indeed, the addition of telonemids, centrohelids, and Rhizaria involves additional photosynthesis and/or plastid loss events to explain the observed distribution of photosynthesis under the assumption that the common ancestor of all chromalveolates was photosynthetic. However, recent discoveries of cryptic plastids and genes of putative red algal origin in nonphotosynthetic chromalveolates illustrated the difficulty in distinguishing between the absence of a plastid and the absence of photosynthesis (Tyler et al. 2006; Reyes-Prieto et al. 2008; Slamovits and Keeling 2008), indicating that the chromalveolate hypothesis remains reasonable in spite of the numerous heterotrophic lineages it comprises.
In conclusion, the genomes of telonemids and centrohelids, as well as other nonphotosynthetic lineages such as Rhizaria, katablepharids, or early divergent stramenopiles potentially bear important information to evaluate the chromalveolate hypothesis and other scenarios. Indeed, remnant algal-derived genes might still persist in these nuclear genomes, which would be very helpful to favor either a photosynthetic ancestry for the chromalveolates or the possibility that plastids were transferred between the CCTH and SAR groups by serial endosymbioses.
All reads were assembled into contigs using the Newbler assembler with default parameters. We searched among contigs larger than 200 bp for sequences with significant similarity to genes recently used in multigene phylogenies using the following rigorous procedure (Burki et al. 2007, 2008): 1) BlastP searches against the translated set of T. subtilis and R. contractilis contigs using as queries the single-gene sequences composing our multiple alignments; 2) retrieving the new homologous copies (with a stringent E-value cutoff of 10^-40) and adding them to existing single-gene alignments; 3) automatic alignments using Mafft (Katoh et al. 2002), followed by manual inspection to remove ambiguously aligned positions; 4) testing the orthology, in particular possible lateral or ancestral endosymbiotic gene transfer, for each of the selected genes by performing single-gene ML reconstructions using Treefinder (WAG substitution matrix and six gamma categories; Jobb et al. 2004) and visually inspecting the resulting individual trees. We retained a set of 127 genes (29,235 amino acid positions) that did not show any obvious problem of deep paralogy or nonvertical transmission and 72 species excluding fast-evolving taxa used previously (Burki et al. 2008; the rhizarians Reticulomyxa and Quinqueloculina; the stramenopile Blastocystis; and the excavates Sawyeria, Leishmania, and Trypanosoma) when more slowly evolving lineages were available. Importantly, care was taken to correctly distinguish the Imantonia sequences from the Telonema library and the Chlorogonium sequences from the Raphidiophrys library. These species were kept in the single genes only when unambiguous sequence attributions were recovered, so for the trees shown here there was no confusion between the genes of the centrohelid and telonemid and those of their cocultured algal food. 
Monophyletic groups corresponding to haptophytes (including Imantonia but excluding Telonema) and green algae (including Chlorogonium but excluding Raphidiophrys) were mandatory in order to consider sequences from these species for concatenation. The final concatenation of all single-gene alignments was done using Scafos (Roure et al. 2007). Because of the limited data available for certain groups and to maximize the number of genes for each taxonomic assemblage, some lineages were represented by different closely related species always belonging to the same genus.
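The concatenation step can be sketched as follows. This is a minimal illustration of building a supermatrix from single-gene alignments, not a reimplementation of Scafos; the function name `concatenate`, the `?` missing-data symbol, and the toy sequences are assumptions made for the example.

```python
def concatenate(alignments, missing="?"):
    """Concatenate single-gene alignments into one supermatrix.

    alignments: list of dicts mapping taxon name -> aligned sequence.
    Within a gene, all sequences must have the same length. A taxon
    that lacks a gene is padded with the missing-data symbol.
    """
    taxa = sorted({taxon for gene in alignments for taxon in gene})
    parts = {taxon: [] for taxon in taxa}
    for gene in alignments:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(gene.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy example with two short "genes" (hypothetical sequences).
genes = [
    {"Telonema": "MKV", "Raphidiophrys": "MRV", "Guillardia": "MKI"},
    {"Telonema": "GA-L", "Guillardia": "GASL"},
]
supermatrix = concatenate(genes)
```

In this toy case the taxon missing from the second gene receives four missing-data characters, so every row of the supermatrix has the same length.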
RAxML (Stamatakis 2006) was used in combination with the RTREV amino acid replacement matrix. The best ML tree was determined with the PROTGAMMA implementation in multiple inferences using 10 randomized maximum parsimony starting trees. Statistical support was evaluated with 200 bootstrap replicates. Four independent runs from different starting trees were performed on each replicate in order to prevent the analysis from getting trapped in a local maximum. The tree with the best log likelihood was selected for each replicate, and the 200 resulting trees were used to calculate the bootstrap proportions. To reduce the computational burden, the PROTMIX solution was chosen with 25 distinct rate categories. Phylobayes (Lartillot and Philippe 2004) was run using the site-heterogeneous mixture CAT model and two independent Markov chains with a total length of 8,000 cycles, discarding the first 1,000 points as burn-in, and calculating the posterior consensus on the remaining trees. Convergence between the two chains was ascertained by comparing the frequency of their bipartitions. The bootstrapping of genes analysis used a new Perl script that allows individual genes to be sampled (with replacement) to create new matrices containing the same number of genes as the original concatenated alignment. Specifically, 200 replicates containing 127 concatenated genes were constructed by sampling from within the initial pool of 127 genes and analyzed by ML as described above. To assess the robustness of the phylogenetic position of the CCTH group (or the position of the haptophytes and cryptomonads in the absence of T. subtilis and R. contractilis), we conducted topology comparisons using the AU test based on our supermatrix. Alternative topologies were generated by moving the CCTH group (or haptophytes and cryptomonads alone) either as sister to Plantae or to any possible position within Plantae. 
Site likelihoods were then calculated using RAxML, and the AU test was performed using CONSEL v.0.1 (Shimodaira and Hasegawa 2001) with default scaling and replicate values.
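The gene-level bootstrap described above (sampling whole genes with replacement, then concatenating them) can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not the Perl script used in the study; the function names `sample_genes` and `build_replicate` and the `?` padding symbol are hypothetical.

```python
import random


def sample_genes(gene_names, n_replicates, seed=None):
    """Draw gene-level bootstrap replicates.

    Each replicate lists as many genes as the original pool, drawn
    with replacement, so a gene may appear several times or not at all.
    """
    rng = random.Random(seed)
    pool = list(gene_names)
    return [rng.choices(pool, k=len(pool)) for _ in range(n_replicates)]


def build_replicate(alignments, sampled, missing="?"):
    """Concatenate the sampled genes into one replicate supermatrix.

    alignments: dict mapping gene name -> {taxon: aligned sequence}.
    Taxa lacking a gene are padded with the missing-data symbol.
    """
    taxa = sorted({t for gene in alignments.values() for t in gene})
    parts = {t: [] for t in taxa}
    for name in sampled:
        gene = alignments[name]
        width = len(next(iter(gene.values())))
        for t in taxa:
            parts[t].append(gene.get(t, missing * width))
    return {t: "".join(chunks) for t, chunks in parts.items()}
```

With a pool of 127 gene names and `n_replicates=200`, `sample_genes` yields replicate designs of the size used here; each resulting matrix would then be passed to the ML program separately.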
In the separate analysis, we exhaustively examined the 10,395 test trees resulting from the applied constraints on eight major groups of eukaryotes. Log likelihoods for each test tree were calculated under the RTREV + Γ model using RAxML. RELL BPs were calculated using TotalML in Molphy v.2.3 (Adachi and Hasegawa 1996). Out of these 10,395 possible topologies, we subjected 351 test trees to the AU test. Specifically, we considered the 347 trees possessing the unikonts–excavates bifurcation that were closer than five standard error units from the ML tree (this restricted number of tested trees was due to computational burden) and four additional trees that were constructed by 1) moving T. subtilis to the branch leading to unikonts, 2) moving T. subtilis to the branch leading to excavates, 3) moving R. contractilis to the branch leading to unikonts, and 4) moving R. contractilis to the branch leading to excavates.
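The figure of 10,395 test trees matches the number of distinct unrooted, fully bifurcating topologies for eight terminal groups, which is the double factorial (2n − 5)!! with n = 8. A short sketch of that count follows; the function name `count_unrooted_trees` is chosen here for illustration.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted, fully bifurcating trees on n_taxa leaves.

    Uses (2n - 5)!! = 3 * 5 * ... * (2n - 5), valid for n_taxa >= 3.
    """
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3 .. 2n-5
        count *= k
    return count


n_test_trees = count_unrooted_trees(8)
```

The 351 trees passed to the AU test are then the 347 trees near the ML tree plus the four manually constructed alternatives.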
Research Council of Norway (the FUGE and FRIBIO programs) to K.S.J.; Swiss National Science Foundation (3100A0–112645 and 31003A–125372); Centre for Microbial Diversity and Evolution from the Tula Foundation to A.H.; and Japanese Society for the Promotion of Sciences (201242) to Y.I.
Supplementary figures 1–3 and table S1 are available at Genome Biology and Evolution online (http://www.oxfordjournals.org/our_journals/gbe/). New sequences that were generated in this study and included in the concatenated alignment were deposited in EMBL nucleotide sequence database (GenBank accession numbers: Telonema subtilis FN392439–FN392557; Raphidiophrys contractilis FN392328–FN392438; Plagioselmis nannoplanctica FN392589–FN392620; Imantonia rotunda FN392558–FN392588) and GenBank (accession numbers: Guillardia theta GO787145–GO787665).
We thank Daniel Vaulot for providing the Telonema subtilis culture. All analyses were performed on the freely available Bioportal at the University of Oslo (http://www.bioportal.uio.no) and the Vital-IT (http://www.vital-it.ch) Center for high-performance computing of the Swiss Institute of Bioinformatics. We thank Ave Tooming-Klunderud and Lex Nederbragt at the Ultra-high throughput sequencing laboratory at the University of Oslo for help with 454 sequencing of T. subtilis. We also thank Tom Andersen for assistance with bootstrapping of genes. K.S.T. thanks the University of Oslo for starting grants and EMBIO for funding of the Bioportal. F.B. is grateful to the Fondation Ernst and Lucie Schmidheiny for funding part of the R. contractilis sequencing. The G. theta ESTs used in this study were generated by the Protist EST Program and the Joint Genome Institute's Community Sequencing Program (http://www.jgi.doe.gov/sequencing/why/50026.html). We thank Kerrie Barry and Erika Lindquist of the JGI for project management and data availability.