Sequencing of 81 clones from an EES of a small river, the Seymaz (Geneva, Switzerland), yielded 58 distinct SSU rRNA phylotypes. The size of the sequences varies from 760 to 900 base pairs, which is consistent with the average size expected for the amplified fragment (helices 27 to 50 of the SSU rRNA secondary structure). Size variations occur mainly in the variable region V7, but expansions were also observed in the variable region V8 for some sequences. The newly obtained SSU rRNA phylotypes were added to a general alignment of eukaryotes, including most complete or nearly complete sequences from EES available in GenBank. Sequences from cultured organisms were selected so that all major taxonomic groups of eukaryotes were represented; only extremely divergent lineages such as microsporidia and metamonads were omitted. Manual alignment of our sequences allowed the identification of 10 chimeras, which were initially detected because different regions of the same sequence contained rare substitutions and/or indels that are specific to different groups of eukaryotes. Distance analyses based on different subsets of unambiguously aligned regions (partial treeing analysis [12]) were then used to confirm the chimeric nature of these sequences (see Additional file 1 for detailed examples of how we detected chimeric sequences).
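The idea behind partial treeing analysis can be illustrated with a minimal Python sketch. It is not the procedure used in this study: it assumes uncorrected p-distances instead of a full distance-tree analysis, a single hypothetical breakpoint (`split`), and toy aligned sequences. The names `p_distance`, `nearest_reference`, and `looks_chimeric` are invented for illustration. A sequence is flagged when its two halves are closest to references from different groups.

```python
def p_distance(a, b):
    """Proportion of differing sites, skipping columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return float("inf")  # no comparable sites in this region
    return sum(x != y for x, y in pairs) / len(pairs)


def nearest_reference(query, references, start, end):
    """Name of the reference closest to the query within alignment columns [start, end)."""
    return min(
        references,
        key=lambda name: p_distance(query[start:end], references[name][start:end]),
    )


def looks_chimeric(query, references, groups, split):
    """Compare the group of the nearest reference on each side of a hypothetical breakpoint."""
    left = groups[nearest_reference(query, references, 0, split)]
    right = groups[nearest_reference(query, references, split, len(query))]
    return left != right, left, right


# Toy alignment (hypothetical data, all sequences have the same aligned length).
references = {
    "refA": "AAAAAAAACCCCCCCC",
    "refB": "GGGGGGGGTTTTTTTT",
}
groups = {"refA": "alveolates", "refB": "fungi"}

chimera = "AAAAAAAATTTTTTTT"  # left half resembles refA, right half resembles refB
clean = "AAAAAAAACCCCCCAC"    # resembles refA throughout, with one substitution

print(looks_chimeric(chimera, references, groups, split=8))
print(looks_chimeric(clean, references, groups, split=8))
```

In practice the breakpoint is unknown, so several subsets of unambiguously aligned regions would be compared, and a corrected distance with tree reconstruction would replace the nearest-reference shortcut used here.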
The phylogenetic position of the 48 non-chimeric phylotypes from our samples was assessed by minimum evolution analyses. Results are illustrated in Figure 1 (see Additional file 2 for a summary of the identification of all 81 sequences). The tree shown in Figure 1A is the result of an analysis of 86 partial eukaryotic SSU rRNA gene sequences, including five selected environmental phylotypes from previous studies. A total of 670 unambiguously aligned positions were included, and the GTR + G model of evolution was used (alpha = 0.37). Because of the short size of the amplified fragment, some phylogenetic signal was lost and the monophyly of cercozoans and fungi was not retrieved. Almost all phylotypes belong to already known eukaryotic groups. Their relative proportions are illustrated in Figure 1. Only two phylotypes (Sey010 and Sey017, represented by ten and two sequences, respectively) belong to an as yet undetermined, fast-evolving eukaryotic lineage (Figure 1). They clearly correspond to already published environmental sequences from deep-sea Antarctic plankton (DH148-5-EKD18 [3]), from the Guaymas Basin hydrothermal vent (CS_R003 [7]), and from anoxic marine sediments collected in Bolinas Tidal Flat (BOL1 cluster [5]). These phylotypes were screened by eye in search of rare sequence signatures that would support their inclusion in already known eukaryotic groups, but none could be detected, suggesting that this lineage might represent a novel high-level taxon.
Figure 1. Identification of the 48 distinct, non-chimeric eukaryotic phylotypes we obtained from our samples of the small river, the Seymaz (Geneva, Switzerland). (A) Phylogenetic positions of the 48 eukaryotic phylotypes we obtained. The tree shown is the result ...
In the second part of this work, we re-analysed 403 complete or nearly complete published environmental sequences, representing 289 distinct phylotypes. We focused on 28 phylotypes that could not be attributed to known groups of eukaryotes. First, our general alignment was screened by eye for the presence of specific sequence signatures, as described above. Notably, several previously undetected chimeras were identified in this way, including three phylotypes that had been considered novel high-level taxa; this result was confirmed by partial treeing analysis. The phylogenetic position of all non-chimeric phylotypes was analysed using Bayesian methods (Figures 2, 3, and 4; see Additional file 3 for a summary of the identification of all 403 sequences). In order to avoid the loss of important informative sites, none of our partial sequences were included in these analyses. The tree shown in Figure 2 is the result of a Bayesian analysis of 125 eukaryotic SSU rRNA gene sequences, including a selection of 56 phylotypes from environmental surveys. A total of 1,175 unambiguously aligned positions were included, and the GTR + G model of evolution was used (alpha = 0.44). Since resolution within alveolates and opisthokonts was poor (using only 1,175 sites), two additional datasets were designed to refine evolutionary relationships within these supergroups. Figure 3 presents the result of a Bayesian analysis of 77 alveolate SSU rRNA gene sequences, inferred from 1,325 unambiguously aligned positions, using the GTR + G model of evolution (alpha = 0.38). Figure 4 presents the result of a Bayesian analysis of 75 opisthokont SSU rRNA gene sequences, inferred from 1,395 unambiguously aligned positions, using the same model (alpha = 0.37). Remarkably, 10 of the 25 non-chimeric phylotypes that could not be attributed to known lineages of eukaryotes are now robustly identified as fast-evolving members of different well-known groups (mainly alveolates), and five other phylotypes can be linked to recently published sequences of various small eukaryotic lineages (Figures and ). Figure 5 summarizes the proportion of phylotypes belonging to each of the higher-level eukaryotic groups identified in EES as previously published (Figure 5A) and after our re-analysis (Figure 5B).
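The bookkeeping behind these counts can be checked with a short Python sketch. The input numbers come from the text; the function name `reanalysis_tally` is hypothetical, and the "remaining undetermined" count is an assumption obtained by subtraction (it presumes the reassigned and linked phylotypes do not overlap), not a figure stated in the text.

```python
def reanalysis_tally(unattributed, chimeras, reassigned, linked):
    """Derive the non-chimeric and still-undetermined counts from the reported numbers."""
    non_chimeric = unattributed - chimeras
    # Assumption: 'reassigned' and 'linked' are disjoint subsets of the non-chimeric set.
    remaining = non_chimeric - reassigned - linked
    if remaining < 0:
        raise ValueError("counts are inconsistent: subsets exceed the non-chimeric total")
    return {"non_chimeric": non_chimeric, "remaining_undetermined": remaining}


# 28 unattributed phylotypes, 3 of them chimeras, 10 reassigned to known groups,
# 5 linked to recently published small eukaryotic lineages.
tally = reanalysis_tally(unattributed=28, chimeras=3, reassigned=10, linked=5)
print(tally)
```

The non-chimeric count reproduces the 25 phylotypes mentioned above; the derived remainder should be read only as a consistency check under the stated assumption.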
Figure 2. Bayesian phylogeny of eukaryotes based on the analysis of 125 complete or nearly complete SSU rRNA gene sequences (1,175 positions), including 56 selected environmental phylotypes (indicated in bold). The number of phylotypes belonging to each higher-level ...
Figure 3. Bayesian phylogeny of alveolates based on the analysis of 80 complete or nearly complete SSU rRNA gene sequences (1,325 positions), including 44 selected environmental phylotypes (indicated in bold). The number of phylotypes belonging to each of the five ...
Figure 4. Bayesian phylogeny of opisthokonts based on the analysis of 80 complete or nearly complete SSU rRNA gene sequences (1,395 positions), including 28 selected environmental phylotypes (indicated in bold). The number of phylotypes belonging to each opisthokont ...
Figure 5. Identification of the 289 published phylotypes we re-analysed. (A) As determined by their authors and (B) after our re-analysis, highlighting the relative proportion of previously undetected chimeras and the reduced number of phylotypes of undetermined ...