The trees constructed with each of the four approaches employed here reflect both the phylogenetic signal and the phenotypic (lifestyle) similarities or differences between organisms, but the relative contributions of these two types of information appear to differ substantially. The gene presence-absence analysis seemed to be dominated by the phenotypic signal, primarily that from gene loss. The tree based on conserved gene pairs appeared to combine phylogenetic information with major effects of horizontal transfer of operons. In contrast, the trees based on the distributions of the identity level of orthologs appear to be more meaningful phylogenetically, as indicated by the recovery of established high-level phylogenetic groups of bacteria, such as Proteobacteria and Gram-positive bacteria. The ability to correctly identify these major bacterial subdivisions and the absence of obviously wrong groupings lend credibility to the non-trivial clades present in these trees, in particular the spirochete-chlamydia clade. The same logic applies to the tree built from concatenated ribosomal protein sequences, which included two other non-trivial bacterial groupings, Aquifex-Thermotoga and Synechocystis-Mycobacterium-Deinococcus, the latter joining the Gram-positive branch. Furthermore, extensive testing of alternative topologies using the Kishino-Hasegawa test largely supported these new bacterial branches. The nature of this support becomes clearer when one examines the results of the protein family census: each of the potential new clades was indeed the most common among the observed topologies, but in no case was the excess of this topology overwhelming. Taken together, these results seem to shed light on the very notion of a "species tree". It appears that, at best, a species tree can be viewed as a prevailing phylogenetic trend which, as far as deep branchings are concerned, may not even apply to the majority of the genes in a genome.
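The census argument can be made concrete with a small count. For four taxa A, B, C and D there are only three possible unrooted topologies (AB|CD, AC|BD, AD|BC), so one grouping can be the most frequent while still being supported by fewer than half of the protein families. The sketch below assumes that each family's tree has already been reduced to one of these quartet labels; the function name `census` and the counts used are hypothetical and are not data from this study.

```python
from collections import Counter


def census(family_topologies):
    """Tally which alternative topology each protein family supports.

    family_topologies: one topology label per protein family (hypothetical
    quartet labels such as "AB|CD").
    Returns (most_common_label, its_count, its_fraction, is_majority).
    """
    if not family_topologies:
        raise ValueError("need at least one family")
    counts = Counter(family_topologies)
    label, count = counts.most_common(1)[0]
    fraction = count / len(family_topologies)
    # A topology can be the plurality winner without being a majority.
    return label, count, fraction, fraction > 0.5


# Hypothetical labels for illustration only.
families = ["AB|CD"] * 40 + ["AC|BD"] * 32 + ["AD|BC"] * 28
print(census(families))
```

Here AB|CD is the prevailing topology, yet it is supported by only 40% of the families, which is the sense in which a "species tree" may be a trend rather than a description of most genes.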
The potential new, deep relationships between bacterial lineages revealed during this analysis should be considered preliminary and treated with caution. Nevertheless, an evolutionary affinity between Cyanobacteria (Synechocystis) and Actinomycetes (Mycobacterium) appears plausible, particularly given that both groups possess well-developed and partly similar signal transduction systems [27]. The connection between the two hyperthermophilic bacteria, Aquifex and Thermotoga, also has an obvious biological meaning, although in this case particular caution is warranted, given the possibility of preferential horizontal gene exchange between these organisms, which inhabit similar environments. However, the strong support for this grouping obtained in the analysis of concatenated ribosomal proteins argues against horizontal transfer as the primary cause of the observed topology. Although recent studies on the phylogeny of ribosomal proteins suggest some horizontal transfer events, these seem to be largely restricted to bacteria-specific ribosomal proteins; in the universal set of ribosomal proteins, only one, S14, showed clear signs of horizontal transfer [28]. The potential deep phylogenetic connections uncovered during this analysis call for detailed genome comparisons in search of potential shared derived characters, such as unique protein domain architectures, that could support the new clades.
The major bacterial lineages are poorly resolved in rRNA-based trees [2] and in trees built from alignments of RNA polymerase subunits [30] and translation elongation factors [29]. In the currently accepted taxonomy, which is based primarily (but not exclusively) on 16S rRNA phylogenetic analysis, the bacterial lineages that this analysis suggests form higher-level clusters tend to form primary nodes under Bacteria (Chlamydiales, Spirochetales, Cyanobacteria, the Thermus-Deinococcus group, Aquificales, Thermotogales). Thus, the genome trees primarily suggest (however tentatively) new unifications based on deep phylogenetic connections, rather than splits of already established clades. A notable exception is the traditional unification of Actinomycetes, or high-G+C Gram-positive bacteria (represented here by Mycobacterium), with low-G+C Gram-positive bacteria (the Bacillus-Clostridium group) under Firmicutes (Gram-positive bacteria). Such a connection was not supported by any of the trees analyzed here, and it is also poorly, if at all, supported by the latest consensus trees for 16S rRNA, 23S rRNA and translation factor EF-Tu [29]. Therefore, it seems likely that the Firmicutes clade, at least in its present composition, does not exist. The new clade that might replace it consists of low-G+C Gram-positive bacteria and the potential Actinomycetes-Deinococcales-Cyanobacteria group (Fig. ). All methods of tree analysis applied here also challenge the traditional division of the archaeal kingdom into Euryarchaeota and Crenarchaeota, suggesting instead that Euryarchaeota could be a paraphyletic group with respect to Crenarchaeota or, in other words, that Crenarchaeota might have evolved from within the Euryarchaeota. However, the existence of a statistically supported alternative topology, with a sister-group relationship between Euryarchaeota and Crenarchaeota, allows for the possibility that the apparent paraphyly of Euryarchaeota is an artifact caused by rapid evolution in some euryarchaeal lineages, such as Halobacterium.
An independent phylogenetic study of concatenated ribosomal proteins has been published recently [32]. The main specific conclusion of that study was the apparent association of Synechocystis with Gram-positive bacteria, although instability of the tree topology, depending on the subset of sites used for the analysis, was noted. Another recent study addressed the issue of a global tree through phylogenetic analysis of 14 concatenated sets of orthologous proteins for which no strong evidence of horizontal transfer was available [33]. Notably, some of the unexpected groupings within the bacterial domain reported in that study coincide or overlap with those described here, namely, a spirochete-chlamydia clade and a Deinococcales-Cyanobacteria clade. The grouping of the latter clade with Actinomycetes, the unification of the Deinococcales-Cyanobacteria-Actinomycetes clade with Gram-positive bacteria, and the grouping of the two bacterial hyperthermophiles were not reproduced in the work of Brown and co-workers. The differences between the results of the two studies could be due to differences in the data sets analyzed, in the methods used or, most likely, in both. We should note that the present study used a substantially broader data set and more diverse methods of tree construction. We believe, however, that, in terms of the potential contribution of genome-wide phylogenetic analysis to phylogenetic taxonomy, the areas where different methods and independent analyses by different groups converge might be more important than the areas of discrepancy. Potential new clades revealed consistently in such independent studies appear to be strong candidates for new, high-level taxa.
The results of the present study suggest that genome trees based on new, integral criteria do not provide substantial advantages in phylogenetic reconstruction over more traditional, alignment-based methods expanded to the genomic scale. In fact, the latter seem to be more sensitive in detecting potential deep evolutionary relationships, and their sensitivity is expected to improve further as the number of completely sequenced genomes available for analysis increases. We believe, however, that this conclusion does not necessarily mean that genome trees, such as those based on the representation of genomes in orthologous sets or on the conservation of gene pairs, are useless. In addition to revealing some new phylogenetic affinities, they can alert researchers to other evolutionary phenomena, such as loss of similar gene sets in different organisms and preferential horizontal gene exchange between certain lineages.
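Why gene-content trees can reflect shared gene loss rather than descent is easiest to see with a toy distance on presence-absence profiles. The sketch below assumes a simple Jaccard-style distance (1 minus the number of shared orthologous clusters divided by the size of their union); this is not necessarily the measure used in this study, and the genome names and cluster identifiers are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of orthologous-cluster IDs."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles):
    """Pairwise distances for a dict mapping genome name -> set of cluster IDs."""
    return {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical presence sets: G2 and G3 have both lost clusters c4 and c5.
profiles = {
    "G1": {"c1", "c2", "c3", "c4", "c5"},
    "G2": {"c1", "c2", "c3"},
    "G3": {"c1", "c2", "c3"},
}
dist = distance_matrix(profiles)
for pair, d in dist.items():
    print(pair, round(d, 2))
```

G2 and G3 come out at distance 0.0 purely because they lost the same clusters, whether or not that loss happened independently; this is the kind of signal that lets such trees flag parallel gene loss even when it misleads phylogenetic inference.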