The tree of life is probably the single dominant metaphor that permeates the discourse of evolutionary biology, from the famous single illustration in Darwin's On the Origin of Species [1] to 21st-century textbooks. For about a century, from the publication of the Origin to the founding work in molecular evolution carried out by Zuckerkandl and Pauling in the early 1960s [2, 3], phylogenetic trees were constructed on the basis of phenotypic differences between organisms. Accordingly, every tree constructed during that century was an 'organismal' or 'species' tree by definition; that is, it was assumed to reflect the evolutionary history of the corresponding species. Zuckerkandl and Pauling introduced molecular phylogeny, but for the next two decades or so it was viewed simply as another, perhaps the most powerful, approach to the construction of species trees and, ultimately, the tree of life that would embody the evolutionary relationships between all lineages of cellular life forms. The introduction of rRNA as the molecule of choice for the reconstruction of the phylogeny of prokaryotes by Woese and co-workers [4, 5], which was accompanied by the discovery of a new domain of life – the Archaea – boosted hopes that the detailed, definitive topology of the tree of life could be within sight.
Even before the advent of extensive genomic sequencing, it had become clear that biologically important common genes of prokaryotes had experienced multiple horizontal gene transfers (HGTs), so the idea of a 'net of life' potentially replacing the tree of life was introduced [6, 7]. Advances in comparative genomics revealed that different genes very often had distinct tree topologies and, accordingly, that HGT seemed to be extremely common among prokaryotes (bacteria and archaea) [8-17], and could also have been important in the evolution of eukaryotes, especially as a consequence of endosymbiotic events [18-21]. These findings indicate that a true, perfect tree of life does not exist because HGT prevents any single gene tree from being an accurate representation of the evolution of entire genomes. The nearly universal realization that HGT among prokaryotes is common and extensive, rather than rare and inconsequential, led to the idea of 'uprooting' the tree of life, a development that is often viewed as a paradigm shift in evolutionary biology [11, 22, 23].
Of course, no amount of inconsistency between gene phylogenies caused by HGT or other processes can alter the fact that all cellular life forms are linked by a tree of cell divisions (Omnis cellula e cellula, quoting the famous motto of Rudolf Virchow – paradoxically, an anti-evolutionist [24]) that goes back to the earliest stages of evolution and is only violated by endosymbiotic events that were key to the evolution of eukaryotes but not prokaryotes [25]. Thus, the travails of the tree of life concept in the era of comparative genomics concern the tree as it can be derived by the phylogenetic (phylogenomic) analysis of genes and genomes. The claim that HGT uproots the tree of life is more accurately read to mean that extensive HGT has the potential to result in the complete decoupling of molecular phylogenies from the actual tree of cells. It should be kept in mind that the evolutionary history of genes also describes the evolution of the encoded molecular functions, so phylogenomic analyses have clear biological connotations. In this article we discuss the phylogenomic tree of life with this implicit understanding.
The views of evolutionary biologists on the changing status of the tree of life (see [23] for a conceptual discussion) span the entire range from persistent denial of the major importance of HGT for evolutionary biology [26, 27]; to 'moderate' overhaul of the tree of life concept [28-33]; to radical uprooting whereby the representation of the evolution of organisms (or genomes) as a tree of life is declared meaningless [34-36]. The moderate approach maintains that, all the differences between individual gene trees notwithstanding, the tree of life concept still makes sense as a representation of a central trend (consensus) that, at least in principle, could be elucidated by comprehensive comparison of tree topologies. The radical view counters that the reality of massive HGT renders illusory the very distinction between the vertical and horizontal transmission of genetic information, so that the tree of life concept should be abandoned altogether in favor of a (broadly defined) network representation of evolution [17]. The tree of life conundrum is perhaps epitomized by the recent debate on the tree that was generated from a concatenation of alignments of 31 highly conserved proteins and touted as an automatically constructed, highly resolved tree of life [37], only to be dismissed with the label of a 'tree of one percent' (of the genes in any given genome) [38].
Here we report an exhaustive comparison of approximately 7,000 phylogenetic trees for individual genes that collectively comprise the 'forest of life' and show that this set of trees does gravitate to a single tree topology, but that the deep splits in this topology cannot be unambiguously resolved, probably due to both extensive HGT and methodological problems of tree reconstruction. Nevertheless, computer simulations indicate that the observed pattern of evolution of archaea and bacteria better corresponds to a compressed cladogenesis model [39, 40] than to a 'Big Bang' model that includes non-tree-like phases of evolution [36]. Together, these findings seem to be compatible with the 'tree of life as a central trend' concept.