The elucidation of the dominant role of horizontal gene transfer (HGT) in the evolution of prokaryotes led to a severe crisis of the Tree of Life (TOL) concept and intense debates on this subject.
Prompted by the crisis of the TOL, we attempt to define the primary units and the fundamental patterns and processes of evolution. We posit that replication of the genetic material is the singular fundamental biological process and that replication with an error rate below a certain threshold both enables and necessitates evolution by drift and selection. Starting from this proposition, we outline a general concept of evolution that consists of three major precepts.
1. The primary agency of evolution consists of Fundamental Units of Evolution (FUEs), that is, units of genetic material that possess a substantial degree of evolutionary independence. The FUEs include both bona fide selfish elements such as viruses, viroids, transposons, and plasmids, which encode some of the information required for their own replication, and regular genes that possess quasi-independence owing to their distinct selective value that provides for their transfer between ensembles of FUEs (genomes) and preferential replication along with the rest of the recipient genome.
2. The history of replication of a genetic element without recombination is isomorphously represented by a directed tree graph (an arborescence, in the graph theory language). Recombination within a FUE is common between very closely related sequences where homologous recombination is feasible but becomes negligible for longer evolutionary distances. In contrast, shuffling of FUEs occurs at all evolutionary distances. Thus, a tree is a natural representation of the evolution of an individual FUE on the macro scale, but not of an ensemble of FUEs such as a genome.
3. The history of life is properly represented by the "forest" of evolutionary trees for individual FUEs (Forest of Life, or FOL). Search for trends and patterns in the FOL is a productive direction of study that leads to the delineation of ensembles of FUEs that evolve coherently for a certain time span owing to a shared history of vertical inheritance or horizontal gene transfer; these ensembles are commonly known as genomes, taxa, or clades, depending on the level of analysis. A small set of genes (the universal genetic core of life) might show a (mostly) coherent evolutionary trend that transcends the entire history of cellular life forms. However, it might not be useful to denote this trend "the tree of life", or organismal, or species tree because neither organisms nor species are fundamental units of life.
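Precept 2 can be made concrete with a small structural check. This is a minimal sketch that assumes a replication history is given as a list of hypothetical (parent, child) copy events; the function name is ours and does not come from the source. A recombinant sequence has two parental sequences, so it violates the in-degree condition and the history is no longer an arborescence.

```python
from collections import defaultdict

def is_arborescence(edges, nodes):
    """Return True if the directed graph (nodes, edges) is an arborescence:
    exactly one root with in-degree 0, every other node with in-degree 1,
    and every node reachable from that root.
    Assumes every node mentioned in `edges` is listed in `nodes`."""
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for parent, child in edges:
        indegree[child] += 1
        children[parent].append(child)
    roots = [n for n, d in indegree.items() if d == 0]
    # A recombinant (two parents) gives in-degree 2 and breaks the tree.
    if len(roots) != 1 or any(d > 1 for d in indegree.values()):
        return False
    # Walk down from the single root; unreachable nodes (e.g. a detached
    # cycle) mean the graph is not a tree.
    seen = set()
    stack = [roots[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            return False
        seen.add(node)
        stack.extend(children[node])
    return seen == set(nodes)
```

For example, the edges `[("anc", "a"), ("anc", "b"), ("a", "a1")]` form an arborescence, but adding a hypothetical recombinant `"r"` with parents `"a"` and `"b"` does not.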
A logical analysis of the units and processes of biological evolution suggests that the natural fundamental unit of evolution is a FUE, that is, a genetic element with an independent evolutionary history. Evolution of a FUE on the macro scale is naturally represented by a tree. Only the full compendium of trees for individual FUEs (the FOL) is an adequate depiction of the evolution of life. Coherent evolution of FUEs over extended evolutionary intervals is a crucial aspect of the history of life but a "species" or "organismal" tree is not a fundamental concept.
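One simple way to look for coherent trends in a forest of trees is to count how often each bipartition (split) of the taxa recurs across the individual trees, as in majority-rule consensus. The sketch below assumes that every tree covers the same leaf set and is already summarized as a list of splits; the encoding and function names are hypothetical, and real FOL analyses must also cope with trees that have different leaf sets.

```python
from collections import Counter

def canonical(split, leaves):
    """Represent a bipartition by the side that does not contain a fixed
    reference leaf, so that {A,B}|{C,D} and {C,D}|{A,B} compare equal."""
    ref = min(leaves)
    side = frozenset(split)
    return side if ref not in side else frozenset(leaves) - side

def split_support(forest, leaves):
    """Fraction of trees in the forest that contain each split.
    `forest` is a list of trees, each given as an iterable of splits."""
    counts = Counter()
    for tree_splits in forest:
        # Count each distinct split at most once per tree.
        counts.update({canonical(s, leaves) for s in tree_splits})
    return {split: n / len(forest) for split, n in counts.items()}
```

Splits with high support across many gene trees would correspond to groups that evolve coherently; splits found in only a few trees point to conflicting histories.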
This article was reviewed by Valerian Dolja, W. Ford Doolittle, Nicholas Galtier, and William Martin.
Comparative genomics and systems biology offer unprecedented opportunities for testing central tenets of evolutionary biology formulated by Darwin in the Origin of Species in 1859 and expanded in the Modern Synthesis 100 years later. Evolutionary-genomic studies show that natural selection is only one of the forces that shape genome evolution and is not quantitatively dominant, whereas non-adaptive processes are much more prominent than previously suspected. Major contributions of horizontal gene transfer and diverse selfish genetic elements to genome evolution undermine the Tree of Life concept. An adequate depiction of evolution requires the more complex concept of a network or ‘forest’ of life. There is no consistent tendency of evolution towards increased genomic complexity, and when complexity increases, this appears to be a non-adaptive consequence of evolution under weak purifying selection rather than an adaptation. Several universals of genome evolution have been discovered, including the invariant distributions of evolutionary rates among orthologous genes from diverse genomes and of paralogous gene family sizes, and the negative correlation between gene expression level and sequence evolution rate. Simple, non-adaptive models of evolution explain some of these universals, suggesting that a new synthesis of evolutionary biology might become feasible in the not-so-remote future.
The first bacterial genome was sequenced in 1995, and the first archaeal genome in 1996. Soon after these breakthroughs, an exponential rate of genome sequencing was established, with a doubling time of approximately 20 months for bacteria and approximately 34 months for archaea. Comparative analysis of the hundreds of sequenced bacterial and dozens of archaeal genomes leads to several generalizations on the principles of genome organization and evolution. A crucial finding that enables functional characterization of the sequenced genomes and evolutionary reconstruction is that the majority of archaeal and bacterial genes have conserved orthologs in other, often distant, organisms. However, comparative genomics also shows that horizontal gene transfer (HGT) is a dominant force of prokaryotic evolution, along with the loss of genetic material resulting in genome contraction. A crucial component of the prokaryotic world is the mobilome, the enormous collection of viruses, plasmids and other selfish elements, which are in constant exchange with more stable chromosomes and serve as HGT vehicles. Thus, the prokaryotic genome space is a tightly connected, although compartmentalized, network, a novel notion that undermines the ‘Tree of Life’ model of evolution and requires a new conceptual framework and tools for the study of prokaryotic evolution.
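The quoted doubling times translate into growth rates via the standard exponential-growth relation N(t) = N0 · 2^(t / Td), where Td is the doubling time. The short sketch below (the function name is ours) computes the annual growth factors implied by doubling times of 20 and 34 months; these are illustrations of the arithmetic, not measured genome counts.

```python
def genomes_after(n0: float, months: float, doubling_time_months: float) -> float:
    """Exponential growth with a fixed doubling time: N(t) = N0 * 2**(t / T_d)."""
    return n0 * 2 ** (months / doubling_time_months)

# Annual growth factors implied by the quoted doubling times
bacteria_per_year = genomes_after(1, 12, 20)  # ~1.52
archaea_per_year = genomes_after(1, 12, 34)   # ~1.28
```

Over five years, a 20-month doubling time multiplies the number of genomes by 2^3 = 8, whereas a 34-month doubling time yields only about a 3.4-fold increase.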
The year 2009 is the 200th anniversary of the publication of Jean-Baptiste Lamarck's Philosophie Zoologique and the 150th anniversary of Charles Darwin's On the Origin of Species. Lamarck believed that evolution is driven primarily by non-randomly acquired, beneficial phenotypic changes, in particular, those directly brought about by the use of organs, which Lamarck believed to be inheritable. In contrast, Darwin assigned greater importance to random, undirected change that provided material for natural selection.
The classic Lamarckian scheme appears untenable owing to the non-existence of mechanisms for direct reverse engineering of adaptive phenotypic characters acquired by an individual during its life span into the genome. However, various evolutionary phenomena that have come to the fore in the last few years seem to fit a more broadly interpreted (quasi)Lamarckian paradigm. The prokaryotic CRISPR-Cas system of defense against mobile elements seems to function via a bona fide Lamarckian mechanism, namely, by integrating small segments of viral or plasmid DNA into specific loci in the host prokaryote genome and then utilizing the respective transcripts to destroy the cognate mobile element DNA (or RNA). A similar principle seems to be employed in the piRNA branch of RNA interference, which is involved in defense against transposable elements in the animal germ line. Horizontal gene transfer (HGT), a dominant evolutionary process at least in prokaryotes, appears to be a form of (quasi)Lamarckian inheritance. The rate of HGT and the nature of acquired genes depend on the environment of the recipient organism and, in some cases, the transferred genes confer a selective advantage for growth in that environment, meeting the Lamarckian criteria. Various forms of stress-induced mutagenesis are tightly regulated and comprise a universal adaptive response to environmental stress in cellular life forms. Stress-induced mutagenesis can be construed as a quasi-Lamarckian phenomenon because the induced genomic changes, although random, are triggered by environmental factors and are beneficial to the organism.
Both Darwinian and Lamarckian modalities of evolution appear to be important, and reflect different aspects of the interaction between populations and the environment.
This article was reviewed by Juergen Brosius, Valerian Dolja, and Martijn Huynen. For complete reports, see the Reviewers' reports section.
It is well known that horizontal gene transfer (HGT) is a major force in the evolution of prokaryotes. During the adaptation of a bacterial population to a new ecological niche, and particularly for intracellular bacteria, selective pressures are shifted and ecological niches reduced, resulting in a lower rate of genetic connectivity. HGT and positive selection are therefore two important evolutionary forces in microbial pathogens that drive adaptation to new hosts. In this study, we use genomic distance analyses, phylogenomic networks, tree topology comparisons, and Bayesian inference methods to investigate to what extent HGT has occurred during the evolution of the genus Rickettsia, how the choice of genomic regions affects estimates of reticulate evolution and HGT events, and how these relate to host range. We show that ecological specialization restricts recombination occurrence in Rickettsia, but other evolutionary processes and genome architecture are also important for the occurrence of HGT. We found that recombination, genomic rearrangements, and genome conservation all show evidence of network-like evolution at whole-genome scale. We show that reticulation occurred mainly, but not only, during the early Rickettsia radiation, and that core proteome genes of every major functional category have experienced reticulated evolution and possibly HGT. Overall, the evolution of Rickettsia bacteria has been tree-like, with evidence of HGT and reticulated evolution for around 10–25% of the core Rickettsia genome. We present evidence of extensive recombination/incomplete lineage sorting (ILS) during the radiation of the genus, probably linked with the emergence of intracellularity in a wide range of hosts.
Rickettsia; horizontal gene transfer; phylogenomic networks; reticulate evolution; bacterial speciation
Darwin's theory of the evolution of species has been the object of considerable dispute. In this review, we describe seven key principles in Darwin's book The Origin of Species and try to show how genomics challenges each of these concepts and improves our knowledge of evolution. Darwin believed that species evolution consists of positive directional selection ensuring the “survival of the fittest.” The most developed state of a species is characterized by increasing complexity. Darwin proposed the theory of “descent with modification,” according to which all species evolve from a single common ancestor through a gradual process of small modifications transmitted by vertical inheritance. Finally, the process of evolution can be depicted in the form of a tree. However, microbial genomics has shown that evolution is better described as “biological changes over time.” The mode of change is not unidirectional and does not necessarily favor advantageous mutations that increase fitness; rather, it is subject to random selection as a result of catastrophic stochastic processes. Complexity is not necessarily the culmination of development: several complex organisms have gone extinct, and many microbes, including bacteria with an intracellular lifestyle, have streamlined, highly effective genomes. Genomes evolve through large events of gene deletion, duplication, insertion, and genome rearrangement rather than through a gradual adaptive process. Genomes are dynamic and chimeric entities with gene repertoires that result from vertical and horizontal acquisitions as well as de novo gene creation. The chimeric character of microbial genomes excludes the possibility of finding a single common ancestor for all currently known genes. Genomes are collections of genes with different evolutionary histories that cannot be represented by a single tree of life (TOL).
A forest, a network or a rhizome of life may be more accurate to represent evolutionary relationships among species.
catastrophes; Darwin; gene creation; giant viruses; microbial genomics; rhizome of life; sequence exchange
Charles Darwin believed that all traits of organisms have been honed to near perfection by natural selection. The empirical basis underlying Darwin’s conclusions consisted of numerous observations made by him and other naturalists on the exquisite adaptations of animals and plants to their natural habitats and on the impressive results of artificial selection. Darwin fully appreciated the importance of heredity but was unaware of the nature and, in fact, the very existence of genomes. A century and a half after the publication of the “Origin”, we have the opportunity to draw conclusions from the comparisons of hundreds of genome sequences from all walks of life. These comparisons suggest that the dominant mode of genome evolution is quite different from that of phenotypic evolution. The genomes of vertebrates, those purported paragons of biological perfection, turned out to be veritable junkyards of selfish genetic elements where only a small fraction of the genetic material is dedicated to encoding biologically relevant information. In sharp contrast, genomes of microbes and viruses are incomparably more compact, with most of the genetic material assigned to distinct biological functions. However, even in these genomes, the specific genome organization (gene order) is poorly conserved. The results of comparative genomics lead to the conclusion that the genome architecture is not a straightforward result of continuous adaptation but rather is determined by the balance among selection pressure (which itself depends on the effective population size and mutation rate), the level of recombination, and the activity of selfish elements. Although genes and, in many cases, multigene regions of genomes possess elaborate architectures that ensure regulation of expression, these arrangements are evolutionarily volatile and typically change substantially even on short evolutionary scales when gene sequences diverge minimally.
Thus, the observed genome architectures are, mostly, products of neutral processes or epiphenomena of more general selective processes, such as selection for genome streamlining in successful lineages with large populations. Selection for specific gene arrangements (elements of genome architecture) seems only to modulate the results of these processes.
The scientific contribution of Darwin, still vilified in many religious circles, has now been recognized and celebrated by scientists from various disciplines. However, in recent years, several evolutionists have criticized Darwin as outdated, arguing that “Darwinism,” assimilated to the “tree of life,” cannot explain microbial evolution, or else did not operate during the early evolution of life. These critics either confuse “Darwinism” with old versions of “neo-Darwinism” or misunderstand the role of gene transfers in evolution. The core of Darwin's explanation of evolution (variation/selection) remains necessary and sufficient to decipher the history of life. The enormous diversity of mechanisms underlying variation has been successfully interpreted by evolutionists in this framework and has considerably enriched the corpus of evolutionary biology without the need to kill the father. However, it remains for evolutionists to acknowledge interactions between cells and viruses (unknown to Darwin) as a major driving force in life evolution.
evolutionary synthesis; variation; natural selection; lateral gene transfer; Darwinian threshold; viruses
Darwin provided a great unifying theory for biology; its visual expression is the universal tree of life. The tree concept is challenged by the occurrence of horizontal gene transfer and, as summarized in this review, by the omission of viruses. Microbial ecologists have demonstrated that viruses are the most numerous biological entities on earth, outnumbering cells by a factor of 10. Viral genomics has revealed an unexpected size and distinctness of the viral DNA sequence space. Comparative genomics has shown elements of vertical evolution in some groups of viruses. Furthermore, structural biology has demonstrated links between viruses infecting the three domains of life, pointing to a very ancient origin of viruses. However, viruses presently do not find a place on the universal tree of life, which is thus only a tree of cellular life. In view of the polythetic nature of current life definitions, viruses cannot be dismissed as non-living material. On earth we therefore have at least two large DNA sequence spaces, one represented by capsid-encoding viruses and another by ribosome-encoding cells. Despite their probable distinct evolutionary origin, both spheres were and are connected by intensive two-way gene transfers.
universal tree; viruses; phages
Horizontal gene transfer (HGT) is a process that facilitates the transfer of genetic material between organisms that are not directly related, and thus can affect both the rate of evolution and the emergence of traits. Recent phylogenetic studies reveal that HGT events are likely ubiquitous in the Tree of Life. However, our knowledge of HGT's role in evolution and biological organization is very limited, mainly due to the lack of ancestral evolutionary signatures and the difficulty of observing complex evolutionary dynamics in a laboratory setting. Here, we utilize a multi-scale microbial evolution model to comprehensively study the effect of HGT on the evolution of complex traits and the organization of gene regulatory networks.
Large-scale simulations reveal a distinct signature of the Distribution of Fitness Effects (DFE) for HGT events: during evolution, while mutation fitness effects become more negative and neutral, HGT events result in a balanced effect distribution. In either case, the fraction of lethal events decreases significantly during evolution (from 33.0% to 3.2%), a clear indication of mutational robustness. Interestingly, evolution was accelerated when populations were exposed to correlated environments of increasing complexity, especially in the presence of HGT, a phenomenon that warrants further investigation. High HGT rates were found to be disruptive, while the average transferred fragment size was linked to functional module size in the underlying biological network. Network analysis reveals that HGT results in larger regulatory networks, but with the same sparsity level as those evolved in its absence. Observed phenotypic variability and co-existing solutions were traced to individual gain/loss of function events, while subsequent re-wiring after fragment integration was necessary for complex traits to emerge.
The core enzymes of the DNA replication systems show striking diversity among cellular life forms and more so among viruses. In particular, and counter-intuitively, given the central role of DNA in all cells and the mechanistic uniformity of replication, the core enzymes of the replication systems of bacteria and archaea (as well as eukaryotes) are unrelated or extremely distantly related. Viruses and plasmids, in addition, possess at least two unique DNA replication systems, namely, the protein-primed and rolling circle modalities of replication. This unexpected diversity makes the origin and evolution of DNA replication systems a particularly challenging and intriguing problem in evolutionary biology.
I propose a specific succession for the emergence of different DNA replication systems, drawing argument from the differences in their representation among viruses and other selfish replicating elements. In a striking pattern, the DNA replication systems of viruses infecting bacteria and eukaryotes are dominated by the archaeal-type B-family DNA polymerase (PolB) whereas the bacterial replicative DNA polymerase (PolC) is present only in a handful of bacteriophage genomes. There is no apparent mechanistic impediment to the involvement of the bacterial-type replication machinery in viral DNA replication. Therefore, I hypothesize that the observed, markedly unequal distribution of the replicative DNA polymerases among the known cellular and viral replication systems has a historical explanation. I propose that, among the two types of DNA replication machineries that are found in extant life forms, the archaeal-type, PolB-based system evolved first and had already given rise to a variety of diverse viruses and other selfish elements before the advent of the bacterial, PolC-based machinery. Conceivably, at that stage of evolution, the niches for DNA-viral reproduction had already been filled with viruses replicating with the help of the archaeal system, and viruses with the bacterial system never took off. I further suggest that the two other systems of DNA replication, the rolling circle mechanism and the protein-primed mechanism, which are represented in diverse selfish elements, also evolved prior to the emergence of the bacterial replication system. This hypothesis is compatible with the distinct structural affinities of PolB, which has the palm-domain fold shared with reverse transcriptases and RNA-dependent RNA polymerases, and PolC, which has a distinct, unrelated nucleotidyltransferase fold.
I propose that PolB is a descendant of polymerases that were involved in the replication of genetic elements in the RNA-protein world, prior to the emergence of DNA replication. By contrast, PolC might have evolved from an ancient non-templated polymerase, e.g., polyA polymerase. The proposed temporal succession of the evolving DNA replication systems does not depend on the specific scenario adopted for the evolution of cells and viruses, i.e., whether viruses are derived from cells or virus-like elements are thought to originate from a primordial gene pool. However, arguments are presented in favor of the latter scenario as the most parsimonious explanation of the evolution of DNA replication systems.
Comparative analysis of the diversity of genomic strategies and organizations of viruses and cellular life forms has the potential to open windows into the deep past of life's evolution, especially with regard to the origin of genome replication systems. When complemented with information on the evolution of the relevant protein folds, this comparative approach can yield credible scenarios for very early steps of evolution that otherwise appear to be out of reach.
This article was reviewed by Eric Bapteste, Patrick Forterre, and Mark Ragan.
The concept of the genome tree rests on the potential evolutionary significance of clustering species according to similarities in the gene content of their genomes. In this respect, genome trees have often been identified with species trees. With the rapid expansion of genome sequence data, it becomes increasingly important to develop accurate methods for grasping global trends in the phylogenetic signals that mutually link the various genomes. We therefore derive here the methodological concept of genome trees based on protein conservation profiles in multiple species. The basic idea in this derivation is that the multi-component “presence-absence” protein conservation profiles permit tracking of common evolutionary histories of genes across multiple genomes. We show that a significant reduction in informational redundancy is achieved by considering only the subset of distinct conservation profiles. Beyond these basic ideas, we point out various pitfalls and limitations associated with the data handling, paving the way for further improvements. As an illustration of the methods, we analyze a genome tree based on the above principles, along with a series of other trees derived from the same data and based on pair-wise comparisons (ancestral duplication-conservation and shared orthologs). In all trees we observe a sharp discrimination between the three primary domains of life: Bacteria, Archaea, and Eukarya. The new genome tree, based on conservation profiles, displays a significant correspondence with classically recognized taxonomical groupings, along with a series of departures from such conventional clusterings.
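The redundancy-reduction step can be sketched as follows, assuming the data are a matrix with one 0/1 presence-absence profile per protein family and one column per genome. Identical profiles are collapsed, and genomes are then compared by the fraction of distinct profiles on which they differ. The normalized Hamming distance, the function names and the genome labels are our assumptions for illustration; the source does not specify which distance its genome trees use. The resulting distance matrix could then be fed to any standard tree-building method.

```python
from itertools import combinations

def distinct_profiles(profiles):
    """Collapse identical presence-absence profiles (one 0/1 entry per genome)
    to remove informational redundancy."""
    return sorted(set(map(tuple, profiles)))

def profile_distances(profiles, genomes):
    """Pairwise genome distances: the fraction of distinct profiles in which
    two genomes differ (a normalized Hamming distance, chosen here as an example)."""
    uniq = distinct_profiles(profiles)
    dist = {}
    for i, j in combinations(range(len(genomes)), 2):
        differ = sum(p[i] != p[j] for p in uniq)
        dist[(genomes[i], genomes[j])] = differ / len(uniq)
    return dist

# Hypothetical toy data: five protein families across three genomes
profiles = [[1, 1, 1], [1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
d = profile_distances(profiles, ["G1", "G2", "G3"])
```

Here the five profiles collapse to three distinct ones; G1 and G2 share every distinct profile (distance 0), while each differs from G3 in two of the three.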
Since Darwin's Origin of Species and Haeckel's Tree of Life, systematic biology has attempted to classify species into “family trees.” Genomics has provided a new framework permitting descriptions of sibling relations between species on the basis of their complete genetic blueprints. While trees based on single genes (rRNA) or limited numbers of genes have been useful, genome trees derived from complete genome comparisons should lead to more complete pictures of phylogenetic relations between various organisms. In order to reach such a global vision, procedures to establish sibling relationships should depend on an overall comparison that captures the evolutionary fates of proteins jointly in multiple genomes. This paper aims to establish a methodological basis for using genuine multidimensional procedures in the construction of genome trees. This approach complements the derivation of trees based on more classical techniques of pair-wise comparison between species. The authors survey classification schemes emerging from this approach, which either support traditional views, such as the separation between the three phylogenetic domains Bacteria, Archaea, and Eukarya, or challenge them by suggesting, for example, intermingled clusterings of Proteobacteria with various other bacterial species.
Comparative analysis of sequenced genomes reveals numerous instances of apparent horizontal gene transfer (HGT), at least in prokaryotes, and indicates that lineage-specific gene loss might have been even more common in evolution. This complicates the notion of a species tree, which needs to be re-interpreted as a prevailing evolutionary trend, rather than the full depiction of evolution, and makes reconstruction of ancestral genomes a non-trivial task.
We addressed the problem of constructing parsimonious scenarios for individual sets of orthologous genes given a species tree. The orthologous sets were taken from the database of Clusters of Orthologous Groups of proteins (COGs). We show that the phyletic patterns (patterns of presence-absence in completely sequenced genomes) of almost 90% of the COGs are inconsistent with the hypothetical species tree. Algorithms were developed to reconcile the phyletic patterns with the species tree by postulating gene loss, COG emergence and HGT (the latter two classes of events were collectively treated as gene gains). We prove that each of these algorithms produces a parsimonious evolutionary scenario, which can be represented as mapping of loss and gain events on the species tree. The distribution of the evolutionary events among the tree nodes substantially depends on the underlying assumptions of the reconciliation algorithm, e.g. whether or not independent gene gains (gain after loss after gain) are permitted. Biological considerations suggest that, on average, gene loss might be a more likely event than gene gain. Therefore different gain penalties were used and the resulting series of reconstructed gene sets for the last universal common ancestor (LUCA) of the extant life forms were analysed. The number of genes in the reconstructed LUCA gene sets grows as the gain penalty increases. However, qualitative examination of the LUCA versions reconstructed with different gain penalties indicates that, even with a gain penalty of 1 (equal weights assigned to a gain and a loss), the set of 572 genes assigned to LUCA might be nearly sufficient to sustain a functioning organism. Under this gain penalty value, the numbers of horizontal gene transfer and gene loss events are nearly identical. This result holds true for two alternative topologies of the species tree and even under random shuffling of the tree. 
Therefore, the results seem to be compatible with approximately equal likelihoods of HGT and gene loss in the evolution of prokaryotes.
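The reconciliation of a phyletic pattern with a species tree can be illustrated with standard weighted (Sankoff) small parsimony on a two-state presence/absence character. This is a minimal sketch and not the authors' algorithms: the nested-tuple tree encoding, the function name and the toy pattern are hypothetical, losses are fixed at cost 1, and `gain_penalty` stands in for the gain penalty described above. The state with the lower cost at the root indicates whether the gene would be assigned to the ancestor.

```python
import math

def sankoff(tree, pattern, gain_penalty):
    """Weighted parsimony for a presence (1) / absence (0) pattern on a rooted
    binary tree given as nested tuples of leaf names.
    Gains (0 -> 1) cost `gain_penalty`; losses (1 -> 0) cost 1.
    Returns {state: minimal cost of the subtree if its root has that state}."""
    if isinstance(tree, str):  # leaf: only the observed state is allowed
        return {s: (0.0 if pattern[tree] == s else math.inf) for s in (0, 1)}
    child_costs = [sankoff(child, pattern, gain_penalty) for child in tree]
    step = {(0, 0): 0.0, (1, 1): 0.0, (0, 1): gain_penalty, (1, 0): 1.0}
    return {
        s: sum(min(step[(s, t)] + cc[t] for t in (0, 1)) for cc in child_costs)
        for s in (0, 1)
    }

tree = (("A", "B"), ("C", "D"))
pattern = {"A": 0, "B": 0, "C": 1, "D": 1}
cheap_gain = sankoff(tree, pattern, 0.5)   # root absent, one gain on the C-D branch
costly_gain = sankoff(tree, pattern, 2.0)  # root present, one loss on the A-B branch
```

With a gain penalty of 0.5, the cheapest scenario leaves the gene absent at the root and adds a single gain; with a penalty of 2, it becomes cheaper to place the gene in the root and postulate one loss. This mirrors the observation that reconstructed ancestral gene sets grow as the gain penalty increases.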
The notion that gene loss and HGT are major aspects of prokaryotic evolution was supported by quantitative analysis of the mapping of the phyletic patterns of COGs onto a hypothetical species tree. Algorithms were developed for constructing parsimonious evolutionary scenarios, which include gene loss and gain events, for orthologous gene sets, given a species tree. This analysis shows, contrary to expectations, that the number of predicted HGT events that occurred during the evolution of prokaryotes might be approximately the same as the number of gene losses. The approach to the reconstruction of evolutionary scenarios employed here is conservative with regard to the detection of HGT because only patterns of gene presence-absence in sequenced genomes are taken into account. In reality, horizontal transfer might have contributed to the evolution of many other genes also, which makes it a dominant force in prokaryotic evolution.
Major transitions in biological evolution show the same pattern of sudden emergence of diverse forms at a new level of complexity. The relationships between major groups within an emergent new class of biological entities are hard to decipher and do not seem to fit the tree pattern that, following Darwin's original proposal, remains the dominant description of biological evolution. The cases in point include the origin of complex RNA molecules and protein folds; major groups of viruses; archaea and bacteria, and the principal lineages within each of these prokaryotic domains; eukaryotic supergroups; and animal phyla. In each of these pivotal nexuses in life's history, the principal "types" seem to appear rapidly and fully equipped with the signature features of the respective new level of biological organization. No intermediate "grades" or intermediate forms between different types are detectable. Usually, this pattern is attributed to cladogenesis compressed in time, combined with the inevitable erosion of the phylogenetic signal.
I propose that most or all major evolutionary transitions that show the "explosive" pattern of emergence of new types of biological entities correspond to a boundary between two qualitatively distinct evolutionary phases. The first, inflationary phase is characterized by extremely rapid evolution driven by various processes of genetic information exchange, such as horizontal gene transfer, recombination, fusion, fission, and spread of mobile elements. These processes give rise to a vast diversity of forms from which the main classes of entities at the new level of complexity emerge independently, through a sampling process. In the second phase, evolution dramatically slows down, the respective process of genetic information exchange tapers off, and multiple lineages of the new type of entities emerge, each of them evolving in a tree-like fashion from that point on. This biphasic model of evolution incorporates the previously developed concepts of the emergence of protein folds by recombination of small structural units and origin of viruses and cells from a pre-cellular compartmentalized pool of recombining genetic elements. The model is extended to encompass other major transitions. It is proposed that bacterial and archaeal phyla emerged independently from two distinct populations of primordial cells that, originally, possessed leaky membranes, which made the cells prone to rampant gene exchange; and that the eukaryotic supergroups emerged through distinct, secondary endosymbiotic events (as opposed to the primary, mitochondrial endosymbiosis). This biphasic model of evolution is substantially analogous to the scenario of the origin of universes in the eternal inflation version of modern cosmology. Under this model, universes like ours emerge in the infinite multiverse when the eternal process of exponential expansion, known as inflation, ceases in a particular region as a result of false vacuum decay, a first order phase transition process. 
The result is the nucleation of a new universe, which is traditionally called the Big Bang, although this scenario is radically different from the Big Bang of the traditional model of an expanding universe. Hence I denote the phase transitions at the end of each inflationary epoch in the history of life Biological Big Bangs (BBB).
A Biological Big Bang (BBB) model is proposed for the major transitions in life's evolution. According to this model, each transition is a BBB such that new classes of biological entities emerge at the end of a rapid phase of evolution (inflation) that is characterized by extensive exchange of genetic information which takes distinct forms for different BBBs. The major types of new forms emerge independently, via a sampling process, from the pool of recombining entities of the preceding generation. This process is envisaged as being qualitatively different from tree-pattern cladogenesis.
This article was reviewed by William Martin, Sergei Maslov, and Leonid Mirny.
Mycoplasmas are commonly described as the simplest self-replicating organisms; their evolution is thought to have been driven mainly by genome downsizing, following a scenario similar to that proposed for obligate intracellular bacteria such as insect endosymbionts. Thus far, analysis of mycoplasma genomes has indicated a low level of horizontal gene transfer (HGT), implying that DNA acquisition is strongly limited in these minimal bacteria. In this study, the genome of the ruminant pathogen Mycoplasma agalactiae was sequenced. Comparative genomic data and phylogenetic tree reconstruction revealed that ∼18% of its small genome (877,438 bp) has undergone HGT with the phylogenetically distinct mycoides cluster, which is composed of significant ruminant pathogens. The transferred genes are often found in clusters, several of which encode lipoproteins that usually play an important role in mycoplasma–host interaction. A decayed form of a conjugative element also described in a member of the mycoides cluster was found in the M. agalactiae genome, suggesting that HGT may have occurred by mobilizing a related genetic element. The possibility of HGT events among other mycoplasmas was evaluated with the available sequenced genomes. Our data indicate marginal levels of HGT among Mycoplasma species except for those described above and, to a lesser extent, between the two bird pathogens, M. gallisepticum and M. synoviae. This first description of large-scale HGT among mycoplasmas sharing the same ecological niche challenges the generally accepted evolutionary scenario in which gene loss is the main driving force of mycoplasma evolution. Mycoplasma evolution therefore clearly differs from that of other bacteria with small genomes, particularly obligate intracellular bacteria that are isolated within host cells.
Consequently, mycoplasmas are not only able to subvert complex hosts but presumably have retained sexual competence, a trait that may prevent them from genome stasis and contribute to adaptation to new hosts.
Mycoplasmas are cell wall–lacking prokaryotes that evolved from ancestors common to Gram-positive bacteria by way of massive losses of genetic material. With their minimal genome, mycoplasmas are considered to be the simplest free-living organisms, yet several species are successful pathogens of humans and animals. In this study, we challenged the commonly accepted view in which mycoplasma evolution is driven only by genome down-sizing. Indeed, we showed that a significant number of genes underwent horizontal transfer among different mycoplasma species that share the same ruminant hosts. In these species, the occurrence of a genetic element that can promote DNA transfer via cell-to-cell contact suggests that some mycoplasmas may have retained or acquired sexual competence. Transferred genes were found to encode proteins that are likely to be associated with mycoplasma–host interactions. Sharing genetic resources via horizontal gene transfer may provide mycoplasmas with a means of adapting to new niches or new hosts and of avoiding irreversible genome erosion.
Congruence is a broadly applied notion in evolutionary biology: it is used to justify multigene phylogeny or phylogenomics, in studies of coevolution and lateral gene transfer, and as evidence for common descent. Existing methods for identifying incongruence or heterogeneity using character data were designed for data sets that are both small and expected to be rarely incongruent. At the same time, methods that assess incongruence by comparing trees test a null hypothesis of uncorrelated tree structures, which may be inappropriate for phylogenomic studies. Both kinds of methods are therefore ill-suited to the growing number of available genome sequences, most of which are from prokaryotes and viruses, whether for phylogenomic analysis or for studies of the evolutionary forces and events that have shaped these genomes. Specifically, many existing methods scale poorly with large numbers of genes, cannot accommodate high levels of incongruence, and do not adequately model patterns of missing taxa for different markers. We propose the development of novel incongruence assessment methods suitable for the analysis of the molecular evolution of the vast majority of life, which would support the investigation of homogeneity of evolutionary process in cases where markers do not share identical tree structures.
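A classical building block of tree-comparison approaches is the Robinson-Foulds (RF) distance: the number of bipartitions (splits) found in only one of two trees. The minimal Python sketch below is not one of the new methods called for above; it assumes gene trees written as nested tuples of taxon names and computes RF only on the taxa that two gene trees share, which is one simple way of coping with markers that lack some taxa. The names leaves, splits, and rf_distance are illustrative.

```python
from itertools import chain


def leaves(tree):
    """Leaf labels of a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return {tree}
    return set(chain.from_iterable(leaves(child) for child in tree))


def splits(tree, taxa):
    """Non-trivial bipartitions of `taxa` induced by the edges of `tree`.

    Every subtree defines an edge; its leaves (restricted to `taxa`) form one
    side of a split. Each split is stored as the side lacking a fixed
    reference taxon, so A|B and B|A map to the same key. Restricting to
    `taxa` is equivalent to pruning all other leaves from the tree.
    """
    if not taxa:
        return set()
    ref = min(taxa)
    found = set()

    def walk(node):
        below = leaves(node) & taxa
        side = below if ref not in below else taxa - below
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(taxa) - 1:
            found.add(frozenset(side))
        if not isinstance(node, str):
            for child in node:
                walk(child)

    walk(tree)
    return found


def rf_distance(t1, t2):
    """Robinson-Foulds distance on the taxa shared by two gene trees.

    Returns (number of splits present in only one tree,
             that number divided by the total number of splits).
    """
    shared = leaves(t1) & leaves(t2)
    s1, s2 = splits(t1, shared), splits(t2, shared)
    diff = len(s1 ^ s2)
    total = len(s1) + len(s2)
    return diff, (diff / total if total else 0.0)


gene1 = ((("A", "B"), "C"), ("D", "E"))
gene2 = ((("A", "C"), "B"), ("D", "E"))
gene3 = ((("A", "B"), "X"), ("C", ("D", "E")))  # taxon X is absent from gene1
print(rf_distance(gene1, gene2))  # (2, 0.5)
print(rf_distance(gene1, gene3))  # (0, 0.0): identical on the shared taxa
```

Pruning to shared taxa keeps the comparison defined when markers cover different species, but it discards information; a method for large, patchy data sets would need an explicit model of missing taxa.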
incongruence; lateral gene transfer; microbial evolution; phylogenetic networks; phylogenomics
All living beings on Earth, from bacteria to humans, are connected through descent from common ancestors and represent the summation of their corresponding, ca. 3,500-million-year-long evolutionary history. However, the evolution of phenotypic features is not predictable, and biologists no longer use terms such as "primitive" or "perfect organisms". By contrast, the Bible-based concept of the so-called "ladder of life" or Scala Naturae, i.e., the idea that all living beings can be viewed as representing various degrees of "perfection", with humans at the very top of this biological hierarchy, was popular among naturalists until ca. 1850 (Charles Bonnet, Jean Lamarck and others). Charles Darwin is usually credited with the establishment of a branched evolutionary "Tree of Life". This insight of 1859 was based on his now firmly corroborated proposals of common ancestry and natural selection. In this article I argue that Darwin was still influenced by "ladder thinking", a theological view that prevailed throughout the 19th century and is also part of Ernst Haeckel's famous Oak tree (of Life) of 1866, which is, like Darwin's scheme, static. In 1910, Constantin Mereschkowsky proposed an alternative, "anti-selectionist" concept of biological evolution, which became known as the symbiogenesis theory. According to the symbiogenesis scenario, eukaryotic cells evolved on a static Earth from archaic prokaryotes via the fusion and subsequent cooperation of certain microbes. In 1929, Alfred Wegener published his theory of continental drift, which was later corroborated, modified and extended. The resulting theory of plate tectonics is now the principal organizing concept of geology. Over millions of years, plate tectonics, and hence the "dynamic Earth", has caused destructive volcanic eruptions and earthquakes. At the same time, it has created mountain ranges, deep oceans, novel freshwater habitats, and deserts.
As a result, these geologic processes destroyed numerous populations of organisms, and produced the environmental conditions for new species of animals, plants and microbes to adapt and evolve. In this article I propose a tree-like "symbiogenesis, natural selection, and dynamic Earth (synade)-model" of macroevolution that is based on these novel facts and data.
This article was reviewed by Mark Ragan, W. Ford Doolittle, and Staffan Müller-Wille.
It is well known that Charles Darwin sketched abstract trees of relationship in his 1837 notebook, and depicted a tree in the Origin of Species (1859). Here I attempt to place Darwin's trees in historical context. By the mid-Eighteenth century the Great Chain of Being was increasingly seen to be an inadequate description of order in nature, and by about 1780 it had been largely abandoned without a satisfactory alternative having been agreed upon. In 1750 Donati described aquatic and terrestrial organisms as forming a network, and a few years later Buffon depicted a network of genealogical relationships among breeds of dogs. In 1764 Bonnet asked whether the Chain might actually branch at certain points, and in 1766 Pallas proposed that the gradations among organisms resemble a tree with a compound trunk, perhaps not unlike the tree of animal life later depicted by Eichwald. Other trees were presented by Augier in 1801 and by Lamarck in 1809 and 1815, the latter two assuming a transmutation of species over time. Elaborate networks of affinities among plants and among animals were depicted in the late Eighteenth and very early Nineteenth centuries. In the two decades immediately prior to 1837, so-called affinities and/or analogies among organisms were represented by diverse geometric figures. Series of plant and animal fossils in successive geological strata were represented as trees in a popular textbook from 1840, while in 1858 Bronn presented a system of animals, as evidenced by the fossil record, in the form of a tree. Darwin's 1859 tree and its subsequent elaborations by Haeckel came to be accepted in many but not all areas of biological sciences, while network diagrams were used in others. Beginning in the early 1960s, trees were inferred from protein and nucleic acid sequences, but networks were re-introduced in the mid-1990s to represent lateral genetic transfer, increasingly regarded as a fundamental mode of evolution at least for bacteria and archaea.
In historical context, then, the Network of Life preceded the Tree of Life and might again supersede it.
This article was reviewed by Eric Bapteste, Patrick Forterre and Dan Graur.
Recent advances in genomics of viruses and cellular life forms have greatly stimulated interest in the origins and evolution of viruses and, for the first time, offer an opportunity for a data-driven exploration of the deepest roots of viruses. Here we briefly review the current views of virus evolution and propose a new, coherent scenario that appears to be best compatible with comparative-genomic data and is naturally linked to models of cellular evolution that, from independent considerations, seem to be the most parsimonious among the existing ones.
Several genes coding for key proteins involved in viral replication and morphogenesis, as well as the major capsid protein of icosahedral virions, are shared by many groups of RNA and DNA viruses but are missing in cellular life forms. On the basis of this key observation and the data on extensive genetic exchange between diverse viruses, we propose the concept of the ancient virus world. The virus world is construed as a distinct contingent of viral genes that continuously retained its identity throughout the entire history of life. Under this concept, the principal lineages of viruses and related selfish agents emerged from the primordial pool of primitive genetic elements, the ancestors of both cellular and viral genes. Thus, notwithstanding the numerous gene exchanges and acquisitions attributed to later stages of evolution, most, if not all, modern viruses and other selfish agents are inferred to descend from elements that belonged to the primordial genetic pool. In this pool, RNA viruses would have evolved first, followed by retroid elements and then DNA viruses. The Virus World concept is predicated on a model of early evolution whereby emergence of substantial genetic diversity antedates the advent of full-fledged cells, allowing for extensive gene mixing at this early stage of evolution. We outline a scenario of the origin of the main classes of viruses in conjunction with a specific model of precellular evolution under which the primordial gene pool dwelled in a network of inorganic compartments. Somewhat paradoxically, under this scenario, we surmise that selfish genetic elements ancestral to viruses evolved prior to typical cells, to become intracellular parasites once bacteria and archaea arrived on the scene. Selection against excessively aggressive parasites that would kill off the host ensembles of genetic elements would lead to early evolution of temperate virus-like agents and primitive defense mechanisms, possibly based on the RNA interference principle.
The emergence of the eukaryotic cell is construed as the second melting pot of virus evolution from which the major groups of eukaryotic viruses originated as a result of extensive recombination of genes from various bacteriophages, archaeal viruses, plasmids, and the evolving eukaryotic genomes. Again, this vision is predicated on a specific model of the emergence of eukaryotic cell under which archaeo-bacterial symbiosis was the starting point of eukaryogenesis, a scenario that appears to be best compatible with the data.
The existence of several genes that are central to virus replication and structure, shared by a broad variety of viruses, and missing from cellular genomes (virus hallmark genes) suggests the model of an ancient virus world: a flow of virus-specific genes that went uninterrupted from the precellular stage of life's evolution to this day. This concept is tightly linked to two key conjectures on the evolution of cells: the existence of a complex, precellular, compartmentalized but extensively mixing and recombining pool of genes, and the origin of the eukaryotic cell by archaeo-bacterial fusion. The virus world concept and these models of major transitions in the evolution of cells provide complementary pieces of an emerging coherent picture of life's history.
This article was reviewed by W. Ford Doolittle, J. Peter Gogarten, and Arcady Mushegian.
Bacterial populations are highly successful at colonizing new habitats and adapting to changing environmental conditions, partly due to their capacity to evolve novel virulence and metabolic pathways in response to stress conditions and to shuffle them by horizontal gene transfer (HGT). A common theme in the evolution of new functions consists of gene duplication followed by functional divergence. UlaG, a unique manganese-dependent metallo-β-lactamase (MBL) enzyme involved in L-ascorbate metabolism by commensal and symbiotic enterobacteria, provides a model for the study of the emergence of new catalytic activities from the modification of an ancient fold. Furthermore, UlaG is the founding member of the so-called UlaG-like (UlaGL) protein family, a recently established and poorly characterized family comprising divalent (and perhaps trivalent) metal-binding MBLs that catalyze transformations on phosphorylated sugars and nucleotides.
Here we combined protein structure-guided and sequence-only molecular phylogenetic analyses to dissect the molecular evolution of UlaG and to study its phylogenomic distribution, its relatedness with present-day UlaGL protein sequences and functional conservation. Phylogenetic analyses indicate that UlaGL sequences are present in Bacteria and Archaea, with bona fide orthologs found mainly in mammalian and plant-associated Gram-negative and Gram-positive bacteria. The incongruence between the UlaGL tree and known species trees indicates exchange by HGT and suggests that the UlaGL-encoding genes provided a growth advantage under changing conditions. Our search for more distantly related protein sequences, aided by structural homology, revealed that UlaGL sequences have a common evolutionary origin with present-day RNA processing and metabolizing MBL enzymes widespread in Bacteria, Archaea, and Eukarya. This observation suggests an ancient origin for the UlaGL family within the broader trunk of the MBL superfamily by duplication, neofunctionalization and fixation.
Our results suggest that the forerunner of UlaG was present as an RNA-metabolizing enzyme in the last common ancestor, and that the modern descendants of that ancestral gene have a wide phylogenetic distribution and diverse functional roles. We propose that the UlaGL family evolved new metabolic roles among bacterial and possibly archaeal phyla in the setting of a close association with metazoans, such as in the mammalian gastrointestinal tract or in animal and plant pathogens, as well as in environmental settings. Accordingly, the major evolutionary forces shaping the UlaGL family include vertical inheritance and lineage-specific duplication and acquisition of novel metabolic functions, followed by HGT and numerous lineage-specific gene loss events.
Microbial eukaryotes may extinguish much of their nuclear phylogenetic history due to endosymbiotic/horizontal gene transfer (E/HGT). We studied E/HGT in 32,110 contigs of expressed sequence tags (ESTs) from the dinoflagellate Alexandrium tamarense (Dinophyceae) using a conservative phylogenomic approach. The vast majority of predicted proteins (86.4%) in this alga are novel or dinoflagellate-specific. We searched for putative homologs of these predicted proteins against a taxonomically broadly sampled protein database that includes all currently available data from algae and protists, and reconstructed a phylogeny from each of the putative homologous protein sets. Of the 2,523 resulting phylogenies, 14-17% are potentially impacted by E/HGT involving both prokaryote and eukaryote lineages, with 2-4% showing clear evidence of reticulate evolution. The complex evolutionary histories of the remaining proteins, many of which may also have been affected by E/HGT, cannot be interpreted using our approach with currently available gene data. We present empirical evidence of reticulate genome evolution that, combined with inadequate or highly complex phylogenetic signal in many proteins, may impede genome-wide approaches to infer the tree of microbial eukaryotes.
dinoflagellates; endosymbiosis; eukaryote evolution; horizontal gene transfer; phylogenomics
The acceptance of Darwin's theory of evolution by natural selection is not complete, and its limited ability to explain the complex processes that constitute the transformation of species has been pointed out. It is necessary to discuss the explanatory power of the dominant paradigm. New discoveries often bring about contradictions that researchers try to overcome by adjusting the results to the dominant reductionist paradigm, using all sorts of gradations and combinations admitted case by case. In addition to the debate on the validity of natural selection, modern findings challenge the interpretation of observations through the Darwinian view of competition and the struggle for life as a theoretical basis. New holistic interpretations are emerging around the Net of Life, in which the interconnection of ecosystems constitutes a dynamic and self-regulating biosphere. Viruses are recognized as a macroorganism with a huge collection of genes, most of them unknown, that constitute the planet's major gene pool. They play a fundamental role in evolution because their sequences are capable of integrating into genomes in an “infective” way and becoming an essential part of multicellular organisms. Their content has “biological sense”, i.e., they appear as part of normal life processes and play a serious role as carriers of complex genetic information. Antibiotics are cell signals with major effects on general metabolism and transcription in bacterial cells and communities. The hologenome theory considers an organism and all of its associated symbiotic microbes (parasites, mutualists, synergists, amensalists) as a result of symbiopoiesis. Microbes and helminths that are normally regarded as parasites are cohabitants: they have cohabited with their hosts and drive the evolution and existence of the partners. Each organism is the result of the integration of complex systems.
The eukaryotic organism is the result of a combination of bacterial, viral, and eukaryotic DNA; it is also the result of the interaction of its own genome with the genome of its microbiota, and their metabolisms have been intertwined (as a “superorganism”) throughout evolution. The Darwinian paradigm had its origin in the free-market theories and concepts of Malthus and Spencer. Nature was thus explained on the basis of market theories, moving away from an accurate explanation of natural phenomena. It is necessary to acknowledge the limitations of the dominant dogma. These new interpretations of biological processes, molecules, the roles of viruses in nature, and microbial interactions are important points to consider in order to construct a solid theory that fits the facts, with fewer speculations and tortuous semantic traps.
Darwinism; natural selection; evolution; paradigm; virus; hologenome; autopoiesis
Many bacteria and viruses adapt to varying environmental conditions through the acquisition of mosaic genes. A mosaic gene is composed of alternating sequence polymorphisms either belonging to the host original allele or derived from the integrated donor DNA. Often, the integrated sequence contains a selectable genetic marker (e.g. a marker conferring antibiotic resistance). Effective identification of mosaic genes and detection of the corresponding partial horizontal gene transfers (HGTs) are among the most important challenges posed by evolutionary biology. We developed a method for detecting partial HGT events and related intragenic recombination giving rise to the formation of mosaic genes. A bootstrap procedure incorporated in our method is used to assess the support of each predicted partial gene transfer. The proposed method can also be applied to confirm or discard complete (i.e. traditional) horizontal gene transfers detected by any HGT inferring method. When applied on a full-genome scale, the new method can be used to assess the level of mosaicism in the considered genomes as well as the rates of complete and partial HGT underlying their evolution.
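A much-simplified illustration of windowed mosaic detection with bootstrap support is sketched below in Python. It is not the method described above, which infers partial transfers from gene trees; as a stand-in, each alignment window is simply labelled by whether the query sequence is closer (by p-distance) to a putative donor or to a relative from its own lineage, and bootstrap support is the fraction of column resamples that reproduce that label. The function names, window size, step, and replicate count are hypothetical choices.

```python
import random


def p_distance(a, b):
    """Proportion of differing sites between two equal-length aligned strings."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def donor_closer(query, host, donor, cols):
    """True if, over alignment columns `cols`, the query is strictly closer
    to the putative donor than to its relative from the host lineage."""
    def pick(seq):
        return "".join(seq[i] for i in cols)

    q, h, d = pick(query), pick(host), pick(donor)
    return p_distance(q, d) < p_distance(q, h)


def scan_mosaic(query, host, donor, window=20, step=10, replicates=100, seed=1):
    """Slide a window along the alignment and label each window 'donor' or
    'host', with bootstrap support from resampling columns in the window."""
    if not len(query) == len(host) == len(donor):
        raise ValueError("sequences must be aligned to the same length")
    rng = random.Random(seed)
    calls = []
    for start in range(0, len(query) - window + 1, step):
        cols = list(range(start, start + window))
        call = donor_closer(query, host, donor, cols)
        agree = 0
        for _ in range(replicates):
            # Bootstrap replicate: draw the window's columns with replacement.
            sample = [rng.choice(cols) for _ in cols]
            agree += donor_closer(query, host, donor, sample) == call
        calls.append((start, "donor" if call else "host", agree / replicates))
    return calls


host = "ACGT" * 15
donor = "TGCA" * 15
query = host[:20] + donor[20:40] + host[40:]  # donor segment at columns 20-39
for start, label, support in scan_mosaic(query, host, donor, window=20, step=20):
    print(start, label, support)
# 0 host 1.0
# 20 donor 1.0
# 40 host 1.0
```

With overlapping windows (step smaller than window), windows that straddle a breakpoint receive intermediate labels and lower support, which is one rough way to locate the boundaries of a transferred segment.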
Phylogenetic trees of individual genes of prokaryotes (archaea and bacteria) generally have different topologies, largely owing to extensive horizontal gene transfer (HGT), suggesting that the Tree of Life (TOL) should be replaced by a “net of life” as the paradigm of prokaryote evolution. However, trees remain the natural representation of the histories of individual genes given the fundamentally bifurcating process of gene replication. Therefore, although no single tree can fully represent the evolution of prokaryote genomes, the complete picture of evolution will necessarily combine trees and nets. A quantitative measure of the signals of tree and net evolution is derived from an analysis of all quartets of species in all trees of the “Forest of Life” (FOL), which consists of approximately 7,000 phylogenetic trees for prokaryote genes including approximately 100 nearly universal trees (NUTs). Although diverse routes of net-like evolution collectively dominate the FOL, the pattern of tree-like evolution that reflects the consistent topologies of the NUTs is the most prominent coherent trend. We show that the contributions of tree-like and net-like evolutionary processes substantially differ across bacterial and archaeal lineages and between functional classes of genes. Evolutionary simulations indicate that the central tree-like signal cannot be realistically explained by a self-reinforcing pattern of biased HGT.
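The quartet idea can be sketched in Python, assuming gene trees written as nested tuples (read as unrooted) and at most one gene copy per species. The hypothetical functions below tally, for one quartet of species, how many trees in a forest support each of the three possible resolutions, and measure how often gene trees agree with a reference tree (for example, one built from the NUTs). This is a simplified tally for illustration, not the tree-net statistics of the study itself.

```python
from collections import Counter
from itertools import combinations


def leaves(tree):
    """Leaf labels of a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return {tree}
    return {leaf for child in tree for leaf in leaves(child)}


def quartet_topology(tree, quartet):
    """Topology induced on four taxa, as a frozenset of two pairs
    (e.g. AB|CD), or None if a taxon is missing or the quartet is unresolved.

    A quartet is resolved as ab|cd exactly when some subtree (one side of
    an edge) contains two of the four taxa.
    """
    q = frozenset(quartet)
    if not q <= leaves(tree):
        return None
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        below = frozenset(leaves(node) & q)
        if len(below) == 2:
            return frozenset({below, q - below})
        stack.extend(node)
    return None


def quartet_support(forest, quartet):
    """Count how many gene trees support each resolution of one quartet."""
    counts = Counter()
    for tree in forest:
        topo = quartet_topology(tree, quartet)
        if topo is not None:
            counts[topo] += 1
    return counts


def agreement_with_reference(forest, reference, taxa):
    """Fraction of (quartet, gene tree) comparisons in which the gene tree
    resolves the quartet the same way as the reference tree."""
    agree = total = 0
    for quartet in combinations(sorted(taxa), 4):
        ref = quartet_topology(reference, quartet)
        if ref is None:
            continue
        for tree in forest:
            topo = quartet_topology(tree, quartet)
            if topo is not None:
                total += 1
                agree += topo == ref
    return agree / total if total else 0.0


forest = [
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
    (("A", "C"), ("B", "D")),   # conflicting resolution, e.g. after HGT
    (("A", "B"), ("C", "E")),   # gene absent from D: quartet not scored
]
reference = (("A", "B"), ("C", "D"))
print(agreement_with_reference(forest, reference, "ABCD"))  # 2 of 3 scorable trees agree
```

Scoring every quartet in every tree scales as the number of trees times the fourth power of the number of species, so an analysis of thousands of trees over many genomes would need sampling or more efficient quartet enumeration.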
phylogenetic tree; horizontal gene transfer; species quartets; computer simulation
The evolutionary history of all life forms is usually represented as a vertical tree-like process. In prokaryotes, however, the vertical signal is partly obscured by the massive influence of horizontal gene transfer (HGT). HGT creates widespread discordance between the evolutionary histories of different genes as genomes become mosaics of gene histories. Thus, the Tree of Life (TOL) has been questioned as an appropriate representation of the evolution of prokaryotes. Nevertheless, a common hypothesis is that prokaryotic evolution is primarily tree-like, and a routine effort is made to place new isolates in their appropriate location in the TOL. Moreover, it appears desirable to exploit non–tree-like evolutionary processes for the task of microbial classification. In this work, we present a novel technique that builds on the straightforward observation that gene order conservation (‘synteny’) decreases in time as a result of gene mobility. This is particularly true in prokaryotes, mainly due to HGT. Using a ‘synteny index’ (SI) that measures the average synteny between a pair of genomes, we developed the phylogenetic reconstruction tool ‘Phylo SI’. Phylo SI offers several attractive properties, such as easy bootstrapping, high sensitivity in cases where the phylogenetic signal is weak, and computational efficiency. Phylo SI was tested both on simulated data and on two bacterial data sets and compared with two well-established phylogenetic methods. Phylo SI is particularly efficient at short evolutionary distances, where synteny footprints remain detectable whereas the nucleotide substitution signal is too weak for reliable sequence-based phylogenetic reconstruction. The method is publicly available at http://research.haifa.ac.il/ssagi/software/PhyloSI.zip.
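The synteny-index idea can be illustrated with a short Python sketch. The exact definition used by Phylo SI is not given here, so the version below is an assumption: genomes are circular lists of single-copy gene-family labels, both genomes are first reduced to their shared genes, and SI is the mean fraction of each shared gene's 2k nearest neighbours that is the same in both genomes. One minus SI then serves as a distance that could be passed to a distance-based tree-building method. The function names and the default k are hypothetical, and this is not code from the Phylo SI package.

```python
def neighbours(genome, k):
    """Map each gene to the set of genes within k positions of it on a
    circular chromosome (genes are listed in chromosomal order)."""
    n = len(genome)
    if n <= 2 * k:
        raise ValueError("genome too short for neighbourhood size k")
    return {
        gene: {genome[(i + off) % n] for off in range(-k, k + 1) if off != 0}
        for i, gene in enumerate(genome)
    }


def synteny_index(g1, g2, k=1):
    """Average fraction of the 2k nearest neighbours of each shared gene
    that are the same in both genomes (1.0 means identical gene order)."""
    if len(set(g1)) != len(g1) or len(set(g2)) != len(g2):
        raise ValueError("sketch assumes one copy per gene family")
    shared = set(g1) & set(g2)
    # Assumption of this sketch: drop genes absent from the other genome so
    # that gene order, not gene content, is compared.
    r1 = [g for g in g1 if g in shared]
    r2 = [g for g in g2 if g in shared]
    n1, n2 = neighbours(r1, k), neighbours(r2, k)
    return sum(len(n1[g] & n2[g]) for g in shared) / (2 * k * len(shared))


def synteny_distance(g1, g2, k=1):
    """Dissimilarity suitable as input to a distance-based tree method."""
    return 1.0 - synteny_index(g1, g2, k)


g1 = list("ABCDEFGH")
g3 = list("ABCDHGFE")  # inversion of the E-H block
print(synteny_index(g1, g1))     # 1.0
print(synteny_index(g1, g3))     # 0.75
print(synteny_distance(g1, g3))  # 0.25
```

Because each pairwise SI involves only set intersections over gene neighbourhoods, the distance matrix for many genomes is cheap to compute, and bootstrapping can be done by resampling genes rather than alignment columns.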