Related Articles

1.  The Phylogenetic Forest and the Quest for the Elusive Tree of Life 
Extensive horizontal gene transfer (HGT) among prokaryotes seems to undermine the tree of life (TOL) concept. However, the possibility remains that the TOL can be salvaged as a statistical central trend in the phylogenetic “forest of life” (FOL). A comprehensive comparative analysis of 6901 phylogenetic trees for prokaryotic genes revealed a signal of vertical inheritance that was particularly strong among the 102 nearly universal trees (NUTs), despite the high topological inconsistency among the trees in the FOL, most likely, caused by HGT. The topologies of the NUTs are similar to the topologies of numerous other trees in the FOL; although the NUTs cannot represent the FOL completely, they reflect a significant central trend. Thus, the original TOL concept becomes obsolete but the idea of a “weak” TOL as the dominant trend in the FOL merits further investigation. The totality of gene trees comprising the FOL appears to be a natural representation of the history of life given the inherent tree-like character of the replication process.
PMCID: PMC3380366  PMID: 19687142
2.  The Tree and Net Components of Prokaryote Evolution 
Phylogenetic trees of individual genes of prokaryotes (archaea and bacteria) generally have different topologies, largely owing to extensive horizontal gene transfer (HGT), suggesting that the Tree of Life (TOL) should be replaced by a “net of life” as the paradigm of prokaryote evolution. However, trees remain the natural representation of the histories of individual genes given the fundamentally bifurcating process of gene replication. Therefore, although no single tree can fully represent the evolution of prokaryote genomes, the complete picture of evolution will necessarily combine trees and nets. A quantitative measure of the signals of tree and net evolution is derived from an analysis of all quartets of species in all trees of the “Forest of Life” (FOL), which consists of approximately 7,000 phylogenetic trees for prokaryote genes including approximately 100 nearly universal trees (NUTs). Although diverse routes of net-like evolution collectively dominate the FOL, the pattern of tree-like evolution that reflects the consistent topologies of the NUTs is the most prominent coherent trend. We show that the contributions of tree-like and net-like evolutionary processes substantially differ across bacterial and archaeal lineages and between functional classes of genes. Evolutionary simulations indicate that the central tree-like signal cannot be realistically explained by a self-reinforcing pattern of biased HGT.
PMCID: PMC2997564  PMID: 20889655
phylogenetic tree; horizontal gene transfer; species quartets; computer simulation
3.  The fundamental units, processes and patterns of evolution, and the Tree of Life conundrum 
Biology Direct  2009;4:33.
The elucidation of the dominant role of horizontal gene transfer (HGT) in the evolution of prokaryotes led to a severe crisis of the Tree of Life (TOL) concept and intense debates on this subject.
Prompted by the crisis of the TOL, we attempt to define the primary units and the fundamental patterns and processes of evolution. We posit that replication of the genetic material is the singular fundamental biological process and that replication with an error rate below a certain threshold both enables and necessitates evolution by drift and selection. Starting from this proposition, we outline a general concept of evolution that consists of three major precepts.
1. The primary agency of evolution consists of Fundamental Units of Evolution (FUEs), that is, units of genetic material that possess a substantial degree of evolutionary independence. The FUEs include both bona fide selfish elements such as viruses, viroids, transposons, and plasmids, which encode some of the information required for their own replication, and regular genes that possess quasi-independence owing to their distinct selective value that provides for their transfer between ensembles of FUEs (genomes) and preferential replication along with the rest of the recipient genome.
2. The history of replication of a genetic element without recombination is isomorphously represented by a directed tree graph (an arborescence, in the graph theory language). Recombination within a FUE is common between very closely related sequences where homologous recombination is feasible but becomes negligible for longer evolutionary distances. In contrast, shuffling of FUEs occurs at all evolutionary distances. Thus, a tree is a natural representation of the evolution of an individual FUE on the macro scale, but not of an ensemble of FUEs such as a genome.
3. The history of life is properly represented by the "forest" of evolutionary trees for individual FUEs (Forest of Life, or FOL). Search for trends and patterns in the FOL is a productive direction of study that leads to the delineation of ensembles of FUEs that evolve coherently for a certain time span owing to a shared history of vertical inheritance or horizontal gene transfer; these ensembles are commonly known as genomes, taxa, or clades, depending on the level of analysis. A small set of genes (the universal genetic core of life) might show a (mostly) coherent evolutionary trend that transcends the entire history of cellular life forms. However, it might not be useful to denote this trend "the tree of life", or organismal, or species tree because neither organisms nor species are fundamental units of life.
A logical analysis of the units and processes of biological evolution suggests that the natural fundamental unit of evolution is a FUE, that is, a genetic element with an independent evolutionary history. Evolution of a FUE on the macro scale is naturally represented by a tree. Only the full compendium of trees for individual FUEs (the FOL) is an adequate depiction of the evolution of life. Coherent evolution of FUEs over extended evolutionary intervals is a crucial aspect of the history of life but a "species" or "organismal" tree is not a fundamental concept.
This article was reviewed by Valerian Dolja, W. Ford Doolittle, Nicholas Galtier, and William Martin.
PMCID: PMC2761301  PMID: 19788730
Methods in molecular biology (Clifton, N.J.)  2012;856:10.1007/978-1-61779-585-5_3.
Genome-wide comparison of phylogenetic trees is becoming an increasingly common approach in evolutionary genomics, and a variety of approaches for such comparison have been developed. In this article we present several methods for comparative analysis of large numbers of phylogenetic trees. To compare phylogenetic trees taking into account the bootstrap support for each internal branch, the Boot-Split Distance (BSD) method is introduced as an extension of the previously developed Split Distance (SD) method for tree comparison. The BSD method implements the straightforward idea that comparison of phylogenetic trees can be made more robust by treating tree splits differentially depending on the bootstrap support. Approaches are also introduced for detecting tree-like and net-like evolutionary trends in the phylogenetic Forest of Life (FOL), i.e., the entirety of the phylogenetic trees for conserved genes of prokaryotes. The principal method employed for this purpose includes mapping quartets of species onto trees to calculate the support of each quartet topology and so to quantify the tree and net contributions to the distances between species. We describe the methods used to analyze the FOL and the results obtained with them. These results support the concept of the Tree of Life (TOL) as a central evolutionary trend in the FOL as opposed to the traditional view of the TOL as a ‘species tree’.
PMCID: PMC3842619  PMID: 22399455
Forest of life; tree of life; phylogenomic methods; tree comparison; map of quartets
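The split-based tree comparison mentioned in the abstract above can be illustrated with a minimal sketch. It covers only a plain split distance, normalized here as the fraction of non-shared splits, which is one common convention and an assumption on our part. The bootstrap weighting that defines BSD is not reproduced because the abstract does not give its formula. The nested-tuple tree encoding and the function names `leaves`, `splits`, and `split_distance` are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Collect the non-trivial splits (bipartitions of the leaf set)."""
    all_leaves = leaves(tree)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        other = all_leaves - side
        # Skip trivial splits that separate fewer than two leaves.
        if len(side) > 1 and len(other) > 1:
            # An unordered pair, so both sides of a branch give the same key.
            found.add(frozenset((side, other)))
        for child in node:
            walk(child)

    walk(tree)
    return found


def split_distance(tree_a, tree_b):
    """Fraction of splits not shared by two trees on the same leaf set.

    Normalization is an assumption of this sketch, not taken from the source.
    """
    splits_a, splits_b = splits(tree_a), splits(tree_b)
    total = len(splits_a) + len(splits_b)
    if total == 0:
        return 0.0
    return len(splits_a ^ splits_b) / total


tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
print(split_distance(tree_1, tree_2))
```

In this toy case the two trees share the split separating D and E from the rest and differ in one split each. A bootstrap-aware variant would presumably weight each split by its support rather than counting it as 0 or 1.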
5.  Search for a 'Tree of Life' in the thicket of the phylogenetic forest 
Journal of Biology  2009;8(6):59.
Comparative genomics has revealed extensive horizontal gene transfer among prokaryotes, a development that is often considered to undermine the 'tree of life' concept. However, the possibility remains that a statistical central trend still exists in the phylogenetic 'forest of life'.
A comprehensive comparative analysis of a 'forest' of 6,901 phylogenetic trees for prokaryotic genes revealed a consistent phylogenetic signal, particularly among 102 nearly universal trees, despite high levels of topological inconsistency, probably due to horizontal gene transfer. Horizontal transfers seemed to be distributed randomly and did not obscure the central trend. The nearly universal trees were topologically similar to numerous other trees. Thus, the nearly universal trees might reflect a significant central tendency, although they cannot represent the forest completely. However, topological consistency was seen mostly at shallow tree depths and abruptly dropped at the level of the radiation of archaeal and bacterial phyla, suggesting that early phases of evolution could be non-tree-like (Biological Big Bang). Simulations of evolution under compressed cladogenesis or Biological Big Bang yielded a better fit to the observed dependence between tree inconsistency and phylogenetic depth for the compressed cladogenesis model.
Horizontal gene transfer is pervasive among prokaryotes: very few gene trees are fully consistent, making the original tree of life concept obsolete. A central trend that most probably represents vertical inheritance is discernible throughout the evolution of archaea and bacteria, although compressed cladogenesis complicates unambiguous resolution of the relationships between the major archaeal and bacterial clades.
PMCID: PMC2737373  PMID: 19594957
6.  Phylo SI: a new genome-wide approach for prokaryotic phylogeny 
Nucleic Acids Research  2013;42(4):2391-2404.
The evolutionary history of all life forms is usually represented as a vertical tree-like process. In prokaryotes, however, the vertical signal is partly obscured by the massive influence of horizontal gene transfer (HGT). The HGT creates widespread discordance between evolutionary histories of different genes as genomes become mosaics of gene histories. Thus, the Tree of Life (TOL) has been questioned as an appropriate representation of the evolution of prokaryotes. Nevertheless a common hypothesis is that prokaryotic evolution is primarily tree-like, and a routine effort is made to place new isolates in their appropriate location in the TOL. Moreover, it appears desirable to exploit non–tree-like evolutionary processes for the task of microbial classification. In this work, we present a novel technique that builds on the straightforward observation that gene order conservation (‘synteny’) decreases in time as a result of gene mobility. This is particularly true in prokaryotes, mainly due to HGT. Using a ‘synteny index’ (SI) that measures the average synteny between a pair of genomes, we developed the phylogenetic reconstruction tool ‘Phylo SI’. Phylo SI offers several attractive properties such as easy bootstrapping, high sensitivity in cases where phylogenetic signal is weak and computational efficiency. Phylo SI was tested both on simulated data and on two bacterial data sets and compared with two well-established phylogenetic methods. Phylo SI is particularly efficient on short evolutionary distances where synteny footprints remain detectable, whereas the nucleotide substitution signal is too weak for reliable sequence-based phylogenetic reconstruction. The method is publicly available at
PMCID: PMC3936750  PMID: 24243847
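The idea of a synteny index described in the abstract above can be sketched as follows. The abstract says only that SI measures the average synteny between a pair of genomes. The neighborhood definition used here (the k genes on each side of a shared gene, on a circular genome), the normalization by 2k, and the names `neighborhood` and `synteny_index` are assumptions made for illustration and may differ from the published method.

```python
def neighborhood(genome, index, k):
    """Genes within k positions of `index` on a circular gene order."""
    n = len(genome)
    return {genome[(index + offset) % n]
            for offset in range(-k, k + 1) if offset != 0}


def synteny_index(genome_a, genome_b, k=2):
    """Average fraction of shared neighbors over genes present in both genomes.

    Assumes unique gene identifiers and genomes longer than 2 * k genes.
    """
    position_a = {gene: i for i, gene in enumerate(genome_a)}
    position_b = {gene: i for i, gene in enumerate(genome_b)}
    shared = position_a.keys() & position_b.keys()
    if not shared:
        return 0.0
    total = 0.0
    for gene in shared:
        near_a = neighborhood(genome_a, position_a[gene], k)
        near_b = neighborhood(genome_b, position_b[gene], k)
        total += len(near_a & near_b) / (2 * k)
    return total / len(shared)


reference = list("abcdefgh")
rearranged = list("aebfcgdh")
print(synteny_index(reference, reference))   # identical gene order
print(synteny_index(reference, rearranged))  # scrambled gene order
```

A distance such as one minus this index could then feed a standard distance-based tree-building method, and bootstrapping could resample the shared genes. Both steps are left out of this sketch.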
7.  Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes 
Comparative analysis of sequenced genomes reveals numerous instances of apparent horizontal gene transfer (HGT), at least in prokaryotes, and indicates that lineage-specific gene loss might have been even more common in evolution. This complicates the notion of a species tree, which needs to be re-interpreted as a prevailing evolutionary trend, rather than the full depiction of evolution, and makes reconstruction of ancestral genomes a non-trivial task.
We addressed the problem of constructing parsimonious scenarios for individual sets of orthologous genes given a species tree. The orthologous sets were taken from the database of Clusters of Orthologous Groups of proteins (COGs). We show that the phyletic patterns (patterns of presence-absence in completely sequenced genomes) of almost 90% of the COGs are inconsistent with the hypothetical species tree. Algorithms were developed to reconcile the phyletic patterns with the species tree by postulating gene loss, COG emergence and HGT (the latter two classes of events were collectively treated as gene gains). We prove that each of these algorithms produces a parsimonious evolutionary scenario, which can be represented as mapping of loss and gain events on the species tree. The distribution of the evolutionary events among the tree nodes substantially depends on the underlying assumptions of the reconciliation algorithm, e.g. whether or not independent gene gains (gain after loss after gain) are permitted. Biological considerations suggest that, on average, gene loss might be a more likely event than gene gain. Therefore different gain penalties were used and the resulting series of reconstructed gene sets for the last universal common ancestor (LUCA) of the extant life forms were analysed. The number of genes in the reconstructed LUCA gene sets grows as the gain penalty increases. However, qualitative examination of the LUCA versions reconstructed with different gain penalties indicates that, even with a gain penalty of 1 (equal weights assigned to a gain and a loss), the set of 572 genes assigned to LUCA might be nearly sufficient to sustain a functioning organism. Under this gain penalty value, the numbers of horizontal gene transfer and gene loss events are nearly identical. This result holds true for two alternative topologies of the species tree and even under random shuffling of the tree. 
Therefore, the results seem to be compatible with approximately equal likelihoods of HGT and gene loss in the evolution of prokaryotes.
The notion that gene loss and HGT are major aspects of prokaryotic evolution was supported by quantitative analysis of the mapping of the phyletic patterns of COGs onto a hypothetical species tree. Algorithms were developed for constructing parsimonious evolutionary scenarios, which include gene loss and gain events, for orthologous gene sets, given a species tree. This analysis shows, contrary to expectations, that the number of predicted HGT events that occurred during the evolution of prokaryotes might be approximately the same as the number of gene losses. The approach to the reconstruction of evolutionary scenarios employed here is conservative with regard to the detection of HGT because only patterns of gene presence-absence in sequenced genomes are taken into account. In reality, horizontal transfer might have contributed to the evolution of many other genes also, which makes it a dominant force in prokaryotic evolution.
PMCID: PMC149225  PMID: 12515582
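The reconciliation of a phyletic pattern with a species tree under a gain penalty, as described in the abstract above, can be illustrated with a small dynamic-programming sketch in the style of weighted parsimony. This is not the authors' algorithm. The cost model (a loss costs 1, a gain costs `gain_penalty`, and presence at the root is charged one gain for the emergence of the gene), the tie-breaking toward absence at the root, and the name `scenario_cost` are all assumptions made for illustration.

```python
def scenario_cost(tree, present, gain_penalty=1.0):
    """Return (minimal cost, gene_inferred_at_root) for one phyletic pattern.

    `tree` is a nested tuple of leaf names; `present` is the set of leaves
    (genomes) that carry the gene.
    """
    infinity = float("inf")

    def solve(node):
        # Returns (cost if the gene is absent here, cost if it is present here).
        if isinstance(node, str):
            return (infinity, 0.0) if node in present else (0.0, infinity)
        cost_absent = cost_present = 0.0
        for child in node:
            child_absent, child_present = solve(child)
            # Parent absent: the child is absent too, or gains the gene.
            cost_absent += min(child_absent, child_present + gain_penalty)
            # Parent present: the child keeps the gene, or loses it (cost 1).
            cost_present += min(child_present, child_absent + 1.0)
        return cost_absent, cost_present

    root_absent, root_present = solve(tree)
    root_present += gain_penalty  # assumed charge for emergence before the root
    if root_present < root_absent:
        return root_present, True
    return root_absent, False  # ties are resolved toward absence at the root


species_tree = (("A", "B"), ("C", "D"))
print(scenario_cost(species_tree, {"A", "C"}, gain_penalty=1.0))
print(scenario_cost(species_tree, {"A", "C"}, gain_penalty=3.0))
```

With equal weights the patchy pattern is explained by two independent gains, while a higher gain penalty favors a single emergence at the root followed by two losses. This mirrors the statement that the reconstructed ancestral gene set grows as the gain penalty increases.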
8.  The Cobweb of Life Revealed by Genome-Scale Estimates of Horizontal Gene Transfer 
PLoS Biology  2005;3(10):e316.
With the availability of increasing amounts of genomic sequences, it is becoming clear that genomes experience horizontal transfer and incorporation of genetic information. However, to what extent such horizontal gene transfer (HGT) affects the core genealogical history of organisms remains controversial. Based on initial analyses of complete genomic sequences, HGT has been suggested to be so widespread that it might be the “essence of phylogeny” and might leave the treelike form of genealogy in doubt. On the other hand, possible biased estimation of HGT extent and the findings of coherent phylogenetic patterns indicate that phylogeny of life is well represented by tree graphs. Here, we reexamine this question by assessing the extent of HGT among core orthologous genes using a novel statistical method based on statistical comparisons of tree topology. We apply the method to 40 microbial genomes in the Clusters of Orthologous Groups database over a curated set of 297 orthologous gene clusters, and we detect significant HGT events in 33 out of 297 clusters over a wide range of functional categories. Estimates of positions of HGT events suggest a low mean genome-specific rate of HGT (2.0%) among the orthologous genes, which is in general agreement with other quantitative estimates of HGT. We propose that HGT events, even when relatively common, still leave the treelike history of phylogenies intact, much like cobwebs hanging from tree branches.
A statistical approach applied to 297 orthologous gene clusters in 40 microbial genomes suggests a low rate of interspecies gene transfer. Species relationships can therefore be modeled with a tree structure.
PMCID: PMC1233574  PMID: 16122348
9.  Being Pathogenic, Plastic, and Sexual while Living with a Nearly Minimal Bacterial Genome 
PLoS Genetics  2007;3(5):e75.
Mycoplasmas are commonly described as the simplest self-replicating organisms, whose evolution was mainly characterized by genome downsizing with a proposed evolutionary scenario similar to that of obligate intracellular bacteria such as insect endosymbionts. Thus far, analysis of mycoplasma genomes indicates a low level of horizontal gene transfer (HGT) implying that DNA acquisition is strongly limited in these minimal bacteria. In this study, the genome of the ruminant pathogen Mycoplasma agalactiae was sequenced. Comparative genomic data and phylogenetic tree reconstruction revealed that ∼18% of its small genome (877,438 bp) has undergone HGT with the phylogenetically distinct mycoides cluster, which is composed of significant ruminant pathogens. HGT involves genes often found as clusters, several of which encode lipoproteins that usually play an important role in mycoplasma–host interaction. A decayed form of a conjugative element also described in a member of the mycoides cluster was found in the M. agalactiae genome, suggesting that HGT may have occurred by mobilizing a related genetic element. The possibility of HGT events among other mycoplasmas was evaluated with the available sequenced genomes. Our data indicate marginal levels of HGT among Mycoplasma species except for those described above and, to a lesser extent, for those observed in between the two bird pathogens, M. gallisepticum and M. synoviae. This first description of large-scale HGT among mycoplasmas sharing the same ecological niche challenges the generally accepted evolutionary scenario in which gene loss is the main driving force of mycoplasma evolution. The latter clearly differs from that of other bacteria with small genomes, particularly obligate intracellular bacteria that are isolated within host cells. 
Consequently, mycoplasmas are not only able to subvert complex hosts but presumably have retained sexual competence, a trait that may prevent them from genome stasis and contribute to adaptation to new hosts.
Author Summary
Mycoplasmas are cell wall–lacking prokaryotes that evolved from ancestors common to Gram-positive bacteria by way of massive losses of genetic material. With their minimal genome, mycoplasmas are considered to be the simplest free-living organisms, yet several species are successful pathogens of man and animal. In this study, we challenged the commonly accepted view in which mycoplasma evolution is driven only by genome down-sizing. Indeed, we showed that a significant amount of genes underwent horizontal transfer among different mycoplasma species that share the same ruminant hosts. In these species, the occurrence of a genetic element that can promote DNA transfer via cell-to-cell contact suggests that some mycoplasmas may have retained or acquired sexual competence. Transferred genes were found to encode proteins that are likely to be associated with mycoplasma–host interactions. Sharing genetic resources via horizontal gene transfer may provide mycoplasmas with a means for adapting to new niches or to new hosts and for avoiding irreversible genome erosion.
PMCID: PMC1868952  PMID: 17511520
10.  metaTIGER: a metabolic evolution resource 
Nucleic Acids Research  2008;37(Database issue):D531-D538.
Metabolic networks are a subject that has received much attention, but existing web resources do not include extensive phylogenetic information. Phylogenomic approaches (phylogenetics on a genomic scale) have been shown to be effective in the study of evolution and processes like horizontal gene transfer (HGT). To address the lack of phylogenomic information relating to eukaryotic metabolism, metaTIGER has been created, using genomic information from 121 eukaryotes and 404 prokaryotes and sensitive sequence search techniques to predict the presence of metabolic enzymes. These enzyme sequences were used to create a comprehensive database of 2257 maximum-likelihood phylogenetic trees, some containing over 500 organisms. The trees can be viewed using iTOL, an advanced interactive tree viewer, enabling straightforward interpretation of large trees. Complex high-throughput tree analysis is also available through user-defined queries, allowing the rapid identification of trees of interest, e.g. containing putative HGT events. metaTIGER also provides novel and easy-to-use facilities for viewing and comparing the metabolic networks in different organisms via highlighted pathway images and tables. metaTIGER is demonstrated through evolutionary analysis of Plasmodium, including identification of genes horizontally transferred from chlamydia.
PMCID: PMC2686446  PMID: 18953037
11.  HGT-Gen: a tool for generating a phylogenetic tree with horizontal gene transfer 
Bioinformation  2011;7(5):211-213.
Horizontal gene transfer (HGT) is a common event in prokaryotic evolution. Therefore, it is very important to consider HGT in the study of molecular evolution of prokaryotes. This is true also for conducting computer simulations of their molecular phylogeny because HGT is known to be a serious disturbing factor for estimating their correct phylogeny. To the best of our knowledge, no existing computer program has generated a phylogenetic tree with HGT from an original phylogenetic tree. We developed a program called HGT-Gen that generates a phylogenetic tree with HGT on the basis of an original phylogenetic tree of a protein or gene. HGT-Gen converts an operational taxonomic unit or a clade from one place to another in a given phylogenetic tree. We have also devised an algorithm to compute the average length between any pair of branches in the tree. It defines and computes the relative evolutionary time to normalize evolutionary time for each lineage. The algorithm can generate an HGT between a pair of donor and acceptor lineages at the same evolutionary time. HGT-Gen is used with a sequence-generating program to evaluate the influence of HGT on the molecular phylogeny of prokaryotes in a computer simulation study.
The database is available for free at˜thoriike/HGT-Gen.html
PMCID: PMC3218414  PMID: 22125388
12.  Using the nucleotide substitution rate matrix to detect horizontal gene transfer 
BMC Bioinformatics  2006;7:476.
Horizontal gene transfer (HGT) has allowed bacteria to evolve many new capabilities. Because transferred genes perform many medically important functions, such as conferring antibiotic resistance, improved detection of horizontally transferred genes from sequence data would be an important advance. Existing sequence-based methods for detecting HGT focus on changes in nucleotide composition or on differences between gene and genome phylogenies; these methods have high error rates.
First, we introduce a new class of methods for detecting HGT based on the changes in nucleotide substitution rates that occur when a gene is transferred to a new organism. Our new methods discriminate simulated HGT events with an error rate up to 10 times lower than does GC content. Use of models that are not time-reversible is crucial for detecting HGT. Second, we show that using combinations of multiple predictors of HGT offers substantial improvements over using any single predictor, yielding as much as a factor of 18 improvement in performance (a maximum reduction in error rate from 38% to about 3%). Multiple predictors were combined by using the random forests machine learning algorithm to identify optimal classifiers that separate HGT from non-HGT trees.
The new class of HGT-detection methods introduced here combines advantages of phylogenetic and compositional HGT-detection techniques. These new techniques offer order-of-magnitude improvements over compositional methods because they are better able to discriminate HGT from non-HGT trees under a wide range of simulated conditions. We also found that combining multiple measures of HGT is essential for detecting a wide range of HGT events. These novel indicators of horizontal transfer will be widely useful in detecting HGT events linked to the evolution of important bacterial traits, such as antibiotic resistance and pathogenicity.
PMCID: PMC1657035  PMID: 17067382
13.  Complete-fosmid and fosmid-end sequences reveal frequent horizontal gene transfers in marine uncultured planktonic archaea 
The ISME Journal  2011;5(8):1291-1302.
The extent of horizontal gene transfer (HGT) among marine pelagic prokaryotes and the role that HGT may have played in their adaptation to this particular environment remain open questions. This is partly due to the paucity of cultured species and genomic information for many widespread groups of marine bacteria and archaea. Molecular studies have revealed a large diversity and relative abundance of marine planktonic archaea, in particular of Thaumarchaeota (also known as group I Crenarchaeota) and Euryarchaeota of groups II and III, but only one species (the thaumarchaeote Candidatus Nitrosopumilus maritimus) has been isolated in pure culture so far. Therefore, metagenomics remains the most powerful approach to study these environmental groups. To investigate the impact of HGT in marine archaea, we carried out detailed phylogenetic analyses of all open reading frames of 21 archaeal 16S rRNA gene-containing fosmids and, to extend our analysis to other genomic regions, also of fosmid-end sequences of 12 774 fosmids from three different deep-sea locations (South Atlantic and Adriatic Sea at 1000 m depth, and Ionian Sea at 3000 m depth). We found high HGT rates in both marine planktonic Thaumarchaeota and Euryarchaeota, with remarkable converging values estimated from complete-fosmid and fosmid-end sequence analysis (25 and 21% of the genes, respectively). Most HGTs came from bacterial donors (mainly from Proteobacteria, Firmicutes and Chloroflexi) but also from other archaea and eukaryotes. Phylogenetic analyses showed that in most cases HGTs are shared by several representatives of the studied groups, implying that they are ancient and have been conserved over relatively long evolutionary periods. 
This, together with the functions carried out by these acquired genes (mostly related to energy metabolism and transport of metabolites across membranes), suggests that HGT has played an important role in the adaptation of these archaea to the cold and nutrient-depleted deep marine environment.
PMCID: PMC3146271  PMID: 21346789
Thaumarchaeota; marine Euryarchaeota; metagenomics; deep ocean; planktonic archaea; horizontal gene transfer
14.  Detecting lateral gene transfers by statistical reconciliation of phylogenetic forests 
BMC Bioinformatics  2010;11:324.
To understand the evolutionary role of Lateral Gene Transfer (LGT), accurate methods are needed to identify transferred genes and infer their timing of acquisition. Phylogenetic methods are particularly promising for this purpose, but the reconciliation of a gene tree with a reference (species) tree is computationally hard. In addition, the application of these methods to real data raises the problem of sorting out real and artifactual phylogenetic conflict.
We present Prunier, a new method for phylogenetic detection of LGT based on the search for a maximum statistical agreement forest (MSAF) between a gene tree and a reference tree. The program is flexible as it can use any definition of "agreement" among trees. We evaluate the performance of Prunier and two other programs (EEEP and RIATA-HGT) for their ability to detect transferred genes in realistic simulations where gene trees are reconstructed from sequences. Prunier proposes a single scenario that compares to the other methods in terms of sensitivity, but shows higher specificity. We show that LGT scenarios carry a strong signal about the position of the root of the species tree and could be used to identify the direction of evolutionary time on the species tree. We use Prunier on a biological dataset of 23 universal proteins and discuss their suitability for inferring the tree of life.
The ability of Prunier to take into account branch support in the process of reconciliation allows a gain in complexity, in comparison to EEEP, and in accuracy in comparison to RIATA-HGT. Prunier's greedy algorithm proposes a single scenario of LGT for a gene family, but its quality always compares to the best solutions provided by the other algorithms. When the root position is uncertain in the species tree, Prunier is able to infer a scenario per root at a limited additional computational cost and can easily run on large datasets.
Prunier is implemented in C++, using the Bio++ library and the phylogeny program Treefinder. It is available at:
PMCID: PMC2905365  PMID: 20550700
15.  Genome trees constructed using five different approaches suggest new major bacterial clades 
The availability of multiple complete genome sequences from diverse taxa prompts the development of new phylogenetic approaches, which attempt to incorporate information derived from comparative analysis of complete gene sets or large subsets thereof. Such attempts are particularly relevant because of the major role of horizontal gene transfer and lineage-specific gene loss, at least in the evolution of prokaryotes.
Five largely independent approaches were employed to construct trees for completely sequenced bacterial and archaeal genomes: i) presence-absence of genomes in clusters of orthologous genes; ii) conservation of local gene order (gene pairs) among prokaryotic genomes; iii) parameters of identity distribution for probable orthologs; iv) analysis of concatenated alignments of ribosomal proteins; v) comparison of trees constructed for multiple protein families. All constructed trees support the separation of the two primary prokaryotic domains, bacteria and archaea, as well as some terminal bifurcations within the bacterial and archaeal domains. Beyond these obvious groupings, the trees made with different methods appeared to differ substantially in terms of the relative contributions of phylogenetic relationships and similarities in gene repertoires caused by similar life styles and horizontal gene transfer to the tree topology. The trees based on presence-absence of genomes in orthologous clusters and the trees based on conserved gene pairs appear to be strongly affected by gene loss and horizontal gene transfer. The trees based on identity distributions for orthologs and particularly the tree made of concatenated ribosomal protein sequences seemed to carry a stronger phylogenetic signal. The latter tree supported three potential high-level bacterial clades: i) Chlamydia-Spirochetes, ii) Thermotogales-Aquificales (bacterial hyperthermophiles), and iii) Actinomycetes-Deinococcales-Cyanobacteria. The latter group also appeared to join the low-GC Gram-positive bacteria at a deeper tree node. These new groupings of bacteria were supported by the analysis of alternative topologies in the concatenated ribosomal protein tree using the Kishino-Hasegawa test and by a census of the topologies of 132 individual groups of orthologous proteins.
Additionally, the results of this analysis put into question the sister-group relationship between the two major archaeal groups, Euryarchaeota and Crenarchaeota, and suggest instead that Euryarchaeota might be a paraphyletic group with respect to Crenarchaeota.
We conclude that, the extensive horizontal gene flow and lineage-specific gene loss notwithstanding, extension of phylogenetic analysis to the genome scale has the potential of uncovering deep evolutionary relationships between prokaryotic lineages.
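Approach i) above, trees built from the presence-absence of genomes in clusters of orthologous genes, starts from a pairwise distance between gene-content profiles. The following is a minimal sketch of that first step only. It assumes a Jaccard distance, which the abstract does not specify, and the genome and cluster names are hypothetical placeholders.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two sets of ortholog-cluster IDs."""
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def gene_content_distances(profiles):
    """Map each unordered genome pair to its gene-content distance."""
    return {
        (g1, g2): jaccard_distance(profiles[g1], profiles[g2])
        for g1, g2 in combinations(sorted(profiles), 2)
    }


# Hypothetical presence-absence profiles: genome -> clusters it is present in.
profiles = {
    "genome_A": {"c1", "c2", "c3", "c4"},
    "genome_B": {"c1", "c2", "c3"},
    "genome_C": {"c4", "c5"},
}

distances = gene_content_distances(profiles)
for pair, d in sorted(distances.items()):
    print(pair, round(d, 3))
```

A distance table of this kind could then be passed to any distance-based tree-building method. Because shared gene loss and horizontal transfer both change the profiles, such distances mix phylogenetic signal with similarity in gene repertoire, which is the caveat the abstract raises for this approach.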
PMCID: PMC60490  PMID: 11734060
16.  A database of phylogenetically atypical genes in archaeal and bacterial genomes, identified using the DarkHorse algorithm 
BMC Bioinformatics  2008;9:419.
The process of horizontal gene transfer (HGT) is believed to be widespread in Bacteria and Archaea, but little comparative data is available addressing its occurrence in complete microbial genomes. Collection of high-quality, automated HGT prediction data based on phylogenetic evidence has previously been impractical for large numbers of genomes at once, due to prohibitive computational demands. DarkHorse, a recently described statistical method for discovering phylogenetically atypical genes on a genome-wide basis, provides a means to solve this problem through lineage probability index (LPI) ranking scores. LPI scores inversely reflect phylogenetic distance between a test amino acid sequence and its closest available database matches. Proteins with low LPI scores are good horizontal gene transfer candidates; those with high scores are not.
The DarkHorse algorithm has been applied to 955 microbial genome sequences, and the results organized into a web-searchable relational database, called the DarkHorse HGT Candidate Resource. Users can select individual genomes or groups of genomes to screen by LPI score, search for protein functions by descriptive annotation or amino acid sequence similarity, or select proteins with unusual G+C composition in their underlying coding sequences. The search engine reports LPI scores for match partners as well as query sequences, providing the opportunity to explore whether potential HGT donor sequences are phylogenetically typical or atypical within their own genomes. This information can be used to predict whether or not sufficient information is available to build a well-supported phylogenetic tree using the potential donor sequence.
The DarkHorse HGT Candidate database provides a powerful, flexible set of tools for identifying phylogenetically atypical proteins, allowing researchers to explore both individual HGT events in single genomes, and large-scale HGT patterns among protein families and genome groups. Although the DarkHorse algorithm cannot, by itself, provide definitive proof of horizontal gene transfer, it is a flexible, powerful tool that can be combined with slower, more rigorous methods in situations where these other methods could not otherwise be applied.
PMCID: PMC2573894  PMID: 18840280
17.  Extensive Intra-Kingdom Horizontal Gene Transfer Converging on a Fungal Fructose Transporter Gene 
PLoS Genetics  2013;9(6):e1003587.
Comparative genomics revealed in the last decade a scenario of rampant horizontal gene transfer (HGT) among prokaryotes, but for fungi a clearly dominant pattern of vertical inheritance still stands, punctuated however by an increasing number of exceptions. In the present work, we studied the phylogenetic distribution and pattern of inheritance of a fungal gene encoding a fructose transporter (FSY1) with unique substrate selectivity. 109 FSY1 homologues were identified in two sub-phyla of the Ascomycota, in a survey that included 241 available fungal genomes. At least 10 independent inter-species instances of horizontal gene transfer (HGT) involving FSY1 were identified, supported by strong phylogenetic evidence and synteny analyses. The acquisition of FSY1 through HGT was sometimes suggestive of xenolog gene displacement, but several cases of pseudoparalogy were also uncovered. Moreover, evidence was found for successive HGT events, possibly including those responsible for transmission of the gene among yeast lineages. These occurrences do not seem to be driven by functional diversification of the Fsy1 proteins because Fsy1 homologues from widely distant lineages, including at least one acquired by HGT, appear to have similar biochemical properties. In summary, retracing the evolutionary path of the FSY1 gene brought to light an unparalleled number of independent HGT events involving a single fungal gene. We propose that the turbulent evolutionary history of the gene may be linked to the unique biochemical properties of the encoded transporter, whose predictable effect on fitness may be highly variable. In general, our results support the most recent views suggesting that inter-species HGT may have contributed much more substantially to shape fungal genomes than heretofore assumed.
Author Summary
Genes are commonly vertically inherited, meaning that they share the evolutionary history of the organisms in which they are found. However, they can also be transmitted between species with overlapping niches, a phenomenon known as horizontal gene transfer (HGT) that can occur between closely related species but also between organisms belonging to different domains of life. While HGT is very common in prokaryotes, it has been less frequently reported in eukaryotes, including eukaryotic microbes. In fungi, several instances of genes acquired by HGT from bacteria have been reported, but gene exchange between fungal species is thought to be rare. Here, we describe our findings concerning a single fungal gene that seems to have been transferred between fungi very often. We believe this may be related to the fact that the gene can be both very useful and detrimental for the host, depending on genetic background and environment. Our results suggest that exchange of genes between fungi may happen much more frequently than assumed so far.
PMCID: PMC3688497  PMID: 23818872
18.  SPRIT: Identifying horizontal gene transfer in rooted phylogenetic trees 
Phylogenetic trees based on sequences from a set of taxa can be incongruent due to horizontal gene transfer (HGT). By identifying the HGT events, we can reconcile the gene trees and derive a taxon tree that adequately represents the species' evolutionary history. One HGT can be represented by a rooted Subtree Prune and Regraft (RSPR) operation and the number of RSPRs separating two trees corresponds to the minimum number of HGT events. Identifying the minimum number of RSPRs separating two trees is NP-hard, but the problem is fixed-parameter tractable. A number of heuristic and two exact approaches to identifying the minimum number of RSPRs have been proposed. This is the first implementation delivering an exact solution as well as the intermediate trees connecting the input trees.
We present the SPR Identification Tool (SPRIT), a novel algorithm that solves the fixed parameter tractable minimum RSPR problem and its GPL licensed Java implementation. The algorithm can be used in two ways, exhaustive search that guarantees the minimum RSPR distance and a heuristic approach that guarantees finding a solution, but not necessarily the minimum one. We benchmarked SPRIT against other software in two different settings, small to medium sized trees i.e. five to one hundred taxa and large trees i.e. thousands of taxa. In the small to medium tree size setting with random artificial incongruence, SPRIT's heuristic mode outperforms the other software by always delivering a solution with a low overestimation of the RSPR distance. In the large tree setting SPRIT compares well to the alternatives when benchmarked on finding a minimum solution within a reasonable time. SPRIT presents both the minimum RSPR distance and the intermediate trees.
When used in exhaustive search mode, SPRIT identifies the minimum number of RSPRs needed to reconcile two incongruent rooted trees. SPRIT also performs quick approximations of the minimum RSPR distance, which are comparable to, and often better than, purely heuristic solutions. Put together, SPRIT is an excellent tool for identification of HGT events and pinpointing which taxa have been involved in HGT.
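The rooted subtree prune and regraft (RSPR) operation that the abstract uses to model one HGT event can be illustrated on a toy tree. This sketch is not the SPRIT algorithm and does not compute a minimum RSPR distance. It only applies a single move, and it assumes binary trees written as nested tuples with unique leaf labels (a representation chosen here for illustration).

```python
def prune(tree, subtree):
    """Remove `subtree` from a rooted binary tree and suppress its parent node."""
    if tree == subtree:
        raise ValueError("cannot prune the whole tree")
    if isinstance(tree, str):  # a leaf that is not the pruned subtree
        return tree
    left, right = tree
    if left == subtree:
        return right  # the sibling takes the place of the parent
    if right == subtree:
        return left
    return (prune(left, subtree), prune(right, subtree))


def regraft(tree, target, subtree):
    """Attach `subtree` as the sister of `target` by subdividing the edge above it."""
    if tree == target:
        return (tree, subtree)
    if isinstance(tree, str):
        return tree
    left, right = tree
    return (regraft(left, target, subtree), regraft(right, target, subtree))


def rspr(tree, subtree, target):
    """Apply one rooted SPR move: prune `subtree`, then regraft it next to `target`."""
    return regraft(prune(tree, subtree), target, subtree)


# Toy example: moving leaf "A" next to "D" mimics one transfer into the A lineage.
species_tree = ((("A", "B"), "C"), "D")
gene_tree = rspr(species_tree, "A", "D")
print(gene_tree)
```

Here the two trees differ by exactly one move, so a single HGT event would be enough to explain the incongruence. Finding the smallest number of such moves between two arbitrary trees is the hard problem the abstract describes.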
PMCID: PMC2829038  PMID: 20152048
19.  Estimating the extent of horizontal gene transfer in metagenomic sequences 
BMC Genomics  2008;9:136.
Although the extent of horizontal gene transfer (HGT) in complete genomes has been widely studied, its influence in the evolution of natural communities of prokaryotes remains unknown. The availability of metagenomic sequences allows us to address the study of global patterns of prokaryotic evolution in samples from natural communities. However, the methods that have been commonly used for the study of HGT are not suitable for metagenomic samples. Therefore it is important to develop new methods or to adapt existing ones to be used with metagenomic sequences.
We have created two different methods that are suitable for the study of HGT in metagenomic samples. The methods are based on phylogenetic and DNA compositional approaches, and have allowed us to assess the extent of possible HGT events in metagenomes for the first time. The methods are shown to be compatible and quite precise, although they probably underestimate the number of possible events. Our results show that the phylogenetic method detects HGT in between 0.8% and 1.5% of the sequences, while DNA compositional methods identify putative HGT in between 2% and 8% of the sequences. These ranges are very similar to those found in complete genomes by related approaches. The two methods differ in sensitivity, probably because they target HGT events of different ages: the compositional method mostly identifies recent transfers, while the phylogenetic method is more suitable for the detection of older events. Nevertheless, the study of the number of HGT events in metagenomic sequences from different communities shows a consistent trend for both methods: the lowest amount is found for the sequences of the Sargasso Sea metagenome, while the highest quantity is found in the whale fall metagenome from the bottom of the ocean. The significance of these observations is discussed.
The computational approaches that are used to find possible HGT events in complete genomes can be adapted to work with metagenomic samples, where they show a high level of performance across different samples. The percentage of possible HGT events that were observed is close to that found for complete genomes, and different microbiomes show diverse ratios of putative HGT events. This is probably related to both environmental factors and the species composition of each particular community.
PMCID: PMC2324111  PMID: 18366724
20.  Identification of horizontally transferred genes in the genus Colletotrichum reveals a steady tempo of bacterial to fungal gene transfer 
BMC Genomics  2015;16(1):2.
Horizontal gene transfer (HGT) is the stable transmission of genetic material between organisms by means other than vertical inheritance. HGT has an important role in the evolution of prokaryotes but is relatively rare in eukaryotes. HGT has been shown to contribute to virulence in eukaryotic pathogens. We studied the importance of HGT in plant pathogenic fungi by identifying horizontally transferred genes in the genomes of three members of the genus Colletotrichum.
We identified eleven HGT events from bacteria into members of the genus Colletotrichum or their ancestors. The HGT events include genes involved in amino acid, lipid and sugar metabolism as well as lytic enzymes. Additionally, the putative minimal dates of transference were calculated using a time calibrated phylogenetic tree. This analysis reveals a constant flux of genes from bacteria to fungi throughout the evolution of subphylum Pezizomycotina.
Genes that are typically transferred by HGT are those that are constantly subject to gene duplication and gene loss. The functions of some of these genes suggest roles in niche adaptation and virulence. We found no evidence of a burst of HGT events coinciding with major geological events. In contrast, HGT appears to be a constant, albeit rare phenomenon in the Pezizomycotina, occurring at a steady rate during their evolution.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-16-2) contains supplementary material, which is available to authorized users.
PMCID: PMC4320630  PMID: 25555398
Horizontal gene transfer; Phytopathogen; Colletotrichum; Anthracnose; Pezizomycotina; Bacteria; Molecular clock
21.  Evolutionary insights about bacterial GlxRS from whole genome analyses: is GluRS2 a chimera? 
Evolutionary histories of glutamyl-tRNA synthetase (GluRS) and glutaminyl-tRNA synthetase (GlnRS) in bacteria are convoluted. After the divergence of eubacteria and eukarya, bacterial GluRS glutamylated both tRNAGln and tRNAGlu until GlnRS appeared by horizontal gene transfer (HGT) from eukaryotes or a duplicate copy of GluRS (GluRS2) that only glutamylates tRNAGln appeared. The current understanding is based on limited sequence data and not always compatible with available experimental results. In particular, the origin of GluRS2 is poorly understood.
A large database of bacterial GluRS, GlnRS, tRNAGln and the trimeric aminoacyl-tRNA-dependent amidotransferase (gatCAB), constructed from whole genomes by functionally annotating and classifying these enzymes according to their mutual presence and absence in the genome, was analyzed. Phylogenetic analyses showed that the catalytic and the anticodon-binding domains of functional GluRS2 (as in Helicobacter pylori) were independently acquired from evolutionarily distant hosts by HGT. Non-functional GluRS2 (as in Thermotoga maritima), on the other hand, was found to contain an anticodon-binding domain appended to a gene-duplicated catalytic domain. Several genomes were found to possess both GluRS2 and GlnRS, even though they share the common function of aminoacylating tRNAGln. GlnRS was widely distributed among bacterial phyla and although phylogenetic analyses confirmed the origin of most bacterial GlnRS to be through a single HGT from eukarya, many GlnRS sequences also appeared with evolutionarily distant phyla in the phylogenetic tree. A GlnRS pseudogene could be identified in Sorangium cellulosum.
Our analysis broadens the current understanding of bacterial GlxRS evolution and highlights the idiosyncratic evolution of GluRS2. Specifically we show that: i) GluRS2 is a chimera of mismatching catalytic and anticodon-binding domains, ii) the appearance of GlnRS and GluRS2 in a single bacterial genome indicates that the evolutionary histories of the two enzymes are distinct, iii) GlnRS is more widespread in bacteria than is believed, iv) bacterial GlnRS appeared both by HGT from eukarya and intra-bacterial HGT, v) the presence of a GlnRS pseudogene shows that many bacteria could not retain the newly acquired eukaryal GlnRS. The functional annotation of GluRS, without recourse to experiments, performed in this work, demonstrates the inherent and unique advantages of using whole genome over isolated sequence databases.
PMCID: PMC3927822  PMID: 24521160
GluRS; GluRS2; GlnRS; HGT; tRNAGln; Gene duplication; Phylum-specificity; Whole-genome analysis
22.  Horizontal gene transfer dynamics and distribution of fitness effects during microbial in silico evolution 
BMC Bioinformatics  2012;13(Suppl 10):S13.
Horizontal gene transfer (HGT) is a process that facilitates the transfer of genetic material between organisms that are not directly related, and thus can affect both the rate of evolution and emergence of traits. Recent phylogenetic studies reveal HGT events are likely ubiquitous in the Tree of Life. However, our knowledge of HGT's role in evolution and biological organization is very limited, mainly due to the lack of ancestral evolutionary signatures and the difficulty of observing complex evolutionary dynamics in a laboratory setting. Here, we utilize a multi-scale microbial evolution model to comprehensively study the effect of HGT on the evolution of complex traits and organization of gene regulatory networks.
Large-scale simulations reveal a distinct signature of the Distribution of Fitness Effect (DFE) for HGT events: during evolution, while mutation fitness effects become more negative and neutral, HGT events result in a balanced effect distribution. In either case, lethal events are significantly decreased during evolution (33.0% to 3.2%), a clear indication of mutational robustness. Interestingly, evolution was accelerated when populations were exposed to correlated environments of increasing complexity, especially in the presence of HGT, a phenomenon that warrants further investigation. High HGT rates were found to be disruptive, while the average transferred fragment size was linked to functional module size in the underlying biological network. Network analysis reveals that HGT results in larger regulatory networks, but with the same sparsity level as those evolved in its absence. Observed phenotypic variability and co-existing solutions were traced to individual gain/loss of function events, while subsequent re-wiring after fragment integration was necessary for complex traits to emerge.
PMCID: PMC3382434  PMID: 22759418
23.  Cophenetic correlation analysis as a strategy to select phylogenetically informative proteins: an example from the fungal kingdom 
The construction of robust and well resolved phylogenetic trees is important for our understanding of many, if not all biological processes, including speciation and origin of higher taxa, genome evolution, metabolic diversification, multicellularity, origin of life styles, pathogenicity and so on. Many older phylogenies were not well supported due to insufficient phylogenetic signal present in the single or few genes used in phylogenetic reconstructions. Importantly, single gene phylogenies were not always found to be congruent. The phylogenetic signal may, therefore, be increased by enlarging the number of genes included in phylogenetic studies. Unfortunately, concatenation of many genes does not take into consideration the evolutionary history of each individual gene. Here, we describe an approach to select informative phylogenetic proteins to be used in the Tree of Life (TOL) and barcoding projects by comparing the cophenetic correlation coefficients (CCC) among individual protein distance matrices of proteins, using the fungi as an example. The method demonstrated that the quality and number of concatenated proteins is important for a reliable estimation of TOL. Approximately 40–45 concatenated proteins seem needed to resolve fungal TOL.
In total 4852 orthologous proteins (KOGs) were assigned among 33 fungal genomes from the Asco- and Basidiomycota and 70 of these represented single copy proteins. The individual protein distance matrices based on 531 concatenated proteins that have been used for phylogeny reconstruction before [14] were compared one with another in order to select those with the highest CCC, which then was used as a reference. This reference distance matrix was compared with those of the 70 single copy proteins selected and their CCC values were calculated. Sixty four KOGs showed a CCC above 0.50 and these were further considered for their phylogenetic potential. Proteins belonging to the cellular processes and signaling KOG category seem more informative than those belonging to the other three categories: information storage and processing; metabolism; and the poorly characterized category. After concatenation of 40 proteins the topology of the phylogenetic tree remained stable, but after concatenation of 60 or more proteins the bootstrap support values of some branches decreased, most likely due to the inclusion of proteins with lower CCC values. The selection of protein sequences to be used in various TOL projects remains a critical and important process. The method described in this paper will contribute to a more objective selection of phylogenetically informative protein sequences.
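The selection step described above, comparing each candidate protein's distance matrix with a reference matrix and keeping those with a CCC above 0.50, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the CCC is taken here to be the Pearson correlation between corresponding off-diagonal entries of two symmetric distance matrices, and the matrices and protein names are invented toy data. Only the 0.50 threshold comes from the abstract.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def matrix_correlation(reference, candidate):
    """Correlation between the pairwise distances in two matrices over the same taxa."""
    return pearson(upper_triangle(reference), upper_triangle(candidate))


def select_informative(reference, candidates, threshold=0.50):
    """Keep candidate proteins whose correlation with the reference exceeds the threshold."""
    selected = {}
    for name, matrix in candidates.items():
        r = matrix_correlation(reference, matrix)
        if r > threshold:
            selected[name] = r
    return selected


# Toy data for four taxa (hypothetical values, same taxon order in every matrix).
reference = [[0, 1, 2, 3], [1, 0, 2, 3], [2, 2, 0, 1], [3, 3, 1, 0]]
candidates = {
    "protein_congruent": [[0, 2, 4, 6], [2, 0, 4, 6], [4, 4, 0, 2], [6, 6, 2, 0]],
    "protein_conflicting": [[0, 3, 2, 1], [3, 0, 2, 1], [2, 2, 0, 3], [1, 1, 3, 0]],
}

selected = select_informative(reference, candidates)
print(selected)
```

In this toy case the first candidate is a scaled copy of the reference and is kept, while the second reverses the ordering of distances and is dropped. Real distance matrices would fall between these extremes, which is where the threshold matters.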
This study provides candidate protein sequences to be considered as phylogenetic markers in different branches of fungal TOL. The selection procedure described here will be useful to select informative protein sequences to resolve branches of TOL that contain few or no species with completely sequenced genomes. The robust phylogenetic trees resulting from this method may contribute to our understanding of organismal diversification processes. The method proposed can be extended easily to other branches of TOL.
PMCID: PMC2045111  PMID: 17688684
24.  HGT turbulence 
Mobile Genetic Elements  2011;1(4):256-261.
Horizontal gene transfer (HGT) often leads to phylogenetic incongruence. When “duplicative HGT” introduces a second copy of a pre-existing gene, the two copies may then engage in gene conversion, leading to phylogenetically mosaic genes. When duplicative HGT is followed by differential gene conversion among descendant lineages, as under the DH-DC model, phylogenetic analysis is further complicated. To explore the effects of DH-DC on phylogeny reconstruction, we analyzed two sets of sequences: (1) an augmented set of plant mitochondrial atp1 sequences for which we recently published evidence of DH-DC; and (2) a set of simulated sequences for which we varied the extent of chimerism, the number of chimeric genes and nucleotide substitution rates. We show that the phylogenetic behavior of evolutionarily chimeric genes is highly volatile and depends on both the degree of chimerism and the number of differentially chimeric genes present in the analysis. Furthermore, we show that the presence of chimeric genes in gene trees can spuriously affect the phylogenetic position of purely native sequences, especially by attracting these sequences toward basal positions in trees. We propose the term “HGT turbulence” to describe these complex effects of evolutionarily chimeric genes on phylogenetic results.
PMCID: PMC3337133  PMID: 22545235
chimeric gene; Gene conversion; horizontal gene transfer; maximum likelihood; mitochondria; phylogeny reconstruction; recombination; simulations
25.  The transferome of metabolic genes explored: analysis of the horizontal transfer of enzyme encoding genes in unicellular eukaryotes 
Genome Biology  2009;10(4):R36.
Metabolic network analysis in multiple eukaryotes identifies how horizontal and endosymbiotic gene transfer of metabolic enzyme-encoding genes leads to functional gene gain during evolution.
Metabolic networks are responsible for many essential cellular processes, and exhibit a high level of evolutionary conservation from bacteria to eukaryotes. If genes encoding metabolic enzymes are horizontally transferred and are advantageous, they are likely to become fixed. Horizontal gene transfer (HGT) has played a key role in prokaryotic evolution and its importance in eukaryotes is increasingly evident. High levels of endosymbiotic gene transfer (EGT) accompanied the establishment of plastids and mitochondria, and more recent events have allowed further acquisition of bacterial genes. Here, we present the first comprehensive multi-species analysis of E/HGT of genes encoding metabolic enzymes from bacteria to unicellular eukaryotes.
The phylogenetic trees of 2,257 metabolic enzymes were used to make E/HGT assertions in ten groups of unicellular eukaryotes, revealing the sources and metabolic processes of the transferred genes. Analyses revealed a preference for enzymes encoded by genes gained through horizontal and endosymbiotic transfers to be connected in the metabolic network. Enrichment in particular functional classes was particularly revealing: alongside plastid related processes and carbohydrate metabolism, this highlighted a number of pathways in eukaryotic parasites that are rich in enzymes encoded by transferred genes, and potentially key to pathogenicity. The plant parasites Phytophthora were discovered to have a potential pathway for lipopolysaccharide biosynthesis of E/HGT origin not seen before in eukaryotes outside the Plantae.
The number of enzymes encoded by genes gained through E/HGT has been established, providing insight into functional gain during the evolution of unicellular eukaryotes. In eukaryotic parasites, genes encoding enzymes that have been gained through horizontal transfer may be attractive drug targets if they are part of processes not present in the host, or are significantly diverged from equivalent host enzymes.
PMCID: PMC2688927  PMID: 19368726
