Homing endonuclease genes (HEGs) are superfluous, but are capable of invading populations that mix alleles by biasing their inheritance patterns through gene conversion. One model suggests that their long-term persistence is achieved through recurrent invasion. This circumvents evolutionary degeneration, but requires reasonable rates of transfer between species to maintain purifying selection. Although HEGs are found in a variety of microbes, we found the previous discovery of this type of selfish genetic element in the mitochondria of a sea anemone surprising.
We surveyed 29 species of Cnidaria for the presence of the COXI HEG. Statistical analyses provided evidence for HEG invasion. We also found that 96 individuals of Metridium senile, from five different locations in the UK, had identical HEG sequences. This lack of sequence divergence illustrates the stable nature of Anthozoan mitochondria. Our data suggest that this HEG conforms to the recurrent invasion model of evolution.
Ordinarily such low rates of HEG transfer would likely be insufficient to enable major invasion. However, the slow rate of Anthozoan mitochondrial change greatly lengthens the time to HEG degeneration, which significantly extends the periodicity of the HEG life-cycle. We suggest that a combination of very low substitution rates and rare transfers facilitated metazoan HEG invasion.
Diverse, distantly-related eukaryotic lineages have adapted to low-oxygen environments, and possess mitochondrion-related organelles that have lost the capacity to generate adenosine triphosphate (ATP) through oxidative phosphorylation. A subset of these organelles, hydrogenosomes, has acquired a set of characteristic ATP generation enzymes commonly found in anaerobic bacteria. The recipient of these enzymes could not have survived prior to their acquisition had it not still possessed the electron transport chain present in the ancestral mitochondrion. In the divergence of modern hydrogenosomes from mitochondria, a transitional organelle must therefore have existed that possessed both an electron transport chain and an anaerobic ATP generation pathway. Here, we report a modern analog of this organelle in the habitually aerobic opportunistic pathogen, Acanthamoeba castellanii. This organism possesses a complete set of enzymes comprising a hydrogenosome-like ATP generation pathway, each of which is predicted to be targeted to mitochondria. We have experimentally confirmed the mitochondrial localizations of key components of this pathway using tandem mass spectrometry. This evidence is the first supported by localization and proteome data of a mitochondrion possessing both an electron transport chain and hydrogenosome-like energy metabolism enzymes. Our work provides insight into the first steps that might have occurred in the course of the emergence of modern hydrogenosomes.
Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is a key enzyme of the glycolytic pathway, reversibly catalyzing the sixth step of glycolysis and concurrently reducing the coenzyme NAD+ to NADH. In photosynthetic organisms a GAPDH paralog (Gap2 in Cyanobacteria, GapA in most photosynthetic eukaryotes) functions in the Calvin cycle, performing the reverse of the glycolytic reaction and using the coenzyme NADPH preferentially. In a number of photosynthetic eukaryotes that acquired their plastid by the secondary endosymbiosis of a eukaryotic red alga (Alveolates, haptophytes, cryptomonads and stramenopiles) GapA has been apparently replaced with a paralog of the host’s own cytosolic GAPDH (GapC1). Plastid GapC1 and GapA therefore represent two independent cases of functional divergence and adaptations to the Calvin cycle entailing a shift in subcellular targeting and a shift in binding preference from NAD+ to NADPH.
We used the programs FunDi, GroupSim, and Difference Evolutionary-Trace to detect sites involved in the functional divergence of these two groups of GAPDH sequences and to identify potential cases of convergent evolution in the Calvin-cycle adapted GapA and GapC1 families. Sites identified as being functionally divergent by all or some of these programs were then investigated with respect to their possible roles in the structure and function of both glycolytic and plastid-targeted GAPDH isoforms.
In this work we found substantial evidence for convergent evolution in GapA/B and GapC1. In many cases sites in GAPDHs of these groups converged on identical amino acid residues in specific positions of the protein known to play a role in the function and regulation of plastid-functioning enzymes relative to their cytosolic counterparts. In addition, we demonstrate that bioinformatic software such as FunDi is an important tool for generating meaningful biological hypotheses that can then be tested with direct experimental techniques.
Many of the eukaryotic phylogenomic analyses published to date were based on alignments of hundreds to thousands of genes. In such analyses, the most realistic evolutionary models currently available are often used to minimize the impact of systematic error. However, controversy remains over whether idiosyncratic gene family dynamics (i.e., gene duplications and losses) and incorrect orthology assignments are always appropriately taken into account. In this paper, we present an innovative strategy for overcoming orthology assignment problems. Rather than identifying and eliminating genes with paralogy problems, we have constructed a data set composed exclusively of conserved single-copy protein domains that, unlike most of the commonly used phylogenomic data sets, should be less confounded by orthology misassignments. To evaluate the power of this approach, we performed maximum likelihood and Bayesian analyses to infer the evolutionary relationships within the opisthokonts (which include Metazoa, Fungi, and related unicellular lineages). We used this approach to test 1) whether Filasterea and Ichthyosporea form a clade, 2) the interrelationships of early-branching metazoans, and 3) the relationships among early-branching fungi. We also assessed the impact of some methods that are known to minimize systematic error, including reducing the distance between the outgroup and ingroup taxa or using the CAT evolutionary model. Overall, our analyses support the Filozoa hypothesis in which Ichthyosporea are the first holozoan lineage to emerge followed by Filasterea, Choanoflagellata, and Metazoa. Blastocladiomycota appears as a lineage separate from Chytridiomycota, although this result is not strongly supported. These results represent independent tests of previous phylogenetic hypotheses, highlighting the importance of sophisticated approaches for orthology assignment in phylogenomic analyses.
Capsaspora; Filasterea; Filozoa; Holozoa; Ichthyosporea; multicellularity
Sterols are key components of eukaryotic cellular membranes that are synthesized by multi-enzyme pathways that require molecular oxygen. Because prokaryotes fundamentally lack sterols, it is unclear how the vast diversity of bacterivorous eukaryotes that inhabit hypoxic environments obtain, or synthesize, sterols. Here we show that tetrahymanol, a triterpenoid that does not require molecular oxygen for its biosynthesis, likely functions as a surrogate of sterol in eukaryotes inhabiting oxygen-poor environments. Genes encoding the tetrahymanol synthesizing enzyme squalene-tetrahymanol cyclase were found from several phylogenetically diverged eukaryotes that live in oxygen-poor environments and appear to have been laterally transferred among such eukaryotes.
This article was reviewed by Eric Bapteste and Eugene Koonin.
eukaryotes; lateral gene transfer; phagocytosis; sterols; tetrahymanol
The divergent eukaryotic unicellular organism Giardia intestinalis is an intestinal parasite in humans and various animals. An analysis of a draft genome sequence suggested that G. intestinalis has a much simpler genome organization and gene repertoire than those of other model eukaryotic organisms (e.g., Arabidopsis and human). This general picture of the G. intestinalis genome seemingly agrees with the fact that only four spliceosomal (cis-spliced) introns have been identified in this organism to date. We have recently shown that G. intestinalis possesses a unique gene expression system incorporating spliceosome-mediated trans-splicing. Some protein-coding genes in G. intestinalis are split into multiple pieces in the genome and each gene fragment is independently transcribed. Two particular pre-mRNAs directly interact with each other by forming an intermolecular-stem structure and are then trans-spliced into a mature mRNA by spliceosomes. We believe that this trans-splicing secondarily arose from the system that excises canonical (cis-splicing) introns. Based on these findings, we suspect that similar phenomena—split genes and post-transcriptional assemblage of their transcripts via trans-splicing—may be prevalent in more distinct eukaryotic lineages than previously known, particularly in organisms possessing “intron-poor” genomes.
cis-splicing; dynein; Giardia intestinalis; heat shock protein 90; RNA maturation; spliceosomal introns; splintrons; trans-splicing
Protists that live under low-oxygen conditions often lack conventional mitochondria and instead possess mitochondrion-related organelles (MROs) with distinct biochemical functions. Studies of mostly parasitic organisms have suggested that these organelles could be classified into two general types: hydrogenosomes and mitosomes. Hydrogenosomes, found in parabasalids, anaerobic chytrid fungi, and ciliates, metabolize pyruvate anaerobically to generate ATP, acetate, CO2, and hydrogen gas, employing enzymes not typically associated with mitochondria. Mitosomes that have been studied have no apparent role in energy metabolism. Recent investigations of free-living anaerobic protists have revealed a diversity of MROs with a wider array of metabolic properties that defy a simple functional classification. Here we describe an expressed sequence tag (EST) survey and ultrastructural investigation of the anaerobic heteroloboseid amoeba Sawyeria marylandensis aimed at understanding the properties of its MROs. This organism expresses typical anaerobic energy metabolic enzymes, such as pyruvate:ferredoxin oxidoreductase, [FeFe]-hydrogenase, and associated hydrogenase maturases with apparent organelle-targeting peptides, indicating that its MRO likely functions as a hydrogenosome. We also identified 38 genes encoding canonical mitochondrial proteins in S. marylandensis, many of which possess putative targeting peptides and are phylogenetically related to putative mitochondrial proteins of its heteroloboseid relative Naegleria gruberi. Several of these proteins, such as a branched-chain alpha keto acid dehydrogenase, likely function in pathways that have not been previously associated with the well-studied hydrogenosomes of parabasalids. Finally, morphological reconstructions based on transmission electron microscopy indicate that the S. marylandensis MROs form novel cup-like structures within the cells. 
Overall, these data suggest that Sawyeria marylandensis possesses a hydrogenosome of mitochondrial origin with a novel combination of biochemical and structural properties.
There is little doubt that genes can spread across unrelated prokaryotes, eukaryotes and even between these domains. It is expected that organisms inhabiting a common niche may exchange their genes even more often due to their physical proximity and similar demands. One such niche is the anaerobic or microaerophilic environments found in some sediments and in the intestines of animals. Indeed, enzymes advantageous for metabolism in these environments often exhibit an evolutionary history incongruent with the history of their hosts, indicating potential transfers. The evolutionary paths of some very basic enzymes for energy metabolism of anaerobic eukaryotes (pyruvate formate lyase, pyruvate:ferredoxin oxidoreductase, [FeFe]hydrogenase and arginine deiminase) seem to be particularly intriguing, and although their histories are not identical they share several unexpected features. Every enzyme mentioned above is present in groups of eukaryotes that are unrelated to each other. Although the enzyme phylogenies are not always robustly supported, they always suggest that the eukaryotic homologues form one or two clades, in which the relationships are not congruent with the eukaryotic phylogeny. Finally, these eukaryotic enzymes are never specifically related to homologues from α-proteobacteria, ancestors of mitochondria. The most plausible explanation for the evolution of this pattern invokes one or two interdomain transfers to one or two eukaryotes from prokaryotes that were not the mitochondrial endosymbiont. Once the genes were introduced into the eukaryotic domain, they spread to other eukaryotic groups exclusively via eukaryote-to-eukaryote transfers. Currently, eukaryote-to-eukaryote gene transfers are regarded as less common than prokaryote-to-eukaryote transfers. The fact that eukaryotes accepted genes for these enzymes solely from other eukaryotes and not from prokaryotes present in the same environment is surprising.
pyruvate formate lyase; pyruvate:ferredoxin oxidoreductase; [FeFe]hydrogenase; arginine deiminase; lateral gene transfer; origin; mitochondrion; anaerobic metabolism
Elongation factor-1α (EF-1α) and elongation factor-like (EFL) proteins are functionally homologous to one another, and are core components of the eukaryotic translation machinery. The patchy distribution of the two elongation factor types across global eukaryotic phylogeny is suggestive of a ‘differential loss’ hypothesis that assumes that EF-1α and EFL were present in the most recent common ancestor of eukaryotes followed by independent differential losses of one of the two factors in the descendant lineages. To date, however, just one diatom and one fungus have been found to have both EF-1α and EFL (dual-EF-containing species).
In this study, we characterized 35 new EF-1α/EFL sequences from phylogenetically diverse eukaryotes. In so doing we identified 11 previously unreported dual-EF-containing species from diverse eukaryote groups including the Stramenopiles, Apusomonadida, Goniomonadida, and Fungi. Phylogenetic analyses suggested vertical inheritance of both genes in each of the dual-EF lineages. In the dual-EF-containing species we identified, the EF-1α genes appeared to be highly divergent in sequence and suppressed at the transcriptional level compared to the co-occurring EFL genes.
According to the known EF-1α/EFL distribution, the differential loss process should have occurred independently in diverse eukaryotic lineages, and more dual-EF-containing species likely remain unidentified. We predict that dual-EF-containing species retain the divergent EF-1α homologues only for a subset of the original functions. As the dual-EF-containing species are distantly related to each other, we propose that independent re-modelling of EF-1α function took place in multiple branches of the tree of eukaryotes.
Diatoms; Differential Gene Loss; EF-1α; EFL; Functional Remodeling; Goniomonas; Pythium; Spizellomyces; Thecamonas
Motivation: Aligning protein sequences with the best possible accuracy requires sophisticated algorithms. Since the optimal alignment is not guaranteed to be the correct one, it is expected that even the best alignment will contain sites that do not respect the assumption of positional homology. Because formulating rules to identify these sites is difficult, it is common practice to manually remove them. Although considered necessary in some cases, manual editing is time consuming and not reproducible. We present here an automated editing method based on the classification of ‘valid’ and ‘invalid’ sites.
Results: A support vector machine (SVM) classifier is trained to reproduce the decisions made during manual editing with an accuracy of 95.0%. This implies that manual editing can be made reproducible and applied to large-scale analyses. We further demonstrate that it is possible to retrain/extend the training of the classifier by providing examples of multiple sequence alignment (MSA) annotation. Near optimal training can be achieved with only 1000 annotated sites, or roughly three samples of protein sequence alignments.
Availability: This method is implemented in the software MANUEL, licensed under the GPL. A web-based application for single and batch jobs is available at http://fester.cs.dal.ca/manuel.
Supplementary information: Supplementary data are available at Bioinformatics online.
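The accuracy figure reported for the site classifier is the fraction of alignment columns on which the automated label agrees with the manual one. The sketch below shows that comparison only. It does not reproduce MANUEL or its SVM: a crude gap-fraction threshold stands in for the trained classifier, and the function names, the 0.5 threshold, and the toy alignment are all assumptions.

```python
def gap_fraction(column):
    """Fraction of gap characters ('-') in one alignment column."""
    return column.count("-") / len(column)


def classify_sites(alignment, max_gap_fraction=0.5):
    """Label each column 'valid' or 'invalid' with a simple gap threshold.

    This rule is a stand-in for a trained classifier, used here only so
    that there is something to compare against the manual annotation.
    """
    columns = zip(*alignment)  # transpose rows (sequences) into columns
    return [
        "valid" if gap_fraction(col) <= max_gap_fraction else "invalid"
        for col in columns
    ]


def accuracy(predicted, manual):
    """Fraction of sites where the automated and manual labels agree."""
    if len(predicted) != len(manual):
        raise ValueError("label lists must have the same length")
    agree = sum(p == m for p, m in zip(predicted, manual))
    return agree / len(manual)


# Toy alignment and manual labels (invented for illustration).
alignment = ["MK-LV", "MK-LV", "M--LI", "MKALV"]
manual_labels = ["valid", "valid", "invalid", "valid", "invalid"]

predicted_labels = classify_sites(alignment)
print(predicted_labels)
print(accuracy(predicted_labels, manual_labels))
```

In this toy case the threshold rule disagrees with the annotator on the last column, which contains no gaps but was marked invalid by hand. Disagreements of that kind are where a classifier trained on richer features would be expected to do better than a fixed rule.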
The covarion hypothesis of molecular evolution holds that selective pressures on a given amino acid or nucleotide site are dependent on the identity of other sites in the molecule that change throughout time, resulting in changes of evolutionary rates of sites along the branches of a phylogenetic tree. At the sequence level, covarion-like evolution at a site manifests as conservation of nucleotide or amino acid states among some homologs where the states are not conserved in other homologs (or groups of homologs). Covarion-like evolution has been shown to relate to changes in functions at sites in different clades, and, if ignored, can adversely affect the accuracy of phylogenetic inference.
PROCOV (protein covarion analysis) is a software tool that implements a number of previously proposed covarion models of protein evolution for phylogenetic inference in a maximum likelihood framework. Several algorithmic and implementation improvements in this tool over previous versions make computationally expensive tree searches with covarion models more efficient and analyses of large phylogenomic data sets tractable. PROCOV can be used to identify covarion sites by comparing the site likelihoods under the covarion process to the corresponding site likelihoods under a rates-across-sites (RAS) process. Those sites with the greatest log-likelihood difference between a 'covarion' and an RAS process were found to be of functional or structural significance in a dataset of bacterial and eukaryotic elongation factors.
Covarion models implemented in PROCOV may be especially useful for phylogenetic estimation when ancient divergences between sequences have occurred and rates of evolution at sites are likely to have changed over the tree. It can also be used to study lineage-specific functional shifts in protein families that result in changes in the patterns of site variability among subtrees.
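The site-identification step described above, comparing site likelihoods under a covarion process with those under an RAS process, reduces to ranking sites by their log-likelihood difference. A minimal sketch follows, assuming the per-site log-likelihoods have already been computed by some phylogenetic program. The function name and the numbers are hypothetical and do not represent PROCOV output or its file formats.

```python
def rank_covarion_sites(lnl_covarion, lnl_ras, top=3):
    """Rank sites by lnL(covarion) - lnL(RAS), largest difference first.

    lnl_covarion, lnl_ras: per-site log-likelihoods under each model,
    in the same site order. Returns (0-based site index, difference).
    """
    if len(lnl_covarion) != len(lnl_ras):
        raise ValueError("both models must cover the same sites")
    diffs = [
        (cov - ras, site)
        for site, (cov, ras) in enumerate(zip(lnl_covarion, lnl_ras))
    ]
    diffs.sort(reverse=True)
    return [(site, round(delta, 3)) for delta, site in diffs[:top]]


# Made-up per-site log-likelihoods for four sites.
site_lnl_covarion = [-4.1, -6.0, -3.2, -8.5]
site_lnl_ras = [-4.3, -9.1, -3.1, -8.6]

print(rank_covarion_sites(site_lnl_covarion, site_lnl_ras, top=2))
```

Sites near the top of such a ranking are the ones whose data are explained much better when rates are allowed to switch along the tree, so they are the natural candidates to inspect for functional or structural significance.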
Widely used substitution models for proteins, such as the Jones-Taylor-Thornton (JTT) or Whelan and Goldman (WAG) models, are based on empirical amino acid interchange matrices estimated from databases of protein alignments that incorporate the average amino acid frequencies of the data set under examination (e.g., JTT + F). Variation in the evolutionary process between sites is typically modelled by a rates-across-sites distribution such as the gamma (Γ) distribution. However, sites in proteins also vary in the kinds of amino acid interchanges that are favoured, a feature that is ignored by standard empirical substitution matrices. Here we examine the degree to which the pattern of evolution at sites differs from that expected based on empirical amino acid substitution models and evaluate the impact of these deviations on phylogenetic estimation.
We analyzed 21 large protein alignments with two statistical tests designed to detect deviation of site-specific amino acid distributions from data simulated under the standard empirical substitution model, JTT + F + Γ. We found that the number of states at a given site is, on average, smaller and the frequencies of these states are less uniform than expected based on a JTT + F + Γ substitution model. With a four-taxon example, we show that phylogenetic estimation under the JTT + F + Γ model is seriously biased by a long-branch attraction artefact if the data are simulated under a model utilizing the observed site-specific amino acid frequencies from an alignment. Principal components analyses indicate the existence of at least four major site-specific frequency classes in these 21 protein alignments. Using a mixture model with these four separate classes of site-specific state frequencies plus a fifth class of global frequencies (the JTT + cF + Γ model), significant improvements in model fit for real data sets can be achieved. This simple mixture model also reduces the long-branch attraction problem, as shown by simulations and analyses of a real phylogenomic data set.
Protein families display site-specific evolutionary dynamics that are ignored by standard protein phylogenetic models. Accurate estimation of protein phylogenies requires models that accommodate the heterogeneity in the evolutionary process across sites. To this end, we have implemented a class frequency mixture model (cF) in a freely available program called QmmRAxML for phylogenetic estimation.
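In a frequency-class mixture model of the kind described above, the likelihood of a site is a weighted sum of its likelihoods under each frequency class, and the alignment log-likelihood is the sum of the per-site mixture log-likelihoods. The sketch below shows only that combination step, assuming the per-class site log-likelihoods have already been computed elsewhere. The function names and example values are hypothetical and this is not the QmmRAxML implementation.

```python
import math


def mixture_site_loglik(class_logliks, weights):
    """Return log( sum_k w_k * exp(lnL_k) ) for one site.

    class_logliks: log-likelihood of the site under each frequency class.
    weights: mixture weights of the classes, which must sum to 1.
    The log-sum-exp form avoids underflow for very negative values.
    """
    if len(class_logliks) != len(weights):
        raise ValueError("one weight is needed per class")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    terms = [math.log(w) + lnl
             for w, lnl in zip(weights, class_logliks) if w > 0]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def mixture_alignment_loglik(per_site_class_logliks, weights):
    """Sum the mixture log-likelihood over all sites of an alignment."""
    return sum(mixture_site_loglik(site, weights)
               for site in per_site_class_logliks)


# Two classes with equal weight; site likelihoods 0.2 and 0.4 (made up).
example = mixture_site_loglik([math.log(0.2), math.log(0.4)], [0.5, 0.5])
print(example, math.log(0.3))
```

With five classes (four site-specific frequency classes plus one global class, as in the abstract) the same function applies unchanged; only the lengths of the two input lists differ.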
Fornicata is a relatively recently established group of protists that includes the diplokaryotic diplomonads (which have two similar nuclei per cell), and the monokaryotic enteromonads, retortamonads and Carpediemonas, with the more typical one nucleus per cell. The monophyly of the group was confirmed by molecular phylogenetic studies, but neither the internal phylogeny nor its position on the eukaryotic tree has been clearly resolved.
Here we have introduced data for three genes (SSU rRNA, α-tubulin and HSP90) with a wide taxonomic sampling of Fornicata, including ten isolates of enteromonads, representing the genera Trimitus and Enteromonas, and a new undescribed enteromonad genus. The diplomonad sequences formed two main clades in individual gene and combined gene analyses, with Giardia (and Octomitus) on one side of the basal divergence and Spironucleus, Hexamita and Trepomonas on the other. Contrary to earlier evolutionary scenarios, none of the studied enteromonads appeared basal to diplokaryotic diplomonads. Instead, the enteromonad isolates were all robustly situated within the second of the two diplomonad clades. Furthermore, our analyses suggested that enteromonads do not constitute a monophyletic group, and enteromonad monophyly was statistically rejected in 'approximately unbiased' tests of the combined gene data.
We suggest that all higher taxa intended to unite multiple enteromonad genera be abandoned, that Trimitus and Enteromonas be considered as part of Hexamitinae, and that the term 'enteromonads' be used in a strictly utilitarian sense. Our result suggests either that the diplokaryotic condition characteristic of diplomonads arose several times independently, or that the monokaryotic cell of enteromonads originated several times independently by secondary reduction from the diplokaryotic state. Both scenarios are evolutionarily complex. More comparative data on the similarity of the genomes of the two nuclei of diplomonads will be necessary to resolve which evolutionary scenario is more probable.
The anaerobic lifestyle of the intestinal parasite Blastocystis raises questions about the biochemistry and function of its mitochondria-like organelles. We have characterized the Blastocystis succinyl-CoA synthetase (SCS), a tricarboxylic acid cycle enzyme that conserves energy by substrate-level phosphorylation. We show that SCS localizes to the enigmatic Blastocystis organelles, indicating that these organelles might play a similar role in energy metabolism as classic mitochondria. Although analysis of residues inside the nucleotide-binding site suggests that Blastocystis SCS is GTP-specific, we demonstrate that it is ATP-specific. Homology modelling, followed by flexible docking and molecular dynamics simulations, indicates that while both ATP and GTP fit into the Blastocystis SCS active site, GTP is destabilized by electrostatic dipole interactions with Lys 42 and Lys 110, the side-chains of which lie outside the nucleotide-binding cavity. It has been proposed that residues in direct contact with the substrate determine nucleotide specificity in SCS. However, our results indicate that, in Blastocystis, an electrostatic gatekeeper controls which ligands can enter the binding site.
Blastocystis is a unicellular stramenopile of controversial pathogenicity in humans [1, 2]. Although it is a strict anaerobe, Blastocystis has mitochondrion-like organelles with cristae, a transmembrane potential and DNA [2–4]. An apparent lack of several typical mitochondrial pathways has led some to suggest that these organelles might be hydrogenosomes, anaerobic organelles related to mitochondria [5, 6]. We generated 12,767 expressed sequence tags (ESTs) from Blastocystis and identified 115 clusters that encode putative mitochondrial and hydrogenosomal proteins. Among these is the canonical hydrogenosomal protein iron-only [FeFe] hydrogenase that we show localizes to the organelles. The organelles also have mitochondrial characteristics, including pathways for amino acid metabolism, iron-sulfur cluster biogenesis, and an incomplete tricarboxylic acid cycle as well as a mitochondrial genome. Although complexes I and II of the electron transport chain (ETC) are present, we found no evidence for complexes III and IV or F1Fo ATPases. The Blastocystis organelles have metabolic properties of aerobic and anaerobic mitochondria and of hydrogenosomes [7, 8]. They are convergently similar to organelles recently described in the unrelated ciliate Nyctotherus ovalis. These findings blur the boundaries between mitochondria, hydrogenosomes, and mitosomes, as currently defined, underscoring the disparate selective forces that shape these organelles in eukaryotes.
Most modern eukaryotes diverged from a common ancestor that contained the α-proteobacterial endosymbiont that gave rise to mitochondria. The ‘amitochondriate’ anaerobic protist parasites that have been studied to date, such as Giardia and Trichomonas, harbor mitochondrion-related organelles, such as mitosomes or hydrogenosomes. Yet there is one remaining group of mitochondrion-lacking flagellates known as the Preaxostyla that could represent a primitive ‘pre-mitochondrial’ lineage of eukaryotes. To test this hypothesis, we conducted an expressed sequence tag (EST) survey on the preaxostylid flagellate Trimastix pyriformis, a poorly-studied free-living anaerobe. Among the ESTs we detected 19 proteins that, in other eukaryotes, typically function in mitochondria, hydrogenosomes or mitosomes, 12 of which are found exclusively within these organelles. Interestingly, one of the proteins, aconitase, functions in the tricarboxylic acid cycle typical of aerobic mitochondria, whereas others, such as pyruvate:ferredoxin oxidoreductase and [FeFe] hydrogenase, are characteristic of anaerobic hydrogenosomes. Since Trimastix retains genetic evidence of a mitochondriate ancestry, we can now say definitively that all known living eukaryote lineages descend from a common ancestor that had mitochondria.
Determining the relationships among and divergence times for the major eukaryotic lineages remains one of the most important and controversial outstanding problems in evolutionary biology. The sequencing and phylogenetic analyses of ribosomal RNA (rRNA) genes led to the first nearly comprehensive phylogenies of eukaryotes in the late 1980s, and supported a view where cellular complexity was acquired during the divergence of extant unicellular eukaryote lineages. More recently, however, refinements in analytical methods coupled with the availability of many additional genes for phylogenetic analysis showed that much of the deep structure of early rRNA trees was artefactual. Recent phylogenetic analyses of multiple genes and the discovery of important molecular and ultrastructural phylogenetic characters have resolved eukaryotic diversity into six major hypothetical groups. Yet relationships among these groups remain poorly understood because of saturation of sequence changes on the billion-year time-scale, possible rapid radiations of major lineages, phylogenetic artefacts and endosymbiotic or lateral gene transfer among eukaryotes.
Estimating the divergence dates between the major eukaryote lineages using molecular analyses is even more difficult than phylogenetic estimation. Error in such analyses comes from a myriad of sources including: (i) calibration fossil dates, (ii) the assumed phylogenetic tree, (iii) the nucleotide or amino acid substitution model, (iv) substitution number (branch length) estimates, (v) the model of how rates of evolution change over the tree, (vi) error inherent in the time estimates for a given model and (vii) how multiple gene data are treated. By reanalysing datasets from recently published molecular clock studies, we show that when errors from these various sources are properly accounted for, the confidence intervals on inferred dates can be very large. Furthermore, estimated dates of divergence vary hugely depending on the methods used and their assumptions. Accurate dating of divergence times among the major eukaryote lineages will require a robust tree of eukaryotes, a much richer Proterozoic fossil record of microbial eukaryotes assignable to extant groups for calibration, more sophisticated relaxed molecular clock methods and many more genes sampled from the full diversity of microbial eukaryotes.
eukaryotes; protists; molecular phylogenetics; molecular clock; systematics; superkingdoms
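To make the first error source listed above (calibration fossil dates) concrete, the sketch below propagates an uncertain calibration age through the simplest possible dating scheme, a strict molecular clock. This is an illustration under strong assumptions and not the method of the study, which concerns relaxed-clock analyses with several additional error sources. The function names and all numbers are hypothetical.

```python
def strict_clock_date(distance, calib_distance, calib_age):
    """Date a divergence under a strict clock.

    distance: pairwise distance (substitutions/site) across the node to date.
    calib_distance: pairwise distance across the calibration node.
    calib_age: assumed age of the calibration node (e.g. in Myr).
    The factor 2 reflects that a pairwise distance spans two lineages.
    """
    rate = calib_distance / (2.0 * calib_age)  # substitutions/site/Myr/lineage
    return distance / (2.0 * rate)


def date_interval(distance, calib_distance, calib_age_min, calib_age_max):
    """Range of inferred dates implied by a range of calibration ages."""
    return (
        strict_clock_date(distance, calib_distance, calib_age_min),
        strict_clock_date(distance, calib_distance, calib_age_max),
    )


# Hypothetical numbers: calibration node dated to 400-600 Myr.
low, high = date_interval(distance=0.9, calib_distance=0.3,
                          calib_age_min=400.0, calib_age_max=600.0)
print(low, high)
```

Because the inferred date scales linearly with the calibration age in this scheme, a 200 Myr uncertainty in the fossil date becomes a 600 Myr uncertainty for a node three times as deep, before any of the other listed error sources are considered.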
The patchy distribution of genes across the prokaryotes may be caused by multiple gene losses or lateral transfer. Probabilistic models of gene gain and loss are needed to distinguish between these possibilities. Existing models allow only single genes to be gained and lost, despite the empirical evidence for multi-gene events. We compare birth-death models (currently the only widely-used models, in which only one gene can be gained or lost at a time) to blocks models (allowing gain and loss of multiple genes within a family). We analyze two pairs of genomes: two E. coli strains, and the distantly-related Archaeoglobus fulgidus (archaea) and Bacillus subtilis (gram positive bacteria). Blocks models describe the data much better than birth-death models. Our models suggest that lateral transfers of multiple genes from the same family are rare (although transfers of single genes are probably common). For both pairs, the estimated median time that a gene will remain in the genome is not much greater than the time separating the common ancestors of the archaea and bacteria. Deep phylogenetic reconstruction from sequence data will therefore depend on choosing genes likely to remain in the genome for a long time. Phylogenies based on the blocks model are more biologically plausible than phylogenies based on the birth-death model.
gene content; lateral transfer; phylogenetics; likelihood
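The median time a gene remains in a genome has a simple closed form if one assumes that each gene is lost independently at a constant rate μ. The residence time is then exponentially distributed, with median ln(2)/μ. The sketch below works through that relationship. The constant-rate assumption, the function names, and the rate value are illustrative and are not taken from the birth-death or blocks models fitted in the study.

```python
import math


def median_residence_time(loss_rate):
    """Median time to loss for a gene lost at constant rate `loss_rate`.

    Under exponential waiting times, P(still present at t) = exp(-rate * t),
    which equals 0.5 at t = ln(2) / rate.
    """
    if loss_rate <= 0:
        raise ValueError("loss_rate must be positive")
    return math.log(2) / loss_rate


def survival_probability(loss_rate, elapsed_time):
    """Probability that a gene present at time 0 is still present later."""
    return math.exp(-loss_rate * elapsed_time)


# Hypothetical loss rate, in events per gene per unit of time.
rate = 0.25
t_half = median_residence_time(rate)
print(t_half, survival_probability(rate, t_half))
```

Under this assumption, a gene whose median residence time is comparable to the age of a deep divergence has only about an even chance of having persisted since then, which is why the choice of long-lived genes matters for deep phylogenetic reconstruction.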
Glycolysis and subsequent fermentation are the main source of energy for many anaerobic organisms. The glycolytic pathway consists of ten enzymatic steps which appear to be universal amongst eukaryotes. However, it has been shown that the origins of these enzymes in specific eukaryote lineages can differ, and sometimes involve lateral gene transfer events. We have conducted an expressed sequence tag (EST) survey of the anaerobic flagellate Trimastix pyriformis to investigate the nature of the evolutionary origins of the glycolytic enzymes in this relatively unstudied organism.
We have found genes in the Trimastix EST data that encode enzymes potentially catalyzing nine of the ten steps of the glycolytic conversion of glucose to pyruvate. Furthermore, we have found two different enzymes that in principle could catalyze the conversion of phosphoenolpyruvate (PEP) to pyruvate (or the reverse reaction) as part of the last step in glycolysis. Our phylogenetic analyses of all of these enzymes revealed at least four cases where the relationship of the Trimastix genes to homologs from other species is at odds with accepted organismal relationships. Although lateral gene transfer events likely account for these anomalies, with the data at hand we were not able to establish with confidence the bacterial donor lineage that gave rise to the respective Trimastix enzymes.
A number of the glycolytic enzymes of Trimastix have been transferred laterally from bacteria instead of being inherited from the last common eukaryotic ancestor. Thus, despite widespread conservation of the glycolytic biochemical pathway across eukaryote diversity, in a number of protist lineages the enzymatic components of the pathway have been replaced by lateral gene transfer from disparate evolutionary sources. It remains unclear if these replacements result from selectively advantageous properties of the introduced enzymes or if they are neutral outcomes of a gene transfer 'ratchet' from food or endosymbiotic organisms or a combination of both processes.
Lateral gene transfer (LGT) in eukaryotes from non-organellar sources is a controversial subject in need of further study. Here we present gene distribution and phylogenetic analyses of the genes encoding the hybrid-cluster protein, A-type flavoprotein, glucosamine-6-phosphate isomerase, and alcohol dehydrogenase E. These four genes have a limited distribution among sequenced prokaryotic and eukaryotic genomes and were previously implicated in gene transfer events affecting eukaryotes. If our previous contention that these genes were introduced by LGT independently into the diplomonad and Entamoeba lineages is correct, we expect that the number of putative transfers and the phylogenetic signal supporting LGT should be stable or increase, rather than decrease, when novel eukaryotic and prokaryotic homologs are added to the analyses.
The addition of homologs from phagotrophic protists, including several Entamoeba species, the pelobiont Mastigamoeba balamuthi, and the parabasalid Trichomonas vaginalis, and a large quantity of sequences from genome projects resulted in an apparent increase in the number of putative transfer events affecting all three domains of life. Some of the eukaryotic transfers affect a wide range of protists, such as three divergent lineages of Amoebozoa, represented by Entamoeba, Mastigamoeba, and Dictyostelium, while other transfers only affect a limited diversity, for example only the Entamoeba lineage. These observations are consistent with a model where these genes have been introduced into protist genomes independently from various sources over a long evolutionary time.
Phylogenetic analyses of the updated datasets using more sophisticated phylogenetic methods, in combination with the gene distribution analyses, strengthened, rather than weakened, the support for LGT as an important mechanism affecting the evolution of these gene families. Thus, gene transfer seems to be an on-going evolutionary mechanism by which genes are spread between unrelated lineages of all three domains of life, further indicating the importance of LGT from non-organellar sources into eukaryotic genomes.
Golgi bodies are nearly ubiquitous in eukaryotic cells. The apparent lack of such structures in certain eukaryotic lineages might be taken to mean that these protists evolved prior to the acquisition of the Golgi, and it raises questions of how these organisms function in the absence of this crucial organelle. Here, we report gene sequences from five proposed 'Golgi-lacking' organisms (Giardia intestinalis, Spironucleus barkhanus, Entamoeba histolytica, Naegleria gruberi and Mastigamoeba balamuthi). BLAST and phylogenetic analyses show these genes to be homologous to those encoding components of the retromer, coatomer and adaptin complexes, all of which have Golgi-related functions in mammals and yeast. This is, to our knowledge, the first molecular evidence for Golgi bodies in two major eukaryotic lineages (the pelobionts and heteroloboseids). This substantiates the suggestion that there are no extant primitively 'Golgi-lacking' lineages, and that this apparatus was present in the last common eukaryotic ancestor, but has been altered beyond recognition several times.
An increasing number of bioinformatics methods are considering the phylogenetic relationships between biological sequences. Implementing new methodologies using the maximum likelihood phylogenetic framework can be a time-consuming task.
The bioinformatics library libcov is a collection of C++ classes that provides both high-level and low-level interfaces to maximum likelihood phylogenetics and sequence analysis, together with a data structure for structural biological methods. libcov can be used to compute likelihoods, search tree topologies, estimate site rates, cluster sequences, manipulate tree structures and compare phylogenies for a broad selection of applications.
Using this library, it is possible to rapidly prototype applications that use the sophistication of phylogenetic likelihoods without getting involved in a major software engineering project. libcov is thus a potentially valuable building block to develop in-house methodologies in the field of protein phylogenetics.
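To give a sense of the kind of computation such a library wraps, the following is a minimal sketch of a single-site phylogenetic likelihood computed with Felsenstein's pruning algorithm. It does not use or reproduce the libcov API. For brevity it assumes nucleotide data and the Jukes-Cantor (JC69) substitution model rather than a protein model, and the tree encoding (a leaf is a one-letter string, an internal node is a list of `(child, branch_length)` pairs) is a hypothetical choice made only for this illustration.

```python
import math

BASES = "ACGT"


def jc69_prob(i, j, t):
    """JC69 probability of base i becoming base j over branch length t
    (t in expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial_likelihoods(node):
    """Conditional likelihoods [L(A), L(C), L(G), L(T)] of the data below node.

    A leaf is a one-character string holding the observed base; an internal
    node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if base == node else 0.0 for base in BASES]
    result = [1.0] * len(BASES)
    for child, t in node:
        child_l = partial_likelihoods(child)
        for x, parent_base in enumerate(BASES):
            # Sum over every possible state at the child end of the branch.
            result[x] *= sum(
                jc69_prob(parent_base, child_base, t) * child_l[y]
                for y, child_base in enumerate(BASES)
            )
    return result


def site_likelihood(root):
    """Likelihood of one alignment column, with uniform root frequencies."""
    return sum(0.25 * value for value in partial_likelihoods(root))


# Usage: a three-taxon tree ((C:0.2, G:0.3):0.05, A:0.1) for one column.
example_tree = [("A", 0.1), ([("C", 0.2), ("G", 0.3)], 0.05)]
example_likelihood = site_likelihood(example_tree)
```

The likelihood of a full alignment under site independence is the product of such per-site values, which in practice is accumulated as a sum of logarithms to avoid underflow.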
A number of methods have recently been published that use phylogenetic information extracted from large multiple sequence alignments to detect sites that have changed properties in related protein families. In this study we use such methods to assess functional divergence between eukaryotic EF-1α (eEF-1α), archaebacterial EF-1α (aEF-1α) and two eukaryote-specific EF-1α paralogs—eukaryotic release factor 3 (eRF3) and Hsp70 subfamily B suppressor 1 (HBS1). Overall, the evolutionary modes of aEF-1α, HBS1 and eRF3 appear to significantly differ from that of eEF-1α. However, functionally divergent (FD) sites detected between aEF-1α and eEF-1α only weakly overlap with sites implicated as putative EF-1β or aminoacyl-tRNA (aa-tRNA) binding residues in EF-1α, as expected based on the shared ancestral primary translational functions of these two orthologs. In contrast, FD sites detected between eEF-1α and its paralogs significantly overlap with the putative EF-1β and/or aa-tRNA binding sites in EF-1α. In eRF3 and HBS1, these sites appear to be released from functional constraints, indicating that they bind neither eEF-1β nor aa-tRNA. These results are consistent with experimental observations that eRF3 does not bind to aa-tRNA, but do not support the ‘EF-1α-like’ function recently proposed for HBS1. We re-assess the available genetic data for HBS1 in light of our analyses, and propose that this protein may function in stop codon-independent peptide release.
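One common way to ask whether two sets of alignment sites overlap more than expected by chance is a one-sided hypergeometric test. The sketch below illustrates that general idea only; the abstract does not state which test was used in the study, and the function name and its parameters are hypothetical.

```python
from math import comb


def overlap_p_value(n_sites, n_fd, n_binding, n_overlap):
    """Probability of at least n_overlap shared sites under random placement.

    n_sites   : total number of aligned sites considered
    n_fd      : number of sites flagged as functionally divergent
    n_binding : number of sites annotated as putative binding residues
    n_overlap : number of sites present in both sets

    Assumption: the flagged sites are a uniform random draw of size n_fd from
    the alignment, which gives a hypergeometric null distribution.
    """
    upper = min(n_fd, n_binding)
    total = comb(n_sites, n_fd)
    tail = sum(
        comb(n_binding, k) * comb(n_sites - n_binding, n_fd - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Usage with made-up counts (not data from the study).
p = overlap_p_value(n_sites=400, n_fd=30, n_binding=25, n_overlap=8)
```

A small p-value indicates more shared sites than random placement would typically produce. Such a test treats sites as independent, so it ignores the spatial clustering of residues in the folded protein.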
Lateral gene transfer can introduce genes with novel functions into genomes or replace genes with functionally similar orthologs or paralogs. Here we present a study of the occurrence of the latter gene replacement phenomenon in the four gene families encoding different classes of glutamate dehydrogenase (GDH), to evaluate and compare the patterns and rates of lateral gene transfer (LGT) in prokaryotes and eukaryotes.
We extend the taxon sampling of gdh genes with nine new eukaryotic sequences and examine the phylogenetic distribution pattern of the various GDH classes in combination with maximum likelihood phylogenetic analyses. The distribution pattern analyses indicate that LGT has played a significant role in the evolution of the four gdh gene families. Indeed, a number of gene transfer events are identified by phylogenetic analyses, including numerous prokaryotic intra-domain transfers, some prokaryotic inter-domain transfers and several inter-domain transfers between prokaryotes and microbial eukaryotes (protists).
LGT has apparently affected eukaryotes and prokaryotes to a similar extent within the gdh gene families. In the absence of indications that the evolution of the gdh gene families is radically different from other families, these results suggest that gene transfer might be an important evolutionary mechanism in microbial eukaryote genome evolution.
Comparative sequence analysis has been used to study specific questions about the structure and function of proteins for many years. Here we propose a knowledge-based framework in which the maximum likelihood rate of evolution is used to quantify the level of constraint on the identity of a site. We demonstrate that site-rate mapping on 3D structures using datasets of rhodopsin-like G-protein receptors and α- and β-tubulins provides an excellent tool for pinpointing the functional features shared between orthologous and paralogous proteins. In addition, functional divergence within protein families can be inferred by examining the differences in the site rates, the differences in the chemical properties of the side chains or amino acid usage between aligned sites. Two novel analytical methods are introduced to characterize rate-independent functional divergence. These are tested using a dataset of two classes of HMG-CoA reductases for which only one class can perform both the forward and reverse reaction. We show that functionally divergent sites occur in a cluster of sites interacting with the catalytic residues and that this information should facilitate the design of experimental strategies to directly test functional properties of residues.
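The idea of comparing amino acid usage between aligned sites can be illustrated with a minimal sketch. It is not one of the two methods introduced in the study, whose details the abstract does not give. It scores each alignment column by the total variation distance between the residue frequencies of two subfamilies; the function names, the gap character `-` and the threshold are assumptions made for illustration.

```python
from collections import Counter


def column_frequencies(column):
    """Residue frequencies in one alignment column, ignoring gaps ('-')."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: c / len(residues) for aa, c in counts.items()}


def usage_distance(col_a, col_b):
    """Total variation distance between two columns' residue frequencies.

    Ranges from 0 (identical usage) to 1 (no residues in common), provided
    both columns contain at least one non-gap residue.
    """
    fa, fb = column_frequencies(col_a), column_frequencies(col_b)
    return 0.5 * sum(
        abs(fa.get(aa, 0.0) - fb.get(aa, 0.0)) for aa in set(fa) | set(fb)
    )


def divergent_sites(aln_a, aln_b, threshold=0.8):
    """Indices of columns whose usage distance reaches the threshold.

    aln_a and aln_b are lists of equal-length aligned sequences for the two
    subfamilies, sharing the same column coordinates.
    """
    cols_a, cols_b = zip(*aln_a), zip(*aln_b)
    return [
        i for i, (a, b) in enumerate(zip(cols_a, cols_b))
        if usage_distance(a, b) >= threshold
    ]


# Usage on a toy alignment (made-up sequences, not data from the study).
family_a = ["MKL", "MKL", "MRL"]
family_b = ["MDL", "MDV", "MDL"]
flagged = divergent_sites(family_a, family_b)
```

A site can score highly here even when both subfamilies evolve at similar rates, which is the sense in which a usage-based measure is rate-independent.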