Homing endonuclease genes (HEGs) are superfluous, but are capable of invading populations that mix alleles by biasing their inheritance patterns through gene conversion. One model suggests that their long-term persistence is achieved through recurrent invasion. This circumvents evolutionary degeneration, but requires reasonable rates of transfer between species to maintain purifying selection. Although HEGs are found in a variety of microbes, we found the previous discovery of this type of selfish genetic element in the mitochondria of a sea anemone surprising.
We surveyed 29 species of Cnidaria for the presence of the COXI HEG. Statistical analyses provided evidence for HEG invasion. We also found that 96 individuals of Metridium senile, from five different locations in the UK, had identical HEG sequences. This lack of sequence divergence illustrates the stable nature of Anthozoan mitochondria. Our data suggest this HEG conforms to the recurrent invasion model of evolution.
Ordinarily such low rates of HEG transfer would likely be insufficient to enable major invasion. However, the slow rate of Anthozoan mitochondrial change greatly lengthens the time to HEG degeneration, which significantly extends the periodicity of the HEG life-cycle. We suggest that a combination of very low substitution rates and rare transfers facilitated metazoan HEG invasion.
The cytosolic iron/sulfur cluster assembly (CIA) machinery is responsible for the assembly of cytosolic and nuclear iron/sulfur clusters, cofactors that are vital for all living cells. This machinery is uniquely found in eukaryotes and consists of at least eight proteins in opisthokont lineages, such as animals and fungi. We sought to identify and characterize homologues of the CIA system proteins in the anaerobic stramenopile parasite Blastocystis sp. strain NandII. We identified transcripts encoding six of the components (Cia1, Cia2, MMS19, Nbp35, Nar1, and a putative Tah18) and showed using immunofluorescence microscopy, immunoelectron microscopy, and subcellular fractionation that the last three of them localized to the cytoplasm of the cell. We then used comparative genomic and phylogenetic approaches to investigate the evolutionary history of these proteins. While most Blastocystis homologues branch with their eukaryotic counterparts, the putative Blastocystis Tah18 seems to have a separate evolutionary origin and therefore possibly a different function. Furthermore, our phylogenomic analyses revealed that all eight CIA components described in opisthokonts originated before the diversification of extant eukaryotic lineages and were likely already present in the last eukaryotic common ancestor (LECA). The Nbp35, Nar1, Cia1, and Cia2 proteins have been conserved during the subsequent evolutionary diversification of eukaryotes and are present in virtually all extant lineages, whereas the other CIA proteins have patchy phylogenetic distributions. Cia2 appears to be homologous to SufT, a component of the prokaryotic sulfur utilization factors (SUF) system, making this the first reported evolutionary link between the CIA and any other Fe/S biogenesis pathway.
Together, our results suggest that the CIA machinery is a ubiquitous biosynthetic pathway in eukaryotes, but its apparent plasticity in composition raises questions regarding how it functions in nonmodel organisms and how it interfaces with various iron/sulfur cluster systems (i.e., the iron/sulfur cluster, nitrogen fixation, and/or SUF system) found in eukaryotic cells.
Most eukaryotic lineages belong to one of a few major groups. However, several protistan lineages have not yet been robustly placed in any of these groups. The breviates and apusomonads are two such lineages that appear to be related to the Amoebozoa and Opisthokonta (i.e. the ‘unikonts’ or Amorphea); however, their precise phylogenetic positions remain unclear. Here, we describe a novel microaerophilic breviate, Pygsuia biforma gen. nov. sp. nov., isolated from a hypoxic estuarine sediment. Ultrastructurally, this species resembles the breviate genera Breviata and Subulatomonas but has two cell morphologies, adherent and swimming. Phylogenetic analyses of the small subunit rRNA gene show that Pygsuia is the sister to the other breviates. We constructed a 159-protein supermatrix, including orthologues identified in RNA-seq data from Pygsuia. Phylogenomic analyses of this dataset show that breviates, apusomonads and Opisthokonta form a strongly supported major eukaryotic grouping we name the Obazoa. Although some phylogenetic methods disagree, the balance of evidence suggests that the breviate lineage forms the deepest branch within Obazoa. We also found transcripts encoding a nearly complete integrin adhesome from Pygsuia, indicating that this protein complex involved in metazoan multicellularity may have evolved earlier in eukaryote evolution than previously thought.
eukaryote evolution; protist; mitochondria; integrin; animal–fungal clade
Termination codons in mRNA molecules are typically specified directly by the sequence of the corresponding gene. However, in mitochondria of a few eukaryotic groups, some mRNAs contain the termination codon UAA deriving one or both adenosines from transcript polyadenylation. Here, we show that a similar phenomenon occurs for a substantial number of nuclear genes in Blastocystis spp., divergent unicellular eukaryote gut parasites. Our analyses of published genomic data from Blastocystis sp. subtype 7 revealed that polyadenylation-mediated creation of termination codons occurs in approximately 15% of all nuclear genes. As this phenomenon has not been noticed before, the procedure previously employed to annotate the Blastocystis nuclear genome sequence failed to correctly define the structure of the 3′-ends of hundreds of genes. From sequence data we have obtained from the distantly related Blastocystis sp. subtype 1 strain, we show that this phenomenon is widespread within the Blastocystis genus. Polyadenylation in Blastocystis appears to be directed by a conserved GU-rich element located four nucleotides downstream of the polyadenylation site. Thus, the highly precise positioning of the polyadenylation in Blastocystis has allowed reduction of the 3′-untranslated regions to the point that, in many genes, only one or two nucleotides of the termination codon are left.
Blastocystis; evolution; gene expression; mRNA processing; polyadenylation; termination codons; translation
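The mechanism described above, in which the poly(A) tail supplies the missing adenosines of a UAA termination codon, can be illustrated with a minimal sketch. The function name, the toy transcripts and the default tail length are hypothetical and are not taken from the study; the code assumes the genome-encoded part of the transcript starts in frame at position 0 and ends at the polyadenylation site.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_stop_after_polyadenylation(encoded, tail_length=30):
    """Find the first in-frame stop codon once a poly(A) tail is appended.

    `encoded` is the genome-encoded mRNA (RNA alphabet), in frame from index 0
    and ending at the polyadenylation site. Returns (codon_start, n_from_tail),
    where n_from_tail counts stop-codon nucleotides supplied by the tail,
    or None if no in-frame stop codon is found.
    """
    mrna = encoded + "A" * tail_length
    for start in range(0, len(mrna) - 2, 3):
        if mrna[start:start + 3] in STOP_CODONS:
            # Nucleotides of this codon lying beyond the encoded sequence.
            n_from_tail = max(0, start + 3 - len(encoded))
            return start, n_from_tail
    return None


# Toy transcripts (illustrative only): the encoded part ends in U, UA or UAA.
for transcript in ("AUGGCUU", "AUGGCUUA", "AUGGCUUAA"):
    print(transcript, first_stop_after_polyadenylation(transcript))
```

A transcript whose encoded part ends on a complete sense codon gains only AAA (lysine) codons from the tail, so the function returns None for it.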
Nucleomorphs are residual nuclei derived from eukaryotic endosymbionts in chlorarachniophyte and cryptophyte algae. The endosymbionts that gave rise to nucleomorphs and plastids in these two algal groups were green and red algae, respectively. Despite their independent origin, the chlorarachniophyte and cryptophyte nucleomorph genomes share similar genomic features such as extreme size reduction and a three-chromosome architecture. This suggests that similar reductive evolutionary forces have acted to shape the nucleomorph genomes in the two groups. Thus far, however, only a single chlorarachniophyte nucleomorph and plastid genome has been sequenced, making broad evolutionary inferences within the chlorarachniophytes and between chlorarachniophytes and cryptophytes difficult. We have sequenced the nucleomorph and plastid genomes of the chlorarachniophyte Lotharella oceanica in order to gain insight into nucleomorph and plastid genome diversity and evolution.
The L. oceanica nucleomorph genome was found to consist of three linear chromosomes totaling ~610 kilobase pairs (kbp), much larger than the 373 kbp nucleomorph genome of the model chlorarachniophyte Bigelowiella natans. The L. oceanica plastid genome is 71 kbp in size, similar to that of B. natans. Unexpectedly long (~35 kbp) sub-telomeric repeat regions were identified in the L. oceanica nucleomorph genome; internal multi-copy regions were also detected. Gene content analyses revealed that nucleomorph house-keeping genes and spliceosomal intron positions are well conserved between the L. oceanica and B. natans nucleomorph genomes. More broadly, gene retention patterns were found to be similar between nucleomorph genomes in chlorarachniophytes and cryptophytes. Chlorarachniophyte plastid genomes showed nearly identical protein-coding gene complements as well as a high level of synteny.
We have provided insight into the process of nucleomorph genome evolution by elucidating the fine-scale dynamics of sub-telomeric repeat regions. Homologous recombination at the chromosome ends appears to be frequent, serving to expand and contract nucleomorph genome size. The main factor influencing nucleomorph genome size variation between different chlorarachniophyte species appears to be expansion-contraction of these telomere-associated repeats rather than changes in the number of unique protein coding genes. The dynamic nature of chlorarachniophyte nucleomorph genomes lies in stark contrast to their plastid genomes, which appear to be highly stable in terms of gene content and synteny.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-374) contains supplementary material, which is available to authorized users.
Nucleomorph; Genome reduction; Chlorarachniophytes; Cryptophytes; Endosymbiosis; Phylogenomics
The unicellular eukaryotic assemblage Discoba (Excavata) comprises four lineages: the Heterolobosea, Euglenozoa, Jakobida, and Tsukubamonadida. Discoba has been considered a key assemblage for understanding the early evolution of mitochondrial (mt) genomes, as jakobids retain the most gene-rich (i.e., primitive) mt genomes of any eukaryotes characterized so far. However, mt genome sequences have so far been completed for only a few groups within Discoba, including jakobids, two closely related heteroloboseans, and kinetoplastid euglenozoans. The Tsukubamonadida is the least studied lineage, as the order was only recently established with the description of a sole representative species, Tsukubamonas globosa. The evolutionary relationship between T. globosa and other discobids has yet to be resolved, and no mt genome data are available for this particular organism. Here, we use a “phylogenomic” approach to resolve the relationship between T. globosa, heteroloboseans, euglenozoans, and jakobids. In addition, we have characterized the mt genome of T. globosa (48,463 bp in length), which encodes 52 putative protein-coding and 29 RNA genes. By mapping the gene repertoires of discobid mt genomes onto the well-resolved Discoba tree, we model gene loss events during the evolution of discobid mt genomes.
gene loss; genome reduction; organelles; phylogenomics
Among models of nucleotide evolution, the Barry and Hartigan (BH) model (also known as the General Markov Model) is very flexible as it allows separate arbitrary substitution matrices along edges. For a given tree, the estimates of the BH model are a set of joint probability matrices, each giving the pairwise frequencies of nucleotides at the ends of the edge. We have previously shown that, due to an identifiability problem, these cannot be expected to consistently estimate the actual pairwise frequencies. A further consequence is that internal node frequency estimates are likely to be incorrect. Here we define a nonstationary GTR model for each edge that we refer to as the NSGTR model. We fit the NSGTR model by minimizing the sum of squared differences between the estimates of transition probabilities under the NSGTR model and the estimates provided by a fitted BH model. This NSGTR model provides estimates that avoid the identifiability difficulties of the BH model while closely fitting it. With the best-fitting NSGTR estimates, we are able to get interpretable frequency vectors at internal nodes as well as edge length estimates that are otherwise not yielded by the BH model. These edge lengths are interpretable as the expected number of substitutions along an edge for the model. We also show that for a nonstationary continuous-time model these are not the same as the edge length parameters for conventional substitution matrices that are output by nonstationary model phylogenetic estimation programs such as nhPhyML.
Average substitutions; BH model; identifiability; nonstationary; NSGTR model
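The distinction between the expected number of substitutions along an edge and a stationary-frequency edge length can be illustrated numerically. The sketch below is not the NSGTR fitting procedure. It assumes a single rate matrix Q on one edge, uses a toy F81-style matrix (a special case of GTR) with made-up frequencies, and takes "frequencies held at the stationary vector" as an assumed stand-in for the conventional quantity. The expected number of substitutions is computed as the integral over the edge of the instantaneous leaving rate, with the frequency vector evolving as d(pi)/ds = pi Q.

```python
import math


def expected_substitutions(Q, pi_start, t, steps=2000):
    """Integrate sum_i pi_i(s) * (-Q[i][i]) over s in [0, t].

    pi(s) follows d(pi)/ds = pi Q and is advanced with classical RK4 steps;
    the integral is accumulated with the trapezoid rule.
    """
    n = len(Q)

    def flow(p):
        # Row vector times rate matrix.
        return [math.fsum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]

    def leaving_rate(p):
        # Instantaneous substitution rate given the current frequencies p.
        return -math.fsum(p[i] * Q[i][i] for i in range(n))

    h = t / steps
    p = list(pi_start)
    total = 0.0
    for _ in range(steps):
        k1 = flow(p)
        k2 = flow([p[i] + 0.5 * h * k1[i] for i in range(n)])
        k3 = flow([p[i] + 0.5 * h * k2[i] for i in range(n)])
        k4 = flow([p[i] + h * k3[i] for i in range(n)])
        p_next = [p[i] + h / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                  for i in range(n)]
        total += 0.5 * h * (leaving_rate(p) + leaving_rate(p_next))
        p = p_next
    return total


def stationary_length(Q, pi_stat, t):
    """Edge length when frequencies are held at the stationary vector."""
    return t * -math.fsum(pi_stat[i] * Q[i][i] for i in range(len(Q)))


# Toy F81-style rate matrix; all numbers are illustrative assumptions.
pi_stat = [0.1, 0.2, 0.3, 0.4]
Q = [[pi_stat[j] if i != j else -(1.0 - pi_stat[i]) for j in range(4)]
     for i in range(4)]
pi_start = [0.4, 0.3, 0.2, 0.1]  # parent-node frequencies, not stationary

print(expected_substitutions(Q, pi_start, 0.5))  # nonstationary expectation
print(stationary_length(Q, pi_stat, 0.5))        # stationary-frequency length
```

When the starting frequencies equal the stationary vector the two quantities coincide; they diverge as the starting frequencies move away from stationarity.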
Diverse, distantly-related eukaryotic lineages have adapted to low-oxygen environments, and possess mitochondrion-related organelles that have lost the capacity to generate adenosine triphosphate (ATP) through oxidative phosphorylation. A subset of these organelles, hydrogenosomes, has acquired a set of characteristic ATP generation enzymes commonly found in anaerobic bacteria. The recipient of these enzymes could not have survived prior to their acquisition had it not still possessed the electron transport chain present in the ancestral mitochondrion. In the divergence of modern hydrogenosomes from mitochondria, a transitional organelle must therefore have existed that possessed both an electron transport chain and an anaerobic ATP generation pathway. Here, we report a modern analog of this organelle in the habitually aerobic opportunistic pathogen, Acanthamoeba castellanii. This organism possesses a complete set of enzymes comprising a hydrogenosome-like ATP generation pathway, each of which is predicted to be targeted to mitochondria. We have experimentally confirmed the mitochondrial localizations of key components of this pathway using tandem mass spectrometry. This is the first evidence, supported by localization and proteome data, of a mitochondrion possessing both an electron transport chain and hydrogenosome-like energy metabolism enzymes. Our work provides insight into the first steps that might have occurred in the course of the emergence of modern hydrogenosomes.
Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is a key enzyme of the glycolytic pathway, reversibly catalyzing the sixth step of glycolysis and concurrently reducing the coenzyme NAD+ to NADH. In photosynthetic organisms a GAPDH paralog (Gap2 in Cyanobacteria, GapA in most photosynthetic eukaryotes) functions in the Calvin cycle, performing the reverse of the glycolytic reaction and using the coenzyme NADPH preferentially. In a number of photosynthetic eukaryotes that acquired their plastid by the secondary endosymbiosis of a eukaryotic red alga (alveolates, haptophytes, cryptomonads and stramenopiles) GapA has apparently been replaced with a paralog of the host’s own cytosolic GAPDH (GapC1). Plastid GapC1 and GapA therefore represent two independent cases of functional divergence and adaptation to the Calvin cycle, each entailing a shift in subcellular targeting and a shift in binding preference from NAD+ to NADPH.
We used the programs FunDi, GroupSim, and Difference Evolutionary-Trace to detect sites involved in the functional divergence of these two groups of GAPDH sequences and to identify potential cases of convergent evolution in the Calvin-cycle adapted GapA and GapC1 families. Sites identified as being functionally divergent by all or some of these programs were then investigated with respect to their possible roles in the structure and function of both glycolytic and plastid-targeted GAPDH isoforms.
In this work we found substantial evidence for convergent evolution in GapA/B and GapC1. In many cases sites in GAPDHs of these groups converged on identical amino acid residues in specific positions of the protein known to play a role in the function and regulation of plastid-functioning enzymes relative to their cytosolic counterparts. In addition, we demonstrate that bioinformatic programs such as FunDi are important tools for the generation of meaningful biological hypotheses that can then be tested with direct experimental techniques.
Elongation factor-1α (EF-1α) and elongation factor-like (EFL) proteins are functionally homologous to one another, and are core components of the eukaryotic translation machinery. The patchy distribution of the two elongation factor types across global eukaryotic phylogeny is suggestive of a ‘differential loss’ hypothesis that assumes that EF-1α and EFL were present in the most recent common ancestor of eukaryotes followed by independent differential losses of one of the two factors in the descendant lineages. To date, however, just one diatom and one fungus have been found to have both EF-1α and EFL (dual-EF-containing species).
In this study, we characterized 35 new EF-1α/EFL sequences from phylogenetically diverse eukaryotes. In so doing we identified 11 previously unreported dual-EF-containing species from diverse eukaryote groups including the Stramenopiles, Apusomonadida, Goniomonadida, and Fungi. Phylogenetic analyses suggested vertical inheritance of both genes in each of the dual-EF lineages. In the dual-EF-containing species we identified, the EF-1α genes appeared to be highly divergent in sequence and suppressed at the transcriptional level compared to the co-occurring EFL genes.
According to the known EF-1α/EFL distribution, the differential loss process should have occurred independently in diverse eukaryotic lineages, and more dual-EF-containing species remain unidentified. We predict that dual-EF-containing species retain the divergent EF-1α homologues only for a sub-set of the original functions. As the dual-EF-containing species are distantly related to each other, we propose that independent re-modelling of EF-1α function took place in multiple branches in the tree of eukaryotes.
Diatoms; Differential Gene Loss; EF-1α; EFL; Functional Remodeling; Goniomonas; Pythium; Spizellomyces; Thecamonas
Many of the eukaryotic phylogenomic analyses published to date were based on alignments of hundreds to thousands of genes. In such analyses, the most realistic evolutionary models currently available are often used to minimize the impact of systematic error. However, controversy remains over whether or not idiosyncratic gene family dynamics (i.e., gene duplications and losses) and incorrect orthology assignments are always appropriately taken into account. In this paper, we present an innovative strategy for overcoming orthology assignment problems. Rather than identifying and eliminating genes with paralogy problems, we have constructed a data set composed exclusively of conserved single-copy protein domains that, unlike most of the commonly used phylogenomic data sets, should be less confounded by orthology misassignments. To evaluate the power of this approach, we performed maximum likelihood and Bayesian analyses to infer the evolutionary relationships within the opisthokonts (which include Metazoa, Fungi, and related unicellular lineages). We used this approach to test 1) whether Filasterea and Ichthyosporea form a clade, 2) the interrelationships of early-branching metazoans, and 3) the relationships among early-branching fungi. We also assessed the impact of some methods that are known to minimize systematic error, including reducing the distance between the outgroup and ingroup taxa or using the CAT evolutionary model. Overall, our analyses support the Filozoa hypothesis in which Ichthyosporea are the first holozoan lineage to emerge followed by Filasterea, Choanoflagellata, and Metazoa. Blastocladiomycota appears as a lineage separate from Chytridiomycota, although this result is not strongly supported. These results represent independent tests of previous phylogenetic hypotheses, highlighting the importance of sophisticated approaches for orthology assignment in phylogenomic analyses.
Capsaspora; Filasterea; Filozoa; Holozoa; Ichthyosporea; multicellularity
Sterols are key components of eukaryotic cellular membranes that are synthesized by multi-enzyme pathways that require molecular oxygen. Because prokaryotes fundamentally lack sterols, it is unclear how the vast diversity of bacterivorous eukaryotes that inhabit hypoxic environments obtain, or synthesize, sterols. Here we show that tetrahymanol, a triterpenoid that does not require molecular oxygen for its biosynthesis, likely functions as a sterol surrogate in eukaryotes inhabiting oxygen-poor environments. Genes encoding the tetrahymanol-synthesizing enzyme squalene-tetrahymanol cyclase were found in several phylogenetically divergent eukaryotes that live in oxygen-poor environments and appear to have been laterally transferred among such eukaryotes.
This article was reviewed by Eric Bapteste and Eugene Koonin.
eukaryotes; lateral gene transfer; phagocytosis; sterols; tetrahymanol
The divergent eukaryotic unicellular organism Giardia intestinalis is an intestinal parasite in humans and various animals. An analysis of a draft genome sequence suggested that G. intestinalis has a much simpler genome organization and gene repertoire than those of other model eukaryotic organisms (e.g., Arabidopsis and human). This general picture of the G. intestinalis genome seemingly agrees with the fact that only four spliceosomal (cis-spliced) introns have been identified in this organism to date. We have recently shown that G. intestinalis possesses a unique gene expression system incorporating spliceosome-mediated trans-splicing. Some protein-coding genes in G. intestinalis are split into multiple pieces in the genome and each gene fragment is independently transcribed. Two particular pre-mRNAs directly interact with each other by forming an intermolecular-stem structure and are then trans-spliced into a mature mRNA by spliceosomes. We believe that this trans-splicing secondarily arose from the system that excises canonical (cis-splicing) introns. Based on these findings, we suspect that similar phenomena—split genes and post-transcriptional assemblage of their transcripts via trans-splicing—may be prevalent in more distinct eukaryotic lineages than previously known, particularly in organisms possessing “intron-poor” genomes.
cis-splicing; dynein; Giardia intestinalis; heat shock protein 90; RNA maturation; spliceosomal introns; splintrons; trans-splicing
Protists that live under low-oxygen conditions often lack conventional mitochondria and instead possess mitochondrion-related organelles (MROs) with distinct biochemical functions. Studies of mostly parasitic organisms have suggested that these organelles could be classified into two general types: hydrogenosomes and mitosomes. Hydrogenosomes, found in parabasalids, anaerobic chytrid fungi, and ciliates, metabolize pyruvate anaerobically to generate ATP, acetate, CO2, and hydrogen gas, employing enzymes not typically associated with mitochondria. Mitosomes that have been studied have no apparent role in energy metabolism. Recent investigations of free-living anaerobic protists have revealed a diversity of MROs with a wider array of metabolic properties that defy a simple functional classification. Here we describe an expressed sequence tag (EST) survey and ultrastructural investigation of the anaerobic heteroloboseid amoeba Sawyeria marylandensis aimed at understanding the properties of its MROs. This organism expresses typical anaerobic energy metabolic enzymes, such as pyruvate:ferredoxin oxidoreductase, [FeFe]-hydrogenase, and associated hydrogenase maturases with apparent organelle-targeting peptides, indicating that its MRO likely functions as a hydrogenosome. We also identified 38 genes encoding canonical mitochondrial proteins in S. marylandensis, many of which possess putative targeting peptides and are phylogenetically related to putative mitochondrial proteins of its heteroloboseid relative Naegleria gruberi. Several of these proteins, such as a branched-chain alpha keto acid dehydrogenase, likely function in pathways that have not been previously associated with the well-studied hydrogenosomes of parabasalids. Finally, morphological reconstructions based on transmission electron microscopy indicate that the S. marylandensis MROs form novel cup-like structures within the cells. 
Overall, these data suggest that Sawyeria marylandensis possesses a hydrogenosome of mitochondrial origin with a novel combination of biochemical and structural properties.
There is little doubt that genes can spread across unrelated prokaryotes, eukaryotes and even between these domains. It is expected that organisms inhabiting a common niche may exchange their genes even more often due to their physical proximity and similar demands. One such niche is the anaerobic or microaerophilic environments found in some sediments and in the intestines of animals. Indeed, enzymes advantageous for metabolism in these environments often exhibit an evolutionary history that is incongruent with the history of their hosts, indicating potential transfers. The evolutionary paths of some very basic enzymes for energy metabolism of anaerobic eukaryotes (pyruvate formate lyase, pyruvate:ferredoxin oxidoreductase, [FeFe]hydrogenase and arginine deiminase) seem to be particularly intriguing and, although their histories are not identical, they share several unexpected features. Every enzyme mentioned above is present in groups of eukaryotes that are unrelated to each other. Although the enzyme phylogenies are not always robustly supported, they always suggest that the eukaryotic homologues form one or two clades, in which the relationships are not congruent with the eukaryotic phylogeny. Finally, these eukaryotic enzymes are never specifically related to homologues from α-proteobacteria, ancestors of mitochondria. The most plausible explanation for this pattern involves one or two interdomain transfers to one or two eukaryotes from prokaryotes that were not the mitochondrial endosymbiont. Once the genes were introduced into the eukaryotic domain, they spread to other eukaryotic groups exclusively via eukaryote-to-eukaryote transfers. Currently, eukaryote-to-eukaryote gene transfers are regarded as less common than prokaryote-to-eukaryote transfers. The fact that eukaryotes accepted genes for these enzymes solely from other eukaryotes and not prokaryotes present in the same environment is surprising.
pyruvate formate lyase; pyruvate:ferredoxin oxidoreductase; [FeFe]hydrogenase; arginine deiminase; lateral gene transfer; origin; mitochondrion; anaerobic metabolism
Motivation: Aligning protein sequences with the best possible accuracy requires sophisticated algorithms. Since the optimal alignment is not guaranteed to be the correct one, it is expected that even the best alignment will contain sites that do not respect the assumption of positional homology. Because formulating rules to identify these sites is difficult, it is common practice to manually remove them. Although considered necessary in some cases, manual editing is time consuming and not reproducible. We present here an automated editing method based on the classification of ‘valid’ and ‘invalid’ sites.
Results: A support vector machine (SVM) classifier is trained to reproduce the decisions made during manual editing with an accuracy of 95.0%. This implies that manual editing can be made reproducible and applied to large-scale analyses. We further demonstrate that it is possible to retrain/extend the training of the classifier by providing examples of multiple sequence alignment (MSA) annotation. Near optimal training can be achieved with only 1000 annotated sites, or roughly three samples of protein sequence alignments.
Availability: This method is implemented in the software MANUEL, licensed under the GPL. A web-based application for single and batch jobs is available at http://fester.cs.dal.ca/manuel.
Supplementary information: Supplementary data are available at Bioinformatics online.
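To make the idea of classifying alignment sites concrete, the sketch below computes two simple per-column descriptors, gap fraction and Shannon entropy, of the kind a site classifier could take as input. These features, the function name and the toy alignment are assumptions chosen for illustration; the abstract does not state which features MANUEL uses, and no SVM is trained here.

```python
import math
from collections import Counter


def column_features(alignment):
    """Return a (gap_fraction, entropy_bits) pair for every alignment column.

    `alignment` is a list of equal-length aligned sequences with '-' as the
    gap character. Entropy is computed over non-gap residues only.
    """
    n_seqs = len(alignment)
    features = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        gap_fraction = 1.0 - len(residues) / n_seqs
        total = len(residues)
        counts = Counter(residues)
        # Shannon entropy in bits; an all-gap column is given entropy 0.
        entropy = sum(-(k / total) * math.log2(k / total)
                      for k in counts.values()) if total else 0.0
        features.append((gap_fraction, entropy))
    return features


# Toy alignment, illustrative only.
toy = ["MKV-", "MRV-", "MKVA", "M-IA"]
for index, (gaps, entropy) in enumerate(column_features(toy)):
    print(index, round(gaps, 2), round(entropy, 3))
```

Feature vectors like these, paired with manually assigned 'valid' or 'invalid' labels, would form the training examples for a classifier.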
The covarion hypothesis of molecular evolution holds that selective pressures on a given amino acid or nucleotide site are dependent on the identity of other sites in the molecule that change throughout time, resulting in changes of evolutionary rates of sites along the branches of a phylogenetic tree. At the sequence level, covarion-like evolution at a site manifests as conservation of nucleotide or amino acid states among some homologs where the states are not conserved in other homologs (or groups of homologs). Covarion-like evolution has been shown to relate to changes in functions at sites in different clades, and, if ignored, can adversely affect the accuracy of phylogenetic inference.
PROCOV (protein covarion analysis) is a software tool that implements a number of previously proposed covarion models of protein evolution for phylogenetic inference in a maximum likelihood framework. Several algorithmic and implementation improvements in this tool over previous versions make computationally expensive tree searches with covarion models more efficient and analyses of large phylogenomic data sets tractable. PROCOV can be used to identify covarion sites by comparing the site likelihoods under the covarion process to the corresponding site likelihoods under a rates-across-sites (RAS) process. Those sites with the greatest log-likelihood difference between a 'covarion' and an RAS process were found to be of functional or structural significance in a dataset of bacterial and eukaryotic elongation factors.
Covarion models implemented in PROCOV may be especially useful for phylogenetic estimation when ancient divergences between sequences have occurred and rates of evolution at sites are likely to have changed over the tree. PROCOV can also be used to study lineage-specific functional shifts in protein families that result in changes in the patterns of site variability among subtrees.
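The site-screening step described above, comparing per-site log-likelihoods under a covarion process with those under an RAS process, reduces to ranking sites by their difference. The sketch below assumes the two lists of site log-likelihoods have already been computed by some phylogenetic program; the function name and the numbers are hypothetical and do not reflect PROCOV's output format.

```python
def rank_covarion_sites(lnl_covarion, lnl_ras, top=5):
    """Rank sites by (covarion lnL - RAS lnL), largest difference first.

    Returns a list of (site_index, difference) pairs of length at most `top`.
    """
    if len(lnl_covarion) != len(lnl_ras):
        raise ValueError("site log-likelihood lists must have equal length")
    differences = [(cov - ras, site)
                   for site, (cov, ras) in enumerate(zip(lnl_covarion, lnl_ras))]
    differences.sort(key=lambda pair: pair[0], reverse=True)
    return [(site, diff) for diff, site in differences[:top]]


# Made-up per-site log-likelihoods for a four-site alignment.
covarion = [-3.1, -5.0, -2.2, -7.4]
ras = [-3.0, -6.5, -2.3, -7.4]
for site, diff in rank_covarion_sites(covarion, ras, top=2):
    print(site, round(diff, 3))
```

Sites at the top of this list are candidates for closer structural or functional inspection, not confirmed covarion sites.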
Widely used substitution models for proteins, such as the Jones-Taylor-Thornton (JTT) or Whelan and Goldman (WAG) models, are based on empirical amino acid interchange matrices estimated from databases of protein alignments that incorporate the average amino acid frequencies of the data set under examination (e.g., JTT + F). Variation in the evolutionary process between sites is typically modelled by a rates-across-sites distribution such as the gamma (Γ) distribution. However, sites in proteins also vary in the kinds of amino acid interchanges that are favoured, a feature that is ignored by standard empirical substitution matrices. Here we examine the degree to which the pattern of evolution at sites differs from that expected based on empirical amino acid substitution models and evaluate the impact of these deviations on phylogenetic estimation.
We analyzed 21 large protein alignments with two statistical tests designed to detect deviation of site-specific amino acid distributions from data simulated under the standard empirical substitution model: JTT + F + Γ. We found that the number of states at a given site is, on average, smaller and the frequencies of these states are less uniform than expected based on a JTT + F + Γ substitution model. With a four-taxon example, we show that phylogenetic estimation under the JTT + F + Γ model is seriously biased by a long-branch attraction artefact if the data are simulated under a model utilizing the observed site-specific amino acid frequencies from an alignment. Principal components analyses indicate the existence of at least four major site-specific frequency classes in these 21 protein alignments. Using a mixture model with these four separate classes of site-specific state frequencies plus a fifth class of global frequencies (the JTT + cF + Γ model), significant improvements in model fit for real data sets can be achieved. This simple mixture model also reduces the long-branch attraction problem, as shown by simulations and analyses of a real phylogenomic data set.
Protein families display site-specific evolutionary dynamics that are ignored by standard protein phylogenetic models. Accurate estimation of protein phylogenies requires models that accommodate the heterogeneity in the evolutionary process across sites. To this end, we have implemented a class frequency mixture model (cF) in a freely available program called QmmRAxML for phylogenetic estimation.
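In a class frequency mixture of this kind, the likelihood of a site is a weighted sum of its likelihoods under each frequency class. The sketch below shows only that combining step, done in log space for numerical stability. It is not the QmmRAxML implementation: the per-class site log-likelihoods (which would come from a tree likelihood calculation under each class's frequencies) and the class weights are hypothetical inputs.

```python
import math


def mixture_site_loglik(class_logliks, weights):
    """Return log(sum_k w_k * exp(lnL_k)) using the log-sum-exp trick.

    `class_logliks[k]` is the site log-likelihood under frequency class k and
    `weights[k]` is the mixture weight of that class (weights sum to 1).
    Classes with zero weight are skipped.
    """
    if len(class_logliks) != len(weights):
        raise ValueError("need one weight per class")
    terms = [math.log(w) + lnl for w, lnl in zip(weights, class_logliks) if w > 0]
    largest = max(terms)
    return largest + math.log(sum(math.exp(term - largest) for term in terms))


# Five classes (four site-specific plus one global); numbers are made up.
weights = [0.2, 0.2, 0.2, 0.2, 0.2]
site_logliks = [-12.4, -9.8, -15.1, -11.0, -10.3]
print(mixture_site_loglik(site_logliks, weights))
```

Summing this quantity over all sites gives the alignment log-likelihood under the mixture, which is the value compared against the single-class model when assessing fit.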
Fornicata is a relatively recently established group of protists that includes the diplokaryotic diplomonads (which have two similar nuclei per cell), and the monokaryotic enteromonads, retortamonads and Carpediemonas, with the more typical one nucleus per cell. The monophyly of the group was confirmed by molecular phylogenetic studies, but neither the internal phylogeny nor its position on the eukaryotic tree has been clearly resolved.
Here we have introduced data for three genes (SSU rRNA, α-tubulin and HSP90) with a wide taxonomic sampling of Fornicata, including ten isolates of enteromonads, representing the genera Trimitus and Enteromonas, and a new undescribed enteromonad genus. The diplomonad sequences formed two main clades in individual gene and combined gene analyses, with Giardia (and Octomitus) on one side of the basal divergence and Spironucleus, Hexamita and Trepomonas on the other. Contrary to earlier evolutionary scenarios, none of the studied enteromonads appeared basal to diplokaryotic diplomonads. Instead, the enteromonad isolates were all robustly situated within the second of the two diplomonad clades. Furthermore, our analyses suggested that enteromonads do not constitute a monophyletic group, and enteromonad monophyly was statistically rejected in 'approximately unbiased' tests of the combined gene data.
We suggest that all higher taxa intended to unite multiple enteromonad genera be abandoned, that Trimitus and Enteromonas be considered as part of Hexamitinae, and that the term 'enteromonads' be used in a strictly utilitarian sense. Our result suggests either that the diplokaryotic condition characteristic of diplomonads arose several times independently, or that the monokaryotic cell of enteromonads originated several times independently by secondary reduction from the diplokaryotic state. Both scenarios are evolutionarily complex. More comparative data on the similarity of the genomes of the two nuclei of diplomonads will be necessary to resolve which evolutionary scenario is more probable.
The anaerobic lifestyle of the intestinal parasite Blastocystis raises questions about the biochemistry and function of its mitochondria-like organelles. We have characterized the Blastocystis succinyl-CoA synthetase (SCS), a tricarboxylic acid cycle enzyme that conserves energy by substrate-level phosphorylation. We show that SCS localizes to the enigmatic Blastocystis organelles, indicating that these organelles might play a similar role in energy metabolism as classic mitochondria. Although analysis of residues inside the nucleotide-binding site suggests that Blastocystis SCS is GTP-specific, we demonstrate that it is ATP-specific. Homology modelling, followed by flexible docking and molecular dynamics simulations, indicates that while both ATP and GTP fit into the Blastocystis SCS active site, GTP is destabilized by electrostatic dipole interactions with Lys 42 and Lys 110, the side-chains of which lie outside the nucleotide-binding cavity. It has been proposed that residues in direct contact with the substrate determine nucleotide specificity in SCS. However, our results indicate that, in Blastocystis, an electrostatic gatekeeper controls which ligands can enter the binding site.
Blastocystis is a unicellular stramenopile of controversial pathogenicity in humans [1, 2]. Although it is a strict anaerobe, Blastocystis has mitochondrion-like organelles with cristae, a transmembrane potential and DNA [2–4]. An apparent lack of several typical mitochondrial pathways has led some to suggest that these organelles might be hydrogenosomes, anaerobic organelles related to mitochondria [5, 6]. We generated 12,767 expressed sequence tags (ESTs) from Blastocystis and identified 115 clusters that encode putative mitochondrial and hydrogenosomal proteins. Among these is the canonical hydrogenosomal protein iron-only [FeFe] hydrogenase that we show localizes to the organelles. The organelles also have mitochondrial characteristics, including pathways for amino acid metabolism, iron-sulfur cluster biogenesis, and an incomplete tricarboxylic acid cycle as well as a mitochondrial genome. Although complexes I and II of the electron transport chain (ETC) are present, we found no evidence for complexes III and IV or F1Fo ATPases. The Blastocystis organelles have metabolic properties of aerobic and anaerobic mitochondria and of hydrogenosomes [7, 8]. They are convergently similar to organelles recently described in the unrelated ciliate Nyctotherus ovalis. These findings blur the boundaries between mitochondria, hydrogenosomes, and mitosomes, as currently defined, underscoring the disparate selective forces that shape these organelles in eukaryotes.
Most modern eukaryotes diverged from a common ancestor that contained the α-proteobacterial endosymbiont that gave rise to mitochondria. The ‘amitochondriate’ anaerobic protist parasites that have been studied to date, such as Giardia and Trichomonas, harbor mitochondrion-related organelles, such as mitosomes or hydrogenosomes. Yet there is one remaining group of mitochondrion-lacking flagellates known as the Preaxostyla that could represent a primitive ‘pre-mitochondrial’ lineage of eukaryotes. To test this hypothesis, we conducted an expressed sequence tag (EST) survey on the preaxostylid flagellate Trimastix pyriformis, a poorly studied free-living anaerobe. Among the ESTs we detected 19 proteins that, in other eukaryotes, typically function in mitochondria, hydrogenosomes or mitosomes, 12 of which are found exclusively within these organelles. Interestingly, one of the proteins, aconitase, functions in the tricarboxylic acid cycle typical of aerobic mitochondria, whereas others, such as pyruvate:ferredoxin oxidoreductase and [FeFe] hydrogenase, are characteristic of anaerobic hydrogenosomes. Since Trimastix retains genetic evidence of a mitochondriate ancestry, we can now say definitively that all known living eukaryote lineages descend from a common ancestor that had mitochondria.
Determining the relationships among and divergence times for the major eukaryotic lineages remains one of the most important and controversial outstanding problems in evolutionary biology. The sequencing and phylogenetic analyses of ribosomal RNA (rRNA) genes led to the first nearly comprehensive phylogenies of eukaryotes in the late 1980s, and supported a view where cellular complexity was acquired during the divergence of extant unicellular eukaryote lineages. More recently, however, refinements in analytical methods coupled with the availability of many additional genes for phylogenetic analysis showed that much of the deep structure of early rRNA trees was artefactual. Recent phylogenetic analyses of multiple genes and the discovery of important molecular and ultrastructural phylogenetic characters have resolved eukaryotic diversity into six major hypothetical groups. Yet relationships among these groups remain poorly understood because of saturation of sequence changes on the billion-year time-scale, possible rapid radiations of major lineages, phylogenetic artefacts and endosymbiotic or lateral gene transfer among eukaryotes.
Estimating the divergence dates between the major eukaryote lineages using molecular analyses is even more difficult than phylogenetic estimation. Error in such analyses comes from a myriad of sources including: (i) calibration fossil dates, (ii) the assumed phylogenetic tree, (iii) the nucleotide or amino acid substitution model, (iv) substitution number (branch length) estimates, (v) the model of how rates of evolution change over the tree, (vi) error inherent in the time estimates for a given model and (vii) how multiple gene data are treated. By reanalysing datasets from recently published molecular clock studies, we show that when errors from these various sources are properly accounted for, the confidence intervals on inferred dates can be very large. Furthermore, estimated dates of divergence vary hugely depending on the methods used and their assumptions. Accurate dating of divergence times among the major eukaryote lineages will require a robust tree of eukaryotes, a much richer Proterozoic fossil record of microbial eukaryotes assignable to extant groups for calibration, more sophisticated relaxed molecular clock methods and many more genes sampled from the full diversity of microbial eukaryotes.
eukaryotes; protists; molecular phylogenetics; molecular clock; systematics; superkingdoms
Comparative genomic studies of the mitochondrion-lacking protist group Diplomonadida (diplomonads) have been lacking, although Giardia lamblia has been intensively studied. We have performed a sequence survey project resulting in 2341 expressed sequence tags (ESTs) corresponding to 853 unique clones, 5275 genome survey sequences (GSS), and eleven finished contigs from the diplomonad fish parasite Spironucleus salmonicida (previously described as S. barkhanus).
The analyses revealed a compact genome with few, if any, introns and very short 3' untranslated regions. Strikingly different patterns of codon usage were observed in genes corresponding to frequently sampled ESTs versus genes poorly sampled, indicating that translational selection is influencing the codon usage of highly expressed genes. Rigorous phylogenomic analyses identified 84 genes – mostly encoding metabolic proteins – that have been acquired by diplomonads or their relatively close ancestors via lateral gene transfer (LGT). Although most acquisitions were from prokaryotes, more than a dozen represent likely transfers of genes between eukaryotic lineages. Many genes that provide novel insights into the genetic basis of the biology and pathogenicity of this parasitic protist were identified including 149 that putatively encode variant-surface cysteine-rich proteins which are candidate virulence factors. A number of genomic properties that distinguish S. salmonicida from its human parasitic relative G. lamblia were identified such as nineteen putative lineage-specific gene acquisitions, distinct mutational biases and codon usage and distinct polyadenylation signals.
Our results highlight the power of comparative genomic studies to yield insights into the biology of parasitic protists and the evolution of their genomes, and suggest that genetic exchange between distantly-related protist lineages may be occurring at an appreciable rate in eukaryote genome evolution.
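The codon usage comparison described above, between genes represented by many ESTs and genes represented by few, can be sketched in Python as follows. This is a minimal illustration and not the analysis pipeline of the study: the function names `codon_frequencies` and `usage_shift` and the toy sequences are hypothetical, and codons are compared directly without translation so that no particular genetic code is assumed.

```python
from collections import Counter


def codon_frequencies(coding_sequences):
    """Pool in-frame codons from several coding sequences.

    Returns a dict mapping each codon to its share of all counted codons
    (the values sum to 1). Trailing partial codons and codons containing
    ambiguous bases are skipped.
    """
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


def usage_shift(freq_a, freq_b):
    """Per-codon difference in frequency (set A minus set B)."""
    codons = set(freq_a) | set(freq_b)
    return {c: freq_a.get(c, 0.0) - freq_b.get(c, 0.0) for c in codons}


# Hypothetical toy data standing in for frequently and rarely sampled genes.
frequently_sampled = ["ATGGCTGCTGCC"]
rarely_sampled = ["ATGGCCGCCGCT"]
shift = usage_shift(codon_frequencies(frequently_sampled),
                    codon_frequencies(rarely_sampled))
for codon in sorted(shift):
    print(codon, round(shift[codon], 2))
```

Large positive or negative shifts flag codons that are over- or under-represented in the frequently sampled set. A real analysis would additionally group codons by the amino acid they encode and test the differences statistically.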
The patchy distribution of genes across the prokaryotes may be caused by multiple gene losses or lateral transfer. Probabilistic models of gene gain and loss are needed to distinguish between these possibilities. Existing models allow only single genes to be gained and lost, despite the empirical evidence for multi-gene events. We compare birth-death models (currently the only widely used models, in which only one gene can be gained or lost at a time) to blocks models (allowing gain and loss of multiple genes within a family). We analyze two pairs of genomes: two E. coli strains, and the distantly related Archaeoglobus fulgidus (archaea) and Bacillus subtilis (Gram-positive bacteria). Blocks models describe the data much better than birth-death models. Our models suggest that lateral transfers of multiple genes from the same family are rare (although transfers of single genes are probably common). For both pairs, the estimated median time that a gene will remain in the genome is not much greater than the time separating the common ancestors of the archaea and bacteria. Deep phylogenetic reconstruction from sequence data will therefore depend on choosing genes likely to remain in the genome for a long time. Phylogenies based on the blocks model are more biologically plausible than phylogenies based on the birth-death model.
gene content; lateral transfer; phylogenetics; likelihood
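The notion of a median time that a gene remains in the genome can be made concrete with a small Python sketch. It assumes the simplest case, in which each gene is lost independently at a constant rate, so that its residence time is exponentially distributed; this is an illustrative assumption, not the fitted model of the study. The function names and the example rate are hypothetical.

```python
import math


def survival_probability(loss_rate, elapsed_time):
    """Probability that a gene present at time 0 is still present later.

    Assumes a constant per-gene loss rate, so the residence time is
    exponential: P(still present) = exp(-loss_rate * elapsed_time).
    """
    if loss_rate <= 0:
        raise ValueError("loss_rate must be positive")
    if elapsed_time < 0:
        raise ValueError("elapsed_time must be non-negative")
    return math.exp(-loss_rate * elapsed_time)


def median_residence_time(loss_rate):
    """Time by which half of the genes are expected to have been lost."""
    if loss_rate <= 0:
        raise ValueError("loss_rate must be positive")
    return math.log(2) / loss_rate


# Hypothetical loss rate, in losses per gene per unit of time.
rate = 0.5
t_half = median_residence_time(rate)
print(t_half, survival_probability(rate, t_half))
```

Under this assumption, a gene whose median residence time is close to the age of the split being studied has only about an even chance of surviving in any one lineage, which is why the choice of long-lived genes matters for deep phylogenetic reconstruction.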