Archaeplastida (=Kingdom Plantae) are primary plastid-bearing organisms that evolved via the endosymbiotic association of a heterotrophic eukaryote host cell and a cyanobacterial endosymbiont approximately 1,400 Ma. Here, we present analyses of cyanobacterial and plastid genomes that show strongly conflicting phylogenies based on 75 plastid (or nuclear plastid-targeted) protein-coding genes and their direct translations to proteins. The conflict between genes and proteins is largely robust to the use of sophisticated data- and tree-heterogeneous composition models. However, by using nucleotide ambiguity codes to eliminate synonymous substitutions due to codon-degeneracy, we identify a composition bias, and dependent codon-usage bias, resulting from synonymous substitutions at all third codon positions and first codon positions of leucine and arginine, as the main cause for the conflicting phylogenetic signals. We argue that the protein-coding gene data analyses are likely misleading due to artifacts induced by convergent composition biases at first codon positions of leucine and arginine and at all third codon positions. Our analyses corroborate previous studies based on gene sequence analysis that suggest Cyanobacteria evolved by the early paraphyletic splitting of Gloeobacter and a specific Synechococcus strain (JA33Ab), with all other remaining cyanobacterial groups, including both unicellular and filamentous species, forming the sister-group to the Archaeplastida lineage. In addition, our analyses using better-fitting models suggest (but without statistically strong support) an early divergence of Glaucophyta within Archaeplastida, with the Rhodophyta (red algae), and Viridiplantae (green algae and land plants) forming a separate lineage.
origin of plastids; phylogeny; Cyanobacteria; Archaeplastida
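The recoding step mentioned in the abstract above (nucleotide ambiguity codes used to mask synonymous substitutions) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes the standard genetic code, groups codons that are linked by single synonymous substitutions, and replaces each codon with the IUPAC ambiguity codes that cover its group. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes, keyed by the sorted set of nucleotides they cover.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def synonymous_family(codon):
    """Codons reachable from `codon` by single synonymous substitutions."""
    aa = CODE[codon]
    family, frontier = {codon}, [codon]
    while frontier:
        current = frontier.pop()
        for pos in range(3):
            for base in BASES:
                neighbour = current[:pos] + base + current[pos + 1:]
                if CODE[neighbour] == aa and neighbour not in family:
                    family.add(neighbour)
                    frontier.append(neighbour)
    return family


def degenerate_codon(codon):
    """Recode one codon so that synonymous changes become invisible."""
    if CODE[codon] == "*":  # assumption: stop codons are left untouched
        return codon
    family = synonymous_family(codon)
    return "".join(
        IUPAC["".join(sorted({c[pos] for c in family}))] for pos in range(3)
    )


def degenerate_sequence(seq):
    """Recode an in-frame coding sequence codon by codon.

    Codons containing gaps or existing ambiguity codes, and any trailing
    partial codon, are copied through unchanged.
    """
    seq = seq.upper()
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        out.append(degenerate_codon(codon) if codon in CODE else codon)
    return "".join(out)


if __name__ == "__main__":
    print(degenerate_codon("CTG"))  # leucine -> YTN
    print(degenerate_codon("AGA"))  # arginine -> MGN
    print(degenerate_sequence("ATGCTGAGATCA"))  # ATGYTNMGNTCN
```

Under this scheme only leucine and arginine become ambiguous at the first codon position, which matches the positions singled out in the abstract. Serine splits into two groups (TCN and AGY) because no single synonymous substitution links them.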
Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes.
charophytes; bryophytes; land plants; chloroplast genomics
Microsporidia are an abundant group of obligate intracellular parasites of other eukaryotes, including immunocompromised humans, but the molecular basis of their intracellular lifestyle and pathobiology is poorly understood. New genomes from a taxonomically broad range of microsporidians, complemented by published expression data, provide an opportunity for comparative analyses to identify conserved and lineage-specific patterns of microsporidian genome evolution that have underpinned this success. In this study, we infer that a dramatic bottleneck in the last common microsporidian ancestor (LCMA) left a small conserved core of genes that was subsequently embellished by gene family expansion driven by gene acquisition in different lineages. Novel expressed protein families represent a substantial fraction of sequenced microsporidian genomes and are significantly enriched for signals consistent with secretion or membrane location. Further evidence of selection is inferred from the gain and reciprocal loss of functional domains between paralogous genes, for example, affecting transport proteins. Gene expansions among transporter families preferentially affect those that are located on the plasma membrane of model organisms, consistent with recruitment to plug conserved gaps in microsporidian biosynthesis and metabolism. Core microsporidian genes shared with other eukaryotes are enriched in orthologs that, in yeast, are highly expressed, highly connected, and often essential, consistent with strong negative selection against further reduction of the conserved gene set since the LCMA. Our study reveals that microsporidian genome evolution is a highly dynamic process that has balanced constraint, reductive evolution, and genome expansion during adaptation to an extraordinarily successful obligate intracellular lifestyle.
Microsporidia; intracellular parasites; evolution; genome reduction; gene duplication; novel gene families
Heterogeneity among life traits in mammals has resulted in considerable phylogenetic conflict, particularly concerning the position of the placental root. Layered upon this is gene- and lineage-specific variation in amino acid substitution rates and compositional biases. Life traits whose variation may affect mutation rates include longevity, metabolic rate, body size, and germ line generation time. Over the past 12 years, three main conflicting hypotheses have emerged for the placement of the placental root. These hypotheses place the Atlantogenata (common ancestor of Xenarthra plus Afrotheria), the Afrotheria, or the Xenarthra as the sister group to all other placental mammals. Model adequacy is critical for accurate tree reconstruction, and by failing to account for these compositional and character exchange heterogeneities across the tree and data set, previous studies have not provided a strongly supported hypothesis for the placental root. For the first time, models that accommodate both tree and data set heterogeneity have been applied to mammal data. Here, we show the impact of accurate model assignment and the importance of data sets in accommodating model parameters while maintaining the power to reject competing hypotheses. Through these sophisticated methods, we demonstrate the importance of model adequacy and data set power, and provide strong support for the Atlantogenata over other competing hypotheses for the position of the placental root.
mammal phylogeny; phylogenetic reconstruction; evolutionary models; placental root; heterogeneous modeling
The influence of lateral gene transfer on gene origins and biology in eukaryotes is poorly understood compared with its influence in prokaryotes. A number of independent investigations focusing on specific genes, individual genomes, or specific functional categories from various eukaryotes have indicated that lateral gene transfer does indeed affect eukaryotic genomes. However, the lack of common methodology and criteria in these studies makes it difficult to assess the general importance and influence of lateral gene transfer on eukaryotic genome evolution.
We used a phylogenomic approach to systematically investigate lateral gene transfer affecting the proteomes of thirteen, mainly parasitic, microbial eukaryotes, representing four of the six eukaryotic super-groups. All of the genomes investigated have been significantly affected by prokaryote-to-eukaryote lateral gene transfers, dramatically affecting the enzymes of core pathways, particularly amino acid and sugar metabolism, but also providing new genes of potential adaptive significance in the life of parasites. A broad range of prokaryotic donors is involved in such transfers, but there is clear and significant enrichment for bacterial groups that share the same habitats, including the human microbiota, as the parasites investigated.
Our data show that ecology and lifestyle strongly influence gene origins and opportunities for gene transfer and reveal that, although the outlines of the core eukaryotic metabolism are conserved among lineages, the genes making up those pathways can have very different origins in different eukaryotes. Thus, from the perspective of the effects of lateral gene transfer on individual gene ancestries in different lineages, eukaryotic metabolism appears to be chimeric.
Genome evolution; phylogenomics; lateral gene transfer; eukaryotes; parasites
Specimens of neotropical Anopheles (Nyssorhynchus) were collected and identified morphologically. We amplified three genes for phylogenetic analysis: the single-copy nuclear white and CAD genes, and the COI barcode region. Since we had multiple specimens for most species, we were able to test how well the single or combined genes corroborated morphologically defined species by placing the species into exclusive groups. We found that single genes, including the COI barcode region, were poor at confirming species, but that the three genes combined did so much better. This has implications for species identification, species delimitation, and species discovery, and we caution that single genes are not enough. Higher-level groupings were only partially resolved: some were well supported, whereas others were found to be either polyphyletic or paraphyletic. There were examples of known groups, such as the Myzorhynchella Section, which were poorly supported with single genes but well supported with combined genes. From this we infer that more sequence data will be needed in order to recover more higher-level groupings with good support. We obtained unambiguously strong support (0.94–1.0 Bayesian posterior probability) from all DNA-based analyses for a grouping of An. dunhami with An. nuneztovari and An. goeldii, and because of this and because of morphological similarities we propose that An. dunhami be included in the Nuneztovari Complex. We obtained phylogenetic corroboration for new species which had been recognised by morphological differences; these will need to be formally described and named.
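The test described above, whether the specimens of each morphologically defined species fall into an exclusive group, can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the tree is rooted and written as nested tuples, and the specimen labels, species labels, and function names are all hypothetical rather than taken from the study.

```python
def leaf_set(node):
    """Return the set of specimen labels below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clades(node):
    """Yield the leaf set of every clade in the tree, including single leaves."""
    yield leaf_set(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def exclusive_species(tree, species_of):
    """Map each species to True if its specimens form exactly one clade."""
    all_clades = set(clades(tree))
    members = {}
    for specimen, species in species_of.items():
        members.setdefault(species, set()).add(specimen)
    return {sp: frozenset(m) in all_clades for sp, m in members.items()}


if __name__ == "__main__":
    # Hypothetical rooted tree with five specimens from three morphospecies.
    tree = ((("a1", "a2"), "b1"), ("b2", "c1"))
    species_of = {"a1": "sp_a", "a2": "sp_a",
                  "b1": "sp_b", "b2": "sp_b",
                  "c1": "sp_c"}
    print(exclusive_species(tree, species_of))
    # {'sp_a': True, 'sp_b': False, 'sp_c': True}
```

A species represented by a single specimen trivially passes this check, so in practice the comparison is informative only for species with several specimens, and a real analysis would also weigh clade support values.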
Determining the relationships among the major groups of cellular life is important for understanding the evolution of biological diversity, but is difficult given the enormous time spans involved. In the textbook ‘three domains’ tree based on informational genes, eukaryotes and Archaea share a common ancestor to the exclusion of Bacteria. However, some phylogenetic analyses of the same data have placed eukaryotes within the Archaea, as the nearest relatives of different archaeal lineages. We compared the support for these competing hypotheses using sophisticated phylogenetic methods and an improved sampling of archaeal biodiversity. We also employed both new and existing tests of phylogenetic congruence to explore the level of uncertainty and conflict in the data. Our analyses suggested that much of the observed incongruence is weakly supported or associated with poorly fitting evolutionary models. All of our phylogenetic analyses, whether on small subunit and large subunit ribosomal RNA or concatenated protein-coding genes, recovered a monophyletic group containing eukaryotes and the TACK archaeal superphylum comprising the Thaumarchaeota, Aigarchaeota, Crenarchaeota and Korarchaeota. Hence, while our results provide no support for the iconic three-domain tree of life, they are consistent with an extended eocyte hypothesis whereby vital components of the eukaryotic nuclear lineage originated from within the archaeal radiation.
phylogenetics; eukaryotes; evolution; tree of life
Mitochondrial genomes comprise a small but critical component of the total DNA in eukaryotic organisms. They encode several key proteins for the cell’s major energy-producing apparatus, the mitochondrial respiratory chain. Additionally, their nucleotide and amino acid sequences are of great utility as markers for systematics, molecular ecology and forensics. Their characterization through nucleotide sequencing is a fundamental starting point in mitogenomics. Methods to amplify complete mitochondrial genomes rapidly and efficiently from microgram quantities of tissue of single individuals are, however, not always available. Here we validate two approaches, which combine long-PCR with Roche 454 pyrosequencing technology, to obtain two complete mitochondrial genomes from individual amphibian species.
We obtained two new complete mitochondrial genome sequences, from the Xenopus frogs Xenopus borealis and X. victorianus, by means of long-PCR followed by 454 pyrosequencing of individual genomes (approach 1) or of multiple pooled genomes (approach 2); the mean depth of coverage per nucleotide was 9823 and 186, respectively. We also characterised and compared the new mitogenomes against their sister taxa, X. laevis and Silurana tropicalis, two of the most intensely studied amphibians. Our results demonstrate how our approaches can be used to obtain complete amphibian mitogenomes with depths of coverage that far surpass traditional primer-walking strategies, at either the same cost or less. Our results also demonstrate that the size, gene content and order are the same among Xenopus mitogenomes and that S. tropicalis forms a separate clade to the other Xenopus, among which X. laevis and X. victorianus were most closely related. Nucleotide and amino acid diversity was found to vary across the Xenopus mitogenomes, with the greatest diversity observed in the Complex 1 gene nad4l and the least diversity observed in Complex 4 genes (cox1-3). All protein-coding genes were shown to be under strong negative (purifying) selection, with genes under the strongest pressure (Complex 4) also being the most highly expressed, highlighting their potentially crucial functions in the mitochondrial respiratory chain.
Next generation sequencing of long-PCR amplicons using single-taxon or multi-taxon approaches enabled the mtDNAs of two more Xenopus species to be fully characterized. We anticipate that our complete mitochondrial genome amplification methods will be applicable to other amphibians and helpful for identifying the most appropriate markers for differentiating species and populations and for resolving phylogenies, a pressing need since amphibians are undergoing drastic global decline. Our mtDNAs also provide templates for conserved primer design and the assembly of RNA and DNA reads following high throughput “omic” techniques such as RNA- and ChIP-seq. These could help us better understand how processes such as mitochondrial replication and gene expression influence Xenopus growth and development, as well as how they evolved and are regulated.
Xenopus; Mitochondrial DNA; Next generation sequencing; Phylogeny; Mitogenomics; Comparative analyses; Variation; Selection and molecular markers
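The mean depth of coverage per nucleotide quoted in the abstract above is, in general terms, the total number of aligned bases divided by the genome length. The sketch below shows that arithmetic on toy data. It assumes reads are given as 0-based, end-exclusive alignment intervals on a linear reference. The function name and the example intervals are hypothetical and unrelated to the study's data.

```python
def depth_profile(read_intervals, genome_length):
    """Per-position depth from (start, end) alignment intervals.

    Intervals are 0-based and end-exclusive; parts that fall outside the
    reference are ignored.
    """
    depth = [0] * genome_length
    for start, end in read_intervals:
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    return depth


def mean_depth(read_intervals, genome_length):
    """Mean depth of coverage per nucleotide."""
    return sum(depth_profile(read_intervals, genome_length)) / genome_length


if __name__ == "__main__":
    # Toy example: three 6-bp reads on a 10-bp reference.
    reads = [(0, 6), (4, 10), (2, 8)]
    print(depth_profile(reads, 10))  # [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]
    print(mean_depth(reads, 10))     # 1.8
```

The per-position profile is worth inspecting alongside the mean, since a high mean can hide poorly covered regions. A circular mitochondrial genome would also need reads spanning the origin to be split before counting, which this sketch does not do.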
The three-domains tree, which depicts eukaryotes and archaebacteria as monophyletic sister groups, is the dominant model for early eukaryotic evolution. By contrast, the ‘eocyte hypothesis’, where eukaryotes are proposed to have originated from within the archaebacteria as sister to the Crenarchaeota (also called the eocytes), has been largely neglected in the literature. We have investigated support for these two competing hypotheses from molecular sequence data using methods that attempt to accommodate the across-site compositional heterogeneity and across-tree compositional and rate matrix heterogeneity that are manifest features of these data. When ribosomal RNA genes were analysed using standard methods that do not adequately model these kinds of heterogeneity, the three-domains tree was supported. However, this support was eroded or lost when composition-heterogeneous models were used, with concomitant increase in support for the eocyte tree for eukaryotic origins. Analysis of combined amino acid sequences from 41 protein-coding genes supported the eocyte tree, whether or not composition-heterogeneous models were used. The possible effects of substitutional saturation of our data were examined using simulation; these results suggested that saturation is delayed by among-site rate variation in the sequences, and that phylogenetic signal for ancient relationships is plausibly present in these data.
universal tree of life; eukaryote origins; archaebacteria; eocyte; heterogeneous phylogenetic models
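Across-tree compositional heterogeneity of the kind discussed in the abstract above is often screened for with a simple chi-square test of homogeneity on per-taxon nucleotide counts. The sketch below computes that statistic in plain Python. It is a diagnostic only, not one of the composition-heterogeneous models used in the study, and it ignores the phylogenetic non-independence of sequences. The function name and toy sequences are hypothetical.

```python
from collections import Counter


def composition_chi2(sequences, alphabet="ACGT"):
    """Chi-square statistic and degrees of freedom for composition homogeneity.

    `sequences` maps taxon name -> aligned sequence. Characters outside
    `alphabet` (gaps, ambiguity codes) are ignored.
    """
    counts = {}
    for name, seq in sequences.items():
        tally = Counter(ch for ch in seq.upper() if ch in alphabet)
        if sum(tally.values()) > 0:  # skip taxa with no usable characters
            counts[name] = tally

    col_totals = {ch: sum(t[ch] for t in counts.values()) for ch in alphabet}
    used = [ch for ch in alphabet if col_totals[ch] > 0]
    grand_total = sum(col_totals.values())

    chi2 = 0.0
    for tally in counts.values():
        row_total = sum(tally[ch] for ch in used)
        for ch in used:
            expected = row_total * col_totals[ch] / grand_total
            chi2 += (tally[ch] - expected) ** 2 / expected

    df = (len(counts) - 1) * (len(used) - 1)
    return chi2, df


if __name__ == "__main__":
    same = {"t1": "ACGTACGT", "t2": "AACCGGTT"}
    skewed = {"t1": "AAAA", "t2": "CCCC"}
    print(composition_chi2(same))    # (0.0, 3)
    print(composition_chi2(skewed))  # (8.0, 1)
```

A large statistic relative to its degrees of freedom suggests that a single stationary composition is a poor description of the data, which is the situation in which composition-heterogeneous models become relevant.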
A phylogeny was reconstructed for four species belonging to the Neotropical Anopheles (Nyssorhynchus) albitarsis complex using partial sequences from the mitochondrial cytochrome oxidase I (COI) and NADH dehydrogenase 4 (ND4) genes and the ribosomal DNA ITS2 and D2 expansion region of the 28S subunit. Each member of the complex was initially characterized by correlated random amplification of polymorphic DNA-polymerase chain reaction (RAPD-PCR) markers. Analyses were carried out with and without an outgroup (An. (Nys.) argyritarsis Robineau-Desvoidy) by using maximum parsimony, maximum likelihood, and Bayesian methods. A total evidence approach without the outgroup, using separate models for “fast” (COI and ND4 position 3) and “slow” (rDNA ITS2 and D2, and COI and ND4 position 1) partitions, gave the best supported topology, showing close relationships of An. albitarsis Lynch-Arribálzaga to An. albitarsis B and An. marajoara Galvão & Damasceno to An. deaneorum Rosa-Freitas. Analyses with the outgroup included showed poorer support, possibly because of a long-branch attraction effect caused by a divergent outgroup, which caused one of the An. marajoara specimens to cluster with An. deaneorum in some analyses. The relationship of this result to a separately proposed hypothesis suggesting a fifth species in the complex is discussed.
Culicidae; Anopheles albitarsis Complex; molecular phylogeny
We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the ~160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion, in conjunction with the shaping of metabolic pathways that likely transpired through lateral gene transfer from bacteria and the amplification of specific gene families implicated in pathogenesis and phagocytosis of host proteins, may exemplify adaptations of the parasite during its transition to a urogenital environment. The genome sequence predicts previously unknown functions for the hydrogenosome, which support a common evolutionary origin of this unusual organelle with mitochondria.
Currently the shikimate pathway is reported as a metabolic feature of prokaryotes, ascomycete fungi, apicomplexans, and plants. The plant shikimate pathway enzymes have similarities to prokaryote homologues and are largely active in chloroplasts, suggesting ancestry from the plastid progenitor genome. Toxoplasma gondii, which also possesses an alga-derived plastid organelle, encodes a shikimate pathway with similarities to ascomycete genes, including a five-enzyme pentafunctional arom. These data suggest that the shikimate pathway and the pentafunctional arom either had an ancient origin in the eukaryotes or were conveyed by eukaryote-to-eukaryote horizontal gene transfer (HGT). We expand sampling and analyses of the shikimate pathway genes to include the oomycetes, ciliates, diatoms, basidiomycetes, zygomycetes, and the green and red algae. Sequencing of cDNA from Tetrahymena thermophila confirmed the presence of a pentafused arom, as in fungi and T. gondii. Phylogenies and taxon distribution suggest that the arom gene fusion event may be an ancient eukaryotic innovation. Conversely, the Plantae lineage (represented here by both Viridiplantae and the red algae) acquired different prokaryotic genes for all seven steps of the shikimate pathway. Two of the phylogenies suggest a derivation of the Plantae genes from the cyanobacterial plastid progenitor genome, but if the full Plantae pathway was originally of cyanobacterial origin, then the five other shikimate pathway genes were obtained from a minimum of two other eubacterial genomes. Thus, the phylogenies demonstrate both separate HGTs and shared derived HGTs within the Plantae clade, either by primary HGT or secondarily via the plastid progenitor genome. The shared derived characters support the holophyly of the Plantae lineage and a single ancestral primary plastid endosymbiosis.
Our analyses also pinpoint a minimum of 50 gene/domain loss events, demonstrating that loss and replacement events have been an important process in eukaryote genome evolution.
Lateral gene transfer (LGT) in eukaryotes from non-organellar sources is a controversial subject in need of further study. Here we present gene distribution and phylogenetic analyses of the genes encoding the hybrid-cluster protein, A-type flavoprotein, glucosamine-6-phosphate isomerase, and alcohol dehydrogenase E. These four genes have a limited distribution among sequenced prokaryotic and eukaryotic genomes and were previously implicated in gene transfer events affecting eukaryotes. If our previous contention that these genes were introduced by LGT independently into the diplomonad and Entamoeba lineages is true, we expect that the number of putative transfers and the phylogenetic signal supporting LGT should be stable or increase, rather than decrease, when novel eukaryotic and prokaryotic homologs are added to the analyses.
The addition of homologs from phagotrophic protists, including several Entamoeba species, the pelobiont Mastigamoeba balamuthi, and the parabasalid Trichomonas vaginalis, and a large quantity of sequences from genome projects resulted in an apparent increase in the number of putative transfer events affecting all three domains of life. Some of the eukaryotic transfers affect a wide range of protists, such as three divergent lineages of Amoebozoa, represented by Entamoeba, Mastigamoeba, and Dictyostelium, while other transfers only affect a limited diversity, for example only the Entamoeba lineage. These observations are consistent with a model where these genes have been introduced into protist genomes independently from various sources over a long evolutionary time.
Phylogenetic analyses of the updated datasets using more sophisticated phylogenetic methods, in combination with the gene distribution analyses, strengthened, rather than weakened, the support for LGT as an important mechanism affecting the evolution of these gene families. Thus, gene transfer seems to be an on-going evolutionary mechanism by which genes are spread between unrelated lineages of all three domains of life, further indicating the importance of LGT from non-organellar sources into eukaryotic genomes.