Through the course of their evolution, viruses with large genomes have acquired numerous host genes, most of which perform function in virus reproduction in a manner that is related to their original activities in the cells, but some are exapted for new roles. Here we report the unexpected finding that protein F12, which is conserved among the chordopoxviruses and is implicated in the morphogenesis of enveloped intracellular virions, is a derived DNA polymerase, possibly of bacteriophage origin, in which the polymerase domain and probably the exonuclease domain have been inactivated. Thus, F12 appears to present a rare example of a drastic, exaptive functional change in virus evolution.
Reviewers: This article was reviewed by Frank Eisenhaber and Juergen Brosius. For complete reviews, go the Reviewers’ Reports section.
DNA polymerase; Exaptation; Poxviruses; Evolution of viruses
The nucleocytoplasmic large DNA viruses (NCLDVs) comprise a monophyletic group of viruses that infect animals and diverse unicellular eukaryotes. The NCLDV group includes the families Poxviridae, Asfarviridae, Iridoviridae, Ascoviridae, Phycodnaviridae, Mimiviridae and the proposed family “Marseilleviridae”. The family Mimiviridae includes the largest known viruses, with genomes in excess of one megabase, whereas the genome size in the other NCLDV families varies from 100 to 400 kilobase pairs. Most of the NCLDVs replicate in the cytoplasm of infected cells, within so-called virus factories. The NCLDVs share a common ancient origin, as demonstrated by evolutionary reconstructions that trace approximately 50 genes encoding key proteins involved in viral replication and virion formation to the last common ancestor of all these viruses. Taken together, these characteristics lead us to propose assigning an official taxonomic rank to the NCLDVs as the order “Megavirales”, in reference to the large size of the virions and genomes of these viruses.
The nucleo-cytoplasmic large DNA viruses (NCLDV) constitute an apparently monophyletic group that consists of 6 families of viruses infecting a broad variety of eukaryotes. A comprehensive genome comparison and maximum-likelihood reconstruction of NCLDV evolution reveal a set of approximately 50 conserved genes that can be tentatively mapped to the genome of the common ancestor of this class of eukaryotic viruses. We address the origins and evolution of NCLDV.
Phylogenetic analysis indicates that some of the major clades of NCLDV infect diverse animals and protists, suggestive of early radiation of the NCLDV, possibly concomitant with eukaryogenesis. The core NCLDV genes seem to have originated from different sources including homologous genes of bacteriophages, bacteria and eukaryotes. These observations are compatible with a scenario of the origin of the NCLDV at an early stage of the evolution of eukaryotes through extensive mixing of genes from widely different genomes.
The common ancestor of the NCLDV probably evolved from a bacteriophage as a result of recruitment of numerous eukaryotic and some bacterial genes, and concomitant loss of the majority of phage genes except for a small core of genes coding for proteins essential for virus genome replication and virion formation.
Bacteriophage; Eukaryogenesis; Nucleo-cytoplasmic large DNA viruses, evolution; Phylogenetic analysis
The recently discovered Pandoraviruses are by far the largest viruses known, with their 2 megabase genomes exceeding in size the genomes of numerous bacteria and archaea. Pandoraviruses show a distant relationship with other nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes, lack some of the NCLDV core genes and in particular do not appear to be specifically related to the other, better characterized family of giant viruses, the Mimiviridae. Here we report phylogenetic analysis of 6 core NCLDV genes that confidently places Pandoraviruses within the family Phycodnaviridae, with an apparent specific affinity with Coccolithoviruses. We conclude that, despite their many unusual characteristics, Pandoraviruses are highly derived phycodnaviruses. These findings imply that giant viruses have independently evolved from smaller NCLDV on at least two occasions.
This article was reviewed by Patrick Forterre and Lakshminarayan Iyer. For the full reviews, see the Reviewers’ reports section.
Recent advances of genomics and metagenomics reveal remarkable diversity of viruses and other selfish genetic elements. In particular, giant viruses have been shown to possess their own mobilomes that include virophages, small viruses that parasitize on giant viruses of the Mimiviridae family, and transpovirons, distinct linear plasmids. One of the virophages known as the Mavirus, a parasite of the giant Cafeteria roenbergensis virus, shares several genes with large eukaryotic self-replicating transposon of the Polinton (Maverick) family, and it has been proposed that the polintons evolved from a Mavirus-like ancestor.
We performed a comprehensive phylogenomic analysis of the available genomes of virophages and traced the evolutionary connections between the virophages and other selfish genetic elements. The comparison of the gene composition and genome organization of the virophages reveals 6 conserved, core genes that are organized in partially conserved arrays. Phylogenetic analysis of those core virophage genes, for which a sufficient diversity of homologs outside the virophages was detected, including the maturation protease and the packaging ATPase, supports the monophyly of the virophages. The results of this analysis appear incompatible with the origin of polintons from a Mavirus-like agent but rather suggest that Mavirus evolved through recombination between a polinton and an unknownvirus. Altogether, virophages, polintons, a distinct Tetrahymena transposable element Tlr1, transpovirons, adenoviruses, and some bacteriophages form a network of evolutionary relationships that is held together by overlapping sets of shared genes and appears to represent a distinct module in the vast total network of viruses and mobile elements.
The results of the phylogenomic analysis of the virophages and related genetic elements are compatible with the concept of network-like evolution of the virus world and emphasize multiple evolutionary connections between bona fide viruses and other classes of capsid-less mobile elements.
The family Mimiviridae belongs to the large monophyletic group of Nucleo-Cytoplasmic Large DNA Viruses (NCLDV; proposed order Megavirales) and encompasses giant viruses infecting amoeba and probably other unicellular eukaryotes. The recent discovery of the Cafeteria roenbergensis virus (CroV), a distant relative of the prototype mimiviruses, led to a substantial expansion of the genetic variance within the family Mimiviridae. In the light of these findings, a reassessment of the relationships between the mimiviruses and other NCLDV and reconstruction of the evolution of giant virus genomes emerge as interesting and timely goals.
Database searches for the protein sequences encoded in the genomes of several viruses originally classified as members of the family Phycodnaviridae, in particular Organic Lake phycodnaviruses and Phaeocystis globosa viruses (OLPG), revealed a greater number of highly similar homologs in members of the Mimiviridae than in phycodnaviruses. We constructed a collection of 898 Clusters of Orthologous Genes for the putative expanded family Mimiviridae (MimiCOGs) and used these clusters for a comprehensive phylogenetic analysis of the genes that are conserved in most of the NCLDV. The topologies of the phylogenetic trees for these conserved viral genes strongly support the monophyly of the OLPG and the mimiviruses. The same tree topology was obtained by analysis of the phyletic patterns of conserved viral genes. We further employed the mimiCOGs to obtain a maximum likelihood reconstruction of the history of genes losses and gains among the giant viruses. The results reveal massive gene gain in the mimivirus branch and modest gene gain in the OLPG branch.
These phylogenomic results reported here suggest a substantial expansion of the family Mimiviridae. The proposed expanded family encompasses a greater diversity of viruses including a group of viruses with much smaller genomes than those of the original members of the Mimiviridae. If the OLPG group is included in an expanded family Mimiviridae, it becomes the only family of giant viruses currently shown to host virophages. The mimiCOGs are expected to become a key resource for phylogenomics of giant viruses.
Collections of Clusters of Orthologous Genes (COGs) provide indispensable tools for comparative genomic analysis, evolutionary reconstruction and functional annotation of new genomes. Initially, COGs were made for all complete genomes of cellular life forms that were available at the time. However, with the accumulation of thousands of complete genomes, construction of a comprehensive COG set has become extremely computationally demanding and prone to error propagation, necessitating the switch to taxon-specific COG collections. Previously, we reported the collection of COGs for 41 genomes of Archaea (arCOGs). Here we present a major update of the arCOGs and describe evolutionary reconstructions to reveal general trends in the evolution of Archaea.
The updated version of the arCOG database incorporates 91% of the pangenome of 120 archaea (251,032 protein-coding genes altogether) into 10,335 arCOGs. Using this new set of arCOGs, we performed maximum likelihood reconstruction of the genome content of archaeal ancestral forms and gene gain and loss events in archaeal evolution. This reconstruction shows that the last Common Ancestor of the extant Archaea was an organism of greater complexity than most of the extant archaea, probably with over 2,500 protein-coding genes. The subsequent evolution of almost all archaeal lineages was apparently dominated by gene loss resulting in genome streamlining. Overall, in the evolution of Archaea as well as a representative set of bacteria that was similarly analyzed for comparison, gene losses are estimated to outnumber gene gains at least 4 to 1. Analysis of specific patterns of gene gain in Archaea shows that, although some groups, in particular Halobacteria, acquire substantially more genes than others, on the whole, gene exchange between major groups of Archaea appears to be largely random, with no major ‘highways’ of horizontal gene transfer.
The updated collection of arCOGs is expected to become a key resource for comparative genomics, evolutionary reconstruction and functional annotation of new archaeal genomes. Given that, in spite of the major increase in the number of genomes, the conserved core of archaeal genes appears to be stabilizing, the major evolutionary trends revealed here have a chance to stand the test of time.
This article was reviewed by (for complete reviews see the Reviewers’ Reports section): Dr. PLG, Prof. PF, Dr. PL (nominated by Prof. JPG).
Archaea; Orthologs; Horizontal gene transfer
The 1,021,348 base pair genome sequence of the Acanthamoeba polyphaga moumouvirus, a new member of the Mimiviridae family infecting Acanthamoeba polyphaga, is reported. The moumouvirus represents a third lineage beside mimivirus and megavirus. Thereby, it is a new member of the recently proposed Megavirales order. This giant virus was isolated from a cooling tower water in southeastern France but is most closely related to Megavirus chiliensis, which was isolated from ocean water off the coast of Chile. The moumouvirus is predicted to encode 930 proteins, of which 879 have detectable homologs. Among these predicted proteins, for 702 the closest homolog was detected in Megavirus chiliensis, with the median amino acid sequence identity of 62%. The evolutionary affinity of moumouvirus and megavirus was further supported by phylogenetic tree analysis of conserved genes. The moumouvirus and megavirus genomes share near perfect orthologous gene collinearity in the central part of the genome, with the variations concentrated in the terminal regions. In addition, genomic comparisons of the Mimiviridae reveal substantial gene loss in the moumouvirus lineage. The majority of the remaining moumouvirus proteins are most similar to homologs from other Mimiviridae members, and for 27 genes the closest homolog was found in bacteria. Phylogenetic analysis of these genes supported gene acquisition from diverse bacteria after the separation of the moumouvirus and megavirus lineages. Comparative genome analysis of the three lineages of the Mimiviridae revealed significant mobility of Group I self-splicing introns, with the highest intron content observed in the moumouvirus genome.
moumouvirus; mimivirus; giant virus; megavirus; Mimiviridae; Megavirales; horizontal gene transfer; viral genome; nucleo-cytoplasmic large DNA viruses
Viruses with large genomes encode numerous proteins that do not directly participate in virus biogenesis but rather modify key functional systems of infected cells. We report that a distinct group of giant viruses infecting unicellular eukaryotes that includes Organic Lake Phycodnaviruses and Phaeocystis globosa virus encode predicted proteorhodopsins that have not been previously detected in viruses. Search of metagenomic sequence data shows that putative viral proteorhodopsins are extremely abundant in marine environments. Phylogenetic analysis suggests that giant viruses acquired proteorhodopsins via horizontal gene transfer from proteorhodopsin-encoding protists although the actual donor(s) could not be presently identified. The pattern of conservation of the predicted functionally important amino acid residues suggests that viral proteorhodopsin homologs function as sensory rhodopsins. We hypothesize that viral rhodopsins modulate light-dependent signaling, in particular phototaxis, in infected protists.
This article was reviewed by Igor B. Zhulin and Laksminarayan M. Iyer. For the full reviews, see the Reviewers’ reports section.
The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) constitute an apparently monophyletic group that consists of at least 6 families of viruses infecting a broad variety of eukaryotic hosts. A comprehensive genome comparison and maximum-likelihood reconstruction of the NCLDV evolution revealed a set of approximately 50 conserved, core genes that could be mapped to the genome of the common ancestor of this class of eukaryotic viruses.
We performed a detailed phylogenetic analysis of these core NCLDV genes and applied the constrained tree approach to show that the majority of the core genes are unlikely to be monophyletic. Several of the core genes have been independently acquired from different sources by different NCLDV lineages whereas for the majority of these genes displacement by homologs from cellular organisms in one or more groups of the NCLDV was demonstrated.
A detailed study of the evolution of the genomic core of the NCLDV reveals substantial complexity and diversity of evolutionary scenarios that was largely unsuspected previously. The phylogenetic coherence between the core genes is sufficient to validate the hypothesis on the evolution of all NCLDV from a common ancestral virus although the set of ancestral genes might be smaller than previously inferred from patterns of gene presence-absence.
Archaeal and bacterial ribosomes contain more than 50 proteins, including 34 that are universally conserved in the three domains of cellular life (bacteria, archaea, and eukaryotes). Despite the high sequence conservation, annotation of ribosomal (r-) protein genes is often difficult because of their short lengths and biased sequence composition. We developed an automated computational pipeline for identification of r-protein genes and applied it to 995 completely sequenced bacterial and 87 archaeal genomes available in the RefSeq database. The pipeline employs curated seed alignments of r-proteins to run position-specific scoring matrix (PSSM)-based BLAST searches against six-frame genome translations, mitigating possible gene annotation errors. As a result of this analysis, we performed a census of prokaryotic r-protein complements, enumerated missing and paralogous r-proteins, and analyzed the distributions of ribosomal protein genes among chromosomal partitions. Phyletic patterns of bacterial and archaeal r-protein genes were mapped to phylogenetic trees reconstructed from concatenated alignments of r-proteins to reveal the history of likely multiple independent gains and losses. These alignments, available for download, can be used as search profiles to improve genome annotation of r-proteins and for further comparative genomics studies.
Tubulins are a family of GTPases that are key components of the cytoskeleton in all eukaryotes and are distantly related to the FtsZ GTPase that is involved in cell division in most bacteria and many archaea. Among prokaryotes, bona fide tubulins have been identified only in bacteria of the genus Prosthecobacter. These bacterial tubulin genes appear to have been horizontally transferred from eukaryotes. Here we describe tubulins encoded in the genomes of thaumarchaeota of the genus Nitrosoarchaeum that we denote artubulins Phylogenetic analysis results are compatible with the origin of eukaryotic tubulins from artubulins. These findings expand the emerging picture of the origin of key components of eukaryotic functional systems from ancestral forms that are scattered among the extant archaea.
This article was reviewed by Gáspár Jékely and J. Peter Gogarten.
Recently a novel cell division system comprised of homologues of eukaryotic ESCRT-III (endosomal sorting complex required for transport III) proteins was discovered in the hyperthermophilic crenarchaeote Sulfolobus acidocaldarius. On the basis of this discovery, we undertook a comparative genomic analysis of the machineries for cell division and vesicle formation in Archaea. Archaea possess at least three distinct membrane remodelling systems: the FtsZ-based bacterial-type system, the ESCRT-III-based eukaryote-like system and a putative novel system that uses an archaeal actin-related protein. Many archaeal genomes encode assortments of components from different systems. Evolutionary reconstruction from these findings suggests that the last common ancestor of the extant Archaea possessed a complex membrane remodelling apparatus, different components of which were lost during subsequent evolution of archaeal lineages. By contrast, eukaryotes seem to have inherited all three ancestral systems.
The genome sequence of the Mamavirus, a new Acanthamoeba polyphaga mimivirus strain, is reported. With 1,191,693 nt in length and 1,023 predicted protein-coding genes, the Mamavirus has the largest genome among the known viruses. The genomes of the Mamavirus and the previously described Mimivirus are highly similar in both the protein-coding genes and the intergenic regions. However, the Mamavirus contains an extra 5′-terminal segment that encompasses primarily disrupted duplicates of genes present elsewhere in the genome. The Mamavirus also has several unique genes including a small regulatory polyA polymerase subunit that is shared with poxviruses. Detailed analysis of the protein sequences of the two Mimiviruses led to a substantial amendment of the functional annotation of the viral genomes.
Mimivirus; viral genome; nucleocytoplasmic large DNA viruses
Small, hydrophobic proteins whose synthesis is repressed by small RNAs (sRNAs), denoted type I toxin–antitoxin modules, were first discovered on plasmids where they regulate plasmid stability, but were subsequently found on a few bacterial chromosomes. We used exhaustive PSI-BLAST and TBLASTN searches across 774 bacterial genomes to identify homologs of known type I toxins. These searches substantially expanded the collection of predicted type I toxins, revealed homology of the Ldr and Fst toxins, and suggested that type I toxin–antitoxin loci are not spread by horizontal gene transfer. To discover novel type I toxin–antitoxin systems, we developed a set of search parameters based on characteristics of known loci including the presence of tandem repeats and clusters of charged and bulky amino acids at the C-termini of short proteins containing predicted transmembrane regions. We detected sRNAs for three predicted toxins from enterohemorrhagic Escherichia coli and Bacillus subtilis, and showed that two of the respective proteins indeed are toxic when overexpressed. We also demonstrated that the local free-energy minima of RNA folding can be used to detect the positions of the sRNA genes. Our results suggest that type I toxin–antitoxin modules are much more widely distributed among bacteria than previously appreciated.
Eukaryotic Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) encode most if not all of the enzymes involved in their DNA replication. It has been inferred that genes for these enzymes were already present in the last common ancestor of the NCLDV. However, the details of the evolution of these genes that bear on the complexity of the putative ancestral NCLDV and on the evolutionary relationships between viruses and their hosts are not well understood.
Phylogenetic analysis of the ATP-dependent and NAD-dependent DNA ligases encoded by the NCLDV reveals an unexpectedly complex evolutionary history. The NAD-dependent ligases are encoded only by a minority of NCLDV (including mimiviruses, some iridoviruses and entomopoxviruses) but phylogenetic analysis clearly indicated that all viral NAD-dependent ligases are monophyletic. Combined with the topology of the NCLDV tree derived by consensus of trees for universally conserved genes suggests that this enzyme was represented in the ancestral NCLDV. Phylogenetic analysis of ATP-dependent ligases that are encoded by chordopoxviruses, most of the phycodnaviruses and Marseillevirus failed to demonstrate monophyly and instead revealed an unexpectedly complex evolutionary trajectory. The ligases of the majority of phycodnaviruses and Marseillevirus seem to have evolved from bacteriophage or bacterial homologs; the ligase of one phycodnavirus, Emiliana huxlei virus, belongs to the eukaryotic DNA ligase I branch; and ligases of chordopoxviruses unequivocally cluster with eukaryotic DNA ligase III.
Examination of phyletic patterns and phylogenetic analysis of DNA ligases of the NCLDV suggest that the common ancestor of the extant NCLDV encoded an NAD-dependent ligase that most likely was acquired from a bacteriophage at the early stages of evolution of eukaryotes. By contrast, ATP-dependent ligases from different prokaryotic and eukaryotic sources displaced the ancestral NAD-dependent ligase at different stages of subsequent evolution. These findings emphasize complex routes of viral evolution that become apparent through detailed phylogenomic analysis but not necessarily in reconstructions based on phyletic patterns of genes.
This article was reviewed by: Patrick Forterre, George V. Shpakovski, and Igor B. Zhulin.
The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) comprise an apparently monophyletic class of viruses that infect a broad variety of eukaryotic hosts. Recent progress in isolation of new viruses and genome sequencing resulted in a substantial expansion of the NCLDV diversity, resulting in additional opportunities for comparative genomic analysis, and a demand for a comprehensive classification of viral genes.
A comprehensive comparison of the protein sequences encoded in the genomes of 45 NCLDV belonging to 6 families was performed in order to delineate cluster of orthologous viral genes. Using previously developed computational methods for orthology identification, 1445 Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOGs) were identified of which 177 are represented in more than one NCLDV family. The NCVOGs were manually curated and annotated and can be used as a computational platform for functional annotation and evolutionary analysis of new NCLDV genomes. A maximum-likelihood reconstruction of the NCLDV evolution yielded a set of 47 conserved genes that were probably present in the genome of the common ancestor of this class of eukaryotic viruses. This reconstructed ancestral gene set is robust to the parameters of the reconstruction procedure and so is likely to accurately reflect the gene core of the ancestral NCLDV, indicating that this virus encoded a complex machinery of replication, expression and morphogenesis that made it relatively independent from host cell functions.
The NCVOGs are a flexible and expandable platform for genome analysis and functional annotation of newly characterized NCLDV. Evolutionary reconstructions employing NCVOGs point to complex ancestral viruses.
Phagocytosis, that is, engulfment of large particles by eukaryotic cells, is found in diverse organisms and is often thought to be central to the very origin of the eukaryotic cell, in particular, for the acquisition of bacterial endosymbionts including the ancestor of the mitochondrion.
Comparisons of the sets of proteins implicated in phagocytosis in different eukaryotes reveal extreme diversity, with very few highly conserved components that typically do not possess readily identifiable prokaryotic homologs. Nevertheless, phylogenetic analysis of those proteins for which such homologs do exist yields clues to the possible origin of phagocytosis. The central finding is that a subset of archaea encode actins that are not only monophyletic with eukaryotic actins but also share unique structural features with actin-related proteins (Arp) 2 and 3. All phagocytic processes are strictly dependent on remodeling of the actin cytoskeleton and the formation of branched filaments for which Arp2/3 are responsible. The presence of common structural features in Arp2/3 and the archaeal actins suggests that the common ancestors of the archaeal and eukaryotic actins were capable of forming branched filaments, like modern Arp2/3. The Rho family GTPases that are ubiquitous regulators of phagocytosis in eukaryotes appear to be of bacterial origin, so assuming that the host of the mitochondrial endosymbiont was an archaeon, the genes for these GTPases come via horizontal gene transfer from the endosymbiont or in an earlier event.
The present findings suggest a hypothetical scenario of eukaryogenesis under which the archaeal ancestor of eukaryotes had no cell wall (like modern Thermoplasma) but had an actin-based cytoskeleton including branched actin filaments that allowed this organism to produce actin-supported membrane protrusions. These protrusions would facilitate accidental, occasional engulfment of bacteria, one of which eventually became the mitochondrion. The acquisition of the endosymbiont triggered eukaryogenesis, in particular, the emergence of the endomembrane system that eventually led to the evolution of modern-type phagocytosis, independently in several eukaryotic lineages.
This article was reviewed by Simonetta Gribaldo, Gaspar Jekely, and Pierre Pontarotti. For the full reviews, please go to the Reviewers' Reports section.
The phylum Verrucomicrobia is a widespread but poorly characterized bacterial clade. Although cultivation-independent approaches detect representatives of this phylum in a wide range of environments, including soils, seawater, hot springs and human gastrointestinal tract, only few have been isolated in pure culture. We have recently reported cultivation and initial characterization of an extremely acidophilic methanotrophic member of the Verrucomicrobia, strain V4, isolated from the Hell's Gate geothermal area in New Zealand. Similar organisms were independently isolated from geothermal systems in Italy and Russia.
We report the complete genome sequence of strain V4, the first one from a representative of the Verrucomicrobia. Isolate V4, initially named "Methylokorus infernorum" (and recently renamed Methylacidiphilum infernorum) is an autotrophic bacterium with a streamlined genome of ~2.3 Mbp that encodes simple signal transduction pathways and has a limited potential for regulation of gene expression. Central metabolism of M. infernorum was reconstructed almost completely and revealed highly interconnected pathways of autotrophic central metabolism and modifications of C1-utilization pathways compared to other known methylotrophs. The M. infernorum genome does not encode tubulin, which was previously discovered in bacteria of the genus Prosthecobacter, or close homologs of any other signature eukaryotic proteins. Phylogenetic analysis of ribosomal proteins and RNA polymerase subunits unequivocally supports grouping Planctomycetes, Verrucomicrobia and Chlamydiae into a single clade, the PVC superphylum, despite dramatically different gene content in members of these three groups. Comparative-genomic analysis suggests that evolution of the M. infernorum lineage involved extensive horizontal gene exchange with a variety of bacteria. The genome of M. infernorum shows apparent adaptations for existence under extremely acidic conditions including a major upward shift in the isoelectric points of proteins.
The results of genome analysis of M. infernorum support the monophyly of the PVC superphylum. M. infernorum possesses a streamlined genome but seems to have acquired numerous genes including those for enzymes of methylotrophic pathways via horizontal gene transfer, in particular, from Proteobacteria.
This article was reviewed by John A. Fuerst, Ludmila Chistoserdova, and Radhey S. Gupta.
The set of conserved eukaryotic protein-coding genes includes distinct subsets one of which appears to be most closely related to and, by inference, derived from archaea, whereas another one appears to be of bacterial, possibly, endosymbiotic origin. The “archaeal” genes of eukaryotes, primarily, encode components of information-processing systems, whereas the “bacterial” genes are predominantly operational. The precise nature of the archaeo–eukaryotic relationship remains uncertain, and it has been variously argued that eukaryotic informational genes evolved from the homologous genes of Euryarchaeota or Crenarchaeota (the major branches of extant archaea) or that the origin of eukaryotes lies outside the known diversity of archaea. We describe a comprehensive set of 355 eukaryotic genes of apparent archaeal origin identified through ortholog detection and phylogenetic analysis. Phylogenetic hypothesis testing using constrained trees, combined with a systematic search for shared derived characters in the form of homologous inserts in conserved proteins, indicate that, for the majority of these genes, the preferred tree topology is one with the eukaryotic branch placed outside the extant diversity of archaea although small subsets of genes show crenarchaeal and euryarchaeal affinities. Thus, the archaeal genes in eukaryotes appear to descend from a distinct, ancient, and otherwise uncharacterized archaeal lineage that acquired some euryarchaeal and crenarchaeal genes via early horizontal gene transfer.
archaea; eukaryotes; Euryarchaeota; Crenarchaeota; phylogenetic analysis