The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) constitute an apparently monophyletic group that consists of at least 6 families of viruses infecting a broad variety of eukaryotic hosts. A comprehensive genome comparison and maximum-likelihood reconstruction of the NCLDV evolution revealed a set of approximately 50 conserved, core genes that could be mapped to the genome of the common ancestor of this class of eukaryotic viruses.
We performed a detailed phylogenetic analysis of these core NCLDV genes and applied the constrained tree approach to show that the majority of the core genes are unlikely to be monophyletic. Several of the core genes have been independently acquired from different sources by different NCLDV lineages whereas for the majority of these genes displacement by homologs from cellular organisms in one or more groups of the NCLDV was demonstrated.
A detailed study of the evolution of the genomic core of the NCLDV reveals substantial complexity and diversity of evolutionary scenarios that was largely unsuspected previously. The phylogenetic coherence between the core genes is sufficient to validate the hypothesis on the evolution of all NCLDV from a common ancestral virus although the set of ancestral genes might be smaller than previously inferred from patterns of gene presence-absence.
Eukaryotic Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) encode most if not all of the enzymes involved in their DNA replication. It has been inferred that genes for these enzymes were already present in the last common ancestor of the NCLDV. However, the details of the evolution of these genes that bear on the complexity of the putative ancestral NCLDV and on the evolutionary relationships between viruses and their hosts are not well understood.
Phylogenetic analysis of the ATP-dependent and NAD-dependent DNA ligases encoded by the NCLDV reveals an unexpectedly complex evolutionary history. The NAD-dependent ligases are encoded only by a minority of NCLDV (including mimiviruses, some iridoviruses and entomopoxviruses) but phylogenetic analysis clearly indicated that all viral NAD-dependent ligases are monophyletic. Combined with the topology of the NCLDV tree derived by consensus of trees for universally conserved genes, this monophyly suggests that this enzyme was represented in the ancestral NCLDV. Phylogenetic analysis of ATP-dependent ligases that are encoded by chordopoxviruses, most of the phycodnaviruses and Marseillevirus failed to demonstrate monophyly and instead revealed an unexpectedly complex evolutionary trajectory. The ligases of the majority of phycodnaviruses and Marseillevirus seem to have evolved from bacteriophage or bacterial homologs; the ligase of one phycodnavirus, Emiliania huxleyi virus, belongs to the eukaryotic DNA ligase I branch; and ligases of chordopoxviruses unequivocally cluster with eukaryotic DNA ligase III.
Examination of phyletic patterns and phylogenetic analysis of DNA ligases of the NCLDV suggest that the common ancestor of the extant NCLDV encoded an NAD-dependent ligase that most likely was acquired from a bacteriophage at the early stages of evolution of eukaryotes. By contrast, ATP-dependent ligases from different prokaryotic and eukaryotic sources displaced the ancestral NAD-dependent ligase at different stages of subsequent evolution. These findings emphasize complex routes of viral evolution that become apparent through detailed phylogenomic analysis but not necessarily in reconstructions based on phyletic patterns of genes.
This article was reviewed by: Patrick Forterre, George V. Shpakovski, and Igor B. Zhulin.
Nucleo-Cytoplasmic Large DNA viruses (NCLDV), a diverse group that infects a wide range of eukaryotic hosts, exhibit a large heterogeneity in genome size (between 100 kb and 1.2 Mb) but have been suggested to form a monophyletic group on the basis of a small subset of approximately 30 conserved genes. NCLDV were proposed to have evolved by simplification from a cellular organism, although some of the giant NCLDV have clearly grown by accretion of genes of bacterial origin.
We demonstrate here that many NCLDV lineages appear to have undergone frequent gene exchange in two different ways. Viruses that infect protists directly (Mimivirus) or that infect algae living as intracellular protist symbionts (Phycodnaviruses) acquire genes from a bacterial source. Metazoan viruses such as the Poxviruses show a predominant acquisition of host genes. In both cases, the laterally acquired genes show a strong tendency to be positioned at the tip of the genome. Surprisingly, several core genes believed to be ancestral in the family appear to have undergone lateral gene transfers, suggesting that the NCLDV ancestor might have had a smaller genome than previously believed. Moreover, our data show that the larger the genome, the higher the number of laterally acquired genes. This pattern is incompatible with a genome reduction from a cellular ancestor.
We propose that the NCLDV viruses have evolved by significant growth of a simple DNA virus by gene acquisition from cellular sources.
The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) comprise an apparently monophyletic class of viruses that infect a broad variety of eukaryotic hosts. Recent progress in isolation of new viruses and genome sequencing resulted in a substantial expansion of the NCLDV diversity, resulting in additional opportunities for comparative genomic analysis, and a demand for a comprehensive classification of viral genes.
A comprehensive comparison of the protein sequences encoded in the genomes of 45 NCLDV belonging to 6 families was performed in order to delineate clusters of orthologous viral genes. Using previously developed computational methods for orthology identification, 1445 Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOGs) were identified, of which 177 are represented in more than one NCLDV family. The NCVOGs were manually curated and annotated and can be used as a computational platform for functional annotation and evolutionary analysis of new NCLDV genomes. A maximum-likelihood reconstruction of the NCLDV evolution yielded a set of 47 conserved genes that were probably present in the genome of the common ancestor of this class of eukaryotic viruses. This reconstructed ancestral gene set is robust to the parameters of the reconstruction procedure and so is likely to accurately reflect the gene core of the ancestral NCLDV, indicating that this virus encoded a complex machinery of replication, expression and morphogenesis that made it relatively independent from host cell functions.
The NCVOGs are a flexible and expandable platform for genome analysis and functional annotation of newly characterized NCLDV. Evolutionary reconstructions employing NCVOGs point to complex ancestral viruses.
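A minimal sketch of how a table of orthologous-group memberships can be queried for groups represented in more than one virus family, which is the kind of count reported above. The group identifiers, the family assignments in the toy input, and the helper name `multi_family_groups` are hypothetical, invented for illustration; they are not taken from the NCVOG data set, and this is not the authors' pipeline.

```python
from collections import defaultdict


def multi_family_groups(memberships):
    """Return sorted IDs of orthologous groups seen in more than one family.

    memberships: iterable of (group_id, family) pairs, one pair per gene.
    """
    families_by_group = defaultdict(set)
    for group_id, family in memberships:
        families_by_group[group_id].add(family)
    return sorted(g for g, fams in families_by_group.items() if len(fams) > 1)


# Toy input: hypothetical group IDs and family assignments.
toy = [
    ("OG0001", "Poxviridae"),
    ("OG0001", "Mimiviridae"),
    ("OG0002", "Phycodnaviridae"),
    ("OG0002", "Phycodnaviridae"),  # two genes, but a single family
    ("OG0003", "Iridoviridae"),
    ("OG0003", "Asfarviridae"),
]
shared = multi_family_groups(toy)
print(shared)
```

Storing a set of families per group keeps the count insensitive to paralogs, since several genes from the same family contribute only one entry.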
The family Mimiviridae belongs to the large monophyletic group of Nucleo-Cytoplasmic Large DNA Viruses (NCLDV; proposed order Megavirales) and encompasses giant viruses infecting amoeba and probably other unicellular eukaryotes. The recent discovery of the Cafeteria roenbergensis virus (CroV), a distant relative of the prototype mimiviruses, led to a substantial expansion of the genetic variance within the family Mimiviridae. In the light of these findings, a reassessment of the relationships between the mimiviruses and other NCLDV and reconstruction of the evolution of giant virus genomes emerge as interesting and timely goals.
Database searches for the protein sequences encoded in the genomes of several viruses originally classified as members of the family Phycodnaviridae, in particular Organic Lake phycodnaviruses and Phaeocystis globosa viruses (OLPG), revealed a greater number of highly similar homologs in members of the Mimiviridae than in phycodnaviruses. We constructed a collection of 898 Clusters of Orthologous Genes for the putative expanded family Mimiviridae (MimiCOGs) and used these clusters for a comprehensive phylogenetic analysis of the genes that are conserved in most of the NCLDV. The topologies of the phylogenetic trees for these conserved viral genes strongly support the monophyly of the OLPG and the mimiviruses. The same tree topology was obtained by analysis of the phyletic patterns of conserved viral genes. We further employed the mimiCOGs to obtain a maximum likelihood reconstruction of the history of gene losses and gains among the giant viruses. The results reveal massive gene gain in the mimivirus branch and modest gene gain in the OLPG branch.
The phylogenomic results reported here suggest a substantial expansion of the family Mimiviridae. The proposed expanded family encompasses a greater diversity of viruses including a group of viruses with much smaller genomes than those of the original members of the Mimiviridae. If the OLPG group is included in an expanded family Mimiviridae, it becomes the only family of giant viruses currently shown to host virophages. The mimiCOGs are expected to become a key resource for phylogenomics of giant viruses.
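Phyletic patterns (presence or absence of each gene cluster in each genome) can be turned into pairwise distances, from which a tree can then be built. The sketch below uses the Jaccard distance as one common choice; the abstract does not state which measure was used, so this is an assumption, and the virus names and cluster IDs are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets of cluster IDs (0 = identical)."""
    union = a | b
    if not union:
        return 0.0  # two empty patterns are treated as identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical phyletic patterns: the clusters present in each genome.
patterns = {
    "virus_A": {"c1", "c2", "c3", "c4"},
    "virus_B": {"c1", "c2", "c3"},
    "virus_C": {"c4", "c5"},
}

names = sorted(patterns)
distances = {
    (x, y): jaccard_distance(patterns[x], patterns[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
}
for pair, d in distances.items():
    print(pair, round(d, 2))
```

In this toy input, virus_A and virus_B share most of their clusters and so end up closest, which is the kind of signal a distance-based tree would group together.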
A recent work has provided strong arguments in favor of a fourth domain of Life composed of nucleo-cytoplasmic large DNA viruses (NCLDVs). This hypothesis was supported by phylogenetic and phyletic analyses based on a common set of proteins conserved in Eukarya, Archaea, Bacteria, and viruses, and implicated in the functions of information storage and processing. Recently, the genome of a new NCLDV, Cafeteria roenbergensis virus (CroV), was released. The present work aimed to determine if CroV supports the fourth domain of Life hypothesis.
A consensus phylogenetic tree of NCLDVs including CroV was generated from a concatenated alignment of four universal proteins of NCLDVs. Some features of the gene complement of CroV and its distribution along the genome were further analyzed. Phylogenetic and phyletic analyses were performed using the previously identified common set of informational genes present in Eukarya, Archaea, Bacteria, and NCLDVs, including CroV.
Phylogenetic reconstructions indicated that CroV is clearly related to the Mimiviridae family. The comparison between the gene repertoires of CroV and Mimivirus showed similarities regarding the gene contents and genome organization. In addition, the phyletic clustering based on the comparison of informational gene repertoire between Eukarya, Archaea, Bacteria, and NCLDVs unambiguously classified CroV with other NCLDVs and clearly included it in a fourth domain of Life. Taken together, these data suggest that Mimiviridae, including CroV, may have inherited a common gene content probably acquired from a common Mimiviridae ancestor.
This further analysis of the gene repertoire of CroV consolidated the fourth domain of Life hypothesis and contributed to outline a functional pan-genome for giant viruses infecting phagocytic protistan grazers.
Nucleo-cytoplasmic large DNA viruses (NCLDVs) constitute a group of eukaryotic viruses that can have crucial ecological roles in the sea by accelerating the turnover of their unicellular hosts or by causing diseases in animals. To better characterize the diversity, abundance and biogeography of marine NCLDVs, we analyzed 17 metagenomes derived from microbial samples (0.2–1.6 μm size range) collected during the Tara Oceans Expedition. The sample set includes ecosystems under-represented in previous studies, such as the Arabian Sea oxygen minimum zone (OMZ) and Indian Ocean lagoons. By combining computationally derived relative abundance and direct prokaryote cell counts, the abundance of NCLDVs was found to be in the order of 10⁴–10⁵ genomes ml⁻¹ for the samples from the photic zone and 10²–10³ genomes ml⁻¹ for the OMZ. The Megaviridae and Phycodnaviridae dominated the NCLDV populations in the metagenomes, although most of the reads classified in these families showed large divergence from known viral genomes. Our taxon co-occurrence analysis revealed a potential association between viruses of the Megaviridae family and eukaryotes related to oomycetes. In support of this predicted association, we identified six cases of lateral gene transfer between Megaviridae and oomycetes. Our results suggest that marine NCLDVs probably outnumber eukaryotic organisms in the photic layer (per given water mass) and that metagenomic sequence analyses promise to shed new light on the biodiversity of marine viruses and their interactions with potential hosts.
eukaryotic viruses; marine NCLDVs; taxon co-occurrence; oomycetes
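The abundance estimate in the abstract above combines a sequence-derived relative abundance with a direct prokaryote cell count. A minimal sketch of that arithmetic follows. It assumes the relative abundance is expressed as a ratio of NCLDV genomes to prokaryote genomes and that one prokaryote genome corresponds to one counted cell. The function name and all input numbers are hypothetical and are not values from the study.

```python
def ncldv_genomes_per_ml(ncldv_to_prokaryote_ratio, prokaryote_cells_per_ml):
    """Scale a metagenome-derived genome ratio by a direct cell count.

    Assumes one genome per counted prokaryote cell (a simplification).
    """
    if ncldv_to_prokaryote_ratio < 0 or prokaryote_cells_per_ml < 0:
        raise ValueError("inputs must be non-negative")
    return ncldv_to_prokaryote_ratio * prokaryote_cells_per_ml


# Hypothetical inputs: 2 NCLDV genomes per 100 prokaryote genomes,
# and 5e5 prokaryote cells per ml from direct counting.
estimate = ncldv_genomes_per_ml(0.02, 5e5)
print(f"{estimate:.1e} genomes per ml")
```

The estimate inherits the uncertainty of both inputs, so it is best read as an order of magnitude, which is how the abstract reports it.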
Mimivirus is a nucleocytoplasmic large DNA virus (NCLDV) with a genome size (1.2 Mb) and coding capacity (~1000 genes) comparable to that of some cellular organisms. Unlike other viruses, Mimivirus and its NCLDV relatives encode homologs of broadly conserved informational genes found in Bacteria, Archaea, and Eukaryotes, raising the possibility that they could be placed on the tree of life. A recent phylogenetic analysis of these genes showed the NCLDVs emerging as a monophyletic group branching between Eukaryotes and Archaea. These trees were interpreted as evidence for an independent “fourth domain” of life that may have contributed DNA processing genes to the ancestral eukaryote. However, the analysis of ancient evolutionary events is challenging, and tree reconstruction is susceptible to bias resulting from non-phylogenetic signals in the data. These include compositional heterogeneity and homoplasy, which can lead to the spurious grouping of compositionally-similar or fast-evolving sequences. Here, we show that these informational gene alignments contain both significant compositional heterogeneity and homoplasy, which were not adequately modelled in the original analysis. When we use more realistic evolutionary models that better fit the data, the resulting trees are unable to reject a simple null hypothesis in which these informational genes, like many other NCLDV genes, were acquired by horizontal transfer from eukaryotic hosts. Our results suggest that a fourth domain is not required to explain the available sequence data.
Heterocapsa circularisquama DNA virus (HcDNAV; previously designated as HcV) is a giant virus (girus) with a ~356-kbp double-stranded DNA (dsDNA) genome. HcDNAV lytically infects the bivalve-killing marine dinoflagellate H. circularisquama, and currently represents the sole DNA virus isolated from dinoflagellates, one of the most abundant protists in marine ecosystems. Its morphological features, genome type, and host range previously suggested that HcDNAV might be a member of the family Phycodnaviridae of Nucleo-Cytoplasmic Large DNA Viruses (NCLDVs), though no supporting sequence data were available. NCLDVs currently include two families found in aquatic environments (Phycodnaviridae, Mimiviridae), one mostly infecting terrestrial animals (Poxviridae), another isolated from fish, amphibians and insects (Iridoviridae), and the last one (Asfarviridae) exclusively represented by the animal pathogen African swine fever virus (ASFV), the agent of a fatal hemorrhagic disease in domestic swine. In this study, we determined the complete sequence of the type B DNA polymerase (PolB) gene of HcDNAV. The viral PolB was transcribed at least from 6 h post inoculation (hpi), suggesting its crucial function for viral replication. Most unexpectedly, the HcDNAV PolB sequence was found to be closely related to the PolB sequence of ASFV. In addition, the amino acid sequence of HcDNAV PolB showed a rare amino acid substitution within a highly conserved motif: YSDTDS was found in HcDNAV PolB instead of the YGDTDS found in most dsDNA viruses. Together with the previous observation of ASFV-like sequences in the Sorcerer II Global Ocean Sampling metagenomic datasets, our results further reinforce the idea that the terrestrial ASFV has its evolutionary origin in marine environments.
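The motif difference described above (YSDTDS in place of YGDTDS) is easy to check in a protein sequence with a regular expression. The sketch below is illustrative only: the function name `find_polb_motif` is hypothetical and the two test fragments are made-up strings, not real PolB sequences.

```python
import re

# YGDTDS is the common form; YSDTDS is the variant reported for HcDNAV PolB.
MOTIF = re.compile(r"Y[GS]DTDS")


def find_polb_motif(protein_seq):
    """Return (start, matched_text) for each motif hit; start is 0-based."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(protein_seq.upper())]


# Made-up fragments for illustration (not real PolB sequences).
toy_common = "MKLVYGDTDSIAAQ"
toy_variant = "MKLVYSDTDSIAAQ"
print(find_polb_motif(toy_common))
print(find_polb_motif(toy_variant))
```

Returning the matched text together with its position makes it clear which of the two forms was found, so the same call can flag the rare serine variant across a set of sequences.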
The discovery of Mimivirus, with its very large genome content, made it possible to identify genes common to the three domains of life (Eukarya, Bacteria and Archaea) and to generate controversial phylogenomic trees congruent with that of ribosomal genes, branching Mimivirus at its root. Here we used sequences from metagenomic databases, Marseillevirus and three new viruses extending the Mimiviridae family to generate the phylogenetic trees of eight proteins involved in different steps of DNA processing. Compared to the three ribosomal defined domains, we report a single common origin for Nucleocytoplasmic Large DNA Viruses (NCLDV), DNA processing genes rooted between Archaea and Eukarya, with a topology congruent with that of the ribosomal tree. As for translation, we found in our new viruses, together with Mimivirus, five proteins rooted deeply in the eukaryotic clade. In addition, comparison of informational genes repertoire based on phyletic pattern analysis supports existence of a clade containing NCLDVs clearly distinct from that of Eukarya, Bacteria and Archaea. We hypothesize that the core genome of NCLDV is as ancient as the three currently accepted domains of life.
Genomes of nucleocytoplasmic large DNA viruses (NCLDVs) encode enzymes that catalyze the formation of disulfide bonds between cysteine amino acid residues in proteins, a function essential for the proper assembly and propagation of NCLDV virions. Recently, a catalyst of disulfide formation was identified in baculoviruses, a group of large double-stranded DNA viruses considered phylogenetically distinct from NCLDVs. The NCLDV and baculovirus disulfide catalysts are flavin adenine dinucleotide (FAD)-binding sulfhydryl oxidases related to the cellular Erv enzyme family, but the baculovirus enzyme, the product of the Ac92 gene in Autographa californica multiple nucleopolyhedrovirus (AcMNPV), is highly divergent at the amino acid sequence level. The crystal structure of the Ac92 protein presented here shows a configuration of the active-site cysteine residues and bound cofactor similar to that observed in other Erv sulfhydryl oxidases. However, Ac92 has a complex quaternary structural arrangement not previously seen in cellular or viral enzymes of this family. This novel assembly comprises a dimer of pseudodimers with a striking 40-degree kink in the interface helix between subunits. The diversification of the Erv sulfhydryl oxidase enzymes in large double-stranded DNA viruses exemplifies the extreme degree to which these viruses can push the boundaries of protein family folds.
The family Phycodnaviridae encompasses a diverse and rapidly expanding collection of large icosahedral, dsDNA viruses that infect algae. These lytic and lysogenic viruses have genomes ranging from 160 to 560 kb. The family consists of six genera based initially on host range and supported by sequence comparisons. The family is monophyletic with branches for each genus, but the phycodnaviruses have evolutionary roots that connect them with several other families of large DNA viruses, referred to as the nucleocytoplasmic large DNA viruses (NCLDV).
Ectocarpus siliculosus virus-1 (EsV-1) is a lysogenic dsDNA virus that belongs to the superfamily of nucleocytoplasmic large DNA viruses (NCLDV) and infects Ectocarpus siliculosus, a marine filamentous brown alga. Previous studies indicated that the viral genome is integrated into the host DNA. In order to find the integration sites of the viral genome, a genomic library from EsV-1-infected algae was screened using labelled EsV-1 DNA. Several fragments were isolated and some of them were sequenced and analyzed in detail.
Analysis revealed that the algal genome is split by a copy of viral sequences that have a high identity to EsV-1 DNA sequences. These fragments are interspersed with DNA repeats, pseudogenes and genes coding for products involved in DNA replication, integration and transposition. Some of these gene products are not encoded by EsV-1 but are present in the genome of other members of the NCLDV family. Further analysis suggests that the Ectocarpus algal genome contains traces of the integration of a large dsDNA viral genome; this genome could be the ancestor of the extant NCLDV genomes. Furthermore, several lines of evidence indicate that the EsV-1 genome might have originated in these viral DNA pieces, implying the existence of a complex integration and recombination system. A protein similar to a new class of tyrosine recombinases might be a key enzyme of this system.
Our results support the hypothesis that some dsDNA viruses are monophyletic and evolved principally through genome reduction. Moreover, we hypothesize that phaeoviruses have probably developed an original replication system.
Mimivirus, a giant dsDNA virus infecting Acanthamoeba, is the prototype of the Mimiviridae family, the latest addition to the group of nucleocytoplasmic large DNA viruses (NCLDVs). Its 1.2 Mb genome was initially predicted to encode 917 genes. A subsequent RNA-Seq analysis precisely mapped many transcript boundaries and identified 75 new genes.
We now report a much deeper analysis using the SOLiD™ technology combining RNA-Seq of the Mimivirus transcriptome during the infectious cycle (202.4 million reads), and a complete genome re-sequencing (45.3 million reads). This study corrected the genome sequence and identified several single nucleotide polymorphisms. Our results also provided clear evidence of previously overlooked transcription units, including an important RNA polymerase subunit distantly related to Euryarchaea homologues. The total Mimivirus gene count is now 1018, 11% greater than the original annotation.
This study highlights the huge progress brought about by ultra-deep sequencing for the comprehensive annotation of virus genomes, opening the door to a complete one-nucleotide resolution level description of their transcriptional activity, and to the realistic modeling of the viral genome expression at the ultimate molecular level. This work also illustrates the need to go beyond bioinformatics-only approaches for the annotation of short protein and non-coding genes in viral genomes.
The mimivirus L544 gene product was expressed in E. coli and crystallized; preliminary phasing of a MAD data set was performed using the selenium signal present in a crystal of recombinant selenomethionine-substituted protein.
Mimivirus is the prototype of a new family (the Mimiviridae) of nucleocytoplasmic large DNA viruses (NCLDVs), which already include the Poxviridae, Iridoviridae, Phycodnaviridae and Asfarviridae. Mimivirus specifically replicates in cells from the genus Acanthamoeba. Proteomic analysis of purified mimivirus particles revealed the presence of many subunits of the DNA-directed RNA polymerase II complex. A fully functional pre-transcriptional complex appears to be loaded in the virions, allowing mimivirus to initiate transcription within the host cytoplasm immediately upon infection independently of the host nuclear apparatus. To fully understand this process, a systematic study of mimivirus proteins that are predicted (by bioinformatics) or suspected (by proteomic analysis) to be involved in transcription was initiated by cloning and expressing them in Escherichia coli in order to determine their three-dimensional structures. Here, preliminary crystallographic analysis of the recombinant L544 protein is reported. The crystals belonged to the orthorhombic space group C222₁ with one monomer per asymmetric unit. A MAD data set was used for preliminary phasing using the selenium signal present in a selenomethionine-substituted protein crystal.
nucleocytoplasmic large DNA viruses; transcription; DNA-directed RNA polymerases; mimivirus
In the last twenty years, numerous giant, dsDNA, icosahedral viruses have been discovered and assigned to the nucleocytoplasmic large dsDNA virus (NCLDV) clade. The major capsid proteins of these viruses consist of two consecutive jelly-roll domains, assembled into trimers, with pseudo 6-fold symmetry. The capsomers are assembled into arrays that have either p6 (as in Paramecium bursaria Chlorella virus-1) or p3 symmetry (as in Mimivirus). Most of the NCLDV viruses have a membrane that separates the nucleocapsid from the external capsid.
Double-jelly-roll capsid protein; Giant icosahedral DNA viruses; Unique vertices; Assembly
Acanthamoeba polyphaga Mimivirus is a giant double-stranded DNA virus defining a new family, the Mimiviridae, among the Nucleo-Cytoplasmic Large DNA Viruses (NCLDV). We used ultrastructural studies to shed light on the different steps of the Mimivirus replication cycle: entry via phagocytosis, release of viral DNA into the cell cytoplasm through fusion of viral and vacuolar membranes, and finally viral morphogenesis in an extraordinary giant cytoplasmic virus factory (VF). Fluorescent staining of the AT-rich Mimivirus DNA showed that it enters the host nucleus prior to the generation of a cytoplasmic independent replication centre that forms the core of the VF. Assembly and filling of viral capsids were observed within the replication centre, before release into the cell cytoplasm where progeny virions accumulated. 3D reconstruction from fluorescent and differential interference contrast images revealed the VF emerging from the cell surface as a volcano-like structure. Its size grew dramatically during the 24 h infectious lytic cycle. Our results showed that Mimivirus replication is an extremely efficient process that results from a rapid takeover of cellular machinery, and takes place in a unique and autonomous giant assembly centre, leading to the release of a large number of complex virions through amoebal lysis.
Nucleocytoplasmic large DNA viruses (NCLDVs) are characterized by large genomes that often encode proteins not commonly found in viruses. Two species in this group are Acanthocystis turfacea chlorella virus 1 (ATCV-1) (family Phycodnaviridae, genus Chlorovirus) and Acanthamoeba polyphaga mimivirus (family Mimiviridae), commonly known as mimivirus. ATCV-1 and other chlorovirus members encode enzymes involved in the synthesis and glycosylation of their structural proteins. In this study, we identified and characterized three enzymes responsible for the synthesis of the sugar l-rhamnose: two UDP-d-glucose 4,6-dehydratases (UGDs) encoded by ATCV-1 and mimivirus and a bifunctional UDP-4-keto-6-deoxy-d-glucose epimerase/reductase (UGER) from mimivirus. Phylogenetic analysis indicated that ATCV-1 probably acquired its UGD gene via a recent horizontal gene transfer (HGT) from a green algal host, while an earlier HGT event involving the complete pathway (UGD and UGER) probably occurred between a protozoan ancestor and mimivirus. While ATCV-1 lacks an epimerase/reductase gene, its Chlorella host may encode this enzyme. Both UGDs and UGER are expressed as late genes, which is consistent with their role in posttranslational modification of capsid proteins. The data in this study provide additional support for the hypothesis that chloroviruses, and maybe mimivirus, encode most, if not all, of the glycosylation machinery involved in the synthesis of specific glycan structures essential for virus replication and infection.
Evolution of exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded ‘introns first’ held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor of eukaryogenesis that in particular triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes. 
There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence but exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes. This article was reviewed by I. King Jordan, Manuel Irimia (nominated by Anthony Poole), Tobias Mourier (nominated by Anthony Poole), and Fyodor Kondrashov. For the complete reports, see the Reviewers’ Reports section.
Intron sliding; Intron gain; Intron loss; Spliceosome; Splicing signals; Evolution of exon/intron structure; Alternative splicing; Phylogenetic trees; Mobile domains; Eukaryotic ancestor
We report an in-depth computational study of the protein sequences and structures of the superfamily of archaeo-eukaryotic primases (AEPs). This analysis greatly expands the range of diversity of the AEPs and reveals the unique active site shared by all members of this superfamily. In particular, it is shown that eukaryotic nucleo-cytoplasmic large DNA viruses, including poxviruses, asfarviruses, iridoviruses, phycodnaviruses and the mimivirus, encode AEPs of a distinct family, which also includes the herpesvirus primases whose relationship to AEPs has not been recognized previously. Many eukaryotic genomes, including chordates and plants, encode previously uncharacterized homologs of these predicted viral primases, which might be involved in novel DNA repair pathways. At a deeper level of evolutionary connections, structural comparisons indicate that AEPs, the nucleases involved in the initiation of rolling circle replication in plasmids and viruses, and origin-binding domains of papilloma and polyoma viruses evolved from a common ancestral protein that might have been involved in a protein-priming mechanism of initiation of DNA replication. Contextual analysis of multidomain protein architectures and gene neighborhoods in prokaryotes and viruses reveals remarkable parallels between AEPs and the unrelated DnaG-type primases, in particular, tight associations with the same repertoire of helicases. These observations point to a functional equivalence of the two classes of primases, which seem to have repeatedly displaced each other in various extrachromosomal replicons.
Large DNA viruses are ubiquitous, infecting diverse organisms ranging from algae to man, and have probably evolved from an ancient common ancestor. In aquatic environments, such algal viruses control blooms and shape the evolution of biodiversity in phytoplankton, but little is known about their biological functions. We show that Ostreococcus tauri, the smallest known marine photosynthetic eukaryote, whose genome is completely characterized, is a host for large DNA viruses, and present an analysis of the life-cycle and 186,234 bp long linear genome of OtV5. OtV5 is a lytic phycodnavirus which unexpectedly does not degrade its host chromosomes before the host cell bursts. Analysis of its complete genome sequence confirmed that it lacks expected site-specific endonucleases, and revealed the presence of 16 genes whose predicted functions are novel to this group of viruses. OtV5 carries at least one predicted gene whose protein closely resembles its host counterpart and several other host-like sequences, suggesting that horizontal gene transfers between host and viral genomes may occur frequently on an evolutionary scale. Fifty-seven percent of the 268 predicted proteins present no similarities with any known protein in GenBank, underlining the wealth of undiscovered biological diversity present in oceanic viruses, which are estimated to harbour 200 Mt of carbon.
Eukaryotic genes with cyanobacterial ancestry in plastid-lacking protists have been regarded as important evolutionary markers implicating the presence of plastids in the early evolution of eukaryotes. Although recent genomic surveys demonstrated the presence of genes of cyanobacterial and algal ancestry in the genomes of plastid-lacking protists, comparative analyses of the origin and distribution of those genes are still limited.
We identified 12 gene families with cyanobacterial ancestry in the genomes of a taxonomically wide range of plastid-lacking eukaryotes (Phytophthora [Chromalveolata], Naegleria [Excavata], Dictyostelium [Amoebozoa], Saccharomyces and Monosiga [Opisthokonta]) using a novel phylogenetic pipeline. The eukaryotic gene clades with cyanobacterial ancestry were mostly composed of genes from bikonts (Archaeplastida, Chromalveolata, Rhizaria and Excavata). We failed to find genes with cyanobacterial ancestry in Saccharomyces and Dictyostelium, except for a photorespiratory enzyme conserved among fungi. Meanwhile, we found several Monosiga genes with cyanobacterial ancestry, which were unrelated to other Opisthokonta genes.
Our data demonstrate that a considerable number of genes with cyanobacterial ancestry have contributed to the genome composition of the plastid-lacking protists, especially bikonts. The origins of those genes might be due to lateral gene transfer events, or an ancient primary or secondary endosymbiosis before the diversification of bikonts. Our data also show that all genes identified in this study constitute multi-gene families with punctate distribution among eukaryotes, suggesting that the transferred genes could have survived through rounds of gene family expansion and differential reduction.
Viruses with genomes greater than 300 kb and up to 1200 kb are being discovered with increasing frequency. These large viruses (often called giruses) can encode up to 900 proteins and also many tRNAs. Consequently, these viruses have more protein-encoding genes than many bacteria, and the concept of small particle/small genome that once defined viruses is no longer valid. Giruses infect bacteria and animals, although most of the recently discovered ones infect protists. Thus, genome gigantism is not restricted to a specific host or phylogenetic clade. To date, most of the giruses are associated with aqueous environments. Many of these large viruses (phycodnaviruses and mimiviruses) probably have a common evolutionary ancestor with the poxviruses, iridoviruses, asfarviruses, ascoviruses, and the recently discovered Marseillevirus. One issue that is perhaps not appreciated by the microbiology community is that large viruses, even ones classified in the same family, can differ significantly in morphology, lifestyle, and genome structure. This review focuses on some of these differences rather than providing extensive details about individual viruses.
algal virus; phycodnavirus; Mimivirus; White spot shrimp virus; jumbo phage; NCLDVs
The major human intestinal pathogen Giardia lamblia is a very early branching eukaryote with a minimal genome of broad evolutionary and biological interest.
To explore early kinase evolution and regulation of Giardia biology, we cataloged the kinomes of three sequenced strains. Comparison with published kinomes and those of the excavates Trichomonas vaginalis and Leishmania major shows that Giardia's 80 core kinases constitute the smallest known core kinome of any eukaryote that can be grown in pure culture, reflecting both its early origin and secondary gene loss. Kinase losses in DNA repair, mitochondrial function, transcription, splicing, and stress response reflect this reduced genome, while the presence of other kinases helps define the kinome of the last common eukaryotic ancestor. Immunofluorescence analysis shows abundant phospho-staining in trophozoites, with phosphotyrosine abundant in the nuclei and phosphothreonine and phosphoserine in distinct cytoskeletal organelles. The Nek kinase family has been massively expanded, accounting for 198 of the 278 protein kinases in Giardia. Most Neks are catalytically inactive, have very divergent sequences and undergo extensive duplication and loss between strains. Many Neks are highly induced during development. We localized four catalytically active Neks to distinct parts of the cytoskeleton and one inactive Nek to the cytoplasm.
The reduced kinome of Giardia sheds new light on early kinase evolution, and its highly divergent sequences add to the definition of individual kinase families as well as offering specific drug targets. Giardia's massive Nek expansion may reflect its distinctive lifestyle, biphasic life cycle and complex cytoskeleton.
The eukaryotic replicative DNA polymerases are similar to those of large DNA viruses of eukaryotes and of bacterial T4 phages, but not to those of eubacteria. We develop and examine the hypothesis that DNA virus replication proteins gave rise to those of eukaryotes during evolution. We chose the DNA polymerase from phycodnavirus (which infects microalgae) as the basis of this analysis, as it represents a virus of a primitive eukaryote. We show that it has significant similarity with replicative DNA polymerases of eukaryotes and certain of their large DNA viruses. Sequence alignment confirms this similarity and establishes the presence of highly conserved domains in the polymerase amino terminus. Subsequent reconstruction of a phylogenetic tree indicates that these algal viral DNA polymerases are near the root of the clade containing all eukaryotic DNA polymerase delta members but that this clade does not contain the polymerases of other DNA viruses. We consider arguments for the polarity of this relationship and present the hypothesis that the replication genes of DNA viruses gave rise to those of eukaryotes, and not the reverse.