The nucleo-cytoplasmic large DNA viruses (NCLDV) constitute an apparently monophyletic group that consists of 6 families of viruses infecting a broad variety of eukaryotes. A comprehensive genome comparison and maximum-likelihood reconstruction of NCLDV evolution reveal a set of approximately 50 conserved genes that can be tentatively mapped to the genome of the common ancestor of this class of eukaryotic viruses. We address the origins and evolution of NCLDV.
Phylogenetic analysis indicates that some of the major clades of NCLDV infect diverse animals and protists, suggestive of early radiation of the NCLDV, possibly concomitant with eukaryogenesis. The core NCLDV genes seem to have originated from different sources including homologous genes of bacteriophages, bacteria and eukaryotes. These observations are compatible with a scenario of the origin of the NCLDV at an early stage of the evolution of eukaryotes through extensive mixing of genes from widely different genomes.
The common ancestor of the NCLDV probably evolved from a bacteriophage as a result of recruitment of numerous eukaryotic and some bacterial genes, and concomitant loss of the majority of phage genes except for a small core of genes coding for proteins essential for virus genome replication and virion formation.
Viruses are ubiquitous parasites of all cellular life forms. As a group, they are united by their intracellular reproduction and reliance on the host cell translation system, but not necessarily by common origin. Indeed, not a single gene is represented in the genomes of all known viruses, although a small group of ‘viral hallmark genes’ encoding some of the key proteins involved in genome replication and virion structure formation is shared by extremely diverse subsets of viruses [2,3]. Thus, viruses as a class of biological agents are not monophyletic, at least not within the traditional concept of monophyly. Nevertheless, several large groups of viruses infecting diverse hosts do appear to share common ancestry in the strict sense (that is, to have evolved from a single ancestral virus), as indicated by the conservation of sets of genes encoding proteins responsible for many functions essential for virus reproduction.
One of the most expansive apparently monophyletic viral divisions currently includes six families of eukaryotic viruses with large DNA genomes that are collectively denoted nucleo-cytoplasmic large DNA viruses (NCLDV; table 1) [4,5]. The best known of these viral families, Poxviridae, is a large assemblage of animal viruses that includes a major human pathogen, the smallpox virus, important animal pathogens, such as rabbit myxoma virus, as well as vaccinia virus, one of the best characterized models of molecular biology [6,7,8]. Another family of the NCLDV that recently became the focus of much attention and fascination is the Mimiviridae, which so far includes two closely related giant viruses isolated from Acanthamoeba: Mimivirus and Mamavirus. With genomes slightly larger than 1 megabase, these viruses are the undisputed genome size record holders in the virosphere; their genomes exceed those of numerous parasitic bacteria and approach the genome size of the simplest free-living prokaryotes [9,10,11,12,13].
The NCLDV infect animals and diverse unicellular eukaryotes, and either replicate exclusively in the cytoplasm of the host cells, or possess both cytoplasmic and nuclear stages in their life cycle (table 1). The NCLDV typically do not strongly depend on the host replication or transcription systems for completing their replication [6,14]. In line with this relative independence of virus reproduction from the host cell functions (apart from translation, of course), the NCLDV encode several conserved proteins that mediate most of the processes essential for viral reproduction. These key proteins include DNA polymerases, helicases and primases responsible for DNA replication, Holliday junction resolvases and topoisomerases involved in genome DNA processing and maturation, transcription factors that function in transcription initiation and elongation, ATPase pumps mediating DNA packaging, chaperones involved in capsid assembly, and capsid proteins themselves [4,5,15]. Although several viral hallmark genes are shared by NCLDV and other large DNA viruses, such as herpesviruses and baculoviruses, the conservation of the entire set of core genes clearly demarcates the NCLDV as a distinct class of viruses.
Recently, a novel giant virus, denoted Marseillevirus, has been isolated from Acanthamoeba. Genome analysis of Marseillevirus indicated that it represents a putative new family of NCLDV that appears to be distantly related to iridoviruses and ascoviruses. In addition, comparative genomic analysis revealed probable gene exchange between Marseillevirus and Mimiviruses, an observation that suggests a role of amoeba as a ‘melting pot’ of giant virus evolution.
We performed a new comparison of the updated collection of NCLDV genomes and constructed clusters of orthologous NCLDV genes (NCVOGs), 177 of which were represented in two or more viral families. The NCVOGs were employed for phylogenetic analysis and for reconstruction of the ancestral viral gene set. Here we review the results of these analyses in the context of the origin and evolution of the NCLDV, and attempt to decipher the origins of the NCLDV genes that are mapped to the last common ancestral virus.
As in the major divisions of cellular life forms [17,18], very few genes are represented in all sequenced NCLDV genomes. The original comparative genomic analysis revealed 9 universal NCLDV genes, and the latest update that took into account the newly discovered viral families showed that 5 genes remained common to all known NCLDV (table 2). In order to derive a maximally robust phylogeny of the NCLDV, we analyzed phylogenetic trees of both the universal and the nearly universal genes. These trees had somewhat conflicting topologies; however, given the results of previous studies that pointed to an origin of the NCLDV from a single ancestral virus ([4,5] and see below), we assumed that the discrepancies between the tree topologies of the highly conserved NCLDV genes were caused by phylogenetic analysis artifacts rather than genuinely different evolutionary trajectories of these genes (although some exceptions are possible). Thus, to produce a ‘species tree’ of the NCLDV, we derived a consensus of the trees for individual conserved genes (fig. 1). The best supported consensus tree topology reveals three major divisions of the NCLDV: (1) the recently discovered Marseillevirus clustered with iridoviruses and ascoviruses, with the latter confidently placed inside the family Iridoviridae; (2) Mimiviruses clustered with phycodnaviruses, and (3) poxviruses clustered with asfarviruses.
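The idea of deriving a consensus from conflicting gene trees can be illustrated with a minimal Python sketch. It assumes a simple majority-rule criterion (a clade is kept if it appears in more than half of the gene trees); the text does not state which consensus method was actually used, and the three toy gene trees below are hypothetical, chosen only to show how a minority topology is outvoted.

```python
from collections import Counter

def majority_clades(trees, threshold=0.5):
    """Return {clade: support} for clades found in more than `threshold` of the trees.

    Each tree is represented as a set of clades, and each clade as a
    frozenset of taxon names (an assumed, simplified tree encoding).
    """
    counts = Counter(clade for tree in trees for clade in tree)
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}

# Hypothetical gene trees, each reduced to its set of two-family clades.
MI, PH, PO, AS, IR, MA = "Mimi", "Phycodna", "Pox", "Asfar", "Irido", "Marseille"
gene_trees = [
    {frozenset({MI, PH}), frozenset({PO, AS}), frozenset({IR, MA})},
    {frozenset({MI, PH}), frozenset({PO, AS}), frozenset({IR, MA})},
    {frozenset({MI, MA}), frozenset({PO, AS}), frozenset({IR, PH})},  # conflicting tree
]

consensus = majority_clades(gene_trees)
for clade, support in sorted(consensus.items(), key=lambda kv: -kv[1]):
    print(sorted(clade), round(support, 2))
```

In this toy case, the pox-asfar grouping is found in all three trees, whereas the two groupings supported only by the third tree fall below the majority threshold and are discarded.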
When an NCLDV tree was constructed using a completely different approach that was based on the comparison of the patterns of representation (phyletic patterns) of viruses in NCVOGs, the resulting tree topologies were generally compatible with the topology of the sequence-based consensus tree, indicating that evolution of the gene repertoire of the NCLDV largely mirrored the evolution of the conserved core genes [15,16]. There was one notable exception to this congruence, namely, the clustering of Marseillevirus with Mimivirus, which suggests extensive gene exchange between these viruses that reproduce in the same amoebal host.
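A comparison of phyletic patterns can be sketched in Python as follows. The sketch assumes a Jaccard distance between the sets of NCVOGs in which each virus is represented; the actual distance measure and tree-building method are not specified in the text, and the virus names and NCVOG identifiers are placeholders rather than real data.

```python
def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two sets of gene-cluster identifiers."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Hypothetical phyletic patterns: the NCVOGs in which each virus is represented.
patterns = {
    "virus_A": {"NCVOG_1", "NCVOG_2", "NCVOG_3", "NCVOG_4"},
    "virus_B": {"NCVOG_1", "NCVOG_2", "NCVOG_3", "NCVOG_5"},
    "virus_C": {"NCVOG_1", "NCVOG_6"},
}

def closest_pair(patterns):
    """Return (distance, name1, name2) for the two most similar gene repertoires."""
    names = sorted(patterns)
    pairs = [
        (jaccard_distance(patterns[x], patterns[y]), x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
    ]
    return min(pairs)

print(closest_pair(patterns))
```

A full analysis would feed the pairwise distances into a clustering or tree-building method; here only the closest pair is reported. Viruses that exchange many genes, as suggested for Marseillevirus and Mimivirus, would be drawn together by such a measure regardless of the phylogeny of their core genes.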
Although viruses of unicellular eukaryotes are still poorly characterized, the hosts of known NCLDV span much of the phylogenetic diversity of eukaryotes. The best current representation of eukaryotic phylogeny appears to be a multifurcation of five (or possibly four) major supergroups (fig. 1). The earliest events of eukaryotic radiation and, accordingly, the root of the tree remain murky [20,21,22]. Cross-mapping of the NCLDV and eukaryotic trees reveals a complex network structure where members of the same NCLDV branch often infect organisms that belong to different eukaryotic supergroups (fig. 1). For instance, the phycodna-Mimivirus clade of NCLDV spans three eukaryotic supergroups, and the pox-asfarvirus clade spans at least two supergroups (fig. 1). Beyond doubt, this assessment of the diversity of the NCLDV host range only scratches the surface, as indicated by the discovery of an extensive unexplored diversity of homologs of the key genes of the NCLDV in marine metagenomic sequences [23,24,25]. The discovery of numerous environmental sequences that are homologous to genes of Mimiviruses, iridoviruses, phycodnaviruses and asfarviruses suggests that not only the large divisions but even individual families of the NCLDV (with the possible exception of Poxviridae) infect highly diverse hosts including both animals and unicellular organisms from different supergroups [25,26].
The complex network connecting the different lineages of the NCLDV with the host lineages (fig. 1) clearly indicates that, on a large scale, viruses of this class did not coevolve with their hosts. Two possible scenarios of NCLDV evolution could account for the observed virus-host mapping:
Of course, mixed evolutionary scenarios are also readily imaginable. Given that horizontal virus transfer between taxonomically distant hosts remains a speculative possibility, and given the indications of the early origin of the NCLDV (see below), which was probably concomitant with eukaryogenesis, the ancient divergence scenario appears most plausible.
The species tree derived as a consensus phylogeny of the conserved NCLDV genes (fig. 1) was employed as the scaffold to reconstruct the core gene repertoires of ancestral viruses as well as gene loss and gain events during the evolution of the NCLDV. Original reconstructions of the NCLDV gene repertoire evolution were performed using a simple maximum parsimony approach. Recently, we used a more sophisticated maximum-likelihood methodology developed by Csuros and Miklos to map 47 genes to the common ancestor of the NCLDV (table 1) and reconstruct progressively growing gene repertoires for other ancestral viruses (fig. 2). These are highly conservative reconstructions because no approach will assign to ancestral forms genes that survived in only one of the progeny lineages, let alone those that were lost in all extant lineages. Nevertheless, the reconstructed gene repertoire seems to cover most, if not all, of the core functions characteristic of this class of viruses. This indicates that the common viral ancestor of all known NCLDV already possessed the relative autonomy from the host cell that is the distinguishing feature of this class of viruses. Such functions include the basal machineries for replication, transcription and transcript processing (such as the capping and decapping enzymes), enzymes required for DNA precursor synthesis (thymidine kinase and thymidylate kinase), the two major virion proteins, the central enzymes of virion morphogenesis (protease and disulfide oxidoreductase), and even some proteins implicated in virus-cell interaction such as a RING-finger ubiquitin ligase subunit (table 2).
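The conservative character of such reconstructions can be illustrated with a minimal Python sketch of parsimony on gene presence-absence data. It assumes Fitch parsimony on a strictly binary tree as a stand-in for the "simple maximum parsimony approach" (the exact variant used originally is not specified here, and the maximum-likelihood method is considerably more elaborate); the four-leaf tree and the leaf names A to D are hypothetical.

```python
def fitch_root_states(tree, presence):
    """Bottom-up Fitch pass for one gene coded as present (1) or absent (0).

    `tree` is a leaf name (str) or a 2-tuple of subtrees; `presence` maps
    leaf names to True/False. Returns the set of most parsimonious states
    at the root of `tree`.
    """
    if isinstance(tree, str):
        return {1} if presence.get(tree, False) else {0}
    left, right = tree
    a = fitch_root_states(left, presence)
    b = fitch_root_states(right, presence)
    # Fitch rule: keep the intersection if it is non-empty, otherwise the union.
    return (a & b) or (a | b)

# Hypothetical species tree with four extant lineages.
species_tree = (("A", "B"), ("C", "D"))

only_one_lineage = fitch_root_states(species_tree, {"A": True})
both_branches = fitch_root_states(species_tree, {"A": True, "C": True})
widespread = fitch_root_states(species_tree, {"A": True, "B": True, "C": True})
print(only_one_lineage, both_branches, widespread)
```

A gene retained in a single lineage is reconstructed as absent from the root, a gene found once on each side of the root is ambiguous, and only a gene present in most lineages is confidently assigned to the ancestor. This is the sense in which the reconstructed ancestral gene set is a lower bound.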
Some of the core functions are prone to non-orthologous gene displacement among the NCLDV, sometimes showing complex patterns of evolution. A case in point is the DNA ligase that is an essential activity for DNA replication. The reconstruction of the ancestral NCLDV gene repertoire tentatively defines the ATP-dependent ligase as an ancestral gene; however, Mimiviruses, entomopoxviruses and some iridoviruses lack the ATP-dependent ligase and instead encode an NAD-dependent ligase that is characteristic of bacteria and also found in some bacteriophages. In addition, a considerable number of NCLDV from different families, including some poxviruses and the majority of iridoviruses, encode no DNA ligase at all. Phylogenetic analysis of ATP-dependent and NAD-dependent ligases yielded unexpected results: the NAD-dependent ligases of the NCLDV, although quantitatively less prevalent than the ATP-dependent ligases, turned out to be monophyletic, whereas the ATP-dependent ligases showed diverse phylogenetic affinities, with monophyly confidently rejected. The most likely interpretation of these findings seems to be that the ancestral NCLDV encoded an NAD-dependent ligase, probably of bacteriophage origin, but this ancestral gene was repeatedly and independently lost and replaced with the gene for an ATP-dependent ligase in several viral lineages (table 2). This case study reveals the remarkable complexity of the NCLDV evolution that is augmented by the possibility of complementation of some of the viral functions by cellular analogs, as recently demonstrated experimentally for the poxvirus DNA ligase, and is only partially captured by reconstructions based on patterns of gene presence-absence.
Given the inherently conservative character of the reconstruction and the complications caused by non-orthologous gene displacement, the actual genome size and complexity of the ancestral NCLDV is a wide-open question. The 47 genes mapped to the ancestral genome in the present reconstruction comprise only the core of the most highly conserved, essential viral genes involved in key functions. The ancestral NCLDV undoubtedly reproduced in unicellular eukaryotes, and this type of host supports the propagation of extant giant viruses, such as the Mimiviruses [13,31], that actively absorb genes from the eukaryotic hosts as well as bacterial endosymbionts [32,33]. Thus, it cannot be ruled out that the common ancestor of all extant NCLDV was a highly complex, possibly even a giant virus.
All the complications notwithstanding, the reconstruction of the gene composition of the common ancestor of the NCLDV is a relatively straightforward task. In contrast, the origin of this ancestral virus remains enigmatic. We examined homologs and phyletic patterns of the inferred set of ancestral genes of the NCLDV in an attempt to decipher their likely origins (table 2). Definitive inference of gene origins requires a comprehensive phylogenetic analysis that is beyond the scope of the present review. However, in many cases, even examination of the taxonomic composition of the most similar homologs of a gene allows one to determine its most likely origin, especially when all or nearly all homologs belong to the same taxon (see, for instance, [34,35]). We therefore compared representative sequences of the 47 putative ancestral proteins from all NCLDV families to the non-redundant protein sequence database at the NCBI using the BLASTP program, with multiple PSI-BLAST iterations where required [37,38], and manually examined the results using the Taxonomy Report feature of the NCBI BLAST server, in an attempt to infer the likely origin of each gene. For most of the putative ancestral genes, the taxonomic distribution of the highly conserved homologs turned out to be obviously skewed, allowing confident inference of the most likely origin that, in several cases, was also supported by previous detailed analyses (table 2).
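The logic of inferring a likely origin from a skewed taxonomic distribution of homologs can be sketched in Python. The examination described above was manual; the sketch below merely mimics it with an assumed majority rule, where the function name, the 0.8 cutoff and the hit lists are all hypothetical and the input is taken to be the taxon labels of the best non-NCLDV hits for one protein.

```python
from collections import Counter

def infer_origin(hit_taxa, min_fraction=0.8):
    """Return the dominant taxon among the hits, or 'undetermined'.

    `hit_taxa` lists the taxon label of each top-scoring homolog outside
    the NCLDV. `min_fraction` is an arbitrary, assumed cutoff for calling
    the distribution 'obviously skewed'.
    """
    if not hit_taxa:
        return "undetermined"
    taxon, n = Counter(hit_taxa).most_common(1)[0]
    return taxon if n / len(hit_taxa) >= min_fraction else "undetermined"

# Hypothetical hit lists for two proteins.
skewed_hits = ["Eukaryota"] * 9 + ["Bacteria"]
mixed_hits = ["Eukaryota", "Bacteria", "Bacteriophage", "Bacteria", "Eukaryota"]

print(infer_origin(skewed_hits))
print(infer_origin(mixed_hits))
```

A tally of this kind is only a first approximation: as noted above, definitive inference of gene origins requires phylogenetic analysis, and a mixed distribution is best left undetermined rather than forced into a category.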
The majority of the ancestral genes of the NCLDV showed a clear eukaryotic affinity, but a substantial minority appeared to be of bacteriophage origin and a few genes of bacterial origin (table 2 and fig. 3). The genes of apparent bacteriophage origin encode some of the key proteins involved in viral replication, such as the DNA primase-helicase, NAD-dependent ligase and Holliday junction resolvase, and DNA packaging in the capsid, namely the packaging ATPase. The major capsid protein itself is most likely of the same origin (table 2). All these genes, with the possible exception of the DNA ligase, are viral hallmark genes that are shared by diverse viruses. Genes of inferred eukaryotic origin encode proteins involved in functions that are related to the cytoplasmic site of the NCLDV replication, such as the RNA polymerase subunits, and the specifics of eukaryotic molecular biology, such as the capping and decapping enzymes or ubiquitin ligase (table 2).
These observations are most compatible with a scenario for the origin of the NCLDV under which the ancestral virus of this class evolved from a bacteriophage by replacement of many (probably most) of the phage genes, primarily by genes acquired from the eukaryotic hosts. Only a small core of phage genes encoding virus-specific functions for which no functional analog exists in cellular life forms survived in the NCLDV genomes. It is notable that even the principal enzyme of DNA replication, the DNA polymerase, was apparently replaced by the eukaryotic counterpart. Nevertheless, this scenario is compatible with the principle of the continuity of evolution in the virus world, Omnis virus e virus.
The recent expansion of virology into the study of viruses infecting unicellular eukaryotes resulted in the unexpected discovery of giant viruses that belong to three families: Mimiviridae, Phycodnaviridae, and the putative novel family represented by Marseillevirus. Phylogenetic analysis of the expanded class of NCLDV and cross-mapping of the phylogenetic trees of NCLDV and their eukaryotic hosts suggest an early origin and primary radiation of the NCLDV, possibly concomitant with eukaryogenesis. Phylogenomic reconstruction maps approximately 50 genes to the last common ancestor of the extant NCLDV. However, this is a conservative reconstruction. A distinct possibility is that the ancestral virus of this class was indistinguishable from its modern members in terms of genetic complexity. The core NCLDV genes seem to have originated from different sources, with the majority showing affinity with eukaryotic homologs but a substantial minority derived from bacteriophage genes. These observations are compatible with the principle of the evolutionary continuity of the viral world, whereby the common ancestor of the NCLDV evolved from a bacteriophage as a result of recruitment of numerous eukaryotic and some bacterial genes, and concomitant loss of the majority of the ancestral phage genes. Only a small core of genes that encode proteins essential for virus genome replication and virion formation, and that have no functional analogs in cellular life forms, survived in the NCLDV. Subsequent evolution of the NCLDV included lineage-specific recruitment of numerous additional genes from both the eukaryotic hosts and bacteria. Gene duplication was also prominent, especially in giant viruses, as was loss of ancestral genes, especially in animal viruses with smaller genomes, resulting in the extensive genomic diversity observed among the extant NCLDV.
The authors’ research is supported by the DHHS intramural program (NIH, National Library of Medicine).