Viruses are ubiquitous, obligate intracellular parasites of all cellular life forms that rely on the host cell translation system, metabolism and, in many cases, the replication and transcription systems for their reproduction. There is no evidence that all viruses have a monophyletic origin, at least not under the traditional concept of monophyly. Indeed, not a single gene is conserved in the genomes of all known viruses, although a small group of “viral hallmark genes” encoding some of the key proteins involved in genome replication and virion structure formation is shared by large, diverse subsets of viruses. However, several large groups of viruses infecting diverse hosts do appear to share common ancestry in the strict sense, that is, to have evolved from the same ancestral virus, as indicated by the conservation of sets of genes encoding proteins responsible for many functions essential for virus reproduction.
One of the largest viral divisions that seem to be monophyletic includes 6 recognized families and a 7th candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic Large DNA Viruses (NCLDV). The formally recognized NCLDV families are Poxviridae, Asfarviridae, Iridoviridae, Ascoviridae, Phycodnaviridae, and Mimiviridae; in addition, the recently discovered Marseillevirus and the related Lausannevirus could not be assigned to any of the 6 families and are likely to become founding members of a new family. Hereinafter we speak of 7 NCLDV families for the sake of simplicity.
By far the most thoroughly studied group of the NCLDV is the Poxviridae, a family of animal viruses that includes a major human pathogen, the smallpox virus; important animal pathogens, such as rabbit myxoma virus; and vaccinia virus (VACV), one of the best characterized model systems in molecular biology. Another family of the NCLDV that has recently become a major focus of attention is the Mimiviridae, which includes giant viruses infecting amoebae and probably algae. The genome of the prototype virus of this family, Acanthamoeba polyphaga mimivirus, slightly exceeds one megabase (Mb), and other related viruses possess even larger genomes, so the Mimiviridae are the undisputed genome size record holders in the virosphere. Indeed, in terms of genome size and complexity, the NCLDV eclipse numerous parasitic bacteria and approach the simplest free-living prokaryotes.
The NCLDV infect animals and diverse unicellular eukaryotes, and either replicate exclusively in the cytoplasm of the host cells or encompass both cytoplasmic and nuclear stages in their life cycle. Most of the NCLDV do not strongly depend on the host replication or transcription systems for completing their replication. The autonomous lifestyle of the NCLDV is supported by a set of conserved proteins that are encoded in the viral genomes and mediate most of the processes essential for viral reproduction. These essential, conserved proteins include DNA polymerases, helicases, and primases responsible for DNA replication; RNA polymerase subunits and transcription factors that function in transcription initiation and elongation; Holliday junction resolvases and topoisomerases involved in genome DNA processing and maturation; ATPase pumps mediating DNA packaging; molecular chaperones involved in capsid assembly; and the capsid proteins themselves. Although several viral hallmark genes are shared by the NCLDV and other large DNA viruses, such as herpesviruses, baculoviruses and some bacteriophages, the conservation of the large set of core genes clearly demarcates the NCLDV as a distinct, most likely monophyletic class of viruses. More specifically, reconstructions of the ancestral NCLDV genome composition using maximum parsimony and maximum likelihood methods have delineated a set of approximately 50 genes that are inferred to have been responsible for the key functions in the reproduction of the last common ancestor of the NCLDV.
The core, presumably ancestral set of the NCLDV genes was delineated using sequence similarity-based methods that had previously been employed to identify clusters of orthologous genes in diverse cellular life forms. The comparative genomic analysis underlying these reconstructions was deliberately limited to the NCLDV genomes to simplify the analysis and to facilitate detection of distant relationships between viral proteins. Indeed, some of the core NCLDV proteins, such as the packaging ATPases and the disulfide chaperones, show only weak sequence similarity between the viral families. Moreover, for some of the NCLDV genes with important functions in virus reproduction, indications of complex evolutionary histories have been obtained. The showcase for such evolutionary complexity is the viral DNA ligase, which is represented by two distantly related forms across the NCLDV families. A maximum likelihood reconstruction based on the presence or absence of conserved genes in the viral genomes implied that one of the two forms, the ATP-dependent DNA ligase, was the ancestral form that was present in the genome of the prototype NCLDV but was replaced by the distantly related NAD-dependent ligase in several viral lineages. However, when the reconstruction was supplemented by phylogenetic analysis of the two forms of DNA ligase, the opposite conclusion was reached, namely, that the NAD-dependent ligase was the ancestral form in the NCLDV and was displaced by the ATP-dependent ligase on several independent occasions. This change in perspective occurred because phylogenetic analysis indicated that the ATP-dependent ligases from different lineages of the NCLDV clustered with distinct groups of eukaryotic homologs, whereas the NAD-dependent ligases of the NCLDV appeared to be monophyletic. In the same vein, complex phylogenies suggestive of multiple horizontal gene transfers have been observed for several core NCLDV genes, such as thymidine kinase and the two subunits of ribonucleotide reductase.
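The logic of inferring an ancestral state from the distribution of a character across lineages can be illustrated with a minimal sketch. The reconstruction discussed above used maximum likelihood; the sketch below substitutes the simpler Fitch parsimony algorithm purely for illustration. The tree topology, the lineage names (A to E) and the ligase states assigned to them are hypothetical and are not taken from the analysis described here.

```python
from typing import Union

# A tree is either a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, tuple]


def fitch(tree: Tree, leaf_states: dict) -> tuple:
    """Fitch parsimony, bottom-up pass.

    Returns (set of candidate states at this node,
             minimum number of state changes in the subtree).
    """
    if isinstance(tree, str):
        # Leaf: the observed state is the only candidate.
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical topology and hypothetical ligase states (illustration only).
toy_tree = ((("A", "B"), "C"), ("D", "E"))
toy_states = {"A": "ATP", "B": "ATP", "C": "NAD", "D": "ATP", "E": "ATP"}

root_states, changes = fitch(toy_tree, toy_states)
print(root_states, changes)
```

In this toy case, the ATP-dependent form is inferred at the root with a single replacement by the NAD-dependent form in lineage C. Such a reconstruction treats all ATP-dependent ligases as one character state; if, as phylogenetic analysis indicated for the NCLDV, the ATP-dependent ligases of different lineages derive from distinct cellular homologs, they no longer represent a single inherited state and the inference can be reversed.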
Taken together, these findings imply that some of the apparently conserved genes of the NCLDV might actually have complex histories, which could include independent (convergent) acquisition of these genes from different cellular organisms as well as displacement of ancestral viral genes by homologs of cellular provenance. This line of reasoning prompted us to perform a comprehensive phylogenetic analysis of the set of putative ancestral NCLDV genes. Here we present the results of this analysis, which suggest that, although the existence of a common ancestor of the NCLDV is beyond reasonable doubt, most of the conserved NCLDV genes indeed had complex evolutionary histories.