J Virol. 2000 August; 74(15): 7079–7084.
PMCID: PMC112226

A Hypothesis for DNA Viruses as the Origin of Eukaryotic Replication Proteins


The eukaryotic replicative DNA polymerases are similar to those of large DNA viruses of eukaryotes and of bacterial T4 phages but not to those of eubacteria. We develop and examine the hypothesis that DNA virus replication proteins gave rise to those of eukaryotes during evolution. We chose the DNA polymerase from phycodnavirus (which infects microalgae) as the basis of this analysis, as it represents a virus of a primitive eukaryote. We show that it has significant similarity with replicative DNA polymerases of eukaryotes and certain of their large DNA viruses. Sequence alignment confirms this similarity and establishes the presence of highly conserved domains in the polymerase amino terminus. Subsequent reconstruction of a phylogenetic tree indicates that these algal viral DNA polymerases are near the root of the clade containing all eukaryotic DNA polymerase delta members but that this clade does not contain the polymerases of other DNA viruses. We consider arguments for the polarity of this relationship and present the hypothesis that the replication genes of DNA viruses gave rise to those of eukaryotes rather than the reverse.

Divergence of the bacterial and eukaryotic lineages appears to represent the deepest split in the tree of life (22). Because the DNA replication proteins of these groups are of fundamental importance and interact through complex mechanisms, it seems likely that the genome replication system, like the translational system, would contain the most conserved coevolved genes among all related lineages.

Obvious functional homologues of replication genes are found in bacteria, eukaryotes, and archaea, including proteins involved in origin recognition, helicases, DNA-binding proteins, DNA synthesis, sliding clamp processivity factors (PCNA), ligation, and primer removal (see reference 7 and references therein). However, there are clear differences in sequence similarity that separate the replication proteins of bacteria from those of the archaea and eukaryotes (7). The bacterial replication genes thus appear evolutionarily unrelated to those of eukaryotes and archaea. For example, the replicative DNA polymerase (Pol) III of Escherichia coli belongs to the family C DNA Pol group and does not have similarity to either of the two mammalian replicative DNA family B DNA Pols (alpha priming and delta extending; see reference 30). As such, phylogenetic analysis of these replicative DNA Pols results in polyphyletic groupings that are contrary to accepted species trees (6). Such wide existence of functionally identical yet nonorthologous genes presents a dilemma when they are being used for connecting the universal tree of life, and this has led some to propose that the cenancestor of bacteria, archaea, and eukaryotes had an RNA genome (7, 17). However, it is now clear that between bacteria and eukaryotes, perhaps several hundred functional genes are homologous (e.g., DNA synthesis genes). This suggests that the putative prokaryotic-eukaryotic ancestor possessed many genes inherited by both lineages (for references, see references 14 and 8). Proper replicative transmission of such a large number of essential genes seems unlikely given the small size of RNA genomes and the error-prone nature of their replication (14). It therefore appears more likely that the common ancestor had a DNA genome, which leaves unexplained how the replication systems underwent the transition during the divergence of bacteria from archaea and eukaryotes.

DNA viruses, however, also possess a full set of independent DNA replication and repair proteins that include members of family A and B DNA Pols (12). When first sequenced, it was noteworthy how similar phage T4 DNA Pol was to DNA Pols alpha and delta of eukaryotes, Epstein-Barr virus, human cytomegalovirus, and other DNA viruses of eukaryotes, but not adenoviruses or E. coli Pol I or III (23). This similarity includes the conservation of five of six sequential domains (31), as well as resistance to various family B-specific inhibitors (3). Other phage DNA Pols, however, such as T7, show similarity to bacterial DNA Pol I but not to Pols of eukaryotes. With the sequencing of the entire T4 genome, it was additionally surprising to see that this strictly lytic bacteriophage had more genes similar to those of eukaryotes (including genes for self-splicing RNA [13]) than to bacterial genes (4). Viruses are usually thought to impose negative selection on their hosts. In addition, recombination between host and viral genomes is a commonly observed phenomenon, such as with retroviruses acquiring cellular protooncogenes (5, 28). Yet viruses are rarely considered a source of host genes, and hence viral sequences are not taken into account when reconstructing the tree of life. However, a viral genome can evolve up to a million times faster than that of its host. If a DNA virus could impose a stable persistent (or genomic) infection on its host, it might then also provide genes altering host evolution, as we have previously reasoned (29). This raises the question: Could a DNA virus have been the origin of replicative eukaryotic DNA Pols?

In this report, we consider the hypothesis for the viral origin of eukaryotic replication proteins in the context of DNA viruses that infect host species which are likely representative of the earliest eukaryotes. We examine DNA Pols from two families of DNA viruses prevalent as acute infections of parasitic microalgae (Chlorella-like viruses) (27) and persistent infections of filamentous brown algae (Feldmania species virus) (9, 15, 16, 21, 27). These algal species represent some of the earliest eukaryotes for which clear archaeological data exist (11). We perform sequence similarity and phylogenetic analyses which indicate that these viral proteins appear related to the progenitor of all eukaryotic Pol delta sequences and consider arguments that a DNA virus may have been the origin of the eukaryotic DNA replication system.


The open reading frames that code for the DNA Pol or Pol-like genes from Chlorella virus (NT2A; GenBank M86836; 913 amino acids [a.a.]) and Feldmania species virus (GenBank AF013260; 996 a.a.) were retrieved from GenBank. Using these sequences, a gapped Tblastn (version 2.0.4) analysis against the translated nonredundant database was performed. Essentially all of the replicative DNA family B Pols from eukaryotes showed similarity to both sequence probes. In addition, the DNA Pol sequences from most large DNA viruses of animals were also identified. Although the analysis suggests that all eukaryotic replicative DNA Pols (alpha and delta) are similar, the DNA Pol delta genes were most similar to these phycodnavirus-like genes. Interestingly, although Feldmania virus and Chlorella virus are both DNA viruses of algae, each of these DNA Pol sequences was more similar to a lower eukaryotic host DNA Pol gene (Schizosaccharomyces pombe, Candida albicans, Glycine max, or Saccharomyces cerevisiae) than to each other. In addition, the DNA Pols of several lytic phages (T4 and RB69) were identified. Also present were the DNA Pol II genes from various archaebacterial and bacterial (i.e., nonreplicative E. coli) species. Absent were the replicative DNA polymerases (Pol III) and Pol I from bacteria as well as the DNA Pols of other lytic phages (T7), adenoviruses, and related linear plasmids of fungi.
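As a rough illustration of the screening step described above, the following Python sketch keeps only database entries that match both viral probes. It is not the procedure used in this study, which ran gapped Tblastn version 2.0.4. It assumes the hits have been exported in the 12-column tab-separated layout that current BLAST+ writes with `-outfmt 6`, and the E-value cutoff, probe labels, and subject identifiers are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


def parse_tabular_hits(text: str) -> List[Hit]:
    """Parse 12-column tab-separated hits (query, subject, ..., evalue, bitscore)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        hits.append(Hit(cols[0], cols[1], float(cols[10]), float(cols[11])))
    return hits


def shared_subjects(hits: List[Hit], max_evalue: float = 1e-5) -> Dict[str, Dict[str, float]]:
    """Return subjects matched by every query at or below the E-value cutoff.

    The result maps subject -> {query: best bit score}. The default cutoff is
    an assumption chosen for illustration, not a value from the study.
    """
    queries = {h.query for h in hits}
    best: Dict[str, Dict[str, float]] = {}
    for h in hits:
        if h.evalue > max_evalue:
            continue
        per_query = best.setdefault(h.subject, {})
        per_query[h.query] = max(per_query.get(h.query, 0.0), h.bitscore)
    return {s: q for s, q in best.items() if set(q) == queries}


# Hypothetical rows: two probes against placeholder subjects.
DEMO = "\n".join([
    "chlorella_pol\tsubjA\t45.0\t800\t400\t10\t1\t800\t1\t2400\t1e-120\t420",
    "feldmania_pol\tsubjA\t40.0\t780\t420\t12\t1\t780\t1\t2340\t1e-90\t330",
    "chlorella_pol\tsubjB\t25.0\t200\t150\t5\t1\t200\t1\t600\t0.5\t30",
])
SHARED = shared_subjects(parse_tabular_hits(DEMO))
```

In this toy input only `subjA` is retained, because `subjB` is matched by a single probe and only weakly.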

Following the elimination of redundant and incomplete proteins, the remaining sequences were aligned using ClustalW to aid in identification of homologous regions. After this alignment, four regions (labeled I, II, III, and IV) of high conservation were easily identifiable between most of the taxa and are shown listed in color patterns corresponding to similar amino acids and in biologically related groups (Fig. 1). As had previously been established, the family B polymerase sequences contain up to six specific domains (23, 31). We compared our conserved domains to those previously identified and determined that our regions II, III, and IV corresponded roughly to the respective regions II, III, and IV which were identified in DNA Pol alpha by Wang et al. and that our region I had been previously identified as the phosphonoacetic acid-resistant domain of herpes simplex virus type 1 DNA Pol in the study of T4 DNA Pol by Spicer et al. (23). Because there is large variation in length among these DNA Pol genes, the sequences are shown as a roughly proportional line drawing in which the locations of the four highly conserved domains are indicated, and the sequences were centered to the most highly conserved region II domain (Fig. 2). The two smallest sequences correspond to fragments of Micromonas pusilla virus and Chrysochromulina species virus (phycodnavirus). The next largest was the full gene (313 a.a.) for the Pol alpha of Endotrypanum (Leishmania) monterogeni, then the Helicoverpa armigera nuclear polyhedrosis virus DNA Pol (623 a.a.), and all other genes were complete sequences. The largest gene (encoding 1,855 a.a.) was the DNA Pol alpha of Plasmodium falciparum. In general, domains I and II are adjacent to each other and occur at variable positions from the amino terminus, although some Archaea species Pol II genes have a region I domain well displaced toward the amino terminus. With the exception of Halteria species DNA Pol alpha (from a hypotrichous ciliate), the order of the domains was conserved, although DNA Pol alpha genes of hypotrichous species were often lacking domains II and IV. In addition, the DNA Pol II of several archaea (lineage A) had domains III and IV displaced well towards the carboxy terminus.
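The conserved regions above were identified by inspection of the ClustalW alignment. One way such blocks could be flagged programmatically is sketched below in Python, under stated assumptions: conservation is scored as the fraction of sequences sharing the most common residue in a column, and the threshold, minimum block length, and toy sequences are hypothetical rather than taken from the study.

```python
from collections import Counter
from typing import List, Tuple


def column_conservation(alignment: List[str]) -> List[float]:
    """Fraction of sequences sharing the most common residue in each column.

    Gap characters ('-') count toward the denominator but are never treated
    as the consensus residue.
    """
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    scores = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / n_seqs)
    return scores


def conserved_blocks(scores: List[float], threshold: float = 0.8,
                     min_len: int = 3) -> List[Tuple[int, int]]:
    """Half-open (start, end) column ranges where every score >= threshold."""
    blocks = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # open a new candidate block
        else:
            if start is not None and i - start >= min_len:
                blocks.append((start, i))
            start = None
    if start is not None and len(scores) - start >= min_len:
        blocks.append((start, len(scores)))
    return blocks


# Toy alignment (hypothetical residues, not data from the study).
TOY_ALIGNMENT = ["MKDTDSAC", "MKDTDSGC", "-ADTDSLW", "MRDTDSQ-"]
TOY_BLOCKS = conserved_blocks(column_conservation(TOY_ALIGNMENT))
```

A score based on identity is the simplest choice; grouping chemically similar amino acids, as the color patterns in Fig. 1 do, would need a substitution grouping that this sketch does not attempt.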

FIG. 1
Amino acid alignment of four highly conserved DNA Pol protein regions. Taxon names are color coded according to clade as in Fig. 3 and are labeled A0 to L5 according to the branch tips therein. Gaps inserted to improve the alignment are indicated ...
FIG. 2
Protein map indicating proportional lengths of DNA Pol (black lines) and relative locations of the four conserved Pol protein domains (labeled I to IV). Proteins are mostly “centered” so that region II is aligned.

These highly conserved regions were then used to aid in the alignment of the remaining regions as follows. First, using the sequence editor GeneDoc version 2.5 (18), each taxon was examined to determine which if any of the four domains were present in the protein sequence. Next, these regions were used as anchors from which to optimize the alignment of amino acids in the intervening sections. These interregion sequences were extracted and aligned using ClustalW. Following this procedure, the alignments were again optimized by eye, focusing mostly on the similarity within each of the major clades. Once an overall alignment was obtained, a phylogenetic tree was constructed using the more conserved amino terminus of the protein sequence that included region I and amino acids thereafter. Phylogenetic analysis was performed using the neighbor-joining algorithm with 500 bootstrap replications (20) as implemented by PAUP version 4.0b2 (25). Pairwise distances were calculated as mean observed substitutions per site. The unrooted tree is shown in Fig. 3 and is color coded to mark clear clades.
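To make the distance and tree-building steps concrete, here is a minimal Python sketch of an observed-difference distance (p-distance) followed by the standard neighbor-joining recursion (20). It is an independent illustration, not PAUP's implementation. Skipping any column with a gap in either sequence is an assumption, since the text does not say how gaps were treated, and the internal node names (`node0`, `node1`, ...) are invented labels.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Matrix = Dict[str, Dict[str, float]]


def p_distance(a: str, b: str) -> float:
    """Observed proportion of differing sites between two aligned sequences.

    Assumption: columns with a gap ('-') in either sequence are ignored.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs: Dict[str, str]) -> Matrix:
    """All pairwise p-distances for a dict of name -> aligned sequence."""
    return {a: {b: p_distance(seqs[a], seqs[b]) for b in seqs if b != a}
            for a in seqs}


def neighbor_joining(dist: Matrix) -> List[Tuple[str, str, float]]:
    """Build an unrooted tree; returns (node, neighbor, branch_length) edges."""
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    edges = []
    next_id = 0
    while len(d) > 2:
        n = len(d)
        # Net divergence of each node from all others.
        r = {a: sum(d[a][b] for b in d if b != a) for a in d}
        # Pick the pair minimizing Q(i, j) = (n - 2) * d_ij - r_i - r_j.
        i, j = min(combinations(sorted(d), 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"node{next_id}"
        next_id += 1
        len_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        edges.append((i, u, len_i))
        edges.append((j, u, len_j))
        # Distances from the new internal node to every remaining node.
        new_row = {k: 0.5 * (d[i][k] + d[j][k] - d[i][j])
                   for k in d if k not in (i, j)}
        del d[i], d[j]
        for row in d.values():
            row.pop(i, None)
            row.pop(j, None)
        for k, v in new_row.items():
            d[k][u] = v
        d[u] = dict(new_row)
    a, b = sorted(d)
    edges.append((a, b, d[a][b]))  # join the last two nodes
    return edges
```

For an additive distance matrix the recursion recovers the generating tree exactly; with real p-distances, which saturate at high divergence, the result is only an estimate, which is one reason bootstrap support is reported alongside it.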

FIG. 3
Unrooted neighbor-joining phylogeny based on amino-terminal portion of DNA Pol protein sequences as discussed in the text. Labels at branch tips represent taxa as presented in Fig. 1. Numbers at branch nodes indicate percent bootstrap support ...


The results suggest that the relationships are robust: 68% of the nodes had >90% bootstrap frequency support, and all nodes were >50%. The unrooted tree shows DNA Pol sequences falling into seven clades that correspond to biologically coherent gene sets. The two largest clades correspond to variants of DNA Pol alpha (pink) and DNA Pol delta, respectively. In the DNA Pol delta clade (black), the Feldmania species virus (which causes a prevalent persistent infection of filamentous brown algae) DNA Pol is near the base (labeled pol delta) and the Chlorella-like viral Pol genes are slightly more derived. Other Pol delta proteins appear to correspond roughly with accepted evolutionary relationships. The topology of the DNA Pol alpha group is more complex. Near its root, the trypanosomes and Leishmania species branch first, followed by insects and mammals, which, interestingly, are grouped separately from Saccharomyces and Schizosaccharomyces pombe. Also branching near the base of this clade are the macronuclear genes of various binucleated hypotrich species.
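The bootstrap figures quoted above rest on two simple operations: resampling alignment columns with replacement to build each replicate, and then summarizing the per-node support percentages. The Python sketch below illustrates both. The taxon labels, sequences, and support values are hypothetical placeholders, not values from Fig. 3, and the thresholds simply mirror the 90% and 50% levels mentioned in the text.

```python
import random
from typing import Dict, List


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement.

    Every sequence is rebuilt from the same sampled column indices, so the
    rows stay in register with one another.
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def summarize_support(support: List[float], strong: float = 90.0,
                      floor: float = 50.0) -> Dict[str, object]:
    """Fraction of nodes above `strong` and whether all nodes exceed `floor`."""
    if not support:
        raise ValueError("no support values given")
    return {
        "fraction_strong": sum(s > strong for s in support) / len(support),
        "all_above_floor": all(s > floor for s in support),
    }


# Hypothetical inputs for illustration only.
TOY = {"taxon1": "ACDEF", "taxon2": "ACDEG"}
REPLICATE = bootstrap_replicate(TOY, random.Random(0))
SUMMARY = summarize_support([100.0, 95.0, 91.0, 60.0, 55.0])
```

In a full analysis each replicate alignment would be passed through the same distance and tree-building steps, and the support for a node would be the percentage of replicate trees that contain the same grouping.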

There are three distinct clades of viral DNA Pols. Two of these correspond to the poxvirus family (light gray) and the baculoviruses of insects that includes the nucleopolyhedrosis virus family (green). Both of these groups branch from the most unresolved region at the center of the tree. The third clade corresponds to the animal herpesviruses (red). It is interesting that the herpesviruses appear to share an ancestor with the Feldmania DNA Pol, which corresponds to the base of the cellular DNA Pol delta clade. The herpesviruses are further branched into three monophyletic subgroups corresponding to the alphaherpes-, gammaherpes-, and cytomegaloviruses. The placement of the herpesvirus ancestor near the unresolved center of the tree suggests a very old origin of these genes.

The remaining two groups include the replicative DNA Pol II genes from various archaea (methanogens and Thermococcus, Pyrococcus, and Sulfolobus species), which were known to be similar to family B DNA Pols (19). DNA Pol II of archaea species appears to exist as two distinct lineages, both of which are thought to be involved in genome replication (7, 26). The larger of these groups appears to share an ancestor with the DNA Pol alpha genes (blue). The smaller clade (gold) corresponds to DNA Pols found in Sulfolobus and pyrodiococci archaea species. The archaeal DNA Pols on this smaller branch are closer but not directly connected to the Pol delta group. This cluster is rooted near the unresolved center of the tree. Also originating near the unresolved center are the Pols from lytic phages T4 and RB69 and from E. coli DNA Pol II (nonessential Pol).


With sequences obtained from a similarity search using DNA Pols from DNA viruses that infect microalgae and filamentous brown algae as a probe, we generated a phylogeny in which the base of the monophyletic group containing the replicative DNA Pol delta of eukaryotes resembles viral sequences. Although an earlier analysis of DNA Pol genes gave rise to similar patterns, the authors did not attempt to explain this result (6). Since it is unrooted, the phylogeny does not directly establish the polarity or direction of evolutionary change. It therefore remains formally possible that the phycodnaviruses acquired DNA Pol genes from their algal hosts and maintained similarity to them for unknown reasons. As the algal host DNA Pol genes have not been sequenced, we cannot place them on this tree. Even if they were subsequently to be placed phylogenetically near the phycodnavirus genes, this would still be unlikely to resolve the issue of evolutionary direction. However, we believe several considerations argue that the direction of transmission was from virus to host. First, only under this circumstance could the dilemma of dissimilar replication genes now present in bacteria and eukaryotes be resolved. In addition, all the other viral DNA Pols examined form distinct monophyletic groups (i.e., herpesviruses, poxviruses, and baculoviruses) that do not include host Pols. Therefore, these other viruses do not appear to have acquired their Pol genes from a host species. The DNA Pol delta clade is clearly monophyletic yet includes all the diverse phycodnavirus Pols of both microalgal and filamentous algal hosts. Thus, the phycodnaviruses are clearly evolutionarily exceptional DNA viruses. The simplest way to account for these observations is to propose that host Pol delta genes are derived from an early DNA viral gene that resembles that present in Feldmania virus.

Trees of life have been generated using different genes, yielding multiple evolutionary histories (8). Phylogenetic analysis of DNA Pol sequences presents patterns inconsistent with accepted organismal phylogenies. These phylogenetic disparities are difficult to explain if most genetic variation during evolution of species occurs by random genetic change and vertical gene transmission. Genomic analysis has suggested that horizontal transfer of gene sets may have been more prevalent than previously believed, especially in bacterial species. Horizontal transmission of DNA replication genes, however, would suggest the transfer of fundamental, complex, cellular components and the involvement of a DNA virus. We have argued that the persistence of a genetic parasite (a virus or its defective derivatives) is a life strategy that can allow the superimposition of complex molecular genetic control systems onto its host (29). As such, a persistent agent (like Feldmania virus) can potentially provide new systems of genetic control, including genome replication, to its host, particularly if it is integrated into the genome. We suggest at least in the case of DNA Pol delta an evolutionary link of the bacteria and eukaryota (and archaea) via the DNA Pol of an ancient DNA virus, not the replicative host genes. Our analysis also suggests that DNA Pol alpha may share an ancestor with DNA Pol II of archaea that diverged after the initial divergence of bacteria from eukaryotes and archaea. Two other DNA Pols resemble the family B replicative Pols of eukaryotes and archaea. One is the nonessential Pol II of E. coli, and the other is the Pol from lytic phages T4 and RB69. Both branch from the largely unresolved center of the tree. As the phages represent a much more transmissible system than E. coli Pol II, and as T-like phages infect both bacteria and archaea (Euryarchaeota kingdom [32]), it is easier to envision substitution of functional homologues for DNA replication genes if such a virus was involved. Other DNA replication genes may also fit this pattern, since it is known that DNA viruses also code for various ligases, helicases, and PCNA-like genes as well as “repair-like” DNA Pols, such as DNA Pol beta, found in entomopoxvirus (1).

Many of the crucial regulatory genes of DNA viruses, such as the T antigens of polyomaviruses, have no known host analogues, even though these viruses are phylogenetically congruent with their host species over long periods of time (29). Thus, at least these regulatory genes are viral, not host, creations. Viral genomes can evolve much faster than host genomes, and viral populations are known to exhibit much greater genetic variability, as demonstrated by the frequent occurrence of mutants and defectives. Thus, viral systems have an enhanced capacity to produce genetic novelty. Although some examples of virus-mediated horizontal gene transfer have recently been proposed (2), most of these proposals suggest that the host, not the virus, is the original source of the transferred gene. We now suggest that such infectious and/or persisting agents may be a general source for acquisition of complex molecular systems and phenotypes.


This research was supported by the Irvine Research Unit in Animal Virology.


1. Afonso C L, Tulman E R, Lu Z, Oma E, Kutish G F, Rock D L. The genome of Melanoplus sanguinipes entomopoxvirus. J Virol. 1999;73:533–552.
2. Baldo A M, McClure M A. Evolution and horizontal transfer of dUTPase-encoding genes in viruses and their hosts. J Virol. 1999;73:7710–7721.
3. Bernad A, Zaballos A, Salas M, Blanco L. Structural and functional relationships between prokaryotic and eukaryotic DNA polymerases. EMBO J. 1987;6:4219–4225.
4. Bernstein H, Bernstein C. Bacteriophage T4 genetic homologies with bacteria and eucaryotes. J Bacteriol. 1989;171:2265–2270.
5. Bishop J M. Cellular oncogenes and retroviruses. Annu Rev Biochem. 1983;52:301–354.
6. Braithwaite D K, Ito J. Compilation, alignment, and phylogenetic relationships of DNA polymerases. Nucleic Acids Res. 1993;21:787–802.
7. Edgell D R, Klenk H P, Doolittle W F. Gene duplications in evolution of archaeal family B DNA polymerases. J Bacteriol. 1997;179:2632–2640.
8. Forterre P. Displacement of cellular proteins by functional analogues from plasmids or viruses could explain puzzling phylogenies of many DNA informational proteins. Mol Microbiol. 1999;33:457–465.
9. Goldbach R, De Haan P. RNA viral supergroups and the evolution of RNA viruses. In: Morse S S, editor. The evolutionary biology of viruses. New York, N.Y: Raven Press, Ltd.; 1994. pp. 105–119.
10. Kapp M. Viruses infecting marine brown algae. Virus Genes. 1998;16:111–117.
11. Knoll A H. The early evolution of eukaryotes: a geological perspective. Science. 1992;256:622–627.
12. Knopf C W. Evolution of viral DNA-dependent DNA polymerases. Virus Genes. 1998;16:47–58.
13. Kutter E, Gachechiladze K, Poglazov A, Marusich E, Shneider M, Aronsson P, Napuli A, Porter D, Mesyanzhinov V. Evolution of T4-related phages. Virus Genes. 1995;11:285–297.
14. Lake J A, Jain R, Rivera M C. Mix and match in the tree of life. Science. 1999;283:2027–2028.
15. Mueller D G, Braeutigam M, Knippers R. Virus infection and persistence of foreign DNA in the marine brown alga Feldmannia simplex (Ectocarpales, Phaeophyceae). Phycologia. 1996;35:61–63.
16. Muller D G, Sengco M, Wolf S, Brautigam M, Schmid C E, Kapp M, Knippers R. Comparison of two DNA viruses infecting the marine brown algae Ectocarpus siliculosus and E. fasciculatus. J Gen Virol. 1996;77:2329–2333.
17. Mushegian A R, Koonin E V. A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci USA. 1996;93:10268–10273.
18. Nicholas K B, Nicholas Jr H B, Deerfield II D W. GeneDoc: analysis and visualization of genetic variation. EMBNEW.NEWS. 1997;4:14. (annotating multiple sequence alignments, version 2.5.)
19. Pisani F M, De Martino C, Rossi M. A DNA polymerase from the archaeon Sulfolobus solfataricus shows sequence similarity to family B DNA polymerases. Nucleic Acids Res. 1992;20:2711–2716.
20. Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4:406–425.
21. Sengco M R, Braeutigam M, Kapp M, Mueller D G. Detection of virus DNA in Ectocarpus siliculosus and E. fasciculatus (Phaeophyceae) from various geographic areas. Eur J Phycol. 1996;31:73–78.
22. Sogin M L, Gunderson J H, Elwood H J, Alonso R A, Peattie D A. Phylogenetic meaning of the kingdom concept: an unusual ribosomal RNA from Giardia lamblia. Science. 1989;243:75–77.
23. Spicer E K, Rush J, Fung C, Reha-Krantz L J, Karam J D, Konigsberg W H. Primary structure of T4 DNA polymerase. Evolutionary relatedness to eucaryotic and other procaryotic DNA polymerases. J Biol Chem. 1988;263:7478–7486.
24. Staskawicz B J, Ausubel F M, Baker B J, Ellis J G, Jones J D. Molecular genetics of plant disease resistance. Science. 1995;268:661–667.
25. Swofford D L. PAUP: a computer program for phylogenetic inference using maximum parsimony. J Gen Physiol. 1993;102:9A.
26. Uemori T, Ishino Y, Doi H, Kato I. The hyperthermophilic archaeon Pyrodictium occultum has two alpha-like DNA polymerases. J Bacteriol. 1995;177:2164–2177.
27. Van Etten J L. Algal viruses. In: Webster R G, Granoff A, editors. Encyclopedia of virology. Vol. 1. San Diego, Calif: Academic Press, Inc.; 1994. pp. 35–40.
28. Varmus H E. The molecular genetics of cellular oncogenes. Annu Rev Genet. 1984;18:553–612.
29. Villarreal L P. DNA virus contribution to host evolution. In: Domingo E, Webster R G, Holland J J, editors. Origin and evolution of viruses. San Diego, Calif: Academic Press; 1999. pp. 391–420.
30. Wang C C, Yeh L S, Karam J D. Modular organization of T4 DNA polymerase: evidence from phylogenetics. J Biol Chem. 1995;270:26558–26564.
31. Wang T S, Wong S W, Korn D. Human DNA polymerase alpha: predicted functional domains and relationships with viral DNA polymerases. FASEB J. 1989;3:14–21.
32. Zillig W, Prangishvilli D, Schleper C, Elferink M, Holz I, Albers S, Janekovic D, Gotz D. Viruses, plasmids and other genetic elements of thermophilic and hyperthermophilic Archaea. FEMS Microbiol Rev. 1996;18:225–236.

Articles from Journal of Virology are provided here courtesy of American Society for Microbiology (ASM)