Divergence of the bacterial and eukaryotic lineages appears to represent the deepest split in the tree of life (22). Because the DNA replication proteins of these groups are of fundamental importance and interact through complex mechanisms, it seems likely that the genome replication system, like the translational system, would contain the most conserved coevolved genes among all related lineages.
Obvious functional homologues of replication genes are found in bacteria, eukaryotes, and archaea, including proteins involved in origin recognition, helicases, DNA-binding proteins, DNA synthesis, sliding clamp processivity factors (PCNA), ligation, and primer removal (see reference 7 and references therein). However, there are clear differences in sequence similarity that separate the replication proteins of bacteria from those of the archaea and eukaryotes (7). The bacterial replication genes thus appear evolutionarily unrelated to those of eukaryotes and archaea. For example, the replicative DNA polymerase (Pol) III of Escherichia coli belongs to the family C DNA Pol group and does not have similarity to either of the two mammalian replicative family B DNA Pols (alpha priming and delta extending; see reference 30). As such, phylogenetic analysis of these replicative DNA Pols results in polyphyletic groupings that are contrary to accepted species trees (6). The widespread existence of functionally identical yet nonorthologous genes presents a dilemma when they are used to connect the universal tree of life, and this has led some to propose that the cenancestor of bacteria, archaea, and eukaryotes had an RNA genome (7). However, it is now clear that between bacteria and eukaryotes, perhaps several hundred functional genes are homologous (e.g., DNA synthesis genes). This suggests that the putative prokaryotic-eukaryotic ancestor possessed many genes inherited by both lineages (for references, see reference 14). Proper replicative transmission of such a large number of essential genes seems unlikely given the small size of RNA genomes and the error-prone nature of their replication (14). It therefore appears more likely that the common ancestor had a DNA genome, which leaves unexplained how the replication systems underwent the transition during the divergence of bacteria from archaea and eukaryotes.
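The argument against an RNA-genome ancestor can be illustrated with a rough calculation. The sketch below assumes that copying errors occur independently at each base, so the probability of copying a genome of L bases without any error is (1 - u)^L for a per-base error rate u. The gene count, gene length, and both error rates are illustrative assumptions chosen only to show the scale of the effect; they are not values taken from the cited work.

```python
import math


def error_free_copy_probability(genome_length: int, error_rate_per_base: float) -> float:
    """Probability that one full replication introduces no errors.

    Assumes errors are independent at each base, so the result is
    (1 - error_rate_per_base) ** genome_length, computed via log1p
    for numerical accuracy when the error rate is very small.
    """
    return math.exp(genome_length * math.log1p(-error_rate_per_base))


# Illustrative assumptions (not values from the text):
GENES = 300            # stands in for "several hundred" shared genes
BASES_PER_GENE = 1000  # assumed average gene length
genome_length = GENES * BASES_PER_GENE

# Assumed per-base error rates for an error-prone RNA-like replicase
# and for a proofreading DNA-like replication system.
rna_like = error_free_copy_probability(genome_length, 1e-4)
dna_like = error_free_copy_probability(genome_length, 1e-9)

print(f"genome length: {genome_length} bases")
print(f"error-free copy, RNA-like rate: {rna_like:.3e}")
print(f"error-free copy, DNA-like rate: {dna_like:.6f}")
```

Under these assumed numbers, an error-free copy is vanishingly rare at the RNA-like rate and nearly certain at the DNA-like rate, which is the intuition behind the claim that several hundred essential genes are hard to maintain on an RNA genome.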
DNA viruses, however, also possess a full set of independent DNA replication and repair proteins that include members of family A and B DNA Pols (12). When it was first sequenced, phage T4 DNA Pol was noteworthy for its similarity to DNA Pols alpha and delta of eukaryotes and to the DNA Pols of Epstein-Barr virus, human cytomegalovirus, and other DNA viruses of eukaryotes, but not to those of adenoviruses or to E. coli Pol I or III (23). This similarity includes the conservation of five of six sequential domains (31), as well as resistance to various family B-specific inhibitors (3). Other phage DNA Pols, however, such as that of T7, show similarity to bacterial DNA Pol I but not to Pols of eukaryotes. With the sequencing of the entire T4 genome, it was additionally surprising to see that this strictly lytic bacteriophage had more genes similar to those of eukaryotes (including genes for self-splicing RNA [13]) than to those of bacteria (4). Viruses are usually thought to impose negative selection on their hosts. In addition, recombination between host and viral genomes is a commonly observed phenomenon, such as with retroviruses acquiring cellular protooncogenes (5). Yet viruses are rarely considered a source of host genes, and hence viral sequences are not taken into account when reconstructing the tree of life. However, a viral genome can evolve up to a million times faster than that of its host. If a DNA virus could impose a stable persistent (or genomic) infection on its host, it might then also provide genes altering host evolution, as we have previously reasoned (29). This raises the question: Could a DNA virus have been the origin of replicative eukaryotic DNA Pols?
In this report, we consider the hypothesis of a viral origin of eukaryotic replication proteins in the context of DNA viruses that infect host species that are likely representative of the earliest eukaryotes. We examine DNA Pols from two families of DNA viruses prevalent as acute infections of parasitic microalgae (Chlorella-like viruses) (27) and persistent infections of filamentous brown algae (Feldmannia species virus) (9). These algal species represent some of the earliest eukaryotes for which clear archaeological data exist (11). We perform sequence similarity and phylogenetic analyses, which indicate that these viral proteins appear related to the progenitor of all eukaryotic Pol delta sequences, and consider arguments that a DNA virus may have been the origin of the eukaryotic DNA replication system.
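As a minimal sketch of what a sequence similarity comparison measures, the example below computes percent identity between two sequences that have already been aligned, skipping any column that contains a gap. This is a simplified stand-in, not the analysis method used in this report, and the two fragments are invented toy strings rather than real polymerase sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over ungapped columns of a pairwise alignment.

    Both inputs must be pre-aligned (equal length, '-' marks a gap).
    Columns with a gap in either sequence are excluded from the count.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments, invented for illustration only.
toy_viral = "YGDTDSVF-L"
toy_host = "YGDTDSLFAL"

print(f"percent identity: {percent_identity(toy_viral, toy_host):.1f}%")
```

Percent identity is only a first-pass measure; phylogenetic inference additionally depends on the alignment method, the substitution model, and the tree-building procedure.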