Recent advances in genomics of viruses and cellular life forms have greatly stimulated interest in the origins and evolution of viruses and, for the first time, offer an opportunity for a data-driven exploration of the deepest roots of viruses. Here we briefly review the current views of virus evolution and propose a new, coherent scenario that appears to be best compatible with comparative-genomic data and is naturally linked to models of cellular evolution that, from independent considerations, seem to be the most parsimonious among the existing ones.
Several genes coding for key proteins involved in viral replication and morphogenesis as well as the major capsid protein of icosahedral virions are shared by many groups of RNA and DNA viruses but are missing in cellular life forms. On the basis of this key observation and the data on extensive genetic exchange between diverse viruses, we propose the concept of the ancient virus world. The virus world is construed as a distinct contingent of viral genes that continuously retained its identity throughout the entire history of life. Under this concept, the principal lineages of viruses and related selfish agents emerged from the primordial pool of primitive genetic elements, the ancestors of both cellular and viral genes. Thus, notwithstanding the numerous gene exchanges and acquisitions attributed to later stages of evolution, most, if not all, modern viruses and other selfish agents are inferred to descend from elements that belonged to the primordial genetic pool. In this pool, RNA viruses would evolve first, followed by retroid elements and then DNA viruses. The virus world concept is predicated on a model of early evolution whereby emergence of substantial genetic diversity antedates the advent of full-fledged cells, allowing for extensive gene mixing at this early stage of evolution. We outline a scenario of the origin of the main classes of viruses in conjunction with a specific model of precellular evolution under which the primordial gene pool dwelled in a network of inorganic compartments. Somewhat paradoxically, under this scenario, we surmise that selfish genetic elements ancestral to viruses evolved prior to typical cells, to become intracellular parasites once bacteria and archaea arrived on the scene. Selection against excessively aggressive parasites that would kill off the host ensembles of genetic elements would lead to early evolution of temperate virus-like agents and primitive defense mechanisms, possibly based on the RNA interference principle.
The emergence of the eukaryotic cell is construed as the second melting pot of virus evolution from which the major groups of eukaryotic viruses originated as a result of extensive recombination of genes from various bacteriophages, archaeal viruses, plasmids, and the evolving eukaryotic genomes. Again, this vision is predicated on a specific model of the emergence of eukaryotic cell under which archaeo-bacterial symbiosis was the starting point of eukaryogenesis, a scenario that appears to be best compatible with the data.
The existence of several genes that are central to virus replication and structure, are shared by a broad variety of viruses but are missing from cellular genomes (virus hallmark genes) suggests the model of an ancient virus world, a flow of virus-specific genes that went uninterrupted from the precellular stage of life's evolution to this day. This concept is tightly linked to two key conjectures on evolution of cells: existence of a complex, precellular, compartmentalized but extensively mixing and recombining pool of genes, and origin of the eukaryotic cell by archaeo-bacterial fusion. The virus world concept and these models of major transitions in the evolution of cells provide complementary pieces of an emerging coherent picture of life's history.
Reviewers: W. Ford Doolittle, J. Peter Gogarten, and Arcady Mushegian.
Bacterial genomes displaying a strong bias between the leading and the lagging strand of DNA replication encode two DNA polymerases III, DnaE and PolC, rather than a single one. Replication is a highly unsymmetrical process, and the presence of two polymerases is therefore not unexpected. Using comparative genomics, we explored whether other processes have evolved in parallel with each polymerase.
Extending previous in silico heuristics for the analysis of gene co-evolution, we analyzed the function of genes clustering with dnaE and polC. Clusters were highly informative. DnaE co-evolves with the ribosome, the transcription machinery, and the core enzymes of intermediary metabolism. It is also connected to polynucleotide phosphorylase, the energy-saving enzyme necessary for RNA degradation. Most of the proteins of this co-evolving set belong to the persistent set in bacterial proteomes, that is, they are fairly ubiquitously distributed. In contrast, PolC co-evolves with RNA degradation enzymes that are present only in the A+T-rich Firmicutes clade, suggesting at least two origins for the degradosome.
DNA replication involves two machineries, DnaE and PolC. DnaE co-evolves with the core functions of bacterial life. In contrast PolC co-evolves with a set of RNA degradation enzymes that does not derive from the degradosome identified in gamma-Proteobacteria. This suggests that at least two independent RNA degradation pathways existed in the progenote community at the end of the RNA genome world.
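The co-evolution analysis summarized above can be illustrated with a toy phylogenetic-profiling sketch. This is not the heuristic used in the study: it assumes that co-evolution is approximated by co-occurrence of genes across genomes and scores it with Jaccard similarity, a common but swapped-in measure. The presence/absence values, the six genomes, and the threshold are all hypothetical, and the gene labels serve only as placeholders.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) profiles across six made-up genomes.
# Values are illustrative only and are NOT data from the study.
profiles = {
    "dnaE": [1, 1, 1, 1, 1, 1],
    "rpoB": [1, 1, 1, 1, 1, 1],
    "pnp":  [1, 1, 1, 1, 1, 0],
    "polC": [1, 1, 0, 0, 0, 0],
    "rnjA": [1, 1, 0, 0, 0, 0],
}


def jaccard(a, b):
    """Jaccard similarity of two binary presence/absence vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def co_occurring(profiles, threshold=0.8):
    """Return gene pairs whose profiles are at least `threshold` similar."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        score = jaccard(profiles[g1], profiles[g2])
        if score >= threshold:
            pairs.append((g1, g2, round(score, 2)))
    return pairs


if __name__ == "__main__":
    for g1, g2, score in co_occurring(profiles):
        print(f"{g1}\t{g2}\t{score}")
```

With these toy profiles, the broadly distributed genes group together and the two narrowly distributed genes pair with each other, which mirrors the kind of clustering the abstract describes without reproducing its actual method.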
replication; degradosome; LUCA; phylogenetic profiling; nanoRNase
The elaborate eukaryotic DNA replication machinery evolved from the archaeal ancestors that themselves show considerable complexity. Here we discuss the comparative genomic and phylogenetic analysis of the core replication enzymes, the DNA polymerases, in archaea and their relationships with the eukaryotic polymerases. In archaea, there are three groups of family B DNA polymerases, historically known as PolB1, PolB2 and PolB3. All three groups appear to descend from the last common ancestors of the extant archaea but their subsequent evolutionary trajectories seem to have been widely different. Although PolB3 is present in all archaea, with the exception of Thaumarchaeota, and appears to be directly involved in lagging strand replication, the evolution of this gene does not follow the archaeal phylogeny, conceivably due to multiple horizontal transfers and/or dramatic differences in evolutionary rates. In contrast, PolB1 is missing in Euryarchaeota but otherwise seems to have evolved vertically. The third archaeal group of family B polymerases, PolB2, includes primarily proteins in which the catalytic centers of the polymerase and exonuclease domains are disrupted and accordingly the enzymes appear to be inactivated. The members of the PolB2 group are scattered across archaea and might be involved in repair or regulation of replication along with inactivated members of the RadA family ATPases and an additional, uncharacterized protein that are encoded within the same predicted operon. In addition to the family B polymerases, all archaea, with the exception of the Crenarchaeota, encode enzymes of a distinct family D the origin of which is unclear. We examine multiple considerations that appear compatible with the possibility that family D polymerases are highly derived homologs of family B. 
The eukaryotic DNA polymerases show a highly complex relationship with their archaeal ancestors including contributions of proteins and domains from both the family B and the family D archaeal polymerases.
DNA replication; archaea; mobile genetic elements; DNA polymerases; enzyme inactivation
Evolution of DNA polymerases, the key enzymes of DNA replication and repair, is central to any reconstruction of the history of cellular life. However, the details of the evolutionary relationships between DNA polymerases of archaea and eukaryotes remain unresolved.
We performed a comparative analysis of archaeal, eukaryotic, and bacterial B-family DNA polymerases, which are the main replicative polymerases in archaea and eukaryotes, combined with an analysis of domain architectures. Surprisingly, we found that eukaryotic Polymerase ε consists of two tandem exonuclease-polymerase modules, the active N-terminal module and a C-terminal module in which both enzymatic domains are inactivated. The two modules are only distantly related to each other, an observation that suggests the possibility that Pol ε evolved as a result of insertion and subsequent inactivation of a distinct polymerase, possibly of bacterial descent, upstream of the C-terminal Zn-fingers, rather than by tandem duplication. The presence of an inactivated exonuclease-polymerase module in Pol ε parallels a similar inactivation of both enzymatic domains in a distinct family of archaeal B-family polymerases. The results of phylogenetic analysis indicate that eukaryotic B-family polymerases most likely originate from two distantly related archaeal B-family polymerases, one form giving rise to Pol ε, and the other one to the common ancestor of Pol α, Pol δ, and Pol ζ. The C-terminal Zn-fingers that are present in all eukaryotic B-family polymerases are, unexpectedly, homologous to the Zn-finger of archaeal D-family DNA polymerases that are otherwise unrelated to the B family. The Zn-finger of Pol ε shows a markedly greater similarity to the counterpart in archaeal PolD than the Zn-fingers of other eukaryotic B-family polymerases.
Evolution of eukaryotic DNA polymerases seems to have involved previously unnoticed complex events. We hypothesize that the archaeal ancestor of eukaryotes encoded three DNA polymerases, namely, two distinct B-family polymerases and a D-family polymerase all of which contributed to the evolution of the eukaryotic replication machinery. The Zn-finger might have been acquired from PolD by the B-family form that gave rise to Pol ε prior to or in the course of eukaryogenesis, and subsequently, was captured by the ancestor of the other B-family eukaryotic polymerases. The inactivated polymerase-exonuclease module of Pol ε might have evolved by fusion with a distinct polymerase, rather than by duplication of the active module of Pol ε, and is likely to play an important role in the assembly of eukaryotic replication and repair complexes.
This article was reviewed by Patrick Forterre, Arcady Mushegian, and Chris Ponting. For the full reviews, please go to the Reviewers' Reports section.
Single-stranded (ss)DNA viruses are extremely widespread, infect diverse hosts from all three domains of life and include important pathogens. Most ssDNA viruses possess small genomes that replicate by the rolling-circle-like mechanism initiated by a distinct virus-encoded endonuclease. However, viruses of the family Bidnaviridae, instead of the endonuclease, encode a protein-primed type B DNA polymerase (PolB) and hence break this pattern. We investigated the provenance of all bidnavirus genes and uncovered an unexpectedly turbulent evolutionary history of these unique viruses. Our analysis strongly suggests that bidnaviruses evolved from a parvovirus ancestor from which they inherited a jelly-roll capsid protein and a superfamily 3 helicase. The radiation of bidnaviruses from parvoviruses was probably triggered by integration of the ancestral parvovirus genome into a large virus-derived DNA transposon of the Polinton (polintovirus) family, resulting in the acquisition of the polintovirus PolB gene along with terminal inverted repeats. Bidnavirus genes for a receptor-binding protein and a potential novel antiviral defense modulator are derived from dsRNA viruses (Reoviridae) and dsDNA viruses (Baculoviridae), respectively. The unusual evolutionary history of bidnaviruses emphasizes the key role of horizontal gene transfer, sometimes between viruses with completely different genomes but occupying the same niche, in the emergence of new viral types.
Complex viruses that encode their own initiation proteins and subvert the host’s elongation apparatus have provided valuable insights into DNA replication. Using purified bacteriophage SPP1 and Bacillus subtilis proteins, we have reconstituted a rolling circle replication system that recapitulates genetically defined protein requirements. Eleven proteins are required: phage-encoded helicase (G40P), helicase loader (G39P), origin binding protein (G38P) and G36P single-stranded DNA-binding protein (SSB); and host-encoded PolC and DnaE polymerases, processivity factor (β2), clamp loader (τ-δ-δ′) and primase (DnaG). This study revealed a new role for the SPP1 origin binding protein. In the presence of SSB, it is required for initiation on replication forks that lack origin sequences, mimicking the activity of the PriA replication restart protein in bacteria. The SPP1 replisome is supported by both host and viral SSBs, but phage SSB is unable to support B. subtilis replication, likely owing to its inability to stimulate the PolC holoenzyme in the B. subtilis context. Moreover, phage SSB inhibits host replication, defining a new mechanism by which bacterial replication could be regulated by a viral factor.
The mechanism of DNA replication is one of the driving forces of genome evolution. Bacterial DNA polymerase III, the primary complex of DNA replication, consists of PolC and DnaE. PolC is conserved in Gram-positive bacteria, especially in the Firmicutes with low GC content, whereas DnaE is widely conserved in most Gram-negative and Gram-positive bacteria. PolC contains two domains, the 3′-5′exonuclease domain and the polymerase domain, while DnaE only possesses the polymerase domain. Accordingly, DnaE does not have the proofreading function; in Escherichia coli, another enzyme DnaQ performs this function. In most bacteria, the fidelity of DNA replication is maintained by 3′-5′ exonuclease and a mismatch repair (MMR) system. However, we found that most Actinobacteria (a group of Gram-positive bacteria with high GC content) appear to have lost the MMR system and chromosomes may be replicated by DnaE-type DNA polymerase III with DnaQ-like 3′-5′ exonuclease. We tested the mutation bias of Bacillus subtilis, which belongs to the Firmicutes and found that the wild type strain is AT-biased while the mutS-deletant strain is remarkably GC-biased. If we presume that DnaE tends to make mistakes that increase GC content, these results can be explained by the mutS deletion (i.e., deletion of the MMR system). Thus, we propose that GC content is regulated by DNA polymerase and MMR system, and the absence of polC genes, which participate in the MMR system, may be the reason for the increase of GC content in Gram-positive bacteria such as Actinobacteria.
DNA polymerase III; GC content; mismatch repair; Gram-positive; Actinobacteria
dnaE, the gene encoding one of the two replication-specific DNA polymerases (Pols) of low-GC-content gram-positive bacteria (E. Dervyn et al., Science 294:1716-1719, 2001; R. Inoue et al., Mol. Genet. Genomics 266:564-571, 2001), was cloned from Bacillus subtilis, a model low-GC gram-positive organism. The gene was overexpressed in Escherichia coli. The purified recombinant product displayed inhibitor responses and physical, catalytic, and antigenic properties indistinguishable from those of the low-GC gram-positive-organism-specific enzyme previously named DNA Pol II after the polB-encoded DNA Pol II of E. coli. Whereas a polB-like gene is absent from low-GC gram-positive genomes and whereas the low-GC gram-positive DNA Pol II strongly conserves a dnaE-like, Pol III primary structure, it is proposed that it be renamed DNA polymerase III E (Pol III E) to accurately reflect its replicative function and its origin from dnaE. It is also proposed that DNA Pol III, the other replication-specific Pol of low-GC gram-positive organisms, be renamed DNA polymerase III C (Pol III C) to denote its origin from polC. By this revised nomenclature, the DNA Pols that are expressed constitutively in low-GC gram-positive bacteria would include DNA Pol I, the dispensable repair enzyme encoded by polA, and the two essential, replication-specific enzymes Pol III C and Pol III E, encoded, respectively, by polC and dnaE.
The Firmicutes often possess three conspicuous genome features: marked Purine Asymmetry (PAS) across two strands of replication, Strand-biased Gene Distribution (SGD) and presence of two isoforms of DNA polymerase III alpha subunit, PolC and DnaE. Despite considerable research efforts, it is not clear whether the co-existence of PAS, PolC and/or SGD is an essential and exclusive characteristic of the Firmicutes. The nature of correlations, if any, between these three features within and beyond the lineages of Firmicutes has also remained elusive. The present study has been designed to address these issues.
A large-scale analysis of diverse bacterial genomes indicates that PAS, PolC and SGD are neither essential nor exclusive features of the Firmicutes. PolC prevails in four bacterial phyla: Firmicutes, Fusobacteria, Tenericutes and Thermotogae, while PAS occurs only in subsets of Firmicutes, Fusobacteria and Tenericutes. There are five major compositional trends in Firmicutes: (I) an explicit PAS or G + A-dominance along the entire leading strand (II) only G-dominance in the leading strand, (III) alternate stretches of purine-rich and pyrimidine-rich sequences, (IV) G + T dominance along the leading strand, and (V) no identifiable patterns in base usage. Presence of strong SGD has been observed not only in genomes having PAS, but also in genomes with G-dominance along their leading strands – an observation that defies the notion of co-occurrence of PAS and SGD in Firmicutes. The PolC-containing non-Firmicutes organisms often have alternate stretches of R-dominant and Y-dominant sequences along their genomes and most of them show relatively weak, but significant SGD. Firmicutes having G + A-dominance or G-dominance along LeS usually show distinct base usage patterns in three codon sites of genes. Probable molecular mechanisms that might have incurred such usage patterns have been proposed.
Co-occurrence of PAS, strong SGD and PolC should not be regarded as a genome signature of the Firmicutes. Presence of PAS in a species may warrant PolC and strong SGD, but PolC and/or SGD do not necessarily imply PAS.
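To make the notion of purine asymmetry more concrete, the following minimal sketch computes a normalized purine excess, (G+A minus C+T) divided by the total, for one strand and for consecutive windows along it. The metric, the function names, and the window scheme are assumptions chosen for illustration; the study may have quantified PAS differently.

```python
def purine_excess(seq):
    """Return (purines - pyrimidines) / total for one strand.

    Ambiguous bases such as N are ignored. Values near +1 indicate a
    purine-rich strand, values near -1 a pyrimidine-rich strand.
    """
    seq = seq.upper()
    purines = seq.count("A") + seq.count("G")
    pyrimidines = seq.count("C") + seq.count("T")
    total = purines + pyrimidines
    return (purines - pyrimidines) / total if total else 0.0


def windowed_excess(seq, window):
    """Purine excess in consecutive, non-overlapping windows of a strand."""
    return [
        purine_excess(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


if __name__ == "__main__":
    # Toy sequence, not a real genome: a purine-rich half followed by a
    # pyrimidine-rich half, resembling alternating R- and Y-dominant stretches.
    print(windowed_excess("AAGGCCTT", 4))
```

Under these assumptions, a consistently positive profile along the leading strand would correspond to trend (I) above, whereas alternating signs would correspond to trend (III).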
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-430) contains supplementary material, which is available to authorized users.
Fusobacteria; Tenericutes; Thermotogae; G-dominance; Leading strand; Lagging strand; Mutational bias; Cytosine methylation; Codon sites; Base usage
It is proposed that the pre-cellular stage of biological evolution unraveled within networks of inorganic compartments that harbored a diverse mix of virus-like genetic elements. This stage of evolution might comprise the Last Universal Cellular Ancestor (LUCA) that more appropriately could be denoted Last Universal Cellular Ancestral State (LUCAS). This scenario for the origin of cellular life recapitulates the early ideas of J. B. S. Haldane sketched in his classic 1928 essay. However, unlike in Haldane’s day, there is now considerable support for this scenario from four major lines of comparative-genomic evidence: i) lack of homology between the core components of the DNA replication systems of the two primary lines of descent of cellular life forms, archaea and bacteria, ii) distinct membrane chemistries and lack of homology between the enzymes of lipid biosynthesis in archaea and bacteria, iii) spread of several viral hallmark genes, which encode proteins with key functions in viral replication and morphogenesis, among numerous and extremely diverse groups of viruses, in contrast to their absence in cellular life forms, iv) the apparent shaping of the extant archaeal and bacterial chromosomes by accretion of diverse, smaller replicons, which suggests a continuity between the hypothetical, primordial virus stage of life’s evolution and the dynamic prokaryotic world that has existed ever since. Under the viral model of pre-cellular evolution, the key components of cells including the replication apparatus, membranes, and molecular complexes involved in membrane transport and translocation originated as components of virus-like entities. The two surviving types of cellular life forms, archaea and bacteria, might have emerged from the LUCAS independently, along with, probably, numerous forms now extinct.
comparative genomics; evolution of cells; evolution of viruses; origin of membranes; viral hallmark genes
Information transfer systems in Archaea, including many components of the DNA replication machinery, are similar to those found in eukaryotes. Functional assignments of archaeal DNA replication genes have been primarily based upon sequence homology and biochemical studies of replisome components, but few genetic studies have been conducted thus far. We have developed a tractable genetic system for knockout analysis of genes in the model halophilic archaeon, Halobacterium sp. NRC-1, and used it to determine which DNA replication genes are essential.
Using a directed in-frame gene knockout method in Halobacterium sp. NRC-1, we examined nineteen genes predicted to be involved in DNA replication. Preliminary bioinformatic analysis of the large haloarchaeal Orc/Cdc6 family, related to eukaryotic Orc1 and Cdc6, showed five distinct clades of Orc/Cdc6 proteins conserved in all sequenced haloarchaea. Of ten orc/cdc6 genes in Halobacterium sp. NRC-1, only two were found to be essential, orc10, on the large chromosome, and orc2, on the minichromosome, pNRC200. Of the three replicative-type DNA polymerase genes, two were essential: the chromosomally encoded B family, polB1, and the chromosomally encoded euryarchaeal-specific D family, polD1/D2 (formerly called polA1/polA2 in the Halobacterium sp. NRC-1 genome sequence). The pNRC200-encoded B family polymerase, polB2, was non-essential. Accessory genes for DNA replication initiation and elongation factors, including the putative replicative helicase, mcm, the eukaryotic-type DNA primase, pri1/pri2, the DNA polymerase sliding clamp, pcn, and the flap endonuclease, rad2, were all essential. Targeted genes were classified as non-essential if knockouts were obtained and essential based on statistical analysis and/or by demonstrating the inability to isolate chromosomal knockouts except in the presence of a complementing plasmid copy of the gene.
The results showed that ten out of nineteen eukaryotic-type DNA replication genes are essential for Halobacterium sp. NRC-1, consistent with their requirement for DNA replication. The essential genes code for two of ten Orc/Cdc6 proteins, two out of three DNA polymerases, the MCM helicase, two DNA primase subunits, the DNA polymerase sliding clamp, and the flap endonuclease.
Inteins are "protein introns" that remove themselves from their host proteins through autocatalytic protein splicing. Since their discovery, inteins have been quickly identified in all domains of life, but only once to date in the genome of a eukaryote-infecting virus.
Here we report the identification and bioinformatics characterization of an intein in the DNA polymerase PolB gene of the amoeba-infecting Mimivirus, the largest known double-stranded DNA virus, the origin of which has been proposed to predate the emergence of eukaryotes. The Mimivirus intein exhibits canonical sequence motifs and clearly belongs to a subclass of archaeal inteins always found in the same location of PolB genes. On the other hand, the Mimivirus PolB is most similar to eukaryotic Pol δ sequences.
The intriguing association of an extremophilic archaeal-type intein with a mesophilic eukaryotic-like PolB in Mimivirus is consistent with the hypothesis that DNA viruses might have been the central reservoir of inteins throughout the course of evolution.
DNA replication is central to all extant cellular organisms. There are substantial functional similarities between the bacterial and the archaeal/eukaryotic replication machineries, including but not limited to defined origins, replication bidirectionality, RNA primers and leading and lagging strand synthesis. However, several core components of the bacterial replication machinery are unrelated or only distantly related to the functionally equivalent components of the archaeal/eukaryotic replication apparatus. This is in sharp contrast to the principal proteins involved in transcription and translation, which are highly conserved in all divisions of life. We performed detailed sequence comparisons of the proteins that fulfill indispensable functions in DNA replication and classified them into four main categories with respect to the conservation in bacteria and archaea/eukaryotes: (i) non-homologous, such as replicative polymerases and primases; (ii) containing homologous domains but apparently non-orthologous and conceivably independently recruited to function in replication, such as the principal replicative helicases or proofreading exonucleases; (iii) apparently orthologous but poorly conserved, such as the sliding clamp proteins or DNA ligases; (iv) orthologous and highly conserved, such as clamp-loader ATPases or 5′→3′ exonucleases (FLAP nucleases). The universal conservation of some components of the DNA replication machinery and enzymes for DNA precursor biosynthesis but not the principal DNA polymerases suggests that the last common ancestor (LCA) of all modern cellular life forms possessed DNA but did not replicate it the way extant cells do. We propose that the LCA had a genetic system that contained both RNA and DNA, with the latter being produced by reverse transcription. Consequently, the modern-type system for double-stranded DNA replication likely evolved independently in the bacterial and archaeal/eukaryotic lineages.
Viruses and/or virus-like selfish elements are associated with all cellular life forms and are the most abundant biological entities on Earth, with the number of virus particles in many environments exceeding the number of cells by one to two orders of magnitude. The genetic diversity of viruses is commensurately enormous and might substantially exceed the diversity of cellular organisms. Unlike cellular organisms with their uniform replication-expression scheme, viruses possess either RNA or DNA genomes and exploit all conceivable replication-expression strategies. Although viruses extensively exchange genes with their hosts, there exists a set of viral hallmark genes that are shared by extremely diverse groups of viruses to the exclusion of cellular life forms. Coevolution of viruses and host defense systems is a key aspect in the evolution of both viruses and cells, and viral genes are often recruited for cellular functions. Together with the fundamental inevitability of the emergence of genomic parasites in any evolving replicator system, these multiple lines of evidence reveal the central role of viruses in the entire evolution of life.
Production of concatemeric DNA is an essential step during HSV infection, as the packaging machinery must recognize longer-than-unit-length concatemers; however, the mechanism by which they are formed is poorly understood. Although it has been proposed that the viral genome circularizes and rolling circle replication leads to the formation of concatemers, several lines of evidence suggest that HSV DNA replication involves recombination-dependent replication reminiscent of bacteriophages λ and T4. Similar to λ, HSV-1 encodes a 5′-to-3′ exonuclease (UL12) and a single strand annealing protein [SSAP (ICP8)] that interact with each other and can perform strand exchange in vitro. By analogy with λ phage, HSV may utilize viral and/or cellular recombination proteins during DNA replication. At least four double strand break repair pathways are present in eukaryotic cells, and HSV-1 is known to manipulate several components of these pathways. Chromosomally integrated reporter assays were used to measure the repair of double strand breaks in HSV-infected cells. Single strand annealing (SSA) was increased in HSV-infected cells, while homologous recombination (HR), non-homologous end joining (NHEJ) and alternative non-homologous end joining (A-NHEJ) were decreased. The increase in SSA was abolished when cells were infected with a viral mutant lacking UL12. Moreover, expression of UL12 alone caused an increase in SSA, which was completely eliminated when a UL12 mutant lacking exonuclease activity was expressed. UL12-mediated stimulation of SSA was decreased in cells lacking the cellular SSAP, Rad52, and could be restored by coexpressing the viral SSAP, ICP8, indicating that an SSAP is also required. These results demonstrate that UL12 can specifically stimulate SSA and that either ICP8 or Rad52 can function as an SSAP. We suggest that SSA is the homology-mediated repair pathway utilized during HSV infection.
The repair of DNA damage is essential to maintain genomic stability. Cells have at least four distinct DNA repair pathways, and defects in any of them can lead to tumor formation and cancer progression. Herpes Simplex Virus-1 (HSV-1) manipulates components of the host DNA repair pathways. In this paper we show that DNA repair by the single strand annealing (SSA) pathway was increased during HSV infection and that other pathways were inhibited. We also show that a viral nuclease in conjunction with either a viral or cellular single strand annealing protein can stimulate the SSA pathway. We suggest that viral DNA synthesis occurs via an SSA-dependent mechanism that is reminiscent of that used by bacterial viruses such as λ. Interestingly, λ has evolved an SSA-mediated repair mechanism to exchange genetic information that has also been used to enhance gene targeting in bacteria. It is thus possible that HSV proteins could be similarly used as tools to stimulate gene targeting in human cells, leading to more effective strategies for gene therapy. Furthermore, the diversity of HSV reported in human populations, combined with the high rate of genetic exchange during infection, suggests that SSA may play a role in viral evolution and pathogenesis.
Three evolutionarily distinct families of replicative DNA polymerases, designated polymerase B (Pol B), Pol C, and Pol D, have been identified. Members of the Pol B family are present in all three domains of life, whereas Pol C exists only in Bacteria and Pol D exists only in Archaea. Pol B enzymes replicate eukaryotic chromosomal DNA, and as members of the Pol B family are present in all Archaea, it has been assumed that Pol B enzymes also replicate archaeal genomes. Here we report the construction of Thermococcus kodakarensis strains with mutations that delete or inactivate key functions of Pol B. T. kodakarensis strains lacking Pol B had no detectable loss in viability and no growth defects or changes in spontaneous mutation frequency but had increased sensitivity to UV irradiation. In contrast, we were unable to introduce mutations that inactivated either of the genes encoding the two subunits of Pol D. The results reported establish that Pol D is sufficient for viability and genome replication in T. kodakarensis and argue that Pol D rather than Pol B is likely the replicative DNA polymerase in this archaeon. The majority of Archaea contain Pol D, and, as discussed, if Pol D is the predominant replicative polymerase in Archaea, this profoundly impacts hypotheses for the origin(s), evolution, and distribution of the different DNA replication enzymes and systems now employed in the three domains of life.
Viruses strongly influence the ecology and evolution of their eukaryotic hosts in the marine environment, but little is known about their diversity and distribution. Prasinoviruses infect an abundant and widespread class of phytoplankton, the Mamiellophyceae, and thereby exert a specific and important role in microbial ecosystems. However, molecular tools to specifically identify this viral genus in environmental samples are still lacking. We developed two primer sets, designed for use with polymerase chain reactions and 454 pyrosequencing technologies, to target two conserved genes, encoding the DNA polymerase (PolB gene) and the major capsid protein (MCP gene). While only one copy of the PolB gene is present in Prasinovirus genomes, there are at least seven paralogs for MCP, the copy we named number 6 being shared with other eukaryotic alga-infecting viruses. Primer sets for PolB and MCP6 were thus designed and tested on 6 samples from the Tara Oceans project. The results suggest that the MCP6 amplicons show greater richness but that PolB gave a wider coverage of Prasinovirus diversity. As a consequence, we recommend use of the PolB primer set, which will certainly reveal exciting new insights about the diversity and distribution of prasinoviruses at the community scale.
Heterocapsa circularisquama DNA virus (HcDNAV; previously designated as HcV) is a giant virus (girus) with a ~356-kbp double-stranded DNA (dsDNA) genome. HcDNAV lytically infects the bivalve-killing marine dinoflagellate H. circularisquama, and currently represents the sole DNA virus isolated from dinoflagellates, one of the most abundant groups of protists in marine ecosystems. Its morphological features, genome type, and host range previously suggested that HcDNAV might be a member of the family Phycodnaviridae of Nucleo-Cytoplasmic Large DNA Viruses (NCLDVs), though no supporting sequence data were available. NCLDVs currently include two families found in aquatic environments (Phycodnaviridae, Mimiviridae), one mostly infecting terrestrial animals (Poxviridae), another isolated from fish, amphibians and insects (Iridoviridae), and the last one (Asfarviridae) exclusively represented by the animal pathogen African swine fever virus (ASFV), the agent of a fatal hemorrhagic disease in domestic swine. In this study, we determined the complete sequence of the type B DNA polymerase (PolB) gene of HcDNAV. The viral PolB was transcribed from at least 6 h post inoculation (hpi), suggesting a crucial function in viral replication. Most unexpectedly, the HcDNAV PolB sequence was found to be closely related to the PolB sequence of ASFV. In addition, the amino acid sequence of HcDNAV PolB showed a rare amino acid substitution within a highly conserved motif: YSDTDS was found in HcDNAV PolB instead of the YGDTDS found in most dsDNA viruses. Together with the previous observation of ASFV-like sequences in the Sorcerer II Global Ocean Sampling metagenomic datasets, our results further reinforce the idea that the terrestrial ASFV has its evolutionary origin in marine environments.
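The motif substitution reported above (YSDTDS in place of YGDTDS) can be checked on any protein sequence with a short pattern scan. The following is a minimal sketch, not the authors' method: the function name `find_polb_motif` and the toy sequences are hypothetical and were invented for illustration only.

```python
import re

# Canonical motif reported for most dsDNA viruses, and the HcDNAV variant.
CANONICAL = "YGDTDS"
VARIANT = "YSDTDS"


def find_polb_motif(protein_seq):
    """Return (0-based position, motif, label) for every Y[GS]DTDS match."""
    hits = []
    for match in re.finditer(r"Y[GS]DTDS", protein_seq.upper()):
        label = "canonical" if match.group() == CANONICAL else "variant"
        hits.append((match.start(), match.group(), label))
    return hits


# Hypothetical toy sequences; these are NOT real PolB sequences.
toy_typical = "MKLVYGDTDSIFAA"
toy_hcdnav_like = "MKLVYSDTDSIFAA"

for name, seq in [("typical", toy_typical), ("HcDNAV-like", toy_hcdnav_like)]:
    print(name, find_polb_motif(seq))
```

A scan like this only locates the motif; it says nothing about relatedness, which in the study rests on the PolB sequence comparison.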
The analysis of ∼2000 bacterial genomes revealed that they all, without a single exception, encode one or more DNA polymerase III α-subunit (PolIIIα) homologs. Classified into the C-family of DNA polymerases, they come in two major forms, PolC and DnaE, related by an ancient duplication. While PolC represents an evolutionarily compact group, DnaE can be further subdivided into at least three groups (DnaE1-3). We performed an extensive analysis of various sequence, structure and surface properties of all four polymerase groups. Our analysis suggests a specific evolutionary pathway leading to PolC and DnaE from the last common ancestor and reveals important differences between extant polymerase groups. Among them, DnaE1 and PolC show the highest conservation of the analyzed properties. DnaE3 polymerases apparently represent an ‘impaired’ version of DnaE1. Nonessential DnaE2 polymerases, typical for oxygen-using bacteria with large GC-rich genomes, have a number of features in common with DnaE3 polymerases. The analysis of polymerase distribution in genomes revealed three major combinations: DnaE1 either alone or accompanied by one or more DnaE2s, PolC + DnaE3 and PolC + DnaE1. The first two combinations are present in Escherichia coli and Bacillus subtilis, respectively. The third one (PolC + DnaE1), found in Clostridia, represents a novel, so far experimentally uncharacterized, set.
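The three major polymerase combinations listed above can be expressed as a small decision rule over per-genome copy counts. This is a sketch under stated assumptions: the function `classify_polymerase_set` and its input format are hypothetical, and any genome that does not match one of the three combinations exactly (including PolC genomes that also carry DnaE2, which the abstract does not address) is labelled "other".

```python
def classify_polymerase_set(counts):
    """Map copy counts of C-family groups in one genome to a combination label.

    `counts` is a dict such as {"DnaE1": 1, "DnaE2": 2}; missing keys mean 0.
    """
    polc = counts.get("PolC", 0) > 0
    e1 = counts.get("DnaE1", 0) > 0
    e2 = counts.get("DnaE2", 0) > 0
    e3 = counts.get("DnaE3", 0) > 0

    if e1 and not polc and not e3:
        # First combination: DnaE1 alone or with one or more DnaE2 copies.
        return "DnaE1 + DnaE2" if e2 else "DnaE1 alone"
    if polc and e3 and not e1 and not e2:
        return "PolC + DnaE3"
    if polc and e1 and not e3 and not e2:
        return "PolC + DnaE1"
    # Assumption: anything else is outside the three major combinations.
    return "other"


# Hypothetical example inputs.
print(classify_polymerase_set({"DnaE1": 1, "DnaE2": 2}))
print(classify_polymerase_set({"PolC": 1, "DnaE3": 1}))
```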
After DNA damage, Def1 triggers degradation of the catalytic subunit of the replicative DNA polymerase at stalled replication forks, allowing special polymerases to take over DNA synthesis.
DNA damages hinder the advance of replication forks because of the inability of the replicative polymerases to synthesize across most DNA lesions. Because stalled replication forks are prone to undergo DNA breakage and recombination that can lead to chromosomal rearrangements and cell death, cells possess different mechanisms to ensure the continuity of replication on damaged templates. Specialized, translesion synthesis (TLS) polymerases can take over synthesis at DNA damage sites. TLS polymerases synthesize DNA with a high error rate and are responsible for damage-induced mutagenesis, so their activity must be strictly regulated. However, the mechanism that allows their replacement of the replicative polymerase is unknown. Here, using protein complex purification and yeast genetic tools, we identify Def1 as a key factor for damage-induced mutagenesis in yeast. In in vivo experiments we demonstrate that upon DNA damage, Def1 promotes the ubiquitylation and subsequent proteasomal degradation of Pol3, the catalytic subunit of the replicative polymerase δ, whereas Pol31 and Pol32, the other two subunits of polymerase δ, are not affected. We also show that purified Pol31 and Pol32 can form a complex with the TLS polymerase Rev1. Our results imply that TLS polymerases carry out DNA lesion bypass only after the Def1-assisted removal of Pol3 from the stalled replication fork.
DNA damages can lead to the stalling of the cellular replication machinery if not repaired on time, inducing DNA strand breaks, recombination that can result in gross chromosomal rearrangements, even cell death. In order to guard against this outcome, cells have evolved several precautionary mechanisms. One of these involves the activity of special DNA polymerases—known as translesion synthesis (TLS) polymerases. In contrast to the replicative polymerases responsible for faithfully duplicating the genome, these can carry out DNA synthesis even on a damaged template. For that to occur, they have to take over synthesis from the replicative polymerase that is stalled at a DNA lesion. Although this mechanism allows DNA synthesis to proceed, TLS polymerases work with a high error rate even on undamaged DNA, leading to alterations of the original sequence that can result in cancer. Consequently, the exchange between replicative and special polymerases has to be highly regulated, and the details of this are largely unknown. Here we identified Def1—a protein involved in the degradation of RNA polymerase II—as a prerequisite for error-prone DNA synthesis in yeast. We showed that after treating the cells with a DNA damaging agent, Def1 promoted the degradation of the catalytic subunit of the replicative DNA polymerase δ, without affecting the other two subunits of the polymerase. Our data suggest that the special polymerases can take over synthesis only after the catalytic subunit of the replicative polymerase is removed from the stalled fork in a regulated manner. We predict that the other two subunits remain at the fork and participate in TLS together with the special polymerases.
Bioinformatics and functional screens identified a group of Family A-type DNA Polymerase (polA) genes encoded by viruses inhabiting circumneutral and alkaline hot springs in Yellowstone National Park and the US Great Basin. The proteins encoded by these viral polA genes (PolAs) shared no significant sequence similarity with any known viral proteins but were remarkably similar to PolAs encoded by two of three families of the bacterial phylum Aquificae and by several apicoplast-targeted PolA-like proteins found in the eukaryotic phylum Apicomplexa, which includes the obligate parasites Plasmodium, Babesia, and Toxoplasma. The viral gene products share signature elements previously associated only with Aquificae and Apicomplexa PolA-like proteins and were similar to proteins encoded by prophage elements of a variety of otherwise unrelated Bacteria, each of which additionally encoded a prototypical bacterial PolA. Unique among known viral DNA polymerases, the viral PolA proteins of this study share with the Apicomplexa proteins large amino-terminal domains with putative helicase/primase elements but low primary sequence similarity. The genomic context and distribution, phylogeny, and biochemistry of these PolA proteins suggest that thermophilic viruses transferred polA genes to the Apicomplexa, likely through secondary endosymbiosis of a virus-infected proto-apicoplast, and to the common ancestor of two of three Aquificae families, where they displaced the orthologous cellular polA gene. On the basis of biochemical activity, gene structure, and sequence similarity, we speculate that the xenologous viral-type polA genes may have functions associated with diversity-generating recombination in both Bacteria and Apicomplexa.
viral metagenomics; horizontal gene transfer; replication; DNA polymerase; Apicomplexa; Aquificae
Many temperature-resistant revertants of a polA1 polB polCts (HS432) strain are PolI+ (by either suppression of the polA1 amber allele or intragenic reversion) but remain polCts (contain a temperature-sensitive DNA polymerase III). It appears that DNA replication in such temperature-resistant revertants depends on an extragenic mutation, pcbA, already present in the parent strain and not linked to any of the DNA polymerase loci. This allele allows DNA replication dependent on DNA polymerase I and bypasses a temperature-sensitive DNA polymerase III (polC bypass), so that reversion to PolI+ makes the strain temperature resistant. This pathway of DNA replication also supports phage and plasmid DNA replication. At restrictive temperature, these mutants display a normal response to UV irradiation but show increased sensitivity to the alkylating agent methyl methanesulfonate. We have located pcbA linked to dnaA.
Accurate DNA replication is essential for maintenance of every genome. All archaeal genomes, except those of Crenarchaea, encode members of both the Family B (polB) and Family D (polD) DNA polymerases. Gene deletion studies in Thermococcus kodakaraensis and Methanococcus maripaludis show that polD is the only essential DNA polymerase in these organisms. Thus, polD may be the primary replicative DNA polymerase for both leading and lagging strand synthesis. To understand this unique archaeal enzyme, we report the biochemical characterization of a heterodimeric polD from Thermococcus. PolD contains both DNA polymerase and proofreading 3′–5′ exonuclease activities to ensure efficient and accurate genome duplication. The polD incorporation fidelity was determined for the first time. Despite containing 3′–5′ exonuclease proofreading activity, polD has a relatively high error rate (95 × 10⁻⁵) compared with polB (19 × 10⁻⁵), and one at least 10-fold higher than those of the polB DNA polymerases from yeast (polε and polδ) or the Escherichia coli DNA polIII holoenzyme. The implications of polD fidelity and biochemical properties in leading and lagging strand synthesis are discussed.
Electronic supplementary material
The online version of this article (doi:10.1007/s00792-014-0646-9) contains supplementary material, which is available to authorized users.
Analytical biochemistry; Archaea; DNA enzymes; DNA polymerase; DNA replication; Family D DNA polymerase; Thermococcus; Replisome; Fidelity
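The fidelity figures quoted in the abstract above (95 × 10⁻⁵ for polD and 19 × 10⁻⁵ for polB) can be compared with a few lines of exact arithmetic. This is only a worked illustration: the variable names are hypothetical, and the abstract does not state the units of the error rate, so the reciprocal is reported per "event" rather than per nucleotide.

```python
from fractions import Fraction

# Error rates as quoted in the abstract; exact fractions avoid float rounding.
POLD_ERROR = Fraction(95, 10**5)
POLB_ERROR = Fraction(19, 10**5)

# Fold difference between the two archaeal polymerases.
fold = POLD_ERROR / POLB_ERROR

# Reciprocal of each rate: roughly how many events occur per error.
events_per_error_pold = 1 / POLD_ERROR
events_per_error_polb = 1 / POLB_ERROR

print(f"polD/polB error-rate ratio: {float(fold):.1f}")
print(f"polD: about 1 error per {float(events_per_error_pold):.0f} events")
print(f"polB: about 1 error per {float(events_per_error_polb):.0f} events")
```

On these numbers polD is five times more error-prone than polB.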
The division of labor between template and catalyst is a fundamental property of
all living systems: DNA stores genetic information whereas proteins function as
catalysts. The RNA world hypothesis, however, posits that, at the earlier stages
of evolution, RNA acted as both template and catalyst. Why would such division
of labor evolve in the RNA world? We investigated the evolution of DNA-like
molecules, i.e. molecules that can function only as template, in minimal
computational models of RNA replicator systems. In the models, RNA can function
as both template-directed polymerase and template, whereas DNA can function only
as template. Two classes of models were explored. In the surface models,
replicators are attached to surfaces with finite diffusion. In the compartment
models, replicators are compartmentalized by vesicle-like boundaries. Both
models displayed the evolution of DNA and the ensuing division of labor between
templates and catalysts. In the surface model, DNA provides the advantage of
greater resistance against parasitic templates. However, this advantage is at
least partially offset by the disadvantage of slower multiplication due to the
increased complexity of the replication cycle. In the compartment model, DNA can
significantly delay the intra-compartment evolution of RNA towards catalytic
deterioration. These results are explained in terms of the trade-off between
template and catalyst that is inherent in RNA-only replication cycles: DNA
releases RNA from this trade-off by making it unnecessary for RNA to serve as
template and so rendering the system more resistant against evolving parasitism.
Our analysis of these simple models suggests that the lack of catalytic activity
in DNA by itself can generate a sufficient selective advantage for RNA
replicator systems to produce DNA. Given the widespread notion that DNA evolved
owing to its superior chemical properties as a template, this study offers a
novel insight into the evolutionary origin of DNA.
At the core of all biological systems lies the division of labor between the
storage of genetic information and its phenotypic implementation, in other
words, the functional differentiation between templates (DNA) and catalysts
(proteins). This fundamental property of life is believed to have been absent at
the earliest stages of evolution. The RNA world hypothesis, the most realistic
current scenario for the origin of life, posits that, in primordial replicating
systems, RNA functioned both as template and as catalyst. How would such
division of labor emerge through Darwinian evolution? We investigated the
evolution of DNA-like molecules in minimal computational models of RNA
replicator systems. Two models were considered: one where molecules are adsorbed
on surfaces and another one where molecules are compartmentalized by dividing
cellular boundaries. Both models exhibit the evolution of DNA and the ensuing
division of labor, revealing the simple governing principle of these processes:
DNA releases RNA from the trade-off between template and catalyst that is
inevitable in the RNA world and thereby enhances the system's resistance
against parasitic templates. Hence, this study offers a novel insight into the
evolutionary origin of the division of labor between templates and catalysts in
the RNA world.
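The trade-off between template and catalyst described above can be illustrated with a deliberately crude two-type replicator iteration. This is a toy sketch and not the surface or compartment model of the study: the function `catalyst_frequency`, the `cost` parameter and all numerical values are assumptions chosen for illustration. Catalytic RNA is assumed to be copied less often than a non-catalytic parasitic template by a factor (1 - cost); setting cost to zero is a rough stand-in for RNA being released from the trade-off, not a model of DNA itself.

```python
def catalyst_frequency(x0, cost, generations):
    """Iterate a two-type replicator map inside a single compartment.

    x0          -- initial frequency of catalytic RNA (0..1)
    cost        -- assumed replication penalty paid by catalysts as templates
    generations -- number of discrete replication rounds
    Returns the catalyst frequency after each round, starting with x0.
    """
    freqs = [x0]
    x = x0
    for _ in range(generations):
        w_cat = 1.0 - cost  # catalysts are poorer templates (trade-off)
        w_par = 1.0         # parasitic templates pay no such cost
        x = x * w_cat / (x * w_cat + (1.0 - x) * w_par)
        freqs.append(x)
    return freqs


# Hypothetical parameter values.
with_tradeoff = catalyst_frequency(x0=0.99, cost=0.10, generations=50)
without_tradeoff = catalyst_frequency(x0=0.99, cost=0.0, generations=50)

print(f"catalyst frequency with trade-off:    {with_tradeoff[-1]:.3f}")
print(f"catalyst frequency without trade-off: {without_tradeoff[-1]:.3f}")
```

With a nonzero cost, the catalyst frequency falls every round, which is the intra-compartment deterioration in its simplest form; without the cost it stays where it started.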
Many pathogens associated with chronic infections evolve so rapidly that strains found late in an infection have little in common with the initial strain. This raises questions at different levels of analysis because rapid within-host evolution affects the course of an infection, but it can also affect the possibility for natural selection to act at the between-host level. We present a nested approach that incorporates within-host evolutionary dynamics of a rapidly mutating virus (hepatitis C virus) targeted by a cellular cross-reactive immune response, into an epidemiological perspective. The viral trait we follow is the replication rate of the strain initiating the infection. We find that, even for rapidly evolving viruses, the replication rate of the initial strain has a strong effect on the fitness of an infection. Moreover, infections caused by slowly replicating viruses have the highest infection fitness (i.e., lead to more secondary infections), but strains with higher replication rates tend to dominate within a host in the long term. We also study the effect of cross-reactive immunity and viral mutation rate on infection life history traits. For instance, because of the stochastic nature of our approach, we can identify factors affecting the outcome of the infection (acute or chronic infections). Finally, we show that anti-viral treatments modify the value of the optimal initial replication rate and that the timing of the treatment administration can have public health consequences due to within-host evolution. Our results support the idea that natural selection can act on the replication rate of rapidly evolving viruses at the between-host level. They also provide a mechanistic description of within-host constraints, such as cross-reactive immunity, and show how these constraints affect the infection fitness. This model raises questions that can be tested experimentally and underlines the necessity to consider the evolution of quantitative traits to understand the outcome and the fitness of an infection.
Rapidly mutating viruses, such as hepatitis C virus, can escape host immunity by generating new strains that avoid the immune system. Existing data support the idea that such within-host evolution affects the outcome of the infection. Few theoretical models address this question and most follow viral diversity or qualitative traits, such as drug resistance. Here, we study the evolution of two virus quantitative traits—the replication rate and the ability to be recognised by the immune response—during an infection. We develop an epidemiological framework where transmission events are driven by within-host dynamics. We find that the replication rate of the virus that initially infects the host has a strong influence on the epidemiological success of the disease. Furthermore, we show that the cross-reactive immune response is key to determining the outcome of the infection (acute or chronic). Finally, we show that the timing of the start of an anti-viral treatment has a strong effect on viral evolution, which impacts the efficiency of the treatment. Our analysis suggests a new mechanism to explain infection outcomes and proposes testable predictions that can drive future experimental approaches.