The translation, transcription and replication pathways that decode and duplicate genetic information are essential to all living cells. Thus, one might expect all cells to share the same basic machinery for these pathways, inherited from the primordial ancestor cell from which they evolved. A clear example is the translation machinery that converts RNA sequence to protein. Translation requires numerous structural and catalytic RNAs and proteins, and its central factors are homologous in all three domains of life: bacteria, archaea and eukarya. Likewise, the central actor in transcription, RNA polymerase, has homologous catalytic subunits in bacteria, archaea and eukarya. In contrast, while some “gears” of the genome replication machinery are homologous in all domains of life, most components of the bacterial replication machine appear to be unrelated to those of archaea and eukarya. This review compares and contrasts the central proteins of the “replisome” machines that duplicate DNA in bacteria, archaea and eukarya, with an eye to understanding the issues surrounding the evolution of the DNA replication apparatus.
The Last Universal Common Ancestor cell, referred to by the acronym “LUCA”, is the cell from which the three domains of life evolved: bacteria, archaea and eukarya (Steel et al., 2010; Theobald, 2010). LUCA is thought to have evolved around 3.5 billion years ago (Doolittle, 2000; Glansdorff et al., 2008). How did the first cell (i.e. either LUCA or its predecessors) come about? Self-replicating cells could not have appeared all at once, in a single step. One prevailing theory is that pre-cellular life started in an aqueous environment, possibly in or near deep thermal vents on the ocean floor, where heat is plentiful and organic chemicals abound and could float freely. Complex organic polymers may have formed on inorganic catalytic surfaces and concentrated in nooks and crannies within and around deep-sea vents. Polymers of many types probably formed, and nearly as many probably fell apart. RNA has been proposed as the polymer on which the first cellular life was based (Woese, 1968; Orgel, 1968; Crick, 1968), an idea referred to as the “RNA world” hypothesis. This hypothesis gained momentum with the discovery that RNA can catalyze a reaction (Kruger et al., 1982). The RNA world hypothesis posits that RNA had the properties required to catalyze a wide variety of reactions, more so than the inorganic catalysts in the rocks, and perhaps enough for life to arise (Copley et al., 2007; Gilbert, 1986; Orgel, 2003; Robertson and Joyce, 2012). Like proteins, RNA can fold into very complex shapes and catalyze many types of chemical reactions. The RNA in modern cells is built from only four types of nucleotide, and one might think this would limit the chemical repertoire of RNA. Yet this repertoire was sufficient to form the huge and complex ribosome (which is a ribozyme) that makes proteins (Steitz, 2008).
The ribosome, and all of the tRNAs that supply the individual amino acid building blocks of protein polymers, are present in all three domains of life: bacteria, archaea and eukarya (Woese et al., 1990). While RNA uses only four different monomeric nucleotide units (i.e. rC, rG, rA, rU), the genetic code of modern-day cells, which reads nucleotides in triplets, enables the use of 20 different amino acids. The development of translation and the use of many amino acids in proteins led to the evolution of proteins that could catalyze many different types of reactions, and that could bind ribozymes and enhance their reaction rates. The first proteins probably simply helped RNA fold into tighter structures by shielding the strong electrostatic repulsion of the phosphate backbone (as ribosomal proteins do). Thus proteins would allow RNA to adopt more precise shapes and to develop into the precise ribozymes we know today (most ribozymes have a protein component). Eventually proteins would become catalysts in their own right. For example, a ribozyme probably once catalyzed the synthesis of other RNA polymers, but modern-day cells use a protein, RNA polymerase, to make RNA. Eventually, the chemical efficiency afforded by 20 different amino acid side chains replaced most ribozymes.
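Simple counting shows why the code words must be at least three nucleotides long to specify 20 amino acids. The sketch below is plain combinatorics over the four RNA bases; it says nothing about how the code actually evolved, and the function name is illustrative only.

```python
# Count the distinct code words that can be spelled with the four RNA bases
# for word lengths 1, 2 and 3, and compare with the 20 amino acids.
NUCLEOTIDES = "ACGU"
AMINO_ACIDS = 20  # standard amino acids of the universal genetic code


def codon_count(word_length: int, alphabet: str = NUCLEOTIDES) -> int:
    """Number of distinct code words of the given length over the alphabet."""
    return len(alphabet) ** word_length


for n in (1, 2, 3):
    count = codon_count(n)
    enough = "yes" if count >= AMINO_ACIDS else "no"
    print(f"length {n}: {count} code words; enough for {AMINO_ACIDS} amino acids? {enough}")
```

Doublets give only 16 words, fewer than 20, so triplets (64 words) are the shortest code that works; the surplus is why several codons can specify the same amino acid.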
Protein synthesis (translation) is highly complex. It requires a large multi-protein/multi-rRNA ribosome, numerous tRNAs, 20 amino acids, a way to match each tRNA with the exact amino acid specified by its anticodon (catalyzed by aminoacyl-tRNA synthetases), and a messenger RNA sequence that is “read” by the ribosome, which uses the aminoacyl-tRNAs to make protein. Translation is fascinating, but exactly how the translation machinery evolved remains one of the major mysteries of cellular evolution. No protein machine has yet evolved to substitute for RNA in this complex process. In fact, despite the enormous genetic diversity of the viral world, no virus encodes its own translation system, yet many viruses encode their own means of synthesizing RNA and DNA.
We don't know whether the first cell used protein or RNA polymers for enzymatic catalysis, but we can be fairly certain that LUCA harbored the ribosome, and thus synthesized proteins, because all cells today contain ribosomes composed of homologous rRNAs and use the same universal genetic code to make proteins. LUCA probably also contained DNA, because ribonucleotide reductase (RDR), which makes the deoxyribonucleotide building blocks of DNA, is homologous in all cells (Forterre et al., 2004; Leipe et al., 1999). Another compelling argument for the existence of DNA in LUCA is the role of homologous recombination in the evolution of cellular life (see next section), and the fact that all known recombinases (RecA/Rad51) use DNA as their substrate. However, it is less certain whether LUCA used DNA as its genetic material, in addition to using DNA as a recombination substrate, or instead used RNA for its genome. There is substantial precedent in modern-day viruses for the use of RNA as genetic material. For example, Fig. 3 shows one possible scheme (out of many) for information flow within LUCA in which RNA, DNA and protein all exist, yet DNA is not used as the genome. In this scheme, the genome is RNA, and DNA serves as a substrate for the homologous recombination required to exchange genes among genomes (horizontal gene transfer), thus sampling combinations of genes that facilitated the evolution of a self-replicating cell. In this hypothetical cell, RNA polymerase uses DNA as a template both to make RNA for protein synthesis and to regenerate the RNA genome. The DNA molecule is made by a simple process similar to the replication of modern-day retroviruses, which requires only one enzyme, a reverse transcriptase. The reverse transcriptase initiates DNA synthesis using a tRNA as a primer, and then makes the second DNA strand while digesting away the original RNA.
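The hypothetical information-flow scheme described above can be written down as a small directed graph, with each edge labeled by the enzyme the text assigns to that step. This is a minimal sketch of that one scheme only; the node names (including "mRNA" for the RNA used in protein synthesis) and the helper `path` are illustrative choices, not part of the original figure.

```python
from collections import deque

# Directed graph of the hypothetical LUCA scheme: node -> [(product, enzyme)].
FLOW = {
    "RNA genome": [("DNA", "reverse transcriptase")],
    "DNA": [("RNA genome", "RNA polymerase"), ("mRNA", "RNA polymerase")],
    "mRNA": [("protein", "ribosome")],
    "protein": [],
}


def path(graph, start, goal):
    """Breadth-first search; return [(node, enzyme), ...] from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, steps = queue.popleft()
        if node == goal:
            return steps
        for nxt, enzyme in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [(nxt, enzyme)]))
    return None


for product, enzyme in path(FLOW, "RNA genome", "protein"):
    print(f"-> {product} (via {enzyme})")
```

In this scheme every route from the RNA genome to protein passes through DNA, which is what makes DNA available as a recombination substrate even though it is not the genome.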
The transfer of a gene from one organism to another type of organism is referred to as horizontal gene transfer, and requires homologous recombination catalyzed by a recombinase. Before the advent of genome sequencing, horizontal gene transfer was thought to be rare. But the sequencing of numerous cellular genomes indicates that horizontal gene transfer was frequent during evolution. A high rate of horizontal gene transfer could not persist for long, because the resulting genomic instability would prevent stable species from developing. But DNA swapping by horizontal gene transfer appears to have been frequent early in evolution, and was probably necessary to sample enough combinations of enzymes catalyzing the different reactions needed for the evolution of a free-living, self-reproducing cell. Accordingly, the recombinases bacterial RecA and eukaryotic Rad51 are homologous, and their common ancestor is thought to have been present in LUCA.
A relatively new view is that viruses predated cellular life (Forterre et al., 2004; Koonin et al., 2015; Leipe et al., 1999). In this scenario, viruses did not evolve from cells. It used to be thought that cells must have come first, and that viruses then devolved from them into trimmed-down selfish DNA held inside a protein capsid. But this is inconsistent with the fact that the genetic diversity of the viral world dwarfs that of cellular genetic information. Thus the argument has been turned around to propose that the DNA sequences in cellular genomes derived from a sampling of information in viruses that predated cells. In this view, viruses were agents of information transfer between compartments, or nooks and crannies in deep-sea vents, that held pre-cellular assemblages. This virus-mediated information transfer was performed by homologous recombination, carrying chunks of nucleic acid from one compartment to another. Viruses were thus needed to explore the large variety of combinations required to “pick the combination lock” of life. Even today, viruses are the predominant form of biological assemblage, outnumbering all cells on earth combined by 10-100 fold (Forterre et al., 2004; Koonin et al., 2015; Leipe et al., 1999).
Note that recombinases act upon DNA, not RNA, and presumably acted on duplex DNA in LUCA. It can be argued that DNA became a prevalent polymer only after the emergence of proteins, as follows. The nucleotide precursors of RNA are made from “scratch”, meaning that metabolic pathways assemble the nucleotide bases and ribose sugars of the four rNMP precursors from numerous smaller molecules. The nucleotide precursors of DNA are made in a single step, by removing the 2’ OH from the ribose sugar of ribonucleotides. Removing the 2’ hydroxyl may not be achievable with the chemistry afforded by amino acid side chains alone, because all modern-day cells require an RDR enzyme that employs an iron radical to perform this reaction. Iron radicals destroy RNA, and thus would have destroyed an RNA enzyme that catalyzed this process. This is one reason proteins are thought to have predated the DNA polymer. DNA uses the base T instead of the U used in RNA. The likely reason T replaced U in DNA is that deamination of C, a frequent spontaneous reaction caused by hydrolysis, produces U. Because U correctly base pairs with A, frequent deamination of C, followed by replication over the template U, would have replaced GC base pairs with AT base pairs over evolutionary time. The use of T in DNA, however, enables cells to detect and repair U produced by deamination of C. All cells contain an enzyme called uracil DNA glycosylase, which detects U residues in DNA and removes the base. Other enzymes then replace the missing base with C to restore the GC base pair.
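The deamination argument above can be traced with a toy string model. This is a minimal sketch under loud simplifying assumptions: strands are plain strings, the complement is written base-for-base (aligned, not reversed), and the hypothetical `uracil_repair` simply rewrites U as C. Real repair removes the base with uracil DNA glycosylase and other enzymes fill the gap using the opposite G as template.

```python
# Toy model of C deamination and its repair; all function names are illustrative.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}  # U pairs with A, like T


def copy_strand(template: str) -> str:
    """Complement a polymerase would make, written base-for-base (not reversed)."""
    return "".join(PAIR[base] for base in template)


def deaminate(strand: str, position: int) -> str:
    """Spontaneous hydrolytic deamination: the C at `position` becomes U."""
    if strand[position] != "C":
        raise ValueError("only C deaminates to U in this toy model")
    return strand[:position] + "U" + strand[position + 1:]


def uracil_repair(strand: str) -> str:
    """Idealized repair: every U found in DNA is restored to C."""
    return strand.replace("U", "C")


top = "GCAT"
damaged = deaminate(top, 1)                  # "GUAT"
print(copy_strand(top))                      # CGTA: position 1 pairs with G
print(copy_strand(damaged))                  # CATA: position 1 now pairs with A
print(copy_strand(uracil_repair(damaged)))   # CGTA: the G pairing is restored
```

Without the repair step, the A placed opposite U would template a T in the next round, completing the GC-to-AT change; repair works only because U is foreign to DNA and can therefore be recognized.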
Based on gene homology and the universal genetic code, LUCA contained the ribosome, the tRNAs, tRNA synthetases, RDR, RNA polymerase, a recombinase and a host of other molecules that made life possible. These molecules are homologous in every cell type, the hallmark of a molecule that was present in LUCA. One might think that comparing many cellular genome sequences would reveal the minimal set of genes for a living cell, but it isn't that simple. The minimal gene set needed for life is a very active area of investigation (Glass et al., 2006; Mushegian and Koonin, 1996).
Apart from the use of T in place of U, the nucleotide units of DNA and RNA are identical except for the 2’ OH on the ribose sugar of RNA. Lacking the 2’ OH makes DNA much more stable than RNA, but DNA lacks the catalytic power of RNA. With the evolution of proteins, the catalytic properties of RNA were no longer needed, and most ribozymes were eventually replaced by more efficient protein counterparts. The synthesis of RNA and DNA is comparatively simple and requires only one enzyme, an RNA or DNA polymerase. The chemistry is the same, and is shown for DNA in Fig. 4. The 3’ OH of the primer is activated by proton abstraction, and the resulting 3’ oxyanion performs a nucleophilic attack on the alpha phosphate (attached at the 5’ position) of an incoming dNTP, with the beta and gamma phosphates leaving as a molecule of pyrophosphate (Kornberg and Baker, 1992). The reaction is only modestly downhill energetically, and the reverse reaction can occur if the pyrophosphate concentration is high enough. dNDP substrates would be about as energetically favorable as dNTPs, but their leaving group would be phosphate, which is plentiful in cells and would drive the reverse reaction. Therefore long chains of DNA and RNA would be difficult to synthesize from dNDP substrates. The use of dNTPs, which releases pyrophosphate during synthesis, lets the cell prevent the reverse reaction by splitting the pyrophosphate into two phosphate molecules (i.e. by pyrophosphatase) (Kornberg and Baker, 1992).
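The reactions described above can be written schematically as follows. This is a bookkeeping sketch only: charges, Mg++ ions and protons are omitted, and the third line is the hypothetical dNDP route discussed in the text, not a reaction used by cells.

```latex
\begin{aligned}
\text{(DNA)}_n + \text{dNTP} &\rightleftharpoons \text{(DNA)}_{n+1} + \text{PP}_i \\
\text{PP}_i + \text{H}_2\text{O} &\xrightarrow{\text{pyrophosphatase}} 2\,\text{P}_i \\
\text{(DNA)}_n + \text{dNDP} &\rightleftharpoons \text{(DNA)}_{n+1} + \text{P}_i
\end{aligned}
```

Hydrolysis in the second line removes the PP_i produced in the first, pulling polymerization forward. In the third line the product P_i is already abundant in cells, so nothing comparable holds back the reverse reaction.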
Crystal structures of RNA and DNA polymerases, along with biochemical experiments, reveal that catalysis of nucleotide addition does not directly involve any amino acid side chains. Instead, activation of the 3’ OH of the primer strand and stabilization of the pyrophosphate leaving group of the incoming dNTP are mainly accomplished by two metal ions (i.e. Mg++) bound to the polymerase active site by three conserved acidic residues (Beese and Steitz, 1991; Steitz, 2006). Interestingly, RNA is fully capable of chelating Mg++, so the use of two metals to catalyze this reaction may be a relic of the RNA world. However, the most important aspect of RNA and DNA synthesis is accuracy, and accuracy may be achieved more efficiently by an enzyme than by a ribozyme. Base pairing of an NTP substrate with the template strand strongly biases the choice toward the correct nucleotide, and an RNA or protein catalyst would be no different in this regard. But AT and GC base pairs differ by only one hydrogen bond, so base pairing alone cannot provide the high accuracy (fidelity) needed to make exact copies of entire genomes for cell duplication. Rather than base pairing energy, it is the nearly identical geometry of the AT and CG base pairs that enables polymerases to be highly accurate. The nucleotide selection step of DNA polymerases involves a conformational change of the enzyme that completely buries the base pair before chemistry. Chemistry occurs only if the base pair adopts the correct geometry (e.g. Fig. 5a) (Doublie et al., 1999; Johnson et al., 2003; Joyce et al., 2008; Li et al., 1998; Luo et al., 2007; Santoso et al., 2010).
DNA polymerases are grouped into “families” (A, B, C, D, X, Y and RT, for reverse transcriptase) whose sequences are not homologous to one another (Steitz, 1999; Yang and Woodgate, 2007). Crystal structures of representatives of each family show that they all have the shape of a right hand, with subdomains referred to as palm, fingers and thumb (e.g. see Fig. 5b) (Johansson and Macneill, 2010; Steitz, 1999). Hence, the evolutionary relationship among polymerases is not entirely clear: their sequences are not homologous, yet they adopt the same general shape. Each subdomain has a specific function. The fingers domain binds the incoming dNTP, the thumb domain helps grip duplex DNA, and the palm domain contains the amino acids that bind the two catalytic metal ions. The catalytic palm domain is the most conserved, and its topological chain folding pattern is similar in the A, B and Y families, indicating an ancestral relationship among these polymerases (Fig. 5c) (Yang and Woodgate, 2007). Likewise, the C and X family polymerases have similar palm folding patterns (Fig. 5c) (Bailey et al., 2006; Lamers et al., 2006; Wing et al., 2008). Perhaps the palm domain evolved twice, independently. Interestingly, all bacteria examined to date use a C family polymerase for chromosome replication, and this family is not represented in eukaryotic cells. Eukarya use three B family polymerases for replication. Archaea use either a B family or a D family polymerase for replication (Kelman and Kelman, 2014), and genome sequence analysis of archaea indicates that this domain of life evolved along the same line as eukaryotes, separate from bacteria (Xie et al., 2012; Yutin et al., 2008). Hence, the use of C- and B-family polymerases to replicate bacterial and eukaryotic/archaeal genomes, respectively, suggests that the replication machinery evolved twice, independently, after LUCA (Leipe et al., 1999).
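The groupings stated in the paragraph above can be summarized in a small lookup table. This sketch encodes only what the text says (palm-fold groups A/B/Y and C/X; replicative families per domain of life); it is not a complete classification, and the D family is deliberately left ungrouped because the text does not place it.

```python
from typing import Optional

# Palm-domain fold groups as stated in the text (not an exhaustive catalog).
PALM_FOLD_GROUPS = [
    {"A", "B", "Y"},  # similar palm topology (Yang and Woodgate, 2007)
    {"C", "X"},       # similar palm topology (Bailey et al., 2006; Lamers et al., 2006; Wing et al., 2008)
]

# Polymerase families used for chromosome replication, per domain of life.
REPLICATIVE_FAMILIES = {
    "bacteria": {"C"},
    "eukarya": {"B"},
    "archaea": {"B", "D"},
}


def palm_group(family: str) -> Optional[int]:
    """Index of the palm-fold group containing `family`, or None if ungrouped here."""
    for index, group in enumerate(PALM_FOLD_GROUPS):
        if family in group:
            return index
    return None


shared = REPLICATIVE_FAMILIES["bacteria"] & REPLICATIVE_FAMILIES["eukarya"]
print("replicative families shared by bacteria and eukarya:", shared or "none")
print("bacterial replicase palm group:", palm_group("C"))
print("eukaryotic replicase palm group:", palm_group("B"))
```

The two replicative families share no family and fall in different palm-fold groups, which is the observation behind the "evolved twice" argument.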
The fingers domain binds each of the four dNTPs. As described above, the highly accurate dNTP selection step of DNA polymerases involves a conformational change in which the fingers domain closes over the palm, burying the dNTP-template base pair in a confined chamber in which only a correct base pair fits (e.g. Fig. 5a). Base pairs with the correct shape are joined, while those with an incorrect shape prevent the conformational change needed for catalysis and dissociate from the polymerase, after which another dNTP binds to try again. The entire replication process is highly accurate, with only about one mistake per 10⁹ base pairs (Echols and Goodman, 1991; Kunkel, 2004; Modrich et al., 1996). This high fidelity involves several enzymes, but the active site of the replicative DNA polymerase alone is quite accurate, making only about one mistake every 10,000 base pairs. Cellular genomes are much larger than this, so this level of accuracy alone would not be sufficient to avoid mutations. Replicative DNA polymerases have an associated 3’-5’ exonuclease active site that proofreads the product of the DNA polymerase. When a DNA polymerase incorporates an incorrect dNTP, it stalls because it is very slow at extending a mismatch. This gives the mismatched 3’ primer terminus time to relocate from the palm domain to the 3’-5’ exonuclease active site, which readily removes mismatched 3’ nucleotides. The exonuclease proofreader adds about 100-fold accuracy, improving fidelity to only one mistake every 1 million base pairs. Despite this impressive accuracy, most genomes are longer than one million base pairs, so even this level of accuracy is not sufficient for accurate genome duplication. But cells have repair systems that act after replication to detect and “fix” mistakes made by the polymerase.
For example, the post-replication mismatch repair system recognizes mismatches (mistakes) made during replication and repairs them, increasing fidelity about 1000-fold. This brings the accuracy of genome replication to one mistake every 1 billion base pairs, well above the size of many (but not all) genomes.
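The fidelity figures above compound multiplicatively: each stage divides the remaining error rate by its improvement factor. The sketch below reproduces that arithmetic using the numbers given in the text; the 5 Mb genome size is a hypothetical value chosen only for illustration.

```python
# Compounding the fidelity figures quoted in the text.
POLYMERASE_ERROR_RATE = 1e-4   # ~1 mistake per 10,000 bp (polymerase active site alone)
PROOFREADING_FACTOR = 100      # 3'-5' exonuclease proofreading
MISMATCH_REPAIR_FACTOR = 1000  # post-replication mismatch repair


def error_rate(*improvement_factors: float) -> float:
    """Per-base-pair error rate after applying each improvement factor in turn."""
    rate = POLYMERASE_ERROR_RATE
    for factor in improvement_factors:
        rate /= factor
    return rate


def expected_errors(genome_bp: float, rate: float) -> float:
    """Expected number of mistakes in one copy of a genome of `genome_bp` base pairs."""
    return genome_bp * rate


stages = {
    "polymerase only": error_rate(),
    "+ proofreading": error_rate(PROOFREADING_FACTOR),
    "+ mismatch repair": error_rate(PROOFREADING_FACTOR, MISMATCH_REPAIR_FACTOR),
}

GENOME_BP = 5_000_000  # hypothetical 5 Mb genome, for illustration only
for name, rate in stages.items():
    print(f"{name:18s} rate={rate:.0e}  expected errors per copy={expected_errors(GENOME_BP, rate):g}")
```

With these inputs, a genome of this size would accumulate hundreds of errors per copy from the polymerase alone, a handful after proofreading, and well under one after mismatch repair.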
While bacteria use a C-family DNA polymerase for replication, eukaryotes use three different B family DNA polymerases (Pols) for chromosome replication: Pols alpha, delta and epsilon (Garg and Burgers, 2005; Johansson and Macneill, 2010; Stillman, 2008). Pol alpha was the first eukaryotic replicative DNA polymerase identified and, surprisingly, it was found to synthesize primers (Conaway and Lehman, 1982). For many years it was thought to be the only replicative DNA polymerase in eukaryotes. Pol alpha consists of four subunits: the catalytic Pol1 subunit is the largest, the Pol12 subunit is the second largest (of unknown function), and two smaller subunits perform the RNA priming function. The smaller of the two priming subunits contains the primase catalytic site, but the larger priming subunit is required to form the first dinucleotide bond. After about 7 nucleotides, the RNA primer is handed off to the DNA polymerase subunit, which extends it by another 20-25 residues to produce a hybrid RNA/DNA primer (Garg and Burgers, 2005). A second replicative polymerase, Pol delta, was discovered in biochemical studies of SV40 DNA replication (Waga et al., 1998). Pol delta contains the large catalytic Pol3 subunit plus the Pol31 and Pol32 subunits (budding yeast nomenclature); human Pol delta has a fourth, very small subunit (p12) (Garg and Burgers, 2005). Characterization of SV40 replication showed that Pol delta synthesizes both the leading and lagging strands, and that Pol alpha serves as the primase (Tsurimoto and Stillman, 1991b; Tsurimoto and Stillman, 1991a; Waga et al., 1998). It came as quite a surprise when biochemical and genetic studies revealed a third DNA polymerase, Pol epsilon, that is essential to cellular replication (Ohya et al., 2002; Syvaoja et al., 1990). Pol epsilon consists of four subunits: the large catalytic Pol2 subunit, the second largest subunit Dpb2, and two small histone fold subunits, Dpb3 and Dpb4.
The sequence of the Pol2 subunit predicts that it consists of two B family DNA polymerase sequences linked end to end (Tahirov et al., 2009). The N-terminal half of Pol2 is the active polymerase and the C-terminal half of Pol2 encodes an inactive polymerase. Consistent with a role in DNA synthesis, point mutants in the active site of the catalytic N-terminal polymerase of Pol2 are not viable (Dua et al., 1999). However, cells are viable upon deletion of the entire N-terminal half of Pol2, suggesting that a back-up polymerase rescues cells when the catalytic region of Pol epsilon is absent (Dua et al., 1999; Kesti et al., 1999). Surprisingly, the C-terminal inactive polymerase region of Pol2 is essential to cell viability, and is currently believed to serve an essential structural role (Dua et al., 1999; Kesti et al., 1999).
Bacteria replicate their DNA with enzymes distinct from those of eukaryotes, and thus the replication machinery of modern-day cells may not have been used by LUCA. One difference between bacteria and eukaryotes, mentioned above, is that their DNA polymerases are widely diverged: bacteria use C family DNA polymerases, while eukaryotes use B family polymerases. This has been taken as evidence that DNA replication evolved twice, independently, in bacteria and eukarya (Leipe et al., 1999). As explained above, DNA polymerases all have a right hand shape and use a two-metal-ion mechanism of catalysis. Thus it is arguable whether DNA polymerases of different families truly represent independent evolution. But even if the polymerases did evolve independently, recent studies show that many different types of DNA polymerases can trade places with the main chromosomal polymerase during replication, and that these polymerases function with the sliding clamp and helicase of the replication machinery (Geertsema and van Oijen, 2013; Indiani et al., 2005; Johansson and Dixon, 2013; Stukenberg et al., 1991; Yang et al., 2004). This plasticity is probably important because replication, unlike transcription and translation, must succeed at all costs in duplicating an entire genome. Replication happens only once in a cell's life, and the cell cannot afford failure, whereas transcription and translation can simply start over whenever they fail. DNA polymerases differ in their ability to traverse various types of DNA sequences and DNA lesions. Thus, plasticity in the replication machinery, through the use of exchangeable polymerases, enables it to advance over the innumerable types of sequences and sites of DNA damage encountered across an entire genome.
Given this perspective, the use of distinct B and C family DNA polymerases in bacteria versus eukaryotes may not signify distinct evolutionary lineages of the replication machinery. For example, bacteria may have started out using a B family polymerase and later switched to a C family polymerase.
However, stronger arguments for the separate evolution of the replication machinery in bacteria and eukaryotes can be made by looking at the other enzymes required for replication, and asking whether LUCA even needed them. The “trick”, or most difficult aspect of cellular replication, is the simultaneous replication of both strands of duplex DNA. LUCA may have replicated one strand at a time instead. Simultaneous synthesis of antiparallel strands requires that the lagging strand be made in the direction opposite to fork progression, because DNA polymerases can only synthesize DNA in the 5’-3’ direction (i.e. the nucleotide building blocks are activated only at the 5’ position). Therefore the lagging strand must be synthesized in sections (i.e. Okazaki fragments). This required the evolution of a robust priming activity, as well as enzymes to remove primers and ligate the sections of DNA together. However, if the two strands of duplex DNA are not replicated simultaneously, one strand can be synthesized continuously 5’-3’, and after it is completed the second strand can also be extended continuously 5’-3’. Indeed, many viruses and phages replicate this way, often using a strand-displacing polymerase instead of a helicase. Bacterial and eukaryotic cells use very distinct, non-homologous helicases and primases (as discussed further in a later section). Together, the non-homologous primases, helicases and DNA polymerases make a compelling argument that the replication machinery evolved twice, independently, and that LUCA may not have duplicated both strands of DNA simultaneously.
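The consequence of 5’-3’-only synthesis at a fork can be sketched with a toy coordinate model. Assumptions, labeled loudly: a single fork moves left to right along a template of `length` nucleotides, and the hypothetical parameter `okazaki_len` fixes the spacing of lagging-strand primers; real fragment lengths vary.

```python
# Toy model of one replication fork moving left-to-right. The leading strand is
# extended in the direction of fork movement; the lagging strand is made in
# pieces, each started from a new primer near the fork and extended backward.
def replicate_fork(length: int, okazaki_len: int):
    """Return (leading, lagging) lists of (start, end) synthesis segments.

    A segment with start > end was synthesized opposite to fork movement.
    `okazaki_len` is a hypothetical fixed spacing between lagging-strand primers.
    """
    leading = [(0, length)]  # one continuous product
    lagging = []
    previous_primer = 0
    for site in range(okazaki_len, length + okazaki_len, okazaki_len):
        primer = min(site, length)                 # new primer laid down near the fork
        lagging.append((primer, previous_primer))  # extended back to the previous fragment
        previous_primer = primer
    return leading, lagging


leading, lagging = replicate_fork(length=10, okazaki_len=4)
print("leading:", leading)  # [(0, 10)]
print("lagging:", lagging)  # [(4, 0), (8, 4), (10, 8)]
```

Every lagging segment needs its own primer, and every junction between neighboring segments must later have its primer removed and be ligated; the leading strand needs none of this after its first primer, which is why strand-at-a-time replication is so much simpler.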
Examples abound of relatively simple mechanisms of replication in eukaryotic viruses and bacteriophages that do not require a helicase or primase. For example, some viruses have a linear dsDNA genome and use a protein-nucleotide conjugate to create the initial primer site, after which the DNA polymerase synthesizes by strand displacement to the end of the genome, circumventing the need for primase and helicase (Kornberg and Baker, 1992). Other viral genomes have covalently closed ends, and the polymerase simply initiates at a single nick and extends around the entire genome without interruption to generate a duplicate genome. Again, no primase or helicase is required. In another example, mentioned earlier, retroviruses use a reverse transcriptase to convert the ssRNA genome into a dsDNA genome, initiating synthesis with a tRNA (i.e. not a primase) and using an RNase activity to remove the RNA, thus circumventing the need for a helicase. New viral RNA genomes are generated from the dsDNA by the host RNA polymerase.
The first protein discovered to encircle DNA was the E. coli beta subunit sliding clamp (Kong et al., 1992; Kuriyan and O'Donnell, 1993; Stukenberg et al., 1991). Since then, many different types of proteins have been found to encircle DNA as part of their function. The beta sliding clamp is shown in Fig. 6a; it is a homodimer in which each subunit consists of three domains. All three domains have the same folding pattern, giving the dimer pseudo six-fold symmetry. Viewed from the side, the clamp is relatively thin, about the length of one turn of DNA, and it has two distinct faces (the C-face is labeled in Fig. 6a). The beta clamp is assembled onto DNA by a five-subunit clamp loader that uses ATP to open and close the clamp around DNA (Kuriyan and O'Donnell, 1993). Once loaded onto DNA, the beta clamp can diffuse along dsDNA. The clamp binds Pol III, the chromosomal replicase of E. coli, and once bound to beta, Pol III can synthesize thousands of nucleotides at 500-1000 nucleotides per second without dissociating from DNA (Kuriyan and O'Donnell, 1993). The main role of the clamp in chromosome replication is to confer high processivity on the DNA polymerase. It does so by binding directly to the polymerase and sliding along behind it, acting as a mobile tether that holds the polymerase to DNA during many rounds of dNTP incorporation.
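A back-of-envelope calculation shows what the quoted Pol III rate implies at genome scale. Assumptions not stated in the text and labeled here: a chromosome of about 4.6 million base pairs (roughly the size of the E. coli chromosome), replication by two forks moving out from one origin, and a constant rate with no pauses or polymerase exchange.

```python
# Back-of-envelope timing from the 500-1000 nt/s Pol III rate quoted above.
def replication_time_s(genome_bp: float, nt_per_s: float, forks: int = 1) -> float:
    """Seconds to copy `genome_bp` if `forks` forks each advance at `nt_per_s`."""
    return genome_bp / (nt_per_s * forks)


GENOME_BP = 4.6e6  # assumption: approximate E. coli chromosome size
for rate in (500, 1000):
    minutes = replication_time_s(GENOME_BP, rate, forks=2) / 60
    print(f"{rate} nt/s with two forks: about {minutes:.0f} min")
```

Under these assumptions the whole chromosome takes on the order of tens of minutes, and only because each polymerase, tethered by the clamp, stays on DNA for very long stretches.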
In eukarya and archaea the sliding clamp is proliferating cell nuclear antigen (PCNA). The structure of PCNA is essentially superimposable on that of beta (Fig. 6b) (Gulbis et al., 1996; Krishna et al., 1994). An example of an archaeal PCNA is shown in Fig. 6c (Matsumiya et al., 2001). The major difference between PCNA and beta is that each PCNA subunit consists of only two domains instead of three, and PCNA trimerizes to form a six-domain ring like beta. Numerous studies demonstrate that nearly all of a cell's DNA polymerases interact with the sliding clamp, in bacteria as well as in eukarya/archaea (Yao and O'Donnell, 2015). The striking structural similarity of the bacterial and eukaryal/archaeal sliding clamps indicates that they descend from a common ancestor, and thus that LUCA contained a sliding clamp. Clamps require multi-subunit, ATPase-driven clamp loaders that open and close the clamp around a primed site. Both the bacterial and eukaryotic clamp loaders are circular pentamers, and their subunits share both sequence homology (Bunz et al., 1993; O'Donnell et al., 1993) and structural similarity (Bowman et al., 2004; Jeruzalmi et al., 2001; Kelch et al., 2011). Thus LUCA contained the clamp and clamp loading system that was handed down during evolution to all three domains of life.
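The two clamp architectures compared above reach the same six-domain ring by different routes. The small data structure below encodes only the subunit and domain counts given in the text; the class name is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clamp:
    """Ring-shaped sliding clamp described by subunit and domain counts."""
    name: str
    subunits: int
    domains_per_subunit: int

    @property
    def ring_domains(self) -> int:
        return self.subunits * self.domains_per_subunit


beta = Clamp("E. coli beta", subunits=2, domains_per_subunit=3)
pcna = Clamp("PCNA", subunits=3, domains_per_subunit=2)

for clamp in (beta, pcna):
    print(f"{clamp.name}: {clamp.subunits} x {clamp.domains_per_subunit} = {clamp.ring_domains} domains")
```

A dimer of three-domain subunits and a trimer of two-domain subunits both give a six-domain ring, consistent with the pseudo six-fold symmetry shared by beta and PCNA.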
Because the sliding clamp provides processivity to DNA polymerases, it is natural to assume that in LUCA the clamp provided processivity to the polymerase that replicated the genome. But this need not be the case, nor need it be the principal function for which the clamp evolved. For one, DNA polymerases can be processive without a clamp, as will be discussed later. More importantly, the sliding clamp has many functions beyond processive DNA replication. Take, for example, the PCNA clamp, which binds a host of different enzymes, far too many to list here (see Georgescu et al., 2015; Maga et al., 2003). Therefore, clamps may originally have evolved for another purpose, not replication. Many PCNA-binding proteins are involved in DNA repair. For example, PCNA is required for mismatch repair in eukaryotes, where it dictates strand specificity to ensure that the misincorporated nucleotide of the new strand is excised, and not the correct parental nucleotide (Manosas et al., 2012; Pluciennik et al., 2010). This review revisits the role of sliding clamps in the last section, in which we propose a new perspective on the modern-day function of the sliding clamp in cell physiology.
The structure of DNA is elegant, but the antiparallel double helix poses significant barriers to duplication by enzymes. Long chains of duplex DNA must be unzipped, and their helical turns require enormous amounts of unwinding during replication, a fact pointed out by Watson and Crick in their classic model of DNA and its implications for cellular replication (Watson and Crick, 1953). We now know that the unwinding problem is solved by topoisomerases, and that unzipping of DNA is performed by ATP-driven helicases. As described earlier, the antiparallel strands require synthesis in opposite directions, yet the nucleotide precursors are activated only on the 5’ end (the 5’ triphosphate). Therefore, one strand is synthesized in segments extended in the direction opposite to unwinding, and each segment must be initiated de novo by condensation of two nucleotides. For this purpose, all cells contain primase, a specialized RNA polymerase that makes a short RNA primer (Kornberg and Baker, 1992). Why don't DNA polymerases make the primer, and why is the primer made of RNA, which other enzymes must then remove and replace with DNA? One possible reason an RNA primase is required is that rNTPs are present in cells at 10-100 fold higher concentration than dNTPs, and a high concentration may be required for an enzyme to bind two rNTPs at the same time. There is another reason RNA may be used to prime DNA synthesis. The initial condensations of a nucleotide chain are inaccurate; in fact, primases make a mistake about 1% of the time (Sheaff and Kuchta, 1994; Zhang and Grosse, 1990). Hence, RNA is a convenient “marker” of low-fidelity nucleic acid synthesis, enabling it to be recognized and replaced with DNA. Repair enzymes distinguish the RNA primer from DNA, excise it and replace it with DNA using a high-fidelity DNA polymerase, after which DNA ligase seals the Okazaki fragments together.
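A short calculation shows how much the ~1% primase error rate matters for a product as short as a primer. This sketch assumes errors occur independently at a fixed per-nucleotide rate; it uses the primase figure quoted above, the ~1-in-10,000 polymerase figure given earlier in this review, and the approximate bacterial primer length of 10-12 nucleotides given in this review.

```python
# Probability that a product of `length` nucleotides contains at least one
# misincorporation, assuming independent errors at a fixed per-nucleotide rate.
def p_any_error(length: int, per_nt_error: float) -> float:
    return 1 - (1 - per_nt_error) ** length


PRIMASE_ERROR = 0.01     # ~1% per nucleotide (primase)
POLYMERASE_ERROR = 1e-4  # replicative polymerase active site alone

for n in (10, 12):
    print(f"{n}-nt product: primase {p_any_error(n, PRIMASE_ERROR):.1%}, "
          f"polymerase {p_any_error(n, POLYMERASE_ERROR):.2%}")
```

Under these assumptions roughly one primer in ten carries a mistake, about a hundred times more often than a polymerase product of the same length, which is why marking primers as RNA for later replacement is useful.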
Given the complexity of coordinating the diverse enzymatic reactions needed to replicate both strands of duplex DNA simultaneously, one may question whether this process operated in LUCA. Previous sections dealt with the DNA polymerase and showed that bacteria and eukarya/archaea have distinct replicative DNA polymerases, suggesting independent evolution of the replication machinery in these two branches of cellular life. In this section we compare and contrast the primase and helicase of bacteria and eukarya/archaea, the two replication components that differ the most between these branches.
Cells that simultaneously replicate both strands of duplex DNA must continually reinitiate synthesis on the antiparallel strand, the “lagging strand”, which is synthesized in the direction opposite to fork movement. For this function, cells from all three domains of life have evolved an RNA primase activity (Kornberg and Baker, 1992). Primases generally use rNTPs to make a short RNA of a dozen nucleotides or fewer. The bacterial and eukaryotic primases appear to be the most divergent of all replication factors. The bacterial primase is a single-subunit protein whose active site displays homology to topoisomerases, a fold sometimes referred to as “toprim” (topoisomerase-primase), illustrated in Fig. 5a (Aravind et al., 1998; Podobnik et al., 2000). In bacteria the RNA primers are approximately 10-12 nucleotides long. In contrast, the eukaryotic primase is part of Pol alpha, and the RNA priming activity is contained within a heterodimer of its two smallest subunits. These eukaryotic priming subunits share no sequence or structural similarity with the toprim fold of bacterial primase, and instead share homology with the Pol X family (Fig. 7b) (Kilkenny et al., 2013; Kirk and Kuchta, 1999). Priming activity requires both subunits; the larger of the two is required to form the first dinucleotide bond, while the smaller subunit contains the active site for chain extension (Copeland et al., 1993; Zerbe and Kuchta, 2002). The priming subunits make an RNA primer of about 7 rNMPs, and the DNA polymerase subunit then extends it with about 20-25 nucleotides of DNA to make a hybrid RNA/DNA primer (Kilkenny et al., 2013; Singh et al., 1986). The archaeal primase lacks the polymerase and B subunits of Pol alpha and, like the two small subunits of Pol alpha, is a heterodimer (Lao-Sirieix et al., 2005). Interestingly, archaeal primase can synthesize either DNA or RNA primers in vitro (Bocquier et al., 2001; Liu et al., 2001), but it binds rNTPs more tightly than dNTPs (Lao-Sirieix and Bell, 2004).
In archaeal cells the 5’ ends of Okazaki fragments are in fact tipped with RNA (Matsunaga et al., 2003). In summary, the bacterial and eukaryotic/archaeal primases are distinct in both sequence and structure, making a compelling argument that they share no common ancestor and must have evolved independently of one another.
Helicases harness the energy of nucleotide hydrolysis to translocate along single-stranded (ss) DNA, and they are required for DNA replication in all cell types. Replicative helicases in all cells are hexamers that appear to encircle one strand and motor along it, while excluding the other strand from the central channel (Ahnert and Patel, 1997; Hacker and Johnson, 1997; Lee et al., 2014). This is referred to as the “steric exclusion” mechanism of DNA unwinding. While the common quaternary structure and steric exclusion mechanism might suggest that the replicative helicases of all cells share a common ancestor, they share neither sequence homology nor subunit structure. The motor domains of the bacterial helicase (DnaB in E. coli) use a chain fold similar to that of the RecA recombinase (Forterre et al., 2004; LeBowitz and McMacken, 1986; Leipe et al., 1999). In contrast, the motors of the eukaryotic replicative helicase, the Mcm2-7 heterohexamer, are based on the AAA+ fold (ATPases associated with a variety of cellular activities) (Chong et al., 2000; Forterre et al., 2004; Leipe et al., 1999; Li et al., 2015; Tye, 1999). The archaeal replicative helicase, an Mcm homohexamer, is also based on the AAA+ fold (Chia et al., 2010; Chong et al., 2000). Furthermore, the bacterial helicase translocates 5’-3’ on ssDNA (LeBowitz and McMacken, 1986), while archaeal Mcm and eukaryotic CMG translocate in the 3’-5’ direction (Bochman and Schwacha, 2008; Bochman and Schwacha, 2009; Ilves et al., 2012; Moyer et al., 2006). Thus the bacterial and eukaryotic helicases evolved independently, and not from a common ancestor. In addition, the eukaryotic helicase consists of eleven distinct subunits needed for activity: six different AAA+ motor subunits comprise the Mcm2-7 heterohexamer (Li et al., 2015), and five accessory subunits bind the side of the Mcm2-7 ring (Ilves et al., 2012; Moyer et al., 2006).
These accessory factors are Cdc45 and the GINS heterotetramer, whose four subunits share homology with one another. The 11-subunit helicase complex is referred to as CMG (Cdc45, Mcm2-7, GINS). Unlike bacterial and archaeal helicases, eukaryotic CMG contains two channels, as revealed by 3D EM reconstruction: the Mcm2-7 subunits form the channel that encircles and translocates on ssDNA during unwinding, and the accessory factors form a second channel, whose function, if any, is not known (Fig. 7b) (Costa et al., 2011). There are no bacterial homologues of the Cdc45 and GINS subunits. The archaea contain homologues of the Cdc45/GINS accessory factors, but it is unknown whether they form a CMG-like complex (Makarova et al., 2012).
Each subunit within both the bacterial and eukaryotic helicase motor rings has large N-terminal and C-terminal domains, and this bilobed structure gives the helicase the appearance of two rings stacked on top of one another (Fig. 7a): the N-terminal domains may form one ring and the C-terminal motor domains the other. The eukaryotic Mcm2-7 and archaeal Mcm hexamers both encircle the leading strand, with the motor domains pointing toward the forked junction (Costa et al., 2014; Rothenberg et al., 2007). Structural analyses of bacterial and eukaryotic viral replicative helicases suggest that ATP hydrolysis drives rotary movements of DNA-binding elements in the central channel for DNA translocation (Fig. 7c) (Enemark and Joshua-Tor, 2006; Itsathitphaisarn et al., 2012).
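The translocation polarities described above determine which parental strand each replicative helicase must encircle. Because DNA polymerases extend chains 5’ to 3’, the leading strand template is read 3’ to 5’ in the direction of fork movement, and the antiparallel lagging strand template runs 5’ to 3’ in that direction. A hexameric helicase that moves with the fork while translocating along the strand inside its central channel must therefore ride on the strand whose polarity matches its own. The small Python sketch below encodes that inference; the `Polarity` enum and `encircled_template` function are illustrative names, not part of any library.

```python
from enum import Enum


class Polarity(Enum):
    """Direction in which a helicase translocates along the strand it encircles."""
    FIVE_TO_THREE = "5'->3'"
    THREE_TO_FIVE = "3'->5'"


def encircled_template(polarity: Polarity) -> str:
    """Which parental strand a fork-associated hexameric helicase must encircle.

    Polymerases extend 5'->3', so the continuously made leading strand's template
    runs 3'->5' in the direction of fork movement, and the antiparallel
    lagging-strand template runs 5'->3' in that direction. A helicase moving with
    the fork along the strand in its channel must sit on the matching strand.
    """
    if polarity is Polarity.THREE_TO_FIVE:
        return "leading-strand template"
    return "lagging-strand template"


# Translocation polarities as stated in the text.
HELICASE_POLARITY = {
    "DnaB (bacteria)": Polarity.FIVE_TO_THREE,
    "Mcm (archaea)": Polarity.THREE_TO_FIVE,
    "CMG (eukarya)": Polarity.THREE_TO_FIVE,
}

for name, polarity in HELICASE_POLARITY.items():
    print(f"{name}: translocates {polarity.value}, encircles the {encircled_template(polarity)}")
```

The result agrees with the structural assignments in the text: the bacterial helicase encircles the lagging strand, whereas Mcm and CMG encircle the leading strand.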
Replication proteins work together as a replisome machine that replicates both strands of DNA at the same time (Kornberg and Baker, 1992; Benkovic et al., 2001). The current view of the bacterial replisome, based on studies of E. coli, is illustrated in Fig. 6a (McHenry, 2011; Yao and O'Donnell, 2015). The clamp loader organizes the replication enzymes at the fork. Three of the five clamp loader subunits are identical (called tau), and each contains a 24 kDa C-terminal extension that binds Pol III core, so that three molecules of Pol III core are bound per clamp loader. The C-terminal region of the tau subunits also connects to the hexameric helicase, which encircles the lagging strand. Studies of E. coli and of bacteriophages T4 and T7 show that a polymerase-helicase contact is required for leading strand synthesis, and that the energy of dNTP incorporation gives an extra push to the helicase (Benkovic et al., 2001; Donmez and Patel, 2008; Kim et al., 1996; Kulczyk et al., 2012). While it might seem wasteful to use three Pol III molecules at the E. coli replication fork when there are only two DNA strands to replicate, the lagging strand is made as multiple Okazaki fragments, and thus two polymerases could be used on this strand. Indeed, experimental evidence indicates that two of the three Pol III cores function on the lagging strand (Georgescu et al., 2011; Reyes-Lamothe et al., 2010). RNA primers of about 12 nucleotides are synthesized by primase, which associates with the replisome transiently and stochastically (i.e., it is not an integral member of the replisome) (Wu et al., 1992). As each Okazaki fragment is extended, a DNA loop forms on the lagging strand to accommodate the direction of lagging strand extension, which is opposite to that of the leading strand. Okazaki fragments in bacteria are 1-2 kb long, and upon completing a fragment, Pol III rapidly ejects from the beta clamp, leaving the clamp behind on DNA (O'Donnell, 1987; Studwell et al., 1989; Stukenberg et al., 1994).
The beta clamp left behind is then targeted by Pol I and ligase (Lopez de Saro and O'Donnell, 2001): Pol I removes the RNA primer and fills in the resulting gap with DNA, and ligase seals the Okazaki fragments together (Kornberg and Baker, 1992).
The lagging strand DNA loops, one for each Okazaki fragment, form because the lagging strand polymerase(s) are attached to the helicase. The DNA loops serve no known function and may simply be a consequence of the identical leading and lagging strand DNA polymerases in bacteria and in the T4 and T7 bacteriophage systems (Hamdan et al., 2009; Nossal et al., 2007; Park et al., 1998). The leading strand Pol-helicase contact is important for activity, and the same contact between the lagging strand Pol and the helicase results in DNA loops. DNA loops take little energy to form, and in the absence of a negative effect, evolution would not select against their formation. The ssDNA generated in the DNA loop is tightly wrapped by the single-strand DNA binding protein, SSB (Chastain et al., 2003; Georgescu et al., 2014b). Single-strand DNA binding proteins are found in all cell types, and they not only protect ssDNA against nucleases but also remove secondary structures that would block the lagging strand polymerase.
The architecture of the eukaryotic replisome is not yet clear, although some details have emerged. Several studies indicate that Pol epsilon is the leading strand polymerase, while Pol delta replicates the lagging strand (Clausen et al., 2015; Kunkel and Burgers, 2008; Miyabe et al., 2011; McElhinny et al., 2008; Pursell et al., 2007). Biochemical studies of replication forks driven by CMG also support these polymerase assignments (Georgescu et al., 2014a; Georgescu et al., 2015; Langston et al., 2014). However, this assignment is still contested (Johnson et al., 2015), and further studies will be required to resolve it. CMG forms a central organizing unit of the eukaryotic replisome, and antibody pull-downs of CMG from cell extracts identify a large complex of numerous proteins referred to as the RPC (replisome progression complex) (Gambus et al., 2006; Gambus et al., 2009). Among these proteins are CMG, Ctf4, Pol alpha, Pol epsilon, Mcm10, Tof1, Csm3, Mrc1 and FACT. While the functions of the CMG helicase and of the DNA polymerases/primase are understood, the functions of most of the other RPC proteins are largely unknown. Nor is it known whether the lagging strand Pol delta connects stably to the replisome. Genetic studies indicate that Mcm10 is essential for replication, and it binds Pol alpha, although very little is known about its function (Warren et al., 2009). Ctf4 mutants display a chromosome segregation phenotype, and Ctf4 is known to bind both CMG and Pol alpha, linking them together (Miles and Formosa, 1992; Simon et al., 2014). Tof1, Csm3 and Mrc1 are involved in a replication checkpoint pathway, although how they function at the fork remains largely unknown.
Recent biochemical studies have reconstituted leading and lagging strand replication in vitro in the absence of Ctf4, Mcm10, Tof1, Mrc1 and Csm3, and therefore these particular proteins do not appear to be required for the central actions of leading and lagging strand synthesis (Georgescu et al., 2014a; Georgescu et al., 2015).
The current view of the architecture of the eukaryotic replisome is shown in Fig. 8b. CMG encircles the leading strand, as discussed above (Bochman and Schwacha, 2008; Bochman and Schwacha, 2009; Ilves et al., 2012; Moyer et al., 2006). Studies with pure proteins have shown a direct connection between CMG and Pol epsilon, consistent with Pol epsilon acting on the leading strand (Langston et al., 2014). Indeed, in vitro, Pol epsilon is much more active than Pol delta with CMG on the leading strand, while Pol delta is more active than Pol epsilon with CMG on the lagging strand (Georgescu et al., 2014a). Unlike the bacterial replisome, in which the clamp loader organizes the replisome, there is no evidence that the eukaryotic clamp loader RFC (Replication Factor C) travels with the replisome (e.g., RFC is not pulled down with the RPC). CMG links the lagging strand Pol alpha to the replisome through the Ctf4 trimer, and Pol alpha-primase is a component of the RPC, indicating that, unlike the primase of E. coli, it travels with the replisome (Gambus et al., 2009; Simon et al., 2014). The lagging strand Pol delta has not been reported to bind CMG, nor is it part of the RPC, and therefore lagging strand looping may not occur. Okazaki fragments in eukaryotic cells are only 100-200 nucleotides long, much shorter than in bacteria. The PCNA clamp helps Pol delta perform strand displacement synthesis into the RNA portion of the downstream Okazaki fragment, and PCNA recruits both the endonuclease (Fen1) that excises the displaced single-stranded RNA flap and the ligase that seals Okazaki fragments together (Burgers, 2009). Each of these proteins binds PCNA, and thus a PCNA trimer might bind all three proteins simultaneously (Burgers, 2009). The lagging strand ssDNA is coated by RPA (Replication Protein A), a ssDNA binding protein that serves a function analogous to that of bacterial SSB. Both RPA and SSB bind DNA using OB folds: four within the SSB tetramer (one per subunit) and four DNA-binding OB folds within the RPA heterotrimer.
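The difference in Okazaki fragment length has a simple quantitative consequence. Assuming, for illustration, uniform fragment lengths across the 1-2 kb (bacterial) and 100-200 nt (eukaryotic) ranges given in the text, the Python sketch below estimates how many fragments, and hence how many priming, clamp-loading and maturation events, are needed per megabase of lagging strand. The helper name `okazaki_fragments_per_mb` is hypothetical.

```python
def okazaki_fragments_per_mb(fragment_length_nt: float) -> float:
    """Number of Okazaki fragments needed to synthesize 1 Mb of lagging strand,
    assuming every fragment has the same length (a simplification)."""
    if fragment_length_nt <= 0:
        raise ValueError("fragment_length_nt must be positive")
    return 1_000_000 / fragment_length_nt


# Fragment-length ranges (nt) quoted in the text.
LENGTH_RANGES_NT = {
    "bacteria (E. coli)": (1_000, 2_000),
    "eukarya": (100, 200),
}

for domain, (shortest, longest) in LENGTH_RANGES_NT.items():
    fewest = okazaki_fragments_per_mb(longest)  # longer fragments -> fewer of them
    most = okazaki_fragments_per_mb(shortest)
    print(f"{domain}: {fewest:,.0f}-{most:,.0f} fragments per Mb of lagging strand")
```

Eukaryotic cells would thus need roughly ten times more priming and maturation events per megabase than bacteria (about 5,000-10,000 versus 500-1,000 fragments). In the picture described above for Pol III and Pol delta, each completed fragment also leaves a clamp behind on the DNA.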
Among the core replication proteins (Table I), only the clamp and clamp loader are clearly conserved in all three domains of life, and thus these components were likely present in LUCA before the divergence of bacteria, archaea and eukaryotes. An obvious possible use of the sliding clamp in LUCA was to provide processivity for the replicative polymerase. However, a much wider role for the sliding clamp may be proposed (as in Kelman and Hurwitz, 1998), based on the fact that the clamp interacts with numerous factors in modern-day cells. The reasoning is as follows: any protein that functions on DNA and interacts with a clamp that encircles DNA will be tethered to DNA in a mobile fashion, and can scan along the DNA to locate its proper site of action. Scanning along linear DNA would, in effect, be equivalent to increasing the concentration of the protein in the cell. In other words, without a clamp, the protein would need to be present in much higher amounts to locate its site of action by 3D diffusion at the same rate as it would by 1D scanning along DNA. Thus, by evolving a clamp loader and a single promiscuous clamp that binds many different DNA metabolic proteins, a cell could increase the effective concentration of a whole host of proteins.
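The effective-concentration argument can be illustrated with a standard unit conversion. The Python sketch below converts molecule copy numbers into molar concentration and compares a freely diffusing protein with one confined near DNA. The cell volume of about 1 fL (a commonly used round figure for E. coli) and the 10 nm confinement radius are illustrative assumptions, not values from this review, and treating a clamp-tethered protein as confined to a fixed sphere is a deliberate simplification, since a real clamp slides along DNA.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def copies_to_molar(copies: float, volume_liters: float) -> float:
    """Molar concentration of `copies` molecules dispersed in `volume_liters`."""
    return copies / (AVOGADRO * volume_liters)


def sphere_volume_liters(radius_nm: float) -> float:
    """Volume of a sphere of radius `radius_nm`, in liters (1 nm = 1e-8 dm)."""
    radius_dm = radius_nm * 1e-8
    return 4.0 / 3.0 * math.pi * radius_dm ** 3


CELL_VOLUME_L = 1e-15    # assumption: ~1 fL, a round figure for an E. coli cell
TETHER_RADIUS_NM = 10.0  # assumption: protein held within ~10 nm of its clamp

free = copies_to_molar(1, CELL_VOLUME_L)
tethered = copies_to_molar(1, sphere_volume_liters(TETHER_RADIUS_NM))
print(f"1 free copy per cell: {free * 1e9:.2f} nM")
print(f"1 copy within {TETHER_RADIUS_NM:.0f} nm: {tethered * 1e6:.0f} uM")
print(f"fold difference: {tethered / free:,.0f}x")
```

Under these assumptions a single free copy corresponds to about 1.7 nM, whereas the same copy confined within 10 nm sits at roughly 0.4 mM locally, a difference of about five orders of magnitude that 3D diffusion alone could match only with vastly more protein.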
The sliding clamp was discovered in the field of DNA replication, where it provides a dramatic increase in the processivity of the replicative DNA polymerase of E. coli (Stukenberg et al., 1991). It is therefore natural to assume that a clamp is required for a processive polymerase. However, we now know of several viral and phage DNA polymerases that do not require a clamp and clamp loader for their remarkably high processivity. In fact, to these authors’ knowledge, the only phages/viruses that encode their own clamp and clamp loader are the T-even family bacteriophages. Other viral/phage polymerases are highly processive on their own, or rely on a single accessory factor that aids processivity but is not a ring. An excellent example is the phi29 bacteriophage DNA polymerase, which is highly processive as a single protein (Salas, 2009). The phi29 DNA polymerase can undergo rapid and processive strand displacement synthesis without help from any accessory factor, and it works so well that industry has employed it for whole genome amplification (Dean et al., 2002). The herpes simplex virus 1, vaccinia poxvirus and T7 phage polymerases are also highly processive (they bind one accessory factor but do not use a ring or clamp loader) (Kornberg and Baker, 1992). As another example, RNA polymerase has long been known to be highly processive (Kornberg and Baker, 1992). Hence, sliding clamps are not inherently required for a polymerase to be highly processive. Following this line of reasoning, we have proposed that the replicative polymerases of modern-day cells primarily use the clamp for another purpose: to mark newly replicated DNA and distinguish it from old parental DNA (Georgescu et al., 2015). For example, it is of utmost importance that mismatch repair distinguish the parental strand, which carries the correct nucleotide, from the newly replicated strand, which contains the incorrect nucleotide.
The front and back surfaces of sliding clamps are structurally distinct (see Fig. 6), and clamps are loaded onto a primed 3’ terminus in a specific orientation. When the clamp is loaded onto newly replicated DNA for replisome function, it therefore provides directional information to mismatch repair proteins that must distinguish the new strand from the old strand in the daughter duplexes. Although E. coli uses DNA methylation to direct mismatch repair to the new strand, most bacteria lack this methylation system and, like eukaryotic cells, use the clamp to direct mismatch repair to the new strand (Kunkel and Erie, 2005; Lenhart et al., 2015; Modrich, 2006). Another important function of clamps that mark newly replicated DNA is the assembly of nucleosomes: the Caf1 factor that assembles nucleosomes on daughter strands requires PCNA for function (Sharp et al., 2001; Shibahara and Stillman, 1999).
One way to ensure that clamps are placed on, and thus mark, newly replicated DNA is to evolve DNA polymerases that simply cannot function without a clamp. In this view, the “clamp-dependent” DNA polymerase must also periodically leave its clamp, let it diffuse away along DNA, and then associate with a new clamp (after the clamp loader has assembled a new clamp on the primed site), thereby populating the product DNA with clamps. This is an observed characteristic of both E. coli Pol III (Stukenberg et al., 1994) and eukaryotic Pol delta (Langston and O'Donnell, 2008). Thus we propose that replicative DNA polymerases evolved to become “addicted” to sliding clamps as a mechanism that ensures that newly replicated DNA is marked by sliding clamps.
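The clamp-marking proposal can be explored with a toy stochastic model. In the Python sketch below, which is a purely illustrative construction rather than a model from the cited studies, a clamp-dependent polymerase leaves its clamp with a fixed, hypothetical probability `release_prob` after each nucleotide and continues on a newly loaded clamp, while every clamp left behind stays on the product. The example values of `release_prob` are chosen only so that the mean spacing between clamps falls in the bacterial (1-2 kb) and eukaryotic (100-200 nt) Okazaki fragment ranges; real clamp exchange on the lagging strand is coupled to fragment completion rather than being random.

```python
import random


def clamps_left_on_product(length_nt: int, release_prob: float, seed: int = 0) -> int:
    """Toy model of a clamp-dependent polymerase that periodically swaps clamps.

    Assumptions (hypothetical, for illustration only):
    - synthesis starts on one freshly loaded clamp;
    - after each nucleotide the polymerase leaves its clamp with probability
      `release_prob`, and a newly loaded clamp lets synthesis continue;
    - every clamp the polymerase leaves behind stays on the product DNA.
    Returns the total number of clamps on the product, including the one
    still bound to the polymerase at the end.
    """
    if not 0.0 <= release_prob <= 1.0:
        raise ValueError("release_prob must be between 0 and 1")
    rng = random.Random(seed)  # seeded for reproducible runs
    clamps = 1
    for _ in range(length_nt):
        if rng.random() < release_prob:
            clamps += 1  # old clamp stays behind; polymerase moves to a new one
    return clamps


if __name__ == "__main__":
    length = 100_000
    for p in (0.0, 1 / 1_500, 1 / 150):
        n = clamps_left_on_product(length, p)
        print(f"release_prob={p:.5f}: {n} clamps, ~1 per {length // n:,} nt")
```

Lowering `release_prob` spaces the clamps further apart; in this picture, the density of clamps on the product, and therefore how thoroughly new DNA is marked, is set by how often the polymerase abandons its clamp.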
Viruses may have predated cells (Forterre et al., 2004; Koonin et al., 2015; Leipe et al., 1999), thereby providing a large pool of genetic diversity from which homologous recombination could draw genes, ultimately splicing them together into a genome that encodes sufficient chemistry to enable a growing, self-reproducing cell. The translation and transcription pathways of all cells are performed by homologous machineries, indicating that LUCA (the last universal common ancestor cell) had settled on solutions to these processes and handed them down to all modern-day cells. The evolution of the replication process appears far different, as the core enzymes of replication are not homologous in bacteria and archaea/eukarya. The helicase and primase components of the replication machinery may not have been present in LUCA, as they are only required when both strands of duplex DNA are replicated simultaneously. Instead, LUCA may have had a simple replication system similar to those of numerous modern-day viruses and phages that replicate one strand at a time and require no primase or helicase. Interestingly, the sliding clamps and clamp loaders are homologous in all cells and thus were likely present in LUCA. It is possible that sliding clamps had other functions in LUCA beyond replication, and that the replicative polymerases of modern-day cells use sliding clamps as a method to mark newly replicated DNA for DNA repair and packaging.
This work was supported by NIH grant GM115809 and by the Howard Hughes Medical Institute.
Declaration of Interest
This manuscript was supported by NIH grant GM115809 and by the Howard Hughes Medical Institute. The authors report no competing interests or conflicts of interest in this review.