Coordinated replication of eukaryotic nuclear genomes is asymmetric, with copying of a leading strand template preceding discontinuous copying of the lagging strand template. Replication is catalyzed by DNA polymerases α, δ and ε, enzymes that are related yet differ in physical and biochemical properties, including fidelity. Recent studies suggest that Pol ε is normally the primary leading strand replicase, whereas most synthesis by Pol δ occurs during lagging strand replication. New studies show that replication asymmetry can generate strand-specific genome instability resulting from biased deoxynucleotide pools and unrepaired ribonucleotides incorporated into DNA during replication, and that the eukaryotic replication machinery has evolved to most efficiently correct those replication errors that are made at the highest rates.
Three main processes contribute to replication fidelity. (i) Replicative DNA polymerases almost always select the correct deoxynucleoside triphosphate (dNTP) for incorporation onto a correctly aligned primer-template. (ii) Rare mismatches made during replication can be excised by the 3´ to 5´ exonucleases associated with certain replicases. Nucleotide selectivity and proofreading depend on the relative and absolute concentrations of the dNTPs, respectively. (iii) Rare errors that escape proofreading can be corrected by mismatch repair (MMR). Operating in series, these processes generate so few replication errors that DNA-based organisms incur much less than one spontaneous mutation per genome per generation. This accuracy is remarkable given the enormous complexity of replication. Part of that complexity derives from the fact that the two DNA strands are antiparallel, yet DNA polymerases synthesize DNA only in the 5´ to 3´ direction. Coordinated replication of the two strands is therefore intrinsically asymmetric, with copying of a leading strand template preceding discontinuous copying of the complementary lagging strand template [3,4]. During replication of the eukaryotic nuclear genome, which is the focus of this review, the lagging strand is synthesized as Okazaki fragments of about 200–300 nucleotides, which requires an enormous amount of end processing to complete replication. This review considers studies from the last few years on the roles of Pols δ and ε in copying the leading and lagging strand templates, on strand-specific genome instability resulting from asymmetric replication, and on how the balance between generating and correcting errors achieves high fidelity replication of both strands.
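Because the three fidelity steps act in series, their contributions multiply. The short Python sketch below makes that arithmetic explicit. The three per-step frequencies are assumed order-of-magnitude placeholders chosen for illustration, not values reported in the studies reviewed here, and the ~12 Mb genome size is only an approximation of the yeast nuclear genome.

```python
# Assumed, order-of-magnitude placeholders (errors per nucleotide copied);
# these are illustrative only, not measured values.
SELECTIVITY_ERROR = 1e-5    # misinsertion frequency after nucleotide selection
PROOFREADING_ESCAPE = 1e-2  # fraction of misinsertions not excised by the 3'->5' exonuclease
MMR_ESCAPE = 1e-2           # fraction of remaining mismatches not corrected by MMR


def overall_error_rate(selectivity: float, proofreading_escape: float, mmr_escape: float) -> float:
    """Steps operate in series, so the surviving error frequency is their product."""
    return selectivity * proofreading_escape * mmr_escape


def mutations_per_genome(error_rate: float, genome_size_bp: int) -> float:
    """Expected number of uncorrected errors per replication of the whole genome."""
    return error_rate * genome_size_bp


rate = overall_error_rate(SELECTIVITY_ERROR, PROOFREADING_ESCAPE, MMR_ESCAPE)
per_genome = mutations_per_genome(rate, 12_000_000)  # roughly a yeast nuclear genome
print(f"error rate per nucleotide: {rate:.0e}")
print(f"expected errors per genome replication: {per_genome:.3f}")
```

With these placeholder inputs the product is about 10⁻⁹ per nucleotide, or roughly 0.01 errors per genome replication, illustrating how three imperfect steps in series can yield much less than one mutation per genome.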
Replication of nuclear genomes is initiated at replication origins spaced at intervals of about 30–100 kilobases. Two replication forks assemble at each origin and travel in opposite directions until they merge with forks migrating from adjacent origins. Three eukaryotic DNA polymerases are required for replication: Pol α, Pol δ and Pol ε [4,5]. Their catalytic subunits share the sequence homology of B family polymerases, they contain the fingers, palm and thumb subdomains typical of most polymerases, and they are thought to share a common two-metal-ion catalytic mechanism for polymerization. Nonetheless, Pols α, δ and ε differ substantially in subunit composition and biochemical properties. For example, Pol α is the least processive and also the least accurate, because it lacks an intrinsic proofreading exonuclease. Both properties are consistent with an essential but limited role in initiating replication at origins and initiating Okazaki fragments. Pol α extends RNA primers synthesized by its associated primase activity until sufficient duplex DNA is created to allow a switch to Pol δ and/or Pol ε. Pols δ and ε are better suited for extensive DNA synthesis because they are highly processive when operating with their accessory proteins. Their catalytic subunits possess 3´ exonucleolytic proofreading activity, so they are more accurate than Pol α. Studies in vitro indicate that yeast Pol ε is more accurate than yeast Pol δ for single base-base mismatches and single base deletion mismatches. Because Pol δ and Pol ε have similar nucleotide selectivity, the higher fidelity of Pol ε likely reflects more efficient proofreading.
While Pol α clearly initiates DNA synthesis at origins and on Okazaki fragments, the roles of Pols δ and ε in leading and lagging strand replication in vivo have been difficult to discern, in part because mutations in the yeast genes encoding these polymerases (POL3 for Pol δ and POL2 for Pol ε) are lethal or nearly so [9,10]. The situation changed when structure-function studies identified novel variants of yeast Pol α, Pol δ [12–14] and Pol ε that could be used to track where each polymerase synthesizes DNA in a cell. Each variant carries a replacement for a conserved leucine (α/δ) or methionine (ε) at the polymerase active site. These variants retain high replication capacity, allowing normal cell growth. They also have low fidelity that elevates spontaneous mutation rates in vivo, thereby identifying the polymerase responsible for a given replication error. Most importantly, they have error signatures that distinguish whether an error was made while copying the leading or the lagging strand template. For example, of the two mismatches that could give rise to T-A to C-G transitions, L612M Pol δ generates T-dGTP mismatches at a ≥28-fold higher rate than A-dCTP mismatches during DNA synthesis in vitro (Fig. 1A, top). In an MMR-defective pol3-L612M msh2Δ strain, which monitors uncorrected replication errors made by L612M Pol δ in vivo, T to C substitutions at base pair 97 in the URA3 gene arose at a high rate (58 × 10⁻⁷) when URA3 was located close to a well characterized replication origin (ARS306) and distant from the next closest origin. Given the direction of fork movement as base pair 97 is replicated, and inferring that the T to C substitution results from a T-dGTP mismatch, the error would have been generated by Pol δ during lagging strand replication (Fig. 1A, OR1). Importantly, T to C substitutions at base pair 97 occurred at a much lower rate (3.1 × 10⁻⁷) when the orientation of URA3 relative to ARS306 was reversed (OR2). In this orientation the T-dGTP mismatch would be a leading strand error, so the roughly 19-fold lower rate implies that Pol δ has at most a minor role in replicating the leading strand template. Applying similar logic to other orientation-dependent mutations across the URA3 open reading frame (e.g., deletion of a T-A base pair, Fig. 1A, among other events) supported the idea that Pol δ primarily copies the lagging strand template.
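The strand assignment above rests on two facts: polymerases extend DNA only 5´ to 3´, and inverting a reporter gene moves each of its bases to the opposite strand. A minimal Python sketch of that logic follows. The names `Strand`, `Fork`, `replication_mode` and `flip` are hypothetical, and placing the templating T on the top strand with a rightward fork is a placeholder chosen to reproduce the inference in the text, not the actual coordinates of URA3 relative to ARS306.

```python
from enum import Enum


class Strand(Enum):
    TOP = "top"        # strand written 5'->3' left to right on the map
    BOTTOM = "bottom"  # complementary strand, 5'->3' right to left


class Fork(Enum):
    RIGHTWARD = "rightward"  # fork moving toward higher map coordinates
    LEFTWARD = "leftward"


def replication_mode(template: Strand, fork: Fork) -> str:
    """Return 'leading' or 'lagging' for a template strand copied by a fork.

    New DNA grows 5'->3', so the polymerase reads its template 3'->5'.
    A template is copied continuously (leading) only if reading it 3'->5'
    moves in the same direction as the fork. For a rightward fork that is
    the BOTTOM strand, so the TOP strand is the lagging strand template.
    """
    if fork is Fork.RIGHTWARD:
        return "lagging" if template is Strand.TOP else "leading"
    return "leading" if template is Strand.TOP else "lagging"


def flip(strand: Strand) -> Strand:
    """Inverting the reporter places each of its bases on the other strand."""
    return Strand.BOTTOM if strand is Strand.TOP else Strand.TOP


# Hypothetical placement: templating T on the top strand in OR1,
# reporter to the right of the origin so the fork moves rightward.
or1_template = Strand.TOP
fork = Fork.RIGHTWARD
print("OR1:", replication_mode(or1_template, fork))        # lagging
print("OR2:", replication_mode(flip(or1_template), fork))  # leading
print(f"OR1/OR2 rate ratio: {58 / 3.1:.1f}")
```

Because the same base pair switches between lagging and leading strand templates when the reporter is inverted, a large rate difference between OR1 and OR2 (58/3.1, about 19-fold) is what points to a strand-specific polymerase.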
A more recent study extended this approach to the whole genome by sequencing 16 genomes from the pol3-L612M msh2Δ strain. This identified 1206 single base substitutions distributed evenly across all 16 chromosomes. The vast majority were consistent with formation of T-dGTP and G-dTTP mismatches (Fig. 1B, top). The distribution of these substitutions was strikingly asymmetric relative to the 274 functional replication origins in the yeast genome (Fig. 1B), and their strand bias varied as predicted if Pol δ primarily replicates the lagging strand template.
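The genome-wide analysis applies the same reasoning to many sites at once. The Python sketch below assumes, as a simplification, that each site is copied by the fork arriving from the nearest origin (real fork directions depend on origin efficiency and timing), that substitutions are reported as top-strand base changes, and that each one arose from a T-dGTP or G-dTTP mismatch. The origin positions and mutations in the usage example are invented toy data, and all function names are hypothetical.

```python
from bisect import bisect_left

# Top-strand change -> strand holding the templating base, assuming the
# change arose from a T-dGTP (template T -> C) or G-dTTP (template G -> A) mismatch.
TEMPLATE_STRAND = {
    ("T", "C"): "top",     # T on the top strand mispaired with dGTP
    ("A", "G"): "bottom",  # same event read on the top strand: the T was on the bottom
    ("G", "A"): "top",     # G on the top strand mispaired with dTTP
    ("C", "T"): "bottom",  # the G was on the bottom strand
}


def fork_direction(pos: int, origins: list[int]) -> str:
    """Fork direction at pos, assuming it comes from the nearest origin (sorted, non-empty)."""
    i = bisect_left(origins, pos)
    candidates = origins[max(i - 1, 0): i + 1]
    nearest = min(candidates, key=lambda o: abs(o - pos))
    return "rightward" if pos >= nearest else "leftward"


def is_lagging(template: str, fork: str) -> bool:
    """Top strand is the lagging template for a rightward fork; bottom for a leftward fork."""
    return (template == "top") == (fork == "rightward")


def lagging_fraction(mutations: list[tuple[int, str, str]], origins: list[int]) -> float:
    """Fraction of classifiable substitutions consistent with lagging strand errors."""
    origins = sorted(origins)
    calls = []
    for pos, ref, alt in mutations:
        template = TEMPLATE_STRAND.get((ref, alt))
        if template is None:
            continue  # not explained by a T-dGTP or G-dTTP mismatch
        calls.append(is_lagging(template, fork_direction(pos, origins)))
    return sum(calls) / len(calls) if calls else float("nan")


# Invented toy data: two origins and a handful of top-strand substitutions.
origins = [10_000, 60_000]
mutations = [
    (15_000, "T", "C"),
    (5_000, "A", "G"),
    (70_000, "G", "A"),
    (55_000, "C", "T"),
    (40_000, "T", "C"),
    (20_000, "A", "T"),  # not classifiable under this hypothesis
]
print(f"lagging-consistent fraction: {lagging_fraction(mutations, origins):.2f}")
```

A fraction near 1 across real data would be the genome-wide counterpart of the URA3 orientation result; the nearest-origin rule is the weakest assumption here and would need replacing with measured fork directionality in practice.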
To examine the role of Pol ε in replication, mutagenesis in URA3 was monitored in MMR-proficient pol2-M644G strains, thereby scoring Pol ε errors that escape MMR. Two hotspots, both for T-A to A-T substitutions, were observed in one URA3 orientation but not the other (Fig. 1A shows the hotspot at base pair 686). These results, together with the fact that M644G Pol ε generates T-dTTP mismatches at a ≥39-fold higher rate than A-dATP mismatches in vitro (Fig. 1A, top), imply that Pol ε participates in replicating the leading strand template. Combined with the evidence that Pol δ has at most a minor role in leading strand replication, the data further imply that Pol ε may be the major leading strand replicase. This leads to a simple model (Fig. 1C) wherein Pols ε and δ primarily replicate the leading and lagging strands, respectively. The model is consistent with a report that Pol δ and Pol ε proofread errors on opposite DNA strands, and with more recent evidence suggesting that lagging strand replication errors generated by L868M Pol α are proofread by the 3´ exonuclease of Pol δ but not by that of Pol ε. Similar studies have not yet been performed in higher eukaryotes, but certain asymmetric error signatures of human Pol ε and of variants of human Pol δ may be useful for this purpose.
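The in vitro biases quoted above (≥28-fold for L612M Pol δ, ≥39-fold for M644G Pol ε) are what justify assigning an observed substitution to one of its two possible mismatches. A short Python sketch follows, assuming that both candidate mismatches have equal opportunity to form, so that their contributions scale with their error rates; under that assumption an r-fold bias means the preferred mismatch accounts for at least r/(r + 1) of the events. The function name is hypothetical.

```python
def min_fraction_from_preferred(fold_bias: float) -> float:
    """Lower bound on the share of a substitution due to the preferred mismatch.

    Assumes (hypothetically) equal opportunity for both candidate mismatches,
    so shares are proportional to error rates. r / (r + 1) increases with r,
    so a bias of at least r-fold gives a share of at least r / (r + 1).
    """
    if fold_bias <= 0:
        raise ValueError("fold_bias must be positive")
    return fold_bias / (fold_bias + 1)


for name, bias in [
    ("L612M Pol δ, T-dGTP vs A-dCTP", 28),
    ("M644G Pol ε, T-dTTP vs A-dATP", 39),
]:
    print(f"{name}: >= {min_fraction_from_preferred(bias):.3f}")
```

Under this assumption, more than 96% of the relevant substitutions would reflect the preferred mismatch, which is why the strand carrying the templating T can be inferred with reasonable confidence.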
Eukaryotic chromosomes vary widely in sequence composition, they are highly organized with respect to transcription and chromatin content, and they are constantly assaulted by endogenous metabolites and external environmental stresses that produce a variety of DNA lesions. Each of these variables may influence the composition of replication forks, so a simple model is unlikely to apply in all circumstances [9,10]. When replication stalls at a lesion or a natural fork barrier, the fork can be remodeled to resolve the problem in several ways. One solution is translesion synthesis by one or more specialized DNA polymerases, which may operate at the fork and/or during filling of gaps left behind when replication restarts downstream of the lesion. Restart of leading strand synthesis by Pol α-primase followed by a switch to Pol δ could allow these enzymes to contribute to leading strand replication.
The concentrations of the four dNTPs required for eukaryotic replication are not equal even in normal cells [23,24], and this natural imbalance can be exacerbated by mutations in RNR1, the gene encoding a subunit of ribonucleotide reductase (RNR), an enzyme involved in the production and regulation of dNTPs. Pool imbalances can be mutagenic by promoting polymerase misinsertion, and increased concentrations of one or more dNTPs can also reduce proofreading efficiency by promoting mismatch extension. For these reasons, mutational specificity can be predicted from the nature of the dNTP pool changes and the sequence of the template being replicated. Recently, two studies [25,26] used this predictability to infer strand-specific mutagenesis driven by two different pool imbalances in yeast strains with amino acid changes in the allosteric specificity site of RNR. In one strain (rnr1-Y285A), elevated dTTP and dCTP concentrations neither reduced proliferation nor activated the S-phase checkpoint, but strongly increased mutation rates in specific sequence contexts. The mismatches inferred to be responsible for these mutations occurred during both leading and lagging strand replication (Fig. 2A, top), largely with the specificity predicted by the nature of the dNTP pool changes. In contrast, a strain (rnr1-Q288A) with a different dNTP pool imbalance progressed more slowly through S phase, likely because of its low dCTP concentration, and in this strain the S-phase checkpoint was activated. Surprisingly, the mutagenesis in this strain was inferred to result only from lagging strand replication (Fig. 2A, bottom). These results correlate with a suggested role for Pol ε in S-phase checkpoint control, again implying that Pol ε is the leading strand replicase. The checkpoint may selectively reduce leading strand replication errors by affecting one or both error correction steps, proofreading and MMR.
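The prediction of mutational specificity from pool changes can be sketched for the selectivity component alone. Assuming the polymerase's intrinsic selectivity is unchanged, the odds of a given misinsertion scale with the ratio of the incorrect to the correct dNTP, so a pool change shifts each mismatch by the fold change of the incorrect dNTP divided by that of the correct one. The fold changes below are hypothetical numbers chosen only to mimic elevated dTTP and dCTP; they are not measured rnr1-Y285A pools, and the sketch ignores the separate effect of absolute concentrations on mismatch extension and proofreading.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def misinsertion_pressure(fold_change: dict[str, float]) -> dict[tuple[str, str], float]:
    """Relative change in competition between each incorrect dNTP and the correct one.

    fold_change maps 'A', 'C', 'G', 'T' (dATP, dCTP, dGTP, dTTP) to the pool
    level relative to wild type. For each template base the correct dNTP is its
    complement; every other dNTP is a potential misinsertion whose odds scale
    with [incorrect]/[correct], assuming intrinsic selectivity is unchanged.
    """
    pressure = {}
    for template, correct in COMPLEMENT.items():
        for incoming in "ACGT":
            if incoming != correct:
                pressure[(template, incoming)] = fold_change[incoming] / fold_change[correct]
    return pressure


# Hypothetical fold changes mimicking "high dTTP and dCTP"; not measured values.
pools = {"A": 1.0, "C": 20.0, "G": 1.0, "T": 10.0}
ranked = sorted(misinsertion_pressure(pools).items(), key=lambda kv: kv[1], reverse=True)
for (template, incoming), score in ranked[:3]:
    print(f"template {template} : incoming d{incoming}TP  x{score:g}")
```

With these inputs, mismatches in which dCTP or dTTP is misinserted opposite template T or C rise to the top, which is the kind of context-specific prediction the pool-imbalance studies tested.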
Ribonucleotides (rNMPs) in DNA pose a potential risk to genome stability because their reactive 2´ hydroxyl sensitizes the DNA backbone to strand cleavage. This risk is relevant because rNTP concentrations in eukaryotic cells are much higher than dNTP concentrations [23,28]. Although most DNA polymerases discriminate well against rNTP insertion into DNA, a recent study showed that in reactions containing cellular rNTP and dNTP concentrations, yeast Pols α, δ and ε incorporate substantial numbers of ribonucleotides into DNA during synthesis in vitro. Soon thereafter, ribonucleotides were shown to be incorporated by yeast Pol ε during replication in vivo and to be normally removed by RNase H2-dependent repair. Failure to repair these ribonucleotides in RNase H2-defective strains slightly increased replicative stress and strongly increased the rate of 2–5 base pair deletions in repetitive sequences. In some cases (e.g., deletion of a CA dinucleotide, Fig. 2B), the deletion rate was higher in one orientation of URA3 than in the other. These data suggest strand-specific rNMP incorporation by Pol ε, once again underscoring the asymmetry of replication. Interestingly, the rate of 2–5 base pair deletions in the pol2-M644G rnh201 strain was not strongly increased by loss of mismatch repair. This suggests a model wherein rNMPs incorporated by M644G Pol ε during leading strand replication that are not removed by RNase H2 are processed outside the context of replication, in a manner that generates misaligned intermediates and thus short deletions. In support of this hypothesis, a recent study showed that topoisomerase 1 is required for formation of 2–5 base pair deletions in rnh201Δ strains. Topoisomerase 1 normally relieves DNA supercoils formed during transcription, and it can also cleave the backbone of DNA containing a single ribonucleotide. This cleavage creates DNA ends that must be processed to allow ligation, and it is this processing that is proposed to provide the opportunity for strand misalignments in repetitive sequences that ultimately yield the deletions.
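Why high rNTP concentrations matter even for a strongly discriminating polymerase can be shown with a simple competition model. If a polymerase prefers a dNTP over the matching rNTP by a discrimination factor D at equal concentrations, and the rNTP is present at an R-fold excess, the rNMP share of insertions is x/(1 + x) with x = R/D. The values in the Python sketch below are hypothetical, and the model lumps the four nucleotides together, although in cells each rNTP/dNTP pair has its own ratio and each polymerase its own discrimination.

```python
def rnmp_fraction(rntp_to_dntp_ratio: float, discrimination: float) -> float:
    """Probability that a given insertion uses the ribonucleotide.

    Competitive insertion: rate_r / rate_d = ratio / discrimination, where
    discrimination is the preference for the dNTP over the rNTP at equal
    concentrations. The ribonucleotide share is then x / (1 + x).
    """
    x = rntp_to_dntp_ratio / discrimination
    return x / (1 + x)


def rnmps_per_kb(rntp_to_dntp_ratio: float, discrimination: float) -> float:
    """Expected rNMPs incorporated per 1000 nucleotides synthesized."""
    return 1000 * rnmp_fraction(rntp_to_dntp_ratio, discrimination)


# Hypothetical values: 100-fold rNTP excess, 100,000-fold discrimination.
print(f"{rnmps_per_kb(100, 100_000):.2f} rNMPs per kb")
```

Even a 10⁵-fold preference for dNTPs leaves roughly one rNMP per kilobase when rNTPs are in 100-fold excess, which over a whole genome amounts to many thousands of ribonucleotides for RNase H2 to remove.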
Overall, these studies highlight three generally understudied subjects: discrimination against an incorrect sugar during DNA replication, the roles of RNases H in repair of ribonucleotides in DNA, and mutagenesis due to aberrant processing of unrepaired rNMPs in the genome. Among the many interesting issues remaining to be addressed is the intriguing speculation that the transient presence of rNMPs in the genome may have important signaling functions. This seems plausible because rNMPs in DNA alter helix parameters, which may affect protein binding. The cost of rNMPs as signals may be low, because large numbers of unrepaired rNMPs can be tolerated in the yeast nuclear genome with only a mild effect on growth [30,34].
Early genetic studies in E. coli revealed that mismatch repair most efficiently corrects the mismatches most commonly made by the major replicase, DNA polymerase III. This type of ‘reciprocity’ is also seen for replication errors generated by yeast Pol δ: Pol δ most frequently creates mismatches that could result in transitions and single base deletions in mononucleotide runs, and it is these errors that MMR corrects most efficiently. Does reciprocity extend even further? Might MMR correct lagging strand replication errors made by the less accurate Pol α more efficiently than those made by the more accurate Pol δ? One reason to consider this possibility is that errors made by Pol α lie closer to the 5´ ends of Okazaki fragments than do errors made by Pol δ. This proximity may be relevant because 5´ DNA ends can serve as signals for MMR in vitro, and because, working in conjunction with the asymmetric PCNA sliding clamp, 5´ DNA ends have been suggested to act as strand discrimination signals in vivo. When this type of reciprocity was tested in yeast using the Leu to Met mutator alleles of Pol δ and Pol α, the results (Fig. 3A) suggested that apparent MMR efficiencies may indeed be higher for base-base mismatches made by Pol α than for the same mismatches made by Pol δ. This supports a model (Fig. 3B) wherein the 5´ ends of Okazaki fragments serve as strand discrimination signals. The proximity of a Pol α error to a 5´ DNA end may increase the efficiency of the general MMR machinery, just as the efficiency of E. coli MMR depends on the distance between the mismatch and the strand discrimination signal. It is also possible that mismatches near the 5´ ends of Okazaki fragments are repaired by an MMR pathway specialized to protect the genome against the particularly abundant replication errors made by Pol α (Fig. 3B). This could involve mismatch removal via strand displacement, as recently observed in vitro, or MMR involving nucleases such as Exo1 or the exonuclease activity of yeast FEN1, a nuclease involved in Okazaki fragment maturation that has also been suggested to participate in Msh2-Msh6-dependent MMR.
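Apparent MMR efficiency for errors made by a given polymerase can be estimated from mutation rates measured in MMR-proficient and MMR-defective strains. The Python sketch below shows two common ways to express it. The rates are hypothetical placeholders, not the yeast measurements summarized in Fig. 3A, and the interpretation assumes that losing MMR changes nothing except error correction; the class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class MutationRates:
    mmr_proficient: float  # mutation rate with MMR intact
    mmr_defective: float   # mutation rate when MMR is lost (e.g., an msh2Δ background)

    def fold_increase(self) -> float:
        """Apparent MMR efficiency expressed as the fold increase upon MMR loss."""
        return self.mmr_defective / self.mmr_proficient

    def fraction_corrected(self) -> float:
        """Share of errors removed by MMR, assuming MMR loss changes nothing else."""
        return 1 - self.mmr_proficient / self.mmr_defective


# Hypothetical rates for the same mismatch class made by two polymerases.
pol_alpha = MutationRates(mmr_proficient=1e-8, mmr_defective=1e-6)
pol_delta = MutationRates(mmr_proficient=1e-8, mmr_defective=2e-7)
for name, rates in [("Pol α", pol_alpha), ("Pol δ", pol_delta)]:
    print(f"{name}: {rates.fold_increase():.0f}-fold, {rates.fraction_corrected():.1%} corrected")
```

Comparing the fold increase rather than the absolute rates is what allows errors from a less accurate polymerase (more errors made) and a more accurate one to be placed on the same scale of correction efficiency.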
These studies support an important idea that emerged early in studies of MMR of replication errors, namely that mismatch repair may have “a special relation to the replication complex”. Replication and MMR use many of the same proteins and may be physically coupled. They may also be carefully coordinated to deal with nucleosomes, which are loaded onto DNA behind the replication fork and could potentially influence MMR efficiency. Several recent studies reveal a growing interest in this topic [43–46].
The studies reviewed here illustrate a growing appreciation of the intricate enzymology required to asymmetrically replicate large and complex eukaryotic nuclear genomes with the accuracy needed to preserve the information content of both strands. The ultimate outcome reflects not just the polymerases discussed here, but also their highly differentiated accessory proteins and the coordination of replication with repair, recombination, transcription, nucleosome loading and chromatin status. This complexity makes it improbable that genome stability can be generally increased. It also provides many opportunities for destabilizing the genome, some undoubtedly yet to be discovered. Achieving a detailed understanding of these opportunities is motivated both by the beneficial consequences of mutation for organisms, such as evolution, phase variation at contingency loci in microbes and somatic hypermutation of immunoglobulin genes, and by the many associations between mutations and human diseases [49,50]. As just one example that may be related to asymmetric replication of the leading and lagging strands, defects in the 3´ exonuclease activities of Pols δ and ε both confer cancer susceptibility in mice, but with different ages of onset and different tissue specificity. Could this be related to differences in strand-specific mutagenesis and/or tissue-specific differences in replication origin firing during development?
I thank the many scientists whose work contributed to the topics considered here, but whose publications were not cited due to space limitations, for their understanding. Interested readers are encouraged to consult the recent articles cited here, where citations to many other important studies can be found. I also thank Peter Burgers and Andrei Chabes for thoughtful comments on this article. Research performed in my laboratory is supported by Projects Z01 ES065070 and Z01 ES065089 from the Division of Intramural Research, NIEHS, NIH.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.