Understanding how DNA polymerases process lesions remains fundamental to determining the molecular origins of mutagenic translesion bypass. We have investigated how a benzo[a]pyrene-derived N2-dG adduct, 10S (+)-trans-anti-[BP]-N2-dG ([BP]G*), is processed in Dpo4, the well-characterized Y-family bypass DNA polymerase. This polymerase has a slippage-prone spacious active site region. Experimental results in a 5′-C[BP]G*G-3′ sequence context reveal significant selectivity for dGTP insertion that predominantly yields −1 deletion extension products. A less pronounced error-prone non-slippage pathway that leads to full extension products with insertion of A > C > G opposite the lesion, is also observed. Molecular modeling and dynamics simulations follow the bypass of [BP]G* through an entire replication cycle for the first time in Dpo4, providing structural interpretations for the experimental observations. The preference for dGTP insertion is explained by a 5′-slippage pattern in which the unmodified G rather than G* is skipped, the incoming dGTP pairs with the C on the 5′-side of G*, and the −1 deletion is produced upon further primer extension which is more facile than nucleotide insertion. In addition, the simulations suggest that the [BP]G* may undergo an anti/syn conformational rearrangement during the stages of the replication cycle. In the minor non-slippage pathway, the nucleotide insertion preferences opposite the lesion are explained by relative distortions to the active site region. These structural insights, provided by the modeling and dynamics studies, augment kinetic and limited available crystallographic investigations with bulky lesions, by providing molecular explanations for lesion bypass activities over an entire replication cycle.
Understanding how DNA polymerases process lesions remains fundamental to determining the molecular origins of mutagenic translesion bypass. It is now widely appreciated that high-fidelity polymerases that are stalled by DNA lesions are replaced by translesion bypass polymerases that are capable of transiting the lesions (1–6). The structural differences between high-fidelity and translesion bypass polymerases that determine error-prone or error-free bypass of the lesions are of significant and fundamental interest since mutations can lead to cancer and other diseases (7). In order to gain insights into the effects of polymerase structure on translesion bypass on a molecular level, we have utilized a model system employing a bulky mutagenic DNA lesion derived from benzo[a]pyrene (BP), a widely studied environmental contaminant (7). Like many other polycyclic aromatic hydrocarbons, BP is metabolized in vivo to numerous metabolites (8–10), with the (+)-7(R),8(S),9(S),10(R) benzo[a]pyrene diol epoxide [(+)-anti-BPDE] being a predominant tumorigenic derivative of BP (11). The major stable lesion formed when (+)-anti-BPDE reacts with DNA in vitro (12) or in vivo (13, 14) is the 10S (+)-trans-anti-[BP]-N2-dG ([BP]G*) adduct (Figure 1A). As a representative high-fidelity polymerase, we investigated in a previous study (15) the A-family polymerase from the bacterium Bacillus stearothermophilus, specifically the fragment called BF (16). The model Y-family bypass polymerase from the archaeon Sulfolobus solfataricus (17), Dpo4, is investigated in the present work. Dpo4 belongs to the DinB branch of Y-family polymerases of which human Pol κ is also a member (18).
A number of crystal structures in the forms of binary and ternary complexes, with and without DNA lesions, have been published for these two enzymes [e.g. (16, 19–39)], and the availability of these structural data motivated our selection of these two polymerases. In BF, proceeding from the binary complex (enzyme with primer/template DNA) to the ternary complex (enzyme with primer/template DNA and dNTP) involves a closing motion of the fingers domain, with an induced-fit mechanism (40, 41) that is used to select for the correct incoming dNTP. The structure of Dpo4 is significantly different from BF, and has a pre-formed spacious and solvent-accessible active site that can simultaneously accommodate two templating bases (6, 25). Proceeding from the binary complex to the ternary complex does not involve a closing motion of the finger domain, and there is no analogous induced-fit mechanism for selecting the correct incoming dNTP (34). Furthermore, in BF, like in other high-fidelity polymerases, the minor groove side of the DNA duplex is tightly filled with enzyme amino acid residues, and constitutes a scanning track that is crucial for high fidelity and processivity (20, 42, 43). In contrast, Dpo4 has a large open pocket on the major groove side and a smaller one on the minor groove side of the evolving DNA duplex, and there is no scanning track on the minor groove side. A recent review discusses these features and their connection to lesion processing (44).
Our objectives are to delineate the distinguishing features of translesion bypass involving the model [BP]G* bulky adduct at or near the primer-template junction within the active site of Dpo4; we wish to gain a deeper understanding of the effects of base sequence context in which the lesions are embedded, and to compare these phenomena with the case for BF. We have previously employed molecular modeling and dynamics simulations to follow these interactions through an entire replication cycle in a 5′-CG*G-3′ sequence context in BF (15). In the present work, we use these computational approaches to investigate how Dpo4 processes the [BP]G* adduct through the same replication cycle (Figure 1B: Steps 1–4) in the same sequence context; we again considered both binary and ternary complexes and all four partners opposite the [BP]G* adduct during the dNTP insertion and extension steps (Figure 1B: Steps 2–4).
Our current experiments provide the foundation for the modeling studies and show that in the CG*G sequence context, Dpo4 bypasses the [BP]G* adduct relatively efficiently by predominantly utilizing a −1 deletion pathway that involves a misalignment of the primer strand at the active site (strand slippage). In the single dNTP insertion assays with the lesion site next to be replicated, dGTP is predominantly selected, followed by dATP, and then dCTP. There is no observable dTTP insertion. Further extension beyond the lesion site is more facile than nucleotide incorporation. In striking contrast, BF was shown to be mainly blocked by this adduct: while nucleotide insertion opposite the adduct was observed with purines favored over pyrimidines, further extension was strongly inhibited (15, 45). In addition, the base 5′-flanking the adduct affects the nucleotide selectivities in Dpo4, but not in BF (45). Our current modeling studies provide detailed dynamic molecular structures of these various intermediates in the multi-step process of replication past the [BP]G* adduct in Dpo4. The selectivity for dGTP is explained by a primer strand relocation or misalignment mechanism: an unusual 5′-slippage mechanism, in which the undamaged G rather than G* is skipped, and the incoming dGTP pairs with C, is suggested for the CG*G sequence context studied in this work. In a previous study with Dpo4 utilizing the 5′-TG*G-3′ sequence context as a template, we proposed an analogous mechanism for explaining the preferred selectivity of dATP insertion in single nucleotide insertion assays (46). In contrast, a study of dNTP insertion opposite G* in the 5′-AG*C-3′ sequence revealed little nucleotide selectivity under saturating conditions of dNTP concentrations (47). Primer strand slippage also accounts for the efficient translesion bypass that occurs mainly by the −1 deletion pathway in Dpo4. 
Based on experimental primer extension data reported here and relevant available experimental X-ray crystallographic data (25, 28, 34), we employed molecular dynamics simulation methods to analyze the structural features of the intermediate binary and ternary complexes over an entire replication cycle. This approach offers an integrated strategy for studying the dynamic properties of molecular interactions at the active sites of polymerases; it yields structural details about these intermediates that are not provided by the kinetics or yet observed by crystallographic methods.
[γ-32P] ATP (3000 Ci/mmol) was purchased from Perkin-Elmer Life Sciences, Inc. (Boston, MA). The dNTPs (dATP, dCTP, dGTP, and dTTP) were purchased from New England Biolabs, Inc. (Beverly, MA). Sulfolobus solfataricus DNA polymerase IV (Dpo4) was kindly provided by Dr. O. Rechkoblit and Dr. D.J. Patel. The site-specifically modified 43-mer template oligonucleotide strand, 5′-GAC TAC GTA CTG TCA CC G*GA CAC GCT ATC TGG CCA GAT CCG C-3′, was synthesized, purified and verified as described earlier (48).
Running-start and standing-start primer extension reactions were carried out as described in more detail previously (15, 46). Briefly, in the running-start experiments, the 43-mer templates were annealed to a 32P-labeled 22-mer primer (defined in Figure 2A), with the [BP]G* adduct (G*) positioned at template position 25 counted from the 3′ end of the template strand (CG*G template, Figure 2A). The running start experiments were carried out under the following conditions: 4.5 nM template strands, 10 nM Dpo4, and 100 μM dNTP in buffer solution (50 mM Tris-HCl, pH 8.0, 5 mM MgCl2, 1 mM DTT, 50 μg/mL BSA, 4% glycerol). The reaction was allowed to proceed at 37°C for selected times before being terminated with 5 μL of stop solution (20 mM EDTA, 95% formamide, 0.05% bromophenol blue, and 0.05% xylene cyanol). After the specified incubation times, the DNA samples were heated at 90°C for 5 min, chilled on ice and then applied to a 15–20% denaturing polyacrylamide gel containing 7M urea. The replication products were visualized by autoradiography and quantitatively analyzed by a Storm 840 phosphorimager and the Storm ImageQuant software (GE Healthcare). In the standing-start single dNTP insertion experiments, the 43-mer CG*G templates were annealed to a 32P-labeled 24-mer primer (Figure 2D). The one-step dNTP insertion assays were carried out under the following conditions at 37°C: 2 nM CG*G template/primer complex, 2 nM Dpo4, 500 μM of each dNTP, one at a time, and the incubation time was 5 min.
A two-phase gel electrophoresis method originally used by Shibutani (49) was adapted and significantly modified in order to sequence the primer extension products derived from the running-start experiments. The principle of our modified approach is outlined in Figure 2. The main difference from the original Shibutani method is that we synthesized a radioactively labeled 22-mer primer strand. This internally 32P-labeled 22-mer primer strand was prepared by extending an 18-mer primer annealed to a 22-mer template strand (5′-end labeled) that had the same sequence as the first 22-nucleotide sequence of the 43-mer template strand counted from the 3′-end (Figure 2A). The 18-mer was extended to a 22-mer by incubating the annealed 22-mer template•18-mer duplex (100 nM) in solution containing 500 nM of BF, 2 μM of dTTP and α-32P-dCTP, and 4 μM of dGTP at 37°C for 30 min, thus generating the internally 32P-labeled primer strand (the labeled C is underlined in Figure 2). These 22-mer primer oligonucleotides were then purified by denaturing gel electrophoresis, eluted, desalted and annealed with either a 43-mer unmodified template strand (CGG), or a 43-mer template strand containing a single [BP]G* adduct (CG*G, Figure 2A). The 22-mer primer strand was then extended using Dpo4 as described above (4.5 nM template strand, 10 nM Dpo4, and 500 μM of each of the four dNTPs at 37°C). The primer extension products were then subjected to 20% denaturing polyacrylamide gel electrophoresis in 7 M urea. The labeled 41–43-mer extended primer strands were then eluted from the gels and annealed with a 10-fold excess of unmodified and unlabeled 43-mer CGG template strands. These duplexes were then incubated with the restriction enzymes Tsp45I and AlwI, which sequence-specifically cleave the duplexes as shown in Figure 2B.
The incubations of the duplexes with the restriction enzymes Tsp45I and AlwI (4 units each) were carried out at 37 and 65°C, respectively, as described by the supplier (New England BioLabs Inc.). The cleavage products were separated by a two-phase gel system: the first 10 cm of the 82 cm plate consisted of 20% denaturing polyacrylamide gel, whereas the remaining 72 cm consisted of 20% native polyacrylamide gel. The gels were allowed to run overnight (~20 h) at 40 W and 2000 V. The migration patterns of the 16-mers were then compared to those of 16-mer standards 5′-ACGATAGCGTGTCZGT, with Z = A, G, C or T, where Z corresponds to the nucleotide inserted opposite G* during the primer extension reaction (Figure 2). If the polymerase skips G* during the translesion synthesis (TLS), the 15-mer 5′-ACGATAGCGTGTCGT is observed, which corresponds to a (−1) deletion (Del) or frameshift mutation. The 15-mer Del products are easily distinguishable from the 16-mers that represent point mutations. The different 16-mers with Z = A, G, C or T can be distinguished only if their mobilities are sufficiently different from one another.
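The relationship between the 16-mer standards and the 15-mer deletion standard can be sketched in a few lines of Python. This is only an illustration of the sequences quoted above; the function names are hypothetical, and the actual identification in this work is made by gel mobility rather than by string comparison.

```python
# Sequence flanking the variable position Z in the standards (taken from the text).
PREFIX = "ACGATAGCGTGTC"
SUFFIX = "GT"


def standards():
    """Return the four 16-mer standards keyed by Z, plus the 15-mer Del standard."""
    table = {z: PREFIX + z + SUFFIX for z in "AGCT"}
    table["Del"] = PREFIX + SUFFIX  # Z missing altogether: the (-1) deletion product
    return table


def classify_fragment(fragment):
    """Name the standard whose sequence equals `fragment`, or None if none does."""
    for name, seq in standards().items():
        if fragment == seq:
            return name
    return None
```

For example, `classify_fragment("ACGATAGCGTGTCGT")` returns `"Del"`, the product expected when no nucleotide is incorporated opposite G*.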
Three Dpo4 crystal structures (PDB IDs: 1S0M, 2ASL, and 1JXL) (25, 28, 34) were used as the starting structures for molecular modeling and the coordinates were obtained from the Protein Data Bank (50). All molecular modeling was carried out using Insight 2000.1 (Accelrys, Inc., a subsidiary of Pharmacopeia, Inc.).
An initial model for the unmodified Dpo4 ternary complex was constructed as described previously (51) based on the crystal structure with PDB ID 1S0M (28) and provided a near-reaction ready active site region with well coordinated Mg2+ ions (Figure S1). The binary post-insertion complex utilized the structure from (34) (PDB ID: 2ASL), which originally contained 7,8-dihydro-8-oxoguanine (oxoG); it was remodeled as follows: the oxoG was replaced by a G, the Ca2+ ion which is far from the O3′ of the primer terminus was removed, and missing hydrogen atoms were added to the crystal structure by the LEaP module in AMBER (52). The slippage ternary complex based on the Type-II Dpo4 crystal structure (PDB ID: 1JXL) (25) was remodeled as described in Xu et al. (46) and in the Supporting Information.
The above remodeled crystal structures were used to build the initial models with [BP]G* for MD simulations. The DNA sequence was remodeled according to that used experimentally (Figure 1B). The BP moiety was covalently linked to the N2 of the guanine. According to the different positions of [BP]G* in Dpo4 at the four steps, four corresponding non-slippage DNA structures and one slippage DNA structure were built. For the non-slippage models, we investigated both anti and syn conformations of the glycosidic torsion χ of the BP adducted dG* (Figure 1A) opposite all four partners. In addition, for G:A and G:G mismatches, anti-anti, anti-syn and syn-anti conformations were investigated; we did not consider syn-syn conformations as these are rarely observed (53). A syn-dATP has been observed opposite a damaged template in a Dpo4 crystal structure (26). The torsion angles α′ and β′ at the linkage site between the guanine and the BP (Figure 1A) were rotated at 10° intervals within the low energy domains (54), in combination to locate conformations with minimal collisions. Slippage models were constructed using the same strategy as in Xu et al. (46), detailed also in the Supporting Information. Two anti-[BP]G* models and one syn-[BP]G* model were located. The torsion angles for all initial models are given in Table S1. The remodeled crystal structures also provided the initial models for the unmodified controls. All initial models are shown in Figures S2 and S3.
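The combined 10° search over α′ and β′ can be pictured as a simple grid scan. The sketch below is not the Insight-based procedure used here; it assumes a caller-supplied `count_collisions(alpha, beta)` function (hypothetical) that builds the adduct at the given torsions and returns a collision count, and it assumes the low-energy domains are passed in as inclusive degree ranges.

```python
from itertools import product


def torsion_grid(domain, step=10):
    """Angles in degrees from domain[0] to domain[1] inclusive, every `step` degrees."""
    low, high = domain
    return list(range(low, high + 1, step))


def scan_linkage_torsions(alpha_domain, beta_domain, count_collisions, step=10):
    """Score every (alpha', beta') combination on the grid.

    Returns (collisions, alpha, beta) tuples sorted from fewest to most collisions.
    `count_collisions` is a hypothetical user-supplied scoring function.
    """
    results = []
    for alpha, beta in product(torsion_grid(alpha_domain, step),
                               torsion_grid(beta_domain, step)):
        results.append((count_collisions(alpha, beta), alpha, beta))
    return sorted(results)
```

The first element of the returned list is the least-crowded combination on the grid; the domain limits themselves would come from the low-energy regions reported in reference (54) and are not reproduced here.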
Partial charges for anti/syn-[BP]G*, anti/syn-dATP, anti/syn-dGTP and anti-dCTP and anti-dTTP utilized were computed previously (15, 55). The minimization, equilibration and production were carried out using the same software and protocols as in Xu et al. (15).
To obtain ensemble average values for properties of interest, trajectories were collected for all systems and were analyzed using the PTRAJ and CARNAL modules of the AMBER package (52). For each system, the RMSDs (root-mean-square deviations) of the whole structure and the active site (composed of all the residues within 5 Å of the nascent base pair) were calculated relative to the first production frame (Figure S4). We found that the active sites of all systems are locally stable in the 1.0–2.5 ns time frame. The following analyses are based on this range unless otherwise stated. All structural figures were prepared using PyMOL (56).
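For readers unfamiliar with the RMSD measure, the calculation relative to the first production frame can be sketched as follows. This is a plain-Python illustration, not the PTRAJ implementation; it assumes each frame is a list of (x, y, z) coordinates for the same atoms in the same order and that the frames have already been superimposed.

```python
import math


def rmsd(frame_a, frame_b):
    """Root-mean-square deviation between two frames with identical atom ordering."""
    n_atoms = len(frame_a)
    squared = sum((a - b) ** 2
                  for atom_a, atom_b in zip(frame_a, frame_b)
                  for a, b in zip(atom_a, atom_b))
    return math.sqrt(squared / n_atoms)


def rmsd_to_first_frame(trajectory):
    """RMSD of every frame relative to the first frame of the trajectory."""
    reference = trajectory[0]
    return [rmsd(frame, reference) for frame in trajectory]
```

Restricting `trajectory` to the atoms of the residues within 5 Å of the nascent base pair gives the active-site RMSD; passing all atoms gives the whole-structure value.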
A goal of the present work was to evaluate the characteristics of translesion bypass of [BP]G* adducts by Dpo4 in the CG*G sequence context (Figure 2) and compare the results to the features obtained with the A-family polymerase BF investigated earlier in the same base sequence context (15). In the studies with BF, it was shown that this polymerase inserted dNTPs rather efficiently opposite the lesion, but that further primer extension beyond the lesion was severely inhibited. Here we show that, in contrast, in the case of Dpo4 the rate-limiting step in translesion synthesis (TLS) of the same lesion is the apparent insertion of a dNTP opposite the lesion, rather than the extension step as in the case of BF. We utilized the term “apparent” to highlight that, in our delineated slippage mechanism discussed below, the dNTP is actually inserted opposite the C on the 5′-side of G*. Another goal of this study was to compare the effects of base sequence in the CG*G sequence context with our earlier studies in the TG*G (46) and AG*C sequence contexts (47). We previously found that at 37°C all four dNTPs are inserted with rather similar probabilities opposite the lesion in the AG*C sequence context under conditions of saturating dNTP concentrations, and that dATP insertion is significantly favored in the TG*G sequence context. Here, in the CG*G sequence context, we have used a two-phase gel sequencing approach to determine that the apparent insertion of dGTP opposite [BP]G* is favored, and that −1 deletions are the dominant full translesion bypass products. An objective of this work was to elucidate the structural origins of such base sequence context effects. The experimental primer extension results were generated to set the stage for the modeling and computational components of this effort.
The distribution of primer extension products of different lengths generated by Dpo4 using 43-mer templates containing a single [BP]G* residue (Figure 1A) depends on base sequence context (45), temperature, activity of the Dpo4 polymerase sample, primer/template to Dpo4 ratios, and polymerase and DNA concentrations. The time course of a typical running-start primer extension reaction is shown in Figure 3A and an example of a profile of extension products formed after a 15 min incubation time is shown in Figure 3B. Additional product profiles (for the 12 min and 30 min time points) are shown in Figure S5, Supporting Information. In these experiments, the 22-mer primer strand (labeled “P” in Figure 3A) was present in ~25% excess to maximize the extent of formation of products that were needed for subsequent sequencing experiments. About 65% of the 22-mer strands initially present were rapidly extended within the first three minutes of incubation time, another 8% were converted from 3 to 15 minutes, and only 2% of the strands were converted during the last 15 min of incubation (Figure 3C). Thus, the depletion of the primer band P occurs in two phases, a rapid one within the first three-minute time interval, followed by a significantly slower phase within the next 12 min. Within the final 15 min of incubation, the concentration of primer strands remains practically constant. Since the templates were gel-purified before use, and the purity of the templates was better than 98% (45), the fast phase cannot be attributed to the presence of unmodified template strand contamination. Similar two-phase kinetics observed upon primer extension catalyzed by Dpo4 have been previously reported and discussed (38).
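The contrast between the two phases is easier to see as average conversion rates per interval. The short calculation below uses only the percentages quoted above; the interval-averaged rates are a simple arithmetic illustration and are not fitted kinetic constants.

```python
# (start_min, end_min, percent of 22-mer primer extended in that interval), from the text
INTERVALS = [(0, 3, 65.0), (3, 15, 8.0), (15, 30, 2.0)]


def mean_rates(intervals):
    """Average primer conversion in percent per minute for each interval."""
    return [percent / (end - start) for start, end, percent in intervals]


def residual_primer(intervals):
    """Percent of the initial 22-mer primer left unextended after the last interval."""
    return 100.0 - sum(percent for _, _, percent in intervals)


if __name__ == "__main__":
    for (start, end, _), rate in zip(INTERVALS, mean_rates(INTERVALS)):
        print(f"{start}-{end} min: {rate:.2f} %/min")
    print(f"residual primer: {residual_primer(INTERVALS):.0f} %")
```

The average rate falls from roughly 22% per minute in the first interval to well under 1% per minute afterwards, and the remaining 25% matches the primer excess used in these experiments.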
Such biphasic kinetics most likely represent differing conformational states of the lesion within the polymerase active site that interconvert slowly; they are consistent with the existence of a productive ternary complex conformation that can lead to nucleotide incorporation, and a non-productive state that is reaction-incapable (38). Such two state conformations have been crystallographically observed with Dpo4 for several bulky lesions (28, 38, 39). The ~ 25% residual 22-mer strand levels (Figure 3C) represent the single-stranded primers that are in excess. As shown elsewhere (45), residual primer strand levels are not observed when the initial primer:template ratio is ≤ 1.0. The thermodynamic stabilities of the 22-mer primer-43-mer template duplexes are lower than the stabilities of any of the partially or fully extended and longer primer-template duplexes. Therefore, it is unlikely that the excess 22-mer primers can efficiently displace any of the extended primer strands during the 30 min incubation time, thus accounting for the slow depletion of the 22-mer primer strands during the last 15 min (Figure 3C).
The most abundant and persistent extension product is the 24-mer (labeled “−1” in Figure 3A), which reaches a level of ~50% within the first 3 min and then slowly decreases to 25% after 30 min (Figure 3C). The persistence of the −1 band during the entire 30 min incubation period indicates that extension from the 22-mer to the 24-mer primer strand is significantly faster than the apparent insertion of a nucleotide opposite position “0” that defines the position of an extended 25-mer. This is due to the slow −1 to 0 nucleotide incorporation step. In contrast, when the incubation temperature is increased to 55°C (Figure S6, Supporting Information), the −1 band intensity decreases substantially after a 15 min incubation time and beyond, although the 22-mer primer band levels exhibit little change. This is consistent with an acceleration of the bypass of the [BP]G* lesions and higher reaction rates at the higher temperature (57), but without a significant depletion of the 22-mer primer strand for the reasons noted above.
Since the −1 band is the strongest among all primer extension bands beyond the 22-mer primer band, we conclude that the −1 → 0 dNTP step is the slowest in the series of steps that ultimately lead to 41–42-mer nucleotide extension products for incubation times of up to 15 min, and a faint 43-mer, fully extended primer band is observable at the 30 min time point (Figure 3A and 3B). The 41 and 42-mer bands observed stem from a (−1) deletion TLS step, as discussed further below. Faint bands are also observed up to the +1, +2, and +3 primer extension products indicating that the adduct disturbs the primer template structure at least up to template position +3. The time evolution of selected primer extension products, presented in Figure 3C shows the slow accumulation of 41–43-mer extension products. The product profiles for the 12 and 30 min incubation time intervals are shown in Supporting Information (Figure S5). At short incubation times, the 41-mer is dominant, with smaller amounts of 42-mers and 43-mers. However, at the longest reaction times the 42-mer products are dominant, with lesser amounts of 41-mers, and even smaller relative amounts of 43-mers.
The single-step standing-start primer extension assays (Figure 4A) reveal that the order of dNTP insertion in the slowest −1 → 0 step (opposite the [BP]G* lesion) is dGTP > dATP > dCTP > dTTP.
In the sequencing experiments, the 41–43-mer primer extension products (the bands shown at the top of the gel, Figure 3A, 30 min time point) were eluted from the gel. The samples were then analyzed by the two-phase gel sequencing approach after cleaving the extended primer-template duplexes by restriction enzymes as described in the Methods section. The central 16-nucleotide long portion of the extended primer strands (restriction enzyme fragments) thus isolated, contain the nucleotides inserted opposite the [BP]G* adduct (if any) and the neighboring nucleotides (45).
A typical gel comparing the electrophoretic mobilities of the restriction enzyme fragments to those of the 16-mer standards with Z = A, C, G, or T, or the −1 deletion product (15-mer) is shown in Figure 4B, and the densitometry tracings of each lane are shown in Figure 4C (there were no bands shorter than the 15-mer shown, thus ruling out the existence of −2 deletion products (45)). The 43-mer band shown at the top of the gel in Figure 4B (left corner) is an oligonucleotide standard with the same sequence as that of a normal full extension 43-mer product. The lanes labeled “A”, “C”, “G”, and “T” are size marker 16-mer sequences (Figure 2C) with Z = A, C, G, and T, respectively. These sequences are markers for the 16-mer restriction fragments derived from the 41–43-mer primer extension bands that arise from the translesion bypass of G* in the 43-mer CG*G template strand. The band labeled “Del” represents the 15-mer primer with Z missing altogether, and thus is a standard for a −1 deletion primer extension product. The lane labeled “Mix” contains all five standards. It is evident that the 16-mer sequences with Z = G or T cannot be distinguished.
The lane labeled CGG in Figure 4B is a control experiment conducted with an unmodified template strand CGG, while the lane marked CG*G shows the restriction products obtained from the duplex shown in Figure 2B. The lower bands (27-mers) represent unsuccessful cleavage by the restriction enzyme Tsp45I and are observed for both the unmodified CGG and the modified CG*G duplexes (20% or less of the total). Weaker 32-mer bands, observable in both lanes, correspond to single cleavage events at the AlwI restriction site. The CGG lane in Figure 4B shows a dominant single band that co-migrates with the 16-mer with Z = C, as expected for this control sample. However, in the CG*G lane, the dominant product is a band that co-migrates with the 15-mer deletion standard (Del). Thus, the dominant TLS pathway catalyzed by Dpo4 in the case of the [BP]G* adduct in the CG*G sequence is a −1 frameshift mechanism leading to the (−1) Del product (57% of the bypass products). This was confirmed by Sanger sequencing experiments (Figure S7) of the individual 41 and 42-mer primer extension products (Figures S8 and S9, respectively, Supporting Information). These results clearly show that the sequences of the 41 and 42-mers are identical in that both are missing a nucleotide that would have been inserted opposite the [BP]G* adduct. Only the lengths of these two extended primer strands are different. The 43-mer strand was also sequenced by the Sanger method, but the sequencing lane patterns were more complex (Figures S10 and S11) than in the case of the 42 or 41-mers. Two series of bands are recognized. The weaker series corresponds to a −1 deletion, and is probably due to the −1 deletion 42-mer product that could not be entirely separated from the 43-mer during the gel excision of the 43-mer band. The stronger series of bands (Figure S11) corresponds to a 43-mer extension product with an A inserted opposite the lesion G* during TLS.
These Sanger sequencing results are in qualitative agreement with the two-phase gel sequencing results of the mix of 41, 42, and 43-mer extension products (Figures 4B and 4C). A quantitative analysis of the data in Figure 4C demonstrates that the different nucleotides Z (Figure 2) are inserted ‘opposite’ the [BP]G* adduct in running start experiments with Z = A (10%), C (6%), and G or T (5%). The single-step insertion experiments suggest that the insertion of dGTP is significantly more efficient than that of dTTP (Figure 4A); we therefore hypothesize that the 16-mer with Z = G dominates over products with Z = T.
The single-step nucleotide insertion experiments shown in Figure 4A cannot distinguish between a mechanism that involves the pairing of dGTP opposite [BP]G*, followed by nucleotide incorporation, or one involving the pairing of the dGTP with the 5′-flanking C template residue, followed by nucleotide incorporation. However, the two-phase gel sequencing experiments depicted in Figure 4B indicate that the latter slipped frameshift, or primer relocation mechanism is dominant, although the dGTP:[BP]G* paired intermediate cannot be excluded since G is found to be incorporated ‘opposite’ the adduct (Figure 4B,C). However, a transient misalignment mechanism (58, 59) in which the dGTP pairs with the 5′-flanking template C, followed by realignment and then further primer extension from a dGTP:[BP]G* mispair can also account for the observed substitution errors opposite the lesion (58).
Our previous modeling studies suggested that a change of the glycosidic bond torsion angle at the [BP]G* site from a normal anti conformation with the BP moiety on the minor groove side of the evolving duplex to a syn conformation with BP on the major groove side may play a role in translesion bypass catalyzed by BF (15) and that both anti and syn conformations are feasible at the insertion step in a different sequence context in Dpo4 (47). In addition, both anti and syn conformations have been recently observed with a bulky adduct in Dpo4 (38). In the present work, we again employed molecular modeling and dynamics simulations to interpret, on a molecular level, the results of Dpo4-catalyzed translesion bypass experiments of the [BP]G* adduct in a 5′-CG*G-3′ sequence context, investigating both anti and syn conformations of the [BP]G*. The experimental observations indicate the prevalence of relatively easy bypass of [BP]G* via a −1 deletion pathway, and a less prevalent non-slippage pathway involving dNTP incorporation opposite the [BP]G* adduct with dATP > dCTP > dGTP.
In order to analyze possible intermediates that can account for these results, we constructed different models that could explain the experimental observations. We define slippage and non-slippage models. In the non-slippage models, the incoming dNTP pairs with [BP]G* and the 3′-terminal base pairs with the unmodified template strand base G in CG*G. We investigated two types of slippage models: (a) the incoming dGTP pairs with C in CG*G, and the 3′-terminal primer base C pairs with G* instead of with the unmodified G that is skipped altogether (Figure 5A), and (b) dGTP pairs in the same manner as in (a), but G* is skipped (Figure 5B) (46). All slippage and non-slippage models constructed contained the lesions dG* in either the anti or the syn conformations. Specifically, as illustrated in Figure 1B, the models included (1) a binary complex (DNA and Dpo4 only) with the 3′-terminal primer base C opposite G in CG*G, termed ‘pre-insertion step’; (2) the incoming dNTP paired with either [BP]G* (non-slippage models), or with C in the CG*G sequence context in either pairing scheme A or B of Figure 5 (slippage models), termed the ‘insertion step’; (3) the terminal 3′-primer base paired with [BP]G* in the absence of any dNTP (binary complex ‘pre-extension step’); (4) same as (3), but with a dNTP opposite the template C in CG*G (ternary complex ‘extension step’) (Figure 1B). These are the four consecutive steps as the [BP]G* adduct threads through the Dpo4 active site and correspond to Steps 1–4 of our earlier study with the replicative A-family polymerase BF (15).
All of these models, together with their corresponding unmodified controls, were subjected to 2.5 ns of molecular dynamics simulations following equilibration and were locally stable (Figure S4). The structures thus obtained are shown in Figures 6, 7, S12, and S13. Ensemble average structural properties (see Methods) relevant to the efficiency of primer extension catalyzed by Dpo4 were analyzed in each case (Tables 1, S2–S6 and Figure S14) and were compared to the corresponding unmodified DNA template-primer control complexes. The structural factors specifically considered are: (1) the number and occupancies of the hydrogen bonds in the [BP]G*-containing base pair and [BP]G*’s neighboring base pairs (Table S2); hydrogen bond occupancies are the percent of time during the stable region of the MD trajectory (1.0 to 2.5 ns) that the hydrogen bond is present according to the following criteria: a distance of ≤3.3 Å between heavy atoms (donor and acceptor) and a hydrogen bonding donor-hydrogen-acceptor angle of 180±45°; (2) presence of hydrogen bonds between amino acid residues and incoming dNTP that stabilize nucleotide binding in the ternary complexes (Table S3); (3) C1′-C1′ distance (~10.8 Å) of the 0 and +1 base pairs (Table S4); (4) frequency of sampling a near reaction-ready distance (3.1–3.5 Å) between Pα of the dNTP and the O3′ of the 3′-terminal primer base in ternary complexes (Table S4); (5) the attack angle in the ternary complexes, formed by O3′ (primer 3′-end)-Pα (dNTP)-O3α (dNTP); for in-line attack, this angle should be close to 180° (Table S4); (6) distance between the two Mg2+ ions (Table S4) and the quality of their octahedral coordination (Table S5) in the ternary complexes. These criteria are based on a well-organized polymerase crystal structure (60) and QM/MM calculations (61).
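The geometric criteria in items (1), (4) and (5) can be expressed compactly. The sketch below is an illustrative plain-Python version, not the PTRAJ/CARNAL analysis used in this work; it assumes coordinates are given in Å as (x, y, z) tuples, one set per trajectory frame, and all function names are hypothetical. Because a computed angle lies between 0° and 180°, the 180±45° criterion reduces to an angle of at least 135°.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c (vertex at b) in degrees."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_angle = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def hbond_present(donor, hydrogen, acceptor, max_dist=3.3, min_angle=135.0):
    """Criteria from the text: heavy-atom distance <= 3.3 A and D-H-A angle of 180 +/- 45 deg."""
    return (math.dist(donor, acceptor) <= max_dist
            and angle_deg(donor, hydrogen, acceptor) >= min_angle)


def hbond_occupancy(frames):
    """Percent of frames, each a (donor, hydrogen, acceptor) triple, with the bond present."""
    hits = sum(hbond_present(*frame) for frame in frames)
    return 100.0 * hits / len(frames)


def near_reaction_ready_percent(distances, low=3.1, high=3.5):
    """Percent of frames whose P-alpha to O3' distance lies within 3.1-3.5 A."""
    hits = sum(low <= d <= high for d in distances)
    return 100.0 * hits / len(distances)
```

The attack angle of item (5) follows from the same helper as `angle_deg(o3_prime, p_alpha, o3_alpha)`, with values near 180° indicating in-line geometry.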
The experimental results indicate that dGTP insertion opposite [BP]G* and −1 deletion products are dominant in the 42-mers and fully extended 43-mer primer strands. In order to account for these observations, we constructed two anti-dG* adduct models (anti-[BP]G*-5′-slippage and anti-[BP]G*-3′-slippage) (Figure 5A,B) and one syn-dG* (syn-[BP]G*) model. As described in detail below, we found that the 5′-slippage pattern with the unmodified G skipped (Figure 5A) favors dGTP insertion opposite the C 5′ to the adduct.
The anti-[BP]G*-5′-slippage system starts with a structure in which the primer terminal C pairs with G*, thus skipping the unmodified G in the CG*G sequence (Figure 5A); during the MD, this C mostly remains paired with G* and is not observed to pair with G (Table S6, Figure 6A). The Pα-O3′ distance has an ensemble average value of 3.7±0.2 Å (Tables 1, S6). The C1′-C1′ distance, the Mg2+-Mg2+ distance, and the Mg2+ coordination are near normal (Table S6).
Both the anti- and syn-[BP]G*-3′-slippage systems start with structures in which the 3′-terminal primer C pairs with the unmodified G in CG*G (Figure 5B); during the MD, this C remains paired with G and is never observed to pair with G* (Table S6, Figure 6B,C). However, the Pα-O3′ distance is never found to be within the near reaction-ready range in either of these two systems: the ensemble average values are 7.5±0.6 Å and 6.9±1.0 Å, respectively, for the anti- and syn-[BP]G*-3′-slippage systems (Tables 1, S6).
Since the Pα-O3′ ensemble average distance in the anti-[BP]G*-5′-slippage system (3.7±0.2 Å) is close to the near reaction-ready range of 3.1–3.5 Å, whereas the corresponding distances in both 3′-slippage systems are far outside it, we conclude that the 5′-slippage pattern with G instead of G* skipped clearly favors dGTP insertion opposite the C in the CG*G sequence. The incorporation of dGTP into the primer strand with subsequent primer extension from this Watson-Crick C:G pair yields the −1 deletion products.
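The comparison underlying this conclusion can be made explicit with a small worked example. The sketch below assumes that "how far the ensemble average lies outside the 3.1–3.5 Å window" is a useful illustrative measure; this measure and the function name are our own hypothetical choices, not a criterion from the study. The distances are those quoted in the text.

```python
REACTION_READY = (3.1, 3.5)  # near reaction-ready Pα-O3′ window in Å, as stated in the text

# Ensemble-average Pα-O3′ distances (mean, standard deviation) in Å, quoted in the text
systems = {
    "anti-[BP]G*-5'-slippage": (3.7, 0.2),
    "anti-[BP]G*-3'-slippage": (7.5, 0.6),
    "syn-[BP]G*-3'-slippage": (6.9, 1.0),
}

def excess_over_window(mean, window=REACTION_READY):
    """Distance (Å) by which a mean value falls outside the window; 0.0 if inside."""
    lo, hi = window
    if mean > hi:
        return round(mean - hi, 2)
    if mean < lo:
        return round(lo - mean, 2)
    return 0.0

# Excess of each ensemble average over the window (illustrative measure only)
excess = {name: excess_over_window(mean) for name, (mean, _sd) in systems.items()}
```

With these inputs the 5′-slippage system lies 0.2 Å above the window, while the anti and syn 3′-slippage systems lie 4.0 Å and 3.4 Å above it.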
The experimental results indicate that besides the predominant −1 deletion products, the fully extended primers also contain products without deletion mutations: G*:A mismatch 10%, G*:C pair 6%, G*:G mismatch 5%. To gain insights into the underlying mechanisms, we have constructed non-slippage models at the pre-insertion step, insertion step, pre-extension step and extension step (Figure 1B), with all four partners opposite the [BP]G* adduct in Steps 2 through 4. Distortions of the structural properties for all the non-slippage models were scored according to the criteria described above and are given in Table 2: a more negative score indicates a more distorted structure. Our current best understanding of reaction-ready polymerase active site geometry was utilized in selecting the properties evaluated and the scoring criteria employed. A well-organized active site of a recent crystal structure (60) and QM/MM calculations (61) reveal features that closely resemble our unmodified models, suggesting that our criteria for evaluating distortions are reasonable. Although the relationship between the scores and the biological outcome is likely non-linear, the reasonably good correlation between the experimental data and our scoring results indicates that this approach is useful.
All anti structures place the BP lesion on the minor groove side of the evolving duplex while the BP rings are on the major groove side in the syn structures. At Step 1, the pre-insertion step, both anti- and syn-[BP]G* models have a composite score of 0, suggesting that the adduct in the single strand overhang causes negligible distortion to the structure. At Step 2, the insertion step, the anti-[BP]G*:syn-dATP system (composite score -1) appears most favorable for nucleotide incorporation, followed by the anti-[BP]G*:dCTP system (composite score -2), and then the anti-[BP]G*:anti-dGTP system (composite score -3). The other systems appear too distorted for the nucleotidyl transfer reaction to occur. At Step 3, the pre-extension step, the anti-[BP]G*:dC and anti-[BP]G*:dT structures, both with a composite score of 0, appear most feasible, followed by the anti-[BP]G*:syn-dA, anti-[BP]G*:anti-dG, anti-[BP]G*:syn-dG systems (all with a composite score of -2), and then the syn-[BP]G*:dC system (composite score -3). In the other systems, the structures are distorted to a more significant extent and appear unfeasible. At Step 4, the extension step, the anti-[BP]G*:dT system (composite score -1) appears most favorable for primer extension beyond the lesion site, followed by the anti-[BP]G*:syn-dA and syn-[BP]G*:anti-dG systems (both with a composite score of -2), and then the anti-[BP]G*:anti-dA, syn-[BP]G*:anti-dA, syn-[BP]G*:dC, and syn-[BP]G*:dT systems (all with a composite score of -3). The remaining systems appear unfavorable for a successful nucleotidyl transfer reaction beyond the lesion site. The least distorted structures for each step are given in Figure 7. A full description of the structural properties of each of these systems is provided in the Supporting Information.
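The ranking logic described above (the least negative composite score marks the least distorted system at each step) can be sketched as follows. This is a minimal illustration that includes only the systems whose scores are quoted in the text; Table 2 contains additional systems, and the data layout and function name are hypothetical.

```python
# Composite scores quoted in the text for Steps 2-4 (a subset of Table 2)
composite_scores = {
    2: {
        "anti-[BP]G*:syn-dATP": -1,
        "anti-[BP]G*:dCTP": -2,
        "anti-[BP]G*:anti-dGTP": -3,
    },
    3: {
        "anti-[BP]G*:dC": 0,
        "anti-[BP]G*:dT": 0,
        "anti-[BP]G*:syn-dA": -2,
        "anti-[BP]G*:anti-dG": -2,
        "anti-[BP]G*:syn-dG": -2,
        "syn-[BP]G*:dC": -3,
    },
    4: {
        "anti-[BP]G*:dT": -1,
        "anti-[BP]G*:syn-dA": -2,
        "syn-[BP]G*:anti-dG": -2,
        "anti-[BP]G*:anti-dA": -3,
        "syn-[BP]G*:anti-dA": -3,
        "syn-[BP]G*:dC": -3,
        "syn-[BP]G*:dT": -3,
    },
}

def least_distorted(step):
    """Return the systems with the least negative composite score at a given step."""
    scores = composite_scores[step]
    best = max(scores.values())
    return [name for name, score in scores.items() if score == best]

def ranked(step):
    """Return (system, score) pairs ordered from least to most distorted."""
    return sorted(composite_scores[step].items(), key=lambda item: -item[1])
```

For example, `least_distorted(3)` returns both the dC and dT partners of anti-[BP]G*, reflecting their tied score of 0 at the pre-extension step.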
The 5′-slippage pattern indicated by our modeling studies, in which the unadducted G instead of the modified G* in the 5′-CG*G-3′ sequence context is skipped (Figure 5A), allows a favorable active site alignment (Tables 1 and S6), which supports dGTP insertion opposite the C base flanking [BP]G* on its 5′-side. This is consistent with a previous study utilizing the TG*G sequence in the Dpo4 polymerase which revealed strong dATP selectivity; in that study, the same 5′-slippage pattern was suggested to favor dATP insertion opposite the T base flanking [BP]G* on its 5′-side (45, 46). Therefore, the slippage mechanism we have proposed may be applicable to sequences with a repetitive G*G sequence theme as in CG*G/TG*G. Both CG*G and TG*G are prone to slippage because of the tandem GG motif; the mechanism accounts for sequence-dependent nucleotide selectivity which relies on the identity of both the 3′- and 5′-bases flanking G*. In contrast to our findings that dGTP is strongly selected over the other nucleotides in the CG*G sequence context, Perlow-Poehnelt et al. reported that all four dNTPs are promiscuously incorporated opposite G* in an AG*C sequence context under conditions of saturating dNTP concentration (47). The non-repetitive AG*C sequence context does not support the current 5′-slippage pattern, and such slippage does not appear to occur at the nucleotide insertion step in this sequence context.
Our 5′-slippage mechanism differs from the one described by Bauer et al. (39) that was based on crystallographic and primer extension studies with the damaged base in a different sequence context (5′-TG*A-3′), without contiguous guanines and with the G* in the post-insertion rather than the insertion position; in this case it is the adducted base that is skipped. We note that the mechanism proposed by our simulations does not preclude the possibility that G* may be skipped; the looping out of the adduct G* is a general mechanism that can occur in all sequence contexts; it allows for a relocation of the terminal 3′-primer nucleotide from the next-to-be-replicated G* to the 5′-flanking unmodified template base. Our results suggest that the exact primer frameshift misalignment mechanism can also depend on the nucleotide flanking the adduct G* on its 3′-side, as well as the base on its 5′-side. The misalignment mechanism in which the G on the 3′-side of G* is skipped and the terminal 3′-primer nucleotide relocates towards the 5′ direction, appears favored for the 5′-CG*G sequence considered here and for the TG*G sequence investigated previously (46). Various strand slippage mechanisms as mutational intermediates have been extensively studied by Ripley (59) and by Kunkel (58, 62). The relevance of these types of mechanisms to in vivo mutagenesis is a topic of considerable recent interest (63).
To explain why the deletion product predominates among the fully extended primers, we infer that following dGTP insertion opposite the C, extension beyond this unmodified Watson-Crick C:G pair is facile despite the presence of a skipped base on the template strand. Primer extension beyond a G*:N base pair, which would yield full extension products, appears to be less favored in the experiments reported here, most likely because any G*:N base pair is more disruptive than a Watson-Crick C:G pair, even when the primer strand is misaligned because the G that is 3′ to the lesion G* is skipped.
The strong preference for dGTP insertion occurs primarily through the predominant −1 deletion pathway, with a minor contribution from G*:G mismatch mutations; the preference for the other nucleotides then follows the order of dATP > dCTP > dTTP. The sequencing analyses of the fully extended primers reveal that among the products without a −1 deletion, the G*:A mismatch is the most prevalent, followed by the G*:C pair and the G*:G mismatch, while productive G*:T mismatches are least likely. The composite scores in Table 2 reflect the extent of structural distortion for each non-slippage G* system. There are multiple conformational possibilities for each pair starting from Step 2, the nucleotide insertion step. For each pairing scheme the system with the least distortion is characterized by the least negative score. At Step 2, the anti-[BP]G*:syn-dATP (composite score -1) is modestly distorted with one of three hydrogen bonding interactions between G* and A disturbed. For the anti-[BP]G*:dCTP case, distortion is greater (composite score -2) with two of the three Watson-Crick hydrogen bonds between G* and C disturbed. This is consistent with the experimental result that insertion of dATP opposite the G* is greater than insertion of dCTP. Neither of the two [BP]G*:dTTP systems (scores -4 and -5) appears favorable for nucleotide incorporation. In the anti-[BP]G*:dTTP system, both the interactions between dTTP and its partner G* and the interactions stabilizing dTTP binding are disrupted. In the syn-[BP]G*:dTTP system, the nascent G*:T base pair at the active site is severely distorted: the dTTP forms no hydrogen bonds with G*, the C1′-C1′ distance of this base pair is significantly enlarged, and the G* stacks poorly with its neighboring base. These modeling results are consistent with the experimental observation that dTTP insertion is not detectable (Figure 4).
The scoring system in Table 2 can be used to examine the feasibility of the conformational pathways for the entire four steps of the replicative cycle we investigated. Figure 7 summarizes the pathways that are most preferred according to Table 2. We note that in certain cases, a conformational change from anti to syn around the glycosidic bond of the dG* favors the progression through the cycle. Such an anti/syn conformational transition was observed in a BF crystal structure containing the bulky [AF]G* adduct; the AF-modified dG* was in a syn conformation at the pre-insertion site, but after one round of nucleotide incorporation within the crystal, the modified dG* rotated to the anti conformation that placed the AF residue into the open major groove side in the post-insertion site, while the syn conformation would have placed the lesion in the crowded minor groove scanning track (21). In addition, recently both anti and syn conformations of the glycosidic bond in the bulky adduct N2-CH2(2-naphthyl)G have been observed crystallographically in Dpo4 (38), and had also been predicted for [BP]G* in Dpo4 (47). Furthermore, we mention in this connection that during replication the mobile polymerase machinery allows flexibilities that are not manifested in the static snapshots of crystal structures. In our cases, such anti/syn rearrangements appear necessary to avoid highly distorted structures. For example, in the anti-[BP]G*:dC system at Step 4, the extension step, the BP residue pushes the incoming dGTP away from its normal position (Figures S12, #37 and S13, #37), resulting in a poor Pα (dGTP)-O3′ (primer end) distance (Table S4); however, the syn-[BP]G* conformation appears feasible for dGTP insertion (Table S4). Similarly, both anti-[BP]G*:anti-dG and anti-[BP]G*:syn-dG systems are much more distorted than the syn-[BP]G*:anti-dG system at Step 4, the extension step, and the syn-[BP]G* conformation appears feasible for the nucleotidyl transfer reaction.
Thus, bypass of the G*:C pair and the G*:G mismatch may involve an anti/syn conformational change of the BP-modified dG*, while the G*:A mismatch does not. This would make bypass of the G*:A mismatch easier than bypass of the G*:C pair or the G*:G mismatch, as observed. As discussed earlier, neither of the two [BP]G*:dTTP systems at Step 2 (the insertion step) appears favorable for nucleotide incorporation, explaining why virtually no product containing a G*:T mismatch is detected (Figure 4). However, our results in Table 2 do indicate that if any dTTP were incorporated, extension should be facile.
In the present work, our experimental studies have revealed that Dpo4 allows primer extension past the [BP]G* lesion site to a significant extent; bypass of the [BP]G* in the current CG*G sequence context is achieved through a major −1 deletion pathway and a minor non-slippage pathway. Single nucleotide insertion assays in the CG*G sequence context, together with a previous study in a TG*G sequence context, reveal strong nucleotide selectivities: dGTP is strongly favored in the CG*G and dATP is significantly preferred in the TG*G sequence context. In this predominant −1 deletion pathway, nucleotide insertion opposite the base on the 5′ side of the adduct involves the skipping of the template strand base G (Figures 5A, 6A), thus introducing a significant structural disturbance that hinders extension from the +1, +2 and +3 sites (Figures 3A and 3B). In addition, the 0 → +1 step in Dpo4 is faster than the slowest −1 → 0 step (Figure 3A). However, processing of the [BP]G* lesion in the CG*G sequence context by an A-family high-fidelity DNA polymerase, the Bacillus fragment (BF), is strikingly different, as shown in our earlier work (15). In this polymerase, nucleotide insertion opposite the adduct is allowed with the preferential insertion of purines (dATP and dGTP), but further extension is strongly inhibited with only ~1% full extension observed (15). In addition, the template base flanking the adduct on the 5′-side did not alter the selectivity of purine dNTP insertion opposite the [BP]G* adduct when catalyzed by BF (45). The different replicating activities of these two polymerases for this adduct are summarized in Table 3.
These differences result from the structural distinctions of the two enzymes, especially at the active site region. Dpo4 has a spacious active site which can accommodate two templating bases simultaneously and thus can readily allow slippage to occur (25). The observed nucleotide selectivities in Dpo4 result from a realignment or relocation of the 3′-end primer base so that dGTP or dATP pairs with template C or T, respectively, flanking the adduct [BP]G* on the 5′ side; the subsequent nucleotidyl transfer reaction in such slippage structures involves extension from a normal Watson-Crick C:G or T:A pair. The minor non-slippage pathway in Dpo4 is more facile than the analogous primer extension in BF (15). At Step 1 (Figure 1B), both anti- and syn-dG* cause negligible distortion to the structure. From Step 2 through Step 4, differing dG* anti/syn conformational preferences were deduced from the simulations depending on the partner base opposite the [BP]G* adduct (Table 2); Dpo4 can more easily sample a reaction-favorable glycosidic conformation of the dG* as well as the partner since a conformational anti/syn rearrangement would take place more readily in the spacious active site of Dpo4 than in BF.
Significantly different from Dpo4, BF has a much more confined active site with strand slippage much less likely to occur; this can account for the absence of sequence context-governed nucleotide selectivity in the experiments. In addition, the minor groove side of the DNA in BF is highly packed with amino acid residues, which constitute a scanning tract to ensure replication with high fidelity and processivity; any factor that disrupts this minor groove scanning tract inhibits the enzyme activity. The blocking of TLS activity of BF by [BP]G* is explained by the anti dG* conformation with BP on the crowded minor groove side, as shown in a crystal structure (24) and our modeling studies (15). The observed limited bypass can be explained by a model where the [BP]G* adduct assumes the syn dG* conformation, placing BP on the open major groove side. The preference for the pairing of purine dNTPs opposite the adduct is attributed to a better accommodation of these dNTPs opposite the adduct in the syn-[BP]G* conformation which has a pyrimidine-like shape in the active site and allows for TLS (15). The syn-[BP]G* causes more structural distortion at the extension steps than the insertion step, explaining why further extension beyond the lesion site is more inhibited than nucleotide incorporation opposite the G* adduct in BF.
Overall, the slippage-prone structure of Dpo4 explains its predominant lesion bypass and sequence-governed nucleotide selectivity properties in the CG*G template sequence context. By contrast, the absence of sequence-governed nucleotide selectivity in BF and its strong lesion-blocking properties stem from its spatially tight and constrained active site that does not favor strand slippage.
Our modeling studies combined with primer extension experiments have provided new structural insights into the selective dNTP incorporation opposite the lesion G* and the subsequent primer extension beyond the lesion over an entire Dpo4 replicative cycle in the CG*G sequence context. The selectivity of dGTP incorporation results primarily from an unusual 5′-slippage mechanism in which the unmodified G rather than the G* is skipped (Figure 5A). Furthermore, the modeling provides structural insight into the effect of the lesion on translesion synthesis, since the dNTP has two alternate slippage mechanism choices (skipping G or G*) that cannot be distinguished by kinetic experiments. In addition, our simulations suggest that the G* may undergo an anti/syn conformational transition during the replicative cycle, which could be particularly feasible in the spacious Dpo4 active site region. By contrast, the poor extension following nucleotide incorporation opposite the lesion, as well as the absence of sequence-governed nucleotide selectivity in the high-fidelity polymerase BF, stems from its non-slippage-prone structural properties. By combining insights gained from simulations with experimental investigations, we achieve enhanced understanding of the structural origins of the functional observations. Simulating the lesion bypass through an entire replication cycle provides molecular views of the replicative stages in lesion processing that have not yet been achieved in crystal structures and are not kinetically observable.
†This work was supported by the National Cancer Institute, National Institutes of Health, through grants CA28038 (S.B.) and CA099194 (N.E.G.). Partial support for computational infrastructure and systems management was also provided by grant CA75449 (S.B.). The contents of this work are solely the responsibility of the authors and do not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health. Computational resources supported by the NSF Partnerships for Advanced Computational Infrastructure are gratefully acknowledged. We thank the reviewers for valuable suggestions.
Supporting Information Available. Details of molecular modeling protocols for the slippage models, detailed results for the molecular dynamics (MD) simulations of the non-slippage models, torsion angles of [BP]G* for all initial models, and detailed analyses of trajectory ensembles. Experimental results comparing (1) extension product profiles at 12 and 30 min of the results shown in Figure 3A, (2) running-start primer extension experiments at 37°C and 55°C, and (3) Sanger sequencing data of the 41-mer, 42-mer, and 43-mer extension products, are also shown. This material is available free of charge via the Internet at http://pubs.acs.org.