The variant form of human xeroderma pigmentosum syndrome (XPV) is caused by a deficiency in DNA polymerase η (Pol η) that enables replication through sunlight-induced pyrimidine dimers. We report high-resolution crystal structures of human Pol η at four consecutive steps during DNA synthesis through cis-syn cyclobutane thymine dimers. Pol η acts like a molecular splint to stabilize damaged DNA in a normal B-form conformation. An enlarged active site accommodates the thymine dimer with excellent stereochemistry for two-metal ion catalysis. Two residues conserved among Pol η orthologs form specific hydrogen bonds with the lesion and the incoming nucleotide to assist translesion synthesis. Based on the structures, eight Pol η missense mutations causing XPV can be rationalized as undermining the “molecular splint” or perturbing the active-site alignment. The structures also shed light on the role of Pol η in replicating through D loop and DNA fragile sites.
Life on earth can scarcely escape the ultraviolet radiation (UV) of sunlight, which catalyzes covalent linkages between adjacent pyrimidines in DNA1. The resulting pyrimidine dimers are roadblocks to most DNA polymerases, but the human POLH gene, defective in the variant form of xeroderma pigmentosum (XPV), encodes a Y-family DNA polymerase, Pol η, specialized for translesion synthesis (TLS) through cyclobutane pyrimidine dimers (CPDs)2,3. XPV is characterized by sunlight-induced pigmentation changes and a highly elevated incidence of skin malignancies. In contrast to humans, yeast lives well without Pol η when chronically exposed to a low dosage of UV radiation4. Conversely, in humans TLS by Pol η can decrease cellular sensitivity to anticancer DNA adducts, e.g. cisplatin5. In addition to TLS, Pol η is involved in somatic hypermutation6 and replication of D loop and DNA fragile sites7–10.
Pol η is the only one of the fifteen human DNA polymerases in which defects are unequivocally associated with cancer11–13. Among human Y-family polymerases (Rev1, Pols η, ι and κ), the lesion-bypass specificity and efficiency of Pol η stand out. Pol η binds CPD-containing DNA better than undamaged DNA and extends primers more processively opposite and up to 2 nt beyond CPDs14,15. Yet Pol η is completely blocked by UV-induced pyrimidine (6-4) pyrimidone photoproducts (6-4PP)16. 6-4PPs occur less frequently than CPDs and distort the helical structure of DNA more severely; they are thus more efficiently recognized and removed by nucleotide-excision repair proteins17,18.
Structural analyses of these TLS polymerases reveal a spacious active site that accommodates DNA adducts and non-Watson-Crick base pairs with little discrimination19,20. However, the molecular basis for the specificity and efficiency of Pol η in DNA synthesis through CPDs has remained a puzzle despite many crystal structures, including those of the apo and cisplatin-DNA bound yeast Pol η21,22.
We report here crystal structures of the polymerase domain of human Pol η catalyzing the four-step TLS through a cis-syn thymine dimer and in a ternary complex with a normal DNA. Complementing these structures, two residues conserved among Pol η orthologs were mutated, and the resulting defects in TLS assessed by functional studies.
The catalytic domain of human Pol η (1–432 aa) (abbreviated as hPol η) was expressed in E. coli after codon optimization (Methods). Crystals of ternary complexes were initially grown with normal DNAs, and the structure was solved and refined to 2.9Å resolution. However, a hydrophobic crystal-lattice contact between protein molecules appeared to distort hPol η-DNA interactions as compared with known structures of Y-family polymerases (Supplementary Fig. 1). Multiple amino-acid substitutions were engineered to break this lattice contact. Non-hydrolyzable dNMPNPPs were synthesized to replace dNTPs for co-crystallization of an active hPol η, DNA and Mg2+. Different DNA lengths were systematically sampled, and this proved most effective for breaking the undesirable lattice contacts. After many trials, two new crystal forms were grown that diffracted X-rays to ~3Å, with resolution subsequently improved to 1.75 to 2.15Å (Supplementary Tables 1–2).
The crystal structure of WT hPol η complexed with normal DNA and dAMPNPP opposite the template T (denoted Nrm) was determined and refined to 1.83Å (Methods) (Fig. 1a). The relevance of the structure is validated by the appearance of a well-formed active site centered on two properly coordinated Mg2+ ions and the 3´-OH of the primer strand 3.2Å from the α-phosphate of dAMPNPP poised for in-line nucleophilic attack (Fig. 1b). The single water ligand of the Mg2+ ions (Fig. 1b) has been observed in several DNA polymerases and may participate in catalysis23. Configured around the non-hydrolyzable dNTP analog, the active site is arguably the most reaction-ready among the homologous A-, B- and Y-family DNA polymerases crystallized to date. The incoming nucleotide, catalytic carboxylates and metal ions are nearly superimposable with those of human Pol β (X-family) despite dissimilar tertiary structures (Supplementary Fig. 2). This active site configuration is likely conserved among all DNA polymerases and possibly all RNA polymerases as well.
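The in-line geometry described above can be checked numerically from atomic coordinates. The following is a minimal sketch, not the authors' procedure: the function names and the example coordinates are hypothetical, chosen only so that the nucleophile sits 3.2Å from the α-phosphorus. For an in-line attack, the angle formed by the attacking 3´ oxygen, the α-phosphorus and the atom bridging the α- and β-phosphates is expected to be close to 180°.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points (in Å if inputs are in Å)."""
    return math.dist(a, b)

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))

# Hypothetical coordinates (Å), NOT taken from the deposited structures.
primer_o3 = (0.0, 0.0, 0.0)   # 3' oxygen of the primer terminus
p_alpha = (3.2, 0.0, 0.0)     # alpha-phosphorus of the incoming nucleotide
bridging = (4.8, 0.1, 0.0)    # atom bridging the alpha- and beta-phosphates

attack_distance = distance(primer_o3, p_alpha)
attack_angle = angle_deg(primer_o3, p_alpha, bridging)
print(f"O3'...P-alpha distance: {attack_distance:.2f} Å")
print(f"O3'-P-alpha-bridge angle: {attack_angle:.1f} degrees")
```

With real coordinates, the three points would be read from the corresponding atoms of the primer terminus and the nucleotide analog in the coordinate file.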
Like all Y-family polymerases, hPol η contains four domains (palm, finger, thumb and little finger, LF), with the active site in the palm domain and DNA bound between thumb and LF. Unlike the structures of yeast Pol η–cisplatin complexes22, hPol η’s finger is closed and contacts the replicating base pair (Fig. 1c, Supplementary Fig. 3a), and LF interacts extensively with both template and primer in the major groove (Fig. 1d). A surface area of 2200Å2 is buried between hPol η and DNA. The finger, palm and incoming dNTP in hPol η superimpose well with those in Dpo4 and Pols κ and ι (Supplementary Fig. 3a). Uniquely in hPol η, however, two bases of the template strand instead of one are in the active site, and the 3´ T is base paired with the incoming dAMPNPP (Fig. 1c). Translocation of two template bases into the active site was previously observed with Dpo4, but there the 5´ base was paired with the dNTP, resulting in a misalignment24. In hPol η a slight shift of LF relative to the finger domain enlarges the active site and allows it to accommodate two template bases without misalignment. Concomitantly, hPol η’s thumb and LF are oriented differently relative to Dpo4 and Pols κ and ι (Supplementary Fig. 3a). The thumb domain contacts only the DNA primer (Fig. 1d).
CPD-containing oligonucleotides were synthesized to place each crosslinked thymine at the template position or 1 or 2 bp upstream (Supplementary Table 1). Four ternary complexes were crystallized and labeled as TT1, TT2, TT3 and TT4. All except TT2 were crystallized in the same space group as the complex of normal DNA (Nrm). These structures were determined by molecular replacement (Methods). After refinement, the four CPD-containing complexes are virtually superimposable with Nrm (Fig. 2a). The rmsds among proteins are 0.15 to 0.71Å over ~400 pairs of Cα atoms.
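The rmsd values quoted above measure the average displacement between paired Cα atoms. The following is a minimal sketch of that calculation, assuming the two coordinate sets are already superimposed and listed in matching residue order; it does not perform the superposition itself, and the function name and example coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D points.

    Assumes both lists are in the same order and already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical C-alpha positions (Å) for two short fragments, for illustration only.
model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 0.5), (3.8, 0.0, 0.5), (7.6, 0.0, 0.5)]
print(f"rmsd = {rmsd(model_1, model_2):.2f} Å")
```

In practice the paired atoms would be the ~400 Cα atoms shared by two complexes, extracted after a least-squares fit with a structure-analysis package.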
In TT1 the 3´ thymine of CPD serves as the template and forms a Watson-Crick (WC) base pair with dAMPNPP, much like an undamaged base (Fig. 2b). The 5´ thymine of CPD is turned and moved closer to the finger domain than the undamaged base in Nrm. The tighter interactions with the CPD may explain the better fidelity of hPol η in TLS (Fig. 3a). S62 forms van der Waals contacts with the 5´ base in both cases (Fig. 1c), but S62 is not conserved and replacing it with Gly increases the TLS efficiency of hPol η25. Even with the S62G substitution, a 6-4PP would not fit in the active site (Supplementary Fig. 4a), which explains the selectivity of Pol η.
In TT2, the 5´ thymine of CPD also forms a WC base pair with dAMPNPP (Fig. 2c). The finger domain in TT2 is more open than in the other hPol η complex structures. Despite being crosslinked and rotationally constrained (Fig. 2a), the 3´ thymine of the CPD maintains a WC basepair with the primer, which undergoes minimal changes relative to the Nrm and TT1 complexes. This differs from CPDs complexed with a stalled replicative polymerase, where only one crosslinked thymine is base paired26. In TT1 and TT2, the template thymine adopts the position most similar to that of undamaged DNA, and the crosslinked partners adjust. In TT2 the deoxyribose of the 3´ thymine moves by 2–3Å and the base is shifted toward the major groove by ~2Å and tilted by ~20° (Fig. 2c).
Each template thymine in TT1 and TT2 is hydrogen bonded to Q38 of hPol η via its O2 oxygen (Fig. 2b–c). Both cytosine and thymine are vulnerable to forming UV-induced CPDs, and both have the O2 to form the hydrogen bond with Q38 as observed in TT1 to TT4 (Fig. 2d). In Nrm Q38 is instead hydrogen bonded with deoxyribose (Fig. 2b) due to a slight shift of the DNA template (Fig. 2a). Q38 is one of the two residues uniquely conserved in the Pol η family (Supplementary Fig. 5). Ala substitution of Q38 reduces the catalytic efficiency as expected (Fig. 3a). Interestingly, Q38A increases polymerase stalling after the CPD at the stage equivalent to TT3 (Fig. 3b). This stalling likely occurs because the template base cannot stack with the upstream CPD (Fig. 2e) and depends on Q38, its sole contact with the polymerase, to align with the incoming dNTP.
The second invariant residue in the Pol η family is an Arg (R61 in hPol η). In the yeast Pol η-cisplatin DNA complex crystal structures, its equivalent (R73) forms cation-π interactions with the base and polar interactions with the phosphates of the incoming dNTP22. In the five hPol η structures, the closed finger domain prevents R61 from stacking with the base. R61 instead adopts different rotamer conformations and is hydrogen bonded with the N7 of purines (A or G) or the phosphates (Fig. 2b–d), thus favoring the anti-conformation of dNTP and preventing Hoogsteen base pairing27. Ala substitution of R61 reduces the polymerase efficiency as well as misincorporation of dG opposite a template T (Fig. 3a). However, dGTP would be the correct incoming nucleotide if a cytosine is in the CPD. R61 is thus important for Pol η to efficiently synthesize DNA through cis-syn cytosine and thymine dimers but at the cost of potential misincorporation.
The DNAs in all four CPD-Pol η complexes maintain a straight B-form conformation (Fig. 2a) in spite of the inability of CPDs to base stack. This unperturbed structure is a stark contrast to the bent and unwound structures of CPD-containing DNAs alone or complexed with repair proteins (Supplementary Fig. 6)28–31. Poor base stacking and segmentation of the DNA helices are the structural features of CPDs that are recognized by repair proteins32. However, when complexed with hPol η, lesion-induced perturbations are absorbed by minor adjustments to torsion-angles of the surrounding nucleotides.
hPol η acts like a molecular splint to keep the damaged DNA straight and rigid owing to a continuous and highly positively charged DNA-binding surface that interacts extensively with the four template nucleotides immediately upstream of the active site (Fig. 4a). A β strand in LF (aa 316 to 324) is nearly parallel to the template strand, and every other mainchain amide donates a hydrogen bond to the template phosphates (Figs. 1d, 4b). Each phosphate forms additional hydrogen bonds with sidechains of Arg, Lys, Tyr and Thr. In contrast, the template-binding surface in other Y-family polymerases has a gap or holes owing to separations between LF and the catalytic core (palm, thumb and finger) (Fig. 4a, Supplementary Fig. 3a). The gap is particularly large in Pol κ, and the divided protein requires an N-terminal appendage (N-clasp) for stabilization33. Pol κ and Dpo4 likely use the structural gaps (Fig. 4a) to accommodate bulky minor-groove adducts during TLS34,35. The gaps in Dpo4 and Dbh also promote template looping out as a means of lesion bypass36,37. In hPol η LF is connected to the catalytic core by hydrogen bonds, salt bridges and hydrophobic interactions (Supplementary Fig. 3b–c). The resulting DNA-binding surface constrains the template backbone and reinforces a B-form structure in spite of CPDs.
The sidechains of R93 and R111 extend from the palm domain to the template strand and further strengthen this molecular splint. R93 connects the template strand with the incoming nucleotide via its neighbor Y92, which stacks with F18 (the steric gate)38 and the deoxyribose of dNTP (Fig. 1c). Although this Y92/R93 pair is found in Pols ι and κ, only in Pol η is the DNA template hydrogen bonded to R93. R111 is adjacent to the catalytic carboxylate D115 and interacts with both LF and the template strand (Fig. 5a), which explains why substitution of R111 by His leads to the XPV phenotype11.
TLS polymerases are recruited to stalled replication forks by Rad6-Rad18 mediated ubiquitination of PCNA39. Dissociation following TLS, however, may be an intrinsic property of human Pol η because of reduced affinity for DNA 3 bp past the CPD14,15. In all five structures, the DNA-binding surface of hPol η makes extensive interactions with 4 bp (Fig. 4), but has reduced interactions with the template strand further upstream. Modeling a CPD 1bp further upstream from TT4 reveals steric clashes of its phosphates with hPol η in addition to the loss of favorable hydrogen bonds (Supplementary Fig. 4b).
The crosslinked thymines are shifted 1–2Å into the major groove (Fig. 2). The C5 methyl groups form van der Waals contacts with LF (L378 and F423) in TT3 and TT4 (Fig. 4c). In Nrm, the space between the DNA bases and LF is filled with water and glycerol molecules. These major-groove interactions are lost when the CPD is 3 bp beyond the active site. The position-dependent CPD-Pol η interactions provide a structural basis for polymerase switching after lesion bypass.
Many mutations in Pol η have been identified in XPV patients, and five missense mutations R111H, A117P, T122P, G263V and R361S are within the first 432 aa11,12. These mutations and three newly identified XPV mutations (A264P, F290S and G295R; ARL, unpublished data) are modeled in Fig. 5b. A117 adjoins the catalytic carboxylate E116, and substitution by Pro would cause clashes with the carbonyl oxygen of V12. T122 occurs at the beginning of an α-helix and participates in the hydrogen-bond network surrounding all three catalytic carboxylates (D13, D115 and E116) (Fig. 5c). T122P would likely perturb the active site as does A117P. R111 is a part of the molecular splint (see the previous section). R361 is hydrogen bonded with the carbonyl oxygen of P316 and anchors the β strand (aa 316–324) for template binding (Fig. 5a,d). The R111H and R361S mutations are likely to relax the protein-DNA interface and break the molecular splint.
The remaining four XPV mutations are in the thumb domain and affect primer binding, which is essential to complete the molecular splint. G263 to Val and A264 to Pro mutations would cause steric clashes with the carbonyl oxygens of L258 and K261, respectively (Fig. 5e). These clashes would disrupt primer binding by the GGK motif (259–261aa) (Figs. 1d, 4c). F290 and G295 form the hydrophobic core of the thumb domain, and F290S and G295R mutations would destabilize the structure. Homozygous W174C has been implicated in XPV13, and V99M and I272T have been found in melanoma patients40. However, they are unlikely to directly alter the structure or polymerase activity of Pol η (Supplementary Fig. 7).
hPol η has been shown to synthesize through D loop and fragile sites, which contain hairpins and non-B-form structures7–10. Interestingly, in both crystal forms of hPol η the back of LF interacts with a neighboring DNA (Supplementary Fig. 8). The two symmetry-related DNA molecules are reminiscent of D-loop and hairpin structures as the template strands are nearly contiguous. Mimics of downstream DNA have not been observed previously in polymerase-DNA co-crystals. Many Pol η orthologs appear to retain the backside DNA-binding potential although the interfacial residues (R81, R84 and W339) are not strictly conserved (Supplementary Fig. 5). We hypothesize that the back of hPol η may be an authentic downstream DNA-binding site, and that the LF serves as a wedge to separate non-B-form DNAs and aids the molecular splint to complete replication through D loop and fragile sites.
Human Pol η is specifically optimized for bypassing the most ubiquitous DNA lesion. An enlarged active site accommodates CPDs, the complementary DNA-binding surface reinforces the B-form conformation, and hydrophobic residues interact with crosslinked pyrimidines in the major groove. These features readily explain the specificity of human Pol η in TLS through CPDs and reduced efficiencies for other lesions including cisplatin41–44. The limited interactions with the DNA minor groove may enable Pol η to misincorporate nucleotides during somatic hypermutation. The high-resolution structures reported here not only provide a detailed mechanism of TLS but also generate testable models for investigating novel functions of Pol η.
The full-length human Pol η gene codon-optimized for E. coli expression was synthesized by GenScript. The catalytic core (aa 1–432, hPol η) was cloned into modified pET28a45, expressed in E. coli and purified by Ni2+-affinity, MonoS and Superdex75 chromatography. The His-tag was removed by PreScission protease. Mutagenesis was performed using QuikChange (Stratagene). Non-hydrolyzable dNMPNPPs were purchased from Jena Bioscience, and phosphoramidites of CPD from Glen Research. CPD oligos were synthesized and purified by TriLink Biotechnologies. Ternary complexes were prepared by mixing WT or C406M mutant hPol η and annealed DNA at a 1:1.05 molar ratio and adding 5 mM Mg2+ and 1 mM non-hydrolyzable deoxynucleotides (dNMPNPP). The final protein concentration was 6–7 mg/ml. Crystals were grown in 0.1 M MES (pH 6.0), 19–21% (w/v) PEG 2K-MME and 5 mM MgCl2 after several rounds of microseeding. Diffraction data were collected at sectors 22 and 23 of the APS. Phases were determined by molecular replacement46 and multi-wavelength anomalous dispersion using selenomethionine-labeled hPol η47. Structures were refined using CNS48, interspersed with manual model building using COOT49. All residues are in the most favorable (97%) and allowed (2.3%) regions of the Ramachandran plot except for two that are well defined by electron densities. For functional assays, the C-terminal truncated human Pol η (1–511aa), which has the same TLS activity as the full-length hPol η43, was subcloned into pET21a and readily expressed in E. coli. Q38A and R61A mutations were made using Mutant-K (TaKaRa BIO Inc). Steady-state kinetic assays and primer extension reactions were carried out as described43.
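The steady-state comparisons between wild-type and mutant enzymes can be illustrated with a short calculation. This is a minimal sketch that assumes the standard Michaelis-Menten treatment, in which catalytic efficiency is kcat/Km and misincorporation frequency is the ratio of efficiencies for an incorrect versus a correct nucleotide. The fitting procedure used in the cited protocol is not spelled out here, so the function names are hypothetical and the numerical values are illustrative placeholders rather than measured data.

```python
def mm_rate(kcat, km, enzyme, substrate):
    """Michaelis-Menten initial velocity: v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)

def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat/Km."""
    return kcat / km

def misincorporation_frequency(eff_incorrect, eff_correct):
    """Ratio of efficiencies for incorrect versus correct nucleotide insertion."""
    return eff_incorrect / eff_correct

# Illustrative placeholder parameters (arbitrary units), NOT values from this study.
wt_eff = catalytic_efficiency(kcat=10.0, km=2.0)
mutant_eff = catalytic_efficiency(kcat=4.0, km=8.0)
print(f"mutant efficiency relative to WT: {mutant_eff / wt_eff:.2f}")
print(f"velocity at [S] = Km: {mm_rate(10.0, 2.0, 1.0, 2.0):.1f}")
```

Expressing each mutant as a fraction of the wild-type efficiency keeps comparisons independent of the absolute units used for kcat and Km.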
We thank D. Leahy, M. Gellert and R. Craigie for critical reading of the manuscript. The research was funded by the intramural research program of NIDDK, NIH and grants from the Ministry of Education, Culture, Sports, Science, and Technology of Japan. Y.Z. is a recipient of a Chinese Ministry of Education scholarship and a joint PhD student in the NIH-Zhejiang University Graduate Partnership Program. S.R.-M. received a fellowship from the Human Frontier Science Program.
Supplementary information is linked to the online version of the paper at www.Nature.com/nature.
Author contributions: C.B. determined the five structures; Y.Z. prepared the samples and grew the crystals; Y.K. did the kinetic and bypass assays; S.R.-M. determined the type 1 structure; M.G. prepared the clone and type 1 crystals; J.-Y.L. made mutants; C.M. designed the functional assays; A.R.L. identified the unpublished XPV mutations; F.H. conceived the project; and W.Y. supervised the structure determination. C.B., Y.Z., F.H. and W.Y. prepared the manuscript. C.B. and Y.Z. contributed equally to the study. All authors discussed the results and commented on the manuscript.
Atomic coordinates and structure factors for the reported crystal structures have been deposited with the Protein Data Bank under accession codes 3MR2 (Nrm), 3MR3 (TT1), 3MR4 (TT2), 3MR5 (TT3) and 3MR6 (TT4). Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests.
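For readers who wish to retrieve the deposited coordinates programmatically, the accession codes above can be collected in a small lookup table. This is a minimal sketch: the dictionary and function names are hypothetical, and the URL pattern is an assumption based on the RCSB file download service rather than something stated in the paper. The code only builds the URLs and does not access the network.

```python
# Accession codes as listed in the text; the keys follow the complex labels used here.
PDB_CODES = {
    "Nrm": "3MR2",
    "TT1": "3MR3",
    "TT2": "3MR4",
    "TT3": "3MR5",
    "TT4": "3MR6",
}

# Assumed download pattern for PDB-format files from RCSB.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{code}.pdb"

def pdb_url(label):
    """Return the assumed download URL for a complex label such as 'TT1'."""
    try:
        code = PDB_CODES[label]
    except KeyError:
        raise KeyError(f"unknown complex label: {label!r}") from None
    return RCSB_DOWNLOAD.format(code=code)

for label in PDB_CODES:
    print(label, pdb_url(label))
```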