Targeted genome engineering requires nucleases that introduce a highly specific double-strand break in the genome. This break is processed either by homology-directed repair in the presence of a homologous repair template or by non-homologous end-joining (NHEJ), which usually results in insertions or deletions. Error-prone NHEJ can be efficiently suppressed by ‘nickases’ that produce a single-strand break rather than a double-strand break. Highly specific nickases have been produced by engineering homing endonucleases and, more recently, by modifying zinc finger nucleases (ZFNs) composed of a zinc finger array and the catalytic domain of the restriction endonuclease FokI. These ZF-nickases work as heterodimers in which one subunit has a catalytically inactive FokI domain. We present two different approaches to engineering highly specific nickases. Both rely on the sequence-specific nicking activity of the DNA mismatch repair endonuclease MutH, which we fused to a DNA-binding module: either a catalytically inactive variant of the homing endonuclease I-SceI or the DNA-binding domain of the TALE protein AvrBs4. The fusion proteins nick, in a strand-specific manner, a bipartite recognition sequence consisting of the MutH recognition sequence and the I-SceI or TALE recognition sequence, respectively, with a more than 1000-fold preference over a stand-alone MutH site. TALE–MutH is a programmable nickase.
Precise gene targeting requires custom-designed, highly specific nucleases. Two fundamentally different approaches are being pursued for this purpose: (i) protein engineering of homing endonucleases, which results in ‘meganucleases’ of predefined specificity (1,2), and (ii) the fusion of a programmable DNA-binding module and a DNA-cleavage module, as exemplified by the zinc finger nucleases [ZFNs, (3–6)] and the TALE nucleases [TALENs, (7–10)]. ZFNs and TALENs use the non-specific cleavage domain of FokI as the cleavage module and, respectively, an array of zinc fingers or the DNA-binding domain of TAL effector proteins as the DNA-binding module. More recently, fusion proteins have been described that combine a catalytically inactive I-SceI or a ZF array as DNA-binding module with the restriction endonuclease PvuII as sequence-specific DNA-cleavage module (11,12), as well as fusion proteins that combine a catalytically inactive I-OnuI or a ZF array as DNA-binding module with the I-TevI nuclease domain as DNA-cleavage module (13).
All nucleases mentioned above introduce DNA double-strand breaks at a specific target site that can be repaired by two different pathways: homology-directed repair (HDR), which requires a donor DNA template, or error-prone non-homologous end-joining (NHEJ), which results in insertions, deletions or translocations. For the purpose of genome engineering by introducing sequence alterations or insertions at or near the site of the double-strand break, NHEJ is an unwanted and possibly genotoxic side reaction. It can be largely circumvented by using a DNA-nicking endonuclease [vulgo ‘nickase’, (14,15)], which has been shown to stimulate HDR (16).
Nickases occur naturally or can be obtained by engineering restriction or homing endonucleases [reviewed in (17,18)]. Examples of naturally occurring nickases are the restriction endonuclease Nt.CviPII (19,20) and subunits of heterodimeric restriction endonucleases, e.g. Nt.BstD6I (21,22), Nb.BsrDI and Nb.BtsI (23). I-HmuI and I-BasI are naturally occurring nickases in the homing endonuclease family (24–26). Restriction and homing endonucleases, which usually recognize palindromic or quasi-palindromic DNA sequences and catalyze double-strand cuts using two catalytic centres, can be engineered into nickases by inactivating one catalytic centre, as has been shown for EcoRV (27,28), FokI (29), I-SceI (30) and I-AniI (31).
ZFNs that use the non-specific DNA-cleavage domain of FokI to introduce a programmable DNA double-strand cut can be converted to a ZF-nickase by inactivating the catalytic centre of one FokI monomer in a ZFN heterodimer. This has been shown in a few studies published in 2012 (32–34), which demonstrated that the ZF-nickases allow site-specific genome modifications at the predetermined target site, while reducing unwanted mutagenesis caused by error-prone NHEJ.
Halford (35) had argued that ‘the reaction mechanism of FokI excludes the possibility of targeting ZFNs to unique DNA sites’. Although this risk has been minimized by using obligate heterodimers (36,37), two recent studies showed that ZFNs forming obligate heterodimers still caused residual off-target cleavage (38,39). What is a problem for ZFNs might also be one for ZF-nickases when the same FokI cleavage module is used. The question therefore arises whether the catalytic domain of FokI in ZF-nickases can be substituted with a different DNA-cleavage module, as we have done in three different fusion proteins that introduce double-strand cuts into DNA: PvuII–I-SceI (12), ZF-PvuII (11) and TALE–PvuII (M. Yanik, unpublished data).
In this study, we engineered and tested two highly specific nickases, MutH–I-SceI and TALE–MutH. The DNA mismatch repair protein MutH is a naturally occurring monomeric site-specific nickase that can nick DNA at GATC sites in un- or hemimethylated DNA (40,41). It is part of the MutHLS system of Escherichia coli and a few other bacteria, which repairs DNA mismatches, e.g. those caused by rare errors of the replicative polymerases. The mismatch is recognized by MutS, which recruits MutL to the site of the error to form a ternary complex (42). This complex activates MutH to nick the erroneous daughter strand on the 5′ side of a hemimethylated GATC site, which can be more than 1000 bp upstream or downstream of the mismatch (43). The nick in the unmethylated strand is the entry point for the UvrD helicase, which generates single-stranded DNA that is digested by exonucleases past the original mismatch. The resulting gap is filled by DNA polymerase and the DNA is ligated (44). By itself, MutH shows hardly any cleavage activity on unmethylated DNA at physiological ionic strength (41,45). Only when guided by the fused DNA-binding module does MutH exhibit nicking activity at the targeted GATC site. We show here that MutH–I-SceI and TALE–MutH can be considered site- and strand-specific nickases and that TALE–MutH is, in addition, a programmable nickase.
For the construction of the MutH–I-SceI fusion protein, a catalytically inactive variant of I-SceI (46), truncated at the C-terminus (ΔC9), was fused to the C-terminal end of a cysteine-free variant of MutH via a 10-amino-acid linker (ASENLYFQGG) harbouring a TEV protease recognition site (ENLYFQG) or, as a control, via a linker without the TEV site (TKQLVKSE). The gene for MutH (C96S), which contains the coding sequence for an N-terminal His6-tag, and the gene for I-SceI (ΔC9 D44S D145A) were connected by the coding sequence of the linker and inserted into the expression vector pASK-IBA63b-plus (IBA, Göttingen, Germany), which codes for a C-terminal Strep-tag. The expected fusion protein is therefore His6-MutH–I-SceI-Strep.
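TEV protease recognizes the canonical motif ENLYFQ followed by G or S and cleaves between the Q and the G/S. The short sketch below is only an illustration of why the ASENLYFQGG linker should be cut while the TKQLVKSE control linker should not. It checks for the canonical motif only. Real TEV protease tolerates some substitutions, so this scan cannot rule out cryptic sites, which is why the experimental control with the TEV-free linker is needed. The function name `tev_cleave` is hypothetical.

```python
import re

# Canonical TEV protease site: ENLYFQ followed by G or S (lookahead, so the
# P1' residue stays on the C-terminal fragment). Cleavage is after the Q.
TEV_SITE = re.compile(r"ENLYFQ(?=[GS])")

def tev_cleave(peptide: str) -> list[str]:
    """Split a peptide at every canonical TEV site (hypothetical helper)."""
    fragments, start = [], 0
    for match in TEV_SITE.finditer(peptide):
        fragments.append(peptide[start:match.end()])
        start = match.end()
    fragments.append(peptide[start:])
    return fragments

print(tev_cleave("ASENLYFQGG"))  # ['ASENLYFQ', 'GG']  (TEV-cleavable linker)
print(tev_cleave("TKQLVKSE"))    # ['TKQLVKSE']        (control linker, no site)
```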
For the TALE–MutH fusion protein, a truncated variant of the TALE protein AvrBs4, corresponding to the previously described AvrBs3 DNA-binding module (9), was fused directly to the N-terminal end of MutH. The gene for the TALE–MutH fusion consists of two parts: (i) the gene of the truncated TALE variant, which lacks the coding sequence for the first 152 amino acids at the N-terminus and the last 250 amino acids at the C-terminus [28 amino acids remain after the last half-repeat (M. Yanik, unpublished data)], and (ii) the gene of MutH, which contains the coding sequence for a C-terminal His6-tag. The two parts were connected and introduced into the expression vector pQE30 (Qiagen), which codes for an N-terminal Strep-tag. The expected fusion protein is therefore Strep-TALE–MutH-His6. The sequences of both fusion constructs were confirmed by DNA sequencing of the entire coding region. We had varied the linker length between AvrBs3 and PvuII, which is structurally very similar to MutH (47), and found the 28-amino-acid linker superior to 63- and 16-amino-acid linkers (M. Yanik, unpublished data). We therefore chose the 28-amino-acid linker for the TALE–MutH fusion protein.
The expression vectors for the recombinant fusion proteins were introduced into the E. coli strain XL1-Blue (Stratagene). The cells were grown at 37°C in LB medium containing 75 µg/ml ampicillin to an OD600 of ca. 0.7. Protein expression was induced by adding 200 mg/l anhydrotetracycline (MutH–I-SceI) or 1 mM isopropyl-β-d-thiogalactopyranoside (TALE–MutH), followed by further growth at 20°C overnight. The cells were harvested by centrifugation, resuspended in 20 mM Tris–HCl, pH 7.9, 1 M NaCl, 20 mM imidazole, 1 mM phenylmethanesulfonyl fluoride (PMSF), lysed by sonication and centrifuged for 30 min (>17 000g) at 4°C to remove cell debris. The His6- and Strep-tagged proteins were first purified by incubating the supernatant with Ni2+-NTA agarose (Qiagen) for 1 h at 4°C and washing with resuspension buffer; the protein was eluted with 20 mM Tris–HCl, pH 7.9, 1 M NaCl and 200 mM imidazole. The eluted fractions were then transferred to a Strep-Tactin Sepharose column (IBA) for further purification. After washing with 100 mM Tris–HCl, pH 7.9, 1 M NaCl, the protein was eluted with 100 mM Tris–HCl, pH 7.9, 500 mM NaCl, 2.5 mM desthiobiotin, dialyzed overnight at 4°C against 10 mM HEPES-KOH, pH 7.9, 500 mM KCl, 1 mM ethylenediaminetetraacetic acid (EDTA), 1 mM dithiothreitol (DTT), 50% (v/v) glycerol and stored at −20°C. The protein concentration was determined by measuring the absorbance at 280 nm, using the molar extinction coefficient calculated according to Pace et al. (48). Protein purification was monitored by sodium dodecyl sulphate–polyacrylamide gel electrophoresis (SDS–PAGE).
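The extinction coefficient method of Pace et al. estimates ε280 from the amino acid composition: 5500 M⁻¹ cm⁻¹ per Trp, 1490 M⁻¹ cm⁻¹ per Tyr and 125 M⁻¹ cm⁻¹ per cystine (disulfide bond, not per free Cys). The Beer–Lambert law c = A280/(ε·l) then gives the molar concentration. The sketch below assumes no disulfides by default, which suits proteins kept in DTT. For the cysteine-free MutH variant, the cystine term is zero anyway. The peptide used in the example is a hypothetical toy sequence, not one of the fusion proteins.

```python
def extinction_coefficient_280(seq: str, cystines: int = 0) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) following Pace et al.:
    5500 per Trp + 1490 per Tyr + 125 per cystine (disulfide bond)."""
    seq = seq.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines

def molar_concentration(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l), in mol/L."""
    return a280 / (epsilon * path_cm)

# Hypothetical toy sequence: 1 Trp, 2 Tyr, one free (reduced) Cys.
eps = extinction_coefficient_280("MWKYYC")
c = molar_concentration(0.424, eps)
print(f"epsilon = {eps} M^-1 cm^-1, c = {c * 1e6:.1f} µM")  # epsilon = 8480 M^-1 cm^-1, c = 50.0 µM
```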
To test the activity and specificity of the fusion proteins, the substrate plasmid pAT153 was modified by introducing a DNA cassette with the appropriate target site (Table 1). Three kinds of substrates were generated: one addressed substrate and two unaddressed substrates. The addressed substrate contains the recognition site for the binding module [I-SceI (S) or AvrBs4 (T)] and a MutH (H) recognition site, separated by spacers of different lengths. For MutH–I-SceI, only an addressed substrate with a 3-bp spacer was used (S-3-H); for TALE–MutH, several spacer lengths were tested (T-x-H; x = 1, 2, 3, 4, 5, 6, 7, 8 and 9 bp). Of the two unaddressed substrates, one contains only the recognition site for the binding module (S or T), without a nearby MutH recognition site, and the other contains only a MutH recognition site (H), without a recognition site for a binding module. For the analysis of the cleavage reactions, 4 nM substrate plasmid and 16 nM enzyme (i.e. a 4-fold excess of enzyme over substrate) were incubated in 20 mM Tris-acetate, 120 mM K-acetate, 1 mM MgCl2, 50 µg/ml bovine serum albumin (BSA), pH 7.5 at 37°C. Taking the enzyme dilution into account, the ionic strength of the reaction mixture was ca. 160 mM (which corresponds to physiological conditions). The reaction progress was analysed after defined time intervals (1, 3, 10, 30, 60 and 120 min) by agarose gel electrophoresis; gels were stained with ethidium bromide, and the fluorescence of DNA bands was visualized with a BioDocAnalyze System (Biometra). The time course of the reaction (product versus time) was fitted to a monoexponential function to obtain the rate constants for nicking. To determine the strand specificity of MutH–I-SceI and TALE–MutH, pAT153-derived plasmids harbouring the substrate cassettes S-3-H and T-3-H/T-6-H, respectively (Table 1), were used.
The nicked addressed plasmid was gel-purified and sent for sequencing with primers 5′-AATAGGCGTATCACGAGGCCCTTTC-3′ (binding to the bottom strand) and 5′-ACCCAGAGCGCTGCCGGCAC-3′ (binding to the top strand).
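The rate constants above come from fitting the product time course to a monoexponential function. The sketch below shows one way to do such a fit in pure Python, assuming the form P(t) = A·(1 − exp(−k·t)), where A is the final fraction of nicked product and k the first-order rate constant. This is an illustration under that assumption, not the authors' fitting software. For a fixed k, the amplitude A is linear and is solved in closed form. k itself is found by a log-spaced grid search followed by golden-section refinement. The demo data are generated from a hypothetical k of 0.05 min⁻¹ at the sampling times used in the assay.

```python
import math

def sse_for_k(k, t, y):
    """For fixed k, solve the linear amplitude A in closed form; return (SSE, A)."""
    f = [1.0 - math.exp(-k * ti) for ti in t]
    amp = sum(fi * yi for fi, yi in zip(f, y)) / sum(fi * fi for fi in f)
    sse = sum((yi - amp * fi) ** 2 for fi, yi in zip(f, y))
    return sse, amp

def fit_monoexponential(t, y, k_min=1e-4, k_max=10.0, n_grid=400):
    """Fit y = A * (1 - exp(-k t)); return (k, A).
    Coarse search on a log-spaced k grid, then golden-section refinement in log k."""
    step = (math.log(k_max) - math.log(k_min)) / (n_grid - 1)
    logs = [math.log(k_min) + i * step for i in range(n_grid)]
    best = min(range(n_grid), key=lambda i: sse_for_k(math.exp(logs[i]), t, y)[0])
    lo, hi = logs[max(best - 1, 0)], logs[min(best + 1, n_grid - 1)]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(60):
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if sse_for_k(math.exp(a), t, y)[0] < sse_for_k(math.exp(b), t, y)[0]:
            hi = b
        else:
            lo = a
    k = math.exp((lo + hi) / 2.0)
    return k, sse_for_k(k, t, y)[1]

times = [1, 3, 10, 30, 60, 120]       # min, sampling points of the plasmid assay
true_k = 0.05                         # hypothetical rate constant for demo data only
nicked = [1.0 - math.exp(-true_k * ti) for ti in times]
k, amp = fit_monoexponential(times, nicked)
print(f"k = {k:.4f} min^-1, amplitude = {amp:.3f}")  # k = 0.0500 min^-1, amplitude = 1.000
```

With real gel data, the residuals would be noisy. In that case, a library routine such as SciPy's `curve_fit` would also give parameter uncertainties.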
TEV protease cleavage of MutH–I-SceI, to separate the DNA-binding and DNA-cleavage modules, was performed in 50 mM Tris–HCl, pH 8.0, 0.5 mM EDTA, 1 mM DTT, 10 units/150 µl of TEV protease (Invitrogen) and 0.44 µM MutH–I-SceI in the presence of 4.4 µM of a 53-bp oligonucleotide harbouring a GATC recognition site. The reaction mixture was incubated for 90 min at 30°C. Afterwards, an aliquot was taken to test the activity of MutH–I-SceI in the plasmid cleavage assay described above, using the addressed substrate. To rule out that a loss of catalytic activity of MutH–I-SceI after TEV proteolysis was caused by the incubation procedure itself, the whole experiment was performed in parallel with water instead of TEV protease.
To investigate whether TALE–MutH nicks the top or the bottom strand of a DNA substrate (as defined by the asymmetric recognition sequence of the DNA-binding module), PCR products were generated using a forward and a reverse primer that were 5′-labelled with the fluorophores Atto 488 and Atto 647N, respectively. These experiments were carried out for the T-x-H substrates (x = 1, 2, 3, 4, 5, 6, 7, 8 and 9 bp). The assay was performed with 20 nM of a 211-bp DNA substrate and 120 nM TALE–MutH (i.e. a 6-fold excess of enzyme over substrate) in 20 mM Tris-acetate, 120 mM K-acetate, 1 mM MgCl2, 50 µg/ml BSA, pH 7.5 at 37°C. The ionic strength of the reaction mixture was ca. 160 mM (which corresponds to physiological conditions). The reaction progress was analysed after defined time intervals (3, 10, 30, 60, 120 and 180 min) by denaturing PAGE; the fluorescence of DNA bands was visualized with the VersaDoc Imaging System (Bio-Rad).
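The quoted ionic strength of ca. 160 mM can be checked roughly with I = ½ Σ cᵢ zᵢ². The sketch below rests on several stated assumptions. First, Tris is titrated with acetic acid, so that TrisH⁺ and its acetate counterion equal the protonated Tris fraction. That fraction is estimated with a pKa of 8.07, the approximate 25°C value; the temperature shift at 37°C is ignored. Second, the enzyme contributes KCl from its 500 mM KCl storage buffer at a hypothetical volume fraction of 1/25, because the actual dilution is not stated. Third, HEPES, EDTA and BSA are neglected. With these assumptions, the estimate lands close to the stated value. It is not a reconstruction of the authors' own calculation.

```python
def ionic_strength(ions):
    """I = 1/2 * sum(c_i * z_i^2); ions is a list of (concentration_mM, charge)."""
    return 0.5 * sum(c * z * z for c, z in ions)

def protonated_fraction(ph, pka):
    """Henderson-Hasselbalch fraction of a monoprotic base present as its cation."""
    return 1.0 / (1.0 + 10 ** (ph - pka))

TRIS_PKA = 8.07                   # assumption: approximate pKa of Tris at 25 °C
ENZYME_VOLUME_FRACTION = 1 / 25   # hypothetical share of reaction volume from enzyme stock

tris_h = 20.0 * protonated_fraction(7.5, TRIS_PKA)   # mM TrisH+, balanced by acetate
kcl_carryover = 500.0 * ENZYME_VOLUME_FRACTION        # mM KCl from the storage buffer

ions = [
    (120.0, +1), (120.0, -1),                  # 120 mM potassium acetate
    (1.0, +2), (2.0, -1),                      # 1 mM MgCl2
    (tris_h, +1), (tris_h, -1),                # 20 mM Tris-acetate, pH 7.5
    (kcl_carryover, +1), (kcl_carryover, -1),  # KCl carried over with the enzyme
]
print(f"I = {ionic_strength(ions):.0f} mM")  # I = 159 mM
```

The MgCl2 term illustrates why divalent ions weigh more heavily. Only 1 mM MgCl2 contributes 3 mM to the ionic strength.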
We had recently used two fundamentally different approaches to produce highly specific nucleases for genome engineering: (i) fusion proteins whose specificity is defined by a catalytically inactive homing endonuclease [I-SceI, (12)] and (ii) fusion proteins with a programmable specificity defined by a ZF array (11) or a TALE protein (M. Yanik, unpublished data). In both cases, the Type II restriction endonuclease PvuII served as the DNA-cleavage module. We have now used these two approaches to generate highly specific nickases, MutH–I-SceI and TALE–MutH. In analogy to our PvuII–I-SceI construct (12), we produced a MutH–I-SceI fusion protein with an N-terminal His6-tag and a C-terminal Strep-tag. Likewise, in analogy to our ZF- (11) and TALE–PvuII constructs (M. Yanik, unpublished data), we produced a TALE–MutH fusion protein with an N-terminal Strep-tag and a C-terminal His6-tag (Figure 1).
The rates of DNA nicking were determined with unmethylated plasmid substrates carrying three different recognition sites: (i) the addressed bipartite recognition site composed of an I-SceI recognition site next to a MutH recognition site (GATC), (ii) a stand-alone I-SceI recognition site and (iii) a stand-alone MutH recognition site. All three plasmid substrates contain 19 additional GATC sites. As shown in Figure 2, only the plasmid substrate with the addressed bipartite site is nicked. Even at a 4-fold excess of enzyme over unaddressed substrate and after prolonged incubation (2 h), no non-specific nicking or cleavage is observed. Given the sensitivity of the assay, these results show that the plasmid substrate with the addressed site is nicked at least 1000-fold faster than the other plasmid substrates. The preference for the addressed site over an unaddressed site might in fact exceed 1000-fold, because (i) the plasmids used in the assay contain 19 unaddressed sites, i.e. stand-alone GATC sites, but only one addressed site, and (ii) the lower limit of detection had been reached in determining the rate of nicking of the unaddressed substrate.
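The lower-bound argument above can be made explicit. Suppose less than a fraction f of an unaddressed plasmid is nicked within time t. For a first-order reaction, 1 − exp(−k·t) < f, so k < −ln(1 − f)/t. Dividing the addressed rate constant by this bound gives a per-plasmid preference. Spreading the unaddressed plasmid rate over its 19 stand-alone GATC sites raises the per-site preference by a further factor of 19. In the sketch below, the 120-min incubation and the 19 sites come from the text. The addressed rate constant (0.5 min⁻¹) and the 5% detection limit are hypothetical inputs, used only for illustration and not taken from the data.

```python
import math

def max_rate_below_detection(detect_fraction: float, t_min: float) -> float:
    """Upper bound on a first-order rate constant (min^-1) when less than
    detect_fraction of the substrate is converted within t_min minutes."""
    return -math.log(1.0 - detect_fraction) / t_min

def preference_lower_bounds(k_addressed, detect_fraction, t_min, n_unaddressed_sites):
    """Return (per-plasmid, per-site) lower bounds on the addressed/unaddressed preference.
    The per-site bound divides the unaddressed plasmid rate among its GATC sites."""
    k_max = max_rate_below_detection(detect_fraction, t_min)
    per_plasmid = k_addressed / k_max
    return per_plasmid, per_plasmid * n_unaddressed_sites

# Hypothetical k_addressed = 0.5 min^-1 and 5% detection limit; 120 min and 19 sites from the text.
per_plasmid, per_site = preference_lower_bounds(0.5, 0.05, 120.0, 19)
print(f"per plasmid >= {per_plasmid:.0f}-fold, per GATC site >= {per_site:.0f}-fold")
```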
The activity of the MutH–I-SceI fusion construct can be abolished by separating the DNA-binding and DNA-cleavage modules, i.e. by pre-incubating MutH–I-SceI with TEV protease, which cleaves the linker peptide connecting the two modules. Figure 3 shows that covalent linkage of I-SceI and MutH is necessary for nicking of the addressed site: no nicking or cleavage is observed when I-SceI and MutH are present in the reaction mixture but are not covalently linked to each other. If the control MutH–I-SceI fusion construct, whose linker lacks a TEV protease cleavage site, is pre-incubated with TEV protease, the addressed substrate is nicked. This demonstrates that TEV protease does not cleave MutH or I-SceI at a cryptic TEV protease recognition site.
Because the DNA-binding and cleavage modules of MutH–I-SceI are connected by a flexible linker, the question arose whether MutH can nick both strands or whether one strand is preferred. Guided by the molecular model shown in Figure 1, we expected the bottom strand to be nicked preferentially, if not exclusively. The strand specificity of MutH–I-SceI was determined by sequencing the nicked product. As shown in Figure 4, the top strand remains intact and the bottom strand is nicked 5′ of the GATC site.
The experiments used to determine the preference of MutH–I-SceI for addressed over unaddressed substrates were also carried out for the TALE–MutH fusion protein. As shown in Figure 5, TALE–MutH is specific for the addressed bipartite recognition site, which consists of the TALE target site (here the AvrBs4 target site) at a defined distance from a MutH recognition site. We tested spacer lengths of 1, 2, 3, 4, 5, 6, 7, 8 and 9 bp between the two sites on the addressed substrate and found that a distance of 3 bp (T-3-H) is optimal for nicking (Figure 5C). No double-strand breaks were detectable.
To determine the strand specificity of TALE–MutH, we used a 211-bp PCR product that carried different fluorophores on the 5′-ends of the top and bottom strands, respectively. Depending on which strand is preferentially nicked, the electrophoretic analysis of the nicking reaction yields a different characteristic fluorescence image. As shown in Figure 6, TALE–MutH nicks the bottom strand of the substrate T-3-H with a more than 100-fold preference over the top strand. Intriguingly, the substrate T-6-H is nicked preferentially in the top strand, albeit ca. 3-fold more slowly than the bottom strand of T-3-H. Similar experiments were also performed for the other substrates: T-1-H to T-4-H are nicked preferentially in the bottom strand, whereas T-5-H to T-9-H are nicked preferentially in the top strand. This switch from bottom-strand to top-strand nicking may be correlated with the decrease in the rate of nicking between T-3-H and T-6-H shown in Figure 5. As for MutH–I-SceI (Figure 4), the strand specificity for the substrates T-3-H and T-6-H was also determined by sequencing. The results show that the bottom strand (T-3-H) and the top strand (T-6-H), respectively, are nicked only at the ↓GATC site.
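The two-colour readout works because a nick shortens only the labelled strand that carries it. The full-length 211-nt single strands appear in both channels, whereas a nick 5′ of the G in GATC releases a shorter 5′-labelled fragment in one channel only. The sketch below computes the expected labelled fragment lengths on a denaturing gel. It assumes that the forward primer (Atto 488) labels the 5′ end of the top strand and the reverse primer (Atto 647N) labels the 5′ end of the bottom strand. The GATC position used in the example is hypothetical, because the exact position within the 211-bp product is not given here. Because GATC is palindromic, the bottom-strand G pairs with the top-strand C, so the two possible nicks are staggered by 4 nt.

```python
def labelled_fragment_lengths(length: int, gatc_start: int) -> dict[str, int]:
    """Lengths (nt) of the 5'-labelled fragments released by a nick 5' of G in GATC.

    length     -- length of the double-stranded PCR product (bp)
    gatc_start -- 0-based top-strand index of the G of the GATC site
    Assumes the top strand carries Atto 488 and the bottom strand Atto 647N at 5' ends.
    """
    if not 0 <= gatc_start <= length - 4:
        raise ValueError("the GATC site must lie entirely within the product")
    # Top-strand nick just before G: labelled fragment covers indices 0..gatc_start-1.
    top = gatc_start
    # On the bottom strand, G sits opposite top index gatc_start + 3; its labelled
    # 5' fragment spans top indices length-1 down to gatc_start + 4.
    bottom = length - gatc_start - 4
    return {"top (Atto 488)": top, "bottom (Atto 647N)": bottom}

# Hypothetical GATC position 100 in the 211-bp product.
print(labelled_fragment_lengths(211, 100))  # {'top (Atto 488)': 100, 'bottom (Atto 647N)': 107}
```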
The efficiency of homologous recombination can be increased by several orders of magnitude by a specific double-strand break at the locus of interest (49). The targeted insertion of DNA into a predefined locus by homologous recombination requires highly specific nucleases (50,51). Such nucleases became available with the fusion of zinc fingers to the FokI cleavage domain (52). It was demonstrated, however, that engineered double-strand-specific nucleases can introduce mutations when the double-strand break is repaired by the error-prone NHEJ pathway (53) rather than by HDR, and that nicking enzymes suppress NHEJ (14,15). DNA nicks can therefore initiate efficient gene correction with less genomic instability than a targeted DNA double-strand break. Ramirez et al. (33), Wang et al. (34) and Kim et al. (32) introduced zinc finger nickases (ZF-nickases) for targeted gene insertion and showed that they induce HDR with reduced mutagenic effects. As there are other architectures for highly specific double-strand-specific nucleases [e.g. (11,12); M. Yanik, unpublished data], it should be possible to generate highly site- and strand-specific nickases other than ZF-based ones. In this article, we have shown that specificity-determining DNA-binding modules (catalytically inactive I-SceI and the DNA-binding domain of a TALE protein, respectively) can be fused to a specific nicking enzyme (MutH) to produce a highly sequence- and strand-specific nickase. The resulting fusion proteins, MutH–I-SceI and TALE–MutH, recognize their respective bipartite recognition sequence, which consists of the recognition site of the DNA-binding module and the MutH recognition site (GATC). They nick only one strand, 5′ of the GATC site, and do not nick stand-alone GATC sites or any other sites. This makes them potentially useful tools for site-specific nicking of DNA in general and for precision genome engineering in particular.
MutH is a site-specific DNA-nicking enzyme that, in its natural function, requires complex formation with activated MutL to be directed to its target site, a hemimethylated GATC site that is nicked in the unmethylated strand. By itself, at physiological ionic strength and Mg2+ concentration (which we deliberately chose with a view to applications in vivo), MutH does not attack unmethylated GATC sites. Its activity depends strictly on covalent coupling to a DNA-binding module, as shown here for MutH–I-SceI: proteolytic separation of the DNA-binding and DNA-cleavage modules abolishes nicking. This finding suggests that MutH cannot bind productively to the bipartite recognition site unless it is positioned properly by the I-SceI (or TALE) module as part of the fusion protein.
We had previously shown (12) that catalytically inactive I-SceI can serve as a specific DNA-binding module for the restriction endonuclease PvuII. Our results with MutH–I-SceI demonstrate that this is also possible with other nucleases, here a nicking enzyme that is structurally related to PvuII (47). MutH–I-SceI recognizes a unique sequence. If such a sequence is introduced into a complex genome, it could be used as a target site for genome engineering, as has been done with I-SceI sites for in planta gene targeting (54).
Of particular interest is the TALE–MutH fusion protein, which, in contrast to MutH–I-SceI, can be programmed to recognize almost any DNA sequence. Like MutH–I-SceI, the TALE–MutH fusion protein requires a GATC site next to the recognition sequence of its DNA-binding module, here the TALE recognition sequence. This requirement should not limit the usefulness of this programmable nickase, as GATC sites occur on average every 256 bp and are unmethylated in eukaryotic genomes. In contrast to the FokI-based ZF-nickases (32–34), TALE–MutH needs only one TALE protein for targeting to a specific site. Because MutH functions as a monomer, whereas FokI is a functional dimer, this reduces the size of a MutH-based TALE-nickase by 50% compared with a FokI-based TALE-nickase. The DNA-binding module of TALE proteins can be used to program the restriction endonuclease PvuII to cleave a bipartite recognition sequence consisting of the TALE recognition sequence and the PvuII recognition sequence (M. Yanik, unpublished data). We therefore believe that other strand-specific nicking enzymes can also function as highly specific nickases in fusion proteins consisting of a DNA-binding module and a DNA-nicking module. Examples are restriction enzymes such as Nt.CviPII (19,20) and subunits of heterodimeric restriction endonucleases, e.g. Nt.BstD6I (21,22), Nb.BsrDI and Nb.BtsI (23) (Nt and Nb indicate specificity for top- and bottom-strand nicking, respectively). Unlike these naturally occurring nicking enzymes, the TALE–MutH fusion protein can nick either the top or the bottom strand, depending on the distance between the TALE recognition site and the GATC sequence. For example, with a spacer length of 2–4 bp (optimally T-3-H), the bottom strand is nicked with a several hundred-fold preference over the top strand. In contrast, with a spacer length of 5–8 bp (optimally T-6-H), the rate of nicking decreases, but the top strand is now preferred.
Substrates with spacer lengths of 1 and 9 bp are hardly attacked at all. This suggests that the junction between the DNA-binding module and the DNA-cleavage module has some flexibility. This flexibility allows MutH to reach the scissile phosphodiester bond in either the bottom or the top strand, depending on the position of the MutH recognition site relative to the TALE recognition site. We believe that shortening or lengthening the linker between TALE and MutH could have a similar effect, i.e. changing the preference for bottom- or top-strand nicking. We did not determine the optimal linker between AvrBs4 and MutH experimentally but extrapolated the linker length (28 aa) in our TALE–MutH fusion protein from a previously constructed TALE–PvuII fusion. The specificity might therefore be increased further by slight modifications of the linker length. Likewise, we have not optimized the DNA-binding module of the TALE protein but used the natural AvrBs4 protein sequence; exchanging some of the RVDs for ‘strong’ RVDs (55) could well increase specificity further.
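The estimate that GATC sites occur on average every 256 bp follows from 4⁴ = 256. This assumes that each of the four bases occurs independently and with equal frequency. The sketch below states that assumption explicitly through a GC-content parameter and compares the expectation with a count in a random sequence. GATC is its own reverse complement, so scanning one strand finds every site. Moreover, GATC cannot overlap itself, so Python's non-overlapping `str.count` is exact here. The random sequence is synthetic and does not stand in for any real genome.

```python
import random

def count_gatc(seq: str) -> int:
    """Count GATC sites. GATC is palindromic (one strand suffices) and cannot
    overlap itself, so str.count's non-overlapping count is exact."""
    return seq.upper().count("GATC")

def expected_gatc(length: int, gc: float = 0.5) -> float:
    """Expected GATC count in a random sequence with independent bases:
    P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2."""
    p = (gc / 2) ** 2 * ((1 - gc) / 2) ** 2
    return (length - 3) * p

print(expected_gatc(256_003))  # 1000.0, i.e. one site per 256 bp at 50% GC

random.seed(1)
demo = "".join(random.choices("ACGT", k=1_000_000))  # synthetic sequence
print(count_gatc(demo), round(expected_gatc(len(demo))))
```

Lowering `gc` below 0.5 makes the expected spacing somewhat longer than 256 bp. Real genomes also deviate from the independent-base assumption, so the 256-bp figure should be read as an average.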
Deutsche Forschungsgemeinschaft [International Research Training Group GRK 1384, and Excellence Cluster ECCPS] and the Justus-Liebig-University Giessen [Just’us]. Funding for open access charge: DFG and the Justus-Liebig-University Giessen.
Conflict of interest statement. None declared.
The authors thank Mert Yanik, Marika Midon, Ines Fonfara, Andreas Marx and Roger Heinze for fruitful discussions and plasmids, Laura Waltl for assistance and Anja Drescher for critical reading of the article. L.G. is a member of the International Research Training Group ‘Enzymes and Multienzyme Complexes acting on Nucleic Acids’ funded by the Deutsche Forschungsgemeinschaft (DFG). B.S. has been a recipient of a grant (Just’us) from the Justus-Liebig-University Giessen.