Type II REases are known for their extraordinary sequence specificity. The precise recognition of the target DNA sequence is achieved by an extensive network of protein–DNA interactions and, frequently, by the redundancy of contacts with a particular base. Most of the reported changes in the sequence specificity of REases led either to a variant that recognized a subset of the original degenerate targets (7) or to an extension of the original recognition sequence through the introduction of degeneracy at one of the positions (9). Our results belong to the first category, since the new variants of R.MwoI preferentially cleave only a subset of the original recognition sequence. The engineering of enzymes from the R.MmeI family was a distinctive case in which deciphering the “sequence recognition code” led to the construction of several REases with new specificities. Here, we report the results of structure-guided protein engineering of the thermophilic REase R.MwoI, which has led to the development of two single-position variants with sequence preferences clearly different from those of the original enzyme.
In the absence of a high-resolution crystal structure for R.MwoI, we used bioinformatics methods to construct a theoretical model of the protein–DNA complex. The model was supported by site-directed mutagenesis of residues predicted to be functionally important, and compared with the crystal structures of two related enzymes with more stringent, but overlapping, sequence specificities: R.BglI (GCCNNNNNGGC) and R.SfiI (GGCCNNNNNGGCC). This comparison suggested that a substitution of residue S310 of R.MwoI (which is not required for the cleavage of its natural substrate) may lead to the acquisition of selectivity for positions 3 and 9 of the MwoI cognate sequence. The model-based prediction was confirmed by the experimental analysis of S310 substitutions in R.MwoI. We obtained four enzyme variants with sequence selectivity different from that of the wt protein. The sequence preference depends on the chemical character of the amino acid side chain introduced in place of the original Ser residue. Substitution with a positively charged residue, either Arg (which is present in an analogous position in R.BglI and R.SfiI) or Lys (which is present in some of the uncharacterized homologs, see Supplementary Figure 1), leads to selectivity for the R.BglI recognition sequence (GCCNNNNNGGC). On the other hand, introduction of a negatively charged residue (not found in naturally occurring homologs) causes a preference toward the GCGNNNNNCGC sequence; this effect is more pronounced in the S310E variant than in the S310D variant. Other substitutions of S310 did not cause a change in the cleavage pattern (data not shown). We predict that the specificity arises from the specific recognition of Gua by Arg (or Lys), through a direct contact of the positively charged side chain with the O6 atom of the DNA base, versus the recognition of Cyt by Glu or Asp, through either a direct (Glu) or a water-mediated (Asp) contact of the negatively charged side chain with the N4 amino group of the DNA base. It will be interesting to try analogous substitutions in other enzymes of the R.MwoI/R.BglI/R.SfiI family, as they may lead to similar changes in specificity.
Biochemical characterization of the R.MwoI variants with altered sequence specificity demonstrated that their selectivity increased substantially. When assayed on a set of short substrates with unique MwoI sites, the wt enzyme displays a 5-fold difference in the cleavage rate between substrates that vary at positions 3 and 9. In the case of the three most selective variants, we could detect cleavage of only one preferred sequence and an at least 20-fold lower cleavage of another sequence (T–A). Cleavage of the two other substrates was undetectable, which indicates that these variants exhibit an at least 1000-fold increase in substrate cleavage selectivity.
Patterns of λ DNA cleavage generated by the S310R and S310E variants of R.MwoI indicate, however, that they have more complex sequence preferences than those characterized on the limited panel of short substrates used in the kinetic analysis. Cloning and sequencing of fragments generated by the cleavage of λ DNA provided us with data amenable to statistical analysis of the variants’ preferences. The S310R variant preferentially cleaves sequences with A–G, C–G, G–G and T–G at positions 3 and 9 of the GCNNNNNNNGC site (positions 3 and 9 being the discriminated positions). Therefore, its sequence preference can be summarized as GCNNNNNNGGC. Under partial cleavage conditions, the S310R variant shows a clear preference toward the GCCNNNNNGGC sequence, similar to the preference revealed in kinetic assays on substrates with a single recognition site.
Cleavage preferences of the S310E variant, according to the analysis of the cloned cleavage products, are broader than those of the S310R variant. The S310E variant preferentially cleaves sites with C–G and G–C at positions 3 and 9 of the GCXNNNNNXGC site, but sites with G–A, G–G, T–A and T–G are cleaved only slightly less efficiently. More apparent is the discrimination of the S310E variant against sites with A–A, A–G and A–T at positions 3 and 9.
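The kind of tally that underlies such a comparison can be illustrated with a minimal sketch. It assumes only the MwoI recognition sequence GCNNNNNNNGC given in the text; the function name `tally_position_pairs` and the example sequence are hypothetical, and the sketch is not the statistical procedure used in this study. It scans one strand for MwoI sites (including overlapping ones) and counts the bases found at positions 3 and 9.

```python
import re
from collections import Counter

# MwoI recognition site GCNNNNNNNGC (11 bp). The lookahead lets the
# scan report overlapping sites as well.
MWOI_SITE = re.compile(r"(?=(GC[ACGT]{7}GC))")


def tally_position_pairs(sequence):
    """Count the bases at positions 3 and 9 of every MwoI site in `sequence`.

    Hypothetical helper for illustration: only the strand as given is read,
    so strand-equivalent pairs (e.g. A-G and C-T) are not merged.
    """
    counts = Counter()
    for match in MWOI_SITE.finditer(sequence.upper()):
        site = match.group(1)
        # Positions in the text are 1-based: 3 and 9 are indices 2 and 8.
        counts[f"{site[2]}-{site[8]}"] += 1
    return counts


# Made-up example containing one GCCNNNNNGGC site and one GCGNNNNNCGC site.
example = "TTGCCAAAAAGGCTTGCGTTTTTCGCAA"
print(tally_position_pairs(example))
```

Comparing such counts for cleaved sites against the counts for all MwoI sites in the substrate would indicate which pairs are over- or under-represented among the cleavage products.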
Further protein engineering efforts are necessary to obtain R.MwoI variants with fully developed specificities toward a single six-nucleotide interrupted palindrome. One possibility to explore is the introduction of additional contacts to the base complementary to that contacted by the residue at position 310 in R.MwoI. Analysis of the R.MwoI model and of a sequence alignment with R.BglI and R.SfiI, which do form such contacts, reveals, however, a deletion in the corresponding loop in R.MwoI (see , around position 308 in R.MwoI), which hampers the introduction of a suitable contact by a simple amino acid substitution. Thus, optimization of the S310E and S310R specificities will most likely require extensive experiments involving alternative insertions with different sequences, each tested for its specificity.
Measurements of the binding affinities of the S310R, S310K and S310E R.MwoI variants suggest that the discrimination of nucleotide residues at positions 3 and 9 is caused by differences in substrate binding. The R.MwoI variants bind their preferred substrates slightly more weakly than the wt enzyme does, but they display at least a 25-fold discrimination against other sequences. Equilibrium binding constants show a 100- to 1000-fold difference in KD between the preferred sequence and other palindromic extensions of the MwoI cognate sequence. The KD values demonstrate that the variants’ affinities for the preferred sequences did not increase in comparison with the wild-type enzyme. The main factor responsible for the sequence preference of the obtained proteins is therefore a significant decrease in the affinities for substrates with sequences other than the preferred one. We suspect that new interactions with positions 3 and 9 are formed in the engineered variants, but that these may also weaken other elements of the substrate binding network, compared with the wt enzyme. Answering such detailed questions would require a high-resolution crystal structure. Unfortunately, our attempts to crystallize R.MwoI–DNA complexes have thus far been unsuccessful.
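To put these fold differences on an energy scale, a ratio of dissociation constants can be converted into a difference in binding free energy with the standard relation ΔΔG = RT ln(KD,other/KD,preferred). The sketch below is illustrative only: the temperature of 298.15 K is an assumption (the assay temperature is not stated here), and the function name `ddg_from_kd_ratio` is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd_ratio(fold, temperature_k=298.15):
    """Binding free energy difference (kcal/mol) for a KD ratio of `fold`.

    Uses ddG = R * T * ln(KD_other / KD_preferred).
    The default temperature (25 degrees C) is an assumption, not a value
    taken from the binding experiments.
    """
    return R_KCAL * temperature_k * math.log(fold)


# Fold differences quoted in the text: 25-, 100- and 1000-fold.
for fold in (25, 100, 1000):
    print(f"{fold}-fold: {ddg_from_kd_ratio(fold):.2f} kcal/mol")
```

Under the assumed temperature, a 100- to 1000-fold difference in KD corresponds to roughly 2.7 to 4.1 kcal/mol, which is of the order expected for the loss or gain of a small number of specific protein–DNA contacts.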
Our approach was to some extent similar to the one used in one of the first attempts to engineer the R.EcoRV REase, in which the 6-bp EcoRV recognition sequence was extended by nucleotides in the flanking positions (5). In principle, such changes do not require any alteration of the existing interaction network in the native enzyme, but rather involve the introduction of new interactions, which discriminate nucleobases at additional positions in the immediate vicinity of the wt target sequence. In the case of R.MwoI, which recognizes an interrupted sequence, it was possible to extend the recognition toward internal positions that are not recognized by the wt enzyme. In the DNA co-crystal structure of R.NotI (a REase that recognizes an 8-bp sequence), the central six base pairs form 17 interactions with the REase, which saturate the interaction potential of this part of the substrate (49). However, the distal base pairs of the NotI site are recognized by only two amino acid residues. This observation led to the hypothesis that extension (or shortening) of the target sequence may be one of the mechanisms of divergence in REases. Therefore, the approach we have used bears some resemblance to natural evolution.
R.MwoI is a thermophilic REase with an optimum reaction temperature of 60°C, 10% activity at 37°C and undetectable activity at room temperature. Here, we report efficient expression of R.MwoI in E. coli without a cognate methyltransferase, when the cells are grown at room temperature. Such a system opens new possibilities for REase specificity engineering that are not available for mesophilic REases. Its most important application is the potential to express native or engineered thermophilic REases with any sequence specificity, because the methyltransferase is not required to protect the host DNA from cleavage. In the case of thermophilic enzymes with detectable activity at 37°C or 42°C, it is possible to create permissive and non-permissive conditions in the same E. coli host. In such a situation, a screen for desirable REase mutants could be carried out by combining the replica plating technique with two different growth temperatures. Two obvious examples of such a strategy would be (i) screening for inactive variants (colonies will grow in a host without the methyltransferase regardless of the incubation temperature); (ii) screening for variants with a sequence specificity not recognized by the cognate methyltransferase (even in the presence of the cognate methyltransferase, colonies will not grow at the permissive temperature). In any case, the possibility of expressing a thermophilic REase without a cognate MTase in the host cells relieves the experimentalist of the need to first engineer a cognate MTase with the new specificity. Although the R.MwoI variants obtained in this study do not cut sequences that are not recognized by the wt enzyme, the experimental protocol described herein allows for engineering REase variants that could cut other sites as well.
The results of our study demonstrate the utility of protein modeling as a tool to guide protein engineering. We have shown that a single residue replacement can lead to the development of new, different sequence preferences. This finding helps to explain the extraordinary potential of REases to develop new sequence specificities in nature. Together with the new procedure for engineering thermophilic REases, our study provides a toolkit for the rational engineering of sequence specificities in Type II REases.