Identification of putative MmeI-like restriction endonuclease genes
A BLASTP search using the protein sequence of MmeI as query against the non-redundant GenBank database returned more than 100 sequences that produced highly significant expectation values of E < e–20. The identified putative sequences were annotated as ‘hypothetical proteins’ or ‘putative DNA methyltransferases.’ While it might be expected that the DNA methyltransferase portion of the bi-functional MmeI protein would produce matches to DNA methyltransferase genes because these contain conserved sequence motifs, none of the top 100 hits included typical type II DNA methyltransferases. Many of the putative sequences identified, and especially the highest scoring sequences, were highly similar to MmeI throughout their entire protein sequence, including the endonuclease and DNA recognition domains. The identified protein sequences were aligned and two groups were identified. Sequences in the first group were similar to the entire MmeI protein and contained conserved amino-acid residues of the PD–ExK endonuclease family in their amino-terminal domain that aligned with those of MmeI and with each other. Sequences in the second group did not contain the PD–ExK endonuclease motif and differed from MmeI in their first 100 amino-acid residues, yet were highly similar to MmeI and the first group of putative genes throughout the rest of their sequences. No additional DNA methyltransferase genes were observed flanking either group of MmeI homologs. No particular genes consistently flanked the homologs that contained the endonuclease domain motif. However, the homologs that lacked the endonuclease motif, such as YeeA (GenBank accession no. NP_388558) from Bacillus subtilis, were flanked by two conserved genes: one identified as a putative DNA helicase, similar to YeeB of B. subtilis (GenBank accession no. AAB66475), and a second identified as a conserved hypothetical protein similar to YeeC of B. subtilis (GenBank accession no. AAB66476). Selected sequences were cloned into E. coli for expression and characterization.
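The E-value filtering step described above can be illustrated with a short sketch. It assumes the search results were exported in the standard 12-column BLAST tabular layout and reads the reported cutoff as 1e-20; both are assumptions made for illustration, and the hit identifiers below are hypothetical placeholders rather than sequences from this study.

```python
import csv
import io

# Column names of the standard 12-column BLAST tabular output.
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text, max_evalue=1e-20):
    """Return (subject id, E-value) for hits with E-value below max_evalue."""
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    hits = []
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(FIELDS, row))
        evalue = float(record["evalue"])
        if evalue < max_evalue:
            hits.append((record["sseqid"], evalue))
    return hits


# Hypothetical rows, for illustration only.
rows = [
    ["MmeI", "hitA", "52.1", "900", "400", "12", "1", "900", "5", "910", "3e-45", "180"],
    ["MmeI", "hitB", "28.0", "150", "100", "4", "300", "450", "20", "170", "2e-05", "48"],
    ["MmeI", "hitC", "76.0", "915", "210", "3", "1", "915", "1", "918", "0.0", "1400"],
]
toy_table = "\n".join("\t".join(r) for r in rows)
selected = filter_hits(toy_table)
print(selected)
```

Only the first and third toy rows pass the cutoff; the weak partial match is discarded.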
Characterization of novel restriction endonucleases
We have identified, expressed and characterized 20 novel restriction endonucleases. For each new enzyme the name, recognition sequence and DNA cleavage position, source organism, protein accession number and source of genomic DNA is presented in . Of the 20 newly discovered enzymes, 19 have unique DNA specificities not previously known for type II restriction–modification (R–M) systems. The one exception, AquIII [5′-GAGGAG(20/18)], recognizes the same DNA sequence as BseRI [5′-GAGGAG(10/8)] but cuts the DNA at a different position, making it a neoschizomer to BseRI. While each enzyme recognizes a different sequence, some of the recognition sequences differ at only one base position, while others differ at every position except the penultimate adenine that is the target for the DNA methyltransferase activity of these proteins. The enzymes all required AdoMet for endonuclease activity. An example of recognition sequence determination is shown in . The positions of the single RpaB5I cut in pBR322, and the four cuts in pBC4, were mapped by cutting the DNAs with RpaB5I and restriction enzymes having a single site in these DNAs. The sequence CGAGGAC or CGGGGAC was found to occur only at the mapped positions in these DNAs, and did not occur in two other DNAs, pUC19 and PhiX174, that were not cut by RpaB5I. The observed fragment sizes produced by RpaB5I digestion of larger DNA substrates such as lambda, T7 and T3 matched the predicted fragment sizes for cutting at CGRGGAC, confirming that this is the specific recognition sequence for RpaB5I.
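The reasoning used to assign the RpaB5I recognition sequence (a candidate degenerate sequence must occur at every mapped cut position and be absent from uncut DNAs) can be sketched as a site search on both strands. The following is a minimal illustration, not the procedure actually used; the function names and the toy sequence are hypothetical, and only the IUPAC code table is standard.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a sequence that may contain IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(dna, site):
    """Return sorted (position, strand) pairs for every occurrence of site.

    Positions are 0-based indices of the leftmost matched base on the top
    strand. Strand is '+' when the site reads 5'->3' on the top strand and
    '-' when it reads 5'->3' on the bottom strand.
    """
    dna = dna.upper()
    hits = []
    for strand, motif in (("+", site), ("-", reverse_complement(site))):
        pattern = "".join(IUPAC[base] for base in motif)
        # A lookahead reports overlapping occurrences as well.
        for match in re.finditer(f"(?=({pattern}))", dna):
            hits.append((match.start(), strand))
    return sorted(hits)


# Hypothetical toy sequence: one top-strand site, one bottom-strand site,
# and one near-miss (CGTGGAC) that should not be reported.
toy = "TTTCGAGGACTTTTGTCCCCGAAACGTGGACTT"
sites = find_sites(toy, "CGRGGAC")
print(sites)
```

A DNA that returns an empty list for the candidate sequence would be predicted to remain uncut, as observed for pUC19 and PhiX174.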
The enzymes all cut DNA at essentially the same position relative to their recognition site, that is 20 (±1) bases 3′ to the recognition sequence in the top DNA strand that contains the adenine that is methylated, and two bases 3′ to this top strand cut in the bottom strand, to produce a two-base 3′ extension. The exact position of DNA scission can vary by one base at different sites for each enzyme, dependent upon the sequence that occurs between the recognition site and cleavage point. An example of run off sequencing to determine the cleavage position relative to the recognition sequence is shown for RpaB5I, which cuts at 5′-CGRGGAC(20/18)-3′ (). For many of the enzymes variability of one base longer or shorter than the typical (20/18) reach was observed for cleavage at different sites, and some sites showed a mixture of cutting length products, for example at (21/19) and (20/18). The cleavage distance most frequently observed for each enzyme is reported in ; however it should be understood that an enzyme reported as (21/19) may cut some sites at (20/18) and vice versa.
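The (20/18) notation can be made concrete with a small coordinate calculation. This sketch assumes a site lying on the top strand and 0-based indexing; the helper names and the toy substrate are hypothetical. A cut coordinate here means the number of top-strand bases lying to the left of the cut.

```python
def cut_positions(site_start, site_len, top_reach=20, bottom_reach=18):
    """Return (top_cut, bottom_cut) coordinates for a top-strand site.

    Both reaches are counted in bases 3' of the last base of the recognition
    sequence, measured along the top strand.
    """
    site_end = site_start + site_len
    return site_end + top_reach, site_end + bottom_reach


def three_prime_overhang(dna, site_start, site_len, top_reach=20, bottom_reach=18):
    """Top-strand bases left single-stranded between the two cuts."""
    top_cut, bottom_cut = cut_positions(site_start, site_len, top_reach, bottom_reach)
    return dna[bottom_cut:top_cut]


# Hypothetical substrate: a CGRGGAC site followed by 24 arbitrary bases.
dna = "CGAGGAC" + "ACGT" * 6
top_cut, bottom_cut = cut_positions(0, 7)
overhang = three_prime_overhang(dna, 0, 7)
print(top_cut, bottom_cut, overhang)
```

Passing `top_reach=21, bottom_reach=19` models the one-base-longer reach seen at some sites; the overhang length stays at two bases in either case.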
Two members of the set of sequences that do not contain the PD–ExK endonuclease motif, YeeA of B. subtilis and MslORFHP of Moraxella osloensis NEB722, were similarly expressed but no endonuclease activity was observed.
Activation of inactive native genes
Three enzymes were activated from open reading frames that were found to be disrupted in the particular isolate used for genomic sequence determination. The DNA sequence reported in the database leading to the interruption in the coding frame was confirmed for all three cases. Successful prediction of where to introduce changes and what specific changes to make to correct the reading frames of these enzymes was possible due to the significant sequence conservation found among members of this protein family. Only a single base change was necessary to change early termination stop codons to coding codons for NmeAIII and DraRI. For NmeAIII the early termination TAG stop codon at amino-acid position 32 of the full-length ORF was changed to TGG (tryptophan) using primers 1 and 2 (Supplementary Table T1). The early termination codon TAA at position 841 in DraRI was corrected to GAA (glutamate) using primers 3 and 4 (Supplementary Table T1). ApyPI contained a frame shift following R886 that was corrected by the addition of two bases, GC, to a run of three GC dinucleotides (GCGCGC changed to GCGCGCGC) using primers 5 and 6 (Supplementary Table T1). Individual transformants carrying the corrected genes were tested for endonuclease activity and found to express active endonuclease. Deinococcus radiodurans genomic DNA was tested and found to be cleaved in vitro by the activated DraRI endonuclease, indicating the DNA methyltransferase activity of DraRI is not active in vivo in the host Deinococcus strain.
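The effect of the two kinds of repair described above (a single-base change that converts a stop codon into a coding codon, and a two-base insertion that restores the reading frame) can be checked by translation. The sketch below uses the standard genetic code; the short open reading frame is a hypothetical toy constructed for illustration and is not the ApyPI sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate in frame 1; '*' marks a stop; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def substitute(dna, index, base):
    """Return dna with the base at 0-based index replaced."""
    return dna[:index] + base + dna[index + 1:]


# Single-base repairs of the two stop codons named in the text.
print(translate("TAG"), translate(substitute("TAG", 1, "G")))  # TAG -> TGG
print(translate("TAA"), translate(substitute("TAA", 0, "G")))  # TAA -> GAA

# Hypothetical toy ORF with a run of three GC dinucleotides after an arginine
# codon; extending the run to four GC dinucleotides restores the frame.
disrupted = "ATGAAAGCGCGCTGAATAA"
repaired = disrupted.replace("GCGCGC", "GCGCGCGC", 1)
print(translate(disrupted), translate(repaired))
```

In the toy case the shifted frame runs into a premature stop two codons after the GC run, and the repaired frame reads through to the intended terminal stop.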
DNA methyltransferase activity produces N6-adenine methylation
The conserved DNA methyltransferase motifs in the expressed MmeI-like enzymes are those of the amino DNA methyltransferases, and are most similar to the gamma class of N6-methyl adenine DNA methyltransferases. Fourteen enzymes were tested and confirmed to produce N6-methyl adenine by the use of antibodies specific for N6-methyl adenine (A). None of these enzymes produced any detectable N4-cytosine methylation (B). The antibody results confirm that the enzymes in this family modify adenine at the N6 position to form N6-methyl adenine (m6A).
Genome context of MmeI family homologs
None of the genes expressed have an additional DNA methyltransferase gene in close proximity to the single polypeptide coding for the fused endonuclease–DNA methyltransferase enzyme. Furthermore, no one conserved gene, putative or characterized, is observed to co-localize with the 20 characterized endonuclease genes in their genome context. The absence of a nearby methyltransferase gene is consistent with the observation that all the systems tested use only the single-strand DNA modification produced by the bi-functional enzyme for host protection. Indeed, 15 of the characterized enzymes do not have an adenine base in the complement strand that could serve as a target for m6A modification, while one member of this family, BsbI, has neither adenine nor cytosine bases in the complement strand.
Modification of host DNA occurs on only one DNA strand of the duplex recognition sequence
Host genomic DNA was examined for the presence or absence of modification able to protect against endonuclease cleavage in each DNA strand of the recognition sequence. DNA substrates that consisted of a hybrid of one strand of host genomic DNA, which will carry the respective DNA modification present in the host cell, and one newly synthesized and therefore unmodified DNA strand, were produced from a single round of primer extension on host genomic DNA for RpaB5I and PspOMII. Both enzymes cut the hybrid DNAs in which the bottom strand of the recognition sequence, 5′-GTCCYCG-3′ for RpaB5I and 5′-YTGGGCG-3′ for PspOMII, was derived from their host genomic DNA and the top strand was newly synthesized, indicating the host DNA has no modification present in the bottom strand of the recognition sequence to block cleavage by these enzymes (Supplementary Figure S2). Hybrid DNAs in which the top strand of the recognition sequence, 5′-CGRGGAC-3′ for RpaB5I and 5′-CGCCCAR-3′ for PspOMII, was derived from the host genomic DNA and the bottom strand was newly synthesized were not cut by these enzymes, indicating that modification is present in the top strand to prevent cleavage. The same results were observed previously for MmeI (12). These results indicate that only the top strand adenine modification produced by the DNA methyltransferase activity of the bi-functional enzymes is present in their respective host DNA and able to block cleavage. No additional modification is present in the host DNA on the bottom strand of the enzyme's recognition sequence to block endonuclease activity. This observation is consistent with the absence of a co-localized companion DNA methyltransferase in the genome sequence context of these enzymes and the lack of a conserved adenine base target for modification in the bottom strand of their recognition sequences. These results confirm that the entire modification used by the RpaB5I, PspOMII and MmeI restriction systems, and by inference all members of this family of R–M systems, is the methylation of only the one conserved top strand adenine produced by these single polypeptide, bi-functional enzymes themselves.
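The statement that the bottom strands of the RpaB5I and PspOMII sites offer no adenine for m6A modification can be verified directly from the recognition sequences quoted above. This is a minimal sketch; the helper names are hypothetical, and a degenerate code is treated as a possible adenine whenever the set of bases it stands for includes A.

```python
# Complement table covering the IUPAC degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")
# IUPAC codes whose base set includes adenine.
CODES_WITH_A = set("ARWMDHVN")


def bottom_strand(top):
    """Bottom-strand sequence, written 5'->3', for a top-strand site."""
    return top.translate(COMPLEMENT)[::-1]


def may_contain_adenine(seq):
    """True if any position is A or a degenerate code that allows A."""
    return any(base in CODES_WITH_A for base in seq)


for enzyme, top in (("RpaB5I", "CGRGGAC"), ("PspOMII", "CGCCCAR")):
    bottom = bottom_strand(top)
    print(enzyme, top, bottom, may_contain_adenine(top), may_contain_adenine(bottom))
```

For both enzymes the top strand contains a candidate adenine and the bottom strand contains none, in agreement with single-strand modification.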
MmeI family enzymes require two sites for efficient DNA cleavage
Some type II endonucleases bind individual recognition sites and cleave their sites independently. Others require two or more sites for efficient cleavage, with the multiple sites either acting cooperatively to effect cleavage, or with one site binding to an effector position in the endonuclease to effect a conformational change required for DNA cleavage competence (4, 24–26). The cleavage efficiency on DNA substrates containing single or multiple recognition sites was compared.
All the enzymes tested cleaved a single site DNA incompletely, achieving between 10 and 70% cleavage even with excess enzyme. For example, RpaB5I cuts its single site in pBR322 DNA only partially (A). In contrast, the same single site DNA is nearly completely cleaved when a second recognition site is provided in trans by adding a synthetic DNA containing the RpaB5I recognition site (B and C). The DNA bearing the recognition site need not be capable of being cleaved itself, as a DNA having only 14 bases 3′ to the recognition site facilitates cleavage of the single site plasmid as well as a DNA extending to or beyond the position of cleavage. Cleavage stimulation is dependent upon the presence of the enzyme's specific recognition sequence, as addition of a similar DNA lacking an RpaB5I site did not increase cleavage (D). For RpaB5I the concentration of sites supplied in trans needed to stimulate cutting of the single site DNA was approximately equimolar (0.01 µM) with the concentration of recognition sites (0.007 µM) in the single site DNA (C), in a reaction that contained two units of RpaB5I. These results are quite similar to those obtained for MmeI (12).
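The site concentrations quoted above follow from the mass of substrate DNA, its length and the reaction volume. The worked sketch below shows the arithmetic only; the reaction conditions are not stated in this section, so the 1 µg of pBR322 in 50 µl used here is an assumed example, as is the common approximation of 650 g/mol per base pair. The pBR322 length of 4361 bp is the published plasmid size.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair; a common approximation (assumption)


def site_concentration_uM(mass_ug, length_bp, n_sites, volume_ul):
    """Micromolar concentration of recognition sites in a reaction."""
    moles_dna = mass_ug * 1e-6 / (length_bp * AVG_BP_MASS)
    molar_sites = moles_dna * n_sites / (volume_ul * 1e-6)
    return molar_sites * 1e6


# Assumed example conditions: 1 ug of single-site pBR322 in a 50 ul reaction.
substrate_sites = site_concentration_uM(1.0, 4361, 1, 50.0)
in_trans_sites = 0.01  # uM, as reported in the text
print(round(substrate_sites, 4), round(in_trans_sites / substrate_sites, 2))
```

Under these assumed conditions the substrate site concentration comes out close to the reported 0.007 µM, so the in trans DNA is present at roughly 1.4 sites per substrate site, which is the sense in which the two are approximately equimolar.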
Several of the enzymes characterized, such as NmeAIII and PspOMII, cut even a multiple site substrate incompletely, producing a stable, partial digestion pattern even with excess enzyme. For example, NmeAIII cuts pBR322, which contains three sites, to a stable partial digestion pattern that does not change even with 32-fold excess enzyme (A). NmeAIII cleavage of pBR322 is stimulated by the presence of its recognition site in trans, as observed for MmeI and RpaB5I; however in contrast to MmeI and RpaB5I, this stimulation requires an ~10-fold excess of both the enzyme and the in trans DNA in order to drive the cleavage reaction on the pBR322 substrate to completion (B). These results indicate that while all of the enzymes described require interaction between two specific recognition sites for cleavage, there are subtle differences in the endonuclease domains and their interactions that affect the extent of DNA scission produced.
Protein sequence features
The new enzymes described share many common features. They are single polypeptides that encode both the DNA methyltransferase activity required for host protection and the endonuclease activity for cleavage of identifiably foreign DNAs. The proteins are relatively large for type II restriction endonucleases, ranging in length from 908 amino acids (SdeAI) to 1184 amino acids (RpaB5I). The primary amino-acid sequences of the characterized enzymes are quite similar, with ApyPI and CstMI sharing 76% identity, and many of the enzymes exhibiting 40–50% identities. The amino-acid sequences align well, particularly when secondary structure predictions are included in the alignment algorithm (Supplementary Figure S1). The enzymes display a remarkable conservation of predicted secondary structure elements throughout the entire alignment, while also displaying the flexibility common to restriction enzymes for accommodating insertions of short sequence elements in individual enzymes between regions of conserved sequence and secondary structures.
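Pairwise identity values such as those quoted above are computed from aligned sequences. The sketch below shows one common convention, counting identical residues over the columns in which neither sequence has a gap; the convention actually used for the reported values is not stated here, and the two aligned fragments are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments: 7 ungapped columns, 5 of them identical.
identity = percent_identity("MKT-AYIAK", "MKSLAY-AR")
print(round(identity, 1))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values when gaps are frequent.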
The endonuclease domain is located at the amino terminus of these proteins and contains the conserved motifs of the PD–ExK endonuclease family. Secondary structure prediction indicates the endonuclease domain forms a structure containing four helices and five beta strands in the order α-β-β-β-α-α-β-β-α, suggesting these enzymes fall into the class III group of restriction endonucleases proposed by Niv et al. (8). The aspartate of the PD–ExK motif is completely conserved and occurs at the start of beta strand 2 (D70 in MmeI). The E and K are also completely conserved and occur at the end of the third beta strand (E80 and K82 in MmeI). Mutations to these residues have recently been shown to abolish endonuclease activity (27). There is a highly conserved (17 of 20) glutamate residue at the C-terminal end of beta strand 1 (E51 in MmeI), though this is an aspartate in one case and a glutamine in two of the enzymes. A completely conserved glutamate also occurs just before the start of helix 2 (E25 in MmeI).
A second feature observed in the multiple sequence alignment (MSA) is a region of predominantly helical nature located between the endonuclease domain and the methyltransferase domain, from approximately amino acids 151–300 in MmeI. This region is presumed to form the ‘arm’ that enables the enzyme to position the endonuclease domain two turns of the DNA helix, or 20 nt, away for DNA cleavage when the enzyme is bound at the recognition sequence. This region shares similarity to the amino-terminal portion of the type I DNA methyltransferases, suggesting an evolutionary relationship between the MmeI family and type I DNA methyltransferases, as has recently been proposed (27).
The methyltransferase domain contains readily identifiable amino-acid sequence motifs of the amino DNA-methyltransferases. These motifs occur in the order found in the gamma class of N6-adenine methyltransferases: motif X, motif I to motif VIII. Structure prediction algorithms model the methyltransferase domain of these enzymes onto the structure of the gamma class m6A DNA methyltransferase M.TaqI (PDB: 1G38) with high accuracy probabilities, indicating that the methyltransferase domain of these enzymes is typical of the gamma m6A DNA methyltransferases. The methyltransferase domain corresponds to approximately amino acids 301–620 in MmeI.
In the type II gamma class m6A DNA methyltransferases, specific recognition is determined by the Target Recognition Domain (TRD) located C-terminal to the methyltransferase domain, as well as minor groove contacts located between methyltransferase motifs IV and V in the case of M.TaqI, while in the type I systems recognition is supplied by a separate specificity polypeptide. For the enzymes described the TRD appears to immediately follow the methyltransferase domain as in the type II gamma class m6A DNA methyltransferases, corresponding to approximately positions 621–820 in MmeI. There is remarkable conservation in the predicted secondary structure elements within the TRD region, indicating that the enzymes are likely to contact the DNA using similar structural elements. The enzymes form two main branches in a phylogenetic analysis, with the enzymes recognizing six-base sequences in one and those recognizing seven-base sequences in the other, with the one exception of DrdIV (). The enzymes recognizing seven-base sequences exhibit a small insertion of seven amino acids and a small deletion of four amino acids relative to the six-base enzymes within the putative TRD region.
The TRD appears to end at a conserved short sequence motif, ‘FPFP’, that is reminiscent of the PLPPL motif found in type I specificity subunits. The PLPPL motif occurs at the transition from one-half site TRD to the helical spacer arm that connects the two-half site TRD domains (28). Following this ‘FPFP’ motif there is a C-terminal region consisting of several predicted well-conserved helices of unknown function.
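The approximate domain layout described in this section for MmeI can be summarized as a simple lookup. The boundaries for the helical arm, methyltransferase domain and TRD are the approximate values given in the text; the endonuclease range of 1–150 and the start of the C-terminal region at 821 are inferred from the adjacent boundaries and are assumptions, as are the names used in the code.

```python
# (first residue, last residue or None for open-ended, label)
MMEI_DOMAINS = [
    (1, 150, "endonuclease"),          # inferred: precedes the arm at 151
    (151, 300, "helical arm"),
    (301, 620, "methyltransferase"),
    (621, 820, "TRD"),
    (821, None, "C-terminal helices"),  # inferred: follows the TRD
]


def domain_of(residue):
    """Return the label of the approximate MmeI region containing residue."""
    if residue < 1:
        raise ValueError("residue numbers start at 1")
    for first, last, label in MMEI_DOMAINS:
        if residue >= first and (last is None or residue <= last):
            return label
    raise ValueError("residue not covered by the domain map")


# The PD-ExK residues named in the text fall in the endonuclease region.
for name, position in (("D70", 70), ("E80", 80), ("K82", 82)):
    print(name, domain_of(position))
```

Because the boundaries are approximate, residues within a few positions of a boundary should not be assigned to a region on this basis alone.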