For a long time, genes in the Mrr superfamily have remained essentially uncharacterized as to their biochemical function. Although E. coli Mrr was shown to restrict certain epigenetic modifications in vivo, there seemed to be no simple consensus for its recognition sites (9). In this article, we characterize a remote homolog of E. coli Mrr, MspJI from a Mycobacterium sp., and show that it is a genuine endonuclease which recognizes cytosine residues modified at the C-5 position and cleaves at fixed distances away from the recognition site. The homology between MspJI and E. coli Mrr is weak and only revealed through multiple rounds of PSI–BLAST search. Following the finding of MspJI, we examined several other close homologs of MspJI, most of which also show modification-dependent endonuclease activity. Our experimental evidence lends strong support to the earlier proposal that genes in the Mrr family are endonucleases and recognize modified DNA (5). A notable difference between the MspJI subfamily and other homologs of E. coli Mrr appears to be their length. Further investigation is needed to characterize the in vitro activity of other E. coli Mrr homologs and compare the differences among the family members.
We have determined the preferred recognition site of MspJI to be mCNNR (R = G or A). Moreover, in our experiments we noticed that mCNNG is a better site than mCNNA. These observations suggest that Mrr-like enzymes may be by nature more promiscuous in their specificity than typical restriction enzymes, which recognize unmodified DNA. This may explain why it was difficult to infer a clear-cut recognition consensus for E. coli Mrr. We reason that the difference may reflect the different selection pressures acting on these two types of enzymes. For a typical restriction enzyme, selection favors not only efficient cleavage of the target recognition sites to fight off invading DNA, but also minimal off-target cleavage, which can be detrimental to the host DNA. However, for Mrr and many other modification-dependent nucleases, as long as the host DNA does not carry the target modification sites, there would be little or no selection pressure to limit activity on non-target modification sites. If anything, the only selective pressure would arise if a specific cytosine methyltransferase were present elsewhere in the chromosome, in which case the Mrr homolog would need to specifically avoid recognizing the sequences it modifies. This is precisely the situation for the Mrr endonuclease in E. coli, where it coexists with the dcm m5C methyltransferase. As a result, the co-existence of modification-dependent endonucleases such as the Mrr, McrA and McrBC superfamilies with the regular RM systems contributes to defining a delicate epigenetic landscape for each bacterial genome.
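The mCNNR rule can be illustrated with a short Python sketch. It is not part of the experimental work: the function name, the 0-based coordinate convention and the way methylated positions are supplied are all assumptions made for illustration. Given a top-strand sequence and the positions of known 5-methylcytosines on either strand, it reports which of them lie in a CNNR context, together with the fourth base so that mCNNG sites can be separated from the less favored mCNNA sites.

```python
def find_mcnnr_sites(seq, methylated_top, methylated_bottom=()):
    """Return (strand, position, fourth_base) for each methylated C in a CNNR context.

    seq: top-strand sequence written 5'->3'.
    methylated_top: 0-based top-strand indices of 5-methylcytosines.
    methylated_bottom: 0-based top-strand indices of G residues whose paired
        bottom-strand cytosine is methylated (hypothetical input convention).
    """
    seq = seq.upper()
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    hits = []
    for i in sorted(methylated_top):
        # Top strand: C at i and R (G or A) three bases downstream.
        if i + 3 < len(seq) and seq[i] == "C" and seq[i + 3] in "GA":
            hits.append(("+", i, seq[i + 3]))
    for i in sorted(methylated_bottom):
        # The bottom strand read 5'->3' runs right to left on the top strand,
        # so the fourth base of the motif pairs with top-strand index i - 3.
        if i - 3 >= 0 and seq[i] == "G":
            fourth = comp.get(seq[i - 3], "N")
            if fourth in "GA":
                hits.append(("-", i, fourth))
    return hits


# The C at index 2 sits in CAAG (a CNNG site); the bottom-strand C paired
# with the G at index 5 sits in CTTG on the bottom strand.
print(find_mcnnr_sites("TTCAAGTT", [2], [5]))
```

Returning the fourth base rather than a plain yes/no keeps the observed preference for mCNNG over mCNNA available to downstream code without hard-coding any ranking.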
The association between MspJI and typical RM system elements, such as DNA methyltransferase genes and vsr genes, is another intriguing observation. On the one hand, the association does not seem to be incidental, because a number of MspJI homologs have retained the association with the methyltransferase and vsr genes. On the other hand, in some cases the association seems dispensable. It is quite possible that MspJI-like genes originated from conventional RM systems. For instance, in one scenario the R gene gradually acquires modification-dependent activity after losing its ability to recognize unmodified DNA. As a result, the M gene is no longer essential for host survival and starts to accumulate inactivating mutations.
As enzymatic reagents, the MspJI family should be especially useful in studies that aim to detect the epigenetic status of DNA and map the locations of the modifications. Because cleavage takes place at fixed distances from the methylated sites, the positions of the methylated bases can be inferred precisely. In this regard, the fragments excised at a fully methylated site would lend themselves to analysis by modern high-throughput sequencing techniques. The biggest advantages of a method based on MspJI-like enzymes, compared to bisulfite sequencing as the gold standard, are the convenience of pre-sequencing sample preparation and the simplicity of post-sequencing data analysis. For example, compared to the rather complicated algorithms designed to map bisulfite sequencing data, post-sequencing data analysis here would be much simpler: once the cleavage sites are located by mapping sequencing reads back to the reference genome, the modified cytosines should lie either upstream or downstream of the cleavage sites, 16 or 17 bases away.
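The inference step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an established pipeline: the function name, the 0-based coordinates, and the way a mapped read end is translated into the `cut` coordinate are hypothetical, and the offsets of 16 and 17 bases are taken directly from the text. For each cleavage coordinate, the sketch looks 16 or 17 bases upstream for a top-strand cytosine in a CNNR context and the same distances downstream for a bottom-strand cytosine, which appears on the top strand as YNNG (Y = C or T).

```python
def candidate_methylated_cytosines(reference, cut, offsets=(16, 17)):
    """List cytosines that could explain a cleavage event mapped to `cut`.

    reference: top-strand sequence written 5'->3'.
    cut: 0-based top-strand coordinate assigned to the cleavage site
        (the read-end-to-coordinate convention is an assumption).
    Returns (strand, position, offset) tuples in top-strand coordinates.
    """
    ref = reference.upper()
    candidates = []
    for d in offsets:
        up = cut - d
        # Upstream candidate: C on the top strand with R three bases after it.
        if up >= 0 and up + 3 < len(ref) and ref[up] == "C" and ref[up + 3] in "GA":
            candidates.append(("+", up, d))
        down = cut + d
        # Downstream candidate: C on the bottom strand, seen on the top strand
        # as a G preceded three bases earlier by a pyrimidine (YNNG).
        if down - 3 >= 0 and down < len(ref) and ref[down] == "G" and ref[down - 3] in "CT":
            candidates.append(("-", down, d))
    return candidates


# Toy reference with a single CTTG site whose C is at index 10.
toy = "T" * 10 + "CTTG" + "T" * 26
print(candidate_methylated_cytosines(toy, cut=26))
```

Because each cleavage event yields at most a handful of candidates, ambiguity between the 16- and 17-base offsets can be resolved by checking which candidate actually carries a cytosine in the required sequence context.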
In eukaryotic organisms, the most relevant epigenetic change is CpG or sometimes CHG methylation (21), both of which are recognized by MspJI. With the mCNNR specificity of MspJI, it is theoretically possible to introduce cleavage in the vicinity of up to 50% of the methylated CpG sites. Up to 25% of the genomic CpG sites can be directly interrogated by sequencing the 32-bp bands. A complicating factor is that if two substrate CpG sites are less than 16 bp apart, cleavages from MspJI bound to different half-sites may interfere with each other and produce fragments <32 bp. More studies are in progress to investigate the possibility of using MspJI in decoding epigenomes.
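One way to read the 50% and 25% figures is as a simple counting argument, sketched below in Python. The sketch assumes that the bases flanking a CpG are uniformly distributed and independent, an idealization that real genomes do not satisfy, and the function name is hypothetical. Writing the top strand as 5'-w x C G y z-3', the top-strand methylcytosine is in an mCNNR context when z is a purine, and the bottom-strand methylcytosine is in an mCNNR context when w is a pyrimidine.

```python
from fractions import Fraction
from itertools import product


def cpg_substrate_fractions():
    """Enumerate the flanks of a CpG and count mCNNR-compatible contexts.

    Returns (fraction of contexts where a given strand is a substrate,
             fraction where both strands are substrates).
    """
    total = top = both = 0
    for w, x, y, z in product("ACGT", repeat=4):
        # Top strand is 5'-w x C G y z-3'; x and y are the unconstrained N positions.
        top_ok = z in "GA"      # top-strand mC, then G, y, z: z must be R
        bottom_ok = w in "CT"   # bottom-strand mC needs R at +3, i.e. Y at w
        total += 1
        top += top_ok
        both += top_ok and bottom_ok
    return Fraction(top, total), Fraction(both, total)


per_strand, both_strands = cpg_substrate_fractions()
print(per_strand, both_strands)  # 1/2 1/4
```

Under these assumptions, half of the methylated cytosines in CpG sites fall in an mCNNR context on a given strand, and a quarter of fully methylated CpG sites are substrates on both strands, which is the situation expected to release a 32-bp fragment.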
To our knowledge, MspJI and its homologs represent a unique group of modification-dependent endonucleases which extend our understanding of bacterial RM systems and may have practical application in manipulating modified nucleic acids.