The results presented in this article and summarized in represent one of the first times that it has been possible to examine the complete methylation pattern of a bacterial genome. Of the MTases studied in this article, seven are components of Type I RM systems and have six different recognition sequences, all of which are new. Two Type III systems were found with one new recognition sequence. Two MTases were part of traditional Type II systems, although we did not test whether the REase was active. Four Type IIG REases, which contain both MTase and REase activity in a single polypeptide chain, were found, all with new specificities. Two of these, RM.CjeFIII and RM.CjeNIII, show very high sequence similarity and yet recognize different sequences (5′-GCAm6AGG-3′ and 5′-GKAm6AYG-3′, respectively). Thus, this finding represents another family of Type IIG restriction enzymes that resemble the MmeI family, where a few simple changes in critical base recognition elements cause changes in specificity (39). This again emphasizes the need for caution when transferring annotation from one characterized protein to another (40). The composition of an amino acid change can be critical if it occurs at a residue belonging to a DNA sequence recognition element. Two orphan MTases, M.CsaIII and M.BceSVII, were found to be active when cloned, but inactive in the genome. Both are promiscuous m6A MTases and both occur on prophage elements, suggesting that they may play a protective role during phage infection. Finally, two solitary 5′-GATC-3′ MTases were shown to be active. When examining complete genome sequences for MTases, some of the genes may be inactive because of mutation, while others may be inactive due to transcriptional silencing, as is often found when the genes are present as part of a prophage. In the latter case, cloning can reveal methylation activity, permitting complete characterization as found earlier (15).
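The difference between the two closely related Type IIG specificities above can be made concrete by scanning a sequence for each degenerate motif. The following is a minimal sketch, not the analysis pipeline used in this study: it assumes the standard IUPAC nucleotide codes (K = G or T, Y = C or T), writes the motifs without the m6 marker (GCAAGG and GKAAYG), and uses a short toy sequence invented purely for illustration. The names `motif_to_regex`, `find_sites` and `toy` are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif into a compiled regular expression."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    # Wrap in a lookahead so that overlapping sites are all reported.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_sites(motif, sequence):
    """Return 0-based start positions of every match on the given strand."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence, not taken from any genome in the study.
toy = "TTGCAAGGTTGTAATGCCGGAACG"

print(find_sites("GCAAGG", toy))  # sites matching the RM.CjeFIII motif
print(find_sites("GKAAYG", toy))  # sites matching the RM.CjeNIII motif
```

In this toy sequence the two motifs select disjoint sets of positions, because C at the second position of GCAAGG is not covered by K. Only one strand is scanned here; a fuller treatment would also search the reverse complement.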
One of the striking features of the results from the current analysis is that the recognition sequences of all MTases found to be active showed fairly strict specificity, with very few off-target events noted. Of course, much greater coverage would be required to detect very rare off-site effects, and so some degree of promiscuity cannot be ruled out. However, the apparent promiscuity that was observed in our earlier work (15) using MTase genes cloned in high copy number plasmids was not seen here. We consider the ‘true’ MTase specificity to be reflected in the modification patterns seen when the genes are expressed in their genomic context. Thus, based on the current findings, it seems likely that most MTases show essentially identical specificity to their cognate REases, a result that was not completely expected since there are no obvious constraints on their specificity.
Previously, it had been found that Type III MTases only methylate a single strand of their recognition sequence, and that holds true here. Similarly, most characterized Type IIG enzymes methylate just a single strand, although several do not, including RM.CjeNII as described here. Even so, single-strand methylation can be very helpful when trying to match recognition sequences found by sequencing with the genes responsible for each consensus sequence. Another useful feature is that all known Type I restriction systems seem to possess split recognition sequences, which can help in distinguishing them when matching genes and consensus sequences. Nevertheless, when two Type I systems are present, as in V. breoganii 1C-10, it was essential to clone out the individual systems so that specificity and genes could be properly matched. Note that because of the mechanism of methylation, it is only the M and S subunits that need to be cloned to permit assembly of a functional MTase (16).
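The split (bipartite) form of Type I recognition sequences lends itself to a simple check when sorting consensus motifs by likely system type. The sketch below is illustrative only: it assumes motifs are written with IUPAC codes and that the non-specific spacer is written as a run of N. The function name `split_halves` and its `min_gap` parameter are hypothetical, and the example motif is the well-known EcoKI sequence AAC(N6)GTGC, which is not one of the motifs reported in this study.

```python
import re


def split_halves(motif, min_gap=1):
    """Return (left, gap_length, right) if the motif consists of two
    specific half-sites separated by a run of N, otherwise None."""
    m = re.fullmatch(r"([^N]+)(N+)([^N]+)", motif.upper())
    if m is None or len(m.group(2)) < min_gap:
        return None
    return m.group(1), len(m.group(2)), m.group(3)


print(split_halves("AACNNNNNNGTGC"))  # split motif, Type I-like
print(split_halves("GATC"))           # contiguous motif, prints None
```

A split motif only suggests a Type I system. As noted above, when two such systems are present in one genome, the motif shape alone cannot say which gene set is responsible for which sequence.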
In the case of the Type II RM system BceSIII, because of the asymmetric nature of the recognition sequence, two independent MTases are required, one to methylate each strand of the sequence. While SMRT sequencing can easily find the locations of each methyl group, it was necessary to clone out the two MTase genes separately in order to assign strand specificity to each one. M.GmeI also recognizes an asymmetric sequence, but in this case, the two M genes are fused. At present, we have relatively little information about the strand specificity of MTases, because it has proven difficult to determine specificity experimentally. As more data accumulate using the kinds of analyses that we present here, it should become much easier to make accurate bioinformatic predictions about recognition sequences and specificity for MTases in newly sequenced genomes.
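Whether a recognition sequence is asymmetric can be checked by comparing it with its own reverse complement: a palindromic motif reads the same on both strands, whereas an asymmetric one presents a different sequence on the opposite strand. The following is a minimal sketch under the assumption that motifs are written in IUPAC codes. The helper names are hypothetical, and the motifs used are those quoted earlier in this section, written without the m6 marker.

```python
# IUPAC-aware complement table (for example, K pairs with M and R with Y).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Return the motif as read 5' to 3' on the opposite strand."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(motif):
    """True if the motif is identical on both strands."""
    return motif.upper() == reverse_complement(motif)


for motif in ("GATC", "GCAAGG", "GKAAYG"):
    print(motif, reverse_complement(motif), is_palindromic(motif))
```

For an asymmetric motif, the two strands have to be considered separately, which is why sequence data alone locate the methyl groups but do not say which MTase is responsible for which strand.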
Despite the recognized importance of methylation for understanding fundamental microbiological processes, microbe adaptability and disease pathogenicity (11), in the past, there has not been a great deal of research into the methylation patterns of bacterial genomes, largely because of the difficulty of obtaining suitable data. One area where knowledge about the methylome is very important relates to studies trying to transform DNA into strains that contain one or more RM systems and which vastly reduce transformation efficiencies. In some cases, these barriers have been overcome by premethylating the DNA or by removing the RM systems from strains (41). One problem with the latter approach is that removal of methylation systems may fundamentally change the biology of the organism under study. With the kind of analysis provided here, the RM systems likely to cause problems with transformation can be easily spotted and appropriate measures taken. Thus, the MTases necessary for protection can be identified and, if needed, intermediate cloning hosts carrying suitable complements of MTase genes can be prepared.
In summary, the results provided here show that SMRT sequencing can provide functional information about active MTases present in genomes and can decipher their recognition sequences, a task that used to be so time-consuming that it was not usually carried out. This, combined with the long reads provided by this technology, can be an excellent adjunct to current high-throughput sequencing platforms, in that sequence assembly is facilitated and gene function is reliably documented.