Methylation of DNA bases, catalyzed by DNA methyltransferases (MTases), is the most abundant form of post-replicative DNA modification found in the genomes of prokaryotic and higher eukaryotic organisms. Three functional classes of MTases have been identified in bacteria and archaea. Two of these transfer a methyl group from
S-adenosyl-L-methionine (SAM) to the exocyclic amino groups of adenine and cytosine bases in duplex DNA, yielding N6-methyladenine (m6A) and N4-methylcytosine (m4C), respectively (
1). A third, and mechanistically distinct, class transfers the methyl group of SAM to C5 of cytosine to produce 5-methylcytosine (m5C) (
2,
3).
Most bacterial and archaeal MTases are associated with sequence-specific restriction-modification (RM) systems that protect the prokaryotic cell from invasion by DNA bacteriophages, and examples have been identified that recognize several hundred distinct DNA sequences (
4). However, some well-characterized prokaryotic MTases do not appear to be associated with a cognate restriction endonuclease (REase) and some of these ‘orphan’ MTases perform different cellular functions. For example, the product of the
Escherichia coli deoxyadenosine methyltransferase (
dam) gene, M.EcoKDam, which modifies adenine residues in the sequence 5′-GATC-3′, is involved both in the initiation of chromosomal replication and in the maintenance of genomic integrity [reviewed earlier (
5)].
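As a minimal illustration of the GATC target described above, the following Python sketch lists candidate Dam methylation sites in a sequence. The function name `find_dam_sites` and the example sequence are hypothetical and serve only as an illustration. Because GATC is its own reverse complement, each occurrence is reported with one adenine coordinate per strand.

```python
def find_dam_sites(seq: str) -> list[tuple[int, int]]:
    """Return (top_A, bottom_A) pairs for every 5'-GATC-3' occurrence.

    Both values are 0-based coordinates on the top strand:
    top_A is the adenine of GATC on the top strand, and
    bottom_A is the top-strand position whose complement is the
    adenine of GATC on the bottom strand (GATC is palindromic).
    """
    seq = seq.upper()
    sites = []
    start = seq.find("GATC")
    while start != -1:
        sites.append((start + 1, start + 2))
        start = seq.find("GATC", start + 1)
    return sites


# Hypothetical example sequence containing two GATC sites.
example = "TTGATCAAGGATCC"
print(find_dam_sites(example))
```

Such a scan only lists candidate sites; whether each site is actually modified has to be determined experimentally.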
Most eukaryotic MTases are of the m5C class and are related to their prokaryotic equivalents through a common reaction mechanism that is reflected in the conservation of tertiary structural elements within the enzyme active sites (
6). Mammalian m5C-MTase activity is predominantly targeted to CpG dinucleotides and three different enzymes (DNMT1, DNMT3A and 3B) are known (
7). Such DNA methylation is a key component of the epigenetic control of gene expression [reviewed earlier (
8)]. CpG methylation is also a key player in genomic imprinting and in female X-inactivation (
9,
10). An additional modified base, 5-hydroxymethylcytosine (5-hmC), was previously presumed to be a product of DNA damage (
11), but has recently been found to be a normal component of mammalian DNA (
12,
13) and appears to be generated by oxidation of m5C in a reaction catalyzed by the ten eleven translocation (Tet)-family of enzymes (
13).
A number of diagnostic tools have been developed to detect DNA methylation, to establish what type of modification is associated with a particular MTase and to determine the DNA sequence context in which such methylation occurs. Nucleosides containing m6A, m4C and m5C can be resolved from their unmodified equivalents by chromatographic methods, thereby facilitating their detection in total DNA hydrolysates (
14,
15). Alternatively, polyclonal antisera that specifically recognize m6A or m4C have been used for immunological detection of such modifications in order to ascribe function to multiple putative MTase genes in
Helicobacter pylori (
16).
Bioinformatic analysis of a large number of genes that encode well-characterized prokaryotic MTases has identified groups of conserved sequence motifs that are diagnostic for DNA MTases and permit the accurate prediction of m5C-MTases, but the amino-MTases (the m6A and m4C classes) cannot be unequivocally distinguished from one another (
1,
2). For an MTase that is part of a Type II RM system, the sequence specificity is expected to match the cleavage specificity of the associated REase. However, very little experimental evidence has been generated to support this expectation, and the target specificity of most MTases has yet to be characterized biochemically. Furthermore, the exact sites of modification are often uncertain where the recognition sequence contains multiple potential target bases (i.e. A or C) or where separate MTases act on the two strands independently (
17).
The experimental approaches currently available to fully characterize prokaryotic and eukaryotic MTases are labor-intensive, requiring radioactive labeling with [³H]S-adenosylmethionine and mapping and sequencing of individual sites (
17,
18). A method based on Sanger sequencing can also detect methylated bases, but it is not high-throughput (
19). Methods currently in use to discriminate between m5C and unmodified cytosines at CpG sequences, such as differential sensitivity to cleavage by REases (
20) and methylation-specific polymerase chain reaction (PCR) of bisulfite-modified DNA (
21,
22), do not permit genome-wide analysis. The state-of-the-art method for m5C methylome studies is MethylC-Seq (
23,
24), but it is indirect and requires extensive experimentation. In particular, the complete sequence of the DNA to be analyzed must be available before methylome analysis can be undertaken. Some additional technologies with potential applications in methylome analysis have been described recently, such as methods to map genomic 5-hmC by enzyme-catalyzed glucosylation (
25,
26) and a family of REases that specifically recognize m5CpG and m5CpWpG sequences (
27). The latter enzymes are of interest because members of this MspJI family of REases excise a 32–33 bp fragment that includes the methylated bases in a central position and the products lend themselves to high-throughput sequencing approaches. However, they are partially constrained by the sequence specificity of the REases (
27).
A method has been described previously to directly detect methylated DNA bases during single-molecule, real-time (SMRT) DNA sequencing (
28). This method exploits kinetic data on the rate of incorporation of each dNTP, captured in two parameters: the pulse width (PW) and the interpulse duration (IPD). Significant changes in these kinetic parameters are observed during SMRT sequencing when the DNA polymerase encounters m6A, m5C or 5-hmC on the template strand. These distinct kinetic signatures allow the type and position of the base modification in the DNA template to be identified.
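To make the idea of a kinetic signature concrete, the sketch below compares per-position IPD values from a native sample with those from an unmodified control and flags positions where the mean IPD is elevated. This is an illustrative simplification and not the statistical procedure of the cited method: the ratio-based comparison, the function names `ipd_ratios` and `flag_positions`, the threshold of 2.0 and the example values are all assumptions.

```python
from statistics import mean


def ipd_ratios(native: dict[int, list[float]],
               control: dict[int, list[float]]) -> dict[int, float]:
    """Mean native IPD divided by mean control IPD, per template position.

    Positions lacking observations in either sample are skipped.
    """
    ratios = {}
    for pos in native.keys() & control.keys():
        if native[pos] and control[pos]:
            ratios[pos] = mean(native[pos]) / mean(control[pos])
    return ratios


def flag_positions(ratios: dict[int, float], threshold: float = 2.0) -> list[int]:
    """Return sorted positions whose IPD ratio meets the (assumed) threshold."""
    return sorted(pos for pos, r in ratios.items() if r >= threshold)


# Hypothetical IPD observations in seconds; position 1 is slowed in the native sample.
native = {0: [0.2, 0.2], 1: [1.0, 1.4], 2: [0.3]}
control = {0: [0.2, 0.2], 1: [0.3, 0.3], 2: [0.3]}
print(flag_positions(ipd_ratios(native, control)))
```

A fixed threshold is used here only for brevity; real data would call for a statistical test that accounts for coverage and for sequence-context effects on polymerase kinetics.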
Here, we extend the SMRT sequencing method to combine complete DNA sequence determination and methylated base analysis to characterize MTase specificities in a single operation (
28,
29). We analyzed a set of 16 DNA substrates that were methylated
in vivo by a range of single prokaryotic MTases expressed in an
E. coli strain that lacks additional MTase genes. The set included MTases that introduce m6A, m4C or m5C modifications, some with previously known substrate specificity and some whose specificity was unknown. The results allowed us to determine the absolute sequence specificity of each MTase, as well as the precise location of the methylated base.
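One simple way to move from detected positions to a candidate recognition sequence, sketched here under stated assumptions, is to tally the sequence context around each position called as modified. The function name `context_counts`, the flank width and the example data are hypothetical, and this is not presented as the analysis procedure used in this work.

```python
from collections import Counter


def context_counts(seq: str, positions: list[int], flank: int = 2) -> Counter:
    """Count the sequence windows centered on each modified position.

    Each window spans `flank` bases on either side of the position.
    Positions too close to either end of `seq` are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for p in positions:
        if p - flank >= 0 and p + flank < len(seq):
            counts[seq[p - flank:p + flank + 1]] += 1
    return counts


# Hypothetical example: two modified adenines, each in a GATC context.
seq = "TTGATCAAGGATCC"
modified = [3, 10]
print(context_counts(seq, modified, flank=1).most_common())
```

A context shared by most modified positions suggests a candidate motif, which can then be checked against every occurrence of that motif in the sequence.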