Modified DNA bases are widespread in living organisms, including mammals. The most common DNA modification in mammals is 5-mC, which is believed to be the precursor of 5-hmC. In the early 1970s, formic acid hydrolysis followed by chromatographic analysis was used to detect 5-hmC in murine brain and liver DNA. Using this method, 5-hmC was estimated to comprise about 15% of the total cytosine residues (24). More recently, thin layer chromatography and high pressure liquid chromatography coupled to mass spectrometry have reliably identified 5-hmC and quantified it at 0.6% of the total nucleotides, corresponding to about a quarter of modified cytosine residues in Purkinje cells (13). From our studies and previous reports, it appears that mammalian genomes contain substantial amounts of 5-hmC and that these modified bases may have physiological significance. For example, 5-hmC may play a role in the fetal development of heart, lung, and brain (13, 14), and chromosomal translocation of MLL-TET1 and MLL-TET2, as well as mutation of TET2, may be involved in carcinogenesis, particularly leukemia (25, 26).
However, sequence-specific 5-hmC detection and quantification methods are still lacking. The most commonly used techniques for DNA methylation detection and mapping, including sodium bisulfite sequencing and its derivative approaches (27), cannot distinguish between 5-mC and 5-hmC. A recent study suggested that bisulfite treatment could convert 5-hmC in DNA to cytosine 5-methylenesulfonate, which would interfere with PCR amplification via polymerase stalling (28). Another study, by Jin et al. (16), demonstrated that bisulfite-treated DNA containing 5-hmC can be amplified efficiently and that, like 5-mC, 5-hmC is not deaminated by bisulfite and is therefore not read as a T base after bisulfite conversion and PCR. Furthermore, affinity matrices based on methyl binding domains may not be useful for identifying 5-hmC either, because of their poor affinity toward this modification (15). Although 5-hmC antibody-based methods may be a viable alternative for detecting and mapping 5-hmC, they face specific challenges in achieving single-base resolution and in resolving sequences where 5-mC and 5-hmC lie in close proximity.
Because most loci, depending on tissue specificity or mixed populations of cells, may contain C, 5-mC, and 5-hmC, our method of glucosylation of DNA followed by MspI/HpaII digestion will aid in the detection and quantification of 5-hmC. The major perceived drawback of this method is that it can only interrogate 5-hmC in the CCGG context. However, several other recently developed methods also use MspI/HpaII enzymes for genome-wide methylation analysis. For example, reduced representation bisulfite sequencing uses MspI digestion in conjunction with bisulfite treatment to sequence the majority of CpG islands in the human genome (29). The reduced representation bisulfite sequencing method showed deep coverage of gene promoters and selective sampling of all other types of genomic regions while detecting epigenetic alterations (29).
Furthermore, use of the MspI isoschizomer HpaII in the methyl-sensitive cut-counting (MSCC) method generated nontargeted genome-scale data for ~1.4 million unique HpaII sites (CCGG) in the DNA of B-lymphocytes (of ~2.3 million HpaII sites in total) and confirmed that gene-body methylation in highly expressed genes is a consistent phenomenon throughout the human genome. The authors observed that HpaII sites have a distribution similar to that of all CpG dinucleotides (30), making them a good target for relatively unbiased genome-scale profiling. For example, 7.5% of all CG sites are within CpG islands, compared with 11.8% of all HpaII sites. The frequencies of all CG sites and all HpaII sites within 1 kb of the transcription start site, inside genes, or within repetitive DNA were observed to be similar: 2.3 versus 2.8%, 43.3 versus 45.5%, and 51.5 versus 52.6%, respectively. Although our technique interrogates only ~10% of the total CG sites, it offers the potential for genome-wide 5-hmC profiling by MspI digestion of either glucosylated or control DNA followed by high throughput sequencing.
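To make concrete what "interrogating a fraction of CG sites" means, the following minimal Python sketch counts the CG dinucleotides in a sequence and reports how many of them sit inside a CCGG (MspI/HpaII) recognition site. The function name `cg_site_coverage` and the toy sequence are hypothetical and are not part of the study; a real analysis would run over a reference genome and both strands' annotations rather than a short string.

```python
import re


def cg_site_coverage(sequence: str) -> dict:
    """Count CG dinucleotides and the subset that lies within CCGG sites.

    Illustrative only: scans a single strand of a plain DNA string.
    """
    seq = sequence.upper()
    # Zero-width lookaheads report every start position of each motif.
    cg_positions = [m.start() for m in re.finditer(r"(?=CG)", seq)]
    ccgg_positions = [m.start() for m in re.finditer(r"(?=CCGG)", seq)]
    # The CG inside a CCGG site starts one base after the site start.
    cg_in_ccgg = {start + 1 for start in ccgg_positions}
    total_cg = len(cg_positions)
    fraction = len(cg_in_ccgg) / total_cg if total_cg else 0.0
    return {
        "cg_sites": total_cg,
        "ccgg_sites": len(ccgg_positions),
        "fraction_cg_in_ccgg": fraction,
    }


if __name__ == "__main__":
    # Toy sequence with four CG dinucleotides, two of them inside CCGG.
    print(cg_site_coverage("ACCGGTTCGAACGTCCGGA"))
```

In this toy sequence half of the CG sites fall within CCGG; applied genome-wide, the same ratio is the quantity the ~10% figure in the text refers to.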
Another potential issue of this assay is background due to incomplete enzymatic digestion in combination with the extreme sensitivity of qPCR analysis. Indeed, we observed a background between 1 and 5% for most of the qPCR products despite a high concentration of enzyme being used for cleavage. The background products were dependent on various commercial batches of DNA (in our experiments all human DNAs) as well as primer sets used for amplification. Normalization by a locus containing no HpaII/MspI sites does not alter the results significantly. Therefore, we have subtracted the background values from all the qPCR analysis data and then normalized the copy numbers to undigested matched DNA samples. Although we clearly show that our assay can be used to identify and quantify both 5-hmC and 5-mC at specific cytosine residues, these caveats must be considered for any type of downstream analysis.
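The background subtraction and normalization described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the exact calculation used in this study. It relies on the known enzyme sensitivities: HpaII is blocked by any modification of the internal cytosine, whereas MspI cleaves C, 5-mC, and 5-hmC but is blocked once 5-hmC has been glucosylated. The function name `quantify_locus`, its parameters, and the example copy numbers are hypothetical, and treating the background as a single copy-number value subtracted from each digested sample only is an assumption.

```python
def quantify_locus(undigested: float,
                   hpaii_resistant: float,
                   mspi_resistant_glucosylated: float,
                   background: float) -> dict:
    """Estimate C, 5-mC, and 5-hmC fractions at one CCGG site from qPCR copy numbers.

    undigested: copies in the matched undigested sample.
    hpaii_resistant: copies surviving HpaII digestion (5-mC plus 5-hmC).
    mspi_resistant_glucosylated: copies surviving MspI digestion of
        glucosylated DNA (5-hmC only).
    background: copies attributed to incomplete digestion (assumed constant).
    """
    if undigested <= 0:
        raise ValueError("undigested copy number must be positive")

    def normalized(copies: float) -> float:
        # Subtract background, then normalize to the undigested sample;
        # clamp to [0, 1] so measurement noise cannot give impossible fractions.
        fraction = (copies - background) / undigested
        return min(max(fraction, 0.0), 1.0)

    hmc = normalized(mspi_resistant_glucosylated)
    total_modified = max(normalized(hpaii_resistant), hmc)
    return {
        "5hmC": hmc,
        "5mC": total_modified - hmc,
        "C": 1.0 - total_modified,
    }


if __name__ == "__main__":
    # Hypothetical copy numbers with a 3% background.
    print(quantify_locus(undigested=1000, hpaii_resistant=830,
                         mspi_resistant_glucosylated=230, background=30))
```

With these illustrative inputs the site would be estimated as roughly 20% 5-hmC, 60% 5-mC, and 20% unmodified cytosine. Clamping hides negative values produced by noise, so raw fractions should still be inspected when the background is close to the signal.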
Our current results support previous studies indicating that ES and brain cell genomic DNAs contain considerable amounts of 5-hmC (13, 14). In addition, this report identified novel loci containing 5-hmC in mouse ES cells and in multiple regions of human brain DNA, including genes, intergenic regions, and repetitive elements. Further analysis of these loci revealed that 5-hmC patterns shift during embryoid body formation and that, although brain tissue DNA contains significant amounts of 5-hmC as expected, other tissue DNAs are also hydroxymethylated at various loci. These results suggest that 5-hmC, like 5-mC, may play a role in determining differentiation status and tissue-specific gene expression.
In a previous study, Tahiliani et al. (14) reported that Tet1 mRNA expression and genomic 5-hmC are both decreased at 5 days after LIF removal. Here, we confirmed that Tet1 protein expression indeed decreases after LIF withdrawal. Interestingly, we also observed a decrease in genomic 5-hmC at day 5, but this was preceded by a sharp increase in 5-hmC at day 1 of embryoid body differentiation. Tet1 expression does not appear to increase at early time points after LIF withdrawal. Therefore, it is possible that other members of the Tet family enzymes (Tet2 and Tet3) may have altered expression upon commencement of differentiation, as they have recently been shown to possess oxygenase activity in vivo (31).
Finally, every locus identified in our screen that we examined by end point PCR or qPCR displayed high levels of total methylation, even in samples that did not appear to be hydroxymethylated. This suggests that these loci are normally highly methylated and that, in certain tissues, part of this methylation can be converted to 5-hmC, which thus accounts for only a fraction of the total methylation. In agreement with several recently published articles (21–23), we found that 5-hmC coincided with gene body methylation. VANGL1 and EGFR both have highly regulated expression in brain tissues and are important for normal brain development and function (32–34). It is currently unknown how gene body methylation may affect expression, and the identification of hydroxymethylation in the transcribed regions reveals yet another layer of possible epigenetic regulation of these genes. Future studies will be required to determine the exact function of 5-hmC in various tissues and within different regions of the mammalian genome.