In contrast to 5-methylcytosine (5-mC), which has been studied extensively1–3, little is known about 5-hydroxymethylcytosine (5-hmC), a recently identified epigenetic modification present in substantial amounts in certain mammalian cell types4,5. Here we present a method for determining the genome-wide distribution of 5-hmC. We use the T4 bacteriophage β-glucosyltransferase to transfer an engineered glucose moiety containing an azide group onto the hydroxyl group of 5-hmC. The azide group can be chemically modified with biotin for detection, affinity enrichment and sequencing of 5-hmC–containing DNA fragments in mammalian genomes. Using this method, we demonstrate that 5-hmC is present in human cell lines beyond those previously recognized4. We also find a gene expression level–dependent enrichment of intragenic 5-hmC in mouse cerebellum and an age-dependent acquisition of this modification in specific gene bodies linked to neurodegenerative disorders.
Parallel to the discovery of 5-hmC in mammalian genomes4,5, Tet proteins were shown to use dioxygen to oxidize 5-mC to 5-hmC in mammalian DNA5. Tet proteins are a group of iron(II)/α-ketoglutarate–dependent dioxygenases similar to the AlkB family proteins and hypoxia-inducible factor (HIF) prolyl-hydroxylases6,7. As Tet1 and Tet2 appear to affect embryonic stem (ES) cell maintenance and normal myelopoiesis, respectively8,9, these findings fostered speculation that 5-hmC might itself be an important epigenetic mark10.
A first step toward elucidating the biology of 5-hmC is to identify its locations within genomic DNA, but it has so far remained challenging both to distinguish 5-hmC from 5-mC and to enrich 5-hmC–containing genomic DNA fragments.
Widely used methods to probe 5-mC, such as bisulfite sequencing and methylation-sensitive restriction digestion, cannot discriminate between 5-hmC and 5-mC11,12. Anti-5-hmC antibodies have only recently become commercially available, and attempts to use them to immunoenrich 5-hmC–containing genomic DNA from complex genomes for sequencing have not yet succeeded8. A single-molecule, real-time sequencing technology has been applied to distinguish among cytosine, 5-mC and 5-hmC, but further improvements are needed to affinity-enrich 5-hmC–containing DNA and to achieve base-resolution sequencing13.
Here we present a chemical tagging technology to address both challenges. It has been shown that 5-hmC is present in the genome of the T-even bacteriophages. A viral enzyme, β-glucosyltransferase (β-GT), can catalyze the transfer of a glucose moiety from uridine diphosphoglucose (UDP-Glu) to the hydroxyl group of 5-hmC, yielding β-glucosyl-5-hydroxymethyl-cytosine (5-gmC) in duplex DNA14,15 (Fig. 1a). We took advantage of this enzymatic process and used β-GT to transfer a chemically modified glucose, 6-N3-glucose, onto 5-hmC for selective bio-orthogonal labeling of 5-hmC in genomic DNA (Fig. 1b). With an azide group present, a biotin tag or any other tag can be installed using Huisgen cycloaddition (click) chemistry for a variety of enrichment, detection and sequencing applications16–18.
We used the biotin tag for high-affinity capture and/or enrichment of 5-hmC–containing DNA for sensitive detection and deep sequencing to reveal genomic locations of 5-hmC (Fig. 1b). The covalent chemical labeling coupled with biotin-based affinity purification provides considerable advantages over noncovalent, antibody-based immunoprecipitation as it ensures accurate and comprehensive capture of 5-hmC–containing DNA fragments, while still providing high selectivity.
We chemically synthesized UDP-6-N3-Glu (Supplementary Fig. 1 and Supplementary Methods) and tested the glycosylation reaction on an 11-mer duplex DNA containing a 5-hmC modification as a model system (Fig. 2). Wild-type β-GT worked efficiently with UDP-6-N3-Glu as the co-factor, with a reaction rate only sixfold lower than with the native co-factor UDP-Glu (Supplementary Fig. 2). The 6-N3-glucose transfer reaction was complete within 5 min at an enzyme concentration as low as 1%. The identity of the resulting β-6-azide-glucosyl-5-hydroxymethyl-cytosine (N3-5-gmC) in the 11-mer DNA was confirmed by matrix-assisted laser desorption/ionization–time of flight (MALDI-TOF) analysis (Fig. 2). N3-5-gmC can be readily coupled with dibenzocyclooctyne-modified biotin (compound 1) by copper-free click chemistry to introduce a biotin group (Fig. 2)19,20. Again, the identity of the 11-mer DNA carrying the biotin-N3-5-gmC label was confirmed by MALDI-TOF analysis (Fig. 2). High-performance liquid chromatography (HPLC) analysis indicated that the click reaction is high yielding (~90%) (Supplementary Fig. 3). High-resolution mass spectrometry (HRMS) analysis of the corresponding hydrolysates after HPLC separation further verified that biotin-N3-5-gmC was formed (Supplementary Fig. 4).
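The MALDI-TOF checks rely on the expected mass increase of the oligonucleotide at each labeling step. As a minimal sketch (our own illustration, not the authors' analysis), the net residue added through a glycosidic bond is the free sugar minus one water: C6H10O5 for glucose and C6H9N3O4 for 6-azido-6-deoxyglucose. The sketch assumes monoisotopic element masses; swap in average masses when comparing against average-mass peaks. The increment from the subsequent click step depends on the formula of compound 1, which is not given in this section, so it is omitted.

```python
import re

# Monoisotopic masses of the most abundant isotopes (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

def mono_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C6H9N3O4' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total

# Net residue added by a glycosidic bond = free sugar minus one H2O.
GLUCOSYL = "C6H10O5"        # glucose C6H12O6 - H2O (native UDP-Glu)
AZIDOGLUCOSYL = "C6H9N3O4"  # 6-azido-6-deoxyglucose C6H11N3O5 - H2O (UDP-6-N3-Glu)

for label, formula in [("5-gmC", GLUCOSYL), ("N3-5-gmC", AZIDOGLUCOSYL)]:
    print(f"{label}: +{mono_mass(formula):.4f} Da per 5-hmC")
# 5-gmC: +162.0528 Da per 5-hmC
# N3-5-gmC: +187.0593 Da per 5-hmC
```

The two increments differ by about 25.006 Da (N3 replacing OH), so the azido-glucosylated product is distinguishable from a native-glucose product in the same spectrum.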
The properties of 5-hmC in duplex DNA are quite similar to those of 5-mC in terms of their sensitivity to enzymatic reactions such as restriction enzyme digestion and polymerization13–15. To develop a method that differentiates these two bases in DNA, we tested primer extension on a biotin-N3-5-gmC–modified DNA template. Addition of streptavidin tetramer, which binds biotin tightly, completely stops extension by Taq polymerase, specifically at the modified position and at one base before it (Supplementary Fig. 5). This method therefore has the potential to locate 5-hmC at single-base resolution in DNA loci of interest.
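The single-base readout amounts to bookkeeping of stalled-product lengths. The sketch below is hypothetical: it numbers template bases from the first base copied after the primer and assumes the two stops place the product's 3' end opposite the base just before the modification and opposite the modified base itself. The exact stop register would need to be calibrated with a template of known sequence.

```python
def expected_stop_lengths(primer_len: int, mod_pos: int) -> tuple[int, int]:
    """Hypothetical stalled-product lengths for one streptavidin-bound
    biotin-N3-5-gmC on the template.

    mod_pos: 1-based template position of the modified base, counted from
    the first base copied after the primer's 3' end.
    Assumption: products stall with their 3' end opposite mod_pos - 1
    or opposite mod_pos.
    """
    if mod_pos < 1:
        raise ValueError("mod_pos must be >= 1")
    return primer_len + mod_pos - 1, primer_len + mod_pos

def infer_mod_pos(primer_len: int, stop_lengths: list[int]) -> int:
    """Invert the model: the longest stalled product ends opposite the modification."""
    if not stop_lengths:
        raise ValueError("need at least one observed stop")
    return max(stop_lengths) - primer_len

stops = expected_stop_lengths(primer_len=20, mod_pos=15)
print(stops)                            # (34, 35)
print(infer_mod_pos(20, list(stops)))   # 15
```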
Next, we performed selective labeling of 5-hmC in genomic DNA from various cell lines and animal tissues. Genomic DNA from various sources was sonicated into small fragments (~100–500 base pairs), treated with β-GT in the presence of UDP-6-N3-Glu or regular UDP-Glu (control group) to yield N3-5-gmC or 5-gmC modifications and finally labeled with cyclooctyne-biotin (1) to install biotin. Because each step is efficient and bio-orthogonal, this protocol ensures selective labeling of most 5-hmC in genomic DNA. The presence of biotin-N3-5-gmC allows affinity enrichment of this modification and accurate quantification of the amount of 5-hmC in a genome using avidin–horseradish peroxidase (HRP).
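Because labeling is sequential, the fraction of 5-hmC sites that end up biotinylated is roughly the product of the step yields, and a fragment can be captured if at least one of its sites carries biotin. A back-of-the-envelope sketch, assuming the steps act independently on each site and treating the β-GT step as complete (the text reports completion within 5 min but no numeric yield); the ~90% click yield is the HPLC estimate reported above.

```python
from math import prod

def labeled_fraction(step_yields: dict[str, float]) -> float:
    """Expected fraction of 5-hmC sites carrying biotin after all steps,
    assuming each step acts independently on every site."""
    for step, y in step_yields.items():
        if not 0.0 <= y <= 1.0:
            raise ValueError(f"{step}: yield must lie in [0, 1]")
    return prod(step_yields.values())

def capture_probability(p_labeled: float, n_sites: int) -> float:
    """Chance that a fragment with n_sites 5-hmC carries at least one biotin."""
    return 1.0 - (1.0 - p_labeled) ** n_sites

steps = {
    "beta-GT transfer of 6-N3-glucose": 1.0,    # assumption: treated as complete
    "copper-free click with biotin (1)": 0.90,  # ~90% by HPLC
}
p = labeled_fraction(steps)
print(f"per-site labeling: {p:.2f}")  # per-site labeling: 0.90
for n in (1, 2, 3):
    print(n, f"{capture_probability(p, n):.3f}")  # 0.900, 0.990, 0.999
```

Under these assumptions, fragments carrying several 5-hmC sites are captured almost regardless of the click yield, so the per-site yield matters most for sparsely modified fragments.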
We determined the total amount of 5-hmC in mouse cerebellum at different stages of development (Fig. 3a,b). The control group showed almost no signal, demonstrating the high selectivity of this method. The amount of 5-hmC depends on the developmental stage of the mouse cerebellum (Fig. 3b): we observed a gradual increase from postnatal day 7 (P7; 0.1% of total nucleotides in the genome) to the adult stage (0.4% of total nucleotides)21, which was further confirmed by a dot-blot assay using an antibody against 5-hmC (Supplementary Fig. 6a). This observation suggests that 5-hmC might play an important role in brain development. The 5-hmC level of mouse embryonic stem cells (mESC) was comparable to previously reported results (~0.05% of total nucleotides) (Fig. 3c,d)5. The 5-hmC level of mouse adult neural stem cells (aNSC) was also comparable to that of mESC (~0.04% of total nucleotides) (Fig. 3c,d).
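Converting an avidin-HRP signal into "% of total nucleotides" requires a calibration. This section does not describe one, so the sketch below assumes a linear standard curve built from DNA standards of known 5-hmC content; the standard levels and signals are hypothetical placeholders, not data from this study.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def level_from_signal(signal: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to get 5-hmC as % of total nucleotides."""
    return (signal - intercept) / slope

# Hypothetical standards: known 5-hmC (% of total nucleotides) vs. HRP signal.
std_level = [0.0, 0.1, 0.2, 0.4]
std_signal = [0.0, 250.0, 500.0, 1000.0]
slope, intercept = fit_line(std_level, std_signal)
print(round(level_from_signal(750.0, slope, intercept), 3))  # 0.3
```

A linear fit is only appropriate within the range where the chemiluminescent signal is not saturated; samples outside the standards' range should be diluted or re-run rather than extrapolated.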
We also tested human cell lines (Fig. 3c,d). Notably, 5-hmC was detected in HeLa and HEK293FT cells, although at much lower abundance (~0.01% of total nucleotides) (Fig. 3d) than in other cells or tissues previously reported to contain 5-hmC (previous studies did not detect 5-hmC in HeLa cells owing to the limited sensitivity of the methods employed4). These results suggest that this modification may be more widespread than previously anticipated. By contrast, no 5-hmC signal was detected in wild-type Drosophila melanogaster, consistent with a lack of DNA methylation in this organism22.
To further validate the method on biological samples, we confirmed the presence of 5-hmC in genomic DNA from HeLa cells. A monomeric avidin column was used to pull down biotin-N3-5-gmC–containing DNA after genomic DNA labeling. The enriched DNA fragments were digested into single nucleotides, purified by HPLC and subjected to HRMS analysis. To our satisfaction, the HRMS and MS/MS spectra of biotin-N3-5-gmC were identical to those of the standard from synthetic DNA (Supplementary Figs. 4 and 6b,c). In addition, we prepared two 60-mer double-stranded (ds)DNAs, one containing a single 5-hmC and the other lacking the modification. We spiked equal amounts of both into mouse genomic DNA and performed labeling and subsequent affinity purification of the biotinylated DNA. Deep sequencing of the pull-down sample showed that the 5-hmC–containing DNA was enriched >25-fold relative to the unmodified control DNA (Supplementary Fig. 7).
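The spike-in control reduces to a ratio of read counts. A minimal sketch, assuming reads have already been mapped to the two 60-mer sequences: because both spikes have the same length and were added in equal amounts, length normalization cancels, and the optional input counts correct for any unequal recovery before pull-down. The counts in the example are hypothetical, not the values behind Supplementary Fig. 7.

```python
def spike_in_enrichment(pd_hmc: int, pd_ctrl: int,
                        in_hmc: int = 1, in_ctrl: int = 1) -> float:
    """Fold enrichment of the 5-hmC spike over the unmodified spike.

    pd_*: reads mapped to each 60-mer after pull-down.
    in_*: reads mapped to each 60-mer in the input (defaults assume equal recovery).
    """
    if 0 in (pd_ctrl, in_hmc, in_ctrl):
        raise ValueError("zero count in a denominator; add a pseudocount first")
    return (pd_hmc / pd_ctrl) / (in_hmc / in_ctrl)

# Hypothetical read counts for illustration only.
print(spike_in_enrichment(5_200, 200, 1_000, 1_000))  # 26.0
```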
Next, we performed chemical labeling of genomic DNA from mouse cerebellum and subjected the enriched fragments to deep sequencing to identify 5-hmC–containing genomic regions. Initially, we compared male and female adult mice (2.5 months old), sequencing multiple independent biological samples and multiple libraries prepared from the same genomic DNA. Genome-scale density profiles are nearly identical between male and female and clearly distinguishable from both input genomic DNA and control DNA labeled with regular glucose (no biotin) (Fig. 4a). Peak identification revealed a total of 39,011 high-confidence regions consistently enriched for 5-hmC in both male and female (Fig. 4a and Supplementary Table 1). All 13 selected enriched regions were subsequently verified in both adult female and male cerebellum by quantitative PCR (qPCR), whereas multiple control regions showed no enrichment (Supplementary Fig. 8).
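Keeping only regions enriched in both male and female amounts to intersecting two peak sets. The sketch below is a hypothetical illustration, not the peak caller used for Supplementary Table 1: peaks are half-open (chromosome, start, end) intervals, and the output keeps the overlapping part of every male/female pair. The pairwise loop is fine for small inputs; for genome-scale peak sets an interval tree or `bedtools intersect` is the practical choice.

```python
from collections import defaultdict

Peak = tuple[str, int, int]  # (chromosome, start, end), half-open

def consensus_regions(peaks_a: list[Peak], peaks_b: list[Peak]) -> list[Peak]:
    """Return the overlapping part of every pair of peaks shared by two samples."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))
    shared = []
    for chrom, start, end in peaks_a:
        for b_start, b_end in by_chrom[chrom]:
            lo, hi = max(start, b_start), min(end, b_end)
            if lo < hi:  # half-open intervals must overlap by at least 1 bp
                shared.append((chrom, lo, hi))
    return sorted(shared)

# Hypothetical peak calls for two samples.
male = [("chr1", 100, 500), ("chr1", 900, 1_200), ("chr2", 50, 300)]
female = [("chr1", 400, 1_000), ("chr2", 400, 600)]
print(consensus_regions(male, female))  # [('chr1', 400, 500), ('chr1', 900, 1000)]
```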
DNA methylation is widespread in mammalian genomes, with the exception of most transcription start sites (TSS)23–25. Previous studies have mostly assessed DNA methylation by bisulfite sequencing and methylation-sensitive restriction digests, and it has since been appreciated that neither method adequately distinguishes 5-mC from 5-hmC11,12. To determine the genome-wide distribution of 5-hmC, we generated metagene 5-hmC read density profiles for RefSeq transcripts. Normalized 5-hmC read densities in adult male and female cerebellum samples differ by an average of 2.10 ± 0.04% (mean ± s.e.m.), indicating that the profiles are reproducible. We observed enrichment of 5-hmC in gene bodies and in proximal upstream and downstream regions, relative to the TSS, the transcription termination site (TTS) and distal regions (Fig. 4b). This contrasts with previously generated methyl-binding domain–sequencing (MBD-Seq) data26 and with our own methylated DNA immunoprecipitation sequencing (MeDIP-Seq) of mouse cerebellum genomic DNA, in which the majority (~80%) of 5-mC–enriched DNA sequences were derived from satellite and/or repeat regions (Supplementary Fig. 9). Further analyses also revealed that both intragenic and proximal enrichment of 5-hmC are associated with more highly expressed genes, consistent with a role for 5-hmC in maintaining and/or promoting gene expression (Fig. 4b). Proximal enrichment of 5-hmC ~875 bp upstream of TSSs and ~160–200 bp downstream of annotated TTSs further suggests that 5-hmC in these regions may contribute to the regulation of gene expression.
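A scaled metagene profile places every transcript on a common coordinate frame: fixed-width bins upstream of the TSS, a gene body stretched or compressed to a fixed number of bins, and fixed-width bins downstream of the TTS. The sketch below assumes reads are represented by sorted midpoint coordinates, uses reads-per-kilobase-per-million normalization, and picks the flank size and bin counts arbitrarily; the binning used for Fig. 4b is not specified in this section.

```python
from bisect import bisect_left, bisect_right

def metagene_profile(genes, reads, total_reads, flank=5_000, n_flank=10, n_body=20):
    """Average read density over a scaled metagene.

    genes: iterable of (chrom, tss, tts, strand); tss/tts are genomic
           coordinates and strand is '+' or '-'.
    reads: dict mapping chrom -> sorted list of read midpoints.
    total_reads: library size used for per-million normalization.
    Returns n_flank upstream, n_body gene-body and n_flank downstream bins,
    in reads per kb per million reads, averaged over nonzero-length genes.
    """
    n_bins = 2 * n_flank + n_body
    sums = [0.0] * n_bins
    millions = total_reads / 1e6
    used = 0
    for chrom, tss, tts, strand in genes:
        length = abs(tts - tss)
        if length == 0:
            continue
        used += 1
        sign = 1 if strand == "+" else -1
        pos = reads.get(chrom, [])
        lo, hi = min(tss, tts) - flank, max(tss, tts) + flank
        counts = [0] * n_bins
        for p in pos[bisect_left(pos, lo):bisect_right(pos, hi)]:
            d = sign * (p - tss)  # distance downstream of the TSS along the transcript
            if -flank <= d < 0:
                b = int((d + flank) * n_flank / flank)
            elif d < length:
                b = n_flank + int(d * n_body / length)
            elif d < length + flank:
                b = n_flank + n_body + int((d - length) * n_flank / flank)
            else:
                continue
            counts[b] += 1
        widths = ([flank / n_flank] * n_flank + [length / n_body] * n_body
                  + [flank / n_flank] * n_flank)
        for i in range(n_bins):
            sums[i] += counts[i] / (widths[i] / 1_000) / millions
    return [s / used for s in sums] if used else sums

genes = [("chr1", 10_000, 12_000, "+"), ("chr2", 12_000, 10_000, "-")]
reads = {"chr1": [9_500, 10_500, 12_500], "chr2": [9_500, 11_500, 12_500]}
print(metagene_profile(genes, reads, total_reads=1_000_000,
                       flank=1_000, n_flank=2, n_body=2))
# [0.0, 2.0, 1.0, 0.0, 0.0, 2.0]
```

Splitting `genes` by expression quantile and computing one profile per group gives the kind of expression-stratified comparison described above.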
Quantification of bulk 5-hmC in the cerebellum of P7 and adult mice indicates genomic acquisition of 5-hmC during cerebellum maturation (Fig. 3a). We further explored this phenomenon by sequencing 5-hmC–enriched DNA from P7 cerebellum and compared these sequences to those derived from adult mice. Metagene profiles at RefSeq transcripts confirmed an increase in proximal and intragenic 5-hmC in adult relative to P7 cerebellum, although there was little to no difference and minimal enrichment over input genomic DNA in distal regions (Fig. 4c and Supplementary Table 2). Peak identification using P7 as background identified a total of 20,092 enriched regions that showed significant differences between P7 and adult tissues. Of those, 15,388 (76.6%) occurred within 5,425 genes acquiring intragenic 5-hmC in adult females (Supplementary Fig. 10 and Supplementary Table 3).
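Comparing adult with P7 reads region by region can be sketched as a library-size-normalized log2 ratio with a pseudocount. This is a simplified stand-in, not the peak-calling procedure used here, and the read counts are hypothetical; the last line simply checks the 76.6% figure quoted above.

```python
from math import log2

def log2_fold_change(adult_reads: int, p7_reads: int,
                     adult_total: int, p7_total: int,
                     pseudocount: float = 1.0) -> float:
    """Library-size-normalized log2(adult / P7) read ratio for one region."""
    adult_cpm = (adult_reads + pseudocount) / adult_total * 1e6
    p7_cpm = (p7_reads + pseudocount) / p7_total * 1e6
    return log2(adult_cpm / p7_cpm)

# Hypothetical counts for one region in two libraries of equal depth.
print(round(log2_fold_change(63, 15, 20_000_000, 20_000_000), 3))  # 2.0

# Share of differential regions lying within genes, from the counts in the text.
print(f"{15_388 / 20_092:.1%}")  # 76.6%
```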
Gene ontology pathway analysis of the 5,425 genes acquiring 5-hmC during aging identified significant enrichment of pathways associated with age-related neurodegenerative disorders as well as angiogenesis and hypoxia response (Fig. 4d and Supplementary Table 4). This is of particular interest because all these pathways have been linked to the oxidative stress response and because the conversion of 5-mC to 5-hmC requires dioxygen5. Furthermore, 15 of the 23 genes previously identified as causing ataxia and disorders of Purkinje cell degeneration in mouse and human acquired intragenic 5-hmC in adult mice (Supplementary Fig. 11 and Supplementary Table 5)27. Together, these observations suggest that 5-hmC may play a role in age-related neurodegeneration.
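Overlaps such as 15 of 23 ataxia genes falling among the 5,425 genes that acquire 5-hmC are commonly assessed with a hypergeometric (one-sided Fisher) test. This section does not state which test or background gene set was used, so the sketch below assumes a hypothetical background of 20,000 genes; it shows how such a p-value is computed, not the published value.

```python
from math import comb

def hypergeom_sf(k: int, total: int, in_category: int, drawn: int) -> float:
    """P(X >= k) when `drawn` genes are chosen at random from `total` genes,
    `in_category` of which belong to the category of interest."""
    upper = min(in_category, drawn)
    hits = sum(
        comb(in_category, i) * comb(total - in_category, drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(total, drawn)  # exact big-integer sums; final division gives a float

BACKGROUND = 20_000  # hypothetical number of genes in the background set
ACQUIRING = 5_425    # genes acquiring intragenic 5-hmC (from the text)
ATAXIA = 23          # ataxia / Purkinje degeneration genes (from the text)
OBSERVED = 15        # of those, genes acquiring 5-hmC (from the text)

expected = ACQUIRING * ATAXIA / BACKGROUND
p_value = hypergeom_sf(OBSERVED, BACKGROUND, ATAXIA, ACQUIRING)
print(f"expected {expected:.1f}, observed {OBSERVED}, P = {p_value:.2e}")
```

With this assumed background, about 6.2 of the 23 genes would be expected by chance; the p-value depends strongly on the background size, which is why the real background set should be reported alongside it.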
Recently, β-GT was used to transfer a radiolabeled glucose for 5-hmC quantification28. (Our paper was under review when ref. 28 was published.) A major advantage of our technology is its ability to selectively label 5-hmC in genomic DNA with any tag. With a biotin tag attached to 5-hmC, DNA fragments containing 5-hmC can be affinity purified for deep sequencing to reveal the distribution and/or location of 5-hmC in mammalian genomes. Because biotin is covalently linked to 5-hmC and the biotin-avidin/streptavidin interaction is strong and highly specific, this technology should be more robust than potential immunopurification methods based on anti-5-hmC antibodies8. Other fluorescent or affinity tags may be readily installed using the same approach for other applications. For instance, 5-hmC in fixed cells, or even in live cells (if labeling can be performed in one step with a mutant enzyme), may be imaged with a fluorescent tag. In addition, labeling 5-hmC with a bulky group could interfere with restriction enzyme digestion or ligation, which may be used to detect 5-hmC in specific genomic regions. The attachment of biotin or other tags to 5-hmC also greatly enhances the sensitivity and simplicity of 5-hmC detection and/or quantification in various biological samples28. The detection limit of this method can reach ~0.004% of total nucleotides (Fig. 3d), and the method can be readily applied to large numbers of biological samples.
With the technology presented here, we observed a developmental stage–dependent increase of 5-hmC in mouse cerebellum. Compared with the cerebellum at postnatal day 7, a time of massive cell proliferation, the adult cerebellum has a significantly higher level of 5-hmC, suggesting that 5-hmC might be involved in neuronal development and maturation. Indeed, we also observed an increase of 5-hmC in aNSCs upon differentiation (unpublished data).
This technology enabled us to selectively capture 5-hmC–enriched regions in the cerebella of both P7 and adult mice and to determine the genome-wide distribution of 5-hmC by deep sequencing. Our analyses revealed general features of 5-hmC in mouse cerebellum. First, 5-hmC was enriched specifically in gene bodies and in defined gene-proximal regions relative to more distal regions. This differs from the distribution of 5-mC, which has been found both within gene bodies and in more distal regions23–25,29. Second, 5-hmC enrichment is higher in more highly expressed genes, suggesting a potential role for 5-hmC in activating and/or maintaining gene expression. It is possible that conversion of 5-mC to 5-hmC is a pathway to offset the repressive effect of 5-mC during this process without going through demethylation30. Third, we observed enrichment of 5-hmC in genes linked to hypoxia and angiogenesis. The oxidation of 5-mC to 5-hmC by Tet proteins requires dioxygen5,8. A well-known oxygen-sensing system in mammals, involved in hypoxia and angiogenesis, relies on the HIF prolyl hydroxylases, which belong to the same mononuclear iron-containing dioxygenase superfamily as the catalytic domain of the Tet proteins7. It is tempting to speculate that oxidation of 5-mC to 5-hmC by Tet proteins may constitute another oxygen-sensing and regulation pathway in mammalian cells. Lastly, the association of 5-hmC with genes that have been implicated in neurodegenerative disorders suggests that this base modification could contribute to the pathogenesis of human neurological disorders. Should a connection between 5-hmC levels and human disease be established, the affinity purification approach shown here could be used to purify and/or enrich 5-hmC–containing DNA fragments as a simple and sensitive method for disease prognosis and diagnosis.
In summary, we have developed an efficient and selective method to label and capture 5-hmC from genomic DNA. We have demonstrated the feasibility of using this approach to determine the genome-wide distribution of 5-hmC. Future applications of this technology should help to clarify the role(s) of the 5-hmC modification at the molecular, cellular and physiological levels.
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.
We would like to thank S. Warren for the helpful discussion and critical reading of the manuscript. This study was supported partly by the US National Institutes of Health (GM071440 to C.H. and NS051630/MH076090/MH078972 to P.J.) and the University of Chicago.
Accession codes. The sequencing data have been deposited in NCBI’s Gene Expression Omnibus with accession number GSE25398.
Note: Supplementary information is available on the Nature Biotechnology website.
AUTHOR CONTRIBUTIONS
C.H., C.-X.S. and P.J. designed the experiments with help from Y.F. and B.T.L. Experiments were performed by C.-X.S., K.E.S., Y.F., C.Y. and Q.D. with the help of W.Z. and X.J.; Q.D. and J.W. carried out the chemical synthesis; K.E.S., X.L., Y.L. and P.J. provided the mouse cerebellum, mouse aNSC and fly samples, and performed deep sequencing; C.-H.C., L.Z., T.J.L. and L.A.G. helped with the mouse ESC, human HeLa, human HEK and related samples; B.Z. and L.M.H. performed the mass spectrometry analysis from HeLa cells. C.H., C.-X.S. and P.J. wrote the paper. All authors discussed the results and commented on the manuscript.
COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.
Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.