We developed two independent methods for precipitation of 5hmC in genomic DNA. The GLIB method entails addition of a glucose molecule to each 5hmC with T4 phage β-glucosyltransferase (BGT)3 (Supplementary Fig. 1a). The glucose moiety is oxidized with sodium periodate, which converts the vicinal hydroxyl groups to aldehydes10, and further modified with aldehyde-reactive probe, which adds two biotin molecules to each 5hmC. A related strategy, which uses a custom-synthesized UDP-glucose analogue (UDP-6-N3-glucose), was recently used to profile 5hmC distribution in mouse brain11. The second method uses an antibody against cytosine 5-methylenesulphonate (CMS)5, produced by reaction of 5hmC with sodium bisulphite12. Anti-CMS antibodies are more sensitive and less density-dependent than anti-5hmC in DNA dot blot assays5. Both methods are specific for DNA containing 5hmC (Supplementary Fig. 1b)5.
[Figure: Comparison of 5hmC enrichment methods]
We examined the ability of GLIB-treated (biotinylated) and bisulphite-treated 5hmC-containing DNA to be pulled down by streptavidin and anti-CMS antisera, respectively. Using varying ratios of dhmCTP:dCTP, we generated 201-base-pair PCR amplicons with differing incorporation of cytosine and 5hmC in identical sequence contexts (Supplementary Table 1). At each dhmCTP:dCTP ratio, the fraction of amplicons that contain no 5hmC, and therefore should not be precipitated, can be calculated using the binomial equation (Supplementary Table 2). Observed and calculated pull-down efficiencies were very similar: even at low densities of 5hmC, more than 90% of DNA fragments calculated to contain a single 5hmC were precipitated after GLIB treatment. Anti-CMS pull-down showed increased density dependence compared to GLIB, but had very low background, such that there was still a strong preference for precipitation of sparsely hydroxymethylated amplicons over unmodified ones. The performance of a commercial polyclonal anti-hmC antiserum was inferior to that of anti-CMS, with higher background pull-down of unmodified DNA (3.0% versus 0.06%) as well as greater density dependence. By testing PCR amplicons with varying 5mC content, we confirmed that the methyl-DNA immunoprecipitation (MeDIP) technique, which uses a monoclonal antibody to 5mC, is extremely density-dependent.
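The binomial calculation mentioned above can be illustrated with a minimal sketch. It is not the calculation in Supplementary Table 2: the number of cytosine positions per amplicon (`n_cytosines`) and the assumption that dhmCTP is incorporated in proportion to its share of the nucleotide mix (`p_hmc`) are hypothetical, as are the function names.

```python
from math import comb


def fraction_with_k_hmc(n_cytosines: int, p_hmc: float, k: int) -> float:
    """Binomial probability that an amplicon carries exactly k 5hmC residues.

    n_cytosines: cytosine positions per amplicon (hypothetical parameter).
    p_hmc: probability that a given cytosine position is 5hmC; assumed here
           to equal the dhmCTP share of the dhmCTP + dCTP pool.
    """
    return comb(n_cytosines, k) * p_hmc**k * (1.0 - p_hmc) ** (n_cytosines - k)


def fraction_unmodified(n_cytosines: int, p_hmc: float) -> float:
    """Fraction of amplicons with no 5hmC, which should not be precipitated."""
    return fraction_with_k_hmc(n_cytosines, p_hmc, 0)


def max_expected_pulldown(n_cytosines: int, p_hmc: float) -> float:
    """Upper bound on pull-down efficiency: amplicons with at least one 5hmC."""
    return 1.0 - fraction_unmodified(n_cytosines, p_hmc)


if __name__ == "__main__":
    # Illustrative values only; the real cytosine count depends on the amplicon.
    for p in (0.01, 0.05, 0.25):
        print(p, fraction_unmodified(40, p), max_expected_pulldown(40, p))
```

Comparing an observed pull-down fraction with `max_expected_pulldown` at each ratio gives a rough sense of how much of the shortfall is due to unmodified amplicons rather than to inefficient capture.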
We applied the GLIB and anti-CMS techniques to genomic DNA samples with low, intermediate and high levels of 5hmC to enrich 5hmC-containing regions (Supplementary Fig. 1c, left panel). For the GLIB and anti-CMS pull-downs, the amount of specifically precipitated genomic DNA was proportional to the relative amount of 5hmC (Supplementary Fig. 1c). The GLIB technique did not produce mutations (Supplementary Table 3), the biotinylated DNA could be efficiently eluted by heating with formamide (Supplementary Fig. 1d), and the biotinylated adduct had a minimal inhibitory effect on PCR at 5–25% hmC density (delay of approximately 0.1 cycles per converted 5hmC residue; Supplementary Fig. 1f). There was no PCR delay with CMS-containing PCR amplicons except at very high CMS levels (Supplementary Fig. 1e), consistent with our previous report that CMS inhibits PCR predominantly at biologically irrelevant sequences where multiple CMS adducts occur in a row13.
We investigated the genome-wide localization of 5hmC in murine V6.5 ES cells. For GLIB-treated DNA, we chose Helicos single molecule DNA sequencing, which does not require an amplification step and thus avoids PCR bias14,15. For CMS-enriched genomic DNA, we used an Illumina instrument, as longer read lengths are needed for efficient alignment of bisulphite-treated DNA to the genome16. With the GLIB method, 119,600 regions of the genome, averaging 1,422 bp in length, showed a substantially higher density of reads in the +BGT as opposed to the −BGT sample; with the CMS method, comparison of enriched to input DNA identified 109,264 enriched regions (average length 1,168 bp). There was high overlap in the enriched regions, here designated 5hmC-enriched regions of the genome (HERGs). Comparing the number of HERGs retrieved by using different fractions of aligned reads yielded a curve that approached an asymptote, suggesting that a majority of hydroxymethylated regions had been identified (Supplementary Fig. 2a).
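One simple way to quantify overlap between two sets of enriched regions is to ask what fraction of the regions in one set intersect any region in the other. The sketch below illustrates that idea only. It is not the procedure used to compare the GLIB and anti-CMS region sets, and the `(chrom, start, end)` half-open interval representation, the function names and the toy coordinates are all assumptions.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_by_chrom(regions):
    """Merge overlapping or abutting (chrom, start, end) half-open intervals.

    Returns {chrom: (starts, ends)} with disjoint intervals sorted by start.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def fraction_overlapping(query, reference):
    """Fraction of query regions that overlap at least one reference region."""
    if not query:
        return 0.0
    ref = merge_by_chrom(reference)
    hits = 0
    for chrom, start, end in query:
        if chrom not in ref:
            continue
        starts, ends = ref[chrom]
        # Index of the last merged reference interval that starts before `end`.
        i = bisect_left(starts, end) - 1
        if i >= 0 and ends[i] > start:
            hits += 1
    return hits / len(query)


if __name__ == "__main__":
    # Toy coordinates for illustration; not real enriched regions.
    set_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
    set_b = [("chr1", 150, 250), ("chr2", 50, 100)]
    print(fraction_overlapping(set_a, set_b), fraction_overlapping(set_b, set_a))
```

The measure is asymmetric, so it is worth reporting in both directions when the two region sets differ in number or average length.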
To determine whether HERGs overlapped with methylated DNA regions, we identified 62,991 5mC-enriched regions of the genome (MERGs) by MeDIP. The resulting 5mC profile does not represent a complete map of 5mC in mouse ES cells, but rather is biased towards regions of dense methylation. Statistics pertaining to the GLIB, anti-CMS and MeDIP enrichments are shown in Supplementary Figs 2b–d, the corresponding annotations are provided in Supplementary Tables 4–9, and reads and enrichment for the Hoxb locus are provided in Supplementary Table 10. As expected, both HERGs and MERGs contained a high frequency of CG sequences relative to the genome at large (Supplementary Fig. 3a). Intriguingly, HERGs also contained relatively high levels of CAG sequences, the most frequent site of non-CpG methylation in human ES cells16, and we confirmed that the TET1 catalytic domain is capable of hydroxylating 5mC in CHG and CHH (H = A, T or C) contexts in vitro (Supplementary Fig. 3b).
Analysis of the GLIB and anti-CMS HERG sets gave very similar results. We observed a strong correlation between the densities of HERGs and genes on a given chromosome; this trend was less pronounced for MERGs. When we compared the distribution of HERGs and MERGs to the distribution of DNA fragments of equivalent length distributed randomly across the genome, both 5hmC and 5mC were enriched within transcribed regions, particularly exons, which are known to be sites of high CpG density17 as well as high DNA methylation18 (Supplementary Fig. 3c). However, only 5hmC was enriched at transcription start sites (TSSs) and within the 5′ untranslated regions (UTRs) of genes. Moreover, 5hmC was relatively more enriched in enhancers (defined by H3K4me1 in the absence of H3K4me3)19 than 5mC, strongly indicating a connection between 5hmC and regulatory elements. Plotting each HERG as a single point relative to the nearest TSS, we found that 5hmC is heavily enriched both 5′ and 3′ of the TSS, whereas 5mC is enriched primarily 3′ of the TSS. These results show a unique distribution of 5hmC in regulatory elements of genes, one that is not explained simply by the distribution of 5mC, the substrate for TET enzymes.
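Placing each enriched region as a single point relative to the nearest TSS can be sketched as follows. This is an illustration under stated assumptions rather than the analysis pipeline itself: the region is represented by one coordinate (for example its midpoint, which is our choice here), TSSs are given as sorted `(position, strand)` pairs on one chromosome, and the function name and sign convention (negative = 5′ of the TSS, positive = 3′) are hypothetical.

```python
from bisect import bisect_left


def signed_distance_to_nearest_tss(point, tss_list):
    """Signed distance from `point` to the nearest TSS on the same chromosome.

    tss_list: non-empty list of (position, strand) tuples sorted by position,
              with strand "+" or "-".
    Returns a negative value if the point lies 5' (upstream) of the nearest
    TSS in the gene's own orientation, and a positive value if it lies 3'.
    """
    positions = [pos for pos, _ in tss_list]
    i = bisect_left(positions, point)
    # Only the neighbours on either side of the insertion index can be nearest.
    candidates = tss_list[max(i - 1, 0): i + 1]
    pos, strand = min(candidates, key=lambda t: abs(point - t[0]))
    offset = point - pos
    # On the minus strand, higher genomic coordinates are upstream.
    return offset if strand == "+" else -offset


if __name__ == "__main__":
    # Toy annotation for illustration; not real TSS coordinates.
    tss = [(1000, "+"), (5000, "-")]
    for midpoint in (900, 1200, 4800, 5300):
        print(midpoint, signed_distance_to_nearest_tss(midpoint, tss))
```

A histogram of these signed distances for each region set then shows whether enrichment falls on the 5′ side, the 3′ side, or both.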
[Figure: Genomic distribution of 5hmC or 5mC enriched regions of the genome]
The enrichment of 5hmC at the TSS suggested a role for 5hmC in transcriptional regulation. To evaluate this possibility, we used published data sets on gene expression20,21 and histone modification22,23 profiles in mouse ES cells to compare the sets of genes with 5hmC or 5mC at their start sites (Supplementary Tables 11–13) to the set of all genes in the genome. 5hmC is preferentially found at promoters with high or intermediate CpG content (Supplementary Fig. 4a), even though high CpG promoters are hypomethylated in ES cells16,18,24. This distribution is consistent with the possibility that TET proteins are preferentially recruited to high CpG regions through their CpG-binding CXXC domains6,25.
In ES cells, genes with ‘bivalent’ H3K27 and H3K4 trimethylation are transcriptionally inactive but poised for expression upon differentiation to embryoid bodies20,26,27. We found that genes with 5hmC at their start sites were disproportionately likely to contain bivalent domains at their promoters; likewise, a majority (~60%) of genes reported to contain bivalent domains have 5hmC at their start sites. 5hmC was less likely to be found at genes with the activating ‘H3K4me3 only’ mark than is predicted by chance. Moreover, genes with 5hmC at their start sites showed lower expression in murine ES cells than other genes and were more likely to be upregulated upon embryoid body differentiation. The correlation of 5hmC with bivalent domains holds even after adjusting for the known relation between promoter CpG content and bivalency22 (Supplementary Fig. 5). Although 5mC at the TSS also correlates with lower gene expression in murine ES cells (Supplementary Fig. 4b), 5mC is not enriched at the promoters of genes with bivalent domains28 (Supplementary Fig. 4c), and genes with high levels of 5mC did not tend to be upregulated upon embryoid body differentiation (Supplementary Fig. 4d). Thus 5hmC is preferentially enriched at the promoters of genes with bivalent histone marks in ES cells, indicating that 5hmC may contribute functionally to the ‘poised’ but inactive state of these genes in ES cells.
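Whether two gene sets overlap more than is predicted by chance can be assessed with a one-sided hypergeometric test, sketched below. This is a generic illustration, not necessarily the statistical test used for the comparisons above; the function names, parameter names and the toy numbers are assumptions.

```python
from math import comb


def expected_overlap(total_genes: int, marked_genes: int, set_size: int) -> float:
    """Overlap expected by chance between a marked class and a gene set."""
    return set_size * marked_genes / total_genes


def hypergeom_enrichment_p(total_genes: int, marked_genes: int,
                           set_size: int, overlap: int) -> float:
    """One-sided hypergeometric P(X >= overlap).

    total_genes:  all genes considered.
    marked_genes: genes carrying a mark (e.g. a chromatin state of interest).
    set_size:     genes in the set being tested (e.g. genes with a
                  modification at their start sites).
    overlap:      genes in the set that also carry the mark.
    """
    upper = min(marked_genes, set_size)
    denominator = comb(total_genes, set_size)
    numerator = sum(
        comb(marked_genes, k) * comb(total_genes - marked_genes, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Toy numbers for illustration only; not values from this study.
    print(expected_overlap(20000, 2500, 4000))
    print(hypergeom_enrichment_p(20000, 2500, 4000, 1500))
```

A test of this kind does not by itself control for confounders such as promoter CpG content, which would need a stratified or regression-based comparison.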
[Figure: Properties of HERGs at transcription start sites]
Genes with 5hmC at their start sites are also disproportionately enriched in the set of genes whose promoters bind polycomb repressor complex (PRC) components, and in a majority of genes with the ‘H3K27me3 only’ mark. There is a statistically significant correlation between genes that had 5hmC at the TSS and genes that were upregulated upon small interfering RNA-mediated Tet1 depletion8 (and are therefore negatively regulated by Tet1), indicating that 5hmC in the promoter region has a negative role in the transcription of some genes in ES cells. Unlike 5mC, however, 5hmC is not substantially enriched at sites of heterochromatic H3K9 or H4K20 trimethylation22 (data not shown).
Collectively, our results support a model in which 5hmC and 5mC have different roles in transcription. Like 5mC28, 5hmC at promoters is predictive of lower levels of gene expression. However, 5hmC is uniquely associated with a ‘poised’ chromatin configuration and with genes that are upregulated upon differentiation, and may thus be involved in priming loci for rapid activation in response to appropriate signals. Activation of lineage-specific genetic loci upon differentiation could occur via a postulated 5mC ‘demethylation’ pathway (5mC to 5hmC to cytosine)1 or through recruitment of transcriptional regulators that specifically recognize 5hmC and are activated in response to differentiation signals. The ability to profile 5hmC even at sparsely hydroxymethylated loci will allow a careful evaluation of these possibilities in differentiating cells.