Bisulfite sequencing has been broadly used to analyze the genomic distribution and abundance of 5mC (Bernstein et al., 2007; Clark et al., 1994; Lister et al., 2008; Meissner, 2010; Pelizzola and Ecker, 2011). However, because traditional bisulfite sequencing cannot distinguish 5mC from 5hmC, results from such approaches cannot yet accurately reveal 5mC abundance (Huang et al., 2010; Jin et al., 2010). Recent experiments show that 5hmC is widespread in the mammalian genome, and at least two functions have been proposed for this cytosine modification: 1) 5hmC serves as an intermediate in the process of DNA demethylation, either passively (Inoue and Zhang, 2011) or actively through further oxidation (He et al., 2011; Ito et al., 2011; Maiti and Drohat, 2011; Zhang et al., 2012); 2) 5hmC may be recognized by chromatin factors (Frauer et al., 2011; Yildirim et al., 2011), and its presence could reduce binding of certain methyl-CpG-binding proteins (Hashimoto et al., 2012; Kriaucionis and Heintz, 2009; Valinluck et al., 2004). These functions imply two opposing notions about the relative stability of 5hmC at distinct genomic loci. As a first step toward understanding the molecular mechanisms associated with 5hmC function, it is important not only to locate 5hmC precisely in the genome, but also to determine its relative abundance at each modified site. Here we describe a modified bisulfite sequencing method that, when combined with traditional bisulfite sequencing, can determine the location of 5hmC at single-base resolution and quantitatively assess the abundance of 5mC and 5hmC at each modified cytosine.
Using synthetic model DNA, we demonstrated that coupling βGT-mediated protection of 5hmC with mTet1-based oxidation of 5mC allows 5hmC to be distinguished from unmodified cytosine and 5mC by sequencing. 5fC and 5caC present in the original genomic DNA do not interfere with TAB-Seq since they behave like unmodified cytosine under bisulfite treatment (He et al., 2011). We also used this method to examine previously reported 5hmC-enriched loci and successfully identified genuine 5hmC sites. These results show the general utility of TAB-Seq for assessing 5hmC in a locus-specific manner, much as traditional bisulfite sequencing is currently used.
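The logic of combining the two assays can be illustrated with a minimal sketch. It assumes idealized chemistry (complete bisulfite conversion, complete protection of 5hmC, and complete oxidation of 5mC), which real data do not satisfy; the names `READOUT` and `estimate_levels` are hypothetical and are not part of any published pipeline. Under these assumptions, traditional bisulfite sequencing reports 5mC + 5hmC, TAB-Seq reports 5hmC alone, and the 5mC level follows by subtraction.

```python
# Idealized read-out of each cytosine form after conversion:
# "C" means the base is still read as C, "T" means it is read as T.
# 5fC and 5caC behave like unmodified cytosine in both assays.
READOUT = {
    "bisulfite": {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "T", "5caC": "T"},
    "tab_seq":   {"C": "T", "5mC": "T", "5hmC": "C", "5fC": "T", "5caC": "T"},
}


def estimate_levels(bs_c, bs_total, tab_c, tab_total):
    """Return (5mC level, 5hmC level) at one cytosine.

    bs_c / bs_total   : reads called C / all reads, traditional bisulfite
    tab_c / tab_total : reads called C / all reads, TAB-Seq
    Assumes perfect conversion, protection, and oxidation (a simplification).
    """
    if bs_total <= 0 or tab_total <= 0:
        raise ValueError("read depth must be positive in both assays")
    bs_level = bs_c / bs_total        # estimates 5mC + 5hmC
    hmc_level = tab_c / tab_total     # estimates 5hmC only
    # Sampling noise can push the difference below zero, so clamp at 0.
    mc_level = max(bs_level - hmc_level, 0.0)
    return mc_level, hmc_level


# Hypothetical counts: 75/100 C reads in bisulfite, 25/100 in TAB-Seq.
mc, hmc = estimate_levels(bs_c=75, bs_total=100, tab_c=25, tab_total=100)
print(mc, hmc)
```

A practical analysis would also need to correct for non-conversion and protection efficiencies measured on spike-in controls; the subtraction here only shows the principle.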
We applied this technique to mammalian genomes by generating single-base resolution maps of 5hmC in human and mouse ESCs. We show that these maps agree well with previous maps generated using affinity-based 5hmC profiling. Importantly, these single-base maps also revealed a significant number of new 5hmC sites. Analyses of the two 5hmC maps in ESCs identified several previously unknown sequence-based characteristics of 5hmC. We observed that, much like 5mC, 5hmC tends to occur primarily at CpG dinucleotides yet, unlike 5mC, exhibits an asymmetric strand bias. We also observed a relatively strong local sequence preference surrounding 5hmC, with 5hmC occurring within a G-rich context. This observation is consistent with a previous report that 5hmC regions are GC-skewed (Stroud et al., 2011). These sequence-based features associated with 5hmC may provide a basis for future mechanistic insight into the means by which 5hmC is deposited, recognized, and dynamically regulated.
The ability to quantify 5hmC abundance at base resolution offered a unique opportunity to assess its relative abundance at various regulatory elements and genomic annotations without bias. In contrast to the nearly uniform distribution of 5mC outside of promoter regions, we found that the abundance of 5hmC varies among different classes of functional sequences. It is most enriched at distal regulatory regions, where levels of 5mC are correspondingly lower than the genome average. This observation agrees with recent findings from others (Stadler et al., 2011), and suggests that active demethylation occurs at active regulatory elements through 5hmC. This active demethylation is distributed around, but not within, transcription factor consensus motifs. Supporting the notion of active demethylation, total DNA methylation exhibits a strong negative correlation with 5hmC at distal regulatory elements (Spearman correlation = −0.30). One interesting observation about these distal cis-regulatory elements is that 5hmC and 5mC often occur together at the same cytosine. Currently, the exact mechanisms that determine the dynamics of 5hmC and 5mC at these cis-regulatory sequences are unclear.
Previous affinity-based studies have suggested enrichment of 5hmC at CpG-rich transcription start sites. However, these observations relied heavily on antibody-based detection, which has been shown to exhibit bias toward 5hmC-dense regions. Here we find that, in general, 5hmC is most abundant at regions of low CpG content. Furthermore, even promoters with relatively high 5hmC content tend to have low CpG content in both mouse and human ESCs. These findings highlight the utility of a base-resolution method for measuring 5hmC abundance, and provide new insight into its dynamic regulation at promoter sites with distinct CpG content.
Tahiliani and colleagues (Tahiliani et al., 2009) recently estimated the genome-wide abundance of 5hmC to be about 14 times less than that of 5mC, which would correspond to ~4.4 million 5hmCs in human. However, as our results indicate that the base-level abundance of 5hmC is several times lower than that of 5mC, this is likely an underestimate. The comparatively low number of 5hmCs confidently detected in our study (691,414) is likely explained by the frequent hydroxymethylation of gene bodies previously observed in affinity-based studies (Ficz et al., 2011; Pastor et al., 2011; Stroud et al., 2011; Szulwach et al., 2011a; Williams et al., 2011; Wu et al., 2011; Xu et al., 2011). Since genic cytosines likely carry 5hmC at a relatively low abundance (3–4%), they would have escaped detection at our current sequencing depth. To resolve low-abundance 5hmCs at single-base precision, significantly more sequencing would be required. This observation highlights the biases inherent in affinity-based 5hmC mapping, which can amplify frequent weak signals found in gene bodies to overshadow rare but stronger ones at distal regulatory elements.
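Why a 3–4% site escapes detection at modest depth can be sketched with a simple binomial model. The following is an illustration under stated assumptions, not the statistical procedure used in this study: the background rate of 2% (reads called C at a site with no 5hmC), the significance cutoff of 0.01, the example depths, and the function names are all hypothetical. A site is called when its count of C reads exceeds what background alone would rarely produce, and the power is the probability that a site with the given true abundance reaches that count.

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def detection_power(depth, abundance, background=0.02, alpha=0.01):
    """Probability of calling a site with the given true 5hmC abundance.

    depth      : number of reads covering the cytosine
    abundance  : true fraction of molecules carrying 5hmC at the site
    background : assumed rate of C calls at sites without 5hmC (hypothetical)
    alpha      : assumed per-site significance cutoff (hypothetical)
    """
    # Smallest count of C reads that background alone reaches with
    # probability below alpha. k = depth + 1 always satisfies this,
    # so the loop is guaranteed to stop.
    for threshold in range(depth + 2):
        if binom_sf(threshold, depth, background) < alpha:
            break
    return binom_sf(threshold, depth, abundance)


# Compare a low-abundance genic site with a higher-abundance site
# at two hypothetical sequencing depths.
for depth in (25, 200):
    for abundance in (0.035, 0.20):
        power = detection_power(depth, abundance)
        print(f"depth={depth:4d} abundance={abundance:.3f} power={power:.3f}")
```

Under this model, power at a few percent abundance stays low until depth becomes large, which matches the qualitative argument that substantially more sequencing would be needed to resolve such sites.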
In summary, we have developed a genome-wide approach to determine 5hmC distribution at base resolution, and generated the first base-resolution maps of 5hmC in both human and mouse ESCs. These maps provide a template for further understanding the biological roles of 5hmC in stem cells as well as in gene regulation in general. In conjunction with methylC-Seq, the TAB-Seq method described here represents a general approach to measuring the absolute abundance of 5mC and 5hmC at specific sites or genome-wide, and could be widely applied to various cell types and tissues.