A report in this issue of Genome Biology by Stroud and colleagues [3] represents one of the recent efforts to map the localization of 5hmCs throughout the genome in embryonic stem cells and related cell types. To map 5hmCs in human embryonic stem cells (hESCs), Stroud et al. enriched genomic fragments containing 5hmCs by immunoprecipitation with 5hmC-specific antibodies, followed by Illumina massively parallel sequencing and quantification of enrichment based on read depth. To rule out artifacts due to non-specific antibody binding, they generated two data sets with different commercial antibodies, and the two data sets appeared highly consistent.
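As a rough illustration of what "quantification of enrichment based on read depth" and a consistency check between two antibody data sets can look like, the sketch below computes a depth-normalized log2 enrichment for a region and a Pearson correlation between two sets of per-region values. It is not the authors' pipeline: the use of a control library, the pseudocount, and the function names `log2_enrichment` and `pearson` are all assumptions made for illustration.

```python
import math

def log2_enrichment(ip_reads, control_reads, ip_total, control_total, pseudocount=1.0):
    """Log2 ratio of depth-normalized IP reads to control reads for one region.

    The control library and the pseudocount are assumptions of this sketch.
    """
    ip_rpm = (ip_reads + pseudocount) / ip_total * 1e6
    control_rpm = (control_reads + pseudocount) / control_total * 1e6
    return math.log2(ip_rpm / control_rpm)

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of per-region values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Applied to the same regions in two antibody data sets, a correlation close to 1 would correspond to the kind of consistency described above.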
With this map, Stroud and colleagues showed that 5hmCs tend to associate with genic regions, including both promoters and gene bodies (particularly exons). In intergenic regions, 5hmCs co-localize with enhancers marked by the activating histone modifications H3K4me1 and H3K27ac. Importantly, enhancers enriched for 5hmCs appear to associate strongly with hESC-specific genes, suggesting a role for 5hmCs in gene regulation through enhancers. In addition to enhancers, other DNA-protein interaction regions, in particular binding sites for transcription factors such as NANOG and OCT4, have also been found to be enriched for 5hmCs. This suggests a potential secondary regulatory mechanism by 5hmCs through the blocking of DNMT1, a methyltransferase that generates 5mC, and MeCP2, a transcriptional repressor that binds to methylated promoters. Such a mechanism would ensure that no 5mC is present to interfere with enhancer function or transcription factor binding. Finally, Stroud et al. reported an interesting observation of GC skewness in 5hmC-enriched regions, wherein G residues are enriched over C residues from the 5' ends of the regions, and C residues are enriched over G residues from the 3' ends, although the functional roles of such GC skewness remain elusive.
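The GC skewness mentioned above is commonly quantified as (G - C) / (G + C), which is positive where G outnumbers C and negative where C outnumbers G. The sketch below applies that standard definition to the 5' and 3' halves of a sequence; splitting a region into halves and the function names are assumptions for illustration, not the analysis used by Stroud et al.

```python
def gc_skew(seq):
    """Return (G - C) / (G + C) for a DNA sequence, or 0.0 if it has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)

def half_skews(seq):
    """Return the GC skew of the 5' half and of the 3' half of a sequence."""
    mid = len(seq) // 2
    return gc_skew(seq[:mid]), gc_skew(seq[mid:])

# Toy sequence, not real data: G-rich at the 5' end, C-rich at the 3' end.
five_prime, three_prime = half_skews("GGGCATCCCG")
```

A region showing the reported pattern would give a positive value for the 5' half and a negative value for the 3' half, as the toy sequence does.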
Four other recent studies on mouse embryonic stem cells (mESCs) have revealed a very similar distribution of 5hmCs [4]. Notably, Pastor et al. developed two novel methods for the genome-wide mapping of 5hmCs. One of these methods, called GLIB (glucosylation, periodate oxidation, biotinylation), uses three enzymatic and chemical reactions to label 5hmCs with biotin, followed by pull-down with streptavidin-coated magnetic beads and direct single-molecule sequencing on a HeliScope. Compared with affinity pull-down by antibodies, GLIB seems to have lower background noise and no bias towards CpG-dense regions. Single-molecule sequencing also eliminates any potential artifact due to bias in PCR amplification. The second novel method is similar to the methods used by the other three groups [3] in that 5hmC-containing DNA fragments are enriched by antibodies, coupled with massively parallel sequencing. However, one unique aspect of the second method is that genomic DNA is first treated with bisulfite to convert 5hmC into cytosine 5-methylenesulfonate prior to immunoprecipitation. Since the sulfonate group is larger than the hydroxyl group, antibody binding could be more specific and less dependent on CpG density.
Despite the technical differences between these mapping methods, the five studies reached very similar conclusions about the distribution of 5hmCs at the genome level. They reported that 5hmC, similar to 5mC, is enriched in promoters and gene bodies in hESCs and mESCs. In addition, 5hmCs are preferentially present in promoters that have been found to be repressive toward transcription of the associated genes [3]. Interestingly, while 5mC is distributed mainly 3' of transcriptional start sites, 5hmC is distributed symmetrically 5' and 3' of transcriptional start sites [4]. The majority of high-5hmC promoters are marked by the H3K4me3 activation mark alone, and smaller percentages of such promoters co-localize with the repressive bivalent H3K4me3 and H3K27me3 marks. However, when normalized by the total number of genes carrying these histone marks, 5hmCs are actually enriched in H3K4me3 and H3K27me3 bivalent regions [3]. This led to the hypothesis that 5hmC marks genes that may be poised for transcription upon differentiation. Interestingly, genes associated with 5hmC in the gene bodies are actively transcribed in mESCs and mouse cerebellum [4]. Xu et al. and Wu et al. noted that the correlation of 5hmC levels with gene expression is stronger at the 3' ends of genes and also for genes associated with Tet1.
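The point made above about normalizing by the total number of genes carrying each histone mark can be illustrated with simple arithmetic: a class can hold most of the high-5hmC promoters and still have the lower rate once its size is taken into account. The counts below are invented purely for illustration and are not taken from any of the studies; the function name `share_and_rate` is likewise hypothetical.

```python
def share_and_rate(high_5hmc, total):
    """For each promoter class, return a pair:
    (share of all high-5hmC promoters, fraction of the class that is high-5hmC)."""
    all_high = sum(high_5hmc.values())
    return {
        cls: (high_5hmc[cls] / all_high, high_5hmc[cls] / total[cls])
        for cls in high_5hmc
    }

# Invented counts, for illustration only (not data from the cited studies).
high = {"H3K4me3_only": 600, "bivalent": 300}
total = {"H3K4me3_only": 6000, "bivalent": 1000}
result = share_and_rate(high, total)
```

With these made-up numbers, H3K4me3-only promoters account for two thirds of the high-5hmC promoters, yet only 10% of that class is high in 5hmC, compared with 30% of the bivalent class.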