DNA methylation can be detected by sequencing genomic DNA that has been treated with sodium bisulphite
7. It has been impractical to apply bisulphite sequencing at a genome-wide scale because polymerase chain reaction (PCR)-based
8 and whole-genome shotgun
9 approaches are currently too inefficient for comparative analysis across multiple cell states in large mammalian genomes. However, reduced representations can be generated to sequence a defined fraction of a large genome
6,10. Computational analysis indicated that digesting mouse genomic DNA with the methylation-insensitive restriction enzyme MspI, selecting 40–220-base pair (bp) fragments, and performing 36-bp end-sequencing would cover ~1 million distinct CpG dinucleotides (4.8% of all CpGs), with roughly half located within ‘CpG islands’ (including sequences from 90% of all CpG islands) and the rest distributed between other relatively CpG-poor sequence features (
Supplementary Fig. 1 and Supplementary Table 1). Notably, although CpGs are not distributed uniformly in the genome, every MspI reduced representation bisulphite sequencing (RRBS) sequence read includes at least one informative CpG position (
Supplementary Fig. 2), making the approach highly efficient.
We validated high-throughout RRBS by sequencing MspI fragments from wild-type and methylation-deficient embryonic stem (ES) cells
6, using an Illumina Genome Analyser. We generated an initial set of ~21 million high quality, aligned RRBS reads. The reads from each cell type included ~97% of the predicted non-repetitive MspI fragments (12-fold and 8-fold median coverage, respectively). This demonstrates that RRBS library construction is relatively unbiased (
Supplementary Fig. 3) and is insensitive to genome-wide CpG methylation levels (estimated by nearest-neighbour analysis as 72% and 0.5%, respectively). Reads from both cell types showed near complete (>99%) bisulphite conversion of non-CpG cytosines.
To investigate cell-type-specific DNA methylation patterns, we generated 140 million additional RRBS reads (5.8 gigabase (Gb);
Supplementary Information) from ES-derived neural precursor cells (NPCs) and various primary cell populations (
Supplementary Table 2). We also generated new chromatin-state maps of H3 lysine 4 mono- and di-methylation (H3K4me1 and H3K4me2) from ES cells, NPCs and whole brain tissue (
Supplementary Table 3 and Supplementary Information), using chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq)
11.
The methylation levels of CpG dinucleotides in wild-type ES cells display a bimodal distribution (), with most being either ‘largely unmethylated’ (<20% of reads showing methylation) or ‘largely methylated’ (>80% of reads). As expected
2,8,12, CpGs in regions of high CpG density (>7% over 300 bp) tend to be unmethylated, whereas CpGs in low-density regions (<5%) tend to be methylated. However, we noted that ~10% of CpGs in low-density regions were unmethylated, whereas ~0.3% of CpGs in high-density regions were methylated. We found that DNA methylation patterns were better explained by histone methylation patterns than by CpG density. Because genomic features tend to be associated with distinct histone methylation patterns
11, we analysed these features separately.
High-CpG-density promoters (HCPs) are associated with two classes of genes: ubiquitous ‘housekeeping’ genes and highly regulated ‘key developmental’ genes
13. In ES cells, HCPs at housekeeping genes are enriched with the transcription initiation mark H3K4me3 (‘univalent’) and are generally highly expressed, whereas those at developmental genes are enriched with both H3K4me3 and the repressive mark H3K27me3 (‘bivalent’) and are generally silent
11,14. Both types of promoters are also enriched with H3K4me2, which is associated with an open chromatin confirmation. Out of the 10,299 HCPs sampled (on average, 19 distinct CpGs per promoter), we found that virtually all contain a core region of unmethylated CpGs, regardless of their level of expression or H3K27me3 enrichment ( and )
12,14,15.
Low-CpG-density promoters (LCPs) are generally associated with tissue-specific genes. In ES cells, a small subset of LCPs are enriched with H3K4me3 (~7%) or H3K4me2 (~3%), and essentially none are enriched with H3K27me3 (
ref. 11). We found that whereas most CpGs located in sampled LCPs (990 sites from 392 promoters) are methylated, those in LCPs enriched with H3K4me3 or H3K4me2 have significantly reduced methylation levels (
Supplementary Fig. 4).
Distal regulatory regions such as enhancers, silencers and boundary elements are often required to establish correct gene expression patterns in mammalian cells
16.
Cis-regulatory elements active in a particular cell type are often associated with markers of open chromatin, such as H3K4me2 or H3K4me1 (
refs 17, 18). We identified 25,051 sites of H3K4me2 enrichment in ES cells from 1 kb to >100 kb away from known promoters (most were also enriched with H3K4me1, but not with H3K4me3). CpGs sampled at H3K4me2-enriched sites (outside of promoters and CpG islands) had significantly lower methylation levels than those at unenriched sites (). This relationship was particularly strong for CpGs located in highly conserved non-coding elements (HCNEs; ).
Imprinting control regions (ICRs) are CpG-rich regulatory regions that display allele-specific histone and DNA methylation
19. Our RRBS library included sequences from 13 of ~20 known ICRs (on average, 13 distinct CpGs per ICR). CpGs within these elements display a unimodal distribution of methylation levels, with a median close to 50%, which is consistent with hypomethylation of the active allele marked with H3K4me3 and hypermethylation of the silenced allele marked with H3K9me3 ()
11.
Interspersed repeat families differ in their chromatin structure, with H3K9me3 enriched at active long terminal repeats (LTRs) and to a lesser extent at long interspersed elements (LINEs), but not at short interspersed elements (SINEs). Notably, CpGs located in LTRs and LINEs are generally hypermethylated, even in CpG-rich contexts (). In contrast, CpGs in SINEs show a correlation between methylation levels and CpG density that is comparable to non-repetitive sequences.
We conclude that in ES cells the presence of H3K4 methylation and the absence of H3K9 methylation are better predictors of unmethylated CpGs than sequence context alone. This is consistent with models in which
de novo methyl-transferases either specifically recognize sites with unmethylated H3K4 (
ref. 20) or are excluded by H3K4 methylation or associated factors. Similarly, H3K9me3 or associated factors may recruit methyl-transferases at ICRs and repetitive elements
21.
We next used RRBS to analyse how DNA methylation patterns change when ES cells are differentiated
in vitro into a homogeneous population of NPCs (
Supplementary Fig. 4)
22. Whereas CpG methylation levels are highly correlated between the two cell types (rho = 0.81), there were clear differences: ~8% of CpGs unmethylated in ES cells became largely methylated in NPCs, whereas ~2% of CpGs methylated in ES cells became unmethylated; these changes were strongly correlated with changes in histone methylation patterns.
At both univalent and bivalent HCPs, we found that most CpGs remained unmethylated on differentiation, particularly within their core CpG island, but that loss of H3K4me3 and retention of H3K4me2 or H3K27me3 correlated with a partial increase in DNA methylation levels (median, ~25%; 2.9% and 32% of univalent and bivalent HCPs, respectively) and complete loss of H3K4 and H3K27 methylation correlated with DNA hypermethylation (median, ~75%; 2.8% and 16% of univalent and bivalent HCPs, respectively; ).
Most LCPs marked by H3K4 methylation in ES cells lose this mark in NPCs; however, LCPs associated with genes expressed in NPCs gain this mark. Loss or gain of H3K4 methylation is a strong predictor of inverse changes in CpG methylation levels at these promoters (
Supplementary Fig. 5).
Our chromatin-state maps revealed that 18,899 (75%) of putative distal regulatory elements enriched with H3K4me2 in ES cells lost this mark in NPCs, whereas 20,088 new H3K4me2 sites appeared, often in HCNE-rich regions surrounding activated developmental genes (). Loss or gain of H3K4 methylation were again inversely correlated with CpG methylation levels (). In fact, these regions account for most observed de-methylation events. The presence of H3K27me3 alone did not correlate with lower methylation levels in CpG-poor regions (
Supplementary Fig. 6).
The data support the notion that CpG-rich and -poor regulatory elements undergo distinct modes of epigenetic regulation
2,11,12. Most (>95%) HCPs seem to be constitutively unmethylated and regulated by trithorax-group (trxG; associated with H3K4me3) and/or Polycomb-group (PcG; associated with H3K27me3) proteins, which may be recruited in part by means of non-specific unmethylated-CpG binding domains
23. Hypermethylation of these CpG-dense regions leads to exclusion of trxG/PcG activity, heterochromatin formation and essentially irreversible gene silencing
2. In contrast, regulatory elements in CpG-poor sequence contexts seem to undergo extensive and dynamic methylation and de-methylation. Hence, methylation of isolated CpGs may contribute to chromatin condensation or directly interfere with transcription factor binding
2, but does not necessarily prevent chromatin remodelling in response to activating signals.
As noted above, a small set of HCPs (
n = 252; ~3%) became hypermethylated (>75% mean methylation across sampled CpGs) on
in vitro differentiation of ES cells to NPCs. To investigate whether the observed pattern reflects an
in vivo regulatory mechanism, we isolated NPCs from embryonic day (E)13.5 embryos and differentiated them into glial fibrillary acidic protein (Gfap)-positive astrocytes (with no more than two passages
in vitro). We similarly differentiated the
in vitro-derived NPCs into astrocytes (with these cells having undergone at least 18 passages;
Supplementary Fig. 4), and compared the two populations using RRBS ().
The methylation levels of CpGs were highly correlated (rho = 0.85), but astrocytes obtained from
in vivo NPCs displayed substantially less HCP hypermethylation than those obtained from ES cells (). The
in vivo-derived astrocytes showed hypermethylation at only 30 HCPs, largely associated with germline-specific genes (including
Dazl, Hormad1, Sycp1, Sycp2 and
Taf7l), several of which also showed partial methylation in ES cells. In contrast, the
in vitro-derived astrocytes showed hypermethylation of these and ~305 additional HCPs. This set includes some genes known to be expressed by at least some
in vivo astrocytes (including
Isyna1, Gsn and
Cldn5;
ref. 24) but that were silent in the ES-cell-derived astrocytes (
Supplementary Information). However, the hypermethylated HCPs are significantly enriched for genes not expressed in NPCs or in the astrocyte lineage (
Supplementary Tables 4–7). They include genes involved in development and differentiation of neuronal (
Lhx8, Lhx9, Moxd1, Htr1f and
Slit1), ependymal (
Otx2 and
Kl) and unrelated lineages (including
Myod1, Dhh and
Nkx3-1). In fact, we found that ‘key developmental’ HCPs that are bivalent in ES cells are six times more likely to be included in the hypermethylated set compared to univalent HCPs. Moreover, univalent genes in the hypermethylated set are expressed at significantly lower levels in both ES cells and primary astrocytes, compared to those that remained hypomethylated (). We also found that the hypermethylated HCPs tend to have a ~15% lower CpG density ().
To investigate further the differences between
in vitro and
in vivo cell populations, we analysed whole brain tissue (representing cells of mainly glial origin). Virtually all (>99%) of sampled HCPs were unmethylated () and enriched with H3K4me3 and/or H3K27me3 (
Supplementary Fig. 7), with ~20 germline-specific HCPs being the only clear exceptions. RRBS libraries from other
in vivo sources (T cells, B cells, spleen, lung, liver and fibroblasts) also showed few hypermethylated HCPs (
Supplementary Fig. 8). This suggests that—apart from silencing germline-specific
12, imprinted and X-inactivated (
Supplementary Fig. 9) genes in somatic tissues—hypermethylation of HCPs is not a major mechanism of developmental regulation
in vivo.
To test for a correlation between passage number and HCP hypermethylation, we examined independently derived
in vitro NPCs collected after only 9 passages. These cells displayed hypermethylation at approximately half of the HCPs that are hypermethylated in the NPCs after 18 passages (). To reduce time in culture further, we used Sox1–GFP (green fluorescent protein) ES cells
25 to isolate very early NPCs. These cells initially displayed virtually no HCP hypermethylation. However, after continued culturing they acquired hypermethylation at many of the same HCPs as the previous NPC populations (
Supplementary Fig. 8). Finally, we grew the
in vivo-derived NPCs for 11 passages
in vitro, differentiated them into astrocytes and then examined the methylation pattern. Notably, these cells had also begun to acquire hypermethylation at a largely similar set of HCPs ().
These results show that independently derived NPC populations from both in vitro and in vivo sources and different genetic backgrounds reproducibly undergo gradual hypermethylation at a characteristic set of HCPs. These observations have several implications.
First, aberrant epigenetic regulation in culture has raised concern over the accuracy of cellular models generated by
in vitro differentiation or manipulation
26–28. Both primary and transformed cell lines, including ES-derived NPC populations, tend to lose developmental potency after continued proliferation in culture
26,29. Susceptibility to hypermethylation at key regulatory genes that are normally activated on differentiation could explain this phenomenon. Second, malignant cells are often found to harbour hypermethylated CpG islands
4,5. Recently, genes known to undergo frequent hypermethylation in adult cancers were noted to be significantly enriched for genes with bivalent promoters in ES cells (reviewed in
ref. 30). The similarities between hypermethylation in culture and in cancer may provide a useful
in vitro model for studying a common underlying mechanism. Finally, the gradual hypermethylation of ‘weak’ HCPs hints at underlying kinetics. Because H3K4 methylases are targeted, at least in part, by non-specific CpG-binding domains
23, such HCPs may be particularly sensitive to imbalanced chromatin-modifying factors or other cancer- or culture-related perturbations.
More generally, RRBS makes it feasible to perform genome-scale bisulphite sequencing on large-mammalian genomes, providing a valuable tool for epigenetic profiling of cell populations. As sequencing capacity increases, genome coverage can be readily scaled in step by adding restriction enzymes, increasing the selected size range or using hybridization-based reduced representation strategies.