Although global patterns of DNA methylation in the CG context appeared very similar between ES cells and iPSCs, a comprehensive analysis of CG DNA methylation across all ES cell and iPSC lines identified 1,175 regions (CG-DMRs) that were differentially methylated in at least one iPSC or ES cell line (1% false discovery rate (FDR); Supplementary Table 2). In total, these regions comprised 1.68 Mb and ranged from 1 to 11 kb in length. Importantly, identification of CG-DMRs between the H1 and H9 ES cells with the same criteria (1% FDR) returned no results (see Supplementary Methods for details). Whereas mCG patterns within each category of cells (ES cell, iPSC, somatic) were generally consistent and distinct from those of the cells in each other category, individual cell lines showed some variability.
DNA methylation at CG islands proximal to gene promoters and transcriptional start sites is inhibitory to transcriptional activity (ref. 22). To address whether highly methylated CG islands in differentiated cells can be demethylated during iPSC reprogramming, we analysed CG-DMRs between the ES cells and somatic cells (1% FDR, twofold enrichment) that overlapped with CG islands. Of 2,278 CG-DMRs coincident with CG islands (CGI-DMRs), 1,904 and 374 were hypermethylated in ES cells and somatic cells, respectively. Of the 374 CGI-DMRs hypermethylated in somatic cells, 94% were hypomethylated in the iPSCs and were similar to ES cells (Supplementary Fig. 8). Of the 1,904 CGI-DMRs hypermethylated in ES cells, 83% were hypermethylated, similar to ES cells, in the iPSCs (Supplementary Fig. 9). Together, these results indicate that CG islands in iPSCs are predominantly reprogrammed to an ES-cell-like state and, in particular, that hypermethylated CG islands are not especially resistant to reprogramming.
CG-DMRs identified between iPSCs and ES cells can be categorized as either somatic ‘memory’, reflecting a failure to reprogram the progenitor somatic cell methylation pattern, or iPSC-specific DMRs (iDMRs) that are observed in neither the progenitor somatic cells nor the ES cells. A recent study reported the retention of somatic cell DNA methylation patterns in early-passage (passage 4) mouse iPSCs that was sufficient to distinguish between iPSC lines derived from different progenitor cell types, and which was subsequently attenuated after further passages (10–16 in total) (ref. 14). However, the iPSCs analysed here included relatively late-passage iPSC lines (15–65 passages; Supplementary Table 1), indicating that we are able to discriminate somatic DNA methylation patterns in iPSCs that are resistant to resetting to an ES-cell-like state. Comparison of iPSC lines to their respective progenitors revealed that 44–49% of CG-DMRs were aberrant with respect to ES cells (P value = 0.05) and reflected memory of the progenitor methylation state (Supplementary Fig. 10). Accordingly, 51–56% of the iPSC CG-DMRs could be classified as iDMRs, reflecting a methylation state dissimilar to the respective progenitor somatic cell and both ES cell lines (Supplementary Fig. 10).
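The memory/iDMR distinction described above can be illustrated with a minimal sketch. This is not the statistical test used in the study (see Supplementary Methods for that); it simply assumes each region is summarized by a mean methylation fraction between 0 and 1, and the function name `classify_cg_dmr` and the tolerance `tol` are hypothetical choices made for illustration only.

```python
from statistics import mean


def classify_cg_dmr(ipsc, progenitor, es_levels, tol=0.2):
    """Label one region from mean methylation fractions (0 to 1).

    ipsc, progenitor: methylation fraction in the iPSC line and its progenitor.
    es_levels: methylation fractions in the ES cell lines (e.g. H1 and H9).
    tol: hypothetical tolerance for calling two levels "similar";
         an illustrative assumption, not a value from the paper.
    """
    es = mean(es_levels)
    if abs(ipsc - es) <= tol:
        return "ES-like"   # reset to the ES cell state, not aberrant
    if abs(ipsc - progenitor) <= tol:
        return "memory"    # differs from ES cells, resembles the progenitor
    return "iDMR"          # differs from both progenitor and ES cells


# Toy values, not data from the study.
print(classify_cg_dmr(0.10, 0.15, [0.80, 0.90]))  # resembles progenitor only
print(classify_cg_dmr(0.90, 0.10, [0.10, 0.20]))  # resembles neither
```

The order of the checks matters in this sketch: a region is first tested against the ES cells, so only regions that remain aberrant are assigned to the memory or iDMR class.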
Inspection of the concordance of methylation states in the five iPSC lines showed that 69% of the CG-DMRs were aberrant with respect to the ES cells in at least two iPSC lines, with 11% being confirmed in all five iPSC lines (Supplementary Table 3). The majority of CG-DMRs (80%) occurred at CG islands, and to a lesser extent near or within genes (62%), with 29% and 19% located within 2 kb of transcriptional start and end sites, respectively. Analysis of biological processes attributed to genes proximal to CG-DMRs in each line or common to all iPSC lines did not identify any enrichment of specific processes, indicating that disruption of the normal regulation of these genes could affect many aspects of cellular function. Closer inspection of the CG-DMRs confirmed in all five iPSC lines revealed that the vast majority of them (119 of 130, or 92%) were hypomethylated in the iPSC lines, indicating that the general deficiency in resetting DNA methylation patterns during reprogramming is insufficient methylation. Notably, the remaining 11 CG-DMRs, which were hypermethylated in all iPSC lines, were iDMRs, as they are not differentially methylated in the progenitor cells compared to the ES cells. In addition, they were associated with transcriptional repression and the absence of the heterochromatic H3K27me3 histone modification, compared to H1 ES cells.
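The concordance figures above are counts of how many iPSC lines share each aberrant region. A minimal sketch of that tally follows, assuming each line is represented by the set of CG-DMR identifiers called aberrant in it; the function name `concordance` and the toy identifiers are hypothetical, and the numbers are not data from the study.

```python
from collections import Counter


def concordance(aberrant_by_line):
    """Return (fraction in >= 2 lines, fraction in all lines).

    aberrant_by_line: dict mapping an iPSC line name to the collection of
    CG-DMR identifiers called aberrant in that line (hypothetical input).
    """
    n_lines = len(aberrant_by_line)
    # Count, for every region, the number of lines in which it is aberrant.
    counts = Counter(
        dmr for dmrs in aberrant_by_line.values() for dmr in set(dmrs)
    )
    total = len(counts)
    if total == 0:
        raise ValueError("no CG-DMRs supplied")
    in_two_or_more = sum(1 for c in counts.values() if c >= 2)
    in_all = sum(1 for c in counts.values() if c == n_lines)
    return in_two_or_more / total, in_all / total


# Toy example with three lines and five regions.
toy = {"line_A": {1, 2, 3}, "line_B": {2, 3, 4}, "line_C": {3, 5}}
shared, core = concordance(toy)
print(f"in >= 2 lines: {shared:.0%}, in all lines: {core:.0%}")
```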
The genome sequences at the CG-DMRs present in all iPSC lines were analysed to identify motifs that could be associated with the altered DNA methylation states. Binding sites for two human transcription factors were identified in sequences conserved over the DMRs, corresponding to the reprogramming factor KLF4 and the chromatin-remodelling factor FOXL1 (Supplementary Fig. 11). Given that KLF4 has previously been found to bind to the promoter of FAM19A5 in H1 ES cells at precisely the same genomic position as one of the 11 hypermethylated iDMRs shared between all iPSC lines (ref. 18), it is tempting to speculate that development of the conserved aberrant methylation states in the iPSC lines may be related to altered expression of the endogenous and/or introduced copy of KLF4 during the reprogramming process.
By differentiating both H1 and FF-iPSC 19.11 cells into trophoblast lineage cells with BMP4, we were able to determine the frequency at which CG-DMRs in iPSCs were transmitted through differentiation. We identified 140 hypomethylated and 70 hypermethylated CG-DMRs present in both FF-iPSC 19.11 cells and FF-iPSC 19.11-BMP4 trophoblasts with respect to H1 and H9 ES cells, and H1-BMP4 trophoblasts. A high proportion of the CG-DMRs in FF-iPSC 19.11 cells relative to both ES cell lines were transmitted through the differentiation process, with 88% and 46% of hypermethylated and hypomethylated CG-DMRs, respectively, still present in FF-iPSC 19.11-BMP4 trophoblasts but not in H1-BMP4 trophoblasts. These transmitted CG-DMRs comprised both somatic memory (Supplementary Fig. 12) and iDMR (Supplementary Fig. 13) classes. Notably, 9 of 11 hypermethylated and 57 of 119 hypomethylated CG-DMRs present in all iPSC lines were transmitted to the FF-iPSC 19.11-BMP4 trophoblast cells.
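For readers who want the shared-region counts above as proportions, the arithmetic can be sketched as follows. Only the counts (9 of 11 and 57 of 119) come from the text; the helper name `percent` is hypothetical, and the resulting percentages are derived here rather than quoted from the study.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole)


# Counts quoted in the text for CG-DMRs present in all five iPSC lines
# that were transmitted to FF-iPSC 19.11-BMP4 trophoblasts.
shared = {"hypermethylated": (9, 11), "hypomethylated": (57, 119)}
for label, (transmitted, total) in shared.items():
    print(f"{label}: {transmitted}/{total} = {percent(transmitted, total)}%")
```

On these counts, roughly 82% of the shared hypermethylated and 48% of the shared hypomethylated CG-DMRs were transmitted.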
The 1,175 CG-DMRs identified between iPSCs and ES cells and the iPSC conserved CG-DMRs were profiled and confirmed in two previously reported ES cell DNA methylomes, HSF1 (ref. 23) and H9-Laurent (ref. 24) (Supplementary Fig. 14). Hierarchical clustering of the 1,175 CG-DMRs indicated that HSF1 and H9-Laurent ES cells are similar to H1 and H9. Lastly, we find that all of the iPSC hypermethylated CG-DMRs and 75% of the iPSC hypomethylated CG-DMRs are confirmed with respect to the two additional ES cell lines (P value < 0.05, as for H1 and H9).
Several conclusions can be drawn from this catalogue of CG-DMRs. First, reprogramming a somatic cell to a pluripotent state generates hundreds of aberrantly methylated loci, predominantly at CG islands and associated with genes. Second, whereas insufficient reprogramming manifested as a memory of the progenitor somatic cell methylation state is common, a high incidence of iDMRs unlike both the progenitor somatic cell and ES cells indicates that aberrant methylation patterns dissimilar to both the start and end points of the reprogramming process are frequently generated. Third, although there is variability in the loci that are differentially methylated between iPSC lines, a high proportion of CG-DMRs are found in multiple independent iPSC lines, indicating that these regions have a strong propensity to be insufficiently or aberrantly reprogrammed. Fourth, a core set of CG-DMRs was present in every iPSC line, representing hotspots of failed epigenomic reprogramming common to iPSCs. Fifth, both memory CG-DMRs and iDMRs are transmitted through differentiation of the iPSCs at a high frequency, indicating that the disrupted DNA methylation states are not simply a transient aberration during the pluripotent state. The identification of hundreds of CG-DMRs that cannot be erased by passaging and are frequently transmitted through cellular differentiation has immediate consequences for the derivation and use of iPSCs.