Induced pluripotent stem (iPS) cells can be derived from somatic cells by introduction of a small number of genes: for example, POU5F1. As direct derivatives of an individual’s own tissue, iPS cells offer considerable therapeutic promise (ref. 5), avoiding both immunologic and ethical barriers to their use. iPS cells differ from their somatic parental cells epigenetically, and thus a comprehensive comparison of the epigenome in iPS and somatic cells would provide insight into the mechanism of tissue reprogramming. Although two recent targeted studies (refs. 6,7) examined a subset of the genome, 7,000 (ref. 6) and 66,000 (ref. 7) CpG sites, in a small cohort of three iPS-fibroblast pairs, a global assessment of genome-wide methylation has not yet been performed.
Recently, we described differential methylation patterns that distinguish among normal tissue types (T-DMRs) and patterns that can segregate colorectal cancer tissue from matched normal tissues (C-DMRs) (ref. 8). Unexpectedly, both types of DMR occur 13-fold more frequently at CpG island ‘shores’, regions of comparatively low CpG density that are located near traditional CpG islands, than at the CpG islands themselves. Cancers showed approximately equal numbers of hypomethylated and hypermethylated regions, and 45% of C-DMRs overlapped T-DMRs, suggesting that epigenetic changes in cancer involve reprogramming of the normal pattern of tissue-specific differentiation (ref. 8).
Here we applied a similar approach to the question of iPS cell reprogramming, first comparing six human iPS cell lines to the fibroblasts from which they were derived using comprehensive high-throughput array-based relative methylation (CHARM) analysis (ref. 9). This approach allows the interrogation of ~4.6 million CpG sites genome-wide using a custom-designed NimbleGen HD2 microarray, including almost all CpG islands and shores in the human genome. Genomic DNA from iPS cells (refs. 3,5), their parental fibroblasts and human embryonic stem (hES) cells (Online Methods) was digested with the enzyme McrBC, fractionated, labeled and hybridized to a CHARM array.
A total of 4,401 regions (including 96,404 CpG sites) were found to differ in iPS cell lines from the fibroblasts of origin (Supplementary Table 1) at a false discovery rate (FDR) of 5%; we term these regions R-DMRs. Among these R-DMRs, regions that were hypermethylated in iPS cells compared to fibroblasts predominated over hypomethylated regions (60%:40%). Of the 4,401 R-DMRs, 1,969 were within 2 kb of the transcriptional start site of a gene.
Differentially methylated regions (DMRs) found by CHARM that overlap with tissue-specific differentially methylated regions (T-DMRs)
The genes that were associated with these R-DMRs showed functionally important features based on bioinformatic analyses. First, gene ontology (GO) annotation analysis of these genes revealed significant enrichment for genes involved in developmental and regulatory processes (Supplementary Table 2). For example, 38% of the genes that were hypomethylated in iPS cells compared to fibroblasts (P = 3.56 × 10^-60) and 22% of the genes that were hypermethylated in iPS cells compared to fibroblasts (P = 1.73 × 10^-12) were involved in developmental processes. To further elucidate the functional significance of these R-DMRs, we looked at their overlap with bivalent domains, which mark developmental genes in embryonic stem (ES) cells (refs. 10,11). Notably, 65% of the R-DMRs that were hypomethylated in iPS cells compared to fibroblasts overlapped bivalent domain marks, a significant association (P < 0.0001 by 10,000 permutations), whereas only 18.6% of hypermethylated R-DMRs overlapped with these domains (P = 0.5699 by 10,000 permutations) (Supplementary Table 3). Furthermore, when we examined the overlap of the R-DMRs with known binding sites for pluripotency markers such as POU5F1, we saw a similar relationship, in which the hypomethylated R-DMRs showed significant overlap (P < 0.0001 by 10,000 permutations) whereas the hypermethylated R-DMRs did not (P = 1 by 10,000 permutations; Supplementary Table 4). These observations indicate that the sites of demethylation during reprogramming of fibroblasts to iPS cells are tightly linked to genes that are functionally important for pluripotency.
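The permutation P values quoted for these overlaps can be illustrated with a minimal sketch. The code below is not the analysis pipeline used in this study. It assumes a single linear chromosome, half-open intervals, and a null model in which the query regions are relocated uniformly at random while keeping their lengths. The function names `count_overlapping` and `permutation_p_value` and all coordinates are hypothetical. Under this scheme, 10,000 permutations in which no permuted set reaches the observed overlap would give an estimate below 0.0001.

```python
import bisect
import random


def count_overlapping(queries, targets):
    """Count query intervals that overlap at least one target interval.

    Intervals are half-open (start, end) pairs. `targets` must be sorted by
    start and must not overlap each other.
    """
    starts = [s for s, _ in targets]
    hits = 0
    for q_start, q_end in queries:
        # Targets at indices < i start before the query ends. Because the
        # targets are sorted and disjoint, the last of them has the largest
        # end, so it is the only one that needs checking.
        i = bisect.bisect_left(starts, q_end)
        if i > 0 and targets[i - 1][1] > q_start:
            hits += 1
    return hits


def permutation_p_value(queries, targets, genome_length, n_perm=1000, seed=0):
    """Return (observed overlap count, one-sided permutation P value)."""
    rng = random.Random(seed)
    observed = count_overlapping(queries, targets)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in queries:
            length = end - start
            # Assumed null model: uniform relocation, length preserved.
            new_start = rng.randrange(0, genome_length - length + 1)
            shuffled.append((new_start, new_start + length))
        if count_overlapping(shuffled, targets) >= observed:
            at_least_as_extreme += 1
    # The +1 terms keep the estimate strictly above zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy coordinates, not data from the study.
    toy_targets = [(100, 200), (1000, 1100)]
    toy_queries = [(150, 160), (1050, 1060), (5000, 5010)]
    obs, p = permutation_p_value(toy_queries, toy_targets, 100_000)
    print(f"observed overlaps: {obs}, permutation P: {p:.4f}")
```

A null model that draws random regions only from the array design, rather than from the whole genome, would be a closer match to an array-based analysis; the uniform relocation used here is chosen only for brevity.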
The R-DMRs showed several noteworthy features. First, over 70% of the R-DMRs were associated with CpG island shores rather than with the associated CpG islands, regardless of whether the R-DMRs were hypermethylated or hypomethylated in iPS cells relative to fibroblasts (Supplementary Fig. 1a). Second, 56% of R-DMRs overlapped T-DMRs previously identified as distinguishing tissues representing the three germ layers, namely, brain, liver and spleen (ref. 8). This overlap was statistically significant (P < 0.0001 by 10,000 permutations). Furthermore, both hypermethylated and hypomethylated R-DMRs in iPS cells showed similar overlap with known T-DMRs, at 54% and 60%, respectively. There was also a 61% overlap of the gene-proximal R-DMRs with the T-DMRs. Thus, R-DMRs are heavily enriched in CpG island shores and largely overlap T-DMRs that are involved in normal development.
Figure 1 Reprogramming differentially methylated regions (R-DMRs). (a) Enrichment of R-DMRs at CpG island shores. The CHARM array (left, labeled CpG regions) is enriched in CpG islands, and the R-DMRs (right, labeled R-DMR) show marked enrichment at CpG island ...
We then repeated the CHARM analysis on a separate set of three iPS cell lines and the fibroblasts from which they were derived, as well as three human ES cell lines. Because an FDR could not be estimated from this smaller number of lines, we used an area cutoff on the curves that corresponded in magnitude to the 5% FDR cutoff of the previous experiment. In this second analysis, 2,179 R-DMRs were identified, with a slight excess of hypomethylated versus hypermethylated DMRs (55% compared to 45%) in iPS cells. Notably, 80% of the DMRs overlapped those found in the first experiment (see Supplementary Table 5 for the full list). As in the first analysis, there was a substantial enrichment for CpG island shores (78%, Supplementary Fig. 1b), and 60% of the R-DMRs overlapped T-DMRs.
This second analysis provided insight into the methylome of iPS cells as compared to ES cells. Although the two cell types had very similar DNA methylation, 71 DMRs distinguished them, with 51 showing hypermethylation and 20 showing hypomethylation in iPS cells (Supplementary Table 6). GO annotation of these DMRs showed significant enrichment of developmental processes in the genes that were hypermethylated in iPS cells as compared to ES cells (Supplementary Table 7). Of the DMRs that distinguish iPS cells from ES cells, 32 were near genes of interest, including HOXA9 and two genes that encode zinc finger proteins, among them ZNF568. In some cases, the methylation in iPS cells was intermediate between that of differentiated fibroblasts and that of ES cells; this was true, for example, of TBX5, which encodes a transcription factor that is involved in cardiac and limb development. In other cases, methylation in iPS cells differed from that of both fibroblasts and ES cells, suggesting that the iPS cells occupy a distinct and possibly aberrant epigenetic state. An example was PTPRT, encoding a protein tyrosine phosphatase involved in many cellular processes including differentiation. For some ES-iPS differences, the methylation levels changed in the same direction as for ES cells compared to fibroblasts, but to a greater degree; for example, methylation of the homeobox gene HOXA9 was greater in iPS cells than in ES cells, whose methylation at this gene was in turn greater than that in fibroblasts.
We validated these data in two ways. First, we verified the methylation results from CHARM by bisulfite pyrosequencing of nine DMRs, examining 2–6 CpGs within each DMR. For all of these DMRs, the bisulfite pyrosequencing data confirmed the differential methylation data from CHARM (Supplementary Fig. 2).
Second, we performed global gene expression analysis using the Affymetrix HGU133 Plus 2.0 microarray. There was a strong inverse correlation between differential gene expression and differential DNA methylation at R-DMRs that are within 500 bp of the transcriptional start site (TSS) of a gene, for both hypermethylation and hypomethylation (Supplementary Fig. 3a, Supplementary Table 8). The significant association held true even when the R-DMR was within 1 kb of a TSS, for both hypermethylated (P = 0.01) and hypomethylated R-DMRs (Supplementary Fig. 3b). Moreover, this correlation was enhanced in DMRs that were in CpG island shores.
Furthermore, we performed an unsupervised cluster analysis using the R-DMRs to determine to what degree the methylation at these locations distinguished normal brain, liver and spleen from each other. Notably, there was complete separation of these three tissues, indicating that the sites of the methylation changes that occur during reprogramming normally distinguish these disparate tissues. In addition, the R-DMRs could largely distinguish normal colonic mucosa from colorectal cancer, indicating that the R-DMRs are also involved in abnormal reprogramming in cancer. As a test of significance, we generated 1,000 random lists of CHARM array regions matched to the R-DMRs in length and number. None of these lists clustered the tissues as well as the R-DMRs by either of two criteria: a median Euclidean distance among samples of the same tissue type at least as low as that obtained with the R-DMRs, or a median Euclidean distance among samples of different tissue types at least as great as that obtained with the R-DMRs. This was true both for the comparison between normal tissues and for the cancer-to-normal-tissue comparison.
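The significance test described above, which compares median Euclidean distances for the R-DMRs against those for random region lists, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used in the study: each sample is a plain list of per-region methylation values, random lists are matched to the test list in number only (not in region length), and the names `median_distances` and `random_set_test` are hypothetical.

```python
import itertools
import math
import random
import statistics


def median_distances(samples, labels, columns):
    """Return (median within-tissue, median between-tissue) Euclidean distance.

    samples: one list of methylation values per sample (one value per region)
    labels:  tissue label of each sample, in the same order
    columns: indices of the regions used to compute distances
    """
    within, between = [], []
    for i, j in itertools.combinations(range(len(samples)), 2):
        d = math.sqrt(sum((samples[i][c] - samples[j][c]) ** 2 for c in columns))
        (within if labels[i] == labels[j] else between).append(d)
    return statistics.median(within), statistics.median(between)


def random_set_test(samples, labels, test_columns, n_random=1000, seed=0):
    """Count random region sets that do at least as well as `test_columns`.

    Returns two counts: sets whose within-tissue median distance is at least
    as low, and sets whose between-tissue median distance is at least as great.
    """
    rng = random.Random(seed)
    obs_within, obs_between = median_distances(samples, labels, test_columns)
    n_regions = len(samples[0])
    as_tight = 0
    as_separated = 0
    for _ in range(n_random):
        cols = rng.sample(range(n_regions), len(test_columns))
        w, b = median_distances(samples, labels, cols)
        if w <= obs_within:
            as_tight += 1
        if b >= obs_between:
            as_separated += 1
    return as_tight, as_separated


if __name__ == "__main__":
    # Toy data, not from the study: 9 samples, 60 regions. The first 5
    # regions differ by tissue; the rest vary without tissue structure.
    labels = ["brain"] * 3 + ["liver"] * 3 + ["spleen"] * 3
    samples = [
        [5.0 * (i // 3) if c < 5 else ((i * 7 + c * 13) % 11) / 10 for c in range(60)]
        for i in range(9)
    ]
    print(random_set_test(samples, labels, list(range(5)), n_random=200))
```

Counts of zero for both criteria correspond to the outcome reported in the text, where no random list matched the R-DMRs on either measure.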
Figure 2 DNA methylation at R-DMRs distinguishes normal tissues from each other and colon cancer from normal colon. (a,b) The M values of all tissues from the 4,401 regions (FDR < 0.05) corresponding to R-DMRs (iPS cells compared to parental fibroblasts) ...
We compared the R-DMRs to those obtained in a genome-scale comparison of DNA methylation in colorectal cancer and matched normal colonic mucosa from the same individuals (C-DMRs) (ref. 8). We had previously found a much smaller number of C-DMRs than T-DMRs (2,707 compared to 16,379), and 45% of the C-DMRs overlapped T-DMRs. Approximately 16% of the R-DMRs in the present study overlapped the C-DMRs of the previous study, whereas only 4.5% on average would be predicted by permutation analysis to overlap (P < 0.0001 based on 10,000 permutations) (Supplementary Table 9). Notably, hypomethylated R-DMRs (iPS cells compared to fibroblasts) were associated with hypermethylated C-DMRs (cancer compared to normal, P < 0.0001 based on 10,000 permutations) (Supplementary Table 9). Of the 294 DMRs found to overlap between hypomethylated R-DMRs and hypermethylated C-DMRs, 251 (85%) also overlapped bivalent chromatin marks. In contrast, hypermethylated R-DMRs were associated with hypomethylated C-DMRs (P < 0.0001 based on 10,000 permutations) (Supplementary Table 9). Of the 293 DMRs found to overlap between hypermethylated R-DMRs and hypomethylated C-DMRs, only 37 (13%) also overlapped bivalent chromatin marks. Because bivalent chromatin marks are associated with recruitment of Polycomb group proteins, these data suggest that there are two independent epigenetic mechanisms for cell reprogramming and tumorigenesis. One mechanism involves decreased DNA methylation and chromatin modifications at bivalent sites during reprogramming and increased methylation in cancer. The other mechanism involves increased methylation during reprogramming and loss of methylation in cancer.
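The proportions quoted in this paragraph follow directly from the stated counts, as the short check below shows. It uses only numbers given in the text. The fold value is a rough ratio of two rounded percentages (about 16% observed against 4.5% expected) and is included as an illustration, not as a statistic reported by the study; the helper name `percent` is hypothetical.

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Overlap with bivalent chromatin marks, from the counts quoted in the text.
hypo_r_hyper_c_bivalent = percent(251, 294)
hyper_r_hypo_c_bivalent = percent(37, 293)

# Observed versus permutation-expected overlap of R-DMRs with C-DMRs.
# Both inputs are rounded percentages, so the ratio is approximate.
fold_over_expected = 16.0 / 4.5

print(f"hypo R-DMR / hyper C-DMR at bivalent marks: {hypo_r_hyper_c_bivalent:.1f}%")  # 85.4%
print(f"hyper R-DMR / hypo C-DMR at bivalent marks: {hyper_r_hypo_c_bivalent:.1f}%")  # 12.6%
print(f"observed / expected R-DMR and C-DMR overlap: {fold_over_expected:.1f}-fold")  # 3.6-fold
```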
In summary, we have found that epigenetic reprogramming of human fibroblasts to iPS cells involves substantial changes in DNA methylation largely affecting the same CpG island shores in T-DMRs that mark normal differentiation. It is notable that the R-DMRs completely distinguish brain from liver from spleen tissues and largely distinguish colon cancer from normal colon tissue. These results provide compelling evidence of the importance of CpG island shores and T-DMRs in both normal development and somatic cell reprogramming. Indeed, the target loci for normal tissue programming, epigenetic reprogramming to pluripotency and aberrant programming of cancers largely overlap. A secondary finding is that certain loci in iPS cells remain incompletely reprogrammed, whereas others are aberrantly reprogrammed, thus establishing that the methylation pattern of iPS cells differs both from those of the parent somatic cells and from those of human ES cells.
Our results contrast with prior studies that were primarily directed toward developing powerful new tools to analyze DNA methylation of targeted genomic regions rather than toward genome-scale studies of iPS cell methylation. Our more extensive genome-scale analysis of nine paired sets of iPS cells and parental fibroblasts detected roughly equal levels of hypo- and hypermethylation and revealed the predominant involvement of CpG island shores over the islands themselves. Limitations of our study include the still-incomplete genome coverage of the CHARM array, which covers islands and shores but does not examine single or very low density CpG methylation; the use of iPS cells derived from a single cell type; and the still relatively limited database for comparison of T-DMRs, involving only three normal tissue types, and C-DMRs, involving only one cancer type. Nevertheless, the present study reveals a host of loci that represent targets of epigenetic remodeling that are central to somatic cell reprogramming. These R-DMRs include both hypomethylated and hypermethylated regions and overlap significantly with the previously described T-DMRs and C-DMRs, indicating that these R-DMRs at CpG island shores are critical epigenetic targets for defining cell fate.
Finally, the colocalization of hypomethylated R-DMRs in iPS cells with hypermethylated C-DMRs in cancer and with bivalent chromatin marks, together with the colocalization of hypermethylated R-DMRs with hypomethylated C-DMRs in the absence of these marks, suggests two parallel mechanisms for epigenetic reprogramming in iPS cells and in cancer: one involving a loss of DNA methylation in iPS cells and a chromatin-dependent gain of DNA methylation in cancer, and the other involving a gain of methylation in iPS cells and a chromatin-independent loss of DNA methylation in cancer.