|Home | About | Journals | Submit | Contact Us | Français|
Differentiation of human embryonic stem cells (hESCs) provides a unique opportunity to study the regulatory mechanisms that facilitate cellular transitions in a human context. To that end, we performed comprehensive transcriptional and epigenetic profiling of populations derived through directed differentiation of hESCs representing each of the three embryonic germ layers. Integration of whole genome bisulfite sequencing, chromatin immunoprecipitation-sequencing and RNA-sequencing reveals unique events associated with specification towards each lineage. Dynamic alterations in DNA methylation and H3K4me1 are evident at putative distal regulatory elements bound by pluripotency factors or activated in specific lineages. In addition, we identified germ layer-specific H3K27me3 enrichment at sites exhibiting high DNA methylation in the undifferentiated state. A better understanding of these initial specification events will facilitate identification of deficiencies in current approaches leading to more faithful differentiation strategies as well as provide insights into the rewiring of human regulatory programs during cellular transitions.
Coordinated changes to the epigenome are essential for lineage specification and maintenance of cellular identity. DNA methylation (DNAme) and certain histone modifications critically contribute to epigenetic maintenance of chromatin structures and gene expression programs (Zhou et al., 2011; Smith and Meissner, 2013). Genetic deletion of histone methyltransferases and the catalytically active DNA methyltransferases are embryonic or postnatal lethal (Li, 2002) providing evidence for their essential role in proper execution of developmental programs.
Several groups have reported genome-wide maps of chromatin and DNA methylation in pluripotent and differentiated cell types. From these efforts, a global picture of the architecture and regulatory dynamics is beginning to emerge. For example, active promoters generally contain modifications such as H3K4me3 and H3K27ac, while active enhancers are generally enriched for H3K4me1 and H3K27ac (Heintzman et al., 2009; Creyghton et al., 2010; Ernst et al., 2011; Rada-Iglesias et al., 2011). Repressed loci exhibit enrichment for H3K27me3, H3K9me2/3, DNAme, or a combination of the latter two modifications. The enrichment of repressive histone modifications, such as H3K27me3, which is initiated at CpG islands (CGI), is considered a facultative state of repression, while DNAme is generally considered a more stable form of epigenetic silencing (Smith and Meissner, 2013).
Recent studies have reported dynamics that suggest epigenetic priming such as the appearance of euchromatic histone modifications prior to gene activation during in vitro T cell differentiation (Zhang et al., 2012) and cardiac differentiation (Wamstad et al., 2012). These results are reminiscent of changes that occur during the early stages of reprogramming towards the induced pluripotent state (Koche et al., 2011) and highlight possible similarities between differentiation and de-differentiation. In parallel to these advances, whole genome bisulfite sequencing (WGBS) has been used to map DNAme genome-wide. Examination of WGBS data from murine ESCs (mESCs) and neural progenitor cells highlighted lowly methylated regions (LMRs) at distal sites that frequently overlap with DNAse I hypersensitive sites (HS) and/or displayed an enhancer signature defined by H3K4me1 and p300 enrichment (Stadler et al., 2011).
Studying the role of epigenetic modifications in the dynamic rewiring of human transcriptional programs in vivo is complicated by numerous technical and ethical limitations. However, models for in vitro differentiation of hESCs offer a unique opportunity to explore and characterize critical events that prepare, guide and possibly regulate cell fate decisions. Populations representing each embryonic germ layer have been produced from hESCs (Kriks et al., 2011; Chen et al., 2012; Wei et al., 2012).
To dissect the early transcriptional and epigenetic events during hESC specification, we used 2-dimensional, directed differentiation of hESCs to produce representative populations from the three germ layers, namely ectoderm, mesoderm and endoderm (Hay et al., 2008; Evseenko et al., 2010; Lee et al., 2010) followed by fluorescence-activated cell sorting (FACS) to enrich for the desired differentiated populations. These three cell types, in addition to undifferentiated hESCs (HUES64), were then subjected to ChIP-Seq for 6 histone marks (H3K4me1, H3K4me3, H3K27me3, H3K27ac, H3K36me3 and H3K9me3), WGBS and RNA-Sequencing (RNA-Seq). To complement this data, we also performed ChIP-Seq for three TFs (OCT4, SOX2, NANOG) in the undifferentiated hESCs, as well as ChIP-BS-Seq for FOXA2 in the endoderm population. The combined data sets provide a wealth of information, including holistic views of transcriptional and epigenetic dynamics that help further dissect the molecular events during human germ layer specification.
To better understand the molecular dynamics involved in hESC differentiation, we produced populations representative of each embryonic germ layer, namely ectoderm (Lee et al., 2010), mesoderm (Evseenko et al., 2010) and endoderm (Hay et al., 2008) (see Extended Experimental Procedures). We chose the male hESC line HUES64, an NIH-approved line that readily differentiates into each of the three germ layers. These hESCs can be differentiated into a neuroectoderm-like progenitor population positive for SOX2 and PAX6 by inhibition of TGFβ, WNT and BMP signaling (Figure 1A, top). Alternatively, canonical mesoderm markers, such as GATA2 (Figure 1A, middle), can be induced using ACTIVIN A, BMP4, VEGF and FGF2 treatment. Lastly, differentiation towards a definitive endoderm-like fate, positive for markers such as SOX17 and FOXA2 (Figure 1A, bottom), is induced using ACTIVIN A and WNT3A.
We began by measuring the expression of 541 selected genes, including many developmental transcription factors and lineage markers (Bock et al., 2011), at 24-hour intervals during differentiation towards each respective germ layer. We found that 268 of these genes exhibit expression changes (z-score log2 expression) during the first five days of differentiation (Figure 1B). Mesendodermal genes, such as EOMES, T, FOXA2 and GSC, are upregulated at 24 hours of mesoderm and endoderm induction, but not ectoderm differentiation (Figure 1B–C, Table S1). GSC expression decreases within 48 hours of differentiation in the mesoderm-like population, while the expression level is maintained in the endoderm population (Figure 1B–C). EOMES and FOXA2 expression is also maintained in the endoderm population accompanied by upregulation of GATA6, SOX17 and HHEX (Figure 1B). After transient upregulation of mesendodermal markers, activation of mesodermal markers such as GATA2, HAND2, SOX9 and TAL1 is detected specifically in the mesoderm conditions (Figure 1B–C, Table S1). None of these markers are detected during early ectoderm differentiation, which instead upregulates neural markers, such as PAX6, SOX10 and EN1 (Figures 1B–C, Table S1).
We found that POU5F1 (OCT4), NANOG and to some extent SOX2 expression is maintained in our endoderm population (Figure 1B, D, Table S1). This is consistent with prior studies indicating that OCT4 and NANOG expression is detected during the course of early endoderm differentiation and supports NANOG’s suggested role in endoderm specification (Teo et al., 2011). SOX2 expression is downregulated in mesoderm and to a lesser degree in endoderm, but maintained at high levels in the ectoderm population (log2 expression 10.9) (Figure 1D, Table S1), while ZFP42 (REX1) is similarly downregulated in all three lineages (Figure 1B, Table S1). We confirmed that these populations indeed represent a precursor stage for each respective lineage by inducing them to differentiate further, which resulted in upregulation of genes such as OLIG2 and SST in the ectoderm (Chambers et al., 2012), TRPV6 in the mesoderm (Evseenko et al., 2010), and AFP and HGF in the later endoderm populations (Figure S1A) (DeLaForest et al., 2011). Lastly, multidimensional scaling confirmed that at 24 hours the mesoderm population is very similar to the endoderm, while the ectoderm population has already moved in an alternative direction (Figure S1B). These high temporal resolution gene expression signatures suggest that expression programs associated with the three unique cell populations, representing early stages of each germ layer are established within a similar timeframe of hESC differentiation.
Based on these results, we selected day five as the optimal time point to capture early regulatory events in well-differentiated populations representing all three germ layers. To reduce heterogeneity, we used FACS to enrich populations based on previously reported surface markers (Figure S1C); populations isolated by FACS are referred to as dEC for the ectoderm, dME for the mesoderm and dEN for the endoderm. Expression analysis of the sorted populations confirms further enrichment for the desired populations (Figure 1E, Table S1).
We next expanded on our selected gene signature profiles by performing strand specific RNA-Seq on poly-A fractions from each day 5 differentiated FACS-isolated populations and undifferentiated HUES64 (Table S2). Hierarchical clustering based on the global expression profiles of each cell type reveals that the dME population is the most distantly related cell type and that dEN and dEC are more similar to each other than to dME or hESCs (Figure 1F). This was unexpected given that the dME and dEN populations are putatively derived through a common mesendoderm precursor stage (Figure 1B–C, Table S1) while the dEC does not upregulate markers associated with this stage (EOMES, T, GSC; Table S1). Overall, 14,196 RefSeq defined coding and non-coding transcripts (≈38% of defined transcripts) are expressed (FPKM >1) in at least one of the populations, with 11,579 (81.6% of the total number of transcripts detected within our cell types) being expressed in all three populations. Examining the overlap of genes expressed (FPKM >1) in each population reveals that the dME population exhibits expression of the largest number of unique genes (n=448, Figure 1G), such as RUNX1 (FPKM: 3.4) and HAND2 (FPKM: 17.8). Examining genes unique to pairs of the differentiated cell types also reveals that dEC and dME have the least in common (n=37, Figure 1G), while the dEC and dEN have the most number of transcripts in common (n=171, Figure 1G) consistent with our clustering analysis. Genes such as PAX6 (dEC FPKM: 25.9, dEN FPKM: 5.6) and NKX6.1 (dEC FPKM: 2.3, dEN FPKM: 3.3), which are each required for both brain (Ericson et al., 1997) and pancreas development (Sander et al., 1997), are expressed in both the dEC and dEN. Canonical markers of embryonic development, such as FOXA2 (FPKM: 12.7) in the dEN and EN1 (FPKM: 5.8) in the dEC are restricted to their expected germ layers at our early stages (Table S2).
Notably, we also identified 1,296 splicing events (FDR=5%) as well as alternative promoter usage within our populations (Table S3) (Trapnell et al., 2013). For example, we detected expression of multiple isoforms of DNMT3B (p=5×10−5). Expression of DNMT3B isoform 1 (NM_006892) was restricted to the undifferentiated hESCs (FPKM: 214.3), while the differentiated cell types predominantly express an alternative isoform, DNMT3B isoform 3 (NM_175849) (dEC FPKM: 33.9, dME FPKM: 14.2, dEN FPKM: 20.0) (Figure 1H). The presence of this isoform, as well as others, has previously been reported in more advanced stages of embryonic development as well as normal adult (Robertson et al., 1999) and cancerous tissues (Ostler et al., 2007). Our results suggest that this switch coincides with the exit from the pluripotent state, regardless of the specified lineage. We also identified expression of three PITX2 isoforms, with differential splicing leading to different isoform expression between the dEN and dME (Table S3). In the chick PITX2 is essential for heart looping and each isoform is responsible for executing distinct functions (Yu et al., 2001). Taken together, this suggests that both transcript levels and isoform expression contribute to cellular identity.
To gain a more complete picture of the underlying molecular dynamics and investigate the regulatory events during the specification of the three germ layers, we collected approximately 12 million cells of the respective dEC, dME and dEN populations as well as HUES64. All samples were subjected to ChIP-Seq (H3K4me1, H3K4me3, H3K27me3, H3K27ac, H3K36me3, and H3K9me3) and whole genome bisulfite sequencing (WGBS) (Figure 2A), producing a total of 32 data sets with over 12 billion aligned reads (data are publicly available through the NIH Roadmap Epigenomics Project data repositories: http://www.roadmapepigenomics.org/, Table S4).
After completing our basic quality control (see Extended Experimental Procedures and Table S4), we focused our analysis on previously identified informative chromatin states associated with various types of regulatory elements (Ernst et al., 2011; Rada-Iglesias et al., 2011), including the following specific combinations: H3K4me3+H3K27me3 (bivalent/poised promoter); H3K4me3+H3K27ac (active promoter); H3K4me3 (initiating promoter); H3K27me3+H3K4me1 (poised developmental enhancer); H3K4me1 (poised enhancer), H3K27ac+H3K4me1 (active enhancer); H3K27me3 (Polycomb-repressed) and H3K9me3 (heterochromatin). In addition, we segmented the WGBS data into three DNAme states: highly methylated regions (HMRs: >60%), intermediately methylated regions (IMRs: 11–60%) and unmethylated regions (UMRs: 0–10%). The latter differs from the highly methylated background of the genome and likely indicates functional importance as previously suggested (Stadler et al., 2011). We next assigned each genomic region to one of the resulting states (Figure S2A, see Extended Experimental Procedures for details).
Across the four cell populations, we identified 268,062 genomic regions spanning 400 base pairs (bp) to 10 kilobases (kb) (Figure 2B) and covering a total of 450,058,678 bp (18.6% of the human genome) (Figure 2C), that were enriched for at least one of the 8 chromatin states and/or classified as an IMR or UMR in at least one of the cell types. Of the identified epigenetic states, H3K27me3-enriched regions and HMRs covered the most base pairs (Figure 2C) and the combination of H3K4me3 and H3K27me3 exhibits the highest CpG content (Figure 2D), as expected given the close association between H3K27me3 and high CpG density (Ku et al., 2008). The majority of epigenetically dynamic regions are not located near promoters (6.8% +2kb to −500bp of the TSS- Promoters; 48.8% >50 kb upstream of TSS- Intergenic; 15.1% >500 bp downstream of TSS- Intragenic/Gene body) (Figure S2B). As expected, regions of open chromatin exhibit the highest median expression value (Figure 2E). However, overall we find that the majority (62–67%) of all epigenetic remodeling events are not directly linked to transcriptional changes based on the expression of the nearest gene.
The loss of H3K4 methylation (me1 and me3) is commonly associated with a transition to high DNAme (Figure 2F), which is most prominent in the dEN population and preferentially eliminated from genes involved in neural development (i.e. neural tube development q=9.6×10−12). We identified 4,639 proximal bivalent domains in hESCs and observe that 3,951 (85.1%) of these domains resolve their bivalent state in at least one hESC-derived cell type (Figure 2F, Figure S2C). When we specifically investigated the promoters of TF-encoding genes, we found that 463 of these promoters are in a bivalent state in hESCs, and 400 of them change in at least one differentiated cell type (Figure S2D). The majority transitions to H3K4me3-only or H3K27me3-only in a lineage-specific manner, as shown for ISL1 (Figure S2E). In dME, H3K4me3 is gained at the ISL1 locus while H3K27me3 is lost, leading to expression (FPKM: 14.3). The lineage specific dynamics in this region are interesting given that this gene has known roles in all three germ layers, although at later time points (Pfaff et al., 1996; Ahlgren et al., 1997; Cai et al., 2003). Notably, in contrast to the limited overall association between many epigenomic dynamics and changes in expression, we found that a large proportion of these bivalent TFs (275) change their expression level during the differentiation (Figure S2D).
To further explore potential regulators of chromatin dynamics during the exit from pluripotency, we performed ChIP-Seq for OCT4, NANOG and SOX2 in HUES64 (Figure S2F and G). We found that regions bound by all three factors (n=1,556), by SOX2-only (n=923) or by NANOG-only (n=14,531) are frequently associated with inter- and intragenic regions (Figure S2H–S2J, top). In contrast, regions bound by OCT4-only (n=8,599) are more frequently associated with promoter regions (Figure S2H). Examination of regions bound by OCT4, NANOG and SOX2 in hESCs showed H3K4me1 regions enriched for OCT4 binding sites frequently become HMRs in all three differentiated cell types whereas NANOG and SOX2 sites are more prone to change to an HMR state in dME (Figure 2I). In general, many regions associated with open chromatin that are bound by NANOG are more likely to retain this state in dEN compared to dME and dEC (Figure 2I). We also found that regions enriched for H3K27ac in hESCs that maintain this state in dEN or dEC are likely to be bound by SOX2 and NANOG. This is in agreement with the reported role of SOX2 during ectoderm development and differentiation (Wang et al., 2012), but also supports our observation that SOX2 expression is maintained in the dEN. Motif enrichment analysis detected the GATA3 motif in regions bound by OCT4 and SOX2 that transition to an active state in dEC. Furthermore, we found that regions bound by OCT4, NANOG and SOX2 that gain an active mark in dEC are enriched for the motifs PAX9, p63 and STATs (Table S5). Examining epigenetic dynamics at sites of OCT4, NANOG and SOX2 binding further supports the observation that some pluripotency associated TFs are also involved in the downstream specification.
We next utilized the WGBS data that cover approximately 26 Million CpGs (at ≥5 coverage) across all four cell types. Hierarchical clustering analysis of the WGBS data, which included human adult liver and hippocampus for comparison, revealed that the pluripotent hESCs and the hESC-derived cell types form a separate cluster arm with respect to the somatic tissues (Figure 3A).
We determined DMRs defined as exhibiting a significant (p≤0.05) minimal difference of CpG methylation level of 0.1 among our four cell types. The majority of all DMRs occur at CpG-poor intergenic regions in line with previous reports (Figure 3B, bottom) (Stadler et al., 2011). The dEN exhibits more than twice the number of regions that gain DNAme compared to dEC and dME (Figure 3B, top). Interestingly, only 65 of the total number of DMRs identified are shared between all three populations. However, reaffirming that our populations are depleted of pluripotent cells, this group of DMRs includes the regulatory region of OCT4 (Figure S3A). In line with the small number of shared regions, more than 60% of regions that gain DNAme are lineage specific (Figure 3C) and include loci such as SMAD3 (dEC), CTNNA3 (dME) and FOXA2 (dEN). FOXA2 has an upstream CGI that exhibits gain of DNAme (Figure 3D), and transcription in dEN is initiated downstream of this DMR at an alternative TSS, suggesting that TSS usage may be regulated, stabilized or reflected by DNAme (Maunakea et al., 2010).
We find significant enrichment of various TF motifs as DNAme targets upon differentiation (Table S6), which has some analogy to the gain of methylation observed at myeloid targets in the lymphoid lineage in vivo (Ji et al., 2010; Deaton et al., 2011; Bock et al., 2012). To extend this observation, we examined the DNAme state at regions bound by SOX2, OCT4 and NANOG in hESCs. For example, two regions 20 kb downstream of DBX1, a gene associated with early neural specification, are bound by all three TFs and gain DNAme in dME and dEN. In contrast, this region maintains low levels of DNAme in dEC, which has activated transcription of DBX1 (Figure 3E). We generally find that co-bound sites gain DNAme in the dME and dEN, but not dEC (Figure S3B). Further supporting the functional relevance of these dynamics, we find that regions with gain DNAme frequently coincide with DNAseI hypersensitive sites (Figure S3C) (Thurman et al., 2012). While transcriptional silencing was infrequently correlated with gain of DNAme at distal elements (Figure 3F, left), the promoters that gain DNAme in dEC and dME, are associated with a decrease in expression as expected (Figure 3F, right).
In examining the chromatin state of regions that gain DNAme during differentiation, we find that most regions exhibited enrichment of one or more histone modifications in hESCs (Figure 3G). These results confirm that in particular distal regulatory elements show highly dynamic regulation of DNA methylation during specification.
Loss of DNAme is asymmetric between the three populations (Figure 4A, top) and occurs in a more lineage specific fashion than gain (Figure 4B). However, loss also occurs mainly at intergenic regions (Figures 3B, bottom and and4A,4A, bottom). Notably, the dEC has the most DMRs and many were associated with neuronal gene categories (for instance: neural tube development, q=3.13×10−13). This includes the ectodermal TF POU3F1, which has a bivalent promoter in hESCs, resolves to a H3K4me3-only state (Figure S4A) and exhibits transcriptional activation in dEC (Table S2). Chromatin remodeling and activation at this locus coincides with specific loss of DNAme at a putative regulatory element downstream of the 3’UTR of this gene in dEC (Figure 4C). On a global scale, an immediate correspondence between loss of DNAme and expression, such as that observed at POU3F1, occurs at about half the regions (Figure 4D). More than 70% of DMRs that lose DNAme during differentiation are enriched for one of our profiled histone modifications in particular H3K4me1 or H3K27ac (Figure 4E).
Taken together, our hESC differentiation system reveals several interesting DNAme dynamics, including the lineage specific silencing of regulatory regions in default or alternative lineages. The asymmetric loss may also explain why our chromatin state analysis revealed fewer regions that gained H3K4me3 in the dEC population.
In addition to methylation on H3K4, open chromatin is also demarcated by enrichment of H3K27ac. It has also been suggested that the combination of H3K4me1 and H3K27ac at distal regions identifies active enhancer elements, while H3K4me1 and H3K27me3 corresponds to poised enhancer elements (Rada-Iglesias et al., 2011). To extend these observations, we focused specifically on regions that gain H3K27ac during differentiation and found that more than half of the identified regions are HMRs in hESCs (Figure 5A), while another large fraction is enriched for H3K4me1 in hESCs (Figure 5A). The majority of regions that gain H3K27ac are intergenic (Figure S5A), as shown for the RUNX1 locus (Figure 5A, B).
We next placed each region into one of three distinct categories (repressed, poised, open) based on their state in hESCs, and subsequently performed gene set enrichment analysis using the GREAT toolbox (Figure 5C) (McLean et al., 2010). This analysis reveals enhancer dynamics in line with the lineage specific differentiation trajectory for dEC and dME (Figure 5C). In contrast, the dEN population shows an unexpected enrichment for early neuronal genes (e.g. neural tube development, Figure 5C). This observation is consistent with the correlation that we reported between our dEC and dEN RNA-Seq data, suggesting that similar networks are induced in the early stages of both our ectoderm and endoderm specification (van Arensbergen et al., 2010).
Moreover, we find strong enrichment of downstream effector genes of the TGFβ, VEGF and BMP pathways in dME, directly reflecting the signaling cascades that were stimulated to induce the respective differentiation. In dEN we find enrichment of genes involved in WNT/β- CATENIN and retinoic acid (RA) signaling (Figure 5C). While we did not use RA, this signaling cascade has previously been implicated in endodermal tissue development including pharyngeal and pancreatic cell types (Wendling et al., 2000; Ostrom et al., 2008). Concordantly, we also find high levels of SMAD3 motif enrichment in the repressed dME and dEN, particularly in the poised putative enhancer populations (Figure 5D). Similarly, we observe enrichment of key lineage specific TF motifs such as the ZIC family proteins in dEC, TBX5 in dME and SRF in dEN. Interestingly, we also find the FOXA2 motif highly overrepresented in dEN where the factor is active, and also dEC where the factor is inactive but becomes expressed at a later stage of neural differentiation (Kriks et al., 2011), but not in dME (Figure 5D).
Many regions that exhibit high DNAme in hESCs and transition to H3K4me1 in one lineage remain HMRs in the two alternative cell types (Figure 2F, Figure 6A). Similar to the regions showing dynamic DNAme during differentiation, these regions are typically intergenic and >10kb from the nearest TSS (Figure 6B). GREAT analysis of these regions shows a strong enrichment for categories associated with brain development such as cerebellum morphogenesis in dEC (q<10−30), TGFβ pathway targets (q<10−10) in dME and suppression of EMT in dEN (q<0.0001). To understand if regions that gain H3K4me1 in our system are associated with somatic identity, we took advantage of published microarray data for 24 human tissues and determined genes upregulated in these tissues with respect to hESCs (termed, Tissue Atlas, see Extended Experimental Procedures). Reaffirming the relevance of our dynamics, we found regions that gain H3K4me1 in dEC are associated with fetal brain and specific cell types found within the adult brain (Figure 6C) based on region association with its nearest gene. The dME H3K4me1 pattern was associated with a range of interrogated tissues, such as heart, spinal cord and stomach, which may be due to heterogeneity of the tissues collected (Figure 6C). The dEN associations were interesting given that, as with the RNA-Seq and H3K27ac trends, H3K4me1 was again associated with brain-related categories (Figure 6C).
Overall, less than half of the genes that gain H3K4me1 exhibit immediate transcriptional changes (Figure 6D). CYP2A6 and CYP2A7 (Figure 6E) are representative examples that do not show a corresponding change in expression, while LMO2 does (Figure 6F). To investigate these regions in more detail, we carried out motif enrichment analysis and found lineage specific enrichment of TF motifs near regions that gain H3K4me1. While the FOXA2 motif is enriched in all three cell types, the DBX1 motif is associated with the gain of H3K4me1 in dEC (Figure 6G), which coincides with its transcriptional activation in this cell type (FPKM: 5.36). Conversely, the GLI3, HIC1 and CTF1 motifs are strongly enriched at regions that gain H3K4me1 in dEN (Figure 6G).
To further assess if this DNAme to H3K4me1 switch acts as a priming event, we differentiated the HUES64 endoderm population for five additional days in the presence of BMP4 and FGF2, leading to HNF4α positive hepatoblast-like (dHep) cells (Table S2, Figure S6A). Interestingly, of the motifs enriched in dEN that gain H3K4me1, HIC1, KLF4 and CTF1 (Figure 6G), several of these genes become expressed at the next stage of differentiation (Figure 6H). Lastly, 1,346 of these putatively primed regions are enriched for the active enhancer mark H3K27ac in human liver (Figure 6I).
More surprisingly, we observe intergenic regions that switch from high DNAme to H3K27me3 (n=3,985 in dEN) (Figure 7A). This transition frequently occurred within CpG poor, distal regions, which is distinct from the common CpG island-centric targets of Polycomb Repressive Complex 2 and H3K27me3 (Figure 7B). This switch is highly lineage specific and DNAme is generally retained in the alternative two cell types (Figure 7C, Figure 2F).
Motif enrichment analysis, combined with the evaluation of publicly available TF binding site (TFBS) data from the ENCODE project, indicated that many regions exhibiting this transition in dEN were near binding sites of the pioneering factor FOXA2. This TF has putative roles in chromatin decompaction, but its distinct functions and limitations remain somewhat unclear (Li et al., 2012). To investigate this association, we performed ChIP-Seq for FOXA2 in the endoderm population. This analysis reveals that FOXA2 binding sites frequently overlap with regions that transition from HMR to H3K27me3 (Figure 7D). We also confirmed that gain of H3K27me3 at dEN FOXA2 binding sites occurs predominantly in dEN, and not dEC or dME (Figure 7E). A notable example of this transition can be seen at the ALB locus, where H3K27me3 is gained at AFP and AFM, proximal to FOXA2 binding sites (Figure 7F). This mark is not found in primary liver tissue, suggesting it represents a transient state (Figure 7F). Many regions that exhibit this transition are required for later stages of development as with AFP and AFM or HBB1 in the dME. Importantly, the majority of these regions do not yet exhibit significant increases in expression (Figure S7A).
A previous report found that FOXA1/FOXA2 could bind to regions exhibiting DNAme (Serandour et al., 2011), which is not a characteristic shared by all TFs (Gifford and Meissner, 2012). Regions bound by these factors subsequently lost DNAme and gained euchromatic histone modifications in our populations. We therefore compared DNAme at FOXA2 binding sites in hESCs to dEN and found a slight reduction in specifically in the dEN (Figure 7G). To more directly assess this relationship, we interrogated the DNAme state of regions isolated by FOXA2-ChIP-Bisulfite sequencing in dEN (Brinkman et al., 2012). Interestingly, we saw a major depletion of DNAme at sites isolated by FOXA2-ChIP (Figure 7G). To determine if these regions exhibit transcriptional activation after further differentiation we examined again our dHep RNA-Seq data and found that 50 genes, which were bound by FOXA2 and gained H3K27me3 in dEN, increased their expression (Figure 7H, Table S2). We also find H3K27ac enrichment at 197 loci in the human liver that had experienced the gain of H3K27me3 in dEN (Figure 7I).
Using directed differentiation of hESCs to three distinct, FACS-enriched populations, representing early stages of embryonic development, we provide an extensive set of new data and many insights on the transcriptional and epigenetic dynamics that occur during human in vitro lineage specification.
Among other things we describe two very interesting, but distinct lineage specific dynamics from high DNAme to H3K4me1 or H3K27me3. These transitions occur at many sites that do not significantly change gene expression during our early stages of differentiation. Notably, we made similar observations for H3K4 methylation during the early stages of reprogramming to an iPSC state (Koche et al., 2011), suggesting that this type of epigenetic priming event might be common. At this point however it is not clear whether these events reflect a regulatory mechanism to facilitate timely activation upon differentiation or indicate the absence of a critical co-factor necessary for complete transcriptional activation. We also cannot rule out that a subset of the observed priming events are due to heterogeneity in the cell population that are not detected by our RNA-Seq. Our observation that high DNAme switches to H3K27me3 enrichment in distal, CpG-poor regions is even more interesting. It remains to be tested whether targeted loss of DNAme at these regions causes a default gain of H3K27me3 in the absence of additional co-factors due to underlying sequence context (Mendenhall et al., 2010) or represents a more active recruitment event and regulatory mechanism. It is also possible that H3K27me3 gain at distal regions is due to genomic conformation changes and reflects H3K27me3 spreading in three dimensions. It was recently reported that the combination of H3K27me3 enrichment and a nearby nucleosome-depleted region creates sites amenable to TF binding (Taberlay et al., 2011). Based on these results, one may speculate that specific TFs, such as FOXA2, exert chromatin decompaction functions resulting in loss of DNAme leading to gain of H3K27me3, which creates a platform for subsequent binding of other TFs that cannot directly remodel a heterochromatic state, but instead function in transcription machinery assembly and transcriptional activation.
In conclusion, our data provide new insights on transcriptional and epigenetic dynamics during hESC specification and represent valuable reference maps for many applications, including regenerative biology and the study of human developmental biology.
For full details, see Extended Experimental Procedures.
RNA was hybridized to a custom probe set, processed using the Nanostring prep station, imaged using the Nanostring nCounter and analyzed as previously described (Bock et al., 2011).
For WGBS sequencing libraries, Genomic DNA was fragmented using a Covaris S2 sonicator. DNA fragments were cleaned-up, end-repaired, A-tailed, and ligated with methylated paired-end adapters (purchased from ATDBio).
For ChIP-Seq libraries, ChIP-isolated DNA was end-repaired, A-tailed and ligated to barcoded Illumina adaptors, and the library was amplified using PFU Ultra II Hotstart Master Mix (Agilent)
For RNA-Seq libraries, polyadenylated RNA was isolated using Oligo dT beads (Invitrogen) and fragmented to 200–600 base pairs, and then ligated to RNA adaptors using T4 RNA Ligase, (NEB), preserving strand of origin information.
ChIP-BS libraries were constructed as previously described (Brinkman et al., 2012).
We would like to thank Kendell Clement for support of the WGBS data visualization, Loyal Goff for RNA-Seq visualization, as well as Zachary Smith and Jing Liao, for critical reading of the manuscript, and the SCRB FACS Core for advice regarding FACS analysis. We also thank other members of the Meissner Lab and Epigenomics Platform at the Broad Institute for helpful discussion. A.M. is supported by the Pew Charitable Trusts and is a New York Stem Cell Foundation (NYSCF) Robertson Investigator. The work was funded by the US National Institutes of Health (NIH) grants (U01ES017155 and P01GM099117) and the NYSCF.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.