|Home | About | Journals | Submit | Contact Us | Français|
Pluripotency is dependent on the maintenance of a proper epigenetic landscape (Gaspar-Maia et al., 2011; Orkin and Hochedlinger, 2011). Bivalent domains, which are defined by the paradoxical coexistence of a permissive histone mark (H3K4me3) and a repressive mark (H3K27me3), are thought to play an important role in pluripotency by keeping developmental genes in a silenced state poised for activation upon differentiation (Azuara et al., 2006; Bernstein et al., 2006). In support of this model, bivalent domains have been shown to be present preferentially at developmental regulatory genes in undifferentiated embryonic stem cells (ESCs) and several adult tissues in vivo including sperm, testis, the cerebellum, and the hematopoietic compartment (Mikkelsen et al., 2007; Cui et al., 2009, 2012; Hammoud et al., 2009).The moderate de-repression of developmental regulators in eed mutant ESCs, which have reduced levels of H3K27me3, further supports this model (Azuara et al., 2006; Boyer et al., 2006). However, it has recently been proposed that the concomitant presence of H3K4me3 and H3K27me3 at developmental genes in ESCs is an in vitro artifact resulting from suboptimal culture conditions (Hong et al., 2011; Marks et al., 2012). In addition, direct evidence for bivalency in the developing mammalian embryo is limited and conflicting due to technical difficulties associated with the low amounts of material available (Alder et al., 2010; Dahl et al., 2010; Rugg-Gunn et al., 2010). The universal nature of bivalency has been further questioned due to conflicting reports from non-mammalian species, with bivalent domains demonstrated to exist in zebrafish (Vastenhouw et al., 2010) but not detected in Xenopus or Drosophila embryos (Akkers et al., 2009; Schuettengruber et al., 2009). We therefore lack a clear understanding of whether bivalency exists in embryonic cells in vivo, and how it relates to pluripotency.
Of particular interest to pluripotency are primordial germ cells (PGCs), the embryonic precursors of the germline. PGCs have a transcriptional profile similar to ESCs (Qin et al., 2012), including the silencing of developmental regulators, and can give rise to pluripotent stem cells when cultured in vitro. However, unlike ESCs, PGCs are unipotent and unable to contribute to chimeras. Additionally, PGCs are thought to undergo a period of ‘epigenetic erasure’ or reprogramming via global removal of both DNA methylation and histone modifications that may prevent transmission of epimutations (Greer and Shi, 2012; Seisenberger et al., 2013). Due to the low number of PGCs present during development, nearly all evidence for this model is from immunofluorescence (IF)-based methodologies that lack gene level resolution and have shown conflicting results (Seki et al., 2007; Hajkova et al., 2008; Kagiwada et al., 2012). To characterize the chromatin state of PGCs, we developed a low cell number Chromatin Immunoprecipitation (ChIP) for the analysis of histone marks using less than 10,000 cells per IP without the need for carrier chromatin or pre-amplification. Using this technique, we performed ChIP-Seq and ChIP-qPCR to show that bivalent domains are present at developmental regulatory genes at multiple stages of PGC development.
We developed a low cell number ChIP protocol using a magnetic bead-based capture with key improvements over existing approaches (O’Neill et al., 2006; Acevedo et al., 2007; Dahl and Collas, 2008; Adli and Bernstein, 2011; Shankaranarayanan et al., 2011), including reduced handling of material for cross-linking, improvements to sonication reproducibility, and no requirement for carrier DNA or DNA pre-amplification (see Methods). We validated the protocol using a transgenic mouse ESC line expressing an N-terminal myc-tagged H3.3 histone, due to technical simplicity and the availability of genome-wide data (Goldberg et al., 2010). ChIP was performed for myc-H3.3 and a control IgG with decreasing amounts of chromatin starting from 2.5 million to 12,500 cells per IP (Figure S1A). qPCR was used to interrogate the enrichment status of 15 different loci representing 6 categories of genomic regions. Importantly, 11 of the 12 loci expected to be bound by H3.3 (Goldberg et al., 2010) were as least 10 fold more enriched for myc-H3.3 compared to IgG, demonstrating the specificity and low background of the assay. Additionally, the relative enrichment between loci remains stable independent of starting cell number, demonstrating the robustness and sensitivity of the assay (Figure S1B). We envision that this protocol will be broadly useful to characterize the chromatin state of rare stem and precursor cell populations in vivo.
To investigate the presence of bivalent domains in PGCs, we used a transgenic Oct4:GFP mouse line to isolate PGCs (GFP+) and the surrounding somatic cells (GFP−) by FACS at different stages of development as previously reported (Yoshimizu et al., 1999). Using our low cell ChIP protocol, we performed ChIP-Seq for H3K4me3 and H3K27me3 in two biological replicates of E11.5 PGCs, obtaining a total of approximately 386 million reads and between 15 and 32 million uniquely mappable reads post filtering per ChIP library (Table S1). The two biological replicates in our ChIP-Seq libraries show very strong correlation, with Pearson correlation coefficients of 0.978 for H3K4me3 and 0.996 for H3K27me3 (Figure 1A, S2A, Table S1), indicative of the high reproducibility of the protocol. Furthermore, all IPs are enriched for signal over input as assessed using CHANCE (Table S1) (Diaz et al., 2012).
We identified H3K4me3-only regions, H3K27me3-only regions, and bivalent domains in both E11.5 PGCs (this study) and ESCs (Mikkelsen et al., 2007) using ChromHMM (Ernst and Kellis, 2012). The overall distribution of these histone marks is similar between PGCs and ESCs, except for an increase in H3K27me3-marked DNA in PGCs (Figure S2B), which is in agreement with IF data (Seki et al., 2005, 2007; Kagiwada et al., 2012). Genes marked only with H3K27me3 in PGCs (Supplemental File 2) are involved in cell motility (Figure S2C), which may be a consequence of PGCs having just completed their migration to the gonads at E11.5.
We next focused on bivalent regions. Consistent with their association with gene promoters, bivalent domains largely occur at transcriptional start sites (TSSs) in both E11.5 PGCs and ESCs (Figure S2D). E11.5 PGCs and ESCs have a similar distribution and a highly significant overlap of genes marked as bivalent (p<2.2X10−16, Figure 1B, 1C, S2E). Similarly to what has been described for ESCs (Bernstein et al., 2006), the 2715 genes marked as bivalent in PGCs and ESCs are strongly enriched for regulators of early development (Figure 1D). We note that while this manuscript was under review, Ng et al. reported a ChIP-Seq dataset for several histone marks in PGCs, including H3K4me3 and H3K27me3 (Ng et al., 2013). However, the authors did not analyze bivalency in detail and the low signal for H3K4me3 in E11.5 PGCs in that study precludes a direct comparison with our work (Figure S2A). Therefore, we sought to validate our data using ChIP-qPCR in independent biological samples of PGCs. ChIP-qPCR at 12 genes representing 8 developmental pathways confirmed that all are bivalently marked in PGCs, with bivalency defined as enrichment above a background of 10% (Supplemental File 1) for both H3K4me3 and H3K27me3 and a maximum 2-fold difference in enrichment between the two marks (Figure 1E). These bivalent regulators include transcription factors that are important for the specification of the 3 somatic germ layers, such as Pax6, Nkx2.2, MyoD, brachyury (T), Gata6 and FoxA1, and Hox genes such as HoxA3 and HoxB9. This is notable because ESCs are pluripotent and expected to maintain these genes in a poised transcriptional state for activation upon differentiation, but PGCs are unipotent and will not express any of these developmental regulators until post fertilization, in the next generation. In fact, the Hox cluster has been shown to be silenced as part of the early specification of PGC precursors at E7.25 (Saitou et al., 2002). A comparison of the expression pattern of genes bivalent in PGCs and ESCs demonstrates that these genes are transcriptionally repressed in ESCs and PGCs, but subsets are active in select lineage-restricted cell types such as MEFs, bone marrow, or brain tissue (Figure 1F). Taken together, our genome-wide analysis reveals that the mouse embryonic germline in vivo is remarkably similar to cultured ESCs with regards to the presence of bivalent chromatin at silenced developmental regulatory genes.
We next compared the chromatin landscape of PGCs to the surrounding soma. Using ChIP-qPCR, we examined the 12 previously tested developmental regulators plus Pou5f1/Oct4 in ESCs and E11.5 soma (Figure 2A, 2B). 12/12 of these developmental regulators are bivalent in both E11.5 PGCs and ESCs. However, in the soma most of these genes have stronger enrichment for either H3K4me3 or H3K27me3 and only four genes are bivalent (Figure 2B, 2C). These data indicate that developmental regulators of all somatic lineages exist in a bivalent state in ESCs and PGCs but not in somatic cells in vivo (Figure 2D), suggesting that this distribution of bivalent domains is a distinguishing characteristic of pluripotency-associated cells both in vitro and in vivo.
E11.5 PGCs are sexually indifferent and capable of giving rise to pluripotent stem cells when cultured in vitro. After E11.5, PGCs initiate the process of sexual differentiation and express sex-specific genes by E13.5. Furthermore, E13.5 PGCs can no longer give rise to pluripotent stem cells (Labosky et al., 1994). We therefore sought to investigate whether the process of sexual differentiation and concomitant loss of the ability to give rise to pluripotent stem cell cells affects the bivalent chromatin state of PGCs. Using low cell ChIP-qPCR, we examined 8 representative developmental genes through a time-course of PGC development from E11.5 to E13.5. In agreement with previously published data (Qin et al., 2012), somatic developmental genes, including HoxA11, HoxB9, Pax6, Nkx2.2, T, MyoD, Gata6, and FoxA1 are expressed at very low to undetectable levels from E11.5 through E13.5 (Figure 3). Interestingly, these developmental regulators remain bivalent in PGCs throughout this time-period (Figure 3). In contrast, germline-specific genes, such as Dazl, Dppa3, and Vasa are enriched for only H3K4me3, coinciding with their transcriptional activation (Figure S3). Taken together, these data indicate that somatic developmental regulators remain bivalent in the mouse embryonic germline from E11.5 through initiation of sexual differentiation at E13.5.
Surprisingly, we identified a large number of ‘orphan’ bivalent domains that cannot be associated with any annotated promoter or gene in the RefSeq database, and are distal to annotated genes by a median distance of 13kb (Figure 4A). From a total of 9132 bivalent domains in E11.5 PGCs, 2886 are orphan, and many are also bivalent in ESCs (Table S2). Bivalent domains are known to have a strong association with unmethylated CpG islands (CGIs) (Ku et al., 2008). We found that 1413 (49.0%, p-value < 1×10−5) of the orphan bivalent domains in PGCs map to experimentally defined unmethylated CGIs (Illingworth et al., 2010). We next investigated the expression status of the regions containing orphan bivalent domains. If the orphan bivalent domains were similar to bivalent genes (Figure 1F), they would be silent in PGCs and expressed in somatic tissues. Indeed, only 23 (0.80%) orphan bivalent domains show signs of transcription in an RNA-Seq dataset from E11.5 PGCs (Seisenberger et al., 2012). Analysis of CAGE tag clusters showed that 927 of the 1413 (65.6%, p-value = 5.78×10−168) orphan bivalent domains with CGIs are expressed in at least 1 of 22 tissue types (Figure 4B, Table S3) (Kawaji et al., 2009). Similarly, 461 of the 1473 (31.3%, p-value = 3.61×10−35) orphan bivalent domains without CGIs are expressed in at least 1 of 22 tissue types (Figure 4B, Table S3). Furthermore, both classes of orphan bivalent domains (with or without CGI) display highly tissue-specific expression (Figure 4C), more so than bivalent developmental regulators (p-value = 1.1×10−141, 2.3×10−72 respectively), with a preferential activation in brain regions (Figure 4E). Interestingly, we found bivalent domains in PGCs at 300 non-coding RefSeq genes and 179 putative non-coding RNAs inferred from chromatin state (Guttman et al., 2009) and expression data (Carninci et al., 2005) (Figure 4D, S2F). These data suggest that, similar to protein-coding developmental regulators, some non-coding RNAs are bivalent in the germline and poised for activation upon lineage commitment.
In this work we describe an optimized ChIP protocol suitable for low numbers of cells and its use to examine the genome-wide and gene-specific distribution of H3K4me3 and H3K27me3 in mouse PGCs. We report that developmental regulatory genes remain bivalent and transcriptionally silent in vivo in PGCs, but not in adjacent somatic cells, throughout E11.5-E13.5, in a manner highly similar to cultured ESCs. In addition to developmental genes, we identify some ~3000 orphan bivalent domains that are enriched for CGIs and are expressed in a tissue-specific manner. Some of the bivalent domains identified here correspond to non-coding RNAs that we speculate may have developmental functions, similar to bivalent genes, although this remains to be determined. The findings presented here represent strong evidence for the existence of bivalent domains in the embryonic germline, and raise a number of important questions.
Do bivalent promoters escape the epigenetic reprogramming reported to occur in PGCs from E8.5–E13.5? Our data support the suggestion that H3K27me3 may act as a potential mechanism to compensate for the loss of DNA methylation or H3K9me2/3 in mid-gestation PGCs (Sasaki and Matsui, 2008). It is also possible that the loss of certain histone marks observed by IF may occur primarily at abundant repetitive sequences of the genome and mostly spare unique genes, such as developmental regulators. The application of the low cell ChIP protocol reported here should help answer this question for other histone marks and time-points.
Does the germline continuously maintain the somatic program in a bivalent state until activation in the next generation? It is unclear why such a large set of important developmental regulators would remain bivalent in the developing germline, but not in somatic cells. The hypothesis that bivalency maintains developmental genes in a transcriptionally poised state is intuitive in pluripotent cells that activate these genes upon differentiation, such as ESCs (Azuara et al., 2006; Bernstein et al., 2006) or the epiblast (Rugg-Gunn et al., 2010). The presence of bivalent domains in PGCs may contribute to preventing expression of the somatic program in the germline. Our observations suggest that developmental regulators may be kept in a repressed but accessible state in the germline for activation post-fertilization, in the next generation. The transmission of bivalency through the germline could provide a substratum for epigenetic inheritance. Surprisingly, recent work indicates that the sperm genome maintains a residual level of nucleosomes, and that these are enriched for H3K4me3/H3K27me3 bivalent marks at developmental regulators (Hammoud et al., 2009). It will be interesting to determine whether bivalency is detected at other stages of germline development and in oocytes. Functional studies of regulators of bivalency in PGCs should provide important insights into these questions.
Male B6 mice homozygous for a transgenic Oct4ΔPE:GFP reporter were crossed with Swiss-Webster females. The gonadal regions from multiple embryos were isolated and pooled prior to enzymatic dissociation for FACS.
Cells were pooled into batches of ~50,000 cells and cross-linked in 1% formaldehyde (Sigma F8775-25ml) in PBS for 5min at room temperature and quenched with 125mM glycine. For each IP, 11µl of protein A Dynabeads (Life Technologies 1001D) were pre-incubated with 2.4µg of either anti H3K4me3 (Diagenode pAb003-050), H3K27me3 (Millipore 07-449), or IgG (Abcam ab46540) in ChIP lysis buffer. Cross-linked cells were lysed in Lo-Bind microfuge tubes (Eppendorf 022431021) followed by a 40 minute sonication in a BioRuptor sonicator (Diagenode UCD-200) set to high power, 7” ON, 15” OFF, changing ice/water slurry every 10 minutes.
Lysate was diluted in lysis buffer and cleared of debris. Cleared lysate from ~50,000 cells was divided into four equal aliquots, and one aliquot used for Input. The remaining three aliquots (equivalent to ~12,500 cells each) were used for IP of H3K4me3, H3K27me3 and IgG. Lysates were incubated with preformed bead/antibody overnight at 4C with mixing. The chromatin/bead/antibody complexes were washed sequentially with lysis buffer three times, DOC buffer once, and TE buffer once, followed by a transfer to a fresh PCR tube. Chromatin was eluted using elution buffer (1% SDS, 0.1M NaHCO3). IP and Input samples were treated with RNaseA followed by Proteinase K treatment. Cross-linking was reversed by incubating overnight at 65C while shaking. DNA purification was done using a QiaQuick PCR Purification Kit (Qiagen 28104). E11.5 ChIP data are representative of 3–7 biological replicates expect for Dnmt3b which was technically replicated. Additional PGC data represent a minimum of 2 biological replicates except for male E12.5 which were technically replicated. All primers are listed in Supplemental File 1. A full protocol is located in the supplemental experimental procedures.
ChIP material was obtained and processed as described above with the following modifications. Cells were cross-linked in 0.25% formaldehyde (Thermo Scientific 28906) in PBS for 10 minutes at room temperature prior to quenching with 125mM glycine. Cross-linked material was sonicated using a Covaris sonicator for 12’ at Duty 5%, Intensity 3, and Bursts 200. Two aliquots of E11.5 PGCs, consisting of 104,000 and 97,000 cells, were each divided equally to perform ChIP for H3K4me3, H3K27me3, and an input control. IP-DNA and Inputs were purified using a MinElute PCR Purification Kit (Qiagen 28004) and libraries generated using the ThruPLEX-FDPrep Kit (Rubicon Genomics R40048) with 20 cycles of amplification for IP-DNA and 15 cycles for Input DNA.
Reads from each ChIP-Seq library were filtered to retain only unique sequences (Table S1). Reads were aligned to the mm9/NCBI build 37 mouse genome using bowtie and reads mapping only once were retained. Data for ESCs were handled in the same manner, obtained from GEO accession GSE12241(Mikkelsen et al., 2007). To call and compare bivalent domains between the PGC and ESC datasets, a 13-state segmentation of the genome was generated using ChromHMM (v1.06) (Ernst and Kellis, 2012). The four samples (PGC H3K4me3, PGC H3K27me3, ESC H3K4me3, ESC H3K27me3) were treated as distinct marks from a single cell type to allow a direct and unbiased comparison of the genome segmentation in PGCs and ESCs. K-means clustering was used to group the state emission parameters for each sample into 2 groups ("on" or "off"), and these were used to classify each of the 13 ChromHMM segmentation states as "bivalent", "H3K4me3 only", "H3K27me3 only", or "none" for each cell type. Data from the replicate PGC libraries were pooled together for this analysis. H3 ChIP-Seq data from the Mikkelsen dataset was used as control for the ESC samples.
All gene set analysis is based on the mm9 RefSeq gene set. To ensure a conservative segregation of bivalent domains associated with regulation of annotated genes from orphan bivalent domains, genes were considered bivalent if an H3K4me3/H3K27me3 enriched region fell in the range 0.2kb upstream of the transcription start site (TSS) through the transcription end site (TES) of any isoform of the gene. H3K4me3/H3K27me3 enriched regions occurring outside the range from 0.2kb upstream of the TSS through the TES of any isoform of a gene were called orphans. Genes were considered H3K4me3-only if no isoforms were bivalent and if any isoform had H3K4me3-only regions falling in the window from − 0.2 kb upstream to 0.2 kb downstream of the TSS. Similarly, genes were considered to be H3K27me3-only if they were neither bivalent nor H3K4me3-only and if any isoform had regions of H3K27me3-only occurring in the +/− 0.2kb window. Analysis of enrichment for Gene Ontology terms was done using DAVID (Huang et al., 2009). The Gene Expression Omnibus accession number for the ChIIP-Seq data reported in this paper is GSE46396.
CAGE tag cluster locations in mm9 and tags per million (tpm) values for 22 tissue types were obtained from FANTOM 4 (Kawaji et al., 2009). To assess the statistical significance of the accumulation of CAGE tags, the probability pi of observing a CAGE tag within 200bp of each orphan bivalent domain was first estimated based on CAGE tag representation in the region Xi containing the bivalent domain extended 10kb in each direction. Let Yi be the indicator random variable that indicates whether the observed bivalent domain in Xi had a detectable CAGE tag cluster within 200 bp. Assuming that Yi are independent Bernoulli random variables with probability pi, the sum of Yi is approximately normal with mean Σipi and variance Σipi(1−pi), via the Lyapunov central limit theorem (see supplemental experimental procedures).
For each CAGE tag cluster, tissue specificity score was computed based on the Jensen-Shannon divergence between the relative abundance of tpm values across the tissue types and the extreme distribution of being expressed in only one tissue type where the tag cluster has the greatest expression value (Cabili et al., 2011).
MS conceived of the project. MRS and MS directed the project with participation from JSS. MS and MRS developed the study design with participation from KB and KTE. MS developed the low-cell ChIP protocol, performed ChIP and expression experiments and analyzed them. MS, KB, and KTE isolated PGCs. KTE conducted all FACS. MS generated ChIP-Seq libraries. Sequencing was done at UC DAVIS Expression Analysis Core. CO analyzed the majority of genome-wide data with aid and supervision from JSS. MS and MRS wrote the manuscript. All authors read, helped write, and edit the manuscript. We thank Jehnna Ronan for advice in developing the ChIP assay, members of the Santos lab for discussions and Robert Blelloch, Diana Laird, Marco Conti and Barbara Panning for critical reading of the manuscript. K.B. was the recipient of an NSF pre-doctoral fellowship. K.T.E. was the recipient of a CIRM post-doctoral training grant (TG2-01153). This work was funded by U54HD055764 to J.S.S. and an NIH New Innovator Award (DP2OD004698) to M. R.-S.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.