Master transcription factors Oct4, Sox2 and Nanog bind enhancer elements and recruit Mediator to activate much of the gene expression program of pluripotent embryonic stem cells (ESCs). We report here that the ESC master transcription factors form unusual enhancer domains at most genes that control the pluripotent state. These domains, which we call super-enhancers, consist of clusters of enhancers that are densely occupied by the master regulators and Mediator. Super-enhancers differ from typical enhancers in size, transcription factor density and content, ability to activate transcription, and sensitivity to perturbation. Reduced levels of Oct4 or Mediator cause preferential loss of expression of super-enhancer-associated genes relative to other genes, suggesting how changes in gene expression programs might be accomplished during development. In other more differentiated cells, super-enhancers containing cell type-specific master transcription factors are also found at genes that define cell identity. Super-enhancers thus play key roles in the control of mammalian cell identity.
Transcription factors typically regulate gene expression by binding cis-acting regulatory elements known as enhancers and recruiting coactivators and RNA Polymerase II (RNA Pol II) to target genes (Lelli et al., 2012; Ong and Corces, 2011). Enhancers are segments of DNA that are generally a few hundred base pairs in length and are typically occupied by multiple transcription factors (Carey, 1998; Levine and Tjian, 2003; Panne, 2008; Spitz and Furlong, 2012).
Much of the transcriptional control of mammalian development is due to the diverse activity of transcription factor-bound enhancers that control cell type-specific patterns of gene expression (Bulger and Groudine, 2011; Hawrylycz et al., 2012; Maston et al., 2006). Between 400,000 and 1.4 million putative enhancers have been identified in the mammalian genome by using a variety of high-throughput techniques that detect features of enhancers such as specific histone modifications (Bernstein et al., 2012; Thurman et al., 2012). The number of enhancers that are active in any one cell type has been estimated to be in the tens of thousands and enhancer activity is largely cell type-specific (Bernstein et al., 2012; Heintzman et al., 2009; Shen et al., 2012; Visel et al., 2009; Yip et al., 2012).
In embryonic stem cells (ESCs), control of the gene expression program that establishes and maintains ESC state is dependent on a remarkably small number of master transcription factors (Ng and Surani, 2011; Orkin and Hochedlinger, 2011; Young, 2011). These transcription factors, which include Oct4, Sox2 and Nanog, bind to enhancers together with the Mediator coactivator complex (Kagey et al., 2010). The Mediator complex facilitates the ability of enhancer-bound transcription factors to recruit RNA Pol II to the promoters of target genes (Borggrefe and Yue, 2011; Conaway and Conaway, 2011; Kornberg, 2005; Malik and Roeder, 2010) and is essential for maintenance of ESC state and embryonic development (Ito et al., 2000; Kagey et al., 2010; Risley et al., 2010).
ESCs are highly sensitive to reduced levels of Mediator. Indeed, reductions in the levels of many subunits of Mediator cause the same rapid loss of ESC-specific gene expression as loss of Oct4 and other master transcription factors (Kagey et al., 2010). It is unclear why reduced levels of Mediator, a general coactivator, can phenocopy the effects of reduced levels of Oct4 in ESCs.
Interest in understanding the importance of Mediator in ESCs led us to investigate further the enhancers bound by the master transcription factors and Mediator in these cells. We found that much of the enhancer-associated Mediator occupies exceptionally large enhancer domains and that these domains are associated with genes that play prominent roles in ESC biology. These large domains, or super-enhancers, were found to contain high levels of the key ESC transcription factors Oct4, Sox2, Nanog, Klf4 and Esrrb, to stimulate higher transcriptional activity than typical enhancers, and to be exceptionally sensitive to reduced levels of Mediator. Super-enhancers were found in a wide variety of differentiated cell types, again associated with key cell type-specific genes known to play prominent roles in control of their gene expression program. These results indicate that super-enhancers drive genes essential for cell identity in many mammalian cell types.
Previous studies have shown that co-occupancy of ESC genomic sites by the Oct4, Sox2 and Nanog transcription factors is highly predictive of enhancer activity (Chen et al., 2008) and that Mediator is typically associated with these sites (Kagey et al., 2010). We generated high-quality ChIP-Seq datasets for Oct4, Sox2 and Nanog (OSN) in murine ESCs and identified 8,794 sites that are co-occupied by these three transcription factors to annotate enhancers in ESCs (Table S1, Data S1). Inspection of enhancers at several genes that have prominent roles in ESC biology revealed an unusual feature: a large domain containing clusters of constituent enhancers (Figure 1). While the vast majority of enhancers spanned DNA segments of a few hundred base pairs (Figure 1A), some portions of the genome contained clusters of enhancers spanning as much as 50kb (Figure 1B). We found that ESC enhancers can be divided into two classes based on Mediator levels: one class comprised the vast majority of enhancers and the other encompassed 231 large enhancer domains (Figure 1C). Approximately 40% of the Mediator signal associated with enhancers was found in these 231 enhancer domains. The key features of the 231 domains containing high levels of Mediator, which we call super-enhancers, are 1) they span DNA regions whose median length is an order of magnitude larger than the typical enhancer and 2) they have levels of Mediator that are at least an order of magnitude greater than those at the typical enhancer (Figure 1D).
Further characterization of the ESC super-enhancer regions revealed that they contain many features of typical enhancers but at a considerably larger scale (Figure 1D, Figure S1A). Previous studies have shown that nucleosomes with the histone modifications H3K27ac and H3K4me1 are enriched at active enhancers (Creyghton et al., 2010; Rada-Iglesias et al., 2011). Based on ChIP-Seq data, the levels of these histone modifications at the super-enhancers exceed those at the typical enhancers by at least an order of magnitude (Figure 1D). These high levels of histone modifications are due both to the size of the domain and the density of occupancy at constituent enhancers (Figure 1D). Similar results were obtained for DNase I hypersensitivity (Figure 1D, Figure S1A), another feature of enhancers (Dunham et al., 2012). We compared the relative ability of ChIP-Seq data for OSN, Mediator, H3K27ac, H3K4me1, as well as DNaseI hypersensitivity data to distinguish super-enhancers from typical enhancers (Extended Experimental Procedures and Figure S1B). We found that Mediator performed optimally, although each of these enhancer features could be used to some degree to distinguish super-enhancers from typical enhancers (Figure 1E, Figure S1B).
To investigate whether the super-enhancers have features that might further distinguish them from typical enhancers, we examined ChIP-Seq data for 18 different transcription factors, histone modifications, chromatin regulators, as well as DNaseI hypersensitivity (Table S2). The most striking difference was in the occupancy of transcription factors Klf4 and Esrrb (Figure 1F–H). While the levels of Oct4, Sox2 and Nanog were similar in constituent enhancers within typical enhancers and super-enhancers (p-val.= 0.012, 10^−4, and 0.11, respectively), the levels of Klf4 and Esrrb showed considerably higher occupancy at the constituent enhancers of super-enhancer domains (p-val.<10^−34 and 10^−25, respectively) (Figure 1G,H). Thus, super-enhancers are not simply clusters of typical enhancers, but are particularly enriched in Klf4 and Esrrb, which have previously been shown to play important roles in the ESC gene expression program and in reprogramming of somatic cells to induced pluripotent stem (iPS) cells (Feng et al., 2009; Festuccia et al., 2012; Jiang et al., 2008; Martello et al., 2012; Percharde et al., 2012; Takahashi and Yamanaka, 2006).
To gain additional insights into the mechanisms involved in super-enhancer formation, we studied the frequency of known transcription factor binding motifs in these and other regions of the genome. We found that constituent enhancers within super-enhancer regions were significantly enriched for sequence motifs bound by Oct4, Sox2, Nanog, Klf4 and Esrrb, but not for motifs bound by other transcription factors expressed in ESCs such as CTCF and c-Myc (Figure 1I). The sequence motifs for Oct4, Sox2 and Nanog showed similar levels of enrichment at typical enhancers and constituent enhancers within super-enhancer domains, but motifs for Klf4 and Esrrb were significantly enriched in the constituent enhancers within super-enhancers (p-val.<10^−45) (Figure 1J). These data indicate that ESC super-enhancers are large clusters of enhancers that can be distinguished from typical enhancers by the presence of the transcription factors Klf4 and Esrrb and exceptional levels of Mediator, and indicate that these domains are formed as a consequence of binding of specific master transcription factors to dense clusters of their binding site sequences.
Enhancers tend to loop to and associate with adjacent genes in order to activate their transcription (Ong and Corces, 2011). Most of these interactions occur within a distance of ~50kb of the enhancer, although many can occur at greater distances up to several megabases (Sanyal et al., 2012). Previous studies have utilized various methods to assign enhancers to their target genes, including proximity, enhancer-promoter unit assignments (EPUs), and genome-wide interactions discovered by chromosome conformation capture techniques (Dixon et al., 2012; Shen et al., 2012; Whyte et al., 2012). We initially used proximity to assign 231 super-enhancers to 210 genes (Table S1), because the super-enhancers tend to overlap the genes to which they were associated. These super-enhancer proximity assignments were highly consistent (95% agreement) with EPU assignments (Table S3). In addition, 93% of the super-enhancer-promoter pairs identified by proximity occur within the same topological domains defined by Hi-C (Figure 2A, Table S3). Furthermore, for three of these genes (Oct4, Nanog, and Lefty1), interactions between portions of the super-enhancer and the target promoter were previously demonstrated using chromatin conformation capture (3C)(Kagey et al., 2010).
The set of super-enhancer-associated genes contained nearly all genes that have been implicated in control of ESC identity (Table S1). They included genes encoding the master ESC transcription factors Oct4, Sox2 and Nanog (Figure 2B, Table S1). They also included genes encoding most other transcription factors implicated in control of ESC identity, as well as genes encoding DNA-modifying enzymes and miRNAs that feature prominently in the control of the ESC gene expression program (Figure 2C). For example, Klf4 and Esrrb play important roles in ESC biology and can facilitate reprogramming (Feng et al., 2009; Festuccia et al., 2012; Jiang et al., 2008; Martello et al., 2012; Percharde et al., 2012; Takahashi and Yamanaka, 2006). The products of the Tet genes are associated with most active promoters and are responsible for global 5-hydroxymethylation of DNA in ESCs (Wu et al., 2011; Yu et al., 2012a). The miR-290-295 locus produces the most abundant miRNAs in ESCs (Calabrese et al., 2007) and is essential for embryonic survival (Medeiros et al., 2011).
Previous studies have identified genes encoding a broad range of transcription factors, coactivators and chromatin regulators that are necessary for maintenance of the ESC state (Ding et al., 2009; Fazzio et al., 2008; Hu et al., 2009; Kagey et al., 2010). To further investigate the extent to which super-enhancer-associated genes are involved in control of ESC state, we compared the set of super-enhancer-associated genes to the genes in a short hairpin RNA (shRNA) knockdown screen involving 2,000 regulators, which included most transcription factors and chromatin regulators encoded in the mouse genome (Kagey et al., 2010). We found that the majority of genes encoding transcription factors, coactivators and chromatin regulators whose knockdown most profoundly caused loss of ESC state are associated with super-enhancers (p-val.<10^−2) (Figure 2D). This further supports the notion that super-enhancer-associated genes encode many regulators that are key to establishing and maintaining ESC state.
Genes encoding transcription factors were the predominant class of super-enhancer-associated genes based on analysis of gene ontology functional categories (Figure 2E). In contrast, super-enhancers were not found to be associated with housekeeping genes (Figure S2). The ESC master transcription factors Oct4, Sox2 and Nanog have previously been shown to form an inter-connected autoregulatory loop, where all three factors bind as a group to the promoters of each of their own genes and form the core regulatory circuitry of ESCs (Boyer et al., 2005; Loh et al., 2006). The discovery of Klf4 and Esrrb at super-enhancers, and evidence that Klf4 and Esrrb play important roles in the ESC gene expression program and in reprogramming of somatic cells to iPS cells (Feng et al., 2009; Takahashi and Yamanaka, 2006) suggest that this autoregulatory loop should be expanded to include Klf4 and Esrrb (Figure 2F).
Super-enhancer-associated genes are generally expressed at higher levels than genes associated with typical enhancers (p-val.<10^−5) (Figure 3A, Figure S3A, Table S4), suggesting super-enhancers drive high level expression of their target genes. To test whether super-enhancers confer stronger enhancer activity than typical ESC enhancers, we cloned DNA fragments from these elements into luciferase reporter constructs that were subsequently transfected into ESCs. Constituent enhancer segments within the super-enhancers, defined as a 600–1,400 base pair region with a single peak of Oct4/Sox2/Nanog occupancy, generated higher luciferase activity relative to single peaks from typical enhancers (3.8 fold higher; p-val.= 0.02) (Figure 3B). These results are consistent with the idea that super-enhancers and their components help drive high levels of transcription of the key genes that control ESC identity.
To obtain clues to the factors that contribute to the higher activity of individual enhancer elements within super-enhancers, we determined whether the levels of particular transcription factors at the enhancer elements, based on ChIP-Seq data for the genomic locus, correlated with the levels of luciferase activity in the reporter assays. The presence of Klf4 and Esrrb was correlated with high levels of luciferase activity (Figure S3B). Thus, Klf4 and Esrrb, which are especially enriched in super-enhancers (Figure 1G), may contribute to the superior activity of the enhancer elements from super-enhancers in these reporter assays.
We next investigated whether the functional attributes of super-enhancers might account for the observation that reduced levels of either Oct4 or Mediator have very similar effects on the ESC gene expression program and cause the same rapid loss of ESC state (Kagey et al., 2010). Enhancers typically function through cooperative and synergistic interactions between multiple transcription factors and coactivators (Carey, 1998; Carey et al., 1990; Giese et al., 1995; Kim and Maniatis, 1997; Thanos and Maniatis, 1995). The transcriptional output of enhancers with large numbers of transcription factor binding sites can be more sensitive to changes in transcription factor concentration than that of enhancers with smaller numbers of binding sites (Giniger and Ptashne, 1988; Griggs and Johnston, 1991). We therefore hypothesized that super-enhancer-associated genes may be more sensitive to perturbations in the levels of enhancer-binding factors than genes associated with typical enhancers. We carried out two tests of this model.
In ESCs, reducing the levels of Oct4 leads to loss of ESC-specific gene expression and differentiation. If super-enhancer-associated genes are more sensitive to loss of master transcription factors than other genes, then a reduction in Oct4 levels should cause a preferential loss of super-enhancer-associated gene expression. To test this idea, we reduced the levels of Oct4 transcription using shRNAs, which leads to activation of the trophectoderm master transcription factor Cdx2 and cellular differentiation (Figure 3C) (Deb et al., 2006; Niwa et al., 2005; Strumpf et al., 2005). Oct4 depletion results in changes in cellular morphology consistent with ESC differentiation by 5 days (Figure S3C). We analyzed gene expression 3, 4 and 5 days after Oct4 depletion, and observed that super-enhancer-associated genes suffered an earlier and more profound reduction in the levels of transcripts than those associated with typical enhancers (p-val.<10^−5, 10^−8, and 10^−10, respectively) (Figure 3C). These results indicate that the transcriptional output of ESC super-enhancer-associated genes is rapidly and preferentially reduced during differentiation.
If super-enhancer-associated genes are more sensitive to loss of coactivators than other genes, then a reduction in levels of Mediator subunits should preferentially affect expression of super-enhancer-associated genes. When the levels of Mediator were reduced using shRNAs in ESCs, the most pronounced effects on gene expression were observed at super-enhancer-associated genes (p-val.<10^−11, 10^−11, and 10^−13, respectively) (Figure 3D). In summary, these results indicate that reducing the levels of Oct4 or Mediator leads to more profound effects on expression of super-enhancer-associated genes than on other active genes with typical enhancers. These results may thus account for the observation that loss of Oct4 and loss of Mediator subunits have similar effects on ESC state (Kagey et al., 2010).
We investigated whether the super-enhancers found in ESCs had similar counterparts in differentiated cells. We annotated 13,814 enhancers using ChIP-Seq data for the master transcription factor PU.1 in murine progenitor B (pro-B) cells (Table S5)(DeKoter and Singh, 2000; Nutt and Kee, 2007). Previous studies have shown that occupancy of pro-B genomic sites by PU.1 is predictive of enhancer activity (Abujarour et al., 2010; Wlodarski et al., 2007). We found that genome-wide occupancy of the master transcription factor PU.1 and Mediator were highly correlated (Figure 4A, Figure S4). When the levels of Mediator were plotted against enhancers ranked by ChIP-Seq signal, the enhancers in these cells fell into two classes, as was observed for ESCs (Figure 4B). The pro-B cells had 395 large domains that shared key characteristics with the super-enhancers found in ESCs: they spanned DNA domains whose median length is an order of magnitude larger than the typical enhancer, and they had levels of Mediator that are at least an order of magnitude greater than those at the typical enhancer (Figure 4C). Nearly 40% of all Mediator signal observed at enhancers was associated with the super-enhancer domains in pro-B cells.
We studied the frequency of DNA sequences bound by pro-B transcription factors in super-enhancers and in other regions of the genome. Constituent enhancers within super-enhancer regions were significantly enriched for clusters of sequence motifs bound by PU.1, as well as for a set of other transcription factors that have been implicated in control of B cells (Figure 4D,E). The transcription factors with sequence motif enrichment in the super-enhancer domains included Ebf1, E2A and Foxo1, which have previously been shown to be important for control of B cells (Lin et al., 2010). The sequence motif for E2A was significantly more enriched at super-enhancer constituents relative to typical enhancer constituents (p-val.<10^−22) (Figure 4E). E2A is essential for pro-B cell development during B cell lymphopoiesis (Kwon et al., 2008). These findings are consistent with those obtained for ESCs, where DNA sequence motifs for the master transcription factors were enriched in closely spaced clusters.
We next identified genes associated with super-enhancers in pro-B cells and found that many of these are prominent regulators of B cell identity (Figure 4F). For example, super-enhancer-associated genes in pro-B cells included Foxo1 and Inpp5d. In common lymphoid progenitors, Foxo1 acts in concert with Ebf1 to specify B-cell fate as part of a positive feedback loop (Mansson et al., 2012), while the lipid metabolizing enzyme encoded by Inpp5d, SHIP1, dephosphorylates proteins to regulate the B-cell antigen receptor (BCR) signaling response (Alinikula et al., 2010). As in ESCs, the genes associated with super-enhancers in pro-B cells were expressed at higher levels than those associated with typical enhancers (p-val.<10^−6) (Figure 4G, Table S5).
To further investigate whether super-enhancers are a general feature of mammalian cells, we extended the study of these elements to a range of other cell types where the key transcription factors that control cell state are well defined (Figure 5). We found that the master transcription factors of mouse myotubes (MyoD), T helper (Th) cells (T-Bet) and macrophages (C/EBPα) also bind large domains with clusters of enhancers (Figure S5A,B), and these large domains are associated with genes that feature predominantly in the biology of these cells (Figure 5A, Figure S5C, Table S6). In myotubes, for example, a super-enhancer is associated with the gene encoding MyoD, which is a master regulator of skeletal muscle and the first factor shown to reprogram fibroblasts into muscle cells (Tapscott, 2005; Weintraub et al., 1989). In Th cells, a super-enhancer is associated with the gene Tcf7 that encodes T cell factor 1 (Tcf-1), which is critical for the production of T cells during hematopoiesis (Staal and Sen, 2008; Xue and Zhao, 2012; Yu et al., 2012b). In macrophages, a super-enhancer is associated with the gene encoding the extracellular matrix glycoprotein Thbs-1, which is involved in scavenger recognition of apoptotic cells by macrophages (Savill et al., 1992). These results support the notion that the key transcription factors controlling cell state bind to clusters of enhancers that are associated with specific genes that are key to cell identity.
The set of enhancers that are bound by transcription factors and control transcription in any one cell type can promote expression of both cell type-specific genes and genes that are active in multiple cell types (Bernstein et al., 2012; Shen et al., 2012; Yip et al., 2012). The super-enhancer elements identified in ESCs, pro-B cells, myotubes, Th cells and macrophages spanned domains that were almost entirely cell type-specific (Figure 5A, Figure S5D) and the genes associated with these elements were highly cell type-specific relative to typical enhancer-associated genes (Figure 5B,C). These results are consistent with the idea that super-enhancers are formed by the binding of key transcription factors to clusters of binding sites that are associated with genes controlling unique cellular identities.
If super-enhancers generally form at genes whose functions are associated with cell identity, we might expect super-enhancer-associated genes to be defining of cell type. When gene ontology analysis was conducted using the set of genes associated with super-enhancers in each cell type, we found that the top 10 most significant biological process terms obtained for each cell type were remarkably descriptive of each cell's specific function (Figure 5D). This result suggests that super-enhancer-associated genes may be valuable biomarkers for cell identity.
We have identified exceptionally large enhancer domains that are occupied by master transcription factors and associated with genes encoding key regulators of cell identity. In ESCs, these super-enhancers consist of clusters of enhancer elements that are formed by the binding of key transcription factors and the Mediator coactivator complex. The ESC super-enhancers differ from typical enhancers in size, transcription factor density and content, ability to activate transcription and sensitivity to perturbation. Super-enhancers are found in a wide variety of other cell types, where they are associated with key cell type-specific genes known to play prominent roles in their biology. These results implicate super-enhancers in the control of mammalian cell identity.
Super-enhancer formation appears to occur as a consequence of binding of large amounts of master transcription factors to clusters of DNA sequences that are relatively abundant across these large domains. The ESC transcription factors Oct4, Sox2, Nanog, Klf4 and Esrrb have DNA binding motifs that are enriched in super-enhancer domains. Super-enhancers are not simply clusters of typical enhancers, but are particularly enriched in Klf4 and Esrrb, which have previously been shown to play important roles in the ESC gene expression program and in reprogramming of somatic cells to iPS cells (Feng et al., 2009; Festuccia et al., 2012; Jiang et al., 2008; Martello et al., 2012; Percharde et al., 2012; Takahashi and Yamanaka, 2006). Furthermore, super-enhancer-associated genes are highly sensitive to reduced levels of enhancer-bound factors and cofactors. We speculate that the signals that naturally cause ESCs to differentiate may exploit this sensitivity of super-enhancer-associated genes to facilitate transitions to new gene expression programs.
Remarkably, the genes encoding the ESC master transcription factors are themselves driven by super-enhancers, forming a feedback loop where the key transcription factors regulate their own expression (Figure 2F). Earlier studies identified a portion of this interconnected autoregulatory loop, consisting of the genes encoding Oct4, Sox2 and Nanog, but were unaware of the unusual enhancer structure associated with genes in this regulatory loop (Boyer et al., 2005; Loh et al., 2006). The formation of super-enhancers at these genes is also of interest because it suggests that super-enhancers may generally identify genes that are important for control of cell identity and, in some cases, capable of reprogramming cell fate. Indeed, we found evidence for super-enhancers associated with genes that control cell identity in a wide range of cell types and some of these genes do encode factors that have been demonstrated to reprogram cell fate.
We found that super-enhancers can be identified by searching for clusters of binding sites for enhancer-binding transcription factors, and they can be distinguished from typical enhancers by occupancy of cofactors or enhancer-associated surrogate marks such as histone H3K27ac or DNaseI hypersensitivity. Previous studies have noted that many different ESC transcription factors can bind to sites called multiple transcription factor-binding loci (Chen et al., 2008; Kim et al., 2008), but these loci differ from super-enhancers and are associated with different genes. Other studies have also identified large genomic domains involved in gene control, but have not noted that genes encoding the key regulators of cell state are generally driven by super-enhancers. For example, large control regions with clusters of transcription factor binding sites or DNaseI hypersensitivity sites have been described for the IgH enhancer (~20kb), the Th cell receptor (~11.5kb), the β-globin enhancer (~16kb) and others (Diaz et al., 1994; Forrester et al., 1990; Grosveld et al., 1987; Madisen and Groudine, 1994; Michaelson et al., 1995; Orkin, 1990). It is possible that previous studies did not note large domains of enhancer activity associated with key cell identity genes because most existing algorithms typically seek evidence for factor binding or DNaseI hypersensitivity within small regions of the genome. There are, however, algorithms that are designed to identify large domains (Ernst and Kellis, 2010; Filion et al., 2010; Hon et al., 2008; Thurman et al., 2012), and the algorithm we describe here should be useful for further discovery of super-enhancers and other large domains.
The presence of super-enhancers at key cell identity genes provides new insights into transcriptional control of mammalian cells. The evidence described here indicates that mammalian genomes have evolved clusters of DNA sequences near genes encoding key drivers of cell state. These clusters are bound by a combination of key transcription factors to form cell type-specific super-enhancers and in this fashion control the gene expression programs associated with specific cell identities.
The concept of super-enhancers may facilitate mapping of the regulatory circuitry of many different cell types comprising mammals. Discovering how thousands of transcription factors co-operate to control gene expression programs in the vast number of cells in vertebrates is a highly complex undertaking. If only a few hundred super-enhancers dominate control of the key genes that establish and maintain cellular identity, however, it may be possible to create basic models that describe the key features of transcriptional control of cell state.
V6.5 murine ESCs were grown on irradiated murine embryonic fibroblasts (MEFs). Cells were grown under standard ESC conditions as described previously (Whyte et al., 2012). Cells were grown on 0.2% gelatinized (Sigma, G1890) tissue culture plates in ESC media: DMEM-KO (Invitrogen, 10829-018) supplemented with 15% fetal bovine serum (Hyclone, characterized SH3007103), 1000 U/ml LIF (ESGRO, ESG1106), 100 μM nonessential amino acids (Invitrogen, 11140-050), 2 mM L-glutamine (Invitrogen, 25030-081), 100 U/ml penicillin, 100 μg/ml streptomycin (Invitrogen, 15140-122), and 8 nl/ml of 2-mercaptoethanol (Sigma, M7522).
ChIP was carried out as described previously (Boyer et al., 2005). Additional details are provided in the Extended Experimental Procedures. ChIP-Seq of Mediator was generated using a Med1 antibody (Bethyl Labs A300-793A, Lot #A300-793A-2).
Purified ChIP DNA was used to prepare Illumina multiplexed sequencing libraries. Libraries for Illumina sequencing were prepared following the Illumina TruSeq™ DNA Sample Preparation v2 kit protocol with exceptions described in the Extended Experimental Procedures.
A minimal Oct4 promoter was amplified from mouse genomic DNA and cloned into the XhoI and HindIII sites of the pGL3 basic vector (Promega). Enhancer fragments were subsequently cloned into the BamHI and SalI sites of the pGL3-pOct4 vector. The V6.5 murine ESCs were transfected using Lipofectamine 2000 (Invitrogen). The pRL-SV40 plasmid (Promega) was cotransfected as a normalization control. Cells were incubated for 24 hours, and luciferase activity was measured using the Dual-Luciferase Reporter Assay System (Promega). The genomic coordinates of the cloned fragments are found in Table S7.
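The normalization step in a dual-luciferase assay can be illustrated with a minimal Python sketch. It assumes that each well yields one firefly reading (from the pGL3-pOct4 construct) and one Renilla reading (from pRL-SV40), and that enhancer activity is reported as fold change over a promoter-only control; the control choice, the function names and the example readings are assumptions made for illustration and are not taken from the text.

```python
from statistics import mean


def relative_luciferase(firefly, renilla):
    """Normalize one well: firefly signal divided by the Renilla signal."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def fold_over_control(sample_wells, control_wells):
    """Mean normalized activity of the sample relative to the control.

    Each argument is a list of (firefly, renilla) readings, one per well.
    Using a promoter-only construct as the control is an assumption.
    """
    sample = mean(relative_luciferase(f, r) for f, r in sample_wells)
    control = mean(relative_luciferase(f, r) for f, r in control_wells)
    return sample / control


# Hypothetical readings (arbitrary light units), for illustration only.
enhancer_wells = [(1200, 100), (1300, 100)]
promoter_only_wells = [(250, 100), (250, 100)]
fold = fold_over_control(enhancer_wells, promoter_only_wells)
```

Dividing by the Renilla signal corrects for well-to-well differences in transfection efficiency before constructs are compared.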
All ChIP-Seq datasets were aligned using Bowtie (version 0.12.2) (Langmead et al., 2009) to build version MM9 of the mouse genome, or HG18 of the human genome. The GEO Accession ID for aligned and raw data is GSE44288 (www.ncbi.nlm.nih.gov/geo/). Datasets used in this manuscript can be found in Table S8.
We developed a simple method to calculate the normalized read density of a ChIP-Seq dataset in any region. ChIP-Seq reads aligning to the region were extended by 200 base pairs, and the density of reads per base pair (bp) was calculated. The density of reads in each region was normalized to the total number of million mapped reads, producing read density in units of reads per million mapped reads per base pair (rpm/bp).
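The calculation described above can be sketched in Python as follows. The sketch makes several assumptions that the text does not specify: reads are extended by 200 bp in the direction of their strand, "density" means the summed per-base coverage of the extended reads divided by the region length, and coordinates are half-open. The function name and the input layout are hypothetical.

```python
def normalized_read_density(reads, region, total_mapped_reads, extension=200):
    """Return read density in rpm/bp for one region.

    reads: iterable of (start, end, strand) tuples on the region's chromosome
    region: (start, end), half-open coordinates
    total_mapped_reads: number of mapped reads in the whole dataset
    """
    region_start, region_end = region
    region_length = region_end - region_start
    if region_length <= 0:
        raise ValueError("region must have positive length")

    covered_bases = 0
    for start, end, strand in reads:
        # Extend each read by `extension` bp along its strand (assumption).
        if strand == "+":
            end += extension
        else:
            start = max(0, start - extension)
        # Count only the part of the extended read that lies in the region.
        overlap = min(end, region_end) - max(start, region_start)
        if overlap > 0:
            covered_bases += overlap

    reads_per_bp = covered_bases / region_length
    # Normalize to millions of mapped reads to obtain rpm/bp.
    return reads_per_bp / (total_mapped_reads / 1_000_000)


# Two hypothetical 36 bp reads in a 1 kb region, 2 million mapped reads.
example_density = normalized_read_density(
    reads=[(100, 136, "+"), (900, 936, "-")],
    region=(0, 1000),
    total_mapped_reads=2_000_000,
)
```

Normalizing by sequencing depth makes densities comparable between datasets with different numbers of mapped reads.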
We used the MACS version 1.4.1 (Model based analysis of ChIP-Seq) (Zhang et al., 2008) peak finding algorithm to identify regions of ChIP-Seq enrichment over background. A p-value threshold of enrichment of 10^−9 was used for all datasets.
Enhancers were defined as regions of ChIP-Seq enrichment for transcription factor(s). In order to accurately capture dense clusters of enhancers, we allowed regions within 12.5kb of one another to be stitched together.
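The stitching step can be sketched as an interval merge. This is a minimal illustration, assuming that "within 12.5kb" refers to the gap between the end of one region and the start of the next on the same chromosome and that the threshold is inclusive; the function name and the tuple layout are hypothetical.

```python
def stitch_regions(regions, max_gap=12_500):
    """Merge regions on the same chromosome separated by at most max_gap bp.

    regions: iterable of (chrom, start, end) tuples
    Returns a sorted list of stitched (chrom, start, end) tuples.
    """
    stitched = []
    for chrom, start, end in sorted(regions):
        if stitched:
            prev_chrom, prev_start, prev_end = stitched[-1]
            if chrom == prev_chrom and start - prev_end <= max_gap:
                # Close enough to the previous region: extend it.
                stitched[-1] = (prev_chrom, prev_start, max(prev_end, end))
                continue
        stitched.append((chrom, start, end))
    return stitched


# Hypothetical enriched regions: the first two lie 8.5 kb apart and are
# stitched; the third is 29.4 kb further away and stays separate.
example_regions = [
    ("chr1", 1_000, 1_500),
    ("chr1", 10_000, 10_600),
    ("chr1", 40_000, 40_500),
    ("chr2", 2_000, 2_500),
]
example_stitched = stitch_regions(example_regions)
```

Because the merged region keeps growing as neighbours are added, a chain of closely spaced enhancers collapses into one large domain even when its first and last members are far more than 12.5 kb apart.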
The methods for identifying and characterizing super-enhancers, as well as assignment of enhancers to genes, are fully described in the Extended Experimental Procedures.
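The main text does not state how the boundary between typical enhancers and super-enhancers was drawn on the ranked Mediator signal curve. As a rough illustration only, the sketch below uses one plausible heuristic, assumed here and not confirmed by the text: after scaling rank and signal to the range 0 to 1, the cutoff is placed where a line of slope 1 is tangent to the curve, which is the point that minimizes scaled signal minus scaled rank. The function name and the example values are hypothetical.

```python
def split_by_ranked_signal(signals):
    """Split enhancer signals into (cutoff, typical, supers).

    The tangent-of-slope-1 cutoff used here is an assumed heuristic,
    not a method taken from the text.
    """
    ordered = sorted(signals)
    n = len(ordered)
    if n < 2 or ordered[0] == ordered[-1]:
        raise ValueError("need at least two distinct signal values")

    lowest, highest = ordered[0], ordered[-1]
    span = highest - lowest

    def distance_below_diagonal(i):
        scaled_signal = (ordered[i] - lowest) / span
        scaled_rank = i / (n - 1)
        return scaled_signal - scaled_rank

    # The tangent point of a slope-1 line lies where the scaled curve is
    # furthest below the diagonal.
    cutoff_index = min(range(n), key=distance_below_diagonal)
    cutoff = ordered[cutoff_index]
    typical = [s for s in ordered if s <= cutoff]
    supers = [s for s in ordered if s > cutoff]
    return cutoff, typical, supers


# Hypothetical signal values: most regions are low, a few are very high.
example_cutoff, example_typical, example_supers = split_by_ranked_signal(
    [1, 1, 2, 2, 3, 3, 4, 5, 50, 100]
)
```

With a strongly skewed signal distribution, a heuristic of this kind separates a small set of high-signal domains from the bulk of the regions without a hand-picked threshold.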
We thank Tom Volkert, Jennifer Love, Sumeet Gupta, and Jeong-Ah Kwon at the Whitehead Genome Technologies Core for Solexa sequencing; Lee M. Lawton, Jessica Reddy, Ana D’Alessio and Jasmine M. De Cock for experimental assistance; and Alla A. Sigova, Alan C. Mullen, Roshan M. Kumar and members of the Young lab for helpful discussion. This work was supported by the National Institutes of Health grants HG002668 (R.A.Y.) and CA146445 (R.A.Y., T.L.). R.A.Y. is a founder, and D.A.O. and P.B.R. have become employees, of Syros Pharmaceuticals.