The transcription factors OCT4, SOX2, and NANOG have essential roles in early development and are required for the propagation of undifferentiated embryonic stem (ES) cells in culture. To gain insights into transcriptional regulation of human ES cells, we have identified OCT4, SOX2, and NANOG target genes using genome-scale location analysis. We found, surprisingly, that OCT4, SOX2, and NANOG co-occupy a substantial portion of their target genes. These target genes frequently encode transcription factors, many of which are developmentally important homeodomain proteins. Our data also indicate that OCT4, SOX2, and NANOG collaborate to form regulatory circuitry consisting of autoregulatory and feedforward loops. These results provide new insights into the transcriptional regulation of stem cells and reveal how OCT4, SOX2, and NANOG contribute to pluripotency and self-renewal.
Mammalian development requires the specification of over 200 unique cell types from a single totipotent cell. Embryonic stem (ES) cells are derived from the inner cell mass (ICM) of the developing blastocyst and can be propagated in culture in an undifferentiated state while maintaining the capacity to generate any cell type in the body. The recent derivation of human ES cells provides a unique opportunity to study early development and is thought to hold great promise for regenerative medicine (Pera and Trounson, 2004; Reubinoff et al., 2000; Thomson et al., 1998). An understanding of the transcriptional regulatory circuitry that is responsible for pluripotency and self-renewal in human ES cells is fundamental to understanding human development and realizing the therapeutic potential of these cells.
Homeodomain transcription factors are evolutionarily conserved and play key roles in cell-fate specification in many organisms (Hombria and Lovegrove, 2003). Two such factors, OCT4/POU5F1 and NANOG, are essential regulators of early development and ES cell identity (Chambers et al., 2003; Hay et al., 2004; Matin et al., 2004; Mitsui et al., 2003; Nichols et al., 1998; Zaehres et al., 2005). Several genetic studies in mouse suggest that these regulators have distinct roles but may function in related pathways to maintain the developmental potential of these cells (Chambers, 2004). For example, disruption of OCT4 or NANOG results in the inappropriate differentiation of ICM and ES cells to trophectoderm and extra-embryonic endoderm, respectively (Chambers et al., 2003; Mitsui et al., 2003; Nichols et al., 1998). However, overexpression of OCT4 in ES cells leads to a phenotype that is similar to loss of NANOG function (Chambers et al., 2003; Mitsui et al., 2003; Nichols et al., 1998; Niwa et al., 2000). Knowledge of the set of genes regulated by these two transcription factors might reveal why manipulation of OCT4 and NANOG results in these phenotypic consequences.
OCT4 is known to interact with other transcription factors to activate and repress gene expression in mouse ES cells (Pesce and Schöler, 2001). For example, OCT4, a member of the POU (PIT/OCT/UNC) class of homeodomain proteins, can heterodimerize with the HMG-box transcription factor, SOX2, to affect the expression of several genes in mouse ES cells (Botquin et al., 1998; Nishimoto et al., 1999; Yuan et al., 1995). The cooperative interaction of POU homeodomain and HMG factors is thought to be a fundamental mechanism for the developmental control of gene expression (Dailey and Basilico, 2001). The extent to which ES cell gene regulation is accomplished by OCT4 through an OCT4/SOX2 complex and whether NANOG has a role in this process are unknown.
OCT4, SOX2, and NANOG are thought to be central to the transcriptional regulatory hierarchy that specifies ES cell identity because of their unique expression patterns and their essential roles during early development (Avilion et al., 2003; Chambers et al., 2003; Hart et al., 2004; Lee et al., 2004; Mitsui et al., 2003; Nichols et al., 1998; Schöler et al., 1990). Studies in a broad range of eukaryotes have shown that transcriptional regulators that have key roles in cellular processes frequently regulate other regulators associated with that process (Guenther et al., 2005; Lee et al., 2002; Odom et al., 2004). It is likely that the key stem cell regulators bind and regulate genes encoding other transcriptional regulators, which in turn determine the developmental potential of these cells, but we currently lack substantial knowledge of the regulatory circuitry of ES cells and other vertebrate cells.
To further our understanding of the means by which OCT4, SOX2, and NANOG control the pluripotency and self-renewal of human ES cells, we have used genome-scale location analysis (chromatin immunoprecipitation coupled with DNA microarrays) to identify the target genes of all three regulators in vivo. The results reveal that OCT4, SOX2, and NANOG co-occupy the promoters of a large population of genes, that many of these target genes encode developmentally important homeodomain transcription factors, and that these regulators contribute to specialized regulatory circuits in ES cells.
DNA sequences occupied by OCT4 in human H9 ES cells (NIH code WA09; Supplemental Data) were identified in a replicate set of experiments using chromatin immunoprecipitation (ChIP) combined with DNA microarrays (Figure 1A and Supplemental Data). For this purpose, DNA microarrays were designed that contain 60-mer oligonucleotide probes covering the region from −8 kb to +2 kb relative to the transcript start sites for 17,917 annotated human genes. Although some transcription factors are known to regulate genes from distances greater than 8 kb, 98% of known binding sites for human transcription factors occur within 8 kb of target genes (Figure S1). The sites occupied by OCT4 were identified as peaks of ChIP-enriched DNA that span closely neighboring probes (Figure 1B). OCT4 was associated with 623 (3%) of the promoter regions for known protein-coding genes and 5 (3%) of the promoters for known miRNA genes in human ES cells (Table S2).
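The array design and the reported percentages can be illustrated with a minimal Python sketch. It assumes a simple strand-aware coordinate convention; the function names `promoter_window` and `percent_bound` are hypothetical and are not part of the published analysis pipeline.

```python
from typing import NamedTuple

UPSTREAM_BP = 8_000    # probes cover 8 kb upstream of the transcript start site
DOWNSTREAM_BP = 2_000  # and 2 kb downstream


class Window(NamedTuple):
    start: int
    end: int


def promoter_window(tss: int, strand: str) -> Window:
    """Genomic interval spanning -8 kb to +2 kb relative to a transcript start site.

    "Upstream" depends on the strand, so the window is mirrored for genes
    on the minus strand. Coordinates are clamped at 0 (an assumption).
    """
    if strand == "+":
        return Window(max(tss - UPSTREAM_BP, 0), tss + DOWNSTREAM_BP)
    if strand == "-":
        return Window(max(tss - DOWNSTREAM_BP, 0), tss + UPSTREAM_BP)
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def percent_bound(n_bound: int, n_total: int) -> float:
    """Percentage of arrayed promoters occupied by a factor."""
    return 100.0 * n_bound / n_total


if __name__ == "__main__":
    print(promoter_window(100_000, "+"))
    print(promoter_window(100_000, "-"))
    # Bound-promoter counts reported in the text, out of 17,917 arrayed genes.
    for factor, n in (("OCT4", 623), ("SOX2", 1271), ("NANOG", 1687)):
        print(factor, round(percent_bound(n, 17_917)))
```

With the counts reported in the text, the rounded percentages reproduce the 3%, 7%, and 9% figures quoted for OCT4, SOX2, and NANOG.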
Two lines of evidence suggested that this protein-DNA interaction dataset is of high quality. First, the genes occupied by OCT4 in our analysis included many previously identified or supposed target genes in mouse ES cells or genes whose transcripts are highly enriched in ES cells, including OCT4, SOX2, NANOG, LEFTY2/EBAF, CDX2, HAND1, DPPA4, GJA1/CONNEXIN43, FOXO1A, CRIPTO/TDGF1, and ZIC3 (Abeyta et al., 2004; Brandenberger et al., 2004; Catena et al., 2004; Kuroda et al., 2005; Niwa, 2001; Okumura-Nakanishi et al., 2005; Rodda et al., 2005; Sato et al., 2003; Wei et al., 2005) (Table S2). Second, we have used improved protocols and DNA microarray technology in these experiments (Supplemental Data) that should reduce false positive rates relative to those obtained in previous genome-scale experiments (Odom et al., 2004). By using this new technology with yeast transcription factors, where considerable prior knowledge of transcription factor binding sites has been established, we estimated that this platform has a false positive rate of <1% and a false negative rate of 20% (Supplemental Data).
We next identified, with location analysis, protein-coding and miRNA genes targeted by the stem cell regulators SOX2 and NANOG. SOX2 and NANOG were found associated with 1271 (7%) and 1687 (9%), respectively, of the promoter regions for known protein-coding genes in human ES cells (Tables S2–S4). It was immediately evident that many of the target genes were shared by OCT4, SOX2, and NANOG (Figure 2A). Examples of protein-coding genes that are co-occupied by the three regulators are shown in Figure 2B (Table S5). Control experiments showed that the set of promoters bound by the cell-cycle transcription factor E2F4 in these human ES cells did not overlap substantially with those bound by the three stem cell regulators (Tables S2 and S6). We found that OCT4, SOX2, and NANOG together occupy at least 353 genes in human ES cells.
Previous studies in murine ES cells have shown that SOX2 and OCT4 can interact to synergistically activate transcription of target genes and that this activity is dependent upon the juxtaposition of OCT4 and SOX2 binding sites (Ambrosetti et al., 1997; Remenyi et al., 2004). Our results revealed that approximately half of the promoter regions occupied by OCT4 were also bound by SOX2 in human ES cells (Figure 2A; Table S2). It was surprising, however, to find that >90% of promoter regions bound by both OCT4 and SOX2 were also occupied by NANOG. Furthermore, we found that OCT4, SOX2, and NANOG binding sites occurred in close proximity at nearly all of the genes that they co-occupied (Figure 2C). These data suggest that OCT4, SOX2, and NANOG function together to regulate a significant proportion of their target genes in human ES cells.
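The overlap statistics above amount to set intersections over the bound-gene lists. The following Python sketch shows the calculation on placeholder gene identifiers (`G1` to `G7`), which are purely illustrative and are not drawn from Tables S2 to S5; the function name `overlap_summary` is likewise hypothetical.

```python
def overlap_summary(oct4: set, sox2: set, nanog: set) -> dict:
    """Summarize promoter co-occupancy among three bound-gene sets."""
    oct4_sox2 = oct4 & sox2
    all_three = oct4_sox2 & nanog
    return {
        # Fraction of OCT4-bound promoters that are also bound by SOX2.
        "oct4_also_sox2": len(oct4_sox2) / len(oct4) if oct4 else 0.0,
        # Fraction of OCT4/SOX2 co-bound promoters that also carry NANOG.
        "oct4_sox2_also_nanog": len(all_three) / len(oct4_sox2) if oct4_sox2 else 0.0,
        "co_occupied": sorted(all_three),
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real data.
    oct4 = {"G1", "G2", "G3", "G4"}
    sox2 = {"G1", "G2", "G5"}
    nanog = {"G1", "G2", "G6", "G7"}
    print(overlap_summary(oct4, sox2, nanog))
```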
A class of small noncoding RNAs known as microRNAs (miRNAs) plays vital roles in gene regulation, and recent studies indicate that more than a third of mammalian protein-coding genes are conserved miRNA targets (Bartel, 2004; Lewis et al., 2005). ES cells lacking the machinery that processes miRNA transcripts are unable to differentiate (Kanellopoulou et al., 2005). Moreover, recent evidence indicates that miRNAs play an important role in organismal development through regulation of gene expression (Pasquinelli et al., 2005). OCT4, SOX2, and NANOG were found associated with 14 miRNA genes and co-occupied the promoters of at least two miRNA genes, mir-137 and mir-301 (Table 1). Our results suggest that miRNA genes are likely regulated by OCT4, SOX2, and NANOG in human ES cells and are important components of the transcriptional regulatory circuitry in these cells.
OCT4 and SOX2 are known to be involved in both gene activation and repression in vivo (Botquin et al., 1998; Nishimoto et al., 1999; Yuan et al., 1995), so we sought to identify the transcriptional state of genes occupied by the stem cell regulators. To this end, the set of genes bound by OCT4, SOX2, and NANOG was compared to gene expression datasets generated from multiple ES cell lines (Abeyta et al., 2004; Brandenberger et al., 2004; Sato et al., 2003; Wei et al., 2005) to identify transcriptionally active and inactive genes (Table S2). The results showed that one or more of the stem cell transcription factors occupied 1303 actively transcribed genes and 957 inactive genes.
The importance of OCT4, SOX2, and NANOG for early development and ES cell identity led us to focus additional analyses on the set of 353 genes that are co-occupied by these regulators in human ES cells (Table S5). We first identified transcriptionally active genes. Transcripts were consistently detected in ES cells for approximately half of the genes co-bound by OCT4, SOX2, and NANOG. Among these active genes, several encoding transcription factors (e.g., OCT4, SOX2, NANOG, STAT3, ZIC3) and components of the Tgf-β (e.g., TDGF1, LEFTY2/EBAF) and Wnt (e.g., DKK1, FRAT2) signaling pathways were notable targets. Recent studies have shown that Tgf-β and Wnt signaling play a role in pluripotency and self-renewal in both mouse and human ES cells (James et al., 2005; Sato et al., 2004). These observations suggest that OCT4, SOX2, and NANOG promote pluripotency and self-renewal through positive regulation of their own genes and genes encoding components of these key signaling pathways.
Among transcriptionally inactive genes co-occupied by OCT4, SOX2, and NANOG, we noted a striking enrichment for transcription factor genes (p < 10^−18; Table S7), many of which have been implicated in developmental processes. These included genes that specify transcription factors important for differentiation into extra-embryonic, endodermal, mesodermal, and ectodermal lineages (e.g., ESX1l, HOXB1, MEIS1, PAX6, LHX5, LBX1, MYF5, ONECUT1) (Table S5). Moreover, nearly half of the transcription factor genes that were bound by the three regulators and transcriptionally inactive encoded developmentally important homeodomain proteins (Table 2). These results demonstrate that OCT4, SOX2, and NANOG occupy a set of repressed genes that are key to developmental processes.
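An enrichment p value of this kind is commonly obtained from a hypergeometric test. The text does not state which test produced the reported value, so the Python sketch below is an assumption about the general approach rather than a description of the analysis performed; the function name and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe: int, n_category: int,
                           n_selected: int, n_overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: all genes considered (e.g., all arrayed genes)
    n_category: genes in the functional class (e.g., transcription factors)
    n_selected: genes in the set of interest (e.g., inactive co-bound genes)
    n_overlap:  selected genes that fall in the functional class
    """
    upper = min(n_category, n_selected)
    favorable = sum(
        comb(n_category, i) * comb(n_universe - n_category, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_universe, n_selected)


if __name__ == "__main__":
    # Toy numbers for illustration only, not taken from Table S7.
    print(hypergeom_enrichment_p(20, 5, 4, 3))
```

The sum uses exact integer arithmetic before the final division, which avoids underflow when the tail probability is very small.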
To determine which of the OCT4, SOX2, and NANOG bound genes were preferentially expressed in ES cells, we compared expression datasets (Abeyta et al., 2004; Sato et al., 2003) from ES cells and a compendium of differentiated tissues and cell types (Su et al., 2004) (Figure 3; Supplemental Data). It was notable that DPPA4, TDGF1, OCT4, NANOG, and LEFTY2 were at the top of the rank order list of genes that are bound and preferentially expressed in ES cells (Figure 3A). All five of these genes have been implicated in pluripotency (James et al., 2005; Mitsui et al., 2003; Chambers et al., 2003; Nichols et al., 1998; Bortvin et al., 2003). Moreover, several genes that encode developmentally important homeodomain proteins such as DLX5, HOXB1, LHX5, TITF1, LBX1, and HOP were at the bottom of this list, indicating that they are preferentially repressed in ES cells.
The observation that OCT4, SOX2, and NANOG bound to transcriptionally active genes that have roles in pluripotency and transcriptionally inactive genes that promote development suggests that these binding events are regulatory. Two additional lines of evidence indicated that many of the binding events identified in this study contribute to regulation of their target genes. First, some of the genes identified here (e.g., OCT4, SOX2, and NANOG) were previously shown to be regulated by OCT4 and SOX2 in mouse ES cells (Catena et al., 2004; Kuroda et al., 2005; Okumura-Nakanishi et al., 2005; Rodda et al., 2005). Second, we further explored the hypothesis that bound genes are regulated by these transcription factors by taking advantage of the fact that OCT4 and NANOG are expressed in ES cells, but their expression is rapidly downregulated upon differentiation. We compared the expression of OCT4, SOX2, and NANOG occupied genes in human ES cells with expression patterns in 79 differentiated cell types (Su et al., 2004) (Supplemental Data) and focused the analysis on transcription factor genes because these were the dominant functional class targeted by the ES cell regulators (Figure 3B). We expected that for any set of genes, there would be a characteristic change in expression levels between ES cells and differentiated cells. If OCT4, SOX2, and NANOG do not regulate the genes they occupy, then these genes should have the same general expression profile as the control population. We found, however, a significant shift in the distribution of expression changes for genes occupied by OCT4, SOX2, and NANOG (p value < 0.001). Taken together, these data support the model that OCT4, SOX2, and NANOG functionally regulate the genes they occupy and suggest that loss of these regulators upon differentiation results in increased expression of genes necessary for development and reduced expression of a set of genes required for the maintenance of stem cell identity.
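The logic of comparing occupied genes with a control population can be sketched as a permutation test on the difference in mean expression change. The statistical test actually used is described in the Supplemental Data and is not specified here, so this Python example is only one plausible way to frame the comparison; the function name, the toy values, and the choice of the mean as test statistic are assumptions.

```python
import random


def permutation_p_value(bound, control, n_perm=2_000, seed=0):
    """Two-sided permutation p value for a shift in mean expression change.

    bound:   expression changes (ES vs. differentiated) for occupied genes
    control: expression changes for the control gene population
    """
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(bound) - mean(control))
    pooled = list(bound) + list(control)
    n_bound = len(bound)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_bound]) - mean(pooled[n_bound:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy values for illustration only.
    bound = [2.0] * 10
    control = [0.0] * 10
    print(permutation_p_value(bound, control))
```

If occupied genes behaved like the control population, shuffling the labels would often produce differences as large as the observed one and the p value would be large.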
Our results suggest that OCT4, SOX2, and NANOG contribute to pluripotency and self-renewal by activating their own genes and genes encoding components of key signaling pathways and by repressing genes that are key to developmental processes. It is presently unclear how the three key regulators can activate some genes and repress others. It is likely that the activity of these key transcription factors is further controlled by additional cofactors, by the precise levels of OCT4, SOX2, and NANOG, and by posttranslational modifications.
In order to identify regulatory network motifs associated with OCT4, SOX2, and NANOG, we assumed that regulator binding to a gene implies regulatory control and used algorithms that were previously devised to discover such regulatory circuits in yeast (Lee et al., 2002). The simplest units of commonly used transcriptional regulatory network architecture, or network motifs, provide specific regulatory capacities such as positive and negative feedback loops to control the levels of their components (Lee et al., 2002; Milo et al., 2002; Shen-Orr et al., 2002).
Our data indicated that OCT4, SOX2, and NANOG form feedforward loops that involve at least 353 protein-coding and 2 miRNA genes (Figure 4A). Feedforward-loop motifs contain a regulator that controls a second regulator and have the additional feature that both regulators bind a set of common target genes. The feedforward loop has multiple regulatory capacities that may be especially useful for stem cells. When both regulators are positive, the feedforward loop can provide consistent activity that is relatively insensitive to transient changes in input (Mangan et al., 2003; Shen-Orr et al., 2002). If the regulators have positive and negative functions, the feedforward loop can act as a switch that enables a rapid response to inputs by providing a time-sensitive delay where the downstream regulator acts to counter the effects of the upstream regulator in a delayed fashion (Mangan and Alon, 2003; Mangan et al., 2003). In ES cells, both regulatory capacities could be useful for maintaining the pluripotent state while retaining the ability to react appropriately to differentiation signals. Previous studies have shown that feedforward-loop architecture has been highly favored during the evolution of transcriptional regulatory networks in less complex eukaryotes (Lee et al., 2002; Ma et al., 2004; Milo et al., 2002; Resendis-Antonio et al., 2005; Shen-Orr et al., 2002). Our data suggest that feedforward regulation is an important feature of human ES cells as well.
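The feedforward-loop definition given above (a regulator binds a second regulator, and both bind a common target) can be expressed directly as a search over binding data. The Python sketch below is a simplified illustration and not the algorithm of Lee et al. (2002); the function name is hypothetical, and the small binding map uses only interactions named in the text (the three regulators at one another's promoters and at STAT3).

```python
def feedforward_loops(binding: dict) -> list:
    """Find (X, Y, Z) triples where X binds Y, and both X and Y bind Z.

    binding maps each regulator to the set of genes whose promoters it
    occupies. Binding is assumed to imply regulatory control.
    """
    loops = []
    for x, x_targets in binding.items():
        for y in x_targets:
            # Y must itself be a profiled regulator distinct from X.
            if y == x or y not in binding:
                continue
            for z in x_targets & binding[y]:
                if z not in (x, y):
                    loops.append((x, y, z))
    return sorted(loops)


if __name__ == "__main__":
    shared = {"OCT4", "SOX2", "NANOG", "STAT3"}
    binding = {"OCT4": set(shared), "SOX2": set(shared), "NANOG": set(shared)}
    for loop in feedforward_loops(binding):
        print(loop)
```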
Our results also showed that OCT4, SOX2, and NANOG together bound to the promoters of their own genes, forming interconnected autoregulatory loops (Figure 4B; see also Figure S2). Transcriptional regulation of OCT4, SOX2, and NANOG by the OCT4-SOX2 complex was recently described in murine ES cells (Catena et al., 2004; Kuroda et al., 2005; Okumura-Nakanishi et al., 2005; Rodda et al., 2005). Our data indicate that this autoregulatory loop is conserved in human ES cells and, more importantly, that NANOG is a component of the regulatory apparatus at these genes. Thus, it is likely that the expression and function of these three key stem cell factors are inextricably linked to one another. Autoregulation is thought to provide several advantages, including reduced response time to environmental stimuli and increased stability of gene expression (McAdams and Arkin, 1997; Rosenfeld et al., 2002; Shen-Orr et al., 2002; Thieffry et al., 1998).
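Interconnected autoregulatory loops can be read off the same kind of binding map: a regulator is autoregulatory if it occupies its own promoter, and two regulators are interconnected if each occupies the other's promoter. The following Python sketch illustrates this under the assumption that binding implies regulation; the function names are hypothetical.

```python
def autoregulatory(binding: dict) -> list:
    """Regulators that occupy their own promoters."""
    return sorted(reg for reg, targets in binding.items() if reg in targets)


def mutual_pairs(binding: dict) -> list:
    """Pairs of regulators that occupy each other's promoters."""
    pairs = []
    for a in binding:
        for b in binding:
            if a < b and b in binding[a] and a in binding[b]:
                pairs.append((a, b))
    return sorted(pairs)


if __name__ == "__main__":
    # Each of the three regulators binds all three promoters, as reported.
    core = {"OCT4", "SOX2", "NANOG"}
    binding = {reg: set(core) for reg in core}
    print(autoregulatory(binding))
    print(mutual_pairs(binding))
```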
The autoregulatory and feedforward circuitry described here may provide regulatory mechanisms by which stem cell identity can be robustly maintained yet permit cells to respond appropriately to developmental cues. Modifying OCT4 and NANOG levels and function can change the developmental potential of murine ES cells (Chambers et al., 2003; Mitsui et al., 2003; Nichols et al., 1998; Niwa et al., 2000), and this might be interpreted as being a consequence of perturbing independent regulatory pathways under the control of these two regulators. Our results argue that the levels and functions of these key stem cell regulators are tightly linked at both target genes and at their own promoters and thus provide an additional framework for interpreting the genetic studies. Changes in the relative stoichiometry of these factors would disturb the autoregulatory and feedforward circuitry, producing changes in global gene regulation and thus cell fate.
An initial model for ES cell transcriptional regulatory circuitry was constructed by identifying OCT4, SOX2, and NANOG target genes that encode transcription factors and chromatin regulators and integrating knowledge of the functions of these downstream regulators in both human and mouse based on the available expression studies and literature (Figure 5). The model includes a subset of active and a subset of repressed target genes based on the extensive expression characterization of the 353 co-bound genes as described earlier. The active targets include genes encoding components of chromatin remodeling and histone-modifying complexes (e.g., SMARCAD1, MYST3, and SET), which may have general roles in transcriptional regulation, and genes encoding transcription factors (e.g., REST, SKIL, HESX1, and STAT3), which themselves are known to regulate specific genes. For instance, REST has recently been shown to be highly abundant in ES cells and functions in part to repress neuronal specific genes (Ballas et al., 2005). Previous studies have proposed that NANOG may function through the Tgf-β pathway in ES cells (Chambers, 2004). Our model suggests that this occurs through direct regulation of key components of this pathway (e.g., TDGF1, LEFTY2/EBAF) and through regulation of at least one transcription factor, SKIL, which controls the activity of downstream components of this pathway (SMAD2, SMAD4) (He et al., 2003). Our data also reveal that OCT4, SOX2, and NANOG co-occupy STAT3, a key regulator of self-renewal in mouse ES cells (Chambers, 2004), suggesting that STAT3 may also play a role in human ES cells.
The model described in Figure 5 also depicts a subset of the genes bound by OCT4, SOX2, and NANOG that are inactive and that encode transcription factors that have key roles in differentiation and development. These include regulators with demonstrated roles in development of all embryonic lineages. This initial model for ES cell transcriptional regulatory circuitry is consistent with previous genetic studies in mice that suggest that OCT4 and NANOG maintain pluripotency through repression of differentiation programs (Chambers et al., 2003; Mitsui et al., 2003; Niwa et al., 2000). This model also provides a mechanistic framework for understanding how this is accomplished through regulation of specific sets of genes that control cell-fate specification.
Discovering how gene expression programs are controlled in living cells promises to improve our understanding of cell biology, development, and human health. Identifying the target genes for key transcriptional regulators of human stem cells is a first critical step in the process of understanding these transcriptional regulatory networks and learning how they control cell identity. Mapping OCT4, SOX2, and NANOG to their binding sites within known promoters has revealed that these regulators collaborate in ES cells to form regulatory circuitry consisting of specialized autoregulatory and feedforward loops. Continued advances in our ability to culture and genetically manipulate human ES cells will allow us to test and manipulate this circuitry. Identification of the targets of additional transcription factors and chromatin regulators using the approaches described here should allow investigators to produce a more comprehensive map of transcriptional regulatory circuitry in these cells. Connecting signaling pathways to this circuit map may reveal how these pluripotent cells can be stimulated to differentiate into different cell types or how to reprogram differentiated cells back to a pluripotent state.
Human embryonic stem (ES) cells were obtained from WiCell (Madison, Wisconsin; NIH Code WA09). Detailed protocol information on human ES cell growth conditions and culture reagents is available at http://www.mcb.harvard.edu/melton/hues. Briefly, passage 34 cells were grown in KO-DMEM medium supplemented with serum replacement, basic fibroblast growth factor (bFGF), recombinant human leukemia inhibitory factor (LIF), and a human plasma protein fraction. In order to minimize any MEF contribution in our analysis, H9 cells were cultured on a low density of irradiated murine embryonic fibroblasts (ICR MEFs), resulting in a ratio of H9 cells to MEFs of more than approximately 8:1. The culture of H9 on low-density MEFs had no adverse effects on cell morphology, growth rate, or undifferentiated status as compared to cells grown under typical conditions. In addition, immunohistochemistry for pluripotency markers (e.g., OCT4, SSEA-3) indicated that H9 cells grown on a minimal feeder layer maintained the ability to generate derivatives of ectoderm, mesoderm, and endoderm upon differentiation (Figures S3 and S4).
The NANOG (AF1997) and SOX2 (AF2018) antibodies used in this study were immunoaffinity purified against the human proteins and shown to recognize their target proteins in Western blots and by immunocytochemistry (R&D Systems, Minneapolis, Minnesota). Multiple OCT4 antibodies directed against different portions of the protein (AF1759, R&D Systems; sc-8628, Santa Cruz; sc-9081, Santa Cruz), some of which were immunoaffinity purified, were used in this study and have been shown to recognize their target protein in Western blots and by immunocytochemistry. The E2F4 antibody used in this study was obtained from Santa Cruz (sc-1082) and has been shown to recognize E2F4-responsive genes identified in previous ChIP studies (Table S2) (Ren et al., 2002; Weinmann et al., 2002).
Protocols describing all materials and methods can be downloaded from http://jura.wi.mit.edu/young/hESRegulation/.
Human embryonic stem cells were grown to a final count of 5 × 10^7 to 1 × 10^8 cells for each location analysis reaction. Cells were chemically crosslinked by the addition of one-tenth volume of fresh 11% formaldehyde solution for 15 min at room temperature. Cells were rinsed twice with 1 × PBS and harvested using a silicon scraper and flash frozen in liquid nitrogen and stored at −80°C prior to use. Cells were resuspended, lysed in lysis buffers, and sonicated to solubilize and shear crosslinked DNA. Sonication conditions vary depending on cells, culture conditions, crosslinking, and equipment. We used a Misonix Sonicator 3000 and sonicated at power 7 for 10 × 30 s pulses (90 s pause between pulses) at 4°C while samples were immersed in an ice bath. The resulting whole-cell extract was incubated overnight at 4°C with 100 μl of Dynal Protein G magnetic beads that had been preincubated with 10 μg of the appropriate antibody. Beads were washed five times with RIPA buffer and one time with TE containing 50 mM NaCl. Bound complexes were eluted from the beads by heating at 65°C with occasional vortexing, and crosslinking was reversed by overnight incubation at 65°C. Whole-cell extract DNA (reserved from the sonication step) was also treated for crosslink reversal. Immunoprecipitated DNA and whole-cell extract DNA were then purified by treatment with RNaseA, proteinase K, and multiple phenol:chloroform:isoamyl alcohol extractions. Purified DNA was blunted and ligated to linker and amplified using a two-stage PCR protocol. Amplified DNA was labeled and purified using Invitrogen Bioprime random primer labeling kits (immunoenriched DNA was labeled with Cy5 fluorophore, whole-cell extract DNA was labeled with Cy3 fluorophore). Labeled DNA was combined (5–6 μg each of immunoenriched and whole-cell extract DNA) and hybridized to arrays in Agilent hybridization chambers for 40 hr at 40°C. Arrays were then washed and scanned (Supplemental Data).
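Because immunoenriched DNA is labeled with Cy5 and whole-cell extract DNA with Cy3, enrichment at each probe can be summarized as a log2 ratio of the two channels, and bound sites appear as runs of neighboring enriched probes. The Python sketch below is a deliberately simplified illustration: the full analysis (Supplemental Data) involves normalization and an error model that are not reproduced here, and the threshold, minimum run length, intensities, and function names are all hypothetical.

```python
import math


def log2_enrichment(cy5_ip, cy3_wce):
    """Per-probe log2(Cy5 / Cy3); intensities must be positive."""
    return [math.log2(ip / wce) for ip, wce in zip(cy5_ip, cy3_wce)]


def enriched_runs(log_ratios, threshold=1.0, min_probes=2):
    """Index ranges (inclusive) of consecutive probes at or above threshold.

    Probes are assumed to be ordered by genomic position within a promoter.
    Runs shorter than min_probes are ignored as isolated signals.
    """
    runs, start = [], None
    for i, value in enumerate(log_ratios):
        if value >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_probes:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(log_ratios) - start >= min_probes:
        runs.append((start, len(log_ratios) - 1))
    return runs


if __name__ == "__main__":
    # Toy intensities for eight neighboring probes, for illustration only.
    cy5 = [100, 120, 400, 800, 450, 110, 300, 100]
    cy3 = [100] * 8
    print(enriched_runs(log2_enrichment(cy5, cy3)))
```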
We would like to thank Bioinformatics and Research Computing (BaRC) and the Center for Microarray Technology (CMT) at the Whitehead Institute for computational and technical support. We would also like to thank members of the Young lab as well as Chad Cowan and Kevin Eggan for helpful discussions. L.A.B. was supported by NRSA postdoctoral fellowship CA094664, and H.L.M. by NRSA postdoctoral fellowship GM068273. R.M.K. was supported by a fellowship from the American Cancer Society. This work was supported by NHGRI grant HG002668 to D.K.G. and R.A.Y. and NIH grant GM069400 to R.A.Y. T.I.L., D.K.G., and R.A.Y. consult for Agilent Technologies.
Supplemental Data: Supplemental Data include seven figures, seven tables, and Supplemental text and can be found with this article online at http://www.cell.com/cgi/content/full/122/6/■■■/DC1/.
All microarray data from this study are available at ArrayExpress at the EBI (http://www.ebi.ac.uk/arrayexpress) under the accession designation E-WMIT-5.