The process by which a single fertilized egg develops into a human being with more than 200 cell types—each with a distinct gene expression pattern controlling its cellular state—is poorly understood. Knowledge of the transcriptional regulatory circuitry that establishes and maintains gene expression programs in mammalian cells is fundamental to understanding development and should provide the foundation for improved diagnosis and treatment of disease. Although it is not yet feasible to map the entirety of this circuitry in vertebrate cells, recent work in embryonic stem (ES) cells has demonstrated that core features of the circuitry can be discovered through studies involving selected regulators. Here, we highlight the fundamental insights that have emerged from studies that examined the role of transcription factors, chromatin regulators, signaling pathways, and noncoding RNAs in the regulatory circuitry of ES cells. Maps of regulatory circuitry and the insights that have emerged from these studies have improved our understanding of global gene expression and are facilitating efforts to reprogram cells for disease therapeutics and regenerative medicine.
More than 200 different cell fates are generated during vertebrate development, each with a gene expression program unique to that cell type (Su et al. 2004; Shyamsundar et al. 2005; Vickaryous and Hall 2006). The expression programs are controlled by transcription factors, chromatin modifiers, and regulatory RNAs that can be influenced by signals from the extracellular environment. The subset of regulators that is expressed in each cell type has a key role in establishing and maintaining cell state. How these regulators control global gene expression programs is almost entirely unmapped in vertebrates because of limitations in our knowledge of the transcriptional, chromatin, and signaling components that are key to the control of each cell type. Nonetheless, multiple groups have begun to tackle the challenge of mapping transcriptional regulatory circuitry, particularly in ES cells, and have demonstrated that important insights into the global control of cell state can be obtained from these efforts (Boyer et al. 2005, 2006; Chew et al. 2005; Bernstein et al. 2006; Lee et al. 2006; Loh et al. 2006; Wu et al. 2006; Pan et al. 2007; Zhao et al. 2007; Chen et al. 2008; Cole et al. 2008; Endoh et al. 2008; Jiang et al. 2008; Kim et al. 2008a; Tam et al. 2008).
Mapping transcriptional regulatory circuitry is important because it provides insights into the regulators and mechanisms that control global gene expression, cell state, and development. Because deficiencies in control of gene expression can contribute to many human diseases such as cancer, immune disease, and diabetes (Villard 2004; Kloosterman and Plasterk 2006; Latchman 2008), knowledge of normal and abnormal transcriptional regulatory circuitries may provide new approaches to disease diagnosis and therapy. Here, we describe initial efforts to dissect core features of the ES cell transcriptional regulatory network and highlight the key insights that have emerged from such studies.
Transcription factors, chromatin regulators, signaling pathways, and noncoding RNAs are among the key components that control mRNA gene expression (Fig. 1). The molecular mechanisms by which these regulators control expression of individual genes have been studied extensively and are reviewed elsewhere (Lee and Young 2000; Orphanides and Reinberg 2002; Berger 2007; Li et al. 2007; Core and Lis 2008; Hobert 2008). Our understanding of these mechanisms suggests how to organize models of the transcriptional regulatory circuitry, as described below.
DNA-binding transcription factors recognize sequence motifs, are key to specific gene regulation, and can thus be used to anchor transcriptional regulatory networks (Harrison 1991; Pabo and Sauer 1992; Kadonaga 2004; Remenyi et al. 2004). Transcription factors are also the single largest protein family encoded in the human genome, where they account for approximately 10% of protein-coding genes (Lander et al. 2001; Levine and Tjian 2003). They bind to both promoter-proximal and -distal regulatory DNA sequences and can aid or inhibit recruitment of the transcription apparatus at target genes (Latchman 1997; Blackwood and Kadonaga 1998; Ogata et al. 2003; West and Fraser 2005). At most well-studied promoters, it is evident that multiple transcription factors are bound, which allows for the combinatorial control of gene expression (Evans et al. 1990; Greene 1990; Harbison et al. 2004; Panne et al. 2004; Remenyi et al. 2004).
Chromatin regulators are often recruited to specific portions of the genome by DNA-binding transcription factors or the transcription apparatus where they act to augment gene expression or repression through their effects on chromatin state (Berger 2007; Kouzarides 2007; Li et al. 2007). Chromatin regulators that methylate DNA and certain nucleosomal histone residues have been implicated in heritable chromatin states and thus have important roles in developmental control. Methylation of CpG islands in the promoter regions of some genes by DNA methyltransferases contributes to their repression and is maintained during development by maintenance methylases (Goll and Bestor 2005; Turek-Plewa and Jagodziński 2005; Klose and Bird 2006). Methylation of nucleosomal histones by Trithorax and Polycomb group protein complexes is also thought to be important for maintaining gene expression programs associated with specific cell states; Trithorax group complexes are associated with actively transcribed genes, whereas Polycomb group regulators are associated with repression of most genes that they occupy (Pirrotta 1998; Orlando 2003; Ringrose and Paro 2004; Schuettengruber et al. 2007; Schwartz and Pirrotta 2007).
Signaling pathways act to maintain or initiate changes in the regulatory circuitry in response to environmental or developmental cues. The terminal components of signaling pathways are often protein kinases that can phosphorylate and activate transcriptional regulators or are themselves transcription factors and chromatin modifiers (Hunter 2000; Brivanlou and Darnell 2002; Yang et al. 2003; Pokholok et al. 2006). Knowledge of the target genes of each of the signaling pathways that contribute to control of cell state is critical to understanding how these pathways control the gene expression program associated with such states.
Noncoding RNAs can influence gene expression and chromatin state (Goodrich and Kugel 2006; Amaral et al. 2008; Hawkins and Morris 2008). For example, a large class of noncoding RNAs, termed microRNAs (miRNAs), modify gene expression by regulating translation and degradation of mRNA transcripts (Ambros 2004; Bartel 2004; Valencia-Sanchez et al. 2006; Meister 2007; Makeyev and Maniatis 2008). Noncoding RNA species have also been implicated in control of chromatin state (Verdel et al. 2004; Moazed et al. 2006; Grewal and Elgin 2007; Rinn et al. 2007; Zaratiegui et al. 2007). We have limited understanding of the regulation of expression of noncoding RNA species, and in most cases, we have yet to identify the specific set of genes that are under the control of these noncoding RNA species.
Hundreds of gene expression regulators are present in each cell, making it a challenge to map the regulatory network that they form in even one cell type, much less in 200 cell types (Lander et al. 2001; Brivanlou and Darnell 2002). For this reason, even the most ambitious global studies have examined only a handful of transcriptional regulators and then in only a few cell types (Cawley et al. 2004; Odom et al. 2004, 2006; Boyer et al. 2005; Rada-Iglesias et al. 2005, 2008; Loh et al. 2006; Barski et al. 2007; Mikkelsen et al. 2007; Chen et al. 2008; Cole et al. 2008; Jaenisch and Young 2008; Jiang et al. 2008; Kim et al. 2008a; Komashko et al. 2008; Marson et al. 2008b; Park et al. 2008; Reed et al. 2008; Wang et al. 2008). However, several lines of evidence argue that a small subset of transcription factors and other regulators have a key role in the control of cell state. Cells can be reprogrammed into other cell states through forced expression of a very small number of transcription factors. For example, fibroblasts can be reprogrammed into induced pluripotent stem cells upon forced expression of four transcription factors (Takahashi and Yamanaka 2006; Okita et al. 2007; Takahashi et al. 2007; Wernig et al. 2007; Yu et al. 2007). Similarly, fibroblasts and other cells can take on a skeletal muscle state when the myogenic transcription factor MyoD is expressed (Davis et al. 1987; Weintraub et al. 1989, 1991; Choi et al. 1990). Screens to identify genes that are key to maintaining the ES cell state have identified only a small number of all of the transcription factor genes that are expressed in these cells (Ivanova et al. 2006; Zhang et al. 2006; Fazzio et al. 2008). Furthermore, several studies have shown that many transcription factors can be eliminated without dire consequences for the cell (Winzeler et al. 1999; Giaever et al. 2002; Kemphues 2005). 
The small set of transcription factors that have been demonstrated to be important for establishment or maintenance of a cell state will henceforth be termed “key regulators.”
A simplified version of the transcriptional regulatory circuitry of a cell can thus be deduced by discovering the population of genes that are occupied and controlled by the key regulators for that cell type. We call this simplified network the “core transcriptional regulatory circuitry.” Given current experimental limitations to elucidating complete vertebrate circuitry, we propose that the mapping of core regulatory circuitry provides a shortcut to discovering key network themes, a concept that we believe has been validated with the study of ES cells.
Initial studies of transcription factors in the ES cell transcriptional regulatory network focused on the key regulators Oct4, Sox2, and Nanog (Boyer et al. 2005; Loh et al. 2006). Knowledge of genetic phenotypes, expression profiles, and molecular relations was leveraged to identify these factors as key components of the ES cell network. Genetic studies demonstrated functional consequences in ES cells of inappropriate expression of these factors; the expression of Oct4 and Nanog was found to be specific to pluripotent cells; and Sox2 was known to form a heterodimer with Oct4 (Schöler et al. 1990; Ambrosetti et al. 1997; Nichols et al. 1998; Avilion et al. 2003; Chambers et al. 2003; Mitsui et al. 2003; Hart et al. 2004). Because of the overwhelming evidence for key roles for these regulators in the ES cell network, multiple groups have mapped their target genes in human and murine ES cells (Boyer et al. 2005; Loh et al. 2006).
Several important themes emerged from the study of target genes for Oct4, Sox2, and Nanog (Fig. 2). The key regulators clearly prefer to cooccupy their target genes, thus forming a network structure called a multi-input motif (Fig. 2A) (Lee et al. 2002; Boyer et al. 2005; Loh et al. 2006; Alon 2007). Because they form a heterodimer, Oct4 and Sox2 were expected to bind the same target genes, but Nanog was also found to occupy a large percentage of the Oct4-Sox2 bound genes. More recent studies have mapped additional transcription factors in ES cells and have found that they also follow the theme of target gene cooccupancy (Wu et al. 2006; Chen et al. 2008; Jiang et al. 2008; Kim et al. 2008a). These studies, and similar investigations of key regulators in other cell types (Odom et al. 2004, 2006), suggest that the multi-input motif is an important theme in the transcriptional regulatory circuitry of vertebrate cells.
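The cooccupancy described above can be illustrated with simple set arithmetic. The following is a minimal sketch, not the analysis used in the cited studies: the gene names are placeholders and the target lists are invented for illustration, and the helper names (cooccupied, fraction_shared) are hypothetical.

```python
# Hypothetical target-gene sets; gene names are placeholders, not real binding data.
targets = {
    "Oct4": {"geneA", "geneB", "geneC", "geneD"},
    "Sox2": {"geneA", "geneB", "geneC", "geneE"},
    "Nanog": {"geneA", "geneB", "geneF"},
}

def cooccupied(targets):
    """Return genes bound by every regulator (the targets of a multi-input motif)."""
    return set.intersection(*targets.values())

def fraction_shared(targets, a, b):
    """Return the fraction of regulator a's targets that regulator b also binds."""
    return len(targets[a] & targets[b]) / len(targets[a])

shared = cooccupied(targets)
print(sorted(shared))
print(fraction_shared(targets, "Oct4", "Sox2"))
```

With real data, each set would hold the genes assigned to a regulator's binding sites from a genome-wide location experiment.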
A related theme that emerged from these global binding studies is that the key regulators tend to occupy DNA sequences in very close proximity to one another (Fig. 2B) (Boyer et al. 2005; Loh et al. 2006; Cole et al. 2008; Marson et al. 2008b). Oct4, Sox2, and Nanog were often found to bind within 25 bp of one another at target genes (Marson et al. 2008b). This proximity suggests that these factors may form tightly associated complexes on DNA to coordinately affect transcription. Alternatively, some of these transcription factors may compete for binding to overlapping or similar DNA sequences, and because the data come from a population of cells, it is also possible that the complete set of transcription factors is not simultaneously bound at these sites in individual cells. Further studies into the biochemical nature of these binding events are needed to test these possibilities.
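One way to look for this kind of proximity in a list of mapped binding sites is to chain together sites whose neighbors lie within a fixed window. The sketch below assumes sites are given as (position, factor) pairs on a single chromosome; the coordinates are invented, the function name clustered_sites is hypothetical, and only the 25 bp window comes from the text.

```python
def clustered_sites(sites, window=25):
    """Group sites whose consecutive gaps are <= window bp.

    Only clusters containing more than one distinct factor are returned.
    Because sites are chained by consecutive gaps, the first and last site
    of a cluster can be farther apart than the window.
    """
    clusters, current = [], []
    for pos, factor in sorted(sites):
        # Start a new cluster when the gap to the previous site is too large.
        if current and pos - current[-1][0] > window:
            clusters.append(current)
            current = []
        current.append((pos, factor))
    if current:
        clusters.append(current)
    return [c for c in clusters if len({f for _, f in c}) > 1]

# Invented coordinates for illustration only.
sites = [
    (1000, "Oct4"), (1012, "Sox2"), (1030, "Nanog"),
    (5000, "Oct4"), (9000, "Nanog"), (9100, "Sox2"),
]
clusters = clustered_sites(sites)
print(clusters)
```

As noted above, population-level data of this kind cannot show whether all factors in a cluster are bound at the same time in a single cell.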
One of the more important themes that emerged from the initial studies of the key regulators of ES cells was that Oct4, Sox2, and Nanog together cooccupy their own promoter regions and thus form a network structure called an interconnected autoregulatory loop (Fig. 2C) (Boyer et al. 2005; Loh et al. 2006). This network motif may have two purposes: Feedback gene regulation by these transcription factors may contribute to the stability of the core ES cell transcriptional regulatory network, yet this network structure may also allow for a rapid change in core regulatory circuitry if one regulator is eliminated upon receipt of differentiation signals. Indeed, the circuit formed by Oct4, Sox2, and Nanog could apparently act as a bistable switch controlling ES cell maintenance versus differentiation (Chickarmane et al. 2006).
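The structure of an interconnected autoregulatory loop can be stated as a simple condition on a directed graph: every regulator occupies its own promoter and the promoter of every other regulator in the set. The following sketch checks that condition; the edge-set representation and the function name are illustrative assumptions, and the code says nothing about the dynamics (for example, bistability) of the circuit.

```python
def is_interconnected_autoregulatory_loop(edges, regulators):
    """Return True if each regulator binds its own promoter and all others'.

    edges is a set of (regulator, target_gene) pairs meaning
    "regulator occupies the promoter of target_gene".
    """
    return all((src, dst) in edges for src in regulators for dst in regulators)

core = ["Oct4", "Sox2", "Nanog"]

# Every factor occupies every promoter in the core set, including its own.
edges = {(src, dst) for src in core for dst in core}
print(is_interconnected_autoregulatory_loop(edges, core))

# Removing a single edge breaks the fully interconnected structure.
broken = edges - {("Nanog", "Oct4")}
print(is_interconnected_autoregulatory_loop(broken, core))
```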
The early global studies also revealed that the key regulators occupy and control genes encoding many other transcriptional regulators that are expressed in ES cells (Boyer et al. 2005; Loh et al. 2006), forming a hierarchical regulatory network structure (Fig. 2D). The target genes of Oct4, Sox2, and Nanog were significantly enriched for transcription factors and developmental regulators (Boyer et al. 2005). The control of these secondary regulators allows key transcription factors to indirectly control a much larger set of genes. This hierarchical network structure has been described in model organisms and will likely prove to be a common vertebrate network architecture (Martinez-Antonio and Collado-Vides 2003; Ma et al. 2004; Farkas et al. 2006). This network structure may allow for rapid large-scale changes in the transcription program in response to signals that may only directly target a handful of key regulators.
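The reach of a hierarchical network can be sketched as graph traversal: genes controlled indirectly are those reachable from a key regulator through one or more secondary regulators. The example below is a toy illustration; the secondary regulators (TF1, TF2) and genes (g1 to g3) are placeholders, and downstream_genes is a hypothetical helper.

```python
from collections import deque

def downstream_genes(network, start):
    """Return every gene reachable from the starting regulators (breadth-first)."""
    seen = set()
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for target in network.get(node, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Placeholder network: a key regulator controls two secondary regulators,
# which in turn control further genes.
network = {
    "Oct4": {"TF1", "TF2"},
    "TF1": {"g1", "g2"},
    "TF2": {"g3"},
}

direct = network["Oct4"]
reachable = downstream_genes(network, ["Oct4"])
print(len(direct), len(reachable))
```

In this toy network the key regulator binds two genes directly but influences five in total, which is the sense in which control of secondary regulators extends its reach.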
One additional fundamental theme that has emerged from global binding studies is that Oct4, Sox2, and Nanog occupy both actively transcribed genes encoding ES cell transcription factors and repressed genes encoding lineage-specific developmental regulators (Fig. 2E) (Boyer et al. 2005; Loh et al. 2006). This observation suggested that regulation of silent developmental regulators in ES cells might be critical for pluripotency. Inappropriate expression of lineage-specific developmental regulators could initiate gene expression programs for other cell states, and thus it appears to be important to maintain these key regulators of other cell types in a repressed state in ES cells. Precisely how Oct4, Sox2, and Nanog contribute to repression of lineage-specific developmental regulators is not known.
The genomic locations and functions of Polycomb and Trithorax group (PcG and TrxG) proteins, along with the histone modifications catalyzed by these chromatin regulators, have been the subject of much study in ES cells (Bernstein et al. 2006; Boyer et al. 2006; Lee et al. 2006; Pan et al. 2007; Zhao et al. 2007; Endoh et al. 2008). There is considerable genetic evidence that these chromatin regulators have an important role in early development (Faust et al. 1998; O'Carroll et al. 2001; Pasini et al. 2004; Breiling et al. 2007). Studies of their role in the ES cell network have revealed several important insights that are likely to become general themes of vertebrate transcriptional regulatory networks.
One key insight that emerged from studying these chromatin regulators is that the previous assumption that TrxG complexes are associated with actively transcribed genes, whereas PcG regulators are associated with repressed genes, is imperfect (Bernstein et al. 2006; Boyer et al. 2006; Lee et al. 2006; Pan et al. 2007; Zhao et al. 2007; Endoh et al. 2008). The set of silent genes encoding lineage-specific developmental regulators that are occupied by Oct4, Sox2, and Nanog was also occupied by both PcG and TrxG complexes and contained nucleosomes trimethylated at both histone H3 lysine (K)4 and H3K27. These silent genes encoding developmental regulators were therefore described as being bivalently marked by both activating and repressive marks.
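The notion of a bivalently marked gene can be expressed as a small classification over the two histone marks. This is a minimal sketch that assumes each promoter has already been scored as marked or unmarked for H3K4me3 and H3K27me3; the gene names are placeholders and the function name chromatin_state is hypothetical.

```python
def chromatin_state(h3k4me3, h3k27me3):
    """Label a promoter by which of the two trimethylation marks it carries."""
    if h3k4me3 and h3k27me3:
        return "bivalent"
    if h3k4me3:
        return "H3K4me3 only"
    if h3k27me3:
        return "H3K27me3 only"
    return "unmarked"

# Placeholder promoters mapped to (H3K4me3 present, H3K27me3 present).
promoters = {
    "devRegulator1": (True, True),
    "esGene1": (True, False),
    "silentGene1": (False, True),
}
states = {gene: chromatin_state(*marks) for gene, marks in promoters.items()}
print(states)
```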
Further studies revealed that the transcription apparatus was recruited to the promoters of these bivalently marked genes encoding developmental regulators and that transcription was initiated but full-length transcript was not produced (Guenther et al. 2007; Stock et al. 2007). Studies in Drosophila suggest that transcriptional pausing is a conserved regulatory feature at genes encoding silent developmental regulators in embryonic tissues (Muse et al. 2007; Zeitlinger et al. 2007; Hendrix et al. 2008; Nechaev and Adelman 2008). This regulatory feature may keep genes encoding developmental regulators in a poised expression state, allowing rapid transcription of certain genes upon induction of differentiation. How only a specific subset of these PcG-repressed genes is induced to overcome transcriptional pausing to permit lineage-specific differentiation is not yet understood.
Recent studies have revealed how some signal transduction pathways contribute to control of the ES cell transcriptional regulatory network (Pereira et al. 2006; Chen et al. 2008; Cole et al. 2008; Tam et al. 2008; Yi et al. 2008). The Wnt signaling pathway has important roles throughout development and can influence ES cell state (Logan and Nusse 2004; Reya and Clevers 2005). A terminal component of this pathway, the transcription factor Tcf3, was identified as a likely key regulator in ES cells due to its genetic and expression phenotypes (Korinek et al. 1998; Merrill et al. 2004; Pereira et al. 2006).
Subsequent genome-wide studies of Tcf3 in ES cells revealed that the Wnt signaling pathway is intimately connected to the core transcriptional circuitry of ES cells (Fig. 3) (Cole et al. 2008; Marson et al. 2008b; Tam et al. 2008; Yi et al. 2008). Tcf3 occupies promoters of the key transcription factors Oct4, Sox2, and Nanog, and these factors, together with Tcf3, occupy the Tcf3 promoter (Fig. 3A). Thus, Tcf3 is a component of the interconnected autoregulatory loop that is at the core of ES cell transcriptional regulatory circuitry. These studies also revealed that Tcf3 cooccupied the genome with the key transcription factors, suggesting that the Wnt signaling pathway can affect cellular state by directly connecting to the core circuitry. In this manner, cells could respond to Wnt signaling through a feedforward loop where the key ES cell regulators as well as their targets are immediately targeted by Tcf3 (Fig. 3B). This network structure would allow for both a rapid and stable response to environmental stimuli.
Manipulation of the canonical Wnt pathway through Tcf3 can affect the balance between pluripotency and differentiation in ES cells (Cole et al. 2008). High Wnt pathway activity favors pluripotency, whereas low activity favors differentiation (Sato et al. 2004; Ogawa et al. 2006; Singla et al. 2006; Miyabayashi et al. 2007). Tcf3 and its associated proteins apparently contribute to gene activation when the Wnt pathway is activated and to repression when the pathway is not (Cole et al. 2008; Tam et al. 2008). This suggests that under conditions of high Wnt activity, Tcf3 and the key transcription factors Oct4, Sox2, and Nanog generally function to activate target gene expression (although such activity can be overridden by PcG proteins and other repressors). In contrast, under conditions of low Wnt activity, Tcf3 acts to repress target gene expression and may thus counter the activating functions of Oct4, Sox2, and Nanog. These opposing inputs thus allow ES cells to modulate the level of target gene expression in the core circuitry on the basis of the cell's external environment, which in turn influences the balance between pluripotency and differentiation.
miRNAs are critical for normal ES cell self-renewal and differentiation and have demonstrated roles in early development (Bernstein et al. 2003; Kanellopoulou et al. 2005; Murchison et al. 2005; Wang et al. 2007; Sinkkonen et al. 2008; Stefani and Slack 2008). Recent studies have revealed how the miRNA class of noncoding RNAs is controlled in ES cells, and this information has been incorporated into a model of the core regulatory circuitry of ES cells (Marson et al. 2008b). This class of noncoding RNAs adds another layer to the regulation of gene expression because the RNAs act posttranscriptionally to influence mRNA stability and translation. miRNAs can regulate the expression of many protein-coding genes (Farh et al. 2005; Krek et al. 2005; Lewis et al. 2005; Lim et al. 2005) and thus form a number of interesting control circuits in cells where they are expressed.
miRNA gene expression is regulated in a manner similar to that of protein-coding genes in ES cells (Fig. 4A) (Marson et al. 2008b). The key transcription factors Oct4, Sox2, Nanog, and Tcf3 occupy and positively regulate the promoters of miRNA genes that are actively expressed in ES cells. These key transcription factors also occupy a set of silent miRNA genes that are expressed later during differentiation. This set of silent miRNA genes is occupied by PcG proteins in ES cells, thus poising these miRNA genes for expression during development in a lineage-specific fashion.
Studies of miRNAs and the core circuitry of ES cells also revealed recognizable network motifs that provide insights into how networks can control cell state (Marson et al. 2008b). Certain miRNAs, such as mir-290-295, form a common network motif termed an incoherent feed-forward loop with the key transcription factors in ES cells (Fig. 4B) (Alon 2007). This network architecture may allow ES cells to fine-tune gene expression levels of important target genes and facilitate removal of certain ES-cell-specific mRNAs when cells are stimulated to differentiate.
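The distinction between coherent and incoherent feed-forward loops can be made concrete with signed edges. In a feed-forward loop, regulator X controls target Z directly and also indirectly through Y; the loop is incoherent when the sign of the direct path differs from the product of the signs along the indirect path. The sketch below applies this standard definition to a toy case in which a key transcription factor activates both a miRNA and a target gene, and the miRNA represses that target. The node name "targetGene" is a placeholder and classify_ffl is a hypothetical helper.

```python
def classify_ffl(signs, x, y, z):
    """Classify the feed-forward loop x -> z, x -> y -> z.

    signs maps (source, target) to +1 (activation) or -1 (repression).
    """
    direct = signs[(x, z)]
    indirect = signs[(x, y)] * signs[(y, z)]
    return "coherent" if direct == indirect else "incoherent"

# Toy loop: the transcription factor activates both the miRNA and the target,
# while the miRNA represses the target posttranscriptionally.
signs = {
    ("Oct4", "targetGene"): +1,
    ("Oct4", "miRNA"): +1,
    ("miRNA", "targetGene"): -1,
}
result = classify_ffl(signs, "Oct4", "miRNA", "targetGene")
print(result)
```

Opposing direct and indirect inputs of this kind are one way in which a circuit could fine-tune the level of a target transcript.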
A model for ES cell core regulatory circuitry has recently been described that incorporates key transcription factors, chromatin regulators, the Wnt signaling pathway, and miRNAs (Fig. 5) (Marson et al. 2008b). This model represents only a portion of the available data, but it serves to illustrate several important features of ES cell regulatory circuitry. The transcription factors Oct4, Sox2, Nanog, and Tcf3 form an interconnected autoregulatory loop, to which the Wnt signaling pathway connects. The key transcription factors occupy and regulate a set of actively transcribed protein-coding and noncoding genes whose functions contribute to the ES cell state. The products of some of these genes add another layer of regulation; for example, the miRNAs fine-tune the levels of mRNAs for certain protein-coding genes. The key transcription factors also occupy silent protein-coding and noncoding genes involved in lineage-specific functions, and these genes appear to experience transcription initiation, but transcript completion is prevented by PcG proteins and perhaps additional repressors (Guenther et al. 2007; Stock et al. 2007).
Somatic cells can be reprogrammed into induced pluripotent stem (iPS) cells by ectopic expression of four or fewer transcriptional regulators (Takahashi and Yamanaka 2006; Meissner et al. 2007; Okita et al. 2007; Takahashi et al. 2007; Wernig et al. 2007, 2008; Yu et al. 2007; Aoi et al. 2008; Jaenisch and Young 2008; Kim et al. 2008b; Nakagawa et al. 2008; Park et al. 2008). The transcription factors that have been used for iPS cell generation have typically included a combination of Oct4, Sox2, Klf4, and c-Myc or a mix of Oct4, Sox2, Nanog, and Lin28. Knowledge of the transcriptional regulatory circuitry has already provided insights into the mechanisms by which forced expression of these transcription factors leads to reprogramming of somatic cells (Jaenisch and Young 2008). For example, the interconnected autoregulatory loop of ES cells, composed of genes encoding the transcription factors Oct4, Sox2, Nanog, and Tcf3, can be jump-started by transient expression of these reprogramming factors.
Knowledge of the transcriptional regulatory circuitry has recently been used to improve the reprogramming process (Marson et al. 2008a; Mikkelsen et al. 2008). The discovery that the Wnt pathway is connected directly to ES cell core regulatory circuitry (Cole et al. 2008; Tam et al. 2008; Yi et al. 2008) suggested that manipulation of the Wnt pathway might facilitate reprogramming. Indeed, addition of the Wnt3a ligand allows efficient reprogramming even in the absence of c-Myc (Marson et al. 2008a), which is important because the presence of the exogenous c-Myc oncogene in iPS cells leads to tumors in animals derived from these cells (Okita et al. 2007; Jaenisch and Young 2008). For somatic cells to adopt ES cell transcriptional regulatory circuitry, it is thought that they must silence the expression of key regulators of the somatic cell state. This was confirmed by experiments revealing that repression of key regulators of the somatic cell circuitry using RNA inhibition can substantially improve reprogramming efficiency (Mikkelsen et al. 2008).
Knowledge of the transcriptional regulatory circuitry is important because it provides insights into the mechanisms by which key regulators control global gene expression, cell state, and development. Despite having identified only core elements of the ES cell transcriptional regulatory circuitry, important insights have been gained into the control of pluripotency and self-renewal in these cells. This knowledge has also provided insights into the mechanisms involved in reprogramming of cell state (Jaenisch and Young 2008) and has led to improved methods for reprogramming (Marson et al. 2008a; Mikkelsen et al. 2008). The new understanding of the core transcriptional regulatory circuitry in ES cells is also likely to shed light on key aspects of cancer, because many features of the ES cell gene expression program are recapitulated in cancer cells (Ben-Porath et al. 2008; Wong et al. 2008). These advances highlight the importance of future work to further map the regulatory circuitry of a wide range of cells, particularly those of medical importance.
We are grateful for the contributions made by many members of the Young and Jaenisch labs to the insights and concepts described in this paper.