|Home | About | Journals | Submit | Contact Us | Français|
To examine transcription factor (TF) network(s), we created mouse ES cell lines, in each of which one of 50 TFs tagged with a FLAG moiety is inserted into a ubiquitously controllable tetracycline-repressible locus. Of the 50 TFs, Cdx2 provoked the most extensive transcriptome perturbation in ES cells, followed by Esx1, Sox9, Tcf3, Klf4, and Gata3. ChIP-Seq revealed that CDX2 binds to promoters of up-regulated target genes. By contrast, genes down-regulated by CDX2 did not show CDX2 binding, but were enriched with binding sites for POU5F1, SOX2, and NANOG. Genes with binding sites for these core TFs were also down-regulated by the induction of at least 15 other TFs, suggesting a common initial step for ES cell differentiation mediated by interference with the binding of core TFs to their target genes. These ES cell lines provide a fundamental resource to study biological networks in ES cells and mice.
The prevailing paradigm of modern biology states that gene regulatory networks determine the identity of cells, and their alteration by environmental factors dictates changes of cell identity, i.e., cell differentiation (Davidson, 2006). Analysis of the structure and dynamics of gene regulatory networks is key to the understanding of biological systems but poses a great challenge due to the vast and manifold complexity of regulatory mechanisms.
One possible approach is to carry out a systematic gene perturbation study in order to “reverse engineer” these regulatory networks. Ideally, all the transcription factors would be manipulated one at a time and in different combinations in a variety of cell types. Readout of the impact of such manipulation would be monitored in a variety of ways, including the profiling of all RNA transcripts and proteins. This type of approach is complementary to conventional studies in which systematic gene manipulations have been successfully carried out, but usually with a focus on the phenotype, e.g., cell morphology, growth property, and differentiation markers (Chambers et al., 2003; Ivanova et al., 2006; Pritsker et al., 2006). To this end, mouse ES cells (Evans and Kaufman, 1981; Martin, 1981) are most suitable, as they can be differentiated into a variety of cell types in distinct cell culture conditions in vitro (Murry and Keller, 2008) and can also be developed into animal models to further study the effects of gene perturbation in vivo (Solter, 2006). As a complementary approach to the comprehensive “loss-of-function” study of all mouse genes (Collins et al., 2007; Skarnes et al., 2004), we aim to generate ES cell lines in which a TF can be induced for “gain-of-function” in a controlled manner, enabling observations of the network perturbation caused by each TF in a uniform condition across all the ES cell lines.
Global gene regulatory networks have been intensively studied in mouse ES cells by expression profiling (Walker et al., 2007), protein complex analysis (Wang et al., 2006), and genome-wide chromatin immunoprecipitation (Loh et al., 2006). The critical roles of three transcription factors - Pou5f1 (Oct4 or Oct3/4) (Nichols et al., 1998), Sox2 (Yuan et al., 1995), and Nanog (Chambers et al., 2003; Mitsui et al., 2003) – discovered earlier (reviewed in (Niwa, 2007; Silva and Smith, 2008)) have recently been rationalized by the discovery of core transcriptional regulatory networks between these genes (Boyer et al., 2005; Loh et al., 2006). Furthermore, similar analyses of other key TFs in mouse ES cells have successfully extended the core transcriptional network (Chen et al., 2008; Kim et al., 2008). Obviously, the analysis of many more TFs, including genes that are not expressed in ES cells, is required to explore global TF network(s) that may be outside of the core transcriptional network of Pou5f1, Sox2, and Nanog. To this end, an appropriate mouse ES cell bank could facilitate a variety of high-throughput, genome-wide analysis methodologies.
Here we describe a strategy for and the establishment of TF-inducible ES cell lines, and we show how these ES cell lines can be used in several ways. As a proof of principle for the strategy, we characterize how an exemplary differentiation-inducing TF, Cdx2, regulates the global transcriptome and shifts the balance of gene regulatory network toward ES cell differentiation.
We analyzed 50 TF genes (~3% of all 1,600 – 2,000 TFs encoded in the mouse genome (Kanamori et al., 2004; Messina et al., 2004) to assess the consequences of their induction in ES cells. These genes were selected primarily from a set of high-priority genes involved in critical functions in mouse ES cells and their differentiation, inferred in our previous work (Matoba et al., 2006). About half the genes are expressed in undifferentiated ES cells; the other half are not expressed or are expressed at a low level in undifferentiated ES cells but are induced in differentiating ES cells. We also included genes expressed late in lineage specification (Ascl1, Ascl2, Myod1, Sox9, and Sfpi1) to see whether TFs can induce their cognate targets without their usual regulatory context for function. Three non-TF genes - Dppa5a, Gadd45a, and Tuba1a - and one empty vector were included as controls. A total of 53 genes and a control used for this work are listed in Figure 1A. To induce a specific TF, we employed the Tet-repressible gene expression system, with the expression cassette integrated at the ROSA26 locus (ROSA-TET locus) (Figure 1B and Figure S1A) (Masui et al., 2005). This system makes use of the ubiquitous and relatively high expression at the ROSA26 locus (Soriano, 1999). In the absence of doxycycline (Dox), the recombinant ROSA-TET locus expresses a polycistronic transcript for the Open Reading Frame (ORF) of TF and Venus YFP proteins. To facilitate the detailed analyses of individual TFs, we inserted a FLAG-tag at the C-terminus of all transgenes, making it possible to use FLAG as a universal bait for immunological assays. In order to minimize clone-to-clone variation of gene expression level and to generate these ES cells as a permanent resource for future work, we synchronized passage numbers and performed multi-step quality control of these ES cells (Figure 1B, F and G: Supplemental Text).
We first carried out control experiments and found that three medium changes at 3 hour intervals minimized unwanted perturbation associated with commonly used cell passaging while inducing the transgene fully by effectively removing Dox (Figure S2). Indeed, the control ES cells, in which an expression unit without an ORF was inserted into the ROSA-TET locus, showed only a small number of genes differentially expressed (see below). In all transgene induction experiments, we set the last medium change as 0 hour induction.
We also carried out time course DNA microarray analysis of 10 ES cell lines (Figure S3) and time course Western blot analysis of 17 ES cell lines (Figure 1D and Figure S4). Western blots showed that in all examined cases, a transgene-derived protein started to appear by 12 hours after induction and reached a maximum level by 48 hours. DNA microarray analysis showed that while the global transcriptome began to change within 24 hours, the expression of the majority of genes changed relatively monotonically until 72 hours (Figure S3). To capture early effects of TF induction, we looked 48 hours after induction for the expression profiling of all 54 ES cell lines in the Dox+ and Dox− conditions. Except for Dox, all other culture conditions (including LIF) were the same in both Dox+ and Dox− conditions. We confirmed that each transgene expressed a protein that was detectable by an antibody against the FLAG tag by Western blot and immunohistochemistry (Figure 1D, E and Figure S4, S5). Immunohistochemistry also showed that TF-proteins are mainly localized in the nucleus (Figure 1E and Figure S5). The induced level of a transgene was comparable among ES cell lines based on the measurement of transcript levels using qRT-PCR (Figure 1C and Figure S6A). To assess the induced level of TFs at the protein level, we also carried out Western blot analysis using native antibodies that detect both endogenous and exogenous TFs. As expected, for TFs that are already expressed in ES cells, we detected only mild (up to 2- to 3-fold) increases in TF levels (Figure S4C, D). For example, the amount of STAT3 protein was induced by 2.7-fold, which was only 3.4-fold higher than that in thymus (Figure S4D). In contrast, CDX2, which is not usually expressed in ES cells, showed an ~80-fold increase, but was only ~2-fold higher than the highest protein level reached in the differentiated trophoblast cells (Figure S4D). These data indicate that the induced levels of TFs in this system are largely within the physiological range of gene dosage.
To assess the extent of changes in global gene expression patterns, we first combined all the new microarray data obtained from 54 ES cell lines with previous microarray data that we had obtained from ES cells differentiating into three cell lineages (Aiba et al., 2009). The data sets were fully compatible because the same microarray platform was used. Principal component analysis (PCA) of all 304 microarray data sets revealed that the transcriptome state of all the TF-inducible ES cells, even after 48 hours of TF induction (Dox−), did not shift away from the zone where undifferentiated pluripotent ES cells were clustered (Figure 2A; see Table S1 for data on expression changes for all genes). This indicates that transcriptome changes measured 48 hours after TF-induction reflect the early effects of the TF in undifferentiated or nearly undifferentiated ES cells, but not in more differentiated cell types. Consistent with the PCA, most of the TF-induced ES cells showed no significant morphological changes at 48 hours (data not shown). However, after 7 days of continuous induction of TFs, most of the examined ES cell lines showed morphological changes indicative of differentiation (Figure S7).
Interestingly, even these early effects of transcriptome changes often revealed incipient trajectories of differentiation, as shown as clusters in the “heatmap” (Figure 2B; see Table S2 for data of individual genes). This became more evident when these expression profiles were compared to the microarray data of ES cells differentiating into specific lineages (Aiba et al., 2009) and those of mature tissues and organs (Su et al., 2002). For example, ES cell lines with Sox2, Pou5f1, Nrip1, and Ascl1 showed the greatest similarity to epiblast/neural cells; ES cell lines with Ascl2, Cdx2, Eomes, and Esx1 showed the greatest similarity to trophoblast cells; and ES cell lines with Gata3 showed the greatest similarity to primitive endoderm cells (Figure 2C). Similarly, even TFs that are known to function in late lineage specification induced expression profiles that correspond to those late-stage differentiated cells. For example, ES cells with Myod1 and Mef2c showed the greatest similarity to heart and muscle tissues, and ES cells with Sfpi1 showed the greatest similarity to lymph node, thymus, and immune cells (Figure 2D). The results were generally consistent with previously published functions of these TFs: Cdx2, Esx1, Ascl2, and Eomes (Simmons and Cross, 2005); Sox2 and Ascl1 (Diez del Corral and Storey, 2001); and Myod1 and Mef2c (Naya and Olson, 1999).
Next, genes whose expression was affected by induction of TFs were identified using pair-wise statistical comparison (FDR<0.05 and expression changes >2-fold) between microarray data for the same clone in Dox+ and Dox− conditions (Figure 3A; Figure S6B and S8). Some TFs (e.g., Cdx2, Esx1, Gata3, Klf4, Sox9, and Tcf3) caused substantial changes in the transcriptome, whereas other TFs (e.g., Fem1b and Cbx8) had little effect (Figure 3B; Table S3 and S4). For the most part, induction of TFs that were already present in ES cells (ES cell lines on the left side of Figure 3B) had a smaller effect on the gene expression profile than induction of differentiation-related genes that have low endogenous expression in ES cells (ES cell lines on the right side of Figure 3B). It seems reasonable that the greater the fold-induction of a TF, the greater the global perturbation of transcriptome. However, induction of Klf4 and Sox2 (and to some extent Pou5f1) resulted in a strong response even though they were already expressed in ES cells and thus showed low fold-induction of their expression levels, indicating that these TFs have an unusually dose-sensitive and potent regulatory role.
As a proof of principle for the utility of the ES cell lines, we report our study of Caudal type homeobox 2 (Cdx2), which was selected because of its exceptionally strong effect on the transcriptome (Figure 3B) and its unique role in mouse embryo development. Cdx2 is the earliest differentiation marker in the embryo and is highly expressed in the trophectoderm lineage (Strumpf et al., 2005), and the balance between Pou5f1 and Cdx2 expression shifts cell fate during preimplantation development (Niwa et al., 2005).
Time course Western blot analysis showed that the product of the transgene (CDX2-FLAG) was induced to a high level within 0.5 days after removal of Dox, reaching a maximum by 48 hours and remaining very high until day 5, with a slight reduction by day 7 (Figure 4A). Antibodies against CDX2, which recognize both endogenous and exogenous CDX2, showed similar expression patterns (Figure 4A). Colony formation assays followed by alkaline phosphatase staining showed that ES cells and colonies became very flat and lost alkaline phosphatase staining, indicating that Cdx2-induction alone caused differentiation of ES cells by day 7 of induction (Figure 4B). Differentiation of the Cdx2-inducible ES cell line to trophoblast cells was confirmed by positive immunostaining with trophoblast markers CDC42 (Natale and Watson, 2002) and ITGA7 (Klaffky et al., 2001) (Figure 4C). These data are thus consistent with the previous report of the induction of trophoblast cells from ES cells by Cdx2-overexpression (Niwa et al., 2005). Taken together, these data confirm that transgene Cdx2-FLAG was induced by Dox, was translated properly, and was functional as CDX2 protein.
Based on the DNA microarray analysis, 2090 genes were up-regulated and 1699 were down-regulated by 48 hours of Cdx2-induction (Figure 3A). These genes include not only direct targets of CDX2, but also indirect targets that are regulated by the direct targets. To identify direct targets of CDX2 at the genomic level, we applied ChIP-Seq to Cdx2-inducible ES cells 48 – 60 hours after induction. At this time, the ES cells did not yet show signs of differentiation; thus, the ChIP-Seq results reflect Cdx2 function at the very start of ES cell differentiation. ChIP-Western confirmed that FLAG-IP pulled down cross-linked DNA-CDX2 protein complexes in the Dox- condition (Figure 5A). Sequencing of FLAG-ChIP DNAs produced 17.59 million 36-nucleotide tags that were mapped to the latest mouse genome sequence (mm9, NCBI/NIH). We found that 5.72 million tags matched to the genome with ≤ 2 nucleotide mismatches; the remaining tags either did not match to the genome or matched to more than 5 different locations. We found a total of 59,098 peaks with at least 6 tags (Table S5), of which 15,855 had at least 10 tags. Significant peaks of tags (> 9) were observed at 3,152 loci (genes) within 15 kb upstream and downstream of the transcription start sites (TSS) (Table S6). However, only 38 genes had peaks in the promoter regions (< 300 bp upstream or downstream from the TSS), which indicates that CDX2 binds mostly to more distant regulatory regions (300 – 15,000 bp upstream or downstream of the TSS). Figure 5B shows a typical example of peaks. Analysis of the CDX2-ChIP target sequence by CisFinder software (A.A.S. and M.S.H.K., in preparation: available at http://lgsun.grc.nia.nih.gov/CisFinder/) indeed identified one main motif and 5 additional variant motifs (Figure 5C). CDX2 binds mainly to a [T/C][A/C]ATAAA[A/G] motif and to its direct repeat (Figure 5D). The major motif matched the CDX2-binding motif identified by direct binding in an oligonucleotide assay (Berger et al., 2008).
It is conceivable that some binding sites of TF may not be involved in the regulation of gene expression. Therefore, we used our recently published approach to identify functional direct targets of TF by combining TF binding information from ChIP-Seq and gene expression changes caused by TF induction (Sharov et al., 2008). The method compares binding score distributions in genes that responded to TF manipulation with those in control genes that did not respond to TF manipulation (see details in Supplemental Experimental Procedures, Table S7). Of the 3,152 genes with CDX2-binding sites, the analysis revealed a total of 337 functional target genes that satisfied statistical criteria (p<0.1 and FDR<0.6) (Table S8). Of these genes, 334 were up-regulated following the induction of Cdx2 gene, and only 3 were down-regulated. Functional CDX2-target genes included Hox genes and other differentiation-associated genes, consistent with CDX2 function as an inducer of ES cell differentiation (Figure 5E and Table S8). The GO annotations of CDX2 target genes are also available (Table S9). Selected target genes were also further validated by ChIP-qPCR (Figure 5F). Consistent with the ChIP-Seq data, we observed notably high enrichment of CDX2 at promoter regions of Hox genes by ChIP-qPCR validation (Figure 5F).
To see the correlation between the up- or down-regulated genes and CDX2-binding sites, we ranked all 25,030 genes according to the changes caused by Cdx2-induction and compared them to the probability of CDX2 binding for 3152 genes (Figure 5G and Table S6–S7). Interestingly, genes up-regulated after Cdx2-induction were strongly enriched in genes with CDX2 binding, but no enrichment was observed among down-regulated genes. This implies that up-regulation (i.e., positive regulation) of downstream genes is mediated by direct binding of CDX2, whereas down-regulation of downstream genes is not. To gain further insights into a possible mechanism for down-regulation by CDX2, we used published ChIP-Seq data for 13 TFs (Chen et al., 2008) (Table S10). For each TF, we estimated the proportion of genes with its binding sites in their distal regulatory regions among genes that were up- or down-regulated by CDX2 and compared this with the proportion of genes with binding sites among “control” genes unresponsive to CDX2 (response < 1.25-fold). Strikingly, a list of genes down-regulated by the induction of Cdx2 was enriched in genes that carry binding sites for POU5F1, SOX2, NANOG, STAT3, and SMAD1 (Figure 5H). Because it is known that POU5F1, SOX2, and NANOG form a core transcriptional network (Boyer et al., 2005; Chen et al., 2008; Jaenisch and Young, 2008; Kim et al., 2008; Loh et al., 2006), we tentatively call them, as a group, “Pluripotency-Associated Transcription Factors (pTFs).” When we plotted the proportion of genes with binding sites of at least 2 pTFs against the changes of expression caused by Cdx2-induction (Figure 5G and Table S7), we found that genes down-regulated after Cdx2-induction were strongly enriched in genes with pTF-binding, but no enrichment was observed among up-regulated genes.
We initially considered that CDX2 might first down-regulate transcription of Pou5f1, Sox2, or Nanog, in turn resulting in the reduction of either POU5F1, SOX2, or NANOG protein, and consequently the down-regulation of pTF-target genes. However, ChIP-Seq data showed that CDX2 did not bind to enhancer/promoters of Pou5f1, Sox2, and Nanog. Furthermore, when we tested the protein level of these TFs with time-course Western blot analysis of POU5F1, SOX2, and NANOG after CDX2-induction in ES cells, we found that the levels of POU5F1, SOX2, and NANOG protein did not change by day 2, when pTF-target genes were already significantly down-regulated (Figure 5I, Microarray data). These findings make it unlikely that CDX2 first down-regulated Pou5f1, Sox2, and Nanog, leading to subsequent down-regulation of pTF-target genes. Instead, CDX2 seems to interfere with the binding of POU5F1, SOX2, and/or NANOG to enhancers/promoters of their target genes. To further investigate this possibility, we selected Pou5f1 as an example and carried out ChIP-qPCR analysis of POU5F1-target genes using POU5F1 antibody in Cdx2-inducible ES cells in the Dox+ (i.e., in the absence of CDX2) or in the Dox− (i.e., in the presence of CDX2) conditions (Figure 5J). The results indeed showed that the induction of CDX2 caused reduced binding of POU5F1 to its target sequence in downstream genes (Lef1, Tdgf1, Sox2, Nanog, and to some extent, Pou5f1 itself) (Figure 5J). Taken together, the data strongly suggest that CDX2 up-regulates direct target genes by directly binding to their regulatory regions, but down-regulates genes by interfering with the binding of pTFs to the regulatory regions of pTF-target genes.
To investigate a possible mechanism by which CDX2 might interfere with the binding of pTFs to their target genes, we used a FLAG-antibody to isolate a putative protein complex from the Cdx2-manipulated ES cells 48 – 60 hours after CDX2 induction (Dox−) (Figure 6A). A silver-stained SDS-PAGE gel showed a series of discrete bands that were not observed in a control sample isolated from the Cdx2-manipulated ES cells in Dox+ conditions (Figure 6B). The silver-stained gel also indicated that a significant fraction of CDX2 was present in a free form. Mass spectrometric analysis of the immunoprecipitated protein complex revealed a number of proteins matched by multiple peptides (Figure S9). Based on the relatively high number of peptide matches, the following protein groups are likely to be the components of CDX2 complex: (i) NuRD (nucleosomal remodeling and histone deacetylase) complex, including HDAC1, MBD3, and CHD4 (Denslow and Wade, 2007); (ii) SALL4, (iii) PARP1; and (iv) KPNB1 (Importin-1-beta) - a protein known for its function in nuclear transport (Lange et al., 2007).
The presence of HDAC1, SALL4, and KPNB1 in the CDX2-associated complex was validated by IP-Western blotting using antibodies against these proteins (Figure 6C). To test whether SALL4 is a component of NuRD-CDX2 complex, we carried out a reverse-IP assay using antibodies against HDAC1 and SALL4, respectively (Figure 6D). Western blotting results confirmed the interaction between SALL4 and HDAC1 both in the absence of CDX2 (Dox+) and in the presence of CDX2 (Dox−). By contrast, UBF, used as a control, was present in the HDAC1-IP sample but absent in the SALL4-IP sample. HDAC1 and SALL4 were present at similar levels in the nuclear extract from Dox+ and Dox− cells. Taken together, these data indicate that CDX2 associates with NuRD and SALL4 in Cdx2-induced ES cells.
To gain further information about global TF regulatory networks in ES cells, we extended the analysis done for CDX2 to the other 32 TFs that caused significant changes in the expression of >150 genes (Figure 3B). Gene sets that were up- or down-regulated by each induced TF were examined for over-representation of genes with binding sites of various TFs and with various chromatin modifications based on ChIP-Seq data published for ES cells with sufficient tag numbers (Chen et al., 2008; Mikkelsen et al., 2007) (Table S10). The compendium analysis revealed three global features of gene regulatory networks in ES cells.
First, like CDX2, lists of genes down-regulated by the induction of at least 15 other TFs were enriched for those with binding sites for pTFs (i.e., POU5F1, SOX2, NANOG, STAT3, and SMAD1) (Figure 7A; Figure S10 and S11). To confirm this finding, we did a similar analysis using TF binding sites identified using ChIP-chip methodology (Kim et al., 2008). The results (Figure S12) were consistent with our previous analysis and showed that pTFs include 3 additional TFs: Nr0b1 (Dax1), Nac1, and Zfp281. pTF-targets include genes which are commonly associated with ES cell pluripotency: Nr5a2, Fgf4, Lrrc2, Foxd3, Klf2, Nr0b1, Tcea3, Tdgf1, Zfp42, Aire, Phc1, Mycn, Sox2, Jmjd1a, Jarid2, Nanog, Spp1, Myc, Nodal, Dppa3, Trim24, Zic3, Sall4, Dppa5a, Rest, Lefty1, Lefty2, Mybl2, Pou5f1 (see Figure S11 for a full gene list).
Second, genes with binding sites for the polycomb gene Suz12 were enriched in the lists of genes that responded to the induction of nearly all the TFs. We looked at the genes previously identified as having “bivalent” chromatin domains in their promoters, characterized by a combination of H3K4me3 and H3K27me3 marks (Azuara et al., 2006; Bernstein et al., 2006; Roh et al., 2006), because it is known that Suz12 is associated with bivalent domains (Boyer et al., 2006; Lee et al., 2006). Using published data (Mikkelsen et al., 2007) (Table S10), we found that both genes up-regulated and genes down-regulated by nearly all the TF-inductions were enriched for those with bivalent domains (Figure 7A). Previously, a bivalent domain has been attributed to genes that are “poised” or “primed”, indicating that the expression levels of these genes are low or none, but that the gene is ready to be activated immediately (Azuara et al., 2006; Bernstein et al., 2006). Therefore, it is not unusual to see that up-regulated genes fall into the category of genes marked with bivalent domains. However, current models do not anticipate that many of the genes that are down-regulated also fall into bivalent domains. Intrigued by the down-regulation of genes with bivalent domains, we first searched for genes with relatively high expression in ES cells (>30% of maximum expression level) in our earlier compendium microarray data of differentiating ES cells (Aiba et al., 2009) and found 460 such genes with bivalent domains (Figure S13). Among them, 280 genes were down-regulated more than 2-fold following induction of some TFs (Figure S13). To validate whether these genes were indeed down-regulated during differentiation, we examined the changes of expression by microarray and qRT-PCR for 5 genes from this list during ES cell differentiation into trophoblasts. In all genes examined, expression was indeed down-regulated by hundreds of folds (Figure S13E).
Third, genes with binding sites of MYC, MYCN, E2F, and ZFX in the proximal regulatory regions were strongly depleted in both up- and down-regulated gene lists in almost all the TF-induced ES cells (Figure 7A). Similar results were obtained using MYC binding sites from another report (Kim et al., 2008) (Figure S12). Some of these genes are already maximally expressed and therefore cannot be up-regulated further. It is not clear why expression of these genes is not effectively down-regulated following manipulation of TFs. In any case, binding sites of MYC, MYCN, E2F, and ZFX seem to mark genes that are refractory to the induction of TFs.
Initial analyses of ~3% (50) of all TF genes have provided a glimpse of the structure and dynamics of global gene regulatory networks as well as proof-of-principle that this experimental system provides potentially universal tools and resources for “gain-of-function” analyses of transcription factors (TFs) in vitro and in vivo (Figure 7B). The ES cell lines reported here will be freely available to the research community, which could facilitate rapid accumulation and comparison of a variety of information on these standardized ES cells.
One of the striking observations is the difference between TFs in terms of their relative impact on the ES transcriptome (Figure 3). Some TFs can cause a huge perturbation, whereas others cause almost no change. Of particular interest, TFs with a high impact on the transcriptome include the four TFs (Klf4, Pou5f1, Sox2, and Myc) that have been successfully used to convert fibroblast cells into iPS cells (Takahashi and Yamanaka, 2006). This may indicate that formation of iPS cells correlates with the capacity of these TFs to perturb the transcriptome dramatically. Overall, the systematic study of TFs reveals important behaviors that would not be immediately evident in traditional phenotype-driven screens.
Interestingly, TFs that are known to function in late lineage specification (Ascl1, Ascl2, Myod1, Sox9, and Sfpi1) induced expression profiles that overlapped substantially with late-stage differentiated cells within 48 hours of TF overexpression (Figure 2C, D). The data suggest that undifferentiated ES cells may be in a permissive or susceptible state, in which forced induction of single TFs can make relevant changes in the transcriptome, regardless of whether usual TF partners or regulatory context are in place. These data seem to be consistent with the idea that the chromatin of ES cells is less restricted and more open, so changes of TF level alone can cause critical transcriptome changes (Meshorer and Misteli, 2006; Niwa et al., 2005; Silva and Smith, 2008). Whether or not this feature is specific to ES cells should be further tested by examining other cell types, such as fibroblasts.
There are nevertheless categories of genes whose response is modulated by structural or epigenetic cues. For example, genes up-regulated by various TFs are enriched in genes with bivalent domains (H3K4me3+H3K27me3) and depleted in genes with binding sites of MYC, MYCN, E2F, and ZFX in promoters (Figure 7A). The same trend was observed for genes that were down-regulated by these TFs, although the number of genes was much smaller (only up to 10% of all ~3,000 genes with bivalent domains). Thus, genes with bivalent domains may form a dynamic network that can be rapidly up- or down-regulated by changes in expression of TFs, whereas genes with binding sites of MYC, MYCN, E2F, and ZFX in promoters tend to maintain the status quo in their expression level responses to TF changes.
By current thinking, the maintenance of expression of ES-specific genes is governed by the transcriptional network of pTFs (especially, Pou5f1, Sox2, and Nanog) (Boyer et al., 2005; Chen et al., 2008; Jaenisch and Young, 2008; Kim et al., 2008; Loh et al., 2006). These genes form a positive feedback loop to maintain their own expression levels while at the same time regulating other ES cell-specific genes (Figure 7C). Our data imply that CDX2 induction causes the widespread down-regulation of pTF-target genes. However, the data further indicate that CDX2 does not inactivate the transcription of pTFs by directly binding to their regulatory regions; rather, CDX2 interferes with the binding of pTFs to the regulatory regions of pTF-target genes (Figure 7D). Such an effect at the protein complex level would facilitate swift adaptation for ES cells to begin or commit to differentiation pathways. Because at least some of the pTFs (POU5F1, SOX2, and NANOG) are also pTF-targets, the protein levels of POU5F1, SOX2, and NANOG would eventually decline, and the differentiation process would pass a “point of no return” and become “irreversible.”
It has been well established that NuRD is involved in gene transcriptional repression and chromatin remodeling (Denslow and Wade, 2007). Therefore, NuRD could play a major role in interfering with the bindings of pTFs to their targets. This notion is consistent with previous findings that ES cells lacking MBD3, one of the key components of the NuRD complex, cannot differentiate and remain undifferentiated even under differentiation-inducing conditions (Kaji et al., 2006). Indeed, our mass-spectrometry analysis of CDX2-associated protein complex identified the presence of MBD3. However, the exact mechanism of NuRD actions remains unknown. For example, CDX2 may recruit NuRD to the pTFs, resulting in inactivation of the pTFs. Alternatively, CDX2 may compete with the pTFs (particularly POU5F1 (Niwa et al., 2005)) for the binding of NuRD, as it has been shown that pTFs interact with the NuRD in the absence of CDX2 in undifferentiated ES cells (Liang et al., 2008; Wang et al., 2006). The SALL4 - CDX2 association revealed in this study may also be significant, as Sall4 is required to maintain ES cell pluripotency and is important for early embryonic cell-fate decisions (Kim et al., 2008; Lim et al., 2008; Yang et al., 2008; Zhou et al., 2007).
We note at least three possible groups of TFs based on their ability to down-regulate the pTF-target genes. A first group of TFs shows no strong effects on pTF-target genes. A second group of TFs is exemplified by CDX2 and includes Esx1, Dlx3, Gata3, Ascl1, Sox9, Sfpi1, Mef2c, Nr2f2, and Myod1 (Figure 7A). These TFs are not expressed or are expressed at low levels in undifferentiated ES cells. The forced induction of these TFs down-regulates direct target genes of pTFs, possibly through the same mechanism as CDX2. This is reasonable, because the differentiation of ES cells requires the down-regulation of ES-specific genes, particularly genes involved in the maintenance of pluripotency of ES cells. A third group of TFs includes Pou5f1, Sox2, and Klf4, which are expressed relatively highly in undifferentiated ES cells, but still show a significant perturbation of the global transcriptome after a moderate increase in their expression levels (Figure 7A). Because it has been shown that pTFs (e.g., POU5F1, SOX2, and NANOG) form a protein complex (Liang et al., 2008; Wang et al., 2006), overexpression of one component could distort the stoichiometry of this complex, possibly resulting in a reduced amount of the effective protein complex and leading to the down-regulation of direct target genes of pTFs. As an additional consistent observation, it has been reported that Klf4 regulates downstream genes in a synergistic manner with Pou5f1 and Sox2 (Nakatake et al., 2006). This model rationalizes the fact that both Pou5f1 overexpression and repression can down-regulate pTF-target genes, as both of them could cause a reduction in effective pTF complexes. This peculiar dose-response pattern of gene expression has previously been called “squelching” (Scholer et al., 1991), but has also been called “bell-shaped” or “inverse bell-shaped” based on the behavior of a large number of genes regulated by Pou5f1 in DNA microarray studies (Matoba et al., 2006).
At the outset of this project, we reasoned that analyses of a large number of TFs might be requisite to explore global TF network(s) beyond the core transcriptional network of Pou5f1, Sox2, and Nanog. After analyzing large-scale transcriptome changes induced by 50 TFs, the core network remains one of the most conspicuous features of gene expression regulation in ES cells. These data thus reinforce the current paradigm that Pou5f1/Sox2/Nanog are the key regulators maintaining the pluripotent undifferentiated state of ES cells (Jaenisch and Young, 2008; Niwa, 2007; Silva and Smith, 2008). Furthermore, our data have revealed that suppression of pTF activity, which causes widespread down-regulation of ES-specific genes, is an early step of TF-induced ES cell differentiation. Further mining of the results reported here may provide additional inferences about relevant gene regulatory networks. Carrying out similar experiments for a larger number of regulatory proteins – and ideally all 1,600 – 2,000 TFs and additional signaling proteins – should give increasingly complete information to help infer the cybernetic networks in mammalian cells.
MC1 mouse ES cells derived from 129S6/SvEvTac were cultured in DMEM with 15% FBS and LIF on feeder cells. Cells were electroporated with linearized pMWROSATcH and selected by hygromycin B. Knock-in for ROSA-TET locus in ES[MC1R(20)] cells was confirmed by southern blotting. For exchange vectors, PCR amplified ORFs were subcloned into pZhcSfi that was modified to express His6-FLAG tagged protein and puromycin resistant gene. ES[MC1R(20)] cells (passage 17) cultured on feeder cells were co-transfected with a sequence verified exchange vector and pCAGGS-Cre and selected by puromycin in the presence of Dox. Isolated clones were tested for Venus expression, hygromycin B susceptibility, transgene RNA expression, genotyping for Cre mediated integration, karyotyping, western blotting using anti-FLAG antibody and mycoplasma contamination (Supplemental Text). Further details about the ES cell lines and how to obtain them can be found in the project website: http://esbank.nia.nih.gov/.
ES cells (passage 25) were cultured in the standard LIF+ medium on a gelatin-coated dish through the experiments. Dox was removed through washing 3 times with PBS at 3 hours intervals and total RNA was isolated by TRIzol (Invitrogen) after 2 days. All procedures for each ES line were done in two independent replications (Figure S14). Cy3-CTP labeled sample targets were prepared with total RNA by Low RNA Input Fluorescent Linear Amplification Kit (Agilent). A Cy5-CTP labeled reference target was produced from mixture of Stratagene Universal Mouse Reference RNA and MC1 cells RNA. Targets were hybridized to the NIA Mouse 44K Microarray v3.0 (Agilent, design ID 015087) (Carter et al., 2005). Slides were scanned with Agilent DNA Microarray Scanner. All DNA Microarray data are available in Table S1, at GEO/NCBI (http://www.ncbi.nlm.nih.gov/geo/; GSE16375), and at NIA Array Analysis software ((Sharov et al., 2005); http://lgsun.grc.nia.nih.gov/ANOVA/).
CDX2-complexes were purified with anti-FLAG M2 affinity gel and proteins were eluted by 3XFLAG peptide for mass spectrum analysis or Laemmli's sample buffer for western blotting.
Cross-linked chromatin from Cdx2 expressed cells was fragmented by sonication and incubated with anti-FLAG M2 affinity gel. The immunoprecipitate was eluted and reverse cross-linked. Sequence sample preparation, Cluster generation on Cluster Station (Illumina), sequencing by Genome Analyzer II (Illumina) were performed according to Illumina's manuals. See details in Supplementary Experimental Procedures. ChIP-Seq data are available at GEO/NCBI (GSE16375).
We thank Dr. Hitoshi Niwa for providing the Tet-inducible vector system and some cDNA clones, and for discussion. This research was supported entirely by the Intramural Research Program of the NIH, National Institute on Aging.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.