Dox-inducibility of the transgene in each of 54 ES cell lines
We first carried out control experiments and found that three medium changes at 3 hour intervals minimized unwanted perturbation associated with commonly used cell passaging while inducing the transgene fully by effectively removing Dox (
Figure S2). Indeed, the control ES cells, in which an expression unit without an ORF was inserted into the ROSA-TET locus, showed only a small number of genes differentially expressed (see below). In all transgene induction experiments, we set the last medium change as 0 hour induction.
We also carried out time course DNA microarray analysis of 10 ES cell lines (
Figure S3) and time course Western blot analysis of 17 ES cell lines ( and
Figure S4). Western blots showed that in all examined cases, a transgene-derived protein started to appear by 12 hours after induction and reached a maximum level by 48 hours. DNA microarray analysis showed that while the global transcriptome began to change within 24 hours, the expression of the majority of genes changed relatively monotonically until 72 hours (
Figure S3). To capture early effects of TF induction, we looked 48 hours after induction for the expression profiling of all 54 ES cell lines in the Dox+ and Dox− conditions. Except for Dox, all other culture conditions (including LIF) were the same in both Dox+ and Dox− conditions. We confirmed that each transgene expressed a protein that was detectable by an antibody against the FLAG tag by Western blot and immunohistochemistry ( and
Figure S4, S5). Immunohistochemistry also showed that TF-proteins are mainly localized in the nucleus ( and
Figure S5). The induced level of a transgene was comparable among ES cell lines based on the measurement of transcript levels using qRT-PCR ( and
Figure S6A). To assess the induced level of TFs at the protein level, we also carried out Western blot analysis using native antibodies that detect both endogenous and exogenous TFs. As expected, for TFs that are already expressed in ES cells, we detected only mild (up to 2- to 3-fold) increases in TF levels (
Figure S4C, D). For example, the amount of STAT3 protein was induced by 2.7-fold, which was only 3.4-fold higher than that in thymus (
Figure S4D). In contrast, CDX2, which is not usually expressed in ES cells, showed an ~80-fold increase, but was only ~2-fold higher than the highest protein level reached in the differentiated trophoblast cells (
Figure S4D). These data indicate that the induced levels of TFs in this system are largely within the physiological range of gene dosage.
Global patterns of gene expression in response to induction of TFs
To assess the extent of changes in global gene expression patterns, we first combined all the new microarray data obtained from 54 ES cell lines with previous microarray data that we had obtained from ES cells differentiating into three cell lineages (
Aiba et al., 2009). The data sets were fully compatible because the same microarray platform was used. Principal component analysis (PCA) of all 304 microarray data sets revealed that the transcriptome state of all the TF-inducible ES cells, even after 48 hours of TF induction (Dox−), did not shift away from the zone where undifferentiated pluripotent ES cells were clustered (; see
Table S1 for data on expression changes for all genes). This indicates that transcriptome changes measured 48 hours after TF-induction reflect the early effects of the TF in undifferentiated or nearly undifferentiated ES cells, but not in more differentiated cell types. Consistent with the PCA, most of the TF-induced ES cells showed no significant morphological changes at 48 hours (data not shown). However, after 7 days of continuous induction of TFs, most of the examined ES cell lines showed morphological changes indicative of differentiation (
Figure S7).
Interestingly, even these early effects of transcriptome changes often revealed incipient trajectories of differentiation, as shown as clusters in the “heatmap” (; see
Table S2 for data of individual genes). This became more evident when these expression profiles were compared to the microarray data of ES cells differentiating into specific lineages (
Aiba et al., 2009) and those of mature tissues and organs (
Su et al., 2002). For example, ES cell lines with
Sox2,
Pou5f1,
Nrip1, and
Ascl1 showed the greatest similarity to epiblast/neural cells; ES cell lines with
Ascl2,
Cdx2,
Eomes, and
Esx1 showed the greatest similarity to trophoblast cells; and ES cell lines with
Gata3 showed the greatest similarity to primitive endoderm cells (). Similarly, even TFs that are known to function in late lineage specification induced expression profiles that correspond to those late-stage differentiated cells. For example, ES cells with
Myod1 and
Mef2c showed the greatest similarity to heart and muscle tissues, and ES cells with
Sfpi1 showed the greatest similarity to lymph node, thymus, and immune cells (). The results were generally consistent with previously published functions of these TFs:
Cdx2,
Esx1,
Ascl2, and
Eomes (
Simmons and Cross, 2005);
Sox2 and
Ascl1 (
Diez del Corral and Storey, 2001); and
Myod1 and
Mef2c (
Naya and Olson, 1999).
Next, genes whose expression was affected by induction of TFs were identified using pair-wise statistical comparison (FDR<0.05 and expression changes >2-fold) between microarray data for the same clone in Dox+ and Dox− conditions (;
Figure S6B and S8). Some TFs (e.g.,
Cdx2,
Esx1,
Gata3,
Klf4,
Sox9, and
Tcf3) caused substantial changes in the transcriptome, whereas other TFs (e.g.,
Fem1b and
Cbx8) had little effect (;
Table S3 and
S4). For the most part, induction of TFs that were already present in ES cells (ES cell lines on the left side of ) had a smaller effect on the gene expression profile than induction of differentiation-related genes that have low endogenous expression in ES cells (ES cell lines on the right side of ). It seems reasonable that the greater the fold-induction of a TF, the greater the global perturbation of transcriptome. However, induction of
Klf4 and
Sox2 (and to some extent
Pou5f1) resulted in a strong response even though they were already expressed in ES cells and thus showed low fold-induction of their expression levels, indicating that these TFs have an unusually dose-sensitive and potent regulatory role.
Dissecting gene regulatory networks: the example of Cdx2
As a proof of principle for the utility of the ES cell lines, we report our study of Caudal type homeobox 2 (
Cdx2), which was selected because of its exceptionally strong effect on the transcriptome () and its unique role in mouse embryo development.
Cdx2 is the earliest differentiation marker in the embryo and is highly expressed in the trophectoderm lineage (
Strumpf et al., 2005), and the balance between
Pou5f1 and
Cdx2 expression shifts cell fate during preimplantation development (
Niwa et al., 2005).
Time course Western blot analysis showed that the product of the transgene (CDX2-FLAG) was induced to a high level within 0.5 days after removal of Dox, reaching a maximum by 48 hours and remaining very high until day 5, with a slight reduction by day 7 (). Antibodies against CDX2, which recognize both endogenous and exogenous CDX2, showed similar expression patterns (). Colony formation assays followed by alkaline phosphatase staining showed that ES cells and colonies became very flat and lost alkaline phosphatase staining, indicating that
Cdx2-induction alone caused differentiation of ES cells by day 7 of induction (). Differentiation of the
Cdx2-inducible ES cell line to trophoblast cells was confirmed by positive immunostaining with trophoblast markers CDC42 (
Natale and Watson, 2002) and ITGA7 (
Klaffky et al., 2001) (). These data are thus consistent with the previous report of the induction of trophoblast cells from ES cells by
Cdx2-overexpression (
Niwa et al., 2005). Taken together, these data confirm that transgene
Cdx2-FLAG was induced by Dox, was translated properly, and was functional as CDX2 protein.
Based on the DNA microarray analysis, 2090 genes were up-regulated and 1699 were down-regulated by 48 hours of Cdx2-induction (). These genes include not only direct targets of CDX2, but also indirect targets that are regulated by the direct targets. To identify direct targets of CDX2 at the genomic level, we applied ChIP-Seq to
Cdx2-inducible ES cells 48 – 60 hours after induction. At this time, the ES cells did not yet show signs of differentiation; thus, the ChIP-Seq results reflect
Cdx2 function at the very start of ES cell differentiation. ChIP-Western confirmed that FLAG-IP pulled down cross-linked DNA-CDX2 protein complexes in the Dox- condition (). Sequencing of FLAG-ChIP DNAs produced 17.59 million 36-nucleotide tags that were mapped to the latest mouse genome sequence (mm9, NCBI/NIH). We found that 5.72 million tags matched to the genome with ≤ 2 nucleotide mismatches; the remaining tags either did not match to the genome or matched to more than 5 different locations. We found a total of 59,098 peaks with at least 6 tags (
Table S5), of which 15,855 had at least 10 tags. Significant peaks of tags (> 9) were observed at 3,152 loci (genes) within 15 kb upstream and downstream of the transcription start sites (TSS) (
Table S6). However, only 38 genes had peaks in the promoter regions (< 300 bp upstream or downstream from the TSS), which indicates that CDX2 binds mostly to more distant regulatory regions (300 – 15,000 bp upstream or downstream of the TSS). shows a typical example of peaks. Analysis of the CDX2-ChIP target sequence by CisFinder software (A.A.S. and M.S.H.K., in preparation: available at
http://lgsun.grc.nia.nih.gov/CisFinder/) indeed identified one main motif and 5 additional variant motifs (). CDX2 binds mainly to a [T/C][A/C]ATAAA[A/G] motif and to its direct repeat (). The major motif matched the CDX2-binding motif identified by direct binding in an oligonucleotide assay (
Berger et al., 2008).
It is conceivable that some binding sites of TF may not be involved in the regulation of gene expression. Therefore, we used our recently published approach to identify functional direct targets of TF by combining TF binding information from ChIP-Seq and gene expression changes caused by TF induction (
Sharov et al., 2008). The method compares binding score distributions in genes that responded to TF manipulation with those in control genes that did not respond to TF manipulation (see details in
Supplemental Experimental Procedures,
Table S7). Of the 3,152 genes with CDX2-binding sites, the analysis revealed a total of 337 functional target genes that satisfied statistical criteria (p<0.1 and FDR<0.6) (
Table S8). Of these genes, 334 were up-regulated following the induction of
Cdx2 gene, and only 3 were down-regulated. Functional CDX2-target genes included Hox genes and other differentiation-associated genes, consistent with CDX2 function as an inducer of ES cell differentiation ( and
Table S8). The GO annotations of CDX2 target genes are also available (
Table S9). Selected target genes were also further validated by ChIP-qPCR (). Consistent with the ChIP-Seq data, we observed notably high enrichment of CDX2 at promoter regions of Hox genes by ChIP-qPCR validation ().
To see the correlation between the up- or down-regulated genes and CDX2-binding sites, we ranked all 25,030 genes according to the changes caused by
Cdx2-induction and compared them to the probability of CDX2 binding for 3152 genes ( and
Table S6–
S7). Interestingly, genes up-regulated after
Cdx2-induction were strongly enriched in genes with CDX2 binding, but no enrichment was observed among down-regulated genes. This implies that up-regulation (i.e., positive regulation) of downstream genes is mediated by direct binding of CDX2, whereas down-regulation of downstream genes is not. To gain further insights into a possible mechanism for down-regulation by CDX2, we used published ChIP-Seq data for 13 TFs (
Chen et al., 2008) (
Table S10). For each TF, we estimated the proportion of genes with its binding sites in their distal regulatory regions among genes that were up- or down-regulated by CDX2 and compared this with the proportion of genes with binding sites among “control” genes unresponsive to CDX2 (response < 1.25-fold). Strikingly, a list of genes down-regulated by the induction of Cdx2 was enriched in genes that carry binding sites for POU5F1, SOX2, NANOG, STAT3, and SMAD1 (). Because it is known that POU5F1, SOX2, and NANOG form a core transcriptional network (
Boyer et al., 2005;
Chen et al., 2008;
Jaenisch and Young, 2008;
Kim et al., 2008;
Loh et al., 2006), we tentatively call them, as a group, “Pluripotency-Associated Transcription Factors (pTFs).” When we plotted the proportion of genes with binding sites of at least 2 pTFs against the changes of expression caused by
Cdx2-induction ( and
Table S7), we found that genes down-regulated after
Cdx2-induction were strongly enriched in genes with pTF-binding, but no enrichment was observed among up-regulated genes.
We initially considered that CDX2 might first down-regulate transcription of Pou5f1, Sox2, or Nanog, in turn resulting in the reduction of either POU5F1, SOX2, or NANOG protein, and consequently the down-regulation of pTF-target genes. However, ChIP-Seq data showed that CDX2 did not bind to enhancer/promoters of Pou5f1, Sox2, and Nanog. Furthermore, when we tested the protein level of these TFs with time-course Western blot analysis of POU5F1, SOX2, and NANOG after CDX2-induction in ES cells, we found that the levels of POU5F1, SOX2, and NANOG protein did not change by day 2, when pTF-target genes were already significantly down-regulated (, Microarray data). These findings make it unlikely that CDX2 first down-regulated Pou5f1, Sox2, and Nanog, leading to subsequent down-regulation of pTF-target genes. Instead, CDX2 seems to interfere with the binding of POU5F1, SOX2, and/or NANOG to enhancers/promoters of their target genes. To further investigate this possibility, we selected Pou5f1 as an example and carried out ChIP-qPCR analysis of POU5F1-target genes using POU5F1 antibody in Cdx2-inducible ES cells in the Dox+ (i.e., in the absence of CDX2) or in the Dox− (i.e., in the presence of CDX2) conditions (). The results indeed showed that the induction of CDX2 caused reduced binding of POU5F1 to its target sequence in downstream genes (Lef1, Tdgf1, Sox2, Nanog, and to some extent, Pou5f1 itself) (). Taken together, the data strongly suggest that CDX2 up-regulates direct target genes by directly binding to their regulatory regions, but down-regulates genes by interfering with the binding of pTFs to the regulatory regions of pTF-target genes.
To investigate a possible mechanism by which CDX2 might interfere with the binding of pTFs to their target genes, we used a FLAG-antibody to isolate a putative protein complex from the Cdx2-manipulated ES cells 48 – 60 hours after CDX2 induction (Dox−) (). A silver-stained SDS-PAGE gel showed a series of discrete bands that were not observed in a control sample isolated from the Cdx2-manipulated ES cells in Dox+ conditions (). The silver-stained gel also indicated that a significant fraction of CDX2 was present in a free form. Mass spectrometric analysis of the immunoprecipitated protein complex revealed a number of proteins matched by multiple peptides (
Figure S9). Based on the relatively high number of peptide matches, the following protein groups are likely to be the components of CDX2 complex: (i) NuRD (nucleosomal remodeling and histone deacetylase) complex, including HDAC1, MBD3, and CHD4 (
Denslow and Wade, 2007); (ii) SALL4, (iii) PARP1; and (iv) KPNB1 (Importin-1-beta) - a protein known for its function in nuclear transport (
Lange et al., 2007).
The presence of HDAC1, SALL4, and KPNB1 in the CDX2-associated complex was validated by IP-Western blotting using antibodies against these proteins (). To test whether SALL4 is a component of NuRD-CDX2 complex, we carried out a reverse-IP assay using antibodies against HDAC1 and SALL4, respectively (). Western blotting results confirmed the interaction between SALL4 and HDAC1 both in the absence of CDX2 (Dox+) and in the presence of CDX2 (Dox−). By contrast, UBF, used as a control, was present in the HDAC1-IP sample but absent in the SALL4-IP sample. HDAC1 and SALL4 were present at similar levels in the nuclear extract from Dox+ and Dox− cells. Taken together, these data indicate that CDX2 associates with NuRD and SALL4 in Cdx2-induced ES cells.
Compendium analysis of TF-binding sites in genes affected by the induction of TFs
To gain further information about global TF regulatory networks in ES cells, we extended the analysis done for CDX2 to the other 32 TFs that caused significant changes in the expression of >150 genes (). Gene sets that were up- or down-regulated by each induced TF were examined for over-representation of genes with binding sites of various TFs and with various chromatin modifications based on ChIP-Seq data published for ES cells with sufficient tag numbers (
Chen et al., 2008;
Mikkelsen et al., 2007) (
Table S10). The compendium analysis revealed three global features of gene regulatory networks in ES cells.
First, like CDX2, lists of genes down-regulated by the induction of at least 15 other TFs were enriched for those with binding sites for pTFs (i.e., POU5F1, SOX2, NANOG, STAT3, and SMAD1) (;
Figure S10 and S11). To confirm this finding, we did a similar analysis using TF binding sites identified using ChIP-chip methodology (
Kim et al., 2008). The results (
Figure S12) were consistent with our previous analysis and showed that pTFs include 3 additional TFs:
Nr0b1 (
Dax1),
Nac1, and
Zfp281. pTF-targets include genes which are commonly associated with ES cell pluripotency:
Nr5a2,
Fgf4,
Lrrc2,
Foxd3,
Klf2,
Nr0b1,
Tcea3,
Tdgf1,
Zfp42,
Aire,
Phc1,
Mycn,
Sox2,
Jmjd1a,
Jarid2,
Nanog,
Spp1,
Myc,
Nodal,
Dppa3,
Trim24,
Zic3,
Sall4,
Dppa5a,
Rest,
Lefty1,
Lefty2,
Mybl2,
Pou5f1 (see
Figure S11 for a full gene list).
Second, genes with binding sites for the polycomb gene
Suz12 were enriched in the lists of genes that responded to the induction of nearly all the TFs. We looked at the genes previously identified as having “bivalent” chromatin domains in their promoters, characterized by a combination of H3K4me3 and H3K27me3 marks (
Azuara et al., 2006;
Bernstein et al., 2006;
Roh et al., 2006), because it is known that
Suz12 is associated with bivalent domains (
Boyer et al., 2006;
Lee et al., 2006). Using published data (
Mikkelsen et al., 2007) (
Table S10), we found that both genes up-regulated and genes down-regulated by nearly all the TF-inductions were enriched for those with bivalent domains (). Previously, a bivalent domain has been attributed to genes that are “poised” or “primed”, indicating that the expression levels of these genes are low or none, but that the gene is ready to be activated immediately (
Azuara et al., 2006;
Bernstein et al., 2006). Therefore, it is not unusual to see that up-regulated genes fall into the category of genes marked with bivalent domains. However, current models do not anticipate that many of the genes that are down-regulated also fall into bivalent domains. Intrigued by the down-regulation of genes with bivalent domains, we first searched for genes with relatively high expression in ES cells (>30% of maximum expression level) in our earlier compendium microarray data of differentiating ES cells (
Aiba et al., 2009) and found 460 such genes with bivalent domains (
Figure S13). Among them, 280 genes were down-regulated more than 2-fold following induction of some TFs (
Figure S13). To validate whether these genes were indeed down-regulated during differentiation, we examined the changes of expression by microarray and qRT-PCR for 5 genes from this list during ES cell differentiation into trophoblasts. In all genes examined, expression was indeed down-regulated by hundreds of folds (
Figure S13E).
Third, genes with binding sites of MYC, MYCN, E2F, and ZFX in the proximal regulatory regions were strongly depleted in both up- and down-regulated gene lists in almost all the TF-induced ES cells (). Similar results were obtained using MYC binding sites from another report (
Kim et al., 2008) (
Figure S12). Some of these genes are already maximally expressed and therefore cannot be up-regulated further. It is not clear why expression of these genes is not effectively down-regulated following manipulation of TFs. In any case, binding sites of MYC, MYCN, E2F, and ZFX seem to mark genes that are refractory to the induction of TFs.