The capability of human embryonic stem cells (hESCs) to undergo unlimited self-renewal while retaining the pluripotency to differentiate into all cell lineages in the body has raised great hope for developing cell replacement therapy. Although a handful of regulators of hESC self-renewal and differentiation have been identified, the underlying mechanisms are not fully understood and additional regulators remain to be uncovered. High-throughput screens for genes important for maintaining pluripotency in hESCs have revealed hundreds of potential regulators (1). Distinguishing direct from indirect regulation in such screens, and elucidating the mechanisms by which these regulators act, are the immediate challenges.
In this study, we focus on identifying transcription factors (TFs) that may play crucial roles in regulating self-renewal and differentiation of ESCs. The functions of a TF are largely conveyed by its target genes. Despite the availability of complete genome sequences for many organisms, genome-wide identification of direct TF targets and assembly of these regulatory interactions into a functional network remain a challenge in mammals (2). Chromatin immunoprecipitation (ChIP)-based technologies have been exploited to determine the binding sites of numerous TFs in higher organisms (3,4). However, this approach is hindered by the limited availability of suitable antibodies and cell types for analysis. Additionally, not all TF-binding sites have functional consequences. In parallel, many computational methods have been developed to infer transcription networks from gene expression data, TF binding data or the integration of various types of data (2). When applied to mammalian genomes, these methods often fail to distinguish direct from indirect target genes or to identify the context-dependent activities of the transcription network.
Recent studies reveal that functional elements such as promoters and enhancers are associated with characteristic chromatin signatures (5–7). Genome-wide maps of chromatin modification states have led to the identification of such elements in the human genome (6,8). More recently, DNA methylomes have been mapped at base resolution in multiple cell types (9). These epigenomic data are context dependent and reflect the functional state of the cell. Integrating epigenomic and genomic information to identify transcription factor-binding sites (TFBSs) (10,11) may allow one to determine cell-type-specific transcription networks. In this study, we demonstrate the success of this approach by reconstructing cell-type-specific transcription networks in pluripotent cells (human and mouse ESCs, hESC and mESC) and in a lineage-committed cell type (human fetal lung fibroblast, hFLF). Genome-wide identification of TFBSs allowed reconstruction of these networks (each consisting of >11,000 genes and >500,000 interactions) at an unprecedented scale.
We conducted systems-level analyses of these networks that revealed regulators of pluripotency or differentiation and illustrated how they might cooperate with the master regulators (Oct4, Nanog and Sox2) in ESCs. Gene expression changes of these predicted regulators upon knockdown of Oct4 in hESC confirmed their functional roles. Furthermore, these networks facilitated investigation of the interplay between TF binding and epigenetic modifications. In particular, we observed poised enhancers, marked by both active (H3K4me1) and repressive (H3K27me3) histone marks, that are enriched for Oct4- and Suz12-binding sites in hESC.