CTCF plays a critical role in transcriptional regulation in vertebrates (for reviews, see
Ohlsson et al., 2001;
Klenova et al., 2002;
Dunn and Davie, 2003). It was first identified by its ability to bind to a number of dissimilar regulatory sequences in the promoter-proximal regions of the chicken, mouse, and human MYC oncogenes (
Filippova et al., 1996;
Lobanenkov et al., 1990). CTCF is a ubiquitously expressed nuclear protein with an 11-zinc-finger (ZF) DNA-binding domain (
Filippova et al., 1996;
Klenova et al., 1993). It is essential (
Fedoriw et al., 2004) and highly conserved from Drosophila to mice and man (
Moon et al., 2005). Point mutations at distinct DNA-recognition amino acid positions in ZF3 and ZF7 of CTCF have been identified in a variety of cancers selected for loss of heterozygosity (LOH) at 16q22, where CTCF maps, suggesting that it is a candidate tumor-suppressor gene (
Filippova et al., 1998;
Filippova et al., 2002).
Initial biochemical analyses revealed that CTCF contains two transcriptional repressor domains and can act as a transcriptional repressor (
Baniahmad et al., 1990;
Burcin et al., 1997;
Klenova et al., 1993;
Lobanenkov et al., 1990). However, others have found that it can also function as a transcriptional activator in a different sequence context (
Vostrov and Quitschke, 1997). Recent studies have identified CTCF as the vertebrate insulator protein (
Bell et al., 1999). So far, CTCF remains the only major protein implicated in the establishment of insulators in vertebrates (
Felsenfeld et al., 2004), including those involved in regulation of gene imprinting and mono-allelic gene expression (
Fedoriw et al., 2004;
Ling et al., 2006), as well as in X-chromosome inactivation and in the escape from X-linked inactivation (
Filippova et al., 2005;
Lee, 2003).
There has been great interest in identifying where potential insulators are located in the eukaryotic genome, because knowledge of these elements can help explain how
cis-regulatory elements coordinate expression of the target genes. Transcription of every eukaryotic gene begins with the assembly of an RNA polymerase preinitiation complex (PIC) at the promoter (
Kadonaga, 2004), a process that is regulated by sequence-specific transcription factors and
cis-regulatory elements. Genetic studies in
Drosophila first identified the importance of insulators in ensuring proper enhancer/promoter interactions (
Udvardy et al., 1985). More recent studies have implicated insulators in the establishment of euchromatin/heterochromatin boundaries in vertebrates (
Felsenfeld et al., 2004;
Gerasimova and Corces, 2001;
Jeong and Pfeifer, 2004). In addition, it has been demonstrated that an insulator in the IGF2/H19 locus is critical for the imprinting of the locus (
Bell and Felsenfeld, 2000;
Hark et al., 2000;
Kanduri et al., 2000).
The mechanism of insulator function remains unclear. One model proposes that insulators, by formation of special chromatin structures, compete for enhancer-bound activators, preventing the activation of downstream promoters (
Bulger and Groudine, 1999). Alternatively, insulators may facilitate the formation of loops, for example, via attachment of chromosomal regions to the nuclear membrane (
Yusufzai et al., 2004), leaving the intervening regions accessible only to local interactions between enhancers and promoters. Consistent with this model, it was recently shown that CTCF can mediate long-range chromosomal interactions in mammalian cells, providing a possible mechanism by which insulators establish regulatory domains (
Kurukuti et al., 2006;
Ling et al., 2006;
Yusufzai et al., 2004). The extent to which each mechanism shapes genome expression remains unresolved. Knowledge of insulators in the genome would provide a much-needed framework for understanding genome organization and function.
The effort to computationally identify potential insulators in the human genome has been hampered by an incomplete understanding of the DNA recognition sequence of CTCF. Biochemical assays have indicated that the 11-zinc-finger protein can use different combinations of the zinc-finger domains to bind different DNA target sequences (
Filippova et al., 1996;
Ohlsson et al., 2001). Thus, the CTCF binding sites identified from
in vitro protein/DNA interaction assays and from a limited number of known insulators exhibit extensive sequence variation and too little specificity for genome-wide prediction of CTCF binding (
Ohlsson et al., 2001). Recently, an attempt has been made to systematically isolate insulators in the mouse genome through chromatin immunoprecipitation followed by cloning and sequencing (
Mukhopadhyay et al., 2004). Unfortunately, owing to the limited scale of the sequencing effort, only about 200 DNA fragments with enhancer-blocking activity, each driven by various CTCF binding sites, have been identified. Moreover, no consensus CTCF binding motif has so far been reported from this study.
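To make concrete what genome-wide prediction from a binding motif involves, the following is a minimal sketch of scanning a sequence with a log-odds position weight matrix. The count matrix, the pseudocount, the uniform background, and the threshold are all hypothetical toy values chosen for illustration; the matrix is not a CTCF motif and is not taken from any study cited here.

```python
import math

# Hypothetical 4-position count matrix (toy values, NOT a CTCF motif).
# Each column sums to 10 aligned example sites.
COUNTS = {
    "A": [8, 1, 1, 0],
    "C": [1, 7, 1, 1],
    "G": [1, 1, 8, 0],
    "T": [0, 1, 0, 9],
}
BACKGROUND = 0.25   # assumed uniform base frequency
PSEUDOCOUNT = 1.0   # assumed smoothing to avoid log(0)


def log_odds_matrix(counts):
    """Convert base counts into per-position log2 odds scores."""
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * PSEUDOCOUNT
        column = {
            b: math.log2(((counts[b][i] + PSEUDOCOUNT) / total) / BACKGROUND)
            for b in "ACGT"
        }
        matrix.append(column)
    return matrix


def scan(sequence, matrix, threshold):
    """Return (position, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((pos, score))
    return hits


PWM = log_odds_matrix(COUNTS)
hits = scan("TTACGTAA", PWM, threshold=3.0)
print(hits)  # one hit, at position 2 (the window "ACGT")
```

The more sequence variation the training sites show, the flatter the columns of such a matrix become and the less any threshold separates true sites from background, which is the specificity problem described above.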
As a first step towards understanding how insulators contribute to gene expression in human cells, we have located the sites of CTCF binding in the human genome using chromatin immunoprecipitation followed by detection with genome-tiling microarrays (
Kim et al., 2005b;
Kim and Ren, 2006). Our analyses have generated a high-resolution genomic map of CTCF binding, in which a pair of adjacent CTCF binding sites bounds 2.5 genes on average. We also identify a clear consensus CTCF binding motif shared by a majority of the experimentally determined
in vivo CTCF binding sites. We show that CTCF binding sites in the human genome are highly conserved in other vertebrates, consistent with the widespread and fundamental role of CTCF in cellular function. In addition, we demonstrate that CTCF binding to DNA is largely invariant across cell types, with a subset of sites interacting with the protein in a cell-type-dependent manner. Our results offer a general resource for understanding the role of CTCF in insulator function, gene regulation, and genome organization in human cells.
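As an illustration of how a statistic such as the average number of genes bounded by a pair of binding sites could be derived from a binding map, here is a minimal sketch. It assumes a single chromosome, counts only genes lying entirely between two adjacent sites, and uses made-up coordinates; the function names are hypothetical, and this is not necessarily the procedure used to obtain the figure reported above.

```python
def genes_per_domain(site_positions, gene_intervals):
    """Count genes lying entirely between each pair of adjacent binding sites.

    site_positions: binding-site coordinates on one chromosome.
    gene_intervals: (start, end) gene coordinates on the same chromosome.
    """
    sites = sorted(site_positions)
    counts = []
    for left, right in zip(sites, sites[1:]):
        inside = sum(
            1 for start, end in gene_intervals
            if left <= start and end <= right
        )
        counts.append(inside)
    return counts


def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


# Toy coordinates for illustration only (not real data).
sites = [1000, 5000, 12000, 20000]
genes = [
    (1200, 2500), (3000, 4800),                         # first domain
    (5500, 9000),                                       # second domain
    (12500, 14000), (15000, 16500), (17000, 19500),     # third domain
]

counts = genes_per_domain(sites, genes)
print(counts)        # [2, 1, 3]
print(mean(counts))  # 2.0
```

Genes that span a binding site are ignored in this sketch; a real analysis would need an explicit rule for them and would process each chromosome separately.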