The multiple zinc-finger DNA-binding protein CTCF is known to be required for the enhancer blocking action of vertebrate insulators, and a clear role for CTCF in the regulation of endogenous gene expression has been demonstrated at the imprinted Igf2 locus [9]. The mode of action of CTCF is, however, still unclear, although several studies have implicated CTCF in the formation of higher-order chromatin structure. CTCF molecules can interact to form clusters and thereby may mediate the formation of chromatin loop domains [44]. Partitioning of regulatory elements into independent chromatin loop domains is postulated to play a key role in the interactions between enhancers and promoters. Recently, a CTCF homolog was identified in Drosophila, and it was discovered that CTCF is required for the insulator function of the Fab-8 element in the BX-C [13]. This observation opened up the prospect of utilising the wealth of genetic and molecular characterisation of BX-C transcriptional regulation for the analysis of CTCF function. Here we have used ChIP-array to investigate CTCF binding sites in regions of the Drosophila genome with a particular focus on the BX-C. We find that CTCF is not only associated with the Fab-8 insulator, but also with other mapped boundary elements, Fab-6. In addition, we show that CTCF sites are located at other postulated boundaries within the BX-C: “Fab-2,” “Fab-3,” and “Fab-4.” This provides a precise mapping of regulatory domain boundaries and a specific molecular foundation for the domain model of BX-C regulation.
We note that the Fab-7 boundary may differ from the other characterised boundaries in the BX-C, as we do not find a strong Patser match to the CTCF consensus in the functionally mapped Fab-7 boundary element. Although Fab-7 was not demonstrably enriched in the ChIP-array, we found significant CTCF association with Fab-7 in the more sensitive PCR-based ChIP assay. Given the lack of a strong Patser match, this may suggest an indirect association. We also do not see a CTCF site between the abx/bx and the bxd/pbx regulatory elements. However, these elements are separated by a long distance, and it is not clear whether they require insulation.
According to the domain model [26], the parasegment-specific regulatory domains that control the expression patterns of the Ubx, abd-A, and Abd-B genes of the BX-C are initially activated in appropriate parasegments by the early pattern-forming genes acting on initiator elements. Each regulatory domain is predicted to contain a particular initiator element, tuned to respond to a specific combination of gap and pair-rule gene products, thus activating the regulatory domain in the appropriate set of parasegments. This activation would be read by maintenance elements consisting of PREs that thereafter autonomously maintain each regulatory domain in either the OFF (silenced) or ON (active) state. Within a domain in the ON state, enhancers present in that domain would be able to engage with the relevant gene promoter and regulate expression of the gene. Boundary elements that flank each domain are proposed to restrict the effects of the initiator and maintenance elements to a single domain.
Although boundary elements are postulated to have the common property of insulating the regulatory domains, no sequence similarity between the mapped boundary elements has been reported until now. Here we show that a set of these boundary elements contain CTCF binding sites and bind CTCF in vivo. CTCF has been shown to be required for the insulator activity of Fab-8, and it seems likely that CTCF will also be a required component at the other boundary elements. In support of this suggestion, we find that the CTCF sites are well conserved within the sequenced insect genomes. The observation that CTCF sites flank a set of regulatory domains in the BX-C, together with the vertebrate studies that suggest that CTCF can mediate the formation of chromatin loops [44], supports the idea that interaction between CTCF sites may organise these domains into chromatin loops. However, how such a looping mechanism enables the autonomy of the individual regulatory domains and facilitates appropriate enhancer/promoter interactions is still unclear.
A key feature of the domain model is the relationship between the boundary and maintenance elements. For the domains to be capable of independently being set to the ON or OFF state, the range of influence of PREs needs to be restricted by the domain boundaries. Each domain would require at least one PRE. Our precise mapping of in vivo CTCF binding sites has enabled us to examine their relationship with Polycomb target sites. In strong support of the domain model, we find that the domains demarcated by CTCF sites contain Polycomb target sites. Indeed, we find an intimate relationship between CTCF and Polycomb binding sites, as shown for “Fab-4,” Mcp, Fab-6, and CTCF site “C.” This fits with previous functional mapping indicating that boundary elements and PREs are closely associated at Fab-7 and Fab-8. This arrangement would impose a polarity on the spread of chromatin modification from the PRE, such that modification may start at the PRE abutting one boundary and spread across the domain in one direction towards the next boundary. At the boundaries, CTCF may play many possible roles. It could participate in boundary element function, allowing the independence of chromatin domains by acting as a chromatin insulator that blocks the spread of chromatin modification. However, at the chicken β-globin locus, the chromatin boundary appears to be separable from the CTCF binding site [55]. Another possibility is suggested by the fact that CTCF has been demonstrated to block the progression of RNA polymerase [56]. This could potentially play an important role at boundaries in the BX-C to enable the independent function of PREs in neighbouring domains. There is considerable evidence that transcription through PREs may control their state, and many noncoding RNAs have been detected in the regulatory regions of the BX-C [56]. One role for CTCF could be to act as a barrier to such noncoding transcription, preventing transcripts arising in one regulatory domain from crossing into the neighbouring domain and affecting the PRE state. Such a role would be consistent with the observed location of CTCF sites in this region, as a CTCF site closely abuts one side of each PRE.
The individual regulatory domains must not only be able to act autonomously to set and maintain their activity state, but they must also be able to interact appropriately with the relevant gene promoters. Boundaries may play a role in this, and recently Cleard et al. [63] have demonstrated a long-range interaction between Fab-7 and the Abd-B-RB promoter. This interaction was associated with lack of Abd-B expression, but similar interactions, bringing in appropriate enhancers, may also activate expression. The ability of CTCF to form clusters may facilitate such interactions, and it is intriguing that there are CTCF sites not only at the boundaries but also close to Abd-B promoters; the CTCF site “B” is 300 bp upstream of the Abd-B-RB promoter. Clustering of boundaries together with Abd-B promoter sequences may enable interaction between the promoter and enhancers in domains in the ON state. The clustering may also be more selective; we see that in S2 cells, which specifically express Abd-B-RB, several boundaries are embedded in chromatin bearing the repressive H3K27me3 modification, whereas Fab-8, CTCF site “B,” and the Abd-B-RB promoter are in the unmodified, presumably “open,” chromatin domain. We could speculate that the expression of Abd-B-RB in these cells might be facilitated by interaction of the CTCF sites in the “open” domain, Fab-8 and site “B,” enabling Fab-8 to bring appropriate enhancers to the Abd-B-RB promoter.
We can compare this ChIP-array analysis of CTCF genomic sites with our ChIP-array analysis of binding sites for another Drosophila insulator-binding protein, Su(Hw) (B. Adryan, G. Woerfel, I. Birch-Machin, S. Gao, M. Quick, L. Meadows, S. Russell, and R. White; unpublished data). CTCF and Su(Hw) are both multi-zinc-finger DNA-binding proteins, and in both cases we have identified relatively long (~20 bp) consensus binding sites. In contrast to the situation for most DNA-binding proteins, we find that the strength of match to the consensus binding site is a good predictor of in vivo occupancy. We have also investigated whether our data indicate any collaboration between CTCF and Su(Hw). This seemed an attractive possibility since removing Su(Hw) function in vivo has little effect; su(Hw) null mutant flies are female-sterile but viable. Also, the insulating activity of Fab-8 was significantly reduced, but not completely abolished, when the CTCF sites were mutated [13]. However, we found no evidence for general colocalisation between CTCF and Su(Hw). A total of 60 Su(Hw) sites were identified in the Adh region, and only one of the fragments covering this region contained both CTCF and Su(Hw) sites. The single CTCF site identified in the achaete-scute complex was also some distance from the two Su(Hw) sites we found. Subsequent ChIP-array analysis in the BX-C led to the identification of only one Su(Hw) site within the entire BX-C region, in a location devoid of CTCF binding sites (B. Adryan, S. Russell, and R. White; unpublished data). Indeed, whilst the BX-C appears relatively enriched in CTCF sites compared to the Adh region, the converse is true for Su(Hw). For CTCF there are 4.7 sites/100 kb in the BX-C and 1.7 sites/100 kb in the Adh region (using a Patser p-value threshold), whereas for Su(Hw) the BX-C is depleted in sites, with only 0.29/100 kb in comparison to 2.7/100 kb in the Adh region (using a Patser p-value threshold). Clearly, although CTCF and Su(Hw) both possess insulating ability, their sites of action do not correlate, and there is no evidence from our analysis, covering approximately 3% of the Drosophila genome, for cooperative activity.
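The two comparisons made above, site density per 100 kb and co-occurrence of two factors on the same fragment, are simple to express in code. The following Python sketch illustrates the arithmetic only; the function names, the coordinate convention, and all example numbers are hypothetical placeholders and are not taken from this study.

```python
from typing import Iterable, Sequence, Tuple

# A fragment is a (start, end) pair of coordinates, with end exclusive.
Fragment = Tuple[int, int]


def sites_per_100kb(n_sites: int, region_bp: int) -> float:
    """Normalise a count of binding sites to sites per 100 kb."""
    if region_bp <= 0:
        raise ValueError("region_bp must be positive")
    return n_sites * 100_000 / region_bp


def count_shared_fragments(
    fragments: Iterable[Fragment],
    sites_a: Sequence[int],
    sites_b: Sequence[int],
) -> int:
    """Count fragments that contain at least one site of each factor."""

    def has_site(fragment: Fragment, sites: Sequence[int]) -> bool:
        start, end = fragment
        return any(start <= pos < end for pos in sites)

    return sum(
        1
        for fragment in fragments
        if has_site(fragment, sites_a) and has_site(fragment, sites_b)
    )


# Hypothetical example: 12 sites in a 400 kb region.
example_density = sites_per_100kb(12, 400_000)

# Hypothetical example: three 1 kb fragments and made-up site positions.
example_fragments = [(0, 1000), (1000, 2000), (2000, 3000)]
example_shared = count_shared_fragments(
    example_fragments, sites_a=[100, 1500], sites_b=[1700, 2500]
)
```

A low count of shared fragments relative to the number of sites of either factor is the kind of result that argues against general colocalisation.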
By comparing the sequences of ChIP-enriched fragments we identified a strong Drosophila consensus CTCF binding site. Analysis of vertebrate CTCF target sequences leads us to propose that vertebrate CTCF also binds to a similar consensus sequence. Our findings do not support the current view that CTCF binds to divergent DNA sequences by engaging different subsets of the zinc fingers [38]. Indeed, the binding site revealed here has been previously noted. Bell et al. [9] identified a CTCF binding site in the chicken β-globin insulator, and sequence comparisons between this site and other known CTCF sites [6] identified a conserved 3′ region, the mutation of which completely abolished CTCF binding and enhancer blocking. Filippova et al. [49] extended this comparison to include the Dm1 sites, mouse H19 DMD4 and DMD7, and human MYC A, and again identified a conserved region within the larger approximately 50-bp DNase footprint for each site. It is this conserved region that corresponds to the vertebrate CTCF site found here. Very recently, an analysis of CTCF binding in the human genome has generated a vertebrate CTCF consensus site [65], and a CTCF consensus has also been derived from analysis of conserved regions in the human genome [66]. Both these sites are very similar to the consensus we identify here; in particular they share the strong features of the CC at positions 1 and 2, the AG at positions 6 and 7, and the GGC at positions 10, 11, and 12. Overall, these findings indicate that CTCF in both Drosophila and vertebrates binds to a single core consensus sequence.
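The three shared features named above (CC at positions 1 and 2, AG at positions 6 and 7, GGC at positions 10 to 12) can be checked against a candidate sequence with a few lines of Python. This is a minimal sketch for illustration only: it is not the Patser scoring used in this work, which evaluates a sequence against a full weight matrix, and the function names and example sequences are made up.

```python
# Shared core features, keyed by 1-based start position within the site.
CORE_FEATURES = {1: "CC", 6: "AG", 10: "GGC"}


def core_feature_matches(site: str) -> dict:
    """Report, for each core feature, whether the site carries it."""
    site = site.upper()
    matches = {}
    for start, bases in CORE_FEATURES.items():
        # Convert the 1-based position to a 0-based slice.
        segment = site[start - 1 : start - 1 + len(bases)]
        matches[start] = segment == bases
    return matches


def has_all_core_features(site: str) -> bool:
    """True only if every core feature is present at its position."""
    return all(core_feature_matches(site).values())


# Hypothetical 12-bp sequences, invented for illustration.
candidate = "CCACTAGATGGC"  # carries all three features
variant = "CCACTACATGGC"    # AG at positions 6-7 replaced by AC

candidate_ok = has_all_core_features(candidate)
variant_report = core_feature_matches(variant)
```

An exact-match test of this kind is much stricter and less informative than a weight-matrix score, since it ignores the remaining positions and gives no graded measure of match strength.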
In summary, ChIP-array analysis has enabled us to construct a CTCF binding site consensus. Mapping of genomic binding sites leads us to propose that all known or predicted insulators in the BX-C (with the possible exception of Fab-7) function in a CTCF-dependent manner.