By integrating a large number of high-throughput sequencing and microarray datasets and performing aggregation analysis with transcription factor binding sites or the TSS as anchors, we discovered that an array of 20 nucleosomes flanks occupied CTCF sites genome-wide. These nucleosomes are so well positioned that remarkable oscillatory patterns were observed for 21 of the 22 genome-wide datasets. Two case studies reported CTCF binding in the IGF2/H19 and DM1 loci, and both suggested that the CTCF binding sites occur in linker regions between nucleosomes. These observations are consistent with our findings in this study. We are unaware of other previous work on the genome-wide relationship between CTCF and nucleosomes. The TSS is the only genome-wide anchor for which well-positioned nucleosomes have been reported, and only two nucleosomes upstream of the TSS and five nucleosomes downstream of the TSS are well positioned. Here we show that 20 nucleosomes flanking CTCF sites exhibit much stronger oscillatory patterns, and hence are much better positioned than the nucleosomes around the TSS.
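The aggregation analysis referred to above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name `aggregate_profile`, the per-base `signal` list and the `(position, strand)` anchor tuples are hypothetical choices made for illustration. The idea is to average a genome-wide signal (for example, nucleosome tag counts) at each offset relative to a set of anchors such as occupied CTCF sites or TSSs, mirroring the window for anchors on the minus strand.

```python
def aggregate_profile(signal, anchors, flank):
    """Average `signal` at every offset in [-flank, +flank] around the anchors.

    signal  : list of per-base values for one chromosome (hypothetical input format)
    anchors : list of (position, strand) tuples, strand being '+' or '-'
    flank   : number of bases to include on each side of an anchor
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    counts = [0] * width
    for position, strand in anchors:
        for offset in range(-flank, flank + 1):
            # Mirror the window for minus-strand anchors so that a positive
            # offset always means "downstream" of the anchor.
            genomic = position + offset if strand == '+' else position - offset
            if 0 <= genomic < len(signal):
                totals[offset + flank] += signal[genomic]
                counts[offset + flank] += 1
    # Offsets never covered by any anchor are reported as 0.0.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Toy data (made up): one tag located 10 bp downstream of each of two anchors.
toy_signal = [0] * 100
toy_signal[30] = 1
toy_signal[70] = 1
toy_anchors = [(20, '+'), (80, '-')]
profile = aggregate_profile(toy_signal, toy_anchors, flank=15)
```

In this toy case the averaged profile has a single peak at offset +10 (index `flank + 10`), because both tags sit 10 bp downstream of their anchor once strand is taken into account. With real data, well-positioned nucleosomes would appear as regularly spaced peaks in such a profile.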
No well-positioned nucleosomes have previously been reported to flank transcription factor binding sites. Among the human transcription factors for which genome-wide binding data are available, the ChIP target regions of only three factors are highly enriched in their binding motifs (STAT1, NRSF and p53). We did not observe well-positioned nucleosomes around the occupied sites of any of these three factors. One complication is that the ChIP-Seq data on histone modifications were generated in CD4+ T cells, whereas the binding site data were generated in other cell types. We must wait for future data to resolve this issue definitively, and to uncover whether well-positioned nucleosomes flank the binding sites of other transcription factors genome-wide.
There are four possible mechanisms for the well-positioned nucleosomes around CTCF sites: (1) CTCF binds to its sites first and then recruits chromatin remodeling factors to position neighboring nucleosomes; (2) CTCF binds to its sites first, which provides a strong anchor against which the neighboring nucleosomes line up by themselves; (3) nucleosomes are well positioned in some regions of the genome due to DNA sequence features, and a CTCF site has co-evolved with the nucleosome positioning sequence features to lie in a lengthened linker region, which attracts the binding of CTCF; (4) some genomic regions contain nucleosome-positioning sequence features that lead to an array of regularly positioned nucleosomes occluding a CTCF site, and CTCF binds to its site and repositions the nucleosomes to create a lengthened linker region. We argue that our results mostly support the second scenario, for the following reasons. Three lines of evidence suggest that the well-positioned nucleosomes are unlikely to be caused predominantly by intrinsic sequence features of the genomic DNA surrounding occupied CTCF sites: (1) the sequences that flank occupied CTCF sites lack conservation, in sharp contrast with the strong conservation at the CTCF sites themselves; (2) a computational algorithm predicts a nucleosome to occupy sites that are occupied by CTCF in vivo; (3) we performed in vitro nucleosome reconstitution and mapping experiments on two insulators that each contain three CTCF sites. The results showed an irregular pattern of MNase cleavages, indicating the lack of a positioned nucleosomal array. Moreover, all six CTCF sites in these two insulators are in nucleosomal regions, consistent with the computational prediction and in contrast with the in vivo data. The binding of CTCF lengthens the linker region to 118 bp. Thus, if the 20 nucleosomes formed at regular intervals before CTCF binds, all of them would need to slide apart to accommodate the binding of CTCF, which seems unlikely and argues against the fourth scenario. Likewise, CTCF has not been reported to recruit chromatin remodeling factors, which argues against the first scenario. We therefore propose that the second scenario is most likely to be biologically relevant in general. Our hypothesis is consistent with the statistical positioning mechanism, which states that nucleosomes prefer not to occupy some regions of the genome, due to sequence features such as homopolymeric A/T tracts or eviction by regulatory proteins, but are well positioned in the remaining regions of the genome due to structural constraints imposed by DNA packaging. We hypothesize that the binding of CTCF acts as a roadblock for translational nucleosome movements and that, as a result, the nucleosomes are packaged between the CTCF binding sites and the nearest nucleosome-free regions.
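The roadblock idea can be made concrete with a toy one-dimensional simulation. This is only an illustrative sketch and not an analysis performed in this study. It assumes that nucleosomes stack outward from a fixed barrier, that each linker length is drawn independently from an exponential distribution with a mean of 38 bp, and that arrays of 10 nucleosomes are simulated; all three are assumptions chosen for simplicity, and the function names are hypothetical. Under these assumptions, the first nucleosomes next to the barrier are sharply positioned across a population of arrays, and the positional spread grows with distance from the barrier.

```python
import random
import statistics

CORE_BP = 147         # DNA wrapped around one nucleosome core (value quoted in the text)
MEAN_LINKER_BP = 38   # mean linker length; the exponential distribution is an ASSUMPTION


def simulate_array(n_nucleosomes, rng):
    """Return nucleosome centre positions (bp from the barrier edge) for one array."""
    centres = []
    cursor = 0.0  # current end of occupied DNA, starting at the barrier edge
    for _ in range(n_nucleosomes):
        # ASSUMPTION: independent, exponentially distributed linker lengths.
        cursor += rng.expovariate(1.0 / MEAN_LINKER_BP)
        centres.append(cursor + CORE_BP / 2)
        cursor += CORE_BP
    return centres


def positional_spread(n_arrays=2000, n_nucleosomes=10, seed=0):
    """Standard deviation of the k-th nucleosome centre across many simulated arrays."""
    rng = random.Random(seed)
    arrays = [simulate_array(n_nucleosomes, rng) for _ in range(n_arrays)]
    return [statistics.pstdev([a[k] for a in arrays]) for k in range(n_nucleosomes)]


spread = positional_spread()
```

Here `spread[0]` is the positional uncertainty of the nucleosome nearest the barrier and `spread[-1]` that of the most distant one. Because the uncertainties of successive linkers accumulate, the spread increases with nucleosome index, which is the qualitative behavior expected when a bound protein acts as an anchor.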
Nonetheless, our hypothesis does not preclude the possibility that other mechanisms cause well-positioned nucleosomes around CTCF sites at some loci. Indeed, Kanduri et al. reported that a subset of CTCF sites in the H19 locus is flanked by nucleosome positioning sequences, and argued that these sequences have evolved to ensure the constitutive availability of the CTCF binding sites. These results thus argue for the third scenario described above.
The nucleosomes flanking CTCF sites are enriched in H2A.Z and 11 histone modifications. Among these, H2A.Z and 8 histone modifications are also enriched in promoters and are positively correlated with the transcriptional levels of downstream genes. The remaining three, H3K79me1, H3R2me1 and H3R2me2, show much weaker enrichment than the other eight modifications. The large overlap between the epigenetic features of nucleosomes in promoters and those of nucleosomes around CTCF sites is surprising, given that CTCF is mostly known to bind to insulators, and suggests that CTCF may play an important role in regulating promoters.
The well-positioned nucleosomes around occupied CTCF sites allowed us to determine the length of the nucleosomal DNA protected against MNase digestion. Our results indicate that the accessibility of nucleosomal DNA varies greatly among nucleosomes carrying different histone methylations. It would be interesting to quantify the variation for modifications that affect the net charge of the histones, once such data become available. The histone modifications that correspond to greater DNA accessibility, as well as H2A.Z, which also corresponds to great DNA accessibility, are highly enriched in promoters of expressed genes. Collectively, these results suggest that one mechanism by which histone modifications regulate gene expression may be the modulation of accessibility to the genomic DNA. In light of the recent findings on histone turnover, it is tempting to suggest that accessible DNA facilitates rapid histone turnover and/or that rapid turnover results in accessible DNA. In particular, rapid histone turnover was observed at chromatin boundaries and was suggested to help delimit the spread of chromosome states. Because the primary function of CTCF is to bind to insulators, which are the best-understood boundary elements, we suggest that those CTCF sites flanked by nucleosomes with highly accessible DNA can prevent the lateral spreading of chromosome states.
These results also suggest that regions around occupied CTCF sites are of heterogeneous composition: subsets of them are enriched in different histone modifications and therefore produce different L-Digest measurements. Indeed, hierarchical clustering of the regions surrounding all occupied CTCF sites, based on the ChIP-Seq signal levels of histone modifications, H2A.Z and RNA polymerase II (Figure S10), confirms that these genomic regions have diverse patterns of epigenetic marks. It would be interesting to investigate whether some of these patterns are correlated with insulator function and, if so, which ones. CTCF has also been reported to possess activating and repressing functions, and it is possible that some epigenetic patterns correspond to these functions. Figure S10 further indicates that all the nucleosomes surrounding occupied CTCF sites are covered by H2A.Z and/or some of the histone modifications investigated in this study. Because well-positioned nucleosomes are observed for all but one of the histone modification datasets, we conclude that this is a universal feature of CTCF, regardless of the underlying biological function (insulation, activation, repression or others) of the particular locus.
Because Unit+ and Unit− are on average 185 bp and largely invariant, we can deduce that the length of human linker DNA is 38 bp (185 bp − 147 bp), given that 147 bp of DNA is observed in the crystal structure of nucleosomes. This linker length is somewhat shorter than the previous estimate of 70 bp in higher eukaryotes. Because our analysis included data on all nucleosomes as well as on nucleosomes carrying H2A.Z or one of 20 histone modifications, we believe that 38 bp is a robust estimate. Furthermore, this value is unlikely to be specific to the nucleosomes flanking occupied CTCF sites, because the well-positioned nucleosomes around the TSS have intervals similar to those of the nucleosomes flanking CTCF sites.
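The linker-length deduction above is simple arithmetic and can be written out as a short sketch. It treats Unit+ and Unit− as the peak-to-peak nucleosome repeat length, which is an assumption about how those quantities are defined; the helper names and the peak coordinates below are made up for illustration and are not data from this study.

```python
NUCLEOSOME_CORE_BP = 147  # DNA length in the nucleosome crystal structure (from the text)


def mean_repeat_length(peak_positions):
    """Average spacing (bp) between consecutive nucleosome peak centres."""
    ordered = sorted(peak_positions)
    if len(ordered) < 2:
        raise ValueError("need at least two peaks to measure a spacing")
    spacings = [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(spacings) / len(spacings)


def linker_length(repeat_bp, core_bp=NUCLEOSOME_CORE_BP):
    """Linker DNA length = nucleosome repeat length minus core DNA length."""
    return repeat_bp - core_bp


# Hypothetical peak centres spaced 185 bp apart (illustrative values only).
hypothetical_peaks = [120, 305, 490, 675]
repeat = mean_repeat_length(hypothetical_peaks)   # 185.0
linker = linker_length(repeat)                    # 38.0
```

With a repeat length of 185 bp, the same subtraction gives the 38 bp linker quoted in the text; applying it to the earlier estimate would correspond to a repeat of 70 + 147 = 217 bp.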
In summary, we discovered that occupied CTCF binding sites in the human genome are flanked by 20 well-positioned nucleosomes. These nucleosomes are enriched in H2A.Z and 11 histone modifications, forming complex epigenetic patterns. Nucleosomes enriched in different histone modifications have diverse but compensating lengths of DNA that are protected from or digested by MNase. The binding of CTCF extends the linker to 118 bp, and the CTCF footprint is smaller if the DNA of neighboring nucleosomes is more accessible. These results provide insights into the interplay between chromatin structure and CTCF function.