Previous studies in model organisms3–7 as well as initial analyses in human cells8 have identified fundamental aspects of nucleosome organization. Here we focus on the dynamic relationships between sequence-based nucleosome preferences and chromatin regulatory function in primary human cells. We mapped tissue-specific and DNA-encoded nucleosome organization in granulocytes and two types of T-cells (CD4+ and CD8+) isolated from the blood of a single human donor. To do so, we isolated cellular chromatin, digested it with micrococcal nuclease (MNase), and deep-sequenced the resulting nucleosome-protected fragments (Methods, Supplementary Fig. 1). To provide sufficient depth for both local and global analyses, we used high-throughput SOLiD technology, generating 584, 342, and 343 million mapped reads for granulocytes, CD4+ T-cells, and CD8+ T-cells, respectively. This corresponds to 16x–28x genome coverage by 147 bp nucleosome footprints (cores; see Methods). Sequencing depth was critical for our subsequent analysis: while shallower coverage can reveal features of nucleosome positions through statistical analysis (e.g. refs 6,8), any definitive map, and thus any comparison of static and dynamic positioning, requires high sequence coverage throughout the genome.
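As a quick consistency check on the coverage figures above, fold coverage can be approximated as mapped reads × 147 bp divided by genome length. This sketch assumes a haploid genome length of about 3.1 Gb. The Methods may instead use a mappable-genome denominator. The helper name `core_coverage` is hypothetical.

```python
# Fold coverage by core footprints = mapped reads * footprint length / genome length.
# ASSUMPTION: ~3.1 Gb haploid human genome; the paper's exact denominator may differ.
GENOME_BP = 3.1e9
CORE_BP = 147  # nucleosome core footprint length used in the text

def core_coverage(mapped_reads: float, core_bp: int = CORE_BP,
                  genome_bp: float = GENOME_BP) -> float:
    """Return fold genome coverage by core-length footprints (hypothetical helper)."""
    return mapped_reads * core_bp / genome_bp

datasets = {"granulocytes": 584e6, "CD4+ T-cells": 342e6, "CD8+ T-cells": 343e6}
for name, reads in datasets.items():
    print(f"{name}: {core_coverage(reads):.1f}x")
# granulocytes: 27.7x, CD4+ T-cells: 16.2x, CD8+ T-cells: 16.3x
```

Under the same assumption, the 669 million in vitro reads described in the next paragraph give about 32x, matching the reported figure.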
To provide complementary data on purely sequence-driven nucleosome positioning in the absence of cellular influences, we reconstituted genomic DNA in vitro with recombinant histone octamers to produce in vitro nucleosomes (Methods, Supplementary Fig. 2), and generated over 669 million mapped reads, representing 32x core coverage of the genome. To identify primary nucleosome positioning sites in DNA, the reconstitution was performed under conditions of DNA excess (Methods). We also generated a control dataset of 321 million mapped reads from MNase-digested naked DNA (Supplementary Materials). In the granulocyte population (our deepest in vivo data set), over 99.5% of the mappable genome is engaged by nucleosomes (Methods), and 50% of nucleosome-depleted bases occur in regions shorter than 160 bp.
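The two granulocyte statistics above can be illustrated on a per-base coverage track. The track counts core footprints over mappable positions. This minimal sketch assumes that a base is "engaged" when at least one footprint covers it and "depleted" when none does. The Methods may apply different thresholds. All function names and the toy track are hypothetical.

```python
from itertools import groupby

def engaged_fraction(coverage):
    """Fraction of bases covered by at least one core footprint."""
    return sum(c > 0 for c in coverage) / len(coverage)

def depleted_run_lengths(coverage):
    """Lengths of maximal runs of zero-coverage (nucleosome-depleted) bases."""
    return [sum(1 for _ in run)
            for is_zero, run in groupby(coverage, key=lambda c: c == 0) if is_zero]

def fraction_in_short_runs(coverage, max_len=160):
    """Fraction of depleted bases that lie in runs shorter than max_len bp."""
    runs = depleted_run_lengths(coverage)
    total = sum(runs)
    return sum(r for r in runs if r < max_len) / total if total else 0.0

# Toy track (ASSUMPTION: mappable bases only): one 50 bp and one 200 bp depleted run.
cov = [3] * 100 + [0] * 50 + [2] * 200 + [0] * 200 + [1] * 100
print(engaged_fraction(cov), depleted_run_lengths(cov), fraction_in_short_runs(cov))
```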
We first focused on global patterns of nucleosome positioning and spacing by calculating fragment distograms and phasograms6,7,9. Distograms (histograms of distances between the start positions of mapped reads aligned in opposing orientations, Supplementary Fig. 3A) reveal the average core fragment size as a peak if many sites in the genome contain consistently positioned nucleosomes. A positioning signal that is strongly amplified by conditioning the analysis on sites with three or more read starts (reflecting a positioning preference; the 3-pile subset) is present not only in vivo but also in vitro, demonstrating that many genomic sites bear intrinsic, sequence-driven positioning signals. Phasograms (histograms of distances between the start positions of mapped reads aligned in the same orientation, Supplementary Fig. 3B) reveal consistent spacing of positioned nucleosomes as a wave-like pattern whose period represents the genome-average internucleosome spacing (the nucleosome repeat length). In granulocytes, the wave peaks are 193 bp apart (adjusted R²). Given a core fragment length of 147 bp, this indicates an average internucleosome linker length of 46 bp (193 − 147). By contrast, the phasograms of both types of T-cells have a spacing that is 10 bp wider (203 bp), equivalent to an average linker length of 56 bp. These results are consistent with classical observations of varying nucleosome phases in different cell types10,11. Linker length differences have been tied to differences in linker histone gene expression12,13, which we found to be 2.4 times higher in T-cells than in granulocytes (84 RPKM14 vs. 35 RPKM). The in vitro phasogram reveals no detectable stereotypic spacing of positioned nucleosomes, demonstrating a lack of intrinsic phasing among DNA-encoded nucleosome positioning sites.
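The distogram, phasogram, and repeat-length estimate described above can be sketched on read 5′ start positions. This is a minimal illustration and does not reproduce the authors' pipeline. It makes four assumptions. First, a minus-strand start is the 3′ end of the same fragment, so the implied fragment length is `minus - plus + 1`. Second, pairs are weighted by the product of pile sizes. Third, peaks are taken without smoothing. Fourth, the repeat length is the least-squares slope of peak position against peak order. All names and the synthetic array are hypothetical.

```python
from collections import Counter

def pile_counts(starts, min_pile=1):
    """Count read starts per position, keeping piles with >= min_pile starts."""
    return {pos: n for pos, n in Counter(starts).items() if n >= min_pile}

def distogram(plus_starts, minus_starts, max_d=300, min_pile=1):
    """Histogram of implied fragment lengths between opposing-strand starts."""
    plus, minus = pile_counts(plus_starts, min_pile), pile_counts(minus_starts, min_pile)
    hist = Counter()
    for p, n_p in plus.items():
        for d in range(1, max_d + 1):
            n_m = minus.get(p + d - 1, 0)
            if n_m:
                hist[d] += n_p * n_m  # read pairs at distance d
    return hist

def phasogram(starts, max_d=1000, min_pile=1):
    """Histogram of distances between same-strand piles."""
    piles = pile_counts(starts, min_pile)
    hist = Counter()
    for p, n in piles.items():
        for d in range(1, max_d + 1):
            n_other = piles.get(p + d, 0)
            if n_other:
                hist[d] += n * n_other
    return hist

def peak_positions(hist, max_d):
    """Local maxima (real phasograms would need smoothing first)."""
    vals = [hist.get(d, 0) for d in range(max_d + 2)]
    return [d for d in range(1, max_d + 1) if vals[d] > vals[d - 1] and vals[d] >= vals[d + 1]]

def repeat_length(peaks):
    """Least-squares slope of peak position vs. peak order 1, 2, 3, ..."""
    xs = list(range(1, len(peaks) + 1))
    mx, my = sum(xs) / len(xs), sum(peaks) / len(peaks)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, peaks))
    return num / sum((x - mx) ** 2 for x in xs)

# Synthetic regular array: 20 nucleosomes, 193 bp repeat, 3 reads per strand each,
# plus singleton plus-strand noise that the 3-pile filter removes.
REPEAT, CORE = 193, 147
nuc_starts = [1000 + REPEAT * k for k in range(20)]
plus = [p for p in nuc_starts for _ in range(3)] + list(range(6000, 9000, 41))
minus = [p + CORE - 1 for p in nuc_starts for _ in range(3)]

dist = distogram(plus, minus, min_pile=3)
peaks = peak_positions(phasogram(plus, min_pile=3), 1000)
nrl = repeat_length(peaks)
print(max(dist, key=dist.get), peaks, nrl, nrl - CORE)  # linker = repeat - core
```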
Global parameters of cell-specific nucleosome phasing and positioning in human
Using a positioning stringency metric (Methods; Supplementary Fig. 4) that quantifies the fraction of defined nucleosome positions within a given segment, we calculated the fraction of the genome occupied by preferentially positioned nucleosomes at different stringency thresholds. The maximum number of sites at which some positioning preference can be detected statistically is 120 million, covering just over 20% of the genome at a low stringency threshold of 23% (Supplementary Fig. 5). Thus, most nucleosome positioning preferences are weak. Across the majority of the human genome, nucleosomes are not preferentially positioned, either by sequence or by cellular function.
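The exact stringency formula is given in the Methods and is not reproduced here. As a purely hypothetical stand-in, this sketch defines stringency at a dyad position as the dyads within ±10 bp divided by the dyads within ±73 bp (one core). With that definition, a single fully positioned nucleosome scores 1.0, while uniformly scattered dyads score about 21/147 ≈ 0.14. The sketch then reports the fraction of bases covered by 147 bp footprints centered on passing sites. Function names, window sizes, and the edge handling (positions closer than one half-core to the ends are skipped) are assumptions.

```python
def positioned_sites(dyads, length, threshold, w=10, half_core=73):
    """Dyad positions whose HYPOTHETICAL stringency is >= threshold.

    stringency(x) = dyads within x +/- w  /  dyads within x +/- half_core
    """
    counts = [0] * length
    for d in dyads:
        counts[d] += 1
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)

    def window(lo, hi):  # inclusive bounds, always inside [0, length)
        return prefix[hi + 1] - prefix[lo]

    # Skip edges so every window is complete (avoids clipping artifacts).
    return [x for x in range(half_core, length - half_core)
            if counts[x] and window(x - w, x + w) / window(x - half_core, x + half_core) >= threshold]

def footprint_fraction(sites, length, half_core=73):
    """Fraction of bases covered by 147 bp footprints centered on the sites."""
    covered = set()
    for x in sites:
        covered.update(range(max(0, x - half_core), min(length, x + half_core + 1)))
    return len(covered) / length

sharp = positioned_sites([500] * 10, 1000, threshold=0.23)          # one well-positioned dyad
fuzzy = positioned_sites(list(range(1000)), 1000, threshold=0.23)   # uniform background
print(sharp, footprint_fraction(sharp, 1000), fuzzy)
```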
We next examined how transcription and chromatin functions affect nucleosome organization regionally. For each cell type, we generated deep RNA-seq data and binned genes into groups according to their expression levels. The average spacing of nucleosomes was greatest within silent genes (206 bp in CD4+ T-cells) and decreased by as much as 11 bp as expression levels increased (t-test p-value 6.5 × 10⁻³⁴). This suggests that transcription-induced cycles of nucleosome eviction and reoccupation cause denser packing of nucleosomes and a slight reduction in nucleosome occupancy (Supplementary Fig. 6). On the basis of this result, we hypothesized that higher-order chromatin organization, as implied by specific chromatin modifications, might be associated with specific spacing patterns. Using previously published ChIP-seq data, we identified regions of enrichment15 for histone modifications found within heterochromatin (H3K27me3, H3K9me3)16, gene-body euchromatin (H4K20me1, H3K27me1)16, or euchromatin associated with promoters and enhancers (H3K4me1, H3K27ac, H3K36ac)17, and estimated nucleosome spacing within each of these epigenetic domains. Active promoter-associated domains had the shortest spacing (178–187 bp), followed by the body of active genes (190–195 bp), while heterochromatin had the largest spacing (205 bp). These results reveal striking heterogeneity in nucleosome organization across the genome that depends on global cellular identity, metabolic state, regional regulatory state, and local gene activity.
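The expression-binning comparison can be sketched as follows. Genes are sorted by RPKM and split into quantile groups. Then Welch's t statistic compares per-gene spacing in the lowest and highest groups. This sketch makes two assumptions. First, each gene already has a spacing estimate in bp (for example from a per-gene phasogram). Second, quantile bins stand in for the paper's grouping. The function names and toy data are hypothetical. Turning t into a p-value needs the t distribution, for example `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance

def bin_by_expression(genes, n_bins=4):
    """Split (rpkm, spacing_bp) records into n_bins groups of increasing expression."""
    ordered = sorted(genes, key=lambda g: g[0])
    size = math.ceil(len(ordered) / n_bins)
    return [[spacing for _, spacing in ordered[i:i + size]]
            for i in range(0, len(ordered), size)]

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

# Toy data: spacing falls from ~206 bp to ~195 bp with expression, plus small noise.
genes = [(float(i), 206 - 11 * (i // 25) / 3 + ((i % 5) - 2) * 0.5) for i in range(100)]
bins = bin_by_expression(genes)
print([round(mean(b), 1) for b in bins], round(welch_t(bins[0], bins[-1]), 1))
```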
Transcription and chromatin modification-dependent nucleosome spacing
To characterize DNA signals responsible for consistent positioning of nucleosomes, we identified 0.3 million sites occupied in vitro by nucleosomes at high stringency (> 0.5; Methods). The region occupied by the center of the nucleosome (the dyad) exhibits a significant increase in G/C usage (Poisson p-value < 10⁻¹⁰⁰), and A/T usage in the flanking regions increases with positioning strength. A subset of in vitro positioned nucleosomes (stringency > 0.5) that are also strongly positioned in vivo (stringency > 0.4) shows increased A/T usage within the flanks compared to in vitro-only positioning sites. This underscores the importance of flanking repelling elements for positioning in vivo. We term such elements, with strong G/C cores and A/T flanks, “container sites” to emphasize the proposed positioning mechanism. This positioning signal is distinct from the 10 bp dinucleotide periodicity observed in populations of nucleosome core segments isolated from a variety of species19,20, which has been proposed to contribute to precise positioning and/or rotational setting of DNA on nucleosomes20 on a fine scale (Supplementary Fig. 7). G/C-rich signals are known to promote nucleosome occupancy18,21, while AA-rich sequences repel nucleosomes4. Our data demonstrate that a precise arrangement of a core-length attractive segment flanked by repelling sequences can produce a strongly positioned nucleosome.
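The G/C-core and A/T-flank pattern can be summarized in two ways. One is a per-position G/C profile across dyad-centered sequences. The other is a simple core-versus-flank contrast. The contrast score below is a hypothetical illustration of the container-site idea, not the authors' statistic. It assumes each sequence is laid out as a 50 bp flank, a 147 bp core, and a 50 bp flank. A score of +1 means a pure G/C core with pure A/T flanks, and values near 0 mean no contrast.

```python
def gc_fraction(seq):
    """Fraction of G/C bases in seq (case-insensitive)."""
    return sum(base in "GCgc" for base in seq) / len(seq)

def gc_profile(seqs):
    """Per-position G/C fraction across equal-length, dyad-centered sequences."""
    return [sum(s[i] in "GCgc" for s in seqs) / len(seqs) for i in range(len(seqs[0]))]

def container_score(seq, core=147, flank=50):
    """HYPOTHETICAL score: G/C fraction of the central core minus that of the flanks."""
    if len(seq) != 2 * flank + core:
        raise ValueError("sequence must be flank + core + flank long")
    core_seq = seq[flank:flank + core]
    flanks = seq[:flank] + seq[flank + core:]
    return gc_fraction(core_seq) - gc_fraction(flanks)

ideal = "A" * 50 + "G" * 147 + "T" * 50   # G/C core, A/T flanks
uniform = ("ACGT" * 62)[:247]             # no core/flank contrast
print(container_score(ideal), round(container_score(uniform), 3))
```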
Sequence signals that drive nucleosome positioning
Dyad frequencies around container sites show a strong peak of enrichment in vivo, confirming that DNA positions nucleosomes over these sites in vivo. Additionally, wave-like patterns emanate from these sites in vivo (but not in vitro), reflecting the nucleation of phased arrays by positioned cellular nucleosomes. The nucleosome barrier model22 proposes that nucleosomes are packed into positioned and phased arrays against a chromatin barrier. Viewing these results in light of that model, we conclude that sequence-positioned nucleosomes can initiate the propagation of adjacent, stereotypically positioned nucleosomes. Importantly, wave periods around container sites are shorter in granulocytes than in T-cells. This allows tissue-specific variation in linker length to alter the placement of nucleosomes over distances of as much as 1 kb from an initial container site. Functional consequences of such rearrangements might include global shifts in regulatory properties that could contribute to distinct transcription factor accessibility profiles in different cell types.
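A short worked example shows how a 10 bp difference in repeat length grows with distance from a container site. This idealized sketch assumes perfectly regular arrays anchored at the container-site dyad (offset 0). It uses the 193 bp granulocyte repeat and the 10 bp wider T-cell repeat from the phasogram analysis. Each successive nucleosome is displaced by a further 10 bp. By the fifth nucleosome, close to 1 kb away, the shift reaches 50 bp, about a third of a 147 bp core. The helper name is hypothetical.

```python
def dyad_offsets(repeat_bp, n_nucleosomes):
    """Offsets of successive dyads in an ideal regular array anchored at 0."""
    return [n * repeat_bp for n in range(1, n_nucleosomes + 1)]

granulocyte = dyad_offsets(193, 5)  # repeat from the granulocyte phasogram
t_cell = dyad_offsets(203, 5)       # 10 bp wider repeat in T-cells
shifts = [t - g for g, t in zip(granulocyte, t_cell)]
for n, (g, t, s) in enumerate(zip(granulocyte, t_cell, shifts), start=1):
    print(f"nucleosome {n}: +{g} bp vs +{t} bp, shift {s} bp")
```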
The cellular environment can drive nucleosomes to sequences that are not intrinsically favorable for occupancy. This is evident from a genome-wide comparison of the observed nucleosome coverage of all possible tetranucleotides between the granulocyte and in vitro data. In vitro, nucleosome occupancy is strongly associated with AT/GC content, but this preference is abolished in vivo. The exceptions are C/G-rich tetramers containing CpG dinucleotides, which show a 30% reduction in apparent nucleosome occupancy in vivo despite having high core coverage in vitro. Consistent with this, CpG islands are five-fold depleted for observed nucleosome coverage in vivo, whereas no such decrease is observed in the in vitro dataset.
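The tetranucleotide comparison can be sketched as follows. For each 4-mer occurrence, average the per-base core coverage, then average over occurrences. Next, normalize each dataset by its genome-wide mean coverage and take the in vivo / in vitro ratio. The normalization choice and all function names are assumptions for illustration. Of the 256 tetranucleotides, 47 contain a CpG dinucleotide.

```python
from collections import Counter, defaultdict
from itertools import product

ALL_TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
CPG_TETRAMERS = {k for k in ALL_TETRAMERS if "CG" in k}

def tetramer_occupancy(seq, coverage, k=4):
    """Mean core coverage over the bases of each k-mer occurrence in seq."""
    sums, counts = defaultdict(float), Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k].upper()
        if set(kmer) <= set("ACGT"):  # skip N and other ambiguity codes
            sums[kmer] += sum(coverage[i:i + k]) / k
            counts[kmer] += 1
    return {kmer: sums[kmer] / counts[kmer] for kmer in counts}

def relative_occupancy(vivo, vitro, vivo_mean, vitro_mean):
    """ASSUMED normalization: (vivo / vivo_mean) / (vitro / vitro_mean) per shared k-mer."""
    return {k: (vivo[k] / vivo_mean) / (vitro[k] / vitro_mean)
            for k in vivo.keys() & vitro.keys() if vitro[k] > 0}

occ = tetramer_occupancy("AAAACCCC", [1, 1, 1, 1, 3, 3, 3, 3])
print(len(ALL_TETRAMERS), len(CPG_TETRAMERS), occ)
```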
Influence of gene regulatory function on nucleosome positioning
We hypothesize that the decreased nucleosome occupancy of promoters could be due to promoter-related functions of mammalian CpG islands, similar to the promoter-associated nucleosome-free regions observed in flies23, which lack CpG islands. We therefore analyzed transcription-dependent nucleosome packaging around promoters. As in other organisms23–27, promoters of active genes have a nucleosome-free region (NFR) of about 150 bp overlapping the transcription start site, with arrays of well-positioned and phased nucleosomes radiating from the NFR. A notable reduction in apparent nucleosome occupancy extends up to 1 kb into the gene body. We also observed consistent nucleosome coordinates in an independent data set of H3K4me3-bearing nucleosomes16. We compared the nucleosome data with the binding patterns of RNA Polymerase II16 around active promoters. This comparison indicates that phasing of positioned nucleosomes can be explained by packing of nucleosomes against Pol II stalled at the promoter, with Pol II potentially acting as the “barrier”. Inactive promoters, by contrast, exhibit neither a pronounced depletion of nucleosomes nor a positioning and phasing signal. The transition of an inactive promoter to an active one is therefore likely to involve eviction of nucleosomes, coupled with positioning and phasing of the nucleosomes neighboring RNA Pol II. These results suggest that, to become active, CpG-rich segments in mammalian promoters must override intrinsic signals of high nucleosome affinity (Supplementary Fig. 8). This would contrast with flies and yeast, where AT-rich promoters may comprise intrinsic sequence signals that are particularly prone to nucleosome eviction28.
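Aggregate profiles around promoters (and similarly around container sites or factor binding sites) are typically built by summing dyad counts at each offset from a set of anchor positions. Offsets are mirrored for minus-strand genes so that positive offsets always point into the gene body. This minimal sketch assumes dyad coordinates and (position, strand) anchors on a single chromosome. The function name and toy data are hypothetical.

```python
from collections import Counter

def aggregate_profile(dyads, sites, flank=1000):
    """Summed dyad counts at each offset in [-flank, +flank] around sites.

    sites are (position, strand) pairs; for '-' strand sites offsets are
    mirrored so that positive offsets point downstream (into the gene).
    """
    counts = Counter(dyads)
    profile = [0] * (2 * flank + 1)
    for pos, strand in sites:
        sign = 1 if strand == "+" else -1
        for off in range(-flank, flank + 1):
            profile[off + flank] += counts.get(pos + sign * off, 0)
    return profile

# Toy example: a +1 nucleosome 100 bp downstream of a '+' TSS and of a '-' TSS,
# and one upstream dyad 120 bp before the '+' TSS.
flank = 300
prof = aggregate_profile([1100, 4900, 880], [(1000, "+"), (5000, "-")], flank=flank)
print(prof[flank + 100], prof[flank - 120], prof[flank])
```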
To explore how regulatory factors interact with sequence signals to influence nucleosome organization outside of promoters, we focused on binding sites of the NRSF repressor protein15 and the insulator protein CTCF. NRSF and CTCF sites are flanked by arrays of positioned nucleosomes (Supplementary Fig. 9), consistent with the barrier-driven packing previously reported for CTCF29,30. Both proteins occupy additional linker space: NRSF takes up an extra 37 bp and CTCF an extra 74 bp. In agreement with sequence-based predictions21, the in vitro data show that both CTCF and NRSF sites intrinsically encode high nucleosome occupancy (Supplementary Fig. 9). In vivo, however, this signal is overridden by occlusion of these sites from nucleosomes. Additionally, nucleosome phasing around these regulatory sites is more compact in granulocytes than in T-cells (Supplementary Fig. 9), again exemplifying the importance of cellular parameters for the placement of nucleosomes.
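One way to read "extra linker space" off an aggregate profile is to take the distance between the −1 and +1 dyad peaks flanking the site and subtract the normal repeat length. Without a bound factor, adjacent dyads would be one repeat apart. This is a hypothetical sketch, not the authors' procedure. It assumes the strongest peak within ±250 bp on each side is the −1/+1 nucleosome, that the search window fits inside the profile's flank, and that 193 bp is the reference repeat. The synthetic profiles are built to give 74 bp and 37 bp.

```python
def flanking_peaks(profile, flank, search=250):
    """Offsets of the strongest dyad signal upstream (-search..-1) and downstream (1..search)."""
    up = max(range(-search, 0), key=lambda o: profile[o + flank])
    down = max(range(1, search + 1), key=lambda o: profile[o + flank])
    return up, down

def extra_linker(profile, flank, repeat_bp, search=250):
    """Distance between the -1 and +1 dyads minus the normal repeat length."""
    up, down = flanking_peaks(profile, flank, search)
    return (down - up) - repeat_bp

def synthetic_profile(flank, peaks):
    """Profile of zeros with point peaks {offset: height} (toy data)."""
    prof = [0] * (2 * flank + 1)
    for off, height in peaks.items():
        prof[off + flank] = height
    return prof

ctcf_like = synthetic_profile(500, {-130: 10, 137: 10, -323: 8, 330: 8})
nrsf_like = synthetic_profile(500, {-115: 10, 115: 10})
print(extra_linker(ctcf_like, 500, 193), extra_linker(nrsf_like, 500, 193))
```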
Our genome-wide, deep sequence data of nucleosome positions enabled an initial characterization of the determinants of nucleosome organization in primary human cells. Nucleosome spacing differs between cell types and between distinct epigenetic domains in the same cell type, and is influenced by transcriptional activity. We confirm positioning preferences in regulatory elements such as promoters and chromatin regulator binding sites, but find that the majority of the human genome exhibits little if any detectable positioning. The influence of sequence on nucleosome positioning in vivo is modest but detectable. Although DNA sequence is a potent driver of nucleosome organization at certain sites, the cellular environment often overrides sequence signals. It can drive nucleosomes to occupy intrinsically unfavorable DNA elements or evict nucleosomes from intrinsically favorable sites. We find evidence supporting the barrier model of nucleosome organization, with barriers formed by nucleosomes (positioned by container sites), RNA polymerase II (stalled at the promoter), or sequence-specific regulatory factors. Our nucleosome maps should be useful for investigating how nucleosome organization affects gene regulation and vice versa, and for pinpointing the mechanisms driving regional heterogeneity of nucleosome spacing.