Nucleosomes are the basic packaging units of chromatin, modulating accessibility of regulatory proteins to DNA and thus influencing eukaryotic gene regulation. Elaborate chromatin remodeling mechanisms have evolved that govern nucleosome organization at promoters, regulatory elements, and other functional regions in the genome1. Analyses of chromatin landscape have uncovered a variety of mechanisms, including DNA sequence preferences, that can influence nucleosome positions2–4. To identify major determinants of nucleosome organization in the human genome, we utilized deep sequencing to map nucleosome positions in three primary human cell types and in vitro. A majority of the genome exhibited substantial flexibility of nucleosome positions while a small fraction showed reproducibly positioned nucleosomes. Certain sites that position in vitro can anchor the formation of nucleosomal arrays that have cell type-specific spacing in vivo. Our results unveil an interplay of sequence-based nucleosome preferences and non-nucleosomal factors in determining nucleosome organization within mammalian cells.
Previous studies in model organisms3–7 as well as initial analyses in human cells8 have identified fundamental aspects of nucleosome organization. We here focus on the dynamic relationships between sequence-based nucleosome preferences and chromatin regulatory function in primary human cells. We mapped tissue-specific and DNA-encoded nucleosome organization across granulocytes and two types of T-cells (CD4+ and CD8+) isolated from the blood of a single human donor, by isolating cellular chromatin and treating it with micrococcal nuclease (MNase) followed by deep sequencing of the resulting nucleosome-protected fragments (Methods, Supplementary Fig. 1). To provide sufficient depth for both local and global analyses, we used high-throughput SOLiD technology, generating 584, 342, and 343 million mapped reads for granulocytes, CD4+, and CD8+ T-cells, respectively. These are equivalent to 16x–28x genome coverage by 147 bp nucleosome footprints (cores; see Methods). The depth of sequence was critical for our subsequent analysis: while shallower coverage can illuminate features of nucleosome positions through statistical analysis (e.g. 6,8), any definitive map and thus comparison of static and dynamic positioning requires high sequence coverage throughout the genome.
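The relation between mapped read counts and core coverage quoted above can be checked with simple arithmetic. The sketch below assumes each mapped read stands for one 147 bp core and that the genome is roughly 3.08 Gb; the genome size and the function name `core_coverage` are illustrative assumptions, not values taken from the Methods.

```python
CORE_BP = 147        # nucleosome core footprint stated in the text
GENOME_BP = 3.08e9   # ASSUMPTION: approximate human genome length, not from the source


def core_coverage(mapped_reads, core_bp=CORE_BP, genome_bp=GENOME_BP):
    """Fold coverage of the genome if every mapped read represents one core."""
    return mapped_reads * core_bp / genome_bp


# Read counts reported in the text (granulocytes, CD4+ T-cells, CD8+ T-cells).
for label, reads in [("granulocytes", 584e6), ("CD4+", 342e6), ("CD8+", 343e6)]:
    print(f"{label}: {core_coverage(reads):.1f}x")
```

Under these assumptions the three in vivo data sets land at roughly 16x to 28x, in line with the range given in the text.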
To provide complementary data on purely sequence-driven nucleosome positioning in the absence of cellular influences, we reconstituted genomic DNA in vitro with recombinantly derived histone octamers to produce in vitro nucleosomes (Methods, Supplementary Fig. 2), and generated over 669 million mapped reads, representing 32x core coverage of the genome. To identify primary nucleosome positioning sites in DNA, the reconstitution was performed under conditions of DNA excess (see Methods). We also generated a control dataset of 321 million mapped reads from MNase-digested naked DNA (Supplemental Materials). In the population of granulocytes (our deepest in vivo data set), over 99.5% of the mappable genome is engaged by nucleosomes (Methods), and 50% of nucleosome-depleted bases occur in regions shorter than 160 bp.
We first focused on global patterns of nucleosome positioning and spacing by calculating fragment distograms and phasograms6,7,9. Distograms (histograms of distances between mapped reads’ start positions aligning in opposing orientation, Supplementary Fig. 3A) reveal the average core fragment size as a peak if there are many sites in the genome that contain consistently positioned nucleosomes. A positioning signal that is strongly amplified by conditioning the analysis on sites with 3 or more read starts (reflecting a positioning preference; 3-pile subset) is present not only in vivo (Fig. 1A), but also in vitro (Fig. 1B), demonstrating that many genomic sites bear intrinsic, sequence-driven positioning signals. Phasograms (histograms of distances between mapped reads’ start positions aligning in the same orientation, Supplementary Fig. 3B) reveal consistent spacing of positioned nucleosomes by exhibiting a wave-like pattern with a period that represents genome-average internucleosome spacing. In granulocytes, the wave peaks are 193 bp apart (Fig. 1C, adjusted R² = 1, p-value < 10⁻¹⁵), which, given a core fragment length of 147 bp, indicates an internucleosome linker length of 46 bp. By contrast, the phasograms of both types of T-cells have spacing that is wider by 10 bp (Fig. 1D), equivalent to a 56 bp average linker length. These results are consistent with classical observations of varying nucleosome phases in different cell types10,11. Linker length differences have been tied to differences in linker histone gene expression12,13, which we found to be 2.4 times higher in T-cells compared to granulocytes (84 RPKM14 vs. 35 RPKM). The in vitro phasogram (Fig. 1E) reveals no detectable stereotypic spacing of positioned nucleosomes, demonstrating a lack of intrinsic phasing among DNA-encoded nucleosome positioning sites.
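The distogram and phasogram definitions above can be illustrated with a minimal sketch. It assumes read starts are given as integer coordinates on a single chromosome, split by strand, and that a minus-strand start marks the far edge of a fragment. The function names, the distance cut-offs, and the coordinate convention are assumptions for illustration, not the custom scripts used in the study.

```python
from bisect import bisect_left
from collections import Counter

CORE_BP = 147  # core fragment length used in the text


def phasogram(starts, max_dist=1000):
    """Histogram of distances between pairs of read starts on the SAME strand."""
    ordered = sorted(starts)
    hist = Counter()
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            d = ordered[j] - ordered[i]
            if d > max_dist:
                break  # starts are sorted, so later pairs are even farther apart
            hist[d] += 1
    return hist


def distogram(plus_starts, minus_starts, max_dist=300):
    """Histogram of distances from plus-strand starts to downstream minus-strand starts."""
    minus = sorted(minus_starts)
    hist = Counter()
    for a in plus_starts:
        j = bisect_left(minus, a)  # first minus-strand start at or beyond a
        while j < len(minus) and minus[j] - a <= max_dist:
            hist[minus[j] - a] += 1
            j += 1
    return hist


def linker_length(period_bp, core_bp=CORE_BP):
    """Average linker length implied by a phasogram period."""
    return period_bp - core_bp
```

With toy input such as `phasogram([100, 293, 486])`, the histogram has counts at 193 and 386, which is the wave-like pattern in miniature; `linker_length(193)` gives the 46 bp linker quoted for granulocytes. The pairwise loop is quadratic in dense regions, so a genome-scale version would need a more careful implementation.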
Using a positioning stringency metric (Methods; Supplementary Fig. 4) that quantifies the fraction of defined nucleosome positions within a given segment, we calculated the fraction of the genome that is occupied by preferentially positioned nucleosomes at different stringency thresholds. The maximum number of sites at which some positioning preference can be detected statistically is 120 million, covering just over 20% of the genome (Supplementary Fig. 5) at the low stringency of 23%. Thus, most nucleosome positioning preferences are weak, and nucleosomes across the majority of the human genome are not preferentially positioned, either by sequence or by cellular function.
We next focused on how transcription and chromatin functions affect nucleosome organization regionally. For each cell type, we generated deep RNA-seq data and binned genes into groups according to their expression levels. The average spacing of nucleosomes was greatest within silent genes (CD4+ T-cells, 206 bp, Fig. 2A) and decreased by as much as 11 bp as the expression levels went up (t-statistic p-value 6.5×10⁻³⁴). This suggests that transcription-induced cycles of nucleosome eviction and reoccupation cause denser packing of nucleosomes and slight reduction in nucleosome occupancy (Supplementary Fig. 6). On the basis of this result, we hypothesized that higher-order chromatin organization as implied by specific chromatin modifications might be associated with specific spacing patterns. Using previously published ChIP-seq data, we identified regions of enrichment15 for histone modifications that are found within heterochromatin (H3K27me3, H3K9me3)16, gene-body euchromatin (H4K20me1, H3K27me1)16, or euchromatin associated with promoters and enhancers (H3K4me1, H3K27ac, H3K36ac)17, and estimated spacing of nucleosomes for each of these epigenetic domains. We found that active promoter-associated domains contained the shortest spacing of 178–187 bp, followed by a larger spacing of 190–195 bp within the body of active genes, while heterochromatin spacing was largest at 205 bp (Fig. 2B). These results reveal striking heterogeneity in nucleosome organization across the genome that depends on global cellular identity, metabolic state, regional regulatory state, and local gene activity.
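One plausible way to turn a phasogram into a single spacing value for a gene group or epigenetic domain is to regress the positions of successive wave peaks on their peak index and read the spacing off the slope. The sketch below assumes that peak positions have already been identified; the function `estimate_spacing` and the least-squares choice are assumptions for illustration, since the fitting procedure itself is described only in the Methods.

```python
def estimate_spacing(peak_positions):
    """Least-squares slope and intercept of peak position (bp) versus peak index.

    peak_positions: phasogram peak locations in bp, ordered from the first peak.
    The slope is the estimated average nucleosome spacing.
    """
    n = len(peak_positions)
    if n < 2:
        raise ValueError("need at least two peaks to estimate spacing")
    xs = list(range(1, n + 1))
    mean_x = sum(xs) / n
    mean_y = sum(peak_positions) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, peak_positions))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical peaks at exact multiples of the granulocyte period.
spacing, offset = estimate_spacing([193, 386, 579, 772])
```

Using several peaks rather than the first one alone averages out error in locating any single peak, which is why a regression is a natural choice here.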
To characterize DNA signals responsible for consistent positioning of nucleosomes, we identified 0.3 million sites occupied in vitro by nucleosomes at high stringency (> 0.5; Methods). The region occupied by the center of the nucleosome (dyad) exhibits a significant increase in G/C usage (Poisson p-value < 10⁻¹⁰⁰; Fig. 3A). Flanking regions increase in A/T usage as the positioning strength increases (Fig. 3B). A subset of in vitro positioned nucleosomes (stringency > 0.5) which are also strongly positioned in vivo (stringency > 0.4) revealed increased A/T usage within the flanks (Fig. 3C) compared to in vitro-only positioning sites (Fig. 3A), which underscores the importance of flanking repelling elements for positioning in vivo. We term such elements with strong G/C cores and A/T flanks “container sites” to emphasize the proposed positioning mechanism (Fig. 3D). This positioning signal is different from a 10 bp dinucleotide periodicity observed in populations of nucleosome core segments isolated from a variety of species19,20 and proposed to contribute to precise positioning and/or rotational setting of DNA on nucleosomes20 on a fine scale (Supplementary Fig. 7). G/C rich signals are known to promote nucleosome occupancy18,21, while AA-rich sequences repel nucleosomes4, and our data demonstrate that precise arrangement of a core-length attractive segment flanked by repelling sequences can produce a strongly positioned nucleosome (Fig. 3D).
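The G/C core and A/T flank signature of a container site can be visualized as a per-position G/C fraction across sequence windows centered on positioned dyads. The sketch below assumes the windows have already been extracted as equal-length strings; the function name `gc_profile` and the toy windows are illustrative and not part of the published analysis.

```python
def gc_profile(windows):
    """Fraction of G or C at each position across equal-length, dyad-centered windows."""
    if not windows:
        raise ValueError("need at least one window")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    profile = []
    for i in range(length):
        gc = sum(1 for w in windows if w[i].upper() in "GC")
        profile.append(gc / len(windows))
    return profile


# Toy windows with a G/C-rich center and A/T-rich flanks.
profile = gc_profile(["ATGCAT", "TAGCTA"])
```

For real data the windows would span the 147 bp core plus flanks, and a container site would appear as a plateau of elevated G/C fraction over the core with dips on either side.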
Dyad frequencies around container sites (Fig. 3E) show a strong peak of enrichment in vivo, confirming that DNA positions nucleosomes in vivo over these sites. Additionally, wave-like patterns emanate from these sites in vivo (but not in vitro), reflecting the nucleation of phased arrays by positioned cellular nucleosomes. Viewing these results in light of the nucleosome barrier model22, which proposes that nucleosomes are packed into positioned and phased arrays against a chromatin barrier, we conclude that sequence-positioned nucleosomes can initiate propagation of adjacent stereotypically positioned nucleosomes. Importantly, wave periods around container sites are shorter in granulocytes than in T-cells, allowing tissue-specific variation in linker length (Fig. 1D) to alter placement of nucleosomes over distances of as much as 1 kb from an initial container site. Functional consequences of such rearrangements might include global shifts in regulatory properties that could contribute to distinct transcription factor accessibility profiles in different cell types.
The cellular environment can drive nucleosomes to sequences not intrinsically favorable to being occupied, as evident in a genome-wide comparison of observed nucleosome coverage of all possible tetranucleotides between the granulocyte and the in vitro data (Fig. 4A). In vitro, nucleosome occupancy is strongly associated with AT/GC content, but this preference is abolished in vivo; the exceptions are C/G-rich tetramers that contain CpG dinucleotides, which show a 30% reduction in apparent nucleosome occupancy despite having high core coverage in vitro. Consistent with this, CpG islands are five-fold depleted for observed nucleosome coverage in vivo (Fig. 4B). No such decrease is observed in the in vitro dataset.
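A tetranucleotide coverage comparison of this kind can be sketched as the mean nucleosome core coverage over every occurrence of each 4-mer. The code below assumes a sequence string and a per-base coverage list of the same length; the function name `tetramer_coverage`, the averaging over the four covered bases, and the lack of reverse-complement merging are all simplifying assumptions rather than the procedure used in the study.

```python
from collections import defaultdict


def tetramer_coverage(seq, coverage):
    """Mean per-base core coverage over all occurrences of each tetranucleotide."""
    if len(seq) != len(coverage):
        raise ValueError("sequence and coverage must have the same length")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4].upper()
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
            sums[kmer] += sum(coverage[i:i + 4]) / 4
            counts[kmer] += 1
    return {kmer: sums[kmer] / counts[kmer] for kmer in sums}


# Toy example: coverage is zero over the A-rich half and 4 over the C-rich half.
table = tetramer_coverage("AAAACCCC", [0, 0, 0, 0, 4, 4, 4, 4])
```

Computing one such table for the in vivo data and one for the in vitro data, then comparing the values for each tetramer, gives the kind of comparison described for Fig. 4A.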
We hypothesize that the decreased nucleosome occupancy of promoters could be due to promoter-related functions of mammalian CpG islands, similar to promoter-associated nucleosome-free regions observed in flies23 and yeast5, which do not have CpG islands. We therefore analyzed transcription-dependent nucleosome packaging around promoters. As in other organisms23–27, promoters of active genes have a nucleosome-free region (NFR) of about 150 bp overlapping the transcriptional start site and arrays of well-positioned and phased nucleosomes that radiate from the NFR (Fig. 4C). A notable reduction in apparent nucleosome occupancy extends up to 1 kb into the gene body. We also observed consistent nucleosome coordinates in an independent data set of H3K4me3-bearing nucleosomes16 (Fig. 4D). Comparison of the nucleosome data (Fig. 4D) with binding patterns of RNA Polymerase II16 (Fig. 4D) around active promoters indicates that phasing of positioned nucleosomes can be explained by packing of nucleosomes against Pol II stalled at the promoter, with Pol II potentially acting as the “barrier”. The set of inactive promoters, by contrast, exhibits neither a pronounced depletion of nucleosomes nor a positioning and phasing signal (Fig. 4C). The transition of an inactive promoter to an active one is therefore likely to involve eviction of nucleosomes, coupled with positioning and phasing of nucleosomes neighboring RNA Pol II (Fig. 4E). These results suggest that CpG-rich segments in mammalian promoters override intrinsic signals of high nucleosome affinity (Supplementary Fig. 8) to become active; this would be in contrast to fly and yeast, where AT-rich promoters may comprise intrinsic sequence signals that are particularly prone to nucleosome eviction28.
To explore how regulatory factors interact with sequence signals to influence nucleosome organization outside of promoters, we focused on binding sites of the NRSF repressor protein15 and the insulator protein CTCF. NRSF and CTCF sites are flanked by arrays of positioned nucleosomes (Fig. 4G, Supplementary Fig. 9), consistent with barrier-driven packing previously reported for CTCF29,30. Both proteins occupy additional linker space, with NRSF taking up an extra 37 bp and CTCF 74 bp. In agreement with sequence-based predictions21, both CTCF and NRSF sites intrinsically encode high nucleosome occupancy as can be seen from the in vitro data (Fig. 4F, Supplementary Fig. 9), but this signal is overridden in vivo by occlusion of these sites from associating with nucleosomes. Additionally, phasing of nucleosomes around these regulatory sites is more compact in granulocytes compared to T-cells (Supplementary Fig. 9), again exemplifying the importance of cellular parameters for placement of nucleosomes.
Our genome-wide, deep sequence data of nucleosome positions facilitated an initial characterization of the determinants of nucleosome organization in primary human cells. Spacing of nucleosomes differs between cell types and between distinct epigenetic domains in the same cell type, and is influenced by transcriptional activity. We confirm positioning preferences in regulatory elements such as promoters and chromatin regulator binding sites, but find that the majority of the human genome exhibits little if any detectable positioning. The influence of sequence on positioning of nucleosomes in vivo is modest but detectable. Despite DNA sequence being a potent driver of nucleosome organization at certain sites, the cellular environment often overrides sequence signals and can drive nucleosomes to occupy intrinsically unfavorable DNA elements or evict nucleosomes from intrinsically favorable sites. We find evidence for the barrier model for nucleosome organization, and that barriers can be nucleosomes (positioned by container sites), RNA polymerase II (stalled at the promoter), or sequence-specific regulatory factors. Our nucleosome maps should be useful for investigating how nucleosome organization affects gene regulation and vice versa, as well as for pinpointing the mechanisms driving regional heterogeneity of nucleosome spacing.
Neutrophil granulocytes, CD4+ and CD8+ T-cells were isolated from donor blood using Histopaque density gradients and Ig-coupled beads against blood cell surface markers (pan T and CD4+ microbeads, Miltenyi Biotec). Nucleosome cores were prepared as previously described7; cells were snap-frozen and crushed to release chromatin, followed by micrococcal nuclease treatment. In vitro nucleosomes were prepared by combining human genomic DNA with recombinantly derived histone octamers at an average ratio of 1 octamer per 850 bp. Unbound DNA was then digested using micrococcal nuclease. After digestion, reactions were stopped with EDTA, samples were treated with proteinase K, and nucleosome-bound DNA was extracted with phenol-chloroform and precipitated with ethanol (Supplementary Methods). Purified DNA was size-selected (120–180 bp) on agarose to obtain mononucleosome cores, followed by sequencing library construction. RNA was isolated by homogenizing purified cells in Trizol, poly-A RNA was purified using the Qiagen Oligotex kit, and RNA-seq libraries were constructed using the SOLiD Whole Transcriptome Analysis kit. All sequence data were obtained using the SOLiD 35 bp protocol and aligned using the SOLiD pipeline against the human hg18 reference genome. Downstream analyses were all conducted using custom scripts (Supplementary Methods).
This work was supported by the Stanford Genetics/Pathology Sequencing Initiative. We would like to thank Geeta Narlikar for help with in vitro experiments, Life Technologies, especially Jason Briggs, for help with generating sequencing data, Phil Lacroute for help with sequence alignment, Stephen Galli for valuable discussions, Lia Gracey for critical reading of the manuscript, and members of the Sidow and Fire labs for valuable feedback and discussions. Work in the Fire lab was partially supported by NIGMS (R01GM37706). AV was partially supported by an ENCODE subcontract to AS (NHGRI U01HG004695). SMJ was partially supported by the Stanford Genome Training program (NHGRI T32HG00044).
Author contributions. AV, SMJ, AS and AZF designed the experiments. SMJ, AV, CLS and SB performed the experiments. AV designed and carried out analyses with input from AS, AZF, and SMJ. AV, AS and AZF wrote the manuscript.
Data availability. All sequence data were submitted to Sequence Read Archive (accession number GSE25133). Sites containing strongly positioned in vitro nucleosomes are available as a supplementary data file.