The centromere is the chromosomal locus that ensures fidelity in genome transmission at cell division. Centromere protein A (CENP-A) is a histone H3 variant that specifies centromere location independently of DNA sequence. Conflicting evidence has emerged regarding the histone composition and stoichiometry of CENP-A nucleosomes. Here we show that the predominant form of the CENP-A particle at human centromeres is an octameric nucleosome. CENP-A nucleosomes are very highly phased on α-satellite 171 bp monomers at normal centromeres, and also display strong positioning at neocentromeres. At either type of functional centromere, CENP-A nucleosomes exhibit similar DNA wrapping behavior as octameric CENP-A nucleosomes reconstituted with recombinant components, having looser DNA termini than those on their conventional counterparts containing canonical H3. Thus, the fundamental unit of the chromatin that epigenetically specifies centromere location in mammals is an octameric nucleosome with loose termini.
Faithful genome inheritance at cell division requires that each chromosome contain a single functional centromere1. The centromere is the site of assembly of the mitotic kinetochore—a massive complex of proteins that serves as the connection point to the microtubule-based spindle— and also serves as the site of final sister chromatid cohesion1. Strong evidence suggests that CENP-A can provide the key epigenetic information to mark centromere location2–4, distinguishing centromeres from the rest of the chromosome. Prime examples of the DNA sequence-independent nature of centromere inheritance are human neocentromeres that have been isolated out of the population, where centromere function is uncoupled from the repetitive α-satellite DNA that typically overlaps with CENP-A chromatin occupancy5–10. Fundamental questions remain regarding CENP-A nucleosomes, such as the histone composition and stoichiometry of the CENP-A particle and how much DNA it wraps.
There is now nearly a consensus on the point that recombinant, purified CENP-A readily assembles into octameric nucleosomes where two copies of CENP-A replace the two copies of canonical H311–15. Reconstituted octameric CENP-A nucleosomes are known to have loose terminal DNA contacts13,15,16. In addition to loose terminal DNA wrapping, the CENP-A targeting domain (CATD) confers structural changes14,15, as well as conformational rigidity14,17, to the folded core of reconstituted octameric nucleosomes. The relevance of all studies of recombinant nucleosomes to native centromeric chromatin is unclear, however, because the field remains deeply divided over key issues on the nature of the protein–DNA particle into which CENP-A assembles in vivo18. Experiments involving isolation of CENP-A particles from various eukaryotic species have led to radically different models for the fundamental unit of centromeric chromatin including non-octameric forms (e.g. tetrasomes19, hemisomes20–23, hexasomes24, etc.). Perhaps the two most intriguing and conflicting proposals for the major form of the CENP-A particles that specify centromere location in metazoans are for octameric nucleosomes and hemisomes. The two proposals suggest radically different modes for how centromere-specifying chromatin particles are distinguished from bulk nucleosomes. Clear examples of when such molecular recognition is important at the centromere include the direct binding of CENP-A-containing particles by constitutive non-histone centromere components, CENP-N4,25 and CENP-C4,26.
To test the proposed models for the major form of the fundamental repeating unit of centromeric chromatin, we used native chromatin immunoprecipitation followed by sequencing (ChIP-Seq) of CENP-A-containing particles from normal centromeres on α-satellite DNA and three naturally-occurring neocentromeres.
To investigate the nature of CENP-A-containing particles at functional human centromeres, we first considered the merits of a micrococcal nuclease (MNase) digestion approach coupled to ChIP-Seq. The MNase approach is attractive because it straightforwardly tests the specific predictions for how much DNA could be wrapped by octameric nucleosomes or hemisomes (Fig. 1a)15,27. Since early nucleosome studies, MNase protection has been a standard for defining canonical nucleosomes28,29. Crystallographic studies of canonical nucleosomes have defined how each histone dimer pair (H2A–H2B or H3–H4) has a single, basic DNA binding ridge that binds to ~25–30 bp of DNA27. The canonical histone octamer wraps ~100–120 bp of DNA in this way with the final ~two turns (~20 bp) of terminal DNA stabilized by contacts with the αN helix of histone H3. Thus, in total, the canonical nucleosome core particle stably protects ~147 bp from MNase digestion. Tetrameric histone complexes of any sort only have enough DNA wrapping surface to bind to ~65 bp of DNA (Fig. 1b)30–32. Before embarking on our ChIP-Seq studies of CENP-A nucleosomes isolated from functional centromeres, we examined reconstituted CENP-A-containing complexes using one of the same high-resolution, high-sensitivity detection methods that we employ with the native particles (see below). We found that recombinant CENP-A nucleosomes lack protection of crossed entry–exit DNA (i.e. they do not protect a fragment corresponding to the ~165 bp peak protected by canonical nucleosomes containing conventional H3) and digest to three discrete peaks, one the size of the nucleosome core particle (~145 bp) and two of smaller size (~110 and ~130 bp) (Fig. 1c and Supplementary Fig. 1). All of the fragment lengths protected by recombinant octameric CENP-A nucleosomes are substantially larger than any fragment from reconstituted (H3–H4)2 and (CENP-A–H4)2 tetrasomes, which protect ~65 bp (Fig. 1d).
Thus, structural models and experiments with reconstituted particles encouraged us to pursue a similar MNase strategy with native CENP-A particles to distinguish between the radically different configurations that have been proposed for the fundamental unit of functional centromeric chromatin.
Native ChIP of CENP-A-containing particles from human cultured cells strongly enriched centromere DNA and yielded MNase-protected fragments in three major size classes (~110 bp, ~130 bp, and ~150 bp) (Fig. 2a–c). Native ChIP of canonical nucleosomes containing conventional H3 yielded a single size class of MNase-protected fragments that, as expected, matched the input bulk nucleosomes (Fig. 2a and Supplementary Fig. 2a), indicating that the smaller fragments observed for CENP-A-containing particles are not due to additional fragmentation during immunoprecipitation. If the smaller fragments (~110 bp) are derived from digestion of the nucleosome termini, as in our experiments with recombinant CENP-A octameric nucleosomes (Fig. 1c), then excessive digestion should remove the termini and leave a stable ~110 bp core fragment undigested. We tested this notion by repeating the isolation of CENP-A nucleosomes using either a low or a high concentration of MNase. As expected, treatment with a high concentration of MNase yields more heavily digested bulk chromatin with a higher mononucleosome:dinucleosome ratio and mononucleosomes that are trimmed down to core particles (Fig. 2d). High concentration MNase treatment diminishes the larger (130–160 bp) and increases the smaller (100–120 bp) MNase-protected DNA fragments from isolated CENP-A particles (Fig. 2e). These findings suggest that native CENP-A particles have a stable core with transiently unwrapping ends that are digested in a manner that is sensitive to the concentration of MNase.
Transient unwrapping of nucleosome terminal DNA predicts that chemical protein–DNA crosslinking would lock the DNA to the CENP-A-containing histone octamer. Indeed, standard formaldehyde crosslinking as is used in diverse chromatin studies33,34, yielded CENP-A-containing particles with a single distribution of MNase-protected fragments of ~150–170 bp, nearly identical to that of solubilized bulk nucleosomes (Supplementary Fig. 2b–e). These CENP-A particles were isolated out of nucleosome preparations that contained all detectable CENP-A protein (Supplementary Fig. 2b), and were specifically enriched for α-satellite DNA (Supplementary Fig. 2c) to a similar extent as were native preparations (Fig. 2c). Further, similar results were obtained for two independent cell types, one derived from healthy tissue (PD-NC4 cells) and one derived from a tumor (HeLa)(Supplementary Figs. 2d,e). Together, our findings strongly suggest that we are monitoring the DNA wrapping behavior of the major form of CENP-A nucleosomes, and that in doing so, our approach represents a highly sensitive means to probe centromere chromatin architecture.
We considered that the sub-145 bp MNase-protected fragments on natively prepared CENP-A particles could be caused by 1) the physical properties conferred by the incorporation of CENP-A into nucleosomes that make the terminal DNA susceptible to MNase digestion or 2) the properties imposed by the sequence or higher-order structure of the α-satellite DNA (where the monomer repeat unit is 171 bp18,35) upon which CENP-A is assembled at normal human centromeres. Neocentromeres provide a prime tool to investigate functional CENP-A nucleosomes in the absence of any effects imposed by α-satellite DNA. We used patient-derived cell lines harboring one neocentromeric chromosome each (Fig. 3a–c) for ChIP-Seq studies. Two of the neocentromeres map to single copy, complex DNA sequences6,8 (Fig. 3a,c) while the other is present on a repeat sequence where the ~12 kb monomer sequence is completely unrelated to α-satellites10 (Fig. 3b). We mapped paired-end CENP-A nucleosome sequences and found strong enrichment at each of the neocentromeres we examined (Fig. 3a–c; Supplementary Table 1), in good agreement with earlier mapping efforts6,8,10. The vast majority of CENP-A nucleosomes at the neocentromere fall in the three CENP-A nucleosome size classes (Fig. 2a and Supplementary Fig. 3c; three bins: 100–119 bp, 120–139 bp, 140–160 bp), and we found that all three size classes map to the same positions (Fig. 3d–i and Supplementary Figs. 3–5; Supplementary Table 2).
The finding that the small (~110 bp), medium (~130 bp), and large (~150 bp) fragments localize to the same genomic positions (as opposed to distinct ones) is consistent with the notion that CENP-A nucleosomes have DNA termini that transiently unwrap and are thus prone to variable terminal nuclease digestion. Indeed, co-localization of all three size classes is evident for both quantitative global analysis of the entire neocentromere regions (Fig. 3d,f,h and Supplementary Figs. 3d, 4a, 5a; Supplementary Table 2) and for local analysis of CENP-A nucleosome sites (Fig. 3e,g,i and Supplementary Figs. 3f–m, 4b–i, 5b–g; Supplementary Table 2). Initial removal of duplicate reads yielded similar results (Supplementary Fig. 3e,n), indicating that the DNA wrapping behavior we observe is entirely attributable to positioning of CENP-A-containing particles. Despite originating at diverse genomic locations on separate chromosomes, and with highly variable sizes and patterns of CENP-A nucleosome enrichment (Fig. 3a–c), the wrapping behavior of individual CENP-A nucleosomes is strikingly similar for all three of the neocentromeres we examined. In total, our analysis of neocentromeres strongly suggests that the DNA wrapping properties of CENP-A-containing particles are largely independent of DNA sequence variation in complex DNA and can be attributed to the physical properties conferred by the presence of CENP-A.
The highly repetitive nature of the DNA sequences found at normal centromeres raises the possibility that nucleosome positioning and DNA wrapping are more ordered on α-satellite DNA. Indeed, there are preferred MNase digestion sites of CENP-A-containing chromatin within α-satellite monomers36. Centromeres remain largely unannotated, and standard genomic sequence filters discard these sequences. We developed a scheme that takes advantage of paired-end and long (100 bp) deep sequencing reads (Supplementary Fig. 6a,b) and the fact that α-satellite monomers share >60% sequence identity to one another35. Without both paired-end and long reads, it is impossible to identify the length or sequence of nucleosome-protected MNase fragments within such highly repetitive DNA. Our scheme is to align nucleosome sequences to a dimer α-satellite consensus sequence (Supplementary Fig. 6a,b)35,36. In doing so, we include all sequences that map within a single 171 bp monomer or span two monomers. For all three of the cell lines that we examined, we observed a biphasic behavior of CENP-A nucleosome sequence alignments with a subset of sequences having an alignment value of 35–40% and another at ≥60% (Fig. 4a and Supplementary Fig. 6c,g). The former subset represents the alignment value of random sequences (i.e. sequences that do not originate from α-satellite DNA). The latter subset (≥60% identity shared with the α-satellite consensus) represents bona fide α-satellite sequences. CENP-A nucleosome ChIP preparations are strongly enriched for α-satellite DNA, which represents 35–52% of the sequences for each of the three cell lines used in this study (Fig. 4a and Supplementary Fig. 6c,g). Approximately 1.5% of bulk nucleosome sequences are from α-satellite DNA and align with ≥60% identity, and this equates to 4–7 × 10^5 bulk nucleosome sequences at centromeres (Fig. 4b and Supplementary Fig. 6d,h) for us to compare to their counterparts containing CENP-A.
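The identity cutoff described above can be illustrated with a minimal Python sketch. It assumes that a fractional identity to the dimer α-satellite consensus has already been computed for each joined read by some alignment step that is not shown; the function names and the example values are hypothetical and are not taken from the study.

```python
# Threshold quoted in the text: reads sharing >=60% identity with the
# dimer alpha-satellite consensus are treated as alpha-satellite derived.
ALPHA_SATELLITE_MIN_IDENTITY = 0.60


def is_alpha_satellite(identity):
    """Return True if a read's fractional identity meets the cutoff."""
    return identity >= ALPHA_SATELLITE_MIN_IDENTITY


def alpha_satellite_fraction(identities):
    """Fraction of reads classified as alpha-satellite (0.0 for no reads)."""
    if not identities:
        return 0.0
    hits = sum(1 for value in identities if is_alpha_satellite(value))
    return hits / len(identities)


# Hypothetical identities, for illustration only (not data from the study):
# a low-identity group near 35-40% and a group at or above 60%.
example_identities = [0.36, 0.38, 0.40, 0.62, 0.75, 0.91, 0.37, 0.60]
fraction = alpha_satellite_fraction(example_identities)
```

With these made-up values, four of the eight reads meet the cutoff, so `fraction` is 0.5.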
The tripartite distribution of size classes of MNase digestion of CENP-A nucleosomes includes 17–24% of the large bin (140–160 bp), 36–42% for the middle bin (120–139 bp), and 32–38% for the small bin (100–119 bp), with small variation observed between experiments performed in the three cell lines used in this study (Fig. 4c and Supplementary Fig. 6e,i). The tripartite distribution is in stark contrast to bulk nucleosomes on α-satellite DNA, where MNase protection of 140–160 bp, or slightly larger, predominates (Fig. 4d and Supplementary Fig. 6f,j), consistent with fully wrapped nucleosomes with or without crossed linker DNA at the entry–exit positions (Fig. 1a). Therefore, even when wrapped with nearly identical sequences—the closely related α-satellite DNA of normal centromeres—CENP-A nucleosomes exhibit distinctly shorter lengths of MNase protection than their conventional counterparts with canonical H3.
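As an illustration of the size-class binning, the following Python sketch assigns fragment lengths to the three bins quoted in the text and reports the fraction in each bin. It is a sketch under stated assumptions rather than the analysis code used in the study: the function names are hypothetical, the example lengths are invented, and normalizing over binned fragments only (ignoring fragments outside 100–160 bp) is an assumption.

```python
from collections import Counter

# Size-class bins as quoted in the text (inclusive bounds, in bp).
SIZE_CLASSES = {
    "small": (100, 119),
    "medium": (120, 139),
    "large": (140, 160),
}


def size_class(length_bp):
    """Return the size-class name for a fragment length, or None if unbinned."""
    for name, (low, high) in SIZE_CLASSES.items():
        if low <= length_bp <= high:
            return name
    return None


def size_class_fractions(lengths):
    """Fraction of binned fragments per class; unbinned fragments are ignored."""
    counts = Counter(c for c in map(size_class, lengths) if c is not None)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in SIZE_CLASSES}
    return {name: counts[name] / total for name in SIZE_CLASSES}


# Hypothetical fragment lengths, for illustration only (not data from the study).
example_lengths = [105, 112, 118, 125, 131, 138, 145, 152, 171, 90]
fractions = size_class_fractions(example_lengths)
```

In this invented example, the 171 bp and 90 bp fragments fall outside every bin, leaving three small, three medium, and two large fragments.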
To measure the degree of phasing of CENP-A nucleosomes on α-satellite DNA and investigate the relationship between the three different size classes of DNA fragments protected from MNase digestion, we mapped our sequencing data back to the dimerized α-satellite sequence (Fig. 5). CENP-A nucleosomes are highly phased on α-satellite DNA, with the small (100–119 bp) and medium (120–139 bp) MNase protected fragments showing the highest level of phasing (Fig. 5a and Supplementary Fig. 7a,d). The small- and medium-sized MNase protected fragments share a 5’ digestion site ~15–20 bp 3’ of the position of the CENP-B box (a 17 bp binding site for the CENP-B protein37), with the smallest fragments digested ~20 bp shorter than the medium-sized fragments at their 3’ end (Fig. 5a and Supplementary Fig. 7a,d). Phasing of bulk nucleosomes on α-satellite DNA is less pronounced, but there is one clearly preferred site with MNase digestion near the 3’ end of the first CENP-B box and ~5–10 bp 5’ of the second CENP-B box (Fig. 5b and Supplementary Fig. 7b,e).
We predicted that CENP-A-containing and H3-containing octameric nucleosomes have similar preferred sites on α-satellite DNA since the basic residues that contact nucleosomal DNA are largely conserved on the surface of the (CENP-A–H4)2 heterotetramer relative to (H3–H4)2 (Fig. 1)14,15. Upon plotting the midpoints of all nucleosome sequences that map to α-satellite DNA, we found that the most prominent position of the small (100–119 bp) MNase fragments from CENP-A nucleosomes is identical to the most prominent bulk nucleosome position (Fig. 5c,d and Supplementary Fig. 7c,f; compare yellow trace of the CENP-A ChIP to the maroon trace of the bulk nucleosomes). The midpoint of the middle-sized MNase fragments from CENP-A nucleosomes is shifted 10 bp 3’ of the midpoint of the small-sized CENP-A fragments and canonical nucleosomes (Fig. 5c,d and Supplementary Fig. 7c,f; red trace of the CENP-A ChIP). Together, these data argue for a model for nucleosome positioning on α-satellite DNA wherein: 1) canonical nucleosomes prefer a site between CENP-B boxes and maintain strong terminal DNA wrapping with their dyad axis positioned at or very near the midpoint peak we observed (Fig. 5d,f; maroon tracing), 2) the small-sized CENP-A fragments (Fig. 5c,e; yellow) represent MNase digestion of 15–20 bp from each end of a nucleosome with identical dyad axis positioning, and 3) the medium-sized CENP-A fragments (Fig. 5c,e; red) represent asymmetrically digested MNase products that have been cleaved 15–20 bp at their 5’ end but not their 3’ end. Further, CENP-A nucleosomes at their most prominent position at centromeres do not strongly protect fragments >140 bp (Fig. 5). To the contrary, the >140 bp fragments protected by CENP-A nucleosomes are not well phased (Fig. 5a,c). 
Thus, in their preferred biological context on the chromosome, CENP-A nucleosomes are strongly phased and their propensity to unwrap DNA at their termini is accentuated, especially at the 5’ nucleosome entry–exit site (Fig. 5e; for CENP-A nucleosomes, the i’ site is almost always the site of cleavage, and the i site is very rarely used).
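The relationship between trimming and midpoint position in this model can be checked with simple arithmetic. The Python sketch below is an illustration only: it uses a hypothetical 150 bp fragment with half-open coordinates and a 20 bp trim (the upper end of the 15–20 bp range given in the text), and the helper names are assumptions. Symmetric trimming leaves the midpoint unchanged, whereas trimming only the 5’ end moves the midpoint 3’ by half the trimmed length.

```python
def midpoint(start, end):
    """Midpoint of a fragment given half-open coordinates [start, end)."""
    return (start + end) / 2


def trim(start, end, five_prime=0, three_prime=0):
    """Remove the given number of bp from the 5' and/or 3' end of a fragment."""
    return start + five_prime, end - three_prime


# Hypothetical fully protected fragment (coordinates are illustrative only).
full = (0, 150)

# Symmetric digestion: 20 bp removed from each end gives a 110 bp fragment.
symmetric = trim(*full, five_prime=20, three_prime=20)

# Asymmetric digestion: 20 bp removed from the 5' end only gives 130 bp.
asymmetric = trim(*full, five_prime=20)

midpoint_shift = midpoint(*asymmetric) - midpoint(*full)
```

Here the symmetric product keeps the midpoint at 75, and the asymmetric product shifts it by 10 bp toward the 3’ end, in line with the 10 bp offset described for the medium-sized fragments.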
Since our initial analysis (Fig. 5) suggests a strong relationship between the positioning of CENP-B boxes and the CENP-A nucleosome, we next investigated the extent to which CENP-A nucleosome phasing is dependent upon functional CENP-B boxes. The mapping scheme we first employed to examine the phasing of CENP-A nucleosomes on α-satellite DNA revealed that the most prominent CENP-A nucleosome location in the genome yields MNase-digested fragments that exclude the location of the CENP-B box (Fig. 5a). Thus, such a mapping strategy based on consensus sequence does not allow us to directly assess the relationship of these nucleosome positions relative to functional CENP-B boxes that contain the key nucleotide sequence for recognition by the CENP-B protein37. Therefore, we further examined CENP-A nucleosome positions in chromosome-specific higher-order repeat (HOR) α-satellite DNA sequences that have been identified for almost all human chromosomes, although many are poorly annotated in the human genome38. Most of these HORs contain a functional CENP-B box in some fraction of their monomers. Here we chose to examine the well-characterized 2 kb HOR from the X chromosome which contains functional CENP-B boxes in 4 of its 12 monomers (Fig. 6a)38,39. We compared this to the α-satellite HOR found on the Y chromosome, which does not contain any functional CENP-B boxes, and in fact the Y chromosome is the only chromosome that does not show any binding of the CENP-B protein at its centromere40,41. Since we have contiguous end-to-end sequence reads for all of the CENP-A nucleosome derived DNA sequences, we can effectively align them to these HORs. For instance, two of the neocentromere cell lines we use are derived from females, and yield almost no CENP-A nucleosome-derived fragments that align with the chromosome Y HOR (Table 1). One cell line, MS4221, is derived from a male and yields >500,000 CENP-A nucleosome-derived fragments that align with the chromosome Y HOR (Table 1). 
Thus, our mapping strategy is extremely stringent and provides an attractive means to very faithfully and precisely assign CENP-A nucleosome-derived fragments to their location within annotated HORs.
Pronounced phasing is apparent on each HOR with the midpoints of CENP-A nucleosome positions peaking at locations between the positions of functional CENP-B boxes (Fig. 6a; magenta boxes in monomer diagrams) or where they would be located on monomers lacking functional CENP-B boxes (Fig. 6a,b; grey boxes in monomer diagrams). Since the Y chromosome α-satellite HOR completely lacks a functional CENP-B box, such phasing on the Y HOR indicates that CENP-B box-independent phasing clearly occurs (Fig. 6b). An additional contribution to CENP-A nucleosome phasing by functional CENP-B boxes is suggested at the chromosome X HOR where the monomer sequences that are more than one full monomer away from a functional CENP-B box appear to contain a broader distribution of midpoints (Fig. 6a; dashed box). At these locations, many CENP-A nucleosome midpoint positions fall within the coordinates of the non-functional CENP-B boxes. Further, the difference between the prominent peaks of CENP-A position and the valleys between them appears to be more pronounced on the X than on the Y. Thus, aligning CENP-A nucleosome positions on α-satellite DNA suggests a strong CENP-B-box-independent phasing component encoded within the α-satellite monomer and that additional ‘fine-tuning’ by CENP-B-box-dependent phasing may exist.
To examine the extent to which CENP-B-box-dependent phasing of CENP-A nucleosomes occurs, we mapped the chromosome X and Y HOR CENP-A nucleosome sequences to the dimer α-satellite consensus (Fig. 7). Strikingly, the phasing on the chromosome Y HOR is specifically diminished relative to the chromosome X HOR (note that the phasing on the X HOR [Fig. 7a–c] is very similar to the phasing of CENP-A nucleosome sequences from all α-satellite sequences [Fig. 5a,c and Supplementary Fig. 7a,c,d,f]). The CENP-B box-independent phasing on the chromosome Y HOR (Fig. 7d,e) remains strong enough, however, to still clearly observe the most prominent position(s) for each of the small, medium, and large CENP-A nucleosome-derived fragments (Fig. 7e). These positions indicate that the central dyad of the preferred CENP-A nucleosome position on the chromosome Y HOR (Fig. 7f) is the same as deduced from our analysis on all α-satellite sequences (Fig. 5e). On the chromosome Y HOR, however, the i and ii MNase cleavage sites are used equally (Fig. 7f), as opposed to the sharp asymmetry observed on the X HOR (Fig. 7b,c) or globally on the CENP-A nucleosome-derived fragments on α-satellite DNA from all chromosomes (Fig. 5e).
Regarding the fundamental unit of centromere specifying chromatin, we report nuclease digestion experiments that demonstrate a remarkable similarity in the behavior of octameric CENP-A-containing nucleosomes reconstituted with recombinant components and the form present at functional human centromeres. We conclude that the predominant form of CENP-A particles at functional centromeres is an octamer with loose terminal DNA based on several key findings: 1) the smallest CENP-A containing particle protects ~110 bp from MNase digestion, which is ~30–50 bp longer than what could be accommodated by tetrameric models, 2) three size classes of CENP-A particles all map to the same nucleosome positions on the complex DNA of neocentromeres, and 3) CENP-A nucleosomes at normal centromeres share the same apparent dyad axis positioning as their conventional counterparts containing H3 on the 171 bp α-satellite DNA repeat sequence.
Our findings do not exclude the possibility that a minor population of CENP-A-containing particles with special stoichiometry exists, nor do they exclude the possibility that other forms exist at particular steps during a cell cycle coupled program of CENP-A nucleosome maturation and propagation18. Mutation at the CENP-A–CENP-A interface abrogates CENP-A accumulation at centromeres42,43, suggesting a particle with two copies of CENP-A is required at least transiently in this program. Atomic force microscopy (AFM) measurements indicate that CENP-A-containing particles isolated from phases outside of S-phase are shorter than conventional nucleosomes, but are of similar height at S-phase22. These findings were interpreted as evidence for hemisomes as the predominant form through the majority of the cell cycle22. The use of AFM-based height measurements to differentiate between hemisomes and octameric nucleosomes from isolated CENP-A-containing particles may not be as straightforward as it originally seemed, since reconstituted, recombinant CENP-A-containing octameric nucleosomes are substantially shorter than their canonical counterparts containing conventional H344. Further, and to this point, in addition to the neocentromere-harboring cell lines derived from healthy tissue, our studies also include the same tumor-derived cell type as used in the AFM study22, HeLa (Supplementary Fig. 2e,f). Under our culturing conditions ~70% of the HeLa cell population is outside of S-phase (Supplementary Fig. 2f). We observe DNA fragment lengths consistent with octameric CENP-A nucleosomes in HeLa (Supplementary Fig. 2e) with no evidence of the biphasic behavior predicted by a model where there are long periods of the cell cycle where CENP-A forms radically different particles (e.g. a hemisome and octameric nucleosome switching model22).
Therefore, since sub-octameric forms are not highly populated in the genome, we conclude that such minor species would be present at very low levels or only very transiently during the cell cycle.
Our findings also uncovered remarkable coupling of the propensity of the CENP-A nucleosome to unwrap its terminal DNA with its strongly phased position within the 171 bp monomer unit of centromeric α-satellite DNA. We further conclude that CENP-B binding to the CENP-B box generates asymmetric unwrapping of CENP-A nucleosome terminal DNA. Nucleosomes, CENP-A-containing or bulk nucleosomes, are not positioned evenly between the sites of CENP-B boxes within α-satellite monomers. Rather, the site for the CENP-B box is immediately adjacent 5’ of the entry–exit site. Thus, this places the 3’ end of the CENP-B box very near to the nucleosome (Fig. 5e,f). CENP-B binding induces a ~60° bend in the DNA with the strongest kink induced 4 bp from the 3’ end of the CENP-B box45. We think it is very likely that this property of CENP-B contributes strongly to several chromatin features we observe on α-satellite DNA: 1) the general phasing observed for bulk nucleosomes, 2) the enhanced phasing we see for CENP-A nucleosomes, and 3) the asymmetric unwrapping of nucleosome terminal DNA that is exquisitely specific to CENP-A-containing nucleosomes that are bounded by CENP-B boxes. To the latter feature, it appears that CENP-A has evolved in a manner that is poised to have its nucleosomal termini unwrapped. It is enticing to speculate that the physical relationship between CENP-A, CENP-B, and α-satellite DNA is a product of co-evolution. Whether at established centromere locations of highly repetitive DNA or at new centromere locations lacking repeats, however, CENP-A marks centromere location as part of an octameric nucleosome with loose termini.
Tetrasomes, nucleosomes, and nucleosomal arrays were reconstituted from purified components using salt dialysis46. Briefly, human histones H3, H4, H2A, H2B were purified as monomers14 and mixed to form (H3–H4)2 tetramer and (H2A–H2B) dimer complexes14,47 while human (CENP-A–H4)2 was purified from a bi-cistronic vector as a tetramer17. The ‘601’ 1× 200 bp and ‘601’ 12× 200 bp DNA templates48,49 were both purified by anion exchange chromatography. The indicated histone complexes were combined with the DNA in 2 M NaCl and dialyzed in steps: 1) TE (10 mM Tris [pH 7.8], 0.25 mM EDTA) supplemented with 1 M NaCl, followed by 2) TE supplemented with 0.75 M NaCl, and lastly 3) TE supplemented with 2.5 mM NaCl. Tetrasomes, nucleosomes, or nucleosomal arrays were digested with 2 U/µg MNase (Roche) in the presence of 3 mM CaCl2 for 0.5 to 2 min. Each comparison shown between CENP-A-containing and H3-containing particles was performed in parallel under identical reaction conditions for the same length of time. Each reaction was quenched by addition of 10 µl of 0.5 M EGTA and Buffer QG (Qiagen) and placed on ice. The DNA was purified using a DNA purification kit (Qiagen) and subsequently analyzed by Agilent 2100 Bioanalyzer using the DNA 1000 kit.
For native ChIP, 2–5 × 10^7 cells were collected and resuspended in 2 ml of ice cold buffer I (0.32 M Sucrose, 60 mM KCl, 15 mM NaCl, 5 mM MgCl2, 0.1 mM EGTA, 15 mM Tris [pH 7.5], 0.5 mM DTT, 0.1 mM PMSF, 1:1000 protease inhibitor cocktail [Sigma]). 2 ml of ice cold buffer I supplemented with 0.1% IGEPAL was added, and the suspension was placed on ice for 10 min. The resulting 4 ml of nuclei was gently layered on top of 8 ml of ice cold buffer III (1.2 M Sucrose, 60 mM KCl, 15 mM NaCl, 5 mM MgCl2, 0.1 mM EGTA, 15 mM Tris [pH 7.5], 0.5 mM DTT, 0.1 mM PMSF, 1:1000 protease inhibitor cocktail) and centrifuged at 10,000 × g for 20 min at 4°C with no brake. Pelleted nuclei were resuspended in buffer A (0.34 M sucrose, 15 mM Hepes [pH 7.4], 15 mM NaCl, 60 mM KCl, 4 mM MgCl2, 1 mM DTT, 0.1 mM PMSF, 1:1000 protease inhibitor cocktail [Sigma]) to 400 ng/µl. MNase (Affymetrix) digestion reactions were carried out on 100 µg or more chromatin using 0.9–2.8 U/µg chromatin in buffer A supplemented with 3 mM CaCl2 for 10 min at 37°C. The reaction was quenched with 5 mM EGTA on ice and centrifuged at 13,500 × g for 10 min. The chromatin was resuspended in 10 mM EDTA [pH 8.0], 1 mM PMSF, 1:1000 protease inhibitor cocktail and rotated at 4°C for 2–4 h. The mixture was adjusted to 500 mM NaCl, allowed to rotate for another 45 min and then centrifuged at 13,500 × g for 10 min yielding nucleosomes in the supernatant. 100 µg or more of chromatin was diluted to 100 ng/µl with buffer B (20 mM Tris [pH 8.0], 5 mM EDTA, 500 mM NaCl, 0.2% Tween 20) and pre-cleared with 60 µl 50% protein G bead (GE Healthcare) slurry for 20 min at 4°C. 1–2 µg of the pre-cleared supernatant (bulk nucleosomes) was saved for further processing. To the remaining supernatant, antibody was added and rotated overnight at 4°C. Immunocomplexes were recovered by addition of 100 µl 50% protein G bead slurry followed by rotation at 4°C for 3 h. The beads were washed three times with buffer B, and once with buffer B without Tween.
For the input fraction, an equal volume of input recovery buffer (0.6 M NaCl, 20 mM EDTA, 20 mM Tris [pH 7.5], 1% SDS) and 1 µl of RNAse A (10 mg/ml) was added followed by incubation for 1 h at 37°C. 100 µg/ml Proteinase K (Roche) was then added and the sample was incubated for another 3 h at 37°C. For the ChIP fraction, 300 µl of ChIP recovery buffer (20 mM Tris [pH 7.5], 20 mM EDTA, 0.5% SDS, 500 µg/ml Proteinase K) was added directly to the beads and incubated for 3–4 h at 56°C. The resulting Proteinase K-treated samples were subjected to a phenol-chloroform extraction followed by purification using a Qiagen MinElute column. For crosslinked ChIP, 2–5 × 10^7 cells were processed with the SimpleChIP Enzymatic Chromatin IP Kit (Cell Signaling) using the manufacturer’s recommendations. Unamplified bulk nucleosomes or ChIP DNA was analyzed using the Agilent 2100 Bioanalyzer High Sensitivity Kit. The Bioanalyzer determines the quantity of DNA based on fluorescence intensity. Antibodies used for ChIP: mouse α-CENP-A monoclonal (15 µg, ab13939 (Abcam)); rabbit α-H3K9me3 polyclonal (10 µg, ab8898 (Abcam)); rabbit α-H3.3 polyclonal (17 µg, 09–838 (Millipore)).
Sequencing libraries were generated and barcoded for multiplexing according to Illumina recommendations with minor modifications. Briefly, 2–15 ng Input or ChIP DNA was end-repaired and A-tailed. Illumina TruSeq adaptors were ligated, libraries were size-selected to exclude polynucleosomes, and the libraries were PCR-amplified using Phusion polymerase. All steps in library preparation were carried out using NEB enzymes. Resulting libraries were submitted for 100 bp, paired-end Illumina sequencing on a HiSeq 2000 instrument.
Paired-end ChIP-Seq reads were aligned to the human genome build hg19 with Bowtie2 version 2.0.0 using paired-end mode. Reads were aligned using a seed length of 50 bp, and only the single best alignment per read with up to 2 mismatches was reported in the SAM file. The aligned mate pairs were joined in MATLAB using the ‘localalign’ function, which was used to determine the overlapping region between the two reads (requiring ≥95% overlap identity) (Supplementary Fig. 3a). Duplicate read removal was carried out using the ‘rmdup’ command in SAMtools. To create nucleosome occupancy maps at neocentromeres, all joined reads were aligned to the neocentromere, and the number of reads that aligned with 100% identity was plotted for each base pair along the neocentromere coordinate (Supplementary Fig. 3b). For analysis of α-satellite DNA, all joined reads were aligned to the dimerized α-satellite consensus sequence, and those reads aligning with ≥60% identity were chosen for further analysis (Supplementary Fig. 6a,b).
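The joining and occupancy steps described above can be illustrated with a minimal Python sketch. It is not the MATLAB code used in this study: the ‘localalign’ local alignment is replaced here by a simpler ungapped comparison of the 3′ end of read 1 with the reverse complement of read 2, and the function names (`join_mates`, `occupancy`), the `min_overlap` default, the longest-overlap-first search order, and the 0-based half-open fragment coordinates are all assumptions made for illustration. Only the ≥95% overlap identity threshold is taken from the text.

```python
# Illustrative sketch only; not the authors' MATLAB implementation.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_mates(read1, read2, min_overlap=10, min_identity=0.95):
    """Join a mate pair into one fragment sequence, or return None.

    Assumes the fragment is at least as long as each read, so that the
    3' end of read1 overlaps the 5' end of reverse-complemented read2.
    The overlap is ungapped (a simplification of a local alignment),
    and min_overlap must be at least 1.
    """
    mate = reverse_complement(read2)
    # Try the longest candidate overlap first (an assumed tie-break rule).
    for ov in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        matches = sum(a == b for a, b in zip(read1[-ov:], mate[:ov]))
        if matches / ov >= min_identity:
            return read1 + mate[ov:]
    return None


def occupancy(fragments, length):
    """Per-base fragment counts along a reference of the given length.

    `fragments` holds (start, end) pairs, 0-based and half-open, with
    0 <= start < end <= length. A difference array keeps this linear
    in the number of fragments plus the reference length.
    """
    diff = [0] * (length + 1)
    for start, end in fragments:
        diff[start] += 1
        diff[end] -= 1
    coverage, running = [], 0
    for delta in diff[:length]:
        running += delta
        coverage.append(running)
    return coverage
```

Because each joined read spans the full MNase-protected fragment, its length gives the fragment size directly, which is what allows occupancy maps to be built separately for different size classes.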
Paired-end ChIP-Seq reads were aligned to the chromosome X or chromosome Y higher-order repeat (HOR) with Bowtie2 version 2.0.0 using paired-end mode. Reads were aligned using a seed length of 50 bp, and only the single best alignment per read with 0 mismatches was reported in the SAM file. The 2.0 kb chromosome X HOR was described previously (ref. 39). The 5.8 kb chromosome Y HOR was determined by performing dot plot analysis on the annotated portion of the centromere on chromosome Y in the human genome build hg19.
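As a rough illustration of how a repeat period can be read off a self-comparison, the following Python sketch tallies the offsets between consecutive occurrences of identical k-mers, which corresponds to counting the dots on each diagonal of a self dot plot. It is an assumed stand-in, not the dot plot procedure used here; the function names and the default `k` are hypothetical. A relatively long k-mer is suggested on the assumption that exact matches between diverged neighbouring monomers then become rare, so that the HOR period rather than the monomer period dominates.

```python
# Illustrative sketch only; function names and defaults are hypothetical.
from collections import Counter, defaultdict


def kmer_offset_histogram(seq, k=24):
    """Count offsets between consecutive occurrences of each k-mer.

    Every exact k-mer match between positions i < j is a dot on the
    diagonal j - i of a self dot plot; only nearest occurrences are
    counted here to keep the tally small.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    offsets = Counter()
    for hits in positions.values():
        for first, second in zip(hits, hits[1:]):
            offsets[second - first] += 1
    return offsets


def dominant_period(seq, k=24):
    """Return the most frequent k-mer offset, or None if nothing repeats."""
    offsets = kmer_offset_histogram(seq, k)
    if not offsets:
        return None
    return offsets.most_common(1)[0][0]
```

On a synthetic array built by repeating a 30 bp unit four times, `dominant_period(unit * 4, k=8)` returns the unit length; on real centromeric sequence the result would need to be checked against the dot plot itself.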
Pairwise Pearson correlation coefficients between nucleosome occupancy maps of various size classes (and between randomly generated datasets) at the neocentromeres were determined using MATLAB. P-values were determined using Student’s t-test by transforming each correlation to a t-statistic with n − 2 degrees of freedom.
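For a correlation coefficient r computed from n paired values, the standard transformation is t = r·sqrt((n − 2)/(1 − r²)), and the two-sided P-value follows from the t distribution with n − 2 degrees of freedom. The Python sketch below shows that calculation using only the standard library; it is not the MATLAB code used in this study. The function names are hypothetical, a two-sided test is assumed (the text does not state the sidedness), and the t-distribution tail is evaluated through the regularized incomplete beta function with a textbook continued-fraction expansion.

```python
# Illustrative sketch only; a two-sided test is an assumption.
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (n >= 3)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # fails if either input is constant


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def correlation_p_value(r, n):
    """Two-sided P-value for correlation r from n pairs (n - 2 d.f.)."""
    df = n - 2
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt(df / (1.0 - r * r))
    return _betai(df / 2.0, 0.5, df / (df + t * t))
```

With occupancy maps spanning many base pairs, n is large and even modest correlations give very small P-values, which is why comparison against randomly generated datasets is informative.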
Molecular models were generated using PDB IDs 1KX5 and 1ZBB for the H3-containing particles and 3AN2 for CENP-A-containing particles. Models of tetrasomes and hemisomes with crossed DNA were generated using linker DNA from 1ZBB and minimized using CNS (refs. 50,51). The model of the CENP-A nucleosome core particle was generated using DNA from 1KX5. The point in space of DNA crossing was determined as the shortest distance along the projection angle of the DNA between entry and exit sites. All molecular structure figures were generated using PyMOL (www.pymol.org).
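One way to picture the DNA crossing point is as the closest approach of two straight lines in space, one projected from the DNA at the entry site and one from the exit site. The Python sketch below computes that closest approach. It is only an interpretation of the description above, not the procedure used for the models: treating each DNA arm as a straight line through a point along a direction vector is an assumption, and the function name and its return convention are hypothetical.

```python
# Illustrative sketch only; straight-line DNA arms are an assumption.
import math


def closest_approach(p1, d1, p2, d2):
    """Closest points between lines p1 + s*d1 and p2 + t*d2 in 3D.

    Returns (point_on_line_1, point_on_line_2, distance). For parallel
    lines the point on line 1 is taken to be p1 itself.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w0 = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:  # parallel or nearly parallel lines
        s, t = 0.0, e / c
    else:
        s = (b * e - c * d) / denom
        t = (a * e - b * d) / denom
    q1 = [p + s * v for p, v in zip(p1, d1)]
    q2 = [p + t * v for p, v in zip(p2, d2)]
    return q1, q2, math.dist(q1, q2)
```

The midpoint of the two returned points can serve as a single crossing coordinate, and the returned distance indicates how closely the two projected arms pass each other.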
We thank O. Jabado (Mt. Sinai, New York, NY, USA) for help with Illumina sequencing, T. Patel (University of Pennsylvania, Philadelphia, PA, USA [UPenn]) and B. Cole (UPenn) for advice on data analysis, D. Rogers (UPenn) and B. Gregory (UPenn) for advice, E. Bernstein (Mt. Sinai, New York, NY, USA) for mentoring and advising D.H., M. Lampson (UPenn) for comments on the manuscript, K. Luger (Colorado State University, Fort Collins, Colorado, USA), D. Cleveland (University of California, San Diego, La Jolla, California, USA), A. Straight (Stanford University, Stanford, California, USA), and D. Rhodes (MRC Laboratory of Molecular Biology, Cambridge, United Kingdom) for plasmids, A. Choo (Murdoch Children’s Research Institute, Victoria, Australia) for the cell line harboring the PD-NC4 chromosome, and R. Allshire (University of Edinburgh, Edinburgh, Scotland, UK) for discussing unpublished results. This work was supported by US National Institutes of Health research grant GM082989 (B.E.B.), a Career Award in the Biomedical Sciences from the Burroughs Wellcome Fund (B.E.B.), a Rita Allen Foundation Scholar Award (B.E.B.), a predoctoral fellowship from the American Heart Association (K.J.S.), and a postdoctoral fellowship from the American Cancer Society (N.S.). T.P. acknowledges support from US National Institutes of Health grant GM08275 (UPenn Structural Biology Training Grant).
The sequencing data have been deposited in the NCBI GEO database with accession number GSE44724.
Author Contributions
D.H. designed and performed experiments and analyzed data. K.J.S. and T.P. designed and performed experiments, developed new analytical tools, analyzed data, and wrote the manuscript. M.U.S. developed new analytical tools. N.S. analyzed data and modeled nucleosomes. A.A. performed experiments and provided technical advice on ChIP experiments. P.E.W. directed the project, designed experiments, and analyzed data. B.E.B. directed the project, designed experiments, analyzed data, and wrote the manuscript.