Nucleosomes are the basic unit of chromatin. Nucleosome positioning (NP) plays a key role in transcriptional regulation and other biological processes. To better understand NP we used MNase-seq to investigate changes that occur as human embryonic stem cells (hESCs) transition to nascent mesoderm and then to smooth muscle cells (SMCs). Compared to differentiated cell derivatives, nucleosome occupancy at promoters and other notable genic sites, such as exon/intron junctions and adjacent regions, in hESCs shows a stronger correlation with transcript abundance and is less influenced by sequence content. Upon hESC differentiation, genes being silenced, but not genes being activated, display a substantial change in nucleosome occupancy at their promoters. Genome-wide, we detected a shift of NP to regions of higher G+C content as hESCs differentiate to SMCs. Notably, genomic regions with higher nucleosome occupancy harbor twice as many G↔C changes but fewer than half A↔T changes, compared to regions with lower nucleosome occupancy. Finally, our analysis indicates that the hESC genome is not rearranged and has a sequence mutation rate resembling normal human genomes. Our study reveals another unique feature of hESC chromatin, and sheds light on the relationship between nucleosome occupancy and sequence G+C content.
Human embryonic stem cells (hESCs) are pluripotent and have the capacity to differentiate into all adult cell types.1,2 If cultured under appropriate conditions, pluripotent stem cells maintain their genomic integrity, ensuring fidelity in transmission of genetic information from generation to generation.3,4 These unique features (pluripotency and genomic stability) could be attributed to epigenetic mechanisms, which are critical in chromatin remodeling, cell fate specification, and cell identity establishment.5-15 Indeed, studies have shown that hESCs bear unique chromatin composition compared to somatic cells, including prevalent bivalent histone modifications and DNA hydroxymethylation.16-23 Additionally, changes in chromatin architecture, DNA methylation, and histone modification occur frequently and extensively throughout the course of differentiation.5-8,13,17,18,21,22,24
Nucleosomes are the basic unit of chromatin. The canonical nucleosome is composed of 147 bp of DNA wrapped around a histone octamer core consisting of 2 copies each of histone H2A, H2B, H3, and H4.25-27 Nucleosome positioning (NP) along the DNA sequence, the most fundamental structure of chromatin, provides an essential level of epigenetic mechanism regulating biological processes, such as transcription and DNA repair.28,29 NP is influenced by DNA sequence composition in a passive and ATP-independent manner, as well as by the cellular chromatin remodeling machinery in an active and ATP-dependent fashion.
Current epigenetic research on hESC and its differentiation has greatly focused on histone modifications and DNA methylation. This leaves NP,30-37 an equally fundamental epigenetic mechanism, relatively understudied when compared to histone modifications or DNA methylation. For example, the NIH Roadmap Epigenomics project (www.roadmapepigenomics.org) has generated DNA methylation and various histone modification data for numerous human and mouse cell lines and types, including hESCs and their differentiated products. However, very few of them have been subjected to NP analyses.38-40 Although the DNA accessibility of these cell lines/types has been investigated by DNase-seq studies,41 these data do not readily provide genome-wide NP information.42
To more comprehensively understand hESC chromatin and its changes during differentiation, we investigated NP in a well-defined differentiation system, WA09 hESCs → ISL1+ nascent mesoderm (INM) → smooth muscle cells (SMCs), via paired-end sequencing of micrococcal nuclease (MNase)-digested mononucleosomal DNA fragments (MNase-seq). We have been using this system to investigate epigenetic changes of individual genes and genomic loci during hESC differentiation for some time.
We conducted expression microarray experiments to characterize WA09-hESC → INM → SMC differentiation. Principal component analysis (PCA) of the microarray data indicates a clear separation among the 3 cell types (Fig. 1A), supporting homogeneity for each of them. We then investigated established markers characteristic of each cell type and observed the corresponding changes. These include: 1) Silencing of pluripotent markers SOX2, OCT4, and NANOG upon WA09-hESC differentiation; 2) High-level expression of INM markers ISL1 and HAND1 in INM; and 3) Significant activation of SMC markers, such as ACTA2, in SMCs (Table S1A).
These individual marker observations are further corroborated by global analyses. Specifically, at false discovery rate (FDR) of < 0.05, we found a total of 753 genes showing altered expression during the first differentiation stage of WA09-hESC → INM. Among them, while known and putative pluripotent marker genes (20 in total)43 are being silenced (P < 2.2e-16), genes related to development, extracellular matrix (ECM), and focal adhesion are being activated (P < 3.0e-11) (Fig. 1B and Table S1A-B). The 2nd stage of differentiation, INM → SMC, is however characterized by downregulation of cell adhesion and tight junction genes (P < 1.0e-04) (Fig. 1B and Table S1A-B), as well as by upregulation of genes that are consistent with SMC properties, including 29 genes that are associated with smooth muscle, actin, or calponin (P = 1.1e-07) (Table S1A).
To conduct MNase-seq, we treated the cells with MNase that yields >98% mononucleosomes in each cell type, gel-purified the mononucleosomal DNA band (Fig. S1), and sequenced from both ends. In total, we generated 205–226 million pairs of 90 × 90 bp end sequences per cell type (Table S2A). We placed over 94% of these pairs uniquely back onto the human reference genome properly (both ends on the same chromosome, with the right orientation and spanning a reasonable genomic distance) (Table S2A). This results in >11× coverage in both sequence and fragments of mononucleosomes for each cell type (Table S2A).
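As a rough check on the sequence coverage figure, the arithmetic can be sketched as follows. The genome size of 3.1 Gb is an assumed round value that is not stated in the text, the mapped fraction of 0.94 is the lower bound quoted above, and the function name is hypothetical.

```python
def fold_coverage(n_pairs, read_len, mapped_fraction, genome_size):
    """Sequence fold coverage contributed by uniquely mapped read pairs."""
    # Each pair contributes two reads of read_len bases.
    mapped_bases = n_pairs * mapped_fraction * 2 * read_len
    return mapped_bases / genome_size


# Assumption: approximate human genome size in bp (not given in the text).
GENOME_SIZE = 3.1e9

# 205-226 million pairs of 90 x 90 bp reads, at least 94% uniquely mapped.
low = fold_coverage(205e6, 90, 0.94, GENOME_SIZE)
high = fold_coverage(226e6, 90, 0.94, GENOME_SIZE)
print(f"sequence coverage: {low:.1f}x to {high:.1f}x")
```

Under these assumptions the estimate falls just above 11×, in line with the reported sequence coverage. Fragment coverage would additionally require the mapped fragment lengths, which are not used here.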
As a control, we also sequenced randomly sheared genomic DNA fragments of 150–200 bp of WA09-hESC. We achieved the same sequencing and mapping efficiency, reaching a 13X coverage in both sequence and fragments (Table S2A).
We investigated the relationship of NP with transcript abundance and the sequence G+C content at notable genic sites, including promoters, exon/intron junctions and flanking regions, as well as gene ends. We first sorted the genes into 6 groups based on their transcript abundance, with each group having microarray expression intensity of ≤100, 100–250, 250–500, 500–1000, 1000–3000 or ≥3000 (Fig. 2 and Table S2B). For promoters, which cover regions flanking the transcription start site (TSS), we observed a nucleosome depleted region (NDR) and positioned nucleosomes immediately upstream and downstream of the TSS (Fig. 2A and Fig. S2; Table S2C). The extent of the NDR strongly correlates with the transcript abundance level, with a Pearson correlation coefficient >0.7 for each cell type and of 0.84, the highest, for WA09-hESC (Table S2D). Meanwhile, we also noted a much weaker overall correlation between nucleosome occupancy and the G+C content of promoter sequences (Fig. 2A and Table S2E; see Table S2F for Pearson correlation coefficients at various promoter regions). These correlations also apply to exon/intron junctions and flanking regions (Fig. 2B–C). Lastly, an NDR was observed at the transcription termination site (TTS), primarily arising from the very AT-rich sequences there (Fig. 2D and Fig. S2). These findings are consistent with published studies.30,31,38,44 Moreover, our nucleosome occupancy maps closely resemble those published for hESCs (Fig. S3).38
WA09-hESC promoters have the most prominent NDRs among the 3 cell types. For example, the NDRs of the 6 gene expression groups shown in Fig. 2A are significantly larger in WA09-hESCs, when compared to either INM (P = 0.02) or SMC (P = 0.0004) (Table S2D). Meanwhile, no significant difference was observed for the NDRs between INM and SMC (P = 0.2) (Table S2D). Within the NDRs, nucleosome occupancy is actually in negative correlation with the sequence G+C content in WA09-hESCs, with the lowest correlation coefficient reaching −0.57 for WA09-hESC compared to −0.2 for INM and SMC (Table S2F).
Unlike promoters, the gene ends of WA09-hESC have significantly smaller NDRs, compared to INM (P = 0.046) or SMC (P = 0.02) (Fig. 2D and Fig. S2; Table S2D). This indicates the least influence of the AT-rich sequences at TTSs on nucleosome occupancy in WA09-hESC among the 3 cell types.
At exon/intron junctions, WA09-hESC consistently shows the strongest negative correlation between nucleosome occupancy and transcript abundance among the 3 cell types, with Pearson correlation coefficients approximately at −0.8 for WA09-hESC, −0.7 for INM, and −0.6 for SMC (Table S2G). The same conclusion applies to regions flanking the exon/intron junctions (Table S2G). Indeed, the 6 groups of genes shown in Fig. 2B-C, classified based on their transcript abundance, as previously described, are more evenly spaced according to their nucleosome occupancy strength in WA09-hESC. This differs from INM and especially SMC, where nucleosome occupancy divides the 6 gene groups into 3 aggregates, with each having nearly identical nucleosome occupancy level (Fig. 2B-C). The three aggregates roughly correspond to genes being lowly, moderately, or abundantly transcribed, with expression intensity of < 250, 250–500 and > 500, respectively (Fig. 2B–C).
In summary, the findings described above indicate that nucleosome occupancy at promoters and other notable genic regions (Fig. 2) is influenced more by transcript abundance but less by the sequence content in WA09-hESC, when compared to INM and SMC.
For a total of 118 genes being silenced upon differentiation (genes that are downregulated by >2-fold and have expression intensity of >500 in WA09-hESC but of <500 in both INM and SMC; see Table S3A), we observed a clear change in promoter nucleosome occupancy (Fig. 3A). This is shown by progressively diminishing NDR as WA09-hESC transition to INM and further to SMC (P < 0.001; Table S3B). For a total of 240 genes being activated upon differentiation (genes that are upregulated by ≥2-fold and have expression intensity of <500 in WA09-hESC but of >500 in both INM and SMC; see Table S3A), no such clear change was found (P = 0.2; see Table S3B), although the NDR is slightly more prominent in SMCs (Fig. 3A). No changes were detected for genes of which transcript abundance levels remain largely constant during differentiation (Fig. S4).
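The selection criteria for silenced and activated genes can be expressed as a simple filter. The sketch below is illustrative only: the function name is hypothetical, and applying the fold-change threshold against both INM and SMC is an assumption, since the text does not say which comparison the fold change refers to.

```python
def classify_gene(hesc, inm, smc, fold=2.0, cutoff=500.0):
    """Label a gene 'silenced', 'activated', or 'other' from expression intensities.

    hesc, inm, smc: microarray expression intensities in each cell type.
    """
    # Silenced: >500 in hESC, <500 in both derivatives, down by more than `fold`.
    if hesc > cutoff and inm < cutoff and smc < cutoff:
        # Assumption: the fold change must hold against both derivatives.
        if hesc > fold * max(inm, smc):
            return "silenced"
    # Activated: <500 in hESC, >500 in both derivatives, up by at least `fold`.
    if hesc < cutoff and inm > cutoff and smc > cutoff:
        if min(inm, smc) >= fold * hesc:
            return "activated"
    return "other"


print(classify_gene(1200, 300, 200))
print(classify_gene(200, 800, 900))
print(classify_gene(700, 700, 700))
```

Multiplying rather than dividing avoids a division by zero when an intensity is 0.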
Puzzled by this difference in promoter NP change between gene silencing and activation (Fig. 3A), we investigated the mRNA half-life of these genes. The rationale is that transcript abundance levels, measured here by microarray analysis, are determined by mRNA production, which is regulated by processes including promoter NP, as well as by degradation, which can be assessed with mRNA half-life. Because we did not find global determination of mRNA half-life of hESC in published literature and databases, we instead used values of their mouse counterparts.45 Interestingly, silenced genes have a significantly (P < 0.01) shorter mRNA half-life than activated genes on average (Fig. 3B and Table S3C).
To explore if the promoter NP change shown in Fig. 3A is specific to WA09-hESC differentiation, we investigated genes being silenced or activated during the second stage of the differentiation, INM → SMC. For silenced genes, although the overall nucleosome occupancy increases slightly at the promoter region (Fig. S5), the change at the NDR is not as visible as that of WA09-hESC differentiation (Fig. 3A). For activated genes, no clear NP changes were detected, similar to what is observed during WA09-hESC differentiation (Fig. 3A and S5).
We investigated a total of 5,118 putative active enhancers and 2,287 poised enhancers reported for WA09-hESC by a previous study.46 Both types of enhancers are marked by the presence of chromatin regulators p300 and BRG1, enrichment in histone modification H3K4me1, and low nucleosome occupancy.46 Moreover, while putative active enhancers are enriched in H3K27ac and are near active genes in hESCs, poised enhancers are enriched in H3K27me3 and are associated with genes that are inactive in hESCs but are involved in early embryogenesis.46 Consistent with these features,46 we found that both types of enhancers have the lowest nucleosome occupancy level in WA09-hESCs, but the highest nucleosome occupancy level in SMCs (Fig. 4A). We also examined a total of 7,006 enhancers reported for WA01-hESC.47 We concluded that these enhancers have the lowest nucleosome occupancy level in WA09-hESCs compared to their differentiated derivatives (Fig. 4B), consistent with published findings.38
We integrated our NP findings with published DNA methylation studies of WA09-hESC.13 As shown in Fig. 5A, only unmethylated promoters display a prominent NDR and, as expected, a large number of methylated genes are silent or lowly expressed. These observations are consistent with the notion that NP precedes DNA methylation in transcription silencing.48 We also incorporated the histone modification data of WA09-hESC from the NIH Roadmap Epigenomics Project. Indeed, promoters enriched with H3K4me3-only exhibit a NP pattern of actively transcribed genes, having a prominent NDR and well positioned nucleosomes respectively upstream and downstream of the TSS (Fig. 5B). Promoters enriched with H3K27me3-only display a NP pattern of silent genes, while promoters with both histone marks have the NP pattern of poised genes (Fig. 5B). These observations support the accuracy of our NP analysis.
Besides transcription, NP is also influenced by sequence composition. In fact, canonical nucleosomal core sequence consists of 147 bp with A/T (AA/AT/TA/TT) and G/C (GG/GC/CG/CC) dinucleotides oscillating at approximately every 10 bp, assisting the winding of the DNA molecule around the histone core.25,26,49 We hence analyzed the sequences of our mononucleosomal fragments of chosen length by studying the genomic regions onto which they were mapped (see Materials and Methods). We indeed observed a strong oscillation in sequence composition for mononucleosomal fragments of 147 bp (Fig. 6A). While not as strong, the oscillation pattern was also visible in mononucleosomal fragments of other lengths. This is especially so for those approximately 10–12 bp increment/decrement away from 147 bp, i.e., 135 bp, 157 bp, 169 bp, and 181 bp (Fig. 6B and Table S4). For mononucleosomal fragments of ≥ 157 bp, the core exhibiting the dinucleotide oscillation is flanked by G+C rich sequences (Fig. 6B and Table S4).
Within the same cell type, nearly identical A/T and G/C dinucleotide oscillation patterns were observed between autosomes and the X chromosome (Fig. S6 and Table S4). Among cell types, WA09-hESC resembles INM but slightly differs from SMC (Fig. 6 and Table S4). The oscillation signal is absent in randomly sheared genomic fragments of any length (Fig. 6A and Table S4).
Consistent with the association of nucleosome occupancy with sequences of higher G+C content,49-51 genomic regions where mononucleosomal fragments were mapped harbor more G and C bases, compared to flanking regions (Fig. 6B and Table S4). Mononucleosomal fragments are also G+C richer than randomly sheared genomic fragments (Fig. 7A). Interestingly, as mononucleosomal fragment length increases, the G+C content of the sequence rises, at each base position (Fig. 6B and Table S4) and in overall percentage (Fig. 7A). These observations were noted in all 3 cell types. Meanwhile, we also found a few differences among the cell types. While mononucleosomal sequences of SMC are about 2% G+C richer (Fig. 6B and 7A and Table S4), WA09-hESC and INM have more mononucleosomal fragments of longer length, with an average length of 165 bp for WA09-hESC and INM vs. 155 bp for SMC (Fig. 7A and Table S5A).
Two factors are currently proposed to explain the higher G+C content of nucleosomal DNA: nucleosome occupancy preference49 or mutation bias.50 In an attempt to test the mutation bias theory,50 we investigated sequence variations in genomic regions with high or low nucleosome occupancy in the 3 cell types. We first utilized sequences from both mononucleosomal fragments and randomly sheared fragments to identify genomic regions with higher or lower nucleosome occupancy in each cell type (see Materials and Methods). We then examined base substitutions (compared to the human reference genome) in both regions. In all 3 cell types, we found that genomic regions with higher nucleosome occupancy have twice as many C↔G mutations but fewer than half T↔A changes (Fig. 7B and Table S5B). This finding is consistent with the notion50 that higher G+C content of nucleosomal DNA arises from mutation bias rather than nucleosome occupancy preference.
With the sequences of randomly sheared genomic DNA fragments, we investigated potential structural and sequence variations in the WA09-hESC genome. By examining sequence read pair information, we detected neither translocations nor inversions. We found a comparable amount of copy number variations (CNVs) in the WA09-hESC genome of an individual of Middle East - East European ancestry52 and in a normal (non-diseased) genome of an individual of European ancestry (NA12892) that was sequenced to approximately the same coverage (11.8X). For point/oligo-base changes, we identified about 2.6 million single nucleotide polymorphisms (SNPs) when compared to the human reference genome (Table S5C), with base transitions (C↔T and G↔A) occurring at a higher frequency (~70%) compared to base transversions (~30%) (Fig. 8). These observations are comparable to findings from normal human genomes derived from blood samples, including our own analyses with NA12892 and an Asian genome sequenced to a similar coverage53 (Fig. 8 and Table S5C), as well as a published study.54 In summary, the WA09-hESC genome appears normal, with no large genomic changes, and with a sequence mutation rate as low as that identified in normal human genomes, consistent with other studies.3
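The transition/transversion split reported here follows from a standard classification: a substitution between two purines (A, G) or between two pyrimidines (C, T) is a transition, and any purine/pyrimidine exchange is a transversion. A minimal sketch of that tally is given below; the function names and the example SNP list are hypothetical and are not the data of this study.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def substitution_class(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_family = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_family else "transversion"


def ts_tv_fractions(snps):
    """Fraction of transitions and transversions in a list of (ref, alt) pairs."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snps)
    total = sum(counts.values())
    return {kind: counts[kind] / total for kind in ("transition", "transversion")}


# Hypothetical toy input, for illustration only.
example = [("C", "T"), ("G", "A"), ("A", "C"), ("T", "C")]
fractions = ts_tv_fractions(example)
print(fractions)
```

The same classifier covers the G↔C and A↔T changes discussed earlier, both of which are transversions.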
We performed MNase-seq to investigate NP in a well-defined hESC differentiation system, for which the identity and homogeneity of each cell type was supported by global and individual marker expression analyses. Furthermore, the hESCs under study have maintained their genomic stability and integrity, with no detected genomic rearrangements and with a sequence mutation rate comparable to that of normal human genomes derived from blood samples. Our study contributes to the understanding of the unique chromatin and sequence composition of nucleosomal DNA in hESCs.
Many of our NP findings concur with published studies,13,16,17,23,30,31,38,44 indicating the accuracy of our MNase-seq pipeline. For example, hESC enhancers display lower nucleosome occupancy in hESCs than in their differentiated derivatives,38,46 and NP at promoters agrees with published histone modification and DNA methylation patterns.13 Furthermore, only promoters with unmethylated CpGs exhibit a prominent NDR, consistent with the concept that NP precedes DNA methylation in gene silencing.48 Notably, similar to our histone and DNA methylation findings,16-23 our NP study indicates the uniqueness of hESC chromatin. At promoters and other notable genic sites, nucleosome occupancy shows a stronger correlation with transcript abundance but is less influenced by the sequence content in hESCs than in their differentiated products. We have also detected a shift of NP to G+C richer regions as hESCs differentiate to SMCs. Consistent with the in vivo and in vitro NP comparison study in C. elegans,55 our observations indicate that chromatin remodeling may be more active in hESCs, compared to their differentiated derivatives. Furthermore, our analyses reveal a dynamic NP in hESCs, which decreases progressively once differentiation starts. This is consistent with the observation that hESCs contain more genes in a poised chromatin state,16,23 to readily resume transcription or to be more irreversibly silenced upon differentiation.
An interesting finding from our study is that genes being silenced, but not genes being activated, show a visible change in nucleosome occupancy at the promoter upon hESC differentiation. This finding was also reported by another group38 and is perhaps related to the possibility that hESCs have the most active chromatin remodeling activity, which decreases as hESCs transition into other cell types, as previously discussed. Interestingly, silenced genes appear to have a shorter mRNA half-life on average, compared to activated genes. Function-wise, while silenced genes encode many known and putative pluripotent markers, activated genes are significantly enriched in ECM-associated groups. A recent study56 reports that during hESC differentiation, genes that have undergone A/B chromatin compartment switching and show correlated gene expression changes are mostly ECM-related and have low G+C content at their promoter. Studies in yeast reveal an association between a gene's mRNA half-life and its promoter sequence.57,58 Based on these publications and our observations, we hypothesize that, for silenced genes, which are enriched in functions maintaining pluripotency, expression decreases by increasing nucleosome occupancy and other repressive epigenetic modifications at the promoter, suppressing transcription, and by rapid decay of existing mRNA molecules. For activated genes, which are enriched in functions of ECM-building, expression increases via higher levels of chromatin change, e.g., B to A compartment switch. Further studies are needed to test this hypothesis.
Our genome-wide analyses identified NP-associated sequence features. These include A/T and G/C dinucleotide oscillation of canonical nucleosomal core sequence25,26 in our mononucleosomal fragments of each cell type. Notably, we found twice as many G↔C substitutions, but fewer than half as many A↔T substitutions, in genomic regions with higher nucleosome occupancy than in regions with lower nucleosome occupancy. This finding may explain why nucleosomal DNA has higher G+C content.49-51 This result is consistent with a recent study reporting that the higher G+C content of nucleosomal DNA arises from mutation bias rather than nucleosome occupancy preference.50 Frequent G↔C substitutions in cancer have been shown to be associated with over-activity of the APOBEC cytidine deaminases.59 We do not know if any link exists between APOBEC activity and frequent G↔C changes in genomic regions of higher nucleosome occupancy. One interesting observation is that, compared to WA09-hESC and INM, nucleosomal DNA of SMC is richer in G+C. Meanwhile, 3 APOBEC members, APOBEC3G, APOBEC3C, and APOBEC3F, are expressed 2–3-fold higher in SMC. Whether this is a coincidence or has functional implications remains to be determined. Finally, we caution that the observed high G+C content of nucleosomal DNA may partially arise from the digestion bias of MNase toward AT-rich sequences (although one study has reported that this bias is not substantial in their nucleosome mapping60), besides the strong intrinsic association between nucleosome occupancy and high G+C sequences.49-51 Therefore, our conclusions need to be validated in future studies that correct MNase sequence digestion bias.
WA09-hESCs were maintained in StemPro defined media (Invitrogen). Differentiation to INM and SMCs was achieved by supplementation of the defined media with Wnt3a (25 ng/ml) and BMP4 (50 ng/ml) for 4 and 21 days, respectively. RNA was purified from approximately 5 million cells per sample using the Qiagen RNeasy Plus Mini kit (Cat. No. 74134). Then, high quality (a 260/280 absorbance ratio of ~2.0, non-degraded, and free of genomic DNA contamination) samples were analyzed using the Affymetrix Human Gene 1.0 ST array with biological replicates. The moderated t-test implemented in the ‘limma’ package61 was used to identify differentially expressed genes between the cell types, and P-values were adjusted for multiple-hypothesis testing with the Benjamini and Hochberg method.62 PCA was performed using R (www.R-project.org). The packages used are available at the Bioconductor site (www.bioconductor.org).
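For readers unfamiliar with the Benjamini and Hochberg adjustment, the procedure can be sketched in a few lines. This is a plain Python illustration of the standard step-up method and is not the R code used in the study; the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original input order."""
    n = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P-values, for illustration only.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)
```

Genes whose adjusted value falls below the chosen FDR threshold (0.05 in this study) would then be called differentially expressed.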
Chromatin was processed by following a published protocol31 with MNase (Worthington Biochemical Corp.) to yield >95% mononucleosomal DNA (Fig. S1). DNA samples with a 260/280 absorbance ratio around 1.8 were used for downstream applications. DNA fragments of approximately 150 bp were gel purified as described,63 and sequenced from both ends to yield 90 × 90 bp sequence read pairs using Illumina Genome Analyzer at the BGI. As a control, genomic DNA was extracted following the same protocol but without MNase-digestion, randomly sheared to 100–200 bp fragments, and sequenced from both ends. All read pairs were then mapped to the human genome (hg18) using the Burrows-Wheeler Aligner (BWA) tool64 with the default parameters documented in the bwa-0.5.9 version. Read pairs uniquely placed onto the genome were used for further analyses.
The KnownGene annotation (hg18) downloaded from the UCSC genome database (genome.ucsc.edu) was used to match the genes of the Affymetrix Human Gene 1.0 ST array in genomic sequence coordinate. A total of 33,271 transcripts (17,592 genes) with more than 90% coordinate overlap were then chosen for nucleosome occupancy analysis. Nucleosome occupancy for base i in the genome was estimated from the total counts of mononucleosomal fragments (m) and randomly sheared genomic fragments (g) covering base i, together with the corresponding genome-wide averages of these counts. Data of randomly sheared genomic fragments of WA09-hESC were used in calculations of all 3 cell types, under the assumption that the genome remains the same during the 21 days of cell differentiation. The TSS, exon-intron/intron-exon junction, and gene end data were extracted from the UCSC KnownGene annotation. Average G+C content of each corresponding region was calculated based on the hg18 genome.
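One way to combine these quantities into a control-normalized occupancy score is a log2 ratio of the two coverage values, each divided by its genome-wide average. The sketch below assumes that form. The log2 transformation, the handling of zero-coverage bases, and the function name are all assumptions made for illustration, and the exact expression used in the study may differ.

```python
import math


def occupancy_scores(m_counts, g_counts):
    """Per-base occupancy as log2 of normalized MNase coverage over control coverage.

    m_counts: mononucleosomal fragment counts covering each base.
    g_counts: randomly sheared genomic fragment counts covering each base.
    """
    if not m_counts or len(m_counts) != len(g_counts):
        raise ValueError("inputs must be non-empty and of equal length")
    m_mean = sum(m_counts) / len(m_counts)
    g_mean = sum(g_counts) / len(g_counts)
    scores = []
    for m, g in zip(m_counts, g_counts):
        if m == 0 or g == 0:
            # Assumption: bases without coverage in either library are left undefined.
            scores.append(None)
        else:
            scores.append(math.log2((m / m_mean) / (g / g_mean)))
    return scores


# Hypothetical toy counts, for illustration only.
scores = occupancy_scores([2, 4, 6], [3, 3, 3])
print(scores)
```

Dividing by the control coverage corrects for regional differences in mappability and sequencing depth that affect both libraries alike.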
Bisulfite sequencing data of WA09-hESC were obtained from a published study.13 Both H3K4me3 (GSM605316) and H3K27me3 (GSM706066 and GSM667622) ChIP-seq reads of WA09-hESC were downloaded from the NIH Roadmap Epigenomics site (www.roadmapepigenomics.org), and mapped to the hg18 genome with BWA as previously described. Uniquely mapped reads were selected for further analyses.
Gene functional annotation and enrichment were analyzed by DAVID.65 Mouse mRNA half-life data were obtained from a published study.45 The mouse-human gene conversion was achieved using the Human and Mouse Ortholog file obtained from Mouse Genome Database (www.informatics.jax.org). As a result, mRNA half-life was assigned to a total of 13,578 human genes.
Mononucleosomal DNA sequences were determined as follows. For a mononucleosomal fragment, if its end sequence reads were both mapped perfectly and uniquely onto the human genome, the genomic sequence spanned by the reads would be its sequence. Then, the sequences of all such mapped mononucleosomal fragments of a chosen length were aligned, from which fractions of AA/AT/TA/TT and GG/GC/CG/CC dinucleotides at each base position from dyad were calculated. The same analysis was repeated with randomly sheared genomic fragments for control.
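The per-position dinucleotide fractions described above can be computed with a short routine such as the following sketch. It assumes that all fragments passed in have the same length, so that positions align and the dyad sits at the fragment center. The function name and the toy sequences are hypothetical.

```python
AT_DINUCS = {"AA", "AT", "TA", "TT"}
GC_DINUCS = {"GG", "GC", "CG", "CC"}


def dinucleotide_profile(fragments):
    """Fraction of A/T and G/C dinucleotides at each position of aligned fragments.

    Returns two lists indexed by dinucleotide start position (0-based from the
    fragment start); subtract (length - 1) / 2 to express positions from the dyad.
    """
    if not fragments:
        raise ValueError("no fragments given")
    length = len(fragments[0])
    if any(len(seq) != length for seq in fragments):
        raise ValueError("all fragments must have the same length")
    at_counts = [0] * (length - 1)
    gc_counts = [0] * (length - 1)
    for seq in fragments:
        seq = seq.upper()
        for pos in range(length - 1):
            dinuc = seq[pos:pos + 2]
            if dinuc in AT_DINUCS:
                at_counts[pos] += 1
            elif dinuc in GC_DINUCS:
                gc_counts[pos] += 1
    n = len(fragments)
    return [c / n for c in at_counts], [c / n for c in gc_counts]


# Hypothetical toy fragments, for illustration only.
at_frac, gc_frac = dinucleotide_profile(["AATGC", "TTAGC"])
print(at_frac, gc_frac)
```

Running the same routine on randomly sheared fragments of matching length gives the control profile against which the oscillation is judged.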
The sequences of randomly sheared genomic DNA of WA09-hESC, along with genomic sequences of an individual of European ancestry (NA12892) downloaded from www.1000genomes.org and genomic sequences of an Asian individual,53 were mapped to the hg18 genome with BWA. SNPs, compared to the hg18 genome, were identified using SAMtools66 and GATK.67 CNVs were identified as described,68-70 using the fragment density of the ith window (200 bp) of the test genome (t: either WA09-hESC or NA12892) and of the normalizing Asian genome (a), together with the corresponding genome-wide averages of these densities.
To identify genomic regions with higher or lower nucleosome occupancy, we analyzed , where and respectively represent the fragment density of window i of nucleosomal DNA (n) and randomly sheared genomic DNA (g), with and being their corresponding genome-wide average. Furthermore, we set for windows with: 1) and ; or 2) or . Then, we identified genomic regions that are amplified, considered as sites with higher nucleosome occupancy, or deleted, deemed sites with lower nucleosome occupancy, as described.68-70 Next, we examined sequence mutations in these regions as described above.
MNase-seq and microarray data have been deposited to the GEO database under the accession number GSE46467.
No potential conflicts of interest were disclosed.
We thank the Emory Biomarker Core for conducting the microarray experiments, as well as the BGI for the sequencing work.
The study was funded by the National Cancer Institute R01 CA182093 (to SZ) and the National Institute of General Medical Sciences GM085354 (to SD).