|Home | About | Journals | Submit | Contact Us | Français|
Although nucleosome remodeling is essential to transcriptional regulation in eukaryotes, little is known about its genome-wide behavior. Since a number of nucleosome positioning maps in vivo have been recently determined, we examined if their comparisons might be used for obtaining a genome-wide profile of nucleosome remodeling. Using seven yeast maps, the local variability of nucleosomes, measured by the entropy, was significantly higher in a set of reported unstable nucleosomes. The binding sites of four transcription factors, known as the remodeling factors, were distinctively high both in entropy and linker ratio, whereas those of Yhp1, their potential inhibitor, showed the lowest values in both of them. Taken together, our map shows the general information of nucleosome dynamics reasonably well. The “nucleosome dynamics” map provides the new significant correlation with the degree of expression variety instead of their intensity. Furthermore, the associations with gene function and histone modification were also discussed here.
The online version of this article (doi:10.1007/s00412-010-0264-y) contains supplementary material, which is available to authorized users.
The genome of eukaryotes takes the chromatin structure, which plays important roles in many cellular processes, such as transcription and replication. The unit of the chromatin structure is the nucleosome, which is composed of a histone octamer, and its positioning on DNA seems to be optimized for facilitating these cellular processes. For example, a number of researchers reported that the upstream region of transcription start sites (TSSs) of genes is statistically free of nucleosomes (Jiang and Pugh 2009a, b). This kind of studies are accelerated thanks to a series of recent releases of genome-wide nucleosome positioning maps in human (Ozsolak et al. 2007; Schones et al. 2008), fly (Mavrich et al. 2008a, b), nematode (Johnson et al. 2006; Valouev et al. 2008), Medaka fish (Sasaki et al. 2009), and budding yeast (Kaplan et al. 2009; Lee et al. 2007; Mavrich et al. 2008a, b; Whitehouse et al. 2007; Yuan et al. 2005).
The nucleosome positioning is not uniform temporally, either. This phenomenon is known as nucleosome remodeling, where ATP-dependent chromatin remodeling factors can change the nucleosome organization by inducing DNA superhelical torsion (Chandy et al. 2006; Havas et al. 2000). In yeast, several transcription factors, such as Abf1, Reb1, and Rap1, known as General Regulatory Factors (GRFs; Chasman et al. 1990), also perturb nucleosome positioning in the vicinity of their binding sites for activating neighboring regulatory sites (Fourel et al. 2002; Hartley and Madhani 2009). In the promoters of ribosomal protein, Rap1 is required for recruitment of Esa1 catalytic subunit of NuA4 H4 lysine acetyltransferase (Reid et al. 2000). Additionally, it is known that induction of chromatin remodeling factors is controlled by a variety of histone modifications. For example, a typical remodeling complex, Swi/Snf, requires histone acetylation by the Spt-Ada-Gcn5-acetyltransferase (SAGA) complex to bind to DNA and then displaces the acetylated histone (Chandy et al. 2006; Varga-Weisz 2001). Another remodeling complex, Mi-2, also contains histone deacetylases (Whitehouse et al. 1999). Therefore, it is of great interest to compare the changes between histone modification patterns and associated nucleosome dynamics.
To examine the chromatin remodeling globally, it is necessary to compare the change of nucleosome positions upon some stimuli. So far, while the positional changes of nucleosomes have been analyzed in several loci (Almer et al. 1986; Barbaric et al. 1992; Kent et al. 2001; Moreira and Holmberg 1998; Weiss and Simpson 1997), only a little is known about their genome-wide behavior. As a pioneering study, Shivaswamy et al. showed the depletion of nucleosomes in promoters induced by heat shock (Shivaswamy et al. 2008). However, their results are based on the comparison between only two conditions. Much more data are necessary to get the general genome-wide landscape. Jiang and Pugh developed a compiled reference map of nucleosome positions in Saccharomyces cerevisiae, using multiple maps independent of our study (Jiang and Pugh 2009a, b). However, detailed analyses of its “dynamic” positions have not been published yet.
In this study, we tested a simple hypothesis that the local positional variance of nucleosomes in a set of heterogeneous maps can be an indicator of local nucleosome remodeling. Seven yeast maps were used, and we show that our results are quite promising, that is, consistent with a number of previous observations, such as promoters, replication origins, and transcription factor binding sites (TFBSs) (Field et al. 2008; Hartley and Madhani 2009; Shivaswamy et al. 2008), and are useful for deriving novel but reasonable correlations between the change of nucleosome positioning and several features, such as gene function, expression variety, and histone modifications.
We combined seven different maps from four publications, which are based on two different experimental techniques: tiling arrays and next-generation sequencers (Supplementary Table 1). Since it is desirable to define all nucleosome positions based on a unified method, we modified a hidden Markov Model (HMM) that was first introduced by Yuan et al. with the signal gradient instead of its intensity (Yuan et al. 2005; Supplementary Fig. 1 and Materials and methods). In each map, more than 55,000 locations of nucleosomes, which cover 70–85% of the entire genome, were re-assigned (Fig. 1a and Table 1). The total count of detected nucleosomes in this study was about 3,000–4,000 higher than the original values in (Mavrich et al. 2008a, b; Whitehouse et al. 2007) and was closer to that in Lee et al. (2007). However, for more than 80% of the originally reported positions, newly assigned nucleosomes were positioned within a 30-bp range of the inter-center distance (Supplementary Fig. 2). Thus, we regarded that our re-definition of nucleosome positions is reasonable.
By comparing the relative positions of re-assigned nucleosomes with the above procedure, we can identify local regions where nucleosome positions are stable or unstable among the seven maps. As a measure of the positional variability of nucleosomes, we adopted the average of “entropy” value over a 100-bp window (see Materials and methods). This definition of entropy is based on Shannon's entropy, which is a standard measure for estimating the uncertainty of a given signal in Information Theory (Schneider 2010). While the entropy is high in the region where nucleosome positioning is variable between datasets (black rectangle in Fig. 1a), the entropy is low where nucleosomes are occupied or depleted in all datasets (white or gray rectangles in Fig. 1a).
To verify our “nucleosome dynamics” map, we collected experimentally verified examples of 21 stable and 13 unstable nucleosomes on five promoters from literature (Almer et al. 1986; Barbaric et al. 1992; Kent et al. 2001; Moreira and Holmberg 1998; Weiss and Simpson 1997) (Supplementary Table 2). The unstable nucleosomes show significantly higher entropy values than the stable ones (P=1.6e-3 from Wilcoxon test; Fig. 1b).
It is also important to know how much the variability of nucleosome locations is explained by the difference of experimental techniques. Thus, the mutual correlation of assigned positions in highly dynamic regions between different maps was examined using a cluster analysis (Supplementary Fig. 3). The data produced by the same authors tend to be clustered together. To confirm if this intra-laboratory correlation does significant harm or not, we further checked the effect of removing one or two of the data from the same Kaplan et al.'s laboratory. The Pearson's correlation coefficient (PCC) was calculated between the entropy values of each case and the original one along the whole genome. In all cases, high correlation was observed [PCC ≥0.936 (P<2.2e-16)], indicating that the differences of nucleosome locations were not always due to those of experimental techniques or authors. In the following sections, we will further validate the reliability of our map by comparing our observations with a variety of previous reports.
The averaged positions of nucleosomes around TSSs and replication origins have been studied in various organisms (Field et al. 2008; Johnson et al. 2006; Lee et al. 2007; Mavrich et al. 2008a,b; Ozsolak et al. 2007; Sasaki et al. 2009; Schones et al. 2008; Valouev et al. 2008; Whitehouse et al. 2007). Then, we took similar statistics from our map. In Saccharomyces Genome Database (SGD), the data of mRNAs/TSSs are classified into three groups depending on their reliability: verified, uncharacterized, and dubious ones (Supplementary Table 3). Since the lower entropy may indicate the stable open or closed nucleosome regions (gray or white rectangles in Fig. 1a, respectively), we added the linker ratio (the ratio of linker regions in a 100 bp window), which indicates the tendency of nucleosome depletion, to distinguish them in this analysis. In Fig. 2, the entropy values of these three groups as well as autonomous replication sequences (ARSs) are shown in black lines while the linker ratio values are shown as a heat map. Around the “verified” TSSs, a clear peak for both values was observed around the −100 bp region and a less significant negative peak just downstream of the TSS. This result can be interpreted that a highly unstable nucleosome exists around −100 bp from the TSS while another stable nucleosome is positioned just downstream of the TSS. Similar tendencies were also observed in “uncharacterized”, but not in “dubious”. Additionally, one unstable nucleosome was also observed at around +100 bp from ARSs. It is noteworthy that the less conspicuous the peaks around TSS become, the more questionable the data becomes. This makes sense and supports the plausibility of our data.
Since nucleosome remodeling has been reported to occur around the binding sites of several transcription factors (Shivaswamy et al. 2008), we next examined the positional variability of nucleosomes around conserved binding sites of 100 transcription factors (Harbison et al. 2004; MacIsaac et al. 2006; Supplementary Fig. 4). Interestingly, the correlation between TFBSs and the entropy/linker ratio was quite variable with transcription factors (Fig. 3a; typical examples are shown in Fig. 3b). For example, Abf1 and Reb1 show the highest linker ratios, implicating that the chromatin structure in their binding sites is relatively open. On the other hand, Yhp1 shows the lowest in both entropy and linker ratio, suggesting that its binding sites exist in closed and stable chromatin. As discussed later, many of those extremely positioned transcription factors are known or the relationship with chromatin remodeling is proposed (Fourel et al. 2002; Hartley and Madhani 2009), which strongly suggests that our “nucleosome dynamics” map is reasonably reliable. Furthermore, the binding sites of 83 out of 100 transcription factors are located in regions with significantly higher entropy than expected at random (Bonferroni-adjusted P<0.01; Supplementary Table 4). These results imply that most conserved binding sites are subject to chromatin remodeling.
From the above results, we concluded that the entropy/linker ratio values of the combined map are reliable indicators of (at least) the averaged dynamic status of chromatin. Hereafter, we report some of the new results derived from our data.
The first one is the distribution of “dynamic” regions along the whole genome (Fig. 4a). The histogram of the entropy value is shown in Supplementary Fig. 5a. The local regions with higher entropy values, which are interpreted as highly dynamic chromatin regions, are distributed relatively uniformly in each chromosome. This distribution is slightly negatively correlated with the GC content (PCC=−0.243), but shows no clear correlation with the transcriptome map by Nagalakshmi et al. (PCC=−0.039; Nagalakshmi et al. 2008).
Next, the difference of “nucleosome dynamics” between intragenic regions (i.e., exons plus introns) and intergenic regions was examined. The entropy in intergenic regions was significantly higher than in intragenic regions (P<1.0e-15 from the Wilcoxon test; Fig. 4b). Within the intragenic regions, the entropy in introns was significantly higher than in exons (P<1.0e-15), but was lower than in the intergenic regions (P<1.0e-15). Like entropy, the linker ratio in exons was significantly lower than in introns and in intergenic regions (P<1.0e-15 in both regions; Fig. 4c). These results suggest that nucleosomes are stably and densely located within exonic regions.
Since chromatin remodeling is involved in transcriptional regulation, it is of interest to study the correlation between our data and systematic gene expression data. First, we plotted 5,089 genes based on the entropy and the linker ratio, averaged over their upstream 400-bp region (Fig. 2). Roughly speaking, these two values correlate with each other, but there exists a group of exceptional genes that have larger linker ratios than expected. Second, we classified these 5,089 genes into four clusters by Ward's method and tried to characterize each of them (clusters 1 to 4 in Fig. 5a and Supplementary Fig. 6). Each cluster contained 1,142, 788, 1,289, and 1,870 genes, respectively (Supplementary Table 5). Since the entropy and the linker ratio in cluster 1 were the highest, this cluster was interpreted as having “dynamic and open” promoters. On the other hand, cluster 2 showed the lowest entropy and linker ratio, characterized as “stable and closed”. As noted above, cluster 3 is exceptional because it has lower entropy but higher linker ratio. This cluster of genes is expected to have “stable and open” promoters.
Next, using 173 DNA microarrays for which time-scale expression changes under 16 different stresses have been measured (Gasch et al. 2000), we calculated the standard deviation of log2(ratio) for each gene as a measure of expression variety and examined this value among the clusters (Fig. 5b). Cluster 1, the “dynamic and open” cluster, shows the largest expression variety (Bonferroni-adjusted P<3.0e-3 in all comparisons from Wilcoxon test). On the other hand, cluster 3, the “stable and open” cluster, shows the smallest expression variety (Bonferroni-adjusted P<1.5e-12 in all comparisons). For a confirmation, we did an opposite analysis: we classified the genes into 20 clusters based on the degree of their expression variety. As shown in Fig. 5c, clusters with larger variety tend to have higher entropy values while clusters with smaller variety show lower values, an observation which supports our claim that “dynamic” promoters often regulate genes with larger expression variety. We also tested the correlation with the gene expression intensity (interpreted as equivalent with the mRNA amount) by comparing the counts of cDNA tags mapped on the intragenic regions among the four clusters (Fig. 5d; Nagalakshmi et al. 2008). Although the expression level of cluster 1 was slightly higher than cluster 2 (Bonferroni-adjusted P=8.1e-4), there were no clear differences. Thus, these results support the idea that chromatin remodeling is linked to the degree of expression change but not to its expression level.
For each of the above-mentioned four clusters, we checked whether there are any over-represented Gene Ontology (GO) terms (Table 2). In cluster 1, terms such as “plasma membrane” and “cell wall” were over-represented. Cluster 2 showed the enrichment of terms such as “ribosome”, “structural molecule activity”, and “microtubule organizing center”. Cluster 3 was enriched with terms such as “DNA metabolic process”, “chromosome”, and “nucleus”. To confirm the stability of this GO analysis, we clustered the genes into three classes in a different way. [First, divide the data into two, using the average entropy value as a border; then, the genes with lower entropy values were further classified by the average linker ratio value (Supplementary Fig. 7)]. With these three clusters, we confirmed that the result of their GO analysis gave the same tendency (Supplementary Table 6). Although they are just a statistical tendency, these results may imply that the initial changes of gene expression in response to various outside signals tend to be regulated by chromatin remodeling (See Discussion).
To clarify the correlation between the variability of nucleosome positioning and the histone modification pattern, we used two datasets: ChIP-on-chip data (Pokholok et al. 2005) and tiling array data (Liu et al. 2005).
From the comparison with the ChIP–chip data, we found significant differences in entropy for five out of eight modifications (Fig. 6). In monomethylation of H3 lysine 4 (H3K4me1) and trimethylation of H3 lysine 36 (H3K36me3), the entropies in hyper-modified sites were significantly higher than those in hypo-modified sites (P<1.3e-7 from Wilcoxon test). On the other hand, hyper-acetylation of H3 lysine 14 (H3K14ac) and H4 (H4ac) as well as trimethylation of H3 lysine 4 (H3K4me3) showed significantly lower entropies than their hypo-modifications (P<2.6e-6). For the linker ratio, small differences were observed in H3K36me3, H3K4me3, and H3K14ac (P<5.0e-3; Supplementary Fig. 8).
Similarly, in the comparison with the tiling array data, hyper-H3K4me1 showed significantly higher entropies while hyper-H3K4me3 and H3K14ac showed significantly lower entropies (P<1.0e-04 from Wilcoxon test; Supplementary Fig. 9). Note that trimethylation of H3 lysine 79 (H3K79me3), H3K36me3, and H4ac were not included in this tiling array data. The linker ratio was not significantly different between hyper- and hypo-modifications in all of the five modifications (P>0.01). Thus, at least the consensus of these two results (positive correlation of H3K4me1 and negative correlation of H3K4me3 and H3K14ac) seems to be reliable.
To examine if there are any tight links between any of the four clusters of genes and the status of histone modification in each promoter is also interesting. Cluster 3 was clearly enriched with H3K4me3 in the upstream region (Fig. 7). The levels of H3K9ac and H3K14ac in stable chromatin clusters (clusters 2 and 3) were modestly higher, too. On the other hand, H3K4me1 and H3K4me2 did not show a large difference among the four clusters. These results suggest that H3K4me3 modification is required to maintain stably open chromatin in the upstream regions.
In this study, we tested our hypothesis that a compendium of heterogeneous nucleosome maps should give us the local information of nucleosome instability, in other words, nucleosome dynamics. There are two kinds of raw data: the tag counts of next-generation sequencers and the hybridization signals in tiling arrays. They differ both in signal intensities and in resolutions. To make objective alignments of nucleosome positions, it is desirable to use a common method to assign these positions. Therefore, we modified the HMM method by Yuan et al. so that signal gradients, instead of intensity, are used as inputs (Yuan et al. 2005). Another modification was to add two self-loops to nucleosome states to make the model flexible enough to deal with noisy data.
The comparative analysis of the entropy value between known stable nucleosomes and known unstable nucleosomes showed the accuracy of our “nucleosome dynamics” map (Fig. 1b). Although there was a statistically significant difference between the two groups, the distinction was not perfect. It can be interpreted that our map may not be accurate enough to assess the situation of individual nucleosome accurately, but its statistical analysis is meaningful. Then, we analyzed the averaged distribution of the two indicator values (the entropy and the linker ratio) around TSS with three groups of mRNA data and ARSs (Fig. 2). There was a sharp peak of open and dynamic chromatin at around the −100 bp position, and this open region extends to around the −400 bp position from the TSS. A similar peak was also observed within ARSs. On the contrary, there was a weak tendency of dense and stable chromatin just downstream of TSS. Notably, these observations are consistent with our current knowledge of chromatin remodeling in promoters (Shivaswamy et al. 2008) and in replication origins (Field et al. 2008). The observed tendency becomes weaker as the reliability of the data becomes less.
Furthermore, our data indicate that most of the transcription factor binding sites are dynamic and open, more or less. Impressively, GRFs showed especially large entropy/linker ratio values (Fig. 3a). This is consistent with previous studies proposing that GRFs may have chromatin remodeling activity (Fourel et al. 2002; Hartley and Madhani 2009). Our observation that Abf1 and Reb1 show similar values, but are a little apart from Rap1, is also consistent with a previous report (Kaplan et al. 2009; Yarragudi et al. 2004). Mcm1, which is also located near GRFs in our plot, is implicated to be involved in chromatin remodeling (Chang et al. 2003). The fact that Mcm1 regulates the expression of a number of replication initiation factors may be related to our observation that there is a peak of nucleosome dynamics around ARSs (Fig. 2). On the contrary, Yhp1, which can bind to Mcm1 and the sites adjacent to Mcm1-binding sites, was observed in the stable and closed chromatin regions. Pramila et al. previously showed that Yhp1 protein represses cell-cycle-regulated genes with Mcm1 (Pramila et al. 2002). Thus, Yhp1 may restrict the activity of chromatin remodeling of Mcm1. Moreover, the two transcription activators, Ime1 and Fhl1, are also associated with the regulators of chromatin structure. The former factor activates meiosis-specific genes by recruiting with RSC chromatin remodeling complex (Inai et al. 2007); the latter is involved in the activation of ribosomal protein genes with Rap1 and is proposed to mediate the recruitment of Esa1 (Schawalder et al. 2004; Wade et al. 2004).
These results suggest that our map is reliable enough to predict the degree of chromatin remodeling around the binding sites of individual transcription factor. In addition, significant differences of the entropy and the linker ratio between inside and outside of genes (Fig. 4b, c) are consistent with the fact that TFBSs are enriched in intergenic regions, which contain promoters and enhancers. Recent reports suggesting that nucleosome positioning within exon regions may function as an exon–intron marker also support our results (Schwartz et al. 2009; Tilgner et al. 2009).
The correlation between nucleosome positioning and gene expression has often been reported. For example, Lee et al. showed that average nucleosome occupancy in promoters with high expression intensity is significantly lower than that with low intensity using a single nucleosome map (Lee et al. 2007). Our results give a little different interpretation from an additional dimension: based on the entropy value that is derived from the comparison of multiple maps, we further classified the “open” chromatin status, which is characterized by higher linker ratio values, into “dynamic” (higher entropy) and “stable” (lower entropy) statuses (Fig. 5a). As an example, we mapped the four clusters onto the promoter of genes involved in a typical stress-responsive pathway, cAMP-protein kinase A (Aguilera et al. 2007). There is a clear tendency that upstream genes have the “dynamic and open” promoters while downstream ones have “stable” promoters except a few exceptions, such as Msn4 (Supplementary Fig. 10). Additionally, while genes involved in nucleotide metabolism, promoters of which are enriched in cluster 3, are reported to show little changes in expression under various winemaking conditions too (Varela et al. 2005), promoters of upstream genes on another stress-responsible pathway, protein kinase C pathway, such as Rho1 and Mid2, are enriched in cluster 1. Taken together, we conclude that our clustering results seem to reflect the difference between genes that are responsive to various external stimuli and ones that are constantly expressed. Furthermore, the difference between the “open” and “closed” statuses was not significantly correlated with the expression intensity. It is an important observation that is only obtainable using multiple maps.
Although the interpretation of various histone modification data is complicated, the results obtained from two independent datasets (Liu et al. 2005; Pokholok et al. 2005) were basically consistent, further supporting our “nucleosome dynamics” map. Moreover, we observed that the contribution of each histone modification to nucleosome dynamics is different. H3K4me3, H3K14ac, and H4K9ac, which are known to be rich in the 5′ end of active genes (Barski et al. 2007; Heintzman et al. 2007; Li et al. 2007; Liu et al. 2005; Pokholok et al. 2005; Sung and Amasino 2006; Wang et al. 2008), were enriched in stable chromatin promoters in our study. Furthermore, H3K4me3 showed a clear difference in the upstream regions of genes in cluster 3 (Fig. 7). Recently, Vermeulen et al. showed that H3K4me3, but not the other two acetylations, is specifically associated with TFIID complex (Vermeulen et al. 2007). They also showed that H3K9ac and H3K14ac have a potential to enhance TFIID interaction with H3K4me3. Their associations with remodeling factors also have been presented in several studies (Kuo et al. 1998; Wysocka et al. 2006; Zhang et al. 1998). On the other hand, the H3K4me1-rich regions showed significantly higher entropy values (Fig. 6). Since this modification is reported to be associated with enhancer activity (Heintzman et al. 2007), it is possible that the modification modulates the enhancer activity with the change of nucleosome positioning.
From the above results, we conclude that the integration of various maps can show general features of nucleosome dynamics. Using our “nucleosome dynamics” map, a number of novel observations are made. We hope that some of them will be experimentally verified in the future. For example, the order of nucleosome dynamics was: “intergenic regions”>“introns”>“exons”. The degree of nucleosome instability correlates well with their degree of expression variety but not their intensity. The genes whose TSS region is highly dynamic and open tend to encode proteins that can sense extracellular conditions while the genes whose TSS regions are stably open tend to encode nuclear proteins. In addition to the GRFs, there are additional transcription factors whose binding sites exist at dynamically open regions: Leu3, Ime1, Rds1, and Fhl1, two of which have been suggested to be associated with chromatin remodeling.
It is evident that our methodology is quite intuitive, and there is much room for further improvement. Nevertheless, this methodology is promising in clarifying various aspects of epigenetic effects in cellular processes such as gene expression.
The in vivo nucleosome maps were obtained from four articles (Supplementary Table 1; Kaplan et al. 2009; Lee et al. 2007; Mavrich et al. 2008a, b; Whitehouse et al. 2007). The genome sequences, the gene coordinates, and the GO Slim data of S. cerevisiae were downloaded from the SGD (http://www.yeastgenome.org/). Yeast cDNA tags generated by random primers (GSE11209) were obtained from the Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo/; Nagalakshmi et al. 2008). The microarray data of time-scale expression changes were obtained from Stanford Genomic Resources (http://www-genome.stanford.edu/yeast_stress/; Gasch et al. 2000). The in vivo binding sites of yeast transcription factors, with the criteria of p value cutoff 0.005 and no conservation, were downloaded from the website of Fraenkel's laboratory (http://fraenkel.mit.edu/; Harbison et al. 2004; MacIsaac et al. 2006). The ChIP-on-chip data for histone acetylations and methylations were obtained from the website of Young's laboratory (http://inside.wi.mit.edu/young/pub/download.html; Pokholok et al. 2005). The tiling array dataset of histone modifications was downloaded from the supporting information of (Liu et al. 2005). The data of stable and unstable nucleosomes was collected from five articles (Almer et al. 1986; Barbaric et al. 1992; Kent et al. 2001; Moreira and Holmberg 1998; Weiss and Simpson 1997). In each article, nucleosomes, which were depleted or largely moved in another condition, were defined as unstable nucleosomes (Supplementary Table 2).
Since the downloaded nucleosome maps were heterogeneous in both their experimental techniques and the algorithms for nucleosome position assignment (Supplementary Table 1), we modified the HMM algorithm by Yuan et al. to be applicable to both the next-generation sequencer and the tiling array data with various resolutions (Yuan et al. 2005):
Since only 36-bp from 5′ end is read for each mononucleosomal DNA fragment by Illumina sequencer, we need to estimate the position of corresponding nucleosome centers from the data. Basically, we followed the way of Jiang and Pugh (2009a, b) and added 73 bp, which is the half length of the nucleosomal DNA, to each position of the 5′ end as the estimated center. In the GS20 sequencer data, covering more than 100 bp of the nucleosomal DNA, the midpoint of tags is used as the nucleosome center. For each nucleosome center i, the counts of corresponding tags c(i) were recorded. Then, at each discrete position Ij (Ij=5+10j, j=0, 1, 2, ...) in the genome, the nucleosome signal s(Ij) is calculated with the following formula:
where w(i, Ij) is the Gaussian distribution with mean Ij and standard deviation 20 as introduced in (Albert et al. 2007). The nucleosome signals were further normalized to the Z-scores. In the tiling array data, we used the log2(intensity) as the nucleosome signal.
Next, for each discrete position L (L=10, 20, 30, ...), the minimum range that encompasses it is defined as follows:
Note that the above formulae are applicable to both of the data. It is possible that m=n−1 where the tiling array is sparse.
Then, we define the gradient (more accurately, the sine) g(L) as:
g(L) takes the values from −1 to +1. The value “10” is a scaling coefficient. In the linker region, the gradient signals are almost zero, whereas the signals are positive/negative on the 5′/3′ side of the nucleosome region, respectively (Supplementary Fig. 1a).
Our HMM contains three types of hidden states, each of which outputs the gradient signal value: one linker node (L), seven nucleosome nodes on the 5′ side (5N1-5N7), and seven nucleosome nodes on the 3′ side (3N1-3N7; Supplementary Fig. 1b). In addition to one self-connecting loop on the L node for allowing various lengths of linker regions, two self-loops were added to the 5N7 and 3N7 nodes for the detection of wider peaks. The emission probability function in each node is represented by a Gaussian distribution N(μ,ρ), where μ and ρ are parameters that take the same value in all nodes for each state (Supplementary Fig. 1c). All model parameters were estimated from a sliding window of 100 consecutive positions by Baum-Welch algorithm using the gradient signal in chromosome 3 as input (therefore, the parameter values are different for each original map). The averaged parameters from all windows were used for the estimation of hidden states by Viterbi's algorithm.
Since several nucleosome signals were obtained from short DNA sequences, we removed abnormal regions, where the length of the linker region is more than 300 bp (Supplementary Fig. 11).
For control studies, the nucleosome positions were shuffled randomly. This shuffling was iterated ten times. The distance between each nucleosome reported in the original article and the nearest nucleosome detected by our method was used for evaluation.
In the highly dynamic regions, where the entropy is more than 0.8, the output of Viterbi algorithm was converted to binary code (1=nucleosome state and 0=linker state). Phi coefficient was calculated with each pair of nucleosome maps.
To evaluate the level of nucleosome positional variety, we introduced average Shannon entropy, calculated as follows:
where PN,i and PL,i (=1−PL,i) are the ratio of nucleosome states and linker states at relative position i, respectively (Schneider 2010). The entropy value goes toward zero if either state becomes dominant within the window (PN,i → 1 or PL,i → 1) while it takes the maximum value if the two states are equally observed (PN,i=PL,i=0.5). The entropy is calculated through the whole genome using moving windows. In this study, we used 100 bp as the window size W, with a sliding interval of 10 bp.
Additionally, we used the linker ratio, which is defined as the count of linker states within the window divided by the window size (=100 bp) and the number of the nucleosome maps (=7).
In Pokholok et al.'s data, which covers the whole genome, we used the average of the entropy and the linker ratio within the probe center ±250 bp for characterizing the chromatin state. In Liu et al.'s tiling array data, which covers the entire chromosome 3 and parts of the other chromosomes at 20-bp resolution, we used the values at the corresponding sites as they were. For the analysis of hyper- and hypo-modified sites, the 250 highest and the 250 lowest probes were used.
Among the data of 119 transcription factors downloaded from Fraenkel's website, we removed those of 19 factors because their binding sites were not observed in the “normal” regions more than five times. The entropy and the linker ratio at each position within ±50 bp from the binding sites were averaged over all binding sites for each transcription factor. As a control, we randomly picked up 100 sites and calculated the values in the same manner. This sampling was repeated 100 times. The significance of the distribution against the control for each transcription factor was calculated based on the two-dimensional Gaussian distribution.
About 15 million cDNA tags were mapped onto the yeast genome using the Maq software with the option “-n 3 -e 100” (http://maq.sourceforge.net/). Failed tags were further mapped by BLAT with the option “-trimHardA-minIdentity=85 -mask=lower” (Kent 2002). Then, the transcriptome map was constructed by counting the mapped tags within a sliding window of 100 bp with a 10-bp interval. Additionally, the count of tags mapped on the intragenic region of each gene was used as its expression intensity.
Using the average of the entropy and the linker ratio on the upstream 400 bp region from the TSS of “verified mRNAs” and “uncharacterized mRNAs”, we categorized these mRNAs (genes) into four clusters using Ward's method. In this clustering, we ignored 683 genes, whose promoters were overlapped with the above-mentioned abnormal regions. In each cluster, p values for 84 GO terms were calculated by the hyper-geometric test and were adjusted using Benjamini-Hochberg's method (false discovery rate <0.05).
Below is the link to the electronic supplementary material.
Detection method of nucleosome locations. a The gradients of signals. b The diagram of HMM. c The emission probability of each hidden state. A linker state is represented in pink. The 5′-end and the 3′-end nucleosome states are represented in green and blue, respectively. (GIF 17 kb)
Difference of our detected nucleosomes with ones reported in each original article. Red lines correspond to the relationship between the positions of nucleosomes re-defined by us, and blue lines show the relationship between the positions of nucleosomes randomly positioned and those originally reported (GIF 10 kb)
Similarity of nucleosome maps used in this study (GIF 3 kb)
Distribution of chromatin dynamics around the binding sites of 100 transcription factors. The solid and dashed lines represent the entropies and the linker ratios, respectively (GIF 74 kb)
Histogram of a the entropies and b the linker ratios (GIF 4 kb)
Gene clustering using the chromatin state in the promoter by Ward's method (GIF 5 kb)
Gene clustering by averages of entropy and linker ratio (GIF 10 kb)
Difference of the linker ratios between hyper- and hypo-modified sites with the dataset of Pokholok et al. The asterisk mark indicates significant difference (P<0.01 from Wilcoxon test; GIF 8 kb)
Differences of the entropies and the linker ratios between hyper- and hypo-modified sites with the dataset of Liu et al. The asterisk mark indicates significant difference (P<0.01 from Wilcoxon test; GIF 9 kb)
Mapping clusters of chromatin states onto cAMP-PKA pathway. Gray, red, green, and blue ovals indicate genes in cluster 1–4, respectively (GIF 21 kb)
Histogram of linker DNA region width (GIF 7 kb)
Summary of nucleosome maps used in this study (DOC 28 kb)
Collection of stable and unstable nucleosomes reported from five literatures (DOC 30 kb)
Category of genomic features used in this study (DOC 28 kb)
Statistical significance of chromatin dynamics in the binding site of each transcription factor (DOC 34 kb)
Genes in each cluster (DOC 1599 kb)
Over-represented GO terms in three clusters by another clustering method (DOC 44 kb)
We are grateful to Riu Yamashita, Alexis Vandenbon, and all other members of the Nakai–Kinoshita Laboratory for their valuable advice and discussions. We also thank Takashi Ito for valuable comments. Computational time was provided by the Super Computer System, Human Genome Center, Institute of Medical Science, University of Tokyo. This work was supported in part by Global COE Program (Center of Education and Research for Advanced Genome-Based Medicine), MEXT, Japan.
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.