The recent development of next-generation sequencing technology has enabled significant progress in chromatin structure analysis. Here, we review the experimental and bioinformatic approaches to studying nucleosome positioning and histone modification profiles on a genome scale using this technology. These studies advanced our knowledge of the nucleosome positioning patterns of both epigenetically modified and bulk nucleosomes and elucidated the role of such patterns in regulation of gene expression. The identification and analysis of large sets of nucleosome-bound DNA sequences allowed better understanding of the rules that govern nucleosome positioning in organisms of various complexity. We also discuss the existing challenges and prospects of using next-generation sequencing for nucleosome positioning analysis and outline the importance of such studies for the entire chromatin structure field.
Packaging of genomic DNA inside a eukaryotic cell is facilitated by the interaction between the DNA and a specific set of highly conserved architectural proteins called histones. A 147-bp DNA fragment wraps around a histone octamer containing two copies each of histones H2A, H2B, H3, and H4 to form a nucleosome core particle, and histone H1/H5 binds to the linker DNA connecting the cores (Figure 1). Placement of nucleosomes at specific locations can significantly affect transcriptional activity by hindering or facilitating binding of transcription factors to DNA [3-5]. Moreover, epigenetic marks such as covalent modifications of the histone tails or incorporation of specific histone variants are an essential part of transcriptional regulation. Therefore, mapping the dynamics of nucleosome positions and modifications of the histones on a genome-scale is crucial for understanding gene regulatory networks that underlie various biological processes both in normal development and disease.
Nucleosome mapping on a genome-scale can be performed by first fragmenting the chromatin extracted from a cell to preferentially release the DNA fragments associated with nucleosomes (Figure 2A). This is followed by identification of the exact locations of these fragments in the reference genome assembly, as described below. The two most common methods of chromatin fragmentation are enzymatic digestion with micrococcal nuclease (MNase) and sonication. Prior to chromatin fragmentation, histones can be fixed on DNA by reversible cross-linking. This procedure is required in the case of sonication, but the gentler action of MNase allows this step to be skipped, and ‘native’ chromatin is often used for nucleosome profiling. Chromatin immunoprecipitation (ChIP), a technique that enriches the DNA fragments associated with histones of a specific type, enables profiling of epigenetically modified nucleosomes when an antibody specific to the modification of interest is available.
Until recently, hybridization on tiling microarrays was the most commonly used approach for large-scale identification of the DNA fragments obtained from the chromatin. Array studies allowed identification of the depletion in nucleosome density at active promoters in yeast [8;9] and human, followed by a complete map of nucleosome positioning for the yeast and fly genomes. More recently, high-throughput sequencing-based approaches have become available. Developed in the past three years, this massively parallel sequencing currently yields either hundreds of millions of short reads (35-50 bp, at the 5'-ends of DNA fragments) or about a million longer reads (~100-500 bp, in most cases spanning the entire nucleosomal DNA) per run, at a much-reduced cost [13-15]. These techniques are collectively referred to as next-generation sequencing technology (NGS). Direct sequencing of the fragments rather than hybridization on a microarray platform results in a much improved spatial resolution of the nucleosome positioning maps. This has enabled significant advances in the field of chromatin structure analysis.
The nucleosomal datasets produced with the NGS platforms by August 2009 are summarized in Table 1. Both epigenetically modified nucleosomes and bulk nucleosomes that are not selected for any specific histone modification have been mapped in different organisms from yeast to human, and the list is growing rapidly due to the widespread availability of NGS. MNase digestion in combination with NGS (MNase-Seq) was used to profile bulk nucleosomes in the worm [16;17], yeast [18-20] and human genomes. ChIP followed by sequencing (ChIP-seq) was used, either with sonication or MNase digestion, to profile nucleosomes that contain a histone variant or epigenetic modification. Nucleosomes with the histone variant H2A.Z were profiled in yeast, fly, and human; a broad range of histone methylations and acetylations were profiled in human CD4+ T cells [23;24] using the ChIP-seq approach. Most recently, the genomic locations enriched for nucleosomes containing the histone variant H3.3 and those additionally containing the histone variant H2A.Z were identified in human HeLa cells.
Generally, characterization of nucleosome positioning comprises two aspects: (i) assessment of nucleosome occupancy or relative abundance of the nucleosomes of a specific epigenetic type to understand the overall impact of nucleosome positioning on the chromatin organization and function [17;19;23;25] and (ii) detection of stable positions of mono-nucleosomes in various genome regions to study the details of the regulatory pathways such as the interplay between nucleosome positioning and transcription factor binding [22;26;27] and to investigate the rules directing nucleosome positioning (discussed below). These aspects are not exclusive but rather complement each other, and NGS has proved to be useful for studying each of them.
The overall nucleosome occupancy can be assessed by directly counting the numbers of sequenced tags mapped to each position in the genome  or by using a more elaborate ‘tag extension’ approach . In the latter approach, each sequenced tag represents an independent nucleosome fragment and is ‘extended’ by the expected mono-nucleosome size towards its 3'-end on the strand to which it maps. The number of overlapping nucleosome fragments at each position in the genome is assumed to correspond to the nucleosome occupancy averaged over the cell population used in the sequencing experiment. No ‘tag extension’ is required for constructing such occupancy profiles if the entire nucleosome fragments are sequenced, as in the studies that use Roche 454 pyrosequencing [12;16;28].
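The ‘tag extension’ procedure described above can be illustrated with a minimal Python sketch. It is not the implementation used in the cited studies: the function name, the 0-based coordinates, the `(position, strand)` tag format, and the default fragment length of 147 bp are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def occupancy_by_tag_extension(
    tags: Iterable[Tuple[int, str]],
    genome_length: int,
    fragment_length: int = 147,  # assumed mono-nucleosome size
) -> List[int]:
    """Per-base occupancy from (five_prime_position, strand) tags.

    Positions are 0-based. A '+' tag is extended towards higher
    coordinates, a '-' tag towards lower coordinates, so that each
    tag covers fragment_length bases starting at its 5'-end.
    """
    # Difference array: +1 where a fragment starts, -1 just past its end.
    diff = [0] * (genome_length + 1)
    for pos, strand in tags:
        if strand == "+":
            start, end = pos, pos + fragment_length
        elif strand == "-":
            start, end = pos - fragment_length + 1, pos + 1
        else:
            raise ValueError(f"unknown strand: {strand!r}")
        # Clip fragments that run off either end of the sequence.
        start, end = max(start, 0), min(end, genome_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum of the difference array gives the fragment count per base.
    occupancy, running = [], 0
    for delta in diff[:genome_length]:
        running += delta
        occupancy.append(running)
    return occupancy
```

When whole fragments are sequenced, the same counting applies with the observed fragment ends in place of the extended ones.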
The profile of nucleosome occupancy could be used to determine stable nucleosome positions simply as peaks with sufficiently high tag counts [12;20;26]. However, the sequencing technology is not free of artifacts such as amplification bias, and this approach may result in a relatively large number of false positive calls of nucleosome positions (Figure 2B,C). A methodology for mapping of stable nucleosome positions in a more robust way has been developed based on the characteristic features in tag distribution on the DNA positive and negative strands [22;29;30]. As shown in Figure 2B, a specific pattern of tag density marks the sites that are protected against enzymatic digestion or other DNA shearing agents. This pattern comprises two peaks, one on each of the DNA strands, flanking the protected region. Scanning the tag distribution profile for such a signature pattern allows a reliable estimation of the stable nucleosome positions.
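The strand-specific signature can be searched for in several ways; the sketch below is one deliberately simple illustration and not the method of Refs. [22;29;30]. It requires a cluster of positive-strand 5'-ends at the left boundary and a cluster of negative-strand 5'-ends at the right boundary of a candidate protected region. The function names, the tolerance window, and the tag-count threshold are hypothetical choices.

```python
from itertools import accumulate
from typing import Iterable, List


def _prefix_counts(starts: Iterable[int], genome_length: int) -> List[int]:
    """prefix[i] is the number of 5'-ends at positions < i."""
    counts = [0] * genome_length
    for pos in starts:
        if 0 <= pos < genome_length:
            counts[pos] += 1
    return [0] + list(accumulate(counts))


def _window(prefix: List[int], center: int, tolerance: int) -> int:
    """Number of 5'-ends within +/- tolerance of center."""
    lo = max(center - tolerance, 0)
    hi = min(center + tolerance + 1, len(prefix) - 1)
    return prefix[hi] - prefix[lo] if lo < hi else 0


def call_stable_positions(
    plus_starts: Iterable[int],
    minus_starts: Iterable[int],
    genome_length: int,
    protected_length: int = 147,   # assumed size of the protected region
    tolerance: int = 5,            # assumed fuzziness of each boundary
    min_tags_per_strand: int = 3,  # assumed threshold
) -> List[int]:
    """Left boundaries of regions flanked by a '+' peak and a '-' peak."""
    plus = _prefix_counts(plus_starts, genome_length)
    minus = _prefix_counts(minus_starts, genome_length)

    candidates = []
    for left in range(genome_length - protected_length + 1):
        right = left + protected_length - 1
        upstream = _window(plus, left, tolerance)
        downstream = _window(minus, right, tolerance)
        # Both strands must support the call; a pile-up on one strand
        # only (e.g. an amplification artifact) is ignored.
        if upstream >= min_tags_per_strand and downstream >= min_tags_per_strand:
            candidates.append((upstream + downstream, left))

    # Greedy selection of the best-supported, non-overlapping positions.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    chosen: List[int] = []
    for _, left in candidates:
        if all(abs(left - other) >= protected_length for other in chosen):
            chosen.append(left)
    return sorted(chosen)
```

Requiring support from both strands is what distinguishes this from calling peaks on the occupancy profile alone. The greedy selection is a simplification: tied candidates are resolved by coordinate, so a call may be offset from the true boundary by up to the tolerance.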
This approach can also help in identification of the length of the DNA sequence protected by the histone core against fragmentation. This length represents the genome-wide mean for the nucleosome positions, averaged over the cell population. This information was used to show that the nucleosomes containing the H2A.Z variant are susceptible to asymmetric internal digestion, which is indicative of less tight DNA wrapping around the histone core. Concordantly, shortened DNA protection was also reported for the nucleosomes at the transcription start sites of fungal genes based on paired-end sequencing, which allows direct measurement of the DNA fragment lengths. Other types of nucleosomes, e.g. those that carry a specific variant of histone H3 (CenH3) and are associated with centromeric regions of chromatin, or those that carry the histone variant H2A.Bbd and are associated with active chromatin, were previously reported to organize less DNA than canonical nucleosomes [33;34]. With more sequencing data available for various genomes, especially for regions such as centromeres and heterochromatin in general, it should be possible to investigate local variability of nucleosomal DNA length. Accounting for such variability would be essential for correct reconstruction of the nucleosome occupancy profiles and for predicting the exact locations of the stable nucleosomes.
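One common way to estimate the mean protected length from single-end tags is a strand cross-correlation: find the shift between positive-strand and negative-strand 5'-ends that brings the two distributions into best agreement. The sketch below illustrates this idea under stated assumptions; it is not claimed to be the procedure used in the cited work, and the function name and the search range of 100-200 bp are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def estimate_protected_length(
    plus_starts: Iterable[int],
    minus_starts: Iterable[int],
    min_shift: int = 100,  # assumed lower bound of the search range
    max_shift: int = 200,  # assumed upper bound of the search range
) -> Optional[int]:
    """Estimate the protected DNA length from strand-specific 5'-ends.

    Returns the length (shift + 1, because both boundary bases belong to
    the fragment) that maximizes the number of '+'/'-' tag pairs separated
    by that shift, or None if no pair falls inside the search range.
    """
    plus = Counter(plus_starts)
    minus = Counter(minus_starts)

    best_shift, best_score = None, 0
    for shift in range(min_shift, max_shift + 1):
        # Cross-correlation of the two count profiles at this shift.
        score = sum(n * minus.get(pos + shift, 0) for pos, n in plus.items())
        if score > best_score:
            best_shift, best_score = shift, score

    return None if best_shift is None else best_shift + 1
```

Applying the function to tags from a restricted set of regions (for example, only H2A.Z-enriched loci) would give a local rather than genome-wide estimate, which is the kind of comparison discussed above.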
The nucleosome profiling on a genome-scale has already revealed specific patterns in placement of both bulk and epigenetically modified nucleosomes at the regulatory regions such as transcription start and end sites (reviewed in Refs. [35;36;37]). These studies have highlighted the importance of the combinatorial patterns of various histone modifications and variants and their interplay with other factors such as DNA structural flexibility, methylation status, binding of small RNAs and other regulatory proteins.
For instance, the broad regions of compact, mostly inactive chromatin called heterochromatin are enriched in the nucleosomes that are di- and tri-methylated at lysine 9 of histone H3. The deposition of this methylation was shown to be mediated by RNA interference in a number of organisms  and the presence of these epigenetic marks is essential for binding of such proteins as HP1 that silence chromatin. Another methylation mark, tri-methylation of histone H3 at lysine 27, is associated with Polycomb-assisted silencing of chromatin . The histone modification marks associated with active transcription are usually more localized and are focused at the regulatory and transcribed regions of the genome. The tri-methylation of histone H3 at lysine 36 is the signature of the actively transcribed regions which is the most pronounced at gene ends . The mono-methylation of histone H3 at lysine 4 combined with binding of the protein p300 denotes the distant-acting enhancers, which modulate transcription activity in a tissue specific manner [39-41].
Perhaps the most characterized genomic regions in terms of nucleosome positioning and histone modifications patterns are around transcription start sites (TSS) [12;16;18;20-23]. Tri-methylation of histone H3 at lysine 4 as well as histone acetylations and enrichment in the histone variants H2A.Z and H3.3 mark the TSS of active genes [12;22-24]. Many studies have shown that nucleosomes are arranged in a specific pattern at gene starts. This pattern comprises the nucleosome-free region at the TSS, flanked by the stably positioned nucleosomes (an example of such a pattern is shown in Figure 2B,C for human gene TIMM17A). A recent study  argues, however, that the nucleosome free region can actually be occupied to a considerable extent by the nucleosomes simultaneously containing the two histone variants H3.3 and H2A.Z. These nucleosomes are very labile and could be easily displaced by transcription factors. Thus, presence of such nucleosomes would not suppress transcription while it helps to keep the region free from the stable canonical nucleosomes. Future studies will test this hypothesis and further reveal the role of epigenetically modified nucleosomes in the regulation of gene expression and other biological processes.
Identification of large sets of the DNA sequences that wrap around the histone core in chromatin has led to an explosion in the number of studies focused on the principles that guide nucleosome positioning, in particular on the role of the DNA sequence itself in this process [16-18;20;22;30;42-47]. Previously known nucleosome positioning signals such as the 10-bp periodicity in the dinucleotide composition were confirmed in the studies on yeast and worm genomes [16;20;22;42;45]. At the same time, such a periodicity was shown to be less pronounced in fly and even further diminished in human [12;30]. In addition, a positioning signal associated with the increased GC-content of the DNA incorporated in the core particle as compared to linker DNA was shown to be present in various genomes [11;30;43;49].
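A periodicity of this kind can be checked on a set of aligned nucleosomal sequences by computing the positional frequency of selected dinucleotides and then measuring the spectral power of that profile at different periods. The following sketch assumes equal-length, pre-aligned sequences and uses the AA/TT/TA/AT group as an illustrative choice of dinucleotides; both function names are hypothetical.

```python
import cmath
import math
from typing import Dict, Iterable, List, Sequence

# Assumed dinucleotide group for illustration only.
WW_DINUCLEOTIDES = frozenset({"AA", "TT", "TA", "AT"})


def dinucleotide_profile(
    sequences: Sequence[str],
    dinucleotides: Iterable[str] = WW_DINUCLEOTIDES,
) -> List[float]:
    """Fraction of sequences carrying one of the dinucleotides at each position."""
    seqs = [s.upper() for s in sequences]
    if not seqs:
        raise ValueError("no sequences given")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and of equal length")
    wanted = set(dinucleotides)
    return [
        sum(1 for s in seqs if s[i:i + 2] in wanted) / len(seqs)
        for i in range(length - 1)
    ]


def periodicity_power(
    profile: Sequence[float],
    periods: Iterable[int] = range(5, 21),
) -> Dict[int, float]:
    """Spectral power of the mean-centered profile at each candidate period."""
    mean = sum(profile) / len(profile)
    centered = [v - mean for v in profile]
    power = {}
    for period in periods:
        # Single Fourier coefficient at frequency 1/period.
        coeff = sum(
            v * cmath.exp(-2j * math.pi * i / period)
            for i, v in enumerate(centered)
        )
        power[period] = abs(coeff) ** 2 / len(profile)
    return power
```

A pronounced maximum near a period of 10 bp would correspond to the signal reported for yeast and worm, while a flatter spectrum would be expected where the periodicity is diminished.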
The rules of rotational positioning deduced from both in vitro and in vivo studies appear to be very similar, with AT-rich stretches tending to occupy the locations on the histone surface where DNA is bent into the minor groove and GC-rich stretches tending to occupy the locations where DNA is bent into the major groove [20;42;50]. On the other hand, the overall ability of the DNA sequence to direct translational positioning of nucleosomes remains a subject of debate. In one study, the nucleosome occupancy profile observed in yeast in vivo was compared to the profile obtained for nucleosomes reconstituted on bare DNA in vitro. The two profiles showed a high level of correspondence, suggesting that DNA sequence plays an important role in determining chromatin structure, at least in yeast. However, a related recent study argues that the DNA sequence contributes mainly to nucleosome rotational positioning and to formation of the nucleosome-free regions in yeast promoters, e.g. through well-documented nucleosome exclusion by A-tracts (reviewed in Ref. ), and does not determine precise translational positioning of nucleosomes on genomic DNA. Furthermore, a study on worm chromatin showed that exact nucleosome positioning is not preserved across a cell population on a single-nucleotide scale.
Many factors other than DNA sequence influence nucleosome positioning. The arrays of positioned nucleosomes that are consistently observed at the transcription starts [12;22;23] can be a result of ‘statistical’ positioning caused by the presence of a ‘barrier’ in the form of a single stably positioned nucleosome or a bound transcription-related protein [44;46;53]. ATP-dependent chromatin remodeling complexes can be recruited to displace nucleosomes and form nucleosome-free regions at gene starts. Other proteins can contribute to nucleosome positioning as well. For example, binding of the insulator protein CTCF was demonstrated to facilitate nucleosome positioning in the human genome. At the same time, the DNA sequence could evolve to ‘imprint’ the active chromatin structure at the starts of the most actively transcribed genes. It remains to be determined which fraction of stable nucleosomes is positioned by signals encoded in genomic DNA, such as those discussed above, in the genomes of organisms of various complexity.
Interestingly, the epigenetically modified nucleosomes of different types from the human genome were shown to be preferentially associated with the DNA sequences characterized by distinct dinucleotide patterns . Clearly, the DNA sequence cannot be the sole determinant of the epigenetic state of a DNA locus. However, the sequence may reflect an increased probability for a DNA region to have a certain epigenetic state. For instance, the GC-rich sequences of CpG islands tend to be enriched for histone modifications associated with open chromatin. Furthermore, biophysical properties of the sequence may favor incorporation of particular nucleosome types. It was shown, for example, that the sequence organization of the DNA from human H2A.Z nucleosomes differs from the organization of the DNA associated with bulk nucleosomes . Such a sequence specificity of the H2A.Z-containing nucleosomes was suggested to facilitate nucleosome repositioning upon the histone variant deposition which would constitute an important mechanism for fine-tuning of the chromatin structure at regulatory regions.
The biggest obstacle in these studies so far has been the large amount of sequencing that is needed to profile bulk nucleosomes in a large genome. To profile human bulk nucleosomes, more than 140 million Illumina/Solexa tags were generated, but this was sufficient only to analyze the aggregate tag distributions averaged over large groups of genes. To understand nucleosome dynamics in development and differentiation, nucleosomes have to be mapped in a myriad of epigenomes. It may be possible to carry out a focused profiling of selected regions of the genome using tiling arrays as a first step to capture the appropriate fragments [56;57]; however, additional studies are required before such an approach becomes available to a broad community. The amount of sequencing necessary for epigenetically modified nucleosomes is generally smaller, but it still requires a substantial number of tags and high-quality antibodies. For most experimental laboratories, storing, handling and analyzing such large data sets pose perhaps the most difficult challenge.
The nucleosome profiling experiment has a number of internal biases, which need to be addressed (Figure 3A). An important issue is the salt concentration used for chromatin extraction. It was shown  that the experiments performed under low and high salt concentrations resulted in profiling of chromatin regions that have different physical properties, with the inactive chromatin regions being less soluble on average than the active regions (we note, however, that some fraction of active regions was reported to be present in the least soluble chromatin fraction). Using different salt conditions can be a tool for capturing epigenetic dynamics of chromatin as was demonstrated by studies of the distributions of histone variants H3.3 and H2A.Z in the fly and human genomes [25;59].
Estimated nucleosome positions can also be affected to a high degree by the procedure used for DNA fragmentation. In most studies, MNase digestion is used for this purpose (Table 1) because it produces nucleosome maps with near base-pair resolution and does not require cross-linking. However, this enzyme has pronounced sequence preferences [60;61], which may lead to bias in the nucleosome occupancy profile. Although several computational approaches have been suggested to compensate for such a bias, they do not address the issue completely [22;30]. Sonication also appears to have sequence-specific bias, which has not been comprehensively studied yet. As a result, different sequence organization may be revealed for the DNA fragments associated with the same type of nucleosomes depending on whether MNase digestion or sonication is used for chromatin fragmentation. The GC-content profiles for the sequences around the 5'-ends of nucleosome fragments obtained by MNase digestion or sonication of human chromatin are shown in Figure 3B. Not only do the two fragmentation techniques produce distinct sequence signatures at the site of the DNA breakage, but the overall nucleotide composition also differs in the adjacent DNA regions. We note that the experiments using the two fragmentation methods were carried out on cells of different types, CD4+ T-cells and HeLa cells, but the biases shown are likely to be present regardless. The issue clearly requires further investigation.
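A GC-content profile around fragment 5'-ends, of the kind shown in Figure 3B, can be computed by aligning all tags at their 5'-ends and averaging G/C occurrence at each offset. The sketch below is a minimal illustration rather than the procedure used to produce the figure; the function name, the 0-based `(position, strand)` tag format, and the default flank of 20 bp are assumptions.

```python
from typing import Iterable, List, Optional, Tuple


def gc_profile_around_ends(
    genome: str,
    tags: Iterable[Tuple[int, str]],
    flank: int = 20,  # assumed number of bases shown on each side
) -> List[Optional[float]]:
    """GC fraction at offsets -flank..+flank relative to fragment 5'-ends.

    Offsets are measured in the 5'->3' direction of each fragment, so
    negative offsets lie outside the fragment and positive offsets inside.
    Complementing a base does not change whether it is G/C, so only the
    direction of travel differs between strands.
    """
    gc = [0] * (2 * flank + 1)
    total = [0] * (2 * flank + 1)
    for pos, strand in tags:
        if strand not in ("+", "-"):
            raise ValueError(f"unknown strand: {strand!r}")
        step = 1 if strand == "+" else -1
        for offset in range(-flank, flank + 1):
            g = pos + step * offset
            if 0 <= g < len(genome):
                base = genome[g].upper()
                if base in "ACGT":  # skip N and other ambiguity codes
                    total[offset + flank] += 1
                    if base in "GC":
                        gc[offset + flank] += 1
    # None marks offsets that no tag could contribute to.
    return [g / t if t else None for g, t in zip(gc, total)]
```

Comparing the profile from MNase-digested and sonicated samples against the genome-wide GC fraction is one way to visualize the fragmentation-specific signatures discussed above.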
Another important issue is the GC-bias. GC-rich sequences tend to be over-represented in the final set of sequenced tags, while extremely AT-rich genomic regions may be poorly covered. Such a bias is explained, at least partly, by the DNA amplification that most sequencing platforms use during sample preparation (see Figures 2B,C and the related text for a discussion of the effect of the amplification bias on nucleosome detection). The recently developed Helicos platform does not require amplification and seems to be a promising technique for addressing the GC-bias. However, to date no published study has used this platform in a nucleosome positioning experiment.
In the next several years, a large amount of epigenetic data will be generated. Nucleosomes of various types are currently being profiled in several human cell lines in the ENCODE (Encyclopedia of DNA Elements) consortium project. A related effort, Model Organism ENCODE, is underway for fruit fly and worm. A new NIH initiative, the Roadmap Epigenomics Program, will promote the discovery of novel epigenetic marks in mammalian cells and is expected to advance our understanding of the epigenomics of human health and disease. Analysis and integration of the data generated in the ongoing and future epigenetic projects will require further development of the experimental and bioinformatic approaches, some of which are discussed above. The new data will facilitate direct comparison of the nucleosome positioning and histone modification profiles obtained for different cell types and developmental stages of the same organism, as well as the profiles of both bulk and epigenetically modified nucleosomes in organisms of various complexity. It is anticipated that these projects will provide novel mechanistic insights into nucleosome positioning and bring our understanding of the pathways of epigenetic regulation to a new level.
Biological role and rules of nucleosome positioning
Challenges and prospects of chromatin structure analysis with next-generation sequencing