More recently, methods for mapping nucleosomes on a genome-wide scale have been developed. The first approach to be described was the hybridisation of yeast nucleosomal DNA to microarrays carrying oligonucleotides representative of an entire yeast chromosome (35). The method has a resolution equivalent to that of indirect end-labelling, because the borders of the nucleosome cannot be defined. Microarray data are best understood as measurements of nucleosome density or occupancy, i.e., the relative probability of each oligonucleotide on the array being found in a nucleosome. Many genes exhibit a sinusoidal nucleosome density profile, with peaks interpreted as positioned nucleosomes and troughs as linkers; many other genes exhibit more complex patterns that are difficult to interpret (see below) (35).
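Because microarray data are occupancy measurements, reading peaks as positioned nucleosomes and troughs as linkers amounts to locating local maxima and minima in a density profile. The sketch below is illustrative only: it assumes evenly spaced probes (a hypothetical 20 bp step) and a synthetic, noise-free profile, and the function names `peaks_and_troughs` and `mean_spacing_bp` are our own. Real array data would need smoothing and a minimum peak height before such calls mean much.

```python
import math


def peaks_and_troughs(density):
    """Return (peaks, troughs): indices of strict local maxima and minima.

    Peaks are read as candidate positioned nucleosomes and troughs as linkers.
    The first and last probes are never called because they lack a neighbour.
    """
    peaks, troughs = [], []
    for i in range(1, len(density) - 1):
        left, mid, right = density[i - 1], density[i], density[i + 1]
        if mid > left and mid > right:
            peaks.append(i)
        elif mid < left and mid < right:
            troughs.append(i)
    return peaks, troughs


def mean_spacing_bp(peaks, probe_step_bp):
    """Average distance between successive peaks, converted from probes to bp."""
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to estimate spacing")
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return probe_step_bp * sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Synthetic sinusoidal profile: one period every 8 probes (160 bp at 20 bp per probe).
    density = [1 + math.cos(2 * math.pi * (i - 2) / 8) for i in range(24)]
    peaks, troughs = peaks_and_troughs(density)
    print(peaks, troughs)              # [2, 10, 18] [6, 14, 22]
    print(mean_spacing_bp(peaks, 20))  # 160.0
```

The mean peak spacing recovers the assumed 160 bp period, which is how a nucleosome repeat length would be read off a clean sinusoidal profile.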
The most important finding from the microarray studies is that the long-standing observation that active promoters are much less likely than coding regions to be nucleosomal is generally true. This is not to say that promoters are nucleosome-free, and this is a critical point. It is illustrated by many studies in which the accessibility of a restriction site in a promoter is measured in nuclei. The assay is based on the observation that the nucleosome affords essentially complete protection of a restriction site from digestion in vitro. It is possible for a restriction enzyme to cut a site within a nucleosome, particularly if the site is close to the edge, but high concentrations of enzyme are required (39). In most studies, digestion of a restriction site in nuclei reaches a plateau in the region of 50% (e.g., the yeast PHO5 promoter and the chicken β-globin enhancer (41)); very few studies record complete accessibility. A conceptual weakness of the assay when applied in vivo is that proteins other than nucleosomes might also protect against restriction enzymes, although such proteins would have to be unusually strongly bound to DNA. Our own studies provide direct evidence for nucleosomes on two active yeast promoters in vivo, measured both by monomer extension mapping and by restriction enzyme accessibility (18). Thus, the active promoter is much less likely to be nucleosomal than the neighbouring coding region, but it is not always nucleosome-free. This can be understood in terms of a dynamic chromatin structure: at some points in the transcription cycle there is a nucleosome on the promoter, and at other times the promoter is nucleosome-free (i.e., some arrays have a nucleosome on the promoter; others do not).
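If, as the assay assumes, a nucleosome gives essentially complete protection, the plateau level of digestion in nuclei estimates the fraction of templates that are nucleosome-free at the site, and its complement estimates the fraction that carry a nucleosome there. A 50% plateau therefore fits a population in which about half of the arrays have a nucleosome on the promoter. The arithmetic is sketched below with hypothetical band intensities; the helper names and the choice to average the last three time points as the plateau are our assumptions, and the caveat about other protecting proteins applies unchanged.

```python
def fraction_cut(cut_signal, uncut_signal):
    """Fraction of templates cut at the site, from cut and uncut band intensities."""
    total = cut_signal + uncut_signal
    if total <= 0:
        raise ValueError("no signal in either band")
    return cut_signal / total


def plateau(time_course, last_n=3):
    """Estimate the digestion plateau as the mean of the last `last_n` time points."""
    tail = time_course[-last_n:]
    return sum(tail) / len(tail)


def nucleosomal_fraction(plateau_cut):
    """Fraction of templates with a nucleosome over the site.

    Assumes a nucleosome gives complete protection and that no other protein
    protects the site, so every cuttable template is nucleosome-free there.
    """
    if not 0.0 <= plateau_cut <= 1.0:
        raise ValueError("plateau must be a fraction between 0 and 1")
    return 1.0 - plateau_cut


if __name__ == "__main__":
    # Hypothetical (cut, uncut) band intensities at successive digestion times.
    bands = [(10, 90), (30, 70), (45, 55), (49, 51), (50, 50), (51, 49)]
    course = [fraction_cut(cut, uncut) for cut, uncut in bands]
    p = plateau(course)
    print(round(p, 2), round(nucleosomal_fraction(p), 2))  # 0.5 0.5
```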
The latest breakthrough in the field is parallel sequencing of nucleosomal DNA (43). Nucleosome sequencing is the ultimate high-resolution mapping technique, because the nominal resolution is one base pair. Nucleosome sequencing using traditional methods has been described previously, leading to important insights into how DNA interacts with the histone octamer (48). It is the scale of the new sequencing experiments that is breathtaking: the new machines can sequence millions of nucleosomes! For parallel sequencing, adaptors containing suitable primer sequences are ligated to purified nucleosomal DNA, which is then amplified by PCR and sequenced in parallel (50). Two different technologies are available: the Illumina-Solexa system, which yields millions of short reads (~40 bases), and the Roche 454 system, which yields >100 times fewer reads, but of greater length (~250 bases). The advantage of the latter is that the full length of each nucleosomal DNA molecule is obtained from the sequence, so the degree of trimming, and hence the accuracy of the position data, should be apparent. However, most studies have used the Illumina system because it yields much higher genome coverage. With this system, short sequences corresponding to one end of each nucleosomal DNA molecule are obtained and mapped to the genome. To infer the position of the nucleosome, forward and reverse reads are paired on the assumption that reads ~150 bp apart can be attributed to the same nucleosome (50). It is difficult, however, to distinguish between different degrees of trimming of the same nucleosome and a cluster of overlapping positions (18). These difficulties can be resolved experimentally using Illumina paired-end sequencing, in which both ends of the same DNA molecule are sequenced, yielding the length of the nucleosomal DNA. All the caveats concerning high-resolution mapping mentioned above also apply to nucleosome sequencing, particularly the requirement for full trimming of core particles to 145–150 bp. An additional issue is the potential for PCR amplification bias.
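The pairing rule for single-end reads, and the length check that paired-end sequencing makes possible, can be sketched as follows. This is a hypothetical illustration rather than a published pipeline: the function names, the greedy nearest-match pairing, the ±10 bp tolerance and the 1-based inclusive coordinates are all our assumptions. `classify_fragments` would normally be applied to paired-end fragments, whose two ends are known to come from the same molecule; here it is also applied to the inferred single-end pairs to show the ambiguity.

```python
from bisect import bisect_left


def pair_single_end_reads(forward_starts, reverse_ends, expected=150, tolerance=10):
    """Greedily pair forward-read 5' ends (left borders) with reverse-read 5' ends
    (right borders) lying about `expected` bp downstream. Each reverse read is used once."""
    rev = sorted(reverse_ends)
    used = [False] * len(rev)
    pairs = []
    for f in sorted(forward_starts):
        target = f + expected - 1  # right border of an `expected`-bp fragment (inclusive)
        i = bisect_left(rev, target - tolerance)
        best = None
        while i < len(rev) and rev[i] <= target + tolerance:
            if not used[i] and (best is None or abs(rev[i] - target) < abs(rev[best] - target)):
                best = i
            i += 1
        if best is not None:
            used[best] = True
            pairs.append((f, rev[best]))
    return pairs


def classify_fragments(fragments, core_min=145, core_max=150):
    """For (start, end) fragments return (dyad, length, fully_trimmed) tuples."""
    out = []
    for start, end in fragments:
        length = end - start + 1
        dyad = (start + end) / 2
        out.append((dyad, length, core_min <= length <= core_max))
    return out


if __name__ == "__main__":
    forward = [1000, 1200]        # hypothetical forward-read 5' ends
    reverse = [1149, 1355, 2000]  # hypothetical reverse-read 5' ends
    pairs = pair_single_end_reads(forward, reverse)
    print(pairs)                      # [(1000, 1149), (1200, 1355)]
    print(classify_fragments(pairs))  # [(1074.5, 150, True), (1277.5, 156, False)]
```

The second pair spans 156 bp. From single-end data alone it cannot be decided whether this is an incompletely trimmed nucleosome or a pairing of reads from two overlapping positions; paired-end sequencing removes that ambiguity by reporting each fragment's true length.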
Most high-throughput studies differ from traditional studies in two ways. (i) An attempt is made to prevent nucleosome sliding during core particle preparation by prior fixation of the cells with formaldehyde. However, cross-linking of the histones to DNA is unlikely to be complete, and incomplete cross-linking would not fully prevent sliding. It is also unclear whether formaldehyde fixation is appropriate at all, given that it might introduce artefacts by modifying DNA-binding lysine residues and DNA bases. (ii) An attempt is made to correct for the sequence bias of MNase by comparing nucleosomal DNA with protein-free genomic DNA digested by MNase to about the same size. This seems inappropriate, given that protein-free DNA is destroyed long before mono-nucleosomes appear in the digestion (discussed above). A different kind of bias might be introduced if some core particles are more susceptible to digestion by MNase than others (13), resulting in under-representation of particular nucleosomes (34).
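The MNase sequence-bias correction described in (ii) amounts to dividing nucleosomal read coverage by the coverage of MNase-digested protein-free DNA at each position. The minimal sketch below, with hypothetical function names, an assumed pseudocount of 1, a log2 scale of our choosing and toy counts, only makes clear what the correction computes. It does not answer the objection above that naked DNA is degraded long before mononucleosomes appear, and it cannot correct for core particles that are themselves more susceptible to MNase.

```python
import math


def normalise(counts):
    """Scale counts so that they sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive total")
    return [c / total for c in counts]


def log2_enrichment(nucleosomal, naked, pseudocount=1.0):
    """Per-position log2 ratio of nucleosomal to naked-DNA read coverage.

    Both tracks are normalised to the same total after adding a pseudocount,
    so positions that MNase favours in naked DNA are down-weighted.
    """
    if len(nucleosomal) != len(naked):
        raise ValueError("tracks must cover the same positions")
    nuc = normalise([c + pseudocount for c in nucleosomal])
    gen = normalise([c + pseudocount for c in naked])
    return [math.log2(n / g) for n, g in zip(nuc, gen)]


if __name__ == "__main__":
    nucleosomal = [7, 7, 1, 1]  # hypothetical nucleosomal read counts
    naked = [1, 1, 1, 3]        # hypothetical MNase-digested protein-free DNA counts
    print(log2_enrichment(nucleosomal, naked))  # [1.0, 1.0, -1.0, -2.0]
```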
Are the genome-wide data consistent with multiple alternative arrays? Nucleosome sequencing has also revealed overlapping nucleosomes in yeast (43), although the genes given as examples exhibit more obvious dominant arrays than our data have indicated for CUP1. Overlapping positions appear to be a general feature of C. elegans chromatin. Data from genome-wide studies are typically presented as nucleosome occupancy maps for specific genes, with peaks and troughs interpreted as nucleosomes and linkers, respectively. These maps represent an averaged chromatin structure. Quantitative conversion of our monomer extension data for HIS3 chromatin to nucleosome density revealed a similar sinusoidal variation, even though multiple arrays are present (18). We proposed that such patterns correspond to “interference” between the nucleosome density signals from overlapping nucleosomal arrays. Different patterns can be obtained depending on the phase and relative occupancy of each array (Figure 4). Such analysis suggests an explanation for the apparently “fuzzy” positioning of a large fraction of nucleosomes genome-wide, even though the majority of them must be present in ordered nucleosomal arrays.
Figure 4. Model nucleosome density/occupancy profiles for unique and multiple nucleosomal arrays. Three different model array sets: (A) Unique array. (B) Two arrays with equal occupancies that are out of phase by 80 bp. (C) A set of three arrays (a dominant array and two others).
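The “interference” idea can be made concrete with a toy model in the spirit of Figure 4: each array contributes its occupancy wherever one of its core particles covers a base pair, and the observed density is the sum over arrays. The 160 bp repeat length, the 147 bp core, the 320 bp window and all function names are illustrative assumptions, not values measured in these studies.

```python
def array_density(length_bp, repeat_bp, phase_bp, occupancy, core_bp=147):
    """Per-bp density contributed by one regular array.

    Cores start at phase_bp + k * repeat_bp and each covers `core_bp` bp.
    Positions covered by a core receive the array's occupancy; linkers receive 0.
    """
    density = [0.0] * length_bp
    for x in range(length_bp):
        if (x - phase_bp) % repeat_bp < core_bp:
            density[x] = occupancy
    return density


def combined_density(length_bp, arrays, core_bp=147):
    """Sum the densities of alternative arrays given as (repeat_bp, phase_bp, occupancy)."""
    total = [0.0] * length_bp
    for repeat_bp, phase_bp, occupancy in arrays:
        contribution = array_density(length_bp, repeat_bp, phase_bp, occupancy, core_bp)
        total = [t + c for t, c in zip(total, contribution)]
    return total


def trough_starts(density):
    """Positions where the profile first drops to its minimum value."""
    low = min(density)
    return [x for x in range(len(density))
            if density[x] == low and (x == 0 or density[x - 1] != low)]


if __name__ == "__main__":
    unique = combined_density(320, [(160, 0, 1.0)])
    two = combined_density(320, [(160, 0, 0.5), (160, 80, 0.5)])
    print(min(unique), max(unique), trough_starts(unique))  # 0.0 1.0 [147, 307]
    print(min(two), max(two), trough_starts(two))           # 0.5 1.0 [67, 147, 227, 307]
```

In this example a unique array gives deep troughs every 160 bp, whereas two equally occupied arrays out of phase by 80 bp give troughs only half as deep and twice as frequent, so the averaged profile no longer reflects either underlying array directly.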