Our results indicate that, while indels are on average depleted in all types of nucleosomes at TSS, at TES, and genome-wide, the density of SNPs behaves in a more intricate way. In particular, SNP density is negatively correlated with nucleosome occupancy at TSS and positively correlated with nucleosome occupancy at TES (Supplementary Table 1). In agreement with this observation, the density of SNPs is increased in the core sequences associated with bulk but not epigenetic nucleosomes, which constitute the majority of the nucleosomes in our set at TES and TSS, respectively. The positive correlation between SNP density and nucleosome occupancy was reported earlier for the fish and yeast genomes [7,10]. We note that the nucleosome positions used in those studies correspond to bulk positions in our notation; thus, the earlier results are consistent with ours. However, in the current paper we show that this rule does not hold in a number of important cases in the human genome. At least some classes of epigenetic nucleosomes are associated with a decrease rather than an increase in SNP density. We also report that SNP density may be negatively correlated with nucleosome density in DNA regions that are under strong selective pressure, such as exon-intron boundaries. Thus, our findings highlight the complexity of the interplay between the mechanisms that control SNP appearance and nucleosome positioning in humans.
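The sign of the relationship between SNP density and nucleosome occupancy discussed above can be illustrated with a Pearson correlation computed over two profiles aligned to the same reference point (for example, a TSS or a TES). The following is a minimal sketch, not the pipeline used in this study: the function name `pearson` and the toy profiles are hypothetical, and binning, smoothing, and significance testing are left out.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric profiles."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length profiles with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances are left unnormalized; the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy profiles: occupancy rises while SNP density falls,
# giving the negative sign described for TSS-proximal regions.
occupancy = [1.0, 2.0, 3.0, 4.0]
snp_density = [4.0, 3.0, 2.0, 1.0]
r = pearson(occupancy, snp_density)
```

A negative `r` corresponds to the anti-correlation described for TSS, and a positive `r` to the behavior described for TES.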
It is interesting to consider why bulk nucleosomal sequences are strongly depleted of one type of mutation, indels, while they are only moderately depleted of, or even enriched in, another type of mutation, SNPs. In general, two mechanisms are potentially responsible for the difference in the density of genome variations inside and outside nucleosomes [27]. One is connected to the alteration of the mutation rate in nucleosomal DNA, e.g. due to the physical interaction of the nucleosomal DNA with histones [7,10,28]. The other assumes that DNA sequences that contain nucleosome positioning signals and/or transcription factor binding sites are evolutionarily conserved to a greater extent than the adjacent DNA fragments [29]. These mechanisms are not mutually exclusive and can both contribute, as discussed below.
Our observation of roughly the same frequency of indels inside nucleosomes of different types suggests that, for indels, an altered mutation rate rather than purifying selection is at work. Indeed, our results provide little support for the hypothesis that selection pressure excludes indels from nucleosomes. We did not detect a dependence of the nucleosome-to-linker ratio of indel occurrences on indel length (Supplementary Figure 2); such a dependence would be suggestive of this mechanism. Overall, our results indicate that stable nucleosome positions are reflected in the indel frequency profile regardless of the local base composition or the details of the regulatory pathways in which a specific DNA locus is involved. This is illustrated by a shift, in the indel frequency profile, of nucleosome position +1 at the starts of CpG genes relative to the corresponding position at the starts of non-CpG genes. The sequence composition of the TSS-proximal regions of CpG and non-CpG genes is quite different, and CpG genes are actively transcribed in a broader range of cell types than non-CpG genes [26], yet nucleosome position +1 is reflected in the indel frequency profile in each of these groups.
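The nucleosome-to-linker ratio of indel occurrences as a function of indel length, referred to above, can be sketched as follows. This is an illustrative calculation under stated assumptions rather than the procedure used for Supplementary Figure 2: the function name, the half-open interval convention, the assignment of an indel to the class of its start position, and the normalization by the number of base pairs in each class are all hypothetical choices.

```python
from collections import defaultdict


def nucleosome_to_linker_ratio(indels, nucleosomes, region_length):
    """Per-length ratio of indel density in nucleosomes to that in linkers.

    indels:        iterable of (position, length) pairs, 0-based positions
    nucleosomes:   half-open (start, end) intervals inside the region
    region_length: total number of base pairs considered
    """
    # Mark every base pair that lies inside a nucleosome core.
    covered = [False] * region_length
    for start, end in nucleosomes:
        for i in range(start, end):
            covered[i] = True
    nucleosome_bp = sum(covered)
    linker_bp = region_length - nucleosome_bp
    if nucleosome_bp == 0 or linker_bp == 0:
        raise ValueError("region must contain both nucleosomal and linker DNA")

    # counts[length] = [indels starting in nucleosomes, indels starting in linkers]
    counts = defaultdict(lambda: [0, 0])
    for position, length in indels:
        counts[length][0 if covered[position] else 1] += 1

    ratios = {}
    for length, (in_nucleosome, in_linker) in counts.items():
        if in_linker == 0:
            continue  # ratio is undefined without linker indels of this length
        ratios[length] = (in_nucleosome / nucleosome_bp) / (in_linker / linker_bp)
    return ratios


# Hypothetical toy data: one nucleosome covering half of a 20-bp region.
toy_indels = [(2, 1), (15, 1), (16, 1), (3, 2), (12, 2)]
toy_ratios = nucleosome_to_linker_ratio(toy_indels, [(0, 10)], 20)
```

Under this sketch, ratios that stay roughly constant across indel lengths would correspond to the absence of a length dependence described above.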
On the other hand, the density of SNPs appears to be affected by natural selection. A single-nucleotide mutation can disrupt a transcription factor binding site and thereby interfere with regulatory pathways. It is less likely that such a mutation would significantly alter the positioning properties of a 147-bp sequence associated with a nucleosome. Furthermore, even if a mutation shifts the position of a bulk nucleosome by several base pairs, this may not have any biological effect. As a result, mutations would be tolerated in the core sequences of bulk nucleosomes but would be excluded from the linkers, where many transcription factors bind [3,30,31]. In contrast, correct placement of epigenetically modified nucleosomes is important for gene regulation, and the positions preferentially occupied by these nucleosomes are likely to be conserved to the same or a greater extent than the linker sequences. It should be emphasized that our results do not imply a complete absence of selective pressure on bulk nucleosome sequences, but rather that the pressure is stronger in linkers than in nucleosomes of this type. Nor do we suggest that the SNP occurrence rate is entirely unchanged in nucleosome core sequences. Rather, our results provide information about the mechanisms that contribute the most to the observed features of sequence variation profiles.
The interpretation that epigenetic nucleosome positions are more strongly conserved, rather than that mutation rates differ between bulk and epigenetic nucleosomes, is further supported by two lines of evidence. First, the fraction of SNPs that occur rarely in the population, in particular those associated with only one genome in our data set, is higher for epigenetic than for bulk nucleosomes (Supplementary Figure 4). This indicates stronger selection against SNPs in epigenetic nucleosomes. Second, as discussed above, we observe a clear drop in SNP density at nucleosome positions coinciding with exon-intron boundaries, which is likely to result from the strong selection pressure acting on splice sites. Since most of the nucleosomes proximal to exon-intron junctions are bulk nucleosomes, the anti-correlation of SNP frequency with nucleosome occupancy argues against the idea that the presence of nucleosomes of this type necessarily increases the rate of SNP accumulation.
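The first line of evidence rests on a simple quantity: the fraction of SNPs observed in only one genome of the data set, compared between nucleosome classes. A minimal sketch of that comparison is given below; the function name `singleton_fraction` and the carrier counts are hypothetical and are not taken from the data behind Supplementary Figure 4.

```python
def singleton_fraction(carrier_counts):
    """Fraction of SNPs carried by exactly one genome.

    carrier_counts: one integer per SNP, giving the number of genomes
    in the data set in which the variant allele was observed.
    """
    if not carrier_counts:
        raise ValueError("need at least one SNP")
    singletons = sum(1 for count in carrier_counts if count == 1)
    return singletons / len(carrier_counts)


# Hypothetical carrier counts for SNPs falling in two nucleosome classes.
epigenetic_counts = [1, 1, 3, 5]
bulk_counts = [1, 2, 4, 6]
epigenetic_fraction = singleton_fraction(epigenetic_counts)
bulk_fraction = singleton_fraction(bulk_counts)
```

A higher singleton fraction in one class is the kind of excess of rare variants that is commonly read as a sign of stronger purifying selection, which is the argument made above for epigenetic nucleosomes.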
Taken together, our results suggest that a combination of purifying selection acting on biologically important sequences and alteration of the mutation rate in nucleosomal DNA determines the pattern of sequence variation in the human genome. Further studies are required, however, to unambiguously prove or disprove the involvement of these mechanisms in the evolution of nucleosome positioning sequences in the human genome. In particular, characterization of the molecular mechanisms that can underlie chromatin-directed mutational bias will advance our understanding of the principles of genome evolution.
Figure 6. Interplay of chromatin-mediated mutation bias and selection can shape the sequence variation profile (cf. the schematic illustration in Ref. 27). (A) Bulk and epigenetically modified nucleosomes are represented by blue and red ovals. Green and orange lines ...