Heterochromatin is refractory to the activity of several enzymes, including restriction enzymes, DNA methylases and the HO endonuclease [23]–[25]. However, it has been previously assumed that physical manipulation of DNA in vitro by high energy methods such as sonication is unaffected by biological properties affecting the chromatin under study. This study puts an end to that assumption, with new properties of chromatin revealed in the deep sequencing of samples previously considered as merely controls of other experiments. We discovered that input-Seq coverage differs widely for many distinct positions, including silenced subtelomeric DNA, telomeres, protein-binding sites, and highly transcribed genes and promoters. Such differences will significantly influence interpretation of ChIP experiments, an issue that was previously unrecognized. These differences can also be exploited to detect unusual chromatin states.
By comparing coverage of sequence reads from sheared chromatin samples to those from sheared genomic DNA, we were able to separate technology-related sequencing biases from biologically meaningful effects. The most under-covered regions were heavily biased towards subtelomeric regions which are subject to silencing in yeast, similarly to HML and HMR [12]–[14]. This analysis supported the hypothesis that silencing interfered with shearing of DNA. In contrast, the DNA inside the telomeres was vastly over-represented in the sequenced input sample. Yeast telomeres, as in other organisms, are specialized structures, with highly repetitive sequences, coated by a variety of proteins [26]. The over-representation of sequence reads on chromosome ends was specific to the sheared chromatin sample and was not observed in the sheared genomic DNA. Hence, peculiar DNA sequence composition inside the telomeres could not explain the over-representation of input sequence reads in these regions.
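The comparison between sheared chromatin and sheared genomic DNA can be illustrated with a minimal sketch. It assumes that read counts have already been tallied in matching fixed-width bins for both samples. The function name, the pseudocount, and the depth scaling are illustrative assumptions, not the pipeline used in this study.

```python
import math


def log2_coverage_ratio(chromatin_counts, genomic_counts, pseudocount=1.0):
    """Per-bin log2 ratio of chromatin-input coverage to genomic-DNA coverage.

    Both lists hold read counts for the same bins, in the same order.
    Each sample is scaled to its own total so that sequencing depth cancels.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    if len(chromatin_counts) != len(genomic_counts):
        raise ValueError("both samples must be counted over the same bins")
    n_bins = len(chromatin_counts)
    if n_bins == 0:
        raise ValueError("at least one bin is required")

    chromatin_total = sum(chromatin_counts) + pseudocount * n_bins
    genomic_total = sum(genomic_counts) + pseudocount * n_bins

    ratios = []
    for chromatin, genomic in zip(chromatin_counts, genomic_counts):
        chromatin_fraction = (chromatin + pseudocount) / chromatin_total
        genomic_fraction = (genomic + pseudocount) / genomic_total
        ratios.append(math.log2(chromatin_fraction / genomic_fraction))
    return ratios
```

Under these assumptions, bins with a clearly negative ratio are under-covered in chromatin relative to naked DNA, as seen for the silenced subtelomeric regions, and bins with a clearly positive ratio are over-covered, as seen for the telomeres. Bins where both samples deviate together cancel out, which is how sequence-driven bias is separated from chromatin-driven effects.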
We observed striking differences in the coverage of protein-bound sites. The sequences around transcription factor binding sites and DNase I footprints had higher coverage than intergenic or coding DNA. The read density over genes and their promoters also correlated with the transcription level of the gene: a high expression level was associated with an increase in read density, and a low expression level with a decrease. The increased coverage of the binding sites and DNase I footprints, and the correlation between high coverage and high RNA levels, may have reflected the frenetic activity of nucleosome remodelers, transcription factors, general transcription machinery, and RNA polymerases. It is noteworthy that the only two transcription factors whose binding sites were not enriched in input reads were Ste12 and Dig1. Both are involved in the mating and invasive growth pathways [27] and therefore would probably have been inactive under the rich media (YPD) conditions in which the cells were grown in preparation for input-Seq.
In testing candidate hyper- and hypo-covered input-Seq regions, we observed changes in shearing similar to HMRa1 only in the under-covered subtelomeric region. Shearing appeared to be normal inside the poorly-covered TRA1 gene and the over-covered RPL26A promoter that we analyzed. These results suggested that chromatin states can also influence input-Seq coverage through effects other than shearing. Indeed, quantitative PCR (Q-PCR) measurements for the above regions showed variation in the DNA content of the input sample similar to that observed in the input-Seq coverage. It is likely that the chromatin states of the telomeric structures, promoters, and genes lead to differences in the efficiency of isolation of chromatinized DNA, prior to the shearing step or during the reversal of crosslinking.
Chromatin immunoprecipitation, in conjunction with tiled Q-PCR, is often used to establish the extent of spreading along a chromosome for proteins of interest. If a locus is refractory to shearing and/or is inefficiently isolated due to the chromatin state, a ChIP-based localization of a protein in such a region would exaggerate the apparent interval over which that protein interacts with chromatin. Conversely, a higher susceptibility to shearing or better isolation may result in an under-estimate of the spreading. Particularly for ChIP-Seq studies, our observations of the pervasive inhomogeneity of coverage in the input sample highlighted the need to normalize the sequence read counts from ChIP samples to the input control counts. Many studies currently lack sheared chromatin input sequencing data, and the analyses from these studies are likely to have increased false positives and false negatives.
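The normalization of ChIP read counts to input control counts can be sketched as follows. This is a minimal illustration that assumes both samples have been counted over the same bins. The function name and the `min_input` threshold are hypothetical choices made for this example, not part of any published pipeline.

```python
def input_normalized_enrichment(chip_counts, input_counts, min_input=5):
    """Per-bin ChIP/input fold enrichment, scaled for library size.

    Bins whose input count falls below min_input (a hypothetical cutoff)
    are reported as None, because a ratio over very sparse input is unreliable.
    """
    if len(chip_counts) != len(input_counts):
        raise ValueError("ChIP and input must be counted over the same bins")
    chip_total = sum(chip_counts)
    input_total = sum(input_counts)
    if chip_total == 0 or input_total == 0:
        raise ValueError("both samples need at least one read")

    # Scale ChIP counts to the depth of the input library.
    scale = input_total / chip_total

    enrichment = []
    for chip, inp in zip(chip_counts, input_counts):
        if inp < min_input:
            enrichment.append(None)
        else:
            enrichment.append(chip * scale / inp)
    return enrichment
```

For example, `input_normalized_enrichment([10, 10, 30, 10], [20, 20, 60, 20])` returns 1.0 for every bin: the apparent peak in the third bin of the raw ChIP counts is fully accounted for by higher input coverage, so it would be a false positive if the input were ignored.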
In addition to the effect of the chromatin structures on ChIP studies, our study re-emphasizes the importance of normalizing deep sequencing results to the sequence reads from genomic DNA. Bias in GC-content and other sequence composition patterns can produce dramatic peaks or troughs in coverage, as we observed over centromeres and across transcripts, potentially leading to mistaken inferences about the underlying biology. These biases would affect ChIP-Seq studies, and would also confound interpretation of RNA-Seq and copy-number variation detection using high-throughput sequencing technologies.
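One simple way to check for GC-related bias in a genomic DNA control is to group windows by GC fraction and compare their mean coverage. The sketch below assumes each window is supplied as a (sequence, coverage) pair. The function names and the number of strata are illustrative assumptions, and this is not the analysis performed in this study.

```python
from collections import defaultdict


def gc_fraction(sequence):
    """Fraction of called bases (A, C, G, T) that are G or C; None if no bases are called."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    return gc / called if called else None


def mean_coverage_by_gc(windows, n_strata=10):
    """Mean coverage of windows grouped into equal-width GC strata.

    windows: iterable of (sequence, coverage) pairs.
    Returns a dict mapping stratum index (0 = lowest GC) to mean coverage.
    Windows with no called bases are skipped.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for sequence, coverage in windows:
        gc = gc_fraction(sequence)
        if gc is None:
            continue
        # A GC fraction of exactly 1.0 is placed in the top stratum.
        stratum = min(int(gc * n_strata), n_strata - 1)
        totals[stratum][0] += coverage
        totals[stratum][1] += 1
    return {s: total / n for s, (total, n) in sorted(totals.items())}
```

If the mean coverage of sheared genomic DNA differs substantially between strata, coverage peaks or troughs in regions of extreme GC content are more likely to reflect sequencing bias than underlying biology.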
As more ChIP-Seq experiments with appropriate input controls are performed, the deviation in coverage will become an increasingly powerful way to identify distinct chromatin states, as long as the raw data from such studies remain available. We were already able to pinpoint specific regions, with decreased or increased read counts defining domains hundreds of base pairs long. Given the highly reproducible results that we observed in the different input-Seq experiments, as more ChIP-Seq input controls for the same species become available, it will become possible to detect chromatin differences at specific loci with increasing resolution. The chromatin-related variation in ChIP experiments is likely to be pervasive across taxa.