To identify Oct4 binding sites in the human genome, we first performed a ChIP experiment using an antibody to Oct4 and demonstrated that the Oct4 ChIP sample showed enrichment when primers specific to the NANOG and EVX1 promoters (known Oct4 binding sites) were used in PCR reactions, but no enrichment when negative control primers specific for the DHFR gene were used (data not shown). We then used LMPCR to amplify the Oct4 ChIP samples and hybridized the amplified samples to ENCODE arrays. Using ChIP samples amplified by LMPCR, we have previously identified binding sites for E2F family members using CpG island (4), promoter (9,10), and genomic tiling (9) arrays. However, using the LMPCR amplification method we found that Oct4 binding sites could not be distinguished from the background noise on the arrays (top panel). For example, although the Oct4 binding site in the EVX1 promoter is present on the array used in this study, it could not be identified above background noise. Also, two Oct4 binding sites (confirmed by PCR analysis of ChIP samples) within the EXT1 gene, indicated with arrows in the figure, do not show enhanced enrichment as compared to the surrounding DNA. Peak prediction analysis of two biologically independent ChIP-chip assays performed using the LMPCR method was carried out using a 98th percentile threshold of log2 oligomer ratios and a P-value cutoff of P < 0.0001 (9). Although hundreds of peaks were called for the two arrays using the LMPCR-derived amplicons, very few peaks were common to both arrays (see Supplementary data).
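The percentile-threshold step of the peak prediction described above can be sketched as follows. This is a minimal illustration, not the peak-calling program used in this study: the function names, the `min_probes` run-length requirement, and the toy data are assumptions, and the P-value filter (P < 0.0001) from reference 9 is not reproduced here.

```python
def percentile(values, pct):
    """Percentile (pct in 0..100) with linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values given")
    rank = (len(ordered) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def call_peaks(positions, log2_ratios, pct=98.0, min_probes=3):
    """Return (start, end) spans of consecutive probes above the cutoff.

    positions   -- probe coordinates, sorted along one chromosome
    log2_ratios -- log2(ChIP / input) for each probe
    pct         -- percentile cutoff (98th percentile, as in the text)
    min_probes  -- hypothetical stand-in for the statistical filter:
                   a run must contain at least this many probes
    """
    cutoff = percentile(log2_ratios, pct)
    peaks, run = [], []
    for pos, ratio in zip(positions, log2_ratios):
        if ratio > cutoff:
            run.append(pos)
            continue
        if len(run) >= min_probes:
            peaks.append((run[0], run[-1]))
        run = []
    if len(run) >= min_probes:  # close a run that reaches the last probe
        peaks.append((run[0], run[-1]))
    return peaks


# Toy data (invented): 200 probes spaced 50 bp apart, one enriched region.
positions = [i * 50 for i in range(200)]
ratios = [3.0 if 100 <= i < 104 else 0.0 for i in range(200)]
print(call_peaks(positions, ratios))
```

Because the cutoff is a percentile of each array's own ratios, a noisy array still yields a fixed share of probes above threshold, which is consistent with hundreds of peaks being called even when few are reproducible.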
Because known Oct4 binding sites were enriched in the ChIP samples, it was likely that the inability to identify binding sites on the arrays was a result of the amplification method and not inefficient immunoprecipitation. To test this hypothesis, we performed 10 ChIP reactions for each of two biologically independent samples of cross-linked cells. The 10 ChIP samples from a given batch of cells were pooled, and the two pools were applied separately to genomic tiling arrays. We found that the pooling method greatly reduced the background noise on the array and produced reproducible binding patterns (middle panel). In fact, ~70% of the peaks identified on one array were also identified on the biological replicate array (see Supplementary data).
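The replicate comparison behind an overlap figure such as ~70% can be illustrated with a short sketch. It assumes that two peaks count as shared when their genomic intervals overlap by at least one base on the same chromosome; the actual matching criterion used for the arrays is not stated here, and the function names and coordinates below are invented for illustration.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def fraction_shared(peaks_a, peaks_b):
    """Fraction of peaks on array A that overlap any peak on array B."""
    if not peaks_a:
        return 0.0
    shared = sum(1 for a in peaks_a if any(overlaps(a, b) for b in peaks_b))
    return shared / len(peaks_a)


# Invented example: 10 peaks on replicate A, 7 of which recur on replicate B.
replicate_a = [("chr7", i * 10_000, i * 10_000 + 500) for i in range(10)]
replicate_b = [("chr7", i * 10_000 + 200, i * 10_000 + 700) for i in range(7)]
print(fraction_shared(replicate_a, replicate_b))  # 0.7
```

Note that the measure is not symmetric: the fraction of A peaks found on B can differ from the fraction of B peaks found on A when the two arrays yield different numbers of peaks.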
Unfortunately, pooling ChIP samples is not always possible (e.g. if using specialized cell types or tumor tissues), and the need to pool 10 ChIP samples for every array would greatly increase the number of ChIP reactions needed to analyze the entire human genome. Therefore, we felt that a different method for amplifying ChIP samples was required. The method of whole genome amplification (WGA) has proven very useful for investigators performing comparative genomic hybridizations (see http://www.sigmaaldrich.com/sigma/bulletin/wga1bul.pdf). The standard protocol for this technique is to first employ a random chemical fragmentation of the genome, producing a series of overlapping short templates averaging 400 base pairs. Next, the DNA fragments are efficiently primed to generate a library of DNA fragments with defined 3' and 5' termini. This library is then replicated using linear amplification in the initial stages, followed by a limited round of geometric amplifications. Because ChIP samples are obtained using sonicated chromatin that has an average size of 500 bp to 1 kb, we reasoned that the chemical fragmentation step should not be necessary. Therefore, we used an entire ChIP sample (obtained from 1 × 10^7 cells) for the library generation and subsequent amplification. Using this protocol, we found that the predicted Oct4 peaks showed a pattern very similar to that of the pooled ChIP samples and that the background noise was very low (bottom panel). Using the WGA method, we found that ~63% of the peaks were detected on both arrays (see Supplementary data). These results are very similar to those obtained by analysis of the arrays hybridized with the pooled samples. One reason the overlap was not higher than 63–70% when the pooled and WGA samples were analyzed is the limitations of the peak-calling program. As shown in Supplementary Figure 1, very similar binding patterns of Oct4 on two arrays can lead to differences in the number and exact positions of called peaks.
The Oct4 binding sites identified using the WGA method were tested by standard PCR analyses using a ChIP sample from a third independent culture of cells. After analyzing 14 predicted Oct4 binding sites, we obtained a 93% confirmation rate, indicating that the WGA method results in an accurate representation of a ChIP sample obtained from a small number of cells.
Conclusions
We have shown that LMPCR-mediated amplification does not work well for all ChIP samples, perhaps depending on the number of binding sites and the abundance of the factor. We have tested a different amplification method, originally developed to provide accurate representation of the genome for studies of copy number changes and SNP analyses in tumor samples. We found that the signal-to-noise ratio obtained from the hybridization of the WGA amplicons to genomic arrays is superior to that obtained with the LMPCR method of amplification for ChIP samples, not only for Oct4 but also for a number of other human and mouse transcription factors (data not shown). Based on the low background, the reproducibility, and the fact that a single ChIP sample provides sufficient material for several array hybridizations, we recommend the WGA protocol for ChIP-chip analyses.