ChIP-based technologies are used extensively to identify protein-DNA interaction networks in a variety of cell types and under a range of conditions. In particular, ChIP-PET and ChIP-chip have been used to identify the mouse and human ES cell transcriptional circuitries, which are largely regulated by the key pluripotency factors OCT4 and NANOG. Although each ChIP-based technology used to identify these networks has distinct advantages, we find substantial differences in the data derived through these different experimental methods. Recent technological comparisons have shown differences in the results obtained by these methods, and illustrated the need to use these data in a complementary manner [14]. We have used ChIP-chip to uncover genomic regions bound by OCT4 and NANOG in mouse ES cells, expanding on previously published ChIP-PET results, and find a large number of binding sites identified exclusively by each technique. Therefore, using these data in a complementary fashion provides a more detailed overview of the OCT4 and NANOG transcriptional networks.
We analyzed our ChIP-chip results for OCT4 and NANOG in relation to existing ChIP-PET data. Since the criteria for identifying genomic targets differ between platforms, the datasets obtained by the two methods were examined against each other under an exhaustive range of significance values. Recovery curves were used to measure the recovery of targets obtained by keeping the binding threshold for one technique constant and varying the threshold values for the other method. As expected, for both OCT4 and NANOG targets, the ChIP-PET recovery decreased as the ChIP-chip p-value threshold was made more stringent. A similar trend was observed for the ChIP-chip recovery when the ChIP-PET read stringency was increased. Additionally, at the same thresholds, this overlap decreased when the recovery distance permitted between a ChIP-chip peak and a ChIP-PET peak was narrowed. These recovery curves therefore revealed the necessity of calibrating the recovery distance when examining binding experiments from multiple sources. Interestingly, we also observed that the recovery between ChIP-chip and ChIP-PET data increased when the whole chromosome arrays were used. Thus, the criteria used to determine a binding event, as well as the extent of genome coverage, affected the overlap between the data obtained by the two methods. The recovery curves illuminated the sensitivity of recovery to the distance threshold, and provided a useful means to examine the datasets relative to each other.
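The recovery-curve idea described above can be illustrated with a minimal sketch. This is not the analysis code used in the study: the function names `recovery` and `recovery_curve`, the representation of peaks as (chromosome, position) tuples, and the toy coordinates and p-values are all assumptions made for illustration. The sketch holds the ChIP-PET target set fixed, varies the ChIP-chip p-value threshold, and counts a ChIP-PET target as recovered when a retained ChIP-chip peak lies within a chosen recovery distance on the same chromosome.

```python
from bisect import bisect_left


def recovery(reference_peaks, query_peaks, max_distance):
    """Fraction of reference peaks that have a query peak within
    max_distance bp on the same chromosome.

    Peaks are (chrom, position) tuples; this layout is an assumption.
    """
    # Index query peak positions by chromosome and sort them for bisection.
    by_chrom = {}
    for chrom, pos in query_peaks:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    if not reference_peaks:
        return 0.0

    hits = 0
    for chrom, pos in reference_peaks:
        positions = by_chrom.get(chrom, [])
        # First query peak at or beyond the left edge of the window.
        i = bisect_left(positions, pos - max_distance)
        if i < len(positions) and positions[i] <= pos + max_distance:
            hits += 1
    return hits / len(reference_peaks)


def recovery_curve(pet_peaks, chip_peaks, p_thresholds, max_distance):
    """ChIP-PET recovery as the ChIP-chip p-value threshold varies.

    pet_peaks:  (chrom, position) tuples, already filtered at a fixed threshold.
    chip_peaks: (chrom, position, p_value) tuples.
    Returns a list of (p_threshold, recovery) pairs.
    """
    curve = []
    for p in p_thresholds:
        kept = [(c, x) for c, x, pv in chip_peaks if pv <= p]
        curve.append((p, recovery(pet_peaks, kept, max_distance)))
    return curve


# Toy data (hypothetical coordinates and p-values, not from the study).
pet = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
chip = [("chr1", 1100, 1e-6), ("chr1", 5400, 1e-3), ("chr2", 2000, 1e-8)]

wide = recovery_curve(pet, chip, [1e-2, 1e-5], max_distance=500)
narrow = recovery_curve(pet, chip, [1e-2], max_distance=200)
```

On this toy input, recovery falls when the p-value threshold is tightened and also when the recovery distance is narrowed, mirroring the two trends described in the text. Swapping the roles of the two datasets gives the corresponding ChIP-chip recovery curve.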
We combined the protein-DNA binding data with known Oct4 and Nanog RNAi expression profiling data in order to analyze the targets that are differentially regulated upon Oct4 or Nanog knockdown in ES cells. OCT4- and NANOG-bound regions uncovered by both technologies, as well as the ones obtained exclusively by each method, contained a number of differentially regulated genes. Many of these genes encode transcription factors and regulators of gene expression, which are important in development. For instance, the expanded OCT4 and NANOG regulatory network contained genes such as Hoxa1, Foxd3, Msx2 and Hexb, which showed changes in expression upon Oct4 or Nanog knockdown. These genes have been shown to be important in cell fate specification, and are involved in developmentally important signaling pathways. Such additional targets identified by each technique can be used to expand the ES cell transcriptional regulatory framework, and thereby provide more detailed groundwork to understand pluripotency mechanisms. Further genetic manipulations of each of these genes in ES cells would be necessary to independently validate their contributions to pluripotency.
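Combining binding data with expression data, as described above, amounts to partitioning bound genes by detection method and intersecting each group with the set of differentially expressed genes. The sketch below shows one way to do this; the function name `classify_targets`, the group labels, and the placeholder gene names are assumptions for illustration and do not reflect the study's actual gene assignments.

```python
def classify_targets(chip_chip_genes, chip_pet_genes, diff_expressed):
    """Group bound genes by detection method, keeping only those that
    are differentially expressed upon knockdown.

    All arguments are iterables of gene symbols. Returns a dict mapping
    a group label to a sorted list of gene symbols.
    """
    chip = set(chip_chip_genes)
    pet = set(chip_pet_genes)
    de = set(diff_expressed)

    groups = {
        "both": chip & pet,            # bound in both datasets
        "chip_chip_only": chip - pet,  # bound only in ChIP-chip
        "chip_pet_only": pet - chip,   # bound only in ChIP-PET
    }
    return {label: sorted(genes & de) for label, genes in groups.items()}


# Placeholder gene names (hypothetical, not results from the study).
result = classify_targets(
    chip_chip_genes={"GeneA", "GeneB", "GeneC"},
    chip_pet_genes={"GeneB", "GeneD"},
    diff_expressed={"GeneB", "GeneC", "GeneD", "GeneE"},
)
```

Genes in the two method-specific groups are the candidates for expanding the regulatory network beyond what either technique identifies alone.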
Although both ChIP-chip and ChIP-PET technologies have been useful in studying protein-DNA interactions on a genome-wide scale, each method has its own set of limitations. In ChIP-chip, our observations are restricted to regions tiled on the array platform, and the resolution is limited by the size of the probes, their spatial distribution, and the average fragment length of the sonicated DNA hybridized to the arrays. In ChIP-PET experiments, the bacterial cloning and sequencing steps, as well as mapping issues, introduce scope for error. We suspect that a combination of more stringent mapping criteria and the inherent noise in the sequencing procedure may be responsible for the number of sequence reads that did not match perfectly to the genome. Moreover, as indicated by our sequence-depth analysis, the number of sequences obtained from ChIP-PET experiments can be a limiting factor, since more binding targets can be recovered through greater sequencing depth. Additionally, as in the case of ChIP-chip experiments, the resolution of binding is limited by the average DNA fragment size used in the ChIP experiment. We observed some of these limitations in this study, since a substantial number of OCT4 and NANOG targets identified by ChIP-PET did not have corresponding probes tiled on the arrays used in the ChIP-chip experiments. Apart from these limitations, it is also important to consider that binding sites may be differentially occupied at different times in the cell cycle, since the chromatin state changes over the cycle [22]. However, since it is currently not feasible to culture ES cells in a synchronized manner, such genome-wide analyses should be done with this caveat in mind. Another limitation of these studies is that the processing of ES cell samples can vary between laboratories and between batches of serum used to culture the cells. Finally, different binding results may be obtained due to differences in ES cell strains. With the availability of binding information from different cell strains [11], we can begin to address such issues.
Apart from ChIP-chip and ChIP-PET, other ChIP-based methodologies, such as ChIP-SACO (serial analysis of chromatin occupancy) [23] and STAGE (sequence tag analysis of genome enrichment) [24], have been used to determine protein-DNA interactions on a genome-wide scale. Most recently, ChIP-Seq [25], a sequencing-based technology, has aimed to address many of the issues encountered by other currently used techniques, such as genome coverage, sequencing depth and binding resolution. With this rapid change in technologies, it will be important to investigate the results obtained from these techniques and incorporate them into our current understanding of regulatory networks. Importantly, the use of multiple techniques has been shown to produce variations in the information obtained through individual platforms [14]. Using the data obtained through these different methodologies in a complementary fashion provides a more thorough foundation for further investigating these networks.