Chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq) is increasingly being applied to study transcriptional regulation on a genome-wide scale. While numerous algorithms have recently been proposed for analysing large ChIP-seq datasets, their relative merits and potential limitations in practical applications remain unclear.
The present study compares state-of-the-art algorithms for detecting transcription factor binding sites in four diverse ChIP-seq datasets under a variety of practical research settings. First, we demonstrate how the biological conclusions may change dramatically depending on which algorithm is applied. We then investigate the reproducibility of the detections across biological replicates as an internal validation. Finally, the binding sites predicted by each method are compared to high-scoring binding motifs as well as to binding regions confirmed in independent qPCR experiments.
In general, our results indicate that the optimal choice of computational approach depends heavily on the dataset under analysis. In addition to giving users of this technology valuable information about the characteristics of the binding site detection approaches, the systematic evaluation framework also provides a useful reference for developers of improved algorithms for ChIP-seq data.