The analysis of the recently available ChIP-seq data on 8 histone modification marks and 13 TF binding sites in mES cells confirmed the distinct chromatin signatures associated with promoters and enhancers. We did not observe any significant correlation between the histone modification patterns and the binding of the 13 TFs, probably because none of these factors is involved in chromatin modification. The unexpected correlations between several histone marks and the binding strength of TFs (Table S3 in Additional file 2) still need further validation, and the underlying molecular mechanisms remain to be determined.
Histone modifications reflect the epigenetic state of a cell, which provides useful information to map the functional activities of regulatory elements. In this study, we present a new computational model called Chromia that integrates sequence motifs and chromatin signatures to predict target loci of TFs. We have demonstrated that our method outperforms many other methods. When comparing the predicted target genes of four TFs with the genes affected by knocking down these TFs, we found that Chromia identified more TF target genes than did the binding peaks of these TFs. This observation is not totally unexpected because histone modifications are tightly related to function, and it illustrates the usefulness of Chromia for predicting functional TFBSs.
Our approach has several advantages. First, antibodies specific to many histone marks are already available, and therefore chromatin modification profiles can be readily obtained for many organisms, tissues and cell lines. Second, this approach does not rely on the assumption that TFBSs are evolutionarily conserved, which allows identification of fast-evolving or species-specific TFBSs. Furthermore, the non-trivial problem of choosing genomes with appropriate evolutionary distance and aligning these genomes is avoided. Third, since histone modification patterns are condition-specific, our method provides an approach to identifying TFBSs that may be functional only in specific tissues or developmental stages. Fourth, our method is much more efficient than many methods for predicting TFBSs at the genomic scale.
It is also worth noting that our model suggests a way to combine discrete and continuous sources of information by converting DNA sequence information to continuous PSSM scores. Previous studies showed that, in many scenarios, a cluster of weak TFBSs may play significant roles in regulating gene expression. The PSSM score profile provides an overall characterization of binding preference of a TF at a genomic locus. This is captured by the HMM and integrated with the chromatin signature to pinpoint the binding sites of a TF.
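The conversion from discrete sequence to a continuous score profile can be illustrated with a minimal sketch. It is not the Chromia implementation: the 4-bp frequency matrix, the uniform background, the function names and the choice to summarise each bin by its maximum score are all assumptions made for illustration, and the sequence is assumed to contain only uppercase A, C, G and T.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif (one dict per position).
# The values are illustrative only and are not taken from the paper.
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]

# Assumed uniform background base composition.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_matrix(pfm, background):
    """Convert base frequencies into log2 odds against the background."""
    return [
        {base: math.log2(column[base] / background[base]) for base in "ACGT"}
        for column in pfm
    ]


def score_profile(seq, pssm):
    """Return the PSSM score of the motif at every start position in seq."""
    width = len(pssm)
    return [
        sum(pssm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    ]


def bin_max(profile, bin_size):
    """Summarise a score profile by the best score in each fixed-size bin."""
    return [
        max(profile[i:i + bin_size])
        for i in range(0, len(profile), bin_size)
    ]


pssm = log_odds_matrix(PFM, BACKGROUND)
profile = score_profile("TTACGTTT", pssm)  # one continuous score per start position
binned = bin_max(profile, 2)               # coarse-grained profile, one value per bin
```

A binned profile of this kind has the same shape as a binned histone modification signal, so both can in principle be supplied as continuous observations to a single model. With genomic data, the bin size would be on the order of the 100-bp resolution discussed in this paper rather than the 2 positions used here.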
Recently, several approaches have been proposed to predict TFBSs in mammalian genomes using chromatin structure information. For example, ProbTB combined multiple sources of data to identify TFBSs in 47 mouse promoters [43]. Whitington et al. used H3K4me3 as an additional filter to predict TFBSs in promoter regions. However, these studies are restricted to the small regions near TSSs. In contrast, we integrated chromatin signature and sequence motif information into one model and performed genome-wide prediction of TFBSs in both promoter and enhancer regions. We also demonstrated the superior performance of Chromia over the baseline method, which is in the same spirit as the Whitington et al. approach. Compared to our previous study [30], which aimed to find genomic regions of functional elements, including promoters and enhancers, here we were able to pinpoint TFBSs to 100-bp resolution by incorporating motif information, which also demonstrates the flexibility of our model to integrate additional data.
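For contrast with an integrated model, a filter-style approach can be sketched as follows. This is a minimal illustration of the general idea of using a histone mark as an additional filter on motif hits. It is neither the method of Whitington et al. nor the baseline evaluated in this paper, and the function name, the score threshold and the interval representation are hypothetical.

```python
from bisect import bisect_right


def filter_hits_by_mark(hits, enriched_intervals, threshold):
    """Keep motif hits that pass a score threshold and lie in a mark-enriched interval.

    hits: list of (position, score) tuples.
    enriched_intervals: sorted, non-overlapping half-open (start, end) tuples,
        for example regions called as enriched for a histone mark.
    threshold: minimum motif score for a hit to be considered.
    """
    starts = [start for start, _ in enriched_intervals]
    kept = []
    for position, score in hits:
        if score < threshold:
            continue
        # Index of the last interval that starts at or before this position.
        i = bisect_right(starts, position) - 1
        if i >= 0 and position < enriched_intervals[i][1]:
            kept.append((position, score))
    return kept


# Hypothetical motif hits and enriched regions on one chromosome.
example_hits = [(50, 6.0), (150, 7.5), (250, 3.0), (320, 8.0)]
example_regions = [(100, 200), (300, 400)]
example_kept = filter_hits_by_mark(example_hits, example_regions, threshold=5.0)
```

In a filter of this kind the mark acts as a hard yes-or-no gate applied after motif scanning, whereas the model described here treats motif scores and chromatin signals jointly.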
Although the performance of our method is very encouraging, there is no doubt still much room for improvement. Currently, only eight histone marks are mapped in the mES cells and not all of them are informative for locating regulatory elements. We expect that more histone marks with distinct patterns will help improve the performance of our method. We also observed that predictions for enhancers were less accurate than those for promoters. Recent studies suggested that enhancers might be more cell type specific than promoters [40]. The lower prediction accuracy for enhancers may therefore be due to the different cell lines used in the histone modification (murine V6.5 ES cells) and TF binding (murine E14 ES cells) experiments. Furthermore, we should point out that our HMM was trained on the chromatin signatures associated with the p300 binding sites, which might only represent a small subset of the histone modification patterns at enhancers. Therefore, the trained HMM may miss many enhancers with different chromatin signatures. When binding sites of other cofactors commonly appearing at enhancers are mapped, a more comprehensive collection of histone modification patterns can be established, and the performance of our method may be further improved. Another limitation of our method is that, like all methods that rely on binding motifs, it cannot distinguish TFs with very similar PSSMs (such as n-Myc and c-Myc). However, if more histone marks are mapped and these TFs are associated with distinct chromatin signatures, it may be possible to resolve the ambiguity in the binding of these TFs.
Chromia is available at [45