Genome-wide analyses of TF-DNA interactions ultimately aim to unravel the molecular mechanisms underlying the control of gene expression. As it has been established that the combinatorial input of different TFs on CRMs is a key determinant of the spatiotemporal regulation of gene expression, detection of TFBS co-occurrences provides an excellent starting point for identifying TF combinations with tissue-specific transcriptional outputs. To this end, we have developed COPS, a new computational tool that scans genomic sequences for statistically significant TF motif co-occurrences. The performance of COPS with respect to the biological significance of the detected co-occurring TFBS pairs was evaluated by analyzing three independent genome-wide datasets. In all cases, COPS successfully detected co-occurrence of each TF with its known co-regulators. As genome-wide data are available for several of these co-regulators, we could show that co-occurrence of these motif pairs indeed correlates with in vivo combinatorial binding. Furthermore, for all analyzed TFs, COPS reported co-occurrences with TFs involved in common tissue-specific processes. COPS can therefore be used to identify potential transcriptional co-regulators of a TF of interest.
One advantage of COPS is the use of an FP-tree-based data mining approach, which avoids costly candidate generation and testing and is therefore time-efficient, especially when mining large datasets. Moreover, the calculation of the statistical significance of the frequent motif patterns depends more on DNA sequence content than on sequence length, which removes the need to normalize motif frequencies against sequence length. Importantly, COPS is capable of efficiently scanning large datasets (e.g. the Snail DamID dataset consisting of 4000 sequences) for extensive motif collections. Overall, these properties render COPS a powerful, time-efficient and statistically reliable computational tool.
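To illustrate why an FP-tree avoids explicit candidate generation, the following is a minimal sketch of generic FP-growth mining, not the COPS implementation. It assumes each sequence has already been reduced to the set of motifs found in it; the motif labels, the function names `build_tree` and `fp_growth`, and the support threshold are hypothetical, and the significance testing performed by COPS is not reproduced here.

```python
from collections import defaultdict


class Node:
    """One node of the FP-tree: a motif label plus a count of shared prefixes."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return header table and supports."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = Node(None, None)
    header = defaultdict(list)  # motif -> all tree nodes carrying that motif
    for items, count in transactions:
        # Keep frequent motifs only, ordered by descending support (ties by name).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Return {frozenset(pattern): support} for all frequent motif patterns."""
    header, frequent = build_tree(transactions, min_support)
    patterns = {}
    for item, sup in frequent.items():
        pattern = suffix | {item}
        patterns[pattern] = sup
        # Conditional pattern base: the prefix paths that lead to `item`.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        patterns.update(fp_growth(conditional, min_support, pattern))
    return patterns


# Toy input (hypothetical): the set of motifs detected in each of five sequences.
sequences = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"},
             {"B", "C", "D"}, {"A", "B", "C", "D"}]
result = fp_growth([(s, 1) for s in sequences], min_support=3)
for pattern, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(pattern), sup)
```

In this toy input, the three pairs each occur in three sequences and are reported, whereas the triple A, B, C occurs in only two and is pruned without ever being enumerated as a candidate.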
Notably, COPS is not restricted to the detection of motif pairs; it can also be used to identify longer co-occurrence patterns, namely combinations of three or more TFBSs. However, the number of motif combinations expands dramatically when scanning for longer patterns using large motif collections, resulting in increased memory requirements and processing time. For example, when a Drosophila genome-wide dataset is scanned for combinations of three TFBSs using the whole collection of Drosophila TF motifs (75 motifs), COPS has to scan the sequences for a total of 67,525 three-motif patterns. Therefore, when possible, we advise COPS users to preselect a smaller subset of motifs for analyses involving longer patterns.
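The figure quoted above corresponds to the number of ways of choosing three motifs out of 75, and the same calculation shows how quickly the search space grows with pattern length. The short check below uses only the Python standard library; the helper name `n_motif_combinations` is introduced here for illustration.

```python
from math import comb


def n_motif_combinations(n_motifs: int, pattern_len: int) -> int:
    """Number of unordered motif combinations of a given length."""
    return comb(n_motifs, pattern_len)


# 75 motifs, as in the Drosophila collection mentioned in the text.
for k in (2, 3, 4):
    print(k, n_motif_combinations(75, k))
```

For 75 motifs this gives 2,775 pairs, 67,525 triples and 1,215,450 quadruples, which is why restricting the motif collection pays off for longer patterns.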
One feature of COPS not found in other sequence-screening tools is the detection of distance preferences between co-occurring motifs. The defined spatial organization of TFBSs on CRMs is critical for the proper assembly of functional regulatory complexes, since protein-protein interactions very often depend on favourable arrangements of BSs. Protein complex formation between TFs binding at adjacent BSs can explain how TFs with degenerate DNA-binding specificity precisely regulate their target genes in a cell type-specific manner, either by modifying their DNA recognition properties or by exhibiting a synergistic activity. Therefore, preferred close-distance arrangements of TFBSs reported by COPS raise the possibility of direct interactions between the respective TFs. As we showed in this study, genomic regions containing closely spaced TFBS pairs (as reported by COPS) are associated with similar gene classes that reflect the properties of the TFs of the pair. Close-distance arrangement of TF motifs may therefore, at least in some cases, indicate cell type-specific combinatorial activity of the respective TFs.
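As a rough illustration of what a distance preference looks like in the data, the sketch below bins the distances between hits of two motifs across a set of sequences. It is not the procedure used by COPS: the hit positions are hypothetical start coordinates, the function names and the 10 bp bin size are assumptions, and a real analysis would compare the histogram against a background expectation.

```python
from collections import Counter


def pair_distances(hits_a, hits_b):
    """Absolute distances between every hit of motif A and every hit of motif B."""
    return [abs(a - b) for a in hits_a for b in hits_b]


def distance_histogram(sequences, bin_size=10):
    """Count motif-pair distances per bin; keys are the lower bound of each bin."""
    hist = Counter()
    for hits_a, hits_b in sequences:
        for d in pair_distances(hits_a, hits_b):
            hist[d // bin_size * bin_size] += 1
    return dict(sorted(hist.items()))


# Toy input (hypothetical): per sequence, hit positions of motif A and motif B.
toy = [([100], [112]), ([40, 300], [55]), ([10], [400])]
hist = distance_histogram(toy)
print(hist)
```

An excess of counts in the lowest bins, relative to what randomly placed hits would give, is the kind of signal that would suggest a close-distance preference.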
When analyzing sequences with bioinformatics tools such as COPS, the quality of the PWMs should be taken into consideration. TF co-occurrences are likely to be falsely omitted if the reported PWMs fail to represent the actual binding profile of the respective TFs or if they are poorly annotated. Furthermore, false positive results may be obtained with degenerate PWMs, whose matches are frequently encountered in the genome and are therefore likely to be part of co-occurrence patterns in multiple genomic regions. An additional critical parameter is the definition of the genomic regions that will be considered as “TF-bound DNA regions”. In contrast to the regions defined by ChIP-Seq, which are usually a few hundred base pairs long, regions detected by DamID and ChIP-on-chip often extend up to several kilobase pairs. In such cases, the region that is bound by the TF in vivo is not always easy to detect because of a decrease in the signal-to-noise ratio. Moreover, like all computational approaches, COPS cannot capture several aspects of in vivo transcriptional regulation. First of all, recruitment of TFs can occur without the presence of recognizable TF motifs in the respective genomic region. In such cases, genomic localization of the TF is mediated via binding at distal sites followed by DNA looping or via protein-protein interactions. For instance, when Mef2a- and Nkx2.5-bound sequences were analyzed in our study, COPS failed to detect the BSs of the known co-regulator Srf, because Srf binding at target sequences relies largely on protein-protein interactions and to a lesser extent on the recognition of consensus sequences. In addition, as epigenetic modifications often define the accessibility of genomic regions to TFs, TFBSs detected by COPS might not be occupied in vivo. Therefore, genome-wide data on histone modifications could be used to optimize the interpretation of results obtained by COPS. Finally, detection of co-occurring BSs by COPS does not necessarily mean that the respective TFs act combinatorially on the CRMs; the BSs could also be occupied by the TFs in different tissues or at different developmental stages.
In sum, COPS is a potent computational tool for identifying potential transcriptional co-regulators that define context-dependent transcriptional outputs. In combination with genome-wide data for TF-DNA interactions, histone modifications and protein-protein interactions, COPS allows the elucidation of cell type-specific regulatory networks.