The problem of identifying functional TF targets that regulate gene expression, in a specific biological context, requires joint considerations of both TF DNA-binding data and the target gene's expression changes. We described a statistical framework for quantifying the evidence of TF-gene interaction from ChIP-seq data, and integrating them appropriate gene expression data to construct genome-wide signatures of TF activity.
Two main findings of our study are that 1) TREG binding scores derived from ChIP-seq data alone are more informative than simple alternatives that can be used to summarize ChIP-seq data; and 2) TREG signatures that integrate the binding and gene expression data are more sensitive in detecting evidence of TF regulatory activity than available alternatives. We show that this advantage of TREG signatures can make the difference between being able and not being able to infer TF regulatory activity in complex transcriptional profiles. This increased sensitivity also showed to be critical in establishing connections between disease and drug signatures that would not be possible using currently available strategies.
Identifying the role of specific TFs in producing disease-related transcriptional profiles is of vital importance for understanding the molecular mechanisms underlying disease phenotype. Although it is possible to obtain direct measurements of TF activity in disease samples 
, such ChIP-seq profiling is technically challenging and systematic profiling of many different TFs is not feasible. Therefore, the ability to infer the role of a TF from the transcriptional profiles remains challenging. The most common strategy of implicating TF involvement is by computational analysis of genomics regulatory regions of differentially expressed genes 
, or by searching for enrichment of known targets among differentially expressed genes 
. Here we present an alternative strategy relying on direct concordance analysis between TREG signatures of TF activity and disease-related transcriptional profiles. When searching for evidence of regulation by the TF with functional binding sites in distant enhancers, such as ERα, and “messy” transcriptional signatures resulting from activity of multiple regulatory programs, our approach dramatically improves the precision of the analysis.
Our results indicate that TREG signatures derived from in-vitro experiments (ERα; MCF-7 cells), and even from a different organism (E2f1; mouse) provide effective means for analyzing transcriptional profiles derived from human tissue samples. This would indicate that TF binding profiles coming from any biological system under which TF shows signs of activity might be sufficiently informative to construct TREG signatures. In this context the recently released ENCODE project data 
may be turned into a powerful tool for detecting TF activity. As a step in this direction, we have created 494 TREG binding profiles using the ENCODE ChIP-seq data and made it available from the support web-site (http://GenomicsPortals.org
). Complementary gene expression data generated by directly perturbing specific TFs, such as shRNA knock-downs and overexpression experiments can be used to construct TREG signatures. For example, transcriptional signatures of such systematic perturbations that is being generated by NIH LINCS project (http://LincsProject.org
) could provide complementary transcriptional profiles for ENCODE ChIP-seq data.
Our methods are complementary to methods used to analyze the recently released ENCODE project data 
. For some experimental conditions, the ENCODE project provides additional data types that can be used in assessing the functionality of TF binding peaks, such as distribution of specific epigenetic histone modifications. For discussion on how to possibly incorporate this additional information within TREG methodology, please see supplemental discussion (Text S1
Up-regulated expression of proliferation genes is a hallmark of neoplastic transformation and progression in a whole array of different human cancers 
. While the core transcriptional signature of proliferation is recognizable in a wide range of biological systems and diseases, the events and pathways that drive the transcriptional program of proliferation vary widely. Increased expression of proliferation-associated genes has been associated with poor outcomes in breast cancer patients 
. However, the driver mechanisms in many aggressive cancer types are poorly understood. Inhibiting known driver pathways, such as ERα signaling in breast cancer often leads to treatment resistant tumors due to activation of alternative, poorly understood driver pathways 
. Using the signatures of such “driver events/pathways” we can identify candidate drugs capable of inhibiting them. In our analysis of ERα activity in ER+ breast cancers we showed that such an approach can highlight connections between disease and drug candidates that would be missed by simply correlating disease and drug transcriptional signatures