Transcription factors (TFs) play important roles in the regulation of many biological processes, such as cell proliferation, cell cycle progression, and apoptosis [1]. Aberrant expression or activation/inactivation of TFs has been implicated in a variety of human cancer types [3]. Indeed, a large number of oncogenes and tumor suppressor genes encode TFs [7]. P53, the most extensively studied tumor suppressor gene, is mutated in over 50% of human cancers, and most of these mutations impair its capacity for transcriptional activation [8].
Associations between TF expression and patient survival have been demonstrated in various cancer types [9]. Banham et al. showed that, among patients with diffuse large B-cell lymphoma (DLBCL), the FOXP1-positive group had significantly decreased overall survival compared with the FOXP1-negative group (P = 0.0001) [12]. Anttila et al. found that the expression level of cytoplasmic AP-2alpha, a transcription factor, is positively correlated with patient survival in epithelial ovarian cancer [15]. In lung adenocarcinoma, positive thyroid transcription factor 1 (TTF1) staining is strongly correlated with patient survival [11]. In gastric cancer, expression of the transcription factor Sp1 is negatively correlated with patient survival [13]. These studies indicate the importance of TFs in cancer as well as their prognostic value for predicting clinical outcome. Nevertheless, the association between TF activities (the capability of a TF to regulate gene expression) and patient survival has not previously been investigated systematically, owing to the lack of high-throughput techniques for measuring TF activities.
In cancer research, microarray technologies have been widely used to identify differentially expressed genes [16], to classify tumor samples into different subtypes [17], and to predict clinical outcome from gene expression profiles [18]. In general, however, gene expression profiles in microarray data represent the downstream readout of a few genetic alterations such as mutations, amplifications and deletions [19]. The regulatory mechanisms underlying the observed expression changes (e.g. alterations in TF activities) are often not directly observable from microarray data because of the relatively low abundance of TF mRNAs and post-transcriptional modifications to TFs. That is, the mRNA expression levels of TFs may not reflect their protein abundance or their transcriptional regulatory activities. Consequently, a mutation in the P53 gene, for instance, may not be reflected by a change in its own expression; we are more likely to observe differential expression of its target genes. It is therefore useful to infer changes in TF activity in cancers from the expression changes of their target genes.
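The idea of reading TF activity from target-gene expression can be illustrated with a small sketch. The code below is not the BASE algorithm; it is a simplified running-sum score, written under the assumption that each gene has one expression change and one binding affinity value for the TF of interest. The function name `activity_score` and the toy numbers are hypothetical.

```python
def activity_score(expression, affinity):
    """Simplified illustration (not BASE): signed maximum deviation between
    an affinity-weighted and an unweighted cumulative sum over genes sorted
    by expression change."""
    # Rank genes from most up-regulated to most down-regulated.
    order = sorted(range(len(expression)),
                   key=lambda i: expression[i], reverse=True)
    fg_total = sum(abs(expression[i]) * affinity[i] for i in order)
    bg_total = sum(abs(expression[i]) for i in order)
    if fg_total == 0 or bg_total == 0:
        return 0.0  # no usable signal
    fg = bg = 0.0
    best = 0.0
    for i in order:
        # Foreground grows only where the TF is predicted to bind.
        fg += abs(expression[i]) * affinity[i] / fg_total
        # Background grows with every gene, regardless of binding.
        bg += abs(expression[i]) / bg_total
        deviation = fg - bg
        if abs(deviation) > abs(best):
            best = deviation
    return best


# Toy example: five genes, expression changes relative to a reference.
expr = [2.0, 1.5, -0.5, -1.0, -2.0]
score_up = activity_score(expr, [1, 1, 0, 0, 0])    # targets are up-regulated
score_down = activity_score(expr, [0, 0, 0, 1, 1])  # targets are down-regulated
```

A positive score suggests that predicted targets cluster among up-regulated genes, and a negative score suggests that they cluster among down-regulated genes. Note that the TF's own mRNA level never enters the calculation.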
For many microarray cancer data sets, patient survival information after diagnosis is also provided. With this kind of data at hand, we propose a method to infer TF activities and to systematically identify TFs that are associated with patient survival. Given gene expression profiles for tumor samples, we use the BASE method [20] to infer TF activities from the expression changes of their target genes. Because the complete list of target genes for human TFs is generally not available, we used computational methods to predict TF-gene regulatory relationships by examining the occurrence of TF binding sites (represented as position weight matrices, PWMs) within the promoter-proximal regions of genes. The resulting TF-gene binding affinity profiles were used together with gene expression profiles as inputs to the BASE algorithm to infer the activities of TFs (PWMs) in each patient sample. We obtained 565 PWMs from the TRANSFAC database [22] and inferred their activities (which reflect the activities of the TFs that bind them) in each sample of the given microarray cancer data. We then identified all PWMs whose activities were significantly correlated with patient survival.
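As a minimal sketch of how a PWM can be used to look for a binding site in a promoter sequence, the code below scores every window of a sequence with a log-odds score and reports the best one. It is an illustration under stated assumptions, not the scoring scheme used in this work: the 4-position matrix `TOY_PWM`, the uniform background frequency, and the function names are all hypothetical, and the sequence is assumed to contain only A, C, G and T.

```python
import math

# Hypothetical 4-position PWM: one dict of base probabilities per position.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def window_score(pwm, window):
    """Log-odds score of one sequence window against the background."""
    return sum(math.log2(column[base] / BACKGROUND)
               for column, base in zip(pwm, window))


def best_site(pwm, promoter):
    """Return (score, start position) of the best-scoring window."""
    width = len(pwm)
    if len(promoter) < width:
        raise ValueError("promoter is shorter than the PWM")
    # Slide the PWM along the sequence one base at a time.
    scores = [(window_score(pwm, promoter[i:i + width]), i)
              for i in range(len(promoter) - width + 1)]
    return max(scores)


score, position = best_site(TOY_PWM, "TTACGTGG")
```

Repeating this scan for each PWM over each gene's promoter-proximal region would give one affinity value per TF-gene pair, which is the kind of profile described above. A fuller treatment would also scan the reverse strand and could sum over all windows instead of keeping only the maximum.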
We applied our method to two microarray data sets: a breast cancer data set with ER-positive and ER-negative subtypes [18] and a leukemia data set [23]. In breast cancer, the activities of steroid nuclear receptors and the ATF/CREB family are significantly correlated with the disease-free survival time of patients. In leukemia, TAL1 (T-cell acute lymphocytic leukemia 1) activity is significantly correlated with patient survival. Further investigation of these TFs may provide new insight into the mechanisms of tumorigenesis in breast cancer and leukemia. Moreover, our method can be readily applied to other microarray cancer data sets.