Transcription factor knockout microarrays (TFKMs) provide useful information about gene regulation. Statistical methods that detect genes differentially expressed between the microarray data of the mutant and wild-type strains can identify the knockout targets of the knocked-out TF. However, the identified TF knockout targets may include false positives because of the experimental noise inherent in high-throughput microarray technology. Even when the identified TF knockout targets are true, this kind of statistical approach cannot reveal the molecular mechanisms by which a TF regulates them.
To solve these two problems, we developed a method that filters out the false positives among the original TF knockout targets (those identified by statistical approaches) so that the biologically interpretable TF knockout targets can be extracted. The method also generates experimentally testable hypotheses about the molecular mechanisms by which a TF regulates its biologically interpretable knockout targets. The method works as follows. First, a TF binding network was constructed from the ChIP-chip data deposited in the YEASTRACT database. Then, each original TF knockout target was deemed biologically interpretable if our path search algorithm could identify a path in the TF binding network from the knocked-out TF to that target. The identified path explains how the TF may regulate the target, either directly by binding to its promoter or indirectly through intermediate TFs. Once all the original TF knockout targets had been checked, the biologically interpretable ones were extracted and the false positives were filtered out. We validated the biological significance of our refined (i.e., biologically interpretable) TF knockout targets by assessing their functional enrichment, expression coherence, and prevalence of protein-protein interactions. The refined TF knockout targets outperform the original TF knockout targets on all three measures.
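The path-based filtering described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the path search algorithm, so a breadth-first search is assumed here, and the assumption is made that only TFs (nodes with outgoing binding edges) can act as intermediates. The function names (`find_regulatory_path`, `refine_targets`) and the toy network with placeholder names such as `TF_A` and `GENE_1` are hypothetical and do not come from YEASTRACT.

```python
from collections import deque


def find_regulatory_path(binding, knocked_out_tf, target):
    """Return a shortest path from the knocked-out TF to the target, or None.

    `binding` maps each TF to the genes whose promoters it binds
    (a directed TF binding network). Breadth-first search is an
    assumption; the source does not name the search strategy.
    """
    queue = deque([[knocked_out_tf]])
    visited = {knocked_out_tf}
    while queue:
        path = queue.popleft()
        for bound_gene in binding.get(path[-1], ()):
            if bound_gene == target:
                return path + [bound_gene]
            # Assumption: only TFs (keys of `binding`) can be intermediates.
            if bound_gene in binding and bound_gene not in visited:
                visited.add(bound_gene)
                queue.append(path + [bound_gene])
    return None


def refine_targets(binding, knocked_out_tf, original_targets):
    """Keep only targets reachable from the knocked-out TF, with their paths."""
    refined = {}
    for target in original_targets:
        path = find_regulatory_path(binding, knocked_out_tf, target)
        if path is not None:
            refined[target] = path
    return refined


# Toy network with placeholder names (hypothetical, not real yeast data).
toy_binding = {
    "TF_A": ["GENE_1", "TF_B"],
    "TF_B": ["GENE_2"],
    "TF_C": ["GENE_3"],
}
toy_original_targets = ["GENE_1", "GENE_2", "GENE_3", "GENE_4"]
toy_refined = refine_targets(toy_binding, "TF_A", toy_original_targets)

for gene, path in toy_refined.items():
    print(gene, " -> ".join(path))
```

In this toy example, `GENE_1` is retained as a direct target of `TF_A`, `GENE_2` is retained as an indirect target through the intermediate `TF_B`, and `GENE_3` and `GENE_4` are filtered out because no path from `TF_A` reaches them. Each retained path is the kind of mechanistic hypothesis the method is described as producing.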
By jointly analyzing the TFKM and ChIP-chip data, our method extracts the biologically interpretable TF knockout targets by identifying paths in the TF binding network from the knocked-out TF to these targets. The identified paths form experimentally testable hypotheses about the molecular mechanisms by which a TF may regulate its knockout targets. About seven hundred of the hypotheses generated by our method have been experimentally validated in the literature. Our work demonstrates that integrating different data sources is a powerful approach to studying complex biological systems.