Although genome-wide proteomic approaches are rapidly improving, the most widely available and cost-effective genome-wide expression data are still collected at the mRNA level. These experiments are carried out using either microarrays or, more recently, RNA sequencing (RNA-seq) (Wang et al., 2009). Commonly, studies examine cells under different experimental conditions, such as control versus drug treated or disease versus normal states, or as a time-series, for example, during cell differentiation. Quantitative changes in mRNA levels do not directly explain how cell signaling mechanisms are altered to induce changes in gene expression and, in turn, changes in cellular phenotype. Identifying such upstream regulatory mechanisms has therefore been the focus of many computational systems biology studies. Such understanding will enable us, among other things, to better control cell behavior with small molecules and, in turn, to translate this ability into therapeutics. The most popular approaches for interpreting changes in genome-wide gene expression include promoter analysis (Matys et al., 2006; Portales-Casamar et al., 2010), gene ontology (The Gene Ontology Consortium) or pathway enrichment analyses (Kanehisa et al., 2010), as well as reverse engineering of networks from mRNA expression data (Margolin et al., 2006). The ultimate goal of many of these approaches is to identify and rank potential target genes/proteins that, if knocked down or overexpressed, would explain the observed changes by, for example, reversing them. Such proteins may ultimately become drug targets.
Here, we present a rational approach called Expression2Kinases (X2K) to identify and rank putative transcription factors, protein complexes and protein kinases that are likely responsible for the observed changes in genome-wide mRNA expression. By combining data from chromatin immuno-precipitation (ChIP)-seq/chip experiments and/or position weight matrices (PWMs), protein–protein interactions and kinase–substrate protein phosphorylation reactions, we demonstrate how we can better identify regulatory mechanisms responsible for genome-wide differences in gene expression. The idea is to first infer the most likely transcription factors that regulate the differences in gene expression; then use protein–protein interactions to connect the identified transcription factors through additional proteins, building transcriptional regulatory subnetworks centered on these factors; and finally use kinase–substrate protein phosphorylation reactions to identify and rank candidate protein kinases that most likely regulate the formation of the identified transcriptional complexes.
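The three steps outlined above can be illustrated with a minimal Python sketch on toy data. Everything in it is an assumption made for illustration rather than the X2K implementation: the gene, transcription factor, protein and kinase names are invented, the helper functions `hypergeom_pvalue`, `rank_regulators` and `expand_with_intermediates` are hypothetical, over-representation is scored with a simple hypergeometric tail probability, and the subnetwork step keeps only intermediate proteins that bind at least two seed transcription factors.

```python
from math import comb


def hypergeom_pvalue(overlap, set_size, query_size, background):
    """P(X >= overlap) when query_size items are drawn without replacement
    from `background` items, of which `set_size` belong to the regulator."""
    upper = min(set_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(background - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(background, query_size)


def rank_regulators(query, library, background):
    """Rank regulators (TFs or kinases) by over-representation of their
    targets or substrates in `query`; smallest p-value first."""
    query = set(query)
    scores = []
    for regulator, targets in library.items():
        overlap = len(query & targets)
        if overlap == 0:
            continue  # regulators with no overlap are not scored
        p = hypergeom_pvalue(overlap, len(targets), len(query), background)
        scores.append((regulator, p))
    return sorted(scores, key=lambda item: item[1])


def expand_with_intermediates(seeds, ppi_edges, min_links=2):
    """Return the seeds plus every non-seed protein that interacts
    with at least `min_links` distinct seeds."""
    seeds = set(seeds)
    links = {}
    for a, b in ppi_edges:
        for protein, partner in ((a, b), (b, a)):
            if protein not in seeds and partner in seeds:
                links.setdefault(protein, set()).add(partner)
    intermediates = {p for p, partners in links.items() if len(partners) >= min_links}
    return seeds | intermediates


# Toy inputs (all names and sizes are invented for illustration).
de_genes = {"g1", "g2", "g3", "g5", "g6"}           # differentially expressed genes
tf_targets = {                                      # TF -> target genes
    "TF_A": {"g1", "g2", "g3", "g4"},
    "TF_B": {"g2", "g3", "g5", "g6"},
    "TF_C": {"g9", "g10", "g11", "g12"},
}
ppi_edges = [("TF_A", "P1"), ("TF_B", "P1"), ("TF_A", "P2"), ("P3", "P4")]
kinase_substrates = {                               # kinase -> substrates
    "K1": {"TF_A", "P1", "x1"},
    "K2": {"TF_B", "x2", "x3"},
    "K3": {"x4", "x5"},
}
N_GENES = 20      # assumed size of the gene background
N_PROTEINS = 20   # assumed size of the protein background

# Step 1: infer transcription factors from the differentially expressed genes.
ranked_tfs = rank_regulators(de_genes, tf_targets, N_GENES)
top_tfs = [tf for tf, p in ranked_tfs if p < 0.05]

# Step 2: connect the top TFs through shared interactors.
subnetwork = expand_with_intermediates(top_tfs, ppi_edges)

# Step 3: rank kinases by enrichment of their substrates in the subnetwork.
ranked_kinases = rank_regulators(subnetwork, kinase_substrates, N_PROTEINS)

print(ranked_tfs)
print(sorted(subnetwork))
print(ranked_kinases)
```

The same enrichment routine is reused for the first and third steps because both ask whether a regulator's known targets or substrates are over-represented in a query list; only the prior-knowledge library changes. A real analysis would substitute curated ChIP or PWM target sets, a protein–protein interaction network and a kinase–substrate database for the toy dictionaries, and would likely need multiple-testing correction.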
We show that candidate transcription factors, protein complexes and protein kinases are identified and ranked robustly, by cross-validating the method with additional data such as those from drug perturbations followed by genome-wide mRNA expression profiling. Furthermore, we demonstrate the application of the method in several case studies, for which we developed several visualization methods that present a global view of cell-fate trajectories at different layers of regulation. Altogether, X2K can rapidly advance our understanding of how cell signaling networks regulate gene expression by utilizing different modalities of prior knowledge. The X2K approach can assist in drug target discovery and help in unraveling drug mechanisms of action.