Cancers are caused by somatic mutations that lead to hallmark changes in cellular signaling systems [1], e.g., sustaining proliferative signaling, evading growth suppressors, activating invasion and metastasis, enabling replicative immortality, inducing angiogenesis, and resisting cell death. Large-scale efforts have been devoted to identifying somatic and germ-line mutations in large numbers of tumor samples, including The Cancer Genome Atlas (TCGA) project and the international network of cancer genomic projects [2]. A major thrust in cancer genomic research is to identify driver mutations and reconstruct the perturbed signaling pathways that underlie the hallmark behaviors. This body of information will shed light on the disease mechanisms of cancers, reveal novel drug targets, and, more importantly, guide patient treatment based on personal genetic information.
Cancer cells commonly accumulate a large number of mutations during development; some are cancer-causing (driver mutations), while others have no relation to cancer (passenger mutations) [5]. Delineating a pathway from mutations is challenging because mutations of genes on the same pathway are usually mutually exclusive within a tumor [6]; one therefore has to reconstruct such a pathway by compiling mutations across multiple tumor samples. This in turn requires identifying the tumor samples in which a common pathway is perturbed. Furthermore, because multiple aberrant signaling pathways underlie each tumor, deconvoluting large-scale cancer genomic data solely on the basis of genetic mutations is even more challenging [7].
In this study, we address the task of revealing perturbed signaling pathways by integrating genomic mutation data with functional genomic data, i.e., gene expression data. The main idea underlying our approach is to use modules of differentially expressed genes as readouts of signaling pathway perturbations, which enables us to reconstruct a signaling pathway by finding the mutations that are strongly associated with a gene expression module. We developed a framework that unifies ontology-guided knowledge mining and graph-based data mining to achieve this goal. Our methods recovered perturbations of many well-known cancer signaling pathways, and we conjecture that some of our results may help to discover novel pathways in cancers. In this paper, we applied our methods to the ovarian cancer data from TCGA as a test, and we believe this general approach can also be used to study other types of cancer.
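To make the idea of associating mutations with an expression module concrete, the following is a minimal sketch on invented toy data. It is not the ontology-guided, graph-based framework developed in this paper: a one-sided hypergeometric (Fisher-type) enrichment test is swapped in as a simple stand-in for "strongly associated", and the tumor identifiers, gene names, and the function `enrichment_p_value` are all hypothetical. The sketch also illustrates why mutually exclusive mutations may need to be pooled across tumors before an association becomes visible.

```python
from math import comb

def enrichment_p_value(mutated, perturbed, all_tumors):
    """One-sided hypergeometric tail probability.

    Probability of seeing at least the observed overlap between the tumors
    carrying a mutation and the tumors in which an expression module is
    perturbed, if the two sets were unrelated. (Stand-in statistic, assumed
    for illustration only.)
    """
    n = len(all_tumors)
    m = len(mutated & all_tumors)               # tumors carrying the mutation
    k = len(perturbed & all_tumors)             # tumors with the module perturbed
    x = len(mutated & perturbed & all_tumors)   # observed overlap
    tail = sum(comb(m, i) * comb(n - m, k - i) for i in range(x, min(m, k) + 1))
    return tail / comb(n, k)

# Toy cohort of ten tumors (hypothetical identifiers).
tumors = {f"T{i}" for i in range(1, 11)}

# Tumors in which a hypothetical expression module is differentially expressed.
module_perturbed = {"T1", "T2", "T3", "T4", "T5"}

# Hypothetical mutation calls. GENE_A and GENE_B never co-occur in a tumor,
# as might be expected for two genes on the same pathway.
mutations = {
    "GENE_A": {"T1", "T2"},
    "GENE_B": {"T3", "T4", "T5"},
    "GENE_C": {"T2", "T7", "T9"},   # passenger-like: scattered across tumors
}

p_a = enrichment_p_value(mutations["GENE_A"], module_perturbed, tumors)
p_b = enrichment_p_value(mutations["GENE_B"], module_perturbed, tumors)
p_c = enrichment_p_value(mutations["GENE_C"], module_perturbed, tumors)

# Pooling the mutually exclusive genes treats them as one candidate pathway.
pooled = mutations["GENE_A"] | mutations["GENE_B"]
p_pooled = enrichment_p_value(pooled, module_perturbed, tumors)

for name, p in [("GENE_A", p_a), ("GENE_B", p_b), ("GENE_C", p_c),
                ("GENE_A or GENE_B", p_pooled)]:
    print(f"{name}: p = {p:.4f}")
```

In this toy cohort neither GENE_A nor GENE_B alone covers all tumors in which the module is perturbed, but their union does, so the pooled set yields a smaller tail probability than either gene individually. A real analysis would additionally need multiple-testing correction and a principled way of choosing which genes to pool, which this sketch does not attempt.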