Breast cancer is a heterogeneous disease, with respect to both the cells of origin and the underlying molecular course [1]. Different series of cellular events may lead to malignancy, but the nature and sequence of many of the processes that go awry during tumorigenesis remain elusive. Such knowledge is essential for understanding the disease and designing effective treatment.
With this aim in mind, we devised a novel bioinformatics approach that takes advantage of the abundance of available functional genomics data. Integrating heterogeneous data allows the extraction of knowledge that is not evident when data of different types are examined separately, and it provides a holistic view of the functioning of the biological system on multiple levels [2]. Our approach goes beyond traditional microarray analysis because it treats Protein Interaction Network modules as gene groups in a joint differential expression and differential coexpression analysis.
First, we use Protein Interaction Network (PIN) modules to delineate biological processes as an alternative to 'textbook pathways'. Several approaches for partitioning a bio-molecular interaction network into sensible and coherent functional units have been proposed [4]. Here we define modules as the densely interconnected regions of the PIN, i.e., groups of proteins that are distinguishable from their neighborhood because the density of interactions among them is much higher than the density of their interactions with other proteins in the network. Proteins heavily interconnected by mutual interactions are likely to be involved in the same biological process [6]. Such a topology-based designation of modules is not constrained by the existing annotation of pathways and allows novel disease-specific modules to be found. This network-guided approach is especially useful in the study of cancer, since the disease proceeds through the step-wise accumulation of defects in biological processes whose nature is often not known [9]. In constantly evolving cancerous cells, signaling and metabolic pathways might be disrupted or modified to better serve the cells' needs, and particular genes might trade their usual housekeeping function for a different one; some natural control mechanisms might be turned off, while others could be triggered. These events can be reflected in changes in the level and cohesiveness of the gene expression profiles of the affected processes. We therefore subjected the network modules to differential expression and differential coexpression tests, searching not only for processes whose intensity changes, but also for those whose degree of coregulation differs between disease stages.
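The definition of a module as a densely interconnected region can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the module detection procedure used in this study: the function name `module_densities`, the toy protein names, and the choice of comparing internal edge density with boundary edge density are all hypothetical. It assumes an undirected network given as a list of unique protein pairs without self-interactions.

```python
def module_densities(edges, module):
    """Return (internal density, boundary density) for a candidate module.

    edges  : iterable of (protein_a, protein_b) undirected interactions,
             assumed unique and without self-interactions
    module : collection of proteins forming the candidate module
    """
    module = set(module)
    nodes = set()
    internal = boundary = 0
    for a, b in edges:
        nodes.update((a, b))
        inside = (a in module) + (b in module)
        if inside == 2:        # both endpoints in the module
            internal += 1
        elif inside == 1:      # edge crossing the module border
            boundary += 1
    n_in = len(module)
    n_out = len(nodes - module)
    possible_internal = n_in * (n_in - 1) // 2
    possible_boundary = n_in * n_out
    d_in = internal / possible_internal if possible_internal else 0.0
    d_out = boundary / possible_boundary if possible_boundary else 0.0
    return d_in, d_out


# Toy network (hypothetical proteins): a triangle A-B-C attached to a chain.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"),
             ("C", "D"), ("D", "E"), ("E", "F")]
d_in, d_out = module_densities(toy_edges, {"A", "B", "C"})
print(f"internal density = {d_in:.2f}, boundary density = {d_out:.2f}")
```

In the toy network the triangle A, B, C is fully connected internally while only one of the nine possible links to the remaining proteins exists, which is the kind of contrast the topology-based definition relies on.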
The identification of network elements whose altered activity is associated with disease has been pursued in several recent studies. Choi et al. (2005) constructed coexpression networks specific for cancer and normal tissue and identified the pairs of Gene Ontology categories most often represented by pairs of differentially coexpressed genes [10]. A similar approach was adopted by Xu et al. (2008), who integrated topological features of coexpression networks with differential coexpression analysis to identify network modules activated in cancer [11]. Knowledge of the protein interaction network has been used to identify network markers, i.e., subnetworks differentially expressed in breast cancer [12], while Mani et al. used the B-cell interactome and expression data to identify interactions disrupted in lymphoma [13]. None of those studies, however, explores the possibilities offered by combining protein interaction network and gene expression data with joint differential expression and differential coexpression analyses.
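To illustrate why the two analyses are complementary, the following sketch compares a single module between two conditions using two summary statistics: the mean expression level (differential expression) and the mean pairwise Pearson correlation among its genes (differential coexpression). This is a minimal illustration only; the function names, the gene identifiers, the toy values, and the choice of summary statistics are assumptions and do not represent the statistical tests applied in this work. Assessing significance would additionally require, for example, a permutation scheme.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def module_mean_expression(expr, module):
    """Average expression over all genes and samples of the module."""
    return mean(mean(expr[g]) for g in module)


def module_mean_correlation(expr, module):
    """Average Pearson correlation over all gene pairs of the module."""
    pairs = combinations(sorted(module), 2)
    return mean(pearson(expr[g], expr[h]) for g, h in pairs)


# Hypothetical expression profiles (gene -> values across four samples).
healthy = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 3, 5, 7]}
tumor = {"g1": [1, 2, 3, 4], "g2": [8, 6, 4, 2], "g3": [7, 1, 5, 3]}
module = {"g1", "g2", "g3"}

delta_expression = (module_mean_expression(tumor, module)
                    - module_mean_expression(healthy, module))
delta_correlation = (module_mean_correlation(tumor, module)
                     - module_mean_correlation(healthy, module))
print(f"change in mean expression:  {delta_expression:+.3f}")
print(f"change in mean correlation: {delta_correlation:+.3f}")
```

In this toy data the mean expression of the module is unchanged, whereas the mean pairwise correlation drops from 1 to about -0.33. A test on expression levels alone would miss such a module, which is the motivation for analysing both quantities jointly.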
We applied our approach to expression data from a study of a mouse model of mammary gland tumor by Li et al. [14], in which the tumor was induced by the expression of the fusion oncogene ETV6-NTRK3 in epithelial cells. The ETV6-NTRK3 oncogene encodes a chimeric tyrosine kinase [15], whose expression leads to the formation of human secretory breast carcinoma [17]. Tumor development was monitored by measuring gene expression from the healthy stage, through the hyperplastic stage (showing abnormal growth, but not yet invading surrounding tissues), to the more aggressive carcinoma stage. A particular advantage of these data for the present study is that they are cell specific. In contrast to many studies of gene expression in solid tumors, in which the measurements reflect averages over different cell types, the use of a molecular marker allowed Li et al. to separate the oncoprotein-expressing epithelial cells from healthy cells before RNA extraction.
We tested for differential expression of the PIN-derived network modules and for changes in the correlation within the modules associated with the transition between disease states. Additionally, we investigated a possible role of microRNAs (miRNAs). Since miRNAs form an integral part of the cellular regulatory network, they could contribute to the disruption of vital cellular processes [18]. There is growing evidence of an important function of miRNAs in cancer-related processes such as differentiation, proliferation and apoptosis [19]. Cancer-related miRNAs fall mainly into two classes: oncogenic miRNAs, which are upregulated in cancer, and miRNAs with tumor suppressor activity, which are downregulated in cancer [20]. In the present work, we identified miRNAs that target genes in modules with altered expression and are thus potentially responsible for the observed expression differences.
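One possible way to relate miRNAs to an altered module is to count how many genes of the module each miRNA targets and to score that overlap against a background gene set. The sketch below uses a hypergeometric tail probability for the scoring; this scoring choice, the function names, the miRNA labels and the target lists are all hypothetical and serve only as an illustration, not as a description of the procedure used in this study.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K are targets."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / comb(N, n)


def rank_mirnas(targets, module, background):
    """Rank miRNAs by over-representation of their targets in a module.

    targets    : dict mapping miRNA name -> collection of target genes
    module     : genes of the module with altered expression
    background : all genes considered in the analysis
    """
    background = set(background)
    module = set(module) & background
    results = []
    for mirna, genes in targets.items():
        genes = set(genes) & background
        hits = len(genes & module)
        p = hypergeom_tail(hits, len(genes), len(module), len(background))
        results.append((mirna, hits, p))
    return sorted(results, key=lambda r: r[2])  # smallest p-value first


# Hypothetical toy data: ten background genes, one three-gene module.
background = {f"g{i}" for i in range(10)}
altered_module = {"g0", "g1", "g2"}
toy_targets = {
    "miR-A": {"g0", "g1", "g2"},        # all targets fall in the module
    "miR-B": {"g0", "g5", "g6", "g7"},  # a single target in the module
}
ranked = rank_mirnas(toy_targets, altered_module, background)
for mirna, hits, p in ranked:
    print(f"{mirna}: {hits} module targets, p = {p:.4f}")
```

A real analysis would draw the target lists from a miRNA target prediction resource and would need to correct for testing many miRNAs and modules.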