Genome-wide mRNA expression data provide a rich resource for studying the molecular mechanisms of complex diseases. Through comparison of mRNA expression data between case and control samples, biomarkers and functional molecules significant for diagnosis, prognosis, and treatment have been identified for many complex diseases, including cancers [1]. Extracting signals while rejecting noise in the data, and interpreting the results to elucidate biological mechanisms relevant to disease, are, however, challenging [3]. Lists of hundreds of mRNAs identified as differentially expressed are interesting but can be difficult to interpret in terms of the complex underlying biological processes. In addition, there is in many cases limited overlap between the lists of individually dysregulated genes identified by different laboratories that study the same disease [3]. To overcome these challenges, a number of methods have been developed that consider genes not as individual entities but as members of biologically relevant groups. Among such methods, gene set enrichment analysis (GSEA, [4]) is very powerful and highly popular.
While quite useful for system-level analyses, GSEA and similar methods, such as gene set analysis (GSA, [5]), have a limitation: they focus only on the molecules (i.e., genes) that comprise a pathway and may neglect the changing interactions among genes within a pathway. Consequently, only pathways enriched in individual differentially expressed genes are detected with statistical significance. However, gene interactions and the dynamics of these interactions are also essential components of pathways, and they underlie the orchestration of biological processes at many levels [6]. Interactions are associated with several dynamic characteristics, such as their direction, strength, permanence or transience, and presence or absence [6]. The biological influence of a pathway can change dramatically if the dynamics of the interactions in the pathway are altered. Indeed, several studies have demonstrated that changes in the dynamics of interactions are associated with cancer and other diseases [7].
In this vein, Zhang et al. proposed a method in which the interactions were represented by the covariances or correlations between case and control classes, and showed that this approach provides biologically meaningful results [8]. Eddy et al. developed another method called DIfferential RAnk Conservation (DIRAC), which is based on the relative expression ranks of genes in a pathway [10]. A limitation of this method, however, is that it assesses the change in the relationship between genes only qualitatively and misses cases in which (i) changes in expression are not large enough to change the relative order of genes or (ii) the difference between the expression levels becomes even larger. Watkinson et al. defined the synergy among pairs of genes in terms of the mutual information between phenotype and the clustering of samples induced by the gene expression levels [12] and extracted disease-specific interactions in cancer. Another class of algorithms for system-level analysis of differential gene expression aims to identify dysregulated subnetworks in disease [2]. Using protein-protein interaction (PPI) networks as a template for assessing functional associations among genes, these methods identify groups of functionally related genes that exhibit collective mRNA-level differential expression with respect to disease, using mutual information, cover-based algorithms, and other approaches [13]. These results strongly suggest that dysregulation of interactions is as important a mechanism of disease as dysregulation of genes.
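The limitation of purely rank-based comparison noted above, case (ii), can be illustrated with a toy example. The sketch below is not part of DIRAC or of any published implementation; the helper `rank_order` and the expression values are made up for illustration. It shows a gene pair whose relative order is the same in both classes even though the gap between the two expression levels widens.

```python
def rank_order(a, b):
    """Return the relative order of two expression values as a label."""
    if a > b:
        return "A>B"
    if a < b:
        return "A<B"
    return "A=B"

# Made-up (gene A, gene B) expression levels; illustration only.
control = (5.0, 3.0)
case = (9.0, 3.0)

# The relative order is identical in both classes ...
print(rank_order(*control), rank_order(*case))

# ... while the difference between the two expression levels grows.
print(control[0] - control[1], case[0] - case[1])
```

A comparison that looks only at the ordering sees no change here, whereas a quantitative measure of the pair does.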
In order to further explore the dysregulation of gene interactions in disease, we have developed Gene Interaction Enrichment and Network Analysis (GIENA), which implements four mathematically simple yet powerful interaction profile functions to model gene interactions. The hypothesis behind the analysis, suggested by the work described above, is that dysregulation of interactions, like the dysregulation of individual genes revealed by GSA, is an important set of variables to analyze in order to provide a comprehensive understanding of mechanisms of disease. GIENA attempts to provide a set of interaction profiles that are associated with universal biological concepts. These profiles and their biological interpretations for a pair of genes are as follows: (i) the sum of mRNA expression levels models cooperation, (ii) the difference between mRNA expression levels models competition, (iii) the maximum mRNA expression level models redundancy, and (iv) the minimum mRNA expression level models dependency. We then use canonical pathway information to drive a specific network analysis to identify hub genes that may mediate communication across pathways. This framework provides a basis for interrogating the dynamics of multiple types of interactions and gives clues to the regulatory logic of the perturbed networks, both within and across pathways, as opposed to simply identifying the dysregulated players.
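The four interaction profiles can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GIENA implementation: the function names, the signed form of the difference, the made-up expression values, and the use of Welch's t statistic to compare case and control profiles are all assumptions introduced here, since the text above does not specify them.

```python
from math import sqrt
from statistics import mean, stdev

# The four pairwise profile functions named in the text, keyed by the
# biological interpretation attached to each.
PROFILES = {
    "cooperation": lambda a, b: a + b,     # sum
    "competition": lambda a, b: a - b,     # difference (signed form is an assumption)
    "redundancy": lambda a, b: max(a, b),  # maximum
    "dependency": lambda a, b: min(a, b),  # minimum
}

def interaction_profile(name, gene_a, gene_b):
    """Apply one profile function sample by sample to two expression vectors."""
    f = PROFILES[name]
    return [f(a, b) for a, b in zip(gene_a, gene_b)]

def welch_t(x, y):
    """Welch's t statistic; an ASSUMED choice of test, not taken from the text."""
    return (mean(x) - mean(y)) / sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))

# Made-up expression values for one gene pair (illustration only).
case = {"geneA": [8.0, 9.0, 10.0], "geneB": [2.0, 3.0, 1.0]}
ctrl = {"geneA": [5.0, 6.0, 4.0], "geneB": [5.0, 4.0, 7.0]}

for name in PROFILES:
    p_case = interaction_profile(name, case["geneA"], case["geneB"])
    p_ctrl = interaction_profile(name, ctrl["geneA"], ctrl["geneB"])
    print(name, p_case, p_ctrl, round(welch_t(p_case, p_ctrl), 2))
```

Each profile turns a gene pair into a single per-sample value, so any two-class test applied to individual genes could in principle be applied to the pair instead.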
We evaluated these four interaction profiles using previously published mRNA expression datasets associated with cancer [15]. We detected multiple disease-associated gene interactions, which we annotated with their biological significance and compared with findings in the literature to validate the results. We also applied the approach to data from different experimental studies to examine the robustness of the method. We then constructed gene interaction networks from the detected interactions and analyzed them to better understand potential novel connections between pathways and to provide testable hypotheses for future experimental validation. Our results show that GIENA reliably detects both known and novel dysregulated canonical pathways and dysregulated interaction networks related to the disease. In addition, the method gives consistent results across datasets from disparate laboratories. Overall, GIENA is a systematic approach for the identification of dysregulated interactions at the pathway level and provides specific guidance for interpretation of disease-specific interactions in complex diseases.
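The network step described above can be sketched as follows. This is a hedged illustration only: the gene and pathway labels are placeholders, and defining a hub as a gene with at least `min_degree` partners and at least one partner in another pathway is an assumption made here, not a criterion stated in the text.

```python
from collections import defaultdict

# Hypothetical detected interactions (gene pairs) and pathway memberships.
interactions = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3"), ("g4", "g5")]
pathway_of = {"g1": "P1", "g2": "P1", "g3": "P2", "g4": "P3", "g5": "P3"}

def build_network(pairs):
    """Build an undirected adjacency map from a list of gene pairs."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors

def cross_pathway_hubs(neighbors, pathway_of, min_degree=3):
    """List genes with >= min_degree partners that also reach other pathways."""
    hubs = []
    for gene, nbrs in neighbors.items():
        # Pathways of the partners, excluding the gene's own pathway.
        linked = {pathway_of[n] for n in nbrs} - {pathway_of[gene]}
        if len(nbrs) >= min_degree and linked:
            hubs.append((gene, len(nbrs), sorted(linked)))
    # Highest-degree genes first.
    return sorted(hubs, key=lambda h: -h[1])

network = build_network(interactions)
print(cross_pathway_hubs(network, pathway_of))
```

In this toy network only `g1` qualifies, because it has three partners and two of them belong to pathways other than its own.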