The phenotypic heterogeneity of human cancers presents major challenges to advancing our understanding of disease mechanisms as well as to developing effective strategies for therapeutic design. This heterogeneity is also reflected at a molecular level in the variations in activity of cell signaling pathways that control cell growth and determine cell fate, processes critical for driving the cancer phenotype. Recent studies describing in-depth analyses of gene mutations in a number of human cancers have emphasized the importance of placing such data in pathway-specific contexts (
Ding et al., 2008;
Jones et al., 2008;
Network, 2008;
Parsons et al., 2008;
Wood et al., 2007). Certain biological processes do represent relatively simple series of biochemical events linked in an orderly fashion, such as the known biochemical pathways associated with energy metabolism. However, extending this notion of a linear pathway is neither useful nor appropriate as a description of the events associated with complex cellular responses to environmental inputs such as growth stimulation. Rather, the signaling events represent activities in complex networks of multiple signaling modules that each respond to given inputs (
Segal et al., 2004). A module is the unit of signaling activity; one example is PI3K phosphorylating Akt to activate its kinase activity, another is cyclin D/Cdk4 phosphorylating Rb to eliminate its negative control of E2F. These modules are defined by the biochemical events that they mediate. They are assembled into
pathways by virtue of the nature of the signaling processes, but this is fluid, variable and context-dependent. For instance, PI3K can be activated by Ras, but PI3K can also be activated by a variety of other signaling events, so the
PI3K module is part of the Ras pathway in one setting but part of another pathway in a different setting. Ultimately, the complex assemblage of these signaling modules constitutes the signaling network that is activated in response to a particular input under a defined set of conditions.
The Ras signaling network, frequently altered in human cancers, exemplifies modular structure. Ras controls numerous processes related to cell proliferation and fate through interactions with secondary effectors (
Shaw and Cantley, 2006). Mutations in Ras can alter its ability to interact with specific effectors, decoupling the downstream activities into discrete modules that contribute complementary activities critical to the initiation and maintenance of tumors (
Lim and Counter, 2005;
White et al., 1995). Of nearly a dozen effectors identified, the Raf kinase, RalGEF, and phosphoinositide-3-kinase (PI3K) modules are studied most thoroughly (
Mitin et al., 2005). Since particular modules are connected to specific characteristics of the tumor phenotype, an unbiased catalog of the modules that make up pathways, together with the means to measure them, will prove valuable in efforts to pinpoint the precise modules that drive a tumor phenotype.
It is thus critical to develop methods to assay the activity of individual signaling modules as the basic units of signaling activity. Measuring protein phosphorylation is one possible approach, but it is limited by the availability of reagents to carry out the assay (usually antibodies), the sensitivity of the measurements, and the capacity to do this on a scale sufficient to eventually reconstruct the signaling network. Gene expression data offers an accessible, useful source for these measurements. Ultimately, cell signaling events lead to changes in gene expression; thus, regardless of whether the module directly involves transcriptional activity, the eventual result of the signaling process will be a change in gene expression. Further, whole-genome measures of gene expression from DNA microarray analysis provide data rich enough to discern subtle distinctions in signaling events.
Genome-scale expression data has a proven ability to characterize the complex biological diversity in tumors or cell lines (
Bild et al., 2006b;
Segal et al., 2004). Multiple studies have shown that altered activity of a pathway, such as that caused by amplification of MYC or mutation in RAF, leads to distinctive patterns of gene expression: the expression signatures of the pathways (
Adler et al., 2006;
Solit et al., 2006). Even pathways that operate primarily through post-translational mechanisms such as phosphorylation cascades leave recognizable gene expression signatures (
Bild et al., 2006a;
Huang et al., 2003;
Sweet-Cordero et al., 2005). For these pathways, the genes in the signatures reflect the downstream transcriptional consequences of protein-level regulation; while those genes may not coincide with the ones in the primary cascades, they nevertheless provide measures of upstream pathway activity. This suggests that the complexity of pathway machinery is reflected in the complexity of the expression data; we therefore need analysis methods to deconvolute this complexity and identify the contributions of fundamental pathway modules.
To address this central question of deciphering pathway complexity, we have developed an approach to deconstruct pathways into underlying modules based on structure observed in gene expression profiles (
Bild et al., 2006a;
Lamb et al., 2006). Our approach builds on statistical factor analysis methods (
Brunet et al., 2004;
Carvalho et al., 2008;
Lucas et al., 2006;
Seo et al., 2007). By centering the analysis on the genes in a pathway, the method produces a set of pathway-related
signatures that we hypothesize represent the activities of the modules of the pathway. To exemplify and test the approach, we deconstruct the Ras signaling and E2F transcriptional regulatory pathways to reveal a series of module signatures that can predict drug sensitivity and dissect clinical outcomes in practically meaningful ways. This generates a deeper understanding of the complexity of pathway function by elucidating the modules reflected in the natural variability of genomic expression structure. The analysis also creates opportunities for therapeutic advances through the identification and characterization of clinically relevant pathway modules that may now be more specifically targeted with drugs.
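The idea of reading a module signature out of pathway-restricted expression data can be illustrated with a deliberately simplified sketch. It is not the statistical factor model used in this work: as a stand-in, it extracts a single factor (the leading principal component, found by power iteration) from a small genes-by-samples matrix. The function names, the toy expression values, and the choice of a one-factor decomposition are all assumptions made for illustration only.

```python
import math


def center_rows(matrix):
    """Subtract each gene's mean expression across samples."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([value - mean for value in row])
    return centered


def project(loadings, x):
    """Per-sample score: loading-weighted sum of centered gene expression."""
    n_samples = len(x[0])
    return [sum(w * row[s] for w, row in zip(loadings, x)) for s in range(n_samples)]


def leading_factor(matrix, n_iter=200):
    """Return (loadings, scores) for one factor of a genes x samples matrix.

    loadings: one unit-normalized weight per gene (a hypothetical 'signature').
    scores:   one value per sample (a hypothetical 'module activity').
    This is plain power iteration, a stand-in for a full factor model.
    """
    x = center_rows(matrix)
    n_genes = len(x)
    loadings = [1.0 / math.sqrt(n_genes)] * n_genes  # deterministic start
    for _ in range(n_iter):
        scores = project(loadings, x)
        updated = [sum(v * s for v, s in zip(row, scores)) for row in x]
        norm = math.sqrt(sum(v * v for v in updated))
        if norm == 0.0:
            break  # no variation left to explain
        loadings = [v / norm for v in updated]
    return loadings, project(loadings, x)


# Toy data (invented): rows are pathway genes, columns are samples.
# Genes A-C rise together in samples 3 and 4; gene D is essentially flat.
expression = [
    [1.0, 1.2, 3.0, 3.1],  # gene A
    [0.9, 1.1, 2.8, 3.2],  # gene B
    [2.0, 2.1, 4.1, 3.9],  # gene C
    [1.5, 1.4, 1.6, 1.5],  # gene D
]

signature, activity = leading_factor(expression)
print("gene loadings:", [round(w, 3) for w in signature])
print("sample scores:", [round(a, 3) for a in activity])
```

In this toy setting the co-varying genes receive the large loadings and the flat gene receives a loading near zero, while the sample scores separate the samples in which the gene group is elevated from those in which it is not. A realistic analysis would estimate several factors at once, with sparsity and uncertainty handled by the model.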