The epidermal growth factor receptor (EGFR) is a transmembrane receptor belonging to the receptor tyrosine kinases, which relay extracellular signals through phosphorylation cascades that ultimately elicit cellular responses. Proteins of this kind are often implicated in cancer because mutations or overexpression lead to aberrant signaling and, as a result, excessive proliferation [1-3]. The main adaptors for EGFR are GRB2 and Shc, which activate the mitogen-activated protein kinase (MAPK) pathway via RAS. ERBB2 binding sites are more promiscuous, enabling the respective dimers to activate not only the MAPK pathway but also the phosphoinositide 3-kinase (PI3K) pathway. These are the two major pathways in ERBB signaling and are responsible for cell proliferation, cellular survival and anti-apoptosis [4]. Cross-talk between these pathways also exists, offering potential bypass routes in the protein network (Figure ). Because overexpressed EGFR is associated with poor prognosis in head and neck squamous cell carcinoma (HNSCC), cetuximab, a monoclonal antibody targeting the receptor, is used in common therapeutic strategies [5]. However, many HNSCC patients do not respond or develop resistance, which is suspected to result from aberrant activation of EGFR pathways [6,7]. To improve such targeted therapy, it would be beneficial to gain insight into the molecular specifics of the targeted pathway in each patient [8]. Thus, in a personalized medicine approach, the relevance of the pathway should be established prior to treatment. To this end, common gene activity patterns among sample subsets are detected and used to stratify patients based on their gene expression profiles.
Gene expression microarrays are a widely used tool for measuring genome-wide transcription in cell lines or tissues under varying conditions. Usually, gene-wise statistical tests, for example based on linear models, are then performed to determine differentially expressed genes [9]. Methods that test for overrepresentation of functional gene sets or pathway genes, so-called gene set enrichment analysis (GSEA), are employed to interpret the resulting long lists of differential genes [10-12]. To monitor the activity of particular pathway components or transcription factors (TFs), gene sets of TF target genes, which can be retrieved from databases such as TRANSFAC, are of special interest [13]. Another aspect of data analysis is revealing gene expression patterns of patient or gene groups by clustering or dimension reduction techniques [14]. A number of specialized methods have been proposed, for example, clustering genes and patients simultaneously into biclusters [15], applying predefined gene signatures in guided clustering approaches [16], or reconstructing signal flow in pathways from the downstream effects of perturbation experiments [17].
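As a rough illustration of the overrepresentation idea behind gene set analysis, the following minimal sketch computes an upper-tail hypergeometric p-value for the overlap between a list of differential genes and a set of TF target genes. The function name `overrepresentation_p`, the gene identifiers and the toy numbers are all hypothetical and chosen only for illustration. Published GSEA methods differ in their statistics and in how they correct for multiple testing, so this should be read as one simple variant under stated assumptions rather than the procedure of any cited work.

```python
from math import comb


def overrepresentation_p(universe, gene_set, selected):
    """Upper-tail hypergeometric p-value P(X >= observed overlap).

    universe: all measured genes; gene_set: e.g. targets of one TF;
    selected: genes called differential. All names are illustrative.
    """
    universe = set(universe)
    gene_set = set(gene_set) & universe    # restrict to measured genes
    selected = set(selected) & universe
    n_universe, n_set, n_selected = len(universe), len(gene_set), len(selected)
    overlap = len(gene_set & selected)
    total = comb(n_universe, n_selected)   # all possible lists of this size
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_selected - k)
        for k in range(overlap, min(n_set, n_selected) + 1)
    )
    return tail / total


# Hypothetical toy data: 20 measured genes, 5 TF targets, 4 differential genes.
universe = [f"g{i}" for i in range(20)]
tf_targets = ["g0", "g1", "g2", "g3", "g4"]
differential = ["g0", "g1", "g2", "g10"]

p_value = overrepresentation_p(universe, tf_targets, differential)
print(f"overrepresentation p-value: {p_value:.4f}")
```

In this toy case three of the four differential genes are TF targets, and the tail probability is 155/4845, or about 0.032. In practice one such test would be run per gene set, followed by a multiple-testing correction.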
Fertig et al. have proposed a new method, Coordinated Gene Activity in Pattern Sets (CoGAPS) [18], and made it available as an add-on package for the popular free statistical computing software R [19]. It combines a matrix factorization technique with GSEA of downstream transcriptional targets to determine patterns of pathway activity. They now demonstrate its utility for studying cetuximab resistance in HNSCC by analyzing gene expression patterns downstream of EGFR [20].
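To make the matrix factorization step more concrete, the sketch below factors a small non-negative expression matrix (genes by samples) into an amplitude matrix and a pattern matrix. It uses ordinary non-negative matrix factorization with multiplicative updates as a stand-in. This is an assumption made for illustration; it is not the CoGAPS algorithm and not its R interface. The names `nmf`, `amplitude` and `pattern`, and the toy data, are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(d, n_patterns, n_iter=200, seed=0, eps=1e-9):
    """Approximate non-negative d (genes x samples) as a @ p.

    a: genes x patterns (amplitudes), p: patterns x samples.
    Uses multiplicative updates for the squared reconstruction error.
    """
    rng = random.Random(seed)
    n_genes, n_samples = len(d), len(d[0])
    a = [[rng.random() + 0.1 for _ in range(n_patterns)] for _ in range(n_genes)]
    p = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(n_patterns)]
    for _ in range(n_iter):
        # p <- p * (a^T d) / (a^T a p), element-wise
        at = transpose(a)
        num = matmul(at, d)
        den = matmul(matmul(at, a), p)
        p = [[p[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_samples)]
             for k in range(n_patterns)]
        # a <- a * (d p^T) / (a p p^T), element-wise
        pt = transpose(p)
        num = matmul(d, pt)
        den = matmul(a, matmul(p, pt))
        a = [[a[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_patterns)]
             for i in range(n_genes)]
    return a, p


def reconstruction_error(d, a, p):
    """Sum of squared differences between d and a @ p."""
    approx = matmul(a, p)
    return sum((x - y) ** 2 for rd, ra in zip(d, approx) for x, y in zip(rd, ra))


# Hypothetical toy data: genes 0-1 are active in samples 0-1, genes 2-3 in samples 2-3.
expression = [
    [5.0, 5.0, 0.0, 0.0],
    [4.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 6.0, 6.0],
]

a_start, p_start = nmf(expression, n_patterns=2, n_iter=1)
amplitude, pattern = nmf(expression, n_patterns=2, n_iter=200)
err_start = reconstruction_error(expression, a_start, p_start)
err_end = reconstruction_error(expression, amplitude, pattern)
print(f"error after 1 iteration: {err_start:.4f}, after 200: {err_end:.4f}")
```

Each row of `pattern` describes how strongly one expression pattern is present in each sample, and each column of `amplitude` describes how strongly each gene contributes to that pattern. A gene set test on the genes with high amplitude in a pattern would then link that pattern to pathway or TF activity, which is the general idea described above.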