|Home | About | Journals | Submit | Contact Us | Français|
Small molecule inhibitors, such as lapatinib, are effective against breast cancer in clinical trials, but tumor cells ultimately acquire resistance to the drug. Maintaining sensitization to drug action is essential for durable growth inhibition. Recently, adaptive reprogramming of signaling circuitry has been identified as a major cause of acquired resistance. We developed a computational framework using a Bayesian statistical approach to model signal rewiring in acquired resistance. We used the p1-model to infer potential aberrant gene-pairs with differential posterior probabilities of appearing in resistant-vs-parental networks. Results were obtained using matched gene expression profiles under resistant and parental conditions. Using two lapatinib-treated ErbB2-positive breast cancer cell-lines: SKBR3 and BT474, our method identified similar dysregulated signaling pathways including EGFR-related pathways as well as other receptor-related pathways, many of which were reported previously as compensatory pathways of EGFR-inhibition via signaling cross-talk. A manual literature survey provided strong evidence that aberrant signaling activities in dysregulated pathways are closely related to acquired resistance in EGFR tyrosine kinase inhibitors. Our approach predicted literature-supported dysregulated pathways complementary to both node-centric (SPIA, DAVID, and GATHER) and edge-centric (ESEA and PAGI) methods. Moreover, by proposing a novel pattern of aberrant signaling called V-structures, we observed that genes were dysregulated in resistant-vs-sensitive conditions when they were involved in the switch of dependencies from targeted to bypass signaling events. A literature survey of some important V-structures suggested they play a role in breast cancer metastasis and/or acquired resistance to EGFR-TKIs, where the mRNA changes of TGFBR2, LEF1 and TP53 in resistant-vs-sensitive conditions were related to the dependency switch from targeted to bypass signaling links. Our results suggest many signaling pathway structures are compromised in acquired resistance, and V-structures of aberrant signaling within/among those pathways may provide further insights into the bypass mechanism of targeted inhibition.
Cell signaling pathways transduce input signals from extracellular to intracellular environments and determine various cell activities, including cell growth, proliferation, differentiation, migration, and apoptosis [1, 2]. Perturbation of a signaling network may occur when there are genetic alterations, such as DNA mutations and/or amplifications/deletions of a genomic region, or changes in gene expression (GE) [3, 4]. For example, the amplification or over-expression of the ErbB2 (HER2/neu) oncogene, that enhances various growth-related signaling activities  from receptor-level to effector-level , is commonly found in about 25% of breast cancer patients. In the majority of cancers, aberrant activities in signaling pathways are involved in various stages of tumor progression and metastasis [6–9].
Drugs targeting a signaling network, such as EGFR signaling pathway, often become ineffective as acquired resistance develops in cancer cells . Primary reasons for acquired resistance to EGFR family receptor targeted therapies include: secondary mutations of targeted genes (e.g., the EGFR T790M mutation ), transcriptional and post-translational up-regulation of RTKs (Receptor Tyrosine Kinases) both within the receptor-family (e.g. ERBB3/HER3 [12, 13]) and other kinases (i.e. IGF1R, MET, FGFR2, FAK, SRC family kinases [14–16]), the over-expression of ABC transporters , and the re-activation of targeted pathways . Moreover, tumor cells induce adaptive responses to targeted therapies  by rewiring in such a way that the adaptive signaling bypasses the inhibiting effects of initial treatments [4, 10, 17–19]. Therefore, rewiring of signaling networks plays a vital role as a non-genetic mechanism of acquired resistance [3, 14, 17, 18, 20]; targeting of which has the potential to improve the response durability of single kinase inhibitors [4, 5, 21]. However, reprogramming of signaling activities in acquired resistance inherently imposes increased uncertainties in the network structure when compared with their sensitive counterparts.
The functionality of biological networks is determined by their underlying architecture. Thus understanding, characterising, and analysing network structures are very important tasks in the field of systems biology . Statistical modeling approaches offer a great deal of flexibility in terms of scalability and the number of local features that can be incorporated . Moreover, as in other biological networks, signaling activities predicted using signaling data may be unreliable, whereas some crucial signaling links may not be predicted . Measurements of the signaling activities often yield noisy data. Therefore, for such data-driven signaling networks a statistical modeling approach such as exponential random graph models (ERGMs) or p* can be a suitable choice [22, 23]. The p1-model, a special class of ERGMs which was originally proposed by Holland and Leinhardt , models the probability of an edge formation in the observed network based on network statistics (e.g. node degree) and associating model parameters with those statistics [22, 23].
Measuring the probabilistic nature of pair-wise relationships is an important aspect of modeling a gene-gene relationship network. Particularly in cancer drug resistance, some relationships between gene-pairs may evolve in the resistant conditions to compensate for the inhibiting effects of the drugs used [10, 19]. Some gene-pairs may have higher probabilities of evolving correlations in resistant conditions than in sensitive conditions. Simultaneously, some gene-pairs having high correlations in sensitive conditions may become loosely correlated (or even independent) in resistant conditions. For example, Komurov et al. reported that genes of the glucose-deprivation response network are up-regulated in lapatinib- (an EGFR/HER2 dual inhibitor) resistant conditions, thus providing an EGFR-independent mechanism of glucose uptake in cancer cells . ErbB2-positive cancer cells largely depend on EGFR/ErbB2 signaling for their glucose uptake  which was recently reported as a major factor in oncogenic KRAS pathway mutations [25, 26]. Lapatinib mediates down-regulation of cell cycle machinery and up-regulation of cell cycle inhibitory complexes that are downstream of EGFR/ErbB2 signaling . Moreover, the inhibitory effect of lapatinib on EGFR/ErbB2 signaling in the sensitive condition was found to be associated with glucose starvation of cancer cells, and thus induced cancer cell death . However, in resistant conditions, up-regulation of activities involved in the glucose deprivation response network (and other hypoglycemic response networks) played an important role as a compensatory mechanism of glucose uptake in cancer cells for which tumors ultimately relapsed. Therefore, it can be hypothesized that genes involved in the process of cell proliferation and survival may evolve, in resistant conditions, to be highly correlated with the genes in the glucose deprivation response network in order to establish an alternate mechanism of glucose uptake in cancer cells, even though the inhibiting effects of lapatinib abrogated their dependencies on EGFR/ErbB2 signaling in sensitive conditions (See Fig 1 of .) Therefore, studying systematic characterizations of such differential dependencies among gene-pairs in resistant-vs-sensitive conditions, and their combined roles on particular genes’ dysregulations (in resistant-vs-sensitive) may reveal novel insights into mechanisms of acquired resistance.
Moreover, Komurov et al.  suggested that the drug resistance mechanism more likely occurs downstream of growth factor-mediated signaling pathways, such as Ras signaling, PI3K/AKT signaling, mTOR signaling, and others. However, an enormous number of diverse effector pathways may be involved in this process, making the prediction of biologically plausible hypotheses a challenging task. New computational approaches are needed to resolve such challenges in identifying the mechanistic underpinnings of acquired resistance.
Gene dysregulation is associated with aberrant signaling activities that are crucial for both cell growth and apoptosis in breast cancer . For example, dysregulation  and/or mutation [28, 29] of apoptosis-related genes may overcome the initial response to apoptotic stimuli, thereby conferring resistance to apoptosis. Sharifnia et al. recently reported that several kinases and kinase-related genes from the Src family (e.g. FGFR1, FGFR2 and MOS) can compensate the loss of EGFR activity across multiple EGFR-dependent models . Using unbiased gene-expression profiles of cells, their study revealed that over-expression of these EGFR-bypass genes plays a critical role in EGFR-independent activation of the MEK-ERK and PI3K-AKT signaling pathways in EGFR-mutant NSCLC cells. Recently, differential dependencies/associations were used to model rewiring in biological networks [31, 32]. Therefore, we hypothesize that differential associations between genes identified by modeling network reprogramming in resistant-vs-sensitive conditions could potentially explain gene dysregulation in acquired resistance.
In this study, we propose a computational framework to identify dysregulated signaling pathways in resistant-vs-sensitive conditions, and a possible mechanism of gene dysregulation in acquired resistance. The schematic diagram of our proposed framework is shown in Fig 1. We used two breast cancer cell-lines, SKBR3 and BT474, each having gene expression values measured under matched lapatinib-sensitive (parental) and lapatinib-resistant conditions. A gene-gene relationship network was constructed for each gene expression dataset by combining data-driven and protein-protein interaction (PPI) information indicative of both direct and indirect relationships between gene-pairs. Then we applied a fully Bayesian approach involving the p1-model to infer gene-pairs with differential posterior probabilities between these two conditions. Next, statistically significant dysregulated signaling pathways from KEGG, Reactome, and WikiPathway were identified by enriching putative aberrant pairs using literature curated signaling links. Finally, by proposing a novel pattern of aberrant pairs, called a V-structure, we identified possible mechanisms of dysregulation in resistant-vs-sensitive conditions that may be crucial for breast cancer metastasis and/or EGFR-TKI resistance. We hope such patterns revealed using our framework will lead to further insights into aberrant signaling activities in acquired resistance.
We developed a computational framework exploiting Bayesian statistical modeling to identify putative aberrant signaling links involved in acquired resistance. In this study, we hypothesized that aberrant signaling can be detected as differential probabilities of occurrence of gene-pairs in resistant-vs-parental conditions. Thus, after building gene-gene relationship networks individually from both parental and resistant conditions, a comparative study of edge probabilities in those two networks may reveal aberrant relationships due to acquired resistance.
Our framework constructs a gene-gene relationship network, GGR: = (S, R) by combining GE and PPI datasets, where S is a set of seed genes and R is a set of pair-wise gene relationships (Fig 1). Table 1 shows primary statistics for the GGR networks of both SKBR3 (GSE38376) and BT474 (GSE16179) cell-lines. For SKBR3 cell-lines (Parental and Resistant), we selected 897 seed genes comprised of 345 differentially expressed (DE) genes (Bonferroni corrected p-value ≤ 0.01), 370 genes from the Cancer Gene Census (CGC), and 502 and 479 linker genes from Resistant and Parental cell-lines, respectively. For BT474 cell-lines, we found 875 distinct seed genes comprised of 354 DE genes (Bonferroni corrected p-value ≤ 0.05), 357 CGC genes, and 477 and 489 linker genes from Resistant and Parental cell-lines, respectively. Note that to find DE genes in SKBR3 and BT474 cell-lines, two different p-value thresholds: 0.01 and 0.05 were used, respectively. This was done for two reasons: firstly, because the computational cost of using a conventional threshold of 0.05 with SKBR3 was prohibitive, and secondly, to ensure the numbers of DE genes in the two different cell-lines were comparable, and similarly for the sizes of the seed gene sets [for details see S1 Text].
Our approach constructs a GGR in a series of stages: an initial set of genes is obtained by combining DE and CGC genes. Edges are added corresponding to direct relationships between pairs of these genes. We then search for indirect relationships among gene-pairs for which direct relationships couldn’t be found, and where indirect relationships are found the linker genes and the edges connecting them are added to the network. For the SKBR3 cell-line, the initial gene set contained 704 genes obtained by combining 345 DE and 370 CGC genes, whereas for the BT474 cell-line, the initial gene set contained 698 genes obtained by combining 354 DE and 357 CGC genes. To define direct relationships among the genes in the initial sets, we chose the top 20% from the ranked list of all pair-wise absolute Pearson Correlation Coefficients (PCC). Thus, we identified 49,492 (in both parental and resistant condition) and 48,651 (in both parental and resistant condition) direct relationships in SKBR3 and BT474 cell-lines, respectively. We justified this choice of threshold by applying an approach proposed by Elo et al. which analyses the topological properties of a co-expression network in order to find an optimal cutoff value  [for details see S1 Text]. In searching for indirect relationships, we found that 502 and 479 linker genes connect 1,440 and 1,393 distinct gene-pairs (for which direct relationships were not found) with the help of 1,757 and 1,758 distinct PPI links, for SKBR3 resistant and parental cell-lines, respectively. Similarly, for BT474 Resistant and Parental cell-lines, 477 and 489 linker genes connect 1,572 and 1,517 distinct indirect gene-pairs along with 1,895 and 1,951 distinct PPI links, respectively. In both datasets (SKBR3 and BT474), to build two GGR matrices for resistant and parental conditions with similar sets of genes, we constructed the final set of seed genes as an intersection of the two individual seed gene sets for Resistant and Parental conditions. Hence, 502 and 479 linker genes from SKBR3 resistant and parental conditions were combined with 704 (DE CGC) genes to form 1,262 and 1,245 seed genes, respectively, and then finding an intersection of these two sets yielded a set of 897 genes. Similarly, combining 698 (DE CGC) with 477 and 489 linker genes from BT474 resistant and parental genes produced 1100 and 1101 seed genes, respectively, and intersecting these resulted in a final set of 875 genes. At the end of this process, the SKBR3 resistant and parental GGR networks contained 897 distinct seed genes (DE CGC Linker) with 52,560 and 52,510 gene-gene relationships (direct indirect PPI), respectively, and the BT474 Resistant and Parental GGR network contained 875 distinct seed genes with 51,998 and 51,972 gene-gene relationships, respectively. Note that for both SKBR3 and BT474 cell-lines, although the total number of final seed genes is the same for both resistant and parental conditions, their respective GGR networks may contain different numbers of gene-gene relationships.
After building the GGR networks for both resistant and parental conditions YkR and YkP separately, we conducted Bayesian inference of parameters using the p1-model to estimate posterior probabilities of gene-gene relationships in each network. We used a WinBUGS script used in our previous work  for this inference. We ran the MCMC (Markov Chain Monte Carlo) method for 15,000 iterations, where the first 10,000 iterations were considered as ‘burn-in’, and the next 5,000 iterations were used for sampling. Time-series plots indicated that all parameters converged within the first few thousand iterations (data not shown). In both networks, the posterior probability of each edge was estimated to be the proportion of the 5,000 sampled networks in which that edge was present.
We identified a gene pair (genei, genej) as putatively aberrant if its posterior probabilities and of appearing in each network (resistant and parental networks, respectively) are significantly different. To determine which gene-pairs had this characteristic, we calculated two odds ratios—OddsR and OddsP—as shown in Eqs (3) and (4) for each gene-pair (genei, genej). Note that since the two posterior probabilities used in these odds ratios may lie in different ranges, we normalized their values by dividing by their respective maximum values over all the gene-pairs in the respective sets. We then used two thresholds to define significance: first, we constructed the empirical distribution of odds ratios and chose only those gene-pairs which had odds ratios among the top 20%. For SKBR3 cell-lines, these threshold values were 2.53 and 1.66 for resistant and parental conditions, respectively, and for BT474 these values were 12.028 and 2.115, respectively. Next, we constructed empirical distributions of the posterior probabilities of the previously selected gene-pairs, and chose only those gene-pairs whose posterior probabilities were in the top 50% in their respective distributions. For SKBR3 resistant and parental cell-lines, these thresholds of posterior probabilities were 0.212 and 0.252, respectively, and for BT474 cell lines, 0.177 and 0.304, for resistant and parental conditions, respectively. More detailed explanations regarding these two types of thresholds are provided in the Supplementary Methods section in S1 Text. Thus, our framework finally selected 80,372 and 76,476 aberrant gene-pairs for SKBR3 and BT474 cell-lines, respectively, and we hypothesized that these aberrant gene-pairs have the potential to explain the mechanism of acquired resistance in breast cancer. Lists of all identified putative aberrant gene-pairs for both SKBR3 and BT474 cell-lines are shown in S1 Table.
To investigate the robustness of our approach, we compared the posterior probabilities with the initial PCC (Pearson Correlation Coefficient) values for each of the putative aberrant gene pairs as shown in Fig 2. We treated the posterior probabilities of the red gene-pairs [see Methods] as positive values and the posterior probabilities of the green gene-pairs [see Methods] as negative, and plotted their sorted values in descending order (Fig 2). Next, we constructed a scatter plot with corresponding absolute PCC values for each of these gene-pairs, sorted based on posterior probabilities. We added a trendline using a moving average with window size 25, to investigate whether this trendline was in any way similar to the trend observed in the posterior probabilities. Interestingly, for both SKBR3 and BT474 cell-lines, the trendlines of PCC values revealed a visually similar pattern to that of the corresponding posterior probability values. This confirms our expectation that our Bayesian analysis is sensitive to a signal in the PCC values that would be otherwise difficult to detect.
To measure the significance of signaling pathways in terms of aberrant signaling activities in acquired resistance, we conducted a hypergeometric test. In this test, we measured how significant was the overlap between the set of literature-supported signaling links  found in a particular signaling pathway with the set of putative aberrant gene-pairs in the same pathway. We identified all the signaling pathways from KEGG, Reactome, and WikiPathway databases for which the corresponding q-value (FDR corrected p-value) from the above hypergeometric test was < 0.05 in both SKBR3 and BT474 cell-lines as is shown in Fig 3. For both SKBR3 and BT474 cell-lines, 71.11% (32 out of 45), 62.5% (15 out of 24), and 57.38% (35 out of 61) signaling pathways from KEGG, Reactome, and WikiPathways, respectively, were found to be significantly enriched with aberrant signaling gene-pairs in acquired resistance (Fig 3). Again, for all corresponding KEGG, Reactome, and WikiPathway databases, such high percentages of enriched signaling pathways found in both SKBR3 and BT474 cell-lines indicates that our framework is consistent in terms of finding aberrant gene-pairs in both cell-lines. Complete enrichment results of this hypergeometric test are reported in S2 Table.
We conducted a literature survey for the putative dysregulated signaling pathways, and found that the aberrant activities in most of these pathways are strongly associated with acquired resistance to EGFR tyrosine kinase inhibitors (EGFR-TKIs) . EGFR (also known as HER1, or ErbB1) and EGFR 2 (also known as HER2/neu, or ErbB2) are cell surface transmembrane proteins, and members of the HER family of receptors. EGFR (in KEGG, Reactome, and WikiPathway) and ErbB2 (in Reactome) are reported to be frequently mutated and/or over-expressed in various types of cancer resulting in aberrant activities contributing to abnormal cell growth, survival, migration, and differentiation [35, 36]. However, over-expression and secondary mutations of both EGFR [11, 37–39] and ErbB2  are associated with acquired resistance to EGFR-TKI. Moreover, being key components of cell signaling systems, these RTKs control major downstream signaling pathways, i.e. Ras/Raf/MAPK (in KEGG and WikiPathway), PI3K-Akt (in KEGG and Reactome), FoxO (in KEGG), and Jak-STAT (in KEGG) that are crucial for cancer cell growth and survival . Moreover, as these downstream signaling pathways further regulate multiple downstream effector pathways (related to cell growth and survival), aberrant re-activation of those pathways provide a common mechanism to compensate for inhibition of targeted pathways, thereby conferring acquired resistance to EGFR-TKIs [4, 41, 42]. Interestingly, these signaling pathways (i.e. Ras, PI3K-Akt, FoxO, Jak-Stat signaling) were found as the top-most in the list of aberrant signaling pathways in both datasets (SKBR3 and BT474) based on the above hypergeometric test using KEGG database as shown in Fig 3. For ErbB4 signaling (in Reactome), recently it has been reported that in ErbB2-positive breast cancer cell-lines, ErbB4 was up-regulated at the protein level in vitro and re-activated PI3K-Akt signaling in resistant conditions compared to the sensitive condition, and the knock-down of ErbB4 induced apoptosis in both the lapatinib-resistant and trastuzumab-resistant cell-lines . Rap1 (in KEGG) and ras (in KEGG) signaling are activated by lung cancer oncogene CRKL whose focal amplification (secondary mutation) was reported to be associated with acquired resistance to EGFR inhibitor . Again, signals for cell proliferation and survival from activated AKT may transduce through several phosphorylated transcription factors, such as FoxO (in KEGG) , which indicates that the dysregulation of FoxO signaling pathway (in KEGG) may potentially be associated with resistance to EGFR-TKIs.
Our previous study found cross-talks between EGFR signaling and pathways triggered by other types of receptors, e.g. Notch, Wnt, IGF1R, GPCR, etc. which contributed to acquired resistance to lapatinib (an EGFR/Her2 dual inhibitor) . Here, we also found these pathways showing significant aberrant activities in acquired resistance to lapatinib [Fig 3]. The activation of IGF1R signaling (in KEGG, Reactome, and WikiPathway) is commonly reported to induce acquired resistance to EGFR-TKIs by many studies [46, 47], and its inhibition could down-regulate PI3K-Akt signaling, eventually inhibiting cell growth, providing co-inhibition of EGFR and IGF1R signaling a clinical success . Similarly, the Notch signaling pathway (in KEGG, Reactome, and WikiPathway) cross-talks with EGFR signaling in breast cancer, thus maintaining the cancer cell growth signal through MAPK and PI3K-Akt signaling . It is suggested that an improved drug-sensitivity could be achieved by down-regulating the Notch signaling pathway with specific inhibitors [49, 50]. Again, genes involved in Wnt signaling (in KEGG, Reactome, and WikiPathway) were up-regulated in the resistant condition in both breast and colon cancer when compared to the sensitive condition [51, 52], thus contributing to acquired resistance to EGFR-TKIs .
Targeting angiogenesis is another important aspect of anticancer therapies , as aberrant vascularity and hypoxia are directly associated with tumor growth and survival . In our analysis, we found aberrant angiogenic pathways including signaling by Vascular Endothelial Growth Factors (VEGFs) (in KEGG), Fibroblast Growth Factors (FGFs), and Platelet-Derived Growth Factors (PDGFs). It has been reported that the VEGF/VEGFR-2 feed-forward loop increases VEGF secretion in lung cancer via mTOR-dependent regulation that is required for the activation of downstream signaling , and the over-expression of VEGFR-1 reduces EGFR-TKIs sensitivity in different human cancer cells [3, 55]. Alternate activation of the FGFR signaling pathway (in Reactome) through the over-expressions of FGFR1 and FGF2 acts as a compensating mechanism for EGFR-TKIs  by maintaining signals for cell survival and proliferation in the downstream signaling pathways . Again, it has been recently reported that, in PDGFR signaling (in Reactome), transcriptional de-repression of PDGFR-β contributed to compensating for the effects of EGFR-TKIs in EGFR-mutant glioblastoma via an mTORC1- and extracellular signal regulated kinase-dependent mechanism .
The hippo signaling pathway (in KEGG) is associated with cell proliferation, apoptosis, organ size control, and stem cell self renewal . YAP is a transcription co-activator and oncoprotein , and plays a central role in cancer-related activities of the hippo signaling pathway . Huang et al. have recently reported that down-regulating YAP expression in various cell-lines can improve the sensitivity of erlotinib (an EGFR-TKI) and cetuximab (anti-EGFR drug) . We found the gene-pair AKT2:MYC as a signaling cross-talk between EGFR/ErbB and the TGF-β signaling pathway (in KEGG, Reactome, and WikiPathway) in our previous study . Recently, it has been reported that combined inhibition of EGFR-TKIs (erlotinib) and TGF-β type I receptor inhibitor may improve sensitivity of EGFR-TKIs in lung cancer without EGFR T790M mutation .
For both SKBR3 and BT474 cell-lines, the primary findings in this study with supporting references are summarized in Tables Tables22 and and3.3. In this table, for each aberrant pathway, we also show what percentages of predicted gene-pairs from Bayesian analysis were previously defined as direct relationships, indirect relationships, and PPI during the network modeling. It is apparent that substantial proportions of predicted pairs came from direct and indirect relationships in both SKBR3 and BT474 cell-lines. This also indicates the robustness of our Bayesian modeling in inferring gene-pair relationships. Note that in the above calculation, if a predicted pair was defined both as direct and PPI, or both as indirect and PPI, then we counted that as direct or indirect, respectively, since that prediction for that particular pair was made by our framework. Again, some of the predicted pairs (by Bayesian modeling) may not be defined as direct or indirect previously, because the definitions of the terms predicted (based on posterior probability from Bayesian modeling), direct, and indirect were based on thresholds calculated from the distributions of corresponding values [see Methods]. Thus, the enrichment test with literature supported gene-dependencies  along with the evidences from the above literature survey confirm that our framework is able to identify significantly dysregulated signaling pathways that have key associations with acquired resistance in cancer.
To compare the performances of our current framework with our previous one , we investigated which of the two frameworks identify a greater number of dysregulated signaling pathways from KEGG, Reactome, and WikiPathway databases, since we used similar gene expression datasets (SKBR3 and BT474) in both approaches. We conducted a hypergeometric test to measure the statistical significance of the overlap between the aberrant pairs and known signaling links . For that purpose, we defined the aberrant pairs in our previous approach  with oddsP and oddsR > 10.0, and posterior probabilities, and > 0.5. We found that greater percentages of pathways from KEGG, Reactome, and WikiPathway databases were found as perturbed (dysregulated) in acquired resistance when the current approach was used compared to the old one [Fig 4].
One of the main differences between these two approaches was in the definitions of the set of edges in GGR network models: the current approach used direct pairs and non-direct pairs (indirect pairs and PPI pairs), whereas the old approach only used direct pairs . Therefore, we conducted two experiments to investigate the importance of non-direct pairs in the new model. First, in aberrant signaling pathways that were detected by our current but not the previous model, we observed what percentages of enriched links (i.e. aberrant pairs found as known signaling links) were previously defined as non-direct (indirect and PPI) pairs in our current model. In both SKBR3 and BT474 cell-lines, we found that all such dysregulated pathways from KEGG, Reactome, and WikiPathway databases contained high percentages of non-direct (indirect and PPI) enriched links [S3 Table]. Second, in aberrant signaling pathways that were detected by both of our current and previous models and were ranked (based on enrichment q-values) high in the current model but low in the previous model, we observed what percentages of enriched links were previously defined as non-direct in our current model. Considering the rank difference ≥ 10 (an empirical cutoff threshold), we found that aberrant pathways from KEGG, Reactome, and WikiPathway databases that showed such behavior in both SKBR3 and BT474 cell-lines, also contained high percentages of non-direct (indirect and PPI) enriched links [S4 Table]. Therefore, we claim that our current model demonstrate enhanced performances in detecting dysregulated signaling pathways in acquired resistance compared with our previous model.
Next, we compared our framework with other published methods in terms of identifying the aberrant signaling pathways, specifically SPIA , DAVID , GATHER , ESEA  and PAGI . The first three methods (i.e. SPIA, DAVID, and GATHER) are node-centric methods, where the role of differentially expressed (DE) genes was the key to identifying dysregulated pathways. However, ESEA and PAGI are edge-centric methods, where topological information regarding pathway structures was significantly exploited. All of these methods use GE datasets, except DAVID and GATHER which take a list of DE genes as input and identify aberrant pathways, or pathways enriched with given DE genes, respectively. For this comparative analysis, we used KEGG signaling pathways only, and for all the methods default configurations were applied unless specified otherwise.
The SPIA method combines classical enrichment analysis and actual aberrant activities by analysing Cancer-Vs-Normal GE samples , and ranks corresponding signaling pathways by calculating a global pathway significance p-value, called pG. The global p-value (pG) is obtained by combining the perturbation probability (p-value: pPERT) and the probability of over-representation of DE genes (using log fold-change) in a particular pathway (p-value: pNDE) by using either Fisher’s method or the normal inversion method . Here, we conducted the same analysis but with Resistant-vs-Parental GE samples aiming to capture the aberrant activities responsible for acquired resistance. In the case of multi-probe sets for the same gene, we used the most significant probe to get a single log2 fold-change value per gene. For SKBR3 cell-lines, we found 5 signaling pathways as significant (raw pG-value ≤ 0.05) including Ras signaling, PI3K-Akt signaling, Rap1 signaling, hippo signaling, thyroid hormone signaling, and TGF-β signaling pathways. Interestingly, we found that the significance (−log(q-values)) of aberrant pathways found by our approach is strongly correlated with the global p-values (pG) found by SPIA analysis, both for all pathways (-0.4) and for above 5 signaling pathways only (-0.928). This indicates, in SKBR3 cell-lines, the signaling pathways from our framework with high enrichment of aberrant gene-pairs in acquired resistance are also consistent with the results from SPIA in terms of identifying aberrant activities. Again, for BT474, we found 12 signaling pathways with significant aberration (raw pG-value ≤ 0.05), i.e. hippo, p53, Ras, Rap1, PI3K-Akt, FoxO, Wnt, neurotrophin, insulin, estrogen, ErbB, and MAPK signaling pathways. Moreover, among these 12 signaling pathways, the first 6 had FDR-corrected pG ≤ 0.05, among which hippo signaling pathway had Bonferroni-corrected pG ≤ 0.05, as shown in Fig 5A. Among these 12 significantly dysregulated pathways in BT474 cell-line, we chose FoxO signaling to investigate further, since it was found highly perturbed by both SPIA (pPERT = 0.053) and our methods (enrichment q-value = 2.7 × 10−150). We observed perturbation plots for this signaling pathway (KEGG pathway ID = 04068), in which perturbations of all genes were plotted as a function of their initial log2 fold-change Fig 5B. Here, non-DE genes were assigned 0 for their log2 fold-change. However, many genes were identified as DE, since their absolute log2 fold-change values were mostly ~2. Again, compared to the null distribution of net accumulated perturbation values, the observed value was also found significant as shown with the red vertical line in Fig 5B. Next, we also drew the network view of the FoxO signaling pathway, where the nodes were the constituent genes (from KEGG), and the edges were the known signaling links from the literature . Here, we found 54 known signaling links that were also identified as aberrant gene-pairs by our method. Next, we plotted the heatmap of the expression values of the genes in these 54 known aberrant signaling links, where each expression value was the mean of all three replicates , z-transformed, and normalized with absolute max value (of the z-scores across the particular gene). Here, this heatmap not only shows the differential expression of the genes in aberrant gene-pairs but also indicates the similarities of their expression changes within this signaling pathway, which is a marker of aberrant activities in a modular way. Such differential gene expression in resistant-vs-parental conditions may indicate that pathway dysregulation within the signaling circuitry can be mediated by the corresponding aberrant gene-pairs.
As DAVID and GATHER both take as input a list of presumably differentially expressed genes for their pathway enrichments, we used the list of 703 and 683 distinct genes in the list of aberrant gene-pairs which were found by our framework from SKBR3 and BT474 cell lines, respectively. To detect statistically significant pathways using DAVID and GATHER we select those for which the raw p-values of their enrichment were < 0.05. For SKBR3 cell-line, DAVID and GATHER identified 15 and 5 signaling pathways as statistically significant, respectively. Again, for BT474 cell-lines they found 13 and 4 pathways as significant, respectively.
For both ESEA and PAGI analyses, we used our Resistant and Parental GE datasets for both SKBR3 and BT474 cell-lines. For both analyses, we used the default running parameters, except for the parameter nperm (the number of permutations) which was set to 1000. Both of these methods used a built-in set of topological structures of pathways from known pathway databases including KEGG. After running these methods with our GE datasets, if the identified signaling pathways had nominal p-value < 0.05, then we considered them as significantly dysregulated in resistant-vs-parental conditions. Thus, in the SKBR3 cell-line, we found 4 and 15 significantly dysregulated signaling pathways by ESEA and PAGI methods, respectively. For the BT474 cell-line, we found 2 and 12 signaling pathways significantly dysregulated by ESEA and PAGI, respectively.
All the dysregulated KEGG signaling pathways identified by any of these six methods are listed in Table 4. Some pathways were found consistently as dysregulated in both SKBR3 and BT474 cell-lines, but none were common to all six methods. However, our method identifies 33 KEGG pathways in both SKBR3 and BT474 cell-lines among which 17 were also identified by at least one of the other five methods (including both node-centric and edge-centric methods), for example MAPK, insulin (in DAVID, GATHER, and PAGI), ErbB, Wnt, B-cell receptor, Neurotrophin (in DAVID and PAGI), p53 (in DAVID and ESEA), and Jak-Stat signaling (in DAVID and GATHER). Moreover, our method identifies some novel dysregulated pathways in both SKBR3 and BT474 cell-lines which were not detected by any other methods. These pathways include Hif-1, AMPK, TNF and calcium signaling, which were reported to be involved in lapatinib-resistance in ErbB2-positive breast cancer cell-lines [4, 19, 67, 68]. Thus, the comparative identification of dysregulated pathways in resistant-vs-parental conditions in both SKBR3 and BT474 cell-lines indicates that our method is not only comparable to others but also able to detect novel findings which were validated by literature evidence.
To investigate the potential of the putative aberrant gene-pairs to characterise acquired resistance, we hypothesized that genes become dysregulated in acquired resistance because of the compensating effect of aberrant signaling that evolves in resistant-vs-parental conditions. In the simplest cases, this will involve both red and green aberrant edges incident upon a particular dysregulated gene. To investigate this hypothesis, we identified all genes with at least two aberrant links to observe which of two possible architecture types are associated with a larger number of dysregulated genes: 1) both red and green aberrant edges incident upon a gene (forming V-structures—see Methods for the definition), or 2) only red or only green aberrant edges incident upon a gene. Next, we identified the dysregulated genes among these for which the following was true: a gene is over-/under-expressed (in any patient sample) in PT-vs-PB conditions, but respectively under-/over-expressed in both RB-vs-PB and RT-vs-PB conditions, where PB, PT, RB and RT stand for ‘Parental Basal’, ‘Parental Treatment’, ‘Resistant Basal’ and ‘Resistant Treatment’, respectively. The rationale for using only such combinations is as follows. Both expression datasets of SKBR3 (GSE38376)  and BT474 (GSE16179)  cell-lines contain steady-state measurements of signaling activities, for both parental and resistant conditions. Therefore, we hypothesized that the expression changes of dysregulated genes in PT-vs-PB conditions may indicate the sensitivity of Lapatinib drug (EGFR/HER2 dual inhibitor) in the parental (sensitive) conditions whereas the opposite changes in expressions in both RB-vs-PB and RT-vs-PB conditions may indicate two things: 1) the cell-line had already became resistant to the drug for which the tumorigenic phenotype of cancer cells relapsed in the resistant condition (RB-vs-PB), and 2) the resistance characteristics of the cell-line persisted even with further treatment with lapatinib (RT-vs-PB). For each comparison, we examined the log2 of fold-change values, and the treatment and basal doses were 1.0 μM and 0 μM, respectively. We found that, for both SKBR3 and BT474 cell-lines, higher percentages of dysregulated genes were identified with both green and red aberrant signaling links compared to those with only a single type of incident edge (either red or green). For SKBR3 and BT474 cell-lines we identified 111 and 108 genes with degree ≥ 2, respectively. For the SKBR3 cell-line, 90 of the 111 genes had only one type of aberrant signaling link incident upon them, out of which 48 showed dysregulation (53.3%), whereas the remaining 21 of the 111 genes had both red and green aberrant signaling links, out of which 13 genes were dysregulated (62%). Similarly, for BT474 cell-lines, among the 108 genes with degree ≥ 2, 78 out of 102 (76%) of genes with only one type of aberrant link and 6 out of 6 (100%) of genes with both types of aberrant signaling links, exhibited dysregulation. These results suggest that for a dysregulated gene in resistant-vs-parental conditions, the expression changes that occur upon treatment in parental conditions are likely to be compensated by aberrant signaling link(s) that evolved in resistant conditions. Therefore, the initial effect of inhibitors on oncogene(s)/tumor suppressor gene(s) becomes abrogated by restoring their tumorigenic phenotype once the cell acquires resistance to that inhibitor. This experiment demonstrates that V-structures can explain an interesting mechanism of acquired resistance in cell-lines by associating the dysregulated gene(s) with both red and green aberrant signaling links.
From the list of all putative aberrant gene-pairs (after Bayesian analysis), we enumerated all possible V-structures. We first listed all of the genes in red aberrant pairs, and separately listed all of the genes in green aberrant pairs. We then identified the genes common to both lists, which we termed crossing-genes. Next, we aggregated aberrant gene-pairs incident upon crossing-genes and enumerated all possible pairs of a red and green edge incident upon that gene. Thus, we found 23,156 distinct Type-I V-structures [see Methods for Type-I, Type-II and Type-III V-structure definitions] in SKBR3 cell-lines using signaling pathways from KEGG, Reactome, and WikiPathway, out of which 53 V-structures were found in the literature-curated signaling network . Similarly for BT474, there were 5,271 distinct Type-I V-structures in all KEGG, Reactome, and WikiPathway signaling pathways, and 11 of them overlapped with the literature-curated network . For Type-II V-structures in SKBR3 and BT474 cell-lines, 1,525 and 263 distinct V-structures were found in all KEGG, Reactome and WikiPathway databases, respectively, out of which 29 and 4 V-structures were found in the literature-curated network , respectively. For Type-III V-structures in SKBR3 and BT474 cell-lines, 940 and 376 distinct V-structures were found in all KEGG, Reactome, and WikiPathway databases, respectively, where 18 and 10 V-structures overlapped with the literature-curated signaling network . A summary of these results for SKBR3 and BT474 cell-lines is provided in S5 and S6 Tables, respectively. Note that Type-I and Type-II V-structures have the potential to explain the role of signaling cross-talks in acquired resistance, but here we focus on Type-II V-structures only, since we have already investigated the role of signaling cross-talks in acquired resistance in our previous study  which are the similar kind of Type-I V-structures.
We investigated whether Type-II and Type-III V-structures can provide insights of a possible mechanism of acquired resistance in cancer cell-lines, focusing on the dysregulations of the crossing-genes in resistant-vs-parental conditions and its association with the GE changes of the other two genes in a particular V-structure. Our rationale was that the dysregulation of a crossing-gene may provide an indication that significant changes evolved in resistant-vs-parental conditions are associated with acquired resistance of cell-lines to a particular inhibitor. Moreover, significant GE changes in either of the two other genes (in the V-structure) would indicate that their differential associations with crossing-gene(s) may disrupt their functional coherence in signaling activities . Therefore, we considered the above-mentioned 13 and 6 dysregulated genes in SKBR3 and BT474, respectively, for further analyses in which gene-pairs in corresponding V-structures overlapped with known signaling links . Among the 13 dysregulated genes in SKBR3 cell-lines, 8 genes (CTNNB1, TP53, MYC, RAC2, LCK, PIK3R1, PIK3CA, and TGFBR2) were found in 22 (out of 29) literature-supported Type-II V-structures and 4 genes (CTNNB1, TP53, MYC, and PIK3CA) were found in 9 (out of 18) literature-supported Type-III V-structures (S5 Table). Similarly, among 6 dysregulated genes in BT474 cell-lines, 3 genes (CTNNB1, LEF1, and TP53) were found in 4 (out of 4) literature-supported Type-II V-structures and 4 genes (MET, TP53, CTNNB1, and LEF1) were found in 10 (out of 10) literature-supported Type-III V-structures (S6 Table). In Fig 6A, we show the network-view of the literature-supported Type-II V-structures incident upon the 8 and 4 dysregulated genes in SKBR3 and BT474 cell-lines, respectively, along with their annotated signaling pathways. Similarly, Fig 6B shows the Type-III literature-supported V-structures in both SKBR3 and BT474 cell-lines. Next, for each of the genes in the selected V-structures in Fig 6 we observed gene expression differences among all four conditions: PB (Parental Basal: 0 μM), PT (Parental Treatment: 1.0 μM), RB (Resistant Basal: 0 μM), and RT (Resistant Treatment: 1 μM) using both two-tailed paired t-tests and one-way ANOVA tests. For these statistical tests we used the mean expression value of all three replicates. In the t-tests, we compared the mean expression of all PT, RB and RT conditions with the mean of PB. Additionally, we also compared the mean of the RT condition with the means of the PT and RB conditions to observe 1) how a gene is behaving differently upon treatment in resistant-vs-parental conditions (RT-vs-PT), and 2) its expression changes upon treatment from its Resistant basal condition (RT-vs-RB). Moreover, one-way ANOVA tests (with the mean of PB as the control condition for the multiple comparison test) may indicate the significance of overall changes in all four groups. All of these statistical tests were done using GraphPad Prism 6.0 software. Concurrently, we also surveyed the literature to determine whether the observed significance of expression changes in resistant-vs-parental conditions were also supported by the literature. We found literature evidence (Fig 6C) supporting a role in breast cancer metastasis and/or in developing acquired resistance to EGFR-TKIs for the SMAD4 − TGFBR2 − RPS6KA2 (Type-II) V-structure in SKBR3, and SMAD4 − LEF1 − CCND2 (Type-II) and PTEN − TP53 − DDB2 (Type-III) V-structures in BT474 cell lines, respectively. Below we discuss these three V-structures in more detail.
Gene dysregulation plays an important role in developing acquired resistance to EGFR-TKIs in breast cancer [28–30, 100]. Here, along with literature-supported gene-gene associations in V-structures (Fig 6C), we demonstrated that the switch in dependency from the targeted signaling link involving green gene-pair (with the inhibitor) to the bypass signaling link involving red gene-pair (evolved in resistant conditions) is a possible mechanism mediating the dysregulation of crossing-genes in acquired resistance.
In this study, we proposed a computational framework that models signal rewiring by systematically characterizing potential aberrant signaling in acquired resistance. We hypothesized that an aberrant signaling link involved in acquired resistance may have differential probabilities of appearing (either higher, or lower) in resistant-vs-parental networks, where in each network, nodes were genes and the edges were the relationships among genes. In this gene-gene relationship network, called GGR, we considered both direct and indirect correlations (via linker genes) among genes for defining the edges that combine both data-driven (from gene expression) and topological (from PPI) information about gene-pairs. Note that the PPI edges in the statistically significant paths [see Methods], defining indirect relationships among gene-pairs for which direct relationships were not found, were also added to the final edge set [Table 1]. The rationale for including those PPI edges was: 1) to retain precise information regarding how indirect relationships were constructed, and 2) to better model the data-driven signaling networks (resistant and parental GGR networks) for the Bayesian statistical analysis (using p1-model) of their respective global structure formation. We used a fully Bayesian statistical model: a special class of Exponential Random Graph Model, called p1-model for inferring aberrant gene-pairs with differential posterior probabilities in resistant-vs-parental GGR networks, where these networks were constructed from matched gene expression values of resistant and parental conditions, respectively. When selecting aberrant gene-pairs, we chose the thresholds for Odds and posterior probabilities from their frequency distributions, sequentially. Firstly, we chose the gene-pairs with top 20% of odd-ratio values from two distribution individually (oddR and oddP) by ensuring their mutual exclusivity after selection, and termed them as red and green, respectively. Then, we further filter red and green pairs with top 50% of their respective posterior probability values. Note that before calculating the Odds values, we normalized both posterior probabilities (from resistant and parental conditions) with their corresponding max values over all gene-pairs, individually, in order to achieve same scaling. All other model parameters in p1-model were estimated with Gibbs sampling [see Methods].
After detecting putative aberrant pairs in resistant-vs-parental conditions, we analyzed them in two-ways, 1) Identifying potentially dysregulated pathways in acquired resistance, 2) Identifying their roles in explaining a possible mechanism of acquired resistance via dysregulation of crucial genes. Using two lapatinib-treated breast cancer cell-lines: SKBR3 and BT474, our method was able to predict similar pathways as dysregulated. The rationale for using these datasets for our experiments was that—to the best of our knowledge—these are only datasets available for responsive and resistant lapatinib-treated ERBB2-positive breast cancer cell-lines. Our results suggested that signal rewiring is a major event in acquired resistance since we found a range of dysregulated pathways in both SKBR3 and BT474 cell-lines including EGFR-related pathways (e.g. EGFR, ErbB2, PI3K-Akt, Mapk, Jak-Stat, FoxO signaling, etc.) as well as other receptor-related pathways (e.g. Notch, Wnt, insulin, PDGFR, FGFR, VEGFR signaling, etc). Although there may be some false-positives in those results, we found literature evidence from Huang et al.  that aberrant signaling in most of our predicted dysregulated pathways were actually related with acquired resistance in EGFR-TKIs. Furthermore, our predictions of network re-adjustment in multiple signaling pathways were also consistent with the results recently published by Stuhlmiller et al. . Their study suggested that multiple heterogenous kinases (e.g. DDR1, FGFRs, IGFI1, MET, etc.) compensate for the ErbB2 inhibition by kinome re-programming induced by lapatinib , which provides an indication that aberrant signaling activities in those kinase-related pathways are crucial for such bypass mechanism. Note that since the pathway annotations are still incomplete, we used three pathway databases here: KEGG, Reactome, and WikiPathways to define constituent genes of signaling pathways individually. However, to maintain the same true-relationship among those constituent genes we used literature-supported signaling links (collected from online resources of Wang Lab ) since it is the largest manually curated human signaling network as reported.
Gene dysregulation plays crucial roles in acquired resistance by mediating both uncontrolled cell-growth and disrupted apoptosis [27–29]. Here, to evaluate the potentialities of identified aberrant signaling, we conducted an analysis which demonstrated that the greater number of dysregulated genes were found in resistant-vs-parental conditions when they were incident with both red and green-types of aberrant pairs (V-structures) compared to those with single type only (either red, or green). Manual literature survey also validated some of the V-structures, such as SMAD4 − TGFBR2 − RPS6KA2, SMAD4 − LEF1 − CCND2, and PTEN − TP53 − DDB2, as consistent with our hypothesis. Thus, we claim that a mechanism of dependency shift from targeted signaling (by inhibitor) towards bypass signaling can potentially cause dysregulation of shared genes (crossing-genes). Similar idea of dependency switch was recently reported by Sharifnia et al.  that EGFR-dependent status of downstream signaling nodes can be modified by other over-expressed kinase-related genes that shared them (downstream signaling nodes) with EGFR-dependant signaling. However, to the best of our knowledge, our study is the first to emphasise the compensating effects of aberrant signaling upon mRNA expression changes of crucial genes by examining the dependency switch from targeted signaling to bypass signaling.
We included all the available genes from the Cancer Gene Census (CGC) into the list of seed genes in our framework for which gene expression data was available (see Methods): 370 and 357 genes in SKBR3 and BT474 cell-lines, respectively. Cancer genes are crucial for mediating various cancer related activities and many are hub genes in mammalian signaling networks . Therefore, they are very important in terms of signaling network formation, an aspect which we examine in this study by statistical models (i.e. p1-model). Note that we combined cancer genes with a set of differentially expressed (DE) genes even though some may not be differentially expressed. However, cancer genes can still be important in network-based analyses of studies comparing two conditions (i.e. resistant-vs-parental). For example, in a network-based classification of breast cancer patients, Chuang et al.  reported that the subnetworks which can classify metastatic and non-metastatic patients contain genes playing a central role connecting DE genes even though those cancer genes were non-DE themselves . Moreover, we intend to include all CGC genes, not just those ones that are breast cancer related, since no classifications are perfect, and the census is continuously being updated . CGC genes are selected based on the mutational profiles of cancer patients , hence their transcriptional profiles may also reveal additional insights into the mechanisms of aberrant signaling activities in acquired resistance. To investigate the influence of CGC genes in our framework, we observed all the genes involved in all the V-Structures (VSs) of aberrant pairs (Type-I, Type-II and Type-III VSs) found in pathways from KEGG, Reactome and WikiPathway databases [See S5 Table]. We found that many of the genes involved in VSs overlapped with genes from CGC, where most of those cancer genes were not identified as DE genes during the formation of the seed gene list [see Methods] [S7 Table]. Thus, we claim that CGC genes were very important in the network-based analyses of our framework.
In this paper, we considered only gene expression values for modeling gene-gene relationship networks (GGR). However, we look forward to adapting other appropriate high-throughput datasets, such as miRNA expression, methylation, copy number aberration, and phosphorylation datasets into our framework in order to better model gene-gene dependencies in resistant-vs-parental conditions to reflect greater mechanistic insights. Moreover, the V-structures we have examined in our current study can be called first-order V-Structures since they involve only a single aberrant edge of each type (red and green). In future we intend to examine the role of higher order V-structures in acquired resistance.
Our research hypothesis was primarily focused on studying the acquired resistance mechanisms of HER2-positive breast cancer cells to lapatinib (an EGFR/HER2 dual inhibitor). Therefore, we conducted a literature survey in Pubmed database using keywords: ‘lapatinib’, ‘acquired resistance’, and ‘breast cancer’, which lead us to find two articles: Komurov et al.  and Liu et al. . Both of these articles studied the resistance mechanisms of HER2-positive breast cancer cell-lines by analysing gene expression datasets of lapatinib-treated sensitive (parental) and resistant conditions. To find these gene expression datasets, we also searched GEO (Gene Expression Omnibus) database with the same keywords as above and found two data collections with accession IDs: GSE38376 and GSE16179, respectively. Detailed technical descriptions of cell-line preparation and dataset generation were reported in their respective original articles. The first dataset (GSE38376) included SKBR3 parental and resistant (SKBR3-R) cell-lines, and the second dataset (GSE16179) included BT474 parental and resistant (BT474-J4) cell-lines. In both of these datasets, expression values of both parental and resistant samples were measured first in basal condition (0 μM), and then in treatment conditions (0.1 μM and 1.0 μM for GSE38376; 1.0 μM only for GSE16179). For both GSE38376 (SKBR3) and GSE16179 (BT474), we converted probe-level expression values into gene-level values using the corresponding annotation files: GPL6947 (Illumina HumanHT-12 V3.0 expression beadchip) and GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array), respectively, which were also collected from GEO database. For some genes, multiple probes were mapped to a single gene, and we averaged the GE values of such probes to obtain the final GE values of the corresponding gene. Next, for each collection (GSE38376 and GSE16179) we built two data matrices, one from the parental and another from the resistant GE dataset, where rows were labeled with gene symbols and columns were labeled with samples under different treatment conditions. A protein-protein interaction dataset was obtained from Cerami et al. . For the enrichment analysis, we collected gene sets of all 1) the 24 signaling pathways from Reactome  (downloaded at 19/05/2014), 2) 45 signaling pathways from KEGG [83, 108] (downloaded at 12/05/2015), and 3) 61 signaling pathways from WikiPathway  (downloaded at 16/10/2014) databases. Each signaling pathway downloaded from these databases was encoded as tab-delimited lists of gene symbols. For KEGG signaling pathways, we built a parser program that extracted gene names from the web-responses after making HTTP web-requests to KEGG server using a list of IDs corresponding to signaling pathways.
We denote the gene-gene relationship network as GGR:= (S, R) for each GE data matrix. Here, S is the set of seed genes, which is the union of a set of differentially expressed (DE) genes, a set of cancer genes collected from the Cancer Gene Census (CGC) , and a set of linker genes (see below) selected from the PPI network. R is the set of edges defined among the genes in the set S. The sets S and R were constructed as follows.
We built the set S cumulatively; first a set of DE genes was identified by differential expression analysis of parental and resistant GE data using a two-tailed pooled Student’s t-test. For this test, significant p-values were identified using the Bonferroni correction method, and genes with such corrected p-values ≤ threshold (see Results) were selected as differentially expressed. Next, we added CGC genes for which corresponding GE data was available. The rationale for such inclusion is that CGC genes are well known to be hub genes in mammalian cellular signaling networks  where they play key regulatory roles in various cancer related activities. In the process of finding indirect relationships among (DE CGC) genes, a set of intermediate genes from the PPI network was identified, which we defined as linker genes (see next section). The final set of seed genes consisted of (DE CGC Linker) genes.
To identify interacting gene pairs, all pair-wise absolute Pearson Correlation Coefficients (PCCs) were calculated for expression levels of the genes in the (DE CGC) gene set. The value demarcating the top 20% of absolute PCCs was selected as the threshold for defining direct relationships among the genes in the above set. That is, for each gene pair (genei, genej), if the corresponding PCC value was above the threshold then the pair was considered to have a direct relationship, and hence added into the set of edges, R.
Otherwise, a gene pair was said to have an indirect relationship if there was at least one statistically significant simple path in the PPI network between genei and genej via an intermediate gene (called a linker gene). Here, we imposed a path-length threshold of 2 and restricted to paths in the PPI network, otherwise considering all the remaining genes as possible intermediates would convert this searching procedure into an NP-hard problem. Simple paths of length 2 [for details see S1 Text] connecting a given pair (genei, genej) in the PPI network were considered statistically significant if one can reject the following null hypothesis: the geometric mean of pairwise PCC values of constituent edges in the path is distributed as for paths of length 2 between these genes generated by a randomized procedure. Random paths of the form genei → linker → genej were generated by replacing linker with any other gene in the network except genei, genej and any gene on a path of length 2 connecting these genes in the PPI network. To evaluate the PCC for a random path, we used the same expression values for the genes as in the observed case. Paths were considered significant if the probability of generating a path using above randomized procedure with a geometric mean of pairwise PCC values greater than or equal to that observed for the PPI network was ≤0.05 (an empirical p-value). PPI edges comprising statistically significant simple paths were added to the set R. The set of edges R was finally composed of direct relationships, indirect relationships, and PPI edges of statistically significant simple paths, which are used for identifying those indirect relationships [see Discussion].
Exponential Random Graph Models (ERGMs) are parametric probability distributions over spaces of networks  that have been successfully used to evaluate probabilities of the presence of each edge in a network [23, 24]. Here, in order to infer edge probabilities in a gene-gene relationship network, we used the p1-model, a special class of ERGM introduced by Holland and Leinhardt . The p1-model has previously been used by Bulashevska et al.  to model human protein-protein interaction networks. In this approach, edge probabilities are evaluated by summarizing topological properties of networks in a parametric form and associating them with sufficient statistics [23, 24]. The definition of the p1-model for a directed graph is contained in the original article . An equivalent log-linear formulation was proposed by Fienberg and Wasserman , in which each directed edge was assigned four Bernoulli variables Yij00, Yij01, Yij10 and Yij11. Since our GGR network is an undirected graph, the model can be simplified by using only two Bernoulli variables Yij0 and Yij1 defined as follows:
where, the binary outcome uij = 1 if genei interacts with genej in GGR, and uij = 0 otherwise. Under this simplified model, the posterior probability of an edge connecting genei and genej is given by:
for i < j. Here, θ is the global density parameter, αi is the expansiveness/attractiveness of genei, and λij is the scaling parameter ensuring . We hypothesized that some aberrant gene-pairs involved in acquired resistance may have unusually high probability of appearing in Resistant-vs-Parental conditions, whereas other pairs may have unusually low probabilities. Hence, we used two Yk data matrices, YkR and YkP, from GGR matrices of Resistant and Parental samples, respectively. Note that it is possible to replace the expansiveness and attractiveness parameters by a single parameter α that represents the propensity of a gene to be connected in an undirected network.
We used a fully Bayesian approach, both for modeling the network parameters and their estimation. To estimate the model parameters, we used Gibbs sampling, a Markov Chain Monte Carlo (MCMC) method implemented in WinBUGS  which allows users to construct complex Bayesian models in a simple manner. We constructed a hierarchical Bayesian model in which the model parameters were further defined as dependent upon hyperparameters as follows:
We assigned the density parameter θ a normal prior distribution with mean zero and standard deviation σθ. (In fact, this was implemented in WinBUGS using the precision parameter τθ = σθ-2). Next, the parameter τθ was assigned a gamma prior distribution with hyperparameters a0 = 0.001 and b0 = 0.001. We set a0 = 0.001 and b0 = 0.001 to express large uncertainty regarding the value , following . For the propensity parameters and , we selected the above prior following Adam et al. .
One of our primary hypotheses in this study is that aberrant gene-pairs involved in network re-wiring in drug-resistance are likely to have high probabilities of occurring in one network (resistant or parental) but low probabilities in the other network. To determine which gene-pairs exhibit this pattern, we calculated two odds ratios defined in the following equations:
where, and are defined for resistant and parental networks, respectively, and their corresponding posterior probabilities are estimated using MCMC sampling. Before calculating these ratios, we normalized the posterior probabilities by their respective maximum values over all gene-pairs, since two values ( and ) may not be in the same scale. For the sake of brevity we refer to these ratios as odds ratios, but they are more appropriately called normalized odds ratios.
Our intention is to identify gene-pairs for which only one of the two odds ratios (Eqs (3) and (4)) is very high. Additionally, we require that both posterior probabilities exceed a minimum threshold, since very small denominators can yield high odds ratio scores even if the edge has low probability in both networks. We therefore defined two thresholds, one for odds ratio values and another for posterior probabilities. We examined the distributions of all oddsR and oddsP values and set a threshold demarcating the top 20%. Next, we examined the distribution of posterior probabilities for gene-pairs exceeding the odds ratio threshold and set a second threshold to demarcate the top 50%. Finally, we chose only those gene-pairs that had posterior probabilities above that threshold, and identified as putative aberrant gene-pairs that were potentially involved in network rewiring in acquired resistance.
Edges were then grouped into two types: gene-pairs for which the oddsR scores and the were greater than their respective thresholds in Eq (3) were defined as red pairs, whereas gene-pairs for which the oddsP scores and the were greater than their respective thresholds in Eq (4) were defined as green pairs.
Putative aberrant gene-pairs from the above Bayesian analyses were then further filtered by comparing them to another set of known (true) signaling links from the literature. For that purpose, we obtained a signaling network from the online resources of Wang Lab , which is claimed as the largest manually curated signaling network available to date. This network has more than 6,000 proteins and ~63,000 binary relations defined, including activations, inhibitions and physical interactions. Note that signaling pathways from KEGG, Reactome, and WikiPathway databases were merely genesets, and to define true signaling links among the genes within those genesets we considered the signaling links from Wang Lab . Next, to find dysregulated signaling pathways from KEGG, Reactome, and WikiPathway databases, we searched for significant overlaps between the set of true signaling links and the set of putative aberrant gene-pairs, for the genesets in a specific pathway. To this end, we designed a hypergeometric test as follows:
where N is the number of distinct gene-pairs contained in all of the signaling pathways (from a particular pathway database) and all the predicted aberrant gene-pairs, M is the set of all known signaling links in a given pathway, K is the set of aberrant gene-pairs predicted by our framework, and x = |M ∩ K|. After measuring p-values using Eq (5), a False Discovery Rate (FDR) multiple comparison correction technique was conducted to obtain q-values. Signaling pathways with q-value <0.05 were considered to be significantly dysregulated in acquired drug resistance. A similar gene-pair enrichment test, called Edge Set Enrichment Analysis (ESEA) using the weighted Kolmogorov-Smirnov statistic was recently proposed by Han et el. for detecting dysregulated pathways in the context of gene expression datasets .
To investigate the role of signaling rewiring in acquired drug resistance, we searched for a configuration of edges that we call a V-structure (Fig 1F). A V-structure consists of three genes connected by one red edge and one green edge. One gene, called a crossing-gene, is connected to both of the other genes, to one by a red edge and to the other by a green edge. Thus V-structures involve both types of aberrant pairs: one gene-pair present only in Resistant conditions, and another gene-pair present only in Parental conditions, with the crossing-gene common to both pairs. Our rationale is that the compensatory kinases may switch the oncogenic-addiction of cancer-related (growth/survival) genes to overcome their dependencies upon their primary driver kinases (e.g. EGFR/HER2) that were initially targeted in parental conditions with inhibitors [19, 30], thereby relapsing into their tumorigenic roles in acquired resistance. We hypothesise that crossing-genes that are dysregulated restore their metastatic phenotype (i.e. up- or down-regulation of oncogenes or tumor suppressor genes, respectively) in resistant conditions by forming a V-structure in the rewired signaling network.
Therefore, we define a V-structure to be a pair of aberrant gene-pairs (gi, gk) and (gj, gk) such that (gi, gk) are connected by a green edge and (gj, gk) are connected by a red edge. To identify V-structures, first we identified the set of common genes in the two mutually exclusive sets of aberrant gene-pairs (red and green gene-pairs). This set of common genes are the crossing-genes (see Fig 1). Next, we observed and enumerated all the gene-pairs (red and green) incident on each of the crossing-genes, and enumerated all of the possible pairings of one red and one green edge to form a V-structure.
Next, for each V-structure, we identified signaling pathways from KEGG, Reactome, and WikiPathway databases that contained at least one gene in that V-structure. We classified V-structures into three sub-types based on their configurations relative to these pathways. Firstly, Type-I V-structures are those in which all three member genes belong to different signaling pathways. Type-II V-structures are those in which the two aberrant gene-pairs in a particular V-structure are from two different signaling pathways, with the crossing-gene common to both pathways. Type-III V-structures are those in which all three genes are from the same signaling pathway. Note that Type-I and Type-II V-structures may represent signaling pathway cross-talks that play crucial roles in acquired drug-resistance. In our previous study, we investigated and explained the concept of Type-I V-structures, their involvement in the cross-talk between EGFR/ErbB and other signaling pathways, and their contribution to lapatinib resistance . Type-III V-structures can explain the aberrant co-regulation of genes within a single pathway. We observed and analysed all the V-structures that overlap with the literature curated signaling network .
List of identified putative aberrant gene-pairs (for both SKBR3 and BT474) cell-lines in acquired resistance.
Full results of pathway enrichment tests of identified aberrant gene-pairs in acquired resistance from KEGG, Reactome, and WikiPathway databases for both SKBR3 and BT474 cell-lines.
Comparing our current model with the previous model by observing the percentages of non-direct (indirect and PPI pair) enriched links (aberrant pairs as known signaling links) in the aberrant signaling pathways from KEGG, Reactome, and WikiPathway databases that were detected by our current but not the previous model, for both SKBR3 and BT474 cell-lines.
Comparing our current model with the previous model by observing the percentages of non-direct (indirect and PPI pair) enriched links (aberrant pairs as known signaling links) in the aberrant signaling pathways from KEGG, Reactome, and WikiPathway databases that were detected by both of our current and previous models, and were ranked (based on enrichment q-value) high in the current model but low in the previous model, for both SKBR3 and BT474 cell-lines.
Summary of Type-I, Type-II and Type-III enrichment of V-structures in KEGG, Reactome, and WikiPathway databases in SKBR3 cell-line.
Summary of Type-I, Type-II and Type-III enrichment of V-structures in KEGG, Reactome, and WikiPathway databases in BT474 cell-line.
CGC genes in all the Type-I, Type-II and Type-III V-structures in SKBR3 and BT474 cell-lines.
We thank Dr. Tianhai Tian for his initial support in this project.
This research was supported by Monash International Postgraduate Research Scholarship and Monash Graduate Scholarship at the Monash University, Australia. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
All relevant data are within the paper and its Supporting Information files.