Somatic genetic alterations in cancers have been linked with response to targeted therapeutics by creation of specific dependency on activated oncogenic signaling pathways. However, no tools currently exist to systematically connect such genetic lesions to therapeutic vulnerability. We have therefore developed a genomics approach to identify lesions associated with therapeutically relevant oncogene dependency. Using integrated genomic profiling, we have demonstrated that the genomes of a large panel of human non–small cell lung cancer (NSCLC) cell lines are highly representative of those of primary NSCLC tumors. Using cell-based compound screening coupled with diverse computational approaches to integrate orthogonal genomic and biochemical data sets, we identified molecular and genomic predictors of therapeutic response to clinically relevant compounds. Using this approach, we showed that v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS) mutations confer enhanced Hsp90 dependency and validated this finding in mice with KRAS-driven lung adenocarcinoma, as these mice exhibited dramatic tumor regression when treated with an Hsp90 inhibitor. In addition, we found that cells with copy number enhancement of v-abl Abelson murine leukemia viral oncogene homolog 2 (ABL2) and ephrin receptor kinase and v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian) (SRC) kinase family genes were exquisitely sensitive to treatment with the SRC/ABL inhibitor dasatinib, both in vitro and when xenografted into mice. Thus, genomically annotated cell-line collections may help translate cancer genomics information into clinical practice by defining critical pathway dependencies amenable to therapeutic inhibition.
The dynamics of ongoing efforts to fully annotate the genomes of all major cancer types are reminiscent of those of the Human Genome Project. The analysis of somatic gene copy number alterations and gene mutations associated with cancer (both here referred to as lesions) will thus provide the genetic landscape of human cancer in the near future. The medical implications of these endeavors are exemplified by the success of molecularly targeted cancer therapeutics in genetically defined tumors: the ERBB2/Her2-targeted (where ERBB2 is defined as v-erb b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma-derived oncogene homolog [avian]) antibody trastuzumab shrinks tumors in women with ERBB2-amplified breast cancer (1); the ABL/KIT/PDGFR (where ABL is defined as v-abl Abelson murine leukemia viral oncogene homolog and KIT is defined as v-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homolog) inhibitor imatinib induces responses in patients with chronic myeloid leukemia carrying the BCR/ABL (where BCR is defined as breakpoint cluster region) translocation (2, 3) as well as in patients with gastrointestinal stromal tumors and melanomas bearing mutations in KIT (4) or PDGFRA (5); and finally, EGFR-mutant lung tumors are highly sensitive to the EGFR inhibitors gefitinib and erlotinib (6–8). In most cases, such discoveries were made after the completion of clinical trials; as yet, no robust mechanism exists that permits systematic identification of lesions causing therapeutically relevant oncogene dependency prior to initiation of such clinical trials.
The use of cancer cell lines allows systematic perturbation experiments in vitro, yet the validity and clinical interpretability of these widely used models have been questioned. In some notable instances, pathways may lose function when cells are grown in culture (9). In addition, cell lines are frequently thought to be genomically disarrayed and unstable and therefore likely poorly representative of primary tumors. Furthermore, the genetic diversity of histopathologically defined classes of tumors is often substantial, e.g., the clinical tumor entity non–small cell lung cancer (NSCLC) comprises EGFR- and KRAS-mutant (where KRAS is defined as v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog) lung adenocarcinomas as well as KRAS-mutant squamous-cell lung cancers. Thus, any representative preclinical model would need to capture the nature of lesions of primary tumors as well as their distribution in the histopathologically defined cohort.
Recent reports have credentialed the use of cancer cell lines in preclinical drug target validation experiments (10–13). Building on the foundation of these studies, we have now established a cell-line collection that enables systematic prediction of drug activity using global profiles of genetic lesions in NSCLC. Given the genomic diversity of a particular cancer type, we reasoned that in-depth preclinical analyses of activity of cancer therapeutics in tumor cells would require both thorough genomic analysis of a large cell-line collection of a single tumor entity and high-throughput cell-line profiling, followed by genomic prediction of compound activity.
We set out to systematically annotate the genomes of a large panel of NSCLC cell lines in order to determine whether such a collection reflects the genetic diversity of primary NSCLC tumors. We further determined the phenotypic validity of this collection and analyzed drug activity as a function of genomic lesions in a systematic fashion. Finally, we confirmed the validity of our predictors in vitro and in lung cancer mouse models. Such complementary efforts may provide a framework for future preclinical analyses of compound activity, taking into account the multitude of genetic lesions in histopathologically defined cancer types.
Eighty-four NSCLC cell lines were collected from various sources (Supplemental Table 1; supplemental material available online with this article; doi:10.1172/JCI37127DS1) and formed the basis for all subsequent experiments. Cell lines were derived from tumors representing all major subtypes of NSCLC tumors, including adenocarcinoma, squamous-cell carcinoma, and large-cell carcinoma.
The genomic landscape of these cell lines was characterized by analyzing gene copy number alterations using high-resolution SNP arrays (250K Sty1). We used the statistical algorithm Genomic Identification of Significant Targets in Cancer (GISTIC) to distinguish biologically relevant lesions from background noise (14). The application of GISTIC revealed 16 regions of recurrent, high-level copy number gain (inferred copy number > 2.14) and 20 regions of recurrent copy number loss (inferred copy number < 1.86) (Supplemental Tables 2 and 3). Overall, we identified focal peaks with a median width of 1.45 Mb (median 13.5 genes/region) for amplifications and 0.45 Mb for deletions (median 1 gene/region). These regions contained lesions known to occur in NSCLC (e.g., deletion of LRP1B [2q], FHIT [3p], CDKN2A [9p]; amplification of MYC [8q], EGFR [7p] and ERBB2 [17q]; Figure 1A and Supplemental Table 2). Furthermore, within broad regions of copy number gain, we also identified amplification of TITF1 (14q) and TERT (5p) (Figure 1A and Supplemental Table 2), recently identified by large-scale genomic profiling of primary lung adenocarcinomas (15–17).
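The copy number cutoffs quoted above (gain above 2.14, loss below 1.86) can be illustrated with a minimal sketch. This is not GISTIC, which additionally scores recurrence and amplitude against a background model; it only applies the two thresholds to inferred copy numbers. The function names and the toy cell lines and regions are hypothetical.

```python
# Thresholds taken from the text; everything else here is illustrative.
GAIN_THRESHOLD = 2.14  # inferred copy number above this is called a gain
LOSS_THRESHOLD = 1.86  # inferred copy number below this is called a loss


def call_lesion(copy_number):
    """Classify one inferred copy number as 'gain', 'loss', or 'neutral'."""
    if copy_number > GAIN_THRESHOLD:
        return "gain"
    if copy_number < LOSS_THRESHOLD:
        return "loss"
    return "neutral"


def lesion_frequencies(profiles):
    """Return {region: (gain_fraction, loss_fraction)} across cell lines.

    profiles maps a cell-line name to {region: inferred copy number}.
    """
    n_lines = len(profiles)
    counts = {}
    for regions in profiles.values():
        for region, copy_number in regions.items():
            gains, losses = counts.get(region, (0, 0))
            call = call_lesion(copy_number)
            counts[region] = (gains + (call == "gain"), losses + (call == "loss"))
    return {r: (g / n_lines, l / n_lines) for r, (g, l) in counts.items()}


# Hypothetical toy panel of two cell lines and two regions.
toy_panel = {
    "lineA": {"8q": 3.1, "9p": 0.4},
    "lineB": {"8q": 2.0, "9p": 1.2},
}
print(lesion_frequencies(toy_panel))
```

Per-region frequencies of this kind are only the raw input to a significance analysis; they do not by themselves separate recurrent lesions from background noise.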
Analysis of homozygous deletions as well as loss of heterozygosity (LOH) is typically hampered by admixture of nontumoral cells in primary tumors. The purity of cell-line DNA permitted identification of previously unknown homozygous deletions and regions of LOH, including LOH events resulting from uniparental disomy (i.e., copy-neutral events) (Supplemental Table 4). In this analysis, known genes such as MTAP (9p) and LATS2 (13q) were altered by homozygous deletions (18, 19) and we found what we believe are novel homozygous deletions of genes such as TUBA2 (Supplemental Table 4). Of note, most of these regions could also be identified in primary NSCLC tumors as deleted (15); however, inferred copy numbers only inconsistently showed LOH or homozygous deletions, indicating admixture of normal diploid DNA (Supplemental Table 4). Thus, while a recent large-scale cancer profiling study (15) enabled insight into the genomic landscape of lung adenocarcinoma, the use of pure populations of tumor cells further afforded discovery of previously unrecognized regions of homozygous deletions and LOH.
We next compared the profile of significant amplifications and deletions in this cell-line collection with that of a set of 371 primary lung adenocarcinomas (15). This comparison revealed a striking similarity between the 2 data sets (Figure 1A) but not between NSCLC cell lines and gliomas or melanomas (Supplemental Figure 1, A and B). A quantitative analysis of similarity by computing correlations of the false discovery rate (q value) confirmed the similarity of primary lung cancer and lung cancer cell lines (r = 0.77) and the lack of similarity of lung cancer cell lines and primary gliomas (14) (r = 0.44), melanoma cell lines (11) (r = 0.44), or ovarian tumors (r = 0.38; Supplemental Figure 1C). As a control, repeated random splitting of the lung cancer cell-line data and computation of internal similarity resulted in correlation coefficients between 0.82 and 0.86, whereas we found no correlation with normal tissue (r = 0.0195; Supplemental Figure 1C). These results demonstrate that the genomic copy number landscape of NSCLC cell lines reflects that of primary NSCLC tumors, while tumors or cell lines of other lineages show a much lower degree of similarity (20, 21). Furthermore, the distribution of oncogene mutations in the cell lines (Supplemental Table 5) was similar to that in primary NSCLC tumors, with a high prevalence of mutations in the KRAS and EGFR genes (22–25) and rare occurrence of phosphoinositide-3-kinase, catalytic, α polypeptide (PIK3CA) and v-raf murine sarcoma viral oncogene homolog B1 (BRAF) mutations (Figure 1B). These results further validate our cell-line collection on a genetic level.
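The similarity measure described above, a correlation between the q-value profiles of two data sets, can be sketched as a plain Pearson correlation over matched genomic positions. The text does not state whether q values were transformed (for example, to a log scale) before correlation, so this sketch correlates whatever values it is given; the function name and the toy profiles are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-position significance scores for two data sets,
# assumed to be aligned on the same genomic positions.
cell_line_profile = [0.1, 2.5, 0.3, 4.0, 0.2]
primary_tumor_profile = [0.2, 2.0, 0.4, 3.5, 0.1]
print(round(pearson_r(cell_line_profile, primary_tumor_profile), 3))
```

The random-split control mentioned in the text would apply the same correlation to significance profiles recomputed on two random halves of the cell-line panel, which requires rerunning the significance analysis and is not shown here.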
The availability of both copy number alteration and oncogene mutation data of the NSCLC cell lines enabled us to analyze the interactions of both types of lesions (Supplemental Figure 2). Hierarchical clustering of lesions robustly grouped both mutations and amplification of EGFR in 1 subcluster (ratio Q of observed vs. expected cooccurrence: Q = 4.38, P = 0.001), while KRAS mutations consistently grouped in a distinct cluster. These findings corroborate prior observations in vivo in which mutations in KRAS and EGFR were mutually exclusive while EGFR mutation and EGFR amplification frequently cooccurred (23, 26, 27). Moreover, these results suggest that these mutations influence the particular signature of genomic alterations in the affected tumors. Finally, in unsupervised hierarchical cluster analyses of gene expression data, primary lung cancer specimens (28) and lung cancer cell lines shared 1 cluster (Figure 1C), while renal cell carcinomas (29) and lymphomas (30) as well as the corresponding cell lines clustered in a separate group.
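The ratio Q of observed versus expected cooccurrence reported above can be sketched as follows. The text does not spell out how the expected count was derived; this sketch assumes independence of the two lesions, so that the expected count is the product of the two lesion frequencies times the number of cell lines. The function name and the toy lesion vectors are hypothetical.

```python
def cooccurrence_ratio(has_a, has_b):
    """Ratio of observed to expected co-occurrence of two lesions.

    has_a and has_b are equal-length sequences of 0/1 flags, one per
    cell line. The expected count assumes the lesions are independent.
    """
    if len(has_a) != len(has_b) or not has_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n_lines = len(has_a)
    observed = sum(1 for a, b in zip(has_a, has_b) if a and b)
    expected = sum(has_a) * sum(has_b) / n_lines
    if expected == 0:
        raise ValueError("at least one lesion never occurs")
    return observed / expected


# Hypothetical panel of 8 cell lines.
egfr_mutation = [1, 1, 0, 0, 0, 0, 0, 0]
egfr_amplification = [1, 1, 0, 0, 0, 0, 0, 0]
kras_mutation = [0, 0, 1, 1, 1, 0, 0, 0]

print(cooccurrence_ratio(egfr_mutation, egfr_amplification))  # co-occurring pair
print(cooccurrence_ratio(egfr_mutation, kras_mutation))       # mutually exclusive pair
```

A ratio well above 1 indicates co-occurrence beyond chance and a ratio near 0 indicates mutual exclusivity; a P value for the ratio would need a separate test, which is not shown.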
In summary, in-depth comparative analysis of orthogonal genomic data sets of a large panel of NSCLC cell lines and primary tumors demonstrates that these cell lines reflect the genetic and transcriptional landscape of primary NSCLC tumors.
Activated oncogenes typically cause a transcriptional signature that can be used to identify tumors carrying such oncogenes (31, 32). However, we consistently failed to identify a gene expression signature characteristic of EGFR-mutant tumors (33, 34) using a gene expression data set of 123 primary lung adenocarcinomas (35) annotated for mutations in EGFR (data not shown). We therefore reasoned that the cellular purity of our cell lines (n = 54 analyzed on U133A) might enable the determination of such a signature and the application of this signature in primary tumors. We applied principal component analyses on the variable genes and found a remarkable grouping of all EGFR mutated cell lines (n = 8/54), with a significant dissociation already in the first principal component (Welch’s t test on the distribution of eigenvalues: P = 0.0005) contributing 14.5% to the overall variance (Figure 2A). Similar results were obtained by hierarchical clustering (data not shown). Using genes differentially expressed in EGFR-mutant cell lines (including T790M) as a surrogate feature (Supplemental Table 6), all of the EGFR-mutant primary tumors (35) were grouped in a distinct cluster (P = 0.00001) when performing hierarchical clustering (Figure 2B). This result was also recapitulated when selecting genes differentially expressed in erlotinib-sensitive (GI50 < 0.1 μM, n = 5/54 vs. GI50 > 2 μM, n = 45, where GI50 indicates half-maximal growth inhibitory concentration) cell lines (Supplemental Figure 3A). Furthermore, patients with tumors expressing the signature of EGFR mutated cell lines had better overall survival than those whose tumors did not (Figure 2C) (36). The power of our EGFRmut signature to predict survival was confirmed, employing the data published by Beer and colleagues (Figure 2D) (37). This effect was even observed when excluding EGFR-mutant tumors (n = 13) from the analysis (Figure 2C).
Thus, expression signatures extracted in vitro can be used to identify biologically diverse tumors in vivo (38).
Others have recently characterized a transcriptional signature of EGFR-mutant NSCLC using a small set of cell lines (39). However, when analyzing primary lung adenocarcinomas with the signature described by Choi et al., EGFR-mutant samples were randomly distributed across the data set (Supplemental Figure 3B). This finding further highlights the importance of using large cell-line collections in order to represent the overall genomic diversity of primary tumors.
Recent studies have linked the presence of EGFR mutations in lung adenocarcinomas to clinical response to the EGFR inhibitors erlotinib and gefitinib (6–8). However, retrospective studies aimed at determining predictive markers for EGFR inhibition yielded heterogeneous results, implicating EGFR mutations and/or EGFR amplifications among others as predictive of response or patient outcome (40–42). We set out to systematically identify genetic lesions associated with sensitivity to erlotinib by including all global lesion data from our genomics analyses rather than focusing on EGFR-associated lesions. We established a high-throughput cell-line screening pipeline that enables systematic chemical perturbations across the entire cell-line panel followed by automated determination of GI50 values (43) to determine erlotinib sensitivity for all cell lines. We next analyzed the distribution of genetic lesions in erlotinib-sensitive compared with insensitive cell lines (Supplemental Tables 5 and 7) and further compared the mean sensitivity of cell lines with and without the respective genetic lesions. In both analyses, EGFR mutations were the best single-lesion predictor of erlotinib sensitivity (Figure 2E and Supplemental Table 7; Fisher’s exact test; P = 6.9 × 10⁻⁸). Furthermore, we found a less stringent association with amplification of EGFR (Fisher’s exact test; P = 1.4 × 10⁻⁴); however, only EGFR mutations were significant predictors of erlotinib sensitivity when we adjusted for multiple hypothesis testing using Bonferroni’s correction (data not shown). We next used signal-to-noise–based feature selection combined with the K-nearest-neighbor (KNN) algorithm (44, 45) to build a multilesion predictor of erlotinib sensitivity.
The best performing multilesion predictor comprised EGFR mutations, amplification of EGFR, and lack of KRAS mutations (Figure 2E and Supplemental Table 7), which have all been implicated in determining responsiveness of NSCLC patients to EGFR inhibitors (6–8, 27, 40, 41, 46). We note that in our data set, as in previously published reports (6–8, 27, 40, 41, 46), EGFR amplification and mutation were correlated, whereas KRAS mutations were mutually exclusive with either lesion (Supplemental Figure 2). Thus, our observation confirms the overall predominant role of EGFR mutations in predicting responsiveness to EGFR inhibition, and it provides an explanation for the finding of EGFR amplification as being predictive of response as well. Our findings also corroborate prior clinical reports establishing KRAS mutations as a resistance marker for EGFR inhibition therapy. Together, these results imply that essential transcriptional and biological phenotypes of the original tumors are preserved in the cell lines, a necessary requirement for application of such collections as proxies in preclinical drug target validation efforts.
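The multilesion predictor described above combines signal-to-noise feature selection with a KNN classifier. A minimal sketch of that general idea follows. It assumes the common signal-to-noise definition (difference of class means divided by the sum of class standard deviations), population standard deviations, a Manhattan distance on binary lesion flags, and an unweighted majority vote; the actual parameters are given in the cited methods (44, 45) and may differ. All names and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def signal_to_noise(values, labels):
    """(mean_sensitive - mean_resistant) / (sd_sensitive + sd_resistant)."""
    pos = [v for v, is_sensitive in zip(values, labels) if is_sensitive]
    neg = [v for v, is_sensitive in zip(values, labels) if not is_sensitive]
    denom = pstdev(pos) + pstdev(neg)
    if denom == 0:
        denom = 1e-9  # floor to avoid division by zero for perfect separators
    return (mean(pos) - mean(neg)) / denom


def select_features(rows, labels, n_features):
    """Indices of the n_features columns with the largest absolute score."""
    n_cols = len(rows[0])
    scores = [signal_to_noise([r[j] for r in rows], labels) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: abs(scores[j]), reverse=True)
    return ranked[:n_features]


def knn_predict(rows, labels, query, k, features):
    """Majority vote among the k training rows nearest to the query."""
    def distance(row):
        return sum(abs(row[j] - query[j]) for j in features)
    nearest = sorted(range(len(rows)), key=lambda i: distance(rows[i]))[:k]
    votes = sum(1 for i in nearest if labels[i])
    return votes * 2 > k


# Hypothetical lesion matrix: columns are [EGFR mutation, KRAS mutation, unrelated lesion].
rows = [[1, 0, 1], [1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1], [0, 0, 0]]
labels = [True, True, True, False, False, False]  # True = erlotinib-sensitive

chosen = select_features(rows, labels, n_features=2)
print(chosen)
print(knn_predict(rows, labels, [1, 0, 0], k=3, features=chosen))
```

In practice, a predictor of this kind would be evaluated by cross-validation over the cell-line panel rather than on its own training data.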
Having validated the cell-line collection by demonstrating its genomic and phenotypic similarity to primary NSCLC tumors, we reasoned that adding complex phenotypic data might elicit additional insights into the impact cancer genotypes have on cell biology phenotypes. In our initial pilot screening experiment, we profiled all cell lines against erlotinib and subsequently extended our assay to 11 additional inhibitors that were either under clinical evaluation or showed high activity in preclinical models; these compounds target a wide spectrum of relevant proteins in cancer (Supplemental Figure 4). We treated all cell lines with these compounds and determined GI50 values (or GI25 values, respectively; Supplemental Table 5). The resulting sensitivity patterns (Figure 3) revealed that while some of the compounds exhibited a pronounced cytotoxic activity in a small subset of cell lines (e.g., erlotinib, vandetanib, VX-680), others were active in most of the cell lines, with only a minority being resistant [e.g., 17-(allylamino)-17-demethoxygeldanamycin (17-AAG)]. Only 2 cell lines (<2%) were resistant to all of the compounds (Supplemental Table 5), suggesting that most NSCLC tumors might be amenable to targeted treatment. Overall, these observations are highly reminiscent of patient responses in clinical trials in which limited subsets of patients experience partial and, rarely, complete response while the majority of patients exhibit stable disease, no change, or progression.
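The GI50 and GI25 values referred to above are the concentrations at which growth is inhibited by 50% or 25% relative to untreated controls. As a rough illustration of how such a value can be read off a dose-response series, the sketch below interpolates linearly on a log-concentration axis between the two measurements that bracket the target growth level. The automated procedure actually used is described in the cited reference (43) and may well fit a full curve instead; the function name and the toy measurements are hypothetical.

```python
import math


def growth_inhibitory_concentration(concentrations, growth_fractions, inhibition=0.5):
    """Concentration giving the requested fractional growth inhibition.

    concentrations: positive doses (any unit); growth_fractions: growth
    relative to untreated control (1.0 = no inhibition). Returns None if
    the target level is never crossed in the tested range.
    """
    target = 1.0 - inhibition
    points = sorted(zip(concentrations, growth_fractions))
    for (c_lo, g_lo), (c_hi, g_hi) in zip(points, points[1:]):
        if g_lo >= target >= g_hi and g_lo != g_hi:
            fraction = (g_lo - target) / (g_lo - g_hi)
            log_c = math.log10(c_lo) + fraction * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical dose-response series (concentrations in micromolar).
doses = [0.01, 0.1, 1.0, 10.0]
sensitive_line = [1.0, 0.9, 0.1, 0.05]
resistant_line = [1.0, 0.95, 0.9, 0.8]

print(growth_inhibitory_concentration(doses, sensitive_line))        # GI50
print(growth_inhibitory_concentration(doses, sensitive_line, 0.25))  # GI25
print(growth_inhibitory_concentration(doses, resistant_line))        # not reached
```

Returning None for a line that never reaches the target level keeps resistant lines distinguishable from lines with a measured value, which matters when sensitivity is later compared across lesion groups.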
As an initial approach to identification of shared targets of inhibitors, we performed hierarchical clustering based on the similarity of sensitivity profiles (Figure 4A) and based on the correlation between sensitivity and genomic lesion profiles (Figure 4B). Erlotinib and vandetanib exhibited the highest degree of similarity, pointing to mutant EGFR as the critical target of vandetanib in NSCLC tumor cells (Figure 4, A and B) (47, 48). The high degree of correlation (r = 0.91; P < 0.001) of cell-line GI50 values for both compounds as well as structural modeling of vandetanib binding in the EGFR kinase domain, which revealed a binding mode identical to that of erlotinib, further corroborate this notion (Supplemental Figure 5A). This model predicted that binding of both compounds would be prevented by the T790M resistance mutations of EGFR (48–50); accordingly, murine Ba/F3 cells ectopically expressing erlotinib-sensitizing mutations of EGFR together with T790M (51) were completely resistant to erlotinib and vandetanib (Supplemental Figure 5, B and C).
In addition to the ERBB2/EGFR inhibitor lapatinib, vandetanib, and the irreversible EGFR inhibitor PD168393 (52), the SRC/ABL (where SRC is defined as v-src sarcoma [Schmidt-Ruppin A-2] viral oncogene homolog [avian]) inhibitor dasatinib (53) shared a cluster with the EGFR inhibitor erlotinib, although at a much lower potency than erlotinib (Figure 4, A and B). Molecular modeling of dasatinib binding to EGFR predicted a binding mode similar to that of erlotinib (Figure 4C), with a steric clash of erlotinib and dasatinib with the erlotinib resistance mutation T790M (49, 50, 54, 55) (Figure 4C). We therefore formally validated EGFR as a relevant dasatinib target in tumor cells by showing cytotoxicity as well as EGFR dephosphorylation (56) elicited by this compound in Ba/F3 cells ectopically expressing mutant EGFR but not in those coexpressing the T790M resistance allele (Figure 4D). Thus, large-scale phenotypic profiling coupled to computational prediction formally validated a relevant tumor-cell target of an FDA-approved drug using a systematic unbiased approach. It is noteworthy that a trial of dasatinib in patients with acquired erlotinib resistance is currently ongoing (trial ID: NCT00570401; http://clinicaltrials.gov/ct2/show/NCT00570401?term=NCT00570401&rank=1); based on previously reported biochemical findings (54) and our results, we predict limited clinical activity in those patients in whom erlotinib resistance is due to the EGFR resistance mutation T790M.
We have shown that hierarchical clustering can identify compounds with overlapping target specificities within a screening experiment. We now set out to extend our analyses to additional computational approaches to predict inhibitor responsiveness from global lesion data in a systematic fashion. To this end, we applied supervised learning methods as we did for erlotinib (see above). Applying this method, we identified robust, genetic lesion-based predictors for the majority of the tested compounds (Supplemental Table 7).
UO126 is a MEK inhibitor that also showed enhanced activity in a subset of the lung cancer cell-line collection. Here, the supervised approach identified chromosomal gains of 1q21.3 affecting the genes ARNT and RAB13 as being robustly associated with UO126 sensitivity (Fisher’s exact test, copy number threshold 2.14, P = 0.02; Supplemental Figure 6 and Supplemental Table 7). In order to validate this finding in an independent data set, we made use of the NCI-60 cancer cell-line panel (57) in which hypothemycin was used as a MEK inhibitor (12). This cross-platform validation revealed that 1q21.3 gain predicted sensitivity to MEK inhibition in both data sets (Fisher’s exact test, P = 0.03, NCI-60 collection; Supplemental Figure 6).
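The association tests quoted above are Fisher's exact tests on a 2x2 table of lesion status versus compound sensitivity. The sketch below computes a two-sided P value from the hypergeometric distribution and includes a Bonferroni adjustment of the kind mentioned earlier for erlotinib. Whether the reported P values are one- or two-sided is not stated in the text, so the two-sided form here is an assumption, and the counts in the example are invented for illustration only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows: lesion present / absent. Columns: sensitive / resistant.
    Sums the probabilities of all tables (with fixed margins) that are
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = table_probability(a)
    p_value = sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value for n_tests hypotheses."""
    return min(1.0, p_value * n_tests)


# Invented counts: 6 of 8 lines with the gain are sensitive, 5 of 40 without it.
p = fisher_exact_two_sided(6, 2, 5, 35)
print(p, bonferroni(p, n_tests=36))
```

The Bonferroni step simply multiplies each P value by the number of lesions tested, which is why a nominally significant single-lesion association can lose significance after adjustment.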
In our initial cluster analysis, we found that KRAS mutations correlated with sensitivity to the Hsp90 inhibitor 17-AAG, a geldanamycin derivative (Figure 4B). Recapitulating this observation, we found KRAS mutations to be predictive of 17-AAG sensitivity, even when applying our KNN-based prediction approach (Fisher’s exact test, P = 0.029; Figure 5A and Supplemental Table 7). Confirming this observation in an independent cell-line model, we found the distribution of geldanamycin sensitivity and KRAS mutation in the NCI-60 cell-line collection to be strikingly similar to that observed in our panel (P = 0.049; Figure 5A).
In 17-AAG-sensitive cells, Hsp90 inhibition led to robust induction of apoptosis (Supplemental Figure 7A). In order to gain mechanistic insight into KRAS dependency on Hsp90 chaperonage, we first confirmed the specificity of our KRAS antibody (Supplemental Figure 7C). Using conditions under which EGFR coprecipitated with Hsp90 in EGFR-mutant cells (Supplemental Figure 7B) (58), we found KRAS to be bound to Hsp90 as well (Figure 5B). However, while 17-AAG treatment depleted mutant EGFR from Hsp90 (Supplemental Figure 7B), KRAS binding to Hsp90 was not affected by this treatment (Figure 5B). Furthermore, cellular KRAS protein levels were also not reduced by 17-AAG (Figure 5B). These findings are surprising, as other oncogenes, such as EGFR or BRAF, known to be dependent on Hsp90 chaperonage are depleted from the complex after treatment with 17-AAG (58, 59). However, reduction of viability of KRAS-mutant cells treated with 17-AAG is accompanied by depletion of c-RAF and AKT (60) (Figure 5B). Since both c-RAF and AKT are known Hsp90 clients (59, 61), we hypothesize that this observation might rely on the activation of the AKT and RAF/MEK/ERK signaling pathways by mutant KRAS (62, 63).
To further validate the power of KRAS mutations to predict response to Hsp90 inhibition, we employed a lox-stop-loxKRASG12D mouse model that enables the study of KRAS-driven lung adenocarcinomas in vivo (64). Mice with established lung tumors induced by nasal inhalation of adenoviral Cre (64) were treated with either the water-soluble geldanamycin Hsp90 inhibitor 17-(dimethylaminoethylamino)-17-demethoxygeldanamycin (17-DMAG) or placebo. Whereas no tumor shrinkage was observed in the placebo-treated mice after 1-week treatment (Figure 5C and Supplemental Figure 8), substantial regression of established tumors was observed in 3 out of 4 mice receiving 17-DMAG, with a tumor volume reduction of up to 80% (Figure 5C and Supplemental Figure 8). Although responses were transient, as were those seen in 17-DMAG–treated transgenic mice with EGFR-driven lung carcinomas (data not shown), these findings validate our observation that KRAS mutation predicts response to Hsp90 inhibition in vivo.
We have used similarity profiling and supervised learning approaches that led to the identification of predictive markers based on significant lesions found in our data set as defined by GISTIC. However, statistically defining relevant lesions in a given data set, while advantageous, limits the use of lesions occurring at low frequency and/or amplitude as predictors of compound sensitivity. We therefore developed an additional approach, denoted Target-Enriched Sensitivity Prediction (TESP), which enables inclusion of statistically underrepresented yet biologically relevant lesions.
Amplification of drug-target genes has been demonstrated to predict vulnerability to target-specific compounds in ERBB2-amplified breast cancer and EGFR-amplified lung cancer (1, 46). We therefore speculated that chromosomal copy number alterations of biochemically defined drug targets could be used for prediction of sensitivity to other tyrosine kinase inhibitors as well. To this end, we used tyrosine kinase inhibitor targets defined by the quantitative dissociation constant as determined in quantitative kinase assays (65). As a proof of principle, we determined whether copy number gain in EGFR is associated with sensitivity to erlotinib (40). In our systematic approach, cell lines inhibited by erlotinib at clinically achievable dosages (up to 1 μM) were highly enriched for amplification of EGFR (P = 0.00023; Supplemental Figure 9A). We next tested our prediction model for lapatinib, a specific inhibitor of ERBB2 and EGFR, clinically approved for ERBB2-positive breast cancer (66). Again, we observed cell lines inhibited by lapatinib (n = 82) below clinically achievable dosage of 1 μM to be significantly enriched in the subgroup of cell lines with amplification of ERBB2 or EGFR (Fisher’s exact test, P = 0.009; data not shown). Thus, TESP enables discovery of clinically relevant genotype-phenotype relationships.
Encouraged by these findings, we set out to test our approach for compounds inhibiting a wide range of kinases, such as dasatinib (65). We determined the distribution of GI50 values of cell lines with chromosomal copy number gain (copy number > 3) affecting at least 1 or at least 2 of the genes encoding the most biochemically sensitive dasatinib targets and compared these to the distribution of GI50 values of cells without copy number gain at these genomic positions (Figure 6A, Supplemental Table 8, and Supplemental Figure 9B). As hypothesized, these groups were significantly distinct in the distribution of GI50 values (P = 1.8 × 10⁻³ when 1 gene was affected and P = 4.6 × 10⁻³ when 2 of the target genes were affected by copy number gain; Figure 6A and Supplemental Figure 9B). In particular, this predictor comprised copy number gain at the loci of gene family members of ephrin receptor kinases (EPHA3, EPHA5, and EPHA8), SRC kinases (SRC, FRK, YES1, LCK, and BLK), and ABL2, suggesting that NSCLC cells harboring such lesions might be exquisitely sensitive to therapeutic inhibition of the encoded proteins. The probability that cell lines with copy number gain at either 1 or 2 of these genes will be sensitive to dasatinib treatment (GI50 < 100 nM) increases up to 5.6-fold (gain of 1 gene) and 15.8-fold (gain of 2 genes), respectively, when compared with cells without copy number gain at these loci (Figure 6A and Supplemental Figure 9B). In contrast, copy number gain involving loci encoding biochemically less sensitive dasatinib targets failed to show enrichment of sensitive cell lines (data not shown).
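The fold increase in the probability of sensitivity described above can be sketched as a ratio of sensitive fractions between cell lines with and without copy number gain at the target genes. The cutoffs (copy number > 3, GI50 < 100 nM) come from the text; the data layout, function names, default copy number of 2 for unmeasured genes, and the toy cell lines are hypothetical, and the statistical test behind the reported P values is not reproduced here.

```python
GAIN_CUTOFF = 3.0          # copy number above this counts as gain (from the text)
SENSITIVE_GI50_NM = 100.0  # GI50 below this counts as sensitive (from the text)


def count_gained_targets(copy_numbers, target_genes):
    """Number of target genes with copy number gain in one cell line."""
    return sum(1 for gene in target_genes if copy_numbers.get(gene, 2.0) > GAIN_CUTOFF)


def sensitive_fraction(cell_lines):
    """Fraction of cell lines whose GI50 is below the sensitivity cutoff."""
    hits = sum(1 for line in cell_lines if line["gi50_nM"] < SENSITIVE_GI50_NM)
    return hits / len(cell_lines)


def fold_enrichment(cell_lines, target_genes, min_gained=1):
    """Sensitive fraction with >= min_gained gained targets vs. with none."""
    gained, not_gained = [], []
    for line in cell_lines:
        n_gained = count_gained_targets(line["copy_numbers"], target_genes)
        if n_gained >= min_gained:
            gained.append(line)
        elif n_gained == 0:
            not_gained.append(line)
    if not gained or not not_gained:
        raise ValueError("both groups need at least one cell line")
    baseline = sensitive_fraction(not_gained)
    if baseline == 0:
        raise ValueError("no sensitive line without gain; fold change undefined")
    return sensitive_fraction(gained) / baseline


# Invented toy panel; gene symbols are from the text, values are not.
targets = ["SRC", "ABL2", "EPHA3"]
panel = [
    {"gi50_nM": 20, "copy_numbers": {"SRC": 3.5}},
    {"gi50_nM": 50, "copy_numbers": {"SRC": 3.2, "ABL2": 3.4}},
    {"gi50_nM": 5000, "copy_numbers": {"EPHA3": 3.1}},
    {"gi50_nM": 80, "copy_numbers": {"SRC": 2.0}},
    {"gi50_nM": 4000, "copy_numbers": {}},
    {"gi50_nM": 9000, "copy_numbers": {"SRC": 2.1}},
    {"gi50_nM": 7000, "copy_numbers": {}},
]
print(fold_enrichment(panel, targets, min_gained=1))
print(fold_enrichment(panel, targets, min_gained=2))
```

Restricting the target list to the most biochemically sensitive kinases is what distinguishes this approach from a genome-wide scan: the gene set is fixed in advance by the kinase-binding data rather than by lesion frequency.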
In cells with copy number gain of biochemically defined dasatinib target genes, dasatinib treatment led to robust induction of apoptosis (data not shown). Importantly, copy number gain of at least one of these genes is present in 12.9% (copy number > 3) of several hundred primary lung adenocarcinomas (15) (data not shown), thus emphasizing the potential clinical relevance of our predictor.
In the dasatinib-sensitive cell-line H322M harboring amplified SRC, dasatinib treatment led to dephosphorylation of SRC at low nanomolar doses, paralleling growth inhibition at similar concentrations (Supplemental Figure 9C). In order to determine whether the genes in our dasatinib predictor are causatively linked with the activity of dasatinib, we silenced SRC by lentiviral shRNA in H322M cells (Figure 6B). When compared with parental cells or cells expressing the control vector, H322M-SRC–knockdown (H322MSRCkd) cells showed a massive reduction in cellular proliferation (Figure 6B) and increase in cell death (data not shown). In order to further validate activated SRC as the relevant dasatinib target in H322M cells, we expressed an activated allele of SRC together with a sterically demanding mutation at the gatekeeper position of the ATP-binding pocket (T341M) (67); this mutation and the analogous mutations in Bcr-Abl and EGFR (see above) induce on-target drug resistance (67) by displacing the compound from the ATP-binding pocket. As hypothesized, expression of the T341M gatekeeper mutation but not of SRC alone rescued dasatinib-induced cell death in H322M cells (Figure 6C). These results formally validate SRC as the relevant dasatinib target in SRC-amplified NSCLC cells.
We also validated EPHA3 as a relevant target in H28 cells with gain of EPHA3 by showing decreased viability of these cells upon stable knockdown of EPHA3 (Supplemental Figure 10).
We next transplanted cells with or without copy number gain of SRC into nude mice. Mice were treated with either dasatinib or placebo on a daily application schedule. Again confirming our in vitro observations, robust tumor shrinkage was observed in mice transplanted with cells harboring copy number gain of SRC (H322M) (Figure 6D) receiving dasatinib. In contrast, no tumor shrinkage was observed in mice transplanted with cells predicted to be resistant to dasatinib (A549) and in all mice treated with placebo (Figure 6D). We consistently failed to grow EPHA3-amplified H28 cells in nude mice; HCC515 cells were therefore chosen as another model of NSCLC with gain of EPHA3. Dasatinib treatment of established HCC515 tumors also induced significant tumor shrinkage (data not shown).
Together, these results show that in NSCLC, copy number gain of ephrin receptor or SRC family member genes and ABL2 may render tumor cells dependent on these kinases, thus exposing a vulnerability to therapeutic inhibition with dasatinib.
Here, we show that diverse analytical approaches of multiple orthogonal genomic and chemical perturbation data sets pertinent to a large collection of cancer cell lines afford insights into how somatic genetic lesions impact cell biology and therapeutic response in cancer. Such data sets provide a rich source for different computational approaches that each yield complementary, accurate, and valid predictors of inhibitor sensitivity. The basis for such predictions is a panel of genomically annotated NSCLC cell lines that is representative of the genetic diversity, the transcriptional profile, and the phenotypic properties of primary NSCLC tumors. The overall functional biological validity of our approach is supported by the observation that EGFR mutations are the strongest predictor of sensitivity to the EGFR inhibitor erlotinib. Others have similarly observed high activity of EGFR inhibitors in EGFR-mutant NSCLC cell lines (6, 13, 68), supporting the validity of our unbiased computational approach employing systematic global measurements of genetic lesions.
Applying systematic similarity profiling using computationally defined significant genetic lesions, we also identified predictors for compounds currently in clinical use or trials. Specifically, in an unbiased manner, we confirmed EGFR mutations not only to predict sensitivity to EGFR inhibitors (erlotinib, PD168393, vandetanib) (6–8, 47, 52) but also to the SRC/ABL inhibitor dasatinib (54, 56). We formally demonstrated that EGFR is the relevant target of dasatinib in EGFR-mutant cells by showing the lack of activity of this compound in Ba/F3 cells expressing the T790M resistance allele of EGFR. Thus, exploring multiple orthogonal genomic and chemical data sets enabled the formal definition of a relevant tumor-cell target of an FDA-approved drug.
In addition, we performed supervised identification of predictors for drug sensitivity. A noteworthy finding is the role of KRAS mutation as a predictor of sensitivity to 17-AAG. Independent validation of the predictor for an Hsp90 inhibitor in a transgenic murine lung cancer model strengthens the robustness of our approach. Given the high prevalence of cancer patients with mutated KRAS and their unfavorable prognosis, this finding might be of clinical importance, as Hsp90 inhibitors (e.g., 17-AAG, IPI-504, NVP-AUY922) are currently under clinical evaluation.
Finally, our compound target-enrichment approach for prediction of sensitivity led to the observation of exquisite vulnerability of cells with copy number gain of ephrin receptor and SRC family genes as well as ABL2 to dasatinib treatment. As a proof of principle we validated our prediction model in great depth for the relevance of SRC amplification for dasatinib activity in vitro and in vivo. Thus, copy number gain affecting one of these genes may render tumor cells dependent on the encoded kinases, thereby defining potential biomarkers for successful treatment of NSCLC patients with dasatinib, an FDA-approved drug.
In summary, we have established a genomically, phenotypically, and functionally validated tool for studying drug activity mechanisms in the laboratory. Our results strengthen the notion that multiple orthogonal data sets pertinent to large cancer cell-line collections may offer an as-yet-unmatched potential for exploring the cell-biological impact of novel compounds in genomically defined cancer types. Such cell-line collections may advance molecularly targeted treatment of cancer by providing a tool for preclinical molecular drug target validation on the basis of the genetic lesion signature characteristic of individual tumors.
The cell-line collection generated by A.F. Gazdar, J. Minna, and colleagues (69, 70) formed the basis of this collection. Further cell lines were obtained from ATCC, DSMZ (German Collection of Microorganisms and Cell Cultures, Germany), and our own or other cell culture collections. Details on all cell lines are listed in Supplemental Table 1, including providers and culture conditions. Cells were routinely controlled for infection with mycoplasma by MycoAlert (Cambrex) and were treated with antibiotics according to a previously published protocol (71) in case of infection.
Genomic DNA was extracted from cell lines using the Puregene kit (QIAGEN) and hybridized to high-density oligonucleotide arrays (Affymetrix) interrogating 238,000 SNP loci on all chromosomes except Y, with a median intermarker distance of 5.2 kb (mean 12.2 kb). Array experiments were performed according to the manufacturer’s instructions. SNPs were genotyped by the Affymetrix Genotyping Tools software, version 2.0. SNP array data of 371 primary samples were obtained from the Tumor Sequencing Project (processed data file viewable in GenePattern’s SNP viewer: dataset.snp; http://www.broad.mit.edu/cancer/pub/tsp/) (15). To analyze the data sets, we applied GISTIC (14), which we believe is a novel and general method. In brief, each genomic marker was scored according to an integrated measure of the prevalence and amplitude of copy number changes (and only prevalence in the case of LOH), and the statistical significance of each score was assessed by comparison with the results expected from the background aberration rate alone. The GISTIC algorithm was run using 2 different pairs of copy number thresholds: copy number 4 (amplifications) and 1 (deletions), and copy number 2.14 (amplifications) and 1.87 (deletions), to reflect focal and broad events, respectively. For the sake of simplicity, we refer to these settings using only the amplification threshold.
For identification of homozygous deletions, SNP data were filtered for 5 coherent SNPs exhibiting copy numbers of less than 0.5. The analysis was focused on focal losses, excluding entire chromosomal arms. Information about genes located in a region of homozygous deletion was based on hg17 build of the human genome sequence from the University of California Santa Cruz (http://genome.ucsc.edu).
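The filtering step described above can be sketched as a scan for runs of low-copy-number markers. This is only an illustration: it assumes that "5 coherent SNPs" means at least 5 consecutive markers along a chromosome, and the function name and the toy profile are hypothetical. The exclusion of whole-arm losses and the gene annotation step are not shown.

```python
from typing import List, Tuple


def find_homozygous_deletions(copy_numbers: List[float],
                              max_cn: float = 0.5,
                              min_run: int = 5) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of runs of at least
    `min_run` consecutive SNPs whose copy number is below `max_cn`."""
    segments = []
    run_start = None
    for i, cn in enumerate(copy_numbers):
        if cn < max_cn:
            if run_start is None:
                run_start = i  # a new low-copy run begins here
        else:
            if run_start is not None and i - run_start >= min_run:
                segments.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the last marker.
    if run_start is not None and len(copy_numbers) - run_start >= min_run:
        segments.append((run_start, len(copy_numbers) - 1))
    return segments


# Hypothetical copy number profile ordered by genomic position.
profile = [2.0, 1.9, 0.3, 0.2, 0.1, 0.4, 0.3, 2.1, 0.2, 0.3, 2.0]
print(find_homozygous_deletions(profile))
```

Only the first run in the toy profile is long enough to be reported; the later 2-marker dip is ignored.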
The analysis was performed by computing ratios of observed versus expected cooccurrence frequency of individual lesions. Hierarchical clustering of mutation data combined with dichotomized quantitative copy number changes was performed using the reciprocal cooccurrence ratio as the distance measure with the average linkage method. As the adequate threshold for occurrence of copy number lesions depends on the overall level of copy number alteration for that specific lesion, the sum of these ratios for 3 distinct thresholds was used.
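As a minimal sketch of the observed-versus-expected ratio, assume each lesion is a 0/1 vector across cell lines and that the expected frequency is the product of the two marginal frequencies (independence). The function names and the handling of a zero ratio are assumptions; the summation over 3 thresholds and the clustering itself are not shown.

```python
import math
from typing import Sequence


def cooccurrence_ratio(a: Sequence[int], b: Sequence[int]) -> float:
    """Observed / expected cooccurrence frequency of two binary lesions,
    where 'expected' assumes the lesions occur independently."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("lesion vectors must be non-empty and equally long")
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x and y) / n
    expected = (sum(1 for x in a if x) / n) * (sum(1 for y in b if y) / n)
    if expected == 0:
        return 0.0  # at least one lesion never occurs
    return observed / expected


def reciprocal_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Distance for clustering: the reciprocal of the cooccurrence ratio.
    Lesions that never cooccur are placed infinitely far apart."""
    ratio = cooccurrence_ratio(a, b)
    return math.inf if ratio == 0 else 1.0 / ratio


# Hypothetical lesion calls for 8 cell lines (1 = lesion present).
lesion_a = [1, 1, 1, 0, 0, 0, 0, 0]
lesion_b = [1, 1, 0, 0, 0, 0, 0, 1]
print(cooccurrence_ratio(lesion_a, lesion_b))
print(reciprocal_distance(lesion_a, lesion_b))
```

A ratio above 1 indicates that two lesions cooccur more often than expected by chance, which translates into a short distance in the clustering.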
Mutation status of known oncogene mutations in the genes EGFR, BRAF, ERBB2, PIK3CA, NRAS, KRAS, ABL1, AKT2, CDK4, FGFR1, FGFR3, FLT3, HRAS, JAK2, KIT, PDGFRA, and RET was determined by mass-spectrometric genotyping. Mutation status of these genes for all cell lines was published previously (22). In addition, the genes EGFR, BRAF, ERBB2, PIK3CA, KRAS, TP53, STK11, PTEN, and CDKN2A were bi-directionally sequenced following PCR amplification of all coding exons.
Expression data for 54 of the cell lines were obtained using Affymetrix U133A arrays. RNA extraction, hybridization, and scanning of arrays were performed using standard procedures (35). CEL files from U133A arrays were preprocessed using the dChip software (http://biosun1.harvard.edu/complab/dchip/; build date May 5, 2008). We compared the cell lines with cell lines and primary tumors from lung cancer (28), renal cell carcinomas (29, 72), and lymphoma (30, 73) data sets obtained from GEO (http://www.ncbi.nlm.nih.gov/geo/) by hierarchical clustering. Data were processed by standard procedures; normalization was performed in dChip. For comparison of NSCLC cell lines (U133A) and primary tumors, we used data on adenocarcinomas from Bhattacharjee and colleagues generated on U95Av2 arrays (35). We selected genes that we found differentially expressed between cell lines with mutant EGFR and WT EGFR (fold change between groups >2, 90% CI; absolute difference > 100, P < 0.01) and between erlotinib-sensitive and erlotinib-resistant cell lines (erlotinib-sensitive [GI50 < 0.1 μM] vs. erlotinib-resistant [GI50 > 2 μM], fold change > 2, 90% CI; absolute difference > 100, P < 0.005). For principal component analysis, the R language for statistical computing was used. Variable transcripts were identified using the following filtering criteria: coefficient of variation 1.9 through 10, 40% present call rate. The first principal component described 14.5% of the overall variance, the second 9.6%, and the third 8.2%. Using a cutoff of 1400 in the eigenvalue, samples were grouped according to the first principal component.
All compounds were purchased from commercial suppliers or synthesized in house, dissolved in DMSO, and stored at –80°C. Cells were plated into sterile microtiter plates using a Multidrop instrument (Thermo Scientific) and cultured overnight. Compounds were then added in serial dilutions. Cellular viability was determined after 96 hours by measuring cellular ATP content using the CellTiter-Glo Assay (Promega). Plates were measured on a Mithras LB 940 Plate Reader (Berthold Technologies). GI50 values were determined from the preimage under the growth inhibition curve, where the latter was smoothed according to the logistic function with the parameters appropriately chosen. For these analyses, we have established a semiautomated pipeline as what we believe to be a novel R package (43).
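To illustrate reading a GI50 off a smoothed curve, the sketch below assumes a 4-parameter logistic model for viability (percent of control) and inverts it in closed form at the 50% level. The parametrization, the parameter names, and the choice to return infinity when the curve never reaches 50% are assumptions made here; they do not describe the R package cited above, and the curve fitting itself is not shown.

```python
import math


def logistic_viability(conc: float, top: float, bottom: float,
                       ec50: float, hill: float) -> float:
    """4-parameter logistic: viability (% of control) at concentration `conc`."""
    return bottom + (top - bottom) / (1.0 + (conc / ec50) ** hill)


def gi50_from_logistic(top: float, bottom: float, ec50: float,
                       hill: float, level: float = 50.0) -> float:
    """Concentration at which the fitted curve crosses `level` percent.

    This is the preimage of `level` under the logistic curve. If the curve
    never reaches `level` (bottom >= level or top <= level), return infinity.
    """
    if not (bottom < level < top):
        return math.inf
    return ec50 * ((top - bottom) / (level - bottom) - 1.0) ** (1.0 / hill)


# Hypothetical fitted parameters for one compound / cell line pair (conc in uM).
params = dict(top=100.0, bottom=10.0, ec50=0.5, hill=1.2)
gi50 = gi50_from_logistic(**params)
print(gi50, logistic_viability(gi50, **params))
```

Because the lower plateau in this example is above zero, the GI50 is larger than the curve's inflection point; the two coincide only when the curve runs from 100% down to 0%.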
For lesion-based prediction of sensitivity, 3 different approaches were applied. First, the most sensitive and most resistant samples were chosen according to their sensitivity profile. Where the sensitivity profile of the corresponding compound did not allow a clear distinction between resistant and sensitive cell lines, groups were defined by the 25th and 75th percentiles. We used Fisher’s exact test to evaluate the association between the activity of the compound and the presence of significant lesions as defined by GISTIC. For this purpose, the cell-line panel was divided according to the presence of each lesion. The logarithmically transformed GI50 values pertinent to each group were now compared by a 2-sample Welch’s t test. In order to avoid an artificially low variance, the Welch’s t tests were based on a fixed variance determined as the mean of the variances that were clearly distinct from zero (>0.1). Details of this procedure are presented in the publication by Solit and colleagues (12).
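The two tests named above can be sketched with the Python standard library. The sketch assumes a 2 × 2 table of sensitive/resistant versus lesion present/absent for Fisher's exact test, and log-transformed GI50 values for the two lesion groups in Welch's test. It returns only the Welch statistic and degrees of freedom, and it does not reproduce the fixed-variance modification described in the text; all example numbers are hypothetical.

```python
from math import comb, sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance so floating-point ties count as "as extreme".
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9)))


def welch_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical counts: rows = sensitive / resistant, cols = lesion yes / no.
print(fisher_exact_two_sided(6, 1, 1, 6))

# Hypothetical log10 GI50 values for lines with and without a lesion.
with_lesion = [-1.2, -0.9, -1.5, -1.1]
without_lesion = [0.3, 0.8, 0.1, 0.6, 0.4]
print(welch_t(with_lesion, without_lesion))
```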
In a next step, multilesion predictors of sensitivity were calculated using feature selection, with subsequent validation by a KNN algorithm with a leave-one-out strategy (45), in which the same choice of samples was used as above for Fisher’s exact test: For all but 1 sample, genetic lesions strongly discriminating between sensitive and resistant cell lines were selected and the prediction was validated by the remaining left-out sample. Copy number data were dichotomized to ensure a better comparability with the mutation data. Five different thresholds were used to dichotomize the copy numbers: 2.14, 2.46, 2.83, 3.25, and 4 for amplified loci; and 1.87, 1.62, 1.41, 1.23, and 1 for deletions. The collection of features and the threshold for the dichotomization were selected for which the leave-one-out validation showed best performance and was taken as the best combined predictor to the respective compound. As a measure to select the setting with the largest predictive strength, the Youden index (sensitivity + specificity – 1) was used.
For example, the best erlotinib single gene predictor was obtained when the lesion data were dichotomized using the thresholds 3.25 and 1.23, respectively. Cell lines with a GI50 of less than 0.07 μM were considered sensitive. For the predictor, the same cutoff values were used. Best performance in the leave-one-out cross validation was obtained using 15 features, k = 3 neighbors, and the cosine-based metric. Due to the problem of multiple hypothesis testing, the significance of the above Welch’s t tests as well as Fisher’s exact tests should be understood in an explorative rather than confirmative sense.
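The validation loop can be sketched as follows, assuming dichotomized lesion features, a cosine-based distance, k = 3, and majority voting. Note that the per-fold feature selection described above is omitted for brevity, so this is a simplified illustration rather than the procedure used in the study; the helper names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def dichotomize(copy_numbers: Sequence[float], amp_threshold: float) -> List[int]:
    """1 if the copy number reaches the amplification threshold, else 0."""
    return [1 if cn >= amp_threshold else 0 for cn in copy_numbers]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        return 1.0  # treat an all-zero profile as maximally dissimilar
    return 1.0 - dot / (nu * nv)


def knn_predict(train_x, train_y, query, k: int = 3) -> int:
    """Majority vote among the k nearest training samples (1 = sensitive)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: cosine_distance(train_x[i], query))
    votes = [train_y[i] for i in order[:k]]
    return 1 if 2 * sum(votes) > len(votes) else 0


def leave_one_out(x, y, k: int = 3) -> List[int]:
    """Predict every sample from all remaining samples."""
    return [knn_predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x[i], k)
            for i in range(len(x))]


def youden_index(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Sensitivity + specificity - 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity + specificity - 1.0


# Hypothetical binary lesion profiles (rows = cell lines) and labels.
features = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [1, 1, 1],
            [0, 0, 1], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = leave_one_out(features, labels, k=3)
print(predictions, youden_index(labels, predictions))
```

In a fuller version, this loop would be repeated for each of the five dichotomization thresholds and each candidate feature set, and the setting with the largest Youden index would be retained.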
The NCI-60 cancer cell-line panel was used for validation of our findings (http://dtp.nci.nih.gov/mtargets/mt_index.html). Since the MEK inhibitor U0126 and the Hsp90 inhibitor 17-AAG were not covered by the collection of pharmacological data, we analyzed the association of the respective lesions to hypothemycin (MEK inhibitor) and to geldanamycin (17-AAG is a geldanamycin derivative) instead. Significance of association was analyzed by Fisher’s exact test. Due to strongly discordant GI50 values, the cell lines HOP62 and A549 were excluded from the analysis with respect to the Hsp90 inhibitors. The thresholds for 1q21.3 amplification were set according to the overall distribution of copy number changes in the respective data sets (2.7 corresponding to 33% of the NSCLC cell lines; 2.4 corresponding to 33% of the NCI-60 collection).
All Fisher’s exact tests, Welch’s t tests (all 2-tailed), and Wilcoxon tests were performed using R version 2.7.1 (http://www.wpic.pitt.edu/WPICCompGen/hclust/hclust.htm). A level of significance of 5% was chosen. For cluster analysis, the R routine “hclust” was used.
The crystal structures of dasatinib bound to ABL kinase (pdb code 2GQG; ref. 74) and vandetanib bound to the RET kinase (pdb code 2IVU; ref. 75) were aligned to the kinase domain of EGFR bound to erlotinib (pdb code 1M17; ref. 76) using PyMOL software, 1.1beta (DeLano Scientific LLC). Based on the structural alignment of ABL with EGFR, the binding mode for dasatinib in EGFR is identical to that of the dasatinib-ABL complex. Figures of the structures were prepared using PyMOL.
Whole-cell lysates were prepared in NP40 lysis buffer (50 mmol/l Tris-HCl, pH 7.4, 150 mmol/l NaCl, 1% NP40) supplemented with protease and phosphatase inhibitor I and II cocktails (Merck) and clarified by centrifugation. Proteins were subjected to SDS-PAGE on 12% gels, except where indicated. Western blotting was done as described previously (77). The EGFR (no. 2232), the AKT (no. 9272), and the phospho-SRC (Tyr416) (no. 2101) antibodies were purchased from Cell Signaling Technology. The SRC (GD11) antibody was purchased from Millipore. The Hsp90 antibody (16F1) was purchased from Stressgen (Assay Designs). The phospho-EGFR (Tyr1068) antibody was purchased from BioSource (Invitrogen). The cyclin D1 (DCS-6), the c-RAF (C-20), and the actin (C-11) antibodies were purchased from Santa Cruz Biotechnology Inc. The KRAS (234-4.2) antibody was purchased from Calbiochem.
For the detection of complexes of Hsp90 with KRAS or EGFR and vice versa, whole-cell lysate (0.5–1 mg) in NP40 lysis buffer was incubated with Agarose A/G Plus preconjugated with the Hsp90 or KRAS antibody (see Western blot analyses). Immunoprecipitates were washed in NP40 lysis buffer, boiled in sample buffer, and subjected to SDS-PAGE followed by Western blotting using an anti-KRAS, anti-Hsp90, or anti-EGFR antibody to detect complex formation.
Cells were plated in 6-well plates after 24 hours of incubation, treated with 17-AAG for 72 hours, and finally harvested after trypsinization. Then cells were washed with PBS, resuspended in annexin V binding buffer, and finally stained with annexin V–FITC and propidium iodide. FACS analysis was performed on a FACSCanto flow cytometer (BD Biosciences), and results were calculated using FACSDiva Software, version 5.0.
Replication-incompetent retroviruses were produced from pBabe-based vectors by transfection into the Phoenix 293-TL packaging cell line (Orbigen) using the calcium precipitation method. Replication-incompetent lentiviruses were produced from pLKO.1-puro based vectors containing the shRNA insert (http://www.broad.mit.edu/node/563) by cotransfection of 293-TL cells with pMD.2 and pCMVd.8.9 helper plasmids using reagent Trans-LT (Mirus). Cells were infected with viral supernatants in the presence of polybrene. After 24 hours, medium was changed and cell lines were selected with 1–2 μg/ml puromycin, from which stably transduced clonal cell lines were derived.
All mutations (Y530F; T341M) were introduced into the c-SRC ORF with the QuikChange XL II Mutagenesis Kit (Stratagene) following the instructions of the manufacturer. Oligonucleotides covering the mutations were designed with the software provided by Stratagene, and each mutant was confirmed by sequencing.
The lox-stop-lox–KRAS (LSL-KRAS) mouse lung cancer model has been described elsewhere (64). Seven mice were imaged by MRI at 12 to 20 weeks after adeno-CRE treatments to document initial tumor volume. The mice were then divided into 17-DMAG (LC Laboratories) and placebo treatment groups, with 4 and 3 mice, respectively. 17-DMAG was formulated in saline and given by tail-vein injection at a dose of 20 mg/kg/d. Mice were imaged by MRI after 1 week of drug treatment and sacrificed for further histological analysis thereafter. The protocol for animal work was approved by the Dana-Farber Cancer Institute Institutional Animal Care and Use Committee, and the mice were housed in a pathogen-free environment at the Harvard School of Public Health.
Mice were anesthetized with 1% isoflurane; respiratory and cardiac rates were monitored with BioTrig Software, version BT1 (Bruker BioSpin). Animals were imaged in the coronal planes with a rapid acquisition with relaxation enhancement (RARE) sequence (Tr = 2000 ms; effective TE = 25 ms, where Tr = pulse repetition time and TE = minimum echo time), using 17 × 1 mm slices to cover the entire lung. Matrix size of 128 × 128 and field of view (FOV) of 2.5 × 2.5 cm² were used for all imaging. The areas of lung tumors were manually segmented and measured using ImageJ software (version 1.33; http://rsbweb.nih.gov/ij/) on each magnetic resonance slice. Total tumor volume was calculated by adding tumor areas from all 17 slices (78). Note that MRI cannot clearly distinguish tumor lesions and postobstruction pneumonia that is induced by bronchial tumors of this particular tumor model.
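The volume calculation from segmented slices can be sketched as a simple sum. Multiplying the summed areas by the 1 mm slice thickness to obtain a volume is an assumption made here, since the text states only that areas were added; the function names and measurements are hypothetical.

```python
from typing import Sequence


def tumor_volume_mm3(slice_areas_mm2: Sequence[float],
                     slice_thickness_mm: float = 1.0) -> float:
    """Approximate tumor volume as summed slice areas times slice thickness."""
    return sum(slice_areas_mm2) * slice_thickness_mm


def percent_change(baseline: float, follow_up: float) -> float:
    """Relative change in tumor volume versus baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline volume must be positive")
    return 100.0 * (follow_up - baseline) / baseline


# Hypothetical segmented tumor areas (mm^2) for a few slices of one mouse.
before = tumor_volume_mm3([2.0, 3.0, 5.0])
after = tumor_volume_mm3([1.0, 1.0, 2.0])
print(before, after, percent_change(before, after))
```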
All animal procedures were in accordance with the German Laws for Animal Protection and were approved by the local animal protection committee and the local authorities (Bezirksregierung Köln). Tumors were generated by s.c. injections of 5 × 10⁶ tumor cells into nu/nu athymic male mice. When tumors had reached a size of about 50 mm³, animals were randomized into 2 groups, control (vehicle) and dasatinib-treated mice. All controls were dosed with the same volume of vehicle. Mice were treated daily by oral gavage of 20 mg/kg dasatinib. The vehicle used was propylene glycol/water (1:1). Tumor size was monitored every 2 days by measuring perpendicular diameters. Tumor volumes were calculated from the determination of the largest diameter and its perpendicular diameter according to the equation [tumor volume = a × (b²/2), where a = tumor width and b = tumor length].
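A short sketch of the caliper-based volume estimate, following the equation exactly as written above (a = width, b = length). The function names are hypothetical, and the 50 mm³ default is taken from the randomization criterion in the text.

```python
def xenograft_volume_mm3(width_mm: float, length_mm: float) -> float:
    """Tumor volume = a * (b^2 / 2), with a = width and b = length as in the text."""
    return width_mm * (length_mm ** 2) / 2.0


def ready_for_randomization(volume_mm3: float, threshold_mm3: float = 50.0) -> bool:
    """True once a tumor has reached the size at which mice were randomized."""
    return volume_mm3 >= threshold_mm3


# Hypothetical caliper measurements in millimeters.
volume = xenograft_volume_mm3(width_mm=4.0, length_mm=5.0)
print(volume, ready_for_randomization(volume))
```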
We thank William Pao and William Sellers for helpful discussions and comments on the manuscript. We thank Andreas Janzer for help with immunoblotting experiments and Diana Wagner-Stippich for excellent technical assistance. Roman Thomas is a fellow of the International Association for the Study of Lung Cancer (IASLC). Stefanie Fisher holds a Köln Fortune fellowship. This work was supported by grants from the Deutsche Krebshilfe (107954 to Roman Thomas) and the German Ministry of Science and Education (BMBF) as part of the German National Genome Research Network (NGFNplus) program (01GS08100 to Roman Thomas). John Minna is supported by grants from SPORE (P50CA70907), DOD PROSPECT, and the Longenbaugh Foundation. Jordi Barretina holds a Beatriu de Pinos fellowship from the Departament d’Educació i Universitats de la Generalitat de Catalunya. K.-K. Wong was supported by NIH grants R01 CA122794 and R01 AG2400401; Dana-Farber/Harvard Cancer Center Lung Cancer Specialized Program of Research Excellence (SPORE) grant P50 CA090578; and the Cecily and Robert Harris Foundation.
Authorship note: Martin L. Sos, Kathrin Michel, Thomas Zander, Peter Frommolt, and Jonathan Weiss contributed equally to this work.
Conflict of interest: R.K. Thomas has received research support from AstraZeneca. J. Wolf has received research support from Novartis and Roche. A.F. Gazdar has served as a consultant/lecturer for AstraZeneca, Genentech, and Boehringer Mannheim. L.A. Garraway has received research support from and served as a consultant for Novartis. M. Meyerson is a consultant for and received research funding from Novartis. M. Meyerson is an inventor on a patent describing EGFR mutation testing as a diagnostic test.
Nonstandard abbreviations used: 17-AAG, 17-(allylamino)-17-demethoxygeldanamycin; ABL2, v-abl Abelson murine leukemia viral oncogene homolog 2; BCR, breakpoint cluster region; BRAF, v-raf murine sarcoma viral oncogene homolog B1; 17-DMAG, 17-(dimethylaminoethylamino)-17-demethoxygeldanamycin; ERBB2, v-erb b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma-derived oncogene homolog (avian); GI50, half-maximal growth inhibitory concentrations; GISTIC, Genomic Identification of Significant Targets in Cancer; KIT, v-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homolog; KNN, K-nearest-neighbor; KRAS, v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog; LOH, loss of heterozygosity; NSCLC, non–small cell lung cancer; PIK3CA, phosphoinositide-3-kinase, catalytic, α polypeptide; SRC, v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian); TESP, Target-Enriched Sensitivity Prediction.
Citation for this article: J. Clin. Invest. 119:1727–1740 (2009). doi:10.1172/JCI37127