|Home | About | Journals | Submit | Contact Us | Français|
Clinical responses to anticancer therapies are often restricted to a subset of patients. In some cases, mutated cancer genes are potent biomarkers of response to targeted agents. To uncover new biomarkers of sensitivity and resistance to cancer therapeutics, we screened a panel of several hundred cancer cell lines, which represent much of the tissue-type and genetic diversity of human cancers, with 130 drugs under clinical and preclinical investigation. In aggregate, we found mutated cancer genes were associated with cellular response to most currently available cancer drugs. Classic oncogene addiction paradigms were modified by additional tissue-specific or expression biomarkers, and some frequently mutated genes were associated with sensitivity to a broad range of therapeutic agents. Unexpected relationships were revealed, including the marked sensitivity of Ewing’s sarcoma cells harboring the EWS-FLI1 gene translocation to PARP inhibitors. By linking drug activity to the functional complexity of cancer genomes, systematic pharmacogenomic profiling in cancer cell lines provides a powerful biomarker discovery platform to guide rational cancer therapeutic strategies.
There is compelling evidence that the likelihood of a patient’s cancer responding to treatment can be strongly influenced by alterations in the cancer genome. For example, the use of drugs to selectively target the protein product of the BCR-ABL translocation in chronic myeloid leukemia (CML) has revolutionized the treatment of this disease, with 5-year survival rates of 90% in treated patients1. While targeting of specific genetic changes in defined patient subsets has been successful, a poorly explained range of responses to appropriately selected therapies is often still observed in patients2,3. Moreover, a large number of cancer drugs have not been linked to specific genomic alterations that could be used as biomarkers to specify their selective therapeutic effectiveness4. As drug pipelines generate new classes of compounds, systematic methods to identify predictive biomarkers during their early development could have a profound effect on the design, cost and ultimate success of new cancer drug development.
The NCI60 cell line panel and associated drug screens pioneered the approach of using cancer cell lines to link drug sensitivity with genotype data5,6. Cancer cell lines have subsequently been used to identify rare drug-sensitizing genotypes, including mutant EGFR, BRAF and EML4-ALK, that are highly predictive of clinical responses2,3,7. Here, we report the results of a large-scale screen of human cancer cell lines, incorporating detailed genomic and gene expression analysis, to systematically identify drug sensitivity biomarkers to a broad range of cancer drugs.
In order to capture the high degree of genomic diversity in cancer and to identify rare mutant subsets with altered drug sensitivity, we assembled 639 human tumour cell lines, representing the spectrum of common and rare types of adult and childhood cancers of epithelial, mesenchymal and haematopoietic origin (Fig. 1a and Supplementary Data 1). Cell lines were subjected to sequencing of the full coding exons of 64 commonly mutated cancer genes, genome-wide analysis of copy number gain and loss using Affymetrix SNP6.0 microarrays, and expression profiling of 14,500 genes using Affymetrix HT-U133A microarrays. The presence of seven commonly rearranged cancer genes and of microsatellite instability (MSI) were also investigated. The 130 drugs selected for screening covered a wide range of targets and processes implicated in cancer biology (Fig. 1b and Supplementary Data 2). They encompassed both targeted agents (n =114) and cytotoxic chemotherapeutics (n=13), including approved drugs used in clinical practice (n=31), drugs in development undergoing studies in clinical trials (n=47), and experimental tool compounds (n=52). To gain insight into drug-to-drug variation, we included multiple drugs designed against well-credentialed targets (Fig. 1b). The effect of 72 hours of drug treatment on cell viability was used to derive a multi-parameter description of drug sensitivity, including the half maximal inhibitory concentration (IC50), and the slope of the dose response curve (Supplementary Fig. 1). In total, we assayed 48,178 drug-cell line combinations with a range of 275-507 cell lines screened per drug (mean = 368 cell lines per drug; Supplementary Data 2). Clustering of compounds across cell lines based on IC50 values indicated that drugs with overlapping specificity were highly correlated, supporting the selectivity of the biological effects observed in the dataset (Supplementary Fig. 3 and Supplementary Tables 1 and 2).
Tumours from a particular tissue frequently have a shared set of somatic mutations. To gain insight into how this might relate to drug sensitivity, we performed an analysis to identify associations between cancer tissue type and drug sensitivity based on IC50 values. As expected, in some instances tumour-type specific sensitivity may be explained by the prevalence of cancer gene mutations (e.g. breast cancer sensitivity to inhibitors of the PI3Kinase pathway which is commonly altered in this tumour type; Supplementary Data 3 and 4). In other cases, however, our current understanding of cancer genomes could not explain the observed associations. For example, renal cell carcinoma (RCC) cells were sensitive to five SRC inhibitors (e.g. AZD0530, P < 1 × 10−4, n = 9 RCC and 294 non-RCC)8, glioma cells to a ROCK inhibitor (GSK269962A, P < 1 × 10−6, n = 23 glioma and non-glioma 266)9. This analysis also identified therapeutic associations already used in the clinic with incompletely understood molecular basis such as sensitivity of myeloma cells to lenalidomide (P < 1 × 10−5, n = 3 myeloma and non-myeloma 455)10. For most drugs, however, sensitive cell lines were scattered across multiple cancer types.
Single gene mutations are increasingly being adopted as clinical biomarkers for the optimal application of cancer therapeutics. To identify associations between individual mutated cancer genes and drug sensitivity across the cell line panel we used a multivariate analyses of variance (MANOVA) incorporating the IC50 and slope of the dose response curve. This analysis revealed a large number of individual gene-drug associations, a subset of which (448/9039, 5%) were highly significant and are discussed here (Fig. 1c and Supplementary Data 5). Interestingly, most of the cancer genes analyzed, including those that are not known direct targets of the drugs tested, were associated with either sensitivity or resistance to at least one drug in our panel (65/69, 94%) (Supplementary Fig. 4). Similarly, sensitivity to most drugs tested was associated with a mutation in at least one cancer gene (118/130, 91%). Thus, diverse cancer gene mutations are implicated as markers of sensitivity or resistance to a broad range of anti-cancer drugs, indicating that genomic biomarkers could inform the therapeutic selectivity of many cancer drugs.
The mutated cancer genes most clearly associated with drug sensitivity are oncogenes that are direct targets of the relevant drug. For example, the BCR-ABL rearrangement conferred sensitivity to multiple ABL inhibitors (e.g. P = 2.54 × 10−65 for nilotinib, Fig. 1c and Supplementary Fig. 5)1, several of which are approved for CML treatment. Similarly, BRAF mutation was associated with sensitivity to BRAF and MEK1/2 inhibitors (e.g. P = 1.25 × 10−24 for PLX4720, Fig. 1c and Fig. 2a, b and c)3, including a structural analogue of Vemurafenib, which in clinical trials has extended the survival of BRAF mutation-positive melanoma patients. Additionally, ERBB2 (HER2) amplification was associated with sensitivity to EGFR-family inhibitors including Lapatinib (P < 1 × 10−7, Fig. 2d)11, which is licensed for the treatment of HER2 positive breast cancer. We were also able to detect known associations between EGFR, FLT3, and PIK3CA mutations and drugs that target the products of these genes (Supplementary Data 5)12,13. A number of associations were driven by dramatic responses in small subsets of outlier cell lines. For example, two FGFR2-mutated cell lines were exquisitely sensitive to the FGFR inhibitor PD-173074 (Fig. 2e; P < 1 × 10−5)14,15, confirming the need for large panels of cell lines to capture low frequency drug sensitizing genotypes.
We also found associations between the presence of inactivating mutations in tumor suppressor genes and several drugs, which in some instances provide insight into the interplay between tumour suppressors and the cellular machinery in mediating drug sensitivity. For example, mutation of TP53, an important regulator of apoptosis and cell cycle arrest in response to cellular stress, confers resistance to Nutlin-3a (P < 1 × 10−36), an inhibitor of the MDM2 E3-ligase which negatively regulates p53 protein levels (Fig. 2f)16. Similarly, mutational inactivation of RB1, a key repressor of cell cycle progression in normal cells, confers resistance to PD-0332991 (P < 1 × 10−10), an inhibitor of the upstream cyclin-dependent kinases (CDKs) 4 and 6, which drive cell cycle progression by inhibiting pRb through phosphorylation (Fig. 2g)17. Conversely, mutational inactivation of CDKN2A, encoding the CDK inhibitory protein p16, was associated with sensitivity to PD-0332991 (P < 1 × 10−11; Supplementary Data 5)17, presumably because CDKN2A mutated cells have an enhanced requirement for signaling through the CDK4/6 - pRb signaling pathway.
In other instances genomic associations appear related to enrichment of mutations in a particular tissue type. The association of BRAF and NRAS mutation with sensitivity to obatoclax mesylate, a pro-apoptotic drug that targets BCL2 family anti-apoptotic proteins (BCL2, BCL-XL and MCL1), likely results from the enrichment of these mutations in melanoma, since drug sensitivity among melanoma cell lines was not correlated with presence or absence of these mutations (Supplementary Fig. 6). The tissue-specific effect of obatoclax may be related to inhibition of the melanoma survival mediator MCL118, since sensitivity of melanoma lines to ABT-263, another BCL-2 inhibitor which does not target MCL1, was not correlated with BRAF or NRAS mutation. Moreover, an ABT-263 insensitive melanoma cell line can be sensitized to this drug by siRNA-mediated depletion of MCL1 (Supplementary Fig. 7).
The genomic associations identified for 13 clinically approved cytotoxic chemotherapeutics in our panel were generally less significant than for targeted drugs, indicating that single gene biomarkers may be less informative for this class of drugs with broad action across many cancers (Supplementary Fig. 8 and 9). Intriguingly, we did not find general associations between targeted or cytotoxic drug sensitivity patterns and mutations in TP53. It may be that functional inactivation of p53, through mutations or abrogation of signaling pathways that regulate its activity, is an almost universal feature of cancer cell lines and thus differential drug sensitivity between mutant and non-mutant cell lines is not observed19.
Several other novel gene-drug associations were identified that cannot be readily explained on the basis of our current knowledge of signaling pathways and may reflect unappreciated biological relationships. Mutation of NOTCH1 was associated with sensitivity to ABT-263 (P <1 × 10−9; Fig. 2h, Supplementary Fig 10), perhaps due to decreased expression of BCL2 family members in NOTCH1 mutant cell lines (Supplementary Fig. 11). Amplification of CCND1 (CyclinD1) or loss of SMAD4 were associated with sensitivity to multiple EGFR-family inhibitors including lapatinib and BIBW2992; and for SMAD4 this correlated with elevated EGFR gene expression (Fig. 2i and Supplementary Fig. 12). Inactivation of STK11 (LKB1; P < 0.01), thought to relieve repression of mTOR, was associated with sensitivity to the HSP90 inhibitor 17-AAG. Additionally, loss of FBXW7 was associated with sensitivity to the histone deacetylase (HDAC) inhibitor MS-275 (P < 1 × 10−5; Fig. 2j), and TET2 loss with sensitivity to the WEE1/CHK1 inhibitor 681640 (P < 1 × 10−4; Fig. 2k). These associations, and others presented here (Supplementary Data 5), represent candidate biomarkers of drug sensitivity and may ultimately be useful for the deployment of targeted therapies in cancer.
In most instances sensitivity of cancer cells to drugs is likely to depend on a multiplicity of genomic and epigenomic variables. Indeed, single gene-drug associations were only rarely able to explain the range of drug sensitivities observed across cell lines for any given drug (Fig. 2). We thus applied elastic net (EN) regression20, a penalized linear modeling technique, to identify cooperative interactions among multiple genes and transcripts across the genome and defined response signatures for each drug. EN identified 26,938 feature-drug associations (Supplementary Data 6) from which 534 associations corresponding to 69 different drugs were highly significant (defined as −2.95> effect(e)>2.79 and frequency (f) > 0.76; Fig. 3a and Supplementary Fig. 13 and Supplementary Data 7).
In many instances transcriptional features showed correlations with drug sensitivity that were equal to or stronger than those observed with gene mutation (Fig. 3a and Supplementary Table 3). For example, while sensitivity to the EGFR/ERBB2 inhibitor lapatinib correlated with ERBB2 expression and mutation, the strongest correlate for this drug was actually expression of the matrix metalloproteinase MMP28 (e = −29.28, f=1) (Fig. 3a). Notably, for most drugs, including those with clear linkage to cancer gene mutations, EN modeling identified multi-feature signatures of drug sensitivity. For example, together with BRAF mutation, sensitivity to RAF or MEK1/2 inhibitors was recurrently associated with 67 features. These features included expression of SPRY2 and DUSP4/6, which are known regulators of MAP-Kinase signaling (Supplementary Fig. 14)21,22. Interestingly, expression of 8 genes identified as markers of sensitivity to the MEK inhibitor AZD6244 significantly overlapped with a 18-gene signature of sensitivity to this drug (hypergeometric test of the overlap significance: P = 3 × 10−9)23. In some cases, EN modeling identified complex patterns of biomarkers corresponding to distinct subsets of sensitive cancer cell lines. Thus, sensitivity to Dasatinib, an inhibitor of multiple kinases including ABL and SRC, correlated with both BCR-ABL translocation and with multi-gene transcriptional signatures that were expressed in sensitive cell lines lacking that gene fusion (Fig. 3b).
EN modeling also identified transcriptional correlates of sensitivity for drugs without a known sensitizing mutational event. This included expression of LAG3, which correlated with sensitivity to the SGK inhibitor GSK-650394 (e= −29.9, f =1) and the correlation between expression of the NADPH dehydrogenase family member NQO1 with sensitivity to the HSP90 inhibitor 17-AAG (e = −22.21, f =1, Fig. 3c). Consistent with these findings NQO1 was previously shown to metabolize 17-AAG into a more potent inhibitor of HSP9024.
A small number of features were recurrently associated with altered sensitivity to drugs from different classes indicating that they may be broadly involved in mediating drug sensitivity by diverse mechanisms such as drug efflux (e.g. ABCB1; Supplementary Data 6). To give further insight into this dataset, we have mapped the EN drug signatures onto the target of the drugs (Supplementary Data 7) and onto 457 known cancer genes (http://www.sanger.ac.uk/genetics/CGP/Census/; Supplementary Data 8). Collectively, these observations illustrate that in many instances multi-feature genomic signatures incorporating markers related to mutations, tissue lineage, cellular differentiation states and cellular pathways have the potential to expand and refine our current understanding of drug sensitivity.
We identified a highly significant association between the EWS-FLI1 rearrangement that is characteristic of Ewing’s sarcoma tumours and sensitivity to olaparib (AZD2281), an inhibitor of the poly-ADP ribose polymerase (PARP)(P = 1.03 × 10−26, Fig. 1c and and4a).4a). Screening of a structurally distinct PARP inhibitor, AG-014699, across a large panel of cell lines confirmed the sensitivity of Ewing’s sarcoma cell lines (geometric mean IC50 for EWS-FLI1 = 4.7 uM versus 64 uM for wild-type, P < 0.0001 Mann-Whitney test, n = 291 cell lines; Fig. 4a, Supplementary Fig. 15 and Supplementary Data 9). Cells from Ewing’s sarcoma were more sensitive to olaparib (P = 2.84 × 10−11) than cells from other tumour types, including sarcomas of bone and soft tissue (Supplementary Fig. 15 and Supplementary Data 3). PARP inhibitors have activity in BRCA1/2 mutant cancers due to the defects in homologous recombination present in these tumors and their consequent reliance on alternative DNA damage repair pathways that are targeted by these inhibitors25. A comparison of olaparib and AG-014699 sensitivity in a panel of cell lines using a 6-day viability assay and colony formation experiments confirmed the marked sensitivity of Ewing’s sarcoma to PARP inhibitors, an effect that was comparable to that observed in BRCA-deficient cells (Fig. 4b and c, and Supplementary Fig. 16 and 17)26. Furthermore, treatment with olaparib selectively induced apoptosis in Ewing’s sarcoma compared to control cells (Fig 4d). Unlike in Ewing’s Sarcoma, we did not observe an association between BRCA1/2 mutations and sensitivity to PARP inhibitors in the 3-day screening format, which is likely due to a requirement for several rounds of division in these cells to accumulate toxic levels of DNA damage.
To assess whether the sensitivity to PARP inhibitors is due to the presence of the EWS-FLI1 rearrangement or intrinsic to the mesenchymal precursor cell type from which Ewing’s sarcoma arise, we compared the sensitivity to olaparib in mouse mesenchymal cells transformed with either EWS-FLI1 or the related liposarcoma-associated translocation FUS-CHOP27,28. EWS-FLI1 transformed cells showed sensitivity comparable to human Ewing’s sarcoma cells, while the FUS-CHOP-transformed cells were relatively resistant (Fig. 4e and Supplementary Fig. 18). Moreover, expression of EWS-FLI1 in NIH3T3 cells conferred increased sensitivity to olaparib (Supplementary Fig. 19), whereas olaparib sensitivity was partially reverted by transiently depleting EWS-FLI1 from Ewing’s sarcoma cells (Fig 4f and Supplementary Fig. 20). Higher FLI1 expression was also correlated with sensitivity to olaparib even when considering only non-Ewing’s sarcoma cell lines (r= −0.32 between IC50 and FLI1 expression, n =413, P = 1.68 × 10−11), suggesting that the sensitivity to olaparib of Ewing’s sarcoma lines might be related to EWS-FLI1 transcriptional activity.
Mutations in BRCA1 or BRCA2 are not present in these Ewing’s sarcoma lines (Supplementary Data 1), and we have observed no evidence to indicate that the DNA damage response is defective in Ewing’s sarcoma cells (data not shown). However, for reasons that are currently unclear, the EWS-FLI1 translocation was associated with sensitivity to cytotoxic drugs, including DNA damaging agents such as camptothecin (P < 1 × 10−5), cisplatin (P < 1 × 10−4) and mitomycin-C (P < 0.001)(Supplementary Fig. 21 and 22). Together with the report of olaparib sensitivity in prostate cancer cell lines expressing the ETS gene fusion, TMPRSS2-ERG29, our data support the potential utility of ETS gene fusions as biomarkers of sensitivity to PARP inhibitors. Notably however, unlike the effect reported in prostate cancer, we observe potent cell death with olaparib treatment alone in Ewing’s sarcoma cells. These observations raise the possibility that PARP inhibitors have utility in the treatment of Ewing’s Sarcoma, a tumour of children and young adults with a 15% five year survival rate in patients with metastatic disease or relapse after chemotherapy30.
High throughput cancer cell line screening for drug sensitivity patterns provides a strategy to identify appropriate cancer subtypes and biomarkers that may guide the early phase clinical trials of multiple novel compounds under development. The validity of this approach is supported by its effective identification of drug-genotype associations that have already been established clinically, and it sets the stage for clinical testing of novel therapeutic biomarkers, such as the association between the EWS-FLI1 translocation in Ewing’s sarcoma cells and sensitivity to PARP inhibitors. The data release accompanying this report, as well as the ongoing public web resource from future screenings (Genomics of Drug Sensitivity in Cancer; www.cancerRxgene.org), will hopefully enhance the discovery and validation of additional predictive cancer biomarkers.
While the large number of cell lines screened facilitates representation of rare cancer genotypes and mitigates against the effects of individual samples, the dataset presented here is limited both by the number of available genotypes, as well as the number of targets interrogated by currently available drugs. Despite the apparent utility of using tumour-derived cell lines grown in two dimensional culture, it is likely that experimental models that better mimic the tumour environment would in some instances further improve our understanding of drug sensitivity and provide additional insights. Nonetheless, we can discern an initial landscape of drug sensitivity patterns across a broad set of different cancer types and genomic backgrounds.
The identification of “outliers” which are exquisitely sensitive to a drug as a result of a specific genetic abnormality within a targeted pathway remains the most compelling vision for targeted cancer therapies. BCR-ABL positive CML, BRAF mutant melanoma and EGFR mutant positive lung cancer and drugs that target the protein products of these genes are now well-established associations. The observation of PARP inhibitor sensitivity by EWS-FLI1 positive Ewing’s sarcoma cell lines points to the likelihood of new potent gene-drug associations as novel chemical and genomic space are explored. Even in the absence of “outlier” effects, pharmacogenomic profiling reveals a wealth of biomarkers that may prove useful for patient stratification. Although further work is needed to assess their potential clinical utility, in some instances these biomarkers may help explain heterogeneity in clinical responsiveness even among preselected patient populations.
This work, as well as an accompanying report31, provides a systematic and extensive view of the genomics underlying the sensitivity of human cancer cell lines to the diverse array of cancer drugs currently in use or under development. The emergent picture is of a complex network of biological factors that affect sensitivity to the majority of cancer drugs. This underscores both the challenge of identifying pre-selected patient populations for targeted therapies, as well as the opportunity to improve existing therapies and identify new therapeutic avenues by identifying more predictive biomarkers.
Cells were treated with 9 concentrations (2-fold dilutions) of drug for 72 hours before measuring cell number relative to controls. A MANOVA was used to examine how drug IC50 and slope values associate with tissue type, the mutation status of 64 cancer genes (including gene amplifications and homozygous deletions), rearrangements and MSI. The elastic net utilised the same genomic datasets as the MANOVA and also incorporated additional copy number data from a total of 426 cancer genes, transcriptional profiles, and tissue type to identify feature associated with drug response as measured by cell line IC50.
We thank Philip Lo Grasso of the Scripps Research Institute for providing the inhibitor JNK9L. This work was supported by a grant from the Wellcome Trust (086357; M.R.S, P.A.F, J.S., D.A.H.) and by grants from the National Institute of Health (P41GM079575-02; N.S.G and 1U54HG006097-01; N.S.G, D.A.H). S.R. is supported by a Physician-Scientist Early Career Award from the Howard Hughes Medical Institute. U.M is supported by a Cancer Research UK Clinician Scientist Fellowship.
All cell lines were sourced from commercial vendors. Cells were grown in RPMI or DMEM/F12 medium supplemented with 5% FBS and penicillin/streptavidin, and maintained at 37°C in a humidified atmosphere at 5% CO2. Cell lines were propagated in these two media in order to minimize the potential effect of varying the media on sensitivity to therapeutic compounds in our assay, and to facilitate high-throughput screening. To exclude cross-contaminated or synonymous lines, a panel of 92 SNPs was profiled for each cell line (Sequenom, San Diego, CA) and a pair-wise comparison score calculated. In addition, we performed short tandem repeat (STR) analysis (AmpFlSTR Identifiler, Applied Biosystems, Carlsbad, CA) and matched this to an existing STR profile generated by the providing repository. More information on the cell lines screened, including their SNP and STR profiles is available on the Genomics of Drug Sensitivity in Cancer website (www.cancerRxgene.org).
Compounds were from academic collaborators or commercial vendors. We have provided a full description for each compound including its name, source, PubChem and/or CHEMBL IDs, screening concentration as well as therapeutically relevant molecular target(s) (Supplementary Data 2). Compounds were stored as 10 mM aliquots at −80°C, and were subjected to a maximum of 5 freeze-thaw cycles. The range of concentrations selected for each compound was based on in vitro data of concentrations inhibiting relevant kinase activity and cell viability.
Cells were seeded in either 96-well or 384-well microplates in medium supplemented with 5% FBS and penicillin/streptavidin. The optimal cell number for each cell line was determined to ensure that each was in growth phase at the end of the assay (~70% confluency). Adherent cell lines were plated one day prior to treatment with a 9-point 2-fold dilution series of each compound using liquid handling robotics, and assayed at a 72-h time point. Cells were fixed in 4% formaldehyde for 30 minutes and then stained with 1 μM of the fluorescent nucleic acid stain Syto60 (Invitrogen) for 1 hour. Suspension cell lines were treated with compound immediately following plating, incubated for 72 hours, and then stained with 55 μg/ml Resazurin (Sigma) prepared in Glutathione-free media for 4 hours. Quantitation of fluorescent signal intensity was performed using a fluorescent plate reader at excitation and emission wavelengths of 630/695 nM for Syto60, and 535/595 nM for Resazurin. All screening plates were subjected to stringent quality control measures and a Z-factor score comparing negative and positive control wells was calculated across all screening plates (median = 0.70, upper quartile = 0.86, lower quartile = 0.47, n = 4857 plates). Drug screening was performed at two sites using matched cell line collections (the site of drug screening and plate format for each IC50 are described in Supplementary Data 2 and 10). As a control the drug camptothecin (drugs IDs 1003 and 195) was screened at both sites. The IC50s were highly correlated (r2 = 0.3244, slope = 1.030, n = 252 cell lines) and these drugs were nearest neighbours in our cluster analysis of drugs (Supplementary Table 1). Measurements of cell viability during 6-day assays using 3-fold dilution series of olaparib were performed in 96-well plates using Cell Titer Blue (Promega) according to manufacturers instructions. A673 and NIH3T3 cells were also exposed to a 3-fold dilution series of olaparib and cell viability measured following 72 hours of drug exposure using Cell Titer 96 Aqueous One Solution Cell (Promega) according to manufacturers instructions. Measurements of cellular apoptosis were performed using Apo-ONE caspase assay (Promega) following manufacturers instructions. The A673 Ewing’s sarcoma cell line was transiently transfected with allstars non-targeting siRNA control (Qiagen; siCT) or an siRNA targeting the EWS-FLI1 fusion protein32 (siEF1, 5′-GGC AGC AGA ACC CUU CUU ACG) and olaparib-treated immediately following siRNA-transfection. Transient knockdown of MCL1 in HT-144 and A549 cells was performed using the following siRNA sequences: oligo 1, 5′-CTG GTT TGG CAT ATC TAA TAA; oligo 2, 5′-CCC GCC GAA TTC ATT AAT TTA; oligo 3, 5′-AAG GGT TAG GAC CAA CTA CAA; oligo 4, 5′-CCC TAG CAA CCT AGC CAG AAA) and controls were mock transfected.
Cells were plated at low density into 35 mm cell culture plates and the following day treated with the indicated drug concentration or vehicle control (DMSO). The medium was changed and cells re-drugged every 3-4 days. When sufficient colonies were visible, typically after 7-21 days, cells were washed once in PBS before fixing in ice-cold methanol for 30 minutes while shaking. Methanol was aspirated and Giemsa stain added at a dilution of 1:20 overnight while shaking. The following day cells were rinsed in distilled water and air-dried.
A total of 64 of the most frequently mutated cancer genes were sequenced to base-pair resolution across all coding exons for each gene by capillary sequencing in our panel of human cancer cell lines, and which formed the basis for the cell lines chosen for this drug screen. The presence of 7 of the most commonly rearranged cancer genes (e.g. BCR-ABL, MLL-AFF1 and EWS-FLI1) was determined across the drug screen cell line panel by the design of breakpoint-specific sequence primers that enabled the detection of the rearrangement following capillary sequencing. Analysis of microsatellite instability (MSI) was carried out according to the guidelines set down by “The International Workshop on Microsatellite Instability and RER Phenotypes in Cancer Detection and Familial Predisposition”33. Samples were screened using the markers BAT25, BAT26, D5S346, D2S123 and D17S250 and were characterised as MSI if two or more markers showed instability. Total integral copy number values across the footprints of the cancer genes were determined from Affymetrix SNP6.0 microarray data using the ‘PICNIC’ algorithm to predict copy number segments in each of the cell lines34. For a gene to be classified as amplified, the entire coding sequence must be contained in one contiguous segment defined by PICNIC, and have a total copy number of eight or more. A copy number of 8 was chosen to ensure that the majority of segments above this threshold consisted of focal amplifications more likely to have been selectively amplified. Deletions must occur within a single contiguous segment with copy number zero. For gene expression analysis, RNA was extracted from each cell line using a standard Trizol protocol and hybridized to the HT-HGU133A Affymetrix whole genome array. Normalised gene expression intensities were generated using the Robust Multi-Array Average (RMA) algorithm35. The genomic data used for this analysis are provided in the Supplementary Data and transcriptional data are available from ArrayExpress (accession number E-MTAB-783). A complete description of the characterization of our cancer cell line collection is also available from the Cancer Genome Project webpages (http://www.sanger.ac.uk/genetics/CGP/CellLines/).
Dose response curves were fitted to the fluorescence signal intensities. We required a method that would firstly allow us to model the heteroscedasticity in the luminescence data, and secondly allow us to incorporate prior knowledge of response, especially at drug concentrations where the data are less informative. A bespoke Bayesian sigmoid model was thus implemented to facilitate this, yielding a full description of the uncertainties in the data, and allowing reasonable interpretation of predicted response at concentrations outside the tested range. Response curves were fitted to the fluorescence signal intensities using a Bayesian sigmoid model. Drug response data consisted of 16 (96-well format) or 42 (384-well) drug-free positive controls, 8 (96-well) or 32 (384-well) negative (no cells) controls and drug response points for nine half-fold concentrations. Technical replicate intensity responses were averaged. Generalized sigmoidal response curves are fitted as follows.
Intensity xlc is assumed to have the mean value . Parameters Imax and Imin are the mean intensities of the positive and negative controls. α and β are scale and gradient responses. f is a shape parameter. lc denotes the log-concentration. We assume that the intensity xlc has variance Var(xlc) = B.E(xlc), where B represents a noise parameter. We assume that xlc has a gamma distribution:
Positive and negative controls xmax and xmin are also gamma distributed:
The concentration giving percentage response p is given by:
We use Markov Chain Monte Carlo simulations to obtain mean posterior parameter estimates. Parameters were initialized from maximum likelihood estimates and 100,000 iterations obtained. The final 10,000 values were used for subsequent inference. The IC50 has a normal prior with 95% probability mass covering range from 1000 fold below minimum concentration tested to 1000 fold higher than the maximum tested concentration. We assume uninformative priors on the remaining parameters. Response curves are plotted using mean posterior values of ICp for p ranging between 0% and 100%. Confidence intervals for ICp are obtained from the associated posterior. Results were manually curated as part of quality control.
A fixed effects multivariate analyses of variance (MANOVA) was used to correlate response with genomics. An n × 2 dose response matrix consisting of IC50 and slope parameter β for n cell lines was constructed for each drug. A linear (no interaction terms) model explained these observables with factors including tissue type, the mutation status of cancer genes, chromosomal re-arrangements, and MSI status. Size effects and significances were obtained. A gene was defined as mutated if it fulfilled any of these criteria: a coding sequence variant in the cancer gene, a total copy number =0 (homozygous deletion) or >=8 (amplification). Only those genes with > 1 mutated cell lines in the panel were included for analysis (65 cancer genes and 3 chromosomal re-arrangements in total). The effect measures the relative difference in the mean IC50 from the wild type to mutant group (e.g. an effect of 0.1 or 10 indicates a ~10-fold decrease or increase in drug concentration, respectively). A Benjamini-Hochberg multiple testing correction threshold with false discovery rate of 20% was used to identify a candidate list of significant associations for the purposes of this paper.
We take the approach of Zou and Hastie’s elastic net20, a multivariate variable selection technique with a penalization approach. Genomic data including mutation status of 64 cancer genes, rearrangements, continuous copy number data from 426 genes causally implicated in cancer (http://www.sanger.ac.uk/genetics/CGP/Census/), and genome-wide transcriptional profiles as well as tissue type were used as input variables (Supplementary Data 1 and 11). The elastic net was used to select which of these features were associated with drug response as measured by IC50 across the cell line panel.
Let X be a n × p matrix of input features (where p is the number of features and n is the number of cell lines) and y be a vector of drug sensitivities of length n. For any non-negative λ1 and λ2
Let be the naïve elastic net estimator. Then . A scaling factor of (1+λ2) is added to the naïve elastic net to prevent double shrinking.
To determine the optimal λ1 and λ2 we let α=λ2/(λ1+λ2). Then is the elastic net penalty. 10-fold cross validation is performed to optimize λ1 and λ2 in equation 2, denoted as and respectively. To find the variables that are associated with drug response, , , X, and y are inputted into equation (2) to solve for vector β. The variables with nonzero βs are determined to be features associated with drug response.
This procedure was repeated 100 times for each drug in order to assess the stability of the features when applying the 10-fold cross validation procedure. For each of the 100 runs, a feature list was built for the drug comprised of genes, transcripts, and tissues with weights assigned to each. The final signature of markers for a drug consisted of all features that appear in any of the 100 runs along with the statistics on the frequency that the feature appears and the average weight given to that feature over the 100 runs. The weights (β) were used to assess effect sizes of features in a drug’s marker signature. The effect size of a feature was calculated by multiplying the feature’s weight by its standard deviation across the cell line panel. The effect size is therefore a normalization of the feature’s weight to account for the different scales used to measure the different genomic features. Features with higher stability of correlation in cross-validation (f) were considered of the highest confidence of truly being associated with drug response. The most significant features associated with drug response are those with both large frequency and effect size. Highly significant associations are defined as those with an effect size that was ±2 s.d. from the mean, −2.95>e>2.79 and a frequency (f) of two s.d. above the mean, f > 0.76.
The dataset was filtered to remove cell lines for which less than 50% of drugs in the panel were tested resulting in 400 cell lines remaining. Missing data points were inputted by using a nearest-neighbor approach based on Pearson-Correlation (PC) scores between IC50 values. We also applied a step to reduce effects inherently present in the data due to the integration of data that were obtained from two different screening sites and which may be amplified by the nearest-neighbor missing point imputation. In order to do this, we performed an ANOVA for each cell line to assess the difference in the average IC50 values for drugs at the two screening sites. Cell lines with a p-value < 0.01 according to this test where filtered out. Of the original dataset, 265 cell lines (41.47%) were used for cluster analysis.
As done for missing point imputation, we measured the extent of similarity among drug sensitivity/resistance profiles with Pearson-Correlation (PC) scores pair-wisely by considering for each drug its pattern of IC50 values across the cell lines (log values were considered). Then we used these similarity scores as input to the Affinity-Propagation clustering (APC) algorithm in a recursive manner36. The APC algorithm is based on a “passing-message-between-datapoints” strategy, it requires as input a pair-wise similarity matrix and gives as the output a set of disjointed clusters. It also indicates, for each of the computed clusters, a datapoint called the “cluster exemplar” (i.e. the “cluster centroid”): the datapoint that best interpolates all the other datapoints in its cluster. The APC algorithm requirement consists in a square matrix containing similarity scores between each pair of datapoints and a set of probabilities, one for each datapoint to be elected as the exemplar of its clusters. Implicitly, these probabilities determine the number of clusters to be computed. If they are uniformly distributed then the ATC algorithm from the data automatically determines the number of clusters. In the first step of our clustering strategy, we applied the APC algorithm to the whole set of drugs in the screening by assuming the probability of each drug to be elected as a cluster exemplar uniformly distributed (so the number of clusters was automatically determined by the APC algorithm). Particularly, we set the input parameter from which this probability is computed equal to the median value of all the pair-wise similarities. The algorithm provided 22 clusters. In each of these clusters, a drug was indicated as the cluster exemplar. The intra-cluster similarity score (odd Ratio) of a community C containing n drugs is computed as the average similarity (i.e. correlation between IC50 patterns) between all the possible couple of drugs belonging to C (total Avg Corr) divided by the expected average similarity between all the possible couple of drugs in a randomly selected set of n drugs. We derived a network representation of the clustered data by considering each drug as a network node and by adding a link between couple of nodes corresponding to drugs in the same cluster. This generated the “network communities” indicated with different colors in Supplementary Figure 3. We then, recursively, clustered again the cluster exemplars with the APC algorithm, in order to obtain “communities of communities” (or “rich clubs”) and added links to the network as explained above. This procedure was recursively re-applied over cluster exemplars until convergence (no datapoints were clustered together) ending up into a hierarchical network depicted in Supplementary Figure 3 and as described in37. The network visualization in Supplementary Figure 3 has been obtained by using Cytoscape38.
The inter-cluster similarity score for each couple of communities A and B was computed as follows:
Where ρX, Y is the Pearson correlation coefficient between the patterns of IC50 values of drug X and drug Y.
Empirical p-values were computed for each of these scores by permutation test: for a number n = 10,000 of trials and each ρ(A, B), we sampled two random sets of drugs A* and B* (containing | A | drugs and | B | drugs, respectively) from the whole set of 131 drugs and we computed ρ(A*, B*). Then we computed the empirical p-value for a given ρ(A, B) ≥ 0 (resp. ≤ 0) as the number of times that ρ(A*, B*) was greater (resp. less) than or equal to ρ(A, B) across the n permutation-trials, divide by n.
M.J.G, U.M, S.V.S and C.B. supervised data collection and analysis. C.D.G. and K.W. conceived and wrote the curve-fitting algorithm and performed the MANOVA; E.J.E and S.R. performed elastic net analysis and analyzed the data. P.G., I.R.T, and J.So. developed and managed screening databases with assistance from A.B and W.Y.; S.J.H. performed most of the Ewing’s sarcoma related studies with contribution from D.S. A.D., X.L, F.K. and L.C., with I.S and O.D. providing critical reagents; R.J.M., A.T.T., J.A.S., S.B., S.R.L., K.L., A. M., X.M., T.M, H.T., L.R., F.J. P.O. performed cell line screening experiments. Q.L.,W.Z., T.Z., W.H., X.D, H.G.C. and J.W.C. synthesized screening compounds and N.S.G provided guidance on their selection and use; F.I. and J.SR. performed compound activity clustering; G.R.B. and H.D. performed cell line genotyping and genetic analysis; J.A.E. and J.B. provided guidance regarding clinical relevance of the work; M.J.G. and C.B. wrote the manuscript with major contributions from S.R., S.J.H. and U.M.; M.R.S., D.A.H, J.Se. and P.A.F conceived the study, analyzed the data and edited the manuscript.
The authors declare competing financial interests. J.Se. is currently an employee of Genentech and is a shareholder of Roche.