Human cancer cell lines represent a mainstay of tumor biology and drug discovery through facile experimental manipulation, global and detailed mechanistic studies, and various high-throughput applications. Numerous studies have employed cell line panels annotated with both genetic and pharmacologic data, either within a tumor lineage[3–5] or across multiple cancer types[6–12]. While affirming the promise of systematic cell line studies, many prior efforts were limited in their depth of genetic characterization and pharmacologic interrogation.
To address these challenges, we generated a large-scale genomic dataset for 947 human cancer cell lines, together with pharmacologic profiling of 24 compounds across ~500 of these lines. The resulting collection, which we termed the Cancer Cell Line Encyclopedia (CCLE), encompasses 36 tumor types (Supplementary Table 1 and www.broadinstitute.org/ccle). All cell lines were characterized by several genomic technology platforms. The mutational status of >1,600 genes was determined by targeted massively parallel sequencing, followed by removal of variants likely to be germline events (Supplementary Methods). Moreover, 392 recurrent mutations affecting 33 known cancer genes were assessed by mass spectrometric genotyping[13] (Supplementary Table 2 and Supplementary Fig. 1). DNA copy number was measured using high-density single nucleotide polymorphism arrays (Affymetrix SNP 6.0; Supplementary Methods). Finally, mRNA expression levels were obtained for each of the lines using Affymetrix U133 Plus 2.0 arrays. These data were also used to confirm cell line identities (Supplementary Methods, Supplementary Figs. 2–4).
We next compared, lineage by lineage, the genomic profiles of CCLE lines with those of primary tumors from the Tumorscape[14], expO, MILE and COSMIC datasets (see Supplementary Methods). For most lineages, a strong positive correlation was observed in both chromosomal copy number and gene expression patterns (median correlation coefficients of 0.77, range 0.52–0.94, p < 10⁻¹⁵, for copy number and 0.60, range 0.29–0.77, p < 10⁻¹⁵, for expression; Supplementary Tables 3 and 4), as has been described previously[3–5,15]. A positive correlation was also observed for point mutation frequencies (median correlation coefficient 0.71, range −0.06 to 0.97, p < 10⁻² for all but 3 lineages; Supplementary Fig. 5), even when TP53 was removed from the dataset (median correlation coefficient 0.64, range −0.31 to 0.97, p < 10⁻² for all but 3 lineages; Supplementary Table 5). Thus, with relatively few exceptions (Supplementary Information), the CCLE may provide representative genetic proxies for primary tumors in many cancer types.
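The lineage-by-lineage comparison can be illustrated with a minimal sketch. It assumes that each lineage is summarized by averaging per-sample feature vectors (for example, arm-level copy-number scores) and that concordance is measured with Pearson's r; the function names, lineage labels and values are hypothetical, and the exact features, statistics and p-values used for CCLE are those described in the Supplementary Methods.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2 values)")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


def lineage_concordance(cell_lines, tumors):
    """For each lineage present in both inputs, correlate the mean cell-line
    profile with the mean primary-tumor profile. Each input maps a lineage to
    a list of per-sample feature vectors. Returns per-lineage r and the median."""
    per_lineage = {}
    for lineage, samples in cell_lines.items():
        if lineage not in tumors:
            continue  # no primary-tumor reference for this lineage
        # zip(*samples) yields one column (feature) at a time.
        line_mean = [statistics.fmean(col) for col in zip(*samples)]
        tumor_mean = [statistics.fmean(col) for col in zip(*tumors[lineage])]
        per_lineage[lineage] = pearson_r(line_mean, tumor_mean)
    return per_lineage, statistics.median(per_lineage.values())


# Hypothetical toy profiles with 4 features per sample.
cell_lines = {
    "lineage_a": [[0.2, 0.8, 0.1, 0.5], [0.4, 0.6, 0.3, 0.7]],
    "lineage_b": [[0.9, 0.1, 0.4, 0.2], [0.7, 0.3, 0.2, 0.4]],
}
tumors = {
    "lineage_a": [[0.3, 0.7, 0.2, 0.6]],
    "lineage_b": [[0.1, 0.9, 0.3, 0.3]],
}
per_lineage, median_r = lineage_concordance(cell_lines, tumors)
print(per_lineage, median_r)
```

Summarizing with the median rather than the mean keeps one poorly matched lineage from dominating the overall figure, which fits the reported pattern of strong concordance with a few exceptions.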
Given the pressing clinical need for robust molecular correlates of anticancer drug response, we established a systematic framework to identify molecular correlates of pharmacologic sensitivity in vitro. First, 8-point dose–response curves for 24 compounds (targeted and cytotoxic agents) across 481 cell lines were generated (Supplementary Tables 1 and 6, and Supplementary Methods). These curves were represented by a logistic sigmoidal function with a maximal effect level (Amax), the concentration at half-maximal activity of the compound (EC50), a Hill coefficient representing the sigmoidal transition, and the concentration at which the drug response reached an absolute inhibition of 50% (IC50).
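These curve parameters are related to one another. Under one common parameterization of a logistic sigmoid, assumed here, percent inhibition at concentration x is A(x) = Amax / (1 + (EC50/x)^h), where h is the Hill coefficient. Setting A(x) = 50 gives IC50 = EC50 / (Amax/50 − 1)^(1/h), which exists only when Amax exceeds 50%. The minimal sketch below encodes that relationship. The sign convention (inhibition as a positive percentage) and the function names are assumptions, not the CCLE curve-fitting code.

```python
def inhibition(conc, a_max, ec50, hill):
    """Percent inhibition at concentration `conc` under the assumed logistic
    model: near 0 at low dose, approaching the plateau `a_max` at high dose."""
    if conc <= 0:
        return 0.0
    return a_max / (1.0 + (ec50 / conc) ** hill)


def absolute_ic50(a_max, ec50, hill):
    """Concentration at which inhibition reaches an absolute 50%.
    Returns None when the plateau never reaches 50% (IC50 undefined)."""
    if a_max <= 50.0:
        return None
    # Solve a_max / (1 + (ec50 / x) ** hill) == 50 for x.
    return ec50 / (a_max / 50.0 - 1.0) ** (1.0 / hill)


# Full efficacy: IC50 coincides with EC50.
print(absolute_ic50(100.0, 2.5, 1.3))  # 2.5
# Partial efficacy (Amax = 80%): IC50 lies above EC50.
ic50 = absolute_ic50(80.0, 1.0, 1.0)
print(ic50, inhibition(ic50, 80.0, 1.0, 1.0))
# A compound that plateaus at 40% inhibition has no IC50.
print(absolute_ic50(40.0, 1.0, 1.0))  # None
```

Because IC50 depends on both Amax and EC50, two compounds with identical EC50 values can have very different IC50 values when their maximal effects differ. This is one reason to record several curve parameters rather than a single potency value.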
Broadly active compounds, exemplified by the HDAC inhibitor panobinostat, showed a roughly even distribution of Amax and EC50 values across most cell lines. In contrast, the RAF inhibitor PLX4720 displayed a more selective profile: Amax or EC50 values for most cell lines could be categorized as “sensitive” or “insensitive” to PLX4720, with sensitive lines enriched for the BRAF V600E mutation. To capture the efficacy and potency of a drug in a single measure, we defined an “activity area” (Supplementary Fig. 6). The 24 compounds profiled showed wide variations in activity area, and compounds with similar mechanisms of action clustered together (Supplementary Fig. 7).
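The activity-area definition is given in Supplementary Fig. 6. As an assumption for illustration, the sketch below treats it as the area above the dose–response curve: fractional inhibition integrated over log10 concentration with the trapezoidal rule. Under this definition a compound scores higher if it inhibits more strongly (efficacy) or starts inhibiting at lower doses (potency). The function name, dose series and responses are hypothetical.

```python
import math


def activity_area(concs, pct_inhibition):
    """Area above a dose-response curve, taken here (an assumption) as the
    trapezoidal integral of fractional inhibition over log10(concentration)."""
    if len(concs) != len(pct_inhibition) or len(concs) < 2:
        raise ValueError("need matched concentration/response lists (>= 2 points)")
    points = sorted(zip(concs, pct_inhibition))
    logs = [math.log10(c) for c, _ in points]
    # Clip negative values (apparent growth stimulation) to zero inhibition.
    fracs = [max(0.0, r) / 100.0 for _, r in points]
    return sum(
        (logs[i + 1] - logs[i]) * (fracs[i] + fracs[i + 1]) / 2.0
        for i in range(len(points) - 1)
    )


# Hypothetical 8-point series spaced half a log unit apart (10^-2 ... 10^1.5).
doses = [10 ** (0.5 * k - 2) for k in range(8)]
potent = [10, 30, 60, 80, 90, 90, 90, 90]  # inhibits early, high plateau
weak = [0, 0, 5, 10, 30, 60, 80, 90]       # same plateau, lower potency
print(activity_area(doses, potent), activity_area(doses, weak))
```

Both toy compounds share the same 90% plateau, so an Amax-only summary would not separate them, whereas the area does because the potent compound inhibits over a wider dose range.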
Genomic correlates of drug sensitivity may be extracted by predictive models using machine learning techniques[6,10]. We therefore assembled all CCLE genomic data types into a matrix in which each feature was converted to a z-score across all lines (Supplementary Methods). We then applied two modeling approaches: a categorical approach using naive Bayes classification of discrete sensitivity calls, and an elastic net regression analysis[16] of continuous sensitivity measurements. Both approaches were applied to all compounds, with and without gene expression data (Supplementary Methods). Prediction performance was assessed by ten-fold cross-validation, and the elastic net features were bootstrapped to retain only those that were consistent across runs (Supplementary Methods).
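The regression arm of these modeling steps can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the CCLE pipeline. It uses NumPy and z-scores each feature column. It then fits an elastic net by cyclic coordinate descent on the glmnet-style objective (1/2n)||y − Xβ||² + λ[α||β||₁ + (1 − α)/2·||β||²]. Ten-fold cross-validation is scored by the Pearson correlation between out-of-fold predictions and observed values. Finally, it reports how often each feature receives a nonzero weight across bootstrap resamples. The penalty settings (`lam`, `alpha`), the performance metric, the number of resamples, the stability threshold and the synthetic data are all hypothetical choices, and the naive Bayes arm is not shown.

```python
import numpy as np


def zscore_columns(X):
    """Convert each feature (column) to a z-score across all samples (rows)."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)  # constant features stay at 0 instead of dividing by 0
    return (X - mu) / sd


def soft_threshold(z, gamma):
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def fit_elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=500, tol=1e-6):
    """Cyclic coordinate descent for the elastic net; returns (intercept, beta)."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean      # centring removes the intercept from the fit
    col_sq = (Xc ** 2).sum(axis=0) / n   # (1/n) x_j . x_j
    beta = np.zeros(p)
    resid = yc.copy()                    # residual for beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                 # constant feature carries no information
            old = beta[j]
            # (1/n) x_j . (partial residual with feature j added back)
            rho = Xc[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            if new != old:
                resid -= Xc[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return y_mean - x_mean @ beta, beta


def cross_validated_r(X, y, k=10, seed=0, **kw):
    """k-fold CV: Pearson r between out-of-fold predictions and observations."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    preds = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        b0, beta = fit_elastic_net(X[train_idx], y[train_idx], **kw)
        preds[test_idx] = b0 + X[test_idx] @ beta
    return np.corrcoef(preds, y)[0, 1]


def bootstrap_selection(X, y, n_boot=50, seed=0, **kw):
    """Fraction of bootstrap resamples in which each feature has a nonzero weight."""
    rng = np.random.default_rng(seed)
    hits = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))  # resample lines with replacement
        _, beta = fit_elastic_net(X[idx], y[idx], **kw)
        hits += (beta != 0)
    return hits / n_boot


# Synthetic example: 100 "cell lines", 20 features, sensitivity driven by features 0 and 1.
rng = np.random.default_rng(42)
raw = rng.normal(size=(100, 20))
sensitivity = 2.0 * raw[:, 0] - 1.5 * raw[:, 1] + 0.5 * rng.normal(size=100)
Z = zscore_columns(raw)
cv_r = cross_validated_r(Z, sensitivity, lam=0.1, alpha=0.5)
freq = bootstrap_selection(Z, sensitivity, n_boot=30, lam=0.1, alpha=0.5)
stable = np.flatnonzero(freq >= 0.8)  # hypothetical stability threshold
print(round(cv_r, 3), stable)
```

Keeping only features selected in most resamples is one way to filter out weights that appear by chance in a single fit. The actual resampling scheme and retention criterion are those in the Supplementary Methods.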
Out of >50,000 input features, the regression-based analysis identified multiple known features as top predictors of sensitivity to several agents (Supplementary Table 7 and Supplementary Figs. 8 and 9), with robust cross-validated performance (Supplementary Figs. 10 and 11). For example, activating mutations in BRAF and NRAS were among the top four predictors of sensitivity in models generated for the MEK inhibitor PD-0325901[10]. Additional predictive features for MEK inhibition included expression of PTEN, PTPN5 and SPRY2, the last of which encodes a regulator of MAPK output. KRAS mutations were also identified, albeit with lower predictive value (Supplementary Tables 8 and 9 and Supplementary Fig. 8).
Additional top predictors included EGFR mutations and ERBB2 amplification/overexpression for erlotinib[8] and lapatinib[17], respectively; BRAF V600E for the RAF inhibitors PLX4720[18] and RAF265; HGF expression and MET amplification for the MET/ALK inhibitor PF-2341066[19]; and MDM2 overexpression for sensitivity to Nutlin-3[20]. Variants affecting the EXT2 gene, which encodes a glycosyltransferase involved in heparan sulfate biosynthesis, were significantly correlated with erlotinib sensitivity (Supplementary Fig. 12). This observation is intriguing in light of a report linking heparan sulfate with erlotinib sensitivity[21]. In addition, NQO1 expression was identified as the top predictive feature for sensitivity to the Hsp90 inhibitor 17-AAG, a benzoquinone ansamycin metabolized by NAD(P)H:quinone oxidoreductase (NQO1). NQO1 converts 17-AAG into a more potent hydroquinone, 17-AAGH2[22], and has previously been identified as a potential biomarker for Hsp90 inhibitors[23].
Because some genetic and molecular alterations are common in specific tumor types, lineage may act as a confounding factor in predictive analyses. Indeed, a classifier built using the entire cell line dataset performed suboptimally when applied exclusively to melanoma-derived cell lines, whereas a model built with only melanoma cell lines performed better. Predictive features in the melanoma-only model included strong overexpression of genes regulated by the transcription factors MITF and SOX10 (Supplementary Table 10), a signature recently identified as predictive of RAF inhibitor sensitivity within a melanoma-dominated cell line collection.
On the other hand, lineage emerged as the predominant predictive feature for several compounds. For example, elastic net studies of the HDAC inhibitor LBH589 (panobinostat) identified hematologic lineages as predictors of sensitivity (Supplementary Fig. 9). Notably, most clinical responses to panobinostat and related compounds (e.g., vorinostat and romidepsin) have been observed in hematological cancers. Similarly, most multiple myeloma cell lines (12 of 14 lines tested) exhibited enhanced sensitivity to the IGF-1 receptor inhibitor AEW541 (Supplementary Figs. 8 and 9) and showed high IGF1 expression. Elevated IGF1R expression also correlated with AEW541 sensitivity (Supplementary Fig. 9). The CCLE results suggest that multiple myeloma may be a promising indication for clinical trials of IGF-1 receptor inhibitors[24] and that these drugs may have enhanced efficacy in cancers with high IGF1 or IGF1R expression.
While BRAF and NRAS mutations are known single-gene predictors of sensitivity to MEK inhibitors, several “sensitive” cell lines lacked mutations in these genes, whereas other lines harboring these mutations were nonetheless “insensitive”. The elastic net regression model derived from the subset of cell lines with validated NRAS mutations identified elevated expression of the AHR gene (which encodes the aryl hydrocarbon receptor) as strongly correlated with sensitivity to the MEK inhibitor PD-0325901. This finding was intriguing in light of prior studies suggesting that a related MEK inhibitor (PD-98059) may also function as a direct AHR antagonist[25]. We therefore hypothesized that the enhanced sensitivity of some NRAS-mutant cell lines to MEK inhibitors might relate to a coexistent dependence on AHR function.
To test this hypothesis, we first confirmed the correlation between AHR expression and sensitivity to MEK inhibitors in a subset of NRAS-mutant cell lines (Supplementary Fig. 13). Next, we performed shRNA knockdown of AHR in cell lines with high or low AHR expression. Silencing of AHR suppressed the growth of three NRAS-mutant cell lines with elevated AHR expression but had no effect on the growth of two lines with low AHR expression. The growth-inhibitory effect was confirmed with two additional shRNAs, which also showed evidence of a dose-dependent knockdown effect. We also tested the hypothesis that allosteric MEK inhibitors may function as AHR antagonists by measuring the effect of PD-0325901 and PD-98059 on endogenous CYP1A1 mRNA, a transcriptional target of AHR in some contexts. Both compounds reduced CYP1A1 levels in NRAS-mutant melanoma cells (IPC-298 and SK-MEL-2) but not in neuroblastoma cells (CHP-212), suggesting that other factors may govern CYP1A1 expression in the latter lineage. Together, these results suggest that AHR dependency may co-occur with MAP kinase activation in some NRAS-mutant cancer cells, and that elevated AHR expression may serve as a mechanistic biomarker for enhanced MEK inhibitor sensitivity in this setting.
We also looked for markers predictive of response to several conventional chemotherapeutic agents (Supplementary Fig. 7 and Supplementary Table 6) and identified SLFN11 expression as the top correlate of sensitivity to irinotecan, a camptothecin analog that inhibits the topoisomerase I (TOP1) enzyme. SLFN11 expression also emerged as the top predictor of sensitivity to topotecan, another TOP1 inhibitor (Supplementary Figs. 8 and 14). Overall, 12 of 16 lineages showed significant SLFN11 associations with topotecan or irinotecan sensitivity (Pearson’s r ≥ 0.2; Supplementary Fig. 14b). This finding was independently validated using data from the NCI-60 collection (Supplementary Fig. 15). SLFN11 knockdown did not affect steady-state growth sensitivity profiles (Supplementary Fig. 14d–f).
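The lineage-level tally above counts a lineage when SLFN11 expression tracks sensitivity to either TOP1 inhibitor at Pearson’s r ≥ 0.2, and a sketch of that counting step follows. It assumes per-line sensitivity is summarized as an activity area and skips lineages with fewer than a hypothetical minimum number of lines (`min_lines`). It omits the significance testing behind the reported associations, and the lineage labels and values are toy data.

```python
import math
import statistics
from collections import defaultdict


def pearson_r(x, y):
    """Pearson correlation; NaN when either input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # NaN never passes the >= threshold test below
    return sxy / math.sqrt(sxx * syy)


def lineages_passing(records, threshold=0.2, min_lines=3):
    """records: iterable of (lineage, expression, {drug: activity_area}).
    Returns the lineages where expression correlates with sensitivity to at
    least one drug at Pearson r >= threshold."""
    by_lineage = defaultdict(list)
    for lineage, expr, areas in records:
        by_lineage[lineage].append((expr, areas))
    passing = set()
    for lineage, rows in by_lineage.items():
        if len(rows) < min_lines:
            continue  # too few lines to estimate a correlation
        expr = [e for e, _ in rows]
        # Only consider drugs measured in every line of this lineage.
        shared = set.intersection(*(set(areas) for _, areas in rows))
        if any(pearson_r(expr, [areas[d] for _, areas in rows]) >= threshold
               for d in shared):
            passing.add(lineage)
    return passing


# Hypothetical toy data: (lineage, SLFN11 expression, activity areas).
records = [
    ("lineage_a", 5.0, {"irinotecan": 4.0, "topotecan": 4.5}),
    ("lineage_a", 4.0, {"irinotecan": 3.0, "topotecan": 3.8}),
    ("lineage_a", 6.0, {"irinotecan": 5.0, "topotecan": 5.1}),
    ("lineage_b", 2.0, {"irinotecan": 3.0, "topotecan": 1.0}),
    ("lineage_b", 3.0, {"irinotecan": 2.0, "topotecan": 0.5}),
    ("lineage_b", 4.0, {"irinotecan": 1.0, "topotecan": 0.2}),
]
passing = lineages_passing(records)
print(sorted(passing), "of", len({r[0] for r in records}), "lineages")
```

Correlating within each lineage, rather than across the pooled panel, keeps a marker that merely distinguishes one sensitive lineage from the rest from being counted as a general predictor.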
All three Ewing’s sarcoma cell lines screened showed both high SLFN11 expression and sensitivity to irinotecan (Supplementary Fig. 14). Ewing’s sarcomas also exhibited the highest SLFN11 expression among 4,103 primary tumor samples spanning 39 lineages, suggesting that TOP1 inhibitors might offer an effective treatment option for this cancer type. Consistent with this, several ongoing trials in Ewing’s sarcoma are examining irinotecan-based combinations or the addition of topotecan to standard regimens[26]. For some lineages with high SLFN11 expression (e.g., cervical adenocarcinoma), topoisomerase inhibitors are already part of standard chemotherapy regimens. In other tumors where topoisomerase inhibitors are commonly used (e.g., colorectal and ovarian cancers), SLFN11 expression varied widely, raising the possibility that high SLFN11 expression might enrich for tumors more likely to respond. If confirmed in correlative clinical studies, SLFN11 expression may offer a means to stratify patients for topoisomerase inhibitor treatment.
By assembling the Cancer Cell Line Encyclopedia (CCLE), we have extended the detailed annotation of preclinical human cancer models (www.broadinstitute.org/ccle). Genomic predictors of drug sensitivity revealed both known and novel candidate biomarkers of response. Even within genetically defined subpopulations, or when agents were broadly active without clear genetic targets, predictive modeling studies identified key predictors or mechanistic effectors of drug response. Future efforts that increase the scale and add further types of information (e.g., whole-genome/transcriptome sequencing, epigenetic studies, metabolic profiling or proteomic/phosphoproteomic analysis) should enable additional insights. Ultimately, comprehensive and tractable cell line systems provided through this and other efforts[27] may facilitate numerous advances in cancer biology and drug discovery.