Cancers arise owing to mutations in a subset of genes that confer growth advantage. The availability of the human genome sequence led us to propose that systematic resequencing of cancer genomes for mutations would lead to the discovery of many additional cancer genes. Here we report more than 1,000 somatic mutations found in 274 megabases (Mb) of DNA corresponding to the coding exons of 518 protein kinase genes in 210 diverse human cancers. There was substantial variation in the number and pattern of mutations in individual cancers reflecting different exposures, DNA repair defects and cellular origins. Most somatic mutations are likely to be ‘passengers’ that do not contribute to oncogenesis. However, there was evidence for ‘driver’ mutations contributing to the development of the cancers studied in approximately 120 genes. Systematic sequencing of cancer genomes therefore reveals the evolutionary diversity of cancers and implicates a larger repertoire of cancer genes than previously anticipated.
Cancers are clonal proliferations that arise owing to mutations that confer selective growth advantage on cells. The mutated genes that are causally implicated in cancer development are known as ‘cancer genes’ and more than 350 have thus far been identified (ref. 1 and http://www.sanger.ac.uk/genetics/CGP/Census/). Cancer genes have been identified by several different physical and genetic mapping strategies, by biological assays and as plausible biological candidates. Each of these approaches has identified a subset of cancer genes, leaving the possibility that others have been overlooked. The provision of the human genome sequence, therefore, led to the proposal that systematic resequencing of cancer genomes could reveal the full compendium of mutations in individual cancers and hence identify many of the remaining cancer genes2.
Somatic mutations occur in the genomes of all dividing cells, both normal and neoplastic. They may occur as a result of misincorporation during DNA replication or through exposure to exogenous or endogenous mutagens. Cancer genomes carry two biological classes of somatic mutation arising from these various processes. ‘Driver’ mutations confer growth advantage on the cell in which they occur, are causally implicated in cancer development and have therefore been positively selected. By definition, these mutations are in ‘cancer genes’. Conversely, ‘passenger’ mutations have not been subject to selection. They were present in the cell that was the progenitor of the final clonal expansion of the cancer, are biologically neutral and do not confer growth advantage. A challenge to all systematic mutation screens will, therefore, be to distinguish driver from passenger mutations. However, the prevalence and characteristics of driver and passenger mutations in cancer genomes are not currently well defined. The aim of these studies was to survey the numbers and patterns of somatic point mutations in a diverse set of human cancer genomes and hence to obtain insights into the relative contributions of driver and passenger mutations.
The protein kinase gene family was selected for these studies because the protein kinase domain is the domain most commonly found among known cancer genes1 and because inhibitors of mutated protein kinases have recently shown remarkable efficacy in cancer treatment3. Furthermore, the coding sequences of the protein kinases (Supplementary Table 3) constitute a much larger sample of cancer genome (approximately 1.3 Mb of DNA per case) than has previously been analysed across many cancer types, thus permitting insights into the general patterns of somatic mutation in human cancers.
Human cancers (n=210), including breast, lung, colorectal, gastric, testis, ovarian, renal, melanoma, glioma and acute lymphoblastic leukaemia (Supplementary Table 3), were screened for somatic mutations in the coding exons and splice junctions of the 518 protein kinase genes4, a total of 274 Mb of cancer genome. Of the 210 cancers analysed, 169 were primary tumours, 2 were early cultures and 39 were immortal cancer cell lines.
A total of 1,007 somatic mutations were detected (Supplementary Table 2 and http://www.sanger.ac.uk/genetics/CGP/Studies/). Of these, 921 were single base substitutions, 78 were small insertions or deletions and 8 were complex changes, usually double nucleotide substitutions. Of the single base substitutions, 620 encoded mis-sense changes, 54 caused nonsense changes, 28 were at highly conserved positions of splice junctions and 219 were synonymous (silent) mutations. Approximately one-third of these mutations have previously been reported5-8.
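The mis-sense, nonsense and synonymous classes above follow from translating the reference and mutant codons. A minimal sketch, assuming an in-frame coding sequence and the standard genetic code; `classify_substitution` is a hypothetical helper, and splice-junction changes (which fall outside codons) and stop-loss changes would need separate handling.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(cds: str, pos: int, alt: str) -> str:
    """Classify a base substitution at 0-based `pos` of an in-frame coding sequence."""
    cds, alt = cds.upper(), alt.upper()
    start, offset = pos - pos % 3, pos % 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"  # note: stop-loss changes also land here


print(classify_substitution("GTG", 1, "A"))  # missense (GTG -> GAG, Val -> Glu)
```

For example, the T>A change at the second position of codon GTG gives GAG (valine to glutamate), the change underlying BRAF V600E.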
Although there is extensive information on the prevalence of somatic rearrangements and copy number changes in human cancer genomes (from studies using cytogenetics and comparative genomic hybridization) there has previously been limited insight into the prevalence of somatic point mutations5,6,8-10. The results of the current studies show that the number of somatic point mutations varies widely both within and between classes of cancer (Fig. 1 and Supplementary Fig. 1).
Seventy-three of the 210 cancers showed no somatic mutations at all, whereas others showed exceptionally large numbers (Fig. 1 and Supplementary Fig. 1). The highest mutation prevalence (~77 mutations per Mb) was in two gliomas that were recurrences after treatment with the anticancer drug temozolomide, an alkylating agent that is a known mutagen7,11,12. Some individual melanomas and lung cancers also showed substantial numbers of mutations that may relate to the extent of past exposure to ultraviolet radiation (UV) and tobacco smoke carcinogens, respectively. Abnormalities in DNA repair also influenced the number of somatic mutations. Five cancers with defective DNA mismatch repair leading to microsatellite instability had a high prevalence of both base substitutions (14–40 per Mb) and small insertions and deletions at polynucleotide tracts (5–12 per Mb). Occasional cancers without known prior treatment, defects in DNA repair or mutagenic exposure also showed very large numbers of mutations.
Excluding individual cancers with known DNA repair defects or previous treatment, there were differences in overall mutation prevalence between different cancer types (Table 1). Among primary cancers, lung carcinomas showed the highest prevalence of somatic mutations (4.21 per Mb), followed by gastric cancers (2.10 per Mb), ovarian cancers (1.85 per Mb), colorectal cancers (1.21 per Mb, a prevalence similar to that previously reported10) and renal cancers (0.74 per Mb). Conversely, testis cancers (0.12 per Mb), lung carcinoids (0 per Mb) and most breast cancers (0.19 per Mb) showed a much lower prevalence of mutations. The cancer types with high mutation prevalence (for example, colorectal, lung and gastric) mainly originate from high-turnover surface epithelia that are subject to recurrent exogenous mutagen exposure. However, other, less well understood factors may also have a role. For example, the prevalence of somatic mutations in ovarian cancer was higher than that in colorectal cancer. Most ovarian cancers are thought to arise from the specialized peritoneal lining overlying the ovary (or from ovarian inclusion cysts derived from it), which has no recognized major exogenous exposure and, unlike normal colorectal epithelium, is not thought to turn over rapidly.
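The per-type figures above are rates of somatic mutations per megabase sequenced. A minimal sketch of that calculation, assuming the rate for each cancer type is pooled (total mutations divided by total Mb sequenced) and that samples with known repair defects or prior treatment carry an exclusion flag; the record layout and the numbers below are hypothetical.

```python
from collections import defaultdict


def prevalence_by_type(samples):
    """Pooled somatic mutations per Mb for each cancer type, skipping excluded samples."""
    mutations = defaultdict(int)
    megabases = defaultdict(float)
    for s in samples:
        if s["exclude"]:  # e.g. known DNA repair defect or previous treatment
            continue
        mutations[s["type"]] += s["mutations"]
        megabases[s["type"]] += s["mb_sequenced"]
    return {t: mutations[t] / megabases[t] for t in mutations if megabases[t] > 0}


# Hypothetical records, for illustration only.
samples = [
    {"type": "lung", "mutations": 5, "mb_sequenced": 1.3, "exclude": False},
    {"type": "lung", "mutations": 3, "mb_sequenced": 1.3, "exclude": False},
    {"type": "glioma", "mutations": 100, "mb_sequenced": 1.3, "exclude": True},
]
rates = prevalence_by_type(samples)
```

Pooling weights each sample by the amount of sequence it contributed; averaging per-sample rates instead would be an equally defensible choice that the text does not specify.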
The large numbers of somatic mutations found in this screen also allow comparison of the mutational signatures of cancers. These signatures can carry the specific imprint of previous mutagenic exposures or DNA repair defects and hence provide insights into cancer aetiology. Signatures derived in the past from driver mutations in known cancer genes, notably TP53 (see http://www-p53.iarc.fr/index.html), have been informative but are inevitably influenced by biological selection, which distorts the patterns generated by the underlying mutational processes. In contrast, in systematic mutation screens most somatic mutations turn out to be passengers (see below) and are therefore not affected by selection.
Mutational signatures differed between cancer types (Fig. 2). In the lung cancers, melanomas and glioblastomas studied they may reflect previous exposure to tobacco carcinogens, UV light and mutagenic alkylating chemotherapy, respectively6,7. However, the pathogenesis of other mutational signatures is not understood. For example, we previously showed that a subset of breast cancers has an unusual mutational signature characterized by a high prevalence of C:G>G:C transversions (Fig. 2) that occur in a specific sequence context, at TpC/GpA dinucleotides5. We now demonstrate that C:G>G:C changes in lung, ovarian and other cancers are also strongly enriched at TpC/GpA dinucleotides (Table 2), indicating that the underlying mutational process may be more widespread than previously appreciated. In contrast, the TpC/GpA sequence context was not observed in germline C:G>G:C polymorphisms in the protein kinases, suggesting that the process is restricted to cancer cells (Supplementary Table 4). The biological basis of this mutational signature remains unknown and may be due to a defect in DNA repair or a shared mutagenic exposure.
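Checking whether a C:G>G:C change lies in TpC/GpA context only requires the flanking reference base. A minimal sketch, assuming mutations are reported on the plus strand with 0-based positions; the helper names are hypothetical. A mutated C is in TpC context when its 5' neighbour is T, and a mutated G is the same situation read on the opposite strand (GpA) when its 3' neighbour is A.

```python
def at_tpc_context(seq: str, pos: int) -> bool:
    """True if the C or G at 0-based `pos` (plus strand) lies in TpC/GpA context."""
    seq = seq.upper()
    base = seq[pos]
    if base == "C":  # TpC: 5' neighbour is T
        return pos > 0 and seq[pos - 1] == "T"
    if base == "G":  # GpA: TpC on the minus strand, so the 3' neighbour is A
        return pos + 1 < len(seq) and seq[pos + 1] == "A"
    raise ValueError("C:G>G:C changes must have a C or G reference base")


def tpc_fraction(seq: str, positions) -> float:
    """Fraction of mutated C:G positions that lie in TpC/GpA context."""
    positions = list(positions)
    return sum(at_tpc_context(seq, p) for p in positions) / len(positions)


print(tpc_fraction("ATCGGAC", [2, 3, 4, 6]))  # 0.5
```

Enrichment would then be judged against the background frequency of TpC/GpA among all C:G positions in the sequenced region, not against a fixed value.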
Sequencing the coding exons of the 518 kinases yielded 921 somatic base substitutions. These were annotated as non-synonymous (changing an amino acid) or synonymous (not changing an amino acid). To estimate the numbers of driver and passenger mutations, we compared the observed ratio of non-synonymous:synonymous mutations with that expected by chance alone13,14 (see Supplementary Methods for details). The underlying assumption of the analysis is that biological selection acts mainly on non-synonymous mutations, because these may alter the structure and function of proteins. Conversely, synonymous mutations are generally biologically silent and hence are not subject to selection. Therefore, a higher ratio of non-synonymous:synonymous mutations than expected by chance indicates positive selection overall (selection pressure > 1) and is indicative of the presence of driver mutations, whereas a lower ratio than expected by chance indicates negative selection overall (selection pressure < 1). This approach has been widely used in studies of selection during evolution15. In these analyses we have corrected for several other factors that might influence the non-synonymous:synonymous ratio (see Methods), and we therefore interpret deviation from the expected ratio as due to selection. However, we cannot completely exclude other, currently cryptic, factors that might influence the non-synonymous:synonymous ratio and hence mimic the effects of selection.
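To illustrate what "expected by chance" means, the sketch below counts, for every codon of a coding sequence, how many of the nine possible single-base changes alter the amino acid, and then divides the observed ratio by that expectation. It assumes every substitution is equally likely, which is a deliberate simplification of the corrected analysis described in the Supplementary Methods; `expected_ns_s_ratio` and `selection_pressure` are hypothetical helper names.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def expected_ns_s_ratio(cds: str) -> float:
    """Non-synonymous:synonymous ratio expected if every single-base change in
    the coding sequence were equally likely (no selection, no mutational bias)."""
    cds = cds.upper()
    nonsyn = syn = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        for j in range(3):
            for alt in BASES:
                if alt == codon[j]:
                    continue
                mutant = codon[:j] + alt + codon[j + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    syn += 1
                else:
                    nonsyn += 1  # includes stop-gain and stop-loss changes
    return nonsyn / syn if syn else float("inf")


def selection_pressure(obs_nonsyn: int, obs_syn: int, expected_ratio: float) -> float:
    """Observed non-synonymous:synonymous ratio divided by the expected ratio."""
    return (obs_nonsyn / obs_syn) / expected_ratio
```

For the alanine codon GCT, for instance, all three third-position changes are synonymous and all six others are not, giving an expected ratio of 2.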
The selection pressure of all 921 base substitution mutations was 1.29 (95% confidence interval, 1.10–1.51; P=0.0013), demonstrating an excess of non-synonymous mutations over that expected and thus providing evidence for the existence of driver mutations within the set. Eleven of the 921 mutations (eight in BRAF and three in STK11) would have been clearly implicated, on the basis of prior knowledge, in the development of the cancers analysed16,17. Removing these mutations, however, only marginally reduces the selection pressure, to 1.28 (P=0.0025), indicating that most driver mutations detected were not previously known to be involved in oncogenesis.
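The Methods describe an exact Monte Carlo test for the significance of the non-synonymous:synonymous ratio. The sketch below is a simplified stand-in, not that test: it assumes that, with no selection, each substitution is independently non-synonymous with a fixed probability (for an expected ratio r, that probability is r/(1 + r)) and estimates a one-sided P value by simulation.

```python
import random


def monte_carlo_p(obs_nonsyn: int, total: int, expected_nonsyn_fraction: float,
                  n_sims: int = 10_000, seed: int = 1) -> float:
    """One-sided Monte Carlo P value for an excess of non-synonymous mutations.

    Null model (an assumption): each of `total` substitutions is independently
    non-synonymous with probability `expected_nonsyn_fraction`.
    """
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_sims):
        simulated = sum(rng.random() < expected_nonsyn_fraction for _ in range(total))
        if simulated >= obs_nonsyn:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (as_extreme + 1) / (n_sims + 1)


def nonsyn_fraction_from_ratio(expected_ratio: float) -> float:
    """Convert an expected non-synonymous:synonymous ratio r into r / (1 + r)."""
    return expected_ratio / (1.0 + expected_ratio)
```

Because the simulation resolves P values only down to about 1/(n_sims + 1), values as small as those reported above need on the order of 10,000 or more simulations.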
To evaluate further the significance of this observation, genes carrying non-synonymous somatic mutations in each cancer type were examined in additional series of each cancer. An additional 454 cancers were examined in this follow-up screen and 91 additional somatic mutations were identified (see Supplementary Information). The selection pressure among this set of mutations was 1.66, indicating that the gene set examined in the follow-up screen was enriched in cancer genes compared with the main screen (selection pressure 1.29, see above), supporting the notion that a proportion of protein kinases harbour oncogenic, driver mutations.
The numbers of passenger and driver mutations present can be estimated from these results (see Supplementary Methods). Of the 921 base substitutions in the primary screen, 763 (95% confidence interval, 675–858) are estimated to be passenger mutations. Therefore, the large majority of mutations found by sequencing cancer genomes are not implicated in cancer development, even when the search is targeted to the coding regions of a gene family with a strong prior likelihood of involvement in cancer. However, an estimated 158 driver mutations (95% confidence interval, 63–246) account for the observed positive selection pressure. These are estimated to be distributed across 119 genes (95% confidence interval, 52–149), and the number of samples containing a driver mutation is estimated to be 66 (95% confidence interval, 36–77). The results therefore provide statistical evidence for a large set of mutated protein kinase genes implicated in the development of about one-third of the cancers studied.
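The point estimates above can be recovered, approximately, from numbers already given. If all 219 synonymous mutations are passengers and the selection pressure ω = 1.29 measures the excess of non-synonymous mutations over the neutral expectation, then about 702/ω of the 702 non-synonymous changes (620 mis-sense + 54 nonsense + 28 splice site, counting splice-site changes as non-synonymous) are passengers. This back-of-envelope reconstruction is ours, not the Supplementary Methods procedure, and it says nothing about the confidence intervals.

```python
def estimate_drivers(n_nonsyn: int, n_syn: int, omega: float):
    """Point estimates of driver and passenger counts from the selection pressure.

    Assumes all synonymous mutations are passengers and that non-synonymous
    passengers occur at the neutral expectation, i.e. n_nonsyn / omega of them.
    """
    passenger_nonsyn = n_nonsyn / omega
    drivers = n_nonsyn - passenger_nonsyn
    passengers = n_syn + passenger_nonsyn
    return drivers, passengers


# 620 mis-sense + 54 nonsense + 28 splice site; 219 synonymous; omega = 1.29
drivers, passengers = estimate_drivers(620 + 54 + 28, 219, 1.29)
print(round(drivers), round(passengers))  # 158 763
```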
To gain further insights into the nature of the driver mutations in protein kinases, we examined how the selection pressure varied among different subsets of mutations. There was no significant difference in selection pressure between mis-sense (1.27), nonsense (1.58) and splice site mutations (1.23) (P=0.3363) or between histological classes of cancer. However, the selection pressure was lower in cancers with defective DNA mismatch repair (MMR) (selection pressure 1.08; P=0.72) than in MMR-proficient cancers (selection pressure 1.35; P=0.00089). As reported above, MMR-deficient cancers have a higher prevalence of base substitutions than MMR-proficient cancers, presumably owing to an increased mutation rate. The lower selection pressure in MMR-deficient cancers is therefore compatible with a model in which driver mutations are diluted by a larger burden of passenger mutations.
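The dilution argument can be made concrete. If ω is the ratio of observed non-synonymous mutations to the number expected from passengers alone, then ω = 1 + D/P, where D is the number of drivers and P the expected number of non-synonymous passengers. A sketch, assuming the number of drivers stays fixed while the passenger load rises k-fold; k is hypothetical and is not estimated in the text.

```python
def diluted_selection_pressure(omega: float, fold_more_passengers: float) -> float:
    """Selection pressure if passengers increase k-fold while drivers stay fixed.

    Uses omega = 1 + D / P, where D is the number of driver mutations and P the
    expected number of non-synonymous passengers; scaling P by k gives 1 + (omega - 1) / k.
    """
    return 1.0 + (omega - 1.0) / fold_more_passengers
```

With ω = 1.35 and a hypothetical four-fold rise in passengers, the expected selection pressure falls to about 1.09; the model only shows that such a fall is possible, not that this is the actual mechanism.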
Many previously described activating mutations in protein kinase genes that contribute to cancer development are in the kinase domain (see http://www.sanger.ac.uk/genetics/CGP/cosmic/). However, the selection pressure was only slightly higher (1.40) among mutations within kinase domains compared with mutations outside (1.23; P=0.08). Mutations within the P loops and activation segments of kinase domains, in which activating mutations in cancer are often located (Fig. 3), showed a selection pressure of 1.75. Overall, the analysis suggests that, although there may be greater selection pressure for kinase domain mutations, many driver mutations are not in the kinase domains.
There were differences in selection pressure between the ten subclasses4 of protein kinase (P=0.04), with the highest values in calmodulin-dependent protein kinases (1.59), tyrosine kinase-like kinases (1.33) and atypical/other kinases (1.32). Many previously reported protein kinase cancer genes have been members of the tyrosine kinase or serine/threonine kinase subclasses. These analyses suggest that members of other subclasses also contribute to cancer development.
To define further which protein kinases are likely to be carrying driver mutations, the 518 genes have been ranked according to the probability that each is carrying at least one driver mutation, conditional on the selection pressure estimate for each gene (Table 3; Supplementary Table 5; and see Methods). BRAF and STK11 are second and sixteenth in this ranking, providing validation of this indicator. Remarkably, the gene at the top of this statistical ranking is Titin (TTN), which carries 63 non-synonymous and 13 synonymous mutations. The selection pressure associated with TTN is only 2.04 compared with 8.36 and 7.16 for BRAF and STK11 respectively and approximately half of the non-synonymous mutations in TTN are likely to be passengers. TTN is the largest polypeptide encoded by the human genome18 and has been extensively studied as a component of the muscle contractile machinery. However, it is expressed in many cell types and has other functions that are compatible with a role in oncogenesis19-21. The role of TTN as a cancer gene is currently a mathematically based prediction and will require direct biological evaluation.
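For a single gene with selection pressure ω, roughly 1/ω of its non-synonymous mutations are expected to be passengers, which is why about half of the TTN mutations (ω = 2.04) are likely to be passengers, against about 12% for BRAF (ω = 8.36). The sketch below also gives a deliberately naive probability of at least one driver, treating each non-synonymous mutation as independently a driver with probability 1 - 1/ω. This is not the ranking model used here (see Methods), and it ignores the large uncertainty in ω for genes with few mutations.

```python
def passenger_fraction(omega: float) -> float:
    """Expected fraction of a gene's non-synonymous mutations that are passengers."""
    return min(1.0, 1.0 / omega)


def naive_prob_at_least_one_driver(n_nonsyn: int, omega: float) -> float:
    """Naive P(at least one driver), treating mutations as independent; not the paper's model."""
    return 1.0 - passenger_fraction(omega) ** n_nonsyn


print(f"TTN: {passenger_fraction(2.04):.2f}")   # TTN: 0.49
print(f"BRAF: {passenger_fraction(8.36):.2f}")  # BRAF: 0.12
```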
Several genes that are high in the statistical ranking have previously been associated with cancer development. Some of these genes may be activated by their somatic mutations and function as dominant cancer genes, for example NTRK3 and ITK, which are activated by rearrangement in secretory breast cancer and T-cell lymphoma respectively (see http://www.sanger.ac.uk/genetics/CGP/Census/). Others are more likely to be inactivated and operate as recessive cancer genes, including ATM, in which germline mutations cause ataxia telangiectasia22 and predispose to breast cancer23; TGFBR2, in which frameshift somatic mutations are frequently found in mismatch-repair-deficient cancers24; and BMPR1A, in which germline inactivating mutations cause juvenile polyposis25. Each of these three genes has at least one somatic nonsense mutation in the screen. However, most of the genes with probable driver mutations have not previously been associated with cancer development.
Several mutations identified in conserved, functional domains are plausible candidate driver mutations. For example, mutations were found in the glycine residues of the ATP-binding P-loop GxGxxG motif of several protein kinases (Fig. 3). Similar mutations in BRAF induce cellular transformation and activate downstream MEK signalling26. Mutations were also identified within the activation segment (Fig. 3), a domain frequently harbouring oncogenic mutations in known cancer genes such as EGFR, FLT3, KIT and BRAF (see http://www.sanger.ac.uk/genetics/CGP/cosmic/). In particular, the highly conserved DFG motif at the amino-terminal end of the activation segment was mutated in eight protein kinases, including three closely related members of the SRC family, HCK, LYN and FYN. In addition, a Y589H mutation was identified in the juxtamembrane domain of PDGFRB in a gastric cancer. PDGFRB is activated by translocation in leukaemias (http://www.sanger.ac.uk/genetics/CGP/Census/), and activating mutations in the juxtamembrane domain of the PDGFRB paralogue, PDGFRA, are found in gastrointestinal stromal tumours (http://www.sanger.ac.uk/genetics/CGP/cosmic/). Tyrosine 589 is highly conserved, and mutation of this residue increases the baseline kinase activity of PDGFRB, conferring IL3 independence on Ba/F3 cells27.
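Locating mutations in the glycine-rich P-loop amounts to finding GxGxxG motifs in the protein sequence and checking whether a mutated residue is one of the three glycines. A minimal sketch using Python's `re` module; the helper names and the short sequence fragment are hypothetical, and in practice the search should be restricted to the annotated kinase domain, because a pattern this short also matches elsewhere by chance.

```python
import re

# Lookahead so that overlapping GxGxxG matches are all reported.
P_LOOP = re.compile(r"(?=G.G..G)")


def p_loop_glycines(protein: str):
    """1-based positions of the three glycines in every GxGxxG match."""
    triples = []
    for match in P_LOOP.finditer(protein.upper()):
        first = match.start() + 1
        triples.append((first, first + 2, first + 5))
    return triples


def hits_p_loop_glycine(protein: str, residue: int) -> bool:
    """True if the 1-based residue number is one of the motif glycines."""
    return any(residue in triple for triple in p_loop_glycines(protein))


# Hypothetical sequence fragment, for illustration only.
fragment = "MKLGEGAFGKV"
print(p_loop_glycines(fragment))  # [(4, 6, 9)]
```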
Clustering of mutations in multiple genes implicates the JNK pathway in cancer development. We and others have identified truncating and mis-sense mutations of MAP2K4 in lung, colorectal and other cancers6,28-30. Downstream signalling from MAP2K4 is mediated, in part, through phosphorylation of MAP2K7 (MKK7) and subsequent activation of JNK1 (MAPK8) and JNK2 (MAPK9)31,32. We found two different MAP2K7 mis-sense mutations of codon 162 (p.R162C and p.R162H) within the kinase domain in colorectal cancers. Moreover, we identified activation segment mutations in MAPK8 (JNK1) and a kinase domain mutation in MAPK9 (JNK2). Taken together, these data indicate that mutations in the JNK pathway are likely to be involved in cancer development.
To investigate formally the distribution of mutated genes with respect to biological pathways, we compared the set of genes with a high probability of having at least one driver mutation with a combined data set of human pathway information based on the Reactome33, Panther34 and INOH35 data sets. In total, 537 non-redundant pathways containing different combinations of protein kinases were examined. The FGF signalling pathway (Panther Accession P00021 http://www.pantherdb.org/) showed the highest enrichment for kinases containing non-synonymous mutations (corrected P-value of 0.011). Among genes in this pathway, previous biological and genetic information suggests that the fibroblast growth factor receptors carry several plausible driver mutations. Activating germline mutations of FGFR3 are known to cause dwarfism36. Previous studies have shown that the same amino acids in FGFR3 that are mutated in the germ line in thanatophoric dwarfism are mutated somatically in bladder cancer37. We observed the same pattern of coincident germline mutations causing skeletal dysplasia and somatic mutations in cancer for FGFR1 (p.P252T) and FGFR2 (p.W290C), both in lung cancers6. Other mutated genes in the FGF signalling pathway included several members of the MAP kinase cascade, such as MAP2K4, MAP2K7, MAPK8 (JNK1) and MAPK9 (JNK2). Interestingly, pathways involved in apoptosis and cell cycle checkpoints were not enriched in this analysis, although the relative paucity of kinase-domain-containing genes in these pathways limits the power to draw definitive conclusions. Finally, comparison of our results with previously published screens of protein kinases in colorectal cancer9,30,38 identifies several genes mutated in both colorectal cancer series, including BRAF, MAP2K4, ERBB4, PRKCZ and RET.
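The text reports a corrected P value without naming the test. One common choice for this kind of gene-set enrichment is a one-sided hypergeometric test with a multiple-testing correction; the sketch below assumes exactly that, with a Bonferroni correction over the 537 pathways examined. The counts in the usage line are hypothetical, not the values behind the FGF result.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total


def pathway_enrichment(n_genes: int, n_candidates: int, pathway_size: int,
                       overlap: int, n_pathways: int):
    """Raw and Bonferroni-corrected one-sided P values for one pathway.

    n_genes: genes screened; n_candidates: genes likely to carry a driver;
    pathway_size: screened genes in the pathway; overlap: candidates in the pathway.
    """
    p = hypergeom_sf(overlap, n_genes, n_candidates, pathway_size)
    return p, min(1.0, p * n_pathways)


# Hypothetical counts, for illustration only.
raw_p, corrected_p = pathway_enrichment(518, 120, 20, 12, 537)
```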
These large-scale sequencing studies have shown that the prevalence and signature of somatic mutations in human cancers are highly variable. It is likely that the full range of somatic mutation patterns will not be apparent until thousands of cancer samples have been sequenced, each yielding several dozen mutations. For some cancers this may require sequencing of hundreds of megabases. This information, however, will ultimately provide major insights into the mutagenic processes underlying neoplastic change.
Our results demonstrate that most somatic mutations in cancer cells are likely to be passenger mutations; however, they have also revealed surprising insights into the number of cancer genes operative in human cancer. Approximately 120 of the 518 genes screened are estimated to carry a driver mutation and therefore to function as cancer genes, a larger number than previously anticipated. Interestingly, similar conclusions have recently been reached by others. A recent paper reported a mutational analysis of 13,023 genes in 11 colorectal and 11 breast cancers, covering ~1.7 times as much cancer genome as this study38. As in this study, the authors interpret an excess of observed non-synonymous mutations over that expected by chance as evidence for the presence of driver mutations. Their design did not include examination of synonymous changes and therefore did not permit the selection-pressure analysis undertaken here. Instead, they estimated the expected number of non-synonymous passenger mutations from previously published data and identified 189 genes that were mutated at significantly higher frequency than expected. Their conclusion was broadly similar: a large number of cancer-causing mutations and cancer genes are operative in human cancers.
By studying a gene family with a strong track record of involvement in oncogenesis, it is conceivable that we have improved our chances of detecting new cancer genes and that other gene sets may yield a more meagre harvest. Nevertheless, given that we have studied only 518 genes and limited numbers of each cancer type, it seems likely that the repertoire of mutated human cancer genes is larger than previously envisaged. The work presented here suggests that systematic sequencing studies of larger numbers of tumours from a wide variety of cancer types will yield further insights into the development of human cancer, providing new opportunities for molecular diagnosis and therapeutics.
DNA was extracted from primary tumours, cancer cell lines and normal tissue samples. Collection and use of tissue samples were approved by the IRB of each institution. Samples estimated to contain more than 80% tumour cells were used. All samples were analysed using Affymetrix 10K SNP arrays to demonstrate that they were from the same individual and to confirm the presence of copy number changes. Microsatellite instability was assessed using the NCI consensus marker panel39. PCR primers were designed to amplify all coding exons of the 518 protein kinases4 annotated in the human genome (available at http://www.sanger.ac.uk/genetics/CGP/). Approximately 10,000 fragments of 500 base pairs were amplified and directly sequenced in both directions from each cancer. Sequence traces were initially evaluated computationally and subsequently reviewed manually. Each variant was then looked up in dbSNP (http://www.ncbi.nlm.nih.gov/SNP/) and, if not present, was directly evaluated in normal DNA from the same individual by PCR sequencing using the appropriate amplimer. Cancer samples showing putative somatic sequence alterations were then re-amplified and re-sequenced along with the appropriate, matched, non-cancer DNA to confirm the somatic nature of the mutation and to eliminate sequencing artefacts.
Statistical analyses are outlined in more detail in Supplementary Methods. Deviation of the ratio of non-synonymous:synonymous mutations from that expected by chance was used to indicate the presence of selection on non-synonymous mutations. To assess the significance of this ratio, an exact Monte Carlo test was developed and applied to the entire set and to subsets of mutations. Additional methods were developed to determine the number of driver mutations, to analyse differences in selection between mismatch-repair-deficient and -proficient cancers and to assess the likelihood of a gene being a cancer gene.
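The order of the filters applied to each candidate variant in the Methods (dbSNP look-up, then matched normal DNA, then confirmatory re-sequencing) can be written as a small decision function. A sketch with hypothetical field names; each boolean stands for the outcome of the corresponding database or laboratory step.

```python
from dataclasses import dataclass


@dataclass
class CandidateVariant:
    in_dbsnp: bool                   # listed as a known polymorphism
    present_in_matched_normal: bool  # seen when normal DNA from the same individual is sequenced
    confirmed_on_resequencing: bool  # reproduced after re-amplifying tumour and normal DNA


def classify_variant(v: CandidateVariant) -> str:
    """Apply the filters in order: dbSNP, matched normal, confirmatory re-sequencing."""
    if v.in_dbsnp:
        return "known polymorphism"
    if v.present_in_matched_normal:
        return "germline"
    if v.confirmed_on_resequencing:
        return "somatic"
    return "sequencing artefact"
```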
A combined pathway database was generated by merging Reactome, Panther and INOH to test for the presence of mutated pathways.
We would like to thank J. Leary and the ABN-Oncology group (funded by the National Health and Medical Research Council of Australia), the Hauenstein Foundation and the Cooperative Human Tissue Network for providing samples for analysis, G. Wu and L. Stein for the development of the joint Reactome, Panther, INOH database, and C. Marshall and N. Rahman for comments. The studies were funded by the NIH and the Wellcome Trust.