Sporadic tumours, which account for the majority of all human cancers, arise from the acquisition of somatic genetic and epigenetic alterations leading to changes in gene sequence, structure, copy number and expression. Within the last decade, the availability of a complete sequence-based map of the human genome, coupled with significant technological advances, has revolutionized the search for somatic alterations in tumour genomes. Recent landmark studies, which resequenced all coding exons within breast, colorectal, brain and pancreatic cancers, have shed new light on the genomic landscape of cancer. Within a given tumour type there are many infrequently mutated genes and a few frequently mutated genes, resulting in considerable genetic heterogeneity. However, when the altered genes are placed into biological processes and biochemical pathways, this complexity is significantly reduced and shared pathways that are affected in significant numbers of tumours can be discerned. The advent of next-generation sequencing technologies has opened up the potential to resequence entire tumour genomes to interrogate protein-encoding genes, non-coding RNA genes, non-genic regions and the mitochondrial genome. During the next decade it is anticipated that the most common forms of human cancer will be systematically surveyed to identify the underlying somatic changes in gene copy number, sequence and expression. The resulting catalogues of somatic alterations will point to candidate cancer genes requiring further validation to determine whether they have a causal role in tumourigenesis. The hope is that this knowledge will fuel improvements in cancer diagnosis, prognosis and therapy, based on the specific molecular alterations that drive individual tumours.
In this review, I will provide a historical perspective on the identification of somatic alterations in the pre- and post-genomic eras, with a particular emphasis on recent pioneering studies that have provided unprecedented insights into the genomic landscape of human cancer.
In 1914, Theodor Boveri hypothesized that numerical alterations in the chromosome content of a cell might contribute to tumourigenesis . However, it would take almost another 50 years before the first specific chromosomal abnormality, a marker chromosome termed the Philadelphia chromosome, was identified within cancer genomes of patients with chronic myelogenous leukaemia (CML) . We now know that sporadic human tumours, which account for the majority of all human cancers, result from the accumulation of numerous genetic and epigenetic alterations, leading to the dysregulation of protein-encoding genes as well as non-coding RNAs (ncRNAs) [3–6]. Such alterations underlie the acquired attributes shared by most cancer cells, namely their capacity to generate their own mitogenic signals, evade apoptosis, resist exogenous growth-inhibitory signals, proliferate without limits, and acquire angiogenic, invasive and metastatic properties .
Early attempts to identify protein-encoding cancer genes were hampered by both the limited resolution of technologies available to detect genomic alterations and the enormous complexity of tumour genomes. This complexity was first recognized from cytogenetic studies cataloguing numerical and structural abnormalities within solid tumours . Unlike haematological malignancies and certain sarcomas, which were often characterized by a signature chromosome abnormality, most solid tumours of epithelial origin exhibited numerous chromosomal aberrations, both clonal and nonclonal [8–10]. The karyotypic heterogeneity of tumour genomes led to the proposal by Peter Nowell, in the mid-1970s, that tumourigenesis occurs by a stepwise evolutionary process . Nowell suggested that, following an initiating event that converts a normal cell to a neoplastic cell, cancer progression results from the acquisition of genetic instability, leading to the accumulation of genetic alterations and the continual selective outgrowth of variant subpopulations of tumour cells with a proliferative advantage. This model not only accounted for the genetic heterogeneity of solid tumours, but also led Nowell to speculate that each cancer patient might require individual specific therapy .
In the ensuing years, cancer gene identification was a laborious task, significantly impeded by the lack of a sequence-based map of the normal human genome. This changed in 2003 with the completion of the Human Genome Project . In the so-called post-genomic era, the availability of a complete map of the human genome and significant technological advances together powered systematic interrogations of cancer genomes, at unprecedented resolution and throughput, to catalogue somatic alterations and thus pinpoint candidate cancer genes. Within the last decade the concept of individualized therapy, based on the presence of specific molecular alterations within a patient's tumour, has become a reality, spurring the research community to systematically catalogue the somatic alterations that underlie all human cancers [13–26]. Here I review progress that has been made in the search for the genetic basis of sporadic cancers, with a particular emphasis on recent efforts to comprehensively and systematically catalogue somatic alterations using integrated genomic approaches. For comprehensive reviews of other factors that shape the cancer landscape, including the role of ncRNAs, dysregulation of protein synthesis, altered splicing, changes in chromatin structure, epigenetic alterations and germline cancer susceptibility, the reader is referred to additional articles within this issue and elsewhere [27–38].
To appreciate the significance of recent developments in the search for cancer genes, it is useful to understand the pace of cancer gene discovery between 1960 and 2003, the years marking the identification of the first cancer-specific chromosomal abnormality and the completion of the Human Genome Project (Figure 1) [2,12]. In the late 1970s it was discovered that the genes responsible for the oncogenicity of certain retroviruses were actually altered versions (oncogenes) of normal cellular genes (proto-oncogenes). The first proto-oncogene, termed c-src, was identified in 1976, based on homology to the transforming gene (v-src) of the Rous sarcoma virus, a retrovirus that induces sarcomas in chickens [39,40].
The significance of proto-oncogenes in relation to human cancer causation became clear in 1982, when a transforming gene isolated from a human bladder carcinoma cell line was discovered to be a mutated version of a normal cellular proto-oncogene [41–44]. This marked the first description of a sequence change, a point mutation resulting in the oncogenic activation of H-RAS, in a human cancer gene.
The development of high-resolution chromosome banding techniques in the early 1970s allowed for detailed karyotypic analysis of cancer genomes and the identification of recurrent chromosomal abnormalities in haematological malignancies, cancer-derived cell lines and, to a lesser extent, solid tumours (reviewed in ). The molecular characterization of genes positioned at the breakpoints of chromosomal translocations, or those contained within homogeneously staining regions and double minutes, revealed that gene rearrangement, amplification and increased expression also resulted in oncogene activation in human cancers [45–54] (reviewed in ). In the pre-genomics era, oncogene identification largely relied on cloning genes at the site of proviral integrations, functional assays to isolate cellular genes capable of inducing foci formation in NIH3T3 cells, and positional cloning following the identification of structural and numerical chromosomal abnormalities, using both conventional and, later, molecular cytogenetic approaches (reviewed in [56,57]). By 2002, more than 100 oncogenes had been described . Many of these encoded protein tyrosine kinases that modulate mitogenic signal transduction pathways . Their oncogenic activation was associated with increased kinase activity and cellular transformation, properties that would have important therapeutic implications in later years (reviewed in ).
In 1971, by studying the inheritance pattern of retinoblastoma, a childhood cancer of the eye with a familial component, Alfred Knudson hypothesized the existence of a distinct class of cancer gene, acting recessively and requiring ‘two hits’ to contribute to tumourigenesis . The idea that neoplasia was sometimes a genetically recessive phenotype was supported by the cell fusion experiments of Henry Harris. The retinoblastoma gene (RB1), the first of the so-called tumour suppressor genes, was cloned 16 years later [62–68]. As Knudson had predicted, familial forms of retinoblastoma are caused by the inheritance of a germline alteration within one allele of RB1 and the acquisition of a somatic alteration in the other allele (the second hit) during the patient's lifetime. In contrast, sporadic forms of retinoblastoma arise later in life after a single cell acquires two independent somatic alterations, leading to a complete loss of RB1 function. A variety of genetic alterations lead to tumour suppressor gene inactivation, including point mutations, gene deletions and epigenetic alterations leading to gene silencing [64,69]. The earliest tumour suppressor genes identified were isolated by positional cloning, following the delineation of regions of linkage to disease susceptibility in cancer kindreds [62–68,70–88]. By 1993, 13 tumour suppressor genes had been positionally cloned or mapped to specific chromosomal locations .
Later, candidate gene approaches were also successful in uncovering cancer susceptibility genes [90–92]. With the development of molecular methods to search for somatic loss-of-heterozygosity and homozygous deletions in sporadic tumours, additional recessively acting cancer genes were described [70,71,93–95]. In 1986, Renato Dulbecco contended that the research community was faced with a choice — either to continue searching for cancer genes by a piecemeal approach or to decode the human genome and lay a foundation for future systematic searches for cancer genes . Four years later saw the launch of the Human Genome Project, a publicly funded, international effort to decode the human genome, led by the International Human Genome Sequencing Consortium (IHGSC).
In 2001, the IHGSC and Celera Genomics reported draft versions of the human genome sequence [97,98]. Shortly thereafter, the complete human genome sequence was reported by the IHGSC . The availability of a sequence-based map of the human genome and the development of high-throughput Sanger sequencing, a by-product of the human genome project, provided new opportunities to systematically catalogue somatic mutations within both causal and candidate cancer genes on a larger scale, and at a faster pace, than previously possible. Given the clinical successes of genotype-directed therapies in treating HER2-positive breast cancer patients with trastuzumab, and treating BCR-ABL-positive CML patients with imatinib, many of the earliest high-throughput resequencing efforts centred on protein kinase-encoding genes [13,99,100]. As a result, a number of novel therapeutic targets were revealed for genotypically-defined subgroups of cancer patients [101–118] (reviewed in ). This paved the way for other high-throughput resequencing efforts targeting specific biochemical pathways, large gene families and large numbers of candidate cancer genes [120–130]. Not all cancer genes encode proteins that are amenable to direct therapeutic intervention. However, identifying synthetic lethal partners of these so-called ‘undruggable targets’ holds promise in leveraging such alterations for targeted cancer therapy [23–26,131] (reviewed in ).
In parallel with the human genome project, the development of array-based comparative genomic hybridization and progressive refinements of this technology significantly increased the resolution and throughput at which genome-wide somatic alterations in DNA copy number could be ascertained (reviewed in ). The completion of the human genome project also facilitated the construction of the HapMap, a map of naturally occurring human genomic variation, in the form of single nucleotide polymorphisms (SNPs) and its underlying genomic structure [134–136]. This allowed for the development of high-density SNP genotyping arrays that could be used to screen not only for copy number alterations but also for copy-neutral loss-of-heterozygosity throughout the genome, at unparalleled resolution [137–143] (reviewed in ). The implementation of genome-wide expression profiling revealed novel transcriptional signatures that can be utilized to molecularly classify cancer subtypes or predict clinical phenotypes, and was also instrumental in facilitating the identification of previously unrecognized gene fusions in common epithelial tumours (reviewed in [144,145]). Furthermore, integrated studies that combined genome-wide searches for copy number alterations with global gene expression profiling significantly increased the power to home in on candidate cancer genes [146,147]. Nonetheless, the capacity to understand the full compendium of genomic alterations that drive human tumourigenesis was limited by the inability to rapidly and systematically sequence entire tumour genomes.
Our first glimpse into the true genetic complexity of human cancers has come from recent pioneering studies that sequenced the exomes, the coding exons of more than 18 000 protein-encoding genes, from a series of breast, colorectal, pancreatic and brain cancers [148–151]. Remarkably, these explorations were achieved using high-throughput Sanger sequencing, and were integrated with genomic analyses to interrogate gene expression and copy number [149,150,152]. These studies revealed that cancer genomes are highly complex, with an average of 48–101 somatic alterations in each tumour, depending on the cancer type [149,150]. Within a given cancer type there is considerable inter-tumour heterogeneity, resulting in large numbers of altered genes. However, this complexity is reduced significantly by considering the biological pathways and processes on which altered genes converge, rather than the altered genes themselves . For example, 12 core biological processes or pathways appear to be deregulated in the majority of pancreatic tumours, although precisely how this is achieved varies from tumour to tumour (Figure 2) . As has been noted, this may have practical implications for the development of targeted therapeutics, in that it may be more prudent to consider targeting functional pathways or processes rather than individual proteins encoded by mutated genes . Prior to these investigations, most cancer genes had been identified because they were frequently altered in tumourigenesis. However, the resequencing of tumour exomes revealed that, for a given type of cancer, the majority of somatically mutated genes are altered in just a fraction of tumours. This new view of the genomic landscape of human cancer suggested that the acquisition of numerous somatic mutations, each with a small fitness advantage, may also drive tumourigenesis [151,153].
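The reduction in complexity obtained by moving from genes to pathways can be illustrated with a toy calculation. The following Python sketch is not taken from the studies cited; the tumour identifiers, gene names, pathway assignments and mutation calls are all hypothetical placeholders. It only shows how each gene can be altered in a minority of tumours while the pathway on which those genes converge is altered in most or all of them.

```python
from collections import defaultdict

# Hypothetical gene -> pathway assignments (placeholders, not real annotations).
GENE_TO_PATHWAY = {
    "GENE_A": "pathway_1",
    "GENE_B": "pathway_1",
    "GENE_C": "pathway_1",
    "GENE_D": "pathway_2",
    "GENE_E": "pathway_2",
}

# Hypothetical somatic mutation calls: tumour identifier -> set of mutated genes.
TUMOUR_MUTATIONS = {
    "T1": {"GENE_A", "GENE_D"},
    "T2": {"GENE_B"},
    "T3": {"GENE_C", "GENE_E"},
    "T4": {"GENE_A"},
}


def fraction_altered(mutations, key=lambda gene: gene):
    """Return, for each unit (gene or pathway), the fraction of tumours altered."""
    tumours_per_unit = defaultdict(set)
    for tumour, genes in mutations.items():
        for gene in genes:
            unit = key(gene)
            if unit is not None:  # skip genes without a pathway assignment
                tumours_per_unit[unit].add(tumour)
    n_tumours = len(mutations)
    return {unit: len(hit) / n_tumours for unit, hit in tumours_per_unit.items()}


# Per-gene view: no gene is altered in more than half of the tumours.
gene_freq = fraction_altered(TUMOUR_MUTATIONS)
# Per-pathway view: the same calls, grouped by the pathway each gene belongs to.
pathway_freq = fraction_altered(TUMOUR_MUTATIONS, key=GENE_TO_PATHWAY.get)
```

In this made-up example, the most frequently mutated gene is altered in half of the tumours, whereas pathway_1 is altered in all four, each time through a different route.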
Somatic alterations can be so-called driver mutations that confer a selective growth advantage to the tumour cell, or passenger mutations that have no effect on tumourigenesis [101,154]. Thus, the identification of a somatically altered gene indicates a candidate cancer gene rather than a causal cancer gene. What necessarily follows are detailed biochemical and cellular studies comparing the functional properties of the wild-type and mutant proteins. To guide such studies, statistical calculations, based on the frequency and nature of the observed somatic mutations, can be applied to prioritize or rank candidate cancer genes based on the likelihood that they represent driver genes . The statistical assumptions that are most appropriate to use in this type of predictive modelling have been the subject of some debate, because of the inherent difficulty in setting a background mutation rate for each tumour type [155–158]. Other computational approaches predict driver mutations rather than driver genes [159–161]. One such method has estimated that approximately 8% of missense mutations identified by exomic sequencing of glioblastomas are likely to be functionally significant, with the majority of these affecting infrequently mutated genes . Although synonymous somatic mutations are generally not considered in statistical predictions because they do not result in amino acid changes, it is worth noting that they can, on occasion, encode proteins with altered functional activity . It is important to note that, in addition to statistical predictions, functional genetic screens in mice and large-scale RNA interference screens can also guide the identification of causal cancer genes [131,163–172] (reviewed in [173–175]).
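The idea of ranking candidate genes by the frequency of observed mutations can be sketched as follows. This is a deliberately simplified illustration and not the method used in any of the studies cited: it assumes a single uniform background mutation rate per base pair and independent mutations (the very assumptions the text notes are debated), and it uses a Poisson model as a stand-in for whatever statistical framework a given study applied. The gene names, lengths, counts and the background rate are hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def rank_candidate_genes(observations, background_rate, n_tumours):
    """Rank genes from most to least surprising under the background model.

    observations: gene -> (coding_length_bp, observed_mutation_count)
    background_rate: assumed passenger mutation rate per base pair per tumour
    """
    scored = []
    for gene, (length_bp, observed) in observations.items():
        # Expected number of passenger mutations across all screened tumours.
        expected = background_rate * length_bp * n_tumours
        scored.append((poisson_upper_tail(observed, expected), gene))
    return [gene for _, gene in sorted(scored)]


# Hypothetical screen of 100 tumours with an assumed rate of 1e-6 per bp.
observations = {
    "GENE_A": (2_000, 8),    # short gene, many mutations
    "GENE_B": (20_000, 8),   # same count, but a much longer gene
    "GENE_C": (2_000, 1),    # short gene, single mutation
}
ranking = rank_candidate_genes(observations, background_rate=1e-6, n_tumours=100)
```

The sketch makes one point only: the same number of mutations is more noteworthy in a short gene than in a long one, so raw counts alone are a poor ranking criterion. A realistic analysis would also need to account for mutation type, sequence context and tumour-specific background rates.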
Although exomic resequencing of cancer genomes captures the spectrum of mutations within protein-encoding genes, it does not assess the sequence integrity of non-coding regions of the genome. These regions contain functionally relevant elements, including ultra-conserved elements and ncRNAs, which are being systematically mapped by the Encyclopedia of DNA Elements (ENCODE) project [176–180]. Non-coding RNAs have been implicated in a variety of processes, including the regulation of transcription and chromosome structure, RNA processing and modification, mRNA stability and translation and protein stability and transport (reviewed in ). Within the past few years, our vision of the cancer landscape has been reshaped with the realization that the dysregulation of micro-RNAs (miRNAs), a subset of ncRNAs, contributes to tumourigenesis (reviewed in [32,35]). MiRNAs are small ncRNAs that negatively regulate gene expression, including that of protein-encoding cancer genes. Dysregulated miRNAs have been described in human cancers and in some instances are associated with oncogenic properties, tumour-suppressive properties or both, depending on the cellular context (reviewed in ). Furthermore, miRNA expression profiling of a mouse model of pancreatic cancer revealed distinct miRNA expression signatures at each step in the progression of tumourigenesis, correlating with the acquired attributes shared by most cancer cells [6,7]. The full extent to which ncRNAs contribute to cancer has yet to be revealed. In addition to miRNAs, another class of ncRNAs, represented by transcribed ultraconserved regions of the genome, has also been implicated in tumourigenesis [179,180]. Moreover, inherited mutations within the gene encoding DICER1, an endonuclease that regulates the processing of ncRNAs, have been linked to familial pleuropulmonary blastoma .
The impetus for the human genome project was to generate a sequence-based map that would allow researchers to identify the germline and somatic variation that underlies human disease, including cancer . One implication is that of personalized medicine, the ability to predict individual disease risk and drug response based on personal genomics. In the context of sporadic tumourigenesis, decoding individual tumour genomes holds the promise of more accurate diagnosis and prognosis and of guiding personalized treatment strategies for cancer patients.
Sanger sequencing has been the gold standard of DNA sequencing since it was first described over 30 years ago . However, the expense and limited throughput of this approach prohibit its application in sequencing large numbers of genomes. Therefore, in order to make personalized medicine incorporating whole-genome sequencing a reality, several new and revolutionary sequencing technologies have been developed in the past few years. These so-called next-generation sequencing approaches have been reviewed in detail elsewhere [184–188]. Suffice it to say that, by sequencing DNA in a massively parallel fashion, next-generation sequencing methodologies have significantly lowered the cost and time required to decode an entire human genome [189–193]. Nonetheless, the short reads obtained, coupled with the relatively high error rate, require deep coverage of each genome, keeping current costs relatively high. In addition, data analysis and confirmation of sequence variants are non-trivial. To bring whole-genome sequencing in line with other clinical diagnostic tests, the goal of a $1000 genome has been set (reviewed in [194,195]). Although this has not yet been attained, it is believed to be within reach with further improvements in technology. In addition to their higher throughput and lowered costs, next-generation sequencing methodologies have several other advantages compared with Sanger sequencing that are key to comprehensively deciphering tumour genomes. They are quantitative in nature and can be used to simultaneously determine both nucleotide sequence and copy number. They are more sensitive than Sanger sequencing, and therefore can detect somatic mutations present in just a subset of tumour cells [196,197]. In addition, these methods can be coupled with a so-called paired-end read strategy, in which both ends of individual clones are sequenced and mapped back to the genome .
This strategy can be used to identify structural alterations, including insertions, deletions, duplications and rearrangements [199–201]. It is important to note that sequencing entire genomes encompasses not only the cellular genome but also the mitochondrial genome and, potentially, any virally-associated genomes. Prior to the development of next-generation sequencing, searching for changes in sequence, copy number and structure required the integration of data from multiple platforms. As next-generation sequencing eventually becomes routine, it will transform the analyses of cancer genomes, using a single platform.
Decoding entire genomes by next-generation sequencing is feasible, but is not yet commonplace [189–193]. One consideration that pertains to whole-genome sequencing of cancer genomes is the current need to survey both the tumour and the constitutional genomes from the same individual, to accurately discriminate polymorphic variants from potential somatic mutations. At the time of this writing, the genomes of two cytogenetically normal cases of acute myeloid leukaemia (AML), as well as DNA from the normal skin cells of these individuals, have been resequenced and partially analysed. The extraordinary number of potential somatic mutations identified (20 000–30 000 in each tumour), together with the high error rate, necessitated the prioritization of mutations for confirmatory sequencing by other methods. To date, only those mutations localizing within protein-encoding genes of the first AML genome have been validated . For the second AML genome, mutations within the coding regions of annotated genes and RNA genes, as well as those within highly conserved non-genic regions and regulatory regions, have been validated . Eight to twelve true somatic alterations were identified within the coding regions of each tumour genome [202,203]. Most of the altered genes were not previously implicated in the pathogenesis of AML and would not have been obvious choices for resequencing using a candidate gene approach. The coding regions of the genome comprise only 1–2% of the entire human genome. It has been estimated that 500–1000 additional somatic mutations exist within the non-coding regions of each AML genome. Most are anticipated to be passenger mutations with no contribution to tumourigenesis, but this awaits experimental confirmation .
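The need to sequence both tumour and constitutional DNA follows from a simple subtraction: any variant also present in the matched normal sample is inherited rather than somatic. A minimal Python sketch of that step is given below. The variant tuples are hypothetical, and the function name is introduced here for illustration; real analyses must additionally cope with sequencing errors, uneven coverage and tumour samples contaminated by normal cells, which is why the candidates that survive this filter still required confirmatory sequencing in the studies described above.

```python
def candidate_somatic_variants(tumour_variants, normal_variants):
    """Return variants seen in the tumour but not in the matched normal sample.

    Each variant is a (chromosome, position, reference_allele, alternate_allele) tuple.
    """
    return sorted(set(tumour_variants) - set(normal_variants))


# Hypothetical variant calls from one patient.
tumour = [
    ("chr1", 1_000, "A", "G"),   # also in normal: inherited polymorphism
    ("chr2", 5_000, "C", "T"),   # tumour only: candidate somatic mutation
    ("chr7", 9_000, "G", "A"),   # tumour only: candidate somatic mutation
]
normal = [
    ("chr1", 1_000, "A", "G"),
    ("chr3", 2_000, "T", "C"),   # normal only: not relevant to the tumour call set
]
somatic = candidate_somatic_variants(tumour, normal)
```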
Other studies have applied next-generation sequencing to search for somatic alterations within only the transcribed genes, or transcriptome, of tumours and tumour-derived cell lines, by a process known as RNA sequencing [204–213]. Because transcriptomes are significantly smaller than whole genomes, RNA sequencing represents a more cost-effective alternative to whole-genome sequencing to search for mutations within coding genes. In addition, transcriptome analysis can provide insights into the nature and abundance of alternative splice forms. However, one disadvantage of transcriptome sequencing versus exome or whole-genome sequencing is that unstable or low-abundance transcripts might escape detection . Notably, this approach has thus far led to the identification of novel candidate cancer genes within malignant mesotheliomas, as well as a novel genetic signature within granulosa cell tumours of the ovary [205,206]. A recurrent somatic mutation within the FOXL2 gene was identified within almost all adult cases of ovarian granulosa cell tumours, opening up the possibility for improved diagnostic classification of this rare malignancy . The relevance of massively parallel transcriptome analysis to histopathology is reviewed elsewhere in this issue . Paired-end read transcriptome sequencing has also been applied to search for gene fusions resulting from chromosomal rearrangements within tumours [216,217]. Likewise, massively parallel sequencing, using a paired-end read strategy, has been used to catalogue structural rearrangements and copy number alterations present within two lung cancer cell lines . Approximately one-third of somatic rearrangements identified among these cell lines had escaped detection by other methods, testifying to the sensitivity of this approach to catalogue genomic rearrangements.
The past few years have seen the formation of increasingly organized efforts to comprehensively and systematically search for the somatic alterations that underlie human cancer. The Cancer Genome Project of the Wellcome Trust Sanger Institute (UK) spearheaded the systematic search for somatic alterations in human tumours and tumour-derived cell lines, and continues this mission. Ongoing projects include the resequencing of 4000 candidate cancer genes from a series of cancer cell lines derived from a variety of solid tumours and from a collection of clear cell renal carcinomas (http://www.sanger.ac.uk/genetics/CGP/Studies/). Additionally, 800 tumour-derived cell lines are being interrogated for copy number alterations.
In 2005, the National Human Genome Research Institute and the National Cancer Institute (USA) launched The Cancer Genome Atlas (TCGA), a publicly-funded initiative to systematically catalogue genomic alterations present within the major forms of human cancer (http://cancergenome.nih.gov/about/mission.asp). This began with a 3 year pilot study interrogating glioblastoma multiforme, ovarian cancer and lung cancer, using integrated genomic approaches to search for changes in gene copy number, sequence and expression. In the pilot phase, ~600 candidate cancer genes were selected for Sanger sequencing . The analysis of glioblastomas by the TCGA network revealed novel somatic alterations in this disease, including mutations within the PIK3R1 gene, which encodes a regulatory subunit of PI3K, and indicated that disruption of the TP53, RB1 and receptor tyrosine kinase-mediated signalling pathways is a core element of gliomagenesis. It also pointed to a possible mechanistic link between the methylation of the MGMT gene, which identifies subgroups of glioblastoma patients sensitive to temozolomide treatment, and the appearance of MSH6 mutations, which are associated with temozolomide resistance [218,219]. TCGA is moving out of its pilot phase with the objective of comprehensively identifying somatic alterations in 20–25 major tumour types by 2014 (http://www.cancer.gov/recovery). The data are publicly available through both open and controlled access via the Cancer Genome Workbench (CGWB), a feature of the Cancer Biomedical Informatics Grid (caBIG) [220,221].
One limitation experienced in the pilot phase of TCGA was the difficulty in obtaining sufficient amounts of tumour tissue for integrated genomic analyses using multiple platforms . This is likely to become less of a limitation as next-generation sequencing, a single method that requires significantly less starting material for analysis, is incorporated into this endeavour. Building on the experience gained in the pilot phase of TCGA, the NCI and the Foundation for the National Institutes of Health launched the Childhood Cancer Therapeutically Applicable Research to Generate Effective Treatments (TARGET) Initiative (http://target.cancer.gov/). TARGET aims to identify therapeutic targets for childhood cancers, including acute lymphoblastic leukaemia (ALL) and neuroblastoma, by characterizing cancer genomes and transcriptomes. Studies within this initiative have already revealed that somatic alterations within the IKZF1 and Janus kinase (JAK) genes occur in a subset of paediatric patients with B cell progenitor ALL and are predictive of poor outcome in such cases . This is also significant because it suggests the potential for therapeutic intervention in molecularly defined subsets of paediatric ALL patients with JAK inhibitors .
In 2008, the International Cancer Genome Consortium (ICGC) was formed in an effort to standardize the approaches by which genomic alterations are identified in human cancers (http://www.icgc.org/). The consortium aims to produce comprehensive catalogues of the genomic alterations of up to 50 clinically and societally significant cancer types and subtypes over the course of a decade. Tumours of the pancreas, ovary, stomach, liver, breast and oral cavity, as well as chronic lymphocytic leukaemia, have thus far been prioritized for analysis.
With the emergence of the post-genomic era, and the potential to interrogate human tumour genomes for somatic alterations at unprecedented resolution, came the need to organize and integrate mutation data into searchable catalogues. In 2004, the Wellcome Trust Sanger Institute established the Cancer Gene Census, a catalogue of genes mutated in human cancer at a higher frequency than expected by chance alone . This catalogue includes genes that have undergone somatic mutations in sporadic tumours as well as those that are mutated in the germline of cancer families. From 2004 to 2009 the number of consensus cancer genes increased by approximately 40%, from 291 to 410 genes, coinciding with concerted efforts to systematically interrogate cancer genomes for somatic alterations in gene sequence and copy number. A complementary resource, the Catalogue of Somatic Mutations in Cancer (COSMIC), catalogues somatic mutation frequencies in benign and malignant tumours as well as tumour-derived cell lines [225,226]. The genotypic information available through COSMIC is not limited to just the consensus cancer genes but also includes mutated genes that do not meet the requirement for a consensus cancer gene, as well as genes that have been found not to harbour cancer-associated mutations. Currently, almost 5000 genes and ~340 000 tumours have been curated, resulting in a catalogue of approximately 13 000 unique mutations (Figure 3). Yet other catalogues of mutations serve more specific purposes. The Mitelman Database of Chromosome Aberrations in Cancer systematically catalogues structural and numerical chromosomal alterations and their clinical associations, as reported in the literature [227,228]. The Genetic Alterations in Cancer database curates mutations associated with exposures to specific chemical, physical and biological agents implicated in tumourigenesis .
Numerous other databases catalogue the frequency and spectrum of mutations within individual cancer genes with the goal of facilitating the clinical interpretation of variants (reviewed in ).
As the number of somatic mutations identified in cancer genomes continues to grow, it seems likely that additional, gene-specific catalogues will emerge. In this regard, guidelines to standardize the reporting of both somatically mutated and non-mutated tumour samples have recently been proposed, in order to establish a foundation for the accurate interpretation of their clinical and biological significance . Setting and adhering to such guidelines will be critical as we enter a new era in cancer genomics, in which next-generation sequencing of tumour genomes will undoubtedly lead to an explosion in the wealth of mutation data.
Over the course of the next decade we will gain an unparalleled appreciation of the somatic alterations present within human cancers, within protein-coding genes, non-coding RNA genes and non-coding regions of the genome, as well as the mitochondrial genome. However, this is just a first step towards identifying bona fide cancer genes. What necessarily follows will be the detailed functional characterization of individual candidate cancer genes, to determine whether and how they contribute to a tumourigenic phenotype. In parallel, functional genetic screens in mice and large-scale RNA interference screens may also guide the confirmation of causal cancer genes [131,163–172] (reviewed in [173–175]).
Once a causal cancer gene has been identified, it is important to recognize that not all somatic mutations within the gene will be functionally equivalent; some mutations might be consequential and others functionally insignificant. For example, certain EGFR mutations predict clinical responses to targeted therapies, whereas other mutations within this gene predict clinical resistance [115,231–236]. Likewise, the larger genomic context of a mutated gene can also affect its interpretation. For example, in glioblastomas it is the co-expression of a mutated form of EGFR (EGFRvIII) and PTEN that correlates with clinical sensitivity to small molecule inhibitors of EGFR. Conversely, the absence of a somatic mutation within tumour cells can also be used to guide molecularly directed therapies. For example, in both the USA and Europe the drugs cetuximab and panitumumab (monoclonal antibodies against EGFR) are recommended only for patients with advanced colorectal cancer whose tumours contain a non-mutated form of the KRAS gene (reviewed in ). Therefore, before new tests are introduced into the clinical setting it will be imperative to establish the clinicopathological associations of individual mutations, or lack thereof.
How the systematic cataloguing of somatic alterations in cancer genomes will transform diagnostic and clinical practices remains to be seen. One could envision a new battery of specific tests for each tumour type and subtype, based on altered genes or altered pathways, or eventually routine sequencing of individual tumour genomes. Such approaches might ultimately be coupled with methods that noninvasively isolate circulating tumour cells from peripheral blood samples, permitting the serial analyses of cancer genomes to dynamically monitor tumour evolution and adjust clinical management accordingly (reviewed in ). Whatever the final outcome, the systematic analyses of cancer genomes using new technologies that will take place over the next decade have the potential to create a paradigm shift in the diagnosis, prognosis and treatment of individual cancer patients, as Nowell speculated over 30 years ago.
I would like to extend my sincere thanks to Dr Cariappa Annaiah and members of my laboratory for critical reading of the manuscript. I thank Julia Fekecs and Darryl Leja for graphical expertise. Funded by the Intramural Program of the National Human Genome Research Institute at NIH (to DWB).
Conflict of interest statement: I am employed by the National Human Genome Research Institute (NHGRI)/NIH, but I am not a member of The Cancer Genome Atlas research network, which is partially funded by NHGRI. I am an inventor on a patent describing EGFR mutations, which is licensed to Genzyme Corporation.
PowerPoint slides of the figures from this review are supplied as supporting information in the online version of this article.