In 2001, the IHGSC and Celera Genomics reported draft versions of the human genome sequence [
97,
98]. Shortly thereafter, the complete human genome sequence was reported by the IHGSC [
12]. The availability of a sequence-based map of the human genome and the development of high-throughput Sanger sequencing, a by-product of the human genome project, provided new opportunities to systematically catalogue somatic mutations within both causal and candidate cancer genes on a larger scale, and at a faster pace, than previously possible. Given the clinical successes of genotype-directed therapies in treating
HER2-positive breast cancer patients with trastuzumab, and treating
BCR-ABL-positive CML patients with imatinib, many of the earliest high-throughput resequencing efforts centred on protein kinase-encoding genes [
13,
99,
100]. As a result, a number of novel therapeutic targets were revealed for genotypically-defined subgroups of cancer patients [
101–
118] (reviewed in [
119]). This paved the way for other high-throughput resequencing efforts targeting specific biochemical pathways, large gene families and large numbers of candidate cancer genes [
120–
130]. Not all cancer genes encode proteins that are amenable to direct therapeutic intervention. However, identifying synthetic lethal partners of these so-called ‘undruggable targets’ holds promise in leveraging such alterations for targeted cancer therapy [
23–
26,
131] (reviewed in [
132]).
In parallel with the human genome project, the development of array-based comparative genomic hybridization and progressive refinements of this technology significantly increased the resolution and throughput at which genome-wide somatic alterations in DNA copy number could be ascertained (reviewed in [
133]). The completion of the human genome project also facilitated the construction of the HapMap, a map of naturally occurring human genomic variation, in the form of single nucleotide polymorphisms (SNPs) and its underlying genomic structure [
134–
136]. This allowed for the development of high-density SNP genotyping arrays that could be used to screen not only for copy number alterations but also for copy-neutral loss-of-heterozygosity throughout the genome, at unparalleled resolution [
137–
143] (reviewed in [
133]). The implementation of genome-wide expression profiling revealed novel transcriptional signatures that can be utilized to molecularly classify cancer subtypes or predict clinical phenotypes, and was also instrumental in facilitating the identification of previously unrecognized gene fusions in common epithelial tumours (reviewed in [
144,
145]). Furthermore, integrated studies that combined genome-wide searches for copy number alterations with global gene expression profiling significantly increased the power to home in on candidate cancer genes [
146,
147]. Nonetheless, the capacity to understand the full compendium of genomic alterations that drive human tumourigenesis was limited by the inability to rapidly and systematically sequence entire tumour genomes.
Our first glimpse into the true genetic complexity of human cancers has come from recent pioneering studies that sequenced the exomes, the coding exons of more than 18 000 protein-encoding genes, from a series of breast, colorectal, pancreatic and brain cancers [
148–
151]. Remarkably, these explorations were achieved using high-throughput Sanger sequencing, and were integrated with genomic analyses to interrogate gene expression and copy number [
149,
150,
152]. These studies revealed that cancer genomes are highly complex, with an average of 48–101 somatic alterations in each tumour, depending on the cancer type [
149,
150]. Within a given cancer type there is considerable inter-tumour heterogeneity, resulting in large numbers of altered genes. However, this complexity is reduced significantly by considering the biological pathways and processes on which altered genes converge, rather than the altered genes themselves [
151]. For example, 12 core biological processes or pathways appear to be deregulated in the majority of pancreatic tumours, although precisely how this is achieved varies from tumour to tumour [
149]. As has been noted, this may have practical implications for the development of targeted therapeutics, in that it may be more prudent to consider targeting functional pathways or processes rather than individual proteins encoded by mutated genes [
151]. Prior to these investigations, most cancer genes had been identified because they were frequently altered in tumourigenesis. However, the resequencing of tumour exomes revealed that, for a given type of cancer, the majority of somatically mutated genes are altered in just a fraction of tumours. This new view of the genomic landscape of human cancer suggested that the acquisition of numerous somatic mutations, each with a small fitness advantage, may also drive tumourigenesis [
151,
153].
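The contrast between gene-level and pathway-level views can be illustrated with a minimal sketch. Everything in it is hypothetical: the gene names, the pathway names, the gene-to-pathway mapping and the four toy tumours are invented for illustration and are not drawn from the studies cited above. The sketch simply counts the fraction of tumours in which each gene, or each pathway, carries at least one alteration.

```python
from collections import Counter

def fraction_altered(tumours, unit_of):
    """Fraction of tumours in which each unit (gene or pathway) is hit at least once."""
    counts = Counter()
    for mutated_genes in tumours:
        # Collapse to a set so that a pathway hit twice in one tumour counts once.
        counts.update({unit_of(gene) for gene in mutated_genes})
    return {unit: n / len(tumours) for unit, n in counts.items()}

# Hypothetical mapping and toy data, for illustration only.
GENE_TO_PATHWAY = {
    "GENE_A": "PATHWAY_1",
    "GENE_B": "PATHWAY_1",
    "GENE_C": "PATHWAY_1",
    "GENE_D": "PATHWAY_2",
}
TUMOURS = [
    {"GENE_A"},
    {"GENE_B", "GENE_D"},
    {"GENE_C"},
    {"GENE_A", "GENE_D"},
]

per_gene = fraction_altered(TUMOURS, lambda gene: gene)
per_pathway = fraction_altered(TUMOURS, GENE_TO_PATHWAY.get)

print("per gene:   ", per_gene)
print("per pathway:", per_pathway)
```

In this toy data no single gene is altered in more than half of the tumours, yet one pathway is altered in all of them, which is the pattern of convergence described above.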
Somatic alterations can be so-called driver mutations that confer a selective growth advantage to the tumour cell, or passenger mutations that have no effect on tumourigenesis [
101,
154]. Thus, the identification of a somatically altered gene indicates a candidate cancer gene rather than a causal cancer gene. What necessarily follows are detailed biochemical and cellular studies comparing the functional properties of the wild-type and mutant proteins. To guide such studies, statistical calculations, based on the frequency and nature of the observed somatic mutations, can be applied to prioritize or rank candidate cancer genes based on the likelihood that they represent driver genes [
155]. The statistical assumptions that are most appropriate to use in this type of predictive modelling have been the subject of some debate, because of the inherent difficulty in setting a background mutation rate for each tumour type [
155–
158]. Other computational approaches predict driver mutations rather than driver genes [
159–
161]. One such method has estimated that approximately 8% of missense mutations identified by exomic sequencing of glioblastomas are likely to be functionally significant, with the majority of these affecting infrequently mutated genes [
161]. Although synonymous somatic mutations are generally not considered in statistical predictions because they do not result in amino acid changes, it is worth noting that they can, on occasion, encode proteins with altered functional activity [
162]. In addition to statistical predictions, functional genetic screens in mice and large-scale RNA interference screens can also guide the identification of causal cancer genes [
131,
163–
172] (reviewed in [
173–
175]).
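As a rough illustration of frequency-based prioritization, the sketch below ranks candidate genes by how unlikely their observed mutation counts would be under a single background mutation rate. It is a simplified binomial model under stated assumptions, not the method used in the studies cited above: the background rate, gene lengths, mutation counts and gene names are all hypothetical, and the model ignores mutation type, sequence context and gene-specific rates, which are exactly the points that have been debated.

```python
from math import comb

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term from k upward."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    # First term is the probability of exactly k mutations.
    term = comb(n, k) * p**k * (1.0 - p) ** (n - k)
    total = 0.0
    i = k
    while i <= n and term > 0.0:
        total += term
        if term < total * 1e-17:  # remaining terms are negligible
            break
        # Recurrence: pmf(i + 1) = pmf(i) * (n - i) / (i + 1) * p / (1 - p)
        term *= (n - i) / (i + 1) * p / (1.0 - p)
        i += 1
    return min(total, 1.0)

def rank_candidates(genes, n_tumours, background_rate):
    """Sort genes by the tail probability of their observed mutation count."""
    scored = []
    for name, coding_bases, mutations in genes:
        opportunities = coding_bases * n_tumours  # bases screened across all tumours
        scored.append((upper_tail(mutations, opportunities, background_rate), name))
    return sorted(scored)

# Hypothetical inputs: (gene, coding length in bases, somatic mutations observed).
GENES = [("GENE_A", 1500, 6), ("GENE_B", 30000, 6), ("GENE_C", 1500, 1)]
ranking = rank_candidates(GENES, n_tumours=100, background_rate=1e-6)

for p_value, name in ranking:
    print(f"{name}: P(count >= observed) = {p_value:.3g}")
```

With these toy inputs the short, recurrently mutated gene ranks first, while a long gene with the same number of mutations ranks lower because more mutations are expected by chance. Changing the assumed background rate changes the ranking, which is why the choice of rate matters.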
Although exomic resequencing of cancer genomes captures the spectrum of mutations within protein-encoding genes, it does not assess the sequence integrity of non-coding regions of the genome. These regions contain functionally relevant elements, including ultraconserved elements and non-coding RNAs (ncRNAs), which are being systematically mapped by the Encyclopedia of DNA Elements (ENCODE) project [
176–
180]. Non-coding RNAs have been implicated in a variety of processes, including the regulation of transcription and chromosome structure, RNA processing and modification, mRNA stability and translation, and protein stability and transport (reviewed in [
181]). Within the past few years, our vision of the cancer landscape has been reshaped with the realization that the dysregulation of micro-RNAs (miRNAs), a subset of ncRNAs, contributes to tumourigenesis (reviewed in [
32,
35]). MiRNAs are small ncRNAs that negatively regulate gene expression, including that of protein-encoding cancer genes. Dysregulated miRNAs have been described in human cancers and in some instances are associated with oncogenic properties, tumour-suppressive properties or both, depending on the cellular context (reviewed in [
32]). Furthermore, miRNA expression profiling of a mouse model of pancreatic cancer revealed distinct miRNA expression signatures at each step in the progression of tumourigenesis, correlating with the acquired attributes shared by most cancer cells [
6,
7]. The full extent to which ncRNAs contribute to cancer has yet to be revealed. In addition to miRNAs, another class of ncRNAs, represented by transcribed ultraconserved regions of the genome, has also been implicated in tumourigenesis [
179,
180]. Moreover, inherited mutations within the gene encoding DICER1, an endonuclease that regulates the processing of ncRNAs, have been linked to familial pleuropulmonary blastoma [
182].