It is widely accepted that cancer is a disease caused by accumulation of mutations in specific genes. These tumor-specific mutations provide clues to the cellular processes underlying tumorigenesis and have proven useful for diagnostic and therapeutic purposes. To date, however, only a small fraction of genes has been analyzed and the number and type of alterations responsible for the development of common tumor types are unknown. The determination of the human genome sequence coupled with improvements in sequencing and bioinformatic approaches have made it possible to examine the cancer cell genome in a comprehensive and unbiased manner. Systematic sequencing studies have been performed on gene families involved in signal transduction in several tumor types, and have now been extended to include the majority of protein-coding genes in breast and colorectal cancers. These analyses have identified new genes and pathways that had not been linked previously to human cancer. One example has been the discovery of genetic alterations in the PIK3CA gene encoding p110α phosphatidylinositol 3-kinase and in related pathway genes in >30% of colon and breast cancers. These mutational analyses provide a window into the genetic landscape of human cancer, indicate new targets for personalized diagnostic and therapeutic intervention, and suggest lessons for future large-scale genomic analyses in human tumors.
Cancer research is poised for a transformation that will soon permit the comprehensive identification of genomic changes in any tumor type. In the past, identification of genes implicated in tumorigenesis was a long-term endeavor driven by the analysis of candidate genes in certain chromosomal regions, by clues from functional studies, or by linkage in families with hereditary syndromes (1). Though the results of such analyses represent the foundation of our current understanding of tumor initiation and progression, many molecular changes underlying human cancer remain to be discovered. Recent improvements in technologies for high-throughput sequencing and mutation detection, together with the sequence of the human genome, have now permitted rapid analyses of large numbers of genes for somatic (i.e. tumor-specific) alterations (2,3).
Several important advances have aided the development of high-throughput approaches for DNA sequencing and mutation detection in human cancer. The first has been the collection and isolation of high-quality tumor tissue for these analyses, either through generation of early passage tumor cell lines or through selective capture or microdissection of neoplastic tissue. This has permitted the sensitive detection of somatic mutations that would otherwise have been masked by contaminating normal tissue. The second advance has been the development of automated methods for large-scale sequence analysis of specific loci by polymerase chain reaction and Sanger sequencing. These methods have now been optimized to provide rapid and robust sequence analysis of nearly all exonic regions in the human genome (4,5). Finally, several methods for automated mutation detection have been developed and applied for analysis of somatic alterations in cancer (6,7). By direct comparison of sequence traces from tumor and normal tissues, these methods have allowed the sensitive identification of most types of somatic sequence alterations, including nucleotide substitutions, and small insertions, duplications and deletions.
Further improvements are likely to make these analyses even more facile in the future. These will include the use of next generation sequencing technologies that can potentially allow sequence analyses of entire human genomes through use of massively parallel short sequence reads (8). Currently, such approaches suffer from relatively high sequencing error rates, from the requirement for redundant analyses at each locus to ensure that both alleles have been accurately genotyped, and from the difficulty of assessing related regions in the genome using short sequences. Though these issues reduce the attractiveness of these methods for mutation detection, there are a variety of other applications, including analyses of expression and other epigenetic changes, that can be readily performed at this time (9). Given the pace at which sequencing technology has improved, it would be reasonable to expect that further progress in reducing error rates and increasing read lengths will ultimately lead to simpler and more sensitive methods of mutation detection in tumor DNA in the future.
The application of high-throughput sequencing methods for analysis of human cancer has already permitted analyses of increasingly larger numbers of genes for somatic mutations (Table I). These approaches were initially used to estimate the number of somatic alterations that one may expect to detect in a human cancer genome (10). Once a baseline for the number of background somatic changes in a tumor was thus established, efforts focused on analyses of groups of genes involved in signal transduction pathways, in particular protein kinases and phosphatases. The proteins encoded by such genes have been shown to play an important role in regulating cellular aspects related to tumorigenesis, including differentiation, cell cycle progression, apoptosis, motility and invasion (11). By virtue of their enzymatic activities, these genes were also attractive as they may be amenable to therapeutic intervention. Although a few kinases and phosphatase genes had been shown to be mutationally altered in specific human cancers (12), the involvement of the vast majority of these genes in neoplasia had not been explored.
In one of the first examples of these types of analyses, investigators at the Sanger Center examined genes in the RAS–RAF pathway for genetic alterations in a variety of tumor types. This analysis identified a high frequency of mutations in the v-raf murine sarcoma viral oncogene homolog protein kinase gene in melanomas and to a lesser degree in other tumors (13). This discovery was surprising because this pathway was well known and had been extensively characterized at the biochemical level. Mutations in v-raf murine sarcoma viral oncogene homolog were shown to be mutually exclusive with alterations in the Kirsten rat sarcoma viral oncogene homolog (KRAS), providing genetic evidence that these genes operated in the same pathway in human tumors and suggesting that mutation in either was sufficient to activate downstream signaling (13,14).
Using a sequencing-based approach, our group at Johns Hopkins University performed a series of mutational analyses of gene families encoding protein kinases and phosphatases in human colorectal cancers (6,15,16). To determine whether these genes were genetically altered, hidden Markov models and previous reports in the literature were used to identify the genes containing kinase and phosphatase domains in the human genome, and these were then directly analyzed by sequence analysis of tumor DNA. From the sequence information obtained, seven tyrosine kinases, eight serine/threonine kinases and six tyrosine phosphatases were identified that contained somatic mutations. In aggregate, these mutated genes affected a substantial fraction of the colorectal cancers analyzed. For example, of the tyrosine phosphatase genes identified, protein tyrosine phosphatase receptor type T was shown to be altered in >10% of colorectal cancers and was also mutated in lung and gastric cancers (15). Some of the alterations in protein tyrosine phosphatase receptor type T were predicted to result in truncated mutant proteins lacking the phosphatase domain, while missense mutations were shown to lead to reduced phosphatase activity. Recent analyses have identified STAT3 as a substrate of protein tyrosine phosphatase receptor type T and suggest that dysregulation of this pathway may be an important feature of many colorectal tumors (17). Despite years of research on protein kinase and phosphatase genes, few of the genes identified had been linked previously to human cancer, and these findings pointed to new pathways involved in tumor development.
A similar yet smaller approach was undertaken for analysis of the phosphatidylinositol 3-kinase (PI3K) genes, a family of lipid kinases that mediate pathways important for proliferation, adhesion, survival and motility (18). To evaluate whether PI3Ks may be genetically implicated in tumorigenesis, sequence based analyses were used to identify 16 PI3K genes in the human genome. These were examined for sequence alterations in their kinase domains in a panel of colorectal cancers (19). PIK3CA, encoding the p110α catalytic subunit, was the only gene identified with somatic mutations affecting 32% of colorectal tumors examined. Analysis of PIK3CA in other tumor types identified somatic mutations in a smaller fraction of breast, brain, gastric and lung cancers. In subsequent studies, PIK3CA was shown to be altered in 36% of hepatocellular carcinomas, 36% of endometrial carcinomas, 25% of breast carcinomas, 15% of anaplastic oligodendrogliomas and 5% of medulloblastomas and anaplastic astrocytomas (20–23). Analysis of other members of the PI3K pathway has shown that a number of additional genes within the PI3K pathway are altered in colorectal, breast and other tumor types (16,18,24). In most cases, the mutations in these genes appeared to be mutually exclusive, suggesting that alterations in any one gene were sufficient to drive tumorigenesis. In colorectal cancers, >40% of tumors had alterations in one of eight PI3K pathway genes, demonstrating the importance of this pathway in colorectal cancer pathogenesis.
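The two summary statistics used in this paragraph (the fraction of tumors with at least one altered pathway gene, and whether alterations co-occur within a tumor) can be illustrated with a minimal sketch. The function name, the tumor identifiers and the toy data are hypothetical and are not taken from the studies cited; only two pathway genes are listed for brevity.

```python
def pathway_summary(tumor_mutations, pathway_genes):
    """Summarize how a set of pathway genes is altered across tumors.

    tumor_mutations: dict mapping a tumor ID to the genes mutated in it.
    pathway_genes:   iterable of gene symbols that define the pathway.
    """
    pathway = set(pathway_genes)
    # Keep only the pathway genes that are mutated in each tumor.
    hits = {tumor: set(genes) & pathway for tumor, genes in tumor_mutations.items()}
    altered = [tumor for tumor, genes in hits.items() if genes]
    # Tumors with more than one pathway gene altered break mutual exclusivity.
    co_mutated = sorted(tumor for tumor, genes in hits.items() if len(genes) > 1)
    n_tumors = len(tumor_mutations)
    return {
        "fraction_altered": len(altered) / n_tumors if n_tumors else 0.0,
        "co_mutated_tumors": co_mutated,
    }


# Hypothetical toy data: five tumors, two pathway genes.
toy = {
    "T1": ["PIK3CA", "KRAS"],
    "T2": ["PTEN"],
    "T3": ["KRAS"],
    "T4": [],
    "T5": ["PIK3CA", "PTEN"],
}
summary = pathway_summary(toy, ["PIK3CA", "PTEN"])
print(summary)
```

In this toy example three of five tumors carry a pathway alteration and one tumor (T5) carries two, so the pattern would not be perfectly mutually exclusive. A formal analysis would also need a statistical test of co-occurrence, which this sketch omits.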
Additional screens for somatic alterations in human tumors have identified mutations in the epidermal growth factor receptor (EGFR) in a small fraction of lung cancers and have linked mutations in this gene with increased sensitivity to EGFR inhibitors such as gefitinib (Iressa) and erlotinib (Tarceva) (25,26). Although the frequency of these alterations was low, these observations were important to the field as they substantiated the hypothesis that tumor cells were dependent on specific pathways for continued proliferation (27,28) and provided fresh impetus for the development of novel therapeutic agents against EGFR and other protein kinases. Sequence analysis of the tyrosine kinase genes in other neoplasms has identified mutations in Janus kinase 2 in polycythemia vera and in other myeloproliferative disorders (29–33). Interestingly, the mutations in Janus kinase 2 mostly occurred at a single residue, V617F, providing a potentially facile and sensitive means of detecting alterations in individuals with these disorders. Additional analyses of protein kinases in lung, breast and other tumor types have revealed mutations of the HER-2/neu receptor (ERBB2) in a fraction of lung cancers, have identified a subset of breast tumors with an unusually high mutator phenotype and have shown that certain tumors, such as testicular germ cell tumors, have a very low prevalence of somatic mutations (34–37). Finally, systematic analyses have been performed on genes that may be involved in chromosomal instability, as this phenotype is an underlying characteristic of the vast majority of human cancers. These analyses identified somatic alterations in the DNA repair gene MRE11 and in a number of genes involved in the cohesin complex that are thought to be important for sister chromatid cohesion and accurate chromosome segregation (38,54).
Taken together, the unexpected discoveries from these systematic analyses of gene families involved in signaling pathways provided a clear rationale for expanding these studies to examine the remaining genes in the human genome. Such analyses could reveal additional genes in known pathways that are significantly affected by genetic alterations as well as identify genes that may be pointing to entirely different cellular processes. To achieve such goals, we recently undertook an effort to sequence a large fraction of the protein-coding genes in the human genome in breast and colorectal cancers. We initially focused on a set of ~13000 genes comprising the consensus coding sequences as these represented the most highly curated gene set available (4). We have recently extended these analyses to include the remaining ~5100 genes in the Reference Sequence database (5). The goals of these studies were to provide a methodological strategy that would allow genome-wide mutational analyses in human tumors, to identify the spectrum and extent of somatic mutations in human tumors and to identify new genes and molecular pathways that were important in these tumors.
From the combination of these two studies, a total of ~200,000 coding genomic regions were analyzed in 11 samples of each tumor type (breast and colorectal carcinomas). Over 4 million polymerase chain reaction products were generated and directly sequenced, resulting in nearly 660 million bp of tumor sequence. Examination of sequence traces from these amplicons revealed over a million putative nucleotide changes. These changes could represent germ line variants, artifacts of polymerase chain reaction or sequencing, or bona fide somatic mutations. A variety of bioinformatic and experimental steps were employed to distinguish among these possibilities. The combination of these steps removed >99% of the potential alterations, resulting in 2185 confirmed somatic mutations in 1885 genes.
The great majority of the mutations observed were single-base substitutions. Though the fraction of these was similar in breast and colorectal cancers, the spectrum and nucleotide contexts of mutations were very different between the two tumor types. The most dramatic of these differences occurred at C:G base pairs (many of which were at 5′-CpG-3′ dinucleotide sites). Over half of the colorectal cancer mutations were C:G to T:A transitions, whereas <10% were C:G to G:C transversions. In breast cancers, however, only 35% of the mutations were C:G to T:A transitions, whereas 29% were C:G to G:C transversions. In contrast, mutations occurring at 5′-TpC-3′ sites (or complementary 5′-GpA-3′ sites) comprised nearly a third of alterations in breast cancers but a much smaller fraction in colorectal cancers. These observations have important implications for processes of carcinogenesis and suggest that the mechanisms underlying mutagenesis and repair in the two tumor types are probably different. These conclusions have been extended by analysis of somatic alterations in other tumor types and suggest that additional spectra of mutations may affect tumors derived from other tissues (39).
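Because a substitution seen on one DNA strand implies the complementary change on the other, spectra such as those described here are usually tabulated in strand-collapsed classes (for example, C>T and G>A are both counted as C:G to T:A). The following is a minimal sketch of that bookkeeping; the function names and the toy input are illustrative assumptions and do not reproduce the pipeline used in the studies.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Return the strand-collapsed class (e.g. 'C:G>T:A') of a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Report every change from the pyrimidine (C or T) strand so that
    # complementary changes such as G>A and C>T fall into the same class.
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


def spectrum(mutations):
    """Return the fraction of mutations in each class for (ref, alt) pairs."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in mutations)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical toy input: four substitutions from one tumor.
toy_spectrum = spectrum([("C", "T"), ("G", "A"), ("C", "G"), ("T", "C")])
print(toy_spectrum)
```

Comparing such per-class fractions between tumor types is what reveals differences like those reported above for C:G to T:A transitions and C:G to G:C transversions. Dinucleotide context (CpG or TpC) would require the flanking bases as additional input, which this sketch does not handle.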
Somatic mutations in human tumors can arise either through selection of functionally important alterations via their effect on net cell growth or through accumulation of non-functional ‘passenger’ alterations that arise during repeated rounds of cell division in the tumor or in its progenitor stem cell. To distinguish between these possibilities, several statistical approaches have been developed to estimate the probability that the number of mutations in a given gene reflects a mutation frequency that is greater than expected from the background mutation rate (5,40). In general, these analyses incorporate the number of somatic alterations observed, the number of tumors studied, the number of nucleotides that were successfully analyzed and the nucleotide type and context of each mutation. We used such approaches to identify those genes in our genome-wide study that were most likely to have been selected during tumorigenesis. Over 200 such candidate genes were discovered in breast and colorectal tumors. The genes we identified that were previously known to be somatically mutated in human cancers represented the vast majority of genes that are thought to be important in these two tumor types, thereby providing an important validation of such unbiased approaches in genetic analyses of neoplasia.
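The core idea behind such statistics can be sketched with a simple binomial model: if every successfully analyzed base in every tumor mutates independently at the background rate, how likely is it to see at least the observed number of mutations in a gene? This is a deliberately simplified illustration, not the method of the cited studies, which also account for nucleotide type and context. The function names and the numbers in the example are hypothetical.

```python
import math


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        term = math.exp(log_term)
        total += term
        # Past the mode the terms shrink monotonically, so stop once negligible.
        if i > n * p and term < total * 1e-17:
            break
    return min(total, 1.0)


def passenger_probability(n_mutations, bases_analyzed, n_tumors, background_rate):
    """Chance of seeing >= n_mutations in a gene if all were background events."""
    opportunities = bases_analyzed * n_tumors  # one chance per base per tumor
    return binomial_tail(n_mutations, opportunities, background_rate)


# Hypothetical example: 4 mutations in 1500 analyzed bases across 11 tumors,
# with an assumed background rate of 1e-6 mutations per base.
print(passenger_probability(4, 1500, 11, 1e-6))
```

A small probability suggests that the gene is mutated more often than the background rate alone would predict. In a genome-wide screen, such per-gene values would also need correction for the large number of genes tested.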
This study also revealed a substantial number of genes that had not been suspected previously to be involved in cancer. The potential roles of these genes have been analyzed by their annotation in various functional databases, including the Gene Ontology, Kyoto Encyclopedia of Genes and Genomes and GeneGo databases, or through previously published literature (4,5,41,55). Several of the groups identified in this way were of special interest, as a substantial fraction of the genes were transcriptional regulators, cell adhesion molecules and members of signal transduction pathways. At least one member of each of these gene groups was mutated in >70% of the tumors of each type. Subsets of these groups were also of interest and included metalloproteinases, ephrin receptors and G proteins and their regulators. Interestingly, additional members of the PI3K pathway were also identified that had not been detected previously. These data suggest that dysregulation of specific cellular processes is genetically selected during neoplasia and that distinct members of each group may serve similar roles in different tumors.
The observations from these genome-wide mutational analyses suggest that breast and colorectal cancers display a large degree of complexity at the genetic level. This is reflected by the fact that individual tumors harbor ~80 non-silent mutations in the coding regions of different genes and that 15–20 of these are likely to be causally implicated in human cancer. Although a handful of these genes are mutated in a high fraction of tumors, the vast majority are mutated at relatively low frequencies and are different among tumors. This genomic landscape of a few highly mutated genes among a large number of less frequently mutated genes is a feature of both breast and colorectal cancers and is probably a defining characteristic of other solid cancers. This picture of genomic complexity intuitively suggests that the highly mutated genes provide a large selective advantage to the mutated cell, whereas the genes with low frequency mutations provide only a modest advantage. Mathematical modeling of tumor progression is consistent with this notion and shows that even small degrees of fitness advantage are sufficient for such mutations to be selected during tumorigenesis (42).
A consequence of the genetic heterogeneity that we observed from these large-scale studies is that individual tumors are different with respect to the mutations that they contain within their genomes. These differences may in part be responsible for the clinical diversity that tumors display in development, response to therapy and clinical outcome. Taking advantage of these alterations will be challenging because of the multitude of changes observed and because of the current lack of understanding of the role of many of the mutations in the pathogenesis of the disease.
Nevertheless, some of the observed mutations may be useful for therapeutic targeting. It has been proposed that mutations in tumor cells may lead to ‘oncogenic addiction’ (27,28). This hypothesis suggests that tumor cells are addicted to, or dependent on, mutated genes or pathways, and that inhibition of these can result in cellular arrest or death. This hypothesis has been supported by successful examples in the clinic, including use of imatinib (Gleevec) to inhibit BCR-ABL in patients with chronic myeloid leukemia and use of gefitinib (Iressa) and erlotinib (Tarceva) in patients with lung tumors containing EGFR mutations. Moreover, genetic and cellular analyses of two commonly mutated oncogenes, Kirsten rat sarcoma viral oncogene homolog and PIK3CA, suggest that tumor cells depend on the activity of these mutant genes for continued cellular proliferation and that their disruption reduces the neoplastic potential of these cells (43,44).
A potential target identified through systematic genomic studies that may be amenable to therapeutic intervention is PIK3CA. The positions of the mutations within this gene immediately suggested that they were likely to increase kinase activity. Over 75% of alterations occurred in two small clusters in evolutionarily conserved regions of the helical and kinase domains. This clustering of somatic missense mutations in specific domains was similar to that observed for activating mutations in other oncogenes, such as Kirsten rat sarcoma viral oncogene homolog, v-raf murine sarcoma viral oncogene homolog and tyrosine kinases. A number of studies have now shown that these alterations increase kinase activity compared with the wild-type protein and are oncogenic (19,44–46). To examine the function of mutated PIK3CA in human cancer cells, we have used homologous recombination to disrupt the PIK3CA locus, thereby generating isogenic cancer cell lines containing either the wild-type or the mutant version of PIK3CA (44). These studies show that mutant PIK3CA appears to be important for cell growth and invasion both in vitro and in vivo. Treatment of these cells with PI3K inhibitors has shown that those with mutant PIK3CA were preferentially inhibited, and suggested that PIK3CA may be a useful target in tumors with mutations in this gene. The combination of these genetic and functional properties of PIK3CA has spurred the development of PI3K inhibitors, leading to several compounds that are in phase I clinical trials (47). Additionally, as most of the genes altered in the PI3K pathway encode protein kinases, the encoded proteins of these genes could also serve as potential therapeutic targets. Targeting of the proteins that act downstream in the PI3K pathway may be effective in treating the larger fraction of tumors containing mutations in PIK3CA, in the phosphatase and tensin homolog, or in other components.
Another potential modality of therapeutic intervention that takes advantage of the large number of alterations observed in human tumors is based on immune recognition of novel mutant epitopes. In silico analyses of the tumors analyzed in our genome-wide study suggest that each accumulated on average 7–10 unique mutant peptides that could be presented as major histocompatibility complex class I HLA-A*0201 epitopes (48). As tumor cells have potentially six distinct HLA class I molecules, the average number of presented mutant epitopes may actually be closer to 60. If these predictions are confirmed experimentally, they suggest the possibility of developing immunologic-based approaches for treating cancers that would be broadly applicable yet highly specific to individual tumors.
These mutation data can also be useful for early detection of cancer. Diagnostic avenues may ultimately be more valuable than development of new therapies, as most tumors can be cured if they are detected and surgically excised at an early stage. Mutated tumor DNA, released from the primary tumor or from circulating tumor cells, can be used as markers of tumorigenesis that can be detected in the blood or other bodily fluids (49,50). Detection of such mutations would be highly specific for tumors as they would not be expected to be present in normal tissues at appreciable levels. The challenge of using mutation-based markers is that such methods need to be able to detect small numbers of mutant DNA molecules in the context of much larger amounts of normal DNA. Techniques have already been devised for fecal DNA mutation detection that permit screening of colorectal cancers with sensitivities nearing those of colonoscopies (51,52). The sensitivity of such approaches would be expected to increase with the inclusion of additional mutated genes within these assays. Though the use of a large number of gene mutations would currently make such tests expensive, it is likely that new next generation sequencing technologies will permit these approaches to be accessible in the future.
Several general lessons have emerged from these genome-wide mutational analyses. The first is that a relatively large number of previously uncharacterized mutated genes exist in human cancers and that these genes can be discovered by unbiased genomic approaches. These results support the prediction that large-scale mutational analyses of other tumor types will prove useful for identifying genes not currently known to be linked to tumorigenesis. Along these lines, The Cancer Genome Atlas Project has recently been initiated to provide mutational analyses of a large number of genes in brain, lung and ovarian cancers (53). Even larger international cancer genome sequencing efforts may arise to tackle these issues in other tumor types. Second, our results suggest that the number of mutational events occurring during the evolution of human tumors from a benign to a metastatic state is much larger than previously imagined and that breast and colorectal cancers show substantial differences in their mutation spectra and in the genes that are mutated. These data show that tumors have a substantial heterogeneity at the genetic level, and that the observed alterations may be reflected in the biologic and clinical differences between breast and colorectal tumors and between individual patients within the same tumor type. Finally, it appears that a substantial number of mutated genes may participate in common biologic groups or molecular pathways, thereby reducing the apparent genetic complexity. The enrichment of mutations in novel pathways will probably be facilitated in the future by unbiased genome-wide mutational analyses.
It is clear that the studies performed to date represent only an initial foray into the determination of the genetic understanding of human cancer. Additional analyses using complementary approaches, including those assessing copy number alterations, translocations and epigenetic modifications, in combination with genomic sequencing, will provide a more complete picture of the compendium of genetic alterations in human cancer. Improvements in next generation sequencing technologies may allow for more rapid analyses that provide insight into most of these genomic alterations simultaneously. Ideally, these data should be combined with information regarding patient prognosis, response to therapy and outcome in order to identify those changes that have important clinical implications. These results will undoubtedly stimulate widespread efforts to understand the functional effects of the observed genetic alterations and to develop targeted therapeutic and diagnostic approaches against cancers containing these changes. Such large-scale studies may ultimately allow us to envisage a future where patients would receive cancer diagnoses through non-invasive DNA analyses and based on the alterations identified would receive personalized therapies designed to be effective against the combination of mutations present in their tumors. Though such a scenario may not be immediately realizable, the systematic genomic studies and the targeted therapies already described provide a road map for the long-term management of human cancer.
Virginia and D.K. Ludwig Fund for Cancer Research; National Institutes of Health (CA121113, CA 43460, CA 57345); National Cancer Institute, Division of Cancer Prevention (HHSN261200433002C); Dr. Miriam and Sheldon G. Adelson Medical Research Foundation; Pew Charitable Trusts.
Conflict of Interest Statement: None declared.