There are currently few therapeutic options for patients with pancreatic cancer, and new insights into the pathogenesis of this lethal disease are urgently needed. Toward this end, we performed a comprehensive genetic analysis of 24 pancreatic cancers. We first determined the sequences of 23,219 transcripts, representing 20,661 protein-coding genes, in these samples. Then, we searched for homozygous deletions and amplifications in the tumor DNA by using microarrays containing probes for ~10^6 single-nucleotide polymorphisms. We found that pancreatic cancers contain an average of 63 genetic alterations, the majority of which are point mutations. These alterations defined a core set of 12 cellular signaling pathways and processes that were each genetically altered in 67 to 100% of the tumors. Analysis of these tumors’ transcriptomes with next-generation sequencing-by-synthesis technologies provided independent evidence for the importance of these pathways and processes. Our data indicate that genetically altered core pathways and regulatory processes only become evident once the coding regions of the genome are analyzed in depth. Dysregulation of these core pathways and processes through mutation can explain the major features of pancreatic tumorigenesis.
Worldwide, over 213,000 patients will develop pancreatic cancer in 2008, and nearly all will die of their disease (1–3). Several genetic alterations have been identified in these lethal cancers, including those in the CDKN2A, SMAD4, and TP53 tumor suppressor genes and in the KRAS oncogene (4–8). Although the discoveries of these genes have provided important insights into the natural history of the disease and have spurred efforts to develop improved diagnostic and therapeutic agents, the vast majority of human genes have not been analyzed in this cancer type.
We examined the genetic makeup of human pancreatic cancers in unprecedented detail. Because all human cancers are primarily genetic diseases, we hoped to identify additional genes and signaling pathways that could guide future research on this disease.
The sequences of protein-coding exons from 20,735 genes were identified and used to design primers for 219,229 amplicons covering these regions (9). DNA from 24 advanced pancreatic adenocarcinomas (table S1) was polymerase chain reaction (PCR)–amplified with these primers and sequenced with the use of fluorescent dye terminators (9). The 24 tumors were passaged in vitro as cell lines or in nude mice as xenografts to remove contaminating non-neoplastic cells, facilitating detection of mutations (10–12). Exons containing variant sequences were reamplified and resequenced from the tumor DNA as well as from normal DNA from the same patient to confirm the mutation and to ensure that the mutation was somatic (i.e., that it was not present in normal cells). A total of 208,311 amplicons yielded PCR products that were successfully sequenced and met stringent quality controls (table S2). These amplicons included 94.5% of the targeted sequences and yielded high-quality sequencing data for 98.5% of the target bases within these amplicons. The 208,311 successfully sequenced amplicons yielded mutational data on 23,219 transcripts representing 20,661 genes.
Among the 1562 somatic mutations detected with this strategy, 25.5% were synonymous, 62.4% were missense, 3.8% were nonsense, 5.0% were small insertions and deletions, and 3.3% were at splice sites or within the untranslated region (UTR) (Table 1 and table S3). The spectra of somatic mutations can yield insights into potential carcinogens and other environmental exposures. Table 1 lists the spectra observed in the four tumor types that have been subjected to large-scale sequencing analyses of the majority of protein-encoding genes. It is evident that breast tumors have a unique somatic mutation spectrum, with a preponderance of mutations at 5′-TpC sites and a relatively small number of mutations at 5′-CpG sites. However, the spectra of colorectal (13, 14), brain (15), and pancreatic tumors are similar, suggesting that breast epithelial cells are exposed to different levels or types of carcinogens or use distinctive repair systems (16, 17). Given that cells in the colon are expected to be exposed to dietary carcinogens more than breast, brain, or pancreatic cells, one possible interpretation of these results is that dietary components are not directly responsible for causing most of the mutations found in human cancers.
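As a quick arithmetic check on the breakdown of the 1562 somatic mutations, the reported percentages can be converted to approximate counts. This is only a sketch: the counts are rounded estimates derived from the percentages, not values taken from table S3, and the variable names are illustrative.

```python
# Percentages of the 1562 somatic mutations by class, as reported in the text.
TOTAL_MUTATIONS = 1562
percent_by_class = {
    "synonymous": 25.5,
    "missense": 62.4,
    "nonsense": 3.8,
    "small insertion/deletion": 5.0,
    "splice site or UTR": 3.3,
}

# Approximate counts obtained by rounding; the exact counts are in the
# supplementary tables and may differ slightly from these estimates.
approx_counts = {
    name: round(TOTAL_MUTATIONS * pct / 100)
    for name, pct in percent_by_class.items()
}

percent_total = sum(percent_by_class.values())
for name, count in approx_counts.items():
    print(f"{name}: ~{count}")
print(f"percentages sum to {percent_total:.1f}%")
```

The rounded estimate of about 975 missense mutations is compatible with the 924 missense mutations that could be scored by the prediction algorithm.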
Of the 20,661 genes analyzed by sequencing, 1327 had at least one mutation, and 148 had two or more mutations among the 24 cancers surveyed (table S3). In addition to the frequency of mutations, the type of mutation can provide information useful for evaluating its potential role in disease (18). Nonsense mutations, out-of-frame insertions or deletions, and splice-site changes generally lead to inactivation of the protein products. To evaluate missense mutations, we developed an algorithm that uses machine learning of 58 predictive features based on the physical-chemical properties of amino acids involved in the substitutions and their evolutionary conservation at equivalent positions of conserved proteins (9). Of the 924 missense mutations that could be scored with this algorithm, 160 (17.3%) were predicted to contribute to tumorigenesis when assessed by this method (table S3).
We also generated structural models of 404 of the missense mutations identified in this study [links to structural models available at (19)]. In each case, the model was based on x-ray crystallography or nuclear magnetic resonance spectroscopy of the normal protein or a closely related homolog. This analysis showed that 55 of the 404 mutations were located near a domain interface or ligand-binding site and were likely to affect function (examples in Fig. 1).
The average number of somatic mutations in pancreatic cancers (48; Table 2) is considerably less than that in breast cancer (101) or colorectal cancers (77) (P < 0.001), even though fewer genes were sequenced in the latter two tumor types (14). One plausible explanation for this lower rate is that the cells that initiate pancreatic tumorigenesis have gone through fewer divisions than colorectal or breast cancer cells. It has been previously shown that the majority of mutations observed in colorectal cancers are likely to have occurred in the normal stem cells that gave rise to the initiating neoplastic cell (12). Our data are thus consistent with observations showing that normal pancreatic epithelial cells divide infrequently (20, 21).
We further evaluated 39 genes that were mutated in more than one of the 24 discovery screen cancers in a prevalence screen consisting of 90 pancreatic cancers. In this screen, we detected 255 nonsilent somatic mutations among 23 genes (table S4). The nonsilent mutation rate of the genes in the prevalence screen (excluding KRAS, TP53, CDKN2A, and SMAD4) was higher than that in the discovery screen (3.6 versus 1.47 nonsilent mutations per Mbase, P < 0.001). The fraction of nonsilent mutations observed in these 19 genes was also higher than that observed in the genes assessed in the discovery screen (P = 0.052). These data are consistent with the hypothesis that a greater fraction of the genes tested in the prevalence screen were positively selected during tumorigenesis.
By using oligonucleotide arrays containing probes for 1,069,688 SNPs and robust algorithms for identifying deletion events from SNP array data (22), we identified 198 separate homozygous deletions among the 24 pancreatic cancers (table S5). The average size of these deletions was 335,000 bp. Additionally, we observed many regions that had undergone single-copy losses, often manifest as losses of heterozygosity, including losses of whole chromosomes or whole chromosome arms. We did not pursue these changes because it is difficult to reliably identify target genes from such large regions unless the residual copy of the gene on the nondeleted chromosome is mutated. Such target genes would have already been called to our attention by the results of the discovery sequencing screen and would have been scored as homozygous changes (table S3).
According to the allelic two-hit hypothesis, the presence of a homozygous deletion indicates that a tumor suppressor gene exists within the deleted region (23). To determine the most likely target within these deletions, we used the results from our analysis of point mutations as well as expression analyses (see below) and previously published studies. For a gene to be considered the target, a portion of its coding region had to be affected by the homozygous deletion, and the gene (i) had to harbor a nonsilent sequence alteration in a different tumor, (ii) had to be a well-documented tumor suppressor gene, or (iii) had to have corroborating expression data. The presumptive target genes for the homozygous deletions that met these criteria are listed in table S5. This list includes the classic tumor suppressor genes CDKN2A (p16), SMAD4, and TP53, as well as genes that had not previously been implicated in pancreatic cancer development.
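The target-selection rule for homozygous deletions can be written as a small predicate. The sketch below is a hedged illustration of the stated criteria only; the `CandidateGene` record, its field names, and the example entry `GENE_X` are hypothetical and do not come from the study's data.

```python
from dataclasses import dataclass


@dataclass
class CandidateGene:
    name: str
    coding_region_deleted: bool           # part of the coding region lies in the homozygous deletion
    nonsilent_alteration_elsewhere: bool  # criterion (i): nonsilent alteration in a different tumor
    known_tumor_suppressor: bool          # criterion (ii): well-documented tumor suppressor gene
    expression_support: bool              # criterion (iii): corroborating expression data


def is_presumptive_target(gene: CandidateGene) -> bool:
    """Apply the rule from the text: the coding region must be affected,
    and at least one of criteria (i) to (iii) must hold."""
    if not gene.coding_region_deleted:
        return False
    return (
        gene.nonsilent_alteration_elsewhere
        or gene.known_tumor_suppressor
        or gene.expression_support
    )


# Illustrative inputs only: the flag values are assumptions, not study results.
cdkn2a = CandidateGene("CDKN2A", True, False, True, False)
gene_x = CandidateGene("GENE_X", True, False, False, False)  # hypothetical neighbor gene
print(is_presumptive_target(cdkn2a), is_presumptive_target(gene_x))
```

A gene that merely lies within a deleted region, with no supporting evidence, is therefore not called a target under this rule.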
When an exon of a gene is truly deleted in a tumor, no sequencing information should be obtainable, providing confirmation of the deletion. Without exception, the homozygous deletions found through the SNP arrays were consistent with the sequencing data (9). Furthermore, there was only one homozygous deletion revealed by sequencing that was not evident in the microarray hybridizations (a four-exon deletion of SMAD4 in tumor Pa21C).
The number of deletions in a tumor was more variable than the number of somatic mutations, averaging 8.3 and ranging between 2 and 20 per tumor (Fig. 2). However, each homozygous deletion completely abrogated the function of the target gene as well as all other genes within the deleted region, whereas only a fraction of the somatic mutations were predicted to alter the gene’s function. In a typical pancreatic cancer, ~10 genes (including targets and nearby genes) are eradicated by homozygous deletion, providing fertile ground for therapeutic strategies that target such losses (24, 25).
With the use of algorithms similar to those described above for deletions (22), we identified 144 focal high-copy amplifications in the 24 tumors (table S6). We also identified a variety of low-copy-number gains of entire chromosomes, chromosomal arms, or other large genomic regions that were not pursued because of the difficulty in reliably identifying candidate cancer genes from such large chromosomal regions. To determine the most likely target of the focal amplifications, we again used the results from our mutational data, expression analyses, and previously published data. The presumptive target genes for each of the amplifications that met predefined criteria (9) are listed in table S6. There were fewer amplifications than homozygous deletions or point mutations in most pancreatic tumors (Fig. 2).
The primary goal of cancer genome studies is the identification of genes that are likely to play a causal role in the neoplastic process (potential drivers). One can categorize the best candidate cancer genes (CAN genes) on the basis of their mutation frequencies and types. This categorization requires an estimate of the passenger mutation rate (13, 14, 26). For each of the genes containing somatic mutations, passenger probabilities were determined by using estimated minimal and maximal passenger mutation rates after taking into account the size of the gene, its nucleotide composition, and other relevant factors [Table 1 and (9)]. To analyze the probability that a given gene would be involved in an amplification or deletion, we made the conservative assumption that the overall frequency of all observed amplifications and deletions represented the passenger mutation rate (9).
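To make the idea of a passenger probability concrete, the sketch below computes a simple binomial tail probability: the chance of seeing at least a given number of mutations in a gene if every mutation were a passenger. This is a deliberately simplified stand-in, not the method used in the study, which also accounts for nucleotide composition and other factors and uses minimal and maximal rate estimates (9). The function name, the gene size, and the rate of 1.5 mutations per Mb are assumptions chosen for illustration.

```python
from math import comb


def passenger_probability(n_observed, coding_bases, n_tumors, rate_per_base):
    """Return P(X >= n_observed) for X ~ Binomial(coding_bases * n_tumors, rate_per_base).

    Simplifying assumption: every sequenced base in every tumor mutates
    independently at the same passenger rate.
    """
    n_sites = coding_bases * n_tumors
    # Probability of observing fewer than n_observed mutations.
    p_fewer = sum(
        comb(n_sites, k) * rate_per_base**k * (1.0 - rate_per_base) ** (n_sites - k)
        for k in range(n_observed)
    )
    return 1.0 - p_fewer


# Hypothetical gene: 1,500 coding bases screened in 24 tumors.
# The rate (1.5 mutations per Mb) is an illustrative assumption.
for hits in (1, 2, 3):
    p = passenger_probability(hits, coding_bases=1500, n_tumors=24, rate_per_base=1.5e-6)
    print(f"{hits} or more mutations: passenger probability ~ {p:.2e}")
```

Under these assumptions the probability falls steeply as the number of observed mutations rises, which is why genes mutated in several tumors stand out as candidate drivers.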
Passenger probabilities for all genes in which at least two genetic alterations were identified in the discovery screen are listed in table S7. This list includes all genes previously known to play an important role in pancreatic cancer through mutation or copy number change, providing experimental confirmation of our general approach. Importantly, the CAN genes listed in table S7 included numerous other genes of potential biological interest, many of which had not previously been identified as playing an important role in this tumor type. Examples include the transcriptional activator MLL3; cadherin homologs CDH10, PCDH15, and PCDH18; the α-catenin CTNNA2; the dipeptidyl-peptidase DPP6; the angiogenesis inhibitor BAI3; the heterotrimeric guanine nucleotide–binding protein (G-protein)–coupled receptor GPR133; the guanylate cyclase GUCY1A2; the protein kinase PRKCG; and Q9H5F0, a gene of unknown function. These genes were generally mutated at much lower frequencies than those previously identified to be mutated in pancreatic cancers (table S7). This is compatible with the idea that conventional strategies were able to identify frequently mutated genes but not the bulk of the genes that are genetically altered in pancreatic cancers.
Because most cellular pathways and processes involve multiple proteins functioning in a concerted manner, it is possible that mutations in different genes result in similar tumorigenic effects. Because nearly all of the protein-coding genes in the human genome were evaluated in the current study, the data provided a unique opportunity to investigate groups of genes operating through specific signaling pathways and processes. Sets of genes involved in signaling pathways or cellular processes were defined through three well-annotated GeneGo MetaCore databases: gene ontology (GO), canonical gene pathway maps (MA), and genes participating in defined cellular processes and networks (GG) (27). We developed a statistical approach that provided a combined probability that a gene set contained driver alterations, taking into account all types of genetic alterations evaluated in this study (22). For each gene set, we considered whether the component genes were more likely to be affected by a genetic alteration than would be predicted by the passenger mutation rate.
These analyses identified 69 gene sets that were genetically altered in the majority of the 24 cancers examined (table S8). Thirty-one of these sets could be further grouped into 12 core signaling pathways and processes that were each altered in 67 to 100% of the 24 cancers analyzed and had clear functional relevance to neoplasia based on annotations in the databases described above (Table 2). The core pathways included those in which a single, frequently altered gene predominated, such as in KRAS signaling and in the regulation of the G1/S cell cycle transition; pathways in which a few altered genes predominated, such as in TGF-β signaling; and pathways in which many different genes were altered, such as in integrin signaling, regulation of invasion, homophilic cell adhesion, and small guanosine triphosphatase (GTPase)–dependent signaling.
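The statement that a pathway is "altered in 67 to 100% of the cancers" amounts to counting tumors that carry at least one altered gene from the pathway's gene set. The sketch below shows that tally under stated assumptions: the tumor labels and their alteration lists are toy data, and the two-gene set is an incomplete stand-in for a TGF-β gene set, not the definition used in table S8.

```python
def fraction_of_tumors_altered(gene_set, altered_genes_by_tumor):
    """Fraction of tumors carrying at least one altered gene from gene_set."""
    n_altered = sum(
        1 for genes in altered_genes_by_tumor.values() if gene_set & genes
    )
    return n_altered / len(altered_genes_by_tumor)


# Toy data for illustration only; these are not the study's tumors.
tgf_beta_set = {"SMAD4", "BMPR2"}  # incomplete, illustrative gene set
toy_tumors = {
    "tumor_A": {"KRAS", "SMAD4"},
    "tumor_B": {"KRAS", "BMPR2"},
    "tumor_C": {"KRAS", "TP53"},
}
print(fraction_of_tumors_altered(tgf_beta_set, toy_tumors))
```

Counting at the level of the gene set rather than the single gene is what lets two tumors with different mutated genes register as hits in the same pathway. With 24 tumors, the lower bound of 67% would correspond to 16 tumors.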
Gene expression patterns can inform the analysis of pathways because they can reflect epigenetic alterations not detectable by sequencing or copy number analyses. They can also point to downstream effects on gene expression resulting from the altered signaling pathways and processes described above. To analyze the transcriptome of pancreatic cancers, we performed SAGE [serial analysis of gene expression (28)] on RNA from the same 24 cancers used for mutation analysis. When combined with massively parallel sequencing by synthesis, SAGE provides a highly quantitative and sensitive measure of gene expression. The approach described above is similar to that used in recent RNA-Seq studies (29–32), but SAGE has the advantage that the quantification does not depend on the length of the transcript.
As a control for the current study, we microdissected histologically normal pancreatic duct epithelial cells because these cells are the presumed precursors of pancreatic cancers. As an additional control, we used human papillomavirus (HPV)–immortalized pancreatic duct epithelial (HPDE) cells, which have been shown to have many properties in common with normal duct epithelial cells (33, 34). SAGE libraries were prepared from these cells as well as the 24 pancreatic cancers; an average of 5,737,000 tags was obtained from each library, and an average of 2,268,000 tags per library matched the sequence of known transcripts (table S9).
The expression analysis was first used to help identify target genes from amplified and homozygously deleted regions. Although a small fraction of these regions contained a known tumor suppressor gene or oncogene, many contained more than one gene that had not previously been implicated in cancer. In tables S5 and S6, a presumptive target gene was identified within these regions through the use of the mutational and transcriptional data. For example, we assumed that a gene could not have been the target of an amplification event if that gene was not wholly contained within the amplicon and expressed in the tumor containing the amplification. Similarly, expression data can be used to help gauge the importance of genes containing missense mutations. A missense mutation in a gene that is not expressed in the tumor containing it is more likely to be a passenger than a mutation in a gene that is expressed (table S3).
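The exclusion rule for amplification targets can be sketched as a simple filter. This is an illustration under assumptions: the function name and genomic coordinates are hypothetical, and "expressed" is taken here to mean at least one SAGE tag, a threshold the text does not specify.

```python
def could_be_amplification_target(gene_start, gene_end, amp_start, amp_end, tag_count):
    """Return True only if the gene lies wholly within the amplicon
    and is expressed in the tumor carrying the amplification."""
    wholly_contained = amp_start <= gene_start and gene_end <= amp_end
    expressed = tag_count > 0  # assumed threshold: any detected SAGE tag
    return wholly_contained and expressed


# Hypothetical coordinates (bp) and tag counts, for illustration only.
print(could_be_amplification_target(1_200_000, 1_250_000, 1_000_000, 2_000_000, 35))
print(could_be_amplification_target(950_000, 1_050_000, 1_000_000, 2_000_000, 35))
print(could_be_amplification_target(1_200_000, 1_250_000, 1_000_000, 2_000_000, 0))
```

The second call fails because the gene straddles the amplicon boundary, and the third because the gene is not expressed; only the first gene remains a candidate.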
Second, we determined whether the genes in the core signaling pathways and processes described above were differentially expressed. If the pathways and processes containing genetic alterations were indeed responsible for tumorigenesis, one might expect that many of the genes within these pathways would be aberrantly expressed. To test this hypothesis, we examined the expression of the gene sets constituting the 12 core signaling pathways and processes (Table 2 and table S8). The 31 gene sets constituting these pathways were more highly enriched for differentially expressed genes than the remaining 3041 gene sets (P < 0.001). These expression data thus independently support the contribution of these signaling pathways and processes to pancreatic tumorigenesis.
Lastly, we attempted to identify individual genes rather than pathways that were differentially expressed in the cancers. The data in table S9 represent the largest compendium of digital expression data derived for any tumor type to date. There was a remarkably high number (541) of genes that were at least 10-fold overexpressed in >90% of the 24 cancers (compared to normal pancreatic duct cells or HPDE cells). To determine whether these genes were also overexpressed in the primary tumors from which the cell lines were made, we performed SAGE on five such primary tumors. These results confirmed that the 541 genes were overexpressed in situ: the genes were, on average, expressed at 75-fold higher levels in the cell lines and at 88-fold higher levels in the primary tumors compared with their expression in normal duct epithelial cells. Notably, 54 of the overexpressed genes encoded proteins that are predicted to be secreted or expressed on the cell surface (table S9). These overexpressed genes provide leads for a variety of diagnostic and therapeutic approaches.
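The selection criterion of at least 10-fold overexpression in more than 90% of the cancers can be sketched as follows. This is a minimal illustration, not the study's pipeline: it assumes expression values already normalized across libraries (for example, tags per million), uses a single reference level rather than both control cell types, and applies an assumed floor of 1.0 to the reference so that a gene with no tags in the control does not yield an unbounded ratio.

```python
def consistently_overexpressed(tumor_levels, normal_level,
                               min_fold=10.0, min_fraction=0.9, floor=1.0):
    """True if more than min_fraction of tumors express the gene at
    min_fold times the reference level or higher."""
    reference = max(normal_level, floor)  # assumed floor for zero-count controls
    n_over = sum(1 for level in tumor_levels if level >= min_fold * reference)
    return n_over / len(tumor_levels) > min_fraction


# Toy values for 24 tumors, for illustration only.
passes = [200.0] * 22 + [5.0] * 2   # 22 of 24 tumors exceed 10-fold
fails = [200.0] * 21 + [5.0] * 3    # 21 of 24 tumors exceed 10-fold
print(consistently_overexpressed(passes, normal_level=4.0))
print(consistently_overexpressed(fails, normal_level=4.0))
```

With 24 tumors, the ">90%" requirement means a gene must exceed the fold threshold in at least 22 of them, since 21 of 24 is only 87.5%.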
The extensive genetic studies described above suggest that the key to understanding pancreatic cancers lies in an appreciation of a core set of pathways and processes. We identified 12 partially overlapping processes that are genetically altered in the great majority of pancreatic cancers (Fig. 3A). However, the pathway components that are altered in any individual tumor vary widely (Fig. 3, B and C). For example, the two tumors depicted in Fig. 3, B and C, each contain mutations of a gene involved in the TGF-β pathway (one SMAD4, the other BMPR2). Similarly, these two tumors both contain mutations of genes involved in most of the other 11 core processes and pathways, but the specific genes altered in each tumor are largely different. Although we cannot be certain that every identified mutation plays a functional role in the pathway or process in which it is implicated, it is clear from both the current and the previously published genetic data, as well as from past functional studies, that many of them are likely to affect these pathways.
This perspective is likely to apply to most, if not all, epithelial tumors. It is consistent with the idea that genetic alterations can be classified as mountains (high-frequency mutations) or hills (low-frequency mutations), with the hills predominating in terms of the total number of alterations involved (14). The heterogeneity among pathway components and the varied nature of mutations within individual genes can explain tumor heterogeneity, a fundamental facet of all solid tumors (35).
From an intellectual viewpoint, the pathway perspective helps bring order and rudimentary understanding to a very complex disease (36–38). Although the importance of signaling pathways in understanding neoplasia has been recognized (39, 40), genomewide genetic analyses such as that described here can identify the precise genetic alterations that may be responsible for pathway dysregulation in each patient’s tumor. Because most genes are mutated in only a small fraction of tumors, it is only through analysis of functional gene groups that an appreciation for the true importance of these genes’ mutations in neoplasia can be reached. For example, from Table 2 it is evident that all pancreatic cancers studied had alterations in genes in the Wnt/Notch and Hedgehog signaling pathways, a finding that could not have been appreciated in the absence of global analyses.
In addition to yielding insights into tumor pathogenesis, such studies provide the data required for personalized cancer medicine. Unlike certain forms of leukemia, in which tumorigenesis appears to be driven by a single, targetable oncogene, pancreatic cancers result from genetic alterations of a large number of genes that function through a relatively small number of pathways and processes. Our studies suggest that the best hope for therapeutic development may lie in the discovery of agents that target the physiologic effects of the altered pathways and processes rather than their individual gene components. Thus, rather than seeking agents that target specific mutated genes, agents that broadly target downstream mediators or key nodal points may be preferable. Pathways that could be targeted include those causing metabolic disturbances, neoangiogenesis, misexpression of cell surface proteins, alterations of the cell cycle, cytoskeletal abnormalities, and an impaired ability to repair genomic damage (Table 2 and table S8).
Supporting Online Material