The purpose of this review is to summarize the evidence that can be used to reconstruct the etiology of human cancers from mutations found in tumors. Mutational spectra of the tumor suppressor gene p53 (TP53) are tumor-specific. In several cases, these mutational spectra can be linked to exogenous carcinogens, most notably for sunlight-associated skin cancers, tobacco-associated lung cancers, and aristolochic acid-related urothelial tumors. In the TP53 gene, methylated CpG dinucleotides are sequences selectively targeted by endogenous and exogenous mutagenic processes. Recent high-throughput sequencing efforts analyzing a large number of genes in cancer genomes have so far, for the most part, produced mutational spectra similar to those in TP53 but have unveiled a previously unrecognized common G to C transversion mutation signature at GpA dinucleotides in breast cancers and several other cancers. Unraveling the origin of these G to C mutations will be of importance for understanding cancer etiology.
Human cancer genomes contain numerous genetic and epigenetic aberrations (Chin and Gray 2008). Instability at the chromosome level manifested by chromosome deletions and rearrangements has been recognized as a signature of cancer and is referred to as aneuploidy (Rajagopalan and Lengauer 2004). Instability at the level of chromatin involves many processes that have the potential to alter gene expression patterns in cancer cells. These chromatin changes are also described as epigenetic events that, for example, lead to rearrangements of histone modifications and DNA methylation patterns often affecting specific genes or entire DNA sequence classes such as repetitive elements. Epigenetic aberrations are common in most human cancers and can affect hundreds if not thousands of genes (Rauch et al. 2008). Collectively, chromosomal, mutational, and epigenetic aberrations in cancer genomes are likely to contribute to all known hallmarks of cancer including, but not limited to, self-sufficiency in growth signals, evasion of apoptosis, unlimited replicative potential, and tissue invasion or metastasis (Hanahan and Weinberg 2000). It is generally agreed that malignant progression requires the mutational inactivation or activation, or epigenetic inactivation or activation, of only a limited number of genes but the exact number of such critical genes is not known and may be cancer type-specific.
This review deals with the instability that is observed at the nucleotide sequence level in human cancer genomes. There is an ongoing debate as to whether all or most human cancers exhibit a mutator phenotype (Bodmer et al. 2008; Loeb et al. 2008). The mutator phenotype is hypothesized to be the driving force for the accumulation of large numbers of mutations in tumors, enabling the selection of tumor-promoting events. There are clear examples for the existence of a mutator phenotype in certain cancers. Germline or somatic mutations of mismatch repair genes, e.g., MLH1, MSH2, and MSH6, are linked to a massive increase in microsatellite and other sequence alterations, and these mutations are found in hereditary forms of colorectal cancer (Edelmann and Edelmann 2004; Lynch and de la Chapelle 2003). Others have, however, argued that genetic instability is not usually required for tumor development (Bodmer et al. 2008). Initial large-scale sequencing studies of human mismatch repair-stable cancer genomes failed to uncover clear evidence for a large number of sequence alterations detectable in a tumor clone (Wang et al. 2002), although subsequent studies have shown the presence of a substantial number of mutations in individual tumor samples.
The intense efforts put forward by large-scale re-sequencing approaches of cancer genomes have resulted in the identification of only a small number of genes that are commonly mutated in human tumors (Ding et al. 2008; Greenman et al. 2007; Jones et al. 2008; Sjoblom et al. 2006; Wood et al. 2007). One important discovery was that the BRAF gene, which encodes a serine/threonine kinase in the RAS signaling pathway, harbors somatic mutations in ~60% of malignant melanomas (Davies et al. 2002). Large-scale re-sequencing efforts also uncovered mutations in the PIK3CA gene (encoding the catalytic subunit of phosphatidylinositol-3-OH kinase) and the AKT1 gene (encoding a serine/threonine kinase) in several types of cancer (Carpten et al. 2007; Samuels et al. 2004). In addition, mutations in the growth factor receptor genes ERBB2 and EGFR have been found frequently in non-small-cell lung cancers, in particular adenocarcinomas (Sharma et al. 2007). However, the relatively small number of genes that are commonly mutated in human cancers came as a surprising outcome of the large-scale sequencing efforts reported to date. These findings suggest that there are only a few genes that can effectively promote cancer formation in humans when mutated and, accordingly, are repeatedly and frequently mutated in multiple samples of the same tumor type and across different tumor types. Among these genes are the RAS genes and the TP53 gene. In this review, we will discuss some of the potential endogenous and exogenous origins of RAS and TP53 mutations. We will also discuss properties of mutations found in protein kinase genes and other genes analyzed by large-scale sequencing of human tumors.
Epidemiologic studies have the potential to identify suspect carcinogens, which may etiologically be involved in human cancers (Vineis and Perera 2007). This objective is often achieved by demonstration of a link between exposure to carcinogens and the incidence of cancer in defined human populations (Mossman et al. 2004). These observational studies examine correlations between cancer development and carcinogen exposure in humans with known exposure to cancer-causing agents (Besaratinia and Pfeifer 2006). In addition to the unavoidable exposures to food-, water- and air-borne carcinogens, humans are also exposed to specific carcinogenic agents, e.g., in occupational settings, or due to medicinal or life-style choices, e.g., tobacco smoking and alcohol drinking (Carbone et al. 2004; Luch 2005). Not only can the specificity of carcinogen exposure determine the type of human cancers, but it may also influence the genetic and/or epigenetic alterations that are unique for certain types of human cancer (Besaratinia and Pfeifer 2006). For example, sunlight ultraviolet (UV) irradiation is a key determinant of non-melanoma skin cancers, and the presumed culprit of C to T transition mutations at hotspot dipyrimidines in the TP53 gene, which are characteristic of non-melanoma skin tumors (Pfeifer et al. 2005). Tobacco smoke carcinogens are linked to G to T mutations in lung cancer arising in smokers (Hainaut and Pfeifer 2001). Workplace exposure to vinyl chloride can be associated with liver tumors showing preferential mutation at A:T base pairs in TP53 (Hollstein et al. 1994), and Chinese herb nephropathy is characterized by A to T mutations in urothelial tumors (Grollman et al. 2007).
The correlative nature of human cancer development and carcinogen exposure can be used for causality inference if the observed correlation can be recapitulated experimentally (Hussain and Harris 1998; Hussain et al. 2000; Olivier et al. 2004; Thilly 1990). Obviously, experimental exposure of humans to carcinogenic agents is unethical. Thus, only in vitro or in vivo model systems can be used for inferring causality once epidemiologic studies have established a relationship between human cancer development and exposure to certain carcinogen(s). The available model systems utilize various strategies to reproduce the epidemiologic findings on carcinogen-specific genetic and/or epigenetic changes in vitro and/or in vivo (Besaratinia and Pfeifer 2006).
Nucleotide sequence changes in human cancers were first reported in the 1980s and they affected specific cellular homologues of viral oncogenes, the RAS genes HRAS, KRAS, and NRAS (Marshall et al. 1984; Parada et al. 1982; Sukumar et al. 1983; Taparowsky et al. 1982). RAS gene mutations have been described in several human cancers including colon, lung, pancreatic, and thyroid cancers, melanomas and several other types of tumor (Barbacid 1987; Bos 1989). Mutations at codon 12 of KRAS are the most common mutations found among the three RAS genes. It was soon recognized that the type of RAS mutation was tumor-specific, i.e. KRAS mutations in colorectal cancers were frequently transitions (e.g., GGT → GAT at codon 12), whereas those in lung cancers were predominantly transversions (e.g., GGT → GTT). Induction of tumors in animal models was shown to be accompanied by mutation of the HRAS gene (Sukumar et al. 1983). Also, notably, when RAS gene-containing plasmids were modified in vitro with chemical carcinogens, an oncogene was produced that could transform NIH3T3 cells (Marshall et al. 1984). The nature of the oncogenic mutation depended on the mutational specificity of the carcinogen (Quintanilla et al. 1986; Zarbl et al. 1985). These experiments established an important paradigm showing that tumor initiation results from mutations arising from the binding of ultimate carcinogens to DNA. Because selection favors point mutations that produce an activated oncogene, and because only a few amino acids in the RAS genes qualify (i.e. codons 12, 13, 61, and 146), only a limited number of sequence changes can be observed. Despite these limitations, however, the types of mutations in the KRAS gene are different among various types of cancer. Smoking-associated lung cancers, but to a much lesser extent lung cancers in nonsmokers, often contain activating G to T transversion mutations at codon 12 of KRAS (Husgafvel-Pursiainen et al. 1993).
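The distinction between the two codon 12 changes mentioned above can be made mechanical. The following Python sketch is illustrative only: the helper names `classify_substitution` and `codon_change` are hypothetical and are not taken from any of the cited studies. A change between two purines (A, G) or two pyrimidines (C, T) is scored as a transition, and any purine-pyrimidine exchange as a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError("expected two different DNA bases (A, C, G, T)")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def codon_change(ref_codon: str, alt_codon: str) -> tuple:
    """Locate the single differing base between two codons.

    Returns (1-based position within the codon, 'X>Y', mutation class).
    """
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    diffs = [
        (pos, r, a)
        for pos, (r, a) in enumerate(zip(ref_codon, alt_codon), start=1)
        if r != a
    ]
    if len(diffs) != 1:
        raise ValueError("expected exactly one base difference")
    pos, r, a = diffs[0]
    return pos, f"{r}>{a}", classify_substitution(r, a)


# The two KRAS codon 12 examples given in the text.
print(codon_change("GGT", "GAT"))  # colorectal-type change
print(codon_change("GGT", "GTT"))  # lung-type change
```

Both examples alter the second base of the codon; the first is scored as a G>A transition and the second as a G>T transversion, matching the description in the text.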
Convincing evidence from both epidemiologic and experimental studies has identified tobacco smoke carcinogens as initiators of lung cancer (Besaratinia and Pfeifer 2008; Hecht 1999; Pfeifer et al. 2002). It has been unclear why only one codon in one particular RAS gene undergoes mutation in lung tumors. In an elegant study, Tang and coworkers showed that codons 12 and 14 of the KRAS gene were hotspots for DNA adduct formation by the activated tobacco smoke carcinogen benzo[a]pyrene diol epoxide (B[a]PDE), with little or no adduct formation at codons 13 and 61. The carcinogen–DNA adducts formed at codon 14 were repaired more efficiently than those formed at codon 12 (Feng et al. 2002). However, a potential carcinogen-induced origin of other human RAS gene mutations has not been clearly demonstrated, either for pancreatic cancers or for colorectal cancers, in which these mutations are relatively frequent.
Inactivating mutations in the TP53 tumor suppressor gene are the most common genetic events in human cancers affecting a specific gene (Hainaut and Wiman 2005; Hofseth et al. 2004; Olivier et al. 2004; Petitjean et al. 2007; Vogelstein et al. 2000), with the vast majority arising from a single point mutation in the segment encoding the DNA-binding domain of TP53. The inactivating mutations render the mutant TP53 protein unable to carry out its normal functions, i.e., transcriptional transactivation of downstream target genes that regulate the cell cycle and apoptosis (Petitjean et al. 2007; Vogelstein et al. 2000). Several TP53 mutation databases are maintained to catalogue TP53 mutations reported in the literature (Hamroun et al. 2006; Petitjean et al. 2007).
The large number and tissue-specific diversity of mutations in the TP53 mutation databases provide indirect but compelling evidence that certain mutagens may be involved in human carcinogenesis (Hussain and Harris 1998). The TP53 gene is useful as a mutagen test system for several reasons. For many tumor suppressor genes, nonsense or frameshift mutations that lead to protein truncation and mRNA decay are most frequently reported. Examples are the BRCA1 and APC genes. These types of mutations are not particularly useful for assessing potential mutagens as initiators of carcinogenesis since the creation of a stop codon either by a point mutation or an insertion/deletion-induced frameshift event severely limits the types of mutational events that can be scored. The situation is different for TP53 in which almost 90% of all mutations are of the missense type. Thus, many different types of mutations are observed in this gene including all types of possible base substitutions. Further, the occurrence of TP53 mutations is not limited to a few particular sequences or codons along this gene (like in the RAS genes). Walker et al noted no less than 73 mutation hotspots along the TP53 gene (Walker et al. 1999). Most mutations cluster in the TP53 DNA binding domain, which encompasses exons five through eight and spans approximately 180 codons or 540 nucleotides. Although particular amino acids in direct contact with DNA may be preferentially mutated or selected (Walker et al. 1999), there are hundreds of TP53 mutants that can lead to a phenotypic change of TP53 function (Kato et al. 2003). Most TP53 missense mutations lead to the synthesis of a stable protein, which lacks its specific DNA binding and transactivation function and accumulates in the nucleus of cells. 
The acquisition of TP53 mutations can have two consequences: i) a dominant negative effect by hetero-oligomerization of the more stable mutant TP53 with wild-type TP53 molecules expressed from the normal remaining allele, and ii) a gain of function of mutant TP53 protein. Since there are so many different types of mutant TP53 proteins functioning in diverse pathways, it is extremely difficult to distinguish between these possibilities. There have been very few reports describing inactivation of TP53 expression by promoter hypermethylation, and missense mutations in TP53 are much more common than nonsense or frameshift mutations, which supports the idea of a function for TP53 mutants either in a dominant-negative fashion or in a gain of function pathway.
We have hypothesized that the primary mutagenesis process (lesion formation, DNA repair, lesion bypass) and selection for tumorigenic mutations are the driving forces that shape the TP53 mutation spectra in different human tumors (Besaratinia and Pfeifer 2006). Scrutiny of one of the large public domain databases of TP53 gene mutations in human cancers (nearly 25,000 entries; http://www-p53.iarc.fr/index.html) has been used to find associations between various types of human cancer mutations and carcinogen exposures, e.g., from environmental, occupational, and dietary sources (Hussain and Harris 1998; Olivier et al. 2004). The putative associations identified by this approach can be validated using a wide range of experimental models, ranging from lower organisms, e.g., the bacterial Ames test (Gee et al. 1994) and TP53 functional assays in yeast (Fronza et al. 2000), to reporter gene-based transgenic rodents, e.g., the BigBlue® system (Lambert et al. 2005), or analysis of endogenous non-cancer-related genes, e.g., the house-keeping gene hypoxanthine phosphoribosyltransferase (HPRT) in mammalian or human tissues/cells (Albertini 2001). Although these systems lack, in one way or another, important factors that contribute to TP53 mutations and human cancers, e.g., DNA-sequence context, DNA repair capacities and fidelity of translesion DNA synthesis, which are species/cell-type dependent, they have provided invaluable information on many aspects of mutagenesis-derived carcinogenesis (Besaratinia and Pfeifer 2006).
For certain cancers, the distribution of DNA lesions along the TP53 gene caused by environmental carcinogens can be correlated well with the mutational spectra, i.e. hotspots and types of mutations (Pfeifer et al. 2002). As described below, this concept has been validated by experiments with simulated sunlight and the cigarette smoke component benzo[a]pyrene (B[a]P) representing the polycyclic aromatic hydrocarbon (PAH) class of carcinogens. The damage and repair data obtained for the respective mutagens can predict many parameters of TP53 mutagenesis in human nonmelanoma skin cancers and lung cancers from tobacco smokers, respectively. Future studies with suspected mutagens will be helpful to implicate causative agents involved in other cancers, where the exact carcinogen has not yet been identified.
In the TP53 gene, an exceptionally high percentage of mutations can be found at the dinucleotide sequence CpG (Soussi and Beroud 2003). About 3 to 4% of all cytosines in mammalian DNA are methylated by a postreplicative enzymatic process catalyzed by DNA methyltransferases. The modified base, 5-methylcytosine (5mC) is found exclusively at CpG dinucleotides in mammalian genomes. In studies with prokaryotes, 5mC was first identified as a mutational hotspot as early as 1978 (Coulondre et al. 1978). In mammalian genomes, CpG sequences are hypermutable and, as a consequence, a large fraction of all CpG sites has been lost during evolution (Pfeifer 2006). In fact, CpGs are present at only about one seventh of their expected frequency, which would be roughly one every 13 or 14 base pairs. In the human genome of 3,080,419,480 nucleotides, there are 28,163,863 CpGs, which corresponds to only 9 CpGs per kilobase (1.8% of the sequence). Attesting to the general mutability of CpG sequences, it has been estimated that up to 25% of all disease-causing human mutations in autosomal genes occur at CpG sites (Krawczak et al. 1998). Interestingly, CpG frequency is not much depleted in the human TP53 gene. There are 23 CpG dinucleotides between codons 120 and 300 of the TP53 coding sequence representing the DNA binding domain, and 22 of these are between codons 150 and 300 in the most frequently mutated region. The hypermutability of CpG sequences has generally been attributed to hydrolytic deamination of 5-methylcytosine leading to the emergence of thymine base-paired with guanine at CpG sites (Jones and Baylin 2002; Pfeifer 2006). CpG mutation leading to the formation of CpA or TpG dinucleotides is considered to be a process with no apparent involvement of exogenous mutagens.
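The genome-wide CpG figures quoted above can be checked with a few lines of arithmetic. The Python sketch below is illustrative; the function names `count_cpg` and `cpg_obs_exp` are hypothetical, and the observed/expected ratio uses one common normalization, (number of CpGs × sequence length) / (number of C × number of G), which is not necessarily the measure used by the studies cited here. The depletion estimate additionally assumes equal base frequencies, which is a simplification.

```python
def count_cpg(seq: str) -> int:
    """Count 5'-CG-3' dinucleotides on the given strand."""
    # "CG" cannot overlap itself, so str.count gives the exact number.
    return seq.upper().count("CG")


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    return count_cpg(s) * len(s) / (c * g)


# Genome-wide figures quoted in the text.
GENOME_NT = 3_080_419_480
GENOME_CPG = 28_163_863

cpg_per_kb = GENOME_CPG / GENOME_NT * 1000   # CpGs per 1,000 nucleotides
cpg_fraction = 2 * GENOME_CPG / GENOME_NT    # each CpG spans two nucleotides

# Assumption of this sketch: equal base frequencies, so any given
# dinucleotide is expected once per 16 positions.
uniform_expected_per_kb = 1000 / 16
depletion = cpg_per_kb / uniform_expected_per_kb

print(f"{cpg_per_kb:.1f} CpGs per kb, {cpg_fraction:.1%} of the sequence")
print(f"observed/expected under equal base frequencies: {depletion:.2f}")
```

Under these assumptions the calculation reproduces the roughly 9 CpGs per kilobase and 1.8% of the sequence stated in the text. For a specific region such as the TP53 DNA-binding domain, `cpg_obs_exp` could be applied to the actual sequence, which is not reproduced here.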
All CpG sequences analyzed in the TP53 gene are completely (at least 95% at each site) methylated in all human tissue samples examined (Rideout III et al. 1990; Tornaletti and Pfeifer 1995). The methylated CpGs (mCpGs) contain more than one third of all cancer mutations in TP53 and the vast majority of these alterations are transition mutations. Therefore, methylated CpG dinucleotides are the single most important mutational targets in TP53. Five of the six major TP53 mutational hotspots (in all cancers combined), i.e., codons 175, 245, 248, 273, and 282, all contain methylated CpG dinucleotides. C to T transitions at CpGs are particularly common in brain and colorectal cancers (http://www-p53.iarc.fr/index.html).
Endogenous deamination of 5-methylcytosine is viewed as the main source of the high frequency of CpG transitions at TP53 mutational hotspots in human internal cancers (Jones and Baylin 2002; Pfeifer 2006). We would like to point out, however, that deamination of 5-methylcytosine in double-stranded DNA may not be the only mechanism that can cause transition mutations at methylated CpG sites. Both cytosine and 5-methylcytosine are subject to deamination, resulting in conversion to uracil and thymine, respectively. Hydrolytic deamination occurs at cytosines in double-stranded DNA at a relatively slow rate with a half-life of about 30,000 years at 37°C and pH 7.4 (Frederico et al. 1990; Lindahl 1993; Shen et al. 1994). Methylation at the 5 position of the base ring facilitates spontaneous hydrolytic deamination, and as a result, 5-methylcytosines are deaminated two to four times more rapidly than cytosines (Ehrlich et al. 1990; Lindahl 1993; Shen et al. 1994). For double-stranded DNA the difference is only 2.2-fold, and 5-methylcytosines deaminate at a rate of 5.8 × 10⁻¹³ per second (Shen et al. 1994). From these data, it can be calculated that only two or three 5-methylcytosines deaminate per day in each cell (Pfeifer 2000; Schmutte and Jones 1998). These numbers are almost insignificant compared to steady state levels that have been measured for many endogenous and exogenous DNA adducts, which can be orders of magnitude higher. In any case, the roughly 2-fold enhancement of the deamination rate is certainly not enough to account for the elevated mutation rate at methylated cytosines in CpG dinucleotides, which is estimated to be up to 42-fold higher than that of unmethylated cytosines at non-CpG sites (Cooper and Youssoufian 1988). The mutational effect may be augmented by the difference in repair of the resulting two mismatches. Uracil is recognized and excised efficiently by ubiquitous uracil-DNA glycosylase enzymes.
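The rate arithmetic in this paragraph can be laid out explicitly. The Python sketch below is a rough illustration under stated assumptions: events are treated as independent, and because the per-site probability is tiny, the expected number of events per day is taken as rate × time × number of sites. The text does not state how many 5-methylcytosines per cell were assumed, so rather than guessing that number, the sketch works backwards from the quoted "two or three per day" to the site count that figure implies. All function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Values quoted in the text (Shen et al. 1994).
K_5MC_PER_S = 5.8e-13    # 5-methylcytosine deamination rate, double-stranded DNA
FOLD_5MC_OVER_C = 2.2    # 5mC deaminates 2.2-fold faster than C


def per_site_per_day(rate_per_s: float) -> float:
    """Probability that one site deaminates in a day (linear approximation)."""
    return rate_per_s * SECONDS_PER_DAY


def expected_events_per_day(rate_per_s: float, n_sites: float) -> float:
    """Expected number of deamination events per day across n_sites."""
    return per_site_per_day(rate_per_s) * n_sites


def implied_sites(events_per_day: float, rate_per_s: float) -> float:
    """Number of sites needed to produce a given daily event count."""
    return events_per_day / per_site_per_day(rate_per_s)


k_c_per_s = K_5MC_PER_S / FOLD_5MC_OVER_C       # implied rate for cytosine
p_day_5mc = per_site_per_day(K_5MC_PER_S)       # per-site daily probability
sites_low = implied_sites(2, K_5MC_PER_S)       # implied by "two per day"
sites_high = implied_sites(3, K_5MC_PER_S)      # implied by "three per day"

print(f"cytosine rate: {k_c_per_s:.2e} per second")
print(f"5mC per-site probability per day: {p_day_5mc:.2e}")
print(f"implied 5mC sites per cell: {sites_low:.1e} to {sites_high:.1e}")
```

Under this simple model, a daily yield of two or three events corresponds to a few tens of millions of 5-methylcytosines per cell, and the per-site daily probability is on the order of 10⁻⁸, which illustrates why the authors regard the spontaneous reaction as slow.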
Two thymine DNA glycosylase repair proteins (TDG and MBD4), which act upon deaminated mCpGs, have been identified in mammals (Hendrich et al. 1999; Neddermann et al. 1996). In fact, when MBD4 was deleted in the mouse, there was a 2-3-fold increase in CpG transition mutations in mutational reporter genes (Millar et al. 2002; Wong et al. 2002). However, the effectiveness of uracil versus thymine repair at deaminated C and 5mC bases in vivo is not known.
It is possible that certain chemicals may enhance the deamination reaction at methylated CpGs. Nitric oxide was shown to increase the rate of C to T transitions via stimulation of base deamination (Wink et al. 1991). However, nitric oxide did not cause significant 5mC-specific deamination of 5-methylcytosine containing reporter genes in other in vitro mutagenesis assays (Felley-Bosco et al. 1995; Schmutte et al. 1995). Oxidative damage to 5-methylcytosine can result in the formation of thymine glycol as one end product (Zuo et al. 1995). Thymine glycol is thought to be primarily a replication-blocking lesion, which will however pair mostly with adenine, when bypassed by a lesion-tolerant DNA polymerase (Kusumoto et al. 2002), which makes the oxidative deamination pathway a viable possibility. The symmetrical structure of a methylated CpG dinucleotide creates, of course, two possible sources of a C to T mutational event. The transition mutations may be caused by a lesion forming preferentially at guanine bases within methylated CpG sequences. This will produce G to A transition mutations indistinguishable from C to T mutations on the opposite DNA strand. Alternatively, certain mutagens may preferentially form DNA adducts at methylated cytosines and cause transition mutations by mispairing of the modified 5mC during DNA replication, or may increase the hydrolytic deamination of 5-methylcytosine. In summary, although there is only limited hard evidence to support the generally accepted idea that spontaneous hydrolytic deamination of 5-methylcytosine plays a dominant role in mammalian CpG mutagenesis, no alternative mechanism for this mutational specificity has been experimentally demonstrated so far.
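The strand equivalence described above, in which a G to A change on one strand is indistinguishable from a C to T change on the other, is often handled by reporting every substitution relative to the pyrimidine of the original base pair. The Python sketch below illustrates that convention; it is a common practice in mutation spectrum analysis rather than a method attributed to the studies cited here, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def opposite_strand(ref: str, alt: str) -> tuple:
    """Express a base substitution as it reads on the complementary strand."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError("expected two different DNA bases (A, C, G, T)")
    return COMPLEMENT[ref], COMPLEMENT[alt]


def pyrimidine_referenced(ref: str, alt: str) -> str:
    """Report a substitution with the reference base given as C or T."""
    ref, alt = ref.upper(), alt.upper()
    if ref in ("A", "G"):
        # Purine reference: switch to the complementary strand.
        ref, alt = opposite_strand(ref, alt)
    elif ref not in ("C", "T") or alt not in COMPLEMENT or ref == alt:
        raise ValueError("expected two different DNA bases (A, C, G, T)")
    return f"{ref}>{alt}"


# A lesion at the guanine of a methylated CpG that yields G>A is scored
# in the same class as deamination of the 5-methylcytosine (C>T).
print(opposite_strand("C", "T"))
print(pyrimidine_referenced("G", "A"))
print(pyrimidine_referenced("C", "T"))
```

Because both routes collapse into the same C>T class, the mutation spectrum alone cannot tell which base of the methylated CpG carried the original lesion, which is the point made in the text.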
It is widely accepted [though not universally; see (Thilly 2003)] that mutagenesis induced by endogenous and exogenous agents is an important component of tumorigenesis. This concept has been well proven in animal experiments in which exposure of animals to specific mutagens led to the formation of tumors harboring carcinogen-specific (“fingerprint-type”) mutations in RAS genes (Barbacid 1987) or in the TP53 gene (Ruggeri et al. 1993). While RAS mutations are very frequently observed in carcinogen-induced tumors in rodents, mutations in TP53 are much less frequent. Therefore, for reproducing the TP53 mutational spectrum that occurs in human cancers, more indirect approaches have been developed.
One approach involves identification of sequence-specific DNA lesions generated by carcinogens in the TP53 gene, and correlation of these “fingerprints” with TP53 mutations collected from human cancer databases (Besaratinia and Pfeifer 2006; Pfeifer et al. 2002). This approach is based on mapping of DNA damage at nucleotide resolution by the ligation-mediated PCR (LMPCR) technique (Denissenko et al. 1996; Pfeifer et al. 1991). Using this technique, we have compared the distribution of DNA damage in the TP53 gene of human cells exposed to sunlight ultraviolet (UV) radiation, benzo[a]pyrene diolepoxide (B[a]PDE), or aflatoxin B1 (AFB1) with the distribution of TP53 mutations in human cancers of the skin (non-melanoma), lung, and liver (Denissenko et al. 1998b; Denissenko et al. 1996; Tommasi et al. 1997). These experiments revealed a previously unrecognized role of methylated CpG sites as preferential targets for physical and chemical genotoxic agents.
Base changes characteristic for skin cancer, i.e. transitions at CC or TC dipyrimidine sequences, show a strong association with methylated CpGs (Tommasi et al. 1997; You et al. 1999). The relative contribution of TP53 mutations affecting dipyrimidines within mCpG sequences is ~35% of the total mutations, despite the fact that 5′-CCG and 5′-TCG occur only 20 times in the 1,080 bp double-stranded target sequence between codons 120 and 300 (Fig. 1). Importantly, all these CpG sequences are methylated in human keratinocytes (Tornaletti and Pfeifer 1995). For skin cancer, it was found that mutational hotspots that contain 5-methylcytosine at dipyrimidines are much more susceptible to pyrimidine dimer formation after exposure of cells to natural sunlight rather than to 254 nm UVC. Methylation of cytosine enhances pyrimidine dimer formation by sunlight by up to 15-fold and methylated cytosines are preferentially mutated by sunlight (Tommasi et al. 1997; You et al. 1999).
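Counting dipyrimidine-CpG targets in a double-stranded sequence requires scanning both strands, since a 5′-CCG or 5′-TCG on the bottom strand reads as 5′-CGG or 5′-CGA on the top strand. The Python sketch below shows one way to do this. The function names are hypothetical, and the input is a short made-up sequence used only to exercise the code; it is not the TP53 sequence, which is not reproduced here.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def dipyrimidine_cpg_sites(top_strand: str) -> dict:
    """Count 5'-CCG and 5'-TCG motifs summed over both strands.

    Neither motif can overlap itself, so str.count is exact.
    """
    top = top_strand.upper()
    bottom = reverse_complement(top)
    return {motif: top.count(motif) + bottom.count(motif)
            for motif in ("CCG", "TCG")}


# Size of the target quoted in the text: about 180 codons, three
# nucleotides per codon, two strands.
target_nt = 180 * 3 * 2

# Toy input (hypothetical sequence, for illustration only).
toy = "ATCGGCCGA"
counts = dipyrimidine_cpg_sites(toy)
print(reverse_complement(toy))
print(counts, "total:", sum(counts.values()))
print("target size in nucleotides:", target_nt)
```

Applied to the actual codon 120 to 300 region, a count of this kind would give the number of potential sunlight targets against which the ~35% mutation share quoted above can be compared.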
Tobacco smoking is a strong risk factor for the development of lung cancer (Hecht 1999). The characteristic signature of TP53 lung tumor mutations in smokers is the G to T transversion (Fig. 1). Ninety percent of the guanines undergoing these transversion events in lung cancer are located on the nontranscribed DNA strand (Hussain and Harris 1998; Pfeifer et al. 2002). Of note, five of the six most prominent mutation hotspots in the TP53 gene are represented by G to T mutations at codons containing methylated CpG sequences, including codons 157, 158, 245, 248, and 273 (Fig. 1B) (Pfeifer et al. 2002). G to T transversions are typical for bulky adduct-producing mutagens including the class of polycyclic aromatic hydrocarbons (PAHs). Benzo[a]pyrene is a widely studied member of the PAH class. Upon metabolic activation to benzo[a]pyrene diolepoxide (B[a]PDE), it induces G to T mutations (Luch 2005). The distribution of B[a]PDE adducts along the TP53 gene was mapped, at nucleotide resolution level, in carcinogen-treated normal human bronchial epithelial cells (Denissenko et al. 1996). Selective adduct formation sites were major mutational hotspots in human lung cancers, i.e. there was an excellent correlation between the benzo[a]pyrene adduct spectrum and the mutation spectrum in lung cancer (Pfeifer et al. 2002). The mechanistic basis for the selective occurrence of these PAH-damage hotspots is related to patterns of cytosine methylation in the TP53 gene (Denissenko et al. 1997). The distribution of B[a]PDE-DNA adducts differed drastically in CpG-methylated DNA compared to non-methylated DNA. Guanines flanked by 5-methylcytosines were the preferentially adducted positions. Therefore, CpG dinucleotides, which are methylated in the human TP53 gene in all human tissues examined, in addition to being an endogenous promutagenic factor, represent a preferential target for exogenous chemical carcinogens as well. 
The extent by which enhanced binding of an individual carcinogen at methylated CpGs affects mutagenesis at the same location has been studied in mouse cells carrying the lacI and cII transgenes. These cells were treated with B[a]PDE and the mutations were scored. A dominant fraction of the mutations (58-77% of all G to T mutations) occurred at methylated CpG sequences (Yoon et al. 2001). In summary, the PAH-DNA adduct patterns in the TP53 gene in bronchial epithelial cells coincide with G to T mutational hotspots in tobacco-smoking associated lung cancers (Denissenko et al. 1996; Smith et al. 2000), and this mutational pathway is faithfully reproduced with a tobacco smoke carcinogen (B[a]PDE) in mutational reporter genes rich in methylated CpG sites (Yoon et al. 2001).
More recently, a novel model system has been developed to investigate experimentally induced mutations in the human TP53 gene. The human p53 knock-in (Hupki) mouse model has addressed the issue of DNA sequence context by replacing exons 4-9 of the endogenous mouse Tp53 allele with the homologous normal human TP53 gene sequence (Luo et al. 2001b). The Hupki mouse model has the capacity to detect both spontaneously arising and carcinogen-induced mutations in the human TP53 gene in vitro (Liu et al. 2004; Liu et al. 2005; Luo et al. 2001a; Reinbold et al. 2008; Vom Brocke et al. 2008) or in vivo (Luo et al. 2001b; Tong et al. 2006).
The Hupki mouse model system was constructed using gene-targeting technology to create a mouse strain that harbors human wild-type TP53 DNA sequences from exons 4 to 9 in place of the homologous murine DNA sequences in both copies of the mouse Tp53 gene (Luo et al. 2001b). The substituted segment encodes the polyproline domain and DNA-binding domain of wild-type human TP53, and the chimeric TP53 gene remains under normal transcriptional regulation at the mouse locus. The Hupki mice develop normally, exhibit no apparent defects, remain fertile, and show no susceptibility to spontaneous lymphomas, sarcomas, or other neoplasms, which are common in TP53-deficient mice (Luo et al. 2001b). The Hupki mice retain a variety of normal TP53 functions and characteristics, including nuclear accumulation of TP53 protein after exposure to DNA-damaging agents, transcriptional activation of known TP53 downstream targets, and induction of apoptosis in thymocytes after gamma-irradiation, an outcome modulated by a functional TP53 gene (Luo et al. 2001a; Luo et al. 2001b).
In addition to its application for in vivo animal studies, the Hupki model system is also amenable to in vitro cell culture experiments. Murine fibroblasts, in contrast to human cells, spontaneously undergo immortalization during in vitro culturing, and require only one key genetic defect, such as loss of TP53 function, thus allowing the selection of TP53 mutant cells in vitro. Primary embryonic fibroblasts from the Hupki mice readily undergo immortalization during in vitro passaging, which allows selection for dysfunctional TP53 point mutations that are characteristic of human tumors (Feldmeyer et al. 2006; Liu et al. 2004; Liu et al. 2005; Luo et al. 2001a; Reinbold et al. 2008; Vom Brocke et al. 2008).
Hupki mouse embryonic fibroblasts treated with benzo[a]pyrene (B[a]P), a tobacco-derived carcinogen, harbored TP53 mutations consisting predominantly of single base substitutions in the DNA-binding domain of this gene [29 out of 36 (~81%) of all mutations] (Feldmeyer et al. 2006; Liu et al. 2005; Reinbold et al. 2008). G to T transversion mutations constituted half of all B[a]P-induced mutations, of which all but one (17 out of 18) occurred at sites where the mutated guanines were positioned on the non-transcribed strand of the TP53 gene. Distribution of the twenty-nine B[a]P-induced mutations in the DNA-binding domain of the TP53 gene revealed codons 157, 158 and 273 as the most frequently mutated sites. The overall pattern and distribution of B[a]P-induced mutations in the Hupki mouse model system (Feldmeyer et al. 2006; Liu et al. 2005; Reinbold et al. 2008) resembled the characteristic features of TP53 mutations in lung tumors of smokers (see Fig. 2A and Fig. 1C) (Besaratinia and Pfeifer 2008; Hainaut and Pfeifer 2001; Toyooka et al. 2003) and the distribution of B[a]P-DNA adducts in human bronchial epithelial cells (Pfeifer et al. 2002).
Hupki mouse embryonic fibroblasts were also treated with aristolochic acid (AA) (Feldmeyer et al. 2006; Liu et al. 2004; Nedelko et al. 2009), a plant extract potentially involved in Chinese herb nephropathy and possibly leading to urothelial cancer development (Nortier et al. 2000). Twenty-one out of the 36 AA-induced TP53 mutations (~56%) were A to T transversion mutations (Feldmeyer et al. 2006; Liu et al. 2004; Nedelko et al. 2009) (see Fig. 2B), an otherwise rare type of mutation that matches the hallmark mutation detected in urothelial tumors from patients with documented AA exposure (Grollman et al. 2007; Lord et al. 2004). The induced A to T transversion mutations were due to adducted adenines located almost exclusively on the non-transcribed strand of the TP53 gene, consistent with the fact that 20 of the 21 mutations were A to T and only one of the 21 mutations was T to A (Feldmeyer et al. 2006; Liu et al. 2004; Nedelko et al. 2009). This finding is consistent with the preferential formation of AA-adenine adducts found in the DNA of AA-treated cells and nephropathy patients (Arlt et al. 2001; Arlt et al. 2002; Lord et al. 2001; Lord et al. 2004; Nortier et al. 2000), as well as in the DNA from target organs of AA-exposed rats (Kohara et al. 2002; Pfau et al. 1990). Collectively, the data on aristolochic acid-induced DNA damage and mutations support a role of this compound in the etiology of tumors linked to Chinese herb nephropathy.
In other experiments, the Hupki mouse embryonic fibroblasts were treated with 3-nitrobenzanthrone (3-NBA) (Vom Brocke et al. 2008), a member of the class of nitropolycyclic aromatic hydrocarbons that is present in the particulate fraction of diesel exhaust (US-EPA (Gilman 2002)) and is a ubiquitous urban air pollutant (Arlt 2005). The established cultures of 3-NBA-treated cells harbored TP53 mutations in the DNA-binding domain of this gene, which consisted mainly of base substitutions (22 out of 29, ~76%) (Vom Brocke et al. 2008). Of these, G to T transversions were the major type of mutation (10 out of 22, ~46%), followed by A to T transversions (3 out of 22, ~14%) (Fig. 2). The ratio of G to T transversions to A to T transversions (approximately 3:1) closely mirrored the ratio of dG to dA adduct formation (75%:25%) determined in cells similarly treated with 3-NBA or its reactive metabolite, N-hydroxy-3-aminobenzanthrone (N-OH-3-ABA) (Vom Brocke et al. 2008). A similar correlation between the ratios of 3-NBA-derived purine adducts and transversion mutations was previously found in liver tissues of the MutaMouse™, where the proportion of induced dG to dA adducts was 6 to 1 and that of the corresponding G to T and A to T mutations was 5 to 1 (Arlt 2005).
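The correspondence between adduct ratios and mutation ratios described above can be checked with simple arithmetic. The following minimal sketch uses only the counts and proportions quoted in the text; the function and variable names are illustrative and are not taken from the cited studies.

```python
def ratio(a, b):
    """Express the proportion a:b as the single number a/b."""
    return a / b

# Hupki fibroblasts treated with 3-NBA (values quoted in the text)
mutation_ratio = ratio(10, 3)    # G to T : A to T transversions
adduct_ratio = ratio(75, 25)     # dG : dA adducts, 75% : 25%

# MutaMouse liver (ratios quoted in the text)
muta_mutation_ratio = ratio(5, 1)  # G to T : A to T mutations
muta_adduct_ratio = ratio(6, 1)    # dG : dA adducts

print(f"Hupki:     mutations {mutation_ratio:.1f}:1, adducts {adduct_ratio:.1f}:1")
print(f"MutaMouse: mutations {muta_mutation_ratio:.1f}:1, adducts {muta_adduct_ratio:.1f}:1")
```

Because the Hupki comparison rests on only 13 transversions, a mutation ratio of roughly 3.3:1 is compatible with the 3:1 adduct ratio, but the agreement should be read as approximate.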
Luo et al. have demonstrated that UVB-irradiated Hupki mice exhibit characteristic molecular pathology features of sunlight-associated human skin cancers, including (i) development of clones of epidermal cell patches with TP53-immunoreactive nuclei, (ii) formation of UV-induced cyclobutane pyrimidine dimers at skin cancer mutational hotspots in the TP53 gene, which co-localize with the respective lesions induced in UVB-exposed human keratinocytes, and (iii) induction of signature C to T transition mutations in the respective TP53 mutational hotspots found in human skin cancers (Luo et al. 2001a).
Tong et al. (2006) have used the Hupki mice to investigate the effect of local DNA sequence on TP53 codon 249 mutation, a prevalent occurrence in human hepatocellular carcinoma associated with synergistic exposure to aflatoxin B1 (AFB1) and hepatitis B virus (HBV) infection (Montesano et al. 1997). After a single intraperitoneal injection of AFB1 into Hupki mice and their wild-type counterparts, the mice expressing the humanized TP53 gene were more prone to hepatocellular carcinoma development and death than mice expressing the murine TP53, without acquiring any mutations in the TP53 gene (Tong et al. 2006). These findings support the notion that the specificity of TP53 codon 249 mutation in human hepatocellular carcinoma is not solely dependent upon the DNA sequence context of this gene (Denissenko et al. 1998b), and that other determining factors, e.g., concomitant HBV infection, may be synergistically involved in this specific mutational process (Hussain et al. 2007). Also, despite the overall evolutionary conservation of DNA repair mechanisms, differences exist between humans and mice, such as the efficiency of the global genomic repair sub-pathway of nucleotide excision repair (Hanawalt 2002). Such discrepancies may set some limitations because promutagenic lesions in the Hupki TP53 gene are subject to the murine DNA repair machinery. Nonetheless, the Hupki TP53 model system has recapitulated many aspects of TP53 mutagenesis in human tumors (Feldmeyer et al. 2006; Jaworski et al. 2005; Liu et al. 2004; Liu et al. 2005; Luo et al. 2001a; Luo et al. 2001b; Reinbold et al. 2008; Tong et al. 2006; Vom Brocke et al. 2008; vom Brocke et al. 2006). Future studies will determine the accuracy of its portrayal of these events in other types of human cancers.
Large-scale and high-throughput DNA sequencing is now being used to find almost any genome alteration in individual tumors. In addition to the TP53 mutation databases, the Catalogue of Somatic Mutations in Cancer (COSMIC) database is currently the most comprehensive resource available for information on cancer-associated DNA sequence changes (Forbes et al. 2008). It combines information from the scientific literature with resequencing data from the Sanger Institute.
Initial efforts at systematic genome-wide screening for genes mutated in cancer have led to the discovery that the BRAF gene, which encodes a serine/threonine kinase in the RAS signaling pathway, contains somatic mutations in a large fraction of malignant melanomas and other tumors (Davies et al. 2002). Large-scale re-sequencing efforts also uncovered common mutations in the PI3 kinase pathway in several types of human cancer (Carpten et al. 2007; Samuels et al. 2004). Recently, cancer genome sequencing has focused on a collection of ~500 genes encoding protein kinases (Greenman et al. 2007). The frequent inactivation of a particular biochemical pathway by mutation or epigenetic inactivation of any one of the critical pathway components, rather than by inactivation of only a single gene, is becoming a common theme that is currently being explored. Exceptions, however, are still the TP53 and KRAS genes, which, despite the sequence analysis of hundreds or even thousands of genes, still stand at the top of the list of the most frequently mutated cancer genes (Ding et al. 2008; Parsons et al. 2008; Sjoblom et al. 2006). Large-scale re-sequencing of large sets of human genes has generally identified between 10 and 100 mutations in each individual tumor. The percentage of silent mutations is often quite high, in some cases almost as high as would be expected if none of the observed non-synonymous mutations led to a phenotypic change selected in the tumor, that is, if almost all such mutations were innocuous passenger mutations. However, careful analysis has led to the prediction that at least a limited number of the newly identified mutations other than TP53, KRAS, etc., are biologically significant (Wood et al. 2007).
Importantly, the large-scale sequencing data have generally confirmed the cancer-specific mutation data obtained earlier for the TP53 gene (see Figures 3 and 4). For example, sequencing of 518 protein kinase genes in six melanoma samples uncovered 144 mutations, and more than 90% of these mutations were C to T transitions at dipyrimidine sites (Greenman et al. 2007). These data strongly support UVB-induced pyrimidine dimer lesions as the cause of these mutations in melanoma, at least in the six samples analyzed. Another interesting observation is that tumors from glioma patients treated with the chemotherapeutic agent temozolomide contained a vast excess (>95%) of G to A transition mutations (Greenman et al. 2007). Temozolomide is an alkylating agent that produces O6-methyl-guanine adducts. These adducts can mispair with thymine during DNA replication, leading to the observed mutational change from G to A.
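To illustrate what is meant by a C to T transition at a dipyrimidine site, the following minimal sketch classifies a single base substitution from its trinucleotide context. The function names and the three-base input format are assumptions made for illustration; they do not reproduce the analysis pipeline of the cited study. A G to A change is treated as a C to T change on the opposite strand.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PYRIMIDINES = {"C", "T"}

def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def is_c_to_t_at_dipyrimidine(context, alt):
    """context: three bases, 5' to 3', with the mutated base in the middle.
    alt: the base that replaces the middle base.
    True if the change is a C to T transition (on either strand) and the
    mutated C has a pyrimidine neighbor on the same strand.
    """
    context, alt = context.upper(), alt.upper()
    ref = context[1]
    if ref == "G" and alt == "A":
        # Re-express the change on the strand that carries the cytosine
        context, ref, alt = reverse_complement(context), "C", "T"
    if not (ref == "C" and alt == "T"):
        return False
    return context[0] in PYRIMIDINES or context[2] in PYRIMIDINES

# Hypothetical example mutations (not data from the cited studies)
examples = [("TCG", "T"), ("ACG", "T"), ("CGA", "A"), ("TGC", "A")]
for context, alt in examples:
    print(context, "->", alt, is_c_to_t_at_dipyrimidine(context, alt))
```

Applied to a list of somatic mutations, the fraction of calls returning `True` would correspond to the type of percentage quoted above for the melanoma samples.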
Colorectal cancers are characterized by a high percentage of G:C to A:T mutations (Figure 3A, Figure 4A). Also, intriguingly, the high preponderance of G:C to A:T transitions at CpG dinucleotides, previously recognized in the TP53 gene of colorectal cancers, is also found in the protein kinase genes (Greenman et al. 2007). These transition mutations represent roughly 50% of all mutations. Other investigators determined the sequences of over 23,000 transcripts in colorectal, pancreatic, breast, and brain tumors. The percentage of C to T transitions at CpG sites was 47.8% for colon cancer, 43.1% for brain cancer, and 37.9% for pancreatic cancer (Jones et al. 2008). Overall, these data are very similar to what has been observed in the TP53 gene (Soussi and Beroud 2003).
Lung cancer is a particularly illustrative example. As was shown earlier, the TP53 mutation spectrum in lung cancer is different from those of other cancers, perhaps with the exception of aflatoxin-associated liver cancers. Lung tumors are characterized by a large percentage of G to T transversions thought to be derived from mutagenic agents in tobacco smoke (Hainaut and Pfeifer 2001). The same situation was recapitulated when 623 genes with a known or potential relationship to cancer were sequenced in 188 lung adenocarcinomas (Ding et al. 2008). Of 1013 non-synonymous mutations, 41% were G to T transversions, a percentage even higher than that for TP53 (Fig. 1). In smokers, 43% of the mutations were G to T transversions but this number dropped to 13% in never-smokers (Ding et al. 2008). Again, this difference between smokers and nonsmokers is similar to that in the TP53 gene (Hainaut and Pfeifer 2001) and suggests a mutagenic role of tobacco-smoke carcinogens in lung carcinogenesis.
The large-scale sequencing of breast cancer genomes has provided an unexpected and unusual result. As shown by independent groups sequencing either 518 protein kinase genes (Greenman et al. 2007) or the exons of more than 20,000 genes (Jones et al. 2008), breast cancers are characterized by a low fraction of C to T transitions at CpG sites and by a high frequency of G to C transversions (Figures 3 and 4). Interestingly, a large fraction of the G to C transversions occur at the dinucleotide sequence 5’-GpA, which is equivalent to a C to G transversion at 5’-TpC on the opposite DNA strand, and this unique type of mutation accounts for >20-30% of all mutations in breast tumors. These data suggest that breast cancers are caused by an etiological agent that induces this particular type of mutation. There are few known mutagens that specifically induce G/C to C/G transversions, let alone selectively at a particular dinucleotide sequence. Polycyclic aromatic hydrocarbons containing a cyclopentane ring, such as cyclopenta[c,d]pyrene and benz[j]aceanthrylene, have been shown to induce predominantly G to C transversions in the KRAS gene of mouse lung tumors, but the sequence context is different (Jackson et al. 2006). Surprisingly, however, analysis of the TP53 gene in breast cancer shows that G to C transversions are quite rare (7.8%) (Fig. 1). When 181 breast cancer TP53 G to C transversions were analyzed for sequence context, we found that 55 of them (~30%) were in the sequence context 5’-GpA, which is just slightly more than would be expected from a random distribution of flanking bases. One possible interpretation for the different results in the TP53 gene versus the other genes is of course that the frequency of TpC/GpA dinucleotides that produce a mutant TP53 protein after G to C transversion may be low in the TP53 gene.
Although there are several common hotspot codons in TP53 (248, 249, 273) that can frequently be mutated by a G to C transversion, these codons do not contain GpA sequences. However, a G to C mutation at 5’-GpA sites can mutate several other commonly mutated TP53 codons, including codons 278 and 280, producing a mutant TP53 protein. After large-scale sequence analysis of additional types of tumors, it was reported that the G/C to C/G transversions are targeted to 5’-GpA dinucleotides not only in breast cancers but also commonly in lung cancers and other cancers, whereas G to C mutations do not have a dinucleotide sequence preference among germ line variants (Greenman et al. 2007). Little strand bias was observed for the G to C transversion, arguing against the possibility that this mutation is caused by a bulky DNA adduct subject to transcription-coupled DNA repair. The unique sequence preference of G to C transversions, in particular in breast and lung cancers, is puzzling, as is the fact that this pattern is not readily found in the TP53 gene. One possibility is that there are unknown exogenous or endogenous mutagens, in particular in breast and perhaps other tissues, that effectively induce this unique type of mutation. Candidate mutagens, once identified, can be tested in the various in vitro and in vivo mutation reporter systems described earlier in this review with the goal of identifying a causative agent for these tumors.
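Two points in the preceding discussion lend themselves to a short worked example: the equivalence of a G to C change at 5'-GpA with a C to G change at 5'-TpC on the opposite strand, and the comparison of the observed 55 of 181 TP53 G to C transversions at GpA with random expectation. The sketch below is illustrative only. It assumes, as a simplification not stated in the text, that each of the four bases is equally likely to follow the mutated guanine (expected fraction 0.25); the function names are hypothetical.

```python
from math import comb

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def opposite_strand(dinucleotide, ref, alt):
    """Re-express a substitution in a 5'->3' dinucleotide on the complementary strand."""
    rc = "".join(COMPLEMENT[base] for base in reversed(dinucleotide))
    return rc, COMPLEMENT[ref], COMPLEMENT[alt]

def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# G to C at 5'-GpA is C to G at 5'-TpC on the other strand
print(opposite_strand("GA", "G", "C"))  # ('TC', 'C', 'G')

observed, total = 55, 181   # counts quoted in the text
expected_fraction = 0.25    # assumption: uniform base 3' of the mutated G
tail = binomial_upper_tail(observed, total, expected_fraction)
print(f"observed fraction at GpA: {observed / total:.3f}")
print(f"P(X >= {observed}) under the uniform assumption: {tail:.3f}")
```

Under this simplified null model the excess of GpA sites is modest, in line with the statement that 30% is only slightly above random expectation. A rigorous test would use the actual base composition of the TP53 coding sequence and the subset of sites at which a G to C change alters the protein.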
Initial studies on human tumor-specific mutation spectra have laid the groundwork for finding etiological connections between specific mutagens/carcinogens and tumor development. These connections were made possible by the design and application of various in vitro and in vivo DNA damage detection and mutation reporter systems that can be used to score the DNA damaging effects, mutagenic competence and mutational specificity of suspected human carcinogens. Recent large-scale sequencing efforts of cancer genomes have confirmed that several previously well-studied genes are indeed the most commonly mutated ones (e.g. TP53, KRAS) but have expanded the catalogue of tumor-associated DNA sequence changes. These new sequencing data have confirmed established relationships between UV exposure and melanoma and tobacco smoke carcinogens and lung cancer, and the new studies have generally found similar mutation spectra in the TP53 gene and the many other genes that have now been sequenced. A notable exception is the identification of common G to C transversion mutations at a unique dinucleotide sequence in breast cancer and other cancers, which may provide an important signature for identification of a suspected human mutagen. It can be expected that the continued efforts aimed at sequencing the genomes of different types of cancer in many individuals will lead to new information on tumor-specific mutation spectra important for deciphering the etiology of human cancer.
Work of the authors is supported by NIH grant CA084469 to G.P.P.