Nature. Author manuscript; available in PMC 2012 August 28. Published in final edited form as: | PMCID: PMC3428862 UKMSID: UKMS49632 |
The landscape of cancer genes and mutational processes in breast cancer
Philip J. Stephens,1* Patrick S. Tarpey,1* Helen Davies,1 Peter Van Loo,1,2 Chris Greenman,1,3,4 David C. Wedge,1 Serena Nik-Zainal,1 Sancha Martin,1 Ignacio Varela,1 Graham R. Bignell,1 Lucy R. Yates,1,5,6 Elli Papaemmanuil,1 David Beare,1 Adam Butler,1 Angela Cheverton,1 John Gamble,1 Jonathan Hinton,1 Mingming Jia,1 Alagu Jayakumar,1 David Jones,1 Calli Latimer,1 King Wai Lau,1 Stuart McLaren,1 David J. McBride,1 Andrew Menzies,1 Laura Mudie,1 Keiran Raine,1 Roland Rad,1 Michael Spencer Chapman,1 Jon Teague,1 Douglas Easton,7,8 Anita Langerød,9 OSBREAC,† Ming Ta Michael Lee,10 Chen-Yang Shen,10 Benita Tan Kiat Tee,11 Bernice Wong Huimin,12 Annegien Broeks,13 Ana Cristina Vargas,14 Gulisa Turashvili,15,16 John Martens,17 Aquila Fatima,18 Penelope Miron,18 Suet-Feung Chin,19 Gilles Thomas,20 Sandrine Boyault,20 Odette Mariani,21 Sunil R. Lakhani,14,22,23 Marc van de Vijver,24 Laura van ’t Veer,13 John Foekens,17 Christine Desmedt,25 Christos Sotiriou,25 Andrew Tutt,5 Carlos Caldas,19,26 Jorge S. Reis-Filho,27 Samuel A. J. R. Aparicio,15,16 Anne Vincent Salomon,21,28 Anne-Lise Børresen-Dale,9,29 Andrea L. Richardson,18,30 Peter J. Campbell,1,31,32 P. Andrew Futreal,1 and Michael R. Stratton1
1Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
2Human Genome Laboratory, Department of Human Genetics, VIB and University of Leuven, Herestraat 49 Box 602, B-3000 Leuven, Belgium
3School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK
4The Genome Analysis Centre, Norwich Research Park, Norwich NR4 7UH, UK
5Breakthrough Breast Cancer Research Unit, Research Oncology, 3rd Floor Bermondsey Wing, Guy’s Hospital Campus, Kings College London School of Medicine, London SE1 9RT, UK
6Department of Clinical Oncology, Ground floor, Lambeth Wing, Guys and St Thomas’ NHS Trust, Westminster Bridge Road, London SE1 7EH, UK
7Centre for Cancer Genetic Epidemiology, Department of Oncology, Strangeways Research Laboratory, Cambridge CB1 8RN, UK
8Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, Strangeways Research Laboratory, Cambridge CB1 8RN, UK
9Department of Genetics, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, 0310 Oslo, Norway
10National Genotyping Center, Institute of Biomedical Sciences, Academia Sinica, 128 Academia Road, Sec 2, Nankang, Taipei 115, Taiwan, China
11Department of General Surgery, Singapore General Hospital, 169608, Singapore
12NCCS-VARI Translational Research Laboratory, National Cancer Centre Singapore, 11 Hospital Drive, 169610, Singapore
13Department Experimental Therapy, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
14The University of Queensland Centre for Clinical Research, The Royal Brisbane & Women’s Hospital, Herston, Brisbane, Queensland 4029, Australia
15Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia V6T 2B5, Canada
16Molecular Oncology, British Columbia Cancer Research Centre, Vancouver, British Columbia V5Z 1L3, Canada
17Department of Medical Oncology, Erasmus University Medical Center, Daniel den Hoed Cancer Center and Cancer Genomics Center, Postbus 2040, 3000 CA Rotterdam, Netherlands
18Department of Cancer Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, Massachusetts 02215, USA
19Department of Oncology, University of Cambridge and Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, Cambridge CB2 0RE, UK
20Universite Lyon 1, INCa-Synergie, Centre Leon Berard, 28 rue Laennec, Lyon CEDEX 08, France
21Institut Curie, Department of Tumor Biology, 26 rue d’Ulm, 75248 Paris CEDEX 05, France
22The University of Queensland School of Medicine, Herston Road, Herston, Brisbane, Queensland 4006, Australia
23Anatomical Pathology, Pathology Queensland, The Royal Brisbane and Women’s Hospital, Herston, Brisbane, Queensland 4029, Australia
24Department of Pathology, Academic Medical Center, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
25Breast Cancer Translational Laboratory, Université Libre de Bruxelles, Jules Bordet Institute, Boulevard de Waterloo 121, 1000 Brussels, Belgium
26NIHR Cambridge Biomedical Research Centre and Cambridge Experimental Cancer Medicine Centre, Cambridge University Hospitals NHS Foundation Trust, Cambridge CB2 2QQ, UK
27The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London SW3 6JB, UK
28Institut Curie, INSERM Unit 830, 26 rue d’Ulm, 75248 Paris CEDEX 05, France
29K.G. Jebsen Center for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0318 Oslo, Norway
30Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis St, Boston, Massachusetts 02115, USA
31Department of Haematology, Addenbrooke’s Hospital, Cambridge CB2 0QQ, UK
32Department of Haematology, University of Cambridge, Hills Road, Cambridge CB2 2XY, UK
The coding exons of 21,416 protein coding genes and 1,664 microRNAs were sequenced and copy number changes examined in 100 primary breast cancers, 79 of which were oestrogen receptor positive (ER+) and 21 of which were oestrogen receptor negative (ER−) (Supplementary Table 1). We sequenced normal DNAs from the same individuals to exclude inherited sequence variation. We identified 7,241 somatic point mutations: 6,964 were single-base substitutions, of which 4,737 were predicted to generate missense; 422, nonsense; 158, an essential splice site; 8, stop codon read-through; and 1,637, silent changes in protein sequence. Two substitutions were found in microRNAs. There were 277 small insertions or deletions (71 and 206, respectively), of which 231 introduced translational frameshifts and 46 were in-frame (Supplementary Table 2). Analyses of copy number yielded 1,712 homozygous deletions and 1,751 regions of increased copy number (amplification) (Supplementary Table 3).
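As a quick consistency check, the mutation tallies reported above can be summed in a few lines. This is only a sketch: the dictionary keys are illustrative labels, and it assumes that the two microRNA substitutions are counted among the 6,964 single-base substitutions, which is how the reported categories add up.

```python
# Counts as reported in the text; key names are illustrative labels only.
substitutions = {
    "missense": 4737,
    "nonsense": 422,
    "essential_splice_site": 158,
    "stop_codon_read_through": 8,
    "silent": 1637,
    "microRNA": 2,  # assumption: included in the 6,964 substitutions
}
indels = {"insertions": 71, "deletions": 206}
indel_effects = {"frameshift": 231, "in_frame": 46}

total_substitutions = sum(substitutions.values())
total_indels = sum(indels.values())
total_point_mutations = total_substitutions + total_indels

print("substitutions:", total_substitutions)
print("indels:", total_indels)
print("all somatic point mutations:", total_point_mutations)
```

Under that assumption the categories reproduce the reported totals of 6,964 substitutions, 277 indels and 7,241 somatic point mutations.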
Somatic driver substitutions and small insertions/deletions (indels) were identified in cancer genes previously implicated in breast cancer development, including AKT1, BRCA1, CDH1, GATA3, PIK3CA, PTEN, RB1 and TP53 (Supplementary Table 4; see also http://www.sanger.ac.uk/genetics/CGP/Census). Likely drivers were also found in cancer genes involved in other cancer types, including APC, ARID1A, ARID2, ASXL1, BAP1, KRAS, MAP2K4, MLL2, MLL3, NF1, SETD2, SF3B1, SMAD4 and STK11.
To identify new cancer genes, we searched for non-random clustering of somatic mutations in each of the 21,416 protein-coding genes (refs 2, 3) and sequenced a subset of genes highlighted by this analysis in a follow-up series of 250 breast cancers (Supplementary Tables 5 and 6). Persuasive evidence was found for nine new cancer genes (Supplementary Fig. 1). Of these, ARID1B, CASP8, MAP3K1, MAP3K13, NCOR1, SMARCD1 and CDKN1B had the truncating mutations and often biallelic inactivation characteristic of inactivated, potentially recessive cancer genes (Supplementary Table 4). AKT2 is probably an activated, dominantly acting cancer gene. The effects of TBX3 mutations on its function are unclear.
MAP3K1 encodes a serine/threonine protein kinase that regulates the activity of the ERK MAP kinase (the extracellular signal-regulated mitogen-activated protein kinase), JUN kinase and p38 signalling pathways implicated in control of cell proliferation and death (ref. 4). Somatic mutations in MAP3K1 were observed in 6% of breast cancers, predominantly in ER+ cases. Most were protein truncating. MAP3K1 phosphorylates and activates the protein encoded by MAP2K4, a known recessive cancer gene with inactivating mutations in breast and other cancers (ref. 5). In turn, MAP2K4 phosphorylates and activates the JUN kinases MAPK8 (also known as JNK1) and MAPK9 (also known as JNK2), which phosphorylate JUN, TP53 and other transcription factors mediating cellular responses to stress (ref. 4). Truncating mutations and other non-synonymous mutations were also found in MAP3K13, which encodes a kinase that phosphorylates and activates MAP2K7. MAP2K7 phosphorylates and activates MAPK8 and MAPK9 (ref. 4). Thus, in breast cancer, inactivating mutations in MAP3K1, MAP2K4 and MAP3K13 are predicted to abrogate signalling pathways that activate JUN kinases.
In the serine/threonine kinase gene AKT2, we identified a single somatic missense mutation, Glu 17 Lys, that is identical to the recurrent, activating mutation in AKT1 previously reported in breast cancer (ref. 6). Thus, AKT2 is also probably a cancer gene, albeit one infrequently implicated in breast cancer development. Because AKT phosphorylates and inhibits MAP2K4 (ref. 7) and mutations in PIK3CA and PTEN can result in AKT activation (ref. 8), about half of breast cancers may have abrogation of JUN kinase signalling. The biological consequences of the reduction in JUN kinase activity are likely to be diverse and complex, but may include destabilization and consequent inactivation of TP53 with disruption of pro-apoptotic cellular signalling in response to stress (ref. 9).
We observed truncating mutations and homozygous deletions of NCOR1. In addition to mediating repression of thyroid-hormone and retinoic-acid receptors by promoting chromatin condensation and preventing access of the transcription machinery (ref. 10), NCOR1 participates in ligand-dependent transcriptional repression by oestrogen receptor alpha (ref. 11). We also identified inactivating mutations in SMARCD1 and ARID1B, further implicating aberrant chromatin regulation. The encoded proteins of both are components of the SWI/SNF chromatin remodelling complex, which incorporates the products of several established recessive cancer genes, including PBRM1, ARID1A, SMARCB1 and SMARCA4 (refs 3, 12-14).
We found three truncating mutations and a missense mutation in CDKN1B. Two truncating mutations in CDKN1B in cancer have previously been reported (refs 15, 16), and collectively the results confirm that CDKN1B is a cancer gene. CDKN1B (also known as p27 or KIP1) normally inhibits activation of cyclin E/CDK2 and cyclin D/CDK4 complexes, thus preventing cell cycle progression at phase G1 (ref. 17).
Three truncating mutations were observed in CASP8. CASP8 is a member of the cysteine/aspartic acid protease family that forms a complex with the FAS cell surface receptor to promote programmed cell death. Inactivation of CASP8 in these cancers is therefore predicted to abrogate apoptosis in response to a variety of signals.
Six tumours had mutations in TBX3, which encodes a T-box transcription factor that regulates stem cell pluripotency-associated and reprogramming factors and is involved in normal breast development (refs 18, 19). Constitutional inactivating mutations in TBX3 cause ulnar-mammary syndrome, in which there is failure of breast and apocrine development coupled with abnormalities of limb morphogenesis (ref. 20). Three breast cancers had in-frame deletions, one of Thr 210 and the other two of Asn 212, a residue through which the T-box domain binds to DNA. Despite the presence of truncating mutations in three further cases, the recurrent and clustered in-frame deletions and the finding that all mutations were heterozygous suggest that they may not simply result in loss of function. Indeed, recent reports suggest that increased activity of TBX3 is likely to contribute to oncogenesis. The proportion of stem-like cells in breast cancers is increased by oestrogen-dependent activation of the TBX3 pathway (ref. 21). Moreover, TBX3 overexpression increases the efficiency of the derivation of induced pluripotent stem cells (ref. 18) and the ability of cancer cells to form tumours (ref. 21).
Further supporting their role in oncogenesis, three of the nine newly identified somatically mutated cancer genes, MAP3K1, CASP8 and TBX3, carry inherited common variants, identified by genome-wide association studies, that confer small increased risks of breast cancer (refs 22, 23). Several additional genes showed truncating mutations and are biologically plausible candidate cancer genes contributing infrequently to breast cancer development. Some, including ASXL2, ARID5B, KDM3A, SETD1A, CHD1, NCOR2, HDAC9 and CTCF, encode proteins that regulate chromatin structure, whereas others, including FANCA and ATR, are involved in DNA repair.
Cancers arise through successive waves of clonal expansion dependent on the sequential acquisition of driver mutations. A central parameter of cancer development is therefore the number of driver mutations required for conversion of a normal cell into a symptomatic cancer. Estimates based on cancer age–incidence curves have indicated that approximately five rate-limiting steps underlie the development of common adult solid tumours (ref. 24). Experimental studies have similarly indicated that a limited number of key genetic changes are required for neoplastic transformation of human cells (ref. 25). Our systematic genome analysis now provides a direct survey of the landscape of driver mutations in breast cancer.
Somatic driver point mutations and/or copy number changes in at least 40 cancer genes were implicated in the development of the 100 breast cancers (Supplementary Tables 3 and 4, and Supplementary Methods). The maximum number of mutated cancer genes in an individual cancer was six, but 28 cases showed only a single driver. Thus, there seems to be substantial variation in the number of drivers. In some cases, the presence of multiple drivers was associated with subclonal evolution of the cancer (Supplementary Statistical Analyses). However, in others multiple drivers were in the root cancer clone. Seven of the 40 cancer genes (TP53, PIK3CA, ERBB2, MYC, FGFR1/ZNF703, GATA3 and CCND1) were mutated in more than 10% of cases. Collectively these contributed 58% of driver mutations (144 of 250). Therefore, 33 mutated cancer genes, each contributing relatively infrequently, were responsible for the remaining 42% of driving genetic events. We observed 73 different combinations of mutated cancer genes. Thus, most breast cancers differed from all others (Supplementary Fig. 2). This assessment of the genetic diversity of breast cancer is probably conservative because, for several reasons, it underestimates the number of mutated cancer genes in each case.
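The split between frequently and infrequently mutated cancer genes follows directly from the figures quoted above. The short sketch below reproduces that arithmetic; the variable names are illustrative and only the numbers stated in the text are used.

```python
# Figures quoted in the text; variable names are illustrative.
total_cancer_genes = 40
frequent_genes = ["TP53", "PIK3CA", "ERBB2", "MYC", "FGFR1/ZNF703", "GATA3", "CCND1"]
total_driver_events = 250
events_in_frequent_genes = 144

infrequent_gene_count = total_cancer_genes - len(frequent_genes)
events_in_infrequent_genes = total_driver_events - events_in_frequent_genes

# Shares of all driver events, in percent.
frequent_share = 100 * events_in_frequent_genes / total_driver_events
infrequent_share = 100 * events_in_infrequent_genes / total_driver_events

print(f"{len(frequent_genes)} frequent genes: {frequent_share:.1f}% of driver events")
print(f"{infrequent_gene_count} infrequent genes: {infrequent_share:.1f}% of driver events")
```

The unrounded shares are 57.6% and 42.4%, which round to the 58% and 42% reported.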
At present, we know little about the mutational processes responsible for the generation of somatic mutations in breast and other cancers. In the 100 breast cancers analysed here, there was substantial variation in the total numbers of base substitutions and indels between individual cases. There was also considerable diversity of mutational pattern, ranging from cases in which C•G → T•A transitions predominated to cases in which all transitions and transversions made equal contributions (Supplementary Fig. 3). Taken together, the results suggest that multiple distinct mutational processes are operative. For most of these processes, the underlying mechanism is unknown.
To illustrate one mutational signature in detail, we selected the ER+ breast cancer with the largest number of base substitutions in the series, PD4120. The mutation spectrum of this case was distinctive, featuring C•G → T•A, C•G → G•C and C•G → A•T mutations and very few mutations at A•T base pairs. To characterize this process further, we examined the sequence context in which the mutations occurred (in the following discussion, mutations at C•G base pairs are represented as the change at the C base) and found pronounced overrepresentation of thymine immediately 5′ to the mutated cytosines. Thus, in PD4120 the large majority of mutations were of cytosine at TpC dinucleotides.
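The sequence-context tally described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes each mutation is supplied as a hypothetical three-base reference context centred on the mutated base, and the function names are invented for the example. Mutations at G are reverse-complemented so that every C•G mutation is read as a change at the C base, matching the convention stated in the text.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def five_prime_base(context):
    """Base immediately 5' of a mutated cytosine.

    `context` is a 3-base reference string centred on the mutated base.
    Contexts centred on G are flipped to the opposite strand; contexts
    centred on A or T are not C.G mutations and return None.
    """
    if len(context) != 3:
        raise ValueError("context must be exactly three bases")
    context = context.upper()
    if context[1] == "G":
        context = reverse_complement(context)
    elif context[1] != "C":
        return None
    return context[0]


def five_prime_counts(contexts):
    """Tally the 5' neighbour over all C.G mutations in `contexts`."""
    counts = Counter()
    for context in contexts:
        base = five_prime_base(context)
        if base is not None:
            counts[base] += 1
    return counts


# Toy input, not data from the study.
example = ["TCA", "TCT", "TGA", "ACG", "ATG"]
print(five_prime_counts(example))
```

To call an overrepresentation, the observed counts would still need to be compared with how often each dinucleotide occurs in the sequenced regions; that background step is omitted here.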
To obtain further insight into the underlying mechanism in this case, we looked for differences in mutation prevalence between the transcribed and untranscribed strands of the 21,416 genes analysed (‘strand bias’) and found a higher prevalence of C→T, C→G and C→A mutations on transcribed strands (P = 0.02) (Supplementary Table 7). This strand bias raises the possibility that transcription-coupled nucleotide excision repair (NER) has been operative. NER removes bulky DNA adducts that distort the DNA double helix, notably pyrimidine dimers due to ultraviolet light exposure or adducts due to mutagens in tobacco smoke (ref. 26). There is a form of NER, recruited by RNA polymerase II, that is operative only on the transcribed strand of each gene and thus introduces a strand bias for mutations (ref. 27). Therefore, one hypothesis to account for the strand bias in PD4120 is past involvement of NER, in turn implicating exposure to a bulky DNA-damaging agent, either of endogenous or exogenous origin. However, we cannot exclude the possibility that other DNA damage or repair processes generate a strand bias. At least eight additional cancers in this series had a very similar mutational spectrum, sequence context and strand bias (Supplementary Fig. 4 and Supplementary Statistical Analysis). None had been treated before excision of the cancer.
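One simple way to test for a strand bias of the kind described above is an exact two-sided binomial test on the counts of cytosine mutations assigned to each strand. The text does not say which test produced P = 0.02, so the sketch below is an assumption for illustration only. It further assumes that both strands offer equal mutational opportunity (null probability 0.5), and the counts in the usage lines are hypothetical.

```python
from math import exp, lgamma, log


def binomial_pmf(i, n, p):
    """Binomial probability of i successes in n trials, computed in log space."""
    log_coeff = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coeff + i * log(p) + (n - i) * log(1 - p))


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided P value: total probability of all outcomes that are
    no more likely than the observed count k under the null."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    observed = binomial_pmf(k, n, p)
    # Small relative tolerance so that symmetric outcomes are treated as ties.
    total = sum(
        pmf
        for pmf in (binomial_pmf(i, n, p) for i in range(n + 1))
        if pmf <= observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Hypothetical counts, not data from the study.
transcribed, untranscribed = 60, 40
p_value = binomial_two_sided_p(transcribed, transcribed + untranscribed)
print(f"two-sided P = {p_value:.3f}")
```

In a real analysis the null probability would be adjusted for the number of cytosines available on each strand of the sequenced genes rather than fixed at 0.5.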
The somatic mutations in a cancer genome accumulate over a patient’s lifetime, during the lineage of mitotic divisions from the fertilized egg to the cancer cell. Some are acquired while cells in the lineage are biologically normal, whereas others are acquired after acquisition of the neoplastic phenotype. However, the relative proportions accumulated in these two phases are unknown. To explore this question, we examined the relationship between the total numbers of somatic base substitutions and the age at diagnosis in the 100 tumours. In both ER+ and ER− cancers, no correlation was observed (P = 0.33 and 0.14 respectively). If most somatic mutations in a cancer genome are acquired in normal tissues before neoplastic transformation, the later the onset of the cancer the longer this part of the lineage is likely to have been and, consequently, the higher the number of mutations. The absence of a correlation therefore suggests that most mutations in breast cancer genomes occur after the initiating driver event.
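A relationship between mutation count and age at diagnosis could be examined with a rank correlation and a permutation P value, sketched below using only the standard library. The statistical method actually used for the P values above is not stated in this section, so the choice of Spearman correlation is an assumption, and the ages and counts in the usage lines are invented.

```python
import random


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation P value for the Spearman correlation."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented values, not data from the study.
ages = [41, 48, 52, 55, 60, 63, 67, 72]
substitution_counts = [35, 90, 22, 61, 48, 120, 30, 75]
print(spearman(ages, substitution_counts), permutation_p(ages, substitution_counts))
```

A rank-based statistic is used here because substitution counts vary widely between cases, so a few heavily mutated tumours would otherwise dominate a linear correlation.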
We then considered separately the subset of somatic mutations constituted by C•G → T•A substitutions at CpG dinucleotides, because this mutational pattern is observed in non-diseased tissues, manifesting prominently in normal germline variation. This subset showed a strong positive correlation with the age at cancer diagnosis in ER− cancers (P = 1.2 × 10⁻⁷), supporting the proposition that it is enriched in mutations occurring in normal tissues and that, overall, other mutation classes occur later. By contrast, ER+ cancers showed no correlation between C•G → T•A substitutions at CpG dinucleotides and age at diagnosis (P = 0.27). The basis for this pronounced difference is unclear, but potentially highlights a profound divergence in the dynamics of mutation acquisition between these two major subclasses of breast cancer.
In clinical practice, breast cancers are graded microscopically on the basis of mitotic counts, pleomorphism of cancer cell nuclei and extent of tubule formation, which are then collected into an overall grade score. High scores indicate large numbers of mitoses, substantial tumour cell pleomorphism and little tubule formation, and are generally associated with more rapid progression. Significant correlations were not observed between numbers of driver mutations and grade scores (Supplementary Statistical Analysis). However, there were strong positive correlations between the total number of substitutions (that is, drivers and passengers) and mitosis and tubule scores (P = 0.0002 and 0.002 respectively), which remained significant after multiple testing corrections. The causal relationships between these features are unclear. However, because most substitutions are likely to be biologically inert passengers, it is possible that the biological state of high-grade breast cancers may be responsible for generating increased numbers of mutations, rather than the converse.
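The multiple testing correction applied above is not specified in this section. As a hedged illustration only, the sketch below assumes a Bonferroni correction at a significance threshold of 0.05 and asks how many simultaneous tests each reported P value could withstand. The function names are invented, and exact fractions are used to avoid floating-point rounding at the boundary.

```python
from fractions import Fraction


def bonferroni_adjust(p_values, n_tests):
    """Bonferroni-adjusted P values: each P multiplied by the number of tests, capped at 1."""
    return [min(Fraction(1), Fraction(p) * n_tests) for p in p_values]


def max_tests_surviving(p_value, alpha="0.05"):
    """Largest number of tests m for which p_value * m <= alpha still holds."""
    return int(Fraction(alpha) / Fraction(p_value))


# P values reported in the text for the mitosis and tubule scores.
reported = {"mitosis score": "0.0002", "tubule score": "0.002"}
for name, p in reported.items():
    print(name, "stays below 0.05 for up to", max_tests_surviving(p), "tests")
```

Under this assumed correction, the two associations would remain significant for up to 250 and 25 tests respectively.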
The panorama of mutated cancer genes and mutational processes in breast cancer is becoming clearer, and a sobering perspective on the complexity and diversity of the disease is emerging. Driver mutations are operative in many cancer genes. A few are commonly mutated, but many infrequently mutated genes collectively make a substantial contribution in myriad different combinations. Multiple somatic mutational processes have been operative. Ultimately, characterization of the genomes of breast cancer, and others, will provide a robust and biologically meaningful classification generating insights into the clinical heterogeneity of the disease and influencing strategies to find new modes of prevention and treatment.