The nature and pace of genome mutation are largely unknown. Because standard methods sequence DNA from populations of cells, the genetic composition of individual cells is lost, de novo mutations in cells are concealed within the bulk signal, and per-cell-cycle mutation rates and mechanisms remain elusive. Although single-cell genome analyses could resolve these problems, such analyses are error-prone because of whole-genome amplification (WGA) artefacts and are limited in the types of DNA mutation that can be discerned. We developed methods for paired-end sequence analysis of single-cell WGA products that enable (i) detecting multiple classes of DNA mutation, (ii) distinguishing DNA copy number changes from allelic WGA artefacts by the discovery of matching aberrantly mapping read pairs among the surfeit of paired-end WGA and mapping artefacts, and (iii) delineating the break points and architecture of structural variants. By applying the methods, we capture DNA copy number changes acquired over one cell cycle in breast cancer cells and in blastomeres derived from a human zygote after in vitro fertilization. Furthermore, we were able to discover and fine-map a heritable inter-chromosomal rearrangement t(1;16)(p36;p12) by sequencing a single blastomere. The methods will expedite applications in basic genome research and provide a stepping stone to novel approaches for clinical genetic diagnosis.
The application of paired-end next generation sequencing approaches has made it possible to systematically characterize rearrangements of the cancer genome to base-pair level. Utilizing this approach, we report the first detailed analysis of ovarian cancer rearrangements, comparing high-grade serous and clear cell cancers, and these histotypes with other solid cancers. Somatic rearrangements were systematically characterized in eight high-grade serous and five clear cell ovarian cancer genomes and we report here the identification of > 600 somatic rearrangements. Recurrent rearrangements of the transcriptional regulator gene, TSHZ3, were found in three of eight serous cases. Comparison to breast, pancreatic and prostate cancer genomes revealed that a subset of ovarian cancers share a marked tandem duplication phenotype with triple-negative breast cancers. The tandem duplication phenotype was not linked to BRCA1/2 mutation, suggesting that other common mechanisms or carcinogenic exposures are operative. High-grade serous cancers arising in women with germline BRCA1 or BRCA2 mutation showed a high frequency of small chromosomal deletions. These findings indicate that BRCA1/2 germline mutation may contribute to widespread structural change and that other undefined mechanism(s), which are potentially shared with triple-negative breast cancer, promote tandem chromosomal duplications that sculpt the ovarian cancer genome.
ovarian cancer; structural rearrangements; TSHZ3
All cancers carry somatic mutations in their genomes. A subset, known as driver mutations, confer clonal selective advantage on cancer cells and are causally implicated in oncogenesis [1], and the remainder are passenger mutations. The driver mutations and mutational processes operative in breast cancer have not yet been comprehensively explored. Here we examine the genomes of 100 tumours for somatic copy number changes and mutations in the coding exons of protein-coding genes. The number of somatic mutations varied markedly between individual tumours. We found strong correlations between mutation number, age at which cancer was diagnosed and cancer histological grade, and observed multiple mutational signatures, including one present in about ten per cent of tumours characterized by numerous mutations of cytosine at TpC dinucleotides. Driver mutations were identified in several new cancer genes including AKT2, ARID1B, CASP8, CDKN1B, MAP3K1, MAP3K13, NCOR1, SMARCD1 and TBX3. Among the 100 tumours, we found driver mutations in at least 40 cancer genes and 73 different combinations of mutated cancer genes. The results highlight the substantial genetic diversity underlying this common disease.
Multiple somatic rearrangements are often found in cancer genomes. However, the underlying processes of rearrangement and their contribution to cancer development are poorly characterised. Here, we employed a paired-end sequencing strategy to identify somatic rearrangements in breast cancer genomes. There are more rearrangements in some breast cancers than previously appreciated. Rearrangements are more frequent over gene footprints and most are intrachromosomal. Multiple architectures of rearrangement are present, but tandem duplications are common in some cancers, perhaps reflecting a specific defect in DNA maintenance. Short overlapping sequences at most rearrangement junctions suggest that these have been mediated by non-homologous end-joining DNA repair, although varying sequence patterns indicate that multiple processes of this type are operative. Several expressed in-frame fusion genes were identified but none were recurrent. The study provides a new perspective on cancer genomes, highlighting the diversity of somatic rearrangements and their potential contribution to cancer development.
All cancers carry somatic mutations. The patterns of mutation in cancer genomes reflect the DNA damage and repair processes to which cancer cells and their precursors have been exposed. To explore these mechanisms further, we generated catalogs of somatic mutation from 21 breast cancers and applied mathematical methods to extract mutational signatures of the underlying processes. Multiple distinct single- and double-nucleotide substitution signatures were discernible. Cancers with BRCA1 or BRCA2 mutations exhibited a characteristic combination of substitution mutation signatures and a distinctive profile of deletions. Complex relationships between somatic mutation prevalence and transcription were detected. A remarkable phenomenon of localized hypermutation, termed “kataegis,” was observed. Regions of kataegis differed between cancers but usually colocalized with somatic rearrangements. Base substitutions in these regions were almost exclusively of cytosine at TpC dinucleotides. The mechanisms underlying most of these mutational signatures are unknown. However, a role for the APOBEC family of cytidine deaminases is proposed.
► The genomes of 21 breast cancers sequenced
► Multiple somatic mutational processes extracted from mutation catalogs
► Mutational processes of BRCA1/BRCA2 breast cancers are distinctive
► Localized regions of hypermutation, “kataegis,” are frequent in breast cancers
Analyses of breast cancer genomes define distinct mutational signatures that imply the existence of multiple distinct somatic mutational processes throughout the genome and reveal a remarkable phenomenon of localized hypermutation. These highly mutated regions vary in size and chromosomal location and are surprisingly frequent in cancer genomes, often colocalizing with somatic rearrangements.
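The two observations above (substitutions of cytosine at TpC dinucleotides, and mutations clustered into short hypermutated regions) can be illustrated with a minimal Python sketch. This is not the method used in the study: the function names, the 1,000 bp maximum gap and the minimum run length of six mutations are illustrative assumptions chosen only to make the idea concrete.

```python
from typing import List, Tuple


def is_tpc_cytosine(ref_seq: str, pos: int) -> bool:
    """Return True if the reference base at 0-based `pos` is a cytosine in a
    TpC context on either strand: a C preceded by T on the forward strand,
    or a G followed by A (which reads as TpC on the reverse strand)."""
    base = ref_seq[pos].upper()
    if base == "C":
        return pos > 0 and ref_seq[pos - 1].upper() == "T"
    if base == "G":
        return pos + 1 < len(ref_seq) and ref_seq[pos + 1].upper() == "A"
    return False


def clustered_runs(positions: List[int],
                   max_gap: int = 1000,   # assumed cutoff, not from the source
                   min_run: int = 6       # assumed cutoff, not from the source
                   ) -> List[Tuple[int, int]]:
    """Return (start, end) coordinates of runs in which consecutive mutations
    on one chromosome lie within `max_gap` bases of each other."""
    runs: List[Tuple[int, int]] = []
    if not positions:
        return runs
    ordered = sorted(positions)
    start = prev = ordered[0]
    count = 1
    for p in ordered[1:]:
        if p - prev <= max_gap:
            count += 1
        else:
            # Gap too large: close the current run and start a new one.
            if count >= min_run:
                runs.append((start, prev))
            start = p
            count = 1
        prev = p
    if count >= min_run:
        runs.append((start, prev))
    return runs
```

In practice the positions would come from a somatic variant call set for a single chromosome, and the cutoffs would need to be tuned against the background mutation density of each tumour.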
Cancer evolves dynamically as clonal expansions supersede one another driven by shifting selective pressures, mutational processes, and disrupted cancer genes. These processes mark the genome, such that a cancer's life history is encrypted in the somatic mutations present. We developed algorithms to decipher this narrative and applied them to 21 breast cancers. Mutational processes evolve across a cancer's lifespan, with many emerging late but contributing extensive genetic variation. Subclonal diversification is prominent, and most mutations are found in just a fraction of tumor cells. Every tumor has a dominant subclonal lineage, representing more than 50% of tumor cells. Minimal expansion of these subclones occurs until many hundreds to thousands of mutations have accumulated, implying the existence of long-lived, quiescent cell lineages capable of substantial proliferation upon acquisition of enabling genomic changes. Expansion of the dominant subclone to an appreciable mass may therefore represent the final rate-limiting step in a breast cancer's development, triggering diagnosis.
► Genome-wide analyses of mutations emerging through time in 21 breast cancers
► Minimal expansion of subclones occurs until thousands of mutations have accumulated
► Cancer-specific signatures of point mutations and genomic instability emerge late
► ERBB2 amplification begins early but continues to evolve over long molecular time
Newly developed algorithms allow the reconstruction of the genomic history of different breast cancers, tracing the temporal evolution of each tumor and the emergence of the dominant subclones that will eventually trigger diagnosis.
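To make the notion of mutations "found in just a fraction of tumor cells" concrete, the sketch below converts a variant allele fraction into an estimated fraction of cancer cells carrying the mutation. It uses a commonly quoted relation between allele fraction, tumour purity and copy number. It is not the algorithm developed in the study, and the function names, the default normal copy number of 2 and the 0.9 clonality threshold are assumptions made for illustration.

```python
def cancer_cell_fraction(vaf: float,
                         purity: float,
                         tumour_copy_number: float,
                         multiplicity: int = 1,
                         normal_copy_number: float = 2.0) -> float:
    """Estimate the fraction of cancer cells carrying a mutation.

    vaf                 observed variant allele fraction (0..1)
    purity              fraction of cells in the sample that are tumour cells
    tumour_copy_number  total copy number at the locus in tumour cells
    multiplicity        number of mutated copies per mutated cell (assumed 1)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the mixed sample.
    total_copies = purity * tumour_copy_number + (1 - purity) * normal_copy_number
    return vaf * total_copies / (purity * multiplicity)


def is_subclonal(ccf: float, threshold: float = 0.9) -> bool:
    """Call a mutation subclonal if its estimated cell fraction falls below
    an assumed threshold (0.9 here is illustrative)."""
    return ccf < threshold
```

For example, in a pure diploid sample a heterozygous mutation at an allele fraction of 0.5 corresponds to a cell fraction of 1.0, whereas an allele fraction of 0.125 in a sample of 50% purity corresponds to a cell fraction of 0.5.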
All cancers carry somatic mutations. A subset of these somatic alterations, termed driver mutations, confer selective growth advantage and are implicated in cancer development, whereas the remainder are passengers. Here we have sequenced the genomes of a malignant melanoma and a lymphoblastoid cell line from the same person, providing the first comprehensive catalogue of somatic mutations from an individual cancer. The catalogue provides remarkable insights into the forces that have shaped this cancer genome. The dominant mutational signature reflects DNA damage due to ultraviolet light exposure, a known risk factor for malignant melanoma, whereas the uneven distribution of mutations across the genome, with a lower prevalence in gene footprints, indicates that DNA repair has been preferentially deployed towards transcribed regions. The results illustrate the power of a cancer genome sequence to reveal traces of the DNA damage, repair, mutation and selection processes that were operative years before the cancer became symptomatic.
The genetics of renal cancer is dominated by inactivation of the VHL tumour suppressor gene in clear cell carcinoma (ccRCC), the commonest histological subtype. A recent large-scale screen of ~3500 genes by PCR-based exon re-sequencing identified several new cancer genes in ccRCC including UTX (KDM6A) [1], JARID1C (KDM5C) and SETD2 [2]. These genes encode enzymes that demethylate (UTX, JARID1C) or methylate (SETD2) key lysine residues of histone H3. Modification of the methylation state of these lysine residues of histone H3 regulates chromatin structure and is implicated in transcriptional control [3]. However, together these mutations are present in fewer than 15% of ccRCC, suggesting the existence of additional, currently unidentified cancer genes. Here, we have sequenced the protein coding exome in a series of primary ccRCC and report the identification of the SWI/SNF chromatin remodeling complex gene PBRM1 [4] as a second major ccRCC cancer gene, with truncating mutations in 41% (92/227) of cases. These data further elucidate the somatic genetic architecture of ccRCC and emphasize the marked contribution of aberrant chromatin biology.
Pancreatic cancer is an aggressive malignancy with a five-year mortality of 97–98%, usually due to widespread metastatic disease. Previous studies indicate that this disease has a complex genomic landscape, with frequent copy number changes and point mutations [1–5], but genomic rearrangements have not been characterized in detail. Despite the clinical importance of metastasis, there remain fundamental questions about the clonal structures of metastatic tumours [6,7], including phylogenetic relationships among metastases, the scale of ongoing parallel evolution in metastatic and primary sites [7], and how the tumour disseminates. Here we harness advances in DNA sequencing [8–12] to annotate genomic rearrangements in 13 patients with pancreatic cancer and explore clonal relationships among metastases. We find that pancreatic cancer acquires rearrangements indicative of telomere dysfunction and abnormal cell-cycle control, namely dysregulated G1-to-S-phase transition with intact G2–M checkpoint. These initiate amplification of cancer genes and occur predominantly in early cancer development rather than the later stages of the disease. Genomic instability frequently persists after cancer dissemination, resulting in ongoing, parallel and even convergent evolution among different metastases. We find evidence that there is genetic heterogeneity among metastasis-initiating cells, that seeding metastasis may require driver mutations beyond those required for primary tumours, and that phylogenetic trees across metastases show organ-specific branches. These data attest to the richness of genetic variation in cancer, brought about by the tandem forces of genomic instability and evolutionary selection.
The Catalogue of Somatic Mutations in Cancer (COSMIC) (http://www.sanger.ac.uk/cosmic) is a publicly available resource providing information on somatic mutations implicated in human cancer. Release v51 (January 2011) includes data from just over 19 000 genes, 161 787 coding mutations and 5573 gene fusions, described in more than 577 000 tumour samples. COSMICMart (COSMIC BioMart) provides a flexible way to mine these data and combine somatic mutations with other biologically relevant data sets. This article describes the data available in COSMIC along with examples of how to successfully mine and integrate data sets using COSMICMart.
Database URL: http://www.sanger.ac.uk/genetics/CGP/cosmic/biomart/martview/
Cancer is driven by somatically acquired point mutations and chromosomal rearrangements, conventionally thought to accumulate gradually over time. Using next-generation sequencing, we characterize a phenomenon, which we term chromothripsis, whereby tens to hundreds of genomic rearrangements occur in a one-off cellular crisis. Rearrangements involving one or a few chromosomes crisscross back and forth across involved regions, generating frequent oscillations between two copy number states. These genomic hallmarks are highly improbable if rearrangements accumulate over time and instead imply that nearly all occur during a single cellular catastrophe. The stamp of chromothripsis can be seen in at least 2%–3% of all cancers, across many subtypes, and is present in ∼25% of bone cancers. We find that one, or indeed more than one, cancer-causing lesion can emerge out of the genomic crisis. This phenomenon has important implications for the origins of genomic remodeling and temporal emergence of cancer.
► 2%–3% of cancers show tens to hundreds of rearrangements localized to specific genomic regions
► Genomic features imply chromosome breaks occur in one-off crisis (“chromothripsis”)
► Found across many tumor types, especially common in bone cancers (up to 25%)
► Can generate several genomic lesions with potential to drive cancer in single event
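The copy number hallmark described above, frequent oscillation between two copy number states across the involved region, lends itself to a simple check. The following Python sketch is not the authors' analysis: the function names and the thresholds (at least 10 switches, at most 2 distinct states) are assumptions for illustration, and the input is taken to be an ordered list of segment copy numbers along one chromosome.

```python
from typing import Sequence, Tuple


def copy_number_oscillation(segment_copy_numbers: Sequence[int]) -> Tuple[int, int]:
    """Return (number of copy number switches, number of distinct states)
    for segments listed in genomic order along one chromosome."""
    collapsed = []
    for cn in segment_copy_numbers:
        # Merge adjacent segments that share the same copy number.
        if not collapsed or collapsed[-1] != cn:
            collapsed.append(cn)
    switches = max(len(collapsed) - 1, 0)
    return switches, len(set(collapsed))


def looks_like_one_off_event(segment_copy_numbers: Sequence[int],
                             min_switches: int = 10,  # assumed threshold
                             max_states: int = 2      # assumed threshold
                             ) -> bool:
    """Flag a profile with many switches confined to few copy number states.
    Gradually accumulated rearrangements would be expected to produce more
    distinct states as amplifications and losses stack up."""
    switches, states = copy_number_oscillation(segment_copy_numbers)
    return switches >= min_switches and states <= max_states
```

A profile such as `[2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]` is flagged, whereas a stepwise profile such as `[2, 3, 4, 5, 4, 3, 2]` is not. A real analysis would also need to consider breakpoint clustering and the orientation of rearrangement joins.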
COSMIC (http://www.sanger.ac.uk/cosmic) curates comprehensive information on somatic mutations in human cancer. Release v48 (July 2010) describes over 136 000 coding mutations in almost 542 000 tumour samples; of the 18 490 genes documented, 4803 (26%) have one or more mutations. Full scientific literature curations are available on 83 major cancer genes and 49 fusion gene pairs (19 new cancer genes and 30 new fusion pairs this year) and this number is continually increasing. Key amongst these is TP53, now available through a collaboration with the IARC p53 database. In addition to data from the Cancer Genome Project (CGP) at the Sanger Institute, UK, and The Cancer Genome Atlas project (TCGA), large systematic screens are also now curated. Major website upgrades now make these data much more mineable, with many new selection filters and graphics. A Biomart is now available allowing more automated data mining and integration with other biological databases. Annotation of genomic features has become a significant focus; COSMIC has begun curating full-genome resequencing experiments, developing new web pages, export formats and graphics styles. With all genomic information recently updated to GRCh37, COSMIC integrates many diverse types of mutation information and is making much closer links with Ensembl and other data resources.
Clear cell renal cell carcinoma (ccRCC) is the most common form of adult kidney cancer, characterised by the presence of inactivating mutations in the VHL gene in the majority of cases [1,2] and by infrequent somatic mutations in known cancer genes. To elucidate further the genetics of ccRCC, we have sequenced 3544 protein coding genes in 101 cases. Here we report the identification of inactivating mutations in two genes encoding enzymes involved in histone modification: SETD2, a histone H3 lysine 36 methyltransferase, and JARID1C (KDM5C), a histone H3 lysine 4 demethylase. These add to the mutations in the histone H3 lysine 27 demethylase UTX (KDM6A) that we recently reported [3]. The results highlight the role of mutations in components of the chromatin modification machinery in human cancer. Additionally, NF2 mutations were found in non-VHL mutated ccRCC and several other likely cancer genes were identified. These results indicate that substantial genetic heterogeneity exists in a cancer type dominated by mutations in a single gene and that systematic screens will be key to fully elucidating the somatic genetic architecture of cancer.
Cancer is driven by mutation. Worldwide, tobacco smoking is the major lifestyle exposure that causes cancer, exerting carcinogenicity through >60 chemicals that bind and mutate DNA. Using massively parallel sequencing technology, we sequenced a small cell lung cancer cell line, NCI-H209, to explore the mutational burden associated with tobacco smoking. 22,910 somatic substitutions were identified, including 132 in coding exons. Multiple mutation signatures testify to the cocktail of carcinogens in tobacco smoke and their proclivities for particular bases and surrounding sequence context. Effects of transcription-coupled repair and a second, more general expression-linked repair pathway were evident. We identified a tandem duplication that duplicates exons 3-8 of CHD7 in-frame, and another two lines carrying PVT1-CHD7 fusion genes, suggesting that CHD7 may be recurrently rearranged in this disease. These findings illustrate the potential for next-generation sequencing to provide unprecedented insights into mutational processes, cellular repair pathways and gene networks associated with cancer.
A novel missense mutation in the mediator of RNA polymerase II transcription subunit 12 (MED12) gene has been found in the original family with Lujan syndrome and in a second family (K9359) that was initially considered to have Opitz–Kaveggia (FG) syndrome. A different missense mutation in the MED12 gene has been reported previously in the original family with FG syndrome and in five other families with compatible clinical findings. Neither sequence alteration has been found in over 1400 control X chromosomes. Lujan (Lujan–Fryns) syndrome is characterised by tall stature with asthenic habitus, macrocephaly, a tall narrow face, maxillary hypoplasia, a high narrow palate with dental crowding, a small or receding chin, long hands with hyperextensible digits, hypernasal speech, hypotonia, mild‐to‐moderate mental retardation, behavioural aberrations and dysgenesis of the corpus callosum. Although Lujan syndrome has not been previously considered to be in the differential diagnosis of FG syndrome, there are some overlapping clinical manifestations. Specifically, these are dysgenesis of the corpus callosum, macrocephaly/relative macrocephaly, a tall forehead, hypotonia, mental retardation and behavioural disturbances. Thus, it seems that these two X‐linked mental retardation syndromes are allelic, with mutations in the MED12 gene.
Somatically acquired epigenetic changes are present in many cancers. Epigenetic regulation is maintained via post-translational modifications of core histones. Here, we describe inactivating somatic mutations in the histone lysine demethylase, UTX, pointing to histone H3 lysine methylation deregulation in multiple tumour types. UTX reintroduction into cancer cells with inactivating UTX mutations resulted in slowing of proliferation and marked transcriptional changes. These data identify UTX as a new human cancer gene.
Large-scale systematic resequencing has been proposed as the key future strategy for the discovery of rare, disease-causing sequence variants across the spectrum of human complex disease. We have sequenced the coding exons of the X chromosome in 208 families with X-linked mental retardation (XLMR), the largest direct screen for constitutional disease-causing mutations thus far reported. The screen has discovered nine genes implicated in XLMR, including SYP, ZNF711 and CASK reported here, confirming the power of this strategy. The study has, however, also highlighted issues confronting whole-genome sequencing screens, including the observation that loss of function of 1% or more of X-chromosome genes is compatible with apparently normal existence.
The Catalogue of Somatic Mutations in Cancer (COSMIC) (http://www.sanger.ac.uk/cosmic/) is the largest public resource for information on somatically acquired mutations in human cancer and is freely available without restrictions. Currently (v43, August 2009), COSMIC contains details of 1.5 million experiments performed through 13 423 genes in almost 370 000 tumours, describing over 90 000 individual mutations. Data are gathered from two sources: publications in the scientific literature (v43 contains 7797 curated articles) and the full output of the genome-wide screens from the Cancer Genome Project (CGP) at the Sanger Institute, UK. Most of the world’s literature on point mutations in human cancer has now been curated into COSMIC and, while this is continually updated, a greater emphasis on curating fusion gene mutations is driving the expansion of this information; over 2700 fusion gene mutations are now described. Whole-genome sequencing screens are now identifying large numbers of genomic rearrangements in cancer, and COSMIC now displays details of these analyses as well. Examination of COSMIC’s data is primarily web-driven, focused on providing mutation range and frequency statistics based upon a choice of gene and/or cancer phenotype. Graphical views provide easily interpretable summaries of large quantities of data, and export functions can provide precise details of user-selected data.
Cancers arise owing to mutations in a subset of genes that confer growth advantage. The availability of the human genome sequence led us to propose that systematic resequencing of cancer genomes for mutations would lead to the discovery of many additional cancer genes. Here we report more than 1,000 somatic mutations found in 274 megabases (Mb) of DNA corresponding to the coding exons of 518 protein kinase genes in 210 diverse human cancers. There was substantial variation in the number and pattern of mutations in individual cancers reflecting different exposures, DNA repair defects and cellular origins. Most somatic mutations are likely to be ‘passengers’ that do not contribute to oncogenesis. However, there was evidence for ‘driver’ mutations contributing to the development of the cancers studied in approximately 120 genes. Systematic sequencing of cancer genomes therefore reveals the evolutionary diversity of cancers and implicates a larger repertoire of cancer genes than previously anticipated.
Human cancers often carry many somatically acquired genomic rearrangements, some of which may be implicated in cancer development. However, conventional strategies for characterizing rearrangements are laborious and low-throughput and have low sensitivity or poor resolution. We used massively parallel sequencing to generate sequence reads from both ends of short DNA fragments derived from the genomes of two individuals with lung cancer. By investigating read pairs that did not align correctly with respect to each other on the reference human genome, we characterized 306 germline structural variants and 103 somatic rearrangements to the base-pair level of resolution. The patterns of germline and somatic rearrangement were markedly different. Many somatic rearrangements were from amplicons, although rearrangements outside these regions, notably including tandem duplications, were also observed. Some somatic rearrangements led to abnormal transcripts, including two from internal tandem duplications and two fusion transcripts created by interchromosomal rearrangements. Germline variants were predominantly mediated by retrotransposition, often involving AluY and LINE elements. The results demonstrate the feasibility of systematic, genome-wide characterization of rearrangements in complex human cancer genomes, raising the prospect of a new harvest of genes associated with cancer using this strategy.
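The idea of investigating read pairs that do not align correctly with respect to each other can be sketched as a small classifier. This is a simplified illustration rather than the pipeline used in the study: the `ReadPair` type, the category labels and the 600 bp maximum insert size are hypothetical, and the orientation rules assume a standard short-insert library in which a correctly aligned pair has its leftmost read on the forward strand and its rightmost read on the reverse strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Aligned positions of the two ends of one sequenced DNA fragment."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str  # "+" or "-"


def classify_pair(pair: ReadPair, max_insert: int = 600) -> str:
    """Assign a read pair to a coarse category. `max_insert` is an assumed
    upper bound on the expected fragment length, not a value from the source."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    # Order the two ends by coordinate so orientation can be interpreted.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion-like"           # both ends map to the same strand
    if left_strand == "-" and right_strand == "+":
        return "tandem-duplication-like"  # ends face away from each other
    if right_pos - left_pos > max_insert:
        return "deletion-like"            # correct orientation, too far apart
    return "concordant"
```

In a real analysis, candidate rearrangements would be supported by several independent discordant pairs clustering at the same two loci, and somatic events would be separated from germline variants by comparison with matched normal DNA.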