Clonal evolution is a key feature of cancer progression and relapse. We studied intratumoral heterogeneity in 149 chronic lymphocytic leukemia (CLL) cases by integrating whole-exome sequence and copy number to measure the fraction of cancer cells harboring each somatic mutation. We identified driver mutations as predominantly clonal (e.g., MYD88, trisomy 12 and del(13q)) or subclonal (e.g., SF3B1, TP53), corresponding to earlier and later events in CLL evolution. We sampled leukemia cells from 18 patients at two timepoints. Ten of 12 CLL cases treated with chemotherapy (but only 1 of 6 without treatment) underwent clonal evolution, predominantly involving subclones with driver mutations (e.g., SF3B1, TP53) that expanded over time. Furthermore, presence of a subclonal driver mutation was an independent risk factor for rapid disease progression. Our study thus uncovers patterns of clonal evolution in CLL, providing insights into its stepwise transformation, and links the presence of subclones with adverse clinical outcome.
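The idea of converting an observed allele fraction and local copy number into a fraction of cancer cells can be illustrated with a minimal sketch. It assumes that sample purity is known and uses a commonly cited relationship between variant allele fraction and cancer cell fraction; the function name, parameters, and example numbers are hypothetical and are not taken from the study, whose actual inference procedure is not described in the abstract.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Point estimate of the fraction of cancer cells carrying a mutation.

    Assumed model (illustrative, not the study's method):
        expected VAF = purity * multiplicity * CCF
                       / (purity * tumor_cn + (1 - purity) * normal_cn)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1 or tumor_cn < multiplicity:
        raise ValueError("multiplicity must be between 1 and tumor_cn")
    # Average number of copies of the locus per cell in the mixed sample.
    avg_copies = purity * tumor_cn + (1 - purity) * normal_cn
    ccf = vaf * avg_copies / (purity * multiplicity)
    # Sampling noise can push the estimate above 1; cap it.
    return min(ccf, 1.0)


# Hypothetical diploid locus in a sample that is 80% tumor cells.
clonal_like = cancer_cell_fraction(vaf=0.40, purity=0.8, tumor_cn=2)
subclonal_like = cancer_cell_fraction(vaf=0.10, purity=0.8, tumor_cn=2)
```

Under these assumptions, a heterozygous mutation at 40% allele fraction is consistent with presence in all cancer cells, whereas one at 10% is consistent with a subclone of roughly a quarter of them. A point estimate like this ignores read-count uncertainty, which a real analysis would need to model.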
Major international projects are now underway aimed at creating a comprehensive catalog of all genes responsible for the initiation and progression of cancer. These studies involve sequencing of matched tumor–normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here, we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds. The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false positive findings that overshadow true driver events. We show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumor-normal pairs and discover extraordinary variation in (i) mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease etiology, and (ii) mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and allow true cancer genes to rise to attention.
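Why a uniform background rate inflates the list of significant genes can be shown with a toy calculation. The sketch below is not MutSigCV; it is a simple Poisson tail test, and the gene size, sample count, mutation count, and both background rates are hypothetical values chosen only to illustrate the effect of using a gene-specific rate.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def gene_p_value(n_mutations, coding_bases, n_samples, rate_per_base):
    """P-value for seeing at least n_mutations under a given background rate."""
    expected = coding_bases * n_samples * rate_per_base
    return poisson_tail(n_mutations, expected)


# Hypothetical long gene: 100 kb of coding sequence, 100 tumors, 40 mutations.
p_global = gene_p_value(40, 100_000, 100, rate_per_base=2e-6)  # cohort-wide rate
p_local = gene_p_value(40, 100_000, 100, rate_per_base=4e-6)   # gene-specific rate
```

With the cohort-wide rate the gene looks strongly significant, but if its local background rate is twice as high (for example because of late replication or low expression), the same count is unremarkable. A real method would also need to account for per-patient rates and mutation spectra.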
While genetic lesions responsible for some Mendelian disorders can be rapidly discovered through massively parallel sequencing (MPS) of whole genomes or exomes, not all diseases readily yield to such efforts. We describe the illustrative case of the simple Mendelian disorder medullary cystic kidney disease type 1 (MCKD1), mapped more than a decade ago to a 2-Mb region on chromosome 1. Ultimately, only through cloning, capillary sequencing, and de novo assembly did we find that each of six MCKD1 families harbors an equivalent, but apparently independently arising, mutation in sequence dramatically underrepresented in MPS data: the insertion of a single C in one copy (but a different copy in each family) of the repeat unit comprising the extremely long (~1.5-5 kb), GC-rich (>80%), coding VNTR in the mucin 1 gene. The results provide a cautionary tale about the challenges in identifying genes responsible for Mendelian, let alone more complex, disorders through MPS.
Detection of somatic point substitutions is a key step in characterizing the cancer genome. Mutations in cancer are rare (0.1–100/Mb) and often occur only in a subset of the sequenced cells, either due to contamination by normal cells or due to tumor heterogeneity. Consequently, mutation calling methods need to be both specific, avoiding false positives, and sensitive to detect clonal and sub-clonal mutations. The decreased sensitivity of existing methods for low allelic fraction mutations highlights the pressing need for improved and systematically evaluated mutation detection methods. Here we present MuTect, a method based on a Bayesian classifier designed to detect somatic mutations with very low allele-fractions, requiring only a few supporting reads, followed by a set of carefully tuned filters that ensure high specificity. We also describe novel benchmarking approaches, which use real sequencing data to evaluate the sensitivity and specificity as a function of sequencing depth, base quality and allelic fraction. Compared with other methods, MuTect has higher sensitivity with similar specificity, especially for mutations with allelic fractions as low as 0.1 and below, making MuTect particularly useful for studying cancer subclones and their evolution in standard exome and genome sequencing data.
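A log-odds comparison of the kind a Bayesian variant classifier might use can be sketched as follows. The per-read likelihood model, the use of the observed alternate-read fraction as the allele-fraction estimate, and all names here are assumptions made for illustration; the abstract does not specify these details, and this is not the MuTect implementation.

```python
import math


def read_likelihood(base, ref, alt, err, f):
    """Likelihood of one observed base given allele fraction f and error rate err."""
    if base == ref:
        return f * err / 3 + (1 - f) * (1 - err)
    if base == alt:
        return f * (1 - err) + (1 - f) * err / 3
    return err / 3  # any other base is treated as a sequencing error


def variant_lod(bases, quals, ref, alt):
    """log10 odds of 'variant at the observed fraction' versus 'no variant'."""
    if not bases or len(bases) != len(quals):
        raise ValueError("bases and quals must be non-empty and equal in length")
    f_hat = sum(b == alt for b in bases) / len(bases)

    def log_likelihood(f):
        return sum(
            math.log10(read_likelihood(b, ref, alt, 10 ** (-q / 10), f))
            for b, q in zip(bases, quals)
        )

    return log_likelihood(f_hat) - log_likelihood(0.0)


# Hypothetical pileup: 30 reads at Phred quality 30, three supporting the variant.
lod_low_fraction = variant_lod(["A"] * 27 + ["T"] * 3, [30] * 30, ref="A", alt="T")
lod_no_support = variant_lod(["A"] * 30, [30] * 30, ref="A", alt="T")
```

In this toy pileup, three high-quality supporting reads at 10% allele fraction already give a clearly positive score, while a site with no supporting reads scores zero. Any calling threshold, and the filters applied afterwards, would have to be chosen and validated separately.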
The incidence of esophageal adenocarcinoma (EAC) has risen 600% over the last 30 years. With a five-year survival rate of 15%, identification of new therapeutic targets for EAC is greatly important. We analyze the mutation spectra from whole exome sequencing of 149 EAC tumor/normal pairs, 15 of which have also been subjected to whole genome sequencing. We identify a mutational signature defined by a high prevalence of A to C transversions at AA dinucleotides. Statistical analysis of exome data identified 26 significantly mutated genes. Of these genes, four (TP53, CDKN2A, SMAD4, and PIK3CA) have been previously implicated in EAC. The novel significantly mutated genes include chromatin modifying factors and candidate contributors: SPG20, TLR4, ELMO1, and DOCK2. Functional analyses of EAC-derived mutations in ELMO1 reveal increased cellular invasion. Therefore, we suggest a new hypothesis about the potential activation of the RAC1 pathway to be a contributor to EAC tumorigenesis.
Prior studies have identified recurrent oncogenic mutations in colorectal adenocarcinoma [1] and have surveyed exons of protein-coding genes for mutations in 11 affected individuals [2,3]. Here we report whole-genome sequencing from nine individuals with colorectal cancer, including primary colorectal tumors and matched adjacent non-tumor tissues, at an average of 30.7× and 31.9× coverage, respectively. We identify an average of 75 somatic rearrangements per tumor, including complex networks of translocations between pairs of chromosomes. Eleven rearrangements encode predicted in-frame fusion proteins, including a fusion of VTI1A and TCF7L2 found in 3 out of 97 colorectal cancers. Although TCF7L2 encodes TCF4, which cooperates with β-catenin [4] in colorectal carcinogenesis [5,6], the fusion lacks the TCF4 β-catenin–binding domain. Using RNA interference-mediated knockdown, we found that a colorectal carcinoma cell line harboring the fusion gene depends on VTI1A-TCF7L2 for anchorage-independent growth. This study shows previously unidentified levels of genomic rearrangements in colorectal carcinoma that can lead to essential gene fusions and other oncogenic events.
Cytogenetic and molecular cytogenetic studies demonstrate association between congenital diaphragmatic hernia (CDH) and chromosome 1q41q42 deletions. In this study, we screened a large CDH cohort (N=179) for microdeletions in this interval by the multiplex ligation-dependent probe amplification (MLPA) technique, and also sequenced two candidate genes located therein, dispatched 1 (DISP1) and homo sapiens H2.0-like homeobox (HLX). MLPA analysis verified deletions of this region in two cases, an unreported patient with a 46,XY,del(1)(q41q42.13) karyotype and a previously reported patient with a Fryns syndrome phenotype [Kantarci et al., 2006]. HLX sequencing showed a novel but maternally inherited single nucleotide variant (c.27C>G) in a patient with isolated CDH, while DISP1 sequencing revealed a mosaic de novo heterozygous substitution (c.4412C>G; p.Ala1471Gly) in a male with a left-sided Bochdalek hernia plus multiple other anomalies. Pyrosequencing demonstrated the mutant allele was present in 43%, 12%, and 4.5% of the patient’s lymphoblastoid, peripheral blood lymphocytes, and saliva cells, respectively. We examined Disp1 expression at day E11.5 of mouse diaphragm formation and confirmed its presence in the pleuroperitoneal fold, as well as the nearby lung which also expresses Sonic hedgehog (Shh).
Our report describes the first de novo DISP1 point mutation in a patient with complex CDH. Combining this finding with Disp1 embryonic mouse diaphragm and lung tissue expression, as well as previously reported human chromosome 1q41q42 aberrations in patients with CDH, suggests that DISP1 may warrant further consideration as a CDH candidate gene.
congenital diaphragmatic hernia (CDH); chromosome 1q41q42 deletion; microdeletion; MLPA; pyrosequencing; Sonic Hedgehog (SHH) pathway; pleuroperitoneal fold (PPF)
Lung adenocarcinoma, the most common subtype of non-small cell lung cancer, is responsible for over 500,000 deaths per year worldwide. Here, we report exome and genome sequences of 183 lung adenocarcinoma tumor/normal DNA pairs. These analyses revealed a mean exonic somatic mutation rate of 12.0 events/megabase and identified the majority of genes previously reported as significantly mutated in lung adenocarcinoma. In addition, we identified statistically recurrent somatic mutations in the splicing factor gene U2AF1 and truncating mutations affecting RBM10 and ARID1A. Analysis of nucleotide context-specific mutation signatures grouped the sample set into distinct clusters that correlated with smoking history and alterations of reported lung adenocarcinoma genes. Whole genome sequence analysis revealed frequent structural re-arrangements, including in-frame exonic alterations within EGFR and SIK2 kinases. The candidate genes identified in this study are attractive targets for biological characterization and therapeutic targeting of lung adenocarcinoma.
Neuroblastoma is a malignancy of the developing sympathetic nervous system that often presents with widespread metastatic disease, resulting in survival rates of less than 50% [1]. To determine the spectrum of somatic mutation in high-risk neuroblastoma, we studied 240 cases using a combination of whole exome, genome and transcriptome sequencing as part of the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative. Here we report a low median exonic mutation frequency of 0.60 per megabase (0.48 non-silent), and remarkably few recurrently mutated genes in these tumors. Genes with significant somatic mutation frequencies included ALK (9.2% of cases), PTPN11 (2.9%), ATRX (2.5%, an additional 7.1% had focal deletions), MYCN (1.7%, a recurrent p.Pro44Leu alteration), and NRAS (0.83%). Rare, potentially pathogenic germline variants were significantly enriched in ALK, CHEK2, PINK1, and BARD1. The relative paucity of recurrent somatic mutations in neuroblastoma challenges current therapeutic strategies reliant upon frequently altered oncogenic drivers.
The somatic genetic basis of chronic lymphocytic leukemia, a common and clinically heterogeneous leukemia occurring in adults, remains poorly understood.
We obtained DNA samples from leukemia cells in 91 patients with chronic lymphocytic leukemia and performed massively parallel sequencing of 88 whole exomes and whole genomes, together with sequencing of matched germline DNA, to characterize the spectrum of somatic mutations in this disease.
Nine genes that are mutated at significant frequencies were identified, including four with established roles in chronic lymphocytic leukemia (TP53 in 15% of patients, ATM in 9%, MYD88 in 10%, and NOTCH1 in 4%) and five with unestablished roles (SF3B1, ZMYM3, MAPK1, FBXW7, and DDX3X). SF3B1, which functions at the catalytic core of the spliceosome, was the second most frequently mutated gene (with mutations occurring in 15% of patients). SF3B1 mutations occurred primarily in tumors with deletions in chromosome 11q, which are associated with a poor prognosis in patients with chronic lymphocytic leukemia. We further discovered that tumor samples with mutations in SF3B1 had alterations in pre–messenger RNA (mRNA) splicing.
Our study defines the landscape of somatic mutations in chronic lymphocytic leukemia and highlights pre-mRNA splicing as a critical cellular process contributing to chronic lymphocytic leukemia.
Multiple myeloma is an incurable malignancy of plasma cells, and its pathogenesis is poorly understood. Here we report the massively parallel sequencing of 38 tumor genomes and their comparison to matched normal DNAs. Several new and unexpected oncogenic mechanisms were suggested by the pattern of somatic mutation across the dataset. These include the mutation of genes involved in protein translation (seen in nearly half of the patients), genes involved in histone methylation, and genes involved in blood coagulation. In addition, a broader than anticipated role of NF-κB signaling was suggested by mutations in 11 members of the NF-κB pathway. Of potential immediate clinical relevance, activating mutations of the kinase BRAF were observed in 4% of patients, suggesting the evaluation of BRAF inhibitors in multiple myeloma clinical trials. These results indicate that cancer genome sequencing of large collections of samples will yield new insights into cancer not anticipated by existing knowledge.
Because of the high risk of recurrence in high-grade serous ovarian carcinoma (HGS-OvCa), the development of outcome predictors could be valuable for patient stratification. Using the catalog of The Cancer Genome Atlas (TCGA), we developed subtype and survival gene expression signatures, which, when combined, provide a prognostic model of HGS-OvCa classification, named “Classification of Ovarian Cancer” (CLOVAR). We validated CLOVAR on an independent dataset consisting of 879 HGS-OvCa expression profiles. The worst outcome group, accounting for 23% of all cases, was associated with a median survival of 23 months and a platinum resistance rate of 63%, versus a median survival of 46 months and platinum resistance rate of 23% in other cases. Associating the outcome prediction model with BRCA1/BRCA2 mutation status, residual disease after surgery, and disease stage further optimized outcome classification. Ovarian cancer is a disease in urgent need of more effective therapies. The spectrum of outcomes observed here and their association with CLOVAR signatures suggests variations in underlying tumor biology. Prospective validation of the CLOVAR model in the context of additional prognostic variables may provide a rationale for optimal combination of patient and treatment regimens.
Melanoma is notable for its metastatic propensity, lethality in the advanced setting, and association with ultraviolet (UV) exposure early in life [1]. To obtain a comprehensive genomic view of melanoma, we sequenced the genomes of 25 metastatic melanomas and matched germline DNA. A wide range of point mutation rates was observed: lowest in melanomas whose primaries arose on non-UV exposed hairless skin of the extremities (3 and 14 per Mb genome), intermediate in those originating from hair-bearing skin of the trunk (range = 5 to 55 per Mb), and highest in a patient with a documented history of chronic sun exposure (111 per Mb). Analysis of whole-genome sequence data identified PREX2, a PTEN-interacting protein and negative regulator of PTEN in breast cancer [2], as a significantly mutated gene with a mutation frequency of approximately 14% in an independent extension cohort of 107 human melanomas. PREX2 mutations are biologically relevant, as ectopic expression of mutant PREX2 accelerated tumor formation of immortalized human melanocytes in vivo. Thus, whole-genome sequencing of human melanoma tumors revealed genomic evidence of UV pathogenesis and discovered a new recurrently mutated gene in melanoma.
The systematic translation of cancer genomic data into knowledge of tumor biology and therapeutic avenues remains challenging. Such efforts should be greatly aided by robust preclinical model systems that reflect the genomic diversity of human cancers and for which detailed genetic and pharmacologic annotation is available [1]. Here we describe the Cancer Cell Line Encyclopedia (CCLE): a compilation of gene expression, chromosomal copy number, and massively parallel sequencing data from 947 human cancer cell lines. When coupled with pharmacologic profiles for 24 anticancer drugs across 479 of the lines, this collection allowed identification of genetic, lineage, and gene expression-based predictors of drug sensitivity. In addition to known predictors, we found that plasma cell lineage correlated with sensitivity to IGF1 receptor inhibitors; AHR expression was associated with MEK inhibitor efficacy in NRAS-mutant lines; and SLFN11 expression predicted sensitivity to topoisomerase inhibitors. Altogether, our results suggest that large, annotated cell line collections may help to enable preclinical stratification schemata for anticancer agents. The generation of genetic predictions of drug response in the preclinical setting and their incorporation into cancer clinical trial design could speed the emergence of “personalized” therapeutic regimens [2].
Head and neck squamous cell carcinoma (HNSCC) is a common, morbid, and frequently lethal malignancy. To uncover its mutational spectrum, we analyzed whole-exome sequencing data from 74 tumor-normal pairs. The majority exhibited a mutational profile consistent with tobacco exposure; human papilloma virus was detectable by sequencing of DNA from infected tumors. In addition to identifying previously known HNSCC genes (TP53, CDKN2A, PTEN, PIK3CA, and HRAS), the analysis revealed many genes not previously implicated in this malignancy. At least 30% of cases harbored mutations in genes that regulate squamous differentiation (e.g., NOTCH1, IRF6, and TP63), implicating its dysregulation as a major driver of HNSCC carcinogenesis. More generally, the results indicate the ability of large-scale sequencing to reveal fundamental tumorigenic mechanisms.
Cancer is principally considered a genetic disease, and numerous mutations are thought essential to drive its growth. However, the existence of genomically stable cancers and the emergence of mutations in genes that encode chromatin remodelers raise the possibility that perturbation of chromatin structure and epigenetic regulation are capable of driving cancer formation. Here we sequenced the exomes of 35 rhabdoid tumors, highly aggressive cancers of early childhood characterized by biallelic loss of SMARCB1, a subunit of the SWI/SNF chromatin remodeling complex. We identified an extremely low rate of mutation, with loss of SMARCB1 being essentially the sole recurrent event. Indeed, in 2 of the cancers there were no other identified mutations. Our results demonstrate that high mutation rates are dispensable for the genesis of cancers driven by mutation of a chromatin remodeling complex. Consequently, cancer can be a remarkably genetically simple disease.
Exome sequencing is a powerful tool for discovery of Mendelian disease genes. Previously, we reported a novel locus for autosomal recessive non-syndromic mental retardation (NSMR) in a consanguineous family [Nolan, D.K., Chen, P., Das, S., Ober, C. and Waggoner, D. (2008) Fine mapping of a locus for nonsyndromic mental retardation on chromosome 19p13. Am. J. Med. Genet. A, 146A, 1414–1422]. Using linkage and homozygosity mapping, we previously localized the gene to chromosome 19p13. The parents of this sibship were recently included in an exome sequencing project. Using a series of filters, we narrowed the putative causal mutation to a single variant site that segregated with NSMR: the mutation was homozygous in five affected siblings but in none of eight unaffected siblings. This mutation causes a substitution of a leucine for a highly conserved proline at amino acid 182 in TECR (trans-2,3-enoyl-CoA reductase), a synaptic glycoprotein. Our results reveal the value of massively parallel sequencing for identification of novel disease genes that could not be found using traditional approaches and identify only the seventh causal mutation for autosomal recessive NSMR.
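The segregation criterion described above (homozygous in every affected sibling, homozygous in no unaffected sibling) can be expressed as a simple filter. This is a minimal sketch under assumed inputs: genotypes are encoded as alternate-allele counts (0, 1, or 2), and the variant labels and sample names are hypothetical placeholders rather than data from the study.

```python
def segregating_variants(genotypes, affected, unaffected):
    """Return variants consistent with fully penetrant recessive inheritance.

    genotypes maps a variant label to {sample: alternate-allele count}.
    A missing genotype makes the variant fail, which is a conservative choice.
    """
    hits = []
    for variant, calls in genotypes.items():
        homozygous_in_affected = all(calls.get(s) == 2 for s in affected)
        not_homozygous_in_unaffected = all(calls.get(s) in (0, 1) for s in unaffected)
        if homozygous_in_affected and not_homozygous_in_unaffected:
            hits.append(variant)
    return hits


# Toy pedigree with two affected and three unaffected siblings.
affected = ["sib1", "sib2"]
unaffected = ["sib3", "sib4", "sib5"]
genotypes = {
    "variant_A": {"sib1": 2, "sib2": 2, "sib3": 1, "sib4": 0, "sib5": 1},
    "variant_B": {"sib1": 2, "sib2": 2, "sib3": 2, "sib4": 0, "sib5": 1},
    "variant_C": {"sib1": 2, "sib2": 1, "sib3": 0, "sib4": 0, "sib5": 1},
}
candidates = segregating_variants(genotypes, affected, unaffected)
```

Only `variant_A` survives here: `variant_B` is homozygous in an unaffected sibling and `variant_C` is heterozygous in an affected one. In practice such a filter would be combined with the linkage interval, allele frequency, and predicted functional effect.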
The Clinic for Special Children (CSC) has integrated biochemical and molecular methods into a rural pediatric practice serving Old Order Amish and Mennonite (Plain) children. Among the Plain people, we have used single nucleotide polymorphism (SNP) microarrays to genetically map recessive disorders to large autozygous haplotype blocks (mean = 4.4 Mb) that contain many genes (mean = 79). For some, uninformative mapping or large gene lists preclude disease-gene identification by Sanger sequencing. Seven such conditions were selected for exome sequencing at the Broad Institute; all had been previously mapped at the CSC using low density SNP microarrays coupled with autozygosity and linkage analyses. Using between 1 and 5 patient samples per disorder, we identified sequence variants in the known disease-causing genes SLC6A3 and FLVCR1, and present evidence to strongly support the pathogenicity of variants identified in TUBGCP6, BRAT1, SNIP1, CRADD, and HARS. Our results reveal the power of coupling new genotyping technologies to population-specific genetic knowledge and robust clinical data.
Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency.
The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data in roughly 1,000 human genes, for nearly 700 samples. Although medical whole-exome projects are currently afoot, this is still the deepest reported sampling of a large number of human genes with next-generation technologies. According to the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data, and discovered 12,758 exonic SNPs, 70% of them novel, and 74% below 1% allele frequency in the seven population samples we examined. Our analysis confirms that coding variants below 1% allele frequency show increased population-specificity and are enriched for functional variants.
This study represents a large step toward detecting and interpreting low frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.
Prostate cancer is the second most common cause of male cancer deaths in the United States. Here we present the complete sequence of seven primary prostate cancers and their paired normal counterparts. Several tumors contained complex chains of balanced rearrangements that occurred within or adjacent to known cancer genes. Rearrangement breakpoints were enriched near open chromatin, androgen receptor and ERG DNA binding sites in the setting of the ETS gene fusion TMPRSS2-ERG, but inversely correlated with these regions in tumors lacking ETS fusions. This observation suggests a link between chromatin or transcriptional regulation and the genesis of genomic aberrations. Three tumors contained rearrangements that disrupted CADM2, and four harbored events disrupting either PTEN (unbalanced events), a prostate tumor suppressor, or MAGI2 (balanced events), a PTEN interacting protein not previously implicated in prostate tumorigenesis. Thus, genomic rearrangements may arise from transcriptional or chromatin aberrancies to engage prostate tumorigenic mechanisms.
We sequenced all protein-coding regions of the genome (the “exome”) in two family members with combined hypolipidemia, marked by extremely low plasma levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, and triglycerides. These two participants were compound heterozygotes for two distinct nonsense mutations in ANGPTL3 (encoding the angiopoietin-like 3 protein). ANGPTL3 has been reported to inhibit lipoprotein lipase and endothelial lipase, thereby increasing plasma triglyceride and HDL cholesterol levels in rodents. Our finding of ANGPTL3 mutations highlights a role for the gene in LDL cholesterol metabolism in humans and shows the usefulness of exome sequencing for identification of novel genetic causes of inherited disorders. (Funded by the National Human Genome Research Institute and others.)
Sarcoma; DNA copy number; Sequencing; RNAi
Joubert syndrome (JBTS), related disorders (JSRD) and Meckel syndrome (MKS) are ciliopathies. We now report that MKS2 and JBTS2 loci are allelic and due to mutations in TMEM216, encoding an uncharacterized tetraspan transmembrane protein. JBTS2 patients displayed frequent nephronophthisis and polydactyly, and two cases conformed to the Oro-Facio-Digital type VI phenotype, whereas skeletal dysplasia was common in MKS fetuses. A single p.R73L mutation was identified in all patients of Ashkenazi Jewish descent (n=10). TMEM216 localized to the base of primary cilia, and loss of TMEM216 in patient fibroblasts or following siRNA knockdown caused defective ciliogenesis and centrosomal docking, with concomitant hyperactivation of RhoA and Dishevelled. TMEM216 complexed with Meckelin, encoded by a gene also mutated in JSRD and MKS. Abrogation of tmem216 expression in zebrafish led to gastrulation defects that overlap with other ciliary morphants. The data implicate a new family of proteins in the ciliopathies, and further support allelism between ciliopathy disorders.
Determining the genetic basis of cancer requires comprehensive analyses of large collections of histopathologically well-classified primary tumours. Here we report the results of a collaborative study to discover somatic mutations in 188 human lung adenocarcinomas. DNA sequencing of 623 genes with known or potential relationships to cancer revealed more than 1,000 somatic mutations across the samples. Our analysis identified 26 genes that are mutated at significantly high frequencies and thus are probably involved in carcinogenesis. The frequently mutated genes include tyrosine kinases, among them the EGFR homologue ERBB4; multiple ephrin receptor genes, notably EPHA3; vascular endothelial growth factor receptor KDR; and NTRK genes. These data provide evidence of somatic mutations in primary lung adenocarcinoma for several tumour suppressor genes involved in other cancers—including NF1, APC, RB1 and ATM—and for sequence changes in PTPRD as well as the frequently deleted gene LRP1B. The observed mutational profiles correlate with clinical features, smoking status and DNA repair defects. These results are reinforced by data integration including single nucleotide polymorphism array and gene expression array. Our findings shed further light on several important signalling pathways involved in lung adenocarcinoma, and suggest new molecular targets for treatment.