1.  Mutational landscape and significance across 12 major cancer types 
Nature  2013;502(7471):333-339.
The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/carcinogen influences, and DNA repair defects. Using the integrated data sets, we identified 127 significantly mutated genes from well-known (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and receptor tyrosine kinase signalling pathways, and cell cycle control) and emerging (for example, histone, histone modification, splicing, metabolism and proteolysis) cellular processes in cancer. The average number of mutations in these significantly mutated genes varies across tumour types; most tumours have two to six, indicating that the number of driver mutations required during oncogenesis is relatively small. Mutations in transcriptional factors/regulators show tissue specificity, whereas histone modifiers are often mutated across several cancer types. Clinical association analysis identifies genes having a significant effect on survival, and investigations of mutations with respect to clonal/subclonal architecture delineate their temporal orders during tumorigenesis. Taken together, these results lay the groundwork for developing new diagnostics and individualizing cancer treatment.
PMCID: PMC3927368  PMID: 24132290
2.  The origin and evolution of mutations in Acute Myeloid Leukemia 
Cell  2012;150(2):264-278.
Most mutations in cancer genomes are thought to be acquired after the initiating event, which may cause genomic instability, driving clonal evolution. However, for acute myeloid leukemia (AML), normal karyotypes are common, and genomic instability is unusual. To better understand clonal evolution in AML, we sequenced the genomes of AML samples with a known initiating event (PML-RARA) vs. normal karyotype AML samples, and the exomes of hematopoietic stem/progenitor cells (HSPCs) from healthy people. Collectively, the data suggest that most of the mutations found in AML genomes are actually random events that occurred in HSPCs before they acquired the initiating mutation; the mutational history of that cell is “captured” as the clone expands. In many cases, only one or two additional, cooperating mutations are needed to generate the malignant founding clone. Cells from the founding clone can acquire additional cooperating mutations, yielding subclones that can contribute to disease progression and/or relapse.
PMCID: PMC3407563  PMID: 22817890
3.  BreakDancer: An algorithm for high resolution mapping of genomic structural variation 
Nature methods  2009;6(9):677-681.
Detection and characterization of genomic structural variation are important for understanding the landscape of genetic variation in human populations and in complex diseases such as cancer. Recent studies demonstrate the feasibility of detecting structural variation using next-generation, short-insert, paired-end sequencing reads. However, the utility of these reads is not entirely clear, nor are the analysis methods under which accurate detection can be achieved. The algorithm BreakDancer predicts a wide variety of structural variants including indels, inversions, and translocations. We examined BreakDancer's performance in simulation, comparison with other methods, analysis of an acute myeloid leukemia sample, and the 1,000 Genomes trio individuals. We found that it substantially improved the detection of small and intermediate size indels from 10 bp to 1 Mbp that are difficult to detect via a single conventional approach.
PMCID: PMC3661775  PMID: 19668202
4.  Re-sequencing Expands Our Understanding of the Phenotypic Impact of Variants at GWAS Loci 
PLoS Genetics  2014;10(1):e1004147.
Genome-wide association studies (GWAS) have identified >500 common variants associated with quantitative metabolic traits, but in aggregate such variants explain at most 20–30% of the heritable component of population variation in these traits. To further investigate the impact of genotypic variation on metabolic traits, we conducted re-sequencing studies in >6,000 members of a Finnish population cohort (The Northern Finland Birth Cohort of 1966 [NFBC]) and a type 2 diabetes case-control sample (The Finland-United States Investigation of NIDDM Genetics [FUSION] study). By sequencing the coding sequence and 5′ and 3′ untranslated regions of 78 genes at 17 GWAS loci associated with one or more of six metabolic traits (serum levels of fasting HDL-C, LDL-C, total cholesterol, triglycerides, plasma glucose, and insulin), and conducting both single-variant and gene-level association tests, we obtained a more complete understanding of phenotype-genotype associations at eight of these loci. At all eight of these loci, the identification of new associations provides significant evidence for multiple genetic signals to one or more phenotypes, and at two loci, in the genes ABCA1 and CETP, we found significant gene-level evidence of association to non-synonymous variants with MAF<1%. Additionally, two potentially deleterious variants that demonstrated significant associations (rs138726309, a missense variant in G6PC2, and rs28933094, a missense variant in LIPC) were considerably more common in these Finnish samples than in European reference populations, supporting our prior hypothesis that deleterious variants could attain high frequencies in this isolated population, likely due to the effects of population bottlenecks. Our results highlight the value of large, well-phenotyped samples for rare-variant association analysis, and the challenge of evaluating the phenotypic impact of such variants.
Author Summary
Abnormal serum levels of various metabolites, including measures relevant to cholesterol, other fats, and sugars, are known to be risk factors for cardiovascular disease and type 2 diabetes. Identification of the genes that play a role in generating such abnormalities could advance the development of new treatment and prevention strategies for these disorders. Investigations of common genetic variants carried out in large sets of research subjects have successfully pinpointed such genes within many regions of the human genome. However, these studies often have not led to the identification of the specific genetic variations affecting metabolic traits. To attempt to detect such causal variations, we sequenced genes in 17 genomic regions implicated in metabolic traits in >6,000 people from Finland. By conducting statistical analyses relating specific variations (individually and grouped by gene) to the measures for these metabolic traits observed in the study subjects, we added to our understanding of how genotypes affect these traits. Our findings support a long-held hypothesis that the unique history of the Finnish population provides important advantages for analyzing the relationship between genetic variations and biomedically important traits.
PMCID: PMC3907339  PMID: 24497850
5.  Clonal Architecture of Secondary Acute Myeloid Leukemia 
The New England Journal of Medicine  2012;366(12):1090-1098.
The myelodysplastic syndromes are a group of hematologic disorders that often evolve into secondary acute myeloid leukemia (AML). The genetic changes that underlie progression from the myelodysplastic syndromes to secondary AML are not well understood.
We performed whole-genome sequencing of seven paired samples of skin and bone marrow in seven subjects with secondary AML to identify somatic mutations specific to secondary AML. We then genotyped a bone marrow sample obtained during the antecedent myelodysplastic-syndrome stage from each subject to determine the presence or absence of the specific somatic mutations. We identified recurrent mutations in coding genes and defined the clonal architecture of each pair of samples from the myelodysplastic-syndrome stage and the secondary-AML stage, using the allele burden of hundreds of mutations.
Approximately 85% of bone marrow cells were clonal in the myelodysplastic-syndrome and secondary-AML samples, regardless of the myeloblast count. The secondary-AML samples contained mutations in 11 recurrently mutated genes, including 4 genes that have not been previously implicated in the myelodysplastic syndromes or AML. In every case, progression to acute leukemia was defined by the persistence of an antecedent founding clone containing 182 to 660 somatic mutations and the outgrowth or emergence of at least one subclone, harboring dozens to hundreds of new mutations. All founding clones and subclones contained at least one mutation in a coding gene.
Nearly all the bone marrow cells in patients with myelodysplastic syndromes and secondary AML are clonally derived. Genetic evolution of secondary AML is a dynamic process shaped by multiple cycles of mutation acquisition and clonal selection. Recurrent gene mutations are found in both founding clones and daughter subclones. (Funded by the National Institutes of Health and others.)
PMCID: PMC3320218  PMID: 22417201
6.  Clonal evolution in relapsed acute myeloid leukemia revealed by whole genome sequencing 
Nature  2012;481(7382):506-510.
Most patients with acute myeloid leukemia (AML) die from progressive disease after relapse, which is associated with clonal evolution at the cytogenetic level. To determine the mutational spectrum associated with relapse, we sequenced the primary tumor and relapse genomes from 8 AML patients, and validated hundreds of somatic mutations using deep sequencing; this allowed us to precisely define clonality and clonal evolution patterns at relapse. Besides discovering novel, recurrently mutated genes (e.g. WAC, SMC3, DIS3, DDX41, and DAXX) in AML, we found two major clonal evolution patterns during AML relapse: 1) the founding clone in the primary tumor gained mutations and evolved into the relapse clone, or 2) a subclone of the founding clone survived initial therapy, gained additional mutations, and expanded at relapse. In all cases, chemotherapy failed to eradicate the founding clone. The comparison of relapse-specific vs. primary tumor mutations in all 8 cases revealed an increase in transversions, probably due to DNA damage caused by cytotoxic chemotherapy. These data demonstrate that AML relapse is associated with the addition of new mutations and clonal evolution, which is shaped in part by the chemotherapy that the patients receive to establish and maintain remissions.
PMCID: PMC3267864  PMID: 22237025
Nature Genetics  2011;44(1):53-57.
Myelodysplastic syndromes (MDS) are hematopoietic stem cell disorders that often progress to chemotherapy-resistant secondary acute myeloid leukemia (sAML). We used whole genome sequencing to perform an unbiased comprehensive screen to discover all the somatic mutations in a sAML sample and genotyped these loci in the matched MDS sample. Here we show that a missense mutation affecting the serine at codon 34 (S34) in U2AF1 was recurrently mutated in 13/150 (8.7%) de novo MDS patients, with suggestive evidence of an associated increased risk of progression to sAML. U2AF1 is a U2 auxiliary factor protein that recognizes the AG splice acceptor dinucleotide at the 3′ end of introns, and mutations are located in highly conserved zinc fingers in U2AF1. Mutant U2AF1 promotes enhanced splicing and exon skipping in reporter assays in vitro. This novel, recurrent mutation in U2AF1 implicates altered pre-mRNA splicing as a potential mechanism for MDS pathogenesis.
PMCID: PMC3247063  PMID: 22158538
8.  Recurring Mutations Found by Sequencing an Acute Myeloid Leukemia Genome 
The New England journal of medicine  2009;361(11):1058-1066.
The full complement of DNA mutations that are responsible for the pathogenesis of acute myeloid leukemia (AML) is not yet known.
We used massively parallel DNA sequencing to obtain a very high level of coverage (approximately 98%) of a primary, cytogenetically normal, de novo genome for AML with minimal maturation (AML-M1) and a matched normal skin genome.
We identified 12 acquired (somatic) mutations within the coding sequences of genes and 52 somatic point mutations in conserved or regulatory portions of the genome. All mutations appeared to be heterozygous and present in nearly all cells in the tumor sample. Four of the 64 mutations occurred in at least 1 additional AML sample in 188 samples that were tested. Mutations in NRAS and NPM1 had been identified previously in patients with AML, but two other mutations had not been identified. One of these mutations, in the IDH1 gene, was present in 15 of 187 additional AML genomes tested and was strongly associated with normal cytogenetic status; it was present in 13 of 80 cytogenetically normal samples (16%). The other was a nongenic mutation in a genomic region with regulatory potential and conservation in higher mammals; we detected it in one additional AML tumor. The AML genome that we sequenced contains approximately 750 point mutations, of which only a small fraction are likely to be relevant to pathogenesis.
By comparing the sequences of tumor and skin genomes of a patient with AML-M1, we have identified recurring mutations that may be relevant for pathogenesis.
PMCID: PMC3201812  PMID: 19657110
9.  DNMT3A Mutations in Acute Myeloid Leukemia 
The New England journal of medicine  2010;363(25):2424-2433.
The genetic alterations responsible for an adverse outcome in most patients with acute myeloid leukemia (AML) are unknown.
Using massively parallel DNA sequencing, we identified a somatic mutation in DNMT3A, encoding a DNA methyltransferase, in the genome of cells from a patient with AML with a normal karyotype. We sequenced the exons of DNMT3A in 280 additional patients with de novo AML to define recurring mutations.
A total of 62 of 281 patients (22.1%) had mutations in DNMT3A that were predicted to affect translation. We identified 18 different missense mutations, the most common of which was predicted to affect amino acid R882 (in 37 patients). We also identified six frameshift, six nonsense, and three splice-site mutations and a 1.5-Mbp deletion encompassing DNMT3A. These mutations were highly enriched in the group of patients with an intermediate-risk cytogenetic profile (56 of 166 patients, or 33.7%) but were absent in all 79 patients with a favorable-risk cytogenetic profile (P<0.001 for both comparisons). The median overall survival among patients with DNMT3A mutations was significantly shorter than that among patients without such mutations (12.3 months vs. 41.1 months, P<0.001). DNMT3A mutations were associated with adverse outcomes among patients with an intermediate-risk cytogenetic profile or FLT3 mutations, regardless of age, and were independently associated with a poor outcome in Cox proportional-hazards analysis.
DNMT3A mutations are highly recurrent in patients with de novo AML with an intermediate-risk cytogenetic profile and are independently associated with a poor outcome. (Funded by the National Institutes of Health and others.)
PMCID: PMC3201818  PMID: 21067377
10.  Sequencing a mouse acute promyelocytic leukemia genome reveals genetic events relevant for disease progression 
The Journal of Clinical Investigation  2011;121(4):1445-1455.
Acute promyelocytic leukemia (APL) is a subtype of acute myeloid leukemia (AML). It is characterized by the t(15;17)(q22;q11.2) chromosomal translocation that creates the promyelocytic leukemia–retinoic acid receptor α (PML-RARA) fusion oncogene. Although this fusion oncogene is known to initiate APL in mice, other cooperating mutations, as yet ill defined, are important for disease pathogenesis. To identify these, we used a mouse model of APL, whereby PML-RARA expressed in myeloid cells leads to a myeloproliferative disease that ultimately evolves into APL. Sequencing of a mouse APL genome revealed 3 somatic, nonsynonymous mutations relevant to APL pathogenesis, of which 1 (Jak1 V657F) was found to be recurrent in other affected mice. This mutation was identical to the JAK1 V658F mutation previously found in human APL and acute lymphoblastic leukemia samples. Further analysis showed that JAK1 V658F cooperated in vivo with PML-RARA, causing a rapidly fatal leukemia in mice. We also discovered a somatic 150-kb deletion involving the lysine (K)-specific demethylase 6A (Kdm6a, also known as Utx) gene, in the mouse APL genome. Similar deletions were observed in 3 out of 14 additional mouse APL samples and 1 out of 150 human AML samples. In conclusion, whole genome sequencing of mouse cancer genomes can provide an unbiased and comprehensive approach for discovering functionally relevant mutations that are also present in human leukemias.
PMCID: PMC3069786  PMID: 21436584
11.  CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data 
Bioinformatics  2009;26(4):464-469.
Motivation: DNA copy number aberration (CNA) is a hallmark of genomic abnormality in tumor cells. Recurrent CNA (RCNA) occurs in multiple cancer samples across the same chromosomal region and has greater implication in tumorigenesis. Current commonly used methods for RCNA identification require CNA calling for individual samples before cross-sample analysis. This two-step strategy may result in a heavy computational burden, as well as a loss of the overall statistical power due to segmentation and discretization of individual sample's data. We propose a population-based approach for RCNA detection with no need of single-sample analysis, which is statistically powerful, computationally efficient and particularly suitable for high-resolution and large-population studies.
Results: Our approach, correlation matrix diagonal segmentation (CMDS), identifies RCNAs based on a between-chromosomal-site correlation analysis. Directly using the raw intensity ratio data from all samples and adopting a diagonal transformation strategy, CMDS substantially reduces computational burden and can obtain results very quickly from large datasets. Our simulation indicates that the statistical power of CMDS is higher than that of single-sample CNA calling based two-step approaches. We applied CMDS to two real datasets of lung cancer and brain cancer from Affymetrix and Illumina array platforms, respectively, and successfully identified known regions of CNA associated with EGFR, KRAS and other important oncogenes. CMDS provides a fast, powerful and easily implemented tool for the RCNA analysis of large-scale data from cancer genomes.
Availability: The R and C programs implementing our method are available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2852218  PMID: 20031968
12.  Genome Remodeling in a Basal-like Breast Cancer Metastasis and Xenograft 
Nature  2010;464(7291):999-1005.
Massively parallel DNA sequencing technologies provide an unprecedented ability to screen entire genomes for genetic changes associated with tumor progression. Here we describe the genomic analyses of four DNA samples from an African-American patient with basal-like breast cancer: peripheral blood, the primary tumor, a brain metastasis, and a xenograft derived from the primary tumor. The metastasis contained two de novo mutations and a large deletion not present in the primary tumor, and was significantly enriched for 20 shared mutations. The xenograft retained all primary tumor mutations, and displayed a mutation enrichment pattern that paralleled the metastasis (16 of 20 genes). Two overlapping large deletions, encompassing CTNNA1, were present in all three tumor samples. The differential mutation frequencies and structural variation patterns in metastasis and xenograft compared to the primary tumor suggest that secondary tumors may arise from a minority of cells within the primary.
PMCID: PMC2872544  PMID: 20393555
13.  VarScan: variant detection in massively parallel sequencing of individual and pooled samples 
Bioinformatics  2009;25(17):2283-2285.
Summary: Massively parallel sequencing technologies hold incredible promise for the study of DNA sequence variation, particularly the identification of variants affecting human disease. The unprecedented throughput and relatively short read lengths of Roche/454, Illumina/Solexa, and other platforms have spurred development of a new generation of sequence alignment algorithms. Yet detection of sequence variants based on short read alignments remains challenging, and most currently available tools are limited to a single platform or aligner type. We present VarScan, an open source tool for variant detection that is compatible with several short read aligners. We demonstrate VarScan's ability to detect SNPs and indels with high sensitivity and specificity, in both Roche/454 sequencing of individuals and deep Illumina/Solexa sequencing of pooled samples.
Availability and Implementation: Source code and documentation freely available at implemented as a Perl package and supported on Linux/UNIX, MS Windows and Mac OSX.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2734323  PMID: 19542151
14.  Novel MEK1 Mutation Identified by Mutational Analysis of Epidermal Growth Factor Receptor Signaling Pathway Genes in Lung Adenocarcinoma 
Cancer research  2008;68(14):5524-5528.
Genetic lesions affecting a number of kinases and other elements within the epidermal growth factor receptor (EGFR) signaling pathway have been implicated in the pathogenesis of human non–small-cell lung cancer (NSCLC). We performed mutational profiling of a large cohort of lung adenocarcinomas to uncover other potential somatic mutations in genes of this pathway that could contribute to lung tumorigenesis. We have identified in 2 of 207 primary lung tumors a somatic activating mutation in exon 2 of MEK1 (i.e., mitogen-activated protein kinase kinase 1 or MAP2K1) that substitutes asparagine for lysine at amino acid 57 (K57N) in the nonkinase portion of the kinase. Neither of these two tumors harbored known mutations in other genes encoding components of the EGFR signaling pathway (i.e., EGFR, HER2, KRAS, PIK3CA, and BRAF). Expression of mutant, but not wild-type, MEK1 leads to constitutive activity of extracellular signal–regulated kinase (ERK)-1/2 in human 293T cells and to growth factor–independent proliferation of murine Ba/F3 cells. A selective MEK inhibitor, AZD6244, inhibits mutant-induced ERK activity in 293T cells and growth of mutant-bearing Ba/F3 cells. We also screened 85 NSCLC cell lines for MEK1 exon 2 mutations; one line (NCI-H1437) harbors a Q56P substitution, a known transformation-competent allele of MEK1 originally identified in rat fibroblasts, and is sensitive to treatment with AZD6244. MEK1 mutants have not previously been reported in lung cancer and may provide a target for effective therapy in a small subset of patients with lung adenocarcinoma.
PMCID: PMC2586155  PMID: 18632602
15.  Somatic mutations affect key pathways in lung adenocarcinoma 
Ding, Li | Getz, Gad | Wheeler, David A. | Mardis, Elaine R. | McLellan, Michael D. | Cibulskis, Kristian | Sougnez, Carrie | Greulich, Heidi | Muzny, Donna M. | Morgan, Margaret B. | Fulton, Lucinda | Fulton, Robert S. | Zhang, Qunyuan | Wendl, Michael C. | Lawrence, Michael S. | Larson, David E. | Chen, Ken | Dooling, David J. | Sabo, Aniko | Hawes, Alicia C. | Shen, Hua | Jhangiani, Shalini N. | Lewis, Lora R. | Hall, Otis | Zhu, Yiming | Mathew, Tittu | Ren, Yanru | Yao, Jiqiang | Scherer, Steven E. | Clerc, Kerstin | Metcalf, Ginger A. | Ng, Brian | Milosavljevic, Aleksandar | Gonzalez-Garay, Manuel L. | Osborne, John R. | Meyer, Rick | Shi, Xiaoqi | Tang, Yuzhu | Koboldt, Daniel C. | Lin, Ling | Abbott, Rachel | Miner, Tracie L. | Pohl, Craig | Fewell, Ginger | Haipek, Carrie | Schmidt, Heather | Dunford-Shore, Brian H. | Kraja, Aldi | Crosby, Seth D. | Sawyer, Christopher S. | Vickery, Tammi | Sander, Sacha | Robinson, Jody | Winckler, Wendy | Baldwin, Jennifer | Chirieac, Lucian R. | Dutt, Amit | Fennell, Tim | Hanna, Megan | Johnson, Bruce E. | Onofrio, Robert C. | Thomas, Roman K. | Tonon, Giovanni | Weir, Barbara A. | Zhao, Xiaojun | Ziaugra, Liuda | Zody, Michael C. | Giordano, Thomas | Orringer, Mark B. | Roth, Jack A. | Spitz, Margaret R. | Wistuba, Ignacio I. | Ozenberger, Bradley | Good, Peter J. | Chang, Andrew C. | Beer, David G. | Watson, Mark A. | Ladanyi, Marc | Broderick, Stephen | Yoshizawa, Akihiko | Travis, William D. | Pao, William | Province, Michael A. | Weinstock, George M. | Varmus, Harold E. | Gabriel, Stacey B. | Lander, Eric S. | Gibbs, Richard A. | Meyerson, Matthew | Wilson, Richard K.
Nature  2008;455(7216):1069-1075.
Determining the genetic basis of cancer requires comprehensive analyses of large collections of histopathologically well-classified primary tumours. Here we report the results of a collaborative study to discover somatic mutations in 188 human lung adenocarcinomas. DNA sequencing of 623 genes with known or potential relationships to cancer revealed more than 1,000 somatic mutations across the samples. Our analysis identified 26 genes that are mutated at significantly high frequencies and thus are probably involved in carcinogenesis. The frequently mutated genes include tyrosine kinases, among them the EGFR homologue ERBB4; multiple ephrin receptor genes, notably EPHA3; vascular endothelial growth factor receptor KDR; and NTRK genes. These data provide evidence of somatic mutations in primary lung adenocarcinoma for several tumour suppressor genes involved in other cancers—including NF1, APC, RB1 and ATM—and for sequence changes in PTPRD as well as the frequently deleted gene LRP1B. The observed mutational profiles correlate with clinical features, smoking status and DNA repair defects. These results are reinforced by data integration including single nucleotide polymorphism array and gene expression array. Our findings shed further light on several important signalling pathways involved in lung adenocarcinoma, and suggest new molecular targets for treatment.
PMCID: PMC2694412  PMID: 18948947
16.  DNA sequencing of a cytogenetically normal acute myeloid leukemia genome 
Nature  2008;456(7218):66-72.
Lay Summary
Acute myeloid leukemia is a highly malignant hematopoietic tumor that affects about 13,000 adults yearly in the United States. The treatment of this disease has changed little in the past two decades, since most of the genetic events that initiate the disease remain undiscovered. Whole genome sequencing is now possible at a reasonable cost and timeframe to utilize this approach for unbiased discovery of tumor-specific somatic mutations that alter the protein-coding genes. Here we show the results obtained by sequencing a typical acute myeloid leukemia genome and its matched normal counterpart, obtained from the patient’s skin. We discovered 10 genes with acquired mutations; two were previously described mutations thought to contribute to tumor progression, and 8 were novel mutations present in virtually all tumor cells at presentation and relapse, whose function is not yet known. Our study establishes whole genome sequencing as an unbiased method for discovering initiating mutations in cancer genomes, and for identifying novel genes that may respond to targeted therapies.
We used massively parallel sequencing technology to sequence the genomic DNA of tumor and normal skin cells obtained from a patient with a typical presentation of FAB M1 Acute Myeloid Leukemia (AML) with normal cytogenetics. 32.7-fold ‘haploid’ coverage (98 billion bases) was obtained for the tumor genome, and 13.9-fold coverage (41.8 billion bases) was obtained for the normal sample. Of 2,647,695 well-supported Single Nucleotide Variants (SNVs) found in the tumor genome, 2,588,486 (97.7%) also were detected in the patient’s skin genome, limiting the number of variants that required further study. For the purposes of this initial study, we restricted our downstream analysis to the coding sequences of annotated genes: we found only eight heterozygous, non-synonymous somatic SNVs in the entire genome. All were novel, including mutations in protocadherin/cadherin family members (CDH24 and PCLKC), G-protein coupled receptors (GPR123 and EBI2), a protein phosphatase (PTPRT), a potential guanine nucleotide exchange factor (KNDC1), a peptide/drug transporter (SLC15A1), and a glutamate receptor gene (GRINL1B). We also detected previously described, recurrent somatic insertions in the FLT3 and NPM1 genes. Based on deep readcount data, we determined that all of these mutations (except FLT3) were present in nearly all tumor cells at presentation, and again at relapse 11 months later, suggesting that the patient had a single dominant clone containing all of the mutations. These results demonstrate the power of whole genome sequencing to discover novel cancer-associated mutations.
PMCID: PMC2603574  PMID: 18987736
17.  Mutational Analysis of EGFR and Related Signaling Pathway Genes in Lung Adenocarcinomas Identifies a Novel Somatic Kinase Domain Mutation in FGFR4 
PLoS ONE  2007;2(5):e426.
Fifty percent of lung adenocarcinomas harbor somatic mutations in six genes that encode proteins in the EGFR signaling pathway, i.e., EGFR, HER2/ERBB2, HER4/ERBB4, PIK3CA, BRAF, and KRAS. We performed mutational profiling of a large cohort of lung adenocarcinomas to uncover other potential somatic mutations in genes of this signaling pathway that could contribute to lung tumorigenesis.
Methodology/Principal Findings
We analyzed genomic DNA from a total of 261 resected, clinically annotated non-small cell lung cancer (NSCLC) specimens. The coding sequences of 39 genes were screened for somatic mutations via high-throughput dideoxynucleotide sequencing of PCR-amplified gene products. Mutations were considered to be somatic only if they were found in an independent tumor-derived PCR product but not in matched normal tissue. Sequencing of 9MB of tumor sequence identified 239 putative genetic variants. We further examined 22 variants found in RAS family genes and 135 variants localized to exons encoding the kinase domain of respective proteins. We identified a total of 37 non-synonymous somatic mutations; 36 were found collectively in EGFR, KRAS, BRAF, and PIK3CA. One somatic mutation was a previously unreported mutation in the kinase domain (exon 16) of FGFR4 (Glu681Lys), identified in 1 of 158 tumors. The FGFR4 mutation is analogous to a reported tumor-specific somatic mutation in ERBB2 and is located in the same exon as a previously reported kinase domain mutation in FGFR4 (Pro712Thr) in a lung adenocarcinoma cell line.
This study is one of the first comprehensive mutational analyses of major genes in a specific signaling pathway in a sizeable cohort of lung adenocarcinomas. Our results suggest the majority of gain-of-function mutations within kinase genes in the EGFR signaling pathway have already been identified. Our findings also implicate FGFR4 in the pathogenesis of a subset of lung adenocarcinomas.
PMCID: PMC1855985  PMID: 17487277
18.  Identification of a pan-cancer oncogenic microRNA superfamily anchored by a central core seed motif 
Nature Communications  2013;4:2730.
MicroRNAs modulate tumorigenesis through suppression of specific genes. As many tumour types rely on overlapping oncogenic pathways, a core set of microRNAs may exist, which consistently drives or suppresses tumorigenesis in many cancer types. Here we integrate The Cancer Genome Atlas (TCGA) pan-cancer data set with a microRNA target atlas composed of publicly available Argonaute Crosslinking Immunoprecipitation (AGO-CLIP) data to identify pan-tumour microRNA drivers of cancer. Through this analysis, we show a pan-cancer, coregulated oncogenic microRNA ‘superfamily’ consisting of the miR-17, miR-19, miR-130, miR-93, miR-18, miR-455 and miR-210 seed families, which cotargets critical tumour suppressors via a central GUGC core motif. We subsequently define mutations in microRNA target sites using the AGO-CLIP microRNA target atlas and TCGA exome-sequencing data. These combined analyses identify pan-cancer oncogenic cotargeting of the phosphoinositide 3-kinase, TGFβ and p53 pathways by the miR-17-19-130 superfamily members.
AGO-CLIP permits the identification of miRNA target genes. Here, Hamilton et al. compile publicly available AGO-CLIP data and combine this information with miRNA analysis from The Cancer Genome Atlas, permitting the identification of an oncogenic miRNA superfamily that targets tumour suppressor genes.
PMCID: PMC3868236  PMID: 24220575
19.  Whole Genome Analysis Informs Breast Cancer Response to Aromatase Inhibition 
Nature  2012;486(7403):353-360.
To correlate the variable clinical features of estrogen receptor positive (ER+) breast cancer with somatic alterations, we studied pre-treatment tumour biopsies accrued from patients in a study of neoadjuvant aromatase inhibitor (AI) therapy by massively parallel sequencing and analysis. Eighteen significantly mutated genes were identified, including five genes (RUNX1, CBFB, MYH9, MLL3 and SF3B1) previously linked to hematopoietic disorders. Mutant MAP3K1 was associated with Luminal A status, low grade histology and low proliferation rates, whereas mutant TP53 was associated with the opposite pattern. Moreover, mutant GATA3 correlated with suppression of proliferation upon AI treatment. Pathway analysis demonstrated that mutations in MAP2K4, a MAP3K1 substrate, produced perturbations similar to those of MAP3K1 loss. Distinct phenotypes in ER+ breast cancer are associated with specific patterns of somatic mutations that map into cellular pathways linked to tumor biology, but most recurrent mutations are relatively infrequent. Prospective clinical trials based on these findings will require comprehensive genome sequencing.
PMCID: PMC3383766  PMID: 22722193
20.  The identification of a novel TP53 cancer susceptibility mutation through whole genome sequencing of a patient with therapy-related AML 
Context
The identification of patients with inherited cancer susceptibility syndromes facilitates early diagnosis, prevention, and treatment. However, in many cases of suspected cancer susceptibility, the family history is unclear and genetic testing of common cancer susceptibility genes is unrevealing.
Objective
To apply whole-genome sequencing to a patient with suspected cancer susceptibility (lacking a clear family history of cancer and with no BRCA1 or BRCA2 mutations) to identify rare or novel germline variants in cancer susceptibility genes.
Design, Setting, and Participant
Skin (normal) and bone marrow (leukemia) DNA were obtained from a patient with early-onset breast and ovarian cancer and therapy-related acute myeloid leukemia (t-AML), and analyzed with: 1) whole genome sequencing using paired end reads; 2) SNP genotyping; 3) RNA expression profiling; and 4) spectral karyotyping.
Main Outcome Measures
Structural variants, copy number alterations, single nucleotide variants and small insertions and deletions (indels) were detected and validated using the above platforms.
Results
Whole genome sequencing revealed a novel, heterozygous 3 kb deletion removing exons 7-9 of TP53 in the patient’s normal skin DNA, which was homozygous in the leukemia DNA as a result of uniparental disomy. In addition, a total of 28 validated somatic single nucleotide variations or indels in coding genes, 8 somatic structural variants, and 12 somatic copy number alterations were detected in the patient’s leukemia genome.
Conclusion
Whole genome sequencing can identify novel, cryptic variants in cancer susceptibility genes in addition to providing unbiased information on the spectrum of mutations in a cancer genome.
PMCID: PMC3170052  PMID: 21505135
