Most mutations in cancer genomes are thought to be acquired after the initiating event, which may cause genomic instability, driving clonal evolution. However, for acute myeloid leukemia (AML), normal karyotypes are common, and genomic instability is unusual. To better understand clonal evolution in AML, we sequenced the genomes of AML samples with a known initiating event (PML-RARA) vs. normal karyotype AML samples, and the exomes of hematopoietic stem/progenitor cells (HSPCs) from healthy people. Collectively, the data suggest that most of the mutations found in AML genomes are actually random events that occurred in HSPCs before they acquired the initiating mutation; the mutational history of that cell is “captured” as the clone expands. In many cases, only one or two additional, cooperating mutations are needed to generate the malignant founding clone. Cells from the founding clone can acquire additional cooperating mutations, yielding subclones that can contribute to disease progression and/or relapse.
The myelodysplastic syndromes are a group of hematologic disorders that often evolve into secondary acute myeloid leukemia (AML). The genetic changes that underlie progression from the myelodysplastic syndromes to secondary AML are not well understood.
We performed whole-genome sequencing of seven paired samples of skin and bone marrow in seven subjects with secondary AML to identify somatic mutations specific to secondary AML. We then genotyped a bone marrow sample obtained during the antecedent myelodysplastic-syndrome stage from each subject to determine the presence or absence of the specific somatic mutations. We identified recurrent mutations in coding genes and defined the clonal architecture of each pair of samples from the myelodysplastic-syndrome stage and the secondary-AML stage, using the allele burden of hundreds of mutations.
Approximately 85% of bone marrow cells were clonal in the myelodysplastic-syndrome and secondary-AML samples, regardless of the myeloblast count. The secondary-AML samples contained mutations in 11 recurrently mutated genes, including 4 genes that have not been previously implicated in the myelodysplastic syndromes or AML. In every case, progression to acute leukemia was defined by the persistence of an antecedent founding clone containing 182 to 660 somatic mutations and the outgrowth or emergence of at least one subclone, harboring dozens to hundreds of new mutations. All founding clones and subclones contained at least one mutation in a coding gene.
Nearly all the bone marrow cells in patients with myelodysplastic syndromes and secondary AML are clonally derived. Genetic evolution of secondary AML is a dynamic process shaped by multiple cycles of mutation acquisition and clonal selection. Recurrent gene mutations are found in both founding clones and daughter subclones. (Funded by the National Institutes of Health and others.)
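The allele-burden reasoning in this abstract can be illustrated with a minimal sketch. It assumes each mutation is heterozygous and sits in a copy-number-neutral diploid region, so the fraction of cells carrying it is roughly twice its variant allele fraction (VAF). The function name and the read counts are hypothetical and are not taken from the study.

```python
def clonal_cell_fraction(variant_reads: int, total_reads: int) -> float:
    """Estimate the fraction of cells carrying a heterozygous mutation.

    Assumption (not from the source): diploid, copy-neutral locus,
    so cell fraction is about 2 * VAF, capped at 1.0.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must lie between 0 and total_reads")
    vaf = variant_reads / total_reads
    return min(1.0, 2.0 * vaf)


# Hypothetical read counts, for illustration only.
founding_fraction = clonal_cell_fraction(425, 1000)  # VAF 0.425
subclone_fraction = clonal_cell_fraction(150, 1000)  # VAF 0.150
print(founding_fraction, subclone_fraction)
```

Under these assumptions, a cluster of mutations with a lower VAF than the founding cluster is consistent with a subclone. Copy number changes or loss of heterozygosity would break the simple doubling rule.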
Most patients with acute myeloid leukemia (AML) die from progressive disease after relapse, which is associated with clonal evolution at the cytogenetic level [1,2]. To determine the mutational spectrum associated with relapse, we sequenced the primary tumor and relapse genomes from 8 AML patients, and validated hundreds of somatic mutations using deep sequencing; this allowed us to precisely define clonality and clonal evolution patterns at relapse. Besides discovering novel, recurrently mutated genes (e.g., WAC, SMC3, DIS3, DDX41, and DAXX) in AML, we found two major clonal evolution patterns during AML relapse: 1) the founding clone in the primary tumor gained mutations and evolved into the relapse clone, or 2) a subclone of the founding clone survived initial therapy, gained additional mutations, and expanded at relapse. In all cases, chemotherapy failed to eradicate the founding clone. The comparison of relapse-specific vs. primary tumor mutations in all 8 cases revealed an increase in transversions, probably due to DNA damage caused by cytotoxic chemotherapy. These data demonstrate that AML relapse is associated with the addition of new mutations and clonal evolution, which is shaped in part by the chemotherapy that the patients receive to establish and maintain remissions.
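The transversion comparison mentioned above rests on a simple classification of single-nucleotide changes. The following is a minimal sketch of that classification: a transition swaps a purine for a purine or a pyrimidine for a pyrimidine, and everything else is a transversion. The function names and the example variant list are hypothetical and do not come from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snv(ref: str, alt: str) -> str:
    """Label a single-nucleotide change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a valid SNV: {ref}>{alt}")
    # Same chemical class on both sides means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def transversion_fraction(snvs) -> float:
    """Fraction of (ref, alt) pairs that are transversions."""
    calls = [classify_snv(ref, alt) for ref, alt in snvs]
    if not calls:
        return 0.0
    return calls.count("transversion") / len(calls)


# Hypothetical mutation lists, for illustration only.
primary_specific = [("C", "T"), ("G", "A"), ("A", "G"), ("C", "A")]
relapse_specific = [("C", "A"), ("G", "T"), ("C", "T"), ("A", "C")]
print(transversion_fraction(primary_specific))
print(transversion_fraction(relapse_specific))
```

Comparing the two fractions between relapse-specific and primary mutation sets is one way to express the kind of shift the abstract reports. A real analysis would also test whether the difference is statistically significant.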
Myelodysplastic syndromes (MDS) are hematopoietic stem cell disorders that often progress to chemotherapy-resistant secondary acute myeloid leukemia (sAML). We used whole genome sequencing to perform an unbiased comprehensive screen to discover all the somatic mutations in a sAML sample and genotyped these loci in the matched MDS sample. Here we show that a missense mutation affecting the serine at codon 34 (S34) in U2AF1 was recurrently present in 13/150 (8.7%) de novo MDS patients, with suggestive evidence of an associated increased risk of progression to sAML. U2AF1 is a U2 auxiliary factor protein that recognizes the AG splice acceptor dinucleotide at the 3′ end of introns, and the mutations are located in highly conserved zinc fingers in U2AF1 [1,2]. Mutant U2AF1 promotes enhanced splicing and exon skipping in reporter assays in vitro. This novel, recurrent mutation in U2AF1 implicates altered pre-mRNA splicing as a potential mechanism for MDS pathogenesis.
The full complement of DNA mutations that are responsible for the pathogenesis of acute myeloid leukemia (AML) is not yet known.
We used massively parallel DNA sequencing to obtain a very high level of coverage (approximately 98%) of a primary, cytogenetically normal, de novo genome for AML with minimal maturation (AML-M1) and a matched normal skin genome.
We identified 12 acquired (somatic) mutations within the coding sequences of genes and 52 somatic point mutations in conserved or regulatory portions of the genome. All mutations appeared to be heterozygous and present in nearly all cells in the tumor sample. Four of the 64 mutations occurred in at least 1 additional AML sample in 188 samples that were tested. Mutations in NRAS and NPM1 had been identified previously in patients with AML, but two other mutations had not been identified. One of these mutations, in the IDH1 gene, was present in 15 of 187 additional AML genomes tested and was strongly associated with normal cytogenetic status; it was present in 13 of 80 cytogenetically normal samples (16%). The other was a nongenic mutation in a genomic region with regulatory potential and conservation in higher mammals; we detected it in one additional AML tumor. The AML genome that we sequenced contains approximately 750 point mutations, of which only a small fraction are likely to be relevant to pathogenesis.
By comparing the sequences of tumor and skin genomes of a patient with AML-M1, we have identified recurring mutations that may be relevant for pathogenesis.
The genetic alterations responsible for an adverse outcome in most patients with acute myeloid leukemia (AML) are unknown.
Using massively parallel DNA sequencing, we identified a somatic mutation in DNMT3A, encoding a DNA methyltransferase, in the genome of cells from a patient with AML with a normal karyotype. We sequenced the exons of DNMT3A in 280 additional patients with de novo AML to define recurring mutations.
A total of 62 of 281 patients (22.1%) had mutations in DNMT3A that were predicted to affect translation. We identified 18 different missense mutations, the most common of which was predicted to affect amino acid R882 (in 37 patients). We also identified six frameshift, six nonsense, and three splice-site mutations and a 1.5-Mbp deletion encompassing DNMT3A. These mutations were highly enriched in the group of patients with an intermediate-risk cytogenetic profile (56 of 166 patients, or 33.7%) but were absent in all 79 patients with a favorable-risk cytogenetic profile (P<0.001 for both comparisons). The median overall survival among patients with DNMT3A mutations was significantly shorter than that among patients without such mutations (12.3 months vs. 41.1 months, P<0.001). DNMT3A mutations were associated with adverse outcomes among patients with an intermediate-risk cytogenetic profile or FLT3 mutations, regardless of age, and were independently associated with a poor outcome in Cox proportional-hazards analysis.
DNMT3A mutations are highly recurrent in patients with de novo AML with an intermediate-risk cytogenetic profile and are independently associated with a poor outcome. (Funded by the National Institutes of Health and others.)
The identification of patients with inherited cancer susceptibility syndromes facilitates early diagnosis, prevention, and treatment. However, in many cases of suspected cancer susceptibility, the family history is unclear and genetic testing of common cancer susceptibility genes is unrevealing.
To apply whole-genome sequencing to a patient with suspected cancer susceptibility (lacking a clear family history of cancer and carrying no BRCA1 or BRCA2 mutations) in order to identify rare or novel germline variants in cancer susceptibility genes.
Design, Setting, and Participant
Skin (normal) and bone marrow (leukemia) DNA were obtained from a patient with early-onset breast and ovarian cancer and therapy-related acute myeloid leukemia (t-AML), and analyzed with: 1) whole genome sequencing using paired end reads; 2) SNP genotyping; 3) RNA expression profiling; and 4) spectral karyotyping.
Main Outcome Measures
Structural variants, copy number alterations, single nucleotide variants and small insertions and deletions (indels) were detected and validated using the above platforms.
Whole genome sequencing revealed a novel, heterozygous 3 Kb deletion removing exons 7-9 of TP53 in the patient’s normal skin DNA, which was homozygous in the leukemia DNA as a result of uniparental disomy. In addition, a total of 28 validated somatic single nucleotide variations or indels in coding genes, 8 somatic structural variants, and 12 somatic copy number alterations were detected in the patient’s leukemia genome.
Whole genome sequencing can identify novel, cryptic variants in cancer susceptibility genes in addition to providing unbiased information on the spectrum of mutations in a cancer genome.
Large-scale cancer sequencing data enable discovery of rare germline cancer susceptibility variants. Here we systematically analyse 4,034 cases from The Cancer Genome Atlas representing 12 cancer types. We find that the frequency of rare germline truncations in 114 cancer-susceptibility-associated genes varies widely, from 4% (acute myeloid leukaemia (AML)) to 19% (ovarian cancer), with a notably high frequency of 11% in stomach cancer. Burden testing identifies 13 cancer genes with significant enrichment of rare truncations, some associated with specific cancers (for example, RAD51C, PALB2 and MSH6 in AML, stomach and endometrial cancers, respectively). Significant, tumour-specific loss of heterozygosity occurs in nine genes (ATM, BAP1, BRCA1/2, BRIP1, FANCM, PALB2 and RAD51C/D). Moreover, our homology-directed repair assay of 68 BRCA1 rare missense variants supports the utility of allelic enrichment analysis for characterizing variants of unknown significance. The scale of this analysis and the somatic-germline integration enable the detection of rare variants that may affect individual susceptibility to tumour development, a critical step toward precision medicine.
Published sequencing data sets of cancer samples could be used to identify genetic variants associated with the risk of developing cancer. Here, Lu et al. analyse over 4,000 tumour-normal pairs to reveal variable frequencies of inherited susceptibilities across 12 cancer types and find enrichment of functionally validated missense variants of unknown significance.
SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner to create a classifier for predicting mutation statuses. We validate our results with both synthetic and real data. We report that SomaticSeq is able to achieve better overall accuracy than any individual tool incorporated.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-015-0758-2) contains supplementary material, which is available to authorized users.
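The boosted-classifier idea in the SomaticSeq abstract can be sketched with a small, self-contained example. This is not SomaticSeq's implementation: it is plain discrete AdaBoost over one-level decision trees (stumps), without the stochastic subsampling the abstract mentions. The two features (variant allele fraction and mapping quality), the training rows, and all function names are hypothetical.

```python
import math


def train_adaboost(X, y, rounds=5):
    """Train discrete AdaBoost with decision stumps.

    X: list of feature rows; y: labels in {-1, +1}.
    Returns a list of (alpha, feature_index, threshold, polarity).
    """
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        # Exhaustively search every stump and keep the lowest weighted error.
        for j in range(len(X[0])):
            for thr in sorted({row[j] for row in X}):
                for polarity in (1, -1):
                    preds = [polarity if row[j] >= thr else -polarity for row in X]
                    err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, j, thr, polarity, preds)
        err, j, thr, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, polarity))
        # Up-weight misclassified sites, then renormalise.
        weights = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, row):
    """Return +1 (call the site somatic) or -1 (reject) by weighted vote."""
    score = sum(a * (pol if row[j] >= thr else -pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1


# Hypothetical training sites: [variant allele fraction, mapping quality].
X_train = [[0.40, 60], [0.35, 55], [0.30, 58], [0.02, 20], [0.05, 25], [0.03, 60]]
y_train = [1, 1, 1, -1, -1, -1]
model = train_adaboost(X_train, y_train)
print(predict(model, [0.45, 50]), predict(model, [0.01, 59]))
```

In a pipeline of the kind described, the feature rows would come from the outputs of several callers plus sequencing metrics, and the labels from a truth set. Both are stand-ins here.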
Massively parallel sequencing technologies continue to alter the study of human genetics. As the cost of sequencing declines, next-generation sequencing (NGS) instruments and datasets will become increasingly accessible to the wider research community. Investigators are understandably eager to harness the power of these new technologies. Sequencing human genomes on these platforms, however, presents numerous production and bioinformatics challenges. Production issues like sample contamination, library chimaeras and variable run quality have become increasingly problematic in the transition from technology development lab to production floor. Analysis of NGS data, too, remains challenging, particularly given the short-read lengths (35–250 bp) and sheer volume of data. The development of streamlined, highly automated pipelines for data analysis is critical for transition from technology adoption to accelerated research and publication. This review aims to describe the state of current NGS technologies, as well as the strategies that enable NGS users to characterize the full spectrum of DNA sequence variation in humans.
massively parallel sequencing; next generation sequencing; human genome; variant detection; short read alignment; whole genome sequencing
In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad-hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.
To identify the cause of retinitis pigmentosa (RP) in UTAD003, a large, six-generation Louisiana family with autosomal dominant retinitis pigmentosa (adRP).
A series of strategies, including candidate gene screening, linkage exclusion, genome-wide linkage mapping, and whole-exome next-generation sequencing, was used to identify a mutation in a novel disease gene on chromosome 10q22.1. Probands from an additional 404 retinal degeneration families were subsequently screened for mutations in this gene.
Exome sequencing in UTAD003 led to identification of a single, novel coding variant (c.2539G>A, p.Glu847Lys) in hexokinase 1 (HK1) present in all affected individuals and absent from normal controls. One affected family member carries two copies of the mutation and has an unusually severe form of disease, consistent with homozygosity for this mutation. Screening of additional adRP probands identified four other families (American, Canadian, and Sicilian) with the same mutation and a similar range of phenotypes. The families share a rare 450-kilobase haplotype containing the mutation, suggesting a founder mutation among otherwise unrelated families.
We identified an HK1 mutation in five adRP families. Hexokinase 1 catalyzes phosphorylation of glucose to glucose-6-phosphate. HK1 is expressed in retina, with two abundant isoforms expressed at similar levels. The Glu847Lys mutation is located at a highly conserved position in the protein, outside the catalytic domains. We hypothesize that the effect of this mutation is limited to the retina, as no systemic abnormalities in glycolysis were detected. Prevalence of the HK1 mutation in our cohort of RP families is 1%.
Mutations in a novel gene, hexokinase 1 (HK1), account for 1% of cases of autosomal dominant retinitis pigmentosa.
retinitis pigmentosa; hexokinase; inherited retinal dystrophy
The contribution of genetic variants to sporadic amyotrophic lateral sclerosis (ALS) remains largely unknown. Either recessive or de novo variants could result in an apparently sporadic occurrence of ALS. In an attempt to find such variants we sequenced the exomes of 44 ALS-unaffected-parents trios. Rare and potentially damaging compound heterozygous variants were found in 27% of ALS patients, homozygous recessive variants in 14% and coding de novo variants in 27%. In 20% of patients more than one of the above variants was present. Genes with recessive variants were enriched in nucleotide binding capacity, ATPase activity, and the dynein heavy chain. Genes with de novo variants were enriched in transcription regulation and cell cycle processes. This trio study indicates that rare private recessive variants could be a mechanism underlying some cases of sporadic ALS, and that de novo mutations are also likely to play a part in the disease.
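The three variant classes named above (de novo, homozygous recessive, compound heterozygous) can be told apart from trio genotypes. The following is a minimal sketch that encodes each genotype as the number of alternate alleles (0, 1, or 2) and ignores sequencing error, sex chromosomes, and variant phasing. The function names and category labels are hypothetical and are not the study's pipeline.

```python
def classify_trio_variant(child: int, mother: int, father: int) -> str:
    """Classify one variant from alternate-allele counts in a trio."""
    for genotype in (child, mother, father):
        if genotype not in (0, 1, 2):
            raise ValueError("genotypes must be 0, 1, or 2 alternate alleles")
    if child == 0:
        return "not_carried"
    if mother == 0 and father == 0:
        # The child carries an allele that neither parent has.
        return "de_novo"
    if child == 2 and mother == 1 and father == 1:
        # One copy inherited from each heterozygous (carrier) parent.
        return "homozygous_recessive"
    return "inherited_or_other"


def is_compound_heterozygous(gene_variants) -> bool:
    """True if a gene has one het variant from each parent.

    gene_variants: iterable of (child, mother, father) allele counts
    for variants that fall in the same gene.
    """
    variants = list(gene_variants)
    from_mother = any(c == 1 and m == 1 and f == 0 for c, m, f in variants)
    from_father = any(c == 1 and m == 0 and f == 1 for c, m, f in variants)
    return from_mother and from_father


# Hypothetical genotypes, for illustration only.
print(classify_trio_variant(1, 0, 0))
print(classify_trio_variant(2, 1, 1))
print(is_compound_heterozygous([(1, 1, 0), (1, 0, 1)]))
```

Requiring each variant of a compound heterozygous pair to come from a different parent stands in for phasing: it ensures that the two variants lie on different copies of the gene.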
The identification of small sequence variants remains a challenging but critical step in the analysis of next-generation sequencing data. Our variant calling tool, VarScan 2, employs heuristic and statistic thresholds based on user-defined criteria to call variants using SAMtools mpileup data as input. Here, we provide guidelines for generating that input, and describe protocols for using VarScan 2 to (1) identify germline variants in individual samples; (2) call somatic mutations, copy number alterations, and LOH events in tumor-normal pairs; and (3) identify germline variants, de novo mutations, and Mendelian inheritance errors in family trios. Further, we describe a strategy for variant filtering that removes likely false positives associated with common sequencing- and alignment-related artifacts.
variant calling; mutation detection; trio calling; SNVs; indels; VarScan 2; next-generation sequencing
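The heuristic side of threshold-based variant calling, as described in the VarScan 2 abstract, can be sketched in a few lines. This is an illustration of the general idea only, not VarScan 2's code or command-line interface. The class name, function name, and default threshold values are hypothetical, and the statistical test that such a caller would also apply is omitted.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Read counts at one position, e.g. parsed from pileup output."""
    ref_reads: int
    alt_reads: int


def passes_thresholds(
    site: SiteCounts,
    min_coverage: int = 8,       # hypothetical default
    min_alt_reads: int = 2,      # hypothetical default
    min_var_freq: float = 0.20,  # hypothetical default
) -> bool:
    """Apply user-defined heuristic thresholds to a candidate variant."""
    coverage = site.ref_reads + site.alt_reads
    if coverage < min_coverage:
        return False
    if site.alt_reads < min_alt_reads:
        return False
    return site.alt_reads / coverage >= min_var_freq


# Hypothetical sites, for illustration only.
print(passes_thresholds(SiteCounts(ref_reads=20, alt_reads=10)))  # well supported
print(passes_thresholds(SiteCounts(ref_reads=3, alt_reads=3)))    # too little coverage
print(passes_thresholds(SiteCounts(ref_reads=99, alt_reads=1)))   # too few variant reads
```

Exposing the thresholds as parameters mirrors the "user-defined criteria" in the abstract. For example, a lower variant-frequency threshold would suit subclonal somatic calls, and a higher one would suit germline calls.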
Genomics is a relatively new scientific discipline, having DNA sequencing as its core technology. As technology has improved the cost and scale of genome characterization over sequencing’s 40-year history, the scope of inquiry has commensurately broadened. Massively parallel sequencing has proven revolutionary, shifting the paradigm of genomics to address biological questions at a genome-wide scale. Sequencing now empowers clinical diagnostics and other aspects of medical care, including disease risk, therapeutic identification, and prenatal testing. This Review explores the current state of genomics in the massively parallel sequencing era.
The goal of our research is to identify genes and mutations causing autosomal dominant retinitis pigmentosa (adRP). For this purpose we established a cohort of more than 250 independently ascertained families with adRP in the Houston Laboratory for Molecular Diagnosis of Inherited Eye Diseases. Affected members of each family were screened for disease-causing mutations in genes and gene regions that are commonly associated with adRP. By this approach, we detected mutations in 65% of the families, leaving 85 families that are likely to harbor mutations outside of the “common” regions or in novel genes. Of these, 32 families were tested by several types of next-generation sequencing (NGS), including (a) targeted polymerase chain reaction (PCR) NGS, (b) whole exome NGS, and (c) targeted retinal-capture NGS. We detected mutations in 11 of these families (31%), bringing the total detected in the adRP cohort to 70%. Several large families have also been tested for linkage using Affymetrix single nucleotide polymorphism (SNP) arrays.
Retinitis pigmentosa; Next-generation sequencing; Linkage mapping; Mutation prevalence; Retinal gene capture; Whole-exome sequencing
We report the first large-scale exome-wide analysis of the combined germline-somatic landscape in ovarian cancer. Here we analyze germline and somatic alterations in 429 ovarian carcinoma cases and 557 controls. We identify 3,635 high confidence, rare truncation and 22,953 missense variants with predicted functional impact. We find germline truncation variants and large deletions across Fanconi pathway genes in 20% of cases. Enrichment of rare truncations is shown in BRCA1, BRCA2, and PALB2. Additionally, we observe germline truncation variants in genes not previously associated with ovarian cancer susceptibility (NF1, MAP3K4, CDKN2B, and MLL3). Evidence for loss of heterozygosity was found in 100% and 76% of cases with germline BRCA1 and BRCA2 truncations respectively. Germline-somatic interaction analysis combined with extensive bioinformatics annotation identifies 237 candidate functional germline truncation and missense variants, including 2 pathogenic BRCA1 and 1 TP53 deleterious variants. Finally, integrated analyses of germline and somatic variants identify significantly altered pathways, including the Fanconi, MAPK, and MLL pathways.
Macular degeneration is a common cause of blindness in the elderly. To identify rare coding variants associated with a large increase in risk of age-related macular degeneration (AMD), we sequenced 2,335 cases and 789 controls in 10 candidate loci (57 genes). To increase power, we augmented our control set with ancestry-matched exome sequenced controls. An analysis of coding variation in 2,268 AMD cases and 2,268 ancestry-matched controls revealed two large-effect rare variants: the previously described R1210C in the CFH gene (case frequency = 0.51%, control frequency = 0.02%, OR = 23.11) and the newly identified K155Q in the C3 gene (case frequency = 1.06%, control frequency = 0.39%, OR = 2.68). The variants suggest decreased inhibition of C3 by Factor H, resulting in increased activation of the alternative complement pathway, as a key component of disease biology.
Data from eight breast cancer genome sequencing projects identified 25 patients with HER2 somatic mutations in cancers lacking HER2 gene amplification. To determine the phenotype of these mutations, we functionally characterized thirteen HER2 mutations using in vitro kinase assays, protein structure analysis, cell culture and xenograft experiments. Seven of these mutations are activating mutations, including G309A, D769H, D769Y, V777L, P780ins, V842I, and R896C. HER2 in-frame deletion 755-759, which is homologous to EGFR exon 19 in-frame deletions, had a neomorphic phenotype with increased phosphorylation of EGFR or HER3. L755S produced lapatinib resistance, but was not an activating mutation in our experimental systems. All of these mutations were sensitive to the irreversible kinase inhibitor, neratinib. These findings demonstrate that HER2 somatic mutation is an alternative mechanism to activate HER2 in breast cancer and they validate HER2 somatic mutations as drug targets for breast cancer treatment.
Genomics; Breast Cancer; Receptor Tyrosine Kinase; Oncogene
Genome-wide association studies (GWAS) have identified >500 common variants associated with quantitative metabolic traits, but in aggregate such variants explain at most 20–30% of the heritable component of population variation in these traits. To further investigate the impact of genotypic variation on metabolic traits, we conducted re-sequencing studies in >6,000 members of a Finnish population cohort (The Northern Finland Birth Cohort of 1966 [NFBC]) and a type 2 diabetes case-control sample (The Finland-United States Investigation of NIDDM Genetics [FUSION] study). By sequencing the coding sequence and 5′ and 3′ untranslated regions of 78 genes at 17 GWAS loci associated with one or more of six metabolic traits (serum levels of fasting HDL-C, LDL-C, total cholesterol, triglycerides, plasma glucose, and insulin), and conducting both single-variant and gene-level association tests, we obtained a more complete understanding of phenotype-genotype associations at eight of these loci. At all eight of these loci, the identification of new associations provides significant evidence for multiple genetic signals to one or more phenotypes, and at two loci, in the genes ABCA1 and CETP, we found significant gene-level evidence of association to non-synonymous variants with MAF<1%. Additionally, two potentially deleterious variants that demonstrated significant associations (rs138726309, a missense variant in G6PC2, and rs28933094, a missense variant in LIPC) were considerably more common in these Finnish samples than in European reference populations, supporting our prior hypothesis that deleterious variants could attain high frequencies in this isolated population, likely due to the effects of population bottlenecks. Our results highlight the value of large, well-phenotyped samples for rare-variant association analysis, and the challenge of evaluating the phenotypic impact of such variants.
Abnormal serum levels of various metabolites, including measures relevant to cholesterol, other fats, and sugars, are known to be risk factors for cardiovascular disease and type 2 diabetes. Identification of the genes that play a role in generating such abnormalities could advance the development of new treatment and prevention strategies for these disorders. Investigations of common genetic variants carried out in large sets of research subjects have successfully pinpointed such genes within many regions of the human genome. However, these studies often have not led to the identification of the specific genetic variations affecting metabolic traits. To attempt to detect such causal variations, we sequenced genes in 17 genomic regions implicated in metabolic traits in >6,000 people from Finland. By conducting statistical analyses relating specific variations (individually and grouped by gene) to the measures for these metabolic traits observed in the study subjects, we added to our understanding of how genotypes affect these traits. Our findings support a long-held hypothesis that the unique history of the Finnish population provides important advantages for analyzing the relationship between genetic variations and biomedically important traits.
Cancer immunoediting, the process whereby the immune system controls tumour outgrowth and shapes tumour immunogenicity, is comprised of three phases: elimination, equilibrium and escape [1–5]. Although many immune components that participate in this process are known, its underlying mechanisms remain poorly defined. A central tenet of cancer immunoediting is that T cell recognition of tumour antigens drives the immunologic destruction or sculpting of a developing cancer. However, our current understanding of tumour antigens comes largely from analyses of cancers that develop in immunocompetent hosts and thus may have already been edited. Little is known about the antigens expressed in nascent tumour cells, whether they are sufficient to induce protective anti-tumour immune responses or whether their expression is modulated by the immune system. Here, using massively parallel sequencing, we characterize expressed mutations in highly immunogenic methylcholanthrene-induced sarcomas derived from immunodeficient Rag2−/− mice which phenotypically resemble nascent primary tumour cells [1,3,5]. Employing class I prediction algorithms, we identify mutant spectrin-β2 as a potential rejection antigen of the d42m1 sarcoma and validate this prediction by conventional antigen expression cloning and detection. We also demonstrate that cancer immunoediting of d42m1 occurs via a T cell-dependent immunoselection process that promotes outgrowth of pre-existing tumour cell clones lacking highly antigenic mutant spectrin-β2 and other potential strong antigens. These results demonstrate that the strong immunogenicity of an unedited tumour can be ascribed to expression of highly antigenic mutant proteins and show that outgrowth of tumour cells that lack these strong antigens via a T cell-dependent immunoselection process represents one mechanism of cancer immunoediting.
The emergence of next-generation sequencing (NGS) technologies offers an incredible opportunity to comprehensively study DNA sequence variation in human genomes. Commercially available platforms from Roche (454), Illumina (Genome Analyzer and HiSeq 2000), and Applied Biosystems (SOLiD) have the capability to completely sequence individual genomes to high levels of coverage. NGS data is particularly advantageous for the study of structural variation (SV) because it offers the sensitivity to detect variants of various sizes and types, as well as the precision to characterize their breakpoints at base pair resolution. In this chapter, we present methods and software algorithms that have been developed to detect SVs and copy number changes using massively parallel sequencing data. We describe visualization and de novo assembly strategies for characterizing SV breakpoints and removing false positives.
Next-generation sequencing; Paired-end sequencing; 454; Illumina; Solexa; ABI SOLiD; Insertions; Deletions; Duplications; Inversions; Translocations; Indels; Copy number variants
Motivation: The sequencing of tumors and their matched normals is frequently used to study the genetic composition of cancer. Despite this fact, there remains a dearth of available software tools designed to compare sequences in pairs of samples and identify sites that are likely to be unique to one sample.
Results: In this article, we describe the mathematical basis of our SomaticSniper software for comparing tumor and normal pairs. We estimate its sensitivity and precision, and present several common sources of error resulting in miscalls.
Availability and implementation: Binaries are freely available for download at http://gmt.genome.wustl.edu/somatic-sniper/current/, implemented in C and supported on Linux and Mac OS X.
Supplementary information: Supplementary data are available at Bioinformatics online.
To correlate the variable clinical features of estrogen receptor positive (ER+) breast cancer with somatic alterations, we studied pre-treatment tumour biopsies accrued from patients in a study of neoadjuvant aromatase inhibitor (AI) therapy by massively parallel sequencing and analysis. Eighteen significantly mutated genes were identified, including five genes (RUNX1, CBFB, MYH9, MLL3 and SF3B1) previously linked to hematopoietic disorders. Mutant MAP3K1 was associated with Luminal A status, low grade histology and low proliferation rates whereas mutant TP53 associated with the opposite pattern. Moreover, mutant GATA3 correlated with suppression of proliferation upon AI treatment. Pathway analysis demonstrated mutations in MAP2K4, a MAP3K1 substrate, produced similar perturbations as MAP3K1 loss. Distinct phenotypes in ER+ breast cancer are associated with specific patterns of somatic mutations that map into cellular pathways linked to tumor biology but most recurrent mutations are relatively infrequent. Prospective clinical trials based on these findings will require comprehensive genome sequencing.