The full complement of DNA mutations that are responsible for the pathogenesis of acute myeloid leukemia (AML) is not yet known.
We used massively parallel DNA sequencing to obtain a very high level of coverage (approximately 98%) of a primary, cytogenetically normal, de novo genome for AML with minimal maturation (AML-M1) and a matched normal skin genome.
We identified 12 acquired (somatic) mutations within the coding sequences of genes and 52 somatic point mutations in conserved or regulatory portions of the genome. All mutations appeared to be heterozygous and present in nearly all cells in the tumor sample. Four of the 64 mutations occurred in at least 1 additional AML sample in 188 samples that were tested. Mutations in NRAS and NPM1 had been identified previously in patients with AML, but two other mutations had not been identified. One of these mutations, in the IDH1 gene, was present in 15 of 187 additional AML genomes tested and was strongly associated with normal cytogenetic status; it was present in 13 of 80 cytogenetically normal samples (16%). The other was a nongenic mutation in a genomic region with regulatory potential and conservation in higher mammals; we detected it in one additional AML tumor. The AML genome that we sequenced contains approximately 750 point mutations, of which only a small fraction are likely to be relevant to pathogenesis.
By comparing the sequences of tumor and skin genomes of a patient with AML-M1, we have identified recurring mutations that may be relevant for pathogenesis.
Acute myeloid leukemia (AML) is a clonal hematopoietic disease caused by both inherited and acquired genetic alterations.1-3 Current AML classification and prognostic systems incorporate genetic information but are limited to known abnormalities that have previously been identified with the use of cytogenetics, array comparative genomic hybridization (CGH), gene-expression profiling, and the resequencing of candidate genes (see the Glossary).
The karyotyping of AML cells remains the most powerful predictor of the outcome in patients with AML and is routinely used by clinicians.4,5 As an adjunct to cytogenetic studies, small subcytogenetic amplifications and deletions can be identified with the use of genomic methods, such as single-nucleotide-polymorphism (SNP) array and array CGH platforms (see the Glossary). However, these techniques remain investigational, and studies6-9 suggest that there are few recurrent acquired copy-number alterations in each AML genome. Gene-expression profiling has identified patients with known chromosomal lesions and genetic mutations and subgroups of patients with normal cytogenetic profiles who have variable clinical outcomes.10,11 Expression profiling has yielded single-gene predictors of outcome that are currently being evaluated for clinical use.12-16 Candidate-gene resequencing studies have also identified recurrent mutations in several genes (for example, those encoding FMS-related tyrosine kinase 3 [FLT3] and nucleophosmin 1 [NPM1]) that can help to stratify patients with normal cytogenetic profiles according to risk and to identify patients for targeted therapy (e.g., those with mutated FLT3).3,12,17 However, the revised classification systems are imperfect, suggesting that important genetic factors for the pathogenesis of AML remain to be discovered.
We have previously described the sequence of an entire AML genome from a patient who had AML with minimal maturation (AML-M1) and a normal cytogenetic profile.18 Here we describe the genome sequence of another such tumor and recurring mutations in additional AML tumors.
Details regarding the methods for library production, DNA sequencing with the Illumina Genome Analyzer II,19 evaluation of sequence coverage, identification of sequence variants, validation of variants and determination of the prevalence of variants in the index AML tumor, and screening of additional AML samples are provided in the Supplementary Appendix, available with the full text of this article at NEJM.org. All the high-quality single-nucleotide variants (SNVs) that were found in tumor and skin samples from this patient are available in the database of genotypes and phenotypes (dbGaP) of the National Center for Biotechnology Information (accession number, phs000159.v1.p1).
A previously healthy 38-year-old man of European ancestry presented with fatigue and a cough. The white-cell count was 39,800 cells per cubic millimeter, with 97% blasts; the hemoglobin level was 8.9 g per deciliter, and the platelet count was 35,000 per cubic millimeter. A bone marrow examination revealed 90% cellularity and 86% myeloperoxidase-positive blasts (Fig. 1 in the Supplementary Appendix). Routine cytogenetic analysis of bone marrow samples revealed a normal 46,XY karyotype. There was no family history of leukemia. The patient’s mother had received the diagnosis of breast cancer at the age of 60 years and of non-Hodgkin’s lymphoma at the age of 63 years; her half-sister had received the diagnosis of breast cancer at the age of 50 years.
Samples of the patient’s bone marrow and skin were banked for whole-genome sequencing under a protocol approved by the institutional review board at Washington University. The patient provided written informed consent.
The patient was treated initially with a 7-day course of infusional cytarabine and with a 3-day course of daunorubicin. Within 5 weeks, he had complete morphologic remission and recovery of white-cell and platelet counts. The patient subsequently received consolidation therapy with four cycles of high-dose cytarabine without any further antileukemic therapy. He remained in complete remission 3 years later.
DNA samples from the patient’s bone marrow sample at the time of initial presentation and a normal skin-biopsy specimen obtained after the patient’s disease was in remission were labeled and genotyped with the use of the Affymetrix Genome-Wide Human SNP Array 6.0. The tumor genome had no detectable somatic copy-number alterations and no regions of partial uniparental disomy (Glossary, and Fig. 2 in the Supplementary Appendix). RNA that was derived from the same bone marrow sample was analyzed with the use of the Affymetrix GeneChip Human Genome U133 Plus 2.0 array, which revealed an expression signature similar to that of many other cytogenetically normal marrow samples from patients with AML-M1 (Fig. 2 in the Supplementary Appendix).
We sequenced 69.9 billion base pairs (23.3× haploid coverage) from DNA libraries that we generated from the tumor sample and 63.9 billion base pairs from libraries that we generated from the normal skin sample (21.3× haploid coverage) (Glossary and Table 1). Using Affymetrix 6.0 SNP arrays, we confirmed the detection of both alleles of 98.5% of the approximately 45,000 high-quality heterozygous SNPs in the tumor sample and 97.4% of the approximately 45,000 high-quality heterozygous SNPs in the skin sample.
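The haploid coverage figures follow from dividing the number of sequenced bases by the size of the haploid genome. The short Python sketch below reproduces that arithmetic. The genome size of 3.0 billion base pairs and the names `HAPLOID_GENOME_BP` and `haploid_coverage` are assumptions made for illustration; the article does not state the exact value or method used.

```python
# Assumed haploid human genome size (not stated in the article).
HAPLOID_GENOME_BP = 3.0e9


def haploid_coverage(sequenced_bp: float, genome_bp: float = HAPLOID_GENOME_BP) -> float:
    """Average fold coverage: total sequenced bases divided by haploid genome size."""
    return sequenced_bp / genome_bp


# Sequenced base pairs reported for the tumor and skin libraries.
tumor_cov = haploid_coverage(69.9e9)
skin_cov = haploid_coverage(63.9e9)

print(f"tumor: {tumor_cov:.1f}x haploid coverage")
print(f"skin:  {skin_cov:.1f}x haploid coverage")
```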
A summary of the sequence differences between the patient’s tumor genome and National Center for Biotechnology Information build 36 of the human reference genome is shown in Figure 1 (see the Glossary).20 We identified 3,872,936 SNVs in the tumor genome, of which 3,464,449 passed a stringent calling filter. Of these SNVs, 3,377,680 (97.5%) were detected in the skin genome, indicating that they were inherited variants. Of the 86,769 potentially novel somatic SNVs, 66,513 had been described previously.
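The successive filtering steps in this paragraph can be checked with a few lines of arithmetic. The Python sketch below uses only the counts reported above; the variable names are hypothetical and the code is a bookkeeping check, not the variant-calling pipeline itself.

```python
# Counts reported in the text for the tumor genome.
total_snvs = 3_872_936            # all SNVs relative to the reference genome
passed_filter = 3_464_449         # SNVs passing the stringent calling filter
also_in_skin = 3_377_680          # filtered SNVs also detected in the skin genome
previously_described = 66_513     # candidate somatic SNVs already described

# SNVs seen in the tumor but not in the skin are candidate somatic variants.
candidate_somatic = passed_filter - also_in_skin

# Removing previously described variants leaves the SNVs that were binned into tiers.
novel_candidates = candidate_somatic - previously_described

# Fraction of filtered SNVs that were inherited (present in skin).
inherited_fraction = also_in_skin / passed_filter

print(f"candidate somatic SNVs: {candidate_somatic:,}")
print(f"novel candidates:       {novel_candidates:,}")
print(f"inherited fraction:     {inherited_fraction:.1%}")
```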
We binned the remaining 20,256 SNVs into four tiers, which are detailed in the Supplementary Appendix. Briefly, tier 1 contains all changes in the amino acid coding regions of annotated exons, consensus splice-site regions, and RNA genes (including microRNA genes). Tier 2 contains changes in highly conserved regions of the genome or regions that have regulatory potential. Tier 3 contains mutations in the nonrepetitive part of the genome that does not meet tier 2 criteria, and tier 4 contains mutations in the remainder of the genome. We tentatively identified 113 potential tier 1 mutations, 749 potential tier 2 mutations, 3188 potential tier 3 mutations, and 16,206 potential tier 4 mutations. For each of the 113 putative tier 1 variants, we amplified the genomic region containing the mutation from both tumor and skin, using a polymerase-chain-reaction (PCR) assay, and performed Sanger sequencing. Of the 101 variants that were called with low confidence (the calling algorithm is summarized in the Supplementary Appendix), none were validated. Of the high-confidence variants, 10 of 12 were validated as somatic mutations. Similarly, we tested 178 low-confidence calls for tier 2, and only one was validated. In contrast, 51 of 104 high-confidence tier 2 calls were validated. We did not carry out validation studies of variants in tiers 3 and 4.
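As a rough consistency check on the tier counts and validation results, the Python sketch below tallies the numbers given in this paragraph. The dictionary layout and names are illustrative assumptions; the actual tier definitions and calling algorithm are described only in the Supplementary Appendix.

```python
# Potential mutations per tier, as reported in the text.
tier_calls = {1: 113, 2: 749, 3: 3188, 4: 16206}
total_calls = sum(tier_calls.values())

# (validated as somatic, tested by Sanger sequencing) for each tier and confidence class.
validation = {
    ("tier 1", "low confidence"): (0, 101),
    ("tier 1", "high confidence"): (10, 12),
    ("tier 2", "low confidence"): (1, 178),
    ("tier 2", "high confidence"): (51, 104),
}

# Validation rate for each class of call.
rates = {key: validated / tested for key, (validated, tested) in validation.items()}

for (tier, confidence), rate in rates.items():
    validated, tested = validation[(tier, confidence)]
    print(f"{tier}, {confidence}: {validated}/{tested} validated ({rate:.1%})")

# Validated tier 2 point mutations from both confidence classes.
validated_tier2 = sum(v for (tier, _), (v, _) in validation.items() if tier == "tier 2")
print(f"total binned SNVs: {total_calls:,}; validated tier 2 mutations: {validated_tier2}")
```

The contrast between the low-confidence and high-confidence rows is the reason the later extrapolation uses only high-confidence calls.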
We also searched for somatic insertions and deletions (indels) using an algorithm described in the Supplementary Appendix. We identified 142 potential somatic indels (28 deletions and 114 insertions). Of these variants, 119 failed validation (i.e., they were falsely positive) in Sanger sequencing of the relevant PCR products, 21 were validated but were present in both tumor and skin, and 2 were validated as somatic mutations. One was a 4-bp insertion in exon 12 of the NPM1 gene associated with aberrant cytoplasmic expression of nucleophosmin (NPMc). This insertion creates a frameshift mutation and a truncated protein that is known to have altered cellular localization, as described previously.21 The second mutation was a 3-bp insertion in the gene encoding centrosomal protein 170kDa (CEP170) at amino acid 177, predicted to result in the addition of a leucine residue at this position.
The genes with tier 1 mutations and the consequences of these mutations are summarized in Table 2, and in Table 1 in the Supplementary Appendix. Both the NPMc insertion and the NRAS mutation have been described previously in AML genomes, and both are known to be relevant for pathogenesis.3 Mutations in IDH1 (encoding isocitrate dehydrogenase 1), which are predicted to affect the arginine residue at position 132, are found in malignant gliomas but have not been reported in patients with AML and are rare in other tumor types.22-24 Variants of the nine other tier 1 genes are discussed in the Supplementary Appendix.
Each of the 10 point mutations was amplified from tumor and skin samples by means of PCR, and the DNA species carrying the variant allele was assayed by sequencing the PCR products with the use of the Illumina platform. The entire experiment was replicated with amplified genomic DNA, with excellent concordance for all samples (Fig. 3 in the Supplementary Appendix). The variant allele frequencies of the two insertions were determined by sequencing PCR products containing these mutations. The representation of all but two of the mutations — in chromosome 19 open reading frame 62 (C19orf62), an unannotated gene of unknown function, and CEP170 — was approximately 50%, suggesting that all the mutations were heterozygous and present in nearly all the cells in the tumor sample (Fig. 2A). Ten of the 12 genes in tier 1 had probe sets on the Affymetrix U133 Plus 2.0 array, and 9 of 10 were detectably expressed (Table 1). We also assayed expression of the 10 nonsynonymous mutant alleles by means of reverse-transcriptase PCR, using amplicons designed to span introns, followed by sequencing and counting of the sequenced PCR products. Eight of the mutant alleles were detected at frequencies of 35 to 85%. However, for two of the mutations (in FREM2 and IMPG2) we did not detect complementary DNA carrying the variant allele (although we easily detected the wild-type allele), even though each variant was present in approximately 50% of the tumor DNA.
The individual bases that were mutated were highly conserved for 10 of the 12 variants, and all but 1 were found in highly conserved regions of the genome. The Sorting Intolerant from Tolerant (SIFT) algorithm (which gauges the likely effect of genic mutations on protein function) predicted that the mutations in NRAS, IDH1, IMPG2, and ANKRD26 were deleterious.25 The splice-site mutation at the 3′ end of intron 4 of C19orf62 caused exon 5 to be skipped (data not shown).
We then genotyped the tier 1 mutations in 187 additional samples from patients with AML whose clinical characteristics have been described previously26 (Table 2 in the Supplementary Appendix). The NPMc mutation was previously shown to be present in 43 of 180 samples (23.9%), and activating NRAS mutations were present in 17 of 182 samples (9.3%).26 We observed mutations in IDH1, which were predicted to cause substitution of the arginine residue at position 132, in 16 of 188 samples: R132C in 8 samples, R132H in 7 samples, and R132S in 1 sample (Table 2 in the Supplementary Appendix). The other nine mutations were not detected in the 187 additional samples. We detected no R172 mutations in IDH2 in 188 samples (the sample from the index patient and the 187 additional samples), nor did we observe additional mutations in any of the exons of IDH1 or CDC42.
A nonsynonymous acquired mutation (C328Y) was found in the mitochondrial gene ND4, which encodes NADH dehydrogenase subunit 4, a part of complex 1 of the electron transport chain. Two of 93 additional AML samples also had nonsynonymous mutations in this gene, but the importance of these mutations is not yet clear (Table 5 and the Results and Discussion section in the Supplementary Appendix).
We confirmed 52 mutations in tier 2. DNA segments, each containing 1 of the 52 mutations, were PCR-amplified from the tumor and skin samples and sequenced to determine the proportion of DNA molecules carrying the mutation (Fig. 2B, and Table 4 in the Supplementary Appendix). Three of these tier 2 mutations had variant frequencies of approximately 98%, and all were located on chromosome X or Y. Because only a single copy of these chromosomes was present in this male genome, the high representation of these three tier 2 mutations was consistent with the finding that an extremely high percentage of cells within the bone marrow sample were part of the malignant clone. One mutation (chromosome 4 at position 128,102,994) had a variant read frequency of approximately 78%, and we observed no somatic microamplification or deletion near this variant. Of the tier 2 mutations, 39 were present in approximately 50% of DNA species, and 9 were present in approximately 40%. We genotyped the 52 tier 2 mutations in 187 additional AML samples and detected the presence of just 1 of the mutations (on chromosome 10) in 1 other AML sample, from a patient with myelomonocytic leukemia (AML-M4), which bore a translocation and did not have a paired normal sample (Table 2 in the Supplementary Appendix). The proportion of DNA species in this sample that carried the mutation was 54%, suggesting that it was heterozygous.
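The reasoning that links variant frequencies to the fraction of cells in the malignant clone can be sketched in Python as follows. This is a simplified model, not the authors' analysis: it assumes one mutated copy per tumor cell and no copy-number change at the locus (consistent with the SNP-array result reported for this genome). The function name `clonal_fraction` and the grouping of the 52 mutations are illustrative.

```python
def clonal_fraction(vaf: float, copies_per_cell: int) -> float:
    """Estimate the fraction of cells carrying a mutation from its variant allele frequency.

    Assumes exactly one mutated copy per mutant cell and `copies_per_cell`
    total copies of the locus in every cell (2 for autosomes, 1 for X or Y in a male).
    A result above 1.0 means the simple model does not fit that variant.
    """
    return vaf * copies_per_cell


# (description, approximate variant frequency, copies per cell, number of mutations)
groups = [
    ("chromosome X or Y, single copy in a male", 0.98, 1, 3),
    ("chromosome 4, position 128,102,994", 0.78, 2, 1),
    ("autosomal, about 50%", 0.50, 2, 39),
    ("autosomal, about 40%", 0.40, 2, 9),
]

total_mutations = sum(count for _, _, _, count in groups)

for label, vaf, copies, count in groups:
    fraction = clonal_fraction(vaf, copies)
    note = " (does not fit the one-copy model)" if fraction > 1.0 else ""
    print(f"{label}: n={count}, estimated mutant-cell fraction {fraction:.2f}{note}")

print(f"total tier 2 mutations: {total_mutations}")
```

Under these assumptions, a heterozygous autosomal mutation at about 50% and a sex-chromosome mutation at about 98% both imply that nearly every cell in the sample carries the mutation.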
Of the 16 patients who had AML with an IDH1 R132 mutation, 13 had tumors with normal cytogenetic profiles (of a total of 80 cytogenetically normal samples [16%]), 2 had trisomy 8, and 1 had trisomy 13. Ten of the 16 patients had AML-M1, three had AML with maturation (AML-M2), and three had AML-M4. The characteristics of patients with and those without the IDH1 mutation are shown in Table 3, and in Tables 2 and 3 in the Supplementary Appendix. The mutation was detected only in patients with cytogenetic profiles associated with intermediate risk (P<0.001).4,5 Although the patients who were analyzed in this study were not treated with a single uniform protocol, outcome data were available for all 188 patients (Table 2 in the Supplementary Appendix). IDH1 mutational status did not have independent prognostic value with respect to overall survival in multivariate analysis; subgroup analysis showed a possible adverse effect on overall survival among patients with normal-karyotype AML and wild-type NPM1, regardless of FLT3 status (Fig. 4 in the Supplementary Appendix).
Our findings support the use of an unbiased sequencing approach to discover previously unsuspected, recurring mutations in a cancer genome. With improved sequencing techniques, we covered this genome more completely than the first one we sequenced (98% vs. 91% diploid coverage) and used fewer sequencing runs (16.5 vs. 98), resulting in a dramatically reduced cost of data generation. With better data quality and calling algorithms, we reduced the 96% false positive frequency of possible mutations for the first sequenced AML genome to a frequency of 47% of the high-confidence tier 1 and 2 mutations called in this genome. We predicted 1458 tumor-specific point mutations with high confidence; we tested 116 of these with validation sequencing and confirmed 61 of them (53%). Thus, this genome may contain approximately 750 somatic point mutations. We detected mutations in NRAS, NPMc, and IDH1 and a tier 2 mutation on chromosome 10 in more than one AML genome, suggesting that these mutations are not random and are probably important for the pathogenesis of this tumor.
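The extrapolation from validated calls to a genome-wide estimate can be reproduced with the numbers given in the article. The Python sketch below is a back-of-the-envelope check with hypothetical variable names; it assumes that the validation rate measured in tiers 1 and 2 applies equally to the untested high-confidence calls in tiers 3 and 4.

```python
# High-confidence tumor-specific point mutations predicted across all tiers.
high_conf_predicted = 1458

# High-confidence calls tested by validation sequencing (tier 1 + tier 2).
tested = 12 + 104
# High-confidence calls confirmed as somatic (tier 1 + tier 2).
confirmed = 10 + 51

validation_rate = confirmed / tested
false_positive_rate = 1 - validation_rate

# Assumption: the same validation rate holds for the untested tier 3 and tier 4 calls.
estimated_somatic = high_conf_predicted * validation_rate

print(f"validation rate:     {validation_rate:.0%}")
print(f"false positive rate: {false_positive_rate:.0%}")
print(f"estimated somatic point mutations: {estimated_somatic:.0f}")
```

The product comes out slightly above 750, which is consistent with the rounded figure of approximately 750 given in the text.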
We suggest that the 12 nonsynonymous mutations are the most likely to be relevant for pathogenesis, since they could potentially alter the function of expressed genes. Consistent with this idea and with the results of our previous study18 is the finding that all these mutations were retained in the dominant clone. Surprisingly, we found that virtually all the 52 tier 2 mutations were also present in nearly every tumor cell in the sample, suggesting that they are also a part of the same dominant clone. However, one cannot conclude that these mutations (or any of the tier 3 or 4 mutations) are relevant for pathogenesis simply because they are found at a high frequency in the dominant clone. It is more likely that most of these mutations are random, benign sequence changes that existed in the hematopoietic cell that was transformed (i.e., they were preexisting and carried along as benign “passengers,” irrelevant for pathogenesis). The finding that the percentage of mutations found in each tier closely approximated the total amount of DNA assayed in that tier supports this hypothesis. Collectively, these data suggest that the vast majority of the mutations that we detected in this genome are random, background mutations in the hematopoietic stem cell that was transformed.27 Functional validation will be required to prove which mutations are truly important.
The best test of the relevance of individual mutations for pathogenesis (in the absence of functional validation) is recurrence in other AML samples or other cancers. Of the 12 tier 1 mutations, 3 (occurring in NPM1, NRAS, and IDH1) were recurrent in patients with AML and therefore were likely to be important in the pathogenesis of this tumor. R132 mutations in the IDH1 gene had not previously been detected in the 45 patients with AML who were tested23 and are detected only rarely in tumor types other than malignant gliomas.22,24 The IDH1 R132H, R132C, and R132S mutations dramatically reduce the catalytic activity of the IDH1 enzyme; it has been suggested that IDH1 is a tumor suppressor that is inactivated by dominant mutations in R132.28 There are significant differences, however, between the IDH1 mutations found in gliomas and those in AML. We detected the R132C mutation in 8 of 16 patients with AML who carried an IDH1 mutation (50%). In contrast, the mutation was reported in only 7 of 161 patients with gliomas (4%, P<0.001 by Fisher’s exact test). The most common mutation in gliomas (R132H) was detected in 142 of 161 patients (88%) but in only 7 of 16 patients with AML (44%, P<0.001). When the R132H mutation was overexpressed in a glioblastoma cell line, induction of messenger RNAs for several target genes of hypoxia-inducible factor 1α (HIF1α) was detected (GLUT1, VEGF, and PGK1).28 However, in 13 patients with AML (5 with R132H and 8 with R132C) there were no significant alterations in the expression of any of these genes (Fig. 3 in the Supplementary Appendix).
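The comparison of mutation frequencies between AML and gliomas can be recomputed from the counts in this paragraph. The Python sketch below implements a two-sided Fisher's exact test using only the standard library. The two-sided definition used here (summing the probabilities of all tables no more likely than the observed one) is an assumption, since the article does not specify how the test was configured, and the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: AML (16 patients with an IDH1 mutation), gliomas (161 patients).
# Columns: carries the named mutation, carries a different R132 mutation.
p_r132c = fisher_exact_two_sided(8, 16 - 8, 7, 161 - 7)
p_r132h = fisher_exact_two_sided(7, 16 - 7, 142, 161 - 142)

print(f"R132C, AML 8/16 vs glioma 7/161:   P = {p_r132c:.2e}")
print(f"R132H, AML 7/16 vs glioma 142/161: P = {p_r132h:.2e}")
```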
Assuming that the number of point mutations in most AML genomes is similar to the number in the first 2 patients we studied (approximately 750), the likelihood that 2 of 188 patients will carry an identical mutation at the same position in the genome is extremely small (1.1×10^-9). This suggests that the tier 2 somatic mutation at position 108,115,590 of chromosome 10 is unlikely to be a random event. It falls in a conserved region with regulatory potential, and its detection in a second patient with AML suggests that this region may contribute to pathogenesis through a novel mechanism that remains to be defined.
Although the potential of next-generation sequencing platforms for uncovering the genetic rules of cancer is great, the sequencing of thousands of additional cancer genomes will be required to fully unravel this complex and heterogeneous disease.29,30
Supported by grants from the National Institutes of Health (PO1-CA101937, to Dr. Ley; and U54-HG003079, to Dr. Wilson) and the Barnes–Jewish Hospital Foundation (00335-0505-01, to Dr. Ley).
We thank Jennifer Ivanovich for obtaining the detailed family histories of the patients; Nancy Reidelberger for administrative support; Dr. Rob Culverhouse for statistical support; Todd Hepler, William Schroeder, Justin Lolofie, Scott Abbott, Shawn Leonard, Ken Swanson, Indraniel Das, and Michael Kiwala for their contributions to the Laboratory Information Management System; Gary Stiehr, Richard Wohlstadter, Matt Weil, and Kelly Fallon for information-technology support; Drs. Clara Bloomfield, Michael Caligiuri, and James Vardiman for providing the AML samples from the Cancer and Leukemia Group B Leukemia Bank; the nursing staff of the Siteman Cancer Center and Barnes–Jewish Hospital; and all the patients who participated in the study.
Dr. Westervelt reports receiving lecture fees from Celgene and Novartis; and Dr. DiPersio, receiving consulting and lecture fees from Genzyme. No other potential conflict of interest relevant to this article was reported.