The myelodysplastic syndromes are a group of hematologic disorders that often evolve into secondary acute myeloid leukemia (AML). The genetic changes that underlie progression from the myelodysplastic syndromes to secondary AML are not well understood.
We performed whole-genome sequencing of seven paired samples of skin and bone marrow in seven subjects with secondary AML to identify somatic mutations specific to secondary AML. We then genotyped a bone marrow sample obtained during the antecedent myelodysplastic-syndrome stage from each subject to determine the presence or absence of the specific somatic mutations. We identified recurrent mutations in coding genes and defined the clonal architecture of each pair of samples from the myelodysplastic-syndrome stage and the secondary-AML stage, using the allele burden of hundreds of mutations.
Approximately 85% of bone marrow cells were clonal in the myelodysplastic-syndrome and secondary-AML samples, regardless of the myeloblast count. The secondary-AML samples contained mutations in 11 recurrently mutated genes, including 4 genes that have not been previously implicated in the myelodysplastic syndromes or AML. In every case, progression to acute leukemia was defined by the persistence of an antecedent founding clone containing 182 to 660 somatic mutations and the outgrowth or emergence of at least one subclone, harboring dozens to hundreds of new mutations. All founding clones and subclones contained at least one mutation in a coding gene.
Nearly all the bone marrow cells in patients with myelodysplastic syndromes and secondary AML are clonally derived. Genetic evolution of secondary AML is a dynamic process shaped by multiple cycles of mutation acquisition and clonal selection. Recurrent gene mutations are found in both founding clones and daughter subclones. (Funded by the National Institutes of Health and others.)
The myelodysplastic syndromes, a heterogeneous group of diseases characterized by ineffective hematopoiesis, are the most common cause of acquired bone marrow failure in adults.1 Secondary acute myeloid leukemia (AML) develops in approximately one third of persons with myelodysplastic syndromes.2 Clinical discrimination between the myelodysplastic syndromes and secondary AML currently rests predominantly on cytomorphologic analysis, since patients with myelodysplastic syndromes have dysplastic hematopoiesis and a myeloblast count of less than 20%, whereas those with a myeloblast count of 20% or more have AML. Although considerable overlap exists between the spectrum of cytogenetic and molecular lesions seen in the two disorders, there remains uncertainty among patients, insurers, and funding agencies about whether the myelodysplastic syndromes are actually cancers.3
The genetic events underlying the progression from cellular dysplasia to cancer have been studied extensively in epithelial tissues (e.g., from the colon and oropharynx),4–6 but less is known about the genetic progression of the myelodysplastic syndromes to secondary AML. Candidate-gene resequencing has identified several genes that have recurrent mutations during the evolution from the myelodysplastic syndromes to secondary AML (e.g., FLT3, NPM1, RUNX1, TP53, and NRAS),7–15 but our understanding of the total number and clonal distribution of mutations in this disease is limited. We used whole-genome sequencing to discover somatic mutations in bone marrow samples obtained from seven subjects with secondary AML and determined whether these mutations were present in paired samples obtained during the antecedent myelodysplastic-syndrome stage. We used this information to define the proportion of clonal cells and genetic architecture at the time of diagnosis of a myelodysplastic syndrome and progression to secondary AML.
Methods are described in detail in the Supplementary Appendix, available with the full text of this article at NEJM.org. In brief, bone marrow biopsy specimens were obtained from seven subjects, all of whom provided written informed consent on a form that contained specific language authorizing whole-genome sequencing. Myeloblasts were enriched by means of cell sorting. Paired-end DNA libraries from bone marrow samples (for secondary AML) and normal skin samples were sequenced on the Illumina HiSeq 2000 and Genome Analyzer IIx. Aligned reads were analyzed to detect putative somatic mutations, as described previously.16 Somatic-mutation predictions were validated with the use of solid-phase capture, followed by deep sequencing of skin, myelodysplastic-syndrome, and secondary-AML samples. Mutations in one subject (UPN266395) have been reported previously.17 Array-based profiling of DNA copy number and gene expression was performed, as described previously.18 Sequence and microarray data have been deposited in the database of Genotypes and Phenotypes (dbGaP) of the National Center for Biotechnology Information (accession number, phs000159.v3.p2).
We performed whole-genome sequencing using paired-end reads generated from DNA libraries prepared from secondary-AML samples and matched skin samples from seven subjects with antecedent de novo myelodysplastic syndromes (Table 1). Diploid genome coverage was more than 95% for the secondary-AML samples and 97% for the skin samples (Table 1 in the Supplementary Appendix). Putative somatic mutations were called and prioritized into nonoverlapping tiers, as described previously.16 To validate somatic mutations and measure mutant allele frequencies, we designed custom solid-phase long-oligonucleotide arrays for each subject and used them to capture regions of the genome containing putative single-nucleotide variants (SNVs, or point mutations) and insertions or deletions (indels). For each subject, a trio of samples was analyzed (normal skin, bone marrow obtained during the antecedent myelodysplastic-syndrome stage, and bone marrow obtained during the secondary-AML stage). The captured DNA was sequenced to provide an average of 640 reads at the sites of validated mutations (Table 1 in the Supplementary Appendix). In each genome, we validated 304 to 872 somatic point mutations and 0 to 2 indels in tier 1 (i.e., the tier consisting of changes in the amino acid coding regions of annotated exons, consensus splice-site regions, and RNA genes) (Tables 2 and 3 in the Supplementary Appendix). The enrichment of myeloblasts by means of cell sorting had no major effect on measurement of the mutant allele burden (Fig. 1 in the Supplementary Appendix).
We first focused on tier 1 mutations with predicted translational consequences (i.e., whole-gene deletions, indels, missense, nonsense, frameshift, or splice-site mutations). There were 17 to 32 validated point mutations or indels (mean, 24) per secondary-AML genome in 168 genes among the seven samples (Table 3 in the Supplementary Appendix). Most of these genes did not have recurrent mutations, suggesting that many of the mutations were randomly acquired and not causally related to the pathogenesis of the myelodysplastic syndromes. Two recurrently mutated genes were detected in two samples each: loss-of-function mutations in a known myeloid-tumor suppressor, RUNX1, and two somatic missense mutations in UMODL1 (Table 2). UMODL1 was recently reported to be mutated in patients with multiple myeloma and those with ovarian cancer.19,20 UMODL1 messenger RNA (mRNA) was expressed in normal CD34+ progenitor cells and in secondary-AML cells from our seven subjects, and the two mutations occurred in regions of UMODL1 encoding conserved domains (T533P in a calcium-binding epidermal-growth-factor–like domain and V882M in a sea-urchin sperm protein, enterokinase, and agrin [SEA] domain).21
To extend these results, we compared the mutations with translational consequences in myelodysplastic-syndrome and secondary AML samples with mutations identified in 200 de novo AML samples (50 subjected to whole-genome sequencing and 150 to whole-exome sequencing). We identified 10 genes that were mutated in one secondary-AML sample and in at least 3 of 200 AML samples (>1%) (Table 2). Seven of these genes are known to have recurrent mutations in AML. Four genes with recurrent mutations (including UMODL1) have not previously been implicated in the myelodysplastic syndromes or AML. A specific codon in U2AF1 harbored missense mutations in multiple AML samples, suggesting that these mutations may cause gain of function. Supporting this hypothesis is our recent report17 that the S34F substitution in U2AF1 (affected by a missense mutation) enhances alternative mRNA splicing in vitro. The X-chromosome gene STAG2 was also recurrently mutated. All STAG2 mutations that we observed in this study are predicted to cause protein truncation (H738fs in secondary AML and all 4 nonsense or frameshift mutations in AML). In addition, STAG2 is deleted in AML and other cancers.22,23 Taken together, these results suggest that STAG2 loss of function probably contributes to the pathogenesis of the myelodysplastic syndromes and AML. Although STAG2 inactivation was recently reported to cause aneuploidy in glioblastoma and colorectal-cancer cell lines,23 the subjects with STAG2 mutations in our study had normal karyotypes.
Singleton mutations (i.e., mutations that are not recurrent) may also be important in the pathogenesis of the myelodysplastic syndromes. Overall, the nonrecurrent tier 1 mutations implicate mutant proteins in 1 of 11 biologic pathways24 that are relevant for cancer pathogenesis, and nearly all these pathways were affected in all seven samples (Fig. 2 in the Supplementary Appendix). Tier 1 mutations were not substantially enriched in any single pathway (data not shown).24
The clinicopathological diagnosis of the myelodysplastic syndromes requires a finding of less than 20% myeloblasts (morphologically defined malignant cells) in the bone marrow. However, data from single-nucleotide-polymorphism (SNP) array and single-gene resequencing studies have suggested that the clonal population of cells in the myelodysplastic syndromes can be greater than 20%.18,25,26 In this study, we used capture sequencing data to accurately measure the prevalence of clonal cells in myelodysplastic-syndrome samples. We calculated the mutant allele frequencies for all validated somatic SNVs (adjusted for chromosome copy number) and performed an unsupervised clustering analysis to define mutation clusters.27 Each myelodysplastic-syndrome and secondary-AML genome contained a founding clone of cells, defined as the mutation cluster (containing 182 to 660 somatic SNVs) that had the highest mutant allele burden at the myelodysplastic-syndrome and secondary-AML stages (cluster 1 in Fig. 1A, and Fig. 5 in the Supplementary Appendix). We estimated maximum tumor clonality as twice the average mutant allele frequency of the founding clone (since all mutations were present once per diploid genome after copy-number correction). In all cases, the majority of unfractionated bone marrow cells (up to 92.7%) in the myelodysplastic-syndrome samples were clonal and indistinguishable from the cells in the secondary-AML samples, even with a myeloblast count of less than 5% (Fig. 1B). Although these results confirm previous observations that the myeloblast count can underestimate the size of the clonal population in myelodysplastic syndromes,18,25,26 they also suggest that when the entire genome is interrogated for mutations, clonal hematopoiesis involving most of the bone marrow appears to be the rule even in early-stage myelodysplastic syndromes.
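The clonality estimate described above (twice the average mutant allele frequency of the founding clone) can be illustrated with a minimal Python sketch. The function name, the input format, and the cap at 100% are assumptions made for illustration; the sketch presumes that the allele frequencies have already been corrected for copy number and that every mutation is heterozygous, and it does not reproduce the clustering step used in the study.

```python
from statistics import mean


def estimate_clonality(founding_clone_vafs):
    """Estimate the maximum fraction of clonal cells in a sample.

    founding_clone_vafs: copy-number-corrected mutant allele frequencies
    (values between 0 and 1) of the mutations in the founding cluster.
    Assumes each mutation is present once per diploid genome, so a
    mutation carried by every cell would have a frequency of 0.5.
    """
    if not founding_clone_vafs:
        raise ValueError("at least one allele frequency is required")
    # Twice the mean frequency; capped at 1.0 because sampling noise can
    # push the raw estimate slightly above 100% (the cap is an assumption).
    return min(1.0, 2 * mean(founding_clone_vafs))


# Hypothetical frequencies for three founding-clone mutations.
example_vafs = [0.40, 0.45, 0.425]
clonality = estimate_clonality(example_vafs)
print(f"Estimated clonal fraction: {clonality:.1%}")
```

With the hypothetical frequencies shown, the mean is 0.425, which corresponds to an estimated clonal fraction of 85%.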
To assess the clonality of copy-number alterations, we designed capture probes spanning heterozygous SNPs that were present in the skin samples and deleted or retained in the bone marrow samples on the basis of SNP arrays and data from whole-genome sequencing (Table 4 in the Supplementary Appendix). We determined the proportion of read counts containing alleles present in the skin samples that were retained in the myelodysplastic-syndrome and secondary-AML samples. The prevalence of the retained alleles in the skin samples was approximately 50%, as expected for heterozygous SNPs. The prevalence of these alleles in the myelodysplastic-syndrome and secondary-AML samples diverged from 50% and was proportionate to the percentage of cells harboring a deletion (Table 5 in the Supplementary Appendix). Clustering of the data regarding copy-number alterations largely recapitulated the SNV clusters for these subjects (Fig. 1C, and Fig. 7 in the Supplementary Appendix). However, unique clones (i.e., unique populations of cells defined by the mutation clusters they contained) were identified by both approaches, suggesting that these types of data provide complementary views of clonal evolution. Our methods do not capture changes in DNA methylation, which are known to be present in myelodysplastic-syndrome genomes,28 suggesting that there may be additional layers of clonal complexity that we have not detected.
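One way to see why the retained-allele prevalence tracks the fraction of cells carrying a deletion is a simple mixture model, sketched below in Python. The model is an assumption for illustration and is not taken from the study: it supposes that each affected cell has lost exactly one copy of the region (leaving one retained allele), that unaffected cells carry both alleles, and that reads sample both alleles without bias. Under these assumptions, a fraction f of deleted cells gives a retained-allele read fraction r = 1 / (2 - f), so f = 2 - 1 / r.

```python
def deleted_cell_fraction(retained_allele_fraction):
    """Infer the fraction of cells with a single-copy deletion.

    retained_allele_fraction: proportion of reads at a heterozygous SNP
    that carry the allele retained in the tumor (0.5 in normal tissue).
    Model (an assumption): deleted cells contribute one retained allele,
    normal cells contribute one retained and one lost allele.
    """
    r = retained_allele_fraction
    if not 0.0 < r <= 1.0:
        raise ValueError("read fraction must be in the interval (0, 1]")
    # Invert r = 1 / (2 - f); clamp to [0, 1] to absorb sampling noise
    # around the expected 50% in samples without the deletion.
    return min(1.0, max(0.0, 2.0 - 1.0 / r))


# Hypothetical read fractions: normal skin, then two bone marrow samples.
for r in (0.5, 0.8, 1.0):
    print(r, deleted_cell_fraction(r))
```

Other kinds of copy-number alteration (for example, copy-neutral loss of heterozygosity) would require a different relationship between the read fraction and the fraction of affected cells.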
All seven samples were oligoclonal at the secondary-AML stage and were monoclonal in only two cases at the myelodysplastic-syndrome stage (Fig. 1D). In the first subject (UPN461282), we identified five distinct mutation clusters that collectively defined the clonal genetic progression of this tumor (Fig. 1A). Two clones were present in the myelodysplastic-syndrome sample. A single cell in clone 2, which contained cluster 1 and 2 mutations, moved forward and by acquisition of cluster 3, 4, and 5 mutations generated three new clones that were present only in the secondary-AML sample. Collectively, the data suggest that the evolution of this tumor proceeded sequentially from cells with cluster 1 mutations to cells containing cluster 1 to 5 mutations, with each new clone carrying forward all the preexisting pathogenic and nonpathogenic mutations (Fig. 2A). The other six tumors also conformed to linear models of clonal evolution (Fig. 5 in the Supplementary Appendix), although analysis at the single-cell level may reveal more complex patterns in some cases.
The secondary-AML genomes contained 304 to 872 somatic SNVs (Table 2 in the Supplementary Appendix). It is likely that most of these mutations are irrelevant to the pathogenesis of the disease. Consistent with this notion is the observation that the number of somatic SNVs per genome tier tended to be proportionate to the tier size (as expected by chance), suggesting that the vast majority of mutations were random background mutations (Fig. 3A in the Supplementary Appendix). We observed that most of the mutations in each secondary-AML sample were present in the paired myelodysplastic-syndrome sample, and there was a subset of secondary-AML–specific mutations. As expected, the proportion of secondary-AML–specific mutations was smaller in the four subjects who had rapid progression to secondary AML (<6 months) than in the three subjects with slower progression (>20 months). An average of 6.7% of all mutations were specific to secondary AML in the subjects with rapid progression, as compared with 37.8% of secondary-AML–specific mutations in the subjects with slow progression (P<0.05) (Fig. 3B in the Supplementary Appendix). The spectrum of transition and transversion mutations in the myelodysplastic-syndrome founding clone was similar in all seven subjects (Fig. 4 in the Supplementary Appendix). The most common substitution was a C·G→T·A transition, as seen in other cancer genomes.29,30 Two subjects (who had been treated with decitabine for 4 to 11 months, after the diagnosis of a myelodysplastic syndrome and before the progression to secondary AML) had a significant increase in the frequency of secondary-AML–specific C→G transversions in their secondary-AML samples, whereas two subjects who were not treated with decitabine did not have a similar increase (Fig. 4 in the Supplementary Appendix). We could not identify a sequence-specific context for these transversions.
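The proportion of secondary-AML–specific mutations discussed above is a simple ratio, which the following Python sketch illustrates. The function name, the paired-frequency input format, and the 1% detection threshold are hypothetical choices for illustration; the criteria used in the study to call a mutation absent from the myelodysplastic-syndrome sample are described in its Supplementary Appendix and are not reproduced here.

```python
def aml_specific_fraction(mutations, detection_threshold=0.01):
    """Fraction of secondary-AML mutations not detected at the MDS stage.

    mutations: iterable of (mds_vaf, aml_vaf) pairs, one per validated
    somatic mutation, with allele frequencies between 0 and 1.
    detection_threshold: frequency below which a mutation is treated as
    absent (a hypothetical cutoff chosen for this sketch).
    """
    # Keep only the mutations that are detectable in the secondary-AML sample.
    in_aml = [(mds, aml) for mds, aml in mutations if aml >= detection_threshold]
    if not in_aml:
        raise ValueError("no mutations detected in the secondary-AML sample")
    # Count those that fall below the threshold in the paired MDS sample.
    specific = sum(1 for mds, _ in in_aml if mds < detection_threshold)
    return specific / len(in_aml)


# Hypothetical data: three mutations shared by both stages, one new in AML.
example = [(0.40, 0.45), (0.38, 0.44), (0.0, 0.30), (0.42, 0.41)]
print(f"{aml_specific_fraction(example):.1%} of mutations are AML-specific")
```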
Even though the mutations driving clonal outgrowth at each stage are not known, the founding clone in the seven myelodysplastic-syndrome samples (containing 182 to 660 mutations) persisted in all seven secondary-AML samples and included at least one tier 1 mutation. The founding clones acquired at least one new tier 1 mutation with predicted translational consequences during the generation of secondary-AML–specific clusters (Fig. 5 in the Supplementary Appendix). Genes with recurrent mutations were detected in both founding clones and daughter subclones (Fig. 2B).
Using next-generation sequencing, we have found that the proportion of neoplastic bone marrow cells is indistinguishable in myelodysplastic-syndrome and secondary-AML samples, suggesting that the myelodysplastic syndromes are as clonal as secondary AML, even with a myeloblast count of zero. Although clonality is not sufficient to define malignant transformation, it is a cardinal manifestation of most human cancers, and our findings suggest that the myelodysplastic syndromes and secondary AML are both highly clonal hematologic cancers.31 Analysis of the proportion of mutant cells in samples obtained from the same subject before and after progression to secondary AML allowed us to compare the clonal architecture and genes that were mutated in order to gain insight into the genetics of these diseases. Robust detection of mutation clusters was possible because hundreds of mutations per genome were identified by whole-genome sequencing, and the allele burdens were quantified by deep resequencing at two time points. If we had analyzed only the secondary-AML samples or only the tier 1 (i.e., exomic) variants, this complexity could not have been elucidated. In the samples from all seven subjects, the secondary-AML genomes were oligoclonal. The preexisting myelodysplastic-syndrome founding clone always persisted in secondary AML, although it was outcompeted by daughter subclones in some cases. With the acquisition of each new set of mutations, all the preexisting mutations were carried forward, resulting in subclones that contained increasing numbers of mutations during evolution. On the basis of our experimental design, we cannot exclude the possibility that there were additional subclones in the myelodysplastic-syndrome samples that were not present in the secondary-AML samples, and one genome (UPN298273) suggests that this could occur.
A unique aspect of the biology of leukemia is that hematopoietic cells freely mix and recirculate between the peripheral blood and the bone marrow. Clones that persist and grow over time must retain the capacity for self-renewal. Mutations in new clones must confer a growth advantage for them to successfully compete with ancestral clones. The result is that these secondary-AML samples are not monoclonal but are instead a mosaic of several genomes with unique sets of mutations; this mosaic is shaped by the acquisition of serial mutations and clonal diversification. Similarly, recent analysis of de novo AML samples with the use of whole-genome sequencing showed that relapse after chemotherapy is associated with clonal evolution and acquisition of new mutations.32 Analysis of individual cancer cells may reveal additional layers of genetic complexity. Recent studies of B-cell acute lymphoblastic leukemia have shown that serial acquisition of cytogenetic abnormalities in that disease most often occurs through a branching hierarchy and only rarely follows a simple linear path.33,34 Extending this work to include the full complement of mutations discovered by whole-genome sequencing will be a major goal for future studies of cancer genetics.
Our study has several clinical implications. First, the distinction between the myelodysplastic syndromes and secondary AML currently relies on manual enumeration of bone marrow myeloblasts, a standard that is subject to interobserver bias but nonetheless drives major decisions about treatment for patients with small differences in myeloblast counts. Ultimately, identifying the patterns of pathogenic mutations and their clonality in bone marrow samples from patients with myelodysplastic syndromes should lead to greater diagnostic certainty and improved prognostic algorithms. On this note, two of the subjects in our study had progression from myelodysplastic syndromes to secondary AML in 1 month. This progression was based on an increase in the myeloblast count from 7% to 66% in one subject and from 13% to 43% in the other, despite an absence of change in the number of clones (two in each case) and only minor increases in point mutations (<2% were gained during progression to secondary AML in these cases) when the same specimens were analyzed by means of next-generation sequencing. Second, our finding that the dominant secondary-AML clone was derived from a myelodysplastic-syndrome founding clone in all cases suggests that therapies targeted to these early mutations might be the most effective strategy for eliminating disease-propagating cells and improving the rate of response to traditional chemotherapy for patients with secondary AML.35,36 Finally, it is possible that disease progression in patients with myelodysplastic syndromes is driven not only by the presence of recurrent mutations, which have recently been shown to have prognostic value,37 but also by the clone (i.e., founding vs. daughter) in which they arise. 
Coupling genotyping of myelodysplastic-syndrome samples for prognostically important mutations with analysis of the clonal architecture may yield more informative biomarkers and a better understanding of the pathogenesis of the myelodysplastic syndromes.
Supported by grants from the National Institutes of Health (R01HL082973 and RC2HL102927, to Dr. Graubert; U54HG003079, to Dr. Wilson; and P01CA101937, to Dr. Ley); a Howard Hughes Medical Institute Physician-Scientist Early Career Award (to Dr. Walter); and a grant from the National Center for Research Resources (UL1RR024992).
We thank the Alvin J. Siteman Cancer Center High Speed Cell Sorting Core, the Molecular and Genomic Analysis Core, the Hereditary Cancer Core, and the Tissue Procurement Core for providing technical assistance; Masayo Izumi for providing technical assistance; Joshua McMichael for providing assistance with illustrations; and the Cancer Genome Atlas AML study for providing access to de novo AML sequence data from 200 cases.
Disclosure forms provided by the authors are available with the full text of this article at NEJM.org.