In this report, we describe the sequencing and analysis of a primary human cancer genome using next-generation sequencing technology. Our patient’s tumor genome was essentially diploid, and contained ten non-synonymous somatic mutations that may be relevant for her disease. These mutations affect genes participating in several well-described pathways that are known to contribute to cancer pathogenesis, but most of these genes would not have been candidates for directed re-sequencing based on our current understanding of cancer. Hence, these results justify the use of next-generation whole genome sequencing approaches to reveal somatic mutations in cancer genomes.
As we demonstrated in our re-sequencing of the genome of the C. elegans N2 Bristol strain14, and again in this study, massively parallel short-read sequencing provides an effective method for examining single nucleotide and short indel variants by comparison of the aligned reads to a reference genome sequence. By sequencing our patient’s tumor genome to a depth of >30-fold coverage, and gauging our ability to detect known heterozygous positions across the genome, we have produced a sufficient depth and breadth of sequence coverage to comprehensively discover somatic genome variants. A slightly lower coverage of the normal genome from this individual helped to identify nearly 98% of potential variants as being inherited, a critical filter that allowed us to more readily identify the true somatic mutations in this tumor. Our results strongly support the notion that hypothesis-driven (e.g. candidate gene-based) examination of tumor genomes by PCR-directed or capture-based methods is inherently limited, and will miss key mutations. An additional and important consideration is the demand for large amounts of genomic DNA by these techniques; this is a serious limitation when precious clinical samples are being studied. The Illumina/Solexa technology requires only ~1 µg of DNA per library, enabling the study of primary tumor DNA rather than requiring the use of tumor cell lines, which may contain genetic changes and adaptations required for immortalization and maintenance in tissue culture conditions.
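The inherited-variant filter described above can be illustrated with a minimal sketch. It assumes variant calls have already been made separately for the tumor and normal genomes; the `Variant` record, the function name, and the example calls are all hypothetical and are not part of our analysis pipeline.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position on the reference sequence
    ref: str   # reference allele
    alt: str   # variant allele


def split_inherited_and_somatic(tumor_calls, normal_calls):
    """Partition tumor calls by whether the same change is seen in the normal genome."""
    normal = set(normal_calls)
    inherited = [v for v in tumor_calls if v in normal]
    somatic_candidates = [v for v in tumor_calls if v not in normal]
    return inherited, somatic_candidates


# Hypothetical calls, for illustration only.
tumor = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 2000, "C", "T"),
    Variant("chr5", 5000, "G", "A"),
]
normal = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 2000, "C", "T"),
]

inherited, somatic_candidates = split_inherited_and_somatic(tumor, normal)
```

A variant absent from the normal calls is only a candidate: the position may simply lack coverage in the normal genome, so each candidate still requires independent validation.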
A total of 10 non-synonymous somatic mutations were identified in this patient’s tumor genome. Two are well-known AML-associated mutations, including an internal tandem duplication of the FLT3 receptor tyrosine kinase gene, which constitutively activates kinase signaling, and portends a poor prognosis5,24,25, and a four-base insertion in exon 12 of the NPM1 gene (NPMc)26–28. Both of these mutations are common (25–30%) in AML tumors, and both are thought to contribute to progression of the disease, rather than to cause it directly29. Interestingly, the frequency of the mutant FLT3 allele in the primary and relapse tumor samples (35.08% and 31.30%, respectively) was significantly less than that of the other 9 mutations (p<0.000001 for both the primary and relapse samples). These data suggest that the FLT3 ITD may not have been present in all tumor cells, and further, that it may have been the last mutation acquired.
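The reasoning from allele frequency to tumor cell fraction can be made explicit with a short calculation. This is a simplified sketch that assumes a pure tumor sample, a diploid locus, a mutation on one of the two alleles, and no allele-specific detection bias; the function name is hypothetical.

```python
def cell_fraction_from_vaf(vaf: float, copies_mutated: int = 1, ploidy: int = 2) -> float:
    """Estimate the fraction of cells carrying a mutation from its allele frequency.

    Assumes every cell in the sample is a tumor cell with `ploidy` copies of
    the locus, of which `copies_mutated` carry the mutation in affected cells.
    """
    return min(1.0, vaf * ploidy / copies_mutated)


# Mutant FLT3 allele frequencies reported in the text.
flt3_primary = cell_fraction_from_vaf(0.3508)  # about 0.70
flt3_relapse = cell_fraction_from_vaf(0.3130)  # about 0.63
```

Under these assumptions, a heterozygous mutation present in every tumor cell would show an allele frequency near 50%, whereas the FLT3 values correspond to roughly 63–70% of cells.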
The other eight somatic mutations that we detected are all single base changes, and none has previously been detected in an AML genome. Four of the genes affected, however, are in gene families that are strongly associated with cancer pathogenesis (including PTPRT, CDH24, PCLKC, and SLC15A1). The other four somatic mutations occurred in genes not previously implicated in cancer pathogenesis, but whose potential functions in metabolic pathways suggest mechanisms by which they could act to promote cancer (including KNDC1, GPR123, EBI2, and GRINL1B). We speculate regarding the roles of these mutations for the pathogenesis of this patient’s disease in Supplementary Materials.
The importance of the eight newly defined somatic mutations for AML pathogenesis is not yet known, and will require functional validation studies in tissue culture cells and mouse models to assess their relevance. Even though we could not detect recurrent mutations in the limited AML sample set we surveyed, several lines of evidence suggest that these mutations may not be random, “passenger” mutations, as follows:
- Somatic mutations in this genome are extremely rare. The rarity of somatic variants, and the normal diploid structure of the tumor genome, argues strongly against genetic instability or DNA repair defects in this tumor. Conceptually, this result is further supported by the very small number of novel somatic mutations discovered in the expressed tyrosine kinases of AML samples4,5; genetic instability does not appear to be a general feature of AML genomes.
- Based on the equivalent frequencies of the variant and wild-type alleles for the mutations in the tumor genome (except for the FLT3 ITD), it is highly likely that all the mutations are heterozygous, and are present in virtually all of the tumor cells. The latter suggests that these mutations all may have been selected for and retained because they are important for disease pathogenesis in this patient. Alternatively, all may have occurred simultaneously in the same leukemia-initiating cell, but only a subset of the mutations (or an as-yet undetected mutation) is truly important for pathogenesis (i.e. disease “drivers” vs. passengers). Although we suggest that the latter scenario is very unlikely based on our current understanding of tumor progression, many more AML genomes will need to be sequenced to resolve this issue.
- The same mutations were detected in the tumor cells in the relapse sample at approximately the same frequencies as in the primary sample. All of these mutations were therefore present in the resistant tumor cells that contributed to the patient’s relapse, further suggesting that a single clone contains all 10 mutations.
- Seven of the 10 genes containing somatic mutations were detectably expressed in the tumor sample. FLT3 and NPM1 mRNAs were highly expressed in this tumor sample, as they are in virtually all AML samples. We detected mRNA from the CDH24, SLC15A1, and EBI2 genes on the Affymetrix expression array, while expression of GRINL1B and PCLKC was detected by RT-PCR (data not shown). Expression of KNDC1, PTPRT, and GPR123 was not detected by either approach, but we cannot rule out expression of these genes in a small subset of tumor cells (e.g. leukemia initiating cells).
- For the five point mutations where data are available, the mutated base is highly conserved across multiple species.
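The argument from equivalent variant and wild-type allele frequencies can be illustrated by testing whether read counts at a mutated position are compatible with a 1:1 allele ratio. The sketch below uses an exact two-sided binomial test as one possible choice; it is not necessarily the test used in this study, and the read counts in the usage lines are hypothetical.

```python
from math import comb


def two_sided_binomial_p(mutant_reads: int, total_reads: int) -> float:
    """Exact two-sided binomial p-value against an expected allele fraction of 0.5.

    Because the null distribution is symmetric at 0.5, the two-sided p-value
    is twice the probability of the smaller tail, capped at 1.
    """
    k = min(mutant_reads, total_reads - mutant_reads)
    tail = sum(comb(total_reads, i) for i in range(k + 1)) / 2 ** total_reads
    return min(1.0, 2.0 * tail)


# Hypothetical read counts, for illustration only.
p_balanced = two_sided_binomial_p(48, 100)  # close to a 1:1 allele ratio
p_skewed = two_sided_binomial_p(35, 100)    # mutant allele under-represented
```

A large p-value is compatible with a heterozygous mutation present in essentially all cells of a pure, diploid sample; a small one points to a subclonal mutation, a copy number change, or measurement bias.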
Although we performed whole genome sequencing on this cancer sample, we restricted our initial validation studies to the 1–2% of the genome that encodes genes. This raises the issue of whether sequencing the cDNA transcriptome of this tumor would have been a faster, cheaper, and more efficient way of finding the mutations. While this approach will undoubtedly be an important adjunct to whole genome sequencing, there are several advantages to the approach we used:
- Coverage models for whole genome libraries are currently better understood than for cDNA libraries, where transcript abundance can vary over many orders of magnitude.
- Even if the transcriptome had been sequenced, extensive characterization of the normal genome would have been required to distinguish inherited variants from somatic mutations.
- Relevant non-synonymous mutations could be missed by cDNA sequencing, including mutations that result in RNA instability (splice variants, nonsense mutations, etc.), and/or mutations in genes expressed at low levels, or in only a small subset of tumor cells.
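The first point above, on coverage models, can be illustrated with the textbook idealization in which read depth at a position follows a Poisson distribution around the mean coverage. This sketch is an assumption for illustration, not the coverage model used in this study; real libraries deviate from it because of repeats and sequence-composition bias, and the function name is hypothetical.

```python
from math import exp, factorial


def prob_depth_at_least(mean_depth: float, k: int) -> float:
    """Probability that a position is covered by at least k reads.

    Idealized model: per-position depth is Poisson-distributed with mean
    equal to the genome-wide average coverage.
    """
    below = sum(exp(-mean_depth) * mean_depth ** i / factorial(i) for i in range(k))
    return 1.0 - below


# Compare two mean coverages for a minimum of 10 reads at a position.
p_at_30x = prob_depth_at_least(30.0, 10)
p_at_10x = prob_depth_at_least(10.0, 10)
```

No comparably simple model applies to cDNA libraries, where the expected depth of each transcript depends on its abundance.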
This genome also contains additional non-coding and non-genic somatic variants, which we currently estimate at 500–1000 based on our assessment of false positive and false negative rates for non-synonymous mutations. These variants, which will be fully described later, will provide a rich source of potentially relevant sequence changes that will be better understood as more cancer genomes are sequenced.
In summary, we have successfully used a next-generation whole genome sequencing approach to identify new candidate genes that may be relevant for AML pathogenesis. We cannot overemphasize the importance of parallel sequencing of the patient’s normal genome to determine which variants were inherited; the identification of the true somatic mutations in this tumor genome would not have been feasible without this approach. Furthermore, until hundreds (or perhaps thousands) of normal genomes and additional AML tumors are sequenced, the contextual relevance of the mutations found in this genome will be unknown. Regardless, the somatic mutations that we did find were predicted neither by the curation of previously defined cancer genes nor by the study of this tumor using unbiased, high-resolution array-based genomic approaches. For AML and other types of cancer, whole genome sequencing may therefore be the only effective means for discovering all of the mutations that are relevant for pathogenesis.
Sequence end reads (average length for tumor genome, 32 bp, and for skin, 35 bp) were generated from Illumina/Solexa fragment libraries derived from the tumor or skin cells of patient 933124, using the Illumina Genome Analyzer. The analyzed reads were aligned to the human reference (NCBI Build 36) using Maq21. Coverage of the tumor and normal genomes was ascertained by comparison to the patient’s heterozygous SNPs, established by compiling shared SNP calls monitored on the Affymetrix 6.0 and Illumina Infinium 550K genotyping platforms. We examined the Maq alignments by Decision Tree analysis to discover single nucleotide variants, as well as to identify copy number variants. Nonaligned reads were further analyzed for indel discovery. For all putative variants, we attempted validation using custom PCR and capillary sequencing on the ABI 3730 platform. All validated somatic mutations were further analyzed by Roche/454 sequencing of PCR-generated amplicons made from primary genomic DNA to compare read counts of wild-type and mutant alleles in the primary tumor, skin, and relapse tumor samples. A complete description of the AML case sequenced, and the Materials and Methods used to generate this dataset, are provided in the Supplementary Materials.
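The use of known heterozygous SNPs to ascertain coverage can be sketched as the fraction of array-confirmed heterozygous positions that are also called heterozygous from the sequence data. The function name and the positions below are hypothetical, and the sketch assumes both inputs use the same coordinate system.

```python
def het_snp_detection_rate(known_het_positions, called_het_positions) -> float:
    """Fraction of known heterozygous positions recovered by sequencing.

    Positions are (chromosome, position) pairs on the same reference build.
    """
    known = set(known_het_positions)
    if not known:
        raise ValueError("no known heterozygous positions supplied")
    return len(known & set(called_het_positions)) / len(known)


# Hypothetical positions, for illustration only.
known = [("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 7)]
called = [("chr1", 100), ("chr2", 50), ("chr3", 7), ("chr9", 1)]
rate = het_snp_detection_rate(known, called)
```

Because both alleles must be sampled for a heterozygous call, this rate is a more demanding measure than the fraction of the reference covered by at least one read.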