Massively parallel DNA sequencing technologies provide an unprecedented ability to screen entire genomes for genetic changes associated with tumor progression. Here we describe the genomic analyses of four DNA samples from an African-American patient with basal-like breast cancer: peripheral blood, the primary tumor, a brain metastasis, and a xenograft derived from the primary tumor. The metastasis contained two de novo mutations and a large deletion not present in the primary tumor, and was significantly enriched for 20 shared mutations. The xenograft retained all primary tumor mutations, and displayed a mutation enrichment pattern that paralleled the metastasis (16 of 20 genes). Two overlapping large deletions, encompassing CTNNA1, were present in all three tumor samples. The differential mutation frequencies and structural variation patterns in metastasis and xenograft compared to the primary tumor suggest that secondary tumors may arise from a minority of cells within the primary.
Basal-like breast cancer is characterized by the absence of estrogen receptor (ER) expression, the lack of ERBB2 gene amplification, and a high mitotic index. The consequent absence of approved targeted therapy options and frequently poor response to standard chemotherapy often result in a rapidly fatal clinical course. The disease also accounts for an elevated percentage of breast cancers in patients with African ancestry1. Clinical progress has been limited by a poor understanding of the genetic events responsible for this tumor subtype and by limited preclinical models to study the disease. Since basal-like breast cancer has a highly unstable genome, a key question is whether the fatal metastatic process is driven by mutations that occur after the tumor cells arrive at the distant site, or whether the primary tumor generates cells with a complete repertoire of somatic mutations required for metastatic growth. The rapid advancement of next generation sequencing technologies allows comprehensive characterization of genomic changes, facilitating the comparison of multiple samples taken from the same patient to address the genetic basis for tumor progression and metastasis.
A 44-year-old African-American woman was diagnosed with an ERBB2 negative and ER negative inflammatory breast cancer. She was treated with neoadjuvant dose-dense chemotherapy2, but significant residual tumor was present in the breast and axillary lymph nodes at mastectomy. This indicated chemotherapy resistance and she underwent radiation therapy. Eight months later, she developed a cerebellar metastasis and, despite resection, rapidly succumbed to widely disseminated disease. A transplantable Human-in-Mouse (HIM) xenograft tumor line was generated from a sample of her primary tumor biopsied before treatment3. The xenograft in the mammary fat pad was locally invasive and produced metastatic deposits in lymph nodes and ovaries. Informed consent for full genome sequencing was obtained and DNA samples were prepared from her peripheral blood, primary tumor, brain metastasis, and an early passage xenograft (harvested 101 days after initial engrafting into the mouse host). Application of the PAM50 intrinsic subtype algorithm identified the primary tumor, brain metastasis, and xenograft line as basal-like subtype, with high risk of relapse (ROR) scores4.
Using a paired-end sequencing strategy, we generated 130.7, 124.9, 111.8, and 149.2 billion base pairs of sequence data from genomic DNA derived from blood, primary tumor, brain metastasis, and xenograft samples, respectively, with corresponding haploid coverages of 38.8X, 29.0X, 32.0X, and 23.8X (Supplementary Table 1). These genome-wide coverages were assessed by comparing SNVs detected by MAQ5 with SNPs genotyped using Illumina 1M duo arrays for all tissues excluding the xenograft. Array data from the metastasis were used as a surrogate for monitoring the xenograft SNP coverage and confirmed bi-allelic detection of 98.27%, 96.79%, 96.17%, and 88.77% of the heterozygous array SNPs in the normal, primary tumor, metastasis, and xenograft sequence datasets, respectively (Supplementary Table 1).
The process for selecting somatic mutations is shown in Supplementary Table 2 and is detailed in Supplementary Information. Putative somatic SNVs and indels that overlap with coding sequences, splice sites, and RNA genes were included as “tier 1”. We combined tier 1 sites identified in all three tumor samples and obtained deep read count data for all four samples from Illumina and/or 454 platforms (Supplementary Information). Based on pathology review, the tumor cellularity estimates were 70% for the primary tumor and 90% for both the brain metastasis and xenograft. Utilizing these estimates, we calculated the tumor read counts by proportionally removing the counts derived from the normal tissue reads from the counts obtained from primary tumor and metastasis reads (Supplementary Table 3a). Using the Illumina platform, we also generated 15.6 Gb (4.4X haploid coverage) of sequence data for the NOD/SCID mouse genome used as the host for the xenograft line. The mapping rates of NOD/SCID data to human and mouse C57BL/6 reference sequences were 3.17% and 95.85%, respectively. Since the non-malignant contamination in xenograft is largely from murine cells (which do not significantly affect read mapping), no correction was applied for the xenograft data. Adjusted tumor read counts were used to calculate mutant allele frequencies. Somatic changes were validated by comparing mutant allele frequencies in the three tumor genomes against the germline DNA sample, combined with a manual review of ABI 3730 data from PCR products (Supplementary Information).
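The cellularity adjustment described above can be illustrated with a minimal Python sketch. It assumes that a fraction (1 - cellularity) of the reads at a site comes from non-malignant cells carrying only the reference allele at somatic positions, and it ignores copy number differences between tumor and normal cells. The function names and the example counts are hypothetical and are not taken from the analysis pipeline used in this study.

```python
def adjust_counts(ref_reads, var_reads, cellularity, normal_ref_fraction=1.0):
    """Remove the estimated normal-cell contribution from raw read counts.

    Assumption: (1 - cellularity) of all reads at the site derive from normal
    cells, which show the reference allele with probability normal_ref_fraction
    (1.0 for a purely somatic site).
    """
    if not 0.0 < cellularity <= 1.0:
        raise ValueError("cellularity must be in (0, 1]")
    total = ref_reads + var_reads
    normal_reads = (1.0 - cellularity) * total
    adj_ref = max(ref_reads - normal_reads * normal_ref_fraction, 0.0)
    adj_var = max(var_reads - normal_reads * (1.0 - normal_ref_fraction), 0.0)
    return adj_ref, adj_var


def mutant_allele_frequency(ref_reads, var_reads, cellularity):
    """Mutant allele frequency computed from cellularity-adjusted counts."""
    adj_ref, adj_var = adjust_counts(ref_reads, var_reads, cellularity)
    denominator = adj_ref + adj_var
    return adj_var / denominator if denominator > 0 else 0.0


# Hypothetical site: 65 reference and 35 variant reads at 70% cellularity.
# Raw frequency is 0.35; after removing about 30 normal reads it is about 0.5.
print(mutant_allele_frequency(65, 35, 0.70))
```

Under these assumptions, a sample with 90% cellularity changes far less than one with 70% cellularity, which is why the correction matters most for the primary tumor.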
In summary, a total of 50 somatic sites, including 28 missense, 11 silent, 2 splice site, 1 RNA, 1 nonsense, 4 insertions, and 3 deletions, were validated in at least one of the three tumor genomes. Among coding point mutations, the observed nonsynonymous:synonymous (NS:SS) ratio of 2.64:1 (29:11) is not significantly different from that expected by chance6 (P = 0.51), suggesting that the majority of coding mutations do not confer a selective advantage to the basal tumor. This is similar to the NS:SS ratio reported in the small-cell lung cancer cell line (NCI-H209)7, but higher than the ratio reported in the melanoma cell line (COLO-829)8.
We investigated the spectrum of DNA sequence changes in this basal tumor and found 55% (22/40) of coding point mutations represent C:G->T:A transitions. A similar frequency of C:G->T:A transitions (56% (18/32)) was observed in the lobular breast tumor recently reported9 (Figure 1a). In addition, 15% (6/40) of coding point mutations representing C:G->A:T transversions were detected in the basal tumor, but none were found in the lobular tumor. The statistical significance of these observations should be explored with the comparative analysis of a larger number of basal and lobular breast tumors. Moreover, the observed C:G->T:A transition frequency is notably higher than those observed in a previous breast cancer study10 (P = 0.027; Figure 1b). A set of extremely high confidence tier 1-4 mutations (somatic score > 55 and average mapping quality > 79) was used to explore the genome-wide mutation spectrum. We found that mutations at A:T bases are significantly expanded in the genome-wide set compared to the coding mutations, especially for A:T->G:C transitions (P = 0.0065). This is consistent with the higher A:T content in non-coding sequences than in coding sequences. Comparison to the whole-genome mutation spectrum reported for the melanoma cell line (COLO-829)8 and a small cell lung cancer cell line (NCI-H209)7 suggests that the tumor genome under study shows no sign of tobacco or ultraviolet influence. We then compared the fraction of the three classes of guanine mutations occurring at CpG dinucleotides in primary tumor, brain metastasis, and xenograft and found that the frequencies of G->A mutations are 27.54%, 27.60%, and 28.05% in each respective tumor, significantly higher than both the genome average of 4.45% (P < 10^-10) and the frequency reported in NCI-H209 (P < 10^-10; Figure 1c).
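The strand-collapsed notation used for the mutation spectrum (for example C:G->T:A) can be made concrete with a short Python sketch. It assumes each point mutation is given as a reference base and a variant base on the reference strand, and it folds the twelve possible substitutions into six classes by complementing changes at G or T. The function names and the toy input are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def spectrum_class(ref, alt):
    """Return the strand-collapsed class of a substitution, e.g. 'C:G->T:A'."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Express every change from the strand on which the reference base is C or A.
    if ref in ("G", "T"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}->{alt}:{COMPLEMENT[alt]}"


def mutation_spectrum(substitutions):
    """Map each class to (count, fraction of all substitutions)."""
    counts = Counter(spectrum_class(ref, alt) for ref, alt in substitutions)
    total = sum(counts.values())
    return {cls: (n, n / total) for cls, n in counts.items()}


# Toy input: C>T and G>A fall into the same C:G->T:A class.
toy = [("C", "T"), ("G", "A"), ("G", "T"), ("T", "C")]
for cls, (n, frac) in sorted(mutation_spectrum(toy).items()):
    print(cls, n, f"{frac:.0%}")
```

With this classification, the 22 C:G->T:A transitions among 40 coding point mutations reported above correspond to a fraction of 22/40 = 55%.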
Of the 50 validated point mutations and small indels, 48 are detectable in all three tumors. We performed a statistical enrichment test that takes the variations of different platforms, experiments, and primer pairs into consideration (Supplementary Information). These 48 sites consist of (1A) 20 sites with relatively comparable frequencies across tumors, (1B) 26 sites significantly enriched (FDR ≤ 0.05) in the metastasis and/or xenograft, and (1C) two sites with significant enrichment (FDR ≤ 0.05) in the primary tumor (Figure 2 and Table 1). The affected genes and the likely consequences of these mutations are summarized in Table 1 and Supplementary Table 3b.
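The enrichment test itself is described in the Supplementary Information and is not reproduced here. As an illustration of how a false discovery rate threshold such as FDR ≤ 0.05 can be applied to a set of per-site P values, the following Python sketch uses the Benjamini-Hochberg procedure. The choice of this procedure is an assumption made for illustration only, and the P values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Assumption: this is one common FDR procedure; the source does not state
    which procedure produced the FDR values reported in the text.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Invented per-site P values; sites with adjusted value <= 0.05 are called enriched.
site_pvalues = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(site_pvalues)
enriched = [i for i, q in enumerate(fdr) if q <= 0.05]
print(fdr, enriched)
```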
We detected a JAK2 mutation (I166T), residing in the FERM domain, which is different from the previously reported activating mutations in myeloproliferative diseases, often found in the pseudokinase domain11. Screening of an additional 116 breast tumors identified another mutation (R1122P) in the kinase domain of JAK2 from a Luminal B-type breast cancer. A splice site mutation (e8-1) was found in IRAK2. We performed an RT-PCR experiment using RNAs from the brain metastasis and xenograft and found that the first 30 nucleotides of exon 8 (IRAK2, NM_001570) were skipped and an internal exonic AG site was used as a splice acceptor, resulting in an in-frame deletion. A missense mutation (A401S) in CSMD1 was found in all three tumors. Loss of CSMD1 expression is associated with poor survival in invasive ductal breast carcinoma12 and it is frequently deleted in colorectal adenocarcinoma and head/neck carcinomas13. We also identified 3 missense (E608K, T1456R, and Q2204R) and 1 nonsense (Q3005*) mutations in CSMD1 in 4 breast cancers out of 116 screened. A binomial test shows that CSMD1 is significantly mutated in breast cancer (P = 0.022 and FDR = 0.197) (Supplementary Table 4).
A missense mutation (A681E) in NRK, a protein kinase involved in activating JNK, was found to be present in all three tumors, but at 8- and 13-fold increased allele frequencies in the metastasis and xenograft, respectively (Figure 2 and Table 1). Two somatic mutations (S424C and Q521*) in NRK have been previously reported in breast cancer14. The missense mutation (P461L) identified in the C-terminus of MAP3K8 was present at a roughly 6-fold increase in the xenograft compared to the primary tumor. C-terminal truncation of MAP3K8 has been shown to activate this oncogenic kinase15,16, raising the possibility that this C-terminal substitution (P461L) is an activating mutation.
Another missense mutation (K1017N) in PTPRJ, a protein tyrosine phosphatase, had a mutant allele frequency of 32% in the metastasis and 57% in the xenograft compared to just 1.3% in the primary tumor. This K1017N mutation in PTPRJ is among the most highly enriched mutations in both the metastasis (FDR = 0.00035) and xenograft (FDR = 0.00022). The mutation site is in the juxtamembrane domain (a basic residue motif) and is in close proximity to the tyrosine-protein phosphatase domain (aa 1041-1298). Sacco et al.17 reported that the PTPRJ charged peptide (aa 1013-1024) is responsible for interaction with its substrates, such as ERK1/2. The K1017N mutation found in the basal tumor and the K1016A mutation described in Sacco's report both change a basic residue to a neutral residue, suggesting these two mutations may be functionally similar. A missense mutation (F299V) in WWTR1, assigned as deleterious by SIFT18, was detected at 28% mutant allele frequency in metastasis, but only at 7% and 10% in primary tumor and xenograft, respectively (Figure 2 and Table 1). WWTR1, a 14-3-3 binding protein with a PDZ binding motif, has been shown to modulate mesenchymal stem cell differentiation19. Over-expression of WWTR1 has also been implicated in promoting the migration, invasion, and tumorigenesis of breast cancer cells20.
Another point mutation (R258Q) was identified in CHGB (chromogranin B) encoding a tyrosine-sulfated secretory protein. A SNP at the same position was reported to dbSNP in January 2009 for a Yoruba sample. It was also assigned as a germline site in another African-American with breast cancer when we genotyped this mutation in 112 additional primary tumors and 73 metastatic tumors of various expression classes (Supplementary Information). To investigate this variant further, 84 cancer-free African American women with an average age of 71.2 yrs (low risk for developing breast cancer) and 38 early-onset African American breast cancer patients with an average age of 35.6 yrs were genotyped. The results indicated that 8 out of 84 controls and 3 out of 38 cases carried the variant allele, suggesting this variant is unlikely to be a breast cancer susceptibility allele.
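The carrier counts reported for the CHGB variant (8 of 84 controls and 3 of 38 cases) can be compared with a simple two-sided Fisher's exact test, sketched below in plain Python. The text does not state which test, if any, was applied to these counts, so the choice of Fisher's exact test is an assumption, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with fixed margins)
    that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denominator = comb(n, row1)

    def prob(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / denominator

    lowest = max(0, col1 - (n - row1))
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Rows: cases, controls. Columns: carriers, non-carriers.
cases_carriers, cases_total = 3, 38
controls_carriers, controls_total = 8, 84
p_value = fisher_exact_two_sided(
    cases_carriers, cases_total - cases_carriers,
    controls_carriers, controls_total - controls_carriers,
)
print(f"cases {cases_carriers / cases_total:.1%}, "
      f"controls {controls_carriers / controls_total:.1%}, P = {p_value:.3f}")
```

The carrier frequencies are 3/38 (about 7.9%) in cases and 8/84 (about 9.5%) in controls, so the variant is, if anything, slightly more frequent in cancer-free controls, in line with the interpretation given above.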
Three validated indels were enriched in the metastasis and/or xenograft. One was the 1-bp insertion in exon 4 of the TP53 gene, which creates a frameshift mutation (Q167fs) in the DNA binding domain and results in a truncated protein. We found the TP53 mutation significantly enriched in the xenograft, while present at a relatively constant frequency in primary tumor and metastasis (Figure 2 and Table 1).
A nonsense mutation (Q2222*) in MYCBP2 and a missense mutation (E576K) in TGFBI, both found in all three tumors, had higher mutant allele frequencies in the primary tumor (88% for MYCBP2 and 89% for TGFBI) than in the metastasis (44% for MYCBP2 and 38% for TGFBI) or the xenograft (37% for MYCBP2 and 18% for TGFBI) (Figure 2 and Table 1).
Two de novo mutations were discovered in the metastatic tumor, neither of which was detected in the primary or xenograft tumor genomes. One was a missense mutation (T708I) in SNED1, with a mutant allele frequency of 37%. The other was a silent mutation (N2483) in FLNC with a mutant allele frequency of 18% (Figure 2 and Table 1). Since the xenograft line, without these two mutations, exhibits metastatic lesions in ovarian, lymphoid, and subcutaneous tissue (data not shown), it is unlikely that these mutated genes are essential to the metastatic process.
The cnvHMM algorithm (unpublished) was applied to the aligned sequence reads to detect regions of copy number alterations in all three tumors. Using pathology-based purity estimates for the primary tumor and brain metastasis, we calculated the read depth contributed from the tumor cells alone and then computed the copy number for all genomic positions. Read depth correction was not applied to the xenograft, as stated earlier. We subsequently compared the copy number data from all three tumors with those from peripheral blood, to identify genomic segments with significant copy number alterations (CNAs) (Supplementary Information). A total of 516.5 Mb, 640.4 Mb, and 754.5 Mb were amplified, while 342.5 Mb, 383.1 Mb, and 562.5 Mb were deleted, in primary tumor, metastasis, and xenograft, respectively (Supplementary Table 5-7). Moreover, 96.11% and 93.98% of CNA sequences in the primary tumor also were found in CNA segments in the metastasis and xenograft respectively, suggesting most primary tumor CNAs are preserved during disease progression and engraftment. On the other hand, only 80.65% of metastasis and 61.29% of xenograft CNA sequences overlap with primary tumor CNAs. Furthermore, 155 regions with focal copy number segments (≤ 2Mbp) were detected in the primary tumor, but only 101 and 97 regions in the metastasis and xenograft (Supplementary Table 8-10). Our result also shows that 111 (average span = 745,183 bp) and 99 (average span = 799,395 bp) focal copy number segments (≤ 2Mbp) in the primary tumor overlap with broader copy number segments in the metastasis (average span = 2,245,546 bp) and xenograft (average span = 3,565,456 bp), suggesting possible expansion of primary focal regions or selection of new adjacent events during disease progression and in the mouse host. 
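The purity correction of read depth described above can be sketched in Python as follows. The sketch assumes that tumor and normal depths have already been normalized for overall sequencing coverage, that non-malignant cells are diploid, and that the observed signal is a linear mixture of tumor and normal cells. It is not the cnvHMM algorithm, which is unpublished; the function name and the example values are hypothetical.

```python
def tumor_copy_number(tumor_depth, normal_depth, purity, normal_ploidy=2.0):
    """Estimate the copy number of the tumor cells alone for one segment.

    Model (an assumption): observed = purity * CN_tumor + (1 - purity) * normal_ploidy,
    where observed = normal_ploidy * tumor_depth / normal_depth.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if normal_depth <= 0:
        raise ValueError("normal_depth must be positive")
    observed = normal_ploidy * tumor_depth / normal_depth
    corrected = (observed - (1.0 - purity) * normal_ploidy) / purity
    # Noise can push the estimate slightly below zero for homozygous deletions.
    return max(corrected, 0.0)


# Hypothetical segment: depth ratio 0.65 at 70% purity gives about one copy
# in the tumor cells, although the uncorrected estimate would be 1.3 copies.
print(tumor_copy_number(tumor_depth=19.5, normal_depth=30.0, purity=0.70))
```

For the xenograft no such correction was applied in this study, which under this sketch corresponds to setting purity to 1.0.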
Sequence depth-based copy number analysis showed the highest overall concordance with other platforms, including array CGH and the Illumina SNP array, and also provided the highest concordance of copy number (correlation coefficients: 0.89-0.97) between primary tumor, metastasis, and xenograft (Supplementary Table 11).
We used BreakDancer21 to detect structural variants (SVs) in sequencing data from paired-end libraries (Supplementary Table 12) and applied a set of thresholds to identify putative somatic structural events.
Breakpoint-containing contigs from the three tumor samples that were not present in the matched normal genome were successfully assembled for 137 deletions, 15 insertions, and 38 inversions using the TIGRA assembler (unpublished), suggesting they were putative somatic events. We then re-mapped individual reads to these assembled contigs to screen out germline SVs and to confirm somatic SVs (Supplementary Information), resulting in the detection of 59 deletions and 18 inversions. PCR primers were designed successfully to validate 73 out of 77 putative SV events and the resulting amplicons were sequenced by either the Roche 454 or ABI 3730 platform. Subsequently, 28 deletions and 6 inversions were validated as somatic events (Table 2). Among them, a 46,462 bp heterozygous deletion in FBXW7 removes the last 10 exons and a portion of the first exon of NM_018315, likely inactivating FBXW7. FBXW7 targets Cyclin E and mTOR for ubiquitin-mediated degradation22,23. Numerous cancer-associated mutations in FBXW7 have been previously reported, and loss of FBXW7 function causes chromosomal instability and tumorigenesis24. Two overlapping deletions (538,467 bp and 515,465 bp in length) on chromosome 5, affecting CTNNA1 along with LRRTM2, MATR3, SNORA74A, and SIL1, also were validated. This result is consistent with the detection of a focal copy number deletion encompassing this region in both metastasis (copy number = 0.65) and xenograft (copy number = 0.03) (Figure 3 and Supplementary Tables 9 and 10). Careful examination of this region in the aligned sequence reads for the primary tumor confirms the existence of copy number deletion. Loss of CTNNA1 was shown to result in global loss of cell adhesion in human breast cancer cells25 and increased in vitro tumorigenic characteristics26, suggesting this bi-allelic deletion has functional importance. A 109,563 bp heterozygous deletion on chromosome 8 was assembled and validated in all three tumors.
This event removed three exons of NRG1, which encodes a peptide growth factor that binds to ERBB3 and ERBB4. Interestingly, a 26,919 bp deletion in MECR was only identified, assembled, and validated in the metastasis, suggesting its de novo nature in this sample.
Of the 112 assembled putative translocations, 34 passed manual review using Pairoscope graphs (unpublished), and 19 with an assembly score greater than our experimentally-supported cutoff of 10 were included in Supplementary Table 13. Seven translocations were experimentally validated (Table 2). One validated translocation t(4;9)(188855443;139022258), assembled in all three tumors, involved an LTR from the ERVL-MaLR family on chromosome 4 and ABCA2 on chromosome 9. The translocation removes the final exon of the ABCA2 gene (NM_001606). Two other validated translocations, identified in all three tumors, are t(1;2)(245548338;64855172) and t(2;6)(64855607;144243116) (Supplementary Figure 1). Notably, the breakpoints on chromosome 2 for these two translocations are only separated by 393 bp in a TcMar-Tigger repeat. The chromosome 1 breakpoint of t(1;2)(245548338;64855172) is in intron 5 of NM_032752 in ZNF496. We expect the translation of ZNF496 to continue through exon 5 into intron 5 due to lack of a splice acceptor site. On the other hand, t(2;6)(64855565;144243116) involves FAM164B on chromosome 6 and the translocation contig retains 3 exons of XM_928657. We have also validated t(1;6)(245548342;144243110) (not detected by BreakDancer), whose breakpoints are only 4 bp and 6 bp away from the breakpoints identified on chromosomes 1 and 6 for t(1;2)(245548338;64855172) and t(2;6)(64855607;144243116), respectively (Supplementary Figure 1). This translocation is found in both the primary tumor and the metastasis, but apparently is lost in the xenograft (Supplementary Figure 1 and Figure 4). Sequencing of two PCR products generated using two primer pairs from chromosomes 1 and 6 demonstrated the presence of two forms of genomic fusions: one includes chromosomes 1 and 6 and the other includes chromosomes 1, 2, and 6. The former is only present in the primary tumor and the metastasis.
Our comprehensive analysis of this sample set identified 50 novel somatic point mutations and small indels in coding sequences, RNA genes, and splice sites as well as 28 large deletions, 6 inversions and 7 translocations. In terms of functional annotation, a hierarchy can be suggested. The first level includes somatic changes likely to be functional, such as the small indel in TP53, the large heterozygous deletion in FBXW7, and the bi-allelic deletion in CTNNA1. The second level consists of non-synonymous mutations in genes previously noted to be targeted for somatic mutation in cancer or found to be recurrently mutated in this study, although the exact mutations are novel and their functional importance requires further investigation (JAK2, PTCH2, CSMD1, and NRK). The third level contains mutations known to be related to signal transduction in the malignant cells and/or found to be enriched during disease progression (MAP3K8, PTPRJ, and WWTR1). The final level, by far the largest group, awaits the acquisition of new data. Analysis of germline variants for over 500 classic tumor suppressor genes and oncogenes27 identified a large number of SNPs, none of which were unequivocal hereditary breast cancer susceptibility alleles (data not shown).
The wide range of mutant allele frequencies suggests considerable genetic heterogeneity in the cellular population at the primary site. The mutation frequency range narrowed in brain metastasis and xenograft, suggesting the metastatic and transplantation processes selected for cells utilizing a distinct subset of the primary tumor mutation repertoire. The overlap between the mutation frequency changes seen in the metastatic and xenograft samples argues that genomic progression during xenograft formation is similar to that during metastasis. Moreover, it suggests that the changes were not therapy-related, since the xenograft was established prior to any treatment. GO annotation of enriched mutations suggests that transcription factor activity is possibly selected in xenograft (Supplementary Table 14). In contrast to our observation of only two new tier 1 mutations at the metastatic site, sequencing of an indolent metastatic lobular breast tumor showed that the great majority of the mutations detected were completely novel when compared to the primary tumor9. However, in this instance, the metastatic process evolved over nine years, as opposed to less than one year in the case we describe here. Another difference relative to the lobular cancer genome, where no structural variants were validated, was that paired-end sequencing detected 41 structural variations within this basal-like tumor genome. Our study of a primary tumor-metastasis-xenograft trio therefore demonstrates that, while additional somatic mutations, copy number alterations, and structural variations do occur during the clinical course of the disease, most of the original mutations and structural variants present in the primary tumor are propagated. The preservation of all primary mutations in the xenograft suggests that early passage xenograft lines are valid for functional and therapeutic studies. 
However, the altered mutation frequency and elevated degree of copy number alterations suggest caution when interpreting the results of such experiments.
In conclusion, the first completed basal-like breast cancer genome is highly complex, as would be anticipated from a tumor-type associated with chromosomal instability and DNA repair defects. Indeed this cancer genome, in comparison with the two AML cases published recently by our group27,28, revealed a 3-4 fold increase in high confidence SNVs genome-wide, suggesting a much greater background mutation rate. Future studies should extend our analysis approach of primary, metastatic, and normal tissue trios and include affected individuals with diverse geographic origins to produce a complete catalog of recurrent somatic and inherited variants associated with the development of this common malignancy.
Illumina reads from peripheral blood, primary tumor, metastasis, and xenograft were aligned to NCBI build36 using MAQ5 and coverage levels were defined by comparison of SNPs identified by Illumina 1M duo arrays to SNVs called by MAQ. Somatic mutations were identified using our in-house programs glfSomatic and a modified version of the Samtools indel caller (http://samtools.sourceforge.net/). Putative variants were manually reviewed and then validated by Illumina, 3730 or 454 sequencing. Structural variations were identified using BreakDancer21, manually reviewed and validated by a combination of localized Illumina read assembly, PCR and either 3730 or 454 sequencing. A complete description of the materials and methods used to generate this data set and results is provided in the Supplementary Information.
We thank the many members of The Genome Center and Siteman Cancer Center at Washington University in St. Louis for support. This work was funded by grants to R.K.W. (Richard K. Wilson) from Washington University in St. Louis and the National Human Genome Research Institute (NHGRI U54 HG003079), and grants to M.J.E. (Matthew J. Ellis) from the National Cancer Institute (NCI 1 U01 CA114722-01), the Susan G Komen Breast Cancer Foundation (BCTR0707808), and the Fashion Footwear Charitable Foundation, Inc. NCI U10 CA076001 and a Breast Cancer Research Foundation grant awarded to the American College of Surgeons Oncology Group supported the acquisition of samples for recurrence testing. The tissue procurement core was supported by an NCI core grant to the Siteman Cancer Center (NCI 3P50 CA68438). The Human and Mouse Linked Evaluation of Tumors Core was supported by the Institute of Clinical and Translational Sciences at Washington University (CTSA grant UL1 RR024992). Lastly, we thank Illumina, Inc. for their support and role in the Washington University Cancer Genome Initiative.
Author Contributions: E.R.M., L.D., R.S.F., M.J.E., T.J.L., and R.K.W. designed the experiments. L.D. and M.J.E. led data analysis. L.D., D.E.L., K.C., J.W.W., C.C.H., M.D.M., D.C.K., Q.Z., H.S., J.K., L.C., L.L., M.C.W., N.D.D., D.S., D.M.T., J.L.I., P.J.G., J.S.H., W.S., G.M.W., and Y.T. performed data analysis. D.E.L., C.C.H., J.W.W., J.F.M., and L.D. prepared figures and tables. R.S.F., L.L.F., R.M.A., J.H., K.D.D., C.C.F., K.A.P., J.S.R., V.J.M., L.C., S.D.M., T.L.V., E.A., K.D., S.D., T.G., L.L., R.C., J.E.S., D.P.L., M.E.W., M.C., G.E., M.D.M., D.M.T., J.L.I., and P.J.G. performed laboratory experiments. S.L. and M.J.E. created the xenograft line. M.J.E., M.W., and R.A. provided samples. D.J.D., S.M.S., A.F.D., G.E.S., C.S.P., J.M.E., J.B.P., B.J.O., J.T.L., F.D., A.E.H., M.D.O., and K.E.B. provided informatics support. L.D., M.J.E., E.R.M., and R.K.W. wrote the manuscript. T.J.L., D.E.L., M.C.W., D.C.K., and C.M.P. critically read and commented on the manuscript.