Endometrial cancer is the 6th most commonly diagnosed cancer among women worldwide, causing ~74,000 deaths annually 1. Serous endometrial cancers are a clinically aggressive subtype with a poorly defined genetic etiology 2-4. We used whole exome sequencing (WES) to comprehensively search for somatic mutations within ~22,000 protein-encoding genes among 13 primary serous endometrial tumors. We then resequenced 18 genes in 40 additional serous tumors. These genes were either mutated in more than one tumor or formed an enriched functional grouping, or both. We identified high frequencies of somatic mutations in CHD4 (17%), EP300 (8%), ARID1A (6%), TSPYL2 (6%), FBXW7 (29%), SPOP (8%), MAP3K4 (6%) and ABCC9 (6%). Overall, 36.5% of serous tumors had a mutation in a chromatin-remodeling gene and 35% had a mutation in a ubiquitin ligase complex gene. This implicates frequent mutational disruption of these processes in the molecular pathogenesis of one of the deadliest forms of endometrial cancer.
We performed targeted exon capture and next-generation sequencing on DNA from 13 primary serous endometrial tumors with high neoplastic cellularity (Supplementary Table 1) and matched normal DNA. The mean depth of coverage for aligned reads was 102.6×. On average, 89.5% of targeted bases had sufficient coverage and quality for variant calling (Supplementary Table 2). We applied stringent filtering criteria (Materials and Methods). These included an empirically determined threshold on read quality and depth that gave an optimal balance between positive predictive value (86.1%) and sensitivity (97.3%). With these criteria we identified 1,522 exonic somatic mutations (1,183 nonsynonymous and 339 synonymous) and 22 splice junction mutations within the protein-encoding genes of the 13 tumors (Supplementary Figure 1). One tumor had an apparently hypermutable phenotype, with more mutations and a different mutation signature than the other 12 tumors (Supplementary Figure 2; Supplementary Table 3 and Supplementary Table 4). This tumor was excluded from subsequent analyses.
Among the remaining 12 tumors we identified 516 exonic mutations (380 nonsynonymous and 136 synonymous) and 11 splice junction mutations (Supplementary Table 5). We could orthogonally assess 510 of the 527 exonic and splice junction mutations by Sanger sequencing. Of these 510, 439 (86.1%) were confirmed as somatic (Supplementary Table 5 and Supplementary Table 6). The validated somatic mutations comprised 321 nonsynonymous mutations (279 missense, 86.9%; 19 nonsense, 5.9%; 17 frameshift, 5.3%; 6 in-frame insertions/deletions, 1.9%) and 9 splice junction mutations among 304 protein-encoding genes, plus 109 synonymous mutations (Supplementary Table 5). There was a mean of 27.5 validated nonsynonymous and splice junction mutations per tumor (range 5-55) (Supplementary Table 6). We assessed the predicted functional impact of the validated missense mutations in silico. Of the 241 missense mutations that could be assessed by both the SIFT and Mutation Assessor algorithms, 34.4% were predicted to impact protein function (Supplementary Table 5).
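As a quick consistency check, the counts in the preceding paragraph can be tallied directly. The short snippet below uses illustrative variable names. It reproduces the class percentages, the 439 validated mutations, the 86.1% confirmation rate and the per-tumor mean of 27.5.

```python
# Validated somatic mutation counts for the 12 discovery tumors, taken from the text.
nonsynonymous = {"missense": 279, "nonsense": 19, "frameshift": 17, "in-frame indel": 6}
splice_junction = 9
synonymous = 109
n_tumors = 12

total_nonsyn = sum(nonsynonymous.values())  # 321
# Share of each nonsynonymous class, in percent, rounded to one decimal place.
shares = {kind: round(100 * n / total_nonsyn, 1) for kind, n in nonsynonymous.items()}
total_validated = total_nonsyn + splice_junction + synonymous  # 439
# Mean validated nonsynonymous + splice junction mutations per tumor.
mean_per_tumor = (total_nonsyn + splice_junction) / n_tumors  # 27.5
# 439 confirmed out of the 510 variants that could be Sanger-assessed.
validation_rate = round(100 * total_validated / 510, 1)

print(total_nonsyn, shares, total_validated, mean_per_tumor, validation_rate)
```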
To prioritize our search for novel driver mutations in serous endometrial cancer, we focused on the nine genes that had validated nonsynonymous somatic mutations in more than one tumor (Supplementary Table 7). We resequenced these genes in a prevalence screen of 40 additional serous endometrial tumors. Three of the nine genes (TP53, PIK3CA, and PPP2R1A) have established roles in the pathogenesis of serous endometrial cancer 5-8. Among the 52 serous tumors in our study, TP53, PIK3CA, and PPP2R1A were mutated in 71%, 31%, and 25% of tumors, respectively, as reported here and in a previous study from our group 6 (Supplementary Table 8 and Supplementary Table 9). The other six genes (CHD4, SPOP, FBXW7, ABCC9, CYP4X1, MAP3K4) have no previously reported role in serous endometrial cancer. The combined discovery and prevalence screens revealed high-frequency somatic mutations in CHD4 (17%), FBXW7 (29%), SPOP (8%), MAP3K4 (6%), ABCC9 (6%), and CYP4X1 (4%) (Table 1, Figure 1, Supplementary Table 9, Supplementary Figure 3, and Supplementary Figure 4). The mutation rates for CHD4, FBXW7, and SPOP were significantly higher than the background mutation rate (q ≤ 0.0353) (Supplementary Table 10).
The major histological subtypes of endometrial cancer are serous, clear cell, and endometrioid, with overall 5-year relative survival rates of 45%, 65%, and 91%, respectively 9. For comparison across subtypes, we resequenced CHD4, FBXW7, and SPOP in 23 clear cell, 67 endometrioid, and 18 mixed histology endometrial tumors. Collectively, these three genes were somatically mutated in 40% of serous, 26% of clear cell, 15% of endometrioid, and 17% of mixed histology endometrial cancers (Supplementary Table 9). There was no significant association between mutations in CHD4, FBXW7, or SPOP and microsatellite instability or MSH6 mutations (Supplementary Table 11, Supplementary Table 12, Supplementary Figure 5).
CHD4 (Chromodomain-helicase-DNA-binding protein 4) is a catalytic subunit of the NuRD complex, which regulates transcriptional repression, chromatin assembly, and the DNA damage response 10-19. We confirmed endogenous CHD4 expression in endometrial cancer cells (Supplementary Figure 6). CHD4 was frequently mutated in serous tumors (17%) and was also mutated in clear cell (4%), endometrioid (7%), and mixed histology (11%) tumors (Supplementary Table 9). Eighty percent of CHD4 missense mutations are predicted to impact protein function, including those at an arginine-1162 (CHD4 Arg1162) hotspot (Table 1 and Figure 1). Most CHD4 mutations were missense mutations. Overall, 83% of all CHD4 mutations affected residues that are highly conserved throughout evolution or across closely related family members (Supplementary Figure 7 and Supplementary Figure 8).
Half of all CHD4 mutations localized to the ATPase/helicase domains (Figure 1a). Two-thirds (6 of 9) of those mutations localized to conserved residues whose equivalents in SMARCAL1, SMARCA4, or SMARCA2 are sites of germline or de novo pathogenic mutations (Supplementary Figure 9) 20-22. These mutations cause Schimke immune-osseous dysplasia, Coffin-Siris syndrome, and Nicolaides-Baraitser syndrome, respectively. This observation leads us to speculate that somatic mutations in the ATPase/helicase domain of CHD4 may be driver mutations in endometrial cancer.
Other frequently mutated genes in our study were FBXW7 and SPOP. The FBXW7 (F-box and WD repeat domain containing 7) tumor suppressor is a component of the FBXW7-SKP1-CUL1 ubiquitin ligase complex. This complex mediates ubiquitination and proteasomal degradation of phosphoprotein substrates including CYCLIN-E, NOTCH, JUN, and C-MYC 23. Previous reports of FBXW7 mutations in endometrial cancer either did not include serous and clear cell tumors or did not report the histology of mutated cases 24-27. We identified FBXW7 mutations in 29% of serous, 13% of clear cell, 10% of endometrioid, and 11% of mixed histology endometrial cancers (Supplementary Table 9). The mutation frequency was significantly higher among serous tumors than among high-grade endometrioid tumors (29% versus 0%; P=0.0146, two-tailed Fisher’s exact test). Most FBXW7 mutations localized to the substrate-binding WD repeats (Figure 1b), consistent with the mutation spectrum in other cancers 28. Forty-six percent of FBXW7 missense mutations in our study are known loss-of-function mutations, and another 39% are predicted to affect function 29 (Table 1). Our findings may be clinically relevant, since loss of FBXW7 function correlates with resistance to antitubulin chemotherapeutics 30 and sensitivity to an HDAC inhibitor 31.
SPOP (Speckle-type POZ Protein) was somatically mutated in serous (8%) and clear cell (9%) tumors but was not mutated in endometrioid or mixed histology tumors (Supplementary Table 9). The mutation frequency was significantly higher in serous than in endometrioid tumors (8% versus 0%; P=0.0341, two-tailed Fisher’s exact test). SPOP forms part of a multi-subunit CULLIN3 (CUL3)-dependent ubiquitin ligase complex and has recently been shown to be mutated at high frequency in prostate cancer 32. All SPOP mutations in endometrial cancer localized to highly evolutionarily conserved residues within the MATH domain, including a recurrent SPOP Ser80Arg mutation (Figure 1c). The MATH domain is the substrate recognition domain that binds proteins targeted for ubiquitin-mediated degradation. This localization is strikingly similar to that of loss-of-function mutations in the substrate recognition domain of FBXW7 (Figure 1b). We therefore predict that SPOP mutations in endometrial cancer are likely to be loss-of-function mutations that impair substrate binding. Intriguingly, the SRC-3/AIB1 oncoprotein, an SPOP substrate 33, is overexpressed in endometrial cancer independent of gene amplification 34.
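The P values in these paragraphs come from two-tailed Fisher's exact tests on 2×2 tables of mutated versus wild-type tumors. Below is a minimal standard-library sketch. It uses the common two-tailed convention, also used by R's fisher.test and scipy.stats.fisher_exact: sum the probabilities of all tables with the same margins that are no more likely than the observed one. The function name is hypothetical. The SPOP example makes two assumptions: that "8% of serous tumors" corresponds to 4 of 52, and that all 67 endometrioid tumors were screened.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are tumor groups; columns are mutated and wild-type counts.
    The p-value sums the hypergeometric probabilities of all tables with the
    same margins whose probability does not exceed that of the observed table.
    """
    row1, row2 = a + b, c + d
    mutated = a + c
    total = comb(row1 + row2, mutated)

    def prob(x):
        # Probability that x of the mutated tumors fall in the first group.
        return comb(row1, x) * comb(row2, mutated - x) / total

    observed = prob(a)
    support = range(max(0, mutated - row2), min(row1, mutated) + 1)
    # A tiny relative tolerance keeps numerically tied tables in the sum.
    return sum(p for p in map(prob, support) if p <= observed * (1 + 1e-7))

# SPOP: 4 of 52 serous vs 0 of 67 endometrioid tumors (counts assumed from the percentages).
p = fisher_exact_two_tailed(4, 48, 0, 67)
print(round(p, 4))
```

With these assumed counts the sketch returns P ≈ 0.0341, matching the value reported for SPOP. The FBXW7 comparison cannot be reproduced here because the number of high-grade endometrioid tumors is not given in this section.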
To identify additional candidate driver genes for serous endometrial cancer, we evaluated the functional relationships of the 304 protein-encoding genes that had orthogonally validated mutations (nonsynonymous or splice junction) in the discovery screen (Supplementary Table 13 and Supplementary Table 14) 35,36. One of the enriched functional groupings was chromatin modification. It comprised CHD4 and ten other genes (EP300, ARID1A, TSPYL2, KDM4B, TRIM16, HDAC7, CTCF, YEATS4, TRRAP, and BAZ1B) (Supplementary Table 13). This grouping did not achieve statistical significance after multiple testing correction. We nevertheless focused on it for two reasons: it contained CHD4, one of the most frequently mutated genes identified in our study, and chromatin-remodeling genes are a frequent target of somatic mutations in other types of cancer 37-50. We therefore resequenced the ten additional chromatin-remodeling genes in 40 additional serous and 23 clear cell endometrial tumors. In the combined discovery and prevalence screens, the 11 chromatin-remodeling genes were somatically mutated in 36.5% of serous tumors and 22% of clear cell tumors (Table 1, Table 2, and Figure 2). Two of the mutated genes, EP300 (E1A binding protein p300) and ARID1A (AT rich interactive domain 1A, SWI-like), are consensus cancer genes. EP300 and ARID1A were mutated, respectively, in 8% and 6% of serous tumors and in 4% and 13% of clear cell tumors (Table 2). Most EP300 mutations localized within the histone acetyltransferase domain of p300, a global transcriptional co-activator (Figure 3a). p300 Arg1627 formed a mutation hotspot in endometrial cancer, and the p300 Arg1627Trp mutation has also been described in lymphoma 51. ARID1A encodes the BAF250A tumor suppressor, a subunit of the SWI/SNF-A chromatin-remodeling complex 43,49,52-54. Most ARID1A mutations we uncovered are predicted to truncate BAF250A (Figure 3b), consistent with the mutation spectrum in other tumors 43,49,55.
Our finding of ARID1A mutations in 6% of serous endometrial cancers is consistent with a recent study by McConechy et al., which documented ARID1A mutations in 11% of serous endometrial tumors 56. To our knowledge, this is the first report of ARID1A mutations in clear cell endometrial cancer, substantiating previous reports of loss of BAF250A expression in this histotype 57.
Our study provides novel insights into the somatic mutations present in serous endometrial cancer exomes. However, our discovery screen is underpowered to detect all somatically mutated genes that drive serous tumors. For example, we previously found PIK3R1 to be somatically mutated in 8% of serous endometrial tumors 58, yet it was not somatically mutated in any tumor in our discovery screen. For genes mutated in 8% of all serous endometrial cancers, we estimate that a discovery screen of 12 tumors has 25% power to detect at least two mutated tumors and 63% power to detect at least one (Supplementary Table 15). For genes mutated in 20% of all serous endometrial cancers, our discovery screen has an estimated 72.5% power to detect at least two mutated tumors and 93% power to detect at least one. Massively parallel sequencing of additional cases will undoubtedly yield deeper insights into the mutational landscape of serous endometrial cancer.
Herein we report the first exome sequence analysis of serous endometrial cancers, which are clinically aggressive tumors that have been poorly characterized genomically. Our findings implicate the disruption of chromatin-remodeling and ubiquitin ligase complex genes in 50% of serous endometrial tumors and 35% of clear cell endometrial tumors (Figure 2). The high frequency and specific distribution of mutations in CHD4, FBXW7, and SPOP strongly suggest that these are driver events in serous endometrial cancer.
We obtained the following from the Cooperative Human Tissue Network, which is funded by the National Cancer Institute: anonymized, snap-frozen primary tumor tissues; corresponding hematoxylin and eosin-stained tumor sections; matched normal tissues (uninvolved reproductive tissue or whole blood); and clinicopathological information. The NIH Office of Human Subjects Research determined that this research was not “human subjects research” per the Common Rule (45 CFR 46), and therefore that no IRB review was required for whole exome sequencing of these samples. A small number of samples in the prevalence screen were obtained from the Biosample Repository at Fox Chase Cancer Center or from Oncomatrix, Inc. (San Marcos, CA). Tumor specimens were collected at surgical resection, before treatment. Histological classifications were based upon the entire specimen at time of diagnosis. Tumors consisted of 53 serous cases, 23 clear cell cases, 67 endometrioid cases and 18 cases of mixed histology. A pathologist reviewed hematoxylin and eosin-stained sections of banked tumor tissues for two purposes: to verify that they were representative of the original histological classification, and to delineate regions of high tumor cell content (>70%) for macrodissection.
Genomic DNA was isolated from macrodissected tumor tissues and normal tissues using the PUREGENE kit (Qiagen), followed by phenol-chloroform purification. Tumor-normal pairs were typed using the Coriell Identity Mapping kit (Coriell). Genotyping fragments were resolved on an ABI-3730xl DNA analyzer (Applied Biosystems) and scored using GeneMapper software.
Exomes of tumor-normal pairs were captured according to the manufacturer’s instructions, using either the Agilent SureSelect Human All Exon Kit (3 pairs) or the Agilent SureSelect Human All Exon 50 Mb Target Enrichment kit (10 pairs). DNAs captured using the SureSelect Human All Exon Kit were run on an Illumina GAIIx platform with version 4 chemistry and version 2 flowcells. DNAs captured using the SureSelect Human All Exon 50 Mb Target Enrichment kit were run on an Illumina GAIIx platform with version 5 chemistry and version 4 flowcells. Sequencing was performed according to the manufacturer’s instructions to generate 75- or 100-base paired-end reads.
Next generation sequence reads were initially mapped to the human reference sequence NCBI build 36 (hg18) using the Illumina ELAND alignment algorithm. When at least one read in a pair mapped to a unique location in the genome, that read and its pair were subjected to a more accurate gapped alignment to the 100kb region surrounding the location with cross_match (URL: http://www.phrap.org/phredphrapconsed.html). Alignments were stored in BAM format, and fed as input to bam2mpg (URL: http://research.nhgri.nih.gov/software/bam2mpg/), to call genotypes at all covered positions using a probabilistic Bayesian algorithm referred to as the most-probable-genotype (MPG) algorithm 59. Highly reliable genotypes have an MPG score ≥10 59. For tumor samples, bam2mpg was run with the --score_variant option, in order to calculate a “most probable variant” (MPV) score, which assesses the posterior probability of the existence of any variant at a position, and therefore is more sensitive than the MPG score at positions for which there is uncertainty about whether a variant is heterozygous or homozygous non-reference. Additional information on the MPG and MPV scores is provided in the Supplementary Note.
We used a number of steps to filter nucleotide variants identified in the whole exome screen. Germline variants called in paired tumor-normal samples were excluded from further analysis. Variants that were present within dbSNP build 132, but which were not annotated as pathogenic or probable pathogenic variants within dbSNP, were also excluded. We compared the remaining variants in each tumor exome to the variants in all 13 normal exomes sequenced in this study; variants called in both a tumor exome and a normal exome were excluded. Variants representing probable mapping ambiguities were also excluded. All remaining variants were considered to be potential somatic mutations and were annotated using the VarSifter software package 60 into bins representing mutations in exons, introns, splice junctions, UTRs, and non-coding RNAs.
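The exclusion steps above amount to a sequence of set differences. The sketch below assumes a hypothetical representation in which each variant is a (chromosome, position, reference allele, alternate allele) record. It also assumes each annotation source (matched normal, all normals, dbSNP non-pathogenic entries, mapping-ambiguity flags) has already been loaded into a set. It illustrates the logic only and does not reproduce bam2mpg or VarSifter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; frozen so it can be stored in sets."""
    chrom: str
    pos: int
    ref: str
    alt: str

def candidate_somatic(tumor_calls, matched_normal_calls, all_normal_calls,
                      dbsnp_nonpathogenic, ambiguous_mapping):
    """Apply the exclusion steps described above, in the same order.

    All arguments are sets of Variant objects:
    tumor_calls          - variants called in one tumor exome
    matched_normal_calls - variants called in that tumor's matched normal
    all_normal_calls     - union of variants called in all normal exomes
    dbsnp_nonpathogenic  - dbSNP variants not annotated as (probably) pathogenic
    ambiguous_mapping    - variants flagged as probable mapping ambiguities
    """
    remaining = tumor_calls - matched_normal_calls  # germline in the pair
    remaining -= dbsnp_nonpathogenic                # known non-pathogenic SNPs
    remaining -= all_normal_calls                   # called in any normal exome
    remaining -= ambiguous_mapping                  # probable mapping artifacts
    return remaining                                # potential somatic mutations
```

The surviving variants would then be annotated into exon, intron, splice junction, UTR and non-coding RNA bins, as described above.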
After filtering the exome data to exclude germline variants and probable mapping ambiguities, 798 somatic variants were called in the exons and splice junctions of 12 tumors (Supplementary Figure 1 and Supplementary Table 16). We were able to orthogonally assess 730 of the 798 variants by Sanger sequencing; PCR products could not be generated for the remaining 68 variants. Of the 730 variants tested by Sanger sequencing, 451 variants orthogonally validated as somatic (present in tumor and absent from matched normal DNA) yielding a positive predictive value of 61.8% (451 of 730). The remaining variants were either not detected within the tumor or were germline (present in both tumor and matched normal DNA).
Because a positive predictive value of 61.8% was unacceptably low, we sought to establish filtering criteria empirically. These criteria were based on sequence quality (MPV/MPG score) and read depth (coverage), and aimed to achieve an optimal balance between accuracy and sensitivity of mutation detection. We observed (Supplementary Figure 10) that the majority of variants that did not orthogonally validate as somatic by Sanger sequencing had one or more of the following: (i) fewer than 5 reads in the tumor or normal sample; (ii) an MPG score < 10 in the normal sample and/or an MPV score < 10 in the tumor sample; or (iii) an MPG:COV ratio < 0.5 in the normal sample.
We therefore retrospectively applied three filtering criteria to the 798 somatic variants called by exome sequencing:
(i) at least 5 reads in both the tumor and normal samples;
(ii) an MPG score ≥ 10 in the normal sample and an MPV score ≥ 10 in the tumor sample; and
(iii) an MPG:COV ratio ≥ 0.5 in the normal sample.

After filtering, 527 variants were retained (Supplementary Table 5). Filtering on score and read depth attained a positive predictive value of 86.1%: 510 of the 527 retained variants could be assessed by Sanger sequencing, and 439 of these 510 orthogonally validated as somatic mutations. It also attained a sensitivity of 97.3%: 439 of the 451 orthogonally validated mutations observed before applying the coverage-score filter were retained after filtering.
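The retrospective filter and the performance figures can be written out as follows. This is a sketch with a hypothetical function name. It assumes that the MPG:COV ratio means the normal-sample MPG score divided by the normal-sample read depth. That reading is an interpretation, not a definition given in this section.

```python
def passes_coverage_score_filter(tumor_reads, normal_reads, tumor_mpv, normal_mpg):
    """Return True only if a candidate somatic variant meets all three criteria.

    (i)   at least 5 reads in both the tumor and the normal sample
    (ii)  MPV score >= 10 in the tumor and MPG score >= 10 in the normal
    (iii) MPG:COV >= 0.5 in the normal, taken here (an assumption) to be the
          normal-sample MPG score divided by its read depth
    """
    if tumor_reads < 5 or normal_reads < 5:
        return False
    if tumor_mpv < 10 or normal_mpg < 10:
        return False
    return normal_mpg / normal_reads >= 0.5

# Performance figures reported in the text.
ppv_before = 451 / 730   # validated / Sanger-assessable, before filtering
ppv_after = 439 / 510    # validated / Sanger-assessable, after filtering
sensitivity = 439 / 451  # validated mutations retained by the filter
print(f"PPV {ppv_before:.1%} -> {ppv_after:.1%}, sensitivity {sensitivity:.1%}")
```

Criterion (i) is checked first, so the ratio in criterion (iii) is never computed with a zero read depth.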
The same three filtering criteria were also applied to the 1,042 somatic variants called by exome sequencing in the hypermutable tumor T155 (Supplementary Table 17). These were (i) at least 5 reads in both the tumor and normal samples, (ii) an MPV score ≥ 10 in the tumor sample and an MPG score ≥ 10 in the normal sample, and (iii) an MPG:COV ratio ≥ 0.5 in the normal sample. After filtering, 1,017 variants were retained (Supplementary Table 4).
Nonsynonymous missense mutations called by whole exome sequencing were evaluated in silico using the SIFT (http://sift.jcvi.org/) and Mutation Assessor (http://mutationassessor.org/) algorithms to predict their impact on protein function. A SIFT prediction of “deleterious” and a Mutation Assessor prediction of “medium impact” or “high impact” were considered to predict an impact on protein function.
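Read literally, the criterion above combines the two tools' outputs with a logical AND. The sketch below makes that reading explicit. The function name and the label strings ("deleterious", "medium", "high") are assumptions about how the predictions would be stored. The sketch does not call either tool.

```python
def predicted_functional_impact(sift_call, mutation_assessor_call):
    """Return True if SIFT calls the missense change "deleterious" AND Mutation
    Assessor assigns "medium" or "high" functional impact.

    Requiring both tools to agree is a reading of the criterion in the text;
    label strings are assumed to be plain words such as "deleterious" or "high".
    """
    sift_ok = sift_call.strip().lower() == "deleterious"
    ma_ok = mutation_assessor_call.strip().lower() in {"medium", "high"}
    return sift_ok and ma_ok
```

Applied to each missense mutation that both tools could score, the fraction returning True corresponds to the 34.4% figure reported in the Results.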
Assume that N tumor samples are sequenced in a discovery screen and that a fraction X of all tumor samples have gene G mutated. The probability that none of the N sequenced samples has a mutation in G is $(1-X)^N$. Therefore, the probability of observing gene G mutated at least once in the discovery screen is $1-(1-X)^N$. For 12 discovery samples and X = 0.20 (20%), this is 93%. Likewise, the probability of observing 0 or 1 samples with a mutation in gene G is $(1-X)^N + NX(1-X)^{N-1}$. The probability of seeing the mutation two or more times is therefore $1-(1-X)^N-NX(1-X)^{N-1}$, which is 73% for 12 tumors with X = 0.20.
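The two expressions above are the first terms of a binomial tail. A small helper (hypothetical name) generalizes them to any minimum number of mutated tumors. It assumes, as the calculation above does, that discovery tumors are independent draws. With X = 0.08 and X = 0.20 it reproduces the power figures quoted in the Results: about 63% and 25% for X = 0.08, and about 93% and 72.5% for X = 0.20.

```python
from math import comb

def detection_power(x, n, min_hits):
    """Probability that at least `min_hits` of n sequenced tumors carry a mutation
    in a gene that is mutated in a fraction x of all tumors.

    Assumes tumors are independent draws, so the number of mutated tumors in the
    screen is Binomial(n, x); min_hits=1 and min_hits=2 give the two expressions above.
    """
    p_fewer = sum(comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(min_hits))
    return 1 - p_fewer

for x in (0.08, 0.20):
    print(x, round(detection_power(x, 12, 1), 3), round(detection_power(x, 12, 2), 3))
```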
Genomic DNA (5ng) was amplified using M13-tailed primers (Supplementary Table 18) in a 10μl polymerase chain reaction (PCR) containing 1X AmpliTaq Gold PCR buffer (Applied Biosystems), 1.5mM MgCl2, 75μM dNTP, 400nM sense primer, 400nM antisense primer, and 0.5 units of AmpliTaq Gold DNA polymerase. PCR amplification was performed on a GeneAmp® PCR System 9700 (Applied Biosystems). PCR amplicons were purified using exonuclease I (Epicentre Biotechnologies) and shrimp alkaline phosphatase (USB Corporation). They were then bidirectionally sequenced using the Big Dye Terminator v.3.1 kit (Applied Biosystems) and M13 primers. Cycle sequencing products were run on an ABI-3730xl DNA analyzer (Applied Biosystems). Tumor and reference sequences were aligned and compared using in-house software to determine the genotype of variant nucleotide positions. Non-pathogenic variants present in dbSNP were excluded from further analysis. True somatic mutations were confirmed by reamplification and sequencing of matched tumor-normal DNAs and analyzed using Sequencher software (Gene Codes Corporation).
For each tumor, the somatic mutation rate was determined by dividing the total number of exonic mutations (after filtering on quality score and coverage) by the number of exonic bases with adequate quality and coverage in both samples. Adequate meant an MPV score ≥ 10 and at least 5 reads in the tumor sample, and an MPG score ≥ 10, an MPG:COV ratio ≥ 0.5 and at least 5 reads in the paired normal sample. Grubbs’ test was used to calculate an approximate p-value for each tumor to identify outliers. A uniform background mutation rate equal to the rate observed in the discovery phase was assumed, and the Poisson distribution was used to calculate p-values for the observed number of mutations in each gene. False discovery rates were calculated using the Benjamini-Hochberg method 61, correcting for 21,441 genes tested. This method is a simplified version of the CaMP (cancer mutation prevalence) scoring method 62 including subsequently suggested corrections 63.
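The per-gene test described above can be sketched as follows. The sketch assumes that, for each gene, the number of callable bases summed over the screened tumors is available. All function names and inputs are hypothetical. It covers the background rate, the Poisson upper-tail p-value and the Benjamini-Hochberg correction. It does not implement Grubbs' outlier test or the CaMP refinements cited above.

```python
from math import exp, lgamma, log

def mutation_rate(n_mutations, callable_bases):
    """Mutations per callable base (bases passing the tumor and normal quality filters)."""
    return n_mutations / callable_bases

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam) with lam > 0, summed directly over the upper
    tail so that very small p-values keep their precision."""
    if k <= 0:
        return 1.0
    term = exp(-lam + k * log(lam) - lgamma(k + 1))  # P(X = k)
    total, i = 0.0, k
    # Keep adding terms while they are still growing (i < lam) or not yet negligible.
    while i < lam or term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i
    return total

def gene_pvalues(observed, callable_bases, background_rate):
    """Poisson p-value per gene; expected count = background rate x callable bases."""
    return {gene: poisson_upper_tail(n, background_rate * callable_bases[gene])
            for gene, n in observed.items()}

def benjamini_hochberg(pvalues, m=None):
    """Benjamini-Hochberg adjusted p-values (q-values).

    m is the total number of genes tested (e.g. 21,441). Passing m larger than
    len(pvalues) is valid when every omitted gene has p = 1 (no observed mutations).
    """
    if m is None:
        m = len(pvalues)
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    q = [0.0] * len(pvalues)
    running_min = 1.0
    # Step down from the largest p-value, keeping q-values monotone and capped at 1.
    for rank in range(len(order), 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        q[idx] = running_min
    return q
```

Computing the tail directly, rather than as one minus the lower tail, avoids cancellation for strongly significant genes with small expected counts.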
The 304 somatically mutated protein-encoding genes identified and orthogonally validated in the discovery screen were analyzed for enriched functional groupings. We used two in silico tools: the Database for Annotation, Visualization and Integrated Discovery (DAVID) 35,36, and Ingenuity Systems Pathway Analysis (IPA) (www.ingenuity.com). The Bonferroni, Benjamini, and FDR values computed within DAVID were assessed for significance.
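Over-representation of a functional category among a gene list is commonly assessed with a one-tailed hypergeometric (Fisher's exact) test. DAVID reports a modified Fisher exact p-value, the EASE score, which conservatively removes one gene from the overlap. The sketch below implements the plain test plus an optional one-gene penalty in that spirit. It is an illustration with hypothetical names and inputs, not the DAVID or IPA implementation.

```python
from math import comb

def hypergeom_upper_tail(k, list_size, category_size, population_size):
    """P(at least k of list_size genes fall in a category of category_size genes
    when list_size genes are drawn without replacement from population_size)."""
    total = comb(population_size, list_size)
    top = min(list_size, category_size)
    hits = sum(comb(category_size, x) * comb(population_size - category_size, list_size - x)
               for x in range(k, top + 1))
    return hits / total

def enrichment_p(k, list_size, category_size, population_size, ease_penalty=False):
    """One-tailed over-representation p-value for k list genes in the category.

    With ease_penalty=True one gene is removed from the observed overlap before
    testing (an EASE-style, conservative adjustment)."""
    if ease_penalty:
        k = max(k - 1, 0)
    return hypergeom_upper_tail(k, list_size, category_size, population_size)
```

For the analysis above, list_size would be the 304 mutated genes and k the number of them annotated to a category such as chromatin modification. The resulting p-values would then be corrected for the number of categories tested.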
Tumor-normal DNA pairs were screened for the presence of microsatellite instability (MSI) using the Promega Microsatellite Instability Analysis System v1.2 (Promega) according to the manufacturer’s instructions. All coding exons of MSH6 were PCR amplified and Sanger sequenced.
Endometrial cancer cell lines (RL-95-2, HEC1A, HEC1B, KLE, ANC3A) were obtained from the American Type Culture Collection, or the NCI Developmental Therapeutics Program cell line repository. ARK1 and ARK2 serous endometrial cancer cell lines were kindly provided by Dr. Alessandro Santin (Yale School of Medicine).
Cells were lysed in RIPA buffer (Thermo Scientific) containing 1mM Na-orthovanadate, 10mM NaF, and 1X protease inhibitor cocktail (Roche). Lysates were centrifuged and then denatured at 95°C in 2X SDS sample buffer (Sigma) prior to SDS-PAGE and transfer to PVDF membranes (Bio-Rad). Primary and HRP-conjugated secondary antibodies were: CHD4 (Cell Signaling), β-Actin (Sigma), goat anti-mouse HRP (Cell Signaling), and goat anti-rabbit HRP (Cell Signaling). Immunoreactive proteins were visualized with enhanced chemiluminescence (Pierce).
We thank our colleagues for critical reading of the manuscript; R. Travis Moreland and Niraj Trivedi of the NHGRI Bioinformatics and Scientific Programming Core for, respectively, performing in silico PCR and advice on statistics; and Dr. Jamie Teer for sharing expertise on VarSifter. Dr. Alessandro Santin kindly provided the ARK1 and ARK2 cell lines. Funded in part by the Intramural Program of the National Human Genome Research Institute, National Institutes of Health (DWB and JCM); NIH grant R01CA112021 (DCS); the Avon Foundation (DCS); NIH grant R01CA140323 (AKG); the Ovarian Cancer Fund (AKG). PH is supported by NIH grant CA016519 and by the Canadian Institutes of Health Research (MOP-38096).
D.W.B. designed and directed the study, and wrote the manuscript. A.K.G. contributed clinical specimens. M.J.M. and D.C.S. conducted pathological review of clinical specimens. M.L.R. prepared DNA samples, performed identity testing and microsatellite instability analysis. NISC performed library construction and whole exome sequencing. NISC and N.F.H. performed variant calling. M.L.G. and A.O.H. curated and orthogonally validated exome sequencing data. M.L.G., A.O.H. and D.W.B. interpreted the exome data and established filtering criteria. M.L.G., A.O.H., M.L.R., J.C.P., B.M.E., S.Z. and D.W.B. designed, performed, analyzed, and interpreted the mutation prevalence screens. A.O.H. and M.L.G. analyzed MSH6. M.E.U. and M.L.G. generated sequence conservation alignments. M.E.U. performed cell culture and immunoblotting. N.F.H., M.L.G. and J.C.M. performed statistical analyses. N.F.H. performed the power calculation. D.W.B., M.E.U., M.L.G., M.L.R., A.O.H., N.F.H., J.C.M., A.K.G., P.H. and N.O’N. edited and commented on the manuscript.
Competing Financial Interests
The authors declare no competing financial interests.