The incidence and outcomes for patients with colorectal cancer (CRC) vary by age. Younger patients tend to have sporadic cancers not detected by screening and worse survival. To understand whether genetic differences exist between age cohorts, we sought to characterize unique genetic alterations in patients with CRC.
We identified 283 patients with sporadic CRC between 1998 and 2010 and divided them by age into two cohorts: ≤45 years old (younger) or ≥65 years old (older) and performed targeted exome sequencing. Fisher’s Exact test was used to detect differences in mutation frequencies between the two groups. Whole exome sequencing was performed on 21 additional younger patient samples for validation. Findings were confirmed in The Cancer Genome Atlas CRC dataset.
246 samples were included for final analysis (195 older, 51 younger). Mutations in FBXW7 were more common in the younger cohort (27.5% vs. 9.7%, p=0.0022), as were mutations in the proofreading domain of POLE (9.8% vs. 1.0%, p=0.0048). Mutation rates were similar between cohorts with regard to TP53 (64.7% vs. 61.5%), KRAS (43.1% vs. 46.2%), and APC (60.8% vs. 73.8%). BRAF mutations were numerically more common in the older cohort, though the difference was not statistically significant (2.0% vs. 9.7%, p=0.082).
In this retrospective study, we identified a unique genetic profile for younger CRC patients as compared to patients diagnosed at an older age. These findings should be validated in a larger study and could have an impact on future screening and treatment modalities for younger CRC patients.
In this manuscript, we describe our original research involving exome sequencing of patients with colorectal cancer in which we compared younger patients to older patients. We identified a unique genetic profile for younger patients which may have implications on future screening and treatment paradigms for these patients.
CRC is a major cause of morbidity and mortality, both in the US and worldwide,1, 2 and is primarily a disease of older adults, with a median age at diagnosis of 65. Improvements in screening, increasingly effective therapies, and optimization of supportive care have contributed to the trend in improved survival for older patients with colorectal cancer (CRC).3, 4 Though the incidence of disease is declining in the over 50 population, data suggest that the cancer incidence is actually rising in younger patients.5, 6 Younger patients have worse outcomes when matched stage for stage, but also overall for multifactorial reasons such as higher stage at presentation, higher rate of mucinous histology, and intrinsic biologic differences in their disease.7–11 In addition, younger patients with CRC have not seen a significant improvement in survival over the past three decades.4, 12 One possible explanation for this discrepancy is that cancer in this younger cohort of CRC patients is biologically distinct from older adults and to date has not been the focus of large research initiatives.4 Though there are several genetic syndromes that predispose younger patients to CRC, the majority of CRC in younger patients is sporadic.13
Elucidating the reasons behind these disparities in outcome and survival improvement may reveal opportunities to improve cancer care for adolescents and young adults. Preliminary research has suggested that cancer incidence and outcome differences in young adults could relate to fundamental age-related differences in tumor biology.7, 8, 14–19
Recent efforts in molecular profiling have allowed great strides to be made in understanding underlying tumor pathophysiology.20, 21 We performed molecular characterization of tumors from younger and older sporadic CRC patient cohorts in an effort to determine potentially distinct genetic signatures. We hypothesized that CRC in young patients would harbor biologically distinct alterations and that these differences could ultimately play a role in developing personalized screening and management strategies.
Institutional Review Board approval was obtained to conduct this multi-institutional, observational study. Tumor samples were obtained from fresh frozen surgical biopsy specimens collected from 468 colon and rectal cancer patients between 1998 and 2010 at Moffitt Cancer Center and affiliated community hospitals as part of the Moffitt Total Cancer Care™ (TCC) protocol.22 Tissue is collected at the time of surgery or biopsy and all patients with cancer are eligible for inclusion.
Our initial analysis included 283 patients. Following initial observations, we refined the cohort to exclude patients with known familial cancer syndromes as well as MSI high (MSI-H) tumors in order to minimize bias and capture those patients who truly had sporadic tumors. Out of the initial 283 patients, 58 were excluded for familial cancer syndromes or having MSI-H tumors (see Figure 1). Out of these 58 patients, 2 patients from the young cohort had a known genetic syndrome, 4 patients from the older cohort had Lynch syndrome associated with an MSH6 mutation, and 52 patients without a known familial syndrome had MSI-H tumors (50 from the older cohort and 2 from the younger cohort). Of the 225 patients remaining, 195 patients were 65 or older and 30 patients were 45 or younger at diagnosis.
The tumors were collected using a snap frozen technique in liquid nitrogen within 15–20 minutes of extirpation. Macrodissection was performed to ensure >80% tumor was present in the specimens that underwent sequence analysis. Normal tissue, necrotic tissue and excessive stromal tissues were dissected away from the specimen under frozen section control. DNA was then extracted for targeted gene sequencing which was performed by Beijing Genomic Institute (Beijing, China). 1,321 genes were targeted using the Agilent SureSelect technology (Agilent Technologies, Inc. Santa Clara CA), followed by 90 base pair, paired-end sequencing on GAIIx instruments (Illumina, Inc. San Diego, CA) to achieve 138× average depth of coverage across the target region.
Burrows-Wheeler Aligner (BWA23) was used to align sequence reads to the human reference (hs37d5). The Genome Analysis Toolkit (GATK24) was used for insertion/deletion realignment, quality score recalibration, and identification of single nucleotide and insertion/deletion variants. Matched normal samples were generally not available for comparison to identify somatic mutations, so filtering of normal variants was performed using the 1000 Genomes Project dataset25 and an internal pool of 238 normal samples. Variants identified in 1000 Genomes with an MAF ≥ 0.01 or identified in the normal pool with MAF ≥ 0.05 were removed. ANNOVAR26 was used to annotate variants, and VarSifter27, in-house web-based display applications, and custom Perl and R scripts were used for analysis.
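The population-frequency filter described above can be sketched as follows. This is a minimal illustration in Python, not the study's own Perl and R code: the `Variant` record, its field names, and the function names are hypothetical, and only the two MAF cutoffs (0.01 for 1000 Genomes, 0.05 for the internal normal pool) come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Cutoffs stated in the text; everything else here is an illustrative assumption.
KG_MAF_CUTOFF = 0.01    # 1000 Genomes minor allele frequency threshold
POOL_MAF_CUTOFF = 0.05  # internal normal-pool minor allele frequency threshold


@dataclass
class Variant:
    """Hypothetical annotated variant record."""
    gene: str
    protein_change: str
    kg_maf: Optional[float]    # MAF in 1000 Genomes, None if not observed there
    pool_maf: Optional[float]  # MAF in the normal pool, None if not observed there


def is_likely_germline(variant: Variant) -> bool:
    """True if the variant is common enough in either normal reference to be removed."""
    common_in_kg = variant.kg_maf is not None and variant.kg_maf >= KG_MAF_CUTOFF
    common_in_pool = variant.pool_maf is not None and variant.pool_maf >= POOL_MAF_CUTOFF
    return common_in_kg or common_in_pool


def enrich_for_somatic(variants: List[Variant]) -> List[Variant]:
    """Keep only variants that pass both population-frequency filters."""
    return [v for v in variants if not is_likely_germline(v)]
```

Because matched normal samples were generally unavailable, a filter of this kind enriches for somatic mutations but cannot exclude rare inherited variants.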
Mutation rate differences were assessed in two ways: mutation frequency at specific positions to focus on likely recurrent activating oncogenic mutations and entire genes when a truncating mutation was present to account for more diverse inactivating tumor suppressor mutations.
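The two counting schemes can be illustrated with a short Python sketch. The input layout (tuples of sample ID, gene, protein change, and effect), the function names, and the effect labels treated as truncating are all assumptions made for illustration; the text does not specify which annotation terms were counted as truncating.

```python
from collections import defaultdict

# Assumption: these labels stand in for the annotation terms that mark truncating changes.
TRUNCATING_EFFECTS = {"stopgain", "frameshift"}


def carriers_by_position(calls):
    """Position level: map (gene, protein_change) to the set of samples carrying it."""
    carriers = defaultdict(set)
    for sample_id, gene, protein_change, _effect in calls:
        carriers[(gene, protein_change)].add(sample_id)
    return dict(carriers)


def carriers_by_truncated_gene(calls):
    """Gene level: map gene to the set of samples with any truncating mutation in it."""
    carriers = defaultdict(set)
    for sample_id, gene, _protein_change, effect in calls:
        if effect in TRUNCATING_EFFECTS:
            carriers[gene].add(sample_id)
    return dict(carriers)
```

Counting samples rather than mutation calls means a tumor with two truncating mutations in the same gene contributes once to that gene's tally.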
In the validation phase of this study, a focused search through TCC revealed an additional 21 CRC samples from patients diagnosed at age 45 or younger with sporadic disease. Retrospective chart review was performed to capture information on age, sex, gender, stage, pathology, and survival outcome.
These tumors underwent whole exome sequencing in order to identify somatic mutations in the coding regions of the human genome. Two micrograms of DNA was used as input into the Agilent SureSelect XT Clinical Research Exome kit, which includes the exon targets of Agilent’s v5 whole-exome kit, with increased coverage at 5000 disease-associated targets. For each tumor DNA sample, a genomic DNA library was constructed according to the manufacturer’s protocol and the size and quality of the library were evaluated using the Agilent Bioanalyzer. An equimolar amount of library DNA was used for a whole-exome enrichment using the Agilent capture baits and, after quantitative PCR library quantitation and QC analysis on the Bioanalyzer, approximately 150 million 75-base paired-end sequences were generated using v2 chemistry on an Illumina NextSeq 500 sequencer. Average depth of coverage across the target regions was 116×. Data were analyzed as described for the original TCC cohort.
Candidate mutation differences were detected in the original cohort by testing for frequency differences using a two-sided Fisher’s exact test at the position level and at the truncating gene level (counting the presence of any truncating mutations in a gene). Marginally significant findings (p<0.05 uncorrected) in the discovery set were then retested including the results from 21 additional younger patients, resulting in an expanded cohort with 51 younger and 195 older patients. Multiple test correction was performed using the Benjamini and Hochberg method.28 Genes and mutations known to drive colorectal cancer were specifically tested to understand differences in colorectal cancer driver genes.
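The statistical procedure can be sketched in a few lines of dependency-free Python. This is an illustration of a two-sided Fisher's exact test and the Benjamini-Hochberg adjustment, not the custom scripts used in the study; the function names are hypothetical, and the small relative tolerance used when comparing table probabilities is an implementation assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b: mutated / unmutated counts in group 1 (e.g. younger)
    c, d: mutated / unmutated counts in group 2 (e.g. older)
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x mutated samples in group 1.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values
```

As a usage example, a count of 4/51 younger versus 0/195 older samples would be tested as `fisher_exact_two_sided(4, 47, 0, 195)`. The q-values depend on the full set of p-values passed to `benjamini_hochberg`, so they cannot be recovered from a single comparison.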
The Cancer Genome Atlas (TCGA) data were used for comparison and external validation.29 Sequencing data were downloaded for both colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ). Individuals with the clinical indicator “microsatellite instability” of YES were excluded, resulting in 29 young individuals and 220 old individuals. MAF files were converted to VCF and re-annotated as described above. Differential mutation rates in specific genes, including known CRC driver genes and differentially mutated genes in our cohort, were retested in this independent dataset. Lollipop plots were prepared using the cBioPortal MutationMapper tool.30
283 CRC patient specimens underwent targeted gene sequencing covering 1,321 genes. We observed a marginally significant imbalance in MSI frequency between groups (younger= 6.2%, older = 20.7%, p=0.055). MSI patients as well as patients with familial syndromes were excluded from further analysis to avoid introduction of mutational bias potentially caused by this baseline difference. Our final discovery cohort of sporadic, mismatch repair sufficient CRC (30 younger, 195 older) was then analyzed to identify mutation differences between groups. Baseline patient characteristics of the expanded cohort are described in Table 1.
In the older cohort (age ≥65), the median age at diagnosis was 73 with a range from 65–93 and 49% were male. In the younger cohort (age ≤45), the median age at diagnosis was 42 with a range from 30–45 and 55% were male. There was a higher frequency of advanced stage cancers in the younger cohort.
Following a test for proportional differences we identified 27 candidate positions and 17 truncated genes differentially mutated between younger and older patients (Supplemental Table 1) (p<0.05 uncorrected). Mutated positions were found in many genes, including specific mutations in the recurrently mutated APC, KRAS, and TP53 genes. Top hits included a stop mutation in FBXW7 and a point mutation in MAP2K4 (both seen in 3/30 younger and 0/195 older patients). Both mutations are observed in published data of colorectal cancer (cBioPortal29). Differentially truncating genes included several known tumor suppressors, including WRN, FBXW7, ATR, and NF1.
To confirm these results, an additional 21 younger patient tumor samples were sequenced with whole exome sequencing, and analyzed and enriched for somatic mutations as described (Supplemental Table 2). 16 positions remained significant (p<0.05, uncorrected), including mutations observed in FBXW7 and MAP2K4. Well characterized mutations in KRAS, TP53, and APC were no longer significant with the increased sample size. A multiple test correction was applied using the Benjamini-Hochberg method and resulted in two significant mutations: a stopgain in FBXW7 (4/51 young, 0/195 old, q=0.042) and a non-frameshifting insertion in CBX4 (0/39 young, 30/151 old, q=0.042; total sample counts were reduced because low read depth resulted in some instances of missing data). A similar non-frameshifting insertion is seen in the Exome Variant Server (http://evs.gs.washington.edu/EVS/) at an overall allele frequency of 5.8%, suggesting this variant may be inherited. 11 truncated genes remained significant (p<0.05, uncorrected) but only FBXW7 survived multiple test correction (6/51 young, 2/195 old, q=0.021).
Given the significant results observed in FBXW7 using both specific mutation and truncating gene approaches, we compared the overall gene mutation rate in the expanded cohort. FBXW7 was about 2.5-fold more frequently mutated in younger samples compared to older (14/51 = 27.5% in young, 19/195 = 9.7% in older, p=0.0022) (Table 2). Mutations in this gene included the stop gain mutation we observed as more frequently mutated in young patients, as well as other recurrent and rarer mutations. The most highly mutated positions in our cohort were R578X (4/51 = 7.8% in young, 0/195 = 0% in old; R658X in cBioPortal), R425C (2/51 = 3.9% in young, 3/195 = 1.5% in old; R505C in cBioPortal) and R385C (4/51 = 7.8% in young, 4/195 = 2.1% in old; R465C in cBioPortal). Although not always statistically significant, these recurrent positions were each more frequently mutated in young patients, contributing to a significantly higher overall mutation rate.
To confirm the observation of increased FBXW7 mutations in an independent dataset, we examined TCGA colon and rectal adenocarcinoma somatic mutation data. Although the number of young patients was much smaller than in our study, we observed increased mutation rates at positions and in genes we have observed in this study. FBXW7 R578X was observed 1/29 in young, and 1/220 in old (p=0.22). FBXW7 truncating mutations were significantly higher in young patients (5/29 young, 11/220 old, p=0.026). Overall FBXW7 mutation rate was higher in young patients, but was not significant: (6/29 = 20.7% in young, 25/220 = 11.4% in old, p=0.22) (Table 2). Combining the extended TCC data set with published TCGA data resulted in a significantly higher overall FBXW7 mutation rate in younger patients: 20/80 = 25.0% in young, 44/415 = 10.6% in old, p=0.0016 (Table 2). CBX4 H398delinsHH was not observed in TCGA somatic mutation data, further suggesting it was an inherited variant. Truncating and recurrent mutations observed in the combined cohort were spread across the gene, but R578X (R658X in cBioPortal) was much more common in younger patients (Figure 2).
Mutational landscape studies have identified a number of recurrently mutated genes.29 We specifically examined mutation frequencies between young and old patient tumors across eight of the most commonly mutated genes in CRC as well as FBXW7: APC, KRAS, TP53, BRAF, FBXW7, NRAS, PIK3CA, SMAD4, and TCF7L2. We observed a significantly different mutation rate in FBXW7, but not in any of the other genes (Supplemental Table 3). Specific positions in these genes showed potential significance (Supplemental Table 1a), but did not survive multiple test correction, even after expanding the number of younger patients (Supplemental Table 2a). Interestingly, an initial analysis performed before removal of MSI-H samples indicated BRAF V600E was more frequently observed in older patients (0/34 young, 46/248 old, p=0.0023 uncorrected). Additional mutations related to MSI-H status were also more frequently observed in older samples. The difference was no longer significant after removing MSI-H samples, but BRAF V600E was still more frequent in older patients in the expanded cohort (1/51 young, 11/195 old, p=0.47) (Table 2).
Mutation rates in these genes were generally not significantly different in the TCGA dataset (Supplemental Table 4). APC was significantly more mutated in older patients when considering truncating mutations (13/29 in young, 158/220 in old, p=0.0051 uncorrected) or all mutations (15/29 in young, 162/220 in old, p=0.027, Supplemental Table 5). As APC trended to higher mutation rates in older TCC patients, we combined datasets: 46/80 (57.5%) younger, 306/415 (73.7%) older, p=0.0046 (Table 2). BRAF V600E was observed in older patients even after removal of MSI-H cases, but not in young (0/29 young, 18/220 old, p=0.24). Combining counts from our expanded TCC cohort with the TCGA cohort results in a significantly lower incidence in young patients: 1/80 (1.3%) younger, 29/415 (7.2%) older, p=0.043 (Table 2).
Recurrent polymerase epsilon (POLE) mutations specific to the proofreading domain have been demonstrated in several different tumor types, including colorectal cancer.21, 31 We therefore examined our cohorts for mutational differences in the POLE proofreading domain, from amino acids 268–471.32 We observed such mutations in 5/51 young and 2/195 old patients (p=0.0048). Significantly increased proofreading domain mutation rates in young patients were also observed in TCGA colorectal data: 3/29 young, 2/220 old, p=0.012. Combining the datasets results in a significantly higher POLE proofreading domain mutation rate in younger patients: 8/80 (10%) younger, 4/415 (0.96%) older, p=9.7×10⁻⁵ (Table 2, Figure 2). POLE proofreading domain mutations have previously been associated with high somatic mutation rates.33 We examined mutation counts in POLE mutated tumors in the TCC expanded and TCGA cohorts, and observed higher mutation counts in these samples within each cohort (Figure 3, Supplemental Table 5).
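Restricting the comparison to the proofreading domain amounts to a residue-range check on each POLE mutation. The Python sketch below assumes protein changes are written as short labels with the residue number embedded (for example `P286R`); the function names and that label format are hypothetical, and only the 268–471 boundaries come from the text.

```python
import re

# Domain boundaries stated in the text (amino acids 268-471, inclusive).
PROOFREADING_START = 268
PROOFREADING_END = 471


def residue_number(protein_change):
    """Return the first residue number in a label such as 'P286R', or None."""
    match = re.search(r"\d+", protein_change)
    return int(match.group()) if match else None


def in_proofreading_domain(protein_change):
    """True if the affected residue lies within the POLE proofreading domain."""
    position = residue_number(protein_change)
    return position is not None and PROOFREADING_START <= position <= PROOFREADING_END


def has_proofreading_mutation(pole_changes):
    """True if any of a sample's POLE protein changes falls in the domain."""
    return any(in_proofreading_domain(change) for change in pole_changes)
```

A sample would then count toward the proofreading-domain tally only when `has_proofreading_mutation` is true for its list of POLE changes, so POLE mutations elsewhere in the gene are ignored.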
We have investigated differences in mutation rates between younger and older CRC patients that may contribute to variations in disease progression and outcome. While the overall genetic landscape of colorectal cancer is similar, we have shown that FBXW7 truncating mutations and POLE proofreading domain mutations are significantly more common in younger patients, with approximately 10-fold enrichment of each in this population.
FBXW7 is an F-box member of the Skp1-cullin-F-box (SCF) complexes responsible for phosphorylation dependent ubiquitination of specific proteins. Target substrates include cyclin-E, JUN, MYC, and others. The FBXW7 protein interacts with p53 and has been established as a tumor suppressor in several human cancers, including cholangiocarcinomas and endometrial cancers.34 Genetic alterations in the FBXW7 gene have been associated with poorer prognosis in CRC patients.35
Following differential mutation discovery, expansion, and comparison to independent external datasets, we found that FBXW7 is more frequently mutated in younger patients. The difference is seen in our patient samples when considering any mutation in the gene (27.5% in young, 9.7% in old) and when considering only truncating mutations (11.8% young, 1.0% old). Truncating mutations are significantly enriched in younger samples in the TCGA data as well. Future investigation of FBXW7 alterations may reveal contributions to clinical outcome (drug response, survival).
We also observed an increased rate of polymerase epsilon (POLE) proofreading domain mutations in younger patients. POLE mutations have been identified in highly mutated samples across cancer types. These mutations have been described in early onset CRC in individuals with no known familial syndrome.33 The tumors have been found to be microsatellite stable but hypermutated because of a tendency to develop base substitution mutations.
The recent advent of immunotherapy treatments in cancer could have clinical implications for POLE mutant patients. A link between high mutation rates and response to immune checkpoint inhibitors has been established.36 POLE mutations have been associated with increased neoantigen load, as well as increased immune cell infiltrate and expression of PD1 and PD-L1 in endometrial cancer, suggesting a potential therapeutic approach to these specific tumors.29, 37, 38 In a recent study of patients with advanced colorectal cancer, it was found that only patients with mismatch repair deficient tumors showed a response to an anti-PD1 agent.39 The mechanism by which this was hypothesized to occur was through higher tumor neoantigen load allowing for increased recognition by the immune system.
We specifically focused on POLE mutations in the proofreading domain, and observed significantly elevated mutation rates among young patients in our expanded cohort as well as the TCGA data set. These mutations are observed in young patients at around 10% frequency in both data sets versus only 1% in the older population. Furthermore, we observed that POLE proofreading domain mutations were associated with higher tumor mutational burden in both our cohort and TCGA datasets. Given the high rate of POLE mutation, the efficacy of immune checkpoint inhibitors merits specific study in these young CRC patients.
Several of the most common classically mutated genes in CRC, including KRAS and TP53, did not show a significant difference between cohorts in our discovery and expanded datasets, nor in the independent TCGA dataset, suggesting that there is molecular overlap among younger and older individuals with CRC. BRAF V600E mutations were more commonly observed in older patients. Although BRAF mutations are more associated with MSI-H CRC, this difference was observed in the context of MSS tumors. APC mutations were also somewhat more common in older patients. Although these differences were subtle, further study is warranted to confirm and understand such discrepancies.
Though there are overall similarities in commonly mutated genes between older and younger patients, distinct characteristics exist. Most strikingly, we noted significantly increased mutation rate in FBXW7 and in the POLE proofreading domain in younger patients. Further study is warranted to confirm our findings and to investigate the translational impact of FBXW7 and POLE in younger CRC patients, including the potential of immunotherapy to benefit a subset of younger patients.
Total Cancer Care® is enabled, in part, by the generous support of the DeBartolo Family, and we thank the many patients who so graciously provided data and tissue to the Total Cancer Care Consortium. Our study also received valuable assistance from the Cancer Informatics, Collaborative Data Services, and Tissue Core Facilities at the H. Lee Moffitt Cancer Center & Research Institute, an NCI designated Comprehensive Cancer Center, supported under NIH grant P30-CA76292. Funding was also provided by the Lewis Family Cancer Fund and the Gonzmart Family Foundation.
Conflict of Interest Disclosures: The authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr. Teer reports personal fees from Philips, outside the submitted work. In addition, Dr. Teer has a patent pending for a large data storage model. The other authors have nothing to disclose.
Author Contributions: Conceptualization: Kothari, Teer, Abbott, Kim, Reed, Shibata.
Methodology: Teer, Reed, Shibata.
Software: Teer, Yoder, Zhang.
Validation: Teer, Yoder.
Formal analysis: Brohl.
Investigation: Kothari, Teer, Abbott, Srikumar, Yoder, Zhang.
Data Curation: Kothari, Teer, Abbott, Srikumar.
Writing – original draft: Kothari, Teer, Reed.
Writing – review and editing: Kothari, Teer, Abbott, Brohl, Kim, Reed, Shibata.
Visualization: Teer, Abbott, Srikumar, Reed.
Supervision: Kim, Reed, Shibata.
Project Administration: Yoder, Reed, Shibata.
Funding Acquisition: Reed, Shibata.