|Home | About | Journals | Submit | Contact Us | Français|
Panels of prognostic biomarkers selected using candidate approaches often do not validate in independent populations, so additional strategies are needed to identify reliable classifiers. In this study, we used an array-based approach to measure DNA methylation and applied a novel method for grouping CpG dinucleotides according to well-characterized genomic sequence features. A hypermethylation profile among 13 CpG loci characterized by polycomb group target genes, mammalian interspersed repeats, and transcription factor binding sites (PcG/MIR/TFBS), was associated with reduced survival (hazard ratio: 3.98, p=0.001) in head and neck squamous cell carcinoma patients. This association was driven by CpGs associated with the TAP1 and ALDH3A1 genes, findings that were validated in an independent patient group (hazard ratio: 2.86, p=0.04). Together, the data not only elucidate new potential targets for therapeutic intervention in head and neck cancer, but also may aid in the identification of poor prognosis patients who may require more aggressive treatment regimens.
Head and neck squamous cell carcinoma (HNSCC) arises from the mucosal lining of the larynx, pharynx, and oral cavity. Over 40,000 new cases develop annually in the US and despite advances in treatment regimens, five-year survival rates near 50% have declined only modestly over the past 30 years (1). The major determinants of patient prognosis include patient age, tumor stage and site, lifetime duration of alcohol/tobacco exposure, and oncogenic human papilloma virus-16 (HPV16) infection, although these do not fully inform patient outcome and are used conservatively in treatment planning (2). Identification of additional prognostic molecular biomarkers is needed to provide clinicians with better therapeutic decision-making tools. Candidate gene approaches have been employed in past studies of head and neck cancer to identify additional biomarkers of prognosis, although their results have almost uniformly been poorly replicated in independent studies (3-5). Thus, utilizing discovery-based approaches coupled with appropriate validation may provide novel markers of greater utility.
Epigenetic silencing of genes via promoter hypermethylation is a critical event in the development of many cancers, including HNSCCs (6-9). Due to the inherent stability of 5-methylcytosine, DNA methylation is attractive an epigenetic indicator for measurement. Consistent with studies of candidate genetic markers, candidate studies using DNA methylation as a predictive tool have uncovered novel biomarkers of survival (10). However, many prior studies suffer from a limited ability to detect strong overall survival associations and these identified associations may vary widely among different populations (11-13). Heterogeneous patterns of somatic alterations across tumors can decrease the utility of single candidate genes as biomarkers and molecular profile-based markers that address this challenge are increasingly pursued and employed.
Previous studies by our group and others have shown that profiles of cancer-related epigenetic alterations are dependent upon the local genomic architecture (11, 14-16). In fact, Lienert and colleagues (17) recently described “methylation-determining regions” of short DNA fragments whose sequence content (CpG density, transcription factor binding sites) largely defines methylation patterning in mouse embryonic stem cells. Therefore, we compared DNA methylation grouped by specific sets of phenotypically well-understood DNA sequence elements to ask if methylation profiles of these sets were associated with HNSCC patient survival times. We measured methylation genome-wide, at over 26,377 CpG loci and defined novel groups of loci based on their proximity to functional sequence elements.
The initial discovery-phase study population has been previously described (18) and an independent validation population was drawn from Boston-area hospitals between 2003 and 2007, in an extension of the original case-control study of HNSCCs. Tumor samples (encompassing all head and neck sites except nasopharyngeal carcinomas) from incident cases were microscopically examined and histologically confirmed to be >70% tumor content by the study pathologist. Patients were enrolled after providing written, informed consent, as approved by all of the participating institutional review boards. Clinical information was collected and HPV16 status assessed using short fragment PCR to amplify a region of the L1 gene of HPV16, according to previously published methods (19). In total, 91 fresh-frozen tumor specimens from the discovery patient population and 101 formalin-fixed paraffin-embedded (FFPE) tumors from the validation population were subjected to methylation analysis. All-cause patient survival data were obtained from the National Death Index and survival was tracked for eight years post-diagnosis when possible.
DNA was extracted from fresh-frozen tumors and 18 clinically normal head and neck tissues sourced from the National Disease Research Interchange using the DNeasy Blood and Tissue kit (Qiagen, Valencia, CA) or from 20 μm FFPE sections according to previously published procedures (20). One microgram of genomic DNA was sodium bisulfite-modified using the EZ DNA Methylation Kit (Zymo Research, Orange, CA) per the manufacturer’s instructions. DNA methylation was measured in the discovery patient population on a genome-wide scale using the Illumina Infinium HumanMethylation27 microarray platform (San Diego, CA). This BeadChip assay measures methylation (21), given as a β value ranging from zero to one, at over 27,000 CpG loci. Arrays were processed at the UCSF Institute for Human Genetics, Genomics Core Facility according to the manufacturer’s protocol. Data were assembled in BeadStudio without normalization, as recommended by the manufacturer. Control probes were used to assess sample performance. Specifically, the multivariate characteristics of array control probes based on fitted mean vector and variance-covariance matrix (Mahalanobis distance) was used to screen for outlying samples (although none were found). Sex chromosomal loci (n=1092) were excluded to avoid gender-specific methylation bias. As single nucleotide polymorphisms near the interrogated CpG site are known to induce spurious signals in the Illumina platforms, we excluded an additional 109 probes which resulted in a final dataset of 26,377 autosomal loci associated with 13,856 genes.
Methylation data were analyzed in R statistical software v2.8.1 (http://www.r-project.org). Differential methylation (hypermethylation/hypomethylation) was determined by comparing the distributions of the methylation betas for each locus between tumor samples and normal tissue. A nonparametric Cox-Wilcoxon rank sum test was used to determine significance (false discovery rate q<0.05) and a threshold of Δβ > 0.2 was imposed to identify potentially meaningful biological changes. All 26,377 autosomal array CpG loci were clustered into mutually exclusive groups based on the following genomic functional sequence element criteria: CpG island status (22) (obtained from the array annotation file), Polycomb group (PcG) target status of the gene associated with the CpG (i.e. gene was described as a PcG target in at least one of Bracken, et al. (23), Lee, et al. (24), Schlesinger, et al. (25), or Squazzo, et al.(26)), presence within 1kb of at least one of 258 computationally predicted transcription factor binding sites (TFBS; sequences obtained from the tfbsConsSites track of the UCSC Genomes Table Browser (NCBI36/hg18 assembly, TFBS Z-score > 2)), and presence within any of the following types of repetitive element as defined by the Repeatmasker v3.2.7 track within Genomes Browser: Alu, LINE-1, LINE-2, and MIR sequences. Methylation array β values were averaged across all CpG loci within each bioinformatic class to generate an aggregate methylation value and tumors were grouped into high/intermediate/low categories depending on whether their specific methylation level was in the highest, middle, or lowest tertile across the population for each class.
Cox proportional hazards models were performed to determine class-specific (and locus-specific) associations between methylation groups and prognosis, and were controlled for patient age at diagnosis, anatomic site, combined AJCC stage, and HPV16 status. The null distribution of the 41-dimensional Cox Z-statistic was obtained by randomly permuting aggregate methylation values with respect to the survival variables 10,000 times. Note here that the response variable is survival and the predictor is that of tumor methylation. An omnibus test of significance, analogous to a Kolmogorov-Smirnov test, was performed by comparing the observed maximum for 41 Z-scores with the corresponding quantity over the permutation distribution. We tested for univariate associations between epidemiologic factors and high/intermediate/low categorical tumor methylation using non-parametric Kruskal-Wallis anova tests for continuous variables and a 10,000-permutation chi-squared test for categorical variables. An association was considered significant where p< 0.05.
In order to determine the specificity of individual CpG locus methylation associations with patient outcomes, we assessed methylation at sites proximal to the index CpG loci (ALDH3A1 and TAP1). CpG loci in different bioinformatic groups or in nearby genes were compared, using Spearman’s rho, for their concordance with the index CpG, using the array CpG methylation values. Pyrosequencing assays were designed for the top two CpG loci (corresponding to ALDH3A1 and TAP1 genes) which were associated with survival. Pyrosequencing reactions were performed in triplicate according to the procedures described by Bollati et al. (27), with the following modifications: the annealing temperatures were 62°C-ALDH3A1 / 55.5°C-TAP1 and 47 (rather than 45) cycles were used in the case of ALDH3A1. Primers for this assay are provided in Supplementary Table 1. Sodium Bisulfite conversion efficiency was monitored using internal non-CpG cytosine residues using the PyroMark Q96MD system. DNA methylation at each locus was calculated by taking the percent of methylated signal divided by the sum of the methylated and unmethylated signals for each CpG. Nonparametric Spearman’s correlations were calculated between array CpG methylation and bisulfite pyrosequencing values. Tertile groups of methylation were determined across all tumors (within each dataset) for each CpG and tumors were stratified into low-, intermediate-, or high-methylation groups depending on their tertile membership. Survival was compared among groups using a log-rank test.
DNA methylation of TAP1 and ALDH3A1 loci were assessed in the independent validation patient set of 101 FFPE tumors. Pyrosequencing values of both loci were aggregated together (averaged) for each tumor and patient survival was compared across methylation tertiles. Cox models were used to adjust associations for clinical and demographic factors.
We determined the methylation state of 91 discovery-phase HNSCCs at all 26,377 autosomal loci using the Illumina Infinium HumanMethylation27 microarray. The methylation β distribution of this dataset is shown in Supplementary Figure 1, where each CpG locus is averaged across all 91 tumors. Given the enrichment of promoter and early exonic CpG loci in this array, a large proportion of loci in this array are unmethylated (45% of all CpGs where β<0.10) while a small proportion are highly methylated (0.5% of CpGs where β>0.90). To identify CpG loci whose altered methylation is associated with prognosis, we utilized a supervised approach that leverages genomic architecture to inform locus selection. Array CpG loci were bioinformatically clustered based on phenotypically well-understood DNA sequence elements into 41 mutually exclusive groups. Specifically, CpGs were clustered based on their presence within LINE-1, LINE-2, Alu, or MIR repeats, CpG islands, genes that are known targets of polycomb-group proteins, or within 1000-bp of transcription factor binding sites. Individual CpGs were only allowed to be members of a single bioinformatically-derived cluster. We hypothesized that this approach would attenuate biochemical and biological noise and reduce false discovery, thus providing increased power to detect significant associations with survival. Table 1 lists the DNA sequence elements that define each of the 41 CpG clusters and provides the number of CpGs in each cluster. Indicative of a preference for gene-rich regions in the array design, a single group of CpGs defined by being located in a CpG island and near a transcription factor binding site (CpGITFBS) contains nearly half of all autosomal array CpG loci. The remaining CpG loci were spread unevenly across the other bioinformatically-derived clusters and 6% of CpG loci were not associated with any sequence elements that defined our clusters.
To investigate the relationship between these profiles of DNA methylation alterations and patient prognosis, we determined the average methylation value across all CpGs (aggregate methylation) among bioinformatic clusters for each of the tumor samples. For every bioinformatic cluster, tumors were sorted into “high,” “intermediate,” or “low” methylation subgroups depending on whether their methylation level (among member CpG loci) fell into the highest, middle, or lowest tertile, respectively. An omnibus test for significance indicated that, across bioinformatic clusters, methylation was significantly associated with patient survival (permutation test p=0.0037; Figure 1A). This association was robust to potential confounding variables such as age at diagnosis, anatomic site, combined AJCC stage, and HPV16 status (permutation p=0.021; Figure 1B). In the latter model, the power to detect significant differences in survival times, based on methylation of locus clusters, was diminished due to incomplete clinical data, however the effect remained significant.
To characterize the methylation state of the tumors in each patient subgroup, we compared the number of CpG loci that were differentially methylated compared to normal head and neck tissues (Supplementary Figure 2) and found the ratios of hypomethylated/hypermethylated CpG were significantly different across patient subgroups (permutation p<0.001). The top differentially-methylated sites within each group are described in Supplementary Table 2.
Methylation of the CpG loci associated with the cluster: polycomb group target genes, mammalian interspersed repeats, and transcription factor binding sites (PcGMIRTFBS, Figure 1) was consistently associated with prognosis in these aggregate models. The aggregate methylation value for this group (β =0.52) was significantly higher than the mean across all clusters (average= 0.43, range= 0.11-0.71). Many of the genes associated with the 13 CpG loci in this cluster (described in Table 2) are involved in the maintenance of cellular homeostasis or possess tumor suppressive function. In order to assess possible bias or confounding arising from the characteristics of individual patients, we compared clinical and demographic factors the patients in the low, intermediate, and high methylation groups (Table 3). Notably, no significant differences in age at diagnosis, gender, HPV16 status, tumor site, tumor stage, drinking or smoking exposures were observed.
We next asked if the DNA methylation level within this cluster of 13 PcGMIRTFBS-member loci was associated with patient survival times. Kaplan-Meier analysis (Figure 2) revealed a significant difference in overall survival among the patient methylation groups by taking into account all loci in this class (Log-rank p=0.0003). The median survival time of patients in the high methylation group was 1.2 years with 13% of patients alive at 8 years, as compared to a 50% 8-year survivorship and a median survival time of 5.9 years in the low methylation group. In addition, the patient group with intermediate methylation demonstrated intermediate survival times. A Cox proportional hazards analysis, adjusted for age at diagnosis, tumor site, tumor stage, and HPV16 status, showed that patients in the high DNA methylation group had significantly decreased survival times (HR: 3.98, 95% CI: 1.8-8.9) (Table 4). Due to potential differences in treatment regimens of low vs. high stage disease that may influence survival times within and each patient methylation subgroup, we performed a stratified subset analysis and found that survival times did not significantly vary by stage within any of the high, intermediate, or low methylation subgroups (Log-rank p= 0.43, 0.33, 0.98, respectively).
To determine whether specific CpGs in the PcGMIRTFBS cluster were driving the observed survival association, individual Cox proportional hazards models were constructed for each of the 13 PcGMIRTFBS CpG loci (Supplementary Table 3). Methylation levels of TAP1 and ALDH3A1 CpGs were most highly associated with survival time (hazard ratios and p-values: 2.42 and 1.84, p<0.001 and p<0.02); we therefore selected these loci for confirmation by pyrosequencing all of the discovery-set tumors with available DNA (88 of 91). Methylation values determined by pyrosequencing were highly concordant with array measurements at both loci (Spearman’s correlation= 0.88 and 0.86) and survival associations were validated (log-rank p-value= 0.003 and 0.001; Supplementary Figure 3). To determine the specificity of this survival association for CpG loci in the PcGMIRTFBS cluster, array methylation values for ALDH3A1 and LMO2 CpGs were compared to those of loci within the same genes but clustered into a separate PcGTFBS group (LMO2 is examined here since there were no additional TAP1 CpGs with which to compare methylation in any other bioinformatic groups). Despite a close proximity of the ALDH3A1 CpG pairs and LMO2 CpGs pairs (~100 bp for each gene), methylation concordance was poor (Spearman’s correlation= 0.29 and 0.62) and proximal CpG methylation levels were not associated with survival in either case (log-rank p-value= 0.53 and 0.97; Supplementary Figure 4). Similarly, concordance of TAP1 and ALDH3A1 with neighboring gene loci (in ULK2 and PSMB9, respectively) was low (Spearman’s correlation= 0.14 and -0.14), and methylation of these CpGs were not associated with survival (log-rank p-value= 0.17 and 0.57; Supplementary Figure 5).
In order to validate the observed association with survival, we further examined an additional 101 tumors drawn from an independent tumor collection period. Bisulfite pyrosequencing analysis of these tumors at the TAP1 and ALDH3A1 loci was performed and aggregate methylation values were generated for each tumor. Membership in the patient subgroup with the highest methylation was independently poorly prognostic (Figure 3 and Table 4; HR: 2.86, 95% CI: 1.02-8.11), validating the initial discovery-phase observation. In addition, patients that belonged to the intermediate methylation subpopulation had decreased survival times compared to the low methylation subgroup, although this association did not reach significance in multivariable analyses (Figure 3 and Table 4; HR: 2.82, 95% CI: 0.99-8.02).
Epigenetic alterations, reflected in changes to the 5-methylcytosine content at specific genomic loci, are required for normal cellular function and development. Candidate gene approaches for identification of prognostic biomarkers are widely used by researchers and studies have found that DNA methylation alterations are associated with patient survival in a number of cancers (11, 28-36), including HNSCC. Head and neck malignancies are often aggressive tumors with a poor probability of survival, and there are few tools currently available to assist the oncologist in determining long-term outcome in these patients. Here we used a genome-wide array to measure DNA methylation in HNSCC, identified markers of patient survival time with a novel approach by clustering CpGs based on their association with local sequence features, and validated the identified markers in an independent set of tumors.
Recently, investigations have revealed that, in addition to aging and environment, the architecture of the genome itself may predispose certain CpG sites to DNA methylation, both in normal cells and in cancer (16, 37, 38). Therefore, methylation profiling using genomically-informed methodologies is an attractive approach for identifying novel biomarkers. Here, we have developed a unique method for classifying CpG loci that allows functional sequence elements to dictate the clustering. This technique has proven useful in defining a novel association between methylation of specific locus groups and patient prognosis.
Importantly, our analysis demonstrates that HNSCC patients with tumors hypermethylated at a group of 13 CpG loci (defined by proximity to polycomb gene targets, mammalian interspersed repetitive elements, and transcription factor binding sites) have a significantly reduced survival time independent of HPV16 infection status. This suggests that methylation alterations at these sites may determine the phenotype or therapeutic response of this disease. While more work is necessary to precisely determine how the nexus of these three functional sequence element types defines an observable phenotype, it is possible that methylation of these sites, particularly PcG target gene promoters and TFBSs, may potentiate the transformation into (or represent a product of) a stem cell-like tumor. Indeed, there is accumulating evidence that DNA hypermethylation is observed in many cancers at the sites of polycomb-mediated gene repression in embryonic cells, which become relaxed during differentiation, and this methylation correlates with stem cell characteristics (39). In addition, a number of transcription factors have been shown to be involved in the recruitment of DNA methyltransferases to PcG target sites (39). Further, studies of aging-dependent methylation have demonstrated that PcG marking and frequency of coincident retrotransposable elements are both correlated and complementary (14). All of these observations further support our result that the combination of these sequence elements is critical in determining tumor behavior.
Focusing on the individual gene members of the PcGMIRTFBS cluster, we see that many of these are highly relevant to head and neck disease. Chief among them is ALDH3A1, which is expressed in the oral mucosa (40) and is a member of the aldehyde dehydrogenase family of enzymes that convert the carcinogenic intermediate of ethanol metabolism, acetaldehyde, into non-toxic acetic acid. As alcohol is a major risk factor for the development of HNSCC, one might expect alterations of ALDH3A1 to play a role in the initiation and promotion of malignancy. At the same time, this gene has been shown to inhibit epithelial cell proliferation (41) and is frequently mutated in breast cancers (42). In addition, it is well-known that aldehyde dehydrogenase is involved in normal stem cell biology and is a functional marker for epithelial cells with enhanced tumorigenic potential. A recent publication showed enrichment for ALDH3 in the stem cell populations of mammalian oral tissues (43). This supports our hypothesis that alterations at PCGMIRTFBS-associated genes define a more stem-like constitution of tumor cells that engender more aggressive HNSCCs.
Downregulation of another gene represented in our PcGMIRTFBS-prognostic cluster, TAP1, allows HNSCC cells to avoid immune surveillance by cytotoxic T-lymphoctyes (44) and a lack of protein expression has recently been shown to confer a negative prognostic risk in HNSCC (45) as well as in many other cancers. This study, however, did not include HPV status in the survival analysis, so the question remained whether downregulation was truly independently prognostic. Another group reported that ectopic expression of TAP1 in xenograft assays significantly prolonged mouse survival time and increased immune infiltrate (46). In addition, functional investigations have revealed that this gene is downregulated in primary HNSCCs (47, 48), metastasis (45), and HNSCC cell lines (49), although promoter methylation was not previously described in this setting. Together, these data suggest that TAP1 may be a candidate for therapy in human HNSCCs.
Another gene marked by a CpG in the PcGMIRTFBS cluster that was associated with HNSCC survival is the ankyrin-repeat SOCS box-containing protein 2 (ASB2) which appears to be a modulator of notch signaling (50) and inhibits growth of leukemic cells (51). However, its potential status as a tumor suppressor in the head and neck had yet to be described. A prominent tumor suppressor in the PcGMIRTFBS cluster is CDKN1A, encoding the p21 (WAF) protein that signals G1 cell cycle arrest or senescence. Two additional genes identified in our analysis (SPOCK2 and NOPE) are known to be methylated as potential biomarkers in cancer (52, 53). Consistent with our observations, the genes represented by CpGs in the PcGMIRTFBS group are primarily tumor suppressors, and one would therefore expect that their inactivation through a combination of hypermethylation and additional somatic alterations would result in a poorer prognosis.
In summary, we have developed a novel technique to identify clinical characteristics of HNSCCs using the genomic information of CpG loci coupled with epigenetic content at those sites. In two independent populations, we show that DNA methylation profile markers may be used to identify those at greatest risk of death, irrespective of HPV status. The identification of specific DNA methylation biomarkers, such as those presented here, may assist in selecting patients who are most likely to benefit from tissue-sparing procedures and targeted therapies. It will be important to validate this in other populations in an effort to move these biomarkers into clinical practice.
This study was funded by the Flight Attendant Medical Research Institute (CJM) and the National Institutes of Health (KTK: CA078609, CA100679).
The authors declare no potential conflicts of interest.
Microarray data from this study have been contributed to the NCBI Gene Expression Omnibus under the accession number GSE25093 (http://www.ncbi.nlm.nih.gov/geo).