|Home | About | Journals | Submit | Contact Us | Français|
Head and neck squamous cell carcinomas (HNSCCs) represent clinically and etiologically heterogeneous tumors affecting >40000 patients per year in the USA. Previous research has identified individual epigenetic alterations and, in some cases, the relationship of these alterations with carcinogen exposure or patient outcomes, suggesting that specific exposures give rise to specific types of molecular alterations in HNSCCs. Here, we describe how different etiologic factors are reflected in the molecular character and clinical outcome of these tumors. In a case series of primary, incident HNSCC (n=68), we examined the DNA methylation profile of 1413 autosomal CpG loci in 773 genes, in relation to exposures and etiologic factors. The overall pattern of epigenetic alteration could significantly distinguish tumor from normal head and neck epithelial tissues (P<0.0001) more effectively than specific gene methylation events. Among tumors, there were significant associations between specific DNA methylation profile classes and tobacco smoking and alcohol exposures. Although there was a significant association between methylation profile and tumor stage (P<0.01), we did not observe an association between these profiles and overall patient survival after adjustment for stage; although methylation of a number of specific loci falling in different cellular pathways was associated with overall patient survival. We found that the etiologic heterogeneity of HNSCC is reflected in specific patterns of molecular epigenetic alterations within the tumors and that the DNA methylation profiles may hold clinical promise worthy of further study.
Head and neck squamous cell carcinoma (HNSCC) is a physically, etiologically and molecularly heterogeneous disease, with an annual incidence in the USA of >40000 cases. The majority of head and neck cancers are associated with tobacco and alcohol use, acting both independently and synergistically (1,2). However, human papilloma virus (HPV), particularly the high-risk type 16, is associated with 20–25% of HNSCC, and individuals with HPV-positive disease compared with HPV-negative have better overall survival (3,4). Given the established association of etiologic factors with clinical outcome, identifying the molecular character of tumors arising from varying exposures will aid in understanding the mechanisms influencing prognosis and provide novel targets for diagnosis and therapy of HNSCC.
Study of the contribution of epigenetic alterations to tumor biology is now a vast field, and it is widely accepted that epigenetic alterations in target tissues are part of the causal path to the development of malignancy (5,6). DNA methylation-associated epigenetic silencing of tumor suppressor genes is an aberrant mark of cancer with considerable specificity. DNA hypermethylation in HNSCCs targets genes in pathways such as DNA repair, cell cycle control, apoptosis, angiogenesis, cell–cell interaction and metastasis (7). Associations among HPV16, smoking, betel nut use and methylation of specific genes have been identified (8–10). These findings, though, have focused on single gene methylation alterations and their associations to exposures, but have not examined how exposures might be influencing the overall processes leading to epigenetic alteration. We have previously demonstrated that exposures and age, in bladder cancer, lead to an increased propensity for gene promoter hypermethylation in a panel of 16 tumor suppressor genes (11). Using now available high-throughput technologies, we are better equipped to understand the process by which carcinogenic exposures act to alter the DNA methylation status of a developing tumor.
We aim to more completely understand the etiology of epigenetic alterations by examining the relationships between these alterations and carcinogen exposures. In this manner, we hope to define novel pathways through which HNSCC can arise and aid in the development of diagnostic screening tools and targeted therapies. We characterized DNA methylation profiles of primary human HNSCC tumors by examining DNA methylation status of ~1400 CpG sites in ~800 cancer-related genes in a population-based case series of incident, primary HNSCC and non-diseased head and neck epithelium. Both the diagnostic and prognostic utility of these markers were defined; and uniquely, we also revealed how etiologic factors responsible for head and neck carcinogenesis are associated with the molecular character of these tumors.
The study population has been described previously (8,12). Briefly, incident cases of histologically confirmed HNSCC were identified from nine medical facilities in the Boston, MA, metropolitan area. Diagnoses were confirmed by an independent study pathologist. All cases enrolled in the study provided written informed consent as approved by the Institutional Review Boards of the participating institutions. Archived pathology specimens were used for analysis of promoter hypermethylation, and a total of 42 formalin-fixed paraffin-embedded (FFPE) and 26 fresh-frozen tumor samples were selected for analysis. Data on HPV16 tumor DNA status and serology from the parent case–control study (3) have been previously reported. Demographic and exposure information was collected through self-administered questionnaires and clinical information through medical chart reviews. Table I shows the demographic characteristics of the final population studied. In addition to the case tumor tissues, non-malignant head and neck tissues from individuals without head and neck cancer were obtained from the National Disease Research Interchange. Clinicopathologic information is limited by this anonymous tissue bank, but all samples were obtained from patients who were not previously diagnosed with any cancer, and thus whose cause of death was not cancer related.
For FFPE tissues, tumor sections with the greatest proportion of malignant tissue were selected by the study pathology for use in our molecular analyses. Three 20 μM sections were cut from each FFPE tumor sample and the sections were transferred into microcentrifuge tubes. The paraffin was dissolved using Histochoice Clearing Agent (Sigma–Aldrich, St Louis, MO) followed by two washes with 100% ethanol and one wash with phosphate-buffered saline. The samples were then incubated in sodium dodecyl sulfate lysis solution (50 mM Tris–HCl, pH 8.1, 10 mM ethylenediaminetetraacetic acid, 1% sodium dodecyl sulfate) with proteinase K (Qiagen, Valencia, CA) overnight at 55°C. De-crosslinking was performed by adding NaCl (final concentration 0.7 M) and incubating at 65°C for 4 h. DNA was recovered using the Wizard DNA clean-up kit (Promega, Madison, WI) according to the manufacturer's protocols. For fresh-frozen tumor tissues and peripheral blood samples, DNA was extracted using the QIAamp DNA mini kit according to the manufacturer's protocol (Qiagen). Sodium bisulfite modification of the DNA was performed using the EZ DNA Methylation Kit (Zymo Research, Orange, CA) following the manufacturer's protocol, with the addition of a 5 min initial incubation at 95°C prior to addition of the denaturation reagent. The de-crosslinking steps in the extraction as well as the 95°C incubation ensure more complete melting of the DNA and thus more complete sodium bisulfite conversion, especially for the highly cross-linked formalin-fixed specimens. Illumina GoldenGate® methylation bead arrays were used to simultaneously interrogate 1505 CpG loci associated with 803 cancer-related genes. This is a commercially available array designed by Illumina to interrogate genes with CpG islands considered cancer related. Bead arrays were run at the University of California-San Francisco Institute for Human Genetics, Genomics Core Facility according to the manufacturer's protocol and as described in ref. (13).
We assembled data with BeadStudio Methylation software from the array manufacturer Illumina (San Diego, CA). All array data points are represented by fluorescent signals from both methylated (Cy5) and unmethylated (Cy3) alleles, and methylation level is given by β=[max (Cy5, 0)]/(|Cy3|+|Cy5|+100), the average methylation (β) value is derived from the ~30 replicate methylation measurements for each loci. At each locus for each sample, the detection P-value is defined as 1−P-value computed from the background model characterizing the chance that the signal was distinguishable from negative controls. Using this as a metric for quality control for sample performance, seven samples (9%), all FFPE, where <75% of loci had a detection P-value <1×10−5, were dropped from analysis. A similar quality control for CpG loci eliminated those with median detection P-value >0.05 (n=8, 0.5%).
Subsequent analyses were carried out in The R Package. For exploratory/visualization purposes, hierarchical clustering using the Manhattan metric and average linkage was performed. Associations between sample type or covariates such as age or gender and methylation at individual CpG loci were tested with a generalized linear model. The beta distribution of average beta values was accounted for with a quasibinomial logit link with an estimated scale parameter constraining the mean between 0 and 1. In contrast, array-wide scanning for autosomal CpG loci (n=1413) associations with sample type or covariate used false discovery rate correction and q-values computed by the qvalue package in R. An false discovery rate q-value of <0.05 was considered significant for this examination. Initial analyses were performed to examine if there were systematic differences in methylation profile between FFPE and fresh tumor samples. Clustering did not reveal linkage by sample type, nor was any association observed across individual loci, thus these sample types (FFPE and fresh) were combined in all subsequent analyses.
The R Package was also used to build classifiers of sample type with the random forest (RF) approach, a supervised classification analysis. RF is a tree-based classification algorithm similar to Classification and Regression Tree (14–16) and was performed on CpG average beta values using RF R package version 4.5-18 by Liaw and Wiener. RF builds each individual tree by taking a bootstrap sample (sampling with replacement) of the original data and on average about one-third of the original data are not sampled [out of bag (OOB)]. Those sampled are used as the training set to grow the trees, and the OOB data are used as the test set. At each node of the tree, a random sample of m out of the total M variables is chosen and the best split is found among the m variables. The default value for m in the RF R package is . In this analysis, we will test a range of m from half of to two times and will use the m that gives the lowest prediction error. The OOB error rate is the percentage of time the RF prediction is incorrect. A test for association between methylation (predictors) and sample type was conducted by comparing the OOB obtained on the data set with the null distribution of OOB errors obtained by permuting sample type labels and running the RF procedure 20000 times.
For inference, data were clustered using a mixture model (17) with a mixture of beta distributions (18), and the number of classes was determined by Bayesian information criterion (19,20). The mixture model was fit by recursively partitioning the data using a 2-class mixture model, with Bayesian information criterion used as a criterion for the split, as described in ref. (21). Class membership was obtained from the mixture model and subsequent univariate associations were tested via permutation test with 10000 permutations each. For continuous variables, the Kruskal–Wallis test statistic was used, whereas for categorical variables, the standard chi-square goodness-of-fit test was used. For those variables having a significant univariate association, multinomial logistic regression was used to model methylation class while controlling for potential confounders. Because of the potentially large number of methylation classes, logistic regression coefficients were regularized using a ridge (L2) penalty, with coefficients for a common (non-intercept) covariate across outcome levels shrunk toward zero (22,23) the tuning parameter was selected by minimizing Bayesian information criterion. To test multivariate associations with stage, ordinary logistic regression and likelihood ratio tests were used. Cox proportional hazards models and likelihood ratio tests of models with and without methylation classes were used for survival analyses. In addition, we performed a locus-by-locus analysis of the 500 most variable loci to examine associations with patient survival using a Cox proportional hazards model, using false discovery rate correction and q-values computed by the qvalue package in R.
Pathway analysis was conducted with the use of Ingenuity Pathways Analysis (Ingenuity® Systems, Redwood City, CA; www.ingenuity.com) (24). Canonical pathways analysis identified the pathways from the Ingenuity Pathways Analysis library of canonical pathways that were most significant to the data set. The top 500 most variable CpG gene loci tested locus-by-locus for a survival association that was also associated with a canonical pathway in the Ingenuity Pathways Knowledge Base were considered for the analysis. Since we are using a cancer panel, the 500 CpG gene loci analyzed were also used as the reference set to avoid bias toward cancer-related pathways by selecting ‘user data set’ in the edit analysis parameters window. The significance of the association between the data set and the canonical pathway was measured in two ways: (i) a ratio of the number of gene loci from the data set that map to the pathway divided by the total number of gene loci that map to the canonical pathway is displayed and (ii) Fisher's exact test was used to calculate a P-value determining the probability that the association between the genes in the data set and the canonical pathway is explained by chance alone.
Characterization of the profile of DNA methylation alterations in non-malignant head and neck epithelial tissues compared with HNSCC tumor samples was completed using the Illumina Goldengate Methylation BeadArray. Unsupervised hierarchical clustering of the DNA methylation data with Manhattan distance and average linkage as the metric across the 1250 most variable autosomal loci (Figure 1A) depicts relatively tight clustering of the non-malignant tissues compared with the tumors, as well as the extent of variability in the methylation beta value across the loci. In a locus-by-locus analysis applying an false discovery rate cutoff q-value of 0.05, we identified 261 loci with significantly differential methylation between tumors and normal (see supplementary Figure S1 available at Carcinogenesis Online). Of those, 125 loci showed greater methylation in tumors compared with normal, whereas 136 loci exhibited lower methylation levels in tumors compared with normal tissues (see list of loci in supplementary Materials and Table 2, is available at Carcinogenesis Online). The confusion matrix (Table II) resulting from RF analysis shows which samples are correctly classified, those that are misclassified and the misclassification error rate for each sample type. Five normal tissues (45%) were misclassified as tumor tissues, while only two tumors were misclassified as normal tissues (3.0%). This results in an overall error rate of 8.86%, a significant improvement in sample classification compared with the expected error under the null hypothesis (P<0.0001). These results, consistent with our previous work, suggest that use of overall patterns of methylation alterations may have more utility in capturing the tumorigenic process than do individual alterations.
A recursive partitioning mixture model applied to methylation data from all autosomal loci in tumors and non-tumor head and neck epithelial tissues delineated 11 distinct methylation classes (Figure 1B). This model demonstrates that methylation class membership was a highly significant predictor of tumor status (permutation P <0.0001).
To examine how known risk factors for HNSCC are associated with these profiles, we utilized a case series approach and reconstructed the recursive partitioning mixture models using only the tumor data (Figure 2A), resulting in the delineation of six tumor-specific classes. A permutation test with tumor stage, dichotomized as high (Stage III or IV) versus low (Stage I or II) revealed a significant association between methylation class membership and stage (P<0.01). A logistic regression model of stage (see supplementary Table 1 available at Carcinogenesis Online) suggested that inclusion of methylation class is significant in predicting stage (likelihood ratio P<0.01) and that membership in Class 6 was associated with a significantly reduced risk of high-stage disease (odds ratio 0.1, 95% confidence interval 0.01, 1.0). Membership in Class 2 showed a similar protective effect, whereas membership in Class 5 was associated with an increased risk of high-stage disease, although the small numbers of tumors in these classes made these estimates unstable.
In order to identify if exposures leading to this disease are associated with these methylation classes, we examined the associations between individual risk factors for HNSCC and these classes. Methylation class was significantly associated with patient age as a continuous variable (permutation test P<0.01, Figure 2B); methylation Class 2 members had lower patient age, and Class 4 higher age compared with other classes. Smoking intensity (packs per day) also significantly differed across methylation classes (P<0.04, Figure 2C); Class 1 demonstrated lower smoking intensity, and 3 relatively high intensity. However, we did not observe a significant association of methylation class with smoking duration (years smoked) or pack-years smoked. A borderline significant association was observed with tumor site by methylation class (oral, pharyngeal and laryngeal, P <0.1) (Figure 2D). Tumor HPV16 DNA status also demonstrated an association with methylation class that approached statistical significance (P<0.1, Figure 2E), patients in Class 4 had a greater prevalence of HPV-positive tumors. Finally, lifetime average drinks per week also showed a strong differential trend by methylation class (P<0.1, Figure 2F).
Multinomial logistic regression results are shown in Table III, with the classes numbered as they were in Figure 2A and with Class 5 serving as the referent class as this class had the largest membership. The overall Wald P-value indicates whether the covariate significantly differentiates class membership overall. Individual confidence intervals for each covariate within a class identify the magnitude of any association and significance of the association of a covariate on membership in that class compared with the referent class (Class 5), conditional on membership in either class.
Patient age and average alcohol drinks per week each significantly differentiated membership across classes (Wald P<0.0001). Laryngeal tumors were less probably to be members of Class 1, and the odds of membership in Class 1 were significantly reduced with each year of age. In addition, the odds of membership in Class 1 compared with Class 5 were significantly decreased by almost 20% for each additional pack of cigarettes smoked per day on lifetime average. Each year of age reduced the odds of membership in Class 2 compared with Class 5 by >10%, and tumors in this class were mostly likely to be oral tumors compared with pharyngeal or laryngeal. Only age demonstrated a significant effect on membership in Class 3 and Class 4, leading to a reduced odds of membership in Class 3 compared with Class 5, but an increased odds of membership in Class 4 compared with Class 5. Class 6 tumors were significantly less probably to be HPV-positive tumors, but more probably to be from patients with greater lifetime alcohol exposures. These results overall suggest that differing etiologies of this disease influence the pattern of epigenetic alteration observed in the resulting tumors.
Next, we examined if the DNA methylation profiles or methylation at specific loci were associated with patient survival. Among the 68 samples examined for methylation using the array, there were 22 deaths and a mean of 2.75 years of follow-up among surviving patients (range 0.75–5 years). We found no significant association between methylation classes derived from the recursive partitioning mixture modeling procedure among tumors and overall patient survival, controlling for tumor stage and patient age.
Finally, we tested the hypothesis that biologic pathways, rather than overall profiles of methylation, are important in determining survival. To examine this hypothesis, we utilized Ingenuity Pathway Analysis to examine which specific pathways were overrepresented among the top 500 loci having both positive and negative correlation with survival as determined by loci-specific Cox proportional hazards analysis (24). The pathways identified to be significantly overrepresented are listed in Table IV, as well as correlation between the increase in methylation beta value of the genes represented by that pathways and patient survival, such that those with a positive correlation would represent loci whose increasing methylation level is associated with improved patient survival (i.e. hazard ratio <1), whereas those with a negative correlation are loci where increasing methylation is associated with poorer survival or a risk hazard ratio (>1). We also identified 18 loci with a false discovery rate <20% in their association with overall patient survival in models stratified by tumor stage and controlled for patient age, and those loci are shown in supplementary Table 2 (available at Carcinogenesis Online). Of note, only two of these 18 loci (ZAP70 and GP1BB) are associated with a hazard ratio >1, whereas 16 demonstrate a protective hazard ratio <1. Such a negative association with risk could indicate, in fact, that loss of methylation at these loci may be associated with increased risk, as one might expect from oncogene activation.
This study revealed that patterns of epigenetic alteration, rather than the identification of single or small numbers of epigenetic alterations affecting individual genes, may hold greater utility in identifying the process through which carcinogens act epigenetically to drive tumorigenesis as well as in providing useful diagnostic value to these molecular markers. The utility of profiles is probably due to the strong, yet multidirectional, correlation between epigenetic alterations across loci, as we have previously reported (25), and as clearly seen in the present data. Our method for identification of methylation classes based solely on the patterns of DNA methylation alterations derived from the Goldengate Methylation BeadArray limits the bias introduced by selection of candidate genes for examination and allows for statistical inference, so that the association between these classes and various clinical or risk factors can be assessed with rigor. It is also critically important to note that it is not only hypermethylation of specific loci that is used in defining these signatures of epigenetic alterations between non-malignant and malignant tissues but also hypomethylation of loci at particular CpG sites. The locus-by-locus approach suggested that 125 loci have greater methylation in tumors compared with normal, whereas 136 loci exhibit lower methylation levels in tumors compared with normal tissues. The RF classification suggests that using DNA methylation at all CpG loci may be clinically useful in confirming tumor, as misclassification of tumor as non-tumor tissue occurred at a frequency of only 3%, but that as a screening tool, differentiating non-tumor from tumor tissue, misclassification may be higher. The recursive partitioning approach defined patterns that can well differentiate between non-diseased oral epithelium and malignant tissues, with class membership highly significantly associated with tissue type (P<0.0001). Similarly, looking specifically within tumors, these methylation alterations patterns significantly differentiated between low-stage disease (I, II) and high (III, IV)-stage disease (P<0.01) with Class 6 being more probably of lower stage than Class 1, controlling for the other methylation patterns. It should be noted that differentiation of adjacent normal tissue in patients with tumors may be more complicated than differentiation of tumor from normal tissue of completely non-diseased individuals, and such an examination is worthy of further study.
Patients whose tumors comprised Class 3 had greater exposures to carcinogens causal to this disease, namely tobacco smoke and alcohol, and their tumors were more often laryngeal (Table III). Thus, the chemical carcinogens, particularly those in cigarette smoke, may be leading to this specific pattern of epigenetic alteration, either directly or through mutation of key genes related to the control of DNA methylation. On the other hand, the patients whose tumors comprised Class 1, demonstrated modest or low exposures to tobacco and alcohol, were not, for the most part, HPV16 DNA positive within the tumors and were relatively young compared with classes 3–6. Thus, these individuals may represent a group of people with true susceptibility for this disease, such that even low to modest exposure can initiate oral carcinogenesis. Further studies are necessary to confirm these observations in a larger series of tumors and may also consider integration of these profiles with genome-wide polymorphism studies to aid in identifying the genes or pathways responsible for this increased susceptibility and thereby aid in the prevention or treatment of this disease.
The individuals whose tumors reside in classes 4–6 demonstrated a greater intensity of tobacco smoke exposure as well as an elevated exposure to alcohol and appeared to be older than individuals of the other classes. Thus, the methylation patterns exhibited by these individuals may be driven more by these exposures and indicate that they are somewhat less susceptible to the disease, as they presented at a later age and required greater exposure to attain this disease.
We were unable to observe any significant associations between these methylation profiles and overall patient survival. This may indicate that our approach (defining methylation classes) classifies only the phenotypic results of methylation and is not integrating the obviously crucial results of genetic changes. Indeed, the epigenetic alteration profile may be driven predominantly by exposures that do not relate to patient outcome. Our analyses of pathways overrepresented among the loci demonstrating a relationship to patient survival suggests that patient outcome is more probably determined not by the pattern of methylation but by the combined targets of genetic and epigenetic alteration. The most significantly overrepresented pathway, in this analysis, was the ‘ephrin receptor-signaling pathway’, containing a number of oncogenes such as SRC, EGF and EPHA2. The results of the Cox model demonstrate a positive relationship between methylation beta value of these loci and improved patient survival, or more appropriately that loss of methylation at these loci is associated with poorer patient survival. Overexpression of EPHA2, for example, has been linked to invasiveness in breast (26) and ovarian (27) tumors, and in general, deregulation and overexpression of this family of genes are related to changes in cell motility, morphology, migration and adhesion—all marks of an aggressive phenotype (28,29). Similarly, loss of methylation of the stress activated protein kinase (SAPK)/c-jun N-terminal kinase and platelet-derived growth factor-signaling pathways, traditional oncogenic growth-signaling pathways, probably represents increasing tumor growth leading to poorer patient outcome. Gene-specific analyses also suggest that loss of methylation of critical oncogenic growth factors and receptors including HGF, FGF, ATP10A and NTRK3 is associated with poorer patient survival, whereas hypermethylation of ZAP70 and GP1BB is also associated with poor survival, further suggesting that these genes may be acting as novel tumor suppressors in HNSCC. Additional analyses using a larger independent set of tumors will aid in confirming these results and determining if a targeted panel of genes, such as those predicted by this array-based approach, can be clinically useful as prognostic indicators in this disease. In addition, these data suggest that integrative analyses of human tumor samples, including epigenetic alteration, genetic copy number changes and mutation, may be necessary to truly understand the biology and better identify meaningful disease biomarkers.
HNSCC is a relatively common tumor, characterized by its aggressiveness and high morbidity, particularly at later stages, as well as for the distinct clinical outcomes associated with its disparate etiologies. Molecular characterizations of HNSCC tumors have yet to clearly identify pathways responsible for driving the differential aggressiveness or treatment response related to HPV etiology. Due to its aggressiveness at high stages, the diagnostic and prognostic utility of somatic molecular alterations including DNA methylation and genetic alterations in accessible materials, such as saliva or buccal swabs, at early stages of disease development is highly desirable, but has proven elusive due to a general lack of markers with appropriate levels of sensitivity and specificity (30). Identification of novel sites or patterns of DNA methylation alterations in tumors may provide the necessary sensitivity and specificity to eventually make non-invasive, early diagnosis of this disease possible. In addition, understanding how the disease risk factors drive these alterations may allow for more specific diagnostic panels, as well as more targeted preventative strategies based on the molecular character of the resulting disease. It is clear that exposures causal to this disease not only can influence the genetic make-up of the tumor has been previously demonstrated but also significantly and distinctly alter the epigenetic profile. This concept that carcinogens and lifestyle factors contribute to tumorigenesis through epigenetic mechanisms is critical and holds great promise in disease prevention and treatment. In order to define those individuals at greatest risk and to limit the effect of carcinogens in these individuals, the epigenetic character of the individual and the epigenetic mechanism of the carcinogens must be considered. In addition, treatment aimed at the epigenome holds great promise as we continue to understand the complex and powerful contribution, this mode of somatic alteration plays in tumorigenesis.
Flight Attendants Medical Research Institute to C.J.M. and M.D.M.; National Institutes of Health (R01CA078609, R01CA100679, R01CA52689, P50CA097257); Tendrich/Berkow Fund; Friends of the Dana-Farber Cancer Institute.
Conflict of Interest Statement: None declared.