Head and neck cancer accounts for an estimated 47,560 new cases and 11,480 deaths annually in the United States, the majority of which are squamous cell carcinomas (HNSCC). The overall 5-year survival is approximately 60% and declines with increasing stage at diagnosis, indicating a need for non-invasive tests that facilitate the detection of early disease. DNA methylation is a stable epigenetic modification that is amenable to measurement and readily available in peripheral blood. We used a semi-supervised recursively partitioned mixture model (SS-RPMM) approach to identify novel blood DNA methylation markers of HNSCC using genome-wide methylation array data for peripheral blood samples from 92 HNSCC cases and 92 cancer-free control subjects. To assess the performance of the resultant markers, we constructed receiver operating characteristic (ROC) curves and calculated the corresponding area under the curve (AUC). Cases and controls were best differentiated by a methylation profile of six CpG loci (associated with FGD4, SERPINF1, WDR39, IL27, HYAL2 and PLEKHA6), with an AUC of 0.73 (95% CI: 0.62–0.82). After adjustment for subject age, gender, smoking, alcohol consumption and HPV16 serostatus, the AUC increased to 0.85 (95% CI: 0.76–0.92). We have identified a novel blood-based methylation profile that is indicative of HNSCC with a high degree of accuracy. This profile demonstrates the potential of DNA methylation measured in blood for development of non-invasive applications for detection of head and neck cancer.
In 2010, head and neck cancer accounted for an estimated 47,560 new cases and 11,480 deaths in the United States,1 the majority of which are squamous in origin (HNSCC).2 Cigarette smoking and alcohol consumption are the primary known risk factors, collectively implicated in 75% of cases.3 It has long been appreciated that increasing exposure of the oral epithelium to tobacco and alcohol carcinogens can give rise to fields of altered cells.4 At the same time, only a small subset of heavy smokers and drinkers develops HNSCC, and it is therefore thought that this susceptibility is attributable to some as yet unknown host factors. Human papillomavirus type 16 (HPV16) DNA is detectable in approximately 25% of HNSCC, and is particularly evident in oropharyngeal carcinomas, where it is detected in 45–60% of tumors and up to 80% for those arising in the lingual and palatine tonsils.5,6 Other risk factors for the disease include dietary factors,7–12 environmental and occupational exposures,13 gastroesophageal reflux14–16 and inherited cancer syndromes, such as Bloom, Li-Fraumeni, ataxia telangiectasia or xeroderma pigmentosum,17 although these account for a much smaller attributable risk of HNSCC relative to tobacco, alcohol and HPV16.
The 5-year survival for head and neck cancer is approximately 60% but sharply declines with increasing stage at diagnosis.18 The relative 5-year survival for a patient with localized disease is 83% for cancers of the oral cavity and pharynx and 78% for laryngeal cancers, compared with 55% and 42% for regional and only 32% and 33% for distant stage disease,18 respectively. This is problematic since around two-thirds of head and neck cancer patients present with regional lymph node involvement and about 10% present with distant metastases.19 Moreover, head and neck cancer patients frequently suffer from high morbidity, including disfigurement and impairment of basic functions, such as talking, swallowing, eating and breathing.19 The overall cost of this disease to society is substantial: HNSCC accounts for over $2 billion in direct morbidity and mortality costs annually in the United States (2001 US dollars).20 Furthermore, the median 1-year cost of treatment from 1995–2000 was $22,658 for patients with early-stage disease (stage I or II) compared with $27,655 for those with advanced disease (stage III or IV),21 representing a 22% escalation. Currently, no proven population screening methods are in widespread use for head and neck cancer aside from visual inspection. Thus, there is a need to develop effective markers for incident cancers that could eventually result in earlier diagnosis, reducing morbidity and mortality.
Peripheral blood is a readily available source of genomic DNA that can be used to assess DNA methylation profiles. There have been several recent reports on blood-based methylation biomarkers for various solid tumor types, including breast,22 ovarian,23,24 pancreatic,25 bladder,26 colorectal,27 and lung cancers.28 The aforementioned bladder cancer study, conducted by our group, identified a panel of 9 CpG loci in peripheral blood DNA that were indicative of bladder cancer.26 Here, we applied similar techniques for the identification of novel methylation biomarkers of HNSCC, using genome-wide methylation array data from peripheral blood DNA from a Greater Boston-area case-control study of head and neck cancer.
To identify a novel blood-based DNA methylation profile associated with HNSCC, we applied a semi-supervised RPMM strategy29 (Fig. 1) to Human Methylation27 BeadArray data for peripheral blood collected from 92 incident HNSCC cases and 92 cancer-free control subjects. A description of the study population is provided in Table 1. Relative to controls, cases were more likely to smoke (p = 0.04), and smoked more pack-years among ever-smokers (former + current; p < 0.001), consumed nearly 3-times as many drinks per week (p < 0.001), and were nearly 3-times as likely to be HPV16 seropositive (p < 0.001).
The initial step toward discovery of novel blood-based DNA methylation biomarkers for HNSCC was to identify the CpG loci for which methylation was most significantly associated with HNSCC case status in the training data. To avoid overfitting the data and provide for validation of the model, subjects were randomly allocated to either a training (n = 92) or testing (n = 92) set, stratified by case-control status to ensure an equal distribution of cases and controls between sets. Subject characteristics did not differ significantly between the training and testing sets (Table S1). We then fit a series of individual linear mixed effects models, using only the training data, for each of the 26,486 autosomal CpG loci in the data set, with a random effect for plate (controlling for plate effect), and ranked the loci according to absolute t-statistic. The optimal number of top CpG loci for discriminating between cases and controls was selected based on a nested cross-validation procedure,29 resulting in 6 CpG loci (associated with FGD4, SERPINF1, WDR39, IL27, HYAL2 and PLEKHA6). Only one of the 6 CpGs (associated with SERPINF1) was located within a CpG island, as defined by Takai and Jones,30 but all 6 CpGs were located within 1 kb of a putative transcription factor binding site (TFBS). Training samples were clustered using a recursively partitioned mixture model (RPMM),31 based on the methylation profile of these 6 CpG loci, resulting in 4 methylation classes (labeled as two-letter combinations of L and R to correspond with branches in the dendrogram). Class membership was then assigned to subjects in the testing set using only the methylation status of these 6 loci. The methylation profiles of the subjects in the testing set, based on the 6 CpGs identified in the training set, are depicted in Figure 2A, which also shows the mean methylation across loci within a given class. The distribution of the methylation values across classes for each of the 6 loci is depicted in Figure S1.
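The stratified random allocation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `stratified_split` and the fixed random seed are hypothetical, and only the group sizes (92 cases, 92 controls) come from the text.

```python
import random

def stratified_split(labels, seed=0):
    """Split subject indices into two halves within each label (stratum).

    `labels` holds one case/control indicator per subject. Shuffling and
    halving each stratum separately keeps the case:control ratio equal in
    the training and testing sets.
    """
    rng = random.Random(seed)  # seed is an arbitrary choice for reproducibility
    train, test = [], []
    for value in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == value]
        rng.shuffle(idx)
        half = len(idx) // 2
        train.extend(idx[:half])
        test.extend(idx[half:])
    return sorted(train), sorted(test)

# 92 cases (1) and 92 controls (0), matching the study design
labels = [1] * 92 + [0] * 92
train_idx, test_idx = stratified_split(labels)
```

Each set then contains 92 subjects, 46 of them cases, and no subject appears in both sets.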
Methylation class membership was significantly associated with case-control status (p ≤ 0.001, permutation-based Chi-square test; Fig. 2B), with the right branch classes (RR, RL) containing a higher proportion of HNSCC cases relative to the left branch classes (LR, LL). The right branch classes predominantly contained HNSCC cases (74.4%) and had an overall mean methylation that was significantly less than the left branch classes (p < 0.001), while the majority of the left branch class members were control subjects (67.9%; Table 2). Each of the 6 CpG loci used in the classifier had significantly less methylation among cases compared with controls (Fig. S2). Subjects in right branch classes were more than 6-times as likely to be HNSCC cases compared with those in left branch classes (OR = 6.1, 95% CI: 2.2–17.3), the magnitude of which further increased after controlling for subject age, gender, smoking, alcohol consumption and HPV16 serostatus (OR = 9.3, 95% CI: 2.9–29.6). No significant differences were observed for stage at diagnosis, tumor site or HPV16 serostatus of cases between the left and right classes (Table 2).
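For readers who want to see how an unadjusted odds ratio arises from a 2×2 table, a minimal sketch follows. The cell counts are an assumption: they are back-calculated from the reported percentages (74.4% cases in the right branch, 67.9% controls in the left branch) for a testing set of 46 cases and 46 controls, and are not taken from Table 2. The Wald interval used here is a textbook approximation and need not match the interval method used in the study, so the bounds may differ from those reported.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a = right-branch cases,  b = right-branch controls,
    c = left-branch cases,   d = left-branch controls.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Assumed counts consistent with the reported percentages (not from Table 2):
# right branch 29 cases / 10 controls, left branch 17 cases / 36 controls
or_hat, ci_low, ci_high = odds_ratio_wald(29, 10, 17, 36)
```

With these assumed counts the point estimate is 1044/170, or roughly 6.1, in line with the reported unadjusted odds ratio.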
To assess the performance of the classifier, we constructed ROC curves and calculated the corresponding AUC (Fig. 3) in the testing data. Using methylation class alone, the AUC was 0.73 (95% CI: 0.62–0.82). After adjustment for subject age, gender, smoking status (never, former, current), smoking pack years, alcohol consumption and HPV16 serostatus, the AUC increased to 0.85 (95% CI: 0.76–0.92).
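As a reminder of what the AUC measures, the sketch below computes it directly as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The scores are toy values chosen for illustration, not study data, and the function name `auc` is hypothetical.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    case-control pairs; tied pairs contribute 0.5.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Toy predicted probabilities (illustrative only)
toy_cases = [0.9, 0.8, 0.4]
toy_controls = [0.7, 0.3, 0.2]
toy_auc = auc(toy_cases, toy_controls)  # 8 of 9 pairs are correctly ordered
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which gives context for the values of 0.73 and 0.85 reported here.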
To assess the tumor specificity of the classifier, we tested its performance in classifying bladder cancer, which is also associated with smoking. We applied the classifier to blood methylation data from an Infinium HumanMethylation27k BeadArray for a previously described case-control study of bladder cancer,26 consisting of 223 cases and 205 controls. The blood methylation profile used to predict HNSCC cases and controls was not predictive of bladder cancer, with no significant differences in case-control status observed across classes (p = 0.73; Fig. S3).
Methylation of CpG loci used in the methylation profile detected via the Infinium HumanMethylation27k BeadArray platform was confirmed through pyrosequencing (data not shown).
To identify profiles of DNA methylation associated with HPV16 serostatus among HNSCC cases (n = 92), and to determine whether they are indicative of HPV16 serostatus in a series of blinded test samples, we again used SS-RPMM. Training and testing sets were formed by randomly allocating HNSCC cases to either the training (n = 46) or testing set (n = 46), stratified by HPV16 serostatus to ensure an approximately equal distribution of HPV16-positive and HPV16-negative subjects in each set. The top 25 CpG loci were selected based on the nested cross-validation step and their methylation profiles were used to cluster the training samples, resulting in two classes (Fig. S4A).
The right branch class had an overall mean methylation that was significantly less than the left branch class (p < 0.001). In the test set, we observed that class membership was significantly associated with HPV16 serostatus (p = 0.02, permutation-based Chi-square test, Fig. S4B), where cases in the left branch class were 5 times as likely as those in the right branch class to be HPV16-positive (OR = 5.0, 95% CI: 1.04–24.9). Although a similar point-estimate is obtained after adjusting for subject age, gender, smoking and alcohol consumption, the difference was no longer significant (adjusted OR = 4.6, 95% CI: 0.8–25.0).
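The permutation-based Chi-square test referred to above can be sketched as follows. This is a generic illustration, assuming a Pearson Chi-square statistic and random shuffling of the outcome labels; the number of permutations (`n_perm=2000`), the seed, and the toy data are all assumptions, since the text does not report these details.

```python
import random

def chi_square(classes, outcomes):
    """Pearson Chi-square statistic for two equal-length categorical lists."""
    n = len(classes)
    stat = 0.0
    for r in sorted(set(classes)):
        row_total = classes.count(r)
        for c in sorted(set(outcomes)):
            col_total = outcomes.count(c)
            observed = sum(1 for x, y in zip(classes, outcomes) if x == r and y == c)
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat

def permutation_p_value(classes, outcomes, n_perm=2000, seed=0):
    """Share of label shuffles whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = chi_square(classes, outcomes)
    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any class-outcome association
        if chi_square(classes, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Toy data (illustrative only): class L is mostly positive, class R mostly negative
toy_classes = ["L"] * 10 + ["R"] * 10
toy_outcomes = [1] * 9 + [0] + [0] * 9 + [1]
toy_stat = chi_square(toy_classes, toy_outcomes)
toy_p = permutation_p_value(toy_classes, toy_outcomes)
```

Because the null distribution is built from the data themselves, this approach does not rely on the large-sample Chi-square approximation, which is helpful when some class-by-outcome cells are sparse.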
We assessed performance of the classifier by plotting ROC curves and calculating the AUC (Fig. S5). Using methylation class alone, the AUC was 0.69 (95% CI: 0.53–0.84). After adjustment for subject age, gender, smoking and alcohol consumption, the AUC increased to 0.86 (95% CI: 0.7–0.97).
Since HPV16 infection is a risk factor for HNSCC and HPV16 has been associated with altered blood folate status32 as well as with distinct epigenetic states in peripheral blood,33 we assessed whether there was any overlap in the biological pathways impacted by differential DNA methylation associated with HPV16 or HNSCC case status. Using the combined training and testing data (n = 184), gene set enrichment analysis (GSEA) was performed to compare KEGG-defined pathways over-represented among loci associated with HNSCC to those associated with HPV16 (cases only; n = 92). Pathways with a nominal p < 0.05 based upon the GSEA enrichment statistic are provided in Figure 4, grouped by function. No overlapping pathways based on HPV16 and HNSCC-associated CpG loci were identified, although there was overlap of similar functional groupings of pathways, detailed in Figure 4.
We also evaluated the DNA sequences near the CpG dinucleotides associated with altered methylation, interrogating them for the presence of nearby transcription factor binding sites (TFBS), hypothesizing that altered DNA methylation observed in cases or among those with HPV16-positive status may be specifically targeting genes with common sequence features. The upper portion of Figure 4 illustrates the results of this GSEA-based analysis, depicting overrepresented TFBS located within 1 kb of loci whose DNA methylation was related to HPV16 serostatus, HNSCC or both. There was overlap of the binding site for POU2F1, which is an octamer transcription factor (Oct-1).
Head and neck cancer bears substantial morbidity, mortality and treatment costs in the United States, which worsen with increasing stage at diagnosis. Thus there is a need for non-invasive tests that facilitate the detection of both early disease and recurrence, as well as those that improve the ability to predict risk of second primary cancers. Here, we describe a novel DNA methylation profile of 6 CpG loci in peripheral blood and assess its potential utility as a diagnostic biomarker of HNSCC.
We have demonstrated the ability of this profile to distinguish HNSCC cases from cancer-free control subjects with a high degree of accuracy (AUC = 0.73), which further improves when age, gender, smoking, alcohol consumption and HPV16 serostatus are taken into account (AUC = 0.85). The profile also appears to have tumor specificity, as demonstrated by its inability to discriminate bladder cancer cases from controls. Moreover, we found no difference in stage at diagnosis of cases across classes, indicating that our methylation profile works with both early and advanced-stage disease. Although the sensitivity and specificity are not yet sufficient for clinical applications, it may be possible to improve diagnostic accuracy in the future by combining this profile with other biomarkers or diagnostic criteria. Furthermore, our results pave the way for research and development of DNA methylation biomarkers of HNSCC derived from blood. In the future, similar approaches may also prove useful for predicting response to therapy, disease recurrence and risk of second primary cancer.
Different white blood cell types have different patterns of DNA methylation, and it is therefore conceivable that our observations are the result of significant, systematic changes in large cell populations in the hematopoietic system, such as might be observed in a specific immune response to the tumor.34 In fact, some, but not all, of the CpG loci that we identified are located within genes with either established or purported involvement in immune differentiation or function (FGD4, IL27 and HYAL2).35–37 If our results are attributable to changes in cell populations it is likely that these changes will be systematically observed in most HNSCCs and that DNA methylation at the loci we identified is very specific for a particular subset of white blood cells. To our knowledge, none of the loci we discovered have been specifically described in the literature as differentially methylated in any subset of white blood cells. Our data suggest a need for further investigation of these loci and their role in any HNSCC immune response. If our findings are the result of an altered peripheral blood profile, we believe it to be important for them to be evaluated for an association with disease outcome, recurrence and second primary cancers.
Alternatively, our findings may be indicative of systemic, developmentally acquired alterations similar in their mechanism of occurrence to constitutional epimutations. There are likely to be DNA methylation events that occur in utero perhaps due to maternal exposures or stochastic errors in the setting of methylation marks. It is possible that certain of these may predispose individuals to HNSCC.38,39 While it is known that some epimutations are high-penetrance and have a very dramatic phenotype,39,40 it is also common for them to exhibit mosaicism and have a phenotype that is more complex. Going forward, the most important test of this hypothesis could come from prospective studies that can assess whether these profiles are present at times that predate disease diagnosis.
Finally, an additional possibility that could contribute to our findings might be an unaccounted cancer-associated exposure that systemically alters DNA methylation directly or alters a mechanism responsible for production of DNA methylation. However, we statistically controlled for the major HNSCC risk factors (smoking, alcohol consumption and HPV16) in our analyses, making it unlikely that any such effect stemming from these exposures is confounding our results. Barring these exposures, we are unaware of any such factor but cannot rule out the possibility that it might exist.
We decided to further explore any similarities in observed methylation patterns of HPV16 seropositive cases and HNSCC by looking at possible overlap between methylation-associated pathways and transcription factor binding sites (TFBS) associated with HNSCC and HPV16 infection. We did not identify any overlapping pathways between HNSCC and HPV16, although there were common functional groupings; the significance of this, if any, is uncertain. Interestingly, all 6 of our CpG loci were located within 1 kb of a putative TFBS. There was overlap of the POU2F1 transcription factor binding site, overrepresented in both HNSCC and HPV16 seropositive cases. POU2F1 is expressed in all cell types and plays a role in immunoglobulin production,41 regulation of genes encoding histone H2B42 and small nuclear RNA (snRNA) activation.42 It is involved in both inflammation and epigenetic regulation, in line with our proposed explanations for alterations in blood methylation; conversely, it may be that the observed commonality of POU2F1 is simply due to chance.
We have identified a novel blood-based methylation profile that is indicative of HNSCC with a high degree of accuracy. Although not yet adequate for use in clinical settings, this profile demonstrates the potential of blood DNA methylation for development of non-invasive applications for detection of head and neck cancer. Future studies should be aimed at continued exploration of blood DNA methylation biomarkers using prospective studies and denser genome-wide methylation arrays with greater CpG coverage that might allow for continued discovery and improvement of blood methylation panels, alone or in conjunction with other types of markers, for enhanced sensitivity and specificity. Translational efforts aimed at identification of practical DNA methylation biomarkers, such as these, may eventually result in improvement in the detection and treatment of HNSCC, helping to alleviate the overall public health burden of this disease.
The study population consisted of 92 HNSCC cases and 92 cancer-free control subjects with available peripheral blood samples, randomly selected from a previously described case-control study of head and neck cancer.43 Briefly, the case-control study was comprised of incident head and neck cancer patients from the greater Boston area, and population-based controls from the same region with no prior history of cancer, frequency-matched on age and gender. Subjects completed a self-administered questionnaire that provided data on sociodemographics, personal characteristics, personal and family cancer history, and exposures. Study approval was obtained from the Brown University Institutional Review Board for sample collection and use of subject data. All subjects provided written informed consent for participation in this study.
Serologic HPV16 testing for E6, E7 and L1 viral proteins was performed on all subjects. Sandwich ELISA assays were used for detection of HPV16 antibodies as previously described.44,45 Subjects were considered HPV16-positive if any of the viral protein tests (E6, E7 or L1) were positive.
DNA was extracted from peripheral blood using the DNeasy Blood and Tissue Kit (Qiagen, Valencia, CA) and sodium bisulfite-converted using the EZ DNA methylation kit (Zymo Research, Orange, CA) according to manufacturer's protocols. Genome-wide methylation analysis was performed using the HumanMethylation27 BeadArray (Illumina, San Diego, CA), which interrogates 27,578 CpG loci across 14,495 genes, at the University of California San Francisco Institute for Human Genomics Core Facility. Outliers were detected using array control probes supplied by Illumina to diagnose problems such as poor bisulfite conversion, batch or BeadChip effect or color-specific problems. Specifically, Mahalanobis distances were determined based on fitted mean vector and variance-covariance matrix, and arrays with large distances inconsistent with multivariate normality46 were discarded. The methylation status for each individual CpG locus was calculated as the ratio of fluorescent signals (β = Max(M,0)/[Max(M,0) + Max(U,0) + 100]), ranging from 0 to 1, using the average probe intensity for the methylated (M) and unmethylated (U) alleles. Beta (β) = 1 indicates complete methylation; β = 0 represents no methylation.
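The β-value formula above translates directly into code. The sketch below is illustrative only: the function name `beta_value` and the example intensities are hypothetical, while the truncation at zero and the offset of 100 come from the formula as stated.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Methylation beta value from average probe intensities.

    Implements beta = max(M, 0) / (max(M, 0) + max(U, 0) + offset).
    Negative (background-corrected) intensities are truncated at zero, and
    the offset stabilizes the ratio when both intensities are small.
    """
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)

# Hypothetical intensities for illustration
mostly_methylated = beta_value(900.0, 0.0)   # 900 / (900 + 0 + 100)
unmethylated_only = beta_value(-50.0, 400.0)  # negative M is truncated to 0
```

One consequence of the offset is that β approaches, but does not exactly reach, 1 for any finite methylated intensity, so values near 1 should be read as essentially complete methylation.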
We applied a semi-supervised recursively partitioned mixture modeling (SS-RPMM) algorithm29 to identify novel blood DNA methylation biomarkers. This method is based both on the semi-supervised procedure proposed by Bair and Tibshirani47,48 and on the recursively partitioned mixture models (RPMM) developed by Houseman et al.,31 a model-based unsupervised clustering procedure that demonstrates efficient and effective performance for analysis of Illumina methylation array data.26,49–53 The general analytic strategy is presented in Figure 1. To avoid overfitting the data and provide for validation of the model, subjects were randomly split into training (n = 92) and testing (n = 92) sets, stratified by case-control status to ensure an equal distribution of cases and controls between sets. A series of linear mixed-effects models were then fit using the training data only for each of the 26,486 autosomal CpGs in the data set. Each methylation value (arcsine square root transformed prior to analysis54) was modeled as the dependent variable, with a fixed effect for case-control status and a random effect for plate (to allow for inter-plate normalization). CpG loci were ranked based on the absolute value of the t-statistic for the fixed-effect term derived from the model; subsequently the top M loci were used to train a classifier for case/control status, where the number of loci M was selected using a nested cross-validation procedure. Specifically, RPMM was fit to the training data for the purpose of clustering subjects using the M selected loci. To predict class membership in the testing data, the latent class structure from the RPMM fit to the training data was used in conjunction with an empirical Bayes procedure. Receiver operating characteristic (ROC) curves and corresponding area under the curve (AUC) were used to assess the performance of the methylation classifiers, first using methylation class alone and then adjusting for age, gender, smoking, alcohol consumption and HPV16 serostatus.
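The locus-ranking step can be illustrated with a deliberately simplified sketch. Note the substitution: instead of the linear mixed-effects model with a random effect for plate described above, this example uses a plain Welch two-sample t-statistic, so it ignores plate effects entirely. The locus identifiers (`cg_A`, `cg_B`), the toy β values and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def arcsine_sqrt(beta):
    """Variance-stabilizing transform for a proportion in [0, 1]."""
    return math.asin(math.sqrt(beta))

def welch_t(x, y):
    """Welch two-sample t-statistic (a stand-in for the mixed-model t)."""
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))

def rank_loci(betas_by_locus, is_case, top_m):
    """Return the top_m loci ordered by absolute t-statistic, largest first."""
    scores = {}
    for locus, betas in betas_by_locus.items():
        transformed = [arcsine_sqrt(b) for b in betas]
        cases = [v for v, c in zip(transformed, is_case) if c]
        controls = [v for v, c in zip(transformed, is_case) if not c]
        scores[locus] = abs(welch_t(cases, controls))
    return sorted(scores, key=scores.get, reverse=True)[:top_m]

# Toy training data: three cases followed by three controls
is_case = [True, True, True, False, False, False]
toy_betas = {
    "cg_A": [0.20, 0.25, 0.22, 0.60, 0.65, 0.62],  # clearly lower in cases
    "cg_B": [0.50, 0.40, 0.60, 0.50, 0.45, 0.55],  # no real difference
}
top_loci = rank_loci(toy_betas, is_case, top_m=1)
```

In the actual procedure the number of retained loci M is chosen by nested cross-validation rather than fixed in advance, and the selected loci are then passed to RPMM for clustering.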
Unconditional logistic regression was used to calculate the magnitude of the association between methylation class and HNSCC, controlling for potential confounders (age, gender, smoking, alcohol and HPV16 serostatus). Similar methods were also applied for distinguishing cases by serologic markers of HPV16 exposure, using HPV16 serostatus (positive/negative) as the fixed-effect term.
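For orientation, the adjusted model can be written schematically as below. This is a sketch under assumptions rather than the fitted model: it assumes methylation class enters as a single right-branch versus left-branch indicator, and the symbols $\alpha_0$, $\alpha_1$, $\boldsymbol{\gamma}$ and $\mathbf{z}$ are introduced here for illustration (Greek $\beta$ is avoided because it already denotes the methylation value).

```latex
\operatorname{logit}\,\Pr(\text{HNSCC} = 1)
  = \alpha_0
  + \alpha_1 \,\mathbb{1}[\text{right-branch class}]
  + \boldsymbol{\gamma}^{\top}\mathbf{z},
\qquad
\mathrm{OR}_{\text{adjusted}} = e^{\alpha_1}
```

Here $\mathbf{z}$ collects the potential confounders (age, gender, smoking, alcohol and HPV16 serostatus), and the adjusted odds ratio for methylation class is obtained by exponentiating $\alpha_1$.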
To explore the biological relevance of blood-based alterations of DNA methylation for distinguishing HNSCC cases from controls, as well as by HPV16 serostatus, we performed a gene set enrichment analysis (GSEA) based on KEGG-defined pathways.55 This was conducted using cases from the combined training and testing data sets (restricted to cases due to low number of HPV16-positive controls; n = 92) to compare pathways overrepresented among loci associated with HNSCC to those associated with subject HPV16 serostatus. Overrepresentation of any of 258 putative transcription factor binding sites (TFBS) obtained from the tfbsConsSites track of the UCSC Genome Browser site (TFBS Z-score > 2) located within 1 kb of differentially methylated CpG loci was similarly assessed.
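The core of a GSEA-style analysis is a running-sum enrichment score over a ranked list. The sketch below shows the classic unweighted (Kolmogorov-Smirnov-like) form; the text does not state which weighting or implementation was used, so this should be read as a generic illustration. Gene names and the function name are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    Walk down the ranked list, stepping up by 1/n_hit at each set member
    and down by 1/n_miss otherwise; return the signed maximum deviation
    from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Toy ranking, strongest association first (hypothetical gene names)
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
es_top = enrichment_score(ranked, {"g1", "g2"})     # set clustered at the top
es_bottom = enrichment_score(ranked, {"g5", "g6"})  # set clustered at the bottom
```

A nominal p-value such as the p < 0.05 threshold used here is typically obtained by comparing the observed score with scores from permuted rankings or labels; that step is omitted from this sketch.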
The authors thank Drs. Michael Pawlita and Tim Waterboer at the Infection and Cancer Program of the German Cancer Research Center (DKFZ) for their contribution in performing the serologic HPV16 testing.