|Home | About | Journals | Submit | Contact Us | Français|
Oral squamous cell carcinoma (OSCC) is usually diagnosed at an advanced stage and is commonly preceded by oral premalignant lesions. The mortality rates have remained unchanged (50% within 5 years after diagnosis), and it is related to tobacco smoking and alcohol intake. Novel molecular markers for early diagnosis are urgently needed. The purpose of this study was to evaluate the diagnostic value of methylation level in a set of 18 genes by bisulfite next-generation sequencing.
With minimally invasive oral brushing, 28 consecutive OSCC, one squamous cell carcinoma with sarcomatoid features, six high-grade squamous intraepithelial lesions (HGSIL), 30 normal contralateral mucosa from the same patients, and 65 healthy donors were evaluated for DNA methylation analyzing 18 target genes by quantitative bisulfite next-generation sequencing. We further evaluated an independent cohort (validation dataset) made of 20 normal donors, one oral fibroma, 14 oral lichen planus (OLP), three proliferative verrucous leukoplakia (PVL), and two OSCC.
Comparing OSCC with normal healthy donors and contralateral mucosa in 355 CpGs, we identified the following epigenetically altered genes: ZAP70, ITGA4, KIF1A, PARP15, EPHX3, NTM, LRRTM1, FLI1, MIR193, LINC00599, PAX1, and MIR137HG showing hypermethylation and MIR296, TERT, and GP1BB showing hypomethylation. The behavior of ZAP70, GP1BB, H19, EPHX3, and MIR193 fluctuated among different interrogated CpGs. The gap between normal and OSCC samples remained mostly the same (Kruskal-Wallis P values <0.05), but the absolute values changed conspicuously. ROC curve analysis identified the most informative CpGs, and we correctly stratified OSCC and HGSIL from normal donors using a multiclass linear discriminant analysis in a 13-gene panel (AUC 0.981). Only the OSCC with sarcomatoid features was negative. Three contralateral mucosa were positive, a sign of a possible field cancerization. Among imprinted genes, only MIR296 showed loss of imprinting. DNMT1, TERC, and H19 together with the global methylation of long interspersed element 1 were unchanged. In the validation dataset, values over the threshold were detected in 2/2 OSCC, in 3/3 PVL, and in 2/14 OLP.
Our data highlight the importance of CpG location and correct estimation of DNA methylation level for highly accurate early diagnosis of OSCC.
The online version of this article (doi:10.1186/s13148-017-0386-7) contains supplementary material, which is available to authorized users.
Oral and pharyngeal cancer, grouped together, is the sixth most common cancer in the world. The annual estimated incidence is approximately 600,000 per year, two thirds of these cases occurring in developing countries . The prevalence of oral cancers is high especially in South and Southeast Asia, where distinct cultural practices such as betel-quid chewing and varying patterns of tobacco and alcohol use are important risk factors that predispose to cancer of the oral cavity. The mortality rates of these tumors have remained unchanged (50% within 5 years after diagnosis) and are related to tobacco smoking and alcohol intake. Oral cancer patients are usually diagnosed at an advanced stage (two thirds are III–IV), which is associated with a worse prognosis and higher radio- and chemotherapy morbidity. Moreover, quality of life is disproportionately compromised in the oral cavity patient since surgical therapy can be mutilating and often has significant effects on swallowing, speech, and physical appearance. Evidently, improved oral cancer prevention, early detection, and better diagnostic and clinical management tools are needed to identify high-risk patients, such as those with smoking and alcohol exposure, patients without adequate access to health care, and patients with high-risk lesions such as oral leukoplakia (OL), which may progress to carcinoma. The estimated prevalence of OL is approximately 0.5% worldwide. Oral squamous cell carcinoma (OSCC) is the most common neoplastic disease in the head and neck region and is commonly preceded by oral premalignant lesions (OPML) such as OL. In Western countries, the annual malignant transformation rate from OL to OSCC varies from 0.13 to 36.4% [2, 3].
In addition, OSCC patients can develop a second primary OSCC, with a frequency ranging between 17 and 30% . Clinical and histological features of OPML do not provide enough information to identify premalignant lesions at high risk of malignant transformation or patients who will develop an OSCC during follow-up. OSCC or high-grade squamous intraepithelial lesions (HGSIL) are usually diagnosed by incisional biopsy. Nevertheless, the biopsy requires a minimally invasive surgical approach that can create discomfort and may be refused by the patient. Moreover, only one third of patients present with early stage disease (T1–2, N0), and for the remainder, life expectancy is very short. Different researchers, including our group, have recently proposed an attractive strategy to reduce the burden of OSCC by analyzing the methylation status of a panel of genes starting from saliva and/or brushing specimens [5–10]. In many cancers, gene silencing by promoter methylation seems to be an early event in carcinogenesis and may occur even more frequently than structural inactivation of genes by mutations and deletions . Histologically normal tissue adjacent to tumors and OPML can have an aberrant methylation pattern in candidate genes, suggesting that these epigenetic modifications are early events in oral carcinogenesis . This study assessed the methylation status of a set of 18 gene promoters and long interspersed element 1 (LINE1) to detect early OSCC and OPML by bisulfite next-generation sequencing (NGS), starting from minimal invasive collection specimens obtained by oral brushing.
All clinical investigations were conducted according to the principles of the Declaration of Helsinki. The study was approved by the local ethics committee (study number 14092, protocol number 899/CE). All information regarding the human material used in this study was managed using anonymous numerical codes.
We enrolled 28 consecutive patients with a clinical and histological diagnosis of OSCC, one case of OSCC with sarcomatoid features, six consecutive patients with a clinical and histological diagnosis of HGSIL as defined by Gale et al. [13, 14], 35 related contralateral clinically normal mucosa, and 65 normal mucosa of healthy donors as reference controls. Clinical data including smoking status are summarized in Table Table1,1, and all these cases referred to the Department of Oral Sciences, University of Bologna, from January 2014 to December 2016. All patients presenting with a suspected oral neoplastic lesion that required diagnostic incisional biopsy also underwent oral brushing sampling. Lesions with an obvious etiology such as trauma and infective aphthous ulcerations were excluded. Oral brushing for histological diagnosis was performed as previously reported . Specimens were always selected before incisional biopsy and samples were enrolled in the population study only after histological confirmation of OSCC. Two different brushing specimens were collected in OSCC and HGSIL patients: one from the area with a lesion and the other from clinically normal distant mucosa (cheek opposite).
Histological examination was performed blindly by two pathologists (MPF, SA) at the Department of Biomedical and Neuromotor Sciences, “M. Malpighi” Section of Anatomic Pathology, at Bellaria Hospital, University of Bologna, Italy. A multihead microscope discussion was made on discordant cases to obtain a common diagnosis. Histological diagnoses were established following the WHO criteria [14, 16] and according to the 2014 Ljubljana classification . The oral brushing sample series included a group of 65 normal mucosa samples collected from healthy donors as normal controls. Only one oral brushing sample was collected in this group from similar areas with respect to the OSCC group. The flow chart in Fig. Fig.11 depicts the experimental design of the study.
We validate our results by an independent cohort of cases including 20 normal donors, one oral fibroma, 14 oral lichen planus, three proliferative verrucous leukoplakia, and two OSCC enrolled from January to May 2017.
DNA from exfoliating brush specimens was purified using The MasterPure™ Complete DNA Purification Kit (Epicentre, cod. MC85200). Bisulfite treatment of genomic DNA (200–500 ng) was carried out with the EZ DNA Methylation-Lightning™ Kit (Zymo Research, cod. D5031) according to the manufacturer’s protocol.
A set of 19 gene targets were selected because they were previously identified with altered methylation pattern in OSCC. A detailed list of reference for each gene is available in Table Table2.2. In particular, ZAP70, KIF1A, GP1BB, and MIR137HG were previously described to be differentially methylated in OSCC by our group  and others [5, 18, 19], while PAX1, LRRTM1, PARP15, FLI1, NTM, LINC00599, EPHX3, and ITGA4 were discovered by Guerrero-Preston et al. . Additionally, the remaining MIR193A, MIR296, TERT, TERC, H19, DNMT1A, and LINE1 were found to be epigenetically altered in OSCC by various authors [21–27] (see Table Table22 for details).
To identify the putative CpG island on the promoter region or in the first part of the coding region, the genomic sequence stored on the Ensembl genome browser (http://www.ensembl.org/index.html) including 1000 bp upstream of the ATG site were employed as query sequence. MethPrimer (http://www.urogene.org/cgi-bin/methprimer/methprimer.cgi)  designing was applied to identify CpGs and the best primers of choice. Regions of interest and mapping coordinates are listed in Table Table22.
The library was prepared in two steps: a first multiplex PCR amplification for target enrichment and a second round of amplification with a low number of cycles allowing the barcoding of the template-specific amplicons obtained from the first amplification step. Barcoding was performed using the Nextera™ index kit as previously described . Locus-specific bisulfite amplicon libraries were generated with tagged primers using Phusion U DNA polymerase (ThermoFisher, cod. F555L).
Amplification products were purified using SPRI-AMPure XT (Agencourt-Beckman Coulter, cod. A63881) quantified with Fluorometer Quantus™ (Promega, cod. E6150) and then employed as template (100 ng) for a second round of PCR (6 cycles). Sample-specific barcode sequences were added in this second PCR. The amplicon library was purified using Agencourt AMPure XP beads (Agencourt-Beckman Coulter, cod. A63881) and then quantitated with the Quantus™ Fluorometer (Promega, cod. E6150). Sequencing was conducted on the MiSEQ (Illumina, cod. 15027617) according to the manufacturer’s protocol. A set of three genes KIF1A, ZAP70, and GP1BB were initially evaluated in parallel by the protocol described by Morandi et al.  using pyrosequencing on GSJunior (Roche) to verify the quantification method in a set of 10 normal donors.
FASTQ files already trimmed for multiplex identifier were processed for quality control (>Q30) and for read lengths (>100 bp) and converted into FASTA format in a Galaxy Project environment . To evaluate the methylation ratio of each CpG, we loaded FASTA files into the bisulfite sequencing pattern analysis tool (BSPAT—http://cbc.case.edu/BSPAT/index.jsp) . In parallel, we used Perl to create single specific files for every interrogated gene, which were then visualized using BiQ Analyzer , QUMA , and BISMA  to confirm data from BSPAT analysis. Methylation plotter (http://gattaca.imppc.org:3838/methylation_plotter/)  was used to create Fig. Fig.44 and Additional file 1.
Multiclass linear discriminant analysis and receiver operating characteristic (ROC) curve analysis were calculated using IBM SPSS Statistics 21 (IBM) and the easyROC web tool (http://www.biosoft.hacettepe.edu.tr/easyROC/). PCA analysis and the methylation plot were created using ClustVis, a web tool for visualizing clustering multivariate data (http://biit.cs.ut.ee/clustvis/) .
Bisulfite NGS was used to examine the set of 18 genes listed in Table Table22 together with global methylation analysis evaluating long interspersed element 1 (LINE1), with a total of 355 CpGs, mostly located within the first exon.
Parallel evaluation using our new bisulfite NGS method on Illumina MiSEQ and the one described by Morandi et al.  using pyrosequencing on GSJunior (Roche) was performed in a set of 10 normal donors for KIF1A, GP1BB, and ZAP70 giving almost identical results (Spearman correlation of 0.969). A further index of reliability of the assay is the H19 mean value of 13 CpGs evaluated in normal donors, which was found very close to 50% as expected: 0.47 ± 0.14 (see Additional file 2).
An example of the methylation analysis using BISMA (http://services.ibc.uni-stuttgart.de/BDPC/BISMA/manual_unique.php)  as a web tool is shown in Fig. Fig.2,2, representing a comparison for KIF1A among case 1 (OSCC), its normal contralateral mucosa, case 31 (HG-SIL), and a healthy donor. Most OSCC cases had a homogeneous methylation pattern independently of the ratio, while all six HGSIL enrolled in this study showed an irregular methylation pattern among various CpGs as exemplified in Fig. Fig.22 for case 31.
After BSPAT data processing  (see “Methods” for the bioinformatic pipeline), a ROC analysis for each CpG of all 18 genes evaluating differences between OSCC and normal healthy donors is performed and summarized in Additional file 3 (only the three best CpGs were shown for simplicity). In Fig. Fig.3,3, the ROC of the most performant genes GP1BB and ZAP70 were displayed. DNMT1, TERC, and H19 were found to be unchanged both evaluating OSCC vs normal donors and vs normal contralateral mucosa.
Among the remaining 15 differentially methylated genes, 12 were hypermethylated (ZAP70, PAX1, KIF1A, LRRTM1, PARP15, FLI1, NTM, LINC0059, EPHX3, MIR137HG, ITGA4, MIR193), whereas three were hypomethylated (GP1BB, MIR296, TERT) in OSCC/HGSIL. Additional file 1 depicts the mean methylation level for each group of the interrogated genes visualized using the Methylation plotter tool  and the Kruskal-Wallis test for single CpG. Figure Figure44 shows the fluctuating methylation level among different CpGs evaluated in GP1BB and ZAP70. Mean methylation level and corresponding standard deviation of each group for all target genes is available in Additional file 2.
Considering only one gene at a time, the AUC varied depending on the various CpGs involved; ranges are summarized in Table Table33 for the best and worst AUC values.
The best CpGs identified by the ROC analysis were used to create an algorithm of choice to correctly discriminate OSCC and HGSIL, with the exception of MIR137HG, which was previously shown to be aberrantly methylated also in oral lichen planus (OLP) , and PAX1 which was used as an informative marker for a patent related to prognosis prediction in Head and Neck Squamous Cell Carcinoma (HNSCC) . A multiclass linear discriminant analysis (LDA) that weighted the contribution of each CpG was used to calculate the score.
ROC curve analysis applied to these scores discriminating OSCC and HGSIL from normal donors gave an area under the curve (AUC) of 0.981, identifying a threshold of 1.0615547 as the best value for sensitivity and specificity (97.1 and 100% respectively, see Fig. Fig.5).5). Twenty-eight out of 29 OSCC (96.6%) and six out of six HGSIL (100%) specimens exceeded the threshold value. The squamous cell carcinoma with sarcomatoid features was the only OSCC case found to be negative for the score with no sign of epigenetic changes in any of the genes evaluated. None of the 65 healthy donor specimens showed a positive result, and three out of 30 available contralateral normal mucosa specimens from OSCC patients exceeded the threshold value (10%). Five cases of contralateral mucosa did not yield enough DNA to perform the test, probably for inadequate repeat brushings.
Most of the investigated CpGs had a Kruskal-Wallis P value <0.05 (Fig. (Fig.4).4). Kruskal-Wallis test showed a significant between-group difference for scores generated with linear discriminant analysis (LDA) (T = 78.8587, P < 0.05) (Fig. (Fig.6).6). Multiple range test for LDA-generated scores did not show a statistical difference between the OSCC and HGSIL groups, whereas it identified a statistical difference between the OSCC group and healthy donors and contralateral mucosa. Furthermore, multiple range test showed a statistical difference between the HGSIL group and healthy donors and contralateral mucosa. Multiple range test identified a statistical difference between healthy donors and contralateral mucosa (see Table Table44 for details).
Among the 18 genes tested, DNMT1, MIR296, NTM, LRRTM1, and H19 were previously considered to be imprinted (see geneimprint database, http://www.geneimprint.com/site/genes-by-species). In this study, DNMT1, NTM, and LRRTM1 showed very low levels or absence of methylation in normal donors and in contralateral mucosa. We further analyzed DNA from whole blood DNA of a pool of healthy female donors (DNA female pool, Cod. G1521, Promega, Madison, WI, USA, data not shown) with absence of methylation as confirmation. On the contrary, H19 showed a classical imprinted status with 50% methylation both in normal mucosa and in different lesions. No evidence of an altered methylation pattern in OSCC cases was found with a mean methylation value of 0.56 ± 0.25 SD. A single CpG (Chr11: 2018169) revealed lower values (mean 0.30 ± 0.26 SD). The fluctuating behavior of the 12 CpGs in H19 is shown in Additional file 1.
For MIR296, full methylation was discovered (mean value of 0.98 ± 0.02) in normal cells. This gene was hypomethylated in all but four cases of OSCC considering the most informative CpG mapped at Chr20: 57392374 (mean value of 0.95 ± 0.05). A detailed pattern of methylation for each CpG of this gene is shown in Additional files 1 and 3.
Using the principal component analysis with the highest distribution of data (PC1) as the x-axis and the second highest principal component (PC2) as the y-axis, the data are distributed as evenly across the plot as possible while maintaining the distance between points as a proxy for how similar each point is to the other (Fig. (Fig.7).7). The graph shows that OSCC elements (violet) and HGSIL (red) are located in the left center of the plot, while normal donors (blue) are clustered in a well-defined and restricted area near the normal contralateral specimens (green) which span in a less defined manner.
Figure Figure88 shows a heatmap based on the best three CpGs of the 18 genes evaluated. Values in the matrix are color-coded, and rows (CpGs) and columns (specimens) are clustered using correlation distance and average linkage. The top of the heatmap shows an overview of two annotations, histology and smoking status. Two clusters are marked: left cluster: 65 normal donors, 24 contralateral mucosa, two OSCC, and the OSCC with sarcomatoid features; right cluster: 26 OSCC, six HGSIL, and six contralateral mucosa. Complete heatmap from all CpGs is available in Additional file 4.
LINE1 methylation analysis was evaluated to interrogate global methylation status of repetitive elements distributed genome-wide . As seen in Additional file 5, no differences among groups and no signs of hypomethylation were detected in OSCC.
Considering the validation dataset, all normal donors were detected under the threshold value, as well as one oral fibroma and 12 out of 14 OLP; on the contrary, the remaining two OLP, all PVL, and all OSCC were positives. PCA is available in Additional file 6 and heatmap in Additional file 7.
Screening populations for the early detection of asymptomatic carcinoma or precursor lesions is an attractive strategy to reduce the burden of OSCC. A major example of this approach is the cytological smear test in women for cervical cancer screening. However, low sensitivity and specificity have precluded the widescale adoption of microscopic cytology for the detection of oral cancer and OPML . Methods such as toluidine blue staining and autofluorescence imaging have been proposed to improve the clinical identification of high-risk OPML, but a recent meta-analysis confirmed there was no evidence to support the use of these adjunctive technologies as screening tools to reduce oral cancer mortality . Recent findings indicated that DNA methylation analysis of a specific set of genes may serve to detect early stage oral cancer lesions [22, 40], while the location of core regions and the density of methylation are required for gene silencing .
This study adopted a different approach from 450k methylation arrays to interrogate a set of genes already known to be altered in HNSCC [5, 15, 19, 20, 27]. For in-depth investigation of the true methylation level, we developed a novel accurate, sensitive, and specific assay to detect early OSCC and HGSIL from oral brushing specimens using bisulfite NGS. Quantitative DNA methylation analysis of the following 13 genes: ZAP70, ITGA4, KIF1A, PARP15, EPHX3, NTM, LRRTM1, FLI1, MIR193, LINC00599, MIR296, TERT, and GP1BB, clearly discriminated OSCC and HGSIL from healthy donors or from contralateral normal mucosa from the same patients. In particular, we found in OSCC and HGSIL hypermethylation of ZAP70, ITGA4, KIF1A, PARP15, EPHX3, NTM, LRRTM1, FLI1, MIR193, LINC00599, PAX1, and MIR137HG and hypomethylation of TERT, MIR296, and GP1BB at various CpGs. No epigenetic changes were found in DNMT1, H19, TERC, or any global hypomethylation of repetitive LINE1. Best performances were found for ZAP70, GP1BB, PAX1, LRRTM1, KIF1A, and TERT showing the best sensitivity and specificity in a ROC curve analysis (see the best CpGs for each gene in Figs. Figs.3,3, ,4,4, ,5,5, and and6).6). Despite its high discriminant potential, we excluded the new panel PAX1 described by Guerrero-Preston  because it was patented by the same group, and MIR137HG which yielded ambiguous results (see Additional files 1 and 3) and was previously found to be differentially methylated in OLP [15, 42].
The stratification among distinct groups (OSCC vs healthy normal individuals, OSCC vs normal contralateral mucosa) was based not simply on evaluation of the methylated/unmethylated allele but also on quantitative identification of the correct threshold among various CpGs of the different genes considered. Combining the methylation level of 13 genes, the discrimination power of this new assay reached an optimal level of accuracy (AUC 0.981). The only OSCC case negative for the score was the OSCC with sarcomatoid features. No aberrant methylation pattern was found in any of the genes investigated, signifying that this rare variant of oral carcinoma does not share the same epigenetic status as the other OSCC cases, as shown in Fig. Fig.88 in cluster analysis. This result may be considered for clinical management, and further investigation into genomic alterations is required.
Linear discriminant analysis allowed us to generate a score that weighted the best CpGs from the most informative genes investigated in the study. A single gene discriminating method or a single CpG may be inadequate for our purpose. Considering only one gene at a time such as GP1BB, the AUC varied between 0.968 and 0.876 depending on the CpGs, while for TERT, it varied from 0.924 to 0.667, implying the central role of CpG location (see Table Table33 for the best and worst AUC values).
By using an independent cohort of samples, we further validate our algorithm confirming a clear negative value for all normal donors enrolled, together with one oral fibroma and 12 OLP. On the contrary, values over the threshold were detected in the two OSCC, in all three PVL, and in two OLP. Positivity for all PVL may be related to its nature of a progressive, multifocal, exophytic form of leukoplakia with high rates of malignant transformation. For two OLP positives (14.3%), the World Health Organization (WHO) has considered OLP as a premalignant lesion. The range of malignant transformation varies between 0 and 10% , which is consistent with our findings.
Basically, the calculated thresholds of most interrogated CpGs were proximal to 0 in the following genes: FLI1, ITGA4, KIF1A, MIR193, NTM, LINC00599, PAX1, and MIR137. Overall, the aberrant methylation pattern of these OSCC-related markers was estimated to be 0.33 ± 0.3 SD, simply overcoming the basal unmethylated condition seen in normal cells from both the contralateral mucosa of the same patients and the oral mucosa of healthy donors. By contrast, EPHX3, LRRTM1, and PARP15 showed higher values both for threshold and mean values for the OSCC group (0.61 ± 0.31 SD). This implies that the basal level of methylated CpG in these genes in normal tissues is not close to 0, and a small fraction of epialleles are methylated. Moreover, methylation levels in OSCC exceeded 50%. The behavior of a set of five genes including ZAP70, GP1BB, H19, EPHX3, and MIR193 fluctuated among the various CpGs as shown in Additional file 1. Interestingly, the gap between normal and OSCC remained the same (Kruskal-Wallis P values were mostly <0.05) but the absolute values changed conspicuously among different positions investigated. This implies that each CpG must be considered apart, and a specific threshold calculated. Recently, van Vlodrop et al.  emphasized the emerging evidence on the importance of the location of CpG methylation in relation to gene expression and associations with clinicopathologic characteristics in cancer. Crucial CpG islands or single CpGs for regulating gene expression not only are often situated around the transcriptional start site but can also be observed more upstream or downstream.
We used bisulfite NGS methods because of their advantages: (i) many CpG sites, so that complex methylation patterns of individual DNA molecules can be determined; (ii) the longer reads can be aligned to the reference sequence more easily and accurately, especially in repetitive regions of the genome; and (iii) the long reads are more likely to cover more genotype information like single nucleotide polymorphisms (SNPs) in the neighborhood of cytosines for correlation analysis between DNA methylation and genotype. These features identified not simply different methylation levels among groups but for instance a different pattern of methylation between OSCC and HGSIL. Usually, OSCC revealed a homogeneous pattern with most epialleles methylated in cis, whereas HGSIL disclosed an irregular pattern of CpG methylation as shown in Fig. Fig.22 for case 31. This behavior was shared among all five HGSIL for all the genes involved and is probably due to partial epigenetic modifications which reach a complete pattern only in full-blown OSCC. This behavior is visible only using bisulfite NGS and not methylation-sensitive PCR or qPCR, or Infinium™. A previous report by Bock et al.  compared the most promising assays for measuring DNA methylation in large cohorts, clinical diagnostics, and biomarker development, including amplicon bisulfite sequencing, enrichment bisulfite sequencing, Infinium™, EpiTyper®, and Pyroseq™. Best performances were obtained using amplicon bisulfite sequencing with high accuracy and robustness. In addition, this approach guarantees high throughput and a variable number of targets to interrogate, depending on the assay design .
We did not identify LINE1 global hypomethylation in OSCC with respect to the normal donors and normal contralateral mucosa as previously described  using pyrosequencing technique. With respect to the pyrosequencing approach which interrogates only four CpGs, as indicated by Ogino et al. , we evaluated more CpGs (21), with high number of reads per single specimen (hundreds). However, probably, we select a CpG-rich region which may not be involved in changes in methylation.
From a bioinformatic point of view, bisulfite NGS analysis is complex and the pipeline requires many steps. Commonly, NGS runs produce FASTQ files as output already trimmed for multiplex identifier or MID/IonXpress/Index (a short barcode sequence used to label samples/patients when multiplexing) to recognize loaded specimens. Firstly, these FASTQ files are processed for quality control (>Q30 for Illumina) and read lengths (>100 bp) to discard primer dimers, then FASTQ files are converted to FASTA. With the advent of cloud computing and the availability of NGS webtools such as Galaxy Project , these steps have minimal PC requirements and are user friendly. Using perl to recognize specific sequence motifs, several FASTA files were created for each gene for each patient and then loaded onto a visualization tool such as BiQ Analyzer , MethVisual , QUMA , or BISMA . However, these tools do not scale up with massively parallel sequencing having been designed for Sanger sequencing. Newer tools such as Bismark  and BS-Seeker  have been utilized more efficiently and can handle larger datasets generated by NGS technologies. In addition, a web application service named bisulfite sequencing pattern analysis tool (BSPAT) uses Bismark’s read alignments and methylation calling functionalities to provide further quality control, co-occurrence pattern analysis, simple allele-specific methylation analysis, visualization, and integration with other databases and tools . BSPAT gives the correct methylation ratio of each epiallele investigated to be included in subsequent statistical analysis. We therefore used this tool to analyze our data as the best option, maintaining BISMA as a supporting tool for confirmation.
NTM, LRRTM1, MIR296, H19, and DNMT1 are known to be imprinted genes and are included in the imprinting gene database (http://www.geneimprint.com). However, no normal healthy donors, contralateral normal mucosa or DNA from whole blood of a pool of healthy female donors was either hemimethylated or fully methylated in DNMT1, LRRTM1, and NTM in our series. By contrast, H19 showed common signs of imprinting with half of the epialleles methylated in normal donors and contralateral normal mucosa. We found no evidence for an altered methylation pattern in OSCC cases with a mean methylation value of 0.56 ± 0.25 SD. For MIR296, another imprinted gene, a mean value of 0.95 ± 0.05 was discovered in normal donors, while it was found to be hypomethylated in all but two cases of OSCC (mean 0.91 ± 0.07, see Additional file 2). This is in agreement with a full methylation of both epialleles, while in lesions, we found a partial demethylation of only nine out of 15 CpGs tested (Additional file 1). Aberrant expression of MIR296 was previously related to gastric , bladder , and lung cancer . The same behavior was found for TERT which was fully methylated in both epialleles in normal samples and partially hypomethylated in OSCC/HGSIL. Hypomethylation of this gene was previously reported in glioblastoma [55, 56]. Five out of six CpGs of the TERT gene were informative, even if the discriminating gap among different classes was found to be very slight, but calculating the ROC curve, the AUC obtained was highly consistent (0.921, see Additional file 3 for ROC between OSCC and normal donors).
Among the other 11 genes included in our algorithm, Marsit et al.  and our group previously identified three (GP1BB, ZAP70, KIF1A) as possible markers . The present study provided evidence for a complete demonstration of their real value with the best AUC in the ROC analysis.
GP1BB encodes heterodimeric transmembrane protein that constitutes the receptor for von Willebrand factor and mediates platelet adhesion in the arterial circulation. Mutations of GP1BB are associated with Bernard-Soulier syndrome, an extremely rare inherited bleeding disorder . ZAP70 gene encodes the ζ-chain associated protein kinase 70 kDa, which is a tyrosine kinase normally expressed by natural killer cells and T cells. Hypermethylation of ZAP70 gene predicted an unfavorable disease course in terms of disease progression and overall survival in chronic lymphocytic leukemia . KIF1A (kinesin family member 1A) encodes a protein that is microtubule-dependent molecular motor involved in important intracellular functions such as organelle transport and cell division .
The other eight markers identified here included PAX1 and PARP15 already described by Guerrero-Preston et al.  as related to oral cancer, FLI1 involved in Ewing sarcoma , rectal cancer , and gastric cancer , NTM in prostate , EPHX3 in salivary gland adenoid cystic carcinoma , ITGA4 in colorectal cancer  ,and MIR193 in gastric cancer .
A second interesting finding of the present study was the behavior of normal distant mucosa from OSCC patients. Kruskal-Wallis test and multiple range test showed that the mean methylation value obtained in the group of normal distant mucosa of OSCC patients was significantly (P < 0.01) lower than the methylation value of the OSCC area but, at the same time, significantly higher (P < 0.01) than the methylation value of normal mucosa from healthy donors. Furthermore, three out of 30 normal distant mucosa (10%) exceeded the threshold, and this behavior in our opinion may not be due to a possible contamination of few cancer cells during the collection of normal mucosa, since our algorithm is based on quantitative methylation analysis and not simply on methylated/unmethylated status.
Genetic and epigenetic changes are also detected in histopathologically clean resection fields and could cause local relapse in mucosa primarily free of cancer cells. This has been explained by Slaughter’s model of “field cancerization” , whereas Braakhuis et al.  proposed a “patch-field” model in which a stem cell acquires genetic and epigenetic alterations in the initial phase, forming a clonal unit of altered daughter cells called a “patch.” A patch will transform into an expanding field acquiring additional genetic alterations, and by virtue of its growth advantage, a proliferating field gradually displaces the normal mucosa. Assuming that mucosa is predisposed to carcinogenesis due to exposure to exogenous genotoxins, an important clinical implication is that fields often remain after surgery of the primary tumor and may lead to new cancers clinicians currently refer to as “a second primary tumor” or “local recurrence,” depending on the exact site and time interval. The areas at highest risk for development of a second squamous cell carcinoma are large, sometimes extending to the lung . Genetically altered cells may escape macroscopic or histopathological examination and may require sophisticated biomolecular approaches. An altered pattern of gene methylation in morphologically normal mucosa in OSCC patients may indicate field cancerization, so that these patients could have a higher risk of developing a second primary tumor or local recurrence. Further study on the methylation pattern in surgically resected patients may expand the potential of our new assay even for prognostic applications.
In conclusion, the present study confirmed the role of epigenetic alterations and aberrant methylation DNA status in HGSIL/OSCC and also revealed an altered methylation pattern in normal mucosa distant from the OSCC area. Early diagnosis of OSCC may be important for clinical management, particularly in high-risk populations, and our novel assay based on quantitative bisulfite NGS analysis could be a highly sensitive and specific method to detect early OSCC starting from non-invasive, easy-to-perform brush sampling. Further studies with a larger cohort including more HGSIL, low-grade SIL, and OLP with 5 years of follow-up are needed to elucidate the intrinsic prognostic potential of our assay.
Methylation profile plot from 18 genes evaluated. For each group of samples, each line represents the methylation mean for each position. Asterisks indicate a statistical significance as calculated by the Kruskal-Wallis test. ZAP70, GP1BB, H19, EPHX3, and MIR193 revealed a fluctuating behavior among the various CpGs evaluated. The gap between normal and OSCC remained mostly the same (Kruskal-Wallis P values were <0.05), but the absolute values changed conspicuously among different positions investigated. (PDF 415 kb)
Summary tables of each gene targets with the mean, the standard deviation, the minimum, the maximum, and the number of missing data for each position and group of samples. This test is the non-parametric version of the ANOVA (one-way analysis of variance) and tests whether samples originate from the same distribution. If the test is statistically significant (P value less than 0.05), it means that at least one of the samples is different from the other samples. (XLSX 122 kb)
ROC analysis discriminating OSCC vs normal healthy donors using easyROC as a webtool, showing the three best performing CpGs from each gene of 18 evaluated. Comparing OSCC vs normal healthy donors in 355 CpGs, the following epigenetically altered genes revealed high discrimination power: ZAP70, ITGA4, KIF1A, PARP15, EPHX3, NTM, LRRTM1, FLI1, MIR193, LINC00599, PAX1, and MIR137HG showing hypermethylation and MIR296, TERT, and GP1BB showing hypomethylation. (PDF 255 kb)
Heatmap from 325 CpG methylation data points (rows) and 130 samples (column). Annotation labels refer to histology and smoking status. Rows are centered; unit variance scaling is applied to rows. Both rows and columns are clustered using correlation distance and average linkage. Smoking status and histology are highlighted in color. Three clusters are marked: left cluster: 55 normal donors, 21 contralateral mucosa, one OSCC, and the OSCC with sarcomatoid features; center cluster: three HGSIL, 11 OSCC, five contralateral mucosa, and one normal donor; right cluster: 16 OSCC, three HGSIL, four contralateral mucosa, and nine normal donors. (PDF 376 kb)
LINE1 mean methylation levels among OSCC, HGSIL, normal healthy donors, and contralateral normal mucosa. Asterisks indicate a statistical significance as calculated by the Kruskal-Wallis test. (PDF 22 kb)
PCA for validation dataset: Unit variance scaling is applied to rows; SVD with imputation is used to calculate principal components. X and Y axes show principal component 1 and principal component 2 that explain 53.9 and 9.8% of the total variance, respectively. Prediction ellipses are such that with probability 0.95, a new observation from the same group will fall inside the ellipse. N = 40 data points. (PDF 29 kb)
Heatmap for validation dataset. Rows are centered; unit variance scaling is applied to rows. Imputation is used for missing value estimation. Both rows and columns are clustered using correlation distance and average linkage; 54 rows, 40 columns. (PDF 45 kb)
We are grateful to Leonardo Caporali for his excellent technical assistance
This study was supported by an academic grant (FARB FFBO124539) from the University of Bologna.
LMor, DG, and AT conceived and designed the study. LMor, MPF, DG, AT, AG, CM, and LMont carried out the provision of the study materials or patients. MPF and SA participated in the diagnosis of the study. LMor, DG, and AT contributed to the collection and assembly of the data. LMor, DG, AT, SA, MPF, and LMont analyzed and interpreted the data. LMor, DG, AT, SA, and MPF wrote the manuscript. All authors read and approved the final manuscript.
All subjects signed an informed consent form for the use of their biological material for research purposes, approved by the local bioethics committee (AUSL, study number 14092, protocol number 899/CE).
As a possible conflict of interest, Luca Morandi, Davide Gissi, and Achille Tarsitano submitted a patent (the applicant is the University of Bologna) in November 2016 to the National Institute of Industrial Property; however, we believe that this is a natural step of translational research (bench-to-bedside) and guarantee that the scientific results are true. The remaining authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
The online version of this article (doi:10.1186/s13148-017-0386-7) contains supplementary material, which is available to authorized users.
Luca Morandi, Email: email@example.com.
Davide Gissi, Email: firstname.lastname@example.org.
Achille Tarsitano, Email: email@example.com.
Sofia Asioli, Email: firstname.lastname@example.org.
Andrea Gabusi, Email: email@example.com.
Claudio Marchetti, Email: firstname.lastname@example.org.
Lucio Montebugnoli, Email: email@example.com.
Maria Pia Foschini, Email: firstname.lastname@example.org.