In this report, we demonstrate that a gene expression profile can significantly improve the prediction of OSCC development over clinical and histological variables in OPL patients. Multiple prediction models were developed and compared using the CoxBoost algorithm. We observed a marked improvement in prediction accuracy when a gene expression profile was used. With the gene expression profile only, we developed a 29-transcript prediction model that had a prediction error rate of around 8%. Using the profile in combination with the previously known risk factors, the model showed a prediction error rate similar to that of the expression profile alone. Because the previously known risk factors alone performed clearly worse than Models 1 and 2, the expression profiles have predictive value beyond the known risk factors. As an alternative way to assess the misclassification rate of genomic predictors in general, we employed a simpler approach, which used the DLDA algorithm to develop prediction models and the standard 10-fold cross-validation scheme to evaluate the models. We obtained a 16% misclassification rate, which is highly statistically significant (P < 1.0E-16 against the null hypothesis) when compared with the other risk factors alone. These results suggest that the gene expression profile may robustly predict oral cancer development in patients with OPL.
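To make the alternative evaluation scheme concrete, the following is a minimal Python sketch of diagonal linear discriminant analysis (DLDA) assessed by 10-fold cross-validation. It is an illustration under stated assumptions, not the script used in this study: the function names, the toy data, the equal class priors, and the fold assignment are all hypothetical. It also contains no feature selection step; in a real analysis, any transcript selection would need to be repeated inside each training fold to avoid an optimistic error estimate.

```python
import random

def fit_dlda(X, y):
    """Estimate per-class feature means and a pooled per-feature variance."""
    classes = sorted(set(y))
    n_features = len(X[0])
    means = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
    dof = len(X) - len(classes)  # degrees of freedom for the pooled variance
    variances = []
    for j in range(n_features):
        ss = sum((x[j] - means[label][j]) ** 2 for x, label in zip(X, y))
        variances.append(ss / dof + 1e-8)  # small constant guards against zero variance
    return means, variances

def predict_dlda(model, x):
    """Assign x to the class with the smallest variance-scaled squared distance."""
    means, variances = model
    def distance(c):
        return sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, means[c], variances))
    return min(means, key=distance)

def cv_misclassification_rate(X, y, folds=10, seed=0):
    """Fraction of samples misclassified by models that never saw them in training."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    errors = 0
    for f in range(folds):
        held_out = order[f::folds]
        held_out_set = set(held_out)
        train = [i for i in order if i not in held_out_set]
        model = fit_dlda([X[i] for i in train], [y[i] for i in train])
        errors += sum(predict_dlda(model, X[i]) != y[i] for i in held_out)
    return errors / len(X)

# Hypothetical toy data: two well-separated classes with two features each.
X = [[0.1 * i, 1 - 0.1 * i] for i in range(10)] + \
    [[5 + 0.1 * i, 6 - 0.1 * i] for i in range(10)]
y = [0] * 10 + [1] * 10
rate = cv_misclassification_rate(X, y)
```

DLDA ignores correlations between features, which keeps the number of estimated parameters small when transcripts far outnumber patients.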
Because no prospective cohort is currently available to validate our findings, we acknowledge that our study represents only the first step in the development of a biomarker that could be used in clinical practice. However, we consider it a proof of principle that a gene expression signature developed in patients with preneoplasia may improve our prediction of cancer development over clinical and pathological factors. It also provides a list of transcripts that could facilitate future efforts to better understand the disease and intervene in its progression. To move this work toward a clinical tool, the next steps will be (1) to refine the signature and adapt it to a CLIA-certifiable platform, and (2) to identify an independent cohort of patients for the validation of our models. This work is ongoing and clearly beyond the scope of this study.
It is important to note that we used tissue samples collected prior to cancer development in this study, in contrast to most gene expression-based studies, in which cancer samples were used. A number of studies have shown the value of gene expression profiles in cancer prognosis. For example, Shedden et al. performed a multi-site, blinded validation study to assess several prognostic models based on gene expression profiles of 442 lung adenocarcinomas (10). Several of the models evaluated produced risk scores that correlated significantly with outcome, and the models worked better when combined with clinical data. However, cancer prognosis remains a difficult problem because tumors are heterogeneous and evolve over time. Samples collected from a particular site at a particular time may not provide adequate information to predict the behavior of the cancer. In comparison, the samples used in this study may be less complex because they were at an early stage of the tumorigenic process.
Gene expression profiles were obtained from the whole biopsy. Because we did not use microdissection to isolate the epithelial cells from the underlying stroma, we could not distinguish the respective contributions of these two compartments. Therefore, the genes we identified may include genes expressed by both epithelial and stromal cells. Our objective in this work was to improve prediction accuracy over clinical and histological markers, and we believe that capturing information from both compartments may be important to achieve this goal (32).
The samples used in this study were collected at baseline or at 3 months after inclusion. The conclusion of the trial was that the drugs used, even if they induced some clinical responses, were ineffective in preventing oral cancer development (7). Therefore, an influence of the trial drugs on gene expression is likely but peripheral in the context of our study, whose objective was to identify genes associated with oral cancer development. Similarly, we did not consider other factors, such as gender and ethnicity, which may influence gene expression but were not associated with oral cancer development.
Our set of patients was enriched in patients who developed oral cancer and in never smokers compared to the remaining patients not included in the trial. Tobacco has been established as a significant risk factor in the development of oral leukoplakia and oral cancer. However, the population with leukoplakia is heterogeneous, and although never smokers and women often represent a small proportion of patients with oral leukoplakia, their risk of oral cancer development has been reported to be higher than that of smokers. With a mean follow-up of 7.2 years, Silverman et al. reported a transformation rate of oral leukoplakia of 24% in never smokers versus 16% and 12% in current and former smokers, respectively (2). Einhorn et al. and Roed-Petersen et al. reported an eightfold and a fivefold risk, respectively, for never smokers with oral leukoplakia (33). Because the incidence of human papilloma virus infection in oral cancer is low, as opposed to oropharyngeal cancer (34), further studies are needed to better understand the development of oral cancer in never smokers.
Developing prognostic models from microarray gene expression profile data is a well-recognized challenge. Subramanian and Simon identified a number of statistical issues in the design and evaluation of prognostic models in recent studies, which cast some doubt on the readiness of those models for practical clinical use (29). To ensure that our results are reproducible, we documented the script used in our analysis in detail (Supplementary Material 2). The CoxBoost algorithm fits a Cox proportional hazards model by component-wise likelihood-based boosting. It is especially suited for models with a large number of predictors and allows for mandatory covariates. Binder et al. demonstrated the utility of the method using both simulated data and real microarray data from patients with bladder cancer (16). They showed that microarray features selected by the CoxBoost approach can improve prediction performance over a purely clinical model. The algorithm has also recently been used along with three other popular methods to compare gene-based versus pathway-based procedures for the identification of prediction models (36). Thus, we considered CoxBoost an appropriate tool to identify biomarkers beyond clinical variables from microarray gene expression profiling data. The consistency between the new CoxBoost approach and the more common Coxph model was also reassuring.
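The idea of component-wise likelihood-based boosting for a Cox model can be illustrated with a short Python sketch. This is a simplified illustration, not the CoxBoost implementation and not the script in Supplementary Material 2. It assumes no tied event times, omits mandatory (unpenalized) covariates, picks the covariate to update by a penalized score statistic (one of several possible selection criteria), and uses an arbitrary penalty and a fixed number of boosting steps, whereas in practice the number of steps would be tuned, for example by cross-validation. All function names and the toy data are hypothetical.

```python
import math

def score_and_info(times, events, eta, x):
    """Score and information of the Cox partial log-likelihood for one
    covariate column x, evaluated at the current linear predictor eta."""
    score = 0.0
    info = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored observations contribute only through risk sets
        risk = [k for k, t_k in enumerate(times) if t_k >= t_i]
        w = [math.exp(eta[k]) for k in risk]
        s0 = sum(w)
        s1 = sum(wk * x[k] for wk, k in zip(w, risk))
        s2 = sum(wk * x[k] ** 2 for wk, k in zip(w, risk))
        mean = s1 / s0
        score += x[i] - mean
        info += s2 / s0 - mean ** 2
    return score, info

def componentwise_boost(times, events, columns, steps=5, penalty=5.0):
    """Update one coefficient per step: the one with the largest
    penalized score statistic receives a penalized Newton step."""
    beta = [0.0] * len(columns)
    eta = [0.0] * len(times)
    for _ in range(steps):
        best = None
        for j, x in enumerate(columns):
            u, info = score_and_info(times, events, eta, x)
            stat = u * u / (info + penalty)
            if best is None or stat > best[0]:
                best = (stat, j, u / (info + penalty))
        _, j, step = best
        beta[j] += step
        eta = [e + step * v for e, v in zip(eta, columns[j])]
    return beta

# Hypothetical toy data: 8 patients, all with an observed event.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
risk_marker = [1, 1, 1, 0, 1, 0, 0, 0]   # mostly present in early events
noise_marker = [0, 1, 0, 1, 0, 1, 0, 1]  # unrelated to event order
beta = componentwise_boost(times, events, [risk_marker, noise_marker])
```

Because only one coefficient changes per step and the boosting is stopped early, most coefficients stay at exactly zero, which is how this family of methods yields a sparse list of transcripts from many candidates.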
Microarray gene expression profiling has become a mature and widely used high-throughput technology. Even though RTQ-PCR is typically used to validate the findings of microarray studies, we did not think that cherry-picking some of the transcripts included in Models 1 and 2 would be effective or adequate. Instead, we used 8 different datasets generated on different microarray platforms to test whether the oral cancer index, which summarizes the information from a comprehensive list of transcripts associated with oral cancer development, can differentiate cancer from normal cells. Because we found a significant association between the oral cancer index and cancer status, our confidence in our results is greatly enhanced. Furthermore, this list of transcripts may point to key biological factors associated with oral cancer development.
In a recent study, Bhutani et al. demonstrated that oral epithelium could serve as a surrogate tissue for assessing smoking-induced molecular alterations in the lungs (37). They studied promoter methylation of the p16 and FHIT genes in oral and bronchial brush specimens from smokers enrolled in a randomized placebo-controlled chemoprevention trial and showed that bronchial methylation was correlated with oral tissue methylation. These results suggest that oral tissues may serve as a molecular mirror of lung carcinogenesis (38). Separately, Spira et al. studied gene expression profiles of normal bronchial samples from smokers (27). The authors developed a multi-gene index that can distinguish smokers with lung cancer from smokers without lung cancer with high sensitivity and specificity. They proposed that this index may also predict lung cancer risk in smokers. Because our study also predicts cancer risk, we expected, and indeed found, that the risk index calculated from our list of significant transcripts correlated with Spira et al.'s lung cancer risk index.
Because many of the significant transcripts have been shown to be altered in cancers, our results suggest that gene expression profiles may evolve progressively towards cancer before the cells become cancerous. Consistent with this, functional pathway analysis of the significant genes showed a significant upregulation of several gene sets associated with the proteasome machinery. Protein synthesis and degradation are tightly regulated processes that are essential for normal cellular homeostasis (39). Many proteasome target proteins are involved in important processes of carcinogenesis and cancer survival, such as TP53 and CDKN1B (p27). Downregulation of these genes was also significantly associated with the development of oral cancer in our study (Supplementary Material 8).
Consistent with our previous results using deltaNp63 protein expression, tumor protein p63 (TP63) mRNA expression was also associated with a high risk of developing OSCC (hazard ratio (HR): 4.4, Wald test P = 3.6E-4) (7). Four of the five small integrin-binding ligand N-linked glycoproteins (SIBLINGs), which are cell adhesion modulators, were also among the transcripts most significantly associated with oral cancer development: dentin sialophosphoprotein (DSPP), dentin matrix protein 1 (DMP1), secreted phosphoprotein 1 (SPP1), and integrin-binding sialoprotein (IBSP). The genes encoding the SIBLINGs are located within a cluster on chromosome 4. They deserve further study to define their functional role in oral cancer development (40) (Supplementary Material 8).
Our study may provide valuable information for designing cancer prevention strategies. One may consider using proteasome inhibitors (e.g., bortezomib) for oral cancer prevention. The limited activity of bortezomib in HNSCC and other solid tumors, as a single agent or in combination with standard therapy (41), may be related to an upregulation of both pro-apoptotic and anti-apoptotic proteins. Recent studies have shown that combining bortezomib with cetuximab (an EGFR-directed antibody) or STAT3 inhibitors might enhance its efficacy (42). However, the toxicity of bortezomib and its intravenous mode of administration preclude its evaluation in the chemoprevention setting (41). Less toxic and orally active proteasome inhibitors are under development (44). Several natural compounds with proteasome-inhibitory effects have also been investigated in chemoprevention (41). Green tea consumption has produced promising effects against the development of prostate cancer without inducing major toxicities (45). Based on the results of our study, those compounds deserve further evaluation in preclinical models of oral carcinogenesis. Tsao et al. recently reported the results of a Phase II randomized, placebo-controlled trial of green tea extract (GTE) in patients with high-risk oral premalignant lesions. The OPL clinical response rate was higher in the combined GTE arms (n = 28; 50%) than in the placebo arm (n = 11; 18.2%), but the difference did not reach statistical significance (P = 0.09). However, response rates increased with dose [58.8% (750 and 1,000 mg/m²), 36.4% (500 mg/m²), and 18.2% (placebo); P = 0.03], suggesting a dose-response effect (46).
The DNMT3B transcript, which is one of the most significant risk factors in our list (HR: 7.7, Wald test P = 4.3E-6) and part of Model 2, may deserve particular attention for its role in epigenetic tumorigenesis. It is possible that epigenetic tumorigenesis mediated by DNMT3B is an early event in oral tumorigenesis. The role of DNMT3B in tumorigenesis has recently been highlighted in various cancers (47). Variant forms of DNMT3B transcripts have been described to play a major role in non-small-cell lung cancer and may deserve further study in HNSCC (49). Some DNMT3B polymorphisms have been associated with HNSCC risk in non-Hispanic whites (50). A recent study of the combination of a DNA demethylating drug and all-trans retinoic acid showed a reduction of oral cavity cancer induced by the carcinogen 4-nitroquinoline 1-oxide in a mouse model (51). We compared DNMT3B expression levels in 3 publicly available datasets and found that DNMT3B was overexpressed in HNSCC versus normal mucosa, consistent with a role of DNMT3B overexpression in head and neck tumorigenesis (details not shown). One possible mechanism of regulation of DNMT3B expression involves noncoding RNAs. The microRNA-29 family has been demonstrated to revert aberrant methylation in lung cancer by targeting DNMT3A. Our microarray platform also measured the precursor forms of miRNAs. Consistent with this hypothesis, hsa-miR-29b-1 was the most protective marker in our univariate Cox model analysis (HR: 0.0008, Wald test P = 0.0002). A significant negative correlation was observed between hsa-miR-29b-1 and DNMT3B expression (R = -0.38, P = 0.0002).
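For readers who want to see how a correlation of this kind is computed, the following Python sketch calculates a Pearson correlation coefficient and the associated t statistic for the null hypothesis of zero correlation. The text does not state which correlation coefficient or test was used, so the choice of Pearson's r is an assumption, and the expression values below are made-up numbers for illustration only, not data from this study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t_statistic(r, n):
    """t statistic with n - 2 degrees of freedom for testing zero correlation."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical expression values for a microRNA and a candidate target transcript.
mirna = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [5.0, 3.0, 4.0, 2.0, 1.0]
r = pearson_r(mirna, target)
t = correlation_t_statistic(r, len(mirna))
```

A negative r means that samples with higher microRNA expression tend to have lower target expression. The P value then follows from comparing t with a t distribution with n - 2 degrees of freedom.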
Our results showed that hsa-miR-101-1 was another microRNA associated with a low risk of developing oral cancer. Hsa-miR-101 expression was also reported to be reduced in early-stage neoplastic transformation in the lungs of F344 rats chronically treated with the tobacco carcinogen 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (53). In these studies, it has been associated with the upregulation of cyclooxygenase-2 (COX2) and enhancer of zeste homolog 2 (EZH2), a mammalian histone methyltransferase that contributes to the epigenetic silencing of target genes and regulates the survival and metastasis of cancer cells (54). However, in our study, COX2 gene expression was not significantly associated with OSCC development. Other genes might be regulated by this microRNA.
MicroRNA-based strategies might therefore be considered in future chemoprevention studies, especially for OPLs, which are easily accessible and frequently involve only one or a few lesions.
In summary, we have demonstrated the value of gene expression profiles in predicting oral cancer development in OPL patients, beyond previously reported clinical and pathological biomarkers. If validated in future studies, the profiles may serve as biomarkers to classify OPLs for oral cancer risk in routine clinical practice. Interestingly, certain transcripts in the profiles may be important in oral tumorigenesis and should be considered as potential targets for oral cancer prevention.