This study aims to identify promising biomarkers for the early detection of lung cancer and evaluate the prognosis of lung cancer patients. Genome-wide mRNA expression data obtained from the Gene Expression Omnibus (GSE19188, GSE18842 and GSE40791), including 231 primary tumor samples and 210 normal samples, were used to discover differentially expressed genes (DEGs). NEK2, DLGAP5 and ECT2 were found to be highly expressed in tumor samples. These results were experimentally confirmed by quantitative reverse transcription-polymerase chain reaction (qRT-PCR). The elevated expression of the three candidate genes was also validated using the Cancer Genome Atlas (TCGA) datasets, which consist of 349 tumor and 58 normal tissues. Furthermore, we performed receiver operating characteristics (ROC) analysis to assess the diagnostic value of these lung cancer biomarkers, and the results suggested that NEK2, DLGAP5 and ECT2 expression levels could robustly distinguish lung cancer patients from normal subjects. Finally, Kaplan-Meier analysis revealed that elevated NEK2, DLGAP5 and ECT2 expression was negatively correlated with both overall survival (OS) and relapse-free survival (RFS). Taken together, these findings indicate that these three genes might be used as promising biomarkers for the early detection of lung cancer, as well as predicting the prognosis of lung cancer patients.
Lung cancer is one of the leading causes of cancer-related death in the world1. Non-small cell lung cancer and small cell lung cancer are two major pathological types of lung cancer. Unfortunately, many patients are diagnosed with advanced lung cancer due to the asymptomatic nature of the early stages and a lack of effective screening modalities, resulting in a very low 5-year survival rate. Despite the development of multimodal treatment strategies in past decades, including surgical resection, chemotherapy, and radiation therapy, the outcomes of lung cancer patients remain unsatisfactory2. Therefore, novel biomarkers for diagnosis, prognosis, and drug response are urgently needed.
Gene expression profiles have been shown to provide diagnostic or prognostic information in a variety of cancers3–6. Yang et al.7 demonstrated that MARCKS contributed to constitutive CAF activation in ovarian cancer, and that MARCKS overexpression defined a poor prognosis in ovarian cancer patients. Sun et al.8 investigated the prognostic potential of lncRNAs in diffuse large-B-cell lymphoma (DLBCL), and identified a six-lncRNA signature as a potential composite biomarker for risk stratification of DLBCL patients at diagnosis. However, efforts to translate gene expression-based analytical methods into clinical application have met several obstacles, including a lack of independent validation or inclusion of clinical variables, as well as overall tumor heterogeneity9. To overcome these hurdles, our investigation utilized a large number of patients from multiple studies with diverse patient populations.
In the present study, we identified differentially expressed genes that were common among several expression profiles. We selected the target genes from among the 100 differentially expressed genes based on their biology. According to the literature, NIMA-related kinase 2 (NEK2), disc large (Drosophila) homolog-associated protein 5 (DLGAP5) and epithelial cell transforming 2 (ECT2) are three specific mitosis-associated genes. In this study, CCNB1, CCNB2, CDKN2A, BUB1, BUB1B and TTK were also found to be involved in the cell cycle. Deregulated expression of mitosis-related factors, which drive chromosomal segregation during cell division, is frequently observed in cancer. The results of high-throughput screening were confirmed by qRT-PCR and further validated in the TCGA datasets. The expression levels of NEK2, DLGAP5 and ECT2 were significantly higher in lung cancer patients than in normal subjects. In addition, we explored and discussed the diagnostic and prognostic value of the three genes in lung cancer. ROC analyses showed that NEK2, DLGAP5 and ECT2 levels could also robustly distinguish lung cancer patients from normal subjects, demonstrating high AUC, specificity and sensitivity values. Elevated expression of NEK2, DLGAP5 and ECT2 was markedly associated with both reduced survival and increased risk of recurrence. Taken together, our findings revealed that NEK2, DLGAP5 and ECT2 might be used as promising biomarkers for the early detection of lung cancer, as well as for predicting the prognosis of lung cancer patients.
In our study, three expression profiles (GSE19188, GSE18842, GSE40791) were used to identify DEGs between tumors and normal lung tissues. Genes with corrected P-values <0.05 and absolute fold changes >4 were considered DEGs. The results showed that 131 genes were up-regulated in GSE19188, 316 genes were up-regulated in GSE18842, and 309 genes were up-regulated in GSE40791 (Figure S1A–C). We then performed an overlap analysis of the DEGs and found that a total of 100 genes were significantly up-regulated in all three lung cancer datasets (Figure S1D, Table S2). Increased expression of NEK2, DLGAP5 and ECT2 in lung cancer was identified in all three GEO datasets. An unpaired t-test was applied to compare the two groups (tumor vs normal), and P-values of less than 0.05 were considered statistically significant (Fig. 1A–C). Importantly, these three genes play an important role in mitosis. Thus, in this study, we focused on NEK2, DLGAP5 and ECT2, three critical mitotic genes.
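The filtering and overlap steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (the analysis was done in R); the function names, the gene labels GENE_A to GENE_C, and all numbers are hypothetical, and the fold change is assumed to be on a linear tumor/normal scale.

```python
def select_upregulated(results, p_cutoff=0.05, fc_cutoff=4.0):
    """Return genes with adjusted P < p_cutoff and fold change > fc_cutoff.

    `results` maps gene symbol -> (adjusted_p, fold_change); the fold change
    is assumed to be linear (tumor / normal), not log2.
    """
    return {
        gene
        for gene, (adj_p, fold_change) in results.items()
        if adj_p < p_cutoff and fold_change > fc_cutoff
    }


def overlap(*gene_sets):
    """Genes that pass the filter in every dataset."""
    return set.intersection(*gene_sets)


# Hypothetical toy results for three datasets, not values from the study.
toy_dataset_1 = {"GENE_A": (1e-6, 6.2), "GENE_B": (1e-5, 4.8), "GENE_C": (0.20, 9.0)}
toy_dataset_2 = {"GENE_A": (1e-8, 7.5), "GENE_B": (1e-7, 5.1), "GENE_C": (1e-4, 5.0)}
toy_dataset_3 = {"GENE_A": (1e-7, 5.9), "GENE_B": (1e-6, 3.2), "GENE_C": (1e-3, 4.4)}

common_up = overlap(
    select_upregulated(toy_dataset_1),
    select_upregulated(toy_dataset_2),
    select_upregulated(toy_dataset_3),
)
print(sorted(common_up))
```

In this toy input, GENE_C fails the P-value cutoff in the first dataset and GENE_B fails the fold-change cutoff in the third, so only GENE_A survives the intersection.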
To confirm our previous results, we selected a series of DEGs for further investigation using another independent set of 56 paired tumor and normal lung tissues. The clinical characteristics of this cohort are summarized in Table 1. NEK2, DLGAP5 and ECT2 expression levels were significantly elevated in tumor tissues compared with normal lung tissues (Fig. 2A–C). As our study was limited to a small number of patients, we expanded the sample size for further validation by using TCGA datasets. A total of 349 lung cancer and 58 normal tissue samples were selected. The expression levels of NEK2, DLGAP5 and ECT2 were similar to those in our training cohort, with significant differences in expression between tumor and normal tissues (Fig. 3A,C,E), suggesting that the differential expression of these three genes is a common feature of lung cancer. Moreover, NEK2, DLGAP5 and ECT2 expression levels differed clearly between TNM stages, with significantly higher levels in stage II-IV patients than in stage I patients (Fig. 3B,D,F).
Next, the associations between DEG expression and clinicopathological characteristics were analyzed using the TCGA dataset; the results are presented in Table 2. NEK2 expression was significantly associated with age (P=0.027), gender (P<0.001), clinical stage (P=0.033), pathologic T stage (P<0.001) and therapy outcome (P=0.004). Elevated DLGAP5 expression was significantly correlated with all six clinicopathologic variables. No significant association was observed between ECT2 expression and patient age or clinical stage. High ECT2 expression in lung cancer was significantly associated with gender (P=0.002), new tumor event (P=0.026), pathologic T stage (P=0.002), and therapeutic outcome (P=0.012). These results suggest that expression changes in NEK2, DLGAP5 and ECT2 may play a vital role in lung cancer progression.
Subsequently, ROC analysis was performed to assess the diagnostic value of NEK2, DLGAP5 and ECT2 as biomarkers for detecting lung cancer. NEK2 expression significantly distinguished the tumor and normal groups in all four lung cancer datasets, with the following AUC values: GSE19188, 0.927 (sensitivity: 0.923, specificity: 0.890); GSE18842, 1 (sensitivity: 1, specificity: 1); GSE40791, 0.967 (sensitivity: 0.910, specificity: 0.926); and TCGA, 0.977 (sensitivity: 0.983, specificity: 0.873) (Fig. 4A, Table 3). Similarly, ROC analyses showed that DLGAP5 and ECT2 levels could also robustly distinguish lung cancer patients from normal subjects, demonstrating high AUC, specificity and sensitivity values (Fig. 4B–C, Table 3). Furthermore, to exclude the influence of primary clinical factors (age, gender, clinical stage, smoking history) on target gene performance, we constructed prediction models that either included or excluded the target gene. Model 1 includes the clinical factors and the target gene. Model 2 includes only the clinical factors. The results of comparing these models are shown in Table S3 and Fig. 4D–F. Model 2 performed worse than Model 1, which suggests that the target genes are important for maintaining the model's performance. Collectively, our results suggest that NEK2, DLGAP5 and ECT2 could be suitable biomarkers for lung cancer diagnosis.
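As a rough illustration of the quantities reported above, the following Python sketch computes an AUC as the probability that a randomly chosen tumor sample has higher expression than a randomly chosen normal sample (the Mann-Whitney formulation), plus sensitivity and specificity at one cutoff. The function names, the cutoff and the expression values are hypothetical, and the paper does not state how its cutoffs were chosen.

```python
def auc(tumor_scores, normal_scores):
    """AUC as the fraction of (tumor, normal) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for t in tumor_scores:
        for n in normal_scores:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor_scores) * len(normal_scores))


def sensitivity_specificity(tumor_scores, normal_scores, threshold):
    """Call a sample 'tumor' when its expression exceeds the threshold."""
    sensitivity = sum(t > threshold for t in tumor_scores) / len(tumor_scores)
    specificity = sum(n <= threshold for n in normal_scores) / len(normal_scores)
    return sensitivity, specificity


# Hypothetical expression values, not data from the study.
toy_tumor = [8.1, 7.4, 9.0, 6.2]
toy_normal = [5.0, 6.5, 4.8, 5.9]

toy_auc = auc(toy_tumor, toy_normal)
toy_sens, toy_spec = sensitivity_specificity(toy_tumor, toy_normal, threshold=6.0)
print(toy_auc, toy_sens, toy_spec)
```

Here 15 of the 16 tumor/normal pairs are ordered correctly, so the toy AUC is 0.9375; an AUC of 1, as reported for GSE18842, corresponds to every pair being ordered correctly.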
Furthermore, in order to assess the prognostic value of NEK2, DLGAP5 and ECT2 as biomarkers for lung cancer, we investigated the association between the expression levels of each of these targets with survival through Kaplan-Meier analysis. We used the log-rank test in 349 lung cancer patients. The Cox proportional hazards regression model was also used to evaluate the predictive value of NEK2, DLGAP5 and ECT2 mRNA levels in lung cancer patients. Two types of survival outcomes were considered in survival analyses. Overall survival (OS) was defined as the time between the date of surgery and date of death or last follow-up, and relapse-free survival (RFS) was defined as the period from surgery to recurrence or last follow-up.
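The log-rank comparison mentioned above can be sketched in a few lines of Python. This is a textbook two-group log-rank statistic under stated assumptions, not the authors' SPSS/Prism procedure: the function names are hypothetical, events are coded 1 (death or recurrence observed) and 0 (censored at last follow-up), and the P-value uses the chi-square distribution with one degree of freedom.

```python
import math


def logrank_statistic(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance


def chi2_1df_pvalue(statistic):
    """Upper-tail P-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical follow-up times in months, not data from the study.
low_times, low_events = [10, 24, 36, 48, 60], [1, 0, 1, 0, 0]
high_times, high_events = [5, 9, 14, 20, 30], [1, 1, 1, 1, 0]

stat = logrank_statistic(low_times, low_events, high_times, high_events)
print(stat, chi2_1df_pvalue(stat))
```

The sketch assumes at least one observed event with both groups still at risk; otherwise the variance is zero and the statistic is undefined.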
In this study, the TCGA dataset was used for prognostic analyses. We divided expression levels into two categories using the median: high expression levels were those above the median, while low expression levels were those below the median. On the whole, patients with low NEK2 levels had significantly longer OS (P=0.009; Fig. 5A) and RFS (P=0.006; Fig. 5B) than those with high NEK2 levels. The median OS was 72.5 months in the NEK2 low expression group and 39 months in the NEK2 high expression group. The median RFS was 73.9 months in the NEK2 low expression group and 25.7 months in the NEK2 high expression group. Similarly, DLGAP5 expression was significantly related to OS (P=0.001; Fig. 5C) and RFS (P=0.003; Fig. 5D) in lung cancer patients. The median OS in the low and high DLGAP5 expression groups was 59.7 months and 35.8 months, respectively. The median RFS in the low and high DLGAP5 expression groups was 68.2 months and 25.7 months, respectively. These figures revealed that higher DLGAP5 expression correlated with a worse prognosis and earlier recurrence. Elevated expression of ECT2 was also markedly associated with reduced survival (P=0.007; Fig. 5E) and increased risk of recurrence (P=0.005; Fig. 5F). The median OS in the low and high ECT2 expression groups was 59.7 months and 41.2 months, respectively. The median RFS in the low and high ECT2 expression groups was 68.2 months and 25.7 months, respectively. Taken together, high expression of each of these three genes was markedly associated with reduced survival and increased risk of recurrence. Univariate and multivariate analyses were carried out to evaluate the target genes and other factors using a Cox proportional hazards regression model. The results showed that the expression of each target gene was significantly correlated with the prognosis of lung cancer patients (Table 4).
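The median split and the median survival times reported above can be illustrated with a small Python sketch of the Kaplan-Meier estimator. It is a simplified illustration under stated assumptions rather than the authors' software: the function names and all values are hypothetical, events are coded 1 (observed) and 0 (censored), and samples exactly at the median are left unassigned because the text defines only "above" and "below".

```python
from statistics import median


def split_by_median(expression):
    """Split patient IDs into low (< median) and high (> median) expression groups."""
    cutoff = median(expression.values())
    low = [pid for pid, value in expression.items() if value < cutoff]
    high = [pid for pid, value in expression.items() if value > cutoff]
    return low, high


def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as a list of (time, survival probability) steps."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below (None if never)."""
    for t, survival in curve:
        if survival <= 0.5:
            return t
    return None


# Hypothetical expression values and follow-up data, not from the study.
toy_expression = {"P1": 2.0, "P2": 9.5, "P3": 4.1, "P4": 7.7, "P5": 5.0}
low_ids, high_ids = split_by_median(toy_expression)

toy_curve = kaplan_meier(times=[5, 8, 12, 20, 30], events=[1, 0, 1, 1, 0])
print(low_ids, high_ids, median_survival(toy_curve))
```

In practice the curve would be computed separately for the low and high groups and the two curves compared with a log-rank test.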
Further subgroup analyses, stratified by clinicopathological features, were performed to explore the effects of NEK2 expression on OS and RFS. In patient groups characterized as female, age <65, stage T3+T4, or in groups with new tumor events, there was no difference in OS between NEK2-low and NEK2-high patients. In contrast, in groups characterized as age ≥65, male, or stage T1+T2, patients with low NEK2 levels had significantly better OS than those with high NEK2 levels (P=0.019, Figure S2A; P=0.011, Figure S2B; P=0.036, Figure S2C, respectively). Similarly, Kaplan-Meier analysis revealed that high NEK2 levels were significantly associated with poor RFS in the age ≥65 (P=0.012, Figure S2D), male (P=0.034, Figure S2E), and stage T1+T2 (P=0.004, Figure S2F) groups. In groups characterized as age <65 (or ≥65), male, or stage T3+T4, patients with low DLGAP5 levels had significantly better OS than those with high DLGAP5 levels (P=0.035, P=0.002, Figure S3A; P=0.020, Figure S3B; P=0.021, Figure S3C, respectively). Our results also showed that high DLGAP5 levels were significantly associated with poor RFS in the age ≥65 (P=0.009, Figure S3D), female (P=0.006, Figure S3E), and stage T1+T2 (P=0.038, Figure S3F) groups. Kaplan-Meier analysis revealed that low ECT2 levels were significantly associated with better OS in the age <65 (P=0.005, Figure S4A), male (P=0.004, Figure S4B), and stage T3+T4 (P=0.023, Figure S4C) groups. Similarly, low ECT2 levels were significantly associated with better RFS in the age <65 (P=0.008, Figure S4D), male (P=0.033, Figure S4E), and stage T1+T2 (P=0.041, Figure S4F) groups.
Lung cancer remains the most common cause of cancer-related death worldwide1. The high mortality among patients with lung cancer is mainly due to the absence of an effective screening strategy to identify lung cancer in its early stages10. Current screening strategies for lung cancer include conventional radiography, sputum cytology, and more recently, low-dose computed tomography (LDCT). LDCT screening can significantly improve early diagnosis and reduce lung cancer mortality. However, the false-positive rate of LDCT screening is high, and this can lead to harm due to unnecessary workups of benign nodules11, 12. For many decades, cytotoxic chemotherapy was the most effective treatment to improve overall survival and quality of life in these patients, despite its many drawbacks13. At the same time, researchers made substantial efforts towards the development of molecular targeted agents14. Systematic clinical studies and basic research on lung cancer have improved survival; however, the long-term outcomes of lung cancer patients remain poor. Thus, it is necessary to identify new biomarkers to improve the diagnosis and prognosis of lung cancer.
NEK2 is a serine/threonine kinase that is involved in the regulation of centrosome duplication and spindle assembly during mitosis15, 16. Dysregulation of these processes causes chromosome instability (CIN) and aneuploidy, which are hallmark changes in many tumors17, 18. NEK2 exists in three alternative splice isoforms: NEK2A, NEK2B and NEK2C19. NEK2 overexpression has been observed in several human cancers. Increased expression of NEK2 has been reported to be involved in tumor progression and is associated with poor prognosis in pancreatic ductal adenocarcinoma20, prostate cancer21 and colon cancer22. However, the association between the expression level of NEK2 and the early diagnosis of lung cancer remains to be rigorously and systematically evaluated. ECT2 is a BRCT-containing protein whose function has been best studied in cytokinesis. He et al.23 showed that ECT2 localizes to chromatin and DNA damage foci-like structures, and that it facilitates PIKK-mediated phosphorylation of p53 on Ser15, the execution of apoptosis, and the activation of S and G2/M checkpoints. Luo et al.24 showed that elevated expression of ECT2 predicts an unfavorable prognosis in patients with colorectal cancer. Another potential predictor of lung cancer diagnosis and prognosis is DLGAP5. DLGAP5 is a mitotic spindle protein that promotes the formation of tubulin polymers, resulting in tubulin sheets around the ends of the microtubules25. DLGAP5 contains a guanylate-kinase-associated protein (GKAP) domain that is conserved among various species. This domain is also found in many eukaryotic signaling proteins, suggesting that DLGAP5 may have important biological functions as a signaling molecule26. DLGAP5 is involved in cancer formation and progression, suggesting that the gene and its product may be potential therapeutic targets27.
NEK2, DLGAP5 and ECT2 are mitosis-associated genes that play an important role in tumorigenesis, and all three have been reported to be involved in lung cancer development. Clustering of a genome-scale co-expression network revealed lung adenocarcinoma modules, a few of which contain genes such as DLGAP5 and BIRC5 that play a crucial role in cell cycle progression28. Das et al.29 uncovered a novel role for Nek2 in promoting tumorigenesis by regulating an axis of metastasis and cell survival. Ect2 regulates rRNA synthesis through a PKCι-Ect2-Rac1-NPM signaling axis that is required for lung tumorigenesis30. It is therefore of great clinical significance to explore the value of these three genes for early diagnosis and prognosis. Several previous studies have examined the association between gene overexpression and poor prognosis in lung cancer. Zhong et al.31 discovered that patients with overexpressed NEK2, Mcm7 and Ki67 had poorer overall survival than those with low expression at all stages. Landi et al.32 showed that mitotic genes known to be involved in cancer development (NEK2 and TTK) are induced by smoking and affect survival. Schneider et al.33 found that the expression of the mitosis-associated genes AURKA, DLGAP5, TPX2, KIF11 and CKAP5 is associated with the prognosis of NSCLC patients. ECT2 overexpression may be a useful index for the application of adjuvant therapy to lung cancer patients who are likely to have poor clinical outcomes34, 35. However, some genes identified as having prognostic implications in one cohort may be difficult to verify in other cohorts. High reliability and reproducibility of microarray technology in identifying target genes are also essential for its application in discovering clinical biomarkers.
Microarray technology has substantially enhanced the search for biomarkers for cancer diagnosis and prognosis. In this study, we identified and validated the expression of NEK2, DLGAP5 and ECT2 in multiple lung cancer datasets, and the results showed that the expression levels of these three genes were significantly higher in lung cancer patients than in normal subjects. Importantly, the expression levels of the three candidate genes were significantly associated with clinicopathologic variables. Furthermore, we revealed the diagnostic and prognostic value of the candidate genes. These cancer biomarkers could be used for early detection, disease monitoring and risk assessment. However, this study has some limitations. We examined the expression of the target genes only in tissue samples. Because the ultimate goal of a biomarker is specific, early and non-invasive diagnosis and post-therapy monitoring of cancer, body fluids (plasma, urine and sputum) are considered an appropriate biological material. In the future, we will also measure the expression of these biomarkers in body fluid samples.
Taken together, these findings indicate that NEK2, DLGAP5 and ECT2 overexpression might be used as promising biomarkers for the diagnosis and prognosis of lung cancer. These genes may also serve as potential therapeutic targets in lung cancer. More work is needed to elucidate the function of these three candidate genes and their roles in tumorigenesis.
Fifty-six patients from Xiangya Hospital (Changsha, China) were included in this study. All the patients provided written informed consent. Experiments and procedures were performed in accordance with the Helsinki Declaration of 1975 and were approved by the Ethics Committee of Xiangya School of Medicine, Central South University. Tumor and matched distant (>5 cm) normal lung tissue samples were collected from NSCLC patients who underwent resection for primary lung cancer. All fresh tissues were frozen in liquid nitrogen immediately after resection and stored at −80°C. The patients' basic clinical characteristics are summarized in Table 1.
Three lung cancer datasets (GSE19188, GSE18842, GSE40791) generated on the Affymetrix platform, together with the corresponding clinical information of the lung cancer patients, were retrieved from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo). GSE19188 includes 91 tumors and 65 adjacent normal lung tissues, GSE18842 includes 46 tumors and 45 controls, and GSE40791 contains 94 tumors and 100 non-tumor tissues.
Validation datasets were acquired from the Cancer Genome Atlas (TCGA) data portal (http://tcga-data.nci.nih.gov). This dataset contains 349 adenocarcinomas and 58 non-tumor tissues with both mRNA expression data and clinical information available for the receiver operating characteristic (ROC), survival and correlation analyses. The latest version of the TCGA LUAD dataset includes 571 samples (513 tumors and 58 normal tissues). We removed two recurrent tumor samples, 28 samples lacking OS data, 133 samples lacking RFS data, and one sample lacking clinical stage data, which left 349 primary adenocarcinoma samples and 58 non-tumor samples. Detailed clinical information for the patients used in this study is shown in Table 2.
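The sample counts above can be checked with a short worked calculation. The sketch below assumes that the four exclusion criteria were applied in sequence to the tumor samples without overlap, which is how the text reads; the variable names are illustrative only.

```python
# Counts taken from the text; exclusions are assumed to be non-overlapping.
total_samples = 571
normal_samples = 58
tumor_samples = total_samples - normal_samples  # 513 tumors

exclusions = {
    "recurrent tumor": 2,
    "missing OS data": 28,
    "missing RFS data": 133,
    "missing clinical stage": 1,
}

retained_tumors = tumor_samples - sum(exclusions.values())
print(tumor_samples, retained_tumors)
```

Under that assumption the arithmetic reproduces the 349 primary tumor samples used in the validation analyses.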
Raw microarray data files (.CEL files) of the three datasets were processed with the Robust Multichip Average (RMA) algorithm using the R package affy36. The Linear Models for Microarray Data (LIMMA) package in R was then used to calculate the probability of probes being differentially expressed between cases and controls37. P-values were corrected for multiple testing using the Benjamini-Hochberg (BH) false discovery rate (FDR) procedure in R. Corrected P-values <0.05 and absolute fold changes >4 were used to identify significant DEGs. All data analyses were performed using R (http://www.r-project.org/, version 2.15.0) and Bioconductor38. The DEGs were visualized as a heat map, a volcano plot and a Venn diagram using the gplots, lattice, and VennDiagram packages in R, respectively.
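For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the following Python sketch shows how adjusted P-values are obtained from raw ones. It is a plain reimplementation of the standard step-up procedure for illustration, not the R code used in the study; the function name and the input values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted P-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values for four probes, not data from the study.
toy_pvalues = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_pvalues)
print(toy_adjusted)
```

Each raw P-value is multiplied by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.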
Total RNA was extracted from samples with Trizol reagent (Takara, Dalian, China) and then reverse transcribed to cDNA using the PrimeScript™ RT-PCR Kit (Takara, Dalian, China) following the manufacturer's instructions. Real-time PCR was performed using SYBR® Premix DimerEraser™ (Perfect Real Time) (Takara, Dalian, China) on a Roche LightCycler 480 II Real-Time PCR system (Roche Diagnostics Ltd., Rotkreuz, Switzerland). Primers used for real-time PCR are shown in Supplementary Table 1. The threshold cycle value (Ct) of each product was determined and normalized against that of the internal control GAPDH. The differences in mRNA expression levels were compared by t test using SPSS 18.0 (SPSS Inc, Chicago, Illinois, USA). P-values of less than 0.05 were considered statistically significant.
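The normalization against GAPDH described above can be illustrated with a short Python sketch. The text states only that Ct values were normalized to GAPDH; converting the difference to a tumor/normal fold change with the commonly used 2^(-ΔΔCt) formula is an assumption made here for illustration, and the function names and Ct values are hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """Normalize a target gene Ct against the reference gene (here GAPDH) Ct."""
    return ct_target - ct_reference


def relative_fold_change(tumor_target, tumor_ref, normal_target, normal_ref):
    """Tumor vs. normal fold change via 2^(-ddCt); an assumed, commonly used formula."""
    ddct = delta_ct(tumor_target, tumor_ref) - delta_ct(normal_target, normal_ref)
    return 2.0 ** (-ddct)


# Hypothetical Ct values for one paired tumor/normal sample, not study data.
fold = relative_fold_change(
    tumor_target=24.0, tumor_ref=18.0,    # dCt (tumor)  = 6
    normal_target=27.0, normal_ref=18.0,  # dCt (normal) = 9
)
print(fold)
```

A lower normalized Ct in the tumor sample means the transcript crossed the detection threshold earlier, so a ΔΔCt of -3 corresponds to an eightfold higher expression in this toy pair.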
The SPSS version 18.0 (Chicago, IL) and Prism 5.0 GraphPad software (San Diego, CA) were used for statistical analysis. Student’s t-test was applied for comparisons of two groups. ROC curves were used to assess the diagnostic value of each marker39. Area under the curve (AUC) was computed for each ROC curve, and 95% confidence intervals (CI) were also estimated by bootstrapping with 1,000 iterations. Survival analysis was carried out according to Kaplan–Meier analysis and the Log-rank test. The Cox proportional hazards regression model was applied to perform univariate and multivariate analyses. P-values of less than 0.05 were considered statistically significant.
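The bootstrap confidence interval for the AUC mentioned above can be sketched as follows. This is a percentile bootstrap with resampling inside each group, which is one common choice and an assumption here, since the text gives only the number of iterations (1,000); the function names, the seed and the data are hypothetical.

```python
import random


def auc(tumor_scores, normal_scores):
    """AUC as the fraction of (tumor, normal) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for t in tumor_scores:
        for n in normal_scores:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor_scores) * len(normal_scores))


def bootstrap_auc_ci(tumor_scores, normal_scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling each group with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        tumor_sample = [rng.choice(tumor_scores) for _ in tumor_scores]
        normal_sample = [rng.choice(normal_scores) for _ in normal_scores]
        estimates.append(auc(tumor_sample, normal_sample))
    estimates.sort()
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[max(0, int((1 - alpha / 2) * n_boot) - 1)]
    return lower, upper


# Hypothetical expression values, not data from the study.
toy_tumor = [8.1, 7.4, 9.0, 6.2, 7.9, 8.8]
toy_normal = [5.0, 6.5, 4.8, 5.9, 6.1, 5.5]

ci_low, ci_high = bootstrap_auc_ci(toy_tumor, toy_normal)
print(auc(toy_tumor, toy_normal), ci_low, ci_high)
```

With perfectly separated groups every resample yields an AUC of 1, so the interval collapses to a single point; wider intervals reflect overlap between the groups and small sample sizes.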
We thank all supporting funds from the National High-tech R&D Program of China (863 Program) (2012AA02A517), National Natural Science Foundation of China (81373490, 81573508, 81573463), and Hunan Provincial Science and Technology Plan of China (2015TP1043).
Zhao-Qian Liu, Hong-Hao Zhou and Wei Zhang designed the experiments; Yuan-Xiang Shi and Ji-Ye Yin performed the experiments; Yuan-Xiang Shi and Yao Shen analyzed the data; Yuan-Xiang Shi wrote the paper; Zhao-Qian Liu and Ji-Ye Yin revised the manuscript. All authors approved the final version of this paper.
The authors declare that they have no competing interests.
Electronic supplementary material
Supplementary information accompanies this paper at doi:10.1038/s41598-017-08615-5
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.