|Home | About | Journals | Submit | Contact Us | Français|
Recent works have reported that long non-coding RNAs (lncRNAs) play critical roles in tumorigenesis and prognosis of cancers, suggesting the potential utility of lncRNAs as cancer prognostic markers. However, lncRNA signatures in predicting the survival of patients with clear cell renal cell carcinoma (ccRCC) remain unknown. In this study, we attempted to identify lncRNA signatures and their prognostic values in ccRCC. Using lncRNA expression profiling data in 440 ccRCC tumors from The Cancer Genome Atlas (TCGA) data, a five-lncRNA signature (AC069513.4, AC003092.1, CTC-205M6.2, RP11-507K2.3, U91328.21) has been identified to be significantly associated with ccRCC patients’ overall survival in both training set and testing set. Based on the lncRNA signature, ccRCC patients could be divided into high-risk and low-risk group with significantly different survival rate. Further multivariable Cox regression analysis suggested that the prognostic value of this signature was independent of clinical factors. Functional enrichment analyses showed the potential functional roles of the five prognostic lncRNAs in ccRCC oncogenesis. These results indicated that this five-lncRNA signature could be used as an independent prognostic biomarker in the prediction of ccRCC patients’ survival.
Renal cell carcinoma (RCC) is one of the most common renal malignancies worldwide, with an estimated 15,000 deaths every year . Recent studies showed that incidence and mortality rates of RCC are increasing in the United States . The vast majority of RCC subtypes are classified as clear cell renal cell carcinoma (ccRCC), which account for 70–80% of all RCCs. ccRCC has been reported to have the highest rate of progression and mortality [3, 4]. The standard of care for ccRCC remains surgical excision, and many ccRCC patients will be cured by surgery. However, about 30% of ccRCC patients had metastases and would die following removal of a confined tumor [4, 5]. To date, no widely accepted molecular biomarkers for ccRCC aggressiveness have been available. Great efforts to improve the early-stage detection of ccRCC are warranted.
Long non-coding RNAs (lncRNAs) are defined as RNA transcripts longer than 200 bp with little or no protein-coding capacity [6–9]. Mounting evidence suggested that lncRNAs are important molecular players with the ability of regulating gene expression at the level of chromatin modification, transcription and post transcriptional regulation [10–13]. Many dysregulated lncRNAs have been identified in cancers, which are tightly associated with tumor metastasis [14–17]. Moreover, many lncRNAs have been reported to be novel survival predictors for cancer patients, providing a broaden option for cancer diagnosis and prognosis [18–21]. Several prognostic biomarkers for ccRCC have been reported in clinical trials, such as Linc00152  and lnc-ZNF180-2 , etc. More potential and valuable lncRNA biomarkers are needed to be identified to improve the clinical outcome of ccRCC patients [24–26].
Prognostic lncRNA signatures have been examined in many cancer types [18, 27–31]. In this work, we used a cohort of 440 ccRCC patients from The Cancer Genome Atlas (TCGA) data to detect the potential lncRNA signature in predicting the survival of ccRCC patients. We identified a five-lncRNA signature from the TCGA dataset, and determined its independence of clinical factors. The identification of prognostic lncRNAs suggested the potential roles of lncRNA in ccRCC pathogenesis.
The 440 TCGA ccRCC patients were randomly divided into a training (n = 220) set or a testing set (n = 220), respectively. Based on the training set, the lncRNAs were subjected to univariable Cox regression model, and a total of five lncRNAs were significantly correlated with the patients overall survival (P-value < 0.001; Table Table1).1). Three of them (AC069513.4, AC003092.1, RP11-507K2.3) had positive coefficients, representing that the higher expression level was associated with shorter survival. The negative coefficients for the remaining two lncRNAs (CTC-205M6.2, U91328.21) suggest higher levels of expression were related to longer survival.
Base on the expression level of five lncRNAs, we designed a risk-score formula for ccRCC patients’ survival prediction. The risk score formula is as following: Risk score= (1.43 × expression level of AC069513.4) + (0.81 × expression level of AC003092.1) + (1.64 × expression level of RP11-507K2.3) + (-6.56 × expression level of CTC-205M6.2) + (-1.72 × expression level of U91328.21). Next, we calculated the lncRNA-based risk score for each ccRCC patient in the training set, and divided ccRCC patients into high-risk (n=110) and low-risk groups (n=110) using the median risk score value as a threshold. The Kaplan-Meier curves showed that patients in the high-risk group suffered worse prognosis than the patients in the low-risk group (33.3 months vs. 40.1 months, P-value=3.2e-6; Figure Figure1A).1A). The overall survival rate of the patients in the low-risk group was 80% in 3 years, 70% in 6 years and 60% in 9 years, whereas the survival rate in high-risk group was only 65% in 3 years, 45% in 6 years and 20% in 9 years, respectively. To evaluate the competitive performance of the five-lncRNA signature, time-dependent ROC curve analysis was measured, and the AUC score for the five-lncRNA signature was 0.703 (Figure (Figure1B),1B), demonstrating the better performance of survival prediction in the training dataset. Univariate Cox regression analysis showed that the five-lncRNA risk score were significantly associated with patients’ survival (P-value < 0.001, HR = 1.151, 95% CI = 1.1-1.2; Table Table2).2). The distribution of the risk score, overall survival and expression profiles of five lncRNAs in samples of the training dataset were showed in Figure Figure1C,1C, which were ranked according to the risk score value. Patients with high-risk scores had higher mortality than patients with low-risk scores. For patients with high risk scores, the expression profiles of lncRNAs (AC069513.4, AC003092.1 and RP11-507K2.3) were significantly up-regulated, whereas the remaining two lncRNAs (CTC-205M6.2 and U91328.21) were down-regulated.
We next validated our five-lncRNA signature in the testing set to confirm our findings. By calculating the risk score for each patient in the testing set based on the same risk score formula, we divided ccRCC patients into a high-risk group (n=94) and a low-risk group (n=126) using the same threshold. Consistent with the results in the training set, patients in the high-risk group had significantly shorter survival than those in the low-risk group (33.07 months vs. 36.55 months, log-rank test P-value=0.04; Figure Figure2A).2A). The overall survival rate of the patients in the low-risk group was 55% in 3 years, 12% in 6 years and 2% in 9 years, whereas the survival rate in high-risk group was only 48% in 3 years, 8% in 6 years and 0% in 9 years, respectively. In the entire TCGA data set, similar result was observed that patients in the high-risk group had significantly shorter survival than those in the low-risk group (33.4 months vs. 37.4 months, P-value=5.09e-7; Figure Figure2B).2B). Time dependent ROC curves analysis for the five-lncRNAs signature-based model achieved AUC score of 0.63 and 0.68 in the testing set and entire set, respectively.
We evaluated whether the survival prediction based on five-lncRNA signature was independent of clinical factors. Multivariate Cox regression analysis was then performed, including lncRNA-based risk score and other clinical information, such as age, gender, tumor grade and AJCC tumor stage (Table (Table2).2). The result showed that five-lncRNA risk score remained to be tightly associated with survival after adjusting the clinical factors. Moreover, we found that the age and AJCC stage were also significantly associated with overall survival. Then, stratified analysis was carried out, and the entire TCGA data set were divided into younger stratum (age ≤ 50, n=85) and older stratum (age > 50, n=355). The result showed that the five-lncRNA risk score can further divide ccRCC patients into high-risk and low-risk subgroup within each stratum (Figure (Figure3).3). These result suggested that prognostic value of five-lncRNA signature is independent of age. Similar results were obtained when the stratification analysis of AJCC tumor stage (Figure (Figure4)4) and tumor grade (Figure (Figure5)5) were performed. These findings suggested that five-lncRNA risk score has a competitive performance for the survival prediction of ccRCC patients.
To explore the functional implication of five prognostic lncRNAs in ccRCC tumorigenesis, we performed functional category enrichment analysis to examine their function. The biological functions of lncRNAs are still largely unknown. Many lncRNAs can act as cis-regulators, and the expression of lncRNA is significantly correlated with their neighboring protein-coding genes. Here, we predicted their putative functions based on co-expression network. Spearman correlation coefficients were calculated between lncRNAs and protein-coding genes based on their expression values. The top 1% protein-coding genes were selected as co-expressed partner of five prognostic lncRNAs. At last, a total of 1960 protein-coding genes were significantly correlated with at least one prognostic lncRNAs. Functional enrichment analysis showed that lncRNA correlated protein-coding genes were significantly enriched in 128 GO terms and 11 KEGG pathways (Figure (Figure6).6). The functional categories are mainly involved in four functional clusters, including proteasome, transcription regulation process, intracellular transport, GTPase activity and several pathways (Figure (Figure6A).6A). This results suggested that the five prognostic lncRNAs might be involved in tumorigenesis process through regulating protein-coding genes to influence known cancer related pathways.
Considering the great importance of lncRNAs in tumor tumorigenesis and progression, lncRNA dysregulation may serve as an important indicator of the characteristics of tumors. It has been documented that altered lncRNAs can exist in many cancer types, and are tightly associated with the outcome of cancers [32, 33]. Many works have focused on whether aberrant expression of specific lncRNAs in cancers can serve as independent markers for diagnosis and prognosis [17, 18, 31, 34]. Recently, although some works have announced the great importance of lncRNA in ccRCC tumorigenesis [23, 35], the comprehensive prognostic values of lncRNA in ccRCC have not been clarified clearly . Therefore, a reliable prognostic biomarker is quite necessary.
In the present work, a five-lncRNA prognostic signature was identified based on the lncRNA expression profiles of ccRCC patients. It was then confirmed to be an independent prognostic predictor for patients with ccRCC. This study determined the potential five-lncRNA signature to predict the prognosis of ccRCC. The performance of five-lncRNA signature was evaluated using ROC analysis, suggesting that the prognostic value of the five-lncRNA signature is competitive for survival prediction. To the best of our knowledge, these five lncRNAs have not been previously reported, and further functional annotation of these prognostic lncRNAs will increase our understanding of their biological implications in determining ccRCC prognosis.
The result suggested that the prognostic value of five-lncRNA signature was independent of other clinical factors in ccRCC. Actually, lncRNAs have been reported to have higher specificity than mRNA in some cancer types [30, 37, 38]. The present work may bring some clinical implications in the development of novel prognostic factors in ccRCC. Although these five prognostic lncRNAs have not been previously investigated in cancers, we speculate that these lncRNAs may be involved in ccRCC tumorigenesis and many works are needed in the future ccRCC studies. Previous works have reported some prognostic lncRNAs in ccRCC, such as TUG1 , TCL6 , H19 , MALAT-1  and NBAT1 . After measuring the prognosis of these lncRNAs using TCGA data in ccRCC, these lncRNAs are not involved in TCGA-based prognostic lncRNAs. We speculated the reasons why these reported prognostic lncRNAs cannot be validated in TCGA data. First, all these reported works are based on Chinese ccRCC patients, whereas the ccRCC patients in TCGA are Caucasian people, and the underlying molecular mechanisms might be different between populations. Second, these published works are all based on a small-scale ccRCC cases, which might draw a conclusion with deviation. Future works with more samples are necessary.
Several limitations of present work should be addressed. First, we only analyzed and validated the prognostic power of the five-lncRNA signature in the TCGA dataset, and no other ccRCC lncRNA expression data can be used for further validation. Although previously published microarray data can be used to identify some lncRNAs, these data only include a relatively small fraction of lncRNAs. Second, lncRNAs always play important regulatory roles in a wide range of biological processes through a complex regulatory network involving different kinds of cis- and trans-regulatory elements. Further integrated analysis may help us to predict the functional roles of the five prognostic lncRNAs in ccRCC more accurately. Third, no experimental data on the underlying mechanisms of lncRNAs have been performed, and future experimental studies on these lncRNAs can enhance our understanding of the functional role in ccRCC.
In this work, we reported a lncRNA signature in ccRCC patients to predict survival. Using large-scale independent expression profiles, we have demonstrated the prognostic values of lncRNAs in ccRCC patients. Our result has suggested that the five-lncRNA signature is helpful in predicting the clinical outcome, and may be an effective prognostic biomarker in the prediction of the survival of ccRCC patients.
The lncRNAs expression data and corresponding clinical information of ccRCC patients TCGA database. After excluding the data without complete survival information, a total of 440 ccRCC patients were enrolled in this work. We also downloaded the detailed clinical information of ccRCC patients, including age, gender, tumor grade, AJCC cancer stage, etc. Samples from TCGA data set were equally divided into training (n=220) and testing sets (n=220).
ccRCC RNA-seq data were downloaded from TCGA data portal (https://tcga-data.nci.nih.gov/tcga/). After alignment to the human genome (Ensembl database v72 assembly), the expression level of lncRNAs and mRNAs were determined by the value of Reads Per Kilobase of exon model per Million mapped reads (RPKM). We identified lncRNAs from TCGA dataset based on the following three criteria: 1) transcripts were not identified in any protein-coding region; 2) transcript sequences have been annotated in GENCODE project ; 3) transcripts were expressed in at least half of the ccRCC samples. The lncRNA expression profiles were defined as those with an average RPKM ≥ 0.1 across 440 ccRCC samples. At last, a total of 9669 lncRNAs in dataset were enrolled. We used edgeR  software to examine the expression difference.
Based on the training set, the association between the expression level of each lncRNA and patient's overall survival was calculated using a univariate Cox regression. Those lncRNAs were considered to be significant if their P-values were less than 0.001. Then, the selected lncRNAs were fitted in a multivariate Cox regression analysis in the training dataset. Risk scores were estimated by involving these selected lncRNAs, which were weighted by their estimated regression coefficients in the multivariable Cox regression model. The risk score can be calculated for each ccRCC patient based on prognostic five-lncRNA signature. Based on the risk score formula, ccRCC patients can be divided into high-risk and low-risk groups, respectively. Differences in patients’ survival between these two groups can be evaluated by the Kaplan-Meier survival analyses. To further determine whether the prediction of the lncRNA signature was independent of other clinical variables, multivariate Cox regression and stratified analyses were carried out. The receiver operating characteristic (ROC) curve within 5 years were performed to compare the sensitivity and specificity of the survival prediction based on the risk score. All analyses were performed using R package (version 3.3.0).
To evaluate the functional implication of lncRNAs, spearman correlation coefficients were computed between five lncRNAs and protein-coding genes. Functional enrichment analyses for those co-expressed protein-coding genes were performed using the DAVID Bioinformatics Tool (version 6.7) . GO and KEGG category enrichments were based on the threshold of P-value < 0.05 and enrichment score > 1.0. Significant enrichment results were visualized using Cytoscape software (version 3.4.0) .
This work is supported by National Natural Science of China to Dong Dong (Grant No. is 31200956).
Author contributionsDD conceived and designed the experiments. DS, QQ, QC, YW, YG and DD analyzed the data and wrote the manuscript. All authors read and approved the final manuscript.
CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest.