The data set analyzed offered a suitable representation of the general population of glioblastoma cases. The median overall survival was 13 months, and the probabilities of survival at 12, 24, 36, 48 and 60 months post-diagnosis were 0.59, 0.25, 0.15, 0.11 and 0.07 respectively, in this study. The median survival is similar to that reported by Krex et al. [
14], and the 60 month survival probability is comparable to the 5-year survival rate of 0.13 estimated for grade IV brain cancer reported by the National Cancer Institute Surveillance Epidemiology and End Results [
44]. The similarity between the survival rate in this study and that reported for primary glioblastoma suggests an insignificant fraction of secondary glioblastoma samples among the samples analyzed [
20].
Comparing findings against a literature review confirmed that the Cox survival analysis of multiple gene expression profiles and clinical variables simultaneously was an effective tool to detect an integrated set of gene expression profiles exhibiting general and cohort-dependent associations with the three glioblastoma survival variables. The majority of the genes associated with lifetime, overall, and progression-free survival, in this study, have been previously reported to be associated with glioblastoma (35, 24, and 35 genes, respectively) or with another cancer (10, 19, and 15, respectively). In addition, the multi-factor analysis and data used in this study allowed the uncovering several novel associations between gene profiles and glioblastoma survival. Specifically, 16, 4, and 10 previously unreported genes were associated with lifetime, overall, and progression-free survival, respectively in the present work. The discussion of the findings from our study is divided into genes associated with multiple survival variables, genes associated with glioblastoma in a cohort-independent or cohort-dependent manner, and further investigation of complex associations.
Pik3r1 and
E2f3 were associated with all three glioblastoma survival variables (Tables , , and ). The higher glioblastoma hazards associated with higher levels of
Pik3r1 observed in this study are supported by previous work showing that over-expression of this gene plays a role in the activation of the PI3K/Akt pathway resulting in cell proliferation and tumor invasion [
45]. Likewise, a link between
E2f3 and glioblastoma has been reported [
28,
46]. Among the 15 genes associated with two glioblastoma events (Table ),
Akr1c3,
Csf1,
Jag2,
Plcg1,
Rpl37a,
Sod2, and
Topors were associated with lifetime and overall survival (Tables and ).
Jag2 has been associated with adenomas [
47], pancreatic [
48] and breast cancer [
49],
Rpl37a with nasopharyngeal carcinoma cell lines [
50], and the rest with glioblastoma [
26,
51-
54]. The consistent findings across both glioblastoma survival events suggest that these genes may have specific roles in death. Likewise, the association between
Hras and overall and progression-free survival (Tables , and ), is consistent with previous glioblastoma studies [
55] and suggests that this gene may have a role in aggressive glioblastoma growth.
Fstl1,
Mtap,
Tp53,
Camk2g 214322_at, and
Il13ra1 probe 210904_s_at, were associated with lifetime and progression-free survival (Tables , and ) and these associations are supported by previous studies [
20,
24,
25,
28,
56-
58].
Most genes (lifetime survival, 55 out of 61 genes; overall survival, 45 out of 47 genes; and progression-free survival, 55 out of 60 genes) were associated with survival in a general or cohort-independent manner. The most extreme cohort-independent changes in lifetime survival were observed in
Syne1 (HR = 0.17) and
Pdcd4 (HR = 4.68), and the former profile has been found in lung [
59], ovarian [
60], colon, and breast cancers [
61]; while, the second has been associated with glioma [
62]. The most extreme cohort-independent changes in overall survival were observed in
Ighg1 (HR = 4.33) and
Tgfa (HR = 0.12), and the former trend has been found in cancer cell lines [
63]; meanwhile the later is present in the KEGG glioma pathway [
28]. Lastly, the genes that presented extreme hazard ratio values and general association with progression-free survival are
Pla2g7 (HR = 0.11) and
Paics (HR = 5.28). The
Pla2g7 and
Paics trends identified in this study are consistent with those reported for breast cancer in mice [
64] and in non-glioma types of cancer [
65,
66], respectively.
Several genes (lifetime survival, 6 out of 61 genes; overall survival, 2 out of 47 genes; and progression-free survival, 5 out of 60 genes) were associated with glioblastoma survival in a cohort-dependent manner. These findings indicate that effective use of these genes in prognostic indices or in therapy development must consider the personal characteristics of the individual. Higher levels of
C2 and
Prkcb (probe 209685_s_at) were associated with a higher lifetime death hazard in males (HR = 1.93 and 5.22, respectively) than in females (HR = 1.30 and 1.31, respectively) and the profile of the latter gene has been observed in colon cancer cell lines [
67]. The lifetime hazard estimate decreased with increased levels of
Sox10 in Caucasian individuals (HR = 0.55) compared to non-Caucasian individuals, and this pattern is concordant with broad distribution of
Sox10 in high grade gliomas [
68]. Increases in the level of
Chi3l1 were associated with significant increases in lifetime hazard estimates across all therapies with the highest hazard ratio observed in individuals receiving no therapy (None, HR = 2.42). This trend is consistent with reports that
Chi3l1/
Ykl-40 was highly overexpressed in glioblastoma relative to nonneoplastic brain [
69] and that
Ykl-40 is associated with poorer response to radiation and shorter lifetime survival in glioblastoma [
70]. Males (HR = 0.36) and individuals receiving no therapy (HR = 0.38) have the lowest hazard ratio per increase in
Prkcb (probe 207957_s_at). These trends are consistent with those reported for other cancer types [
67] and with observations of protein kinase C activation in gamma-irradiated proliferating and confluent human lung fibroblast cells [
71].
The cohort-dependent associations between overall survival and both
Polr2d and
Igf2bp3 have been observed in colorectal cancer [
72] and glioblastoma [
73], respectively. Three genes (
Rab31,
Rps20 and
Apool) exhibited a cohort-dependent association with overall survival that is consistent with previously reported trends [
74-
76]. Lastly, the gender-dependent association between
Gdf10 and progression-free survival is in agreement with reports of copy number loss of
Gdf10 in mesothelioma [
77].
Further analyses of the association between individual genes (with or without clinical variables) and hazards were undertaken when the trend estimated from the multi-gene index was opposite to that previously reported. Nine genes and survival events were re-analyzed individually and compared to previous reports including:
E2f3 and all three survival variables [
28,
46],
Egfr and lifetime survival [
78],
Cfs1 and lifetime survival,
Mdm2 with overall hazard [
79],
Fstl1 and lifetime and progression-free survival [
25],
Mtap and progression-free hazard [
57],
Pdcd4 and lifetime survival [
62],
Tgfa and overall survival [
80], and race-dependent
Vav3 and overall survival [
81]. In the first six cases, the consideration of the gene alone as predictor of glioblastoma survival as standard in previous reports resulted in non-significant associations, in this study. These results indicate that the accurate identification of biomarkers and precise characterization of the trend requires the study of the genes in concert with other genes in a systems biology framework, such as the approach implemented, in this study. Re-analysis of
Pdcd4 and
Vav3 confirmed the significant trend detected in the multi-gene analysis, suggesting that further studies are needed to precisely characterize the trend.
The LOOCV confirmed the adequacy of the set of genes and clinical variables identified to predict the glioblastoma hazards. The minor differences between the observed and predicted numbers in each group may be due to the discretization of the survival length into high and low groups required by the discriminant analysis; whereas, the Cox survival analysis models continuous time to the glioblastoma event. The significant number of genes prognostic of glioblastoma survival identified in this study that are also targets of microRNAs associated with glioblastoma [
43] further confirms our results.
In addition to literature review and LOOCV, the gene-survival associations detected in this study were confirmed using the information from the REMBRANDT database. The associations between survival and the 10 gene probes with the most extreme hazard ratio estimate for each of the three survival variables studied that did not interact with cohort variables (Tables , , ) were investigated in REMBRANDT. The query was performed using the Kaplan-Meier survival plot for Gene Expression Data. Of these, eight genes had the same significant trend observed in our study (Syne1, Gigyf2/Tnrc15, Scn5a, Hoxa10, Pdcd4, Tgfa, Pla2g7 and Agpat1), two did not have information on the REMBRANDT database (Ighg1 and Hnrnpd), Fstl1 had an opposite trend than the one observed in our study and in previous independent studies (Table ) and most of the remaining genes, although non-significant, had the same trend observed in our analysis. The latter results are consistent with the simpler analytical approach based on Kaplan-Meier curves available in REMBRANDT, when compared to the more flexible Cox survival analysis used in our study. The Kaplan-Meier approach relies on non-parametric rank-based test to compare the survival between individuals with high and low gene expression. These groups are obtained by setting up an arbitrary expression threshold. Non-parametric rank-based approaches tend to have lower power to detect significant variation than semi- and parametric approaches such as the Cox survival analysis. In addition, the Kaplan-Meier analysis only allows the consideration of one explanatory variable at a time, and this variable has to be discrete (thus, the reason for comparing high and low expression groups in REMBRANDT). This approach does not allow considering multiple continuous covariates (i.e. gene expression) and factors (e.g. race, gender, therapy and progression) or interactions simultaneously. The Cox-survival analysis implemented in our study allows the simultaneous consideration of multiple factors (such as possible population stratification due to race), covariates (e.g. other gene expression profiles) and interactions, and it does not require the discretization of the gene expression values that could result in potential loss of information. Thus, the Cox approach used in our study is able to capture the association between continuous gene expression values and survival conditional on all other model terms and is able to detect associations that are likely not to reach statistical significance using the Kaplan-Meier comparison of survival between high and low gene expression groups.
Among the GO categories, 19 biological processes and three molecular functions were over-represented (FDR adjusted P-value < 0.1, ≥ 3 genes per category) in the genes associated with the three glioblastoma events studied (Tables , and ). Two biological processes, cell cycle (GO:0007049) and death (GO:0016265), were over-represented in the lifetime and progression-free survival (Tables and ), and several biological processes have been previously associated with glioblastoma [
17,
62,
68,
70,
79,
82-
86]. These processes included: aging, morphogenesis, cell cycle and proliferation, and death for lifetime survival; morphogenesis for overall survival; and cell cycle, death and recognition, death, response to biotic and abiotic stimuli, programmed cell death, and apoptosis for progression-free survival.
The study of complementary glioblastoma survival variables allowed to confirm that the gene profiles associated with lifetime survival resulting in the enriched functional category of aging are clearly associated with cancer initiation and progression and are not a simply reflection of the natural aging process. Two results confirm that the biomarkers are not mere confounding with aging. First, the genes in the GO terms "aging (GO:0007568)" and "cell aging (GO:0007569)", Pdcd4, Cdkn2a, and Tp53, have all been associated with GBM in previous independent studies (Table ). In addition, Tp53 was associated with progression-free survival (Table ). Second, other functional terms enriched among the genes associated with lifetime glioblastoma survival were also identified on the other glioblastoma survival variables studied. The biological processes of cell death and cell cycle were enriched both for lifetime and progression-free survival.
The biological processes, molecular functions and gene networks particular to a glioblastoma survival event offered insights into the processes particular to the initiation and progression of this cancer. For instance, eight biological processes associated with lifetime survival were level 3, and one was level 4, indicating that the differentially expressed genes associated with lifetime survival participate on broad or general biological mechanisms. The interconnection between the genes pertaining to aging further confirms the significance of this gene network on lifetime survival (Figure ). Although only two biological processes were associated with overall survival, these processes correspond to levels 4 and 6. This result indicates that the genes associated with overall survival correspond to more specific mechanisms. Moreover, both biological processes are related to generation and organization of anatomical structures, such as organs, and this finding may be associated to the dispersion and development of malignant cells after diagnosis and resection. The close relationship between biomarker genes in this network supports this finding (Figure ). Albeit the study of progression-free survival encompassed a shorter period than lifetime and overall survival, the functional analysis showed several biological processes and molecular functions over-represented among the genes associated with this survival. Four of the biological processes are from level 6 to 8, indicating that specific gene networks and roles are associated with progression-free survival. The biological processes associated with progression-free survival include regulation of progression through cell cycle, programmed cell death, and apoptosis. Extensive relationships between the biomarker genes in the cell cycle were identified further, supporting the major role of this network on glioblastoma progression (Figure ). In addition, three molecular functions were enriched among the genes associated with progression-free survival. Therefore, many biological and molecular events occur in the period between the diagnosis of malignancy and progression or recurrence, probably due to response to numerous treatments, surgery, and cancer progression. Two genes were highly represented across the categories (Tables to ).
Tp53 has an important role as a tumor repressor [
83], and
App is highly expressed in individuals with short-term glioblastoma survival [
24].