In this study, we combined gene expression data from two different studies to investigate the differences in gene expression for advanced stage MYCN non-amplified tumours with contrasting outcome at five years after initial diagnosis.
Our results suggest that this subgroup of tumours can be divided into two biological subtypes with distinct gene expression profiles that are associated with contrasting clinical outcomes. The expression of the genes that are differentially expressed between these two subtypes may represent a general indicator of neuroblastoma aggressiveness, since corresponding expression behaviour can be observed in low stage MYCN non-amplified tumours as well as in advanced stage MYCN amplified tumours.
Instead of simply comparing lists of differentially expressed genes obtained from single-study data or combining p-values calculated for each single study [19], we applied a meta-analysis method to the gene expression data that is based on a well-established statistical framework and models study-to-study differences [10]. A disadvantage of combining gene expression data from different studies is that only genes common to all microarray platforms can be used. As the reliability of the probe mapping is crucial for a cross-platform analysis, we applied a stringent sequence-based mapping of the probes of the different microarray platforms to avoid inappropriate probe assignments. Combining data from different studies yielded a large number of expression profiles for the investigated subset of patients, even though a stringent selection criterion of five years of follow-up was applied. This increased the statistical power of our analysis in comparison to single-study results: of the 72 significantly differentially expressed genes, 34 were found exclusively in the meta-analysis of both sets.
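To illustrate what modelling study-to-study differences can look like for a single gene, the following minimal sketch combines per-study effect sizes with a DerSimonian-Laird random-effects model. This is an assumption chosen for illustration, not necessarily the exact procedure of the cited method; the function name `random_effects_meta` and the input values are hypothetical.

```python
import math

def random_effects_meta(effects, variances):
    """Combine per-study effect sizes for one gene (DerSimonian-Laird sketch).

    effects   -- one standardized effect size per study (hypothetical inputs)
    variances -- the sampling variance of each effect size
    Returns (combined effect, standard error, between-study variance, p-value).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * d for wi, d in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity between studies.
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments estimate of the between-study variance, truncated at 0.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    # Two-sided p-value under a standard normal approximation.
    p = math.erfc(abs(mu / se) / math.sqrt(2.0))
    return mu, se, tau2, p

# Hypothetical effect sizes of one gene in two studies.
mu, se, tau2, p = random_effects_meta([0.9, 1.4], [0.10, 0.15])
```

When the two studies disagree strongly, the estimated between-study variance grows and the standard error of the combined effect widens, so a gene is called significant only if its effect is reasonably consistent across studies.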
Among the significantly differentially expressed genes are some that are already known in the context of neuroblastoma research. Our results confirm observations made for these genes as described earlier in the literature. High expression of NTRK1 is present in neuroblastomas with favourable biological features and is highly correlated with patient survival [20]. High expression of FYN and high FYN kinase activity are restricted to low-stage tumours [21]. PTN is highly expressed in favourable neuroblastomas, whereas it is expressed at a significantly lower level in advanced tumours [22]. Low CAMTA1 expression is associated with poor outcome [23]. NCAM expression seems to enhance the malignancy of neuroblastoma cells and their tendency to metastasise [24]. High HuD (ELAVL4) mRNA levels may predict a clinically favourable outcome [25]. The fact that some of these genes were exclusively detected by the meta-analysis (Table ) underlines the benefit of cross-study analyses for the investigation of tumour subgroups. Interestingly, the 72 genes found to be significant for the investigated subgroup of neuroblastoma tumours also show distinct differential expression in other prognostic subgroups and may thus be used as a general prognostic marker for neuroblastoma patients. Moreover, this result suggests that, although several different clinical stages and risk groups are defined for neuroblastoma tumours, on the level of gene expression they seem to comprise only two distinct biological entities associated with adverse patient outcome.
The GO-based gene set enrichment analysis of the genes selected in our meta-analysis did not identify any GO term as overrepresented with high statistical significance. However, the p-values calculated in the gene set enrichment analysis provide a useful ranking of the GO annotation terms, which we used to select the genes shown in table for characterization of the biological functions represented by the selected genes. Among these GO annotation terms, the cell cycle associated terms were ranked highly. The up-regulation of cell cycle genes in aggressive neuroblastoma tumours was already observed by Krasnoselsky et al. [26] in a comparison of tumours of different stages and MYCN amplification status, in which patient outcome was not considered. In addition to the cell cycle genes, the gene set enrichment analysis resulted in high rankings of three other GO terms known to be affected in tumorigenesis: DNA damage response, negative regulation of MAPK activity [27] and Wnt receptor signalling pathway [28]. For DNA damage response genes, higher expression can be observed in tumours with an unfavourable outcome than in tumours with a favourable outcome. This effect can also be seen in other tumour entities such as prostate cancer. For the gene APEX1, expression data available in the gene expression data repository Oncomine [29] show increasing expression with increasing tumour malignancy (Additional File 1). This effect might be caused by accumulated genetic aberrations in tumours with unfavourable outcome, which trigger the activity of DNA damage response genes.
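One common way to obtain per-term p-values that can be used for such a ranking is a one-sided hypergeometric over-representation test. The sketch below assumes this test purely for illustration; the enrichment procedure actually used is not restated in this section, and the function names, the gene universe size and the per-term counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(universe, selected, annotated, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe  -- number of genes that could have been selected
    selected  -- number of selected (significant) genes
    annotated -- number of universe genes annotated with the GO term
    overlap   -- number of selected genes annotated with the GO term
    """
    total = comb(universe, selected)
    upper = min(selected, annotated)
    # Sum the probabilities of observing `overlap` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def rank_terms(universe, selected, term_counts):
    """Rank GO terms by ascending p-value.

    term_counts maps a term name to (annotated, overlap).
    """
    scored = [
        (term, hypergeom_enrichment_p(universe, selected, annotated, overlap))
        for term, (annotated, overlap) in term_counts.items()
    ]
    return sorted(scored, key=lambda item: item[1])

# Hypothetical counts: 72 selected genes drawn from an assumed universe of 5000.
ranking = rank_terms(5000, 72, {
    "cell cycle": (300, 12),
    "DNA damage response": (120, 5),
    "Wnt receptor signalling pathway": (90, 3),
})
```

Even when no term passes a multiple-testing threshold, the ordering of the p-values can still be used to prioritise terms, which is how the ranking is used in the text.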
When interpreting the gene expression analysis of MYCN non-amplified advanced stage neuroblastoma tumours with regard to patient outcome, the influence of the therapy that all these patients received has to be taken into account: differences in the gene expression profiles with regard to patient outcome may reflect not only tumour malignancy but also tumour responsiveness to the therapy.
Although only a small number of patients were available for the generation and assessment of a predictive model, outcome prediction based on the data of both studies (using only genes common to both platforms) yielded good results. Patients with favourable and with unfavourable outcome were both classified well, as indicated by a balanced ratio of sensitivity and specificity. This shows the potential of using data from different gene expression studies to derive predictive models for patient subgroups for which gene expression data are scarce. However, stable and compact (in terms of the number of genes used) predictive models require a larger total number of samples [30], which could be achieved by combining future gene expression profiling studies using the approach presented here.
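As a reminder of how the balance between sensitivity and specificity mentioned above can be checked, the following minimal sketch computes both measures from true and predicted outcome labels. The label coding (1 for unfavourable, 0 for favourable outcome) and the example vectors are hypothetical and are not taken from the study data.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary outcome labels.

    Sensitivity is the fraction of positive cases predicted as positive;
    specificity is the fraction of negative cases predicted as negative.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against classes that do not occur in y_true.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

# Hypothetical labels: 1 = unfavourable outcome, 0 = favourable outcome.
observed = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(observed, predicted)
```

A classifier that assigns every patient to the larger outcome group can reach high overall accuracy while one of the two measures collapses, which is why reporting both is more informative for small, unbalanced cohorts.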