All data were acquired from the TCGA project [4] (http://cancergenome.nih.gov/dataportal/data/about/). Gene expression microarrays were run on the Affymetrix HT Human Genome U133 Array Plate Set by the Massachusetts Institute of Technology (MIT). Level three data provided per-sample gene-level calls after probeset-level and gene-level Robust Multiarray Analysis (quantile normalization and background correction), as of the most recent update on Sep. 05, 2008. After averaging the expression values of duplicated samples, 243 tumor tissue samples, 10 normal tissue samples and 1 cell line sample from glioblastoma patients were used for differential expression analysis. MicroRNA expression experiments were performed on the Agilent 8 x 15K Human microRNA-specific microarray by the University of North Carolina (UNC). The level three data (after quantile normalization and batch adjustment) contained 534 microRNAs (470 human microRNAs), 240 tumor tissue samples and 10 normal tissue samples as of the most recent update on Nov. 10, 2008. Because brain tissue samples from healthy people are very difficult to obtain, all control samples came from the adjacent normal tissue of glioblastoma patients. We therefore focus on detecting the effect of somatic differences on disease, which is a common approach in many other cancer studies. We used 254 samples for gene expression and pathway analysis, 250 samples for microRNA expression analysis, and the 247 samples common to the microRNA and gene expression datasets for miRNA target analysis.
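The averaging of duplicated samples mentioned above can be illustrated with a minimal sketch. It assumes that duplicates share a sample identifier and that every expression vector lists the genes in the same order; the function name `average_duplicates` and the toy identifiers are hypothetical and are not taken from the TCGA data files.

```python
from collections import defaultdict
from statistics import mean


def average_duplicates(profiles):
    """Average expression vectors that share a sample ID.

    profiles: list of (sample_id, [expression values]) pairs. Every vector
    is assumed to list the genes in the same order (illustrative assumption).
    """
    grouped = defaultdict(list)
    for sample_id, values in profiles:
        grouped[sample_id].append(values)
    # zip(*vectors) walks gene by gene across the duplicate arrays.
    return {
        sample_id: [mean(gene_values) for gene_values in zip(*vectors)]
        for sample_id, vectors in grouped.items()
    }


# Toy input: sample "S1" was hybridized twice, "S2" once (hypothetical IDs).
profiles = [
    ("S1", [2.0, 4.0]),
    ("S1", [4.0, 6.0]),
    ("S2", [1.0, 1.5]),
]
averaged = average_duplicates(profiles)
```

Samples measured only once pass through unchanged, so the same routine can be applied to the whole dataset.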
Gene expression analysis
A total of 1,236 genes were identified as significantly differentially expressed between tumor and normal tissues. The results are given in Additional file 1. To further investigate the function of these differentially expressed genes, we used DAVID [8, 9], a bioinformatics resource, together with pathway analysis [10], for systematic and integrative analysis of large gene lists. Of the 1,236 differentially expressed genes, 1,221 had annotations in the DAVID Functional Annotation Tools. We carried out gene set enrichment analysis to identify the most enriched gene function annotation terms (GO terms) [11] in the list of 1,221 annotated differentially expressed genes (see Methods for details). The top ten enriched GO terms are shown in Table 1; they suggest that these genes are enriched in brain and mainly associated with nervous system development and function. Detailed information, for example the genes sharing each GO term, is given in Additional file 1.
| Table 1. The top ten GO terms most enriched in the differentially expressed gene list |
DAVID can also cluster functionally similar GO terms. The two most enriched GO term groups in the differentially expressed gene list both consisted of terms relevant to brain and neurons. They were: 1) a GOTERM Cellular Component cluster of five terms: neuron projection, cell projection, dendrite, cell soma, and axon. A total of 53 genes belong to this cluster, including CDK5, SNCG, UCHL1 and FREQ.
According to the NCBI Entrez Gene annotation [12], deregulation of CDK5 causes neuronal death and neurodegenerative diseases. SNCG encodes a member of the synuclein family of proteins, which are believed to be involved in the pathogenesis of neurodegenerative diseases; mutations in this gene have also been associated with breast tumor development. UCHL1 is specifically expressed in neurons and in cells of the diffuse neuroendocrine system, and mutations in this gene may be associated with Parkinson disease. FREQ encodes a calcium-binding protein expressed predominantly in neurons; the encoded protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity. 2) A GOTERM Biological Process cluster of twenty-one terms, including synaptic transmission; transmission of nerve impulse; neurotransmitter secretion; regulated secretory pathway; generation of a signal involved in cell-cell signaling; regulation of neurotransmitter levels; neurological system process; cell-cell signaling; exocytosis; SNARE binding; and secretory pathway. A total of 336 genes belong to this cluster. Detailed information for these two GO term groups is given in Additional file 1.
Pathway analysis
We first used the algorithm proposed in TAPPA (Topological Analysis of Pathway Phenotype Association) [10] for pathway analysis. The results revealed that 131 pathways were significantly associated with glioblastoma (Additional file 2). The 131 associated pathways belonged to 33 functional groups, among which Cell Signaling, Neuroscience, Immunology and Expression were the most enriched pathway groups. The Glioma pathway was the only significant pathway in the cancer functional group (P-value = 5.75 × 10⁻⁷). As in the GO term enrichment analysis, we then used the DAVID Functional Annotation Tools to identify which pathways were most enriched in the list of differentially expressed genes. The 40 significant pathways are also given in Additional file 2. Cell Signaling, Signal Transduction, Apoptosis and Neuroscience were the most enriched pathway groups. The 30 significant pathways found by both methods are shown in Additional file 2, together with detailed information on the genes involved in the over-represented pathways. Long-term potentiation (a Nervous System pathway) and Calcium signaling pathway (a Signal Transduction pathway) were the most significantly enriched pathways, with p-values of 2.62 × 10⁻⁸ and 3.26 × 10⁻⁸, respectively. There were 11 significant Cell Signaling pathways, 4 significant Apoptosis pathways, 4 significant Signal Transduction pathways, 3 significant Immunology pathways, 3 significant Neuroscience pathways and 2 significant Nervous System pathways (some pathways may belong to more than one functional group). These results suggest that the differentially expressed genes are mostly involved in signaling, apoptosis and neuroscience pathways. Take the long-term potentiation pathway as an example; the accompanying figure shows all the genes in this pathway. Hippocampal long-term potentiation (LTP), a long-lasting increase in synaptic efficacy, is the molecular basis for learning and memory. Of the 71 genes in this pathway, 3 were significantly over-expressed (highlighted in blue) and 21 were under-expressed (highlighted in red). One box in the figure may denote several genes.
Analysis of differential expression of microRNA
A total of 97 microRNAs were significantly differentially expressed between tumor and normal tissues (Additional file 3). To examine whether these miRNAs were associated with glioblastoma, we used miR2Disease [13] (updated Dec. 19, 2008) to validate our results. miR2Disease is a comprehensive resource of literature-reported miRNA deregulation in various human diseases. According to miR2Disease, 81 of the 97 significant miRNAs have been reported to be associated with 84 diseases; among them, 72 miRNAs are associated with 59 cancers and brain diseases. Of those, 22 miRNAs have been reported in relation to glioblastoma/glioblastoma multiforme (GBM)/neuroblastoma (NB), and the expression pattern (up-regulated or down-regulated) in the published literature is exactly the same as in our data. Table 2 gives the p-value, expression pattern, disease and references for the 22 miRNAs. We inferred that the other 50 miRNAs, which are related to other cancers and brain diseases, may also be important for carcinogenesis in brain. However, further experimental validation is required to confirm these results. Among the 97 significant miRNAs, 30 were up-regulated and 67 were down-regulated.
| Table 2. 22 microRNAs related to glioblastoma/GBM/neuroblastoma |
To further examine the function of these significant miRNAs, we needed to find the target genes of the miRNAs associated with glioblastoma. We therefore carried out a regression analysis of miRNA and gene expression.
The regulation of gene expression by microRNA
miRNAs are thought to promote degradation of target mRNA, or to suppress translation of the corresponding protein, by pairing with the mRNA in the 3’-UTR region [20-23]. miRNAs perform various biological functions through this regulation of gene expression. To reveal how miRNAs regulate gene expression in GBM, we identified target genes of miRNAs and constructed miRNA target networks. Since miRNAs repress the expression of their target genes, the first step was to test for an inverse relationship between the expression profile of a miRNA and that of its potential targets. To achieve this, we regressed the expression of each target mRNA on the expression of the miRNA and selected mRNAs with significant negative regression coefficients as miRNA targets. The P-value threshold for declaring significant evidence of a miRNA target was 1.00 × 10⁻⁴. The second step was sequence analysis, which uses the sequence complementarity between a miRNA and its target site to predict potential miRNA target genes. To achieve this, we used experimentally verified and predicted miRNA target data from three miRNA databases: miR2Disease [13], TarBase [24] and miRBase [25]. miR2Disease (updated on Dec. 19, 2008) and TarBase (updated in June 2008) provide experimentally verified microRNA target genes. miRBase predicts the target genes of miRNAs with the miRanda algorithm [26], and the predicted target genes and miRNAs can be downloaded directly (updated on Oct. 31, 2007).
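The two-step procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact implementation used in the study: the regression is a simple least-squares fit of one mRNA on one miRNA, the p-value uses a normal approximation to the t distribution (reasonable for a few hundred samples, and chosen only to keep the sketch free of external dependencies), and the two steps are combined by testing only the database-listed pairs. The names `slope_and_p`, `find_targets`, `miR-A`, `GENE1` and `GENE2` are hypothetical.

```python
from statistics import NormalDist, mean


def slope_and_p(x, y):
    """Least-squares slope of y on x and a two-sided p-value for slope = 0."""
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((b - intercept - beta * a) ** 2 for a, b in zip(x, y))
    std_err = (rss / (n - 2) / sxx) ** 0.5
    if std_err == 0.0:
        # Perfect fit: report p = 0; callers still check the sign of beta.
        return beta, 0.0
    z = beta / std_err
    # Assumption: normal approximation to the t distribution.
    return beta, 2.0 * NormalDist().cdf(-abs(z))


def find_targets(mirna_expr, mrna_expr, predicted_pairs, alpha=1e-4):
    """Keep database-listed pairs whose regression slope is significantly negative."""
    targets = []
    for mirna, gene in predicted_pairs:
        beta, p = slope_and_p(mirna_expr[mirna], mrna_expr[gene])
        if beta < 0 and p < alpha:
            targets.append((mirna, gene, beta, p))
    return targets


# Toy data for eight samples (hypothetical values and identifiers).
mirna_expr = {"miR-A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]}
mrna_expr = {
    "GENE1": [8.1, 6.9, 6.2, 4.8, 4.1, 3.0, 1.9, 1.2],  # falls as miR-A rises
    "GENE2": [3.0, 5.1, 2.9, 5.0, 3.1, 4.9, 3.0, 5.2],  # no inverse trend
}
predicted_pairs = [("miR-A", "GENE1"), ("miR-A", "GENE2")]
targets = find_targets(mirna_expr, mrna_expr, predicted_pairs)
```

Only the pair with a clearly inverse expression relationship survives both filters; the default threshold `alpha=1e-4` mirrors the cut-off stated in the text.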
We compiled the data for the 1,236 differentially expressed mRNAs and 97 differentially expressed miRNAs in 237 tumor tissue samples and 10 normal tissue samples. We found two experimentally confirmed results. In nasopharyngeal carcinomas, under-expressed hsa-mir-29c (expression fold change (tumor/normal) = 0.20) has been reported to target the over-expressed gene COL4A1 (expression fold change (tumor/normal) = 5.24) [27]. In our results, down-regulated hsa-mir-29c (differential expression P-value < 5.11 × 10⁻¹²) targets the over-expressed gene COL4A1 (differential expression P-value < 3.58 × 10⁻⁶) with regression β = -389.02 and P = 1.35 × 10⁻⁸. We conclude that hsa-mir-29c is also an important miRNA in glioblastomas. The other experimentally validated target gene was LDOC1, targeted by hsa-miR-155 [28]. The known oncogenic miRNA hsa-miR-155 can regulate a set of target genes including LDOC1, a regulator of apoptosis [29]. Our results showed that hsa-miR-155 was over-expressed (differential expression P-value < 1.40 × 10⁻¹⁰) and targets the under-expressed gene LDOC1 (differential expression P-value < 1.085 × 10⁻³¹) with regression β = -196.77 and P = 4.00 × 10⁻¹⁵. We inferred that hsa-miR-155 could induce cancer through regulation of the apoptosis gene LDOC1 in glioblastomas.
For predicted targets in miRBase, we found 1,094 matched miRNA-gene pairs involving 70 miRNAs and 661 genes (Additional file 4). The 44 down-regulated miRNAs target 202 over-expressed genes, while the 26 up-regulated miRNAs target 459 under-expressed genes. The up-regulated and down-regulated miRNA-gene pairs are shown in two separate figures.
The 661 target genes were a subset of the 1,236 significantly differentially expressed genes. We examined which pathways these genes were enriched in and compared them with the previous results. Eleven pathways were significant by the Fisher exact test in DAVID, 8 of which were the same as pathways identified in the previous sections: Epithelial cell signaling in Helicobacter pylori infection, Cholera - Infection, Long-term potentiation, Calcium signaling pathway, Neurodegenerative Diseases, Long-term depression, Gap junction, and Neuroactive ligand-receptor interaction. The three newly enriched pathways were Amyotrophic lateral sclerosis (ALS), Alzheimer’s disease, and Wnt signaling pathway. These target genes were therefore also mostly involved in signaling and neuroscience pathways.
To investigate the function of the 661 target genes, we searched TSGDB [30] (a tumor suppressor gene database) and the DNA-Tumor Suppressor and Oncogene Database [31], and found eight tumor suppressor genes (APC, TP53, BIN1, BTG1, CDK2AP1, LDOC1, RASSF1 and WFDC1) and three oncogenes (MCF2, MPL and THRA).
According to the NCBI Entrez Gene annotation [12], APC encodes a tumor suppressor protein that acts as an antagonist of the Wnt signaling pathway. It is also involved in other processes including cell migration and adhesion, transcriptional activation, and apoptosis. TP53 encodes tumor protein p53, which responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. BIN1 encodes several isoforms of a nucleocytoplasmic adaptor protein, one of which was initially identified as a MYC-interacting protein with features of a tumor suppressor. Isoforms that are expressed in the central nervous system may be involved in synaptic vesicle endocytosis and may interact with dynamin, synaptojanin, endophilin, and clathrin. LDOC1 is thought to regulate the transcriptional response mediated by nuclear factor kappa B (NF-kappaB). The gene has been proposed as a tumor suppressor gene whose protein product may have an important role in the development and/or progression of some cancers. The protein encoded by RASSF1 was found to interact with the DNA repair protein XPA. The protein was also shown to inhibit the accumulation of cyclin D1, and thus induce cell cycle arrest. WFDC1 maps to chromosome 16q24, an area of frequent loss of heterozygosity in many cancers. Owing to its location and a possible growth inhibitory property of its gene product, this gene is suggested to be a tumor suppressor gene. MCF2 is a member of a large family of GDP-GTP exchange factors that modulate the activity of small GTPases of the Rho family. Five-prime recombinations result in the loss of N-terminal codons, producing MCF2 variants with oncogenic potential.
To further investigate the function of the target genes, we identified miRNA-targeted pathways with a right-tail Fisher exact test, which tests for enrichment of each pathway in the miRNA target gene set. A total of 83 pathways targeted by 94 miRNAs are listed in Additional file 5 after Bonferroni correction for multiple tests (p-values < 1.00 × 10⁻⁴). Many of the pathways were targeted by more than one miRNA. The 29 pathways targeted by more than 10 miRNAs are shown in the accompanying figure. Long-term potentiation (a Nervous System pathway) was targeted by the most miRNAs (79), and Nitric Oxide Signalling (a Signalling pathway) by the second most (74). The differentially expressed miRNAs thus most frequently targeted genes in Cell Signalling and Nervous System pathways. The red bars give the negative base-10 logarithm of the average p-value, indicating the significance of enrichment of the pathway in the miRNA targets. The DNA replication pathway had the smallest average P-value (7.70 × 10⁻⁹) and the cell cycle pathway the second smallest (1.17 × 10⁻⁶). The P-values of the Long-term potentiation pathway (7.29 × 10⁻⁶) and the Nitric Oxide Signalling pathway (9.99 × 10⁻⁶) were also small.
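For readers unfamiliar with the enrichment test used above, a right-tail Fisher exact test on a 2 × 2 table is equivalent to the upper tail of a hypergeometric distribution. The sketch below illustrates that calculation together with a Bonferroni filter. It is an illustration only: the function names, the toy counts and the family-wise level `alpha=0.05` are assumptions, since the text does not state the level from which its threshold was derived.

```python
from math import comb


def right_tail_fisher(k, n_targets, n_pathway, n_universe):
    """P(X >= k) when n_targets genes are drawn from n_universe genes,
    of which n_pathway belong to the pathway (hypergeometric upper tail)."""
    upper = min(n_targets, n_pathway)
    total = comb(n_universe, n_targets)
    tail = sum(
        comb(n_pathway, i) * comb(n_universe - n_pathway, n_targets - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def bonferroni_significant(p_values, alpha=0.05):
    """Keep tests whose p-value is below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return {name: p for name, p in p_values.items() if p < threshold}


# Toy example: 20 genes in total, a pathway of 5 genes, 5 miRNA targets,
# 4 of which fall in the pathway (all counts hypothetical).
p_enriched = right_tail_fisher(k=4, n_targets=5, n_pathway=5, n_universe=20)

# Two hypothetical pathway tests; the second p-value is made up for illustration.
p_values = {"pathway_a": p_enriched, "pathway_b": 0.03}
significant = bonferroni_significant(p_values)
```

With two tests the Bonferroni threshold is 0.025, so only `pathway_a` is retained in this toy case.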