SubpathwayMiner can annotate and identify pathways for any gene or protein set of interest whose identifiers are supported by the system (Entrez Gene IDs, NCBI-gi IDs, UniProt IDs, PDB IDs, etc.). The system is therefore not limited to pathway analysis of gene expression data: it can also accept gene sets of interest produced by other approaches, such as the ensemble decision approach by the authors (19).
A key function of SubpathwayMiner is sub-pathway identification in metabolic pathways. To compare entire-pathway and sub-pathway identification, we applied SubpathwayMiner to a gene expression data set originally analyzed by Landi et al. The data are publicly available in the GEO database (accession number GSE10072). The pathway data were obtained from KGML_v0.6.1 (ftp://ftp.genome.jp/pub/kegg/release/archive/kgml/KGML_v0.6.1/map).
We first identified a total of 1313 differentially expressed genes using the significance analysis of microarrays (SAM) method (21) (FDR <0.01) together with a fold-change criterion (fold change >1.5 or <0.667). We then used SubpathwayMiner to annotate these differentially expressed genes to entire pathways and to sub-pathways (k = 4) of metabolic pathways. The genes were annotated to 87 entire metabolic pathways and 307 sub-pathways. With a strict cutoff of p <0.01, our system identified seven significantly enriched entire metabolic pathways and 36 significantly enriched sub-pathways corresponding to 10 entire metabolic pathways. The average overlap between the significant sub-pathways found within each single pathway was also calculated according to the Sokal and Sneath coefficient (22
). Three of the 10 entire pathways to which the 36 sub-pathways correspond were not themselves statistically significant (p > 0.01): path:00350 (tyrosine metabolism), path:00260 (glycine, serine and threonine metabolism) and path:00564 (glycerophospholipid metabolism). If only entire-pathway identification is used, these pathways may be overlooked because of their high p-values. However, some sub-pathways of these pathways were statistically significant in our system. This result indicates that significant sub-pathways within pathways of high p-values may be associated with cancer initiation or progression. To look for supporting evidence, we searched the PubMed database. The gene macrophage migration inhibitory factor (MIF), which was differentially expressed and annotated to five sub-pathways (path:00350_5, path:00350_6, path:00350_7, path:00350_8 and path:00350_12) of the pathway path:00350, has been associated with risk of recurrence after resection of lung cancer (23
). MIF has also been associated with breast cancer (24), colorectal cancer (25) and prostate cancer (26), among others. The gene alcohol dehydrogenase 1B (ADH2), a differentially expressed gene annotated to these sub-pathways, has been reported to be associated with esophageal cancer, aerodigestive cancer, breast cancer and colorectal cancer (27–30
). One differentially expressed gene annotated to a sub-pathway (path:00260_9) of the pathway path:00260, aldo-keto reductase family 1, member B10 (AKR1B10), was found to be useful as a new marker for identifying patients at high risk of lung cancer in usual interstitial pneumonia (31). Mashkova et al. showed that AKR1B10 is a potential oncogene and that its elevated transcription level is important for squamous cell lung cancer tumorigenesis (32). Genes annotated to two sub-pathways (path:00564_1 and path:00564_2) of the pathway path:00564 were not found to be clearly associated with lung cancer. However, two of them, CHPT1 (choline phosphotransferase 1) and PLA2G4A (phospholipase A2, group IVA), have been associated with breast cancer (33) and colon cancer (34
). Moreover, evidence for the biological significance of the highly enriched sub-pathways was found in the literature. Several enzymes in sub-pathways of the ‘tyrosine metabolism’ pathway, including monoamine oxidase (MAO), aldehyde reductase (AR), catechol-O-methyltransferase (COMT), alcohol dehydrogenase (ADH) and aldehyde dehydrogenase (AD), have been reported to be highly associated with cancer (35–37). Norepinephrine and its metabolism, which these enzymes catalyze, have also been associated with cancer initiation and progression (37–41). In norepinephrine metabolism, norepinephrine is deaminated by MAO to 3,4-dihydroxyphenylglycolaldehyde (DOPEGAL). DOPEGAL is then converted by the sequential actions of AR, COMT, ADH and AD to 3,4-dihydroxyphenylglycol (DHPG), 3-methoxy-4-hydroxyphenylglycol (MHPG), 3-methoxy-4-hydroxyphenylglycolaldehyde (MOPEGAL) and vanillylmandelic acid (VMA), respectively (37). This evidence indicates that the sequential actions of these enzymes (MAO, AR, COMT, ADH and AD), which lie in the sub-pathways identified by our method, may play an important role in cancer initiation and progression. The literature evidence above strongly supports our analysis. We thus propose that pathways that are statistically significant at the sub-pathway level but not at the entire-pathway level may be highly associated with cancer initiation and progression.
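The selection and scoring steps used in this analysis can be illustrated with a short Python sketch. It is not the SubpathwayMiner implementation. The function names are hypothetical, the one-sided hypergeometric test is an assumption (the text does not name the enrichment test), and the overlap measure uses one common variant of the Sokal and Sneath coefficient, a / (a + 2(b + c)), which may differ from the variant defined in (22).

```python
from math import comb


def is_differential(fdr, fold_change, fdr_cut=0.01, up=1.5, down=0.667):
    """Apply the cutoffs quoted in the text: FDR < 0.01 and
    fold change > 1.5 or < 0.667."""
    return fdr < fdr_cut and (fold_change > up or fold_change < down)


def hypergeom_pvalue(m, t, n, x):
    """One-sided enrichment p-value, P(X >= x) (assumed test).

    m: number of background genes
    t: background genes annotated to the (sub-)pathway
    n: number of submitted (differentially expressed) genes
    x: submitted genes annotated to the (sub-)pathway
    """
    upper = min(t, n)
    total = comb(m, n)
    # math.comb returns 0 when k > n, so impossible terms vanish.
    tail = sum(comb(t, i) * comb(m - t, n - i) for i in range(x, upper + 1))
    return tail / total


def sokal_sneath(genes_a, genes_b):
    """Overlap of two gene sets as a / (a + 2(b + c)) (assumed variant).

    a: genes shared by both sets; b + c: genes found in only one set.
    """
    a = len(genes_a & genes_b)
    only_one = len(genes_a ^ genes_b)
    denom = a + 2 * only_one
    return a / denom if denom else 0.0
```

Under these assumptions, a sub-pathway would be reported as enriched when `hypergeom_pvalue(...)` falls below 0.01, and the average of `sokal_sneath` over all pairs of significant sub-pathways from one pathway would give that pathway's average overlap.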
The statistically significantly enriched sub-pathways identified by SubpathwayMiner for differentially expressed genes from lung cancer