Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
Nat Biotechnol. Author manuscript; available in PMC 2013 December 1.
Published in final edited form as:
PMCID: PMC3681899

Heterogeneity of tumor-induced gene expression changes in the human metabolic network


Reprogramming of cellular metabolism is an emerging hallmark of neoplastic transformation. However, it is not known how metabolic gene expression in tumors differs from that in normal tissues, or whether different tumor types exhibit similar metabolic changes. Here we compare expression patterns of metabolic genes across 22 diverse types of human tumors. Overall, the metabolic gene expression program in tumors is similar to that in the corresponding normal tissues. Although expression changes of some metabolic pathways (e.g., up-regulation of nucleotide biosynthesis and glycolysis) are frequently observed across tumors, expression changes of other pathways (e.g., oxidative phosphorylation and the tricarboxylic acid (TCA) cycle) are very heterogeneous. Our analysis also suggests that the expression changes of major metabolic processes across tumors can be rationalized in terms of several principal components. On the level of individual biochemical reactions, many hundreds of metabolic isoenzymes show significant and tumor-specific expression changes. These isoenzymes are potential targets for anticancer therapy.

All tumors share a common phenotype of uncontrolled cell proliferation. To support the synthesis of biomass components and to generate energy required for cellular growth, cancer cells have to reshape the regulatory and functional properties of their metabolic networks. Over 80 years ago, Otto Warburg1 identified a shift from oxidative to fermentative metabolism as a common physiological trait of tumor cells. Following this early insight into cancer metabolism, the main focus of cancer research generally shifted towards the analysis of signaling, gene-regulatory and genetic perturbations in various tumors2, 3. Recently, however, there has been a resurgence of interest in cancer metabolism4-6. An important factor contributing to this renaissance is the observation that many signaling pathways altered in cancer are key regulators of the human metabolic network5. In addition, the therapeutic potential of metabolic targets in cancer has also been rediscovered7, 8.

Taking advantage of a large compendium of gene expression profiles that has been accumulated over the last decade9, 10, in this study we comprehensively analyzed tumor-induced changes in mRNA expression of human metabolic genes across 22 diverse cancer types. To minimize confounding metabolic adaptations that may arise from tissue culture conditions, we analyzed only microarray data obtained from biopsies of primary tumors. We compared gene expression in tumors and corresponding normal tissues at several conceptual levels of biochemical organization: at the global network level, at the level of individual biochemical pathways and at the level of single enzymatic reactions. The focus on the human metabolic network and the analysis of the large collection of tumor and normal samples allowed us to gain statistical power and establish significance for many expression patterns not reported previously.


Global changes in metabolic gene expression

To understand metabolic gene expression in different cancers, we assembled a comprehensive collection of more than 2500 microarray measurements spanning 22 different tumor types (Online Methods and Supplementary Table 1). Although we analyzed only expression data obtained using the most comprehensive human expression array platform (HG U133 Plus 2.0; Supplementary Table 2), comparisons of data for the same tumor types obtained from independent studies and with different microarray platforms (Supplementary Table 3) showed a high correlation of expression changes (average Spearman’s rank correlation coefficient = 0.63), confirming the generality of the observed expression patterns (see Supplementary Fig. 1).

Using the assembled expression compendium, we first investigated the global shifts in metabolic gene expression between and within different cancers and their corresponding normal tissues. For this analysis we used 1421 human genes assigned to metabolic pathways in the KEGG database11. Using two different measures of divergence between a pair of expression profiles12, the Euclidean distance and the correlation distance (Online Methods), we compared global expression patterns between tumors and normal tissues (Supplementary Table 2). In order to account for batch effect arising due to variations in laboratory conditions and measurements, the estimated batch contribution was subtracted from expression distances between expression profiles measured in different studies (Online Methods).

Relative differences between the distributions are consistent for the two metrics of expression divergence (Fig. 1 and Supplementary Fig. 2). The expression distance between tumors and corresponding normal tissues (Tumorn-Normaln) is significantly larger than the distance between different samples of the same normal tissues (Normaln-Normaln; Mann-Whitney U test P-value = 10−8; Fig. 1) or between different samples of the same tumors (Tumorn-Tumorn; P-value = 4*10−7). The distance Tumorn-Normaln, however, is significantly smaller than the distance between different tumors (Tumorn-Tumorm; P-value = 2*10−7), which in turn is significantly smaller than the distance between different normal tissues (Normaln-Normalm; P-value < 2*10−16). The average expression distance between two different tumors is ~82% of the average distance between two different normal tissues, while the distance between a tumor and a corresponding normal tissue is ~63% of the distance between two different normal tissues. Consequently, although the metabolic expression patterns in different tumors become more similar than in corresponding normal tissues, the general metabolic expression program of the original tissue is mostly retained in tumors.

Figure 1
Global differences in metabolic gene expression between tumors and normal tissues. Colors represent distributions of the Euclidean (RMSD) expression distance between different samples of identical normal tissues (Normaln-Normaln, magenta), different samples ...

Expression changes of individual biochemical pathways

We next analyzed the expression changes associated with individual biochemical pathways defined in the KEGG database11 (see Supplementary Table 5 for pathway information and numbering). To identify the patterns of up- and down-regulation for each metabolic pathway, we determined the significance of its expression changes in the tumor samples relative to the corresponding normal samples using the Wilcoxon signed-rank test, adjusted for multiple hypothesis testing (Online Methods). Based on this analysis we calculated the average fraction of tumor samples in which each metabolic pathway was significantly (FDR-corrected P-value < 0.05) up-regulated n¯ and down-regulated m¯ across 22 cancer types (Fig. 2). To assess the statistical significance of the observed pathway behavior, we computed the null distributions of (n¯+m¯) and (n¯m¯) values for metabolic pathways highlighted in Fig. 2 and demonstrated statistical significance of the observed patterns (Online Methods and Supplementary Fig. 3). The general expression patterns observed with the KEGG pathways were similar to results obtained using the BioCyc pathway definitions13 (Supplementary Fig. 4a; Supplementary Table 6), suggesting the robustness of the results with respect to alternative pathway definitions.

Figure 2
Expression of individual metabolic pathways in tumors. The biochemical pathways defined in the KEGG database (see Supplementary Table 5 for pathway numbering) are shown in the coordinates of (n¯+m¯, horizontal axis) and (n¯ ...

As expected, pathways responsible for production of biomass components that are essential for cell division, such as pyrimidine (18) and purine (16) biosynthesis, are significantly up-regulated in many tumor samples (Fig. 2). Along with these two pathways, glycolysis (86) is also significantly up-regulated in many samples, consistent with an enhanced glucose uptake frequently observed in tumors14. Among other pathways displaying frequent and significant up-regulation are pathways related to protein synthesis (aminoacyl-tRNA biosynthesis (81)) and glycoprotein biosynthesis (N-glycan biosynthesis (40)). In tumor samples where the expression of the aforementioned pathways is not significantly changed, the overall metabolic gene expression was mostly either down-regulated or not significantly changed. Specifically, this is the case for 72% of tumor samples with no significant change in the glycolysis pathway expression, 84% of tumor samples with no change in the purine biosynthesis pathway expression, and 88% of tumor samples with no change in the pyrimidine biosynthesis pathway expression. The down-regulation of the overall metabolic gene expression is common for tumors originating from human tissues with significant metabolic functions (see principal component analysis below).

In contrast to the biosynthesis pathways, pathways responsible for degradation of essential amino acids (valine, leucine and isoleucine degradation (22)), cofactors (retinol metabolism (75)), and fatty acids (9) are frequently and significantly down-regulated. Interestingly, two metabolic pathways that are also consistently down-regulated across various tumors are xenobiotic (82) and drug (83) metabolism. These processes are responsible for detoxification and disposal of compounds foreign to the normal biochemistry of the cell. Several previous studies have shown that polymorphisms in cytochrome P450 genes correlate with cancer susceptibility in different types of cancer, including those of the lung, bladder and breast15, 16. Although the specific reasons for the decreased expression of xenobiotic pathways in cancer need to be further investigated, it is possible that this down-regulation contributes to the increased sensitivity of tumor cells to chemotherapies.

The heterogeneous behavior of the oxidative phosphorylation (15) and the TCA cycle (1) pathways is also notable (Fig. 2). Interestingly, oxidative phosphorylation shows the most heterogeneous behavior of all considered metabolic pathways. In brain, colon, kidney, pancreatic and thyroid cancers genes involved in oxidative phosphorylation are significantly down-regulated, whereas in breast, leukemia, lung, lymphoma and ovarian cancers these genes are significantly up-regulated (Supplementary Table 7). This pattern suggests that the role of oxidative phosphorylation is not universal for all tumors, but possibly reflects the adaptation of different cancers to tissue-specific physiological conditions such as hypoxia, nutrient availability, or complement of genetic lesions driving a specific tumor type.

We also explored the heterogeneity of metabolic pathway expression across different samples of the same (or similar) tumor types. Such an analysis (Online Methods and Supplementary Fig. 5) showed that oxidative phosphorylation gene expression is not only heterogeneous between different tumor types, but also frequently varies between samples of the same tumor. This observation suggests that the activity of oxidative phosphorylation is influenced not only by the variability of environments across different tumor types, but also by the specific physiological conditions and/or genetic composition of individual tumors in each cancer patient. In contrast, other metabolic pathways showed similar expression patterns across different samples of the same tumor.

Correlation between metabolic pathways and signaling genes

We next investigated correlations between the expression of metabolic pathways and expression of signaling and regulatory genes frequently involved in tumorigenesis. Although several previous studies17 demonstrated that correlated expression patterns usually cannot be equated with regulation causality, i.e. one gene being regulated by the other, significant correlations could still reveal important functional relationships. To evaluate expression correlations we used the context likelihood of relatedness (CLR) method18, which is based on mutual information between expression patterns and controls for specificity of each discovered relationship (Online Methods), to identify significant interactions (Z-score > 2.0, Supplementary Table 8) between the 214 non-metabolic genes annotated in the KEGG signaling/cancer pathways (Supplementary Table 9) and the 87 metabolic pathways considered in our analysis; significant relationships identified by CLR suggest high mutual information between expression patterns of the corresponding genes and pathways.

The CLR analysis revealed several interesting relationships. The oxidative phosphorylation pathway has high mutual information with the hypoxia-inducible factor (HIF1A) and its negative regulator RBX1. Notably, the oxidative phosphorylation expression is anti-correlated with the expression of HIF1A, and is correlated with the expression of RBX1. The observed anti-correlation between oxidative phosphorylation and HIF1A suggests that the heterogeneity in the expression of oxidative phosphorylation genes (Fig. 2) is likely to be influenced by tumor oxygen availability. In addition, the mutual information between HIF1A and glycolysis is not high, likely because HIF1A is involved in the up-regulation of glycolysis specifically under hypoxia, although many tumors show a strong expression of oxidative phosphorylation and may not be hypoxic. In contrast, there is significant mutual information between glycolysis and CDC42, a gene essential for cell cycle progression. Glycolysis is also strongly correlated with expression of RAS and genes from the MAPK pathway which have been previously implicated in promoting aerobic glycolysis19. Apart from glycolysis, CDC42 expression has also high mutual information with other pathways essential for fast cellular growth (such as purine, pyrimidine, and amino acids biosynthesis). On the other hand, CDC42 expression is not correlated with the expression of oxidative phosphorylation, suggesting that in fast growing tumor cells glucose fermentation dominates. This observation agrees with the results of the principal component analysis described below.

Principal component analysis of pathway expression changes

Individual metabolic pathways do not function in isolation. In contrast, they display highly correlated and interdependent patterns of gene expression. Therefore, we used principal component analysis (PCA)20 to better understand the joint behavior of metabolic pathways in cancer (Table 1). To reduce noise associated with heterogeneous expression of individual pathways, we considered the expression changes in the space of nine meta-pathways representing major metabolic processes (Online Methods). Combined, the first three principal components were able to capture approximately 85% of the meta-pathway expression variance.

Table 1
Principal component analysis (PCA) of gene expression in major metabolic processes. The PCA was performed using the average expression changes of genes forming nine metapathways representing major biochemical processes. The pathway weights indicate the ...

The first principal component accounts for ~62% of the variance in the meta-pathway expression changes. As all pathway weights for this component have the same sign and similar values, it represents an approximately uniform shift in the overall expression of metabolic genes. The projection of cancer samples onto the plane defined by the first and second principal components (Supplementary Fig. 6) shows that tumors of the digestive system (colon, kidney and liver) have high positive shift along this component, suggesting an overall decrease in metabolic gene expression. In contrast, other tumors (for example, cervix and lymphomas) show an overall increase in metabolic gene expression. It is likely that the observed shifts along the first principal component reflect, at least to some extent, the loss of specific metabolic functions required by the corresponding normal tissues, and the switch to a metabolic program primarily focused on growth and proliferation. This may account for the overall decrease in expression of metabolic genes observed in tumors of the gastrointestinal system that normally have high metabolic gene expression unique to these differentiated tissues.

Shifts along the second component, explaining ~16% of the expression variance, involve a change in the expression of glycolysis and nucleotide biosynthesis with a concomitant opposite change in the expression of three catabolic pathways. Because an increased rate of nucleotide biosynthesis is especially important during ribosome biogenesis and chromosomal duplication, our results suggest that dividing cells appear to increasingly rely on glycolysis. Oxidative phosphorylation is also associated with the second component, although with a significantly smaller weight than glycolysis (0.21 versus 0.65). Consequently, along this component glycolysis occurs concurrently with oxidative phosphorylation. Shifts along the third principal component, explaining ~7% of the variance, primarily involve a strong change in the expression of oxidative phosphorylation with a concomitant opposite change in nucleotide biosynthesis. Consequently, a strong up-regulation of oxidative phosphorylation along this component is likely to be associated with slower growth rates.

Individual biochemical reactions and isoenzymes

Next, we focused on expression changes associated with individual biochemical reactions, which form the most basic level in hierarchical organization of the human metabolic network. We used 2,307 reactions, each associated with at least one known enzyme (Online Methods) in a model of human metabolism21. In the human metabolic network and in the networks of other organisms22, a given biochemical reaction is frequently catalyzed by several different isoenzymes. Isoenzymes may be encoded by separate genes or arise from splice variants of the same gene. In the network model we used21, ~30% of metabolic reactions contain at least two known isoenzymes, and this percentage is even higher (~40%) for the reactions of central carbon metabolism. Different kinetic and regulatory properties of isoenzymes are often fine-tuned to meet specific metabolic requirements of various human tissues22. Owing to metabolic demands and constraints different from those of native tissues, it is likely that tumors might preferentially express isoenzymes that facilitate survival and uncontrolled proliferation23, 24.

The heterogeneity of isoenzyme expression across tumors is apparent from the analysis of central metabolism (Fig. 3). Although genes encoding glycolytic enzymes are frequently up-regulated in tumors, some isoenzymes are down-regulated in specific cancers. Gene expression is significantly increased for key enzymes of the pentose phosphate pathway (PPP), including both the oxidative and non-oxidative branches. Lactate dehydrogenase (LDH) is also strongly up-regulated, consistent with the high level of lactate production observed in many tumors25. Frequent down-regulation of the PDH complex likely contributes to a decreased flux of pyruvate into the TCA cycle observed in many tumors. The enzymes essential for purine and pyrimidine synthesis, as well as the glutathione synthetase (GSS)26, are strongly up-regulated. Although glutaminase (GLS) - an enzyme important for the TCA cycle anaplerosis - is generally down-regulated, it has been demonstrated that this enzyme is strongly up-regulated post-transcriptionally by the MYC-mediated suppression of miR-23a/b27. Notably, a recent study28 suggested that an alternative route for the glutamine-to-glutamate transformation, perhaps through nucleotide biosynthesis amidotransferases, may play an important role in the glutamine-dependent anaplerosis. This hypothesis is consistent with a strong up-regulation of the corresponding enzymes (PPAT and CAD) across tumors.

Figure 3
Tumor-induced mRNA expression changes for individual biochemical reactions in central metabolism. (a) Each metabolic reaction is marked with the number of tumors (out of 22 considered in our analysis) in which at least one isoenzyme catalyzing the corresponding ...

We investigated changes in relative isoenzyme expression using the Kullback-Leibler (KL) divergence (Online Methods); the KL divergence is an information-theoretic measure used to quantify the difference between two probability distributions. For each biochemical reaction the KL divergence was used to measure shifts in the distribution of isoenzyme expression between tumors and corresponding normal tissues. This analysis demonstrated that, on average, the relative expression patterns of isoenzymes are about two times more similar for different samples of identical normal tissues than for different samples of identical tumors (Fig. 4a). But more importantly, both of these distances are significantly smaller than the average distance between isoenzyme expression patterns in tumors and corresponding normal tissues (Mann-Whitney U test P-value < 2*10−16). This suggests that for many biochemical reactions neoplastic transformation leads to a significant shift in the relative expression of isoenzymes.

Figure 4
Cancer-induced changes in relative isoenzyme expression. (a) The Kullback-Leibler (KL) divergence was used to characterize differences in the relative expression of isoenzymes for all biochemical reactions with multiple isoenzymes. Colors represent distributions ...

The human aldolase is a notable example of an enzyme with perturbed expression patterns in tumors (Fig. 4b). The enzyme has three main isoforms A, B and C. Although aldolase A (ALDOA) is preferentially expressed in muscle cells, it is also strongly expressed in most other human tissues. Aldolase B (ALDOB) is preferentially expressed in the liver and aldolase C (ALDOC) in the brain. The expression analysis shows that the expression of ALDOA, relative to the other aldolase isoenzymes, significantly increases in tumors. Notably, ALDOA is also highly expressed in developing embryos29, and therefore may be particularly suitable for metabolic requirements during fast cell division. Indeed, kcat value for ALDOA is significantly higher than that of the other isoenzymes30.

Another example of an enzyme with perturbed expression patterns is aconitase. Our analysis suggests that the citrate efflux from the TCA cycle is likely to be enhanced in cancers by frequent down-regulation of the mitochondrial isoform of aconitase (ACO2) (Fig. 3a,b). Cytosolic citrate is used to generate acetyl-CoA, an important precursor required for many biosynthetic reactions involving lipogenesis31. Inhibition of the mitochondrial aconitase in normal human tissues32 and yeast33 was previously shown to significantly increase the TCA citrate efflux. The strong up-regulation of the ATP citrate lyase (ACL) across tumors (Fig. 3) provides additional support for the idea that these changes promote fatty acid biosynthesis in tumors. A recent study showed that an important route for the synthesis of lipogenic acetyl-CoA under hypoxia34 is through reductive metabolism of α-ketoglutarate by cytosolic isocitrate dehydrogenase (IDH1) and cytosolic aconitases (ACO1/ACO3). This pathway is also supported by the observed expression patterns because in contrast to the mitochondrial aconitase, the cytoplasmic aconitases are frequently up-regulated in specific cancers (Fig. 3a), and similar patterns are observed for IDH1 (see below).

To identify specific isoenzymes with frequently perturbed expression profiles, we calculated, for each isoenzyme in every biochemical reaction, the number of tumors in which the fractional expression of the isoenzyme among all isoenzymes catalyzing the same reaction is significantly up-regulated (Online Methods). After correcting for multiple hypothesis testing (Online Methods), 919 isoenzymes were relatively up-regulated in at least one tumor type, and 322 were up-regulated in more than 25% of the 22 tumor types considered in our analysis (Supplementary Table 12).

Expression of enzymes with recurrent tumor mutations

We next investigated expression changes for metabolic genes with known tumor-associated mutations. Recent sequencing studies have identified recurrent mutations in several genes associated with the TCA cycle35, 36. Heterozygous somatic mutations in two isoenzymes of isocitrate dehydrogenase (cytoplasmic IDH1 and mitochondrial IDH2) are frequently detected in gliomas and acute myeloid leukemia (AML). These gain-of-function mutations affect the IDH active site, and make it possible for the mutated enzymes to catalyze the conversion of α-ketoglutarate to 2-hydroxyglutarate (2HG), which has been proposed to promote cancer development37. Our analysis reveals that IDH1 and IDH2 isoenzymes are frequently up-regulated in cancers (Fig. 3b), but the expression of the other isocitrate dehydrogenase isoenzyme IDH3 (not commonly mutated in tumors) is not significantly perturbed. A detailed analysis of IDH expression across cancers (Supplementary Table 13) demonstrated that the up-regulation P-values of IDH1 and IDH2 for the three brain cancers and AML are among the five most significant of all considered tumor types. Recent sequencing efforts38 also demonstrated the presence of similar IDH active site mutations in peripheral T-cell lymphoma, another tumor in our study with significant up-regulation of IDH expression (Supplementary Table 13).

Germline and somatic loss-of-function mutations in fumarate hydratase (FH) and three subunits of succinate dehydrogenase (SDHB, SDHC, SDHD) are also observed in several tumors including renal cell carcinoma (RCC)36, 39. These deleterious mutations lead to the accumulation of the metabolites fumarate and succinate that regulate hypoxia-inducible factor (HIF) protein levels and chromatin state to influence tumor growth40, 41. We found that the SDH subunits (SDHB, SDHC, SDHD) and FH are strongly down-regulated specifically in RCC (Supplementary Tables 14 and 15). The only cancer in our analysis with a more significant down-regulation is colorectal cancer, in which decreased expression of SDH was reported previously42. Although no somatic mutation in SDH or FH has been observed in colorectal cancer43, 44, the significant decrease in their expression, similar to deleterious mutations in other tumors, is likely to cause mitochondrial efflux of the tumor-promoting TCA cycle intermediates and contribute to tumorigenesis.

Analysis of TCA cycle metabolites in colon cancer

To confirm our computational prediction about the TCA cycle intermediates in colon cancer, we measured and analyzed concentrations of specific metabolites from 10 colon cancer patients. The metabolite levels were obtained using GC/MS or LC/MS (Online Methods) and contained matched tumor and normal samples from each patient.

Consistent with significant down-regulation of oxidative phosphorylation pathway genes (Wilcoxon signed-rank test P-value = 10−9, Supplementary Table 7) and down-regulation of the PDH complex (P-value = 0.02) that controls the majority of glucose carbon flux into the TCA cycle, there is a significant decrease in the citrate concentration (Wilcoxon signed-rank test P-value = 0.01) in tumor samples and a concomitant increase in the lactate concentration (P-value = 0.001). Despite a large decrease in the citrate concentration (average decrease ~65%, median ~90%), the average concentration of a downstream metabolite succinate is only 33% lower than normal and the average concentration of fumarate is more than 50% higher than in normal samples (P-value = 0.03). This pattern of concentration changes is consistent with the significant down-regulation of the FH and SDH enzymes observed in colon cancer expression profiles. Such a down-regulation should lead to a significant increase, relative to the available citrate, in the concentration of their substrates fumarate and succinate. Notably, it was previously demonstrated40 that even a small increase in fumarate concentration is enough to stabilize HIF1A by inhibition of the α-ketoglutarate-dioxygenases regulating its degradation41. The average increase in fumarate concentration (~50%) was about half of the amount observed previously for bi-allelic deletions of the FH enzyme (~90%)40. For four patients in our samples, fumarate concentration was >50% higher than in matched normal samples and for three it was >100% higher. Consequently, the expression changes we observed should mimic the effects of cancer-associated heterozygous FH mutations in a substantial fraction of colon cancer patients.


Reprogramming of the metabolic network is now considered to be a hallmark of neoplastic transformation2. An overarching conclusion of our study is that cancer-induced changes in the expression of metabolic genes are very heterogeneous across different tumor types, i.e. there is no uniform metabolic transformation associated with all tumors. We observe heterogeneous behavior at all levels of biochemical organization, from global expression patterns to metabolic pathways to individual reactions and corresponding isoenzymes. The heterogeneous behavior of cancer metabolism is reminiscent of the high variability observed between tumors in terms of genetic and expression changes in signaling and regulatory pathways3.

Notably, despite the heterogeneity between cancers, the metabolic expression changes associated with individual tumors are not random. On the contrary, many of the observed changes are reproducible in independent samples of the same tumors. We can discern several principles unifying the observed tumor-induced expression perturbations. First, tumors often retain a significant imprint of the metabolic expression patterns present in the corresponding native tissues. This may be a consequence of similar local environments or indicate a relative rigidity of the metabolic expression program established in the original tissue. Such behavior is conceptually similar to the minimization of metabolic adjustments (MOMA) principle observed in microbial metabolism following genetic perturbations45. Second, a large fraction of the variance in the expression of major biochemical processes can be rationalized in terms of several principal components, representing important expression modes for key metabolic processes. Although, in agreement with physiological studies14, 46, we do not observe universal up- or down-regulation for genes associated with oxidative phosphorylation, the collective expression changes along the second and third principal components suggest that fast-growing cells increasingly rely on glucose fermentation. Third, we find that many hundreds of isoenzymes show significant and tumor-specific expression changes. A substantial fraction of these changes are likely to be functionally important and, at least in some cases, mimic (as in the case of SDH and FH) or possibly enhance (as in the case of IDH) the effects of recurrent tumor-promoting genetic mutations.

Beyond understanding of tumor-induced expression changes, we believe that our analysis has important implications for the development of anticancer therapeutics. Functionally important isoenzymes with cancer-specific expression changes can potentially serve as drug targets. The possibility of targeting specific isoenzymes, such as GLS18 and PKM27, has already been demonstrated, but our analysis suggests that many other potential targets may be pursued in a similar way. Due to the tumor-specific nature of the observed expression patterns, such targeting will require a focused analysis and understanding of essential metabolic transformations in each specific cancer type.


Microarray expression datasets

Published gene expression datasets were assembled from the GEO9 and ArrayExpress10 databases (Supplementary Table 1). Unless specified otherwise, we analyzed only expression data obtained using the most comprehensive human expression array platform (HG U133 Plus 2.0; Supplementary Table 2). For calculations involving global network properties and comparisons of expression data between different studies (Fig. 1), samples from all datasets were processed together. For all other calculations, tumor and normal samples from the same study were processed together. The affyQCReport package from Bioconductor ( was used to search for poor quality chips. For GeneChip arrays that passed Quality Control (QC) checks, we used the GCRMA algorithm47 from Bioconductor to perform quantile normalization and extract gene expression values on the log2 scale.

Calculation of differential expression (DE) for metabolic genes

Separately for each dataset, the Bioconductor method limma48, which is based on a modified t-statistic, was used to analyze differences between tumor samples and corresponding normal samples. Using the method we calculated the differential expression for each metabolic gene on the log2 scale. The differential expression P-values were adjusted for multiple hypothesis testing using Benjamini and Hochberg’s method49, controlling False Discovery Rate (FDR) at 5%.

Calculation of the global divergence between a pair of expression profiles

Two different measures of divergence between a pair of expression profiles were used in our study: (1) The Euclidean distance, RMSD=i=1n[Average(log2xi)Average(log2yi)]2n, where xi and yi are the expression of gene i over two expression profiles with p and q samples (x1, x2 ,…, xp ), (y1, y2 ,…, yq ), n = 1421 is the number of genes assigned to at least one metabolic pathway in the KEGG database, (2) The correlation-based distance dcor = 1–r(Average log 2 x), Average(log2 y)), where r is the Spearman’s rank correlation coefficient between average log2 expression values of corresponding genes in the two expression profiles.

When comparing datasets across different studies it is important to consider batch effect arising due to variations in laboratory conditions and measurements. To explore and address batch effect, we collected a set of microarray expression data for the same tissues/tumor types from multiple independent studies (Supplementary Table 4). All samples in Supplementary Table 4 were processed and normalized together. To estimate the influence of the batch effect, we calculated d1, the average expression distance between tumors (Tumorn) and corresponding normal tissues (Normaln) measured in different studies, and d2, the average expression distance between tumors (Tumorn) and corresponding normal tissues (Normaln) measured in the same studies. The difference ( d1d2 ) represents the average batch effect due to comparisons across different studies. To account for the batch effect the difference ( d1d2 ) was subtracted from all expression distances calculated between different studies.

Identification of metabolic pathways with significant expression changes

We used two different approaches to identify metabolic pathways with significant expression changes. The two approaches resulted in very similar results. In the first approach, which was used for all calculations presented in the paper, for each gene a, we calculated its expression change in tumor sample i relative to the corresponding normal samples, ΔEai=log2xaiAverage(log2ya), where xai is the expression in tumor sample i, and ya is the expression in the s corresponding normal samples (y1 ,y2 ,…, ys ). Wilcoxon signed-rank test of ΔE for all genes within a metabolic pathway was then used to determine the significance of up- or down- regulation of the pathway in that tumor sample. In the second approach, for each gene a, we calculated the z-score of its expression in tumor sample i relative to the distribution of its expression values in the s corresponding normal samples, zai=(log2xaiAverage(log2ya))σ(log2ya), where σ(log2ya) is the standard deviation. Wilcoxon signed-rank test of z was then used to determine the significance of up- or down- regulation of each pathway in that tumor sample. The pathway expression heterogeneity based on the second approach is shown in Supplementary Fig. 4c.

Statistical significance of the observed metabolic pathway expression patterns

We used randomized expression data to assess the statistical significance of the reported pathway expression patterns (Fig. 2). To generate the null distributions for the (n¯+m¯) and (n¯m¯) values we used the real expression data and randomly permuted metabolic gene labels while preserving the pathway sizes. We then calculated the null distributions using the same procedure as the one applied to the real data. Supplementary Fig. 3 shows the null distributions for the 10 top-regulated pathways (pathways with highest (n¯+m¯) values in Fig. 2) based on 1000 random permutations of the expression data.

Pathway expression heterogeneity across tumor samples of the same tumor type

To investigate the pathway expression heterogeneity across tumor samples of the same tumor type, we introduced a pathway-specific heterogeneity metric H=nm(n+m), where n is the fraction of tumor samples of a certain tumor type in which the pathway is significantly up-regulated, and m is the fraction of samples in which the pathway is significantly down-regulated. According to this definition, for high H values the pathway shows consistent expression changes across different samples of the same tumor type, i.e. the pathway is mostly up-regulated or mostly down-regulated. On the other hand, for small H values the pathway expression is variable, i.e. in some tumor samples the pathway is significantly up-regulated, while in other samples of the same tumor type the pathway is significantly down-regulated. The distribution of H values across 22 tumor types or 16 tumor types of different tissue-of-origin for 10 top-regulated pathways (pathways with highest (n+m) values), is shown in Supplementary Fig. 5. The 16 tumors types of different tissue-of-origin were obtained from 22 tumor types by considering samples of the three types of brain cancers, the two types of breast cancers and the four types of lymphomas together, respectively.

Calculation of significant relationships between metabolic pathway expression and expression of non-metabolic cancer/signaling genes

The context likelihood of relatedness (CLR) method18 is based on mutual information between expression profiles. CLR was used to identify significant relationships between 214 non-metabolic cancer/signaling genes annotated in the KEGG database and the 87 KEGG metabolic pathways (Supplementary Table 5). The set of 214 non-metabolic cancer/signaling genes was assembled using the following two criteria: (1) genes either from the 14 KEGG cancer or 25 KEGG signaling pathways (see Supplementary Table 9), and (2) not within any of the 87 KEGG metabolic pathways. For each gene a in each tumor sample i, the expression change ΔEai was calculated. And the mutual information (MI) between each non-metabolic cancer/signaling gene i and each metabolic pathway j was calculated across all tumor samples in our study: MIij=MI(ΔEi,anjΔEanj), where nj is the number of genes within the a pathway j. All mutual information values were computed using 10 bins of ΔE ; the calculated values were not sensitive to the exact number of bins used. The CLR interaction Z-score for each gene i and pathway j pair Zij=(zi2+zj2)2 was calculated using (I) the z-score (zi ) of MIij relative to the distribution of {MIi,1, MIi,2,…,MIi,87 }, and (II) the z-score ( zj ) of MIij relative to the distribution of {MI1,j,MI2,j,…,MI214,j }. In Supplementary Table 8 we show the identified significant relationships (with Z-score > 2.0) for each metabolic pathway.

Principal component analysis

The nine meta-pathways used for the principal component analysis were compiled by combining genes from corresponding metabolic pathways in the KEGG and BioCyc databases. To perform the principal component analysis (PCA), we calculated the p-by-q matrix D for tumor-to-normal expression changes of the meta-pathways, where p = 466 (the total number of tumor samples in our study) and q = 9 (the number of meta-pathways). We used two different approaches to calculate D. The two approaches resulted in very similar principal components. In the first approach, the (i, j)-element of the matrix Dij is the average gene-specific expression changes in tumor sample i across nj genes within meta-pathway j: Dij=a=1njΔEainj. In the second approach, Dij is the average gene-specific z-scores: Dij=a=1njzainj. Principal components were then obtained using the covariance method, i.e. we first centered the columns of D by subtracting the column means, and then calculated a covariance matrix based on D. The covariance matrix was then diagonalized and the eigenvectors and eigenvalues were calculated.

The results of the PCA analysis based on the first approach are shown in Table 1 and the results based on the second approach in Supplementary Table 10. We also explored the influence of genes shared between meta-pathways on the PCA results. The results obtained when all overlapping genes were excluded (Supplementary Table 11) were very similar to the results with all meta-pathway genes.

Human metabolic annotations used for isoenzyme expression analysis

The human metabolic network compiled by Duarte et al.21 was used for isoenzyme expression analysis. The network contains 1496 genes, 2712 compartment-specific metabolites, and 3743 internal and exchange reactions. In the analysis we used 2307 network reactions that are associated with at least one known metabolic gene. Proteins that are responsible for catalysis of identical reactions and are not members of the same complex were considered as isoenzymes. In total, the network by Duarte et al. contains 667 metabolic reactions with at least two isoenzymes.

Calculation of distances between isoenzyme expression patterns

The Kullback-Leibler (KL) divergence was used to quantify the changes in the relative expression of isoenzymes for pairs of expression profiles. For each sample, the fractional expression of a particular isoenzyme i was first calculated fi=xiinxi (n is the number of isoenzymes catalyzing the reaction and xi is the expression value of the isoenzyme i). The flexmix package in R was then used to estimate the Kullback-Leibler divergence between the discrete distributions {m(f1),m(f2),…,m(fn)} and {g(f1),g(f2),…,g(fn)}, where m(f) and g(f) are the averages of the two expression profiles over p and q samples (x1 ,x2 ,…, xp ), (x1 ,x2 ,…, xq ).

Identification of isoenzymes preferentially expressed in specific tumors

For each considered isoenzyme we used the non-parametric Mann-Whitney U test to determine the significance of its fractional expression changes in the tumor samples relative to the normal samples. Specifically, for an isoenzyme i we calculated its fractional expression among all isoenzymes associated with the same reaction: fi=xiinxi, where n is the number of isoenzymes catalyzing the same reaction, and xi is the expression value of the isoenzyme i. We then used the Mann-Whitney U statistic to test the hypothesis that the distribution of fi values for tumor samples associated with a particular cancer type has significantly larger mean than the distribution of fi values for the corresponding normal samples. All the P-values were FDR-adjusted at 5% considering the total number of tested hypothesis, 22704 (1032 isoenzymes times 22 cancer types). The isoenzymes passing the significance threshold (P-value < 0.05) are reported in Supplementary Table 12. We confirmed the isoenzyme results using an independently collected expression data from the TCGA consortium50; for the confirmation we used four tumor types from TCGA (glioblastoma multiforme, breast invasive carcinoma, colon adenocarcinoma, ovarian serous cystadenocarcinoma). Using the same tumor type 70% of the isoenzymes in Supplementary Table 12 showed the same up-regulation behavior in TCGA as in the dataset analyzed in the paper.

Statistical significance and multiple hypothesis testing

For pathway and isoenzyme calculations involving multiple hypothesis testing, all the corresponding P-values were adjusted with the BH procedure49 (using the multtest package in R) to control the false discovery rate (FDR) at 0.05. The FDR-corrected P-values were used to analyze statistical significances, and unless specified otherwise, significance was reported for the adjusted P-value < 0.05.

Quantitative metabolite profiling of TCA cycle intermediates in colon cancer. Sample collection and metabolite extraction

Tumors and surrounding grossly normal-appearing tissues were obtained from 10 colon cancer patients after surgical treatment. The excised tissues were immediately stored at −80°C. Samples were extracted and prepared for analysis using Metabolon’s standard solvent extraction method. The extracted samples were split into equal parts for analysis on the GC/MS and LC/MS platforms.


The samples destined for GC/MS analysis were re-dried under vacuum desiccation for a minimum of 24 hours prior to being derivatized under dried nitrogen using bistrimethyl-silyltriflouroacetamide (BSTFA). The GC column was 5% phenyl and the temperature ramp is from 40° to 300° C in a 16 minute period. Samples were analyzed on a Thermo-Finnigan Trace DSQ fast-scanning single-quadrupole mass spectrometer using electron impact ionization.


The LC/MS portion of the platform was based on a Waters ACQUITY UPLC and a Thermo-Finnigan LTQ mass spectrometer, which consisted of an electrospray ionization (ESI) source and linear ion-trap (LIT) mass analyzer. The sample extract was split into two aliquots, dried, then reconstituted in acidic or basic LC-compatible solvents, each of which contained 11 or more injection standards at fixed concentrations. One aliquot was analyzed using acidic positive ion optimized conditions and the other using basic negative ion optimized conditions in two independent injections using separate dedicated columns. Extracts reconstituted in acidic conditions were gradient eluted using water and methanol both containing 0.1% Formic acid, while the basic extracts, which also used water/methanol, contained 6.5 mM Ammonium Bicarbonate. The MS analysis alternated between MS and data-dependent MS2 scans using dynamic exclusion.

Data extraction and compound identification

The data extraction of the raw mass spec data files yielded information that could be loaded into a relational database and manipulated without resorting to BLOB manipulation. Once in the database the information was examined and appropriate QC limits were imposed. Peaks were identified using Metabolon’s proprietary peak integration software, and component parts were stored in a separate and specifically designed complex data structure. TCA cycle intermediates were identified by comparison to library entries of purified standards. The combination of chromatographic properties and mass spectra gave an indication of a match to the specific compound or an isobaric entity. The collected metabolite data is presented in Supplementary Table 16.

Figure 5
Concentration changes for measured metabolites of the TCA cycle. The metabolite data, obtained from 10 colon cancer patients, contained matched normal and tumor samples. Every point in the figure represents the log2 ratio of tumor-to-normal concentration ...

Supplementary Material



We would like to sincerely thank the members of the Vitkup laboratory for discussions. This work was supported by US National Institutes of Health grant GM079759 to D.V. and US National Centers for Biomedical Computing grant U54CA121852 to Columbia University. J.W.L. is supported by an NIH Pathway to Independence Award R00CA168997. J.H.B. is supported by an Ellison Medical Foundation New Scholar award AG-NS-0577-09, a National Institute of Environmental Health Sciences grant R01ES019319, and New Development Funds from the Fred Hutchinson Cancer Research Center. M.G.V.H. acknowledges support from the Burroughs Wellcome Fund, the Damon Runyon Cancer Research Foundation, the Smith Family, and the National Cancer Institute.


AUTHOR CONTRIBUTIONS J.H. performed computational research and data analysis, interpreted the results and wrote the manuscript. J.W.L. and M.G.V.H. interpreted the results and edited the manuscript. L.C.C. contributed to the interpretation of the results. J.O. and K.S. contributed tissue samples. J.H.B. designed and supervised experimental research and data analysis. D.V. designed the study, supervised the project, interpreted the results and wrote the manuscript. All authors read and approved the manuscript.

COMPETING FINANCIAL INTERESTS Authors declare no competing financial interests. However, M.G.V.H. is a consultant and SAB member, and L.C.C. is a consultant and founder of Agios Pharmaceuticals.


1. Warburg O, Posener K, Negelein E. On the metabolism of carcinoma cells. Biochem Z. 1924;152:309–344.
2. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144:646–674. [PubMed]
3. Vogelstein B, Kinzler KW. Cancer genes and the pathways they control. Nat Med. 2004;10:789–799. [PubMed]
4. Vander Heiden MG, Cantley LC, Thompson CB. Understanding the Warburg effect: the metabolic requirements of cell proliferation. Science. 2009;324:1029–1033. [PMC free article] [PubMed]
5. Deberardinis RJ, Sayed N, Ditsworth D, Thompson CB. Brick by brick: metabolism and tumor cell growth. Curr Opin Genet Dev. 2008;18:54–61. [PMC free article] [PubMed]
6. Hsu PP, Sabatini DM. Cancer cell metabolism: Warburg and beyond. Cell. 2008;134:703–707. [PubMed]
7. Anastasiou D, et al. Pyruvate kinase M2 activators promote tetramer formation and suppress tumorigenesis. Nat Chem Biol. 2012;8:839–847. [PMC free article] [PubMed]
8. Le A, et al. Glucose-independent glutamine metabolism via TCA cycling for proliferation and survival in B cells. Cell Metab. 2012;15:110–121. [PMC free article] [PubMed]
9. Barrett T, et al. NCBI GEO: archive for functional genomics data sets--10 years on. Nucleic Acids Res. 2011;39:D1005–1010. [PMC free article] [PubMed]
10. Parkinson H, et al. ArrayExpress update--from an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Res. 2009;37:D868–872. [PMC free article] [PubMed]
11. Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010;38:D355–360. [PMC free article] [PubMed]
12. Glazko G, Mushegian A. Measuring gene expression divergence: the distance to keep. Biol Direct. 2010;5:51. [PMC free article] [PubMed]
13. Romero P, et al. Computational prediction of human metabolic pathways from the complete human genome. Genome Biol. 2005;6:R2. [PMC free article] [PubMed]
14. Koppenol WH, Bounds PL, Dang CV. Otto Warburg’s contributions to current concepts of cancer metabolism. Nat Rev Cancer. 2011;11:325–337. [PubMed]
15. Smith CA, Moss JE, Gough AC, Spurr NK, Wolf CR. Molecular genetic analysis of the cytochrome P450-debrisoquine hydroxylase locus and association with cancer susceptibility. Environ Health Perspect. 1992;98:107–112. [PMC free article] [PubMed]
16. Khedhaier A, et al. Implication of Xenobiotic Metabolizing Enzyme gene (CYP2E1, CYP2C19, CYP2D6, mEH and NAT2) polymorphisms in breast carcinoma. BMC Cancer. 2008;8:109. [PMC free article] [PubMed]
17. Friedman N. Inferring cellular networks using probabilistic graphical models. Science. 2004;303:799–805. [PubMed]
18. Faith JJ, et al. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 2007;5:e8. [PubMed]
19. Gillies RJ, Robey I, Gatenby RA. Causes and consequences of increased glucose metabolism of cancers. J Nucl Med. 2008;49(Suppl 2):24S–42S. [PubMed]
20. Jolliffe IT. Principal component analysis. Edn. 2nd Springer; New York: 2002.
21. Duarte NC, et al. Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc Natl Acad Sci U S A. 2007;104:1777–1782. [PubMed]
22. Lehninger AL, Nelson DL, Cox MM. Lehninger principles of biochemistry. Edn. 5th W.H. Freeman; New York: 2008.
23. Diaz-Ruiz R, Uribe-Carvajal S, Devin A, Rigoulet M. Tumor cell energy metabolism and its common features with yeast metabolism. Biochim Biophys Acta. 2009;1796:252–265. [PubMed]
24. Christofk HR, et al. The M2 splice isoform of pyruvate kinase is important for cancer metabolism and tumour growth. Nature. 2008;452:230–233. [PubMed]
25. Walenta S, Schroeder T, Mueller-Klieser W. Lactate in solid malignant tumors: potential basis of a metabolic classification in clinical oncology. Curr Med Chem. 2004;11:2195–2204. [PubMed]
26. Estrela JM, Ortega A, Obrador E. Glutathione in cancer biology and therapy. Crit Rev Clin Lab Sci. 2006;43:143–181. [PubMed]
27. Gao P, et al. c-Myc suppression of miR-23a/b enhances mitochondrial glutaminase expression and glutamine metabolism. Nature. 2009;458:762–765. [PMC free article] [PubMed]
28. Cheng T, et al. Pyruvate carboxylase is required for glutamine-independent growth of tumor cells. Proc Natl Acad Sci U S A. 2011;108:8674–8679. [PubMed]
29. Asaka M, et al. Alteration of aldolase isozymes in serum and tissues of patients with cancer and other diseases. J Clin Lab Anal. 1994;8:144–148. [PubMed]
30. Kusakabe T, Motoki K, Hori K. Human aldolase C: characterization of the recombinant enzyme expressed in Escherichia coli. J Biochem. 1994;115:1172–1177. [PubMed]
31. Swinnen JV, Brusselmans K, Verhoeven G. Increased lipogenesis in cancer cells: new players, novel targets. Curr Opin Clin Nutr Metab Care. 2006;9:358–365. [PubMed]
32. Costello LC, Franklin RB. Testosterone and prolactin regulation of metabolic genes and citrate metabolism of prostate epithelial cells. Horm Metab Res. 2002;34:417–424. [PMC free article] [PubMed]
33. Evans CT, Scragg AH, Ratledge C. A comparative study of citrate efflux from mitochondria of oleaginous and non-oleaginous yeasts. Eur J Biochem. 1983;130:195–204. [PubMed]
34. Metallo CM, et al. Reductive glutamine metabolism by IDH1 mediates lipogenesis under hypoxia. Nature. 2012;481:380–384. [PMC free article] [PubMed]
35. Dang L, Jin S, Su SM. IDH mutations in glioma and acute myeloid leukemia. Trends Mol Med. 2010;16:387–397. [PubMed]
36. Frezza C, Pollard PJ, Gottlieb E. Inborn and acquired metabolic defects in cancer. J Mol Med. 2011;89:213–220. [PMC free article] [PubMed]
37. Dang L, et al. Cancer-associated IDH1 mutations produce 2-hydroxyglutarate. Nature. 2009;462:739–744. [PMC free article] [PubMed]
38. Zou Y, et al. IDH1 and IDH2 mutations are frequent in Chinese patients with acute myeloid leukemia but rare in other types of hematological disorders. Biochem Biophys Res Commun. 2010;402:378–383. [PubMed]
39. Isaacs JS, et al. HIF overexpression correlates with biallelic loss of fumarate hydratase in renal cancer: novel role of fumarate in regulation of HIF stability. Cancer Cell. 2005;8:143–153. [PubMed]
40. Koivunen P, et al. Inhibition of hypoxia-inducible factor (HIF) hydroxylases by citric acid cycle intermediates: possible links between cell metabolism and stabilization of HIF. J Biol Chem. 2007;282:4524–4532. [PubMed]
41. Hewitson KS, et al. Structural and mechanistic studies on the inhibition of the hypoxia-inducible transcription factor hydroxylases by tricarboxylic acid cycle intermediates. J Biol Chem. 2007;282:3293–3301. [PubMed]
42. Habano W, et al. Reduced expression and loss of heterozygosity of the SDHD gene in colorectal and gastric cancer. Oncol Rep. 2003;10:1375–1380. [PubMed]
43. Bass AJ, et al. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nat Genet. 2011;43:964–968. [PMC free article] [PubMed]
44. Wood LD, et al. The genomic landscapes of human breast and colorectal cancers. Science. 2007;318:1108–1113. [PubMed]
45. Segre D, Vitkup D, Church GM. Analysis of optimality in natural and perturbed metabolic networks. Proc Natl Acad Sci U S A. 2002;99:15112–15117. [PubMed]
46. Zu XL, Guppy M. Cancer metabolism: facts, fantasy, and fiction. Biochem Biophys Res Commun. 2004;313:459–465. [PubMed]
47. Wu Z, Irizarry RA. Preprocessing of oligonucleotide array data. Nat Biotechnol. 2004;22:656–658. author reply 658. [PubMed]
48. Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3 Article3. [PubMed]
49. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate - a Practical and Powerful Approach to Multiple Testing. J Roy Stat Soc B Met. 1995;57:289–300.
50. Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature. 2009;458:719–724. [PMC free article] [PubMed]