|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: MEP KRP. Performed the experiments: AZ THP SS. Analyzed the data: AZ THP SS MEP KRP. Wrote the paper: AZ THP MEP KRP.
Type 2 diabetes mellitus (T2DM) is a disorder characterized by both insulin resistance and impaired insulin secretion. Recent transcriptomics studies related to T2DM have revealed changes in expression of a large number of metabolic genes in a variety of tissues. Identification of the molecular mechanisms underlying these transcriptional changes and their impact on the cellular metabolic phenotype is a challenging task due to the complexity of transcriptional regulation and the highly interconnected nature of the metabolic network. In this study we integrate skeletal muscle gene expression datasets with human metabolic network reconstructions to identify key metabolic regulatory features of T2DM. These features include reporter metabolites—metabolites with significant collective transcriptional response in the associated enzyme-coding genes, and transcription factors with significant enrichment of binding sites in the promoter regions of these genes. In addition to metabolites from TCA cycle, oxidative phosphorylation, and lipid metabolism (known to be associated with T2DM), we identified several reporter metabolites representing novel biomarker candidates. For example, the highly connected metabolites NAD+/NADH and ATP/ADP were also identified as reporter metabolites that are potentially contributing to the widespread gene expression changes observed in T2DM. An algorithm based on the analysis of the promoter regions of the genes associated with reporter metabolites revealed a transcription factor regulatory network connecting several parts of metabolism. The identified transcription factors include members of the CREB, NRF1 and PPAR family, among others, and represent regulatory targets for further experimental analysis. Overall, our results provide a holistic picture of key metabolic and regulatory nodes potentially involved in the pathogenesis of T2DM.
Type 2 diabetes mellitus is a complex metabolic disease recognized as one of the main threats to human health in the 21st century. Recent studies of gene expression levels in human tissue samples have indicated that multiple metabolic pathways are dysregulated in diabetes and in individuals at risk for diabetes; which of these are primary, or central to disease pathogenesis, remains a key question. Cellular metabolic networks are highly interconnected and often tightly regulated; any perturbations at a single node can thus rapidly diffuse to the rest of the network. Such complexity presents a considerable challenge in pinpointing key molecular mechanisms and biomarkers associated with insulin resistance and type 2 diabetes. In this study, we address this problem by using a methodology that integrates gene expression data with the human cellular metabolic network. We demonstrate our approach by analyzing gene expression patterns in skeletal muscle. The analysis identified transcription factors and metabolites that represent potential targets for therapeutic agents and future clinical diagnostics for type 2 diabetes and impaired glucose metabolism. In a broader perspective, the study provides a framework for analysis of gene expression datasets from complex diseases in the context of changes in cellular metabolism.
Type 2 diabetes mellitus (T2DM) is emerging as one of the main threats to human health in the 21st century with an estimated 300 million individuals with T2DM by the year 2025 ,. T2DM is characterized by both insulin resistance (as manifested by reduced insulin-stimulated glucose uptake in skeletal muscle and adipose tissue and inappropriately high hepatic glucose output ,) and reduced insulin secretion by pancreatic β-cells ,. Although the specific molecular pathophysiology remains unclear, many risk factors have been identified for T2DM, including family history of diabetes and prominent environmental factors such as alterations in early life development, excessive food intake, obesity, decreased physical activity and aging ,,. At the cellular level, multiple regulatory mechanisms and metabolic pathways may contribute to the pathogenesis of insulin resistance, potentially mediated by alterations in insulin signaling , mitochondrial oxidative metabolism and ATP production –, fatty acid oxidation , or proinflammatory signaling . Similarly, alterations in β-cell development and metabolism  may contribute to decreased insulin secretion.
Available human tissue transcriptome data related to T2DM , provide an opportunity for identification of novel molecular mechanisms underlying the metabolic phenotype of T2DM. This task is challenging due to the need to account for the inherent high connectivity of bio-molecular interaction networks. We have utilized a network-centered methodology to link diabetes-related alterations in gene expression to metabolic hot spots and transcription factors potentially responsible for gene expression changes.
Metabolic phenotypes at a cellular level are essentially characterized by concentrations of metabolites and fluxes through the reactions that make up the metabolic network. Fluxes, in turn, are dependent on metabolite levels, enzyme activities, abundance of effectors and possibly other variables. Measurement of fluxes and metabolite concentrations at the entire metabolic network-scale is, however, a difficult task in humans due to a variety of technological and experimental limitations. By contrast, methods for measurement of expression of genes encoding metabolic enzymes are relatively well-established. Thus, the primary goal of this study is to use informatics approaches to integrate available gene expression data with metabolic networks, in order to predict metabolic phenotypes of skeletal muscle linked to the pathogenesis of type 2 diabetes. Such an approach will help not only to gain insight into the organization of transcriptional regulation in human tissues, but also provide guidance for improved design of experimental strategies for obtaining metabolite and flux data, which can be further integrated into metabolic models.
To achieve these goals, we applied an extension of the algorithm described in  (for various applications of this algorithm see –), which enables identification of so-called reporter metabolites, or metabolic hot spots around which transcriptional regulation is centered (Figure 1A). This analysis is based on the assumption that under most conditions of physiological interest, fluxes through enzymes connected to a metabolite are coordinated in order to maintain physiological homeostasis, or to eventually reach a new (pseudo-) steady state. Moreover, transcriptional regulation of expression of genes encoding critical enzymes in metabolic flux pathways facilitates concordance with the metabolic demands of the cell and corresponding stoichiometric and thermodynamic constraints on fluxes. For this analysis, we used two recently published human metabolic network models: i) Homo sapiens Recon1 , and ii) Edinburgh Human Metabolic Network (EHMN) .
We further hypothesized that the observed coordinated changes around reporter metabolites can be, at least in some cases, attributed to common transcriptional regulatory mechanisms. Specifically, we hypothesize that the neighbor enzymes of reporter metabolites may share one or more transcription factor binding sites in the promoter regions of the corresponding genes. In order to identify such potential regulatory players, we tested promoter sequences of the genes associated with the reporter metabolites for enrichment of known transcription factor binding motifs (Figure 1B). Transcription factors identified in this fashion provide clues to the regulatory mechanisms that lead to observed gene expression changes in the metabolic network.
Since our goal is to identify reporter metabolites and transcription factors potentially involved in diabetes pathogenesis and progression, we analyzed two independent studies of skeletal muscle transcriptomics in individuals with established type 2 diabetes or insulin resistance , (Text S1). In the first study , biopsies were obtained following insulin stimulation from a cohort of 43 Swedish men of Caucasian ethnicity with a spectrum of glucose tolerance, including 17 with normal glucose tolerance (NGT), 8 with impaired glucose tolerance (IGT), and 18 with established T2DM. The second dataset  was derived from a cohort of 15 subjects of Mexican American ethnicity, in whom muscle biopsies were performed in the fasting state. Importantly, this cohort included individuals with not only established diabetes (5 subjects, T2DM), but also individuals with completely normal glucose tolerance but a spectrum of insulin resistance; normal glucose tolerant subjects were subdivided by family history-linked diabetes risk (4 family history positive, more insulin resistant subjects, FH+; and 6 family history negative, more insulin sensitive subjects, FH−). With this approach, the individual contributions of isolated insulin resistance and diabetes risk (in the setting of normoglycemia, FH+), mild elevations in postprandial glucose (IGT), and established diabetes can be individually assessed. Moreover, the possible contribution of family history, potentially mediated by genetics or shared environment, can be assessed. Thus, we predict that analysis of the common patterns resulting from the two datasets will identify regulatory signatures potentially independent of study cohort and design variation but common to the pathophysiology of insulin resistance and diabetes.
In present study, we performed reporter metabolite analysis based on pair-wise comparisons within each dataset; differential expression and its significance were assessed with robust multi-array average (RMA) and empirical Bayes testing. Significance of differential expression for each gene was used as a scoring metric (Materials and Methods). The results are summarized as metabolic signatures (reporter metabolites) and regulatory signatures (transcription factors) for T2DM.
Reporter metabolite analysis for three pair-wise comparisons, viz., T2DM vs NGT, T2DM vs IGT, and IGT vs NGT, revealed significant reporter metabolites (p-value≤0.05) participating in lipid metabolism, TCA cycle, oxidative phosphorylation (OXPHOS) and glycolysis (Table 1, Table 2, Table S1 and Table S2). Among reporter metabolites identified for the T2DM vs NGT comparison were lipid species 1,2-diacyl-sn-glycerol (DAG), acetoacetyl-CoA, and the sphingolipid sphinganine. These are interesting, as prior studies , – have demonstrated that the related lipid molecules diacylglycerols (DAG), long-chain fatty acyl CoAs, and ceramides correlate positively with triglyceride content and inversely with insulin sensitivity  and have been shown to induce insulin resistance . Furthermore, given that saturated fatty acids appear to play a particularly important pathogenic role in insulin resistance , it is interesting that several metabolites of saturated fatty acids (such as hexanoyl-CoA, palmitoyl-CoA, tetradecanoyl-CoA, lauroyl-CoA, decanoyl-CoA and butanoyl-CoA) were found as reporter metabolites with mostly up-regulated neighboring genes in the IGT vs NGT comparison (Table 1 and S2), and thus may serve as potential markers of insulin resistance and IGT.
TCA cycle metabolites citrate and 2-oxoglutarate, with down-regulated neighboring genes, were also uncovered as reporter metabolites in the T2DM vs NGT comparison (Table 1, S1 and S2). These results are concordant with a study of human urine metabolome profiles from patients with T2DM , in which levels of citrate and 2-oxoglutarate were lower in T2DM compared to healthy controls . Among other mitochondrial metabolites, reduced and oxidized forms of cytochrome c and ubiquinol were identified as reporter metabolites (T2DM vs NGT, Table S1) with down-regulated expression of the associated genes.
Impaired glucose tolerance typically reflects an important transition between normoglycemia and overt diabetes, reporter metabolites which are identified in both IGT vs NGT and T2DM vs NGT, but not significantly different in the T2DM vs IGT comparison (e.g. phosphatidylethanolamine, 2-hydroxyglutarate, 2-oxoglutarate, 3′,5′-cyclic AMP, ATP, Table S1 and S2) may be considered novel biomarkers of early-stage glucose intolerance.
We similarly performed reporter metabolite analysis using both Recon1 and EHMN metabolic models in the Mexican-American dataset. This analysis revealed significant transcriptional regulation in metabolite nodes in TCA cycle, oxidative phosphorylation, and lipid metabolism, for both T2DM vs FH− and FH+ vs FH− comparisons (Table 2). Similar to the Swedish Caucasian dataset, metabolites involved in oxidative phosphorylation (e.g. ferrocytochrome c, H+, and fumarate) were among the top-ranking reporter metabolites, identified in both the T2DM vs FH− and FH+ vs FH− comparisons (Table 2, Table S3). Interestingly, urinary levels of fumarate, an important link between the TCA cycle and oxidative phosphorylation, were recently found to be decreased in T2DM patients .
Analysis using the EHMN model revealed TCA cycle-related metabolites, including 3-carboxy-1-hydroxypropyl-ThPP, aconitate, succinyl-CoA, malate and fumarate, as significant reporter metabolites (p-value≤0.05), with mostly down-regulated expression of the genes encoding their neighboring enzymes. Ubiquinol was found as reporter metabolite representative of electron transfer chain. Several molecules within β-oxidation pathways, such as 3-cis-dodecenoyl-CoA, glutaryl-CoA, trans-3-decenoyl-CoA, 3-methylbutanoyl-CoA and 3-methylcrotonyl-CoA, as well as in amino acid (leucine, lysine) metabolism were also identified as reporters (Table 2, Table S4). Moreover, glutamate, glycerol derivatives, phosphocreatine, a number of hormone derivatives and many others (Table S3 and S4) were found as significant reporter metabolites in the T2DM vs FH− comparison.
In order to determine the extent of overlap between the two study populations, we performed a cluster analysis of the pair-wise comparisons within the Swedish and Mexican-American datasets (Figure 2). Jaccard distance metric between two pair-wise comparisons (e.g. T2DM vs FH− and FH+ vs FH−) was calculated based on the overlap of reporter metabolites between the two comparisons. Jaccard distance provides a measure of dissimilarity between two sets of reporter metabolites, and is quantified as the fraction of non-overlapping reporter metabolites between the two sets. While similar clustering patterns were observed (Figure 2A and Figure S1A) independent of the use of either EHMN or Recon1 metabolic model, Swedish and Mexican-American studies clustered separately, perhaps related to differences in study population, study design (e.g. fasting studies in Mexican-Americans, insulin-stimulated studies in Swedish) or differences in microarrays used (thus differing in the coverage of metabolic enzymes). We observed substantial overlap between the T2DM vs FH− and FH+ vs FH− comparisons, suggesting that insulin resistance patterns could contribute to these findings.
We next examined the overlap of reporter metabolites between the two case studies (Figure 2B, Figure S1B, Table S5 and Table S6). Owing to differences in the metabolite-gene connectivity between EHMN and Recon1, the number of overlapping reporter metabolites is generally higher for the EHMN analysis. To a large extent, this difference is due to the groups of metabolites in EHMN that share the same gene neighbors (whether two metabolites share the same gene neighbors depends not only on the network used, i.e. number of distinct biochemical reactions associated with a particular enzyme, but also on the coverage of genes on the particular microarray chip used). In addition to many other metabolites, phosphocreatine appeared as a significant reporter in both case studies, viz., for T2DM vs NGT and T2DM vs FH− comparisons. Phosphocreatine is an important energy reservoir metabolite in skeletal muscle, and defects in recovery of phosphocreatine have been identified in vivo in humans with insulin resistance  and diabetes . Interestingly, low levels of urinary creatine have also been found in patients with T2DM .
In order to link the identified reporter metabolites to regulatory pathways controlling gene expression, we hypothesized that enzymes associated with reporter metabolites would be regulated by common transcription factors. As potential candidates subjected to such regulation, we selected all reporter metabolites with at least 5 up- or down-regulated neighboring genes (Materials and Methods). Up- and down-regulated gene sets were then analyzed separately in order to assess whether their promoter regions were enriched for known transcription factor binding sequence motifs. P-values for enrichment were estimated by using a hypergeometric test, which compared the proportion of promoters from a given gene set containing a particular motif with the frequency of occurrence of that motif in promoter regions of all other metabolic genes. Correction for multiple-testing was done by using q-value  and motifs with q-value≤0.05 were considered as significantly enriched.
In accord with our hypothesis, several transcription factor binding sites were overrepresented in the promoter regions of the enzymes associated with reporter metabolites. A summary of the main results from this analysis is illustrated in Figure 3A. Many transcription factors were found to be common across the two case studies (Figure 3B), albeit in connection with different reporter metabolites. PPAR family motifs (PPARγ and PPARα:RXRα) were enriched in seven downregulated enzyme sets including ATP. Tax/CREB motifs were enriched in promoters of downregulated enzymes associated with ATP, ADP and phosphate. Additional down-regulated neighbors of ATP were enriched for the binding sites of NF-κB, MEF-2, UF1-H3β, Pax-9 and NKX6.2, while the NRF-1 motif was enriched in the set of up-regulated enzymes neighboring ADP. Another potential regulatory signature was identified around the down-regulated neighbors of phosphatidylinositol and phosphatidylinositol 4,5-bisphospate (important phospholipids which participate in insulin and other signaling reactions), which were significantly enriched for binding sites of p53, PPARγ, SRF, SEF-1, v-Jun, GCNF, AR and many others (Table S7). These and other highly connected reporter metabolites in the metabolite-TF network (Figure 3A) demonstrate the concept that associated metabolic pathways can be transcriptionally regulated in multiple ways in response to environmental stimuli or metabolic perturbation.
Maintenance of whole-body glucose metabolism is reliant on a delicately balanced dynamic interaction between tissue sensitivity to insulin (including muscle, adipose and liver) and insulin secretion ,. Unfortunately, the molecular mechanisms responsible for diabetes risk remain unknown. A key metabolic phenotype associated with insulin resistance in humans is inappropriate lipid accumulation in tissues outside of adipose tissue, suggesting defects in fatty acid uptake, synthesis, and/or oxidation. With lipid excess and/or impaired oxidation, as observed in obesity and/or inactivity, flux of long-chain acyl CoAs (LC-CoA) may be redirected into cytosolic lipid species such as diacylglycerols (DAG), triacylglycerols (TG) and ceramides (derivatives of sphingosine and fatty acid metabolism)  that are correlated with reductions in insulin signaling and insulin resistance , –,. Whether alterations in mitochondrial oxidative function in humans with insulin resistance and diabetes contribute to, or are a consequence of these defects, remains unclear .
Recognizing these important gaps in our knowledge of diabetes pathophysiology, we have integrated transcriptomic data with metabolic networks to systematically identify, in an unbiased fashion, regulatory hot spots (reporter metabolites and associated transcription factors) associated with insulin resistance and T2DM. Our reporter metabolite results provide evidence for transcriptional dysregulation of multiple metabolic pathways in skeletal muscle. Interestingly, many of the reporter metabolites identified in our analysis have been appreciated in prior experimental studies in animal models (metabolites with italic font in Tables 1, ,22 and S8). A bird's-eye view of selected metabolic and regulatory nodes identified in our study is depicted in Figure 4.
In conditions of overnutrition and physical inactivity, availability of cellular fatty acids stimulate ligand–dependent PPARα/δ transcription factors which, in turn, induce transcription of genes responsible for β-oxidation ,. Metabolic byproducts of incomplete β-oxidation, such as acylcarnitines and reactive oxygen species, may accumulate in mitochondria and contribute to insulin resistance . Interestingly, our analysis identified enrichment of PPAR family transcription factor binding motifs in T2DM as compared with insulin sensitive subjects, in both the Swedish and Mexican-American datasets (T2DM vs NGT and T2DM vs FH−, respectively). Moreover, reporter analysis revealed lipid metabolites (Table S1), known to be natural ligands of PPARγ (prostaglandins) .
Another reporter metabolite identified in our analysis is diacylglycerol (DAG), a lipid signaling molecule known to inversely correlate with insulin sensitivity , –,. Our results suggest that perturbations in DAG levels may be accompanied by changes in the adjacent CDP-Choline branch of the Kennedy pathway of phospholipid metabolism (Figure 4). Thus, DAG could potentially affect insulin sensitivity via activation of serine/threonine kinases or alterations in phospholipid membrane composition, both of which could lead to defects in insulin signaling, reduced insulin-stimulated glucose uptake, and glycogen synthesis – key metabolic features of diabetes  (Figure 4). Together, identification of these lipid-linked regulatory motifs and reporter metabolites known to be involved in type 2 diabetes pathogenesis provides further support for the validity of our approach.
Using our approach we found several reporter metabolites from the TCA cycle (citrate, 2-oxoglutarate, succinyl-CoA, fumarate and malate) (Figure 4). The down-regulated genes associated with these metabolites support the idea that TCA cycle and/or oxidative phosphorylation flux is reduced in diabetes . It is also interesting that ATP is one of the reporter metabolites, as the majority of cellular ATP is generated via respiration. Moreover, significant enrichment of binding motif for NF-κβ in the upregulated ATP neighbors is consistent with the potential role of this transcription factor in mediating oxidative stress responses triggered by by-products of incomplete β-oxidation . Another interesting finding is the enrichment of CREB family and NRF-1 motifs in enzymes associated with ATP and ADP. These results corroborate the role of CREB as an indirect regulator of nuclear-encoded oxidative phosphorylation genes via PGC1-α and other regulators linked to nuclear-encoded mitochondrial genes (Figure 4) ,,.
The appearance of highly connected metabolites, such as ATP and NADH, among top-ranking reporter metabolites provides a possible link to the observed network-wide transcriptional changes in IGT and T2DM. Cellular levels of these co-factors are usually constrained within relatively narrow ranges to maintain thermodynamic stability. Oxidative phsophorylation, which is connected to TCA cycle flux via succinate and fumarate, accounts for most of the ATP (and NADH) turnover in a respiring cell. Our results suggest reduction in the activity of both TCA cycle and oxidative phosphorylation, in agreement with recent NMR data demonstrating that mitochondrial ATP synthesis is reduced in humans with insulin resistance –. Another major source of ATP and NADH production in the cell is glycolysis. Reporter metabolites representative of glycolysis (glucose, glucose-6-phosphate, glucose-1-phosphate and pyruvate) also exhibited concordant down-regulation of the neighboring genes.
The concordance between the changes in gene expression levels for glycolysis, TCA cycle and oxidative phosphorylation in IGT and T2DM suggests that transcriptional regulatory mechanisms may be a response to altered levels of ATP/NADH. Such response may achieve two purposes: (1) regulation of metabolism on global scale, as these co-factors are critical components of many metabolic pathways, and (2) regulation of NADH levels may help in reducing excessive (and potentially deleterious) oxidative stress resulting from sustained oxidation of excessive nutrients . Although the way such regulatory control is mechanistically linked to the corresponding metabolites cannot be deduced from the gene expression data alone, there are several examples where metabolite co-factors are directly involved in regulating gene expression, e.g. NADH(/+) dependent regulation of genes in gram-positive bacteria , yeast – and human ,. NAD+ dependent changes in gene expression levels could also be mediated by the action of PGC-1α and SIRT1 complex, which have important roles in regulation of glucose homeostasis . Additional regulatory links, between glycolytic flux, energy metabolism, TCA cycle flux and fatty acid metabolism are also known in other eukaryotic systems such as baker's yeast –. Furthermore, several of the enzymes from central carbon metabolism may be regulated to a large extent at the post-transcriptional level ,. Parallels of such regulatory circuits in human cells may be discovered in the future with the here-identified transcription factors (Table S7) as one of the starting points.
Metabolites involved in protein and lipid glycosylation were found as reporters and characterized by down-regulation of neighboring enzymes (Table S2). Alterations in glycosylation may ultimately cause misfolding of several proteins, a feature previously associated with over-nutrition in hepatocytes . Another reporter metabolite, shared by T2DM vs NGT and T2DM vs FH− comparison, is trichloroethanol, a metabolite in the cytochrome P450-mediated pathway derived from trichlorethene . Although tricholoethanol or tricholoethene is not an endogenous metabolite in human tissues, it appears that the expression of the cytochrome P450 is altered in T2DM. Interestingly, experimental evidence shows that mouse exposure to trichlorethene leads to PPARα activation and the reprogramming of gene expression, resulting in induction of enzymes mediating β- and ω-oxidation of fatty acids, and increased expression of genes involved in lipid metabolism , a pattern similar to the T2DM metabolic phenotype .
The identification of reporter metabolites from glycolysis and energy-generation pathways suggests that there may be regulation of certain physiological parameters, such as glucose uptake, at the transcriptional level of the corresponding metabolic pathways. To investigate the extent of such possible regulation, we calculated Pearson correlation coefficients between insulin sensitivity (as measured by either whole-body glucose uptake during the hyperinsulinemic euglycemic clamp or insulin levels achieved during the OGTT) and mean centroid expression levels of genes surrounding reporter metabolites (Swedish dataset) (Materials and methods). A significant linear correlation with whole-body glucose uptake was observed for several reporter metabolites. In most cases, the correlation was significant only for one of the conditions (NGT, IGT or T2DM). For example, significant correlation of transcriptional regulation around dUDP with glucose uptake was found only for NGT samples (Figure 5A). It appears that this potential connection is de-linked under IGT and T2DM conditions. Another example is 1-Phosphatidyl-1D-myo-inositol 3-phosphate (Figure 5B), where significant correlation is observed with insulin level only for IGT. Further investigation of the causal mechanisms behind these observed correlation patterns may help in elucidating the regulatory role of the reporter metabolites in diabetes pathogenesis.
A key scientific and clinical challenge is to identify molecular markers of diabetes risk, not only to better understand disease pathophysiology, but also to develop novel therapies for prevention and treatment of established diabetes. In this context, it is interesting that our analysis identified both PPARγ and its potential lipid ligands as regulatory molecules, since PPARγ ligand thiazolidinediones are currently employed as effective therapy for diabetes. We hypothesize that some transcriptional pathways identified in the current analysis, including CREB, NRF-1 and SRF, may be additional novel molecular mediators of the transcriptomic phenotype associated with insulin resistance, and thus potential targets for future intervention strategies. Of course, the potential roles of these pathways will require additional testing in cultured cells and animal models, where their impact on metabolic flux and insulin sensitivity can be fully assessed.
Similarly, reporter metabolites identified in our analysis represent molecules likely to be involved in human skeletal muscle insulin resistance phenoytpes and also novel candidate biomarkers of insulin resistance and diabetes risk. In support of this hypothesis, several of the identified metabolites have known physiological roles in T2DM (Table S8 and Discussion above). Additional molecules have been analyzed either in rodents and/or in other tissues (Table S8) and thus, their appearance as reporter metabolites also strongly implicates their involvement in insulin resistance in human skeletal muscle. Some of the novel metabolites identified in our analysis, including glycolytic and fatty acid oxidation intermediates, are known targets of metformin, a compound effective for diabetes therapy and prevention (Figure 4). We also identified an interesting link between DAG, a reporter metabolite for T2DM, and the CDP-choline branch of the Kennedy pathway of phospholipid metabolism (Figure 4). This pathway has been implicated in cancer development and is being established as anti-tumor drug target ,. Changes in phospholipid metabolism are known to affect the properties of cellular membranes, and subsequently signaling through membrane proteins. Further investigation of the role of phospholipids in T2DM pathogenesis may provide clues to some of the missing links that connect metabolic flux changes with insulin signaling in skeletal muscle cells.
Supplementary tables S1, S2, S3, S4 list additional reporter metabolites which are, to our knowledge, not (directly) linked with any of the known metabolic players in T2DM. Our analysis nevertheless suggests them as potential nodes of disruption or as biomarkers. Measurement of the intramyocellular concentration of the reporter metabolites in patients with diabetes risk may help to confirm the role of these metabolites in insulin resistance.
A particularly interesting finding from our analysis is the identification of highly connected metabolites as reporters, including ATP/ADP and NAD+/NADH. We hypothesize that diverse environmental and genetic risk factors result in insulin resistance when individuals are unable to mediate appropriate compensatory transcriptional and metabolic responses in other parts of the network connected by these hubs. Our results also suggest that alterations in gene expression linked to the highly connected co-factors are likely to be acquired features of established T2DM. Analysis of the transcriptional activity of CREB in the context of ATP concentrations and TCA cycle activity in skeletal muscle may help to elucidate regulatory mechanisms leading to these changes.
Reconstructed human metabolic network models are still evolving, incomplete, and subject to error. Well-annotated pathways such as central carbon metabolism are thereby likely to be over-represented in the reporter analysis. In order to partially compensate for this limitation, we used two reconstructions – Recon1 and EHMN. As network reconstructions will become more complete, it will be possible to better assess the extent of this limitation. Another essential input to our algorithm, in addition to metabolic network, is gene expression data for the genes represented in the network. We would like to note that neither EHMN nor Recon1 network genes were fully represented by the microarray chips used in the two case studies (Text S1). Only 54% and 39% genes from the Recon1 and EHMN, respectively, were represented on the chips used in Mexican-American case study, while this coverage was 85% and 60% in Swedish case study. Interestingly, re-analysis of the Swedish Male dataset by using only a subset of genes from the HG-U133A chip that were represented also on the HuGeneFL (used in Mexican-American case study) showed a large overlap between the two reporter metabolite sets thus obtained (86% for T2DM vs NGT comparison and 69% for the rest). The details of this analysis, together with relevant statistical considerations, can be found in Text S1.
Although the present analysis identified common metabolic and regulatory signatures across the two studies, there are several differences in the study designs, and therefore the results must be regarded with certain caution. In addition to relatively low number of subjects in Mexican-American study, the differences include fasting state biopsies in Mexican-American study vs post insulin stimulation biopsies in Swedish study. Furthermore, the age and BMI (Body Mass Index) of the individuals participating in the two studies were different and may contribute to the differences in the observed gene expression patterns. To our knowledge, these two case studies represent the only human skeletal muscle transcriptome datasets that were available at the time of here reported computational analysis. Analysis of new datasets which may become available in the future will be useful in obtaining further insight into molecular physiology of skeletal muscle in the context of T2DM. Moreover, emergence of better or new gene expression analysis tools will help to cover parts of metabolic network that are currently inaccessible due to the lack of data.
Extension of the analysis to discover more global regulatory patterns by using additional bio-molecular interaction data  such as protein-DNA and protein-protein interactions will definitely be an important step in obtaining a higher resolution picture of T2DM metabolic phenotypes. Availability of such interaction data at the high confidence level of metabolic interactions is the current major bottleneck. Another essential extension of the methodology will require the use of thermodynamic data for metabolic reactions –. Moreover, since mRNA levels do not necessarily correlate with the protein levels, incorporation of the proteomics data together with the thermodynamic data will allow more accurate interpretation of the reporter metabolites in terms of implications for flux and concentration changes.
We demonstrate the use of a network-guided data integration approach to discover key, physiologically relevant metabolic and regulatory nodes in T2DM pathogenesis. The methodology does not require the use of a priori disease-specific knowledge regarding the involvement of specific pathways or metabolites, thereby making it a robust and unbiased analytical framework for studying diseases linked to perturbations in the cellular metabolic network. Our results identify the highly connected metabolites ATP and NAD+ as reporters and potential mediators of the widespread changes in gene expression linked to insulin resistance in muscle. Moreover, our results extend previous knowledge about T2DM pathogenesis at the gene expression level – by reporting additional potential sites of disruption, e.g., TCA cycle and Kennedy pathway of phospholipid metabolism. Several metabolites from other pathways were also found to display significant differential gene expression of the genes around them and we suggest putative regulatory mechanisms behind these alterations. Our results suggest a framework of metabolic disruption observed with insulin resistance and diabetes, which can be used to test the role of specific pathways in mediating disease pathophysiology, and more practically, for the identification of potential biomarkers for preventive and therapeutic monitoring.
Two datasets used in the study were obtained from the Diabetes Genome Anatomy Project website (http://www.diabetesgenome.org). Brief comparison of microarray platforms from the experimental studies , used in the current work is presented in the Text S1. Promoter sequences for all genes were obtained from the Ensembl Biomart (http://www.ensembl.org/biomart). The transcriptional start sites (TSSs) were identified based on the annotation of the Ensembl Biomart sequences. Sequences in the −800 to 200 base pairs region of the TSS were deemed as promoter regions for the analysis. Interspersed repeats and low complexity DNA sequences were masked out.
Two reconstructions of human metabolic network, viz., Recon1  and EHMN  were used in this study. The Homo Sapiens Recon1 is a comprehensive literature-based metabolic network reconstruction that accounts for the functions of 1496 ORFs, 2004 proteins, 2766 metabolites and 3311 metabolic and transport reactions. The ENMN (Edinburgh Human Metabolic Model) is a network reconstructed by integrating genome annotation information from different databases and metabolic reaction information from the literature. The network contains nearly 3000 metabolic reactions, which were reorganized into about 70 human-specific pathways according to their functional relationships. The two models mainly differ in the coverage of reactions and in the accounting of compartmentalization and inter-organelle transport reactions.
Preprocessing of the gene expression data was carried out by using the statistical software environment – R (www.r-project.org). The probe intensities were obtained and corrected for background by using robust multi-array average method (RMA)  with only perfect-match (PM) probes. Normalization was performed by using the quantiles algorithm. Gene expression values were calculated from the PM probes with the median polish summarization method . All data preprocessing methods were used by invoking them through the affy package  by using rma function. Significance of the differential expression was calculated by using the empirical Bayes test . The probe-sets were grouped into genes, and to each gene the differential expression was defined by choosing the value from the top level probe-set (using the probe-set rank defined by Affymetrix). In case of more than one probe-set present at the top level, the median value was used.
Each metabolite in the metabolic network was scored based on the scores of its k neighbor enzymes (i.e. enzymes catalyzing reactions involving that metabolite, either as a substrate or as a product). Each enzyme was assigned with a p-value for differential expression based on the p-value of the gene encoding for that enzyme. In case of isozymes and enzyme-complexes, genes with most significant expression change were used to score the enzyme (Figure 1). P-values of genes pi, indicating the significance of differential expression, were converted to Z-scores Zi by using the inverse normal cumulative distribution function (CDF) (): . All metabolite nodes were assigned a Z-score, Zmetabolite, calculated as aggregated Z scores of the k neighbor enzymes: . Zmetabolite scores were then corrected for the background distribution by subtracting the mean (μk) and dividing by the standard deviation (σk) of the aggregated Z scores derived by sampling 10000 sets of k enzymes from the network: . Corrected Z-scores were then transformed to p-values by using CDF. Metabolites with p-values less than 0.05 were deemed as reporter metabolites. Detailed information on the reporter scoring can be found in the Text S1 and .
For all reporter metabolites, we assessed enrichment of known protein-binding sequence motifs in the promoter regions (−800 to 200 base pairs relative to the transcription start site) of the corresponding neighbor genes. In order to obtain robust results, we only considered sets consisting of at least 5 up- or down-regulated genes. For each reporter metabolite, the sequences of all enzyme neighbors were used as the positive sequence set, whereas all other enzymes in the network model were used as the negative (background) set. Known motifs were identified by using position frequency matrices of all known motifs stored in the TRANSFAC database . The motif enrichment analysis tool ASAP  was used to scan all TRANSFAC motif matrices against the positive sequence sets of each reporter metabolite. The negative sequence sets were used together with 2nd order background model. A one-tailed Fisher's exact test was used to assess per-sequence over-representation of any known motif, and the threshold used to calculate significance for each TRANSFAC matrix was set to 70% of the highest-scoring sequence motif. The q-value cut-off criteria  was used as a post-data measure of statistical significance of all motifs found to be significantly enriched.
Supporting text describing scoring methodology and datasets.
(0.74 MB PDF)
Reporter metabolites for Swedish male dataset (Recon1).
(0.05 MB XLS)
Reporter metabolites for Swedish male dataset (EHMN).
(0.07 MB XLS)
Reporter metabolites for Mexican-American dataset (Recon1).
(0.03 MB XLS)
Reporter metabolites for Mexican-American dataset (EHMN).
(0.05 MB XLS)
Overlapping reporter metabolites between two case studies (Recon1).
(0.05 MB XLS)
Overlapping reporter metabolites between two case studies (EHMN).
(0.06 MB XLS)
Results of the motif enrichment analysis.
(0.03 MB PDF)
Experimentally studies linking metabolite levels to T2DM pathophysiology
(0.12 MB PDF)
Hierarchical clustering of pair-wise comparisons within the Swedish male and Mexican-American datasets based on the overlapping reporter metabolites (EHMN network).
(0.21 MB TIF)
We are thankful to Isabel Rocha for a feedback on the manuscript. We thank Anders Krogh for advising on the choice of motif analysis method. We are grateful to the reviewers for useful comments.
The authors have declared that no competing interests exist.
MEP appreciates grant support from NIH grants DK062948 and DK060837 and the Graetz Fund. AZ acknowledges support from NOVO scholarship program 2008-9. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.