Background
Genome-wide association studies (GWAS) with metabolic traits and metabolome-wide association studies (MWAS) with traits of biomedical relevance are powerful tools to identify the contribution of genetic, environmental and lifestyle factors to the etiology of complex diseases. Hypothesis-free testing of ratios between all possible metabolite pairs in GWAS and MWAS has proven to be an innovative approach in the discovery of new biologically meaningful associations. The p-gain statistic was introduced as an ad-hoc measure to determine whether a ratio between two metabolite concentrations carries more information than the two corresponding metabolite concentrations alone. So far, only a rule of thumb has been applied to determine the significance of the p-gain.
Results
Here we explore the statistical properties of the p-gain through simulation of its density and by sampling of experimental data. We derive critical values of the p-gain for different levels of correlation between metabolite pairs and show that B/(2*α) is a conservative critical value for the p-gain, where α is the level of significance and B the number of tested metabolite pairs.
Conclusions
We show that the p-gain is a well defined measure that can be used to identify statistically significant metabolite ratios in association studies and provide a conservative significance cut-off for the p-gain for use in future association studies with metabolic traits.
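As a rough illustration of how the cut-off stated in the Results might be applied, the sketch below computes the conservative critical value B/(2*α) and compares a p-gain against it. It assumes that the p-gain is the smaller of the two single-metabolite p-values divided by the p-value of the ratio, which the abstract itself does not spell out. The function names and the example numbers are hypothetical.

```python
def p_gain(p_m1: float, p_m2: float, p_ratio: float) -> float:
    """p-gain, assumed here to be min(p(M1), p(M2)) / p(M1/M2)."""
    return min(p_m1, p_m2) / p_ratio


def critical_p_gain(n_pairs: int, alpha: float = 0.05) -> float:
    """Conservative critical value B / (2 * alpha) given in the abstract."""
    return n_pairs / (2 * alpha)


# Hypothetical example: B = 100 tested metabolite pairs at alpha = 0.05.
threshold = critical_p_gain(100, alpha=0.05)

# Hypothetical association p-values for two metabolites and their ratio.
gain = p_gain(p_m1=1e-6, p_m2=1e-4, p_ratio=1e-12)

# The ratio is called significant only if its p-gain exceeds the threshold.
is_significant = gain > threshold
```

With these made-up inputs the threshold is about 1000 and the p-gain is about 10^6, so the ratio would pass the conservative cut-off.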
doi:10.1186/1471-2105-13-120
PMCID: PMC3537592
PMID: 22672667
p-gain; Metabolomics; MWAS; GWAS; Genome-wide association studies; Metabolome-wide association studies
Given the ever-expanding number of model plant species for which complete genome sequences are available and the abundance of bio-resources such as knockout mutants, wild accessions and advanced breeding populations, there is a rising burden for gene functional annotation. In this protocol, annotation of plant gene function using combined co-expression gene analysis, metabolomics and informatics is provided (Figure 1). This approach is based on the theory of using target genes of known function to allow the identification of non-annotated genes likely to be involved in a certain metabolic process, with the identification of target compounds via metabolomics. Strategies are put forward for applying this information to populations generated by both forward and reverse genetics approaches, although neither of these is effortless. As a corollary, this approach can also be used to characterise unknown peaks representing new or specific secondary metabolites in limited tissues, plant species or stress treatments, which is currently an important endeavour for understanding plant metabolism.
doi:10.3791/3487
PMCID: PMC3476379
PMID: 22733029
Plant Biology; Issue 64; Genetics; Bioinformatics; Metabolomics; Plant metabolism; Transcriptome analysis; Functional annotation; Computational biology; Plant biology; Theoretical biology; Spectroscopy and structural analysis
Biotechnology, including genetic modification, is a very important approach to regulate the production of particular metabolites in plants to improve their adaptation to environmental stress, to improve food quality, and to increase crop yield. Unfortunately, these approaches do not necessarily lead to the expected results due to the highly complex mechanisms underlying metabolic regulation in plants. In this context, metabolomics plays a key role in plant molecular biotechnology, where plant cells are modified by the expression of engineered genes, because we can obtain information on the metabolic status of cells via a snapshot of their metabolome. Although metabolome analysis could be used to evaluate the effect of foreign genes and understand the metabolic state of cells, there is no single analytical method for metabolomics because of the wide range of chemicals synthesized in plants. Here, we describe the basic analytical advancements in plant metabolomics and bioinformatics and the application of metabolomics to the biological study of plants.
doi:10.1007/s11816-011-0191-2
PMCID: PMC3262138
PMID: 22308170
Metabolomics; Mass spectrometry; NMR; Functional genomics; GMO; Quantitative genetics; QTL
Nicholson, George | Rantalainen, Mattias | Li, Jia V. | Maher, Anthony D. | Malmodin, Daniel | Ahmadi, Kourosh R. | Faber, Johan H. | Barrett, Amy | Min, Josine L. | Rayner, N. William | Toft, Henrik | Krestyaninova, Maria | Viksna, Juris | Neogi, Sudeshna Guha | Dumas, Marc-Emmanuel | Sarkans, Ugis | Donnelly, Peter | Illig, Thomas | Adamski, Jerzy | Suhre, Karsten | Allen, Maxine | Zondervan, Krina T. | Spector, Tim D. | Nicholson, Jeremy K. | Lindon, John C. | Baunsgaard, Dorrit | Holmes, Elaine | McCarthy, Mark I. | Holmes, Chris C. | Barsh, Gregory S.
We have performed a metabolite quantitative trait locus (mQTL) study of the 1H nuclear magnetic resonance spectroscopy (1H NMR) metabolome in humans, building on recent targeted knowledge of genetic drivers of metabolic regulation. Urine and plasma samples were collected from two cohorts of individuals of European descent, with one cohort comprised of female twins donating samples longitudinally. Sample metabolite concentrations were quantified by 1H NMR and tested for association with genome-wide single-nucleotide polymorphisms (SNPs). Four metabolites' concentrations exhibited significant, replicable association with SNP variation (8.6×10−11
Author Summary
Physiological concentrations of metabolites—small molecules involved in biochemical processes in living systems—can be measured and used to diagnose and predict disease states. A common goal is to detect and clinically exploit statistical differences in metabolite concentrations between diseased and healthy individuals. As a basis for the design and interpretation of case-control studies, it is useful to have a characterization of metabolic diversity amongst healthy individuals, some of which stems from inter-individual genetic variation. When a single genetic locus has a sufficiently strong effect on metabolism, its genomic position can be determined by collecting metabolite concentration data and genome-wide genotype data on a set of individuals and searching for associations between the two data sets—a so-called metabolite quantitative trait locus (mQTL) study. By so tracing mQTLs, we can identify the genetic drivers of metabolism, characterize how the nature or quantity of the corresponding expressed protein(s) feeds forward to influence metabolite levels, and specify disease-predictive models that incorporate mutual dependence amongst genetics, environment, and metabolism.
doi:10.1371/journal.pgen.1002270
PMCID: PMC3169529
PMID: 21931564
Interpreting quantitative metabolome data is a difficult task owing to the high connectivity in metabolic networks and inherent interdependency between enzymatic regulation, metabolite levels and fluxes. Here we present a hypothesis-driven algorithm for the integration of such data with metabolic network topology. The algorithm thus enables identification of reporter reactions, which are reactions where there are significant coordinated changes in the level of surrounding metabolites following environmental/genetic perturbations. Applicability of the algorithm is demonstrated by using data from Saccharomyces cerevisiae. The algorithm includes preprocessing of a genome-scale yeast model such that the fraction of measured metabolites within the model is enhanced, and thus it is possible to map significant alterations associated with a perturbation even though a small fraction of the complete metabolome is measured. By combining the results with transcriptome data, we further show that it is possible to infer whether the reactions are hierarchically or metabolically regulated. Hereby, the reported approach represents an attempt to map different layers of regulation within metabolic networks through combination of metabolome and transcriptome data.
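The idea of scoring a reaction by coordinated changes in its surrounding metabolites can be sketched as follows. The abstract does not describe the scoring, so the sketch assumes a commonly used aggregated Z-score scheme: each metabolite p-value is converted to a Z-score, and the Z-scores of a reaction's neighbours are summed and divided by the square root of their number. The toy network, the p-values and all function names are hypothetical, and any correction against a background distribution is omitted.

```python
from math import sqrt
from statistics import NormalDist


def z_score(p_value: float) -> float:
    """Convert a p-value (strictly between 0 and 1) into a Z-score."""
    return NormalDist().inv_cdf(1.0 - p_value)


def reaction_score(neighbour_p_values: list) -> float:
    """Aggregated Z-score of the metabolites surrounding one reaction."""
    k = len(neighbour_p_values)
    return sum(z_score(p) for p in neighbour_p_values) / sqrt(k)


# Hypothetical toy network: reaction id -> p-values of its metabolites.
toy_network = {
    "R1": [0.001, 0.002, 0.01],  # neighbours change together
    "R2": [0.5, 0.5],            # neighbours show no change
}

scores = {rxn: reaction_score(ps) for rxn, ps in toy_network.items()}

# Reactions with the highest scores are the reporter candidates.
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy case R1 ranks above R2 because all of its neighbouring metabolites have small p-values.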
doi:10.1038/msb4100085
PMCID: PMC1682015
PMID: 17016516
data integration; genome-scale stoichiometric model; metabolic regulation; quantitative metabolomics; reporter reactions
Background
Current Genome-Wide Association Studies (GWAS) are performed in a single-trait framework without considering genetic correlations between important disease traits. Hence, such GWAS are limited in their ability to discover genetic risk factors with pleiotropic effects.
Results
This work reports a novel data mining approach to discover patterns of multiple phenotypic associations over 52 anthropometric and biochemical traits in KARE and a new analytical scheme for GWAS of multivariate phenotypes defined by the discovered patterns. This methodology was applied to the GWAS for the multivariate phenotype highLDLhighTG, derived from the predicted patterns of the phenotypic associations. The patterns of the phenotypic associations were informative in relating plasma lipid levels to bone mineral density and to a cluster of common traits (obesity, hypertension, insulin resistance) related to Metabolic Syndrome (MS). A total of 15 SNPs in six genes (PAK7, C20orf103, NRIP1, BCL2, TRPM3, and NAV1) were identified as significantly associated with highLDLhighTG. Notably, the significant associations included a missense mutation (PAK7:R335P), a frameshift mutation (C20orf103) and SNPs in splicing sites (TRPM3).
Conclusions
The six genes corresponded to rat and mouse quantitative trait loci (QTLs) that had shown associations with the common traits such as the well characterized MS and even tumor susceptibility. Our findings suggest that the six genes may play important roles in the pleiotropic effects on lipid metabolism and the MS, which increase the risk of Type 2 Diabetes and cardiovascular disease. The use of the multivariate phenotypes can be advantageous in identifying genetic risk factors, accounting for the pleiotropic effects when the multivariate phenotypes have a common etiological pathway.
doi:10.1186/1752-0509-5-S2-S13
PMCID: PMC3287479
PMID: 22784570
Advances in analytical methodologies, principally nuclear magnetic resonance spectroscopy (NMR) and mass spectrometry (MS), during the last decade have made large-scale analysis of the human metabolome a reality. This is leading to the reawakening of the importance of metabolism in human diseases, particularly cancer. The metabolome is the functional readout of the genome, functional genome, and proteome; it is also an integral partner in molecular regulations for homeostasis. The interrogation of the metabolome, or metabolomics, is now being applied to numerous diseases, largely by metabolite profiling for biomarker discovery, but also in pharmacology and therapeutics. Recent advances in stable isotope tracer-based metabolomic approaches enable unambiguous tracking of individual atoms through compartmentalized metabolic networks directly in human subjects, which promises to decipher the complexity of the human metabolome at an unprecedented pace. This knowledge will revolutionize our understanding of complex human diseases, clinical diagnostics, as well as individualized therapeutics and drug response.
In this review, we focus on the use of stable isotope tracers with metabolomics technologies for understanding metabolic network dynamics in both model systems and in clinical applications. Atom-resolved isotope tracing via the two major analytical platforms, NMR and MS, has the power to determine novel metabolic reprogramming in diseases, discover new drug targets, and facilitate ADME studies. We also illustrate new metabolic tracer-based imaging technologies, which enable direct visualization of metabolic processes in vivo. We further outline current practices and future requirements for biochemoinformatics development, which is an integral part of translating stable isotope-resolved metabolomics into clinical reality.
doi:10.1016/j.pharmthera.2011.12.007
PMCID: PMC3471671
PMID: 22212615
Metabolomics; systems biochemistry; stable isotope tracing; drug discovery; metabolic compartmentation and regulation; pathway reconstruction
Xenobiotic metabolism, a ubiquitous natural response to foreign compounds, elicits initiating signals for many pathophysiological events. Currently, most widely used techniques for identifying xenobiotic metabolites and metabolic pathways are empirical and largely based on in vitro incubation assays and in vivo radiotracing experiments. Recent work in our lab has shown that LC-MS-based metabolomic techniques are useful tools for xenobiotic metabolism research since multivariate data analysis in metabolomics can significantly rationalize the processes of xenobiotic metabolite identification and metabolic pathway analysis. In this review, the technological elements of LC-MS-based metabolomics for constructing high-quality datasets and conducting comprehensive data analysis are examined. Four novel approaches of using LC-MS-based metabolomic techniques in xenobiotic metabolism research are proposed and illustrated by case studies and proof-of-concept experiments, and the perspective on their application is further discussed.
doi:10.1080/03602530701497804
PMCID: PMC2140249
PMID: 17786640
Metabolomics; Xenobiotic metabolism; Drug metabolism; LC-MS; Multivariate data analysis
Background
One of the goals of global metabolomic analysis is to identify metabolic markers that are hidden within a large background of data originating from high-throughput analytical measurements. Metabolite-based clustering is an unsupervised approach for marker identification based on grouping similar concentration profiles of putative metabolites. A major problem of this approach is that in general there is no prior information about an adequate number of clusters.
Results
We present an approach for data mining on metabolite intensity profiles as obtained from mass spectrometry measurements. We propose one-dimensional self-organizing maps for metabolite-based clustering and visualization of marker candidates. In a case study on the wound response of Arabidopsis thaliana, based on metabolite profile intensities from eight different experimental conditions, we show how the clustering and visualization capabilities can be used to identify relevant groups of markers.
Conclusion
Our specialized realization of self-organizing maps is well suited to gaining insight into complex pattern variation in a large set of metabolite profiles. In comparison to other methods, our visualization approach facilitates the identification of interesting groups of metabolites by means of a convenient overview of relevant intensity patterns. In particular, the visualization effectively supports researchers in analyzing many putative clusters when the true number of biologically meaningful groups is unknown.
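To make the idea of a one-dimensional self-organizing map more concrete, the following is a minimal generic sketch in plain Python. It is not the specialized realization described above. The learning-rate and neighbourhood schedules, the number of units, the toy intensity profiles and all function names are assumptions chosen for illustration only.

```python
import math
import random


def best_matching_unit(x, weights):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, w)) for w in weights]
    return dists.index(min(dists))


def train_som_1d(profiles, n_units=4, n_epochs=50, seed=0):
    """Train a chain of n_units prototypes on the given profiles."""
    rng = random.Random(seed)
    # Initialise each prototype as a copy of a randomly chosen profile.
    weights = [list(rng.choice(profiles)) for _ in range(n_units)]
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        lr = 0.5 * (1.0 - frac)                          # decaying learning rate
        sigma = max(n_units / 2.0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
        order = list(profiles)
        rng.shuffle(order)
        for x in order:
            bmu = best_matching_unit(x, weights)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood along the one-dimensional chain.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Hypothetical intensity profiles over eight experimental conditions.
rising = [i / 7 for i in range(8)]
falling = rising[::-1]
toy_profiles = [rising, [v + 0.02 for v in rising],
                falling, [v + 0.02 for v in falling]]

som = train_som_1d(toy_profiles, n_units=4)
labels = [best_matching_unit(p, som) for p in toy_profiles]
```

Because the units form a chain, neighbouring prototypes tend to represent similar intensity patterns, which is what makes a one-dimensional ordering convenient to display as a single overview.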
doi:10.1186/1748-7188-3-9
PMCID: PMC2464586
PMID: 18582365
Genome-wide association studies (GWASs) have become a very important tool to address the genetic origin of phenotypic variability, in particular associated with diseases. Nevertheless, these types of studies provide limited information about disease etiology and the molecular mechanisms involved. Recently, the incorporation of metabolomics into the analysis has offered novel opportunities for a better understanding of disease-related metabolic deregulation. The pattern emerging from this work is that gene-driven changes in metabolism are prevalent and that common genetic variations can have a profound impact on the homeostatic concentrations of specific metabolites. A particularly interesting aspect of this work takes into account interactions of environment and lifestyle with the genome and how this interaction translates into changes in the metabolome. For instance, the role of PYROXD2 in trimethylamine metabolism points to an interaction between host and microbiome genomes (host/microbiota). Often, these findings reveal metabolic deregulations, which could eventually be tuned with a nutritional intervention. Here we review the development of gene–metabolism association studies from a single-gene/single-metabolite to a genome-wide/metabolome-wide approach and highlight the conceptual changes associated with this ongoing transition. Moreover, we report some of our recent GWAS results on a cohort of 265 individuals from an ethnically diverse population that validate and refine previous findings on gene–urine metabolism interactions. Specifically, our results confirm the effect of PYROXD2 polymorphisms on trimethylamine metabolism and suggest that a previously reported association of N-acetylated compounds with the ALMS1/NAT8 locus is driven by SNPs in the ALMS1 gene.
doi:10.1007/s12263-012-0313-7
PMCID: PMC3534994
PMID: 23065485
Metabolome wide associations; Genome wide associations; Metabolomics; Nutrition
This review presents an overview of the dynamically developing field of mass spectrometry-based metabolomics. Metabolomics aims at the comprehensive and quantitative analysis of wide arrays of metabolites in biological samples. These numerous analytes have very diverse physico-chemical properties and occur at different abundance levels. Consequently, comprehensive metabolomics investigations are primarily a challenge for analytical chemistry, and mass spectrometry specifically has vast potential as a tool for this type of investigation. Metabolomics requires special approaches for sample preparation, separation, and mass spectrometric analysis. Current examples of those approaches are described in this review. It primarily focuses on metabolic fingerprinting, a technique that analyzes all detectable analytes in a given sample with subsequent classification of samples and identification of differentially expressed metabolites, which define the sample classes. To perform this complex task, data analysis tools, metabolite libraries, and databases are required. Therefore, recent advances in metabolomics bioinformatics are also discussed.
doi:10.1002/mas.20108
PMCID: PMC1904337
PMID: 16921475
metabolomics; metabolic fingerprinting; metabolic profiling; lipidomics; mass spectrometry
High-throughput metabolomic experiments aim at identifying and ultimately quantifying all metabolites present in biological systems. The metabolites are interconnected through metabolic reactions, generally grouped into metabolic pathways. Classical metabolic maps provide a relational context to help interpret metabolomics experiments and a wide range of tools have been developed to help place metabolites within metabolic pathways. However, the representation of metabolites within separate disconnected pathways overlooks most of the connectivity of the metabolome. By definition, reference pathways cannot integrate novel pathways nor show relationships between metabolites that may be linked by common neighbours without being considered as joint members of a classical biochemical pathway. MetExplore is a web server that offers the possibility to link metabolites identified in untargeted metabolomics experiments within the context of genome-scale reconstructed metabolic networks. The analysis pipeline comprises mapping metabolomics data onto the specific metabolic network of an organism, then applying graph-based methods and advanced visualization tools to enhance data analysis. The MetExplore web server is freely accessible at http://metexplore.toulouse.inra.fr.
doi:10.1093/nar/gkq312
PMCID: PMC2896158
PMID: 20444866
Kaddurah-Daouk, Rima | Baillie, Rebecca A. | Zhu, Hongjie | Zeng, Zhao-Bang | Wiest, Michelle M. | Nguyen, Uyen Thao | Wojnoonski, Katie | Watkins, Steven M. | Trupp, Miles | Krauss, Ronald M. | Tse, Herman
Although statins are widely prescribed medications, there remains considerable variability in therapeutic response. Genetics can explain only part of this variability. Metabolomics is a global biochemical approach that provides powerful tools for mapping pathways implicated in disease and in response to treatment. Metabolomics captures net interactions between genome, microbiome and the environment. In this study, we used a targeted GC-MS metabolomics platform to measure a panel of metabolites within cholesterol synthesis, dietary sterol absorption, and bile acid formation to determine metabolite signatures that may predict variation in statin LDL-C lowering efficacy. Measurements were performed in two subsets of the total study population in the Cholesterol and Pharmacogenetics (CAP) study: Full Range of Response (FR), and Good and Poor Responders (GPR). FR were 100 individuals randomly selected from across the entire range of LDL-C responses in CAP. GPR were 48 individuals, 24 each from the top and bottom 10% of the LDL-C response distribution matched for body mass index, race, and gender. We identified three secondary, bacterial-derived bile acids that contribute to predicting the magnitude of statin-induced LDL-C lowering in good responders. Bile acids and statins share transporters in the liver and intestine; we observed that increased plasma concentration of simvastatin positively correlates with higher levels of several secondary bile acids. Genetic analysis of these subjects identified associations between levels of seven bile acids and a single nucleotide polymorphism (SNP), rs4149056, in the gene encoding the organic anion transporter SLCO1B1. These findings, along with recently published results that the gut microbiome plays an important role in cardiovascular disease, indicate that interactions between genome, gut microbiome and environmental influences should be considered in the study and management of cardiovascular disease.
Metabolic profiles could provide valuable information about treatment outcomes and could contribute to a more personalized approach to therapy.
doi:10.1371/journal.pone.0025482
PMCID: PMC3192752
PMID: 22022402
s5-1
Metabolomics has matured over the past 10 years. By combining different platforms, over 2,000 identified metabolites can be screened. At the UC Davis Genome Center Metabolomics Facility, two laboratories work towards advancing methods and providing outreach services: the Fiehn research laboratory and the metabolomics core. We have a combined use of 11 mass spectrometers, for which a range of SOPs and quality controls have been developed for (a) primary metabolism, (b) volatile metabolites, (c) lipidomics, (d) secondary metabolites and (e) metabolic polymers. Over 300 studies have been completed over the past 5 years, which are stored and disseminated via the SetupX study design database and facilitated by the BinBase mass spectrometry repositories. Lipid identifications by nanoESI-ion trap MS/MS are based on Genedata's MS Refiner software and a novel cross-instrument library, the LipidBLAST tool, which stores calculated MS/MS spectra of over 180,000 lipids based on fragmentation patterns of authentic standards. The FiehnLib libraries of over 1,000 primary metabolites authenticate identifications in GC-TOF platforms, in conjunction with BinBase and the Adams volatile MS library. Polymers in biofuel research are assessed by pyrolysis-GC/MS and the MIT-based SpectConnect tool. LC-ion trap, Qtrap and QTOF mass spectrometry are used for determining compounds that are not amenable to one of the above methods, such as cationic metabolites (SAM, betaine, SMM), metabolically active biomarkers (acylcarnitines) and other important metabolic classes (dietary phytochemicals, folates and glucuronides). Despite this progress, metabolomics still faces a number of analytical challenges: the need for accuracy in structural identifications and quantifications, increases in total peak capacities, improved data processing software and the need for standardized database repositories.
Current efforts are presented as well as a discussion on experiences in the dual task of ‘research’ and ‘service’ for metabolomic facilities and how to meet outside expectations and financial constraints.
PMCID: PMC2918200
Induced pluripotent stem cells are different from embryonic stem cells as shown by epigenetic and genomics analyses. Depending on cell types and culture conditions, such genetic alterations can lead to different metabolic phenotypes which may impact replication rates, membrane properties and cell differentiation. We here applied a comprehensive metabolomics strategy incorporating nanoelectrospray ion trap mass spectrometry (MS), gas chromatography-time of flight MS, and hydrophilic interaction- and reversed phase-liquid chromatography-quadrupole time-of-flight MS to examine the metabolome of induced pluripotent stem cells (iPSCs) compared to parental fibroblasts as well as to reference embryonic stem cells (ESCs). With over 250 identified metabolites and a range of structurally unknown compounds, quantitative and statistical metabolome data were mapped onto metabolite networks describing the metabolic state of iPSCs relative to other cell types. Overall, iPSCs exhibited a striking metabolic shift away from parental fibroblasts and toward ESCs, suggestive of near complete metabolic reprogramming. Differences between pluripotent cell types were not observed in carbohydrate or hydroxyl acid metabolism, pentose phosphate pathway metabolites, or free fatty acids. However, significant differences between iPSCs and ESCs were evident in phosphatidylcholine and phosphatidylethanolamine lipid structures, essential and non-essential amino acids, and metabolites involved in polyamine biosynthesis. Together our findings demonstrate that during cellular reprogramming, the metabolome of fibroblasts is also reprogrammed to take on an ESC-like profile, but there are select unique differences apparent in iPSCs. The identified metabolomics signatures of iPSCs and ESCs may have important implications for functional regulation of maintenance and induction of pluripotency.
doi:10.1371/journal.pone.0046770
PMCID: PMC3471894
PMID: 23077522
Mittelstrass, Kirstin | Ried, Janina S. | Yu, Zhonghao | Krumsiek, Jan | Gieger, Christian | Prehn, Cornelia | Roemisch-Margl, Werner | Polonikov, Alexey | Peters, Annette | Theis, Fabian J. | Meitinger, Thomas | Kronenberg, Florian | Weidinger, Stephan | Wichmann, Heinz Erich | Suhre, Karsten | Wang-Sattler, Rui | Adamski, Jerzy | Illig, Thomas | McCarthy, Mark I.
Metabolomic profiling and the integration of whole-genome genetic association data have proven to be powerful tools to comprehensively explore gene regulatory networks and to investigate the effects of genetic variation at the molecular level. Serum metabolite concentrations allow a direct readout of biological processes, and association of specific metabolomic signatures with complex diseases such as Alzheimer's disease and cardiovascular and metabolic disorders has been shown. There are well-known correlations between sex and the incidence, prevalence, age of onset, symptoms, and severity of a disease, as well as the reaction to drugs. However, most of the studies published so far did not consider the role of sexual dimorphism and did not analyse their data stratified by gender. This study investigated sex-specific differences of serum metabolite concentrations and their underlying genetic determination. For discovery and replication we used more than 3,300 independent individuals from KORA F3 and F4 with metabolite measurements of 131 metabolites, including amino acids, phosphatidylcholines, sphingomyelins, acylcarnitines, and C6-sugars. A linear regression approach revealed significant concentration differences between males and females for 102 out of 131 metabolites (p-values<3.8×10−4; Bonferroni-corrected threshold). Sex-specific genome-wide association studies (GWAS) showed genome-wide significant differences in beta-estimates for SNPs in the CPS1 locus (carbamoyl-phosphate synthase 1, significance level: p<3.8×10−10; Bonferroni-corrected threshold) for glycine. We showed that the metabolite profiles of males and females are significantly different and, furthermore, that specific genetic variants in metabolism-related genes depict sexual dimorphism. Our study provides new important insights into sex-specific differences of cell regulatory processes and underscores that studies should consider sex-specific effects in design and interpretation.
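The Bonferroni-corrected threshold quoted for the 131 metabolites can be reproduced with a short calculation. The sketch below assumes a nominal significance level of 0.05, which the abstract does not state explicitly. The function name and the example p-values are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Assumed nominal level of 0.05 divided by the 131 tested metabolites.
threshold = bonferroni_threshold(0.05, 131)
print(f"{threshold:.1e}")  # prints 3.8e-04

# Hypothetical p-values for three metabolites, made up for illustration.
example_p_values = {"metabolite_A": 1e-6, "metabolite_B": 2e-3, "metabolite_C": 3e-4}
significant = [name for name, p in example_p_values.items() if p < threshold]
```

Under this assumption the result agrees with the reported threshold of 3.8×10−4 for the metabolite comparison.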
Author Summary
The combination of genomic and metabolic studies during the last years has provided astonishing results. However, most of the studies published so far did not consider the role of sexual dimorphism and did not analyse their data stratified by sex. The investigation of 131 serum metabolite concentrations of >3,300 population-based samples (KORA F3/F4) revealed significant differences in the metabolite profile of males and females. Furthermore, a genome-wide picture of sex-specific genetic variations in human metabolism (>2,000 subjects from KORA F3/F4 cohorts) was investigated. Sex-specific genome-wide association studies (GWAS) showed differences in the effect of genetic variations on metabolites in men and women. SNPs in the CPS1 (carbamoyl-phosphate synthase 1) locus showed genome-wide significant differences in beta-estimates of sex-specific association analysis (significance level: 3.8×10−10) for glycine. As global metabolomic techniques are more and more refined to identify more compounds in single biological samples, the predictive power of this new technology will greatly increase. This suggests that metabolites, which may be used as predictive biomarkers to indicate the presence or severity of a disease, have to be used selectively depending on sex.
doi:10.1371/journal.pgen.1002215
PMCID: PMC3154959
PMID: 21852955
Background
The goal of metabolomics analyses is a comprehensive and systematic understanding of all metabolites in biological samples. Many useful platforms have been developed to achieve this goal. Gas chromatography coupled to mass spectrometry (GC/MS) is a well-established analytical method in metabolomics studies, and 200 to 500 peaks are routinely observed in one biological sample. However, only ~100 metabolites can be identified, and the remaining peaks are left as "unknowns".
Result
We present an algorithm that acquires more extensive metabolite information. Pearson's product-moment correlation coefficient and the Soft Independent Modeling of Class Analogy (SIMCA) method were combined to automatically identify and annotate unknown peaks, which tend to be missed in routine studies that employ manual processing.
Conclusions
Our data mining system can offer a wealth of metabolite information quickly and easily, and it provides new insights, particularly into food quality evaluation and prediction.
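The correlation step mentioned in the Result section can be illustrated with a plain implementation of Pearson's product-moment correlation coefficient. This sketch does not reproduce the authors' system, and the SIMCA step is not shown. Correlating the intensity of an unknown peak with that of an identified peak across samples is an assumption made for the example, and all names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient of two sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical peak intensities across five samples.
known_peak = [1.0, 2.0, 3.0, 4.0, 5.0]
unknown_peak = [2.1, 3.9, 6.2, 8.0, 9.9]    # tracks the known peak closely
unrelated_peak = [5.0, 1.0, 4.0, 2.0, 3.0]  # no clear relationship

r_unknown = pearson_r(known_peak, unknown_peak)
r_unrelated = pearson_r(known_peak, unrelated_peak)
```

A coefficient close to 1 for the unknown peak would flag it as a candidate for closer inspection, whereas the unrelated peak yields a weak negative coefficient.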
doi:10.1186/1471-2105-12-131
PMCID: PMC3102042
PMID: 21542920
Systems Biology is a field in biological science that focuses on the combination of several or all “omics” approaches in order to find out how genes, transcripts, proteins and metabolites act together in the network of life. Metabolomics, as an analog to genomics, transcriptomics and proteomics, is increasingly integrated into biological studies, and transcriptomic and metabolomic experiments are often combined in one setup. At first glance, the two data types seem to be completely different, but both produce information on biological entities, either transcripts or metabolites. Both types can be overlaid on metabolic pathways to obtain biological information on the studied system. For the joint analysis of both data types, the MassTRIX webserver was updated. MassTRIX is freely available at www.masstrix.org.
doi:10.1371/journal.pone.0039860
PMCID: PMC3391204
PMID: 22815716
Gas chromatography coupled to mass spectrometry (GC-MS) is one of the most widespread routine technologies applied to the large scale screening and discovery of novel metabolic biomarkers. However, currently the majority of mass spectral tags (MSTs) remains unidentified due to the lack of authenticated pure reference substances required for compound identification by GC-MS. Here, we accessed the information on reference compounds stored in the Golm Metabolome Database (GMD) to apply supervised machine learning approaches to the classification and identification of unidentified MSTs without relying on library searches. Non-annotated MSTs with mass spectral and retention index (RI) information together with data of already identified metabolites and reference substances have been archived in the GMD. Structural feature extraction was applied to sub-divide the metabolite space contained in the GMD and to define the prediction target classes. Decision tree (DT)-based prediction of the most frequent substructures based on mass spectral features and RI information is demonstrated to result in highly sensitive and specific detections of sub-structures contained in the compounds. The underlying set of DTs can be inspected by the user and is made available for batch processing via SOAP (Simple Object Access Protocol)-based web services. The GMD mass spectral library with the integrated DTs is freely accessible for non-commercial use at http://gmd.mpimp-golm.mpg.de/. All matching and structure search functionalities are available as SOAP-based web services. An XML + HTTP interface, which follows Representational State Transfer (REST) principles, facilitates read-only access to database entities.
doi:10.1007/s11306-010-0198-7
PMCID: PMC2874469
PMID: 20526350
Metabolic markers; Gas chromatography (GC); Mass spectrometry (MS); GC-MS; Mass spectral classification; Mass spectral matching; Metabolite fingerprinting; Metabolite profiling; Metabolomics; Metabonomics; Decision trees
Mycobacterium vanbaalenii PYR-1 is capable of degrading a wide range of high-molecular-weight polycyclic aromatic hydrocarbons (PAHs), including fluoranthene. We used a combination of metabolomic, genomic, and proteomic technologies to investigate fluoranthene degradation in this strain. Thirty-seven fluoranthene metabolites including potential isomers were isolated from the culture medium and analyzed by high-performance liquid chromatography, gas chromatography-mass spectrometry, and UV-visible absorption. Total proteins were separated by one-dimensional gel and analyzed by liquid chromatography-tandem mass spectrometry in conjunction with the M. vanbaalenii PYR-1 genome sequence (http://jgi.doe.gov), which resulted in the identification of 1,122 proteins. Among them, 53 enzymes were determined to be likely involved in fluoranthene degradation. We integrated the metabolic information with the genomic and proteomic results and proposed pathways for the degradation of fluoranthene. According to our hypothesis, the oxidation of fluoranthene is initiated by dioxygenation at the C-1,2, C-2,3, and C-7,8 positions. The C-1,2 and C-2,3 dioxygenation routes degrade fluoranthene via fluorene-type metabolites, whereas the C-7,8 route oxidizes fluoranthene via acenaphthylene-type metabolites. The major route is C-2,3 dioxygenation, which consists of 18 enzymatic steps via 9-fluorenone-1-carboxylic acid and phthalate, with the initial ring-hydroxylating oxygenase, NidA3B3, oxidizing fluoranthene to fluoranthene cis-2,3-dihydrodiol. Nonspecific monooxygenation of fluoranthene with subsequent O-methylation of dihydroxyfluoranthene also occurs as a detoxification reaction.
doi:10.1128/JB.00128-07
PMCID: PMC1913438
PMID: 17449607
Background
Exposure to environmental tobacco smoke (ETS) leads to higher rates of pulmonary diseases and infections in children. To study the biochemical changes that may precede lung diseases, metabolomic effects on fetal and maternal lungs and plasma from rats exposed to ETS were compared with those in filtered-air control animals. Genome-reconstructed metabolic pathways may be used to map and interpret dysregulation in metabolic networks. However, mass spectrometry-based non-targeted metabolomics datasets often comprise many metabolites for which links to enzymatic reactions have not yet been reported. Hence, network visualizations that rely on current biochemical databases are incomplete and also fail to visualize novel, structurally unidentified metabolites.
Results
We present a novel approach that integrates biochemical pathway and chemical relationships to map all detected metabolites in network graphs (MetaMapp), using the KEGG reactant pair database, Tanimoto chemical similarity scores and NIST mass spectral similarity scores. In fetal and maternal lungs, and in maternal blood plasma from pregnant rats exposed to environmental tobacco smoke (ETS), 459 unique metabolites comprising 179 structurally identified compounds were detected by gas chromatography time of flight mass spectrometry (GC-TOF MS) and BinBase data processing. MetaMapp graphs in Cytoscape showed much clearer metabolic modularity and complete content visualization compared to conventional biochemical mapping approaches. Cytoscape visualization of differential statistics results using these graphs showed that, overall, fetal lung metabolism was more impaired than lung and blood metabolism in dams. Fetuses from ETS-exposed dams showed lower lipid and nucleotide levels and higher amounts of energy metabolism intermediates than control animals, indicating lower biosynthetic rates of metabolites for cell division, structural proteins and lipids that are critical for lung development.
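As an illustration of how a chemical similarity score can link metabolites in a network graph, the following minimal sketch computes the Tanimoto coefficient between set-based fingerprints and keeps pairs at or above a cut-off as edges. It is not the MetaMapp implementation: the function names, the toy fingerprints and the threshold of 0.5 are assumptions chosen only for this example.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared features divided by all features."""
    union = a | b
    if not union:
        return 0.0  # convention for two empty fingerprints (assumption)
    return len(a & b) / len(union)


def similarity_edges(fingerprints: dict, threshold: float) -> list:
    """Return (name_a, name_b, score) for every pair scoring >= threshold."""
    edges = []
    for name_a, name_b in combinations(sorted(fingerprints), 2):
        score = tanimoto(fingerprints[name_a], fingerprints[name_b])
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges


# Hypothetical fingerprints: each integer stands for one structural feature.
toy = {
    "metabolite_A": {1, 2, 3, 4},
    "metabolite_B": {2, 3, 4, 5},
    "metabolite_C": {7, 8},
}

# The threshold is an illustrative assumption, not a value from the study.
edges = similarity_edges(toy, threshold=0.5)
for name_a, name_b, score in edges:
    print(f"{name_a} -- {name_b}: {score:.2f}")
```

In this toy input only metabolite_A and metabolite_B share enough features to be connected; metabolite_C stays an isolated node unless another relationship, such as a reaction pair or a spectral similarity, links it to the graph.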
Conclusions
MetaMapp efficiently visualizes mass spectrometry-based metabolomics datasets as network graphs in Cytoscape and highlights metabolic alterations that can be associated with higher rates of pulmonary diseases and infections in children prenatally exposed to ETS. The MetaMapp scripts can be accessed at http://metamapp.fiehnlab.ucdavis.edu.
doi:10.1186/1471-2105-13-99
PMCID: PMC3495401
PMID: 22591066
Metabolic networks; Enzymatic pathways; Perinatal lung development; Lung surfactants