Genome-wide association studies (GWAS) with metabolic traits and metabolome-wide association studies (MWAS) with traits of biomedical relevance are powerful tools to identify the contribution of genetic, environmental and lifestyle factors to the etiology of complex diseases. Hypothesis-free testing of ratios between all possible metabolite pairs in GWAS and MWAS has proven to be an innovative approach in the discovery of new biologically meaningful associations. The p-gain statistic was introduced as an ad-hoc measure to determine whether a ratio between two metabolite concentrations carries more information than the two corresponding metabolite concentrations alone. So far, only a rule of thumb was applied to determine the significance of the p-gain.
Here we explore the statistical properties of the p-gain through simulation of its density and by sampling of experimental data. We derive critical values of the p-gain for different levels of correlation between metabolite pairs and show that B/(2*α) is a conservative critical value for the p-gain, where α is the level of significance and B the number of tested metabolite pairs.
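As an illustration of how the p-gain and its conservative cut-off might be computed, the following minimal Python sketch assumes the commonly used definition of the p-gain (the smaller of the two single-metabolite p-values divided by the p-value of the ratio), which the text above does not spell out. The function names and the example p-values are hypothetical.

```python
def p_gain(p_metabolite_1, p_metabolite_2, p_ratio):
    """Assumed definition: smaller single-metabolite p-value over the ratio's p-value."""
    return min(p_metabolite_1, p_metabolite_2) / p_ratio


def critical_p_gain(n_pairs, alpha=0.05):
    """Conservative critical value B/(2*alpha), with B the number of tested pairs."""
    return n_pairs / (2 * alpha)


def is_significant(gain, n_pairs, alpha=0.05):
    """A ratio is flagged when its p-gain exceeds the conservative critical value."""
    return gain > critical_p_gain(n_pairs, alpha)


# Hypothetical association p-values for two metabolites and their ratio.
gain = p_gain(1e-6, 1e-4, 1e-12)
threshold = critical_p_gain(n_pairs=100, alpha=0.05)
flagged = is_significant(gain, n_pairs=100, alpha=0.05)
```

With these made-up inputs the p-gain is about 10^6 and the critical value for 100 tested pairs at α = 0.05 is 1000, so the ratio would be flagged.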
We show that the p-gain is a well-defined measure that can be used to identify statistically significant metabolite ratios in association studies, and we provide a conservative significance cut-off for the p-gain for use in future association studies with metabolic traits.
p-gain; Metabolomics; MWAS; GWAS; Genome-wide association studies; Metabolome-wide association studies
Given the ever-expanding number of model plant species for which complete genome sequences are available and the abundance of bio-resources such as knockout mutants, wild accessions and advanced breeding populations, there is a rising burden for gene functional annotation. In this protocol, annotation of plant gene function using combined co-expression gene analysis, metabolomics and informatics is provided (Figure 1). This approach is based on the theory of using target genes of known function to allow the identification of non-annotated genes likely to be involved in a certain metabolic process, with the identification of target compounds via metabolomics. Strategies are put forward for applying this information to populations generated by both forward and reverse genetics approaches, although neither of these is effortless. As a corollary, this approach can also be used to characterise unknown peaks representing new or specific secondary metabolites in limited tissues, plant species or stress treatments, which is currently an important step toward understanding plant metabolism.
Plant Biology; Issue 64; Genetics; Bioinformatics; Metabolomics; Plant metabolism; Transcriptome analysis; Functional annotation; Computational biology; Plant biology; Theoretical biology; Spectroscopy and structural analysis
Biotechnology, including genetic modification, is a very important approach to regulate the production of particular metabolites in plants to improve their adaptation to environmental stress, to improve food quality, and to increase crop yield. Unfortunately, these approaches do not necessarily lead to the expected results due to the highly complex mechanisms underlying metabolic regulation in plants. In this context, metabolomics plays a key role in plant molecular biotechnology, where plant cells are modified by the expression of engineered genes, because we can obtain information on the metabolic status of cells via a snapshot of their metabolome. Although metabolome analysis could be used to evaluate the effect of foreign genes and understand the metabolic state of cells, there is no single analytical method for metabolomics because of the wide range of chemicals synthesized in plants. Here, we describe the basic analytical advancements in plant metabolomics and bioinformatics and the application of metabolomics to the biological study of plants.
Metabolomics; Mass spectrometry; NMR; Functional genomics; GMO; Quantitative genetics; QTL
The metabolome is the terminal downstream product of the genome and consists of the total complement of all the low molecular weight molecules (metabolites) in a cell, tissue or organism. Metabolomics aims to measure a wide breadth of small molecules in the context of physiological stimuli or in disease states. Metabolomics methodologies fall into two distinct groups: untargeted metabolomics, an intended comprehensive analysis of all the measurable analytes in a sample including chemical unknowns, and targeted metabolomics, the measurement of defined groups of chemically characterized and biochemically annotated metabolites. The methodologies considered in this unit focus on the processes of conducting targeted metabolomics experiments, and the advantages of this general approach are highlighted herein. This unit outlines the procedures for extracting nitrogenous metabolites, including the amino acids, lipids, and intermediary metabolites, including the TCA cycle oxoacids, from blood plasma. Specifically, protocols for the analysis of these metabolites using liquid chromatography-mass spectrometry-based targeted metabolomics experiments are discussed.
Targeted Metabolomics; Liquid Chromatography-Mass Spectrometry; Multiple Reaction Monitoring
We have performed a metabolite quantitative trait locus (mQTL) study of the 1H nuclear magnetic resonance spectroscopy (1H NMR) metabolome in humans, building on recent targeted knowledge of genetic drivers of metabolic regulation. Urine and plasma samples were collected from two cohorts of individuals of European descent, with one cohort comprised of female twins donating samples longitudinally. Sample metabolite concentrations were quantified by 1H NMR and tested for association with genome-wide single-nucleotide polymorphisms (SNPs). Four metabolites' concentrations exhibited significant, replicable association with SNP variation (8.6×10−11
Physiological concentrations of metabolites—small molecules involved in biochemical processes in living systems—can be measured and used to diagnose and predict disease states. A common goal is to detect and clinically exploit statistical differences in metabolite concentrations between diseased and healthy individuals. As a basis for the design and interpretation of case-control studies, it is useful to have a characterization of metabolic diversity amongst healthy individuals, some of which stems from inter-individual genetic variation. When a single genetic locus has a sufficiently strong effect on metabolism, its genomic position can be determined by collecting metabolite concentration data and genome-wide genotype data on a set of individuals and searching for associations between the two data sets—a so-called metabolite quantitative trait locus (mQTL) study. By so tracing mQTLs, we can identify the genetic drivers of metabolism, characterize how the nature or quantity of the corresponding expressed protein(s) feeds forward to influence metabolite levels, and specify disease-predictive models that incorporate mutual dependence amongst genetics, environment, and metabolism.
In many fields of medicine there is a growing interest in characterizing diseases at the molecular level with a view to developing an individually tailored therapeutic approach. Metabolomics is a novel area that promises to contribute significantly to the characterization of various disease phenotypes and to the identification of personal metabolic features that can predict response to therapies. Based on analytical platforms such as mass spectrometry or NMR-based spectroscopy, the metabolomic approach enables a comprehensive overview of the metabolites, leading to the characterization of the metabolic fingerprint of a given sample. These metabolic fingerprints can then be used to distinguish between different disease phenotypes and to predict a drug's effectiveness and/or toxicity.
Several studies published in the last few years applied the metabolomic approach in the field of pediatric medicine. Being a highly informative technique that can be used on samples collected non-invasively (e.g. urine or exhaled breath condensate), metabolomics has appeal for the study of pediatric diseases. Here we present and discuss the pediatric clinical studies that have taken the metabolomic approach.
Interpreting quantitative metabolome data is a difficult task owing to the high connectivity in metabolic networks and inherent interdependency between enzymatic regulation, metabolite levels and fluxes. Here we present a hypothesis-driven algorithm for the integration of such data with metabolic network topology. The algorithm thus enables identification of reporter reactions, which are reactions where there are significant coordinated changes in the level of surrounding metabolites following environmental/genetic perturbations. Applicability of the algorithm is demonstrated by using data from Saccharomyces cerevisiae. The algorithm includes preprocessing of a genome-scale yeast model such that the fraction of measured metabolites within the model is enhanced, and thus it is possible to map significant alterations associated with a perturbation even though a small fraction of the complete metabolome is measured. By combining the results with transcriptome data, we further show that it is possible to infer whether the reactions are hierarchically or metabolically regulated. Hereby, the reported approach represents an attempt to map different layers of regulation within metabolic networks through combination of metabolome and transcriptome data.
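The idea of scoring a reaction by coordinated changes in its surrounding metabolites can be sketched as follows. This is a minimal illustration in the general style of reporter-feature scoring, not the published algorithm: it assumes that each metabolite has a p-value for its change after the perturbation, converts these to Z-scores with the inverse normal distribution, and aggregates them per reaction. The background correction used by such methods is omitted, and the metabolite p-values and the tiny network are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def reaction_score(p_values):
    """Aggregate metabolite p-values into one reaction-level Z-score.

    Each p-value is mapped to a Z-score (small p gives large Z); the sum is
    scaled by sqrt(k) so that reactions with different numbers of metabolites
    remain comparable.
    """
    z_scores = [NormalDist().inv_cdf(1.0 - p) for p in p_values]
    return sum(z_scores) / sqrt(len(z_scores))


# Hypothetical p-values for metabolite level changes after a perturbation.
metabolite_p = {"glc": 0.001, "g6p": 0.002, "atp": 0.4, "adp": 0.6}

# Hypothetical network: each reaction lists the metabolites it involves.
reactions = {
    "hexokinase": ["glc", "g6p", "atp", "adp"],
    "atp_hydrolysis": ["atp", "adp"],
}

scores = {
    name: reaction_score([metabolite_p[m] for m in metabolites])
    for name, metabolites in reactions.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy example the reaction whose neighbours change most consistently is ranked first; a real analysis would compare each score against scores of randomly drawn metabolite sets of the same size.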
data integration; genome-scale stoichiometric model; metabolic regulation; quantitative metabolomics; reporter reactions
Current genome-wide association studies (GWAS) are performed in a single-trait framework without considering genetic correlations between important disease traits. Hence, such GWAS are limited in their ability to discover genetic risk factors with pleiotropic effects.
This work reports a novel data mining approach to discover patterns of multiple phenotypic associations over 52 anthropometric and biochemical traits in KARE and a new analytical scheme for GWAS of multivariate phenotypes defined by the discovered patterns. This methodology was applied to the GWAS of the multivariate phenotype highLDLhighTG, derived from the predicted patterns of the phenotypic associations. The patterns of the phenotypic associations were informative for drawing relations of plasma lipid levels with bone mineral density and with a cluster of common traits (obesity, hypertension, insulin resistance) related to Metabolic Syndrome (MS). A total of 15 SNPs in six genes (PAK7, C20orf103, NRIP1, BCL2, TRPM3, and NAV1) were identified as significantly associated with highLDLhighTG. Notably, the significant associations included a missense mutation (PAK7:R335P), a frameshift mutation (C20orf103) and SNPs in splice sites (TRPM3).
The six genes corresponded to rat and mouse quantitative trait loci (QTLs) that had shown associations with the common traits such as the well characterized MS and even tumor susceptibility. Our findings suggest that the six genes may play important roles in the pleiotropic effects on lipid metabolism and the MS, which increase the risk of Type 2 Diabetes and cardiovascular disease. The use of the multivariate phenotypes can be advantageous in identifying genetic risk factors, accounting for the pleiotropic effects when the multivariate phenotypes have a common etiological pathway.
One of the goals of global metabolomic analysis is to identify metabolic markers that are hidden within a large background of data originating from high-throughput analytical measurements. Metabolite-based clustering is an unsupervised approach for marker identification based on grouping similar concentration profiles of putative metabolites. A major problem of this approach is that in general there is no prior information about an adequate number of clusters.
We present an approach for data mining on metabolite intensity profiles as obtained from mass spectrometry measurements. We propose one-dimensional self-organizing maps for metabolite-based clustering and visualization of marker candidates. In a case study on the wound response of Arabidopsis thaliana, based on metabolite profile intensities from eight different experimental conditions, we show how the clustering and visualization capabilities can be used to identify relevant groups of markers.
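To make the clustering idea concrete, the following is a minimal sketch of a generic one-dimensional self-organizing map in plain Python. It is not the specialized realization described here: the linear initialisation between the first and last profile, the decay schedules, the function names and the toy intensity profiles are all assumptions made for illustration.

```python
import math
import random


def nearest_unit(units, profile):
    """Index of the prototype closest to the profile (squared Euclidean distance)."""
    def dist2(w):
        return sum((a - b) ** 2 for a, b in zip(w, profile))
    return min(range(len(units)), key=lambda j: dist2(units[j]))


def train_som_1d(profiles, n_units, epochs=50, seed=0):
    """Train a 1-D SOM (n_units >= 2) and return its prototype vectors."""
    rng = random.Random(seed)
    first, last = profiles[0], profiles[-1]
    # Assumption: prototypes start evenly spaced on the line between the
    # first and last profile (a simplified linear initialisation).
    units = [
        [a + (b - a) * j / (n_units - 1) for a, b in zip(first, last)]
        for j in range(n_units)
    ]
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)                           # learning rate decays
        radius = max(0.5, (n_units / 2.0) * (1.0 - frac))   # neighbourhood shrinks
        for x in rng.sample(profiles, len(profiles)):       # random presentation order
            bmu = nearest_unit(units, x)
            for j, w in enumerate(units):
                # Gaussian neighbourhood along the one-dimensional chain of units.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += rate * h * (x[d] - w[d])
    return units


# Toy intensity profiles over three conditions: two visibly different patterns.
profiles = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1],
    [0.0, 0.1, 1.0], [0.1, 0.0, 0.9], [0.0, 0.2, 1.0],
]
units = train_som_1d(profiles, n_units=4)
clusters = [nearest_unit(units, p) for p in profiles]
```

Because the units form an ordered chain, neighbouring prototypes hold similar intensity patterns, which is what allows the prototypes to be displayed side by side as an overview even when the number of meaningful groups is not known in advance.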
Our specialized realization of self-organizing maps is well suited to gaining insight into complex pattern variation in a large set of metabolite profiles. In comparison to other methods, our visualization approach facilitates the identification of interesting groups of metabolites by means of a convenient overview of relevant intensity patterns. In particular, the visualization effectively supports researchers in analyzing many putative clusters when the true number of biologically meaningful groups is unknown.
Advances in analytical methodologies, principally nuclear magnetic resonance spectroscopy (NMR) and mass spectrometry (MS), during the last decade have made large-scale analysis of the human metabolome a reality. This is leading to the reawakening of the importance of metabolism in human diseases, particularly cancer. The metabolome is the functional readout of the genome, functional genome, and proteome; it is also an integral partner in molecular regulations for homeostasis. The interrogation of the metabolome, or metabolomics, is now being applied to numerous diseases, largely by metabolite profiling for biomarker discovery, but also in pharmacology and therapeutics. Recent advances in stable isotope tracer-based metabolomic approaches enable unambiguous tracking of individual atoms through compartmentalized metabolic networks directly in human subjects, which promises to decipher the complexity of the human metabolome at an unprecedented pace. This knowledge will revolutionize our understanding of complex human diseases, clinical diagnostics, as well as individualized therapeutics and drug response.
In this review, we focus on the use of stable isotope tracers with metabolomics technologies for understanding metabolic network dynamics in both model systems and clinical applications. Atom-resolved isotope tracing via the two major analytical platforms, NMR and MS, has the power to determine novel metabolic reprogramming in diseases, discover new drug targets, and facilitate ADME studies. We also illustrate new metabolic tracer-based imaging technologies, which enable direct visualization of metabolic processes in vivo. We further outline current practices and future requirements for biochemoinformatics development, which is an integral part of translating stable isotope-resolved metabolomics into clinical reality.
Metabolomics; systems biochemistry; stable isotope tracing; drug discovery; metabolic compartmentation and regulation; pathway reconstruction
Xenobiotic metabolism, a ubiquitous natural response to foreign compounds, elicits initiating signals for many pathophysiological events. Currently, most widely used techniques for identifying xenobiotic metabolites and metabolic pathways are empirical and largely based on in vitro incubation assays and in vivo radiotracing experiments. Recent work in our lab has shown that LC-MS-based metabolomic techniques are useful tools for xenobiotic metabolism research since multivariate data analysis in metabolomics can significantly rationalize the processes of xenobiotic metabolite identification and metabolic pathway analysis. In this review, the technological elements of LC-MS-based metabolomics for constructing high-quality datasets and conducting comprehensive data analysis are examined. Four novel approaches of using LC-MS-based metabolomic techniques in xenobiotic metabolism research are proposed and illustrated by case studies and proof-of-concept experiments, and the perspective on their application is further discussed.
Metabolomics; Xenobiotic metabolism; Drug metabolism; LC-MS; Multivariate data analysis
Genome-wide association studies (GWASs) have become a very important tool to address the genetic origin of phenotypic variability, in particular associated with diseases. Nevertheless, these types of studies provide limited information about disease etiology and the molecular mechanisms involved. Recently, the incorporation of metabolomics into the analysis has offered novel opportunities for a better understanding of disease-related metabolic deregulation. The pattern emerging from this work is that gene-driven changes in metabolism are prevalent and that common genetic variations can have a profound impact on the homeostatic concentrations of specific metabolites. A particularly interesting aspect of this work takes into account interactions of environment and lifestyle with the genome and how this interaction translates into changes in the metabolome. For instance, the role of PYROXD2 in trimethylamine metabolism points to an interaction between host and microbiome genomes (host/microbiota). Often, these findings reveal metabolic deregulations, which could eventually be tuned with a nutritional intervention. Here we review the development of gene–metabolism association studies from a single-gene/single-metabolite to a genome-wide/metabolome-wide approach and highlight the conceptual changes associated with this ongoing transition. Moreover, we report some of our recent GWAS results on a cohort of 265 individuals from an ethnically diverse population that validate and refine previous findings on gene–urine metabolism interactions. Specifically, our results confirm the effect of PYROXD2 polymorphisms on trimethylamine metabolism and suggest that a previously reported association of N-acetylated compounds with the ALMS1/NAT8 locus is driven by SNPs in the ALMS1 gene.
Electronic supplementary material
The online version of this article (doi:10.1007/s12263-012-0313-7) contains supplementary material, which is available to authorized users.
Metabolome wide associations; Genome wide associations; Metabolomics; Nutrition
This review presents an overview of the dynamically developing field of mass spectrometry-based metabolomics. Metabolomics aims at the comprehensive and quantitative analysis of wide arrays of metabolites in biological samples. These numerous analytes have very diverse physico-chemical properties and occur at different abundance levels. Consequently, comprehensive metabolomics investigations are primarily a challenge for analytical chemistry, and mass spectrometry specifically has vast potential as a tool for this type of investigation. Metabolomics requires special approaches for sample preparation, separation, and mass spectrometric analysis. Current examples of those approaches are described in this review. It primarily focuses on metabolic fingerprinting, a technique that analyzes all detectable analytes in a given sample with subsequent classification of samples and identification of differentially expressed metabolites, which define the sample classes. To perform this complex task, data analysis tools, metabolite libraries, and databases are required. Therefore, recent advances in metabolomics bioinformatics are also discussed.
metabolomics; metabolic fingerprinting; metabolic profiling; lipidomics; mass spectrometry
High-throughput metabolomic experiments aim at identifying and ultimately quantifying all metabolites present in biological systems. The metabolites are interconnected through metabolic reactions, generally grouped into metabolic pathways. Classical metabolic maps provide a relational context to help interpret metabolomics experiments and a wide range of tools have been developed to help place metabolites within metabolic pathways. However, the representation of metabolites within separate disconnected pathways overlooks most of the connectivity of the metabolome. By definition, reference pathways cannot integrate novel pathways nor show relationships between metabolites that may be linked by common neighbours without being considered as joint members of a classical biochemical pathway. MetExplore is a web server that offers the possibility to link metabolites identified in untargeted metabolomics experiments within the context of genome-scale reconstructed metabolic networks. The analysis pipeline comprises mapping metabolomics data onto the specific metabolic network of an organism, then applying graph-based methods and advanced visualization tools to enhance data analysis. The MetExplore web server is freely accessible at http://metexplore.toulouse.inra.fr.
Although statins are widely prescribed medications, there remains considerable variability in therapeutic response. Genetics can explain only part of this variability. Metabolomics is a global biochemical approach that provides powerful tools for mapping pathways implicated in disease and in response to treatment. Metabolomics captures net interactions between genome, microbiome and the environment. In this study, we used a targeted GC-MS metabolomics platform to measure a panel of metabolites within cholesterol synthesis, dietary sterol absorption, and bile acid formation to determine metabolite signatures that may predict variation in statin LDL-C lowering efficacy. Measurements were performed in two subsets of the total study population in the Cholesterol and Pharmacogenetics (CAP) study: Full Range of Response (FR), and Good and Poor Responders (GPR). FR were 100 individuals randomly selected from across the entire range of LDL-C responses in CAP. GPR were 48 individuals, 24 each from the top and bottom 10% of the LDL-C response distribution matched for body mass index, race, and gender. We identified three secondary, bacterial-derived bile acids that contribute to predicting the magnitude of statin-induced LDL-C lowering in good responders. Bile acids and statins share transporters in the liver and intestine; we observed that increased plasma concentration of simvastatin positively correlates with higher levels of several secondary bile acids. Genetic analysis of these subjects identified associations between levels of seven bile acids and a single nucleotide polymorphism (SNP), rs4149056, in the gene encoding the organic anion transporter SLCO1B1. These findings, along with recently published results that the gut microbiome plays an important role in cardiovascular disease, indicate that interactions between genome, gut microbiome and environmental influences should be considered in the study and management of cardiovascular disease.
Metabolic profiles could provide valuable information about treatment outcomes and could contribute to a more personalized approach to therapy.
Metabolomics has matured over the past 10 years. By combining different platforms, over 2,000 identified metabolites can be screened. At the UC Davis Genome Center Metabolomics Facility, two laboratories work towards advancing methods and providing services: the Fiehn research laboratory and the metabolomics core. We have a combined use of 11 mass spectrometers for which a range of SOPs and quality controls have been developed for (a) primary metabolism, (b) volatile metabolites, (c) lipidomics, (d) secondary metabolites and (e) metabolic polymers. Over 300 studies have been completed over the past 5 years, which are stored and disseminated via the SetupX study design database and facilitated by the BinBase mass spectrometry repositories. Lipid identifications by nanoESI-ion trap MS/MS are based on Genedata's MS Refiner software and a novel cross-instrument library, the LipidBLAST tool, which stores calculated MS/MS spectra of over 180,000 lipids based on fragmentation patterns of authentic standards. The FiehnLib libraries of over 1,000 primary metabolites authenticate identifications in GC-TOF platforms, in conjunction with BinBase and the Adams volatile MS library. Polymers in biofuel research are assessed by pyrolysis-GC/MS and the MIT-based SpectConnect tool. LC-ion trap, Qtrap and QTOF mass spectrometry are used for determining compounds that are not amenable to any of the above methods, such as cationic metabolites (SAM, betaine, SMM), metabolically active biomarkers (acylcarnitines) and other important metabolic classes (dietary phytochemicals, folates and glucuronides). Despite this progress, metabolomics still faces a number of analytical challenges: the need for accuracy in structural identifications and quantifications, increases in total peak capacities, improved data processing software and the need for standardized database repositories.
Current efforts are presented as well as a discussion on experiences in the dual task of ‘research’ and ‘service’ for metabolomic facilities and how to meet outside expectations and financial constraints.
Induced pluripotent stem cells are different from embryonic stem cells as shown by epigenetic and genomics analyses. Depending on cell types and culture conditions, such genetic alterations can lead to different metabolic phenotypes which may impact replication rates, membrane properties and cell differentiation. Here we applied a comprehensive metabolomics strategy incorporating nanoelectrospray ion trap mass spectrometry (MS), gas chromatography-time of flight MS, and hydrophilic interaction- and reversed phase-liquid chromatography-quadrupole time-of-flight MS to examine the metabolome of induced pluripotent stem cells (iPSCs) compared to parental fibroblasts as well as to reference embryonic stem cells (ESCs). With over 250 identified metabolites and a range of structurally unknown compounds, quantitative and statistical metabolome data were mapped onto metabolite networks describing the metabolic state of iPSCs relative to other cell types. Overall, iPSCs exhibited a striking metabolic shift away from parental fibroblasts and toward ESCs, suggestive of near complete metabolic reprogramming. Differences between pluripotent cell types were not observed in carbohydrate or hydroxyl acid metabolism, pentose phosphate pathway metabolites, or free fatty acids. However, significant differences between iPSCs and ESCs were evident in phosphatidylcholine and phosphatidylethanolamine lipid structures, essential and non-essential amino acids, and metabolites involved in polyamine biosynthesis. Together our findings demonstrate that during cellular reprogramming, the metabolome of fibroblasts is also reprogrammed to take on an ESC-like profile, but there are select unique differences apparent in iPSCs. The identified metabolomics signatures of iPSCs and ESCs may have important implications for functional regulation of maintenance and induction of pluripotency.
Metabolomic profiling and the integration of whole-genome genetic association data have proven to be powerful tools to comprehensively explore gene regulatory networks and to investigate the effects of genetic variation at the molecular level. Serum metabolite concentrations allow a direct readout of biological processes, and association of specific metabolomic signatures with complex diseases such as Alzheimer's disease and cardiovascular and metabolic disorders has been shown. There are well-known correlations between sex and the incidence, prevalence, age of onset, symptoms, and severity of a disease, as well as the reaction to drugs. However, most of the studies published so far did not consider the role of sexual dimorphism and did not analyse their data stratified by sex. This study investigated sex-specific differences of serum metabolite concentrations and their underlying genetic determination. For discovery and replication we used more than 3,300 independent individuals from KORA F3 and F4 with metabolite measurements of 131 metabolites, including amino acids, phosphatidylcholines, sphingomyelins, acylcarnitines, and C6-sugars. A linear regression approach revealed significant concentration differences between males and females for 102 out of 131 metabolites (p-values<3.8×10−4; Bonferroni-corrected threshold). Sex-specific genome-wide association studies (GWAS) showed genome-wide significant differences in beta-estimates for SNPs in the CPS1 locus (carbamoyl-phosphate synthase 1, significance level: p<3.8×10−10; Bonferroni-corrected threshold) for glycine. We showed that the metabolite profiles of males and females are significantly different and, furthermore, that specific genetic variants in metabolism-related genes exhibit sexual dimorphism. Our study provides important new insights into sex-specific differences of cell regulatory processes and underscores that studies should consider sex-specific effects in design and interpretation.
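The two Bonferroni-corrected thresholds quoted above can be reproduced with a short calculation. The sketch below assumes a nominal level of 0.05 for the metabolite comparison and the conventional genome-wide level of 5×10−8 for the GWAS, each divided by the 131 metabolites tested; these base levels are assumptions, chosen because they match the stated values.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


n_metabolites = 131

# Assumed nominal level 0.05 for the sex-difference regression per metabolite.
metabolite_level = bonferroni_threshold(0.05, n_metabolites)

# Assumed genome-wide level 5e-8, further corrected for 131 metabolite traits.
gwas_level = bonferroni_threshold(5e-8, n_metabolites)

print(f"{metabolite_level:.1e}")  # 3.8e-04
print(f"{gwas_level:.1e}")        # 3.8e-10
```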
The combination of genomic and metabolic studies in recent years has provided astonishing results. However, most of the studies published so far did not consider the role of sexual dimorphism and did not analyse their data stratified by sex. The investigation of 131 serum metabolite concentrations of >3,300 population-based samples (KORA F3/F4) revealed significant differences in the metabolite profile of males and females. Furthermore, a genome-wide picture of sex-specific genetic variations in human metabolism (>2,000 subjects from KORA F3/F4 cohorts) was investigated. Sex-specific genome-wide association studies (GWAS) showed differences in the effect of genetic variations on metabolites in men and women. SNPs in the CPS1 (carbamoyl-phosphate synthase 1) locus showed genome-wide significant differences in beta-estimates of sex-specific association analysis (significance level: 3.8×10−10) for glycine. As global metabolomic techniques are increasingly refined to identify more compounds in single biological samples, the predictive power of this new technology will greatly increase. This suggests that metabolites, which may be used as predictive biomarkers to indicate the presence or severity of a disease, have to be used selectively depending on sex.
The goal of metabolomics analyses is a comprehensive and systematic understanding of all metabolites in biological samples. Many useful platforms have been developed to achieve this goal. Gas chromatography coupled to mass spectrometry (GC/MS) is a well-established analytical method in metabolomics studies, and 200 to 500 peaks are routinely observed in a single biological sample. However, only ~100 metabolites can be identified, and the remaining peaks are left as "unknowns".
We present an algorithm that acquires more extensive metabolite information. Pearson's product-moment correlation coefficient and the Soft Independent Modeling of Class Analogy (SIMCA) method were combined to automatically identify and annotate unknown peaks, which tend to be missed in routine studies that employ manual processing.
Our data mining system can offer a wealth of metabolite information quickly and easily, and it provides new insights, particularly into food quality evaluation and prediction.