Related Articles

1.  Genetics Meets Metabolomics: A Genome-Wide Association Study of Metabolite Profiles in Human Serum 
PLoS Genetics  2008;4(11):e1000282.
The rapidly evolving field of metabolomics aims at a comprehensive measurement of ideally all endogenous metabolites in a cell or body fluid. It thereby provides a functional readout of the physiological state of the human body. Genetic variants that associate with changes in the homeostasis of key lipids, carbohydrates, or amino acids are not only expected to display much larger effect sizes due to their direct involvement in metabolite conversion and modification, but should also provide access to the biochemical context of such variations, in particular when enzyme coding genes are concerned. To test this hypothesis, we conducted what is, to the best of our knowledge, the first GWA study with metabolomics based on the quantitative measurement of 363 metabolites in serum of 284 male participants of the KORA study. We found associations of frequent single nucleotide polymorphisms (SNPs) with considerable differences in the metabolic homeostasis of the human body, explaining up to 12% of the observed variance. Using ratios of certain metabolite concentrations as a proxy for enzymatic activity, up to 28% of the variance can be explained (p-values 10⁻¹⁶ to 10⁻²¹). We identified four genetic variants in genes coding for enzymes (FADS1, LIPC, SCAD, MCAD) where the corresponding metabolic phenotype (metabotype) clearly matches the biochemical pathways in which these enzymes are active. Our results suggest that common genetic polymorphisms induce major differentiations in the metabolic make-up of the human population. This may lead to a novel approach to personalized health care based on a combination of genotyping and metabolic characterization. These genetically determined metabotypes may underlie the risk for a certain medical phenotype, the response to a given drug treatment, or the reaction to a nutritional intervention or environmental challenge.
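The gain from using metabolite ratios can be illustrated with a small simulation. The sketch below is not the authors' analysis: it assumes a hypothetical enzyme whose substrate and product share a large amount of background variation, and a SNP that shifts only the product. The allele frequency, effect size, and noise levels are invented for illustration; only the cohort size of 284 comes from the abstract. Taking the log ratio cancels the shared variation, so the genotype explains a larger fraction of the variance.

```python
import random


def r_squared(x, y):
    """Fraction of the variance in y explained by a simple linear fit on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


rng = random.Random(0)
n_subjects = 284          # cohort size quoted in the abstract
minor_allele_freq = 0.3   # assumed for illustration
effect_per_allele = 0.3   # assumed shift in log(product) per minor allele

genotype, log_substrate, log_product = [], [], []
for _ in range(n_subjects):
    # Genotype coded as the number of minor alleles (0, 1 or 2).
    g = sum(rng.random() < minor_allele_freq for _ in range(2))
    # Background variation (diet, physiology) common to both metabolites.
    shared = rng.gauss(0.0, 0.5)
    genotype.append(g)
    log_substrate.append(shared + rng.gauss(0.0, 0.1))
    log_product.append(shared + effect_per_allele * g + rng.gauss(0.0, 0.1))

# The log of the product/substrate ratio removes the shared background.
log_ratio = [p - s for p, s in zip(log_product, log_substrate)]

r2_product = r_squared(genotype, log_product)
r2_ratio = r_squared(genotype, log_ratio)
print(f"variance explained, product alone:           {r2_product:.2f}")
print(f"variance explained, product/substrate ratio: {r2_ratio:.2f}")
```

The exact numbers depend on the assumed parameters; the point is only the direction of the effect, which mirrors the increase from 12% to 28% reported above.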
Author Summary
This paper reports what is, to the best of our knowledge, the first genome-wide association (GWA) study with metabolic traits as phenotypic traits. By simultaneous measurements of single nucleotide polymorphisms (SNPs) and serum concentrations of endogenous organic compounds in a human population, we identify genetically determined variants in metabolic phenotype (metabotype) that exhibit large effect sizes. Four of these polymorphisms are located in genes coding for well-characterized enzymes of the lipid metabolism. We find that individuals with different genotypes in these genes have significantly different metabolic capacities with respect to the synthesis of some polyunsaturated fatty acids, the beta-oxidation of short- and medium-chain fatty acids, and the breakdown of triglycerides. In this approach, the concept of the “genetically determined metabotype” as an intermediate phenotype is central, as it becomes a measurable quantity in the framework of GWA studies with metabolomics. The investigation of the genetically determined metabotypes in their biochemical context might help to better understand the pathogenesis of common diseases and gene–environment interactions. These findings could result in a step towards personalized health care and nutrition based on a combination of genotyping and metabolic characterization.
doi:10.1371/journal.pgen.1000282
PMCID: PMC2581785  PMID: 19043545
2.  Human metabolic profiles are stably controlled by genetic and environmental variation 
A comprehensive variation map of the human metabolome identifies genetic and stable-environmental sources as major drivers of metabolite concentrations. The data suggest that sample sizes of a few thousand are sufficient to detect metabolite biomarkers predictive of disease.
We designed a longitudinal twin study to characterize the genetic, stable-environmental, and longitudinally fluctuating influences on metabolite concentrations in two human biofluids (urine and plasma), focusing specifically on the representative subset of metabolites detectable by 1H nuclear magnetic resonance (1H NMR) spectroscopy.
We identified widespread genetic and stable-environmental influences on the (urine and plasma) metabolomes, with (30 and 42%) attributable on average to familial sources, and (47 and 60%) attributable to longitudinally stable sources.
Ten of the metabolites annotated in the study are estimated to have >60% familial contribution to their variation in concentration.
Our findings have implications for the design and interpretation of 1H NMR-based molecular epidemiology studies. On the basis of the stable component of variation quantified in the current paper, we specified a model of disease association under which we inferred that sample sizes of a few thousand should be sufficient to detect disease-predictive metabolite biomarkers.
Metabolites are small molecules involved in biochemical processes in living systems. Their concentration in biofluids, such as urine and plasma, can offer insights into the functional status of biological pathways within an organism, and reflect input from multiple levels of biological organization—genetic, epigenetic, transcriptomic, and proteomic—as well as from environmental and lifestyle factors. Metabolite levels have the potential to indicate a broad variety of deviations from the ‘normal' physiological state, such as those that accompany a disease, or an increased susceptibility to disease. A number of recent studies have demonstrated that metabolite concentrations can be used to diagnose disease states accurately. A more ambitious goal is to identify metabolite biomarkers that are predictive of future disease onset, providing the possibility of intervention in susceptible individuals.
If an extreme concentration of a metabolite is to serve as an indicator of disease status, it is usually important to know the distribution of metabolite levels among healthy individuals. It is also useful to characterize the sources of that observed variation in the healthy population. A proportion of that variation—the heritable component—is attributable to genetic differences between individuals, potentially at many genetic loci. An effective, molecular indicator of a heritable, complex disease is likely to have a substantive heritable component. Non-heritable biological variation in metabolite concentrations can arise from a variety of environmental influences, such as dietary intake, lifestyle choices, general physical condition, composition of gut microflora, and use of medication. Variation across a population in stable-environmental influences leads to long-term differences between individuals in their baseline metabolite levels. Dynamic environmental pressures lead to short-term fluctuations within an individual about their baseline level. A metabolite whose concentration changes substantially in response to short-term pressures is relatively unlikely to offer long-term prediction of disease. In summary, the potential suitability of a metabolite to predict disease is reflected by the relative contributions of heritable and stable/unstable-environmental factors to its variation in concentration across the healthy population.
Studies involving twins are an established technique for quantifying the heritable component of phenotypes in human populations. Monozygotic (MZ) twins share the same DNA genome-wide, while dizygotic (DZ) twins share approximately half their inherited DNA, as do ordinary siblings. By comparing the average extent of phenotypic concordance within MZ pairs to that within DZ pairs, it is possible to quantify the heritability of a trait, and also to quantify the familiality, which refers to the combination of heritable and common-environmental effects (i.e., environmental influences shared by twins in a pair). In addition to incorporating twins into the study design, it is useful to quantify the phenotype in some individuals at multiple time points. The longitudinal aspect of such a study allows environmental effects to be decomposed into those that affect the phenotype over the short term and those that exert stable influence.
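The MZ/DZ comparison described above can be made concrete with the classical Falconer approximation. This is a textbook simplification, not the bespoke statistical method used in the study, and it ignores the longitudinal component. It assumes that the MZ correlation equals h² + c² and the DZ correlation equals h²/2 + c², where h² is the heritable share and c² the common-environment share of the variance. The function name and the example correlations are hypothetical.

```python
def falconer_decomposition(r_mz, r_dz):
    """Split trait variance using twin correlations (Falconer approximation).

    Assumes r_mz = h2 + c2 and r_dz = h2 / 2 + c2, so that
    h2 = 2 * (r_mz - r_dz) and c2 = 2 * r_dz - r_mz.
    """
    h2 = 2.0 * (r_mz - r_dz)   # heritable component
    c2 = 2.0 * r_dz - r_mz     # common (shared) environment
    return {
        "heritability": h2,
        "common_environment": c2,
        "familiality": h2 + c2,      # equals r_mz under these assumptions
        "individual": 1.0 - r_mz,    # everything not shared within a pair
    }


# Hypothetical correlations for one metabolite peak.
parts = falconer_decomposition(r_mz=0.6, r_dz=0.4)
for name, value in parts.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the familiality is simply the MZ correlation, which is why it can be estimated more directly than heritability alone.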
For the current study, urine and blood samples were collected from a cohort of MZ and DZ twins, with some twins donating samples on two occasions several months apart. Samples were analysed by 1H nuclear magnetic resonance (1H NMR) spectroscopy—an untargeted, discovery-driven technique for quantifying metabolite concentrations in biological samples. The application of 1H NMR to a biological sample creates a spectrum, made up of multiple peaks, with each peak's size quantitatively representing the concentration of its corresponding hydrogen-containing metabolite.
In each biological sample in our study, we extracted a full set of peaks, and thereby quantified the concentrations of all common plasma and urine metabolites detectable by 1H NMR. We developed bespoke statistical methods to decompose the observed concentration variation at each metabolite peak into that originating from familial, individual-environmental, and unstable-environmental sources.
We quantified the variability landscape across all common metabolite peaks in the urine and plasma 1H NMR metabolomes. We annotated a subset of peaks with a total of 65 metabolites; the variance decompositions for these are shown in Figure 1. Ten metabolites' concentrations were estimated to have familial contributions in excess of 60%. The average proportion of stable variation across all extracted metabolite peaks was estimated to be 47% in the urine samples and 60% in the plasma samples; the average estimated familiality was 30% for urine and 42% for plasma. These results comprise the first quantitative variation map of the 1H NMR metabolome. The identification and quantification of substantive widespread stability provides support for the use of these biofluids in molecular epidemiology studies. On the basis of our findings, we performed power calculations for a hypothetical study searching for predictive disease biomarkers among 1H NMR-detectable urine and plasma metabolites. Our calculations suggest that sample sizes of 2000–5000 should allow reliable identification of disease-predictive metabolite concentrations explaining 5–10% of disease risk, while greater sample sizes of 5000–20 000 would be required to identify metabolite concentrations explaining 1–2% of disease risk.
1H Nuclear Magnetic Resonance spectroscopy (1H NMR) is increasingly used to measure metabolite concentrations in sets of biological samples for top-down systems biology and molecular epidemiology. For such purposes, knowledge of the sources of human variation in metabolite concentrations is valuable, but currently sparse. We conducted and analysed a study to create such a resource. In our unique design, identical and non-identical twin pairs donated plasma and urine samples longitudinally. We acquired 1H NMR spectra on the samples, and statistically decomposed variation in metabolite concentration into familial (genetic and common-environmental), individual-environmental, and longitudinally unstable components. We estimate that stable variation, comprising familial and individual-environmental factors, accounts on average for 60% (plasma) and 47% (urine) of biological variation in 1H NMR-detectable metabolite concentrations. Clinically predictive metabolic variation is likely nested within this stable component, so our results have implications for the effective design of biomarker-discovery studies. We provide a power-calculation method which reveals that sample sizes of a few thousand should offer sufficient statistical precision to detect 1H NMR-based biomarkers quantifying predisposition to disease.
doi:10.1038/msb.2011.57
PMCID: PMC3202796  PMID: 21878913
biomarker; 1H nuclear magnetic resonance spectroscopy; metabolome-wide association study; top-down systems biology; variance decomposition
3.  Genome-Wide Association Study of Metabolic Traits Reveals Novel Gene-Metabolite-Disease Links 
PLoS Genetics  2014;10(2):e1004132.
Metabolic traits are molecular phenotypes that can drive clinical phenotypes and may predict disease progression. Here, we report results from a metabolome- and genome-wide association study on 1H-NMR urine metabolic profiles. The study was conducted within an untargeted approach, employing a novel method for compound identification. From our discovery cohort of 835 Caucasian individuals who participated in the CoLaus study, we identified 139 suggestively significant (P<5×10⁻⁸) and independent associations between single nucleotide polymorphisms (SNPs) and metabolome features. Fifty-six of these associations replicated in the TasteSensomics cohort, comprising 601 individuals from São Paulo of vastly diverse ethnic background. They correspond to eleven gene-metabolite associations, six of which had been previously identified in the urine metabolome and three in the serum metabolome. Our key novel findings are the associations of two SNPs with NMR spectral signatures pointing to fucose (rs492602, P = 6.9×10⁻⁴⁴) and lysine (rs8101881, P = 1.2×10⁻³³), respectively. Fine-mapping of the first locus pinpointed the FUT2 gene, which encodes a fucosyltransferase enzyme and has previously been associated with Crohn's disease. This implicates fucose as a potential prognostic disease marker, for which there is already published evidence from a mouse model. The second SNP lies within the SLC7A9 gene, rare mutations of which have been linked to severe kidney damage. The replication of previous associations and our new discoveries demonstrate the potential of untargeted metabolomics GWAS to robustly identify molecular disease markers.
Author Summary
The concentrations of small molecules known as metabolites are subject to tight regulation in all organisms. Collectively, the metabolite concentrations make up the metabolome, which differs amongst individuals as a function of their environment and genetic makeup. In our study, we have further developed an untargeted approach to identify genetic factors affecting human metabolism. In this approach, we first identify all genetic variants that correlate with any of the measured metabolome features in a large set of individuals. For these variants, we then compute a profile of significance for association with all features, generating a signature that facilitates the expert or computational identification of the metabolite whose concentration is most likely affected by the genetic variant at hand. Our study replicated many of the previously reported genetically driven variations in human metabolism and revealed two new striking examples of genetic variations with a sizeable effect on the urine metabolome. Interestingly, in these two gene-metabolite pairs both the gene and the affected metabolite are related to human diseases: Crohn's disease in the first case, and kidney disease in the second. This highlights the connection between genetic predispositions, affected metabolites, and human health.
doi:10.1371/journal.pgen.1004132
PMCID: PMC3930510  PMID: 24586186
4.  Discovery of Sexual Dimorphisms in Metabolic and Genetic Biomarkers 
PLoS Genetics  2011;7(8):e1002215.
Metabolomic profiling and the integration of whole-genome genetic association data have proven to be a powerful tool to comprehensively explore gene regulatory networks and to investigate the effects of genetic variation at the molecular level. Serum metabolite concentrations allow a direct readout of biological processes, and association of specific metabolomic signatures with complex diseases such as Alzheimer's disease and cardiovascular and metabolic disorders has been shown. There are well-known correlations between sex and the incidence, prevalence, age of onset, symptoms, and severity of a disease, as well as the reaction to drugs. However, most of the studies published so far did not consider the role of sexual dimorphism and did not analyse their data stratified by sex. This study investigated sex-specific differences of serum metabolite concentrations and their underlying genetic determination. For discovery and replication we used more than 3,300 independent individuals from KORA F3 and F4 with metabolite measurements of 131 metabolites, including amino acids, phosphatidylcholines, sphingomyelins, acylcarnitines, and C6-sugars. A linear regression approach revealed significant concentration differences between males and females for 102 out of 131 metabolites (p-values<3.8×10⁻⁴; Bonferroni-corrected threshold). Sex-specific genome-wide association studies (GWAS) showed genome-wide significant differences in beta-estimates for SNPs in the CPS1 locus (carbamoyl-phosphate synthase 1, significance level: p<3.8×10⁻¹⁰; Bonferroni-corrected threshold) for glycine. We showed that the metabolite profiles of males and females are significantly different and, furthermore, that specific genetic variants in metabolism-related genes depict sexual dimorphism. Our study provides new important insights into sex-specific differences of cell regulatory processes and underscores that studies should consider sex-specific effects in design and interpretation.
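The two Bonferroni-corrected thresholds quoted above can be reproduced with simple arithmetic. The sketch below assumes base significance levels of 0.05 (per-metabolite comparison) and 5×10⁻⁸ (conventional genome-wide level), each divided by the 131 metabolites tested. The abstract does not state these base levels; they are inferred because they match the reported values after rounding.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


n_metabolites = 131  # number of metabolites reported in the abstract

# Assumed base levels (not stated in the abstract).
per_metabolite = bonferroni_threshold(0.05, n_metabolites)
genome_wide = bonferroni_threshold(5e-8, n_metabolites)

print(f"sex-difference threshold: {per_metabolite:.1e}")  # 3.8e-04
print(f"sex-specific GWAS threshold: {genome_wide:.1e}")  # 3.8e-10
```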
Author Summary
The combination of genomic and metabolic studies in recent years has provided astonishing results. However, most of the studies published so far did not consider the role of sexual dimorphism and did not analyse their data stratified by sex. The investigation of 131 serum metabolite concentrations of >3,300 population-based samples (KORA F3/F4) revealed significant differences in the metabolite profile of males and females. Furthermore, a genome-wide picture of sex-specific genetic variations in human metabolism (>2,000 subjects from KORA F3/F4 cohorts) was investigated. Sex-specific genome-wide association studies (GWAS) showed differences in the effect of genetic variations on metabolites in men and women. SNPs in the CPS1 (carbamoyl-phosphate synthase 1) locus showed genome-wide significant differences in beta-estimates of sex-specific association analysis (significance level: 3.8×10⁻¹⁰) for glycine. As global metabolomic techniques are more and more refined to identify more compounds in single biological samples, the predictive power of this new technology will greatly increase. This suggests that metabolites, which may be used as predictive biomarkers to indicate the presence or severity of a disease, have to be used selectively depending on sex.
doi:10.1371/journal.pgen.1002215
PMCID: PMC3154959  PMID: 21852955
5.  Genetic determinants of metabolism in health and disease: from biochemical genetics to genome-wide associations 
Genome Medicine  2012;4(4):30.
Increasingly sophisticated measurement technologies have allowed the fields of metabolomics and genomics to identify, in parallel, risk factors of disease; predict drug metabolism; and study metabolic and genetic diversity in large human populations. Yet the complementarity of these fields and the utility of studying genes and metabolites together are belied by the frequent separate, parallel applications of genomic and metabolomic analysis. Early attempts at identifying co-variation and interaction between genetic variants and downstream metabolic changes, including metabolic profiling of human Mendelian diseases and quantitative trait locus mapping of individual metabolite concentrations, have recently been extended by new experimental designs that search for a large number of gene-metabolite associations. These approaches, including metabolomic quantitative trait locus mapping and metabolomic genome-wide association studies, involve the concurrent collection of both genomic and metabolomic data and a subsequent search for statistical associations between genetic polymorphisms and metabolite concentrations across a broad range of genes and metabolites. These new data-fusion techniques will have important consequences in functional genomics, microbial metagenomics and disease modeling, the early results and implications of which are reviewed.
doi:10.1186/gm329
PMCID: PMC3446258  PMID: 22546284
Metabonomics/metabolomics; Quantitative Trait Locus Mapping; Biochemical Genetics; NMR; MS
6.  Integrative genome-scale metabolic analysis of Vibrio vulnificus for drug targeting and discovery 
Chromosome 1 of Vibrio vulnificus tends to contain a larger portion of essential or housekeeping genes on the basis of the genomic analysis and gene knockout experiments performed in this study, while its chromosome 2 seems to have originated and evolved from a plasmid.
The genome-scale metabolic network model of V. vulnificus was reconstructed based on databases and literature, and was used to identify 193 essential metabolites.
Five essential metabolites finally selected after the filtering process are 2-amino-4-hydroxy-6-hydroxymethyl-7,8-dihydropteridine (AHHMP), D-glutamate (DGLU), 2,3-dihydrodipicolinate (DHDP), 1-deoxy-D-xylulose 5-phosphate (DX5P), and 4-aminobenzoate (PABA), which were predicted to be essential in V. vulnificus, absent in human, and are consumed by multiple reactions.
Chemical analogs of the five essential metabolites were screened and a hit compound showing the minimal inhibitory concentration (MIC) of 2 μg/ml and the minimal bactericidal concentration (MBC) of 4 μg/ml against V. vulnificus was identified.
Discovering new antimicrobial targets and consequently new antimicrobials is important as drug resistance of pathogenic microorganisms is becoming an increasingly serious problem in human healthcare management (Fischbach and Walsh, 2009). There clearly exists a gap between genomic studies and drug discovery as the accumulation of knowledge on pathogens at genome level has not successfully transformed into the development of effective drugs (Mills, 2006; Payne et al, 2007). In this study, we dissected the genome of a microbial pathogen in detail, and subsequently developed a systems biological strategy of employing genome-scale metabolic modeling and simulation together with metabolite essentiality analysis for effective drug targeting and discovery. This strategy was used for identifying new drug targets in an opportunistic pathogen Vibrio vulnificus CMCP6 as a model.
V. vulnificus is a Gram-negative halophilic bacterium that is found in estuarine waters, brackish ponds, or coastal areas, and its Biotype 1 is an opportunistic human pathogen that can attack immune-compromised patients, and causes primary septicemia, necrotized wound infections, and gastroenteritis. We previously found that many metabolic genes were specifically induced in vivo, suggesting that specific metabolic pathways are essential for in vivo survival and virulence of this pathogen (Kim et al, 2003; Lee et al, 2007). These results motivated us to carry out systems biological analysis of the genome and the metabolic network for new drug target discovery.
V. vulnificus CMCP6 has two chromosomes. We first re-sequenced genomic regions assembled in low quality and low depth, and subsequently re-annotated the whole genome of V. vulnificus. Horizontal gene transfer was suspected to be responsible for the diversification of each chromosome of V. vulnificus, and the presence of metabolic genes was more biased to chromosome 1 than chromosome 2. Further studies on the V. vulnificus genome revealed that chromosome 2 is more prone to diversification for better adaptation to the environment than chromosome 1, while chromosome 1 tends to expand its genetic repertoire while maintaining the core genes at a constant level.
Next, a genome-scale metabolic network VvuMBEL943 was reconstructed based on literature, databases and experiments for systematic studies on the metabolism of this pathogen and prediction of drug targets. The VvuMBEL943 model is composed of 943 reactions and 765 metabolites, and covers 673 genes. The model was validated by comparing its simulated cell growth phenotype obtained by constraints-based flux analysis with the V. vulnificus-specific experimental data previously reported in the literature. In this study, constraints-based flux analysis is an optimization-based simulation method that calculates intracellular fluxes under specific genetic and environmental conditions (Kim et al, 2008). As a result, 17 growth phenotypes were correctly predicted out of 18 cases, which supports the validity of VvuMBEL943.
The main objective of constructing VvuMBEL943 in this study is to predict potential drug targets by system-wide analysis of the metabolic network for the effective treatment of V. vulnificus. To achieve this goal, a set of drug target candidates was predicted by taking a metabolite-centric approach. Metabolite essentiality analysis is a concept recently introduced for the study of cellular robustness to complement conventional reaction or gene-centric approach (Kim et al, 2007b). Metabolite essentiality analysis observes changes in flux distribution by removing each metabolite from the in silico metabolic network. Hence, metabolite essentiality predicts essential metabolites whose absence causes cell death. By selecting essential metabolites, it is possible to directly screen only their structural analogs, which substantially reduces the number of chemical compounds to screen from the chemical compound library. As a result of implementing this approach, 193 metabolites were initially identified to be essential to the cell. These essential metabolites were then further filtered based on the predetermined criteria, mainly organism specificity and multiple connectivity associated with each metabolite, in order to reduce the number of initial target candidates towards identifying the most effective ones.
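The idea of removing each metabolite in turn and checking whether the cell can still grow can be sketched on a toy network. The example below is a deliberately simplified stand-in: it replaces constraints-based flux analysis with a plain reachability check, and the nine-reaction network, metabolite names, and nutrient set are invented for illustration and have nothing to do with VvuMBEL943. Removing a metabolite is modelled as disabling every reaction that consumes it.

```python
# Hypothetical toy network: reaction id -> (substrates, products).
REACTIONS = {
    "r1": ({"glc"}, {"g6p"}),
    "r2": ({"g6p"}, {"f6p"}),
    "r3": ({"f6p"}, {"pyr"}),
    "r4": ({"g6p"}, {"pyr"}),          # bypass that makes f6p dispensable
    "r5": ({"g6p"}, {"r5p"}),
    "r6": ({"pyr"}, {"aa"}),
    "r7": ({"r5p"}, {"nuc"}),
    "r8": ({"pyr"}, {"lac"}),          # dead-end by-product
    "r9": ({"aa", "nuc"}, {"biomass"}),
}
NUTRIENTS = {"glc"}   # metabolites supplied by the medium
TARGET = "biomass"    # growth is possible if this can be produced


def can_grow(blocked=None):
    """Return True if TARGET is producible when `blocked` cannot be consumed."""
    active = [(subs, prods) for subs, prods in REACTIONS.values()
              if blocked not in subs]
    available = set(NUTRIENTS)
    changed = True
    while changed:
        changed = False
        for subs, prods in active:
            # A reaction fires once all of its substrates are available.
            if subs <= available and not prods <= available:
                available |= prods
                changed = True
    return TARGET in available


metabolites = set().union(*(s | p for s, p in REACTIONS.values())) - {TARGET}
essential = {m for m in metabolites if not can_grow(blocked=m)}
print("essential:", sorted(essential))
print("dispensable:", sorted(metabolites - essential))
```

In this toy case f6p is dispensable because of the bypass reaction, and lac because nothing consumes it. A real analysis would also apply the filtering criteria mentioned above, such as absence from the human host.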
Five essential metabolites finally selected are 2-amino-4-hydroxy-6-hydroxymethyl-7,8-dihydropteridine (AHHMP), D-glutamate (DGLU), 2,3-dihydrodipicolinate (DHDP), 1-deoxy-D-xylulose 5-phosphate (DX5P), and 4-aminobenzoate (PABA). Enzymes that consume these essential metabolites were experimentally verified to be essential, which supports the essentiality of these five metabolites. On the basis of the structural information of these five essential metabolites, a whole-cell screening assay was performed using their analogs for possible antibacterial discovery. We screened 352 chemical analogs of the essential metabolites selected from the chemical compound library, and found a hit compound 24837, which shows the minimal inhibitory concentration (MIC) of 2 μg/ml and minimal bactericidal concentration (MBC) of 4 μg/ml, showing good antibacterial activity without further structural modification. Although this study is a proof of concept, the approaches and their rationale taken here should serve as a general strategy for discovering novel antibiotics and drugs based on systems-level analysis of metabolic networks.
Although the genomes of many microbial pathogens have been studied to help identify effective drug targets and novel drugs, such efforts have not yet reached full fruition. In this study, we report a systems biological approach that efficiently utilizes genomic information for drug targeting and discovery, and apply this approach to the opportunistic pathogen Vibrio vulnificus CMCP6. First, we partially re-sequenced and fully re-annotated the V. vulnificus CMCP6 genome, and accordingly reconstructed its genome-scale metabolic network, VvuMBEL943. The validated network model was employed to systematically predict drug targets using the concept of metabolite essentiality, along with additional filtering criteria. Target genes encoding enzymes that interact with the five essential metabolites finally selected were experimentally validated. These five essential metabolites are critical to the survival of the cell, and hence were used to guide the cost-effective selection of chemical analogs, which were then screened for antimicrobial activity in a whole-cell assay. This approach is expected to help fill the existing gap between genomics and drug discovery.
doi:10.1038/msb.2010.115
PMCID: PMC3049409  PMID: 21245845
drug discovery; drug targeting; genome analysis; metabolic network; Vibrio vulnificus
7.  A global approach to analysis and interpretation of metabolic data for plant natural product discovery
Natural Product Reports  2013;30(4):565-583.
Discovering molecular components and their functionality is key to the development of hypotheses concerning the organization and regulation of metabolic networks. The iterative experimental testing of such hypotheses is the trajectory that can ultimately enable accurate computational modelling and prediction of metabolic outcomes. This information can be particularly important for understanding the biology of natural products, whose metabolism itself is often only poorly defined. Here, we describe factors that must be in place to optimize the use of metabolomics in predictive biology. A key to achieving this vision is a collection of accurate time-resolved and spatially defined metabolite abundance data and associated metadata. One formidable challenge associated with metabolite profiling is the complexity and analytical limits associated with comprehensively determining the metabolome of an organism. Further, for metabolomics data to be efficiently used by the research community, it must be curated in publicly available metabolomics databases. Such databases require clear, consistent formats, easy access to data and metadata, data download, and accessible computational tools to integrate genome system-scale datasets. Although transcriptomics and proteomics integrate the linear predictive power of the genome, the metabolome represents the nonlinear, final biochemical products of the genome, which results from the intricate system(s) that regulate genome expression. For example, the relationship of metabolomics data to the metabolic network is confounded by redundant connections between metabolites and gene-products. However, connections among metabolites are predictable through the rules of chemistry. Therefore, enhancing the ability to integrate the metabolome with anchor-points in the transcriptome and proteome will enhance the predictive power of genomics data.
We detail a public database repository for metabolomics, tools and approaches for statistical analysis of metabolomics data, and methods for integrating these datasets with transcriptomic data to create hypotheses concerning specialized metabolism that generates the diversity in natural product chemistry. We discuss the importance of close collaborations among biologists, chemists, computer scientists and statisticians throughout the development of such integrated metabolism-centric databases and software.
doi:10.1039/c3np20111b
PMCID: PMC3629923  PMID: 23447050
8.  Insight in Genome-Wide Association of Metabolite Quantitative Traits by Exome Sequence Analyses 
PLoS Genetics  2015;11(1):e1004835.
Metabolite quantitative traits carry great promise for epidemiological studies, and their genetic background has been addressed using Genome-Wide Association Studies (GWAS). Thus far, the role of less common variants has not been exhaustively studied. Here, we set out a GWAS for metabolite quantitative traits in serum, followed by exome sequence analysis to zoom in on putative causal variants in the associated genes. 1H Nuclear Magnetic Resonance (1H-NMR) spectroscopy experiments yielded successful quantification of 42 unique metabolites in 2,482 individuals from The Erasmus Rucphen Family (ERF) study. Heritabilities of metabolites were estimated by SOLAR. GWAS was performed by linear mixed models, using HapMap imputations. Based on physical vicinity and pathway analyses, candidate genes were screened for coding region variation using exome sequence data. Heritability estimates for metabolites ranged between 10% and 52%. GWAS replicated three known loci at metabolome-wide significance: CPS1 with glycine (P-value = 1.27×10⁻³²), PRODH with proline (P-value = 1.11×10⁻¹⁹), SLC16A9 with carnitine level (P-value = 4.81×10⁻¹⁴) and uncovered a novel association between DMGDH and dimethyl-glycine (P-value = 1.65×10⁻¹⁹) level. In addition, we found three novel, suggestively significant loci: TNP1 with pyruvate (P-value = 1.26×10⁻⁸), KCNJ16 with 3-hydroxybutyrate (P-value = 1.65×10⁻⁸) and 2p12 locus with valine (P-value = 3.49×10⁻⁸). Exome sequence analysis identified potentially causal coding and regulatory variants located in the genes CPS1, KCNJ2 and PRODH, and revealed allelic heterogeneity for CPS1 and PRODH. Combined GWAS and exome analyses of metabolites detected by high-resolution 1H-NMR is a robust approach to uncover metabolite quantitative trait loci (mQTL), and the likely causative variants in these loci. It is anticipated that insight in the genetics of intermediate phenotypes will provide additional insight into the genetics of complex traits.
Author Summary
Human metabolic individuality is under strict control of genetic and environmental factors. In our study, we aimed to find the genetic determinants of circulating molecules in the sera of a large set of individuals representing the general population. First, we performed a hypothesis-free genome-wide screen in this population to identify genetic regions of interest. Our study confirmed four known gene-metabolite connections, but also pointed to four novel ones. Genome-wide screens enriched for common intergenic variants may miss causal genetic variations directly changing the protein sequence. To investigate this further, we zoomed into regions of interest and tested whether the association signals obtained in the first stage were direct, or whether they represent causal variations, which were not captured in the initial panel. These subsequent tests showed that protein coding and regulatory variations are involved in metabolite levels. For two genomic regions we also found that genes harbour more than one causal variant influencing metabolite levels independently of each other. We also observed a strong connection between markers of cardio-metabolic health and metabolites. Taken together, our novel loci are of interest for further research to investigate the causal relation to, for instance, type 2 diabetes and cardiovascular disease.
doi:10.1371/journal.pgen.1004835
PMCID: PMC4287344  PMID: 25569235
9.  On the hypothesis-free testing of metabolite ratios in genome-wide and metabolome-wide association studies 
BMC Bioinformatics  2012;13:120.
Background
Genome-wide association studies (GWAS) with metabolic traits and metabolome-wide association studies (MWAS) with traits of biomedical relevance are powerful tools to identify the contribution of genetic, environmental and lifestyle factors to the etiology of complex diseases. Hypothesis-free testing of ratios between all possible metabolite pairs in GWAS and MWAS has proven to be an innovative approach in the discovery of new biologically meaningful associations. The p-gain statistic was introduced as an ad-hoc measure to determine whether a ratio between two metabolite concentrations carries more information than the two corresponding metabolite concentrations alone. So far, only a rule of thumb was applied to determine the significance of the p-gain.
Results
Here we explore the statistical properties of the p-gain through simulation of its density and by sampling of experimental data. We derive critical values of the p-gain for different levels of correlation between metabolite pairs and show that B/(2*α) is a conservative critical value for the p-gain, where α is the level of significance and B the number of tested metabolite pairs.
Conclusions
We show that the p-gain is a well defined measure that can be used to identify statistically significant metabolite ratios in association studies and provide a conservative significance cut-off for the p-gain for use in future association studies with metabolic traits.
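The cut-off described in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of the p-gain (the smaller of the two single-metabolite p-values divided by the p-value of their ratio), which the abstract itself does not spell out. The function names and the p-values below are hypothetical and chosen only for illustration.

```python
def p_gain(p_m1: float, p_m2: float, p_ratio: float) -> float:
    """Assumed definition: min single-metabolite p-value / ratio p-value."""
    return min(p_m1, p_m2) / p_ratio


def critical_p_gain(n_pairs: int, alpha: float = 0.05) -> float:
    """Conservative critical value B / (2 * alpha) quoted in the abstract."""
    return n_pairs / (2 * alpha)


# Hypothetical association p-values for metabolites A and B and the ratio A/B.
p_a, p_b, p_ab = 1e-6, 1e-4, 1e-12
n_tested_pairs = 100  # B: number of metabolite pairs tested (made-up value)

gain = p_gain(p_a, p_b, p_ab)
threshold = critical_p_gain(n_tested_pairs, alpha=0.05)
ratio_is_informative = gain > threshold

print(f"p-gain = {gain:.3g}, critical value = {threshold:.3g}, "
      f"significant = {ratio_is_informative}")
```

With these made-up inputs the ratio's p-gain is far above the critical value, so the ratio would be treated as carrying more information than either metabolite alone.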
doi:10.1186/1471-2105-13-120
PMCID: PMC3537592  PMID: 22672667
p-gain; Metabolomics; MWAS; GWAS; Genome-wide association studies; Metabolome-wide association studies
10.  ADEMA: An Algorithm to Determine Expected Metabolite Level Alterations Using Mutual Information 
PLoS Computational Biology  2013;9(1):e1002859.
Metabolomics is a relatively new “omics” platform, which analyzes a discrete set of metabolites detected in bio-fluids or tissue samples of organisms. It has been used in a diverse array of studies to detect biomarkers and to determine activity rates for pathways based on changes due to disease or drugs. Recent improvements in analytical methodology and large sample throughput allow for creation of large datasets of metabolites that reflect changes in metabolic dynamics due to disease or a perturbation in the metabolic network. However, current methods of comprehensive analyses of large metabolic datasets (metabolomics) are limited, unlike other “omics” approaches where complex techniques for analyzing coexpression/coregulation of multiple variables are applied. This paper discusses the shortcomings of current metabolomics data analysis techniques, and proposes a new multivariate technique (ADEMA) based on mutual information to identify expected metabolite level changes with respect to a specific condition. We show that ADEMA better predicts De Novo Lipogenesis pathway metabolite level changes in samples with Cystic Fibrosis (CF) than prediction based on the significance of individual metabolite level changes. We also applied ADEMA's classification scheme on three different cohorts of CF and wildtype mice. ADEMA was able to predict whether an unknown mouse has a CF or a wildtype genotype with 1.0, 0.84, and 0.9 accuracy for each respective dataset. ADEMA results had up to 31% higher accuracy as compared to other classification algorithms. In conclusion, ADEMA advances the state-of-the-art in metabolomics analysis, by providing accurate and interpretable classification results.
Author Summary
Metabolomics is an experimental approach that analyzes differences in metabolite levels detected in experimental samples. It has been used in the literature to understand the changes in metabolism with respect to diseases or drugs. Unlike transcriptomics or proteomics, which analyze gene and protein expression levels respectively, the techniques that consider co-regulation of multiple metabolites are quite limited. In this paper, we propose a novel technique, called ADEMA, which computes the expected level changes for each metabolite with respect to a given condition. ADEMA considers multiple metabolites at the same time and is mutual information (MI)-based. We show that ADEMA predicts metabolite level changes for young mice with Cystic Fibrosis (CF) better than significance testing that considers one metabolite at a time. Using three different datasets that contain CF and wild-type (WT) mice, we show that ADEMA can classify an individual as being CF or WT based on the metabolic profiles (with 1.0, 0.84, and 0.9 accuracy, respectively). Compared to other well-known classification algorithms, ADEMA's accuracy is higher by up to 31%.
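As a rough illustration of the quantity ADEMA builds on, the sketch below computes the mutual information between a discretized metabolite level and a genotype label. It is not the ADEMA algorithm itself; the function name, the discretization into "low"/"high", and the toy samples are assumptions made for this example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_joint = c / n
        p_indep = (count_x[x] / n) * (count_y[y] / n)
        mi += p_joint * log2(p_joint / p_indep)
    return mi


# Toy data: genotype labels and two hypothetical discretized metabolites.
genotype = ["CF", "CF", "WT", "WT"]
metabolite_linked = ["low", "low", "high", "high"]    # tracks genotype exactly
metabolite_unlinked = ["low", "high", "low", "high"]  # independent of genotype

mi_linked = mutual_information(metabolite_linked, genotype)
mi_unlinked = mutual_information(metabolite_unlinked, genotype)
print(mi_linked, mi_unlinked)
```

A metabolite whose level tracks the genotype shares one full bit of information with the label in this toy case, while the unrelated metabolite shares none.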
doi:10.1371/journal.pcbi.1002859
PMCID: PMC3547803  PMID: 23341761
11.  A Genome-Wide Metabolic QTL Analysis in Europeans Implicates Two Loci Shaped by Recent Positive Selection 
PLoS Genetics  2011;7(9):e1002270.
We have performed a metabolite quantitative trait locus (mQTL) study of the 1H nuclear magnetic resonance spectroscopy (1H NMR) metabolome in humans, building on recent targeted knowledge of genetic drivers of metabolic regulation. Urine and plasma samples were collected from two cohorts of individuals of European descent, with one cohort comprised of female twins donating samples longitudinally. Sample metabolite concentrations were quantified by 1H NMR and tested for association with genome-wide single-nucleotide polymorphisms (SNPs). Four metabolites' concentrations exhibited significant, replicable association with SNP variation (8.6×10^−11
Author Summary
Physiological concentrations of metabolites—small molecules involved in biochemical processes in living systems—can be measured and used to diagnose and predict disease states. A common goal is to detect and clinically exploit statistical differences in metabolite concentrations between diseased and healthy individuals. As a basis for the design and interpretation of case-control studies, it is useful to have a characterization of metabolic diversity amongst healthy individuals, some of which stems from inter-individual genetic variation. When a single genetic locus has a sufficiently strong effect on metabolism, its genomic position can be determined by collecting metabolite concentration data and genome-wide genotype data on a set of individuals and searching for associations between the two data sets—a so-called metabolite quantitative trait locus (mQTL) study. By so tracing mQTLs, we can identify the genetic drivers of metabolism, characterize how the nature or quantity of the corresponding expressed protein(s) feeds forward to influence metabolite levels, and specify disease-predictive models that incorporate mutual dependence amongst genetics, environment, and metabolism.
doi:10.1371/journal.pgen.1002270
PMCID: PMC3169529  PMID: 21931564
Annals of Botany  2012;110(6):1281-1290.
Background
Forage plant breeding is under increasing pressure to deliver new cultivars with improved yield, quality and persistence to the pastoral industry. Innovations in DNA sequencing technologies mean that quantitative trait loci analysis and marker-assisted selection approaches are becoming faster and cheaper, and are increasingly used in the breeding process with the aim of speeding it up and improving its precision. High-throughput phenotyping is currently a major bottleneck, and emerging technologies such as metabolomics are being developed to bridge the gap between genotype and phenotype; metabolomics studies on forages are reviewed in this article.
Scope
Major challenges for pasture production arise from the reduced availability of resources, mainly water, nitrogen and phosphorus, and metabolomics studies on metabolic responses to these abiotic stresses in Lolium perenne and Lotus species will be discussed here. Many forage plants can be associated with symbiotic microorganisms such as legumes with nitrogen fixing rhizobia, grasses and legumes with phosphorus-solubilizing arbuscular mycorrhizal fungi, and cool temperate grasses with fungal anti-herbivorous alkaloid-producing Neotyphodium endophytes and metabolomics studies have shown that these associations can significantly affect the metabolic composition of forage plants. The combination of genetics and metabolomics, also known as genetical metabolomics can be a powerful tool to identify genetic regions related to specific metabolites or metabolic profiles, but this approach has not been widely adopted for forages yet, and we argue here that more studies are needed to improve our chances of success in forage breeding.
Conclusions
Metabolomics combined with other ‘-omics’ technologies and genome sequencing can be invaluable tools for large-scale geno- and phenotyping of breeding populations, although the implementation of these approaches in forage breeding programmes still lags behind. The majority of studies using metabolomics approaches have been performed with model species or cereals and findings from these studies are not easily translated to forage species. To be most effective these approaches should be accompanied by whole-plant physiology and proof of concept (modelling) studies. Wider considerations of possible consequences of novel traits on the fitness of new cultivars and symbiotic associations need also to be taken into account.
doi:10.1093/aob/mcs023
PMCID: PMC3478039  PMID: 22351485
Metabolomics; forage plants; Lolium perenne; drought stress; nutrient stress; Neotyphodium spp. endophytes; Medicago sativa; Lotus spp.; arbuscular mycorrhizal fungi; rhizobia; metabolic profiling; genetical metabolomics
PLoS Genetics  2012;8(8):e1002907.
Association testing of multiple correlated phenotypes offers better power than univariate analysis of single traits. We analyzed 6,600 individuals from two population-based cohorts with both genome-wide SNP data and serum metabolomic profiles. From the observed correlation structure of 130 metabolites measured by nuclear magnetic resonance, we identified 11 metabolic networks and performed a multivariate genome-wide association analysis. We identified 34 genomic loci at genome-wide significance, of which 7 are novel. In comparison to univariate tests, multivariate association analysis identified nearly twice as many significant associations in total. Multi-tissue gene expression studies identified variants in our top loci, SERPINA1 and AQP9, as eQTLs and showed that SERPINA1 and AQP9 expression in human blood was associated with metabolites from their corresponding metabolic networks. Finally, liver expression of AQP9 was associated with atherosclerotic lesion area in mice, and in human arterial tissue both SERPINA1 and AQP9 were shown to be upregulated (6.3-fold and 4.6-fold, respectively) in atherosclerotic plaques. Our study illustrates the power of multi-phenotype GWAS and highlights candidate genes for atherosclerosis.
Author Summary
In this study, we aim to identify novel genetic variants for metabolism, characterize their effects on nearby genes, and show that the nearby genes are associated with metabolism and atherosclerosis. To discover new genetic variants, we use an alternative approach to traditional genome-wide association studies: we leverage the information in phenotype covariance to increase our statistical power. We identify variants at seven novel loci and then show that our top signals drive expression of nearby genes AQP9 and SERPINA1 in multiple tissues. We demonstrate that AQP9 and SERPINA1 gene expression, in turn, is associated with metabolite levels. Finally, we show that the genes are associated with atherosclerosis using mouse atherosclerotic lesion size (AQP9) as well as tissue from healthy human arteries and atherosclerotic plaques (AQP9 and SERPINA1). This study illustrates that multivariate analysis of correlated metabolites can boost power for gene discovery substantially. Further functional work will need to be performed to elucidate the biological role of SERPINA1 and AQP9 in atherosclerosis.
doi:10.1371/journal.pgen.1002907
PMCID: PMC3420921  PMID: 22916037
BMC Genomics  2013;14:865.
Background
Genome-wide association studies (GWAS) have identified many common single nucleotide polymorphisms (SNPs) that associate with clinical phenotypes, but these SNPs usually explain just a small part of the heritability and have relatively modest effect sizes. In contrast, SNPs that associate with metabolite levels generally explain a higher percentage of the genetic variation and demonstrate larger effect sizes. Still, the discovery of SNPs associated with metabolite levels is challenging since testing all metabolites measured in typical metabolomics studies with all SNPs comes with a severe multiple testing penalty. We have developed an automated workflow approach that utilizes prior knowledge of biochemical pathways present in databases like KEGG and BioCyc to generate a smaller SNP set relevant to the metabolite. This paper explores the opportunities and challenges in the analysis of GWAS of metabolomic phenotypes and provides novel insights into the genetic basis of metabolic variation through the re-analysis of published GWAS datasets.
Results
Re-analysis of the published GWAS dataset from Illig et al. (Nature Genetics, 2010) using a pathway-based workflow (http://www.myexperiment.org/packs/319.html) confirmed previously identified hits and identified a new locus of human metabolic individuality, associating Aldehyde dehydrogenase family 1 L1 (ALDH1L1) with serine/glycine ratios in blood. Replication in an independent GWAS dataset of phospholipids (Demirkan et al., PLoS Genetics, 2012) identified two novel loci supported by additional literature evidence: GPAM (Glycerol-3 phosphate acyltransferase) and CBS (Cystathionine beta-synthase). In addition, the workflow approach provided novel insight into the affected pathways and the relevance of some of these gene-metabolite pairs in disease development and progression.
Conclusions
We demonstrate the utility of automated exploitation of background knowledge present in pathway databases for the analysis of GWAS datasets of metabolomic phenotypes. We report novel loci and potential biochemical mechanisms that contribute to our understanding of the genetic basis of metabolic variation and its relationship to disease development and progression.
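The multiple testing penalty mentioned in the Background, and the benefit of restricting each metabolite to a pathway-derived SNP set, can be sketched with a simple Bonferroni calculation. Bonferroni correction is an assumption here (the abstract does not say which correction the workflow uses), and all counts below are hypothetical.

```python
def bonferroni_threshold(n_metabolites: int, n_snps: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold when every metabolite is tested against every SNP."""
    return alpha / (n_metabolites * n_snps)


# Hypothetical study sizes, for illustration only.
n_metabolites = 100
all_snps = 500_000     # genome-wide SNP panel
pathway_snps = 2_000   # SNPs near genes in the metabolite's pathways

full_threshold = bonferroni_threshold(n_metabolites, all_snps)
reduced_threshold = bonferroni_threshold(n_metabolites, pathway_snps)

print(f"genome-wide threshold:     {full_threshold:.2e}")
print(f"pathway-limited threshold: {reduced_threshold:.2e}")
print(f"relaxation factor:         {reduced_threshold / full_threshold:.0f}x")
```

Under these assumed numbers the pathway-limited test set relaxes the per-test threshold by the same factor by which the SNP set shrinks, which is how a prior-knowledge filter can recover associations that miss genome-wide significance.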
doi:10.1186/1471-2164-14-865
PMCID: PMC3879060  PMID: 24320595
Genome-wide association; Metabolite; Genotype-phenotype prioritization; Bioinformatics; Pathway databases
PLoS Computational Biology  2009;5(1):e1000270.
Metabolite concentrations can regulate gene expression, which can in turn regulate metabolic activity. The extent to which functionally related transcripts and metabolites show similar patterns of concentration changes, however, remains unestablished. We measure and analyze the metabolomic and transcriptional responses of Saccharomyces cerevisiae to carbon and nitrogen starvation. Our analysis demonstrates that transcripts and metabolites show coordinated response dynamics. Furthermore, metabolites and gene products whose concentration profiles are alike tend to participate in related biological processes. To identify specific, functionally related genes and metabolites, we develop an approach based on Bayesian integration of the joint metabolomic and transcriptomic data. This algorithm finds interactions by evaluating transcript–metabolite correlations in light of the experimental context in which they occur and the class of metabolite involved. It effectively predicts known enzymatic and regulatory relationships, including a gene–metabolite interaction central to the glycolytic–gluconeogenetic switch. This work provides quantitative evidence that functionally related metabolites and transcripts show coherent patterns of behavior on the genome scale and lays the groundwork for building gene–metabolite interaction networks directly from systems-level data.
Author Summary
Metabolism is the process of converting nutrients into usable energy and the building blocks of cellular structures. Although the biochemical reactions of metabolism are well characterized, the ways in which metabolism is regulated and regulates other biological processes remain incompletely understood. In particular, the extent to which metabolite concentrations are related to the production of gene products is an open question. To address this question, we have measured the dynamics of both metabolites and gene products in yeast in response to two different environmental stresses. We find a strong coordination of the responses of metabolites and functionally related gene products. The nature of this correlation (e.g., whether it is direct or inverse) depends on the type of metabolite (e.g., amino acid versus glycolytic compound) and the kind of stress to which the cells were subjected. We have used our observations of these dependencies to design a Bayesian algorithm that predicts functional relationships between metabolites and genes directly from experimental data. This approach lays the groundwork for a systems-level understanding of metabolism and its regulation by (and of) gene product levels. Such an understanding would be valuable for metabolic engineering and for understanding and treating metabolic diseases.
doi:10.1371/journal.pcbi.1000270
PMCID: PMC2614473  PMID: 19180179
PLoS Genetics  2010;6(11):e1001198.
Discovering links between the genotype of an organism and its metabolite levels can increase our understanding of metabolism, its controls, and the indirect effects of metabolism on other quantitative traits. Recent technological advances in both DNA sequencing and metabolite profiling allow the use of broad-spectrum, untargeted metabolite profiling to generate phenotypic data for genome-wide association studies that investigate quantitative genetic control of metabolism within species. We conducted a genome-wide association study of natural variation in plant metabolism using the results of untargeted metabolite analyses performed on a collection of wild Arabidopsis thaliana accessions. Testing 327 metabolites against >200,000 single nucleotide polymorphisms identified numerous genotype–metabolite associations distributed non-randomly within the genome. These clusters of genotype–metabolite associations (hotspots) included regions of the A. thaliana genome previously identified as subject to recent strong positive selection (selective sweeps) and regions showing trans-linkage to these putative sweeps, suggesting that these selective forces have impacted genome-wide control of A. thaliana metabolism. Comparing the metabolic variation detected within this collection of wild accessions to a laboratory-derived population of recombinant inbred lines (derived from two of the accessions used in this study) showed that the higher level of genetic variation present within the wild accessions did not correspond to higher variance in metabolic phenotypes, suggesting that evolutionary constraints limit metabolic variation. While a major goal of genome-wide association studies is to develop catalogues of intraspecific variation, the results of multiple independent experiments performed for this study showed that the genotype–metabolite associations identified are sensitive to environmental fluctuations. Thus, studies of intraspecific variation conducted via genome-wide association will require analyses of genotype by environment interaction. Interestingly, the network structure of metabolite linkages was also sensitive to environmental differences, suggesting that key aspects of network architecture are malleable.
Author Summary
Understanding how genetic variation can control phenotypic variation is a fundamental goal of modern biology. We combined genome-wide association mapping with metabolomics in the plant Arabidopsis thaliana to explore how species-wide genetic variation controls metabolism. We identified numerous naturally-variable genes that may influence plant metabolism, often clustering in “hotspots.” These hotspots were proximal to selective sweeps, regions of the genome showing decreased diversity possibly from a strong selective advantage of specific variants within the region. This suggests that metabolism may be connected to the selective advantage. Interestingly, metabolite variation in wild Arabidopsis is highly constrained despite the significant genetic variation, thus providing the plant un-sampled metabolic space if the environment shifts. The observed structuring of genetic and metabolic variation suggests individual convergence upon similar phenotypes via different genotypes, possibly intra-specific parallel evolution. This phenotypic convergence couples with a pattern of genotype—phenotype association consistent with metabolite variation largely controlled by numerous small effect genetic variants. This supports the supposition that large magnitude variation is likely unstable in a complex and interconnected metabolism. If this pattern proves generally applicable to other species, it could present a significant hurdle to identifying genes controlling metabolic trait variation via genome-wide association studies.
doi:10.1371/journal.pgen.1001198
PMCID: PMC2973833  PMID: 21079692
Genes & Nutrition  2012;8(1):19-27.
Genome-wide association studies (GWASs) have become a very important tool to address the genetic origin of phenotypic variability, in particular associated with diseases. Nevertheless, these types of studies provide limited information about disease etiology and the molecular mechanisms involved. Recently, the incorporation of metabolomics into the analysis has offered novel opportunities for a better understanding of disease-related metabolic deregulation. The pattern emerging from this work is that gene-driven changes in metabolism are prevalent and that common genetic variations can have a profound impact on the homeostatic concentrations of specific metabolites. A particularly interesting aspect of this work takes into account interactions of environment and lifestyle with the genome and how this interaction translates into changes in the metabolome. For instance, the role of PYROXD2 in trimethylamine metabolism points to an interaction between host and microbiome genomes (host/microbiota). Often, these findings reveal metabolic deregulations, which could eventually be tuned with a nutritional intervention. Here we review the development of gene–metabolism association studies from a single-gene/single-metabolite to a genome-wide/metabolome-wide approach and highlight the conceptual changes associated with this ongoing transition. Moreover, we report some of our recent GWAS results on a cohort of 265 individuals from an ethnically diverse population that validate and refine previous findings on gene–urine metabolism interactions. Specifically, our results confirm the effect of PYROXD2 polymorphisms on trimethylamine metabolism and suggest that a previously reported association of N-acetylated compounds with the ALMS1/NAT8 locus is driven by SNPs in the ALMS1 gene.
doi:10.1007/s12263-012-0313-7
PMCID: PMC3534994  PMID: 23065485
Metabolome wide associations; Genome wide associations; Metabolomics; Nutrition
We have designed an experimental/computational framework for studying complex phenotypes in bacteria. Our framework relies on whole-genome fitness profiling coupled with a module-level analysis to discover pathways that directly affect fitness. As a proof-of-principle, we studied ethanol tolerance in Escherichia coli and we identified key pathways that contribute to this phenotype. We then validated our findings through genetic manipulations, gene-expression profiling, metabolite-level measurements, and stable-isotope labeling.
Elucidating the genetic basis of complex phenotypes remains a fundamental challenge in biology. We have developed a systematic framework for comprehensive genetic analysis of microbial phenotypes. Our approach combines the power of fitness profiling (Girgis et al, 2007; Amini et al, 2009) with the sensitivity of module-level analysis (Goodarzi et al, 2009a) to identify key genetic modules that directly affect a phenotype under study. We applied our technology to ethanol tolerance, a complex phenotype with broad industrial relevance. Ethanol affects a variety of cellular components and pathways, including but not limited to membrane integrity (Dombek and Ingram, 1984), enzyme activities (Millar et al, 1982), and proton flux (D'Amore et al, 1990). Given the diversity of targets, the emergence of ethanol tolerance requires modifications to multiple pathways (D'Amore and Stewart, 1987).
To reveal the genetic basis of ethanol tolerance in Escherichia coli, we used two high-coverage mutant libraries (a transposon library and an overexpression library) to assess the fitness consequences of single-locus perturbations. Each cell in our transposon library contains a random transposon insertion in its genome (Girgis et al, 2007); whereas the cells in the overexpression library carry 1–3 kb genomic fragments cloned into a cloning vector (Amini et al, 2009). We grew these libraries under mild (4% v/v) and harsh (5.5% v/v) ethanol concentrations. On growth, the abundance of each transposon insertion or overexpression mutant changes as a function of its fitness, a process that can be monitored through parallel genetic footprinting and microarray hybridization (Figure 1A). This results in a global fitness profile, where the contribution of each genetic locus to ethanol tolerance can be quantified in parallel. However, in the context of ethanol tolerance and other complex phenotypes, single-locus perturbations typically result in modest changes in fitness. Although these small differences can be amplified through multiple rounds of selection, the number of generations is limited as spontaneous beneficial mutations emerge in the population and cause strong biases in the resulting fitness profiles. To boost our analytical power without introducing these biases in the data, we used a module-level computational method to discover the pathways and components that are strongly associated with the data as opposed to focusing on the genes individually (Goodarzi et al, 2009a). Genes function in the context of pathways and modules and module-level analyses increase statistical power through combining information from multiple genes functioning as part of a given pathway (Subramanian et al, 2005).
The module-level analysis of the fitness scores from both libraries revealed a diverse set of pathways that have a direct function in ethanol tolerance. Some of these pathways, including heat-shock stress response and osmoregulation, are known modifiers of ethanol tolerance; whereas others such as acid-stress response and fimbrial structures are novel pathways. Among our findings was the important function of three regulatory proteins: FNR, ArcA, and CafA. Knocking out FNR/ArcA that upregulates aerobic respiration proteins and TCA cycle components results in a marked increase in ethanol tolerance. Similarly, knocking out CafA, a post-transcriptional regulator of alcohol dehydrogenase, is beneficial for tolerance. Given these observations, we hypothesized that selection for ethanol tolerance can result in higher ethanol degradation.
As a large fraction of discovered pathways belonged to central metabolism, we used metabolomics to evaluate our findings. To directly assess the metabolic consequences of adaptation to ethanol, we evolved ethanol-tolerant strains in minimal media plus glucose for ∼30 and 160 generations. We then compared the steady-state level of metabolites in these strains to that of the wild type (Figure 1B). In agreement with our fitness profiling results, we observed a significant increase in TCA cycle metabolites in one of our ethanol-tolerant strains. Higher concentrations of TCA cycle components along with a high free coenzyme A (CoA) to acetyl-coenzyme A (acetyl-CoA) ratio hinted at the capacity of this strain to metabolize ethanol. To test this hypothesis, we performed stable-isotope labeling on our ethanol-tolerant strain versus wild type. After growth on labeled ethanol, we measured the fraction of metabolites that were labeled at each timepoint (Figure 1B). Our results confirmed that the ethanol-tolerant strain has the capacity to consume ethanol through its conversion into acetyl-CoA and further assimilation in the TCA cycle.
By using a variety of systems-level approaches, we have been able to genetically dissect ethanol tolerance in E. coli. We have shown that fitness profiling, in combination with module-level analysis tools, can serve as a powerful approach for revealing the genetic basis of complex phenotypes. The fact that laboratory evolution ended up using the very modules that we discovered highlights the biological and adaptive relevance of the proposed framework.
Understanding the genetic basis of adaptation is a central problem in biology. However, revealing the underlying molecular mechanisms has been challenging as changes in fitness may result from perturbations to many pathways, any of which may contribute relatively little. We have developed a combined experimental/computational framework to address this problem and used it to understand the genetic basis of ethanol tolerance in Escherichia coli. We used fitness profiling to measure the consequences of single-locus perturbations in the context of ethanol exposure. A module-level computational analysis was then used to reveal the organization of the contributing loci into cellular processes and regulatory pathways (e.g. osmoregulation and cell-wall biogenesis) whose modifications significantly affect ethanol tolerance. Strikingly, we discovered that a dominant component of adaptation involves metabolic rewiring that boosts intracellular ethanol degradation and assimilation. Through phenotypic and metabolomic analysis of laboratory-evolved ethanol-tolerant strains, we investigated naturally accessible pathways of ethanol tolerance. Remarkably, these laboratory-evolved strains, by and large, follow the same adaptive paths as inferred from our coarse-grained search of the fitness landscape.
doi:10.1038/msb.2010.33
PMCID: PMC2913397  PMID: 20531407
adaptation; ethanol tolerance; evolution; fitness profiling
Metabolomics is the methodology that identifies and measures global pools of small molecules (of less than about 1,000 Da) of a biological sample, which are collectively called the metabolome. Metabolomics can therefore reveal the metabolic outcome of a genetic or environmental perturbation of a metabolic regulatory network, and thus provide insights into the structure and regulation of that network. Because of the chemical complexity of the metabolome and limitations associated with individual analytical platforms for determining the metabolome, it is currently difficult to capture the complete metabolome of an organism or tissue, which is in contrast to genomics and transcriptomics. This paper describes the analysis of Arabidopsis metabolomics data sets acquired by a consortium that includes five analytical laboratories, bioinformaticists, and biostatisticians, which aims to develop and validate metabolomics as a hypothesis-generating functional genomics tool. The consortium is determining the metabolomes of Arabidopsis T-DNA mutant stocks, grown in a standardized controlled environment optimized to minimize environmental impacts on the metabolomes. Metabolomics data were generated with seven analytical platforms, and the combined data are being provided to the research community to formulate initial hypotheses about genes of unknown function (GUFs). A public database (www.PlantMetabolomics.org) has been developed to provide the scientific community with access to the data along with tools to allow for its interactive analysis. Exemplary datasets are discussed to validate the approach; they illustrate how initial hypotheses can be generated from the consortium-produced metabolomics data and integrated with prior knowledge to provide a testable hypothesis concerning the functionality of GUFs.
doi:10.3389/fpls.2012.00015
PMCID: PMC3355754  PMID: 22645570
Arabidopsis; metabolomics; gene annotation; functional genomics; database
PLoS Genetics  2014;10(3):e1004212.
Phenotypes proximal to gene action generally reflect larger genetic effect sizes than those that are distant. The human metabolome, a result of multiple cellular and biological processes, comprises functional intermediate phenotypes proximal to gene action. Here, we present a genome-wide association study of 308 untargeted metabolite levels among African Americans from the Atherosclerosis Risk in Communities (ARIC) Study. Nineteen significant common variant-metabolite associations were identified, including 13 novel loci (p<1.6×10⁻¹⁰). These loci were associated with 7–50% of the difference in metabolite levels per allele, and the variance explained ranged from 4% to 20%. Fourteen genes were identified within the nineteen loci, and four of the loci contained non-synonymous substitutions in enzyme-encoding genes (KLKB1, SIAE, CPS1, and NAT8); the other significant loci comprise eight further enzyme-encoding genes (ACE, GATM, ACY3, ACSM2B, THEM4, ADH4, UGT1A, TREH), a transporter gene (SLC6A13) and a polycystin protein gene (PKD2L1). In addition, four potential disease-associated paths were identified, including two direct longitudinal predictive relationships: NAT8 with N-acetylornithine, N-acetyl-1-methylhistidine and incident chronic kidney disease, and TREH with trehalose and incident diabetes. These results highlight the value of using endophenotypes proximal to gene function to discover new insights into biology and disease pathology.
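The per-allele effect and "variance explained" quoted in the abstract above are the kind of quantities that come out of an additive single-variant model. A minimal sketch, assuming a plain least-squares regression of one metabolite level on allele count (0, 1 or 2) with no covariates; the function name and the toy numbers are hypothetical and are not taken from the ARIC analysis, which may have used a different model:

```python
from statistics import mean


def variance_explained(genotypes, levels):
    """Simple linear regression of metabolite level on allele count.

    Returns (beta, r2): the per-allele effect (slope) and the fraction of
    variance in the metabolite level explained by the genotype.
    """
    g_bar, y_bar = mean(genotypes), mean(levels)
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotypes, levels))
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    syy = sum((y - y_bar) ** 2 for y in levels)
    if sxx == 0 or syy == 0:
        raise ValueError("genotypes and levels must both vary")
    beta = sxy / sxx               # change in level per additional allele
    r2 = sxy ** 2 / (sxx * syy)    # squared Pearson correlation
    return beta, r2


# Hypothetical toy data: allele counts and made-up metabolite levels.
genotypes = [0, 0, 1, 1, 2, 2]
levels = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]

beta, r2 = variance_explained(genotypes, levels)
print(f"per-allele effect = {beta:.3f}, variance explained = {r2:.1%}")
```

In a real study the model would normally also adjust for covariates such as age, sex and ancestry components, which this sketch deliberately omits.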
Author Summary
Most contemporary GWAS have achieved increased power by increasing the size of the discovery sample to tens of thousands of individuals. An alternative approach for detecting the effects of novel loci is to measure phenotypes that more immediately reflect the effects of gene function. The metabolome consists of a collection of small molecules resulting from a variety of cellular and biologic processes, which can be considered intermediate phenotypes proximal to gene function. Here, we report a genome-wide association study identifying nineteen genetic loci influencing untargeted metabolome traits among African Americans in the Atherosclerosis Risk in Communities (ARIC) Study. Fourteen genes mapped within nineteen loci, including twelve enzyme-encoding genes (KLKB1, SIAE, CPS1, NAT8, ACE, GATM, ACY3, ACSM2B, THEM4, ADH4, UGT1A and TREH), a transporter gene (SLC6A13) and a polycystin protein gene (PKD2L1). In addition, four potential disease-associated paths were identified, including two direct longitudinal predictive relationships: NAT8 with N-acetylornithine, N-acetyl-1-methylhistidine and incident chronic kidney disease, and TREH with trehalose and incident diabetes. These results highlight the value of using phenotypes proximal to gene function to promote novel gene discovery.
doi:10.1371/journal.pgen.1004212
PMCID: PMC3952826  PMID: 24625756
The field of metabolomics has witnessed exponential growth in the last decade, driven by important applications spanning a wide range of areas in the basic and life sciences and beyond. Mass spectrometry in combination with chromatography and nuclear magnetic resonance are the two major analytical avenues for the analysis of metabolic species in complex biological mixtures. Owing to its inherently higher sensitivity and fast data acquisition, MS plays an increasingly dominant role in the metabolomics field. Propelled by the need to develop simple methods to diagnose and manage the numerous and widespread human diseases, mass spectrometry has witnessed tremendous growth with advances in instrumentation, experimental methods, software, and databases. In response, the metabolomics field has moved far beyond qualitative methods and simple pattern recognition approaches to a range of global and targeted quantitative approaches that are now routinely used and provide reliable data, which instill greater confidence in the derived inferences. Powerful isotope labeling and tracing methods have become very popular. The newly emerging ambient ionization techniques such as desorption ionization and rapid evaporative ionization have allowed direct MS analysis in real time, as well as new MS imaging approaches. While MS-based metabolomics has provided insights into metabolic pathways and fluxes, and metabolite biomarkers associated with numerous diseases, the increasing realization of the extremely high complexity of biological mixtures underscores numerous challenges, including unknown metabolite identification, biomarker validation, and interlaboratory reproducibility, that need to be dealt with to realize the full potential of MS-based metabolomics. This chapter provides a glimpse at the current status of the mass spectrometry-based metabolomics field, highlighting the opportunities and challenges.
doi:10.1007/978-1-4939-1258-2_1
PMCID: PMC4336784  PMID: 25270919
Mass spectrometry; Ionization methods; Quantitative metabolomics; Mass analyzers; Ambient ionization; MS-imaging; Chromatography; Capillary electrophoresis
BMC Systems Biology  2009;3:37.
Background
Metabolomics has emerged as a powerful tool in the quantitative identification of physiological and disease-induced biological states. Extracellular metabolome or metabolic profiling data, in particular, can provide an insightful view of intracellular physiological states in a noninvasive manner.
Results
We used an updated genome-scale metabolic network model of Saccharomyces cerevisiae, iMM904, to investigate how changes in the extracellular metabolome can be used to study systemic changes in intracellular metabolic states. The iMM904 metabolic network was reconstructed based on an existing genome-scale network, iND750, and includes 904 genes and 1,412 reactions. The network model was first validated by comparing 2,888 in silico single-gene deletion strain growth phenotype predictions to published experimental data. Extracellular metabolome data measured in response to environmental and genetic perturbations of ammonium assimilation pathways was then integrated with the iMM904 network in the form of relative overflow secretion constraints and a flux sampling approach was used to characterize candidate flux distributions allowed by these constraints. Predicted intracellular flux changes were consistent with published measurements on intracellular metabolite levels and fluxes. Patterns of predicted intracellular flux changes could also be used to correctly identify the regions of the metabolic network that were perturbed.
Conclusion
Our results indicate that integrating quantitative extracellular metabolomic profiles in a constraint-based framework enables inferring changes in intracellular metabolic flux states. Similar methods could potentially be applied towards analyzing biofluid metabolome variations related to human physiological and disease states.
doi:10.1186/1752-0509-3-37
PMCID: PMC2679711  PMID: 19321003
BMC Systems Biology  2011;5:72.
Background
The quantification of experimentally-induced alterations in biological pathways remains a major challenge in systems biology. One example of this is the quantitative characterization of alterations in defined, established metabolic pathways from complex metabolomic data. At present, the disruption of a given metabolic pathway is inferred from metabolomic data by observing an alteration in the level of one or more individual metabolites present within that pathway. Not only is this approach open to subjectivity, as metabolites participate in multiple pathways, but it also ignores useful information available through the pairwise correlations between metabolites. This extra information may be incorporated using a higher-level approach that looks for alterations between a pair of correlation networks. In this way experimentally-induced alterations in metabolic pathways can be quantitatively defined by characterizing group differences in metabolite clustering. Taking this approach increases the objectivity of interpreting alterations in metabolic pathways from metabolomic data.
Results
We present and justify a new technique for comparing pairs of networks; in our case these networks are based on the same set of nodes and there are two distinct types of weighted edges. The algorithm is based on the Generalized Singular Value Decomposition (GSVD), which may be regarded as an extension of Principal Component Analysis to the case of two data sets. We show how the GSVD can be interpreted as a technique for reordering the two networks in order to reveal clusters that are exclusive to only one. Here we apply this algorithm to a new set of metabolomic data from the prefrontal cortex (PFC) of a translational model relevant to schizophrenia, rats treated subchronically with the N-methyl-D-aspartic acid (NMDA) receptor antagonist phencyclidine (PCP). This provides us with a means to quantify which predefined metabolic pathways (Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolite pathway database) were altered in the PFC of PCP-treated rats. Several significant changes were discovered, notably: 1) neuroactive ligands active at glutamate and GABA receptors are disrupted in the PFC of PCP-treated animals; 2) glutamate dysfunction in these animals was not limited to compromised glutamatergic neurotransmission but also involves the disruption of metabolic pathways linked to glutamate; and 3) a specific series of purine reactions Xanthine ← Hypoxanthine ↔ Inosine ← IMP → adenylosuccinate is also disrupted in the PFC of PCP-treated animals.
Conclusions
Network reordering via the GSVD provides a means to discover statistically validated differences in clustering between a pair of networks. In practice this analytical approach, when applied to metabolomic data, allows us to quantify the alterations in metabolic pathways between two experimental groups. With this new computational technique we identified metabolic pathway alterations that are consistent with known results. Furthermore, we discovered disruption in a novel series of purine reactions that may contribute to the PFC dysfunction and cognitive deficits seen in schizophrenia.
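The starting point described above, two correlation networks built on the same set of metabolites, can be illustrated with a much simpler comparison than the GSVD itself. The following is a minimal sketch, assuming Pearson correlation as the edge weight and an edge-by-edge difference between groups; it is not the authors' GSVD reordering, and the metabolite names and values are hypothetical toy data:

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_network(samples):
    """Map each metabolite pair (edge) to its correlation across samples."""
    return {
        (a, b): pearson(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


def edge_differences(control, treated):
    """Per-edge change in correlation (treated minus control)."""
    net_c = correlation_network(control)
    net_t = correlation_network(treated)
    return {edge: net_t[edge] - net_c[edge] for edge in net_c}


# Hypothetical toy measurements: one list of sample values per metabolite.
control = {
    "glutamate": [1.0, 2.0, 3.0, 4.0],
    "glutamine": [2.0, 4.0, 6.0, 8.0],
    "inosine": [4.0, 1.0, 3.0, 2.0],
}
treated = {
    "glutamate": [1.0, 2.0, 3.0, 4.0],
    "glutamine": [8.0, 6.0, 4.0, 2.0],
    "inosine": [4.0, 1.0, 3.0, 2.0],
}

diffs = edge_differences(control, treated)
for edge, delta in sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(edge, round(delta, 3))
```

An edge-wise difference like this treats every metabolite pair independently; the appeal of the GSVD-based reordering described in the abstract is that it looks for whole clusters that are present in one network and absent in the other.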
doi:10.1186/1752-0509-5-72
PMCID: PMC3239845  PMID: 21575198
Metabolites  2012;2(4):1031-1059.
Specialized compounds from photosynthetic organisms serve as rich resources for drug development. From aspirin to atropine, plant-derived natural products have had a profound impact on human health. Technological advances provide new opportunities to access these natural products in a metabolic context. Here, we describe a database and platform for storing, visualizing and statistically analyzing metabolomics data from fourteen medicinal plant species. The metabolomes and associated transcriptomes (RNAseq) for each plant species, gathered from up to twenty tissue/organ samples that have experienced varied growth conditions and developmental histories, were analyzed in parallel. Three case studies illustrate different ways that the data can be integrally used to generate testable hypotheses concerning the biochemistry, phylogeny and natural product diversity of medicinal plants. Deep metabolomics analysis of Camptotheca acuminata exemplifies how such data can be used to inform metabolic understanding of natural product chemical diversity and begin to formulate hypotheses about their biogenesis. Metabolomics data from Prunella vulgaris, a species that contains a wide range of antioxidant, antiviral, tumoricidal and anti-inflammatory constituents, provide a case study of obtaining biosystematic and developmental fingerprint information from metabolite accumulation data in a little studied species. Digitalis purpurea, well known as a source of cardiac glycosides, is used to illustrate how integrating metabolomics and transcriptomics data can lead to identification of candidate genes encoding biosynthetic enzymes in the cardiac glycoside pathway. The Medicinal Plant Metabolomics Resource (MPM) [1] provides a framework for generating experimentally testable hypotheses about the metabolic networks that lead to the generation of specialized compounds, identifying genes that control their biosynthesis and establishing a basis for modeling metabolism in less studied species.
The database is publicly available and can be used by researchers in medicine and plant biology.
doi:10.3390/metabo2041031
PMCID: PMC3901233  PMID: 24957774
database; metabolomics; specialized metabolites; medicinal; cardiac glycoside; alkaloid; digitalis; terpene; phenolic
Advances in Nutrition  2013;4(5):548-550.
Nutrients exert potent effects on metabolism through a variety of regulatory mechanisms, resulting in local and systemic changes in metabolite levels. Numerous studies have focused on mechanisms by which nutrients and disease states regulate metabolism at the gene or protein levels using genomic and proteomic approaches, respectively. However, few studies have investigated nutritional regulation of the whole metabolome. Thus, metabolomic approaches have recently emerged to complement the genomics and proteomics research and to help identify biologically meaningful metabolites and metabolic networks that control cellular responses to genetic and environmental factors, including diet, and to identify metabolic diseases that are influenced by genetic and dietary factors. These large-scale studies expedite our ability to develop targeted treatments. The goal of this symposium was to provide a forum to introduce the metabolomics field to nutrition researchers. An overview of the state-of-the-art metabolomic technologies used was provided. The impact of some specific nutrients, disease states, or genetic variations and their interaction with the metabolome was discussed by the speakers. Our objectives were as follows: 1) to educate the audience about the use of metabolomics as an innovative tool for linking changes in cell metabolites and genetic variations to nutrient metabolism, energy balance, and the overlying effects on health and disease; 2) to understand the concept of metabolomics and describe the analytical tools and resources available in this area; 3) to introduce the potential application of metabolomics in the field of nutrition research; and 4) to provide specific nutrition-relevant metabolomics study examples in investigating regulation of the metabolic network or metabolic changes resulting from disease states by dietary factors.
doi:10.3945/an.113.004267
PMCID: PMC3771145  PMID: 24038253
