1.  Mining the Unknown: A Systems Approach to Metabolite Identification Combining Genetic and Metabolic Information 
PLoS Genetics  2012;8(10):e1003005.
Recent genome-wide association studies (GWAS) with metabolomics data linked genetic variation in the human genome to differences in individual metabolite levels. A strong relevance of this metabolic individuality for biomedical and pharmaceutical research has been reported. However, a considerable amount of the molecules currently quantified by modern metabolomics techniques are chemically unidentified. The identification of these “unknown metabolites” is still a demanding and intricate task, limiting their usability as functional markers of metabolic processes. As a consequence, previous GWAS largely ignored unknown metabolites as metabolic traits for the analysis. Here we present a systems-level approach that combines genome-wide association analysis and Gaussian graphical modeling with metabolomics to predict the identity of the unknown metabolites. We apply our method to original data of 517 metabolic traits, of which 225 are unknowns, and genotyping information on 655,658 genetic variants, measured in 1,768 human blood samples. We report previously undescribed genotype–metabotype associations for six distinct gene loci (SLC22A2, COMT, CYP3A5, CYP2C18, GBA3, UGT3A1) and one locus not related to any known gene (rs12413935). Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. We demonstrate the benefit of our method for the functional interpretation of previous metabolomics biomarker studies on liver detoxification, hypertension, and insulin resistance. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms.
Author Summary
Genome-wide association studies on metabolomics data have demonstrated that genetic variation in metabolic enzymes and transporters leads to concentration changes in the respective metabolite levels. The conventional goal of these studies is the detection of novel interactions between the genome and the metabolic system, providing valuable insights for both basic research and clinical applications. In this study, we borrow the metabolomics GWAS concept for a novel, entirely different purpose. Metabolite measurements frequently produce signals where a certain substance can be reliably detected in the sample, but it has not yet been elucidated which specific metabolite this signal actually represents. The concept is comparable to a fingerprint: each one is uniquely identifiable, but as long as it is not registered in a database one cannot tell to whom this fingerprint belongs. Obviously, this issue tremendously reduces the usability of metabolomics analyses. The genetic associations of such an “unknown,” however, give us concrete evidence of the metabolic pathway this substance is most probably involved in. Moreover, we complement the approach with a specific measure of correlation between metabolites, providing further evidence of the metabolic processes of the unknown. For a number of cases, this even allows for a concrete identity prediction, which we then experimentally validate in the lab.
doi:10.1371/journal.pgen.1003005
PMCID: PMC3475673  PMID: 23093944
2.  Novel biomarkers for pre-diabetes identified by metabolomics 
A targeted metabolomics approach was used to identify candidate biomarkers of pre-diabetes. The relevance of the identified metabolites is further corroborated with a protein-metabolite interaction network and gene expression data.
Three metabolites (glycine, lysophosphatidylcholine (LPC) (18:2) and acetylcarnitine C2) were found with significantly altered levels in pre-diabetic individuals compared with normal controls. Lower levels of glycine and LPC (18:2) were found to predict risks for pre-diabetes and type 2 diabetes (T2D). Seven T2D-related genes (PPARG, TCF7L2, HNF1A, GCK, IGF1, IRS1 and IDE) are functionally associated with the three identified metabolites. The unique combination of methodologies, including prospective population-based and nested case–control, as well as cross-sectional studies, was essential for the identification of the reported biomarkers.
Type 2 diabetes (T2D) can be prevented in pre-diabetic individuals with impaired glucose tolerance (IGT). Here, we have used a metabolomics approach to identify candidate biomarkers of pre-diabetes. We quantified 140 metabolites for 4297 fasting serum samples in the population-based Cooperative Health Research in the Region of Augsburg (KORA) cohort. Our study revealed significant metabolic variation in pre-diabetic individuals that is distinct from known diabetes risk indicators, such as glycosylated hemoglobin levels, fasting glucose and insulin. We identified three metabolites (glycine, lysophosphatidylcholine (LPC) (18:2) and acetylcarnitine) that had significantly altered levels in IGT individuals as compared to those with normal glucose tolerance, with P-values ranging from 2.4 × 10⁻⁴ to 2.1 × 10⁻¹³. Lower levels of glycine and LPC were found to be predictors not only for IGT but also for T2D, and were independently confirmed in the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam cohort. Using metabolite–protein network analysis, we identified seven T2D-related genes that are associated with these three IGT-specific metabolites by multiple interactions with four enzymes. The expression levels of these enzymes correlate with changes in the metabolite concentrations linked to diabetes. Our results may help in developing novel strategies to prevent T2D.
doi:10.1038/msb.2012.43
PMCID: PMC3472689  PMID: 23010998
early diagnostic biomarkers; IGT; metabolomics; prediction; T2D
3.  MassTRIX Reloaded: Combined Analysis and Visualization of Transcriptome and Metabolome Data 
PLoS ONE  2012;7(7):e39860.
Systems Biology is a field in biological science that focuses on the combination of several or all “omics”-approaches in order to find out how genes, transcripts, proteins and metabolites act together in the network of life. Metabolomics, as an analog to genomics, transcriptomics and proteomics, is increasingly integrated into biological studies, and transcriptomic and metabolomic experiments are often combined in one setup. At first glance the two data types seem completely different, but both produce information on biological entities, either transcripts or metabolites. Both types can be overlaid on metabolic pathways to obtain biological information on the studied system. The MassTRIX webserver was updated for the joint analysis of both data types. MassTRIX is freely available at www.masstrix.org.
doi:10.1371/journal.pone.0039860
PMCID: PMC3391204  PMID: 22815716
4.  Body Fat Free Mass Is Associated with the Serum Metabolite Profile in a Population-Based Study 
PLoS ONE  2012;7(6):e40009.
Objective
To characterise the influence of the fat free mass on the metabolite profile in serum samples from participants of the population-based KORA (Cooperative Health Research in the Region of Augsburg) S4 study.
Subjects and Methods
Analyses were based on metabolite profiles from 965 participants of the S4 and 890 weight-stable subjects of its seven-year follow-up study (KORA F4). 190 different serum metabolites were quantified in a targeted approach including amino acids, acylcarnitines, phosphatidylcholines (PCs), sphingomyelins and hexose. Associations between metabolite concentrations and the fat free mass index (FFMI) were analysed using adjusted linear regression models. To draw conclusions on enzymatic reactions, intra-metabolite class ratios were explored. Pairwise relationships among metabolites were investigated and illustrated by means of Gaussian graphical models (GGMs).
Results
We found 339 significant associations between FFMI and various metabolites in KORA S4. Among the most prominent associations (p-values 4.75×10⁻¹⁶ to 8.95×10⁻⁶) with higher FFMI were increasing concentrations of the branched-chain amino acids (BCAAs), ratios of BCAAs to glucogenic amino acids, and carnitine concentrations. For various PCs, a decrease in chain length or in saturation of the fatty acid moieties could be observed with increasing FFMI, as well as an overall shift from acyl-alkyl PCs to diacyl PCs. These findings were reproduced in KORA F4. The established GGMs supported the regression results and provided a comprehensive picture of the relationships between metabolites. In a sub-analysis, most of the discovered associations did not exist in obese subjects in contrast to non-obese subjects, possibly indicating derangements in skeletal muscle metabolism.
Conclusion
A set of serum metabolites strongly associated with FFMI was identified and a network explaining the relationships among metabolites was established. These results offer a novel and more complete picture of the FFMI effects on serum metabolites in a data-driven network.
doi:10.1371/journal.pone.0040009
PMCID: PMC3384624  PMID: 22761945
5.  On the hypothesis-free testing of metabolite ratios in genome-wide and metabolome-wide association studies 
BMC Bioinformatics  2012;13:120.
Background
Genome-wide association studies (GWAS) with metabolic traits and metabolome-wide association studies (MWAS) with traits of biomedical relevance are powerful tools to identify the contribution of genetic, environmental and lifestyle factors to the etiology of complex diseases. Hypothesis-free testing of ratios between all possible metabolite pairs in GWAS and MWAS has proven to be an innovative approach in the discovery of new biologically meaningful associations. The p-gain statistic was introduced as an ad-hoc measure to determine whether a ratio between two metabolite concentrations carries more information than the two corresponding metabolite concentrations alone. So far, only a rule of thumb was applied to determine the significance of the p-gain.
Results
Here we explore the statistical properties of the p-gain through simulation of its density and by sampling of experimental data. We derive critical values of the p-gain for different levels of correlation between metabolite pairs and show that B/(2*α) is a conservative critical value for the p-gain, where α is the level of significance and B the number of tested metabolite pairs.
Conclusions
We show that the p-gain is a well defined measure that can be used to identify statistically significant metabolite ratios in association studies and provide a conservative significance cut-off for the p-gain for use in future association studies with metabolic traits.
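A minimal sketch of how the conservative cut-off B/(2*α) from this abstract could be applied. The abstract does not define the p-gain itself; the sketch assumes the definition commonly used in metabolomics association studies, namely the smaller of the two single-metabolite p-values divided by the p-value of their ratio. The function names and the example p-values are hypothetical.

```python
def p_gain(p_a: float, p_b: float, p_ratio: float) -> float:
    """Assumed definition: smaller single-metabolite p-value over the ratio's p-value."""
    return min(p_a, p_b) / p_ratio


def critical_p_gain(n_pairs: int, alpha: float = 0.05) -> float:
    """Conservative critical value B / (2 * alpha) quoted in the abstract."""
    return n_pairs / (2 * alpha)


# Hypothetical p-values for metabolites A and B and for the ratio A/B.
gain = p_gain(p_a=1e-6, p_b=1e-4, p_ratio=1e-12)
threshold = critical_p_gain(n_pairs=100, alpha=0.05)

# The ratio counts as informative only if its p-gain exceeds the critical value.
print(gain > threshold)
```

With B = 100 tested pairs and α = 0.05, the cut-off is 1000, so a ratio would need a p-value at least three orders of magnitude below the better of the two single-metabolite p-values.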
doi:10.1186/1471-2105-13-120
PMCID: PMC3537592  PMID: 22672667
p-gain; Metabolomics; MWAS; GWAS; Genome-wide association studies; Metabolome-wide association studies
6.  Mesenchymal Cell Interaction with Ovarian Cancer Cells Triggers Pro-Metastatic Properties 
PLoS ONE  2012;7(5):e38340.
The tumor microenvironment is an important actor in ovarian cancer progression, but the relations between mesenchymal cells and ovarian cancer cells remain unclear. The objective of this study was to determine the ovarian cancer cells' biological modifications induced by mesenchymal cells. To address this issue, we used two different ovarian cancer cell lines (NIH:OVCAR3 and SKOV3) and co-cultured them with mesenchymal cells. Upon co-culture the different cell populations were sorted to study their transcriptome and biological properties. Transcriptomic analysis revealed that three biological-function gene clusters were enriched upon contact with mesenchymal cells. These were related to the increase of metastatic abilities (adhesion, migration and invasion), proliferation and chemoresistance in vitro. Therefore, contact with the mesenchymal cell niche could increase metastatic initiation and expansion through modification of cancer cells. Taken together, these findings suggest that pathways involved in hetero-cellular interaction may be targeted to disrupt the acquired pro-metastatic profile.
doi:10.1371/journal.pone.0038340
PMCID: PMC3364218  PMID: 22666502
7.  Role of Medium- and Short-Chain L-3-Hydroxyacyl-CoA Dehydrogenase in the Regulation of Body Weight and Thermogenesis 
Endocrinology  2011;152(12):4641-4651.
Dysregulation of fatty acid oxidation plays a pivotal role in the pathophysiology of obesity and insulin resistance. Medium- and short-chain-3-hydroxyacyl-coenzyme A (CoA) dehydrogenase (SCHAD) (gene name, hadh) catalyzes the third reaction of the mitochondrial β-oxidation cascade, the oxidation of 3-hydroxyacyl-CoA to 3-ketoacyl-CoA, for medium- and short-chain fatty acids. We identified hadh as a putative obesity gene by comparison of two genome-wide scans, a quantitative trait locus analysis previously performed in the polygenic obese New Zealand obese mouse and an earlier described small interfering RNA-mediated mutagenesis in Caenorhabditis elegans. In the present study, we show that mice lacking SCHAD (hadh−/−) displayed a lower body weight and a reduced fat mass in comparison with hadh+/+ mice under high-fat diet conditions, presumably due to an impaired fuel efficiency, the loss of acylcarnitines via the urine, and increased body temperature. Food intake, total energy expenditure, and locomotor activity were not altered in knockout mice. Hadh−/− mice exhibited normal fat tolerance at 20 °C. However, during cold exposure, knockout mice were unable to clear triglycerides from the plasma and to maintain their normal body temperature, indicating that SCHAD plays an important role in adaptive thermogenesis. Blood glucose concentrations in the fasted and postprandial state were significantly lower in hadh−/− mice, whereas insulin levels were elevated. Accordingly, insulin secretion in response to glucose and glucose plus palmitate was elevated in isolated islets of knockout mice. Therefore, our data indicate that SCHAD is involved in thermogenesis, in the maintenance of body weight, and in the regulation of nutrient-stimulated insulin secretion.
doi:10.1210/en.2011-1547
PMCID: PMC3359510  PMID: 21990309
8.  A Genome-Wide Metabolic QTL Analysis in Europeans Implicates Two Loci Shaped by Recent Positive Selection 
PLoS Genetics  2011;7(9):e1002270.
We have performed a metabolite quantitative trait locus (mQTL) study of the ¹H nuclear magnetic resonance spectroscopy (¹H NMR) metabolome in humans, building on recent targeted knowledge of genetic drivers of metabolic regulation. Urine and plasma samples were collected from two cohorts of individuals of European descent, with one cohort comprised of female twins donating samples longitudinally. Sample metabolite concentrations were quantified by ¹H NMR and tested for association with genome-wide single-nucleotide polymorphisms (SNPs). Four metabolites' concentrations exhibited significant, replicable association with SNP variation (8.6×10⁻¹¹
Author Summary
Physiological concentrations of metabolites—small molecules involved in biochemical processes in living systems—can be measured and used to diagnose and predict disease states. A common goal is to detect and clinically exploit statistical differences in metabolite concentrations between diseased and healthy individuals. As a basis for the design and interpretation of case-control studies, it is useful to have a characterization of metabolic diversity amongst healthy individuals, some of which stems from inter-individual genetic variation. When a single genetic locus has a sufficiently strong effect on metabolism, its genomic position can be determined by collecting metabolite concentration data and genome-wide genotype data on a set of individuals and searching for associations between the two data sets—a so-called metabolite quantitative trait locus (mQTL) study. By so tracing mQTLs, we can identify the genetic drivers of metabolism, characterize how the nature or quantity of the corresponding expressed protein(s) feeds forward to influence metabolite levels, and specify disease-predictive models that incorporate mutual dependence amongst genetics, environment, and metabolism.
doi:10.1371/journal.pgen.1002270
PMCID: PMC3169529  PMID: 21931564
PLoS Genetics  2011;7(8):e1002215.
Metabolomic profiling and the integration of whole-genome genetic association data has proven to be a powerful tool to comprehensively explore gene regulatory networks and to investigate the effects of genetic variation at the molecular level. Serum metabolite concentrations allow a direct readout of biological processes, and association of specific metabolomic signatures with complex diseases such as Alzheimer's disease and cardiovascular and metabolic disorders has been shown. There are well-known correlations between sex and the incidence, prevalence, age of onset, symptoms, and severity of a disease, as well as the reaction to drugs. However, most of the studies published so far did not consider the role of sexual dimorphism and did not analyse their data stratified by gender. This study investigated sex-specific differences of serum metabolite concentrations and their underlying genetic determination. For discovery and replication we used more than 3,300 independent individuals from KORA F3 and F4 with metabolite measurements of 131 metabolites, including amino acids, phosphatidylcholines, sphingomyelins, acylcarnitines, and C6-sugars. A linear regression approach revealed significant concentration differences between males and females for 102 out of 131 metabolites (p-values < 3.8×10⁻⁴; Bonferroni-corrected threshold). Sex-specific genome-wide association studies (GWAS) showed genome-wide significant differences in beta-estimates for SNPs in the CPS1 locus (carbamoyl-phosphate synthase 1, significance level: p < 3.8×10⁻¹⁰; Bonferroni-corrected threshold) for glycine. We showed that the metabolite profiles of males and females are significantly different and, furthermore, that specific genetic variants in metabolism-related genes depict sexual dimorphism. Our study provides new important insights into sex-specific differences of cell regulatory processes and underscores that studies should consider sex-specific effects in design and interpretation.
Author Summary
The combination of genomic and metabolic studies during the last years has provided astonishing results. However, most of the studies published so far did not consider the role of sexual dimorphism and did not analyse their data stratified by sex. The investigation of 131 serum metabolite concentrations of >3,300 population-based samples (KORA F3/F4) revealed significant differences in the metabolite profile of males and females. Furthermore, a genome-wide picture of sex-specific genetic variations in human metabolism (>2,000 subjects from KORA F3/F4 cohorts) was investigated. Sex-specific genome-wide association studies (GWAS) showed differences in the effect of genetic variations on metabolites in men and women. SNPs in the CPS1 (carbamoyl-phosphate synthase 1) locus showed genome-wide significant differences in beta-estimates of sex-specific association analysis (significance level: 3.8×10⁻¹⁰) for glycine. As global metabolomic techniques are more and more refined to identify more compounds in single biological samples, the predictive power of this new technology will greatly increase. This suggests that metabolites, which may be used as predictive biomarkers to indicate the presence or severity of a disease, have to be used selectively depending on sex.
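The two Bonferroni-corrected thresholds quoted in this entry can be reproduced with simple arithmetic. The sketch below assumes a nominal level of 0.05 for the metabolite comparison and the conventional genome-wide level of 5×10⁻⁸ for the GWAS, each divided by the 131 metabolites tested. The abstract does not state these base levels, so they are an assumption that happens to match the reported numbers.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


N_METABOLITES = 131  # number of metabolites reported in the abstract

# Assumed base levels: 0.05 (nominal) and 5e-8 (conventional genome-wide level).
metabolite_threshold = bonferroni_threshold(0.05, N_METABOLITES)
gwas_threshold = bonferroni_threshold(5e-8, N_METABOLITES)

print(f"{metabolite_threshold:.1e}")  # 3.8e-04
print(f"{gwas_threshold:.1e}")        # 3.8e-10
```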
doi:10.1371/journal.pgen.1002215
PMCID: PMC3154959  PMID: 21852955
PLoS ONE  2011;6(7):e21230.
Background
Human plasma and serum are widely used matrices in clinical and biological studies. However, different collecting procedures and the coagulation cascade influence concentrations of both proteins and metabolites in these matrices. The effects on metabolite concentration profiles have not been fully characterized.
Methodology/Principal Findings
We analyzed the concentrations of 163 metabolites in plasma and serum samples collected simultaneously from 377 fasting individuals. To ensure data quality, 41 metabolites with low measurement stability were excluded from further analysis. In addition, plasma and corresponding serum samples from 83 individuals were re-measured in the same plates and mean correlation coefficients (r) of all metabolites between the duplicates were 0.83 and 0.80 in plasma and serum, respectively, indicating significantly better stability of plasma compared to serum (p = 0.01). Metabolite profiles from plasma and serum were clearly distinct with 104 metabolites showing significantly higher concentrations in serum. In particular, 9 metabolites showed relative concentration differences larger than 20%. Despite differences in absolute concentration between the two matrices, for most metabolites the overall correlation was high (mean r = 0.81±0.10), which reflects a proportional change in concentration. Furthermore, when two groups of individuals with different phenotypes were compared with each other using both matrices, more metabolites with significantly different concentrations could be identified in serum than in plasma. For example, when 51 type 2 diabetes (T2D) patients were compared with 326 non-T2D individuals, 15 more significantly different metabolites were found in serum, in addition to the 25 common to both matrices.
Conclusions/Significance
Our study shows that reproducibility was good in both plasma and serum, and better in plasma. Furthermore, as long as the same blood preparation procedure is used, either matrix should generate similar results in clinical and biological studies. The higher metabolite concentrations in serum, however, make it possible to provide more sensitive results in biomarker detection.
doi:10.1371/journal.pone.0021230
PMCID: PMC3132215  PMID: 21760889
BMC Systems Biology  2011;5:21.
Background
With the advent of high-throughput targeted metabolic profiling techniques, the question of how to interpret and analyze the resulting vast amount of data becomes more and more important. In this work we address the reconstruction of metabolic reactions from cross-sectional metabolomics data, that is without the requirement for time-resolved measurements or specific system perturbations. Previous studies in this area mainly focused on Pearson correlation coefficients, which however are generally incapable of distinguishing between direct and indirect metabolic interactions.
Results
In our new approach we propose the application of a Gaussian graphical model (GGM), an undirected probabilistic graphical model estimating the conditional dependence between variables. GGMs are based on partial correlation coefficients, that is pairwise Pearson correlation coefficients conditioned against the correlation with all other metabolites. We first demonstrate the general validity of the method and its advantages over regular correlation networks with computer-simulated reaction systems. Then we estimate a GGM on data from a large human population cohort, covering 1020 fasting blood serum samples with 151 quantified metabolites. The GGM is much sparser than the correlation network, shows a modular structure with respect to metabolite classes, and is stable to the choice of samples in the data set. On the example of human fatty acid metabolism, we demonstrate for the first time that high partial correlation coefficients generally correspond to known metabolic reactions. This feature is evaluated both manually by investigating specific pairs of high-scoring metabolites, and then systematically on a literature-curated model of fatty acid synthesis and degradation. Our method detects many known reactions along with possibly novel pathway interactions, representing candidates for further experimental examination.
Conclusions
In summary, we demonstrate strong signatures of intracellular pathways in blood serum data, and provide a valuable tool for the unbiased reconstruction of metabolic reactions from large-scale metabolomics data sets.
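The distinction between Pearson correlation and the partial correlations underlying a GGM can be illustrated on a toy reaction chain. The sketch below is not the authors' implementation. It simulates a hypothetical chain A → B → C, computes the Pearson correlation matrix, and derives full-order partial correlations from its inverse using the standard identity ρ_ij = −P_ij / sqrt(P_ii · P_jj), where P is the inverse of the correlation matrix. All names and simulated values are illustrative assumptions, and only the standard library is used.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Full-order partial correlations from a correlation matrix."""
    prec = invert(corr)
    n = len(corr)
    return [[1.0 if i == j else -prec[i][j] / (prec[i][i] * prec[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]


# Simulated linear chain A -> B -> C (hypothetical data, not from the study).
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(2000)]
b = [x + 0.5 * rng.gauss(0, 1) for x in a]
c = [x + 0.5 * rng.gauss(0, 1) for x in b]

columns = [a, b, c]
corr = [[pearson(u, v) for v in columns] for u in columns]
pcor = partial_correlations(corr)

print(f"Pearson A-C:         {corr[0][2]:.2f}")
print(f"Partial A-C given B: {pcor[0][2]:.2f}")
```

In this toy setting A and C are strongly Pearson-correlated although they interact only through B. Their partial correlation is close to zero, while the direct links A-B and B-C remain clearly non-zero.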
doi:10.1186/1752-0509-5-21
PMCID: PMC3224437  PMID: 21281499
Nucleic Acids Research  2010;39(Database issue):D220-D224.
The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38 000 000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de).
doi:10.1093/nar/gkq1157
PMCID: PMC3013725  PMID: 21109531
PLoS ONE  2010;5(11):e13953.
Background
Metabolomics is the rapidly evolving field of the comprehensive measurement of ideally all endogenous metabolites in a biological fluid. However, no single analytic technique covers the entire spectrum of the human metabolome. Here we present results from a multiplatform study, in which we investigate what kind of results can presently be obtained in the field of diabetes research when combining metabolomics data collected on a complementary set of analytical platforms in the framework of an epidemiological study.
Methodology/Principal Findings
40 individuals with self-reported diabetes and 60 controls (male, over 54 years) were randomly selected from the participants of the population-based KORA (Cooperative Health Research in the Region of Augsburg) study, representing an extensively phenotyped sample of the general German population. Concentrations of over 420 unique small molecules were determined in overnight-fasting blood using three different techniques, covering nuclear magnetic resonance and tandem mass spectrometry. Known biomarkers of diabetes could be replicated by this multiple metabolomic platform approach, including sugar metabolites (1,5-anhydroglucitol), ketone bodies (3-hydroxybutyrate), and branched chain amino acids. In some cases, diabetes-related medication can be detected (pioglitazone, salicylic acid).
Conclusions/Significance
Our study depicts the promising potential of metabolomics in diabetes research by identification of a series of known and also novel, deregulated metabolites that associate with diabetes. Key observations include perturbations of metabolic pathways linked to kidney dysfunction (3-indoxyl sulfate), lipid metabolism (glycerophospholipids, free fatty acids), and interaction with the gut microflora (bile acids). Our study suggests that metabolic markers hold the potential to detect diabetes-related complications already under sub-clinical conditions in the general population.
doi:10.1371/journal.pone.0013953
PMCID: PMC2978704  PMID: 21085649
Metabolomics is an emerging field that is based on the quantitative measurement of as many small organic molecules occurring in a biological sample as possible. Due to recent technical advances, metabolomics can now be used widely as an analytical high-throughput technology in drug testing and epidemiological metabolome and genome wide association studies. Analogous to chip-based gene expression analyses, the enormous amount of data produced by modern kit-based metabolomics experiments poses new challenges regarding their biological interpretation in the context of various sample phenotypes. We developed metaP-server to facilitate data interpretation. metaP-server provides automated and standardized data analysis for quantitative metabolomics data, covering the following steps from data acquisition to biological interpretation: (i) data quality checks, (ii) estimation of reproducibility and batch effects, (iii) hypothesis tests for multiple categorical phenotypes, (iv) correlation tests for metric phenotypes, (v) optionally including all possible pairs of metabolite concentration ratios, (vi) principal component analysis (PCA), and (vii) mapping of metabolites onto colored KEGG pathway maps. Graphical output is clickable and cross-linked to sample and metabolite identifiers. Interactive coloring of PCA and bar plots by phenotype facilitates on-line data exploration. For users of commercial metabolomics kits, cross-references to the HMDB, LipidMaps, KEGG, PubChem, and CAS databases are provided. metaP-server is freely accessible at http://metabolomics.helmholtz-muenchen.de/metap2/.
doi:10.1155/2011/839862
PMCID: PMC2946609  PMID: 20936179
Virology  2008;380(2):388-393.
Three short (7 to 9 nucleotides) highly conserved nucleotide sequences were identified in the putative promoter regions (150 bp upstream and 50 bp downstream of the ATG translation start site) of three members of the genus Chlorovirus, family Phycodnaviridae. Most of these sequences occurred in similar locations within the defined promoter regions. The sequence and location of the motifs were often conserved among homologous ORFs within the Chlorovirus family. One of these conserved sequences (AATGACA) is predominately associated with genes expressed early in virus replication.
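A minimal sketch of the kind of scan this abstract describes: extract a promoter window around a start codon and report where a motif occurs relative to the ATG. The window convention (150 bp before the A of the ATG, 50 bp starting at it), the function names and the toy sequence are assumptions for illustration, and only the AATGACA motif comes from the abstract.

```python
def promoter_region(genome: str, atg_index: int,
                    upstream: int = 150, downstream: int = 50) -> tuple[int, str]:
    """Return (start offset, sequence) of the window around a start codon."""
    start = max(0, atg_index - upstream)
    return start, genome[start:atg_index + downstream]


def motif_positions(region: str, motif: str) -> list[int]:
    """All (possibly overlapping) start positions of motif within region."""
    positions = []
    i = region.find(motif)
    while i != -1:
        positions.append(i)
        i = region.find(motif, i + 1)
    return positions


# Toy sequence: the motif sits 27 bp upstream of a hypothetical ATG.
genome = "C" * 200 + "AATGACA" + "C" * 20 + "ATG" + "G" * 100
atg_index = 227  # index of the A in the start codon

start, region = promoter_region(genome, atg_index)
# Convert window coordinates into positions relative to the ATG.
hits = [start + pos - atg_index for pos in motif_positions(region, "AATGACA")]
print(hits)  # [-27]
```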
doi:10.1016/j.virol.2008.07.025
PMCID: PMC2582218  PMID: 18768195
Chlorella viruses; Phycodnaviridae; Chlorovirus; Virus PBCV-1; Virus NY-2A; Virus MT325; Promoters
PLoS ONE  2008;3(12):e3863.
Exposure to nicotine during smoking causes a multitude of metabolic changes that are poorly understood. We quantified and analyzed 198 metabolites in 283 serum samples from the human cohort KORA (Cooperative Health Research in the Region of Augsburg). Multivariate analysis of metabolic profiles revealed that the group of smokers could be clearly differentiated from the groups of former smokers and non-smokers. Moreover, 23 lipid metabolites were identified as nicotine-dependent biomarkers. The levels of these biomarkers are all up-regulated in smokers compared to those in former and non-smokers, except for three acyl-alkyl-phosphatidylcholines (e.g. plasmalogens). Consistently significant results were further found for the ratios of plasmalogens to diacyl-phosphatidylcholines, which are reduced in smokers and regulated by the enzyme alkylglycerone phosphate synthase (alkyl-DHAP) in both ether lipid and glycerophospholipid pathways. Notably, our metabolite profiles are consistent with the strong down-regulation of the gene for alkyl-DHAP (AGPS) in smokers that has been found in a study analyzing gene expression in human lung tissues. Our data suggest that smoking is associated with plasmalogen-deficiency disorders, caused by reduced or lack of activity of the peroxisomal enzyme alkyl-DHAP. Our findings provide new insight into the pathophysiology of smoking addiction. Activation of the enzyme alkyl-DHAP by small molecules may provide novel routes for therapy.
doi:10.1371/journal.pone.0003863
PMCID: PMC2588343  PMID: 19057651
PLoS Genetics  2008;4(11):e1000282.
The rapidly evolving field of metabolomics aims at a comprehensive measurement of ideally all endogenous metabolites in a cell or body fluid. It thereby provides a functional readout of the physiological state of the human body. Genetic variants that associate with changes in the homeostasis of key lipids, carbohydrates, or amino acids are not only expected to display much larger effect sizes due to their direct involvement in metabolite conversion modification, but should also provide access to the biochemical context of such variations, in particular when enzyme coding genes are concerned. To test this hypothesis, we conducted what is, to the best of our knowledge, the first GWA study with metabolomics based on the quantitative measurement of 363 metabolites in serum of 284 male participants of the KORA study. We found associations of frequent single nucleotide polymorphisms (SNPs) with considerable differences in the metabolic homeostasis of the human body, explaining up to 12% of the observed variance. Using ratios of certain metabolite concentrations as a proxy for enzymatic activity, up to 28% of the variance can be explained (p-values 10^-16 to 10^-21). We identified four genetic variants in genes coding for enzymes (FADS1, LIPC, SCAD, MCAD) where the corresponding metabolic phenotype (metabotype) clearly matches the biochemical pathways in which these enzymes are active. Our results suggest that common genetic polymorphisms induce major differentiations in the metabolic make-up of the human population. This may lead to a novel approach to personalized health care based on a combination of genotyping and metabolic characterization. These genetically determined metabotypes may subscribe the risk for a certain medical phenotype, the response to a given drug treatment, or the reaction to a nutritional intervention or environmental challenge.
Author Summary
This paper reports what is, to the best of our knowledge, the first genome-wide association (GWA) study with metabolic traits as phenotypic traits. By simultaneous measurements of single nucleotide polymorphisms (SNPs) and serum concentrations of endogenous organic compounds in a human population, we identify genetically determined variants in metabolic phenotype (metabotype) that exhibit large effect sizes. Four of these polymorphisms are located in genes coding for well-characterized enzymes of the lipid metabolism. We find that individuals with different genotypes in these genes have significantly different metabolic capacities with respect to the synthesis of some polyunsaturated fatty acids, the beta-oxidation of short- and medium-chain fatty acids, and the breakdown of triglycerides. In this approach, the concept of the “genetically determined metabotype” as an intermediate phenotype is central, as it becomes a measurable quantity in the framework of GWA studies with metabolomics. The investigation of the genetically determined metabotypes in their biochemical context might help to better understand the pathogenesis of common diseases and gene–environment interactions. These findings could result in a step towards personalized health care and nutrition based on a combination of genotyping and metabolic characterization.
doi:10.1371/journal.pgen.1000282
PMCID: PMC2581785  PMID: 19043545
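The idea of using a metabolite concentration ratio as a trait and asking how much of its variance a SNP explains can be sketched as follows. This is an illustration only, not the study's analysis code: the function names are hypothetical, and the additive genotype coding (0, 1 or 2 copies of an allele) with a simple linear fit is an assumption.

```python
def concentration_ratio(conc_a, conc_b):
    """Per-sample ratio of two metabolite concentrations (proxy trait)."""
    return [a / b for a, b in zip(conc_a, conc_b)]


def variance_explained(genotypes, trait):
    """R^2 of a simple linear regression of trait on genotype.

    genotypes: allele counts per sample (assumed additive coding 0/1/2).
    trait: one numeric value per sample, e.g. a concentration ratio.
    Both inputs must vary across samples, otherwise the ratio is undefined.
    """
    n = len(trait)
    mean_g = sum(genotypes) / n
    mean_t = sum(trait) / n
    sxy = sum((g - mean_g) * (t - mean_t) for g, t in zip(genotypes, trait))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((t - mean_t) ** 2 for t in trait)
    return (sxy * sxy) / (sxx * syy)
```

Running variance_explained once on a single concentration and once on a ratio of substrate and product concentrations mirrors the comparison reported in the abstract (up to 12% versus up to 28%), although the actual figures depend entirely on the data.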
Nucleic Acids Research  2008;36(Web Server issue):W481-W484.
Recent technical advances in mass spectrometry (MS) have brought the field of metabolomics to a point where large numbers of metabolites from numerous prokaryotic and eukaryotic organisms can now be easily and precisely detected. The challenge today lies in the correct annotation of these metabolites on the basis of their accurate measured masses. Assignment of bulk chemical formula is generally possible, but without consideration of the biological and genomic context, concrete metabolite annotations remain difficult and uncertain. MassTRIX responds to this challenge by providing a hypothesis-driven approach to high precision MS data annotation. It presents the identified chemical compounds in their genomic context as differentially colored objects on KEGG pathway maps. Information on gene transcription or differences in the gene complement (e.g. samples from different bacterial strains) can be easily added. The user can thus interpret the metabolic state of the organism in the context of its potential and, in the case of submitted transcriptomics data, real enzymatic capacities. The MassTRIX web server is freely accessible at http://masstrix.org
doi:10.1093/nar/gkn194
PMCID: PMC2447776  PMID: 18442993
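Annotation by accurate mass usually starts from a tolerance match between measured and theoretical masses. The sketch below shows that step only; it is not MassTRIX's implementation. The function names, the 3 ppm default tolerance and the candidate dictionary are hypothetical, and the code compares neutral masses directly, ignoring adducts and ionization mode.

```python
def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def annotate(measured_masses, candidates, tol_ppm=3.0):
    """Map each measured mass to the candidate names within tol_ppm.

    candidates: dict of compound name -> theoretical monoisotopic mass.
    A mass with no candidate in range maps to an empty list.
    """
    hits = {}
    for m in measured_masses:
        hits[m] = [
            name
            for name, mass in candidates.items()
            if abs(ppm_error(m, mass)) <= tol_ppm
        ]
    return hits
```

Several compounds often share one formula and therefore one mass, which is why the abstract stresses biological and genomic context for choosing among the hits.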
Mimivirus, the largest known double-stranded DNA virus, unexpectedly exhibits components of the protein-translation apparatus. The crystallization and functional assay of the viral tyrosyl tRNA synthetase are reported.
The amoeba-infecting Mimivirus is the largest known double-stranded DNA virus, with a 400 nm particle size, comparable to that of mycoplasma. The complete sequence of its 1.2 Mbp genome has recently been determined [Raoult et al. (2004), Science, 306, 1344–1350] and revealed numerous genes that were not expected to be found in a virus, such as genes encoding translation components, including four aminoacyl-tRNA synthetases and homologues to various translation initiation, elongation and termination factors. A comprehensive structural and functional study of these Mimivirus gene products was initiated, as they may hold important clues about the origin of DNA viruses. Here, the first preliminary crystallographic and functional results obtained on one of these targets, Mimivirus TyrRS, are reported. Preliminary phasing was obtained using an original combination of homology modelling and normal mode analysis. Experimental evidence that Mimivirus tyrosyl tRNA synthetase recombinant gene product does indeed activate tyrosine is also presented.
doi:10.1107/S174430910500062X
PMCID: PMC1952258  PMID: 16510997
nucleocytoplasmic large DNA virus; tyrosyl tRNA synthetase; translation apparatus; structural genomics
PLoS Genetics  2007;3(1):e14.
The Rickettsia genus is a group of obligate intracellular α-proteobacteria representing a paradigm of reductive evolution. Here, we investigate the evolutionary processes that shaped the genomes of the genus. The reconstruction of ancestral genomes indicates that their last common ancestor contained more genes, but already possessed most traits associated with cellular parasitism. The differences in gene repertoires across modern Rickettsia are mainly the result of differential gene losses from the ancestor. We demonstrate using computer simulation that the propensity of loss was variable across genes during this process. We also analyzed the ratio of nonsynonymous to synonymous changes (Ka/Ks) calculated as an average over large sets of genes to assay the strength of selection acting on the genomes of Rickettsia, Anaplasmataceae, and free-living γ-proteobacteria. As a general trend, Ka/Ks were found to decrease with increasing divergence between genomes. The high Ka/Ks for closely related genomes are probably due to a lag in the removal of slightly deleterious nonsynonymous mutations by natural selection. Interestingly, we also observed a decrease of the rate of gene loss with increasing divergence, suggesting a similar lag in the removal of slightly deleterious pseudogene alleles. For larger divergence (Ks > 0.2), Ka/Ks converge toward similar values indicating that the levels of selection are roughly equivalent between intracellular α-proteobacteria and their free-living relatives. This contrasts with the view that obligate endocellular microorganisms tend to evolve faster as a consequence of reduced effectiveness of selection, and suggests a major role of enhanced background mutation rates on the fast protein divergence in the obligate intracellular α-proteobacteria.
Author Summary
Genome downsizing and fast sequence divergence are frequently observed in bacteria living exclusively within the cells of higher eukaryotes. However, the driving forces and contributions of these processes to the genome diversity of the microorganisms remain poorly understood. The genus Rickettsia, a group of small obligate intracellular pathogens of humans, provides a fascinating model to study the genome downsizing process. In this article, we used seven Rickettsia genomes to reconstruct the genome of their ancestor and inferred the origin and fate of the genes found in today's species. We identify the process of gene loss as the main cause of genome diversification within the genus and show that the rate of gene loss, sequence divergence, and genome rearrangements are highly variable across the various Rickettsia lineages. This heterogeneity likely reflects the intricate effects of specialization to distinct arthropod hosts and critical alterations of the gene repertoire, such as the losses of DNA repair genes and the amplification of mobile genes. In contrast, we did not find evidence for the role of reduced population sizes on the long-term acceleration of sequence evolution. Overall, the data presented in this article shed new light on the fundamental evolutionary processes that drive the evolution of obligate intracellular bacteria.
doi:10.1371/journal.pgen.0030014
PMCID: PMC1779305  PMID: 17238289
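The Ka/Ks measure averaged over a set of genes can be written down in a few lines. This is a sketch under stated assumptions, not the authors' procedure: the abstract does not say whether the average is a ratio of means or a mean of per-gene ratios, and the ratio of means is used here. The function name and input layout are hypothetical.

```python
def mean_ka_ks(genes):
    """Ratio of mean Ka to mean Ks over a set of genes.

    genes: list of (Ka, Ks) pairs for one pair of genomes, where Ka is the
    nonsynonymous and Ks the synonymous substitution rate per site.
    Using the ratio of means (an assumption) avoids unstable per-gene ratios
    when an individual Ks is close to zero.
    """
    mean_ka = sum(ka for ka, _ in genes) / len(genes)
    mean_ks = sum(ks for _, ks in genes) / len(genes)
    return mean_ka / mean_ks
```

Computing this value for genome pairs at increasing divergence, and comparing pairs above the Ks > 0.2 threshold mentioned in the abstract, reproduces the kind of trend analysis described there.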
BMC Bioinformatics  2006;7:99.
Background
As numerous diseases involve errors in signal transduction, modern therapeutics often target proteins involved in cellular signaling. Interpretation of the activity of signaling pathways during disease development or therapeutic intervention would assist in drug development, design of therapy, and target identification. Microarrays provide a global measure of cellular response; however, linking these responses to signaling pathways requires an analytic approach tuned to the underlying biology. An ongoing issue in pattern recognition in microarrays has been how to determine the number of patterns (or clusters) to use for data interpretation. This is a critical issue, as measures of statistical significance in gene ontology or pathways rely on proper separation of genes into groups.
Results
Here we introduce a method relying on gene annotation coupled to decompositional analysis of global gene expression data that allows us to estimate specific activity on strongly coupled signaling pathways and, in some cases, activity of specific signaling proteins. We demonstrate the technique using the Rosetta yeast deletion mutant data set, decompositional analysis by Bayesian Decomposition, and annotation analysis using ClutrFree. We determined from measurements of gene persistence in patterns across multiple potential dimensionalities that 15 basis vectors provide the correct dimensionality for interpreting the data. Using gene ontology and data on gene regulation in the Saccharomyces Genome Database, we identified the transcriptional signatures of several cellular processes in yeast, including cell wall creation, ribosomal disruption, chemical blocking of protein synthesis, and, critically, individual signatures of the strongly coupled mating and filamentation pathways.
Conclusion
This work demonstrates that microarray data can provide downstream indicators of pathway activity either through use of gene ontology or transcription factor databases. This can be used to investigate the specificity and success of targeted therapeutics as well as to elucidate signaling activity in normal and disease processes.
doi:10.1186/1471-2105-7-99
PMCID: PMC1413561  PMID: 16507110
Journal of Virology  2005;79(24):15591.
doi:10.1128/JVI.79.24.15591.2005
PMCID: PMC1316051
Journal of Virology  2005;79(22):14095-14101.
Gene duplication is key to molecular evolution in all three domains of life and may be the first step in the emergence of new gene function. It is a well-recognized feature in large DNA viruses but has not been studied extensively in the largest known virus to date, the recently discovered Acanthamoeba polyphaga Mimivirus. Here, I present a systematic analysis of gene and genome duplication events in the mimivirus genome. I found that one-third of the mimivirus genes are related to at least one other gene in the mimivirus genome, either through a large segmental genome duplication event that occurred in the more remote past or through more recent gene duplication events, which often occur in tandem. This shows that gene and genome duplication played a major role in shaping the mimivirus genome. Using multiple alignments, together with remote-homology detection methods based on Hidden Markov Model comparison, I assign putative functions to some of the paralogous gene families. I suggest that a large part of the duplicated mimivirus gene families are likely to interfere with important host cell processes, such as transcription control, protein degradation, and cell regulatory processes. My findings support the view that large DNA viruses are complex evolving organisms, possibly deeply rooted within the tree of life, and oppose the paradigm that viral evolution is dominated by lateral gene acquisition, at least in regard to large DNA viruses.
doi:10.1128/JVI.79.22.14095-14101.2005
PMCID: PMC1280231  PMID: 16254344
BMC Bioinformatics  2005;6:247.
Background
The large number of completely sequenced genomes allows genomic context analysis to predict reliable functional associations between prokaryotic proteins. Major methods rely on the fact that genes encoding physically interacting partners or members of shared metabolic pathways tend to be proximate on the genome, to evolve in a correlated manner and to be fused as a single sequence in another organism.
Results
The new "Gene Function Predictor", linked to the web server Phydbac, proposes putative associations between Escherichia coli K-12 proteins derived from a combination of these methods. We show that associations made by this tool are more accurate than linkages found in the other established databases. Predicted assignments to GO categories, based on pre-existing functional annotations of associated proteins, are also available. This new database currently holds 9,379 pairwise links at an expected success rate of at least 80%; the 6,466 functional predictions to GO terms derived from these links have a level of accuracy higher than 70%.
Conclusion
The "Gene Function Predictor" is an automatic tool that aims to help biologists by providing them with hypothetical functional predictions derived from genomic context characteristics. The "Gene Function Predictor" is available at .
doi:10.1186/1471-2105-6-247
PMCID: PMC1280922  PMID: 16221304
Nucleic Acids Research  2004;32(Web Server issue):W606-W609.
Molecular replacement (MR) is the method of choice for X-ray crystallography structure determination when structural homologues are available in the Protein Data Bank (PDB). Although the success rate of MR decreases sharply when the sequence similarity between template and target proteins drops below 35% identical residues, it has been found that screening for MR solutions with a large number of different homology models may still produce a suitable solution where the original template failed. Here we present the web tool CaspR, implementing such a strategy in an automated manner. On input of experimental diffraction data, of the corresponding target sequence and of one or several potential templates, CaspR executes an optimized molecular replacement procedure using a combination of well-established stand-alone software tools. The protocol of model building and screening begins with the generation of multiple structure–sequence alignments produced with T-COFFEE, followed by homology model building using MODELLER, molecular replacement with AMoRe and model refinement based on CNS. As a result, CaspR provides a progress report in the form of hierarchically organized summary sheets that describe the different stages of the computation with an increasing level of detail. For the 10 highest-scoring potential solutions, pre-refined structures are made available for download in PDB format. Results already obtained with CaspR and reported on the web server suggest that such a strategy significantly increases the fraction of protein structures which may be solved by MR. Moreover, even in situations where standard MR yields a solution, pre-refined homology models produced by CaspR significantly reduce the time-consuming refinement process. We expect this automated procedure to have a significant impact on the throughput of large-scale structural genomics projects. CaspR is freely available at http://igs-server.cnrs-mrs.fr/Caspr/.
doi:10.1093/nar/gkh400
PMCID: PMC441538  PMID: 15215460
