Polyphenols are a major class of bioactive phytochemicals whose consumption may play a role in the prevention of a number of chronic diseases such as cardiovascular diseases, type II diabetes and cancers. Phenol-Explorer, launched in 2009, is the only freely available web-based database on the content of polyphenols in food and their in vivo metabolism and pharmacokinetics. Here we report the third release of the database (Phenol-Explorer 3.0), which adds data on the effects of food processing on polyphenol contents in foods. Data on >100 foods, covering 161 polyphenols or groups of polyphenols before and after processing, were collected from 129 peer-reviewed publications and entered into new tables linked to the existing relational design. The effect of processing on polyphenol content is expressed in the form of retention factor coefficients, or the proportion of a given polyphenol retained after processing, adjusted for change in water content. The result is the first database on the effects of food processing on polyphenol content and, following the model initially defined for Phenol-Explorer, all data may be traced back to original sources. The new update will allow polyphenol scientists to more accurately estimate polyphenol exposure from dietary surveys.
Database URL: http://www.phenol-explorer.eu
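As a rough illustration of the retention factor idea described above, the sketch below assumes a yield-factor formulation: the content measured in the processed food is scaled by the ratio of processed weight to raw weight (which captures water loss or gain) and then divided by the content in the raw food. The abstract does not state the exact formula, so the function name, its arguments and the example values are assumptions made for illustration only.

```python
def retention_factor(raw_content, processed_content, yield_factor):
    """Proportion of a polyphenol retained after processing (hypothetical helper).

    raw_content       -- content in the raw food (e.g. mg per 100 g fresh weight)
    processed_content -- content in the processed food, same units
    yield_factor      -- processed weight / raw weight; assumed here to account
                         for the change in water content during processing
    """
    if raw_content <= 0:
        raise ValueError("raw_content must be positive")
    if yield_factor <= 0:
        raise ValueError("yield_factor must be positive")
    return processed_content * yield_factor / raw_content


if __name__ == "__main__":
    # Invented numbers: 50 mg/100 g raw, 40 mg/100 g after cooking,
    # and the cooked food weighs 80% of the raw food.
    print(retention_factor(50.0, 40.0, 0.8))
```

Under this assumption, a value near 1 means the compound was largely retained, while a value well below 1 indicates loss during processing.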
Characterization of prophages in sequenced bacterial genomes is important for virulence assessment, evolutionary analysis, and phage application development. The objective of this study was to identify complete, inducible prophages in the cystic fibrosis (CF) clinical isolate Burkholderia cenocepacia H111. Using the prophage-finding program PHAge Search Tool (PHAST), we identified three putative intact prophages in the H111 sequence. Virions were readily isolated from H111 culture supernatants following extended incubation. Using shotgun cloning and sequencing, one of these virions (designated ϕH111-1 [vB_BceM_ϕH111-1]) was identified as the infective particle of a PHAST-detected intact prophage. ϕH111-1 has an extremely broad host range with respect to B. cenocepacia strains and is predicted to use lipopolysaccharide (LPS) as a receptor. Bioinformatics analysis indicates that the prophage is 42,972 base pairs in length, encodes 54 proteins, and shows relatedness to the virion morphogenesis modules of AcaML1 and “Vhmllikevirus” myoviruses. As ϕH111-1 is active against a broad panel of clinical strains and encodes no putative virulence factors, it may be therapeutically effective for Burkholderia infections.
prophage identification; PHAST; bioinformatics; phage therapy; Burkholderia cepacia complex
Urine has long been a “favored” biofluid among metabolomics researchers. It is sterile, easy-to-obtain in large volumes, largely free from interfering proteins or lipids and chemically complex. However, this chemical complexity has also made urine a particularly difficult substrate to fully understand. As a biological waste material, urine typically contains metabolic breakdown products from a wide range of foods, drinks, drugs, environmental contaminants, endogenous waste metabolites and bacterial by-products. Many of these compounds are poorly characterized and poorly understood. In an effort to improve our understanding of this biofluid we have undertaken a comprehensive, quantitative, metabolome-wide characterization of human urine. This involved both computer-aided literature mining and comprehensive, quantitative experimental assessment/validation. The experimental portion employed NMR spectroscopy, gas chromatography mass spectrometry (GC-MS), direct flow injection mass spectrometry (DFI/LC-MS/MS), inductively coupled plasma mass spectrometry (ICP-MS) and high performance liquid chromatography (HPLC) experiments performed on multiple human urine samples. This multi-platform metabolomic analysis allowed us to identify 445 and quantify 378 unique urine metabolites or metabolite species. The different analytical platforms were able to identify (quantify) a total of: 209 (209) by NMR, 179 (85) by GC-MS, 127 (127) by DFI/LC-MS/MS, 40 (40) by ICP-MS and 10 (10) by HPLC. Our use of multiple metabolomics platforms and technologies allowed us to identify several previously unknown urine metabolites and to substantially enhance the level of metabolome coverage. It also allowed us to critically assess the relative strengths and weaknesses of different platforms or technologies. The literature review led to the identification and annotation of another 2206 urinary compounds and was used to help guide the subsequent experimental studies. 
An online database containing the complete set of 2651 confirmed human urine metabolite species, their structures (3079 in total), concentrations, related literature references and links to their known disease associations is freely available at http://www.urinemetabolome.ca.
The widespread applications of various ‘omics’ technologies in biomedical research together with the emergence of public data repositories have resulted in a plethora of data sets for almost any given physiological state or disease condition. Properly combining or integrating these data sets with similar basic hypotheses can help reduce study bias, increase statistical power and improve overall biological understanding. However, the difficulties in data management and the complexities of analytical approaches have significantly limited data integration to enable meta-analysis. Here, we introduce integrative meta-analysis of expression data (INMEX), a user-friendly web-based tool designed to support meta-analysis of multiple gene-expression data sets, as well as to enable integration of data sets from gene expression and metabolomics experiments. INMEX contains three functional modules. The data preparation module supports flexible data processing, annotation and visualization of individual data sets. The statistical analysis module allows researchers to combine multiple data sets based on P-values, effect sizes, rank orders and other features. The significant genes can be examined in the functional analysis module for enriched Gene Ontology terms or Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, or for expression profile visualization. INMEX has built-in support for common gene/metabolite identifiers (IDs), as well as 45 popular microarray platforms for human, mouse and rat. Complex operations are performed through a user-friendly web interface in a step-by-step manner. INMEX is freely available at http://www.inmex.ca.
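To make the idea of combining data sets based on P-values more concrete, the sketch below implements Fisher's method, one standard way of merging independent P-values for the same gene across studies. It is offered only as an illustration of the general technique and is not taken from the INMEX code. The function name is hypothetical, and the closed-form chi-square tail used here is valid because the statistic always has an even number of degrees of freedom.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method (illustrative sketch).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null hypothesis, where k = len(p_values).
    For even degrees of freedom the upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    if not p_values:
        raise ValueError("at least one P-value is required")
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("P-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i  # (X/2)^i / i!, built up incrementally
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Hypothetical P-values for one gene in three independent studies.
    print(fisher_combined_p([0.04, 0.10, 0.07]))
```

With a single P-value the function returns that value unchanged, which is a convenient sanity check.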
The goal of many metabolomic studies is to identify the molecular structure of endogenous molecules that are differentially expressed among sampled or treatment groups. The identified compounds can then be used to gain an understanding of disease mechanisms. Unfortunately, despite recent advances in a variety of analytical techniques, small molecule (<1000 Da) identification remains difficult. Rarely can a chemical structure be determined from experimental “features” such as retention time, exact mass, and collision induced dissociation spectra. Thus, without knowing structure, biological significance remains obscure. In this study we explore an identification method in which the measured exact mass of an unknown is used to query available chemical databases to compile a list of candidate compounds. Predictions are made for the candidates using models of experimental features that have been measured for the unknown. The predicted values are used to filter the candidate list by eliminating compounds with predicted values substantially different from the unknown. The intent is to reduce the list of candidates to a reasonable number that can be obtained and measured for confirmation. To facilitate this exploration, we measured data and created models for two experimental features: MS Ecom50 (the energy in eV required to fragment 50% of a selected precursor ion) and HPLC retention index. Using a dataset of 52 compounds, Ecom50 models were developed based on both Molconn and CODESSA structural descriptors. These models gave r² values of 0.89 to 0.94 depending on the number of inputs, the modeling algorithm chosen, and whether neutral or protonated structures were used. The retention index model was developed with 400 compounds using a back propagation artificial neural network and 33 Molconn structure descriptors. External validation gave a v2 = 0.86 and standard error of 38 retention index units.
As a test of the validity of the filtering approach, the Ecom50 and retention index models, along with exact mass and collision induced dissociation spectra matching, were used to identify 1,3-dicyclohexylurea in human plasma. This compound was not previously known to exist in human biofluids and its elemental formula was identical to 315 other candidate compounds downloaded from PubChem. These results suggest that the use of Ecom50 and retention index predictive models can improve non-targeted metabolite structure identification using HPLC/MS derived structural features.
metabolomics; retention index; Ecom50; mass spectrometry; HPLC; database searching; structure identification; molecular topology; Molconn; Codessa; artificial neural network
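The filtering step described in this abstract can be sketched roughly as follows. The sketch assumes that each candidate already carries a predicted Ecom50 and a predicted retention index, and that a candidate is kept only when both predictions fall within a chosen tolerance of the values measured for the unknown. The class, function, tolerances, candidate names and numbers are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    predicted_ecom50: float  # eV, from a hypothetical Ecom50 model
    predicted_ri: float      # retention index units, from a hypothetical RI model


def filter_candidates(candidates, measured_ecom50, measured_ri,
                      ecom50_tol, ri_tol):
    """Keep candidates whose predicted features are close to the measured ones."""
    return [
        c for c in candidates
        if abs(c.predicted_ecom50 - measured_ecom50) <= ecom50_tol
        and abs(c.predicted_ri - measured_ri) <= ri_tol
    ]


if __name__ == "__main__":
    # Invented candidates sharing one elemental formula.
    pool = [
        Candidate("candidate_A", 18.2, 640.0),
        Candidate("candidate_B", 24.9, 655.0),
        Candidate("candidate_C", 18.6, 910.0),
    ]
    # Invented measurements and tolerances for the unknown.
    kept = filter_candidates(pool, measured_ecom50=18.0, measured_ri=650.0,
                             ecom50_tol=1.5, ri_tol=100.0)
    print([c.name for c in kept])
```

In practice the tolerances would presumably be tied to each model's prediction error (for example, a multiple of the reported standard error), but that choice is left open here.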
Prion diseases are fatal neurodegenerative diseases associated with the conversion of cellular prion protein (PrPC) in the central nervous system into the infectious isoform (PrPSc). The mechanics of conversion are almost entirely unknown, with understanding stymied by the lack of an atomic-level structure for PrPSc. A number of pathogenic PrPC mutants exist that are characterized by an increased propensity for conversion into PrPSc and that differ from wild-type by only a single amino-acid point mutation in their primary structure. These mutations are known to perturb the stability and conformational dynamics of the protein. Understanding how this occurs may provide insight into the mechanism of PrPC conversion. In this work we sought to explore wild-type and pathogenic mutant prion protein structure and dynamics by analysis of the current fluctuations through an organic α-hemolysin nanometer-scale pore (nanopore) in which a single prion protein has been captured electrophoretically. In doing this, we find that wild-type and D178N mutant PrPC (a PrPC mutant associated with both fatal familial insomnia and Creutzfeldt-Jakob disease) exhibit easily distinguishable current signatures and kinetics inside the pore. We further demonstrate, using hidden Markov model signal processing, accurate discrimination between these two proteins at the single-molecule level based on the kinetics of a single PrPC capture event. Moreover, we present a four-state model to describe wild-type PrPC kinetics in the pore as a first step toward characterizing the differences in kinetics and conformational dynamics between wild-type and D178N mutant PrPC.
These results demonstrate the potential of nanopore analysis for highly sensitive, real-time protein and small molecule detection based on single molecule kinetics inside a nanopore, and show the utility of this technique as an assay to probe differences in stability between wild-type and mutant prion proteins at the single molecule level.
Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit poor runtime performance on large decoy sets. We hypothesized that a more efficient “partial” clustering approach in combination with an improved scoring scheme could significantly improve both the speed and performance of existing candidate selection methods.
We propose a new scheme that performs rapid but incomplete clustering on protein decoys. Our method detects structurally similar decoys (measured using either Cα RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated our new clustering strategy with several different scoring functions to assess both the performance and speed in identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method can improve the correct fold detection rate as assessed by two different quality criteria. This improvement is significantly better than two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency testing shows that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite.
The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still allows superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction performance.
“Big” molecules such as proteins and genes still continue to capture the imagination of most biologists, biochemists and bioinformaticians. “Small” molecules, on the other hand, are the molecules that most biologists, biochemists and bioinformaticians prefer to ignore. However, it is becoming increasingly apparent that small molecules such as amino acids, lipids and sugars play a far more important role in all aspects of disease etiology and disease treatment than we realized. This particular chapter focuses on an emerging field of bioinformatics called “chemical bioinformatics” – a discipline that has evolved to help address the blended chemical and molecular biological needs of toxicogenomics, pharmacogenomics, metabolomics and systems biology. In the following pages we will cover several topics related to chemical bioinformatics. First, a brief overview of some of the most important or useful chemical bioinformatic resources will be given. Second, a more detailed overview will be given on those particular resources that allow researchers to connect small molecules to diseases. This section will focus on describing a number of recently developed databases or knowledgebases that explicitly relate small molecules – either as the treatment, symptom or cause – to disease. Finally a short discussion will be provided on newly emerging software tools that exploit these databases as a means to discover new biomarkers or even new treatments for disease.
Nucleotide excision repair (NER) removes many types of DNA lesions including those induced by UV radiation and platinum-based therapy. Resistance to platinum-based therapy correlates with high expression of ERCC1, a major element of the NER machinery. The interaction between ERCC1 and XPA is essential for a successful NER function. Therefore, one way to regulate NER is by inhibiting the activity of ERCC1 and XPA.
Here we continued our earlier efforts aimed at the identification and characterization of novel inhibitors of the ERCC1-XPA interaction. We used a refined virtual screening approach combined with a biochemical and biological evaluation of the compounds for their ability to interact with ERCC1 and to sensitize cells to UV radiation. Our findings reveal a new validated ERCC1-XPA inhibitor that significantly sensitized colon cancer cells to UV radiation indicating a strong inhibition of the ERCC1-XPA interaction.
NER is a major factor in acquiring resistance to platinum-based therapy. Regulating the NER pathway has the potential of improving the efficacy of platinum treatments. One approach that we followed is to inhibit the essential interaction between the two NER elements, ERCC1 and XPA. Here, we performed virtual screening against the ERCC1-XPA interaction and identified novel inhibitors that block the XPA-ERCC1 binding. The identified inhibitors significantly sensitized colon cancer cells to UV radiation indicating a strong inhibition of the ERCC1-XPA interaction.
Hypoxic ischaemic encephalopathy (HIE) in newborns can cause significant long-term neurological disability. The insult is a complex injury characterised by energy failure and disruption of cellular homeostasis, leading to mitochondrial damage. The importance of individual metabolic pathways, and their interaction in the disease process is not fully understood. The aim of this study was to describe and quantify the metabolomic profile of umbilical cord blood samples in a carefully defined population of full-term infants with HIE.
Methods and Findings
The injury severity was defined using both the modified Sarnat score and continuous multichannel electroencephalogram. Using these classification systems, our population was divided into those with confirmed HIE (n = 31), asphyxiated infants without encephalopathy (n = 40) and matched controls (n = 71). All had umbilical cord blood drawn and biobanked at −80°C within 3 hours of delivery. A combined direct injection and LC-MS/MS assay (AbsolutIDQ p180 kit, Biocrates Life Sciences AG, Innsbruck, Austria) was used for the metabolomic analyses of the samples. Targeted metabolomic analysis showed a significant alteration between study groups in 29 metabolites from 3 distinct classes (amino acids, acylcarnitines and glycerophospholipids). Nine of these metabolites were significantly altered only between neonates with HIE and matched controls, while 14 were significantly altered in both study groups. Multivariate discriminant analysis models showed clear multifactorial metabolite associations with both asphyxia and HIE. A logistic regression model using 5 metabolites clearly delineates severity of asphyxia and classifies HIE infants with AUC = 0.92. These data describe widespread disruption not only to energy pathways, but also to nitrogen and lipid metabolism in both asphyxia and HIE.
This study shows that a multi-platform targeted approach to metabolomic analyses using accurately phenotyped and meticulously biobanked samples provides insight into the pathogenesis of perinatal asphyxia. It highlights the potential for metabolomic technology to develop a diagnostic test for HIE.
Metabolomics is increasingly being applied towards the identification of biomarkers for disease diagnosis, prognosis and risk prediction. Unfortunately, among the many published metabolomic studies focusing on biomarker discovery, there is very little consistency and relatively little rigor in how researchers select, assess or report their candidate biomarkers. In particular, few studies report any measure of sensitivity or specificity, or provide receiver operator characteristic (ROC) curves with associated confidence intervals. Even fewer studies explicitly describe or release the biomarker model used to generate their ROC curves. This is surprising given that for biomarker studies in most other biomedical fields, ROC curve analysis is generally considered the standard method for performance assessment. Because the ultimate goal of biomarker discovery is the translation of those biomarkers to clinical practice, it is clear that the metabolomics community needs to start “speaking the same language” in terms of biomarker analysis and reporting, especially if it wants to see metabolite markers being routinely used in the clinic. In this tutorial, we will first introduce the concept of ROC curves and describe their use in single biomarker analysis for clinical chemistry. This includes the construction of ROC curves, understanding the meaning of area under ROC curves (AUC) and partial AUC, as well as the calculation of confidence intervals. The second part of the tutorial focuses on biomarker analyses within the context of metabolomics. This section describes different statistical and machine learning strategies that can be used to create multi-metabolite biomarker models and explains how these models can be assessed using ROC curves. In the third part of the tutorial we discuss common issues and potential pitfalls associated with different analysis methods and provide readers with a list of nine recommendations for biomarker analysis and reporting.
To help readers test, visualize and explore the concepts presented in this tutorial, we also introduce a web-based tool called ROCCET (ROC Curve Explorer & Tester, http://www.roccet.ca). ROCCET was originally developed as a teaching aid but it can also serve as a training and testing resource to assist metabolomics researchers build biomarker models and conduct a range of common ROC curve analyses for biomarker studies.
Electronic supplementary material
The online version of this article (doi:10.1007/s11306-012-0482-9) contains supplementary material, which is available to authorized users.
Biomarker analysis; ROC curve; AUC; Confidence intervals; Optimal threshold; Sample size; Bootstrapping; Cross validation; Biomarker validation and reporting
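As a small companion to the ROC concepts summarized in this abstract, the sketch below builds an empirical ROC curve for a single biomarker and computes the AUC with the trapezoidal rule. It uses only the Python standard library and is not the ROCCET implementation. The function names and example data are hypothetical, labels are assumed to be 1 for cases and 0 for controls, and higher scores are assumed to indicate disease.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points of the ROC curve.

    Samples are ranked by decreasing score. Tied scores are processed
    together so that each distinct threshold contributes one point.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Invented metabolite concentrations for three cases and three controls.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    print(auc(roc_points(scores, labels)))
```

The AUC computed this way equals the fraction of case-control pairs in which the case has the higher score (with ties counted as one half), which gives a simple way to check results by hand on small data sets.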
The Human Metabolome Database (HMDB) (www.hmdb.ca) is a resource dedicated to providing scientists with the most current and comprehensive coverage of the human metabolome. Since its first release in 2007, the HMDB has been used to facilitate research for nearly 1000 published studies in metabolomics, clinical biochemistry and systems biology. The most recent release of HMDB (version 3.0) has been significantly expanded and enhanced over the 2009 release (version 2.0). In particular, the number of annotated metabolite entries has grown from 6500 to more than 40 000 (a 600% increase). This enormous expansion is a result of the inclusion of both ‘detected’ metabolites (those with measured concentrations or experimental confirmation of their existence) and ‘expected’ metabolites (those for which biochemical pathways are known or human intake/exposure is frequent but the compound has yet to be detected in the body). The latest release also has greatly increased the number of metabolites with biofluid or tissue concentration data, the number of compounds with reference spectra and the number of data fields per entry. In addition to this expansion in data quantity, new database visualization tools and new data content have been added or enhanced. These include better spectral viewing tools, more powerful chemical substructure searches, an improved chemical taxonomy and better, more interactive pathway maps. This article describes these enhancements to the HMDB, which was previously featured in the 2009 NAR Database Issue. (Note to referees: HMDB 3.0 will go live on 18 September 2012.)
The Escherichia coli Metabolome Database (ECMDB, http://www.ecmdb.ca) is a comprehensively annotated metabolomic database containing detailed information about the metabolome of E. coli (K-12). Modelled closely on the Human and Yeast Metabolome Databases, the ECMDB contains >2600 metabolites with links to ∼1500 different genes and proteins, including enzymes and transporters. The information in the ECMDB has been collected from dozens of textbooks, journal articles and electronic databases. Each metabolite entry in the ECMDB contains an average of 75 separate data fields, including comprehensive compound descriptions, names and synonyms, chemical taxonomy, compound structural and physicochemical data, bacterial growth conditions and substrates, reactions, pathway information, enzyme data, gene/protein sequence data and numerous hyperlinks to images, references and other public databases. The ECMDB also includes an extensive collection of intracellular metabolite concentration data compiled from our own work as well as other published metabolomic studies. This information is further supplemented with thousands of fully assigned reference nuclear magnetic resonance and mass spectrometry spectra obtained from pure E. coli metabolites that we (and others) have collected. Extensive searching, relational querying and data browsing tools are also provided that support text, chemical structure, spectral, molecular weight and gene/protein sequence queries. Because of E. coli’s importance as a model organism for biologists and as a biofactory for industry, we believe this kind of database could have considerable appeal not only to metabolomics researchers but also to molecular biologists, systems biologists and individuals in the biotechnology industry.
Phenol-Explorer, launched in 2009, is the only comprehensive web-based database on the content in foods of polyphenols, a major class of food bioactives that receive considerable attention due to their role in the prevention of diseases. Polyphenols are rarely absorbed and excreted in their ingested forms, but extensively metabolized in the body, and until now, no database has allowed the recall of identities and concentrations of polyphenol metabolites in biofluids after the consumption of polyphenol-rich sources. Knowledge of these metabolites is essential in the planning of experiments whose aim is to elucidate the effects of polyphenols on health. Release 2.0 is the first major update of the database, allowing the rapid retrieval of data on the biotransformations and pharmacokinetics of dietary polyphenols. Data on 375 polyphenol metabolites identified in urine and plasma were collected from 236 peer-reviewed publications on polyphenol metabolism in humans and experimental animals and added to the database by means of an extended relational design. Pharmacokinetic parameters have been collected and can be retrieved in both tabular and graphical form. The web interface has been enhanced and now allows the filtering of information according to various criteria. Phenol-Explorer 2.0, which will be periodically updated, should prove to be an even more useful and capable resource for polyphenol scientists because bioactivities and health effects of polyphenols are dependent on the nature and concentrations of metabolites reaching the target tissues. The Phenol-Explorer database is publicly available and can be found online at http://www.phenol-explorer.eu.
Water plays a unique role in all living organisms. Not only is it nature’s ubiquitous solvent, but it also actively takes part in many cellular processes. In particular, the structure and properties of interfacial water near biomolecules like proteins are often related to the function of the respective molecule. It can therefore be highly instructive to study the local water density around solutes in cellular systems, particularly when solvent-mediated forces like the hydrophobic effect are relevant. Computational methods like molecular dynamics (MD) simulations seem well suited to study these systems at the atomic level. However, due to sampling requirements, it is not clear that MD simulations are indeed the method of choice to obtain converged densities at a given level of precision. We here compare the calculation of local water densities with two different methods, MD simulations and the three-dimensional reference interaction site model with the Kovalenko-Hirata closure (3D-RISM-KH). In particular, we investigate the convergence of the local water density to assess the required simulation times for different levels of resolution. Moreover, we provide a quantitative comparison of the densities calculated with MD and with 3D-RISM-KH, and investigate the effect of the choice of the water model for both methods. Our results show that 3D-RISM-KH yields density distributions that are very similar to those from MD up to a 0.5 Å resolution, but for significantly reduced computational cost. The combined use of MD and 3D-RISM-KH emerges as an auspicious perspective for efficient solvent sampling in dynamical systems.
interfacial water; solvation; confined water; hydrophobic effect; MD; box size
Collective motions on ns-µs time scales are known to have a major impact on protein folding, stability, binding and enzymatic efficiency. It is also believed that these motions may have an important role in the early stages of prion protein misfolding and prion disease. In an effort to accurately characterize these motions and their potential influence on misfolding and prion disease transmissibility, we have conducted a combined analysis of molecular dynamics simulations and NMR-derived flexibility measurements over a diverse range of prion proteins. Using a recently developed numerical formalism, we have analyzed the essential collective dynamics (ECD) for prion proteins from eight different species: human, cow, elk, cat, hamster, chicken, turtle and frog. We also compared the numerical results with flexibility profiles generated by the random coil index (RCI) from NMR chemical shifts. Prion protein backbone flexibility profiles derived from experimental NMR data and from theoretical computations show strong agreement with each other, demonstrating that it is possible to predict the observed RCI profiles employing the numerical ECD formalism. Interestingly, flexibility differences in the loop between the second β strand (S2) and the second α helix (HB) appear to distinguish prion proteins from species that are susceptible to prion disease from those that are resistant. Our results show that the different levels of flexibility in the S2-HB loop in various species are predictable via the ECD method, indicating that ECD may be used to identify disease-resistant variants of prion proteins, as well as the influence of prion protein mutations on disease susceptibility or misfolding propensity.
prion proteins structural stability; molecular dynamics simulation; essential collective dynamics; protein dynamic domains; biomolecular NMR; rigid loop
With recent improvements in DNA sequencing and sample extraction techniques, the quantity and quality of metagenomic data are now growing exponentially. This abundance of richly annotated metagenomic data and bacterial census information has spawned a new branch of microbiology called comparative metagenomics. Comparative metagenomics involves the comparison of bacterial populations between different environmental samples, different culture conditions or different microbial hosts. However, in order to do comparative metagenomics, one typically requires a sophisticated knowledge of multivariate statistics and/or advanced software programming skills. To make comparative metagenomics more accessible to microbiologists, we have developed a freely accessible, easy-to-use web server for comparative metagenomic analysis called METAGENassist. Users can upload their bacterial census data from a wide variety of common formats, using either amplified 16S rRNA data or shotgun metagenomic data. Metadata concerning environmental, culture, or host conditions can also be uploaded. During the data upload process, METAGENassist also performs an automated taxonomic-to-phenotypic mapping. Phenotypic information covering nearly 20 functional categories such as GC content, genome size, oxygen requirements, energy sources and preferred temperature range is automatically generated from the taxonomic input data. Using this phenotypically enriched data, users can then perform a variety of multivariate and univariate data analyses including fold change analysis, t-tests, PCA, PLS-DA, clustering and classification. To facilitate data processing, users are guided through a step-by-step analysis workflow using a variety of menus, information hyperlinks and check boxes. METAGENassist also generates colorful, publication quality tables and graphs that can be downloaded and used directly in the preparation of scientific papers. METAGENassist is available at http://www.metagenassist.ca.
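Two of the univariate analyses listed in this abstract, fold change analysis and the t-test, can be illustrated with a few lines of standard-library Python. The sketch below computes a log2 fold change between two groups of taxon counts and the Welch t statistic (the unequal-variance form of the two-sample t-test). It is a generic illustration and not METAGENassist code. The function names, the pseudocount and the example counts are assumptions.

```python
import math
from statistics import mean, variance


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 of (mean of group_b / mean of group_a).

    A pseudocount (an assumption of this sketch) is added to both means so
    that taxa absent from one group do not cause a division by zero.
    """
    return math.log2((mean(group_b) + pseudocount) /
                     (mean(group_a) + pseudocount))


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples (no P-value computed)."""
    se = math.sqrt(variance(group_a) / len(group_a) +
                   variance(group_b) / len(group_b))
    if se == 0.0:
        raise ValueError("both groups have zero variance")
    return (mean(group_b) - mean(group_a)) / se


if __name__ == "__main__":
    # Invented read counts for one taxon in two sets of samples.
    control = [12, 15, 9, 14]
    treated = [30, 41, 28, 35]
    print(log2_fold_change(control, treated))
    print(welch_t(control, treated))
```

A positive fold change or t statistic here means the taxon is more abundant in the second group. Turning the t statistic into a P-value would additionally require the t distribution with Welch-Satterthwaite degrees of freedom, which is omitted to keep the example dependency-free.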
First released in 2009, MetaboAnalyst (www.metaboanalyst.ca) was a relatively simple web server designed to facilitate metabolomic data processing and statistical analysis. With continuing advances in metabolomics along with constant user feedback, it became clear that a substantial upgrade to the original server was necessary. MetaboAnalyst 2.0, which is the successor to MetaboAnalyst, represents just such an upgrade. MetaboAnalyst 2.0 now contains dozens of new features and functions including new procedures for data filtering, data editing and data normalization. It also supports multi-group data analysis, two-factor analysis as well as time-series data analysis. These new functions have also been supplemented with: (i) a quality-control module that allows users to evaluate their data quality before conducting any analysis, (ii) a functional enrichment analysis module that allows users to identify biologically meaningful patterns using metabolite set enrichment analysis and (iii) a metabolic pathway analysis module that allows users to perform pathway analysis and visualization for 15 different model organisms. In developing MetaboAnalyst 2.0 we have also substantially improved its graphical presentation tools. All images are now generated using anti-aliasing and are available over a range of resolutions, sizes and formats (PNG, TIFF, PDF, PostScript, or SVG). To improve its performance, MetaboAnalyst 2.0 is now hosted on a much more powerful server with substantially modified code to take advantage of the server’s multi-core CPUs for computationally intensive tasks. MetaboAnalyst 2.0 also maintains a collection of 50 or more FAQs and more than a dozen tutorials compiled from user queries and requests. A downloadable version of MetaboAnalyst 2.0, along with detailed instructions for local installation, is now available as well.
Human cerebral spinal fluid (CSF) is known to be a rich source of small molecule biomarkers for neurological and neurodegenerative diseases. In 2007, we conducted a comprehensive metabolomic study and performed a detailed literature review on metabolites that could be detected (via metabolomics or other techniques) in CSF. A total of 308 detectable metabolites were identified, of which only 23% were shown to be routinely identifiable or quantifiable with the metabolomics technologies available at that time. The continuing advancement in analytical technologies along with the growing interest in CSF metabolomics has led us to re-visit the human CSF metabolome and to re-assess both its size and the level of coverage than can be achieved with today's technologies.
We used five analytical platforms, including nuclear magnetic resonance (NMR), gas chromatography-mass spectrometry (GC-MS), liquid chromatography-mass spectrometry (LC-MS), direct flow injection-mass spectrometry (DFI-MS/MS) and inductively coupled plasma-mass spectrometry (ICP-MS) to perform quantitative metabolomics on multiple human CSF samples. This experimental work was complemented with an extensive literature review to acquire additional information on reported CSF compounds, their concentrations and their disease associations.
NMR, GC-MS and LC-MS methods allowed the identification and quantification of 70 CSF metabolites (as previously reported). DFI-MS/MS allowed the quantification of 78 metabolites (6 acylcarnitines, 13 amino acids, hexose, 42 phosphatidylcholines, 2 lyso-phosphatidylcholines and 14 sphingolipids), while ICP-MS provided quantitative results for 33 metal ions in CSF. Literature analysis led to the identification of 57 more metabolites. In total, 476 compounds have now been confirmed to exist in human CSF.
The use of improved metabolomic and other analytical techniques has led to a 54% increase in the known size of the human CSF metabolome over the past 5 years. Commonly available metabolomic methods, when combined, can now routinely identify and quantify 36% of the 'detectable' human CSF metabolome. Our experimental work measured 78 new metabolites that, to our knowledge, had not previously been reported to be present in human CSF. The updated CSF metabolome database, containing the complete set of 476 human CSF compounds, their concentrations, related literature references and links to their known disease associations, is freely available.
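The reported growth of the CSF metabolome can be checked from the two compound counts given in this abstract (308 previously, 476 now). The short Python sketch below does that arithmetic; the variable names are illustrative, and the compound count derived from the 36% coverage figure is only an approximation, because the abstract reports the percentage rather than the count.

```python
previous_count = 308  # detectable CSF metabolites identified in the earlier study
current_count = 476   # compounds now confirmed to exist in human CSF

# Relative growth of the known CSF metabolome, in percent.
percent_increase = 100 * (current_count - previous_count) / previous_count

# Approximate number of compounds corresponding to the reported 36% coverage.
approx_routinely_quantified = round(0.36 * current_count)

print(f"increase: {percent_increase:.1f}%")
print(f"about {approx_routinely_quantified} compounds routinely quantified")
```

The computed increase is about 54.5%, consistent with the 54% stated above.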
Sea buckthorn (Hippophae rhamnoides L.) is a hardy, fruit-producing plant known historically for its medicinal and nutraceutical properties. The most recognized product of sea buckthorn is its fruit oil, composed of seed oil that is rich in essential fatty acids, linoleic (18∶2ω-6) and α-linolenic (18∶3ω-3) acids, and pulp oil that contains high levels of monounsaturated palmitoleic acid (16∶1ω-7). Sea buckthorn is fast gaining popularity as a source of functional food and nutraceuticals, but currently has few genomic resources; therefore, we explored the fatty acid composition of Canadian-grown cultivars (ssp. mongolica) and the sea buckthorn seed transcriptome using the 454 GS FLX sequencing technology.
GC-MS profiling of fatty acids in seeds and pulp of berries indicated that the seed oil contained linoleic and α-linolenic acids at 33–36% and 30–36%, respectively, while the pulp oil contained palmitoleic acid at 32–42%. 454 sequencing of sea buckthorn cDNA collections from mature seeds yielded 500,392 sequence reads, which identified 89,141 putative unigenes represented by 37,482 contigs and 51,659 singletons. Functional annotation by Gene Ontology and computational prediction of metabolic pathways indicated that primary metabolism (protein>nucleic acid>carbohydrate>lipid) and fatty acid and lipid biosynthesis pathways were highly represented categories. Sea buckthorn sequences related to fatty acid biosynthesis genes in Arabidopsis were identified, and a subset of these was examined for transcript expression at four developmental stages of the berry.
This study provides the first comprehensive genomic resources represented by expressed sequences for sea buckthorn, and demonstrates that the seed oil of Canadian-grown sea buckthorn cultivars contains high levels of linoleic acid and α-linolenic acid in a close to 1∶1 ratio, which is beneficial for human health. These data provide the foundation for further studies on sea buckthorn oil, the enzymes involved in its biosynthesis, and the genes involved in the general hardiness of sea buckthorn against environmental conditions.
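The figures reported for the sea buckthorn seed can be cross-checked with simple arithmetic. The Python sketch below confirms that the contig and singleton counts sum to the stated number of unigenes, and computes the range of linoleic to α-linolenic ratios implied by the reported percentage ranges. The variable names are illustrative, and the ratio bounds are derived here from the range endpoints rather than reported by the study.

```python
contigs = 37_482
singletons = 51_659
unigenes = contigs + singletons  # the abstract reports 89,141 putative unigenes

# Reported seed-oil composition ranges (% of total fatty acids).
linoleic = (33.0, 36.0)
alpha_linolenic = (30.0, 36.0)

# Extreme linoleic : alpha-linolenic ratios implied by the range endpoints.
ratio_low = linoleic[0] / alpha_linolenic[1]
ratio_high = linoleic[1] / alpha_linolenic[0]

print(f"unigenes = {unigenes}")
print(f"ratio range = {ratio_low:.2f} to {ratio_high:.2f}")
```

Both bounds lie near 1, in line with the "close to 1∶1" description above.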
Originally released in 2005, BacMap is an electronic, interactive atlas of fully sequenced bacterial genomes. It contains fully labeled, zoomable and searchable chromosome maps for essentially all sequenced prokaryotic (archaebacterial and eubacterial) species. Each map can be zoomed to the level of individual genes and each gene is hyperlinked to a richly annotated gene card. The latest release of BacMap (http://bacmap.wishartlab.com/) now contains data for more than 1700 bacterial species (∼10× more than the 2005 release), corresponding to more than 2800 chromosome and plasmid maps. All bacterial genome maps are now supplemented with separate prophage genome maps as well as separate tRNA and rRNA maps. Each bacterial chromosome entry in BacMap also contains graphs and tables on a variety of gene and protein statistics. Likewise, every bacterial species entry contains a bacterial ‘biography’ card, with taxonomic details, phenotypic details, textual descriptions and images (when available). Improved data browsing and searching tools have also been added to allow more facile filtering, sorting and display of the chromosome maps and their contents.
Synthetic biology is an emerging branch of molecular biology that uses synthetic genetic constructs to create man-made cells or organisms that are capable of performing novel and/or useful applications. Using a synthetic chemically sensitive genetic toggle switch to activate appropriate fluorescent protein indicators (GFP, RFP) and a cell division inhibitor (minC), we have created a novel E. coli strain that can be used as a highly specific, yet simple and inexpensive chemical recording device. This biological “nanorecorder” can be used to determine both the type of a brief chemical exposure event and the time at which it occurred. In particular, we show that the short-term exposure (15–30 min) of cells harboring this synthetic genetic circuit to small molecule signals (anhydrotetracycline or IPTG) triggered long-term and uniform cell elongation, with cell length being directly proportional to the time elapsed following a brief chemical exposure. This work demonstrates that facile modification of an existing genetic toggle switch can be exploited to generate a robust, biologically-based “nanorecorder” that could potentially be adapted to detect, respond to and record a wide range of chemical stimuli that may vary over time and space.
The Yeast Metabolome Database (YMDB, http://www.ymdb.ca) is a richly annotated ‘metabolomic’ database containing detailed information about the metabolome of Saccharomyces cerevisiae. Modeled closely after the Human Metabolome Database, the YMDB contains >2000 metabolites with links to 995 different genes/proteins, including enzymes and transporters. The information in YMDB has been gathered from hundreds of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the YMDB also contains an extensive collection of experimental intracellular and extracellular metabolite concentration data compiled from detailed Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) metabolomic analyses performed in our lab. This is further supplemented with thousands of NMR and MS spectra collected on pure, reference yeast metabolites. Each metabolite entry in the YMDB contains an average of 80 separate data fields including comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, intracellular/extracellular concentrations, growth conditions and substrates, pathway information, enzyme data, gene/protein sequence data, as well as numerous hyperlinks to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided that support text, chemical structure, spectral, molecular weight and gene/protein sequence queries. Because of S. cerevisiae's importance as a model organism for biologists and as a biofactory for industry, we believe this kind of database could have considerable appeal not only to metabolomics researchers, but also to yeast biologists, systems biologists, the industrial fermentation industry, as well as the beer, wine and spirit industry.
PHAge Search Tool (PHAST) is a web server designed to rapidly and accurately identify, annotate and graphically display prophage sequences within bacterial genomes or plasmids. It accepts either raw DNA sequence data or partially annotated GenBank formatted data and rapidly performs a number of database comparisons as well as phage ‘cornerstone’ feature identification steps to locate, annotate and display prophage sequences and prophage features. Relative to other prophage identification tools, PHAST is up to 40 times faster and up to 15% more sensitive. It is also able to process and annotate both raw DNA sequence data and GenBank files, provide richly annotated tables on prophage features and prophage ‘quality’ and distinguish between intact and incomplete prophages. PHAST also generates downloadable, high quality, interactive graphics that display all identified prophage components in both circular and linear genomic views. PHAST is available at http://phast.wishartlab.com.
A new computer program, called SHIFTX2, is described which is capable of rapidly and accurately calculating diamagnetic 1H, 13C and 15N chemical shifts from protein coordinate data. Compared to its predecessor (SHIFTX) and to other existing protein chemical shift prediction programs, SHIFTX2 is substantially more accurate (up to 26% better by correlation coefficient with an RMS error that is up to 3.3× smaller) than the next best performing program. It also provides significantly more coverage (up to 10% more), is significantly faster (up to 8.5×) and is capable of calculating a wider variety of backbone and side chain chemical shifts (up to 6×) than many other shift predictors. In particular, SHIFTX2 is able to attain correlation coefficients between experimentally observed and predicted backbone chemical shifts of 0.9800 (15N), 0.9959 (13Cα), 0.9992 (13Cβ), 0.9676 (13C′), 0.9714 (1HN), 0.9744 (1Hα) and RMS errors of 1.1169, 0.4412, 0.5163, 0.5330, 0.1711, and 0.1231 ppm, respectively. The correlation between SHIFTX2’s predicted and observed side chain chemical shifts is 0.9787 (13C) and 0.9482 (1H) with RMS errors of 0.9754 and 0.1723 ppm, respectively. SHIFTX2 is able to achieve such a high level of accuracy by using a large, high quality database of training proteins (>190), by utilizing advanced machine learning techniques, by incorporating many more features (χ2 and χ3 angles, solvent accessibility, H-bond geometry, pH, temperature), and by combining sequence-based with structure-based chemical shift prediction techniques. With this substantial improvement in accuracy we believe that SHIFTX2 will open the door to many long-anticipated applications of chemical shift prediction to protein structure determination, refinement and validation. SHIFTX2 is available both as a standalone program and as a web server (http://www.shiftx2.ca).
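The two accuracy measures quoted above, the correlation coefficient and the RMS error between observed and predicted chemical shifts, can be computed as in the following minimal Python sketch. It assumes the correlation is the usual Pearson coefficient; the function names and the shift values are hypothetical and are not taken from the SHIFTX2 evaluation.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(ss_o * ss_p)


def rms_error(observed, predicted):
    """Root-mean-square difference between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical 1H-alpha chemical shifts in ppm (illustrative values only).
observed_shifts = [4.20, 4.50, 3.90, 4.80]
predicted_shifts = [4.30, 4.40, 4.00, 4.70]

r = pearson_r(observed_shifts, predicted_shifts)
rmse = rms_error(observed_shifts, predicted_shifts)
print(f"r = {r:.4f}, RMS error = {rmse:.4f} ppm")
```

In practice each nucleus type would be evaluated separately, as in the per-nucleus values listed above, since the shift ranges and therefore the RMS errors differ by nucleus.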
Electronic supplementary material
The online version of this article (doi:10.1007/s10858-011-9478-4) contains supplementary material, which is available to authorized users.
NMR; Protein; Chemical shift; Machine learning