MetaboLights is the first general-purpose open-access curated repository for metabolomic studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Increases in the number of depositions, number of samples per study and the file size of data submitted to MetaboLights present a challenge for the objective of ensuring high-quality and standardized data in the context of diverse metabolomic workflows and data representations. Here, we describe the MetaboLights curation pipeline, its challenges and its practical application in quality control of complex data depositions.
MetaboLights (http://www.ebi.ac.uk/metabolights) is the first general-purpose, open-access repository for metabolomics studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Metabolomic profiling is an important tool for research into biological functioning and into the systemic perturbations caused by diseases, diet and the environment. The effectiveness of such methods depends on the availability of public open data across a broad range of experimental methods and conditions. The MetaboLights repository, powered by the open source ISA framework, is cross-species and cross-technique. It will cover metabolite structures and their reference spectra as well as their biological roles, locations, concentrations and raw data from metabolic experiments. Studies automatically receive a stable unique accession number that can be used as a publication reference (e.g. MTBLS1). At present, the repository includes 15 submitted studies, encompassing 93 protocols for 714 assays, and spans 8 different species including human, Caenorhabditis elegans, Mus musculus and Arabidopsis thaliana. Eight hundred twenty-seven of the metabolites identified in these studies have been mapped to ChEBI. These studies cover a variety of techniques, including NMR spectroscopy and mass spectrometry.
With ever-increasing amounts of metabolomics data produced each year, there is an even greater need to disseminate the data and knowledge produced in a standard and reproducible way. To assist with this, a general-purpose, open source metabolomics repository, MetaboLights, was launched in 2012. To promote a community standard, building on the earlier Metabolomics Standards Initiative (MSI), COordination of Standards in MetabOlomicS (COSMOS) was introduced. COSMOS aims to link life science e-infrastructures within the worldwide metabolomics community as well as develop and maintain open source exchange formats for raw and processed data, ensuring better flow of metabolomics information.
Metabolomics; MetaboLights; Database; Repository; Data sharing; Standard; MSI; ISA-Tab; Curation; COSMOS
Metabolomics, the global study of small molecules in a particular system, has in the last few years risen to become a primary –omics platform for the study of metabolic processes. With the ever-increasing pool of quantitative data yielded from metabolomic research, specialized methods and tools with which to analyze and extract meaningful conclusions from these data are becoming more and more crucial. Furthermore, the depth of knowledge and expertise required to undertake a metabolomics oriented study is a daunting obstacle to investigators new to the field. As such, we have created a new statistical analysis workflow, MetaboLyzer, which aims to both simplify analysis for investigators new to metabolomics, as well as provide experienced investigators the flexibility to conduct sophisticated analysis. MetaboLyzer’s workflow is specifically tailored to the unique characteristics and idiosyncrasies of postprocessed liquid chromatography/mass spectrometry (LC/MS) based metabolomic datasets. It utilizes a wide gamut of statistical tests, procedures, and methodologies that belong to classical biostatistics, as well as several novel statistical techniques that we have developed specifically for metabolomics data. Furthermore, MetaboLyzer conducts rapid putative ion identification and putative biologically relevant analysis via incorporation of four major small molecule databases: KEGG, HMDB, Lipid Maps, and BioCyc. MetaboLyzer incorporates these aspects into a comprehensive workflow that outputs easy to understand statistically significant and potentially biologically relevant information in the form of heatmaps, volcano plots, 3D visualization plots, correlation maps, and metabolic pathway hit histograms. For demonstration purposes, a urine metabolomics data set from a previously reported radiobiology study in which samples were collected from mice exposed to gamma radiation was analyzed. 
MetaboLyzer was able to identify 243 statistically significant ions out of a total of 1942. Numerous putative metabolites and pathways were found to be biologically significant from the putative ion identification workflow.
Searching metabolites against databases according to their masses is often the first step in metabolite identification for a mass spectrometry-based untargeted metabolomics study. Major metabolite databases include Human Metabolome DataBase (HMDB), Madison Metabolomics Consortium Database (MMCD), Metlin, and LIPID MAPS. Since each one of these databases covers only a fraction of the metabolome, integration of the search results from these databases is expected to yield a more comprehensive coverage. However, the manual combination of multiple search results is generally difficult when identification of hundreds of metabolites is desired. We have implemented a web-based software tool that enables simultaneous mass-based search against the four major databases, and the integration of the results. In addition, more complete chemical identifier information for the metabolites is retrieved by cross-referencing multiple databases. The search results are merged based on IUPAC International Chemical Identifier (InChI) keys. Besides a simple list of m/z values, the software can accept the ion annotation information as input for enhanced metabolite identification. The performance of the software is demonstrated on mass spectrometry data acquired in both positive and negative ionization modes. Compared with search results from individual databases, MetaboSearch provides better coverage of the metabolome and more complete chemical identifier information. Availability: The software tool is available at http://omics.georgetown.edu/MetaboSearch.html.
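The mass-based search and InChIKey-based merging described above can be sketched roughly as follows. This is a minimal illustration, not MetaboSearch's actual implementation: the two in-memory "databases", their names (`db_a`, `db_b`), the placeholder InChIKey strings, the 10 ppm default tolerance, and the restriction to singly charged [M+H]+ and [M-H]- ions are all assumptions made for the example.

```python
from collections import defaultdict

PROTON_MASS = 1.007276  # Da; used to convert m/z to neutral mass

# Hypothetical miniature databases: (name, monoisotopic mass in Da, InChIKey).
# The InChIKey strings are placeholders, not real identifiers.
DATABASES = {
    "db_a": [
        ("D-Glucose", 180.06339, "PLACEHOLDER-KEY-GLUCOSE"),
        ("L-Alanine", 89.04768, "PLACEHOLDER-KEY-ALANINE"),
    ],
    "db_b": [
        ("Glucose", 180.06339, "PLACEHOLDER-KEY-GLUCOSE"),
        ("Palmitic acid", 256.24023, "PLACEHOLDER-KEY-PALMITATE"),
    ],
}


def neutral_mass(mz, mode):
    """Convert an m/z value to a neutral mass, assuming [M+H]+ or [M-H]-."""
    if mode == "positive":
        return mz - PROTON_MASS
    if mode == "negative":
        return mz + PROTON_MASS
    raise ValueError("mode must be 'positive' or 'negative'")


def search(mz, mode, databases=DATABASES, ppm=10.0):
    """Search every database within a ppm window and merge hits by InChIKey."""
    target = neutral_mass(mz, mode)
    merged = defaultdict(lambda: {"names": set(), "sources": set()})
    for source, records in databases.items():
        for name, mass, inchikey in records:
            error_ppm = abs(mass - target) / target * 1e6
            if error_ppm <= ppm:
                merged[inchikey]["names"].add(name)
                merged[inchikey]["sources"].add(source)
    return dict(merged)


if __name__ == "__main__":
    # m/z of protonated glucose; both databases report the same compound,
    # so the two hits collapse into a single merged entry.
    for key, hit in search(181.0707, "positive").items():
        print(key, sorted(hit["names"]), sorted(hit["sources"]))
```

Keying the merged result on the InChIKey rather than on the compound name is what lets synonyms from different sources ("D-Glucose", "Glucose") collapse into one record.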
The application of reporting standards in metabolomics allows data from different laboratories to be shared, integrated and interpreted. Although minimum reporting standards related to metabolite identification were published in 2007, it is clear that significant efforts are required to ensure their continuous update and appropriate use by the metabolomics community. These include their use in metabolomics data submission (e.g., MetaboLights) and as a requirement for publication in peer-reviewed journals (e.g., Metabolomics). The Data Standards and Metabolite Identification Task Groups of the international Metabolomics Society are actively working to develop and promote these standards and educate the community on their use.
Metabolomics is a newly emerging field of ‘omics’ research that is concerned with characterizing large numbers of metabolites using NMR, chromatography and mass spectrometry. It is frequently used in biomarker identification and the metabolic profiling of cells, tissues or organisms. The data processing challenges in metabolomics are quite unique and often require specialized (or expensive) data analysis software and a detailed knowledge of cheminformatics, bioinformatics and statistics. In an effort to simplify metabolomic data analysis while at the same time improving user accessibility, we have developed a freely accessible, easy-to-use web server for metabolomic data analysis called MetaboAnalyst. Fundamentally, MetaboAnalyst is a web-based metabolomic data processing tool not unlike many of today's web-based microarray analysis packages. It accepts a variety of input data (NMR peak lists, binned spectra, MS peak lists, compound/concentration data) in a wide variety of formats. It also offers a number of options for metabolomic data processing, data normalization, multivariate statistical analysis, graphing, metabolite identification and pathway mapping. In particular, MetaboAnalyst supports such techniques as: fold change analysis, t-tests, PCA, PLS-DA, hierarchical clustering and a number of more sophisticated statistical or machine learning methods. It also employs a large library of reference spectra to facilitate compound identification from most kinds of input spectra. MetaboAnalyst guides users through a step-by-step analysis pipeline using a variety of menus, information hyperlinks and check boxes. Upon completion, the server generates a detailed report describing each method used, embedded with graphical and tabular outputs. MetaboAnalyst is capable of handling most kinds of metabolomic data and was designed to perform most of the common kinds of metabolomic data analyses. MetaboAnalyst is accessible at http://www.metaboanalyst.ca
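Two of the univariate methods listed above, fold change analysis and the t-test, can be illustrated with a short sketch. This is not MetaboAnalyst's code: the function names are hypothetical, Welch's unequal-variance form of the t-test is an assumption chosen for the example, and only the test statistic and its degrees of freedom are computed (a p-value would additionally need the t distribution, for instance from SciPy).

```python
import math
from statistics import mean, variance


def log2_fold_change(case, control):
    """log2 of the ratio of group means for one metabolite."""
    return math.log2(mean(case) / mean(control))


def welch_t(case, control):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of the two group means (sample variance / n).
    se2_case = variance(case) / len(case)
    se2_control = variance(control) / len(control)
    t = (mean(case) - mean(control)) / math.sqrt(se2_case + se2_control)
    df = (se2_case + se2_control) ** 2 / (
        se2_case ** 2 / (len(case) - 1) + se2_control ** 2 / (len(control) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Hypothetical concentrations of one metabolite in two groups.
    case = [8.0, 10.0, 12.0]
    control = [4.0, 5.0, 6.0]
    print("log2 fold change:", log2_fold_change(case, control))
    print("Welch t, df:", welch_t(case, control))
```

Plotting the log2 fold change against the significance of the test for every metabolite gives the familiar volcano plot.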
Due to their sensitivity and speed, mass-spectrometry based analytical technologies are widely used in metabolomics to characterize biological phenomena. To address issues like metadata organization, quality assessment, data processing, data storage, and, finally, submission to public repositories, bioinformatic pipelines of a non-interactive nature are often employed, complementing the interactive software used for initial inspection and visualization of the data. These pipelines are often created as open-source software, allowing the complete and exhaustive documentation of each step and ensuring the reproducibility of the analysis of extensive and often expensive experiments. In this paper, we review the major steps which constitute such a data processing pipeline, discussing them in the context of an open-source software for untargeted MS-based metabolomics experiments recently developed at our institute. The software has been developed by integrating our metaMS R package with a user-friendly web-based application written in Grails. MetaMS takes care of data pre-processing and annotation, while the interface deals with the creation of the sample lists, the organization of the data storage, and the generation of survey plots for quality assessment. Experimental and biological metadata are stored in the ISA-Tab format, making the proposed pipeline fully integrated with the MetaboLights framework.
metabolomics; ISA-Tab; pipeline; data analysis; LC-MS; GC-MS
A metabolome—the collection of comprehensive quantitative data on metabolites in an organism—has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data.
One-dimensional 1H-NMR spectroscopy is widely used for high-throughput characterization of metabolites in complex biological mixtures. However, the accurate identification of individual compounds is still a challenging task, particularly in spectral regions with higher peak densities. The need for automatic tools to facilitate and further improve the accuracy of such tasks, while using increasingly larger reference spectral libraries, is becoming a priority of current metabolomics research.
We introduce a web server application, called MetaboHunter, which can be used for automatic assignment of 1H-NMR spectra of metabolites. MetaboHunter provides methods for automatic metabolite identification based on spectra or peak lists, with three different search methods and with the possibility of allowing for peak drift in a user-defined spectral range. The assignment is performed using as reference libraries manually curated data from two major publicly available databases of NMR metabolite standard measurements (HMDB and MMCD). Tests using a variety of synthetic and experimental spectra of single- and multi-metabolite mixtures show that MetaboHunter is able to identify, on average, more than 80% of detectable metabolites from spectra of synthetic mixtures and more than 50% from spectra corresponding to experimental mixtures. This work also suggests that better scoring functions improve the performance of MetaboHunter's metabolite identification methods by more than 30%.
MetaboHunter is a freely accessible, easy to use and user friendly 1H-NMR-based web server application that provides efficient data input and pre-processing, flexible parameter settings, fast and automatic metabolite fingerprinting and results visualization via intuitive plotting and compound peak hit maps. Compared to other published and freely accessible metabolomics tools, MetaboHunter implements three efficient methods to search for metabolites in manually curated data from two reference libraries.
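The idea of matching a query peak list against reference libraries while tolerating peak drift can be sketched as below. This is a simplified illustration and not one of MetaboHunter's three search methods: the score (fraction of reference peaks found in the query), the 0.02 ppm tolerance, the 0.5 score cut-off, and the library of hypothetical compounds with made-up chemical shifts are all assumptions for the example.

```python
def match_fraction(query_peaks, ref_peaks, tol=0.02):
    """Fraction of reference peaks with a query peak within +/- tol (ppm)."""
    matched = sum(
        1 for ref in ref_peaks
        if any(abs(ref - q) <= tol for q in query_peaks)
    )
    return matched / len(ref_peaks)


def rank_compounds(query_peaks, library, tol=0.02, min_score=0.5):
    """Score every library compound; return hits sorted by descending score."""
    hits = []
    for name, ref_peaks in library.items():
        score = match_fraction(query_peaks, ref_peaks, tol)
        if score >= min_score:
            hits.append((name, score))
    # Sort by score (best first), breaking ties alphabetically by name.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    # Hypothetical reference library: compound name -> 1H chemical shifts (ppm).
    library = {
        "compound_A": [1.33, 4.11],
        "compound_B": [1.48, 3.78],
        "compound_C": [2.05, 2.35, 3.20],
    }
    query = [1.32, 4.12, 1.47, 7.00]  # peaks picked from a mixture spectrum
    for name, score in rank_compounds(query, library):
        print(f"{name}: {score:.2f}")
```

A larger tolerance admits more drift but also more false matches in crowded spectral regions, which is one reason the choice of scoring function matters.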
Discovering molecular components and their functionality is key to the development of hypotheses concerning the organization and regulation of metabolic networks. The iterative experimental testing of such hypotheses is the trajectory that can ultimately enable accurate computational modelling and prediction of metabolic outcomes. This information can be particularly important for understanding the biology of natural products, whose metabolism itself is often only poorly defined. Here, we describe factors that must be in place to optimize the use of metabolomics in predictive biology. A key to achieving this vision is a collection of accurate time-resolved and spatially defined metabolite abundance data and associated metadata. One formidable challenge associated with metabolite profiling is the complexity and analytical limits associated with comprehensively determining the metabolome of an organism. Further, for metabolomics data to be efficiently used by the research community, it must be curated in publicly available metabolomics databases. Such databases require clear, consistent formats, easy access to data and metadata, data download, and accessible computational tools to integrate genome system-scale datasets. Although transcriptomics and proteomics integrate the linear predictive power of the genome, the metabolome represents the nonlinear, final biochemical products of the genome, which results from the intricate system(s) that regulate genome expression. For example, the relationship of metabolomics data to the metabolic network is confounded by redundant connections between metabolites and gene-products. However, connections among metabolites are predictable through the rules of chemistry. Therefore, enhancing the ability to integrate the metabolome with anchor-points in the transcriptome and proteome will enhance the predictive power of genomics data.
We detail a public database repository for metabolomics, tools and approaches for statistical analysis of metabolomics data, and methods for integrating these datasets with transcriptomic data to create hypotheses concerning specialized metabolism that generates the diversity in natural product chemistry. We discuss the importance of close collaborations among biologists, chemists, computer scientists and statisticians throughout the development of such integrated metabolism-centric databases and software.
Recent technological advancements in liquid chromatography coupled with high resolution mass spectrometry have facilitated clinical biomarker discovery, verification and validation. Metabolomics is the comprehensive study of metabolic reactions and is fast gaining ground as a powerful tool to differentiate between the diseased and healthy states. The power of this rapidly emerging field has been comprehensively reviewed. An ideal metabolomics experiment aims to profile all the metabolites present in the given biological matrix, which is a huge analytical challenge, given the chemical and structural diversity of small molecules. However, since metabolites directly represent the phenotypic readout, global metabolomic profiling offers novel insights into the regulation of metabolism and complements the existing knowledge from other “omics” technologies, thus integrating well into the systems approach aimed at personalized medicine.
In recent years there has been a conscious effort to develop technology for profiling the global metabolome in order to study a variety of complex physiological diseases including obesity, diabetes and cancer. Our knowledge of the metabolome is still limited. In fact, the anticipated contribution of metabolomics to the field of science and to health care is highlighted by its presence in the NIH Roadmap. Mass spectrometric methods continuously improve in sensitivity and accuracy but are also characterized by high dimensionality and complex patterns with a substantial amount of noise. Thus, metabolomics study design and data analysis play a key role in addressing this challenge. While universally applicable tools for metabolomics study design and data analysis are still in development, metabolomics continues to emerge as a forefront technology with great applicability to basic, clinical and translational research. The Metabolomics Research Group (MRG) strives to initiate studies aimed at testing cross-platform and inter-laboratory reproducibility, with the ultimate goal of standardizing the use of this technology for cutting-edge research.
Despite wide-spread use of Nuclear Magnetic Resonance (NMR) in metabolomics for the analysis of biological samples there is a lack of graphically driven, publicly available software to process large one and two-dimensional NMR data sets for statistical analysis.
Here we present MetaboLab, a MATLAB based software package that facilitates NMR data processing by providing automated algorithms for processing series of spectra in a reproducible fashion. A graphical user interface provides easy access to all steps of data processing via a script builder to generate MATLAB scripts, providing an option to alter code manually. The analysis of two-dimensional spectra (1H,13C-HSQC spectra) is facilitated by the use of a spectral library derived from publicly available databases which can be extended readily. The software can display specific metabolites in small regions of interest where signals can be picked. To facilitate the analysis of series of two-dimensional spectra, different spectra can be overlaid and assignments can be transferred between spectra. The software includes mechanisms to account for overlapping signals by highlighting neighboring and ambiguous assignments.
The MetaboLab software is an integrated software package for NMR data processing and analysis, closely linked to the previously developed NMRLab software. It includes tools for batch processing and gives access to a wealth of algorithms available in the MATLAB framework. Algorithms within MetaboLab help to optimize the flow of metabolomics data preparation for statistical analysis. The combination of an intuitive graphical user interface along with advanced data processing algorithms facilitates the use of MetaboLab in a broader metabolomics context.
Metabolomics has matured over the past 10 years. By combining different platforms, over 2,000 identified metabolites can be screened. At the UC Davis Genome Center Metabolomics Facility, two laboratories work towards advancing methods and providing services: the Fiehn research laboratory and the metabolomics core. We have a combined use of 11 mass spectrometers, for which a range of SOPs and quality controls have been developed for (a) primary metabolism, (b) volatile metabolites, (c) lipidomics, (d) secondary metabolites and (e) metabolic polymers. Over 300 studies have been completed over the past 5 years, which are stored and disseminated via the SetupX study design database and facilitated by the BinBase mass spectrometry repositories. Lipid identifications by nanoESI-ion trap MS/MS are based on Genedata's MS Refiner software and a novel cross-instrument library, the LipidBLAST tool, which stores calculated MS/MS spectra of over 180,000 lipids based on fragmentation patterns of authentic standards. The FiehnLib libraries of over 1,000 primary metabolites authenticate identifications in GC-TOF platforms, in conjunction with BinBase and the Adams volatile MS library. Polymers in biofuel research are assessed by pyrolysis-GC/MS and the MIT-based SpectConnect tool. LC-ion trap, Qtrap and QTOF mass spectrometry are used for determining compounds that are not amenable to any of the above methods, such as cationic metabolites (SAM, betaine, SMM), metabolically active biomarkers (acylcarnitines) and other important metabolic classes (dietary phytochemicals, folates and glucuronides). Despite this progress, metabolomics still faces a number of analytical challenges: the need for accuracy in structural identifications and quantifications, increases in total peak capacities, improved data processing software and the need for standardized database repositories.
Current efforts are presented as well as a discussion on experiences in the dual task of ‘research’ and ‘service’ for metabolomic facilities and how to meet outside expectations and financial constraints.
Metabolomics is the methodology that identifies and measures global pools of small molecules (of less than about 1,000 Da) of a biological sample, which are collectively called the metabolome. Metabolomics can therefore reveal the metabolic outcome of a genetic or environmental perturbation of a metabolic regulatory network, and thus provide insights into the structure and regulation of that network. Because of the chemical complexity of the metabolome and limitations associated with individual analytical platforms for determining the metabolome, it is currently difficult to capture the complete metabolome of an organism or tissue, which is in contrast to genomics and transcriptomics. This paper describes the analysis of Arabidopsis metabolomics data sets acquired by a consortium that includes five analytical laboratories, bioinformaticists, and biostatisticians, which aims to develop and validate metabolomics as a hypothesis-generating functional genomics tool. The consortium is determining the metabolomes of Arabidopsis T-DNA mutant stocks, grown in standardized controlled environment optimized to minimize environmental impacts on the metabolomes. Metabolomics data were generated with seven analytical platforms, and the combined data is being provided to the research community to formulate initial hypotheses about genes of unknown function (GUFs). A public database (www.PlantMetabolomics.org) has been developed to provide the scientific community with access to the data along with tools to allow for its interactive analysis. Exemplary datasets are discussed to validate the approach, which illustrate how initial hypotheses can be generated from the consortium-produced metabolomics data, integrated with prior knowledge to provide a testable hypothesis concerning the functionality of GUFs.
Arabidopsis; metabolomics; gene annotation; functional genomics; database
Improvements in the techniques for metabolomics analyses and growing interest in metabolomic approaches are resulting in the generation of increasing numbers of metabolomic profiles. Platforms are required for profile management, as a function of experimental design, and for metabolite identification, to facilitate the mining of the corresponding data. Various databases have been created, including organism-specific knowledgebases and analytical technique-specific spectral databases. However, there is currently no platform meeting the requirements for both profile management and metabolite identification for nuclear magnetic resonance (NMR) experiments.
MeRy-B, the first platform for plant 1H-NMR metabolomic profiles, is designed (i) to provide a knowledgebase of curated plant profiles and metabolites obtained by NMR, together with the corresponding experimental and analytical metadata, (ii) for queries and visualization of the data, (iii) to discriminate between profiles with spectrum visualization tools and statistical analysis, (iv) to facilitate compound identification. It contains lists of plant metabolites and unknown compounds, with information about experimental conditions, the factors studied and metabolite concentrations for several plant species, compiled from more than one thousand annotated NMR profiles for various organs or tissues.
MeRy-B manages all the data generated by NMR-based plant metabolomics experiments, from description of the biological source to identification of the metabolites and determinations of their concentrations. It is the first database allowing the display and overlay of NMR metabolomic profiles selected through queries on data or metadata. MeRy-B is available from http://www.cbib.u-bordeaux2.fr/MERYB/index.php.
First released in 2009, MetaboAnalyst (www.metaboanalyst.ca) was a relatively simple web server designed to facilitate metabolomic data processing and statistical analysis. With continuing advances in metabolomics along with constant user feedback, it became clear that a substantial upgrade to the original server was necessary. MetaboAnalyst 2.0, which is the successor to MetaboAnalyst, represents just such an upgrade. MetaboAnalyst 2.0 now contains dozens of new features and functions including new procedures for data filtering, data editing and data normalization. It also supports multi-group data analysis, two-factor analysis as well as time-series data analysis. These new functions have also been supplemented with: (i) a quality-control module that allows users to evaluate their data quality before conducting any analysis, (ii) a functional enrichment analysis module that allows users to identify biologically meaningful patterns using metabolite set enrichment analysis and (iii) a metabolic pathway analysis module that allows users to perform pathway analysis and visualization for 15 different model organisms. In developing MetaboAnalyst 2.0 we have also substantially improved its graphical presentation tools. All images are now generated using anti-aliasing and are available over a range of resolutions, sizes and formats (PNG, TIFF, PDF, PostScript, or SVG). To improve its performance, MetaboAnalyst 2.0 is now hosted on a much more powerful server with substantially modified code to take advantage of the server’s multi-core CPUs for computationally intensive tasks. MetaboAnalyst 2.0 also maintains a collection of 50 or more FAQs and more than a dozen tutorials compiled from user queries and requests. A downloadable version of MetaboAnalyst 2.0, along with detailed instructions for local installation, is now available as well.
The genome sequencing projects have shown our limited knowledge regarding gene function, e.g. S. cerevisiae has 5–6,000 genes of which nearly 1,000 have an uncertain function. Their gross influence on the behaviour of the cell can be observed using large-scale metabolomic studies. The metabolomic data produced need to be structured and annotated in a machine-usable form to facilitate the exploration of the hidden links between the genes and their functions.
MeMo is a formal model for representing metabolomic data and the associated metadata. Two predominant platforms (SQL and XML) are used to encode the model. MeMo has been implemented as a relational database using a hybrid approach combining the advantages of the two technologies. It represents a practical solution for handling the sheer volume and complexity of the metabolomic data effectively and efficiently. The MeMo model and the associated software are available at .
The maturity of relational database technology is used to support efficient data processing. The scalability and self-descriptiveness of XML are used to simplify the relational schema and facilitate the extensibility of the model necessitated by the creation of new experimental techniques. Special consideration is given to data integration issues as part of the systems biology agenda. MeMo has been physically integrated and cross-linked to related metabolomic and genomic databases. Semantic integration with other relevant databases has been supported through ontological annotation. Compatibility with other data formats is supported by automatic conversion.
Standardization of analytical approaches and reporting methods via community-wide collaboration can work synergistically with web-tool development to result in rapid community-driven expansion of online data repositories suitable for data mining and meta-analysis. In metabolomics, the inter-laboratory reproducibility of gas-chromatography/mass-spectrometry (GC/MS) makes it an obvious target for such development. While a number of web-tools offer access to datasets and/or tools for raw data processing and statistical analysis, none of these systems are currently set up to act as a public repository by easily accepting, processing and presenting publicly submitted GC/MS metabolomics datasets for public re-analysis.
Here, we present MetabolomeExpress, a new File Transfer Protocol (FTP) server and web-tool for the online storage, processing, visualisation and statistical re-analysis of publicly submitted GC/MS metabolomics datasets. Users may search a quality-controlled database of metabolite response statistics from publicly submitted datasets by a number of parameters (e.g., metabolite, species, organ/biofluid). Users may also perform meta-analysis comparisons of multiple independent experiments or re-analyse public primary datasets via user-friendly tools for t-test, principal components analysis, hierarchical cluster analysis and correlation analysis. They may interact with chromatograms, mass spectra and peak detection results via an integrated raw data viewer. Researchers who register for a free account may upload (via FTP) their own data to the server for online processing via a novel raw data processing pipeline.
MetabolomeExpress (https://www.metabolome-express.org) provides a new opportunity for the general metabolomics community to transparently present online the raw and processed GC/MS data underlying their metabolomics publications. Transparent sharing of these data will allow researchers to assess data quality and draw their own insights from published metabolomics datasets.
One-dimensional (1D) 1H nuclear magnetic resonance (NMR) spectroscopy is widely used in metabolomic studies involving biofluids and tissue extracts. There are several software packages that support compound identification and quantification via 1D 1H NMR by spectral fitting techniques. Because 1D 1H NMR spectra are characterized by extensive peak overlap or spectral congestion, two-dimensional (2D) NMR, with its increased spectral resolution, could potentially improve and even automate compound identification or quantification. However, the lack of dedicated software for this purpose significantly restricts the application of 2D NMR methods to most metabolomic studies.
We describe a standalone graphics software tool, called MetaboMiner, which can be used to automatically or semi-automatically identify metabolites in complex biofluids from 2D NMR spectra. MetaboMiner is able to handle both 1H-1H total correlation spectroscopy (TOCSY) and 1H-13C heteronuclear single quantum correlation (HSQC) data. It identifies compounds by comparing 2D spectral patterns in the NMR spectrum of the biofluid mixture with specially constructed libraries containing reference spectra of ~500 pure compounds. Tests using a variety of synthetic and real spectra of compound mixtures showed that MetaboMiner is able to identify >80% of detectable metabolites from good quality NMR spectra.
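The library comparison described above can be illustrated with a minimal sketch. It is not MetaboMiner's actual algorithm: the function names, the tolerances, the 0.8 threshold and the chemical-shift values are all assumptions chosen for illustration, and the compounds are placeholders rather than real library entries. Each reference compound is represented as a list of (1H, 13C) HSQC peak positions, and a compound is reported when a sufficient fraction of its reference peaks is found in the mixture spectrum.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (1H shift in ppm, 13C shift in ppm)


def fraction_matched(reference: List[Peak], observed: List[Peak],
                     tol_h: float = 0.03, tol_c: float = 0.3) -> float:
    """Fraction of reference peaks that have an observed peak within tolerance."""
    if not reference:
        return 0.0
    hits = 0
    for ref_h, ref_c in reference:
        # A reference peak counts as found if any observed peak lies inside
        # the tolerance window in both dimensions.
        if any(abs(ref_h - h) <= tol_h and abs(ref_c - c) <= tol_c
               for h, c in observed):
            hits += 1
    return hits / len(reference)


def identify(library: Dict[str, List[Peak]], observed: List[Peak],
             threshold: float = 0.8) -> List[str]:
    """Names of library compounds whose matched fraction reaches the threshold."""
    return sorted(name for name, peaks in library.items()
                  if fraction_matched(peaks, observed) >= threshold)


# Hypothetical library and mixture peak list (illustrative values only).
library = {
    "compound_A": [(1.33, 22.9), (4.11, 71.3)],
    "compound_B": [(3.05, 40.0), (7.20, 130.0)],
}
observed = [(1.34, 22.8), (4.10, 71.4), (3.04, 40.1)]

print(identify(library, observed))
```

In this toy input both peaks of compound_A are found, whereas only one of the two peaks of compound_B is present, so only compound_A passes the threshold. A real tool would also have to handle peak picking, overlapping signals and solvent or noise peaks.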
MetaboMiner is a freely available, easy-to-use, NMR-based metabolomics tool that facilitates automatic peak processing, rapid compound identification, and facile spectrum annotation from either 2D TOCSY or HSQC spectra. Using comprehensive reference libraries coupled with robust algorithms for peak matching and compound identification, the program greatly simplifies the process of metabolite identification in complex 2D NMR spectra.
Metabolomics is a rapidly developing functional genomic tool that has a wide range of applications in diverse fields in biology and medicine. However, unlike transcriptomics and proteomics, there is currently no central repository for depositing data, despite efforts by the Metabolomics Standards Initiative (MSI) to develop a standardised description of a metabolomic experiment.
In this manuscript we describe how the MSI description has been applied to a published dataset involving the identification of cross-species metabolic biomarkers associated with type II diabetes. The study describes sample collection of urine from mice, rats and human volunteers, and the subsequent acquisition of data by high resolution 1H NMR spectroscopy. The metadata is described to demonstrate how the MSI descriptions could be applied in a manuscript and the spectra have also been made available for the mouse and rat studies to allow others to process the data.
The intention of this manuscript is to stimulate discussion as to whether the MSI description is sufficient to describe the metadata associated with metabolomic experiments and encourage others to make their data available to other researchers.
data standards; metabolomics repository; bioinformatics; NMR spectroscopy
The ABRF Metabolomics Research Group (MRG) was formed in 2009 and aims to educate research scientists and resource facilities in the analytical approaches and management of data resulting from comprehensive metabolite studies and to promote the science and standardization of metabolomic analyses for a variety of applications. Last year the MRG conducted a ‘Survey Study’ on the current use of metabolomics technologies and procedures in core facilities. This year the MRG is organizing a ‘Research Study’ involving a spiked plasma sample. The study sample uses a human biofluid as the matrix, replicating a typical small-scale metabolomics pilot experiment that either a core or research laboratory would perform. The sample consists of two groups of normal human plasma (NIST plasma ‘Standard Reference Material’) with spiked-in compounds. There are three biological replicates in each group (n = 3 design), with different levels of spiked compounds differentiating the two groups. Participants are asked to determine statistical significance and fold change, and to identify compounds that differ significantly between groups A and B. The design reflects issues encountered in an actual metabolomics experiment conducted with human or animal specimens. The study is compatible with many methodological approaches in metabolomics, including, but not limited to, LC/MS, GC/MS and NMR. As with any metabolomics profiling experiment, the best results would be expected using a combination of approaches. The study is the first of its kind in the field of metabolomics and is expected to produce important information on the strengths and limitations of the various platforms and technologies that are commonly used for comprehensive metabolite analyses.
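As a rough illustration of the two quantities participants are asked to report, the sketch below computes a fold change and a Welch t statistic for one compound in an n = 3 versus n = 3 design. The study does not prescribe any particular test; the choice of Welch's unequal-variance t-test, the function names and the intensity values are assumptions made for illustration only.

```python
import math
from statistics import mean, variance


def fold_change(group_a, group_b):
    """Ratio of the group B mean to the group A mean."""
    return mean(group_b) / mean(group_a)


def welch_t(group_a, group_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    # Squared standard error contribution of each group (sample variance / n).
    sa, sb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_b) - mean(group_a)) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


# Hypothetical peak intensities for one compound (illustrative values only).
group_a = [10.0, 12.0, 11.0]
group_b = [20.0, 22.0, 24.0]

fc = fold_change(group_a, group_b)
t_stat, dof = welch_t(group_a, group_b)
print(f"fold change = {fc:.2f}, t = {t_stat:.2f}, df = {dof:.2f}")
```

Converting the t statistic to a p-value requires the Student t distribution, which is not in the Python standard library; a statistics package would normally be used for that step. With only three replicates per group, the degrees of freedom are small and a multiple-testing correction across compounds would also be needed.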
The PlantMetabolomics (PM) database (http://www.plantmetabolomics.org) contains comprehensive targeted and untargeted mass spectrum metabolomics data for Arabidopsis mutants across a variety of metabolomics platforms. The database allows users to generate hypotheses about the changes in metabolism for mutants with genes of unknown function. Version 2.0 of PlantMetabolomics.org currently contains data for 140 mutant lines along with the corresponding morphological data. A web-based data analysis wizard allows researchers to select preprocessing and data-mining procedures to discover differences between mutants. This community resource enables researchers to formulate models of the metabolic network of Arabidopsis and enhances the research community's ability to formulate testable hypotheses concerning gene functions. PM features new web-based tools for data-mining analysis, visualization tools and enhanced cross-links to other databases. The database is publicly available. PM aims to provide a hypothesis-building platform for researchers interested in any of the mutant lines or metabolites.
In the last decade, techniques were established for the large-scale genome-wide analysis of proteins, RNA, and metabolites, and database solutions have been developed to manage the generated data sets. The Golm Metabolome Database for metabolite data (GMD) represents one such effort to make these data broadly available and to interconnect the different molecular levels of a biological system. As data interpretation in the light of already existing data becomes increasingly important, these initiatives are an essential part of current and future systems biology.
A mass spectral library consisting of experimentally derived tryptic peptide product ion spectra was generated based on liquid chromatography coupled to ion trap mass spectrometry (LC-IT-MS). Protein samples derived from Arabidopsis thaliana, Chlamydomonas reinhardtii, Medicago truncatula, and Sinorhizobium meliloti were analysed. With currently 4,557 manually validated spectra associated with 4,226 unique peptides from 1,367 proteins, the database serves as a continuously growing reference data set and can be used for protein identification and quantification in uncharacterized biological samples. For peptide identification, several algorithms were implemented based on a recently published study for peptide mass fingerprinting and tested for false positive and false negative rates. An algorithm that considers intensity distribution for match correlation scores was found to yield the best results. For proof of concept, an LC-IT-MS analysis of a tryptic leaf protein digest was converted to mzData format and searched against the mass spectral library. The utility of the mass spectral library was also tested for the identification of phosphorylated tryptic peptides. We included in vivo phosphorylation sites of Arabidopsis thaliana proteins, and the identification performance was found to be improved compared to genome-based search algorithms. Protein identification by ProMEX is linked to other levels of biological organization such as metabolite, pathway, and transcript data. The database is further connected to annotation and classification services via BioMoby.
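To make the idea of an intensity-aware match score concrete, the sketch below scores a query spectrum against a library spectrum with a binned cosine similarity. This is a generic, widely used measure and is only a stand-in: the text does not specify the scoring function used in ProMEX, and the function names, the 1.0 m/z bin width and the peak lists are assumptions for illustration.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins keyed by bin index."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_score(query, reference, bin_width=1.0):
    """Cosine similarity of two binned spectra (0 = no overlap, 1 = same shape)."""
    q = bin_spectrum(query, bin_width)
    r = bin_spectrum(reference, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(q[b] * r[b] for b in q if b in r)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in r.values())))
    return dot / norm if norm else 0.0


# Hypothetical product ion spectra as (m/z, intensity) pairs.
query = [(175.1, 100.0), (288.2, 50.0), (401.3, 20.0)]
similar = [(175.1, 90.0), (288.2, 60.0), (401.3, 10.0)]
unrelated = [(500.3, 80.0), (612.4, 40.0)]

print(round(cosine_score(query, similar), 3))
print(cosine_score(query, unrelated))
```

Because the score depends on relative intensities and not only on which m/z values are present, two spectra that share peaks but differ strongly in intensity pattern score lower than spectra with the same pattern. A peak-counting score would not make that distinction.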
The ProMEX protein/peptide database represents a mass spectral reference library with the capability of matching unknown samples for protein identification. The database allows text searches based on metadata such as experimental information about the samples, mass spectrometric instrument parameters or unique protein identifiers such as AGI codes. ProMEX integrates proteomics data with other levels of molecular organization including metabolite, pathway, and transcript information and may thus become a useful resource for plant systems biology studies. The ProMEX mass spectral library is available at .
Metabolome analysis with GC/MS is now established as one of the "omics" techniques. Compounds are identified by comparing the MS data with compound libraries. Mass spectral libraries in the field of metabolomics ought to connect the relevant mass traces of the metabolites to other relevant data, e.g. formulas, chemical structures and identifiers in other databases. Since existing solutions are either commercial, and therefore only available for certain instruments, or not capable of storing such information, there is a need for a software tool to manage such data.
Here we present mSpecs, an open source software tool to manage mass spectral data in the field of metabolomics. It provides editing of mass spectra and virtually any associated information, automatic calculation of formulas and masses, and is extensible by scripts. The graphical user interface supports common operations such as copy/paste, undo/redo and drag and drop. It has import and export filters for the major public file formats in order to provide compatibility with commercial instruments.
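The kind of automatic mass calculation mentioned above can be sketched as follows. This is not mSpecs code: the function name and the small element table are assumptions for illustration, and the parser handles only simple formulas without parentheses, charges or isotope labels. It computes the monoisotopic mass of a compound from its molecular formula.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotope of a few elements.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
    "S": 31.9720707,
}


def monoisotopic_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C6H12O6'."""
    # Reject anything that is not a plain sequence of element symbols and counts.
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in MONOISOTOPIC:
            raise ValueError(f"element not in table: {element}")
        # A missing count means a single atom of that element.
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


print(f"{monoisotopic_mass('C6H12O6'):.4f}")  # glucose
```

A library manager would typically store this value next to each spectrum so that entries can be searched by mass as well as by name or identifier.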
mSpecs is a versatile tool for the management and editing of mass spectral libraries in the field of metabolomics. Beyond that, it provides capabilities for the automatic management of libraries through its scripting functionality. mSpecs can be used on all major platforms and is licensed under the GNU General Public License and available at .