MetaboLights (http://www.ebi.ac.uk/metabolights) is the first general-purpose, open-access repository for metabolomics studies, their raw experimental data and associated metadata, maintained by one of the major open-access data providers in molecular biology. Metabolomic profiling is an important tool for research into biological functioning and into the systemic perturbations caused by diseases, diet and the environment. The effectiveness of such methods depends on the availability of public open data across a broad range of experimental methods and conditions. The MetaboLights repository, powered by the open source ISA framework, is cross-species and cross-technique. It will cover metabolite structures and their reference spectra as well as their biological roles, locations, concentrations and raw data from metabolic experiments. Studies automatically receive a stable unique accession number that can be used as a publication reference (e.g. MTBLS1). At present, the repository includes 15 submitted studies, encompassing 93 protocols for 714 assays, and span over 8 different species including human, Caenorhabditis elegans, Mus musculus and Arabidopsis thaliana. Eight hundred twenty-seven of the metabolites identified in these studies have been mapped to ChEBI. These studies cover a variety of techniques, including NMR spectroscopy and mass spectrometry.
With ever-increasing amounts of metabolomics data produced each year, there is an even greater need to disseminate data and knowledge produced in a standard and reproducible way. To assist with this a general purpose, open source metabolomics repository, MetaboLights, was launched in 2012. To promote a community standard, initially culminated as metabolomics standards initiative (MSI), COordination of Standards in MetabOlomicS (COSMOS) was introduced. COSMOS aims to link life science e-infrastructures within the worldwide metabolomics community as well as develop and maintain open source exchange formats for raw and processed data, ensuring better flow of metabolomics information.
Metabolomics; MetaboLights; Database; Repository; Data sharing; Standard; MSI; ISA-Tab; Curation; COSMOS
The application of reporting standards in metabolomics allow data from different laboratories to be shared, integrated and interpreted. Although minimum reporting standards related to metabolite identification were published in 2007, it is clear that significant efforts are required to ensure their continuous update and appropriate use by the metabolomics community. These include their use in metabolomics data submission (e.g., MetaboLights) and as a requirement for publication in peer-reviewed journals (e.g., Metabolomics). The Data Standards and Metabolite Identification Task Groups of the international Metabolomics Society are actively working to develop and promote these standards and educate the community on their use.
The combination of significantly lower cost and increased speed of sequencing has resulted in an explosive growth of data submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). The preservation of experimental data is an important part of the scientific record, and increasing numbers of journals and funding agencies require that next-generation sequence data are deposited into the SRA. The SRA was established as a public repository for the next-generation sequence data and is operated by the International Nucleotide Sequence Database Collaboration (INSDC). INSDC partners include the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA, detail our support for sequencing platforms and provide recommended data submission levels and formats. We also briefly outline our response to the challenge of data growth.
New generation sequencing platforms are producing data with significantly higher throughput and lower cost. A portion of this capacity is devoted to individual and community scientific projects. As these projects reach publication, raw sequencing datasets are submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). Archiving experimental data is the key to the progress of reproducible science. The SRA was established as a public repository for next-generation sequence data as a part of the International Nucleotide Sequence Database Collaboration (INSDC). INSDC is composed of the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at www.ncbi.nlm.nih.gov/sra from NCBI, at www.ebi.ac.uk/ena from EBI and at trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA and report on updated metadata structures, submission file formats and supported sequencing platforms. We also briefly outline our various responses to the challenge of explosive data growth.
Next generation sequencing platforms are producing biological sequencing data in unprecedented amounts. The partners of the International Nucleotide Sequencing Database Collaboration, which includes the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ), have established the Sequence Read Archive (SRA) to provide the scientific community with an archival destination for next generation data sets. The SRA is now accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://www.ddbj.nig.ac.jp/sub/trace_sra-e.html from DDBJ. Users of these resources can obtain data sets deposited in any of the three SRA instances. Links and submission instructions are provided.
First released in 2009, MetaboAnalyst (www.metaboanalyst.ca) was a relatively simple web server designed to facilitate metabolomic data processing and statistical analysis. With continuing advances in metabolomics along with constant user feedback, it became clear that a substantial upgrade to the original server was necessary. MetaboAnalyst 2.0, which is the successor to MetaboAnalyst, represents just such an upgrade. MetaboAnalyst 2.0 now contains dozens of new features and functions including new procedures for data filtering, data editing and data normalization. It also supports multi-group data analysis, two-factor analysis as well as time-series data analysis. These new functions have also been supplemented with: (i) a quality-control module that allows users to evaluate their data quality before conducting any analysis, (ii) a functional enrichment analysis module that allows users to identify biologically meaningful patterns using metabolite set enrichment analysis and (iii) a metabolic pathway analysis module that allows users to perform pathway analysis and visualization for 15 different model organisms. In developing MetaboAnalyst 2.0 we have also substantially improved its graphical presentation tools. All images are now generated using anti-aliasing and are available over a range of resolutions, sizes and formats (PNG, TIFF, PDF, PostScript, or SVG). To improve its performance, MetaboAnalyst 2.0 is now hosted on a much more powerful server with substantially modified code to take advantage the server’s multi-core CPUs for computationally intensive tasks. MetaboAnalyst 2.0 also maintains a collection of 50 or more FAQs and more than a dozen tutorials compiled from user queries and requests. A downloadable version of MetaboAnalyst 2.0, along detailed instructions for local installation is now available as well.
Searching metabolites against databases according to their masses is often the first step in metabolite identification for a mass spectrometry-based untargeted metabolomics study. Major metabolite databases include Human Metabolome DataBase (HMDB), Madison Metabolomics Consortium Database (MMCD), Metlin, and LIPID MAPS. Since each one of these databases covers only a fraction of the metabolome, integration of the search results from these databases is expected to yield a more comprehensive coverage. However, the manual combination of multiple search results is generally difficult when identification of hundreds of metabolites is desired. We have implemented a web-based software tool that enables simultaneous mass-based search against the four major databases, and the integration of the results. In addition, more complete chemical identifier information for the metabolites is retrieved by cross-referencing multiple databases. The search results are merged based on IUPAC International Chemical Identifier (InChI) keys. Besides a simple list of m/z values, the software can accept the ion annotation information as input for enhanced metabolite identification. The performance of the software is demonstrated on mass spectrometry data acquired in both positive and negative ionization modes. Compared with search results from individual databases, MetaboSearch provides better coverage of the metabolome and more complete chemical identifier information. Availability: The software tool is available at http://omics.georgetown.edu/MetaboSearch.html.
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe’s primary nucleotide sequence archival resource, safeguarding open nucleotide data access, engaging in worldwide collaborative data exchange and integrating with the scientific publication process. ENA has made significant contributions to the collaborative nucleotide archival arena as an active proponent of extending the traditional collaboration to cover capillary and next-generation sequencing information. We have continued to co-develop data and metadata representation formats with our collaborators for both data exchange and public data dissemination. In addition to the DDBJ/EMBL/GenBank feature table format, we share metadata formats for capillary and next-generation sequencing traces and are using and contributing to the NCBI SRA Toolkit for the long-term storage of the next-generation sequence traces. During the course of 2009, ENA has significantly improved sequence submission, search and access functionalities provided at EMBL–EBI. In this article, we briefly describe the content and scope of our archive and introduce major improvements to our services.
The PlantMetabolomics (PM) database (http://www.plantmetabolomics.org) contains comprehensive targeted and untargeted mass spectrum metabolomics data for Arabidopsis mutants across a variety of metabolomics platforms. The database allows users to generate hypotheses about the changes in metabolism for mutants with genes of unknown function. Version 2.0 of PlantMetabolomics.org currently contains data for 140 mutant lines along with the morphological data. A web-based data analysis wizard allows researchers to select preprocessing and data-mining procedures to discover differences between mutants. This community resource enables researchers to formulate models of the metabolic network of Arabidopsis and enhances the research community's ability to formulate testable hypotheses concerning gene functions. PM features new web-based tools for data-mining analysis, visualization tools and enhanced cross links to other databases. The database is publicly available. PM aims to provide a hypothesis building platform for the researchers interested in any of the mutant lines or metabolites.
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe’s primary nucleotide-sequence repository. The ENA consists of three main databases: the Sequence Read Archive (SRA), the Trace Archive and EMBL-Bank. The objective of ENA is to support and promote the use of nucleotide sequencing as an experimental research platform by providing data submission, archive, search and download services. In this article, we outline these services and describe major changes and improvements introduced during 2010. These include extended EMBL-Bank and SRA-data submission services, extended ENA Browser functionality, support for submitting data to the European Genome-phenome Archive (EGA) through SRA, and the launch of a new sequence similarity search service.
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), maintained at the European Bioinformatics Institute (EBI), incorporates, organizes and distributes nucleotide sequences from public sources. The database is a part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged between the collaborating databases on a daily basis to achieve optimal synchrony. The web-based tool, Webin, is the preferred system for individual submission of nucleotide sequences, including Third Party Annotation (TPA) and alignment data. Automatic submission procedures are used for submission of data from large-scale genome sequencing centres and from the European Patent Office. Database releases are produced quarterly. The latest data collection can be accessed via FTP, email and WWW interfaces. The EBI’s Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases as well as many other specialist molecular biology databases. For sequence similarity searching, a variety of tools (e.g. FASTA and BLAST) are available that allow external users to compare their own sequences against the data in the EMBL Nucleotide Sequence Database, the complete genomic component subsection of the database, the WGS data sets and other databases. All available resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/)
is maintained at the European Bioinformatics Institute (EBI) in
an international collaboration with the DNA Data Bank of Japan (DDBJ)
and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating
databases on a daily basis. The major contributors to the EMBL database
are individual authors and genome project groups. Webin is the preferred
web-based submission system for individual submitters, whilst automatic procedures
allow incorporation of sequence data from large-scale genome sequencing
centres and from the European Patent Office (EPO). Database releases
are produced quarterly. Network services allow free access to the
most up-to-date data collection via ftp, email and World Wide Web
interfaces. EBI’s Sequence Retrieval System (SRS), a network
browser for databanks in molecular biology, integrates and links
the main nucleotide and protein databases plus many specialized
databases. For sequence similarity searching a variety of tools
(e.g. Blitz, Fasta, BLAST) are available which allow external users
to compare their own sequences against the latest data in the EMBL Nucleotide
Sequence Database and SWISS-PROT.
The standardization of reporting of data promises to revolutionize biology by allowing community access to data generated in laboratories across the globe. This approach has already influenced genomics and transcriptomics. Projects that have previously been viewed as being too big to implement can now be distributed across multiple sites. There are now public databases for gene sequences, transcriptomic profiling and proteomic experiments. However, progress in the metabolomic community has seemed to falter recently, and whereas there are ontologies to describe the metadata for metabolomics there are still no central repositories for the datasets themselves. Here, we examine some of the challenges and potential benefits of further efforts towards data standardization in metabolomics and metabonomics.
Metabolomics is a rapidly developing functional genomic tool that has a wide range of applications in diverse fields in biology and medicine. However, unlike transcriptomics and proteomics there is currently no central repository for the depositing of data despite efforts by the Metabolomics Standard Initiative (MSI) to develop a standardised description of a metabolomic experiment.
In this manuscript we describe how the MSI description has been applied to a published dataset involving the identification of cross-species metabolic biomarkers associated with type II diabetes. The study describes sample collection of urine from mice, rats and human volunteers, and the subsequent acquisition of data by high resolution 1H NMR spectroscopy. The metadata is described to demonstrate how the MSI descriptions could be applied in a manuscript and the spectra have also been made available for the mouse and rat studies to allow others to process the data.
The intention of this manuscript is to stimulate discussion as to whether the MSI description is sufficient to describe the metadata associated with metabolomic experiments and encourage others to make their data available to other researchers.
data standards; metabolomics repository; bioinformatics; NMR spectroscopy
Metabolomics is a newly emerging field of ‘omics’ research that is concerned with characterizing large numbers of metabolites using NMR, chromatography and mass spectrometry. It is frequently used in biomarker identification and the metabolic profiling of cells, tissues or organisms. The data processing challenges in metabolomics are quite unique and often require specialized (or expensive) data analysis software and a detailed knowledge of cheminformatics, bioinformatics and statistics. In an effort to simplify metabolomic data analysis while at the same time improving user accessibility, we have developed a freely accessible, easy-to-use web server for metabolomic data analysis called MetaboAnalyst. Fundamentally, MetaboAnalyst is a web-based metabolomic data processing tool not unlike many of today's web-based microarray analysis packages. It accepts a variety of input data (NMR peak lists, binned spectra, MS peak lists, compound/concentration data) in a wide variety of formats. It also offers a number of options for metabolomic data processing, data normalization, multivariate statistical analysis, graphing, metabolite identification and pathway mapping. In particular, MetaboAnalyst supports such techniques as: fold change analysis, t-tests, PCA, PLS-DA, hierarchical clustering and a number of more sophisticated statistical or machine learning methods. It also employs a large library of reference spectra to facilitate compound identification from most kinds of input spectra. MetaboAnalyst guides users through a step-by-step analysis pipeline using a variety of menus, information hyperlinks and check boxes. Upon completion, the server generates a detailed report describing each method used, embedded with graphical and tabular outputs. MetaboAnalyst is capable of handling most kinds of metabolomic data and was designed to perform most of the common kinds of metabolomic data analyses. MetaboAnalyst is accessible at http://www.metaboanalyst.ca
The PRoteomics IDEntifications (PRIDE, http://www.ebi.ac.uk/pride) database at the European Bioinformatics Institute is one of the most prominent data repositories of mass spectrometry (MS)-based proteomics data. Here, we summarize recent developments in the PRIDE database and related tools. First, we provide up-to-date statistics in data content, splitting the figures by groups of organisms and species, including peptide and protein identifications, and post-translational modifications. We then describe the tools that are part of the PRIDE submission pipeline, especially the recently developed PRIDE Converter 2 (new submission tool) and PRIDE Inspector (visualization and analysis tool). We also give an update about the integration of PRIDE with other MS proteomics resources in the context of the ProteomeXchange consortium. Finally, we briefly review the quality control efforts that are ongoing at present and outline our future plans.
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl.html) constitutes Europe's primary nucleotide sequence resource. Main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications. While automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO), the preferred submission tool for individual submitters is Webin (WWW). Through all stages, dataflow is monitored by EBI biologists communicating with the sequencing groups. In collaboration with DDBJ and GenBank the database is produced, maintained and distributed at the European Bioinformatics Institute (EBI). Database releases are produced quarterly and are distributed on CD-ROM. Network services allow access to the most up-to-date data collection via Internet and World Wide Web interface. EBI's Sequence Retrieval System (SRS) is a Network Browser for Databanks in Molecular Biology, integrating and linking the main nucleotide and protein databases, plus many specialised databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, Blast etc) are available for external users to compare their own sequences against the most currently available data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) incorporates, organizes and distributes nucleotide sequences from all available public sources. The database is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis to achieve optimal synchronization. Webin is the preferred web-based submission system for individual submitters, while automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, Email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases plus many other specialized molecular biology databases. For sequence similarity searching, a variety of tools (e.g. Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
The EMBL Nucleotide Sequence Database (aka EMBL-Bank; http://www.ebi.ac.uk/embl/) incorporates, organises and distributes nucleotide sequences from all available public sources. EMBL-Bank is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis. Major contributors to the EMBL database are individual scientists and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many other specialized databases. For sequence similarity searching, a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
The Yeast Metabolome Database (YMDB, http://www.ymdb.ca) is a richly annotated ‘metabolomic’ database containing detailed information about the metabolome of Saccharomyces cerevisiae. Modeled closely after the Human Metabolome Database, the YMDB contains >2000 metabolites with links to 995 different genes/proteins, including enzymes and transporters. The information in YMDB has been gathered from hundreds of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the YMDB also contains an extensive collection of experimental intracellular and extracellular metabolite concentration data compiled from detailed Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) metabolomic analyses performed in our lab. This is further supplemented with thousands of NMR and MS spectra collected on pure, reference yeast metabolites. Each metabolite entry in the YMDB contains an average of 80 separate data fields including comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, intracellular/extracellular concentrations, growth conditions and substrates, pathway information, enzyme data, gene/protein sequence data, as well as numerous hyperlinks to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided that support text, chemical structure, spectral, molecular weight and gene/protein sequence queries. Because of S. cervesiae's importance as a model organism for biologists and as a biofactory for industry, we believe this kind of database could have considerable appeal not only to metabolomics researchers, but also to yeast biologists, systems biologists, the industrial fermentation industry, as well as the beer, wine and spirit industry.
The Human Metabolome Database (HMDB, http://www.hmdb.ca) is a richly annotated resource that is designed to address the broad needs of biochemists, clinical chemists, physicians, medical geneticists, nutritionists and members of the metabolomics community. Since its first release in 2007, the HMDB has been used to facilitate the research for nearly 100 published studies in metabolomics, clinical biochemistry and systems biology. The most recent release of HMDB (version 2.0) has been significantly expanded and enhanced over the previous release (version 1.0). In particular, the number of fully annotated metabolite entries has grown from 2180 to more than 6800 (a 300% increase), while the number of metabolites with biofluid or tissue concentration data has grown by a factor of five (from 883 to 4413). Similarly, the number of purified compounds with reference to NMR, LC-MS and GC-MS spectra has more than doubled (from 380 to more than 790 compounds). In addition to this significant expansion in database size, many new database searching tools and new data content has been added or enhanced. These include better algorithms for spectral searching and matching, more powerful chemical substructure searches, faster text searching software, as well as dedicated pathway searching tools and customized, clickable metabolic maps. Changes to the user-interface have also been implemented to accommodate future expansion and to make database navigation much easier. These improvements should make the HMDB much more useful to a much wider community of users.
Despite wide-spread use of Nuclear Magnetic Resonance (NMR) in metabolomics for the analysis of biological samples there is a lack of graphically driven, publicly available software to process large one and two-dimensional NMR data sets for statistical analysis.
Here we present MetaboLab, a MATLAB based software package that facilitates NMR data processing by providing automated algorithms for processing series of spectra in a reproducible fashion. A graphical user interface provides easy access to all steps of data processing via a script builder to generate MATLAB scripts, providing an option to alter code manually. The analysis of two-dimensional spectra (1H,13C-HSQC spectra) is facilitated by the use of a spectral library derived from publicly available databases which can be extended readily. The software allows to display specific metabolites in small regions of interest where signals can be picked. To facilitate the analysis of series of two-dimensional spectra, different spectra can be overlaid and assignments can be transferred between spectra. The software includes mechanisms to account for overlapping signals by highlighting neighboring and ambiguous assignments.
The MetaboLab software is an integrated software package for NMR data processing and analysis, closely linked to the previously developed NMRLab software. It includes tools for batch processing and gives access to a wealth of algorithms available in the MATLAB framework. Algorithms within MetaboLab help to optimize the flow of metabolomics data preparation for statistical analysis. The combination of an intuitive graphical user interface along with advanced data processing algorithms facilitates the use of MetaboLab in a broader metabolomics context.
Improvements in the techniques for metabolomics analyses and growing interest in metabolomic approaches are resulting in the generation of increasing numbers of metabolomic profiles. Platforms are required for profile management, as a function of experimental design, and for metabolite identification, to facilitate the mining of the corresponding data. Various databases have been created, including organism-specific knowledgebases and analytical technique-specific spectral databases. However, there is currently no platform meeting the requirements for both profile management and metabolite identification for nuclear magnetic resonance (NMR) experiments.
MeRy-B, the first platform for plant 1H-NMR metabolomic profiles, is designed (i) to provide a knowledgebase of curated plant profiles and metabolites obtained by NMR, together with the corresponding experimental and analytical metadata, (ii) for queries and visualization of the data, (iii) to discriminate between profiles with spectrum visualization tools and statistical analysis, (iv) to facilitate compound identification. It contains lists of plant metabolites and unknown compounds, with information about experimental conditions, the factors studied and metabolite concentrations for several plant species, compiled from more than one thousand annotated NMR profiles for various organs or tissues.
MeRy-B manages all the data generated by NMR-based plant metabolomics experiments, from description of the biological source to identification of the metabolites and determinations of their concentrations. It is the first database allowing the display and overlay of NMR metabolomic profiles selected through queries on data or metadata. MeRy-B is available from http://www.cbib.u-bordeaux2.fr/MERYB/index.php.
Metabolomics is the methodology that identifies and measures global pools of small molecules (of less than about 1,000 Da) of a biological sample, which are collectively called the metabolome. Metabolomics can therefore reveal the metabolic outcome of a genetic or environmental perturbation of a metabolic regulatory network, and thus provide insights into the structure and regulation of that network. Because of the chemical complexity of the metabolome and limitations associated with individual analytical platforms for determining the metabolome, it is currently difficult to capture the complete metabolome of an organism or tissue, which is in contrast to genomics and transcriptomics. This paper describes the analysis of Arabidopsis metabolomics data sets acquired by a consortium that includes five analytical laboratories, bioinformaticists, and biostatisticians, which aims to develop and validate metabolomics as a hypothesis-generating functional genomics tool. The consortium is determining the metabolomes of Arabidopsis T-DNA mutant stocks, grown in standardized controlled environment optimized to minimize environmental impacts on the metabolomes. Metabolomics data were generated with seven analytical platforms, and the combined data is being provided to the research community to formulate initial hypotheses about genes of unknown function (GUFs). A public database (www.PlantMetabolomics.org) has been developed to provide the scientific community with access to the data along with tools to allow for its interactive analysis. Exemplary datasets are discussed to validate the approach, which illustrate how initial hypotheses can be generated from the consortium-produced metabolomics data, integrated with prior knowledge to provide a testable hypothesis concerning the functionality of GUFs.
Arabidopsis; metabolomics; gene annotation; functional genomics; database