The Genomes OnLine Database (GOLD) provides a centralized resource for the continuous monitoring of genome and metagenome sequencing projects worldwide, uniquely integrated with their associated metadata. Since its launch in 1997, GOLD has reached its fourth version (1–5). The number of registered sequencing projects has almost doubled since the publication of the previous report 2 years ago (5). As of September 2011, 11 472 projects have been registered, compared with 5843 in September 2009 (5), 2905 in September 2007 (4) and 1575 in September 2005 (3) (Figure 1A). This rapid growth is mainly attributable to falling costs driven by advances in sequencing technologies, which have spurred several large-scale microbial genome sequencing initiatives, such as the Human Microbiome Project (HMP; http://www.hmpdacc.org/) and the Genomic Encyclopedia of Bacteria and Archaea (GEBA; http://www.jgi.doe.gov/programs/GEBA/). During this period, GOLD has also expanded its scope beyond standard genomic and metagenomic projects to encompass data from the growing number of resequencing, transcriptome, metatranscriptome and single-cell sequencing projects.
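The registration counts above can be turned into fold changes and implied doubling times. The sketch below uses only the four September counts reported in the text and assumes constant exponential growth within each interval, which is a simplifying assumption rather than a model used by GOLD.

```python
import math

# Registered GOLD projects each September, as reported in the text.
PROJECT_COUNTS = {2005: 1575, 2007: 2905, 2009: 5843, 2011: 11472}


def growth_ratio(counts, start, end):
    """Fold change in registered projects between two years."""
    return counts[end] / counts[start]


def doubling_time(counts, start, end):
    """Doubling time in years, assuming constant exponential growth
    between `start` and `end` (a simplifying assumption)."""
    ratio = growth_ratio(counts, start, end)
    return (end - start) * math.log(2) / math.log(ratio)


if __name__ == "__main__":
    years = sorted(PROJECT_COUNTS)
    for start, end in zip(years, years[1:]):
        print(f"{start}->{end}: x{growth_ratio(PROJECT_COUNTS, start, end):.2f}, "
              f"doubling time {doubling_time(PROJECT_COUNTS, start, end):.2f} years")
    overall = doubling_time(PROJECT_COUNTS, 2005, 2011)
    print(f"2005->2011 overall doubling time: {overall:.2f} years")
```

Under this assumption, each 2-year interval yields a fold change close to 2, consistent with the "almost doubled" statement, and the implied doubling time stays at roughly 2 years across 2005–2011.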
Figure 1. Statistical information from GOLD data as of September 2011. (A) Evolution of the complete, incomplete and total number of projects monitored in GOLD. Genome projects in GOLD: 11472. (B) Evolution of the complete projects monitored in GOLD separated …
Among the most important developments of the database during the last 2 years are those driven by the growth of metadata and of metagenome projects. These include the implementation of GOLD-specific controlled vocabularies (CVs) for representing the associated metadata, developed in coordination with the Genomic Standards Consortium (GSC) (8) and compliant with its Minimum Information about any (x) Sequence (MIxS) specifications (9). Additionally, GOLD has adopted the canonical metagenome naming and standardized classification proposed in 2010 (10) for all metagenome projects. Finally, GOLD has placed emphasis on the rapidly advancing field of metagenomics by (i) increasing the number of metadata fields associated with metagenomic samples and grouping them into separate categories for metagenomic sample information, sequencing information, environment metadata and host metadata; (ii) presenting metagenome sample metadata in separate GOLD cards under a new GOLD ID marked with the ‘Gs’ prefix; (iii) providing separate tables listing metagenome sample data; and (iv) adding a new metagenome advanced search option to the ‘Search Gold’ page.
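To illustrate how the four metadata categories, the ‘Gs’-prefixed sample IDs and CV-constrained fields fit together, the sketch below models a metagenome sample record. It is a hypothetical illustration, not GOLD's actual schema: the field names, the ID pattern (‘Gs’ followed by digits) and the small habitat vocabulary are all assumptions made for this example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical ID pattern: 'Gs' prefix followed by one or more digits.
GS_ID_PATTERN = re.compile(r"Gs\d+")

# Hypothetical controlled vocabulary for a single field; not GOLD's real term list.
HABITAT_CV = {"soil", "marine", "freshwater", "host-associated"}


@dataclass
class MetagenomeSample:
    """Hypothetical metagenome sample record grouped into the four
    metadata categories described in the text."""
    gold_id: str
    sample_info: dict = field(default_factory=dict)
    sequencing_info: dict = field(default_factory=dict)
    environment_metadata: dict = field(default_factory=dict)
    host_metadata: dict = field(default_factory=dict)

    def validate(self) -> list:
        """Return a list of problems; an empty list means the record passes."""
        errors = []
        if not GS_ID_PATTERN.fullmatch(self.gold_id):
            errors.append(f"sample ID {self.gold_id!r} lacks the 'Gs' prefix")
        habitat = self.environment_metadata.get("habitat")
        if habitat is not None and habitat not in HABITAT_CV:
            errors.append(f"habitat {habitat!r} is not a controlled-vocabulary term")
        return errors


if __name__ == "__main__":
    # 'Gs0000001' is a made-up ID used only for illustration.
    sample = MetagenomeSample("Gs0000001", environment_metadata={"habitat": "soil"})
    print(sample.validate())
```

Keeping each category in its own mapping mirrors the separation described above, while checking values against a closed term set is the essential idea behind a controlled vocabulary: free-text values that fall outside the set are flagged rather than silently stored.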
As the rate at which new projects are launched continues to increase exponentially, monitoring and recording these projects together with their metadata has become a sine qua non for coordinating the genome sequencing community worldwide. Accordingly, accurate tracking of projects and metadata through GSC-compliant registration is strongly recommended.
Integrating genomic and metagenomic data with their associated metadata adds significant value to both and can support better-informed comparative analyses and biological interpretation of the sequence data. To this end, GOLD metadata are integrated into the Integrated Microbial Genomes (IMG) family of data management systems (11–13