Related Articles
Cochrane, Guy | Akhtar, Ruth | Bonfield, James | Bower, Lawrence | Demiralp, Fehmi | Faruque, Nadeem | Gibson, Richard | Hoad, Gemma | Hubbard, Tim | Hunter, Christopher | Jang, Mikyung | Juhos, Szilveszter | Leinonen, Rasko | Leonard, Steven | Lin, Quan | Lopez, Rodrigo | Lorenc, Dariusz | McWilliam, Hamish | Mukherjee, Gaurab | Plaister, Sheila | Radhakrishnan, Rajesh | Robinson, Stephen | Sobhany, Siamak | Hoopen, Petra Ten | Vaughan, Robert | Zalunin, Vadim | Birney, Ewan
Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence repositories, which provide the feed for these data sets into the worldwide computational infrastructure, are challenged by the impact of these data volumes. The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/embl), comprising the EMBL Nucleotide Sequence Database and the Ensembl Trace Archive, has identified challenges in the storage, movement, analysis, interpretation and visualization of petabyte-scale data sets. We present here our new repository for next generation sequence data, a brief summary of contents of the ENA and provide details of major developments to submission pipelines, high-throughput rule-based validation infrastructure and data integration approaches.
doi:10.1093/nar/gkn765
PMCID: PMC2686451
PMID: 18978013
Leinonen, Rasko | Akhtar, Ruth | Birney, Ewan | Bower, Lawrence | Cerdeno-Tárraga, Ana | Cheng, Ying | Cleland, Iain | Faruque, Nadeem | Goodgame, Neil | Gibson, Richard | Hoad, Gemma | Jang, Mikyung | Pakseresht, Nima | Plaister, Sheila | Radhakrishnan, Rajesh | Reddy, Kethi | Sobhany, Siamak | Ten Hoopen, Petra | Vaughan, Robert | Zalunin, Vadim | Cochrane, Guy
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe’s primary nucleotide-sequence repository. The ENA consists of three main databases: the Sequence Read Archive (SRA), the Trace Archive and EMBL-Bank. The objective of ENA is to support and promote the use of nucleotide sequencing as an experimental research platform by providing data submission, archive, search and download services. In this article, we outline these services and describe major changes and improvements introduced during 2010. These include extended EMBL-Bank and SRA-data submission services, extended ENA Browser functionality, support for submitting data to the European Genome-phenome Archive (EGA) through SRA, and the launch of a new sequence similarity search service.
doi:10.1093/nar/gkq967
PMCID: PMC3013801
PMID: 20972220
Leinonen, Rasko | Akhtar, Ruth | Birney, Ewan | Bonfield, James | Bower, Lawrence | Corbett, Matt | Cheng, Ying | Demiralp, Fehmi | Faruque, Nadeem | Goodgame, Neil | Gibson, Richard | Hoad, Gemma | Hunter, Christopher | Jang, Mikyung | Leonard, Steven | Lin, Quan | Lopez, Rodrigo | Maguire, Michael | McWilliam, Hamish | Plaister, Sheila | Radhakrishnan, Rajesh | Sobhany, Siamak | Slater, Guy | Ten Hoopen, Petra | Valentin, Franck | Vaughan, Robert | Zalunin, Vadim | Zerbino, Daniel | Cochrane, Guy
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe’s primary nucleotide sequence archival resource, safeguarding open nucleotide data access, engaging in worldwide collaborative data exchange and integrating with the scientific publication process. ENA has made significant contributions to the collaborative nucleotide archival arena as an active proponent of extending the traditional collaboration to cover capillary and next-generation sequencing information. We have continued to co-develop data and metadata representation formats with our collaborators for both data exchange and public data dissemination. In addition to the DDBJ/EMBL/GenBank feature table format, we share metadata formats for capillary and next-generation sequencing traces and are using and contributing to the NCBI SRA Toolkit for the long-term storage of the next-generation sequence traces. During the course of 2009, ENA has significantly improved sequence submission, search and access functionalities provided at EMBL–EBI. In this article, we briefly describe the content and scope of our archive and introduce major improvements to our services.
doi:10.1093/nar/gkp998
PMCID: PMC2808951
PMID: 19906712
Kulikova, Tamara | Aldebert, Philippe | Althorpe, Nicola | Baker, Wendy | Bates, Kirsty | Browne, Paul | van den Broek, Alexandra | Cochrane, Guy | Duggan, Karyn | Eberhardt, Ruth | Faruque, Nadeem | Garcia-Pastor, Maria | Harte, Nicola | Kanz, Carola | Leinonen, Rasko | Lin, Quan | Lombard, Vincent | Lopez, Rodrigo | Mancuso, Renato | McHale, Michelle | Nardone, Francesco | Silventoinen, Ville | Stoehr, Peter | Stoesser, Guenter | Tuli, Mary Ann | Tzouvara, Katerina | Vaughan, Robert | Wu, Dan | Zhu, Weimin | Apweiler, Rolf
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), maintained at the European Bioinformatics Institute (EBI), incorporates, organizes and distributes nucleotide sequences from public sources. The database is a part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged between the collaborating databases on a daily basis to achieve optimal synchrony. The web-based tool, Webin, is the preferred system for individual submission of nucleotide sequences, including Third Party Annotation (TPA) and alignment data. Automatic submission procedures are used for submission of data from large-scale genome sequencing centres and from the European Patent Office. Database releases are produced quarterly. The latest data collection can be accessed via FTP, email and WWW interfaces. The EBI’s Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases as well as many other specialist molecular biology databases. For sequence similarity searching, a variety of tools (e.g. FASTA and BLAST) are available that allow external users to compare their own sequences against the data in the EMBL Nucleotide Sequence Database, the complete genomic component subsection of the database, the WGS data sets and other databases. All available resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
doi:10.1093/nar/gkh120
PMCID: PMC308854
PMID: 14681351
Next generation sequencing platforms are producing biological sequencing data in unprecedented amounts. The partners of the International Nucleotide Sequencing Database Collaboration, which includes the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ), have established the Sequence Read Archive (SRA) to provide the scientific community with an archival destination for next generation data sets. The SRA is now accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://www.ddbj.nig.ac.jp/sub/trace_sra-e.html from DDBJ. Users of these resources can obtain data sets deposited in any of the three SRA instances. Links and submission instructions are provided.
doi:10.1093/nar/gkp1078
PMCID: PMC2808927
PMID: 19965774
Kanz, Carola | Aldebert, Philippe | Althorpe, Nicola | Baker, Wendy | Baldwin, Alastair | Bates, Kirsty | Browne, Paul | van den Broek, Alexandra | Castro, Matias | Cochrane, Guy | Duggan, Karyn | Eberhardt, Ruth | Faruque, Nadeem | Gamble, John | Diez, Federico Garcia | Harte, Nicola | Kulikova, Tamara | Lin, Quan | Lombard, Vincent | Lopez, Rodrigo | Mancuso, Renato | McHale, Michelle | Nardone, Francesco | Silventoinen, Ville | Sobhany, Siamak | Stoehr, Peter | Tuli, Mary Ann | Tzouvara, Katerina | Vaughan, Robert | Wu, Dan | Zhu, Weimin | Apweiler, Rolf
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl), maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK, is a comprehensive collection of nucleotide sequences and annotation from available public sources. The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged daily between the collaborating institutes to achieve swift synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation (TPA) and alignments. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. New and updated data records are distributed daily and the whole EMBL Nucleotide Sequence Database is released four times a year. Access to the sequence data is provided via ftp and several WWW interfaces. With the web-based Sequence Retrieval System (SRS) it is also possible to link nucleotide data to other specialist molecular biology databases maintained at the EBI. Other tools are available for sequence similarity searching (e.g. FASTA and BLAST). Changes over the past year include the removal of the sequence length limit, the launch of the EMBLCDSs dataset, extension of the Sequence Version Archive functionality and the revision of quality rules for TPA data.
doi:10.1093/nar/gki098
PMCID: PMC540052
PMID: 15608199
The combination of significantly lower cost and increased speed of sequencing has resulted in an explosive growth of data submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). The preservation of experimental data is an important part of the scientific record, and increasing numbers of journals and funding agencies require that next-generation sequence data are deposited into the SRA. The SRA was established as a public repository for the next-generation sequence data and is operated by the International Nucleotide Sequence Database Collaboration (INSDC). INSDC partners include the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA, detail our support for sequencing platforms and provide recommended data submission levels and formats. We also briefly outline our response to the challenge of data growth.
doi:10.1093/nar/gkq1019
PMCID: PMC3013647
PMID: 21062823
Stoesser, Guenter | Baker, Wendy | van den Broek, Alexandra | Camon, Evelyn | Garcia-Pastor, Maria | Kanz, Carola | Kulikova, Tamara | Lombard, Vincent | Lopez, Rodrigo | Parkinson, Helen | Redaschi, Nicole | Sterk, Peter | Stoehr, Peter | Tuli, Mary Ann
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
PMCID: PMC29766
PMID: 11125039
The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: the ‘DDBJ Omics Archive’ (DOR; http://trace.ddbj.nig.ac.jp/dor) and BioProject (http://trace.ddbj.nig.ac.jp/bioproject). DOR is an archival database of functional genomics data generated by microarray and highly parallel new generation sequencers. Data are exchanged between the ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework to access metadata about research projects and the data from the projects that are deposited into different databases. In this article, we describe major changes and improvements introduced to the DDBJ services, and the launch of two new resources: DOR and BioProject.
doi:10.1093/nar/gkr994
PMCID: PMC3244990
PMID: 22110025
Amid, Clara | Birney, Ewan | Bower, Lawrence | Cerdeño-Tárraga, Ana | Cheng, Ying | Cleland, Iain | Faruque, Nadeem | Gibson, Richard | Goodgame, Neil | Hunter, Christopher | Jang, Mikyung | Leinonen, Rasko | Liu, Xin | Oisel, Arnaud | Pakseresht, Nima | Plaister, Sheila | Radhakrishnan, Rajesh | Reddy, Kethi | Rivière, Stephane | Rossello, Marc | Senf, Alexander | Smirnov, Dimitriy | Ten Hoopen, Petra | Vaughan, Daniel | Vaughan, Robert | Zalunin, Vadim | Cochrane, Guy
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena), Europe's primary nucleotide sequence resource, captures and presents globally comprehensive nucleic acid sequence and associated information. Covering the spectrum from raw data to assembled and functionally annotated genomes, the ENA has witnessed a dramatic growth resulting from advances in sequencing technology and ever broadening application of the methodology. During 2011, we have continued to operate and extend the broad range of ENA services. In particular, we have released major new functionality in our interactive web submission system, Webin, through developments in template-based submissions for annotated sequences and support for raw next-generation sequence read submissions.
doi:10.1093/nar/gkr946
PMCID: PMC3245037
PMID: 22080548
Stoesser, Guenter | Baker, Wendy | van den Broek, Alexandra | Camon, Evelyn | Garcia-Pastor, Maria | Kanz, Carola | Kulikova, Tamara | Leinonen, Rasko | Lin, Quan | Lombard, Vincent | Lopez, Rodrigo | Redaschi, Nicole | Stoehr, Peter | Tuli, Mary Ann | Tzouvara, Katerina | Vaughan, Robert
The EMBL Nucleotide Sequence Database (aka EMBL-Bank; http://www.ebi.ac.uk/embl/) incorporates, organises and distributes nucleotide sequences from all available public sources. EMBL-Bank is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis. Major contributors to the EMBL database are individual scientists and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many other specialized databases. For sequence similarity searching, a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
PMCID: PMC99098
PMID: 11752244
Stoesser, Guenter | Baker, Wendy | van den Broek, Alexandra | Garcia-Pastor, Maria | Kanz, Carola | Kulikova, Tamara | Leinonen, Rasko | Lin, Quan | Lombard, Vincent | Lopez, Rodrigo | Mancuso, Renato | Nardone, Francesco | Stoehr, Peter | Tuli, Mary Ann | Tzouvara, Katerina | Vaughan, Robert
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) incorporates, organizes and distributes nucleotide sequences from all available public sources. The database is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis to achieve optimal synchronization. Webin is the preferred web-based submission system for individual submitters, while automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, Email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases plus many other specialized molecular biology databases. For sequence similarity searching, a variety of tools (e.g. Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
PMCID: PMC165468
PMID: 12519939
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl.html) constitutes Europe's primary nucleotide sequence resource. Main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications. While automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO), the preferred submission tool for individual submitters is Webin (WWW). Through all stages, dataflow is monitored by EBI biologists communicating with the sequencing groups. In collaboration with DDBJ and GenBank the database is produced, maintained and distributed at the European Bioinformatics Institute (EBI). Database releases are produced quarterly and are distributed on CD-ROM. Network services allow access to the most up-to-date data collection via Internet and World Wide Web interface. EBI's Sequence Retrieval System (SRS) is a Network Browser for Databanks in Molecular Biology, integrating and linking the main nucleotide and protein databases, plus many specialised databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, Blast etc) are available for external users to compare their own sequences against the most currently available data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
PMCID: PMC148088
PMID: 9847133
The European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/index.html) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank (USA). Data is exchanged amongst the collaborative databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. WEBIN is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via Internet and WWW interfaces. EBI’s Sequence Retrieval System (SRS) is a network browser for databanks in molecular biology, integrating and linking the main nucleotide and protein databases plus many specialised databases. For sequence similarity searching a variety of tools (e.g., BLITZ, FASTA, BLAST) are available which allow external users to compare their own sequences against the most currently available data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
PMCID: PMC102461
PMID: 10592171
New generation sequencing platforms are producing data with significantly higher throughput and lower cost. A portion of this capacity is devoted to individual and community scientific projects. As these projects reach publication, raw sequencing datasets are submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). Archiving experimental data is the key to the progress of reproducible science. The SRA was established as a public repository for next-generation sequence data as a part of the International Nucleotide Sequence Database Collaboration (INSDC). INSDC is composed of the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at www.ncbi.nlm.nih.gov/sra from NCBI, at www.ebi.ac.uk/ena from EBI and at trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA and report on updated metadata structures, submission file formats and supported sequencing platforms. We also briefly outline our various responses to the challenge of explosive data growth.
doi:10.1093/nar/gkr854
PMCID: PMC3245110
PMID: 22009675
The wide uptake of next-generation sequencing and other ultra-high throughput technologies by life scientists with a diverse range of interests, spanning fundamental biological research, medicine, agriculture and environmental science, has led to unprecedented growth in the amount of data generated. It has also put the need for unrestricted access to biological data at the centre of biology. The European Bioinformatics Institute (EMBL-EBI) is unique in Europe and is one of only two organisations worldwide providing access to a comprehensive, integrated set of these collections. Here, we describe how the EMBL-EBI’s biomolecular databases are evolving to cope with increasing levels of submission, a growing and diversifying user base, and the demand for new types of data. All of the resources described here can be accessed from the EMBL-EBI website: http://www.ebi.ac.uk
doi:10.1093/nar/gkp986
PMCID: PMC2808956
PMID: 19934258
Kersey, Paul | Bower, Lawrence | Morris, Lorna | Horne, Alan | Petryszak, Robert | Kanz, Carola | Kanapin, Alexander | Das, Ujjwal | Michoud, Karine | Phan, Isabelle | Gattiker, Alexandre | Kulikova, Tamara | Faruque, Nadeem | Duggan, Karyn | Mclaren, Peter | Reimholz, Britt | Duret, Laurent | Penel, Simon | Reuter, Ingmar | Apweiler, Rolf
Integr8 is a new web portal for exploring the biology of organisms with completely deciphered genomes. For over 190 species, Integr8 provides access to general information, recent publications, and a detailed statistical overview of the genome and proteome of the organism. The preparation of this analysis is supported through Genome Reviews, a new database of bacterial and archaeal DNA sequences in which annotation has been upgraded (compared to the original submission) through the integration of data from many sources, including the EMBL Nucleotide Sequence Database, the UniProt Knowledgebase, InterPro, CluSTr, GOA and HOGENOM. Integr8 also allows the users to customize their own interactive analysis, and to download both customized and prepared datasets for their own use. Integr8 is available at http://www.ebi.ac.uk/integr8.
doi:10.1093/nar/gki039
PMCID: PMC539993
PMID: 15608201
The mission of the European Bioinformatics Institute (EBI), an outstation of the European Molecular Biology Laboratory (EMBL) in Heidelberg, is to ensure that the growing body of information from molecular biology and genome research is placed in the public domain and is accessible freely to all parts of the scientific community in ways that promote scientific progress. To fulfil this mission, the EBI provides a wide variety of free, publicly available bioinformatics services. These can be divided into data submissions processing; access to query, analysis and retrieval systems and tools; ftp downloads of software and databases; training and education and user support. All of these services are available at the EBI website: http://www.ebi.ac.uk/services. This paper provides a detailed introduction to the interactive analysis systems that are available from the EBI and a brief introduction to other, related services.
doi:10.1093/nar/gkh405
PMCID: PMC441543
PMID: 15215339
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl.html) constitutes Europe's primary nucleotide sequence resource. DNA and RNA sequences are directly submitted from researchers and genome sequencing groups and collected from the scientific literature and patent applications (Fig. 1). In collaboration with DDBJ and GenBank the database is produced, maintained and distributed at the European Bioinformatics Institute. Database releases are produced quarterly and are distributed on CD-ROM. EBI's network services allow access to the most up-to-date data collection via Internet and World Wide Web interface, providing database searching and sequence similarity facilities plus access to a large number of additional databases.
PMCID: PMC147241
PMID: 9399791
Kapushesky, Misha | Adamusiak, Tomasz | Burdett, Tony | Culhane, Aedin | Farne, Anna | Filippov, Alexey | Holloway, Ele | Klebanov, Andrey | Kryvych, Nataliya | Kurbatova, Natalja | Kurnosov, Pavel | Malone, James | Melnichuk, Olga | Petryszak, Robert | Pultsin, Nikolay | Rustici, Gabriella | Tikhonov, Andrew | Travillian, Ravensara S. | Williams, Eleanor | Zorin, Andrey | Parkinson, Helen | Brazma, Alvis
Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive and the European Nucleotide Archive. A simple interface allows the user to query for differential gene expression either by gene names or attributes or by biological conditions, e.g. diseases, organism parts or cell types. Since our previous report we have made 20 monthly releases and, as of Release 11.08 (August 2011), the database supports 19 species and contains expression data measured for 19 014 biological conditions in 136 551 assays from 5598 independent studies.
doi:10.1093/nar/gkr913
PMCID: PMC3245177
PMID: 22064864
Cochrane, Guy | Aldebert, Philippe | Althorpe, Nicola | Andersson, Mikael | Baker, Wendy | Baldwin, Alastair | Bates, Kirsty | Bhattacharyya, Sumit | Browne, Paul | van den Broek, Alexandra | Castro, Matias | Duggan, Karyn | Eberhardt, Ruth | Faruque, Nadeem | Gamble, John | Kanz, Carola | Kulikova, Tamara | Lee, Charles | Leinonen, Rasko | Lin, Quan | Lombard, Vincent | Lopez, Rodrigo | McHale, Michelle | McWilliam, Hamish | Mukherjee, Gaurab | Nardone, Francesco | Pastor, Maria Pilar Garcia | Sobhany, Siamak | Stoehr, Peter | Tzouvara, Katerina | Vaughan, Robert | Wu, Dan | Zhu, Weimin | Apweiler, Rolf
The EMBL Nucleotide Sequence Database () at the EMBL European Bioinformatics Institute, UK, offers a comprehensive set of publicly available nucleotide sequence and annotation, freely accessible to all. Maintained in collaboration with partners DDBJ and GenBank, coverage includes whole genome sequencing project data, directly submitted sequence, sequence recorded in support of patent applications and much more. The database continues to offer submission tools, data retrieval facilities and user support. In 2005, the volume of data offered has continued to grow exponentially. In addition to the newly presented data, the database encompasses a range of new data types generated by novel technologies, offers enhanced presentation and searchability of the data and has greater integration with other data resources offered at the EBI and elsewhere. In stride with these developing data types, the database has continued to develop submission and retrieval tools to maximise the information content of submitted data and to offer the simplest possible submission routes for data producers. New developments, the submission process, data retrieval and access to support are presented in this paper, along with links to sources of further information.
doi:10.1093/nar/gkj130
PMCID: PMC1347492
PMID: 16381823
Kaminuma, Eli | Kosuge, Takehide | Kodama, Yuichi | Aono, Hideo | Mashima, Jun | Gojobori, Takashi | Sugawara, Hideaki | Ogasawara, Osamu | Takagi, Toshihisa | Okubo, Kousaku | Nakamura, Yasukazu
The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) provides a nucleotide sequence archive database and accompanying database tools for sequence submission, entry retrieval and annotation analysis. The DDBJ collected and released 3 637 446 entries/2 272 231 889 bases between July 2009 and June 2010. A highlight of the released data was archive datasets from next-generation sequencing reads of Japanese rice cultivar, Koshihikari submitted by the National Institute of Agrobiological Sciences. In this period, we started a new archive for quantitative genomics data, the DDBJ Omics aRchive (DOR). The DOR stores quantitative data both from the microarray and high-throughput new sequencing platforms. Moreover, we improved the content of the DDBJ patent sequence, released a new submission tool of the DDBJ Sequence Read Archive (DRA) which archives massive raw sequencing reads, and enhanced a cloud computing-based analytical system from sequencing reads, the DDBJ Read Annotation Pipeline. In this article, we describe these new functions of the DDBJ databases and support tools.
doi:10.1093/nar/gkq1041
PMCID: PMC3013661
PMID: 21062814
The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org), one of the longest-standing global alliances of biological data archives, captures, preserves and provides comprehensive public domain nucleotide sequence information. Three partners of the INSDC work in cooperation to establish formats for data and metadata and protocols that facilitate reliable data submission to their databases and support continual data exchange around the world. In this article, the current status of the INSDC and its update for 2012 are presented. Of the items discussed at the international collaboration meeting in 2012, the BioSample database and changes in submission are described.
doi:10.1093/nar/gks1084
PMCID: PMC3531182
PMID: 23180798
Camon, Evelyn | Magrane, Michele | Barrell, Daniel | Lee, Vivian | Dimmer, Emily | Maslen, John | Binns, David | Harte, Nicola | Lopez, Rodrigo | Apweiler, Rolf
The Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA) aims to provide high-quality electronic and manual annotations to the UniProt Knowledgebase (Swiss-Prot, TrEMBL and PIR-PSD) using the standardized vocabulary of the Gene Ontology (GO). As a supplementary archive of GO annotation, GOA promotes a high level of integration of the knowledge represented in UniProt with other databases. This is achieved by converting UniProt annotation into a recognized computational format. GOA provides annotated entries for nearly 60 000 species (GOA-SPTr) and is the largest and most comprehensive open-source contributor of annotations to the GO Consortium annotation effort. By integrating GO annotations from other model organism groups, GOA consolidates specialized knowledge and expertise to ensure the data remain a key reference for up-to-date biological information. Furthermore, the GOA database fully endorses the Human Proteomics Initiative by prioritizing the annotation of proteins likely to benefit human health and disease. In addition to a non-redundant set of annotations to the human proteome (GOA-Human) and monthly releases of its GO annotation for all species (GOA-SPTr), a series of GO mapping files and specific cross-references in other databases are also regularly distributed. GOA can be queried through a simple user-friendly web interface or downloaded in a parsable format via the EBI and GO FTP websites. The GOA data set can be used to enhance the annotation of particular model organism or gene expression data sets, although increasingly it has been used to evaluate GO predictions generated from text mining or protein interaction experiments. In 2004, the GOA team will build on its success and will continue to supplement the functional annotation of UniProt and work towards enhancing the ability of scientists to access all available biological information. 
Researchers wishing to query or contribute to the GOA project are encouraged to email: goa@ebi.ac.uk.
doi:10.1093/nar/gkh021
PMCID: PMC308756
PMID: 14681408
Brooksbank, Catherine | Camon, Evelyn | Harris, Midori A. | Magrane, Michele | Martin, Maria Jesus | Mulder, Nicola | O'Donovan, Claire | Parkinson, Helen | Tuli, Mary Ann | Apweiler, Rolf | Birney, Ewan | Brazma, Alvis | Henrick, Kim | Lopez, Rodrigo | Stoesser, Guenter | Stoehr, Peter | Cameron, Graham
As the amount of biological data grows, so does the need for biologists to store and access this information in central repositories in a free and unambiguous manner. The European Bioinformatics Institute (EBI) hosts six core databases, which store information on DNA sequences (EMBL-Bank), protein sequences (SWISS-PROT and TrEMBL), protein structure (MSD), whole genomes (Ensembl) and gene expression (ArrayExpress). But just as a cell would be useless if it couldn't transcribe DNA or translate RNA, our resources would be compromised if each existed in isolation. We have therefore developed a range of tools that not only facilitate the deposition and retrieval of biological information, but also allow users to carry out searches that reflect the interconnectedness of biological information. The EBI's databases and tools are all available on our website at www.ebi.ac.uk.
PMCID: PMC165513
PMID: 12519944