For details of URLs, email addresses and other contact information, please refer to , where details of submission, retrieval and support access points to the EMBL Nucleotide Sequence Database are presented in full.
Submission and update
As the variety and volume of nucleotide data grow, the EMBL Nucleotide Sequence Database has successfully provided technical solutions to assist in sequence versioning, annotation and presentation to the community. Data are submitted to EMBL through project accounts or, increasingly, the direct submissions department. Project account submissions require extensive bioinformatics capabilities of the submitting team, while direct submissions call on expert in-house curation staff to assist in annotation and preparation of data for the database.
The core direct submissions tool, the web-based Webin, comprises a variety of submission procedures. These range from a smooth submission process for single, manually annotated sequences, through fasta format submission of large numbers of similarly annotated sequences (e.g. cytochrome oxidase genes in a barcoding study), to extensively annotated complete genome submissions.
Over the year, the number of data submitters has grown gently. However, as new functionality has been rolled out into Webin, the database has seen a significant increase in entries submitted through direct submissions ().
Increase in scale of direct submissions. Monthly newly submitted entry counts are shown.
For large-scale direct submissions, users start with the Webin tool and submit a representative sample entry from their dataset. They also detail which fields will vary between entries. Using the submission information and, in some cases, further communication with the submitter, a database curator offers the most appropriate means of submission for submitter and curator alike. This can involve the creation of web-based templates for completion, or an invitation to submit a single file in fasta or some other format.
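The idea of one representative entry plus a list of varying fields can be illustrated with a minimal Python sketch. It is not the Webin implementation: the header template, the field names (`isolate`, `organism`) and the sample records are hypothetical, chosen only to show how similarly annotated sequences might be expanded into a single fasta file.

```python
from string import Template

# Hypothetical header built from a representative sample entry;
# the $-placeholders mark the fields that vary between entries.
HEADER_TEMPLATE = Template("$isolate $organism cytochrome oxidase gene")


def to_fasta(records, template=HEADER_TEMPLATE, width=60):
    """Render records (varying fields plus 'sequence') as fasta text."""
    lines = []
    for record in records:
        fields = {k: v for k, v in record.items() if k != "sequence"}
        # substitute() raises KeyError if a varying field is missing
        lines.append(">" + template.substitute(fields))
        # Remove whitespace, upper-case, and wrap the sequence
        seq = "".join(record["sequence"].split()).upper()
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Made-up records with placeholder sequences, for illustration only
records = [
    {"isolate": "BC001", "organism": "Apis mellifera", "sequence": "atgc" * 20},
    {"isolate": "BC002", "organism": "Bombus terrestris", "sequence": "ggcc" * 10},
]
fasta_text = to_fasta(records)
print(fasta_text)
```

Keeping the shared annotation in one template and only the varying fields in each record means that a missing field is reported as an error before any file is sent, rather than producing an incomplete header.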
For large single entries that are likely to have extensive annotation, such as complete bacterial genomes or eukaryotic chromosomes, Webin users submit a ‘blank’ entry, including details of the source organism, submission and publication details and sequence. Following a curation step, submitters are invited to submit their annotation in a suitable format, such as an output file from Artemis (16).
The EMBL Nucleotide Sequence Database has continued to operate a 2 day turnaround for small direct submissions (<25 entries) and a 5 working day rule for large-scale submissions (>25 entries). We aim to provide accession numbers within these time frames, provided we have been supplied with all of the information we require to enter the data into the database.
Access points to EMBL Nucleotide Sequence Database data include SRS, homology search tools, the Sequence Version Archive (SVA) (17), the FTP site, the Genomes webserver (for completed genomes) and sequence retrieval by accession number (Dbfetch, Wsdbfetch and netserv) (6). These facilities offer a wide range of opportunities to search and download data. Assistance with the tools is available from the EBI toolbox help documentation, linked from toolbox pages, by clicking on support at the foot of many EBI web pages, or by contacting datasubs/at/ebi.ac.uk directly. The EBI also offers bioinformatics educational resources at the 2can site (see for all URLs).
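Retrieval by accession number can be sketched as building a Dbfetch request URL. This is a minimal Python sketch under stated assumptions: the endpoint address and the `db`, `id`, `format` and `style` parameters are not given in this text and should be checked against the current EBI documentation, and the accession numbers are examples chosen only for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint, not stated in the text; verify before use.
DBFETCH_BASE = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def dbfetch_url(accessions, db="embl", fmt="fasta", style="raw"):
    """Build a Dbfetch URL for one or more accession numbers."""
    if isinstance(accessions, str):
        accessions = [accessions]
    # Drop empty entries and join the rest with commas
    ids = ",".join(a.strip() for a in accessions if a.strip())
    if not ids:
        raise ValueError("at least one accession number is required")
    params = {"db": db, "id": ids, "format": fmt, "style": style}
    # safe="," keeps the comma separator readable in the query string
    return DBFETCH_BASE + "?" + urlencode(params, safe=",")


# Example accession numbers, for illustration only
url = dbfetch_url(["X56734", "J00231"])
print(url)
```

The resulting URL could then be opened with `urllib.request.urlopen(url)` or pasted into a browser. The sketch stops at URL construction so that it runs without network access.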
SRS presents EMBL Nucleotide Sequence Database data in the form of a number of component libraries. For many uses, the virtual library, EMBL, is sufficient and is set by default for quick searches. EMBL libraries represented in SRS are shown in . Assistance with SRS is available at the SRS help centre and by email (see ). Specific help with EMBL data is also available from datasubs/at/ebi.ac.uk.
While the database access tools available are suitable for the majority of uses of the data, there are some users who have unusual queries to run on the data, or require presentations of the data that are unavailable through the tools offered. Typically, these users lack the bioinformatics resources required to download whole datasets from the FTP server and implement their own database for analysis, but have extensive data manipulation to perform. In many of these cases, the database is able to provide custom datasets to assist the user. Please send custom dataset requests to datasubs/at/ebi.ac.uk and we will be able to advise.
As a key service provider, we operate extensive helpdesk facilities, where users, both submitters and data readers, are able to resolve issues that relate to the EMBL Nucleotide Sequence Database, its data and tools. Moreover, helpdesk functions are integrated with the broader EBI support operations, where holistic solutions are provided by those at the working end of the resources concerned. The helpdesk team can be contacted at datasubs/at/ebi.ac.uk.
Staff members at the EMBL Nucleotide Sequence Database are able to assist users of the database with problems ranging from data format for submission and the submission process through to searching, downloading and making sense of the data. All queries are welcome and we aim to respond rapidly where possible. Please provide any accession numbers or submission identifiers that we have supplied, to help us deal with the issue quickly.
Announcements are posted as appropriate on the EMBL news page and changes to database format and contents are announced on the Forthcoming Changes page (see for all URLs).