PMCC PMCC

Search tips
Search criteria

Advanced
Results 1-25 (851469)

Clipboard (0)
None

Related Articles

1.  The sequence read archive: explosive growth of sequencing data 
Nucleic Acids Research  2011;40(D1):D54-D56.
New generation sequencing platforms are producing data with significantly higher throughput and lower cost. A portion of this capacity is devoted to individual and community scientific projects. As these projects reach publication, raw sequencing datasets are submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). Archiving experimental data is the key to the progress of reproducible science. The SRA was established as a public repository for next-generation sequence data as a part of the International Nucleotide Sequence Database Collaboration (INSDC). INSDC is composed of the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at www.ncbi.nlm.nih.gov/sra from NCBI, at www.ebi.ac.uk/ena from EBI and at trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA and report on updated metadata structures, submission file formats and supported sequencing platforms. We also briefly outline our various responses to the challenge of explosive data growth.
doi:10.1093/nar/gkr854
PMCID: PMC3245110  PMID: 22009675
2.  The Sequence Read Archive 
Nucleic Acids Research  2010;39(Database issue):D19-D21.
The combination of significantly lower cost and increased speed of sequencing has resulted in an explosive growth of data submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). The preservation of experimental data is an important part of the scientific record, and increasing numbers of journals and funding agencies require that next-generation sequence data are deposited into the SRA. The SRA was established as a public repository for the next-generation sequence data and is operated by the International Nucleotide Sequence Database Collaboration (INSDC). INSDC partners include the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA, detail our support for sequencing platforms and provide recommended data submission levels and formats. We also briefly outline our response to the challenge of data growth.
doi:10.1093/nar/gkq1019
PMCID: PMC3013647  PMID: 21062823
3.  The European Nucleotide Archive 
Nucleic Acids Research  2010;39(Database issue):D28-D31.
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe’s primary nucleotide-sequence repository. The ENA consists of three main databases: the Sequence Read Archive (SRA), the Trace Archive and EMBL-Bank. The objective of ENA is to support and promote the use of nucleotide sequencing as an experimental research platform by providing data submission, archive, search and download services. In this article, we outline these services and describe major changes and improvements introduced during 2010. These include extended EMBL-Bank and SRA-data submission services, extended ENA Browser functionality, support for submitting data to the European Genome-phenome Archive (EGA) through SRA, and the launch of a new sequence similarity search service.
doi:10.1093/nar/gkq967
PMCID: PMC3013801  PMID: 20972220
4.  Archiving next generation sequencing data 
Nucleic Acids Research  2009;38(Database issue):D870-D871.
Next generation sequencing platforms are producing biological sequencing data in unprecedented amounts. The partners of the International Nucleotide Sequencing Database Collaboration, which includes the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ), have established the Sequence Read Archive (SRA) to provide the scientific community with an archival destination for next generation data sets. The SRA is now accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://www.ddbj.nig.ac.jp/sub/trace_sra-e.html from DDBJ. Users of these resources can obtain data sets deposited in any of the three SRA instances. Links and submission instructions are provided.
doi:10.1093/nar/gkp1078
PMCID: PMC2808927  PMID: 19965774
5.  The DNA Data Bank of Japan launches a new resource, the DDBJ Omics Archive of functional genomics experiments 
Nucleic Acids Research  2011;40(D1):D38-D42.
The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: the ‘DDBJ Omics Archive’ (DOR; http://trace.ddbj.nig.ac.jp/dor) and BioProject (http://trace.ddbj.nig.ac.jp/bioproject). DOR is an archival database of functional genomics data generated by microarray and highly parallel new generation sequencers. Data are exchanged between the ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework to access metadata about research projects and the data from the projects that are deposited into different databases. In this article, we describe major changes and improvements introduced to the DDBJ services, and the launch of two new resources: DOR and BioProject.
doi:10.1093/nar/gkr994
PMCID: PMC3244990  PMID: 22110025
6.  The EMBL Nucleotide Sequence Database 
Nucleic Acids Research  2002;30(1):21-26.
The EMBL Nucleotide Sequence Database (aka EMBL-Bank; http://www.ebi.ac.uk/embl/) incorporates, organises and distributes nucleotide sequences from all available public sources. EMBL-Bank is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis. Major contributors to the EMBL database are individual scientists and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many other specialized databases. For sequence similarity searching, a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
PMCID: PMC99098  PMID: 11752244
7.  The EMBL Nucleotide Sequence Database: major new developments 
Nucleic Acids Research  2003;31(1):17-22.
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) incorporates, organizes and distributes nucleotide sequences from all available public sources. The database is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis to achieve optimal synchronization. Webin is the preferred web-based submission system for individual submitters, while automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, Email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases plus many other specialized molecular biology databases. For sequence similarity searching, a variety of tools (e.g. Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
PMCID: PMC165468  PMID: 12519939
8.  Petabyte-scale innovations at the European Nucleotide Archive 
Nucleic Acids Research  2008;37(Database issue):D19-D25.
Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence repositories, which provide the feed for these data sets into the worldwide computational infrastructure, are challenged by the impact of these data volumes. The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/embl), comprising the EMBL Nucleotide Sequence Database and the Ensembl Trace Archive, has identified challenges in the storage, movement, analysis, interpretation and visualization of petabyte-scale data sets. We present here our new repository for next generation sequence data, a brief summary of contents of the ENA and provide details of major developments to submission pipelines, high-throughput rule-based validation infrastructure and data integration approaches.
doi:10.1093/nar/gkn765
PMCID: PMC2686451  PMID: 18978013
9.  The EMBL Nucleotide Sequence Database 
Nucleic Acids Research  2004;32(Database issue):D27-D30.
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), maintained at the European Bioinformatics Institute (EBI), incorporates, organizes and distributes nucleotide sequences from public sources. The database is a part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged between the collaborating databases on a daily basis to achieve optimal synchrony. The web-based tool, Webin, is the preferred system for individual submission of nucleotide sequences, including Third Party Annotation (TPA) and alignment data. Automatic submission procedures are used for submission of data from large-scale genome sequencing centres and from the European Patent Office. Database releases are produced quarterly. The latest data collection can be accessed via FTP, email and WWW interfaces. The EBI’s Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases as well as many other specialist molecular biology databases. For sequence similarity searching, a variety of tools (e.g. FASTA and BLAST) are available that allow external users to compare their own sequences against the data in the EMBL Nucleotide Sequence Database, the complete genomic component subsection of the database, the WGS data sets and other databases. All available resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
doi:10.1093/nar/gkh120
PMCID: PMC308854  PMID: 14681351
10.  Major submissions tool developments at the European nucleotide archive 
Nucleic Acids Research  2011;40(D1):D43-D47.
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena), Europe's primary nucleotide sequence resource, captures and presents globally comprehensive nucleic acid sequence and associated information. Covering the spectrum from raw data to assembled and functionally annotated genomes, the ENA has witnessed a dramatic growth resulting from advances in sequencing technology and ever broadening application of the methodology. During 2011, we have continued to operate and extend the broad range of ENA services. In particular, we have released major new functionality in our interactive web submission system, Webin, through developments in template-based submissions for annotated sequences and support for raw next-generation sequence read submissions.
doi:10.1093/nar/gkr946
PMCID: PMC3245037  PMID: 22080548
11.  The EMBL nucleotide sequence database 
Nucleic Acids Research  2001;29(1):17-21.
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
PMCID: PMC29766  PMID: 11125039
12.  The EMBL Nucleotide Sequence Database. 
Nucleic Acids Research  1999;27(1):18-24.
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl.html) constitutes Europe's primary nucleotide sequence resource. Main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications. While automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO), the preferred submission tool for individual submitters is Webin (WWW). Through all stages, dataflow is monitored by EBI biologists communicating with the sequencing groups. In collaboration with DDBJ and GenBank the database is produced, maintained and distributed at the European Bioinformatics Institute (EBI). Database releases are produced quarterly and are distributed on CD-ROM. Network services allow access to the most up-to-date data collection via Internet and World Wide Web interface. EBI's Sequence Retrieval System (SRS) is a Network Browser for Databanks in Molecular Biology, integrating and linking the main nucleotide and protein databases, plus many specialised databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, Blast etc) are available for external users to compare their own sequences against the most currently available data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
PMCID: PMC148088  PMID: 9847133
13.  Facing growth in the European Nucleotide Archive 
Nucleic Acids Research  2012;41(D1):D30-D35.
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/) collects, maintains and presents comprehensive nucleic acid sequence and related information as part of the permanent public scientific record. Here, we provide brief updates on ENA content developments and major service enhancements in 2012 and describe in more detail two important areas of development and policy that are driven by ongoing growth in sequencing technologies. First, we describe the ENA data warehouse, a resource for which we provide a programmatic entry point to integrated content across the breadth of ENA. Second, we detail our plans for the deployment of CRAM data compression technology in ENA.
doi:10.1093/nar/gks1175
PMCID: PMC3531187  PMID: 23203883
14.  Assembly information services in the European Nucleotide Archive 
Nucleic Acids Research  2013;42(D1):D38-D43.
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is a repository for the world public domain nucleotide sequence data output. ENA content covers a spectrum of data types including raw reads, assembly data and functional annotation. ENA has faced a dramatic growth in genome assembly submission rates, data volumes and complexity of datasets. This has prompted a broad reworking of assembly submission services, for which we now reach the end of a major programme of work and many enhancements have already been made available over the year to components of the submission service. In this article, we briefly review ENA content and growth over 2013, describe our rapidly developing services for genome assembly information and outline further major developments over the last year.
doi:10.1093/nar/gkt1082
PMCID: PMC3965037  PMID: 24214989
15.  DDBJ new system and service refactoring 
Nucleic Acids Research  2012;41(D1):D25-D29.
The DNA data bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) maintains a primary nucleotide sequence database and provides analytical resources for biological information to researchers. This database content is exchanged with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). Resources provided by the DDBJ include traditional nucleotide sequence data released in the form of 27 316 452 entries or 16 876 791 557 base pairs (as of June 2012), and raw reads of new generation sequencers in the sequence read archive (SRA). A Japanese researcher published his own genome sequence via DDBJ-SRA on 31 July 2012. To cope with the ongoing genomic data deluge, in March 2012, our computer previous system was totally replaced by a commodity cluster-based system that boasts 122.5 TFlops of CPU capacity and 5 PB of storage space. During this upgrade, it was considered crucial to replace and refactor substantial portions of the DDBJ software systems as well. As a result of the replacement process, which took more than 2 years to perform, we have achieved significant improvements in system performance.
doi:10.1093/nar/gks1152
PMCID: PMC3531146  PMID: 23180790
16.  Tidying Up International Nucleotide Sequence Databases: Ecological, Geographical and Sequence Quality Annotation of ITS Sequences of Mycorrhizal Fungi 
PLoS ONE  2011;6(9):e24940.
Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.
doi:10.1371/journal.pone.0024940
PMCID: PMC3174234  PMID: 21949797
17.  The EMBL Nucleotide Sequence Database 
Nucleic Acids Research  2000;28(1):19-23.
The European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database (http://www.ebi.ac. uk/embl/index.html ) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank (USA). Data is exchanged amongst the collaborative databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. WEBIN is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via Internet and WWW interfaces. EBI’s Sequence Retrieval System (SRS) is a network browser for databanks in molecular biology, integrating and linking the main nucleotide and protein databases plus many specialised databases. For sequence similarity searching a variety of tools (e.g., BLITZ, FASTA, BLAST) are available which allow external users to compare their own sequences against the most currently available data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
PMCID: PMC102461  PMID: 10592171
18.  The EMBL nucleotide sequence database. 
Nucleic Acids Research  1998;26(1):8-15.
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl. html ) constitutes Europe's primary nucleotide sequence resource. DNA and RNA sequences are directly submitted from researchers and genome sequencing groups and collected from the scientific literature and patent applications (Fig. 1). In collaboration with DDBJ and GenBank the database is produced, maintained and distributed at the European Bioinformatics Institute. Database releases are produced quarterly and are distributed on CD-ROM. EBI's network services allow access to the most up-to-date data collection via Internet and World Wide Web interface, providing database searching and sequence similarity facilities plus access to a large number of additional databases.
PMCID: PMC147241  PMID: 9399791
19.  DDBJ progress report: a new submission system for leading to a correct annotation 
Nucleic Acids Research  2013;42(D1):D44-D49.
The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. This database content is shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). DDBJ launched a new nucleotide sequence submission system for receiving traditional nucleotide sequence. We expect that the new submission system will be useful for many submitters to input accurate annotation and reduce the time needed for data input. In addition, DDBJ has started a new service, the Japanese Genotype–phenotype Archive (JGA), with our partner institute, the National Bioscience Database Center (NBDC). JGA permanently archives and shares all types of individual human genetic and phenotypic data. We also introduce improvements in the DDBJ services and databases made during the past year.
doi:10.1093/nar/gkt1066
PMCID: PMC3964987  PMID: 24194602
20.  EMBL Nucleotide Sequence Database in 2006 
Nucleic Acids Research  2006;35(Database issue):D16-D20.
The EMBL Nucleotide Sequence Database () at the EMBL European Bioinformatics Institute, UK, offers a large and freely accessible collection of nucleotide sequences and accompanying annotation. The database is maintained in collaboration with DDBJ and GenBank. Data are exchanged between the collaborating databases on a daily basis to achieve optimal synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation, alignments and bulk data. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. In 2006, the volume of data has continued to grow exponentially. Access to the data is provided via SRS, ftp and variety of other methods. Extensive external and internal cross-references enable users to search for related information across other databases and within the database. All available resources can be accessed via the EBI home page at . Changes over the past year include changes to the file format, further development of the EMBLCDS dataset and developments to the XML format.
doi:10.1093/nar/gkl913
PMCID: PMC1897316  PMID: 17148479
21.  MetaBar - a tool for consistent contextual data acquisition and standards compliant submission 
BMC Bioinformatics  2010;11:358.
Background
Environmental sequence datasets are increasing at an exponential rate; however, the vast majority of them lack appropriate descriptors like sampling location, time and depth/altitude: generally referred to as metadata or contextual data. The consistent capture and structured submission of these data is crucial for integrated data analysis and ecosystems modeling. The application MetaBar has been developed, to support consistent contextual data acquisition.
Results
MetaBar is a spreadsheet and web-based software tool designed to assist users in the consistent acquisition, electronic storage, and submission of contextual data associated to their samples. A preconfigured Microsoft® Excel® spreadsheet is used to initiate structured contextual data storage in the field or laboratory. Each sample is given a unique identifier and at any stage the sheets can be uploaded to the MetaBar database server. To label samples, identifiers can be printed as barcodes. An intuitive web interface provides quick access to the contextual data in the MetaBar database as well as user and project management capabilities. Export functions facilitate contextual and sequence data submission to the International Nucleotide Sequence Database Collaboration (INSDC), comprising of the DNA DataBase of Japan (DDBJ), the European Molecular Biology Laboratory database (EMBL) and GenBank. MetaBar requests and stores contextual data in compliance to the Genomic Standards Consortium specifications. The MetaBar open source code base for local installation is available under the GNU General Public License version 3 (GNU GPL3).
Conclusion
The MetaBar software supports the typical workflow from data acquisition and field-sampling to contextual data enriched sequence submission to an INSDC database. The integration with the megx.net marine Ecological Genomics database and portal facilitates georeferenced data integration and metadata-based comparisons of sampling sites as well as interactive data visualization. The ample export functionalities and the INSDC submission support enable exchange of data across disciplines and safeguarding contextual data.
doi:10.1186/1471-2105-11-358
PMCID: PMC2912304  PMID: 20591175
22.  Priorities for nucleotide trace, sequence and annotation data capture at the Ensembl Trace Archive and the EMBL Nucleotide Sequence Database 
Nucleic Acids Research  2007;36(Database issue):D5-D12.
The Ensembl Trace Archive (http://trace.ensembl.org/) and the EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), known together as the European Nucleotide Archive, continue to see growth in data volume and diversity. Selected major developments of 2007 are presented briefly, along with data submission and retrieval information. In the face of increasing requirements for nucleotide trace, sequence and annotation data archiving, data capture priority decisions have been taken at the European Nucleotide Archive. Priorities are discussed in terms of how reliably information can be captured, the long-term benefits of its capture and the ease with which it can be captured.
doi:10.1093/nar/gkm1018
PMCID: PMC2238915  PMID: 18039715
23.  The European Bioinformatics Institute (EBI) databases. 
Nucleic Acids Research  1994;22(17):3445-3449.
This paper describes the databases and services of the European Bioinformatics Institute (EBI). In collaboration with DDBJ and GenBank/NCBI, the EBI maintains and distributes the EMBL Nucleotide Sequence Database, Europe's primary nucleotide sequence data resource. The EBI also maintains and distributes the SWISS-PROT Protein Sequence Database, in collaboration with Amos Bairoch of the University of Geneva. Over thirty additional specialist molecular biology databases, as well as software and documentation of interest to molecular biologists, are also available. The EBI network services include database searching, entry retrieval, and sequence similarity searching facilities.
PMCID: PMC308299  PMID: 7937043
24.  The EMBL Nucleotide Sequence Database. 
Nucleic Acids Research  1997;25(1):7-14.
The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences directly submitted from researchers and genome sequencing groups and collected from the scientific literature and patent applications. In collaboration with DDBJ and GenBank the database is produced, maintained and distributed at the European Bioinformatics Institute (EBI) and constitutes Europe's primary nucleotide sequence resource. Database releases are produced quarterly and are distributed on CD-ROM. EBI's network services allow access to the most up-to-date data collection via Internet and World Wide Web interface, providing database searching and sequence similarity facilities plus access to a large number of additional databases.
PMCID: PMC146376  PMID: 9016493
25.  Experimental Design-Based Functional Mining and Characterization of High-Throughput Sequencing Data in the Sequence Read Archive 
PLoS ONE  2013;8(10):e77910.
High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interests with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since, one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called “Gendoo”. We generated hyperlinks between diseases extracted from SRA and the feature profiles of it. The developed project, publication and disease lists resulting from this study are available at our web service, called “DBCLS SRA” (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA.
doi:10.1371/journal.pone.0077910
PMCID: PMC3805581  PMID: 24167589

Results 1-25 (851469)