) is an integrated database retrieval system that enables text searching, using simple Boolean queries, of a diverse set of 31 databases. Global Query, the default search on the NCBI homepage, searches across all the Entrez databases and rapidly returns the counts of matching records in each database. A user may then display results or further refine searches in any individual database. The Entrez databases include ~91 million DNA and protein sequences derived from several sources (1), the NCBI taxonomy, genomes, population sets, gene expression data, over 1.2 million gene-oriented sequence clusters in UniGene, almost 500 000 sequence-tagged sites in UniSTS, 34 million genetic variations in dbSNP, over 36 000 protein structures from the Molecular Modeling Database (MMDB) (6), 168 000 3D and 12 000 alignment-based protein domains, and the biomedical literature via PubMed, PubMed Central (PMC), Online Mendelian Inheritance in Man (OMIM) and online books. The books database contains >60 online scientific textbooks. To enable researchers to quickly reach the appropriate NCBI resource, the content of the NCBI web pages and FTP directories has been incorporated into an Entrez database of its own. Searches of the NCBI web site using the same powerful queries available for the biological databases are therefore possible.
Entrez provides extensive links within and between database records. In their simplest form, these links may be cross-references between a sequence and the abstract of the paper in which it is reported, or between a protein sequence and its coding DNA sequence or, perhaps, its 3D structure. Other examples are links between a genomic assembly and its components or between a genomic sequence and those sequences derived from its annotation. Computationally derived links between ‘neighboring records’, such as those based on computed similarities among sequences or among PubMed abstracts, allow rapid access to groups of related records. A service called LinkOut expands the range of links to include external services, such as organism-specific genome databases. To accommodate the growing number of links, Entrez provides a Links pull-down menu that appears in the top right-hand corner of record displays.
The records retrieved in Entrez can be displayed in many formats and downloaded singly or in batches. A redirection control allows results to be saved in a local file or shown in the browser as plain text. Results may also be sent to the Entrez clipboard, where they may be recalled later during an Entrez session or saved between sessions using My NCBI, described below. In addition, PubMed results and those from other databases may be emailed directly from Entrez or exported as RSS feeds. Formats available for GenBank records include the GenBank Flatfile, FASTA, XML, ASN.1 and others. Graphical display formats are offered for some types of records, including genomic records. For sequence records, a formatting control allows the display or download of a particular range of residues.
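A scripted counterpart to the format and range controls can be sketched with the ‘efetch’ E-Utility, which is introduced later in this section. The following Python fragment only builds the request URL and does not contact the server. The base URL, the helper name `sequence_range_url` and the placeholder accession are assumptions made for illustration; `rettype`, `retmode`, `seq_start` and `seq_stop` are the efetch parameters assumed here to select the output format and the residue range.

```python
from urllib.parse import urlencode

# Assumed base URL of the E-Utilities; it is not stated in the text.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def sequence_range_url(accession, start, stop, fmt="fasta"):
    """Build an efetch URL for residues start..stop (1-based, inclusive).

    Hypothetical helper: it mirrors the interactive formatting control by
    asking for one display format and one range of residues.
    """
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    params = {
        "db": "nucleotide",   # Entrez Nucleotide database
        "id": accession,      # accession or other record identifier
        "rettype": fmt,       # e.g. "fasta" or "gb" (GenBank flatfile)
        "retmode": "text",    # plain text rather than XML
        "seq_start": start,
        "seq_stop": stop,
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# "EXAMPLE_ACCESSION" is a placeholder, not a real record identifier.
url = sequence_range_url("EXAMPLE_ACCESSION", 101, 250)
print(url)
```

Passing the printed URL to any HTTP client would return the requested range as plain text, provided a real accession is substituted for the placeholder.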
Entrez's ‘My NCBI’ allows users to store personal configuration options such as search filters, LinkOut preferences and document delivery providers. My NCBI also saves searches and can automatically email updated search results. Entrez uses a set of up to five filter tabs to display subsets of database results. The tabs vary according to Entrez database; examples of some defaults include ‘mRNA’ and ‘RefSeq’ subsets for Nucleotide; a ‘Review’ subset for PubMed; ‘NMR’ and ‘X-ray’ subsets for Structure. Default filter tabs can be changed using My NCBI. Additional My NCBI features include changing the way Entrez links are displayed to standard HTML links or pull-downs, and highlighting PubMed search terms. A recently added My NCBI feature called ‘Collections’ allows users to save search results and bibliographies indefinitely.
Scripted access to Entrez is provided by the Entrez Programming Utilities (E-Utilities), a suite of eight server-side programs supporting a uniform set of parameters used to search, link between, and download from, the Entrez databases. A search history, available via interactive Entrez as well as via the E-Utilities, allows users to recall the results of previous searches during an Entrez session and combine them using Boolean logic. The ‘einfo’ utility can be used to retrieve detailed information about the Entrez databases, such as lists of supported search fields or the date of the last database update, while ‘egquery’ returns the number of matches to a single query in every Entrez database. An automated system may use E-Utilities such as ‘efetch’ or ‘esummary’ to retrieve the data. The ‘espell’ utility checks spelling within Entrez queries and offers suggestions in cases where a misspelling might cause key records to be missed. Support for the Simple Object Access Protocol (SOAP) interface to the E-Utilities was expanded during the past year; the interface now supports full downloads (efetch) from nine of the Entrez databases, and esearch and esummary for all of them. Instructions for using the E-Utilities are found under the ‘Entrez Tools’ link on the NCBI home page.
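The uniform parameter set and the search history can be illustrated with a minimal Python sketch that builds E-Utility request URLs without sending them. The base URL, the helper names `eutils_url` and `combine_history`, the query text and the placeholder WebEnv value are assumptions made for illustration; `db`, `term`, `usehistory`, `WebEnv`, `query_key`, `retmode` and `retmax` are the request parameters assumed here for esearch and efetch.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed base URL of the E-Utilities; it is not stated in the text.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(utility, **params):
    """Build a request URL for one E-Utility, e.g. 'esearch' or 'efetch'."""
    # Skip parameters left as None so optional arguments can be omitted.
    query = {key: value for key, value in params.items() if value is not None}
    return f"{EUTILS_BASE}/{utility}.fcgi?{urlencode(query)}"


def combine_history(keys, operator="AND"):
    """Refer to earlier searches by history number, e.g. '#1 AND #2'."""
    return f" {operator} ".join(f"#{key}" for key in keys)


# 1. Search PubMed with a Boolean query and keep the result on the history.
search_url = eutils_url(
    "esearch", db="pubmed", term="database AND taxonomy", usehistory="y"
)

# 2. Combine two earlier searches from the same session. WebEnv identifies
#    the session; the value below is a placeholder, not a real token.
combined_url = eutils_url(
    "esearch",
    db="pubmed",
    term=combine_history([1, 2]),
    WebEnv="PLACEHOLDER_WEBENV",
    usehistory="y",
)

# 3. Download the first 20 records of a stored result set as XML.
fetch_url = eutils_url(
    "efetch",
    db="pubmed",
    WebEnv="PLACEHOLDER_WEBENV",
    query_key=3,
    retmode="xml",
    retmax=20,
)

# 4. Ask einfo for the search fields and last update date of one database.
info_url = eutils_url("einfo", db="pubmed")

for url in (search_url, combined_url, fetch_url, info_url):
    print(url)
```

Keeping the URL construction separate from the network call makes the requests easy to inspect and test; in a real session the WebEnv and query_key values would be read from the esearch response rather than hard-coded.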
PubMed and PubMed Central
The PubMed database includes over 16.5 million citations from >19 000 life science journals for biomedical articles back to the 1950s, most with abstracts and many with links to the full-text article. PubMed is heavily linked to other core Entrez databases such as Nucleotide, Protein, Gene, Structure and PubChem, where it provides a crucial bridge between the data of molecular biology and the scientific literature. PubMed records are also linked to one another within Entrez as ‘related articles’ on the basis of computationally detected similarities using indexed Medical Subject Heading (7) terms and the text of titles and abstracts. To put information about the top-ranking related articles at the fingertips of researchers, the ‘Abstract-Plus’ display was introduced this year as the default format for a single PubMed record. Abstract-Plus shows, in addition to the abstract of a paper, succinct descriptions of the top five related articles, increasing the potential for the discovery of important relationships.
PubMed Central (8) is a digital archive of peer-reviewed journals in the life sciences providing access to >750 000 full-text articles, a 50% increase over the past year. More than 270 journals, including Nucleic Acids Research, deposit the full text of their articles in PMC. It includes digitized back content for many journals, going back in some cases to the 1800s or early 1900s. Participation in PMC requires a commitment to free access to full text, either immediately after publication or within a 12-month period. All PMC free articles are identified in PubMed search results and PMC itself can be searched using Entrez.
The NCBI taxonomy database, growing at the rate of 2900 new taxa a month, indexes >240 000 named organisms that are represented in the databases with at least one nucleotide or protein sequence. The Taxonomy Browser can be used to view the taxonomic position or retrieve data from any of the principal Entrez databases for a particular organism or group. The Taxonomy Browser also displays links to the Map Viewer, Genomic BLAST services, the Trace Archive, and to external model organism and taxonomic databases via LinkOut. Searches of the NCBI taxonomy may be made on the basis of whole, partial or phonetically spelled organism names. Entrez Taxonomy displays include custom taxonomic trees representing user-specified subsets of the full NCBI taxonomy.