The Basic Local Alignment Search Tool (BLAST) programs (8) perform sequence-similarity searches against a variety of sequence databases, returning a set of gapped alignments with links to full database records, to UniGene, Gene, the MMDB or GEO. One variant, BLAST2Sequences (11), compares two DNA or protein sequences and produces a dot-plot representation of the alignments.
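To make the idea of a dot plot concrete, the sketch below marks every position where a short word of one sequence exactly matches a word of the other, so that regions of similarity show up as diagonal runs. This is a simplified word-match dot plot written for illustration; BLAST2Sequences plots the alignments that BLAST itself finds, and the function names and the `word` parameter here are hypothetical.

```python
def dot_plot(seq_a: str, seq_b: str, word: int = 3) -> list[tuple[int, int]]:
    """Return (i, j) start positions where seq_a[i:i+word] == seq_b[j:j+word]."""
    # Index every word of seq_b by its start positions.
    index: dict[str, list[int]] = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    # Look up each word of seq_a; every match becomes one dot.
    dots = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            dots.append((i, j))
    return dots


def render(seq_a: str, seq_b: str, dots: list[tuple[int, int]]) -> str:
    """Draw seq_a down the rows and seq_b across the columns, '*' for a dot."""
    grid = [["." for _ in seq_b] for _ in seq_a]
    for i, j in dots:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)
```

Identical sequences produce a main diagonal; an insertion in one sequence shifts the diagonal sideways.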
Each alignment returned by BLAST is scored and assigned a measure of statistical significance, called the Expectation Value (E-value). BLAST takes into account the amino acid composition of the query sequence in its estimation of statistical significance. This composition-based statistical treatment, used in conventional protein BLAST searches as well as PSI-BLAST searches, tends to reduce the number of false-positive database hits (12). The alignments returned can be limited by an E-value threshold or range.
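For orientation, the classical Karlin-Altschul relation behind the E-value is E = K·m·n·e^(−λS), where S is the raw alignment score, m and n are the (effective) query and database lengths, and λ and K are statistical parameters of the scoring system. The sketch below computes E directly and through the normalized bit score, then applies a threshold or range cut-off. The λ and K values in the test are illustrative only, and the sketch does not model the composition-based adjustment described above.

```python
import math


def karlin_altschul_evalue(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """E = K * m * n * exp(-lambda * S): expected chance alignments scoring >= S."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def within_evalue_range(evalues: list[float], low: float, high: float) -> list[float]:
    """Keep only E-values in [low, high]; use low=0.0 for a plain threshold."""
    return [e for e in evalues if low <= e <= high]
```

Because E falls exponentially with S, a modest increase in raw score lowers the E-value by orders of magnitude.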
Standard output formats include the default pairwise alignment, several query-anchored multiple sequence alignment formats, an easily parsed Hit Table and a taxonomically organized output. Database sequences appearing in BLAST results may be marked for batch retrieval using check boxes. A new, enhanced formatter displays alignments against database sequences that are >200 000 bp in length, with links to nearby features such as genes. A new ‘Pairwise with identities’ mode better highlights differences between the query and a target sequence. An option to display masked characters in lower-case or in distinct colors is now available.
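As an illustration of why a tabular hit list is convenient, the sketch below parses tab-separated rows into records. It assumes the common 12-column BLAST tabular layout (query id, subject id, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score) with `#` comment lines; the exact columns can differ with BLAST version and output options, and the `Hit` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    pct_identity: float
    length: int
    mismatches: int
    gap_opens: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


# Converter for each of the 12 assumed columns, in order.
CONVERTERS = (str, str, float, int, int, int, int, int, int, int, float, float)


def parse_hit_table(text: str) -> list[Hit]:
    hits = []
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        fields = line.split("\t")
        if len(fields) < len(CONVERTERS):
            raise ValueError(f"expected 12 tab-separated columns: {line!r}")
        hits.append(Hit(*(conv(v) for conv, v in zip(CONVERTERS, fields))))
    return hits
```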
The web BLAST interface allows both the initial search and the displayed results to be restricted to a database subset by using an Entrez query as a filter. Web BLAST also uses a standard URL-API, so that a complete search specification, including the query and BLAST parameters such as Entrez restrictions, can be encoded in a URL posted to the web page.
MegaBLAST, designed to find nearly exact matches, offers a Web interface that handles batch nucleotide queries and operates up to 10 times faster than standard nucleotide BLAST. MegaBLAST is the default search program for NCBI's Genomic BLAST pages. MegaBLAST is also used to search the rapidly growing Trace Archive and is available for the standard BLAST databases as well. For rapid cross-species nucleotide queries, NCBI offers Discontiguous MegaBLAST, which uses a non-contiguous word match (14) as the nucleus for its alignments. Discontiguous MegaBLAST is far more rapid than a translated search such as blastx, yet maintains a competitive degree of sensitivity when comparing coding regions.
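The sketch below shows what a non-contiguous word match is: a template marks which positions must match ('1') and which are ignored ('0'), so two coding sequences that differ only at third codon positions can still share a seed. The template `110110110` and all function names are illustrative assumptions; Discontiguous MegaBLAST's actual templates are different and longer.

```python
from collections.abc import Iterator


def spaced_keys(seq: str, template: str) -> Iterator[tuple[int, str]]:
    """Yield (position, key); the key keeps only characters at '1' template positions."""
    care = [i for i, c in enumerate(template) if c == "1"]
    for pos in range(len(seq) - len(template) + 1):
        yield pos, "".join(seq[pos + i] for i in care)


def seed_hits(query: str, subject: str, template: str) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs whose spaced keys are identical."""
    index: dict[str, list[int]] = {}
    for pos, key in spaced_keys(subject, template):
        index.setdefault(key, []).append(pos)
    return [(qp, sp)
            for qp, key in spaced_keys(query, template)
            for sp in index.get(key, [])]
```

With `query="ATGAAACCC"` and `subject="ATCAAGCCT"`, which differ only at every third base, the spaced template finds a seed at (0, 0), while an all-'1' contiguous template of the same length finds none.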
A new strategy has been designed to improve the speed of BLAST searches. The system, called ‘SplitD’, splits the databases into a number of segments to spread the calculations across multiple back-end machines. SplitD keeps track of the database segments that have been used most recently and are, therefore, most likely to remain in memory. These segments can be reused in the next search to avoid a slow read from disk storage.
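The bookkeeping described for SplitD resembles a least-recently-used (LRU) cache, so the sketch below models one back-end machine that keeps a bounded number of segments in memory and reads from disk only on a miss. It is a hypothetical illustration of that idea, not SplitD's implementation: the round-robin `split_database`, the `SegmentCache` class and the `load` callback are all assumptions.

```python
from collections import OrderedDict
from typing import Any, Callable


def split_database(records: list[Any], n_segments: int) -> list[list[Any]]:
    """Hypothetical round-robin split of a database into n_segments pieces."""
    return [records[i::n_segments] for i in range(n_segments)]


class SegmentCache:
    """LRU cache of database segments held in memory on one back-end machine."""

    def __init__(self, capacity: int, load: Callable[[int], Any]):
        self.capacity = capacity
        self.load = load  # slow path: read a segment from disk
        self.segments: OrderedDict[int, Any] = OrderedDict()
        self.disk_reads = 0

    def get(self, segment_id: int) -> Any:
        if segment_id in self.segments:
            self.segments.move_to_end(segment_id)  # mark as most recently used
            return self.segments[segment_id]
        self.disk_reads += 1
        data = self.load(segment_id)
        self.segments[segment_id] = data
        if len(self.segments) > self.capacity:
            self.segments.popitem(last=False)  # evict the least recently used segment
        return data
```

A scheduler built on this idea would prefer to send the next search to a machine whose cache already holds the needed segments.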
BLAST Link (BLink) displays pre-computed BLAST alignments to similar sequences for each protein sequence in the Entrez databases. BLink can display alignment subsets limited by taxonomic criteria, by database of origin, relation to a complete genome, membership in a COG (15) or by relation to a 3D structure or conserved protein domain. BLink links are displayed for protein records in Entrez as well as within Entrez Gene reports.