The most frequent type of analysis performed on GenBank data is the search for sequences similar to a query sequence. NCBI offers the BLAST family of search programs (8) for this purpose. NCBI’s Web interface to the standard BLAST 2.0 program accepts a sequence or accession number as the input query. The search for similarity, performed using an identity matrix for blastn (nucleotide) searches and a PAM or BLOSUM scoring matrix for protein searches, results in a set of gapped alignments, with links to the full document records. Each BLAST alignment is accompanied by an alignment score and a measure of statistical significance, called the Expectation Value, for judging the quality of the alignment. Web BLAST also provides a graphical overview of the alignments, which are color-coded by alignment score and clearly show the extent and quality of the sequence similarities detected by BLAST, as well as the disposition of gaps in the alignments.
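As a rough illustration of how an Expectation Value relates to an alignment score, the sketch below assumes the standard Karlin-Altschul form E = K·m·n·exp(−λS), which this section does not spell out. The function names and the numeric values of λ and K are illustrative placeholders, and the length corrections that BLAST itself applies to m and n are ignored.

```python
import math


def expect_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score.

    Assumes the Karlin-Altschul form E = K * m * n * exp(-lambda * S),
    using uncorrected query and database lengths (a simplification).
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    # Placeholder statistical parameters; real values depend on the
    # scoring matrix and gap penalties in use.
    LAMBDA, K = 0.267, 0.041
    for score in (40, 80, 120):
        e = expect_value(score, query_len=250, db_len=1_000_000, lam=LAMBDA, k=K)
        print(f"S={score:4d}  bits={bit_score(score, LAMBDA, K):7.1f}  E={e:.3g}")
```

Under these assumptions a higher alignment score always yields a smaller Expectation Value, and the same score becomes less significant as the database grows.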
The default databases searched by BLAST are the non-redundant (nr) nucleotide and protein databases constructed from the Entrez databases. Several pre-defined specialized databases or subsets may also be searched, and searches may be restricted to sequences from a particular organism. Customized BLAST pages allow a nucleotide query against any combination of 21 complete and 40 incomplete microbial genomes, or against the genomes of malaria-associated pathogens.
Specialized versions of BLAST are also offered to facilitate other approaches to protein similarity searching. Position Specific Iterated BLAST (PSI-BLAST) (9) initially performs a conventional BLAST search to produce alignments from which it constructs a position-specific profile. Subsequent BLAST iterations use this profile matrix in place of the initial query and scoring matrix to find similarities in a database. Pattern Hit Initiated BLAST (PHI-BLAST) (10) takes as input both a peptide query sequence and a peptide pattern, or motif, found within the peptide query sequence. The motif specifies an obligatory match between query and database sequences, about which optimal local alignments are constructed. Another variant, ‘BLAST2Sequences’ (11), can display the similarity between two DNA or peptide sequences by producing a dot-plot representation of the alignments it reports.
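The obligatory motif match used by PHI-BLAST can be pictured as a seed-finding step. The sketch below assumes the motif is written in PROSITE-style syntax and translates it into a regular expression; the helper names `prosite_to_regex` and `find_seeds` are hypothetical, and the local alignment that PHI-BLAST builds around each seed is not shown.

```python
import re

# One pattern element: optional '<' anchor, a residue / x / [set] / {excluded set},
# an optional repeat count (n) or (n,m), and an optional '>' anchor.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern, e.g. 'C-x(2,4)-[LIVM]-{P}-H', to a regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        start, core, lo, hi, end = m.groups()
        if core == "x":                      # x matches any residue
            core = "."
        elif core.startswith("{"):           # {ABC} excludes the listed residues
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if lo:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


def find_seeds(pattern, sequence):
    """Return (start, end) spans of non-overlapping motif matches in sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.end()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    motif = "C-x(2)-[LIVM]-{P}-H"
    print(prosite_to_regex(motif))
    print(find_seeds(motif, "AACGGLAHKK"))
```

A database sequence with no seed span would be skipped entirely in this picture, which is what makes the motif match obligatory rather than merely favored by the scoring.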
Basic BLAST 2.0 searches can also be performed by e-mail through the address email@example.com. Documentation can be obtained by sending the word ‘help’ to the server address.