BLAST2 is the most widely used tool for querying databases by sequence similarity search (1). The major BLAST2 web servers, such as those at NCBI, EBI and ExPASy, allow the user to retrieve the sequences returned in the output. This is essential for further analyses such as multiple alignment, tree calculation, investigation of residue conservation and other forms of comparative sequence analysis.
Often a user will be interested in a defined subset of sequences, for example only mammalian or only purple bacterial sequences. Servers such as those at NCBI and ExPASy allow prior taxonomic restriction of the database, which also saves computational search time.
However, we find that a more common situation is one in which the user is not interested in all the protein sequences found by BLAST (which, in the case of protein kinases for example, number in the thousands). Instead, the set of interesting sequences depends on what the search returns and may need continual revision as the analysis of the hits proceeds. As an example, if one is interested in the evolution of vertebrate multigene families, an invertebrate outgroup may be needed. But which? Chordates like amphioxus or Ciona would be desirable, but the sequence may not have been determined yet. The Drosophila proteome is available, but the homologous protein may be absent, as is true for many vertebrate extracellular matrix proteins in this soft-centred organism. Shall we next try Caenorhabditis or Aplysia? Our choice of sequence to use as outgroup depends on which sequences are found to be available.
While the user can click through an output list to find out which species are present, the cryptic entry IDs used by SPTrEMBL (and by other databases except SWISS-PROT) are a hindrance. Furthermore, no major BLAST server provider currently includes enough useful information, such as species or gene name, in the output to allow rapid perusal, even though this information can easily be parsed into the BLAST-formatted binary databases.
Experienced SRS users can run BLAST from within an SRS server (e.g. http://srs.ebi.ac.uk/) and use query linking to select desired data subsets. The approach is powerful, though difficult for naive users. More recently, BLAST at NCBI has allowed the user to apply ENTREZ keyword queries to select sequence entries from the output. As noted above, however, both servers currently provide uninformative BLAST outputs, so the user may still have difficulty determining what is present.
To meet the need for flexible on-line retrieval, both for ourselves and others, we wanted a server that allows the user to apply taxonomic and other keywords to delimit the collection of sequences from BLAST output. One way to do this is to harness the SRS retrieval tool (2) in tandem with BLAST, principally by setting up the BLAST web output with SRS controls. A comprehensive sequence similarity search is performed first, and its results, a list of database hits, are saved for subsequent rounds of selection, filtering and viewing by different criteria. Additionally, by carrying useful database annotation into the BLAST output, the user can be better informed as to what is available for retrieval.
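The search-once, filter-repeatedly workflow described above can be illustrated with a minimal Python sketch. It is not the server's implementation, which relies on SRS queries. The `Hit` record, its fields, the `filter_hits` function and all entries in `SAVED_HITS` are hypothetical placeholders invented for illustration, and the E-values are not real search results.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """One saved database hit (hypothetical record layout)."""
    entry_id: str              # placeholder identifier, not a real accession
    species: str
    lineage: Tuple[str, ...]   # taxonomic lineage, broadest rank first
    description: str
    evalue: float


def filter_hits(hits: List[Hit],
                taxon: Optional[str] = None,
                keyword: Optional[str] = None,
                max_evalue: Optional[float] = None) -> List[Hit]:
    """Return the hits that satisfy every criterion that was supplied."""
    selected = []
    for hit in hits:
        # Taxonomic restriction: the taxon must appear in the hit's lineage.
        if taxon is not None and taxon.lower() not in [t.lower() for t in hit.lineage]:
            continue
        # Keyword restriction: case-insensitive match on the description.
        if keyword is not None and keyword.lower() not in hit.description.lower():
            continue
        # Optional significance cut-off.
        if max_evalue is not None and hit.evalue > max_evalue:
            continue
        selected.append(hit)
    return selected


# The comprehensive search is run once and its hit list is saved
# (illustrative placeholder data only).
SAVED_HITS = [
    Hit("HIT_A", "Homo sapiens",
        ("Eukaryota", "Metazoa", "Chordata", "Vertebrata", "Mammalia"),
        "Collagen alpha chain", 1e-150),
    Hit("HIT_B", "Mus musculus",
        ("Eukaryota", "Metazoa", "Chordata", "Vertebrata", "Mammalia"),
        "Collagen alpha chain", 1e-140),
    Hit("HIT_C", "Drosophila melanogaster",
        ("Eukaryota", "Metazoa", "Arthropoda", "Insecta"),
        "Collagen-like protein", 1e-30),
    Hit("HIT_D", "Caenorhabditis elegans",
        ("Eukaryota", "Metazoa", "Nematoda"),
        "Cuticle collagen", 1e-20),
    Hit("HIT_E", "Ciona intestinalis",
        ("Eukaryota", "Metazoa", "Chordata", "Tunicata"),
        "Fibrillar collagen", 1e-60),
]

# Successive rounds of selection reuse the same saved list; no new search is run.
mammals = filter_hits(SAVED_HITS, taxon="Mammalia")
chordates = filter_hits(SAVED_HITS, taxon="Chordata")
strong_metazoan = filter_hits(SAVED_HITS, taxon="Metazoa", max_evalue=1e-50)
cuticle = filter_hits(SAVED_HITS, keyword="cuticle")
```

Because the saved list is never modified, the user can widen or narrow the selection (for example from mammals to all chordates when looking for an outgroup) without repeating the similarity search.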
Here we summarise the functionality of the BLAST2SRS server. It is designed for flexible retrieval of subsets of related sequence entries in the SWISS-PROT and SPTrEMBL protein sequence databases (4). The focus is on this task alone; features found on more sophisticated user interfaces, such as graphical alignment summaries, are not implemented here. Nor is this server aimed at maximally sensitive detection of remote homologues, for which PSI-BLAST (1), Profile (5) or HMM (6) searches would be more appropriate.