The HTML BLAST report at the NCBI website is based on a text report for a stand-alone program, which consists of several sections. First, the header lists the search performed, the query and database and the BLAST version. Second, the table of descriptions summarizes the results and presents the subject sequence identifier (accession), title and statistics about the match. Finally, the alignment section presents the full sequence title, additional accessions and titles for redundant sequences in the database, the length of the subject sequence, information about the score of the match, as well as the actual alignment. The HTML version of the report presented at the NCBI website is a modified version of the text report. It includes links to other reports such as ‘taxonomy reports’ and a ‘distance tree’ immediately after the header, followed by the BLAST Graphical Overview. A table contains the subject sequence descriptions, as well as subject sequence identifiers (hyperlinked to other NCBI resources), and links to the alignments further down the report.
There are a number of issues with the old HTML BLAST report. The linking to other NCBI resources is inconsistent. Sequence identifiers in the descriptions and alignments sections normally link to a GenBank or GenPept report in Entrez, but for assembled genomes they link to the Mapviewer. The old BLAST report uses one-letter icons to link to other NCBI resources, such as ‘G’ for Gene or ‘U’ for UniGene, but these icons are not obvious to users. Some users are more familiar with BLAST than with the rest of the NCBI website; therefore, they might not know what information is provided by Gene or the difference between Gene and UniGene. Titles are often truncated, especially longer ones. There are almost no navigational links in the alignments section of the report and no convenient way to move to the next alignment or return to the top of the page. It is important to present the report as quickly as possible, but in some cases, formatting the alignments can delay loading of the page and can consume substantial resources on the user’s desktop. Users also often look at only a few alignments. Because of these considerations, the old BLAST report prints all of the descriptions and only half of the alignments by default. To see all the alignments, the user needs to reformat the report. Users may not know how many alignments they want to examine until they start looking at the report, so they could initially format either too many or too few alignments. Users have also requested the ability to conveniently download FASTA for subject sequences, as well as XML or BLAST reports. Additionally, the old report does not include links to the newly developed graphical sequence viewer that can be used to display BLAST alignments. Despite the limitations discussed here, users are familiar with the basic format of the BLAST report and find it useful.
A new BLAST report addresses the aforementioned issues without changing the basic structure of the report. Later in the text, we use a megaBLAST search of the genomic region for the gulonolactone (l-) oxidase gene of Rattus norvegicus (bases 48 898 799–48 921 150 of NC_005114.3) against the nucleotide collection (nt) to demonstrate new report features. This search uses default BLAST parameters, except that rodent repeat filtering is enabled. The header and BLAST Graphical Overview are unchanged from the old report. The table of descriptions, presented in Figure 1, is different from the old report. Because the title is more informative than the accession for most sequences, the title is in the leftmost column of the table. The title is followed by statistics describing the quality of the match. Because a subject sequence may have multiple separate alignments to the query, both the score of the highest scoring alignment and the total score of all alignments are presented (max score and total score). Query coverage describes what percentage of the query length matches the subject sequence. The expect value describes the statistical significance of the match, and ‘Max ident’ gives the per cent identity of the match with the highest identity. Per cent identity is calculated as the number of identical letters divided by the alignment length, where the alignment length is the number of matching letters plus the number of gaps for either the query or the subject. Finally, on the right-hand side is the accession of the subject sequence (hyperlinked to the GenBank or GenPept style report). The table of descriptions can be sorted by clicking on numerical column headers. Columns can also be hidden using the gear icon on the right side of the table of descriptions. The table of descriptions has been optimized to show more of the title for the subject sequence as the browser window is widened.
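The statistics in the table of descriptions can be illustrated with a short Python sketch. This is not NCBI's implementation: the `Hsp` container, the function names and the toy alignments are hypothetical, and two points are assumptions made for illustration. First, the alignment length is taken as the number of alignment columns (aligned letter pairs plus gap columns). Second, query coverage is taken as the union of the query positions covered by the alignments to one subject, so that overlapping alignments are not counted twice.

```python
from dataclasses import dataclass


@dataclass
class Hsp:
    """One alignment between query and subject (hypothetical container)."""
    score: float
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive
    query_aln: str     # aligned query row, '-' marks a gap
    subject_aln: str   # aligned subject row, '-' marks a gap


def percent_identity(query_aln: str, subject_aln: str) -> float:
    """Identical letters divided by the number of alignment columns."""
    if len(query_aln) != len(subject_aln) or not query_aln:
        raise ValueError("aligned rows must be non-empty and of equal length")
    identical = sum(
        1 for q, s in zip(query_aln, subject_aln) if q == s and q != "-"
    )
    return 100.0 * identical / len(query_aln)


def summarize(hsps: list[Hsp], query_length: int) -> dict[str, float]:
    """Per-subject statistics analogous to one row of the descriptions table."""
    covered = set()
    for h in hsps:
        # Assumption: coverage is the union of covered query positions.
        covered.update(range(h.query_start, h.query_end + 1))
    return {
        "max_score": max(h.score for h in hsps),
        "total_score": sum(h.score for h in hsps),
        "query_coverage": 100.0 * len(covered) / query_length,
        "max_ident": max(
            percent_identity(h.query_aln, h.subject_aln) for h in hsps
        ),
    }


# Toy data: two overlapping alignments of a 20-base query to one subject.
example_hsps = [
    Hsp(100.0, 1, 10, "ACGTACGTAC", "ACGTACGTAC"),
    Hsp(40.0, 6, 15, "CGTAC-GTACG", "CGTACTGTTCG"),
]
summary = summarize(example_hsps, query_length=20)
print(summary)
```

In this toy case the second alignment has 9 identical letters over 11 columns (one gap column and one mismatch), which shows why a subject's ‘Max ident’ can differ from the identity of its highest scoring alignment.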
The new report provides additional download options, such as FASTA for the full or aligned portion of subject sequences, GenBank reports and various BLAST reports. It also provides links to other resources at the NCBI, such as the graphical sequence viewer and the distance tree. For example, Figure 2 shows the graphical sequence viewer display of a query sequence and selected aligned subject sequences available from the descriptions table.
Figure 1. Table of descriptions for a search of the genomic region for the gulonolactone (l-) oxidase gene of Rattus norvegicus (bases 48 898 799–48 921 150 of NC_005114.3) against nt. Selecting the title loads the alignments for that sequence (if needed) ...
Figure 2. Example view of query and selected subject sequences in the graphical sequence viewer. The subject sequences are selected mRNAs found by a megaBLAST search against nt with the genomic region for the gulonolactone (l-) oxidase gene of Rattus norvegicus ...
Figure 3. Alignments from a search of the genomic region for the gulonolactone (l-) oxidase gene of Rattus norvegicus (bases 48 898 799–48 921 150 of NC_005114.3) against nt. The download menu and links for GenBank and graphic apply only to this subject ...
Figure 3 presents a set of alignments from a search. A shaded bar begins the presentation of alignments for a subject sequence. Below the bar is information about the subject sequence as well as the match. There is a download menu as well as links to GenBank and the graphical sequence viewer. In the middle of the shaded bar, there is a pull-down menu that controls the sorting of the matches for subject sequences with more than one match. To the right of the alignment, the words next, previous and descriptions (along with some arrows) serve as navigation aids. Below these navigation aids, there are links to ‘related information’. These links spell out the name of the resource and provide a short description.
The BLAST help tab, accessible from the BLAST home page at http://blast.ncbi.nlm.nih.gov, has links to a video (from the NCBI YouTube channel), as well as a document about using the new BLAST report.