Figure 1 shows the top of a CD-Search results page. The page lists the version of RPS-BLAST used, repeats information about the query sequence and displays statistics on the search database.
Figure 1 Graphical summary of results and hit list. The query sequence used in this search was gi|116863 (9).
The graphical results summary comes with a text box which reports such details as accessions, short names and E-values as the mouse pointer is placed over one of the balloons representing an individual hit. The query sequence is drawn as a black bar on the top of the image, with a ruler indicating its length. Sections colored in cyan have been filtered out as low-complexity regions in the database search. RPS-BLAST, the search engine behind CD-Search, typically does not extend alignments into these regions. Alignment details are shown in individual pairwise alignment displays in the bottom part of the page, where low-complexity regions are again explicitly indicated.
The individual balloons are assigned colors according to a fixed scheme. For example, the best scoring hit is colored red and the second-best scoring hit is colored blue; see the online help document for details. Hits to conserved domain models that are identified as related by the CDART resource (8) are given the same color. The redundancy present in the all-inclusive search set of the CDD (Conserved Domain Database) is readily visible in this example. Balloons are drawn so that they extend from the first to the last residue of the alignment footprints on the query sequence. These alignments may contain gapped-out regions, which are visible in the pairwise alignment displays at the bottom of the page but are not indicated in the graphical summary.
Balloons may have jagged edges. A jagged edge at the N- or C-terminus of a domain footprint indicates that more than 20% of the domain model's extent is missing from the RPS-BLAST pairwise alignment. Pairwise alignment displays at the bottom of the page list the exact percentages of the domain models that were used in the alignment. In the example shown in Figure 1, the jagged edges indicate partial hits to N- and C-terminal parts of zinc-dependent metalloprotease domains (the partial hits are caused by the insertion of additional domains).
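The jagged-edge rule can be illustrated with a minimal Python sketch. It is not the CD-Search implementation: the function name `jagged_edges` and its parameters are hypothetical, and it assumes that the missing fraction is evaluated separately at each terminus of the domain model, which the text above does not state explicitly.

```python
def jagged_edges(model_length, aligned_from, aligned_to, threshold=0.20):
    """Return (n_jagged, c_jagged) for one hit against a domain model.

    aligned_from and aligned_to are 1-based, inclusive positions on the
    domain model that are covered by the pairwise alignment.
    Assumption: each terminus is tested on its own against the threshold.
    """
    if not 1 <= aligned_from <= aligned_to <= model_length:
        raise ValueError("aligned range must lie within the domain model")
    # Fraction of the model left unaligned before and after the footprint.
    missing_n = (aligned_from - 1) / model_length
    missing_c = (model_length - aligned_to) / model_length
    return missing_n > threshold, missing_c > threshold


if __name__ == "__main__":
    # A hypothetical 200-column model aligned from column 61 to the end:
    # 30% is missing at the N-terminus, nothing at the C-terminus.
    print(jagged_edges(200, 61, 200))
```

Under these assumptions a hit that covers the whole model yields no jagged edge, while a hit that skips the first 30% of the model is flagged at the N-terminus only.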
Balloons may also have indentations, as does the best scoring hit in the example shown in Figure 1. Indentations indicate that a repeat structure has been detected algorithmically in the search model; in Figure 1, the alignment model labeled ‘HX’ spans four copies of hemopexin repeats.
Clicking on a balloon invokes a multiple alignment view to which the matching fragment of the query sequence is added, aligned as computed by RPS-BLAST. An example is shown in Figure 2. The user can modify the conservation thresholds used for coloring columns in the alignment to better identify conserved sites, and can select subsets of family-member sequences to be included in the display. By default, the alignment rows most similar to the query sequence are chosen. In many conserved domain models curated at the NCBI, conserved functional motifs have been recorded as features of the model. A feature's address is a set of columns in the multiple alignment. Features are highlighted with hash marks printed on top of the alignment blocks. Features are recorded together with evidence, which may consist of citations or specialized 3D displays. The evidence viewer is found at the bottom of the multiple alignment display (not shown in Figure 2). These alignment display options are intended to assist the user in predicting whether a query sequence is a true and/or functional member of the domain family. In particular, they help users discriminate between chance similarity and actual homology when the statistical significance places a similarity in the ‘twilight zone’.
Figure 2 A fragment of the query sequence embedded in the domain alignment for hemopexin-like repeats. Conserved features (metal binding sites) are indicated with hash marks on top of aligned columns.
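The idea that a feature is addressed by a set of alignment columns, and is displayed as hash marks above them, can be sketched as follows. This is an illustration only: the helper names `feature_ruler` and `residues_at_feature` are hypothetical, columns are assumed to be 0-based, and the two aligned rows are invented toy strings rather than real hemopexin sequences.

```python
def feature_ruler(n_columns, feature_columns, mark="#", blank=" "):
    """Build a marker line with `mark` above every feature column."""
    cols = set(feature_columns)
    if any(c < 0 or c >= n_columns for c in cols):
        raise ValueError("feature column outside the alignment")
    return "".join(mark if i in cols else blank for i in range(n_columns))


def residues_at_feature(aligned_row, feature_columns):
    """Return the characters of one aligned row at the feature columns.

    Gap characters are returned unchanged, so a gap at a feature site
    remains visible to the caller.
    """
    return [aligned_row[c] for c in sorted(set(feature_columns))]


if __name__ == "__main__":
    rows = {"query": "HELHALGL", "member_1": "HEFHSLGL"}  # toy data
    feature = {0, 3}  # hypothetical feature address (0-based columns)
    print(" " * 10 + feature_ruler(8, feature))
    for name, row in rows.items():
        print(f"{name:<10}{row}")
    print(residues_at_feature(rows["query"], feature))
```

Reading the query residues at the feature columns is one simple way to check whether an annotated site is conserved in the query; the actual display offers considerably more than this sketch.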
A powerful aid in studying query–subject relationships is the availability of 3D information. A large proportion of domain models can be linked to one or several 3D structures, and the NCBI's 3D structure viewer, Cn3D, can be used to interactively visualize the structure together with the multiple alignment of a domain family. Information about annotated features can be accessed from within Cn3D, and this combination provides a unique set of tools to hypothesize about the effects of sequence variation on otherwise conserved sites, such as catalytic sets of residues or binding interfaces.
An example is shown in Figure 3. The Cn3D viewer is launched by clicking the button labeled ‘View 3D Structure’ on alignment display pages (Figure 2). Cn3D needs to be installed locally as a helper application; for details follow the link labeled ‘download Cn3D’ on the alignment display pages (Figure 2). The set of sequences displayed, together with the embedded query, the consensus and a representative 3D structure, can be controlled using the options next to the ‘Subset Rows’ button. By default, Cn3D is launched with 10 aligned rows; sequences from the alignment model are picked as the ones most similar to the query, judged by the number of identical residues within the aligned footprint. For domain-alignment models curated at NCBI, several 3D structures may be shown at once, with structure superpositions precalculated in the curation process. Cn3D allows the user to interactively highlight residues in either the 3D or the alignment view; highlights immediately transfer to the other view. Cn3D also allows the user to explore prerecorded annotation of conserved features, feature evidence and prerecorded links to literature.
Figure 3 Alignment visualization in the context of 3D structure. The query sequence has been added to the domain-alignment model. In this particular view the user visualizes evidence for the metal binding site (a conserved feature of this family), which is provided (more ...)
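The default row selection described above (the rows most similar to the query, judged by the number of identical residues) can be approximated by the following Python sketch. It is not the code used by CD-Search or Cn3D: the function names are hypothetical, all rows are assumed to be given as equal-length aligned strings with '-' for gaps, and ties are assumed to keep their input order.

```python
def count_identities(query_row, subject_row, gap="-"):
    """Count aligned columns where both rows carry the same residue."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    return sum(1 for q, s in zip(query_row, subject_row)
               if q == s and q != gap)


def pick_rows(query_row, family_rows, n_rows=10):
    """Return the names of the n_rows family members closest to the query.

    family_rows maps a sequence name to its aligned row. Sorting is
    stable, so equally similar rows keep their original order.
    """
    ranked = sorted(
        family_rows,
        key=lambda name: count_identities(query_row, family_rows[name]),
        reverse=True,
    )
    return ranked[:n_rows]


if __name__ == "__main__":
    query = "ACDE-G"  # toy aligned rows, not real sequences
    family = {"a": "ACDEFG", "b": "AC-E-G", "c": "TTTTTT"}
    print(pick_rows(query, family, n_rows=2))
```

Columns in which both rows carry a gap are not counted as identities here, which is a design choice of this sketch rather than something stated in the text.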