Users of NCBI's services are likely to encounter CDD in two ways. (i) When protein query sequences are submitted for BLAST searches against protein databases, the queries are submitted to CD-Search by default, and the results, if any, are displayed graphically on the intermediate BLAST results page. Clicking on the image launches a browser window with the detailed results, which allow further analysis. (ii) Pre-calculated CD-Search results exist for proteins in Entrez and are readily available via the [Domains] link associated with protein records and document summaries. One might, for example, study a hypothetical protein from a complete genome sequence, say gi|2495965 from Methanocaldococcus jannaschii. Following the [Domains] link and expanding the summary to show more details produces a graphical display, as shown in Figure . While the protein maps to a conserved family of unknown function (DUF135/pfam02003), the sequence also produces hits to two models for DNA ligases (pfam01068 and LOAD_ligase). In fact, these three are grouped together with other domains as ‘related’ in the CDART database, as displayed on each member's conserved domain summary page. This larger group of related domains comprises ATP- and NAD-dependent DNA ligases, whose adenylation domains are known to share a well-conserved core structure around the active site (10). The representative model from the LOAD set, ‘LOAD_ligase’, aligns very diverse members of a large superfamily that also includes RNA ligases and mRNA capping enzymes (11).
These and other interesting family relationships are recorded implicitly in CDD and CDART. Two domains are considered related when they share a subset of sequences on which overlapping intervals hit both domain models with significant E-values in CD-Search. The pre-recorded relationships help users understand the redundancy in the imported and curated collections.
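The relatedness criterion described above can be illustrated with a minimal sketch. It is not the procedure actually used to build CDART; the E-value cutoff, the tuple layout of a hit, the function names, and the example coordinates and sequence identifiers are all assumptions made for illustration. The sketch links two domains whenever at least one sequence carries significant hits to both on overlapping intervals.

```python
from collections import defaultdict
from itertools import combinations

E_VALUE_CUTOFF = 0.01  # assumed significance threshold, not taken from CDD


def intervals_overlap(a_start, a_end, b_start, b_end):
    """Return True if two closed residue intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def related_domain_pairs(hits, cutoff=E_VALUE_CUTOFF):
    """Map each pair of domains to the set of sequences that link them.

    hits: iterable of (sequence_id, domain, start, end, e_value) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Keep only significant hits, grouped by the sequence they occur on.
    by_sequence = defaultdict(list)
    for seq_id, domain, start, end, e_value in hits:
        if e_value <= cutoff:
            by_sequence[seq_id].append((domain, start, end))

    # Two different domains are linked when their hits overlap on a sequence.
    links = defaultdict(set)
    for seq_id, seq_hits in by_sequence.items():
        for (dom_a, s_a, e_a), (dom_b, s_b, e_b) in combinations(seq_hits, 2):
            if dom_a != dom_b and intervals_overlap(s_a, e_a, s_b, e_b):
                links[tuple(sorted((dom_a, dom_b)))].add(seq_id)
    return dict(links)


# Made-up hits for illustration only; coordinates and E-values are not real data.
example_hits = [
    ("seq1", "pfam02003", 10, 200, 1e-30),
    ("seq1", "pfam01068", 50, 220, 1e-5),
    ("seq1", "LOAD_ligase", 40, 210, 1e-4),
    ("seq2", "pfam01068", 5, 180, 1e-20),
    ("seq2", "other_domain", 300, 400, 1e-10),  # no overlap with pfam01068
    ("seq3", "pfam02003", 1, 150, 0.5),         # not significant, ignored
    ("seq3", "LOAD_ligase", 1, 150, 1e-8),
]

pairs = related_domain_pairs(example_hits)
for pair, sequences in sorted(pairs.items()):
    print(pair, sorted(sequences))
```

In this toy input only the hypothetical sequence seq1 links the three domain models to one another. A real implementation would probably also require a minimum number of shared sequences or a minimum overlap length, which this sketch leaves out.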
But how do we know whether these relationships are indicative of common molecular function? Multiple alignments are readily available for inspection and can be coloured by conservation. If a three-dimensional structure has been linked to the domain model, Entrez's structure viewer Cn3D can be used to interactively visualize structure and sequence data for a family. With these tools, and by exploring the relevant literature starting from CD-linked citations, the user may come to understand that it is in fact the catalytic core that is preserved among these families, and that they are likely to share a common enzymatic mechanism. However, had the locations of functionally relevant residues been recorded in the alignment models, it might have been easier to arrive at that conclusion.