A simple query
MMDB is an integrated part of Entrez and can be accessed by querying Entrez’s ‘3D structure’ database for particular terms or keywords. This allows one to identify structures by protein name, author name, publication date, species name or other terms. Such a query produces a list of MMDB entries that one may browse, following links to other databases, for example to MEDLINE abstracts. At the time of writing, MMDB’s servers receive approximately 25 000 3D structure queries per day.
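The web form is the route described in this section, but the same ‘3D structure’ database can also be queried programmatically through NCBI’s E-utilities. As a minimal sketch, assuming the ESearch endpoint and its `db`, `term` and `retmax` parameters (E-utilities are not covered by this section), the following builds the search URL for the example query used below; the helper name is hypothetical, and the number of hits returned today will differ from the counts quoted here.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint: a programmatic counterpart of typing a query into Entrez.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_structure_search_url(term: str, retmax: int = 20) -> str:
    """Return an ESearch URL that queries Entrez's 'structure' (MMDB) database for `term`."""
    params = {
        "db": "structure",  # Entrez database name for MMDB 3D structures
        "term": term,       # the same free-text terms one would type into the web form
        "retmax": retmax,   # maximum number of MMDB IDs to return
    }
    # urlencode escapes the query; spaces become '+'
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_structure_search_url("aminocyclopropane synthase"))
```

Fetching the printed URL (for example with `urllib.request.urlopen`) returns an XML list of matching MMDB identifiers, i.e. the same list the web query displays.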
As an example, we consider a search with the terms ‘aminocyclopropane synthase’. This identifies several 3D structures available for this enzyme, including the structure with PDB identifier ‘1B8G’ (12), the protein from Malus x domestica, the apple tree. Following the link to ‘3D domains’, one sees that structure neighbors are available for eight different substructures: the complete chains A and B plus three compact domains identified within each. Following the link to ‘related 3D domains’ for domain ‘1B8G A 3’ (the third domain in chain A, numbered from the N-terminus of the chain), one sees that 3D superpositions are available for over 1000 structure neighbors of this domain.
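The domain labels used in this section combine a PDB identifier, a chain label and a domain number counted from the chain’s N-terminus, and the count of eight substructures for 1B8G is simply two chains plus three domains in each. The sketch below encodes that reading. The class and function names are hypothetical, and treating a two-part label such as ‘1AMQ 2’ as a domain on a chain with no chain letter is an assumption, not documented MMDB behavior.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DomainId:
    pdb_id: str           # four-character PDB identifier, e.g. "1B8G"
    chain: Optional[str]  # chain label, e.g. "A"; None when the label has no chain letter (assumption)
    domain: int           # domain number, counted from the N-terminus of the chain


def parse_domain_id(text: str) -> DomainId:
    """Parse labels such as '1B8G A 3' (PDB ID, chain, domain) or '1AMQ 2' (no chain letter)."""
    parts = text.split()
    if len(parts) == 3:
        pdb_id, chain, domain = parts
    elif len(parts) == 2:
        pdb_id, domain = parts
        chain = None
    else:
        raise ValueError(f"unrecognized domain label: {text!r}")
    return DomainId(pdb_id.upper(), chain, int(domain))


def count_substructures(domains_per_chain: Dict[str, int]) -> int:
    """Number of substructures with neighbors: each whole chain plus every domain within it."""
    return len(domains_per_chain) + sum(domains_per_chain.values())
```

For 1B8G, `count_substructures({"A": 3, "B": 3})` gives 2 + 6 = 8, matching the eight substructures listed under ‘3D domains’.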
A more advanced query
Entrez provides a query refinement feature that allows one to combine the results of simple queries involving term-match hits, links or neighbors. To continue with the example above, suppose one wishes to identify some of the most evolutionarily distant structure neighbors of domain ‘1B8G A 3’, as a means to identify conserved residues that may be associated with its binding and/or catalytic function. One option is to examine the tabular listing of VAST superposition statistics, available by following the link from the domain identifier ‘1B8G A 3’, and to choose structure neighbors with a low percentage of identical residues in the structural alignment. Another powerful method is to choose structure neighbors from phylogenetically distant organisms. This search requires combining the results of an MMDB search by taxonomy with structure-neighboring results.
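The first option, picking neighbors with a low percentage of identical residues from the VAST table, amounts to a filter followed by a sort. A minimal sketch follows, assuming the table rows have been copied into records with hypothetical fields `neighbor`, `aligned_residues` and `percent_identity` (illustrative names, not the actual VAST column headings); the 20% cutoff is likewise an arbitrary example value.

```python
from typing import Iterable, List, NamedTuple


class VastHit(NamedTuple):
    neighbor: str            # domain label of the structure neighbor
    aligned_residues: int    # residues included in the structural alignment
    percent_identity: float  # % identical residues among the aligned ones


def distant_neighbors(hits: Iterable[VastHit],
                      max_identity: float = 20.0,
                      min_aligned: int = 0) -> List[VastHit]:
    """Keep low-identity neighbors, lowest identity first.

    Ties in identity are broken by preferring more aligned residues,
    i.e. more extensive structural similarity.
    """
    kept = [h for h in hits
            if h.percent_identity <= max_identity and h.aligned_residues >= min_aligned]
    return sorted(kept, key=lambda h: (h.percent_identity, -h.aligned_residues))
```

Raising `min_aligned` discards neighbors whose low identity merely reflects a short alignment.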
As may be seen by following the taxonomy links from domain ‘1B8G A 3’, this protein is derived from an organism (apple tree) in the superkingdom Eukaryota. The most distantly related organisms will be those from the two other superkingdom taxa, Eubacteria and Archaea. Searching Entrez’s ‘3D Domain’ database for ‘Archaea’ (with ‘limits’ set to ‘organism’), one finds approximately 1000 3D domain structures known for this taxon. To select those that are also structure neighbors of 3D domain ‘1B8G A 3’, one uses Entrez’s ‘history’ window to request the Boolean ‘AND’ of the 3D domains identified by each simple query: <1> AND <2>, where <1> and <2> represent query numbers as recorded in Entrez’s history list. Performing this search, one finds approximately 20 structures that are both structure neighbors of ‘1B8G A 3’ and derived from Archaea, among them domain ‘1DJU A 3’, a domain from an aromatic aminotransferase from Pyrococcus horikoshii. Proceeding similarly for ‘Eubacteria’, one finds that several hundred structure neighbors of ‘1B8G A 3’ derive from this taxon, including domain ‘1AMQ 2’, from an aspartate aminotransferase from Escherichia coli.
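The history-window combination is ordinary set algebra on the record IDs returned by each query: ‘AND’ keeps only records present in both result sets. The following is a simplified, hypothetical evaluator for expressions written in this section’s <n> notation. It assumes strict left-to-right evaluation and supports only AND, OR and NOT between history references, with no parentheses or field tags, so it is far narrower than Entrez’s own query parser.

```python
import re
from typing import Dict, Set

# Set operation used for each Boolean operator.
_OPS = {
    "AND": lambda a, b: a & b,  # records in both result sets
    "OR": lambda a, b: a | b,   # records in either result set
    "NOT": lambda a, b: a - b,  # records in the left set but not the right
}


def evaluate_history(expression: str, history: Dict[int, Set[str]]) -> Set[str]:
    """Evaluate e.g. '<1> AND <2>' against numbered result sets, strictly left to right."""
    tokens = expression.split()
    if not tokens or len(tokens) % 2 == 0:
        raise ValueError("expected: <n> [OP <n>] ...")

    def lookup(token: str) -> Set[str]:
        match = re.fullmatch(r"<(\d+)>", token)
        if match is None:
            raise ValueError(f"not a history reference: {token!r}")
        return history[int(match.group(1))]

    result = set(lookup(tokens[0]))  # copy, so the stored history is never modified
    for op, ref in zip(tokens[1::2], tokens[2::2]):
        result = _OPS[op.upper()](result, lookup(ref))
    return result
```

In the search above, one history entry would hold the structure neighbors of the apple domain and the other the Archaea domains; `evaluate_history("<1> AND <2>", history)` then yields the structures that satisfy both conditions.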
Visualization of structure neighbors is available from the ‘View’ link provided with tabular listings of VAST superposition statistics. Choosing the structure neighbors ‘1DJU A 3’ and ‘1AMQ 2’ from among the other neighbors of ‘1B8G A 3’ and pressing the ‘View’ button, one may launch a Cn3D display as shown in Figure 2. Setting Cn3D to color aligned residues by variability, one can immediately see that conserved residues are concentrated in a single region of these domains. Furthermore, since each structure contains a bound pyridoxal phosphate cofactor (or related compound), one can verify that these conserved residues line the binding pocket and are presumably necessary for cofactor binding and aminotransferase activity. We note that tabular listings of VAST superposition statistics provide several controls for sorting and subset selection, as an aid to browsing. To reproduce the superposition in Figure 2, it is helpful to select subset ‘all of MMDB’ and sort by ‘aligned residues’. This allows one to identify structure neighbors that have extensive similarity (many aligned residues) and, in this example, bound cofactors.
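Coloring by variability assigns each aligned position to a conservation class. As an illustration only, the sketch below applies a simple identity-based rule to a gapped multiple alignment given as equal-length strings. The rule, the 0.5 cutoff and the ‘unaligned’ handling of gap columns are assumptions and do not reproduce Cn3D’s own variability calculation; the conserved, partial and variable labels correspond loosely to the red, magenta and blue classes in the Figure 2 caption.

```python
from collections import Counter
from typing import List


def classify_columns(rows: List[str], partial_cutoff: float = 0.5) -> List[str]:
    """Label each alignment column 'conserved', 'partial', 'variable' or 'unaligned'.

    Hypothetical identity-based rule (not Cn3D's scoring):
      conserved: every row has the same residue
      partial:   the most common residue fills at least `partial_cutoff` of the rows
      variable:  anything else
      unaligned: at least one row has a gap ('-') in this column
    """
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("rows must be non-empty and of equal length")
    labels = []
    for column in zip(*rows):  # iterate over alignment columns
        if "-" in column:
            labels.append("unaligned")
            continue
        top_count = Counter(column).most_common(1)[0][1]
        if top_count == len(column):
            labels.append("conserved")
        elif top_count / len(column) >= partial_cutoff:
            labels.append("partial")
        else:
            labels.append("variable")
    return labels
```

With three structures, as in Figure 2, a column in which two of three residues agree counts as ‘partial’ under the default cutoff.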
Figure 2 Structural alignment of aminotransferase domains from the three superkingdoms of the NCBI taxonomy, Archaea, Eukaryota and Eubacteria. Conserved residues are shown in red, partially conserved residues in magenta, and non-conserved residues in blue. Grey …