Summary statistics of the IBIS database
Currently, a total of 40 716 proteins (151 887 protein chains/domains) are represented in IBIS with at least one type of interaction observed in their structural complexes. As can be seen from the histogram of observed and inferred binding sites (A), protein–protein and protein–chemical interactions are the most frequent types of interactions observed in protein structures. Protein–protein interactions are the most prevalent, as reflected by both the number of domains involved in interactions and the number of binding sites. The number of inferred interactions is always higher than the number of observed interactions, especially for protein–peptide and protein–nucleic acid interactions, where the number of inferred interactions exceeds the number of observed ones (in terms of the number of protein chains) almost 5-fold. This ratio is even higher for binding site clusters (B). Altogether, IBIS provides information on binding partners and binding site locations, with averages of 3.4 protein–chemical binding site clusters per chain and eight protein–protein binding site clusters per domain. The scale of such annotations is approaching the scale of whole interactomes.
(A) Histogram depicting the number of proteins in PDB with observed/inferred binding sites. (B) Histogram showing the number of binding sites inferred by IBIS as compared to those observed in protein structure complexes.
Description of the IBIS interface
IBIS may be queried by supplying either a protein NCBI GenBank identifier or a PDB code (the one-letter PDB chain identifier is optional). For a given query, the different types of interactions (protein–protein, protein–chemical, protein–DNA, protein–RNA and protein–peptide) can be viewed by navigating through the tabs at the top of the page (the display of protein–ion interactions is currently under development). The screenshot below illustrates an IBIS Interaction Summary page. Observed and inferred binding site clusters are sorted by the ranking score. Each row in the table corresponds to a binding site cluster and can be expanded to show the cluster members.
IBIS screen shot for 1U59, Chain A, displaying various chemical binding sites inferred from its homologs. A blowup of the expanded cluster of the ATP binding site is also shown.
The main features of binding sites and interaction partners in the Interaction Summary table are as follows:
‘Interaction partner’—name of the interaction partner which interacts with either the actual query (‘observed’ interactions) or homologs of the query from within a given binding site cluster (‘inferred’ interactions). For protein–protein interactions, the CDD domain name of the binding partner is listed. For protein–chemical interactions, the column reports the name of the chemical bound to a representative member of the cluster. For protein–nucleic acid and protein–peptide interactions, the column reports the sequence of the first 20 biopolymer residues from the interaction partner of a representative cluster member.
‘Ranking score’—the score which ranks the binding site clusters in terms of their biological relevance and similarity to the query. The ranking score is not defined for the ‘singleton’ clusters.
‘Number of cluster members’—the number of cluster members. Upon cluster expansion only non-redundant cluster members are displayed (at <90% identity level). A complete list of members can also be viewed by clicking the ‘See all members’ link.
‘Average percent identity to query’—the average sequence identity between the query and the cluster members calculated over all of their structural alignments with the query.
‘Number of binding site residues’—the union of binding sites mapped from all members of the cluster to the query.
‘Number of chemicals’ (for protein–chemical interactions)—the number of unique, standardized chemicals present in a given binding site cluster.
‘Curator annotation’—binding site annotation from the CDD which overlaps by >50% with the sites annotated by IBIS. Binding site clusters with matching CDD annotation are top-ranked irrespective of their ranking score.
‘Taxonomic diversity’—the last common ancestor of the proteins from a given cluster, listed with a link to NCBI’s Taxonomy Browser, so that one can explore all taxonomic groups represented by the cluster.
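The summary columns described above can be illustrated with a minimal Python sketch. It is not the IBIS implementation: the `Member` and `Cluster` classes, their field names, and the choice to place 'singleton' clusters (which have no ranking score) last are assumptions made for illustration. The denominator used in the >50% overlap test is also an assumption, since the text does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    """One homologous complex in a cluster (hypothetical record layout)."""
    name: str
    percent_identity: float       # identity to the query over the structural alignment
    mapped_sites: frozenset       # query positions contacted, after mapping to the query


@dataclass
class Cluster:
    partner: str
    members: list
    ranking_score: Optional[float] = None   # None models a 'singleton' cluster
    has_cdd_annotation: bool = False

    @property
    def binding_site_residues(self) -> set:
        """Union of the binding sites mapped from all members onto the query."""
        sites = set()
        for member in self.members:
            sites |= member.mapped_sites
        return sites

    @property
    def average_identity(self) -> float:
        """Mean percent identity between the query and the cluster members."""
        return sum(m.percent_identity for m in self.members) / len(self.members)


def overlaps_curated_site(ibis_sites: set, cdd_sites: set, threshold: float = 0.5) -> bool:
    """True if more than `threshold` of the curated site is covered.

    Assumption: the overlap is measured relative to the size of the CDD site.
    """
    if not cdd_sites:
        return False
    return len(ibis_sites & cdd_sites) / len(cdd_sites) > threshold


def sort_clusters(clusters: list) -> list:
    """Order clusters: CDD-annotated first, then by descending ranking score.

    Assumption: clusters without a score (singletons) are listed last.
    """
    def key(cluster):
        score = cluster.ranking_score
        if score is None:
            score = float("-inf")
        return (not cluster.has_cdd_annotation, -score)

    return sorted(clusters, key=key)
```

With this ordering, a cluster that matches a curator annotation is listed ahead of a cluster with a higher ranking score but no annotation, mirroring the 'Curator annotation' rule above.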
The actual binding site residue alignment can be seen upon expanding the clusters, including the PDB codes corresponding to all complex structures summarized by the clusters. It is also possible to view the inferred binding sites projected onto the actual query structure using the Cn3D visualization software (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml). For protein–protein interactions, the expanded table provides the PISA validation status for each interaction interface. PISA may not be able to process a particular complex structure; these cases are indicated by an ‘N/A’ symbol.
The features of binding site clusters can be examined by using the ‘Advanced search’ option found on the left side bar. This option allows one to filter the interactions within a given interaction type by criteria such as the level of sequence identity, structural similarity, the names of interacting partners and others. In the case of chemical binding sites, for example, it is possible to pick and inspect the various sites to which a particular chemical may bind on a given query.
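The kind of filtering offered by ‘Advanced search’ can be sketched in a few lines of Python. This is only an illustration of the idea, not the IBIS query interface: the dictionary keys (`partner`, `avg_identity`, `chemicals`, `sites`) and both function names are hypothetical.

```python
def filter_clusters(clusters, min_identity=None, partner_contains=None):
    """Keep clusters that satisfy every criterion that was supplied.

    clusters: list of dicts with hypothetical keys 'partner' (str) and
    'avg_identity' (float, percent).
    """
    kept = []
    for cluster in clusters:
        if min_identity is not None and cluster["avg_identity"] < min_identity:
            continue
        if (partner_contains is not None
                and partner_contains.lower() not in cluster["partner"].lower()):
            continue
        kept.append(cluster)
    return kept


def sites_for_chemical(clusters, chemical):
    """Return the binding sites (sets of query positions) of every cluster
    in which the given chemical is observed.

    Uses the hypothetical keys 'chemicals' (set of names) and 'sites'
    (set of query positions).
    """
    wanted = chemical.lower()
    return [
        cluster["sites"]
        for cluster in clusters
        if wanted in {name.lower() for name in cluster["chemicals"]}
    ]
```

Calling `sites_for_chemical(clusters, "ATP")` on such records would list every distinct site on the query at which ATP is seen in some homolog, which corresponds to the chemical-centred inspection described above.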
Annotating new binding sites using IBIS: example of human spleen tyrosine kinase catalytic domain
Spleen tyrosine kinase (Syk) is a non-receptor tyrosine kinase, expressed in a wide range of cell types, which plays an important role in immunoreceptor signaling (33). It is an attractive drug target for the treatment of allergic and antibody-mediated autoimmune diseases, and of breast and gastric cancers. Syk is characterized by two N-terminal SH2 adapter domains, a linker region and a C-terminal catalytic domain. Several drugs/inhibitors target the active site of the Syk catalytic domain and decrease its activity.
Here, we demonstrate how IBIS can be used to annotate the binding sites of the Syk catalytic domain. We start with a Syk sequence for which a structure of the complex with ligands is available (PDB code: 1XBB), predict binding sites using IBIS, and finally compare the predicted sites with the binding sites actually observed in the structure. First, we find the closest homolog with a known structure, a Zap-70 kinase (1U59 Chain A; BLAST E-value of 6e-99 and 77% identity to the query sequence). Second, we use the structure of 1U59 as a query in IBIS and find nine protein–chemical binding site clusters. The top two clusters overlap with the ‘active site/ATP binding site’ CDD annotations. The first binding site cluster includes 360 homologous structures bound to 170 different chemicals. The consensus binding site alignment is 65 residues long, owing to the diversity and size variation of the bound chemicals, but it highlights 13 highly conserved residues. The ATP-binding site represents an attractive target for the design of kinase inhibitors, and IBIS provides a concise summary of interactions at that site, which would otherwise require significant comparative analysis. Here, IBIS identifies and groups the ATP-binding site and provides a list of chemicals, among them many kinase inhibitors, which might potentially bind to and inhibit the query protein. All binding sites observed in the actual structure of the complex with the anticancer drug imatinib (1XBB) are correctly annotated by IBIS (see the table in the accompanying figure). Interestingly, imatinib binds not only to the ATP-binding site but also to a regulatory myristoylation site on the C-terminus (from binding site cluster #8) that can be annotated on the query sequence.
Figure 4. Mapping of the 1U59 inferred ATP binding site onto the sequence of Syk tyrosine kinase (1XBB chain A) and its agreement with the observed binding site in the Syk complex with imatinib. MMDB residue numbering is used which starts from the beginning of the ...
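The two steps of this example, transferring an inferred site from the homolog onto the query sequence and checking it against the observed site, can be sketched as follows. This is a generic illustration under stated assumptions, not the procedure IBIS uses internally: positions are 1-based sequence indices, the pairwise alignment is supplied as two gapped strings of equal length, and the function names `map_positions` and `agreement` are hypothetical.

```python
def map_positions(aligned_src, aligned_dst, src_positions):
    """Map 1-based positions of the source sequence onto the destination
    sequence through a gapped pairwise alignment ('-' marks a gap).

    Source positions aligned to a gap in the destination are dropped.
    """
    if len(aligned_src) != len(aligned_dst):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    i = j = 0                      # running positions in source / destination
    for a, b in zip(aligned_src, aligned_dst):
        if a != "-":
            i += 1
        if b != "-":
            j += 1
        if a != "-" and b != "-":
            mapping[i] = j         # both columns hold a residue: aligned pair
    return {mapping[p] for p in src_positions if p in mapping}


def agreement(predicted, observed):
    """Return (precision, recall) of predicted versus observed site positions."""
    true_positives = len(predicted & observed)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(observed) if observed else 0.0
    return precision, recall
```

For the toy alignment `"AC-DEF"` against `"ACGD-F"` (made-up sequences, not Zap-70 or Syk), source positions 2, 3 and 4 map to destination positions 2 and 4; position 4 of the source falls opposite a gap and is not transferred.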
In addition to chemical binding sites, it is also possible to predict protein interaction partners for the Syk protein. For example, binding site cluster #1 under protein–protein interactions points to a potential SH2 domain binding site which is further validated by CDD curator annotation, although no structural complexes have been solved between Syk and SH2.