Data collection currently focuses on targets whose three-dimensional structures are available in the Protein Data Bank (PDB) (5) or can be accurately modeled. Such data are of particular interest because they are amenable to structural analysis and are suitable for the development and validation of computational models of binding. Statistical sampling of the PDB in 2003 revealed that ~150 of the non-redundant proteins therein were considered current or potential drug-targets (unpublished data) and were thus suitable for data collection by BindingDB. This analysis omits additional drug-targets whose structures could be built by comparative modeling. Restricting attention to proteins of known structure allows BindingDB to complement, rather than overlap, other binding databases that collect data for membrane proteins whose 3D structures are, in the main, unavailable; e.g. GPCRDB (www.gpcr.org), the IUPHAR receptor database (www.iuphar-db.org) and GLIDA (http://gdds.pharm.kyoto-u.ac.jp/services/glida/index.php).
Proteins are selected for data collection based upon their importance as drug-targets or model systems, as well as the availability of suitable data. Once a protein is selected, relevant scientific articles are identified and their data are extracted and deposited into BindingDB. Data from multiple laboratories and companies are sought in order to obtain a wide range of chemotypes for the targeted protein. The journals from which data are drawn include J. Med. Chem., Bioorg. Med. Chem. Lett. and Biochem. Web-accessible forms also allow direct deposition by experimentalists, but this route has not generated a significant number of entries. The majority of the data are based upon enzyme inhibition studies (>19 000 measurements), but a smaller number of measurements from the more informative method of isothermal titration calorimetry are also included (416 measurements). Each data entry includes detailed experimental conditions, such as solution composition, pH and temperature, because these can affect the measured affinities.
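As an illustration of why the experimental conditions are stored with each measurement, the following minimal sketch models one data entry as a record. All field names and the tolerance values are hypothetical and are not taken from BindingDB's actual schema; the sketch only shows how recorded pH and temperature could be used to decide whether two measurements are directly comparable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BindingEntry:
    """One binding measurement plus the conditions under which it was made.

    Field names are hypothetical; they do not reproduce BindingDB's schema.
    """
    target: str
    ligand: str
    method: str                      # e.g. "enzyme inhibition" or "ITC"
    quantity: str                    # e.g. "Ki" or "IC50"
    value_molar: float               # measured value in mol/L
    ph: Optional[float] = None
    temperature_kelvin: Optional[float] = None
    solution: Optional[str] = None   # free-text buffer composition
    source: Optional[str] = None     # citation of the originating article

    def comparable_to(self, other: "BindingEntry",
                      ph_tol: float = 0.1, temp_tol: float = 1.0) -> bool:
        """True only if both entries report pH and temperature and they agree
        within the given (assumed) tolerances."""
        conditions = (self.ph, other.ph,
                      self.temperature_kelvin, other.temperature_kelvin)
        if any(c is None for c in conditions):
            return False
        return (abs(self.ph - other.ph) <= ph_tol
                and abs(self.temperature_kelvin - other.temperature_kelvin) <= temp_tol)


# Made-up entries for illustration only.
acidic = BindingEntry("protease A", "ligand 1", "enzyme inhibition", "Ki",
                      2.0e-9, ph=5.5, temperature_kelvin=298.15)
neutral = BindingEntry("protease A", "ligand 1", "enzyme inhibition", "Ki",
                       8.0e-9, ph=7.4, temperature_kelvin=310.15)
print(acidic.comparable_to(neutral))  # conditions differ, so not comparable
```

Keeping the conditions optional reflects the fact that not every source article reports them; an entry without them is simply treated as not comparable.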
BindingDB currently holds ~20 000 binding data for ~11 000 different small molecule ligands and 110 different drug-targets, or 74 targets when mutants and isoforms are not counted separately. Examples include anthrax lethal factor, various caspases and kinases, and HIV protease and reverse transcriptase. Perhaps the most similar public effort is KiBank (9), which provides a sparser user-interface to a substantial data set of ~16 000 data for 5900 small molecule ligands and 50 protein targets, apparently including proteins for which no structural data are available. For a perspective on BindingDB's current data holdings, the accompanying figures show the number of binding measurements for various targets and target classes, and provide histograms of Ki and IC50 values and of the molecular weights of the small molecules across all entries. Although structural data are available for every protein target included in BindingDB, BindingDB collects data for many ligands that are not represented in the PDB. For example, the PDB has ~50 structures of acetylcholinesterases, while BindingDB has affinity data for acetylcholinesterase with ~250 different ligands. More generally, ~2% of ligands in BindingDB have an exact match in the PDB and ~15% of ligands in BindingDB have 90% similarity to a ligand in the PDB, based upon the search criterion of the PDB. Thus, BindingDB's data collection differs significantly from those of databases that only collect affinities for protein–ligand complexes in the PDB, notably BindingMOAD (10), which holds ~1400 data, PDBBind (11) with ~1600 data, and AffinDB (13) with ~750 data.
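The two target counts quoted above (with and without mutants and isoforms counted separately) correspond to two ways of grouping the same records. The sketch below illustrates the distinction on a handful of made-up records; the protein and variant labels are hypothetical, and the function is an assumption about how such a tally could be done, not a description of BindingDB's internals.

```python
from collections import Counter

# Made-up (protein, variant) labels, one per measurement; illustration only.
measurements = [
    ("protease A", "wild type"),
    ("protease A", "mutant 1"),
    ("protease A", "wild type"),
    ("kinase B", "isoform 1"),
    ("kinase B", "isoform 2"),
    ("factor C", "wild type"),
]


def count_targets(records, collapse_variants):
    """Count distinct targets, optionally merging mutants and isoforms
    of the same protein into a single target."""
    if collapse_variants:
        return len({protein for protein, _variant in records})
    return len(set(records))


# Number of measurements per protein, ignoring the variant label.
per_protein = Counter(protein for protein, _variant in measurements)

print(count_targets(measurements, collapse_variants=False))  # variants counted separately
print(count_targets(measurements, collapse_variants=True))   # variants merged
print(per_protein.most_common())
```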
Number of measurements in BindingDB for various targets and target classes.
Histograms of binding affinities (1 M standard concentration), and molecular weights of ligands in BindingDB.
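The reference to a 1 M standard concentration suggests that affinities are expressed relative to that standard state. Assuming the usual thermodynamic relation ΔG° = RT ln(K/C°) for a dissociation or inhibition constant K, the following sketch converts a Ki value into a standard binding free energy. The function name and the default temperature of 298.15 K are assumptions made for illustration; the relation does not strictly apply to IC50 values, which depend on assay conditions.

```python
import math

GAS_CONSTANT_KJ = 8.314462618e-3   # molar gas constant in kJ/(mol K)
STANDARD_CONCENTRATION_M = 1.0     # 1 M standard state


def binding_free_energy(k_molar, temperature_kelvin=298.15):
    """Standard binding free energy in kJ/mol from a dissociation or
    inhibition constant given in mol/L: dG = R * T * ln(K / C0)."""
    if k_molar <= 0:
        raise ValueError("the constant must be positive")
    if temperature_kelvin <= 0:
        raise ValueError("the temperature must be positive")
    ratio = k_molar / STANDARD_CONCENTRATION_M  # dimensionless
    return GAS_CONSTANT_KJ * temperature_kelvin * math.log(ratio)


# A 1 nM inhibitor corresponds to roughly -51 kJ/mol at 298.15 K.
print(round(binding_free_energy(1e-9), 2))
```

Because the logarithm needs a dimensionless argument, the choice of standard concentration fixes the zero of the free energy scale: a constant of exactly 1 M maps to 0 kJ/mol, and tighter binders give more negative values.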