We updated our protein-protein docking benchmark to include complexes that became available since our previous release. As before, we only considered high-resolution complex structures that are non-redundant at the family-family pair level and for which X-ray or NMR unbound structures of the constituent proteins are also available. Benchmark 4.0 adds 52 new complexes to the 124 cases of Benchmark 3.0, an increase of 42%, and thus provides 176 unbound-unbound cases that can be used to develop and assess protein-protein docking methods. Seventeen of the newly added cases are enzyme-inhibitor complexes; we found no new antigen-antibody complexes. Classified by expected difficulty for protein-protein docking algorithms, the new cases comprise 33 rigid body cases, 11 cases of medium difficulty, and 8 difficult cases. Benchmark 4.0 listings and processed structure files are publicly accessible at http://zlab.umassmed.edu/benchmark/
During the last decade, the computational protein-protein docking field has advanced considerably. This is due in part to efforts to make algorithms available to the community through web servers and/or downloadable packages1–8, the community-wide CAPRI experiment9, and the development of publicly available benchmarks of protein-protein complexes.10,11
A protein-protein docking benchmark provides the community with a set of non-redundant protein-protein complexes for which both the complex structure and the unbound structures of the constituent proteins are available. A benchmark forms a subset of the Protein Data Bank (PDB)12 and provides a standard dataset for the systematic comparison of docking algorithms. Because the PDB keeps growing, updating a benchmark with new PDB entries increases the number and diversity of the interactions it covers.
Eight years ago we introduced the first protein-protein docking benchmark,10 and we have since updated it twice, in 2005 (Benchmark 2.0) and 2008 (Benchmark 3.0).13,14 Recently, Kastritis and Bonvin collected experimentally measured protein-protein binding affinities (Kd values) for 81 test cases in Benchmark 3.0.15 Since our last release, the number of entries in the PDB has increased by more than 13,000, which enables us to release a new update of the Benchmark.
We collected candidate structures from the PDB semiautomatically, using the same cutoffs as described previously10,13,14 for X-ray resolution (3.25 Å) and chain length (minimum of 30 residues). Unlike in the previous release, we now also consider structures determined by nuclear magnetic resonance (NMR) for the unbound forms of the proteins. We still excluded NMR structures of complexes, to preclude the possibility that they were generated with the aid of docking algorithms. We used the biological assembly information from the PDB to distinguish crystal contacts from biological complexes. This initial pass yielded 47,767 unbound structures and 8,654 complex structures representing heterocomplexes of at least two interacting chains. The unbound forms of both binding partners were available for 1,667 of these complex structures, and we used the Structural Classification of Proteins (SCOP)16 database (version 1.75) to check this set for redundancy at the family level. Two complexes were deemed redundant if the two proteins in one complex belonged to the same SCOP families as the two proteins in the other complex, respectively. This yielded 109 complexes that were non-redundant both with the complexes in the previous release of the Benchmark and amongst themselves. (PDB entries without a SCOP unique identifier, sunid,17 were excluded from the list of bound candidates to avoid possible redundancy.) Finally, we used literature information to eliminate obligate complexes,18 which reduced the list to 52 complexes.
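The cutoffs and the family-pair redundancy rule above can be illustrated with a minimal Python sketch. The entry fields, the family labels, and the requirement that every chain meet the length cutoff are assumptions for illustration. The sketch also treats a family pair as unordered (so it does not matter which partner is listed first) and keeps the first candidate it sees for each new pair; the text does not say how a choice between mutually redundant new candidates was made.

```python
from typing import Iterable, List, Optional, Set, Tuple

MAX_XRAY_RESOLUTION = 3.25  # Å, resolution cutoff stated in the text
MIN_CHAIN_LENGTH = 30       # residues, chain-length cutoff stated in the text


def passes_filters(method: str, resolution: Optional[float],
                   chain_lengths: Iterable[int], is_complex: bool) -> bool:
    """Apply the resolution, chain-length and NMR rules to one PDB entry.

    Assumption: every chain of the entry must meet the length cutoff.
    """
    if any(n < MIN_CHAIN_LENGTH for n in chain_lengths):
        return False
    if method == "NMR":
        # NMR is accepted for unbound structures only, never for complexes.
        return not is_complex
    return resolution is not None and resolution <= MAX_XRAY_RESOLUTION


def family_pair(fam_a: str, fam_b: str) -> Tuple[str, str]:
    """Order-independent key for the SCOP families of the two partners."""
    return (fam_a, fam_b) if fam_a <= fam_b else (fam_b, fam_a)


def select_nonredundant(
    candidates: Iterable[Tuple[str, Optional[str], Optional[str]]],
    existing_pairs: Iterable[Tuple[str, str]],
) -> List[str]:
    """Keep candidates whose SCOP family pair is not yet covered.

    candidates: (complex_id, family_a, family_b); a family is None when the
    entry has no SCOP identifier. Greedy: the first candidate in input order
    wins, so candidates should be sorted by preference beforehand (assumed).
    """
    seen: Set[Tuple[str, str]] = {family_pair(a, b) for a, b in existing_pairs}
    kept: List[str] = []
    for complex_id, fam_a, fam_b in candidates:
        if fam_a is None or fam_b is None:
            continue  # no SCOP assignment: excluded, as in the text
        key = family_pair(fam_a, fam_b)
        if key not in seen:
            seen.add(key)
            kept.append(complex_id)
    return kept


# Hypothetical family labels; "previous" stands in for the earlier release.
previous = [("famA", "famB")]
cands = [("c1", "famB", "famA"), ("c2", "famA", "famC"),
         ("c3", "famC", "famA"), ("c4", None, "famD"), ("c5", "famD", "famD")]
print(select_nonredundant(cands, previous))  # ['c2', 'c5']
```

Here c1 repeats a family pair from the previous release, c3 repeats the pair just taken by c2, and c4 lacks a SCOP assignment, so only c2 and c5 survive.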
When we found multiple candidates for an unbound structure, we selected one based on a combination of considerations: highest sequence similarity to the bound structure, highest resolution, and lowest number of missing residues in the protein-protein interface. For NMR entries, which contain an ensemble of models, we selected the model with the lowest interface RMSD (I-RMSD; defined below) relative to the bound form. The final structure files on the benchmark website include any cofactors present in the original PDB files and, for NMR structures, all models provided in the original file.
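A sketch of this selection step follows. The text names the criteria but not how they were combined, so the strict priority order used here (sequence identity, then resolution, then missing interface residues), the field names, and the ranking of NMR entries after X-ray entries of equal identity are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class UnboundCandidate:
    entry_id: str                # hypothetical identifier
    seq_identity: float          # sequence identity to the bound chain, 0..1
    resolution: Optional[float]  # Å; None for NMR entries
    missing_interface: int       # interface residues missing from the model


def pick_unbound(candidates: Sequence[UnboundCandidate]) -> UnboundCandidate:
    """Choose one unbound structure among several candidates.

    Assumed priority: highest identity, then best (lowest) resolution value,
    then fewest missing interface residues. NMR entries have no resolution
    and therefore rank after X-ray entries of equal identity (an assumption).
    """
    def rank(c: UnboundCandidate):
        res = c.resolution if c.resolution is not None else float("inf")
        return (-c.seq_identity, res, c.missing_interface)

    return min(candidates, key=rank)


def pick_nmr_model(irmsd_per_model: Sequence[float]) -> int:
    """Index of the NMR model with the lowest I-RMSD to the bound form."""
    return min(range(len(irmsd_per_model)), key=lambda i: irmsd_per_model[i])


cands = [UnboundCandidate("candA", 1.00, 2.5, 3),
         UnboundCandidate("candB", 1.00, 1.9, 5),
         UnboundCandidate("candC", 0.98, 1.2, 0)]
print(pick_unbound(cands).entry_id)      # candB
print(pick_nmr_model([2.1, 1.4, 1.9]))   # 1
```

A strict priority order is the simplest reading of "a combination of considerations"; a weighted score would be an equally plausible alternative.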
As done for the previous releases of the Benchmark, we classify the new entries according to expected difficulty for protein-protein docking algorithms, based on the structural difference between the bound and the unbound forms of the binding partners:14
We define I-RMSD as the root-mean-square distance between the unbound and bound structures after superposition, calculated over the Cα atoms of the interface residues of both binding partners. Following Mendez et al.,19 fnat and fnon-nat are the fractions of native and non-native residue contacts, respectively, in the superposed unbound structures.
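The definitions above translate directly into code. In this sketch, I-RMSD is computed after optimal superposition with the Kabsch algorithm, one standard way to superpose matched coordinate sets (the text does not name the method used), and fnat and fnon-nat are computed from residue-contact sets. How interface residues and contacts are identified (for example, the distance cutoff) is not stated here, so the functions take matched coordinates and precomputed contact sets as input.

```python
from typing import Set, Tuple

import numpy as np

Contact = Tuple[int, int]  # (residue index in partner 1, residue index in partner 2)


def superposed_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD of matched N x 3 coordinates after optimal (Kabsch) superposition.

    For I-RMSD, P and Q hold the interface Cα atoms of both partners in the
    unbound and bound structures, listed in the same residue order.
    """
    P = P - P.mean(axis=0)                 # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)      # SVD of the covariance matrix
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # forbid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T              # rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def contact_fractions(native: Set[Contact], model: Set[Contact]) -> Tuple[float, float]:
    """Return (fnat, fnon-nat) for the contacts of a superposed unbound model.

    fnat: fraction of native contacts that the model reproduces.
    fnon-nat: fraction of the model's contacts that are not native.
    """
    fnat = len(native & model) / len(native) if native else 0.0
    fnonnat = len(model - native) / len(model) if model else 0.0
    return fnat, fnonnat


native = {(1, 10), (2, 11), (3, 12), (4, 13)}
model = {(1, 10), (2, 11), (5, 14)}
print(contact_fractions(native, model))  # fnat = 2/4, fnon-nat = 1/3
```

A rigidly rotated and translated copy of a structure gives an I-RMSD of essentially zero, which is a quick sanity check for any superposition code.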
The 52 new cases are listed in Table 1, and the entire updated Benchmark is reported in Table S1 in the Supplementary Materials. 1OYV is a 1:2 complex of a two-headed inhibitor with subtilisin.20 We split this complex into two Benchmark cases, representing the interaction of subtilisin chain A with chain I (the inhibitor) and the interaction of subtilisin chain B with chain I, respectively. In addition to the properties described above, the tables report the change in accessible surface area (ASA) upon complexation, which measures the size of the interface between the binding partners.
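The change in ASA is the area of the two separated partners minus the area of the complex. The minimal sketch below assumes that all three areas come from the same ASA program with the same settings; the program and structures used for the Benchmark tables are not specified in this section, and the numbers in the usage line are hypothetical. Some studies instead report half of this value, the area buried per partner.

```python
def delta_asa(asa_partner1: float, asa_partner2: float, asa_complex: float) -> float:
    """Accessible surface area (Å^2) buried upon complexation.

    Assumes all three areas were computed with the same program and settings.
    """
    return asa_partner1 + asa_partner2 - asa_complex


# Hypothetical areas in Å^2, for illustration only.
print(delta_asa(5000.0, 4200.0, 7600.0))  # 1600.0
```

For a split entry such as 1OYV, each of the two cases would get its own value, computed from the chain pair that defines that case.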
Benchmark 4.0 includes 121 rigid body cases (33 new), 30 cases of medium difficulty (11 new), and 25 difficult cases (8 new). By biochemical function, it contains 52 enzyme-inhibitor complexes (17 new), 25 antibody-antigen complexes (none new), and 99 complexes with other functions (35 new). This update includes 16 cases that involve NMR unbound structures: 11 rigid body cases, 4 cases of medium difficulty, and 1 difficult case. The fraction of rigid body cases among them (11 of 16) equals that among the cases with only X-ray unbound structures (110 of 160), so the expected difficulty for docking algorithms is similar for NMR and X-ray unbound structures in the Benchmark. Had we also accepted NMR structures for the complexes, we would have included seven more cases (1GGR, 1J6T, 1O2F, 1P9D, 1UR6, 2ODG, 3EZA). Although one can argue that the exclusion of complex NMR structures should be decided case by case, we chose to leave all of them out, since including them would enlarge the Benchmark only slightly.
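The difficulty criteria themselves are given in reference 14 rather than in this section. The sketch below assumes the thresholds of the earlier releases as we recall them (rigid body: I-RMSD ≤ 1.5 Å and fnon-nat ≤ 0.4; difficult: I-RMSD > 2.2 Å; medium: everything else); treat these values as assumptions to be checked against reference 14. It also checks that the class counts reported above add up.

```python
# Assumed thresholds (see lead-in); verify against reference 14 before reuse.
RIGID_MAX_IRMSD = 1.5    # Å
MEDIUM_MAX_IRMSD = 2.2   # Å
RIGID_MAX_FNONNAT = 0.4


def classify(irmsd: float, fnonnat: float) -> str:
    """Expected docking difficulty from I-RMSD (Å) and fnon-nat."""
    if irmsd > MEDIUM_MAX_IRMSD:
        return "difficult"
    if irmsd <= RIGID_MAX_IRMSD and fnonnat <= RIGID_MAX_FNONNAT:
        return "rigid body"
    return "medium"


# Consistency check of the class counts reported above: (total, new).
counts = {"rigid body": (121, 33), "medium": (30, 11), "difficult": (25, 8)}
total_cases = sum(total for total, _ in counts.values())
new_cases = sum(new for _, new in counts.values())
print(total_cases, new_cases, total_cases - new_cases)  # 176 52 124
```

Under these thresholds, a small I-RMSD alone does not make a case rigid body: a case with many non-native contacts is still classified as medium.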
Table 2 summarizes the average I-RMSD, fnat and fnon-nat for the different classes of docking difficulty. The numbers in Table 2 show that the new cases in Benchmark 4.0 (in parentheses) generally have higher I-RMSD values in the rigid body and medium difficulty classes, which suggests that these new test cases will be more challenging for computational docking. In addition, the fraction of rigid body cases among the new cases is 0.63 (33 of 52), somewhat lower than the 0.71 (88 of 124) in Benchmark 3.0. The new cases are therefore expected to be more difficult for protein-protein docking algorithms, and this must be taken into account when assessing docking algorithms, since performance will depend on the benchmark version used.
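Averages such as those in Table 2 are per-class means over the cases. The sketch below groups per-case records by difficulty class and averages each metric; the record layout is an assumption, and any records passed to it here would be hypothetical. It also reproduces the rigid body fractions quoted above from the counts in the text (Benchmark 3.0 has 121 − 33 = 88 rigid body cases out of 124).

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Case = Tuple[str, float, float, float]  # (difficulty, I-RMSD, fnat, fnon-nat)


def class_averages(cases: Iterable[Case]) -> Dict[str, Tuple[float, ...]]:
    """Mean I-RMSD, fnat and fnon-nat for each difficulty class."""
    groups: Dict[str, List[Tuple[float, float, float]]] = defaultdict(list)
    for difficulty, irmsd, fnat, fnonnat in cases:
        groups[difficulty].append((irmsd, fnat, fnonnat))
    # zip(*rows) yields one column per metric; average each column.
    return {d: tuple(mean(col) for col in zip(*rows)) for d, rows in groups.items()}


# Rigid body fractions from the counts reported in the text.
new_rigid_fraction = 33 / 52                   # new cases in Benchmark 4.0
old_rigid_fraction = (121 - 33) / (176 - 52)   # Benchmark 3.0: 88 of 124
print(f"{new_rigid_fraction:.2f} {old_rigid_fraction:.2f}")  # 0.63 0.71
```

Computing new-case averages separately from the full-set averages, as the parenthesized values in Table 2 do, only requires passing the subset of new cases to the same function.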
In summary, Benchmark 4.0 adds 52 new cases, and its new rigid body and medium difficulty cases show, on average, larger conformational changes upon binding than the corresponding cases in the previous release. This makes the update especially useful for developing protein-protein docking algorithms that incorporate protein flexibility, a problem that has recently received much attention but still remains a major challenge.21
This work was funded by NIH grant R01 GM084884 awarded to ZW.