Conceived and designed the experiments: BGP YH ZW. Performed the experiments: BGP. Analyzed the data: BGP. Contributed reagents/materials/analysis tools: YH. Wrote the paper: BGP ZW.
Computational prediction of the 3D structures of molecular interactions is a challenging area, often requiring significant computational resources to produce structural predictions with atomic-level accuracy. This can be particularly burdensome when modeling large sets of interactions, macromolecular assemblies, or interactions between flexible proteins. We previously developed a protein docking program, ZDOCK, which uses a fast Fourier transform to perform a 3D search of the spatial degrees of freedom between two molecules. Adding a pairwise statistical potential to the ZDOCK scoring function produced notable gains in docking accuracy over previous versions, but this improvement came at a substantial computational cost. In this study, we incorporated a recently developed 3D convolution library into ZDOCK, and additionally modified ZDOCK to dynamically orient the input proteins for more efficient convolution. These modifications resulted in an average of over 8.5-fold improvement in running time when tested on 176 cases in a newly released protein docking benchmark, as well as substantially less memory usage, with no loss in docking accuracy. We also applied these improvements to a previous version of ZDOCK that uses a simpler non-pairwise atomic potential, yielding an average speed improvement of over 5-fold on the docking benchmark, while maintaining predictive success. These improvements permit the use of ZDOCK for more intensive tasks such as docking flexible molecules and modeling of interactomes, and allow it to be run more readily by those with limited computational resources.
Interactions between biomolecules are crucial to the function of biological systems, forming the basis of normal and aberrant cellular behavior, as well as defense against external pathogens. To fully understand these interactions, atomic-level descriptions of the structures of their binding interfaces are essential. While the structures of many protein-protein complexes have been characterized experimentally via X-ray crystallography and deposited in the Protein Data Bank (PDB), the majority of known complexes have not, providing an opportunity for predictive computational techniques to help elucidate these structures. Molecular docking approaches, which take two (or more) structures as input and predict the structure of their complex, are increasingly being used for this purpose.
The success of protein-protein docking algorithms in the past decade has given rise to several exciting developments in the field. These include addressing molecular flexibility during binding by “cross-docking” ensembles representing snapshots of mobile structures, or combining docking results of rigid (or semi-rigid) substructures. On a larger scale, other recent work includes the application of protein-protein docking to predict the structure of the yeast interactome, and the use of protein docking to distinguish binding versus nonbinding proteins based on docking scores. These areas of progress indicate that faster and more efficient docking algorithms are key to helping improve both predictive accuracy and proteomic coverage.
Previously our laboratory developed the program ZDOCK, which uses a grid-based representation of two proteins and a 3-dimensional (3D) fast Fourier transform (FFT) to efficiently explore the rigid-body search space of docking positions. The most recent version, ZDOCK 3.0, has a scoring function that includes shape complementarity, electrostatics, and a pairwise atomic statistical potential developed using contact propensities of transient protein complexes. ZDOCK 3.0 showed vast improvements in its predictive ability versus the previous version when tested on a protein-protein docking benchmark, and has led to highly successful performance in the blind protein docking experiment, CAPRI. However, with the improved accuracy due to the pairwise statistical potential, the running time and memory usage of ZDOCK increased significantly, as seven FFTs (rather than two in the previous version, ZDOCK 2.3) needed to be computed per docking orientation.
To reduce this computational burden and make proteomic scale docking and ensemble docking approaches more tractable, we have developed a new version of ZDOCK that retains the predictive accuracy of ZDOCK 3.0 while vastly improving its computational performance. This was achieved by integrating an FFT library that was designed to improve 3D FFT performance, as well as several improvements to the molecular discretization to further reduce the grid size required to represent the input proteins. These optimizations were evaluated against 176 test cases in a newly released version of a protein docking benchmark, resulting in over 8.5-fold average improvement in running time. We also implemented these updates on ZDOCK 2.3, for those users who have pipelines or protocols in place with this tool (e.g. protein/DNA docking), which resulted in 5.5-fold improvement in running time. Examining the test cases with the highest levels of running time improvement showed that the primary factor in improving performance is the reduced grid size due to the new FFT library.
The ZDOCK algorithm, which followed from initial efforts in FFT-based protein docking and was described in detail by Chen and Weng, includes the following steps (not including the pre-processing step of marking surface atoms and atom types in PDB files). The terms “receptor” and “ligand” refer to the two input proteins, with the receptor generally being the larger protein or known to function as a receptor in vivo (e.g. an antibody in an antibody/antigen interaction).
ZD1. Center receptor coordinates at origin based on center of mass.
ZD2. Center ligand coordinates at origin based on center of mass.
ZD3. Select cubic grid size to contain centered molecules for FFT.
ZD4. Discretize receptor, assigning scores to 3D grid(s) of complex numbers.
ZD5. Rotate input ligand to random orientation, if specified.
ZD6. Rotate ligand to Euler angles from uniformly distributed set, and discretize.
ZD7. Perform 3D FFT to compute convolution between ligand and receptor grids, and select top scoring position from the resultant grid.
ZD8. Repeat steps 6–7 for a total of 3,600 ligand rotations (15° angular sampling) or 54,000 ligand rotations (6° angular sampling).
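Steps ZD6 to ZD8 can be illustrated with a minimal Python sketch of an FFT-based translational search. This is not ZDOCK code: it assumes NumPy, uses one real-valued grid per molecule (ZDOCK assigns scores to grids of complex numbers), treats translations as cyclic, and the function name `best_translation` is hypothetical. For one fixed ligand orientation, it scores every translation of the ligand grid against the receptor grid with a single forward/inverse FFT pass and returns the top-scoring shift.

```python
import numpy as np

def best_translation(receptor_grid, ligand_grid):
    """Return the cyclic shift of the ligand grid that maximizes the overlap score.

    For every shift t the score is sum_x receptor[x] * ligand[x - t].
    The correlation theorem lets all shifts be scored at once via FFTs.
    Both grids must have the same shape.
    """
    rec_ft = np.fft.fftn(receptor_grid)
    lig_ft = np.fft.fftn(ligand_grid)
    # Inverse transform of R * conj(L) gives the cross-correlation grid.
    scores = np.fft.ifftn(rec_ft * np.conj(lig_ft)).real
    flat_index = np.argmax(scores)
    shift = tuple(int(i) for i in np.unravel_index(flat_index, scores.shape))
    return shift, float(scores[shift])
```

In a full docking run this function would be called once per ligand rotation (3,600 or 54,000 times, per step ZD8), keeping the best score from each rotation; the receptor transform could be computed once and reused, which the sketch omits for brevity.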
Here we present major improvements to ZDOCK's initial orientation and FFT procedures (bold steps above), while not modifying the discretization protocols that embody the ZDOCK scoring function. Previous ZDOCK versions and their scoring terms include: ZDOCK 1.3: grid-based shape complementarity, atomic contact energy (ACE), electrostatics; ZDOCK 2.1: pairwise shape complementarity (PSC); ZDOCK 2.3: PSC, ACE, electrostatics; ZDOCK 3.0: PSC, interface atomic contact energy (IFACE), electrostatics.
Modification of ZDOCK to improve its efficiency consisted of the following successive improvements:
We previously released ZDOCK 2.3 and 3.0 with the Conv3D library (Step 1) as interim versions, named ZDOCK 2.3.1 and ZDOCK 3.0.1 respectively. The current releases include Steps 1–4 and are ZDOCK 2.3.2 and ZDOCK 3.0.2. Due to the input rotation of the receptor and the possible switching of the ligand and receptor, the header of the ZDOCK output file required slight modification, and the code that creates ZDOCK predictions from the output file (create_lig) was updated accordingly. However, the create_lig program still generates structural predictions with the receptor fixed with respect to the coordinates of the input PDB file, so that the internal optimization of the receptor and ligand coordinates for ZDOCK discretization is not visible to the end user.
We also introduced a command line flag in ZDOCK (“-F”) to provide users the option of keeping the input receptor fixed (not rotated or switched with the ligand) during ZDOCK execution, resulting in the same ZDOCK output file format as previous versions. This entails Steps 1 and 2 (without Steps 3 and 4), and can be used for improved performance versus the base ZDOCK version (2.3 or 3.0) when users require the original ZDOCK output file format for their post-processing pipeline.
After updating ZDOCK versions 3.0 and 2.3 with the Conv3D FFT library and improved molecular representation (detailed in the Implementation section), we tested these new versions (3.0.2 and 2.3.2) for their computational efficiency using all 176 unbound test cases of protein-protein docking Benchmark 4.0; results are given in Table 1. Each run of ZDOCK used default angular sampling (3,600 ligand rotations), and a single 2.8 GHz 64-bit Opteron processor with 8 GB available RAM. To test the improvements due to specific modifications, we also measured the performance of ZDOCK with Conv3D only (Step 1 in Implementation; 3.0.1 and 2.3.1), and Conv3D with improved centering (Steps 1 and 2; 3.0.2f and 2.3.2f).
The most dramatic improvements were seen for ZDOCK 3.0.2, with 18.9 minutes average running time for the docking benchmark, from an original average running time of 167.1 minutes. This is nearly three times shorter than the average running time for ZDOCK 2.3 on the docking benchmark. On average, this version had an 8.6-fold improvement in running time versus ZDOCK 3.0; this was significantly higher than the 6.4-fold improvement from Conv3D alone (ZDOCK 3.0.1), though integrating Conv3D was evidently responsible for the majority of the running time improvement. Memory requirements were reduced along with running time: on average, ZDOCK 3.0.2 required less than half of the memory required by ZDOCK 3.0 (256 MB versus 700 MB).
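As an arithmetic aside, the ratio of the two reported average running times is 167.1/18.9 ≈ 8.8, slightly above the reported 8.6-fold average improvement. The two numbers need not agree if the fold improvement is averaged per test case; the text does not state which averaging was used, so the following Python sketch only illustrates the distinction. The running times in the usage note are hypothetical, not benchmark data.

```python
def ratio_of_means(old_times, new_times):
    """Speedup computed from the two average running times."""
    mean_old = sum(old_times) / len(old_times)
    mean_new = sum(new_times) / len(new_times)
    return mean_old / mean_new

def mean_of_ratios(old_times, new_times):
    """Speedup computed per case, then averaged over cases."""
    ratios = [old / new for old, new in zip(old_times, new_times)]
    return sum(ratios) / len(ratios)
```

For two hypothetical cases with old times of 300 and 60 minutes and new times of 20 and 10 minutes, `ratio_of_means` gives 12.0 while `mean_of_ratios` gives 10.5, because the first measure is dominated by the longest-running cases.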
Mirroring the improvements for ZDOCK 3.0, the running time and memory usage of ZDOCK 2.3 were also improved via these library and discretization modifications, although to a lesser extent (5.5-fold versus 8.6-fold improvement in running time). While ZDOCK 2.3 has over three times faster average running time on the docking benchmark versus ZDOCK 3.0, ZDOCK 2.3.2 is still over twice as fast as ZDOCK 3.0.2.
To ensure that the predictive accuracy of ZDOCK was maintained during optimization, we measured the success rates and the average number of hits for the original and updated versions of ZDOCK (Figure 1). As before, hits are defined as predictions with interface Cα root mean square distance (RMSD) ≤2.5 Å from the bound structure, and framework regions of antibodies were blocked prior to docking, to avoid non-CDR binding predictions as described previously. Details of the ZDOCK results are given in Tables S1 and S2.
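The two evaluation measures can be sketched in a few lines of Python. The hit criterion (interface Cα RMSD ≤2.5 Å) is taken from the text; the definition of success rate used here, the fraction of test cases with at least one hit among the top N ranked predictions, is an assumption, as are the function names and the input layout (a dictionary mapping each test case to its predictions' RMSD values in rank order).

```python
HIT_CUTOFF = 2.5  # interface C-alpha RMSD cutoff in angstroms, from the text

def success_rate(rmsds_by_case, n, cutoff=HIT_CUTOFF):
    """Fraction of cases with at least one hit in the top n predictions (assumed definition)."""
    successes = sum(
        1 for rmsds in rmsds_by_case.values()
        if any(r <= cutoff for r in rmsds[:n])
    )
    return successes / len(rmsds_by_case)

def average_hits(rmsds_by_case, n, cutoff=HIT_CUTOFF):
    """Average number of hits per case among the top n predictions."""
    total = sum(
        sum(1 for r in rmsds[:n] if r <= cutoff)
        for rmsds in rmsds_by_case.values()
    )
    return total / len(rmsds_by_case)
```

Evaluating both functions over a range of N values would give curves of the kind plotted in Figure 1, one pair per ZDOCK version.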
As shown on a previous version of the docking benchmark, the success rate and hit count for ZDOCK 3.0 are substantially higher than for ZDOCK 2.3; this is also seen for the new ZDOCK implementations presented here (ZDOCK 3.0.2f and ZDOCK 3.0.2 versus ZDOCK 2.3.2f and ZDOCK 2.3.2). These new versions of ZDOCK have approximately the same success rates as the respective previous versions, with some minor differences (e.g. higher success at N=2 for ZDOCK 3.0.2f) that appear insignificant and are not sustained for varying numbers of predictions. Hit counts (Figure 1B and 1D) likewise follow similar trends as the previous ZDOCK versions; although there are slightly higher hit counts for ZDOCK 3.0.2f and ZDOCK 3.0.2 versus ZDOCK 3.0 at larger numbers of predictions, this difference is much smaller and less significant than the differences between ZDOCK 2.3 and ZDOCK 3.0. For the rigid-body cases (121 out of 176 cases; Figure 1C and 1D), there is an upward shift in success rate and hit count compared with the results for all test cases (Figure 1A and 1B), which is to be expected given the lower binding conformational changes of these cases on average. However, considering just the rigid-body cases does not yield any differences in the relative docking success between the ZDOCK versions.
To examine the extent of the running time improvement among individual cases, we compared running times for ZDOCK 3.0 with ZDOCK 3.0.2 for all 176 Benchmark 4.0 test cases (Figure 2A; running time details are given in Tables S3 and S4 for ZDOCK 2.3.2 and 3.0.2 respectively). Most cases follow the trend of 8.6-fold average improvement, with the exception of a few outlier points. These include 2VIS and 1I4D, which showed greater-than-average improvements of 14.7-fold and 21.7-fold in running time, respectively, and 1N2C, which was below the average but still had a substantial 6.4-fold improvement in running time. A minority of cases (46 out of 176) had ligand and receptor switched by ZDOCK 3.0.2 (Table S3), and some of these (e.g. 2VIS) clearly had dramatic improvements in running time. The high correlation between grid size and running time is shown in Figures 2B (ZDOCK 3.0; R=0.99) and 2C (ZDOCK 3.0.2; R=0.98). This indicates that the efficient use of the grid search space by ZDOCK 3.0.2 provides the basis for the improvements in running time across the docking benchmark.
Also evident is the clustered nature of running times for ZDOCK 3.0, as seen in the horizontal bands in Figure 2A and the overlapping points in Figure 2B (making the plot appear sparser than Figure 2C). This is due to the cubic grid size in ZDOCK 3.0 being selected from a finite set of numbers (as specified by FFTW, or ESSL for IBM ZDOCK compilations), which in turn leads to similar running times between cases that share the same grid size. With Conv3D, the receptor grid size is selected from a finite set of numbers for each x, y, and z dimension, leading to more possible 3D grid sizes available and a consequent dispersion of running times versus those for ZDOCK 3.0.
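The effect of choosing a grid size per dimension rather than a single cubic size can be sketched in Python. The text says only that sizes come from "a finite set of numbers"; the sketch assumes, purely for illustration, that the allowed sizes are integers whose only prime factors are 2, 3, and 5, a common choice for FFT-friendly lengths. The function names and the example extents are hypothetical.

```python
def is_fft_friendly(n, primes=(2, 3, 5)):
    """True if n has no prime factors other than those listed (assumed size set)."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1

def next_fft_size(n, primes=(2, 3, 5)):
    """Smallest allowed grid size that is >= n."""
    while not is_fft_friendly(n, primes):
        n += 1
    return n

def grid_volumes(extent):
    """Compare grid point counts for a cubic grid and a per-axis grid.

    extent: (nx, ny, nz), the minimum number of grid points needed on each axis.
    Returns (cubic_volume, rectangular_volume).
    """
    side = next_fft_size(max(extent))
    nx, ny, nz = (next_fft_size(e) for e in extent)
    return side ** 3, nx * ny * nz
```

For a hypothetical elongated molecule needing 97, 31, and 43 grid points along x, y, and z, the cubic grid is 100 × 100 × 100 (1,000,000 points) while the per-axis grid is 100 × 32 × 45 (144,000 points), roughly a 7-fold reduction. Under the same assumption, many different extents round up to the same cubic side, which is consistent with the banding of ZDOCK 3.0 running times described above.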
An individual example of a test case with dramatic speed improvement is shown in Figure 3, which shows the structure of the 1I4D receptor and its representative grids for ZDOCK 3.0 and 3.0.2. It is clear that the ability to use a rectangular rather than a cubic grid to represent this elongated protein, along with optimal alignment along the x, y, and z axes, enables the vast improvement in efficiency for this test case. On the other hand, 1N2C, which has below-average improvement as noted above, has two globular symmetric proteins as receptor and ligand, offering less opportunity for optimizing docking speed via grid size reduction or rotation of an input molecule.
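One way to align an elongated molecule with the coordinate axes is a principal-axis rotation, sketched below in Python with NumPy. The text does not specify how ZDOCK 3.0.2 chooses its orientation, so this is an assumed technique shown only to make the geometric idea concrete; the function names are hypothetical, and the centroid is unweighted rather than a true center of mass.

```python
import numpy as np

def align_to_principal_axes(coords):
    """Center coordinates and rotate so the longest principal axis lies along x.

    coords: (n_atoms, 3) array. Returns the rotated, centered coordinates.
    """
    centered = coords - coords.mean(axis=0)
    # Eigenvectors of the covariance matrix are the principal axes.
    _, eigvecs = np.linalg.eigh(np.cov(centered.T))
    axes = eigvecs[:, ::-1].copy()  # eigh sorts ascending; put largest variance first
    if np.linalg.det(axes) < 0:
        axes[:, -1] *= -1  # keep a proper rotation so the structure is not mirrored
    return centered @ axes

def bounding_box(coords):
    """Extent of the coordinates along x, y, and z."""
    return coords.max(axis=0) - coords.min(axis=0)
```

For a rod-shaped set of points lying along a body diagonal, the axis-aligned bounding box before rotation is nearly cubic, while after rotation it is long in x and thin in y and z, so a rectangular grid can enclose it with far fewer points.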
These new versions of ZDOCK can be utilized more readily by those with limited computing resources, as well as those who are addressing challenging areas at the forefront of structural prediction, such as molecular flexibility and 3D network modeling. In fact, ZDOCK 3.0 (which has the same scoring function as ZDOCK 3.0.2, without performance optimization) was used to predict the structure of the yeast interactome using a large supercomputing cluster. We hope that the improved efficiency in ZDOCK 3.0.2 will permit further utilization of this docking tool in advanced research efforts. Structural interactome modeling in particular has had numerous recent advances, and rigid-body docking of domains and proteins from structural genomics efforts can complement atomic-level interactions modeled based on homology, and build upon the success in modeling structures of sub-proteome interaction networks.
Others have recently presented algorithms that perform fast rigid-body protein docking using FFT approaches. Notably, the program Hex, which uses spherical harmonics rather than a 3D Cartesian grid to represent proteins, but does not contain atomic pairwise potential terms as in ZDOCK 3.0.2, was optimized using graphical processors to achieve very high docking efficiency. In the same study, comparison of ZDOCK 3.0.1 with the program PIPER (which has pairwise potential terms based on docking decoys) found ZDOCK 3.0.1 to be over 50 times faster when both are run on a single CPU of the same speed.
Future work will include the development and validation of ensemble and cross-docking approaches using ZDOCK 3.0.2, as well as incorporating it into the ZDOCK server (http://zdock.bu.edu), which at present uses ZDOCK 2.3 as the computational requirements of ZDOCK 3.0 precluded its use in the server framework.
By incorporating several methods to improve computational efficiency into the ZDOCK program, we have achieved significant performance gains for two versions of the ZDOCK program, with no loss in docking accuracy. The two new versions of ZDOCK presented here, 2.3.2 and 3.0.2, are freely available to academic and non-profit users at: http://zlab.umassmed.edu/zdockconv3d/.
Table S1. Predictive performance of ZDOCK 2.3, 2.3.1, 2.3.2f and 2.3.2 for the test cases in Benchmark 4.0.
Table S2. Predictive performance of ZDOCK 3.0, 3.0.1, 3.0.2f and 3.0.2 for the test cases in Benchmark 4.0.
Table S3. Running times of ZDOCK versions 2.3, 2.3.1, 2.3.2f, and 2.3.2 for the test cases in Benchmark 4.0.
Table S4. Running times of ZDOCK versions 3.0, 3.0.1, 3.0.2f, and 3.0.2 for the test cases in Benchmark 4.0.
The authors would like to thank Mary Ellen Fitzpatrick for computing support, Howook Hwang, Thom Vreven, and Xianjun Dong for valuable discussions, and the Scientific Computing Facilities at Boston University for computing resources.
Competing Interests: The authors have declared that no competing interests exist.
Funding: This work was funded by National Institutes of Health grant GM084884. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.