We adopt a rotameric approach to model the side chain conformational flexibility of proteins. Due to the large physiochemical space of ligands and also the large number of degrees of conformational freedom, it is impossible to build an enumerative rotamer library in the same way as for proteins24
. Therefore, we construct a rotamer library of ligands in a stochastic manner during each MedusaDock simulation. OMEGA by OpenEye Scientific Software (http://www.eyesopen.com/
) is often used to pre-compute the ligand conformations. Although OMEGA uses a different ligand conformation generating method, the depth-first algorithm, we find that our simple method feature similar performance in terms of finding the bioactive rotamers (kRMSD<1 Å) or computational time (~1s for each ligand) 37
. Although the near-native ligand conformations with kRMSD smaller than 2.0 Å or even 1.0 Å can consistently be sampled by STROLL (), the native ligand rotamer will not always be included in the rotamer library due to the stochastic nature of library generation, which in turn might affect the efficiency of finding and identifying the near-native binding pose. To test the efficiency and accuracy of MedusaDock, we compare the self- and cross-docking benchmark results with and without manually including the native ligand rotamers in STROLL (). As expected, manual inclusion of the native ligand rotamer allows MedusaDock to sample the near-native poses more efficiently, with more near-native poses sampled (). However, we find that the inclusion of the native poses does not significantly affect the docking prediction accuracy in terms of ranking the near-native poses (). This observation suggests that our ligand sampling by STROLL is sufficient for accurate docking.
Effect of inclusion of the native ligand rotamer in docking calculations
To increase the computational efficiency, we devise a two-step docking protocol, including the initial coarse-docking with representative ligand rotamers followed by fine-docking to enrich the ligand rotamers and minimize the docking poses. Clustering of the ligand rotamers in STROLL helps to group similar rotamers together, and coarse-docking using the corresponding representative conformations helps to avoid the repetitive calculation of rigid-body motions of similar ligand conformations. Since fine-docking only perturbs translation and rotation around the input coarse-docked pose, it is crucial for the coarse-docking procedure to place the ligand in the proximity of the native position and orientation. As pointed out by previous flexible docking studies 23, 26
, one of the challenges of docking is to fit the ligand simultaneously to multiple deep pockets, i.e., a rugged energy landscape (). We smooth the energy landscape by turning off the van der Waals repulsion between the ligand and the receptor side chains ( and ). Therefore, in the coarse-docking step, we adopt an iterative rigid-body docking and receptor side chain packing approach with slowly increasing van der Waals repulsion between the ligand and side chains to facilitate the search of the rugged energy landscape (Methods). During fine-docking, the ligand and receptor side chain rotamers are simultaneously modeled in a strongly-coupled manner. The self- and cross-docking benchmark of MedusaDock suggests that the proposed docking protocol is quite efficient in the sampling of the near-native poses (); in many cases (~80%) the program identifies more than 10 near-native poses from 100 runs of MedusaDock simulations.
The rugged energy landscape of docking
MedusaDock uses MedusaScore to guide docking and to rank the docking poses. MedusaScore uses a physical-based force field to describe the physical interactions between the ligand and the receptor. By correctly docking the ligand to the receptor, MedusaDock can capture the same important binding interactions as in the native pose. For example, the MedusaScore of poses minimized from the X-ray crystallographic structures is close to that of the best-ranked self-docking poses, which are mostly near-native (). Despite the fact that in many cases the near-native poses are not ranked as the top poses in cross-docking, the ranks of these near-native poses are quite high, mostly within the top ten. Hence, the MedusaScore of the best-ranked pose in cross-docking — the predicted binding energy — is similar to that of the best-ranked near-native pose, featuring the same binding interactions as self-docking (). Therefore, the MedusaScore from cross-docking simulations can be reliably used to estimate the binding energy for virtual-screening, which has been validated by our preliminary virtual-screening tests. Further test of the application of MedusaDock in virtual-screen by benchmarking on the whole DUD dataset and by comparing with other docking programs is required in the future study.
In this work, we do not model the receptor backbone flexibility. As a result, there are two cases in cross-docking where the program fails to identify the near-native poses due to severe backbone clashes near the ligand-binding pocket ( and ). Several approaches have been used to model protein backbone flexibility, including ensemble docking with multiple backbone conformations10
and backbone relaxation7, 23
. Interestingly, our method is able to capture the near-native poses of several challenging cases that other flexible receptor-backbone methods missed, such as thrombin (1dwc, 1dwd pair; Table 1 in ref.26
) and PPARγ (1FM9, 2PRG pair; Table 3 in ref.10
). We believe that the ability of MedusaDock to find near-native poses is a result of the efficient modeling of ligand conformations and protein side chains using rotameric approaches. However, we believe that further modeling of the receptor backbone flexibility will help improve the predictive power 38
Large-scale virtual-screening requires high computational efficiency, since a large library with millions of ligands will be docked to a target protein. The docking program can be highly parallelized since the calculations are independent of each other. Additionally, since the docking program is able to sample many near-native poses with high efficiency, we can reduce the total number of MedusaDock runs in a future virtual-screening study. The preliminary virtual-screening benchmark results for CDK2, VEGFr2, HIVRT, and HIVPR clearly demonstrate the capabilities of MedusaDock in large-scale virtual-screening, even for flexible protein targets.