|Home | About | Journals | Submit | Contact Us | Français|
Reliable prediction of protein-ligand docking pose requires proper account of induced fit effects. Treating both the ligand and the protein as flexible molecules is still challenging because many degrees of freedom are involved. Peptides are one type of ligands that are particularly difficult to study because of their extreme flexibility. In this study, we tested a molecular dynamics-based simulated-annealing cycling protocol in docking peptides to four protein kinases and two phosphatases using two implicit-solvent models: a distance-dependent dielectric model (ε(r)=4r) and a version of the Generalized Born model termed GBMV. We found that the simpler ε(r)=4r model identified docking pose better than the more expensive GBMV model. In addition, rescoring structures obtained from one implicit-solvent model with the other identified good docking poses for all six systems studied. Including protein energy in scoring also improved results.
Reliable prediction of protein-ligand docking structure remains a challenging problem. Peptide ligands present one of the more difficult cases because of their high flexibility, requiring extensive configurational sampling. Even for the rigid-protein model, it is demanding to handle flexible ligands with over 15–20 rotatable bonds, which is typical even for short peptides. This difficulty increases when the protein receptor is also flexible and a peptide ligand only binds to a small subset of the many possible conformations of the protein.
Nevertheless, an increasing number of research groups have employed various strategies to include protein flexibility in docking. For example, one can dock a ligand to an ensemble of receptor conformations obtained from molecular dynamics simulation, from normal mode calculation, from crystal structures with various ligands bound, from NMR structures, and from structures constructed with different combination of side-chain conformations in the binding site1–14. One can also “enlarge” the docking pocket in a protein by softening the van der Waals (vdw) potential (ref15 gives an example) or by offsetting vdw radii (e.g., reference16) to approximate the effects of protein flexibility.
More realistic approaches directly simulated the coupled structural change of the protein and the ligand. These approaches allowed protein movement in different ways. For example, the program FLIPDOCK17 used a Flexibility Tree (FT) data structure to choose a small number of variables to describe the key structural changes required by the protein to accommodate the ligand. On the other hand, the program FITTED18 employed a genetic algorithm to generate diverse protein structures.
Molecular dynamics-based methods provide another useful approach. Unlike many docking potentials solely designed for docking rigid or flexible ligands to rigid proteins, molecular dynamics-based methods take advantage of force fields already designed for simulating protein motion. One approach by Mangoni et al19 facilitated the docking of a flexible ligand to a flexible protein by assigning a high temperature to the translational motion of the center-of-mass of the ligand and only allowed the binding pocket of the protein to move. Another study by Nakajima et al20 employed the multi-canonical molecular dynamics simulation method to dock a flexible peptide to the SH3 domain of Abl tyrosine kinase. In their work, they only allowed the side chains in the ligand-binding pocket to move. On the other hand, our earlier work permitted full flexibility of every side chain but restrained the α carbons to suitable experimental reference structures to reduce computational time and to avoid sampling unphysical structures due to the use of non-optimal force fields21–23. This model have successfully docked small organic ligands to protein kinases and phosphatases21–23. This study further explores its application to the more challenging problem of peptide-protein docking.
One way to facilitate extensive configurational search is to use implicit- rather than explicit-solvent models. Researchers have now developed numerous implicit-solvent models, including simple distance-dependent dielectric models, generalized Born models, and Poisson-Boltzmann models. (Gilson and Zhou recently provided a good review of these models24.) Distance-dependent dielectric and Generalized Born models are cheaper to use and were first models to consider in this study. The ε(r) = r model, where r is the distance between two atoms in a molecular system, was the first distance-dependent dielectric model employed in molecular dynamics simulation of proteins. Later on, dielectric function that increased more rapidly with distance has become popular; ε(r) = 4r model is one of them. This and other distance-dependent dielectric models are now widely used in programs such as CHARMM25, DOCK26; FITTED18 and others21–23,27–31.
A more complex model is the Generalized Born model first proposed by Still and co-workers32. One form of their model calculates the electrostatic contribution to solvation energy by:
where εp represents the low dielectric constant inside a solute such as a protein, εw is the dielectric constant of water, and qi and qj are the charges of atoms i and j, respectively. There are various ways of calculating the effective Born radii αi and αj33–42, among them is the Generalized Born using Molecular Volume (GBMV)39,40 model which we used in this study. Because Generalized Born models are more expensive than simple distance-dependent dielectric functions to use, they are still not commonly employed in molecular docking, especially in larger scale virtual screening. On the other hand, it has been used to refine and rescore docking pose. For example, Lee and Sun recently reported43 that further optimizing and rescoring docking poses obtained from the DOCK program could improve the identification of correct docking pose in a test set of 79 protein-ligand complexes.
In this work, we focused on evaluating the ε(r) = 4r and the GBMV models in docking peptides to protein kinases and phosphatases. We used both models to perform docking directly and to rescore docking poses obtained from the other model. The main goal was to find out whether these models or a mixture of them could identify good docking pose for protein-peptide systems.
Section 2 describes the simulated annealing cycling protocol that we applied successfully to dock small molecules to protein kinases and phosphatases earlier. It also describes how we set up the systems and presents several energy models we used to identify docking pose. Section 3 details the results and discusses the performance of different models. Section 4 concludes with our major findings.
We used a novel simulated annealing cycling protocol described previously for protein-ligand docking.21,23 This protocol was found to consume less computational time than regular replica-exchange and simulated annealing methods if the goal was solely to find good docking pose.22 Each simulated annealing (SA) cycling simulation was performed as follows:
We continued the above loop until a prescribed simulation length was reached. Typically, each simulated annealing cycle approximately covered the following temperatures: 500 K, 250 K, 125 K, 62.5 K, 31.2 K, 15.6 K, 7.8 K, and 3.9 K. The end of each simulated-annealing cycle identified a structure near a local or the global energy minimum. By performing many such short simulated annealing cycles in a molecular dynamics simulation, one could identify many energy minima. If the energy and solvation model was sufficiently realistic, one could identify the correct docking pose from the lowest-energy structures, which we showed previously to be the case in docking small organic molecules to protein kinases22,23 and phosphatases21. In the simulation performed here, we ran twenty independent trajectories starting from the same structure and temperature but initiated atomic velocities with different random number generating seeds. By using two starting structures, we ran a total of forty trajectories for each protein-peptide system. We modified the MMTSB Toolkit45 to use with the CHARMM package25 to perform the simulated annealing docking just described.
We chose six protein-peptide test systems in this study: 1) protein tyrosine phosphatase 1B (PTB1B) and a phosphorylated tetrapeptide Ac-DEpYL--NH246; 2) protein tyrosine phosphatase YopH of Yersinia pestis and a mimic of a phosphorylated hexapeptide Ac-DADE-F2Pmp-L-NH247; 3) the activated form of the catalytic domain of the insulin receptor protein kinase (PTK) and a hexapeptide with the sequence GDYMNM48; 4) insulin-like growth factor 1 receptor kinase (IGF) and an octapeptide with the sequence GEYVNIEF49; 5) cyclic AMP-dependent kinase or protein kinase A (PKA) and a fragment of its peptide inhibitor PKI containing residues 14 through 2450; 6) cyclin dependent protein kinase 2 (CDK2), with cyclin A bound, and a recruitment 9-mer peptide derived from E2F151. Entries 1PTT46, 1QZ052, 1IR348, IK3A49, 1FMO50 and 1H2451 in the Protein Data Bank provided the co-crystal structures for these protein-peptide systems.
For the YopH-peptide system, we included six residues of the peptide in our simulation although only four residues could be observed in the crystal structure (in PDB entry 1QZ0). In addition, we included three crystal water molecules in the binding pocket that facilitate the interactions between the phosphotyrosine residue of the peptide and the protein. As this study focused on evaluating whether current force fields and the two solvation models were good enough for identifying good docking pose, we constrained these water molecules near their experimental structures during docking to reduce sampling time. In principle, one could allow these waters to move throughout the docking process but this would require longer simulation time to find good docking pose, especially when the starting structure was far away from the best docking pose. For the PTK-peptide system, the hexapeptide GDYMNM in the crystal structure was used. Its N-terminus was acetylated and its C-terminus was in the N-methyl amide form. We replaced the ATP analog in the crystal structure by ATP in the simulation. For the IGF-peptide system, we used the octapeptide GEYVNIEF in the crystal structure and blocked it similarly as the hexapeptide above. We also replaced the ATP analog by ATP. In addition, the coordinates of eight missing residues (A1069-A1076) in PDB entry 1K3A were obtained from the corresponding residues in PDB entry 1IR3 after superposition of their protein structures. Because three of these eight residues were different in these two protein structures, we only used the coordinates of the backbone atoms for these three residues from PDB entry 1IR3 and we used CHARMM25 to build the side chains. For the PKA-peptide system, we only used eleven residues, 14 through 24 with the sequence GRTGRRNAIHD, of PKI in the docking study to reduce computational time, even though the crystal structure contains a longer segment from residues 5 to 24. In the simulation, we also kept the adenosine in the crystal structure as it might facilitate protein-ligand recognition. For the CDK2-peptide system, we used the nine-residue peptide PVKRRLDLE and we only included the B chain of CDK2 from the crystal structure in the simulation.
During the simulation, we applied RMSD-restraint to keep the proteins, along with the additional water molecules, adenosine, or ATP, close to their crystal structures, as done previously21,22. Only the α carbons of the proteins, the water oxygens in the YopH-peptide system, the adenosine in the PKA-peptide system, and ATP in the PTK-peptide and IGF-peptide systems were restrained. These restraints prevented sampling of unrealistic structures far from experiment - due to the limitation of current force fields and solvation models - and reduced computational costs. Yet, the weak restraints permitted ample conformational flexibility of the proteins to accommodate a more diverse set of small organic molecules and peptides that would have been excluded by rigid-protein docking models.
Prior to each molecular dynamics simulation, we performed 500 steps of steepest descent energy minimization of a protein-peptide complex to remove bad contacts. During the energy minimization, the α carbons, the water oxygens, and the heavy atoms of adenosine and ATP were held fixed. On the other hand, for the IGF-peptide system, we also allowed the α carbons of the eight residues that we modeled in to relax.
We started each peptide-docking simulation either from the X-ray structure or with the peptide in the extended conformation on the protein surface near its binding site. We generated the extended structure of each peptide by the CHARMM program25. Twenty trajectories of simulated annealing cycling simulations were performed for each of the two starting structures so that forty trajectories resulted. Because each trajectory lasted 2 ns, the aggregate simulation time for each protein-peptide system covered 80 ns. We used a time step of 2 fs in these simulations. For the simulations using the GBMV model, we ran each trajectory for only 1 ns rather than 2 ns because it consumed significantly more computational time.
The simulated annealing cycling docking simulations were conducted with the CHARMM param27 force field53,54. In the simulation using the ε(r) = 4r model, we used a non-bonded cutoff distance of 14 Å, a switching function for the electrostatic interactions that began at 10 Å and ended at 12 Å, and a shifting function for the Lennard-Jones potential. We also used this ε(r) = 4r model in the energy minimization preceding each molecular dynamics simulation. In rescoring docking pose with the GBMV39,40 model, we used the GBMV1 parameters of Chocholoušová and Feig41 and we changed the above cutoff distances to 12 Å, 8 Å and 10 Å respectively.
In the simulation using the GBMV39–41 model, we also used a non-bonded cutoff distance of 12 Å and a switching function applied between 8 Å and 10 Å. However, subsequent rescoring with the ε(r) = 4r model changed these cutoff distances back to 14 Å, 10 Å and 12 Å respectively.
Different simulation models required different CPU times on a dual core-dual processor cluster node with 2.8GHz Intel Xeon EM64T processors. Table I compares the CPU time of the two simulation models for the six protein-peptide systems studied. The flexible-protein/GBMV model was about an order of magnitude more expensive than the flexible-protein/ε(r) = 4r model, even though the length of the former simulation was only half of the latter. It took more than six months using twenty CPUs to perform the GBMV simulations for the six protein-peptide systems studied here.
We tested several energy models for selecting good docking pose:
Table II shows the closest structures to experiment sampled by the simulated annealing cycling simulations using the ε(r) = 4r model. Obviously, simulations staring from the X-ray structure sampled more structures near the crystal structures (within about 1.5 Å) than simulations starting from the extended conformation of the peptide on the protein surface. Nevertheless, four out of the six systems (the YopH-peptide, PTK-peptide, IGF-peptide, and PKA-peptide systems) found structures closer to ~2.14 Å (RMSD of all heavy atoms: RMSDheavy) of experiment when the simulations were started from the protein surface. The other two systems - PTP-peptide and CDK-peptide –found structures up to only about 4 Å from experiment. Therefore, it will take longer simulation time to find good docking poses if one starts simulations with the extended conformation of the peptides on the protein surface.
Table III shows the best peptide-protein docking poses obtained with the GBMV model. In this case, only two out of the six systems, PTK-peptide and IGF-peptide, found docking poses within about 1.6 Å (RMSDheavy) of experiment if the simulations were started from the extended form of the peptides on the protein surface. The other four systems failed to find docking poses within 3 Å (RMSDheavy) of experiment within the duration of the 40-ns simulation. Because this study focused on examining whether the CHARMM force field along with two implicit-solvent models could properly identify the experimental co-crystal structures, we started half of the trajectories from the crystal structures, instead of from the extended conformation on the protein surface - to ensure that we sampled the region near the experimental structures well without running much longer simulation.
Figures 1a1 to 1a6 plot RMSDheavy versus energy for the six protein-peptide systems using the ε(r) = 4r model and Energy Model I (total energy). We only used structures near local energy minima obtained below 5 K in the plots. These plots show that this energy model identified good docking poses within ~3 Å of experiment for five of the six systems. The structures with the lowest energy were within 3.13 Å, 1.57 Å, 3.38 Å, 2.78 Å and 2.42 Å (RMSDheavy) of experiment for the PTP-, YopH-, PTK-, IGF-, and PKA-peptide systems respectively. On the other hand, this energy model failed to identify good docking pose for the CDK-peptide system; the lowest-energy structure was 6.30 Å (RMSDheavy) away from experiment.
Figures 1b1 to 1b6 show similar results when we re-scored the structures with Energy Model II, which included only the peptide energy and the protein-peptide interaction energy but not the protein energy. Although we previously found this model to work well for selecting good docking poses for small organic ligands21–23, it did not work well for peptide ligands. The lowest-energy poses for the PTP-, PTK-, IGF-, PTK-, and CDK-peptide systems deviated significantly from their corresponding crystal structures - with RMSDheavy equaled to 7.35 Å, 8.29 Å, 7.35 Å, 8.93 Å and 8.41 Å, respectively. The YopH-peptide worked a bit better but the lowest-energy pose still deviated from the crystal structure by 4.04 Å. Further excluding the ligand energy (Energy Model III) did not work either (the plots resembled Figures 1b1 to 1b6 and are not shown). Therefore, it seems important to include protein energy in scoring protein-peptide structures.
To test whether rescoring results from the ε(r) = 4r model with the GBMV model could improve results, we plotted RMSDheavy versus energy for the six protein-peptide systems using GBMV Energy Model I (Figures 2a1 to 2a6). These plots show that rescoring with the GBMV model significantly improved the identification of good docking poses. The lowest-energy docking poses were within 2.76 Å, 1.34 Å, 1.27 Å, 1.66 Å, 1.48 Å and 1.28 Å of experiment for the PTP-, YopH-, PTK-, IGF-, PKA-, and CDK-peptide systems respectively. On the other hand, rescoring with GBMV Energy Model II also notably improved the identification of experimental docking poses than the original ε(r) = 4r Energy Model II but it did not work as well as rescoring with GBMV Energy Model I (Figures 2b1 to 2b6); we obtained good docking poses for only four out of the six systems: YopH-, PTK-, IGF-, and PKA-peptide systems with RMSDheavy equaled 1.70 Å, 1.92 Å, 1.63 Å, and 2.45 Å respectively for their lowest energy structures. In contrast, the lowest-energy poses for the PTP-peptide and the CDK-peptide deviated significantly from their corresponding crystal structures - with RMSDheavy reaching 6.42 Å and 6.14 Å respectively.
We performed similar studies by running the simulated annealing cycling simulation with the GBMV model instead of the ε(r)=4r model. Figures 3a1 to 3a6 plot RMSDheavy versus energy for the six systems using Energy Model I. These plots show that the GBMV model could identify good docking poses for the PTP-peptide and IGF-peptide systems to within 2.72 Å (RMSDheavy) and 2.10 Å of the crystal structures respectively. It worked a bit less well for the PKA-peptide and CDK-peptide systems, with the lowest-energy structures deviated from experiment by 3.09 Å and 3.96 Å (RMSDheavy) respectively. And it failed to identity docking poses closer than 8 Å for the other two systems: YopH-peptide and PTK-peptide. Interestingly, rescoring the structures from the GBMV simulation with the ε(r)=4r model managed to identify good docking poses for all the six systems (Figures 3b1 to 3b6). The lowest-energy rescoring poses were within 1.98 Å, 1.21 Å, 2.42 Å, 1.19 Å, 1.62 Å and 1.47 Å of experiment (RMSDheavy) for the PTP-, YopH-, PTK-, IGF-, PKA- and CDK-peptide systems respectively.
We further examined excluding the protein energy in the GBMV model (Energy Model II) for scoring but obtained even poorer results. Figures 4a1 to 4a6 show that this energy model failed to identify good docking pose except for the IGF-peptide system, whose lowest-energy docking structure was within 2.37 Å (RMSDheavy) of experiment. On the other hand, GBMV Energy Model II could identify docking poses to 3.79 Å (RMSDheavy) at best for the PTP-peptide system, 5.76 Å for the CDK-peptide system, and more than 8 Å for the remaining 3 systems. However, rescoring these structures using the other solvation model, ε(r)=4r (Energy Model II), again could identify good docking poses for all six systems (Figures 4b1 to 4b6). The lowest-energy rescoring poses were within 2.36 Å, 1.73 Å, 1.76 Å, 2.04 Å, 2.66 and 3.17 Å of experiment (RMSDheavy) for the PTP-, YopH-, PTK-, IGF-, PKA- and CDK-peptide systems respectively.
The above results show that the ε(r)=4r model performed better than the GBMV model for peptide-protein docking when each model was used alone. In addition, Energy Model I (total energy) performed the best. Interestingly, no matter which solvation model was used in running the simulated annealing cycling simulation, rescoring structures with the other solvation model always yielded better results, with good docking poses identified for all the six systems studied here. Table IV summarizes the results for the six protein-peptide systems using the ε(r)=4r model for docking and several energy models for scoring. It is clear that scoring the structures obtained from ε(r)=4r simulated annealing cycling simulation with the GBMV model always gave good docking poses for Energy Model I. Table V shows that scoring structures obtained from the GBMV simulations with the ε(r)=4r model also identified good docking poses for all six systems if Energy Model I and Energy Model II were used. These results suggest the following practical strategy to identify good docking pose in protein-peptide docking: First use the cheaper ε(r)=4r model to dock a peptide to a protein kinase or phosphatase and then score the resulting structures with the GBMV model to identify docking pose using Energy Model I, i.e., total energy of the system.
We mentioned above that the GBMV model had more difficulty finding the correct docking pose when we started simulations with the extended conformations of the peptides on the protein surface. To analyze this in more detail, we compared the range of peptide structures sampled by the GBMV and the ε(r)=4r models. Table VI shows that for all the six systems, the GBMV simulations sampled structures further away from the crystal structures in comparison to the ε(r)=4r model. (Note that only the first 1 ns of the trajectories in the ε(r)=4r model was used for comparison because the GBMV trajectories lasted only 1 ns, instead of 2 ns.) This is consistent with earlier studies that protein-peptide structures were less stable in molecular dynamics simulations using Generalized Born models30.
Previously, we applied a molecular dynamics-based simulated annealing cycling protocol to dock small organic molecules to flexible models of protein kinases and phosphatases. In this work, we extended this approach to docking peptides to similar proteins. We ran simulations with the CHARMM force field using either the ε(r)=4r or the GBMV implicit-solvent model but we scored the structures with several energy models: total energy (Energy Model I), total minus protein energy (Energy Model II), and protein-peptide interaction energy only (Energy Model III) within the same solvation model and with the other solvation model. We obtained the best results by running docking simulation with one solvation model and scoring the structures with the other solvation model using the total energy of the system (Energy Model I). This strategy worked for all six systems studied here. If one used only one solvation energy model in running the simulated annealing cycling docking simulation and in scoring, the cheaper ε(r)=4r model performed better than the GBMV model and it was better to use the total energy of the system. In addition, using one solvation energy model alone could not identify the correct docking pose for all six systems.
Therefore, taking computational efforts into account, the best model tested here involved running docking simulation with the cheaper ε(r)=4r model and scoring the resulting structures with the GBMV model using the total energy of the system (Energy Model I).
This research was supported by a Research Award from the University of Missouri-Saint Louis, a Research Board Award from the University of Missouri System, the National Cancer Institute, and the National Institute of Allergy and Infectious Diseases. We also thank the University of Missouri Bioinformatics Consortium and the University of Missouri-Saint Louis Information Technology Services for providing computational resources.