|Home | About | Journals | Submit | Contact Us | Français|
Incorporating receptor flexibility into molecular docking should improve results for flexible proteins. However, the incorporation of explicit all-atom flexibility with molecular dynamics for the entire protein chain may also introduce significant error and “noise” that could decrease docking accuracy and deteriorate the ability of a scoring function to rank native-like poses. We address this apparent paradox by comparing the success of several flexible receptor models in cross-docking and multiple receptor ensemble docking for p38α mitogen-activated protein (MAP) kinase. Explicit all-atom receptor flexibility has been incorporated into a CHARMM-based molecular docking method (CDOCKER) using both molecular dynamics (MD) and torsion angle molecular dynamics (TAMD) for the refinement of predicted protein-ligand binding geometries. These flexible receptor models have been evaluated, and the accuracy and efficiency of TAMD sampling is directly compared to MD sampling. Several flexible receptor models are compared, encompassing flexible side chains, flexible loops, multiple flexible backbone segments, and treatment of the entire chain as flexible. We find that although including side chain and some backbone flexibility is required for improved docking accuracy as expected, docking accuracy also diminishes as additional and unnecessary receptor flexibility is included into the conformational search space. Ensemble docking results demonstrate that including protein flexibility leads to to improved agreement with binding data for 227 active compounds. This comparison also demonstrates that a flexible receptor model enriches high affinity compound identification without significantly increasing the number of false positives from low affinity compounds.
It is widely recognized that a significant problem in the field of molecular docking is the incorporation of protein flexibility.1,2 In experimentally characterized flexible protein binding sites, conformational changes are observed upon binding different ligands that include minor changes of side chain positions, significant local side chain rearrangement, flexible loop rearrangements, larger backbone movements of the binding pocket, and large hinge motion of domains.3 Incorporating protein flexibility in molecular docking is challenging because it is difficult to predict the extent of protein conformational changes upon ligand binding (challenges in sampling), and it is also more difficult to score flexible protein-ligand complexes accurately (challenges in scoring). It is clear from the literature that rigid receptor docking can be quite successful for some receptors.4 However, when rigid receptor docking is used for receptors that are known to be flexible, docking results from a single receptor conformation often suffer from too many false negatives5 that may be accommodated by minor conformational changes6 and/or slightly more open conformations of the binding pocket.7 Although many studies have shown that incorporating models of protein flexibility can improve docking for these cases, we demonstrate in this article that explicit protein flexibility with all atom molecular dynamics (MD) for the entire protein chain may also introduce significant error and “noise” thereby decreasing docking accuracy and diminishing the ability of a scoring function to discriminate native-like binding poses. In contrast to this result, we also show that similar models that use MD or torsion angle molecular dynamics (TAMD) sampling for flexible side chains and flexible loops are significantly more successful than rigid docking.
In this study, we assess the incorporation of explicit all-atom protein flexibility using both MD and TAMD in CHARMM. Both methods allow flexibility of protein and ligand simultaneously, which is essential for refinement in protein-ligand conformational space. However, TAMD (internal coordinate space) is more efficient in sampling compared to MD (Cartesian space) because there are significantly fewer degrees of freedom in internal coordinates. The use of conformational dynamics in internal coordinates for flexible receptor docking was pioneered by Abagyan and coworkers.8,9 Currently, explicitly exploring a large amount of conformational space for both protein and ligand during the docking procedure is too computationally expensive for screening a large number (>105) of compounds. As many docking algorithms have been optimized for fast and accurate docking of a flexible ligand into a static protein structure, a computationally attractive approach to incorporate protein flexibility is to use multiple receptor conformations (MRCs) for rigid docking, which is now usually referred to as “receptor ensemble docking” (RED).10 In this study, we assess several different flexible receptor models using cross-docking accuracy and also compare RED results to experimental binding data.
P38α mitogen-activated protein (MAP) kinase is a difficult test case for cross-docking in that it has a very flexible ligand binding site experimentally characterized by numerous crystal structures. Hydrogen exchange experiments monitored by mass spectrometry have shown that the unphosphorylated inactive state of p38α is significantly more flexible than the doubly phosphorylated active state.11 The majority of discovered inhibitors successfully bind to this more flexible inactive form of the active site, and some inhibitors bind to an alternative conformational of the binding site known as the “DFG-out” conformation.12 This alternative conformation of the binding site is denoted the DFG-out conformation because the F169 side chain on the flexible activation loop flips out into the binding site revealing a deep hydrophobic pocket to which ligands are able to bind (Figure 1A). Inhibitors have now been observed binding in the DFG-out conformation in numerous tyrosine kinases and ser/thr kinases.13 It has been proposed that targeting the enzyme in the flexible inactive (non-phosphorylated) conformation is a good strategy for designing selective kinase inhibitors.14 Several of these inhibitors have been shown to be selective rather than promiscuous and this has been attributed to binding interactions in the DFG-out conformation. In this paper, we examine an ensemble of 12 inactive (non-phosphorylated) p38α receptor conformations: 9 conformationally diverse crystal structures of DFG-in conformations (1a9u, 1bl6, 1bl7, 1di9, 1ouk, 1ouy, 1oz1, 1w84, 1yqj),15-20 and 3 crystal structures of DFG-out conformations (1kv1, 1kv2, 1w83).18,21
With cross-docking simulations it is possible to determine the sensitivity of docking results to changes in protein conformation by analyzing the effect on the docking accuracy and the prediction of ΔGbinding. The entire 12×12 cross-dock was performed from two different starting conformations of the ligand. The first was starting from the “swapped conformation” of the ligand from its co-crystal structure into a new crystal structure. The second was started from the “best rotation,” which is the lowest energy conformation from 1000 rigid body rotations into the new receptor. Cross docking results starting from the swapped conformation are biased towards native-like poses and should represent the theoretical maximum cross-docking accuracy for the rigid receptor that atom clashes and conformational changes will allow. Rigid receptor docking results starting from the best rotation are unbiased and more directly analogous to the flexible receptor docking method, which also starts from a best rotation into a flexible receptor conformation. For cross docking with a rigid receptor, the top 5 lowest energy docking poses were identified using the RDIE scoring function. The reason why it is important to consider the top 5 scoring poses, and not only the best single scoring pose is that scoring functions are not always able to correctly rank the most native-like poses, even in the case of self-docking. In comparing cross-docking starting from the swapped conformation and from the best rotation, the results are similar when only the top-scoring pose is considered. However, when the top 5 scoring poses are examined it is possible to see that cross docking from the swapped conformation had a higher percentage (38% ≤ 2.0 Å RMSD) of docking success compared to starting from the best rotation into the new receptor conformation (25% ≤ 2.0 Å RMSD). This is expected, as the conformational search starting from the swapped conformation should be more biased towards low RMSD native-like poses. The overall docking accuracy from rigid receptor cross docking is similar for the lowest energy pose regardless of whether they are scored using either the RDIE or the LIE1 scoring function, but the LIE1 scores perform slightly better if you consider all of the top 5 scoring poses. Three receptors (1kv1, 1kv2, 1w83) are in the DFG-out conformation,18,21 and when the results from only the single top scoring poses are considered, these receptors are only able to accurately cross-dock their own ligands to within (≤ 2.0 Å RMSD). Two smaller ligands 1di9 (a quinazoline based fragment) and 1w84 (an indole based fragment) which bind in a similar conformation and are selective to the DFG-in conformation, are also able to bind to the DFG-out receptor conformations within (≤ 3.0 Å RMSD) because the conformation of Phe 169 does not cause a direct atom clash. The ligands selective for the DFG-out conformation are not able to dock accurately in any of the 9 DFG-in receptor conformations.
It was our goal to directly compare cross-docking results using several models of protein flexibility scaling from small side chains movements to large backbone movements. We first considered a rigid backbone and flexible side chain model and then constructed several flexible protein models that were combinations of flexible side chains and flexible backbone segments. To determine which side chains should be considered flexible, the 12 reference protein-ligand complex structures were superimposed by the best fit to match the backbone Cα coordinates. After superposition, distances less than 6 Å between heavy atoms of ligand and protein side chain were used to determine that 34 total residue side chains were considered to interact with the 12 ligands. In the approximation of a rigid backbone, for large residue side chains to adequately sample conformational space, the side chains of adjacent nearest neighbor residues should also be considered to be flexible to allow for concerted structural rearrangements. Therefore, the 34 residues comprising the binding site were linked together into six distinct segments of contiguous residues (res: 30-41, 49-55, 67-89, 103-115, 137-157, 165-177) that form the topology of the binding site. These six contiguous residue segments were then considered to all have flexible side chains, for a total of 74 residues (non gly or pro) with flexible side chains. The two most flexible segments in the superposition of the reference structures are (res:30-41) which will be denoted the “P-loop” (phosphate binding loop) and part of the activation loop (res:165-177) which will be denoted the “DFG loop”. These two loops were the most obvious to include as flexible backbone segments in combination with flexible side chains. In total 8 flexible protein models were constructed which are summarized in Table 1. Model 1 includes flexible side chains only. Models including backbone flexibility were then combined with these flexible side chains: model 2 includes backbone flexibility only in the DFG loop, while model 3 includes backbone flexibility in the two most flexible loops the P-loop and the DFG loop. Model 4 includes backbone flexibility in all six contiguous residue segments that form the topology of the binding site. Subsequent models 5 through 7 were constructed that incrementally include flexibility into previously rigid segments. Model 7 is composed of one large flexible segment that defines the entire topology of the binding site (res:30-177) but still allows 58% of the protein chain to be rigid. Model 8 includes full backbone and side chain flexibility of the entire protein (res:5-351).
For flexible receptor docking using TAMD, the entire 12×12 cross-dock was performed for the 8 different models of protein flexibility. Ligand heavy atom RMSD metrics were used to compare overall docking accuracy over all 144 cross-docking simulations (Table 2). Considering the percentage where the lowest scoring pose is ≤ 2.0 Å RMSD, the docking accuracy seems to decrease proportional to the log of the flexible receptor degrees of freedom. The docking accuracy of flexible models 1 through 6 is all significantly better than rigid docking. Flexible model 8, where the entire protein chain is considered to be flexible, is the only model that had a lower docking accuracy than rigid docking. The fact that the most flexible receptor model 8 exhibited the lowest docking accuracy agrees with observations from previous studies suggesting that increased receptor flexibility may result in lower overall docking accuracy.22 However, when you consider the results including the top 5 scoring poses, model 8 still performs much better than rigid docking. This indicates that model 8 was able to successfully sample and then rank more native-like poses into the top 5 compared to rigid docking. For model 8, when low RMSD poses are in the top 5, there is a diminished ability of the scoring function to correctly rank the most native pose as the best pose compared to the other flexible receptor models. For model 8, when a pose ≤ 2.0 Å RMSD is represented in the top 5, it is correctly ranked as the lowest scoring pose only 21% of the time. In contrast for model 1, where only the side chains are flexible, when a pose ≤ 2.0 Å RMSD is represented in the top 5, it is correctly ranked as the lowest scoring pose 67% of the time. Models 2 through 4 are able to correctly rank low RMSD poses from the top 5 as the top pose on the order of 42-52% of the time. Therefore, the most flexible receptor model displays a reduced ability to score or rank native-like binding poses compared to the other models.
Considering only the percentage where any of the top 5 scoring poses is ≤ 2.0 Å RMSD) in Table 2, the docking accuracy increases from rigid receptor (38%) to model 1 (61%), model 2 (76%), and is the highest for model 3 (84%). Model 3 is the most successfully flexible receptor model for sampling and then ranking low RMSD poses into the top 5 (Supporting Information Figure S1). From a careful comparison of all 144 cross-docks individually, it is clear that this improvement is due to the inclusion of backbone flexibility for the two loops. The increase in docking accuracy from 38% (rigid receptor) to 61% (model 1) reflects success for many of the less difficult cross-docking situations where only minor or major changes in side chain conformation are sufficient for successful docking. The increase in docking accuracy from 61% for model 1 to 76% for model 2 reflects success in more difficult cross-docking situations where rearrangements in the backbone conformation of the DFG loop and side chains are required for successful docking. Including flexibility in the two most flexible loops (model 3) was even more successful at ranking low RMSD poses (≤ 2.0 Å ) in the top 5. It was somewhat surprising that there was such a drop in docking accuracy from model 3 (2 flexible loops) to model 4 (6 flexible segments). From superposition of the cross-docking crystal structures, it was expected that model 4 would perform well for backbone rearrangements in the gatekeeper region (res:103-115) required for success in a few specific cross-docks. Even though model 4 outperforms models 1 through 3 for a few of these specific cross-docking examples, it is less successful in many of the less challenging cross-docks that required only minor changes in side chain conformation.
Our investigation of models 4 through 8 was aimed at assessing ways of including binding site flexibility in the absence of multiple crystal structures or any evidence as to which parts of the receptor binding site is flexible. If nothing was known about a receptor’s flexibility, would it be better to apply backbone flexibility to multiple protein segments that form the binding site, or to all contiguous residues that form the binding site, or to the entire protein? It was surprising that when the top 5 scoring poses are examined, there is not much of a difference between models 4 through 7. In constrast to models 1 thru 3, models 4 thru 7 include backbone flexibility in the hinge region (res:106-112) which is the third most flexible backbone segment in the reference crystal structures, and atom clashes involving this region result in some rigid docking failures. It was therefore surprising that including backbone flexibility in this region did not have a significant impact. In general, our results reflect that the best docking success is achieved with the most minimal model that adequately describes the experimentally observed protein flexibility. Model 3, which has many flexible side chains and only two flexible loops, demonstrates the best performance in docking accuracy because it describes the flexibility of the receptor adequately with the fewest necessary degrees of freedom comprising the conformational search space. It would be possible to pursue this strategy for a new receptor target even in the absence of any knowledge of the flexibility of the binding site, by predicting the most flexible segments from MD simulations, normal mode analysis,23,24 elastic network normal modes,25 or other framework approach algorithms.26 Comparison between flexible receptor models 1 through 8 also suggests that it would likely be better to consider only the segments forming the binding site to be flexible, rather than the entire protein. These principles can be widely extended to build efficient flexible receptor docking models for new targets of interest. For systems where large conformational changes are expected, our results suggest that it may be a good idea to use other conformational sampling approaches (including: normal mode analysis, fully flexible MD conformational sampling with solvent, etc.) in order to identify a small subset of diverse low energy conformations. Then when low energy conformations representing these various states are selected, local explicit flexibility in the binding site can be applied to improve docking accuracy. This approach is in contrast to docking into a large number of fully flexible structures (like flexible model 8) where the performance of a scoring function is diminished due to energetic “noise”.
Reduced docking accuracy in flexible model 8 was primarily due to significant binding site rearrangements resulting in low scoring non-native ligand binding poses. Distortions in the geometry of the ligand binding residue side chains directly resulted in a loss of docking accuracy in flexible model 8. Over the entire cross-dock we show that 60-70% of the native protein-ligand contacts were sufficient for docking in native-like binding modes. In the majority of the non-native binding pocket conformations, more than 50% of the ligand binding residues had undergone significant rearrangement and cannot form enough native-like ligand contacts. Specific cross-docks where only model 8 failed, were examined in detail and in many cases there were significant binding site conformational changes in the side chains that are not observed in any of the reference crystal structures. In these cases the overall backbone Cα RMSD is in the range of 1.5 to 2.0 Å from the starting and reference structures, but these backbone deviations are on the order of the maximum Cα RMSD between reference crystal structures (1.7 Å Cα RMSD for 1w83 and 1yqj). One similarity between these docking failures was that there were significant changes in either the backbone and/or side chain conformation of the majority of residues that form native contacts with the ligand. Two specific groups of conserved hydrophobic contacts in kinase scaffolds have been proposed as key contacts involved in catalytic and regulatory conformational changes: the regulatory spine (res: 75, 86, 148, 169) and the catalytic spine (res: 38, 51, 113, 156, 157, 158, 212, 216).27,28 An interesting observation was that in many cases where only model 8 failed, conformational changes were also observed in the side chain contacts in the catalytic spine, where no changes in the catalytic spine are observed between any of the 12 reference crystal structures despite significant changes in the binding pocket. Model 8 is the only model where the entire catalytic spine is flexible and these observed rearrangements only occurred for model 8. Therefore, it would have been possible to use this information to select low energy conformations generated from model 8 that maintained native-like structure in the catalytic spine, but allowed significant rearrangements of the binding site.
It is possible that some non-native distortions to the structure in fully flexible model 8 may be due to a lack of explicit or implicit solvent during the brief conformational sampling procedure to generate apo receptor conformations. Including implicit solvent during this conformational search and the subsequent refinement of the receptor ligand complexes would be possible, but at a great increase in computational expense. To assess this, we performed a few control simulations of this procedure using the GBMV implicit solvent model, and found that the overall level of backbone (Cα RMSD) and side chain rearrangement is very similar. Apo receptor conformations generated with implicit solvent also exhibited similar binding site distortions in side chain conformations that contact the ligand, as well as side chain conformational changes involved in the catalytic spine (most notably residues Leu156, Val158, and Ile212), and in all four residues in the regulatory spine. One notable difference in the binding site conformations generated with implicit solvent is that Asp168 and Glu72 form less close contacts with Lys53, Arg67 and Arg173, presumably because the full charges on these side chains were solvated. For certain specific cross-docking failures non-native contacts between Asp168 and Lys53 were formed, but there are also examples of close interactions between these residues in some of the reference co-crystal structure (1ouk and 1w84) compared to the others. It remains unclear if the added computational expense of the implicit solvent during receptor conformation generation would actually improve docking accuracy compared to our other models. Performing the entire cross-dock again with this procedure was beyond the scope of this manuscript. A large ensemble of fully flexible protein structures produced with high quality explicit solvent simulations would likely retain ensemble members that contain such non-native distortions. Docking into this entire ensemble would still be “noisy” and therefore, we propose that a better strategy would be to select a few low energy representative states from such an ensemble and then apply local flexibility to the binding site as mentioned above. Our results for this specific kinase system may not be generalized to all other flexible receptors, but our conclusions should hold in the regime where the amount of sampling required for visiting diverse low energy conformational states (such as DFG-in/DFG-out) is sufficient to rearrange the majority of ligand interacting residues in the binding site.
TAMD flexible receptor docking succeeds in many difficult cases using models 2 and 3 where rigid receptor docking fails. When the cross-docking results are compared to the experimental binding data using the LIE2 scoring functions (Supporting Information Table S1), the flexible receptor results are in much better agreement (1.1 AUE kcal/mol) as expected. The poorer agreement with the binding data for the rigid receptor is the result of lower docking accuracy, especially for atom clashes with high affinity compounds (false negatives). There is a big improvement in docking accuracy for the three receptors in the DFG-out conformation using the TAMD flexible receptor models, where models 2 and 3 perform the best. Receptors in the DFG-out conformation are now able to successfully dock ligands that bind in the DFG-in conformation as well as the converse, ligands selective for DFG-out are also able to successfully dock to DFG-in receptors. A specific example of one of these difficult cross-docks is swapping the ligands from 1ouy (DFG-in) and 1w83 (DFG-out), shown in Figure 1A. The swapped conformation of each ligand has atom clashes with the DFG loop, requiring significant DFG loop conformational changes for successful cross docking into a native-like conformation. For flexible receptor cross docking of the ligand 1ouy into the receptor 1w83 (Figure 1B and C) the lowest energy pose for this cross-dock was 1.5 Å RMSD for the ligand and the complex contained 71% of the native protein-ligand contacts. For cross docking of the ligand 1w83 into the receptor 1ouy the lowest energy pose was 2.4 Å RMSD for the ligand and the complex contained 66% of the native protein-ligand contacts. Among the top 5 non-redundant poses was another pose 1.8 Å RMSD for the ligand which contained 69% of the native prot-lig contacts in the complex.
Despite the fact that the DFG loop does not sample the entire range of DFG loop conformations, TAMD models 2 and 3 are still able to successfully cross-dock ligands selective for the DFG-out conformation into receptors in the DFG-in conformation. Upon careful examination of these structures, many individual examples demonstrate that smaller movements of the DFG loop backbone (on the order of 3-6 Å), as well as changes in side chain conformation are sufficient to remove the primary atom clashes between the native conformation of the ligand and the DFG loop. Thus, for cross-docking these examples, even though the generated conformations of the DFG loop are only able to come as close as 2.5-3.0 Å Cα RMSD to the reference backbone conformation of the DFG-loop (Figure 1E and F), the ligands are still capable of docking in the correct conformation because up to 70% of the native protein-ligand contacts are properly formed and contribute the strongest anchor points for binding in the correct orientation. TAMD sampling using model 3 was also successful at correctly predicting significant backbone and side chain conformational changes in the P-loop. Figure 2 shows two separate examples of concerted changes in the two flexible loops that lead to successful cross docking in difficult cases. In both examples in Figure 2, Tyr 35 from the P-loop undergoes a successful rearrangement from the starting conformation (purple) to the final conformation (red), which is in turn closer to the reference target conformation (blue) allowing the ligand (red) to dock in the correct conformation. Arrows are shown to highlight the movement of Tyr 35 from the starting conformation closer to the reference conformation. In the first example (Figure 2A and B), the ligand 1bl7 (DFG-in) is able to dock accurately into the DFG-out conformation because of the backbone and side chain rearrangement of Phe 169. In the second example (Figure 2C and D) the ligand 1yqj (DFG-in) is able to dock accurately into the 1di9 (DFG-in) receptor conformation. Tyr 35 on the P-loop moves over (red) to form a more native-like contact with the ligand as in the reference structure (blue).
TAMD sampling was directly compared to MD sampling over the entire cross-dock set for flexible receptor models 2, 3, 4 & 8. Considering only the first docking accuracy metric in Table 2 (the percentage where the lowest scoring pose is ≤ 2.0 Å RMSD), TAMD docking accuracy is better than MD for models 2 through 4. MD sampling reduced the docking accuracy for cross-docks requiring significant DFG loop conformational changes. For many difficult cross-docks TAMD sampling was able to identify a native-like low energy conformation and MD sampling was insufficient. For these examples, scatter plots of ligand RMSD versus LIE1 energy show that TAMD sampling produced low energy native-like conformations and MD sampling was unable to sample native-like conformations at all (Supporting Information Figure S2). In other cases, MD sampling was sufficient to sample native-like conformations, yet displayed a reduced ability to score or rank native-like binding poses compared to TAMD (non-native conformations score better than the native). Despite these deficiencies of MD sampling compared to TAMD sampling, the MD model 2 was used for ensemble docking of 227 active compounds to determine if it was sufficient to demonstrate significant improvement over rigid ensemble docking. This choice was made as a compromise between accuracy and speed. While our rigid receptor docking protocol runs within an hour for each ligand to generate the top ranked poses, the fully flexible TAMD model 8 (or MD model 8) have comparative runtimes on the order of 34 hours. MD model 2 has a slightly faster runtime on the order of 26 hours when using the consfix routine in CHARMM to speed up the non-bonded evaluations. These runtimes are not yet optimized and all runtimes were evaluated on a single processor of a 2.66 GHz Intel Xeon quad core.
In analyzing the results from docking the series of 227 active compounds into the ensemble of 12 receptor conformations, LIE1 (derived from native scoring) was applied to rank the ligand conformations for both rigid and flexible receptor docking. After the top 5 scoring poses are identified, calculated energy components from MD that are used for comparison to experimental binding data using the LIE2 model that incorporates desolvation. For clarity in presenting results from ensemble docking, only results from the single lowest scoring pose will be used, rather than including all results from the top 5. Our objective in comparing to binding data (fitting LIE2 to experiment) is to assess if the top scoring ligand poses derived from flexible receptor docking are able to provide better agreement with binding data than rigid receptor docking.
In analyzing results from ensemble docking using the LIE2 model, much better fits to the experimental data were achieved by adding an entropic term related to the number of freely rotatable bonds29 in each given ligand that becomes immobilized upon binding to the receptor: [ΔGbinding = α(ΔVDW) +β(Δ(ELEC+GBE)) + γ(Nrotatable_bonds)]. This term had not been included previously in LIE2 because the cross-docking test set showed minimal improvement upon adding the entropic term. Fits to the LIE2 model yield values of γ that are within the range of other values (0.1-0.6 kcal/mol) reported in the literature for the entropic contribution to ΔGbinding from a single freely rotatable bond in a ligand.29,30 The parameter γ was found to be (rigid γ=0.19, flex γ=0.35) when the docking results for all 12 receptors are fit simultaneously to binding data. Comparisons to binding data are calculated in two ways: first, all 12 receptors were fit simultaneously to the binding data using a single LIE2 model, and secondly, each receptor was fit individually to the experimental data with its own LIE2 model. The second approach has smaller errors in kcal/mol as expected, but an important finding is that these two fitting approaches both show similar qualitative trends and few differences in comparing rigid and flexible receptors.
Although there are not crystal structures for all of the 227 active compounds, a reasonable approximation can be made for the ligand binding conformations assuming that the vast majority of active compounds will adopt binding modes similar to known crystal structures of similar compounds. There are six major series of related compounds: 1. 2-aminopyrimidine carbamates,31 2. 4-azaindoles,19 3. aminopyrazoles,32 4. benzimidazolones,33 5. indole and pyridine fragments,18 and 6. triazolopyridine oxazoles.34 All ligands that bind in the DFG-in conformation share at least two anchor points or pharmacophores (Figure 3): 1. a hydrogen bond or close interactions with the backbone and side chains in the gatekeeper region of the active site (res:107-110H) and 2. a deep hydrophobic pocket that binds aromatic groups (T106, L75, L104,). These two anchor points can be used to determine the overall orientation of each ligand docking pose. Series 2, 4 & 6, also have a third unique anchor point which is a hydrophilic interaction with the charged NH3 group of K53. Ligand binding poses were considered to bind in a native-like orientation if they satisfied two or more of these anchor points, and this criteria was used to approximate the overall docking accuracy for each receptor (Table 3). Hydrogen bonding to the hinge region in (Figure 3F) shows the peptide plane flip (from structure 1zzl) that is also present in the structure for 1ouk and 1ouy. All compounds are able to hydrogen bond to the hinge region in either conformation, although the ideal ligand interactions may not be formed.
The highest docking accuracy using a rigid receptor model was found for receptors 1ouk (67%) and 1ouy (64%). The receptor 1ouk also had the best docking accuracy in the cross-docking study according to ligand RMSD metrics. This is an interesting finding in light of a recent solution NMR study which concluded that the predominant ensemble in solution is most similar to the crystal structures of 1ouk and 1ouy.35 This NMR study compared a set of 73 measured residual dipolar couplings (RDCs) from the apo conformation in solution to various crystal structure conformations.35 Receptor conformations 1ouk and 1ouy exhibit a peptide plane flip over the gatekeeper residues M109 and G111, which participate in backbone hydrogen bonding with ligands. Comparison of the measured RDCs upon binding the pyridinylimidazole inhibitor 1a9u demonstrated evidence for a structural transition over residues M109 and G111 from the receptor conformation (1ouk) to the crystallographic conformation (1a9u).35 Therefore, prior to binding, the predominant conformation of the apo receptor was most similar to 1ouk and 1ouy. This experimental observation rationalizes the observed high docking accuracy and good agreement with binding data for rigid receptor docking with 1ouk and 1ouy. It is also interesting that the lowest rigid receptor docking accuracy is observed for the receptor conformation of 1a9u, and this can be attributed to the peptide plane flip in the gatekeeper region as well as a very closed conformation of the P-loop. Low docking accuracy for the ligand 1a9u in cross-docking can also be explained by NMR evidence that this ligand binds in more than one conformation.35,36
There are several receptors that perform very poorly for rigid docking that show improvements in docking accuracy upon adding flexibility. The docking accuracy for individual rigid receptors ranges from as low as 11% for 1a9u to as high as 67% for 1ouk. For 5 of 12 receptors with relatively low rigid docking accuracy (1a9u, 1bl6, 1bl7, 1w83, 1w84), the flexible docking model improves the docking accuracy. However, for 5 other receptors that had a comparatively high rigid receptor docking accuracy (1di9, 1ouk, 1ouy, 1oz1, 1yqj), including flexibility slightly diminished overall docking accuracy and overall agreement with binding data. In this way, flexibility improved agreement with experiment for receptors that perform poorly in rigid docking, and reduced agreement with experiment for the receptors that had the best results for rigid docking. However despite this small decrease in overall docking accuracy for these receptors, flexible receptor docking still achieved overall improvement in ranking compounds when compared to binding data.
Comparison to experimental binding data shows that the flexible receptor model has an improved ability to rank compounds. When all 12 receptors are fit to binding data simultaneously, 8 of 12 have a better linear correlation (R2) with the experimental data with a flexible receptor. When the receptors are fit individually, 7 of the 12 receptors showed improved agreement in terms of error (AUE) in kcal/mol and linear correlation (R2) with a flexible receptor. In a comparison of each individual receptor to experimental binding data, the flexible receptor model demonstrated improved enrichment of high affinity compounds over low affinity compounds (Table 3, Supporting Information Table S2). Results from rigid and flexible docking are compared by calculating the enrichment factor (EF) for each receptor based on the number of high affinity hits in the top n ranked compounds. Enrichment factors (EF) were calculated [ EF=(Hitsn/NCn)/(Hitstotal/NCtotal) ] where Hitsn are the number of true high affinity actives retrieved in NCn (the top n ranked compounds), Hitstotal are the total number of high affinity actives, and NCtotal are the total number of compounds. Enrichment factors were calculated for two sets of the highest affinity compounds. The first group of high affinity compounds was 46 of 227 ligands that have (ΔGexp<−11.0 kcal/mol), and the second group was 89 of 227 ligands with (ΔGexp<−10.0 kcal/mol). The enrichment factor EF1 was calculated for this first group 1 (ΔGexp<−11.0 ), and the enrichment factor EF2 was calculated for group 2 (ΔGexp<−10.0 kcal/mol). Given these definitions the maximum theoretical value of EF1 is 4.9, and the maximum theoretical value of EF2 is 2.6. A third group of 57 low affinity compounds were selected from the 227 using the criteria (ΔGexp>−8.0 kcal/mol). This third group of low affinity compounds was used to calculate the number of false positives, assuming that group 2 (true positives) could be successfully separated from group 3. A large majority of the individual receptors (10 of 12 receptors) had improved enrichment factors with a flexible receptor (Figure 4), and the overall enrichment over the ensemble of 12 receptors is better for the flexible receptor model (Supporting Information Table S2 & S3). The flexible receptor model also showed significant enrichment compared to the rigid receptor model in separating these 227 active compounds from a large number (>9000) of decoy molecules (Figure 5A and B). These decoy molecules, which are specific for p38α, were taken from the DUD (database of useful decoys), and represent a very difficult decoy set because the ligands were selected to be similar to the actives in several molecular properties37. As 28 of our 227 active compounds are larger in MW than the DUD decoys molecules, we also calculated the enrichment factors considering only 153 active ligands that share the same molecular weight range (450>MW>320) as the DUD decoys to eliminate any source of artificial enrichment (Figure 5C and D).38 Either approach to calculating the enrichment factor shows significant improvement of the flexible receptor model compared to the rigid receptor for either DFG-in or DFG-out conformations (Figure 5).
As our results show certain receptors perform well for one ligand series, while not so well for others, we investigated several combinations of minimal receptor ensembles. A minimal receptor ensemble of 4 receptors (1a9u, 1di9, 1ouk, 1oz1) was constructed using the best rigid receptor for each series respectively. When these rigid receptor docking results are fit simultaneously to the LIE2 model, it achieves much better agreement with binding data than any individual receptor (Figure 6), and shows that it is possible for this LIE2 model to rank all six series simultaneously using a multiple receptor conformation ensemble (MRC) approach. For the six major series of related compounds the receptor with the best agreement was used for each series respectively: 1. 2-aminopyrimidine carbamates (1ouk), 2. 4-azaindoles (1a9u), 3. aminopyrazoles (1a9u), 4. benzimidazolones (1oz1), 5. indole and pyridine fragments (1di9), and 6. triazolopyridine oxazoles (1ouk). The best linear fit for this model through all points (AUE=1.1 kcal/mol, RMS=1.4 kcal/mol, and R2=0.41) is shown in black in Figure 6A. Figure 6 shows that the best fit receptor for each series has a slightly different slope, even when fit simultaneously. The agreement with the binding data for each series is as follows: 1. 2-aminopyrimidine carbamates (R2=0.30), 2. 4-azaindoles (R2=0.24), 3. aminopyrazoles (R2=0.19), 4. benzimidazolones (R2=0.16), 5. indole and pyridine fragments (R2=0.72), and 6. triazolopyridine oxazoles (R2=0.17).
As p38α is considered to be a representative ser/thr kinase with strong induced fit effects, it has been used previously in several other flexible receptor docking studies.39-42 A small subset of our cross-dock can be compared to other studies with ICM39 and Glide41 for docking accuracy, where a docking success is defined as the single lowest energy structure <= 2.0 Å RMSD. For a very small 4×4 comparison with ICM39, our flexible model 1 was the best with a 38% accuracy which was lower than ICM with a 63% accuracy, however this lower performance is predominantly from poor docking accuracy of the ligand for 1a9u, which has been shown to bind experimentally in more than one binding mode.35,36 For a larger comparison with Glide41 (3×3 DFG-out) (9×9 DFG-in), flexible model 1 was the best with 46% accuracy where Glide was 60% accurate.
Early database enrichments were also measured in these two studies using ICM39 and Glide41 and our early enrichment in the top 1% to 5% of the ranked database (Figure 5C and D) are roughly comparable given that these two studies have different lists of actives and decoys. These two studies also both used on the order of 1000 decoys and we have used more than 9000 decoys. In the ICM study for the receptor 1a9u enrichment factors for the top 1%, 2%, and 10% were 16.7, 8.3 and 5.0 respectivly where our corresponding EF from Figure 5C for 1a9u are 7.1, 3.0 and 1.1 respectively. The early enrichments for our rigid docking results for receptor 1kv2 are also reasonable compared to rigid docking performed in the DUD dataset paper with DOCK37, where the EF at 1% was 2.1 with DOCK and our respective EF at 1% was 1.4, although our EF values range from above 10 to 3.3 for multiple actives found within the top 1%. In a recent report comparing ICM and Glide for enrichment of 111 actives against 6450 decoys, the area under a receiver operating characteristic (ROC) curve was reported for enrichment.42 In the ROC curve, the signal (fraction of true positives) is plotted on the y axis against the noise (fraction of false positives) on the x axis, and the area under the curve (AUC) indicates the overall signal to noise where 1.0 is theoretically perfect performance and 0.5 is a random performance. In the above mentioned study the ROC AUC values for single receptor docking were 0.64 for Glide and 0.70 for ICM and the ROC AUC for multiple receptor docking were 0.81 for Glide and 0.88 for ICM respectivly.42
We have calculated the ROC AUC for our actives with the DUD decoy set (Figure 5C and D) for the molecular weight range of 320-450 and have found that the ROC AUC for the flexible receptor is 0.47 (1a9u) and 0.44 (1kv2). These are both less than 0.5 and are therefore worse than random when considering ranking the entire database, despite the fact that these datasets show reasonable early enrichment within the top 1-10% of the dataset. However, the ROC AUC are still better for the flexible receptor compared to the rigid receptors 0.42 (1a9u) and 0.36 (1kv2). One reason for this poor apparent performance is due to the large number of low molecular weight (MW) actives in our set, where high MW decoys score better than low MW actives. When only the actives and the decoys in the high MW range of 375-450 are compared (52 actives and 4432 decoys), then our enrichment studies have a ROC AUC of 0.64 (1a9u) and 0.64 (1kv2) for the flexible receptors. These ROC AUC values are both better than random over the entire database and more similar to values reported for single receptor docking for Glide and ICM.42 Another reason for the lower ROC AUC for the entire MW range of 320-450 may be that the decoys in the DUD dataset may be more similar to p38α actives than the decoy set in the study using Glide and ICM.42 Our results show that for DUD that reasonable early enrichment is observed in EF factors, but that much better overall enrichment occurs in a comparison of actives and decoys in the MW range of 375-450 compared to the MW range of 320-375. We suspect that this may be because larger MW ligands have more specific contacts with the receptor compared to their decoys and that lower MW decoys are more similar to low MW actives.38 In general, although our results for the flexible receptor models are much better than our rigid receptor results, these flexible receptor models are still not performing as well as ICM and Glide.
For ranking docking poses of the same ligand, using the total energy of the protein-ligand complex with an R-dependent dielectric (RDIE) is sufficient,43 as long as the receptor is rigid. When flexibility is introduced into the protein during the conformational search, the total energy of a given protein-ligand complex is no longer a reliable metric to distinguish low energy native-like poses. This lower predictive power is due to “noise” from conformational changes in intra-protein interactions, which are energetically equivalent or greater than conformational changes in protein-ligand interaction energy. For this reason, when protein flexibility is introduced into the conformational search problem, a scoring function that is based on energetic components from protein-ligand interaction is more effective. In a previous study, CHARMm protein-ligand interaction energies performed well compared to 8 other commonly used scoring approaches for the discrimination of docked and mis-docked structures.43 Although this approach works well for rigid docking, for flexible protein docking, we have employed linear interaction energy (LIE) models44 for our CHARMm protein-ligand interaction energies of the generic form: [ΔGbinding = α(Δcomponent1) + β(Δcomponent2) + γ(Δcomponent3)] as the sum of potential energy components with weighted coefficients. Following the work of Caflisch and coworkers,44 we have used an LIE model of the form [ΔGbinding = α(ΔVDW) +β(Δ(ELEC+GBE))] for predicting the absolute ΔGbinding in order to compare ligands of various sizes and shapes (to rank a series of ligands). In the current study of protein flexibility, we use a less computationally expensive LIE model of the form [ΔGbinding = α(ΔVDW) + β(ΔELEC)] for ranking binding poses of the same ligand where both the protein and ligand are flexible simultaneously. This simplified LIE score avoids the computationally expensive calculation of a generalized Born (using GBMV) solvation energy components for every docking pose. Using this model assumes that differences in solvation energy should be a minor component when considering poses of the same ligand. Therefore our approach to flexible receptor docking uses a hierarchical two-step scoring approach to reduce computational expense,45,46 where a simplified “LIE1” model is used to select the best scoring poses, which are then rescored using a more sophisticated and computationally expensive “LIE2” model that includes desolvation.
In order to construct these LIE scoring functions for flexible receptor docking, energetic components acquired from “native scoring” and from self-docking using a rigid receptor were compared. Native scoring LIE components were calculated from both the minimized static native structure, and from MD simulations started from these structures. Three metrics of comparison are used to compare predicted ΔGbinding to experimental ΔGbinding values: average unsigned errror (AUE) in kcal/mol, root mean square error (RMS) in kcal/mol, and the best fit linear correlation coefficient (R2). When the parameters α and β are fit to the energy components from native scoring, the more sophisticated LIE2 model including the desolvation effects [ΔGbinding = α(ΔVDW) +β(Δ(ELEC+GBE))] demonstrates an improved agreement with experiment as expected compared to the LIE1 without solvation [ΔGbinding = α(ΔVDW) + β(ΔELEC)]. The best fit (AUE =1.0 kcal/mol) was from the LIE2 model applied to ensemble averages over MD simulation. Given the reasonable transferability of these LIE1 fits between native scoring and self-docking, we used them as a scoring function for flexible receptor docking.44 As an additional check, all of the docking poses collected from self-docking were re-scored using the LIE1 scoring function, and native-like ligand poses with heavy atom RMSD ≤ 2.0 Å were still obtained as lowest energy poses. Cross-docking results described below show that using LIE1 scores result in improved cross-docking accuracy compared to using the total energy of the complex, for both rigid receptor docking and for flexible receptor docking.
Our molecular docking approach uses the program CHARMM (Chemistry at HARvard Molecular Mechanics) for an all-atom potential energy description of protein-ligand complexes.47 The CHARMm force field, originally parameterized by Momany and Rone48 has been extended to describe ligands in the Ligand-Protein Database (LPDB) and was used to build potential energy functions for all ligands in this study.49 The crystal structure coordinates of 12 protein-ligand complex structures of p38α MAP kinase were downloaded from the PDB database.15-21 All structural water molecules were removed, and standard protonation states for neutral pH were applied to all ionizable protein residues. Appropriate protonation states were applied to ligands by calculating pKa’s for titratable groups using both Marvin (http://www.chemaxon.com) and the web server SPARC (http://ibmlc2/chem.uga.edu/sparc/).50 Hydrogen atoms were built for ligands using open Babel (http://openbabel.org), and all protein hydrogen atoms were built using CHARMM. For ligands with multiple possible tautomers (such as 1a9u), only the single tautomer that allows a close favorable interaction (between the NH3 group of Lys 53 and imidazole nitrogen) inferred from the co-crystal structure was used.
The activation loop residues D168 and F169 form the characteristic inactive “DFG-out conformation” in the structures of 1kv1 and 1kv2, and 1w83. The structures 1kv1 and 1kv2 are missing electron density for the rest of the flexible residues 170-184 in the activation loop. Flexible loop conformations for this missing electron density were built using Modeller and the MODloop server (http://modbase.compbio.ucsf.edu/modloop/modloop.html).51 For 1kv1 and 1kv2, several conformations of the activation loop were considered, but the lowest energy models similar to the DFG-out conformation of 1w83 were selected. The all atom protein-ligand complexes were minimized for 200 steps in vacuo and this was taken as the experimental reference conformation of all structures.
In our rigid receptor-docking model, a 1.0 Å grid is used to describe the static protein conformation of the binding site, where the interaction energies of 20 types of probe atoms are calculated for every point on the grid. The 3D grid is calculated to extend 8 Å in all directions from any atom in the ligand. Following a simulated annealing conformational search of the flexible ligand, the grid is removed and the all atom protein-ligand representation is minimized fixing the coordinates of the protein, using the standard all atom potential function with a distance dependent dielectric (RDIE). This interaction energy is taken as the score for the final ligand pose. Other details of previous CDOCKER setups and protocol have been published elsewhere.52-55 For each given protein-ligand complex, 10 docking trials were performed composed of 1000 individual docking attempts (100 random conformations x 10 random rotations) for a total of 10,000 docking attempts per complex. Within each docking trial, the interaction energy scores of the final ligand poses are sorted and clustered to identify the “top 5” non-redundant clusters of docking poses. After these “top 5” docking poses are identified, then a short MD simulation of the entire protein-ligand complex is performed for each of the “top 5” in order to more accurately predict ΔGbinding. In these short MD simulations, the electrostatic contribution to the solvation energy is approximated by using a generalized Born implicit solvent potential energy term (GBMV).56,57 For the “top 5” docking poses, MD simulations are performed using a 16 Å non-bonded cutoff and the GBMV implicit solvent model. These MD simulations are performed at 300 K using harmonic restraints on the Cα atoms of the protein backbone, starting with 500 (2 fs) MD steps for equilibration. The harmonic restraints are then released for an additional 1000 (2 fs) steps of production MD, which is used to calculate the ensemble average protein-ligand interaction energies. The first 50 steps are skipped, and then ensemble average interaction energy properties are calculated from 950 structures for the LIE2 model.
The TAMD method uses an efficient Newton-Euler inverse mass operator (NEIMO) algorithm for solving the equations of motion in internal coordinates.58 Molecules are represented with a branched tree structure of rigid body clusters connected by torsional hinges. Torsional cross-terms are constructed from local molecular fragments, using a soft-core potential to introduce implicit bond and angle flexibility into the rigid geometry approximation. The use of the soft-core potential effectively removes high-energy barriers on the IC potential energy surface and facilitates more efficient sampling of conformational space. Additional details about the TAMD method have been published elsewhere.59
The flexible receptor docking method differs from the rigid docking method in that it never uses a grid representation of the protein. An overview of the flexible docking procedure is shown in Figure 7. In the first step of the flexible receptor docking procedure, random conformations of the ligand are generated for a total of 200 ligand conformations. The next step is to perform TAMD conformational sampling of the receptor in the absence of the ligand to generate a diverse ensemble of 200 flexible receptor conformations. During this conformation generation procedure, a 14 Å non-bonded cutoff is used along with an R-dependent dielectric (RDIE) and a soft-core potential. Using TAMD for simulated annealing, quite large temperatures can be effectively employed for rapid conformational sampling, depending on which model of protein flexibility is used. Models with fewer flexible degrees of freedom can use larger temperatures for sampling without integrator stability problems. For flexible model 2, heating is performed from 2000 K to 2500 K, while for flexible model 8, heating is performed from 500 K to 700 K and then the system is cooled from 700 K down to 300 K. These protocols yield overall protein RMSD that are in the range of 0.3 - 3.0 Å Cα RMSD from the starting receptor conformations.
In the next step, the 200 random flexible receptor conformations are docked with 200 random flexible ligand conformations. For each new random pair of flexible ligand and receptor conformations, 1000 ligand rotations are performed around a reference ligand’s center of mass, to identify the optimal rotation conformation of the flexible ligand in the flexible receptor conformation. For this search for an optimal rotation in the binding site, a soft-core potential and all-atom models are used for both the ligand and the receptor. Next, the structure of the protein-ligand complex is refined using TAMD simulated annealing, which allows for simultaneous flexibility of the protein and ligand. The heating and cooling cycle is carried out for all complexes using 1000 steps of 2 fs TAMD, where the system is heated from 500 K to 700 K over 1000 steps, and then cooled from 700 K back down to 300 K over 1000 steps. During this simulated annealing cycle, significant rearrangements of both protein and ligand are possible. Although some complexes do not move far from their starting structure because they contain favorable protein-ligand interactions, other less favorable complexes undergo significant conformational rearrangements. For flexible protein model 2, changes characterized by atom movements as large as 6-8 Å are observed for both ligand and flexible protein side chains in numerous examples. The final TAMD generated protein-ligand complex structures are then minimized for 1000 steps using the soft-core potential. Then the soft-core potential is removed and replaced with the standard potential function with a distance dependent dielectric (RDIE). Interaction energies and potential energy components are calculated from this final minimized protein-ligand complex and are used to calculate the LIE1 score to energy rank the TAMD generated flexible receptor complexes. Similar to rigid receptor docking, these LIE1 scores for the final flexible protein-ligand complex conformation are sorted and clustered to identify the “top 5” non-redundant clusters of docking poses. As for rigid receptor docking, then a short MD simulation of the entire protein-ligand complex is performed using the GBMV implicit solvent model for each of the “top 5” in order to calculate the potential energy components for the LIE2 score.
In order to extend our study from the N=12 cross dock, we also considered the docking of 227 active compounds that have IC50 binding data for human p38α reported in the Binding Database (www.bindingdb.org).60 Approximate binding free energies were calculated from IC50 data using solution information and the relationship (Ki = IC50*(Km/(S+Km)). The covalent structure and initial low energy 3D conformations of each ligand were generated from smiles strings using Marvin. Appropriate protonation states were applied to ligands by calculating pKa’s for potential titratable groups using both Marvin and the web server SPARC. Decoy molecules specific for p38a were taken from the DUD (database of useful decoys), and the protonation states from DUD were used.37 All 227 ligands were docked to the ensemble of 12 receptor conformations using both the rigid and flexible receptor-docking model. For flexible receptor docking, flexible protein model 2 was used with MD flexible receptor conformational sampling in order to determine if MD sampling alone was sufficient to achieve improvements over rigid docking.
Several models were constructed to represent the flexibility of the binding site for p38α MAP kinase, which exhibits significant binding site flexibility in ligand bound co-crystal structures. These flexible receptor models scaled from the inclusion of: 1. flexible side chains (model 1); 2. one or more flexible backbone segments (models 2 through 7); and 3. treatment of the entire protein to be flexible (model 8). As constructed, these models scale in the number of total protein degrees of freedom that describe the conformational search space. Models 2 and 3 demonstrate superior performance in cross-docking accuracy because they describe the flexibility of the system adequately with the fewest degrees of freedom. We show that the fully flexible protein model 8 suffers from reduced docking accuracy. In fully flexible protein model 8, the scoring function docking accuracy is diminished and more “noisy” due to extensive binding site rearrangements involving the majority of the ligand interacting residues. While some of these conformational changes are native and observed in crystal structures leading to docking successes, other non-native rearrangements result in low scoring non-native ligand binding poses.
In a direct comparison of TAMD sampling to MD sampling for these various model of protein flexibility, TAMD sampling was found to be both more efficient and more accurate. However, even using MD sampling, the percentage of cross-docking success is significantly greater with the flexible receptor models 2 and 3 than for the rigid receptor models. This comparison demonstrates that the flexible receptor models succeed in many difficult cases where the rigid receptor model fails, and that reasonable “native-like” receptor conformations are sampled regardless of the starting conformation. Using MD sampling and flexible receptor model 2, we performed receptor ensemble docking (RED) using the same 12 receptor conformations and 227 active compounds. The comparison to experimental binding data showed that the flexible receptor model showed improved agreement with binding data and an improved ability to rank compounds. In a comparison of each individual receptor, the flexible receptor model also showed improved enrichment of true positives in top ranked compounds. The flexible receptor methodology described in this paper is most applicable for proteins that show strong induced fit effects.
This work has been supported by the National Institutes of Health to CLB (GM037554) and to RSA (GM076836), in the form of an NRSA Postdoctoral Fellowship. Molecular graphics images were produced using the UCSF Chimera61 package from the Resource of Biocomputing, Visualization, and Informatics at the University of California, San Francisco (supported by NIH P41 RR-01081). Marvin was used for drawing all chemical structures, Marvin 4.0.5 2007, ChemAxon (http:www.chemaxon.com).