|Home | About | Journals | Submit | Contact Us | Français|
The representation of protein flexibility is still a challenge for the state-of-the-art flexible ligand docking protocols. In this article we use a large and diverse benchmark to prove that is possible to improve consistently the cross docking performance against a single receptor conformation by using an equilibrium ensemble of binding site conformers. The benchmark contained 28 proteins, and the top ranked near native poses for the ligand were found 20% more efficiently than using a single receptor. The multiple conformations were derived from the collective variable space defined by all heavy atom elastic network normal modes, including backbone and side chains. We have found that the binding site displacements for best positioning of the ligand seem rather independent from the global collective motions of the protein. We also found that the number of binding site conformations needed to represent non-redundant flexibility was less than one hundred. The ensemble of receptor conformations can be generated in our web site at http://abagyan.scripps.edu/MRC.
Molecular docking is routinely used within the first stages of Structural Based Drug Design (SBDD). Unfortunately, despite ongoing methodological advances and some success histories, state of the art algorithms predict an incorrect binding pose for about 50-70% of all ligands when a single receptor conformation is used.1
The need for including the receptor flexibility in docking comes after many experimental and theoretical evidences showing that proteins move at room (or physiological) temperature, and rearrange in response to binding.2-4 Inspection of structures from the Protein Data Bank clearly shows that different binding site conformations of the same protein exist for different ligands.5 Such compelling arguments led to the replacement of the lock and key paradigm (where the protein is assumed to be rigid) by a more realistic view where the ligands interact with an ensemble of differently populated conformational states in equilibrium.6, 7 A ligand could bind with different affinities to one (or some) of such pre-existing equilibrium conformations, inducing the protein to populate infrequent states by shifting their population distribution, namely induced fit.
Recent advances in the description of protein flexibility in the context of docking have been published elsewhere,1, 5-9 and it is beyond of the scope of this paper give an extensive revision of them. Among the proposed protocols, one of the most promising alternative is the use of ensembles of structures, namely Multiple Receptor Conformations (MRC), where the ligand is docked not to just one but rather to a set of receptor conformations.6 The conformational ensemble can be provided by experimental methods such as X-ray crystallography or NMR spectroscopy,10-15 however, this information may only be partially available or not at all.
Probably the most rigorous approach to date for the atomistic description of flexibility is molecular dynamics simulation (MD) insofar as it represents the dynamics of the molecules in physiological environments using rigorous physical potentials.16, 17 MD has been successfully employed in docking context,18-26 however, high requirements both in terms of computational resources and human expertise strongly limit its routine application in the field. Also, a certain level of concern exists regarding the selection of representative structures for docking among the vast pool of generated conformations. In this regard, it is possible to isolate functional motions from purely thermal noise from the MD trajectories by using principal component analysis (PCA) of the covariance of atomic fluctuations, usually referred as essential dynamics.27 This collective space is very stable and highly dependent on the molecule topology28, 29 so that similar results can be obtained with different approximations. This is the case of motions obtained from anharmonic MD relative to simplistic methods for exploring the equilibrium dynamics such as harmonic Normal Mode Analysis (NMA),29-37 widely used as a tool for describing flexibility and that has been already applied in docking context.38-43 The fact that binding sites in globular proteins are rigid,44-46 packed as tightly as solid,47 along with its simplicity and speed, makes NMA especially suitable for docking, representing a viable alternative to the costly MD simulations.
Here we propose a fast and general in silico method for representing the equilibrium dynamics of the receptor where a small number of MRC conformations are generated within all heavy atom (backbone and side chains) anisotropic Elastic Network Normal Modes (EN·NMA) space.28 In contrast to previous results,41, 42 the method does not require a priori knowledge of the region to sample and can be applied to any experimental structure without further refinement steps,39 significantly reducing the computational time. The performance of the method has been evaluated with ICM48, 49 in a complete cross docking benchmark containing 28 holo-conformations corresponding to 14 targets of interest in biomedical applications50 (see Table 1) and compared with docking to single experimental receptor conformations (SRC). The algorithm concentrates ranks of near native cross ligands geometries (at 2 Å RMSD distance to the crystallographic pose) as the first ranked solution 20% more efficiently than a single experimental conformation and provides stronger binding score energies, which might be interesting from a predictive point of view.51 Three of the twenty-eight conformations (1P8D, 1PMV and 2PRG) present strong memory of the interactions and placement of their cognate ligand. For these cases, the sum of harmonic potentials seems inappropriate to model the fluctuations needed for accommodating new ligands properly. Based on our results, we can speculate that for such extreme induced fit changes, neither conformations derived from EN·NMA nor state-of-the-art molecular dynamics done without the cross ligand in the pocket will provide optimum solutions in cross docking experiments. Difficult cases may still be solvable by more time consuming methods that explicitly include the exogenous ligand in the refinement of the protein structure, as for example Sherman et al. procedure,52 SCARE,50 longer MD simulations19 or enhanced sampling methods.53
The proposed method consists of three major consecutive steps (see Figure 1): i) the generation of the multiple receptor conformations, ii) the docking of the ligand to the representative protein conformations and iii) the final ICM scoring of the poses. The protocol is built on top of a well established and optimized ICM docking and scoring algorithm to a single static receptor conformation (SRC).48, 49 The generation of the binding site conformations is described in detail below.
Previous studies39, 41 had already shown the capacity of coarse-grained normal mode analysis with elastic potentials for the representation of the trace flexibility of protein kinases (Cα) in docking context, especially when the region of interest is known. The use of the Cα as descriptors has the advantage of speed, because it drastically reduces the time needed for the diagonalization of the Hessian (i.e., 3N×3N matrix, where N is the number of atoms) of a standard protein (e.g. 300 residues). However, the concerted fluctuations of the atoms from backbone and side chains are not explicitly represented.
Here we address the same problem from a different perspective, assuming the initial specificity of ligand depends on local fluctuations of all the atoms defining the binding site.
The first issue we explored here was the optimum radius of the sphere for the residues to be included in EN·NMA. The need for a cutoff radius has two clear implications; one practical, because the computational speed and memory requirements for the diagonalization of the Hessian increases exponentially with the number of atoms (see Supporting Figure S1), and second, that each coordinate adds one dimension to the orthonormal space, diluting the local binding site fluctuations into the global protein movement.
For that purpose we selected the heavy atoms from residues within 10, 15 and 20 Å distance from the cognate ligand, leaving a reasonable space between the residues implied in docking and the boundaries (see Figure 1a), performed EN·NMA (see computational Methods for details) and generated 100 different conformers for each distance. For each MRC, independent docking calculations were performed where the bounding box was derived from residues within 5 Å from the cognate ligand. As can be observed in Supporting Figure S2, best results are obtained globally when a radius of 10 Å is chosen, that implies using from 63 to 125 amino acids, including between 482 to 1017 heavy atoms (see Table 2 and Figure 1), depending on the location of the binding site (surface, buried) and packing of the protein.
To quantify the effect of the rest of the protein in the docking performance we repeated the experiments using i) a multi-level representation of the protein,54 where we used an all heavy atom definition for the binding site and Cα for the rest, and ii) all the heavy atoms for the complete protein. According to our results, the inclusion of more atoms in the EN·NMA did not improve the docking performance in our benchmark, resulting in Near Native solution (NNs) ranks similar to those obtained with a radius greater than 10 Å. Therefore, it seems unnecessary to include them in the EN·NMA calculation unless a structural argument exists for the coupling of global and local movements (e.g., hinge movements, domain movement or coupled motions). These results are in agreement with the ones obtained for bacteriorhodopsin, which suggested that dynamics of the active site were different from the global protein movement.55, 56
Once the residues are chosen, the ligand is removed and an elastic network normal mode analysis is performed on the receptor (see Methods) where the structure is chosen as a minimum and an inverse exponential function is adopted to account for the distance dependence force constants.57 This approximation shows better correlation with atomistic MD with respect to a classical pure cutoff method58 where all the atoms within a cutoff have the same elastic constant.
The relative amplitude of the displacements in an elastic network is empirical and must be defined by the value of spring constants between atoms. To generate realistic fluctuations, the springs were calibrated so that the binding pocket residues fluctuate similarly not only qualitatively but also quantitatively to the ones obtained with atomistic molecular dynamics simulation for the protein Aldose Reductase (from MoDEL database;59 PDB Code: 1AH0, See Supporting Information Figure S3a). After the minimization, the average heavy atom RMSD of the ensembles against the X-ray (see Table 2) was ~1 Å (0.8 Å for backbone) being in the same range of the RMSD between each pair of conformations.50 As seen previously,20, 60 such a slight fluctuation dramatically improves the sterical interactions between receptor and ligand. It is worth noting that other qualitative and quantitative descriptions of the topology of the protein can be easily implemented, as for example, the calibration of the elastic constants according to X-ray densities. The method will also benefit from more accurate potential energy functions containing bonded and non-bonded terms such as the used in standard Normal Mode Analysis.37
The next step is to generate Cartesian MRC structures along the collective EN·NMA space that can be used in docking simulations. The diagonalization of the Hessian provides a set of 3N-6 eigenvectors ranked according to the corresponding eigenvalues. The eigenvectors are the collective directive of motions of atoms, and the eigenvalues give the energy cost of deforming the system by one unit length along the eigenvectors. When using a 10 Å sphere radius we deal with ~ 675 heavy atoms on average, resulting on 2010 eigenvectors, or in other words, 2010 collective directions to derive movements. However, not all the modes contribute equally to the movement and, in fact, only a subspace provide significant movement (see dimensionality61 in Table 2). In an attempt to establish the size of such subspace we calculated the correlation between MD and EN·NMA spaces for the Aldose Reductase protein (see Supporting Figure S3b). The dot product of both orthonormal spaces increases up to ~100 eigenvectors, indicating that a subspace containing 5% of the vectors has a correlation of 60%, being surprisingly high (~250 standard deviations above random) and similar to the correlation between two parts of the same MD trajectory. The collectivities62 (i.e., the percentage of atoms displaced in each mode) of the first 100 modes is high (See Supporting Information Figure S3c), indicating that at this level the force opposing oscillations stems from a combined effect of enough numerous interacting pairs, still being within the theoretical limit of the elastic network model.28 According to that, and after several tests, we generalized one hundred eigenvectors as the definition of the important subspace for all the 28 proteins, enclosing the majority of the relevant movement (see dimensionality in Table 2). Other roomy definitions (e.g., vectors including 75% of variance) provide similar results for the whole dataset. However, although the excess of normal modes does not decrease the global performance, the lack of them (e.g., using only 10 to 25 modes) consistently reduces the number of correct ligand geometries scored first in 10-15% of the benchmark.
For docking purposes, the ensemble should be not only diverse but also relevant so a balance between computational efficiency and accuracy is in order. Our goal is to find the average number of equilibrium structures that globally provides the best docking results (28 proteins). For that purpose we generated 5, 10, 25, 50, 100 and 200 independent conformations for the binding site (see Monte Carlo procedure in Methods), which were tested in independent docking runs. As shown in Figure 3, when using a single receptor conformation (SRC) we found a Near Native solution ranked in the top scoring position in 39% of the cases, in 71% of the cases within the first five scored poses and in 82% within the first ten, which indicates that ICM is performing well finding the near native geometries but the associated scoring energies are not within the top ranks.
As can be observed in Figure 3, it is possible to increase the number of correctly ligand geometries scored first only by using only 5 or 10 MRC. The results show also that the inclusion of the experimental structure among the MRC is important when the number of conformers is lower than 25. The best results are obtained when 100 MRC are used, having a good compromise between near native solutions ranked 1st, or within the first 5 or 10 lowest energy scores. Interestingly, when adding more conformations (200) the increase of false positives reduces the performance, as seen previously with experimental MRC.10
Although one hundred protein conformations provide the best results, some structural redundancy may exist. To eliminate it, we performed a geometrical clustering (RMSD based) on the side-chains heavy atoms (at 5 Å distance from the cognate ligand) by using thresholds in the range of 1 Å to 0.5 Å (see Figure 4). Our results suggest that a threshold of 0.6 Å captures the variability needed to obtain the same performance as the original ensemble but reducing the number of conformations by 40% on average.
At this point we investigated whether or not it is possible to reduce the number of structures being applied in, for example, a virtual ligand screening experiment (VLS). Cheng et al.18 reported recently good results by using only the representative structures from the most populated clusters from a MD simulation. Based on our results, we have not found any clear relation between the populations of the NMA clusters and the scoring energy of the ligand poses (See Supporting Figure S5). The data shows that in multiple cases, the most populated conformations of our ensembles are not the ones providing the best complementarities for the ligand. Thus, although the exclusion of less populated conformations could improve the speed, it could also affect the quality of the results.
After analyzing our results globally, we can conclude that with our protocol we need on average 66 ± 24 conformations to improve cross docking results in terms of geometry and scoring against a single receptor conformation. This implies using only ~2% of the computational time needed for other refinement-based protocols such as SCARE.50 The inclusion of the experimental structure of the protein enhances the global ratio of success when using less than 25 conformations, being less noticeable as the conformational space increases.
The presented benchmark comprises proteins conformations corresponding to 14 targets of interest in biomedical applications, each of them displaying two different conformations when co-crystallized with different ligands (see Table 1).
For single receptor conformation docking (see Supporting Figure S4 and Supporting Table S1), it is possible to successfully self dock the cognate ligand in 27 of 28 cases. In 25 of 28 the correct geometry is predicted as the best scored solution, and in 27 within the first two. We found only in one of the conformations (COX-2; 1CX2 with ligand SC-588) that the geometry of the ligand had an RMSD greater than 2 Å to the experimental position, but this still had the correct orientation and was predicted as the lowest energy solution. For cross docking (see Figure 3 and Table 3), 11 of 28 have a near native solution as the lowest energy position, 20 within the first five and 23 within the first ten.
When sufficient MRC are used in cross docking, there is a global increase in the number of near native solutions ranked as the lowest energy solution. Also, the associated binding scores of correctly predicted geometries are in general stronger (see Table 3, e.g., 1XKA). Interestingly, a small improvement in energies appears in some self docking solutions (see Supporting Table S1). Currently, we are analyzing if this improvement in the energies leads to better enrichment factors in Virtual Library Screening experiments.
There are still a couple of cases (1PQ6, 1YD6) where the binding energies are positive. With 1PQ6 it is still possible to capture a near native pose within the first 2 lowest energy solutions, but still some steric clashes exist, making this conformation not optimum for cross docking. 1YD6 neither provides good solutions for SRC nor MRC within the first positions because of high steric clashes with the ligand. Moreover, the increase in the number of ligand poses used in MRC is reflected in the rank of the first near native solution (MRC rank 51; SRC rank 9). However, the energy of the first near native found with MRC is still 20 Kcal/mol better than for SRC.
Of special interest are the three proteins, JNK3, LXR β LBD and PPARγ (note that the first 2 are not included in Sherman’s original dataset), having strong induced fit in one of the conformations. Thus, while 1PMN, 1PQ6 and 1FM9 self and cross dock easily, 1PMV, 1P8D and 2PRG do not cross dock either in SRC or MRC. An inspection of the structures showed that they have clear clashing residues facing the binding site. Also, the co-crystallized ligands are smaller in size than those in their partner and this is reflected in the number of heavy atoms selected for the EN·NMA (see Table 2). With the only exception of crystal structure 1P8D, the difference in size persisted after the ligand was removed and the binding pocket boundaries were predicted by the Pocketome Gaussian Convolution algorithm.63 Interestingly, in spite of the smaller number of atoms used in the EN·NMA, and less dimensionality, the problematic proteins needed more clusters than their partners (see Table 2). A deeper investigation is being carried out into the importance of flexibility descriptors in the selection of more suitable structures for VLS.
The proposed method is aiming at being a fast and simple alternative to MD simulations. Nevertheless, even MD structures from holo-form simulations may not perform satisfactorily in cross docking experiments.9 The results from the more challenging pairs are in agreement with MD simulation59 of the porcine holo-Aldose Reductase (1AH0), where conformations generated including the self ligand Sorbinil could never accommodate the ligand Tolrestat, whose native binding mode require a peculiar protein rearrangement that could not be sampled during the simulation (co-crystal 1AH3). For such a protein, neither MD conformations (coming from a multi-minima conformational energy surface) nor EN·NMA conformations (coming from a harmonic potential surface determined at a single energy minimum) were able to represent the fluctuations needed for the correct placement of the cross ligand. When the positions of the side chains strongly depend on the foreign ligand (e.g., PPARγ: 2PRG) other methodologies including refinement provide better ligand geometries, such as Sherman’s52 or SCARE,50 but with an associated important increase in computational time.
In contrast to other protocols with similar objective, our approach allows the movement of both backbone and side chains. By using all heavy atom EN·NMA we achieved a performance very close to that of the Molecular Mechanics based method very recently proposed by Koska and colleages.64 The full potential of the method is now being tested in proteins where wider backbone displacements are needed, as for example with some protein kinases or G-protein-coupled receptors. A variant of the method that could be easily implemented is the use of the collective motions from the combination of different ensembles, such as experimental conformations (X-ray or NMR) or MD. Concerted motions from such ensembles can be determined by principal component analysis method,27, 65 and the important subspace can be used for the generation of displacements.
Although we used ICM for the docking procedure, the proposed MRC methodology is aimed to be program independent. For that reason we have developed a web-server where the user can obtain an ensemble of protein conformers in various formats, including PDB, which can be used sequentially with other docking software.
The proposed benchmark was already used to test the performance of the SCARE cross docking methodology.50 The set of complexes is mainly based on the dataset of highly challenging test cases compiled by Sherman and colleagues,52 which was further expanded in order to include several structures from other cross docking exercises recently reported in literature.66-70 No more than two structures of the same protein (identical sequence) were collected; if possible, holo structures were preferred over apo structures and complexes with small organic molecules over complexes with endogenous binders, peptides, or peptidomimetic drugs. Metalloproteins were excluded from the selection due to the unfeasibility of correctly assessing the receptor flexibility without manually excluding the side chains that coordinate the metal ion(s). Finally, structures where the conformational differences could be ascribed to the poor quality of the crystals rather than to a genuine induce fit effect were excluded as well.
The benchmark encompassed only structures pairs where a limited number of side chains and backbone regions displayed significant rearrangements. No structure pairs where only subtle (if any) differences where detectable or, conversely, where large scale movements or domain motions took place were included. Twenty eight structures (two conformations of 14 different proteins) were selected (see Table 1). The set of structures can be considered diverse and consistently challenging for a cross docking protocol, representative of several important targets of interest in medicinal chemistry. The atomic coordinates were retrieved from the Protein Data Bank (PDB).71
The chains not defining the binding site were deleted. When multiple copies of the biological entity were included in the asymmetric unit, the most complete one was chosen. Water molecules were deliberately removed and only HET groups within the EN·NMA boundaries were kept. After the assignment of the correct atom types, disulfide bonds, hydrogens and missing heavy atoms were added to the structure. For the side chains whose occupancy was lower than one, the best energy conformation was selected. If available, histidines were assigned the tautomeric state reported; otherwise, the energetically most favorable state was assigned. Positions of polar hydrogen atoms, asparagines and glutamines adjacent to the binding site were optimized to maximize hydrogen bonding with the rest of the protein and the cognate ligand. Finally, the ligand was deleted from the holo structures and the binding pocket regions of the two conformations of each protein were superimposed72 using backbone atoms.
Coordinates of the ligands were also taken from the crystallographic complexes. Bonds order, tautomeric forms, stereochemistry, hydrogen atoms and protonation state were assigned based on the primary literature description. Each ligand was assigned the MMFF73 force field atom types and charges. Ligand molecules were prepared for docking by energy minimization in the absence of the receptor, and the lowest energy conformations were used as starting points for docking simulations.
The heavy atoms from residues within 10 Å from the cognate ligand were selected as initial coordinates for the Normal Mode Analysis.
In this methodology its is assumed that the displacement of an atom from its equilibrium position is small and that the potential energy in the vicinity of the equilibrium position can be approximated as a sum of terms that are quadratic in the atomic displacements.29 In its purest form, it uses the same allatom force field from a MD simulation, implying a prior minimization in vacuo.37 Tirion74 proposed a simplified model where the interaction between two atoms was described by Hookean pairwise potential where the distances are taken to be at the energy minimum, avoiding the minimization,
where ri denotes the coordinates of atom i, and rij = ri - rj. The zero superscript indicates the coordinate at the original conformation. The strength of the potential Kij is assumed to be the same for all interacting pairs and the total potential is,
where R is the cut off parameter, so that the interactions are limited to pairs separated by R (8-12 Å).
This simplified potential function commonly is referred as Elastic Network (EN·NMA) because the macromolecule is represented as a network, which includes all interactions within a distance. In this paper, we used a continuum function for the spring constant Kij that assumes an inverse exponential relationship between the distance and the force constant and avoids the use of the cutoff.57 The constants were quantitatively adjusted to describe fluctuations in the range of a 10 nanoseconds MD trajectory,59
where d0 is a fitted constant, taken as the mean Cα-Cα distance between consecutive residues. The masses of the heavy atoms where set to 1. The force constant matrix of the system is described by the Hessian matrix (3N × 3N matrix, being N the number of heavy atoms), obtained as the partial second derivatives of the potential with respect to the coordinates. Complete diagonalization of the Hessian yields 3N-6 eigenvectors ranked according to their corresponding eigenvalues.
The EN·NMA space is used for the generation of Cartesian displacements. Here we have chosen a Metropolis Monte Carlo algorithm with a Hamiltonian defined in eq. 5 that reproduced the original sampling fluctuations in MD trajectories.36 In each iteration, one of the one hundred modes is displaced randomly, following a Metropolis criteria. This generates allowed displacements (at a temperature of 300 K) on each of the modes that can be projected onto the Cartesian space.
where n is the number of modes (100) and ΔDi stands for a displacement along a given mode i. ki is the stiffness constant associated with harmonic deformation on mode i (i.e., eigenvalue).
It is possible to obtain bigger displacements by increasing the temperature of the Monte Carlo procedure.
The ICM receptor topology of the single receptor conformation (see above) was tethered to the coordinates of each MRC. The chemical distortions produced by displacements were corrected with 25 steps of Cartesian minimization, making unnecessary the inclusion of the hydrogen atoms in the EN·NMA calculation.
ICM addresses the docking issue as a global optimization problem, implementing a biased probability Monte Carlo (BPMC) global stochastic optimizer.48 Because the BPMC method was previously reported and thoroughly described, it is only briefly summarized here. During docking, one of the ligand torsional or roto-translational variables is randomly changed. A local refinement is carried out on the analytically differentiable terms by a conjugate-gradient minimization. The complete energy is calculated by adding up the contributions of the solvation energy and those of the conformational entropy. The new conformation is accepted or rejected according to the Metropolis criteria. A new random change is introduced and the whole procedure is repeated all over until a previously set number of steps are performed.
The molecular system was described using internal coordinate variables. Protein atomic types and parameters were taken from a modified version of the ECEPP/3 force field.75 The bounding box was derived from the residues at 5 Å distance from the self ligand. The binding pocket was described as a combination of five potentials that accounted for two van der Waals boundaries, electrostatics, hydrophobicity, and hydrogen bonding. Each potential was expressed on a pre-calculated grid spacing 0.5 Å. Because the standard 6-12 van der Waals potential was considered too sensitive to steric clashes for the purpose of the simulations, a truncated soft version was introduced and the other potentials were rescaled accordingly, where the van der Waals potentials were set to 1.0 Kcal/mol. The ligand was docked into the grid representation of the pocket using the BPMC method described above.
The global optimization provided a stack of geometrically diverse ligand poses, and the best ten were assigned a docking score by the standard ICM empirical scoring function.76, 77 In MRC docking, an independent run was carried out for each receptor conformation and the best five poses of each run were scored and combined in a stack consisting of 5×M (being M the number of MRC) poses. For selected cases, we checked that the different number of scored poses between SRC and MRC was not responsible of their performance (i.e., combining 5×M’ poses, being M’ the number of SRC docking experiments). In all the cases the redundancy was eliminated from the stack and the poses were sorted according to their ICM binding score energies. A Near Native solution (NNs) was considered when the ligand heavy atom RMSD against the crystallographic coordinates was ≤ 2.0 Å.
The receptors and ligands preparation, the docking simulations and the energy evaluations were all carried out by ICM 3.6 (Molsoft L.L.C., La Jolla, CA) coupled to in-house programs for the MRC generation.
The MRC method, including Normal Mode Analysis, generation of Cartesian ensemble of 100 conformations, tethering, minimization and geometrical clustering took around 5 minutes per protein (see figure 1). Each single rigid receptor docking run took approximately between 30 and 90 seconds depending on the ligand size. The approximate total time per ligand is 1 hour on a single CPU. Because rigid dockings are independent, the calculation could be easily split between different CPUs.
The docking simulations ran on an Intel Intel® Core™ 2 Quad 2.4 GHz with 3 GB of RAM memory workstation. The MRC methodology is implemented in a web-server where the user can either choose a PDB code or upload a file containing a PDB structure and obtain an ensemble of conformations (http://abagyan.scripps.edu/MRC).
The inclusion of the flexibility of the protein is in many cases crucial to accurately predict the orientation and interactions of ligands within the binding pocket. One of the big challenges is to routinely incorporate flexibility considerations into structure based drug discovery in an affordable computing time. Here, we have shown that using less than a hundred conformations it is possible to improve the performance of top scoring cross docking against a single experimental conformation. The multiple conformation ensembles were derived from the essential motions of the residues surrounding the binding site, including heavy atoms from backbone and side chains. The needed fluctuations of pocket residues seem local and rather independent of the global collective motions of the protein. The results presented here could be useful to the docking-simulation community, and improvements are expected from the use of the collective motions extracted from mixed ensembles from MD, X-ray or NMR structures. The use of the essential motions of the binding site could enhance the conformational space needed for docking.
The authors thank to Alberto Perez, Carles Ferrer-Costa, Michael S. Fish and George Nicola for reviewing the manuscript. MR is supported by a Spanish MEC Postdoctoral fellowship. This work was supported by NIH/NIGMS grants 5-R01-GM071872 and 1-R01-GM074832.