PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
J Chem Inf Model. Author manuscript; available in PMC 2010 August 1.
Published in final edited form as:
PMCID: PMC2891173
NIHMSID: NIHMS89148

Consistent improvement of cross docking results using binding site ensembles generated with Elastic Network Normal Modes

Abstract

The representation of protein flexibility is still a challenge for the state-of-the-art flexible ligand docking protocols. In this article we use a large and diverse benchmark to prove that is possible to improve consistently the cross docking performance against a single receptor conformation by using an equilibrium ensemble of binding site conformers. The benchmark contained 28 proteins, and the top ranked near native poses for the ligand were found 20% more efficiently than using a single receptor. The multiple conformations were derived from the collective variable space defined by all heavy atom elastic network normal modes, including backbone and side chains. We have found that the binding site displacements for best positioning of the ligand seem rather independent from the global collective motions of the protein. We also found that the number of binding site conformations needed to represent non-redundant flexibility was less than one hundred. The ensemble of receptor conformations can be generated in our web site at http://abagyan.scripps.edu/MRC.

INTRODUCTION

Molecular docking is routinely used within the first stages of Structural Based Drug Design (SBDD). Unfortunately, despite ongoing methodological advances and some success histories, state of the art algorithms predict an incorrect binding pose for about 50-70% of all ligands when a single receptor conformation is used.1

The need for including the receptor flexibility in docking comes after many experimental and theoretical evidences showing that proteins move at room (or physiological) temperature, and rearrange in response to binding.2-4 Inspection of structures from the Protein Data Bank clearly shows that different binding site conformations of the same protein exist for different ligands.5 Such compelling arguments led to the replacement of the lock and key paradigm (where the protein is assumed to be rigid) by a more realistic view where the ligands interact with an ensemble of differently populated conformational states in equilibrium.6, 7 A ligand could bind with different affinities to one (or some) of such pre-existing equilibrium conformations, inducing the protein to populate infrequent states by shifting their population distribution, namely induced fit.

Recent advances in the description of protein flexibility in the context of docking have been published elsewhere,1, 5-9 and it is beyond of the scope of this paper give an extensive revision of them. Among the proposed protocols, one of the most promising alternative is the use of ensembles of structures, namely Multiple Receptor Conformations (MRC), where the ligand is docked not to just one but rather to a set of receptor conformations.6 The conformational ensemble can be provided by experimental methods such as X-ray crystallography or NMR spectroscopy,10-15 however, this information may only be partially available or not at all.

Probably the most rigorous approach to date for the atomistic description of flexibility is molecular dynamics simulation (MD) insofar as it represents the dynamics of the molecules in physiological environments using rigorous physical potentials.16, 17 MD has been successfully employed in docking context,18-26 however, high requirements both in terms of computational resources and human expertise strongly limit its routine application in the field. Also, a certain level of concern exists regarding the selection of representative structures for docking among the vast pool of generated conformations. In this regard, it is possible to isolate functional motions from purely thermal noise from the MD trajectories by using principal component analysis (PCA) of the covariance of atomic fluctuations, usually referred as essential dynamics.27 This collective space is very stable and highly dependent on the molecule topology28, 29 so that similar results can be obtained with different approximations. This is the case of motions obtained from anharmonic MD relative to simplistic methods for exploring the equilibrium dynamics such as harmonic Normal Mode Analysis (NMA),29-37 widely used as a tool for describing flexibility and that has been already applied in docking context.38-43 The fact that binding sites in globular proteins are rigid,44-46 packed as tightly as solid,47 along with its simplicity and speed, makes NMA especially suitable for docking, representing a viable alternative to the costly MD simulations.

Here we propose a fast and general in silico method for representing the equilibrium dynamics of the receptor where a small number of MRC conformations are generated within all heavy atom (backbone and side chains) anisotropic Elastic Network Normal Modes (EN·NMA) space.28 In contrast to previous results,41, 42 the method does not require a priori knowledge of the region to sample and can be applied to any experimental structure without further refinement steps,39 significantly reducing the computational time. The performance of the method has been evaluated with ICM48, 49 in a complete cross docking benchmark containing 28 holo-conformations corresponding to 14 targets of interest in biomedical applications50 (see Table 1) and compared with docking to single experimental receptor conformations (SRC). The algorithm concentrates ranks of near native cross ligands geometries (at 2 Å RMSD distance to the crystallographic pose) as the first ranked solution 20% more efficiently than a single experimental conformation and provides stronger binding score energies, which might be interesting from a predictive point of view.51 Three of the twenty-eight conformations (1P8D, 1PMV and 2PRG) present strong memory of the interactions and placement of their cognate ligand. For these cases, the sum of harmonic potentials seems inappropriate to model the fluctuations needed for accommodating new ligands properly. Based on our results, we can speculate that for such extreme induced fit changes, neither conformations derived from EN·NMA nor state-of-the-art molecular dynamics done without the cross ligand in the pocket will provide optimum solutions in cross docking experiments. Difficult cases may still be solvable by more time consuming methods that explicitly include the exogenous ligand in the refinement of the protein structure, as for example Sherman et al. procedure,52 SCARE,50 longer MD simulations19 or enhanced sampling methods.53

Table 1
List of crystallographic structures included in the cross docking benchmark

RESULTS AND DISCUSSION

The proposed method consists of three major consecutive steps (see Figure 1): i) the generation of the multiple receptor conformations, ii) the docking of the ligand to the representative protein conformations and iii) the final ICM scoring of the poses. The protocol is built on top of a well established and optimized ICM docking and scoring algorithm to a single static receptor conformation (SRC).48, 49 The generation of the binding site conformations is described in detail below.

Figure 1
A schematic outline of the Normal Mode Analysis MRC algorithm.

1 Generation of multiple receptor conformations through normal modes ensemble

Previous studies39, 41 had already shown the capacity of coarse-grained normal mode analysis with elastic potentials for the representation of the trace flexibility of protein kinases (Cα) in docking context, especially when the region of interest is known. The use of the Cα as descriptors has the advantage of speed, because it drastically reduces the time needed for the diagonalization of the Hessian (i.e., 3N×3N matrix, where N is the number of atoms) of a standard protein (e.g. 300 residues). However, the concerted fluctuations of the atoms from backbone and side chains are not explicitly represented.

Here we address the same problem from a different perspective, assuming the initial specificity of ligand depends on local fluctuations of all the atoms defining the binding site.

1.1 Effect of the sphere radius

The first issue we explored here was the optimum radius of the sphere for the residues to be included in EN·NMA. The need for a cutoff radius has two clear implications; one practical, because the computational speed and memory requirements for the diagonalization of the Hessian increases exponentially with the number of atoms (see Supporting Figure S1), and second, that each coordinate adds one dimension to the orthonormal space, diluting the local binding site fluctuations into the global protein movement.

For that purpose we selected the heavy atoms from residues within 10, 15 and 20 Å distance from the cognate ligand, leaving a reasonable space between the residues implied in docking and the boundaries (see Figure 1a), performed EN·NMA (see computational Methods for details) and generated 100 different conformers for each distance. For each MRC, independent docking calculations were performed where the bounding box was derived from residues within 5 Å from the cognate ligand. As can be observed in Supporting Figure S2, best results are obtained globally when a radius of 10 Å is chosen, that implies using from 63 to 125 amino acids, including between 482 to 1017 heavy atoms (see Table 2 and Figure 1), depending on the location of the binding site (surface, buried) and packing of the protein.

Table 2
Structural and flexibility descriptors associated to the proteins included in the benchmark

To quantify the effect of the rest of the protein in the docking performance we repeated the experiments using i) a multi-level representation of the protein,54 where we used an all heavy atom definition for the binding site and Cα for the rest, and ii) all the heavy atoms for the complete protein. According to our results, the inclusion of more atoms in the EN·NMA did not improve the docking performance in our benchmark, resulting in Near Native solution (NNs) ranks similar to those obtained with a radius greater than 10 Å. Therefore, it seems unnecessary to include them in the EN·NMA calculation unless a structural argument exists for the coupling of global and local movements (e.g., hinge movements, domain movement or coupled motions). These results are in agreement with the ones obtained for bacteriorhodopsin, which suggested that dynamics of the active site were different from the global protein movement.55, 56

1.2 Calibration of the elastic network heavy atom potential

Once the residues are chosen, the ligand is removed and an elastic network normal mode analysis is performed on the receptor (see Methods) where the structure is chosen as a minimum and an inverse exponential function is adopted to account for the distance dependence force constants.57 This approximation shows better correlation with atomistic MD with respect to a classical pure cutoff method58 where all the atoms within a cutoff have the same elastic constant.

The relative amplitude of the displacements in an elastic network is empirical and must be defined by the value of spring constants between atoms. To generate realistic fluctuations, the springs were calibrated so that the binding pocket residues fluctuate similarly not only qualitatively but also quantitatively to the ones obtained with atomistic molecular dynamics simulation for the protein Aldose Reductase (from MoDEL database;59 PDB Code: 1AH0, See Supporting Information Figure S3a). After the minimization, the average heavy atom RMSD of the ensembles against the X-ray (see Table 2) was ~1 Å (0.8 Å for backbone) being in the same range of the RMSD between each pair of conformations.50 As seen previously,20, 60 such a slight fluctuation dramatically improves the sterical interactions between receptor and ligand. It is worth noting that other qualitative and quantitative descriptions of the topology of the protein can be easily implemented, as for example, the calibration of the elastic constants according to X-ray densities. The method will also benefit from more accurate potential energy functions containing bonded and non-bonded terms such as the used in standard Normal Mode Analysis.37

1.3 Definition of the “important subspace”

The next step is to generate Cartesian MRC structures along the collective EN·NMA space that can be used in docking simulations. The diagonalization of the Hessian provides a set of 3N-6 eigenvectors ranked according to the corresponding eigenvalues. The eigenvectors are the collective directive of motions of atoms, and the eigenvalues give the energy cost of deforming the system by one unit length along the eigenvectors. When using a 10 Å sphere radius we deal with ~ 675 heavy atoms on average, resulting on 2010 eigenvectors, or in other words, 2010 collective directions to derive movements. However, not all the modes contribute equally to the movement and, in fact, only a subspace provide significant movement (see dimensionality61 in Table 2). In an attempt to establish the size of such subspace we calculated the correlation between MD and EN·NMA spaces for the Aldose Reductase protein (see Supporting Figure S3b). The dot product of both orthonormal spaces increases up to ~100 eigenvectors, indicating that a subspace containing 5% of the vectors has a correlation of 60%, being surprisingly high (~250 standard deviations above random) and similar to the correlation between two parts of the same MD trajectory. The collectivities62 (i.e., the percentage of atoms displaced in each mode) of the first 100 modes is high (See Supporting Information Figure S3c), indicating that at this level the force opposing oscillations stems from a combined effect of enough numerous interacting pairs, still being within the theoretical limit of the elastic network model.28 According to that, and after several tests, we generalized one hundred eigenvectors as the definition of the important subspace for all the 28 proteins, enclosing the majority of the relevant movement (see dimensionality in Table 2). Other roomy definitions (e.g., vectors including 75% of variance) provide similar results for the whole dataset. However, although the excess of normal modes does not decrease the global performance, the lack of them (e.g., using only 10 to 25 modes) consistently reduces the number of correct ligand geometries scored first in 10-15% of the benchmark.

1.4 Generation of the Cartesian ensemble and choice of the number of conformations

For docking purposes, the ensemble should be not only diverse but also relevant so a balance between computational efficiency and accuracy is in order. Our goal is to find the average number of equilibrium structures that globally provides the best docking results (28 proteins). For that purpose we generated 5, 10, 25, 50, 100 and 200 independent conformations for the binding site (see Monte Carlo procedure in Methods), which were tested in independent docking runs. As shown in Figure 3, when using a single receptor conformation (SRC) we found a Near Native solution ranked in the top scoring position in 39% of the cases, in 71% of the cases within the first five scored poses and in 82% within the first ten, which indicates that ICM is performing well finding the near native geometries but the associated scoring energies are not within the top ranks.

Figure 3
Success rate for cross docking experiments using 1, 5, 10, 25, 50 and 200 protein conformations. A value of 100% indicates success in the 28 proteins. The colours show if the rank of the first near native solution (NNs) has the lowest energy score, is ...

As can be observed in Figure 3, it is possible to increase the number of correctly ligand geometries scored first only by using only 5 or 10 MRC. The results show also that the inclusion of the experimental structure among the MRC is important when the number of conformers is lower than 25. The best results are obtained when 100 MRC are used, having a good compromise between near native solutions ranked 1st, or within the first 5 or 10 lowest energy scores. Interestingly, when adding more conformations (200) the increase of false positives reduces the performance, as seen previously with experimental MRC.10

1.5 Geometrical clustering to eliminate protein structural redundancy

Although one hundred protein conformations provide the best results, some structural redundancy may exist. To eliminate it, we performed a geometrical clustering (RMSD based) on the side-chains heavy atoms (at 5 Å distance from the cognate ligand) by using thresholds in the range of 1 Å to 0.5 Å (see Figure 4). Our results suggest that a threshold of 0.6 Å captures the variability needed to obtain the same performance as the original ensemble but reducing the number of conformations by 40% on average.

Figure 4
Effect of the geometric clustering (side chains) on the global success in cross docking experiments. A value of 100% indicates success in the 28 proteins. The colours show if the rank of the first near native solution (NNs) has the lowest energy score, ...

At this point we investigated whether or not it is possible to reduce the number of structures being applied in, for example, a virtual ligand screening experiment (VLS). Cheng et al.18 reported recently good results by using only the representative structures from the most populated clusters from a MD simulation. Based on our results, we have not found any clear relation between the populations of the NMA clusters and the scoring energy of the ligand poses (See Supporting Figure S5). The data shows that in multiple cases, the most populated conformations of our ensembles are not the ones providing the best complementarities for the ligand. Thus, although the exclusion of less populated conformations could improve the speed, it could also affect the quality of the results.

After analyzing our results globally, we can conclude that with our protocol we need on average 66 ± 24 conformations to improve cross docking results in terms of geometry and scoring against a single receptor conformation. This implies using only ~2% of the computational time needed for other refinement-based protocols such as SCARE.50 The inclusion of the experimental structure of the protein enhances the global ratio of success when using less than 25 conformations, being less noticeable as the conformational space increases.

2 Benchmark results

The presented benchmark comprises proteins conformations corresponding to 14 targets of interest in biomedical applications, each of them displaying two different conformations when co-crystallized with different ligands (see Table 1).

For single receptor conformation docking (see Supporting Figure S4 and Supporting Table S1), it is possible to successfully self dock the cognate ligand in 27 of 28 cases. In 25 of 28 the correct geometry is predicted as the best scored solution, and in 27 within the first two. We found only in one of the conformations (COX-2; 1CX2 with ligand SC-588) that the geometry of the ligand had an RMSD greater than 2 Å to the experimental position, but this still had the correct orientation and was predicted as the lowest energy solution. For cross docking (see Figure 3 and Table 3), 11 of 28 have a near native solution as the lowest energy position, 20 within the first five and 23 within the first ten.

Table 3
Cross docking results obtained when using 100 (EN·NMA based) multiple receptor conformations (italics) against single receptor conformationa

When sufficient MRC are used in cross docking, there is a global increase in the number of near native solutions ranked as the lowest energy solution. Also, the associated binding scores of correctly predicted geometries are in general stronger (see Table 3, e.g., 1XKA). Interestingly, a small improvement in energies appears in some self docking solutions (see Supporting Table S1). Currently, we are analyzing if this improvement in the energies leads to better enrichment factors in Virtual Library Screening experiments.

There are still a couple of cases (1PQ6, 1YD6) where the binding energies are positive. With 1PQ6 it is still possible to capture a near native pose within the first 2 lowest energy solutions, but still some steric clashes exist, making this conformation not optimum for cross docking. 1YD6 neither provides good solutions for SRC nor MRC within the first positions because of high steric clashes with the ligand. Moreover, the increase in the number of ligand poses used in MRC is reflected in the rank of the first near native solution (MRC rank 51; SRC rank 9). However, the energy of the first near native found with MRC is still 20 Kcal/mol better than for SRC.

Of special interest are the three proteins, JNK3, LXR β LBD and PPARγ (note that the first 2 are not included in Sherman’s original dataset), having strong induced fit in one of the conformations. Thus, while 1PMN, 1PQ6 and 1FM9 self and cross dock easily, 1PMV, 1P8D and 2PRG do not cross dock either in SRC or MRC. An inspection of the structures showed that they have clear clashing residues facing the binding site. Also, the co-crystallized ligands are smaller in size than those in their partner and this is reflected in the number of heavy atoms selected for the EN·NMA (see Table 2). With the only exception of crystal structure 1P8D, the difference in size persisted after the ligand was removed and the binding pocket boundaries were predicted by the Pocketome Gaussian Convolution algorithm.63 Interestingly, in spite of the smaller number of atoms used in the EN·NMA, and less dimensionality, the problematic proteins needed more clusters than their partners (see Table 2). A deeper investigation is being carried out into the importance of flexibility descriptors in the selection of more suitable structures for VLS.

The proposed method is aiming at being a fast and simple alternative to MD simulations. Nevertheless, even MD structures from holo-form simulations may not perform satisfactorily in cross docking experiments.9 The results from the more challenging pairs are in agreement with MD simulation59 of the porcine holo-Aldose Reductase (1AH0), where conformations generated including the self ligand Sorbinil could never accommodate the ligand Tolrestat, whose native binding mode require a peculiar protein rearrangement that could not be sampled during the simulation (co-crystal 1AH3). For such a protein, neither MD conformations (coming from a multi-minima conformational energy surface) nor EN·NMA conformations (coming from a harmonic potential surface determined at a single energy minimum) were able to represent the fluctuations needed for the correct placement of the cross ligand. When the positions of the side chains strongly depend on the foreign ligand (e.g., PPARγ: 2PRG) other methodologies including refinement provide better ligand geometries, such as Sherman’s52 or SCARE,50 but with an associated important increase in computational time.

In contrast to other protocols with similar objective, our approach allows the movement of both backbone and side chains. By using all heavy atom EN·NMA we achieved a performance very close to that of the Molecular Mechanics based method very recently proposed by Koska and colleages.64 The full potential of the method is now being tested in proteins where wider backbone displacements are needed, as for example with some protein kinases or G-protein-coupled receptors. A variant of the method that could be easily implemented is the use of the collective motions from the combination of different ensembles, such as experimental conformations (X-ray or NMR) or MD. Concerted motions from such ensembles can be determined by principal component analysis method,27, 65 and the important subspace can be used for the generation of displacements.

Although we used ICM for the docking procedure, the proposed MRC methodology is aimed to be program independent. For that reason we have developed a web-server where the user can obtain an ensemble of protein conformers in various formats, including PDB, which can be used sequentially with other docking software.

MATERIALS AND METHODS

Benchmark

The proposed benchmark was already used to test the performance of the SCARE cross docking methodology.50 The set of complexes is mainly based on the dataset of highly challenging test cases compiled by Sherman and colleagues,52 which was further expanded in order to include several structures from other cross docking exercises recently reported in literature.66-70 No more than two structures of the same protein (identical sequence) were collected; if possible, holo structures were preferred over apo structures and complexes with small organic molecules over complexes with endogenous binders, peptides, or peptidomimetic drugs. Metalloproteins were excluded from the selection due to the unfeasibility of correctly assessing the receptor flexibility without manually excluding the side chains that coordinate the metal ion(s). Finally, structures where the conformational differences could be ascribed to the poor quality of the crystals rather than to a genuine induce fit effect were excluded as well.

The benchmark encompassed only structures pairs where a limited number of side chains and backbone regions displayed significant rearrangements. No structure pairs where only subtle (if any) differences where detectable or, conversely, where large scale movements or domain motions took place were included. Twenty eight structures (two conformations of 14 different proteins) were selected (see Table 1). The set of structures can be considered diverse and consistently challenging for a cross docking protocol, representative of several important targets of interest in medicinal chemistry. The atomic coordinates were retrieved from the Protein Data Bank (PDB).71

Single receptor conformation set up

The chains not defining the binding site were deleted. When multiple copies of the biological entity were included in the asymmetric unit, the most complete one was chosen. Water molecules were deliberately removed and only HET groups within the EN·NMA boundaries were kept. After the assignment of the correct atom types, disulfide bonds, hydrogens and missing heavy atoms were added to the structure. For the side chains whose occupancy was lower than one, the best energy conformation was selected. If available, histidines were assigned the tautomeric state reported; otherwise, the energetically most favorable state was assigned. Positions of polar hydrogen atoms, asparagines and glutamines adjacent to the binding site were optimized to maximize hydrogen bonding with the rest of the protein and the cognate ligand. Finally, the ligand was deleted from the holo structures and the binding pocket regions of the two conformations of each protein were superimposed72 using backbone atoms.

Ligand set up

Coordinates of the ligands were also taken from the crystallographic complexes. Bonds order, tautomeric forms, stereochemistry, hydrogen atoms and protonation state were assigned based on the primary literature description. Each ligand was assigned the MMFF73 force field atom types and charges. Ligand molecules were prepared for docking by energy minimization in the absence of the receptor, and the lowest energy conformations were used as starting points for docking simulations.

Multiple receptor conformation set up

The heavy atoms from residues within 10 Å from the cognate ligand were selected as initial coordinates for the Normal Mode Analysis.

Normal Mode Analysis

In this methodology its is assumed that the displacement of an atom from its equilibrium position is small and that the potential energy in the vicinity of the equilibrium position can be approximated as a sum of terms that are quadratic in the atomic displacements.29 In its purest form, it uses the same allatom force field from a MD simulation, implying a prior minimization in vacuo.37 Tirion74 proposed a simplified model where the interaction between two atoms was described by Hookean pairwise potential where the distances are taken to be at the energy minimum, avoiding the minimization,

E(ri,rj)=Kij(rijrij0)2,
(2)

where ri denotes the coordinates of atom i, and rij = ri - rj. The zero superscript indicates the coordinate at the original conformation. The strength of the potential Kij is assumed to be the same for all interacting pairs and the total potential is,

E=i,j,ri,jRN,NE(ri,rj),
(3)

where R is the cut off parameter, so that the interactions are limited to pairs separated by R (8-12 Å).

This simplified potential function commonly is referred as Elastic Network (EN·NMA) because the macromolecule is represented as a network, which includes all interactions within a distance. In this paper, we used a continuum function for the spring constant Kij that assumes an inverse exponential relationship between the distance and the force constant and avoids the use of the cutoff.57 The constants were quantitatively adjusted to describe fluctuations in the range of a 10 nanoseconds MD trajectory,59

Kij={350Kcalmol.A2forj=i+10.25Kcalmol.A2(dij0dij)6forji+1}
(4)

where d0 is a fitted constant, taken as the mean Cα-Cα distance between consecutive residues. The masses of the heavy atoms where set to 1. The force constant matrix of the system is described by the Hessian matrix (3N × 3N matrix, being N the number of heavy atoms), obtained as the partial second derivatives of the potential with respect to the coordinates. Complete diagonalization of the Hessian yields 3N-6 eigenvectors ranked according to their corresponding eigenvalues.

The EN·NMA space is used for the generation of Cartesian displacements. Here we have chosen a Metropolis Monte Carlo algorithm with a Hamiltonian defined in eq. 5 that reproduced the original sampling fluctuations in MD trajectories.36 In each iteration, one of the one hundred modes is displaced randomly, following a Metropolis criteria. This generates allowed displacements (at a temperature of 300 K) on each of the modes that can be projected onto the Cartesian space.

E=i=1n12kiΔDi2
(5)

where n is the number of modes (100) and ΔDi stands for a displacement along a given mode i. ki is the stiffness constant associated with harmonic deformation on mode i (i.e., eigenvalue).

It is possible to obtain bigger displacements by increasing the temperature of the Monte Carlo procedure.

The ICM receptor topology of the single receptor conformation (see above) was tethered to the coordinates of each MRC. The chemical distortions produced by displacements were corrected with 25 steps of Cartesian minimization, making unnecessary the inclusion of the hydrogen atoms in the EN·NMA calculation.

ICM docking and scoring

ICM addresses the docking issue as a global optimization problem, implementing a biased probability Monte Carlo (BPMC) global stochastic optimizer.48 Because the BPMC method was previously reported and thoroughly described, it is only briefly summarized here. During docking, one of the ligand torsional or roto-translational variables is randomly changed. A local refinement is carried out on the analytically differentiable terms by a conjugate-gradient minimization. The complete energy is calculated by adding up the contributions of the solvation energy and those of the conformational entropy. The new conformation is accepted or rejected according to the Metropolis criteria. A new random change is introduced and the whole procedure is repeated all over until a previously set number of steps are performed.

The molecular system was described using internal coordinate variables. Protein atomic types and parameters were taken from a modified version of the ECEPP/3 force field.75 The bounding box was derived from the residues at 5 Å distance from the self ligand. The binding pocket was described as a combination of five potentials that accounted for two van der Waals boundaries, electrostatics, hydrophobicity, and hydrogen bonding. Each potential was expressed on a pre-calculated grid spacing 0.5 Å. Because the standard 6-12 van der Waals potential was considered too sensitive to steric clashes for the purpose of the simulations, a truncated soft version was introduced and the other potentials were rescaled accordingly, where the van der Waals potentials were set to 1.0 Kcal/mol. The ligand was docked into the grid representation of the pocket using the BPMC method described above.

The global optimization provided a stack of geometrically diverse ligand poses, and the best ten were assigned a docking score by the standard ICM empirical scoring function.76, 77 In MRC docking, an independent run was carried out for each receptor conformation and the best five poses of each run were scored and combined in a stack consisting of 5×M (being M the number of MRC) poses. For selected cases, we checked that the different number of scored poses between SRC and MRC was not responsible of their performance (i.e., combining 5×M’ poses, being M’ the number of SRC docking experiments). In all the cases the redundancy was eliminated from the stack and the poses were sorted according to their ICM binding score energies. A Near Native solution (NNs) was considered when the ligand heavy atom RMSD against the crystallographic coordinates was ≤ 2.0 Å.

The receptors and ligands preparation, the docking simulations and the energy evaluations were all carried out by ICM 3.6 (Molsoft L.L.C., La Jolla, CA) coupled to in-house programs for the MRC generation.

Calculation Time

The MRC method, including Normal Mode Analysis, generation of Cartesian ensemble of 100 conformations, tethering, minimization and geometrical clustering took around 5 minutes per protein (see figure 1). Each single rigid receptor docking run took approximately between 30 and 90 seconds depending on the ligand size. The approximate total time per ligand is 1 hour on a single CPU. Because rigid dockings are independent, the calculation could be easily split between different CPUs.

The docking simulations ran on an Intel Intel® Core™ 2 Quad 2.4 GHz with 3 GB of RAM memory workstation. The MRC methodology is implemented in a web-server where the user can either choose a PDB code or upload a file containing a PDB structure and obtain an ensemble of conformations (http://abagyan.scripps.edu/MRC).

CONCLUSIONS

The inclusion of the flexibility of the protein is in many cases crucial to accurately predict the orientation and interactions of ligands within the binding pocket. One of the big challenges is to routinely incorporate flexibility considerations into structure based drug discovery in an affordable computing time. Here, we have shown that using less than a hundred conformations it is possible to improve the performance of top scoring cross docking against a single experimental conformation. The multiple conformation ensembles were derived from the essential motions of the residues surrounding the binding site, including heavy atoms from backbone and side chains. The needed fluctuations of pocket residues seem local and rather independent of the global collective motions of the protein. The results presented here could be useful to the docking-simulation community, and improvements are expected from the use of the collective motions extracted from mixed ensembles from MD, X-ray or NMR structures. The use of the essential motions of the binding site could enhance the conformational space needed for docking.

Figure 2
MRC methodology applied to the estrogen receptor protein, PDBid: 3ERT. (a) Example of the residues included in the elastic network normal mode analysis with a sphere radius of 10 Å around 4-Hydroxytamoxifene. (b) Cross docking of Raloxifene in ...

Supplementary Material

ACKNOWLEDGMENT

The authors thank to Alberto Perez, Carles Ferrer-Costa, Michael S. Fish and George Nicola for reviewing the manuscript. MR is supported by a Spanish MEC Postdoctoral fellowship. This work was supported by NIH/NIGMS grants 5-R01-GM071872 and 1-R01-GM074832.

Footnotes

SUPPORTING INFORMATION AVAILABLE Supporting Figures S1 to S5 and Table S1 indicated in the text. This information is available free of charge via the Internet at http://pubs.acs.org.

REFERENCES

1. Totrov M, Abagyan R. Flexible ligand docking to multiple receptor conformations: a practical alternative. Curr. Opin. Struct. Biol. 2008;18:178–184. [PMC free article] [PubMed]
2. Henzler-Wildman KA, Lei M, Thai V, Kerns SJ, Karplus M, Kern D. A hierarchy of timescales in protein dynamics is linked to enzyme catalysis. Nature. 2007;450:913–916. [PubMed]
3. Ishima R, Torchia DA. Protein dynamics from NMR. Nat. Struct. Biol. 2000;7:740–743. [PubMed]
4. Lange OF, Lakomek NA, Fares C, Schroder GF, Walter KF, Becker S, Meiler J, Grubmuller H, Griesinger C, de Groot BL. Recognition dynamics up to microseconds revealed from an RDC-derived ubiquitin ensemble in solution. Science. 2008;320:1471–1475. [PubMed]
5. Teague SJ. Implications of protein flexibility for drug discovery. Nat. Rev. Drug Dis. 2003;2:527–541. [PubMed]
6. Sousa SF, Fernandes PA, Ramos MJ. Protein-ligand docking: Current status and future challenges. Proteins-Struct. Funct. and Bioinf. 2006;65:15–26. [PubMed]
7. Carlson HA, McCammon JA. Accommodating protein flexibility in computational drug design. Mol. Pharmacol. 2000;57:213–218. [PubMed]
8. Teodoro ML, Kavraki LE. Conformational flexibility models for the receptor in structure based drug design. Curr. Pharm. Des. 2003;9:1635–1648. [PubMed]
9. Cozzini P, Kellogg GE, Spyrakis F, Abraham DJ, Costantino G, Emerson A, Fanelli F, Gohlke H, Kuhn LA, Morris GM, Orozco M, Pertinhez TA, Rizzi M, Sotriffer CA. Target flexibility: an emerging consideration in drug discovery and design. J. Med. Chem. 2008;51:6237–6255. [PMC free article] [PubMed]
10. Barril X, Morley S. Unveiling the full potential of flexible receptor docking using multiple crystallographic structures. J. Med. Chem . 2005;48:4432–4443. [PubMed]
11. Sheridan RP, McGaughey GB, Cornell WD. Multiple protein structures and multiple ligands: effects on the apparent goodness of virtual screening results. J. Comput. Aided. Mol. Des. 2008;22:257–265. [PubMed]
12. Rockey WM, Elcock AH. Structure selection for protein kinase docking and virtual screening: homology models or crystal structures? Curr. Protein. Pept. Sci. 2006;7:437–457. [PubMed]
13. Thomas MP, McInnes C, Fischer PM. Protein structures in virtual screening: a case study with CDK2. J. Med. Chem. 2006;49:92–104. [PubMed]
14. Huang SY, Zou X. Ensemble docking of multiple protein structures: considering protein structural variations in molecular docking. Proteins. 2007;66:399–421. [PubMed]
15. Damm KL, Carlson HA. Exploring experimental sources of multiple protein conformations in structure-based drug design. J. Am. Chem. Soc. 2007;129:8225–8235. [PubMed]
16. Karplus M, Kuriyan J. Molecular dynamics and protein function. Proc. Natl. Acad. Sci. U S A. 2005;102:6679–6685. [PubMed]
17. Karplus M. Molecular dynamics of biological macromolecules: A brief history and perspective. Biopolymers. 2003;68:350–358. [PubMed]
18. Cheng LS, Amaro RE, Xu D, Li WW, Arzberger PW, McCammon JA. Ensemble-based virtual screening reveals potential novel antiviral compounds for avian influenza neuraminidase. J. Med. Chem . 2008;51:3878–3894. [PMC free article] [PubMed]
19. Wang Y, Tajkhorshid E. Electrostatic funneling of substrate in mitochondrial inner membrane carriers. Proc. Natl. Acad. Sci. U S A. 2008;105:9598–9603. [PubMed]
20. Zacharias M. Rapid protein-ligand docking using soft modes from molecular dynamics simulations to account for protein deformability: binding of FK506 to FKBP. Proteins. 2004;54:759–767. [PubMed]
21. Xu Y, Colletier JP, Jiang H, Silman I, Sussman JL, Weik M. Induced-fit or preexisting equilibrium dynamics? Lessons from protein crystallography and MD simulations on acetylcholinesterase and implications for structure-based drug design. Protein Sci. 2008;17:601–605. [PubMed]
22. Marco E, Gago F. Overcoming the inadequacies or limitations of experimental structures as drug targets by using computational modeling tools and molecular dynamics simulations. ChemMedChem. 2007;2:1338–1401. [PubMed]
23. Sotriffer CA, Kramer O, Klebe G. Probing flexibility and “induced-fit” phenomena in aldose reductase by comparative crystal structure analysis and molecular dynamics simulations. Proteins. 2004;56:52–66. [PubMed]
24. Meagher KL, Carlson HA. Incorporating protein flexibility in structure-based drug discovery: Using HIV-1 protease as a test case. J. Am. Chem. Soc. 2004;126:13276–13281. [PubMed]
25. Soliva R, Gelpi JL, Almansa C, Virgili M, Orozco M. Dissection of the recognition properties of p38 MAP kinase. Determination of the binding mode of a new pyridinyl-heterocycle inhibitor family. J. Med. Chem. 2007;50:283–293. [PubMed]
26. Amaro RE, Baron R, McCammon JA. An improved relaxed complex scheme for receptor flexibility in computer-aided drug design. J. Comput. Aided. Mol. Des. 2008;22:693–705. [PMC free article] [PubMed]
27. Amadei A, Linssen AB, Berendsen HJ. Essential dynamics of proteins. Proteins. 1993;17:412–425. [PubMed]
28. Tirion MM. Large Amplitude Elastic Motions in Proteins from a Single-Parameter, Atomic Analysis. Phys. Rev. Lett. 1996;77:1905–1908. [PubMed]
29. Cui Q, Bahar I. Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems. CRC Press; 2006. p. 406.
30. Go N, Scheraga HA. On the Use of Classical Statistical Mechanics in the Treatment of Polymer Chain Conformation. Macromolecules. 1976;9:535–542.
31. Levy RM, Karplus M. Vibrational approach to the dynamics of an a-helix. Biopolymers. 1979;18:2465–2495.
32. Go N, Noguti T, Nishikawa T. Dynamics of a small globular protein in terms of low-frequency vibrational modes. Proc. Natl. Acad. Sci. U S A. 1983;80:3696–3700. [PubMed]
33. Brooks B, Karplus M. Harmonic dynamics of proteins: normal modes and fluctuations in bovine pancreatic trypsin inhibitor. Proc. Natl. Acad. Sci. U S A. 1983;80:6571–6575. [PubMed]
34. Hayward S, Kitao A, Go N. Harmonic and Anharmonic Aspects in the Dynamics of BPTI - a Normal-Mode Analysis and Principal Component Analysis. Protein Sci. 1994;3:936–943. [PubMed]
35. Hayward S, Kitao A, Go N. Harmonicity and Anharmonicity in Protein Dynamics - a Normal-Mode Analysis and Principal Component Analysis. Proteins-Struct. Funct. Genet. 1995;23:177–186. [PubMed]
36. Rueda M, Chacon P, Orozco M. Thorough Validation of Protein Normal Mode Analysis: A Comparative Study with Essential Dynamics. Structure. 2007;15:565–575. [PubMed]
37. Hayward S, de Groot BL. Normal modes and essential dynamics. Methods. Mol. Biol. 2008;443:89–106. [PubMed]
38. Zacharias M, Sklenar H. Harmonic modes as variables to approximately account for receptor flexibility in ligand-receptor docking simulations: Application to DNA minor groove ligand complex. J. Comp. Chem. 1999;20:287–300.
39. May A, Zacharias M. Protein-Ligand Docking Accounting for Receptor Side Chain and Global Flexibility in Normal Modes: Evaluation on Kinase Inhibitor Cross Docking. J. Med. Chem. 2008;51:3499–3506. [PubMed]
40. Sandera T, Liljeforsa T, Balle T. Prediction of the receptor conformation for iGluR2 agonist binding: QM/MM docking to an extensive conformational ensemble generated using normal mode analysis. J. Mol. Graph. Model. 2008;26:1259–1268. [PubMed]
41. Cavassotto C, Kovacs J, Abagyan R. Representing receptor flexibility in ligand docking through relevant normal modes. J. Am. Chem. Soc . 2005;127:9632–9640. [PubMed]
42. Floquet N, Marechal JD, Badet-Denisot MA, Robert CH, Dauchez M, Perahia D. Normal mode analysis as a prerequisite for drug design: application to matrix metalloproteinases inhibitors. FEBS Lett. 2006;580:5130–5136. [PubMed]
43. Kovacs JA, Cavasotto CN, Abagyan R. Conformational Sampling of Protein Flexibility in Generalized Coordinates: Application to Ligand Docking. J. Comp. Theor. Nanosc. 2005;2:354–361.
44. Bartlett GJ, Porter CT, Borkakoti N, Thornton JM. Analysis of catalytic residues in enzyme active sites. J. Mol. Biol. 2002;324:105–121. [PubMed]
45. Yuan Z, Zhao J, Wang ZX. Flexibility analysis of enzyme active sites by crystallographic temperature factors. Protein Eng. 2003;16:109–114. [PubMed]
46. Sacquin-Mora S, Laforet E, Lavery R. Locating the active sites of enzymes using mechanical properties. Proteins. 2007;67:350–359. [PubMed]
47. Zhou Y, Vitkup D, Karplus M. Native proteins are surface-molten solids: application of the Lindemann criterion for the solid versus liquid state. J. Mol. Biol. 1999;285:1371–1375. [PubMed]
48. Abagyan R, Totrov M. Biased probability Monte Carlo conformational searches and electrostatic calculations for peptides and proteins. J. Mol. Biol. 1994;235:983–1002. [PubMed]
49. Totrov M, Abagyan R. Detailed ab initio prediction of lysozyme-antibody complex with 1.6 A accuracy. Nat. Struct. Biol. 1994;1:259–263. [PubMed]
50. Bottegoni G, Kufareva I, Totrov M, Abagyan R. A new method for ligand docking to flexible receptors by dual alanine scanning and refinement (SCARE) J. Comput. Aided. Mol. Des. 2008;22:311–325. [PMC free article] [PubMed]
51. Brem R, Dill KA. The effect of multiple binding modes on empirical modeling of ligand docking to proteins. Protein Sci. 1999;8:1134–1143. [PubMed]
52. Sherman W, Day T, Jacobson MP, Friesner RA, Farid R. Novel procedure for modeling ligand/receptor induced fit effects. J. Med. Chem. 2006;49:534–553. [PubMed]
53. Gervasio FL, Laio A, Parrinello M. Flexible docking in solution using metadynamics. J. Am. Chem. Soc. 2005;127:2600–2607. [PubMed]
54. Kurkcuoglua O, Jernigan RL, Dorukera P. Collective Dynamics of Large Proteins from Mixed Coarse-Grained Elastic Network Model. QSAR & Combi. Sci. 2004;24:443–448.
55. Daniel RM, Dumm RV, Finney JL, Smith JC. The role of dynamics in enzyme activity. Ann. Rev. Biophys. Biomol. Struct. 2003;32:69–92. [PubMed]
56. Reat V, Patzelt H, Ferrand M, Pfister C, Oesterhelt D, Zaccai G. Dynamics of different functional parts of bacteriorhodopsin: H-2H labeling and neutron scattering. Proc. Natl. Acad. Sci. U S A. 1998;95:4970–4975. [PubMed]
57. Kovacs JA, Chacon P, Abagyan R. Predictions of protein flexibility: first-order measures. Proteins. 2004;56:661–668. [PubMed]
58. Tirion MM, ben-Avraham D. Normal mode analysis of G-actin. J. Mol. Biol. 1993;230:186–195. [PubMed]
59. Rueda M, Ferrer-Costa C, Meyer T, Perez A, Camps J, Hospital A, Gelpi JL, Orozco M. A consensus view of protein dynamics. Proc. Natl. Acad. Sci. U S A. 2007;104:796–801. [PubMed]
60. Gutteridge A, Thornton J. Conformational changes observed in enzyme crystal structures upon substrate binding. J. Mol. Biol. 2005;346:21–28. [PubMed]
61. Perez A, Blas JR, Rueda M, López-Bes JM, de La Cruz X, Luque FJ, Orozco M. Exploring the essential dynamics of B.DNA. J. Chem. Theo. Comput. 2005;1:790–800. [PubMed]
62. Brüschweiler R. Collective protein dynamics and nuclear spin relaxation. J. Chem. Phys. 1995;102:3396–3403.
63. An J, Totrov M, Abagyan R. Pocketome via comprehensive identification and classification of ligand binding envelopes. Mol. Cell Prot . 2005;4:752–761. [PubMed]
64. Koska J, Spassov VZ, Maynard AJ, Yan L, Austin N, Flook PK, Venkatachalam CM. Fully Automated Molecular Mechanics Based Induced Fit Protein-Ligand Docking Method. J. Chem. Inf. Model. 2008;48:1965–1973. [PubMed]
65. Mustard D, Ritchie DW. Docking essential dynamics eigenstructures. Proteins. 2005;60:269–274. [PubMed]
66. Cavasotto CN, Abagyan RA. Protein flexibility in ligand docking and virtual screening to protein kinases. J. Mol. Biol. 2004;337:209–225. [PubMed]
67. Meiler J, Baker D. ROSETTALIGAND: protein-small molecule docking with full side-chain flexibility. Proteins. 2006;65:538–548. [PubMed]
68. Polgar T, Baki A, Szendrei GI, Keseru GM. Comparative virtual and experimental high-throughput screening for glycogen synthase kinase-3 beta inhibitors. J. Med.Chem. 2005;48:7946–7959. [PubMed]
69. Polgar T, Keseru GM. Ensemble docking into flexible active sites. Critical evaluation of FlexE against JNK-3 and beta-secretase. J. Chem. Inf. Model. 2006;46:1795–1805. [PubMed]
70. Sheng-You Huang XZ. Ensemble docking of multiple protein structures: Considering protein structural variations in molecular docking. Proteins. 2007;66:399–421. [PubMed]
71. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nuc. Ac. Res. 2000;28:235–242. [PMC free article] [PubMed]
72. Damm KL, Carlson HA. Gaussian-weighted RMSD superposition of proteins: a structural comparison for flexible proteins and predicted protein structures. Biophys J. 2006;90:4558–4573. [PubMed]
73. Halgren TA. Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J. Comp. Chem. 1996;17:490–519.
74. Tirion MM, ben-Avraham D, Lorenz M, Holmes KC. Normal modes as refinement parameters for the F-actin model. Biophys. J. 1995;68:5–12. [PubMed]
75. Nemethy G, Gibson KD, Palmer KA, Yoon CN, Paterlini G, Zagari A, Rumsey S, Scheraga HA. Energy parameters in polypeptides. 10. Improved geometrical parameters and nonbonded interactions for use in the ECEPP(SLASH)3 algorithm, with application to proline-containing peptides. J. Chem. Phys. 1992;96:6472.
76. Totrov M, Abagyan R. In: Istrail S, Pevzner P, Waterman M, editors. Derivation of sensitive discrimination potential for virtual ligand screening; RECOMB’99: Proceedings of the third annual international conference on computational molecular biology, France, 1999; Association for Computer Machinery: France. 1999.
77. Totrov M, Abagyan R. Protein-ligand docking as an energy optimization problem. In: Sons JW, Raffa R, editors. Drug-receptor Thermodynamics: Introduction and experimental applications. New York: 2001.