State-of-the-art docking algorithms predict an incorrect binding pose for about 50 to 70% of all ligands when only a single fixed receptor conformation is considered. In many more cases, lack of receptor flexibility results in meaningless ligand binding scores, even when the correct pose is obtained. Incorporating conformational rearrangements of the receptor binding pocket into predictions of both ligand binding pose and binding score is critical for improving structure-based drug design and virtual ligand screening methodologies. However, direct modeling of protein binding site flexibility remains challenging because of the large conformational space that must be sampled and the difficulty of constructing a suitably accurate energy function. Here we show that using multiple fixed receptor conformations, either determined experimentally by crystallography or NMR or generated computationally, is a practical shortcut that may improve docking calculations. In several cases, this approach has led to experimentally validated predictions.
Molecular docking is playing an increasingly important role in lead discovery and design. Nevertheless, the docking field is still far from the goal of accurately and reliably predicting complex structures for arbitrary ligand-receptor pairs. It has long been recognized that a simplistic rigid ‘lock-and-key’ model of ligand-receptor interaction is inadequate, and that accurate docking requires incorporating both ligand and receptor flexibility. While ligand flexibility has been addressed by a variety of algorithms, receptor flexibility remains a formidable challenge.
Several approaches to incorporating receptor flexibility into ligand docking were previously reviewed by Teodoro and Kavraki. They propose a classification that spans five categories: a) ‘soft’ receptors that limit penalties due to steric clashes, b) selection of a few critical degrees of freedom in the binding site, c) use of multiple receptor structures, d) use of modified molecular simulation methods, and e) use of collective degrees of freedom as a new basis for representing protein flexibility. In this review, we focus on using multiple static receptor conformations, either experimental or computationally generated.
Direct modeling of the protein movements associated with binding site flexibility is a significant problem because of the dual challenge of the high dimensionality of the conformational space and the complexity of the energy function. A typical binding site for a drug-like molecule consists of ten to twenty amino acid side chains, which may contribute dozens of potentially rotatable torsions. This number can easily be several times larger than the number of degrees of freedom of the ligand (typically 6 to 12). Considering backbone movements may dramatically worsen the situation since, in contrast to the relatively independent side chains, each backbone movement affects multiple side chains. Thus, a fully flexible receptor/ligand docking simulation may involve sampling an order of magnitude more degrees of freedom than the rigid receptor/flexible ligand simulations routinely used in current structure-based virtual screening and ligand design projects. Apart from being computationally demanding, this expansion of sampling places heavy demands on the energy function, which must discriminate the small number of low-energy structures actually realized in nature from the vast number of hypothetical conformations generated by the sampling procedure. Taken to the limit, ligand docking into a flexible receptor essentially becomes the protein folding problem in the presence of a ligand. Therefore, practical approaches to incorporating receptor flexibility into docking simulations must radically restrict the subspace of the full protein conformational space that they actually search.
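A crude back-of-the-envelope count makes the dimensionality argument concrete. The sketch below assumes, purely for illustration, that every degree of freedom is discretized into 3 states, that the ligand has 8 degrees of freedom, and that the pocket contributes about 15 side chains with about 2 chi angles each; none of these numbers come from a specific system.

```python
def discrete_states(n_dof: int, states_per_dof: int = 3) -> int:
    """Grid points when each degree of freedom takes one of a few discrete values."""
    return states_per_dof ** n_dof

# Hypothetical counts, chosen within the ranges quoted in the text.
LIGAND_DOF = 8            # text: typically 6 to 12
POCKET_TORSIONS = 15 * 2  # assumption: ~15 side chains with ~2 chi angles each

rigid_receptor = discrete_states(LIGAND_DOF)
flexible_pocket = discrete_states(LIGAND_DOF + POCKET_TORSIONS)
growth = flexible_pocket // rigid_receptor  # extra factor contributed by the pocket

print(f"rigid receptor:  {rigid_receptor:.2e} grid points")
print(f"flexible pocket: {flexible_pocket:.2e} grid points")
print(f"growth factor:   {growth:.2e}")
```

Under these assumptions the pocket multiplies the search space by 3**30, roughly 2e14, before any backbone motion is considered, which is why practical methods restrict the searched subspace so aggressively.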
Limiting flexibility to side chains makes the problem much more tractable, and an exhaustive search of binding site side-chain conformations, similar to what has been done in protein docking, is possible for smaller ligands [3,4]. As an alternative to exhaustive search, a ‘minimal rotation hypothesis’ was proposed by Zavodszky and Kuhn. Their docking algorithm, SLIDE, attempts to resolve ligand-receptor steric clashes with a minimal number of side-chain rotations, with the cost of each side-chain movement evaluated as the product of the rotation angle and the number of atoms moved.
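The cost rule described above (rotation angle times number of atoms moved) can be sketched as follows. This is a simplified illustration, not SLIDE's implementation: the `Rotation` class, the candidate rotations and the residue labels are hypothetical, and clashes at different residues are assumed to be independent so that each residue can be optimized separately.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Rotation:
    residue: str
    angle_deg: float   # magnitude of the side-chain dihedral change
    atoms_moved: int   # atoms displaced by the rotation

    @property
    def cost(self) -> float:
        # Cost as described for SLIDE: rotation angle times number of atoms moved.
        return abs(self.angle_deg) * self.atoms_moved

def cheapest_resolution(candidates: Dict[str, List[Rotation]]) -> List[Rotation]:
    """Pick the lowest-cost clash-relieving rotation for each residue.

    Simplifying assumption: clashes at different residues are independent.
    """
    return [min(options, key=lambda r: r.cost) for options in candidates.values()]

# Hypothetical clash-relieving options for two pocket residues.
candidates = {
    "Lys": [Rotation("Lys", 30.0, 4), Rotation("Lys", 10.0, 5)],
    "Phe": [Rotation("Phe", 60.0, 6), Rotation("Phe", 120.0, 2)],
}
chosen = cheapest_resolution(candidates)
total_cost = sum(r.cost for r in chosen)
```

Note that the rule favors moving few atoms through a large angle (the 120° rotation of 2 atoms, cost 240) over moving many atoms through a smaller one (60° of 6 atoms, cost 360).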
Depending on the specific system, side-chain flexibility alone may or may not be sufficient for adequate modeling. For example, conformational variability in the HIV protease binding site is apparently well described in terms of the movements of several side chains and a water molecule. On the other hand, many kinases exhibit loop rearrangements as well as large-scale relative movement of the two ‘lobes’ delimiting the active site. The diversity of ligand binding mechanisms and the frequent unpredictability of receptor movement types make the use of multiple receptor conformations (MRC), predetermined by experimental or computational means, an attractive practical alternative.
Detailed case analysis of a large number of incorrect ligand-receptor docking poses or inadequate binding scores reveals many sources of error beyond receptor flexibility. They include ‘fantasy’ positions (outside the electron density) of pocket atoms (side chains or loops), incorrect orientations of His, Asn and Gln side chains, improperly assigned histidine tautomers and charge states of aspartate, glutamate and histidine, and improper proline ring puckering, among others. In some cases, protonation or isomeric states may be ligand dependent. These inevitable receptor ambiguities affect the MRC method in two ways. First, these possibilities must be taken into account whenever a protein structure is added to the MRC set. Second, as an added benefit, the multiple receptor structures can represent these uncertainties in specific atomic details of the binding pocket, including alternative tautomers, isomers, ring puckering, protonation states, hydrogen positions, and the presence or absence of specific water molecules potentially participating in ligand binding.
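One simple way to turn such ambiguities into MRC members is to enumerate their combinations. The sketch below is illustrative only: the residue labels (`His57`, `Asn102`, `Asp189`, `Wat301`) are hypothetical, and the HID/HIE/HIP labels merely follow a common force-field naming convention for histidine states.

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical ambiguous sites in a binding pocket and their alternative states.
AMBIGUITIES: Dict[str, List[str]] = {
    "His57": ["HID", "HIE", "HIP"],     # two neutral tautomers + protonated form
    "Asn102": ["native", "flipped"],    # amide flip
    "Asp189": ["charged", "neutral"],   # protonation state
    "Wat301": ["present", "absent"],    # possibly bridging water
}

def enumerate_variants(ambiguities: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Yield one receptor variant (site -> state) per combination of states."""
    sites = list(ambiguities)
    for states in product(*(ambiguities[s] for s in sites)):
        yield dict(zip(sites, states))

variants = list(enumerate_variants(AMBIGUITIES))
print(len(variants), "receptor variants")
```

The count grows multiplicatively (3 x 2 x 2 x 2 = 24 here), so in practice only the ambiguities that plausibly matter for the ligands of interest would be enumerated.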
It is noteworthy that the limitations of representing an experimental protein structure by a single conformation have recently been brought to the attention of the X-ray crystallography community. Methods for generating crystallographic ensembles are being developed. If the proposed deposition of crystallographic ensembles in the PDB becomes common practice, these ensembles will become another natural source of input for MRC docking studies.
Given the variety and success of available flexible ligand/rigid receptor docking algorithms, the easiest way to include multiple receptor conformations in a docking experiment is simply to run multiple independent simulations (Figure 1). However, integrating MRC sampling into the docking algorithm itself may offer advantages in calculation speed and simplify data management. Such ‘ensemble docking’ extensions of rigid receptor algorithms have been reported, for example for AutoDock and ICM. FlexE, an extension of the popular FlexX algorithm, not only uses each receptor conformation individually but also attempts to extend the search space beyond the input set by detecting dissimilar regions and joining them combinatorially. New, potentially accessible receptor conformations are thus generated during the search. However, considering too many conformations can degrade performance. In a recent critical evaluation of FlexE on two targets of pharmaceutical interest, β-secretase and JNK-3, the algorithm was unable to handle large loop movements and could not match the enrichment factors obtained from independent FlexX runs on each receptor structure.
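The "multiple independent simulations" strategy needs only a small merging step afterwards: for each ligand, keep the best score over all receptor conformations and remember which conformation produced it. A minimal sketch, assuming lower scores are better (as for energy-like scores); the conformation names and score values are hypothetical.

```python
from typing import Dict, Tuple

# scores[receptor_conformation][ligand] = best docking score from that run
Scores = Dict[str, Dict[str, float]]

def merge_independent_runs(scores: Scores) -> Dict[str, Tuple[float, str]]:
    """Keep, for each ligand, the best (lowest) score and the conformation that produced it."""
    best: Dict[str, Tuple[float, str]] = {}
    for conf, per_ligand in scores.items():
        for ligand, score in per_ligand.items():
            if ligand not in best or score < best[ligand][0]:
                best[ligand] = (score, conf)
    return best

# Hypothetical results of three independent rigid-receptor runs.
runs = {
    "apo":    {"lig1": -7.2, "lig2": -9.1},
    "holo_A": {"lig1": -8.4, "lig2": -8.8},
    "holo_B": {"lig1": -6.9, "lig2": -9.5},
}
merged = merge_independent_runs(runs)
```

Keeping the conformation label alongside the score also makes it easy to inspect which receptor states are responsible for the top-ranked hits.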
FLIPDock is another algorithm using the AutoDock force field; it introduces a highly sophisticated data structure for the MRC representation, termed the Flexibility Tree (FT). The FT describes the receptor as a nested system of molecular fragments that can undergo a range of movement types, such as hinge, shear and twist motions, normal modes, and side-chain rotameric changes. Representing protein flexibility as an intuitive hierarchical classification of movements is a great strength of the approach. The authors demonstrate that the FT can be used successfully to include side-chain movements in docking simulations. It is less clear to what extent the FT can generate realistic atomic-level receptor structures when large-scale movements are involved, as only balanol/protein kinase A docking results were presented.
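To illustrate what a hierarchical flexibility representation looks like, the sketch below builds a small nested tree of fragments, each tagged with a motion type and the number of variables it contributes. This is a hypothetical toy structure inspired by the description above, not FLIPDock's actual data structure; all node names and parameter counts are invented.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class FlexNode:
    name: str
    motion: str = "rigid"    # e.g. "hinge", "shear", "twist", "normal_modes", "rotamer"
    n_params: int = 0        # variables controlling this node's own motion
    children: List["FlexNode"] = field(default_factory=list)

    def total_params(self) -> int:
        """Variables for this node plus all nested fragments."""
        return self.n_params + sum(c.total_params() for c in self.children)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "FlexNode"]]:
        """Depth-first traversal yielding (depth, node)."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

# Hypothetical two-lobe receptor: a hinge moves lobe_1, which carries a loop and a side chain.
tree = FlexNode("receptor", children=[
    FlexNode("lobe_1", "hinge", 1, [
        FlexNode("loop_1", "shear", 1),
        FlexNode("residue_a", "rotamer", 1),
    ]),
    FlexNode("lobe_2", "rigid", 0, [FlexNode("residue_b", "rotamer", 1)]),
])

for depth, node in tree.walk():
    print("  " * depth + f"{node.name} ({node.motion}, {node.n_params})")
```

Nesting means that moving a parent (the hinge) carries its children along, while each child still adds its own local degrees of freedom.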
The original DOCK algorithm was also extended to ensembles of receptor structures, and a comparative study of MRC docking versus a ‘soft’ docking potential was reported by Ferrari and coworkers. Two cavities in a mutant T4 lysozyme and the active site of aldose reductase (AldR) were used as targets for docking and virtual ligand screening (VLS) experiments. The lysozyme cavities were considered an ideal case for the ‘soft’ docking approach, and indeed the authors found improved compound ranking compared with ‘hard’ docking to a single apo conformation. Docking to four manually chosen conformations gave a modest further improvement in ranking for both cavities: 72% and 68% of the native ligands were recovered in the top 1% of the database, versus 57% and 64% for ‘soft’ docking and only 51% and 49% for ‘hard’ docking. VLS experiments with multiple AldR conformations also showed a 40% improvement over ‘hard’ docking to a single conformation. However, different forms of softened potential, e.g. a truncated 6-12 Lennard-Jones potential instead of the 6-9 form used by Ferrari and coworkers, are likely to yield different results. Remarkably, the authors were also able to use the results of the MRC VLS for AldR to select a few compounds for experimental testing, and discovered two novel low-micromolar inhibitors.
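The difference between the softened forms mentioned above can be seen by writing a generic m-n Lennard-Jones potential with well depth ε at the minimum r_min, E(r) = ε/(m-n) · [n (r_min/r)^m - m (r_min/r)^n], which gives ε[(r_min/r)^12 - 2(r_min/r)^6] for 6-12 and ε[2(r_min/r)^9 - 3(r_min/r)^6] for 6-9. The sketch below compares the two and adds a simple energy cap as one way of truncating the repulsive wall. The parameter values are hypothetical, and the cap is an illustrative truncation scheme rather than the exact form used by any particular program.

```python
def mn_potential(r: float, r_min: float, eps: float, m: int, n: int) -> float:
    """Generic m-n Lennard-Jones form with well depth eps at r = r_min."""
    x = r_min / r
    return eps / (m - n) * (n * x**m - m * x**n)

def soften(energy: float, cap: float) -> float:
    """Truncate the repulsive wall: energies above `cap` are clipped to `cap`."""
    return min(energy, cap)

R_MIN, EPS, CAP = 4.0, 0.2, 1.0   # hypothetical values (Å, kcal/mol, kcal/mol)

for r in (3.0, 3.5, 4.0, 5.0):
    e126 = mn_potential(r, R_MIN, EPS, 12, 6)
    e96 = mn_potential(r, R_MIN, EPS, 9, 6)
    print(f"r={r:.1f}  6-12: {e126:8.3f}  6-9: {e96:8.3f}  "
          f"truncated 6-12: {soften(e126, CAP):8.3f}")
```

Both forms share the same minimum, but the 6-9 wall rises more gently at short range, while the capped 6-12 form keeps the steep wall only up to the cap; the two softening strategies therefore penalize close contacts differently.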
The ensemble docking approach of Huang and Zou also builds on DOCK. A consensus ‘reference’ receptor structure is derived from the input ensemble and used in a rigid ligand placement step. A subsequent Simplex minimization step treats the receptor conformation as a discrete variable. It seems questionable whether the Simplex algorithm, which is designed for continuous variables, is well suited to optimizing integer parameters. The authors report good validation results, although their unorthodox success criterion (a solution within 2.5 Å of the native pose among the top 5 poses) makes objective comparison with other methods difficult.
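Why the success criterion matters can be shown with a small sketch: the same ranked list of poses can fail a strict top-1, 2 Å test and pass a lenient top-5, 2.5 Å test. The RMSD here is computed in the receptor frame without superposition or symmetry correction, and the pose RMSD values are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def rmsd(pose: Sequence[Coord], native: Sequence[Coord]) -> float:
    """Atom-by-atom RMSD in the receptor frame (no superposition, no symmetry handling)."""
    assert len(pose) == len(native), "poses must list the same atoms in the same order"
    sq = sum((a - b) ** 2 for p, q in zip(pose, native) for a, b in zip(p, q))
    return math.sqrt(sq / len(pose))

def top_n_success(ranked_rmsds: Sequence[float], n: int, cutoff: float) -> bool:
    """True if any of the n top-ranked poses lies within `cutoff` Å of the native pose."""
    return any(r <= cutoff for r in ranked_rmsds[:n])

ranked = [3.1, 4.0, 2.3, 5.0, 1.9]                 # hypothetical RMSDs, best-scored pose first
strict = top_n_success(ranked, n=1, cutoff=2.0)    # False: the top pose is 3.1 Å off
lenient = top_n_success(ranked, n=5, cutoff=2.5)   # True: the third pose is within 2.5 Å
```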
The recently reported FITTED algorithm offers two receptor flexibility modes. The first, termed ‘semi-flexible’, is essentially MRC ensemble docking. The second, ‘fully flexible’ mode allows a genetic algorithm (GA) to generate different combinations of the side-chain rotamers and backbone conformations found in the input ensemble. In addition, the algorithm can simulate displaceable interfacial water molecules by combining a special functional form for water interactions with GA sampling of the presence or absence of each water.
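The idea of letting a GA recombine parts of an input ensemble, together with water on/off switches, can be sketched generically. In this toy version each gene either picks which input conformer a flexible residue is taken from or switches a water on or off. The operators (one-point and uniform crossover, point mutation, truncation selection with elitism) and the `toy_fitness` function are hypothetical choices for illustration, not FITTED's actual scheme; in a real application the fitness would be a docking score of the ligand in the assembled hybrid receptor.

```python
import random
from typing import Callable, List, Tuple

Genome = Tuple[List[int], List[bool]]  # (source conformer per residue, water present?)

def evolve_hybrid(n_residues: int, n_conformers: int, n_waters: int,
                  fitness: Callable[[Genome], float], pop_size: int = 30,
                  generations: int = 40, mutation_rate: float = 0.1, seed: int = 0):
    """Toy GA over hybrid receptors; higher fitness is better."""
    rng = random.Random(seed)

    def random_genome() -> Genome:
        return ([rng.randrange(n_conformers) for _ in range(n_residues)],
                [rng.random() < 0.5 for _ in range(n_waters)])

    def crossover(a: Genome, b: Genome) -> Genome:
        cut = rng.randrange(n_residues + 1)                 # one-point crossover on residues
        waters = [rng.choice(pair) for pair in zip(a[1], b[1])]  # uniform crossover on waters
        return a[0][:cut] + b[0][cut:], waters

    def mutate(g: Genome) -> Genome:
        res = [rng.randrange(n_conformers) if rng.random() < mutation_rate else s for s in g[0]]
        wat = [(not w) if rng.random() < mutation_rate else w for w in g[1]]
        return res, wat

    population = [random_genome() for _ in range(pop_size)]
    history = []                                           # best fitness per generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]              # truncation selection; elites survive
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness), history

# Hypothetical "ideal" hybrid used only to define a toy fitness.
TARGET: Genome = ([1, 0, 2, 1, 0], [True, False])

def toy_fitness(g: Genome) -> int:
    return (sum(a == b for a, b in zip(g[0], TARGET[0]))
            + sum(a == b for a, b in zip(g[1], TARGET[1])))

best, history = evolve_hybrid(5, 3, 2, toy_fitness)
```

Because the parents are carried over unchanged, the best fitness never decreases between generations.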
Ensemble methods may offer a significant performance advantage over sequential docking to multiple conformations with conventional rigid-receptor algorithms. For example, a 6-fold speedup was reported for ensemble docking to 12 conformations of the lysozyme cavity, and the speedup reached 18-fold when docking against 48 conformations. Ultimately, the efficiency of ensemble methods should depend on the diversity of the receptor conformations: if the ensemble involves only minor structural variations, exploring it may add only incrementally to the overall computational cost; if highly dissimilar binding site conformations are included, each of them will have to be explored virtually independently, potentially multiplying the search time by the number of conformations.
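The two extremes described above can be captured by a simple cost model, assumed here for illustration only: an ensemble run costs one full rigid-receptor run plus a fraction α of a run for each additional conformation, so the speedup over N sequential runs is N / (1 + (N - 1)α). With α = 0 the speedup equals N (fully shared search); with α = 1 it is 1 (independent search). Inverting the model for the reported speedups gives the implied marginal cost per conformation.

```python
def ensemble_speedup(n_confs: int, alpha: float) -> float:
    """Speedup over n sequential runs if each extra conformation costs `alpha` of a full run."""
    return n_confs / (1.0 + (n_confs - 1) * alpha)

def implied_alpha(n_confs: int, speedup: float) -> float:
    """Invert ensemble_speedup to recover alpha from an observed speedup."""
    return (n_confs / speedup - 1.0) / (n_confs - 1)

alpha_12 = implied_alpha(12, 6.0)    # 1/11, about 0.091
alpha_48 = implied_alpha(48, 18.0)   # 5/141, about 0.035
```

A single constant α does not fit both reported data points: the implied marginal cost falls from about 9% to about 3.5% of a full run per conformation, which suggests, within this simple model, that additional conformations of the lysozyme cavity became cheaper to explore as the ensemble grew.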
Post-docking optimization may further improve both the docking pose and its score. Nabuurs, Wagener and de Vlieg demonstrated robust performance of FlexX-Ensemble docking combined with post-docking explicit receptor-ligand optimization on a benchmark of 35 ligand-receptor complexes.
Advantages and pitfalls of the MRC approach in docking and VLS are well illustrated by an in-depth benchmarking study by Barril and Morley. Their test set consisted of 49 structures of cyclin-dependent kinase 2 (CDK2) with 34 ligands and 149 structures of heat shock protein 90 (HSP90) with 57 ligands. These receptors are among the most thoroughly investigated experimentally. On average, only 33% (CDK2) and 25% (HSP90) of ligands docked within 2 Å RMSD to any given single receptor structure, whereas 97% of them docked correctly to at least one receptor structure. Unfortunately, the best-performing receptor structure for each ligand is not known in advance. The single receptor structure that performed best over the whole ligand set correctly docked up to 68% (CDK2) and 49% (HSP90) of ligands. Best-performing combinations of two or more receptor structures were investigated next. The success rate gradually improved to 94% and 77% for the best subsets of 6 CDK2 structures and 8 HSP90 structures, respectively. A pitfall of using a large number of receptor structures was also observed: the success rate actually declined when more than 39 (CDK2) or 81 (HSP90) structures were used. For random subsets, a more realistic scenario, the dependence of performance on the MRC set is less dramatic: the average performance improved monotonically with the number of conformations, reaching 76% and 51% for the full sets. The bulk of the improvement still occurred within the first 10 (CDK2) and 25 (HSP90) structures, suggesting that a relatively small subset of structures can embody enough receptor conformational states to represent induced fit adequately. Interestingly, at least for HSP90, performance could be significantly improved by adding an ad hoc solvation-based receptor conformational penalty to the scoring function. This observation emphasizes the need to develop methods for accurately scoring receptor conformations, an aspect that is currently often disregarded.
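The quantities in this kind of benchmark can be computed from a ligands-by-receptors table of top-pose results. The sketch below assumes that, for each ligand and receptor, we have the score and RMSD of the top-ranked pose; an ensemble prediction is the pose with the best (lowest) score over the chosen receptors, and the "at least one receptor" figure is an oracle upper bound. The 3 x 3 table is hypothetical and is built so that adding a third receptor introduces an artifact: a well-scored but wrong pose for one ligand, which mimics the decline seen with very large sets.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# results[ligand][receptor] = (score, rmsd) of the top-ranked pose; lower score is better
Results = List[List[Tuple[float, float]]]

def ensemble_success(results: Results, subset: Sequence[int], cutoff: float = 2.0) -> float:
    """Fraction of ligands whose best-scoring pose over `subset` is within `cutoff` Å."""
    hits = 0
    for per_receptor in results:
        _, best_rmsd = min((per_receptor[r] for r in subset), key=lambda sr: sr[0])
        hits += best_rmsd <= cutoff
    return hits / len(results)

def oracle_success(results: Results, cutoff: float = 2.0) -> float:
    """Fraction of ligands docked correctly to at least one receptor (upper bound)."""
    return sum(any(rmsd <= cutoff for _, rmsd in row) for row in results) / len(results)

def best_subset(results: Results, k: int, cutoff: float = 2.0) -> Tuple[int, ...]:
    """Exhaustive search for the best k-receptor subset (feasible only for small sets)."""
    n_receptors = len(results[0])
    return max(combinations(range(n_receptors), k),
               key=lambda s: ensemble_success(results, s, cutoff))

# Hypothetical data: 3 ligands x 3 receptors.
results = [
    [(-9.0, 1.0), (-7.0, 3.5), (-10.0, 4.0)],   # receptor 2 gives a well-scored wrong pose
    [(-6.0, 5.0), (-8.5, 1.2), (-7.0, 3.0)],
    [(-8.0, 1.5), (-7.5, 1.8), (-6.0, 2.5)],
]
singles = [ensemble_success(results, (r,)) for r in range(3)]
```

With the full set of three receptors, only 2 of 3 ligands are docked correctly, although the best pair of receptors docks all three; this is a small-scale version of the pitfall described above.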
While MRC docking may improve pose prediction, each additional conformation increases the chance of a false positive in VLS. Similar increases in false positive rates with the number of receptor conformations are also encountered in protein-protein docking simulations, as recently reviewed. Furthermore, the usual tacit assumption that MRC docking improves the results in every case, albeit at a higher cost, does not hold. It is entirely possible that in easier cases, where the correct ligand pose was already predicted with a single receptor conformation, introducing MRCs leads to a wrong pose being top-scored.
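A simple probabilistic argument shows why false positives accumulate. If a given decoy has probability p of scoring above the hit threshold against any one conformation, and the conformations are assumed to act independently, the probability that it passes against at least one of N conformations is 1 - (1 - p)^N. Real conformations are correlated, so this independence assumption is a worst-case illustration rather than a prediction.

```python
def decoy_false_positive_rate(p_single: float, n_confs: int) -> float:
    """P(a decoy clears the threshold in at least one of n independent conformations)."""
    return 1.0 - (1.0 - p_single) ** n_confs

for n in (1, 2, 5, 10, 50):
    print(n, round(decoy_false_positive_rate(0.01, n), 4))
```

With p = 0.01, ten conformations raise the per-decoy false positive probability to about 0.096, almost ten times the single-conformation value, while true binders can at best go from "missed" to "found" once.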
Barril and Morley also investigated the effect of multiple receptor conformations in VLS experiments. The results demonstrate that VLS is much more sensitive to the potential artifacts introduced by MRC: in the best-case scenario (choosing the receptor conformations that give the best performance), the maximum enrichment factor (13, versus 8.7 for the best single conformation of CDK2) is achieved with two receptor conformations and deteriorates steeply for larger numbers of conformations. For arbitrarily chosen conformations, some improvement from MRC was observed only when the enrichment factor (EF) was calculated for the top 10% of the virtual library. For the more practical top 1% subset, using MRC only deteriorated EF, apparently because additional receptor conformations on average bring more false positives than true hits into the top-scoring list. The results indicate that when using MRC in VLS experiments, it is best to use a few carefully selected conformations rather than include every structure available. The benefit of obtaining correct poses and good scores for a larger number of ligands should be weighed against the increased potential for docking pose artifacts and VLS false positives.
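For reference, the enrichment factor at a fraction x of the ranked library is the hit rate among the top-ranked x of compounds divided by the hit rate of the whole library. A minimal sketch, assuming lower scores are better and rounding the size of the top subset to the nearest integer (at least one compound); the library below is hypothetical.

```python
from typing import Sequence

def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float) -> float:
    """EF at `fraction` of the ranked library (lower score = better)."""
    n = len(scores)
    n_top = max(1, round(fraction * n))
    ranked = sorted(range(n), key=lambda i: scores[i])
    hits_top = sum(is_active[i] for i in ranked[:n_top])
    total_active = sum(is_active)
    return (hits_top / n_top) / (total_active / n)

# Hypothetical 100-compound library ranked 0..99, with 4 actives.
scores = [float(i) for i in range(100)]
is_active = [i in (0, 1, 50, 99) for i in range(100)]
ef_1 = enrichment_factor(scores, is_active, 0.01)    # 1 active in the top 1 compound
ef_10 = enrichment_factor(scores, is_active, 0.10)   # 2 actives in the top 10 compounds
```

Here EF at 1% is 25 and EF at 10% is 5: the maximum achievable EF shrinks as the fraction grows, so the two cut-offs can rank the same set of receptor conformations quite differently.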
While significant advances have been made in the use of MRCs for ligand docking, automatic generation of reasonably small yet representative sets of receptor conformations remains challenging. For popular targets, this issue may be increasingly addressed by the rapid expansion of the PDB. Dozens of X-ray structures of complexes are available for several tyrosine kinases, HIV protease, and a number of other proteases and metalloproteases. These numbers grow further when close homologues are considered. Nevertheless, novel protein targets, for which docking-based virtual screening could be of particular interest, are still often represented by a single structure. Discovering alternative conformations of their binding sites could potentially lead to breakthrough ligand designs, as illustrated by the success of allosteric inhibitors of tyrosine kinases such as STI-571 (imatinib, Gleevec) [22,23].
A normal mode approach was investigated by Cavasotto et al. Ordinarily, the large number of normal modes, even when only the low-frequency ones are selected, makes them problematic for generating alternative conformations. However, if only the normal modes that affect pocket conformation are selected, the number of variables becomes small enough to generate a suitably representative set of conformations. Improvements in both pose prediction and VLS enrichment were demonstrated for cAMP-dependent protein kinase.
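One way to pick "pocket-relevant" modes is to rank them by the fraction of their squared displacement that falls on pocket atoms, and then displace the structure along the top few. The sketch below assumes the modes are already computed (for example from an elastic network model) and normalized; the selection criterion and the displacement amplitudes are illustrative choices, not necessarily those of Cavasotto et al.

```python
import numpy as np

def pocket_fraction(modes: np.ndarray, pocket_atoms) -> np.ndarray:
    """modes: (n_modes, n_atoms, 3). Fraction of each mode's squared displacement on pocket atoms."""
    sq = (modes ** 2).sum(axis=2)                      # (n_modes, n_atoms)
    return sq[:, pocket_atoms].sum(axis=1) / sq.sum(axis=1)

def pocket_conformers(coords: np.ndarray, modes: np.ndarray, pocket_atoms,
                      n_select: int = 2, amplitudes=(-1.0, 1.0)):
    """Displace coords (n_atoms, 3) along the n_select most pocket-localized modes."""
    frac = pocket_fraction(modes, pocket_atoms)
    chosen = np.argsort(frac)[::-1][:n_select]         # most pocket-localized first
    conformers = [coords + a * modes[m] for m in chosen for a in amplitudes]
    return chosen, conformers
```

Each selected mode contributes one conformer per amplitude, so the set stays small (here 2 modes x 2 amplitudes = 4 conformers) regardless of how many modes were computed.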
Zavodszky and colleagues proposed combining the graph-theoretical algorithm FIRST, for protein flexibility analysis, with random-walk sampling by ROCK as a source of receptor ensembles for docking with SLIDE. The realism of the ensembles was evaluated by comparison with NMR data. While the ROCK results correlate well qualitatively with the experimental data, the study focused on only two receptor-ligand pairs, and the results mainly illustrate the ability of the approach to simulate flexibility rather than to predict specific conformational changes. Application of the approach in a ligand discovery study was recently reported.
Damm and Carlson also compared multiple conformations obtained from the two principal experimental methods of structure determination, X-ray crystallography and NMR. While X-ray structures are generally believed to be more accurate, the authors concluded that the NMR ensembles provided a broader representation of alternative conformations in the ligand binding site.
Molecular dynamics (MD) simulations can also be used as a source of alternative conformations. Gorfe and Caflisch investigated the conformational plasticity of beta-secretase (BACE) using multiple MD runs for the apo and inhibitor-bound protein. Wong et al. docked balanol to MD trajectory snapshots of protein kinase A. Because of their high computational cost, MD studies still typically focus on a single protein target. This is a very limited scope compared with most recent docking validation studies, which typically use 100 or more structures covering dozens of protein targets. It is therefore hard to assess to what extent MD sampling can cover the conformational flexibility of the binding site for an arbitrary target of interest; larger-scale movements may occur on time scales that make MD simulations impractical. An even more difficult question is what fraction of the generated alternative conformations are artifacts. The problem of validating predicted alternative receptor conformations is not limited to MD and applies to any MRC generation algorithm. When activity data are available for a significant number of ligands, docking and scoring of known binders and non-binders can be used to validate the generated receptor conformations and select them for further studies.
The success of the MRC docking approach depends critically on two aspects of the input set of receptor conformations. First, the set should ideally be representative of all or most of the binding site conformations realized in nature. In many cases the nature of the induced fit is limited, e.g. upon agonist binding in a GPCR, antagonist binding in nuclear receptors, or large-scale rearrangements of kinase loops. Second, it is equally important to avoid artifactual conformations, which can result in both incorrect pose prediction and false positives in virtual screening. Increasing the dimensionality of the essential conformational space around the binding site rapidly aggravates both problems.
Sometimes no atoms are better than incorrectly placed atoms. The alanine conversion method was previously proposed for side-chain prediction: side-chain conformations were predicted successfully, without a combinatorial search over the conformations of neighboring side chains, simply by converting nearby side chains to alanines. The side chains can later be built back and refined. The same idea can be applied to protein docking and ligand docking. Sherman et al. proposed a set of four rules for selecting up to three residue side chains for alanine conversion, based on B-factors, occupancy, and superpositions of known structures. The Ala-converted model is used for rigid docking with reduced van der Waals radii for the pocket atoms, followed by full-atom refinement. The authors used this model as a single model prior to the docking step.
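The general pattern of flagging uncertain side chains from crystallographic indicators and capping the number truncated can be sketched as below. The thresholds, the ordering rule and the `PocketResidue` fields are hypothetical placeholders; they are not the four rules of Sherman et al., which are not reproduced here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PocketResidue:
    name: str
    b_factor: float    # mean side-chain B-factor, Å^2
    occupancy: float   # 0 to 1
    spread: float      # side-chain RMSD across superposed structures, Å

def residues_to_truncate(residues: List[PocketResidue], b_cut: float = 60.0,
                         occ_cut: float = 1.0, spread_cut: float = 1.5,
                         max_residues: int = 3) -> List[str]:
    """Flag uncertain side chains for alanine conversion; keep at most `max_residues`.

    All thresholds are hypothetical placeholders, not published values.
    """
    flagged = [r for r in residues
               if r.b_factor > b_cut or r.occupancy < occ_cut or r.spread > spread_cut]
    flagged.sort(key=lambda r: r.spread, reverse=True)   # most variable first
    return [r.name for r in flagged[:max_residues]]

# Hypothetical pocket residues.
pocket = [
    PocketResidue("res_A", 70.0, 1.0, 0.5),   # high B-factor
    PocketResidue("res_B", 30.0, 0.5, 0.8),   # partial occupancy
    PocketResidue("res_C", 25.0, 1.0, 2.5),   # variable across structures
    PocketResidue("res_D", 80.0, 1.0, 1.9),   # high B-factor and variable
    PocketResidue("res_E", 20.0, 1.0, 0.3),   # well defined
]
to_alanine = residues_to_truncate(pocket)
```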
In an MRC version of the omission model approach, if part of the receptor pocket is uncertain or flexible, several conformations in which different combinations of the uncertain parts are omitted can always be added to the MRC set. The set may then contain both full-atom conformers and omission models. The different models can then be tested by binding score, refined, and re-scored.
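Generating the omission variants is a subset enumeration over the uncertain parts. A minimal sketch, with hypothetical part labels; the optional `max_omitted` limit is an illustrative way to keep the set small, since the full enumeration grows as 2 to the power of the number of uncertain parts.

```python
from itertools import combinations
from typing import FrozenSet, Iterator, Optional, Sequence

def omission_models(uncertain_parts: Sequence[str],
                    max_omitted: Optional[int] = None) -> Iterator[FrozenSet[str]]:
    """Yield each set of parts to omit, starting with the full-atom model (nothing omitted)."""
    k_max = len(uncertain_parts) if max_omitted is None else max_omitted
    for k in range(k_max + 1):
        for omitted in combinations(uncertain_parts, k):
            yield frozenset(omitted)

# Hypothetical uncertain regions of a pocket.
parts = ["loop_1", "side_chain_X", "water_1"]
all_models = list(omission_models(parts))       # 2**3 = 8 models
single_omissions = list(omission_models(parts, max_omitted=1))
```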
Multiple receptor conformations for high-throughput ligand docking can be generated with one or several ligands actually present in the binding site. Fully flexible receptor docking of a few known ligands can be performed to force the receptor into alternative conformations. Irrelevant conformations can then be filtered out by evaluating the enrichment factor on a test set. Bisson and coworkers generated multiple conformations of the androgen receptor (AR) with two different antagonists by Monte Carlo sampling in ICM. Each conformation was tested for its ability to discriminate between AR binders and non-binders in a panel of 88 nuclear receptor ligands. The two AR conformations with the best enrichment characteristics were then used for virtual ligand screening of marketed drugs for potential AR antagonists. Three identified antipsychotic drugs exhibited antiandrogenic activity and were then rationally modified into nonsteroidal molecules with improved AR antagonism and a marked reduction in affinity for dopaminergic and serotonergic receptors (see Figure 2).
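The ligand-guided selection step can be sketched as: dock a labelled panel of binders and non-binders into every generated conformation, measure how well each conformation separates the two classes, and keep the best few. This sketch swaps in ROC AUC as the discrimination measure (the study above used enrichment characteristics), assumes lower scores are better, and uses hypothetical conformation names and scores.

```python
from typing import Dict, List, Sequence

def roc_auc(scores: Sequence[float], is_binder: Sequence[bool]) -> float:
    """P(random binder outscores random non-binder); lower score = better, ties count 1/2."""
    pos = [s for s, b in zip(scores, is_binder) if b]
    neg = [s for s, b in zip(scores, is_binder) if not b]
    wins = sum(1.0 if p < n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def select_conformations(score_table: Dict[str, List[float]],
                         is_binder: Sequence[bool], k: int = 2) -> List[str]:
    """score_table[conf] = docking scores of the panel; return the k best-discriminating confs."""
    ranked = sorted(score_table, key=lambda c: roc_auc(score_table[c], is_binder), reverse=True)
    return ranked[:k]

# Hypothetical panel: two binders followed by two non-binders.
is_binder = [True, True, False, False]
score_table = {
    "confA": [-10.0, -9.0, -5.0, -4.0],   # perfect separation (AUC 1.0)
    "confB": [-5.0, -9.0, -10.0, -4.0],   # AUC 0.5
    "confC": [-8.0, -7.0, -7.5, -9.0],    # AUC 0.25
}
selected = select_conformations(score_table, is_binder)
```

A threshold-free measure such as AUC is less sensitive to small panels than an early-enrichment cut-off, but either criterion only validates conformations relative to the chosen test panel.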
The nearly exponential growth in the number of protein structures in the PDB, the improved understanding of induced fit, and improved conformational sampling methods make the multiple receptor conformation approach to ligand docking increasingly attractive. The MRC approach is relatively fast and remains suitable for virtual ligand screening as long as the number of fixed receptor conformations is relatively small and carefully chosen. Ligand-guided selection from a set of generated receptor conformations was shown to correctly predict secondary activities of drugs. Such conformational generators can be used when crystal structures of sufficient diversity are unavailable, or even when starting from a homology model.
The authors thank Irina Kufareva and Giovanni Bottegoni for helpful discussions and artistic assistance, and Kim Reynolds for reviewing the manuscript.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.