|Home | About | Journals | Submit | Contact Us | Français|
We review insights from computational studies of affinities of ligands binding to proteins. The power of structural biology is in translating knowledge of protein structures into insights about their forces, binding, and mechanisms. However, the complementary power of computer modeling is in showing “the rest of the story” (i.e., how motions and ensembles and alternative conformers and the entropies and forces that cannot be seen in single molecular structures also contribute to binding affinities). Upon binding to a protein, a ligand can bind in multiple orientations; the protein or ligand can be deformed by the binding event; waters, ions, or cofactors can have unexpected involvement; and conformational or solvation entropies can sometimes play large and otherwise unpredictable roles. Computer modeling is helping to elucidate these factors.
Structure-based computer modeling of ligand-protein interactions is now a core component of modern drug discovery (Charifson and Kuntz, 1997). It is now difficult to imagine the drug discovery process without computation (Jorgensen, 2004). Computational methods have played a key role in the drug discovery process for a growing number of marketed drugs, including HIV protease inhibitors (Charifson and Kuntz, 1997; Greer et al., 1994; Jorgensen, 2004) and zanamivir (an antiviral neuraminidase inhibitor) (von Itzstein et al., 1993), and in the development of new drug candidates, such as HIV integrase inhibitors (Hazuda et al., 2004; Schames et al., 2004), hepatitis C protease inhibitors (Liverton et al., 2008; Thomson and Perni, 2006), and beta-secretase inhibitors (BACE-1) (Stauffer et al., 2007).
An early step in this field was the invention of the DOCK method in 1982 (Kuntz et al., 1982). There are now at least four classes of physical computer methods (listed from fastest to slowest, and least physical to most physical): (1) very fast molecular docking methods, including DOCK, Glide, AutoDock, FlexX, ICN, PMF, and GOLD, (2) approximate free energy methods, in which the solvent and protein motions are taken into account with fewer approximations, (3) relative binding free energy (RBFE) methods, which include full solvent and protein motions, but which require prior knowledge of the structure of a complex of the protein with a ligand that is similar to the one of interest, and (4) absolute binding free energy (ABFE) methods, which are the most expensive computationally, but which include the physics in the most rigorous way that is currently practical (see Figure 1). ABFE methods start from an unbound ligand and potentially the unbound structure of the protein to attempt to predict the structures, affinities, and thermal properties of the complexes of interest. Mining minima is another method that is very nearly in this last category and has provided insight into binding (Chang and Gilson, 2004; Gilson and Zhou, 2007; Head et al., 1997).
First, we define some terms. A lead compound is a molecule, typically in early-stage drug discovery, that can be further chemically modified to improve its properties as a possible drug candidate. A complex is a receptor and ligand bound together. A pose is one conformation of a ligand in a complex and specifies both the ligand conformation and its position relative to the receptor. A pose can refer either to a conformation that is known from an experimental structure of a complex, or to a hypothetical conformation generated in a computer model. The apo form refers to the structure of the protein that has no ligand bound to it. The holo form refers to the structure of the protein when it is complexed with ligand. The binding free energy, ΔG°, is the free energy of the complex minus the free energies of the ligand and apo protein separately in aqueous solution. The binding free energy is related to the equilibrium association constant, Ka, (in units of M−1) by ΔG° = −RT ln (C° Ka), where R is the gas constant, T is the absolute temperature, and C° is the standard concentration (1 M). The binding affinity, or dissociation constant, equals 1/Ka. The binding free has two components, ΔG = ΔH – TΔS, where H is the enthalpy and S is the entropy. Here are some of the key approaches used for studying binding.
Docking methods start with a known protein structure and a known ligand structure and aim to rapidly generate an optimal protein-ligand bound conformation. Docking was designed to be very rapid (seconds or less per compound), which is desirable for screening large libraries in the short times required for modern pharmaceutical lead discovery. Docking explores many ligand conformations and orientations, and in some cases even different potential binding sites. The different poses are rank ordered by a score, a quantity that ideally would correlate with the free energy of binding, and is obtained either from a physical or knowledge-based potential. Often, docking approaches treat the protein as completely rigid, having a single fixed receptor conformation. Other docking methods treat protein motions by moving only certain atoms out of the way. Though some modern docking approaches can allow for some motions of side chains or backbone (Corbeil et al., 2007; Cozzini et al., 2008; Leach, 1994; Meiler and Baker, 2006; Sherman et al., 2006; Wei et al., 2004), treating these degrees of freedom slows down the computations considerably. Docking is an appealing way to generate leads (Shoichet et al., 2002) because of its speed and ability to screen large libraries of potential leads (Huang and Jacobson, 2007; Babaoglu et al., 2008). But because docking trades off physical accuracy for speed, it is seldom accurate enough to predict binding affinities or rank-order compounds. Its power to discriminate binders from nonbinders varies widely depending on the target protein (Graves et al., 2008; Warren et al., 2006). But because of its speed, docking approaches are the method of choice for filtering out compounds that are likely nonbinders and for identifying native-like poses.
MM-PBSA/GBSA is more physically rigorous than docking. The acronym stands for molecular mechanics with Poisson-Boltzmann + surface area or MM-GBSA (GB stands for Generalized Born), and the method was originated by the Kollman and Case labs in the late 1990s (Cheatham et al., 1998; Kollman et al., 2000; Srinivasan et al., 1998), with parallel work by others (Vorobjev and Hermans, 1999). It involves greater computational cost than docking (at least several hours per compound), but also is more physical in its more extensive conformational sampling. MM-PBSA aims to estimate the binding free energies, or relative binding free energies, of related compounds. Here, a computer generates representative bound and unbound structures by molecular mechanics simulations or by energy minimization of a protein-ligand complex, usually in explicit solvent. The goal is to estimate the change in enthalpy on binding by comparing the average enthalpy of bound and unbound states, but this would be a small difference of two large, noisy energies. So after the all-atom simulations, the water is removed and the enthalpies and binding free energies are estimated using an implicit (Poisson-Boltzmann or Generalized Born) representation of water. The binding free energy estimate includes the enthalpy change plus the change in salvation free energy from the implicit solvent model.
In many cases, an approximate value of the entropy is also estimated from these simulations. Because MM-PBSA/GBSA invests more effort in sampling and entropies, it is closer to a true free energy calculation. However, often, because of limitations in the approximations for estimating entropy (Gilson and Zhou, 2007), entropic contributions are omitted when estimating relative binding strengths, in the hope that these contributions will cancel when comparing similar ligands (Gilson and Zhou, 2007; Shirts et al., 2009).
Early results with the MM-PBSA method were quite promising (typically with mean-squared errors under 3 kcal/mol for the first several years) (Huo et al., 2002; Kuhn and Kollman, 2000; Mardis et al., 2001; Rizzo et al., 2004; Schwarzl et al., 2002; Shirts et al., 2009), but more recent studies have seen larger errors in some cases (Shirts et al., 2009). Applications have typically been limited to single targets, so it is difficult to evaluate how well the method does generally.
The drawbacks of MM-PBSA/GBSA are that it, too, is sometimes not predictive (Pearlman, 2005; Shirts et al., 2009; Steinbrecher et al., 2006) and it requires prior knowledge of a likely bound complex as a starting point, although such starting conformations can be taken from prior docking (Steinbrecher et al., 2006).
A still more rigorous approach uses the energetics of a physical force field and extensive conformational sampling from molecular dynamics simulations to actually compute differences in binding free energies between similar ligands. This can be done using computational alchemy to obtain the difference in binding free energies, ΔΔGA→B. This is the free energy of changing ligand A into ligand B in the receptor, minus the free energy of changing A into B in solution. To compute this free energy difference for just one pair of ligands binding to the same protein can cost several hundred CPU days. These relative free energies can be computed precisely—given sufficiently long molecular dynamics simulations—using one of several different analysis techniques (Shirts et al., 2007). Though the accuracy of the binding free energies obtained from this method depends on the accuracy of the underlying molecular mechanics force field, it does treat fully, at least in principle, free energies associated with conformational change as well as entropies.
The first alchemical calculations were performed in the 1980s in the McCammon lab (Tembe and McCammon, 1984; Wong and McCammon, 1986), and then by others (Hermans and Subramaniam, 1986; Warshel et al., 1986; Bash et al., 1987; Shirts et al., 2007). Limitations of these methods are the high computational costs and the need to know at least one bound structure of a similar ligand in the protein as a starting point. Accuracies are generally better than for MM-PBSA (Pearlman, 2005; Steinbrecher et al., 2006) and docking (Mobley et al., 2007b; Pearlman and Charifson, 2001), but few systematic comparisons have been done. These methods are only useful for comparing related ligands or receptors.
The most powerful approach, in principle, is the method of absolute binding free energies (ABFE) (Boresch et al., 2003; Hermans and Wang, 1997; Roux et al., 1996). Like RBFE methods, ABFE methods also use full molecular dynamics simulations with fully detailed atomic force fields, and also involve separate sets of simulations for the solvated ligand, the solvated protein, and the complex. But ABFE methods do not require prior knowledge of the binding affinity of a related ligand, hence the term absolute. There have been two groups of ABFE approaches. The first begins with the structure of the ligand of interest bound to the protein. However, the ultimate goal is to begin with no prior knowledge of either the structure or affinity of the ligand complex. A second, more recent group replaces starting knowledge of the structure with one or more docking poses (Mobley et al., 2006, 2007b; Jayachandran et al., 2006). Various studies suggest that ABFE methods are fairly accurate, with good correlations to experimental binding affinities and with RMS errors often less than 3 kcal/mol (Deng and Roux, 2006; Fujitani et al., 2005; Jayachandran et al., 2006; Mobley et al., 2007b; Shirts et al., 2007; Wang et al., 2006), and sometimes much better.
The enterprise of structural biology has given us powerful “eyes” to see single structures—specific native structures and specific bound complexes—and some of the driving forces that hold them together: hydrogen bonds, hydrophobic interactions, ion pairing, and van der Waals packing. However, “what you see” is not always “what you get.” Other equally important forces, namely the entropies, are not visible in native structures.
To capture both the observable and nonobservable contributions to the energetics, it is important to note that binding takes place on an energy landscape. Exploring energy landscapes often requires modeling and computer simulations. For binding, the energy landscape is the free energy of the system as a function of its degrees of freedom, which are many, and include translational, rotational, conformational, and solvation degrees of freedom.
Relative to a receptor, a ligand has three translational degrees of freedom (x, y, and z directions) and three orientational degrees of freedom. When bound, motion in these degrees of freedom becomes restricted. This loss of freedom results in an entropic and free energy cost, opposing binding and favoring the dissociated state (Chang et al., 2007; Chang and Gilson, 2004; Chen et al., 2004; Deng and Roux, 2006; Lee and Olson, 2006; Wang et al., 2006). The loss of freedom depends on the mobility remaining in the binding site, so that in a series of increasingly tightly bound structures there will be increasing losses in translational and rotational entropies, resulting in a contribution opposing binding.
Some simple models assume that the ligand entropy lost on binding correlates with the number of rotatable bonds in the ligand (Böhm, 1993; Gilson and Zhou, 2007; Huey et al., 2007; Laederach and Reilly, 2003; Taylor et al., 2002; Gohlke and Klebe, 2002). The ligand is envisioned to start in its unbound state having access to all possible conformers and to end in its bound state having a single conformer. However, interestingly, more rigorous recent computational studies in host-guest systems indicate that losses in ligand conformational entropy on binding are not strongly correlated with the number of rotatable bonds (Chang and Gilson, 2004; Chen et al., 2004; Guimaraes and Cardozo, 2008). Recent work on salvation free energies of small molecules has led to similar conclusions (Mobley et al., 2008). Not all small-molecule conformers are populated equally in solution. Thus computing accurate ligand affinities (and entropy losses) requires more accurate treatments of the different ligand populations in solution. Entropic contributions can also vary between different conformations of the same ligand in a particular receptor (Chen et al., 2004; Gilson and Zhou, 2007), which may be important even for docking (Ruvinsky and Kozintsev, 2005).
A ligand can sometimes adopt multiple different conformations or orientations upon binding (see Figure 2). These different poses can be separated by energetic barriers. In some cases the different poses are due to ligand or protein symmetries. HIV-1 protease is a dimer with a nearly symmetric active site; as a result, many HIV protease inhibitors have two nearly identically binding modes (e.g., see Protein Data Bank [PDB] codes 1AXA, 1IZH, 1MUI, and 1U8G). Ligand symmetries can lead to trivial cases of multiple binding modes, which have significant entropic implications. Multiple binding modes are observed also when symmetries do not play a role. Computational studies show multiple distinct ligand binding modes in binding sites in T4 lysozyme (Mobley et al., 2006, 2007b), neutrophil elastase (Steinbrecher et al., 2006), estrogen receptor inhibitors (Oostenbrink and van Gunsteren, 2004), FKBP inhibitors (Jayachandran et al., 2006), biotin and streptavidin (Lazaridis et al., 2002), and cytochrome P450cam (Paulsen and Ornstein, 1992).
Do experimental studies support these predictions of multiple ligand conformations? The challenge is that multiple conformers are difficult to determine experimentally. But there is at least some direct crystallographic evidence suggesting multiple relevant orientations: in T4 lysozyme (Graves et al., 2005; Mobley et al., 2006, 2007b), influenza neuraminidase (Stoll et al., 2003) and possibly in trypsin (Stubbs et al., 2002), where the binding mode is affected by pH. Multiple binding modes of fragment-like kinase inhibitors have also been observed (Constantine et al., 2008). Multiple orientations or binding modes have also been seen in thymidylate synthase (Montfort et al., 1990), in the binding of an HIV-1 cell entry inhibitor to the core of HIV-1 gp41 (Zhou et al., 2000), the binding of a transition state analog to an AmpC beta lactamase mutant (Chen et al., 2006), the binding of thiocamphor to cytochrome P450cam (Raag and Poulos, 1991), the binding of flavin to para hydroxybenzoate hydrolase (Gatti et al., 1994), and in the binding of some HIV-1 protease inhibitors (Murthy et al., 1992). There is additional evidence for multiple orientations in several other cases (Birdsall et al., 1989; Böhm and Klebe, 1996; Lazaridis et al., 2002; Mewshaw et al., 2005; Orville et al., 1997; Uytterhoeven et al., 2002). Spectroscopic data (Deng et al., 2001) and studies of drastically different binding modes of related inhibitors (Figure 3) (Stout et al., 1999; Badger et al., 1988; Böhm and Klebe, 1996; Kim, 2007a, 2007b; Pei et al., 2006; Reich et al., 1995; Stoll et al., 2003), some of which may have multiple binding modes (Montfort et al., 1990), suggest that multiple binding modes may be relatively common.
It is not only ligands that can have multiple binding modes. Proteins can, too (Mobley et al., 2007a, 2007b). We refer here not to induced fit, where the ligand binding event causes a change in protein conformation. Rather, we focus on internal motions or freedom of the protein that occur either in the apo structure of the protein itself or in the complex itself. Comparisons of different apo structures of the same proteins show that there are some rotamer changes near binding sites even in the absence of the ligand (Najmanovich et al., 2000), suggesting that multiple rotameric states may be relevant; this is also supported by NMR data (Chou and Bax, 2001). Structural data in the apolar T4 lysozyme binding cavity suggests that helix F, which borders on the binding cavity, can undergo substantial motions of 1.5–2.5 Å with little free energy cost (Morton and Matthews, 1995); various other motions occur in T4 lysozyme as well (Zhang et al., 1995). DHFR appears to have multiple relevant conformations, both in isolation and when binding ligands, and the populations are modulated by pH (Birdsall et al., 1989); each state in the catalytic cycle appears to have at least partially occupied conformations that resemble those before or after it in the cycle (Boehr et al., 2006). Crystallographic evidence suggests multiple protein conformations due to domain motion in some cases (Ma et al., 2002). Multiple conformers are also seen in host-guest binding (Chang and Gilson, 2004; Chen et al., 2004) and can be critical for protein mechanisms such as enzyme catalytic motions (Eisenmesser et al., 2005; Arora and Brooks, 2007; Henzler-Wildman et al., 2007; Henzler-Wildman and Kern, 2007). Several other studies also have provided evidence for multiple protein conformations (Eisenmesser et al., 2002; Gerstein et al., 1994; Min et al., 2005a, 2005b; Fragai et al., 2006).
Ligand binding to a protein may induce strain. Strain refers to an energy cost, usually associated with a deformation of some sort. To achieve the lowest free energy of binding in the complex, the protein and/or ligand may become deformed relative to its unbound state in solvent, which costs energy (strain). Computational studies in the apolar lysozyme model binding site have found that protein strain energies for a valine side-chain rotamer change can be 3–4 kcal/mol (Deng and Roux, 2006; Mobley et al., 2007a, 2007b). When such strain energies are not taken into account, it leads to errors in predicted binding free energies (Mobley et al., 2007a, 2007b).
Computational modeling suggests that ligand strain free energies can be significant. In a survey of 150 crystallographic protein-ligand complexes, Perola and Charifson (2004) used molecular mechanics scoring functions to assess strain energies and found that roughly 50% of ligands with 4–6 rotatable bonds had strain energies more than 3 kcal/mol, and, overall, 40% of ligands had strain energies more than 5 kcal/mol. Another study computed quantum mechanical torsional potentials for a variety of PDB ligands and found that typical strain energies could be on the order of 0.6 kcal/mol per torsion motif, amounting to roughly 3 kcal/mol for a ligand with five torsion motifs (Hao et al., 2007). Another study found, for several ligands, that free energy costs of restricting the ligand to the bound conformation could be a few kcal/mol (Tirado-Rives and Jorgensen, 2006). More recent work suggests that these values could be overestimates of the true strain, as crystal structures (from which strain is estimated) may be refined with a different force field than is used in estimating the strain, introducing artifacts. Nevertheless, strain energies often appear to be greater than several kT (Huang et al., 2006). See also Warren and Perola's (Warren and Perola, 2008) presentation on the topic from OpenEye's CUP meeting (http://www.eyesopen.com/about/events/cups-2008/pdfs-CUP/CUP9-Field-of-Extremes.pdf). Apparently, binding interactions can be strong enough to pay a substantial strain price for deforming one or both partners. Hence the true bound structure of a complex will not be the one that maximizes the interaction energy between the receptor and ligand, but rather the one that best balances the tradeoff between gaining additional favorable interactions while also inducing strain (Sharp, 2005).
Some experiments support this contention that strain free energies can be substantial. In an NMR study on maltose binding protein, Tang et al. (2007) found that the unbound protein was predominantly (roughly 95%) in the open apo conformation, and had a smaller (roughly 5%) population in a minor apo conformation that was more like the holo conformation, but with no evidence that it populated the holo conformation at all in the absence of the ligand. Thus the minor apo conformation is roughly 1.7 kcal/mol less favorable than the major apo conformation, and the holo conformation is probably still more unfavorable. In another instance, in NtrCr, a conformational switch in bacteria that undergoes a conformational change upon phos-phorylation, the active conformation has been shown to be partially populated even when the protein is unphosphorylated (Volkman et al., 2001), but with a smaller population. Based on the populations, this active-like conformation is about 2 kcal/mol less favorable than the norm in active conformation. So, functional protein conformational changes can make significant contributions to the thermodynamics.
When a ligand binds to a protein, it causes conformational changes in the protein. This may or may not be accompanied by strain in the protein, as strain is an (invisible) energy cost.
Ligand-induced protein conformational changes are not rare events. Comparisons of apo and holo structures from the PDB show that backbone conformational motions on ligand binding are relatively common; 20% of binding residues (Gutteridge and Thornton, 2005) and 25% of binding sites (Najmanovich et al., 2000) across a variety of proteins have backbone Cα motions more than 1 Å. And 15% of binding site residues have side-chain motions of more than 2 Å (Gutteridge and Thornton, 2005), whereas only 30%–40% of binding sites have been shown to undergo no side-chain rotamer changes (Najmanovich et al., 2000). More anecdotal reports of conformational changes on ligand binding are available for a wide range of systems; kinases, for example, are notoriously flexible (Noble et al., 2004; Vajpai et al., 2008; Weisberg et al., 2007), as are many other proteins (Böhm and Klebe, 1996; Kim, 2007a; Meiler and Baker, 2006; Teague, 2003). An extreme example may be natively disordered proteins in which large parts of the protein may become ordered upon interacting with binding partners (Hilser and Thompson, 2007; Radivojac et al., 2007; Wright and Dyson, 1999).
In addition, a given protein can adopt different conformations for different ligands. A PDB study of 206 binding site pairs (each pair consisting of two structures of the same protein with different, similar ligands in the binding site) showed that in 83% of the cases there were significant conformational changes in the binding sites between pair members (Bostrom et al., 2006); changes were judged significant if the RMSD for all side-chain atoms if at least one amino acid residue within 5 Å of the ligand is greater than 1.0 Å. The most frequent differences were changes in water architecture and side-chain conformation (both occurring in over 50% of the pairs). Significant backbone conformational changes occurred in only 7% of the set; changes were judged significant if the RMSD of at least one backbone heavy atom in three or more consecutive amino acids is more than 0.5 Å. A smaller study found examples of substantial conformational changes on binding similar ligands for a variety of systems as well (Kim, 2007a). It is even possible for a single ligand to bind to different protein conformations under different solution conditions (Miller and Dill, 1997). Thus, changes in binding site architecture, at least at the side-chain and water level, should be regarded as the rule, rather than the exception.
Some computational studies predict that even when a binding site structure is not perturbed very much, its energetics can change substantially. For example, simulations show that neglecting even small protein motions can lead to large errors (RMS errors relative to experiment of nearly 20 kcal/mol when protein motions are not allowed), relative to much smaller errors (1.7 kcal/mol RMS) when protein motion is allowed. Even small relaxations of the protein reduced the RMS errors to 4–5 kcal/mol) (Mobley et al., 2007b). This is important for both conceptual and practical reasons. Conceptually, it means the strength or quality of binding interactions is sensitive to minute details of the bound structure and is not easily assessed by simple metrics like hydrogen bond counts or apparent fit. Practically, it means that free energy methods that include these protein conformational changes can potentially have much higher accuracy than docking methods that neglect them.
The conclusion that these small changes can make big differences in the energetics is supported by a variety of docking and (re)scoring studies that have looked at the effects of introducing small amounts of protein flexibility. Though scores do not necessarily improve for all ligands, they do typically change substantially, showing some improvement (Graves et al., 2008; Huang et al., 2006; Meiler and Baker, 2006; Sousa et al., 2006; Wei et al., 2004). But introducing protein flexibility without accounting for protein strain energies can potentially increase false positive rates by making binding sites too permissive (Graves et al., 2008; Sousa et al., 2006; Wei et al., 2004). This likely highlights the role of conformational change and strain.
Several detailed binding free energy studies have suggested that differences in solvation may play an important role in differences in binding free energy between relatively similar compounds (Jiao et al., 2008; Reddy and Erion, 2001). Two molecules might have similar interactions with a protein, similar strain energies, etc., but have different solvation properties in water, leading to solvation-driven differences in binding free energies. These differences may not always be intuitive. For example, the N-methylacetamide/amine “problem” (Rizzo and Jorgensen, 1999) suggests that adding a hydrophobic methyl group to acetamide or ammonia increases the affinity for water, whereas subsequent methylations decrease the affinity.
The importance of solvation and desolvation is supported by an emerging trend toward including approximate estimates of solvation/desolvation energies in approximate docking methods for scoring protein-ligand binding. Including such estimates appears to result in improved scoring (Ferrara et al., 2004; Gilson and Zhou, 2007; Shoichet et al., 1999). Without these contributions, charged ligands can wrongly appear to bind better than polar ligands in a polar binding site. A charged ligand may make favorable electrostatic interactions in a polar binding site, but it also costs a huge amount of energy to remove it from water (Brenk et al., 2006; Gilson and Zhou, 2007; Shoichet et al., 1999). In other cases, a small modification to a ligand can potentially lead to affinity gains due to a change in the desolvation cost (Kangas and Tidor, 2001).
Computer simulations have been used to study the role of crystallographic waters in binding thermodynamics (Barillari et al., 2007; Hamelberg and McCammon, 2004; Lu et al., 2006; Olano and Rick, 2004; Zhang and Hermans, 1996; see also Helms and Wade [1995, 1998a, 1998b] for desolvation of a buried binding cavity). In many cases, binding or ordering of waters occurs concurrently with ligand binding, so it can be extremely difficult to experimentally assess the contribution of water binding to overall binding thermodynamics. Computational methods can directly compute the free energy of inserting or removing a water molecule from a binding site, providing key insight that is hard to obtain experimentally.
These computational studies indicate that bound waters contribute substantially to binding free energies, contributing as much as −10 kcal/mol for some waters (Barillari et al., 2007), but smaller values between −3 and −6 kcal/mol are more typical (Barillari et al., 2007; Hamelberg and McCammon, 2004; Lu et al., 2006; Olano and Rick, 2004; Zhang and Hermans, 1996). In some cases, crystallographic waters appear substantially unfavorable relative to bulk, raising the possibility of problems with refinement or force fields (Barillari et al., 2007; Olano and Rick, 2004). Perhaps ligands can be designed with improved affinities by recognizing nearby sites where waters can be easily displaced (Abel et al., 2008; Pan et al., 2007).
Sometimes ligand binding can involve concerted reordering of many water molecules. In some hydrophobic sites in proteins that bind fatty acids or lipids, whole networks of more than a half-dozen water molecules shift their structures to form a “hydrophobic” interface with the ligand (LaLonde et al., 1994; Sulsky et al., 2007).
Binding free energies can also be affected by other unseen and unexpected factors. For example, protonation states can change on binding (Czodrowski et al., 2007; Dullweber et al., 2001; Gohlke and Klebe, 2002; Steuber et al., 2007), as can tautomeric states (Pospisil et al., 2003) and other factors. In some cases, multiple protonation or tautomeric states can be relevant, as observed crystallographically for one CDK2 inhibitor (Furet et al., 2002) and hypothesized in another instance (Lee et al., 1996). “Similar” ligands may also adopt different protonation states on binding (Dullweber et al., 2001).
We have reviewed some recent computational studies of ligand binding to proteins. Ultimately, to predict accurate binding affinities, it will be necessary to go beyond predicting a single “dominant” conformation of the ligand complexed with the protein. Binding free energy is not driven by a single conformation, but rather by the free energy landscape. It is the shape of the energy landscape that is crucial, the shape and width of the minima influences entropies. Entropies are key contributors to binding thermodynamics and are not observable in single bound structures. Other factors about the full landscape also play key roles, such as multiple ligand poses, protein conformations, strain energies, changes in water structure, and solvation and protonation all play roles. And none of these are observable in single structures. Computational tools can help provide insight into the unseen landscape, so those doing crystallographic studies may want to complement their work by using computational tools to explore this landscape. And those relying on crystallographic data (e.g., in a drug design context) should be aware that there are various binding possibilities that might not be captured in a single crystal structure.
We thank Scott P. Brown (Abbott Laboratories), Sarah Boyce (University of California, San Francisco), John van Drie, and John D. Chodera (Stanford University), for help with references; Sarah Boyce, Eric Manas (GlaxoSmithKline), and Gabe Rocklin (University of California, San Francisco) for comments on the manuscript; and Sarina Bromberg for preparing figures. We acknowledge financial support of the National Institutes of Health (Grant GM34993 to K.A.D.).