Binding hot spots, protein regions with high binding affinity, can be identified by using X-ray crystallography or NMR spectroscopy to screen libraries of small organic molecules that tend to cluster at such hot spots. FTMap, a direct computational analogue of the experimental screening approaches, uses 16 different probe molecules for global sampling of the surface of a target protein on a dense grid and evaluates the energy of interaction using an empirical energy function that includes a continuum electrostatic term. Energy evaluation is based on the fast Fourier transform correlation approach, which allows for the sampling of billions of probe positions. The grid sampling is followed by off-grid minimization that uses a more detailed energy expression with a continuum electrostatics term. FTMap identifies the hot spots as consensus clusters formed by overlapping clusters of several probes. The hot spots are ranked on the basis of the number of probe clusters, which predicts their binding propensity. We applied FTMap to nine structures of hen egg-white lysozyme (HEWL), whose hot spots have been extensively studied by both experimental and computational methods. FTMap found the primary hot spot in site C of all nine structures, in spite of conformational differences. In addition, secondary hot spots in sites B and D that are known to be important for the binding of polysaccharide substrates were found. The predicted probe–protein interactions agree well with those seen in the complexes of HEWL with various ligands and also agree with an NMR-based study of HEWL in aqueous solutions of eight organic solvents. We argue that FTMap provides more complete information on the HEWL binding site than previous computational methods and yields fewer false-positive binding locations than the X-ray structures of HEWL from crystals soaked in organic solvents.
The analysis of ligand binding sites of proteins is often the starting point for function identification and drug discovery. The sites generally include smaller regions called hot spots that are major contributors to the binding free energy and hence are crucial to the binding of any ligand at that particular site.(1) In drug design applications, such hot spots can be identified by screening for the binding of fragment-sized organic molecules.(2) Since the binding of these small-molecule probes is very weak, it is usually detected by NMR spectroscopy(3,4) or X-ray crystallography.(2,5−8) Individual probe molecules can bind at a number of locations, but clusters of different probes occur only at hot spots.(2,4) Although the origin of this weakly specific binding is not fully understood, the phenomenon itself is well established. For example, using their "structure–activity relationships by NMR" method, Fesik et al. observed that for a diverse set of targets, nearly 90% of fragment-sized ligands bind exclusively to protein sites that are known also to bind druglike small molecules.(4) Similar conclusions have been reached using the multiple-solvent crystal structures (MSCS) method, which is based on determining the structure of a protein by X-ray crystallography in aqueous solutions of several organic solvents and superimposing the structures to identify clusters of overlapping probe molecules.(2,5−8)
The hot spots of the model protein hen egg-white lysozyme (HEWL) have been extensively studied by both experimental(3,9−13) and computational(14,15) methods. HEWL recognizes its substrate, polymeric carbohydrates from bacterial cell walls, in an active site that can accommodate up to six saccharide units, such as N-acetylglucosamine (NAG) or N-acetyl-D-muramic acid, in subsites A, B, C, D, E, and F. The X-ray structures of HEWL with various polysaccharides show that the most important sites, in the order of occupancy, are C, B, D, and A,(16−19) with C being the highest-affinity site.(20) X-ray structures have also been determined with a number of small organic molecules, including ethanol,(9) bromoethanol,(10) dimethyl sulfoxide (DMSO),(11) urea,(12) and acetonitrile (CCN).(13) CCN binds only at site C,(13) but all of the other compounds are found at a number of locations. However, overlapping the structures shows a cluster only at site C, which is also the highest-occupancy site in all structures, whereas the other binding sites are frequently located at crystal contacts.(11) These types of false-positive sites were eliminated by Liepinsh and Otting, who used 1H NMR spectroscopy to measure the nuclear Overhauser effect (NOE) between eight different small organic compounds and H atoms of HEWL.(3) They found all eight molecules in site C, interacting with residues Asn59, Trp63, Ile98, Ala107, and Trp108. This is the same site that Wang et al. identified as binding a single CCN molecule (PDB entry 2LYO).(13) The NMR-based mapping also found that residues Trp62, Val109, and Ala110 participate in the binding of probe molecules. Trp62 lies within site B, and Val109 and Ala110 lie within site D, in agreement with the X-ray data, which show Trp62, Ile98 (also part of site C), Asp101, and Asn103 in site B and Asn46, Asp52, Val109, and Ala110 in site D.(16)
Two recent studies focused on the computational identification of the main hot spot in site C. Lexa and Carlson performed molecular dynamics (MD) simulations on HEWL in a 50% CCN/50% water mixture.(14) Since CCN density was found in site C only when allowing for protein flexibility, they concluded that full flexibility is essential for proper hot-spot mapping. The problems with this conclusion are that binding of a single compound is not sufficient to identify a hot spot and that Lexa and Carlson used the CCN-bound structure of HEWL, in which the large and generally open binding site is somewhat contracted around the small ligand. This narrowing of the binding site may have also caused the failure to identify site C when the simulations were run assuming a rigid protein. To address these concerns, Guarnieri et al. performed grand canonical Monte Carlo simulations to map eight HEWL structures using eight different organic solvents as probes.(15) Individual simulations were run for each probe molecule without allowing for protein flexibility, and the bound probe positions were clustered to identify the hot spots as in the MSCS experiments. In this approach, the simulations were started in neat solvent and ended at very low concentrations, essentially in vacuum, without considering solvation effects. To eliminate false binding of probe molecules at water binding sites, simulations were also run using water as the probe molecule, and clusters of organic solvent molecules within 1 Å of a water cluster were discarded. In spite of this indirect approach, the simulations showed excellent results for each of the eight HEWL structures, with a single cluster of probe molecules found within site C, indicating that the main hot spot could be detected using a static protein.
In this work, we applied the protein mapping method FTMap(21) to the HEWL structures considered by Guarnieri et al.(15) as well as the HEWL apo structure. FTMap has three major advantages over the other two computational methods. First, it directly samples the potential probe binding sites on a dense grid around the target protein using empirical free energy functions that model the competition with water by adding a continuum electrostatic term. By accounting for solvation while sampling, the method avoids the need for a separate calculation of water binding positions. Second, the sampling is extremely efficient because of the use of fast Fourier transforms in energy evaluations, which allows for the sampling of billions of probe positions on the protein surface. Because of the resulting efficiency, we have made FTMap available as a public server, in contrast to the other two methods, which require lengthy runs and substantial computational resources. Third, since we can map a protein using many different probe molecules, FTMap reliably identifies all hot spots of a target protein while eliminating false positives.(21,22) In this manner, both primary and secondary hot spots are identified, which is crucial for accurate protein mapping since, as shown for HEWL, the secondary hot spots in sites B and D are known to be important for ligand binding.
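The efficiency argument can be illustrated with a toy example. The sketch below scores every translation of a small one-dimensional "probe" grid against a "receptor" grid, once by direct summation and once through the correlation theorem. The periodic 1D grids, the function names, and the grid values are illustrative assumptions and do not reproduce FTMap's energy terms; a naive discrete Fourier transform stands in for a true FFT to keep the example free of dependencies.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(
            v * cmath.exp(sign * 2j * cmath.pi * k * x / n)
            for x, v in enumerate(values)
        )
        result.append(total / n if inverse else total)
    return result


def score_direct(receptor, probe):
    """E(t) = sum_x receptor[x] * probe[x - t], computed for every shift t."""
    n = len(receptor)
    return [
        sum(receptor[x] * probe[(x - t) % n] for x in range(n))
        for t in range(n)
    ]


def score_fourier(receptor, probe):
    """Same scores from one forward transform per grid and one inverse."""
    fr = dft(receptor)
    fp = dft(probe)
    product = [a * b.conjugate() for a, b in zip(fr, fp)]
    return [value.real for value in dft(product, inverse=True)]


# Hypothetical grids: a favorable (negative) well around index 2.
receptor_grid = [0.0, -1.0, -3.0, -1.0, 0.0, 2.0, 0.0, 0.0]
probe_grid = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

scores = score_fourier(receptor_grid, probe_grid)
best_shift = min(range(len(scores)), key=scores.__getitem__)
```

With a real FFT, the transform-based route costs on the order of N log N operations per rotation for all N translations at once, instead of N² for direct summation, which is what makes dense grid sampling tractable.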
The FTMap server (http://ftmap.bu.edu/) currently uses 16 small molecules as probes (ethanol, isopropanol, isobutanol, acetone, acetaldehyde, dimethyl ether, cyclohexane, ethane, acetonitrile, urea, methylamine, phenol, benzaldehyde, benzene, acetamide, and N,N-dimethylformamide). FTMap performs four steps as follows. (1) The rotational/translational space of each probe is systematically sampled on a grid around the fixed protein, consisting of 0.8 Å translations and 500 rotations at each location. The energy function includes a stepwise approximation of the van der Waals energy with attractive and repulsive contributions and an electrostatics/solvation term based on the Poisson–Boltzmann continuum model with dielectric constants (ε) of 4 and 80 for the protein and solvent, respectively. The 2000 best poses for each probe are retained for further processing. (2) The 2000 complexes are refined by off-grid energy minimization during which the protein atoms are held fixed while the atoms of the probe molecules are free to move. The energy function includes the bonded and van der Waals terms of the CHARMM potential(23) and an electrostatics/solvation term based on the analytic continuum electrostatic (ACE) model,(24) as implemented in CHARMM. (3) The minimized probe conformations are grouped into clusters using a simple greedy algorithm and a 4 Å root-mean-square deviation clustering radius. Clusters with fewer than 10 members are excluded from consideration. The retained clusters are ranked on the basis of their Boltzmann-averaged energies. The six clusters with the lowest average energies are retained for each probe. (4) To determine the hot spots, FTMap finds consensus sites (CSs), which are regions on the protein where clusters of different probes overlap.(21) To this end, the probe clusters are clustered again using the distance between the cluster centers of mass as the distance measure and 4 Å as the clustering radius.
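Step 3 can be sketched in a few lines. The text specifies only "a simple greedy algorithm", so the seeding rule used here (the lowest-energy unassigned pose starts each cluster), the value of kT, and all function names are assumptions made for illustration. The RMSD is computed without superposition, over poses of the same probe with identical atom ordering.

```python
import math

KT = 0.596  # kcal/mol at roughly 300 K; assumed, not stated in the text


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two poses with matching atoms."""
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def greedy_cluster(poses, radius=4.0):
    """poses: list of (energy, coords). Assumed rule: seed with lowest energy."""
    remaining = sorted(poses, key=lambda pose: pose[0])
    clusters = []
    while remaining:
        seed = remaining[0]
        members, rest = [], []
        for pose in remaining:
            target = members if rmsd(seed[1], pose[1]) <= radius else rest
            target.append(pose)
        clusters.append(members)
        remaining = rest
    return clusters


def boltzmann_average(energies, kt=KT):
    """Boltzmann-weighted mean energy, shifted by the minimum for stability."""
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    return sum(e * w for e, w in zip(energies, weights)) / sum(weights)


def top_clusters(poses, radius=4.0, min_size=10, keep=6):
    """Drop small clusters, rank by Boltzmann-averaged energy, keep the best."""
    clusters = [c for c in greedy_cluster(poses, radius) if len(c) >= min_size]
    clusters.sort(key=lambda c: boltzmann_average([e for e, _ in c]))
    return clusters[:keep]
```

The defaults mirror the values given above: a 4 Å radius, a minimum of 10 members, and six retained clusters per probe.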
The CSs are ranked on the basis of their numbers of clusters, with duplicate clusters of the same type also considered in the count. The largest CS defines the most important hot spot, with smaller CSs identifying secondary hot spots that generally also contribute to ligand binding. It was shown for a large variety of proteins that the CSs determined by this algorithm agree very well with the hot spots identified by X-ray crystallographic or NMR techniques.(21,22,25−27)
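The consensus-site step and its ranking can be sketched in the same spirit. The seeding rule below (start each site from the cluster center with the most neighbors within the radius) is an assumption, as are the function name and the example coordinates; the text specifies only the distance measure, the 4 Å radius, and ranking by cluster count.

```python
import math


def consensus_sites(cluster_centers, radius=4.0):
    """Group probe-cluster centers of mass into consensus sites.

    cluster_centers: list of (probe_name, (x, y, z)).
    Returns the sites sorted by number of clusters, largest first, so that
    the first entry plays the role of CS1.
    """
    remaining = list(cluster_centers)
    sites = []
    while remaining:
        # Assumed seeding rule: the center with the most neighbors in range.
        seed = max(
            remaining,
            key=lambda c: sum(
                math.dist(c[1], other[1]) <= radius for other in remaining
            ),
        )
        members = [c for c in remaining if math.dist(seed[1], c[1]) <= radius]
        remaining = [c for c in remaining if math.dist(seed[1], c[1]) > radius]
        sites.append(members)
    # Every cluster counts, including several clusters of the same probe type.
    sites.sort(key=len, reverse=True)
    return sites


# Hypothetical cluster centers for illustration only.
example_centers = [
    ("ethanol", (0.0, 0.0, 0.0)),
    ("acetone", (1.0, 0.0, 0.0)),
    ("urea", (0.0, 1.5, 0.0)),
    ("benzene", (10.0, 0.0, 0.0)),
    ("phenol", (11.0, 0.0, 0.0)),
    ("ethane", (30.0, 0.0, 0.0)),
]
ranked_sites = consensus_sites(example_centers)
```

In this toy input, the three overlapping clusters near the origin form the top-ranked site, the pair near x = 10 Å forms a secondary site, and the isolated cluster is ranked last.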
The nine different HEWL structures mapped are summarized in Table 1. In short, the structure derived from PDB entry 2LYO corresponds to the experiment of Wang et al.,(13) with the singly bound CCN molecule residing in site C; the corresponding apo structure is also listed (PDB entry 2LYM).(13) All of these structures except the apo structure were also studied by Guarnieri et al.(15) Prior to mapping, all water molecules and heteroatoms were removed from the structures.
For all nine structures, FTMap predicted that the highest-ranked CS, CS1, occupies site C (Table 1). In particular, FTMap successfully identified site C as the top-ranked CS for the unbound HEWL structure (Figure 1A). Furthermore, in all cases, CS1 was found to have a very high probe cluster count, which is a good indicator of druggability.(4,27) Besides correctly locating site C, FTMap was also able to predict additional hot spots in sites B and D (Figure 1B and Table 1), as identified by the NMR mapping studies.(3)
In all cases, FTMap correctly identified the main hot spot (site C) as CS1. However, inspection of Table 1 does reveal some differences in both the total probe cluster count and the secondary site assignments, particularly for structures 1XEI, 1XEJ, and 1XEK. This series of structures has a collapse of the binding site cavity from largest (1XEI) to smallest (1XEK), and that size difference is reflected by the trend in the total probe cluster count (Table 1), which also decreases from 59 to 41. Furthermore, no CSs were found for site B for structures 1XEJ and 1XEK. This result can be explained by the rotation of Trp62 by ~90° around Cβ, thereby blocking off the top of the cavity (Figure 2). Despite this conformational change, the primary hot spot was still detected without accounting for flexibility in the computational mapping.
The CSs predicted by FTMap were also in agreement with known ligand binding sites. Figure 1C illustrates the overlap between the CSs and polysaccharide substrates (shown as sticks) from three X-ray structures. To quantify further the similarity between the mapping results and known ligands, the distribution of nonbonded interactions between the protein residues and probe molecules was calculated. Figure 3 shows the frequency of contacts between the probe molecules and each residue in the ligand binding site of HEWL. As shown, Ala107, Gln57, Trp63, Asn59, Trp108, and Ile98 participate in the highest number of nonbonded interactions with the probe molecules. These residues are the same as those identified using NMR spectroscopy, which are indicated by an asterisk in Figure 3.(3) For comparison, Figure 3 also shows interaction frequencies observed in eight HEWL structures cocrystallized with polysaccharide ligands ranging from (NAG)3 to (NAG)6. Three of these substrates are shown in Figure 1C. Figure 3 confirms that the residues with the highest percentage of nonbonded interactions are in good agreement with those predicted by the mapping. For both the mapping results and the ligands, the residues with the highest percentage of nonbonded interactions are in site C, with those in site B also exhibiting a high frequency of contacts. Indeed, the ligands interact with a number of residues outside of site C, all of which are identified by the CSs predicted by FTMap. Thus, FTMap identified residues in both the primary and secondary hot spots that are important for ligand binding, in contrast to the other two computational methods, which restricted consideration to site C.(14,15)
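A per-residue contact frequency of the kind described above could be tallied roughly as follows. The distance cutoff, the function name, and the coordinates are hypothetical; the text does not state how nonbonded interactions were defined, so this is only one plausible reading.

```python
import math
from collections import Counter


def contact_frequencies(residue_atoms, probe_atoms, cutoff=4.0):
    """Percentage of all probe-protein contacts contributed by each residue.

    residue_atoms: dict mapping a residue label to a list of (x, y, z) atoms.
    probe_atoms: list of (x, y, z) coordinates of probe atoms.
    cutoff: contact distance in angstroms (assumed value).
    """
    counts = Counter()
    for residue, atoms in residue_atoms.items():
        for atom in atoms:
            counts[residue] += sum(
                math.dist(atom, probe) <= cutoff for probe in probe_atoms
            )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {res: 100.0 * n / total for res, n in counts.items() if n > 0}


# Hypothetical coordinates for illustration only.
example_residues = {
    "Trp63": [(0.0, 0.0, 0.0)],
    "Asn59": [(3.0, 0.0, 0.0)],
    "Lys1": [(50.0, 0.0, 0.0)],
}
example_probes = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
frequencies = contact_frequencies(example_residues, example_probes)
```

The same function can be applied to ligand atoms from the cocrystal structures, which allows the two distributions to be compared residue by residue.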
The two previous studies correctly placed CCN probes within the CCN binding site (site C).(14,15) FTMap also predicted that the top CS, CS1, contains at least one cluster of CCN molecules for each of the nine structures, with the mapping results from the 2LYO, 1IR8, 1LSY, 1XEI, and 1XEJ structures containing two CCN clusters. To quantify the position of CCN within each cluster in relation to that found in the bound structure (PDB entry 2LYO), the distance between the middle carbons of the lowest-energy CCN probe in the highest ranking FTMap-predicted cluster and the CCN molecule from the 2LYO structure was measured for each structure (Table 1). In seven of the nine cases, this distance is 1.5 Å or less, which indicates very good agreement considering that we mapped ligand-free HEWL structures. For the mutant 1IR8 and 1IR9 structures, the antiparallel β-sheet located near Ile58 somewhat protrudes into the binding cavity, thereby slightly altering the position of the CCN. In each case, the probe representative located closest to that from the 2LYO structure has the lowest energy. As shown in Figure 4, the CCN position predicted by FTMap for the apo structure of HEWL is nearly identical to that found in the MD simulations by Lexa and Carlson for the holo structure(14) as well as to the one in the X-ray structure (the CCN molecule and density from the X-ray crystallographic data are shown in cyan).(13) However, as already discussed, we emphasize that the bound position of a single probe compound is not necessarily a hot spot and that hot spot identification requires simulations with at least six to eight different probes.(2,21) Since MD requires substantial computational resources, it would have been very difficult to carry out such a large number of simulations.
In addition, while both MD and FTMap accurately predicted the CCN binding site, to obtain results in agreement with the experimental data, the MD simulations had to allow for protein flexibility.(14) Without full flexibility, multiple CCN binding sites were located, with only weak occupancy in site C, suggesting that the simulations did not provide adequate sampling.
FTMap was able to identify the primary hot spot of HEWL as the top-ranked consensus site in nine different structures of HEWL. This hot spot coincides with site C of HEWL, which is known to be a key site for ligand binding.(3,13) Furthermore, each top-ranked CS included at least one cluster of CCN molecules in close proximity to the experimentally determined position of CCN in the 2LYO structure. The results confirm that when a diverse set of probes is used in computational mapping, the primary hot spot is identified and spurious minima are avoided. Although considering protein flexibility may become important for detailed characterization of the binding site,(27) the hot spots showed remarkable robustness to conformational changes and were consistently found in a variety of HEWL structures. An additional advantage of FTMap is that, in contrast to previous computational methods,(14,15) it also detected secondary hot spots in sites B and D that are known to be important for the binding of polysaccharide substrates. The predicted probe–protein interactions agree well with those seen in complexes of HEWL with various polysaccharides and also agree with the results of an NMR-based study of the protein in aqueous solutions of eight different organic solvents.(3) We thus conclude that FTMap provides more complete information on the HEWL binding site than the two recently published computational methods that focused only on site C.(14,15) As discussed, soaking of HEWL crystals in an organic solvent such as ethanol(9) or DMSO(11) shows binding at site C but also at a number of locations, primarily in crystal contacts, that are clearly false positives. Thus, it appears that the computational approach is even more reliable than the X-ray-based one. FTMap has been implemented as a server, which is freely available at http://ftmap.bu.edu/.
This investigation was supported by Grant GM064700 from the National Institute of General Medical Sciences.
National Institutes of Health, United States
These authors contributed equally.