Search tips
Search criteria 


Logo of proengLink to Publisher's site
Protein Eng Des Sel. 2016 September; 29(9): 377–390.
Published online 2016 August 8. doi:  10.1093/protein/gzw035
PMCID: PMC5001139

An in silico algorithm for identifying stabilizing pockets in proteins: test case, the Y220C mutant of the p53 tumor suppressor protein


The p53 tumor suppressor protein performs a critical role in stimulating apoptosis and cell cycle arrest in response to oncogenic stress. The function of p53 can be compromised by mutation, leading to increased risk of cancer; approximately 50% of cancers are associated with mutations in the p53 gene, the majority of which are in the core DNA-binding domain. The Y220C mutation of p53, for example, destabilizes the core domain by 4 kcal/mol, leading to rapid denaturation and aggregation. The associated loss of tumor suppressor functionality is associated with approximately 75 000 new cancer cases every year. Destabilized p53 mutants can be ‘rescued’ and their function restored; binding of a small molecule into a pocket on the surface of mutant p53 can stabilize its wild-type structure and restore its function. Here, we describe an in silico algorithm for identifying potential rescue pockets, including the algorithm's integration with the Dynameomics molecular dynamics data warehouse and the DIVE visual analytics engine. We discuss the results of the application of the method to the Y220C p53 mutant, entailing finding a putative rescue pocket through MD simulations followed by an in silico search for stabilizing ligands that dock into the putative rescue pocket. The top three compounds from this search were tested experimentally and one of them bound in the pocket, as shown by nuclear magnetic resonance, and weakly stabilized the mutant.

Keywords: cancer, docking, drug design, ligand, pharmacophore, small-molecule, visual analytics


The p53 tumor suppressor protein (Fig. (Fig.1) is1) is a DNA-binding protein that initiates cell cycle arrest and apoptosis in the presence of various oncogenic factors (Vogelstein et al., 2000). It is a well-studied cancer target as >50% of human cancers involve a missense mutation of its gene with the majority of those mutations located in the central DNA-binding domain (Joerger et al., 2006). These mutations can be broken out into two general categories: ‘contact’ mutations that alter DNA binding, and ‘structural’ mutations that affect the protein's structural integrity (Cho et al. 1994). Wild-type (WT) p53 protein is only moderately stable at physiological temperatures, and the ‘structural’ mutations destabilize it further, causing it to unfold and lose its tumor-suppressing functionality (Bullock et al., 2000).

Fig. 1
The DNA-binding core domain of the p53 tumor suppressor protein. The DNA-binding region is shown at the top and the Y220C mutation site discussed herein is shown at the bottom of the beta-sandwich structure.

Destabilized p53 mutants can be ‘rescued’, or re-stabilized such that they can resume their anti-tumor functionality; naturally this has become a point of interest in the search for new cancer therapies. These mutants can be rescued both by specific rescue mutations (Brachmann et al., 1998) (mutations that stabilize, rather than destabilize, the protein) and by rescue ligands, small molecules that dock into pockets on the surface of the protein and stabilize it (Boeckler et al., 2008). But, it is not always clear where on the protein these rescue regions and rescue pockets exist. Protein regions amenable to stabilization by a ligand have been uncovered by visual analysis, analyzing known stabilizing ligands or stabilizing mutations, as discussed above, and, as shown by Wassman et al., by analyzing homologous proteins and associated stabilizing ligands (Wassman et al., 2013).

p53 stabilization is a validated approach to rescuing and reactivating p53 mutants, yet there is no general method for identifying rescue pockets that are capable of protein-stabilization, particularly if the pockets are not present in an experimental structure. Here, we propose an in silico algorithm for identifying potential rescue pockets from molecular dynamics (MD) simulation data. MD computationally simulates the forces involved in protein dynamics and produces time-series snapshots of protein structures. The MD data used here were part of the Dynameomics project, an ongoing project designed to elucidate the unfolding pathways and native dynamics of representatives of all known protein folds (Beck et al., 2008; Van der Kamp et al., 2010).

Our hypothesis is that potential protein rescue regions are those regions of the protein whose native contacts become destabilized due to mutation. This was supported and motivated by our original observations that the most disrupted region of the Y220C mutant simulations coincided with an experimental pocket capable of docking stabilizing ligands (discussed below). Using contacts as proxies for structure, and leveraging the basic tenants of protein structure → protein function, we postulated that re-asserting native-like contacts, possibly via ligand-mediation, could re-assert native-like function. We used MD simulation data and our in-house contact-analysis tool Contact Walker (Bromley et al., 2013) to compare the inter-residue contact data of mutant and WT p53. We then computationally identified regions of the mutant protein that were destabilized relative to WT and were therefore potential rescue regions.

Once a destabilized region was identified, it was necessary to find a pocket or crevice that permitted access to that region. Proteins are dynamic molecules and, as such, the surface topology of any given region changes continuously. Quantifying and analyzing this dynamic topology is a strength of MD. Because our primary data originated from MD simulations, we had access to hundreds of thousands of protein structures, each with multiple pockets. To make such pocket-data analysis tractable, we created a pocket database and made them accessible via a front-end software tool.

To test our ability to find rescue pockets, we analyzed MD simulations of the core DNA-binding domain of the p53 WT and Y220C mutant proteins. The Y220C mutation was selected for analysis because previous work has identified pockets on its surface that dock stabilizing ligands (Joerger et al., 2006; Boeckler et al., 2008); as a result, Y220C acts as a positive control to test our method's ability to predict and identify potential rescue pockets. After identifying a putative rescue pocket, we identified three possible rescue ligands and compared both the pocket and the ligands to experimentally validated rescue pockets and ligands, and we determined whether the selected ligands were able to successfully interact with the putative rescue pocket and displayed a weak stabilizing effect on the p53 Y220C mutant. Although there are many examples of MD simulations of p53 (Basse et al., 2010; Barakat et al., 2011; Calhoun and Daggett, 2011; Lukman et al., 2013; Wassman et al., 2013), we build upon this earlier work using multi-simulation contact-disruption analysis to identify potential rescue regions and rescue pockets, which also sheds light on the structural and dynamic causes of the mutation-induced destabilization.


MD simulations

MD simulations were performed using in lucem molecular mechanics (ilmm) (Beck et al., 2000–2016), our in-house modeling package, with the Levitt et al. force field and associated procedures (Levitt et al., 1995, 1997; Beck et al., 2000–2016; Beck and Daggett, 2004). Starting structures of the WT p53 DNA-binding region were obtained using the protein data bank (PDB) (Berman et al., 2000) crystal-structure 2ocj (Wang et al., 2007, 2.05 Å resolution, chain A). Structures for the Y220C and R175H mutants were created by in silico point mutation. As zinc is significantly dissociated from p53 under physiological conditions, as well as in the presence of destabilizing mutations (Butler and Loh, 2003; Loh, 2010), we focus primarily on 310 K zinc-free (apo) simulations. The holo-WT protein was also simulated at 298 K for comparison with experiment. Each simulation was performed in triplicate, in explicit solvent and at neutral pH (negatively charged Asp and Glu, positive Arg and Lys, and neutral His). The apo 310 K simulations were performed for at least 51 ns and the holo 298 K WT simulations were performed for at least 100 ns.

To begin, the starting models were minimized for 1000 steps using steepest-descent minimization. Models were placed in a periodic box with walls at least 10 Å from all protein atoms, and then solvated using the F3C water model. Solvent density was 0.993 or 0.997 g/mL, the experimental values for water density at 310 and 298 K, respectively (Kell, 1967). Water minimization followed for another 1000 steps, followed by water-only dynamics at 298 or 310 k for another 500 steps, 500 steps of solvent minimization and then 500 steps of minimization of both the solvent and the protein.

Initial atomic velocities were assigned from a low-temperature Maxwellian distribution, after which the temperature was increased to 298 or 310 K. The non-bonded cutoff was set to 10 Å and the interaction list was updated every 2 steps. The NVE (constant number of particles, volume and energy) microcanonical ensemble with the number of particles, energy and volume fixed was employed. Structures were saved every 500 steps for analysis (1 ps). The p53 DNA-binding domain is an immunoglobulin-like β-sandwich fold, which is represented by Rank 1 of our Consensus Domain Dictionary (Day et al., 2003; Schaeffer et al., 2011).

Simulation analysis

Contact analysis, Cα root-mean-squared-fluctuation (RMSF), average properties and other aggregate analyses were performed over the last 35 ns of the 310 K apo simulations to allow for equilibration. The 298 K holo-WT simulations were slower to equilibrate and Nuclear Overhauser Effect (NOE) connectivities and solvent accessible surface area (SASA) data for comparison with experiment were calculated over the last 25 ns of the 100 ns simulations.

Structure alignment for Cα root-mean-squared-deviation (RMSD) and Cα RMSF analysis used the β-strand residues. Contact analysis was heavy atom-only and used a 5.4 Å cutoff for carbon–carbon contacts and a 4.6 Å cutoff for all others. Only one interatomic contact was required for residue: residue contact and contacts between adjacent residues were not considered. Secondary-structure comparison in the pocket search tool used define secondary structure of proteins (Kabsch and Sander, 1983) secondary-structure assignment. Images were created using Pymol (DeLano, 2002) and Accelrys Discovery Studio 3.5 Visualizer (

Pocket database and search tool

Initial pocket finding was performed at 1 ps granularity on the 310 K apo simulations of WT p53 and the Y220C mutant; pocket finding was performed using the EPOS BALLPass software ( To make this data set more usable, the pocket data were loaded into a relational database. This database first standardized the data and then organized them in a pocket-centric manner. Each pocket was identified with a unique number, associated with pocket properties such as volume and polarity, and linked via simulation identifier, structure identifier and simulation time step back to the original Dynameomics data warehouse (Beck et al., 2008; Simms et al., 2008; Van der Kamp et al., 2010).

A second set of data tables, linked by the unique pocket identifier, detailed which atoms were involved in each pocket. These data included each atom's name and type, residue number and type, and flags indicating if an atom is a heavy atom and/or if it is part of the residue main chain or side chain. Finally, the atoms were assigned atom identifiers from the original Dynameomics MD trajectories, allowing each atom and residue to be analyzed in its original dynamic context, and in association with the standard Dynameomics analyses such as SASA, Cα RMSF and dihedral angle analysis. The database was built using Microsoft SQL Server (

A search-tool based on the DIVE visual analytics system was built to provide a front-end user-interface to the pocket database (Bromley et al., 2014; Rysavy et al., 2014). Given one or more simulation identifiers, one or more residue numbers, and optional minimum and maximum pocket volumes, the tool searches the pocket database and returns pockets from the given simulations whose pocket-lining atoms belong to the specified residues and whose volumes lie within the specified volume-range. It then aligns and clusters these pockets based on heavy atom RMSD of the specified search residues and produces a hierarchical-clustering dendrogram plot, useful for identifying inter-pocket structural-similarities. As an option, experimental structures can be integrated into this search by specifying one or more external PDB files. These structures are aligned and clustered along with the dynamic pockets, allowing researchers to identify the dynamic pockets whose structures most closely match known experimental pockets.

The search tool can also optionally output docking files suitable for analysis by Autodock Vina (Trott and Olson, 2010). These files include the protein structures converted to the required file format, optional flexible side chains and a configuration file specifying the search box around the selected pocket. The only remaining information necessary for performing Vina docking is a ligand structure file. Finally, the search tool outputs a comma-separated value file detailing the pockets in each cluster and where the pocket structures can be found in the database, in the Dynameomics data warehouse and/or on disk. The pocket search results also include the pocket volumes, the time in the trajectory at which they occurred, their degree of secondary-structure overlap with the simulation starting structure and the total SASA of the specified residues. This pocket analysis pipeline is illustrated in Fig. Fig.2.2.

Fig. 2
Overview of pocket discovery pipeline.

Contact analysis

Searching the pocket database requires a list of pocket-associated residues. In the presence of experimental data, these residues can be specified by analyzing which residues make contact with a particular ligand, for example, or by identifying the residues that line a known pocket cavity. In the absence of experimental data, however, pocket residues must be identified in some other manner.

The Contact Walker tool was originally designed to identify regions of a protein that had become destabilized by mutation (Bromley et al., 2013). Briefly, it uses MD trajectory data and compares the inter-residue heavy-atom contact-occupancy between WT and mutant proteins. Contact Walker then produces a connected-graph diagram that shows which contacts become disrupted as a result of the mutation and whether the disruption stabilized them (increased occupancy) or destabilized them (decreased occupancy). Contact Walker can aggregate data from multiple simulations to show the minimum demonstrated occupancy change between WT and mutant simulations, i.e. for each contact, it shows the most conservative estimate of occupancy change.

Ligand docking

Ligand docking was performed using Autodock Vina version 1.1.2 and UCSF Dock Blaster (Irwin et al., 2009). UCSF Dock Blaster uses UCSF DOCK (Lorber and Shoichet, 1998, 2005) version 3.6 to perform high-throughput virtual screening of the entire ZINC (Irwin and Shoichet, 2005) ‘leads-now’ database into a user-specified pocket. We utilized the ZINC ‘leads-now’ database containing 2 395 805 lead-like molecules (‘lead-like’ indicates a molecular weight between 250 and 350 g/mol, xlogp ≤ 3.5, and 7 or fewer rotatable bonds). The Dock Blaster service returns the top 500 docking hits. Molecule file conversion was performed using Autodock Tools (Morris et al., 2009) and OpenBabel (O'Boyle et al., 2011). Ligand-protein polar contacts were identified using Pymol with a cutoff of 3.3 Å (Dock Blaster default).

Small molecule probing

Small-molecule energy probing was used to determine if the selected pocket would admit a benzene moiety into the pocket volume vacated when Y220 was mutated. Minimization and energy calculations for small-molecule probing was performed in vacuo using our ilmm molecular modeling package (Beck et al., 2000–2016) and the Levitt et al. force field (1997). The specific probe algorithm has been described previously (Bernard, 2006). Briefly, benzene molecules were placed in a regular 1 Å grid around the pocket residues; grid points that were more than 4.2 Å away from a protein residue or closer than 2.4 Å to a protein residue were discarded. At each probe point, the benzene molecule was rotated evenly around a sphere to calculate energies in different positions. For each rotation, steepest-descent minimization was performed for 10 steps. The probe location was reported as the geometric center of the benzene atoms.

NOE comparison

Comparison of experimental NOE connectivities with MD was performed using 298 K nuclear magnetic resonance (NMR) data (PDB code: 2fej, Canadillas et al., 2006) and our 298 K holo simulations. An NOE was considered satisfied if the average inter-hydrogen r−6 distance was less than 5 Å over the production periods of the simulations, whichever was larger.

Experimental assessment of binding

Stabilized p53-Y220C DNA-binding domain (residues 94–312) was expressed and purified as described previously (Wilcken et al., 2012). Protein melting temperatures were determined via differential scanning fluorimetry (DSF) using 8 µM protein (stabilized p53-Y220C DNA-binding domain) and 10× SYPRO orange (Life Technologies) in a 25 mM KPi pH 7.2, 150 mM NaCl, 1 mM TCEP assay buffer at a final DMSO concentration of 5% (v/v) (Wilcken et al., 2012). ΔTm values were calculated as: ΔTm = Tm (protein + compound)−Tm (protein). All samples were measured in triplicate. 1H/15N-HSQC spectra of 15N-labeled p53-Y220C DNA-binding domain (75 μM) and compounds were recorded and analyzed as described previously (Kaar et al., 2010; Wilcken et al., 2012).

Results and discussion

Comparison of 298 K holo simulations with experiment

Because experimental structural data are not available for 310 K apo-p53, we first checked our in silico protein system and procedures by simulating the p53 holo-WT protein at 298 K and comparing those simulations to experimental 298 K NMR data on holo-protein. Three independent 100 ns simulations were performed and the native structures were maintained throughout the simulations and no unfolding was observed. However, β-structure in the three-residue strands S1 and S5 was fragmented and dynamic. But, the protein main chain retained its basic structure and no significant restructuring occurred. The largest consistent deviations from the starting structure occurred in the S7/S8 loop. Holo-WT 298 K simulations satisfied 89% of the published 298 K NOEs on holo-protein and 45% of residue pairs containing an NOE violation also contained at least one NOE satisfaction. Comparison of the SASA of the experimentally derived NMR models and the WT simulations yielded a correlation coefficient of R = 0.79.

Analysis of 310 K WT apo simulations

The average Cα RMSD from the starting structure for the three 310 K apo-WT simulations were 3.7 ± 0.3 Å, 3.9 ± 0.3 Å and 3.6 ± 0.4 Å. The per-residue Cα RMSD and Cα RMSF values are shown in Fig. Fig.3.3. The primary regions of deviation from the starting structure were the loop-sheet-helix region, L2 and L3, particularly around the zinc-binding residues, and the S7/S8 loop. The Cα RMSF followed a similar pattern with the largest fluctuation values demonstrated in the loop-sheet-helix region, L2, L3, the S7/S8 loop and the S6/S7 loop. Secondary-structure analysis indicated that while most structures in the larger S6/S7/S4/S9/S10/S2’/S2 β-sheet were maintained, the smaller S1/S3/S8/S5 β-sheet demonstrated loss of structure, as did the H1 helix. In one simulation, S1 and S3 adopted α-sheet secondary structure, which has been proposed to play a role in protein aggregation (Armen et al., 2004; Daggett, 2006), and the L3 loop gained helical content around residue G245. In another simulation, the region around H168 gained helical structure, although it should be noted that the crystal structure of this region contains enough helical content that some algorithms characterize it as helical and others do not. In the third simulation, distance measurements between the Cα atoms of the K120 and R280 DNA contacts indicated that L1 separated from H2 by 22 Å, disrupting the loop-sheet-helix DNA-binding region.

Fig. 3
Per-residue average Cα RMSD and Cα RMSF of the p53 WT and Y220C mutant.

Validation of search tool functionality

The pocket search tool was designed to search a database and recover pockets with certain structural characteristics such as volume and residue composition (i.e. certain residues play a role in the structure of the pocket). To investigate the search-capabilities of our tool, we searched the Y220C simulations for a pocket similar to the experimentally validated ligand-docking pocket found in the x-ray crystal structure of the p53 Y220C mutant (PDB: 4AGQ, Wilcken et al., 2012). If our MD simulations contained the pocket, the pocket search tool was expected to recover it. In the Y220C ligand-stabilized crystal structure, residues 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 220, 221, 222, 223, 228, 229 and 230 were identified as having heavy atoms within 5.4 Å of the stabilizing ligand. We directed our search tool to find any dynamic pockets involving these residues. The pocket search tool discovered 170 MD structures that contained a pocket matching these specifications. Approximately 150 000 structures and 2 million pockets were searched to arrive at this set. The final pockets were all less than 3 Å heavy-atom RMSD from their position in the crystal structure (RMSD was calculated using heavy atoms from the 18 specified residues). Figure Figure44 shows a selection of the pockets discovered.

Fig. 4
In silico search tool recovery of experimental stabilizing pocket. Experimental structure (magenta) and five recovered MD pockets (gray). The experimental ligand is overlaid on each structure for orientation. Simulation time points are shown for the MD ...

Our original hypothesis was that Contact Walker could identify potential rescue regions by identifying native contacts that had become destabilized. There has been a great deal of work done analyzing the Y220C pocket originally published by Joerger et al. (2006); indeed, it was analysis of this pocket that initially motivated our work. More recently, a rescue pocket for the p53 R175H mutant was identified by Wassman et al. (2013), providing us an opportunity to further test our hypothesis. Wassman et al. used both experimental and in silico techniques to identify a region of the protein around residues L114 and C124 that had stabilizing potential. They then showed that stictic acid is capable of docking into a pocket in that region, increasing the melting temperature of the R175H mutant by 4.6°C.

Consequently, we analyzed three 310 K apo simulations of the R175H mutant. By comparing these mutations to WT, Contact Walker highlighted those regions of the mutant protein that were consistently disrupted across the ensemble of simulations. The most consistently disrupted contacts in the R175H simulations were L114:T125 and L114:Y126. As shown in Fig. Fig.5,5, these contacts and residues are immediately adjacent to the rescue pocket identified by Wassman et al. (2013), lending support to our methods.

Fig. 5
R175H destabilized region. Upper left: Contact Walker analysis of R175H indicated that L114 had lost contact with T125 and Y126 (magenta lines) and that this region was a potential rescue region. Right: Protein structure published by Wassman et al. showing ...

Y220C mutant simulation analysis

The average Cα RMSD values for the three 310 K apo Y220C mutant simulations relative to the holo-WT starting structure were 3.8 ± 0.3 Å, 3.2 ± 0.2 Å and 3.5 ± 0.2 Å (Fig. (Fig.3).3). The average per-residue Cα RMSD and Cα RMSF values are shown in Fig. Fig.3.3. Like the WT, the Y220C mutant demonstrated the largest deviations from the starting structure in the loop-sheet-helix region, L2, L3 and the S7/S8 loop. Also like WT, the Cα RMSF values followed the same pattern as Cα RMSD with the largest fluctuations in the loop-sheet-helix region, L2, L3, the S7/S8 loop and the S6/S7 loop. Relative to WT, one Y220C simulation also demonstrated a slight increase (1 Å) in fluctuation in the S5/S6 loop. However, secondary-structure analysis showed a larger departure from native secondary structures than was demonstrated by WT. Like WT, most of the larger S6/S7/S4/S9/S10/S2’/S2 β-sheet was well maintained. However, one simulation showed novel helical content in the L1 loop, disrupting the loop-sheet-helix region, another simulation showed loss of β-structure in S1 and S5 and an increase in helical structure around H168, and a third simulation showed adoption of α-sheet secondary structure41,42 between S1, S3 and S8. One simulation showed a 25 Å separation between L1 and H2 (distance measured between the Cα atoms of DNA-contacts K120 and R280).

Contact Walker analysis of the p53 Y220C mutant is shown in Fig. Fig.6.6. To illustrate how to read a Contact Walker diagram, this figure indicates that L145, V147, P151, C220, P223, L257, and L265 (magenta circles) were all highly destabilized. It shows that C220 lost contact with V147 (magenta line) and that P151 lost contact with L257 and L265 (magenta lines) but gained alternative contacts with T230 and E221 (green lines). Note that E221 is colored gray despite gaining two alternative contacts. This is because E221 lost very little contact-occupancy; this property-coloring scheme can be controlled via DIVE ‘micro-scripting’ (Rysavy et al., 2014). We are focusing here on contact loss because our goal here is to stabilize native contacts, not destabilize nonnative contacts. Note also that this figure shows the minimum demonstrated occupancy difference among three WT simulations and three mutant simulations. Thus, if a given contact has occupancy values of 10%, 20% and 30% in the WT simulations and occupancy values of 70%, 80% and 90% in the mutant simulations, the reported contact difference will be 30–70% = −40%, or a net mutation-associated stabilization of, at least, 40% occupancy. This difference represents the most conservative estimation of disruption possible given the demonstrated occupancies.

Fig. 6
Example Contact Walker diagram for the p53 Y220C mutant showing mutation-associated contact disruptions. Residues are represented by circles and inter-residue contacts are represented by lines. The orange circle indicates the mutation site. Only peptide ...

Pocket-selection algorithm and application

As discussed above, Contact Walker identified the most disrupted region of the protein as the region near the C220 mutation site (Fig. (Fig.6).6). After analyzing the MD data for this region, L145, V147, P151, C220, L257 and L265 were selected to identify potential rescue pockets. While this is the region that was discussed earlier as correlating with a known rescue pocket, this analysis was performed in the absence of experimental data, using only inter-residue contact-information from MD. Pockets containing these six residues were present 1.5% of the simulation time.

Querying the pocket database for pockets in this region with volumes >600 Å3 returned 98 potential rescue pockets, a 104 data-reduction from the more than two million pockets stored in the pocket database. These 98 candidate pockets were then manually inspected and filtered based on the following criteria:

  1. The pocket needed to provide a space in which an aromatic ring, positioned similarly to the tyrosine benzene moiety lost by the mutation, could dock. Essentially, we wanted to provide an opportunity to replace what was lost without introducing additional foreign material.
  2. The pocket should not be overly invasive. The mutation site and the lost benzene moiety lie near the surface, so replacing what was lost should not require binding a ligand deep into the hydrophobic core. It was the core that we were trying to stabilize and injecting a large foreign presence could easily introduce unfavorable contacts.
  3. The protein conformation containing the pocket needed to retain as much WT-like structure as possible, particularly around the hydrophobic core. We were trying to rescue a protein from unfolding, so we wanted the hydrophobic core of the stabilized conformation to be as WT-like as possible. As discussed previously by Joerger et al. (2004), ligands that preferentially stabilize the native state over the denatured state should drive the unfolding equilibria toward the native state.

This entire process is shown in Fig. Fig.7 and7 and the final selected putative rescue pocket is shown in Fig. Fig.8.8. This pocket met the basic criteria of being in the disrupted region, providing a solvent-accessible opportunity to introduce a tyrosine-like benzene moiety, and being located in a protein structure with a relatively intact beta-core. Small-molecule energy probing indicated that the rescue pocket could accommodate a benzene moiety in the desired position (Fig. (Fig.9).9).

Fig. 7
Pocket search and refinement process. More than 160 K protein structures resulted in over 2.2 million pockets. Use of the pocket database and pocket search tool reduced the search space to 98 pockets. These 98 pockets were manually analyzed and a final ...
Fig. 8
Final pocket selected from the 98 candidate pockets. Disrupted residues are shown in green, aligned WT Y220 is shown in magenta. Pocket was solvent-accessible from the top, and the middle of the pocket lay roughly adjacent to the WT Y220 benzene moiety. ...
Fig. 9
Histogram showing benzene probe points with negative energiesEnergy profile was a bimodal distribution; probe points from the large peak on the left are shown as cyan spheres in the inset image. (1) Surface probe points. (2) Internal pocket probe points. ...

Dihedral angle analysis as well as visual inspection indicated that the pocket-lining residues did not undergo large conformational changes over the 1 ns window surrounding the rescue pocket. The residues with the greatest side-chain movement were S149, C220 (the mutation site) and E221. S149 and E221 both faced outwards into the solvent, but C220 was not solvent accessible aside except through the putative rescue pocket. Cα RMSF analysis of the rescue pocket simulation indicated that the motion of the pocket-lining residues all fell within WT ranges with the exception of H115, which exhibited approximately 1.7 Å larger Cα RMSF than WT. Analysis of these residues in the other two Y220C simulations indicated similar WT-like stability, although in one simulation residues V225 and C229 exhibited Cα RMSF values between 1 and 2 Å greater than WT.

Ligand selection

To confirm that the selected pocket had rescue potential, it was necessary to confirm that it could interact with a stabilizing ligand. Therefore, once the putative rescue pocket was selected, we identified ligands that could potentially dock into it and stabilize the protein. To begin this search, we docked the ZINC (Irwin and Shoichet, 2005) ‘leads-now’ database containing 2.4 M ligands into the rescue pocket using the UCSF Dock Blaster (Irwin et al., 2009) service. After initial docking, the top 500 ligands were manually inspected for two different characteristics:

  1. Ligands needed to have a benzene moiety in roughly the position where the WT tyrosine would be. This was in keeping with the goal of simply replacing what had been removed by mutation rather than attempting to bind more aggressively and risk introducing new structural artifacts. Large ligands were not perceived as necessary to stabilize the protein; for example, the N239Y rescue mutation rescues the G245S mutation by introducing a tyrosine side chain that stabilizes the local hydrophobic packing (Joerger et al., 2004). Such a side chain is similar in scope to a small-molecule stabilizing ligand.
  2. Ligands needed to support a strong contact to L145. Originally, this was not a required criterion. However, several of the final 500 ligands contained such a contact and thus it became an option in the refinement process. The L145 side-chain position within the rescue pocket was WT like and thus a contact in that position was expected to not only help secure the stabilizing ligand in place, but also help stabilize L145, part of the β-core, in a WT-like position.

Of the 500 ligands returned by Dock Blaster, only three ligands met all of these criteria. For additional docking-validation, these three ligands were re-docked into the putative rescue pocket using Autodock Vina. Our docking energy point of reference was −6.5 kcal/mol, calculated by using Autodock Vina to dock the known rescue ligand PhiKan5196 (Wilcken et al., 2012) into its holo crystal position (PDB: 4AGQ). Vina energy estimates of the Dock Blaster poses were −5.2 kcal/mol for ZINC 67692870, −5.5 kcal/mol for ZINC 95476490 and −4.4 kcal for ZINC 19565304. For this last ligand, Vina found a −6.0 kcal/mol pose that was very similar to the Dock Blaster pose with comparable positioning of the benzene moiety and contact with L145. This process is illustrated in Fig. Fig.10.10.

Fig. 10
Ligand refinement process. 2.4 million lead-like ligands were screened using UCSF Dock Blaster. This reduced the number of ligands under consideration to 500. These were then inspected manually for various structural and pharmacophore characteristics. ...

A histogram of the overall binding-energy distribution is shown in Fig. Fig.1111 with an inset showing only the favorable (negative) binding energies. The two ligands with the least favorable energies reported by Dock Blaster were ZINC 3904140 (+2.2 M kcal/mol) and ZINC 6653655 (+45 K kcal/mol). The three selected rescue ligands described above had negative potential energies three and four standard deviations less than the mean negative binding energy (−22.5 ± 3.6 kcal/mol). Ninety-nine percent of the compounds were able to favorably bind somewhere in the vicinity of the selected pocket. However, re-docking of the two worst binders with Autodock Vina indicated that the negative energies alone did not necessarily imply successful pocket-docking; sometimes, ligands would dock only peripherally at the surface or in small adjacent clefts that inadvertently fell into the specified rectangular docking-volume. Both of these scenarios placed the ligands far away from the intended docking site. More specifically, docking conformations such as these placed the ligands far away from the destabilized residues that were hypothesized to provide stabilizing contacts. These observations underscored the notion that understanding which residues the ligands should contact and why the ligands should contact them was an important part of our pocket-selection algorithm and the subsequent ligand selection.

Fig. 11
Potential energy distribution of 2.4 M ligands docked by UCSF Dock Blaster. The distribution shape was a tall, narrow bell curve just below zero with a shallow tail extending to the right and a remote data outlier far to the right. Ninety-nine percent ...

In silico analysis and comparison to existing experimental data

The three final ligand picks are illustrated in Fig. Fig.12.12. As discussed above, both the pocket and the selected ligands were discovered using strictly in silico methods. However, the final rescue pocket and the final ligand selections had considerable structural and pharmacophore similarities to experimentally validated rescue pockets and ligands. Given these similarities, it is worth comparing the selected pocket and ligands with their experimental counterparts.

Fig. 12
Final ligand picks. The three final ligands and specific binding data are shown from top to bottom. On the left is a molecular drawing and docking energies. In the middle is the ligand in its docked pose. The ligand is blue and L145 is green. WT L145 ...

To begin, the selected pocket was in the same region that is stabilized by the PhiKan5196 ligand. Indeed, it was the colocation of the PhiKan5196 docking site and Contact Walker's analysis of disrupted contacts that motivated our disrupted-region-as-rescue region hypothesis. Despite this location overlap, however, the actual surface-accessible pocket geometries were different. Consider the two ligands illustrated in Fig. Fig.13.13. Both ligands interacted with the same region of the protein (i.e. the disrupted region specified by Contact Walker), but they did so from pockets with different surface geometries; the PhiKan5196 pocket contained a long surface trench with a subsurface cavity (Fig. (Fig.13a)13a) and the putative rescue pocket selected by Contact Walker and the pocket database (Fig. (Fig.13b)13b) was primarily an extension of a similar subsurface cavity, but without the trench.

Fig. 13
Overlap of known rescue ligand and putative rescue ligand addressing the same protein region from pockets with different surface geometries. (a) PhiKan5196 docked in 4AGQ crystal structure. (b) ZINC 67692870 (example putative rescue ligand) docked into ...

Despite these pocket differences, the ligands themselves had an overlap of pharmacophore groups, notably the Y220 ‘replacement’ aromatic group and a strong contact to L145 (Fig. (Fig.1212 and Fig. Fig.13c13c and d). In addition to being present in validated rescue ligands, both of these features as well as the polar contacts near D228 (Fig. (Fig.12)12) have been shown in fragment-binding experiments (Basse et al., 2010) to promote docking. PhiKan5196 extends further in the direction of the C220 mutation site than any of the putative rescue ligands, which may contribute to stability. However, PhiKan083 (Boeckler et al., 2008), a stabilizing molecule docked into this same mutant (PDB: 2VUK), is not extended in this manner, suggesting that such an extension is not necessary for protein rescue. Similarly, PhiKan083 does not support a strong contact with L145, a feature that was present in other known rescue ligands and each of the suggested ligands. Table TableII compares the ligand contacts of PhiKan083, PhiKan5196 and the three putative rescue ligands (heavy-atom contacts only, 5.4 Å cutoff). Finally, the side-chain conformation of the six selected disrupted residues in the 2VUK crystal structure was similar to that of the selected rescue pocket (Fig. (Fig.14),14), suggesting that the pocket selected by the pocket search tool presented a docking opportunity similar to that of a known rescue pocket.

Fig. 14
Stereo image showing overlap of putative rescue pocket and the 2VUK crystal structure.
Table I.
Ligand/protein heavy atom contacts

Experimental assessment of predicted binders

The three putative stabilizing compounds, as well as the two worst dockers (used here as decoys) were tested for their abilities to stabilize the Y220C mutant. Thermal shift assay results showed that one of the putative stabilizing ligands (ZINC: 95476490) had a small and potentially dose-dependent stabilization of the Y220C mutant (Fig. (Fig.15,15, highlighted box). In silico analysis of ligand Zinc 95476490 showed two polar contacts with D228, as well as contacts with both Q110 and H115 (Fig. (Fig.1212 and Table TableI).I). The only other compound to show a positive response was one of the decoy compounds, which showed less dose-dependent behavior than did the putative stabilizing ligand. 1H/15N-HSQC NMR analysis of the effect of compound Zinc 95476490 on Y220C showed slight shifts in the chemical shifts of residues in the pocket comparing the spectra with and without ligand (Fig. (Fig.16).16). The shifts indicate that the chemical environments near several residues in the putative rescue pocket were altered in the presence of the ligand (Fig. (Fig.16),16), consistent with binding of the compound in the pocket. The L145 peak in the HSQC spectrum is the most ‘sensitive peak’ in the spectrum for detecting binding to Y220C.

Fig. 15
Initial thermal shift assay results. One of the non-decoy ligands (Zinc ID: 95476490) showed a small, possibly dose-dependent stabilizing effect on the Y220C mutant (yellow highlight).
Fig. 16
Initial NMR chemical-shift data. (a) Peak assignments suggest that the ligand capable of small thermal stabilizations (Zinc ID: 95476490) was also binding in or near the identified putative rescue pocket. (L145 assignment (italic) is preliminary). (b ...

From past experience, the HSQC spectrum of the stabilizing compound induces chemical shifts of residues (e.g. L145 and V147) that are part of the Y220C binding pocket, very similar to well-characterized Y220C binders such as PhiKan083, PK7088 and PK5196, which do not bind to WT p53. In p53 WT this pocket is closed/blocked by a large tyrosine residue, ruling out binding of large compounds like Zinc 95476490. Typically fragments or compounds that bind to the protein display a small to medium shift of the L145 peak, while the rest of the spectrum typically does not show any stronger or larger peak shifts. For stronger binders (Kd < 15 µM) we even see disappearance/intermediate exchange of the L145 peak. Zinc 95476490 binds, but the peak shift is considerably smaller than for stronger binders like PK083 or PK5196. Nevertheless, the MD simulations combined with the Contact Walker analysis classified the L145 side-chain position as WT-like, and similar to all other known Y220C-stabilizers, our identified hit binds in close proximity to L145, which helps it stabilize the protein. The residues affected by Zinc 95476490 binding (L145, W146, Val147, Gly154) are highlighted on the WT crystal structure in Fig. Fig.1616.

So, one of our predicted rescue ligands bound Y220C in the MD-generated pocket, but unfortunately the effect was weak. While disappointing, such docking methods are known to produce many false positive and false negative results and the strength of the method, which has been quite successful overall, is in the scale (Irwin and Shoichet, 2016). While we screened many compounds in silico many more need to be considered at the experimental validation stage to ensure an adequate number of hits and their discrimination from false positives and negatives. Irwin and Shoichet (2016) state that with their current and well-developed methods of docking and screening compounds that testing 25–50 new molecules experimentally is enough to yield two to five hits. That we obtained a single hit (albeit weak) from only three predicted binders is encouraging.


Our approach of identifying dynamic clefts during MD as well as analysis of potential sites to rescue p53 was supported by experiment for one of our predicted binders. We are encouraged by the improvement in stability upon binding, but the effect was very minor unfortunately and one of our decoys also stabilized the mutant. Thus, the preliminary experimental results were not sufficient to show conclusively that our algorithm was effective. Aspects of the results are encouraging, however, and, given that the preliminary data show the basic hallmarks of stabilizing ligands – correct interaction location, positive stabilization, and dose-dependent action – further refinement and testing are warranted. Our algorithm aims to select small-molecules whose contacts will mimic, or at least serve as proxies for, lost native contacts. In some cases, this may not work; for example, some mutations alter DNA-contacts, while in others small→large mutations can cause repacking around a buried residue. However, absent these or similar conditions, our hypothesis is that re-establishing native-like contacts may help re-establish native-like functionality.

In support of our approach, we independently identified pockets through MD. In silico analysis of the putative rescue pocket and ligand showed a considerable degree of overlap with published experimental evidence for Y220C. Although experiment is the final assessment, multiple aspects of structural and pharmacophore agreement with well-established in silico tools suggest that the algorithm can successfully produce results that fall within these tools’ predictive ranges. This suggests that the next step is to either refine our pocket and ligand selections using additional tools or criteria, or to sample the ligand space more extensively by testing more ligands.

The goal of developing this algorithm was to provide a general means of discovering drug leads for protein targets that are not amenable to structural characterization. There is a need for such a method, underscored by recognizing that p53 plays a crucial role in cancer biology, yet there are few mutant structures due to the drop in stability. Fundamentally, our algorithm was intended to find stabilization pockets; confirmation of the stabilizing potential of a pocket can only truly be assessed, however, by confirming that it can interact successfully with a stabilizing ligand. Thus, we conclude that this algorithm holds potential, but is in need of more extensive testing and future refinement.


We thank the LMB NMR Spectroscopy Facility and Trevor Rutherford for help with setup of NMR experiments and providing the NMR assignment map of the p53-Y220C DNA-binding domain.

Board Members: Assaf Friedler and Frederic Rousseau.


We are grateful for financial support provided by the National Institutes of Health (GM50789 to V.D.) and the National Library of Medicine (project 5T15LM007442 to D.B.). The experimental studies were funded through an ERC Advanced Grant (268506, p53lazarus to A.R.F.). Protein simulation computer time was part of the Dynameomics project and was supported through the US Department of Energy (DOE) Office of Biological and Environmental Research as provided by the National Energy Research Scientific Computing Center (NERSC), which is supported by the DOE Office of Science (contract DE-AC02–05CH11231).


  • Armen R.S., Alonso D.O.V. and Daggett V. (2004) Structure, 12, 1847–1863. [PubMed]
  • Barakat K., Issack B.B., Stepanova M. and Tuszynski J. (2011) PLoS ONE, 6, e27651. [PMC free article] [PubMed]
  • Basse N., et al. . (2010) Chem. Biol., 17, 46–56. [PubMed]
  • Beck D.A.C. and Daggett V. (2004) Methods, 34, 112–120. [PubMed]
  • Beck D.A.C., et al. . (2008) Protein. Eng. Des. Sel., 21, 353–368. [PubMed]
  • Beck D.A.C., McCully M., Alonso D.O.V. and Daggett V. ilmm, in lucem Molecular Mechanics. (2000. –2016).
  • Berman H.M., et al. . (2000) Nucleic. Acids. Res., 28, 235–242. [PMC free article] [PubMed]
  • Bernard B. Incorporation of protein flexibility in structure-guided drug design. (Thesis MSB—University of Washington, 2006).
  • Boeckler F.M., et al. . (2008) Proc. Natl. Acad. Sci. U.S.A., 105, 10360–10365. [PubMed]
  • Brachmann R.K., Yu K., Eby Y., Pavletich N.P. and Boeke J.D. (1998) EMBO. J., 17, 1847–1859. [PubMed]
  • Bromley D., et al. . (2014) Bioinformatics, 30, 593–595. [PMC free article] [PubMed]
  • Bromley D., Anderson P.C. and Daggett V. (2013) Biochemistry, 52, 4264–4273. [PMC free article] [PubMed]
  • Bullock A.N., Henckel J. and Fersht A.R. (2000) Oncogene, 19, 1245–1256. [PubMed]
  • Butler J.S. and Loh S.N. (2003) Biochemistry, 42, 2396–2403. [PubMed]
  • Calhoun S. and Daggett V. (2011) Biochemistry, 50, 5345–5353. [PMC free article] [PubMed]
  • Cañadillas J.M.P., et al. . (2006) Proc. Natl. Acad. Sci. U.S.A., 103, 2109–2114. [PubMed]
  • Cho Y., Gorina S., Jeffrey P.D. and Pavletich N.P. (1994) Science, 265, 346–355. [PubMed]
  • Daggett V. (2006) Acc. Chem. Res., 39, 594–602. [PubMed]
  • Day R., Beck D.A.C., Armen R.S. and Daggett V. (2003) Protein. Sci., 12, 2150–2160. [PubMed]
  • DeLano W.L. The PyMOL Molecular Graphics System. (2002).
  • Irwin J.J. and Shoichet B.K. (2005) J. Chem. Inf. Model., 45, 177–182. [PMC free article] [PubMed]
  • Irwin J.J., et al. . (2009) J. Med. Chem., 52, 5712–5720. [PMC free article] [PubMed]
  • Irwin J.J. and Shoichet. B.K. (2016) J. Med. Chem., 59, 4103–4120. [PMC free article] [PubMed]
  • Joerger A.C. and Fersht A.R. (2010) Cold Spring Harb. Perspect. Biol., 2, a000919 [PMC free article] [PubMed]
  • Joerger A.C., Allen M.D. and Fersht A.R. (2004) J. Biol. Chem., 279, 1291–1296. [PubMed]
  • Joerger A.C., Ang H.C. and Fersht A.R. (2006) Proc. Natl. Acad. Sci. U.S.A., 103, 15056–15061. [PubMed]
  • Kaar J.L., et al. . (2010) Protein. Sci., 19, 2267–2278. [PubMed]
  • Kabsch W. and Sander C. (1983) Biopolymers, 22, 2577–2637. [PubMed]
  • Kell G.S. (1967) J. Chem. Eng. Data, 12, 66–69.
  • Levitt M., Hirshberg M., Sharon R. and Daggett V. (1995) Comput. Phys. Commun., 91, 215–231.
  • Levitt M., Hirshberg M., Sharon R., Laidig K.E. and Daggett V. (1997) J. Phys. Chem. B, 101, 5051–5061.
  • Loh S.N. (2010) Metallomics, 2, 442–449. [PubMed]
  • Lorber D.M. and Shoichet B.K. (1998) Protein. Sci., 7, 938–950. [PubMed]
  • Lorber D.M. and Shoichet B.K. (2005) Curr. Top. Med. Chem., 5, 739–749. [PMC free article] [PubMed]
  • Lukman S., Lane D.P. and Verma C.S. (2013) PLoS ONE, 8, e80221 [PMC free article] [PubMed]
  • Morris G.M., et al. . (2009) J. Comput. Chem., 30, 2785–2791. [PMC free article] [PubMed]
  • O'Boyle N.M., et al. . (2011) J. Cheminformatics, 3, 1–14.
  • Rysavy S.J., Bromley D. and Daggett V. (2014) IEEE. Comput. Graph. Appl., 34, 26–37. [PMC free article] [PubMed]
  • Schaeffer R.D., Jonsson A.L., Simms A.M. and Daggett V. (2011) Bioinformatics, 27, 46–54. [PMC free article] [PubMed]
  • Simms A.M., Toofanny R.D., Kehl C., Benson N.C. and Daggett V. (2008) Protein. Eng. Des. Sel., 21, 369–377. [PubMed]
  • Trott O. and Olson A.J. (2010) J. Comput. Chem., 31, 455–461. [PMC free article] [PubMed]
  • Van der Kamp M.W., et al. . (2010) Structure, 18, 423–435. [PMC free article] [PubMed]
  • Vogelstein B., Lane D. and Levine A.J. (2000) Nature, 408, 307–310. [PubMed]
  • Wang Y., Rosengarth A. and Luecke H. (2007) Acta. Crystallogr. D. Biol. Crystallogr., 63, 276–281. [PubMed]
  • Wassman C.D., et al. . (2013) Nat. Commun., 4, 1–9.
  • Wilcken R., et al. . (2012) J. Am. Chem. Soc., 134, 6810–6818. [PMC free article] [PubMed]

Articles from Protein Engineering, Design and Selection are provided here courtesy of Oxford University Press