A powerful early approach to evaluating the druggability of proteins involved determining the hit rate in NMR-based screening of a library of small compounds. Here we show that a computational analog of this method, based on mapping proteins using small molecules as probes, can reliably reproduce druggability results from NMR-based screening, and can provide a more meaningful assessment in cases where the two approaches disagree. We apply the method to a large set of proteins. The results show that, because the method is based on the biophysics of binding rather than on empirical parameterization, meaningful information can be gained about classes of proteins and classes of compounds beyond those resembling validated targets and conventionally druglike ligands. In particular, the method identifies targets that, while not druggable by druglike compounds, may become druggable using compound classes such as macrocycles or other large molecules beyond the rule-of-five limit.
drug discovery; ligand design; binding hot spot; drug target protein; macrocyclic compounds; inhibitors of protein-protein interaction
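The "rule-of-five limit" mentioned above refers to Lipinski's four criteria: molecular weight above 500 Da, calculated logP above 5, more than 5 hydrogen-bond donors, and more than 10 hydrogen-bond acceptors. The minimal sketch below counts violations and flags a compound as beyond the rule of five when it violates more than one criterion, following Lipinski's original formulation. The `Compound` container and the example property values are hypothetical, and computing the descriptors themselves (for instance, cLogP) is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Precomputed descriptors for one compound (hypothetical container)."""
    name: str
    mol_weight: float  # molecular weight in Da
    clogp: float       # calculated octanol/water logP
    h_donors: int      # count of OH and NH groups
    h_acceptors: int   # count of N and O atoms


def ro5_violations(c: Compound) -> int:
    """Count how many of Lipinski's four limits are exceeded."""
    return sum([
        c.mol_weight > 500,
        c.clogp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def beyond_ro5(c: Compound) -> bool:
    """Flag compounds with more than one violation as beyond the rule of five."""
    return ro5_violations(c) > 1


# Hypothetical property values, for illustration only.
small_ligand = Compound("small ligand", 320.4, 2.1, 2, 5)
macrocycle = Compound("large macrocycle", 950.0, 4.2, 6, 14)
print(ro5_violations(small_ligand), beyond_ro5(small_ligand))  # 0 False
print(ro5_violations(macrocycle), beyond_ro5(macrocycle))      # 3 True
```

A single violation is tolerated here, which is why many approved drugs with one out-of-range property still count as druglike.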
Analysis of binding energy hot spots at protein surfaces can provide crucial insights into the prospects for successful application of fragment-based drug discovery (FBDD), and whether a fragment hit can be advanced into a high affinity, druglike ligand. The key factor is the strength of the top ranking hot spot, and how well a given fragment complements it. We show that published data are sufficient to provide a sophisticated and quantitative understanding of how hot spots derive from protein three-dimensional structure, and how their strength, number and spatial arrangement govern the potential for a surface site to bind to fragment-sized and larger ligands. This improved understanding provides important guidance for the effective application of FBDD in drug discovery.
2016;84(Suppl 1):323-348.
We present the results for CAPRI Round 30, the first joint CASP‐CAPRI experiment, which brought together experts from the protein structure prediction and protein–protein docking communities. The Round comprised 25 targets from amongst those submitted for the CASP11 prediction experiment of 2014. The targets included mostly homodimers, a few homotetramers, and two heterodimers, and comprised protein chains that could readily be modeled using templates from the Protein Data Bank. On average 24 CAPRI groups and 7 CASP groups submitted docking predictions for each target, and 12 CAPRI groups per target participated in the CAPRI scoring experiment. In total more than 9500 models were assessed against the 3D structures of the corresponding target complexes. Results show that the prediction of homodimer assemblies by homology modeling techniques and docking calculations is quite successful for targets featuring large enough subunit interfaces to represent stable associations. Targets with ambiguous or inaccurate oligomeric state assignments, often featuring crystal contact‐sized interfaces, represented a confounding factor. For those targets, prediction performance was much poorer, although the predictions often provided helpful clues about the correct oligomeric state of the protein. The prediction performance was very poor for genuine tetrameric targets, where the inaccuracy of the homology‐built subunit models and the smaller pair‐wise interfaces severely limited the ability to derive the correct assembly mode. Our analysis also shows that docking procedures tend to perform better than standard homology modeling techniques and that highly accurate models of the protein components are not always required to identify their association modes with acceptable accuracy. Proteins 2016; 84(Suppl 1):323–348. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
CAPRI; CASP; oligomer state; blind prediction; protein interaction; protein docking
The protein-protein docking server ClusPro is used by thousands of laboratories, and models built by the server have been reported in over 300 publications. Although the structures generated by the docking include near-native ones for many proteins, selecting the best model is difficult due to the uncertainty in scoring. Small Angle X-ray Scattering (SAXS) is an experimental technique for obtaining low-resolution structural information in solution. While not sufficient on its own to uniquely predict complex structures, accounting for SAXS data improves the ranking of models and facilitates the identification of the most accurate structure. Although SAXS profiles are currently available only for a small number of complexes, due to its simplicity the method is becoming increasingly popular. Since combining SAXS experiments with docking will provide a viable strategy for fairly high-throughput determination of protein complex structures, the option of using SAXS restraints has been added to the ClusPro server.
protein complex; structure prediction; docking method; scoring function; SAXS restraints
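The abstract does not spell out how SAXS data enter the ranking. A common way to compare a model with a measured profile is the reduced χ between experimental intensities I_exp(q), with errors σ(q), and a profile I_calc(q) computed from the model after fitting a scale factor c; the least-squares value is c = Σ(I_exp·I_calc/σ²) / Σ(I_calc²/σ²). The sketch below assumes model profiles have already been computed on the experimental q grid. The function names are hypothetical, and this is not presented as the ClusPro implementation, which may weight or combine the SAXS term with the docking score differently.

```python
import math
from typing import Dict, List, Sequence, Tuple


def saxs_chi(i_exp: Sequence[float], sigma: Sequence[float],
             i_calc: Sequence[float]) -> Tuple[float, float]:
    """Return (chi, c), where c is the least-squares scale of the model profile."""
    # Closed-form c minimizing sum(((I_exp - c * I_calc) / sigma)^2)
    num = sum(e * m / s ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    den = sum(m * m / s ** 2 for m, s in zip(i_calc, sigma))
    c = num / den
    chi2 = sum(((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma)) / len(i_exp)
    return math.sqrt(chi2), c


def rank_by_saxs(i_exp: Sequence[float], sigma: Sequence[float],
                 model_profiles: Dict[str, List[float]]) -> List[str]:
    """Order docked models from best to worst fit to the experimental profile."""
    return sorted(model_profiles,
                  key=lambda name: saxs_chi(i_exp, sigma, model_profiles[name])[0])


# Hypothetical toy profiles on a four-point q grid.
i_exp = [100.0, 60.0, 25.0, 8.0]
sigma = [2.0, 1.5, 1.0, 0.5]
models = {"model_1": [10.0, 7.0, 4.0, 2.0], "model_2": [50.0, 30.0, 12.5, 4.0]}
print(rank_by_saxs(i_exp, sigma, models))  # ['model_2', 'model_1']
```

Because the scale factor is fitted per model, only the shape of the profile matters, which is why `model_2` (exactly half the experimental intensities) fits perfectly.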
FTMap is a computational mapping server that identifies binding hot spots of macromolecules, i.e., regions of the surface with major contributions to the ligand binding free energy. To use FTMap, users submit a protein, DNA, or RNA structure in PDB format. FTMap samples billions of positions of small organic molecules used as probes and scores the probe poses using a detailed energy expression. Regions that bind clusters of multiple probe types identify the binding hot spots, in good agreement with experimental data. FTMap serves as the basis for other servers, namely FTSite to predict ligand binding sites, FTFlex to account for side chain flexibility, FTMap/param to parameterize additional probes, and FTDyn to map ensembles of protein structures. Applications include determining druggability of proteins, identifying ligand moieties that are most important for binding, finding the most bound-like conformation in ensembles of unliganded protein structures, and providing input for fragment-based drug design. FTMap is more accurate than classical mapping methods such as GRID and MCSS, and is much faster than the more recent approaches to protein mapping based on mixed-solvent molecular dynamics. Using 16 probe molecules, the FTMap server finds the hot spots of an average-size protein in less than an hour. Since FTFlex performs mapping for all low energy conformers of side chains in the binding site, its completion time is proportionately longer.
ligand-protein interaction; ligand binding site; drug discovery; druggability; fragment based drug design
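As a rough illustration of "regions that bind clusters of multiple probe types", the sketch below takes the centers of low-energy probe clusters (assumed to come from an earlier docking and clustering step that is not shown) and greedily groups centers lying within an assumed 4 Å of each other into consensus sites, ranked by the number of probe clusters they contain. The data layout, the greedy seeding rule, and the 4 Å radius are assumptions for illustration, not a description of FTMap's internals.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def _neighbors(i: int, pool: Sequence[int], centers: Sequence[Point],
               radius: float) -> List[int]:
    """Indices in `pool` whose centers lie within `radius` of center i (i included)."""
    return [j for j in pool if math.dist(centers[i], centers[j]) <= radius]


def consensus_sites(clusters: Sequence[Tuple[str, Point]],
                    radius: float = 4.0) -> List[Dict]:
    """Greedily group probe-cluster centers into consensus sites (hot spot candidates).

    clusters: (probe_type, center) for each low-energy probe cluster.
    Returns sites ordered by the number of probe clusters they contain.
    """
    centers = [center for _, center in clusters]
    remaining = list(range(len(clusters)))
    sites = []
    while remaining:
        # Seed each site at the center with the most unassigned neighbors
        seed = max(remaining, key=lambda i: len(_neighbors(i, remaining, centers, radius)))
        members = _neighbors(seed, remaining, centers, radius)
        sites.append({
            "center": centers[seed],
            "n_clusters": len(members),
            "probe_types": sorted({clusters[j][0] for j in members}),
        })
        remaining = [j for j in remaining if j not in members]
    return sorted(sites, key=lambda s: s["n_clusters"], reverse=True)


# Hypothetical probe clusters: three overlapping near the origin, one isolated.
clusters = [
    ("ethanol", (0.0, 0.0, 0.0)),
    ("acetone", (1.0, 0.5, 0.0)),
    ("benzene", (0.5, -1.0, 0.5)),
    ("ethanol", (12.0, 3.0, 1.0)),
]
for site in consensus_sites(clusters):
    print(site["n_clusters"], site["probe_types"])
```

The top site here holds three clusters of three different probe types, the kind of region the text identifies as a binding hot spot.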
We study the impact of optimizing side-chain positions in the interface region between two proteins during the process of binding. Mathematically, the problem is similar to side-chain prediction, extensively explored in the process of protein structure prediction. The protein-protein docking application, however, has a number of characteristics that necessitate different algorithmic and implementation choices. In this work, we develop a distributed approximate algorithm that can run on multi-processor architectures and allows accuracy to be traded off against running speed. We report computational results on benchmarks of enzyme-inhibitor and other types of complexes, establishing that the side-chain flexibility our algorithm introduces substantially improves the performance of docking protocols. Further, we establish that the inclusion of unbound side-chain conformers in the side-chain positioning problem is critical in these performance improvements.
In this paper we extend a recently introduced rigid body minimization algorithm, defined on manifolds, to the problem of minimizing the energy of interacting flexible molecules. The goal is to integrate moving the ligand in six-dimensional rotational/translational space with internal rotations around rotatable bonds within the two molecules. We show that adding rotational degrees of freedom to the rigid moves of the ligand results in an overall optimization search space that is a manifold to which our manifold optimization approach can be extended. The effectiveness of the method is shown for three different docking problems of increasing complexity. First, we minimize the energy of fragment-size ligands with a single rotatable bond as part of a protein mapping method developed for the identification of binding hot spots. Second, we consider energy minimization for docking a flexible ligand to a rigid protein receptor, an approach frequently used in existing methods. In the third problem we account for flexibility in both the ligand and the receptor. Results show that minimization using the manifold optimization algorithm is substantially more efficient than minimization using a traditional all-atom optimization algorithm while producing solutions of comparable quality. In addition to the specific problems considered, the method is general enough to be used in a large class of applications such as docking multidomain proteins with flexible hinges. The code is available under an open-source license (at http://cluspro.bu.edu/Code/Code_Rigtree.tar), and with minimal effort can be incorporated into any molecular modeling package.
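The internal moves added to the rigid-body motion above are rotations about rotatable bonds. A minimal sketch of one such move, assuming Cartesian coordinates and a caller-supplied list of the atoms on the moving side of the bond (both hypothetical inputs), applies Rodrigues' rotation formula. It is not the manifold optimizer itself, which additionally couples these torsional degrees of freedom with the six rigid-body ones.

```python
import math
from typing import List, Sequence

Vec = List[float]


def rotate_about_bond(point: Sequence[float], a: Sequence[float], b: Sequence[float],
                      angle: float) -> Vec:
    """Rotate `point` by `angle` radians about the axis through atoms a -> b."""
    k = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in k))
    k = [c / norm for c in k]                      # unit bond axis
    v = [point[i] - a[i] for i in range(3)]        # position relative to the axis origin
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    # Rodrigues: v_rot = v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t))
    return [a[i] + v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
            for i in range(3)]


def apply_torsion_move(coords: Sequence[Sequence[float]], bond: Sequence[int],
                       moving: Sequence[int], delta: float) -> List[Vec]:
    """Change the torsion around `bond` by `delta`, moving only the atoms in `moving`."""
    a, b = coords[bond[0]], coords[bond[1]]
    moving_set = set(moving)
    return [rotate_about_bond(p, a, b, delta) if i in moving_set else list(p)
            for i, p in enumerate(coords)]


# Hypothetical four-atom fragment; the bond runs from atom 0 to atom 1 along z.
coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.5), (1.0, 0.0, 2.0), (1.0, 0.0, -0.5)]
new = apply_torsion_move(coords, bond=(0, 1), moving=[2], delta=math.pi / 2)
print([round(c, 6) for c in new[2]])  # atom 2 swings from +x to +y
```

Bond lengths and angles involving the moving atoms are preserved, so the move changes only the torsion, which is what makes torsions a natural extra coordinate for the optimizer.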
We report the first assessment of blind predictions of water positions at protein-protein interfaces, performed as part of the CAPRI (Critical Assessment of Predicted Interactions) community-wide experiment. Groups submitting docking predictions for the complex of the DNase domain of colicin E2 and Im2 immunity protein (CAPRI target 47) were invited to predict the positions of interfacial water molecules using the method of their choice. The predictions – 20 groups submitted a total of 195 models – were assessed by measuring the recall fraction of water-mediated protein contacts. Of the 176 high or medium quality docking models – a very good docking performance per se – only 44% had a recall fraction above 0.3, and a mere 6% above 0.5. The actual water positions were in general predicted to an accuracy level no better than 1.5 Å, and even in good models about half of the contacts represented false positives. This notwithstanding, three hotspot interface water positions were quite well predicted, and so was one of the water positions that is believed to stabilize the loop that confers specificity in these complexes. Overall, the best interface water predictions were achieved by groups that also produced high quality docking models, indicating that accurate modelling of the protein portion is a determinant factor. The use of established molecular mechanics force fields, coupled to sampling and optimization procedures, also seemed to confer an advantage. Insights gained from this analysis should help improve the prediction of protein-water interactions and their role in stabilizing protein complexes.
Protein docking; water; blind prediction; CAPRI; protein interface
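The recall fraction used in this assessment can be sketched as follows: collect the receptor-ligand residue pairs bridged by a water in the target structure, do the same for a model, and report the share of target contacts the model reproduces. The 3.5 Å water-atom cutoff, the residue labels, and the function names are assumptions for illustration; the abstract does not give the exact contact definition used by the assessors. For simplicity the example reuses the same protein coordinates for target and model.

```python
import math
from typing import Iterable, Set, Tuple

Atom = Tuple[str, Tuple[float, float, float]]  # (residue id, coordinates)
Contact = Tuple[str, str]                      # (receptor residue, ligand residue)


def water_mediated_contacts(rec_atoms: Iterable[Atom], lig_atoms: Iterable[Atom],
                            waters: Iterable[Tuple[float, float, float]],
                            cutoff: float = 3.5) -> Set[Contact]:
    """Residue pairs bridged by a water lying within `cutoff` of an atom on each side."""
    rec_atoms, lig_atoms = list(rec_atoms), list(lig_atoms)
    contacts: Set[Contact] = set()
    for w in waters:
        rec_res = {res for res, xyz in rec_atoms if math.dist(xyz, w) <= cutoff}
        lig_res = {res for res, xyz in lig_atoms if math.dist(xyz, w) <= cutoff}
        contacts.update((r, l) for r in rec_res for l in lig_res)
    return contacts


def recall_fraction(predicted: Set[Contact], target: Set[Contact]) -> float:
    """Fraction of the target's water-mediated contacts reproduced by a model."""
    return len(predicted & target) / len(target) if target else 0.0


# Hypothetical toy interface: two water bridges in the target, one in the model.
rec = [("A:10", (0.0, 0.0, 0.0)), ("A:12", (0.0, 5.0, 0.0))]
lig = [("B:5", (5.0, 0.0, 0.0)), ("B:7", (5.0, 5.0, 0.0))]
target = water_mediated_contacts(rec, lig, [(2.5, 0.0, 0.0), (2.5, 5.0, 0.0)])
model = water_mediated_contacts(rec, lig, [(2.5, 0.0, 0.0)])
print(recall_fraction(model, target))  # 0.5
```

Recall alone ignores false positives, which is why the assessment also notes that about half of the predicted contacts in good models were spurious.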
Molecular mechanics and dynamics simulations use distance-based cutoff approximations for faster computation of pairwise van der Waals and electrostatic energy terms. These approximations traditionally use a precalculated and periodically updated list of interacting atom pairs, known as the “nonbonded neighborhood lists” or nblists, in order to reduce the overhead of finding atom pairs that are within the distance cutoff. The size of nblists grows linearly with the number of atoms in the system and superlinearly with the distance cutoff, and as a result, they require a significant amount of memory for large molecular systems. The high space usage leads to poor cache performance, which slows computation for large distance cutoffs. Also, the high cost of updates means that one cannot afford to keep the data structure always synchronized with the configuration of the molecules when efficiency is at stake. We propose a dynamic octree data structure for implicit maintenance of nblists using space linear in the number of atoms but independent of the distance cutoff. The list can be updated very efficiently as the coordinates of atoms change during the simulation. Unlike explicit nblists, a single octree works for all distance cutoffs. In addition, the octree is a cache-friendly data structure, and hence it is less prone to cache-miss slowdowns on modern memory hierarchies than nblists. Octrees use almost two orders of magnitude less memory, which is crucial for simulation of large systems, and while they are comparable in performance to nblists when the distance cutoff is small, they outperform nblists for larger systems and large cutoffs. Our tests show that the octree implementation is approximately 1.5 times faster than nblists in practical use-case scenarios.
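The key property claimed above, one structure serving every cutoff with cheap updates, can be illustrated with a plain point octree: atoms live in leaf cells, a query prunes any cell whose cube lies farther than the cutoff, and an atom that moves is removed from its old leaf and reinserted. This is a minimal, unoptimized sketch under those assumptions (class and function names are hypothetical, the leaf capacity of 8 is arbitrary, and atoms leaving the root cube would require a rebuild, which is not handled); it does not reproduce the cache-friendly layout of the published implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


class Octree:
    """Dynamic point octree over atom indices; leaves hold up to `capacity` atoms."""

    def __init__(self, center: Point, half: float, capacity: int = 8):
        self.center, self.half, self.capacity = center, half, capacity
        self.items: List[Tuple[int, Point]] = []   # (atom index, position) at a leaf
        self.children: Optional[List["Octree"]] = None

    def _octant(self, p: Point) -> int:
        cx, cy, cz = self.center
        return int(p[0] >= cx) | (int(p[1] >= cy) << 1) | (int(p[2] >= cz) << 2)

    def insert(self, idx: int, p: Point) -> None:
        if self.children is not None:
            self.children[self._octant(p)].insert(idx, p)
            return
        self.items.append((idx, p))
        if len(self.items) > self.capacity and self.half > 1e-3:
            self._split()

    def remove(self, idx: int, p: Point) -> None:
        """Remove atom `idx`, which must have been inserted at position p."""
        if self.children is not None:
            self.children[self._octant(p)].remove(idx, p)
        else:
            self.items = [(i, x) for i, x in self.items if i != idx]

    def _split(self) -> None:
        h = self.half / 2
        cx, cy, cz = self.center
        self.children = [Octree((cx + (h if o & 1 else -h),
                                 cy + (h if o & 2 else -h),
                                 cz + (h if o & 4 else -h)), h, self.capacity)
                         for o in range(8)]
        items, self.items = self.items, []
        for idx, p in items:
            self.children[self._octant(p)].insert(idx, p)

    def query(self, q: Point, cutoff: float, out: Optional[List[int]] = None) -> List[int]:
        """Indices of atoms within `cutoff` of q; any cutoff works on the same tree."""
        out = [] if out is None else out
        # Squared distance from q to this cell's cube; prune cells that are too far
        d2 = sum(max(0.0, abs(qc - c) - self.half) ** 2 for qc, c in zip(q, self.center))
        if d2 > cutoff * cutoff:
            return out
        if self.children is None:
            out.extend(i for i, p in self.items if math.dist(p, q) <= cutoff)
        else:
            for child in self.children:
                child.query(q, cutoff, out)
        return out


def build_octree(points: Sequence[Point], capacity: int = 8) -> Octree:
    lo = [min(p[k] for p in points) for k in range(3)]
    hi = [max(p[k] for p in points) for k in range(3)]
    center = tuple((a + b) / 2 for a, b in zip(lo, hi))
    half = max(b - a for a, b in zip(lo, hi)) / 2 + 1e-6  # root cube encloses all atoms
    tree = Octree(center, half, capacity)
    for i, p in enumerate(points):
        tree.insert(i, p)
    return tree
```

A typical use is to build the tree once, call `query(atom_position, cutoff)` with whichever cutoff an energy term needs, and after each integration step call `remove` and `insert` only for atoms that moved. Memory stays proportional to the number of atoms regardless of the cutoff, in line with the claim above.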
ligand design; fragment conservation; druggability
Many proteins of widely differing functionality and structure are capable of binding heparin and heparan sulfate. Since crystallizing protein-heparin complexes for structure determination is generally difficult, computational docking can be a useful approach for understanding specific interactions. Previous studies used programs originally developed for docking small molecules to well-defined pockets, rather than for docking polysaccharides to highly charged shallow crevices that usually bind heparin. We have extended the program PIPER and the automated protein-protein docking server ClusPro to heparin docking. Using a molecular mechanics energy function for scoring and the fast Fourier transform correlation approach, the method generates and evaluates close to a billion poses of a heparin tetrasaccharide probe. The docked structures are clustered using pairwise root mean square deviations as the distance measure. Clustering heparin molecules that are close to each other but have different orientations, and then selecting the clusters with the most protein-ligand contacts, was shown to reliably predict the heparin binding site. In addition, the centers of the five most populated clusters include structures close to the native orientation of the heparin. These structures can provide starting points for further refinement by methods that account for flexibility, such as molecular dynamics. The heparin docking method is available as an advanced option of the ClusPro server at http://cluspro.bu.edu/.
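The clustering and selection steps can be sketched as a greedy neighbor-count procedure: compute pairwise RMSDs between docked poses, take the pose with the most neighbors within a radius as a cluster center, remove its neighbors, and repeat; clusters can then be compared by protein-ligand contacts. The greedy rule, the RMSD radius, the 4 Å contact cutoff, and all names here are assumptions for illustration, since the abstract does not specify these details.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Pose = Sequence[Coord]  # heavy-atom coordinates in a fixed atom order


def pose_rmsd(a: Pose, b: Pose) -> float:
    """RMSD without superposition: all poses are docked against the same fixed receptor."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def greedy_cluster(poses: Sequence[Pose], radius: float) -> List[Tuple[int, List[int]]]:
    """Repeatedly take the pose with the most unassigned neighbors as a cluster center."""
    n = len(poses)
    nbrs = [{j for j in range(n) if pose_rmsd(poses[i], poses[j]) <= radius}
            for i in range(n)]
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        center = max(sorted(unassigned), key=lambda i: len(nbrs[i] & unassigned))
        members = nbrs[center] & unassigned
        clusters.append((center, sorted(members)))
        unassigned -= members
    return clusters  # most populated first


def contact_count(pose: Pose, receptor_atoms: Sequence[Coord], cutoff: float = 4.0) -> int:
    """Number of ligand-receptor atom pairs closer than `cutoff` (Angstrom)."""
    return sum(1 for p in pose for r in receptor_atoms if math.dist(p, r) <= cutoff)


# Hypothetical two-atom poses: three near each other, one far away.
poses = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.3, 0.0, 0.0), (1.3, 0.0, 0.0)],
    [(0.0, 0.4, 0.0), (1.0, 0.4, 0.0)],
    [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
]
receptor = [(0.0, -3.0, 0.0), (1.0, -3.0, 0.0)]
for center, members in greedy_cluster(poses, radius=1.0):
    print(center, members, contact_count(poses[center], receptor))
```

Because cluster size is taken from a dense sampling of poses, a large cluster marks a broad low-energy region, and the contact count then favors clusters that actually sit against the protein surface.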
Eukaryotic translation initiation factor 2B (eIF2B), the guanine nucleotide exchange factor for the G-protein eIF2, is one of the main targets for the regulation of protein synthesis. eIF2B activity is inhibited in response to a wide range of stress factors and diseases, including viral infections, hypoxia, nutrient starvation, and heme deficiency, collectively known as the integrated stress response. eIF2B has five subunits (α–ε). The α, β, and δ subunits are homologous to each other and form the eIF2B regulatory subcomplex, which is believed to be a trimer consisting of monomeric α, β, and δ subunits. Here we use a combination of biophysical methods, site-directed mutagenesis, and bioinformatics to show that the human eIF2Bα subunit is in fact a homodimer, at odds with the current trimeric model for the eIF2Bα/β/δ regulatory complex. eIF2Bα dimerizes using the same interface that is found in the homodimeric archaeal eIF2Bα/β/δ homolog aIF2B and in related metabolic enzymes. We also present evidence that the eIF2Bβ/δ binding interface is similar to that in the eIF2Bα₂ homodimer. Mutations at the predicted eIF2Bβ/δ dimer interface cause genetic neurological disorders in humans. We propose that the eIF2B regulatory subcomplex is an α₂β₂δ₂ hexamer, composed of one α₂ homodimer and two βδ heterodimers. Our results offer novel insights into the architecture of eIF2B and its interactions with the G-protein eIF2.
The potential utility of synthetic macrocycles as drugs, particularly against low-druggability targets such as protein-protein interactions, has been widely discussed. There is little information, however, to guide the design of macrocycles for good target protein-binding activity or bioavailability. To address this knowledge gap we analyze the binding modes of a representative set of macrocycle-protein complexes. The results, combined with consideration of the physicochemical properties of approved macrocyclic drugs, allow us to propose specific guidelines for the design of synthetic macrocycle libraries possessing structural and physicochemical features likely to favor strong binding to protein targets and also good bioavailability. We additionally provide evidence that large, natural-product-derived macrocycles can bind to targets that are not druggable by conventional, drug-like compounds, supporting the notion that natural-product-inspired synthetic macrocycles can expand the number of proteins that are druggable by synthetic compounds.
druglikeness; druggability; ligand efficiency; binding mode; macrocyclic drugs
We propose a new stochastic global optimization method targeting protein docking problems. The method is based on finding a general convex polynomial underestimator to the binding energy function in a permissive subspace that possesses a funnel-like structure. We use Principal Component Analysis (PCA) to determine such permissive subspaces. The problem of finding the general convex polynomial underestimator reduces to ensuring that a certain polynomial is a Sum-of-Squares (SOS), which can be done via semi-definite programming. The underestimator is then used to bias sampling of the energy function in order to recover a deep minimum. We show that the proposed method significantly improves the quality of docked conformations compared to existing methods.
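The idea of a convex underestimator guiding further sampling can be shown in one dimension. As a much simpler stand-in, explicitly not the SOS/semidefinite formulation described above, the sketch below assumes sampled energies have already been projected onto a single principal component (the PCA step is not shown), fits a quadratic by least squares with its curvature clipped to stay positive, and shifts it down until it lies below every sample. The vertex of the resulting parabola then suggests where to concentrate new samples. All names and the `min_curvature` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _solve(A: List[List[float]], y: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def _least_squares(cols: List[List[float]], y: Sequence[float]) -> List[float]:
    """Solve the normal equations (X^T X) beta = X^T y, with X given column-wise."""
    A = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return _solve(A, rhs)


def convex_underestimator(xs: Sequence[float], energies: Sequence[float],
                          min_curvature: float = 1e-3) -> Tuple[float, float, float]:
    """Fit f(x) = a x^2 + b x + c with a >= min_curvature and f(x_i) <= E_i for all samples."""
    xs = list(xs)
    a, b, c = _least_squares([[x * x for x in xs], xs, [1.0] * len(xs)], energies)
    if a < min_curvature:  # enforce convexity, then refit the linear part
        a = min_curvature
        b, c = _least_squares([xs, [1.0] * len(xs)],
                              [e - a * x * x for x, e in zip(xs, energies)])
    # Shift down so the parabola lies below every sample and touches the lowest one
    c += min(e - (a * x * x + b * x + c) for x, e in zip(xs, energies))
    return a, b, c


# Hypothetical rugged funnel along one principal component.
xs = [i / 10 for i in range(-30, 31)]
energies = [0.5 * (x - 1.0) ** 2 + 0.3 * math.sin(8 * x) for x in xs]
a, b, c = convex_underestimator(xs, energies)
x_min = -b / (2 * a)  # predicted location of the deep minimum, used to bias sampling
```

The least-squares fit smooths out the small-scale ruggedness while the downward shift guarantees the underestimating property on the sampled points, which is the role the underestimator plays in the method above.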
In this paper we consider the problem of minimization of a cost function that depends on the location and poses of one or more rigid bodies, or bodies that consist of rigid parts hinged together. We present a unified setting for formulating this problem as an optimization on an appropriately defined manifold for which efficient manifold optimizations can be developed. This setting is based on a Lie group representation of the rigid movements of a body that is different from what is commonly used for this purpose. We illustrate this approach by using the steepest descent algorithm on the manifold of the search space and specify conditions for its convergence.
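To make "steepest descent on the manifold" concrete, the sketch below minimizes a toy cost, the sum of squared distances between rigidly moved points R p + t and target positions q, over rotations and translations. Each rotation step is taken along the group, R ← exp([ω]×) R with ω = −α ∇_ω f, so R remains a rotation matrix up to round-off without re-orthogonalization. This uses the common SO(3) × R³ representation, not the different Lie group representation the text proposes; the cost, step size, and names are hypothetical.

```python
import math
from typing import List, Sequence

Mat = List[List[float]]


def cross(a: Sequence[float], b: Sequence[float]) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def matvec(M: Mat, v: Sequence[float]) -> List[float]:
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]


def matmul(A: Mat, B: Mat) -> Mat:
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def exp_so3(w: Sequence[float]) -> Mat:
    """Rotation matrix exp([w]_x) for a rotation vector w (Rodrigues' formula)."""
    theta = math.sqrt(sum(c * c for c in w))
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if theta < 1e-12:
        return eye
    kx, ky, kz = (c / theta for c in w)
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    K2 = matmul(K, K)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[eye[i][j] + s * K[i][j] + c * K2[i][j] for j in range(3)] for i in range(3)]


def cost_and_grad(R: Mat, t: Sequence[float], ps, qs):
    """f = sum |R p + t - q|^2 and its gradients w.r.t. a left rotation increment and t."""
    f, g_rot, g_tr = 0.0, [0.0] * 3, [0.0] * 3
    for p, q in zip(ps, qs):
        y = matvec(R, p)
        r = [y[i] + t[i] - q[i] for i in range(3)]
        f += sum(c * c for c in r)
        y_x_r = cross(y, r)
        for i in range(3):
            g_rot[i] += 2.0 * y_x_r[i]
            g_tr[i] += 2.0 * r[i]
    return f, g_rot, g_tr


def steepest_descent(ps, qs, step: float = 0.05, iters: int = 200):
    R, t = exp_so3([0.0, 0.0, 0.0]), [0.0, 0.0, 0.0]
    for _ in range(iters):
        _, g_rot, g_tr = cost_and_grad(R, t, ps, qs)
        R = matmul(exp_so3([-step * g for g in g_rot]), R)  # move along the rotation group
        t = [t[i] - step * g_tr[i] for i in range(3)]
    return R, t, cost_and_grad(R, t, ps, qs)[0]


# Hypothetical test body: six points on the axes, moved by a known rigid motion.
ps = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
      (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
R0, t0 = exp_so3([0.0, 0.0, 0.5]), [1.0, -2.0, 0.5]
qs = [[a + b for a, b in zip(matvec(R0, p), t0)] for p in ps]
R, t, f = steepest_descent(ps, qs)
```

The fixed step size is adequate for this small, well-conditioned example; the convergence conditions discussed in the text are what would govern the step choice in general.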
In screening a library of natural and synthetic products for eukaryotic translation modulators, we identified two natural products, isohymenialdisine and hymenialdisine, that exhibit stimulatory effects on translation. The characterization of these compounds led to the insight that mRNA used to program the translation extracts during high-throughput assay set-up was leading to phosphorylation of eIF2α, a potent negative regulatory event that is mediated by one of four kinases. We identified double-stranded RNA-dependent protein kinase (PKR) as the eIF2α kinase that was being activated by exogenously added mRNA template. Characterization of the mode of action of isohymenialdisine revealed that it directly acts on PKR by inhibiting autophosphorylation, perturbs the PKR-eIF2α phosphorylation axis, and can be modeled into the PKR ATP binding site. Our results identify a source of false positives for high-throughput screening (HTS) campaigns using translation extracts, raising a cautionary note for this type of screen.
High Throughput Screens; Translation; PKR; eIF2α; Isohymenialdisine; Hymenialdisine
The protein docking server ClusPro has been participating in CAPRI since its introduction in 2004. This paper evaluates the performance of ClusPro 2.0 for targets 46–58 in rounds 22–27 of CAPRI. The analysis leads to a number of important observations. First, ClusPro reliably yields acceptable or medium accuracy models for targets of moderate difficulty that have also been successfully predicted by other groups, and fails only for targets that have few acceptable models submitted. Second, the quality of automated docking by ClusPro is very close to that of the best human predictor groups, including our own submissions. This is very important, because servers have to submit results within 48 hours and the predictions should be reproducible, whereas human predictors have several weeks and can use any type of information. Third, while we refined the ClusPro results for manual submission by running computationally costly Monte Carlo minimization simulations, we observed significant improvement in accuracy only for two of the six complexes correctly predicted by ClusPro. Fourth, new developments, not seen in previous rounds of CAPRI, are that the top ranked model provided by ClusPro was acceptable or better quality for all these six targets, and that the top ranked model was also the highest quality for five of the six, confirming that ranking models based on cluster size can reliably identify the best near-native conformations.
protein-protein docking; structure refinement; method development; CAPRI docking experiment; web based server; user community
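The acceptable, medium, and high accuracy levels referred to above follow CAPRI's assessment criteria, which combine the fraction of native contacts (fnat) with the ligand RMSD (L-rms) and interface RMSD (I-rms). A minimal sketch using the commonly cited thresholds, together with a check of the kind behind the fourth observation (is the top-ranked model also the best one?), might look like this. The function names and the input layout, one (fnat, L-rms, I-rms) tuple per model in rank order, are hypothetical.

```python
from typing import List, Tuple


def capri_quality(fnat: float, l_rms: float, i_rms: float) -> str:
    """Classify one docking model with CAPRI-style thresholds (RMSDs in Angstrom)."""
    if fnat >= 0.5 and (l_rms <= 1.0 or i_rms <= 1.0):
        return "high"
    if fnat >= 0.3 and (l_rms <= 5.0 or i_rms <= 2.0):
        return "medium"
    if fnat >= 0.1 and (l_rms <= 10.0 or i_rms <= 4.0):
        return "acceptable"
    return "incorrect"


GRADE = {"incorrect": 0, "acceptable": 1, "medium": 2, "high": 3}


def top_model_summary(ranked: List[Tuple[float, float, float]]) -> Tuple[str, bool]:
    """Quality of the top-ranked model, and whether no lower-ranked model beats it."""
    grades = [GRADE[capri_quality(*m)] for m in ranked]
    return capri_quality(*ranked[0]), grades[0] == max(grades)


# Hypothetical (fnat, L-rms, I-rms) values for three models in server rank order.
ranked = [(0.45, 3.2, 1.6), (0.20, 8.0, 3.5), (0.05, 15.0, 9.0)]
print(top_model_summary(ranked))  # ('medium', True)
```

Ordering the checks from strictest to loosest means a model with high fnat but only moderate RMSDs still falls through to the medium tier, as intended.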
Peptide-mediated interactions, in which a short linear motif binds to a globular domain, play major roles in cellular regulation. An accurate structural model of this type of interaction is an excellent starting point for the characterization of the binding specificity of a given peptide-binding domain. A number of different protocols have recently been proposed for the accurate modeling of peptide-protein complex structures, given the structure of the protein receptor and the binding site on its surface. When no information about the peptide-binding site(s) is available a priori, new approaches are needed to locate peptide-binding sites on the protein surface. While several approaches have been proposed for the general identification of ligand binding sites, peptides show very specific binding characteristics, and therefore, there is a need for robust and accurate approaches that are optimized for the prediction of peptide-binding sites.
Here we present PeptiMap, a protocol for the accurate mapping of peptide binding sites on protein structures. Our method is based on experimental evidence that peptide-binding sites also bind small organic molecules of various shapes and polarity. Using an adaptation of ab initio ligand binding site prediction based on fragment mapping (FTMap), we optimize a protocol that specifically takes into account peptide binding site characteristics. In a high-quality curated set of peptide-protein complex structures, PeptiMap identifies the correct peptide-binding site among the top-ranked predictions for most complexes. We anticipate that this protocol will significantly increase the number of accurate structural models of peptide-mediated interactions.
protein peptide interactions; FFT sampling; binding site detection; mapping; PeptiDB
Background: The use of alternative flame retardants has increased since the phase out of pentabromodiphenyl ethers (pentaBDEs). One alternative, Firemaster® 550 (FM550), induces obesity in rats. Triphenyl phosphate (TPP), a component of FM550, has a structure similar to that of organotins, which are obesogenic in rodents.
Objectives: We tested the hypothesis that components of FM550 are biologically active peroxisome proliferator-activated receptor γ (PPARγ) ligands and estimated indoor exposure to TPP.
Methods: FM550 and its components were assessed for ligand binding to and activation of human PPARγ. Solvent mapping was used to model TPP in the PPARγ binding site. Adipocyte and osteoblast differentiation were assessed in bone marrow multipotent mesenchymal stromal cell models. We estimated exposure of children to TPP using a screening-level indoor exposure model and house dust concentrations determined previously.
Results: FM550 bound human PPARγ, and binding appeared to be driven primarily by TPP. Solvent mapping revealed that TPP interacted with binding hot spots within the PPARγ ligand binding domain. FM550 and its organophosphate components increased human PPARγ1 transcriptional activity in a Cos7 reporter assay and induced lipid accumulation and perilipin protein expression in BMS2 cells. FM550 and TPP diverted osteogenic differentiation toward adipogenesis in primary mouse bone marrow cultures. Our estimates suggest that dust ingestion is the major route of exposure of children to TPP.
Conclusions: Our findings suggest that FM550 components bind and activate PPARγ. In addition, in vitro exposure initiated adipocyte differentiation and antagonized osteogenesis. TPP likely is a major contributor to these biological actions. Given that TPP is ubiquitous in house dust, further studies are warranted to investigate the health effects of FM550.
Citation: Pillai HK, Fang M, Beglov D, Kozakov D, Vajda S, Stapleton HM, Webster TF, Schlezinger JJ. 2014. Ligand binding and activation of PPARγ by Firemaster® 550: effects on adipogenesis and osteogenesis in vitro. Environ Health Perspect 122:1225–1232; http://dx.doi.org/10.1289/ehp.1408111
Most structure prediction algorithms consist of initial sampling of the conformational space, followed by re-scoring and possibly refinement of a number of selected structures. Here we focus on protein docking, and show that while decoupling sampling and scoring facilitates method development, integration of the two steps can lead to substantial improvements in docking results. Since decoupling is usually achieved by generating a decoy set containing both non-native and near-native docked structures, which are then used for scoring function construction, we first review the roles and potential pitfalls of decoys in protein-protein docking, and show that some types of decoys are better than others for method development. We then describe three case studies showing that complete decoupling of scoring from sampling is not optimal for solving realistic docking problems. Although some of the examples are based on our own experience, the results of the CAPRI docking and scoring experiments also show that decoupling leads to worse results. Next we investigate how the selection of training and decoy sets affects the performance of the scoring functions obtained. Finally, we discuss pathways to better integration of the two steps, and show some algorithms that achieve a certain level of integration. Although we focus on protein-protein docking, our observations also apply to other conformational search problems, including protein structure prediction and the docking of small molecules to proteins.
Molecular interaction; protein-protein docking; conformational search; structure refinement; CAPRI docking experiment; scoring function; molecular mechanics; Monte Carlo method; structure-based potential
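When sampling and scoring are decoupled, a scoring function is typically judged by re-ranking a fixed decoy set and asking where the first near-native structure lands. The sketch below shows that bookkeeping. The 10 Å RMSD cutoff for "near-native", the lower-is-better score convention, and the function names are assumptions, and the sketch says nothing about how the decoys were generated, which is precisely the factor the text identifies as critical.

```python
from typing import List, Optional, Sequence, Tuple


def first_hit_rank(scores: Sequence[float], rmsds: Sequence[float],
                   cutoff: float = 10.0) -> Optional[int]:
    """1-based rank of the best-scoring near-native decoy (lower score = better)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    for rank, i in enumerate(order, start=1):
        if rmsds[i] <= cutoff:
            return rank
    return None  # no near-native decoy in the set


def success_rate(benchmark: List[Tuple[Sequence[float], Sequence[float]]],
                 top_n: int = 10, cutoff: float = 10.0) -> float:
    """Fraction of complexes with a near-native decoy among the top_n scored decoys."""
    hits = 0
    for scores, rmsds in benchmark:
        rank = first_hit_rank(scores, rmsds, cutoff)
        if rank is not None and rank <= top_n:
            hits += 1
    return hits / len(benchmark)


# Hypothetical two-complex benchmark: (scores, RMSDs in Angstrom) per decoy.
benchmark = [([-50.0, -40.0, -30.0], [15.0, 2.0, 20.0]),
             ([-10.0, -20.0], [30.0, 25.0])]
print(success_rate(benchmark, top_n=2), success_rate(benchmark, top_n=1))  # 0.5 0.0
```

The second complex can never succeed, whatever the scoring function, because its decoy set contains no near-native structure; this is one way a poorly chosen decoy set distorts the evaluation.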
Community-wide blind prediction experiments such as CAPRI and CASP provide an objective measure of the current state of predictive methodology. Here we describe a community-wide assessment of methods to predict the effects of mutations on protein-protein interactions. Twenty-two groups predicted the effects of comprehensive saturation mutagenesis for two designed influenza hemagglutinin binders and the results were compared with experimental yeast display enrichment data obtained using deep sequencing. The most successful methods explicitly considered the effects of mutation on monomer stability in addition to binding affinity, carried out explicit side chain sampling and backbone relaxation, and evaluated packing, electrostatic and solvation effects, and correctly identified around a third of the beneficial mutations. Much room for improvement remains for even the best techniques, and large-scale fitness landscapes should continue to provide an excellent test bed for continued evaluation of methodological improvement.
CAPRI; hemagglutinin; binding; deep mutational scanning; yeast display
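The experimental readout in this kind of assessment is an enrichment score from deep sequencing: how much a variant's frequency rises after yeast-display selection relative to the unselected library. The sketch below computes a log2 enrichment relative to wild type, calls mutations beneficial above a threshold, and measures what fraction of them a prediction recovers (the "around a third" figure in the text is this kind of recall). The pseudocount, the threshold of 1.0 log2 units, the normalization, and all names are assumptions; the study's exact processing is not given in the abstract.

```python
import math
from typing import Dict, Set, Tuple


def log2_enrichment(sel: int, inp: int, sel_total: int, inp_total: int,
                    pseudocount: float = 0.5) -> float:
    """log2 ratio of a variant's frequency after vs. before selection."""
    return math.log2(((sel + pseudocount) / sel_total) / ((inp + pseudocount) / inp_total))


def beneficial_mutations(counts: Dict[str, Tuple[int, int]], wild_type: str,
                         threshold: float = 1.0) -> Set[str]:
    """Mutations enriched by more than `threshold` (log2 units) relative to wild type.

    counts: variant -> (reads after selection, reads before selection).
    """
    sel_total = sum(s for s, _ in counts.values())
    inp_total = sum(i for _, i in counts.values())
    wt = log2_enrichment(*counts[wild_type], sel_total, inp_total)
    return {m for m, (s, i) in counts.items()
            if m != wild_type and log2_enrichment(s, i, sel_total, inp_total) - wt > threshold}


def recall(predicted: Set[str], observed: Set[str]) -> float:
    return len(predicted & observed) / len(observed) if observed else 0.0


# Hypothetical read counts (after selection, before selection).
counts = {"WT": (100, 100), "M1": (400, 100), "M2": (10, 100), "M3": (300, 100)}
observed = beneficial_mutations(counts, "WT")
print(sorted(observed), recall({"M1", "M2"}, observed))  # ['M1', 'M3'] 0.5
```

Normalizing to wild type cancels library-wide differences in sequencing depth, so only the relative enrichment of each mutation matters.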
Many protein-protein interactions (PPIs) are compelling targets for drug discovery, and in a number of cases can be disrupted by small molecules. The main goal of this study is to examine the mechanism of binding site formation in the interface region of proteins that are PPI targets by comparing ligand-free and ligand-bound structures. To avoid any potential bias, we focus on ensembles of ligand-free protein conformations obtained by nuclear magnetic resonance (NMR) techniques and deposited in the Protein Data Bank, rather than on ensembles specifically generated for this study. The measures used for structure comparison are based on detecting binding hot spots, i.e., protein regions that are major contributors to the binding free energy. The main tool of the analysis is computational solvent mapping, which explores the surface of proteins by docking a large number of small “probe” molecules. Although we consider conformational ensembles obtained by NMR techniques, the analysis is independent of the method used for generating the structures. By finding the energetically most important regions, mapping can identify binding site residues using ligand-free models based on NMR data. In addition, the method selects conformations that are similar to some peptide-bound or ligand-bound structure in terms of the properties of the binding site. This agrees with the conformational selection model of molecular recognition, which assumes such pre-existing conformations. The analysis also shows the maximum level of similarity between unbound and bound states that is achieved without any influence from a ligand. Further shift toward the bound structure requires protein-peptide or protein-ligand interactions, either selecting higher energy conformations that are not part of the NMR ensemble, or leading to induced fit.
Thus, forming the sites in protein-protein interfaces that bind peptides and can be targeted by small ligands always includes conformational selection, although other recognition mechanisms may also be involved.
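Selecting the most bound-like model from an NMR ensemble can be sketched as a set comparison: for each model, take the residues contacted by its strongest mapped hot spot and compare them with the binding-site residues of a ligand-bound structure. The Jaccard index used here is a stand-in chosen for illustration; the abstract says the comparison measures are based on hot spots but does not define them. The residue labels and function names are hypothetical.

```python
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_bound_like(model_sites: Dict[str, Set[str]],
                    bound_site: Set[str]) -> Tuple[str, Dict[str, float]]:
    """Pick the ensemble model whose hot spot residues best match the bound-state site."""
    scores = {model: jaccard(res, bound_site) for model, res in model_sites.items()}
    return max(scores, key=scores.get), scores


# Hypothetical hot spot residues from mapping three NMR models.
model_sites = {
    "model_1": {"L54", "F55", "I61"},
    "model_2": {"L54", "F55", "I61", "Y67", "Q72"},
    "model_3": {"K10", "E12"},
}
bound_site = {"L54", "F55", "I61", "Y67"}
best, scores = most_bound_like(model_sites, bound_site)
print(best, scores)
```

The best score reached by any model gives a rough measure of how close the unbound ensemble comes to the bound state on its own, the quantity the analysis above reports.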
Many protein-protein interfaces (PPIs) are biologically compelling drug targets. Disrupting the interaction between two large proteins by a small inhibitor requires forming a high affinity binding site in the interface that generally can bind both peptides and drug-like compounds. Here we investigate whether such sites are induced by peptide or ligand binding, or already exist in the unbound state. The analysis requires comparing ligand-free and ligand-bound structures. To avoid any potential bias, we study ensembles of ligand-free protein conformations obtained by nuclear magnetic resonance (NMR) rather than generated by simulations. The analysis is based on computational solvent mapping, which explores the surface of the target protein by docking a large number of small “probe” molecules. Results show that ensembles of ligand-free models always include conformations that are fairly similar to some peptide-bound or ligand-bound structure in terms of the properties of the binding site. The analysis also identifies the models that are the most similar to a bound state, and shows the maximum level of similarity that is achieved without any influence from a ligand. While forming the binding site may require a combination of recognition mechanisms, there is preference for the spontaneous formation of bound-like structures.
Side-chain positioning (SCP) is an important component of computational protein docking methods. Existing SCP methods and available software have been designed for protein folding applications where side-chain positioning is also important. As a result they do not take into account significant special structure that SCP for docking exhibits. We propose a new algorithm which poses SCP as a Maximum Weighted Independent Set (MWIS) problem on an appropriately constructed graph. We develop an approximate algorithm which solves a relaxation of the MWIS and then rounds the solution to obtain a high-quality feasible solution to the problem. The algorithm is fully distributed and can be executed on a large network of processing nodes requiring only local information and message-passing between neighboring nodes. Motivated by the special structure in docking, we establish optimality guarantees for a certain class of graphs. Our results on a benchmark set of enzyme-inhibitor protein complexes show that our predictions are close to the native structure and are comparable to the ones obtained by a state-of-the-art method. The results are substantially improved if rotamers from unbound protein structures are included in the search. We also establish that the use of our SCP algorithm substantially improves docking results.
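The MWIS formulation can be illustrated on a toy graph. Each node is a (residue, rotamer) choice with a positive weight (for example, a constant minus the rotamer's energy, an assumed transformation), and edges join rotamers of the same residue and pairs that clash, so an independent set is a consistent side-chain assignment. The solver below is a swapped-in centralized greedy heuristic (pick the node with the best weight/(degree + 1) ratio), not the distributed relaxation-and-rounding algorithm described above; names and weights are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def scp_graph(rotamers: Dict[str, List[str]],
              clashes: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Edges: rotamers of the same residue exclude each other, as do clashing pairs."""
    edges = [(u, v) for rots in rotamers.values()
             for i, u in enumerate(rots) for v in rots[i + 1:]]
    edges.extend(clashes)
    return edges


def greedy_mwis(weights: Dict[str, float], edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Greedy weighted independent set: repeatedly take the best w / (degree + 1) node."""
    adj = {v: set() for v in weights}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive = set(weights)
    chosen = []
    while alive:
        # Degree counts only neighbors that are still selectable
        best = max(sorted(alive), key=lambda v: weights[v] / (len(adj[v] & alive) + 1))
        chosen.append(best)
        alive -= adj[best] | {best}
    return chosen


# Hypothetical two-residue interface: rotamer A1 clashes with rotamer B1.
rotamers = {"A": ["A1", "A2"], "B": ["B1", "B2"]}
weights = {"A1": 3.0, "A2": 2.0, "B1": 4.0, "B2": 1.0}
chosen = greedy_mwis(weights, scp_graph(rotamers, [("A1", "B1")]))
print(chosen, sum(weights[v] for v in chosen))  # ['B1', 'A2'] 6.0
```

Here the greedy choice happens to be optimal (B1 with A2 beats A1 with B2), but greedy MWIS carries no general guarantee, which is one motivation for the relaxation-based algorithm and its optimality results on special graph classes.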