We introduce a message-passing algorithm to solve the Side Chain Positioning (SCP) problem. SCP is a crucial component of protein docking refinement, which is a key step of an important class of problems in computational structural biology called protein docking. We model SCP as a combinatorial optimization problem and formulate it as a Maximum Weighted Independent Set (MWIS) problem. We then employ a modified and convergent belief-propagation algorithm to solve a relaxation of MWIS and develop randomized estimation heuristics that use the relaxed solution to obtain an effective MWIS feasible solution. Using a benchmark set of protein complexes we demonstrate that our approach leads to more accurate docking predictions compared to a baseline algorithm that does not solve the SCP.
PMCID: PMC3600151
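The final step of the abstract above, turning a relaxed MWIS solution into a feasible independent set, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' heuristic: the function name `round_to_independent_set`, the noise-perturbed greedy ordering, and the positive node weights are all hypothetical, and the relaxed values are taken as given rather than computed by belief propagation.

```python
import random


def round_to_independent_set(weights, edges, relaxed, trials=100, seed=0):
    """Randomized rounding of a relaxed MWIS solution (illustrative sketch).

    weights: dict node -> positive weight
    edges:   iterable of (u, v) pairs that may not both be selected
    relaxed: dict node -> fractional value in [0, 1] from some relaxation
    Returns the heaviest independent set found and its total weight.
    """
    rng = random.Random(seed)
    neighbors = {v: set() for v in weights}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    best_set, best_weight = set(), 0.0
    for _ in range(trials):
        # Assumption: nodes with larger relaxed values are tried first,
        # with random noise so that different trials explore different orders.
        order = sorted(
            weights,
            key=lambda v: relaxed[v] + 0.5 * rng.random(),
            reverse=True,
        )
        chosen = set()
        for v in order:
            # Keep v only if it conflicts with nothing already chosen.
            if neighbors[v].isdisjoint(chosen):
                chosen.add(v)
        total = sum(weights[v] for v in chosen)
        if total > best_weight:
            best_set, best_weight = chosen, total
    return best_set, best_weight
```

Every set returned this way is feasible by construction, because a node is added only when none of its neighbors has been chosen; the relaxed values influence only the order in which nodes are tried.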
Fragment-based drug design (FBDD) starts with finding fragment-sized compounds that are highly ligand efficient and can serve as a core moiety for developing high affinity leads. Although the core-bound structure of a protein facilitates the construction of leads, effective design is far from straightforward. We show that protein mapping, a computational method developed to find binding hot spots and implemented as the FTMap server, provides information that complements the fragment screening results and can drive the evolution of core fragments into larger leads with a minimal loss or, in some cases, even a gain in ligand efficiency. The method places small molecular probes, the size of organic solvents, on a dense grid around the protein, and identifies the hot spots as consensus clusters formed by clusters of several probes. The hot spots are ranked based on the number of probe clusters, which predicts the binding propensity of the subsites and hence their importance for drug design. Accordingly, with a single exception, the main hot spot identified by FTMap binds the core compound found by fragment screening. The most useful information is provided by the neighboring secondary hot spots, indicating the regions where the core can be extended to increase its affinity. To quantify this information, we calculate the density of probes from mapping, which describes the binding propensity at each point, and show that the change in the correlation between a ligand position and the probe density upon extending or repositioning the core moiety predicts the expected change in ligand efficiency.
doi:10.1021/ci200468p
PMCID: PMC3264775
PMID: 22145575
Protein mapping; protein docking; drug design; ligand efficiency; affinity prediction
Motivation: Binding site identification is a classical problem that is important for a range of applications, including the structure-based prediction of function, the elucidation of functional relationships among proteins, protein engineering and drug design. We describe an accurate method of binding site identification, namely FTSite. This method is based on experimental evidence that ligand binding sites also bind small organic molecules of various shapes and polarity. The FTSite algorithm does not rely on any evolutionary or statistical information, but achieves near experimental accuracy: it is capable of identifying the binding sites in over 94% of apo proteins from established test sets that have been used to evaluate many other binding site prediction methods.
Availability: FTSite is freely available as a web-based server at http://ftsite.bu.edu.
Contact: vajda@bu.edu; midas@bu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr651
PMCID: PMC3259439
PMID: 22113084
Summary
Computational solvent mapping globally samples the surface of target proteins using molecular probes – small molecules or functional groups – to identify potentially favorable binding positions. The method is based on X-ray and NMR screening studies showing that the binding sites of proteins also bind a large variety of fragment-sized molecules. We have developed the multi-stage mapping algorithm FTMap (available as a server at http://ftmap.bu.edu/) based on the fast Fourier transform (FFT) correlation approach. Identifying regions of low free energy rather than individual low energy conformations, FTMap reproduces the available experimental mapping results. Applications to a variety of proteins show that the probes always cluster in important subsites of the binding site, and the amino acid residues that interact with many probes also bind the specific ligands of the protein. The “consensus” sites at which a number of different probes cluster are likely to be “druggable” sites, capable of binding drug-size ligands with high affinity. Due to its sensitivity to conformational changes the method can also be used for comparing the binding sites in different structures of a protein.
doi:10.1007/978-1-61779-465-0_2
PMCID: PMC3526383
PMID: 22183527
Protein structure; protein-ligand interactions; binding site; binding hot spots; fragment-based ligand design; druggability; binding site comparison; docking
Buhrman, Greg | O’Connor, Casey | Zerbe, Brandon | Kearney, Bradley M. | Napoleon, Raeanne | Kovrigina, Elizaveta A. | Vajda, Sandor | Kozakov, Dima | Kovrigin, Evgenii L. | Mattos, Carla
We have recently discovered an allosteric switch in Ras, bringing an additional level of complexity to this GTPase whose mutants are involved in nearly 30% of cancers. Upon activation of the allosteric switch, there is a shift in helix 3/loop 7 associated with a disorder to order transition in the active site. Here, we use a combination of multiple solvent crystal structures and computational solvent mapping (FTMap) to determine binding site hot spots in the “off” and “on” allosteric states of the GTP-bound form of H-Ras. Thirteen sites are revealed, expanding possible target sites for ligand binding well beyond the active site. Comparison of FTMaps for the H and K isoforms reveals essentially identical hot spots. Furthermore, using NMR measurements of spin relaxation, we determined that K-Ras exhibits global conformational dynamics very similar to those we previously reported for H-Ras. We thus hypothesize that the global conformational rearrangement serves as a mechanism for allosteric coupling between the effector interface and remote hot spots in all Ras isoforms. At least with respect to the binding sites involving the G domain, H-Ras is an excellent model for K-Ras and probably N-Ras as well. Ras has so far been elusive as a target for drug design. The present work identifies various unexplored hot spots throughout the entire surface of Ras, extending the focus from the disordered active site to well-ordered locations that should be easier to target.
doi:10.1016/j.jmb.2011.09.011
PMCID: PMC3247908
PMID: 21945529
Ras isoforms; drug target; binding site hot spots; Ras dynamics; allosteric switch
Hingtgen, Shawn | Kasmieh, Randa | Elbayly, Elizabeth | Nesterenko, Irina | Figueiredo, Jose-Luiz | Dash, Rupesh | Sarkar, Devanand | Hall, David | Kozakov, Dima | Vajda, Sandor | Fisher, Paul B. | Shah, Khalid | Najbauer, Joseph
Creating new molecules that simultaneously enhance tumor cell killing and permit diagnostic tracking is vital to overcoming the limitations rendering current therapeutic regimens for terminal cancers ineffective. Accordingly, we investigated the efficacy of an innovative multi-functional targeted anti-cancer molecule, SM7L, using models of the lethal brain tumor Glioblastoma multiforme (GBM). Designed using predictive computer modeling, SM7L incorporates the therapeutic activity of the promising anti-tumor cytokine MDA-7/IL-24, an enhanced secretory domain, and a diagnostic domain for non-invasive tracking. In vitro assays revealed the diagnostic domain of SM7L produced robust photon emission, while the therapeutic domain showed marked anti-tumor efficacy and significant modulation of p38MAPK and ERK pathways. In vivo, the unique multi-functional nature of SM7L allowed simultaneous real-time monitoring of both SM7L delivery and anti-tumor efficacy. Utilizing engineered stem cells as novel delivery vehicles for SM7L therapy (SC-SM7L), we demonstrate that SC-SM7L significantly improved pharmacokinetics and attenuated progression of established peripheral and intracranial human GBM xenografts. Furthermore, SC-SM7L anti-tumor efficacy was augmented in vitro and in vivo by concurrent activation of caspase-mediated apoptosis induced by adjuvant SC-mediated S-TRAIL delivery. Collectively, these studies define a promising new approach to treating highly aggressive cancers, including GBM, using the optimized therapeutic molecule SM7L.
doi:10.1371/journal.pone.0040234
PMCID: PMC3394792
PMID: 22808125
Formaldehyde has long been recognized as a hazardous environmental agent highly reactive with DNA. Recently, it has been realized that due to the activity of histone demethylation enzymes within the cell nucleus, formaldehyde is produced endogenously, in direct vicinity of genomic DNA. Should it lead to extensive DNA damage? We address this question with the aid of a computational mapping method, analogous to X-ray and nuclear magnetic resonance techniques for observing weakly specific interactions of small organic compounds with a macromolecule in order to establish important functional sites. We concentrate on the leading reaction of formaldehyde with free bases: hydroxymethylation of cytosine amino groups. Our results show that in B-DNA, cytosine amino groups are totally inaccessible for the formaldehyde attack. Then, we explore the effect of recently discovered transient flipping of Watson–Crick (WC) pairs into Hoogsteen (HG) pairs (HG breathing). Our results show that the HG base pair formation dramatically affects the accessibility for formaldehyde of cytosine amino nitrogens within WC base pairs adjacent to HG base pairs. The extensive literature on DNA interaction with formaldehyde is analyzed in light of the new findings. The obtained data emphasize the significance of DNA HG breathing.
doi:10.1093/nar/gks519
PMCID: PMC3439909
PMID: 22705795
Binding hot spots, protein sites with high-binding affinity, can be identified using X-ray crystallography or NMR by screening libraries of small organic molecules that tend to cluster at such regions. FTMAP, a direct computational analog of the experimental screening approaches, globally samples the surface of a target protein using small organic molecules as probes, finds favorable positions, clusters the conformations and ranks the clusters on the basis of the average energy. The regions that bind several probe clusters predict the binding hot spots, in good agreement with experimental results. Small molecules discovered by fragment-based approaches to drug design also bind at the hot spot regions. To identify such molecules and their most likely bound positions, we extend the functionality of FTMAP (http://ftmap.bu.edu/param) to accept any small molecule as an additional probe. In its updated form, FTMAP identifies the hot spots based on a standard set of probes, and for each additional probe shows representative structures of nearby low energy clusters. This approach helps to predict bound poses of the user-selected molecules, detects if a compound is not likely to bind in the hot spot region, and provides input for the design of larger ligands.
doi:10.1093/nar/gks441
PMCID: PMC3394268
PMID: 22589414
Cencic, Regina | Desforges, Marc | Hall, David R. | Kozakov, Dima | Du, Yuhong | Min, Jaeki | Dingledine, Raymond | Fu, Haian | Vajda, Sandor | Talbot, Pierre J. | Pelletier, Jerry
Coronaviruses are a family of enveloped single-stranded positive-sense RNA viruses causing respiratory, enteric, and neurologic diseases in mammals and fowl. Human coronaviruses are recognized to cause up to a third of common colds and are suspected to be involved in enteric and neurologic diseases. Coronavirus replication involves the generation of nested subgenomic mRNAs (sgmRNAs) with a common capped 5′ leader sequence. The translation of most of the sgmRNAs is thought to be cap dependent and displays a requirement for eukaryotic initiation factor 4F (eIF4F), a heterotrimeric complex needed for the recruitment of 40S ribosomes. We recently reported on an ultrahigh-throughput screen to discover compounds that inhibit eIF4F activity by blocking the interaction of two of its subunits (R. Cencic et al., Proc. Natl. Acad. Sci. U. S. A. 108:1046–1051, 2011). Herein we describe a molecule from this screen that prevents the interaction between eIF4E (the cap-binding protein) and eIF4G (a large scaffolding protein), inhibiting cap-dependent translation. This inhibitor significantly decreased human coronavirus 229E (HCoV-229E) replication, reducing the percentage of infected cells and intra- and extracellular infectious virus titers. Our results support the strategy of targeting the eIF4F complex to block coronavirus infection.
doi:10.1128/JVI.00078-11
PMCID: PMC3126520
PMID: 21507972
Kozakov, Dima | Hall, David R. | Beglov, Dmitri | Brenke, Ryan | Comeau, Stephen R. | Shen, Yang | Li, Keyong | Zheng, Jiefu | Vakili, Pirooz | Paschalidis, Ioannis Ch. | Vajda, Sandor
Our approach to protein-protein docking includes three main steps. First we run PIPER, a rigid body docking program based on the Fast Fourier Transform (FFT) correlation approach, extended to use pairwise interaction potentials. Next, the 1000 best energy conformations are clustered, and the 30 largest clusters are retained for refinement. Third, the stability of the clusters is analyzed by short Monte Carlo simulations, and the structures are refined by the medium-range optimization method SDU. The first two steps of this approach are implemented in the ClusPro 2.0 protein-protein docking server. Despite being fully automated, the last step is computationally too expensive to be included in the server. Comparing the models obtained in CAPRI rounds 13–19 by ClusPro, by the refinement of the ClusPro predictions, and by all predictor groups, we arrived at three conclusions. First, for the first time in CAPRI history, our automated ClusPro server was able to compete with the best human predictor groups. Second, selecting the top ranked models, our current protocol reliably generates high quality structures of protein-protein complexes from the structures of separately crystallized proteins, even in the absence of biological information, provided that there is limited backbone conformational change. Third, despite occasional successes, homology modeling requires further improvement to achieve reliable docking results.
doi:10.1002/prot.22835
PMCID: PMC3027207
PMID: 20818657
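The second step of the protocol above (cluster the best-energy conformations, keep the largest clusters) can be sketched as below. The abstract does not specify the clustering rule, so this is only an assumed greedy scheme: the structure with the most neighbors within a cutoff becomes a cluster center, its members are removed, and the process repeats. The function name `greedy_clusters`, the `radius` parameter, and the precomputed pairwise distance matrix are hypothetical.

```python
def greedy_clusters(dist, radius, max_clusters=30):
    """Greedy neighbor-count clustering (illustrative sketch).

    dist:   square list of lists, dist[i][j] = distance between
            conformations i and j (for example a pairwise RMSD)
    radius: assumed cutoff below which two conformations are neighbors
    Returns up to max_clusters (center, members) pairs, largest first.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining and len(clusters) < max_clusters:
        candidates = sorted(remaining)
        best_center, best_members = None, []
        for i in candidates:
            # Neighbors of i among conformations not yet assigned.
            members = [j for j in candidates if dist[i][j] <= radius]
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, best_members))
        remaining -= set(best_members)
    return clusters
```

Because each round picks the most populated neighborhood among the unassigned conformations, clusters come out in non-increasing order of size, so stopping after `max_clusters` rounds retains the largest ones found by this scheme.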
Structures of the influenza A virus M2 proton channel have been determined by X-ray crystallography in the open conformation, and by NMR in the closed state. Whereas the X-ray structure shows a single inhibitor molecule in the middle of the channel, four inhibitor molecules bind the channel’s outer surface in the NMR structure. Although in both structures the strongest hot spots (i.e., regions which substantially contribute to the free energy of binding any potential ligand) lie inside the pore, hot spots also are found at exterior locations. By considering all available models, we propose the primary drug binding site is inside the pore, but that exterior binding also occurs under appropriate conditions.
doi:10.1016/j.tibs.2010.03.006
PMCID: PMC2919587
PMID: 20382026
The steroid and xenobiotic-responsive human pregnane X receptor (PXR) binds a broad range of structurally diverse compounds. The structures of the apo and ligand-bound forms of PXR are very similar, in contrast to most promiscuous proteins that generally adapt their shape to different ligands. We investigated the structural origins of PXR's recognition promiscuity using computational solvent mapping, a technique developed for the identification and characterization of hot spots, i.e., regions of the protein surface that are major contributors to the binding free energy. Results reveal that the smooth and nearly spherical binding site of PXR has a well-defined hot spot structure, with four hot spots located on four different sides of the pocket and a fifth close to its center. Three of these hot spots are already present in the ligand-free protein. The most important hot spot is defined by three structurally and sequentially conserved residues, W299, F288, and Y306. This largely hydrophobic site is not very specific, and interacts with all known PXR ligands. Depending on their sizes and shapes, individual PXR ligands extend into 2, 3, or 4 more hot spot regions. The large number of potential arrangements within the binding site explains why PXR is able to accommodate a large variety of compounds. All five hot spots include at least one important residue, which is conserved in all mammalian PXRs, suggesting that the hot spot locations have remained largely invariant during mammalian evolution. The same side chains also show a high level of structural conservation across hPXR structures. However, each of the hPXR hot spots also includes residues with moveable side chains, further increasing the size variation in ligands that PXR can bind. Results also suggest a unique signal transduction mechanism between the PXR homodimerization interface and its co-activator binding site.
doi:10.1021/bi901578n
PMCID: PMC2789303
PMID: 19856963
The interactions of beta2 glycoprotein I (B2GPI) with the receptors of the low-density lipoprotein receptor (LDLR) family are implicated in the clearance of negatively charged phospholipids and apoptotic cells and, in the presence of autoimmune anti-B2GPI antibodies, in cell activation, which might play a role in the pathology of antiphospholipid syndrome (APS). The ligand-binding domains of the lipoprotein receptors consist of multiple homologous LA modules connected by flexible linkers. In this study, we investigated at the atomic level the features of the LA modules required for binding to B2GPI. To compare the binding interface in the B2GPI/LA complex to that observed in the high-resolution co-crystal structure of the receptor associated protein (RAP) with the LA modules 3 and 4 from the LDLR, we used the LA module 4 from the LDLR in our studies. Using solution NMR spectroscopy, we found that LA4 interacts with B2GPI and the binding site for B2GPI on the 15N-labeled LA4 is formed by the calcium coordinating residues of the LA module. We built a model for the complex between domain V of B2GPI (B2GPI-DV) and LA4 without introducing any experimentally derived constraints into the docking procedure. Our model, which is in agreement with the NMR data, suggests that the binding interface of B2GPI for the lipoprotein receptors is centered at three lysine residues of B2GPI-DV, Lys 308, Lys 282 and Lys 317.
doi:10.1002/prot.22519
PMCID: PMC2767435
PMID: 19676115
LDLR; lipoprotein receptors; B2GPI; beta2-glycoprotein I; PIPER; molecular docking; antiphospholipid syndrome; APS
Landon, Melissa R. | Lieberman, Raquel L. | Hoang, Quyen Q. | Ju, Shulin | Caaveiro, Jose M. M. | Orwig, Susan D. | Kozakov, Dima | Brenke, Ryan | Chuang, Gwo-Yu | Beglov, Dmitry | Vajda, Sandor | Petsko, Gregory A. | Ringe, Dagmar
The identification of hot spots, i.e. binding regions that contribute substantially to the free energy of ligand binding, is a critical step for structure-based drug design. Here we present the application of two fragment-based methods to the detection of hot spots for DJ-1 and glucocerebrosidase (GCase), targets for the development of therapeutics for Parkinson’s and Gaucher’s diseases respectively. While the structures of these two proteins are known, binding information is lacking. In this study we employ both the multiple solvent crystal structures (MSCS) method and the FTMap algorithm to identify regions suitable for the development of pharmacological chaperones for DJ-1 and GCase. Comparison of data derived via MSCS and FTMap also shows that FTMap, a computational method for the identification of fragment binding hot spots, is an accurate and robust alternative to the performance of expensive and difficult MSCS experiments.
doi:10.1007/s10822-009-9283-2
PMCID: PMC2889209
PMID: 19521672
fragment-based drug design; structure-based drug design; hot spot identification; DJ-1; glucocerebrosidase; Parkinson’s disease; Gaucher’s disease; pharmacological chaperones
The analysis of results from CAPRI (Critical Assessment of Predicted Interactions), the first communitywide experiment devoted to protein docking, shows that all successful methods consist of multiple stages. The methods belong to three classes: global methods based on fast Fourier transforms or geometric matching, medium range Monte Carlo methods, and the restraint-guided HADDOCK program. Although these classes of methods require very different amounts of information in addition to the structures of component proteins, they all share the same four computational steps: (1) simplified and/or rigid body search; (2) selecting the region(s) of interest; (3) refinement of docked structures; and (4) selecting the best models. While each method is optimal for a specific class of docking problems, combining computational steps from different methods can improve the reliability and accuracy of results.
doi:10.1016/j.sbi.2009.02.008
PMCID: PMC2763924
PMID: 19327983
Motivation: The binding sites of proteins generally contain smaller regions that provide major contributions to the binding free energy and hence are the prime targets in drug design. Screening libraries of fragment-sized compounds by NMR or X-ray crystallography demonstrates that such ‘hot spot’ regions bind a large variety of small organic molecules, and that a relatively high ‘hit rate’ is predictive of target sites that are likely to bind drug-like ligands with high affinity. Our goal is to determine the ‘hot spots’ computationally rather than experimentally.
Results: We have developed the FTMAP algorithm that performs global search of the entire protein surface for regions that bind a number of small organic probe molecules. The search is based on the extremely efficient fast Fourier transform (FFT) correlation approach which can sample billions of probe positions on dense translational and rotational grids, but can use only sums of correlation functions for scoring and hence is generally restricted to very simple energy expressions. The novelty of FTMAP is that we were able to incorporate and represent on grids a detailed energy expression, resulting in a very accurate identification of low-energy probe clusters. Overlapping clusters of different probes are defined as consensus sites (CSs). We show that the largest CS is generally located at the most important subsite of the protein binding site, and the nearby smaller CSs identify other important subsites. Mapping results are presented for elastase whose structure has been solved in aqueous solutions of eight organic solvents, and we show that FTMAP provides very similar information. The second application is to renin, a long-standing pharmaceutical target for the treatment of hypertension, and we show that the major CSs trace out the shape of the first approved renin inhibitor, aliskiren.
Availability: FTMAP is available as a server at http://ftmap.bu.edu/.
Contact: vajda@bu.edu
Supplementary information: Supplementary Material is available at Bioinformatics online.
doi:10.1093/bioinformatics/btp036
PMCID: PMC2647826
PMID: 19176554
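The FFT correlation scoring described in the abstract above, where the score is restricted to a sum of correlation functions, can be illustrated with a short NumPy sketch. It is not the FTMAP implementation: the function name `fft_translation_scores`, the grid contents, and the use of cyclic (wrap-around) translations are assumptions made for illustration. For each translation t the score is the sum over terms p and grid points x of R_p[x] * L_p[x - t].

```python
import numpy as np


def fft_translation_scores(receptor_terms, ligand_terms):
    """Score all cyclic translations of a probe grid at once (sketch).

    receptor_terms, ligand_terms: equal-length sequences of real 3D arrays
    of identical shape, one pair per term of the scoring function.
    Returns an array whose entry at index t is
        sum_p sum_x receptor_terms[p][x] * ligand_terms[p][x - t].
    """
    total = None
    for r, l in zip(receptor_terms, ligand_terms):
        # Correlation theorem: the correlation of two real grids is the
        # inverse FFT of FFT(r) times the complex conjugate of FFT(l).
        term = np.fft.ifftn(np.fft.fftn(r) * np.conj(np.fft.fftn(l))).real
        total = term if total is None else total + term
    return total
```

The best translation can then be read off with `np.unravel_index(np.argmin(scores), scores.shape)` if the grids encode energies (lower is better), or with `np.argmax` for a complementarity score. Rotations are not handled here; in this sketch they would require repeating the calculation for each rotated probe grid.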
Fast Fourier Transform (FFT) correlation methods of protein-protein docking, combined with the clustering of low energy conformations, can find a number of local minima on the energy surface. For most complexes the locations of the near-native structures can be constrained to the 30 largest clusters, each surrounding a local minimum. However, no reliable further discrimination can be obtained by energy measures because the differences in the energy levels between the minima are comparable to the errors in the energy evaluation. In fact, no current scoring function accounts for the entropic contributions that relate to the width rather than the depth of the minima. Since structures at narrow minima lose more entropy, some of the non-native states can be detected by determining whether or not a local minimum is surrounded by a broad region of attraction on the energy surface. The analysis is based on starting Monte Carlo Minimization (MCM) runs from random points around each minimum, and observing whether a certain fraction of trajectories converge to a small region within the cluster. The cluster is considered stable if such a strong attractor exists, has at least 10 convergent trajectories, is relatively close to the original cluster center, and contains a low energy structure. We studied the stability of clusters for enzyme-inhibitor and antibody-antigen complexes in the Protein Docking Benchmark. The analysis yields three main results. First, all clusters that are close to the native structure are stable. Second, restricting considerations to stable clusters eliminates around half of the false positives, i.e., solutions that are low in energy but far from the native structure of the complex. Third, dividing the conformational space into clusters and determining the stability of each cluster, the combined approach is less dependent on a priori information than exploring the potential conformational space by Monte Carlo minimizations.
doi:10.1002/prot.21997
PMCID: PMC2823634
PMID: 18300245
Fast Fourier Transform; Monte Carlo minimization; structure refinement; selection of near-native structures
This paper introduces a new stochastic global optimization method targeting protein-protein docking problems, an important class of problems in computational structural biology. The method is based on finding general convex quadratic underestimators to the binding energy function that is funnel-like. Finding the optimum underestimator requires solving a semidefinite programming problem, hence the name semidefinite programming-based underestimation (SDU). The underestimator is used to bias sampling in the search region. It is established that under appropriate conditions SDU locates the global energy minimum with probability approaching one as the sample size grows. A detailed comparison of SDU with a related method of convex global underestimator (CGU), and computational results for protein-protein docking problems are provided.
doi:10.1109/TAC.2007.894518
PMCID: PMC2744142
PMID: 19759849
Linear matrix inequalities (LMIs); optimization; protein-protein docking; semidefinite programming; structural biology
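The underestimation step named in the abstract above can be written compactly. The notation is assumed for illustration and does not appear in the abstract: x_1, ..., x_K are sampled local minima with energies f_1, ..., f_K, and U is the convex quadratic underestimator. One natural way to pose the fit, consistent with the description of a convex quadratic that lies below the sampled energies, is:

```latex
U(x) = x^{\top} Q x + b^{\top} x + c

\min_{Q,\, b,\, c} \; \sum_{i=1}^{K} \bigl( f_i - U(x_i) \bigr)
\quad \text{subject to} \quad
U(x_i) \le f_i \;\; (i = 1, \dots, K), \qquad Q \succeq 0
```

The constraints U(x_i) <= f_i keep the quadratic below every sampled energy, and Q positive semidefinite keeps it convex. Since U is linear in the unknowns (Q, b, c), the only non-polyhedral constraint is the semidefiniteness of Q, which is what makes the fit a semidefinite program. The minimizer of U then indicates where to concentrate further sampling.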
Schwede, Torsten | Sali, Andrej | Honig, Barry | Levitt, Michael | Berman, Helen M. | Jones, David | Brenner, Steven E. | Burley, Stephen K. | Das, Rhiju | Dokholyan, Nikolay V. | Dunbrack, Roland L. | Fidelis, Krzysztof | Fiser, Andras | Godzik, Adam | Huang, Yuanpeng Janet | Humblet, Christine | Jacobson, Matthew P. | Joachimiak, Andrzej | Krystek, Stanley R. | Kortemme, Tanja | Kryshtafovych, Andriy | Montelione, Gaetano T. | Moult, John | Murray, Diana | Sanchez, Roberto | Sosnick, Tobin R. | Standley, Daron M. | Stouch, Terry | Vajda, Sandor | Vasquez, Max | Westbrook, John D. | Wilson, Ian A.
Summary
We describe the proceedings and conclusions from a “Workshop on Applications of Protein Models in Biomedical Research” that was held at University of California at San Francisco on 11 and 12 July, 2008. At the workshop, international scientists involved with structure modeling explored (i) how models are currently used in biomedical research, (ii) what the requirements and challenges for different applications are, and (iii) how the interaction between the computational and experimental research communities could be strengthened to advance the field.
doi:10.1016/j.str.2008.12.014
PMCID: PMC2739730
PMID: 19217386
Motivation: Predicting how proteins interact at the molecular level is a computationally intensive task. Many protein docking algorithms begin by using fast Fourier transform (FFT) correlation techniques to find putative rigid body docking orientations. Most such approaches use 3D Cartesian grids and are therefore limited to computing three dimensional (3D) translational correlations. However, translational FFTs can speed up the calculation in only three of the six rigid body degrees of freedom, and they cannot easily incorporate prior knowledge about a complex to focus and hence further accelerate the calculation. Furthermore, several groups have developed multi-term interaction potentials and others use multi-copy approaches to simulate protein flexibility, which both add to the computational cost of FFT-based docking algorithms. Hence there is a need to develop more powerful and more versatile FFT docking techniques.
Results: This article presents a closed-form 6D spherical polar Fourier correlation expression from which arbitrary multi-dimensional multi-property multi-resolution FFT correlations may be generated. The approach is demonstrated by calculating 1D, 3D and 5D rotational correlations of 3D shape and electrostatic expansions up to polynomial order L=30 on a 2 GB personal computer. As expected, 3D correlations are found to be considerably faster than 1D correlations but, surprisingly, 5D correlations are often slower than 3D correlations. Nonetheless, we show that 5D correlations will be advantageous when calculating multi-term knowledge-based interaction potentials. When docking the 84 complexes of the Protein Docking Benchmark, blind 3D shape plus electrostatic correlations take around 30 minutes on a contemporary personal computer and find acceptable solutions within the top 20 in 16 cases. Applying a simple angular constraint to focus the calculation around the receptor binding site produces acceptable solutions within the top 20 in 28 cases. Further constraining the search to the ligand binding site gives up to 48 solutions within the top 20, with calculation times of just a few minutes per complex. Hence the approach described provides a practical and fast tool for rigid body protein-protein docking, especially when prior knowledge about one or both binding sites is available.
Availability: http://www.csd.abdn.ac.uk/hex/
Contact: d.w.ritchie@abdn.ac.uk
doi:10.1093/bioinformatics/btn334
PMCID: PMC2732220
PMID: 18591193
Similarly to protein folding, the association of two proteins is driven by a free energy funnel, determined by favorable interactions in some neighborhood of the native state. We describe a docking method based on stochastic global minimization of funnel-shaped energy functions in the space of rigid body motions (SE(3)) while accounting for flexibility of the interface side chains. The method, called semi-definite programming-based underestimation (SDU), employs a general quadratic function to underestimate a set of local energy minima and uses the resulting underestimator to bias further sampling. While SDU effectively minimizes functions with funnel-shaped basins, its application to docking in the rotational and translational space SE(3) is not straightforward due to the geometry of that space. We introduce a strategy that uses separate independent variables for side-chain optimization, center-to-center distance of the two proteins, and five angular descriptors of the relative orientations of the molecules. The removal of the center-to-center distance turns out to vastly improve the efficiency of the search, because the five-dimensional space now exhibits a well-behaved energy surface suitable for underestimation. This algorithm explores the free energy surface spanned by encounter complexes that correspond to local free energy minima and shows similarity to the model of macromolecular association that proceeds through a series of collisions. Results for standard protein docking benchmarks establish that in this space the free energy landscape is a funnel in a reasonably broad neighborhood of the native state and that the SDU strategy can generate docking predictions with less than 5 Å ligand interface Cα root-mean-square deviation while achieving an approximately 20-fold efficiency gain compared to Monte Carlo methods.
Author Summary
Protein–protein interactions play a central role in various aspects of the structural and functional organization of the cell, and their elucidation is crucial for a better understanding of processes such as metabolic control, signal transduction, and gene regulation. Genomewide proteomics studies, primarily yeast two-hybrid assays, will provide an increasing list of interacting proteins, but only a small fraction of the potential complexes will be amenable to direct experimental analysis. Thus, it is important to develop computational docking methods that can elucidate the details of specific interactions at the atomic level. Protein–protein docking generally starts with a rigid body search that generates a large number of docked conformations with good shape, electrostatic, and chemical complementarity. The conformations are clustered to obtain a manageable number of models, but the current methods are unable to select the most likely structure among these models. Here we describe a refinement algorithm that, applied to the individual clusters, improves the quality of the models. The better models are suitable for higher-accuracy energy calculation, thereby increasing the chances that near-native structures can be identified, and thus the refinement increases the reliability of the entire docking algorithm.
doi:10.1371/journal.pcbi.1000191
PMCID: PMC2538569
PMID: 18846200
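The underestimation step described in the abstract above can be sketched in one dimension. SDU itself fits a general quadratic by semi-definite programming in several dimensions; the stand-in below instead fits a convex parabola U(x) = p*x^2 + q*x + r that lies on or below every sampled local minimum and minimizes the total gap, which in 1D is a small linear program solved here by enumerating its vertices. The function names and the sample minima are hypothetical.

```python
from itertools import combinations

def solve3(rows, rhs):
    """Solve a 3x3 linear system by Cramer's rule; return None if singular."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(rows)
    if abs(d) < 1e-12:
        return None
    solution = []
    for col in range(3):
        m = [list(row) for row in rows]
        for i in range(3):
            m[i][col] = rhs[i]
        solution.append(det(m) / d)
    return tuple(solution)

def quadratic_underestimator(minima, tol=1e-9):
    """minima: list of (x, f). Return (p, q, r) with p >= 0 and U(x_i) <= f_i."""
    # Each constraint is stored in its active (equality) form: coefficients, right-hand side.
    constraints = [((x * x, x, 1.0), f) for x, f in minima]
    constraints.append(((1.0, 0.0, 0.0), 0.0))  # convexity bound p >= 0, active as p = 0
    best, best_gap = None, float("inf")
    for trio in combinations(constraints, 3):
        sol = solve3([c[0] for c in trio], [c[1] for c in trio])
        if sol is None:
            continue
        p, q, r = sol
        gaps = [f - (p * x * x + q * x + r) for x, f in minima]
        if p < -tol or min(gaps) < -tol:
            continue  # not convex, or rises above one of the minima
        if sum(gaps) < best_gap:
            best, best_gap = (p, q, r), sum(gaps)
    return best

# Hypothetical local minima (position, energy) scattered over a funnel.
minima = [(-1.0, -1.0), (0.0, -3.0), (1.0, -5.0), (2.0, -3.5), (3.0, -1.0)]
p, q, r = quadratic_underestimator(minima)
funnel_bottom = -q / (2 * p)  # where the underestimator is lowest
```

In this toy case the fitted parabola touches three of the minima and its lowest point sits at the deepest one, so a sampler biased toward `funnel_bottom` would concentrate new trial points near the bottom of the funnel.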
We propose a new computational approach for protein docking exploiting energy funnels in the 6-dimensional space of translations and rotations of the ligand with respect to the receptor. Our approach consists of a series of translational and orientational moves of the ligand towards the receptor. Each move is performed using a global optimization method we have developed – the Semi-Definite Underestimation (SDU) method – which can exploit a funnel-like energy function. We compared our approach with Monte Carlo on a set of 10 protein complexes using two residue-level potentials. To achieve the same level of performance (producing a near-native complex with ≤ 3 Å RMSD), our approach reduces the number of energy evaluations by more than a factor of two, on average.
doi:10.1109/IEMBS.2006.260790
PMCID: PMC2446401
PMID: 17946298
Computational biology; Global optimization; Semi-definite programming; Molecular docking
The influenza virus subtype H5N1 has raised concerns of a possible human pandemic threat because of its high virulence and mutation rate. Although several approved anti-influenza drugs effectively target the neuraminidase, some strains have already acquired resistance to the currently available anti-influenza drugs. In this study, we present the synergistic application of extended explicit solvent molecular dynamics (MD) and computational solvent mapping (CS-Map) to identify putative ‘hot spots’ within flexible binding regions of N1 neuraminidase. Using representative conformations of the N1 binding region extracted from a clustering analysis of four concatenated 40-ns MD simulations, CS-Map was utilized to assess the ability of small, solvent-sized molecules to bind within close proximity to the sialic acid binding region. Mapping analyses of the dominant MD conformations reveal the presence of additional hot spot regions in the 150- and 430-loop regions. Our hot spot analysis provides further support for the feasibility of developing high-affinity inhibitors capable of binding these regions, which appear to be unique to the N1 strain.
doi:10.1111/j.1747-0285.2007.00614.x
PMCID: PMC2438278
PMID: 18205727
computational solvent mapping; ensemble-based drug design; H5N1; hot spot; molecular dynamics; neuraminidase; receptor flexibility; RMSD clustering
PRECISE (Predicted and Consensus Interaction Sites in Enzymes) is a database of interactions between the amino acid residues of an enzyme and its ligands (substrate and transition state analogs, cofactors, inhibitors and products). It is available online at http://precise.bu.edu/. In the current version, all information on interactions is extracted from the enzyme–ligand complexes in the Protein Data Bank (PDB) by performing the following steps: (i) clustering homologous enzyme chains such that, in each cluster, the proteins have the same EC number and all sequences are similar; (ii) selecting a representative chain for each cluster; (iii) selecting ligand types; (iv) finding non-bonded interactions and hydrogen bonds; and (v) summing the interactions for all chains within the cluster. The output of the search is the color-coded sequence of the representative. The colors indicate the total number of interactions found at each amino acid position in all chains of the cluster. Clicking on a residue displays a detailed list of interactions for that residue. Optional filters allow restricting the output to selected chains in the cluster, to non-bonded or hydrogen bonding interactions, and to selected ligand types. The binding site information is essential for understanding and altering substrate specificity and for the design of enzyme inhibitors.
doi:10.1093/nar/gki091
PMCID: PMC540045
PMID: 15608178
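Step (v) and the optional filters described above amount to a per-position tally over the chains of a cluster. A minimal sketch follows, assuming that contacts have already been mapped onto the numbering of the representative chain; the data layout, chain labels, interaction labels, and function name are hypothetical and do not reflect the PRECISE implementation.

```python
from collections import Counter

def sum_interactions(cluster, kind=None, chains=None):
    """Count interactions per residue position across the chains of one cluster.

    cluster: {chain_id: [(position, interaction_kind), ...]}
    kind:    keep only this interaction kind (e.g. "hbond"), or None for all
    chains:  keep only these chain ids, or None for all
    """
    totals = Counter()
    for chain_id, contacts in cluster.items():
        if chains is not None and chain_id not in chains:
            continue
        for position, contact_kind in contacts:
            if kind is None or contact_kind == kind:
                totals[position] += 1
    return totals

# Hypothetical cluster of three homologous chains.
cluster = {
    "chain_A": [(57, "hbond"), (102, "nonbonded"), (195, "hbond")],
    "chain_B": [(57, "nonbonded"), (195, "hbond")],
    "chain_C": [(57, "hbond"), (195, "nonbonded"), (214, "nonbonded")],
}
totals = sum_interactions(cluster)
hbonds_only = sum_interactions(cluster, kind="hbond")
single_chain = sum_interactions(cluster, chains={"chain_A"})
```

The resulting counts per position are the kind of quantity a color scale over the representative sequence could encode.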
Consensus is a server developed to produce high-quality alignments for comparative modeling, and to identify the alignment regions reliable for copying from a given template. This is accomplished even when target–template sequence identity is as low as 5%. Combining the output from five different alignment methods, the server produces a consensus alignment, with a reliability measure indicated for each position and a prediction of the regions suitable for modeling. Models built using the server predictions are typically within 3 Å rms deviations from the crystal structure. Users can upload a target protein sequence and specify a template (PDB code); if no template is given, the server will search for one. The method has been validated on a large set of homologous protein structure pairs. The Consensus server should prove useful for modelers for whom the structural reliability of the model is critical in their applications. It is currently available at http://structure.bu.edu/cgi-bin/consensus/consensus.cgi.
doi:10.1093/nar/gkh456
PMCID: PMC441594
PMID: 15215349
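One simple way to combine several alignments into a consensus with a per-position reliability is majority voting, sketched below. The abstract does not specify how the server computes its reliability measure, so the agreement fraction, the threshold, the data layout, and the function names here are assumptions for illustration only.

```python
from collections import Counter

def consensus_alignment(alignments):
    """Combine alignments by majority vote at each target position.

    alignments: one list per method; entry i is the template position aligned
    to target position i, or None for a gap. Returns (choice, agreement) pairs.
    """
    n_methods = len(alignments)
    result = []
    for column in zip(*alignments):
        choice, count = Counter(column).most_common(1)[0]
        result.append((choice, count / n_methods))
    return result

def reliable_positions(consensus, threshold=0.8):
    """Target positions whose agreement fraction meets the (assumed) threshold."""
    return [i for i, (choice, score) in enumerate(consensus)
            if choice is not None and score >= threshold]

# Five hypothetical alignment methods over four target positions.
alignments = [
    [10, 11, 12, 13],
    [10, 11, 12, 14],
    [10, 11, None, 14],
    [10, 12, 12, 14],
    [10, 11, 12, None],
]
consensus = consensus_alignment(alignments)
reliable = reliable_positions(consensus)
```

Positions where the methods disagree receive a lower score and would be excluded from the regions copied from the template under this scheme.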