Side-chain positioning (SCP) is an important component of computational protein docking methods. Existing SCP methods and available software have been designed for protein folding applications where side-chain positioning is also important. As a result they do not take into account significant special structure that SCP for docking exhibits. We propose a new algorithm which poses SCP as a Maximum Weighted Independent Set (MWIS) problem on an appropriately constructed graph. We develop an approximate algorithm which solves a relaxation of the MWIS and then rounds the solution to obtain a high-quality feasible solution to the problem. The algorithm is fully distributed and can be executed on a large network of processing nodes requiring only local information and message-passing between neighboring nodes. Motivated by the special structure in docking, we establish optimality guarantees for a certain class of graphs. Our results on a benchmark set of enzyme-inhibitor protein complexes show that our predictions are close to the native structure and are comparable to the ones obtained by a state-of-the-art method. The results are substantially improved if rotamers from unbound protein structures are included in the search. We also establish that the use of our SCP algorithm substantially improves docking results.
Our work is motivated by energy minimization of biological macromolecules, an essential step in computational docking. By allowing some ligand flexibility, we generalize a recently introduced novel representation of rigid body minimization as an optimization on the SO(3)×R3 manifold, rather than on the commonly used Special Euclidean group SE(3). We show that the resulting flexible docking can also be formulated as an optimization on a Lie group that is the direct product of simpler Lie groups for which geodesics and exponential maps can be easily obtained. Our computational results for a local optimization algorithm developed based on this formulation show that it is about an order of magnitude faster than the state-of-the-art local minimization algorithms for computational protein-small molecule docking.
Computational solvent mapping finds binding hot spots, determines their druggability and provides information for drug design. While mapping of a ligand-bound structure yields more accurate results, usually the apo structure serves as the starting point in design. The FTFlex algorithm, implemented as a server, can modify an apo structure to yield mapping results that are similar to those of the respective bound structure. Thus, FTFlex is an extension of our FTMap server, which only considers rigid structures. FTFlex identifies flexible residues within the binding site and determines alternative conformations using a rotamer library. In cases where the mapping results of the apo structure were in poor agreement with those of the bound structure, FTFlex was able to yield a modified apo structure, which led to improved FTMap results. In cases where the mapping results of the apo and bound structures were in good agreement, no new structure was predicted.
Availability: FTFlex is freely available as a web-based server at http://ftflex.bu.edu/.
email@example.com or firstname.lastname@example.org
Supplementary data are available at Bioinformatics online.
We report a comprehensive analysis of binding energy hot spots at the protein-protein interaction (PPI) interface between NF-κB Essential Modulator (NEMO) and IκB kinase subunit β (IKKβ), an interaction that is critical for NF-κB pathway signaling, using experimental alanine scanning mutagenesis and also the FTMap method for computational fragment screening. The experimental results confirm that the previously identified NBD region of IKKβ contains the highest concentration of hot spot residues, the strongest of which are W739, W741 and L742 (ΔΔG = 4.3, 3.5 and 3.2 kcal/mol, respectively). The region occupied by these residues defines a potentially druggable binding site on NEMO that extends for ~16 Å to additionally include the regions that bind IKKβ L737 and F734. NBD residues D738 and S740 are also important for binding but do not make direct contact with NEMO, instead likely acting to stabilize the active conformation of surrounding residues. We additionally found two previously unknown hot spot regions centered on IKKβ residues L708/V709 and L719/I723. The computational approach successfully identified all three hot spot regions on IKKβ. Moreover, the method was able to accurately quantify the energetic importance of all hot spot residues involving direct contact with NEMO. Our results provide new information to guide the discovery of small molecule inhibitors that target the NEMO/IKKβ interaction. They additionally clarify the structural and energetic complementarity between “pocket-forming” and “pocket-occupying” hot spot residues, and further validate computational fragment mapping as a method for identifying hot spots at PPI interfaces.
IKKγ; alanine scanning mutagenesis; protein-protein interactions; fluorescence polarization; fluorescence anisotropy
Our work is motivated by energy minimization in the space of rigid affine transformations of macromolecules, an essential step in computational protein-protein docking. We introduce a novel representation of rigid body motion that leads to a natural formulation of the energy minimization problem as an optimization on the SO(3)×R3 manifold, rather than the commonly used SE(3). The new representation avoids the complications associated with optimization on the SE(3) manifold and provides additional flexibility for optimization not available in that formulation. The approach is applicable to general rigid body minimization problems. Our computational results for a local optimization algorithm developed based on the new approach show that it is about an order of magnitude faster than state-of-the-art local minimization algorithms for computational protein-protein docking.
An outstanding challenge has been to understand the mechanism whereby proteins associate. We report here the results of exhaustively sampling the conformational space in protein–protein association using a physics-based energy function. The agreement between experimental intermolecular paramagnetic relaxation enhancement (PRE) data and the PRE profiles calculated from the docked structures shows that the method captures both specific and non-specific encounter complexes. To explore the energy landscape in the vicinity of the native structure, the nonlinear manifold describing the relative orientation of two solid bodies is projected onto a Euclidean space in which the shape of low energy regions is studied by principal component analysis. Results show that the energy surface is canyon-like, with a smooth funnel within a two dimensional subspace capturing over 75% of the total motion. Thus, proteins tend to associate along preferred pathways, similar to sliding of a protein along DNA in the process of protein-DNA recognition.
Proteins rarely act alone. Instead, they tend to bind to other proteins to form structures known as complexes. When two proteins come together to form a complex, they twist and turn through a series of intermediate states before they form the actual complex. These intermediate states are difficult to study because they don’t last for very long, which means that our knowledge of how complexes are formed remains incomplete.
One promising approach for studying the formation of complexes is called paramagnetic relaxation enhancement. In this technique certain areas in one of the proteins are labelled with magnetic particles, which produce signals when the two proteins are close to each other. Repeating the measurement several times with the magnetic particles in different positions provides information about the overall structure of the complex. Computational modelling can then be used to work out the fine details of the structure, including the shapes of the intermediate structures made by the proteins as they interact.
A computer method called docking can be used to predict the most favourable positions that the proteins can take, relative to one another, in a complex. This involves calculating the energy contained in the system, with the correct structure having the lowest energy. Docking methods also predict protein models with slightly higher energies, but with structures that are radically different. Modellers usually ignore these structures, but comparing the docking results to paramagnetic relaxation enhancement data, Kozakov et al. found that these structures actually represent the intermediate states.
Analysing the structure of the intermediate states revealed that the movement of the two proteins relative to one another is severely restricted as they form the final complex. Kozakov et al. found that proteins associate along preferred pathways, similar to the way a protein slides along DNA in the process of protein-DNA recognition. Knowing that the movement of the proteins is restricted in this way will enable researchers to improve the efficiency of docking calculations.
encounter landscapes; FFT sampling; protein–protein interactions
Virtually all docking methods include some local continuous minimization of an energy/scoring function in order to remove steric clashes and obtain more reliable energy values. In this paper, we describe an efficient rigid-body optimization algorithm that, compared to the most widely used algorithms, converges approximately an order of magnitude faster to conformations with equal or slightly lower energy. The space of rigid body transformations is a nonlinear manifold, namely, a space which locally resembles a Euclidean space. We use a canonical parametrization of the manifold, called the exponential parametrization, to map the Euclidean tangent space of the manifold onto the manifold itself. Thus, we locally transform the rigid body optimization to an optimization over a Euclidean space where basic optimization algorithms are applicable. Compared to commonly used methods, this formulation substantially reduces the dimension of the search space. As a result, it requires far fewer costly function and gradient evaluations and leads to a more efficient algorithm. We have selected the LBFGS quasi-Newton method for local optimization since it uses only gradient information to obtain second order information about the energy function and avoids the far more costly direct Hessian evaluations. Two applications, one in protein-protein docking and the other in protein-small molecule interactions, are presented as part of macromolecular docking protocols. The code is available to the community under an open source license, and with minimal effort can be incorporated into any molecular modeling package.
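As a rough illustration of the exponential parametrization mentioned above, the following minimal Python sketch maps a six-component Euclidean parameter vector to a rigid body transformation. It assumes the rotation is obtained from a rotation vector by Rodrigues' formula and the translation is an ordinary 3-vector; the function names `exp_so3` and `apply_rigid` are hypothetical and are not taken from the published code, which may differ in detail.

```python
import math

def exp_so3(w):
    """Rodrigues' formula: rotation vector w (axis times angle) -> 3x3 rotation matrix."""
    wx, wy, wz = w
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    # Skew-symmetric matrix K such that K x = w cross x
    K = [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    if theta < 1e-8:
        a, b = 1.0, 0.5  # limits of sin(t)/t and (1 - cos(t))/t^2 as t -> 0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]

def apply_rigid(params, point):
    """Apply x -> R(w) x + t, where params = (wx, wy, wz, tx, ty, tz)."""
    R = exp_so3(params[:3])
    t = params[3:]
    return [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
```

Under this assumed parametrization, a local optimizer such as LBFGS only ever sees the six unconstrained numbers in `params`; the rotation matrix is orthogonal by construction, so no constraint handling is needed.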
Motivation: An effective docking algorithm for antibody–protein antigen complex prediction is an important first step toward design of biologics and vaccines. We have recently developed a new class of knowledge-based interaction potentials called Decoys as the Reference State (DARS) and incorporated DARS into the docking program PIPER based on the fast Fourier transform correlation approach. Although PIPER was the best performer in the latest rounds of the CAPRI protein docking experiment, it is much less accurate for docking antibody–protein antigen pairs than other types of complexes, in spite of incorporating sequence-based information on the location of the paratope. Analysis of antibody–protein antigen complexes has revealed an inherent asymmetry within these interfaces. Specifically, phenylalanine, tryptophan and tyrosine residues highly populate the paratope of the antibody but not the epitope of the antigen.
Results: Since this asymmetry cannot be adequately modeled using a symmetric pairwise potential, we have removed the usual assumption of symmetry. Interaction statistics were extracted from antibody–protein complexes under the assumption that a particular atom on the antibody is different from the same atom on the antigen protein. The use of the new potential significantly improves the performance of docking for antibody–protein antigen complexes, even without any sequence information on the location of the paratope. We note that the asymmetric potential captures the effects of the multi-body interactions inherent to the complex environment in the antibody–protein antigen interface.
Availability: The method is implemented in the ClusPro protein docking server, available at http://cluspro.bu.edu.
email@example.com or firstname.lastname@example.org
Supplementary data are available at Bioinformatics online.
In the context of protein-protein interactions, the term “hot spot” refers to a residue or cluster of residues that makes a major contribution to the binding free energy, as determined by alanine scanning mutagenesis. In contrast, in pharmaceutical research a hot spot is a site on a target protein that has high propensity for ligand binding and hence is potentially important for drug discovery. Here we examine the relationship between these two hot spot concepts by comparing alanine scanning data for a set of 15 proteins with results from mapping the protein surfaces for sites that can bind fragment-sized small molecules. We find the two types of hot spots are largely complementary; the residues protruding into hot spot regions identified by computational mapping or experimental fragment screening are almost always themselves hot spot residues as defined by alanine scanning experiments. Conversely, a residue that is found by alanine scanning to contribute little to binding rarely interacts with hot spot regions on the partner protein identified by fragment mapping. Despite this strong correlation, the two hot spot concepts differ fundamentally. In particular, while identification of a hot spot by alanine scanning establishes the potential to generate substantial interaction energy with a binding partner, there are additional topological requirements to be a hot spot for small molecule binding. Hence, only a minority of hot spots identified by alanine scanning represent sites that are potentially useful for small inhibitor binding, and it is this subset that is identified by experimental or computational fragment screening.
The goal of this paper is to reduce the complexity of the side chain search within docking problems. We apply six methods of generating side chain conformers to unbound protein structures, and determine their ability to obtain the bound conformation in small ensembles of conformers. Methods are evaluated in terms of the positions of side chain end groups. Results for 68 protein complexes yield two important observations. First, the end group positions change less than 1 Å upon association for over 60% of interface side chains. Thus, the unbound protein structure carries substantial information about the side chains in the bound state, and the inclusion of the unbound conformation into the ensemble of conformers is very beneficial. Second, considering each surface side chain separately in its protein environment, small ensembles of low energy states include the bound conformation for a large fraction of side chains. In particular, the ensemble consisting of the unbound conformation and the two highest probability predicted conformers includes the bound conformer with an accuracy of 1 Å for 78% of interface side chains. Since more than 60% of the interface side chains have only one conformer and many others only a few, these ensembles of low energy states substantially reduce the complexity of side chain search in docking problems. This approach was already used for finding pockets in protein-protein interfaces that can bind small molecules to potentially disrupt protein-protein interactions. Side chain search with the reduced search space will also be incorporated into protein docking algorithms.
rotamer libraries; side chain flexibility; protein binding; structure prediction; preexisting ensemble of conformers
We introduce a message-passing algorithm to solve the Side Chain Positioning (SCP) problem. SCP is a crucial component of protein docking refinement, which is a key step of an important class of problems in computational structural biology called protein docking. We model SCP as a combinatorial optimization problem and formulate it as a Maximum Weighted Independent Set (MWIS) problem. We then employ a modified and convergent belief-propagation algorithm to solve a relaxation of MWIS and develop randomized estimation heuristics that use the relaxed solution to obtain an effective MWIS feasible solution. Using a benchmark set of protein complexes we demonstrate that our approach leads to more accurate docking predictions compared to a baseline algorithm that does not solve the SCP.
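To make the MWIS view of side-chain positioning concrete, here is a toy Python sketch. It rests on simplifying assumptions that are not stated in the abstract: each node is a hypothetical (residue, rotamer) pair with a positive favorability weight, and edges join rotamers of the same residue as well as rotamer pairs assumed to clash. It also swaps the belief-propagation relaxation and randomized rounding for plain exhaustive search, which is only feasible for tiny graphs.

```python
from itertools import combinations

def max_weight_independent_set(weights, edges):
    """Exhaustively find the heaviest node set with no edge inside it (toy sizes only)."""
    nodes = list(weights)
    conflict = {frozenset(e) for e in edges}
    best_set, best_w = (), 0.0
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            # Skip any subset that contains two conflicting nodes
            if any(frozenset(p) in conflict for p in combinations(subset, 2)):
                continue
            w = sum(weights[v] for v in subset)
            if w > best_w:
                best_set, best_w = subset, w
    return set(best_set), best_w

# Hypothetical toy instance: node = (residue, rotamer index), weight = favorability.
weights = {("A", 0): 3.0, ("A", 1): 2.0, ("B", 0): 2.5, ("B", 1): 1.0}
edges = [
    (("A", 0), ("A", 1)),  # at most one rotamer per residue
    (("B", 0), ("B", 1)),
    (("A", 0), ("B", 0)),  # assumed steric clash between these two rotamers
]
chosen, total = max_weight_independent_set(weights, edges)
```

In this instance the two individually best rotamers clash, so the heaviest independent set pairs the second rotamer of residue A with the first rotamer of residue B.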
Fragment based drug design (FBDD) starts with finding fragment-sized compounds that are highly ligand efficient and can serve as a core moiety for developing high affinity leads. Although the core-bound structure of a protein facilitates the construction of leads, effective design is far from straightforward. We show that protein mapping, a computational method developed to find binding hot spots and implemented as the FTMap server, provides information that complements the fragment screening results and can drive the evolution of core fragments into larger leads with a minimal loss or, in some cases, even a gain in ligand efficiency. The method places small molecular probes, the size of organic solvents, on a dense grid around the protein, and identifies the hot spots as consensus clusters formed by clusters of several probes. The hot spots are ranked based on the number of probe clusters, which predicts the binding propensity of the subsites and hence their importance for drug design. Accordingly, with a single exception the main hot spot identified by FTMap binds the core compound found by fragment screening. The most useful information is provided by the neighboring secondary hot spots, indicating the regions where the core can be extended to increase its affinity. To quantify this information, we calculate the density of probes from mapping, which describes the binding propensity at each point, and show that the change in the correlation between a ligand position and the probe density upon extending or repositioning the core moiety predicts the expected change in ligand efficiency.
Protein mapping; protein docking; drug design; ligand efficiency; affinity prediction
Motivation: Binding site identification is a classical problem that is important for a range of applications, including the structure-based prediction of function, the elucidation of functional relationships among proteins, protein engineering and drug design. We describe an accurate method of binding site identification, namely FTSite. This method is based on experimental evidence that ligand binding sites also bind small organic molecules of various shapes and polarity. The FTSite algorithm does not rely on any evolutionary or statistical information, but achieves near experimental accuracy: it is capable of identifying the binding sites in over 94% of apo proteins from established test sets that have been used to evaluate many other binding site prediction methods.
Availability: FTSite is freely available as a web-based server at http://ftsite.bu.edu.
Contact: email@example.com; firstname.lastname@example.org
Supplementary information: Supplementary data are available at Bioinformatics online.
Computational solvent mapping globally samples the surface of target proteins using molecular probes – small molecules or functional groups – to identify potentially favorable binding positions. The method is based on X-ray and NMR screening studies showing that the binding sites of proteins also bind a large variety of fragment-sized molecules. We have developed the multi-stage mapping algorithm FTMap (available as a server at http://ftmap.bu.edu/) based on the fast Fourier transform (FFT) correlation approach. Identifying regions of low free energy rather than individual low energy conformations, FTMap reproduces the available experimental mapping results. Applications to a variety of proteins show that the probes always cluster in important subsites of the binding site, and the amino acid residues that interact with many probes also bind the specific ligands of the protein. The “consensus” sites at which a number of different probes cluster are likely to be “druggable” sites, capable of binding drug-size ligands with high affinity. Due to its sensitivity to conformational changes the method can also be used for comparing the binding sites in different structures of a protein.
Protein structure; protein-ligand interactions; binding site; binding hot spots; fragment-based ligand design; druggability; binding site comparison; docking
We have recently discovered an allosteric switch in Ras, bringing an additional level of complexity to this GTPase whose mutants are involved in nearly 30% of cancers. Upon activation of the allosteric switch, there is a shift in helix 3/loop 7 associated with a disorder to order transition in the active site. Here, we use a combination of multiple solvent crystal structures and computational solvent mapping (FTMap) to determine binding site hot spots in the “off” and “on” allosteric states of the GTP-bound form of H-Ras. Thirteen sites are revealed, expanding possible target sites for ligand binding well beyond the active site. Comparison of FTMaps for the H and K isoforms reveals essentially identical hot spots. Furthermore, using NMR measurements of spin relaxation, we determined that K-Ras exhibits global conformational dynamics very similar to those we previously reported for H-Ras. We thus hypothesize that the global conformational rearrangement serves as a mechanism for allosteric coupling between the effector interface and remote hot spots in all Ras isoforms. At least with respect to the binding sites involving the G domain, H-Ras is an excellent model for K-Ras and probably N-Ras as well. Ras has so far been elusive as a target for drug design. The present work identifies various unexplored hot spots throughout the entire surface of Ras, extending the focus from the disordered active site to well-ordered locations that should be easier to target.
Ras isoforms; drug target; binding site hot spots; Ras dynamics; allosteric switch
Creating new molecules that simultaneously enhance tumor cell killing and permit diagnostic tracking is vital to overcoming the limitations rendering current therapeutic regimens for terminal cancers ineffective. Accordingly, we investigated the efficacy of an innovative new multi-functional targeted anti-cancer molecule, SM7L, using models of the lethal brain tumor Glioblastoma multiforme (GBM). Designed using predictive computer modeling, SM7L incorporates the therapeutic activity of the promising anti-tumor cytokine MDA-7/IL-24, an enhanced secretory domain, and a diagnostic domain for non-invasive tracking. In vitro assays revealed the diagnostic domain of SM7L produced robust photon emission, while the therapeutic domain showed marked anti-tumor efficacy and significant modulation of p38MAPK and ERK pathways. In vivo, the unique multi-functional nature of SM7L allowed simultaneous real-time monitoring of both SM7L delivery and anti-tumor efficacy. Utilizing engineered stem cells as novel delivery vehicles for SM7L therapy (SC-SM7L), we demonstrate that SC-SM7L significantly improved pharmacokinetics and attenuated progression of established peripheral and intracranial human GBM xenografts. Furthermore, SC-SM7L anti-tumor efficacy was augmented in vitro and in vivo by concurrent activation of caspase-mediated apoptosis induced by adjuvant SC-mediated S-TRAIL delivery. Collectively, these studies define a promising new approach to treating highly aggressive cancers, including GBM, using the optimized therapeutic molecule SM7L.
Formaldehyde has long been recognized as a hazardous environmental agent highly reactive with DNA. Recently, it has been realized that due to the activity of histone demethylation enzymes within the cell nucleus, formaldehyde is produced endogenously, in direct vicinity of genomic DNA. Should it lead to extensive DNA damage? We address this question with the aid of a computational mapping method, analogous to X-ray and nuclear magnetic resonance techniques for observing weakly specific interactions of small organic compounds with a macromolecule in order to establish important functional sites. We concentrate on the leading reaction of formaldehyde with free bases: hydroxymethylation of cytosine amino groups. Our results show that in B-DNA, cytosine amino groups are totally inaccessible for the formaldehyde attack. Then, we explore the effect of recently discovered transient flipping of Watson–Crick (WC) pairs into Hoogsteen (HG) pairs (HG breathing). Our results show that the HG base pair formation dramatically affects the accessibility for formaldehyde of cytosine amino nitrogens within WC base pairs adjacent to HG base pairs. The extensive literature on DNA interaction with formaldehyde is analyzed in light of the new findings. The obtained data emphasize the significance of DNA HG breathing.
Binding hot spots, protein sites with high binding affinity, can be identified using X-ray crystallography or NMR by screening libraries of small organic molecules that tend to cluster at such regions. FTMAP, a direct computational analog of the experimental screening approaches, globally samples the surface of a target protein using small organic molecules as probes, finds favorable positions, clusters the conformations and ranks the clusters on the basis of the average energy. The regions that bind several probe clusters predict the binding hot spots, in good agreement with experimental results. Small molecules discovered by fragment-based approaches to drug design also bind at the hot spot regions. To identify such molecules and their most likely bound positions, we extend the functionality of FTMAP (http://ftmap.bu.edu/param) to accept any small molecule as an additional probe. In its updated form, FTMAP identifies the hot spots based on a standard set of probes, and for each additional probe shows representative structures of nearby low energy clusters. This approach helps to predict bound poses of the user-selected molecules, detects if a compound is not likely to bind in the hot spot region, and provides input for the design of larger ligands.
The interactions of beta2 glycoprotein I (B2GPI) with the receptors of the low-density lipoprotein receptor (LDLR) family are implicated in the clearance of negatively charged phospholipids and apoptotic cells and, in the presence of autoimmune anti-B2GPI antibodies, in cell activation, which might play a role in the pathology of antiphospholipid syndrome (APS). The ligand-binding domains of the lipoprotein receptors consist of multiple homologous LA modules connected by flexible linkers. In this study, we investigated at the atomic level the features of the LA modules required for binding to B2GPI. To compare the binding interface in the B2GPI/LA complex to that observed in the high-resolution co-crystal structure of the receptor associated protein (RAP) with the LA modules 3 and 4 from the LDLR, we used the LA module 4 from the LDLR in our studies. Using solution NMR spectroscopy, we found that LA4 interacts with B2GPI and that the binding site for B2GPI on the 15N-labeled LA4 is formed by the calcium coordinating residues of the LA module. We built a model for the complex between domain V of B2GPI (B2GPI-DV) and LA4 without introducing any experimentally derived constraints into the docking procedure. Our model, which is in agreement with the NMR data, suggests that the binding interface of B2GPI for the lipoprotein receptors is centered at three lysine residues of B2GPI-DV: Lys308, Lys282 and Lys317.
LDLR; lipoprotein receptors; B2GPI; beta2-glycoprotein I; PIPER; molecular docking; antiphospholipid syndrome; APS
The analysis of results from CAPRI (Critical Assessment of Predicted Interactions), the first communitywide experiment devoted to protein docking, shows that all successful methods consist of multiple stages. The methods belong to three classes: global methods based on fast Fourier transforms or geometric matching, medium range Monte Carlo methods, and the restraint-guided HADDOCK program. Although these classes of methods require very different amounts of information in addition to the structures of component proteins, they all share the same four computational steps: (1) simplified and/or rigid body search; (2) selecting the region(s) of interest; (3) refinement of docked structures; and (4) selecting the best models. While each method is optimal for a specific class of docking problems, combining computational steps from different methods can improve the reliability and accuracy of results.
Motivation: The binding sites of proteins generally contain smaller regions that provide major contributions to the binding free energy and hence are the prime targets in drug design. Screening libraries of fragment-sized compounds by NMR or X-ray crystallography demonstrates that such ‘hot spot’ regions bind a large variety of small organic molecules, and that a relatively high ‘hit rate’ is predictive of target sites that are likely to bind drug-like ligands with high affinity. Our goal is to determine the ‘hot spots’ computationally rather than experimentally.
Results: We have developed the FTMAP algorithm that performs global search of the entire protein surface for regions that bind a number of small organic probe molecules. The search is based on the extremely efficient fast Fourier transform (FFT) correlation approach which can sample billions of probe positions on dense translational and rotational grids, but can use only sums of correlation functions for scoring and hence is generally restricted to very simple energy expressions. The novelty of FTMAP is that we were able to incorporate and represent on grids a detailed energy expression, resulting in a very accurate identification of low-energy probe clusters. Overlapping clusters of different probes are defined as consensus sites (CSs). We show that the largest CS is generally located at the most important subsite of the protein binding site, and the nearby smaller CSs identify other important subsites. Mapping results are presented for elastase whose structure has been solved in aqueous solutions of eight organic solvents, and we show that FTMAP provides very similar information. The second application is to renin, a long-standing pharmaceutical target for the treatment of hypertension, and we show that the major CSs trace out the shape of the first approved renin inhibitor, aliskiren.
Availability: FTMAP is available as a server at http://ftmap.bu.edu/.
Supplementary information: Supplementary Material is available at Bioinformatics online.
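The FFT correlation scoring that FTMAP builds on can be illustrated with a minimal Python sketch. For brevity it works on a one-dimensional periodic grid rather than the 3D translational grids used in docking, and the names `fft`, `correlate_fft` and `correlate_direct` are hypothetical helpers written for this example, not part of FTMAP. The point it shows is that one forward transform per grid, a pointwise product and one inverse transform give the score for every shift at once.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def correlate_fft(receptor, ligand):
    """score[s] = sum_i receptor[(i + s) % n] * ligand[i], for all shifts s, via FFT."""
    n = len(receptor)
    R = fft(receptor)
    L = fft(ligand)
    # Correlation theorem for real grids: transform of score = R * conj(L)
    prod = [r * l.conjugate() for r, l in zip(R, L)]
    return [v.real / n for v in fft(prod, inverse=True)]

def correlate_direct(receptor, ligand):
    """Same scores by explicit summation, O(n^2); used only as a check."""
    n = len(receptor)
    return [sum(receptor[(i + s) % n] * ligand[i] for i in range(n))
            for s in range(n)]

# Toy grids: the ligand profile matches the receptor profile when shifted by 2.
receptor = [0, 0, 1, 2, 1, 0, 0, 0]
ligand = [1, 2, 1, 0, 0, 0, 0, 0]
scores = correlate_fft(receptor, ligand)
best_shift = max(range(len(scores)), key=lambda s: scores[s])
```

Because only sums of such correlation functions can be evaluated this way, an energy expression has to be written as a sum of grid-by-grid products before it can be scored with this approach.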
Fast Fourier Transform (FFT) correlation methods of protein-protein docking, combined with the clustering of low energy conformations, can find a number of local minima on the energy surface. For most complexes the locations of the near-native structures can be constrained to the 30 largest clusters, each surrounding a local minimum. However, no reliable further discrimination can be obtained by energy measures because the differences in the energy levels between the minima are comparable to the errors in the energy evaluation. In fact, no current scoring function accounts for the entropic contributions that relate to the width rather than the depth of the minima. Since structures at narrow minima lose more entropy, some of the non-native states can be detected by determining whether or not a local minimum is surrounded by a broad region of attraction on the energy surface. The analysis is based on starting Monte Carlo Minimization (MCM) runs from random points around each minimum, and observing whether a certain fraction of trajectories converge to a small region within the cluster. The cluster is considered stable if such a strong attractor exists, has at least 10 convergent trajectories, is relatively close to the original cluster center, and contains a low energy structure. We studied the stability of clusters for enzyme-inhibitor and antibody-antigen complexes in the Protein Docking Benchmark. The analysis yields three main results. First, all clusters that are close to the native structure are stable. Second, restricting considerations to stable clusters eliminates around half of the false positives, i.e., solutions that are low in energy but far from the native structure of the complex. Third, dividing the conformational space into clusters and determining the stability of each cluster, the combined approach is less dependent on a priori information than exploring the potential conformational space by Monte Carlo minimizations.
Fast Fourier Transform; Monte Carlo minimization; structure refinement; selection of near-native structures
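The stability criterion described above can be sketched in a few lines. This is only an illustration under stated assumptions: trajectory end points are treated as plain Euclidean coordinates, and the function name `cluster_is_stable` and the thresholds `attractor_radius`, `max_center_dist` and `energy_cutoff` are hypothetical. Only the requirement of at least 10 convergent trajectories is taken from the text.

```python
import numpy as np

def cluster_is_stable(endpoints, energies, cluster_center,
                      attractor_radius=2.0, min_convergent=10,
                      max_center_dist=5.0, energy_cutoff=-10.0):
    """Return True if the MCM end points reveal a strong attractor.

    endpoints      : (n, d) final coordinates of the MCM trajectories
    energies       : (n,) final energies of the same trajectories
    cluster_center : (d,) center of the original docking cluster
    All thresholds except min_convergent are illustrative assumptions.
    """
    endpoints = np.asarray(endpoints, dtype=float)
    energies = np.asarray(energies, dtype=float)
    cluster_center = np.asarray(cluster_center, dtype=float)

    # Pairwise distances between all trajectory end points.
    diffs = endpoints[:, None, :] - endpoints[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)

    # For each end point, count how many end points fall within the radius.
    within = dists <= attractor_radius
    counts = within.sum(axis=1)

    # The candidate attractor is the most densely populated neighborhood.
    best = int(np.argmax(counts))
    members = within[best]
    attractor_center = endpoints[members].mean(axis=0)

    enough = counts[best] >= min_convergent
    close = np.linalg.norm(attractor_center - cluster_center) <= max_center_dist
    low_energy = energies[members].min() <= energy_cutoff
    return bool(enough and close and low_energy)
```

In practice the distance between two docked conformations would more likely be an RMSD between ligand poses than a Euclidean distance between points; the logic of the three tests stays the same.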
This paper introduces a new stochastic global optimization method targeting protein-protein docking problems, an important class of problems in computational structural biology. The method is based on finding general convex quadratic underestimators to the binding energy function, which is funnel-like. Finding the optimum underestimator requires solving a semidefinite programming problem, hence the name semidefinite programming-based underestimation (SDU). The underestimator is used to bias sampling in the search region. It is established that under appropriate conditions SDU locates the global energy minimum with probability approaching one as the sample size grows. A detailed comparison of SDU with the related convex global underestimator (CGU) method, and computational results for protein-protein docking problems, are provided.
Linear matrix inequalities (LMIs); optimization; protein-protein docking; semidefinite programming; structural biology
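One plausible way to write the underestimator problem is shown below. The abstract does not give the formulation, so the symbols and the choice of objective are assumptions: $f$ is the binding energy, $x_1,\dots,x_K$ are sampled conformations, and $U$ is the quadratic underestimator, fitted here by minimizing the total gap to the samples.

```latex
\begin{aligned}
\min_{Q,\,b,\,c}\quad & \sum_{i=1}^{K} \bigl( f(x_i) - U(x_i) \bigr) \\
\text{s.t.}\quad & U(x_i) = x_i^{\top} Q\, x_i + b^{\top} x_i + c \;\le\; f(x_i),
  \qquad i = 1,\dots,K, \\
& Q \succeq 0 .
\end{aligned}
```

The objective and the sample constraints are linear in $(Q, b, c)$, and convexity of $U$ is enforced by the linear matrix inequality $Q \succeq 0$, which is what makes the problem a semidefinite program.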
We describe the proceedings and conclusions from a “Workshop on Applications of Protein Models in Biomedical Research” that was held at University of California at San Francisco on 11 and 12 July, 2008. At the workshop, international scientists involved with structure modeling explored (i) how models are currently used in biomedical research, (ii) what the requirements and challenges for different applications are, and (iii) how the interaction between the computational and experimental research communities could be strengthened to advance the field.
Motivation: Predicting how proteins interact at the molecular level is a computationally intensive task. Many protein docking algorithms begin by using fast Fourier transform (FFT) correlation techniques to find putative rigid body docking orientations. Most such approaches use 3D Cartesian grids and are therefore limited to computing three-dimensional (3D) translational correlations. However, translational FFTs can speed up the calculation in only three of the six rigid body degrees of freedom, and they cannot easily incorporate prior knowledge about a complex to focus and hence further accelerate the calculation. Furthermore, several groups have developed multi-term interaction potentials and others use multi-copy approaches to simulate protein flexibility, which both add to the computational cost of FFT-based docking algorithms. Hence there is a need to develop more powerful and more versatile FFT docking techniques.
Results: This article presents a closed-form 6D spherical polar Fourier correlation expression from which arbitrary multi-dimensional multi-property multi-resolution FFT correlations may be generated. The approach is demonstrated by calculating 1D, 3D and 5D rotational correlations of 3D shape and electrostatic expansions up to polynomial order L=30 on a 2 GB personal computer. As expected, 3D correlations are found to be considerably faster than 1D correlations but, surprisingly, 5D correlations are often slower than 3D correlations. Nonetheless, we show that 5D correlations will be advantageous when calculating multi-term knowledge-based interaction potentials. When docking the 84 complexes of the Protein Docking Benchmark, blind 3D shape plus electrostatic correlations take around 30 minutes on a contemporary personal computer and find acceptable solutions within the top 20 in 16 cases. Applying a simple angular constraint to focus the calculation around the receptor binding site produces acceptable solutions within the top 20 in 28 cases. Further constraining the search to the ligand binding site gives up to 48 solutions within the top 20, with calculation times of just a few minutes per complex. Hence the approach described provides a practical and fast tool for rigid body protein-protein docking, especially when prior knowledge about one or both binding sites is available.
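For comparison, the conventional 3D translational FFT correlation on a Cartesian grid, which the Motivation identifies as covering only three of the six rigid body degrees of freedom, can be sketched as follows. This is a minimal illustration and not the spherical polar Fourier method of the article: the function names are hypothetical, the grids hold a single real-valued property, and translations are cyclic.

```python
import numpy as np

def fft_translational_correlation(receptor, ligand):
    """Score all cyclic translations t at once.

    Returns corr with corr[t] = sum_x receptor[x + t] * ligand[x],
    computed via the correlation theorem instead of an explicit loop
    over every translation.
    """
    R = np.fft.fftn(receptor)
    L = np.fft.fftn(ligand)
    # Real-valued cross-correlation: multiply by the complex conjugate.
    return np.fft.ifftn(R * np.conj(L)).real

def best_translation(corr):
    """Grid shift (as a tuple of ints) with the highest correlation score."""
    idx = np.unravel_index(np.argmax(corr), corr.shape)
    return tuple(int(i) for i in idx)

# Toy example: one occupied voxel in each grid.
receptor = np.zeros((8, 8, 8))
ligand = np.zeros((8, 8, 8))
receptor[5, 2, 3] = 1.0
ligand[1, 1, 1] = 1.0

corr = fft_translational_correlation(receptor, ligand)
shift = best_translation(corr)  # shift that moves the ligand voxel onto the receptor voxel
```

Each ligand rotation needs its own correlation of this kind, so the rotational search remains an outer loop around it.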