Binding hot spots, protein regions with high binding affinity, can be identified by using X-ray crystallography or NMR spectroscopy to screen libraries of small organic molecules that tend to cluster at such hot spots. FTMap, a direct computational analogue of the experimental screening approaches, uses 16 different probe molecules for global sampling of the surface of a target protein on a dense grid and evaluates the energy of interaction using an empirical energy function that includes a continuum electrostatic term. Energy evaluation is based on the fast Fourier transform correlation approach, which allows for the sampling of billions of probe positions. The grid sampling is followed by off-grid minimization that uses a more detailed energy expression with a continuum electrostatics term. FTMap identifies the hot spots as consensus clusters formed by overlapping clusters of several probes. The hot spots are ranked on the basis of the number of probe clusters, which predicts their binding propensity. We applied FTMap to nine structures of hen egg-white lysozyme (HEWL), whose hot spots have been extensively studied by both experimental and computational methods. FTMap found the primary hot spot in site C of all nine structures, in spite of conformational differences. In addition, secondary hot spots in sites B and D that are known to be important for the binding of polysaccharide substrates were found. The predicted probe–protein interactions agree well with those seen in the complexes of HEWL with various ligands and also agree with an NMR-based study of HEWL in aqueous solutions of eight organic solvents. We argue that FTMap provides more complete information on the HEWL binding site than previous computational methods and yields fewer false-positive binding locations than the X-ray structures of HEWL from crystals soaked in organic solvents.
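The consensus-cluster ranking described above can be illustrated with a minimal sketch. It assumes that each probe cluster is reduced to a single center coordinate and that clusters "overlap" when their centers lie within a hypothetical 4 Å cutoff; the probe names, coordinates, cutoff and single-linkage grouping are illustrative assumptions, not FTMap's actual procedure.

```python
import math
from itertools import combinations

def consensus_sites(clusters, radius=4.0):
    """Group probe clusters whose centers lie within `radius` of each other.

    clusters: list of (probe_name, (x, y, z)) tuples, one per probe cluster.
    Returns the groups sorted by number of member clusters, largest first.
    The single-linkage grouping and the cutoff are assumptions of this sketch.
    """
    parent = list(range(len(clusters)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(clusters)), 2):
        if math.dist(clusters[i][1], clusters[j][1]) <= radius:
            parent[find(i)] = find(j)

    sites = {}
    for i, (probe, _) in enumerate(clusters):
        sites.setdefault(find(i), []).append(probe)
    # Rank sites by how many probe clusters they contain.
    return sorted(sites.values(), key=len, reverse=True)

# Hypothetical cluster centers (in angstroms) for six probe clusters.
clusters = [
    ("ethanol", (0.0, 0.0, 0.0)),
    ("acetone", (1.0, 0.0, 0.0)),
    ("urea", (2.0, 1.0, 0.0)),
    ("phenol", (20.0, 0.0, 0.0)),
    ("benzene", (21.0, 1.0, 0.0)),
    ("acetamide", (40.0, 40.0, 40.0)),
]
ranked = consensus_sites(clusters)
```

In this toy input the first-ranked site collects three probe clusters and would play the role of the primary hot spot; the smaller groups correspond to secondary sites.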
Computational solvent mapping globally samples the surface of target proteins using molecular probes – small molecules or functional groups – to identify potentially favorable binding positions. The method is based on X-ray and NMR screening studies showing that the binding sites of proteins also bind a large variety of fragment-sized molecules. We have developed the multi-stage mapping algorithm FTMap (available as a server at http://ftmap.bu.edu/) based on the fast Fourier transform (FFT) correlation approach. Identifying regions of low free energy rather than individual low energy conformations, FTMap reproduces the available experimental mapping results. Applications to a variety of proteins show that the probes always cluster in important subsites of the binding site, and the amino acid residues that interact with many probes also bind the specific ligands of the protein. The “consensus” sites at which a number of different probes cluster are likely to be “druggable” sites, capable of binding drug-size ligands with high affinity. Due to its sensitivity to conformational changes the method can also be used for comparing the binding sites in different structures of a protein.
Protein structure; protein-ligand interactions; binding site; binding hot spots; fragment-based ligand design; druggability; binding site comparison; docking
Motivation: The binding sites of proteins generally contain smaller regions that provide major contributions to the binding free energy and hence are the prime targets in drug design. Screening libraries of fragment-sized compounds by NMR or X-ray crystallography demonstrates that such ‘hot spot’ regions bind a large variety of small organic molecules, and that a relatively high ‘hit rate’ is predictive of target sites that are likely to bind drug-like ligands with high affinity. Our goal is to determine the ‘hot spots’ computationally rather than experimentally.
Results: We have developed the FTMAP algorithm, which performs a global search of the entire protein surface for regions that bind a number of small organic probe molecules. The search is based on the extremely efficient fast Fourier transform (FFT) correlation approach, which can sample billions of probe positions on dense translational and rotational grids but can use only sums of correlation functions for scoring and hence is generally restricted to very simple energy expressions. The novelty of FTMAP is that we were able to incorporate and represent on grids a detailed energy expression, resulting in a very accurate identification of low-energy probe clusters. Overlapping clusters of different probes are defined as consensus sites (CSs). We show that the largest CS is generally located at the most important subsite of the protein binding site, and the nearby smaller CSs identify other important subsites. Mapping results are presented for elastase, whose structure has been solved in aqueous solutions of eight organic solvents, and we show that FTMAP provides very similar information. The second application is to renin, a long-standing pharmaceutical target for the treatment of hypertension, and we show that the major CSs trace out the shape of the first approved renin inhibitor, aliskiren.
Availability: FTMAP is available as a server at http://ftmap.bu.edu/.
Supplementary information: Supplementary Material is available at Bioinformatics online.
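The restriction to "sums of correlation functions" mentioned in the Results can be made concrete with a small NumPy sketch. It assumes periodic grids and two arbitrary energy terms filled with random numbers; the grid size, the terms and the function names are illustrative assumptions and do not reproduce the FTMAP energy expression. For each term p, the score of translation t is the correlation sum over x of R_p(x + t) P_p(x), and the FFT evaluates all translations at once.

```python
import numpy as np

def fft_correlation_scores(receptor_grids, probe_grids):
    """Score every translation t at once: E(t) = sum_p sum_x R_p(x + t) * P_p(x)."""
    total = np.zeros(receptor_grids[0].shape)
    for R, P in zip(receptor_grids, probe_grids):
        # Correlation theorem: corr(R, P) = IFFT( FFT(R) * conj(FFT(P)) ) for real P.
        total += np.fft.ifftn(np.fft.fftn(R) * np.conj(np.fft.fftn(P))).real
    return total

def direct_score(receptor_grids, probe_grids, shift):
    """The same sum evaluated directly for one translation (periodic boundaries)."""
    neg = tuple(-int(s) for s in shift)
    return sum(
        float(np.sum(np.roll(R, neg, axis=(0, 1, 2)) * P))
        for R, P in zip(receptor_grids, probe_grids)
    )

rng = np.random.default_rng(0)
n = 8
# Two hypothetical energy terms, each with a receptor grid and a probe grid.
receptor = [rng.normal(size=(n, n, n)) for _ in range(2)]
probe = [np.zeros((n, n, n)) for _ in range(2)]
for P in probe:
    P[:2, :2, :2] = rng.normal(size=(2, 2, 2))  # small probe footprint

scores = fft_correlation_scores(receptor, probe)
best = np.unravel_index(np.argmin(scores), scores.shape)  # lowest-energy translation
```

The direct evaluation costs one grid sum per translation, whereas the FFT route obtains all translations for one rotation from a few transforms, which is what makes sampling on dense grids affordable.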
Fragment-based drug design (FBDD) starts with finding fragment-sized compounds that are highly ligand efficient and can serve as a core moiety for developing high-affinity leads. Although the core-bound structure of a protein facilitates the construction of leads, effective design is far from straightforward. We show that protein mapping, a computational method developed to find binding hot spots and implemented as the FTMap server, provides information that complements the fragment screening results and can drive the evolution of core fragments into larger leads with a minimal loss or, in some cases, even a gain in ligand efficiency. The method places small molecular probes, the size of organic solvents, on a dense grid around the protein, and identifies the hot spots as consensus clusters formed by clusters of several probes. The hot spots are ranked based on the number of probe clusters, which predicts the binding propensity of the subsites and hence their importance for drug design. Accordingly, with a single exception, the main hot spot identified by FTMap binds the core compound found by fragment screening. The most useful information is provided by the neighboring secondary hot spots, indicating the regions where the core can be extended to increase its affinity. To quantify this information, we calculate the density of probes from mapping, which describes the binding propensity at each point, and show that the change in the correlation between a ligand position and the probe density upon extending or repositioning the core moiety predicts the expected change in ligand efficiency.
Protein mapping; protein docking; drug design; ligand efficiency; affinity prediction
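The abstract above does not define ligand efficiency. A commonly used definition, assumed here, is the binding free energy per heavy (non-hydrogen) atom, with the free energy estimated from the dissociation constant as RT ln Kd. The sketch below uses that definition; the Kd values and atom counts are hypothetical and are not taken from the study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature=298.15):
    """Estimate the binding free energy in kcal/mol as RT * ln(Kd)."""
    return R_KCAL * temperature * math.log(kd_molar)

def ligand_efficiency(kd_molar, heavy_atoms, temperature=298.15):
    """Ligand efficiency: binding free energy per heavy atom, sign flipped."""
    return -binding_free_energy(kd_molar, temperature) / heavy_atoms

# Hypothetical fragment (Kd = 1 mM, 10 heavy atoms) and lead (Kd = 10 nM, 30 heavy atoms).
fragment_le = ligand_efficiency(1e-3, 10)
lead_le = ligand_efficiency(1e-8, 30)
```

In this made-up case the lead binds five orders of magnitude more tightly yet has a slightly lower ligand efficiency than the fragment, which is the kind of trade-off the abstract refers to when it speaks of growing a core with minimal loss in ligand efficiency.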
The identification of hot spots, i.e. binding regions that contribute substantially to the free energy of ligand binding, is a critical step for structure-based drug design. Here we present the application of two fragment-based methods to the detection of hot spots for DJ-1 and glucocerebrosidase (GCase), targets for the development of therapeutics for Parkinson’s and Gaucher’s diseases respectively. While the structures of these two proteins are known, binding information is lacking. In this study we employ both the multiple solvent crystal structures (MSCS) method and the FTMap algorithm to identify regions suitable for the development of pharmacological chaperones for DJ-1 and GCase. Comparison of data derived via MSCS and FTMap also shows that FTMap, a computational method for the identification of fragment binding hot spots, is an accurate and robust alternative to the performance of expensive and difficult MSCS experiments.
fragment-based drug design; structure-based drug design; hot spot identification; DJ-1; glucocerebrosidase; Parkinson’s disease; Gaucher’s disease; pharmacological chaperones
To address the problem of specificity in G-protein coupled receptor (GPCR) drug discovery, there has been tremendous recent interest in allosteric drugs that bind at sites topographically distinct from the orthosteric site. Unfortunately, structure-based drug design of allosteric GPCR ligands has been frustrated by the paucity of structural data for allosteric binding sites, making a strong case for predictive computational methods. In this work, we map the surfaces of the β1 (β1AR) and β2 (β2AR) adrenergic receptor structures, to detect a series of five potentially druggable allosteric sites. We employ the FTMAP algorithm to identify “hot spots” with affinity for a variety of organic probe molecules corresponding to drug fragments. Our work is distinguished by an ensemble-based approach, whereby we map diverse receptor conformations taken from Molecular Dynamics (MD) simulations totalling ~0.5 μs. Our results reveal distinct pockets formed at both solvent-exposed and lipid-exposed cavities, which we interpret in the light of experimental data and which may constitute novel targets for GPCR drug discovery. This mapping data can now serve to drive a combination of fragment-based and virtual screening approaches for the discovery of small molecules that bind at these sites and which may offer highly selective therapies.
molecular dynamics; allosteric; GPCR; docking; fragment-based
To address the problem of specificity in G-protein coupled receptor (GPCR) drug discovery, there has been tremendous recent interest in allosteric drugs that bind at sites topographically distinct from the orthosteric site. Unfortunately, structure-based drug design of allosteric GPCR ligands has been frustrated by the paucity of structural data for allosteric binding sites, making a strong case for predictive computational methods. In this work, we map the surfaces of the β1 (β1AR) and β2 (β2AR) adrenergic receptor structures to detect a series of five potentially druggable allosteric sites. We employ the FTMAP algorithm to identify ‘hot spots’ with affinity for a variety of organic probe molecules corresponding to drug fragments. Our work is distinguished by an ensemble-based approach, whereby we map diverse receptor conformations taken from molecular dynamics (MD) simulations totaling approximately 0.5 μs. Our results reveal distinct pockets formed at both solvent-exposed and lipid-exposed cavities, which we interpret in light of experimental data and which may constitute novel targets for GPCR drug discovery. This mapping data can now serve to drive a combination of fragment-based and virtual screening approaches for the discovery of small molecules that bind at these sites and which may offer highly selective therapies.
allosteric; docking; fragment-based; GPCR; molecular dynamics
We have recently discovered an allosteric switch in Ras, bringing an additional level of complexity to this GTPase whose mutants are involved in nearly 30% of cancers. Upon activation of the allosteric switch, there is a shift in helix 3/loop 7 associated with a disorder to order transition in the active site. Here, we use a combination of multiple solvent crystal structures and computational solvent mapping (FTMap) to determine binding site hot spots in the “off” and “on” allosteric states of the GTP-bound form of H-Ras. Thirteen sites are revealed, expanding possible target sites for ligand binding well beyond the active site. Comparison of FTMaps for the H and K isoforms reveals essentially identical hot spots. Furthermore, using NMR measurements of spin relaxation, we determined that K-Ras exhibits global conformational dynamics very similar to those we previously reported for H-Ras. We thus hypothesize that the global conformational rearrangement serves as a mechanism for allosteric coupling between the effector interface and remote hot spots in all Ras isoforms. At least with respect to the binding sites involving the G domain, H-Ras is an excellent model for K-Ras and probably N-Ras as well. Ras has so far been elusive as a target for drug design. The present work identifies various unexplored hot spots throughout the entire surface of Ras, extending the focus from the disordered active site to well-ordered locations that should be easier to target.
Ras isoforms; drug target; binding site hot spots; Ras dynamics; allosteric switch
Hot spots are energetically important residues at protein interfaces and they are not randomly distributed across the interface but rather clustered. These clustered hot spots form hot regions. Hot regions are important for the stability of protein complexes, as well as providing specificity to binding sites. We propose a database called HotRegion, which provides the hot region information of the interfaces by using predicted hot spot residues, and structural properties of these interface residues such as pair potentials of interface residues, accessible surface area (ASA) and relative ASA values of interface residues of both monomer and complex forms of proteins. Also, the 3D visualization of the interface and interactions among hot spot residues are provided. HotRegion is accessible at http://prism.ccbb.ku.edu.tr/hotregion.
How proteins approach surrounding molecules is fundamental to our understanding of the specific interactions that occur at the surface of proteins. The enhanced surface accessibility of small molecules such as organic solvents and paramagnetic probes to protein binding sites has been observed; however, the molecular basis of this finding has not been fully established. Recently, it has been suggested that hydration dynamics play a predominant role in controlling the distribution of hot spots on the surface of proteins.
In the present study, the hydration of the archaeal multifunctional protein Sso7d from Sulfolobus solfataricus was investigated using a combination of computational and experimental data derived from molecular dynamics simulations and ePHOGSY NMR spectroscopy.
We obtained a convergent protein hydration landscape that indicated how the shape and stability of the Sso7d hydration shell could modulate the function of the protein. The DNA binding domain overlaps with the protein region involved in chaperone activity, and this domain is hydrated only in a very small central region. This localized hydration seems to favor intermolecular approaches from a large variety of ligands. Conversely, high water density was found in surface regions of the protein where the ATP binding site is located, suggesting that surface water molecules play a role in protecting the protein from unspecific interactions.
Use of solvent-mapping, based on multiple-copy minimization (MCM) techniques, is common in structure-based drug discovery. The minima of small-molecule probes define locations for complementary interactions within a binding pocket. Here, we present improved methods for MCM. In particular, a Jarvis-Patrick method is outlined for grouping the final locations of minimized probes into physical clusters. This algorithm has been tested through a study of protein-protein interfaces, showing the process to be robust, deterministic, and fast in the mapping of protein “hot spots”. Improvements in the initial placement of probe molecules are also described. A final application to HIV-1 protease shows how our automated technique can be used to partition data too complicated to analyze by hand. These new automated methods may be easily and quickly extended to other protein systems, and our clustering methodology may be readily incorporated into other clustering packages.
Clustering; structure-based drug design; Jarvis-Patrick
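To illustrate the kind of grouping a Jarvis-Patrick method performs, the sketch below implements one common textbook formulation: two points join the same cluster if each is among the other's k nearest neighbors and they share at least k_min of those neighbors. The parameter values and coordinates are hypothetical, and the abstract does not say which variant or settings the authors used.

```python
import math

def jarvis_patrick(points, k=3, k_min=2):
    """Cluster points by shared nearest neighbors (one common Jarvis-Patrick variant).

    Returns a list of integer cluster labels, numbered by order of first appearance.
    """
    n = len(points)
    neighbors = []
    for i in range(n):
        # Ties in distance are broken by index so the result is deterministic.
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        neighbors.append(set(order[:k]))

    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            mutual = j in neighbors[i] and i in neighbors[j]
            if mutual and len(neighbors[i] & neighbors[j]) >= k_min:
                parent[find(j)] = find(i)

    label_of_root, labels = {}, []
    for i in range(n):
        labels.append(label_of_root.setdefault(find(i), len(label_of_root)))
    return labels

# Hypothetical minimized probe positions: two tight groups and one stray probe.
probes = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0),
    (10, 10, 10), (11, 10, 10), (10, 11, 10), (11, 11, 10),
    (30, 0, 0),
]
labels = jarvis_patrick(probes)
```

Because membership depends only on neighbor lists and a fixed tie-breaking rule, repeated runs on the same input give the same partition, in line with the deterministic behavior the abstract emphasizes.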
A protein binding hot spot is a small cluster of residues tightly packed at the center of the interface between two interacting proteins. Though a hot spot constitutes a small fraction of the interface, it is vital to the stability of protein complexes. Recently, a series of hypotheses has been proposed to characterize binding hot spots, including the pioneering O-ring theory, the insightful 'coupling' and 'hot region' principle, and our 'double water exclusion' (DWE) hypothesis. As the perspective changes from the O-ring theory to the DWE hypothesis, we examine the physicochemical properties of the binding hot spots under the new hypothesis and compare them with those under the O-ring theory.
The requirements for a cluster of residues to form a hot spot under the DWE hypothesis can be mathematically satisfied by a biclique subgraph if a vertex is used to represent a residue, an edge to indicate a close distance between two residues, and a bipartite graph to represent a pair of interacting proteins. We term these hot spots DWE bicliques. We identified DWE bicliques from crystal packing contacts and from obligate and non-obligate interactions. Our comparative study revealed that there are abundant bicliques unique to the biological interactions, indicating specific biological binding behaviors in contrast to crystal packing. The two sub-types of biological interactions also have their own signature bicliques. In our analysis of residue compositions and residue pairing preferences in DWE bicliques, the focus was on interaction-preferred residues (ipRs) and interaction-preferred residue pairs (ipRPs). We observed that hydrophobic residues are heavily involved in the ipRs and ipRPs of the obligate interactions, and that aromatic residues are favored in the ipRs and ipRPs of the biological interactions, especially in those of the non-obligate interactions. In contrast, the ipRs and ipRPs in crystal packing are dominated by hydrophilic residues, and most of the anti-ipRs of crystal packing are the ipRs of the obligate or non-obligate interactions.
These ipRs and ipRPs in our DWE bicliques describe diverse binding features among the three types of interactions. They also highlight the specific binding behaviors of the biological interactions, which differ sharply from the artifact interfaces in crystal packing. DWE bicliques, especially the unique bicliques, can capture deep insights into the binding characteristics of protein interfaces.
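The graph construction described above (residues as vertices, close contacts as edges, a bipartite graph for the two chains) can be sketched in a few lines. The brute-force enumeration below is only practical for very small interfaces; the residue labels and contacts are hypothetical, and the water-exclusion criteria of the DWE hypothesis are not modeled.

```python
from itertools import combinations

def maximal_bicliques(edges, min_size=2):
    """Enumerate maximal bicliques (X, Y) with at least `min_size` residues per side.

    edges: iterable of (residue_in_chain_A, residue_in_chain_B) contact pairs.
    Brute force over subsets of chain A; suitable only for tiny examples.
    """
    adj_a, adj_b = {}, {}
    for a, b in edges:
        adj_a.setdefault(a, set()).add(b)
        adj_b.setdefault(b, set()).add(a)

    found = set()
    for r in range(min_size, len(adj_a) + 1):
        for xs in combinations(sorted(adj_a), r):
            # Chain-B residues in contact with every residue in xs.
            ys = set.intersection(*(adj_a[a] for a in xs))
            if len(ys) < min_size:
                continue
            # Keep (xs, ys) only if xs cannot be enlarged without shrinking ys.
            closure = set.intersection(*(adj_b[b] for b in ys))
            if closure == set(xs):
                found.add((frozenset(xs), frozenset(ys)))
    return found

# Hypothetical close contacts between residues a1..a3 (chain A) and b1..b3 (chain B).
contacts = [
    ("a1", "b1"), ("a1", "b2"),
    ("a2", "b1"), ("a2", "b2"),
    ("a3", "b2"), ("a3", "b3"),
]
bicliques = maximal_bicliques(contacts)
```

Here a1 and a2 each contact both b1 and b2, so these four residues form the only biclique with two or more residues on each side.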
We present a new database of computational hot spots in protein interfaces: HotSprint. Hot spots are residues comprising only a small fraction of interfaces yet accounting for the majority of the binding energy. HotSprint contains data for 35 776 protein interfaces among 49 512 protein interfaces extracted from the multi-chain structures in Protein Data Bank (PDB) as of February 2006. The conserved residues in interfaces with certain buried accessible solvent area (ASA) and complex ASA thresholds are flagged as computational hot spots. The predicted hot spots are observed to correlate with the experimental hot spots with an accuracy of 76%. Several machine-learning methods (SVM, Decision Trees and Decision Lists) are also applied to predict hot spots; the results reveal that our empirical approach performs better than the others. A web interface for the HotSprint database allows users to browse and query the hot spots in protein interfaces. HotSprint is available at http://prism.ccbb.ku.edu.tr/hotsprint. It provides information for interface residues that are functionally and structurally important, as well as the evolutionary history and solvent accessibility of residues in interfaces.
Hot spots are residues that contribute most of the binding free energy yet account for a small portion of a protein interface. Experimental approaches to identify hot spots, such as alanine scanning mutagenesis, are expensive and time-consuming, while computational methods are emerging as effective alternatives to experimental approaches.
In this study, we propose a semi-supervised boosting SVM, called sbSVM, to computationally predict hot spots at protein-protein interfaces by combining protein sequence and structure features. Feature selection is performed using random forests to avoid over-fitting. Because positive samples are scarce, our approach iteratively samples useful unlabeled data to boost the performance of hot spot prediction. The performance of our method is evaluated on a dataset generated from the ASEdb database for cross-validation and a dataset from the BID database for independent testing. Furthermore, a balanced dataset with similar numbers of hot spots and non-hot spots (65 and 66, respectively) derived from the first training dataset is used to further validate our method. All results show that our method yields good sensitivity, accuracy and F1 score compared with existing methods.
Our method boosts prediction performance of hot spots by using unlabeled data to overcome the deficiency of available training data. Experimental results show that our approach is more effective than the traditional supervised algorithms and major existing hot spot prediction methods.
The KFC Server is a web-based implementation of the KFC (Knowledge-based FADE and Contacts) model, a machine learning approach for the prediction of binding hot spots, i.e. the subset of residues that account for most of a protein interface's binding free energy. The server facilitates the automated analysis of a user-submitted protein–protein or protein–DNA interface and the visualization of its hot spot predictions. For each residue in the interface, the KFC Server characterizes its local structural environment, compares that environment to the environments of experimentally determined hot spots and predicts whether the interface residue is a hot spot. After the computational analysis, the user can visualize the results using an interactive job viewer able to quickly highlight predicted hot spots and surrounding structural features within the protein structure. The KFC Server is accessible at http://kfc.mitchell-lab.org.
Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction has become increasingly important for understanding the essence of protein interactions and for narrowing down the search space for drug design. Many computational methods have been developed by proposing different features. However, a comparative assessment of these features, as well as more effective and accurate methods, is still urgently needed.
In this study, we first comprehensively collect the features used to discriminate hot spots from non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting that hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes). In the Ab- dataset (antigen-antibody complexes excluded), however, there are significant differences in these two features between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes.
Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots.
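The performance figures quoted above (precision, recall, F1) follow the standard definitions for binary classification, which can be written out as a short sketch. The label vectors below are made up for illustration; only the final check uses the precision and recall reported in the abstract.

```python
def classification_metrics(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels, with 1 = hot spot."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up labels for eight interface residues (1 = hot spot, 0 = non-hot spot).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
precision, recall, f1 = classification_metrics(y_true, y_pred)

# Harmonic mean of the reported precision (0.69) and recall (0.68).
reported_f1 = f1_from(0.69, 0.68)
```

The harmonic mean of the reported precision and recall comes out just under 0.685, consistent with the stated F1 score of 0.68 given that the inputs are themselves rounded.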
Protein–protein interactions are central to almost any cellular process. Although protein interfaces are typically large, it is well established that only a relatively small region, the so-called ‘hot spot’, contributes the most to the total binding energy. There is a clear interest in identifying hot spots because of their application in drug discovery and protein design. Presaging Critical Residues in Protein Interfaces Database (PCRPi-DB) is a public repository that archives computationally annotated hot spots in protein complexes for which the 3D structure is known. Hot spots have been annotated using a new and highly accurate computational method developed in the lab. PCRPi-DB is freely available to the scientific community at http://www.bioinsilico.org/PCRPIDB. Besides browsing and querying the contents of the database, extensive documentation and links to relevant on-line resources and contents are available to users. PCRPi-DB is updated on a weekly basis.
Structures of the influenza A virus M2 proton channel have been determined by X-ray crystallography in the open conformation, and by NMR in the closed state. Whereas the X-ray structure shows a single inhibitor molecule in the middle of the channel, four inhibitor molecules bind the channel’s outer surface in the NMR structure. Although in both structures the strongest hot spots (i.e., regions which substantially contribute to the free energy of binding any potential ligand) lie inside the pore, hot spots also are found at exterior locations. By considering all available models, we propose the primary drug binding site is inside the pore, but that exterior binding also occurs under appropriate conditions.
This paper reports a new strategy based on plasma etching that allows us to exclusively probe the SERS-active molecules adsorbed in the hot-spot region formed between two Ag nanocubes. Experimentally, we verified that the enhancement factor of the hot spot (EFhot-spot) was strongly dependent on its orientation relative to the laser polarization. For the hot spot formed between two Ag nanocubes of 100 nm in edge length, the EFhot-spot was found to vary from 1.0 × 10⁸ to 4.1 × 10⁶ and 4.4 × 10⁵ as the long axis of the dimer was changed from 0° (parallel) to 45° and 90° (perpendicular) relative to the direction of laser polarization. These results suggest a maximum enhancement of Raman signals of ~170-fold for the hot spot relative to the EF obtained for a single Ag nanocube of similar size. While the hot spot made a major contribution to the observed SERS signals when the dimer's long axis was parallel to the laser polarization, the hot spot did not contribute additionally to the detected signals when the dimer was in other orientations relative to the laser polarization.
SERS; hot spot; silver; nanocube; dimer
The influenza virus subtype H5N1 has raised concerns of a possible human pandemic threat because of its high virulence and mutation rate. Although several approved anti-influenza drugs effectively target the neuraminidase, some strains have already acquired resistance to the currently available anti-influenza drugs. In this study, we present the synergistic application of extended explicit solvent molecular dynamics (MD) and computational solvent mapping (CS-Map) to identify putative ‘hot spots’ within flexible binding regions of N1 neuraminidase. Using representative conformations of the N1 binding region extracted from a clustering analysis of four concatenated 40-ns MD simulations, CS-Map was utilized to assess the ability of small, solvent-sized molecules to bind within close proximity to the sialic acid binding region. Mapping analyses of the dominant MD conformations reveal the presence of additional hot spot regions in the 150- and 430-loop regions. Our hot spot analysis provides further support for the feasibility of developing high-affinity inhibitors capable of binding these regions, which appear to be unique to the N1 strain.
computational solvent mapping; ensemble-based drug design; H5N1; hot spot; molecular dynamics; neuraminidase; receptor flexibility; RMSD clustering
Meiotic recombination occurs preferentially at certain regions in the genome referred to as hot spots. The number of hot spots known in humans has increased manifold in recent years. The identification of these hot spots in humans is of great interest to population and medical geneticists since they influence the structure of Linkage Disequilibrium and Haplotype blocks in human populations, whose patterns have applications in mapping disease genes. HUMHOT is a web-based database of Human Meiotic Recombination Hot Spots. The database comprises DNA sequences corresponding to the hot spot regions from the literature that have been mapped to a high resolution (<4 kb) in humans. It also provides flanking sequence information for the hot spot region along with references describing the hot spot. The database can be queried based on hot spot identity, chromosome position or by homology to user-defined sequences. It is also updated with new hot spot sequences as they are discovered and provides hyperlinks to commonly used tools for estimating recombination rates, performing genetic analysis and new advances in our understanding of meiotic hot spots. Public access to the HUMHOT database is available at .
Interactions between a membrane protein and the lipid molecules that surround it in the membrane are important in determining the structure and function of the protein. These interactions can be pictured at the molecular level using fluorescence spectroscopy, making use of the ability to introduce tryptophan residues into regions of interest in bacterial membrane proteins. Fluorescence quenching methods have been developed to study lipid binding separately on the two sides of the membrane. Lipid binding to the surface of the mechanosensitive channel MscL is heterogeneous, with a hot-spot for binding anionic lipid on the cytoplasmic side, associated with a cluster of three positively charged residues. The environmental sensitivity of tryptophan fluorescence emission has been used to identify the residues at the ends of the hydrophobic core of the second transmembrane α-helix in MscL. The efficiency of hydrophobic matching between MscL and the surrounding lipid bilayer is high. Fluorescence quenching methods can also be used to study binding of lipids to non-annular sites such as those between monomers in the homotetrameric potassium channel KcsA.
bacterial channel; fluorescence; KcsA; lipid-protein interaction; mechanosensitive channel; potassium channel
The study of protein-protein interactions is becoming increasingly important for biotechnological and therapeutic reasons. We can define two major areas therein: the structural prediction of the protein-protein binding mode, and the identification of the residues relevant to the interaction (so-called 'hot-spots'). These hot-spot residues are of high interest since targeting them is considered one of the possible ways of disrupting a protein-protein interaction. Unfortunately, large-scale experimental measurement of residue contribution to the binding energy, based on alanine-scanning experiments, is costly and thus data is fairly limited. Recent computational approaches for hot-spot prediction have been reported, but they usually require the structure of the complex.
Here we have applied normalized interface propensity (NIP) values derived from rigid-body docking with electrostatics and desolvation scoring to the prediction of interaction hot-spots. This parameter identifies hot-spot residues on interacting proteins with predictive rates that are comparable to other existing methods (up to 80% positive predictive value), with the advantage of not requiring any prior structural knowledge of the complex.
The NIP values derived from rigid-body docking can reliably identify a number of hot-spot residues whose contribution to the interaction arises from electrostatics and desolvation effects. Our method can propose residues to guide experiments in complexes of biological or therapeutic interest, even in cases with no available 3D structure of the complex.
It is well known that most of the binding free energy of a protein interaction is contributed by a few key hot spot residues. These residues are crucial for understanding the function of proteins and studying their interactions. Experimental hot spot detection methods such as alanine scanning mutagenesis are not applicable on a large scale because they are time-consuming and expensive. Therefore, reliable and efficient computational methods for identifying hot spots are greatly needed.
In this work, we introduce an efficient approach that uses a support vector machine (SVM) to predict hot spot residues in protein interfaces. We systematically investigate 62 features drawn from a combination of protein sequence and structure information. Then, to remove redundant and irrelevant features and improve the prediction performance, feature selection is performed using the F-score method. Based on the selected features, nine individual-feature based predictors are developed to identify hot spots using SVMs. Furthermore, a new ensemble classifier, namely APIS (A combined model based on Protrusion Index and Solvent accessibility), is developed to further improve the prediction accuracy. The results on two benchmark datasets, ASEdb and BID, show that the proposed method yields significantly better prediction accuracy than methods previously published in the literature. In addition, we demonstrate the predictive power of the proposed method by modelling two protein complexes: the calmodulin/myosin light chain kinase complex and the heat shock locus gene products U and V complex. The results indicate that our method can identify more hot spots in these two complexes than other state-of-the-art methods.
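The F-score feature selection step mentioned above can be illustrated with a minimal sketch. The abstract does not state the formula used, so the code below assumes the commonly used two-class F-score: the squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances. The function names `f_score` and `rank_features` and the toy data are hypothetical and are not taken from the authors' source code.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Two-class F-score of a single feature (assumed definition).

    pos, neg: feature values for hot spot and non-hot spot residues.
    Each list needs at least two values so a sample variance exists.
    """
    overall = mean(pos + neg)
    # Between-class separation: class means relative to the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Within-class spread: sample variances (n - 1 denominators).
    within = variance(pos) + variance(neg)
    if within == 0:
        return 0.0 if between == 0 else float("inf")
    return between / within


def rank_features(X, y):
    """Return feature indices ordered from most to least discriminative.

    X: list of rows (one row per residue), y: labels (1 = hot spot, 0 = not).
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data (made up): feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 4.5], [3.1, 3.5], [2.9, 4.0]]
y = [1, 1, 1, 0, 0, 0]
print(rank_features(X, y))
```

In a pipeline of the kind described, the top-ranked features would then be passed to the SVM; how many to keep is a tuning choice that the abstract does not specify.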
We have developed an accurate prediction model for hot spot residues, given the structure of a protein complex. A major contribution of this study is to propose several new features based on the protrusion index of amino acid residues, which are shown to significantly improve the prediction of hot spots. Moreover, we identify a compact and useful feature subset that has important implications for identifying hot spot residues. Our results indicate that these features are more effective than the conventional evolutionary conservation, pairwise residue potentials and other traditional features considered previously, and that combining our features with traditional ones may support the creation of a discriminative feature set for efficient prediction of hot spot residues. The data and source code are available at http://home.ustc.edu.cn/~jfxia/hotspot.html.
A protein binding hot spot is a cluster of residues in the interface that are energetically important for the binding of the protein with its interaction partner. Identifying protein binding hot spots can provide useful information for protein engineering and drug design, and can also deepen our understanding of protein-protein interaction. These residues are usually buried inside the interface with very low solvent accessible surface area (SASA). Thus SASA is widely used as a prominent feature in hot spot prediction by many computational methods. However, SASA cannot distinguish between slightly buried residues, most of which are not hot spots, and deeply buried ones, which usually lie inside a hot spot.
We propose a new descriptor called “burial level” for characterizing residues, atoms and atomic contacts. Specifically, burial level captures the depth at which residues are buried. We identify different kinds of deeply buried atomic contacts (DBAC) at different burial levels that are directly broken in alanine substitution. We use their numbers as input to an SVM that classifies residues as hot spot or non-hot spot. We achieve an F-measure of 0.6237 under leave-one-out cross-validation on a data set containing 258 mutations. This performance is better than that of other computational methods.
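The evaluation protocol named above, an F-measure computed under leave-one-out cross-validation, can be sketched as follows. This is an illustration only: a nearest-centroid rule stands in for the SVM, the single feature is a made-up count of deeply buried contacts, and all function names are hypothetical. The sketch assumes every class keeps at least one example after one sample is held out.

```python
def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT the SVM from the text): pick the closer class mean."""
    def centroid(label):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return 1 if dist2(x, centroid(1)) < dist2(x, centroid(0)) else 0


def leave_one_out_f_measure(X, y, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, then score all predictions."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(predict(train_X, train_y, X[i]))
    return f_measure(y, preds)


# Toy data (made up): one feature per mutation, e.g. a contact count.
X = [[3], [4], [5], [0], [1], [2]]
y = [1, 1, 1, 0, 0, 0]
print(leave_one_out_f_measure(X, y))
```

Pooling the held-out predictions before computing the F-measure, as done here, is one common convention; the abstract does not say which convention the authors used.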
Our results show that hot spot residues tend to be deeply buried in the interface, rather than merely having a low SASA value. This indicates that a high burial level is not only a necessary condition for a residue to be a hot spot residue but also a more sufficient one than a low SASA. We find that deeply buried atoms become increasingly important as their burial levels rise. This work also confirms the contribution of deeply buried interfacial atomic contacts to the energy of protein binding hot spots.