Motivation: Mutating residues into alanine (alanine scanning) is one of the fastest experimental means of probing hypotheses about protein function. Alanine scans can reveal functional hot spots, i.e. residues that alter function upon mutation. In vitro mutagenesis is cumbersome and costly: probing all residues in a protein is typically as impossible as substituting by all non-native amino acids. In contrast, such exhaustive mutagenesis is feasible in silico.
Results: Previously, we developed SNAP to predict functional changes due to non-synonymous single nucleotide polymorphisms. Here, we applied SNAP to all experimental mutations in the ASEdb database of alanine scans; we identified 70% of the hot spots (≥1 kcal/mol change in binding energy); more severe changes were predicted more accurately. Encouraged, we carried out a complete all-against-all in silico mutagenesis for human glucokinase. Many of the residues predicted as functionally important have indeed been confirmed in the literature, others await experimental verification, and our method is ready to aid in the design of in vitro mutagenesis.
Availability: ASEdb and glucokinase scores are available at http://www.rostlab.org/services/SNAP. For submissions of large/whole proteins for processing please contact the author.
It is well known that most of the binding free energy of protein interaction is contributed by a few key hot spot residues. These residues are crucial for understanding the function of proteins and studying their interactions. Experimental hot spot detection methods such as alanine scanning mutagenesis are not applicable on a large scale because they are time-consuming and expensive. Therefore, reliable and efficient computational methods for identifying hot spots are urgently needed.
In this work, we introduce an efficient approach that uses a support vector machine (SVM) to predict hot spot residues in protein interfaces. We systematically investigate a wide variety of 62 features from a combination of protein sequence and structure information. Then, to remove redundant and irrelevant features and improve the prediction performance, feature selection is employed using the F-score method. Based on the selected features, nine individual-feature based predictors are developed to identify hot spots using SVMs. Furthermore, a new ensemble classifier, namely APIS (A combined model based on Protrusion Index and Solvent accessibility), is developed to further improve the prediction accuracy. The results on two benchmark datasets, ASEdb and BID, show that this proposed method yields significantly better prediction accuracy than those previously published in the literature. In addition, we demonstrate the predictive power of our proposed method by modelling two protein complexes: the calmodulin/myosin light chain kinase complex and the heat shock locus gene products U and V complex. The results indicate that our method can identify more hot spots in these two complexes than other state-of-the-art methods.
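The F-score feature selection step mentioned above can be illustrated with a minimal sketch. The abstract does not spell out the formula, so this assumes the commonly used two-class F-score (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances). The function names and the toy data are hypothetical and are not taken from the authors' code.

```python
from statistics import mean, variance

def f_score(pos, neg):
    """F-score of a single feature, given its values in the positive
    (hot spot) and negative (non-hot spot) classes.
    Assumed definition: between-class scatter / within-class scatter."""
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    return between / within if within > 0 else float("inf")

def rank_features(X, y):
    """Rank feature columns of X by decreasing F-score.
    X: list of feature rows; y: list of 0/1 labels (1 = hot spot)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 3.0], [8.0, 4.0], [9.0, 6.0]]
y = [0, 0, 1, 1]
order, scores = rank_features(X, y)
print(order, scores)
```

In a pipeline of the kind described, only the top-ranked features would be passed on to the SVM; how many to keep is a tuning choice that the abstract does not state.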
We have developed an accurate prediction model for hot spot residues, given the structure of a protein complex. A major contribution of this study is to propose several new features based on the protrusion index of amino acid residues, which has been shown to significantly improve the prediction performance of hot spots. Moreover, we identify a compact and useful feature subset that has an important implication for identifying hot spot residues. Our results indicate that these features are more effective than the conventional evolutionary conservation, pairwise residue potentials and other traditional features considered previously, and that the combination of our and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spot residues. The data and source code are available on web site http://home.ustc.edu.cn/~jfxia/hotspot.html.
Hot spots are residues that contribute most of the binding free energy yet account for only a small portion of a protein interface. Experimental approaches to identify hot spots such as alanine scanning mutagenesis are expensive and time-consuming, while computational methods are emerging as effective alternatives to experimental approaches.
In this study, we propose a semi-supervised boosting SVM, which is called sbSVM, to computationally predict hot spots at protein-protein interfaces by combining protein sequence and structure features. Here, feature selection is performed using random forests to avoid over-fitting. Because positive samples are scarce, our approach samples useful unlabeled data iteratively to boost the performance of hot spot prediction. The performance evaluation of our method is carried out on a dataset generated from the ASEdb database for cross-validation and a dataset from the BID database for independent testing. Furthermore, a balanced dataset with similar amounts of hot spots and non-hot spots (65 and 66 respectively) derived from the first training dataset is used to further validate our method. All results show that our method yields good sensitivity, accuracy and F1 score compared with the existing methods.
Our method boosts prediction performance of hot spots by using unlabeled data to overcome the deficiency of available training data. Experimental results show that our approach is more effective than the traditional supervised algorithms and major existing hot spot prediction methods.
Alanine scanning mutagenesis is a powerful experimental methodology for investigating the structural and energetic characteristics of protein complexes. Individual amino-acids are systematically mutated to alanine and changes in free energy of binding (ΔΔG) measured. Several experiments have shown that protein-protein interactions are critically dependent on just a few residues ("hot spots") at the interface. Hot spots make a dominant contribution to the free energy of binding and if mutated they can disrupt the interaction. As mutagenesis studies require significant experimental efforts, there is a need for accurate and reliable computational methods. Such methods would also add to our understanding of the determinants of affinity and specificity in protein-protein recognition.
We present a novel computational strategy to identify hot spot residues, given the structure of a complex. We consider the basic energetic terms that contribute to hot spot interactions, i.e. van der Waals potentials, solvation energy, hydrogen bonds and Coulomb electrostatics. We treat them as input features and use machine learning algorithms such as Support Vector Machines and Gaussian Processes to optimally combine and integrate them, based on a set of training examples of alanine mutations. We show that our approach is effective in predicting hot spots and it compares favourably to other available methods. In particular we find the best performances using Transductive Support Vector Machines, a semi-supervised learning scheme. When hot spots are defined as those residues for which ΔΔG ≥ 2 kcal/mol, our method achieves a precision and a recall respectively of 56% and 65%.
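As a small illustration of the evaluation described above, the sketch below labels residues as hot spots using the ΔΔG ≥ 2 kcal/mol cutoff quoted in the text and computes precision and recall for a set of predictions. The ΔΔG values and predictions are made-up toy data, and the function names are hypothetical.

```python
HOT_SPOT_CUTOFF = 2.0  # kcal/mol, the threshold quoted in the text

def is_hot_spot(ddg, cutoff=HOT_SPOT_CUTOFF):
    """A residue counts as a hot spot if its alanine mutation costs >= cutoff."""
    return ddg >= cutoff

def precision_recall(true_labels, predicted_labels):
    """Precision and recall for boolean label lists of equal length."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)        # true positives
    fp = sum(1 for t, p in pairs if not t and p)    # false positives
    fn = sum(1 for t, p in pairs if t and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example (hypothetical values, not from ASEdb).
ddg_values = [2.5, 0.3, 2.0, 1.1, 3.4]
predicted = [True, True, False, False, True]
truth = [is_hot_spot(d) for d in ddg_values]
precision, recall = precision_recall(truth, predicted)
print(precision, recall)
```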
We have developed a hybrid scheme in which energy terms are used as input features of machine learning models. This strategy combines the strengths of machine learning and energy-based methods. Although so far these two types of approaches have mainly been applied separately to biomolecular problems, the results of our investigation indicate that there are substantial benefits to be gained by their integration.
Protein-protein association is essential for a variety of cellular processes and hence a large number of investigations are being carried out to understand the principles of protein-protein interactions. In this study, oligomeric protein structures are viewed from a network perspective to obtain new insights into protein association. Structure graphs of proteins have been constructed from a non-redundant set of protein oligomer crystal structures by considering amino acid residues as nodes and the edges are based on the strength of the non-covalent interactions between the residues. The analysis of such networks has been carried out in terms of amino acid clusters and hubs (highly connected residues) with special emphasis to protein interfaces.
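The network view described above (residues as nodes, edges defined by non-covalent interaction strength, hubs as highly connected residues) can be sketched as follows. The strength values, the edge threshold, the hub degree cutoff and the `chain:residue` naming are all hypothetical choices made for illustration; the abstract does not give the actual parameters.

```python
from collections import defaultdict

def build_graph(strengths, min_strength):
    """Build an undirected residue graph.
    strengths: dict mapping (residue_a, residue_b) -> interaction strength.
    An edge is kept only if its strength reaches min_strength."""
    graph = defaultdict(set)
    for (a, b), strength in strengths.items():
        if strength >= min_strength:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def find_hubs(graph, min_degree):
    """Hubs are residues with at least min_degree neighbours."""
    return sorted(node for node, nbrs in graph.items() if len(nbrs) >= min_degree)

def chain_of(residue):
    """Residue ids are assumed to look like 'A:R45' (chain, then residue)."""
    return residue.split(":")[0]

def interface_edges(graph):
    """Edges whose two residues belong to different chains."""
    return sorted((a, b) for a in graph for b in graph[a]
                  if a < b and chain_of(a) != chain_of(b))

# Toy interaction strengths (hypothetical).
strengths = {
    ("A:R45", "B:D12"): 6.0, ("A:R45", "B:E15"): 5.0,
    ("A:R45", "A:L40"): 4.5, ("A:R45", "A:F50"): 7.0,
    ("B:D12", "B:E15"): 2.0, ("A:L40", "A:F50"): 3.0,
}
graph = build_graph(strengths, min_strength=4.0)
hubs = find_hubs(graph, min_degree=4)
print(hubs, interface_edges(graph))
```

Raising `min_strength` keeps only stronger contacts, so sweeping it is one simple way to compare strong and weak interfaces in the spirit of the analysis described.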
A variety of interactions that occur at the interfaces, such as hydrogen bonds, salt bridges, aromatic and hydrophobic interactions, are identified in a consolidated manner as amino acid clusters at the interface. Moreover, the characterization of the highly connected hub-forming residues at the interfaces and their comparison with the hubs from the non-interface regions and the non-hubs in the interface regions show that there is a predominance of charged interactions at the interfaces. Further, strong and weak interfaces are identified on the basis of the interaction strength between amino acid residues and the sizes of the interface clusters, which also show that many protein interfaces are stronger than their monomeric protein cores. The interface strengths evaluated based on the interface clusters and hubs also correlate well with experimentally determined dissociation constants for known complexes. Finally, the interface hubs identified using the present method correlate very well with experimentally determined hotspots in the interfaces of protein complexes obtained from the Alanine Scanning Energetics database (ASEdb). A few predictions of interface hot spots have also been made based on the results obtained from this analysis, which await experimental verification.
The construction and analysis of oligomeric protein structure networks and their comparison with monomeric protein structure networks provide insights into protein association. Further, the interface hubs identified using the present method can be effective targets for interface de-stabilizing mutations. We believe this analysis will significantly enhance our knowledge of the principles behind protein association and also aid in protein design.
Protein–protein interactions, a key to almost any biological process, are mediated by molecular mechanisms that are not entirely clear. The study of these mechanisms often focuses on all residues at protein–protein interfaces. However, only a small subset of all interface residues is actually essential for recognition or binding. Commonly referred to as “hotspots,” these essential residues are defined as residues that impede protein–protein interactions if mutated. While no in silico tool identifies hotspots in unbound chains, numerous prediction methods were designed to identify all the residues in a protein that are likely to be a part of protein–protein interfaces. These methods typically identify successfully only a small fraction of all interface residues. Here, we analyzed the hypothesis that the two subsets correspond (i.e., that in silico methods may predict few residues because they preferentially predict hotspots). We demonstrate that this is indeed the case and that we can therefore predict directly from the sequence of a single protein which residues are interaction hotspots (without knowledge of the interaction partner). Our results suggested that most protein complexes are stabilized by similar basic principles. The ability to accurately and efficiently identify hotspots from sequence enables the annotation and analysis of protein–protein interaction hotspots in entire organisms and thus may benefit function prediction and drug development. The server for prediction is available at http://www.rostlab.org/services/isis.
Interactions between proteins underlie all biological processes. Hence, to fully understand or to control biological processes we need to unravel the principles of protein interactions. The quest for these principles has focused predominantly on the entire interfaces between two interacting proteins. However, it has been shown that only few of the interface residues are essential for the recognition and binding to other proteins. The identification of these residues, commonly referred to as binding “hotspots,” is a first step toward understanding the function of proteins and studying their interactions. Experimentally, hotspots could be identified by mutating single residues—an expensive and laborious procedure that is not applicable on a large scale. Here, we show that it is possible to identify protein interaction hotspots computationally on a large scale based on the amino acid sequence of a single protein, without requiring the knowledge of its interaction partner. Our results suggest that most protein complexes are stabilized by similar basic principles. The ability to accurately and efficiently identify hotspots from sequence enables the annotation and analysis of protein–protein interaction hotspots in an entire organism and thus may benefit function prediction and drug development.
Summary: Many existing databases annotate experimentally characterized single nucleotide polymorphisms (SNPs). Each non-synonymous SNP (nsSNP) changes one amino acid in the gene product (single amino acid substitution; SAAS). This change can either affect protein function or be neutral in that respect. Most polymorphisms lack experimental annotation of their functional impact. Here, we introduce SNPdbe (SNP database of effects), with predictions of computationally annotated functional impacts of SNPs. Database entries represent nsSNPs in dbSNP and 1000 Genomes collection, as well as variants from UniProt and PMD. SAASs come from >2600 organisms; ‘human’ being the most prevalent. The impact of each SAAS on protein function is predicted using the SNAP and SIFT algorithms and augmented with experimentally derived function/structure information and disease associations from PMD, OMIM and UniProt. SNPdbe is consistently updated and easily augmented with new sources of information. The database is available as a MySQL dump and via a web front end that allows searches with any combination of organism names, sequences and mutation IDs.
Fks1, with orthologs in nearly all fungi as well as plants and many protists, plays a central role in fungal cell wall formation as the putative catalytic component of β-1,3-glucan synthase. It is also the target for an important new antifungal group, the echinocandins, as evidenced by the localization of resistance-conferring mutations to Fks1 hot spots 1, 2, and 3 (residues 635 to 649, 1354 to 1361, and 690 to 700, respectively). Since Fks1 is an integral membrane protein and echinocandins are cyclic peptides with lipid tails, Fks1 topology is key to understanding its function and interaction with echinocandins. We used hemagglutinin (HA)-Suc2-His4C fusions to C-terminally truncated Saccharomyces cerevisiae Fks1 to experimentally define its topology and site-directed mutagenesis to test function of selected residues. Of the 15 to 18 transmembrane helices predicted in silico for Fks1 from evolutionarily diverse fungi, 13 were experimentally confirmed. The N terminus (residues 1 to 445) is cytosolic and the C terminus (residues 1823 to 1876) external; both are essential to Fks1 function. The cytosolic central domain (residues 715 to 1294) includes newly recognized homology to glycosyltransferases, and residues potentially involved in substrate UDP-glucose binding and catalysis are essential. All three hot spots are external, with hot spot 1 adjacent to and hot spot 3 largely embedded within the outer leaflet of the membrane. This topology suggests a model in which echinocandins interact through their lipid tails with hot spot 3 and through their cyclic peptides with hot spots 1 and 2.
In the context of protein-protein interactions, the term “hot spot” refers to a residue or cluster of residues that makes a major contribution to the binding free energy, as determined by alanine scanning mutagenesis. In contrast, in pharmaceutical research a hot spot is a site on a target protein that has high propensity for ligand binding and hence is potentially important for drug discovery. Here we examine the relationship between these two hot spot concepts by comparing alanine scanning data for a set of 15 proteins with results from mapping the protein surfaces for sites that can bind fragment-sized small molecules. We find the two types of hot spots are largely complementary; the residues protruding into hot spot regions identified by computational mapping or experimental fragment screening are almost always themselves hot spot residues as defined by alanine scanning experiments. Conversely, a residue that is found by alanine scanning to contribute little to binding rarely interacts with hot spot regions on the partner protein identified by fragment mapping. Despite the strong correlation between the two hot spot concepts, they differ fundamentally. In particular, while identification of a hot spot by alanine scanning establishes the potential to generate substantial interaction energy with a binding partner, there are additional topological requirements to be a hot spot for small molecule binding. Hence, only a minority of hot spots identified by alanine scanning represent sites that are potentially useful for small inhibitor binding, and it is this subset that is identified by experimental or computational fragment screening.
We report a comprehensive analysis of binding energy hot spots at the protein-protein interaction (PPI) interface between NF-κB Essential Modulator (NEMO) and IκB kinase subunit β (IKKβ), an interaction that is critical for NF-κB pathway signaling, using experimental alanine scanning mutagenesis and also the FTMap method for computational fragment screening. The experimental results confirm that the previously identified NBD region of IKKβ contains the highest concentration of hot spot residues, the strongest of which are W739, W741 and L742 (ΔΔG = 4.3, 3.5 and 3.2 kcal/mol, respectively). The region occupied by these residues defines a potentially druggable binding site on NEMO that extends for ~16 Å to additionally include the regions that bind IKKβ L737 and F734. NBD residues D738 and S740 are also important for binding but do not make direct contact with NEMO, instead likely acting to stabilize the active conformation of surrounding residues. We additionally found two previously unknown hot spot regions centered on IKKβ residues L708/V709 and L719/I723. The computational approach successfully identified all three hot spot regions on IKKβ. Moreover, the method was able to accurately quantify the energetic importance of all hot spot residues involving direct contact with NEMO. Our results provide new information to guide the discovery of small molecule inhibitors that target the NEMO/IKKβ interaction. They additionally clarify the structural and energetic complementarity between “pocket-forming” and “pocket-occupying” hot spot residues, and further validate computational fragment mapping as a method for identifying hot spots at PPI interfaces.
IKKγ; alanine scanning mutagenesis; protein-protein interactions; fluorescence polarization; fluorescence anisotropy
Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction is increasingly important for understanding the essence of protein interactions and for narrowing down the search space for drug design. Many computational methods have been developed, each proposing different features. However, a comparative assessment of these features, as well as more effective and accurate methods, is still urgently needed.
In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes), whereas in the Ab- dataset (antigen-antibody complexes are excluded) there are significant differences in these two features between hot spots and non-hot spots. Secondly, we explore the predictive ability for each feature and the combinations of features by support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes.
Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots.
Motivation: The O-ring theory reveals that the binding hot spot at a protein interface is surrounded by a ring of residues that are energetically less important than the residues in the hot spot. As this ring of residues serves to occlude water molecules from the hot spot, the O-ring theory is also called the ‘water exclusion’ hypothesis. We propose a ‘double water exclusion’ hypothesis to refine the O-ring theory by assuming the hot spot itself is water-free. To computationally model a water-free hot spot, we use a biclique pattern that is defined as two maximal groups of residues from two chains in a protein complex holding the property that every residue contacts with all residues in the other group.
Methods and Results: Given a chain pair A and B of a protein complex from the Protein Data Bank (PDB), we calculate the interatomic distance of all possible pairs of atoms between A and B. We then represent A and B as a bipartite graph based on this distance information. Maximal biclique subgraphs are subsequently identified from all of the bipartite graphs to locate biclique patterns at the interfaces. We address two properties of biclique patterns: a non-redundant occurrence in PDB, and a correspondence with hot spots when the solvent-accessible surface area (SASA) of a biclique pattern in the complex form is small. A total of 1293 biclique patterns are discovered which have a non-redundant occurrence of at least five, and which each have a minimum of two and four residues at the two sides. Through extensive queries to the HotSprint and ASEdb databases, we verified that biclique patterns are rich in true hot residues. Our algorithm and results provide a new way to identify hot spots by examining proteins' structural data.
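The procedure described above (interatomic distances between two chains, a bipartite residue graph, then maximal bicliques) can be sketched on toy data. This is not the authors' mining algorithm: the enumeration below is a brute-force closure over subsets, suitable only for very small interfaces, and the distance cutoff, coordinates and residue names are hypothetical.

```python
from itertools import combinations
from math import dist

def contact_graph(atoms_a, atoms_b, cutoff):
    """Bipartite residue graph between chains A and B.
    atoms_*: list of (residue_id, (x, y, z)).
    Two residues are linked if any pair of their atoms lies within cutoff."""
    adj = {}
    for res_a, xyz_a in atoms_a:
        for res_b, xyz_b in atoms_b:
            if dist(xyz_a, xyz_b) <= cutoff:
                adj.setdefault(res_a, set()).add(res_b)
    return adj

def maximal_bicliques(adj, min_a=1, min_b=1):
    """Enumerate maximal bicliques by brute force (exponential in |A|).
    For each subset of A-side residues, take their common B-side neighbours,
    then extend the A side to every residue that contacts all of them."""
    left = sorted(adj)
    found = set()
    for k in range(1, len(left) + 1):
        for subset in combinations(left, k):
            common_b = set.intersection(*(adj[a] for a in subset))
            if not common_b:
                continue
            closed_a = frozenset(a for a in left if common_b <= adj[a])
            if len(closed_a) >= min_a and len(common_b) >= min_b:
                found.add((closed_a, frozenset(common_b)))
    return found

# Toy coordinates (hypothetical): one atom per residue for brevity.
atoms_a = [("A1", (0, 0, 0)), ("A2", (1, 0, 0)), ("A3", (10, 0, 0))]
atoms_b = [("B1", (0, 3, 0)), ("B2", (1, 3, 0)), ("B3", (10, 3, 0))]
adj = contact_graph(atoms_a, atoms_b, cutoff=3.5)
bicliques = maximal_bicliques(adj)
print(bicliques)
```

The `min_a` and `min_b` arguments mirror the idea of requiring a minimum number of residues on each side; the specific minimum sizes used in the study are not reproduced here.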
Availability: The biclique mining algorithm is available at http://www.ntu.edu.sg/home/jyli/dwe.html.
Supplementary information: Supplementary data are available at Bioinformatics online.
Summary: Many non-synonymous single nucleotide polymorphisms (nsSNPs) in humans are suspected to impact protein function. Here, we present a publicly available server implementation of the method SNAP (screening for non-acceptable polymorphisms) that predicts the functional effects of single amino acid substitutions. SNAP identifies over 80% of the non-neutral mutations at 77% accuracy and over 76% of the neutral mutations at 80% accuracy at its default threshold. Each prediction is associated with a reliability index that correlates with accuracy and thereby enables experimentalists to zoom into the most promising predictions.
Availability: Web-server: http://www.rostlab.org/services/SNAP; downloadable program available upon request.
Supplementary information: Supplementary data are available at Bioinformatics online.
Binding free energy and binding hot spots at protein-protein interfaces are two important research areas for understanding protein interactions. Computational methods have been developed previously for accurate prediction of binding free energy change upon mutation for interfacial residues. However, a large number of interrupted and unimportant atomic contacts are used in the training phase which caused accuracy loss.
This work proposes a new method, βACV, to predict the change of binding free energy after alanine mutations. βACV integrates accessible surface area (ASA) and our newly defined β contacts together into an atomic contact vector (ACV). A β contact between two atoms is a direct contact that is not interrupted by any other atom between them. A β contact’s potential contribution to protein binding is also assumed to be inversely proportional to its ASA, following the water exclusion hypothesis of binding hot spots. Tested on a dataset of 396 alanine mutations, our method is found to be superior in classification performance to many other methods, including Robetta, FoldX, HotPOINT, an ACV method of β contacts without ASA integration, and ACV methods (similar to βACV but based on distance-cutoff contacts). Based on our data analysis and results, we conclude that: (i) our method is powerful in the prediction of binding free energy change after alanine mutation; (ii) β contacts are better than distance-cutoff contacts for modeling the well-organized protein-binding interfaces; (iii) β contacts usually make up only a small fraction of the distance-based contacts; and (iv) water exclusion is a necessary condition for a residue to become a binding hot spot.
βACV is designed using the advantages of both β contacts and water exclusion. It is an excellent tool to predict binding free energy changes and binding hot spots after alanine mutation.
The entry of the SARS coronavirus (SCV) into cells is initiated by binding of its spike envelope glycoprotein (S) to a receptor, ACE2. We and others identified the receptor-binding domain (RBD) by using S fragments of various lengths but all including the amino acid residue 318 and two other potential glycosylation sites. To further characterize the role of glycosylation and identify residues important for its function as an interacting partner of ACE2, we have cloned, expressed and characterized various soluble fragments of S containing RBD, and mutated all potential glycosylation sites and 32 other residues. The shortest of these fragments still able to bind the receptor ACE2 did not include residue 318 (which is a potential glycosylation site), but started at residue 319, and has only two potential glycosylation sites (residues 330 and 357). Mutation of each of these sites to either alanine or glutamine, as well as mutation of residue 318 to alanine in longer fragments, resulted in the same decrease of molecular weight (by approximately 3 kDa), suggesting that all glycosylation sites are functional. Simultaneous mutation of all glycosylation sites resulted in lack of expression, suggesting that at least one glycosylation site (any of the three) is required for expression. Glycosylation did not affect binding to ACE2. Alanine scanning mutagenesis of the fragment S319–518 resulted in the identification of ten residues (K390, R426, D429, T431, I455, N473, F483, Q492, Y494, R495) that significantly reduced binding to ACE2, and one residue (D393) that appears to increase binding. Mutation of residue T431 reduced binding by about 2-fold, and mutation of the other eight residues by more than 10-fold. Analysis of these data and the mapping of these mutations on the recently determined crystal structure of a fragment containing the RBD complexed to ACE2 (Li, F, Li, W, Farzan, M, and Harrison, S. C., submitted) suggested the existence of two hot spots on the S RBD surface, R426 and N473, which are likely to contribute a significant portion of the binding energy. The finding that most of the mutations (23 out of 34 including glycosylation sites) do not affect the RBD binding function indicates possible mechanisms for evasion of immune responses.
Human heat shock protein of 90 kDa (hHsp90) is a homodimer that has an essential role in facilitating malignant transformation at the molecular level. Inhibiting hHsp90 function is a validated approach for treating different types of tumors. Inhibiting the dimerization of hHsp90 via its C-terminal domain (CTD) should provide a novel way to therapeutically interfere with hHsp90 function. Here, we predicted hot spot residues that cluster in the CTD dimerization interface by a structural decomposition of the effective energy of binding computed by the MM-GBSA approach and confirmed these predictions using in silico alanine scanning with DrugScorePPI. Mutation of these residues to alanine caused a significant decrease in the melting temperature according to differential scanning fluorimetry experiments, indicating a reduced stability of the mutant hHsp90 complexes. Size exclusion chromatography and multi-angle light scattering studies demonstrate that the reduced stability of the mutant hHsp90 correlates with a lower complex stoichiometry due to the disruption of the dimerization interface. These results suggest that the identified hot spot residues can be used as a pharmacophoric template for identifying and designing small-molecule inhibitors of hHsp90 dimerization.
The melanocortin 4 receptor (MC4R) is a G-protein-coupled receptor (GPCR) and a key molecule in the regulation of energy homeostasis. At least 159 substitutions in the coding region of human MC4R (hMC4R) have been described experimentally; over 80 of those occur naturally, and many have been implicated in obesity. However, assessment of the presumably functionally essential residues remains incomplete. Here we have performed a complete in silico mutagenesis analysis to assess the functional essentiality of all possible nonnative point mutants in the entire hMC4R protein (332 residues). We applied SNAP, which is a method for quantifying functional consequences of single amino acid (AA) substitutions, to calculate the effects of all possible substitutions at each position in the hMC4R AA sequence. We compiled a mutability score that reflects the degree to which a particular residue is likely to be functionally important. We performed the same experiment for a paralogue human melanocortin receptor (hMC1R) and a mouse orthologue (mMC4R) in order to compare computational evaluations of highly related sequences. Three results are most salient: 1) our predictions largely agree with the available experimental annotations; 2) this analysis identified several AAs that are likely to be functionally critical, but have not yet been studied experimentally; and 3) the differential analysis of the receptors implicates a number of residues as specifically important to MC4Rs vs. other GPCRs, such as hMC1R.—Bromberg, Y., Overton, J., Vaisse, C., Leibel, R. L., Rost, B. In silico mutagenesis: a case study of the melanocortin 4 receptor.
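The complete in silico mutagenesis described above amounts to enumerating every non-native substitution at every position (for 332 residues, 332 × 19 = 6,308 mutants) and summarizing the predictions per residue. The sketch below shows that bookkeeping only. It does not call SNAP: `toy_predictor` is a hypothetical stand-in, and defining the mutability score as the fraction of substitutions predicted non-neutral is an assumption, since the abstract does not give the formula.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def all_point_mutants(sequence):
    """Yield (position, native, substitute) for every non-native single
    substitution; positions are 1-based."""
    for pos, native in enumerate(sequence, start=1):
        for aa in AMINO_ACIDS:
            if aa != native:
                yield pos, native, aa

def mutability_scores(sequence, predict_effect):
    """Per-position fraction of substitutions predicted non-neutral.
    predict_effect(sequence, pos, aa) -> bool is supplied by the caller
    (in the study this role is played by SNAP; here it is a placeholder)."""
    hits = [0] * len(sequence)
    totals = [0] * len(sequence)
    for pos, _native, aa in all_point_mutants(sequence):
        totals[pos - 1] += 1
        if predict_effect(sequence, pos, aa):
            hits[pos - 1] += 1
    return [h / t for h, t in zip(hits, totals)]

def toy_predictor(sequence, pos, aa):
    """Hypothetical rule for demonstration only: any change at a cysteine,
    or any change to proline, is called non-neutral."""
    return sequence[pos - 1] == "C" or aa == "P"

sequence = "ACD"  # toy sequence, not hMC4R
mutants = list(all_point_mutants(sequence))
scores = mutability_scores(sequence, toy_predictor)
print(len(mutants), scores)
```

A high score flags a position where most substitutions are predicted to change function, which is the sense in which such a score points to residues that are likely to be functionally important.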
MC4R; MC1R; SNAP; active functional site; obesity; diabetes
A protein binding hot spot is a cluster of residues in the interface that are energetically important for the binding of the protein with its interaction partner. Identifying protein binding hot spots can provide useful information for protein engineering and drug design, and can also deepen our understanding of protein-protein interaction. These residues are usually buried inside the interface with very low solvent accessible surface area (SASA). Thus SASA is widely used as a prominent feature in hot spot prediction by many computational methods. However, SASA cannot distinguish slightly buried residues, most of which are non-hot spots, from deeply buried ones that are usually inside a hot spot.
We propose a new descriptor called “burial level” for characterizing residues, atoms and atomic contacts. Specifically, burial level captures the depth at which residues are buried. We identify different kinds of deeply buried atomic contacts (DBAC) at different burial levels that are directly broken in alanine substitution. We use their numbers as input for an SVM to classify residues as hot spots or non-hot spots. We achieve an F-measure of 0.6237 under leave-one-out cross-validation on a data set containing 258 mutations. This performance is better than that of other computational methods.
Our results show that hot spot residues tend to be deeply buried in the interface, rather than merely having a low SASA value. This indicates that a high burial level is not only a necessary condition but also closer to a sufficient one than a low SASA for a residue to be a hot spot residue. We find that deeply buried atoms become increasingly important as their burial levels rise. This work also confirms the contribution of deeply buried interfacial atomic contacts to the energy of protein binding hot spots.
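The evaluation protocol named in the abstract above (an F-measure under leave-one-out cross-validation) can be illustrated with a short sketch. The abstract uses an SVM; to stay dependency-free, this sketch swaps in a single-feature threshold classifier, which is an assumption made for illustration only. The feature values are invented toy counts, not data from the study.

```python
from typing import List


def f_measure(y_true: List[bool], y_pred: List[bool]) -> float:
    """Harmonic mean of precision and recall for the positive (hot spot) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train_threshold(xs: List[float], ys: List[bool]) -> float:
    """Stand-in for SVM training: midpoint between the two class means.

    Assumes both classes are present in the training data.
    """
    pos = [x for x, y in zip(xs, ys) if y]
    neg = [x for x, y in zip(xs, ys) if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def leave_one_out_predictions(xs: List[float], ys: List[bool]) -> List[bool]:
    """Predict each sample with a model trained on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        threshold = train_threshold(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(xs[i] > threshold)
    return predictions


if __name__ == "__main__":
    # Toy feature: number of deeply buried contacts broken by the mutation.
    contacts = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
    is_hot_spot = [False, False, False, True, True, True]
    predicted = leave_one_out_predictions(contacts, is_hot_spot)
    print(f_measure(is_hot_spot, predicted))
```

Because every sample is held out exactly once, the F-measure is computed over predictions that never saw their own label during training.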
Many genetic variations are single nucleotide polymorphisms (SNPs). Non-synonymous SNPs are ‘neutral’ if the resulting point-mutated protein is not functionally discernible from the wild type and ‘non-neutral’ otherwise. The ability to identify non-neutral substitutions could significantly aid targeting disease-causing detrimental mutations, as well as SNPs that increase the fitness of particular phenotypes. Here, we introduced comprehensive data sets to assess the performance of methods that predict SNP effects. Alongside, we introduced SNAP (screening for non-acceptable polymorphisms), a neural network-based method for the prediction of the functional effects of non-synonymous SNPs. SNAP needs only sequence information as input, but benefits from functional and structural annotations, if available. In a cross-validation test on over 80 000 mutants, SNAP identified 80% of the non-neutral substitutions at 77% accuracy and 76% of the neutral substitutions at 80% accuracy. This constituted an important improvement over other methods; the improvement rose to over ten percentage points for mutants for which existing methods disagreed. Possibly even more importantly, SNAP introduced a well-calibrated measure for the reliability of each prediction. This measure will allow users to focus on the most accurate predictions and/or the most severe effects. Available at http://www.rostlab.org/services/SNAP
The significant work that has been invested toward understanding protein–protein interaction has not translated into significant advances in structure-based predictions. In particular, redesigning protein surfaces to bind to unrelated receptors remains a challenge, partly due to receptor flexibility, which is often neglected in these efforts. In this work, we computationally graft the binding epitope of various small proteins obtained from the RCSB database to bind to barnase, lysozyme, and trypsin using a previously derived and validated algorithm. In an effort to probe the protein complexes in a realistic environment, all native and designer complexes were subjected to a total of nearly 400 ns of explicit-solvent molecular dynamics (MD) simulation. The MD data led to an unexpected observation: some of the designer complexes were highly unstable and decomposed during the trajectories. In contrast, the native and a number of designer complexes remained consistently stable. The unstable conformers provided us with a unique opportunity to define the structural and energetic factors that lead to unproductive protein–protein complexes. To that end, we used free energy calculations following the MM-PBSA approach to determine the role of nonpolar effects, electrostatics and entropy in binding. Remarkably, we found that a majority of unstable complexes exhibited more favorable electrostatics than native or stable designer complexes, suggesting that favorable electrostatic interactions are not a prerequisite for complex formation between proteins. However, nonpolar effects remained consistently more favorable in native and stable designer complexes, reinforcing the importance of hydrophobic effects in protein–protein binding. While entropy systematically opposed binding in all cases, there was no observed trend in the entropy difference between native and designer complexes.
A series of alanine scanning mutations of hot-spot residues at the interface of native and designer complexes showed less than optimal contacts of hot-spot residues with their surroundings in the unstable conformers, resulting in more favorable entropy for these complexes. Finally, disorder predictions revealed that secondary structures at the interface of unstable complexes exhibited greater disorder than the stable complexes.
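The energy bookkeeping behind the analysis above can be sketched in a few lines. This is a simplified illustration, assuming the binding free energy is split into the three contributions the abstract names (electrostatics, nonpolar effects, and entropy) and that the effect of an alanine mutation is the difference between mutant and wild-type binding free energies. The class name, field names, and all numbers are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingTerms:
    """Contributions to a binding free energy, in kcal/mol (illustrative split)."""

    electrostatic: float    # electrostatic contribution
    nonpolar: float         # nonpolar (e.g. van der Waals and nonpolar solvation) contribution
    minus_t_delta_s: float  # entropic term, -T*dS (positive when entropy opposes binding)

    def total(self) -> float:
        """Binding free energy as the sum of the three contributions."""
        return self.electrostatic + self.nonpolar + self.minus_t_delta_s


def alanine_ddg(wild_type: BindingTerms, mutant: BindingTerms) -> float:
    """Change in binding free energy on mutation; positive values mean the
    alanine mutant binds more weakly than the wild type."""
    return mutant.total() - wild_type.total()


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    wild_type = BindingTerms(electrostatic=-5.0, nonpolar=-40.0, minus_t_delta_s=20.0)
    mutant = BindingTerms(electrostatic=-4.0, nonpolar=-36.5, minus_t_delta_s=19.5)
    print(alanine_ddg(wild_type, mutant))
```

Keeping the terms separate makes it possible to ask which contribution drives a given change, which is the kind of comparison the abstract draws between stable and unstable complexes.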
Potassium (K+) ion channels switch between open and closed conformations. The nature of this important transition was revealed by comparing the X-ray crystal structures of the MthK channel from Methanobacterium thermoautotrophicum, obtained in its open conformation, and the KcsA channel from Streptomyces lividans, obtained in its closed conformation. We analyzed the dynamic characteristics and energetics of these homotetrameric structures in order to study the role of the intersubunit cooperativity in this transition. For this, elastic models and in silico alanine-scanning mutagenesis were used, respectively. Reassuringly, the calculations manifested motion from the open (closed) towards the closed (open) conformation. The calculations also revealed a network of dynamically and energetically coupled residues. Interestingly, the network suggests coupling between the selectivity filter and the gate, which are located at the two ends of the channel pore. Coupling between these two regions was not observed in calculations that were conducted with the monomer, which emphasizes the importance of the intersubunit interactions within the tetrameric structure for the cooperative gating behavior of the channel.
Potassium channels are found, in essence, in all kingdoms of life and all types of cells, and they are involved in key biological processes. For example, they are involved in the generation and propagation of nerve impulses in the synapse and neuron. Mutations in the proteins that form the channel may lead to diseases, such as multiple sclerosis, cystic fibrosis, and cardiac arrhythmia. Because of their involvement in these and other channelopathies, i.e., channel-related diseases, they are major drug targets. The channels switch between open (ion-conducting) and closed conformations. The structural characteristics of the transition between these conformations were studied using X-ray crystallography, spectroscopic, and single-molecule techniques, as well as computations. Here we used normal-mode analysis and in silico alanine-scanning mutagenesis to understand the molecular underpinnings of this transition. Our results suggest that the transition is mediated through a network of amino acids that are coupled to each other and connect the two ends of the pore. The importance of many of these residues was noted in previous empirical studies. The calculations also suggest that interactions between subunits of the homotetrameric structure of the channel contribute to the transition. The approach may also be useful to elucidate the mechanism of other transmembrane proteins in molecular detail and to suggest key amino acids that are functionally important.
The 3D-partner is a web tool to predict interacting partners and binding models of a query protein sequence through structure complexes and a new scoring function. 3D-partner first utilizes IMPALA to identify homologous structures (templates) of a query from a heterodimer profile library. The interacting-partner sequence profiles of these templates are then used to search interacting candidates of the query from protein sequence databases (e.g. SwissProt) by PSI-BLAST. We developed a new scoring function, which includes the contact-residue interacting score (e.g. the steric, hydrogen bonds, and electrostatic interactions) and the template consensus score (e.g. couple-conserved residue and the template similarity scores), to evaluate the interfaces between the query and interacting candidates. Based on this scoring function, 3D-partner provides the statistical significance, the binding models (e.g. hydrogen bonds and conserved amino acids) and functional annotations of interacting partners. The correlation between experimental energies and predicted binding affinities of our scoring function is 0.91 on 275 mutated residues from the ASEdb. The average precision of the server is 0.72 on 563 queries and the execution time of this server for a query is ∼15 s on average. These results suggest that the 3D-partner server can be useful in protein-protein interaction predictions and binding model visualizations. The server is available online at: http://3D-partner.life.nctu.edu.tw.
The study of protein-protein interactions is becoming increasingly important for biotechnological and therapeutic reasons. We can define two major areas therein: the structural prediction of protein-protein binding mode, and the identification of the relevant residues for the interaction (so called 'hot-spots'). These hot-spot residues are of high interest since targeting them is considered one of the possible ways of disrupting a protein-protein interaction. Unfortunately, large-scale experimental measurement of residue contribution to the binding energy, based on alanine-scanning experiments, is costly and thus data is fairly limited. Recent computational approaches for hot-spot prediction have been reported, but they usually require the structure of the complex.
We have applied here normalized interface propensity (NIP) values derived from rigid-body docking with electrostatics and desolvation scoring for the prediction of interaction hot-spots. This parameter identifies hot-spot residues on interacting proteins with predictive rates that are comparable to other existing methods (up to 80% positive predictive value), with the advantage of not requiring any prior structural knowledge of the complex.
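The agreement between experimental energies and predicted binding affinities reported above is a correlation coefficient. The abstract does not say which coefficient was used; the sketch below assumes the Pearson correlation and uses invented toy values, not data from ASEdb.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented values: experimental energy changes vs. predicted scores.
    experimental = [0.2, 0.9, 1.5, 2.4, 3.1]
    predicted = [0.1, 1.1, 1.3, 2.6, 2.9]
    print(round(pearson_r(experimental, predicted), 3))
```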
The NIP values derived from rigid-body docking can reliably identify a number of hot-spot residues whose contribution to the interaction arises from electrostatics and desolvation effects. Our method can propose residues to guide experiments in complexes of biological or therapeutic interest, even in cases with no available 3D structure of the complex.
Heterozygous mutations in the central glycolytic enzyme glucokinase (GCK) can result in an autosomal dominant inherited disease, namely maturity-onset diabetes of the young, type 2 (MODY 2). MODY 2 is characterised by early onset: it usually appears before 25 years of age and presents as a mild form of hyperglycaemia. In recent years, the number of known GCK mutations has markedly increased. As a result, interpreting which mutations cause a disease or confer susceptibility to a disease and characterising these deleterious mutations can be a difficult task in large-scale analyses and may be impossible when using a structural perspective. The laborious and time-consuming nature of the experimental analysis led us to attempt to develop a cost-effective computational pipeline for diabetic research that is based on the fundamentals of protein biophysics and that facilitates our understanding of the relationship between phenotypic effects and evolutionary processes. In this study, we investigate missense mutations in the GCK gene by using a wide array of evolution- and structure-based computational methods, such as SIFT, PolyPhen2, PhD-SNP, SNAP, SNPs&GO, fathmm, and Align GVGD. Based on the computational prediction scores obtained using these methods, three mutations, namely E70K, A188T, and W257R, were identified as highly deleterious on the basis of their effects on protein structure and function. Using the evolutionary conservation predictors Consurf and Scorecons, we further demonstrated that most of the predicted deleterious mutations, including E70K, A188T, and W257R, occur in highly conserved regions of GCK. The effects of the mutations on protein stability were computed using PoPMusic 2.1, I-mutant 3.0, and Dmutant. 
We also conducted molecular dynamics (MD) simulation analysis through in silico modelling to investigate the conformational differences between the native and the mutant proteins and found that the identified deleterious mutations alter the stability, flexibility, and solvent-accessible surface area of the protein. Furthermore, the functional role of each SNP in GCK was identified and characterised using SNPeffect 4.0, F-SNP, and FASTSNP. We hope that the observed results aid in the identification of disease-associated mutations that affect protein structure and function. Our in silico findings provide a new perspective on the role of GCK mutations in MODY2 from an evolution-based structure-centric point of view. The computational architecture described in this paper can be used to predict the most appropriate disease phenotypes for large-genome sequencing projects and to provide individualised drug therapy for complex diseases such as diabetes.
GCK; Diabetes; Missense mutations; Evolutionary analysis; Molecular dynamics
Biological evolution conserves protein residues that are important for structure and function. Both protein stability and function often require a certain degree of structural co-operativity between spatially neighboring residues and it has previously been shown that conserved residues occur clustered together in protein tertiary structures, enzyme active sites and protein-DNA interfaces. Residues comprising protein interfaces are often more conserved compared to those occurring elsewhere on the protein surface. We investigate the extent to which conserved residues within protein-protein interfaces are clustered together in three dimensions.
Out of 121 and 392 interfaces in homodimers and heterocomplexes, 96.7 and 86.7%, respectively, have the conserved positions clustered within the overall interface region. The significance of this clustering was established in comparison to what is seen for the subsets of the same size of randomly selected residues from the interface. Conserved residues occurring in larger interfaces could often be sub-divided into two or more distinct sub-clusters. These structural cluster(s) comprising conserved residues indicate functionally important regions within the protein-protein interface that can be targeted for further structural and energetic analysis by experimental scanning mutagenesis. Almost 60% of experimental hot spot residues (with ΔΔG > 2 kcal/mol) were localized to these conserved residue clusters. An analysis of the residue types that are enriched within these conserved subsets compared to the overall interface showed that hydrophobic and aromatic residues are favored, but charged residues (both positive and negative) are less common. The potential use of this method for discriminating binding sites (interfaces) versus random surface patches was explored by comparing the clustering of conserved residues within each of these regions: in about 50% of cases the true interface is ranked among the top 10% of all surface patches.
Protein-protein interaction sites are much larger than small molecule binding sites, but still conserved residues are not randomly distributed over the whole interface and are distinctly clustered. The clustered nature of evolutionarily conserved residues within interfaces as compared to those within other surface patches not involved in binding has important implications for the identification of protein-protein binding sites and would have applications in docking studies.
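The significance test described above, which compares conserved residues against same-size random subsets of the interface, can be sketched as a simple randomization test. The abstract does not specify the clustering measure; this sketch assumes the mean pairwise distance between residue positions, and the coordinates are a toy example rather than a real interface.

```python
import math
import random
from itertools import combinations
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def mean_pairwise_distance(coords: Sequence[Coord]) -> float:
    """Average Euclidean distance over all pairs of points (needs >= 2 points)."""
    pairs = list(combinations(coords, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def clustering_p_value(
    interface: Dict[int, Coord],
    conserved_ids: Sequence[int],
    n_random: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate how often a random same-size subset of interface residues is
    at least as compact as the conserved subset."""
    rng = random.Random(seed)
    residue_ids = sorted(interface)
    observed = mean_pairwise_distance([interface[i] for i in conserved_ids])
    at_least_as_compact = 0
    for _ in range(n_random):
        sample = rng.sample(residue_ids, len(conserved_ids))
        if mean_pairwise_distance([interface[i] for i in sample]) <= observed:
            at_least_as_compact += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_compact + 1) / (n_random + 1)


if __name__ == "__main__":
    # Toy interface: 20 residues spaced 4 units apart along a line.
    toy_interface = {i: (4.0 * i, 0.0, 0.0) for i in range(20)}
    print(clustering_p_value(toy_interface, conserved_ids=[0, 1, 2, 3]))
```

A small p-value indicates that the conserved positions are more tightly grouped than expected for randomly chosen interface residues.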