Mutating residues into alanine (alanine scanning) is one of the fastest experimental means of probing hypotheses about protein function. Alanine scans can reveal functional hot spots, i.e. residues that alter function upon mutation. In vitro mutagenesis is cumbersome and costly: probing every residue in a protein is typically as infeasible as substituting each residue with all non-native amino acids. In contrast, such exhaustive mutagenesis is feasible in silico.
Previously, we developed SNAP to predict functional changes due to non-synonymous single nucleotide polymorphisms. Here, we applied SNAP to all experimental mutations in the ASEdb database of alanine scans; we identified 70% of the hot spots (≥1 kcal/mol change in binding energy); more severe changes were predicted more accurately. Encouraged, we carried out a complete all-against-all in silico mutagenesis for human glucokinase. Many of the residues predicted as functionally important have indeed been confirmed in the literature, others await experimental verification, and our method is ready to aid in the design of in vitro mutagenesis.
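As a rough illustration of what an all-against-all in silico mutagenesis enumerates, the sketch below lists every non-native single substitution of a sequence and the alanine-scan subset. The function names and the toy sequence are assumptions made for illustration; the sketch does not call SNAP and does not score any mutant.

```python
# Minimal sketch (illustrative names, not part of SNAP): enumerate point mutants.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes


def all_point_mutants(sequence):
    """Yield (label, mutant_sequence) for every non-native single substitution."""
    for index, native in enumerate(sequence):
        for substitute in AMINO_ACIDS:
            if substitute == native:
                continue  # skip the native residue itself
            label = f"{native}{index + 1}{substitute}"  # e.g. "K2A", 1-based position
            yield label, sequence[:index] + substitute + sequence[index + 1:]


def alanine_scan(sequence):
    """Return only the substitutions to alanine (native alanines are skipped)."""
    return [(label, seq) for label, seq in all_point_mutants(sequence)
            if label.endswith("A")]


if __name__ == "__main__":
    toy = "MKA"  # hypothetical three-residue sequence
    print(len(list(all_point_mutants(toy))))  # 19 substitutions per position
    print(alanine_scan(toy))
```

Each position contributes 19 mutants, so the exhaustive scan grows linearly with sequence length, whereas the alanine scan yields at most one mutant per position.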
ASEdb and glucokinase scores are available at http://www.rostlab.org/services/SNAP. For submissions of large/whole proteins for processing please contact the author.
Hot spots are residues that contribute most of the binding free energy yet account for only a small portion of a protein interface. Experimental approaches to identify hot spots, such as alanine scanning mutagenesis, are expensive and time-consuming, while computational methods are emerging as effective alternatives.
In this study, we propose a semi-supervised boosting SVM, called sbSVM, to computationally predict hot spots at protein-protein interfaces by combining protein sequence and structure features. Here, feature selection is performed using random forests to avoid over-fitting. Because positive samples are scarce, our approach iteratively samples useful unlabeled data to boost the performance of hot spot prediction. The method is evaluated on a dataset generated from the ASEdb database for cross-validation and on a dataset from the BID database for independent testing. Furthermore, a balanced dataset with similar numbers of hot spots and non-hot spots (65 and 66 respectively), derived from the first training dataset, is used to further validate our method. All results show that our method yields good sensitivity, accuracy and F1 score compared with existing methods.
Our method boosts prediction performance of hot spots by using unlabeled data to overcome the deficiency of available training data. Experimental results show that our approach is more effective than the traditional supervised algorithms and major existing hot spot prediction methods.
Alanine scanning mutagenesis is a powerful experimental methodology for investigating the structural and energetic characteristics of protein complexes. Individual amino-acids are systematically mutated to alanine and changes in free energy of binding (ΔΔG) measured. Several experiments have shown that protein-protein interactions are critically dependent on just a few residues ("hot spots") at the interface. Hot spots make a dominant contribution to the free energy of binding and if mutated they can disrupt the interaction. As mutagenesis studies require significant experimental efforts, there is a need for accurate and reliable computational methods. Such methods would also add to our understanding of the determinants of affinity and specificity in protein-protein recognition.
We present a novel computational strategy to identify hot spot residues, given the structure of a complex. We consider the basic energetic terms that contribute to hot spot interactions, i.e. van der Waals potentials, solvation energy, hydrogen bonds and Coulomb electrostatics. We treat them as input features and use machine learning algorithms such as Support Vector Machines and Gaussian Processes to optimally combine and integrate them, based on a set of training examples of alanine mutations. We show that our approach is effective in predicting hot spots and it compares favourably to other available methods. In particular we find the best performances using Transductive Support Vector Machines, a semi-supervised learning scheme. When hot spots are defined as those residues for which ΔΔG ≥ 2 kcal/mol, our method achieves a precision and a recall respectively of 56% and 65%.
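To make the evaluation criterion concrete, the following sketch labels residues as hot spots using the ΔΔG ≥ 2 kcal/mol threshold stated above and computes precision and recall for a set of predictions. The ΔΔG values and predictions are invented toy data, and the function names are illustrative; this is not the authors' evaluation code.

```python
# Minimal sketch with invented toy data: precision and recall at a ddG threshold.

def label_hot_spots(ddg_values, threshold=2.0):
    """Label a residue as a hot spot when its measured ddG (kcal/mol) >= threshold."""
    return [ddg >= threshold for ddg in ddg_values]


def precision_recall(actual, predicted):
    """Return (precision, recall) for two equal-length lists of booleans."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when nothing is predicted
    recall = tp / (tp + fn) if tp + fn else 0.0     # 0.0 when there are no hot spots
    return precision, recall


if __name__ == "__main__":
    measured_ddg = [2.5, 0.3, 3.1, 1.9, 2.0, 0.0]           # toy values, kcal/mol
    predicted = [True, True, True, False, False, False]      # toy classifier output
    actual = label_hot_spots(measured_ddg)
    print(precision_recall(actual, predicted))
```

Precision is the fraction of predicted hot spots that are real, and recall is the fraction of real hot spots that are recovered, so the two numbers reported above answer different questions about the same predictor.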
We have developed a hybrid scheme in which energy terms are used as input features of machine learning models. This strategy combines the strengths of machine learning and energy-based methods. Although so far these two types of approaches have mainly been applied separately to biomolecular problems, the results of our investigation indicate that there are substantial benefits to be gained by their integration.
In the context of protein-protein interactions, the term “hot spot” refers to a residue or cluster of residues that makes a major contribution to the binding free energy, as determined by alanine scanning mutagenesis. In contrast, in pharmaceutical research a hot spot is a site on a target protein that has high propensity for ligand binding and hence is potentially important for drug discovery. Here we examine the relationship between these two hot spot concepts by comparing alanine scanning data for a set of 15 proteins with results from mapping the protein surfaces for sites that can bind fragment-sized small molecules. We find the two types of hot spots are largely complementary; the residues protruding into hot spot regions identified by computational mapping or experimental fragment screening are almost always themselves hot spot residues as defined by alanine scanning experiments. Conversely, a residue that is found by alanine scanning to contribute little to binding rarely interacts with hot spot regions on the partner protein identified by fragment mapping. Despite the strong correlation between the two hot spot concepts, they differ fundamentally. In particular, while identification of a hot spot by alanine scanning establishes the potential to generate substantial interaction energy with a binding partner, there are additional topological requirements to be a hot spot for small molecule binding. Hence, only a minority of hot spots identified by alanine scanning represent sites that are potentially useful for small inhibitor binding, and it is this subset that is identified by experimental or computational fragment screening.
It is well known that most of the binding free energy of protein interactions is contributed by a few key hot spot residues. These residues are crucial for understanding the function of proteins and studying their interactions. Experimental hot spot detection methods such as alanine scanning mutagenesis are not applicable on a large scale because they are time-consuming and expensive. Therefore, reliable and efficient computational methods for identifying hot spots are urgently needed.
In this work, we introduce an efficient approach that uses support vector machine (SVM) to predict hot spot residues in protein interfaces. We systematically investigate a wide variety of 62 features from a combination of protein sequence and structure information. Then, to remove redundant and irrelevant features and improve the prediction performance, feature selection is employed using the F-score method. Based on the selected features, nine individual-feature based predictors are developed to identify hot spots using SVMs. Furthermore, a new ensemble classifier, namely APIS (A combined model based on Protrusion Index and Solvent accessibility), is developed to further improve the prediction accuracy. The results on two benchmark datasets, ASEdb and BID, show that this proposed method yields significantly better prediction accuracy than those previously published in the literature. In addition, we also demonstrate the predictive power of our proposed method by modelling two protein complexes: the calmodulin/myosin light chain kinase complex and the heat shock locus gene products U and V complex, which indicate that our method can identify more hot spots in these two complexes compared with other state-of-the-art methods.
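The F-score feature selection step can be sketched as follows. The abstract does not spell out the formula, so this example assumes the commonly used two-class F-score: the squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances. The feature names and values are invented toy data.

```python
# Minimal sketch, assuming the commonly used two-class F-score definition;
# the exact variant used by the authors is not stated in the abstract.
from statistics import mean, variance


def f_score(positive_values, negative_values):
    """F-score of one feature: between-class separation over within-class spread.

    Each class needs at least two samples because sample variance is used.
    """
    overall_mean = mean(list(positive_values) + list(negative_values))
    numerator = ((mean(positive_values) - overall_mean) ** 2
                 + (mean(negative_values) - overall_mean) ** 2)
    denominator = variance(positive_values) + variance(negative_values)
    if denominator == 0:
        return 0.0  # undefined for constant features; reported as 0.0 in this sketch
    return numerator / denominator


def rank_features(positive_rows, negative_rows):
    """Rank feature names by descending F-score; rows map feature name -> values."""
    scores = {name: f_score(positive_rows[name], negative_rows[name])
              for name in positive_rows}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy data: hypothetical feature values for hot spots and non-hot spots.
    hot = {"protrusion": [1.0, 2.0, 3.0], "noise": [1.0, 5.0, 9.0]}
    non_hot = {"protrusion": [7.0, 8.0, 9.0], "noise": [2.0, 5.0, 8.0]}
    print(rank_features(hot, non_hot))
```

A larger F-score means the feature separates the two classes better on its own; features below a chosen cutoff would be dropped before training the SVMs. The score treats features independently, so it does not detect redundancy between features.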
We have developed an accurate prediction model for hot spot residues, given the structure of a protein complex. A major contribution of this study is to propose several new features based on the protrusion index of amino acid residues, which has been shown to significantly improve the prediction performance of hot spots. Moreover, we identify a compact and useful feature subset that has an important implication for identifying hot spot residues. Our results indicate that these features are more effective than the conventional evolutionary conservation, pairwise residue potentials and other traditional features considered previously, and that the combination of our and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spot residues. The data and source code are available at http://home.ustc.edu.cn/~jfxia/hotspot.html.
The study of protein-protein interactions is becoming increasingly important for biotechnological and therapeutic reasons. We can define two major areas therein: the structural prediction of protein-protein binding mode, and the identification of the relevant residues for the interaction (so called 'hot-spots'). These hot-spot residues have high interest since they are considered one of the possible ways of disrupting a protein-protein interaction. Unfortunately, large-scale experimental measurement of residue contribution to the binding energy, based on alanine-scanning experiments, is costly and thus data is fairly limited. Recent computational approaches for hot-spot prediction have been reported, but they usually require the structure of the complex.
Here we have applied normalized interface propensity (NIP) values derived from rigid-body docking with electrostatics and desolvation scoring to the prediction of interaction hot-spots. This parameter identifies hot-spot residues on interacting proteins with predictive rates that are comparable to other existing methods (up to 80% positive predictive value), with the advantage of not requiring any prior structural knowledge of the complex.
The NIP values derived from rigid-body docking can reliably identify a number of hot-spot residues whose contribution to the interaction arises from electrostatics and desolvation effects. Our method can propose residues to guide experiments in complexes of biological or therapeutic interest, even in cases with no available 3D structure of the complex.
Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Hot spot prediction is therefore increasingly important for understanding the nature of protein interactions and for narrowing the search space in drug design. Many computational methods have been developed, each proposing different features. However, a comparative assessment of these features, as well as more effective and accurate methods, is still urgently needed.
In this study, we first comprehensively collect the features used to discriminate hot spots from non-hot spots and analyze their distributions. We find that hot spots have lower relASA and a larger relative change in ASA, suggesting that hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes). In the Ab- dataset (antigen-antibody complexes excluded), however, there are significant differences in these two features between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on the independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method by predicting hot spots of two protein complexes.
Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots.
Protein–protein complexes play key roles in all cellular signal transduction processes. We have developed a fast and accurate computational approach to predict changes in the binding free energy upon alanine mutations in protein–protein interfaces. The approach is based on a knowledge-based scoring function, DrugScorePPI, for which pair potentials were derived from 851 complex structures and adapted against 309 experimental alanine scanning results. Based on this approach, we developed the DrugScorePPI webserver. The input consists of a protein–protein complex structure; the output is a summary table and bar plot of binding free energy differences for wild-type residue-to-Ala mutations. The results of the analysis are mapped on the protein–protein complex structure and visualized using Jmol. A single interface can be analyzed within a few minutes. Our approach has been successfully validated by application to an external test set of 22 alanine mutations in the interface of Ras/RalGDS. The DrugScorePPI webserver is primarily intended for identifying hotspot residues in protein–protein interfaces, which provides valuable information for guiding biological experiments and in the development of protein–protein interaction modulators. The DrugScorePPI Webserver, accessible at http://cpclab.uni-duesseldorf.de/dsppi, is free and open to all users with no login requirement.
It is known that binding free energy of protein-protein interaction is mainly contributed by hot spot (high energy) interface residues. Here, we investigate the characteristics of hot spots by examining inter-atomic sidechain-sidechain interactions using a dataset of 296 alanine-mutated interface residues. Results show that hot spots participate in strong and energetically favorable sidechain-sidechain interactions. Subsequently, we describe a novel, yet simple ‘hot spot’ prediction model with an accuracy that is similar to many available approaches. The model is also shown to efficiently distinguish specific protein-protein interactions from non-specific interactions.
protein-protein interaction; interface analysis; hot spot residues; inter-atomic interaction
γ-Tubulin is a conserved essential protein required for assembly and function of the mitotic spindle in humans and yeast. For example, human γ-tubulin can replace the γ-tubulin gene in Schizosaccharomyces pombe. To understand the structural/functional domains of γ-tubulin, we performed a systematic alanine-scanning mutagenesis of human γ-tubulin (TUBG1) and studied phenotypes of each mutant allele in S. pombe. Our screen, both in the presence and absence of the endogenous S. pombe γ-tubulin, resulted in 11 lethal mutations and 12 cold-sensitive mutations. Based on structural mapping onto a homology model of human γ-tubulin generated by free energy minimization, all deleterious mutations are found in residues predicted to be located on the surface, some in positions to interact with α- and/or β-tubulins in the microtubule lattice. As expected, one class of tubg1 mutations has either an abnormal assembly or loss of the mitotic spindle. Surprisingly, a subset of mutants with abnormal spindles does not arrest in M phase but proceeds through anaphase followed by abnormal cytokinesis. These studies reveal that in addition to its previously appreciated role in spindle microtubule nucleation, γ-tubulin is involved in the coordination of postmetaphase events, anaphase, and cytokinesis.
Protein-protein interactions are critically dependent on just a few ‘hot spot’ residues at the interface. Hot spots make a dominant contribution to the free energy of binding and they can disrupt the interaction if mutated to alanine. Here, we present HSPred, a support vector machine (SVM)-based method to predict hot spot residues, given the structure of a complex. HSPred represents an improvement over a previously described approach (Lise et al., BMC Bioinformatics 2009, 10:365). It achieves higher accuracy by treating separately predictions involving either an arginine or a glutamic acid residue, the amino acid types on which the original model did not perform well. We have therefore developed two additional SVM classifiers, specifically optimised for these cases. HSPred reaches an overall precision and recall of 61% and 69% respectively, which roughly corresponds to a 10% improvement. An implementation of the described method is available as a web server at http://bioinf.cs.ucl.ac.uk/hspred. It is free to non-commercial users.
Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to π–related interactions, especially π · · · π interactions.
The melanocortin 4 receptor (MC4R) is a G-protein-coupled receptor (GPCR) and a key molecule in the regulation of energy homeostasis. At least 159 substitutions in the coding region of human MC4R (hMC4R) have been described experimentally; over 80 of those occur naturally, and many have been implicated in obesity. However, assessment of the presumably functionally essential residues remains incomplete. Here we have performed a complete in silico mutagenesis analysis to assess the functional essentiality of all possible nonnative point mutants in the entire hMC4R protein (332 residues). We applied SNAP, which is a method for quantifying functional consequences of single amino acid (AA) substitutions, to calculate the effects of all possible substitutions at each position in the hMC4R AA sequence. We compiled a mutability score that reflects the degree to which a particular residue is likely to be functionally important. We performed the same experiment for a paralogue human melanocortin receptor (hMC1R) and a mouse orthologue (mMC4R) in order to compare computational evaluations of highly related sequences. Three results are most salient: 1) our predictions largely agree with the available experimental annotations; 2) this analysis identified several AAs that are likely to be functionally critical, but have not yet been studied experimentally; and 3) the differential analysis of the receptors implicates a number of residues as specifically important to MC4Rs vs. other GPCRs, such as hMC1R.—Bromberg, Y., Overton, J., Vaisse, C., Leibel, R. L., Rost, B. In silico mutagenesis: a case study of the melanocortin 4 receptor.
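The idea of a per-residue mutability score can be sketched as below. The abstract does not define the score, so this example assumes one simple possibility: the fraction of the 19 non-native substitutions at a position that a predictor calls non-neutral. The predictor here is a hypothetical toy rule standing in for SNAP, whose interface is not shown in the text.

```python
# Minimal sketch under stated assumptions: the score definition and the
# predictor are hypothetical stand-ins, not the published SNAP-based procedure.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def mutability_scores(sequence, is_non_neutral):
    """Per position, return the fraction of non-native substitutions that the
    supplied predictor calls non-neutral (0.0 = tolerant, 1.0 = intolerant)."""
    scores = []
    for position, native in enumerate(sequence, start=1):
        substitutes = [aa for aa in AMINO_ACIDS if aa != native]
        hits = sum(1 for aa in substitutes
                   if is_non_neutral(native, position, aa))
        scores.append(hits / len(substitutes))
    return scores


def toy_predictor(native, position, substitute):
    """Hypothetical rule: losing a cysteine or gaining a proline changes function."""
    return native == "C" or substitute == "P"


if __name__ == "__main__":
    print(mutability_scores("CA", toy_predictor))
```

For a 332-residue protein such as hMC4R, the same loop would evaluate 332 × 19 = 6308 substitutions, which is why the exhaustive scan is practical only in silico.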
MC4R; MC1R; SNAP; active functional site; obesity; diabetes
Hepatitis C virus glycoprotein E2 contains 18 conserved cysteines predicted to form nine disulfide pairs. In this study, a comprehensive cysteine-alanine mutagenesis scan of all 18 cysteine residues was performed in E1E2-pseudotyped retroviruses (HCVpp) and recombinant E2 receptor-binding domain (E2 residues 384 to 661 [E2661]). All 18 cysteine residues were absolutely required for HCVpp entry competence. The phenotypes of individual cysteines and pairwise mutation of disulfides were largely the same for retrovirion-incorporated E2 and E2661, suggesting their disulfide arrangements are similar. However, the contributions of each cysteine residue and the nine disulfides to E2 structure and function varied. Individual Cys-to-Ala mutations revealed discordant effects, where removal of one Cys within a pair had minimal effect on H53 recognition and CD81 binding (C486 and C569) while mutation of its partner abolished these functions (C494 and C564). Removal of disulfides at C581-C585 and C452-C459 significantly reduced the amount of E1 coprecipitated with E2, while all other disulfides were absolutely required for E1E2 heterodimerization. Remarkably, E2661 tolerates the presence of four free cysteines, as simultaneous mutation of C452A, C486A, C569A, C581A, C585A, C597A, and C652A (M+C597A) retained wild-type CD81 binding. Thus, only one disulfide from each of the three predicted domains, C429-C552 (DI), C503-C508 (DII), and C607-C644 (DIII), is essential for the assembly of the E2661 CD81-binding site. Furthermore, the yield of total monomeric E2 increased to 70% in M+C597A. These studies reveal the contribution of each cysteine residue and the nine disulfide pairs to E2 structure and function.
To probe the structure and function of the Saccharomyces cerevisiae general transcription factor TFIIA, we have systematically mutagenized the genes encoding both subunits and analyzed the effects of the mutations both in vivo and in vitro. We found that the central nonconserved region of the large subunit is not essential for function and likely acts as a spacer between the conserved N- and C-terminal regions. Deletion mutagenesis of the large subunit defined a region which is required for TATA binding protein (TBP) interaction. Alanine scanning mutagenesis defined a cluster of four basic residues which are likely required for interaction with DNA in the TBP-DNA complex. Much of the conserved regions of both subunits is required for subunit association, suggesting that these conserved regions fold into compact domains which extensively interact. In vitro transcription performed with extracts from yeast strains with mutations in either the large or the small TFIIA subunit demonstrated that TFIIA stimulates both basal and activated polymerase II (Pol II) transcription. The TFIIA-depleted extracts have normal Pol I and Pol III transcription activity, showing that TFIIA is a Pol II-specific factor. In vivo depletion of TFIIA activity reduced transcription from four different Pol II promoters. Finally, alanine scanning mutagenesis of TFIIA's small subunit has identified at least one mutation which is defective in transcription but which is not defective in subunit association or binding to TBP or TBP-DNA complexes.
Small ankyrin-1 is a splice variant of the ANK1 gene that binds to obscurin A. Previous studies have identified electrostatic interactions that contribute to this interaction. In addition, molecular dynamics (MD) simulations predict four hydrophobic residues in a ‘hot spot’ on the surface of the ankyrin-like repeats of sAnk1, near the charged residues involved in binding. We used site-directed mutagenesis, blot overlays and surface plasmon resonance assays to study the contribution of the hydrophobic residues, V70, F71, I102 and I103, to two different 30-mers of obscurin that bind sAnk1, Obsc6316–6345 and Obsc6231–6260. Alanine mutations of each of the hydrophobic residues disrupted binding to the high affinity binding site, Obsc6316–6345. In contrast, V70A and I102A mutations had no effect on binding to the lower affinity site, Obsc6231–6260. Alanine mutagenesis of the five hydrophobic residues present in Obsc6316–6345 showed that V6328, I6332, and V6334 were critical to sAnk1 binding. Individual alanine mutants of the six hydrophobic residues of Obsc6231–6260 had no effect on binding to sAnk1, although a triple alanine mutant of residues V6233/I6234/I6235 decreased binding. We also examined a model of the Obsc6316–6345-sAnk1 complex in MD simulations and found I102 of sAnk1 to be within 2.2 Å of V6334 of Obsc6316–6345. In contrast to the I102A mutation, mutating I102 of sAnk1 to other hydrophobic amino acids such as phenylalanine or leucine did not disrupt binding to obscurin. Our results suggest that hydrophobic interactions contribute to the higher affinity of Obsc6316–6345 for sAnk1 and to the dominant role exhibited by this sequence in binding.
Skeletal muscle; hydrophobic interactions; molecular dynamics
The extracytoplasmic-function (ECF) family of sigma factors comprises a large group of proteins required for synthesis of a wide variety of extracytoplasmic products by bacteria. Residues important for core RNA polymerase (RNAP) binding, DNA melting, and promoter recognition have been identified in conserved regions 2 and 4.2 of primary sigma factors. Seventeen residues in region 2 and eight residues in region 4.2 of an ECF sigma factor, PvdS from Pseudomonas aeruginosa, were selected for alanine-scanning mutagenesis on the basis of sequence alignments with other sigma factors. Fourteen of the mutations in region 2 had a significant effect on protein function in an in vivo assay. Four proteins with alterations in regions 2.1 and 2.2 were purified as His-tagged fusions, and all showed a reduced affinity for core RNAP in vitro, consistent with a role in core binding. Region 2.3 and 2.4 mutant proteins retained the ability to bind core RNAP, but four mutants had reduced or no ability to cause core RNA polymerase to bind promoter DNA in a band-shift assay, identifying residues important for DNA binding. All mutations in region 4.2 reduced the activity of PvdS in vivo. Two of the region 4.2 mutant proteins were purified, and each showed a reduced ability to cause core RNA polymerase to bind to promoter DNA. The results show that some residues in PvdS have functions equivalent to those of corresponding residues in primary sigma factors; however, they also show that several residues not shared with primary sigma factors contribute to protein function.
The σ54 factor associates with core RNA polymerase (RNAP) to form a holoenzyme that is unable to initiate transcription unless acted on by an activator protein. σ54 is closely involved in many steps of activator-dependent transcription, such as core RNAP binding, promoter recognition, activator interaction and open complex formation. To systematically define σ54 residues that contribute to each of these functions and to generate a resource for site specific protein labeling, a complete mutant library of σ54 was constructed by alanine–cysteine scanning mutagenesis. Amino acid residues from 3 to 476 of Cys(-)σ54 were systematically mutated to alanine and cysteine in groups of two adjacent residues at a time. The influences of each substitution pair upon the functions of σ54 were analyzed in vivo and in vitro and the functions of many residues were revealed for the first time. Increased σ54 isomerization activity seldom corresponded with an increased transcription activity of the holoenzyme, suggesting the steps after σ54 isomerization, likely to be changes in core RNAP structure, are also strictly regulated or rate limiting to open complex formation. A linkage between core RNAP-binding activity and activator responsiveness indicates that the σ54-core RNAP interface changes upon activation.
Understanding the effects of mutation on pH-dependent protein binding affinity is important in protein design, especially in the area of protein therapeutics. We propose a novel method for fast in silico mutagenesis of protein–protein complexes to calculate the effect of mutation as a function of pH. The free energy differences between the wild type and mutants are evaluated from a molecular mechanics model, combined with calculations of the equilibria of proton binding. The predicted pH-dependent energy profiles demonstrate excellent agreement with experimentally measured pH-dependency of the effect of mutations on the dissociation constants for the complex of turkey ovomucoid third domain (OMTKY3) and proteinase B. The virtual scanning mutagenesis identifies all hotspots responsible for pH-dependent binding of immunoglobulin G (IgG) to neonatal Fc receptor (FcRn) and the results support the current understanding of the salvage mechanism of the antibody by FcRn based on pH-selective binding. The method can be used to select mutations that change the pH-dependent binding profiles of proteins and guide the time consuming and expensive protein engineering experiments. As an application of this method, we propose a computational strategy to search for mutations that can alter the pH-dependent binding behavior of IgG to FcRn with the aim of improving the half-life of therapeutic antibodies in the target organism.
binding affinity; mutation; scanning mutagenesis; pH; protein ionization; generalized Born; CHARMm; antibody; FcRn; neonatal Fc receptor; IgG; immunoglobulin G
Fks1, with orthologs in nearly all fungi as well as plants and many protists, plays a central role in fungal cell wall formation as the putative catalytic component of β-1,3-glucan synthase. It is also the target for an important new antifungal group, the echinocandins, as evidenced by the localization of resistance-conferring mutations to Fks1 hot spots 1, 2, and 3 (residues 635 to 649, 1354 to 1361, and 690 to 700, respectively). Since Fks1 is an integral membrane protein and echinocandins are cyclic peptides with lipid tails, Fks1 topology is key to understanding its function and interaction with echinocandins. We used hemagglutinin (HA)-Suc2-His4C fusions to C-terminally truncated Saccharomyces cerevisiae Fks1 to experimentally define its topology and site-directed mutagenesis to test function of selected residues. Of the 15 to 18 transmembrane helices predicted in silico for Fks1 from evolutionarily diverse fungi, 13 were experimentally confirmed. The N terminus (residues 1 to 445) is cytosolic and the C terminus (residues 1823 to 1876) external; both are essential to Fks1 function. The cytosolic central domain (residues 715 to 1294) includes newly recognized homology to glycosyltransferases, and residues potentially involved in substrate UDP-glucose binding and catalysis are essential. All three hot spots are external, with hot spot 1 adjacent to and hot spot 3 largely embedded within the outer leaflet of the membrane. This topology suggests a model in which echinocandins interact through their lipid tails with hot spot 3 and through their cyclic peptides with hot spots 1 and 2.
Biosynthesis of the commercial carotenoids canthaxanthin and astaxanthin requires β-carotene ketolase. The functional importance of the conserved amino acid residues of this enzyme from Paracoccus sp. strain N81106 (formerly classified as Agrobacterium aurantiacum) was analyzed by alanine-scanning mutagenesis. Mutations in the three highly conserved histidine motifs involved in iron coordination abolished its ability to catalyze the formation of ketocarotenoids. This supports the hypothesis that the CrtW ketolase belongs to the family of iron-dependent integral membrane proteins. Most of the mutations generated at other highly conserved residues resulted in partial activity. All partially active mutants showed a higher amount of adonixanthin accumulation than did the wild type when expressed in Escherichia coli cells harboring the zeaxanthin biosynthetic gene cluster. Some of the partially active mutants also produced a significant amount of echinenone when expressed in cells producing β-carotene. In fact, expression of a mutant carrying D117A resulted in the accumulation of echinenone as the predominant carotenoid. These observations indicate that partial inactivation of the CrtW ketolase can often lead to the production of monoketolated intermediates. In order to improve the conversion rate of astaxanthin catalyzed by the CrtW ketolase, a color screening system was developed. Three randomly generated mutants, carrying L175M, M99V, and M99I, were found to have improved activity. These mutants are potentially useful in pathway engineering for the production of astaxanthin.
SNAP25 is synthesized as a soluble protein but must associate with the plasma membrane to function in exocytosis; however, this membrane-targeting pathway is poorly defined. SNAP25 contains a palmitoylated cysteine-rich domain with four cysteines, and we show that coexpression of specific DHHC palmitoyl transferases is sufficient to promote SNAP25 membrane association in HEK293 cells. siRNA-mediated knockdown of its SNARE partner, syntaxin 1A, does not affect membrane interaction of SNAP25 in PC12 cells, whereas specific cysteine-to-alanine mutations perturb membrane binding, which is restored by leucine substitutions. These results suggest a role for cysteine hydrophobicity in initial membrane interactions of SNAP25, and indeed other hydrophobic residues in the cysteine-rich domain are also important for membrane binding. In addition to the cysteine-rich domain, proline-117 is also essential for SNAP25 membrane binding, and experiments in HEK293 cells revealed that mutation of this residue inhibits membrane binding induced by coexpression with DHHC17, but not DHHC3 or DHHC7. These results suggest a model whereby SNAP25 interacts autonomously with membranes via its hydrophobic cysteine-rich domain, requiring only sufficient expression of partner DHHC proteins for stable membrane binding. The role of proline-117 in SNAP25 palmitoylation is one of the first descriptions of elements within substrate proteins that modulate DHHC specificity.
Most biological processes involve multiple proteins interacting with each other. It has been recently discovered that certain residues in these protein-protein interactions, which are called hot spots, contribute more significantly to binding affinity than others. Hot spot residues have unique and diverse energetic properties that make them challenging yet important targets in the modulation of protein-protein complexes. Design of therapeutic agents that interact with hot spot residues has proven to be a valid methodology in disrupting unwanted protein-protein interactions. Using biological methods to determine which residues are hot spots can be costly and time-consuming. Recent advances in computational approaches to predict hot spots have incorporated a myriad of features and have shown increasing predictive success. Here we review the state of knowledge around protein-protein interactions and hot spots, and give an overview of multiple in silico techniques for predicting hot spot residues.
Protein-Protein Interactions; Hot Spot Residues; Structure-based Drug Discovery; In Silico Prediction; Alanine Scanning; TRAF6
We present a new database of computational hot spots in protein interfaces: HotSprint. Hot spots are residues comprising only a small fraction of interfaces yet accounting for the majority of the binding energy. HotSprint contains data for 35 776 protein interfaces among 49 512 protein interfaces extracted from the multi-chain structures in Protein Data Bank (PDB) as of February 2006. The conserved residues in interfaces with certain buried accessible solvent area (ASA) and complex ASA thresholds are flagged as computational hot spots. The predicted hot spots are observed to correlate with the experimental hot spots with an accuracy of 76%. Several machine-learning methods (SVM, Decision Trees and Decision Lists) are also applied to predict hot spots; the results reveal that our empirical approach performs better than the others. A web interface for the HotSprint database allows users to browse and query the hot spots in protein interfaces. HotSprint is available at http://prism.ccbb.ku.edu.tr/hotsprint. It provides information for interface residues that are functionally and structurally important, as well as the evolutionary history and solvent accessibility of residues in interfaces.
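The empirical rule in the HotSprint abstract above (conserved interface residues that pass buried-ASA and complex-ASA thresholds are flagged) can be sketched as a simple filter. This is only an illustration: the abstract does not state the threshold values, the conservation scale, or the direction of each cut-off, so the constants, the `InterfaceResidue` type, and the example residues below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceResidue:
    name: str
    conservation: float  # hypothetical 0..1 conservation score
    buried_asa: float    # ASA lost on complex formation, in square angstroms
    complex_asa: float   # ASA still exposed in the complex, in square angstroms


# Hypothetical thresholds; the abstract does not give the real values.
MIN_CONSERVATION = 0.7
MIN_BURIED_ASA = 40.0
MAX_COMPLEX_ASA = 20.0


def is_computational_hot_spot(res: InterfaceResidue) -> bool:
    """Flag a residue that is conserved, strongly buried, and little exposed."""
    return (
        res.conservation >= MIN_CONSERVATION
        and res.buried_asa >= MIN_BURIED_ASA
        and res.complex_asa <= MAX_COMPLEX_ASA
    )


def accuracy(predicted: list[bool], experimental: list[bool]) -> float:
    """Fraction of residues whose predicted label matches the experimental one."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(p == e for p, e in zip(predicted, experimental))
    return matches / len(predicted)


# Made-up residues, for illustration only.
EXAMPLE_RESIDUES = [
    InterfaceResidue("W104", conservation=0.9, buried_asa=85.0, complex_asa=5.0),
    InterfaceResidue("K168", conservation=0.4, buried_asa=60.0, complex_asa=10.0),
    InterfaceResidue("D171", conservation=0.8, buried_asa=15.0, complex_asa=55.0),
]

if __name__ == "__main__":
    for residue in EXAMPLE_RESIDUES:
        print(residue.name, is_computational_hot_spot(residue))
```

Comparing such flags with alanine-scanning labels through `accuracy` mirrors, in spirit, how the agreement with experimental hot spots is reported in the abstract.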
Protein–protein interactions (PPIs) are ubiquitous in biology, and thus offer an enormous potential for the discovery of novel therapeutics. Although protein interfaces are large and lack defining physicochemical traits, it is well established that only a small portion of interface residues, the so-called hot spot residues, contribute the most to the binding energy of the protein complex. Moreover, recent successes in the development of novel drugs aimed at disrupting PPIs rely on targeting such residues. Experimental methods for describing critical residues are lengthy and costly; therefore, there is a need for computational tools that can complement experimental efforts. Here, we describe a new computational approach to predict hot spot residues in protein interfaces. The method, called Presaging Critical Residues in Protein interfaces (PCRPi), depends on the integration of diverse metrics into a unique probabilistic measure by using Bayesian Networks. We have benchmarked our method using a large set of experimentally verified hot spot residues and on a blind prediction on the protein complex formed by HRAS protein and a single domain antibody. Under both scenarios, PCRPi delivered consistent and accurate predictions. Finally, PCRPi is able to handle cases where some of the input data is either missing or not reliable (e.g. evolutionary information).
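As a rough illustration of how several metrics can be combined into one probability while tolerating missing inputs, the sketch below uses a naive Bayes model, the simplest kind of Bayesian network. It is not the PCRPi network itself: the feature names, the prior, and every conditional probability are hypothetical values chosen for the example.

```python
from typing import Mapping, Optional

# Hypothetical prior probability that an interface residue is a hot spot.
PRIOR_HOT = 0.2

# Hypothetical P(feature is "high" | class) for three illustrative metrics.
LIKELIHOOD_HIGH = {
    "conservation": {"hot": 0.80, "not_hot": 0.40},
    "burial": {"hot": 0.70, "not_hot": 0.30},
    "energy": {"hot": 0.75, "not_hot": 0.25},
}


def posterior_hot(evidence: Mapping[str, Optional[bool]]) -> float:
    """Posterior P(hot spot | evidence) under a naive Bayes assumption.

    evidence maps a feature name to True (high), False (low), or None
    (missing). A missing feature is marginalised out, which in this model
    simply means it contributes no factor.
    """
    p_hot = PRIOR_HOT
    p_not = 1.0 - PRIOR_HOT
    for feature, is_high in evidence.items():
        if is_high is None:
            continue  # missing or unreliable input: skip this factor
        table = LIKELIHOOD_HIGH[feature]
        p_hot *= table["hot"] if is_high else 1.0 - table["hot"]
        p_not *= table["not_hot"] if is_high else 1.0 - table["not_hot"]
    return p_hot / (p_hot + p_not)


if __name__ == "__main__":
    complete = {"conservation": True, "burial": True, "energy": True}
    no_evolution = {"conservation": None, "burial": True, "energy": True}
    print(f"all metrics available: {posterior_hot(complete):.3f}")
    print(f"conservation missing:  {posterior_hot(no_evolution):.3f}")
```

Dropping the factor for a missing metric still yields a valid posterior from the remaining evidence, which is the property the abstract highlights for cases where evolutionary information is unavailable.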