BMC Biol. 2007; 5: 43.
Published online Oct 9, 2007. doi:  10.1186/1741-7007-5-43
PMCID: PMC2231411
Spatial chemical conservation of hot spot interactions in protein-protein complexes
Alexandra Shulman-Peleg (corresponding author),1 Maxim Shatsky,4 Ruth Nussinov,2,3 and Haim J Wolfson (corresponding author)1
1School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences Tel Aviv University, Tel Aviv 69978, Israel
2Physical Biosciences Division, Berkeley National Lab, California, USA
3Basic Research Program, SAIC-Frederick, Inc. Center for Cancer Research Nanobiology Program, NCI, Frederick, MD 21702, USA
4Department of Human Genetics and Molecular Medicine Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
Alexandra Shulman-Peleg: shulmana/at/tau.ac.il; Maxim Shatsky: maxshats/at/tau.ac.il; Ruth Nussinov: ruthn/at/ncifcrf.gov; Haim J Wolfson: wolfson/at/tau.ac.il
Received January 18, 2007; Accepted October 9, 2007.
Background
Conservation of the spatial binding organizations at the level of physico-chemical interactions is important for the formation and stability of protein-protein complexes as well as protein and drug design. Due to the lack of computational tools for recognition of spatial patterns of interactions shared by a set of protein-protein complexes, the conservation of such interactions has not been addressed previously.
Results
We performed extensive spatial comparisons of physico-chemical interactions common to different types of protein-protein complexes. We observed that 80% of these interactions correspond to known hot spots. Moreover, we show that spatially conserved interactions allow prediction of hot spots with a success rate higher than obtained by methods based on sequence or backbone similarity. Detection of spatially conserved interaction patterns was performed by our novel MAPPIS algorithm. MAPPIS performs multiple alignments of the physico-chemical interactions and the binding properties in three dimensional space. It is independent of the overall similarity in the protein sequences, folds or amino acid identities. We present examples of interactions shared between complexes of colicins with immunity proteins, serine proteases with inhibitors and T-cell receptors with superantigens. We unravel previously overlooked similarities, such as the interactions shared by the structurally different RNase-inhibitor families.
Conclusion
The key contribution of MAPPIS is in discovering the 3D patterns of physico-chemical interactions. The detected patterns describe the conserved binding organizations that involve energetically important hot spot residues and are crucial for the protein-protein associations.
Protein-protein interfaces (PPIs) are defined as regions of interaction between two non-covalently linked protein molecules. As binding is closely related to function, analysis of the properties of PPIs has long been a problem of major interest [1-7]. The pioneering work of Clackson and Wells has shown that only a small and complementary set of cooperative contact residues, termed "hot spots", maintains the binding affinity [8]. Hot spots are identified by alanine scanning experiments. They are defined as residues whose mutation to alanine leads to a significant drop in the binding free energy [9,10]. Several works have studied the nature and organization of hot spots [11-13] as well as their computational prediction [14-19]. Using the double mutant cycle, Schreiber and Fersht have shown the cooperativity of residues and interactions across the interface [20]. Furthermore, it was shown that PPIs are built in a modular fashion [21] and that there is cooperativity between the hot regions [22] and the conserved residues [23,24].
A key underlying concept in many studies postulates that functionally important properties are conserved throughout evolution [13,25] and can be recognized by the comparison of a set of protein sequences [26-29] or structures [30-32]. Structural classification of protein-protein interfaces by their Cα patterns [33,34] has led to an insight into interface organizations [35] and preferred residue conformations [36]. However, backbone atoms do not fully capture the physico-chemical nature of the interfaces, and chemical interactions are known to be created by atoms of side chains with different residue identities. Current methods that do compare physico-chemical properties align single binding sites (i.e. one side of the interface) and do not consider the interacting partner [37-40]. Recently, we have presented a method that aligns a pair of PPIs by simultaneously considering the two pairs of complementary binding sites [41,42]. However, a combination of high scoring pairwise patterns does not necessarily provide a high scoring pattern common to a set of PPIs [43]. Several studies considered the chemical interactions formed across the interface [44-46] and used them for classification [47,48] and complex prediction [49]. However, the spatial conservation of these interactions was not systematically addressed, mostly due to the lack of computational tools for recognition of spatial patterns of interactions shared by a set of PPIs.
Here, we present the first extensive study of the spatial conservation of physico-chemical interactions shared within families of PPIs formed by functionally similar proteins. This study was performed with our novel method, MAPPIS (Multiple Alignment of PPIS). The method is based on physico-chemical interactions formed across the interface between groups of atoms, which may derive from amino acids with different identities and backbone locations [50]. The uniqueness of MAPPIS lies in its ability to detect spatially conserved patterns of interactions even when there is no sequence and fold homology between the corresponding proteins. By applying MAPPIS to different families of PPIs, we observed that (i) most of the conserved physico-chemical interactions are contributed by the hot spot residues, and (ii) consequently, MAPPIS predicts hot spots with a high success rate, indicating the functional importance of the conserved chemical interactions. Using MAPPIS, we further provide specific biological examples that reveal previously overlooked similarities between structurally different though functionally related complexes.
We assess the significance of spatially conserved patterns of interactions. First, we describe the physico-chemical patterns we look for and the concept behind MAPPIS. Next, we present an extensive analysis of the families of PPIs that were previously studied by experimental alanine scanning, and show that spatially conserved interactions can predict hot spots. Finally, we provide the details of the specific patterns of interactions shared within these families.
Recognition of shared interactions
A PPI is defined by a pair of interacting binding sites. The area of each binding site is determined by the solvent accessible surface points [51] that are located less than 4Å from the surface of the binding partner. Following the definition of Schmitt et al [39], each amino acid in a protein is represented by points in 3D space termed pseudocenters. Each pseudocenter represents a group of atoms according to the interactions in which it may participate: hydrogen-bond donor, hydrogen-bond acceptor, mixed donor/acceptor, hydrophobic aliphatic and aromatic (π) contacts. Some of the atoms of a pseudocenter may be buried and some may be exposed. We considered all the pseudocenters with at least one surface exposed atom. These were assigned the following attributes: (i) charge; (ii) normal vectors that denote the surface direction and ring plane orientation (for aromatic rings); and (iii) surface patch size and curvature (estimated by the solid angle shape function [52]). Figure 1A presents examples of a representation of amino acids by pseudocenters. For example, the side chain of Lys is represented by a donor, located at the nitrogen atom, and a hydrophobic aliphatic pseudocenter, located at the center of mass of its four carbons [39].
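The Lys example above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the MAPPIS code: the function name, the tuple layout and the input dictionary are hypothetical, and the atom names (CB, CG, CD, CE, NZ) are the standard PDB names for the lysine side chain. Because all four atoms are carbons, the center of mass reduces to a plain average of their coordinates.

```python
def lys_side_chain_pseudocenters(atoms):
    """Return the two side-chain pseudocenters of a lysine residue.

    atoms: dict mapping a PDB atom name to an (x, y, z) tuple in Å
    (hypothetical input format chosen for this sketch).
    """
    # Hydrophobic aliphatic pseudocenter: center of mass of the four
    # side-chain carbons (equal masses, so a simple mean).
    carbons = [atoms[name] for name in ("CB", "CG", "CD", "CE")]
    centroid = tuple(sum(c[i] for c in carbons) / len(carbons) for i in range(3))
    # Hydrogen-bond donor pseudocenter: located at the side-chain nitrogen.
    return [("donor", atoms["NZ"]), ("aliphatic", centroid)]
```

Other residue types would need their own atom groupings according to the definition of Schmitt et al [39], which this sketch does not attempt to reproduce.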
Figure 1
Shared interactions. (A) The left figure shows a PPI represented by the solvent accessible surfaces (small dots [51]) and the pseudocenters (balls). Only surface exposed pseudocenters are considered. Hydrogen bond donors are blue, acceptors – (more ...)
An interaction is defined by a pair of close enough pseudocenters, one from each side of the interface, possessing complementary physico-chemical properties. Specifically, hydrogen bond donors are complementary to acceptors, while hydrophobic aliphatic and aromatic centers can interact with similar ones. Pseudocenters with the mixed donor/acceptor property, such as the nitrogen atoms of His, can interact with both donors and acceptors. The interaction distance thresholds are 3.9Å [53] for hydrogen bonds and 8Å for the rest (according to the maximal possible distance between pseudocenters that represent groups of atoms). As the exact computational definition of real interactions is not straightforward [53], we practically overcame this problem by considering all possible interactions at the early stages and selecting only those that are conserved in all the complexes.
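The definition above can be sketched in Python as follows. This is a minimal sketch, not the MAPPIS implementation: the `Pseudocenter` class, the string labels and the function names are assumptions made for illustration. Only the complementarity rules and the 3.9Å and 8Å thresholds come from the text; accepting a pair of two mixed donor/acceptor centers as a hydrogen bond is an additional assumption.

```python
from dataclasses import dataclass
from math import dist

HBOND_MAX = 3.9   # Å, hydrogen bond threshold from the text
OTHER_MAX = 8.0   # Å, threshold for aliphatic and aromatic contacts
POLAR = {"donor", "acceptor", "mixed"}

@dataclass(frozen=True)
class Pseudocenter:
    kind: str     # "donor", "acceptor", "mixed", "aliphatic" or "aromatic"
    xyz: tuple    # (x, y, z) in Å

def interaction_type(a, b):
    """Return the type of interaction two pseudocenters may form, or None."""
    pair = {a.kind, b.kind}
    # Donors pair with acceptors; mixed centers pair with any polar center
    # (mixed-mixed is accepted here as an assumption of this sketch).
    if pair == {"donor", "acceptor"} or ("mixed" in pair and pair <= POLAR):
        return "hbond"
    # Aliphatic and aromatic centers interact with centers of the same kind.
    if a.kind == b.kind and a.kind in ("aliphatic", "aromatic"):
        return a.kind
    return None

def find_interactions(site_a, site_b):
    """List all candidate interactions across the interface."""
    found = []
    for a in site_a:
        for b in site_b:
            kind = interaction_type(a, b)
            if kind is None:
                continue
            limit = HBOND_MAX if kind == "hbond" else OTHER_MAX
            d = dist(a.xyz, b.xyz)
            if d <= limit:
                found.append((a, b, kind, d))
    return found
```

As in the text, this deliberately over-generates candidate interactions; the later alignment stages keep only those conserved in all the complexes.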
We compared the spatial arrangements of the following three interaction types: hydrogen bonds, hydrophobic aliphatic and aromatic (π) contacts. Two interactions are considered similar if they are created by similar pseudocenters that are superimposed to nearby spatial locations (e.g. ≤ 3Å). The similarity of pseudocenters was measured by a scoring function that compares properties like spatial proximity (after the superimposition), charge, surface curvature as well as aromatic ring plane orientation. The similarity of interactions from two different PPIs is scored according to the similarity of the corresponding pseudocenters and the complementarity of their properties. Specifically, we measured the complementarity in terms of the pseudocenter proximity, charge complementarity, surface fit as well as aromatic ring orientations (favoring perpendicular and parallel π stacking). Given a set of PPIs, MAPPIS finds a set of transformations that superimpose them in 3D space in a way that maximizes the spatial and chemical similarity of their interactions and pseudocenters (see Methods).
To illustrate the concept behind MAPPIS we aligned six PPIs of serine proteases with inhibitors. These are formed between serine proteases of two structural folds (trypsin and subtilisin) with inhibitors that have different structural classifications [54] and less than 4% sequence identity. Figure 1B presents a pattern of nine common interactions recognized by MAPPIS (six hydrogen bonds, two hydrophobic aliphatic and one aromatic). The correct alignment of the catalytic residues of the serine proteases indicates the accuracy of the MAPPIS solution. Studying the PPIs of trypsins, Scheidig et al [55] have stressed the importance of the interactions formed with the hot spots Lys15 and Arg15 of the trypsin inhibitors (1cbw, 1taw, 1ca0). Our results are consistent with this observation. Moreover, MAPPIS found that the PPIs of subtilisins exhibit five spatially similar interactions formed with the residues Leu45 (1cse), Met73 (2sic) and Arg5 (1oyv) of the corresponding subtilisin inhibitors. Notably, these interactions are formed by amino acids with different identities and backbone locations. However, these amino acids have similar physico-chemical properties (pseudocenters) that form similar spatially conserved interactions. Hence, residue-based methods would not have detected these conserved (hot spot) interactions.
Hot spot prediction
Here, we perform an extensive analysis of the available structural data and show that recognition of spatially conserved interactions can predict hot spots. We have retrieved all complexes with significant numbers of alanine mutations deposited in the AseDB database [10] and analyzed by Kortemme et al. [14].
For each such complex, we retrieved all the complexes created by molecules with the same molecule name in the PDB [56] and the same family id in SCOP [54]. As similarity of the overall sequences and structures does not necessarily imply similarity in the binding patterns and vice versa, we did not remove sequence homologues and retained SCOP family members only if they shared more than three interactions with the constructed PPI family. Following this procedure we obtained a dataset of 12 PPI families, each with an average of six members (see Table 1).
Table 1
The dataset of PPI families with available alanine scanning data [10,14]. The complexes tested by experimental alanine scanning, are detailed in columns 1–3. Column 4 presents the number of PPI family members created by molecules with the same (more ...)
We have observed that in these families, on average 80% of the shared interactions with similar spatial physico-chemical organization are created by the hot spot residues (following Kortemme et al., hot spots are defined as residues with ΔΔG ≥ 1 kcal/mol [14]). Moreover, we show that these conserved interactions can be used to predict hot spots with a mean success rate of 0.75, calculated as the average of the true negative rate (specificity) and the true positive rate (sensitivity) of the hot spot predictions. The specificity is defined as TN/(TN + FP) and the sensitivity as TP/(TP + FN), where TP and FP are the numbers of true and false positives and TN and FN are the numbers of true and false negatives, respectively. In addition, we constructed receiver operating characteristic (ROC) curves, which plot the sensitivity as a function of the false positive rate (1 - specificity) while varying the prediction threshold. The area under this curve indicates the performance gain over a random predictor (with an area of 0.5).
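The evaluation measures defined above can be written out as a short Python sketch. The function names and the input format (a dictionary of measured ΔΔG values and a set of predicted residues) are assumptions made for illustration. The ΔΔG ≥ 1 kcal/mol hot spot threshold and the definitions of specificity, sensitivity and success rate follow the text.

```python
HOT_SPOT_DDG = 1.0  # kcal/mol, hot spot threshold following Kortemme et al. [14]

def confusion_counts(ddg_by_residue, predicted_residues):
    """Count TP, FP, TN, FN over the residues tested by alanine scanning.

    ddg_by_residue: dict mapping a residue label to its measured ΔΔG.
    predicted_residues: set of residue labels predicted to be hot spots.
    """
    tp = fp = tn = fn = 0
    for residue, ddg in ddg_by_residue.items():
        is_hot = ddg >= HOT_SPOT_DDG
        is_predicted = residue in predicted_residues
        if is_hot and is_predicted:
            tp += 1
        elif is_hot:
            fn += 1
        elif is_predicted:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def success_rate(tp, fp, tn, fn):
    """Return (sensitivity, specificity, success rate)."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return sensitivity, specificity, (sensitivity + specificity) / 2
```

Averaging the two rates, rather than reporting plain accuracy, keeps the measure from being dominated by whichever class (hot spots or non-hot spots) is more numerous in a given alanine scanning data set.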
Remarkably, the average ROC area of MAPPIS is 0.77 and it is thus considered to be a good hot spot predictor. Table 2 presents a comparison of MAPPIS with two state-of-the-art computational methods: Consurf [27], which calculates the evolutionary conservation within a protein family, and Robetta [14,15], which explicitly calculates the expected change in the binding free energy upon mutation to alanine. The performance of MAPPIS was significantly better than that of Consurf, which had a ROC area of 0.48. When compared to Robetta, both methods had almost the same specificity (0.86 for MAPPIS and 0.85 for Robetta). The sensitivities of MAPPIS and Robetta were also quite similar, with only a slight difference (0.66 for MAPPIS versus 0.64 for Robetta). These results show that MAPPIS captures the energetics of the protein-protein interactions and can predict hot spots with a high success rate. As computational alanine scanning methods, like Robetta [15], consider single structures, MAPPIS cannot replace them. Rather, it complements them by showing the important role of the hot spots in the atomic interactions and in explaining their cooperativity. Moreover, it reveals the conserved chemical binding organizations, which are formed by the atomic interactions and cannot be detected at the residue level.
Table 2
Prediction of hot spots with MAPPIS, Consurf [27] and Robetta [14,15]. Columns 1–4 are as in Table 1. The ROC curves were constructed by varying the MAPPIS threshold of the interaction score and the Consurf conservation grade. The sensitivity (more ...)
We compared the predictive power of MAPPIS with our previously developed multiple alignment methods. The first method, MultiProt [57], performs multiple structural alignment of the protein backbones represented by the Cα atoms. Here, it was applied to simultaneously align the overall structures of both proteins in a complex. The specificity of its hot spot predictions was low (0.29) and, due to the large number of false positives, it is less suitable for this purpose. The second method, MultiBind [58], is based on recognition of similar physico-chemical properties of the protein binding sites without any consideration of the binding partners. As most of the conserved interactions recognized by MAPPIS are created by regions with similar physico-chemical properties, the predictions made by MAPPIS are a subset of the predictions of MultiBind. However, as MultiBind ignores the binding partners and the interactions created across the interfaces, it has a high false positive rate: its specificity is 0.44 and its area under the ROC curve is 0.58 (see table in Additional file 2). In addition, as MAPPIS utilizes the information of interactions, it runs 10-fold faster than MultiBind, and its average running time on a typical family of 6–7 PPIs is 3–4 minutes (on a standard PC, 2.60 GHz CPU, 2 GB RAM).
PPIs of ribonucleases with inhibitors
Ribonucleases (RNases), which catalyze RNA degradation, are lethal to the cell when expressed without their specific RNase inhibitor (RI). The affinity of RI for RNases is one of the highest among known protein-protein complexes (e.g. 1 fM for RI-Angiogenin [59]). Below we analyze the different types of RNase-inhibitor complexes and present the interactions shared within each family as well as the interactions conserved between the PPIs formed by proteins with different overall sequences and folds.
Barnase-Barstar
Barnase is a bacterial protein with RNase activity and barstar is its specific inhibitor. We aligned six PPIs of barnase-barstar (PDB:chain1:chain2 – 1brsAD, 1b2sAD, 1b27AD, 1x1uAD, 1x1wAD, 1b2uAD; see figure 1A in Additional file 1). These PPIs were recognized to share 17 interactions, which are conserved among the average of 25 interface interactions. Thirteen of them are created by known hot spots in at least one PPI chain and six of them are created by pairs of interacting hot spots. These are created by Asp-39 of barstar interacting with Arg-83 and Arg-87 of barnase, as well as barstar Glu-76 and Asp-36, which interact with barnase Arg-59 and His-102 respectively. The importance of these interactions was experimentally validated by the double mutant studies of Schreiber and Fersht [20], who measured coupling energies ranging from 5 to 7 kcal/mol.
RNase A-like with leucine-rich repeat inhibitors
Another type of ribonuclease-inhibitor complex is formed by RNase A-like ribonucleases [54] with leucine-rich repeat inhibitors. We applied MAPPIS to compare the PPIs of four complexes (see figure 1B in Additional file 1): (1) RI with Angiogenin (1a4yAB); (2) RI with human eosinophil derived neurotoxin (2bexAC); (3) RI complexed with RNase I (1z7xZY) and (4) RI with RNase A (1dfjIE). MAPPIS recognized 7 interactions that are spatially and physico-chemically conserved in all the complexes (see table 2 in Additional file 2). The conserved interactions recognized by MAPPIS are formed by the known hot spots Tyr-434 and Asp-435 of RI, with ΔΔG of 3.3 and 3.5 kcal/mol respectively in the complex with Angiogenin. Additional interactions are the π contacts between the rings of RI Tyr-437 and Angiogenin His-114, which, in spite of a ΔΔG of 0.8 kcal/mol, are conserved in all the complexes. Some interactions are formed between groups of atoms and are independent of the amino acid identities. For example, the hydrogen bond between the side chain of Tyr-434 (donor/acceptor) and the backbone O atom of Pro-38 (acceptor) in the RI-Angiogenin complex (1a4y) is similar to a hydrogen bond formed in the RI-neurotoxin complex (2bex) by the side chain of Tyr-434 with the backbone of Arg-36. Interestingly, the side chains of these RNase residues, Pro-38 in 1a4y and Arg-36 in 2bex, form similar hydrophobic interactions with Val-432 of RI, which, although not experimentally tested, is predicted by Robetta [14] to be a hot spot.
Ribonucleases with inhibitors: different folds similar functions
Both of the above examples are RNase-inhibitor families that perform similar functions, but their sequences and structures are totally different. MAPPIS enables the recognition of previously overlooked spatial patterns of interactions shared by their PPIs. Specifically, we applied MAPPIS to compare the three most distinct complexes (less than 4% sequence identity): (i) Barnase-barstar (1brsAD); (ii) RI with Angiogenin (1a4yAB) and (iii) Barstar with RNase Sa (1ay7AB). Figure 2A and Table 3 present the results of the MAPPIS alignment. We have recognized four interactions that are formed by known hot spots in all types of complexes [20,59].
Figure 2
Alignment examples. (A) Alignment of 3 RNase-inhibitor PPIs. The Angiogenin is yellow (1a4yB) and the Rnase Sa (1ay7A) and barnase (1brsA) are dark and light orange. The Leucine-rich RI (1a4yA) is in magenta while the barstars (1ay7B,1brsD) are blue and (more ...)
Table 3
The interactions shared by PPIs of structurally different Ribonucleases with inhibitors. Each pair of rows details the interacting pseudocenters of two PPI chains. Each three columns present the details of a specific PPI: (i) chain identifier and residue (more ...)
Specifically, we recognized two similar hydrogen bonds formed by the hot spots Asp-39 and Arg-83 of barnase-barstar and by Trp-438 and Arg-5 of RI-Angiogenin. We also recognized that a hydrogen bond formed by the hot spots Asp-35 and Arg-59 in the barnase-barstar complex is similar to the hydrogen bond formed by the hot spots Asp-435 and Lys-40 in the RI-Angiogenin complex. Separately, for each type of complex the importance of these interactions has already been reported [20,59]. However, as they are created by amino acids with different identities and backbone locations, their similarity has never been detected before.
PPIs of colicin DNases with immunity proteins
The E colicin DNases are bacterial toxins that kill target microbial cells through random degradation of chromosomal DNA. Their catalytic activity is neutralized by the respective immunity proteins (Im) [60]. We applied MAPPIS to analyze and classify the 5 types of available complexes: (i) colicin E3 DNase with Im3; (ii) E5-Im5; (iii) colicin D with Im; (iv) E7-Im7; (v) E9-Im9. While the PPIs of the first 3 types were recognized to be distinct and to belong to different classes, the interactions of E7-Im7 and E9-Im9 were observed to be extremely similar. Specifically, we have aligned 6 PPIs of E9-Im9 (1bxiAB, 1emvAB, 1fr2AB) and E7-Im7 (1mz8AB, 1ujzAB, 1fr2AB) and observed 7 conserved interactions (see Additional file 2). Four shared interactions (two hydrogen bonds and two aromatic interactions) are created by the conserved YY motif (Tyr-54, Tyr-55, 1bxi numbering) [60]. The rest of the conserved interactions are the hydrogen bonds formed by Glu-30 and Asp-51 and a hydrophobic aliphatic interaction formed by Pro-56. The results of MAPPIS are consistent with alanine scanning, and residues Asp-51, Tyr-55 and Pro-56 are indeed hot spots with ΔΔG of 5.9, 4.6 and 1.24 kcal/mol respectively. In addition, the results of MAPPIS are consistent with previous biological studies [60], which emphasized the conservation of the interaction of Tyr-55 (E9) with Phe-86 (Im9). Interestingly, we observed that, due to a reduction in the number of false positives, interactions shared between E9-Im9 and E7-Im7 provided a better prediction of hot spots than interactions shared only by the PPIs of E9-Im9. Specifically, the success rate of the predictions based on the three PPIs of E9-Im9 is only 0.58 (ROC area 0.47, specificity 0.38 and sensitivity 0.78). Taking into consideration the additional three complexes of E7-Im7 increases the specificity of the predictions and achieves a success rate of 0.72 (ROC area 0.72, specificity 1.0 and sensitivity 0.44).
As can be seen in Figure 2B, MAPPIS maximizes the similarity in the interface area, which makes it possible to overcome the backbone flexibility and the rotation of the overall structures. These were described by Joachimiak et al. [61], who have designed a new interface of E7-Im7. Interestingly, when we added the structure of the redesigned PPI (2erh) to our alignment, the shared pattern of interactions, detailed in Figure 2B, remained almost unchanged. Most of the amino acids that create it were not modified and even those that were mutated preserved the interaction. For example, one of the interactions conserved in all the PPIs was created by the backbone O of Gln-528 in E7 (1mz8). Using MAPPIS, we observed that a similar backbone interaction was present in the redesigned PPI (2erh), in which this amino acid was mutated to Lys-528. This example shows that MAPPIS can be used to guide protein design studies. It can recognize the most crucial interactions, which should remain unchanged, and can point to amino acids that are not crucial for interaction, or that interact via their backbone atoms, and can therefore be replaced.
PPIs of superantigens with T-Cell receptors
Superantigens (SAGs) are a group of toxins that activate T-cells causing system-wide inflammation and other human diseases. Sundberg et al. [62] have analyzed complexes of different SAGs with T cell receptors (TCRs) and observed a diversity of binding modes and backbone conformations. Intrigued by this phenomenon, we applied MAPPIS to align 6 complexes of TCRs with SAGs: (1) SEC3 (PDB: 1jckAB, 2aq3AB); (2) SEB (PDB:1sbbAB); (3) SpeA (PDB:1l0yAB, 1l0xAB) and (4) SpeC (PDB:1ktkEA). Remarkably, within 1 second, which is the running time of MAPPIS, we obtained results that are consistent with the thorough manual analysis of the interactions in each type of complex done by Sundberg et al. [62]. Although the overall backbones of the compared SAGs cannot be rigidly aligned in 3D space (see figure 2 in [62]), the chemical binding organizations of their complexes are similar. Specifically, we recognized 4 spatially conserved interactions (1 aromatic and 3 hydrogen bonds, see Table 4 in Additional file 2). Notably, all of these shared interactions are created by experimentally verified hot spots. Moreover, two of these interactions are created by pairs of cooperative hot spots that interact across the interface: Gly-53 and Thr-55 of TCR, which interact with the hot spots Gln-210 and Asn-23 of SAG respectively (1jck numbering). Most of the shared interactions, which are spread along the TCR regions CDR2/FR3, were detailed by Sundberg et al. for each of the complexes with SpeA, SpeC and SEB (see figure 4 in [62]). Although the compared complexes have different binding conformations, MAPPIS aligns the loops in the binding regions and overcomes the backbone flexibility. Moreover, as these spatially conserved interactions are created by amino acids with different identities, the similarities described above cannot be recognized by residue based computational methods.
Here, we have shown that spatially conserved physico-chemical interactions play a crucial functional role. We have presented a computational method, MAPPIS, for recognition of such patterns of conserved interactions formed between groups of atoms independent of the identity of the amino acids as well as the overall protein sequences and folds. Considering multiple complexes of functionally similar PPIs, MAPPIS allows the identification of the smallest set of interactions that may be responsible for binding and function. We have shown that chemical groups that form spatially conserved interactions correlate with cooperative effects in double mutant cycles and are useful for predicting hot spots.
Interestingly, we observed that increasing the number of the compared PPIs, as well as comparing PPIs of proteins with different overall sequences and folds, improves the specificity of the hot spot prediction. The main limitation of our approach is the requirement for a sufficient number of high resolution structures of complexes composed of functionally similar proteins. The selection of such complexes is not straightforward, especially as there is no direct correspondence between functional similarity and the similarity of the overall sequences and structures [32].
With the fast progress of Structural Genomics and the availability of multiple structures of functionally related proteins, methods like MAPPIS are expected to become increasingly useful. MAPPIS complements both computational and experimental alanine scanning by explaining the functional role of hot spots in the formation of atomic interactions. Further, by recognition of conserved spatial patterns of physico-chemical interactions, it rationalizes hot spots' cooperativity and elucidates the complex binding organizations of the protein-protein interfaces. Therefore, it complements the experimental techniques, such as the double mutant cycle, which provide the experimental evidence for the cooperativity effects at the amino acid level but do not describe the atomic interactions that are responsible for it. Moreover, analysis of the conserved interactions with MAPPIS can explain the effect of amino acids' mutations and can contribute to studies of the binding affinity and specificity. Furthermore, targeting the conserved chemical organizations may be a useful strategy in protein and drug design.
Given a set of PPIs, MAPPIS solves an optimization problem of finding a set of transformations that superimpose the PPIs in 3D space in a way that maximizes the spatial and chemical similarity of their interactions and pseudocenters. As this optimization problem is computationally NP-hard [58], we provide an efficient approximation algorithm, the main stages of which are presented in Figure 3 and below.
Figure 3
Overview of the MAPPIS method.
The input
The input to MAPPIS consists of K PPIs $I_i = (A_i, B_i)$, $i = 1, \dots, K$. These are represented by their physico-chemical properties and interactions as presented in Figure 1A (see Results). For K interfaces we define the similarity with respect to the pivot PPI, which is selected as the first PPI. We assume that we are given the correspondence between the compared protein chains (i.e. $A_i$ corresponds to $A_1$ and $B_i$ to $B_1$). This correspondence can be obtained either from the biological data (e.g. molecule names) or by running the pairwise alignment between $(A_1, B_1)$ - $(A_i, B_i)$ and $(A_1, B_1)$ - $(B_i, A_i)$, for each i ≠ 1.
Generation of pairwise transformations
Given a set of PPIs, we create a set of pairwise transformations that can superimpose each PPI upon the pivot. These transformations are constructed based on the information of the physico-chemical interactions formed across each interface. Specifically, each pair of pivot interactions is stored in a four-dimensional hash table with a key encoding the interactions' lengths and the distances between pseudocenters as well as their physico-chemical properties. Each pair of interactions from each PPI except the pivot is used to access the hash table and retrieve similar interaction pairs of the pivot. Each pair of matched interactions defines a candidate transformation that can superimpose the considered PPI upon the pivot. In particular, we use the least square fitting method: given two interactions from each of two PPIs, $(a_i, b_i)$ and $(a'_i, b'_i)$, $i = 1, 2$, we compute a transformation $T$ that can best superimpose them in 3D space, i.e. a transformation that minimizes the RMSD between the pseudocenters: $\mathrm{RMSD}(T) = \sqrt{\frac{1}{4}\sum_{i=1}^{2}\left(\|T(a'_i) - a_i\|^2 + \|T(b'_i) - b_i\|^2\right)}$. As we construct only the transformations that can superimpose at least two physico-chemically similar interactions, we reduce the number of the constructed transformations and achieve a performance gain over other methods (e.g. MultiBind, see Table 1 in Additional file 2).
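The hash-table lookup described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the MAPPIS data structure: the interaction tuple layout, the function names and the 1Å bin width are hypothetical. The four binned distances in the key (the two interaction lengths and the two distances between same-side pseudocenters) follow the description in the text, with the interaction types standing in for the physico-chemical properties.

```python
from collections import defaultdict
from itertools import combinations, permutations
from math import dist

BIN = 1.0  # Å, assumed bin width for the distance part of the key

def pair_key(i1, i2):
    """Key for an ordered pair of interactions (p, q, kind).

    p and q are the (x, y, z) pseudocenters on the two sides of the interface.
    """
    (p1, q1, kind1), (p2, q2, kind2) = i1, i2
    distances = (dist(p1, q1), dist(p2, q2), dist(p1, p2), dist(q1, q2))
    return (kind1, kind2) + tuple(int(d // BIN) for d in distances)

def build_pivot_table(pivot_interactions):
    """Store every ordered pair of pivot interactions under its key."""
    table = defaultdict(list)
    for i1, i2 in permutations(pivot_interactions, 2):
        table[pair_key(i1, i2)].append((i1, i2))
    return table

def candidate_matches(table, other_interactions):
    """Yield (pivot pair, other pair) matches; each defines one candidate transformation."""
    for j1, j2 in combinations(other_interactions, 2):
        for pivot_pair in table.get(pair_key(j1, j2), []):
            yield pivot_pair, (j1, j2)
```

Each match supplies four corresponding pseudocenter pairs for the least squares fit, which is not implemented here. Hard bin boundaries can separate nearly identical distances, so a practical version would probably also probe neighbouring bins.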
Multiple combination of 3D transformations
At the next stage we construct the multiple alignments, which are based on the combination of all the candidate pairwise transformations constructed at the previous stage. The number of possible combinations is exponential in the number of PPIs. To practically overcome this limitation we apply an efficient branch-and-bound technique that effectively filters out a large number of low scoring solutions [58]. As illustrated in Figure 3, we iteratively traverse the created transformations. Each time we create a multiple alignment of a set of m PPIs and try to add a transformation $T_{m+1}$ of the PPI $I_{m+1}$. However, if an estimated score of the multiple alignment between these m + 1 PPIs is lower than the score of the best multiple alignment found so far between all the K input PPIs (K ≥ m), we can ignore this combination of transformations and there is no need to try to extend it. Instead, we continue and try to add another transformation, $T'_{m+1}$ of $I_{m+1}$, and so on. Although theoretically the number of such traversals may be exponential, the filtering is very efficient and leads to low running times.
Furthermore, we achieve an additional speed-up from the observation that we do not need to actually construct a multiple alignment for each set of m + 1 PPIs; it suffices to estimate an upper bound on its score. In particular, we calculate the highest score that can be achieved between the superimposed pseudocenters, without requiring the exact correspondence that resolves multiple matches.
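The pruning idea can be sketched in Python on a toy model. This is an illustration under stated assumptions, not the MAPPIS scoring: each candidate transformation is represented only by the set of pivot features it superimposes, the score of a combination is taken to be the number of pivot features matched by all chosen transformations, and the function name branch_and_bound is hypothetical. Under this toy score, the size of the current intersection is a valid upper bound, because adding another PPI can only shrink it.

```python
from typing import FrozenSet, List, Optional, Tuple

# Toy model (assumption): a candidate transformation is (name, pivot features it matches).
Candidate = Tuple[str, FrozenSet[int]]

def branch_and_bound(candidates: List[List[Candidate]]):
    """candidates[j] lists the candidate transformations of the j-th non-pivot PPI.

    Returns (best score, chosen transformation names, number of visited nodes).
    """
    if not candidates:
        return 0, [], 0
    best_score = 0
    best_choice: Optional[List[str]] = None
    nodes = 0

    def extend(level: int, common: Optional[FrozenSet[int]], chosen: List[str]) -> None:
        nonlocal best_score, best_choice, nodes
        nodes += 1
        if level == len(candidates):
            # Reached only when the bound exceeded the best score, so this improves it.
            best_score, best_choice = len(common), list(chosen)
            return
        for name, matched in candidates[level]:
            new_common = matched if common is None else common & matched
            # Upper bound: the common pattern cannot grow when more PPIs are added.
            if len(new_common) <= best_score:
                continue  # prune this branch without extending it
            chosen.append(name)
            extend(level + 1, new_common, chosen)
            chosen.pop()

    extend(0, None, [])
    return best_score, best_choice, nodes

candidates = [
    [("T1a", frozenset({1, 2, 3, 4})), ("T1b", frozenset({1, 2}))],
    [("T2a", frozenset({2, 3, 4, 5})), ("T2b", frozenset({7}))],
    [("T3a", frozenset({3, 4, 9})), ("T3b", frozenset({2, 3, 4}))],
]
score, choice, visited = branch_and_bound(candidates)
```

In this example an exhaustive traversal would visit 15 nodes of the search tree (including the root), whereas the bounded search visits 5, because the branches through T2b and T1b are discarded as soon as their bound cannot beat the best score found so far.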
Construction of the common pattern
For each potentially high-scoring multiple superposition we compute the exact correspondence between the superimposed pseudocenters and interactions and determine the common pattern. The calculation of such a correspondence involves solving a K-partite matching problem between the PPIs, which is NP-hard even for a pair of PPIs [50]. Here, we implement the following greedy algorithm. First, we sort the superimposed interactions and pseudocenters according to their physico-chemical score (see Additional file 3). At each step, we greedily select the highest-scoring set of multiply matched interactions (one from each PPI) and mark the selected pseudocenters as matched. The next selection is made from the still unmatched pseudocenters, where the number of interactions in which each pseudocenter can participate is bounded by the valency of the atoms. Once we have determined the pattern of interactions, we apply a similar greedy procedure to determine the set of matched non-interacting pseudocenters. All candidate patterns are scored by the physico-chemical scoring function, which is detailed in Additional file 3. In all of the described examples (see Section Results) we have referred only to the single solution that received the highest score.
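The greedy selection described above can be sketched as follows. This is a simplified illustration, not the MAPPIS code: each candidate is assumed to be a precomputed (score, pseudocenter ids) pair, the scores are made-up numbers standing in for the physico-chemical score of Additional file 3, and the function name greedy_pattern and the capacity argument (standing in for the valency bound) are hypothetical.

```python
def greedy_pattern(candidates, capacity=None):
    """Greedily select matched interaction sets in order of decreasing score.

    candidates: list of (score, ids), where ids are the pseudocenters (over all
        PPIs) that take part in one multiply matched set of interactions.
    capacity: optional {pseudocenter id: maximal number of interactions};
        pseudocenters that are not listed may take part in one interaction only.
    """
    capacity = dict(capacity or {})
    used = {}
    selected = []
    for score, ids in sorted(candidates, key=lambda c: c[0], reverse=True):
        # Accept the set only if every pseudocenter still has free capacity.
        if all(used.get(i, 0) < capacity.get(i, 1) for i in ids):
            selected.append((score, ids))
            for i in ids:
                used[i] = used.get(i, 0) + 1
    return selected

candidates = [
    (0.9, ("A1", "B1")),
    (0.8, ("A1", "B2")),
    (0.7, ("A2", "B2")),
    (0.4, ("A3", "B1")),
]
single = greedy_pattern(candidates)
bivalent = greedy_pattern(candidates, capacity={"A1": 2})
```

With the default capacity the second candidate is rejected because A1 is already matched, so the third one is selected instead. Allowing A1 to take part in two interactions changes the outcome, which shows how the valency bound affects the resulting pattern. As with any greedy scheme, the result is not guaranteed to be the optimal matching.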
Running Time Complexity
The time complexity depends mainly on the stage of the multiple combination of 3D transformations and is bounded by O(n^{3K'} nK log(n)), where n is the number of pseudocenters in the largest PPI and K' is the depth of the branch-and-bound stage (K' ≤ K) [50]. In practice, the running times of MAPPIS are low, as reported in Table 1 in Additional file 2.
Availability and requirements
The MAPPIS software is available for download at: http://bioinfo3d.cs.tau.ac.il/mappis/. The software package contains the executable and a set of Perl scripts for PPI extraction. The package is suitable for the Linux platform and its download is free for non-commercial users.
Competing interests
The authors declare that there are no competing interests.
Authors' contributions
AS-P and MS developed the MAPPIS method. All authors participated in the research design and manuscript preparation.
Supplementary Material
Additional file 1
Supplementary figures.
Additional file 2
Supplementary tables.
Additional file 3
The Physico-Chemical Scoring Function.
Acknowledgements
We thank D. Reichmann, D. Schneidman, O. Dror, I. Halperin-Landsberg, and K. Lasker for useful insights and the help with the preparation of this manuscript. The research of AS-P was supported by the Clore PhD Fellowship. The research of HJW has been supported in part by the Israel Science Foundation (grant no. 281/05), by the NIAID, NIH (grant No. 1UC1AI067231), by the Binational US-Israel Science Foundation (BSF) and by the Hermann Minkowski-Minerva Center for Geometry at TAU. This publication has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under contract NO1-CO-12400. This research was supported [in part] by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government.
References
1. Jones S, Thornton JM. Principles of protein-protein interactions. Proc Natl Acad Sci USA. 1996;93:13–20. doi: 10.1073/pnas.93.1.13.
2. Lo Conte L, Chothia C, Janin J. The atomic structure of protein-protein recognition sites. J Mol Biol. 1999;285:2177–2198. doi: 10.1006/jmbi.1998.2439.
3. Chakrabarti P, Janin J. Dissecting protein-protein recognition sites. Proteins. 2002;47:334–343. doi: 10.1002/prot.10085.
4. Bahadur RP, Chakrabarti P, Rodier F, Janin J. A dissection of specific and non-specific protein-protein interfaces. J Mol Biol. 2004;336:943–955. doi: 10.1016/j.jmb.2003.12.073.
5. Valdar WS, Thornton JM. Protein-protein interfaces: analysis of amino acid conservation in homodimers. Proteins. 2001;42:108–124. doi: 10.1002/1097-0134(20010101)42:1<108::AID-PROT110>3.0.CO;2-O.
6. Sheinerman FB, Norel R, Honig B. Electrostatic aspects of protein-protein interactions. Curr Opin Struct Biol. 2000;10:153–156. doi: 10.1016/S0959-440X(00)00065-8.
7. Nooren IMA, Thornton JM. Diversity of protein-protein interactions. EMBO J. 2003;22:3486–3492. doi: 10.1093/emboj/cdg359.
8. Clackson T, Wells JA. A hot spot of binding energy in a hormone-receptor interface. Science. 1995;267:383–6. doi: 10.1126/science.7529940.
9. Bogan A, Thorn K. Anatomy of hot spots in protein interfaces. J Mol Biol. 1998;280:1–9. doi: 10.1006/jmbi.1998.1843.
10. Thorn K, Bogan A. ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions. Bioinformatics. 2001;17:284–285. doi: 10.1093/bioinformatics/17.3.284.
11. DeLano W, Ultsch AM, Hand deVos M, Wells J. Convergent solutions to binding at a protein-protein interface. Science. 2000;287:1279–83. doi: 10.1126/science.287.5456.1279.
12. DeLano W. Unraveling hot spots in binding interfaces: progress and challenges. Curr Opin Struct Biol. 2002;12:14–20. doi: 10.1016/S0959-440X(02)00283-X.
13. Guharoy M, Chakrabarti P. Conservation and relative importance of residues across protein-protein interfaces. PNAS. 2005;102:15447–15452. doi: 10.1073/pnas.0505425102.
14. Kortemme T, Baker D. A simple physical model for binding energy hot spots in protein-protein complexes. Proc Natl Acad Sci USA. 2002;99:14116–21. doi: 10.1073/pnas.202485799.
15. Kortemme T, Kim DE, Baker D. Computational alanine scanning of protein-protein interfaces. Sci STKE. 2004;2004:12.
16. Guerois R, Nielsen J, Serrano L. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol. 2002;320:369–387. doi: 10.1016/S0022-2836(02)00442-4.
17. Massova I, Kollman PA. Computational Alanine Scanning To Probe Protein-Protein Interactions: A Novel Approach To Evaluate Binding Free Energies. J Am Chem Soc. 1999;121:8133–8143. doi: 10.1021/ja990935j.
18. Moreira IS, Fernandes PA, Ramos MJ. Unravelling Hot Spots: a comprehensive computational mutagenesis study. Theor Chem Accounts. 2007;1:99–113.
19. Moreira IS, Fernandes PA, Ramos MJ. Computational alanine scanning mutagenesis – An improved methodological approach. J Comput Chem. 2006;28:644–654. doi: 10.1002/jcc.20566.
20. Schreiber G, Fersht AR. Energetics of protein-protein interactions: analysis of the barnase-barstar interface by single mutations and double mutant cycles. J Mol Biol. 1995;248:478–86.
21. Reichmann D, Rahat O, Albeck S, Meged R, Dym O, Schreiber G. The modular architecture of protein-protein binding interfaces. Proc Natl Acad Sci USA. 2005;102:57–62. doi: 10.1073/pnas.0407280102.
22. Moza B, Buonpane RA, Zhu P, Herfst CA, Rahman AK, McCormick JK, Kranz DM, Sundberg EJ. Long-range cooperative binding effects in a T cell receptor variable domain. Proc Natl Acad Sci USA. 2006;103:9867–9872. doi: 10.1073/pnas.0600220103.
23. Halperin I, Wolfson H, Nussinov R. Protein-protein interactions; coupling of structurally conserved residues and of hot spots across interfaces. Implications for docking. Structure. 2004;12:1027–38. doi: 10.1016/j.str.2004.04.009.
24. Pazos F, Helmer-Citterich M, Ausiello G, Valencia A. Correlated mutations contain information about protein-protein interaction. J Mol Biol. 1997;29:511–523. doi: 10.1006/jmbi.1997.1198.
25. Di Nardo AA, Larson SM, Davidson AR. The Relationship Between Conservation, Thermodynamic Stability, and Function in the SH3 Domain Hydrophobic Core. J Mol Biol. 2003;333:641–655. doi: 10.1016/j.jmb.2003.08.035.
26. Res I, Lichtarge O. Character and evolution of protein-protein interfaces. Phys Biol. 2005;2:S36–S43. doi: 10.1088/1478-3975/2/2/S04.
27. Glaser F, Pupko T, Paz I, Bell R, Bechor-Shental D, Martz E, Ben-Tal N. ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics. 2003;19:163–164. doi: 10.1093/bioinformatics/19.1.163.
28. Falquet L, Pagni M, Bucher P, Hulo N, Sigrist C, Hofmann K, Bairoch A. The PROSITE database, its status in 2002. Nucleic Acids Res. 2002;30:235–238. doi: 10.1093/nar/30.1.235.
29. Schueler-Furman O, Baker D. Conserved residue clustering and protein structure prediction. Proteins. 2003;52:225–235. doi: 10.1002/prot.10365.
30. Ma B, Elkayam T, Wolfson H, Nussinov R. Protein-protein interactions: Structurally conserved residues distinguish between binding sites and exposed protein surfaces. Proc Natl Acad Sci USA. 2003;100:5772–5777. doi: 10.1073/pnas.1030237100.
31. Aytuna AS, Gursoy A, Keskin O. Prediction of protein-protein interactions by combining structure and sequence conservation in protein interfaces. Bioinformatics. 2005;21:2850–2855. doi: 10.1093/bioinformatics/bti443.
32. Wolfson HJ, Shatsky M, Schneidman-Duhovny D, Dror O, Shulman-Peleg A, Ma B, Nussinov R. From Structure to Function: Methods and Applications. Curr Prot and Pep Sci. 2005;6:171–83. doi: 10.2174/1389203053545435.
33. Keskin A, Tsai CH, Wolfson HJ, Nussinov R. A new, structurally non-redundant, diverse dataset of protein-protein interfaces and its implications. Prot Sci. 2004;13:1043–55. doi: 10.1110/ps.03484604.
34. Winter C, Henschel A, Kim WK, Schroeder M. SCOPPI: a structural classification of protein-protein interfaces. Nucleic Acids Res. 2006;34:D310–4. doi: 10.1093/nar/gkj099.
35. Keskin O, Ma B, Nussinov R. Hot regions in protein-protein interactions: the organization and contribution of structurally conserved hot spot residues. J Mol Biol. 2005;345:1281–94. doi: 10.1016/j.jmb.2004.10.077.
36. Li X, Keskin O, Ma B, Nussinov R, Liang J. Protein-protein interactions: hot spots and structurally conserved residues often locate in complemented pockets that pre-organized in the unbound states: implications for docking. J Mol Biol. 2004;344:781–795. doi: 10.1016/j.jmb.2004.09.051.
37. Wallace AC, Laskowski RA, Thornton JM. Derivation of 3D coordinate templates for searching structural databases: application to Ser-His-Asp catalytic triads in the serine proteinases and lipases. Protein Sci. 1996;5:1001–1013.
38. Ausiello G, Via A, Helmer-Citterich M. Query3d: a new method for high-throughput analysis of functional residues in protein structures. BMC Bioinformatics. 2005;6:S5. doi: 10.1186/1471-2105-6-S4-S5.
39. Schmitt S, Kuhn D, Klebe G. A New Method to Detect Related Function Among Proteins Independent of Sequence or Fold Homology. J Mol Biol. 2002;323:387–406. doi: 10.1016/S0022-2836(02)00811-2.
40. Shulman-Peleg A, Nussinov R, Wolfson HJ. Recognition of Functional Sites in Protein Structures. J Mol Biol. 2004;339:607–633. doi: 10.1016/j.jmb.2004.04.012.
41. Shulman-Peleg A, Mintz S, Nussinov R, Wolfson H. Protein-Protein Interfaces: Recognition of Similar Spatial and Chemical Organizations. In: Jonassen I, Kim J, editor. Workshop on Algorithms in Bioinformatics, Springer, Lec Notes in Comp Sci. Vol. 3240. 2004. pp. 194–205.
42. Mintz S, Shulman-Peleg A, Wolfson HJ, Nussinov R. Generation and analysis of a protein-protein interface dataset with similar chemical and spatial patterns of interactions. Proteins. 2005;61:6–20. doi: 10.1002/prot.20580.
43. Akutsu T, Halldorson MM. On the approximation of largest common subtrees and largest common point sets. Theoretical Computer Science. 2000;233:33–50. doi: 10.1016/S0304-3975(97)00278-8.
44. Gao Y, Wang R, Lai L. Structure-based method for analyzing protein-protein interfaces. J Mol Model. 2004;10:44–54. doi: 10.1007/s00894-003-0168-3.
45. Sobolev V, Sorokine A, Prilusky J, Abola E, Edelman M. Automated analysis of interatomic contacts in proteins. Bioinformatics. 1999;15:327–332. doi: 10.1093/bioinformatics/15.4.327.
46. Saha R, Bahadur R, Pal A, Mandal S, Chakrabarti P. ProFace: a server for the analysis of the physicochemical features of protein-protein interfaces. BMC Structural Biology. 2006;6:11. doi: 10.1186/1472-6807-6-11.
47. Mintseris J, Weng Z. Atomic contact vectors in protein-protein recognition. Proteins. 2003;53:629–639. doi: 10.1002/prot.10432.
48. Block P, Paern J, Hullermeier E, Sanschagrin P, Sotriffer CA, Klebe G. Physicochemical descriptors to discriminate protein-protein interactions in permanent and transient complexes selected by means of machine learning algorithms. Proteins. 2006;63:607–22. doi: 10.1002/prot.21104.
49. Aloy P, Russell RB. Interrogating protein interaction networks through structural biology. Proc Natl Acad Sci USA. 2002;99:5896–901. doi: 10.1073/pnas.092147999.
50. Shulman-Peleg A, Shatsky M, Nussinov R, Wolfson H. MAPPIS: Multiple 3D Alignment of Protein-Protein Interfaces. In: Berthold M, editor. Complife, Konstanz, Germany, Springer Lec Notes in Comp Sci. Vol. 3695. 2005. pp. 91–103.
51. Connolly M. Analytical molecular surface calculation. J Appl Cryst. 1983;16:548–558. doi: 10.1107/S0021889883010985.
52. Connolly ML. Measurement of protein surfaces shape by solid angles. J Mol Graph. 1986;4:3–6. doi: 10.1016/0263-7855(86)80086-8.
53. McDonald IK, Thornton JM. Satisfying hydrogen bonding potential in proteins. J Mol Biol. 1994;238:777–793. doi: 10.1006/jmbi.1994.1334.
54. Murzin A, Brenner S, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995;247:536–540. doi: 10.1006/jmbi.1995.0159.
55. Scheidig AJ, Hynes TR, Pelletier LA, Wells JA, Kossiakoff AA. Crystal structures of bovine chymotrypsin and trypsin complexed to the inhibitor domain of Alzheimer's amyloid beta-protein precursor (APPI) and basic pancreatic trypsin inhibitor (BPTI): engineering of inhibitors with altered specificities. Protein Sci. 1997;6:1806–24.
56. Berman H, Westbrook J, Feng Z, Gilliland G, Bhat T, Weissig H, Shindyalov I, Bourne P. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
57. Shatsky M, Nussinov R, Wolfson HJ. A method for simultaneous alignment of multiple protein structures. Proteins. 2004;56:143–156. doi: 10.1002/prot.10628.
58. Shatsky M, Shulman-Peleg A, Nussinov R, Wolfson H. The multiple common point set problem and its application to molecule binding pattern detection. J Comput Biol. 2006;13:407–42. doi: 10.1089/cmb.2006.13.407.
59. Papageorgiou AC, Shapiro R, Acharya KR. Molecular recognition of human angiogenin by placental ribonuclease inhibitor-an X-ray crystallographic study at 2.0 A resolution. EMBO J. 1997;16:5162–77. doi: 10.1093/emboj/16.17.5162.
60. Keeble AH, Kirkpatrick N, Shimizu S, Kleanthous C. Calorimetric dissection of colicin DNase-immunity protein complex specificity. Biochemistry. 2006;45:3243–54. doi: 10.1021/bi052373o.
61. Joachimiak LA, Kortemme T, Stoddard BL, Baker D. Computational design of a new hydrogen bond network and at least a 300-fold specificity switch at a protein-protein interface. J Mol Biol. 2006;361:195–208. doi: 10.1016/j.jmb.2006.05.022.
62. Sundberg E, Li H, Llera AS, McCormick JK, Tormo J, Schlievert PM, Karjalainen K, Mariuzza RA. Structures of two streptococcal superantigens bound to TCR beta chains reveal diversity in the architecture of T cell signaling complexes. Structure. 2002;10:687–99. doi: 10.1016/S0969-2126(02)00759-1.
63. Shatsky M, Nussinov R, Wolfson H. Optimization of Multiple Sequence Alignment Based on Multiple Structure Alignment. Proteins. 2006;62:209–17. doi: 10.1002/prot.20665.
64. Shatsky M, Shulman-Peleg A, Nussinov R, Wolfson H. Recognition of Binding Patterns Common to a Set of Protein Structures. In: Miyano S, editor. RECOMB Cambridge MA, LNCS. Vol. 3500. 2005. pp. 440–455.