Protein–protein complexes play key roles in all cellular signal transduction processes. We have developed a fast and accurate computational approach to predict changes in the binding free energy upon alanine mutations in protein–protein interfaces. The approach is based on a knowledge-based scoring function, DrugScorePPI, for which pair potentials were derived from 851 complex structures and adapted against 309 experimental alanine scanning results. Based on this approach, we developed the DrugScorePPI webserver. The input consists of a protein–protein complex structure; the output is a summary table and bar plot of binding free energy differences for wild-type residue-to-Ala mutations. The results of the analysis are mapped on the protein–protein complex structure and visualized using Jmol. A single interface can be analyzed within a few minutes. Our approach has been successfully validated by application to an external test set of 22 alanine mutations in the interface of Ras/RalGDS. The DrugScorePPI webserver is primarily intended for identifying hotspot residues in protein–protein interfaces, which provides valuable information for guiding biological experiments and in the development of protein–protein interaction modulators. The DrugScorePPI webserver, accessible at http://cpclab.uni-duesseldorf.de/dsppi, is free and open to all users with no login requirement.
Protein–protein interactions have important implications in most cellular signalling networks (1). Interfering with protein–protein interactions, on the one hand, bears the potential to understand the function of regulatory units in signaling networks and, on the other hand, offers a promising way to develop new therapeutics (2,3). The ability to inhibit protein interactions requires knowledge about affinity and specificity in protein interfaces. A powerful tool for analyzing crucial interactions in protein interfaces is provided by experimental alanine scanning mutagenesis (4). Alanine scanning measures the change in binding free energy (ΔΔG) of a protein–protein complex upon mutation of an amino acid residue to alanine, i.e. the deletion of a sidechain beyond the Cβ carbon atom. Scanning all amino acids of a protein–protein interface then yields a map of which interactions are critical for protein binding and which ones are not. In fact, protein–protein complex formation depends, in most cases, on only a few interface residues that account for the highest contribution to the binding free energy (5,6). These residues are called ‘hotspots’ (7). Despite significant advances in molecular biology, alanine scanning still represents a large experimental effort that cannot be applied easily to high-throughput screening of protein–protein interfaces. Hence, there is a strong need for computational approaches to detect hot spots in modeled or experimentally determined protein–protein complexes for which no experimental mutagenesis data is available.
Here, we introduce the DrugScorePPI webserver, a new webservice that offers a user-friendly way of performing alanine scanning in silico. For that purpose, we have developed a fast and accurate computational alanine scanning protocol that, for a given structure of a protein–protein complex, allows automatic scanning of the protein–protein interface within only a few minutes on a single standard CPU. Our method is grounded on knowledge-based pair potentials derived by following the DrugScore formalism (8), which has already been applied successfully to score protein–ligand (8) and RNA–ligand interactions (9). For DrugScorePPI, the statistical potentials were further fine-tuned against experimentally determined alanine scanning results. Application to an independent external test set of alanine mutations demonstrated the predictive power of the method. To date, several computational tools have been developed to perform alanine scanning in silico (10–12). The novelty of the DrugScorePPI webservice is rooted, first, in its high accuracy for predicting ΔΔG values, and, second, in its efficiency, which allows a single mutant protein–protein complex to be scored within seconds.
The webservice is easy to use: as input, a PDB file of the protein–protein complex of interest or a PDB code is required, as is information about the protein chain(s) that should be mutated. The results can either be obtained by email or interactively visualized in the web browser. The results contain a summary table plus a corresponding bar plot detailing computed ΔΔG results, the degree of buriedness of each mutated residue, and a note if a side chain is potentially involved in a salt-bridge. In addition, a PDB file is provided whose B-factor column contains ΔΔG values for visualization with common molecule viewers. The visualization is also possible in the web browser using the Jmol applet (http://jmol.sourceforge.net). To the best of our knowledge, there are no other web services for computational alanine scanning that provide a similar level of integrated structural analysis and visualization capabilities.
For deriving the distance-dependent pair potentials of the DrugScorePPI scoring function, the same formalism was applied as already described for the DrugScore scoring function for protein–ligand complexes (8). The knowledge-based pair potentials for scoring protein–protein interactions have been derived from 851 crystallographically determined protein–protein complexes using an in-house MySQL database that contains structural information of all PDB entries (E. Schmidt, S. Derksen and H. Gohlke, unpublished results). The dataset consists of 655 homodimers and 196 heterodimers (13). In all cases, the complexes had been resolved to at least 2.5 Å. PDB codes of all complexes used for deriving the potentials are listed in ref. (13). Potentials were derived for all DrugScore standard atom types that occur in the 20 natural amino acids (8).
DrugScorePPI was used for computational alanine-scanning on a dataset of 18 protein–protein complexes with a total of 309 alanine mutations (14) (Supplementary Table S1). These mutations were obtained out of ~3000 mutations reported in ASEdb (14) by omitting those mutations (i) for which no PDB structure was available for the protein–protein complex, (ii) which are more than 5 Å away from a respective binding partner because this value is the upper distance limit of the DrugScorePPI potentials and (iii) that stabilize loops or form interactions to other structural elements within the same protein but do not interact with the binding partner. ΔΔG values associated with the latter mutations will very likely report on the stabilization or destabilization of the structure of the one protein rather than on changes in the interactions with the corresponding binding partner. See Supplementary Figure S4 for an example of such a mutation. The average experimental uncertainty for 78 (PDB codes of the corresponding complexes: 1a22, 1a4y, 1bxi, 1dfj, 3hfm) of the 309 mutations amounts to 0.31 kcal mol−1. Computed and experimental ΔΔG values showed a linear correlation with a coefficient r = 0.58 (Table 1, Supplementary Figure S1) and a standard deviation of 1.06 kcal mol−1 when the original pair potentials were applied. The linear correlation is statistically significant according to a P-value < 0.05.
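The agreement between computed and experimental ΔΔG values reported above can be illustrated with a minimal sketch of the two statistics involved. The function names are hypothetical, and the use of a root mean square deviation as the spread measure is an assumption, since the exact definition of the reported standard deviation is not given here.

```python
from math import sqrt


def pearson_r(computed, experimental):
    """Pearson linear correlation coefficient between two equally long lists."""
    n = len(computed)
    mean_c = sum(computed) / n
    mean_e = sum(experimental) / n
    cov = sum((c - mean_c) * (e - mean_e) for c, e in zip(computed, experimental))
    var_c = sum((c - mean_c) ** 2 for c in computed)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    return cov / sqrt(var_c * var_e)


def rms_deviation(computed, experimental):
    """Root mean square deviation between computed and experimental values
    (assumed spread measure; the source does not define its statistic)."""
    n = len(computed)
    return sqrt(sum((c - e) ** 2 for c, e in zip(computed, experimental)) / n)
```

Both functions expect ΔΔG values in the same unit (kcal mol−1 in this work) and in the same mutation order.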
To improve the predictive power of DrugScorePPI, we decided to adapt the weighting of the distance-dependent pair potentials W(d) in the summation that yields the score for each wild-type amino acid-to-Ala mutation of residue R with atoms r with respect to residues B of the binding partner, which consist of atoms b [Equations (1) and (2)]:
For this, we identified 24 residue-specific atom types Tv, e.g. C.3 in Val, C.3 in Leu, C.3 in Phe and so on. Pair potential contributions with low standard deviations across the training set were identified initially, and the respective atom types Tc were excluded from fitting. These pair potentials were scaled by a factor s. Finally, the degree of buriedness DOB(R) of each residue was used as an additional descriptor scaled by k. The coefficients cT(r), s and k (Supplementary Table S3) were then determined by correlating experimental and computed ΔΔG values employing partial least squares regression as implemented in MATLAB (http://www.mathworks.com). The adapted potentials improve the correlation to rtrain = 0.73 (Supplementary Figure S2) with a root mean square deviation of 0.84 kcal mol−1 (Table 1). A leave-one-mutation-out cross-validation analysis yielded rLOO = 0.64 and STD = 0.94 kcal mol−1 (Supplementary Figure S3). A more stringent leave-one-complex-out cross-validation yielded rLCO = 0.63 and STD = 0.96 kcal mol−1. Both of these validations demonstrate the robustness of the model. Finally, a leave-homologous-complexes-out cross-validation yielded rLHO = 0.80 and STD = 0.81 kcal mol−1. To perform the leave-homologous-complexes-out cross-validation, we omitted all ribonuclease-like complexes (PDB IDs: 1A4Y and 1DFJ) from the dataset and predicted ΔΔG values for them using adapted potentials derived from the rest of the dataset. The identification of homologous complexes was performed using the ProCKSI-Server (http://www.procksi.net) in this case. For comparison, when the original DrugScorePPI potentials were used to predict ΔΔG values for the ribonuclease-like complexes, r = 0.49 and STD = 1.30 kcal mol−1 resulted. In our opinion, this indicates an improved predictive power of the adapted potentials compared with the original ones.
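How the fitted coefficients enter the score can be sketched as follows. This is only an illustration of the weighting scheme described in the text, not the published Equations (1) and (2): it assumes that the pair potentials W(d) have already been summed per atom r over all partner atoms b, and the function name, argument layout and sign convention are hypothetical.

```python
def adapted_ddg(atom_sums, coeff, fixed_types, s, k, dob):
    """Weighted score for one residue-to-Ala mutation (illustrative only).

    atom_sums   -- list of (residue_specific_atom_type, summed W(d) over partner atoms)
    coeff       -- fitted coefficient c_T(r) for each fitted atom type (Tv)
    fixed_types -- atom types excluded from fitting (Tc), scaled jointly by s
    s, k        -- scaling factors for the excluded types and for DOB(R)
    dob         -- degree of buriedness DOB(R) of the mutated residue
    """
    fitted = 0.0
    fixed = 0.0
    for atom_type, w_sum in atom_sums:
        if atom_type in fixed_types:
            fixed += w_sum
        else:
            fitted += coeff[atom_type] * w_sum
    return fitted + s * fixed + k * dob
```

In this form, setting all coefficients and s to 1 and k to 0 reduces the score to the unweighted sum of pair potentials, which corresponds to using the original potentials.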
Finally, we performed an all-against-all similarity search with the FASTA algorithm for all chains of all protein complexes and further analyzed the results by creating a standardized distance matrix. Out of 2048 chain comparisons, only 32 showed significant sequence similarities, thus pointing out the overall diversity of our dataset, which again speaks for the robustness of the model.
The flow chart of the DrugScorePPI webservice is depicted in Figure 1. The DrugScorePPI webserver requires a PDB file of the protein–protein complex as input, which is either downloaded from the RCSB Protein Data Bank (15) or provided by the user. First, all non-peptide molecules and hydrogen atoms are deleted, and all interface residues in a protein–protein interface are identified based on the user-defined chain information. A residue is defined as being in the interface if it has at least one atom within a 5 Å radius of an atom belonging to a binding partner in the protein–protein complex. Next, the degree of buriedness is calculated for each of the interface residues to describe a sidechain’s surroundings by considering the number of atoms of nearby residues within a radius of 4 Å. The higher this number, the more buried the sidechain is. Then, pairs of residues that are potentially involved in a salt-bridge, i.e. Asp or Glu within 4 Å distance of Arg or Lys, are detected. Both the degree of buriedness and the salt-bridge detection are meant to provide a first hint to users for potential hotspots. Finally, each of the interface residues is individually replaced with Ala, and the effect of this mutation on the binding free energy of the complex (ΔΔG) is computed.
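The three geometric pre-processing steps described above can be sketched as follows. This is a minimal illustration, not the webserver's implementation: the atom record layout and function names are hypothetical, the input is assumed to be already stripped of hydrogens and non-peptide molecules, all chains outside the chains of interest are treated as the binding partner, and salt-bridge distances are measured between charged sidechain atoms, which is an assumption since the text does not specify the atoms used.

```python
from math import dist

# Hypothetical atom record: (chain, resname, resseq, atom_name, (x, y, z)).
ACID_ATOMS = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASE_ATOMS = {("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2"), ("LYS", "NZ")}
BACKBONE = {"N", "CA", "C", "O"}


def residue_id(atom):
    return atom[0], atom[1], atom[2]


def interface_residues(atoms, chains_of_interest, cutoff=5.0):
    """Residues of the chosen chains with any atom within `cutoff` of a partner atom."""
    mine = [a for a in atoms if a[0] in chains_of_interest]
    partner = [a for a in atoms if a[0] not in chains_of_interest]
    found = {residue_id(a) for a in mine
             if any(dist(a[4], b[4]) <= cutoff for b in partner)}
    return sorted(found)


def degree_of_buriedness(atoms, residue, radius=4.0):
    """Number of atoms of other residues within `radius` of the residue's sidechain."""
    side = [a for a in atoms if residue_id(a) == residue and a[3] not in BACKBONE]
    others = [a for a in atoms if residue_id(a) != residue]
    return sum(1 for o in others if any(dist(o[4], s[4]) <= radius for s in side))


def salt_bridges(atoms, cutoff=4.0):
    """Asp/Glu and Arg/Lys residue pairs with charged atoms within `cutoff`."""
    acids = [a for a in atoms if (a[1], a[3]) in ACID_ATOMS]
    bases = [a for a in atoms if (a[1], a[3]) in BASE_ATOMS]
    pairs = {(residue_id(a), residue_id(b)) for a in acids for b in bases
             if dist(a[4], b[4]) <= cutoff}
    return sorted(pairs)
```

The brute-force distance loops are adequate for a single interface; a spatial grid or k-d tree would be the usual choice for larger structures.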
The DrugScorePPI webservice submission page is shown in Figure 2. The DrugScorePPI webservice requires a structure of the protein–protein complex provided as PDB file or PDB code, chain identifier(s), and retyping of a displayed security code to prevent misuse of the webservice. Optionally, an email address may be provided, in which case the results will be sent to that address. Otherwise, a link to a result page is provided after job submission in order to view the results in the web browser. These results will be stored on the server for ten days.
Providing valid chain identifiers is crucial, for they determine (i) the chain(s) of interest on which interface residues will be mutated to Ala and (ii) which other chains of the protein–protein complex will be considered as ‘corresponding chains’. ‘Corresponding chains’ are those chains that interact with the chain(s) of interest. It is strongly recommended that all chain identifiers of one binding partner be provided, as otherwise intramolecular interactions within this binding partner will also be considered for the ΔΔG computation. If a PDB file with a valid header section is provided, a warning will be issued if this recommendation is not followed.
Some restrictions apply to the PDB input file: (i) Only standard protein residues are considered for ΔΔG computations. (ii) Hydrogen atoms are neither required for nor considered during the computation, as the DrugScorePPI scoring function is only based on non-hydrogen atoms. Accordingly, for amino acid sidechains a standard protonation is assumed, i.e. Asp and Glu are treated as deprotonated, Arg, Lys and His as protonated. (iii) Alternative side chain conformations in the interface are not allowed. If present in the PDB file, an error message will be issued.
A typical run of the DrugScorePPI webservice takes a few minutes. Upon completion of a job, either an email is sent with the results or the results are presented in the web browser. Both the email and the results page contain a PDF and a PDB file.
The first part of the PDF file contains a summary table with four columns (Figure 3A). These columns contain, from left to right: (i) the three-letter code of the mutated residue, the residue number and the chain identifier; (ii) the computed relative binding free energy difference ΔΔG for the amino acid-to-Ala mutation, with positive values indicating a potential hot spot residue; (iii) the degree of buriedness for each side chain in the interface; (iv) a note as to whether a residue is potentially involved in a salt-bridge. The second part of the PDF file contains a corresponding bar plot of the ΔΔG results for mutated amino acids (Figure 3B). The PDB file contains the input protein–protein complex structure with computed ΔΔG values in the B-factor column. This file can be used to color residues in the interface according to their sidechains’ contributions to the binding free energy, e.g. with PyMol (http://pymol.sourceforge.net) or VMD (http://www.ks.uiuc.edu/Research/vmd/). Finally, warnings are provided about input structure characteristics that might have influenced the results (Figure 4A). As such, a list of missing residues and/or atoms as declared in the PDB header is presented because missing interface atoms or residues will certainly lead to false ΔΔG computations. Furthermore, if corresponding information is available in the PDB header, a warning is issued if not all chain identifiers of one binding partner have been provided, as in this case intramolecular interactions within this binding partner were also considered for the ΔΔG computation.
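Writing ΔΔG values into the B-factor column, as done for the output PDB file, can be sketched as follows. The sketch relies on the fixed-column PDB format, in which the temperature factor occupies columns 61 to 66 of ATOM and HETATM records; the function name and the choice of 0.00 for residues without a computed value are assumptions, not documented behavior of the webserver.

```python
def map_ddg_to_bfactor(pdb_lines, ddg, default=0.0):
    """Return PDB lines whose B-factor field holds the residue's ddG value.

    ddg maps (chain_id, residue_number) to a value in kcal/mol.
    Residues without an entry receive `default` (assumed convention).
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 66:
            key = (line[21], int(line[22:26]))  # chain ID, residue sequence number
            value = ddg.get(key, default)
            # Columns 61-66 (0-based slice 60:66) hold the temperature factor.
            line = line[:60] + f"{value:6.2f}" + line[66:]
        out.append(line)
    return out
```

A file rewritten this way can be colored by B-factor in a molecular viewer so that residues with large computed ΔΔG values stand out.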
In the web browser, alanine scanning results by DrugScorePPI are visualized directly by mapping ΔΔG values onto the protein–protein complex structure using the Jmol applet (http://jmol.sourceforge.net) (Figure 4C). For this, residues in the interface are represented by a color code according to their sidechains’ contribution to the binding free energy, with reddish colors indicating hot spot residues, as detailed by the provided color scale. The chain(s) of the protein for which residue contributions were calculated is (are) colored in white; the corresponding chain(s) of the binding partner(s) is (are) colored in magenta.
Based on the implementation of our approach, several assumptions are made that can affect the applicability of the method, which should be considered when interpreting the results. (i) As ΔΔG values are calculated from the protein–protein complex structure only, both binding partners are assumed to have the same unbound and bound conformations, respectively. Consequently, contributions to ΔΔG due to changes in intramolecular interactions upon complex formation are not considered, which poses a limitation in those cases where conformational changes upon binding and/or unfolding-to-folding transitions of the binding partners are expected. (ii) Non-peptide ligands, cofactors, metal ions and water molecules are not taken into account. Energetic contributions to ΔΔG for sidechains contacting these molecular species are thus missing. (iii) Due to the nature of knowledge-based scoring functions, all terms in DrugScorePPI are pairwise additive. For that reason, cooperative effects between mutations, as have been observed in double-mutant cycles (16), are not taken into account. This is also a reason why indirect effects exerted by residues not making direct interactions in the interface are generally not captured, as described above. Likewise, only interactions between atoms at a distance < 5 Å are scored, thus neglecting long-range contributions due to, e.g. electrostatic interactions. Finally, the knowledge-based potentials represent ‘effective pair energies’ (8,17) and, thus, are expected to implicitly cover van der Waals, electrostatic and (de-)solvation contributions. However, they neglect changes in the dynamics of the binding partners upon complex formation, which can lead to significant entropic contributions to ΔΔG (17). (iv) The symmetry of protein–protein complexes is not taken into account. 
That is, in symmetric interfaces, only one residue at a time is considered during alanine scanning, whereas symmetry-related corresponding residues in the other binding partner(s) are modeled as wild-type. However, due to the additive nature of the pair potentials, single contributions of symmetry-related residues may be added.
The DrugScorePPI webserver has been implemented in Python, as have the subroutines to calculate the degree of buriedness and to detect possible salt-bridges. The DrugScorePPI scoring routine has been implemented in C++. Given the low computational demand of our approach, up to 10 submitted jobs can be run in parallel at present.
To evaluate the predictive power of the DrugScorePPI webservice using the adapted DrugScorePPI potentials, ΔΔG values were computed for interface residues in the protein–protein complex Ras/RalGDS (PDB code: 1lfd) and compared to experimental values for 22 alanine mutants (Supplementary Table S2) taken from ref. (18). We note that the Ras/RalGDS complex was not part of the training set used to adapt the pair potentials and, thus, provides an independent external test case. A predictive correlation coefficient rpred = 0.66 was found (Table 2), which is close to rLOO = 0.64 from the leave-one-mutation-out cross-validation, again demonstrating the robustness of the model. The predictive power of the adapted DrugScorePPI scoring function was then compared to four other computational methods that are able to predict changes in the binding free energy upon alanine mutations in protein interfaces: FoldX (12), MM/GBSA (17), the CC/PBSA method (19) and Robetta (20) (Figure 5). As demonstrated by the statistical parameters given in Table 2, the adapted DrugScorePPI potentials significantly outperform the CC/PBSA, FoldX and Robetta methods with respect to predictive power and perform as well as the MM/GBSA method, which had been applied to a subset of 16 mutations. In addition, DrugScorePPI is the most efficient method, as it only requires about three seconds per residue for this system on a standard CPU.
We note that the external validation dataset consisting of 22 mutations is rather small. However, the lack of a reasonably large and independent test set to validate and compare different methods is a common problem in this field. At least for the adapted DrugScorePPI potentials, due to the overall diversity of our training dataset, each leave-one-complex-out cross-validation step comes close to testing against an independent test set. Thus, we consider this cross-validation (Table 1) to be strongly indicative of the predictive power of our method.
The development of the DrugScorePPI webserver was motivated by a growing demand for comprehensive tools assisting research in the field of protein–protein recognition. The DrugScorePPI webserver allows fast and accurate in silico alanine scanning based on adapted knowledge-based distance-dependent pair potentials. The approach has been successfully validated on an independent external test set, and the results on this dataset compare favorably with other established methods. The DrugScorePPI webserver is primarily intended for identifying hotspot residues in protein–protein interfaces. Knowledge of potential hotspot residues is valuable for guiding biological experiments and in the development of protein–protein interaction modulators (2). A user-friendly interface, minimal demands on input information, and a detailed output as well as an embedded visualization capability make this webserver potentially useful for users without prior knowledge of structural bioinformatics analyses. Overall, we expect the DrugScorePPI webserver to be a valuable tool for predicting hotspots in protein–protein interfaces.
Supplementary Data are available at NAR Online.
Heinrich-Heine-University, Düsseldorf. Funding for open access charges: Institutional funds of Heinrich-Heine-University.
Conflict of interest statement. None declared.