The role of a protein in an interaction pathway is arguably its most important function (Eisenberg et al.
). Thus, protein–protein and protein–substrate interactions are essential for survival. Typically, very few residues in a protein interaction interface are essential in the sense that mutating them significantly impacts the reaction (Bogan and Thorn, 1998
; Weiss et al.
); these crucial residues are often referred to as protein–protein interaction hot spots. One coarse-grained experimental probe for elucidating the function of a protein is to mutate residues that are hypothesized to be involved in function. Alanine, glycine, proline and cysteine scanning mutagenesis (individual substitution of residues by one of these amino acids) are used to identify functionally important sites (Clackson and Wells, 1995
; Gardsvoll et al.
; Konishi et al.
; Kouadio et al.
; Qin et al.
). For a variety of biophysical and technical reasons, alanine scans dominate. Multiple mutations are rarely tested for the same residue (Xiang et al.
; Yang et al.
). The impact of mutations on function is captured by a variety of probes; one of the more accurate is the measurement of the change in binding energy between the wild-type (native sequence) and the mutated protein. Although large energy changes may result from destabilization of the affected proteins and from deformation of the binding sites, such dramatic alterations often indicate that a hot spot was mutated. To illustrate the relevance of hot spots to research: in 2007 alone, over 400 PubMed records mentioned hot spots. One reasonable definition of a hot spot is a residue whose mutation alters the binding energy by ≥1 kcal/mol (Kortemme and Baker, 2002).
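The ≥1 kcal/mol definition can be written down as a short check. The following is a minimal sketch, not part of any published tool: the `Mutation` record, its field names and the example energies are hypothetical, and the use of the absolute change is an assumption based on the wording "alters the binding energy".

```python
from dataclasses import dataclass

# Threshold from the definition quoted in the text (kcal/mol).
HOT_SPOT_THRESHOLD = 1.0


@dataclass(frozen=True)
class Mutation:
    """Hypothetical record of one scanning experiment."""
    label: str            # e.g. "W104A"; illustrative label only
    dg_wild_type: float   # binding free energy of the wild type, kcal/mol
    dg_mutant: float      # binding free energy of the mutant, kcal/mol


def ddg(mutation: Mutation) -> float:
    """Change in binding energy between mutant and wild type."""
    return mutation.dg_mutant - mutation.dg_wild_type


def is_hot_spot(mutation: Mutation, threshold: float = HOT_SPOT_THRESHOLD) -> bool:
    """True if the mutation alters the binding energy by at least `threshold`.

    Assumption: the absolute change is used, so strongly stabilizing
    mutations would also count.
    """
    return abs(ddg(mutation)) >= threshold


if __name__ == "__main__":
    # Made-up energies, for illustration only.
    for m in (Mutation("W104A", -10.0, -8.5), Mutation("S51A", -10.0, -9.8)):
        print(m.label, round(ddg(m), 2), is_hot_spot(m))
```

Studies that restrict hot spots to destabilizing mutations would drop the absolute value and test `ddg(mutation) >= threshold` instead.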
Computational methods can identify hot spots for proteins of known three-dimensional (3D) structure (DeLano, 2002
; Guerois et al.
; Shulman-Peleg et al.
), and more recent attempts even spot these crucial sites from sequence (Gonzalez-Ruiz and Gohlke, 2006
; Ofran and Rost, 2007b
). ISIS (Ofran and Rost, 2007a
) was the first tool to specifically predict protein–protein interaction hot spots from sequence, but estimates for the effects of single substitutions have long been around (Epstein, 1966
; Vegotsky and Fox, 1962
; Zuckerkandl and Pauling, 1965
). The most recent methods are tailored to predict the effects of non-synonymous single nucleotide polymorphisms (SNPs), i.e. single nucleotide changes that alter the protein sequence (Bromberg and Rost, 2007
; Ng and Henikoff, 2003
; Ramensky et al.
; Yue et al.
). Such methods have not been assessed in light of large-scale alanine scans and hot spots. One reason might be that, while such methods sense changes in function, they do not capture the amount or severity of the change. Thus, a residue predicted to change function may or may not be a hot spot.
Here, we examined the potential of one particular implementation for in silico mutagenesis, namely SNAP (Bromberg and Rost, 2007
), that has been optimized to predict the effect of non-synonymous SNPs on a version of the public database PMD (Kawabata et al.
; Nishikawa et al.
) curated by us. SNAP evaluates functional effects of single amino acid substitutions using neural networks; its output is a value from –100 (no effect) to +100 (effect). First, we established that SNAP correctly captured the effect of alanine scans extracted from ASEdb (Thorn and Bogan, 2001
). Then, we assessed substitutions by amino acids other than alanine. Combining these results, we could analyze in silico to what extent alanine scans correlate with all possible mutations. For technical reasons, we confined this analysis to one particular protein with ample experimental data (hexokinase).
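One way to picture the comparison between alanine substitutions and all other substitutions is sketched below. This is an illustration under stated assumptions, not the analysis performed in this study: the function names, the input layout (position mapped to per-substitution scores) and the example scores are hypothetical, and the decision cutoff of 0 on the –100 to +100 scale is assumed rather than taken from the SNAP documentation.

```python
from typing import Dict


def predicts_effect(score: float, cutoff: float = 0.0) -> bool:
    """Assumed decision rule: scores above `cutoff` count as 'effect'."""
    return score > cutoff


def alanine_agreement(
    scores_by_position: Dict[int, Dict[str, float]],
    cutoff: float = 0.0,
) -> Dict[int, float]:
    """For each position, the fraction of non-alanine substitutions whose
    predicted class (effect / no effect) matches that of the alanine
    substitution. Positions without an alanine score (e.g. native alanine)
    or without any other substitution are skipped.
    """
    agreement: Dict[int, float] = {}
    for position, scores in scores_by_position.items():
        if "A" not in scores:
            continue
        alanine_class = predicts_effect(scores["A"], cutoff)
        others = [s for aa, s in scores.items() if aa != "A"]
        if not others:
            continue
        matches = sum(predicts_effect(s, cutoff) == alanine_class for s in others)
        agreement[position] = matches / len(others)
    return agreement


if __name__ == "__main__":
    # Made-up scores on the -100..+100 scale, for illustration only.
    example = {
        10: {"A": 60, "G": 40, "P": 80, "C": -20},
        11: {"A": -50, "G": -30},
    }
    print(alanine_agreement(example))
```

A per-position agreement near 1 would suggest that the alanine substitution is representative of the other substitutions at that site; a low value would suggest that an alanine scan alone misses part of the picture.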
To the best of our knowledge, this is the first comprehensive study that connects biophysical data from alanine scans with methods optimized to capture the functional effects of SNPs. Making this connection is by itself an important novelty. More interesting still, only in silico can we comprehensively address how representative current alanine scanning is, and only by this means can we comprehensively study the effects of mutagenesis without exorbitant costs. Further large-scale testing beyond this pilot study is required to establish more clearly that our approach actually captures functionally important residues and hot spots.