We analyzed the polymorphism of S. cerevisiae
residues in domains associated with PPIs (). The data we used contain the gene and protein sequences of 39 S. cerevisiae
strains included in the Saccharomyces
Genome Resequencing Project database 
. These strains were collected from around the world and their genomes were sequenced by ABI shotgun sequencing and Illumina GA (Solexa). The database supplies all single nucleotide polymorphisms (SNPs) in the 38 strains compared to the genome of the reference laboratory strain. In addition, it supplies multiple sequence alignments of the protein sequences of the 39 strains. We used these data to investigate synonymous and non-synonymous SNPs in domain residues involved in protein interactions. The number of solved structures of S. cerevisiae
complexes is limited, and consequently the dataset of S. cerevisiae
domains that were reliably determined as involved in protein interactions is small. Therefore, we extended the dataset of interacting domains by inferring domain interactions from other data sources, which differ in the reliability that can be attributed to the inferred interaction. We use the term ‘levels of resolution’ to refer to the classification of residues as interacting or non-interacting at different reliability levels. Our analysis included seven levels of resolution, as defined in (see also Methods
). Only proteins with Pfam domain annotations 
that had at least one SNP were included in the analysis, resulting in a dataset of 3,927 proteins. This dataset was further reduced to 3,669 proteins after filtering out alternative splicing variants and ambiguously annotated proteins. For each resolution level we calculated, for every protein, the fraction of its relevant residues that carry non-synonymous SNPs, and then averaged these fractions over the proteins (Methods
). As shown in , the fraction of non-synonymous SNPs increases as the reliability that these residues are involved in protein interactions decreases. We repeated the analysis using different thresholds for determination of SNPs (Supporting Figure S1A
). The trend of reliable interacting residues being less polymorphic persisted for all thresholds.
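The per-protein averaging described above can be illustrated with a minimal sketch. It assumes that each protein is represented by the set of residue positions relevant at a given resolution level and by the positions carrying non-synonymous SNPs; the field names `relevant` and `nonsyn` and the toy values are hypothetical and are not taken from the study.

```python
from statistics import mean


def nonsyn_fraction(relevant_positions, nonsyn_positions):
    """Fraction of a protein's relevant residues that carry a non-synonymous SNP.

    Returns None when the protein has no relevant residues at this level.
    """
    relevant = set(relevant_positions)
    if not relevant:
        return None
    return len(relevant & set(nonsyn_positions)) / len(relevant)


def mean_nonsyn_fraction(proteins):
    """Average the per-protein fractions, skipping proteins without relevant residues."""
    fractions = []
    for protein in proteins:
        fraction = nonsyn_fraction(protein["relevant"], protein["nonsyn"])
        if fraction is not None:
            fractions.append(fraction)
    return mean(fractions)


# Toy input (hypothetical): positions are 1-based residue indices.
proteins = [
    {"relevant": range(1, 11), "nonsyn": [3, 40]},  # 1 of 10 relevant residues
    {"relevant": range(1, 6), "nonsyn": [2, 4]},    # 2 of 5 relevant residues
    {"relevant": [], "nonsyn": [7]},                # no relevant residues: skipped
]

print(mean_nonsyn_fraction(proteins))
```

Averaging per-protein fractions, rather than pooling all residues, gives every protein the same weight regardless of its length.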
Residues included in the analysis at the different resolution levels.
Analysis of residue conservation in interacting domains.
As it was previously reported that highly expressed proteins evolve slower than weakly expressed ones 
, we verified that our results are not affected by variation in expression levels. To this end, we repeated our analysis after dividing the proteins into two groups by their expression levels 
: weakly-expressed proteins and highly-expressed proteins (Methods
). For both groups we found reduced polymorphism in interaction-mediating domains (Supporting Figure S2
), consistent with the above results and emphasizing the robustness of our findings.
To account for the effect that the specific population structure of the sequenced yeast strains might have on our analysis, we repeated the analysis with representatives of the various clades. The representatives were selected iteratively using a tree of all strains, built from SNP differences in their sequences 
. In each iteration the selected strain was the one most distant from the strains already selected. The results using six and ten representative strains were consistent with the aforementioned results, further supporting our conclusions (Supporting Figure S1B,C).
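The iterative choice of representatives can be sketched as a farthest-point selection. Several details here are assumptions rather than the procedure used in the study: distances are given as a symmetric pairwise table instead of being read from a tree, "most distant from the already selected strains" is interpreted as maximizing the minimum distance to them, and the seed strain is chosen by the caller. The strain names and distances are hypothetical.

```python
def select_representatives(dist, seed, k):
    """Greedy farthest-point selection of k strains.

    dist: nested dict, dist[s][t] is the distance between strains s and t.
    seed: first selected strain (an assumption; the text does not name one).
    Each round adds the candidate whose minimum distance to the strains
    already selected is largest (a maximin interpretation).
    """
    selected = [seed]
    candidates = set(dist) - {seed}
    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(
            sorted(candidates),
            key=lambda s: min(dist[s][t] for t in selected),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy symmetric distances between four hypothetical strains.
pairs = {
    ("A", "B"): 1, ("A", "C"): 5, ("A", "D"): 4,
    ("B", "C"): 5, ("B", "D"): 4, ("C", "D"): 3,
}
dist = {}
for (s, t), d in pairs.items():
    dist.setdefault(s, {})[t] = d
    dist.setdefault(t, {})[s] = d

print(select_representatives(dist, "A", 3))
```

With these toy distances the close pair A and B is never selected together, which is the purpose of choosing clade representatives.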
To further investigate the conservation of interacting domains taking into account the local mutation rate, we calculated the non-synonymous to synonymous mutation ratios (pN/pS). This analysis was carried out for the residues at various resolution levels as in the previous analyses, each time comparing the distribution of pN/pS ratios of the relevant residues in the studied proteins to their distribution in a complementary set of residues in the same proteins (). This comparison revealed that the pN/pS values of interacting domains are lower than those of non-interacting domains, implying that residues in interacting domains are more conserved. We repeated the analysis using different thresholds for SNP determination, as described above, and found that the phenomenon is consistent (Supporting Figure S3
). We verified that our results are not biased due to over-representation of specific paralogs that may have specific conservation patterns, using the dataset of Wapinski et al.
of paralogous proteins in S. cerevisiae
. We kept a representative protein for each paralogous cluster and repeated the analysis, obtaining results consistent with the above.
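As an illustration of the pN/pS ratio, the sketch below classifies each SNP as synonymous or non-synonymous with the standard genetic code and divides the per-site rates. It is a simplified sketch under stated assumptions: the numbers of non-synonymous and synonymous sites are supplied by the caller (the text does not say how sites were counted), each SNP is given as a pair of reference and alternative codons, and the function names and toy values are hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def is_synonymous(ref_codon, alt_codon):
    """True when both codons encode the same amino acid."""
    return CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]


def pn_ps(snps, nonsyn_sites, syn_sites):
    """pN/pS for a set of residues.

    snps: list of (reference codon, alternative codon) pairs.
    nonsyn_sites, syn_sites: site counts supplied by the caller (assumed).
    Returns None when the ratio is undefined (no synonymous SNPs or no sites).
    """
    n_syn = sum(is_synonymous(ref, alt) for ref, alt in snps)
    n_nonsyn = len(snps) - n_syn
    if n_syn == 0 or syn_sites == 0 or nonsyn_sites == 0:
        return None
    return (n_nonsyn / nonsyn_sites) / (n_syn / syn_sites)


# Toy SNPs: GAA>GAG (Glu>Glu), GAA>GAT (Glu>Asp), CTT>CTC (Leu>Leu).
snps = [("GAA", "GAG"), ("GAA", "GAT"), ("CTT", "CTC")]
print(pn_ps(snps, nonsyn_sites=300.0, syn_sites=100.0))
```

Computing the ratio separately for a residue set and for its complementary set in the same proteins gives the two distributions that are compared in the text.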
Next, we investigated the substitutions of amino acids in the 38 S. cerevisiae
strains compared to the laboratory strain. We compared the distribution of the substitution scores between interacting and non-interacting residues, defined for different resolution levels as in (see Methods
). We found that all comparisons (except for the comparison of the residue set in the highest resolution level to its complementary set, which was based on a small number of scores) showed statistically significant differences (p-values ranged between 9×10⁻³
, applying FDR correction). In all the comparisons the residues in interacting domains have substitutions with higher scores than the residues in the non-interacting domains, implying that they are substituted by similar amino acids that probably preserve their functionality.
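The text states that an FDR correction was applied but does not name the procedure. The sketch below assumes the Benjamini-Hochberg step-up procedure, a common choice for correcting several comparisons; the input p-values are hypothetical and are not results from the study.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumed procedure: the text only says that an FDR correction was applied.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, one per comparison.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

A comparison is then called significant when its adjusted p-value falls below the chosen FDR threshold.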
Our analysis provides a bird's eye view on the polymorphism in yeast residues involved in protein interactions at various levels of resolution. To obtain a more concrete understanding of our results, we provide as an example a closer look at one protein in our data, DCP1, mRNA-decapping enzyme subunit 1 (Q12517), a single domain protein whose homodimer structure was solved by crystallography 
and included in the 3DID database. The domain mediating the homodimerization is PF06058 (resolution 2 in our analysis) and 18 specific residues were determined as participating in the interaction (resolution 1 in our analysis). Analyzing the multiple sequence alignment and SNPs for the residues at resolution level 1, we found that none of the interacting residues is polymorphic. At the domain resolution (resolution 2) we identified three positions with non-synonymous substitutions. Thus, interacting residues of this domain are less polymorphic, consistent with the trend observed for the whole database. The crystal structure and the polymorphism results suggest that the specific interacting residues have a greater effect on the stability of the complex than other domain residues. To substantiate this conjecture we applied to the DCP1 dimer the FoldX algorithm 
, an algorithm that quantitatively estimates the contribution of interface residues to the stability of a protein complex. The algorithm performs a computational alanine scan of the residues in a protein interface and calculates the resulting change in the energy of the complex. Applying the algorithm to 16 interface residues that are classified as interacting revealed an average energy change of 0.94 kcal/mole per residue; alanine substitution of a polymorphic non-interacting residue of the domain was even predicted to increase the complex stability (−2.15 kcal/mole); and substitutions in seven non-interacting, non-polymorphic interface residues were predicted to destabilize the complex by an average of 0.17 kcal/mole per residue. These results are consistent with the expectations from the SNP analysis and the complex structure: substitutions of non-polymorphic domain residues are on average more destabilizing than substitution of the polymorphic residue (0.17 versus −2.15 kcal/mole), and substitutions of non-polymorphic interacting residues are more destabilizing than those of non-polymorphic non-interacting residues (0.94 versus 0.17 kcal/mole).
In summary, our study shows, across the different resolution levels, that residues in domains associated with PPIs are less polymorphic than residues in other domains. At the lowest resolution level, our results are consistent with those of Vishnoi et al.
, who found that residues in domains are less polymorphic than extra-domain residues. At a higher resolution level are domains that were found in yeast and other organisms to mediate PPIs. We found that their residues are less polymorphic than residues in other domains, which further emphasizes the functional importance of these domains. These domains were suggested to constitute a limited repertoire of domain pairs that act as PPI mediators, and their lower polymorphism among yeast strains is consistent with this suggestion. At the finest resolution levels, which comprise yeast protein residues that are included in interacting domains or were shown in crystal structures to be involved in interactions, our results imply that these residues and domains are under tighter selection to preserve their functionality.