|Home | About | Journals | Submit | Contact Us | Français|
Post-translational modifications offer a dynamic way to regulate protein activity, subcellular localization and stability. Here we estimate the effect of phosphorylation on protein binding and function for different types of complexes from human proteome. We find that phosphorylation sites have a tendency to be located on binding interfaces in heterooligomeric and weak transient homooligomeric complexes. The analysis of molecular mechanisms of phosphorylation shows that phosphorylation may modulate the strength of interactions directly on interfaces and binding hotspots have a tendency to be phosphorylated in heterooligomers. Although majority of phosphosites do not show significant estimated stability differences upon attaching the phosphate groups, for about one third of all complexes it causes relatively large changes in binding energy. We discuss the cases where phosphorylation mediates the complex formation and regulates the function. We show that phosphorylation sites are not only more likely to be evolutionary conserved than surface residues but even more so than other interfacial residues.
Cellular regulatory mechanisms provide a sensitive, specific and robust response to external stimuli and post-translational modifications offer a dynamic way to regulate protein activity, subcellular localization, and stability (Olsen et al., 2006; Ptacek and Snyder, 2006; Schlessinger, 2000). Such dynamic regulation is achieved through reversibility and fast kinetics of post-translational modifications, such as when a phosphate group can be quickly attached and removed by kinases and phosphatases, respectively. Indeed, adding or removing a dianionic phosphate group somewhere on a protein might change its physico-chemical properties, stability, kinetics, and dynamics (Johnson, 2009). Recent phosphoproteomic analyses have revealed that the majority of proteins in a mammalian cell are phosphorylated (Olsen et al., 2006; Olsen et al., 2010) so regulatory mechanisms involving phosphorylation are very widespread.
Many signaling and other types of pathways involve a dense network of protein-protein interactions and the reaction rates of these processes depend on protein concentrations and association/dissociation constants of protein assemblies. Phosphorylation can be used to modulate the nature and the strength of protein-protein interactions thereby regulating protein binding and coordinating different pathways. If phosphorylation occurs at or near a binding interface, it may directly affect the binding energy of the complex. At the same time phosphorylation of a site outside a binding interface may cause long-range conformational changes through allosteric mechanisms and affect the binding of the partner as observed for the classical example of glycogen phosphorylase (Jenal and Galperin, 2009; Lin et al., 1997). Another aspect of coupling between phosphorylation and binding is the recognition of the phosphates by special phospho-Ser/Thr or Tyr binding domains (such as 14-3-3, SH2, MH2 and others), such process may release the protein from autoinhibition and result in activation and subsequent signal propagation like in the case for Src kinases (Schlessinger, 2000). Finally it has been shown that flexible regions and intrinsically disordered proteins have a tendency to be phosphorylated and phosphorylation might induce disorder-to-order as well as order-to-disorder transitions (Antz et al., 1999; Collins et al., 2008; Gsponer et al., 2008; Radhakrishnan et al., 1997).
In this work we analyze the effect of phosphorylation on protein binding for different types of complexes from the human proteome and varying by stability and the nature of the interacting subunits. We show that there exists a coupling between phosphorylation and protein-protein binding for all types of heterooligomeric and weak transient homooligomeric complexes. Computational alanine scanning experiments and analysis of the energetic effect of attaching/removing of phosphate groups show that phosphorylation may modulate the strength of interactions directly on interfaces and binding hotspots have a tendency to be phosphorylated for heterooligomers. Although for many Ser/Thr/Tyr sites we did not find significant stability differences upon attaching the phosphate group, for one third of all complexes it brings about a relatively large changes in binding energy of more than 2kcal/mol. We analyze the effect of phosphorylation on protein function and show that several pathways especially hemostasis pathway is enriched with phosphoproteins and phosphosites. At the end we show that phosphosites on interfaces are more likely to be evolutionarily conserved than other interfacial residues.
Using a non-redundant set of 933 phosphorylated human hetero and homooligomeric complexes (see Methods for detail) we have observed on average two phosphorylated sites (pTyr, pSer, or pThr) per protein. Note that the majority of protein complexes do not have actual phosphate groups in the PDB structures (Zanzoni et al., 2011). As one can see from Figure 1, the distribution of fractions of phosphosites in phosphoproteins is quite narrow with a large majority of all phosphocomplexes having about 5-10% of all Ser, Thr, or Tyr residues phosphorylated. The distribution has a long tail, which is consistent with the fact that proteins with multiple phosphorylation sites occur more often than expected by chance, in agreement with previous studies for Arabidopsis thaliana (Riano-Pachon et al., 2010). Overall, we observed the relative fractions of the types of phosphosites to be ~40% pSer, ~25% pThr, and ~35% pTyr in protein structural complexes, and this observation did not depend on whether the complexes represented hetero- or homooligomers. The frequencies of pSer, pThr, and pTyr observed in structural complexes were quite different from those obtained in high-throughput experiments for phosphoproteomes, which identified only a small fraction of pTyr sites (Hunter and Sefton, 1980; Olsen et al., 2006). This discrepancy may be explained by the observation that hydrophobic Tyr is more likely to be found in structured regions while Ser and Thr are frequently found in disordered and flexible regions. Indeed, recently it was reported that almost half of pTyr sites were located within conserved protein domains (Sugiyama et al., 2008). Moreover, tyrosine phosphorylation might occur on less abundant proteins compared to serine and threonine phosphorylation and so the statistics for rather redundant phosphoproteomes may differ from our non-redundant set.
Further, we studied the coupling between phosphorylation and protein-protein binding by examining binding interfaces and locations of phosphosites in complexes (Table S1). Overall, we found that the association between phosphorylation sites and binding interfaces is very strong for heterooligomers (Fisher’s exact test p-value = 7.4e-15) and significant but not so prominentfor weak transient homooligomers (p-value = 0.0008) (Figure 1, Table 1 and Table S2). No association was found for permanent and strong transient homooligomers. Since the stability of the complex depends on the number of subunits we also performed a similar analysis restricted to dimers and found similar trends (p-value = 5.3e-06 for heterooligomers). The tendency of phosphosites to be involved in binding did not correlate with the estimated stability of heterooligomers which, in turn, were generally less stable than homoologomers according to our analysis (Wilcoxon rank sum test p-value = 0.03). These results are consistent with our previous study which showed that transient complexes which bind different protein partners using the same interface (promiscuous binding) are enriched with Tyr, Ser, and Thr among a few other residues on their interfaces, and their phosphorylation may provide the switch between different functional pathways (Tyagi et al., 2009).
Although phosphorylated sites are usually located on protein surfaces, some of their structural properties are different from the other surface residues (Gnad et al., 2007; Jimenez et al., 2007; Zanzoni et al., 2011). We analyzed the structural properties of phosphosites (sites which can be phosphorylated even if there is no actual phosphate present in PDB structure) on interfaces to see if these properties are different from non-phosphorylated Ser/Thr/Tyr sites on interfaces. Phosphosites in heterooligomers seem to be more solvent accessible than non-phosphorylated sites in isolated protomers (on average by 23 Å2 more accessible, p-value = 2.2e-16) and tend to change solvent accessibility upon complex formation burying more surface area (on average by 13Å2, p-value = 2.2e-16, Table 1, Figure S1). This is consistent with our previous observation that phosphosites are predominantly located on binding interfaces. At interfaces phosphorylated sites contribute to the complex stability by forming more hydrogen bonds and residues contacts than non-phosphosites on interfaces (for hydrogen bond difference p-value = 0.0005 for hetero and p-value = 0.04 for weak homooligomers, Table 1). Additionally Tyr residues tend to be located in the core of protein interfaces, playing a critical role for oligomerization through aromatic stacking interactions, its phosphorylation therefore might directly affect the binding affinity. The estimate of binding energy provides additional evidence for these findings as shown in the following section.
Those residues which are essential for the structural integrity of proteins or protein complexes are called binding hotspots (Bogan and Thorn, 1998; Tuncbag et al., 2009). They are predominantly located on interaction interfaces and their substitution by different amino acids (for example, Ala) causes large differences in binding energy (more than 1-2 kcal/mol), destabilizing the complex. The effect of such substitutions and therefore the contribution of a given site to the binding energy can be measured in terms of ΔΔGala (see Methods). We performed substitutions of Tyr/Thr/Ser residues in phosphoproteins from our test set by Ala (computational alanine scanning experiments) and calculated ΔΔΔGala separately for phosphorylated and non-phosphorylated sites using the FoldX algorithm (see Methods). Overall, the substitution of amino acids at both phosphorylated and non-phosphorylated sites destabilizes the complex and the ΔΔΔGala distributions are significantly shifted to positive values for all homo- and hetero complexes (both p-values are 2.2e-16). We have not detected any Ala substitutions which would result in increased stability of the native complex by more than 2 kcal/mol (negative values of ΔΔΔGala correspond to stabilizing substitutions). This implies that the interfaces are relatively well optimized which is congruent with the previous studies (Brock et al., 2007).
Even though the majority of substitutions on interfaces do not change the binding energy very much, a significant fraction of them (10% for homooligomers and 13% for heterooligomers) contribute to ΔΔΔGala of more than +2 kcal/mol (destabilizing the complex) or in other words form binding hotspots. We considered the question whether phosphorylation events have a tendency to involve binding hotspots. We found that for heterooligomers the ΔΔΔGala values for amino acid substitutions at phosphorylated sites on binding interfaces are larger compared to other sites on interfaces (Wilcoxon rank-sum test p-value = 0.0003), namely 7% of non-phosphorylated sites and 13% of phosphorylated sites correspondingly contribute more than 2 kcal/mol to ΔΔΔGala (20% of non-phosphorylated and 30% phosphorylated sites have ΔΔGala more than 1kcal/mol respectively). In general, the association between phosphosites and binding hotspots is statistically significant for the entire dataset and for heterooligomers in particular (Fisher’s exact test p-value = 0.0006).This result does not hold true if only homooligomers are considered (Table 1, Figure S2).
As mentioned before, the majority of protein complexes in PDB do not have actual phosphate groups present. Therefore to further assess the energetic effect of phosphorylation, we attached the phosphate group to those Ser/Thr/Tyr sites on binding interfaces which are known to be phosphorylated and calculated the change of binding energy upon phosphorylation as ΔΔΔGp (see Methods). In the majority of cases attaching a phosphate group resulted in very moderate changes in the estimated binding energy of about +0.5-1 kcal/mol. Experimental studies on MAPK cascade scaffold protein showed that introducing phosphate increases the dissociation energy by about 1.5 kcal/mol (Serber and Ferrell, 2007; Strickfaden et al., 2007). Nevertheless, overall, the ΔΔΔGp distribution was significantly shifted toward positive values (Figure 2,p-values are 1.3e-08 for homooligomers and 1.3e-12 for heterooligomers). Namely, in 39% and 35% of the cases the attachment of a phosphate group destabilized the complex for hetero and homooligomers respectively by more than +2 kcal/mol. The phosphorylation of heterooligomers caused slightly higher destabilization compared to homooligomers. On the other hand there were 8 (64) cases where phosphorylation lead to ΔΔΔGp values of less than - 2(1) kcal/mol leading to complex stabilization. There were 12 complexes in our test set where the actual phosphate group was resolved on protein interfaces and in these cases we removed the phosphate group and assessed the effect, in most cases ΔΔΔGp was less than 2 kcal/mol.
The evolutionary conservation of phosphorylated sites has been a topic of several studies and it was found that phosphoproteins are more conserved in evolution than non-phosphorylated ones (Boekhorst et al., 2008; Macek et al., 2008) , whereas the conservation of phosphorylated sites is limited (Levy et al., 2010). One of the reasons for weak conservation of phosphorylated sites is that the majority of phosphorylation events might have occurred relatively recently in evolution, especially Tyr phosphorylation (Chen et al., 2010; Gnad et al., 2010; Sridhara et al., 2011). Trying to clarify this controversy we mapped phosphorylated sites on multiple sequence alignments of manually curated CDD families at the superfamily level and calculated their sequence conservation. Overall, 539 protein complexes from our dataset were mapped to 292 CDD families. First we found in consensus with other studies (Boekhorst et al., 2008; Gnad et al., 2007; Gray and Kumar, 2011; Zanzoni et al., 2011) that phosphorylated sites are more conserved than the surface sites for heterooligomers (Wilcoxon test p-value = 0.00001) (Figure S3). Next we went further and checked whether phosphorylated sites on interfaces are more conserved than other interface sites. Figure 3 shows the probability density plot of sequence conservation calculated with respect to background conservation of the overall family for both phosphosites and all other Tyr/Thr/Ser sites on interfaces. As one can see from this figure that heterooligomers unlike homooligomers have a small peak in the positive range of interface conservation values which is consistent with the previous studies (Choi et al., 2009). Moreover, the majority of non-phosphorylated sites on interfaces are less conserved than the family background (mean value of the distribution is shifted toward negative values) which can be explained by the fact that protein core residues and active sites might be under stronger evolutionary pressure than Ser, The and Tyr residues on interfaces.
When we look at the conservation of phosphorylated sites, it is evident that there are two almost equal populations of Ser/The/Tyr sites: those that are less conserved than the family background and those that are more conserved than the background. Overall the conservation distribution for phosphosites is significantly shifted toward positive values compared to conservation of interfacial non-phosphosites for all complexes and for heterooligomers in particular (p-value = 0.00001 for all and p-value = 0.015 for heterooligomers). Calculated separately for homooligomers this shift is not significant. Thus we see that phosphosites are more conserved than non-phosphosites on interfaces in human complexes implying that there is additional evolutionary pressure to conserve the phosphosites which are important for binding events. It is also consistent with our previous observation that phosphosites in heterooligomers have a tendency to be located at the binding hot spots and such hot spots are more evolutionarily conserved than the rest of the interface.
It was reported earlier that phosphorylated proteins have specific molecular functions in a cell (Wang et al., 2011). We analyzed our non-redundant set of homooliogomers and heterooligomers including phosphorylated and non-phosphorylated proteins (see Methods) and studied their association with particular GO protein functions. We found that heterooligomers with GO annotations “catalytic (GO: 0003824)”, “hydrolase (GO: 0016787)”, “transferase (GO: 0016740)” and “signal transducer (GO: 0004871)” activities have larger numbers of phosphorylated sites on interfaces compared to other proteins (Figure S4, Table S3).
It is also known that some pathways are differentially regulated by using reversible phosphorylation of their constituent proteins. In this respect we performed an analysis of phosphorylated complexes participating in different biological pathways. The data on pathway proteins were taken from the NCBI Biosystems database which includes 5016 human specific pathways, mostly coming from KEGG and Reactome sources (Geer et al., 2010). We found that metabolic and hemostasis pathways (KEGG pathway ID: hsa01100 and Reactome ID: REACT_604) were significantly enriched with phosphorylated homooligomeric complexes (p-value = 0.002) while the “Hemostasis” (REACT_604), “Pathways in cancer (hsa05200)”, “Cell Cycle, Mitotic (REACT_152)” and “Signaling in immune system (REACT_6900)” pathways are enriched with phosphorylated heterooligomeric complexes (p-value = 0.00001, Table S3).
TGF-β signaling is controlled by receptor Ser/Thr kinases and the Smad protein family. In response to cytokine oligomerization, phosphorylation of Ser residues and subsequent activation of cytoplasmic Ser/Thr kinase occurs. Then activated kinase phosphorylates the C-terminal SSXS motif of specific tumor suppressors from the R-Smad (Smad1, Smad2) protein family. Once phosphorylated, the SSXS motif of Smad2 promotes the formation of a heterooligomer between R-Smad and Smad4 which in turn regulates gene expression. We compared a Smad1 protein from our test set (PDB accession1khu) with a Smad2 protein (1khx) which has actual phosphate groups present in the crystal structure (except for the first Ser). Both structures have SSXS motif located on the binding interface, moreover these proteins are 80% identical and display extensive structural similarity (Figure 4a). We considered the effect of phosphorylation and dephosphorylation of Smad1 and Smad2 on trimer formation. We made a model of phosphorylated Smad1 and calculated the change in binding energy upon phosphorylation of Smad1 and also dephosphorylation of Smad2 (see Methods). The model of phosphorylated state of the Smad1 is shown in Figure 4a, the different subunits are depicted in blue, green and magenta whereas the structure of the actual phosphorylated state of Smad2 protein is depicted in yellow. We showed that phosphorylation of all Ser, especially the third one in the SSXS motif stabilized the complex of Smad1 (negative average ΔΔΔGp values up to −1.5 kcal/mol; Table 2). At the same time dephosphorylation of the second and especially third Ser destabilized the Smad2 complex by up to 2kcal/mol. Removing the phosphate group of the first Ser slightly stabilized the complex.
The results of our computations are consistent with experimental results obtained for the Smad2 protein (Wu et al., 2001) and the effect of phosphorylation of the first Ser in SSXS motif is still considered to be controversial (Abdollah et al., 1997). Experimental studies demonstrate that unphosphorylated Smad2 exists as a monomer, whereas phosphorylation of Smad2 promotes homotrimer formation through its MH2 domain (Wu et al., 2001). Interestingly, the trimer interface overlaps with the interface for the interaction between the Smad2-MH2 domain and receptor Ser/Thr kinase domain, and as was shown previously phosphorylation facilitates the dissociation of Smad2 from kinase (Wu et al., 2001). Therefore this provides an example where phosphorylation mediates the complex formation and through competitive binding implements a negative control mechanism promoting the dissociation of the heterologomeric complex of Smad2 with the kinase domain.
We found that the vast majority of phosphocomplexes contain just a few phosphorylated sites whereas for some proteins up to half of their sites (Ser, Thr and Tyr sites) are potentially phosphorylated at some point, which is evident from the long tail of the probability distribution (Figure 1) for the fraction of phosphosites per protein. Several studies previously established that phosphosites may form clusters along specific regions of a protein sequence or on a protein surface (Schweiger and Linial, 2010; Yachie et al., 2009). Although the main reasons for these findings remain largely unknown, it was observed that in some cases the groups of sites can be phosphorylated simultaneously and cooperatively leading to certain advantages in terms of signal amplification and its strength modulation (Park et al., 2006). In our study we showed that phosphosites have a tendency to be located on binding interfaces in protein complexes and this trend depends on the type of complex. This allows us to better understand the regulation of protein activity through phosphorylation within the framework of protein binding.
There are several reasons for such coupling between phosphorylation and binding. Phosphorylation may modulate the strength of interactions, bringing about changes in binding energy which may trigger the transitions between different conformer and oligomeric states. For the majority of proteins in our dataset the phosphorylation did not change the binding affinity significantly, which is consistent with several experimental studies pointing to the modest effect of phosphorylation on stability and protein conformation (Murray et al., 1998; Serber and Ferrell, 2007; Strickfaden et al., 2007). At the same time in one third of our complexes the attachment of a phosphate group to interfacial Ser/Thr/Tyr sites which are expected to be phosphorylated caused a relatively large change in estimated binding energy. This in turn could lead to conformational changes or due to steric constraints preclude undesired interactions. Moreover, phosphorylation sites on interfaces significantly overlapped with the binding hot spots in heterooligomeric complexes and phosphorylation at binding hotspots could potentially disrupt the complex formation. In addition we showed that phospho Ser/Thr/Tyr on interfaces were more conserved than non-phosphorylated Ser/Thr/Tyr on interfaces. It should be mentioned that regulatory mechanisms of phosphorylation are quite diverse and in some cases phosphorylation might destabilize the complex and lead to protein activation or inactivation, while in others it may mediate complex formation and through competitive binding provide a negative control mechanism (as was shown for Smad example). In our study the phosphate group was attached to only one site at a time and since there can be several phosphosites per protein (on average there are about 2 phosphosites per protein in the set), we expect a greater effect if multiple sites are phosphorylated simultaneously.
Phosphorylation might not directly affect significantly complex stability, but rather provide diversity in recognition patterns and offer recognition sites for binding of certain domains and motifs (e.g. pTyr-binding by the SH2 domain, pSer/pThr binding by the SH2 and FHA domains) thereby modulating binding selectivity. Indeed, the reversibility of phosphorylation events allows decouple the binding selectivity and affinity thereby mediating selective binding even between proteins within transient and not very stable complexes. At the same time, phosphorylation of multiple sites on interfaces may amplify this signal and provide enhanced binding selectivity. Indeed such specific and reversible signaling at the residue level is a good indicator that a previous stage in cellular signaling networks has completed successfully. Many cellular control mechanisms operate at the level of protein-protein interactions and main signaling pathways involve dense networks of protein-protein interactions and phosphorylation events. Moreover, signaling pathways are quite often disrupted in cancer and it was recently shown that somatic cancer mutations are enriched with those that cause gain or loss of phosphorylation sites (Radivojac et al., 2008). Similarly our study showed that the following signaling pathways “Hemostasis”, “Pathways in cancer”, “Cell Cycle, Mitotic”, and “Signaling in immune system” are enriched with phosphorylated heterooligomeric complexes.
Interestingly, we found that metabolic and hemostasis pathways are also enriched with phosphorylated homooligomeric complexes and phosphosites in weak transient homooligomers are considerably involved in binding. Previously we manually compiled a set of experiments which furnish evidence that phosphorylation at or near the homooligomer interface shifted the equilibrium between different oligomeric states with different protein activities (Hashimoto et al., 2011). According to the classical model by Goldbeter and Koshland post-translational modifications may allow large activity changes with only moderate concentration changes to provide sensitive response to external stimuli (Goldbeter and Koshland, 1981). To supplement this model our analysis offers additional new evidence for how reversible phosphorylation events may modulate reversible transitions between different discrete conformations or oligomeric states in homooligomeric and heterooligomeric complexes and might represent an important mechanism for regulation of protein activity.
The data on phosphorylation sites in human proteins is derived from the PhosphoSitePlus (Hornbeck et al., 2004), Phospho.ELM (Dinkel et al., 2011) and PHOSIDA (Gnad et al., 2007) databases. Most phosphorylation sites in these databases are identified by high-throughput (HTP) methods which might contain significant experimental errors (Lin et al., 2010). Therefore we used the GPS 2.1 program (Xue et al., 2008) to verify HTP phosphosites. GPS predicts phosphorylation sites from protein sequence based on sequence patterns using decision-trees. In our study, we employed the most conservative thresholds reported by GPS with the estimated false positive rates being 2% and 4% for Ser/Thr and Tyr sites respectively. The sites identified by low-throughput methods (indicated as “PUBMED_LTP” in PhosphoSitePlus and “LTP” in Phospho.ELM) and HTP sites verified by GPS were then used for our analysis. Proteins with reliable phosphorylation sites were linked to their structures via Uniprot (Magrane and Consortium, 2011) and the phosphorylation sites were mapped onto PDB structures using the Muscle alignment algorithm (Edgar, 2004). Protein list with phosphosites and all results is available at ftp://ftp.ncbi.nih.gov/pub/mmdb/phospho/phosphorylation_on_complexes.xls.
We started our analysis with the whole set of PDB structures (Berman et al., 2000) and retrieved all structures containing more than one protein chain. The oligomeric states and binding interfaces are defined using PISA algorithm (Krissinel and Henrick, 2007). PISA is considered as a standard state-of-art method which detects biological macromolecular assemblies in PDB with 80-90% accuracy. We regard a complex as stable if PISA reports a unique oligomeric state for a given structure. A complex is considered as homooligomer if sequence identities between all sequence pairs in the complex are more than 90% identity, otherwise it is defined as heterooligomer. Phosphorylation sites were mapped to protein complexes and then to compile a non-redundant set, similar proteins (with BLAST p-value < 10e-07) were removed. Finally we obtained 382 homooligomers and 551 heterooligomers with 1983 phosphorylation sites altogether. Homooligomers were further divided into three categories similar to the classification introduced recently (Perkins et al., 2010) according to their ΔG of dissociation calculated by PISA: weak transient (ΔGdiss ~ 0, coexistence of different oligomeric states), strong transient (0 ≤ ΔGdiss ≤ 20 kcal/mol) and permanent (ΔGdiss> 20 kcal/mol). For reference, for dimers in equilibrium, the concentrations of dimers and monomers are equal at ΔGdiss ~5 kcal/mol (Krissinel and Henrick, 2007).
The change in the standard free energy upon complex dissociation may be calculated as follows:
where KD is the dissociation constant and [M] and [C] are equilibrium concentrations of complex C and monomers M. Dissociation energy was calculated with PISA program and was used to assess the complex stability. Since amino acid substitution and phosphate attachment can affect the stability of both monomers and complexes we estimated also the binding energy with the rigid body approach using the same atomic coordinates for the monomers as in the complexes:
where ΔGC and ΔGM are stabilities (unfolding free energies) of the complex and monomers respectively.
The computational alanine scanning and attachment of phosphate groups was performed by the FoldX program (Guerois et al., 2002) using the “complex_alascan” and “PositionScan” options, respectively. The FoldX program calculates the stability of protein complexes using an empirical force field. As was shown previously FoldX performs among the best three methods which estimate the effect of mutations on protein stability. It reaches 0.64 sensitivity and 0.43 specificity (Khan and Vihinen, 2010) of prediction and reports correlation coefficient between experimental and computed ΔΔG values in the range of 0.5-0.8 (Guerois et al., 2002; Potapov et al., 2009). The standard deviation of computed ΔΔG values is reported to be 0.8 kcal/mol (Guerois et al., 2002). In “complex alascan” mode, FoldX replaces the residue on the interface by alanine, optimizes their side chain conformations, and calculates the difference in binding energies between the original and substituted complexes (ΔΔΔGala). Similarly in “PositionScan” mode, FoldX attaches a phosphate group to Ser/Thr/Tyr, optimizes the side chain conformations, and calculates the difference in binding energies between the original and phosphorylated complexes (ΔΔΔGp). Positive and negative values of ΔΔΔG correspond to destabilizing and stabilizing effects respectively. Note that ΔΔΔGp was calculated only for dimers due to limitations of the program.
To examine the evolutionary conservation of phosphorylation sites, we searched protein sequences from our data set using RPS-BLAST (Marchler-Bauer et al., 2002) and the Conserved Domain Database (CDD) (Marchler-Bauer et al., 2009) and then embedded the protein sequences in the CDD multiple sequence alignments. The conservation score was calculated using the al2co program (Pei and Grishin, 2001) with default parameters. It represented the entropy-based measure calculated from sequence weighted observed amino acid frequencies. The score was normalized by subtracting the mean and dividing by the standard deviation of the score distribution for the whole alignment. Therefore the conservation score of a given site can be negative if the site is less conserved than the average conservation background of protein family and vice versa.
We used Gene Ontology (Ashburner et al., 2000) for the annotation of the protein function. The “molecular function” terms of each protein were obtained from Gene Ontology annotation (GOA) (Barrell et al., 2009). For pathway analysis, we used the NCBI BioSystems Database (Geer et al., 2010) and Flink web service (http://www.ncbi.nlm.nih.gov/Structure/flink/flink.cgi) to map proteins to biological pathways. The content of the Biosystems Database comes from several pathway databases: KEGG (Kanehisa et al., 2010), BioCyc (Caspi et al., 2010), PID (Schaefer et al., 2009), and Reactome (Croft et al., 2011). Only human-specific pathways were considered for our analysis. In addition to the set of phosphorylated protein complexes described above, non-redundant sets of homooliogomers and heterooligomers including phosphorylated and non-phosphorylated proteins were compiled to estimate whether phosphorylated proteins are enriched in specific function or pathways. All human complexes were taken from PDB, PISA validated and similar proteins (with BLAST p-value < 10e-07) were then removed as described previously. The final data set contained 248 phosphorylated and 451 non-phosphorylated homooligomers, and 253 phosphorylated and 401 non-phosphorylated heteooligomers respectively.
The authors thank Michael Galperin and Tom Madej for helpful discussions. K.H. was partially supported by a Research Fellowship from the Japan Society for the Promotion of Science. This work was supported by National Institutes of Health/Department of Health and Human Service (DHHS) (Intramural Research program of the National Library of Medicine).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.