Protein distance, protein connectivity, expression level and complex number in the yeast protein-protein interaction network
We estimated the rates of amino acid substitution from the amino acid sequences of orthologous pairs of S. cerevisiae and S. paradoxus and thereby calculated the protein distances (by Kimura's method, 1983 [19]). A number of parameters, such as protein expression level, protein connectivity and the complex-forming nature of a protein, were previously shown to affect the rate of protein evolution [20]. However, there has been no evidence on whether these factors determine the evolutionary rate of a protein independently of one another. We first determined the non-parametric Spearman's correlation of each of the three biological factors with the protein distance. All three parameters correlate negatively with the protein evolutionary rate in the CORE and FULL datasets (Table ). To examine whether the three factors influence evolutionary rate independently, we performed partial correlation analysis, in which we focused on the correlation between evolutionary rate and one of the three factors while controlling for the other two. We observed that all the factors have significant partial correlations with the protein evolutionary rate (Table ). However, in some cases partial correlation analysis cannot reliably detect the independent influence of various factors [6]. We therefore performed multivariate regression analysis [22] on both datasets. Multivariate regression analysis has been employed by Plotkin and Fraser to examine the independent contribution of multiple variables in governing protein evolutionary rates in yeast [23]. The multivariate regression method enabled us to study the influence of all potential predictor variables at the same time and to eliminate, step by step, those predictors that contribute least to the regression model. Multivariate regression analysis confirmed that all three factors independently influence the evolutionary rate of proteins in both datasets (Table ).
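The partial correlation step described above can be illustrated with a minimal sketch. It assumes that the partial Spearman correlation is obtained by applying the standard recursive partial-correlation formula to rank correlations; the function names and the six-protein toy values are hypothetical and are not taken from the CORE or FULL datasets.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def partial_first_order(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the effect of one variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_spearman(x, y, z1, z2):
    """Spearman correlation of x and y controlling for z1 and z2 (recursive formula)."""
    xy_1 = partial_first_order(spearman(x, y), spearman(x, z1), spearman(y, z1))
    xz2_1 = partial_first_order(spearman(x, z2), spearman(x, z1), spearman(z2, z1))
    yz2_1 = partial_first_order(spearman(y, z2), spearman(y, z1), spearman(z2, z1))
    return partial_first_order(xy_1, xz2_1, yz2_1)

# Hypothetical toy values for six proteins (illustration only).
distance = [0.30, 0.25, 0.22, 0.15, 0.10, 0.05]
expression = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
degree = [2, 1, 4, 3, 6, 5]
complex_number = [0, 1, 0, 2, 1, 3]

print(spearman(distance, expression))
print(partial_spearman(distance, expression, degree, complex_number))
```

The sketch returns only the coefficient; a significance test for the partial correlation would need to be added separately.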
Correlation and partial correlation analysis of four putative determinants of protein distance.
Multiple regression analysis between various factors and evolutionary rate.
Principal Component Analysis (PCA) was then used to assess the contribution of each variable. The dominant eigenvectors (those with eigenvalues equal to or greater than 1) that emerge from this analysis can be interpreted as the most important contributors guiding protein evolution. The first principal component accounted for 43% and 44% of the total variance for the CORE and FULL datasets, respectively. Its main contributions come from the complex number (CORE: ≈ 0.77; FULL: ≈ 0.78) and the expression level (CORE: ≈ 0.75; FULL: ≈ 0.71), whereas the contribution of the degree (CORE: ≈ 0.40; FULL: ≈ 0.47) was low. Moreover, the first principal component is also significantly negatively correlated with protein distance (CORE: Spearman's ρ = -0.439, P = 1.00 × 10⁻⁶; FULL: Spearman's ρ = -0.415, P = 1.00 × 10⁻⁶). Thus, our study puts forward a novel determinant of evolutionary rates for yeast proteins: the complex-forming ability of proteins emerged as a significant contributor to evolutionary rate variation, followed by expression level and protein connectivity.
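To make the PCA step concrete, the following is a minimal sketch that extracts the first principal component from a correlation matrix by power iteration. It assumes that the PCA was run on standardized variables (that is, on the correlation matrix) and that the reported contributions are component loadings; both are assumptions, and the function names and the example matrix are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of equally long numeric columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        standardized.append([(v - mean) / sd for v in col])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in standardized]
            for ci in standardized]

def first_principal_component(matrix, iterations=1000):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(matrix)
    # Non-uniform start vector; power iteration fails only if the start
    # vector is exactly orthogonal to the dominant eigenvector.
    vec = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, vec

def loadings_and_variance(matrix):
    """Loadings of the first component and the fraction of variance it explains."""
    eigenvalue, vec = first_principal_component(matrix)
    loadings = [v * math.sqrt(eigenvalue) for v in vec]
    return loadings, eigenvalue / len(matrix)

# Hypothetical correlation matrix for three variables (illustration only).
example = [[1.0, 0.3, 0.2],
           [0.3, 1.0, 0.1],
           [0.2, 0.1, 1.0]]
print(loadings_and_variance(example))
```

Under these assumptions the explained fraction equals the sum of squared loadings divided by the number of variables. The reported CORE values are consistent with this reading: (0.77² + 0.75² + 0.40²)/3 ≈ 0.44, close to the reported 43%.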
In the later sections of the paper we deal with the role of the features of the interacting partners in modulating the evolutionary rates of proteins in the yeast protein-protein interaction network, since this has recently been considered an important force in protein evolution [18]. However, our PCA result motivates us to re-examine this finding while taking into account the contribution of an additional parameter, the complex number, which has not been examined so far.
Complex forming DF proteins evolve slower than SF proteins
In general, all biological processes require precise organization of molecules and complexes, which are the fundamental units of macromolecular organization [24]. It has also recently been reported that the assembly of proteins into stable protein complexes plays a fundamental role in the operation of the cell, and that the genes coding for protein pairs that participate in the same protein complex are conserved [25]. We scanned both the CORE and FULL datasets to determine the ratio of complex-forming to non-complex-forming proteins in each; the ratio is 0.82 in the CORE dataset and 0.52 in the FULL dataset (two-sided Fisher's exact test, P = 1.60 × 10⁻¹³). From this observation it is clear that the CORE dataset is biased towards complex-forming proteins. The emergence of complex-forming proteins as the main contributor to evolutionary rate variation is further supported by the fact that the proteins in the FULL dataset (3335 proteins) evolve faster than the proteins in the CORE dataset (1741 proteins) (Mann-Whitney U test, P = 1.50 × 10⁻²).
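The comparison of complex-forming proportions between two datasets can be sketched as a two-sided Fisher's exact test on a 2 × 2 table. The sketch below assumes the common convention of summing the probabilities of all tables that are no more likely than the observed one; the function name and the counts in the example are hypothetical and are not the counts underlying the reported P value.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at most as probable as the observed one
    # (small tolerance guards against floating-point ties).
    total = sum(prob(x) for x in range(low, high + 1)
                if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical counts: rows are datasets, columns are
# complex-forming and non-complex-forming proteins.
print(fisher_exact_two_sided(80, 100, 50, 100))
```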
Previously, Makino and Gojobori (2006) showed that DF proteins evolve slower than SF proteins in the yeast PPI network irrespective of connectivity. We also observed that DF proteins evolve slower than SF proteins in the CORE dataset, whereas no such difference was found in the FULL dataset [Figure ]. Since the CORE dataset contains a larger proportion of complex-forming proteins, we reanalyzed our observation by splitting both the CORE and FULL datasets into two groups, viz., complex-forming and non-complex-forming proteins. In the CORE dataset, 524 out of 1094 SF proteins and 259 out of 616 DF proteins can act as subunits of protein complexes; in the FULL dataset, the corresponding numbers are 687 out of 1516 SF proteins and 427 out of 1528 DF proteins. We did not find any significant difference in evolutionary rate between SF and DF proteins in the non-complex group in either dataset, but complex-forming SF proteins evolve faster than complex-forming DF proteins in both the CORE and FULL datasets [Figure ]. This observation suggests that the evolutionary rate difference between SF and DF proteins is primarily attributable to the complex-forming proteins present in the PPI network. We next explored the relationship between the complex-forming ability of the DF and SF proteins and their evolutionary rates. For this, we counted, for each DF/SF protein, the number of complexes in which it can participate as a subunit and labeled this number the complex number of the protein. We performed Spearman's rank correlation analysis and observed that the complex number correlates negatively with the protein distance (CORE: ρ = -0.156, P = 1.10 × 10⁻⁵; FULL: ρ = -0.150, P = 1.00 × 10⁻⁶) as well as with the coefficient of functionality (CORE: ρ = -0.083, P = 2.00 × 10⁻²; FULL: ρ = -0.171, P = 1.00 × 10⁻⁶). Thus, we infer that DF proteins are more likely to be part of protein complexes, which might be a decisive factor in lowering their evolutionary rates.
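The complex number defined above is a simple count, which the following minimal sketch illustrates. It assumes that complex membership is available as a mapping from a complex identifier to its subunits; the function name, the complex identifiers and the protein identifiers are hypothetical.

```python
from collections import Counter

def complex_numbers(complexes, proteins):
    """Number of complexes in which each protein appears as a subunit."""
    counts = Counter()
    for subunits in complexes.values():
        # set() ensures a protein listed twice in one complex is counted once.
        for protein in set(subunits):
            counts[protein] += 1
    # Proteins that belong to no complex get a complex number of 0.
    return {protein: counts.get(protein, 0) for protein in proteins}

# Hypothetical complex annotation (illustration only).
complexes = {
    "complex_1": ["P1", "P2"],
    "complex_2": ["P2", "P3", "P3"],
}
print(complex_numbers(complexes, ["P1", "P2", "P3", "P4"]))
```

Proteins with a complex number of zero form the non-complex-forming group, and all others form the complex-forming group.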
Figure 1. Evolutionary rates of SF and DF proteins. The figure shows the average evolutionary rate of SF and DF proteins in the CORE and FULL datasets; C denotes CORE, CC denotes CORE Complex, CN denotes CORE Non-complex, F denotes FULL, FC ...
Highly expressed proteins are known to be more conserved than proteins expressed at low levels [5]. We obtained comparable results: in the CORE dataset, SF proteins have lower expression levels than DF proteins (Mann-Whitney U test, P = 4.00 × 10⁻³), whereas no significant difference (Mann-Whitney U test, P = 3.10 × 10⁻¹) was observed in the FULL dataset, mirroring the trend observed for evolutionary rate differences (Table ). Moreover, complex-forming SF proteins have a significantly lower average expression level than their DF counterparts in both the CORE and FULL datasets, which is not observed for the non-complex-forming SF and DF proteins (Table ).
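The group comparisons reported above rely on the Mann-Whitney U test. The following is a minimal sketch of that test using the normal approximation with a tie correction and no continuity correction; this is an assumption, since the software and settings used for the reported P values are not stated in this section. The function name and the expression values are hypothetical.

```python
import math

def mann_whitney_u(sample_a, sample_b):
    """U statistic of sample_a and two-sided P value (normal approximation)."""
    n1, n2 = len(sample_a), len(sample_b)
    combined = list(sample_a) + list(sample_b)
    order = sorted(range(len(combined)), key=lambda i: combined[i])
    ranks = [0.0] * len(combined)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and combined[order[j + 1]] == combined[order[i]]:
            j += 1
        size = j - i + 1
        tie_term += size ** 3 - size       # contribution of this tie group
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank for ties
        i = j + 1
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    # Variance with tie correction; it is zero if every value is identical.
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_two_sided

# Hypothetical expression levels of two protein groups (illustration only).
group_sf = [1.2, 0.8, 2.5, 1.9, 0.7, 1.1]
group_df = [2.8, 3.1, 1.5, 4.0, 2.2, 3.6]
print(mann_whitney_u(group_sf, group_df))
```

For very small samples an exact test would be preferable to the normal approximation used here.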
Expression level of SF and DF proteins in both CORE and FULL datasets
The classification of SF and DF proteins was done by considering the functional class assignment of the proteins and their partners in the PPI network. Interestingly, we found a negative correlation between the coefficient of functionality and protein connectivity in both the CORE and FULL datasets (CORE: Spearman's ρ = -0.145, P = 1.00 × 10⁻⁶; FULL: Spearman's ρ = -0.191, P = 1.00 × 10⁻⁶). This correlation suggests that the coefficient of functionality decreases with increasing connectivity, i.e., DF proteins should have more connections than SF proteins. Accordingly, we observed that DF proteins have more connections than SF proteins in both the CORE and FULL datasets (Table ). Thus, the coefficient of functionality is related to protein connectivity in the overall PPI network. The significant positive correlation (CORE: Spearman's ρ = 0.267, P = 1.00 × 10⁻⁶; FULL: Spearman's ρ = 0.270, P = 1.00 × 10⁻⁶) between the complex number and the expression level for the DF and SF proteins signifies that the evolutionary rate of the DF proteins is more constrained, perhaps because of their greater ability to be part of protein complexes. The higher expression levels of the DF proteins are, in turn, possibly due to their participation in a larger number of complexes. It is this interrelationship between the features, viz., the expression level, the complex-forming ability and the coefficient of functionality, that underlies the difference in evolutionary rates between DF and SF proteins.
Connectivity of SF and DF proteins in both CORE and FULL datasets
Complex forming SP proteins evolve slower than DP proteins
The clustering coefficient is a small-scale property of the network that addresses the influence of a protein's immediate neighbors on its conservation rate [17]. It has also been reported that proteins tightly clustered in a particular part of the PPI network have more interactions among themselves than with proteins in the rest of the network [26]. We calculated the protein distance of yeast dense part (DP) and sparse part (SP) proteins. An earlier study showed that SP proteins evolve slower than DP proteins [18]. In contrast to this observation, our results show no significant difference in protein distance between DP and SP proteins in either the CORE or the FULL dataset [Figure ]. We also calculated the expression level of the DP and SP proteins and found no significant difference in expression level between DP and SP proteins in either dataset (Table ). The clustering coefficients are determined from the degree distribution of the protein itself in the interaction network (see Methods). We therefore examined the relationship between the clustering coefficient and the connectivity of the proteins in the network and, as expected, found a positive correlation between these two parameters (CORE: ρ = 0.169, P = 1.00 × 10⁻⁶; FULL: ρ = 0.445, P = 1.00 × 10⁻⁶) for the DP and SP proteins taken together. This reflects the fact that DP proteins are those with high clustering coefficients resulting from their higher connectivity in the protein-protein interaction network, and they are designated DP proteins because they are located in the dense part of the network.
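For readers unfamiliar with the quantity, the sketch below computes the standard local clustering coefficient of a node, C = 2e / (k(k - 1)), where k is the number of neighbors and e is the number of links among them. Whether this matches the exact definition given in Methods is an assumption; the function name and the small network are hypothetical.

```python
def clustering_coefficient(adjacency, protein):
    """Standard local clustering coefficient of one node in an undirected network.

    adjacency maps each protein to the set of its interaction partners and
    is assumed to be symmetric (if B is in adjacency[A], A is in adjacency[B]).
    """
    neighbors = adjacency[protein] - {protein}  # ignore self-interactions
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbors; 0 by convention here
    # Count each link between two neighbors once (u < v avoids double counting).
    links = sum(1 for u in neighbors for v in adjacency[u]
                if v in neighbors and u < v)
    return 2 * links / (k * (k - 1))

# Hypothetical network: a triangle A-B-C with an extra partner D attached to A.
network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
for name in sorted(network):
    print(name, len(network[name]), clustering_coefficient(network, name))
```

The second printed column is the degree (connectivity) of each protein, the quantity with which the clustering coefficient is correlated in the text.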
Figure 2. Evolutionary rates of SP and DP proteins. The figure shows the average evolutionary rate of SP and DP proteins in the CORE and FULL datasets; C denotes CORE, CC denotes CORE Complex, CN denotes CORE Non-complex, F denotes FULL, FC ...
Expression level of SP and DP proteins in both CORE and FULL datasets
However, in the previous section we saw that the evolutionary rate differences between the SF and DF proteins can be attributed to their complex-forming ability. We therefore classified the DP and SP proteins into complex-forming and non-complex-forming groups and calculated the evolutionary rates of the complex-forming DP and SP proteins [Figure ]. From Figure , it is evident that the average protein distance is significantly higher in complex-forming DP proteins than in complex-forming SP proteins in both the CORE and FULL datasets (Mann-Whitney U test, CORE: P = 7.80 × 10⁻⁵; FULL: P = 3.90 × 10⁻⁵). This shows that complex-forming ability is an important factor controlling the evolutionary rate of the SP and DP proteins, since the protein distances of non-complex-forming SP and DP proteins do not differ significantly. The complex-forming SP proteins are also more highly expressed and more highly connected than their DP counterparts (Tables , ).
Connectivity of SP and DP proteins in both CORE and FULL datasets
The number of protein complexes in which a protein participates (i.e., the complex number) was calculated for each DP and SP protein. In the CORE dataset, 289 out of 483 DP proteins and 316 out of 692 SP proteins participate in protein complex formation. In the FULL dataset, 519 out of 916 DP proteins and 569 out of 1901 SP proteins act as subunits of a protein complex. In our study, the number of complexes of which an SP/DP protein is a subunit varies inversely with its evolutionary rate [CORE: Spearman's ρ (complex number, evolutionary rate) = -0.169, P = 2.80 × 10⁻⁵; FULL: Spearman's ρ (complex number, evolutionary rate) = -0.150, P = 1.00 × 10⁻⁶], emphasizing the influence of complex-forming ability on the evolution of SP and DP proteins. Moreover, DP proteins participate in fewer complexes than SP proteins, as is evident from correlation analysis [CORE: Spearman's ρ (complex number, clustering coefficient) = -0.214, P = 1.00 × 10⁻⁶; FULL: Spearman's ρ (complex number, clustering coefficient) = -0.119, P = 8.60 × 10⁻⁵]. We also observed a significant positive correlation between expression level and complex number for the DP and SP proteins [complex number, expression level: CORE = 0.241, P = 1.00 × 10⁻⁶; FULL = 0.259, P = 1.00 × 10⁻⁶]. Thus, complex-forming ability is a significant constraint acting on the SP proteins that lowers their evolutionary rate and consequently raises their expression level in comparison to the DP proteins.