Hubs are ubiquitous network elements with high connectivity. One of the common observations about hub proteins is their preferential attachment, which leads to a scale-free network topology. Here we examine the question: does a rich protein always get richer, or can it get poor too? To answer this question, we compared similar and well-annotated hub proteins in six organisms, from prokaryotes to eukaryotes. Our findings indicate that hub proteins retain, gain or lose connectivity depending on the context. Furthermore, the loss or gain of connectivity appears to correlate with the functional role of the protein in a given system.
In a network, hubs are nodes with a large number of links, surrounded by nodes with just a few connections. Hubs have important roles, not only in information management within a network but also as regulatory molecules (Rodriguez-Caso et al. 2005). Protein–Protein Interaction (PPI) networks, in their general form, consist of a small number of hubs and a large number of sparsely connected non-hub proteins. Recent evidence points to a preponderance of structural disorder in hub proteins compared with non-hub proteins (Haynes et al. 2006; Singh et al. 2007).
A notable feature of PPI networks is their power-law, scale-free character. The uneven distribution of connectivity has been ascribed to the preferential attachment of hub proteins (Barabasi and Albert 1999; Barabasi and Oltvai 2004). According to this model, the probability that a new node connects to an existing node is proportional to the connectivity (i.e., vertex degree) of the existing node. Thus, a highly connected node has a greater chance of attracting a new node than a sparsely connected node, i.e., rich gets richer. Interestingly, enzymes that are candidates for horizontal gene transfer seem to have higher average connectivity than other enzymes (Light et al. 2005). Furthermore, protein age seems to correlate with network connectivity (Eisenberg and Levanon 2003). Saturation and viability have also been reported as properties of networks showing preferential attachment (D’Souza et al. 2007). Though preferential attachment of hub proteins has been studied in the past, its origin and the functional implications of ‘rich-gets-richer’ are unclear. Interestingly, Pagel et al. (2007) propose a variable rate of attachment model according to which power-law scaling can be achieved without preferential attachment.
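The degree-proportional attachment rule described above can be illustrated with a small simulation. This is only a minimal sketch: the function name `grow_network`, the seed network of two linked nodes, and the choice of one new link per added node are illustrative assumptions, not part of the study.

```python
import random

def grow_network(n_nodes, seed=0):
    """Grow a toy network one node at a time (hypothetical helper).

    Each new node attaches to ONE existing node, chosen with probability
    proportional to that node's current degree (an assumed simplification).
    """
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}  # assumed starting point: a single edge 0-1
    # Every node appears in `stubs` once per edge end, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes.
    stubs = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(stubs)
        degree[new] = 1
        degree[target] += 1
        stubs.extend([new, target])
    return degree

degree = grow_network(2000)
mean_k = sum(degree.values()) / len(degree)
print("largest degree:", max(degree.values()), "mean degree:", round(mean_k, 2))
```

In such a run a few early nodes typically accumulate many links while most nodes keep a single link, which is the 'rich gets richer' behaviour the model predicts.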
In this study we asked: Is preferential attachment universal to all protein-protein interaction networks? Are there any exceptions? What is the functional implication of the preferential attachment model, or the lack of it? To answer these questions, well-annotated and evolutionarily conserved hub orthologs in several organisms were identified. Six protein–protein interaction (PPI) databases were integrated to minimize the effect of data bias. Stringent criteria were followed to ensure the accuracy of the sample. Overall, our work supports the existing notion of “rich-gets-richer”. However, not every hub protein that we studied followed the preferential attachment model. Several examples of ‘rich-proteins-getting-poor’ were observed. Based on these findings, we propose a hypothesis of hub convertibility, i.e., a hub protein can get rich or poor, depending on the context. Our findings also point to a functional correlation with the loss or gain of hubness.
Six organisms, H. pylori, E. coli, S. cerevisiae (Yeast), C. elegans (Worm), D. melanogaster (Fly) and H. sapiens (Human), were selected for the present study because of their rich annotation and interaction data. The publicly available protein-protein interaction databases (DIP, Salwinski et al. 2004; BIND, Bader et al. 2003; IntAct, Hermjakob et al. 2004; Reactome, Joshi-Tope et al. 2005; HPRD, Mishra et al. 2006; MINT, Chatr-aryamontri et al. 2007) were integrated. We used exact sequence homology to overcome the problem that each database uses its own accession identifiers. Using this meta-dataset, a sample of non-redundant proteins and their interactions was extracted (Table 1). In all, 44,149 proteins were collected from 100,915 accessions and 205,835 interactions were collected from 260,449 accessions. The average overlap for proteins was 1.95 and the average overlap for the interactions between them was 1.26.
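The idea of using exact sequence identity to reconcile database-specific accessions can be sketched as follows. The record layout, the function name `merge_by_sequence`, and the accession strings and sequences are all hypothetical placeholders; the actual integration pipeline is not described in enough detail to reproduce here.

```python
from collections import defaultdict

def merge_by_sequence(records):
    """Group (database, accession, sequence) records by exact sequence.

    Returns {sequence: set of (database, accession)}; each key stands
    for one non-redundant protein.
    """
    merged = defaultdict(set)
    for database, accession, sequence in records:
        # Normalise case and whitespace so identical sequences compare equal.
        merged[sequence.strip().upper()].add((database, accession))
    return dict(merged)

# Made-up accessions and sequences, for illustration only.
records = [
    ("DIP", "DIP-0001", "MKTAYIAKQR"),
    ("IntAct", "EBI-0001", "mktayiakqr"),
    ("MINT", "MINT-0001", "MSLLNVPAGK"),
]
proteins = merge_by_sequence(records)
print(len(records), "accessions collapse to", len(proteins), "proteins")
```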
The InParanoid algorithm (O’Brien et al. 2005), with the BLOSUM45 substitution matrix for prokaryotes and BLOSUM60 for eukaryotes, was used to identify orthologs. The pairwise orthologs were combined into multi-species clusters using the MultiParanoid algorithm (Alexeyenko et al. 2006). To separate orthologs from spurious matches, a cut-off of 50 bits was used, and the matching segment of the longer sequence had to exceed 50% of its total length. The ortholog data were also examined using the COG and KOG protein family classifications.
We selected a fold change definition for connectivity, kfc = k/⟨k⟩, where k is the connectivity of a node and ⟨k⟩ is the average connectivity of its network, and an appropriate cutoff to identify hubs in the different PPI networks. For prokaryotes, a node with kfc ≥ 2 was considered a hub (cutoff, P < 0.03 using the distribution of standard normalized kfc values in Ortho_Pk). For eukaryotes the criterion was kfc ≥ 10 (with P < 0.001). A higher cutoff of 10-fold change was used for eukaryotes in order to minimize the effect of false positives, especially in data from S. cerevisiae. Of the orthologs identified in each category (Ortho_PkEk, Ortho_Pk and Ortho_Ek), only those satisfying the stringent hub criteria in at least one species were selected. A hub was considered ‘converted to non-hub’ when its kfc value fell below the cutoff. In order to define the nature of this hub convertibility, we observed trends along the complexity profile of the six species (H. pylori, E. coli, S. cerevisiae, C. elegans, D. melanogaster and Homo sapiens).
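Reading kfc as a protein's connectivity divided by the average connectivity of its network, the hub criterion can be sketched as below. The function names, the `CUTOFF` table and the toy degree values are illustrative assumptions; only the two cutoffs (2 for prokaryotes, 10 for eukaryotes) come from the text.

```python
# Fold-change cutoffs stated in the text.
CUTOFF = {"prokaryote": 2.0, "eukaryote": 10.0}

def fold_change(degrees):
    """Return kfc = k / <k> for every protein in one network."""
    mean_k = sum(degrees.values()) / len(degrees)
    return {protein: k / mean_k for protein, k in degrees.items()}

def find_hubs(degrees, group):
    """Proteins whose fold change meets the cutoff for the given group."""
    kfc = fold_change(degrees)
    return {protein for protein, value in kfc.items() if value >= CUTOFF[group]}

# Toy network (made-up degrees); the mean degree is 4.
toy = {"A": 12, "B": 2, "C": 1, "D": 1, "E": 4}
print(find_hubs(toy, "prokaryote"))
print(find_hubs(toy, "eukaryote"))
```

Because kfc is relative to each network's own average, the same protein can be compared across networks whose average connectivity differs severalfold.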
Three classes of hubs were identified from the data: (a) the “getting rich” hubs showing increasing kfc, (b) the “getting poor” hubs showing decreasing kfc and (c) “flexible” hubs with a non-uniform connectivity trend across the organisms. The GO annotation for each protein was obtained from the source database (BIND, SGD, Flybase, Wormbase, HPRD). The resulting GO annotations were grouped into 10 top-level molecular functions, namely: catalytic activity, structural molecule activity, transporter activity, binding, antioxidant activity, chaperone regulator activity, enzyme regulator activity, transcription regulator activity, translation regulator activity and molecular transducer activity.
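One simple way to express the three hub classes is to look at the direction of change in kfc along the ordered list of species. The rule below (monotonic increase, monotonic decrease, anything else) is an assumed operationalisation for illustration; the exact rule used to assign classes is not stated, and the profiles shown are made up.

```python
def classify_trend(kfc_profile):
    """Classify a kfc profile ordered from simplest to most complex species.

    Assumed rule: never decreasing (and rising at least once) is
    'getting rich', never increasing (and falling at least once) is
    'getting poor', everything else is 'flexible'.
    """
    steps = [b - a for a, b in zip(kfc_profile, kfc_profile[1:])]
    if all(s >= 0 for s in steps) and any(s > 0 for s in steps):
        return "getting rich"
    if all(s <= 0 for s in steps) and any(s < 0 for s in steps):
        return "getting poor"
    return "flexible"

print(classify_trend([0.5, 1.0, 2.0, 4.0, 8.0, 12.0]))
print(classify_trend([6.0, 5.0, 3.0, 1.0, 0.5, 0.2]))
print(classify_trend([1.0, 5.0, 2.0, 8.0, 1.0, 3.0]))
```

A stricter or noise-tolerant rule (for example, comparing group averages instead of every consecutive step) would shift some proteins between classes.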
Here the aim was to find out whether a hub protein retained the same partners across species. A statistically significant similarity (e-value less than 0.001) was counted as 1 (i.e., partner retained); otherwise it was counted as 0 (i.e., partner changed). By summing up the scores, the total number of evolutionarily conserved interactions was computed. The total was then divided by the connectivity of the individual protein, and an average score was calculated. The statistical significance of the ‘getting rich’, ‘getting poor’ and ‘flexible’ groups was studied using an F-test. The F-value was calculated as:
SSM and SSG are the sums of squared errors from the total mean and the categorical means, respectively. df1 and df2 are the degrees of freedom of SSM and SSG, respectively, with values 20 (21–1) and 2 (3–1). The P-value is obtained from the F-distribution with the respective degrees of freedom (Brandt 1983).
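The partner-retention score described above (1 for a partner with a significant match, 0 otherwise, summed and divided by the hub's connectivity) can be sketched as follows. The input format, a list holding the best e-value found for each partner or `None` when there is no hit, and the function name are assumptions made for this example.

```python
E_VALUE_CUTOFF = 1e-3  # significance threshold stated in the text

def retention_score(best_evalues):
    """Fraction of a hub's partners retained in another species.

    `best_evalues` holds, for each partner, the best e-value against the
    partners of the orthologous hub (None when no hit was found).
    """
    if not best_evalues:
        return 0.0
    retained = sum(
        1 for e in best_evalues if e is not None and e < E_VALUE_CUTOFF
    )
    return retained / len(best_evalues)

# Made-up e-values: two of four partners pass the cutoff.
print(retention_score([1e-20, 0.5, None, 1e-5]))
```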
Table 1 summarizes the PPI network data obtained for the six model species. All the networks exhibit a power-law degree distribution of nodes (p(k) ~ k^(−γ)) with very similar values for the exponent γ. This reveals the comparable scale-free nature of all the PPI networks selected. However, the average connectivity, ⟨k⟩, varies widely across the six species, from 3.8 in H. pylori (HPY) to 13.8 in S. cerevisiae. For ortholog selection, a total of 101 orthologs in the Prokaryotic group (denoted as Ortho_Pk) and 377 in the Eukaryotic group (Ortho_Ek) were identified, of which 21 were common to both (Ortho_PkEk). These groups of orthologs form the basis of our analysis and argument in support of hub convertibility.
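For readers who want to check a power-law exponent on their own degree data, a rough estimate can be obtained from the slope of log p(k) against log k. This is a generic least-squares sketch, not the fitting procedure used for Table 1, and the synthetic distribution below is constructed to follow an exact power law with an arbitrarily chosen exponent.

```python
import math

def fit_power_law_exponent(degree_freq):
    """Estimate gamma in p(k) ~ k^(-gamma) by least squares on log-log data.

    `degree_freq` maps each degree k to its relative frequency p(k).
    """
    pts = [
        (math.log(k), math.log(p))
        for k, p in degree_freq.items()
        if k > 0 and p > 0
    ]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx  # slope is -gamma

# Synthetic, noise-free distribution with exponent 2.5 (arbitrary choice).
synthetic = {k: k ** -2.5 for k in range(1, 51)}
gamma = fit_power_law_exponent(synthetic)
print(round(gamma, 3))
```

Log-log regression is sensitive to noise in the sparsely populated tail of real degree distributions, so estimates from real data should be treated with caution.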
Figure 1 shows the hub connectivity profile (based on kfc) for the six species. Using a cutoff of 2-fold change, all 21 Ortho_PkEk proteins showed hubness in at least one species. Further analysis of the kfc data showed three distinct trends for the 21 core proteins (separately indicated for clarity in Fig. 1). A total of 57.1% of the core proteins showed “flexible” hub convertibility, 14.3% showed a “getting rich” trend and 28.6% showed a “getting poor” trend. It is important to note that the “getting poor” proteins show higher confidence of hub convertibility: their hubness is lost in higher organisms even though the cutoff applied here is far below the limit actually used for eukaryotes (kfc ≥ 10). In order to observe hub conversion between the two major phylogenetic groups, we compared the average kfc (⟨kfc⟩) profiles by grouping the species in each category (Fig. 2). A common cutoff of kfc ~ 2 was used to observe hub convertibility between Prokaryotes and Eukaryotes. Some proteins [proteins 6 and 17 in Ortho_PkEk] were found to be retained as highly connected nodes in both species groups (Hub–Hub). It is evident from their annotation that these two proteins (with functional domains Ribosomal L14 and Elongation Factor, respectively) are essential to protein folding mechanism. Orthologs 12, 15 and 19 (with functional annotations DNAj, ATP_synthase and dehydrogenase) were observed to “get rich” in Eukaryotes. Orthologs 4, 11 and 13 (representing the MSH family, helicase and ClpA protein, respectively) show a “getting poor” trend from Prokaryotes to Eukaryotes.
Using the criteria of fold change cutoff kfc (2 and 10 for Prokaryotes and Eukaryotes respectively), about 25% of Ortho_Pk proteins (25 out of 101) and 12.7% of Ortho_Ek proteins (48 out of 377) were found to exhibit hubness. It was difficult to establish meaningful conclusions for the remaining orthologs exhibiting significant connectivity variations. These nodes were below the connectivity threshold set by hubness criteria for all the organisms.
Even though the two organisms (H. pylori, HPY, and E. coli, ECL) belong to the same superkingdom, Eubacteria, they exhibit different patterns of hubness for the conserved proteins (Fig. 3a). Thirteen of the 25 potential hub candidates from the Ortho_Pk list exhibit differential hubness. Eight core proteins (32%) are significantly richer (seven times higher) in HPY than in ECL, showing a “getting poor” trend. Five proteins (20%) get richer in ECL (kfc values 10 times higher than in HPY), and the remaining 12 proteins persist as hub nodes in both networks. Figure 3b, which shows the average kfc profiles for these three groups, summarizes these observations for prokaryote hub convertibility.
Among the 48 Ortho_Ek hub proteins, 45.8% (22 out of 48) showed a “getting rich” pattern, whereas 39.6% (19 out of 48) showed decreasing hubness across the four species (Fig. 4a). The gain and loss of hubness indicate a significant change of connectivity in several organisms, with the average kfc profiles (Fig. 4b) indicating an order of magnitude difference. Hubness trends for the “flexible” nodes show a large deviation in connectivity, possibly due to the presence of a higher number of false positives in the Yeast and Human data. The “getting poor” and “getting rich” hub convertibility trends are stable, with smaller standard deviations. The hub connectivity profile shows a distinct trend among the various organisms (Fig. 5a–d).
Figure 6 shows the percentage of protein-partners with at least one significantly similar (E-value of less than 0.001) partner protein. The mean values and standard deviations for each protein class are shown. The figure suggests that proteins in the getting rich and getting poor categories are significantly higher in retaining protein-partners than the proteins in the flexible category.
The relative abundance of molecular functions for the six species in each category was also studied (Fig. 7a). Figure 7b shows the changes in average functional counts from H. pylori to Human for the ‘getting rich’, ‘flexible’ and ‘getting poor’ categories. The changes reflect the number of annotations documented for a protein. Our work broadly suggests some functional meaning of the protein-partnership trend. However, more studies are needed to address this issue in depth.
The ability to understand information-flow in bio-molecular networks is one of the key goals in systems biology. Hubs are central to this process of information management, as they literally ‘hold the networks together’.
Originally introduced by Barabasi and Albert (1999) and followed by several interesting papers (Barabasi and Oltvai 2004; Nacher and Akutsu 2007), scale-freeness has been widely accepted as a generic model of networks exhibiting power-law distributions. However, there are some reports of protein interaction networks not conforming to the power law (Khanin and Wit 2006; Tanaka et al. 2005). It is further argued that some published PPI networks are better described by an exponential function, and proponents of this approach recommend using rank plots instead of frequency-degree plots (Tanaka et al. 2005). Furthermore, it has been demonstrated that a power-law degree distribution is equivalent to a power-law degree-rank function only if the scaling exponent is greater than 2 (Wu et al. 2008). It is important to recognize that the rich-getting-richer paradigm is not the only mechanism leading to scale-free networks (Li et al. 2005). Finally, the scale-free topology of existing protein-protein interaction networks may not be confidently extrapolated to complete interactomes (Han et al. 2005).
We addressed the paradigm of ‘rich-getting-richer’ from a different perspective. Our aim was to see whether the rich always get richer and, if not, what happens when hub proteins lose most of their links. As a first step, data from six protein-protein interaction databases were integrated to create a reasonably large and varied sample. Sequence homology was used to eliminate redundancy among hub proteins. A protein node was classified as a hub or non-hub based on the extent of its connectivity. Several criteria have been used to define hubness, depending on the type of network analysis. For example, Barabasi and Albert (1999) suggest that hub nodes in scale-free networks generally exhibit connectivity, k, an order of magnitude higher than the average vertex degree ⟨k⟩ of the network. Unfortunately, such measures cannot be generalized to biological networks exhibiting modularity. Han et al. (2004) used a hub criterion of k ≥ 5 (in a network with average vertex degree ⟨k⟩ = 3.6). Such arbitrary cutoffs can lead to misclassification of nodes in large networks with potential false positive interactions. A single criterion based on a k cutoff is misleading, since even a non-hub node in a dense network might have more edges than a hub node in a less dense network. The Z score cutoff (≥2.5), based on the standardized normal distribution of connectivity values (k), has also been used to establish significant hub nodes (Ekman et al. 2006; Guimera and Nunes Amaral 2005). The Z score definition is inappropriate in our study, since different networks exhibit different degree distributions with varying ⟨k⟩.
One of the confounding factors in studying PPI networks is the low quality of experimental data (Gentleman and Huber 2007; Jensen and Bork 2008). This can be particularly worrying if one relies too much on a particular database or attempts to integrate several independently constructed databases (Alexeyenko and Sonnhammer 2009). To address the issue of data integrity, we adopted a stringent metric of 10-fold change to identify hubs and minimize false positives, mainly for eukaryotes. Thus, in theory, even if 50% of the edges turn out to be false positives, the protein will still show a vertex degree high enough to qualify for the standard definition of hubness. A smaller cutoff was, however, used for prokaryotic hub proteins, given the relatively smaller ⟨k⟩ and smaller size of their networks.
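The robustness argument above can be written out as a short worked example. It assumes that only the hub's own edges are halved while the network average ⟨k⟩ is held fixed; if false positives were removed across the whole network, ⟨k⟩ would shrink as well and the fold change would fall by less.

```latex
k_{fc} = \frac{k}{\langle k \rangle} \ge 10
\quad\Longrightarrow\quad
k'_{fc} = \frac{0.5\,k}{\langle k \rangle} = 0.5\,k_{fc} \ge 5
```

Under this assumption, a eukaryotic hub that loses half of its edges still has a connectivity at least five times the network average.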
The current dataset includes proteins representing both physical and functional interactions to reduce data bias (Han et al. 2004). To ensure that we were studying the same protein in different organisms, a set of conserved proteins (orthologs) was extracted across all six species. A stringent threshold of 50 bits, with the matching segment exceeding 50% of the total sequence length, was adopted. The ortholog data were also examined using the COG and KOG protein family classifications.
Our observation of proteins “getting rich” supports the preferential attachment model for all scale-free networks (Barabasi and Albert 1999; Qian et al. 2001). However, we also found instances of rich-proteins-getting-poor. Interestingly, the existing network growth models do not consider the possibility of fluctuating connectivity in hub proteins. Even if we take into account the yeast S. cerevisiae, where the average connectivity is higher, the results are still significant at the 10-fold change cutoff. Our findings support the concept of hub convertibility, i.e., loss, retention and gain of hubness depending on the context.
We further asked whether the getting-rich or getting-poor hub proteins retain their core protein partners across all the species. A total of 380 protein-partners were found to exhibit significant sequence similarity (e-value < 0.001). Among them, 24 partners were shared across more than two species. Although the difference was statistically non-significant (P-value = 0.13), the getting-rich proteins tend to maintain their interacting partners, thereby reflecting “an intrinsic scale-free design” of larger networks. However, from these data we could not identify an interaction partner that was orthologous in all six species. Interestingly, the functional spread of the hub proteins decreases as their partners gradually decline in number. The present hub convertibility data suggest that several ancient proteins (hub nodes in Prokaryotes) are conserved, i.e., remain as orthologs in eukaryotes despite the loss of hubness. One reason for this observation could be their key roles in the cell (Kunin et al. 2004), irrespective of their observed connectivity patterns.
In future, it would be interesting to address the following questions: (i) How do new hubs arise in the networks? (ii) Which cellular decisions determine the ‘retirement’ of proteins that previously existed as hubs? (iii) Is there any protein-structure basis for hub convertibility? (iv) How do networks compensate when rich proteins get poor? (v) How is robustness maintained in view of fluctuating connectivity trends in hub/non-hub proteins? (vi) Are some proteins more susceptible to hub convertibility than others and, if so, why? (vii) Can the ‘conversion potential’ of a protein be predicted from its sequence/structure data? (viii) Does compartmentalization impact the gain or loss of hubness? (ix) Does protein-partner loss/retention impact the maintenance of specific functional modules? (x) How do metabolic needs impact component reuse vis-à-vis retention or invention of novel functional modules in a network?
This work was funded by an intramural grant from RIKEN. P. K. Dhar gratefully acknowledges Dr. Y. Sakaki’s encouragement for this work. R. K. Rao and L. Samavedham gratefully acknowledge the internship support from RIKEN and NUS.
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Kyaw Tun, Email: kyawtun@gsc.riken.jp.
Raghuraj Keshava Rao, Email: raghuraj@nus.edu.sg.
Lakshminarayanan Samavedham, Email: chels@nus.edu.sg.
Hiroshi Tanaka, Email: tanaka@cim.tmd.ac.jp.
Pawan K. Dhar, Email: pkdhar@riken.jp.