Detecting signaling networks based on PPI data
To evaluate the performance of our model, we applied it to extract signaling networks from yeast PPI network. The PPI data were obtained from the DIP database (
9) and also used by Scott
et al. (
4). In this case, the weight in Equation (
1) was defined as the confidence score of PPI. The ILP model was applied to find two known yeast MAPK signaling pathways: pheromone response and filamentous growth. Generally, the PPI networks generated by high-throughput techniques are very large. For example, there are about 4839 proteins and 14 319 interactions in the PPI network used here. In such a network, there are many components that are far from the starting or ending nodes of the signaling network, which do not likely belong to the signaling network. Therefore, the Depth First Search (DFS) algorithm was employed to reduce the network size, remove the obviously irrelevant nodes and restrict the search space into a realistic and meaningful one. The smaller PPI network generated by DFS consists of all possible paths of length 6–9 with overlapping, and the interactions among proteins in this network are all from the original PPI network. Consequently, two smaller PPI networks were generated by DFS for extracting the two MAPK signaling pathways, respectively. Note that this kind of pre-processing has been widely used in the literature (
4–7).
As mentioned in the preceding section, λ controls the size of the detected network. By varying λ from a small value to a large one in Equation (
1), we can obtain signaling networks with different sizes, i.e. from a complicated network to a linear pathway. Numerical results confirm that a detected larger network always covers a smaller one. Therefore, we can have a series of nested networks by changing λ, where a smaller one is viewed to be more likely involved in the STN since it is covered by all larger networks, but a larger one includes more components.
For the pheromone response pathway, our model was applied to find a signaling network starting from membrane protein STE3 and ending at TF STE12, and the curve for determining λ can be found in of the
Supplementary Data. Figure 2A shows the pheromone response pathways detected by color coding (
4) and ILP model (with a large λ, λ = 0.85), respectively. Comparing the results by our method and color coding based on the same dataset (
4), we can see that the signaling pathway detected by our method covers the one by color coding. Furthermore, other proteins, i.e. STE18, FAR1, STE20, CDC42, STE50 and STE11, were also detected by our method.
B shows the signaling network by ILP with a smaller λ (λ = 0.8). Clearly, it covers the one shown in A. This signaling network consists of 19 proteins. By comparing the detected signaling network with those found by Netsearch (
5) and color coding (
4), we can easily verify that most of the components of the three signaling networks are common. Compared with the signaling network of the same size detected by Netsearch (
5), although our model did not find proteins SST2, DIG1, DIG2 and SPH1 (not in the main chain, see ), we detected STE50 that has been identified by the color coding method (
4), and detected STE20 and CDC42, which are involved in the main chain of the pheromone pathway (see ). Compared against the color coding method, our method did not find DIG1 and DIG2, but detected MPT5, which has been identified by the Netsearch method (
4). In particular, the ILP model successfully detected STE20 in the main chain (see ). Furthermore, the proposed method identified two new proteins, i.e. CLN2 and CDC28, where CLN2/CDC28p complexes repress the pheromone signaling (
26,
27). Therefore, the signaling network found by the proposed method is biologically plausible. In addition, shows the number of components shared among the STNs by ILP, Netsearch and color coding. It can be easily seen that most of the detected components are shared among the three methods.
shows the P-values of functional enrichment on 5 GO terms for members in the signaling network found by our proposed method. It can be easily seen that most of the members in the extracted signaling network have similar functions. In addition, the P-value of the uncovered STN obtained with the random networks as background is smaller than 10− 15, which demonstrates the significance of the extracted STN. From the above results, clearly our method is a good complement to the existing algorithms, whereas the ILP model is easy to be implemented and simple to be interpreted.
| Table 2.The P-values of functional enrichment for pheromone response signaling network found by ILP |
For the filamentous growth invasion pathway, our model was applied to detect the signaling network starting from membrane protein RAS2 and ending at TF STE12, and the curve for determining λ can be found in of the
Supplementary Data. Figure 4A, respectively, shows the signaling pathways detected by color coding (
4) and our method (λ = 0.90). It can be seen from A that the signaling pathway uncovered by our method matches the known signaling pathway (see ) to a large extent. The CDC25 and HSP82, which do not appear in the pathway of KEGG, were detected due to the missing link between RAS2 and CDC42 in the PPI network. With the same dataset, the ILP model can find the identical signaling pathway of the same size as that by color coding. In addition, the ILP model found several additional links compared against the color coding method. The additional links may imply alternative signaling pathways, since such redundant mechanisms can compensate single protein disruptions and keep signal transduction unblocked (
1,
2).
B shows the filamentation signaling network of a larger size detected by the ILP model (λ = 0.90). The signaling network consists of 18 proteins, where the proteins CDC25, SPA2, CYR1, FUS3, HSP82 and BEM1 are assumed to be known and involved in the signaling pathway to test the effectiveness of the additional information. Although it is difficult to know exactly all the proteins involved in a signaling pathway, some components and casual relationships in the signaling pathway can be obtained from literature (
1,
2). Actually, it is easy to include those conditions into the formulation of ILP by simply adding linear constraint [i.e. Equation (
6)] for such a case, which is one of the major advantages of the proposed method. It can be seen from B that the detected signaling network matches the one found by Netsearch (
5) to a large extent. The HSC82 detected by Netsearch is not in our network because there is a direct interaction between STE11 and HSP82. The result by ILP does not include proteins ABP1, DIG1, DIG2 and BNI1, but instead two other proteins VRP1 and LAS17 were found because VRP11, LAS17, BEM1, BUD6 and SRV2 occur in the same complex (
28) and may have similar functions. Furthermore, LAS17 forms complex with ABP1 (
28), implying that LAS17 has similar functions as ABP1. In particular, our method found STE20 that is in the main chain of filamentation pathway (see ) while Netsearch failed to detect it.
shows the P-values of functional enrichment on 5 GO terms for members in the signaling network by our method. From , it can be seen that most of the members in the signaling network have similar functions. In addition, the P-value of the extracted STN calculated with random networks as background is smaller than 10− 15, which indicates the significance of the uncovered STN.
| Table 3.The P-values of functional enrichment for filamentation singling network found by ILP |
From the results described earlier, we can see that the proposed ILP model is effective for uncovering signaling networks from only PPI data. Furthermore, the ILP model is simple and flexible for various conditions, and is able to detect the signaling networks directly instead of heuristical multistage procedure like Netsearch and color coding.
Detecting signaling networks based on integrated data
In the previous section, our method works well even by using only PPI data, partly because the confidence scores of yeast PPIs were estimated with high precision. However, PPIs for many organisms have no confidence scores or have not been estimated properly. On the other hand, a tremendous amount of gene expression data are nowadays available, and provide insights into signaling pathways. In this part, we investigated whether the integration of gene expression profiles (microarray data) with PPI data can improve the performance of the proposed method. Different from Detecting signaling networks based on PPI data Section, we directly apply the ILP model to the data without any pre-processing (e.g. DFS) here. In addition to the two signaling pathways described earlier, the proposed method was also applied to detect the cell wall integrity and high osmolarity (HOG) pathways. shows the details of integration of PPI and gene expression datasets used here. For the three pathways including pheromone response, filamentation and cell wall integrity, the DIP Core (
10) dataset was employed. For the HOG pathway, the SPA interaction data constructed by Arga
et al. (
7) were used here because there are many missing interactions in DIP Core data for the HOG pathway. In the integrated data, the weight
wij in Equation (
1) is the absolute value of correlation coefficients based on the gene expression data, where the network structure is determined by PPI. Furthermore, to see the performance of different methods,
precision and
recall were employed in this work, where
precision is defined as the percentage of components detected by the computational methods that are also in the KEGG pathway, and
recall is the percentage of components in the KEGG pathway that are detected by the computational methods. Note that these statistical measures can only be seen as a rough reference, since there is no gold standard for those cases, i.e. the true signaling networks are not available.
| Table 4.Integration of PPIs with gene expression datasets for detecting yeast MAPK pathways |
shows the signaling networks uncovered by the ILP model, where only pathways linking membrane proteins and TFs are illustrated, and the size of each circle in the signaling network is proportional to the sum of scores of the paths that it is involved in. The curves for determining optimal λ can be found in of the
Supplementary Data. Figure 5A shows the pheromone response pathway (λ = 0.50), which contains 34 proteins. It can be seen that all the components in the main chain have been successfully uncovered by our model, especially including CDC42 and STE20, where both CDC42 and STE20 are not found by Netsearch (
5) while STE20 is not found by the color coding (
4). Compared with the signaling networks detected by Netsearch (
5) and color coding (
4), we can see that the ILP model can uncover almost all the components found by the two existing methods except GPA1, SST2 and SPH1, which are not in the main chain (see ) and have low expression correlations (< 0.5) with other members in the signaling network. However, our method successfully identified STE20 and BNI1, where the former is in the main chain and the latter has been confirmed in KEGG (
25). Furthermore, the detected signaling network contains several additional proteins. Among these proteins, it has been confirmed that they are relevant to the pheromone response, i.e. CDC28-CLN1 and CDC28-CLN2 complexes repress the start of pheromone signaling (
26), IQG1 mediates the regulatory effects of CDC42 on ACT1 (
29), RSR1 is the upstream regulator of CDC42p (
30), GIC1p and GIC2p are downstream effectors of the CDC42p small GTPase (
30), GCS1 is GTPase-activating protein (
31), LAS17 is actin assembly factor (
24), BOI1 is implicated in polar growth (
24), and SPA2 forms a complex with BUD6 and BNI1 (
32).
shows the comparison of ILP model with other existing methods including color coding (
4), Netsearch (
5) and Pathfinder (
8) with respect to
precision and
recall. In , we can learn that our proposed method can find the maximum number of components deposited in the KEGG signaling pathway, whereas Pathfinder got the highest precision. However, Pathfinder adopted the reconstructed PPI network in the computation. It can be seen from the results that, despite the simplicity, our method performs comparably well with existing methods. It should be noted that such comparison is not so reliable, since the true signaling networks are not known and KEGG mainly contains linear pathways instead of signaling networks. shows the functional enrichment of the pheromone signaling network, where we can see that most of the members in the signaling network have similar functions. Furthermore, the
P-value of extracted STN calculated with the background random networks is smaller than 10
− 15, which verified the effectiveness of the proposed method and significance of the extracted STN. In addition, STNs of different sizes for pheromone pathway by adjusting λ can be found in Figure 6 of the Supplementary Data.
| Table 5.Comparison of various methods in detecting pheromone signaling network |
| Table 6.The P-values of functional enrichment for pheromone response signaling network found by ILP based on integrated data |
B shows the filamentation signaling network detected by the ILP model starting from RAS2 and ending at STE12 (λ = 0.50), which contains 28 proteins. It can be seen that our method can detect all the components in the main chain except CDC42 due to the incompleteness of the PPI data used in this article. Compared with Netsearch, our method successfully found STE20, but missed ABP1, HSC82, FUS1 and HSP82 that are not in the main chain. Specifically, HSP82 was not detected by our model because it is not included in the PPI network due to its low expression change, HSC82 was not included due to the missing link between CDC25 and HSC82, while FUS1 has not been confirmed to be related to the filamentation signaling pathway. Despite the failure to detect HSP82 and HSC82, our method detects SSA2 that is also stress gene similar to HSP82 and HSC82 (
33). Furthermore, instead of ABP1, we identified three other members including actin assembly factor LAS17, actin-associated protein RVS167 and PFY1. These three proteins are in the same complex with SRV2, BUD6 and ABP1 (
28), which implies that they have similar functions. SPH1 is included due to its strong correlation with STE7 and STE11, where SPH1 activates STE7 (
34). The rest of the proteins are included because these proteins are shared among different pathways and have strong correlations.
shows the comparison among various methods in terms of precision and recall. In this case, the Pathfinder detects the maximum number of components involved in the KEGG filamentation pathway but has the lowest precision, whereas the ILP model performs comparably well with the other methods. This example demonstrates that no method can always perform best and different methods are complementary to each other. shows the functional enrichment of the filamentation signaling network, where we can see that most of the members in the signaling network have similar functions. Furthermore, the P-value of extracted STN calculated with the background random networks is smaller than 10− 15, which also demonstrates the effectiveness of the proposed method and the significance of the identified network. In addition, STNs of different sizes by the ILP is given in Figure 7 of the Supplementary Data.
| Table 7.Comparison of various methods in detecting filamentation signaling network |
| Table 8.The P-values of functional enrichment for filamentation response signaling network found by ILP based on integrated data |
C shows the cell wall integrity signaling network by our method starting from MID2 and ending at RLM1 (λ = 0.15). From the figure, we can see that all the members in the main chain were successfully detected except BCK1. Compared with the one found by Netsearch, the ILP model did not find FKS1, GIC2, ACT1, BUD6, BCK1, SPH1 and SMD3 due to the missing interactions in the PPI network used in this article, but successfully detected two TFs SWI4 and SWI6 that are in the main chain of cell wall pathway (). In addition, our method found several other members including MBP1 that forms a complex with SWI6p (
28), RHO1 effectors SKN7 (
35), and BEM4p involved in the RHO1-mediated signaling pathway (
36). summarizes the performance of our method and Netsearch in detecting cell wall signaling network. Although the true signaling network is not available, the comparison results demonstrate the effectiveness of the proposed method. shows the functional enrichment of the cell wall signaling network found by ILP, where we can see that most of the members in the signaling network have similar functions. Furthermore, the
P-value of extracted STN calculated with the background random networks is 0.002, which implies the significance of the network derived by the proposed method. In addition, STNs of different sizes by the ILP can be found in Figure 8 of the Supplementary Data.
| Table 9.Comparison of ILP with Netsearch in detecting cell wall signaling network |
| Table 10.The P-values of functional enrichment for cell wall integrity signaling network found by ILP based on integrated data |
Figure 5D shows the HOG signaling pathway found by the ILP model starting from SLN1 and ending at HOG1 (λ = 0.90). It can be seen from the figure that the main chain was successfully recovered by our method. Furthermore, the member STE11 involved in the signaling network was also detected. Aside from this, several new links among the members were also discovered, which may correspond to alternative signaling pathways. To further test the performance of the proposed method, we also searched the possible paths of length 6–7 from the same integrated network that has been used by the ILP model. The paths starting from SLN1 and ending at HOG1 were found with DFS algorithm. All the detected paths were ranked by employing pairwise correlation according to the sum of the weights for the edges in the paths as described in (
6,
7). From the ranking list (found in text1 of the Supplementary Data), we can see that the signaling pathway found by the ILP model was ranked at 75. In other words, the HOG pathway cannot be found by simply ranking the possible linear paths in this case although such a strategy is adopted by existing methods (
6,
7). The existing methods utilizing the pairwise correlation between proteins failed to detect the HOG pathway in this case because the HOG pathway is not a linear path and there are additional links among members in the main chain of pathway. In contrast, our method handles the HOG pathway as a global entity and thereby performs better. This example clearly confirms the efficiency and effectiveness of the proposed method.
From the results described earlier, we can see that with the integration of PPI and gene expression profiles, our method is indeed effective for uncovering signaling networks. In particular, many putative components of signaling networks not detected by the existing methods have been identified, e.g. we found STE20 for pheromone and filamentation pathways, and SWI4 and SWI6 for cell wall integrity. Although Pathfinder can also detect STE20, it works by reconstructing the PPI network whereas our method works on the original PPI network. In addition, the overlap between the published results and the ones uncovered by the ILP model confirms the effectiveness and prediction power of the proposed method.