Monitoring the changes in gene expression, protein levels and post-translational modifications of proteins is crucial to better understand signaling pathways, the mechanisms of action of drugs and the changes that occur during disease. Many approaches are available for the analysis of individual types of data. However, integration and curation of multiple ‘omic’ data sources are more challenging (1). Typically, there is little overlap among the proteins and genes identified by different methods (2), and the overlap between experimental findings and known pathways is very low. Previous publications have shown that a more coherent view of the underlying biological processes can be obtained by using a network approach in which the hits from ‘omic’ experiments are mapped onto a network of protein–protein interactions.
Given the high rates of false positives and negatives in these data, the resulting networks are frequently too large and noisy to interpret directly, leading to a number of algorithmic approaches to the problem (3–16). Flow optimization has been used to reconstruct pathways by inferring genetic hits and transcriptional data (16), and this method is available as a web server (9). In another approach, transcriptional data obtained from gene knockout experiments are integrated with the interactome, and causal paths are identified by linear programming (11). Bayesian networks have been used to integrate siRNA data in insulin signaling (13) and to combine copy number and gene expression data for finding drivers in diseases (3). A maximum-likelihood-based approach has been applied to reveal causal paths in transcriptional regulation by integrating gene knockout experiments (15). Other approaches include network inference from gene expression (6), electric circuits (8) and network propagation (14).
In our previous work, we have shown that one successful approach to this problem is constrained optimization to identify a subset of the ‘omic’ hits that are connected directly or indirectly by high probability interactions (7). This was achieved by searching for the solution to the prize-collecting Steiner tree (PCST) problem (4). In this approach, the detected proteins/genes in experiments are defined as ‘terminal nodes’ and we seek to connect them to each other either directly or through other undetected proteins (Steiner nodes) using protein–protein and protein–gene interactions. A critical feature of the algorithm is that we do not require it to connect all the terminal nodes. Rather, we seek a network composed of high-confidence edges that ultimately link a subset of the termini. To identify this network, we assign costs to each interaction reflecting our confidence that the interaction is real. In addition, we assign penalties to the terminal nodes based on our confidence in the proteomic or transcriptional data. The PCST algorithm identifies a relevant subnetwork by simultaneously minimizing the cost of edges included in the tree and the penalties of terminals that are excluded.
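The trade-off in the PCST objective can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the node names, the edge costs, the penalties and the function names are invented, and the exhaustive search is practical only for toy graphs. It is not the solver behind SteinerNet; it only evaluates the stated objective (cost of included edges plus penalties of excluded terminals) over every candidate tree.

```python
from itertools import combinations

# Hypothetical toy network: A, B and C are terminals; S is an undetected (Steiner) node.
# Edge costs are low for high-confidence interactions (invented values).
EDGE_COST = {("A", "S"): 0.2, ("S", "B"): 0.2, ("A", "B"): 0.9, ("B", "C"): 1.5}
# Penalty paid when a terminal is left out of the tree (invented values).
PENALTY = {"A": 1.0, "B": 1.0, "C": 0.5}


def is_tree(nodes, edges):
    """Return True if `edges` form a connected, acyclic graph spanning `nodes`."""
    if len(edges) != len(nodes) - 1:
        return False
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:  # depth-first search to check connectivity
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == set(nodes)


def pcst_objective(tree_nodes, tree_edges, edge_cost, penalty):
    """Cost of the included edges plus the penalties of the excluded terminals."""
    edge_term = sum(edge_cost[e] for e in tree_edges)
    penalty_term = sum(p for n, p in penalty.items() if n not in tree_nodes)
    return edge_term + penalty_term


def brute_force_pcst(edge_cost, penalty):
    """Enumerate every tree in the graph and return the one with the lowest objective."""
    all_nodes = {n for e in edge_cost for n in e} | set(penalty)
    # Trees without edges: the empty tree and each single node.
    candidates = [(set(), ())] + [({n}, ()) for n in sorted(all_nodes)]
    edges = sorted(edge_cost)
    for k in range(1, len(edges) + 1):
        for subset in combinations(edges, k):
            nodes = {n for e in subset for n in e}
            if is_tree(nodes, subset):
                candidates.append((nodes, subset))
    return min(candidates, key=lambda c: pcst_objective(c[0], c[1], edge_cost, penalty))


best_nodes, best_edges = brute_force_pcst(EDGE_COST, PENALTY)
best_score = pcst_objective(best_nodes, best_edges, EDGE_COST, PENALTY)
print(sorted(best_nodes), best_edges, best_score)
```

With these invented numbers, the cheapest tree joins A and B through the Steiner node S rather than through their direct but low-confidence edge, and it leaves terminal C out because the only edge that reaches C costs more than C's penalty. This mirrors the two properties described above: undetected proteins can be recruited, and not every terminal has to be connected.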
The PCST approach has been evaluated on the phosphoproteomic and transcriptional data representing the well-characterized yeast pheromone response. The benefits of this approach are that (i) it is robust to noise in experiments, (ii) it integrates both transcriptional and proteomics data in a single pipeline, (iii) the resulting optimum tree contains functionally correlated components and (iv) it contains relevant signaling proteins and transcription factors that were undetected in experiments.
Here, we present SteinerNet, a web application that makes this method (7) available in a user-friendly way. SteinerNet takes a set of proteins/genes for processing and returns an optimum and compact network from these data. It contains a visualization panel that provides the user with a basic display of the optimum Steiner tree and a download panel that provides links to download all the analyses and results. We believe that SteinerNet will help researchers integrate their high-throughput data to reveal hidden components in their specific system and to obtain biologically meaningful compact networks.