During the last several years a substantial amount of information on large-scale structure of intracellular regulatory and signaling networks has been accumulated. However, the growth in our understanding of how these networks function in a robust and specific manner was lagging behind the shear rate of data acquisition.
To be able to understand the biological functioning or even to efficiently visualize a complex regulatory and signaling network it is important to determine the dominant direction of the information flow and to identify the links that go against this flow and thus generate feedback loops. Ordering a network in such as way that the information cascades down from higher to lower hierarchical levels can help to detect its previously unknown inputs and outputs, to track sources of perturbations based on their observable downstream effects, etc. A simple-minded hierarchical layout of a densely interconnected network is often impossible due to a ubiquitous presence of feedback loops. Indeed, all nodes in a strongly connected component of a network by definition are simultaneously upstream and downstream of each other.
However, if the forward flow of information in the network along multiple channels dominates over the backward flow along relatively few feedback links, the proper hierarchical layout could still be reconstructed based on the network topology alone. The identification and removal of a small number of feedback links would enable one to perform the hierarchical layout of the remaining acyclic network. In the next section we introduce a new probabilistic algorithm to detect an optimal hierarchical layout which minimizes the number of feedback links going from lower to higher levels in the hierarchy. In addition to direct biological applications, this algorithm provides a new computational approach to one of the 21 classic Karp's NP-hard problems: finding the Minimum Feedback Arc Set in a directed graph [1
]. This problem enjoys a seemingly everlasting popularity reflected in a substantial number of approximate solutions (see, for example, [2
] and references therein). It has also been shown that the Minimum Feedback Arc Set problem, apart from being NP-complete, is also APX-hard, which means that there exists a constant k
> 1 (often called the approximation factor) such that there is no polynomial-time approximation algorithm that always finds a link set at most k
times bigger than the optimal result. The first polynomial approximation algorithm for the feedback arc set problem was designed by Leighton and Rao [4
] with the approximation factor k
) where N
is the number of graph vertices. This estimate for the approximation factor was improved by Seymour to
]) for polynomial-time algorithms.
Many of the existing approximate algorithms to the Minimal Arc Set problem are deterministic and greedy in nature, which on one hand side allows a fairly precise prediction of their complexity, performance and precision, yet on the other hand side involves consequences such intrinsic "near-sightedness" and frequent inability to find the global optimum among the local ones. A good example of such deterministic greedy algorithm which also relates the minimum feedback arc set search to graph layout is presented in [6
]. According to this algorithm the hierarchical level of a node is determined by the difference between its out-and in-degrees. This way, the nodes with larger than average out-degrees and/or smaller than average in-degrees are naturally placed among the top hierarchical layers. These nodes directly or indirectly control other nodes with larger than average in-degree and/or smaller than average out-degree. The number of operations in this approximate algorithm scales linearly with the number of graph links.
Our proposed algorithm, unlike the existing algorithms described above, is probabilistic in nature. In the limit of sufficiently slow and long annealing it has a good chance to converge to the actual solution of the minimum feedback arc set problem. At the same time it still requires only a polynomial number of operations, which is proportional to the product of the number of links and vertices in the graph. To evaluate the advantages of the proposed annealing algorithm and to reveal its distinction from the deterministic algorithms, we compare it to our own greedy deterministic algorithm. This algorithm sequentially cuts the links that belong to the largest number of cycles in the network. We found that the probabilistic simulated annealing algorithm generally outperforms the deterministic one in both the number of removed feedback links (which needs to be minimized) as well as in the speed and memory requirements. A simple visual example is provided for the situation when the deterministic greedy algorithm is non-optimal.
Following that, we discuss biological implications and applications of our findings as well as how additional constraints such as a priori knowledge of the function and, therefore, the hierarchical position of certain nodes affects the resulting layout.