In this section, we describe the hierarchical decomposition method we developed. Section HIDEN describes our method. Section Example demonstrates HIDEN on a simple example. Section Divide and Conquer method describes divide and conquer method we employ to scale HIDEN to larger networks.
HIDEN transforms the hierarchical network decomposition problem to a MIPP [7
]. Given a TRN, HIDEN first constructs a set of linear constraints and a linear optimization function that collectively describe the penalty of the decomposition. Then it uses existing optimizer software to solve the resulting problem. Next, we will explain how we formulate the MIPP.
Let us denote the given network that will be decomposed with
. Let us denote the nodes (i.e., TFs) of this network with n1
, …, nm
, where m
is the total number of nodes of
. HIDEN, allows the user to set a limit on the maximum number of allowed levels for hierarchical decomposition. Let us denote this number with M
. Also, let us represent the level assigned to node ni
for all i
}, (i.e. ∀i
}). We aim to find the level assignment T
} that minimizes the total penalty resulting from this level assignment. Therefore, the objective of our problem is the sum of individual penalty scores for each pair of nodes:
Next, we set a limit on the number of levels in the hierarchy. We do this by limiting the variables ti as follows:
We, then, represent each pij as a linear constraint. Remember that pij is a binary function in the following form:
We can rewrite this function as follows:
Let us only consider the cases where ϕ(ni,nj) = 1. We can represent the rest of this function using two linear inequalities. The following set of constraints represent the function pij:
In order to prove that these inequalities model the function pij correctly, we need to inspect all possible scenarios:
1. if ti > tj and pij = 0, then −1 ≥ tj − ti ≥ −(M−1) and M × pij = 0. Therefore both (4) and (5) holds.
2. if ti ≤ tj and pij = 0, then tj−ti ≥ 0 and M × pij = 0. Therefore, (4) holds, however, (5) does not hold.
3. if ti > tj and pij = 1, then −1 ≥ tj−ti ≥ −(M−1) and M × pij = M. Therefore the expression tj−ti −M×pij is smaller than or equal to −M−1. This implies that (4) does not hold but (5) holds.
4. if ti ≤ tj and pij = 1, then (M−1) ≥ tj−ti ≥ 0 and M × pij = M, therefore both (4) and (5) holds.
Therefore, enforcing the constraints (3), (4) and (5) implies:
This corresponds to the latter definition of the function pij except the condition of ϕ(ni,nj) = 1. Since we choose the function ϕ prior to the construction of the MIPP, we know the value of ϕ(ni,nj) for every pair (ni,nj). Therefore, we can manually ensure this property, by only considering pij where ϕ(ni,nj) = 1 and excluding pij completely from our calculations where ϕ(ni,nj) = 0.
Based on the constraints above, the MIPP we construct to solve the network hierarchy problem is as follows:
In this section, we show the application of HIDEN on the network in Figure . We will use adjacency penalty function in this example. Therefore:
Using this ϕ function, the objective of the MIPP is to minimize the following function:
Now we go over to the constraints. First set of constraints limit ti:
Then, we write the remaining functions as follows:
In the resulting problem, M is left as a user defined parameter. When we run the above problem with M = 4, HIDEN returns the following result:
Figure shows the result of HIDEN on the given network. Note that HIDEN computes the level decomposition successfully despite the existence of a cycle in the network.
Result of the hierarchical decomposition of the network in Figureusing HIDEN. Note that the decomposition differs from the decomposition in Figure .
Divide and Conquer method
HIDEN works well for networks that have up to 100 nodes. For larger networks, however, it becomes difficult to solve the resulting MIPP using current hardware. This is mainly because the number of integer variables of the MIPP that describe the problem for the given network increases. This increases the memory consumption and the running time significantly.
In order to solve our problem for networks that have more than 100 nodes we adopt a divide and conquer approach. Given a large TRN, we randomly divide this network into fixed size partitions. We do this by first randomly selecting a node from the given network. This node is the seed of the first partition, and thus it is a member of that partition. We then chose the remaining nodes in that partition iteratively by randomly growing the partition one node at a time. More specifically, at each iteration, we randomly select a node that is not selected so far and that is interacting with at least one of the nodes in the partition. We repeat these iterations until the number of nodes in the partition reaches to a predefined threshold or all the nodes in the TRN are assigned to a partition. Then, we use HIDEN to decompose the subnetwork defined by the nodes and the edges in this partition into hierarchical levels. Once we determine the levels of all the nodes in the current partition, we store those values as they will remain unchanged in the rest of our solution. Next we randomly pick another node from the given TRN among those that have not been considered yet as the seed of the next partition. We grow the next partition similarly and use HIDEN to decompose it into hierarchical levels. We repeat these steps until we exhaust all the nodes in the given TRN.
This method greatly reduces the running time of HIDEN on large networks. Since MIPP is NP-hard, depending on the size and the connectivity of the given TRN, the divide and conquer strategy can be orders of magnitude faster than the unpartitioned HIDEN. However, due to random selection of the nodes, it is possible for us to not achieve the optimal result. This is possible if the partition of the network we start with does not intersect with one or more of the levels in its underlying hierarchy. It is worth mentioning that this probability is usually very low. We can explain this as follows. Consider an N node network which contains n nodes belonging to a specific level x. If we select k nodes among these N nodes randomly, the probability that none of the k nodes belong to level x is (N-n choose k)/(N choose k). As k or n increases, this expression quickly converges to zero. In order to reduce this probability further, we repeat the divide and conquer strategy multiple times, each time starting from a randomly selected node. In our implementation, we repeat this process 1000 times for real TRNs. After 1000 iterations, the probability of all the trials starting with an undesired (i.e. does not intersect with all the final levels) partition becomes very small (i.e. if for 1 iteration, the probability is as high as 0.9, after 1000 iterations, the probability becomes 0.91000~10−46). Since the running time of partitioned HIDEN is orders of magnitude less than that of the unpartitioned HIDEN, 1000 repetitions remains to be practical. It took less than 10 minutes for the largest dataset (S. cerevisiae). Our experiments showed that on the average, the results of the divide and conquer method reach its optimum in less than 500 iterations.