In the post-genome era, there are many high-throughput data such as yeast two-hybrid, genetics interaction and gene expression microarray data are generated. Therefore, analysis of these data becomes an important research issues. One of major analyses is detecting functional modules on biological networks. Taking a protein as a vertex and connecting any two proteins that have direct interaction by an edge, we can build a Protein-Protein Interaction (PPI) network from a protein interaction dataset. Research evidence shows that biological systems are composed of functional modules [1
]. Proteins in a module work together to perform certain biological functions. The interactions among these module components (proteins in this module) must be frequent. Based on this idea, a functional module should induce dense regions on the PPI network. Hence, detecting a densely connected cluster is a good heuristics to find protein functional module.
Algorithms in graph/network analysis have been applied in identifying essential functional modules from biological networks. For the divisive cluster method, it takes the whole network as a cluster at its beginning stage, and then split the cluster into smaller ones repeatedly until the network meet its stop criterion. Based on this idea, Dunn et al
] investigated biological function using Girvan and Newman's Edge-Betweenness algorithm which removes the edges with the highest edge-betweenness in each iteration. On the contrary, for the agglomerative clustering method, every single vertex forms a cluster at the beginning stage, and clusters are allowed to merge and grow as bigger as possible under certain constraints. The CPC (Clique Percolation Clustering method)[4
], SCAN (Structural Clustering Algorithm for Networks) [6
], COACH (COre-AttaCHment based method) [7
], CMC (Clustering-based on Maximal Cliques)[8
] and Core (Core-Attachment approach)[9
] are classified into this category. There is a fusion strategy which combines the divisive and agglomerative approach, such as MoNet (Modular organization of protein interaction Networks) [10
]. In the first stage of MoNet, it removes an edge with the highest edge-betweenness and pushes the edge into a stack until there is no edge can be removed. In the second stage, an edge is popped from stack and then adds into graph under certain condition.
Besides those methods mentioned above, there are many other functional module-detecting methods such as MCL (Markov CLuster algorithm) [11
], MATISSE (Module Analysis via Topology of Interactions and Similarity SEts) [13
], CEZANNE (Co-Expression Zone ANalysis using NEtworks)[14
], and MST extension [15
]. Based on a simulation of flow in graphs, MCL partitions the PPI network into many non-overlapping dense clusters. By finding proteins with highly similar gene expressions, MATISSE and CEZANNE generate non-overlapping clusters. According to maximum spanning trees calculated from weighted PPI networks, MST extension algorithm produces overlapping clusters. Recently, Gavin et al. [16
] suggested that a protein complex consists of two parts, a core and an attachment. There are many researchers are based on this concept to design their own detecting protein complex algorithms, such as COACH (COre-AttaCHment based method) [7
] and Core (Core-Attachment approach) [9
]. These kinds of methods are also belonged to agglomerative method because a cluster grows from a core. This concept is also adopted in our algorithm.
Some previous studies showed that current PPI networks contain certain rate of false positive and false negative interactions [17
]. However, most current functional module detecting methods from protein interactions pay little attention on this precondition. In addition, many clustering methods do not allow a vertex assigned to multiple clusters, but a protein may play roles in different ways. Therefore, functional modules may overlap with each other. To resolve these issues, we developed a novel agglomerative clustering method to detect functional modules from confidence-scored protein interactions. We conducted our approach on the PPI network came from Collins et al
] and gene expression data from MATISSE website [35
]. The idea of Gavin et al. [16
] on protein complex is also included in the algorithm. Our method can perform better than other existed ones to reconstruct the components and sub-complexes inside the protein complexes.