Efforts for network-based function prediction have been going on for over 6 years now, since the introduction of molecular techniques capable of mapping protein interactions on a genome-wide scale. Despite the large number of techniques suggested for functional annotation using networks, systematic annotation is still mostly based on other data sources, such as sequence homology. Several goals, reviewed below, have to be accomplished in order for the network-based functional annotation tools to become widely used.
Despite the large number of different algorithms developed for both direct and module-assisted function prediction, implementations of only a small fraction of them are publicly accessible. Only a handful, such as MCODE (Bader and Hogue, 2003), PRODISTIN (Baudot et al, 2006), CFinder (Adamcsek et al, 2006) and NetworkBlast (http://www.pathblast.org/), are currently supported with a graphical interface. Networks are highly visualizable, and as the human eye is better at pattern detection than any computer, a good graphical interface will help make such computational tools widely used by the biological community.
Although the field has advanced considerably in recent years on the methodological side, comprehensive comparisons of the plethora of available annotation methods, similar to that performed by Brohee and van Helden (2006), are greatly needed. Such systematic evaluation efforts were recently performed in other fields, for example, discovery of TF-binding sites (Tompa et al, 2005), biclustering of expression data (Prelic et al, 2006) and protein structure prediction (Kryshtafovych et al, 2005), which has a long and successful history of community evaluations. Owing to the fundamental differences between annotation types, such as biological process and molecular complexes, different methods are likely to be best suited for different types of annotation. An important prerequisite for a comprehensive comparison is the definition of gold standards for functional annotation. In most of the studies described here, the MIPS complexes catalog and Gene Ontology were used as benchmarks for prediction success. However, neither data set is currently comprehensive, and some annotations are found in one but not in the other.
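As an illustration of what a comparison against a gold standard involves, the following minimal sketch computes micro-averaged precision and recall of predicted annotations. The function name `precision_recall`, the protein identifiers and the annotation terms are hypothetical and chosen only for this example; no particular benchmark format is assumed.

```python
from typing import Dict, Set, Tuple


def precision_recall(
    predicted: Dict[str, Set[str]],
    gold: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over proteins in the gold standard.

    Both arguments map a protein identifier to a set of annotation terms.
    Proteins absent from the gold standard are ignored, because their
    predictions cannot be judged either way.
    """
    tp = fp = fn = 0
    for protein, gold_terms in gold.items():
        pred_terms = predicted.get(protein, set())
        tp += len(pred_terms & gold_terms)   # predicted and in the benchmark
        fp += len(pred_terms - gold_terms)   # predicted but not in the benchmark
        fn += len(gold_terms - pred_terms)   # in the benchmark but not predicted
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (hypothetical identifiers and terms).
gold = {
    "P1": {"ribosome biogenesis"},
    "P2": {"DNA repair", "cell cycle"},
}
predicted = {
    "P1": {"ribosome biogenesis"},
    "P2": {"DNA repair", "transport"},
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because the benchmark catalogs are incomplete, a prediction counted here as a false positive may simply be a correct annotation that the gold standard lacks, so precision measured this way is best read as a lower bound.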
Although several methods described above use diverse functional genomic data sources, they are still greatly outnumbered by methods utilizing only the network topology. Owing to the increasing accessibility of microarray technology, gene expression measurements have become widely available for diverse conditions across species. As of August 2006, almost 95 000 and 45 000 hybridization samples were available in the Gene Expression Omnibus and ArrayExpress databases, respectively. This huge body of data is currently poorly exploited by integrative annotation methods, as most of them focus on expression data derived from a single study. Following the success of several methods integrating expression data from multiple studies (Ihmels et al, 2002; Lee et al, 2004; Segal et al, 2005; Tanay et al, 2005), we expect that techniques based on large compendia of expression data and protein interaction networks will significantly increase the accuracy of functional annotation. Additional large-scale genomic data, such as deletion phenotypes (Brown et al, 2006), proteomic measurements (Kislinger et al, 2006) and protein cellular localization (Huh et al, 2003), can also be used in an integrative framework. Highly diverse, high-dimensional data have been integrated using biclustering (Tanay et al, 2005) and kernel-based methods (Lanckriet et al, 2004).
This review focused on methods aimed at genome-scale functional annotation using network data from a single species. Several additional studies developed methods for detection of functional modules in slightly different contexts. In particular, specific algorithms were developed for detection of molecular complexes from lists of proteins identified in biochemical purification experiments, rather than from binary interaction networks (Krause et al, 2003; Hollunder et al, 2005; Scholtens et al, 2005; Gavin et al, 2006). Another set of works attempted to identify evolutionarily conserved functional modules via the integration of networks from multiple organisms (Kelley et al, 2003; Sharan et al, 2005; Campillos et al, 2006; Flannick et al, 2006; Gandhi et al, 2006; see also the review in Sharan and Ideker, 2006).
Which methods should be used by a newcomer to the field? As mentioned above, the limited information about the comparative performance of the methods presented here makes it difficult to decide which method should be used in a specific setting. When using only PPI data, our initial and limited comparison does seem to indicate that direct methods are currently slightly superior to module-assisted ones, with MRF and MCL being the leading techniques for direct and module-assisted function prediction, respectively. New techniques should thus be compared to these methods to demonstrate their superiority. If the goal is actual function prediction rather than methodological improvement, the choice is mainly limited to methods that are implemented as a tool with a graphical user interface or available as a web server. As to methods integrating multiple data sources, no comparative assessment is currently available.
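To make the idea of comparing a new technique against a reference method concrete, the sketch below sets up a leave-one-out evaluation around a deliberately simple direct method: majority vote among annotated network neighbors. This is not the MRF or MCL approach named above; it is a stand-in baseline, and the single-label-per-protein assumption, the function names and the toy network are all hypothetical simplifications.

```python
from collections import Counter
from typing import Dict, Optional, Set

Network = Dict[str, Set[str]]      # protein -> set of interaction partners
Annotation = Dict[str, str]        # protein -> single functional label (simplification)


def majority_vote(protein: str, network: Network, annotation: Annotation) -> Optional[str]:
    """Predict the most common label among the annotated neighbors of a protein."""
    votes = Counter(
        annotation[n]
        for n in network.get(protein, set())
        if n != protein and n in annotation
    )
    if not votes:
        return None  # no annotated neighbor, so no prediction
    # Ties are broken alphabetically so that results are deterministic.
    return min(votes, key=lambda label: (-votes[label], label))


def leave_one_out_accuracy(network: Network, annotation: Annotation) -> float:
    """Hide each annotated protein in turn and try to recover its label."""
    correct = 0
    for protein, true_label in annotation.items():
        held_out = {p: lab for p, lab in annotation.items() if p != protein}
        if majority_vote(protein, network, held_out) == true_label:
            correct += 1
    return correct / len(annotation) if annotation else 0.0


# Toy undirected network (hypothetical): a triangle A-B-C attached to a chain C-D-E.
network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
annotation = {"A": "x", "B": "x", "C": "x", "D": "y", "E": "y"}

print(leave_one_out_accuracy(network, annotation))
```

Any other predictor with the same signature as `majority_vote` could be dropped into the same loop, which is what allows two methods to be scored on identical held-out proteins.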
When using interaction networks, whether as a sole information source or in conjunction with other data sources, the current limitations of these data have to be recognized. The currently available protein interaction data are known to be both noisy (von Mering et al, 2002) and partial (Hart et al, 2006). In addition, as large-scale interaction mappings are conducted only in a single growth condition or in a single tissue type, interaction data currently lack any spatial or temporal information. Clearly, for some functional annotations, the relevant interactions may occur only under specific conditions at a specific time point. In addition, not every functional aspect of a protein is expected to be manifested in its interaction pattern. Some proteins, such as metabolic enzymes, function largely on their own without the need for cooperation from other proteins.
Despite these caveats, analysis of interaction networks is a young, promising and very active research area. The utilization of such networks for function prediction is just one of a plethora of possible ways by which this rich source of information can be exploited. Although techniques for network-based function prediction have been continuously improving, there is still a lot of room for improvement, both in terms of the methodologies and in terms of their evaluation. We expect that improved, more accurate methods that are made readily accessible to the biological community will make interaction networks a prevalent instrument for functional annotation, among their many other important uses.