Home | About | Journals | Submit | Contact Us | Français |

**|**BioData Min**|**v.4; 2011**|**PMC3101653

Formats

Article sections

- Abstract
- Introduction
- Graph Theory and Definitions
- Data Structures
- Network Properties
- Network Centralities and Node Ranking
- Network Topology
- Network Models
- Cluster Analysis and Visualization
- Discussion
- Conclusions
- Competing interests
- Authors' contributions
- References

Authors

Related links

BioData Min. 2011; 4: 10.

Published online 2011 April 28. doi: 10.1186/1756-0381-4-10

PMCID: PMC3101653

Georgios A Pavlopoulos,^{}^{1,}^{2} Maria Secrier,^{3} Charalampos N Moschopoulos,^{4,}^{5} Theodoros G Soldatos,^{6} Sophia Kossida,^{5} Jan Aerts,^{2} Reinhard Schneider,^{3,}^{7} and Pantelis G Bagos^{1}

Georgios A Pavlopoulos: ed.lbme@uopolvap; Maria Secrier: ed.lbme@reirces.airam; Charalampos N Moschopoulos: rg.sartapu.diec@lupoxsom; Theodoros G Soldatos: moc.smetsysoibefil@sotadlos; Sophia Kossida: rg.ymedacaoib@adissoks; Jan Aerts: eb.nevueluk.tase@strea.naj; Reinhard Schneider: ed.lbme@dienhcsr; Pantelis G Bagos: rg.gcu@sogabp

Received 2010 November 2; Accepted 2011 April 28.

Copyright ©2011 Pavlopoulos et al; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article has been cited by other articles in PMC.

Understanding complex systems often requires a bottom-up analysis towards a systems biology approach. The need to investigate a system, not only as individual components but as a whole, emerges. This can be done by examining the elementary constituents individually and then how these are connected. The myriad components of a system and their interactions are best characterized as networks and they are mainly represented as graphs where thousands of nodes are connected with thousands of vertices. In this article we demonstrate approaches, models and methods from the graph theory universe and we discuss ways in which they can be used to reveal hidden properties and features of a network. This network profiling combined with knowledge extraction will help us to better understand the biological significance of the system.

The theory of complex networks plays an important role in a wide variety of disciplines, ranging from computer science, sociology, engineering and physics, to molecular and population biology. Within the fields of biology and medicine, potential applications of network analysis include for example drug target identification, determining a protein's or gene's function, designing effective strategies for treating various diseases or providing early diagnosis of disorders. Protein-protein interaction (PPI) networks, biochemical networks, transcriptional regulation networks, signal transduction or metabolic networks are the highlighted network categories in systems biology often sharing characteristics and properties.

**Protein-protein interaction (PPI) networks **[1] mainly hold information of how different proteins operate in coordination with others to enable the biological processes within the cell. Despite the fact that for the majority of proteins the complete sequence is already known, their molecular function is not yet fully determined. Predicting protein function is still a bottleneck in computational biology research and many experimental and computational techniques have been developed in order to infer protein function from interactions with other biomolecules. Large-scale and high-throughput techniques can detect proteins that interact within an organism. Among them, the most well-known are the pull down assays [2], tandem affinity purification (TAP) [3], yeast two-hybrid (Y2H) [4], mass spectrometry [5], microarrays [6] and phage display [7]. Some very well-known datasets that have been recently produced by employing the aforementioned techniques and that are widely used are the Tong [8], Krogan [9], DIP [10], MIPS [11], Gavin 2002 [5] and Gavin 2006 [12] datasets. Besides the various experimental methods, a variety of large biological databases that contain information concerning PPI data is already available and most of them are organism specific. Some well-known databases are the Yeast Proteome Database (YPD) [13], the Munich Information Center for Protein Sequences (MIPS) [14], the Molecular Interactions (MINT) database [15], the IntAct database [16], the Database of Interacting Proteins (DIP) [10], the Biomolecular Interaction Network Database (BIND) [17], the BioGRID database [18], the Human Protein Reference Database (HPRD) [19], the HPID [20] or the DroID [21] for Drosophila. Two additional well-documented services based on text mining analysis are the Stitch [22] and String [23] databases.

**Regulatory networks (GRNs) **contain information concerning the control of gene expression in cells. This process is modulated by many variables, such as transcription factors [24], their post-translational modifications or association with other biomolecules [25]. Usually, these networks use a directed graph representation in an effort to model the way that proteins and other biological molecules are involved in gene expression and try to imitate the series of events that take place in different stages of the process. They often exhibit specific motifs and patterns concerning their topology. Data collection, data integration and analysis techniques give now the possibility to study gene regulatory networks in a larger scale [26]. Protein-DNA interaction data is collected in databases like JASPAR [27], TRANSFAC [28,29] or B-cell interactome (BCI) [30], while post-translational modification can be found in databases like Phospho.ELM [31], NetPhorest [32] or PHOSIDA [33].

**Signal transduction networks **often use multi-edged directed graphs to represent a series of interactions between different bioentities such as proteins, chemicals or macromolecules and to investigate how signal transmission is performed either from the outside to the inside of the cell, or within the cell. Environmental parameters change the homeostasis of the cell and, depending on the circumstances, different responses can be triggered. Similarly to GRNs, these networks also exhibit common patterns and motifs concerning their topology [34]. Databases that store information about signal transduction pathways are MiST [35], TRANSPATH [36], etc.

**Metabolic and biochemical networks **[37] are powerful tools for studying and modelling metabolism in various organisms. As metabolic pathways, we consider a series of chemical reactions occurring within a cell at different time points. The main role within a metabolic network is played by the enzymes, since they are the main determinants in catalyzing biochemical reactions. Often, enzymes are dependent on other cofactors such as vitamins for proper functioning. The collection of pathways, holding information about a series of biochemical events and the way they are correlated, is called a metabolic network. Modern sequencing techniques allow the reconstruction of the network of biochemical reactions in many organisms, from bacteria to human [38,39]. Among the several databases holding information about biochemical networks some of the most popular are the Kyoto Encyclopedia of Genes and Genomes (KEGG) [40], EcoCyc [41], BioCyc [42] and metaTIGER [43]. Several methods have also been discovered to analyze the pathway structure of metabolic networks [44-48].

Many computer readable formats are available to describe biological networks. The *Systems Biology Markup Language (SBML*) [49] is an XML-like machine-readable language, that is able to represent models to be analyzed by a computer. SBML can represent metabolic networks, cell signaling pathways, regulatory networks, and many other kinds of systems [50]. Other file formats that can represent biological networks are the *Proteomics Standards Initiative Interaction (PSI-MI) *[51], *Chemical Markup Language *(*CML*) [52,53] for chemicals or *BioPAX *[54] for pathways. Secondary formats that can also be used in similar ways are the *Cell Markup Language *[55] which is an XML-like machine-readable language mainly developed for the exchange of computer-based mathematical models or the *Resource Description Framework, RDF *which is a language for the representation of information about resources on the World Wide Web [56,57].

After having given a short overview of how data can be produced either experimentally or retrieved from various databases and which formats are available for each type of network, we further emphasize on the computational analysis as defined in graph theory. We finally conclude by describing which properties of the ones discussed below characterize the various networks.

To introduce the basic concepts of graph theory, we give both the empirical and the mathematical description of graphs that represent networks as they are originally defined in the literature [58,59].

A graph *G *can be defined as a pair *(V, E) *where *V *is a set of vertices representing the nodes and *E *is a set of edges representing the connections between the nodes. We define as *E *= *{(i, j)| i, j * *V} *the single connection between nodes *i *and *j*. In this case, we say that *i *and *j *are ** neighbors**. A

A directed graph is defined as an ordered triple *G *= *(V, E, f)*, where *f *is a function that maps each element in *E *to an ordered pair of vertices in *V*. The ordered pairs of vertices are called ** directed edges, arcs or arrows**. An edge

A weighted graph is defined as a graph *G = (V, E) *where *V *is a set of vertices and *E *is a set of edges between the vertices *E = {(u, v) | u, v * *V} *associated with it a weight function *w: E→R*, where *R *denotes the set of all real numbers. Most of the times, the weight *w _{ij }*of the edge between nodes

** Bipartite graph **is an undirected graph

If *G = (V, E) *is a graph, then *G _{1 }*=

Examples and shapes describing the aforementioned graph types can be found in Figure Figure1.1. The most common data structures that are used to make these networks computer readable are adjacency matrices or adjacency lists. The following section provides a short mathematical description of these data structures.

The ** degree of a node **in an undirected graph is the number of connections or edges the node has to other nodes and is defined as

The two main data structures used to store network graph representations are described below.

Given a graph *G *= *(V, E) *the adjacency matrix representation consists of a *|V|x|V| *= *nxn *matrix *A *= *(a _{ij}) *such that

Given a graph *G *= *(V, E) *the adjacency list representation consists of an array *Adj *of *|E| *elements where for each *e**E Adj(0, e) = i **V*. Adjacency lists require space *Θ (|V|+|E|) *and are preferable for sparse graphs with a low density of connections. An example of how these data structures represent a graph is given in Figure Figure22.

Looking at different network properties can provide valuable insight into the internal organization of a biological network, the repartition of molecules among cellular processes, as well as the evolutionary constraints that have shaped an organism's protein, metabolic or regulatory network into a functional, feasible structure. In the following, we give a short description of the main properties that are commonly analyzed in networks.

The * graph density *shows how sparse or dense a graph is according to the number of connections per node set and is defined as . A

In the mathematical field of graph theory, a ** complete graph **is a simple graph in which every pair of distinct vertices is connected by a unique edge. The complete graph on

Let *G*_{1}= (*V*_{1}, *E*_{1}) and *G*_{2}= (*V*_{2}, *E*_{2}) be two undirected graphs. A function *f*: *V*_{1 }->*V*_{2 }is called isomorphism if *f *is an edge-preserving bisection, such that for all *a, b**V*_{1}, *(a, b)**E*_{1 }if and only if *(f(a), f(b)) * *E*_{2}. When such function exists, then *G*_{1 }and *G*_{2 }are called isomorphic. An example is shown in Figure Figure33.

A ** walk **is a pass through a specific sequence of nodes

A ** clique **in an undirected graph

** Clustering Coefficient **is the measurement that shows the tendency of a graph to be divided into clusters. A cluster is a subset of vertices that contains lots of edges connecting these vertices to each other. Assuming that

Biological networks have a significantly higher average clustering coefficient compared to random networks, which proves their modular nature. Indeed, many cellular processes are governed by subsets of biomolecules that form an interaction module. Since cellular processes are linked, the modules tend to be linked as well, but the linking molecules are often few, such that the module overlap is quite low [72,73].

** Centralization **is the measurement that shows whether a network has a star-like topology or whether the nodes of the network have on average the same connectivity. The closer the centralization is to 1, the more likely is the network to have a star-like topology. The closer to 0, the more likely it is that the nodes of the network have on average the same connectivity (for example a square, where every node is connected with 2 neighbors). It is calculated as

** Network Motifs **represent patterns in complex networks occurring significantly more often than in randomized networks [74]. They consist of subgraphs of local interconnections between network elements. A motif is a small connected graph

This section shows how nodes can be ranked or sorted according to their properties, depending on the question asked. In biological networks, it is important for example to detect central nodes or intermediate nodes that affect the topology of the network, depending of course on the biological question. Such a question would be to find the molecules in a biological pathway that are not necessarily central but have a crucial biological role in signal transduction or in PPI networks, to detect such nodes that interact with many other proteins or find molecules that are crucial for stimulating the expression of genes.

** Degree Centrality **shows that an important node is involved in a large number of interactions. For a node

** Closeness Centrality **indicates important nodes that can communicate quickly with other nodes of the network. Let

** Betweenness Centrality **shows that nodes which are intermediate between neighbors rank higher. Without these nodes, there would be no way for two neighbors to communicate with each other. Thus,

** Eigenvector Centrality **ranks higher the nodes that are connected to important neighbors. Let

** Eccentricity Centrality **is the measure that shows how easily accessible a node is from other nodes. Let

** Subgraph Centrality **is the measure that ranks nodes according to the number of subgraphs of the overall network in which the node participates, with more weight given to small subgraphs. Let

** Matching Index **is the measure that shows how similar two nodes are within the network. Two vertices that are functionally similar do not always have to be connected. The matching index

or . An example is shown in Figure Figure9.9. The matching index is often used to cluster different components of a biological network according to some property. For instance, it has been used to describe spatial growth in brain networks during development [91] or to predict the connectivity of primate cortical networks [92].

Further centrality measurements and their application to the study of PPIs in yeast are introduced in [85]. A discussion about how centrality correlates with lethality in biological networks can be found in [93]. The coupling between centrality and essentiality has also been investigated in several eukaryotic protein networks [94]. It is very often the case that studies of a particular network involve the analysis and comparison of several centrality measures, for instance to study pleiotropy in human genetic diseases [87], to compare PPI and transcriptional regulation networks [95] or to test hub essentiality [77]. Tools that have implemented functionality for exploring the different types of centralities previously mentioned in biological networks and not only are CentiBiN [96], Visone [97], Pajek [98], VisANT [99]. In most of the cases, however, only a limited selection of centrality measures is available.

The topology of the network often reveals information about its biological significance. Often, networks follow patterns and rules and have a specific topology that allows scientists to go through a deeper investigation towards knowledge extraction.

** Scale-free **or otherwise real world networks describe natural networks like online communities (i.e Facebook) where the nodes are the people and the edges the connection between them, or networks such as the World Wide Web (www) where the nodes are individual web pages and the links are hyperlinks. Many biological networks also have scale-free properties, with nodes representing bioentities and edges the interactions between them (like proteins that interact physically or metabolites that take part in the same reaction) [73,93,100]. Assuming that

The ** degree distribution P(k) **has become one of the most prominent characteristics in network topology. In terms of numerical estimation, a more reliable property, very similar to the previous, is the

To visually represent the properties of the network we usually rank the vertices according to their degree and then plot the degree versus the rank of each vertex. Another representation is to create a histogram by plotting the vertices of the graph sorted according to their degree using a logarithmic scale. A third and very popular representation is to plot the degrees of the nodes sorted versus either their degree distribution *P(k) *or their cumulative degree distribution *P _{c}(k)*. An interesting analysis of most of these properties in various PPI, metabolic or transcriptional networks of several organisms (

A network is called ** assortative **if the vertices with higher degree have the tendency to connect with other vertices that also have high degree of connectivity; one such category is social networks [102]. If the vertices with higher degree have the tendency to connect with other vertices with low degree then the network is called

To correlate the degrees of two nodes *i *and *j *we use a joint probability distribution *P(k _{i}, k_{j}) = P(k_{i})P(k_{j})*. A more straightforward way is to use the Pearson's Correlation Coefficient (PCC), which quantifies the correlation or linear dependence between two variables (in this case, the degrees of two nodes). In other words, it measures to which extent one variable increases/decreases as the other increases. PCC (

This is equivalent to the Pearson correlation coefficient of the degrees at either ends of an edge. The range of the *r-*values is between *+1 *and -*1, r *<*0 *corresponding to a disassortative network whereas r > 0 to an assortative one. Another way to correlate degrees is to calculate the ** average neighbor degree**. For each vertex

Several topological models have been built to describe the global structure of a network, as introduced below.

This model was mainly introduced to describe the properties of a random graph. The simple model of a network involves taking a number of vertices *N *and connecting nodes by selecting edges from the *N(N-1)/2 *possible edges randomly. The degree distribution for this model is given by a binomial distribution. The probability of a vertex to have degree *k *is , where *k* is the average connectivity of the network. For small *P *probabilities, the network seems to be disconnected and consists of many isolated components whereas for *P *>*log(N)/N *almost all vertices are connected.

This model was introduced to describe networks that follow the small world topology. This type of topology characterizes many biological networks, like metabolic networks where it often happens that paths of few (three-four) reactions link most metabolites. As a consequence, local changes in metabolite concentration local perturbations in these networks will propagate throughout the entire network. In this model, the frequency of nodes *P(k) *with *k *connections follows a power-law distribution equation *P*(*k*) ~ *k*^{-γ}, in which most nodes are connected with small proportion of other nodes and a small proportion of nodes are highly connected. Thus each vertex is connected to *N/2 *nearest neighbors. In exponential networks the probability that a node has a high number of connections is very low.

This model describes scale-free networks and it is one of the most basic models since it describes most of the biological networks [37,108]. The concept behind this model is to reveal information about the dynamics of the network, especially from an evolutionary perspective. The networks are built to mimic gene duplication events, such that they expand continuously by addition of new nodes and the new nodes attach preferentially to sites that are already well connected [109]. Initially we start with small number of nodes *m _{0}*. At each step, a new node

** Cluster analysis **[110] aims at classifying a set of observations into two or more mutually exclusive

It is generally difficult to predict behavior and properties based on observations of behaviors or properties of other elements in the same system, therefore various approaches for cluster analysis emerge. Clustering algorithms may be *Exclusive, Overlapping, Hierarchical *or *Probabilistic*. In the first case, data are grouped in an exclusive way, so that a certain element can be assigned to only one group (exclusively). On the other hand, the overlapping clustering uses fuzzy sets to cluster data, so that each point may belong to two or more clusters with different degrees of membership. A hierarchical clustering algorithm organizes data in hierarchies and is based on the union between the two nearest clusters; it is commonly used for microarray and sequence analysis [115]. A more analytical categorization of clustering algorithms can be found at [110,116].

An important component of a clustering algorithm is the distance measure between data points. If all the components of the data instance vectors have the same physical units, it is then possible that the simple Euclidean distance metric is sufficient to successfully group similar data instances. One example is to cluster cities on a map, since in this case Euclidean distance represents real natural distances. However, for higher dimensional data the Euclidean distance can sometimes be misleading. In that case, a popular measure is the ** Minkowski metric **and is calculated as where

**Hierarchical clustering **is a method of cluster analysis which seeks to build a hierarchy of clusters. There are two different strategies to organize data. These are the *agglomerative *and the *divisive*: ** Agglomerative**: It is a "bottom-up" approach. Each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

Let *n _{r }*be the number of clusters and x

** Single linkage **calculates the smallest distance between objects in the two clusters to merge them:

** Complete linkage **calculates the largest distance between objects in the two clusters to merge them:

** Average linkage **uses the average distance between all pairs of objects in any two clusters: . This algorithm is also known as

** Centroid linkage **finds the Euclidean distance between the centroids of the two clusters: , is the Euclidean distance.

** Median linkage **uses the Euclidean distance between weighted centroids of the two clusters, are weighted centroids for the clusters

Single or complete linkages are the fastest of the linkage methods. However, single linkage tends to produce stringy clusters, which is not always preferable. The centroid or average linkage produce better results regarding the accordance between the produced clusters and the structure present in the data. These methods require much more computations. Average linkage and complete linkage may be the preferred methods for microarray data analysis [115].

** Ward's linkage **finds the incremental sum of squares; that is, the increase in the total within-cluster sum of squares as a result of joining two clusters. The within-cluster sum of squares is defined as the sum of the squares of the distances between all objects in the cluster and the centroid of the cluster. The sum of squares measure is equivalent to the following distance measure ,

where || ||_{2 }is the Euclidean distance and are the centroids of clusters *r *and *s *and *n _{r }*and

** Weighted average linkage **uses a recursive definition for the distance between two clusters. If cluster

** Neighbor Joining **[112,119] was initially proposed for finding pairs of operational taxonomic units (OTUs) that minimize the total branch length at each stage of clustering of OTUs starting with a star-like tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method [112].

Known platforms that already share the tree-based algorithms described above are the Hierarchical Clustering Explorer (HCE) [122,123], MEGA [124-127] or TM4 [128]. A recent review article shows which file formats, visualization techniques and algorithms can be used for tree analysis [129].

Another category of clustering algorithms tries to cluster data in separate groups by identifying common properties that the nodes of a network share. Different strategies exist, like for example trying to find dense areas in a graph or areas where message exchange between nodes is easier or to identify strongly connected components or clique-like areas etc. Many of such algorithms have been used in different case studies like for example to identify protein families [130], to detect protein complexes in PPI networks [131,132], or for finding patterns and motifs in a sequence [133]. Even though many more exist, some of the most famous algorithms are given below.

** Markov Clustering **[134] (MCL) algorithm is a fast and scalable unsupervised clustering algorithm based on simulation of stochastic flow in graphs. The MCL algorithm can detect cluster structures in graphs by a mathematical bootstrapping procedure which takes into account the connectivity properties of the underlying network. The process deterministically computes the probabilities of random walks through a graph by alternating two operations: expansion, and inflation of the underlying matrix. The principle behind it is that random walks on a graph are likely to get locked within dense subgraphs rather than move between dense subgraphs via sparse connections. In other words, higher length paths are more often encountered between nodes in the same cluster than between nodes within different clusters, such that the probabilities between nodes in the same complex will typically be higher in expanded matrices. Clusters are identified by alternating expansion and inflation until the graph is partitioned into subsets so that there are no longer paths between these subsets [135,136].

** k-Means **[137] is a method of cluster analysis which aims to partition

** Affinity Propagation **[139] takes as input measures of similarity between pairs of data points and simultaneously considers all data points as potential candidates. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges.

** Restricted Neighborhood Search Cluster Algorithm **[140]: It tries to find low cost clustering by composing first an initial random clustering. Later it iteratively moves one node from one cluster to another in a random way trying to improve the clustering cost.

** Spectral clustering **[141]: This algorithm tries to find clusters in the graph such that the nodes within a cluster are connected with highly-similar edges and the connections between such areas are weak, constituted by edges with low similarity. The aim is to identify these tightly coupled clusters, and cut the inter-cluster edges. Figure Figure1111 shows an example of protein complex prediction from PPI yeast dataset [12].

Despite the great variety of clustering techniques, many articles directly compare the various clustering methodologies like [135] and [142]. Very often we encounter articles that compare similar algorithms using different datasets and come to very diverse conclusions and results i.e [142,143].

Concerning the visualization of networks, the availability of clustering techniques and their complex configuration/combination, today to a large extent, there is a lack of visualization platforms or tools that are able to integrate a variety of more advanced algorithms and the development implementation of such implementations emerges [144]. Platforms that share clustering algorithms are the Network Analysis Tool (NEAT) [145] and jClust [146] but they are still poor in the variety of methods they offer. Software like ArrayCluster [147] and MCODE [60] is often used in analysis of gene expression profiles and coexpression detection. Many visualization tools [144] such as Medusa [148], Cytoscape [149], Pajek [98] and many others [144] visualize networks in both 2D and 3D, but very few of them like Arena3D [150] try to bridge the gap between clustering analysis and visualization.

**Protein-protein interaction (PPI) networks **[1] are very diverse and it is difficult to come to general conclusions about their properties, mainly because data are generated from different sources both computationally and experimentally as described in a previous section. In most of the cases, PPI networks follow the laws of scale-free networks [93]. In such networks there are always proteins with higher degree of connectivity that appear to be of higher biological significance. Such proteins are the most important for the survival of the cell [93]. Large-scale maps of protein interaction networks have been constructed recently using high-throughput approaches to identify protein interactions [151-155]. It has been shown that these networks are highly dynamic, both during common cellular processes and on the evolutionary scale [109]. Further details on PPI network construction and analysis are given in [156].

**Regulatory networks (GRNs) **are usually sparsely connected. More specifically, the average number of upstream-regulators per gene is less than two [64]. Theoretical results show that the selection for robust gene networks will form minimal complexes even more sparsely connected [64], thus a fundamental design constraint could shape the evolution of gene network complexity. Network maps have been constructed for the transcriptional regulatory networks of *E. coli *and *S. cerevisiae *and are maintained in databases [26,157,158]. They are very sensitive and flexible to evolution [159] since their dynamics changes continuously over time and since transcription factors evolve faster than their target genes [160]. The number of regulators *N _{reg }*grows faster than the number of genes

**Signal transduction networks **are characterized by several patterns and motifs like self-sustaining feedback loops. These patterns appear at every time point during the signal transduction in the network and they reveal information about the topology of the network, therefore are important for biological functionality [163]. The nodes with the highest centralities in such networks correspond to domains involved in signal transduction and cell-cell contacts [164]. Signal transduction networks are sparse and they follow the scale-free properties. In *E. coli *and *S. sereviase*, the degree distribution is *P*(*k*) = *k*^{-γ}, *γ *≈ 2 and most of the molecules are involved into few interactions and only few of them have higher connectivity [8,165].

**Metabolic and biochemical networks **are scale-free networks indicating a small-world structure considering the topology of the network based on its metabolites [166,167], where all of the nodes in such networks are connected through a short path to any other. One example is presented in [167] for *E. coli*. The probability that a substrate participates as input in *k *metabolic reactions follows the power-law distribution *P*(*k*) = *k*^{-γin}, *γ _{in }*≈ 2.2 whereas the probability of a substrate to be produced by

The mathematical discipline which underpins the study of complex networks in biological and other applications is graph theory. It has been successfully applied to the study of biological network topology, from the global perspective of their scale-free, small world, hierarchical nature, to the zoomed-in view of interaction motifs, clusters and modules and the specific interactions between different biomolecules. The structure of biological networks proves to be far away from randomness but rather linked to function. Furthermore, the power of network topology analysis is limited, as it provides a static perspective of what is otherwise a highly dynamic system, such that additional tools should be combined with this approach in order to obtain a deeper understanding of cellular processes.

The complexity of biological networks increases as data are accumulated. The inherent variability of biological data, data inaccuracy and noise, the overload of information and the need to study the dynamics and network topology over time, are currently the bottlenecks in systems biology. Improved techniques for integration of data arising from different sources, as well as for visualization, will be crucial for understanding the functionality of complex networks. Moreover, new mathematical developments in the field and discovery of new areas of applications should be pursued in the near future.

The authors declare that they have no competing interests.

MS was financially supported by the EMBL PhD Programme. PGB and RS supervised the study. CNM, TGS and SK wrote parts of the manuscript. JA input was crucial for the article. All authors read and approved the final manuscript.

GAP and MS were financially supported by the EMBL PhD Programme. GAP was financially supported as a postdoctoral fellow from the Greek State Scholarship Foundation (I.K.Y - http://www.iky.gr/IKY/portal/en).

- Pellegrini Matteo, Haynor David, Johnson JM. Protein interaction networks. Expert Rev Proteomics. 2004;1(2) [PubMed]
- Vikis HG, Guan KL. Glutathione-S-transferase-fusion based assays for studying protein-protein interactions. Methods Mol Biol. 2004;261:175–186. [PubMed]
- Puig O, Caspary F, Rigaut G, Rutz B, Bouveret E, Bragado-Nilsson E, Wilm M, Seraphin B. The tandem affinity purification (TAP) method: a general procedure of protein complex purification. Methods. 2001;24(3):218–229. [PubMed]
- Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA. 2001;98(8):4569–4574. [PubMed]
- Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM. et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002;415(6868):141–147. [PubMed]
- Stoll D, Templin MF, Bachmann J, Joos TO. Protein microarrays: applications and future challenges. Curr Opin Drug Discov Devel. 2005;8(2):239–252. [PubMed]
- Willats WG. Phage display: practicalities and prospects. Plant Mol Biol. 2002;50(6):837–854. [PubMed]
- Tong AH, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M. et al. Global mapping of the yeast genetic interaction network. Science. 2004;303(5659):808–813. [PubMed]
- Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP. et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006;440(7084):637–643. [PubMed]
- Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D. DIP: the database of interacting proteins. Nucleic Acids Res. 2000;28(1):289–291. [PMC free article] [PubMed]
- Mewes HW, Frishman D, Mayer KF, Munsterkotter M, Noubibou O, Pagel P, Rattei T, Oesterheld M, Ruepp A, Stumpflen V. MIPS: analysis and annotation of proteins from whole genomes in 2005. Nucleic Acids Res. 2006. pp. D169–172. [PMC free article] [PubMed]
- Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B. et al. Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006;440(7084):631–636. [PubMed]
- Hodges PE, McKee AH, Davis BP, Payne WE, Garrels JI. The Yeast Proteome Database (YPD): a model for the organization and presentation of genome-wide functional data. Nucleic Acids Res. 1999;27(1):69–73. [PMC free article] [PubMed]
- Mewes HW, Amid C, Arnold R, Frishman D, Guldener U, Mannhaupt G, Munsterkotter M, Pagel P, Strack N, Stumpflen V, MIPS: analysis and annotation of proteins from whole genomes. Nucleic Acids Res. 2004. pp. D41–44. [PMC free article] [PubMed]
- Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G. MINT: a Molecular INTeraction database. FEBS Lett. 2002;513(1):135–140. [PubMed]
- Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley R, IntAct--open source resource for molecular interaction data. Nucleic Acids Res. 2007. pp. D561–565. [PMC free article] [PubMed]
- Bader GD, Donaldson I, Wolting C, Ouellette BF, Pawson T, Hogue CW. BIND--The Biomolecular Interaction Network Database. Nucleic Acids Res. 2001;29(1):242–245. [PMC free article] [PubMed]
- Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006. pp. D535–539. [PMC free article] [PubMed]
- Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, Human Protein Reference Database--2009 update. Nucleic Acids Res. 2009. pp. D767–772. [PMC free article] [PubMed]
- Han K, Park B, Kim H, Hong J, Park J. HPID: the Human Protein Interaction Database. Bioinformatics. 2004;20(15):2466–2470. [PubMed]
- Yu J, Pacifico S, Liu G, Finley RL Jr. DroID: the Drosophila Interactions Database, a comprehensive resource for annotated gene and protein interactions. BMC Genomics. 2008;9:461. [PMC free article] [PubMed]
- Kuhn Michael, Szklarczyk Damian, Franceschini Andrea, Campillos Monica, von Mering Christian, Lars Juhl Jensen AB, Bork P. STITCH 2: an interaction network database for small molecules and proteins. Nucleic Acids Res. 2010. pp. D552–D556. [PMC free article] [PubMed]
- Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, STRING 8--a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 2009. pp. D412–416. [PMC free article] [PubMed]
- Pea Carninci. The transcriptional landscape of the mammalian genome. Science. 2005;309:1559–1563. [PubMed]
- Rea Linding. NetworKIN: a resource for exploring cellular phosphorylation networks. Nucleid Acids Res. 2008;36:D695–D699. [PMC free article] [PubMed]
- Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I. et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science. 2002;298(5594):799–804. [PubMed]
- Sandelin A, Alkema W, Engström P, Wasserman WW, Lenhard B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 2004;32:D91–94. [PMC free article] [PubMed]
- Wingender E, Dietze P, Karas H, Knuppel R. TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res. 1996;24(1):238–241. [PMC free article] [PubMed]
- Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV. et al. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 2003;31(1):374–378. [PMC free article] [PubMed]
- Lefebvre C, Lim WK, Basso K, Dalla Favera R, Califano A. A context-specific network of protein-DNA and protein-protein interactions reveals new regulatory motifs in human B cells. Lecture Notes in Bioinformatics (LNCS) 2007;4532:42–56.
- Diella FCS, Gemünd C, Linding R, Via A, Kuster B, Sicheritz-Pontén T, Blom N, Gibson TJ. Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins. BMC Bioinformatics. 2004;5 [PMC free article] [PubMed]
- Miller ML. et al. Linear motif atlas for phosphorylation-dependent signaling. Sci Signal. 2008;1(35) [PubMed]
- Gnad F, Ren S, Cox J, Olsen JV, Macek B, Oroshi M, Mann M. PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome Biol. 2007;8(11) [PMC free article] [PubMed]
- Kholodenko BN, Hancock JF, Koch W. Signalling ballet in space and time. Nature Rev Molecular Cell Biology. 2010;11:414–426. [PMC free article] [PubMed]
- Ulrich LE, Z IB. MiST: a microbial signal transduction database. Nucleic Acids Res. 2007;35:D386–390. [PMC free article] [PubMed]
- Krull M, Voss N, Choi C, Pistor S, Potapov A, Wingender E. TRANSPATH: an integrated database on signal transduction and a tool for array analysis. Nucleic Acids Res. 2003;31(1):97–100. [PMC free article] [PubMed]
- Jeong H, Tombor B, Albert R, Oltvai ZN. AL. The large-scale organization of metabolic networks. Nature. 2000;407:651–654. [PubMed]
- Feist AM, Herrgård MJ, Thiele I, Reed JL, Palsson B. Reconstruction of biochemical networks in microorganisms. Nature Rev Microbiology. 2009;7:129–143. [PMC free article] [PubMed]
- Ma H, Mazein A, Selkov A, Selkov E, Demin O, Goryanin I. The Edinburgh human metabolic network reconstruction and its functional analysis. Mol Syst Biol. 2007;3(135) [PMC free article] [PubMed]
- Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. pp. D355–360. [PMC free article] [PubMed]
- Keseler IM, Bonavides-Martinez C, Collado-Vides J, Gama-Castro S, Gunsalus RP, Johnson DA, Krummenacker M, Nolan LM, Paley S, Paulsen IT, EcoCyc: a comprehensive view of Escherichia coli biology. Nucleic Acids Res. 2009. pp. D464–470. [PMC free article] [PubMed]
- Karp PD, Ouzounis CA, Moore-Kochlacs C, Goldovsky L, Kaipa P, Ahren D, Tsoka S, Darzentas N, Kunin V, Lopez-Bigas N. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res. 2005;33(19):6083–6089. [PMC free article] [PubMed]
- Whitaker JW, Letunic I, McConkey GA, Westhead DR. metaTIGER: a metabolic evolution resource. Nucleic Acids Res. 2009. pp. D531–538. [PMC free article] [PubMed]
- Schilling CH, Letscher D, Palsson BO. Theory for the systemic definition of metabolic pathways and their use in interpreting metabolic function from a pathway-oriented perspective. J Theor Biol. 2000;203(3):229–248. [PubMed]
- Schilling CH, Palsson BO. Assessment of the metabolic capabilities of Haemophilus influenzae Rd through a genome-scale pathway analysis. J Theor Biol. 2000;203(3):249–283. [PubMed]
- Schilling CH, Schuster S, Palsson BO, Heinrich R. Metabolic pathway analysis: basic concepts and scientific applications in the post-genomic era. Biotechnol Prog. 1999;15(3):296–303. [PubMed]
- Schuster S, Fell DA, Dandekar T. A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nat Biotechnol. 2000;18(3):326–332. [PubMed]
- Schuster S, Dandekar T, Fell DA. Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering. Trends Biotechnol. 1999;17(2):53–60. [PubMed]
- Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A. et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19(4):524–531. [PubMed]
- Finney A, Hucka M. Systems biology markup language: Level 2 and beyond. Biochemical Society transactions. 2003;31(Pt 6):1472–1473. [PubMed]
- Hermjakob H, Montecchi-Palazzi L, Bader G, Wojcik J, Salwinski L, Ceol A, Moore S, Orchard S, Sarkans U, von Mering C. et al. The HUPO PSI's molecular interaction format - a community standard for the representation of protein interaction data. Nat Biotechnol. 2004;22(2):177–183. [PubMed]
- Murray RP, S RH. Chemical Markup, XML, and the Worldwide Web. 1. Basic Principles. Chem Inf Comput Sci. 1999;39:928–942.
- Murray-Rust P, Rzepa HS, Wright M. Development of Chemical Markup Language (CML) as a System for Handling Complex Chemical Content. New J Chem. 2001. pp. 618–634.
- BioPAX Working group. BioPAX-biological pathways exchange language. Version 10 Documentation. 2004.
- Lloyd CM, Halstead MD, Nielsen PF. CellML: its future, present and past. Progress in biophysics and molecular biology. 2004;85(2-3):433–450. [PubMed]
- Lassila O, Swick R. Resource Description Framework (RDF) Model and Syntax Specification. The World Wide Web Consortium (W3C) MIT, INRIA. 1999.
- RDF vocabulary description language 1.0: RDF Schema. http://www.w3.org/tr/2002/wd-rdf-schema-20020430/
- Cormen TH, Leiserson CE, Rivest Ronald L, Stein C. Introduction to algorithms. Cambridge, Massachusetts 02142: The MIT Press; 2002.
- Huber W, Carey VJ, Long L, Falcon S, Gentleman R. Graphs in molecular biology. BMC Bioinformatics. 2007;8(Suppl 6):S8. [PMC free article] [PubMed]
- Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P. Coexpression analysis of human genes across many microarray data sets. Genome Res. 2004;14(6):1085–1094. [PubMed]
- Schulz H-J, John M, Unger A, Schumann H. Visual analysis of bipartite biological networks. Eurographics Workshop on Visual Computing for Biomedicine. 2008.
- Burgos E, Ceva H, Hernández L, Perazzo RPJ, Devoto M, Medan D. Two classes of bipartite networks: nested biological and social systems. Phys Rev. 2008;78 [PubMed]
- Picard F, Miele V, Daudin J-J, Cottret L, Robin S. Deciphering the connectivity structure of biological networks using MixNet. BMC Bioinformatics. 2009;10 [PMC free article] [PubMed]
- Leclerc RD. Survival of the sparsest: robust gene networks are parsimonious. Mol Syst Biol. 2008;4:213. [PMC free article] [PubMed]
- Dijkstra EW. A note on two problems in connexion with graphs. Numerische Mathematik. 1959;1:269–271.
- Floyd RW. Algorithm 97. Comm ACM. 1962;5-6:345.
- Bron C, Kerbosch J. Algorithm 457: finding all cliques of an undirected graph. Commun ACM (ACM) 1973;16(9):575–577.
- Zhang H, Song X, Wang H, Zhang X. MIClique: An Algorithm to Identify Differentially Coexpressed Disease Gene Subset from Microarray Data. Journal of Biomedicine and Biotechnology. 2009. [PMC free article] [PubMed]
- Voy BH, Scharff JA, Perkins AD, Saxton AM, Borate B, Chesler EJ, Branstetter LK, Langston MA. Extracting Gene Networks for Low-Dose Radiation Using Graph Theoretical Algorithms. PLoS Comput Biol. 2006;2(7) [PubMed]
- Manfield IW, Jen CH, Pinney JW, Michalopoulos I, Bradford JR, Gilmartin PM, Westhead DR. Arabidopsis Co-expression Tool (ACT): web server tools for microarray-based gene expression analysis. Nucleic Acids Res. 2006. pp. W504–509. [PMC free article] [PubMed]
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome biology. 2004;5(10):R80. [PMC free article] [PubMed]
- Ravasz E, Somera A, Mongru D, Oltvai Z, Barabási A-L. Hierarchical organization of modularity in metabolic networks. Science. 2002;297:1551–1555. [PubMed]
- Barabási A-L, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nature Reviews Genetics. 2011;12:56–68. [PMC free article] [PubMed]
- Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. Network motifs: simple building blocks of complex networks. Science. 2002;298(5594):824–827. [PubMed]
- Shen-Orr S, Milo R, Mangan S, Alon U. Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet. 2002;31:64–68. [PubMed]
- Ingram PJ, Stumpf MP, Stark J. Network motifs: structure does not determine function. BMC Genomics. 2006;7:108. [PMC free article] [PubMed]
- Zotenko E, Mestre J, O'Leary DP, Przytycka TM. Why do hubs in the yeast protein interaction network tend to be essential: re-examining the connection between the network topology and essentiality. PLoS Comput Biol. 2008;4:1–16. [PMC free article] [PubMed]
- Levy SF, S ML. Network hubs buffer environmental variation in Saccharomyces cerevisiae. PLoS Biol. 2008;6(11) [PMC free article] [PubMed]
- Ma H-W, Z A-P. The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics. 2003;19(11) [PubMed]
- Mazurie A, Bonchev D, Schwikowski B, Buck GA. Evolution of metabolic network organization. BMC Syst Bio. 2010;4 [PMC free article] [PubMed]
- da Silva MR, Ma H, Zeng A-P. Centrality, Network Capacity, and Modularity as Parameters to Analyze the Core-Periphery Structure in Metabolic Networks. Proceedings of the IEEE. 2008;96(8):1411–1420.
- Rong ZHL, X Lu, W L. Pinning a complex network through the betweenness centrality strategy. Circuits and Systems IEEE International Symposium. 2009. pp. 1689–1692.
- Kitsak M, Havlin S, Paul G, Riccaboni M, Pammolli F, Stanley HE. Betweenness centrality of fractal and nonfractal scale-free model networks and tests on real networks. Phys Rev E. 2007;75 [PubMed]
- Joy MP, Brock A, Ingber DE, Huang S. High-Betweenness Proteins in the Yeast Protein Interaction Network. J Biomed Biotechnol. 2005;2:96–103. [PMC free article] [PubMed]
- Paladugu SR, Zhao S, Ray A, Raval A. Mining protein networks for synthetic genetic interactions. BMC Bioinformatics. 2008;9 [PMC free article] [PubMed]
- Özgür A, Vu T, Erkan G, Radev DR. Identifying gene-disease associations using centrality on a literature mined gene-interaction network. Bioinformatics. 2008;24(13):i277–i285. [PMC free article] [PubMed]
- Chavali S, Barrenas F, Kanduri K, Benson M. Network properties of human disease genes with pleiotropic effects. BMC Syst Bio. 2010;4 [PMC free article] [PubMed]
- Estrada E. Characterization of the folding degree of proteins. Bioinformatics. 2002;18:697–704. [PubMed]
- Estrada E, Uriarte E. Recent advances on the role of topological indices in drug discovery research. Curr Med Chem. 2001;8:1699–1714. [PubMed]
- Estrada E. Generalized walks-based centrality measures for complex biological networks. J Theor Biol. 2010;263(4):556–565. [PubMed]
- Nisbach F, K M. Developmental time windows for spatial growth generate multiple-cluster small-world networks. Eur Phys J B. 2007;58:185–191.
- Costa LdF, Kaiser M, Hilgetag CC. Predicting the connectivity of primate cortical networks from topological and spatial node properties. BMC Syst Bio. 2007;1 [PMC free article] [PubMed]
- Jeong H, Mason SP, Barabasi A-L, Oltvai ZN. Lethality and centrality in protein networks. Nature. 2001;411(6833):41–42. [PubMed]
- Hahn M, K A. Comparative genomics of centrality and essentiality in three eukaryotic protein-protein interaction networks. Mol Biol Evol. 2005;22:803–806. [PubMed]
- Koschützki D, S F. Comparison of Centralities for Biological Networks. Proc German Conf Bioinformatics (GCB'04) 2004;P-53 of LNI
- Junker BH, Koschützki D, Schreiber F. Exploration of biological network centralities with CentiBiN. BMC Bioinformatics. 2006;7 [PMC free article] [PubMed]
- Baur M, Benkert M, Brandes U, Cornelsen S, Gaertler M, Köpf B, Lerner J, Wagner D. visone - Software for Visual Social Network Analysis. Proc 9th Intl Symp Graph Drawing (GD '01), LNCS. 2002;2265:463–464.
- Batagelj V, Mrvar A. Pajek - Program for Large Network Analysis. Connections. 1998;21:47–57.
- Hu Z, Mellor J, Wu J, Yamada T, Holloway D, DeLisi C. VisANT: data-integrating visual framework for biological networks and modules. Nucleic Acids Res. 2005;33:W352–W357. [PMC free article] [PubMed]
- Albert R. Scale-free networks in cell biology. Journal of Cell Science. 2005;118 [PubMed]
- Lima-Mendez G, van Helden J. The powerful law of the power law and other myths in network biology. Mol Biosyst. 2009;5(12):1482–1493. [PubMed]
- Newman MEJ. Assortative Mixing in Networks. Phys Rev Lett. 2002;89(208701) [PubMed]
- Newman MEJ. Mixing patterns in networks. Phys Rev. 2003;67
- Redner S. Networks: teasing out the missing links. Nature. 2008;453:47–48. [PubMed]
- Erdös P, R A. On the strength of connectedness of a random graph. Acta Math Acad Sci Hungar. 1961;12:261–267.
- Watts DJ, S SH. Collective dynamics of 'small-world' networks. Nature. 1998;393:440–442. [PubMed]
- Barabási A-L, A R. Emergence of scaling in random networks. Science. 1999;286:509–512. [PubMed]
- Berg J, Lassig M, Wagner A. Structure and evolution of protein interaction networks: a statistical model for link dynamics and gene duplications. BMC Evol Biol. 2004;4(1):51. [PMC free article] [PubMed]
- Yamada T, B P. Evolution of biomolecular networks - lessons from metabolic and protein interactions. Nature Rev Molecular Cell Biology. 2009;10:791–803. [PubMed]
- Jain AK, Murty MN, Flynn PJ. Data Clustering: A Review. ACM Computing Surveys (CSUR) 1999;31(3):264–323.
- Duda RO, Hart PE, Stork DG. Pattern Classification, ch.10: Unsupervised learning and clustering. Wiley, New York. 2001. p. 571.
- Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4(4):406–425. [PubMed]
- Borate BR, Chesler EJ, Langston MA, Saxton AM, Voy BH. Comparison of threshold selection methods for microarray gene co-expression matrices. BMC Res Notes. 2009;2(240) [PMC free article] [PubMed]
- Perkins AD, L MA. Threshold selection in gene co-expression networks using spectral graph theory techniques. BMC Bioinformatics. 2009;10 [PMC free article] [PubMed]
- Quackenbush J. Computational genetics: Computational analysis of microarray data. Nat Rev Genetics. 2001;2:418–427. [PubMed]
- Milligan Glenn W, Cooper MC. Methodology Review: Clustering Methods. Applied Psychological Measurement. 1987;11(4):329–354.
- Sneath PHA, Sokal RR. Numerical Taxonomy. San Francisco: Freeman; 1973. Unweighted Pair Group Method with Arithmetic Mean; pp. 230–234.
- Michener CD, Sokal RR. A Quantitative Approach to a Problem in Classification. Evolution. 1957;11(2):130–162.
- Gascuel O, Steel M. Neighbor-joining revealed. Mol Biol Evol. 2006;23(11):1997–2000. [PubMed]
- D'andrade R. U-Statistic Hierarchical Clustering. Psychometrika. 1978;4:58–67.
- Johnson SC. Hierarchical Clustering Schemes. Psychometrika. 1967;2:241–254. [PubMed]
- Seo J, Shneiderman B. Interactively Exploring Hierarchical Clustering Results. Computer. 2002;35(7):80–86.
- Seo J, Gordish-Dressman H, Hoffman EP. An interactive power analysis tool for microarray hypothesis testing and generation. Bioinformatics. 2006;22(7):808–814. [PubMed]
- Kumar S, Tamura K, Nei M. MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform. 2004;5(2):150–163. [PubMed]
- Tamura K, J D, Nei M, S K. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution. 2007;24:1596–1599. [PubMed]
- Kumar S, Tamura K, Jakobsen I, Nei M. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics. 2001;17(12):1244–1245. [PubMed]
- Kumar S, Tamura K, Nei M. MEGA: Molecular Evolutionary Genetics Analysis software for microcomputers. Comput Appl Biosci. 1994;10(2):189–191. [PubMed]
- Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M. et al. TM4: a free, open-source system for microarray data management and analysis. BioTechniques. 2003;34(2):374–378. [PubMed]
- Pavlopoulos GA, Soldatos TG, Barbosa-Silva A, Schneider R. A reference guide for tree analysis and visualization. BioData Min. 2010;3(1):1. [PMC free article] [PubMed]
- Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30(7):1575–1584. [PMC free article] [PubMed]
- Moschopoulos CN, Pavlopoulos GA, Schneider R, Likothanassis SD, Kossida S. GIBA: a clustering tool for detecting protein complexes. BMC Bioinformatics. 2009;10(Suppl 6):S11. [PMC free article] [PubMed]
- Gao L, Sun PG, Song J. Clustering algorithms for detecting functional modules in protein interaction networks. J Bioinform Comput Biol. 2009;7(1):217–242. [PubMed]
- Zhong W, Altun G, Harrison R, Tai PC, Pan Y. Improved K-means clustering algorithm for exploring local protein sequence motifs representing common structural property. IEEE Trans Nanobioscience. 2005;4(3):255–265. [PubMed]
- van Dogen S. PhD thesis. University of Utrecht; 2000. Graph Clustering by Flow Simulation.
- Vlasblom J, Wodak SJ. Markov clustering versus affinity propagation for the partitioning of protein interaction graphs. BMC Bioinformatics. 2009;10:99. [PMC free article] [PubMed]
- Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30(7):1575–1584. [PMC free article] [PubMed]
- MacQueen B. Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability. Vol. 1. Berkeley, University of California Press; 1967. Some Methods for classification and Analysis of Multivariate Observations; pp. 281–297.
- Lu Y, Lu S, Fotouhi F, Deng Y, Brown SJ. Incremental genetic K-means algorithm and its application in gene expression data analysis. BMC Bioinformatics. 2004;5:172. [PMC free article] [PubMed]
- Frey BJ, Dueck D. Clustering by passing messages between data points. Science. 2007;315(5814):972–976. [PubMed]
- King AD, Przulj N, Jurisica I. Protein complex prediction via cost-based clustering. Bioinformatics. 2004;20(17):3013–3020. [PubMed]
- Paccanaro A, Casbon JA, Saqi MA. Spectral clustering of protein sequences. Nucleic Acids Res. 2006;34(5):1571–1580. [PMC free article] [PubMed]
- Li X, Wu M, Kwoh CK, Ng SK. Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC Genomics. 2010;11(Suppl 1):S3. [PMC free article] [PubMed]
- Brohee S, van Helden J. Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics. 2006;7:488. [PMC free article] [PubMed]
- Pavlopoulos GA, Wegener AL, Schneider R. A survey of visualization tools for biological network analysis. BioData Min. 2008;1:12. [PMC free article] [PubMed]
- Brohee S, Faust K, Lima-Mendez G, Sand O, Janky R, Vanderstocken G, Deville Y, van Helden J. NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways. Nucleic Acids Res. 2008. pp. W444–451. [PMC free article] [PubMed]
- Pavlopoulos GA, Moschopoulos CN, Hooper SD, Schneider R, Kossida S. jClust: a clustering and visualization toolbox. Bioinformatics. 2009;25(15):1994–1996. [PMC free article] [PubMed]
- Yoshida R, Higuchi T, Imoto S, Miyano S. ArrayCluster: an analytic tool for clustering, data visualization and module finder on gene expression profiles. Bioinformatics. 2006;22:1538–1539. [PubMed]
- Hooper SD, Bork P. Medusa: a simple tool for interaction graph analysis. Bioinformatics. 2005;21(24):4432–4433. [PubMed]
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13(11):2498–2504. [PubMed]
- Pavlopoulos GA, O'Donoghue SI, Satagopam VP, Soldatos TG, Pafilis E, Schneider R. Arena3D: visualization of biological networks in 3D. BMC systems biology. 2008;2:104. [PMC free article] [PubMed]
- Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P. et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000;403(6770):623–627. [PubMed]
- Rain JC, Selig L, De Reuse H, Battaglia V, Reverdy C, Simon S, Lenzen G, Petel F, Wojcik J, Schachter V. et al. The protein-protein interaction map of Helicobacter pylori. Nature. 2001;409(6817):211–215. [PubMed]
- Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E. et al. A protein interaction map of Drosophila melanogaster. Science. 2003;302(5651):1727–1736. [PubMed]
- Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JD, Chesneau A, Hao T. et al. A map of the interactome network of the metazoan C. elegans. Science. 2004;303(5657):540–543. [PMC free article] [PubMed]
- von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P. Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002;417(6887):399–403. [PubMed]
- Raman K. Construction and analysis of protein-protein interaction networks. Autom Exp. 2010;2(1):2. [PMC free article] [PubMed]
- Salgado H, Santos-Zavaleta A, Gama-Castro S, Peralta-Gil M, Penaloza-Spinola MI, Martinez-Antonio A, Karp PD, Collado-Vides J. The comprehensive updated regulatory network of Escherichia coli K-12. BMC Bioinformatics. 2006;7:5. [PMC free article] [PubMed]
- Salgado H, Gama-Castro S, Peralta-Gil M, Diaz-Peredo E, Sanchez-Solano F, Santos-Zavaleta A, Martinez-Flores I, Jimenez-Jacinto V, Bonavides-Martinez C, Segura-Salazar J, RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions. Nucleic Acids Res. 2006. pp. D394–397. [PMC free article] [PubMed]
- Lozada-Chavez I, Janga SC, Collado-Vides J. Bacterial regulatory networks are extremely flexible in evolution. Nucleic Acids Res. 2006;34(12):3434–3445. [PMC free article] [PubMed]
- Madan Babu M, Teichmann SA, Aravind L. Evolutionary dynamics of prokaryotic transcriptional regulatory networks. J Mol Biol. 2006;358(2):614–633. [PubMed]
- Sneppen Kim, Zocchi G. Physics in Molecular Biology. Giovanni Zocchi; 2005.
- van Nimwegen E. Scaling laws in the functional content of genomes. Trends Genet. 2003;19(9):479–484. [PubMed]
- Bhalla US, Iyengar R. Emergent properties of networks of biological signaling pathways. Science. 1999;283(5400):381–387. [PubMed]
- Junker Björn H, Schreiber F. Analysis of Biological Networks. 2008.
- Guelzim N, Bottani S, Bourgine P, Kepes F. Topological and causal structure of the yeast transcriptional regulatory network. Nat Genet. 2002;31(1):60–63. [PubMed]
- Ma H, Zeng AP. Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms. Bioinformatics. 2003;19(2):270–277. [PubMed]
- Jeong H, Tombor B, Albert R, Oltvai ZN, Barabási A-L. The large-scale organization of metabolic networks. Nature. 2000;407(6804):651–654. [PubMed]
- Gagneur J, Jackson DB, Casari G. Hierarchical analysis of dependency in metabolic networks. Bioinformatics. 2003;19(8):1027–1034. [PubMed]
- Holme P, Huss M, Jeong H. Subnetwork hierarchies of biochemical pathways. Bioinformatics. 2003;19(4):532–538. [PubMed]

Articles from BioData Mining are provided here courtesy of **BioMed Central**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |