# Related Articles

Background

Successive whole genome duplications have recently been firmly established in all major eukaryote kingdoms. Such exponential evolutionary processes must have largely contributed to shape the topology of protein-protein interaction (PPI) networks by outweighing, in particular, all time-linear network growths modeled so far.

Results

We propose and solve a mathematical model of PPI network evolution under successive genome duplications. This demonstrates, from first principles, that evolutionary conservation and scale-free topology are intrinsically linked properties of PPI networks and emerge from i) prevailing exponential network dynamics under duplication and ii) asymmetric divergence of gene duplicates. While required, we argue that this asymmetric divergence arises, in fact, spontaneously at the level of protein-binding sites. This supports a refined model of PPI network evolution in terms of protein domains under exponential and asymmetric duplication/divergence dynamics, with multidomain proteins underlying the combinatorial formation of protein complexes. Genome duplication then provides a powerful source of PPI network innovation by promoting local rearrangements of multidomain proteins on a genome wide scale. Yet, we show that the overall conservation and topology of PPI networks are robust to extensive domain shuffling of multidomain proteins as well as to finer details of protein interaction and evolution. Finally, large scale features of direct and indirect PPI networks of S. cerevisiae are well reproduced numerically with only two adjusted parameters of clear biological significance (i.e. network effective growth rate and average number of protein-binding domains per protein).

Conclusion

This study demonstrates the statistical consequences of genome duplication and domain shuffling on the conservation and topology of PPI networks over a broad evolutionary scale across eukaryote kingdoms. In particular, scale-free topologies of PPI networks, which are found to be robust to extensive shuffling of protein domains, appear to be a simple consequence of the conservation of protein-binding domains under asymmetric duplication/divergence dynamics in the course of evolution.

doi:10.1186/1752-0509-1-49

PMCID: PMC2245809
PMID: 17999763

In this work, we introduce a novel network synthesis model that can generate families of evolutionarily related synthetic protein–protein interaction (PPI) networks. Given an ancestral network, the proposed model generates the network family according to a hypothetical phylogenetic tree, where the descendant networks are obtained through duplication and divergence of their ancestors, followed by network growth using network evolution models. We demonstrate that this network synthesis model can effectively create synthetic networks whose internal and cross-network properties closely resemble those of real PPI networks. The proposed model can serve as an effective framework for generating comprehensive benchmark datasets that can be used for reliable performance assessment of comparative network analysis algorithms. Using this model, we constructed a large-scale network alignment benchmark, called NAPAbench, and evaluated the performance of several representative network alignment algorithms. Our analysis clearly shows the relative performance of the leading network algorithms, with their respective advantages and disadvantages. The algorithm and source code of the network synthesis model and the network alignment benchmark NAPAbench are publicly available at http://www.ece.tamu.edu/bjyoon/NAPAbench/.

doi:10.1371/journal.pone.0041474

PMCID: PMC3418285
PMID: 22912671

We model the evolution of eukaryotic protein-protein interaction (PPI) networks. In our model, PPI networks evolve by two known biological mechanisms: (1) Gene duplication, which is followed by rapid diversification of duplicate interactions. (2) Neofunctionalization, in which a mutation leads to a new interaction with some other protein. Since many interactions are due to simple surface compatibility, we hypothesize there is an increased likelihood of interacting with other proteins in the target protein’s neighborhood. We find good agreement of the model on 10 different network properties compared to high-confidence experimental PPI networks in yeast, fruit flies, and humans. Key findings are: (1) PPI networks evolve modular structures, with no need to invoke particular selection pressures. (2) Proteins in cells have on average about 6 degrees of separation, similar to some social networks, such as human-communication and actor networks. (3) Unlike social networks, which have a shrinking diameter (degree of maximum separation) over time, PPI networks are predicted to grow in diameter. (4) The model indicates that evolutionarily old proteins should have higher connectivities and be more centrally embedded in their networks. This suggests a way in which present-day proteomics data could provide insights into biological evolution.

doi:10.1371/journal.pone.0039052

PMCID: PMC3387198
PMID: 22768057

Many real-world phenomena have been described in terms of large networks. Networks have been invaluable models for the understanding of biological systems. Since proteins carry out most biological processes, we focus on analysing protein–protein interaction (PPI) networks. Proteins interact to perform a function. Thus, PPI networks reflect the interconnected nature of biological processes and analysing their structural properties could provide insights into biological function and disease. We have already demonstrated, by using a sensitive graph theoretic method for comparing topologies of node neighbourhoods called ‘graphlet degree signatures’, that proteins with similar surroundings in PPI networks tend to perform the same functions. Here, we explore whether the involvement of genes in cancer suggests the similarity of their topological ‘signatures’ as well. By applying a series of clustering methods to proteins' topological signature similarities, we demonstrate that the obtained clusters are significantly enriched with cancer genes. We apply this methodology to identify novel cancer gene candidates, validating 80 per cent of our predictions in the literature. We also validate predictions biologically by identifying cancer-related negative regulators of melanogenesis identified in our siRNA screen. This is encouraging, since we have done this solely from PPI network topology. We provide clear evidence that PPI network structure around cancer genes is different from the structure around non-cancer genes. Understanding the underlying principles of this phenomenon is an open question, with a potential for increasing our understanding of complex diseases.

doi:10.1098/rsif.2009.0192

PMCID: PMC2842789
PMID: 19625303

Keywords: biological networks; protein interaction networks; network topology; cancer gene identification

Background

Based on the distinguishing properties of protein-protein interaction networks such as power-law degree distribution and modular structure, several stochastic models for the evolution of these networks have been proposed, motivated by the idea that a validated model should reproduce the topological properties of the empirical network. However, capturing topological properties does not necessarily mean that a model correctly reproduces how networks emerge and evolve. More importantly, there is already evidence suggesting functional organization and significance of these networks. The current stochastic models of evolution, however, grow the network without consideration for biological function and natural selection.

Results

To test whether protein interaction networks are functionally organized and how this organization affects their evolution, we analyzed their evolution at both the topological and functional level. We find that the human network is functionally organized, and that its function evolves with the topological properties of the network. Our analysis suggests that function most likely affects local modularity of the network. Consistently, we further found that the topological unit is also the functional unit of the network.

Conclusion

We have demonstrated functional organization of a protein interaction network. Given our observations, we suggest that its significance should not be overlooked when studying network evolution.

doi:10.1186/1471-2164-13-150

PMCID: PMC3375200
PMID: 22530615

The topology behind biological interaction networks has been studied for over a decade. Yet, there is no definite agreement on the theoretical models which best describe protein-protein interaction (PPI) networks. Such models are critical to quantifying the significance of any empirical observation regarding those networks. Here, we perform a comprehensive analysis of yeast PPI networks in order to gain insights into their topology and its dependency on interaction-screening technology. We find that: (1) interaction-detection technology has little effect on the topology of PPI networks; (2) topology of these interaction networks differs in organisms with different cellular complexity (human and yeast); (3) clear topological difference is present between PPI networks, their functional sub-modules, and their inter-functional “linkers”; (4) high confidence PPI networks have more “geometrical” topology compared to predicted, incomplete, or noisy PPI networks; and (5) inter-functional “linker” proteins serve as mediators in signal transduction, transport, regulation and organisational cellular processes.

doi:10.1038/srep04273

PMCID: PMC3940977
PMID: 24589662

The (asymptotic) degree distributions of the best-known “scale-free” network models are all similar and are independent of the seed graph used; hence, it has been tempting to assume that networks generated by these models are generally similar. In this paper, we observe that several key topological features of such networks depend heavily on the specific model and the seed graph used. Furthermore, we show that starting with the “right” seed graph (typically a dense subgraph of the protein–protein interaction network analyzed), the duplication model captures many topological features of publicly available protein–protein interaction networks very well.

Author Summary

The interactions among proteins in an organism can be represented as a protein–protein interaction (PPI) network, where each protein is represented with a node, and each interaction is represented with an edge between two nodes. As PPI networks of several model organisms become available, their topological features attract considerable attention. It is believed that the available PPI networks are (1) “small-world” networks, and (2) their degree distribution is in the form of a “power law.” In other words, (1) it is possible to reach from a protein to any other protein in only a small (approximately six) number of hops, and (2) although most proteins have only a few interactions (one or two), there are a few proteins with many more interactions (200 or more) and that act as “hubs.” It has thus been tempting to develop simple mathematical network generators with topological features similar to those of the available PPI networks. One such model, the “duplication model,” is based on Ohno's model of genome growth. It starts with a small “seed network” and grows by “duplicating” one of the existing nodes at a time, with an identical set of interactions; a randomly selected subset of these interactions is then deleted, and a few new interactions are added at random. It has been mathematically proven that the duplication model provides a small-world network and also has a power-law degree distribution. What we show in this paper is that by choosing the “right” seed network, many other topological features of the available PPI networks can be captured by the duplication model. The right seed network in this case turns out to include two sizable “cliques” (subnetworks where all node pairs are connected) with many interactions in between. 
In this paper, we also consider the preferential attachment model, which again grows by adding to a seed network one node at a time and connecting the new node to every other node with probability proportional to the existing degree of the second node. Because the preferential attachment model also provides a small-world network and has a power-law degree distribution, it has been considered equivalent to the duplication model. We show that the two models are vastly different in terms of other topological features we consider, and the preferential attachment model cannot capture some key features of the available PPI networks.
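The two growth rules described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the retention probability `p_keep`, the random-addition probability `p_add`, and the scaling constant `m` are assumptions introduced here, and the seed graph in the usage note is a plain triangle rather than the clique-rich seed the summary describes.

```python
import random

def duplication_step(adj, rng, p_keep=0.5, p_add=0.05):
    """Grow the network by one node using a duplication rule.

    adj maps node -> set of neighbours (undirected, nodes labelled 0..n-1).
    p_keep and p_add are hypothetical parameters, not values from the paper.
    """
    new = len(adj)
    parent = rng.randrange(new)
    # Copy the parent's interactions, then delete a random subset of them.
    kept = {v for v in sorted(adj[parent]) if rng.random() < p_keep}
    # Add a few new interactions at random.
    extra = {v for v in range(new) if v not in kept and rng.random() < p_add}
    adj[new] = kept | extra
    for v in adj[new]:
        adj[v].add(new)
    return new

def preferential_attachment_step(adj, rng, m=2):
    """Grow the network by one node; link to each existing node with
    probability proportional to its degree (scaled by the assumed constant m)."""
    new = len(adj)
    total_degree = sum(len(nb) for nb in adj.values())
    chosen = {
        v for v in range(new)
        if rng.random() < min(1.0, m * len(adj[v]) / total_degree)
    }
    adj[new] = chosen
    for v in chosen:
        adj[v].add(new)
    return new

def grow(seed_adj, n_nodes, step, seed=0):
    """Apply `step` repeatedly to a copy of the seed graph until it has n_nodes nodes."""
    rng = random.Random(seed)
    adj = {u: set(nb) for u, nb in seed_adj.items()}
    while len(adj) < n_nodes:
        step(adj, rng)
    return adj
```

For example, `grow({0: {1, 2}, 1: {0, 2}, 2: {0, 1}}, 200, duplication_step)` grows a triangle seed to 200 nodes; swapping in `preferential_attachment_step` gives a network from the other model for comparison.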

doi:10.1371/journal.pcbi.0030118

PMCID: PMC1913096
PMID: 17616981

Cellular functions are based on the complex interplay of proteins; therefore, the structure and dynamics of these protein-protein interaction (PPI) networks are the key to the functional understanding of cells. In recent years, large-scale PPI networks of several model organisms have been investigated. A number of theoretical models have been developed to explain both the network formation and the current structure. Favored are models based on duplication and divergence of genes, as they most closely represent the biological foundation of network evolution. However, studies are often based on simulated instead of empirical data or they cover only single organisms. Methodological improvements now allow the analysis of PPI networks of multiple organisms simultaneously as well as the direct modeling of ancestral networks. This provides the opportunity to challenge existing assumptions on network evolution. We utilized present-day PPI networks from integrated datasets of seven model organisms and developed a theoretical and bioinformatic framework for studying the evolutionary dynamics of PPI networks. A novel filtering approach using percolation analysis was developed to remove low confidence interactions based on topological constraints. We then reconstructed the ancient PPI networks of different ancestors, for which the ancestral proteomes, as well as the ancestral interactions, were inferred. Ancestral proteins were reconstructed using orthologous groups on different evolutionary levels. A stochastic approach, using the duplication-divergence model, was developed for estimating the probabilities of ancient interactions from today's PPI networks. The growth rates for nodes, edges, sizes and modularities of the networks indicate multiplicative growth and are consistent with the results from independent static analysis.
Our results support the duplication-divergence model of evolution and indicate fractality and multiplicative growth as general properties of the PPI network structure and dynamics.

doi:10.1371/journal.pone.0058134

PMCID: PMC3603955
PMID: 23526967

Background

In many protein-protein interaction (PPI) networks, densely connected hub proteins are more likely to be essential proteins. This is referred to as the "centrality-lethality rule", which indicates that the topological placement of a protein in PPI network is connected with its biological essentiality. Though such connections are observed in many PPI networks, the underlying topological properties for these connections are not yet clearly understood. Some suggested putative connections are the involvement of essential proteins in the maintenance of overall network connections, or that they play a role in essential protein clusters. In this work, we have attempted to examine the placement of essential proteins and the network topology from a different perspective by determining the correlation of protein essentiality and reverse nearest neighbor topology (RNN).

Results

The RNN topology is a weighted directed graph derived from the PPI network, and it is a natural representation of the topological dependences between proteins within the PPI network. As in the original PPI network, we have observed that essential proteins tend to be hub proteins in RNN topology. Additionally, essential genes are enriched in clusters containing many hub proteins in RNN topology (RNN protein clusters). Based on these two properties of essential genes in RNN topology, we have proposed a new measure: the RNN cluster centrality. Results from a variety of PPI networks demonstrate that RNN cluster centrality outperforms other centrality measures with regard to the proportion of selected proteins that are essential proteins. We also investigated the biological importance of RNN clusters.

Conclusions

This study reveals that RNN cluster centrality provides the best correlation of protein essentiality and placement of proteins in PPI network. Additionally, merged RNN clusters were found to be topologically important in that essential proteins are significantly enriched in RNN clusters, and biologically important because they play an important role in many Gene Ontology (GO) processes.

doi:10.1186/1471-2105-11-505

PMCID: PMC3098085
PMID: 20939873

Motivation: Much of the large-scale molecular data from living cells can be represented in terms of networks. Such networks occupy a central position in cellular systems biology. In the protein–protein interaction (PPI) network, nodes represent proteins and edges represent connections between them, based on experimental evidence. As PPI networks are rich and complex, a mathematical model is sought to capture their properties and shed light on PPI evolution. The mathematical literature contains various generative models of random graphs. It is a major, still largely open question, which of these models (if any) can properly reproduce various biologically interesting networks. Here, we consider this problem where the graph at hand is the PPI network of Saccharomyces cerevisiae. We try to distinguish between a model family which performs a process of copying neighbors, represented by the duplication–divergence (DD) model, and models which do not copy neighbors, with the Barabási–Albert (BA) preferential attachment model as a leading example.

Results: The observed property of the network is the distribution of maximal bicliques in the graph. This is a novel criterion to distinguish between models in this area. It is particularly appropriate for this purpose, since it reflects the graph's growth pattern under either model. This test clearly favors the DD model. In particular, for the BA model, the vast majority (92.9%) of the bicliques with both sides ≥4 must be already embedded in the model's seed graph, whereas the corresponding figure for the DD model is only 5.1%. Our results, based on the biclique perspective, conclusively show that a naïve unmodified DD model can capture a key aspect of PPI networks.
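One way to see why bicliques track a neighbor-copying growth pattern: immediately after a node is duplicated (before any divergence), the parent and its copy share every neighbor, so the pair together with those shared neighbors forms a complete bipartite subgraph. The sketch below only illustrates that observation; it is not the authors' biclique enumeration procedure, and `duplicate_node` and `is_biclique` are names introduced here.

```python
def duplicate_node(adj, parent):
    """Add a copy of `parent` with an identical neighbour set (no divergence).

    adj maps node -> set of neighbours in an undirected graph with integer labels.
    """
    new = max(adj) + 1
    adj[new] = set(adj[parent])
    for v in adj[new]:
        adj[v].add(new)
    return new

def is_biclique(adj, left, right):
    """True if every node in `left` is adjacent to every node in `right`."""
    return all(r in adj[l] for l in left for r in right)

# A star graph: node 0 is linked to nodes 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
copy = duplicate_node(star, 0)
# Parent and copy on one side, their shared neighbours on the other.
shares_all_neighbours = is_biclique(star, {0, copy}, {1, 2, 3, 4})
```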

Contact: regevs01@cs.huji.ac.il; michall@cc.huji.ac.il; nati@cs.huji.ac.il

Supplementary information: Supplementary data are available at Bioinformatics online.

doi:10.1093/bioinformatics/btr201

PMCID: PMC3117378
PMID: 21685063

Gene duplication with subsequent interaction divergence is one of the primary driving forces in the evolution of genetic systems. Yet little is known about the precise mechanisms and the role of duplication divergence in the evolution of protein networks from the prokaryote and eukaryote domains. We developed a novel, model-based approach for Bayesian inference on biological network data that centres on approximate Bayesian computation, or likelihood-free inference. Instead of computing the intractable likelihood of the protein network topology, our method summarizes key features of the network and, based on these, uses an MCMC algorithm to approximate the posterior distribution of the model parameters. This allowed us to reliably fit a flexible mixture model that captures hallmarks of evolution by gene duplication and subfunctionalization to protein interaction network data of Helicobacter pylori and Plasmodium falciparum. The 80% credible intervals for the duplication–divergence component are [0.64, 0.98] for H. pylori and [0.87, 0.99] for P. falciparum. The remaining parameter estimates are not inconsistent with sequence data. An extensive sensitivity analysis showed that incompleteness of PIN data does not largely affect the analysis of models of protein network evolution, and that the degree sequence alone barely captures the evolutionary footprints of protein networks relative to other statistics. Our likelihood-free inference approach enables a fully Bayesian analysis of a complex and highly stochastic system that is otherwise intractable at present. Modelling the evolutionary history of PIN data, it transpires that only the simultaneous analysis of several global aspects of protein networks enables credible and consistent inference to be made from available datasets.
Our results indicate that gene duplication has played a larger part in the network evolution of the eukaryote than in the prokaryote, and suggest that single gene duplications with immediate divergence alone may explain more than 60% of biological network data in both domains.
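The likelihood-free idea can be illustrated with a toy example. The sketch below deliberately swaps the MCMC sampler mentioned above for plain rejection sampling, which is simpler but less efficient, and it uses a single summary statistic (mean degree) purely for brevity, even though the abstract argues that several statistics should be combined. The simulator, the uniform prior, and all names and default values are assumptions made for illustration.

```python
import random

def simulate_mean_degree(p_keep, n_nodes, rng):
    """Toy duplication-divergence simulator returning the mean degree.

    Each new node copies a random parent and keeps each of the parent's
    links with probability p_keep (a hypothetical model parameter).
    """
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_nodes:
        new = len(adj)
        parent = rng.randrange(new)
        kept = {v for v in sorted(adj[parent]) if rng.random() < p_keep}
        adj[new] = kept
        for v in kept:
            adj[v].add(new)
    return sum(len(nb) for nb in adj.values()) / len(adj)

def abc_rejection(observed, n_draws, eps, n_nodes=60, seed=0):
    """Approximate the posterior of p_keep without evaluating a likelihood.

    Draw p_keep from a uniform prior on [0, 1], simulate a network, and keep
    the draw when the simulated summary is within eps of the observed one.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()
        simulated = simulate_mean_degree(theta, n_nodes, rng)
        if abs(simulated - observed) <= eps:
            accepted.append(theta)
    return accepted
```

The accepted draws approximate the posterior; shrinking `eps` tightens the approximation at the cost of fewer accepted samples.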

Author Summary

The importance of gene duplication to biological evolution has been recognized since the 1930s. For more than a decade, substantial evidence has been collected from genomic sequence data in order to elucidate the importance and the mechanisms of gene duplication; however, most biological characteristics arise from complex interactions between the cell's numerous constituents. Recently, preliminary descriptions of the protein interaction networks have become available for species of different domains. Adapting novel techniques in stochastic simulation, the authors demonstrate that evolutionary inferences can be drawn from large-scale, incomplete network data by fitting a stochastic model of network growth that captures hallmarks of evolution by duplication and divergence. They have also analyzed the effect of summarizing protein networks in different ways, and show that a reliable and consistent analysis requires many aspects of network data to be considered jointly; in contrast to what is commonly done in practice. Their results indicate that duplication and divergence has played a larger role in the network evolution of the eukaryote P. falciparum than in the prokaryote H. pylori, and emphasize at least for the eukaryote the potential importance of subfunctionalization in network evolution.

doi:10.1371/journal.pcbi.0030230

PMCID: PMC2098858
PMID: 18052538

Background

In recent years, a considerable amount of research effort has been directed to the analysis of biological networks with the availability of genome-scale networks of genes and/or proteins of an increasing number of organisms. A protein-protein interaction (PPI) network is a particular biological network which represents physical interactions between pairs of proteins of an organism. Major research on PPI networks has focused on understanding the topological organization of PPI networks, evolution of PPI networks and identification of conserved subnetworks across different species, discovery of modules of interaction, use of PPI networks for functional annotation of uncharacterized proteins, and improvement of the accuracy of currently available networks.

Results

In this article, we map known functional annotations of proteins onto a PPI network in order to identify frequently occurring interaction patterns in the functional space. We propose a new frequent pattern identification technique, PPISpan, adapted specifically for PPI networks from a well-known frequent subgraph identification method, gSpan. Existing module discovery techniques either look for specific clique-like highly interacting protein clusters or linear paths of interaction. However, our goal is different; instead of single clusters or pathways, we look for recurring functional interaction patterns in arbitrary topologies. We have applied PPISpan on PPI networks of Saccharomyces cerevisiae and identified a number of frequently occurring functional interaction patterns.

Conclusion

With the help of PPISpan, recurring functional interaction patterns in an organism's PPI network can be identified. Such an analysis offers a new perspective on the modular organization of PPI networks. The complete list of identified functional interaction patterns is available at .

doi:10.1186/1471-2105-9-276

PMCID: PMC2442100
PMID: 18547430

Background

Recent advancements in experimental biotechnology have produced large amounts of protein-protein interaction (PPI) data. The topology of PPI networks is believed to have a strong link to their function. Hence, the abundance of PPI data for many organisms stimulates the development of computational techniques for the modeling, comparison, alignment, and clustering of networks. In addition, finding representative models for PPI networks will improve our understanding of the cell just as a model of gravity has helped us understand planetary motion. To decide if a model is representative, we need quantitative comparisons of model networks to real ones. However, exact network comparison is computationally intractable and therefore several heuristics have been used instead. Some of these heuristics are easily computable "network properties," such as the degree distribution, or the clustering coefficient. An important special case of network comparison is the network alignment problem. Analogous to sequence alignment, this problem asks to find the "best" mapping between regions in two networks. It is expected that network alignment might have as strong an impact on our understanding of biology as sequence alignment has had. Topology-based clustering of nodes in PPI networks is another example of an important network analysis problem that can uncover relationships between interaction patterns and phenotype.

Results

We introduce the GraphCrunch 2 software tool, which addresses these problems. It is a significant extension of GraphCrunch which implements the most popular random network models and compares them with the data networks with respect to many network properties. Also, GraphCrunch 2 implements the GRAph ALigner algorithm ("GRAAL") for purely topological network alignment. GRAAL can align any pair of networks and exposes large, dense, contiguous regions of topological and functional similarities far larger than any other existing tool. Finally, GraphCrunch 2 implements an algorithm for clustering nodes within a network based solely on their topological similarities. Using GraphCrunch 2, we demonstrate that eukaryotic and viral PPI networks may belong to different graph model families and show that topology-based clustering can reveal important functional similarities between proteins within yeast and human PPI networks.

Conclusions

GraphCrunch 2 is a software tool that implements the latest research on biological network analysis. It parallelizes computationally intensive tasks to fully utilize the potential of modern multi-core CPUs. It is open-source and freely available for research use. It runs under the Windows and Linux platforms.

doi:10.1186/1471-2105-12-24

PMCID: PMC3036622
PMID: 21244715

Background

A goal of systems biology is to analyze large-scale molecular networks including gene expressions and protein-protein interactions, revealing the relationships between network structures and their biological functions. Dividing a protein-protein interaction (PPI) network into naturally grouped parts is an essential way to investigate the relationship between topology of networks and their functions. However, clear modular decomposition is often hard due to the heterogeneous or scale-free properties of PPI networks.

Methodology/Principal Findings

To address this problem, we propose a diffusion model-based spectral clustering algorithm, which analytically solves the cluster structure of PPI networks as a problem of random walks in the diffusion process in them. To cope with the heterogeneity of the networks, the power factor is introduced to adjust the diffusion matrix by weighting the transition (adjacency) matrix according to a node degree matrix. This algorithm is named adjustable diffusion matrix-based spectral clustering (ADMSC). To demonstrate the feasibility of ADMSC, we apply it to decomposition of a yeast PPI network, identifying biologically significant clusters with approximately equal size. Compared with other established algorithms, ADMSC facilitates clear and fast decomposition of PPI networks.
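The degree-weighting step can be sketched as follows. The abstract does not give the exact formula, so the form used here (each row of the adjacency matrix divided by the node degree raised to a power factor `beta`) is an assumption chosen for illustration; the published ADMSC definition may differ, and the spectral clustering step itself is omitted.

```python
def adjusted_diffusion_matrix(adj_matrix, beta):
    """Weight an adjacency matrix by node degree raised to a power factor.

    Assumed form: M[i][j] = A[i][j] / degree(i) ** beta.
    beta = 1 gives the usual random-walk transition matrix (rows sum to 1);
    beta = 0 leaves the adjacency matrix unchanged.
    """
    degrees = [sum(row) for row in adj_matrix]
    return [
        [a / (d ** beta) if d > 0 else 0.0 for a in row]
        for row, d in zip(adj_matrix, degrees)
    ]

# Path graph 0 - 1 - 2 as a small example.
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
transition = adjusted_diffusion_matrix(path, beta=1)
damped = adjusted_diffusion_matrix(path, beta=0.5)
```

Intermediate values of `beta` reduce the weight of high-degree nodes less aggressively than full row normalization, which is the kind of adjustment the abstract motivates for heterogeneous networks.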

Conclusions/Significance

ADMSC is proposed by introducing the power factor that adjusts the diffusion matrix to the heterogeneity of the PPI networks. ADMSC effectively partitions PPI networks into biologically significant clusters with almost equal sizes, while being very fast, robust and appealingly simple.

doi:10.1371/journal.pone.0012623

PMCID: PMC2935381
PMID: 20830307

The architecture of the network of protein–protein physical interactions in Saccharomyces cerevisiae is exposed through the combination of two complementary theoretical network measures, betweenness centrality and ‘Q-modularity’. The yeast interactome is characterized by well-defined topological modules connected via a small number of inter-module protein interactions. Should such topological inter-module connections turn out to constitute a form of functional coordination between the modules, we speculate that this coordination is occurring typically in a pairwise fashion, rather than by way of high-degree hub proteins responsible for coordinating multiple modules. The unique non-hub-centric hierarchical organization of the interactome is not reproduced by gene duplication-and-divergence stochastic growth models that disregard global selective pressures.

doi:10.1093/nar/gkl325

PMCID: PMC1464412
PMID: 16717286

Background

Protein interactions support cell organization and mediate its response to any specific stimulus. Recent technological advances have produced large data-sets that aim at describing the cell interactome. These data are usually presented as graphs where proteins (nodes) are linked by edges to their experimentally determined partners. This representation reveals that protein-protein interaction (PPI) networks, like other kinds of complex networks, are not randomly organized and display properties that are typical of "hierarchical" networks, combining modularity and local clustering with scale-free topology. However informative, this representation is static and provides no clue about the dynamic nature of protein interactions inside the cell.

Results

To fill this methodological gap, we designed and implemented a computer model that captures the discrete and stochastic nature of protein interactions. In ProtNet, our simplified model, the intracellular space is mapped onto either a two-dimensional or a three-dimensional lattice with each lattice site having a linear size (5 nm) comparable to the diameter of an average globular protein. The protein filled lattice has an occupancy (e.g. 20%) compatible with the estimated crowding of proteins in the cell cytoplasm. Proteins or protein complexes are free to translate and rotate on the lattice that represents a sort of naïve unstructured cell (devoid of compartments). At each time step, molecular entities (proteins or complexes) that happen to be in neighboring cells may interact and form larger complexes or dissociate depending on the interaction rules defined in an experimental protein interaction network. This whole procedure can be seen as a sort of "discrete molecular dynamics" applied to interacting proteins in a cell.
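The lattice picture described above can be sketched in a few lines. This is not ProtNet itself: the function names, the periodic boundaries, the restriction to single-site particles on a 2D square lattice, and the omission of rotation and of the interaction rules are all simplifying assumptions made for illustration.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def init_lattice(size, occupancy, seed=None):
    """Place particles at random on a size x size lattice.

    Returns the set of occupied sites and the random generator used.
    """
    rng = random.Random(seed)
    sites = [(x, y) for x in range(size) for y in range(size)]
    n_particles = round(occupancy * len(sites))
    return set(rng.sample(sites, n_particles)), rng


def diffusion_step(occupied, size, rng):
    """Let each particle attempt one move to a random neighboring site.

    A move is rejected if the target site is already occupied
    (assumption: periodic boundaries, one particle per site).
    """
    for x, y in rng.sample(sorted(occupied), len(occupied)):
        dx, dy = rng.choice(MOVES)
        target = ((x + dx) % size, (y + dy) % size)
        if target not in occupied:
            occupied.remove((x, y))
            occupied.add(target)
    return occupied


def neighboring_pairs(occupied, size):
    """Count pairs of particles on adjacent sites (candidate interactions)."""
    count = 0
    for x, y in occupied:
        for dx, dy in ((1, 0), (0, 1)):  # count each pair once
            if ((x + dx) % size, (y + dy) % size) in occupied:
                count += 1
    return count
```

With `size=20` and `occupancy=0.2`, the lattice holds 80 particles, and `neighboring_pairs` gives the number of adjacent pairs that an interaction rule could then act on at each time step.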

We have tested our model by performing different simulations using as interaction rules those derived from an experimental interactome of Saccharomyces cerevisiae (1378 nodes, 2491 edges) and we have compared the dynamics of complex formation in two- and three-dimensional lattice models.

Conclusion

ProtNet is a cellular automaton model, where each protein molecule or complex is explicitly represented and where simple interaction rules are applied to populations of discrete particles. This tool can be used to simulate the dynamics of protein interactions in the cell.

doi:10.1186/1471-2105-8-S1-S4

PMCID: PMC1885856
PMID: 17430571

We investigate a very simple model describing the evolution of protein-protein interaction networks via duplication and divergence. The model exhibits a remarkably rich behavior depending on a single parameter, the probability to retain a duplicated link during divergence. When this parameter is large, the network growth is not self-averaging and the average node degree increases algebraically. The lack of self-averaging results in a great diversity of networks grown out of the same initial condition. When fewer than half of the links are (on average) preserved after divergence, the growth is self-averaging, the average degree increases very slowly or tends to a constant, and the degree distribution has a power-law tail. The predicted degree distributions are in very good agreement with the distributions observed in real protein networks.
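A duplication-divergence process with a single retention probability can be sketched as follows. The abstract does not spell out the growth rule in detail, so several choices here are assumptions: the network starts from two linked nodes, the node to duplicate is chosen uniformly at random, and a duplicate that retains no links is discarded. The names `grow_duplication_divergence` and `retain_prob` are illustrative.

```python
import random


def grow_duplication_divergence(n_nodes, retain_prob, seed=None):
    """Grow an undirected network by duplication and divergence.

    Returns a dict mapping each node to the set of its neighbors.
    """
    if not 0.0 < retain_prob <= 1.0:
        raise ValueError("retain_prob must be in (0, 1]")
    rng = random.Random(seed)
    neighbors = {0: {1}, 1: {0}}  # assumed initial condition: one edge
    while len(neighbors) < n_nodes:
        target = rng.choice(list(neighbors))
        # Divergence: each duplicated link survives with probability retain_prob.
        kept = {v for v in neighbors[target] if rng.random() < retain_prob}
        if not kept:
            continue  # assumption: duplicates left without links are dropped
        new = len(neighbors)
        neighbors[new] = kept
        for v in kept:
            neighbors[v].add(new)
    return neighbors


def mean_degree(neighbors):
    """Average number of links per node."""
    return sum(len(nbrs) for nbrs in neighbors.values()) / len(neighbors)
```

Growing many networks from the same initial condition at a small and a large `retain_prob`, and comparing the spread of `mean_degree` across runs, gives a rough feel for the two regimes the abstract contrasts.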

doi:10.1103/PhysRevE.71.061911

PMCID: PMC2092385
PMID: 16089769

Proteins are essential macromolecules of life that carry out most cellular processes. Since proteins aggregate to perform function, and since protein-protein interaction (PPI) networks model these aggregations, one would expect to uncover new biology from PPI network topology. Hence, using PPI networks to predict protein function and role of protein pathways in disease has received attention. A debate remains open about whether network properties of “biologically central (BC)” genes (i.e., their protein products), such as those involved in aging, cancer, infectious diseases, or signaling and drug-targeted pathways, exhibit some topological centrality compared to the rest of the proteins in the human PPI network.

To help resolve this debate, we design new network-based approaches and apply them to get new insight into biological function and disease. We hypothesize that BC genes have a topologically central (TC) role in the human PPI network. We propose two different concepts of topological centrality. We design a new centrality measure to capture complex wirings of proteins in the network that identifies as TC those proteins that reside in dense extended network neighborhoods. Also, we use the notion of domination and find dominating sets (DSs) in the PPI network, i.e., sets of proteins such that every protein is either in the DS or is a neighbor of the DS. Clearly, a DS has a TC role, as it enables efficient communication between different network parts.
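The definition of a dominating set can be made concrete with a short sketch. The abstract does not say how the dominating sets were computed; the standard greedy heuristic below is used purely as an illustration, and the function names are hypothetical.

```python
def greedy_dominating_set(neighbors):
    """Greedy heuristic for a dominating set.

    `neighbors` maps each node to the set of its neighbors. At each step the
    node covering the most not-yet-dominated nodes is added.
    """
    undominated = set(neighbors)
    chosen = set()
    while undominated:
        best = max(
            neighbors,
            key=lambda u: len(({u} | neighbors[u]) & undominated),
        )
        chosen.add(best)
        undominated -= {best} | neighbors[best]
    return chosen


def is_dominating(neighbors, candidate):
    """Check that every node is in the set or adjacent to a member of it."""
    return all(
        u in candidate or neighbors[u] & candidate
        for u in neighbors
    )
```

The greedy choice does not guarantee a minimum dominating set, but every set it returns satisfies the definition, which `is_dominating` checks directly.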

We find that TC nodes are statistically significantly enriched in BC genes and that our approaches outperform the existing methods, indicating that genes involved in key biological processes occupy topologically complex and dense regions of the network and correspond to its “spine”, which connects all other network parts and can thus pass cellular signals efficiently throughout the network. To our knowledge, this is the first study that explores domination in the context of PPI networks.

doi:10.1371/journal.pone.0023016

PMCID: PMC3162560
PMID: 21887225

Background

Studying the large-scale protein-protein interaction (PPI) network is important in understanding biological processes. The current research presents the first PPI map of swine, which aims to give new insights into understanding their biological processes.

Results

We used three methods to predict porcine protein interactions among 25,767 porcine proteins: Interolog-based prediction, prediction from domain-motif interactions from structural topology, and prediction from motif-motif interactions from structural topology. We predicted 20,213, 331,484, and 218,705 porcine PPIs respectively, merged the three results into 567,441 PPIs, constructed four PPI networks, and analyzed the topological properties of the porcine PPI networks. Our predictions were validated with Pfam domain annotations and GO annotations. Averages of 70, 10,495, and 863 interactions were related to the Pfam domain-interacting pairs in the iPfam database. For comparison, randomized networks were generated, and averages of only 4.24, 66.79, and 44.26 interactions were associated with Pfam domain-interacting pairs in the iPfam database. In GO annotations, we found that 52.68%, 75.54%, and 27.20% of the predicted PPIs shared GO terms, respectively. In contrast, none of the 10,000 randomized networks reached these percentages of PPI pairs sharing GO terms. Finally, we determined the accuracy and precision of the methods. The methods yielded accuracies of 0.92, 0.53, and 0.50 at precisions of about 0.93, 0.74, and 0.75, respectively.

Conclusion

The results reveal that the predicted PPI networks are considerably reliable. The present research is an important pioneering work on protein function research. The porcine PPI data set, the confidence score of each interaction and a list of related data are available at (http://pppid.biositemap.com/).

doi:10.1186/1477-5956-10-2

PMCID: PMC3306829
PMID: 22230699

protein-protein interaction network; Interolog; D-MIST; M-MIST; topological properties; Pfam domain annotations; GO annotations

Background

Many biological networks such as protein-protein interaction networks, signaling networks, and metabolic networks have topological characteristics of a scale-free degree distribution. Preferential attachment has been considered as the most plausible evolutionary growth model to explain this topological property. Although various studies have been undertaken to investigate the structural characteristics of a network obtained using this growth model, its dynamical characteristics have received relatively less attention.

Results

In this paper, we focus on the robustness of a network that is acquired during its evolutionary process. Through simulations using Boolean network models, we found that preferential attachment increases the number of coupled feedback loops in the course of network evolution. In contrast, if networks evolve to have more coupled feedback loops rather than following preferential attachment, the resulting networks are more robust than those obtained through preferential attachment, although both have similar degree distributions.

Conclusion

The presented analysis demonstrates that coupled feedback loops may play an important role in network evolution to acquire robustness. The result also provides a hint as to why various biological networks have evolved to contain a number of coupled feedback loops.

doi:10.1186/1471-2105-8-430

PMCID: PMC2249609
PMID: 17988389

Miyamoto-Sato, Etsuko | Fujimori, Shigeo | Ishizaka, Masamichi | Hirai, Naoya | Masuoka, Kazuyo | Saito, Rintaro | Ozawa, Yosuke | Hino, Katsuya | Washio, Takanori | Tomita, Masaru | Yamashita, Tatsuhiro | Oshikubo, Tomohiro | Akasaka, Hidetoshi | Sugiyama, Jun | Matsumoto, Yasuo | Yanagawa, Hiroshi | Peccoud, Jean
Large-scale data sets of protein-protein interactions (PPIs) are a valuable resource for mapping and analysis of the topological and dynamic features of interactome networks. The currently available large-scale PPI data sets only contain information on interaction partners. The data presented in this study also include the sequences involved in the interactions (i.e., the interacting regions, IRs) suggested to correspond to functional and structural domains. Here we present the first large-scale IR data set obtained using mRNA display for 50 human transcription factors (TFs), including 12 transcription-related proteins. The core data set (966 IRs; 943 PPIs) displays a verification rate of 70%. Analysis of the IR data set revealed the existence of IRs that interact with multiple partners. Furthermore, these IRs were preferentially associated with intrinsic disorder. This finding supports the hypothesis that intrinsically disordered regions play a major role in the dynamics and diversity of TF networks through their ability to structurally adapt to and bind with multiple partners. Accordingly, this domain-based interaction resource represents an important step in refining protein interactions and networks at the domain level and in associating network analysis with biological structure and function.

doi:10.1371/journal.pone.0009289

PMCID: PMC2827538
PMID: 20195357

The large influx of data from high-throughput genomic and proteomic technologies has encouraged researchers to seek approaches for understanding the structure of gene regulatory networks and proteomic networks. This work reviews some of the most important statistical methods used for modeling gene regulatory networks (GRNs) and protein-protein interaction (PPI) networks. The paper focuses on recent advances in the statistical graphical modeling techniques, state-space representation models, and information theoretic methods that were proposed for inferring the topology of GRNs. It appears that the problem of inferring the structure of PPI networks is quite different from that of GRNs. Clustering and probabilistic graphical modeling techniques are of prime importance in the statistical inference of PPI networks, and some of the recent approaches using these techniques are also reviewed in this paper. Performance evaluation criteria for the approaches used for modeling GRNs and PPI networks are also discussed.

doi:10.1155/2013/953814

PMCID: PMC3594945
PMID: 23509452

With protein or gene interaction systems as the background, this paper proposes an evolving model of biological undirected networks that is consistent with some plausible mechanisms in biology. By introducing a rule of preferential duplication of a node inversely proportional to the degree of existing nodes, and an attribute of node age (the older, the more influence) by which the probability of a node receiving re-wiring links is chosen, the model networks generated under certain parameter conditions can reproduce a series of statistical topological characteristics of real biological graphs, including the scale-free feature, small world effect, hierarchical modularity, limited structural robustness, and disassortativity of degree–degree correlation.

doi:10.1007/s10867-009-9142-3

PMCID: PMC2669123
PMID: 19669562

Biological undirected networks; Preferential duplication; Nodes; Disassortativity

Background

How gene duplication has influenced the evolution of gene networks is one of the core problems in evolution. Current duplication-divergence theories generally suggest that genes on the periphery of the networks are preferentially retained after gene duplication. However, previous studies were mostly based on gene networks in invertebrate species, and they had the inherent shortcoming of not being able to provide information on how the duplication-divergence process proceeded along the time axis during major speciation events.

Results

In this study, we constructed a model system consisting of human G protein-coupled receptors (GPCRs) and their downstream genes in the GPCR pathways. These two groups of genes offered a natural partition of genes in the peripheral and the backbone layers of the network. Analysis of the age distributions of the duplication events in human GPCRs and "downstream genes" gene families indicated that they both experienced an explosive expansion at the time of early vertebrate emergence. However, we found that only GPCR families saw a continued expansion after early vertebrates, most prominently in several small subfamilies of GPCRs involved in immune responses and sensory responses.

Conclusion

In general, in the human GPCR model system, we found that the position of a gene in the gene network has a significant influence on the likelihood of fixation of its duplicates. However, for a super gene family, the influence was not uniform among subfamilies. For super families such as GPCRs, whose gene basis of expression diversity was well established in early vertebrates, continued expansions were most prominent in particular small subfamilies mainly involved in lineage-specific functions.

doi:10.1186/1471-2164-10-S1-S14

PMCID: PMC2709257
PMID: 19594873