Related Articles

1.  Network Archaeology: Uncovering Ancient Networks from Present-Day Interactions 
PLoS Computational Biology  2011;7(4):e1001119.
What proteins interacted in a long-extinct ancestor of yeast? How have different members of a protein complex assembled together over time? Our ability to answer such questions has been limited by the unavailability of ancestral protein-protein interaction (PPI) networks. To overcome this limitation, we propose several novel algorithms to reconstruct the growth history of a present-day network. Our likelihood-based method finds a probable previous state of the graph by applying an assumed growth model backwards in time. This approach retains node identities so that the history of individual nodes can be tracked. Using this methodology, we estimate protein ages in the yeast PPI network that are in good agreement with sequence-based estimates of age and with structural features of protein complexes. Further, by comparing the quality of the inferred histories for several different growth models (duplication-mutation with complementarity, forest fire, and preferential attachment), we provide additional evidence that a duplication-based model captures many features of PPI network growth better than models designed to mimic social network growth. From the reconstructed history, we model the arrival time of extant and ancestral interactions and predict that complexes have significantly re-wired over time and that new edges tend to form within existing complexes. We also hypothesize a distribution of per-protein duplication rates, track the change of the network's clustering coefficient, and predict paralogous relationships between extant proteins that are likely to be complementary to the relationships inferred using sequence alone. Finally, we infer plausible parameters for the model, thereby predicting the relative probability of various evolutionary events. The success of these algorithms indicates that parts of the history of the yeast PPI are encoded in its present-day form.
Author Summary
Many questions about present-day interaction networks could be answered by tracking how the network changed over time. We present a suite of algorithms to uncover an approximate node-by-node and edge-by-edge history of changes of a network when given only a present-day network and a plausible growth model by which it evolved. Our approach tracks the extant network backwards in time by finding high-likelihood previous configurations. Using topology alone, we show we can estimate protein ages and can identify anchor nodes from which proteins have duplicated. Our reconstructed histories also allow us to study how topological properties of the network have changed over time and how interactions and modules may have evolved. Further, we provide another line of evidence indicating that major features of the evolution of the yeast PPI are best captured by a duplication-based model. The study of inferred ancient networks is a novel application of dynamic network analysis that can unveil the evolutionary principles that drive cellular mechanisms. The algorithms presented here will likely also be useful for investigating other ancient, unavailable networks.
PMCID: PMC3077358  PMID: 21533211
2.  Not All Scale-Free Networks Are Born Equal: The Role of the Seed Graph in PPI Network Evolution 
PLoS Computational Biology  2007;3(7):e118.
The (asymptotic) degree distributions of the best-known “scale-free” network models are all similar and are independent of the seed graph used; hence, it has been tempting to assume that networks generated by these models are generally similar. In this paper, we observe that several key topological features of such networks depend heavily on the specific model and the seed graph used. Furthermore, we show that starting with the “right” seed graph (typically a dense subgraph of the protein–protein interaction network analyzed), the duplication model captures many topological features of publicly available protein–protein interaction networks very well.
Author Summary
The interactions among proteins in an organism can be represented as a protein–protein interaction (PPI) network, where each protein is represented with a node, and each interaction is represented with an edge between two nodes. As PPI networks of several model organisms become available, their topological features attract considerable attention. It is believed that the available PPI networks are (1) “small-world” networks, and (2) their degree distribution is in the form of a “power law.” In other words, (1) it is possible to reach from a protein to any other protein in only a small (approximately six) number of hops, and (2) although most proteins have only a few interactions (one or two), there are a few proteins with many more interactions (200 or more) and that act as “hubs.” It has thus been tempting to develop simple mathematical network generators with topological features similar to those of the available PPI networks. One such model, the “duplication model,” is based on Ohno's model of genome growth. It starts with a small “seed network” and grows by “duplicating” one of the existing nodes at a time, with an identical set of interactions; a randomly selected subset of these interactions is then deleted, and a few new interactions are added at random. It has been mathematically proven that the duplication model provides a small-world network and also has a power-law degree distribution. What we show in this paper is that by choosing the “right” seed network, many other topological features of the available PPI networks can be captured by the duplication model. The right seed network in this case turns out to include two sizable “cliques” (subnetworks where all node pairs are connected) with many interactions in between. 
In this paper, we also consider the preferential attachment model, which again grows by adding to a seed network one node at a time and connecting the new node to every other node with probability proportional to the existing degree of the second node. Because the preferential attachment model also provides a small-world network and has a power-law degree distribution, it has been considered equivalent to the duplication model. We show that the two models are vastly different in terms of other topological features we consider, and the preferential attachment model cannot capture some key features of the available PPI networks.
PMCID: PMC1913096  PMID: 17616981
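The two growth models described in the Author Summary of entry 2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the deletion probability `p_delete`, the number of random new links `n_new`, and the scaling constant `m` are all assumptions introduced here, and the seed graph is assumed to contain at least one edge.

```python
import random


def duplication_step(adj, rng, p_delete=0.5, n_new=1):
    """Add one node by duplicating a randomly chosen existing node.

    adj maps each integer node to the set of its neighbours (undirected).
    p_delete and n_new are hypothetical parameters, not values from the paper.
    """
    nodes = sorted(adj)
    parent = rng.choice(nodes)
    child = max(nodes) + 1
    # Copy the parent's interactions, dropping each copy with probability p_delete.
    kept = {v for v in sorted(adj[parent]) if rng.random() >= p_delete}
    # Add a few new interactions to nodes chosen uniformly at random.
    candidates = [v for v in nodes if v not in kept]
    kept.update(rng.sample(candidates, min(n_new, len(candidates))))
    adj[child] = set()
    for v in kept:
        adj[child].add(v)
        adj[v].add(child)
    return child


def preferential_attachment_step(adj, rng, m=2):
    """Add one node, linking it to each existing node v with probability
    proportional to deg(v). The constant m is a hypothetical scaling factor;
    the graph must already contain at least one edge."""
    nodes = sorted(adj)
    total_degree = sum(len(adj[v]) for v in nodes)
    child = max(nodes) + 1
    adj[child] = set()
    for v in nodes:
        if rng.random() < min(1.0, m * len(adj[v]) / total_degree):
            adj[child].add(v)
            adj[v].add(child)
    return child


def grow(seed_adj, n_nodes, rng, step=duplication_step, **kwargs):
    """Grow a copy of the seed graph one node at a time up to n_nodes nodes."""
    adj = {u: set(vs) for u, vs in seed_adj.items()}
    while len(adj) < n_nodes:
        step(adj, rng, **kwargs)
    return adj
```

For example, `grow({0: {1, 2}, 1: {0, 2}, 2: {0, 1}}, 100, random.Random(0))` grows a 100-node network from a triangle seed. Passing a different `seed_adj` is the simplest way to explore the seed-graph dependence that the abstract emphasizes.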
3.  Modeling protein network evolution under genome duplication and domain shuffling 
BMC Systems Biology  2007;1:49.
Successive whole genome duplications have recently been firmly established in all major eukaryote kingdoms. Such exponential evolutionary processes must have largely contributed to shape the topology of protein-protein interaction (PPI) networks by outweighing, in particular, all time-linear network growths modeled so far.
We propose and solve a mathematical model of PPI network evolution under successive genome duplications. This demonstrates, from first principles, that evolutionary conservation and scale-free topology are intrinsically linked properties of PPI networks and emerge from i) prevailing exponential network dynamics under duplication and ii) asymmetric divergence of gene duplicates. While required, we argue that this asymmetric divergence arises, in fact, spontaneously at the level of protein-binding sites. This supports a refined model of PPI network evolution in terms of protein domains under exponential and asymmetric duplication/divergence dynamics, with multidomain proteins underlying the combinatorial formation of protein complexes. Genome duplication then provides a powerful source of PPI network innovation by promoting local rearrangements of multidomain proteins on a genome wide scale. Yet, we show that the overall conservation and topology of PPI networks are robust to extensive domain shuffling of multidomain proteins as well as to finer details of protein interaction and evolution. Finally, large scale features of direct and indirect PPI networks of S. cerevisiae are well reproduced numerically with only two adjusted parameters of clear biological significance (i.e. network effective growth rate and average number of protein-binding domains per protein).
This study demonstrates the statistical consequences of genome duplication and domain shuffling on the conservation and topology of PPI networks over a broad evolutionary scale across eukaryote kingdoms. In particular, scale-free topologies of PPI networks, which are found to be robust to extensive shuffling of protein domains, appear to be a simple consequence of the conservation of protein-binding domains under asymmetric duplication/divergence dynamics in the course of evolution.
PMCID: PMC2245809  PMID: 17999763
4.  The Evolutionary Dynamics of Protein-Protein Interaction Networks Inferred from the Reconstruction of Ancient Networks 
PLoS ONE  2013;8(3):e58134.
Cellular functions are based on the complex interplay of proteins; therefore, the structure and dynamics of these protein-protein interaction (PPI) networks are the key to the functional understanding of cells. In recent years, large-scale PPI networks of several model organisms have been investigated. A number of theoretical models have been developed to explain both the network formation and the current structure. Favored are models based on duplication and divergence of genes, as they most closely represent the biological foundation of network evolution. However, studies are often based on simulated instead of empirical data or they cover only single organisms. Methodological improvements now allow the analysis of PPI networks of multiple organisms simultaneously as well as the direct modeling of ancestral networks. This provides the opportunity to challenge existing assumptions on network evolution. We utilized present-day PPI networks from integrated datasets of seven model organisms and developed a theoretical and bioinformatic framework for studying the evolutionary dynamics of PPI networks. A novel filtering approach using percolation analysis was developed to remove low confidence interactions based on topological constraints. We then reconstructed the ancient PPI networks of different ancestors, for which the ancestral proteomes, as well as the ancestral interactions, were inferred. Ancestral proteins were reconstructed using orthologous groups on different evolutionary levels. A stochastic approach, using the duplication-divergence model, was developed for estimating the probabilities of ancient interactions from today's PPI networks. The growth rates for nodes, edges, sizes and modularities of the networks indicate multiplicative growth and are consistent with the results from independent static analysis.
Our results support the duplication-divergence model of evolution and indicate fractality and multiplicative growth as general properties of the PPI network structure and dynamics.
PMCID: PMC3603955  PMID: 23526967
5.  In Search of the Biological Significance of Modular Structures in Protein Networks 
PLoS Computational Biology  2007;3(6):e107.
Many complex networks such as computer and social networks exhibit modular structures, where links between nodes are much denser within modules than between modules. It is widely believed that cellular networks are also modular, reflecting the relative independence and coherence of different functional units in a cell. While many authors have claimed that observations from the yeast protein–protein interaction (PPI) network support the above hypothesis, the observed structural modularity may be an artifact because the current PPI data include interactions inferred from protein complexes through approaches that create modules (e.g., assigning pairwise interactions among all proteins in a complex). Here we analyze the yeast PPI network including protein complexes (PIC network) and excluding complexes (PEC network). We find that both PIC and PEC networks show a significantly greater structural modularity than that of randomly rewired networks. Nonetheless, there is little evidence that the structural modules correspond to functional units, particularly in the PEC network. More disturbingly, there is no evolutionary conservation among yeast, fly, and nematode modules at either the whole-module or protein-pair level. Neither is there a correlation between the evolutionary or phylogenetic conservation of a protein and the extent of its participation in various modules. Using computer simulation, we demonstrate that a higher-than-expected modularity can arise during network growth through a simple model of gene duplication, without natural selection for modularity. Taken together, our results suggest the intriguing possibility that the structural modules in the PPI network originated as an evolutionary byproduct without biological significance.
Author Summary
Many complex networks are naturally divided into communities or modules, where links within modules are much denser than those across modules. For example, human individuals belonging to the same ethnic groups interact more than those from different ethnic groups. Cellular functions are also organized in a highly modular manner, where each module is a discrete object composed of a group of tightly linked components and performs a relatively independent task. It is interesting to ask whether this modularity in cellular function arises from modularity in molecular interaction networks such as the transcriptional regulatory network and protein–protein interaction (PPI) network. We analyze the yeast PPI network and show that it is indeed significantly more modular than randomly rewired networks. However, we find little evidence that the structural modules correspond to functional units. We also fail to observe any evolutionary conservation among yeast, fly, and nematode PPI modules. We then show by computer simulation that modular structures can arise during network growth via a simple model of gene duplication, without natural selection for modularity. Thus, it appears that the structural modules in the PPI network may have originated as an evolutionary byproduct without much biological significance.
PMCID: PMC1885274  PMID: 17542644
6.  Probing the Extent of Randomness in Protein Interaction Networks 
PLoS Computational Biology  2008;4(7):e1000114.
Protein–protein interaction (PPI) networks are commonly explored for the identification of distinctive biological traits, such as pathways, modules, and functional motifs. In this respect, understanding the underlying network structure is vital to assess the significance of any discovered features. We recently demonstrated that PPI networks show degree-weighted behavior, whereby the probability of interaction between two proteins is generally proportional to the product of their numbers of interacting partners or degrees. It was surmised that degree-weighted behavior is a characteristic of randomness. We expand upon these findings by developing a random, degree-weighted, network model and show that eight PPI networks determined from single high-throughput (HT) experiments have global and local properties that are consistent with this model. The apparent random connectivity in HT PPI networks is counter-intuitive with respect to their observed degree distributions; however, we resolve this discrepancy by introducing a non-network-based model for the evolution of protein degrees or “binding affinities.” This mechanism is based on duplication and random mutation, for which the degree distribution converges to a steady state that is identical to one obtained by averaging over the eight HT PPI networks. The results imply that the degrees and connectivities incorporated in HT PPI networks are characteristic of unbiased interactions between proteins that have varying individual binding affinities. These findings corroborate the observation that curated and high-confidence PPI networks are distinct from HT PPI networks and not consistent with a random connectivity. These results provide an avenue to discern indiscriminate organizations in biological networks and suggest caution in the analysis of curated and high-confidence networks.
Author Summary
A protein–protein interaction network represents the set of pair-wise associations that have been discerned between the constituent proteins of an organism. There are three main types of such networks: (i) those determined from a single high-throughput experiment; (ii) curated, where interactions are compiled from the literature; and (iii) high-confidence, which contain subsets of interactions from total sets that may comprise any from types (i) and (ii). The latter are deemed to better represent those interactions actually occurring in a cell. Through the use of graph-theoretic analyses and a random network connectivity model, we find that biological networks of type (i), determined from a single high-throughput experiment, contain random, indiscriminate, binding patterns. However, networks of type (ii) and type (iii) are not representative of the random model, suggesting that they contain biased influences upon the protein associations. These conclusions have been suspected for some time but are further clarified in this work. Our findings provide an avenue to detect unconstrained or completely random network structures and lend insights into the identification of preferentially connected networks resulting from the underlying biological processes or manual curation.
PMCID: PMC2527968  PMID: 18769589
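The degree-weighted behavior described in entry 6, where the probability of an interaction is proportional to the product of the two proteins' degrees, can be illustrated with a small random-graph generator. The sketch below is an assumption-laden illustration rather than the authors' model: it normalizes the product by the sum of all target degrees (a Chung-Lu-style choice that the abstract does not specify) and caps the probability at 1. The function name and the argument `target_degrees` are hypothetical.

```python
import random
from itertools import combinations


def degree_weighted_graph(target_degrees, rng):
    """Sample an undirected graph in which nodes i and j are linked with
    probability min(1, d_i * d_j / sum(d)).

    The normalization by sum(d) is an assumption made for this sketch.
    Returns a set of edges (i, j) with i < j.
    """
    total = sum(target_degrees)
    edges = set()
    if total == 0:
        return edges  # no node is expected to have any partner
    for i, j in combinations(range(len(target_degrees)), 2):
        p = min(1.0, target_degrees[i] * target_degrees[j] / total)
        if rng.random() < p:
            edges.add((i, j))
    return edges
```

With this normalization, the expected degree of each node is approximately its target degree as long as no probability is capped, so the generator randomizes connectivity while roughly preserving a given degree sequence.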
7.  Using Likelihood-Free Inference to Compare Evolutionary Dynamics of the Protein Networks of H. pylori and P. falciparum 
PLoS Computational Biology  2007;3(11):e230.
Gene duplication with subsequent interaction divergence is one of the primary driving forces in the evolution of genetic systems. Yet little is known about the precise mechanisms and the role of duplication divergence in the evolution of protein networks from the prokaryote and eukaryote domains. We developed a novel, model-based approach for Bayesian inference on biological network data that centres on approximate Bayesian computation, or likelihood-free inference. Instead of computing the intractable likelihood of the protein network topology, our method summarizes key features of the network and, based on these, uses a MCMC algorithm to approximate the posterior distribution of the model parameters. This allowed us to reliably fit a flexible mixture model that captures hallmarks of evolution by gene duplication and subfunctionalization to protein interaction network data of Helicobacter pylori and Plasmodium falciparum. The 80% credible intervals for the duplication–divergence component are [0.64, 0.98] for H. pylori and [0.87, 0.99] for P. falciparum. The remaining parameter estimates are not inconsistent with sequence data. An extensive sensitivity analysis showed that incompleteness of PIN data does not largely affect the analysis of models of protein network evolution, and that the degree sequence alone barely captures the evolutionary footprints of protein networks relative to other statistics. Our likelihood-free inference approach enables a fully Bayesian analysis of a complex and highly stochastic system that is otherwise intractable at present. Modelling the evolutionary history of PIN data, it transpires that only the simultaneous analysis of several global aspects of protein networks enables credible and consistent inference to be made from available datasets. 
Our results indicate that gene duplication has played a larger part in the network evolution of the eukaryote than in the prokaryote, and suggest that single gene duplications with immediate divergence alone may explain more than 60% of biological network data in both domains.
Author Summary
The importance of gene duplication to biological evolution has been recognized since the 1930s. For more than a decade, substantial evidence has been collected from genomic sequence data in order to elucidate the importance and the mechanisms of gene duplication; however, most biological characteristics arise from complex interactions between the cell's numerous constituents. Recently, preliminary descriptions of the protein interaction networks have become available for species of different domains. Adapting novel techniques in stochastic simulation, the authors demonstrate that evolutionary inferences can be drawn from large-scale, incomplete network data by fitting a stochastic model of network growth that captures hallmarks of evolution by duplication and divergence. They have also analyzed the effect of summarizing protein networks in different ways, and show that a reliable and consistent analysis requires many aspects of network data to be considered jointly; in contrast to what is commonly done in practice. Their results indicate that duplication and divergence has played a larger role in the network evolution of the eukaryote P. falciparum than in the prokaryote H. pylori, and emphasize at least for the eukaryote the potential importance of subfunctionalization in network evolution.
PMCID: PMC2098858  PMID: 18052538
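The likelihood-free idea in entry 7 (summarize the network, simulate from the model, and keep parameter values whose simulated summaries resemble the observed ones) can be sketched as follows. Note that the abstract describes an MCMC-based approximate Bayesian computation scheme with a flexible mixture model; the sketch below substitutes the simpler rejection-sampling variant and a toy duplication-divergence simulator. The simulator, the two summary statistics, the uniform prior, and all names are assumptions made for illustration only.

```python
import random


def simulate_network(p_keep, n_nodes, rng):
    """Toy duplication-divergence simulator (hypothetical).

    Each new node copies a random parent, keeps each inherited link with
    probability p_keep, and is always linked to its parent (an assumption
    made here so that the toy graph stays connected).
    """
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_nodes:
        parent = rng.choice(sorted(adj))
        child = len(adj)
        kept = {v for v in sorted(adj[parent]) if rng.random() < p_keep}
        kept.add(parent)
        adj[child] = set(kept)
        for v in kept:
            adj[v].add(child)
    return adj


def summaries(adj):
    """Two illustrative summary statistics: mean degree and the largest
    degree divided by the number of nodes."""
    degrees = [len(vs) for vs in adj.values()]
    return (sum(degrees) / len(degrees), max(degrees) / len(degrees))


def abc_rejection(observed, n_nodes, n_draws, tolerance, rng):
    """Rejection ABC: draw p_keep from a uniform prior, simulate a network,
    and accept the draw if every summary is within `tolerance` of the
    observed value. The accepted draws approximate the posterior."""
    accepted = []
    for _ in range(n_draws):
        p_keep = rng.random()
        simulated = summaries(simulate_network(p_keep, n_nodes, rng))
        distance = max(abs(a - b) for a, b in zip(simulated, observed))
        if distance <= tolerance:
            accepted.append(p_keep)
    return accepted
```

Using several summaries jointly, rather than one, mirrors the abstract's point that the degree sequence alone carries little of the evolutionary signal. A smaller `tolerance` gives a sharper approximation at the cost of fewer accepted draws.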
8.  Discovering functional interaction patterns in protein-protein interaction networks 
BMC Bioinformatics  2008;9:276.
In recent years, a considerable amount of research effort has been directed to the analysis of biological networks with the availability of genome-scale networks of genes and/or proteins of an increasing number of organisms. A protein-protein interaction (PPI) network is a particular biological network which represents physical interactions between pairs of proteins of an organism. Major research on PPI networks has focused on understanding the topological organization of PPI networks, evolution of PPI networks and identification of conserved subnetworks across different species, discovery of modules of interaction, use of PPI networks for functional annotation of uncharacterized proteins, and improvement of the accuracy of currently available networks.
In this article, we map known functional annotations of proteins onto a PPI network in order to identify frequently occurring interaction patterns in the functional space. We propose a new frequent pattern identification technique, PPISpan, adapted specifically for PPI networks from a well-known frequent subgraph identification method, gSpan. Existing module discovery techniques either look for specific clique-like highly interacting protein clusters or linear paths of interaction. However, our goal is different; instead of single clusters or pathways, we look for recurring functional interaction patterns in arbitrary topologies. We have applied PPISpan on PPI networks of Saccharomyces cerevisiae and identified a number of frequently occurring functional interaction patterns.
With the help of PPISpan, recurring functional interaction patterns in an organism's PPI network can be identified. Such an analysis offers a new perspective on the modular organization of PPI networks. The complete list of identified functional interaction patterns is available at .
PMCID: PMC2442100  PMID: 18547430
9.  GraphCrunch 2: Software tool for network modeling, alignment and clustering 
BMC Bioinformatics  2011;12:24.
Recent advancements in experimental biotechnology have produced large amounts of protein-protein interaction (PPI) data. The topology of PPI networks is believed to have a strong link to their function. Hence, the abundance of PPI data for many organisms stimulates the development of computational techniques for the modeling, comparison, alignment, and clustering of networks. In addition, finding representative models for PPI networks will improve our understanding of the cell just as a model of gravity has helped us understand planetary motion. To decide if a model is representative, we need quantitative comparisons of model networks to real ones. However, exact network comparison is computationally intractable and therefore several heuristics have been used instead. Some of these heuristics are easily computable "network properties," such as the degree distribution, or the clustering coefficient. An important special case of network comparison is the network alignment problem. Analogous to sequence alignment, this problem asks to find the "best" mapping between regions in two networks. It is expected that network alignment might have as strong an impact on our understanding of biology as sequence alignment has had. Topology-based clustering of nodes in PPI networks is another example of an important network analysis problem that can uncover relationships between interaction patterns and phenotype.
We introduce the GraphCrunch 2 software tool, which addresses these problems. It is a significant extension of GraphCrunch which implements the most popular random network models and compares them with the data networks with respect to many network properties. Also, GraphCrunch 2 implements the GRAph ALigner algorithm ("GRAAL") for purely topological network alignment. GRAAL can align any pair of networks and exposes large, dense, contiguous regions of topological and functional similarities far larger than any other existing tool. Finally, GraphCrunch 2 implements an algorithm for clustering nodes within a network based solely on their topological similarities. Using GraphCrunch 2, we demonstrate that eukaryotic and viral PPI networks may belong to different graph model families and show that topology-based clustering can reveal important functional similarities between proteins within yeast and human PPI networks.
GraphCrunch 2 is a software tool that implements the latest research on biological network analysis. It parallelizes computationally intensive tasks to fully utilize the potential of modern multi-core CPUs. It is open-source and freely available for research use. It runs under the Windows and Linux platforms.
PMCID: PMC3036622  PMID: 21244715
10.  Generative probabilistic models for protein–protein interaction networks—the biclique perspective 
Bioinformatics  2011;27(13):i142-i148.
Motivation: Much of the large-scale molecular data from living cells can be represented in terms of networks. Such networks occupy a central position in cellular systems biology. In the protein–protein interaction (PPI) network, nodes represent proteins and edges represent connections between them, based on experimental evidence. As PPI networks are rich and complex, a mathematical model is sought to capture their properties and shed light on PPI evolution. The mathematical literature contains various generative models of random graphs. It is a major, still largely open question, which of these models (if any) can properly reproduce various biologically interesting networks. Here, we consider this problem where the graph at hand is the PPI network of Saccharomyces cerevisiae. We are trying to distinguish between a model family which performs a process of copying neighbors, represented by the duplication–divergence (DD) model, and models which do not copy neighbors, with the Barabási–Albert (BA) preferential attachment model as a leading example.
Results: The observed property of the network is the distribution of maximal bicliques in the graph. This is a novel criterion to distinguish between models in this area. It is particularly appropriate for this purpose, since it reflects the graph's growth pattern under either model. This test clearly favors the DD model. In particular, for the BA model, the vast majority (92.9%) of the bicliques with both sides ≥4 must be already embedded in the model's seed graph, whereas the corresponding figure for the DD model is only 5.1%. Our results, based on the biclique perspective, conclusively show that a naïve unmodified DD model can capture a key aspect of PPI networks.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3117378  PMID: 21685063
11.  Reconstructing Genome-Wide Protein–Protein Interaction Networks Using Multiple Strategies with Homologous Mapping 
PLoS ONE  2015;10(1):e0116347.
One of the crucial steps toward understanding the biological functions of a cellular system is to investigate protein–protein interaction (PPI) networks. As an increasing number of reliable PPIs become available, there is a growing need for discovering PPIs to reconstruct PPI networks of interesting organisms. Some interolog-based methods and homologous PPI families have been proposed for predicting PPIs from the known PPIs of source organisms.
Here, we propose a multiple-strategy scoring method to identify reliable PPIs for reconstructing the mouse PPI network from two well-known organisms: human and fly. We first identified the PPI candidates of target organisms based on homologous PPIs, sharing significant sequence similarities (joint E-value ≤ 1 × 10^-40), from source organisms using generalized interolog mapping. These PPI candidates were evaluated by our multiple-strategy scoring method, combining sequence similarities, normalized ranks, and conservation scores across multiple organisms. According to 106,825 PPI candidates in yeast derived from human and fly, our scoring method can achieve high prediction accuracy and outperform generalized interolog mapping. Experimental results show that our multiple-strategy score can avoid the influence of the protein family size and length to significantly improve PPI prediction accuracy and reflect the biological functions. In addition, the top-ranked and conserved PPIs are often orthologous/essential interactions and share functional similarity. Based on these reliable predicted PPIs, we reconstructed a comprehensive mouse PPI network, which is a scale-free network and can reflect the biological functions and high connectivity of 292 KEGG modules, including 216 pathways and 76 structural complexes.
Experimental results show that our scoring method can improve the predicting accuracy based on the normalized rank and evolutionary conservation from multiple organisms. Our predicted PPIs share similar biological processes and cellular components, and the reconstructed genome-wide PPI network can reflect network topology and modularity. We believe that our method is useful for inferring reliable PPIs and reconstructing a comprehensive PPI network of an interesting organism.
PMCID: PMC4300222  PMID: 25602759
12.  Simulated Evolution of Protein-Protein Interaction Networks with Realistic Topology 
PLoS ONE  2012;7(6):e39052.
We model the evolution of eukaryotic protein-protein interaction (PPI) networks. In our model, PPI networks evolve by two known biological mechanisms: (1) Gene duplication, which is followed by rapid diversification of duplicate interactions. (2) Neofunctionalization, in which a mutation leads to a new interaction with some other protein. Since many interactions are due to simple surface compatibility, we hypothesize there is an increased likelihood of interacting with other proteins in the target protein’s neighborhood. We find good agreement of the model on 10 different network properties compared to high-confidence experimental PPI networks in yeast, fruit flies, and humans. Key findings are: (1) PPI networks evolve modular structures, with no need to invoke particular selection pressures. (2) Proteins in cells have on average about 6 degrees of separation, similar to some social networks, such as human-communication and actor networks. (3) Unlike social networks, which have a shrinking diameter (degree of maximum separation) over time, PPI networks are predicted to grow in diameter. (4) The model indicates that evolutionarily old proteins should have higher connectivities and be more centrally embedded in their networks. This suggests a way in which present-day proteomics data could provide insights into biological evolution.
PMCID: PMC3387198  PMID: 22768057
13.  Genetic interactions reveal the evolutionary trajectories of duplicate genes 
Duplicate genes show significantly fewer interactions than singleton genes, and functionally similar duplicates can exhibit dissimilar profiles because common interactions are ‘hidden’ due to buffering. Genetic interaction profiles provide insights into evolutionary mechanisms of duplicate retention by distinguishing duplicates under dosage selection from those retained because of some divergence in function. The genetic interactions of duplicate genes evolve in an extremely asymmetric way and the directionality of this asymmetry correlates well with other evolutionary properties of duplicate genes. Genetic interaction profiles can be used to elucidate the divergent function of specific duplicate pairs.
Gene duplication and divergence serve as a primary source for new genes and new functions, and as such have broad implications for the evolutionary process. Duplicate genes within S. cerevisiae have been shown to retain a high degree of similarity with regard to many of their functional properties (Papp et al, 2004; Guan et al, 2007; Wapinski et al, 2007; Musso et al, 2008), and perturbation of duplicate genes has been shown to result in smaller fitness defects than singleton genes (Gu et al, 2003; DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Individual genetic interactions between pairs of genes and profiles of such interactions across the entire genome provide a new context in which to examine the properties of duplicate compensation.
In this study we use the most recent and comprehensive set of genetic interactions in yeast produced to date (Costanzo et al, 2010) to address questions of duplicate retention and redundancy. We show that the ability for duplicate genes to buffer the deletion of a partner has three main consequences. First it agrees with previous work demonstrating that a high proportion of duplicate pairs are synthetic lethal, a classic indication of the ability to buffer one another functionally (DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Second, it reduces the number of genetic interactions observed between duplicate genes and the rest of the genome by masking interactions relating to common function from experimental detection. Third, this buffering of common interactions serves to reduce profile similarity in spite of common function (Figure 1). The compensatory ability of functionally similar duplicates buffers genetic interactions related to their common function (reducing the number of genetic interactions overall), while allowing the measurement of interactions related to any divergent function. Thus, even functionally similar duplicates may have dissimilar genetic interaction profiles. As previously surmised (Ihmels et al, 2007), duplicate genes under selection for dosage amplification have differing profile characteristics. We show that dosage-mediated duplicates have much higher genetic interaction profile similarity than do other duplicate pairs. Furthermore, we show in a comparison with local neighbors on a protein–protein interaction (PPI) network, that although dosage-mediated duplicates more often have higher similarity to each other than they do to their neighbors, the reverse is true for duplicates in general. 
That is, slightly divergent duplicate genes more often exhibit a higher similarity with a common neighbor on the PPI network than they do with each other, and that observation is consistent with the idea that common interactions are buffered while interactions corresponding to divergent functions are observed.
We then asked whether duplicates' genetic interactions that are not buffered appear in a symmetric or an asymmetric fashion. Previous work has established asymmetric patterns with regard to PPI degree (Wagner, 2002; He and Zhang, 2005), sequence divergence (Conant and Wagner, 2003; Zhang et al, 2003; Kellis et al, 2004; Scannell and Wolfe, 2008) and expression patterns (Gu et al, 2002b; Tirosh and Barkai, 2007). Although genetic interactions are further removed from mechanism than protein–protein interactions, for example, they do offer a more direct measurement of functional consequence and, thus, may give a better indication of the functional differences between a duplicate pair. We found that duplicates exhibit a strikingly asymmetric pattern of genetic interactions, with the ratio of interactions between sisters commonly exceeding 7:1 (Figure 4A). The observations differ significantly from random simulations in which genetic interactions were redistributed between sisters with equal probability (Figure 4A). Moreover, the directionality of this interaction asymmetry agrees with other physiological properties of duplicate pairs. For example, the sister with more genetic interactions also tends to have more protein–protein interactions and also tends to evolve at a slower rate (Figure 4B).
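The comparison against random redistribution described in this paragraph can be sketched as a simple null simulation. The code below is a minimal illustration with hypothetical function names; it assumes each of a pair's pooled interactions is assigned to one sister or the other with probability 0.5, and it says nothing about how the original study pooled or tested its data.

```python
import random


def asymmetry_ratio(n_a, n_b):
    """Ratio of the larger to the smaller interaction count of two sisters."""
    hi, lo = max(n_a, n_b), min(n_a, n_b)
    if hi == 0:
        return 1.0  # no interactions at all: treat the pair as symmetric
    return float("inf") if lo == 0 else hi / lo


def null_ratios(total, n_sim=1000, seed=0):
    """Ratios obtained when `total` interactions are split at random (p = 0.5)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_sim):
        n_a = sum(rng.random() < 0.5 for _ in range(total))
        ratios.append(asymmetry_ratio(n_a, total - n_a))
    return ratios


def empirical_p(observed_ratio, simulated):
    """Fraction of simulated ratios at least as extreme as the observed one."""
    return sum(r >= observed_ratio for r in simulated) / len(simulated)
```

For a hypothetical pair with 70 and 10 interactions, `empirical_p(asymmetry_ratio(70, 10), null_ratios(80))` gives the share of random splits that are at least as imbalanced as the observed 7:1 ratio.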
Genetic interaction degree and profiles can be used to understand the functional divergence of particular duplicate pairs. As a case example, we consider the whole-genome-duplication pair CIK1–VIK1. Each of these genes encodes a protein that forms a distinct heterodimeric complex with the microtubule motor protein Kar3 (Manning et al, 1999). Although each of these proteins depends on a direct physical interaction with Kar3, Cik1 has a much higher profile similarity to Kar3 than does Vik1 (r=0.5 and r=0.3, respectively). Consistent with its higher similarity, Δcik1 and Δkar3 exhibit several similar phenotypes, including abnormally short spindles, chromosome loss and delayed cell cycle progression (Page et al, 1994; Manning et al, 1999). In contrast, a Δvik1 mutant strain exhibits no overt phenotype (Manning et al, 1999).
The characterization of functional redundancy and divergence between duplicate genes is an important step in understanding the evolution of genetic systems. Large-scale genetic network analysis in Saccharomyces cerevisiae provides a powerful perspective for addressing these questions through quantitative measurements of genetic interactions between pairs of duplicated genes, and more generally, through the study of genome-wide genetic interaction profiles associated with duplicated genes. We show that duplicate genes exhibit fewer genetic interactions than other genes because they tend to buffer one another functionally, whereas observed interactions are non-overlapping and reflect their divergent roles. We also show that duplicate gene pairs are highly imbalanced in their number of genetic interactions with other genes, a pattern that appears to result from asymmetric evolution, such that one duplicate evolves or degrades faster than the other and often becomes functionally or conditionally specialized. The differences in genetic interactions are predictive of differences in several other evolutionary and physiological properties of duplicate pairs.
PMCID: PMC3010121  PMID: 21081923
duplicate genes; functional divergence; genetic interactions; paralogs; Saccharomyces cerevisiae
14.  Evolution of Complex Modular Biological Networks 
PLoS Computational Biology  2008;4(2):e23.
Biological networks have evolved to be highly functional within uncertain environments while remaining extremely adaptable. One of the main contributors to the robustness and evolvability of biological networks is believed to be their modularity of function, with modules defined as sets of genes that are strongly interconnected but whose function is separable from those of other modules. Here, we investigate the in silico evolution of modularity and robustness in complex artificial metabolic networks that encode an increasing amount of information about their environment while acquiring ubiquitous features of biological, social, and engineering networks, such as scale-free edge distribution, small-world property, and fault-tolerance. These networks evolve in environments that differ in their predictability, and allow us to study modularity from topological, information-theoretic, and gene-epistatic points of view using new tools that do not depend on any preconceived notion of modularity. We find that for our evolved complex networks as well as for the yeast protein–protein interaction network, synthetic lethal gene pairs consist mostly of redundant genes that lie close to each other and therefore within modules, while knockdown suppressor gene pairs are farther apart and often straddle modules, suggesting that knockdown rescue is mediated by alternative pathways or modules. The combination of network modularity tools together with genetic interaction data constitutes a powerful approach to study and dissect the role of modularity in the evolution and function of biological networks.
Author Summary
The modular organization of cells is not immediately obvious from the network of interacting genes, proteins, and molecules. A new window into cellular modularity is opened up by genetic data that identifies pairs of genes that interact either directly or indirectly to provide robustness to cellular function. Such pairs can map out the modular nature of a network if we understand how they relate to established mathematical clustering methods applied to networks to identify putative modules. We can test the relationship between genetically interacting pairs and modules on artificial data: large networks of interacting proteins and molecules that were evolved within an artificial chemistry and genetics, and that pass the standard tests for biological networks. Modularity evolves in these networks in order to deal with a multitude of functional goals, with a degree depending on environmental variability. Relationships between genetically interacting pairs and modules similar to those displayed by the artificial gene networks are found in the protein–protein interaction network of baker's yeast. The evolution of complex functional biological networks in silico provides an opportunity to develop and test new methods and tools to understand the complexity of biological systems at the network level.
PMCID: PMC2233666  PMID: 18266463
15.  Interface-Resolved Network of Protein-Protein Interactions 
PLoS Computational Biology  2013;9(5):e1003065.
We define an interface-interaction network (IIN) to capture the specificity and competition between protein-protein interactions (PPI). This new type of network represents interactions between individual interfaces used in functional protein binding and thereby contains the detail necessary to describe the competition and cooperation between any pair of binding partners. Here we establish a general framework for the construction of IINs that merges computational structure-based interface assignment with careful curation of available literature. To complement limited structural data, the inclusion of biochemical data is critical for achieving the accuracy and completeness necessary to analyze the specificity and competition between the protein interactions. Firstly, this procedure provides a means to clarify the information content of existing data on purported protein interactions and to remove indirect and spurious interactions. Secondly, the IIN we have constructed here for proteins involved in clathrin-mediated endocytosis (CME) exhibits distinctive topological properties. In contrast to PPI networks with their global and relatively dense connectivity, the fragmentation of the IIN into distinctive network modules suggests that different functional pressures act on the evolution of its topology. Large modules in the IIN are formed by interfaces sharing specificity for certain domain types, such as SH3 domains distributed across different proteins. The shared and distinct specificity of an interface is necessary for effective negative and positive design of highly selective binding targets. Lastly, the organization of detailed structural data in a network format allows one to identify pathways of specific binding interactions and thereby predict effects of mutations at specific surfaces on a protein and of specific binding inhibitors, as we explore in several examples. 
Overall, the endocytosis IIN is remarkably complex and rich in features masked in the coarser PPI, and collects relevant detail of protein association in a readily interpretable format.
Author Summary
Much of the work inside the cell is carried out by proteins interacting with other proteins. Each edge in a protein-protein interaction network reflects these functional interactions and each node a separate protein, creating a complex structure that nevertheless follows well-established global and local patterns related to robust protein function. However, this network is not detailed enough to assess whether a particular protein can bind multiple interaction partners simultaneously through distinct interfaces, or whether the partners targeting a specific interface share similar structural or chemical properties. By breaking each protein node into its constituent interface nodes, we generate and assess such a detailed new network. To sample protein binding interactions broadly and accurately beyond those seen in crystal structures, our method combines computational interface assignment with data from biochemical studies. Using this approach we are able to assign interfaces to the majority of known interactions between proteins involved in the clathrin-mediated endocytosis pathway in yeast. Analysis of this interface-interaction network provides novel insights into the functional specificity of protein interactions, and highlights elements of cooperativity and competition among the proteins. By identifying diverse multi-protein complexes, interface-interaction networks also provide a map for targeted drug development.
PMCID: PMC3656101  PMID: 23696724
16.  Biomolecular network motif counting and discovery by color coding 
Bioinformatics  2008;24(13):i241-i249.
Protein–protein interaction (PPI) networks of many organisms share global topological features such as degree distribution, k-hop reachability, betweenness and closeness. Yet, some of these networks can differ significantly from the others in terms of local structures: e.g. the number of specific network motifs can vary significantly among PPI networks.
Counting the number of network motifs provides a major challenge to compare biomolecular networks. Recently developed algorithms have been able to count the number of induced occurrences of subgraphs with k≤ 7 vertices. Yet no practical algorithm exists for counting non-induced occurrences, or counting subgraphs with k≥ 8 vertices. Counting non-induced occurrences of network motifs is not only challenging but also quite desirable as available PPI networks include several false interactions and miss many others.
In this article, we show how to apply the ‘color coding’ technique for counting non-induced occurrences of subgraph topologies in the form of trees and bounded treewidth subgraphs. Our algorithm can count all occurrences of motif G′ with k vertices in a network G with n vertices in time polynomial in n, provided k=O(log n). We use our algorithm to obtain ‘treelet’ distributions for k≤ 10 of available PPI networks of unicellular organisms (Saccharomyces cerevisiae, Escherichia coli and Helicobacter pylori), which are all quite similar, and a multicellular organism (Caenorhabditis elegans) which is significantly different. Furthermore, the treelet distributions of the unicellular organisms are similar to that obtained by the ‘duplication model’ but are quite different from that of the ‘preferential attachment model’. The treelet distribution is robust w.r.t. sparsification with bait/edge coverage of 70% but differences can be observed when bait/edge coverage drops to 50%.
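The color coding idea mentioned above can be illustrated on the simplest tree motif, a path on k vertices. The sketch below is not the paper's algorithm (which handles general trees and bounded-treewidth subgraphs); it only shows the standard technique of randomly coloring vertices with k colors, counting "colorful" paths by dynamic programming over color sets, and rescaling by k^k/k!. Function names are hypothetical, and k ≥ 2 is assumed.

```python
import random
from math import factorial


def count_colorful_paths(adj, coloring, k):
    """Count simple paths on k >= 2 vertices whose colors are all distinct.

    adj:      dict {vertex: set of neighbours} for an undirected graph.
    coloring: dict {vertex: color in range(k)}.
    """
    # table[(v, mask)] = number of colorful paths ending at v whose
    # vertices use exactly the colors in the bitmask `mask`.
    table = {(v, 1 << coloring[v]): 1 for v in adj}
    for _ in range(k - 1):
        nxt = {}
        for (v, mask), cnt in table.items():
            for u in adj[v]:
                bit = 1 << coloring[u]
                if not mask & bit:  # a new color implies a new vertex
                    key = (u, mask | bit)
                    nxt[key] = nxt.get(key, 0) + cnt
        table = nxt
    # Every undirected path is discovered once from each of its two ends.
    return sum(table.values()) // 2


def estimate_paths(adj, k, trials=200, seed=0):
    """Unbiased estimate of the number of k-vertex paths (non-induced)."""
    rng = random.Random(seed)
    scale = k ** k / factorial(k)  # 1 / P(a fixed path is colorful)
    total = 0
    for _ in range(trials):
        coloring = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, coloring, k)
    return scale * total / trials
```

The table has at most n·2^k entries, which is why the running time stays polynomial in n when k = O(log n).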
PMCID: PMC2718641  PMID: 18586721
17.  Why Do Hubs in the Yeast Protein Interaction Network Tend To Be Essential: Reexamining the Connection between the Network Topology and Essentiality 
PLoS Computational Biology  2008;4(8):e1000140.
The centrality-lethality rule, which notes that high-degree nodes in a protein interaction network tend to correspond to proteins that are essential, suggests that the topological prominence of a protein in a protein interaction network may be a good predictor of its biological importance. Even though the correlation between degree and essentiality was confirmed by many independent studies, the reason for this correlation remains elusive. Several hypotheses about putative connections between essentiality of hubs and the topology of protein–protein interaction networks have been proposed, but as we demonstrate, these explanations are not supported by the properties of protein interaction networks. To identify the main topological determinant of essentiality and to provide a biological explanation for the connection between the network topology and essentiality, we performed a rigorous analysis of six variants of the genomewide protein interaction network for Saccharomyces cerevisiae obtained using different techniques. We demonstrated that the majority of hubs are essential due to their involvement in Essential Complex Biological Modules, a group of densely connected proteins with shared biological function that are enriched in essential proteins. Moreover, we rejected two previously proposed explanations for the centrality-lethality rule, one relating the essentiality of hubs to their role in the overall network connectivity and another relying on the recently published essential protein interactions model.
Author Summary
Analysis of protein interaction networks in the budding yeast Saccharomyces cerevisiae has revealed that a small number of proteins, the so-called hubs, interact with a disproportionately large number of other proteins. Furthermore, many hub proteins have been shown to be essential for survival of the cell (that is, in optimal conditions, yeast cannot grow and multiply without them). This relation between essentiality and the number of neighbors in the protein–protein interaction network has been termed the centrality-lethality rule. However, why are such hubs essential? Jeong and colleagues [1] suggested that overrepresentation of essential proteins among high-degree nodes can be attributed to the central role that hubs play in mediating interactions among numerous, less connected proteins. Another view, proposed by He and Zhang, suggested that the majority of proteins are essential due to their involvement in one or more essential protein–protein interactions that are distributed uniformly at random along the network edges [2]. We find that none of the above reasons determines essentiality. Instead, the majority of hubs are essential due to their involvement in Essential Complex Biological Modules, a group of densely connected proteins with shared biological function that are enriched in essential proteins. This study sheds new light on the topological complexity of protein interaction networks.
PMCID: PMC2467474  PMID: 18670624
18.  Systems-level cancer gene identification from protein interaction network topology applied to melanogenesis-related functional genomics data 
Many real-world phenomena have been described in terms of large networks. Networks have been invaluable models for the understanding of biological systems. Since proteins carry out most biological processes, we focus on analysing protein–protein interaction (PPI) networks. Proteins interact to perform a function. Thus, PPI networks reflect the interconnected nature of biological processes and analysing their structural properties could provide insights into biological function and disease. We have already demonstrated, by using a sensitive graph theoretic method for comparing topologies of node neighbourhoods called ‘graphlet degree signatures’, that proteins with similar surroundings in PPI networks tend to perform the same functions. Here, we explore whether the involvement of genes in cancer suggests the similarity of their topological ‘signatures’ as well. By applying a series of clustering methods to proteins' topological signature similarities, we demonstrate that the obtained clusters are significantly enriched with cancer genes. We apply this methodology to identify novel cancer gene candidates, validating 80 per cent of our predictions in the literature. We also validate predictions biologically by identifying cancer-related negative regulators of melanogenesis identified in our siRNA screen. This is encouraging, since we have done this solely from PPI network topology. We provide clear evidence that PPI network structure around cancer genes is different from the structure around non-cancer genes. Understanding the underlying principles of this phenomenon is an open question, with a potential for increasing our understanding of complex diseases.
PMCID: PMC2842789  PMID: 19625303
biological networks; protein interaction networks; network topology; cancer gene identification
19.  Examination of the relationship between essential genes in PPI network and hub proteins in reverse nearest neighbor topology 
BMC Bioinformatics  2010;11:505.
In many protein-protein interaction (PPI) networks, densely connected hub proteins are more likely to be essential proteins. This is referred to as the "centrality-lethality rule", which indicates that the topological placement of a protein in PPI network is connected with its biological essentiality. Though such connections are observed in many PPI networks, the underlying topological properties for these connections are not yet clearly understood. Some suggested putative connections are the involvement of essential proteins in the maintenance of overall network connections, or that they play a role in essential protein clusters. In this work, we have attempted to examine the placement of essential proteins and the network topology from a different perspective by determining the correlation of protein essentiality and reverse nearest neighbor topology (RNN).
The RNN topology is a weighted directed graph derived from PPI network, and it is a natural representation of the topological dependences between proteins within the PPI network. Similar to the original PPI network, we have observed that essential proteins tend to be hub proteins in RNN topology. Additionally, essential genes are enriched in clusters containing many hub proteins in RNN topology (RNN protein clusters). Based on these two properties of essential genes in RNN topology, we have proposed a new measure; the RNN cluster centrality. Results from a variety of PPI networks demonstrate that RNN cluster centrality outperforms other centrality measures with regard to the proportion of selected proteins that are essential proteins. We also investigated the biological importance of RNN clusters.
This study reveals that RNN cluster centrality provides the best correlation of protein essentiality and placement of proteins in PPI network. Additionally, merged RNN clusters were found to be topologically important in that essential proteins are significantly enriched in RNN clusters, and biologically important because they play an important role in many Gene Ontology (GO) processes.
PMCID: PMC3098085  PMID: 20939873
20.  Dissecting the Human Protein-Protein Interaction Network via Phylogenetic Decomposition 
Scientific Reports  2014;4:7153.
The protein-protein interaction (PPI) network offers a conceptual framework for better understanding the functional organization of the proteome. However, the intricacy of network complexity complicates comprehensive analysis. Here, we adopted a phylogenic grouping method combined with force-directed graph simulation to decompose the human PPI network in a multi-dimensional manner. This network model enabled us to associate the network topological properties with evolutionary and biological implications. First, we found that ancient proteins occupy the core of the network, whereas young proteins tend to reside on the periphery. Second, the presence of age homophily suggests a possible selection pressure may have acted on the duplication and divergence process during the PPI network evolution. Lastly, functional analysis revealed that each age group possesses high specificity of enriched biological processes and pathway engagements, which could correspond to their evolutionary roles in eukaryotic cells. More interestingly, the network landscape closely coincides with the subcellular localization of proteins. Together, these findings suggest the potential of using conceptual frameworks to mimic the true functional organization in a living cell.
PMCID: PMC4239568  PMID: 25412639
21.  Dominating Biological Networks 
PLoS ONE  2011;6(8):e23016.
Proteins are essential macromolecules of life that carry out most cellular processes. Since proteins aggregate to perform function, and since protein-protein interaction (PPI) networks model these aggregations, one would expect to uncover new biology from PPI network topology. Hence, using PPI networks to predict protein function and role of protein pathways in disease has received attention. A debate remains open about whether network properties of “biologically central (BC)” genes (i.e., their protein products), such as those involved in aging, cancer, infectious diseases, or signaling and drug-targeted pathways, exhibit some topological centrality compared to the rest of the proteins in the human PPI network.
To help resolve this debate, we design new network-based approaches and apply them to get new insight into biological function and disease. We hypothesize that BC genes have a topologically central (TC) role in the human PPI network. We propose two different concepts of topological centrality. We design a new centrality measure to capture complex wirings of proteins in the network that identifies as TC those proteins that reside in dense extended network neighborhoods. Also, we use the notion of domination and find dominating sets (DSs) in the PPI network, i.e., sets of proteins such that every protein is either in the DS or is a neighbor of the DS. Clearly, a DS has a TC role, as it enables efficient communication between different network parts.
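The definition of a dominating set given above (every protein is either in the set or adjacent to a member of it) can be made concrete with a short sketch. The greedy heuristic below is a common textbook approach and is used here only as an assumption for illustration; the abstract does not say which procedure the authors used, and the greedy result is not guaranteed to be a minimum dominating set.

```python
def is_dominating(adj, candidate):
    """True if every vertex is in `candidate` or has a neighbour in it."""
    return all(v in candidate or adj[v] & candidate for v in adj)


def greedy_dominating_set(adj):
    """Greedy heuristic: repeatedly add the vertex covering the most
    still-undominated vertices. adj is {vertex: set of neighbours}."""
    undominated = set(adj)
    chosen = set()
    while undominated:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(adj), key=lambda v: len(({v} | adj[v]) & undominated))
        chosen.add(best)
        undominated -= {best} | adj[best]
    return chosen
```

On a star-shaped toy network the heuristic returns only the central node, which matches the intuition that such a protein connects all other parts of the network.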
We find statistically significant enrichment in BC genes of TC nodes and outperform the existing methods indicating that genes involved in key biological processes occupy topologically complex and dense regions of the network and correspond to its “spine” that connects all other network parts and can thus pass cellular signals efficiently throughout the network. To our knowledge, this is the first study that explores domination in the context of PPI networks.
PMCID: PMC3162560  PMID: 21887225
22.  POINeT: protein interactome with sub-network analysis and hub prioritization 
BMC Bioinformatics  2009;10:114.
Protein-protein interactions (PPIs) are critical to every aspect of biological processes. Expansion of all PPIs from a set of given queries often results in a complex PPI network lacking spatiotemporal consideration. Moreover, the reliability of available PPI resources, which consist of low- and high-throughput data, for network construction remains a significant challenge. Even though a number of software tools are available to facilitate PPI network analysis, an integrated tool is crucial to alleviate the burden on querying across multiple web servers and software tools.
We have constructed an integrated web service, POINeT, to simplify the process of PPI searching, analysis, and visualization. POINeT merges PPI and tissue-specific expression data from multiple resources. The tissue-specific PPIs and the numbers of research papers supporting the PPIs can be filtered with user-adjustable threshold values and are dynamically updated in the viewer. The network constructed in POINeT can be readily analyzed with, for example, the built-in centrality calculation module and an integrated network viewer. Nodes in global networks can also be ranked and filtered using various network analysis formulas, i.e., centralities. To prioritize the sub-network, we developed a ranking filtered method (S3) to uncover potential novel mediators in the midbody network. Several examples are provided to illustrate the functionality of POINeT. The network constructed from four schizophrenia risk markers suggests that EXOC4 might be a novel marker for this disease. Finally, a liver-specific PPI network has been filtered with adult and fetal liver expression profiles.
The functionalities provided by POINeT are highly improved compared to the previous version of POINT. POINeT enables the identification and ranking of potential novel genes involved in a sub-network. Combined with tissue-specific gene expression profiles, PPIs specific to selected tissues can be revealed. The straightforward interface of POINeT makes PPI search and analysis just a few clicks away. The modular design permits further functional enhancement without hampering the simplicity. POINeT is available at .
PMCID: PMC2683814  PMID: 19379523
23.  Integrating domain similarity to improve protein complexes identification in TAP-MS data 
Proteome Science  2013;11(Suppl 1):S2.
Detecting protein complexes in protein-protein interaction (PPI) networks plays an important role in improving our understanding of the dynamic of cellular organisation. However, protein interaction data generated by high-throughput experiments such as yeast-two-hybrid (Y2H) and tandem affinity-purification/mass-spectrometry (TAP-MS) are characterised by the presence of a significant number of false positives and false negatives. In recent years there has been a growing trend to incorporate diverse domain knowledge to support large-scale analysis of PPI networks.
This paper presents a new algorithm that incorporates Gene Ontology (GO) based semantic similarities to detect protein complexes from PPI networks generated by TAP-MS. By taking co-complex relations in TAP-MS data into account, TAP-MS PPI networks are modelled as a bipartite graph, where bait proteins constitute one set of nodes and prey proteins the other. Similarities between pairs of bait proteins are computed by considering both the topological features and GO-driven semantic similarities. Bait proteins are then grouped into clusters based on their pair-wise similarities to produce a set of 'seed' clusters. An expansion process is applied to each 'seed' cluster to recruit prey proteins which are significantly associated with the same set of bait proteins. Thus, completely identified protein complexes are then obtained.
The proposed algorithm has been applied to real TAP-MS PPI networks. Fifteen quality measures have been employed to evaluate the quality of generated protein complexes. Experimental results show that the proposed algorithm has greatly improved the accuracy of identifying complexes and outperformed several state-of-the-art clustering algorithms. Moreover, by incorporating semantic similarity, the proposed algorithm is more robust to noise in the networks.
PMCID: PMC3907791  PMID: 24565259
24.  Functional organization and its implication in evolution of the human protein-protein interaction network 
BMC Genomics  2012;13:150.
Based on the distinguishing properties of protein-protein interaction networks such as power-law degree distribution and modularity structure, several stochastic models for the evolution of these networks have been proposed, motivated by the idea that a validated model should reproduce similar topological properties of the empirical network. However, being able to capture topological properties does not necessarily mean it correctly reproduces how networks emerge and evolve. More importantly, there is already evidence suggesting functional organization and significance of these networks. The current stochastic models of evolution, however, grow the network without consideration for biological function and natural selection.
To test whether protein interaction networks are functionally organized and how this affects their evolution, we analyzed their evolution at both the topological and functional level. We find that the human network is functionally organized, and its function evolves with the topological properties of the network. Our analysis suggests that function most likely affects local modularity of the network. Consistently, we further found that the topological unit is also the functional unit of the network.
We have demonstrated functional organization of a protein interaction network. Given our observations, we suggest that its significance should not be overlooked when studying network evolution.
PMCID: PMC3375200  PMID: 22530615
25.  The topology of the bacterial co-conserved protein network and its implications for predicting protein function 
BMC Genomics  2008;9:313.
Protein-protein interaction networks are most often generated from physical protein-protein interaction data. Co-conservation, also known as phylogenetic profiling, is an alternative source of information for generating protein interaction networks. Co-conservation methods generate interaction networks among proteins that are gained or lost together through evolution. Co-conservation is a particularly useful technique in compact bacterial genomes. Prior studies in yeast suggest that the topology of protein-protein interaction networks generated from physical interaction assays can offer important insight into protein function. Here, we hypothesize that in bacteria, the topology of protein interaction networks derived from co-conservation information could similarly improve methods for predicting protein function. Since the topology of bacterial co-conservation protein-protein interaction networks has not previously been studied in depth, we first perform such an analysis for co-conservation networks in E. coli K12. Next, we demonstrate one way in which network connectivity measures and global and local function distribution can be exploited to predict protein function for previously uncharacterized proteins.
Our results showed that, like most biological networks, our bacterial co-conserved protein-protein interaction networks had scale-free topologies. Our results indicated that some properties of the physical yeast interaction network hold in our bacterial co-conservation networks, such as high connectivity for essential proteins. However, the high connectivity among protein complexes in the yeast physical network was not seen in the co-conservation network that uses all bacteria as the reference set. We found that the distribution of node connectivity varied by functional category and could be informative for function prediction. By integrating functional information from different annotation sources and using the network topology, we were able to infer function for uncharacterized proteins.
Interaction networks based on co-conservation can contain information distinct from networks based on physical or other interaction types. Our study has shown that co-conservation-based networks exhibit a scale-free topology, as expected for biological networks. We also revealed ways in which connectivity in our networks can be informative for the functional characterization of proteins.
PMCID: PMC2488357  PMID: 18590549
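The idea of co-conservation in the abstract above, linking proteins that are gained or lost together across genomes, can be sketched as follows. The abstract does not say how profile similarity was scored, so the Jaccard index and the `cutoff` value here are assumptions, and the function names are hypothetical.

```python
from itertools import combinations


def coconservation_network(profiles, cutoff=0.8):
    """Link proteins whose phylogenetic profiles are similar.

    profiles: dict mapping each protein to the set of genomes in which
              it is present (its phylogenetic profile).
    cutoff:   minimum Jaccard similarity for an edge (illustrative value).
    Returns a set of (protein_u, protein_v) edges with protein_u < protein_v.
    """
    edges = set()
    for u, v in combinations(sorted(profiles), 2):
        a, b = profiles[u], profiles[v]
        union = a | b
        if union and len(a & b) / len(union) >= cutoff:
            edges.add((u, v))
    return edges


def degrees(profiles, edges):
    """Node connectivity: number of co-conservation partners per protein."""
    deg = {p: 0 for p in profiles}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg
```

The per-protein degree from `degrees` is the kind of connectivity measure whose distribution could then be compared across functional categories.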
