Protein-protein interaction networks are most often generated from physical protein-protein interaction data. Co-conservation, also known as phylogenetic profiling, is an alternative source of information for generating protein interaction networks. Co-conservation methods generate interaction networks among proteins that are gained or lost together through evolution. Co-conservation is a particularly useful technique for the compact genomes of bacteria. Prior studies in yeast suggest that the topology of protein-protein interaction networks generated from physical interaction assays can offer important insight into protein function. Here, we hypothesize that in bacteria, the topology of protein interaction networks derived from co-conservation information could similarly improve methods for predicting protein function. Since the topology of bacterial co-conservation protein-protein interaction networks has not previously been studied in depth, we first perform such an analysis for co-conservation networks in E. coli K12. Next, we demonstrate one way in which network connectivity measures and the global and local distribution of functions can be exploited to predict the function of previously uncharacterized proteins.
Our results showed that, like most biological networks, our bacterial co-conservation protein-protein interaction networks have scale-free topologies. Our results also indicated that some properties of the physical yeast interaction network, such as high connectivity for essential proteins, hold in our bacterial co-conservation networks. However, the high connectivity among protein complexes seen in the yeast physical network was not observed in the co-conservation network that uses all bacteria as the reference set. We found that the distribution of node connectivity varied by functional category and could be informative for function prediction. By integrating functional information from different annotation sources and using the network topology, we were able to infer function for uncharacterized proteins.
Interaction networks based on co-conservation can contain information distinct from networks based on physical or other interaction types. Our study has shown that co-conservation-based networks exhibit a scale-free topology, as expected for biological networks. We also revealed ways in which connectivity in our networks can inform the functional characterization of proteins.
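A minimal sketch of how a co-conservation network can be built from phylogenetic profiles, assuming binary presence/absence vectors over a set of reference genomes and a hypothetical Jaccard-similarity cutoff of 0.6; the abstract does not specify the profile encoding or the similarity measure, and the protein names are placeholders. The degree counts at the end are the raw material for the connectivity analysis described above.

```python
from collections import Counter
from itertools import combinations


def jaccard(p, q):
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def co_conservation_network(profiles, threshold=0.6):
    """Link proteins whose profiles are at least `threshold` similar (hypothetical cutoff)."""
    adj = {name: set() for name in profiles}
    for a, b in combinations(sorted(profiles), 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_distribution(adj):
    """Map degree k -> number of proteins with exactly k links."""
    return Counter(len(nbrs) for nbrs in adj.values())


# One column per reference genome: 1 = homolog present, 0 = absent.
profiles = {
    "protA": (1, 1, 0, 0),
    "protB": (1, 1, 0, 0),
    "protC": (0, 0, 1, 1),
    "protD": (1, 1, 0, 1),
}
network = co_conservation_network(profiles)
print(degree_distribution(network))
```

On a real genome the same degree counts, plotted on log-log axes, are what one would inspect for a scale-free shape.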
Experimental methods for identifying essential proteins are costly, time-consuming, and laborious, so determining protein essentiality through experiments alone is challenging. With the development of high-throughput technologies, a vast amount of protein-protein interaction data has become available, enabling the identification of essential proteins at the network level. Many computational methods for this task have been proposed based on the topological properties of protein-protein interaction (PPI) networks. However, the currently available PPI networks for each species are incomplete (i.e., they contain false negatives) and very noisy (i.e., they have high false-positive rates), and network topology-based centrality measures are often very sensitive to such noise. Exploring robust methods for identifying essential proteins would therefore be of great value.
In this paper, we propose a new essential protein discovery method, named CoEWC (Co-Expression Weighted by Clustering coefficient). CoEWC integrates the topological properties of the PPI network with the co-expression of interacting proteins. Its aim is to capture the common features of essential proteins among both date hubs and party hubs. The performance of CoEWC is validated on the PPI network of Saccharomyces cerevisiae. Experimental results show that CoEWC significantly outperforms the classical centrality measures, and that it also outperforms PeC, a recently proposed essential protein discovery method that outperforms 15 other centrality measures on the PPI network of Saccharomyces cerevisiae. In particular, when no more than 500 proteins are predicted, CoEWC improves on degree centrality (DC), one of the better centrality measures for identifying protein essentiality, by more than 50%.
We demonstrate that a more robust essential protein discovery method can be developed by integrating the topological properties of the PPI network with the co-expression of interacting proteins. The proposed centrality measure, CoEWC, is effective for the discovery of essential proteins.
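The date-hub/party-hub distinction mentioned above rests on how strongly a hub is co-expressed with its partners: party hubs are broadly co-expressed with them, date hubs are not. The sketch below scores a hub by its mean Pearson correlation with its partners. The 0.5 cutoff, the toy profiles, and the function names are hypothetical, and this is not the CoEWC formula, which the abstract does not give.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def mean_partner_coexpression(hub, adj, expr):
    """Average co-expression between a hub and its interaction partners."""
    partners = adj[hub]
    return sum(pearson(expr[hub], expr[p]) for p in partners) / len(partners)


def classify_hub(hub, adj, expr, cutoff=0.5):
    """Label a hub 'party' (co-expressed with partners) or 'date' (not), using a hypothetical cutoff."""
    return "party" if mean_partner_coexpression(hub, adj, expr) >= cutoff else "date"


adj = {"H": {"P1", "P2"}, "G": {"Q1", "Q2"}}
expr = {
    "H": [1, 2, 3, 4], "P1": [2, 4, 6, 8], "P2": [1, 2, 3, 5],
    "G": [1, 2, 3, 4], "Q1": [4, 3, 2, 1], "Q2": [1, 3, 2, 4],
}
print(classify_hub("H", adj, expr), classify_hub("G", adj, expr))
```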
Uncovering the protein–protein interaction network is a fundamental step in the quest to understand the molecular machinery of a cell. This motivates the search for efficient computational methods for predicting such interactions. Among the available predictors are those based on the co-evolution hypothesis: “evolutionary trees of protein families (that are known to interact) are expected to have similar topologies”. Many of these methods are limited because they can handle only a small number of protein sequences. Moreover, they lose details of the evolutionary tree topology because they use similarity matrices in lieu of the trees.
We introduce MORPH, a new algorithm for predicting protein interaction partners between members of two protein families that are known to interact. Our approach can also be seen as a new method for searching for the best superposition of the corresponding evolutionary trees based on the tree automorphism group. We discuss relevant facts related to the predictability of protein–protein interactions based on their co-evolution. Compared with related computational approaches, our method reduces the search space by ~3 × 10^5-fold while also increasing the accuracy of predicting correct binding partners.
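MORPH frames the search over tree superpositions in terms of the tree automorphism group. As background only (this is not MORPH's search procedure), the sketch below computes the size of the automorphism group of a rooted, unordered tree: children whose subtrees share the same AHU-style canonical code can be permuted freely, so each group of m identical sibling subtrees contributes a factor m!. The dictionary-of-children tree representation is an assumption.

```python
from collections import Counter
from math import factorial


def canonical_and_automorphisms(tree, node):
    """Return (canonical code, |Aut|) for the subtree rooted at `node`.

    `tree` maps each internal node to a list of its children; leaves may be absent.
    """
    children = tree.get(node, [])
    if not children:
        return "()", 1
    results = [canonical_and_automorphisms(tree, child) for child in children]
    count = 1
    for _, child_aut in results:
        count *= child_aut  # automorphisms acting inside each child subtree
    for multiplicity in Counter(code for code, _ in results).values():
        count *= factorial(multiplicity)  # swaps of isomorphic sibling subtrees
    code = "(" + "".join(sorted(code for code, _ in results)) + ")"
    return code, count


balanced = {"root": ["x", "y"], "x": ["a", "b"], "y": ["c", "d"]}
unbalanced = {"root": ["x", "c"], "x": ["a", "b"]}
print(canonical_and_automorphisms(balanced, "root")[1])
print(canonical_and_automorphisms(unbalanced, "root")[1])
```

The balanced four-leaf tree has 8 automorphisms while the unbalanced three-leaf tree has only 2, which shows how tree shape determines how many leaf mappings are equivalent.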
The structure of molecular networks is believed to determine important aspects of their cellular function, such as the organismal resilience against random perturbations. Ultimately, however, cellular behaviour is determined by the dynamical processes, which are constrained by network topology. The present work is based on a fundamental relation from dynamical systems theory, which states that the macroscopic resilience of a steady state is correlated with the uncertainty in the underlying microscopic processes, a property that can be measured by entropy. Here, we use recent network data from large-scale protein interaction screens to characterize the diversity of possible pathways in terms of network entropy. This measure has its origin in statistical mechanics and amounts to a global characterization of both structural and dynamical resilience in terms of microscopic elements. We demonstrate how this approach can be used to rank network elements according to their contribution to network entropy and also investigate how this suggested ranking reflects on the functional data provided by gene knockouts and RNAi experiments in yeast and Caenorhabditis elegans. Our analysis shows that knockouts of proteins with large contribution to network entropy are preferentially lethal. This observation is robust with respect to several possible errors and biases in the experimental data. It underscores the significance of entropy as a fundamental invariant of the dynamical system, and as a measure of structural and dynamical properties of networks. Our analytical approach goes beyond the phenomenological studies of cellular robustness based on local network observables, such as connectivity. One of its principal achievements is to provide a rationale to study proxies of cellular resilience and rank proteins according to their importance within the global network context.
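The abstract does not define network entropy explicitly, so the sketch below assumes one common choice: the entropy rate of a stationary simple random walk on an undirected graph, H = Σ_i (k_i / 2m) log k_i, where k_i is the degree of node i and m is the number of edges. As a hypothetical ranking rule, a node's contribution is the drop in H when that node is deleted. The published measure may be built on a different Markov process.

```python
import math


def entropy_rate(adj):
    """Entropy rate of the simple random walk whose stationary probabilities are k_i / 2m."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    two_m = sum(degrees)
    if two_m == 0:
        return 0.0
    # Each step from node i picks one of its k_i neighbors uniformly: log(k_i) nats.
    return sum(k / two_m * math.log(k) for k in degrees if k > 0)


def without_node(adj, node):
    """Copy of the graph with `node` and its incident edges removed."""
    return {v: nbrs - {node} for v, nbrs in adj.items() if v != node}


def entropy_contributions(adj):
    """Hypothetical ranking: drop in entropy rate caused by deleting each node."""
    base = entropy_rate(adj)
    return {v: base - entropy_rate(without_node(adj, v)) for v in adj}


star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
scores = entropy_contributions(star)
print(sorted(scores, key=scores.get, reverse=True))
```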
network entropy; protein interactions; cellular robustness
The large influx of data from high-throughput genomic and proteomic technologies has encouraged researchers to seek approaches for understanding the structure of gene regulatory networks and proteomic networks. This work reviews some of the most important statistical methods used for modeling gene regulatory networks (GRNs) and protein-protein interaction (PPI) networks. The paper focuses on recent advances in statistical graphical modeling techniques, state-space representation models, and information-theoretic methods that have been proposed for inferring the topology of GRNs. The problem of inferring the structure of PPI networks appears to be quite different from that of GRNs. Clustering and probabilistic graphical modeling techniques are of prime importance in the statistical inference of PPI networks, and some recent approaches using these techniques are also reviewed. Performance evaluation criteria for the approaches used for modeling GRNs and PPI networks are also discussed.
Hepatitis B virus (HBV) infection is a leading cause of liver diseases such as hepatitis, cirrhosis and hepatocellular carcinoma. In this study, we use computational methods to improve our understanding of the complex interactions that occur between molecules related to HBV. Because of the complexity of the disease and the numerous molecular players involved, we devised a method to construct a systemic network of interactions among the processes ongoing in patients affected by HBV. The network is based on high-throughput data, refined semi-automatically with carefully curated literature-based information. We find that some nodes in the network are topologically important; in particular, HBx is one such node and is also known to be an important target protein for the treatment of HBV. HBx is therefore the preferred choice for inhibition to stop the proteolytic processing. Accordingly, the 3D structure of the HBx protein was obtained from the PDB, and ligands for its active site were designed using LigBuilder. The active site of HBx was then explored by molecular docking with AutoDock Vina to identify the critical interaction patterns for inhibitor binding. These predictions should be validated using suitable assays before further consideration.
Hepatitis B virus; HBx protein; PathVisio; Molecular-interaction map; Virtual screening; Docking; Inhibitor
Studies of cellular signaling indicate that signal transduction pathways combine to form large networks of interactions. Viewing protein-protein and ligand-protein interactions as graphs (networks), where biomolecules are represented as nodes and their interactions are represented as links, is a promising approach for integrating experimental results from different sources to achieve a systematic understanding of the molecular mechanisms driving cell phenotype. The emergence of large-scale signaling networks provides an opportunity for topological statistical analysis while visualization of such networks represents a challenge.
SNAVI is a Windows-based desktop application that implements standard network analysis methods to compute clustering coefficients and connectivity distributions and to detect network motifs; it also provides means to visualize networks and network motifs. SNAVI can generate linked web pages from network datasets loaded in text format, and it can also create networks from lists of gene or protein names.
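Motif detection of the kind SNAVI offers starts from counting small subgraph patterns. The sketch below counts feed-forward loops (X→Y, Y→Z and X→Z) in a directed signaling graph given as an edge list. It is an illustration, not SNAVI's implementation: it counts every occurrence of the pattern (including ones with extra edges among the three nodes), and a full motif analysis would also compare the counts against randomized networks.

```python
from collections import defaultdict


def count_feed_forward_loops(edges):
    """Count triples (x, y, z) of distinct nodes with edges x->y, y->z and x->z."""
    successors = defaultdict(set)
    for source, target in edges:
        if source != target:  # ignore self-loops
            successors[source].add(target)
    count = 0
    for x in list(successors):
        for y in successors[x]:
            for z in successors.get(y, ()):
                if z != x and z in successors[x]:
                    count += 1
    return count


edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D")]
print(count_feed_forward_loops(edges))
```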
SNAVI is a useful tool for analyzing, visualizing and sharing cell signaling data. SNAVI is free, open-source software. The installation may be downloaded from: . The source code can be accessed from:
Complex genetic disorders often involve the products of multiple genes acting cooperatively. Hence, the pathophenotype is the outcome of perturbations in the underlying pathways, where gene products cooperate through various mechanisms such as protein-protein interactions. Pinpointing the decisive elements of such disease pathways is still challenging. In recent years, computational approaches exploiting interaction network topology have been successfully applied to prioritize individual genes involved in diseases. Although linkage intervals provide a list of disease-gene candidates, recent genome-wide studies demonstrate that genes not associated with any known linkage interval may also contribute to the disease phenotype. Network-based prioritization methods help highlight such associations. Still, there is a need for robust methods that capture the interplay among disease-associated genes mediated by the topology of the network. Here, we propose a genome-wide network-based prioritization framework named GUILD. This framework implements four network-based disease-gene prioritization algorithms. We analyze the performance of these algorithms on dozens of disease phenotypes and compare them to state-of-the-art network topology-based algorithms for gene prioritization. As a proof of principle, we investigate top-ranking genes in Alzheimer's disease (AD), diabetes and AIDS using disease-gene associations from various sources. We show that GUILD is able to significantly highlight disease-gene associations that are not used a priori. Our findings suggest that GUILD helps to identify genes implicated in the pathology of human disorders independently of the loci associated with the disorders.
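To make network-based prioritization concrete, the sketch below implements random walk with restart, a widely used propagation scheme; it is not one of GUILD's four algorithms, which the abstract does not detail. Scores spread from known disease genes (seeds) along interactions, and the walker returns to the seeds with a hypothetical restart probability of 0.3, so genes close to the seeds score highly even if they lie outside any linkage interval. Gene names are placeholders.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seed genes."""
    nodes = sorted(adj)
    seed_set = set(seeds)
    start = {v: (1.0 / len(seed_set) if v in seed_set else 0.0) for v in nodes}
    p = dict(start)
    for _ in range(max_iter):
        new = {v: restart * start[v] for v in nodes}
        for v in nodes:
            if adj[v]:
                share = (1.0 - restart) * p[v] / len(adj[v])
                for u in adj[v]:
                    new[u] += share  # move to a uniformly chosen neighbor
        converged = sum(abs(new[v] - p[v]) for v in nodes) < tol
        p = new
        if converged:
            break
    return p


# Toy interaction path: seed - geneA - geneB - geneC
adj = {
    "seed": {"geneA"},
    "geneA": {"seed", "geneB"},
    "geneB": {"geneA", "geneC"},
    "geneC": {"geneB"},
}
scores = random_walk_with_restart(adj, ["seed"])
print(sorted(scores, key=scores.get, reverse=True))
```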
The use of computational methods for predicting protein interaction networks will continue to grow with the number of fully sequenced genomes available. The Co-Conservation method, also known as the Phylogenetic profiles method, is a well-established computational tool for predicting functional relationships between proteins.
Here, we examined how various aspects of this method affect the accuracy and topology of protein interaction networks. We have shown that the choice of reference genome influences the number of predictions involving proteins of previously unknown function, the accuracy of predicted interactions, and the topology of predicted interaction networks. We show that while such results are relatively insensitive to the E-value threshold used in defining homologs, predicted interactions are influenced by the similarity metric that is employed. We also show that these differences in predicted protein interactions are biologically meaningful: judicious selection of reference genomes, or use of a new scoring scheme that explicitly considers reference genome relatedness, recovers known protein interactions and also predicts interactions involving coordinated biological processes that are not accessible using currently available databases.
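Because the predicted network depends on the similarity metric, it helps to see one metric spelled out. The sketch below scores a pair of binary phylogenetic profiles with a one-sided hypergeometric p-value: the probability that two proteins present in n_a and n_b of N reference genomes would co-occur in at least k genomes by chance. This is one common choice, offered as an assumption; it does not implement the relatedness-aware scoring scheme described above.

```python
from math import comb


def cooccurrence_pvalue(n_genomes, n_a, n_b, k):
    """P(X >= k) for X ~ Hypergeometric(population n_genomes, n_a successes, n_b draws)."""
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, i) * comb(n_genomes - n_a, n_b - i) for i in range(k, upper + 1))
    return tail / comb(n_genomes, n_b)


def profile_pvalue(p, q):
    """Hypergeometric co-occurrence p-value for two binary profiles over the same genomes."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    return cooccurrence_pvalue(len(p), sum(p), sum(q), both)


print(profile_pvalue((1, 1, 0, 0), (1, 1, 0, 0)))
```

Smaller p-values indicate stronger co-conservation; with only four genomes, even perfectly matching profiles give p = 1/6, which is one reason the size and composition of the reference set matter.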
These studies should prove valuable for future work seeking to further improve phylogenetic profiling methodologies, as well as for efforts to employ such methods efficiently to develop new biological insights.
Predicting the biological function of all the genes of an organism is one of the fundamental goals of computational systems biology. In the last decade, high-throughput experimental methods for studying the functional interactions between gene products (GPs) have been combined with computational approaches based on Bayesian networks for data integration. The result of these computational approaches is an interaction network whose weighted links represent the likelihood that two GPs are functionally related. The weighted network generated in this way can be used to predict annotations for functionally uncharacterized GPs. Here we introduce Weighted Network Predictor (WNP), a novel algorithm for predicting the function of biologically uncharacterized GPs. Tests conducted on simulated data show that WNP outperforms five other state-of-the-art methods in terms of both specificity and sensitivity, and that it is better able to exploit and propagate the functional and topological information of the network. We apply our method to Saccharomyces cerevisiae and Arabidopsis thaliana networks and predict Gene Ontology functions for about 500 and 10,000 uncharacterized GPs, respectively.
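The simplest way to use such a weighted network for annotation is weighted neighbor voting, sketched below as a baseline for illustration; it is not the WNP algorithm. Each term is scored by the share of a gene product's total link weight that points to neighbors annotated with that term. The term labels, gene product names, and weights are placeholders.

```python
from collections import defaultdict


def weighted_vote(weighted_adj, annotations, protein):
    """Score each term by the fraction of `protein`'s link weight supporting it."""
    neighbors = weighted_adj.get(protein, {})
    total = sum(neighbors.values())
    if total == 0:
        return {}
    scores = defaultdict(float)
    for neighbor, weight in neighbors.items():
        for term in annotations.get(neighbor, ()):
            scores[term] += weight
    return {term: score / total for term, score in scores.items()}


weighted_adj = {"unknown1": {"gpA": 0.8, "gpB": 0.2, "gpC": 0.5}}
annotations = {"gpA": {"term_X"}, "gpB": {"term_Y"}, "gpC": {"term_X", "term_Y"}}
scores = weighted_vote(weighted_adj, annotations, "unknown1")
print(max(scores, key=scores.get))
```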
Proteins are essential macromolecules of life that carry out most cellular processes. Since proteins aggregate to perform their functions, and since protein-protein interaction (PPI) networks model these aggregations, one would expect to uncover new biology from PPI network topology. Hence, using PPI networks to predict protein function and the role of protein pathways in disease has received attention. A debate remains open about whether the network properties of “biologically central (BC)” genes (i.e., their protein products), such as those involved in aging, cancer, infectious diseases, or signaling and drug-targeted pathways, exhibit some topological centrality compared to the rest of the proteins in the human PPI network.
To help resolve this debate, we design new network-based approaches and apply them to gain new insight into biological function and disease. We hypothesize that BC genes have a topologically central (TC) role in the human PPI network. We propose two different concepts of topological centrality. First, we design a new centrality measure that captures complex wirings of proteins in the network and identifies as TC those proteins that reside in dense extended network neighborhoods. Second, we use the notion of domination and find dominating sets (DSs) in the PPI network, i.e., sets of proteins such that every protein is either in the DS or is a neighbor of a protein in the DS. A DS has a TC role because it enables efficient communication between different network parts.
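Finding a minimum dominating set is NP-hard, so large networks are usually handled with heuristics. The sketch below is the standard greedy approximation (repeatedly add the protein that dominates the most still-undominated proteins), shown as an illustration; the abstract does not say which algorithm the authors used.

```python
def greedy_dominating_set(adj):
    """Greedy approximation of a minimum dominating set of an undirected graph."""
    undominated = set(adj)
    chosen = set()
    while undominated:
        # A node dominates itself and all of its neighbors; ties go to the smallest label.
        best = max(sorted(adj), key=lambda v: len(({v} | adj[v]) & undominated))
        chosen.add(best)
        undominated -= {best} | adj[best]
    return chosen


def is_dominating(adj, candidate):
    """True if every node is in `candidate` or adjacent to a member of it."""
    return all(v in candidate or adj[v] & candidate for v in adj)


path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
ds = greedy_dominating_set(path)
print(ds, is_dominating(path, ds))
```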
We find statistically significant enrichment of BC genes among TC nodes, and our approaches outperform existing methods. This indicates that genes involved in key biological processes occupy topologically complex and dense regions of the network and correspond to its “spine”, which connects all other network parts and can thus pass cellular signals efficiently throughout the network. To our knowledge, this is the first study that explores domination in the context of PPI networks.
Protein interaction networks have become a tool for studying biological processes, either for predicting molecular functions or for designing new drugs to regulate the main biological interactions. Such networks are known to be organized into sub-networks of proteins contributing to the same cellular function. However, protein function prediction is not accurate, and the network formalism has traditionally assigned each protein to only one function. By considering the network of physical interactions between yeast proteins together with a manual, single-function classification scheme, we introduce a method able to reveal important information on protein function at both the micro- and macro-scale. In particular, inspecting the properties of oscillatory dynamics on top of the protein interaction network leads to the identification of misclassification problems in protein function assignments, as well as to the correct identification of protein functions. We also demonstrate that our approach can give a network representation of the meta-organization of biological processes by unraveling the interactions between different functional classes.
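The abstract does not specify the oscillator model, so the sketch below assumes the Kuramoto model, a standard choice for oscillatory dynamics on networks: each protein i carries a phase θ_i that evolves as dθ_i/dt = ω_i + K Σ_j A_ij sin(θ_j − θ_i), integrated here with a simple Euler step. The order parameter r (between 0 and 1) measures how synchronized a set of proteins is; comparing r within a functional class is one hypothetical way to flag proteins whose phases do not follow their class. All parameter values are illustrative.

```python
import math


def kuramoto(adj, omega, theta0, coupling=1.0, dt=0.05, steps=2000):
    """Euler integration of dθ_i/dt = ω_i + K * Σ_{j ∈ N(i)} sin(θ_j − θ_i)."""
    theta = dict(theta0)
    for _ in range(steps):
        rates = {
            i: omega[i] + coupling * sum(math.sin(theta[j] - theta[i]) for j in adj[i])
            for i in adj
        }
        for i in adj:
            theta[i] += dt * rates[i]
    return theta


def order_parameter(phases):
    """Kuramoto order parameter r = |mean of exp(iθ)|; 1 means perfect synchrony."""
    phases = list(phases)
    re = sum(math.cos(p) for p in phases) / len(phases)
    im = sum(math.sin(p) for p in phases) / len(phases)
    return math.hypot(re, im)


nodes = ["p1", "p2", "p3", "p4"]
complete = {v: {u for u in nodes if u != v} for v in nodes}
omega = {v: 0.0 for v in nodes}
theta0 = {"p1": 0.0, "p2": 0.5, "p3": 1.0, "p4": 1.5}
final = kuramoto(complete, omega, theta0)
print(round(order_parameter(final.values()), 3))
```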
Simulating signal transduction in cellular signaling networks provides predictions of network dynamics by quantifying the changes in concentration and activity level of individual proteins. Since numerical values of kinetic parameters can be difficult to obtain, it is imperative to develop non-parametric approaches that combine the connectivity of a network with the response of individual proteins to signals that travel through the network. The activity levels of signaling proteins computed by existing non-parametric modeling tools do not correlate significantly with the values observed experimentally. In this work, we developed a non-parametric computational framework that describes how the process evolves and the time course of the proportion of the active form of each molecule in signal transduction networks. The model can also incorporate perturbations. It was validated on four signaling networks, showing that it can effectively uncover the activity levels and response trends during the signal transduction process.
Different species are of different importance in maintaining ecosystem functions in natural communities. Quantitative approaches are needed to identify unusually important or influential ‘keystone’ species, particularly for conservation purposes. Since the importance of some species may largely be a consequence of their rich interaction structure, one possible quantitative approach to identifying the most influential species is to study their position in the network of interspecific interactions. In this paper, I discuss the role of network analysis (and centrality indices in particular) in this process and present a new and simple approach to characterizing the interaction structure of each species in a complex network. Because understanding the link between structure and dynamics is a precondition for testing the results of topological studies, I briefly review our current knowledge on this issue. The study of key nodes in networks has become of increasingly general interest in several disciplines, and I discuss some parallels. Finally, I argue that conservation biology needs to devote more attention to identifying and conserving keystone species and relatively less attention to rarity.
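Betweenness centrality is one of the standard centrality indices used to rank species by network position. The sketch below computes it for an unweighted, undirected interaction web with Brandes' algorithm. Treating trophic links as undirected and unweighted is a simplifying assumption, and this is not the new index proposed in the paper.

```python
from collections import deque


def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: value / 2.0 for v, value in bc.items()}


food_web = {"plant": {"herbivore"}, "herbivore": {"plant", "predator"}, "predator": {"herbivore"}}
print(betweenness(food_web))
```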
centrality; food web; indirect effect; keystone species; network analysis
Identifying essential proteins is a challenging task because it requires experimental approaches that are time-consuming and laborious. With advances in high-throughput technologies, a large number of protein-protein interactions have become available, producing unprecedented opportunities for detecting protein essentiality at the network level. A series of computational approaches have been proposed for predicting essential proteins based on network topology. However, network topology-based centrality measures are very sensitive to the reliability of the network data. A new, robust essential protein discovery method would therefore be of great value.
In this paper, we propose a new centrality measure, named PeC, based on the integration of protein-protein interaction and gene expression data. The performance of PeC is validated on the protein-protein interaction network of Saccharomyces cerevisiae. The experimental results show that the prediction precision of PeC clearly exceeds that of fifteen other previously proposed centrality measures: Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), Bottle Neck (BN), Density of Maximum Neighborhood Component (DMNC), Local Average Connectivity-based method (LAC), Sum of ECC (SoECC), Range-Limited Centrality (RL), L-index (LI), Leader Rank (LR), Normalized α-Centrality (NC), and Moduland-Centrality (MC). In particular, the improvement of PeC over the classic centrality measures (BC, CC, SC, EC, and BN) is more than 50% when predicting no more than 500 proteins.
We demonstrate that integrating protein-protein interaction networks with gene expression data can help improve the precision of predicting essential proteins. The new centrality measure, PeC, is an effective essential protein discovery method.
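PeC is usually stated as PeC(v) = Σ_{u ∈ N(v)} ECC(u, v) · PCC(u, v), where the edge clustering coefficient ECC(u, v) = z_uv / min(d_u − 1, d_v − 1) counts the triangles z_uv through edge (u, v), d denotes degree, and PCC is the Pearson correlation of the two genes' expression profiles. The sketch below follows that formulation as we understand it; treat the exact definitions (and the zero score for edges with min(d_u − 1, d_v − 1) = 0) as assumptions to check against the paper. The toy network and profiles are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (0.0 if either profile is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def edge_clustering(adj, u, v):
    """Triangles through edge (u, v) divided by the most that could exist."""
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return len(adj[u] & adj[v]) / possible if possible > 0 else 0.0


def pec(adj, expr):
    """PeC-style score: sum over neighbors of ECC(u, v) * PCC(u, v)."""
    return {
        v: sum(edge_clustering(adj, u, v) * pearson(expr[u], expr[v]) for u in adj[v])
        for v in adj
    }


adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
expr = {"a": [1, 2, 3], "b": [1, 2, 3], "c": [2, 4, 6], "d": [3, 2, 1]}
scores = pec(adj, expr)
print(sorted(scores, key=scores.get, reverse=True))
```

The pendant protein d scores zero despite being linked to the highest-degree node, because its single edge lies in no triangle; this is how the ECC factor discounts interactions that are poorly supported by the surrounding topology.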
Interaction detection methods have led to the discovery of thousands of interactions between proteins, and discerning relevance within large-scale data sets is important to present-day biology. Here, we introduce a spectral method derived from graph theory to uncover hidden topological structures (i.e., quasi-cliques and quasi-bipartites) in complicated protein–protein interaction networks. Our analyses suggest that these hidden topological structures consist of biologically relevant functional groups. This result motivates a new method for predicting the function of uncharacterized proteins based on the classification of known proteins within the topological structures. Using this spectral analysis method, 48 quasi-cliques and six quasi-bipartites were isolated from a network involving 11 855 interactions among 2617 proteins in budding yeast, and 76 uncharacterized proteins were assigned functions.
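The spectral idea can be sketched as follows: in eigenvectors of the adjacency matrix that belong to large positive eigenvalues, members of a dense, clique-like group receive large components (eigenvectors of the most negative eigenvalues play the analogous role for quasi-bipartites). The simplified sketch below uses only the leading eigenvector, computed by power iteration, together with a hypothetical cutoff of half the largest component; a full analysis would examine several eigenvectors.

```python
import math


def leading_eigenvector(adj, iters=500):
    """Power iteration on (A + I); for a connected graph the shift makes the
    Perron eigenvalue strictly dominant in magnitude."""
    nodes = sorted(adj)
    x = {v: 1.0 for v in nodes}
    for _ in range(iters):
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: y[v] / norm for v in nodes}
    return x


def quasi_clique_candidates(adj, fraction=0.5):
    """Nodes whose eigenvector component is at least `fraction` of the largest one."""
    x = leading_eigenvector(adj)
    top = max(x.values())
    return {v for v, value in x.items() if value >= fraction * top}


# A 4-clique (0-3) with a two-node tail 3-4-5.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
print(quasi_clique_candidates(adj))
```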
Biological network data, such as metabolic, signaling or physical interaction graphs of proteins, are increasingly available in public repositories for important species. Tools for the quantitative analysis of these networks are being developed today. Protein network-based drug target identification methods usually return protein hubs with large degrees in the networks as potentially important targets. Some known, important protein targets, however, are not hubs at all, and perturbing protein hubs in these networks may have several unwanted physiological effects due to their interactions with numerous partners. Here, we present a novel method for networks with directed edges (such as metabolic networks) that compensates for the low degree of non-hub vertices and identifies important nodes regardless of their hub properties. Our method computes the PageRank of each node and divides it by the node's in-degree (i.e., the number of incoming edges). This quotient is the same for all nodes in an undirected graph (for large- and low-degree nodes alike, that is, for hubs and non-hubs), but may differ significantly from node to node in directed graphs. We suggest assigning importance to non-hub nodes with a large PageRank/in-degree quotient. Consequently, our method gives high scores to nodes with large PageRank relative to their degree, so important non-hub nodes can easily be identified in large networks. We demonstrate that these relatively high PageRank scores have biological relevance: the method correctly finds numerous already validated drug targets in distinct organisms (Mycobacterium tuberculosis, Plasmodium falciparum and MRSA Staphylococcus aureus), and it may consequently suggest new possible protein targets as well.
Additionally, our scoring method was not chosen arbitrarily: its value for all nodes of all undirected graphs is constant; therefore its high value captures importance in the directed edge structure of the graph.
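A minimal sketch of the PageRank/in-degree score, assuming a conventional damping factor of 0.85 and uniform redistribution of rank from nodes without outgoing edges (neither setting is stated in the abstract); the graph is a toy example. The constancy of the quotient on undirected graphs is exact for the undamped random walk, whose stationary probabilities are proportional to degree; with damping it holds only approximately.

```python
def pagerank(out_edges, nodes, damping=0.85, iters=200):
    """Power-iteration PageRank; dangling nodes spread their rank uniformly."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not out_edges.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_edges.get(v, [])
            for w in targets:
                new[w] += damping * rank[v] / len(targets)
        rank = new
    return rank


def pagerank_per_indegree(out_edges, damping=0.85):
    """PageRank divided by in-degree; nodes without incoming edges are skipped."""
    nodes = sorted(set(out_edges) | {w for targets in out_edges.values() for w in targets})
    indegree = {v: 0 for v in nodes}
    for targets in out_edges.values():
        for w in targets:
            indegree[w] += 1
    rank = pagerank(out_edges, nodes, damping)
    return {v: rank[v] / indegree[v] for v in nodes if indegree[v] > 0}


# Hypothetical directed metabolic graph: edges point from substrate to product.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["D"], "D": []}
print(pagerank_per_indegree(graph))
```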
Identification of protein complexes in large interaction networks is crucial for understanding the principles of cellular organization and predicting protein functions, which is one of the most important issues in the post-genomic era. In real protein-protein interaction networks, each protein may belong to multiple protein complexes, so identifying overlapping protein complexes from protein-protein interaction networks is an important research topic.
As an effective algorithm for identifying overlapping module structures, the clique percolation method (CPM) has been widely applied to social and biological networks. However, the recognition accuracy of CPM is low. Furthermore, CPM is unsuitable for identifying meso-scale protein complexes when applied to protein-protein interaction networks. In this paper, we propose a new topological model that extends CPM's definition of a k-clique community by introducing a distance restriction, and we develop a novel algorithm based on this model, called CP-DR, for identifying protein complexes. In this algorithm, the size of a protein complex is restricted by a distance constraint to overcome the shortcomings of CPM. CP-DR is applied to the protein interaction network of Saccharomyces cerevisiae and identifies many well-known complexes.
The proposed algorithm CP-DR, based on clique percolation and distance restriction, makes it possible to identify dense subgraphs in protein interaction networks, many of which correspond to known protein complexes. CP-DR outperforms CPM.
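The sketch below implements plain clique percolation (k-cliques sharing k − 1 nodes are merged) and then applies a distance restriction as a simple post-filter that discards communities whose induced diameter exceeds a hypothetical limit. CP-DR builds the restriction into the model itself, so this filter is only a stand-in showing how a distance bound limits complex size; the brute-force clique enumeration is suitable only for toy graphs.

```python
import math
from collections import defaultdict, deque
from itertools import combinations


def k_cliques(adj, k):
    """Brute-force enumeration of k-cliques (fine for toy graphs only)."""
    for combo in combinations(sorted(adj), k):
        if all(b in adj[a] for a, b in combinations(combo, 2)):
            yield frozenset(combo)


def clique_percolation(adj, k=3):
    """Union k-cliques that share k - 1 nodes; each union is a k-clique community."""
    cliques = list(k_cliques(adj, k))
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    groups = defaultdict(set)
    for i, clique in enumerate(cliques):
        groups[find(i)] |= clique
    return [frozenset(group) for group in groups.values()]


def induced_diameter(adj, members):
    """Largest shortest-path distance inside the subgraph induced by `members`."""
    worst = 0
    for source in members:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w in members and w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        if len(dist) < len(members):
            return math.inf
        worst = max(worst, max(dist.values()))
    return worst


def cpm_with_distance_limit(adj, k=3, max_distance=2):
    """Keep only communities whose induced diameter is at most `max_distance` (hypothetical rule)."""
    return [c for c in clique_percolation(adj, k) if induced_diameter(adj, c) <= max_distance]


def graph_from_edges(edges):
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


# A strip of five triangles percolates into one 7-node community of diameter 3.
strip = graph_from_edges([(i, i + 1) for i in range(6)] + [(i, i + 2) for i in range(5)])
print(clique_percolation(strip))
print(cpm_with_distance_limit(strip))
```

Plain CPM lets the triangle strip percolate into one elongated community; the distance limit rejects it, which mirrors the motivation for restricting complex size.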
Computational identification of heme-binding residues is beneficial for predicting and designing novel heme proteins. Here we propose a novel method for heme-binding residue prediction that exploits the topological properties of these residues in residue interaction networks derived from three-dimensional structures. Comprehensive analysis showed that key residues located in heme-binding regions generally correspond to nodes with higher degree, closeness and betweenness, but lower clustering coefficient, in the network. HemeNet, a support vector machine (SVM)-based predictor, was developed to identify heme-binding residues by combining topological features with existing sequence and structural features. The results showed that incorporating network-based features significantly improved prediction performance. We also compared the residue interaction networks of heme proteins before and after heme binding and found that the topological features characterize the heme-binding sites of apo structures as well as those of holo structures, which led to a reliable performance improvement when we applied HemeNet to predicting the binding residues of proteins in the heme-free state. The HemeNet web server is freely accessible at http://mleg.cse.sc.edu/hemeNet/.
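Residue interaction networks of this kind are typically built by linking residues that lie close together in the 3D structure. The sketch below assumes one representative (Cα) coordinate per residue and a 7 Å cutoff, both common but hypothetical choices here, and computes two of the node features named above, degree and clustering coefficient; closeness and betweenness would be added in the same way before training an SVM. Residue labels and coordinates are made up.

```python
import math
from itertools import combinations


def residue_network(coords, cutoff=7.0):
    """Link residues whose representative atoms lie within `cutoff` ångströms (assumed rule)."""
    adj = {r: set() for r in coords}
    for a, b in combinations(sorted(coords), 2):
        if math.dist(coords[a], coords[b]) <= cutoff:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(sorted(adj[v]), 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# Hypothetical C-alpha coordinates (ångströms) for four residues.
coords = {
    "HIS25": (0.0, 0.0, 0.0),
    "CYS28": (3.0, 0.0, 0.0),
    "GLY30": (0.0, 3.0, 0.0),
    "LEU90": (20.0, 0.0, 0.0),
}
net = residue_network(coords)
features = {r: (len(net[r]), clustering_coefficient(net, r)) for r in net}
print(features)
```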
Here we introduce the ‘interaction generality’ measure, a new method for computationally assessing the reliability of protein–protein interactions obtained in biological experiments. This measure is essentially the number of proteins involved in a given interaction, and it adopts the idea that interactions observed in a complicated interaction network are likely to be true positives. Using a group of yeast protein–protein interactions identified in various biological experiments, we show that interactions with low generalities are more likely to be reproducible in other independent assays. We constructed more reliable networks by eliminating interactions whose generalities were above a particular threshold. The rate of interactions with common cellular roles increased from 63% in the unadjusted estimates to 79% in the refined networks. As a result, the rate of cross-talk between proteins with different cellular roles decreased, enabling clear predictions of the functions of some unknown proteins. The results suggest that the interaction generality measure will make interaction data more useful in all organisms and may yield insights into the biological roles of the proteins studied.
For understanding cellular systems and biological networks, it is important to analyze the functions and interactions of proteins and domains. Many methods for predicting protein-protein interactions have been developed. It is known that mutual information between residues at interacting sites can be higher than that at non-interacting sites. This is based on the idea that amino acid residues at interacting sites have coevolved with the corresponding residues in the partner proteins. Several studies have shown that such mutual information is useful for identifying contact residues in interacting proteins.
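For concreteness, the mutual information between two alignment columns, MI = Σ p(a, b) log[p(a, b) / (p(a) p(b))], can be computed from paired multiple sequence alignments in which each row pairs a protein with its interaction partner. The sketch below is a plain plug-in estimate without the pseudocounts or phylogenetic corrections that practical coevolution methods usually add; the toy columns are made up.

```python
import math
from collections import Counter


def column_mutual_information(col_x, col_y):
    """Mutual information (nats) between two paired alignment columns."""
    n = len(col_x)
    px, py = Counter(col_x), Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Column i of family X and column j of family Y, one row per interacting pair.
print(column_mutual_information("AABB", "CCDD"))  # perfectly covarying
print(column_mutual_information("AABB", "CDCD"))  # independent
```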
We propose novel methods using conditional random fields for predicting protein-protein interactions. We focus on the mutual information between residues and combine it with conditional random fields. In these methods, protein-protein interactions are modeled using domain-domain interactions. We perform computational experiments using protein-protein interaction datasets for several organisms and calculate AUC (area under the ROC curve) scores. The results suggest that our proposed methods, with and without mutual information, outperform the EM (Expectation Maximization) method proposed by Deng et al., which is one of the best predictors based on domain-domain interactions.
We propose novel methods using conditional random fields with and without mutual information between domains. Our methods based on domain-domain interactions are useful for predicting protein-protein interactions.
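The domain-based model behind the EM baseline can be written compactly. Assuming the usual formulation of Deng et al., two proteins interact if at least one of their domain pairs interacts, with domain pairs acting independently, so P(interaction) = 1 − Π (1 − λ_mn) over all domain pairs (m, n) drawn from the two proteins, where λ_mn is the probability that domains m and n interact. The sketch evaluates that probability for given λ values; estimating λ (by EM or by the conditional random fields proposed here) is not shown, and the domain names are placeholders.

```python
def interaction_probability(domains_i, domains_j, lam):
    """P(proteins i and j interact) = 1 - prod over domain pairs of (1 - lambda_mn)."""
    p_no_contact = 1.0
    for m in domains_i:
        for n in domains_j:
            # Unordered domain pairs; unseen pairs are assumed never to interact.
            p_no_contact *= 1.0 - lam.get(frozenset((m, n)), 0.0)
    return 1.0 - p_no_contact


lam = {frozenset(("DomA", "DomB")): 0.5, frozenset(("DomC", "DomB")): 0.5}
print(interaction_probability({"DomA"}, {"DomB"}, lam))
print(interaction_probability({"DomA", "DomC"}, {"DomB"}, lam))
```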
With the ever-increasing amount of available data on biological networks, modeling and understanding the structure of these large networks is an important problem with profound biological implications. Cellular functions and biochemical events are carried out in a coordinated manner by groups of proteins interacting with each other in biological modules. Identifying such modules in protein interaction networks is very important for understanding the structure and function of these fundamental cellular networks. Developing an effective computational method to uncover biological modules is therefore both challenging and indispensable.
The purpose of this study is to introduce a new quantitative measure, modularity density, into the field of biomolecular networks and to develop new algorithms for detecting functional modules in protein-protein interaction (PPI) networks. Specifically, we adopt simulated annealing (SA) to maximize the modularity density and evaluate its efficiency on simulated networks. To address the computational complexity of the SA procedure, we devise a spectral method for optimizing the index and apply it to a yeast PPI network.
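The abstract does not state the formula for modularity density. A frequently used definition, assumed here, is D = sum over communities c of [L(V_c, V_c) - L(V_c, V_c')] / |V_c|, where L(A, B) sums the adjacency entries A_uv over u in A and v in B, and V_c' is the complement of V_c. The sketch maximizes D with plain Metropolis simulated annealing using single-node moves and geometric cooling. The function names, the fixed number of candidate communities and the cooling schedule are illustrative choices, not the authors' settings, and the spectral method is not shown.

```python
import math
import random


def modularity_density(adj, labels):
    """D = sum_c [L(V_c, V_c) - L(V_c, ~V_c)] / |V_c| for an undirected graph.

    adj maps each node to a set of neighbours; labels maps node -> community.
    Internal edges are counted twice (once from each endpoint), matching the
    adjacency-matrix form of L(V_c, V_c); each cut edge counts once per side.
    """
    internal, external, size = {}, {}, {}
    for u, neighbours in adj.items():
        c = labels[u]
        size[c] = size.get(c, 0) + 1
        for v in neighbours:
            if labels[v] == c:
                internal[c] = internal.get(c, 0) + 1
            else:
                external[c] = external.get(c, 0) + 1
    return sum((internal.get(c, 0) - external.get(c, 0)) / size[c] for c in size)


def anneal(adj, n_communities, steps=20000, t0=1.0, cooling=0.9995, seed=0):
    """Maximise modularity density by simulated annealing (illustrative schedule)."""
    rng = random.Random(seed)
    nodes = list(adj)
    labels = {u: rng.randrange(n_communities) for u in nodes}
    current = modularity_density(adj, labels)
    best, best_labels = current, dict(labels)
    t = t0
    for _ in range(steps):
        u = rng.choice(nodes)
        old, new = labels[u], rng.randrange(n_communities)
        if new == old:
            continue
        labels[u] = new
        candidate = modularity_density(adj, labels)
        delta = candidate - current
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current = candidate
            if current > best:
                best, best_labels = current, dict(labels)
        else:
            labels[u] = old  # reject the move
        t *= cooling
    return best, best_labels
```

On two triangles joined by a single edge, the natural split gives D = 5/3 + 5/3 = 10/3, which the annealer should recover. Recomputing D from scratch at every step keeps the sketch short. A real implementation would update D incrementally, since that cost is exactly what motivates faster alternatives such as the spectral method.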
Our analysis of the modules detected by the present method suggests that most of them are biologically meaningful in the context of protein complexes. Comparisons with MCL and modularity-based methods show the efficiency of our method.
Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work and, often, sophisticated data mining and processing tools. A protein's functions, often referred to as its annotations, are believed to manifest themselves through the topology of networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins that have other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to its general biological significance, such a demonstration would further validate the data extraction and processing methods used to compile protein annotation and protein-protein interaction datasets.
We developed a method for the automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed database, we evaluated precision and recall rates and compared the performance of the automatic extraction technology with that of the manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we reported a large-scale investigation into the correspondence between communities in literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly more densely linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology. An increase in the number and size of GO groups, without any noticeable decrease in the link density within the groups, indicated that this expansion significantly broadens the public GO annotation without diluting its quality. We also showed that functional GO annotation correlates most strongly with clustering in a physical protein interaction network, while its overlap with communities in the indirect regulatory network is two to three times smaller.
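The phrase "denser than expected by chance" can be illustrated with a simple permutation test. The test compares the link density of an annotation group, defined as the fraction of its member pairs that are linked, with the densities of random node sets of the same size drawn from the network. The exact null model used in the study is not described here, so this version, along with its function names and parameters, is an assumption made for illustration.

```python
import random
from itertools import combinations


def link_density(adj, group):
    """Fraction of possible pairs within `group` that are linked in the network."""
    members = list(group)
    pairs = len(members) * (len(members) - 1) // 2
    if pairs == 0:
        return 0.0
    links = sum(1 for u, v in combinations(members, 2) if v in adj.get(u, ()))
    return links / pairs


def density_enrichment(adj, group, n_random=1000, seed=0):
    """Compare a group's link density with random node sets of the same size.

    Assumed null model: uniform random subsets of the network's nodes.
    Returns (observed density, mean random density, empirical p-value).
    """
    rng = random.Random(seed)
    nodes = list(adj)
    observed = link_density(adj, group)
    k = len(group)
    random_densities = [link_density(adj, rng.sample(nodes, k)) for _ in range(n_random)]
    hits = sum(1 for d in random_densities if d >= observed)
    p_value = (hits + 1) / (n_random + 1)  # add-one correction avoids p = 0
    return observed, sum(random_densities) / n_random, p_value
```

Group members should be restricted to proteins present in the network before testing. Otherwise, absent proteins lower the observed density artificially.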
Protein functional annotations extracted by NLP technology expand and enrich the existing GO annotation system. GO functional modularity correlates most strongly with clustering in the physical interaction network, suggesting an essential role for the structural organization maintained by these interactions. Conversely, the clustering of proteins in physical interaction networks can serve as evidence of their functional similarity.
Identification of novel cancer-causing genes is one of the main goals in cancer research. The rapid accumulation of genome-wide protein-protein interaction (PPI) data in humans has provided a new basis for studying the topological features of cancer genes in cellular networks. It is important to integrate multiple genomic data sources, including PPI networks, protein domains and Gene Ontology (GO) annotations, to facilitate the identification of cancer genes.
Topological features of the PPI network, as well as protein domain compositions, enrichment of Gene Ontology categories, and sequence and evolutionary conservation features, were extracted and compared between cancer genes and other genes. The predictive power of various classifiers for identifying cancer genes was evaluated by cross-validation. A subset of the predictions was validated experimentally using siRNA knockdown and viability assays in the human colon cancer cell line DLD-1.
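The abstract does not list the exact topological features used. Degree and the local clustering coefficient are two common choices and are computed below as an illustration, assuming an undirected PPI network stored as a symmetric adjacency dictionary. The function name is hypothetical.

```python
from itertools import combinations


def topological_features(adj):
    """Per-node degree and local clustering coefficient (illustrative features).

    adj maps each protein to the set of its interaction partners and is assumed
    to be symmetric (v in adj[u] implies u in adj[v]).
    """
    features = {}
    for u, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            clustering = 0.0  # undefined for k < 2; set to 0 by convention here
        else:
            # Count links among u's neighbours, out of k * (k - 1) / 2 possible.
            links = sum(1 for v, w in combinations(neighbours, 2) if w in adj[v])
            clustering = 2.0 * links / (k * (k - 1))
        features[u] = {"degree": k, "clustering": clustering}
    return features
```

Per-protein vectors of this kind can then be concatenated with domain and GO indicator features before classification.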
Cross-validation demonstrated the advantageous performance of classifiers based on support vector machines (SVMs) that included topological features from the PPI network, protein domain compositions and GO annotations. We then applied the trained SVM classifier to human genes to prioritize putative cancer genes. siRNA knockdown of several SVM-predicted cancer genes greatly reduced cell viability in the human colon cancer cell line DLD-1.
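The sketch below illustrates how k-fold cross-validation of an SVM can be carried out. It uses a linear SVM trained with the Pegasos stochastic sub-gradient method, which has been substituted here to keep the example dependency-free. The kernel, solver, feature scaling and parameters used in the study are not described in the abstract, so all names and settings below are assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos (stochastic sub-gradient on the hinge loss).

    X: list of feature vectors; y: labels in {-1, +1}. A constant 1.0 feature is
    appended so the bias is learned (and regularised) together with the weights.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss active: step towards the example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def decision(w, x):
    """Signed score of feature vector x under weights w (bias as last weight)."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def cross_validate(X, y, k=5, seed=0):
    """Accuracy of the linear SVM above under k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train], seed=seed)
        for i in fold:
            pred = 1 if decision(w, X[i]) >= 0 else -1
            correct += pred == y[i]
    return correct / len(X)
```

In practice, features should be standardized within each training fold. A library SVM with a tuned kernel would replace this hand-rolled solver.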
Topological features of PPI networks, protein domain compositions and GO annotations are good predictors of cancer genes. The SVM classifier integrates multiple features and as such is useful for prioritizing candidate cancer genes for experimental validation.
The coordinated and dynamic modulation or interaction of genes or proteins is an important mechanism of functional regulation in the cell. Recent studies have shown that many transcriptional networks exhibit a scale-free topology and a hierarchical modular architecture. It has also been shown that transcriptional networks or pathways are dynamic and behave only in specific, controlled ways in response to disease development, changing cellular conditions, and different environmental factors. Moreover, evolutionarily conserved and divergent transcriptional modules underlie the fundamental and species-specific molecular mechanisms that control disease development or cellular phenotypes. Various computational algorithms have been developed to explore transcriptional networks and modules from gene expression data. In silico studies have also been performed to mimic the dynamic behavior of regulatory networks, analyzing how disease or cellular phenotypes arise from the connectivity or networks of genes and their products. Here, we review recent developments in computational biology research on deciphering the modular and dynamic behaviors of transcriptional networks, highlighting important findings. We also demonstrate how these computational algorithms can be applied in systems biology studies of disease, stem cells, and drug discovery.
Systems biology; Coexpression; Transcriptional module; Pathway dynamics; Transcriptional intervention; ModulePro; PathwayPro
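As a minimal illustration of deriving transcriptional modules from gene expression data, the sketch below links genes whose expression profiles have an absolute Pearson correlation at or above a threshold. It then treats connected components of the resulting coexpression network as modules. This is a deliberately simple stand-in, and it does not reproduce ModulePro or PathwayPro. The threshold, the use of absolute correlation and the function names are assumptions.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def coexpression_network(expr, threshold=0.8):
    """Link genes whose profiles have |r| >= threshold (assumed cut-off)."""
    adj = {g: set() for g in expr}
    for g, h in combinations(expr, 2):
        if abs(pearson(expr[g], expr[h])) >= threshold:
            adj[g].add(h)
            adj[h].add(g)
    return adj


def connected_components(adj):
    """Treat each connected component of the network as one module."""
    seen, modules = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, module = [start], set()
        seen.add(start)
        while stack:
            u = stack.pop()
            module.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        modules.append(module)
    return modules
```

Using the absolute correlation groups anti-correlated genes into the same module. Whether that is desirable depends on the biological question, and signed or soft-thresholded networks are common alternatives.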