Related Articles

1.  Structural and functional protein network analyses predict novel signaling functions for rhodopsin 
Proteomic analyses, literature mining, and structural data were combined to generate an extensive signaling network linked to the visual G protein-coupled receptor rhodopsin. Network analysis suggests novel signaling routes to cytoskeleton dynamics and vesicular trafficking.
Using a shotgun proteomic approach, we identified the protein inventory of the light sensing outer segment of the mammalian photoreceptor. These data, combined with literature mining, structural modeling, and computational analysis, offer a comprehensive view of signal transduction downstream of the visual G protein-coupled receptor rhodopsin. The network suggests novel signaling branches downstream of rhodopsin to cytoskeleton dynamics and vesicular trafficking. The network serves as a basis for elucidating physiological principles of photoreceptor function and suggests potential disease-associated proteins.
Photoreceptor cells are neurons capable of converting light into electrical signals. The rod outer segment (ROS) region of the photoreceptor cells is a cellular structure made of a stack of around 800 closed membrane disks loaded with rhodopsin (Liang et al, 2003; Nickell et al, 2007). In disc membranes, rhodopsin arranges itself into paracrystalline dimer arrays, enabling optimal association with the heterotrimeric G protein transducin as well as additional regulatory components (Ciarkowski et al, 2005). Disruption of these highly regulated structures and processes by germline mutations is the cause of severe blinding diseases such as retinitis pigmentosa, macular degeneration, or congenital stationary night blindness (Berger et al, 2010).
Traditionally, signal transduction networks have been studied by combining biochemical and genetic experiments addressing the relations among a small number of components. More recently, high-throughput experiments using techniques such as two-hybrid screens or co-immunoprecipitation coupled to mass spectrometry have added a new level of complexity (Ito et al, 2001; Gavin et al, 2002, 2006; Ho et al, 2002; Rual et al, 2005; Stelzl et al, 2005). However, these studies do not take into account space, time, or the fact that many of the interactions detected for a particular protein are mutually incompatible. Structural information can help discriminate between direct and indirect interactions and, more importantly, can determine whether two or more predicted partners of a given protein or complex can bind a target simultaneously or instead compete for the same interaction surface (Kim et al, 2006).
In this work, we build a functional and dynamic interaction network centered on rhodopsin on a systems level, using six steps: In step 1, we experimentally identified the proteomic inventory of the porcine ROS, and we compared our data set with a recent proteomic study from bovine ROS (Kwok et al, 2008). The union of the two data sets was defined as the ‘initial experimental ROS proteome’. After removal of contaminants and application of filtering methods, a ‘core ROS proteome’, consisting of 355 proteins, was defined.
In step 2, proteins of the core ROS proteome were assigned to six functional modules: (1) vision, signaling, transporters, and channels; (2) outer segment structure and morphogenesis; (3) housekeeping; (4) cytoskeleton and polarity; (5) vesicle formation and trafficking; and (6) metabolism.
In step 3, a protein-protein interaction network was constructed based on literature mining. Since the experimental evidence for most of the interactions was co-immunoprecipitation or pull-down experiments, and since many of the edges in the network are supported by a single piece of experimental evidence, often derived from high-throughput approaches, we refer to this network as the ‘fuzzy ROS interactome’. Structural information was used to predict binary interactions, based on the finding that similar domain pairs are likely to interact in a similar way (‘nature repeats itself’) (Aloy and Russell, 2002). To increase the confidence in the resulting network, edges supported by a single piece of evidence not coming from yeast two-hybrid experiments were removed, the exception being interactions where the evidence was the existence of a three-dimensional structure of the complex itself, or of a highly homologous complex. This curated static network (‘high-confidence ROS interactome’) comprises 660 edges linking the majority of the nodes. By considering only edges supported by at least one piece of evidence of direct binary interaction, we end up with a ‘high-confidence binary ROS interactome’. We next extended the published core pathway (Dell'Orco et al, 2009) using evidence from our high-confidence network. We find several new direct binary links to different cellular functional processes (Figure 4): the active rhodopsin interacts with Rac1 and the GTP form of Rho. There is also a connection between active rhodopsin and Arf4, as well as between PDEδ and both Rab13 and the GTP-bound form of Arl3, which links the vision cycle to vesicle trafficking and structure. We see a connection between PDEδ and prenyl-modified proteins, such as several small GTPases, as well as rhodopsin kinase. Further, our network reveals several direct binary connections between Ca2+-regulated proteins and cytoskeleton proteins; these are CaMK2A with actinin, calmodulin with GAP43 and S1008, and PKC with 14-3-3 family members.
In step 4, part of the network was experimentally validated using three different approaches to identify physical protein associations that would occur under physiological conditions: (i) co-segregation/co-sedimentation experiments, (ii) immunoprecipitations combined with mass spectrometry and/or subsequent immunoblotting, and (iii) use of the glycosylated N-terminus of rhodopsin to isolate its associated protein partners by Concanavalin A affinity purification. In total, 60 co-purification and co-elution experiments supported interactions that were already in our literature network, and new evidence from 175 co-IP experiments in this work was added. Next, we aimed to provide additional independent experimental confirmation for two of the novel networks and functional links proposed based on the network analysis: (i) The proposed complex between Rac1, RhoA, CRMP-2, tubulin, and ROCK II in ROS was investigated by culturing retinal explants in the presence of a ROCK II-specific inhibitor (Figure 6). While the morphology of the retinas treated with ROCK II inhibitor appeared normal, immunohistochemistry analyses revealed several alterations on the protein level. (ii) We supported the hypothesis that PDEδ could function as a GDI for Rac1 in ROS by demonstrating that PDEδ and Rac1 colocalize in ROS and that PDEδ could dissociate Rac1 from ROS membranes in vitro.
In step 5, we use structural information to distinguish between mutually compatible (‘AND’) and mutually exclusive (‘XOR’) interactions. This enables breaking a network of nodes and edges into functional machines or sub-networks/modules. In the vision branch, both ‘AND’ and ‘XOR’ gates synergize. This may allow dynamic tuning of light and dark states. However, all connections from the vision module to other modules are ‘XOR’ connections, suggesting that competition, in connection with local protein concentration changes, could be important for transmitting signals from the core vision module.
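The AND/XOR distinction in step 5 can be illustrated with a minimal Python sketch. It assumes, as a simplification that is not taken from the paper, that each partner's binding footprint on a hub protein is available as a set of residue numbers and that any shared residue means the two partners compete. The function name, partner names, and residue numbers are all hypothetical; a real analysis would test superimposed three-dimensional complexes for steric clashes.

```python
from itertools import combinations


def classify_partner_pairs(interfaces):
    """Label each pair of partners of one hub protein as 'AND' or 'XOR'.

    interfaces: dict mapping partner name -> set of hub residue numbers
    contacted by that partner (hypothetical input format).
    Overlapping footprints are taken to mean mutually exclusive binding.
    """
    labels = {}
    for a, b in combinations(sorted(interfaces), 2):
        shared = interfaces[a] & interfaces[b]
        labels[(a, b)] = "XOR" if shared else "AND"
    return labels


# Toy footprints on a hypothetical hub; the numbers are illustrative only.
toy_interfaces = {
    "partner_A": {10, 11, 12, 40},
    "partner_B": {12, 13, 14},
    "partner_C": {200, 201, 202},
}
toy_labels = classify_partner_pairs(toy_interfaces)
```

In this toy input, partner_A and partner_B share residue 12 and are labeled XOR, while partner_C binds a separate surface and is AND-compatible with both.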
In the last step, we map and functionally characterize the known mutations that produce blindness.
In summary, this represents the first comprehensive, dynamic, and integrative rhodopsin signaling network, which can be the basis for integrating and mapping newly discovered disease mutants, to guide protein or signaling branch-specific therapies.
Orchestration of signaling, photoreceptor structural integrity, and maintenance needed for mammalian vision remain enigmatic. By integrating three proteomic data sets, literature mining, computational analyses, and structural information, we have generated a multiscale signal transduction network linked to the visual G protein-coupled receptor (GPCR) rhodopsin, the major protein component of rod outer segments. This network was complemented by domain decomposition of protein–protein interactions and then qualified for mutually exclusive or mutually compatible interactions and ternary complex formation using structural data. The resulting information not only offers a comprehensive view of signal transduction induced by this GPCR but also suggests novel signaling routes to cytoskeleton dynamics and vesicular trafficking, predicting an important level of regulation through small GTPases. Further, it demonstrates a specific disease susceptibility of the core visual pathway due to the uniqueness of its components present mainly in the eye. As a comprehensive multiscale network, it can serve as a basis to elucidate the physiological principles of photoreceptor function, identify potential disease-associated genes and proteins, and guide the development of therapies that target specific branches of the signaling pathway.
PMCID: PMC3261702  PMID: 22108793
protein interaction network; rhodopsin signaling; structural modeling
2.  Information Flow Analysis of Interactome Networks 
PLoS Computational Biology  2009;5(4):e1000350.
Recent studies of cellular networks have revealed modular organizations of genes and proteins. For example, in interactome networks, a module refers to a group of interacting proteins that form molecular complexes and/or biochemical pathways and together mediate a biological process. However, it is still poorly understood how biological information is transmitted between different modules. We have developed information flow analysis, a new computational approach that identifies proteins central to the transmission of biological information throughout the network. In the information flow analysis, we represent an interactome network as an electrical circuit, where interactions are modeled as resistors and proteins as interconnecting junctions. Construing the propagation of biological signals as flow of electrical current, our method calculates an information flow score for every protein. Unlike previous metrics of network centrality such as degree or betweenness that only consider topological features, our approach incorporates confidence scores of protein–protein interactions and automatically considers all possible paths in a network when evaluating the importance of each protein. We apply our method to the interactome networks of Saccharomyces cerevisiae and Caenorhabditis elegans. We find that the likelihood of observing lethality and pleiotropy when a protein is eliminated is positively correlated with the protein's information flow score. Even among proteins of low degree or low betweenness, high information scores serve as a strong predictor of loss-of-function lethality or pleiotropy. The correlation between information flow scores and phenotypes supports our hypothesis that the proteins of high information flow reside in central positions in interactome networks. We also show that the ranks of information flow scores are more consistent than that of betweenness when a large amount of noisy data is added to an interactome. 
Finally, we combine gene expression data with interaction data in C. elegans and construct an interactome network for muscle-specific genes. We find that genes that rank high in terms of information flow in the muscle interactome network but not in the entire network tend to play important roles in muscle function. This framework for studying tissue-specific networks by the information flow model can be applied to other tissues and other organisms as well.
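The circuit analogy described above can be sketched in a few lines of Python. This is a simplified illustration and not the authors' implementation: it assumes that each interaction's confidence score is used directly as a conductance, that a unit current is injected for every pair of proteins in turn, and that a protein's score is the total current passing through it summed over all pairs. The function names and the three-node example are hypothetical, and the network is assumed to be connected.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def information_flow(nodes, edges):
    """Current-flow score per node; edges maps (u, v) -> conductance."""
    score = {n: 0.0 for n in nodes}
    for s, t in combinations(nodes, 2):
        free = [n for n in nodes if n != t]  # sink t is grounded at 0 V
        idx = {n: i for i, n in enumerate(free)}
        lap = [[0.0] * len(free) for _ in free]
        for (u, v), g in edges.items():
            for x, y in ((u, v), (v, u)):
                if x in idx:
                    lap[idx[x]][idx[x]] += g
                    if y in idx:
                        lap[idx[x]][idx[y]] -= g
        rhs = [1.0 if n == s else 0.0 for n in free]  # unit current in at s
        sol = solve_linear(lap, rhs)
        volt = {n: sol[idx[n]] for n in free}
        volt[t] = 0.0
        through = {n: 0.0 for n in nodes}
        for (u, v), g in edges.items():
            current = abs(g * (volt[u] - volt[v]))  # Ohm's law on each edge
            through[u] += current
            through[v] += current
        for n in nodes:
            external = 1.0 if n in (s, t) else 0.0  # injected or drained current
            score[n] += 0.5 * (through[n] + external)
    return score


# Hypothetical linear chain A - B - C with equal confidence on both edges.
toy_scores = information_flow(["A", "B", "C"], {("A", "B"): 1.0, ("B", "C"): 1.0})
```

Because every path between A and C runs through B, B receives the highest score in this chain. Lowering the conductance of an edge in a network with alternative routes would shift current, and therefore score, toward the more reliable routes.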
Author Summary
Protein–protein interactions mediate numerous biological processes. In the last decade, there have been efforts to comprehensively map protein–protein interactions occurring in an organism. The interaction data generated from these high-throughput projects can be represented as interconnected networks. It has been found that knockouts of proteins residing in topologically central positions in the networks more likely result in lethality of the organism than knockouts of peripheral proteins. However, it is difficult to accurately define topologically central proteins because high-throughput data is error-prone and some interactions are not as reliable as others. In addition, the architecture of interaction networks varies in different tissues for multi-cellular organisms. To this end, we present a novel computational approach to identify central proteins while considering the confidence of data and gene expression in tissues. Moreover, our approach takes into account multiple alternative paths in interaction networks. We apply our method to yeast and nematode interaction networks. We find that the likelihood of observing lethality and pleiotropy when a given protein is eliminated correlates better with our centrality score for that protein than with its scores based on traditional centrality metrics. Finally, we set up a framework to identify central proteins in tissue-specific interaction networks.
PMCID: PMC2685719  PMID: 19503817
3.  Human Cancer Protein-Protein Interaction Network: A Structural Perspective 
PLoS Computational Biology  2009;5(12):e1000601.
Protein-protein interaction networks provide a global picture of cellular function and biological processes. Some proteins act as hub proteins, highly connected to others, whereas some others have few interactions. The dysfunction of some interactions causes many diseases, including cancer. Proteins interact through their interfaces. Therefore, studying the interface properties of cancer-related proteins will help explain their role in the interaction networks. Similar or overlapping binding sites should be used repeatedly in single interface hub proteins, making them promiscuous. Alternatively, multi-interface hub proteins make use of several distinct binding sites to bind to different partners. We propose a methodology to integrate protein interfaces into cancer interaction networks (ciSPIN, cancer structural protein interface network). The interactions in the human protein interaction network are replaced by interfaces, coming from either known or predicted complexes. We provide a detailed analysis of cancer related human protein-protein interfaces and the topological properties of the cancer network. The results reveal that cancer-related proteins have smaller, more planar, more charged and less hydrophobic binding sites than non-cancer proteins, which may indicate low affinity and high specificity of the cancer-related interactions. We also classified the genes in ciSPIN according to phenotypes. Within phenotypes, for breast cancer, colorectal cancer and leukemia, interface properties were found to be discriminating from non-cancer interfaces with an accuracy of 71%, 67%, 61%, respectively. In addition, cancer-related proteins tend to interact with their partners through distinct interfaces, corresponding mostly to multi-interface hubs, which comprise 56% of cancer-related proteins, and constituting the nodes with higher essentiality in the network (76%). 
We illustrate the interface-related affinity properties of two cancer-related hub proteins: Erbb3, a multi-interface hub, and Raf1, a single-interface hub. The results reveal that the affinity of interactions of the multi-interface hub tends to be higher than that of the single-interface hub. These findings might be important in obtaining new targets in cancer as well as in finding the details of specific binding regions of putative cancer drug candidates.
Author Summary
Protein-protein interaction networks provide a global picture of cellular function and biological processes. The dysfunction of some interactions causes many diseases, including cancer. Proteins interact through their interfaces. Therefore, studying the interface properties of cancer-related proteins will help explain their role in the interaction networks. The structural details of interfaces are immensely useful in efforts to answer some fundamental questions such as: (i) what features of cancer-related protein interfaces make them act as hubs; (ii) how hub protein interfaces can interact with tens of other proteins with varying affinities; and (iii) which interactions can occur simultaneously and which are mutually exclusive. Addressing these questions, we propose a method to characterize interactions in a human protein-protein interaction network using three-dimensional protein structures and interfaces. Protein interface analysis shows that the strength and specificity of the interactions of hub proteins and cancer proteins are different than the interactions of non-hub and non-cancer proteins, respectively. In addition, distinguishing overlapping from non-overlapping interfaces, we illustrate how a fourth dimension, that of the sequence of processes, is integrated into the network with case studies. We believe that such an approach should be useful in structural systems biology.
PMCID: PMC2785480  PMID: 20011507
4.  Dynamic Changes in Protein Functional Linkage Networks Revealed by Integration with Gene Expression Data 
PLoS Computational Biology  2008;4(11):e1000237.
Response of cells to changing environmental conditions is governed by the dynamics of intricate biomolecular interactions. It may be reasonable to assume, proteins being the dominant macromolecules that carry out routine cellular functions, that understanding the dynamics of protein∶protein interactions might yield useful insights into the cellular responses. The large-scale protein interaction data sets are, however, unable to capture the changes in the profile of protein∶protein interactions. In order to understand how these interactions change dynamically, we have constructed conditional protein linkages for Escherichia coli by integrating functional linkages and gene expression information. As a case study, we have chosen to analyze UV exposure in wild-type and SOS deficient E. coli at 20 minutes post irradiation. The conditional networks exhibit similar topological properties. Although the global topological properties of the networks are similar, many subtle local changes are observed, which are suggestive of the cellular response to the perturbations. Some such changes correspond to differences in the path lengths among the nodes of carbohydrate metabolism correlating with its loss in efficiency in the UV treated cells. Similarly, expression of hubs under unique conditions reflects the importance of these genes. Various centrality measures applied to the networks indicate increased importance for replication, repair, and other stress proteins for the cells under UV treatment, as anticipated. We thus propose a novel approach for studying an organism at the systems level by integrating genome-wide functional linkages and the gene expression data.
Author Summary
Many cellular processes and the response of cells to environmental cues are determined by the intricate protein∶protein interactions. These cellular protein interactions can be represented in the form of a graph, where the nodes represent the proteins and the edges signify the interactions between them. However, the available protein functional linkage maps do not incorporate the dynamics of gene expression and thus do not portray the dynamics of true protein∶protein interactions in vivo. We have used gene expression data as well as the available protein functional interaction information for Escherichia coli to build the protein interaction networks for expressed genes in a given condition. These networks, named conditional networks, capture the differences in the protein interaction networks and hence the cell physiology. Thus, by exploring the dynamics of protein interaction profiles, we hope to understand the response of cells to environmental changes.
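The idea of a conditional network can be sketched as a simple filter. The Python snippet below assumes, for illustration only, that a functional linkage is kept when both of its genes exceed an expression threshold in the condition of interest; the function name, gene names, expression values, and threshold are hypothetical, and the paper's actual integration procedure may differ.

```python
def conditional_network(linkages, expression, threshold):
    """Keep only linkages whose two genes are both expressed.

    linkages: iterable of (gene_a, gene_b) functional linkages.
    expression: dict gene -> expression level in one condition.
    threshold: minimum level for a gene to count as expressed (assumed).
    """
    expressed = {g for g, level in expression.items() if level >= threshold}
    return {(a, b) for a, b in linkages if a in expressed and b in expressed}


# Hypothetical linkage map and expression levels for two conditions.
toy_linkages = [("gene1", "gene2"), ("gene2", "gene3"), ("gene3", "gene4")]
untreated = {"gene1": 5.0, "gene2": 6.0, "gene3": 0.2, "gene4": 4.0}
uv_treated = {"gene1": 5.0, "gene2": 6.0, "gene3": 7.5, "gene4": 0.1}

net_untreated = conditional_network(toy_linkages, untreated, threshold=1.0)
net_uv = conditional_network(toy_linkages, uv_treated, threshold=1.0)
```

Comparing the two resulting edge sets shows which linkages appear or disappear between conditions, which is the kind of local change the conditional networks are meant to expose.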
PMCID: PMC2580820  PMID: 19043542
5.  Detecting Network Communities: An Application to Phylogenetic Analysis 
PLoS Computational Biology  2011;7(5):e1001131.
This paper proposes a new method to identify communities in generally weighted complex networks and applies it to phylogenetic analysis. In this case, weights correspond to the similarity indexes among protein sequences, which can be used for network construction so that the network structure can be analyzed to recover phylogenetically useful information from its properties. The analyses discussed here are mainly based on the modular character of protein similarity networks, explored through the Newman-Girvan algorithm, with the help of the neighborhood matrix. The most relevant networks are found when the network topology changes abruptly, revealing distinct modules related to the sets of organisms to which the proteins belong. Sound biological information can be retrieved by the computational routines used in the network approach, without using biological assumptions other than those incorporated by BLAST. Usually, all the main bacterial phyla and, in some cases, also some bacterial classes corresponded totally (100%) or to a great extent (>70%) to the modules. We checked for internal consistency in the obtained results, and we scored close to 84% of matches for community pertinence when comparisons between the results were performed. To illustrate how to use the network-based method, we employed data for enzymes involved in the chitin metabolic pathway that are present in more than 100 organisms from an original data set containing 1,695 organisms, downloaded from GenBank on May 19, 2007. A preliminary comparison between the outcomes of the network-based method and the results of methods based on Bayesian, distance, likelihood, and parsimony criteria suggests that the former is as reliable as these commonly used methods. We conclude that the network-based method can be used as a powerful tool for retrieving modularity information from weighted networks, which is useful for phylogenetic analysis.
Author Summary
Complex weighted networks have been applied to uncover organizing principles of complex biological, technological, and social systems. We propose herein a new method to identify communities in such structures and apply it to phylogenetic analysis. Recent studies using this theory in genomics and proteomics contributed to the understanding of the structure and dynamics of cellular complex interaction webs. Three main distinct molecular networks have been investigated based on transcriptional and metabolic activity, and on protein interaction. Here we consider the evolutionary relationship between proteins throughout phylogeny, employing the complex network approach to perform a comparative study of the enzymes related to the chitin metabolic pathway. We show how the similarity index of protein sequences can be used for network construction, and how the underlying structure is analyzed by the computational routines of our method to recover useful and sound information for phylogenetic studies. By focusing on the modular character of protein similarity networks, we were successful in matching the identified networks modules to main bacterial phyla, and even some bacterial classes. The network-based method reported here can be used as a new powerful tool for identifying communities in complex networks, retrieving useful information for phylogenetic studies.
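How modules emerge as a similarity threshold changes can be illustrated with a short Python sketch. Note the substitution: instead of the Newman-Girvan algorithm and neighborhood matrix used in the paper, this sketch simply keeps the edges at or above a threshold and reports connected components, which is a much cruder stand-in. The function names, sequence labels, and similarity values are hypothetical.

```python
def components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda g: sorted(g))


def modules_at(similarity, nodes, threshold):
    """Modules of the network that keeps only edges with similarity >= threshold."""
    kept = [pair for pair, s in similarity.items() if s >= threshold]
    return components(nodes, kept)


# Hypothetical similarity indexes between four protein sequences.
toy_nodes = ["a1", "a2", "b1", "b2"]
toy_similarity = {("a1", "a2"): 0.90, ("b1", "b2"): 0.85, ("a2", "b1"): 0.40}

loose = modules_at(toy_similarity, toy_nodes, threshold=0.30)
strict = modules_at(toy_similarity, toy_nodes, threshold=0.50)
```

Raising the threshold from 0.30 to 0.50 removes the single weak link and splits one module into two, a small-scale version of the abrupt topology change that the abstract associates with the most informative networks.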
PMCID: PMC3088654  PMID: 21573202
6.  The topology of the bacterial co-conserved protein network and its implications for predicting protein function 
BMC Genomics  2008;9:313.
Protein-protein interactions networks are most often generated from physical protein-protein interaction data. Co-conservation, also known as phylogenetic profiles, is an alternative source of information for generating protein interaction networks. Co-conservation methods generate interaction networks among proteins that are gained or lost together through evolution. Co-conservation is a particularly useful technique in the compact bacteria genomes. Prior studies in yeast suggest that the topology of protein-protein interaction networks generated from physical interaction assays can offer important insight into protein function. Here, we hypothesize that in bacteria, the topology of protein interaction networks derived via co-conservation information could similarly improve methods for predicting protein function. Since the topology of bacteria co-conservation protein-protein interaction networks has not previously been studied in depth, we first perform such an analysis for co-conservation networks in E. coli K12. Next, we demonstrate one way in which network connectivity measures and global and local function distribution can be exploited to predict protein function for previously uncharacterized proteins.
Our results showed that, like most biological networks, our bacterial co-conserved protein-protein interaction networks had scale-free topologies. Our results indicated that some properties of the physical yeast interaction network hold in our bacterial co-conservation networks, such as high connectivity for essential proteins. However, the high connectivity among protein complexes in the yeast physical network was not seen in the co-conservation network, which uses all bacteria as the reference set. We found that the distribution of node connectivity varied by functional category and could be informative for function prediction. By integrating functional information from different annotation sources and using the network topology, we were able to infer function for uncharacterized proteins.
Interactions networks based on co-conservation can contain information distinct from networks based on physical or other interaction types. Our study has shown co-conservation based networks to exhibit a scale free topology, as expected for biological networks. We also revealed ways that connectivity in our networks can be informative for the functional characterization of proteins.
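A co-conservation (phylogenetic profile) network can be sketched as follows. The Python snippet assumes, for illustration, that each protein has a presence/absence vector across a fixed list of reference genomes and that two proteins are linked when the Jaccard similarity of their profiles reaches a cutoff. The similarity measure, cutoff, function names, and toy profiles are assumptions and are not taken from the study.

```python
def jaccard(p, q):
    """Jaccard similarity of two equal-length presence/absence profiles."""
    both = sum(1 for x, y in zip(p, q) if x and y)
    either = sum(1 for x, y in zip(p, q) if x or y)
    return both / either if either else 0.0


def co_conservation_edges(profiles, cutoff):
    """Link proteins whose profiles across genomes are similar enough."""
    names = sorted(profiles)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            s = jaccard(profiles[a], profiles[b])
            if s >= cutoff:
                edges.append((a, b, s))
    return edges


# Hypothetical profiles over five reference genomes (1 = homolog present).
toy_profiles = {
    "protX": [1, 1, 0, 1, 0],
    "protY": [1, 1, 0, 1, 0],
    "protZ": [0, 0, 1, 0, 1],
}
toy_edges = co_conservation_edges(toy_profiles, cutoff=0.8)
```

Here protX and protY are gained and lost together and are linked, whereas protZ has a complementary profile and stays unconnected. Node degree in such a network is the connectivity measure that the abstract relates to essentiality and functional category.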
PMCID: PMC2488357  PMID: 18590549
7.  HepatoNet1: a comprehensive metabolic reconstruction of the human hepatocyte for the analysis of liver physiology 
We present HepatoNet1, a manually curated large-scale metabolic network of the human hepatocyte that encompasses >2500 reactions in six intracellular and two extracellular compartments. Using constraint-based modeling techniques, the network has been validated to replicate numerous metabolic functions of hepatocytes corresponding to a reference set of diverse physiological liver functions. Taking the detoxification of ammonia and the formation of bile acids as examples, we show how these liver-specific metabolic objectives can be achieved by the variable interplay of various metabolic pathways under varying conditions of nutrients and oxygen availability.
The liver has a pivotal function in metabolic homeostasis of the human body. Hepatocytes are the principal site of the metabolic conversions that underlie diverse physiological functions of the liver. These functions include provision and homeostasis of carbohydrates, amino acids, lipids and lipoproteins in the systemic blood circulation, biotransformation, plasma protein synthesis and bile formation, to name a few. Accordingly, hepatocyte metabolism integrates a vast array of differentially regulated biochemical activities and is highly responsive to environmental perturbations such as changes in portal blood composition (Dardevet et al, 2006). The complexity of this metabolic network and the numerous physiological functions to be achieved within a highly variable physiological environment necessitate an integrated approach with the aim of understanding liver metabolism at a systems level. To this end, we present HepatoNet1, a stoichiometric network of human hepatocyte metabolism characterized by (i) comprehensive coverage of known biochemical activities of hepatocytes and (ii) due representation of the biochemical and physiological functions of hepatocytes as functional network states. The network comprises 777 metabolites in six intracellular (cytosol, endoplasmic reticulum and Golgi apparatus, lysosome, mitochondria, nucleus, and peroxisome) and two extracellular compartments (bile canaliculus and sinusoidal space) and 2539 reactions, including 1466 transport reactions. It is based on the manual evaluation of >1500 original scientific research publications to warrant a high-quality evidence-based model. The final network is the result of an iterative process of data compilation and rigorous computational testing of network functionality by means of constraint-based modeling techniques. We performed flux-balance analyses to validate whether for >300 different metabolic objectives a non-zero stationary flux distribution could be established in the network. 
Figure 1 shows one such functional flux mode associated with the synthesis of the bile acid glycochenodeoxycholate, one important hepatocyte-specific physiological liver function. Besides those pathways directly linked to the synthesis of the bile acid, the mevalonate pathway and the de novo synthesis of cholesterol, the flux mode comprises additional pathways such as gluconeogenesis, the pentose phosphate pathway or the ornithine cycle because the calculations were routinely performed on a minimal set of exchangeable metabolites, that is all reactants were forced to be balanced and all exportable intermediates had to be catabolized into non-degradable end products. This example shows how HepatoNet1 under the challenges of limited exchange across the network boundary can reveal numerous cross-links between metabolic pathways traditionally perceived as separate entities. For example, alanine is used as gluconeogenic substrate to form glucose-6-phosphate, which is used in the pentose phosphate pathway to generate NADPH. The glycine moiety for bile acid conjugation is derived from serine. Conversion of ammonia into non-toxic nitrogen compounds is one central homeostatic function of hepatocytes. Using the HepatoNet1 model, we investigated, as another example of a complex metabolic objective dependent on systemic physiological parameters, how the consumption of oxygen, glucose and palmitate is affected when an external nitrogen load is converted in varying proportions to the non-toxic nitrogen compounds: urea, glutamine and alanine. The results reveal strong dependencies between the available level of oxygen and the substrate demand of hepatocytes required for effective ammonia detoxification by the liver.
Oxygen demand is highest if nitrogen is exclusively transformed into urea. At lower fluxes into urea, an intriguing pattern for oxygen demand is predicted: oxygen demand attains a minimum if the nitrogen load is directed to urea, glutamine and alanine with relative fluxes of 0.17, 0.43 and 0.40, respectively (Figure 2A). Oxygen demand in this flux distribution is four times lower than for the maximum (100% urea) and still 77 and 33% lower than using alanine and glutamine as exclusive nitrogen compounds, respectively. This computationally predicted tendency is consistent with the notion that the zonation of ammonia detoxification, that is the preferential conversion of ammonia to urea in periportal hepatocytes and to glutamine in perivenous hepatocytes, is dictated by the availability of oxygen (Gebhardt, 1992; Jungermann and Kietzmann, 2000). The decreased oxygen demand in flux distributions using higher proportions of glutamine or alanine is accompanied by increased uptake of the substrates glucose and palmitate (Figure 2B). This is due to an increased demand of energy and carbon for the amidation and transamination of glutamate and pyruvate to discharge nitrogen in the form of glutamine and alanine, respectively. In terms of both scope and specificity, our model bridges the scale between models constructed specifically to examine distinct metabolic processes of the liver and modeling based on a global representation of human metabolism. The former include models for the interdependence of gluconeogenesis and fatty-acid catabolism (Chalhoub et al, 2007), impairment of glucose production in von Gierke's and Hers' diseases (Beard and Qian, 2005) and other processes (Calik and Akbay, 2000; Stucki and Urbanczik, 2005; Ohno et al, 2008). The hallmark of these models is that each of them focuses on a small number of reactions pertinent to the metabolic function of interest embedded in a customized representation of the principal pathways of central metabolism. 
HepatoNet1 currently outperforms liver-specific models that were computationally predicted (Shlomi et al, 2008) on the basis of global reconstructions of human metabolism (Duarte et al, 2007; Ma and Goryanin, 2008). In contrast to either of the aforementioned modeling scales, HepatoNet1 combines a system-scale representation of metabolic activities with a representation of the cell type-specific physical boundaries and their specific transport capacities. This allows for a highly versatile use of the model for the analysis of various liver-specific physiological functions. Conceptually, from a biological system perspective, this type of model offers a large degree of comprehensiveness while retaining tissue specificity, a fundamental design principle of mammalian metabolism. HepatoNet1 is expected to provide a structural platform for computational studies on liver function. The results presented herein highlight how internal fluxes of hepatocyte metabolism and the interplay with systemic physiological parameters can be analyzed with constraint-based modeling techniques. At the same time, the framework may serve as a scaffold for complementation of kinetic and regulatory properties of enzymes and transporters for analysis of sub-networks with topological or kinetic modeling methods.
We present HepatoNet1, the first reconstruction of a comprehensive metabolic network of the human hepatocyte that is shown to accomplish a large canon of known metabolic liver functions. The network comprises 777 metabolites in six intracellular and two extracellular compartments and 2539 reactions, including 1466 transport reactions. It is based on the manual evaluation of >1500 original scientific research publications to warrant a high-quality evidence-based model. The final network is the result of an iterative process of data compilation and rigorous computational testing of network functionality by means of constraint-based modeling techniques. Taking the hepatic detoxification of ammonia as an example, we show how the availability of nutrients and oxygen may modulate the interplay of various metabolic pathways to allow an efficient response of the liver to perturbations of the homeostasis of blood compounds.
PMCID: PMC2964118  PMID: 20823849
computational biology; flux balance; liver; minimal flux
8.  Linkers of Cell Polarity and Cell Cycle Regulation in the Fission Yeast Protein Interaction Network 
PLoS Computational Biology  2012;8(10):e1002732.
The study of gene and protein interaction networks has improved our understanding of the multiple, systemic levels of regulation found in eukaryotic and prokaryotic organisms. Here we carry out a large-scale analysis of the protein-protein interaction (PPI) network of fission yeast (Schizosaccharomyces pombe) and establish a method to identify ‘linker’ proteins that bridge diverse cellular processes - integrating Gene Ontology and PPI data with network theory measures. We test the method on a highly characterized subset of the genome consisting of proteins controlling the cell cycle, cell polarity and cytokinesis and identify proteins likely to play a key role in controlling the temporal changes in the localization of the polarity machinery. Experimental inspection of one such factor, the polarity-regulating RNB protein Sts5, confirms the prediction that it has a cell cycle dependent regulation. Detailed bibliographic inspection of other predicted ‘linkers’ also confirms the predictive power of the method. As the method is robust to network perturbations and can successfully predict linker proteins, it provides a powerful tool to study the interplay between different cellular processes.
Author Summary
Analysis of protein interaction networks has been of use as a means to grapple with the complexity of the interactome of biological organisms. So far, network-based approaches have only been used in a limited number of organisms due to the lack of high-throughput experiments. In this study, we use graph-theoretical network analysis to investigate the protein-protein interaction network of fission yeast, and present a new network measure, linkerity, that predicts the ability of certain proteins to function as bridges between diverse cellular processes. We apply this linkerity measure to a highly conserved and coupled subset of the fission yeast network, consisting of the proteins that regulate cell cycle, polarized cell growth, and cell division. In-depth literature analysis confirms that several proteins identified as linkers of cell polarity regulation are indeed also associated with cell cycle and/or cell division control. Similarly, experimental testing confirms that a mostly uncharacterized polarity regulator identified by the method as an important linker is regulated by the cell cycle, as predicted.
PMCID: PMC3475659  PMID: 23093924
9.  Detecting and Removing Inconsistencies between Experimental Data and Signaling Network Topologies Using Integer Linear Programming on Interaction Graphs 
PLoS Computational Biology  2013;9(9):e1003204.
Cross-referencing experimental data with our current knowledge of signaling network topologies is one central goal of mathematical modeling of cellular signal transduction networks. We present a new methodology for data-driven interrogation and training of signaling networks. While most published methods for signaling network inference operate on Bayesian, Boolean, or ODE models, our approach uses integer linear programming (ILP) on interaction graphs to encode constraints on the qualitative behavior of the nodes. These constraints are posed by the network topology and their formulation as ILP allows us to predict the possible qualitative changes (up, down, no effect) of the activation levels of the nodes for a given stimulus. We provide four basic operations to detect and remove inconsistencies between measurements and predicted behavior: (i) find a topology-consistent explanation for responses of signaling nodes measured in a stimulus-response experiment (if none exists, find the closest explanation); (ii) determine a minimal set of nodes that need to be corrected to make an inconsistent scenario consistent; (iii) determine the optimal subgraph of the given network topology which can best reflect measurements from a set of experimental scenarios; (iv) find possibly missing edges that would improve the consistency of the graph with respect to a set of experimental scenarios the most. We demonstrate the applicability of the proposed approach by interrogating a manually curated interaction graph model of EGFR/ErbB signaling against a library of high-throughput phosphoproteomic data measured in primary hepatocytes. Our methods detect interactions that are likely to be inactive in hepatocytes and provide suggestions for new interactions that, if included, would significantly improve the goodness of fit. Our framework is highly flexible and the underlying model requires only easily accessible biological knowledge. 
All related algorithms were implemented in a freely available toolbox, SigNetTrainer, making it an appealing approach for various applications.
Author Summary
Cellular signal transduction is orchestrated by communication networks of signaling proteins commonly depicted on signaling pathway maps. However, each cell type may have distinct variants of signaling pathways, and wiring diagrams are often altered in disease states. The identification of truly active signaling topologies based on experimental data is therefore one key challenge in systems biology of cellular signaling. We present a new framework for training signaling networks based on interaction graphs (IG). In contrast to complex modeling formalisms, IG capture merely the known positive and negative edges between the components. This basic information, however, already sets hard constraints on the possible qualitative behaviors of the nodes when perturbing the network. Our approach uses Integer Linear Programming to encode these constraints and to predict the possible changes (down, neutral, up) of the activation levels of the involved players for a given experiment. Based on this formulation we developed several algorithms for detecting and removing inconsistencies between measurements and network topology. Demonstrated by EGFR/ErbB signaling in hepatocytes, our approach delivers direct conclusions on edges that are likely inactive or missing relative to canonical pathway maps. Such information drives the further elucidation of signaling network topologies under normal and pathological phenotypes.
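The idea of predicting qualitative node changes from a signed interaction graph can be illustrated with a small sketch. This is not the SigNetTrainer implementation: it replaces integer linear programming with brute-force enumeration, and the local consistency rule, the node names (S, A, B, C) and the helper functions are all assumptions chosen for illustration only.

```python
from itertools import product

def consistent_labelings(nodes, edges, stimulus):
    """Enumerate sign labelings (-1, 0, +1) that satisfy an assumed local rule.

    edges: list of (source, target, sign) with sign in {+1, -1}.
    stimulus: dict fixing the sign of the perturbed nodes.
    """
    free = [n for n in nodes if n not in stimulus]
    results = []
    for values in product((-1, 0, 1), repeat=len(free)):
        state = dict(stimulus)
        state.update(zip(free, values))
        if all(_locally_consistent(n, state, edges) for n in free):
            results.append(state)
    return results

def _locally_consistent(node, state, edges):
    # Assumed rule: a non-zero change needs at least one incoming influence of
    # the same sign; "no change" needs either no active influence at all or
    # opposing influences that could cancel each other.
    influences = {state[src] * sign for (src, dst, sign) in edges if dst == node}
    influences.discard(0)
    if state[node] == 0:
        return influences in (set(), {-1, 1})
    return state[node] in influences

def explains(measured, labelings):
    """True if some consistent labeling agrees with every measured node."""
    return any(all(lab[n] == v for n, v in measured.items()) for lab in labelings)

# Hypothetical toy graph: S activates A and B; A activates C; B inhibits C.
nodes = ["S", "A", "B", "C"]
edges = [("S", "A", 1), ("S", "B", 1), ("A", "C", 1), ("B", "C", -1)]
labelings = consistent_labelings(nodes, edges, {"S": 1})
```

In this toy graph, A and B must both go up when S is stimulated, while C can go up, down or stay unchanged because it receives opposing influences. A measurement of C going down is therefore explainable by the topology, whereas a measurement of A going down is an inconsistency.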
PMCID: PMC3764019  PMID: 24039561
10.  Dominating Biological Networks 
PLoS ONE  2011;6(8):e23016.
Proteins are essential macromolecules of life that carry out most cellular processes. Since proteins aggregate to perform function, and since protein-protein interaction (PPI) networks model these aggregations, one would expect to uncover new biology from PPI network topology. Hence, using PPI networks to predict protein function and role of protein pathways in disease has received attention. A debate remains open about whether network properties of “biologically central (BC)” genes (i.e., their protein products), such as those involved in aging, cancer, infectious diseases, or signaling and drug-targeted pathways, exhibit some topological centrality compared to the rest of the proteins in the human PPI network.
To help resolve this debate, we design new network-based approaches and apply them to get new insight into biological function and disease. We hypothesize that BC genes have a topologically central (TC) role in the human PPI network. We propose two different concepts of topological centrality. We design a new centrality measure to capture complex wirings of proteins in the network that identifies as TC those proteins that reside in dense extended network neighborhoods. Also, we use the notion of domination and find dominating sets (DSs) in the PPI network, i.e., sets of proteins such that every protein is either in the DS or is a neighbor of the DS. Clearly, a DS has a TC role, as it enables efficient communication between different network parts.
We find statistically significant enrichment in BC genes of TC nodes and outperform the existing methods, indicating that genes involved in key biological processes occupy topologically complex and dense regions of the network and correspond to its “spine” that connects all other network parts and can thus pass cellular signals efficiently throughout the network. To our knowledge, this is the first study that explores domination in the context of PPI networks.
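The definition of a dominating set given above (every protein is either in the set or is a neighbor of a member) can be made concrete with a short sketch. The greedy heuristic and the toy network below are assumptions for illustration. The abstract does not say which algorithm the authors used, and a greedy choice does not guarantee a minimum dominating set.

```python
def greedy_dominating_set(adjacency):
    """Greedy heuristic: repeatedly pick the node that dominates the most
    still-undominated nodes (itself plus its neighbors)."""
    undominated = set(adjacency)
    dominating = set()
    while undominated:
        best = max(
            sorted(adjacency),  # sorted for a deterministic tie-break
            key=lambda n: len(({n} | adjacency[n]) & undominated),
        )
        dominating.add(best)
        undominated -= {best} | adjacency[best]
    return dominating

def is_dominating(adjacency, candidate):
    """Check the definition: each node is in the set or adjacent to it."""
    return all(n in candidate or adjacency[n] & candidate for n in adjacency)

# Hypothetical undirected toy network; the node names are placeholders.
toy_ppi = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D", "F"},
    "F": {"E"},
}
ds = greedy_dominating_set(toy_ppi)
```

Here the heuristic selects A first (it covers four nodes) and then E, so two nodes suffice to dominate all six.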
PMCID: PMC3162560  PMID: 21887225
11.  Multi-tissue Analysis of Co-expression Networks by Higher-Order Generalized Singular Value Decomposition Identifies Functionally Coherent Transcriptional Modules 
PLoS Genetics  2014;10(1):e1004006.
Recent high-throughput efforts such as ENCODE have generated a large body of genome-scale transcriptional data in multiple conditions (e.g., cell-types and disease states). Leveraging these data is especially important for network-based approaches to human disease, for instance to identify coherent transcriptional modules (subnetworks) that can inform functional disease mechanisms and pathological pathways. Yet, genome-scale network analysis across conditions is significantly hampered by the paucity of robust and computationally-efficient methods. Building on the Higher-Order Generalized Singular Value Decomposition, we introduce a new algorithmic approach for efficient, parameter-free and reproducible identification of network-modules simultaneously across multiple conditions. Our method can accommodate weighted (and unweighted) networks of any size and can similarly use co-expression or raw gene expression input data, without hinging upon the definition and stability of the correlation used to assess gene co-expression. In simulation studies, we demonstrated distinctive advantages of our method over existing methods: it accurately recovered both common and condition-specific network-modules without requiring the ad-hoc input parameters needed by other approaches. We applied our method to genome-scale and multi-tissue transcriptomic datasets from rats (microarray-based) and humans (mRNA-sequencing-based) and identified several common and tissue-specific subnetworks with functional significance, which were not detected by other methods. In humans we recapitulated the crosstalk between cell-cycle progression and cell-extracellular matrix interaction processes in ventricular zones during neocortex expansion and, further, we uncovered previously unappreciated pathways related to development of later cognitive functions in the cortical plate of the developing brain. 
Analyses of seven rat tissues identified a multi-tissue subnetwork of co-expressed heat shock protein (Hsp) and cardiomyopathy genes (Bag3, Cryab, Kras, Emd, Plec), which was significantly replicated using separate failing heart and liver gene expression datasets in humans, thus revealing a conserved functional role for Hsp genes in cardiovascular disease.
Author Summary
Complex biological interactions and processes can be modelled as networks, for instance metabolic pathways or protein-protein interactions. The growing availability of large high-throughput data in several experimental conditions now permits the full-scale analysis of biological interactions and processes. However, no reliable and computationally efficient methods for simultaneous analysis of multiple large-scale interaction datasets (networks) have been developed to date. To overcome this shortcoming, we have developed a new computational framework that is parameter-free, computationally efficient and highly reliable. We showed how these distinctive properties make it a useful tool for real genomic data exploration and analyses. Indeed, in extensive simulation studies and real-data analyses we have demonstrated that our method outperformed existing approaches in terms of efficiency and, most importantly, reproducibility of the results. Beyond the computational advantages, we illustrated how our method can be effectively applied to leverage the vast stream of genome-scale transcriptional data that has risen exponentially over the last years. In contrast with existing approaches, using our method we were able to identify and replicate multi-tissue gene co-expression networks that were associated with specific functional processes relevant to phenotypic variation and disease in rats and humans.
PMCID: PMC3879165  PMID: 24391511
12.  Simultaneous Genome-Wide Inference of Physical, Genetic, Regulatory, and Functional Pathway Components 
PLoS Computational Biology  2010;6(11):e1001009.
Biomolecular pathways are built from diverse types of pairwise interactions, ranging from physical protein-protein interactions and modifications to indirect regulatory relationships. One goal of systems biology is to bridge three aspects of this complexity: the growing body of high-throughput data assaying these interactions; the specific interactions in which individual genes participate; and the genome-wide patterns of interactions in a system of interest. Here, we describe methodology for simultaneously predicting specific types of biomolecular interactions using high-throughput genomic data. This results in a comprehensive compendium of whole-genome networks for yeast, derived from ∼3,500 experimental conditions and describing 30 interaction types, which range from general (e.g. physical or regulatory) to specific (e.g. phosphorylation or transcriptional regulation). We used these networks to investigate molecular pathways in carbon metabolism and cellular transport, proposing a novel connection between glycogen breakdown and glucose utilization supported by recent publications. Additionally, 14 specific predicted interactions in DNA topological change and protein biosynthesis were experimentally validated. We analyzed the systems-level network features within all interactomes, verifying the presence of small-world properties and enrichment for recurring network motifs. This compendium of physical, synthetic, regulatory, and functional interaction networks has been made publicly available through an interactive web interface for investigators to utilize in future research at
Author Summary
To maintain the complexity of living biological systems, many proteins must interact in a coordinated manner to integrate their unique functions into a cooperative system. Pathways are typically constructed to capture modular subsets of this dynamic network, each made up of a collection of biomolecular interactions of diverse types that together carry out a specific cellular function. Deciphering these pathways at a global level is a crucial step for unraveling systems biology, aiding at every level from basic biological understanding to translational biomarker and drug target discovery. The combination of high-throughput genomic data with advanced computational methods has enabled us to infer the first genome-wide compendium of biomolecular pathway networks, comprising 30 distinct biomolecular interaction types. We demonstrate that this interaction network compendium, derived from ∼3,500 experimental conditions, can be used to direct a range of biomedical hypothesis generation and testing. We show that our results can be used to predict novel protein interactions and new pathway components, and also that they enable system-level analysis to investigate the network characteristics of cell-wide regulatory circuits. The resulting compendium of biological networks is made publicly available through an interactive web interface to enable future research in other biological systems of interest.
PMCID: PMC2991250  PMID: 21124865
13.  Emergence of Switch-Like Behavior in a Large Family of Simple Biochemical Networks 
PLoS Computational Biology  2011;7(5):e1002039.
Bistability plays a central role in the gene regulatory networks (GRNs) controlling many essential biological functions, including cellular differentiation and cell cycle control. However, establishing the network topologies that can exhibit bistability remains a challenge, in part due to the exceedingly large variety of GRNs that exist for even a small number of components. We begin to address this problem by employing chemical reaction network theory in a comprehensive in silico survey to determine the capacity for bistability of more than 40,000 simple networks that can be formed by two transcription factor-coding genes and their associated proteins (assuming only the most elementary biochemical processes). We find that there exist reaction rate constants leading to bistability in ∼90% of these GRN models, including several circuits that do not contain any of the TF cooperativity commonly associated with bistable systems, and the majority of which could only be identified as bistable through an original subnetwork-based analysis. A topological sorting of the two-gene family of networks based on the presence or absence of biochemical reactions reveals eleven minimal bistable networks (i.e., bistable networks that do not contain within them a smaller bistable subnetwork). The large number of previously unknown bistable network topologies suggests that the capacity for switch-like behavior in GRNs arises with relative ease and is not easily lost through network evolution. To highlight the relevance of the systematic application of CRNT to bistable network identification in real biological systems, we integrated publicly available protein-protein interaction, protein-DNA interaction, and gene expression data from Saccharomyces cerevisiae, and identified several GRNs predicted to behave in a bistable fashion.
Author Summary
Switch-like behavior is found across a wide range of biological systems, and as a result there is significant interest in identifying the various ways in which biochemical reactions can be combined to yield a switch-like response. In this work we use a set of mathematical tools from chemical reaction network theory that provide information about the steady-states of a reaction network irrespective of the values of network rate constants, to conduct a large computational study of a family of model networks consisting of only two protein-coding genes. We find that a large majority of these networks (∼90%) have (for some set of parameters) the mathematical property known as bistability and can behave in a switch-like manner. Interestingly, the capacity for switch-like behavior is often maintained as networks increase in size through the introduction of new reactions. We then demonstrate using published yeast data how theoretical parameter-free surveys such as this one can be used to discover possible switch-like circuits in real biological systems. Our results highlight the potential usefulness of parameter-free modeling for the characterization of complex networks and to the study of network evolution, and are suggestive of a role for it in the development of novel synthetic biological switches.
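As a reminder of what bistability means in practice, the following sketch locates the steady states of a one-variable toy model and classifies their stability. It is not the chemical reaction network theory analysis described above, and it is not one of the surveyed two-gene networks: the rate law (a self-activating gene with a cooperative Hill term) and all parameter values are assumptions chosen only so that the example has two stable states.

```python
def rate(x, basal=0.04, vmax=1.0, K=0.5, decay=1.0):
    """dx/dt for a hypothetical self-activating gene product x."""
    return basal + vmax * x**2 / (K**2 + x**2) - decay * x

def steady_states(f, lo=0.0, hi=2.0, steps=2000):
    """Find sign changes of f on a grid and refine each by bisection.

    Returns (root, stable) pairs; a root is stable when f goes from
    positive to negative, so that the flow points toward the root.
    """
    found = []
    width = (hi - lo) / steps
    for i in range(steps):
        a, b = lo + i * width, lo + (i + 1) * width
        fa, fb = f(a), f(b)
        if fa * fb < 0:
            stable = fa > 0
            for _ in range(60):
                mid = 0.5 * (a + b)
                if f(mid) * fa > 0:
                    a = mid  # root lies in the upper half
                else:
                    b = mid  # root lies in the lower half
            found.append((0.5 * (a + b), stable))
    return found

states = steady_states(rate)
stable_count = sum(1 for _, stable in states if stable)
```

With these assumed parameters the model has three steady states, of which the lowest and highest are stable and the middle one is unstable, which is the signature of a switch. The survey described above asks the complementary question of whether any rate constants exist that produce such behavior, without fixing parameter values in advance.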
PMCID: PMC3093349  PMID: 21589886
14.  Perturbation Biology: Inferring Signaling Networks in Cellular Systems 
PLoS Computational Biology  2013;9(12):e1003290.
We present a powerful experimental-computational technology for inferring network models that predict the response of cells to perturbations, and that may be useful in the design of combinatorial therapy against cancer. The experiments are systematic series of perturbations of cancer cell lines by targeted drugs, singly or in combination. The response to perturbation is quantified in terms of relative changes in the measured levels of proteins, phospho-proteins and cellular phenotypes such as viability. Computational network models are derived de novo, i.e., without prior knowledge of signaling pathways, and are based on simple non-linear differential equations. The prohibitively large solution space of all possible network models is explored efficiently using a probabilistic algorithm, Belief Propagation (BP), which is three orders of magnitude faster than standard Monte Carlo methods. Explicit executable models are derived for a set of perturbation experiments in SKMEL-133 melanoma cell lines, which are resistant to the therapeutically important inhibitor of RAF kinase. The resulting network models reproduce and extend known pathway biology. They empower potential discoveries of new molecular interactions and predict efficacious novel drug perturbations, such as the inhibition of PLK1, which is verified experimentally. This technology is suitable for application to larger systems in diverse areas of molecular biology.
Author Summary
Drugs that target specific effects of signaling proteins are promising agents for treating cancer. One of the many obstacles facing optimal drug design is inadequate quantitative understanding of the coordinated interactions between signaling proteins. De novo model inference of network or pathway models refers to the algorithmic construction of mathematical predictive models from experimental data without dependence on prior knowledge. De novo inference is difficult because of the prohibitively large number of possible sets of interactions that may or may not be consistent with observations. Our new method overcomes this difficulty by adapting a method from statistical physics, called Belief Propagation, which first calculates probabilistically the most likely interactions in the vast space of all possible solutions, then derives a set of individual, highly probable solutions in the form of executable models. In this paper, we test this method on artificial data and then apply it to model signaling pathways in a BRAF-mutant melanoma cancer cell line based on a large set of rich output measurements from a systematic set of perturbation experiments using drug combinations. Our results are in agreement with established biological knowledge, predict novel interactions, and predict efficacious drug targets that are specific to the experimental cell line and potentially to related tumors. The method has the potential, with sufficient systematic perturbation data, to model, de novo and quantitatively, the effects of hundreds of proteins on cellular responses, on a scale that is currently unreachable in diverse areas of cell biology. In a disease context, the method is applicable to the computational design of novel combination drug treatments.
PMCID: PMC3868523  PMID: 24367245
15.  Organization of Physical Interactomes as Uncovered by Network Schemas 
PLoS Computational Biology  2008;4(10):e1000203.
Large-scale protein-protein interaction networks provide new opportunities for understanding cellular organization and functioning. We introduce network schemas to elucidate shared mechanisms within interactomes. Network schemas specify descriptions of proteins and the topology of interactions among them. We develop algorithms for systematically uncovering recurring, over-represented schemas in physical interaction networks. We apply our methods to the S. cerevisiae interactome, focusing on schemas consisting of proteins described via sequence motifs and molecular function annotations and interacting with one another in one of four basic network topologies. We identify hundreds of recurring and over-represented network schemas of various complexity, and demonstrate via graph-theoretic representations how more complex schemas are organized in terms of their lower-order constituents. The uncovered schemas span a wide range of cellular activities, with many signaling and transport related higher-order schemas. We establish the functional importance of the schemas by showing that they correspond to functionally cohesive sets of proteins, are enriched in the frequency with which they have instances in the H. sapiens interactome, and are useful for predicting protein function. Our findings suggest that network schemas are a powerful paradigm for organizing, interrogating, and annotating cellular networks.
Author Summary
Large-scale networks of protein-protein interactions provide a view into the workings of the cell. However, these interaction maps do not come with a key for interpreting them, so it is necessary to develop methods that shed light on their functioning and organization. We propose the language of network schemas for describing recurring patterns of specific types of proteins and their interactions. That is, network schemas describe proteins and specify the topology of interactions among them. A single network schema can describe, for example, a common template that underlies several distinct cellular pathways, such as signaling pathways. We develop a computational methodology for identifying network schemas that are recurrent and over-represented in the network, even given the distributions of their constituent components. We apply this methodology to the physical interaction network in S. cerevisiae and begin to build a hierarchy of schemas starting with the four simplest topologies. We validate the biological relevance of the schemas that we find, discuss the insights our findings lend into the organization of interactomes, touch upon cross-genomic aspects of schema analysis, and show how to use schemas to annotate uncharacterized protein families.
PMCID: PMC2561054  PMID: 18949022
16.  Proteomic snapshot of the EGF-induced ubiquitin network 
In this work, the authors report the first proteome-wide analysis of EGF-regulated ubiquitination, revealing surprisingly pervasive growth factor-induced ubiquitination across a broad range of cellular systems and signaling pathways.
Epidermal growth factor (EGF) triggers a novel ubiquitin (Ub)-based signaling cascade that appears to intersect both housekeeping and regulatory circuitries of cellular physiology. The EGF-regulated Ubiproteome includes scores of ubiquitinating and deubiquitinating enzymes, suggesting that the Ub signal might be rapidly transmitted and amplified through the Ub machinery. The EGF-Ubiproteome overlaps significantly with the EGF-phosphotyrosine proteome, pointing to a possible crosstalk between these two signaling mechanisms. The significant number of biological insights uncovered in our study (among which EphA2 as a novel, downstream ubiquitinated target of EGF receptor) illustrates the general relevance of such proteomic screens and calls for further analysis of the dynamics of the Ubiproteome.
Ubiquitination is a process by which one or more ubiquitin (Ub) monomers or chains are covalently attached to target proteins by E3 ligases. Deubiquitinating enzymes (DUBs) revert Ub conjugation, thus ensuring a dynamic equilibrium between pools of ubiquitinated and deubiquitinated proteins (Amerik and Hochstrasser, 2004). Traditionally, ubiquitination has been associated with protein degradation; however, it is now becoming apparent that this post-translation modification is an important signaling mechanism that can modulate the function, localization and protein/protein interaction abilities of targets (Mukhopadhyay and Riezman, 2007; Ravid and Hochstrasser, 2008).
One of the best-characterized signaling pathways involving ubiquitination is the epidermal growth factor (EGF)-induced pathway. Upon EGF stimulation, a variety of proteins are subject to Ub modification. These include the EGF receptor (EGFR), which undergoes both multiple monoubiquitination (Haglund et al, 2003) and K63-linked polyubiquitination (Huang et al, 2006), as well as components of the downstream endocytic machinery, which are modified by monoubiquitination (Polo et al, 2002; Mukhopadhyay and Riezman, 2007). Ubiquitination of the EGFR has been shown to have an impact on receptor internalization, intracellular sorting and metabolic fate (Acconcia et al, 2009). However, little is known about the wider impact of EGF-induced ubiquitination on cellular homeostasis and on the pleiotropic biological functions of the EGFR. In this paper, we attempt to address this issue by characterizing the repertoire of proteins that are ubiquitinated upon EGF stimulation, i.e., the EGF-Ubiproteome.
To achieve this, we employed two different purification procedures (endogenous, based on the purification of proteins modified by endogenous Ub from human cells; tandem affinity purification (TAP), based on the purification of proteins modified by an ectopically expressed tagged-Ub from mouse cells) with stable isotope labeling with amino acids in cell culture-based MS to obtain both steady-state Ubiproteomes and EGF-induced Ubiproteomes. The steady-state Ubiproteomes consist of 1175 and 582 unambiguously identified proteins for the endogenous and TAP approaches, respectively, which we largely validated. Approximately 15% of the steady-state Ubiproteome was EGF-regulated at 10 min after stimulation; 176 of 1175 in the endogenous approach and 105 of 582 in the TAP approach. Both hyper- and hypoubiquitinated proteins were detected, indicating that EGFR-mediated signaling can modulate the ubiquitin network in both directions. Interestingly, many E2, E3 and DUBs were present in the EGF-Ubiproteome, suggesting that the Ub signal might be rapidly transmitted and amplified through the Ub machinery. Moreover, analysis of Ub-chain topology, performed using mass spectrometry and specific antibodies, suggested that the K63-linkage was the major Ub-based signal in the EGF-induced pathway.
To obtain a higher-resolution molecular picture of the EGF-regulated Ub network, we performed a network analysis on the non-redundant EGF-Ubiproteome (265 proteins). This analysis revealed that in addition to well-established liaisons with endocytosis-related pathways, the EGF-Ubiproteome intersects many circuitries of intracellular signaling involved in, e.g., DNA damage checkpoint regulation, cell-to-cell adhesion mechanisms and actin remodeling (Figure 5A).
Moreover, the EGF-Ubiproteome was enriched in hubs, proteins that can establish multiple protein/protein interactions and thereby regulate the organization of networks. These results are indicative of a crosstalk between EGFR-activated pathways and other signaling pathways through the Ub-network.
As EGF binding to its receptor also triggers a series of phosphorylation events, we examined whether there was any overlap between our EGF-Ubiproteome and published EGF-induced phosphotyrosine (pY) proteomes (Blagoev et al, 2004; Oyama et al, 2009; Hammond et al, 2010). We observed a significant overlap between ubiquitinated and pY proteins: 23% (61 of 265) of the EGF-Ubiproteome proteins were also tyrosine phosphorylated. Pathway analysis of these 61 Ub/pY-containing proteins revealed a significant enrichment in endocytic and signal-transduction pathways, while ‘hub analysis’ revealed that Ub/pY-containing proteins are enriched in highly connected proteins to an even greater extent than Ub-containing proteins alone. These data point to a complex interplay between the Ub and pY networks and suggest that the flow of information from the receptor to downstream signaling molecules is driven by two complementary and interlinked enzymatic cascades: kinases/phosphatases and E3 ligases/DUBs.
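The overlap figure above (61 of 265) can be checked with a few lines of arithmetic, and an overlap of this kind is commonly assessed with a hypergeometric tail probability. The sketch below is illustrative only: the source does not state which test was used, nor the background proteome size or the size of the pooled pY proteome, so `background_size` and `py_set_size` are hypothetical placeholders.

```python
from math import comb


def overlap_fraction(overlap: int, set_size: int) -> float:
    """Fraction of a protein set that is also found in a second set."""
    return overlap / set_size


def hypergeom_tail(k: int, population: int, marked: int, draws: int) -> float:
    """P(X >= k) for a hypergeometric variable.

    population: total number of proteins considered (background)
    marked:     proteins in the background that carry the second label (e.g. pY)
    draws:      size of the set being tested (e.g. the EGF-Ubiproteome)
    """
    upper = min(marked, draws)
    total = comb(population, draws)
    hits = sum(
        comb(marked, i) * comb(population - marked, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total


# Numbers taken from the text: 61 of 265 EGF-Ubiproteome proteins are also pY proteins.
fraction = overlap_fraction(61, 265)

# Hypothetical values, NOT from the source: background and pY set sizes.
background_size = 20000
py_set_size = 1000
p_value = hypergeom_tail(61, background_size, py_set_size, 265)

print(f"overlap: {fraction:.1%}")
print(f"tail probability under the assumed background: {p_value:.3g}")
```

The tail probability depends strongly on the assumed background, so it should be read as a template for the calculation rather than as a reproduction of the reported significance.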
Finally, we provided a proof of principle of the biological relevance of our EGF-Ubiproteome. We focused on EphA2, a receptor tyrosine kinase, which is involved in development and is often overexpressed in cancer (Pasquale, 2008). We started from the observation that EphA2 is present in the EGF-Ubiproteome and that proteins of the EGF-Ubiproteome are enriched in the Ephrin receptor signaling pathway(s). We confirmed the MS data by demonstrating that EphA2 is ubiquitinated upon EGF stimulation. Moreover, EphA2 also undergoes tyrosine phosphorylation, indicating crosstalk between the two receptors. The EGFR kinase domain was essential for these modifications of EphA2, and a partial co-internalization with EGFR upon EGF activation was clearly detectable. Finally, we demonstrated by knockdown of EphA2 in MCF10A cells that this receptor is critically involved in EGFR biological outcomes, such as proliferation and migration (Figure 7).
Overall, our results unveil the complex impact of growth factor signaling on Ub-based intracellular networks to levels that extend well beyond what might have been expected and highlight the ‘resource’ feature of our EGF-Ubiproteome.
The activity, localization and fate of many cellular proteins are regulated through ubiquitination, a process whereby one or more ubiquitin (Ub) monomers or chains are covalently attached to target proteins. While Ub-conjugated and Ub-associated proteomes have been described, we lack a high-resolution picture of the dynamics of ubiquitination in response to signaling. In this study, we describe the epidermal growth factor (EGF)-regulated Ubiproteome, as obtained by two complementary purification strategies coupled to quantitative proteomics. Our results unveil the complex impact of growth factor signaling on Ub-based intracellular networks to levels that extend well beyond what might have been expected. In addition to endocytic proteins, the EGF-regulated Ubiproteome includes a large number of signaling proteins, ubiquitinating and deubiquitinating enzymes, transporters and proteins involved in translation and transcription. The Ub-based signaling network appears to intersect both housekeeping and regulatory circuitries of cellular physiology. Finally, as proof of principle of the biological relevance of the EGF-Ubiproteome, we demonstrated that EphA2 is a novel, downstream ubiquitinated target of epidermal growth factor receptor (EGFR), critically involved in EGFR biological responses.
PMCID: PMC3049407  PMID: 21245847
EGF; network; proteomics; signaling; ubiquitin
17.  Dynamic interaction networks in a hierarchically organized tissue 
We have integrated gene expression profiling with database and literature mining, mechanistic modeling, and cell culture experiments to identify intercellular and intracellular networks regulating blood stem cell self-renewal. Blood stem cell fate in vitro is regulated non-autonomously by a coupled positive–negative intercellular feedback circuit, composed of megakaryocyte-derived stimulatory growth factors (VEGF, PDGF, EGF, and serotonin) versus monocyte-derived inhibitory factors (CCL3, CCL4, CXCL10, TGFB2, and TNFSF9). The antagonistic signals converge in a core intracellular network focused around PI3K, Raf, PLC, and Akt. Model simulations enable functional classification of the novel endogenous ligands and signaling molecules.
Intercellular (between cell) communication networks are required to maintain homeostasis and coordinate regenerative and developmental cues in multicellular organisms. Despite the recognized importance of intercellular networks in regulating adult stem and progenitor cell fate, the specific cell populations involved, and the underlying molecular mechanisms are largely undefined. Although a limited number of studies have applied novel bioinformatic approaches to unravel intercellular signaling in other cell systems (Frankenstein et al, 2006), a comprehensive analysis of intercellular communication in a stem cell-derived, hierarchical tissue network has yet to be reported.
As a model system to explore intercellular communication networks in a hierarchically organized tissue, we cultured human umbilical cord blood (UCB)-derived stem and progenitor cells in defined, minimal cytokine-supplemented liquid culture (Madlambayan et al, 2006). To systematically explore the molecular and cellular dynamics underlying primitive progenitor growth and differentiation, gene expression profiles of primitive (lineage negative; Lin−) and mature (lineage positive; Lin+) populations were generated during phases of stem cell expansion versus depletion. Parallel phenotypic and subproteomic experiments validated that mRNA expression correlated with complex measures of proteome activity (protein secretion and cell surface expression). Using a curated list of secreted ligand–receptor interactions and published expression profiles of purified mature blood populations, we implemented a novel algorithm to reconstruct the intercellular signaling networks established between stem cells and multi-lineage progeny in vitro. By correlating differential expression patterns with stem cell growth, we predict cell populations, pathways, and secreted ligands associated with stem cell self-renewal and differentiation (Figure 3A).
We then tested the correlative predictions in a series of cell culture experiments. UCB progenitor cell cultures were supplemented with saturating amounts of 18 putative regulatory ligands, or cocultured with purified mature blood lineages (megakaryocytes, monocytes, and erythrocytes), and analyzed for effects on total cell, progenitor, and primitive progenitor growth. At the primitive progenitor level, 3/5 novel predicted stimulatory ligands (EGF, PDGFB, and VEGF) displayed significant positive effects, 5/7 predicted inhibitory factors (CCL3, CCL4, CXCL10, TNFSF9, and TGFB2) displayed negative effects, whereas only 1/5 non-correlated ligands (CXCL7) displayed an effect. Also consistent with predictions from gene expression data, megakaryocytes and monocytes were found to stimulate and inhibit primitive progenitor growth, respectively, and these effects were attributable to differential secretome profiles of stimulatory versus inhibitory ligands.
Cellular responses to external stimuli, particularly in heterogeneous and dynamic cell populations, represent complex functions of multiple cell fate decisions acting both directly and indirectly on the target (stem cell) populations. Experimentally distinguishing the mode of action of cytokines is thus a difficult task. To address this we used our previously published interactive model of hematopoiesis (Kirouac et al, 2009) to classify experimentally identified regulatory ligands into one of four distinct functional categories based on their differential effects on cell population growth. TGFB2 was classified as a proliferation inhibitor, CCL4, CXCL10, SPARC, and TNFSF9 as self-renewal inhibitors, CCL3 a proliferation stimulator, and EGF, VEGF, and PDGFB as self-renewal stimulators.
Stem and progenitor cells exposed to combinatorial extracellular signals must propagate this information through intracellular molecular networks, and respond appropriately by modifying cell fate decisions. To explore how our experimentally identified positive and negative regulatory signals are integrated at the intracellular level, we constructed a blood stem cell self-renewal signaling network through extensive literature curation and protein–protein interaction (PPI) network mapping. We find that signal transduction pathways activated by the various stimulatory and inhibitory ligands converge on a limited set of molecular control nodes, forming a core subnetwork enriched for known regulators of self-renewal (Figure 6A). To experimentally test the intracellular signaling molecules computationally predicted as regulators of stem cell self-renewal, we obtained five small molecule antagonists against the kinases Phosphatidylinositol 3-kinase (PI3K), Raf, Akt, Phospholipase C (PLC), and MEK1. Liquid cultures were supplemented with the five molecules individually, and resultant cell population outputs compared against model simulations to deconvolute the functional effects on proliferation (and survival) versus self-renewal. This analysis classifies inhibition of PI3K and Raf activity as selectively targeting self-renewal, PLC as selectively targeting survival, and Akt as selectively targeting proliferation; MEK inhibition appears non-specific for these processes.
This represents the first systematic characterization of how cell fate decisions are regulated non-autonomously through lineage-specific interactions with differentiated progeny. The complex intercellular communication networks can be approximated as an antagonistic positive–negative feedback circuit, wherein progenitor expansion is modulated by a balance of megakaryocyte-derived stimulatory factors (EGF, PDGF, VEGF, and possibly serotonin) versus monocyte-derived inhibitory factors (CCL3, CCL4, CXCL10, TGFB2, and TNFSF9). This complex milieu of endogenous regulatory signals is integrated and processed within a core intracellular signaling network, resulting in modulation of cell-level kinetic parameters (proliferation, survival, and self-renewal). We reconstruct a stem cell associated intracellular network, and identify PI3K, Raf, Akt, and PLC as functionally distinct signal integration nodes, linking extracellular and intracellular signaling. These findings lay the groundwork for novel strategies to control blood stem cell self-renewal in vitro and in vivo.
Intercellular (between cell) communication networks maintain homeostasis and coordinate regenerative and developmental cues in multicellular organisms. Despite the importance of intercellular networks in stem cell biology, their rules, structure and molecular components are poorly understood. Herein, we describe the structure and dynamics of intercellular and intracellular networks in a stem cell-derived, hierarchically organized tissue using experimental and theoretical analyses of cultured human umbilical cord blood progenitors. By integrating high-throughput molecular profiling, database and literature mining, mechanistic modeling, and cell culture experiments, we show that secreted factor-mediated intercellular communication networks regulate blood stem cell fate decisions. In particular, self-renewal is modulated by a coupled positive–negative intercellular feedback circuit composed of megakaryocyte-derived stimulatory growth factors (VEGF, PDGF, EGF, and serotonin) versus monocyte-derived inhibitory factors (CCL3, CCL4, CXCL10, TGFB2, and TNFSF9). We reconstruct a stem cell intracellular network, and identify PI3K, Raf, Akt, and PLC as functionally distinct signal integration nodes, linking extracellular and intracellular signaling. This represents the first systematic characterization of how stem cell fate decisions are regulated non-autonomously through lineage-specific interactions with differentiated progeny.
PMCID: PMC2990637  PMID: 20924352
cellular networks; hematopoiesis; intercellular signaling; self-renewal; stem cells
18.  Identification of important interacting proteins (IIPs) in Plasmodium falciparum using large-scale interaction network analysis and in-silico knock-out studies 
Malaria Journal  2015;14:70.
Plasmodium falciparum causes the most severe form of malaria and affects 3.2 million people annually. Due to the increasing incidence of resistance to existing drugs, there is a growing need to discover new and more effective drugs against malaria. Despite the global importance of P. falciparum, the vast majority of its proteins are uncharacterized experimentally. The application of newer approaches using several “omics” data types has been successful in exploring the biological interactions underlying cellular processes. To date, few system-level studies using P. falciparum protein-protein interactions have been published. Hence, the purpose of this study is to develop a standardized pipeline for structural, functional, and topographical analysis of a large-scale protein-protein interaction network (PPIN) in order to identify proteins important for network topology and integrity. Here, the P. falciparum PPIN has been utilized as a model for better understanding the molecular mechanisms of survival and pathogenesis of the malaria parasite.
Various graph theoretical approaches were implemented to identify highly interacting hub and central proteins that are crucial for network integrity. Further, potential network-perturbing proteins were identified via an in-silico knock-out (KO) analysis to isolate important interacting proteins (IIPs), which, in principle, can have a significant impact on the global and local environments of the P. falciparum interaction network.
In total, 177 hubs and 132 central proteins were identified from the malarial PPI network (proteins: 1607; interactions: 4750). Using the in-silico knock-out exercise, 131 global and 99 local network-perturbing proteins were also identified. Finally, 271 proteins from P. falciparum were shortlisted as important interacting proteins (IIPs), which not only play a crucial role in intra-pathogen network integrity and stage specificity but also interact with various human proteins involved in multiple metabolic pathways within the host cell. These IIPs could be used as potential drug targets in malarial research.
Graph theoretical analysis of a PPIN can be a very useful approach to identify proteins that are important for regulation of the interactions required for an organism’s survival. The important interacting proteins (IIPs) identified using the P. falciparum PPIN provide a useful dataset containing probable candidates for future drug target analysis.
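The two ideas described here, hub identification and in-silico knock-out, can be illustrated on a toy network. The sketch below is a minimal illustration under stated assumptions: the degree cut-off `min_degree` and the use of largest-connected-component shrinkage as the perturbation measure are choices made for this example, since the abstract does not specify the thresholds or perturbation metrics actually used. The node names are placeholders, not P. falciparum proteins.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected adjacency map from a list of (protein, protein) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def hubs(graph, min_degree):
    """Proteins whose number of interaction partners reaches the cut-off."""
    return sorted(node for node, nbrs in graph.items() if len(nbrs) >= min_degree)


def largest_component_size(graph, removed=frozenset()):
    """Size of the largest connected component, ignoring `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


def knockout_impact(graph):
    """For each protein, how much the largest component shrinks when it is removed."""
    baseline = largest_component_size(graph)
    return {node: baseline - largest_component_size(graph, {node}) for node in graph}


# Toy network with placeholder names (hypothetical data).
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
toy_graph = build_graph(toy_edges)

print("hubs:", hubs(toy_graph, min_degree=3))
print("knock-out impact:", knockout_impact(toy_graph))
```

In this toy case, removing a node always costs at least one (the node itself), so the informative values are those well above one, which mark nodes whose loss fragments the network.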
Electronic supplementary material
The online version of this article (doi:10.1186/s12936-015-0562-1) contains supplementary material, which is available to authorized users.
PMCID: PMC4333160
Centrality analysis; IIPs; Plasmodium; Graph theory; Host pathogen interaction; Hubs
19.  A New Method for the Discovery of Essential Proteins 
PLoS ONE  2013;8(3):e58763.
Experimental methods for the identification of essential proteins are always costly, time-consuming, and laborious. It is a challenging task to determine protein essentiality through experiments alone. With the development of high-throughput technologies, a vast number of protein-protein interactions are available, which enables the identification of essential proteins at the network level. Many computational methods for this task have been proposed based on the topological properties of protein-protein interaction (PPI) networks. However, the currently available PPI networks for each species are incomplete (i.e., they contain false negatives) and very noisy (i.e., they contain many false positives), and network topology-based centrality measures are often very sensitive to such noise. Therefore, exploring robust methods for identifying essential proteins would be of great value.
In this paper, a new essential protein discovery method, named CoEWC (Co-Expression Weighted by Clustering coefficient), has been proposed. CoEWC is based on the integration of the topological properties of the PPI network and the co-expression of interacting proteins. The aim of CoEWC is to capture the common features of essential proteins in both date hubs and party hubs. The performance of CoEWC is validated based on the PPI network of Saccharomyces cerevisiae. Experimental results show that CoEWC significantly outperforms the classical centrality measures, and that it also outperforms PeC, a newly proposed essential protein discovery method which outperforms 15 other centrality measures on the PPI network of Saccharomyces cerevisiae. In particular, when predicting no more than 500 proteins, improvements of more than 50% are obtained by CoEWC over degree centrality (DC), one of the better centrality measures for identifying protein essentiality.
We demonstrate that a more robust essential protein discovery method can be developed by integrating the topological properties of the PPI network and the co-expression of interacting proteins. The proposed centrality measure, CoEWC, is effective for the discovery of essential proteins.
PMCID: PMC3605424  PMID: 23555595
20.  Interface-Resolved Network of Protein-Protein Interactions 
PLoS Computational Biology  2013;9(5):e1003065.
We define an interface-interaction network (IIN) to capture the specificity and competition between protein-protein interactions (PPI). This new type of network represents interactions between individual interfaces used in functional protein binding and thereby contains the detail necessary to describe the competition and cooperation between any pair of binding partners. Here we establish a general framework for the construction of IINs that merges computational structure-based interface assignment with careful curation of available literature. To complement limited structural data, the inclusion of biochemical data is critical for achieving the accuracy and completeness necessary to analyze the specificity and competition between the protein interactions. Firstly, this procedure provides a means to clarify the information content of existing data on purported protein interactions and to remove indirect and spurious interactions. Secondly, the IIN we have constructed here for proteins involved in clathrin-mediated endocytosis (CME) exhibits distinctive topological properties. In contrast to PPI networks with their global and relatively dense connectivity, the fragmentation of the IIN into distinctive network modules suggests that different functional pressures act on the evolution of its topology. Large modules in the IIN are formed by interfaces sharing specificity for certain domain types, such as SH3 domains distributed across different proteins. The shared and distinct specificity of an interface is necessary for effective negative and positive design of highly selective binding targets. Lastly, the organization of detailed structural data in a network format allows one to identify pathways of specific binding interactions and thereby predict effects of mutations at specific surfaces on a protein and of specific binding inhibitors, as we explore in several examples. 
Overall, the endocytosis IIN is remarkably complex and rich in features masked in the coarser PPI, and collects relevant detail of protein association in a readily interpretable format.
Author Summary
Much of the work inside the cell is carried out by proteins interacting with other proteins. Each edge in a protein-protein interaction network reflects these functional interactions and each node a separate protein, creating a complex structure that nevertheless follows well-established global and local patterns related to robust protein function. However, this network is not detailed enough to assess whether a particular protein can bind multiple interaction partners simultaneously through distinct interfaces, or whether the partners targeting a specific interface share similar structural or chemical properties. By breaking each protein node into its constituent interface nodes, we generate and assess such a detailed new network. To sample protein binding interactions broadly and accurately beyond those seen in crystal structures, our method combines computational interface assignment with data from biochemical studies. Using this approach we are able to assign interfaces to the majority of known interactions between proteins involved in the clathrin-mediated endocytosis pathway in yeast. Analysis of this interface-interaction network provides novel insights into the functional specificity of protein interactions, and highlights elements of cooperativity and competition among the proteins. By identifying diverse multi-protein complexes, interface-interaction networks also provide a map for targeted drug development.
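The idea of splitting each protein node into interface nodes can be made concrete with a small data-structure sketch. This is an illustration only, not the authors' implementation: the edge representation, the function names, and the protein and interface labels (`P1`, `sh3_site`, and so on) are all hypothetical.

```python
from collections import defaultdict

# Each interface-level edge joins (protein, interface) to (protein, interface).
# All names below are hypothetical placeholders.
iin_edges = [
    (("P1", "sh3_site"), ("P2", "proline_motif")),
    (("P1", "sh3_site"), ("P3", "proline_motif")),
    (("P1", "tail_site"), ("P4", "binding_domain")),
]


def partners_by_interface(edges, protein):
    """Map each interface of `protein` to the set of partner proteins that use it."""
    table = defaultdict(set)
    for (p1, i1), (p2, i2) in edges:
        if p1 == protein:
            table[i1].add(p2)
        if p2 == protein:
            table[i2].add(p1)
    return dict(table)


def compete(edges, protein, partner_a, partner_b):
    """True if both partners target the same interface on `protein`."""
    table = partners_by_interface(edges, protein)
    return any(partner_a in ps and partner_b in ps for ps in table.values())


def collapse_to_ppi(edges):
    """Drop interface labels to recover the coarser protein-protein network."""
    return {frozenset((p1, p2)) for (p1, _), (p2, _) in edges}


print(partners_by_interface(iin_edges, "P1"))
print("P2 vs P3 compete on P1:", compete(iin_edges, "P1", "P2", "P3"))
print("P2 vs P4 compete on P1:", compete(iin_edges, "P1", "P2", "P4"))
```

The collapsed PPI view shows P1 with three partners and no way to tell them apart, whereas the interface view separates partners that share a binding site from those that could, in principle, bind at the same time.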
PMCID: PMC3656101  PMID: 23696724
21.  Gene regulatory network inference: evaluation and application to ovarian cancer allows the prioritization of drug targets 
Genome Medicine  2012;4(5):41.
Altered networks of gene regulation underlie many complex conditions, including cancer. Inferring gene regulatory networks from high-throughput microarray expression data is a fundamental but challenging task in computational systems biology and its translation to genomic medicine. Although diverse computational and statistical approaches have been brought to bear on the gene regulatory network inference problem, their relative strengths and disadvantages remain poorly understood, largely because comparative analyses usually consider only small subsets of methods, use only synthetic data, and/or fail to adopt a common measure of inference quality.
We report a comprehensive comparative evaluation of nine state-of-the-art gene regulatory network inference methods encompassing the main algorithmic approaches (mutual information, correlation, partial correlation, random forests, support vector machines) using 38 simulated datasets and empirical serous papillary ovarian adenocarcinoma expression-microarray data. We then apply the best-performing method to infer normal and cancer networks. We assess the druggability of the proteins encoded by our predicted target genes using the CancerResource and PharmGKB webtools and databases.
We observe large differences in the accuracy with which these methods predict the underlying gene regulatory network depending on features of the data, network size, topology, experiment type, and parameter settings. Applying the best-performing method (the supervised method SIRENE) to the serous papillary ovarian adenocarcinoma dataset, we infer and rank regulatory interactions, some previously reported and others novel. For selected novel interactions we propose testable mechanistic models linking gene regulation to cancer. Using network analysis and visualization, we uncover cross-regulation of angiogenesis-specific genes through three key transcription factors in normal and cancer conditions. Druggability analysis of proteins encoded by the 10 highest-confidence target genes, and by 15 genes with differential regulation in normal and cancer conditions, reveals 75% to be potential drug targets.
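Of the algorithmic families listed above, correlation is the simplest to illustrate. The sketch below shows the basic idea of a correlation-based network inference step: compute pairwise Pearson correlations between expression profiles and keep pairs above a cut-off. It is a generic illustration, not one of the nine evaluated methods; the gene names, expression values, and the `threshold` parameter are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def infer_edges(expression, threshold):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles across four samples.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
    "geneD": [1.0, 3.0, 2.0, 4.0],   # only loosely related to geneA
}

for g1, g2, r in infer_edges(toy_expression, threshold=0.9):
    print(f"{g1} - {g2}: r = {r:+.2f}")
```

Correlation-based edges are undirected and cannot separate direct from indirect regulation, which is one reason the other families named in the text (partial correlation, mutual information, supervised learners) exist.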
Our study represents a concrete application of gene regulatory network inference to ovarian cancer, demonstrating the complete cycle of computational systems biology research, from genome-scale data analysis via network inference, evaluation of methods, to the generation of novel testable hypotheses, their prioritization for experimental validation, and discovery of potential drug targets.
PMCID: PMC3506907  PMID: 22548828
22.  Automated identification of pathways from quantitative genetic interaction data 
We present a novel Bayesian learning method that reconstructs large detailed gene networks from quantitative genetic interaction (GI) data. The method uses global reasoning to handle missing and ambiguous measurements, and provides confidence estimates for each prediction. Applied to a recent data set over genes relevant to protein folding, the learned networks reflect known biological pathways, including details such as pathway ordering and directionality of relationships. The reconstructed networks also suggest novel relationships, including the placement of SGT2 in the tail-anchored biogenesis pathway, a finding that we experimentally validated.
Recent developments have enabled large-scale quantitative measurement of genetic interactions (GIs) that report on the extent to which the activity of one gene is dependent on a second. It has long been recognized (Avery and Wasserman, 1992; Hartman et al, 2001; Segre et al, 2004; Tong et al, 2004; Drees et al, 2005; Schuldiner et al, 2005; St Onge et al, 2007; Costanzo et al, 2010) that functional dependencies revealed by GI data can provide rich information regarding underlying biological pathways. Further, the precise phenotypic measurements provided by quantitative GI data can provide evidence for even more detailed aspects of pathway structure, such as differentiating between full and partial dependence between two genes (Drees et al, 2005; Schuldiner et al, 2005; St Onge et al, 2007; Jonikas et al, 2009) (Figure 1A). As GI data sets become available for a range of quantitative phenotypes and organisms, such patterns will allow researchers to elucidate pathways important to a diverse set of biological processes.
We present a new method that exploits the high-quality, quantitative nature of recent GI assays to automatically reconstruct detailed multi-gene pathway structures, including the organization of a large set of genes into coherent pathways, the connectivity and ordering within each pathway, and the directionality of each relationship. We introduce activity pathway networks (APNs), which represent functional dependencies among a set of genes in the form of a network. We present an automatic method to efficiently reconstruct APNs over large sets of genes based on quantitative GI measurements. This method handles uncertainty in the data arising from noise, missing measurements, and data points with ambiguous interpretations, by performing global reasoning that combines evidence from multiple data points. In addition, because some structure choices remain uncertain even when jointly considering all measurements, our method maintains multiple likely networks, and allows computation of confidence estimates over each structure choice.
We applied our APN reconstruction method to the recent high-quality GI data set of Jonikas et al (2009), which examined the functional interaction between genes that contribute to protein folding in the ER. Specifically, Jonikas et al used the cell's endogenous sensor (the unfolded protein response), to first identify several hundred yeast genes with functions in endoplasmic reticulum folding and then systematically characterized their functional interdependencies by measuring unfolded protein response levels in double mutants. Our analysis produced an ensemble of 500 likelihood-weighted APNs over 178 genes (Figure 2).
We performed an aggregate evaluation of our results by comparing to known biological relationships between gene pairs, including participation in pathways according to the Kyoto Encyclopedia of Genes and Genomes (KEGG), correlation of chemical genomic profiles in a recent high-throughput assay (Hillenmeyer et al, 2008) and similarity of Gene Ontology (GO) annotations. In each evaluation performed, our reconstructed APNs were significantly more consistent with the known relationships than either the raw GI values or the Pearson correlation between profiles of GI values.
Importantly, our approach provides not only an improved means for defining pairs or groups of related genes, but also enables the identification of detailed multi-gene network structures. In many cases, our method successfully reconstructed known cellular pathways, including the ER-associated degradation (ERAD) pathway, and the biosynthesis of N-linked glycans, ranking them among the highest confidence structures. In-depth examination of the learned network structures indicates agreement with many known details of these pathways. In addition, quantitative analysis indicates that our learned APNs are indicative of ordering within KEGG-annotated biological pathways.
Our results also suggest several novel relationships, including placement of uncharacterized genes into pathways, and novel relationships between characterized genes. These include the dependence of the J domain chaperone JEM1 on the PDI homolog MPD1, dependence of the Ubiquitin-recycling enzyme DOA4 on N-linked glycosylation, and the dependence of the E3 Ubiquitin ligase DOA10 on the signal peptidase complex subunit SPC2. Our APNs also place the poorly characterized TPR-containing protein SGT2 upstream of the tail-anchored protein biogenesis machinery components GET3, GET4, and MDY2 (also known as GET5), suggesting that SGT2 has a function in the insertion of tail-anchored proteins into membranes. Consistent with this prediction, our experimental analysis shows that sgt2Δ cells show a defect in localization of the tail-anchored protein GFP-Sed5 from punctate Golgi structures to a more diffuse pattern, as seen in other genes involved in this pathway.
Our results show that multi-gene, detailed pathway networks can be reconstructed from quantitative GI data, providing a concrete computational manifestation to intuitions that have traditionally accompanied the manual interpretation of such data. Ongoing technological developments in both genetics and imaging are enabling the measurement of GI data at a genome-wide scale, using high-accuracy quantitative phenotypes that relate to a range of particular biological functions. Methods based on RNAi will soon allow collection of similar data for human cell lines and other mammalian systems (Moffat et al, 2006). Thus, computational methods for analyzing GI data could have an important function in mapping pathways involved in complex biological systems including human cells.
High-throughput quantitative genetic interaction (GI) measurements provide detailed information regarding the structure of the underlying biological pathways by reporting on functional dependencies between genes. However, the analytical tools for fully exploiting such information lag behind the ability to collect these data. We present a novel Bayesian learning method that uses quantitative phenotypes of double knockout organisms to automatically reconstruct detailed pathway structures. We applied our method to a recent data set that measures GIs for endoplasmic reticulum (ER) genes, using the unfolded protein response as a quantitative phenotype. The results provided reconstructions of known functional pathways including N-linked glycosylation and ER-associated protein degradation. It also contained novel relationships, such as the placement of SGT2 in the tail-anchored biogenesis pathway, a finding that we experimentally validated. Our approach should be readily applicable to the next generation of quantitative GI data sets, as assays become available for additional phenotypes and eventually higher-level organisms.
PMCID: PMC2913392  PMID: 20531408
computational biology; genetic interaction; pathway reconstruction; probabilistic methods
23.  Predicting and Validating Protein Interactions Using Network Structure 
PLoS Computational Biology  2008;4(7):e1000118.
Protein interactions play a vital part in the function of a cell. As experimental techniques for detection and validation of protein interactions are time consuming, there is a need for computational methods for this task. Protein interactions appear to form a network with a relatively high degree of local clustering. In this paper we exploit this clustering by suggesting a score based on triplets of observed protein interactions. The score utilises both protein characteristics and network properties. Our score based on triplets is shown to complement existing techniques for predicting protein interactions, outperforming them on data sets which display a high degree of clustering. The predicted interactions score highly against test measures for accuracy. Compared to a similar score derived from pairwise interactions only, the triplet score displays higher sensitivity and specificity. By looking at specific examples, we show how an experimental set of interactions can be enriched and validated. As part of this work we also examine the effect of different prior databases upon the accuracy of prediction and find that the interactions from the same kingdom give better results than from across kingdoms, suggesting that there may be fundamental differences between the networks. These results all emphasize that network structure is important and helps in the accurate prediction of protein interactions. The protein interaction data set and the program used in our analysis, and a list of predictions and validations, are available at
Author Summary
To understand the complex activities within an organism, a complete and error-free network of the protein interactions that occur in the organism would be a significant step forward. The large amount of experimentally derived data now available has provided us with a chance to study the complicated behaviour of protein interactions. The power of such studies, however, has been limited by the high false positive and false negative rates in the datasets. We propose a network-based method, taking advantage of the tendency towards clustering in protein interaction networks, to validate experimental data and to predict unknown interactions. The integration of multiple protein characteristics (i.e., structure, function, etc.) allows our predictive method to significantly outperform two other approaches based on homology and protein-domain relationships on datasets which contain a large number of interactions, but not much detailed information on the proteins involved in the interactions. In addition, our predictive score based on triadic interaction patterns improves over a pair-wise approach, suggesting the importance of network structure. Moreover, using pooled interactions as prior information, we find evidence for fundamental differences in protein interaction networks between eukaryotes and prokaryotes.
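As a rough illustration of how local clustering can suggest missing interactions, the sketch below ranks non-interacting protein pairs by the number of interaction partners they share, since each shared partner would close a triangle. This is a simplified stand-in and not the triplet score of the paper, which also uses protein characteristics; the protein names P1 to P5 and the toy edge list are hypothetical.

```python
from itertools import combinations

def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shared_partners(graph, a, b):
    """Number of proteins that interact with both a and b."""
    return len(graph[a] & graph[b])

def rank_candidates(graph):
    """Score every pair with no observed interaction, highest score first."""
    scores = {}
    for a, b in combinations(sorted(graph), 2):
        if b not in graph[a]:
            scores[(a, b)] = shared_partners(graph, a, b)
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy network, not taken from any data set.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
             ("P2", "P4"), ("P3", "P4"), ("P4", "P5")]
toy_graph = build_graph(toy_edges)
ranked = rank_candidates(toy_graph)
for pair, score in ranked:
    print(pair, score)
```

In this toy network the pair (P1, P4) shares two partners and ranks first, while (P1, P5) shares none.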
PMCID: PMC2435280  PMID: 18654616
24.  Elucidation of functional consequences of signalling pathway interactions 
BMC Bioinformatics  2009;10:370.
A great deal of data has accumulated on signalling pathways. These large datasets are thought to contain much implicit information on molecular structure, interaction and activity, providing a picture of the intricate molecular networks believed to underlie biological functions. While tremendous advances have been made in trying to understand these systems, how information is transmitted within them is still poorly understood. This ever-growing amount of data demands that we adopt powerful computational techniques that will play a pivotal role in the conversion of mined data to knowledge, and in elucidating the topological and functional properties of protein-protein interactions.
A computational framework is presented which allows for the description of embedded networks, and the identification of common shared components thought to assist in the transmission of information within the systems studied. By employing the graph-theoretic measures of network biology (such as degree distribution, clustering coefficient, vertex betweenness and shortest path measures), topological features of protein-protein interactions were ascertained for published datasets of the p53, nuclear factor kappa B (NF-κB) and G1/S phase of the cell cycle systems. Highly ranked nodes were determined, some of which were identified as connecting proteins most likely responsible for the propagation of transduction signals across the networks. The functional consequences of these nodes in the context of their network environment were also determined. These findings highlight the usefulness of the framework in identifying possible combinations or links as targets for therapeutic responses, and put forward the idea of using retrieved knowledge on the shared components in constructing better organised and structured models of signalling networks.
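The topological measures named above can be sketched on a small undirected graph. The example below computes node degree, the local clustering coefficient and a breadth-first shortest path length; it is a minimal illustration under assumed inputs, not the framework described in the abstract. The node labels A to E and the edges are hypothetical, and vertex betweenness is omitted for brevity.

```python
from collections import deque

def degree(graph, node):
    """Number of direct interaction partners of a node."""
    return len(graph[node])

def clustering_coefficient(graph, node):
    """Fraction of neighbour pairs of a node that are themselves connected."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in graph[a])
    return 2.0 * links / (k * (k - 1))

def shortest_path_length(graph, source, target):
    """Breadth-first search distance in edges; None if target is unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None

# Hypothetical toy network given as an adjacency map of sets.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
print(degree(toy, "C"))
print(clustering_coefficient(toy, "C"))
print(shortest_path_length(toy, "A", "E"))
```

Here node C has the highest degree and lies on every path from A or B to D and E, which is the kind of connecting position such measures are meant to expose.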
It is hoped that, through the data-mined reconstructed signal transduction networks, well-developed models of the published data can be built, which would ultimately guide the prediction of new targets based on the pathway's environment for further analysis. Source code is available upon request.
PMCID: PMC2778660  PMID: 19895694
25.  An interdomain sector mediating allostery in Hsp70 molecular chaperones 
The Hsp70 family of molecular chaperones provides a well defined and experimentally powerful model system for understanding allosteric coupling between different protein domains. New extensions to the statistical coupling analysis (SCA) method permit identification of a group of co-evolving amino-acid positions (a sector) in the Hsp70 that is associated with allosteric function. Literature-based and new experimental studies support the notion that the protein sector identified through SCA underlies the allosteric mechanism of Hsp70. This work extends the concept of protein sectors by showing that two non-homologous protein domains can share a single sector when the underlying biological function is defined by the coupled activity of the two domains.
Allostery is a biologically critical property by which distantly positioned functional surfaces on proteins functionally interact. This property remains difficult to elucidate at a mechanistic level (Smock and Gierasch, 2009) because long-range coupling within proteins arises from the cooperative action of groups of amino acids. As a case study, consider the Hsp70 molecular chaperones, a large and diverse family of two-domain allosteric proteins required for cellular viability in nearly every organism (Figure 1) (Mayer and Bukau, 2005). In the ADP-bound state, the two domains act independently: the C-terminal substrate-binding domain displays a stable configuration in which the so-called ‘lid’ region is docked against the β-sandwich subdomain, and substrates bind with relatively high affinity (Figure 1A) (Moro et al, 2003; Swain et al, 2007; Bertelsen et al, 2009). Exchange of ADP for ATP in the N-terminal nucleotide-binding domain causes significant local and propagated conformational change, formation of an interface with the substrate-binding domain, opening of the lid subdomain, and a decrease in the binding affinity for substrates (Figure 1B) (Rist et al, 2006; Swain et al, 2007). Upon ATP hydrolysis by the nucleotide-binding domain, Hsp70 is returned to the ADP-bound configuration suitable for another round of substrate binding and release. This process of cyclical substrate binding and release underlies all biological functions of Hsp70 proteins.
What is the structural basis for the long-range functional coupling within Hsp70? When allostery is a conserved property of a protein family, one approach to this problem is to analyze the correlated evolution of amino acids in the family, which is the expected statistical signature of cooperative action of protein residues (Lockless and Ranganathan, 1999; Kass and Horovitz, 2002; Suel et al, 2003). Previous work using an implementation of this concept (the statistical coupling analysis or SCA) showed that proteins contain sparse networks of co-evolving amino acids termed ‘sectors’ that link protein active sites with distinct functional surfaces through the protein core (Halabi et al, 2009). This architecture is consistent with known allosteric mechanisms in protein domains (Suel et al, 2003; Halabi et al, 2009).
However, the principle of co-evolution of protein residues need not be limited to the study of individual protein domains. Indeed, conserved allosteric coupling between two (or more) non-homologous domains implies the existence of shared sectors that span functional sites on different domains. Here, we test this concept by extending the SCA method to consider the allosteric mechanism acting between the two domains of the Hsp70 proteins. Hsp70-like proteins include not only the allosteric Hsp70s, but also the Hsp110s—homologs that contain both domains and are regarded as structural models for Hsp70s, but that do not exhibit allosteric coupling. In this study, we take advantage of the functional divergence between the Hsp70s and Hsp110s to reveal patterns of co-evolution between amino acids that are specifically associated with the allosteric mechanism.
To identify the allosteric sector in Hsp70, we used SCA to compute a weighted correlation matrix, C̃, that describes the co-evolution of every pair of amino-acid positions in a sequence alignment of 926 members of the Hsp70/110 family. We then applied a mathematical method known as singular value decomposition to simultaneously evaluate the pattern of divergence between sequences and the pattern of co-evolution between amino-acid positions. The basic idea is that if the pattern of sequence divergence is able to classify members of a protein family into distinct functional subgroups, then we can rigorously identify the group of co-evolving residues that correspond to the underlying mechanism. Figure 2A shows the principal axis of sequence variation in the Hsp70/110 family, showing a clear separation of the allosteric (Hsp70) and non-allosteric (Hsp110) members of this family. The corresponding axis of co-evolution between amino-acid positions reveals a subset of Hsp70/110 positions (∼20%, 115 residues out of 605 total) that underlie the divergence of Hsp70 and Hsp110 proteins (Figure 2B). These positions derive roughly equally from the nucleotide-binding domain (in blue, 56 positions) and the substrate-binding domain (in green, 59 positions) and are more conserved within the Hsp70 sub-family. These results define a protein sector that is predicted to underlie the allosteric mechanism of Hsp70.
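The idea that one decomposition yields both an axis of sequence divergence and an axis of positional co-variation can be sketched on a toy binary alignment. The example below is a minimal illustration, not the SCA implementation: it omits the conservation weighting that defines C̃, uses a hypothetical 6-sequence by 4-position matrix, and finds only the leading singular triplet by power iteration instead of a full singular value decomposition.

```python
import math

def center_columns(X):
    """Subtract each column mean so that columns describe variation only."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]

def top_singular_triplet(X, iters=200):
    """Leading singular value and vectors of X via power iteration on X^T X."""
    n_rows, n_cols = len(X), len(X[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iters):
        Xv = [sum(row[j] * v[j] for j in range(n_cols)) for row in X]
        w = [sum(X[i][j] * Xv[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Xv = [sum(row[j] * v[j] for j in range(n_cols)) for row in X]
    sigma = math.sqrt(sum(x * x for x in Xv))
    u = [x / sigma for x in Xv]  # one weight per sequence
    return sigma, u, v           # v holds one weight per position

# Hypothetical alignment: 1 marks a chosen residue at that position.
# Rows 0-2 stand for one subfamily, rows 3-5 for the other; positions 0 and 1
# vary together with subfamily membership, positions 2 and 3 do not.
alignment = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
sigma, u, v = top_singular_triplet(center_columns(alignment))
print("sequence axis:", [round(x, 3) for x in u])
print("position axis:", [round(x, 3) for x in v])
```

In this toy case the sequence weights take opposite signs for the two subfamilies, and the position weights are concentrated on positions 0 and 1, the analogue of a sector associated with the divergence between subfamilies.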
What is the structural arrangement of the putative allosteric sector within the Hsp70 protein? Consistent with a function in allosteric coupling, the 115 sector residues form a physically contiguous network of atoms, linking the ATP-binding site on the nucleotide-binding domain to the substrate recognition site on the substrate-binding domain through the interdomain interface (Figure 2C). The physical connectivity is remarkable given that only ∼20% of overall Hsp70 residues are involved (Figure 2B). Thus, functionally coupled but non-homologous protein domains can share a single sector of co-evolving residues that connects their respective functional sites.
We compared the Hsp70 sector mapping with the large body of biochemical studies that have been carried out in this family. We find strong experimental support for the involvement of sector positions in the Hsp70 allosteric mechanism in several regions: (1) within the ATP-binding site, (2) at the interface linking the two domains, and (3) within the β-sandwich core of the substrate-binding domain. The sector analysis also makes predictions about the involvement of some previously untested residues; we show that mutations at two such sites in fact reduce the allosteric coupling within Hsp70 in vitro and fail to complement a DnaK knockout strain of E. coli in a stress-response assay. Taken together, we conclude that sector positions are associated with the allosteric mechanism of Hsp70.
This work also adds a new finding with regard to the concept of protein sectors. Previous work showed that multiple quasi-independent sectors, each of which contributes a different aspect of function, are possible within a single protein domain (Halabi et al, 2009). This work shows that a single sector can also span two different protein domains when biological function (here, nucleotide-dependent substrate binding) arises from their coupled action. This result emphasizes the point that sectors are units of functional selection and are not obviously related to traditional hierarchies of structural organization in proteins. An interesting possibility is that allostery between proteins might evolve through the joining of protein sectors, a conjecture that can be tested in future work.
Allosteric coupling between protein domains is fundamental to many cellular processes. For example, Hsp70 molecular chaperones use ATP binding by their actin-like N-terminal ATPase domain to control substrate interactions in their C-terminal substrate-binding domain, a reaction that is critical for protein folding in cells. Here, we generalize the statistical coupling analysis to simultaneously evaluate co-evolution between protein residues and functional divergence between sequences in protein sub-families. Applying this method in the Hsp70/110 protein family, we identify a sparse but structurally contiguous group of co-evolving residues called a ‘sector’, which is an attribute of the allosteric Hsp70 sub-family that links the functional sites of the two domains across a specific interdomain interface. Mutagenesis of Escherichia coli DnaK supports the conclusion that this interdomain sector underlies the allosteric coupling in this protein family. The identification of the Hsp70 sector provides a basis for further experiments to understand the mechanism of allostery and introduces the idea that cooperativity between interacting proteins or protein domains can be mediated by shared sectors.
PMCID: PMC2964120  PMID: 20865007
allostery; chaperone; co-evolution; SCA; sector
