We consider the problem of building a model to predict protein-protein interactions (PPIs) between the bacterial species Salmonella Typhimurium and the plant host Arabidopsis thaliana, a host-pathogen pair for which no known PPIs are available. To achieve this, we present approaches that use homology and statistical learning methods called “transfer learning.” In the transfer learning setting, the task of predicting PPIs between Arabidopsis and its pathogen S. Typhimurium is called the “target task.” The presented approaches utilize labeled data, i.e., known PPIs of other host-pathogen pairs (we call these PPIs the “source tasks”). The homology-based approaches use heuristics based on biological intuition to predict PPIs. The transfer learning methods use the similarity of the PPIs from the source tasks to the target task to build a model. For a quantitative evaluation, we consider Salmonella-mouse PPI prediction and some other host-pathogen tasks where known PPIs exist. We use metrics such as precision and recall, and our results show that our methods perform well on the target task in various transfer settings. We present a brief qualitative analysis of the predicted Arabidopsis-Salmonella interactions. We filter the predictions from all approaches using Gene Ontology term enrichment and retain only those interactions involving Salmonella effectors. We observe that Arabidopsis proteins involved in, e.g., transcriptional regulation, hormone-mediated signaling and defense response may be affected by Salmonella.
protein interaction prediction; host pathogen protein interactions; plant pathogen protein interactions; machine learning methods; transfer learning; kernel mean matching
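The keywords name kernel mean matching (KMM). KMM is a standard way to reweight source-task examples so that their feature distribution resembles that of the target task. The abstract does not give the exact formulation used, so the following is only a minimal NumPy sketch of the usual KMM quadratic program, minimize ½βᵀKβ − κᵀβ subject to 0 ≤ β_i ≤ B, solved here by projected gradient descent. It assumes an RBF kernel. For simplicity it also drops the extra constraint that keeps the mean weight close to 1. The function names and parameter values are illustrative.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kmm_weights(Xs, Xt, gamma=1.0, upper=10.0, n_iter=2000):
    """Kernel mean matching weights for source rows Xs given target rows Xt.

    Minimizes 0.5 * beta^T K beta - kappa^T beta subject to 0 <= beta <= upper.
    Simplification (assumption): the usual |sum(beta) - n_s| <= n_s * eps
    constraint is omitted, and projected gradient replaces a QP solver.
    """
    ns, nt = len(Xs), len(Xt)
    K = rbf_kernel(Xs, Xs, gamma)
    kappa = (ns / nt) * rbf_kernel(Xs, Xt, gamma).sum(axis=1)
    step = 1.0 / np.linalg.eigvalsh(K).max()  # 1 / Lipschitz constant of the gradient
    beta = np.ones(ns)
    for _ in range(n_iter):
        grad = K @ beta - kappa
        beta = np.clip(beta - step * grad, 0.0, upper)  # project onto the box
    return beta
```

In a transfer setting, each weight would multiply the loss of one source-task example when a classifier is trained on the source PPIs. Source examples that look like the target distribution receive large weights, and unrepresentative ones are pushed toward zero.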
Salmonellosis is the most frequent foodborne disease worldwide and can be transmitted to humans by a variety of routes, especially via animal and plant products. Salmonella bacteria are believed to use not only animal and human but also plant hosts, despite their evolutionary distance. This raises the question of whether Salmonella employs similar mechanisms in infecting these diverse hosts. Given that most of our understanding comes from its interaction with human hosts, we investigate here to what degree knowledge of Salmonella–human interactions can be transferred to the Salmonella–plant system. We review recent publications on the analysis and prediction of Salmonella–host interactomes. Putative protein–protein interactions (PPIs) between Salmonella and its human and Arabidopsis hosts were retrieved with two kinds of approaches. Purely interolog-based approaches infer predictions from the available sequence and domain information of known PPIs. Machine learning approaches integrate a larger set of useful information from different sources. Transfer learning is an especially suitable machine learning technique for predicting plant host targets from knowledge of human host targets. A comparison of the prediction results with transcriptomic data shows, in both host species, a clear overlap between the host proteins predicted to be targeted via PPIs and those regulated at the level of gene expression, as reflected in their gene ontology enrichment. In particular, the cellular processes that Salmonella interferes with in both plants and humans are catabolic processes. The details of how these processes are targeted, however, are quite different between the two organisms, as expected from their evolutionary and habitat differences. Possible implications of this observation for the evolution of host–pathogen communication are discussed.
host–pathogen interactions; systems biology; prediction; pathways; interactome
The central role of mitochondria in metabolic pathways and in cell death mechanisms requires sophisticated signaling systems. Essential in this signaling process is an array of lipid mediators derived from polyunsaturated fatty acids. However, the molecular machinery for the production of oxygenated polyunsaturated fatty acids is localized in the cytosol and their biosynthesis has not been identified in mitochondria. Here we report that a range of diversified polyunsaturated molecular species derived from a mitochondria-specific phospholipid, cardiolipin, are oxidized by the intermembrane space hemoprotein, cytochrome c. We show that an assortment of oxygenated cardiolipin species undergoes phospholipase A2-catalyzed hydrolysis thus generating multiple oxygenated fatty acids, including well known lipid mediators. This represents a new biosynthetic pathway for lipid mediators. We demonstrate that this pathway including oxidation of polyunsaturated cardiolipins and accumulation of their hydrolysis products – oxygenated linoleic, arachidonic acids and monolyso-cardiolipins – is activated in vivo after acute tissue injury.
A key challenge in interdisciplinary research is choosing the best approach from a large number of techniques derived from different disciplines and their interfaces.
To address this challenge in the area of Biophysics and Structural Biology, we have designed a graduate-level course that teaches students the insightful use of experimental biophysical approaches to address biological questions about biomolecular interactions and dynamics. A weekly seminar and a data and literature club complement the training in class. The course contains wet-laboratory experimental demonstrations and real-data analysis as well as lectures, grant proposal preparation and assessment, and student presentation components. Active student participation is mandatory in all aspects of the class. Students prepare class materials with individual and iterative feedback from course directors and local experts, which results in high-quality classroom presentations.
The ultimate goal of the course is to teach students the skills needed to weigh different experimental approaches against each other in addressing a specific biological question, by thinking like faculty and executing academic tasks as faculty do.
Teaching; Interdisciplinary education; Molecular biophysics and structural biology; Student lectures
Eosinophil peroxidase (EPO) is one of the major oxidant-producing enzymes during inflammatory states in the human lung. We report the degradation of single-walled carbon nanotubes (SWCNTs) upon incubation with human EPO and H2O2. Biodegradation of SWCNTs is higher in the presence of NaBr. Neither EPO alone nor H2O2 alone causes degradation of the nanotubes. Molecular modeling reveals two binding sites for SWCNTs on EPO. One is located on the proximal side (the same side as the catalytic site) and the other on the distal side of EPO. In both cases, the oxidized groups on the SWCNTs are stabilized by electrostatic interactions with positively charged residues. Biodegradation of SWCNTs can also be achieved in an ex vivo culture system using primary murine eosinophils stimulated to undergo degranulation. Biodegradation is confirmed by a range of methods, including transmission electron microscopy, UV-visible-NIR spectroscopy, Raman spectroscopy, and confocal Raman imaging. Thus, human EPO (in vitro) and ex vivo activated eosinophils mediate biodegradation of SWCNTs, an observation that is relevant to pulmonary responses to these materials.
Recognition of injured mitochondria for degradation by macroautophagy is essential for cellular health, but the mechanisms remain poorly understood. Cardiolipin is an inner mitochondrial membrane phospholipid. We found that rotenone, staurosporine, 6-hydroxydopamine and other pro-mitophagy stimuli caused externalization of cardiolipin to the mitochondrial surface in primary cortical neurons and SH-SY5Y cells. RNAi knockdown of cardiolipin synthase or of phospholipid scramblase-3, which transports cardiolipin to the outer mitochondrial membrane, decreased mitochondrial delivery to autophagosomes. Furthermore, we found that the autophagy protein microtubule-associated protein 1 light chain 3 (LC3), which mediates both autophagosome formation and cargo recognition, contains cardiolipin-binding sites important for the engulfment of mitochondria by the autophagic system. Mutation of LC3 residues predicted as cardiolipin-interaction sites by computational modeling inhibited its participation in mitophagy. These data indicate that redistribution of cardiolipin serves as an “eat-me” signal for the elimination of damaged mitochondria from neuronal cells.
Ca2+-independent lipoprotein-associated phospholipase A2 (Lp-PLA2) is a member of the phospholipase A2 superfamily. It is distinguished by high specificity for oxidatively modified sn-2 fatty acid residues in phospholipids, a property especially well characterized for peroxidized species of phosphatidylcholines (PC). Peroxidized species of phosphatidylserine (PS) act as a recognition signal for the clearance of apoptotic cells by professional phagocytes. However, neither the ability of Lp-PLA2 to hydrolyze these species nor the products of this reaction have been investigated. We performed LC-MS-ESI-based structural characterization of oxygenated/hydrolyzed molecular species of PS formed in a cytochrome c/H2O2-driven enzymatic oxidation reaction. These PS species contained linoleic acid either in the sn-2 position only (C18:0/C18:2) or in both the sn-1 and sn-2 positions (C18:2/C18:2). Cytochrome c was chosen as the catalyst of peroxidation reactions because of its likely involvement in PS oxidation in apoptotic cells. We found that Lp-PLA2 catalyzed the hydrolysis of both non-truncated and truncated (oxidatively fragmented) oxidized PS species, albeit with different efficiencies. We characterized in detail the major reaction products: oxygenated derivatives of linoleic acid as well as non-oxygenated and oxygenated species of lyso-PS. Among linoleic acid products, derivatives oxygenated at the C9 position were the most abundant. These include 9-hydroxyoctadecadienoic acid (9-HODE), a potent ligand of the G protein-coupled receptor G2A. Computer modeling of the interactions of Lp-PLA2 with different oxidized PS species indicated that they are able to bind within 5 Å of Ser273 and His351 of the catalytic triad. For the 9-hydroxy and 9-hydroperoxy derivatives of oxidized PS, the sn-2 ester bond was positioned within 3 Å of Ser273, the nucleophile that directly attacks the sn-2 bond, thus favoring the hydrolysis reaction.
We suggest that oxidatively modified free fatty acids and lyso-PS species generated by Lp-PLA2 may represent important signals facilitating and regulating execution of apoptotic and phagocytosis programs essential for control of inflammation.
G protein coupled receptors (GPCRs) bind diverse classes of ligands. Depending on the receptor, these may bind in the transmembrane or the extracellular domain, demonstrating that GPCRs are in principle able to bind ligands in either domain. Most recently, it was also observed that small-molecule ligands can bind in the cytoplasmic domain and modulate binding of, and response to, extracellular or transmembrane ligands. Thus, all three domains in GPCRs are potential sites for allosteric ligands, and whether a ligand is allosteric or orthosteric depends on the receptor. Here, we review the evidence supporting the presence of putative binding pockets in all three domains of GPCRs and discuss possible pathways of communication between these pockets.
Rhodopsin; Metabotropic Glutamate Receptors; Allosteric Network; Communication; Membrane Proteins
Motivation: An important aspect of infectious disease research involves understanding the differences and commonalities in the infection mechanisms underlying various diseases. Systems biology-based approaches study infectious diseases by analyzing the interactions between the host species and the pathogen organisms. This work aims to combine the knowledge from experimental studies of host–pathogen interactions in several diseases to build stronger predictive models. Our approach is based on a formalism from machine learning called ‘multitask learning’, which considers the problem of building models across tasks that are related to each other. A ‘task’ in our scenario is the set of host–pathogen protein interactions involved in one disease. To integrate interactions from several tasks (i.e. diseases), our method exploits the similarity in the infection process across the diseases. In particular, we use the biological hypothesis that similar pathogens target the same critical biological processes in the host to define a common structure across the tasks.
Results: Our current work on host–pathogen protein interaction prediction focuses on human as the host and four bacterial species as pathogens. The multitask learning technique we develop uses a task-based regularization approach. We find that the resulting objective is a difference of convex (DC) functions. To optimize it, we implement an algorithm based on the convex–concave procedure. We compare our integrative approach to baseline methods that build models on a single host–pathogen protein interaction dataset. Our results show that our approach outperforms the baselines on the training data. We further analyze the protein interaction predictions generated by the models and find some interesting insights.
Availability: The predictions and code are available at: http://www.cs.cmu.edu/~mkshirsa/ismb2013_paper320.html
Supplementary data are available at Bioinformatics online.
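The Results paragraph mentions a difference-of-convex (DC) objective optimized with the convex–concave procedure (CCCP). The abstract does not spell out the paper's multitask objective. So, as a hedged illustration of CCCP itself, the sketch below minimizes the toy DC function h(x) = Σ x_i⁴ − 2 Σ x_i². At each step, the subtracted convex part g(x) = 2 Σ x_i² is replaced by its linearization at the current iterate. The remaining convex subproblem then has a closed-form solution. The function name and the toy objective are assumptions made for illustration.

```python
import numpy as np


def cccp_toy(x0, n_iter=50):
    """Minimize h(x) = f(x) - g(x), f(x) = sum(x**4), g(x) = 2 * sum(x**2).

    Both f and g are convex, so h is a difference of convex functions.
    Returns the final iterate and the objective value after each step.
    """
    x = np.asarray(x0, dtype=float)
    history = []
    for _ in range(n_iter):
        grad_g = 4.0 * x  # gradient of g at the current iterate
        # Convex subproblem: argmin_x f(x) - grad_g . x, i.e. solve 4 x**3 = grad_g.
        x = np.cbrt(grad_g / 4.0)
        history.append(float(np.sum(x**4) - 2.0 * np.sum(x**2)))
    return x, history
```

Starting from [0.5, -0.2], the iterates approach the local minimizers ±1 and the objective never increases. That monotone decrease is the property that makes CCCP attractive for DC problems. A start at exactly 0 stays at that stationary point.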
The pulmonary route represents one of the most important portals of entry for nanoparticles into the body. However, the in vivo interactions of nanoparticles with biomolecules of the lung have not been sufficiently studied. Here, we used an established mouse model of pharyngeal aspiration of single-walled carbon nanotubes (SWCNTs). We recovered SWCNTs from the bronchoalveolar lavage fluid (BALf) and purified them from possible contamination with lung cells. We then examined the composition of the phospholipids adsorbed on the SWCNTs by liquid chromatography mass spectrometry (LC-MS) analysis. We found that SWCNTs selectively adsorbed the two most abundant types of surfactant phospholipids: phosphatidylcholines (PC) and phosphatidylglycerols (PG). The molecular speciation of these phospholipids was also consistent with pulmonary surfactant. We quantified the adsorbed lipids by LC-MS and assessed phospholipid binding structurally by atomic force microscopy and molecular modeling. Together, these analyses indicated that the phospholipids (~108 molecules per SWCNT) formed an uninterrupted “coating”. In this coating, the hydrophobic alkyl chains of the phospholipids were adsorbed onto the SWCNT and the polar head groups pointed away from the SWCNT into the aqueous phase. In addition, surfactant proteins A, B and D were detected on SWCNTs by LC-MS. Finally, we demonstrated that the presence of this surfactant coating markedly enhanced the in vitro uptake of SWCNTs by macrophages. Taken together, this is the first demonstration of the in vivo adsorption of surfactant lipids and proteins on SWCNTs in a physiologically relevant animal model.
carbon nanotubes; surfactant; macrophages; mass spectrometric analysis
Salmonellosis, caused by Salmonella bacteria, is a foodborne disease and worldwide health threat that causes millions of infections and thousands of deaths every year. This pathogen infects an unusually broad range of host organisms, including humans and plants. A better understanding of the mechanisms of communication between Salmonella and its hosts requires identifying the interactions between Salmonella and host proteins. Protein-protein interactions (PPIs) are the fundamental building blocks of communication. Here we use the prediction platform BIANA to obtain the putative Salmonella-human and Salmonella-Arabidopsis interactomes based on sequence and domain similarity to known PPIs. A gold standard list of Salmonella-host PPIs served to validate the quality of the human model. We predicted 24,726 Salmonella-human and 10,926 Salmonella-Arabidopsis PPIs. These comprise interactions of 38 and 33 Salmonella effectors and virulence factors with 9,740 human and 4,676 Arabidopsis proteins, respectively. Putative hub proteins could be identified, and parallels between the two interactomes were discovered. This approach can provide insight into possible biological functions of so far uncharacterized proteins. The predicted interactions are available via a web interface at http://sbi.imim.es/web/SHIPREC.php. The interface allows users to filter the database according to their own parameters and thereby narrow down the list of suspected interactions.
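The interolog idea behind these predictions is as follows. A known interaction (a, b) is transferred to a new pair (a', b') when a' and b' are sufficiently similar to a and b. The minimal Python sketch below shows only that transfer rule. The homolog table, the protein identifiers, and the choice to score a predicted pair by the weaker of its two sequence identities are hypothetical assumptions. They are not BIANA's actual API or scoring scheme.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def predict_interologs(
    known_ppis: Set[Pair],
    homologs: Dict[str, List[Tuple[str, float]]],
    cutoff: float,
) -> Dict[Pair, float]:
    """Transfer known (pathogen, host) PPIs to homologous protein pairs.

    homologs maps a protein from a known PPI to (homolog, sequence identity)
    pairs in the species of interest (hypothetical input format).
    A predicted pair is scored by the lower of its two identities (assumption).
    """
    predicted: Dict[Pair, float] = {}
    for pathogen, host in known_ppis:
        for p2, id_p in homologs.get(pathogen, []):
            if id_p < cutoff:
                continue
            for h2, id_h in homologs.get(host, []):
                if id_h < cutoff:
                    continue
                score = min(id_p, id_h)
                key = (p2, h2)
                # Keep the best-supported evidence if several known PPIs map here.
                predicted[key] = max(predicted.get(key, 0.0), score)
    return predicted
```

For example, with one known pair ("pA", "hB") and a homolog table mapping "hB" to ("tB1", 0.6) and ("tB2", 0.3), a cut-off of 0.4 keeps only the homolog pair involving "tB1". Raising the cut-off shrinks the prediction set, which is the kind of user-set filter the web interface exposes.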
The risk of radionuclide release in terrorist acts and the exposure of healthy tissue during radiotherapy demand potent radioprotectants/radiomitigators. Ionizing radiation induces cell death by initiating the selective peroxidation of cardiolipin in mitochondria. This peroxidation is catalyzed by the peroxidase activity of the complex of cardiolipin with cytochrome c. It leads to the release of the hemoprotein into the cytosol and to commitment to the apoptotic program. Here we design and synthesize mitochondria-targeted triphenylphosphonium-conjugated imidazole-substituted oleic and stearic acids. These compounds block the peroxidase activity of the cytochrome c/cardiolipin complex by specifically binding to its heme iron. We show that both compounds inhibit pro-apoptotic oxidative events, suppress cyt c release, prevent cell death, and protect mice against lethal doses of irradiation. Imidazole-substituted oleic acid has significant radioprotective/radiomitigative effects when mice are treated at any time from 1 h before to 24 h after irradiation.
Salmonella bacteria cause millions of infections and thousands of deaths every year. This pathogen has an unusually broad host range, including humans, animals, and even plants. During infection, Salmonella expresses a variety of virulence factors and effectors that are delivered into the host cell. There they trigger cellular responses through protein–protein interactions (PPIs) with host cell proteins, which make the pathogen’s invasion and replication possible. To speed up proteomic efforts to elucidate Salmonella–host interactomes, we carried out a survey of the currently published Salmonella–host PPIs. Such a list can serve as the gold standard for computational models that predict Salmonella–host interactomes by integrating large-scale biological data sources. A manual literature and database search of >2200 journal articles and >100 databases resulted in a gold standard list of currently 62 PPIs. These are primarily interactions of Salmonella proteins with human and mouse proteins. Only six of these interactions were directly retrievable from PPI databases, and 16 were highlighted in databases featuring literature extracts. Thus, the literature survey resulted in the most complete interactome available to date for Salmonella. We performed pathway analysis using Ingenuity and Broad Gene Set Enrichment Analysis (GSEA) software. Besides general pathways such as MAPK signaling, it revealed in particular pathways related to cell death, cell morphology, turnover, and interactions. It also revealed pathways involved in responses to infection by Salmonella and by other viral and bacterial pathogens. The list of interactions is available at http://www.shiprec.org/indicationslist.htm
Interactome; Pathway analysis; Protein–protein interaction; Salmonella; Subnetwork analysis
Motivation: Approaches that use supervised machine learning techniques for protein–protein interaction (PPI) prediction typically use features obtained by integrating several sources of data. Often, certain attributes of the data are not available, resulting in missing values. In particular, 58–85% of the values in our host–pathogen PPI datasets are missing, which makes it challenging to apply machine learning algorithms.
Results: We show that specialized techniques for missing value imputation can improve the performance of the models significantly. We use cross-species information in combination with machine learning techniques such as group lasso with ℓ1/ℓ2 regularization. We demonstrate the benefits of our approach on two PPI prediction problems. In our first example, Salmonella–human PPI prediction, we obtain high prediction accuracy, with 77.6% precision and 84% recall. Comparison with various other techniques shows an improvement of 9 in F1 score over the next best technique. We also successfully apply our method to Yersinia–human PPI prediction, demonstrating the generality of our approach.
Availability: Predicted interactions, datasets, features are available at: http://www.cs.cmu.edu/~mkshirsa/eccb2012_paper46.html.
Supplementary data are available at Bioinformatics online.
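The Results paragraph names group lasso with ℓ1/ℓ2 regularization. The abstract does not say exactly how it is combined with cross-species imputation. The following is therefore only a generic sketch of group-lasso least-squares regression, solved by proximal gradient descent (ISTA). The ℓ1/ℓ2 penalty λ Σ_g ||w_g||₂ is handled by block soft-thresholding, which sets whole groups of coefficients exactly to zero. The group layout, λ, and the function names are illustrative assumptions.

```python
import numpy as np


def group_soft_threshold(w, groups, thresh):
    """Proximal operator of thresh * sum_g ||w_g||_2 (block soft-thresholding)."""
    out = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        # A group whose norm falls below the threshold is zeroed as a whole.
        out[g] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * w[g]
    return out


def group_lasso(X, y, groups, lam, n_iter=1000):
    """Minimize (1/2n)||Xw - y||^2 + lam * sum_g ||w_g||_2 by proximal gradient."""
    n, d = X.shape
    w = np.zeros(d)
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = group_soft_threshold(w - step * grad, groups, step * lam)
    return w
```

Grouping is what distinguishes this from the ordinary lasso. If, for instance, all features derived from one data source form a group, that whole source is either kept or dropped together.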
Formation of the cytochrome c (cyt c)/cardiolipin (CL) peroxidase complex, which is selective towards peroxidation of polyunsaturated CLs, is a prerequisite for mitochondrial membrane permeabilization. Tyrosine residues, via the generation of tyrosyl radicals (Tyr•), are likely reactive intermediates of the peroxidase cycle leading to CL peroxidation. We used mutants of horse heart cyt c in which each of the four Tyr residues was replaced by Phe and assessed the contribution of each residue to peroxidase catalysis. The Tyr67Phe mutation was associated with a partial loss of the oxygenase function of the cyt c/CL complex. It also gave the lowest concentration of H2O2-induced Tyr radicals in electron paramagnetic resonance (EPR) spectra. Our MS experiments directly demonstrated decreased production of CL hydroperoxides (CL-OOH) by the Tyr67Phe mutant. Similarly, oxidation of a phenolic substrate, Amplex Red, was affected to a greater extent in Tyr67Phe than in the three other mutants. The Tyr67Phe mutant was highly resistant to H2O2-induced oligomerization. Tyr fluorescence measurements, heteronuclear NMR spectroscopy and computer simulations place Tyr67 close to the heme iron of the porphyrin ring. They also place it close to Met80, one of the two axial heme-iron ligand residues. Thus, the highly conserved Tyr67 is a likely electron donor (radical acceptor) in the oxygenase half-reaction of the cyt c/CL peroxidase complex.
cytochrome c; cardiolipin; tyrosine; cardiolipin hydroperoxides; peroxidase
Viruses depend on their hosts at every stage of their life cycles and must therefore communicate with them via protein-protein interactions (PPIs). To investigate the mechanisms of communication by different viruses, we overlay reported pairwise human-virus PPIs on human signalling pathways. Of 671 pathways obtained from the NCI and Reactome databases, 355 are potentially targeted by at least one virus. The majority of pathways are linked to more than one virus. We find evidence supporting the hypothesis that viruses often interact with different proteins depending on the targeted pathway. Pathway analysis indicates overrepresentation of some pathways targeted by viruses. The merged network of the most statistically significant pathways shows several centrally located proteins, which are also hub proteins. Generally, hub proteins are targeted more frequently by viruses. Numerous proteins in virus-targeted pathways are known drug targets. This suggests that these proteins might be exploited in new treatment approaches against multiple viruses.
systems biology; protein-protein interaction networks; host-pathogen interactions; protein function; signal transduction; human immunodeficiency virus-1; human papillomavirus; bovine papillomavirus; Epstein-Barr virus; human herpesvirus
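The abstract reports that some pathways are overrepresented among virus-targeted proteins but does not name the statistical test. A one-sided hypergeometric test is one common choice. The sketch below computes its tail probability using only the standard library. The function and parameter names are illustrative assumptions:

- N is the number of human proteins in the background.
- K is the number of proteins in one pathway.
- n is the number of virus-targeted proteins.
- k is the number of targeted proteins in that pathway.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background proteins, K: proteins in the pathway,
    n: virus-targeted proteins, k: targeted proteins in the pathway.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

A small p-value means the pathway contains more targeted proteins than random sampling from the background would explain. With hundreds of pathways tested, as here, the p-values would normally be corrected for multiple testing, for example with the Bonferroni or Benjamini–Hochberg procedure.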
G protein coupled receptors (GPCRs) are seven-helix transmembrane proteins that function as signal transducers. They bind ligands in their extracellular and transmembrane regions and activate cognate G proteins at their intracellular surface, on the other side of the membrane. The relay of allosteric communication between the ligand binding site and the distant G protein binding site is poorly understood. In this study, we used GREMLIN, a recently developed method that identifies networks of co-evolving residues from multiple sequence alignments. We applied it to identify residues that may be involved in communicating the activation signal across the membrane. The GREMLIN-predicted long-range interactions between amino acids were analyzed with respect to the seven GPCR structures that had been crystallized at the time this study was undertaken.
The edges identified by GREMLIN are significantly enriched in residues that are part of the ligand binding pocket, compared with a control distribution of edges drawn from a random graph. An analysis of these edges reveals a minimal GPCR binding pocket containing four residues: T118^3.33, M207^5.42, Y268^6.51 and A292^7.39. Additionally, ten residues were predicted to have the most long-range interactions: A117^3.32, A272^6.55, E113^3.28, H211^5.46, S186^EC2, A292^7.39, E122^3.37, G90^2.57, G114^3.29 and M207^5.42. Nine of these ten are part of the ligand binding pocket.
We demonstrate the use of GREMLIN to reveal a network of statistically correlated and functionally important residues in class A GPCRs. GREMLIN showed that ligand binding pocket residues are extensively correlated with distal residues. An analysis of the GREMLIN edges across multiple structures suggests that there may be a minimal binding pocket common to the seven crystallized GPCRs. Further, the activation of rhodopsin involves these long-range interactions between extracellular and intracellular domain residues, mediated by the retinal domain.
GPCR; GREMLIN; Long-range interactions; Ligand binding pocket; Graphical model
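GREMLIN learns a sparse graphical model (a Markov random field) from a multiple sequence alignment, which is beyond the scope of a short example. A much simpler stand-in can still convey the idea of scoring residue pairs by co-variation: the sketch below ranks pairs of alignment columns by mutual information. Mutual information is not GREMLIN's method. It does not separate direct couplings from indirect ones, and it omits the sequence weighting and pseudocounts usually applied in practice. The function names and the toy alignment are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def column_mi(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in counts_ij.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((counts_i[a] / n) * (counts_j[b] / n)))
    return mi


def top_coupled_pairs(msa, k=10):
    """Return the k column pairs with the highest mutual information."""
    length = len(msa[0])
    scores = {
        (i, j): column_mi(msa, i, j) for i, j in combinations(range(length), 2)
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In the toy alignment ["ALK", "VIK", "ALK", "VIK"], columns 0 and 1 change together and score ln 2, while the invariant column 2 carries no co-variation signal. The analysis above goes further by testing whether such high-scoring edges involve ligand binding pocket residues more often than edges drawn from a random graph.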
Protein–protein interactions (PPIs) play a crucial role in biology, and high-throughput experiments have greatly increased the coverage of known interactions. Still, the identification of inter- and intraspecies interactomes is far from complete. Experimental data can be complemented by predicting PPIs within an organism or between two organisms based on the known interactions of orthologous genes in other organisms (interologs). Here, we present the BIANA (Biologic Interactions and Network Analysis) Interolog Prediction Server (BIPS), which offers a web-based interface to facilitate PPI predictions based on interolog information. BIPS benefits from the capability of the BIANA framework to integrate several PPI-related databases. Additional metadata can be used to improve the reliability of the predicted interactions. The sensitivity and specificity of the server were calculated with a leave-one-out approach using known PPIs from different interactomes. The specificity is between 72 and 98%, whereas the sensitivity varies between 1 and 59%, depending on the sequence identity cut-off used to calculate similarities between sequences. BIPS is freely accessible at http://sbi.imim.es/BIPS.php.
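The abstract describes estimating sensitivity and specificity with a leave-one-out approach at several sequence identity cut-offs. The minimal sketch below implements that evaluation under stated assumptions. Each known pair is hidden in turn and counted as recovered if some other known pair is an interolog above the cut-off. The negatives are assumed to be non-interacting pairs. The similarity table, the protein names, and the helper functions are hypothetical, and BIPS's actual procedure may differ.

```python
def is_interolog_hit(query, known, sim, cutoff):
    """True if another known pair (c, d) has sim(a, c) and sim(b, d) >= cutoff."""
    a, b = query
    for c, d in known:
        if (c, d) == query:
            continue  # leave the query pair itself out
        if sim(a, c) >= cutoff and sim(b, d) >= cutoff:
            return True
    return False


def loo_sensitivity_specificity(positives, negatives, sim, cutoff):
    """Leave-one-out sensitivity on known PPIs, specificity on assumed negatives."""
    tp = sum(is_interolog_hit(p, positives, sim, cutoff) for p in positives)
    fp = sum(is_interolog_hit(q, positives, sim, cutoff) for q in negatives)
    return tp / len(positives), 1.0 - fp / len(negatives)


# Hypothetical pairwise sequence identities between proteins.
_SIM = {frozenset({"p1", "p1b"}): 0.8, frozenset({"h1", "h1b"}): 0.7}


def toy_sim(x, y):
    """Identity 1.0 for the same protein, else a lookup in the toy table."""
    return 1.0 if x == y else _SIM.get(frozenset({x, y}), 0.0)
```

On the toy data, a cut-off of 0.5 recovers two of three known pairs with no false positives. A cut-off of 0.75 recovers none. This illustrates how a stricter identity cut-off trades sensitivity for specificity, the same direction as the ranges reported in the abstract.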
Mortality from head and neck squamous cell carcinoma (HNSCC) is usually associated with locoregional invasion of the tumor into vital organs, including the airway. Understanding the signaling mechanisms that mediate HNSCC invasion may reveal novel therapeutic targets for intervention. The purpose of this study was to investigate the efficacy of combined inhibition of c-Src and PLCγ-1 in abrogating HNSCC invasion.
PLCγ-1 and c-Src inhibition was achieved by a combination of small-molecule inhibitors and dominant-negative approaches. The effect of PLCγ-1 and c-Src inhibition on the invasion of HNSCC cells was assessed in an in vitro Matrigel-coated transwell invasion assay. In addition, immunoprecipitation and in silico database mining were used to examine the interactions between PLCγ-1 and c-Src.
Here we demonstrate that inhibiting PLCγ-1 or c-Src attenuated EGF-stimulated HNSCC invasion. Inhibition was achieved with the PLC inhibitor U73122, with the Src family inhibitor AZD0530, or with dominant-negative constructs. Further, EGF stimulation increased the association between PLCγ-1 and c-Src in HNSCC cells. Combined inhibition of PLCγ-1 and c-Src further attenuated HNSCC cell invasion in vitro.
These cumulative results suggest that PLCγ-1 and c-Src activation contribute to HNSCC invasion downstream of EGFR and that targeting these pathways may be a novel strategy to prevent tumor invasion in HNSCC.
c-Src; PLCγ-1; head and neck squamous cell carcinoma; EGFR; invasion; combination therapy
Two-dimensional graphitic carbon is a new material with many emerging applications, and studying its chemical properties is an important goal. Here, we report a new phenomenon: the enzymatic oxidation of a single layer of graphitic carbon by horseradish peroxidase (HRP). In the presence of low concentrations of hydrogen peroxide (~40 µM), HRP catalyzed the oxidation of graphene oxide, which resulted in the formation of holes in its basal plane. During the same period of analysis, HRP failed to oxidize chemically reduced graphene oxide (RGO). The enzymatic oxidation was characterized by Raman, UV-Vis, EPR and FT-IR spectroscopy, TEM, AFM, SDS-PAGE, and GC-MS. Computational docking studies indicated that, for both graphene oxide and RGO, HRP binds preferentially to the basal plane rather than the edge. HRP is more dynamic on graphene oxide than on RGO. As a result, its heme active site came closer to graphene oxide than to RGO, thereby facilitating the oxidation of the basal plane of graphene oxide. We also studied the electronic properties of the reduced intermediate product, holey reduced graphene oxide (hRGO), using field-effect transistor (FET) measurements. RGO exhibited a V-shaped transfer characteristic similar to that of single-layer graphene, attributed to its zero band gap. In contrast, hRGO showed p-type semiconducting behavior with a positive shift of the Dirac point. This p-type behavior makes hRGO, which can be conceptualized as interconnected graphene nanoribbons, a potentially attractive material for FET sensors.
graphene; oxidation; microscopy; peroxidase; field-effect transistor
The mammalian dim-light photoreceptor rhodopsin is a prototypic G protein coupled receptor (GPCR) that interacts with the G protein transducin, with rhodopsin kinase, and with arrestin. All of these proteins interact with rhodopsin at its cytoplasmic surface. Structural and modeling studies have provided in-depth descriptions of the respective interfaces. Overlap of, and thus competition for, binding surfaces is a major regulatory mechanism for signal processing. Recently, it was found that the same surface is also targeted by small molecules. These ligands can directly interfere with the binding and activation of the proteins of the signal transduction cascade. They can also allosterically modulate the retinal ligand binding pocket. The targeted pocket contains residues that are highly conserved across Class A GPCRs. These findings therefore imply that it may be possible to target multiple GPCRs with the same ligand(s). This is desirable, for example, in complex diseases such as cancer, in which multiple GPCRs participate in the disease networks.
G protein coupled receptors; allostery; conformational changes; docking; protein-protein interactions
Human immunodeficiency virus-1 (HIV-1), the cause of acquired immune deficiency syndrome (AIDS), relies on human host cell proteins in virtually every aspect of its life cycle. Knowledge of the set of interacting human and viral proteins would greatly contribute to our understanding of the mechanisms of infection and subsequently to the design of new therapeutic approaches. This work is the first attempt to predict the global set of interactions between HIV-1 and human host cellular proteins. We propose a supervised learning framework that uses multiple data sources:
- co-occurrence of functional motifs and their interaction domains;
- protein classes;
- gene ontology annotations;
- posttranslational modifications;
- tissue distributions and gene expression profiles;
- topological properties of the human protein in the interaction network;
- the similarity of HIV-1 proteins to the known binding partners of the human protein.
We trained and tested a Random Forest (RF) classifier with this extensive feature set. The model’s predictions achieved an average mean average precision (MAP) score of 23%. One example among the predicted interactions is the pair of HIV-1 protein Tat and human vitamin D receptor. This interaction had recently been validated experimentally in independent work. The rank-ordered lists of predicted interacting pairs are a rich source for generating biological hypotheses. Among the novel predictions, transcription regulator activity, immune system process and macromolecular complex were the most significant molecular function, biological process and cellular compartment, respectively. Supplementary material is available at URL www.cs.cmu.edu/~oznur/hiv/hivPPI.html
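The model is evaluated by mean average precision (MAP). The abstract does not say over which rankings MAP is averaged; one possibility is one ranked list of candidate human partners per HIV-1 protein. The following is therefore a generic sketch. Average precision averages precision@k over the positions of the true interactions in a ranked list, and MAP averages that value over several lists. Relevance labels are assumed to be binary. This variant also assumes that every true interaction appears somewhere in the ranked list.

```python
def average_precision(ranked_labels):
    """Mean of precision@k over the ranks k that hold a true interaction."""
    hits, total = 0, 0.0
    for k, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            total += hits / k  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Average of the per-list average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, a list with labels [1, 0, 1, 0] has precision 1 at rank 1 and 2/3 at rank 3, giving an average precision of 5/6. Because precision is evaluated only at the ranks of true interactions, MAP rewards placing them near the top of the list. That makes it suited to rank-ordered candidate lists like the ones described above.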
TMpro is a transmembrane (TM) helix prediction algorithm that uses language-processing methodology to identify TM segments. It is primarily based on the analysis of statistical distributions of amino acid properties in transmembrane segments. This article describes how TMpro is made available on the internet through a web interface. The key features of the interface are: (i) output is generated in multiple formats, including a user-interactive graphical chart that allows TMpro-predicted segment locations to be compared with other labeled segments input by the user, such as predictions from other methods; (ii) up to 5000 sequences can be submitted at a time for prediction; and (iii) TMpro is available as a web server and is also published as a web service, so that the method can be accessed both by users and by other services, depending on the need for data integration.
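To illustrate the kind of statistic TMpro analyzes, the sketch below counts amino acid property classes in sliding windows along a sequence. The residue grouping in `PROPERTY_CLASS` and the default window length of 16 are hypothetical simplifications chosen for illustration, not TMpro's actual encoding. The downstream statistical model that turns such counts into TM segment predictions is not shown.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical, simplified residue classes for illustration only;
# TMpro's actual property encoding is not reproduced here.
PROPERTY_CLASS: Dict[str, str] = {
    **{aa: "hydrophobic" for aa in "AVLIMFWC"},
    **{aa: "polar" for aa in "STNQGPY"},
    **{aa: "positive" for aa in "KRH"},
    **{aa: "negative" for aa in "DE"},
}


def window_property_counts(sequence: str, window: int = 16) -> List[Counter]:
    """Count residue property classes in each sliding window.

    Returns one Counter per window start position (len - window + 1 of
    them). Residues missing from PROPERTY_CLASS (e.g. 'X') count as
    'unknown'. The window length of 16 is a hypothetical default.
    """
    seq = sequence.upper()
    if window <= 0 or len(seq) < window:
        return []
    classes = [PROPERTY_CLASS.get(aa, "unknown") for aa in seq]
    counts = Counter(classes[:window])
    result = [+counts]  # unary + copies and drops non-positive entries
    for i in range(window, len(classes)):
        # Slide by one residue: add the entering class, remove the leaving one.
        counts[classes[i]] += 1
        counts[classes[i - window]] -= 1
        result.append(+counts)
    return result
```

Counts are updated incrementally as the window slides, so each sequence is processed in time linear in its length regardless of the window size.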
Isotope labeling of proteins is an important and often required tool for applying nuclear magnetic resonance (NMR) spectroscopy to the study of protein structure and dynamics. Mammalian expression systems have conventionally been considered too low-yielding and inefficient for protein expression. However, recent advances have significantly improved the expression levels of these systems. Here, we provide an overview of recent developments in expression strategies for mammalian systems with a view to NMR investigations.
Isotope labeling; Nuclear magnetic resonance; Recombinant protein expression; Human embryonic kidney cells
Protein–protein interactions play a key role in many biological systems. High-throughput methods can directly detect the set of interacting proteins in yeast, but the results are often incomplete and exhibit high false-positive and false-negative rates. Recently, many different research groups independently suggested using supervised learning methods to integrate direct and indirect biological data sources for the protein interaction prediction task. However, the data sources, approaches, and implementations varied. Furthermore, the protein interaction prediction task itself can be subdivided into prediction of (1) physical interaction, (2) co-complex relationship, and (3) pathway co-membership. To investigate systematically the utility of different data sources, and of the way the data are encoded as features, for predicting each of these types of protein interaction, we assembled a large set of biological features and varied their encoding for use in each of the three prediction tasks. Six different classifiers were used to assess prediction accuracy: Random Forest (RF), RF similarity-based k-Nearest-Neighbor, Naïve Bayes, Decision Tree, Logistic Regression, and Support Vector Machine. For all classifiers, the three prediction tasks had different success rates, and co-complex prediction appears to be an easier task than the other two. Independent of the prediction task, however, the RF classifier consistently ranked as one of the top two classifiers for all combinations of feature sets. We therefore used this classifier to study the importance of different biological datasets. First, we used the splitting function of the RF tree structure, the Gini index, to estimate feature importance. Second, we determined classification accuracy when only the top-ranking features were used as input to the classifier. We find that the importance of different features depends on the specific prediction task and on the way the features are encoded. Strikingly, gene expression is consistently the most important feature for all three prediction tasks, while the protein interactions identified using the yeast two-hybrid system were not among the top-ranking features under any condition.
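The Gini-based feature ranking mentioned above can be sketched as mean decrease in impurity. Each tree node's drop in Gini impurity, weighted by the number of samples reaching that node, is credited to the feature used to split it. The per-feature totals are then normalized to sum to 1. This is one common variant; libraries differ in how they weight and average across trees. The split records and the feature names `feature_a` and `feature_b` are hypothetical, and fitting the forest itself is not shown.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# (feature used at the node, labels at the node, left-child labels, right-child labels)
Split = Tuple[str, Sequence[int], Sequence[int], Sequence[int]]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_c p_c^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent: Sequence[int], left: Sequence[int],
                      right: Sequence[int]) -> float:
    """Drop in Gini impurity from splitting parent into left and right."""
    n = len(parent)
    if n == 0:
        return 0.0
    return (gini(parent)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))


def gini_importance(splits: List[Split]) -> Dict[str, float]:
    """Mean-decrease-in-impurity importance, normalized to sum to 1.

    Each node's impurity decrease is weighted by its sample count and
    summed per feature over all nodes (of all trees) supplied.
    """
    totals: Dict[str, float] = {}
    for feature, parent, left, right in splits:
        gain = len(parent) * impurity_decrease(parent, left, right)
        totals[feature] = totals.get(feature, 0.0) + gain
    norm = sum(totals.values())
    return {f: v / norm for f, v in totals.items()} if norm > 0 else totals
```

In a toy call with one perfect split on `feature_a` (`[1, 1, 0, 0]` into `[1, 1]` and `[0, 0]`) and one partial split on `feature_b` (`[1, 1, 1, 0]` into `[1, 1]` and `[1, 0]`), `feature_a` receives most of the credit. Weighting by sample count means splits near the root, which see more data, dominate the ranking.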
protein–protein interaction; high-throughput data; joint learning