Bioinformatics. Aug 15, 2012; 28(16): 2186–2188.
Published online Jun 11, 2012. doi: 10.1093/bioinformatics/bts331
PMCID: PMC3413384
ETAscape: analyzing protein networks to predict enzymatic function and substrates in Cytoscape
Benjamin J. Bachman (1,2,3), Eric Venner (1,2), Rhonald C. Lua (1), Serkan Erdin (1) and Olivier Lichtarge (1,2,3,*)
(1) Department of Molecular and Human Genetics, (2) Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030 and (3) W. M. Keck Center for Interdisciplinary Bioscience Training, Houston, TX 77005, USA
*To whom correspondence should be addressed.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Associate Editor: Anna Tramontano
Received February 21, 2012; Revised May 11, 2012; Accepted May 30, 2012.
Summary: Most proteins lack experimentally validated functions. To address this problem, we implemented the Evolutionary Trace Annotation (ETA) method in the Cytoscape network visualization environment. The result is the ETAscape plugin, which builds a structural genomics network based on local structural and evolutionary similarities among proteins and then globally diffuses known annotations across the resulting network. The plugin displays these novel functional annotations, their confidence, the molecular basis for individual matches and the set of matches that lead to a prediction.
Availability: The ETA Network Plugin is available publicly for download at http://mammoth.bcm.tmc.edu/networks/.
Contact: lichtarge/at/bcm.edu
The Structural Genomics Initiative (SGI) generates abundant structural data (Erdin et al., 2011; Valencia, 2005), but many of these structures lack annotation (Redfern et al., 2008). Computational methods that match small structural motifs of functionally important residues (a template) and suggest a function when the geometry is close enough (Laskowski et al., 2005; Redfern et al., 2009) are an especially promising way to approach this problem. A template can be constructed from prior knowledge of functional residues and mechanisms, or it can be created de novo by Evolutionary Trace (ET) analysis, which predicts functionally relevant amino acids and pinpoints functional sites using evolutionary principles (Lichtarge et al., 1996). ET accuracy has been thoroughly tested both experimentally (Adikesavan et al., 2011; Rodriguez et al., 2010) and computationally (Mihalek et al., 2004; Res et al., 2005).
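Template-matching methods of this kind need a score for how closely two small residue constellations agree. As an illustration only, the sketch below uses a superposition-free distance RMSD (dRMSD) between corresponding residues, given as hypothetical C-alpha coordinates. It is not the scoring used by ETA or by the cited methods, and the 1.0 Å cutoff is an arbitrary placeholder.

```python
import math
from itertools import combinations
from typing import Sequence

Coord = Sequence[float]  # (x, y, z) in angstroms


def distance_rmsd(template: Sequence[Coord], candidate: Sequence[Coord]) -> float:
    """RMS difference between all intra-set pairwise distances.

    template[k] and candidate[k] are assumed to be corresponding residues.
    Only internal distances are compared, so rotating or translating either
    set leaves the score unchanged and no superposition step is needed.
    """
    if len(template) != len(candidate) or len(template) < 2:
        raise ValueError("need two equally sized sets of at least two residues")
    diffs = [
        math.dist(template[a], template[b]) - math.dist(candidate[a], candidate[b])
        for a, b in combinations(range(len(template)), 2)
    ]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def is_geometric_match(template, candidate, cutoff: float = 1.0) -> bool:
    """Hypothetical acceptance rule: a match when dRMSD <= cutoff (angstroms)."""
    return distance_rmsd(template, candidate) <= cutoff


# A three-residue template and the same constellation shifted by 10 angstroms.
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0)]
shifted = [(x + 10.0, y + 10.0, z + 10.0) for x, y, z in template]
print(is_geometric_match(template, shifted))  # True: the geometry is identical
```

A distance-based score was chosen here because it needs no rotation fitting and therefore runs on the standard library alone; real template matchers typically also search over residue correspondences, which this sketch takes as given.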
Evolutionary Trace Annotation (ETA) first maps the evolutionary importance of each residue onto a structure and selects a cluster of important surface residues as the template. It then searches protein structures of known function for a match that is similar both geometrically and evolutionarily. These ETA templates usually overlap with catalytic sites (Ward et al., 2008) and identify function with 87% accuracy at 61% coverage (Kristensen et al., 2008).
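Accuracy at a given coverage is commonly computed by ranking predictions by confidence, keeping the top fraction (the coverage) and scoring the share of those that are correct. The sketch assumes that definition; the benchmark protocol of Kristensen et al. (2008) is not reproduced, and the toy predictions are invented for illustration.

```python
from typing import Sequence, Tuple


def accuracy_at_coverage(predictions: Sequence[Tuple[float, bool]], coverage: float) -> float:
    """predictions holds (confidence, is_correct) for every query protein.

    Keep the most confident `coverage` fraction and return the share of them
    that are correct. Ties keep their input order because sorted() is stable.
    """
    if not 0.0 < coverage <= 1.0 or not predictions:
        raise ValueError("need predictions and 0 < coverage <= 1")
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    n_kept = max(1, round(coverage * len(ranked)))
    return sum(1 for _, correct in ranked[:n_kept] if correct) / n_kept


toy = [(0.9, True), (0.8, True), (0.4, False), (0.2, True)]
print(accuracy_at_coverage(toy, 0.5))  # 1.0: both top-ranked calls are correct
print(accuracy_at_coverage(toy, 1.0))  # 0.75
```

Lowering the coverage trades the number of predictions for their reliability, which is why the two figures are always quoted as a pair.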
Using a network in which structures form the nodes and ETA matches form the edges helps overcome the sparseness of functional data. We make predictions by allowing Enzyme Commission (EC) numbers to ‘diffuse’ through the network according to the cost function:
$$\min_{f}\; \sum_{i} \left(f_i - y_i\right)^2 + \sum_{i,j} w_{ij} \left(f_i - f_j\right)^2 \qquad (1)$$
where the elements of y are 1, 0 or −1 depending on whether a protein is known to have, is known not to have, or is not known to have a particular EC number. After minimization, f contains the ‘diffused’ values. The first term reflects the desire not to lose known information: it penalizes nodes whose values differ before and after diffusion. The second term reflects the expectation that neighboring proteins in this network have similar functions: it penalizes neighbors whose values differ, in proportion to the weight of the edge between them. Repeating this process for every EC number yields a prediction score for each possible function at each node, and normalizing these scores across all nodes of unknown function yields a prediction confidence. In benchmarks, this ETA diffusion network achieved > 97% accuracy at 50% coverage, and it enabled the prediction and experimental confirmation of the function of an unannotated Staphylococcus aureus protein (Venner et al., 2010).
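The diffusion step can be sketched as follows, assuming the cost function has the standard label-propagation form Σ_i (f_i − y_i)² + Σ_{i,j} w_ij (f_i − f_j)², where w_ij is the symmetric weight of the edge between proteins i and j. The double sum equals 2fᵀLf, with L = D − W the graph Laplacian, so the minimizer solves the linear system (I + 2L)f = y. Turning scores into a confidence by z-scoring each EC over the unannotated nodes is our own assumption, not the published normalization, and the dense solver only suits toy networks.

```python
from statistics import fmean, pstdev
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (node i, node j, edge weight w_ij)


def solve(a: List[List[float]], b: Sequence[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (fine for toy networks)."""
    n = len(b)
    m = [row[:] + [float(b[k])] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def diffuse(n: int, edges: Sequence[Edge], y: Sequence[float]) -> List[float]:
    """Minimize sum_i (f_i - y_i)^2 + sum_{i,j} w_ij (f_i - f_j)^2 for one EC.

    With symmetric weights the double sum equals 2 f^T L f (L = D - W), so the
    gradient vanishes where (I + 2L) f = y.
    """
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j, w in edges:  # add 2L edge by edge
        a[i][i] += 2 * w
        a[j][j] += 2 * w
        a[i][j] -= 2 * w
        a[j][i] -= 2 * w
    return solve(a, y)


def predict(
    n: int,
    edges: Sequence[Edge],
    y_by_ec: Dict[str, Sequence[float]],
    unknown: Sequence[int],
) -> Dict[int, Tuple[str, float]]:
    """Diffuse each EC separately, z-score it over the unannotated nodes
    (assumed normalization), and return the top EC and its z-score per node."""
    z_by_ec = {}
    for ec, y in y_by_ec.items():
        f = diffuse(n, edges, y)
        vals = [f[i] for i in unknown]
        mu, sd = fmean(vals), pstdev(vals)
        z_by_ec[ec] = {i: (f[i] - mu) / sd if sd > 0 else 0.0 for i in unknown}
    return {
        i: max(((ec, z[i]) for ec, z in z_by_ec.items()), key=lambda p: p[1])
        for i in unknown
    }


# Path 0-1-2-3: node 0 carries EC "A", node 3 carries EC "B", nodes 1 and 2 are
# unannotated. y follows the encoding in the text: 1 = has the EC,
# 0 = known not to have it, -1 = unknown.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
y = {"A": [1, -1, -1, 0], "B": [0, -1, -1, 1]}
print(predict(4, edges, y, unknown=[1, 2]))  # node 1 -> "A", node 2 -> "B"
```

Each unannotated node on the toy path takes the EC of its nearer annotated neighbor. Solving one system per EC keeps the ECs independent, so they are compared only after normalization.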
Embedding this method in the Cytoscape network visualization environment (Smoot et al., 2011) now makes it widely available and more transparent. The ETAscape plugin lets users view ETA networks, add proteins and make novel predictions, as well as inspect annotations, ETA templates and protein structures. It adds to a public suite of ET tools that make functional site analysis and function prediction transparent (Lua and Lichtarge, 2010; Ward et al., 2009).
The plugin is available from mammoth.bcm.tmc.edu/networks. All commands are accessible as menu options, and a manual and a tutorial video can be found on the download page. The plugin includes a starting network of ETA matches among a subset of the Protein Data Bank (PDB; Berman et al., 2000) filtered at 90% sequence identity. Node colors reflect known enzymatic function, and the layout clusters similar proteins. Because of the specificity filters, ETA networks break up into a large number of small subnetworks. Right-clicking a node provides links to PDBsum (Laskowski, 2009) and the ET Server.
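The fragmentation into small subnetworks can be quantified by counting connected components. A minimal sketch, assuming the network is available as an undirected edge list of node names (exporting it from Cytoscape is not covered here); the names in the example are placeholders, not ETA matches.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def connected_components(
    edges: Iterable[Tuple[Hashable, Hashable]], nodes: Iterable[Hashable] = ()
) -> List[Set[Hashable]]:
    """Breadth-first search over an undirected graph; isolated nodes may be
    passed via `nodes` so that they count as single-node components."""
    adj: Dict[Hashable, Set[Hashable]] = {u: set() for u in nodes}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen: Set[Hashable] = set()
    components: List[Set[Hashable]] = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            u = queue.popleft()
            component.add(u)
            for v in adj[u] - seen:  # unvisited neighbors only
                seen.add(v)
                queue.append(v)
        components.append(component)
    return components


comps = connected_components([("p1", "p2"), ("p2", "p3"), ("p4", "p5")], nodes=["p6"])
print(sorted(len(c) for c in comps))  # [1, 2, 3]
```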
The ‘Add new node to Network’ menu option queries the ETA Server (Ward et al., 2009), which opens in a browser and suggests an ET template that the user may customize. This template is then matched against proteins in the network and matches are filtered as described previously (Ward et al., 2008). Modified networks may be saved and later reloaded.
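Ward et al. (2008) describe these filters; as the title of that work suggests, they rest on reciprocal comparison. A minimal sketch of one such rule, assumed here for illustration: keep an edge only when each protein's template matches the other protein. The hits and the node name "query" are hypothetical, and the remaining filters and their thresholds are not reproduced.

```python
from typing import FrozenSet, Hashable, Iterable, Set, Tuple


def reciprocal_edges(
    hits: Iterable[Tuple[Hashable, Hashable]]
) -> Set[FrozenSet[Hashable]]:
    """hits holds directed template matches (source, target): the template of
    `source` matched the structure of `target`. Return undirected edges for
    pairs that matched in both directions; self-hits are ignored."""
    directed = set(hits)
    return {frozenset((s, t)) for s, t in directed if s != t and (t, s) in directed}


# Hypothetical hits for a newly added protein "query".
hits = [("query", "p1"), ("p1", "query"), ("query", "p2")]
print(reciprocal_edges(hits))  # only the query/p1 pair survives
```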
Structures and ETA templates can be opened in Jmol (Hanson, 2010) windows by selecting nodes and running the ‘Show Templates’ menu option. The ‘Run Diffusion’ menu option predicts the function of unannotated proteins with our diffusion model (Venner et al., 2010). Novel annotations and prediction confidences are available in the node attribute browser. The ‘Show Influencing Proteins’ menu command shows proteins with the largest influence on the predicted function of the selected protein, often including nodes with strong indirect connections. After making predictions, users can export them to a file.
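How the plugin measures influence is not specified here. One natural reading, assuming a linear diffusion model f = Ky with K = (I + 2L)⁻¹ (the minimizer of a cost of the form Σ_i (f_i − y_i)² + Σ_{i,j} w_ij (f_i − f_j)²), is that K_ij is how much a unit label on protein j raises the score of protein i, so annotated proteins can be ranked by K_ij. The toy graph is invented to show how an indirect neighbor reached by two strong paths can outrank a weakly connected direct neighbor.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (node a, node b, weight w_ab)


def influence_row(n: int, edges: Sequence[Edge], i: int,
                  tol: float = 1e-12, max_iter: int = 10_000) -> List[float]:
    """Return k with k[j] = K[i][j] for K = (I + 2L)^-1.

    K is symmetric, so row i equals the solution of (I + 2L) k = e_i, found
    here by Jacobi iteration; it converges because I + 2L is strictly
    diagonally dominant (slowly, though, for high-degree nodes).
    """
    nbrs: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for a, b, w in edges:
        nbrs[a].append((b, w))
        nbrs[b].append((a, w))
    deg = [sum(w for _, w in nb) for nb in nbrs]
    k = [0.0] * n
    for _ in range(max_iter):
        new = [
            ((1.0 if j == i else 0.0) + 2 * sum(w * k[v] for v, w in nbrs[j]))
            / (1 + 2 * deg[j])
            for j in range(n)
        ]
        if max(abs(new_v - old_v) for new_v, old_v in zip(new, k)) < tol:
            return new
        k = new
    return k


def influencing_proteins(n: int, edges: Sequence[Edge], i: int,
                         annotated: Sequence[int], top: int = 5) -> List[int]:
    """Rank proteins annotated with the predicted EC by their influence K[i][j]."""
    k = influence_row(n, edges, i)
    return sorted(annotated, key=lambda j: k[j], reverse=True)[:top]


# Query 0 touches annotated protein 1 only weakly, but reaches annotated
# protein 4 through two strong two-step paths (via 2 and via 3).
edges = [(0, 1, 0.1), (0, 2, 1.0), (0, 3, 1.0), (2, 4, 1.0), (3, 4, 1.0)]
print(influencing_proteins(5, edges, 0, annotated=[1, 4]))  # [4, 1]
```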
As an example, the plugin predicts that the protein encoded by gene locus At3g16990 in Arabidopsis thaliana (PDB ID 2f2gB), an SGI protein of unknown function, is a thiamine pyridinylase (EC 2.5.1.2; Fig. 1). Although its direct matches lack functional annotation, the software arrives at this prediction by diffusing the function across the intermediate links. One other function is present in the subnetwork near the query protein: aminopyrimidine aminohydrolase (EC 3.5.99.2). Interestingly, although the two reactions are distinct, both functions share the substrate thiamine, which may explain the detected template similarity. The direct matches of 2f2gB lie well below the range of reliable homology, with sequence identities of 16% to 1rtwD and 14% to 1z72B. As observed previously (Ward et al., 2008), the direct matches nonetheless share overall structural similarity: many of the proteins in this cluster belong to the CATH heme oxygenase superfamily.
Fig. 1
Screen capture of the plugin. Nodes are proteins, with colors encoding the first two digits of the EC number. Edges are ETA template matches and indicate local structural and evolutionary similarity. Two unknown protein structures (yellow nodes in center) are highlighted.
3 CONCLUSIONS
The ETAscape plugin extends an existing suite of protein function annotation tools that infer functional residues, identify functional sites and predict protein function (Lua and Lichtarge, 2010; Ward et al., 2009). It pairs state-of-the-art network analysis with network visualization, putting the ability to generate novel predictions into the hands of researchers. Perhaps more importantly, it provides insight into the basis for those predictions.
ACKNOWLEDGEMENTS
Funding: The authors gratefully acknowledge grant support from the National Institutes of Health (NIH GM079656 and GM066099), the National Science Foundation (NSF CCF 0905536 and DBI 1062455) and the National Library of Medicine (NLM T15LM007093, through the Gulf Coast Consortia's Keck Center).
Conflict of Interest: none declared.
  • Adikesavan A.K., et al. Separation of recombination and SOS response in Escherichia coli RecA suggests LexA interaction sites. PLoS Genet. 2011;7:e1002244.
  • Berman H.M., et al. The protein data bank. Nucleic Acids Res. 2000;28:235–242.
  • Erdin S., et al. Protein function prediction: towards integration of similarity metrics. Curr. Opin. Struct. Biol. 2011;21:180–188.
  • Hanson R.M. Jmol – a paradigm shift in crystallographic visualization. J. Appl. Crystallogr. 2010;43:1250–1260.
  • Kristensen D.M., et al. Prediction of enzyme function based on 3D templates of evolutionarily important amino acids. BMC Bioinformatics. 2008;9:17.
  • Laskowski R.A., et al. ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res. 2005;33:W89–W93.
  • Laskowski R.A. PDBsum new things. Nucleic Acids Res. 2009;37:D355–D359.
  • Lichtarge O., et al. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 1996;257:342–358.
  • Lua R.C., Lichtarge O. PyETV: a PyMOL evolutionary trace viewer to analyze functional site predictions in protein complexes. Bioinformatics. 2010;26:2981–2982.
  • Mihalek I., et al. A family of evolution-entropy hybrid methods for ranking protein residues by importance. J. Mol. Biol. 2004;336:1265–1282.
  • Redfern O.C., et al. Exploring the structure and function paradigm. Curr. Opin. Struct. Biol. 2008;18:394–402.
  • Redfern O.C., et al. FLORA: a novel method to predict protein function from structure in diverse superfamilies. PLoS Comput. Biol. 2009;5:e1000485.
  • Res I., et al. An evolution based classifier for prediction of protein interfaces without using protein structures. Bioinformatics. 2005;21:2496–2501.
  • Rodriguez G.J., et al. Evolution-guided discovery and recoding of allosteric pathway specificity determinants in psychoactive bioamine receptors. Proc. Natl Acad. Sci. USA. 2010;107:7787–7792.
  • Smoot M.E., et al. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011;27:431–432.
  • Valencia A. Automatic annotation of protein function. Curr. Opin. Struct. Biol. 2005;15:267–274.
  • Venner E., et al. Accurate protein structure annotation through competitive diffusion of enzymatic functions over a network of local evolutionary similarities. PLoS ONE. 2010;5:e14286.
  • Ward R.M., et al. De-orphaning the structural proteome through reciprocal comparison of evolutionarily important structural features. PLoS ONE. 2008;3:e2136.
  • Ward R.M., et al. Evolutionary trace annotation server: automated enzyme function prediction in protein structures using 3D templates. Bioinformatics. 2009;25:1426–1427.
Articles from Bioinformatics are provided here courtesy of Oxford University Press.