Summary: PyETV is a PyMOL plugin for viewing, analyzing and manipulating predictions of evolutionarily important residues and sites in protein structures and their complexes. It seamlessly captures the output of the Evolutionary Trace server, namely ranked importance of residues, for multiple chains of a complex. It then yields a high resolution graphical interface showing their distribution and clustering throughout a quaternary structure, including at interfaces. Together with other tools in the popular PyMOL viewer, PyETV thus provides a novel tool to integrate evolutionary forces into the design of experiments targeting the most functionally relevant sites of a protein.
Availability: The PyETV module is written in Python. Installation instructions and video demonstrations may be found at the URL http://mammoth.bcm.tmc.edu/traceview/HelpDocs/PyETVHelp/pyInstructions.html.
Since protein–protein interactions are ubiquitous and an emerging target for design and therapeutics (Mandell and Kortemme, 2009), it is critical to improve the tools that enable their characterization. Here, we present an ET Viewer that provides a high-quality interface to map and analyze evolutionary forces and the functional sites they define across multi-protein interfaces and assemblies. Importantly, it enables the integration of Evolutionary Trace (ET) analysis (Lichtarge and Wilkins, 2010; Lichtarge et al., 1996) with any type of structural and biophysical information accessible to PyMOL (DeLano, 2002).
ET is a well-validated method to identify functional sites and their residue determinants in proteins. It drives experiments that efficiently elucidate the molecular basis of binding, catalysis and allostery, and that rationally perturb networks (Ribes-Zamora et al., 2007; Rodriguez et al., 2010). ET ranks the ‘evolutionary importance’ of sequence positions by tracking whether their variations during evolution correlate with large or small divergences among orthologs and paralogs (Lichtarge et al., 1996). These ET ranks are robust, and the best-ranked sequence positions form continuous spatial clusters (Wilkins et al., 2010) that reveal functional sites and their functional determinants. Varying the threshold of importance reveals these sites at varying levels of detail. Recent studies show that small motifs of ET residues can be compared across all protein structures to predict functions in enzymes and non-enzymes alike, thereby validating ET predictions of functional determinants on a large scale (Erdin et al., 2010; Kristensen et al., 2008).
Yet the study of how evolutionarily important ET residues interact across structural interfaces has been limited. The ET report_maker (Mihalek et al., 2006) provided only a PDF report of individual sequence or structure ET analysis, with no graphical user interface. A prior Evolutionary Trace Viewer (ETV) that did provide an interactive molecular viewer (Morgan et al., 2006) could display only a single protein chain at a time, and it could not show surfaces, secondary structure elements or any other type of information, such as crystallographic B-factors, electrostatics or bound ligands. By addressing these problems, PyETV offers a tool to study protein functional determinants and evolutionary forces in the more relevant structural context of quaternary interactions.
PyETV builds on the popular and extensible PyMOL (DeLano, 2002) platform. PyMOL is a molecular graphics package to view, select, label and perturb any number of structures or residues in many ways (e.g. cartoon, surface, stereo, etc.). Moreover, it is easily extended with plugins—scripts that overlay complementary information through custom interfaces, such as the APBS plugin (by M. G. Lerner) to generate electrostatics maps. Likewise, PyETV, which opens when selected among the items under the ‘Plugin’ menu, specifically maps ET ranks onto structures.
The primary source of precomputed ET rank data is the ET server (http://mammoth.bcm.tmc.edu/ETserver.html), which regularly updates to incorporate new PDB (Berman et al., 2000) structures. If a user wishes to start with a new protein sequence or provide a custom alignment, new ET rank data can be generated via the ET Wizard feature of ETV (Morgan et al., 2006).
ET rank data can be loaded into PyETV in several ways:
Once the ET data are loaded for any number of chains, the top-ranked residues of each one can be highlighted at any desired threshold of evolutionary importance by dragging a slider (horizontal scrollbar) specific to that structure. The color scheme is flexible: it can be made uniform (all red for the top n-th percentile rank of importance), or it can follow a rainbow spectrum in which red is the most important and purple the least so. For added clarity, top-ranked residues may also be distinguished from the rest of the structure by their representation (e.g. spheres, C-alpha atoms, sticks). (For a video demonstration, see http://www.youtube.com/watch?v=Wt5Q0Nwvu24.) These operations can be repeated individually for each chain, or they can be coordinated among all chains to display the same threshold of importance throughout a complex.
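The thresholding step described above can be illustrated with a minimal sketch. It is not PyETV's own code: the function names `top_ranked` and `selection_string`, the example ranks, and the convention that a smaller rank value means greater importance are all assumptions made for illustration. The sketch picks the top fraction of residues by rank and builds a selection expression in PyMOL's `chain ... and resi ...` syntax.

```python
import math

def top_ranked(ranks, percent):
    """Return the residue numbers in the top `percent` percent by ET rank.

    `ranks` maps residue number -> rank value. This sketch ASSUMES that a
    smaller rank value means greater evolutionary importance.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    n_keep = max(1, math.ceil(len(ranks) * percent / 100.0))
    # Sort by rank value, breaking ties by residue number for determinism.
    ordered = sorted(ranks, key=lambda resi: (ranks[resi], resi))
    return sorted(ordered[:n_keep])

def selection_string(chain, residues):
    """Build a PyMOL-style selection expression for one chain."""
    resi = "+".join(str(r) for r in residues)
    return f"chain {chain} and resi {resi}"

# Hypothetical rank values for ten residues of a chain "A".
example_ranks = {10: 1.0, 11: 5.5, 12: 2.0, 13: 9.0, 14: 3.5,
                 15: 7.0, 16: 1.5, 17: 8.0, 18: 4.0, 19: 6.0}
top = top_ranked(example_ranks, 30)
print(selection_string("A", top))  # chain A and resi 10+12+16
```

Moving the slider corresponds to calling `top_ranked` again with a different `percent`. Within PyMOL, the resulting string could then be passed to a coloring or representation command.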
After selecting top-ranked residues for some threshold, PyETV provides z-scores (Mihalek et al., 2003; Wilkins et al., 2010) that quantitatively assess whether these residues cluster within a single protein structure or across a protein–protein interface more so than expected by chance (i.e. if z-score > 2). High z-scores indicate that surface groupings of ET residues may well reveal functional sites. Two types of clustering z-scores are available for single structures: one that accounts for close sequence neighbors (3D biased z-score) and another that does not (3D no-bias z-score). For protein–protein complexes, a similar z-score, called Zcoupling, is also available.
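The idea behind a clustering z-score can be sketched as follows. This is not the published formulation, which is defined in the cited references. As an assumption for illustration, the sketch replaces it with a simple permutation estimate: it counts contacts among the selected residues and compares that count with counts for random selections of the same size. The names `contact_count` and `clustering_z`, the 4.0 Å cutoff and the toy coordinates are all hypothetical.

```python
import math
import random
import statistics

def contact_count(selected, coords, cutoff=4.0):
    """Count pairs of selected residues whose points lie within `cutoff`."""
    sel = sorted(selected)
    count = 0
    for i, a in enumerate(sel):
        for b in sel[i + 1:]:
            if math.dist(coords[a], coords[b]) <= cutoff:
                count += 1
    return count

def clustering_z(selected, coords, cutoff=4.0, trials=1000, seed=0):
    """Permutation-based z-score: (observed - mean) / standard deviation.

    The null distribution comes from random residue sets of the same size.
    This is an illustrative stand-in, NOT the z-score used by PyETV.
    """
    rng = random.Random(seed)
    observed = contact_count(selected, coords, cutoff)
    residues = sorted(coords)
    null = [contact_count(rng.sample(residues, len(selected)), coords, cutoff)
            for _ in range(trials)]
    mean = statistics.fmean(null)
    sd = statistics.pstdev(null)
    if sd == 0:
        return 0.0
    return (observed - mean) / sd

# Toy structure: 30 residues on a line, 3.8 units apart, one point each.
toy_coords = {i: (3.8 * i, 0.0, 0.0) for i in range(30)}
clustered = [10, 11, 12, 13, 14]   # contiguous patch: 4 contacts
scattered = [0, 6, 12, 18, 24]     # no two residues in contact
z_clustered = clustering_z(clustered, toy_coords)
z_scattered = clustering_z(scattered, toy_coords)
print(z_clustered > 2, z_scattered < z_clustered)
```

In this toy case the contiguous patch scores well above the threshold of 2 mentioned above, while the scattered set does not. Because the toy structure is a straight line, the only contacts are between sequence neighbors, so the sketch does not distinguish the biased and no-bias variants.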
It is often non-trivial to extract the biologically relevant complex from a PDB file. As an aid to general users, PyETV can load a macromolecular assembly directly from Protein Interfaces, Surfaces and Assemblies (PISA, http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html) (Krissinel and Henrick, 2007), and match precomputed traces to each chain. The user need only supply a PDB code. Mappings of ET ranks can be controlled simultaneously over all chains to examine the joint distribution of importance over the entire complex. To assist with interface analysis, one may select the residues of a structure that are in contact with any ligand, such as a protein complex partner.
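Selecting residues in contact with a partner can be pictured as a distance test. The sketch below is a hypothetical illustration rather than PyETV's implementation: the function name `interface_residues`, the 5.0 Å cutoff and the toy coordinates are assumptions. A residue counts as being at the interface if any of its atoms lies within the cutoff of any atom of the partner chain.

```python
import math

def interface_residues(chain_a, chain_b, cutoff=5.0):
    """Return residues of chain_a with any atom within `cutoff` of chain_b.

    Each chain maps residue number -> list of (x, y, z) atom coordinates.
    The cutoff value is an assumption chosen for this example.
    """
    partner_atoms = [atom for atoms in chain_b.values() for atom in atoms]
    hits = []
    for resi, atoms in chain_a.items():
        if any(math.dist(a, p) <= cutoff for a in atoms for p in partner_atoms):
            hits.append(resi)
    return sorted(hits)

# Toy data: residue 2 of chain A sits 3 units from the partner's only atom.
toy_chain_a = {1: [(0.0, 0.0, 0.0)],
               2: [(3.0, 0.0, 0.0), (4.0, 1.0, 0.0)],
               3: [(20.0, 0.0, 0.0)]}
toy_chain_b = {101: [(6.0, 0.0, 0.0)]}
print(interface_residues(toy_chain_a, toy_chain_b))  # [2]
```

Intersecting such a list with the top-ranked residues of a chain gives the evolutionarily important positions at the interface. The all-against-all loop is adequate for a sketch; a spatial index would be preferable for large assemblies.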
For demonstrations of these features, see the following: (i) to load PISA structures, map ET ranks and select a protein–protein interface, view http://www.youtube.com/watch?v=1VCdKPKqLdg; (ii) for another video example of selecting and isolating interface residues with ET ranks mapped onto them, visit http://www.youtube.com/watch?v=OUyaJCSYQQA.
PyETV integrates data from several sources (ET, PDB, PISA) and extends the trace-to-structure mapping originally implemented in the Java-based ETV to any number of structures and traces. PyETV relies on the power, flexibility and wide availability of the PyMOL molecular visualization system and its extensibility through Python. Future updates may include new tools for analyzing protein–protein interfaces, mapping new attributes such as correlations between two residues or ET ranks for residue pairs, and incorporating quality measures that supplement the clustering z-score.
We thank P. Katsonis, D. H. Morgan, J. Quiros, E. Venner, A. D. Wilkins and R. Yao.
Funding: National Science Foundation (DBI-0547695 and CCF-0905536 to O.L.); National Institutes of Health (GM079656 and GM066099 to O.L.).
Conflict of Interest: none declared.