It has become apparent that surfaces, comprising a fraction of the total residues, are the most conserved functional features of proteins. Proteins utilize common surface motifs to create precise chemical environments designed to perform specific functions. These motifs are not restricted to a single protein scaffold but can be found within different protein folds or at domain/domain and subunit interfaces. While biochemical activity can be attributed to a few key residues (e.g. catalytic triads), the broader surrounding environment (i.e. auxiliary residues in spatial proximity) often plays an equally important role in fine-tuning molecular recognition and/or catalysis.
Powerful evolutionary forces have allowed proteins to govern ligand binding through seemingly subtle local surface variability. These changes, which are not easily detectable by sequence analysis, may provide a competitive advantage for optimization of co-factor specificity. In some circumstances, surface diversity adversely affects normal cell processes by providing environments for undesired binding events (e.g. drug side effects) or through mutations directly correlated to disease[1]. The conservation of functional surfaces presents an opportunity to compare and analyze proteins independent of sequence or fold. These comparisons can be used to classify protein functions or to infer biochemical activity for proteins with unknown function, such as those targeted by structural genomics programs.
Several methods have been developed for detecting localized spatial protein similarities, with applications in evolutionary analysis, function prediction and drug discovery. Graph theory has been widely applied to the comparison of three-dimensional patterns. Artymiuk et al. developed an algorithm based on subgraph isomorphism detection to search residue patterns against the PDB[2]. Kinoshita et al. used clique detection algorithms to assign protein biochemical functions using the similarity information of molecular surface geometries and electrostatic potentials[3]. Using a clique-detection algorithm, Schmitt et al. compared generic pseudo-centers that code for possible ligand-protein interactions in protein cavities. Query cavities are searched against Cavbase, a pre-computed database of cavities extracted from the PDB[4]. The method has been applied to identify surfaces in non-homologous proteins as well as for the classification of protein families[5]. Kleywegt searched for motifs of residue pseudo-centers in a library of protein structures using a depth-first search algorithm[6]. Russell also developed an algorithm based on depth-first search that detects atomic geometric patterns common among side-chains in proteins and presented new examples of convergent evolution[7]. Parametric statistical evaluations of Russell's atomic superposition method were extended by Stark et al.
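The clique-based comparisons described above can be illustrated with a minimal sketch. It is not the implementation of any of the cited methods: the data layout (a site as a list of `(label, xyz)` pseudo-centers), the function names `correspondence_graph` and `max_clique`, and the 1.0 Å distance tolerance are all assumptions made for illustration. Nodes of the correspondence graph are label-compatible pairs of pseudo-centers, edges join pairs whose intra-site distances agree, and the largest clique is the largest geometrically consistent match.

```python
from itertools import combinations
from math import dist

def correspondence_graph(site_a, site_b, tol=1.0):
    """Nodes are label-compatible pairs (i, j); an edge joins two nodes when
    the distance i-k in site A agrees with the distance j-l in site B."""
    nodes = [(i, j) for i, (la, _) in enumerate(site_a)
                    for j, (lb, _) in enumerate(site_b) if la == lb]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:          # a center may be matched only once
            continue
        da = dist(site_a[i][1], site_a[k][1])
        db = dist(site_b[j][1], site_b[l][1])
        if abs(da - db) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def max_clique(adj):
    """Largest clique found by Bron-Kerbosch enumeration with pivoting."""
    best = []
    def expand(r, p, x):
        nonlocal best
        if not p and not x:           # r is a maximal clique
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand([], set(adj), set())
    return sorted(best)

# Hypothetical pseudo-centers: site_b holds a rotated, translated copy of
# site_a (in a different order) plus one unrelated donor.
site_a = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aromatic", (0, 4, 0))]
site_b = [("aromatic", (1, 5, 5)), ("donor", (20, 20, 20)),
          ("donor", (5, 5, 5)), ("acceptor", (5, 8, 5))]

match = max_clique(correspondence_graph(site_a, site_b))
print(match)  # pairs (index in site_a, index in site_b)
```

Because only internal distances are compared, the match does not depend on how the two sites are oriented. Distances alone cannot distinguish a pattern from its mirror image, so a practical method would need an additional check after superposition.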
Another widely used approach is geometric hashing, which is an efficient method for matching features against a database. Jackson and Gold used geometric hashing to perform an all-against-all comparison of protein-ligand binding sites in the SitesBase database[9]. Their method was also applied for functional annotation and for building pharmacophore models for drug discovery[11]. Fischer et al. developed an algorithm based on geometric hashing that detects surface similarities of proteins using spatial patterns of atoms[12]. A similar method, TESS, has been applied for the derivation and matching of annotated spatial templates[14]. JESS, a successor to TESS, searches for small groups of atoms under arbitrary constraints on geometry and chemistry and uses statistics to evaluate matches. It is used to query the Catalytic Site Atlas (CSA)[16], a collection of annotated residue patterns extracted from manual literature searches. JESS is also used in the PROFUNC[17] suite of annotation tools in the reverse template search, where a radius-defined perimeter extends a local residue pattern search for improved search specificity.
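The general idea behind geometric hashing can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of SitesBase, TESS or JESS: atoms are hypothetical `(label, xyz)` tuples, the names `build_table` and `best_match` are invented here, the 1.0 Å bin size is arbitrary, and the query is probed with a single basis (its first three atoms) rather than all of them. Each ordered atom triplet of a stored model defines a local reference frame; the remaining atoms are expressed in that frame, quantized, and stored in a hash table. A query in any orientation then votes for the model bases that reproduce its own frame coordinates.

```python
from collections import defaultdict
from itertools import permutations
from math import sqrt

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])
def unit(a):
    n = sqrt(dot(a, a))
    return tuple(x / n for x in a)

def frame(p0, p1, p2):
    """Right-handed orthonormal frame anchored at p0, or None if collinear."""
    u = sub(p1, p0)
    w = cross(u, sub(p2, p0))
    if dot(w, w) < 1e-9:
        return None
    x, z = unit(u), unit(w)
    return p0, (x, cross(z, x), z)

def keys(atoms, basis, bin_size):
    """Hash keys (basis labels, atom label, quantized frame coordinates)
    for every atom that is not part of the basis."""
    fr = frame(*(atoms[i][1] for i in basis))
    if fr is None:
        return []
    origin, axes = fr
    base_labels = tuple(atoms[i][0] for i in basis)
    out = []
    for idx, (label, pos) in enumerate(atoms):
        if idx in basis:
            continue
        d = sub(pos, origin)
        coords = tuple(round(dot(d, ax) / bin_size) for ax in axes)
        out.append((base_labels, label) + coords)
    return out

def build_table(models, bin_size=1.0):
    """Pre-compute keys for every ordered basis triplet of every model."""
    table = defaultdict(list)
    for name, atoms in models.items():
        for basis in permutations(range(len(atoms)), 3):
            for key in keys(atoms, basis, bin_size):
                table[key].append((name, basis))
    return table

def best_match(table, query, bin_size=1.0):
    """Vote using the query's first three atoms as its basis."""
    votes = defaultdict(int)
    for key in keys(query, (0, 1, 2), bin_size):
        for entry in table.get(key, []):
            votes[entry] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None

# Hypothetical models; the query is model "A" rotated 90 degrees about z
# and translated by (10, -4, 7).
models = {
    "A": [("N", (0, 0, 0)), ("O", (2, 0, 0)), ("C", (0, 2, 0)),
          ("S", (1, 1, 3)), ("O", (3, 2, 1))],
    "B": [("N", (0, 0, 0)), ("O", (2, 0, 0)), ("C", (0, 2, 0)),
          ("S", (4, 4, -2)), ("O", (-3, 1, 5))],
}
query = [("N", (10, -4, 7)), ("O", (10, -2, 7)), ("C", (8, -4, 7)),
         ("S", (9, -3, 10)), ("O", (8, -1, 8))]

table = build_table(models)
result = best_match(table, query)
print(result)  # ((model name, model basis), number of votes)
```

The cost of enumerating all bases is paid once when the table is built, which is what makes lookups against a large database fast. Fixed bins are sensitive to coordinates that fall near a bin boundary, so a practical implementation would typically also probe neighboring bins or try several query bases.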
A protein evolution-based method, pvSOAR, was developed that used the unique approach of aligning sub-sequences of surface residues to establish a residue correspondence between surfaces[18]. The residues were then superimposed on each other and statistical significance was evaluated for the resulting RMSD. This method was used to detect similar functional surfaces in non-homologous proteins. Furthermore, in a recent study of shape variation of ligand binding pockets, Kahraman et al. used a shape-only comparison metric based on spherical harmonics[20]. It was shown that shape descriptors could be used to classify ligands according to their binding sites.
In this study, we describe a new method for the sequence order-independent comparison and alignment of protein functional surfaces. Our method, SurfaceScreen, attempts to optimize two components, global surface shape and local physicochemical texture, for evaluating the similarity between a pair of surfaces. Surface shape similarity is assessed using a three-dimensional object recognition algorithm and is used to rapidly pre-classify surfaces from a large library of surfaces. Surfaces with sufficient shape complementarity are then aligned by combinatorially identifying the best superimposition of common residues between the two surfaces. We introduce several metrics for scoring different properties of a surface alignment and an overall scoring function used in library searches. Furthermore, we introduce the Global Protein Surface Survey (GPSS), a library of annotated protein surfaces calculated from all structures in the PDB. Querying surfaces from proteins of unknown function against the GPSS library allows SurfaceScreen to be utilized as a predictive tool.
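To make the idea of a rapid shape pre-classification step concrete, the sketch below filters a library by a simple orientation-independent shape signature before any residue-level alignment would be attempted. It is a stand-in under stated assumptions and should not be read as the shape algorithm used by SurfaceScreen: the signature is a histogram of pairwise distances (a generic shape-distribution descriptor), and the function names, bin width, number of bins and cutoff are all hypothetical.

```python
from itertools import combinations
from math import dist

def d2_histogram(points, bin_width=2.0, n_bins=10):
    """Normalized histogram of all pairwise distances between surface points.
    Requires at least two points; distances beyond the last bin are clamped."""
    counts = [0] * n_bins
    pairs = list(combinations(points, 2))
    for p, q in pairs:
        counts[min(int(dist(p, q) / bin_width), n_bins - 1)] += 1
    return [c / len(pairs) for c in counts]

def shape_distance(h1, h2):
    """L1 distance between two histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def prefilter(query, library, cutoff=0.5):
    """Return (name, distance) for library surfaces within the cutoff,
    most similar first. Only these would go on to the alignment stage."""
    hq = d2_histogram(query)
    hits = []
    for name, pts in library.items():
        d = shape_distance(hq, d2_histogram(pts))
        if d <= cutoff:
            hits.append((name, d))
    return sorted(hits, key=lambda h: h[1])

# Hypothetical surfaces represented by a handful of points each.
query = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
library = {
    "compact_copy": [(5, 5, 5), (6, 5, 5), (5, 6, 5), (5, 5, 6)],
    "elongated": [(0, 0, 0), (6, 0, 0), (12, 0, 0), (18, 0, 0)],
}

hits = prefilter(query, library)
print(hits)
```

A signature of this kind needs no superposition, which is why it is cheap enough to apply to every entry in a large library; the price is that it ignores chemistry entirely, so the physicochemical comparison has to happen in the subsequent alignment step.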
We describe three types of analysis to assess surface shape comparisons and spatial alignments. First, we describe the retrieval from the GPSS library of surfaces, from the same protein, that bind ligands of various sizes, shapes and pharmacophore properties. For this we use the example of HIV-1 protease. Second, we use the example of heme (iron-protoporphyrin IX) binding sites to describe the retrieval of functionally diverse binding surfaces that bind the same ligand. We provide an example of using our method as an annotation tool, identifying a new member of the heme-binding monooxygenase family. Third, we describe how conformational diversity of bound ligands impacts the retrieval rate for ubiquitous nucleotide binding sites. We also present an example of a nucleotide binding surface prediction and its crystallographic validation for a structural genomics target with a new fold. We conclude with an analysis of the ATP binding surface landscape to provide insight into the correlation between surface similarity and function for structures in the PDB and for the subset of protein kinases.