The increasing number of structures available as a result of structural genomics initiatives has generated great interest in the development of structure-based function prediction methods. As in sequence analysis, the most straightforward approach is to compare the protein to be characterized with a set of proteins of known function. Global structural comparison methods, such as Dali, Vast and CE, can be used to identify remote homology relationships that defy traditional sequence analysis.
In addition, since the function of a protein usually depends on the identity and location of a small number of residues, local structural comparison methods represent the ideal tool to focus the comparative analysis on the residues that are critical to function. One can therefore compare a protein of unknown function with a set of well-characterized structures to check whether there are local similarities involving the known functional patches. Alternatively, from the analysis of a number of structures sharing some property, it is possible to derive a structural template encoding the function-determining residues and use it to screen the proteins of interest.
The local comparison problem comprises two different tasks:
- finding a suitable representation for the protein structure
- searching for the correspondence between the descriptors that is optimal according to some criterion (e.g. length, RMSD, or a combination of both).
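The first of these tasks can be illustrated with a minimal sketch. The residue record layout, the function name `describe_residue` and the mode names `"CA"` and `"centroid"` are assumptions made for this example only; they do not reproduce the input format or options of any program discussed here. The sketch reduces a residue to a single labelled point, either its CA atom or the geometric centroid of its side-chain atoms.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
# Hypothetical residue record: (residue name, {atom name: coordinates}).
Residue = Tuple[str, Dict[str, Point]]

# Standard backbone atom names; everything else is treated as side chain.
BACKBONE = {"N", "CA", "C", "O"}


def centroid(points: List[Point]) -> Point:
    """Geometric centroid (unweighted mean) of a non-empty list of points."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def describe_residue(residue: Residue, mode: str = "CA") -> Tuple[str, Point]:
    """Reduce a residue to one labelled point (an illustrative descriptor)."""
    name, atoms = residue
    if mode == "CA":
        return name, atoms["CA"]
    if mode == "centroid":
        side_chain = [xyz for atom, xyz in atoms.items() if atom not in BACKBONE]
        # Glycine has no side-chain heavy atoms, so fall back to CA.
        if not side_chain:
            return name, atoms["CA"]
        return name, centroid(side_chain)
    raise ValueError(f"unknown representation mode: {mode}")
```

The same residue therefore yields different points under different representations, which is why the choice of representation can change which matches the subsequent search finds.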
As we will show, the type of representation used can greatly influence the results that these methods produce. Indeed, different functional sites may require a residue description focused on different physicochemical properties.
In terms of search strategy, three approaches are commonly used: recursive branch and bound algorithms, subgraph isomorphism and geometric hashing. The first two algorithmic strategies are equivalent in practice. A recursive branch and bound algorithm is used by RIGOR/SPASM, Query3d and PINTS. Methods based on subgraph isomorphism include ASSAM, CavBase and eF-Site. Methods relying on geometric hashing include C-alpha Match, Prospect, SiteEngine and ProteMiner-SSM.
However, the two tasks of representing the structure and searching for correspondences can be decoupled. Indeed, once a structure representation has been calculated according to the specific method used by the program, however complex this step may be, the problem simply becomes that of finding a correspondence between two sets of descriptors in space. We present here a novel program that leverages this observation. The program is called superpose3D and is available under the open source GPL license at http://cbm.bio.uniroma2.it/superpose3D. Superpose3D allows users to flexibly specify how residues are represented during the computation, as well as the pairing rules.
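To make the decoupling concrete, the following is a minimal sketch of a recursive branch and bound search over two sets of labelled points. It is not the superpose3D implementation: the function name `best_match`, the tolerance parameter `tol`, and the use of pairwise-distance agreement (rather than RMSD after superposition) as the acceptance criterion are all assumptions chosen to keep the example short and dependency-free.

```python
from math import dist
from typing import List, Set, Tuple

Point = Tuple[float, float, float]
Descriptor = Tuple[str, Point]  # (label, coordinates); layout is illustrative


def best_match(a: List[Descriptor], b: List[Descriptor],
               tol: float = 1.0) -> List[Tuple[int, int]]:
    """Longest list of index pairs (i, j) with equal labels and with all
    intra-set pairwise distances agreeing within `tol`."""
    best: List[Tuple[int, int]] = []

    def compatible(i: int, j: int, match: List[Tuple[int, int]]) -> bool:
        # Labels must agree (a stand-in for user-defined pairing rules).
        if a[i][0] != b[j][0]:
            return False
        # The new pair must preserve distances to every pair already matched.
        return all(
            abs(dist(a[i][1], a[k][1]) - dist(b[j][1], b[l][1])) <= tol
            for k, l in match
        )

    def extend(start: int, match: List[Tuple[int, int]], used_b: Set[int]) -> None:
        nonlocal best
        if len(match) > len(best):
            best = list(match)
        # Bound: even if every remaining descriptor of `a` were paired,
        # this branch could not beat the best match found so far.
        if len(match) + (len(a) - start) <= len(best):
            return
        for i in range(start, len(a)):
            for j in range(len(b)):
                if j not in used_b and compatible(i, j, match):
                    match.append((i, j))
                    used_b.add(j)
                    extend(i + 1, match, used_b)
                    used_b.remove(j)
                    match.pop()

    extend(0, [], set())
    return best
```

Because the search only sees labels and coordinates, any representation that produces such descriptors can be plugged in unchanged. Note that agreement of pairwise distances alone does not distinguish a match from its mirror image, so a practical program would follow this step with an explicit superposition and RMSD check.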
To the best of our knowledge, the only downloadable, open-source methods for local structural comparison are RIGOR/SPASM and PINTS. RIGOR/SPASM allows the user to specify the residue substitutions. However, in terms of structure representation, the only choice offered is among the CA atom, the geometric centroid of the side chain, or both.
The residue definition syntax of PINTS is much more flexible. Users are required to assign arbitrary types to different atoms. Atoms of the same type belong to the same equivalence group and can therefore be matched with each other. As a consequence, it is not possible to specify that atoms A–B of residue X must match atoms C–D of residue Y and have to be paired as A;C B;D. In other words, it is not possible to specify constraints that involve more than one equivalence group at the same time. Moreover, when multiple atoms are selected for the same residue, PINTS always uses their geometric centroid.
In this work we describe superpose3D and the syntax used to specify different residue descriptions. We will also discuss several examples that highlight the advantages of using three types of structure description with varying levels of detail. These examples underscore the importance of using a residue representation that is tailored to the analysis at hand.