High-throughput methods for structural genomics have led to an increasing number of protein structures being solved by X-ray crystallography. The abundance of protein structure information in the Protein Data Bank (PDB) has increased the demand for structure-based function prediction [1] and has contributed to structure-based drug design [2]. However, two problems remain regarding the prediction of enzyme function. First, proteins within a superfamily, which are usually expected to share the same catalytic properties, can catalyze different reactions. For example, enzymes with 98% sequence identity, such as melamine deaminase and atrazine chlorohydrolase, have been reported to catalyze different reactions [3]. Second, two enzymes belonging to different superfamilies or fold classes can catalyze almost identical reactions [4].
The function of a protein can be affected by a small number of residues in a localized region of its three-dimensional structure [5]. Moreover, the specific arrangement and conformation of these residues can be crucial to a protein’s function and may be strongly conserved during its evolution, even when the protein sequence and structure change significantly [5]. For example, it was reported that the positioning of the reactive region of a substrate with respect to a cofactor is generally conserved in flavoenzymes [6].
Two methods for describing local structures have been developed for predicting enzymatic functions. First, in the element-based description of catalytic residues, the catalytic roles in an enzymatic reaction are defined as acid–base, stabilizer or modulator roles [7]. This method gives some insight into enzymatic reactions, but it inherently requires manual annotation. In addition, it is often difficult to differentiate between the acid–base and stabilizer roles because most structures solved by X-ray crystallography provide no information about hydrogen atoms. The second method is based on descriptions of substructures within the local structures of enzymes [8]. Many approaches for analyzing and comparing local structures have been proposed. One group of algorithms, which includes the PINTS [8], ETA [9] and FLORA [12] algorithms, scans protein structural databases using pre-calculated or automatically generated templates. Another group includes algorithms that compare the substructural epitopes of proteins using geometric hashing [13]. Similarly, SiteEngine [16] uses the concept of pseudocenters [17] to define the properties of the corresponding surface. Although all of these approaches assess the similarity between catalytic sites, none of them characterizes the sites themselves or produces feature vectors.
In this study, we examine the structures of oxidoreductases and transferases using radial distribution functions (RDFs) that encode radially distributed properties of active sites centered around the reacting points of bound ligands. Element-based and substructure descriptions are thus integrated into the RDF, under the assumptions that catalytic roles are restricted by distances and that different catalytic residues can play identical roles. Although the topological correlation vector method of Stahl et al. and WaveGeoMap, developed by Kupas et al., provide feature vectors related to enzyme cavities, these descriptions use patches of active sites, regardless of the orientation of the catalytic residues. Therefore, it is still unclear whether the orientation of active sites around a reacting point is related to enzymatic function and how much of the orientation is conserved. Our method provides a different view of enzymatic function by focusing on the physicochemical properties surrounding a reacting point found in enzyme cofactors.
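To make the idea of an RDF centered on a reacting point concrete, the following Python sketch computes a radial profile of a single atomic property around one point. It is an illustration only and not necessarily the formulation used in this study: the Gaussian-smoothed form, the function name `radial_descriptor`, the parameters `r_max`, `step` and `smoothing`, and the toy coordinates and property values are all assumptions introduced here.

```python
import math


def radial_descriptor(center, atoms, r_max=10.0, step=0.5, smoothing=4.0):
    """Sketch of an RDF-style feature vector around one reacting point.

    center    -- (x, y, z) of the reacting point (hypothetical input)
    atoms     -- list of ((x, y, z), property_value) for active-site atoms
    r_max     -- largest radius sampled, in the same unit as the coordinates
    step      -- spacing between sampled radii
    smoothing -- width parameter of the Gaussian term (assumed form)
    """
    n_bins = int(round(r_max / step)) + 1
    radii = [k * step for k in range(n_bins)]
    profile = [0.0] * n_bins
    for coords, prop in atoms:
        # Distance from the reacting point to this atom.
        d = math.dist(center, coords)
        for k, r in enumerate(radii):
            # Each atom contributes its property value, smeared around its
            # own distance, so the profile depends only on radial position.
            profile[k] += prop * math.exp(-smoothing * (r - d) ** 2)
    return radii, profile


# Toy example: two atoms with made-up property values (e.g. partial charges).
reacting_point = (0.0, 0.0, 0.0)
site_atoms = [((3.0, 0.0, 0.0), 1.0), ((0.0, 4.0, 0.0), -0.5)]
radii, profile = radial_descriptor(reacting_point, site_atoms)
```

In this toy case the profile has a positive peak at the radius of the first atom and a negative one at the radius of the second, and the fixed-length list `profile` can serve directly as a feature vector for comparing sites.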