Protein three-dimensional (3D) structural information helps to elucidate the functional properties of proteins and the precise mechanisms by which they act in physiological and pathological processes.1
Knowledge of 3D protein structures linked to small molecules can be used for structure- and ligand-based drug design approaches.2
It also gives direct insight into protein functional mechanisms. A protein’s activity often depends on a small, highly conserved set of residues within the binding site.4
Comparison and detection of protein binding sites are key steps for annotating structures with functional predictions. In this field, Structural Genomics consortia have radically changed the body of protein structural knowledge. Their efforts have yielded numerous structures annotated as “Unknown function”, and many functional sites are not associated with any known binding partner.6
Consequently, the development of computational methods to functionally annotate protein structures has become a major research area.
The simplest approaches are based on sequence similarity, eg, PSI-BLAST,7 or on the characterization of functional patterns or profiles, eg, PROSITE.8
They draw on existing knowledge and assumptions about protein function to assign predicted functions. However, they cannot capture the complexity of local 3D folds. In recent years, various methods for comparing and detecting binding sites have been developed; they use diverse types of descriptors. Their general purpose is often to provide automated functional annotation that is independent of amino acid sequence and of global fold similarity, eg, CavBase.9
Some of these approaches share broad features, but they also differ in notable ways. For instance, SiteEngine and CavBase both associate physico-chemical properties with structural characteristics. However, SiteEngine allows entire protein surfaces to be compared against a binding site database, whereas CavBase is restricted to cavity comparisons. The web-based version of SiteEngine is restricted to the comparison of a single site versus one protein structure.10
CavBase detects related cavities using a clique detection algorithm,9 while CPASS compares pairs of binding sites by aligning them with a root-mean-square difference (RMSD) scoring function.12
Roterman has developed an innovative methodology based on irregular hydrophobicity distribution.14
A few other methods characterize binding sites by detecting conserved residues, eg, the evolutionary trace method15 or sequence alignment against a dedicated dataset such as the Catalytic Site Atlas (CSA).4
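As an illustration of the RMSD measure mentioned above for CPASS, the following minimal Python sketch computes the RMSD between two sets of paired atomic coordinates that are assumed to be already superimposed. It is not the CPASS scoring function itself, whose details are not given here; the function name `rmsd` and the pairing convention are assumptions made for this example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square difference between two equally long lists of
    (x, y, z) points; point i of coords_a is paired with point i of coords_b.

    Assumption: both sets are already superimposed in the same frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared Euclidean distances over all paired points.
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

A lower value indicates that the paired atoms of the two sites lie closer together after alignment.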
In this research area, SuMo is a powerful technology for locating similar local regions on protein surfaces, ie, binding sites.18
Each chemical property, or interaction, of an amino acid residue is represented by a specific surface chemical feature (SCF). SCFs are grouped into triangles, each of which constitutes a vertex of the SuMo graph. Because each SCF carries its own geometrical properties and triplets obey specific superimposition rules (distance, angle), the comparison heuristic is extremely rapid. The comparison of a 3D pattern against all the binding sites of the PDB can be performed in a few minutes.19
MED-SuMo is the latest evolution of the SuMo software, developed by MEDIT-SA (see http://www.medit.pharma.com/). Recent developments have improved its binding site database and added novel functional annotation tools, as presented in a recent study.20
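To make the idea of SCF triangles more concrete, the sketch below groups typed features into triplets and compares two triplets by feature type and side length. It is only a simplified illustration and not the SuMo or MED-SuMo implementation: the function names, the feature labels, the 8.0 Å edge cutoff and the 1.0 Å tolerance are all hypothetical, and only a distance rule is applied (the angle rules mentioned above are omitted).

```python
import math
from itertools import combinations


def build_triangles(features, max_edge=8.0):
    """Group typed features into triangles.

    features: list of (feature_type, (x, y, z)) tuples.
    Returns a list of (sorted_types, sorted_side_lengths) tuples for every
    triplet whose longest side does not exceed max_edge (hypothetical cutoff).
    """
    triangles = []
    for trio in combinations(features, 3):
        sides = [math.dist(p[1], q[1]) for p, q in combinations(trio, 2)]
        if max(sides) <= max_edge:
            types = tuple(sorted(f[0] for f in trio))
            triangles.append((types, tuple(sorted(sides))))
    return triangles


def triangles_compatible(t1, t2, tol=1.0):
    """Crude filter: same multiset of feature types and each sorted side
    length within tol (hypothetical tolerance) of its counterpart."""
    same_types = t1[0] == t2[0]
    similar_sides = all(abs(a - b) <= tol for a, b in zip(t1[1], t2[1]))
    return same_types and similar_sides
```

Sorting types and side lengths independently discards which feature sits at which corner, so this filter is more permissive than a real superimposition test; it only shows why cheap per-triangle checks can prune most candidate matches quickly.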
Proteins are also classified according to their folds,21 eg, in SCOP (Structural Classification of Proteins),22 which provides a manually refined classification with detailed and comprehensive descriptions of the structural and evolutionary relationships among known protein structures.22
However, a critical limitation of these fold-based classifications is their reliance on complete protein folds or protein domains. Fold similarity does not necessarily imply functional similarity. In this paper, we focus on an interesting SCOP superfamily which includes the heat shock protein 90 SCOP family (HSP90, see ).
Heat shock protein 90 (HSP90) SCOP superfamily: GHKL: HSP90, MutL proteins, pyruvate dehydrogenase kinase and DNA topoisomerase VI all share this fold.
HSP90 is one of the most abundant cellular proteins. Its different forms mainly exhibit chaperone functions associated with protein folding, cell survival,24 apoptosis and tumor repression.25
It binds ATP (see ) and is the target of some innovative drugs, including geldanamycin, which has produced a 50% reduction in tumor growth,26 and celastrol, which disrupts interactions between HSP90 and Cdc37 in pancreatic cancer cells.27
Some recent research has focused on a new potential drug, radicicol. This molecule has a very high affinity for HSP90 (20 nM).28
shows the association of the drug with HSP90 at the binding site normally occupied by the natural ligand.28
However, radicicol is not specific to HSP90, as it also binds the bacterial sensor kinase PhoQ29 and topoisomerase VI.30
Interestingly, the HSP90 chaperone, MutL/DNA topoisomerase and histidine kinases share a common fold (see ), and a common ATP-binding region has been detected (see ).
Figure 2 An example of heat shock protein 90 (HSP90) bound to its natural ligand. The protein shown is an HSP90 of Saccharomyces cerevisiae (PDB code 1AMW). a–b) highlight the close contacts (in red) of the ADP (in blue). c–d) highlight in green ...
An example of heat shock protein 90 (HSP90) bound to radicicol. Both views represent an HSP90 of Saccharomyces cerevisiae (PDB code 1BGQ) bound to the drug radicicol shown in blue (see to compare with the natural ligand of HSP90).
To analyze the similar and different features of this fold, we use a novel classification method, MED-SuMo Multi approach (MED-SMA), based on the MED-SuMo technology. In this work, binding sites from the SCOP superfamily ATPase domain of HSP90 chaperone/DNA topoisomerase II/histidine kinase proteins are gathered in a dataset, compared pairwise and classified using the Markov Cluster Algorithm (MCL).31
Results from this method highlight common and distinct functional features between the analyzed proteins.
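The clustering step can be illustrated with a simplified pure-Python version of the Markov clustering procedure, which alternates expansion (matrix squaring) and inflation (element-wise powering followed by column normalization) on a column-stochastic matrix. This is a sketch under stated assumptions and not the MCL program or the MED-SMA pipeline: the function names, the `inflation` value, the self-loop weight and the convergence thresholds are hypothetical, and the input is assumed to be a symmetric matrix of non-negative pairwise binding site similarity scores.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (columns summing to 0 are left as is)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster items from a symmetric, non-negative similarity matrix."""
    n = len(similarity)
    m = [[float(similarity[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        m[i][i] = max(m[i][i], 1.0)  # self-loops (hypothetical weight of 1.0)
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along paths
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded]
        )  # inflation: strong flow is boosted, weak flow is damped
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow defines a cluster: the columns it reaches.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)
```

Higher inflation values generally yield finer clusters, so in practice this parameter would need to be tuned to the similarity scores being clustered.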