Drug Des Devel Ther. 2009; 3: 59–72.
Published online Sep 21, 2009.
PMCID: PMC2769237
Analysis of HSP90-related folds with MED-SuMo classification approach
Olivia Doppelt-Azeroual,1,2 Fabrice Moriaud,1 François Delfaud,1 and Alexandre G de Brevern2
1 MEDIT SA, Palaiseau, France
2 INSERM UMR-S 726, Equipe de Bioinformatique Génomique and Moléculaire, (EBGM), DSIMB, Institut National de Transfusion Sanguine (INTS), Université Paris Diderot, Paris, France
Correspondence: Olivia Doppelt-Azeroual, MEDIT SA, 2 rue du Belvédère, 91120, Palaiseau, France, Tel +33160148743, Fax +33160148473, Email olivia.doppelt/at/univ-paris-diderot.fr
Three-dimensional structural information is critical for understanding functional protein properties and the precise mechanisms of protein functions implicated in physiological and pathological processes. Comparison and detection of protein binding sites are key steps for annotating structures with functional predictions and are extremely valuable steps in a drug design process. In this research area, MED-SuMo is a powerful technology to detect and characterize similar local regions on protein surfaces. Each amino acid residue’s potential chemical interactions are represented by specific surface chemical features (SCFs). The MED-SuMo heuristic is based on the representation of binding sites by a graph structure suitable for exploration by an efficient comparison algorithm. We use this approach to analyze one particular SCOP superfamily which includes HSP90 chaperone, MutL/DNA topoisomerase, histidine kinases, and α-ketoacid dehydrogenase kinase C (BCK). They share a common fold and a common region for ATP-binding. To analyze both similar and differing features of this fold, we use a novel classification method, the MED-SuMo multi approach (MED-SMA). We highlight common and distinct features of these proteins. The different clusters created by MED-SMA yield interesting observations. For instance, one cluster gathers three types of proteins (HSP90, topoisomerase VI, and BCK) which all bind the drug radicicol.
Keywords: functional classification, surface similarity, protein surface chemical feature, radicicol binding
Protein three-dimensional (3D) structural information helps to understand functional protein properties and the precise mechanisms of proteins implicated in physiological and pathological processes.1 Knowledge of 3D protein structures linked to small molecules can be used for structure- and ligand-based drug design approaches.2,3 It also gives direct hints about protein functional mechanisms. A protein’s activity often depends on a small, highly conserved set of residues within the binding site.4,5 Comparison and detection of protein binding sites are key steps for annotating structures with functional predictions. In this field, Structural Genomics consortia have radically changed the available base of protein structural knowledge. Their endeavors have permitted the resolution of numerous structures characterized as “Unknown function”, and multiple functional sites are not associated with any known binding partner.6 Consequently, the development of computational methods to functionally annotate protein structures has become a major research area.
The simplest approaches are based on sequence analogy, eg, PSI-BLAST,7 or on the characterization of functional patterns or profiles, eg, PROSITE.8 They draw on existing knowledge and assumptions about protein functions to assign predicted functions. However, they cannot capture the complexity of local 3D folds. In recent years, various methods to compare and detect binding sites have been developed; they use diverse types of descriptors. Their general purpose is often to provide automated functional annotation independent of amino acid sequence or of global fold similarity, eg, CavBase,9 SiteEngine,10 FLAP,11 CPASS,12 or eF-seek.13
Some of these approaches share broad features but they also have notable distinctions. For instance, SiteEngine and CavBase both associate physico-chemical properties with structural characteristics. However, SiteEngine allows the comparison of entire protein surfaces to a binding site database, whereas CavBase is restricted to cavity comparisons. The web-based version of SiteEngine is restricted to the comparison of a single site versus one protein structure.10 CavBase detects related cavities based on a clique detection algorithm,9 while CPASS aligns pairs of binding sites using a root-mean-square-difference (RMSD) scoring function.12 Roterman has developed an innovative methodology based on irregular hydrophobicity distribution.14 A few other methods are based on the detection of conserved residues to characterize binding sites, eg, the evolutionary trace method15–17 or sequence alignment with a dedicated dataset such as the Catalytic Site Atlas (CSA).4
In this research area, SuMo is a powerful technology to localize similar local regions on protein surfaces, ie, binding sites.18 Each chemical property, or interaction, of an amino acid residue is represented by a specific surface chemical feature (SCF). These are gathered in triangles, each of which constitutes a SuMo graph vertex. Since each SCF is associated with heterogeneous geometrical properties, and triplets have specific superimposition rules (distance, angle), the comparison heuristic is extremely rapid. The comparison of a 3D pattern against all the binding sites of the PDB can be performed in a few minutes.19 MED-SuMo is the latest evolution of the SuMo software developed by MEDIT-SA (see http://www.medit.pharma.com/). Recent developments have improved its binding site database, and have included novel functional annotation tools as presented in a recent study.20
Proteins are also classified according to their folds,21 eg, in SCOP (Structural Classification of Proteins),22,23 which provides a manually refined classification with detailed and comprehensive descriptions of the structural and evolutionary relationships of known protein structures. However, a critical limitation of these fold-based classifications is the use of complete protein folds or protein domains. Similarity of fold does not necessarily correspond to similarity of function. In this paper, we focus on an interesting SCOP superfamily which includes the heat shock protein 90 SCOP family (HSP90, see Figure 1).
Figure 1
Heat shock protein 90 (HSP90) SCOP superfamily: GHKL: HSP90, MutL proteins, pyruvate dehydrogenase kinase and DNA topoisomerase VI all share this fold.
HSP90 is one of the most abundant proteins. Its different forms exhibit mainly chaperone functions associated with protein folding, cell survival,24 apoptosis and tumor repression.25 It binds ATP (see Figures 2a and 2b) and is the target of some innovative drugs including geldanamycin, which has enabled a 50% reduction of tumor growth,26 and celastrol, which disrupts interactions between HSP90 and Cdc37 in pancreatic cancer cells.27 Some recent research has focussed on a new potential drug, radicicol. This molecule has a very high affinity for HSP90 (20 nM).28 Figure 3 shows the association of the drug with HSP90 at the binding site normally filled with a natural ligand.28 However, radicicol is not specific to HSP90, as it also binds the bacterial sensor kinase PhoQ29 and topoisomerase VI.30 An interesting detail is that HSP90 chaperone, MutL/DNA topoisomerase and histidine kinases share a common fold (see Figure 1) and that a common ATP-binding region has been detected (see Figures 2c and 2d).
Figure 2
An example of heat shock protein 90 (HSP90) bound to its natural ligand. The protein shown is an HSP90 of Saccharomyces cerevisiae (PDB code 1AMW). Panels a and b highlight the close contacts (in red) of the ADP (in blue). Panels c and d highlight in green …
Figure 3
An example of heat shock protein 90 (HSP90) bound to radicicol. Both views represent an HSP90 of Saccharomyces cerevisiae (PDB code 1BGQ) bound to the drug radicicol shown in blue (see Figure 2 to compare with the natural ligand of HSP90).
To analyze the similar and different features of this fold, we use a novel classification method, MED-SuMo Multi approach (MED-SMA), based on the MED-SuMo technology. In this work, binding sites from the SCOP superfamily ATPase domain of HSP90 chaperone/DNA topoisomerase II/histidine kinase proteins are gathered in a dataset, compared pairwise and classified using the Markov Cluster Algorithm (MCL).31 Results from this method highlight common and distinct functional features between the analyzed proteins.
Protein structure database
The SCOP web site provides the list of proteins associated with a selected fold.23 The “ATPase domain of HSP90 chaperone/DNA topoisomerase II/histidine kinase” superfamily contains 116 PDB structures (see http://scop.berkeley.edu/data/scop.b.e.ccg.A.html). The protein binding sites of these structures were selected to perform the classification.
MED-SuMo algorithm
MED-SuMo is designed to localize similar regions associated with a defined function.18–20 A key advantage is its ability to detect binding site similarities even when local flexibility is observed. Its heuristic is based on a 3D representation of macromolecules using precise SCFs. For MED-SuMo, a protein structure is represented by a set of functional groups including, for example, unbound hydrogen bond (Hbond) donors or acceptors, accessible sides of aromatic rings and carboxylates, charges, and hydroxyl groups. Each feature encodes its chemical characteristics with precise geometrical properties. The overall MED-SuMo comparison methodology is presented in Figure 4. SCFs are placed on the protein structure through a lexicographic analysis of the atoms in the PDB files, ie, a residue is represented by a set of representative SCFs (cf. Figures 4a, 4b). Their positions and orientations are filtered as shown in Figure 4c. The remaining SCFs are assembled into triplets with specific geometric characteristics, eg, edge size, perimeter, angles (cf. Figure 4d). The full triplet network is stored in the MED-SuMo database as a graph data structure where triplets are the vertices and edges connect adjacent triangles (ie, those sharing at least two SCFs).
Figure 4
MED-SuMo comparison procedure. a) Localization of an interesting part of the protein surface often characterized by the presence of a co-crystallized ligand. b) Surface chemical features (SCFs) are displayed on the protein structure through a lexicographic …
To compare graphs, MED-SuMo looks for compatible triplets, ie, triplets composed of compatible SCFs (cf. Figure 4e). These triplets are called comparison “seeds”. When a seed is detected, MED-SuMo extends the comparison to the neighbouring vertices until no more similarities are found. This process yields similar patches (common groups of SCFs) between two graphs, weighted by the MED-SuMo score.18 These comparisons are usually performed between a query and a database of precompiled graphs. Two kinds of MED-SuMo database are commonly used: the binding site database, which is composed from the SCFs around co-crystallized ligands, and the full surface database, composed from SCFs covering the whole surface of each studied protein, typically the entire PDB. The database characteristics are defined by three essential parameters: the size of the ligand environment taken into account by MED-SuMo (named ligand_radius and only concerning the binding site database), the maximal distance between two SCFs to be included in a triplet (named edge_max) and the maximal perimeter for a triangle (named max_edge_sum).
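As an illustration of the graph construction described above, the following Python sketch builds triplets from SCF coordinates and links triangles that share at least two SCFs. It is a simplified sketch under stated assumptions, not the MED-SuMo implementation: the function name build_triplet_graph, the input format (a dictionary mapping SCF identifiers to coordinates), the toy coordinates and the brute-force enumeration are all invented for illustration, and SCF types and orientations are ignored. Only the parameter names edge_max and max_edge_sum come from the text.

```python
from itertools import combinations
from math import dist


def build_triplet_graph(scfs, edge_max, max_edge_sum):
    """Return (triplets, adjacency) for a dict {scf_id: (x, y, z)}.

    Hypothetical helper: keeps every SCF triplet whose longest edge is at
    most edge_max and whose perimeter is at most max_edge_sum, then links
    triplets that share at least two SCFs.
    """
    triplets = []
    for a, b, c in combinations(sorted(scfs), 3):
        edges = (
            dist(scfs[a], scfs[b]),
            dist(scfs[b], scfs[c]),
            dist(scfs[a], scfs[c]),
        )
        if max(edges) <= edge_max and sum(edges) <= max_edge_sum:
            triplets.append(frozenset((a, b, c)))

    # Triplets are the graph vertices; two vertices are adjacent when the
    # corresponding triangles have two SCFs (one edge) in common.
    adjacency = {t: set() for t in triplets}
    for t, u in combinations(triplets, 2):
        if len(t & u) >= 2:
            adjacency[t].add(u)
            adjacency[u].add(t)
    return triplets, adjacency


# Toy coordinates in angstroms (invented for the example, not real SCFs).
toy_scfs = {
    "donor_1": (0.0, 0.0, 0.0),
    "acceptor_1": (3.0, 0.0, 0.0),
    "aromatic_1": (0.0, 4.0, 0.0),
    "hydroxyl_1": (3.0, 4.0, 0.0),
    "charge_far": (20.0, 0.0, 0.0),
}
triplets, adjacency = build_triplet_graph(toy_scfs, edge_max=6.0, max_edge_sum=13.0)
print(len(triplets), "triplets kept")
```

In this toy case the distant SCF cannot take part in any triangle, so only the four nearby features contribute vertices. A real implementation would avoid enumerating all triplets, for example by using a spatial index.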
Classification of protein binding sites
As noted, MED-SuMo offers an original approach to detect structural and functional similarities between protein binding sites.18–20 We decided to apply this approach to classify defined sets of structures. This new method, named MED-SuMo Multi Approach (MED-SMA), enables the comparison of all binding sites from a set of proteins using a pairwise comparison system. Matching regions are found in the binding sites to derive a similarity graph. This graph is classified with MCL.31 Figure 5 illustrates the overall procedure. For this work, MED-SMA is only applied to the MED-SuMo binding site database.
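The MCL step can be sketched in a few lines. MCL alternates expansion (squaring a column-stochastic matrix) and inflation (raising each entry to a power and renormalizing the columns) until the matrix stops changing. The pure-Python version below is a didactic sketch only: the function names, the self-loop convention, the default inflation of 2.0 and the toy scores are assumptions, and the study itself would have relied on an existing MCL implementation rather than on code like this.

```python
def _normalise_columns(matrix):
    """Scale every column so that it sums to 1."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def markov_cluster(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a symmetric, non-negative similarity matrix (list of lists)."""
    n = len(similarity)
    m = [[float(v) for v in row] for row in similarity]
    for i in range(n):
        # Self-loops keep every column non-empty (assumed convention).
        m[i][i] = max(max(m[i]), 1.0)
    m = _normalise_columns(m)

    for _ in range(max_iter):
        # Expansion: one step of matrix squaring.
        expanded = [
            [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        # Inflation: element-wise power, then column normalisation.
        inflated = _normalise_columns(
            [[v ** inflation for v in row] for row in expanded]
        )
        change = max(
            abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n)
        )
        m = inflated
        if change < tol:
            break

    # Rows that keep non-zero entries list the members of one cluster.
    clusters = []
    for row in m:
        members = [j for j, v in enumerate(row) if v > 1e-6]
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Toy similarity scores between five binding sites (invented values);
# a zero means the pair did not pass the score cut-off.
toy_scores = [
    [0, 5, 5, 0, 0],
    [5, 0, 5, 0, 0],
    [5, 5, 0, 0, 0],
    [0, 0, 0, 0, 6],
    [0, 0, 0, 6, 0],
]
toy_clusters = markov_cluster(toy_scores)
print(toy_clusters)
```

Higher inflation values generally produce finer clusters; the value used in the study is not stated in the text.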
Figure 5
Global steps of the binding site classification heuristic. The MED-SuMo Multi approach (MED-SMA) can be divided into 5 steps: a) Database construction: all selected binding sites are stored as graphs in the MED-SuMo database. b) Pairwise comparisons: all binding …
To begin, a set of proteins is selected (see previous paragraph, cf. Figure 5a). Ligand characteristics are used to decide which binding sites to include in the MED-SuMo database. Once the ligand parameters are set, the database is created and the pairwise comparison is launched using the standard MED-SuMo comparison procedure.
These comparisons highlight similar regions between pairs of binding sites (cf. Figure 5b) represented by groups of SCFs called patches. Only comparisons with a MED-SuMo score higher than a fixed cut-off (parameter score_min) are accepted. Patches associated with the same binding site are analyzed: if two patches share enough SCFs (defined by a threshold parameter named covering_factor), they are merged into a multipatch (cf. Figure 5c). A multipatch is a set of SCFs common to several binding sites of the protein set; multipatches can also be called sub-sites. They represent the meaningful common regions of binding sites. They have two properties: (i) enough SCFs are in common, such that binding sites are structurally and chemically similar, and (ii) they can provide a measure of sub-pocket similarity. These measures are used to compute a similarity matrix. For this matrix, the MED-SuMo score between matching multipatches is calculated (cf. Figure 5d). MCL is used to interpret the matrix through classification of the protein binding site set into clusters of sub-sites (cf. Figure 5e). A 2D plot of the clusters can be visualized using tools such as Biolayout.32,33
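The filtering and merging steps above can be sketched as follows. The text does not define how the shared fraction of SCFs is computed, so this sketch assumes it is the size of the intersection divided by the size of the smaller patch; the function names, the greedy merging order and the toy hits are also assumptions made for illustration. Only the parameter names score_min and covering_factor come from the text.

```python
def accepted_patches(hits, score_min):
    """Keep the patches of comparisons scoring above the cut-off.

    hits: list of (score, patch) pairs, a patch being a set of SCF ids
    located on one binding site.
    """
    return [set(patch) for score, patch in hits if score > score_min and patch]


def merge_patches(patches, covering_factor):
    """Greedily merge patches that share enough SCFs into multipatches.

    Assumed overlap measure: |A & B| / min(|A|, |B|).
    """
    multipatches = [set(p) for p in patches if p]
    merged = True
    while merged:
        merged = False
        for i in range(len(multipatches)):
            for j in range(i + 1, len(multipatches)):
                a, b = multipatches[i], multipatches[j]
                if len(a & b) / min(len(a), len(b)) >= covering_factor:
                    multipatches[i] = a | b
                    del multipatches[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return multipatches


# Toy comparison results for one binding site (invented scores and SCF ids).
toy_hits = [
    (5.2, {1, 2, 3, 4, 5}),
    (3.9, {7, 8, 9}),        # below the cut-off, discarded
    (4.5, {3, 4, 5, 6}),
    (6.0, {10, 11, 12}),
]
kept = accepted_patches(toy_hits, score_min=4.0)
multipatches = merge_patches(kept, covering_factor=0.6)
print(multipatches)
```

Here the first and third patches share three of the four SCFs of the smaller patch (0.75) and are merged, while the last patch stays on its own.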
MED-SMA classification
To generate the MED-SuMo database, only binding sites co-crystallized with ligands of more than ten atoms are selected. Of the originally selected 116 PDB structures, 101 satisfy this filter. This yields a total of 146 binding sites in the final database. Several kinds of ligands are present: purines, eg, adenosine triphosphate or N-ethyl-5′-carboxamido adenosine, and potential drugs, eg, radicicol or novobiocin. Of these 146 binding sites, 78 are from HSP90, 38 from topoisomerase/MutL, 26 from histidine kinase, and four from α-ketoacid dehydrogenase kinase C (BCK). The database parameters are set to a ligand radius of 6.0 Å and triangle parameters of 13 Å and 39 Å (edge_max and max_edge_sum, respectively). Classifying this dataset with MED-SMA takes around two minutes on a four-CPU machine. The classification parameters are set to a minimal compatibility score (score_min) of 4.0 and a covering_factor of 0.6.
Here, the MED-SMA approach produces five clusters. The distribution of these clusters with regard to the SCOP families is shown in Table 1 and the composition of each cluster is available in Supplementary table 1.
Table 1
Confusion matrix of the SCOP families within the clusters. The MED-SuMo clusters are arranged vertically whereas the SCOP families are arranged horizontally. MED-SuMo clusters #1, #3 and #5 are homogeneous clusters; they only contain proteins from: SCOP …
Two types of MED-SMA clusters are seen. Three clusters are homogeneous as they contain only proteins from a unique SCOP family (MED-SMA clusters 1, 3, and 5). Two clusters are heterogeneous as they contain at least two SCOP families (MED-SMA clusters 2 and 4). MED-SMA clusters 1 and 3 are specific to topoisomerase/MutL while cluster 5 is specific to histidine kinase. MED-SMA cluster 2 contains binding sites from two families (ie, BCK and histidine kinase) and MED-SMA cluster 4’s binding sites are from three of the four families (HSP90, topoisomerase/MutL, and BCK).
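The homogeneous and heterogeneous clusters described above can be checked with a short tally. The counts below were tallied from Supplementary table 1 rather than copied from Table 1, under the assumption that the anti-sigma factor spoIIab, PhoQ and TM0853 entries belong to the histidine kinase family and that the pyruvate dehydrogenase kinase entries belong to the BCK family; this grouping is consistent with the family totals given in the text (78, 38, 26 and 4). The variable and function names are illustrative.

```python
# Binding sites per MED-SMA cluster and SCOP family (tallied from the
# supplementary table under the grouping assumption stated above).
cluster_counts = {
    1: {"topoisomerase/MutL": 22},
    2: {"BCK": 3, "histidine kinase": 15},
    3: {"topoisomerase/MutL": 6},
    4: {"HSP90": 78, "topoisomerase/MutL": 10, "BCK": 1},
    5: {"histidine kinase": 11},
}


def homogeneous_clusters(counts):
    """Clusters whose binding sites all come from a single SCOP family."""
    return sorted(c for c, families in counts.items() if len(families) == 1)


def family_totals(counts):
    """Total number of binding sites per SCOP family over all clusters."""
    totals = {}
    for families in counts.values():
        for family, n in families.items():
            totals[family] = totals.get(family, 0) + n
    return totals


totals = family_totals(cluster_counts)
print("homogeneous clusters:", homogeneous_clusters(cluster_counts))
print("family totals:", totals)
print("all binding sites:", sum(totals.values()))
```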
MED-SMA clusters 1 and 3
MED-SMA clusters 1 and 3 contain 22 and 6, respectively, of the 38 binding sites of the topoisomerase/MutL/DNA gyrase family. The two forms of topoisomerase IV of Escherichia coli (PDB codes 1S14 and 1S16) share 99.5% sequence identity except for a 23-residue insertion in 1S16. These two proteins are separated by MED-SMA. A close look at their ATP-binding sites reveals structural similarities but, above all, some strong distinctions. Figure 6 shows a 3D superimposition of these proteins. The region noted (1) on Figure 6 shows an excellent superimposition of several β-sheets and two α-helices. Moreover, part of the binding sites is also similar, with a set of five SCFs well superimposed (noted [2] on Figure 6). Conversely, the other side of the binding site (noted [3] on Figure 6) is quite different. The ligands of these two topoisomerases are novobiocin for 1S14 and phosphoaminophosphonic acid-adenylate ester (ANP) for 1S16. They are not located at the same spatial position and their overlap is small (~10 atoms) compared to their respective sizes (44 atoms for novobiocin and 31 atoms for ANP). Furthermore, novobiocin cannot fit in the 1S16 binding site, because it would clash sterically with 1S16’s α-helices (noted [4] on Figure 6). Thus, binding sites from MED-SMA clusters 1 and 3 do not share sufficient similarities to be gathered by MED-SMA, nor can they bind the same kind of molecules. Interestingly, the two forms are very close but the residue insertion causes strongly diverging affinities to ligands of this class.34 Our results thus reinforce the study of Bellon and colleagues. Moreover, they show that these two distinct local conformations are found in different related proteins.
Figure 6
Superimposition of two topoisomerases IV separated by MED-SMA. PDB codes 1S16 (red) and 1S14 (green) are superimposed. They are both topoisomerases but their binding sites do not share enough similarity to be grouped in the same cluster. This figure is …
MED-SMA cluster 4
As mentioned earlier, MED-SMA cluster 4 gathers three different SCOP families. It is the largest cluster, containing 89 binding sites. All HSP90s of the dataset are present (78 binding sites), 10 binding sites are from the MutL/DNA topoisomerase family (one topoisomerase VI, five MutL, and four PMS2) and one is from the BCK family. Only the histidine kinase family is not represented in this MED-SMA cluster. The ligands are highly diverse, with 48 unique ligands found.
Binding sites in this MED-SMA cluster share a common set of SCFs. Figure 7 shows a global superimposition of one structure of each family. The white rectangles show similarities whereas the remainder is very different as represented in the global superimposition of all the protein families in Figure 1. Figure 8 shows a close view around the radicicol. The eight labelled SCFs (circled in yellow) are shared by all superimposed structures in Figure 7. They are located all around the ligand meaning that the similarities concern the whole binding site.
Figure 7
Superimposition of four proteins from three distinct SCOP families but gathered in the same cluster by MED-SMA (PDB codes 2HKJ [green], 2CCT [cyan], 1B63 [pink], 1JM6 [yellow]). The white rectangles show similarities around the ligands and also the helices …
Figure 8
A close view around the radicicol ligand. The eight labelled SCFs (circled in yellow) are shared by all superimposed structures in Figure 7. They are located all around the ligand, which means that the similarities concern the whole binding site.
The fact that MED-SMA gathers the binding sites from three different SCOP families implies a high probability that the binding modes are related. Considering that the nonspecific drug radicicol binds both HSP90 and topoisomerase VI,30 we can hypothesize that this drug would also bind the other proteins included in this MED-SMA cluster.
MED-SMA clusters 2 and 5
MED-SMA clusters 2 and 5 mostly consist of histidine kinases. MED-SMA cluster 2 is heterogeneous while MED-SMA cluster 5 is homogeneous. Cluster 5 is of particular interest because it is pure and because the dimensions of its binding sites are very similar, as they all bind purine ligands. Since the binding sites gathered by MED-SMA share binding modes to ligands, this type of cluster could be used to search for specific drugs; here, drugs to inhibit the action of histidine kinase CheA.
Interestingly, MED-SMA cluster 2 also contains two histidine kinase CheA binding sites (PDB codes 2CH4 and 1I5D). The separation of proteins from the same family into two different clusters is due to differences between their binding sites. When 1I5D’s binding site is compared to the histidine kinase CheA binding sites from cluster 5, the MED-SuMo score is less than 4.0 (which is the cut-off we chose for the pairwise comparison step). So, a drug designed to inhibit binding sites of cluster 5 would not bind (or not with the same affinity) to the two excluded histidine kinase CheA binding sites.
Another interesting point about MED-SMA cluster 2 is that it contains both BCK and anti-sigma factor spoIIab. Like HSP90, these two proteins are inhibited by radicicol. However, as they are not associated with MED-SMA cluster 4, this may reflect a specific binding mode.
The detection of functional sites on protein surfaces is important for the identification of biological activity. Ligand-protein interactions occur for the majority of protein structures and they are implicated in major biological processes. However, with no help from known related sequences or structures their detection is difficult.14 Several innovative approaches have been proposed, eg, the use of hydrophobicity distribution on protein structures based on the fuzzy oil drop model,35 the destabilization of limited protein regions,36 phylogenomic classification of protein sequences37 or the classification of known protein catalytic sites.38 Prediction of protein functional sites is an important step to identify small-molecule interactions for drug discovery39 and it can be very useful to optimize drug design.40 Another valuable application is as a pre-processing step to reduce the search space for rigorous computational docking algorithms.
Methods to compare binding sites have been developed using various kinds of structural descriptors (eg, CavBase uses pseudocenters41) and rely on the strong hypothesis that chemical similarity and activity are linked. In this field, MED-SuMo has an interesting approach using SCFs. Each SCF represents a pertinent chemical property and is described with specific geometric rules. The search for equivalent binding sites is performed by detection of similar graphs.42 The specific geometric rules of each SCF enable the heuristic to be quite fast. So, MED-SuMo provides an original method to detect structural and functional similarities between protein binding sites. Unlike MED-SuMo, very few methods enable functional classification of sets of binding sites,43 and published work usually focuses on specific binding sites (eg, protein kinases). Comparing our protocol with others is therefore quite difficult.
Here, MED-SuMo is applied in a new clustering approach where the ligand environment is classified. We describe an application to a particular protein fold, the Bergerat ATP-binding fold, characterized as the ATPase domain of HSP90 chaperone/DNA topoisomerase II/histidine kinase SCOP superfamily. The constituent families are quite different but their ATP binding sites appear quite alike. MED-SMA detects five different clusters. Three out of five are specific to a single family. These three MED-SMA clusters highlight the specificity of the binding sites; for example, no molecule binding to cluster 1’s binding site would also bind MED-SMA cluster 2 sites with the same interactions. The fact that the ligands are similar in MED-SMA clusters 1 and 2 (eg, ADP) emphasizes the previous observation: the ligands are the same whereas the binding modes are different. Conversely, MED-SMA cluster 4 gathers three different families. The 3D superimposition from MED-SuMo points out the difference of the global fold, whereas the Bergerat fold can be observed (white rectangle on Figure 7). Interestingly, SCFs can be found all around the query ligand (cf. Figure 7), meaning that there is a global similarity of the binding sites from the three SCOP families. Moreover, this result is consistent with the experimental data, as the proteins from these three SCOP families all bind radicicol.28–30,44
These different results demonstrate the ability of the method to gather binding sites with related binding modes. This kind of relationship between families is very interesting and their identification is a direct application for MED-SMA. Moreover, with this kind of association, we can validate the assertion that functions can be assigned to unknown proteins by associating them to a specific best matching cluster. Matching clusters rather than single structures overcomes most of the noise in both the assignments and in the functions of those assigned matches. Other applications are planned, for example, a more general kinase classification using MED-SMA is under investigation.
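The idea of annotating an unknown binding site through its best matching cluster can be sketched as below. This is a hypothetical scoring rule, not a procedure described in the paper: the function name, the use of the mean score over all cluster members and the example scores are assumptions chosen only to illustrate why matching a whole cluster is less sensitive to a single noisy comparison than matching one structure.

```python
def assign_to_cluster(scores_by_cluster, score_min):
    """Return the cluster with the highest mean score, or None.

    scores_by_cluster: dict {cluster_name: [score against each member]}.
    A cluster is only eligible if its mean score exceeds score_min
    (assumed rule, reusing the pairwise cut-off name from the text).
    """
    best_name, best_mean = None, score_min
    for name, scores in scores_by_cluster.items():
        if not scores:
            continue
        mean = sum(scores) / len(scores)
        if mean > best_mean:
            best_name, best_mean = name, mean
    return best_name


# Invented scores of one query binding site against members of three clusters.
toy_query_scores = {
    "cluster 2": [4.2, 3.1],
    "cluster 4": [6.1, 5.8, 7.0],
    "cluster 5": [2.0],
}
best = assign_to_cluster(toy_query_scores, score_min=4.0)
print(best)
```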
This example shows that our approach is well suited for finding common and distinct characteristics of ligand binding pockets. Thus, close proteins can have different local binding modes, while more distant ones can share common binding features, ie, a cross-reaction may be possible. For instance, proteins associated with radicicol are found in the same MED-SMA clusters. This approach is clearly applicable to structural genomics research. As noted by Ferrè and colleagues, functional patches associated with a large collection of protein surface cavities can be used to provide functional clues for proteins with unknown structures.45 Our study supports this observation. Thus, MED-SuMo is an approach that may improve the efficiency and effectiveness of early steps along the drug discovery path, improving early lead choices, enhancing poor leads, or aiding multivariate optimizations. This study further demonstrates that MED-SuMo is appropriate both for annotating protein structures and for deriving structural functional classifications.
Finally, with its effectiveness at dealing with the entire PDB, and with the parallelisation of the computational process in progress, MED-SuMo is well-suited to large-scale applications. In fact, it is currently used to address the major challenge of the POPS project (see http://www.pops-systematic.org/): classifying every binding site represented in the PDB.
Software licensing
Commercial information regarding MED-SuMo is available at http://www.medit.fr/. Questions about MED-SuMo licensing should be addressed to info/at/medit.fr. Researchers from the Inserm Institute UMR-S 726 have no financial interests in MEDIT and collaborate with this company only for the present project. MEDIT SA has exclusive rights to MED-SuMo sales.
Supplementary table 1
MED-SMA cluster ID  PDB_LIG_ID  Ligand name  SCOP family
CL_1                1EI1_1_92   ANP          DNA_Gyrase_B_EColi
CL_1                1EI1_2_90   ANP          DNA_Gyrase_B_EColi
CL_1                1MX0_1_31   ANP          TOPO_VI
CL_1                1MX0_2_29   ANP          TOPO_VI
CL_1                1MX0_3_28   ANP          TOPO_VI
CL_1                1MX0_4_25   ANP          TOPO_VI
CL_1                1MX0_5_22   ANP          TOPO_VI
CL_1                1MX0_6_21   ANP          TOPO_VI
CL_1                1PVG_1_1    ANP          DNA_TOPO_II_Byeast
CL_1                1PVG_2_0    ANP          DNA_TOPO_II_Byeast
CL_1                1QZR_1_117  CDX          DNA_TOPO_II_Byeast
CL_1                1QZR_2_113  ANP          DNA_TOPO_II_Byeast
CL_1                1QZR_3_111  ANP          DNA_TOPO_II_Byeast
CL_1                1S16_1_102  ANP          TOPO_IV
CL_1                1S16_2_100  ANP          TOPO_IV
CL_1                1Z59_1_17   ADP          TOPO_VI
CL_1                1Z5A_1_11   ADP          TOPO_VI
CL_1                1Z5A_2_8    ADP          TOPO_VI
CL_1                1Z5B_1_86   ADP          TOPO_VI
CL_1                1Z5B_2_84   ADP          TOPO_VI
CL_1                1Z5C_1_9    ADP          TOPO_VI
CL_1                1Z5C_2_5    ADP          TOPO_VI
CL_2                1GJV_1_112  SAP          alpha-ketoacid_dehydrogenase_kinase
CL_2                1GKZ_1_33   ADP          alpha-ketoacid_dehydrogenase_kinase
CL_2                1I5D_1_118  128          Histidine_Kinase_CheA
CL_2                1ID0_1_10   ANP          Histidine_Kinase_PhoQ
CL_2                1JM6_2_61   ADP          Pyruvate_dehydrogenase_kinase
CL_2                1L0O_1_75   ADP          Anti-sigma_factor_spoIIab
CL_2                1L0O_2_73   ADP          Anti-sigma_factor_spoIIab
CL_2                1TH8_1_123  ADP          Anti-sigma_factor_spoIIab
CL_2                1TH8_1_124  ADP          Anti-sigma_factor_spoIIab
CL_2                1THN_1_104  ADP          Anti-sigma_factor_spoIIab
CL_2                1THN_2_101  ADP          Anti-sigma_factor_spoIIab
CL_2                1TID_1_35   ATP          Anti-sigma_factor_spoIIab
CL_2                1TID_2_32   ATP          Anti-sigma_factor_spoIIab
CL_2                1TIL_1_27   ATP          Anti-sigma_factor_spoIIab
CL_2                1TIL_2_24   ATP          Anti-sigma_factor_spoIIab
CL_2                1TIL_3_23   ATP          Anti-sigma_factor_spoIIab
CL_2                2C2A_1_120  ADP          Sensor_histidine_kinase_TM0853
CL_2                2CH4_1_56   ANP          Histidine_Kinase_CheA
CL_3                1AJ6_1_76   NOV          DNA_GYRASE_B_EColi
CL_3                1KIJ_1_66   NOV          DNA_GYRASE_B_TT
CL_3                1KIJ_2_64   NOV          DNA_GYRASE_B_TT
CL_3                1KZN_1_52   CBN          DNA_GYRASE_B_EColi
CL_3                1S14_1_105  NOV          TOPO_IV
CL_3                1S14_2_103  NOV          TOPO_IV
CL_4                1A4H_1_62   GMY          HSP90_Yeast
CL_4                1AM1_1_37   ADP          HSP90_Yeast
CL_4                1AMW_1_7    ADP          HSP90_Yeast
CL_4                1B62_1_16   ADP          MulL
CL_4                1B63_1_91   ANP          MulL
CL_4                1BGQ_1_55   RDC          HSP90_Yeast
CL_4                1BYQ_1_99   ADP          HSP90_Human
CL_4                1EA6_1_110  ADP          PMS2
CL_4                1EA6_2_109  ADP          PMS2
CL_4                1H7U_1_46   ATG          PMS2
CL_4                1H7U_2_44   ATG          PMS2
CL_4                1JM6_1_63   ADP          Pyruvate_dehydrogenase_kinase
CL_4                1NHH_1_43   ANP          MulL
CL_4                1NHI_1_108  ANP          MulL
CL_4                1NHJ_1_42   ANP          MulL
CL_4                1OSF_1_83   KOS          HSP90_Human
CL_4                1QY5_1_13   NEC          HSP90_Dog
CL_4                1QY8_1_87   RDI          HSP90_Dog
CL_4                1QYE_1_143  CDY          HSP90_Dog
CL_4                1TBW_1_107  AMP          HSP90_Dog
CL_4                1TBW_2_106  AMP          HSP90_Dog
CL_4                1TC0_1_125  ATP          HSP90_Dog
CL_4                1TC0_2_121  ATP          HSP90_Dog
CL_4                1TC6_1_116  ADP          HSP90_Dog
CL_4                1TC6_2_114  ADP          HSP90_Dog
CL_4                1U0Y_1_26   PA7          HSP90_Dog
CL_4                1U0Z_1_95   RDC          HSP90_Dog
CL_4                1U0Z_6_93   RDC          HSP90_Dog
CL_4                1U2O_1_3    NEC          HSP90_Dog
CL_4                1U2O_2_2    NEC          HSP90_Dog
CL_4                1UY6_1_88   PU3          HSP90_Human
CL_4                1UY7_1_6    PU4          HSP90_Human
CL_4                1UY8_1_82   PU5          HSP90_Human
CL_4                1UY9_1_4    PU6          HSP90_Human
CL_4                1UYC_1_144  PU7          HSP90_Human
CL_4                1UYD_1_74   PU8          HSP90_Human
CL_4                1UYE_1_141  PU9          HSP90_Human
CL_4                1UYF_1_71   PU1          HSP90_Human
CL_4                1UYG_1_138  PU2          HSP90_Human
CL_4                1UYH_1_68   PU0          HSP90_Human
CL_4                1UYI_1_135  PUZ          HSP90_Human
CL_4                1UYK_1_133  PUX          HSP90_Human
CL_4                1UYM_1_132  PU3          HSP90_Human
CL_4                1YC1_1_15   4BC          HSP90_Human
CL_4                1YC3_1_14   4BC          HSP90_Human
CL_4                1YC4_1_89   43P          HSP90_Human
CL_4                1YET_1_39   GDM          HSP90_Human
CL_4                1YSZ_1_131  NEC          HSP90_Dog
CL_4                1YT0_1_80   ADP          HSP90_Dog
CL_4                1ZW9_1_137  H64          HSP90_Yeast
CL_4                1ZWH_1_58   RDE          HSP90_Yeast
CL_4                2BRC_1_20   CT5          HSP90_Yeast
CL_4                2BRE_1_19   KJ2          HSP90_Yeast
CL_4                2BRE_2_18   KJ2          HSP90_Yeast
CL_4                2BSM_1_77   BSM          HSP90_Human
CL_4                2BT0_1_81   CT5          HSP90_Human
CL_4                2BT0_2_79   CT5          HSP90_Human
CL_4                2BYH_1_69   2D7          HSP90_Human
CL_4                2BYI_1_134  2DD          HSP90_Human
CL_4                2BZ5_1_70   AB4          HSP90_Human
CL_4                2BZ5_2_65   AB4          HSP90_Human
CL_4                2CCS_1_98   4BH          HSP90_Human
CL_4                2CCT_1_30   2E1          HSP90_HumanC
CL_4                2CCU_1_97   2D9          HSP90_Human
CL_4                2CDD_1_96   CT5          HSP90_Human
CL_4                2CDD_2_94   CT5          HSP90_Human
CL_4                2EXL_1_41   GMY          HSP90_Dog
CL_4                2EXL_2_40   GMY          HSP90_Dog
CL_4                2FWY_1_12   H64          HSP90_Human
CL_4                2FWZ_1_85   H71          HSP90_Human
CL_4                2FXS_1_78   RDA          HSP90_Yeast
CL_4                2FYP_1_60   RDE          HSP90_Dog
CL_4                2FYP_2_59   RDE          HSP90_Dog
CL_4                2GFD_1_72   RDA          HSP90_Dog
CL_4                2GFD_2_67   RDA          HSP90_Dog
CL_4                2GQP_1_130  PA7          HSP90_Dog
CL_4                2GQP_2_128  PA7          HSP90_Dog
CL_4                2H55_1_122  DZ8          HSP90_Human
CL_4                2H8M_1_139  NEI          HSP90_Dog
CL_4                2H8M_2_136  NEI          HSP90_Dog
CL_4                2HCH_1_142  N5A          HSP90_Dog
CL_4                2HCH_2_140  N5A          HSP90_Dog
CL_4                2HG1_1_36   N5O          HSP90_Dog
CL_4                2HG1_2_34   N5O          HSP90_Dog
CL_4                2HKJ_1_38   RDC          TOPOVI
CL_4                2IWS_1_48   NP4          HSP90_Yeast
CL_4                2IWU_1_45   NP5          HSP90_Yeast
CL_4                2IWX_1_00   M1S          HSP90_Yeast
CL_4                2UWD_1_126  2GG          HSP90_Human
CL_5                1I58_1_129  ACP          Histidine_Kinase_CheA
CL_5                1I58_2_127  ADP          Histidine_Kinase_CheA
CL_5                1I59_1_57   ANP          Histidine_Kinase_CheA
CL_5                1I59_2_54   ADP          Histidine_Kinase_CheA
CL_5                1I5A_1_51   ACP          Histidine_Kinase_CheA
CL_5                1I5A_2_49   ACP          Histidine_Kinase_CheA
CL_5                1I5B_1_119  ANP          Histidine_Kinase_CheA
CL_5                1I5B_2_115  ANP          Histidine_Kinase_CheA
CL_5                1I5C_1_50   ADP          Histidine_Kinase_CheA
CL_5                1I5C_2_47   ADP          Histidine_Kinase_CheA
CL_5                2CH4_2_53   ANP          Histidine_Kinase_CheA
Acknowledgments
This work was supported by French Institute for Health and Medical Care (INSERM) and University Denis Diderot Paris 7. ODA’s PhD is financed by the French technical research association (ANRT) through a CIFRE grant. MEDIT holds all the rights on the presented methodology. The authors are indebted to S. Adcock for useful comments on the manuscript.
1. Wendt KU, Weiss MS, Cramer P, Heinz DW. Structures and diseases. Nat Struct Mol Biol. 2008;15:117–120. [PubMed]
2. Guido RV, Oliva G, Andricopulo AD. Virtual screening and its integration with modern drug design technologies. Curr Med Chem. 2008;15:37–46. [PubMed]
3. Waszkowycz B. Towards improving compound selection in structure-based virtual screening. Drug Discov Today. 2008;13:219–226. [PubMed]
4. Porter CT, Bartlett GJ, Thornton JM. The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res. 2004;32:D129–133. [PMC free article] [PubMed]
5. Bartlett GJ, Porter CT, Borkakoti N, Thornton JM. Analysis of catalytic residues in enzyme active sites. J Mol Biol. 2002;324:105–121. [PubMed]
6. Fox BG, Goulding C, Malkowski MG, Stewart L, Deacon A. Structural genomics: from genes to structures with valuable materials and many questions in between. Nat Methods. 2008;5:129–132. [PubMed]
7. Altschul SF, Madden TL, Schaffer AA, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. [PMC free article] [PubMed]
8. Bairoch A. PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Res. 1991;19(Suppl):2241–2245. [PMC free article] [PubMed]
9. Schmitt S, Kuhn D, Klebe G. A new method to detect related function among proteins independent of sequence and fold homology. J Mol Biol. 2002;323:387–406. [PubMed]
10. Shulman-Peleg A, Nussinov R, Wolfson HJ. SiteEngines: recognition and comparison of binding sites and protein-protein interfaces. Nucleic Acids Res. 2005;33:W337–41. [PMC free article] [PubMed]
11. Baroni M, Cruciani G, Sciabola S, Perruccio F, Mason JS. A common reference framework for analyzing/comparing proteins and ligands. Fingerprints for Ligands and Proteins (FLAP): theory and application. J Chem Inf Model. 2007;47:279–294. [PubMed]
12. Powers R, Copeland JC, Germer K, Mercier KA, Ramanathan V, Revesz P. Comparison of protein active site structures for functional annotation of proteins and drug design. Proteins. 2006;65:124–135. [PubMed]
13. Standley DM, Kinjo AR, Kinoshita K, Nakamura H. Protein structure databases with new web services for structural biology and biomedical research. Brief Bioinform. 2008;9:276–285. [PubMed]
14. Brylinski M, Prymula K, Jurkowski W, et al. Prediction of functional sites based on the fuzzy oil drop model. PLoS Comput Biol. 2007;3:e94. [PMC free article] [PubMed]
15. Lichtarge O, Bourne HR, Cohen FE. An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol. 1996;257:342–358. [PubMed]
16. Mihalek I, Res I, Lichtarge O. Evolutionary trace report_maker: a new type of service for comparative analysis of proteins. Bioinformatics. 2006;22:1656–1657. [PubMed]
17. Morgan DH, Kristensen DM, Mittelman D, Lichtarge O. ET viewer: an application for predicting and visualizing functional sites in protein structures. Bioinformatics. 2006;22:2049–2050. [PubMed]
18. Jambon M, Imberty A, Deleage G, Geourjon C. A new bioinformatic approach to detect common 3D sites in protein structures. Proteins. 2003;52:137–145. [PubMed]
19. Jambon M, Andrieu O, Combet C, Deleage G, Delfaud F, Geourjon C. The SuMo server: 3D search for protein functional sites. Bioinformatics. 2005;21:3929–2930. [PubMed]
20. Doppelt O, Moriaud F, Bornot A, de Brevern AG. Functional annotation strategy for protein structures. Bioinformation. 2007;1:357–359. [PMC free article] [PubMed]
21. Jefferson ER, Walsh TP, Barton GJ. A comparison of SCOP and CATH with respect to domain-domain interactions. Proteins. 2008;70:54–62. [PubMed]
22. Andreeva A, Howorth D, Chandonia JM, et al. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 2008;36:D419–425. [PMC free article] [PubMed]
23. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995;247:536–540. [PubMed]
24. Picard D. Heat-shock protein 90, a chaperone for folding and regulation. Cell Mol Life Sci. 2002;59:1640–1648. [PubMed]
25. Whitesell L, Lindquist SL. HSP90 and the chaperoning of cancer. Nat Rev Cancer. 2005;5:761–772. [PubMed]
26. Goetz MP, Toft DO, Ames MM, Erlichman C. The Hsp90 chaperone complex as a novel target for cancer therapy. Ann Oncol. 2003;14:1169–1176. [PubMed]
27. Zhang T, Hamza A, Cao X, Wang B, Yu S, Zhan CG, Sun D. A novel Hsp90 inhibitor to disrupt Hsp90/Cdc37 complex against pancreatic cancer cells. Mol Cancer Ther. 2008;7:162–170. [PubMed]
28. Roe SM, Prodromou C, O’Brien R, Ladbury JE, Piper PW, Pearl LH. Structural basis for inhibition of the Hsp90 molecular chaperone by the antitumor antibiotics radicicol and geldanamycin. J Med Chem. 1999;42:260–266. [PubMed]
29. Guarnieri MT, Zhang L, Shen J, Zhao R. The Hsp90 inhibitor radicicol interacts with the ATP-binding pocket of bacterial sensor kinase PhoQ. J Mol Biol. 2008;379:82–93. [PubMed]
30. Corbett KD, Berger JM. Structural basis for topoisomerase VI inhibition by the anti-Hsp90 drug radicicol. Nucleic Acids Res. 2006;34:4269–4277. [PMC free article] [PubMed]
31. van Dongen S. Graph Clustering by Flow Simulation. Utrecht, The Netherlands: University of Utrecht; 2000. PhD thesis.
32. Enright AJ, Ouzounis CA. BioLayout–an automatic graph layout algorithm for similarity visualization. Bioinformatics. 2001;17:853–854. [PubMed]
33. Goldovsky L, Cases I, Enright AJ, Ouzounis CA. BioLayout(Java): versatile network visualisation of structural and functional relationships. Appl Bioinformatics. 2005;4:71–74. [PubMed]
34. Bellon S, Parsons JD, Wei Y, et al. Crystal structures of Escherichia coli topoisomerase IV ParE subunit (24 and 43 kilodaltons): a single residue dictates differences in novobiocin potency against topoisomerase IV and DNA gyrase. Antimicrob Agents Chemother. 2004;48:1856–1864. [PMC free article] [PubMed]
35. Dessailly BH, Lensink MF, Wodak SJ. Relating destabilizing regions to known functional sites in proteins. BMC Bioinformatics. 2007;8:141. [PMC free article] [PubMed]
36. Brown DP, Krishnamurthy N, Sjolander K. Automated protein subfamily identification and classification. PLoS Comput Biol. 2007;3:e160. [PubMed]
37. Ramensky V, Sobol A, Zaitseva N, Rubinov A, Zosimov V. A novel approach to local similarity of protein binding sites substantially improves computational drug design results. Proteins. 2007;69:349–357. [PubMed]
38. Mao L, Wang Y, Liu Y, Hu X. Molecular determinants for ATP-binding in proteins: a data mining and quantum chemical analysis. J Mol Biol. 2004;336:787–807. [PubMed]
39. Niefind K, Putter M, Guerra B, Issinger OG, Schomburg D. GTP plus water mimic ATP in the active site of protein kinase CK2. Nat Struct Biol. 1999;6:1100–1103. [PubMed]
40. Yde CW, Ermakova I, Issinger OG, Niefind K. Inclining the purine base binding plane in protein kinase CK2 by exchanging the flanking side-chains generates a preference for ATP as a cosubstrate. J Mol Biol. 2005;347:399–414. [PubMed]
41. Nebel JC, Herzyk P, Gilbert DR. Automatic generation of 3D motifs for classification of protein binding sites. BMC Bioinformatics. 2007;8:321. [PMC free article] [PubMed]
42. Wu S, Liang MP, Altman RB. The SeqFEATURE library of 3D functional site models: comparison to existing methods and applications to protein function annotation. Genome Biol. 2008;9:R8. [PMC free article] [PubMed]
43. Kuhn D, Weskamp N, Hullermeier E, Klebe G. Functional classification of protein kinase binding sites using Cavbase. Chem Med Chem. 2007;2:1432–1447. [PubMed]
44. Besant PG, Lasker MV, Bui CD, Turck CW. Inhibition of branched-chain alpha-keto acid dehydrogenase kinase and Sln1 yeast histidine kinase by the antifungal antibiotic radicicol. Mol Pharmacol. 2002;62:289–296. [PubMed]
45. Ferre F, Ausiello G, Zanzoni A, Helmer-Citterich M. Functional annotation by identification of local surface similarities: a novel tool for structural genomics. BMC Bioinformatics. 2005;6:194. [PMC free article] [PubMed]