Related Articles

1.  Gaia: automated quality assessment of protein structure models 
Bioinformatics  2011;27(16):2209-2215.
Motivation: Increasing use of structural modeling for understanding structure–function relationships in proteins has led to the need to ensure that the protein models being used are of acceptable quality. Quality of a given protein structure can be assessed by comparing various intrinsic structural properties of the protein to those observed in high-resolution protein structures.
Results: In this study, we present tools to compare a given structure to high-resolution crystal structures. We assess packing by calculating the total void volume, the percentage of unsatisfied hydrogen bonds, the number of steric clashes and the scaling of the accessible surface area. We assess covalent geometry by determining bond lengths, angles, dihedrals and rotamers. The statistical parameters for the above measures, obtained from high-resolution crystal structures enable us to provide a quality-score that points to specific areas where a given protein structural model needs improvement.
Availability and Implementation: We provide these tools that appraise protein structures in the form of a web server, Gaia. Gaia evaluates the packing and covalent geometry of a given protein structure and provides a quantitative comparison of the given structure to high-resolution crystal structures.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3150034  PMID: 21700672
2.  iFoldRNA: three-dimensional RNA structure prediction and folding 
Bioinformatics  2008;24(17):1951-1952.
Summary: Three-dimensional RNA structure prediction and folding is of significant interest in the biological research community. Here, we present iFoldRNA, a novel web-based methodology for RNA structure prediction with near atomic resolution accuracy and analysis of RNA folding thermodynamics. iFoldRNA rapidly explores RNA conformations using discrete molecular dynamics simulations of input RNA sequences. Starting from simplified linear-chain conformations, RNA molecules (<50 nt) fold to native-like structures within half an hour of simulation, facilitating rapid RNA structure prediction. All-atom reconstruction of energetically stable conformations generates iFoldRNA predicted RNA structures. The predicted RNA structures are within 2–5 Å root mean square deviations (RMSDs) from corresponding experimentally derived structures. RNA folding parameters including specific heat, contact maps, simulation trajectories, gyration radii, RMSDs from native state, and fraction of native-like contacts are accessible from iFoldRNA. We expect iFoldRNA will serve as a useful resource for RNA structure prediction and folding thermodynamic analyses.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2559968  PMID: 18579566
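The iFoldRNA abstract above reports accuracy as a root mean square deviation (RMSD) between predicted and experimental structures. As a minimal sketch of what that number measures, the following computes the RMSD between two coordinate sets. It assumes the structures are already superimposed and that atoms are paired by index; the function name and the toy coordinates are illustrative and do not come from iFoldRNA.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumption: the two structures are already superimposed and the
    i-th point in coords_a corresponds to the i-th point in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Accumulate the squared distance between each paired atom.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy data (hypothetical): every atom is displaced by 3 Å along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
native = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0), (2.0, 0.0, 3.0)]
print(rmsd(predicted, native))  # 3.0
```

In practice an optimal rigid-body superposition is performed before the RMSD is evaluated; that step is omitted here.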
3.  Rigidity analysis of protein biological assemblies and periodic crystal structures 
BMC Bioinformatics  2013;14(Suppl 18):S2.
We initiate in silico rigidity-theoretical studies of biological assemblies and small crystals for protein structures. The goal is to determine if, and how, the interactions among neighboring cells and subchains affect the flexibility of a molecule in its crystallized state. We use experimental X-ray crystallography data from the Protein Data Bank (PDB). The analysis relies on an efficient graph-based algorithm. Computational experiments were performed using new protein rigidity analysis tools available in the new release of our KINARI-Web server.
We provide two types of results: on biological assemblies and on crystals. We found that when only isolated subchains are considered, structural and functional information may be missed. Indeed, the rigidity of biological assemblies is sometimes dependent on the count and placement of hydrogen bonds and other interactions among the individual subchains of the biological unit. Similarly, the rigidity of small crystals may be affected by the interactions between atoms belonging to different unit cells.
We have analyzed a dataset of approximately 300 proteins, from which we generated 982 crystals (some of which are biological assemblies). We identified two types of behaviors. (a) Some crystals and/or biological assemblies will aggregate into rigid bodies that span multiple unit cells/asymmetric units. Some of them create substantially larger rigid clusters in the crystal/biological assembly form, while in other cases, the aggregation has a smaller effect just at the interface between the units. (b) In other cases, the rigidity properties of the asymmetric units are retained, because the rigid bodies did not combine.
We also identified two interesting cases where rigidity analysis may be correlated with the functional behavior of the protein. This type of information, identified here for the first time, depends critically on the ability to create crystals and biological assemblies, and would not have been observed only from the asymmetric unit.
For the Ribonuclease A protein (PDB file 5RSA), which is functionally active in the crystallized form, we found that the individual protein and its crystal form retain the flexibility parameters between the two states. In contrast, a derivative of Ribonuclease A (PDB file 9RSA) has no functional activity, and the protein is very rigid in both the asymmetric and crystalline forms.
For the vaccinia virus D13 scaffolding protein (PDB file 3SAQ), which has two biological assemblies, we observed a striking asymmetry in the rigidity cluster decomposition of one of them, which seems implausible, given its symmetry. Upon careful investigation, we tracked the cause to a placement decision by the Reduce software concerning the hydrogen atoms, thus affecting the distribution of certain hydrogen bonds. The surprising result is that the presence or lack of a very few, but critical, hydrogen bonds, can drastically affect the rigid cluster decomposition of the biological assembly.
The rigidity analysis of a single asymmetric unit may not accurately reflect the protein's behavior in the tightly packed crystal environment. Using our KINARI software, we demonstrated that additional functional and rigidity information can be gained by analyzing a protein's biological assembly and/or crystal structure. However, performing a larger scale study would be computationally expensive (due to the size of the molecules involved). Overcoming this limitation will require novel mathematical and computational extensions to our software.
PMCID: PMC3817814  PMID: 24564201
4.  Knowledge-based Fragment Binding Prediction 
PLoS Computational Biology  2014;10(4):e1003589.
Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening.
Author Summary
In drug discovery, the goal is to identify new compounds to alter the behavior of a protein implicated in disease. With the very large number of small molecules to test, researchers have increasingly studied fragments (compounds with a small number of atoms) because there are fewer possibilities to evaluate and they can be used to identify larger compounds. Computational tools can efficiently assess if a fragment will bind a protein target of interest. Given the large number of structures available for protein-small molecule complexes, we present in this study a data-driven computational method for fragment binding prediction called FragFEATURE. FragFEATURE predicts fragments preferred by a protein structure using a knowledge base of all previously observed protein-fragment interactions. Comparison to previous observations enables it to determine if a query structure is likely to bind particular fragments. For numerous protein structures bound to small molecules, FragFEATURE predicted fragments matching the bound entity. For multiple proteins, it also predicted fragments matching drugs known to inhibit the proteins. These fragments can therefore lead us to promising drug-like compounds to study further using computational tools or experimental resources.
PMCID: PMC3998881  PMID: 24762971
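The FragFEATURE abstract above summarizes performance as precision and recall over predicted fragments. The sketch below shows how those two figures are computed for a single target from a set of predicted fragments and the set of fragments actually present in the bound ligand. The fragment names are hypothetical placeholders, and this is the standard set-based definition, not FragFEATURE's evaluation code.

```python
def precision_recall(predicted, actual):
    """Set-based precision and recall for one target.

    precision = correct predictions / all predictions
    recall    = correct predictions / all true fragments
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical fragment labels, for illustration only.
predicted_fragments = {"benzene", "amide", "imidazole", "carboxylate"}
ligand_fragments = {"benzene", "amide", "carboxylate", "pyridine", "thiol"}

precision, recall = precision_recall(predicted_fragments, ligand_fragments)
print(precision, recall)  # 0.75 0.6
```

Averaging these per-target values over a benchmark set gives summary figures of the kind quoted in the abstract.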
5.  deconSTRUCT: general purpose protein database search on the substructure level 
Nucleic Acids Research  2010;38(Web Server issue):W590-W594.
deconSTRUCT webserver offers an interface to a protein database search engine, usable for a general purpose detection of similar protein (sub)structures. Initially, it deconstructs the query structure into its secondary structure elements (SSEs) and reassembles the match to the target by requiring a (tunable) degree of similarity in the direction and sequential order of SSEs. Hierarchical organization and judicious use of the information about protein structure enables deconSTRUCT to achieve the sensitivity and specificity of the established search engines at orders of magnitude increased speed, without tying up irretrievably the substructure information in the form of a hash. In a post-processing step, a match on the level of the backbone atoms is constructed. The results presented to the user consist of the list of the matched SSEs, the transformation matrix for rigid superposition of the structures and several ways of visualization, both downloadable and implemented as a web-browser plug-in. The server is available at
PMCID: PMC2896154  PMID: 20522512
6.  PocketAnnotate: towards site-based function annotation 
Nucleic Acids Research  2012;40(Web Server issue):W400-W408.
A computational pipeline PocketAnnotate for functional annotation of proteins at the level of binding sites has been proposed in this study. The pipeline integrates three in-house algorithms for site-based function annotation: PocketDepth, for prediction of binding sites in protein structures; PocketMatch, for rapid comparison of binding sites and PocketAlign, to obtain detailed alignment between pair of binding sites. A novel scheme has been developed to rapidly generate a database of non-redundant binding sites. For a given input protein structure, putative ligand-binding sites are identified, matched in real time against the database and the query substructure aligned with the promising hits, to obtain a set of possible ligands that the given protein could bind to. The input can be either whole protein structures or merely the substructures corresponding to possible binding sites. Structure-based function annotation at the level of binding sites thus achieved could prove very useful for cases where no obvious functional inference can be obtained based purely on sequence or fold-level analyses. An attempt has also been made to analyse proteins of no known function from Protein Data Bank. PocketAnnotate would be a valuable tool for the scientific community and contribute towards structure-based functional inference. The web server can be freely accessed at
PMCID: PMC3394344  PMID: 22618878
7.  ProteMiner-SSM: a web server for efficient analysis of similar protein tertiary substructures 
Nucleic Acids Research  2004;32(Web Server issue):W76-W82.
Analysis of protein–ligand interactions is a fundamental issue in drug design. As the detailed and accurate analysis of protein–ligand interactions involves calculation of binding free energy based on thermodynamics and even quantum mechanics, which is highly expensive in terms of computing time, conformational and structural analysis of proteins and ligands has been widely employed as a screening process in computer-aided drug design. In this paper, a web server called ProteMiner-SSM designed for efficient analysis of similar protein tertiary substructures is presented. In one experiment reported in this paper, the web server has been exploited to obtain some clues about a biochemical hypothesis. The main distinction in the software design of the web server is the filtering process incorporated to expedite the analysis. The filtering process extracts the residues located in the caves of the protein tertiary structure for analysis and operates with O(n log n) time complexity, where n is the number of residues in the protein. In comparison, the α-hull algorithm, which is a widely used algorithm in computer graphics for identifying those instances that are on the contour of a three-dimensional object, features O(n^2) time complexity. Experimental results show that the filtering process presented in this paper is able to speed up the analysis by a factor ranging from 3.15 to 9.37 times. The ProteMiner-SSM web server can be found at There is a mirror site at
PMCID: PMC441563  PMID: 15215355
8.  FATCAT: a web server for flexible structure comparison and structure similarity searching 
Nucleic Acids Research  2004;32(Web Server issue):W582-W585.
Protein structure comparison, an important problem in structural biology, has two main applications: (i) comparing two protein structures in order to identify the similarities and differences between them, and (ii) searching for structures similar to a query structure. Many web-based resources for both applications are available, but all are based on rigid structural alignment algorithms. FATCAT server implements the recently developed flexible protein structure comparison algorithm FATCAT, which automatically identifies hinges and internal rearrangements in two protein structures. The server provides access to two algorithms: FATCAT-pairwise for pairwise flexible structure comparison and FATCAT-search for database searching for structurally similar proteins. Given two protein structures [in the Protein Data Bank (PDB) format], FATCAT-pairwise reports their structural alignment and the corresponding statistical significance of the similarity measured as a P-value. Users can view the superposition of the structures online in web browsers that support the Chime plug-in, or download the superimposed structures in PDB format. In FATCAT-search, users provide one query structure and the server returns a list of protein structures that are similar to the query, ordered by the P-values. In addition, FATCAT server can report the conformational changes of the query structure as compared to other proteins in the structure database. FATCAT server is available at
PMCID: PMC441568  PMID: 15215455
9.  RNATOPS-W: a web server for RNA structure searches of genomes 
Bioinformatics  2009;25(8):1080-1081.
Summary: RNATOPS-W is a web server to search sequences for RNA secondary structures including pseudoknots. The server accepts an annotated RNA multiple structural alignment as a structural profile and genomic or other sequences to search. It is built upon RNATOPS, a command line C++ software package for the same purpose, in which filters to speed up search are manually selected. RNATOPS-W improves upon RNATOPS by adding automatic selection of a hidden Markov model (HMM) filter and a friendly user interface for selection of a substructure filter by the user. In addition, RNATOPS-W complements existing RNA secondary structure search web servers that either use built-in structure profiles or are not able to detect pseudoknots. RNATOPS-W inherits the efficiency of RNATOPS in detecting large, complex RNA structures.
Availability: The web server RNATOPS-W is available at the web site The underlying search program RNATOPS can be downloaded at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2720711  PMID: 19269988
10.  Protemot: prediction of protein binding sites with automatically extracted geometrical templates 
Nucleic Acids Research  2006;34(Web Server issue):W303-W309.
Geometrical analysis of protein tertiary substructures has been an effective approach employed to predict protein binding sites. This article presents the Protemot web server that carries out prediction of protein binding sites based on the structural templates automatically extracted from the crystal structures of protein–ligand complexes in the PDB (Protein Data Bank). The automatic extraction mechanism is essential for creating and maintaining a comprehensive template library that keeps pace with new PDB releases as the number of entries continues to grow rapidly. The design of Protemot is also distinctive in the mechanism employed to expedite the analysis process that matches the tertiary substructures on the contour of the query protein with the templates in the library. This expediting mechanism is essential for providing reasonable response time to the user as the number of entries in the template library grows along with the PDB. This article also reports the experiments conducted to evaluate the prediction power delivered by the Protemot web server. Experimental results show that Protemot can deliver better prediction power than a web server based on a manually curated template library with an insufficient number of entries. Availability: .
PMCID: PMC1538868  PMID: 16845015
11.  A Real-Time All-Atom Structural Search Engine for Proteins 
PLoS Computational Biology  2014;10(7):e1003750.
Protein designers use a wide variety of software tools for de novo design, yet their repertoire still lacks a fast and interactive all-atom search engine. To solve this, we have built the Suns program: a real-time, atomic search engine integrated into the PyMOL molecular visualization system. Users build atomic-level structural search queries within PyMOL and receive a stream of search results aligned to their query within a few seconds. This instant feedback cycle enables a new “designability”-inspired approach to protein design where the designer searches for and interactively incorporates native-like fragments from proven protein structures. We demonstrate the use of Suns to interactively build protein motifs, tertiary interactions, and to identify scaffolds compatible with hot-spot residues. The official web site and installer are located at and the source code is hosted at (PyMOL plugin, BSD license), (command line client, BSD license), and (search engine server, GPLv2 license).
PMCID: PMC4117414  PMID: 25079944
12.  UFSRAT: Ultra-Fast Shape Recognition with Atom Types – The Discovery of Novel Bioactive Small Molecular Scaffolds for FKBP12 and 11βHSD1 
PLoS ONE  2015;10(2):e0116570.
Using molecular similarity to discover bioactive small molecules with novel chemical scaffolds can be computationally demanding. We describe Ultra-fast Shape Recognition with Atom Types (UFSRAT), an efficient algorithm that considers both the 3D distribution (shape) and electrostatics of atoms to score and retrieve molecules capable of making similar interactions to those of the supplied query.
Computational optimization and pre-calculation of molecular descriptors enables a query molecule to be run against a database containing 3.8 million molecules and results returned in under 10 seconds on modest hardware. UFSRAT has been used in pipelines to identify bioactive molecules for two clinically relevant drug targets: FK506-Binding Protein 12 and 11β-hydroxysteroid dehydrogenase type 1. In the case of FK506-Binding Protein 12, UFSRAT was used as the first step in a structure-based virtual screening pipeline, yielding many actives, of which the most active shows a KD,app of 281 µM and contains a substructure present in the query compound. Success was also achieved running solely the UFSRAT technique to identify new actives for 11β-hydroxysteroid dehydrogenase type 1, for which the most active displays an IC50 of 67 nM in a cell-based assay and contains a substructure radically different to the query. This demonstrates the valuable ability of the UFSRAT algorithm to perform scaffold hops.
Availability and Implementation
A web-based implementation of the algorithm is freely available at
PMCID: PMC4319890  PMID: 25659145
13.  The LabelHash algorithm for substructure matching 
BMC Bioinformatics  2010;11:555.
There is an increasing number of proteins with known structure but unknown function. Determining their function would have a significant impact on understanding diseases and designing new therapeutics. However, experimental protein function determination is expensive and very time-consuming. Computational methods can facilitate function determination by identifying proteins that have high structural and chemical similarity.
We present LabelHash, a novel algorithm for matching substructural motifs to large collections of protein structures. The algorithm consists of two phases. In the first phase the proteins are preprocessed in a fashion that allows for instant lookup of partial matches to any motif. In the second phase, partial matches for a given motif are expanded to complete matches. The general applicability of the algorithm is demonstrated with three different case studies. First, we show that we can accurately identify members of the enolase superfamily with a single motif. Next, we demonstrate how LabelHash can complement SOIPPA, an algorithm for motif identification and pairwise substructure alignment. Finally, a large collection of Catalytic Site Atlas motifs is used to benchmark the performance of the algorithm. LabelHash runs very efficiently in parallel; matching a motif against all proteins in the 95% sequence identity filtered non-redundant Protein Data Bank typically takes no more than a few minutes. The LabelHash algorithm is available through a web server and as a suite of standalone programs at The output of the LabelHash algorithm can be further analyzed with Chimera through a plugin that we developed for this purpose.
LabelHash is an efficient, versatile algorithm for large-scale substructure matching. When LabelHash is running in parallel, motifs can typically be matched against the entire PDB on the order of minutes. The algorithm is able to identify functional homologs beyond the twilight zone of sequence identity and even beyond fold similarity. The three case studies presented in this paper illustrate the versatility of the algorithm.
PMCID: PMC2996407  PMID: 21070651
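The LabelHash abstract above describes two phases: preprocessing that allows instant lookup of partial matches, followed by expansion of partial matches to complete matches. The toy sketch below illustrates that general index-then-expand pattern only. Everything in it is an assumption made for illustration: the data layout, the choice of residue-label pairs as the hash key, the single distance cutoff, and the greedy expansion. It is not LabelHash's actual data structure or matching criterion.

```python
from collections import defaultdict
from itertools import combinations
import math

def build_index(proteins):
    """Phase 1 (toy): hash every pair of residue labels to its occurrences."""
    index = defaultdict(list)
    for pid, residues in proteins.items():
        for i, j in combinations(range(len(residues)), 2):
            key = tuple(sorted((residues[i][0], residues[j][0])))
            index[key].append((pid, i, j))
    return index

def match_motif(motif_labels, proteins, index, cutoff=10.0):
    """Phase 2 (toy): look up a two-label partial match, then expand it.

    Expansion is greedy: for each remaining label, take the first unused
    residue with that label within `cutoff` of the first matched residue.
    """
    key = tuple(sorted(motif_labels[:2]))
    hits = []
    for pid, i, j in index.get(key, []):
        residues = proteins[pid]
        chosen = [i, j]
        for label in motif_labels[2:]:
            found = next(
                (k for k, (lab, xyz) in enumerate(residues)
                 if lab == label and k not in chosen
                 and math.dist(xyz, residues[i][1]) <= cutoff),
                None,
            )
            if found is None:
                break  # partial match cannot be completed
            chosen.append(found)
        else:
            hits.append((pid, tuple(chosen)))
    return hits

# Hypothetical structures: lists of (residue label, (x, y, z)).
proteins = {
    "protA": [("HIS", (0, 0, 0)), ("ASP", (3, 0, 0)), ("SER", (0, 4, 0)), ("GLY", (30, 0, 0))],
    "protB": [("HIS", (0, 0, 0)), ("ASP", (2, 0, 0)), ("SER", (50, 0, 0))],
}
index = build_index(proteins)
hits = match_motif(["HIS", "ASP", "SER"], proteins, index)
print(hits)  # [('protA', (0, 1, 2))]
```

The point of the split is that the index is built once per database, so each new motif pays only for the dictionary lookup and the expansion of the candidates it returns.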
14.  ProteinDBS v2.0: a web server for global and local protein structure search 
Nucleic Acids Research  2010;38(Web Server issue):W53-W58.
ProteinDBS v2.0 is a web server designed for efficient and accurate comparisons and searches of structurally similar proteins from a large-scale database. It provides two comparison methods, global-to-global and local-to-local, to facilitate the searches of protein structures or substructures. ProteinDBS v2.0 applies advanced feature extraction algorithms and scalable indexing techniques to achieve a high-running speed while preserving reasonably high precision of structural comparison. The experimental results show that our system is able to return results of global comparisons in seconds from a complete Protein Data Bank (PDB) database of 152 959 protein chains and that it takes much less time to complete local comparisons from a non-redundant database of 3276 proteins than other accurate comparison methods. ProteinDBS v2.0 supports query by PDB protein ID and by new structures uploaded by users. To our knowledge, this is the only search engine that can simultaneously support global and local comparisons. ProteinDBS v2.0 is a useful tool to investigate functional or evolutional relationships among proteins. Moreover, the common substructures identified by local comparison can be potentially used to assist the human curation process in discovering new domains or folds from the ever-growing protein structure databases. The system is hosted at
PMCID: PMC2896110  PMID: 20538653
15.  NASSAM: a server to search for and annotate tertiary interactions and motifs in three-dimensional structures of complex RNA molecules 
Nucleic Acids Research  2012;40(Web Server issue):W35-W41.
Similarities in the 3D patterns of RNA base interactions or arrangements can provide insights into their functions and roles in stabilization of the RNA 3D structure. Nucleic Acids Search for Substructures and Motifs (NASSAM) is a graph theoretical program that can search for 3D patterns of base arrangements by representing the bases as pseudo-atoms. The geometric relationship of the pseudo-atoms to each other as a pattern can be represented as a labeled graph where the pseudo-atoms are the graph’s nodes while the edges are the inter-pseudo-atomic distances. The input files for NASSAM are PDB formatted 3D coordinates. This web server can be used to identify matches of base arrangement patterns in a query structure to annotated patterns that have been reported in the literature or that have possible functional and structural stabilization implications. The NASSAM program is freely accessible without any login requirement at
PMCID: PMC3394293  PMID: 22661578
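The NASSAM abstract above represents a base arrangement as a labeled graph whose nodes are pseudo-atoms and whose edges are inter-pseudo-atomic distances. A minimal sketch of that representation follows. The base labels, coordinates, and tolerance are hypothetical, and the comparison assumes the nodes of the two graphs are already listed in corresponding order, so it does not perform the subgraph search a real matcher would need.

```python
import math
from itertools import combinations

def build_graph(pseudo_atoms):
    """Build (node labels, {(i, j): distance}) from (label, (x, y, z)) entries."""
    nodes = [label for label, _ in pseudo_atoms]
    edges = {
        (i, j): math.dist(pseudo_atoms[i][1], pseudo_atoms[j][1])
        for i, j in combinations(range(len(pseudo_atoms)), 2)
    }
    return nodes, edges

def same_pattern(graph_a, graph_b, tol=0.5):
    """True if labels agree and every edge length differs by at most tol."""
    nodes_a, edges_a = graph_a
    nodes_b, edges_b = graph_b
    if nodes_a != nodes_b:
        return False
    return all(abs(edges_a[k] - edges_b[k]) <= tol for k in edges_a)

# Hypothetical base triple and two candidates (coordinates in Å).
pattern = build_graph([("A", (0, 0, 0)), ("G", (3, 0, 0)), ("U", (0, 4, 0))])
shifted = build_graph([("A", (10, 10, 10)), ("G", (13.1, 10, 10)), ("U", (10, 14, 10))])
stretched = build_graph([("A", (0, 0, 0)), ("G", (8, 0, 0)), ("U", (0, 4, 0))])

print(same_pattern(pattern, shifted))    # True
print(same_pattern(pattern, stretched))  # False
```

Because only distances are stored, the comparison is unaffected by where the pattern sits in the structure, which is why the translated candidate still matches.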
16.  Protinfo PPC: A web server for atomic level prediction of protein complexes 
Nucleic Acids Research  2009;37(Web Server issue):W519-W525.
‘Protinfo PPC’ (Prediction of Protein Complex) is a web server that predicts atomic level structures of interacting proteins from their amino-acid sequences. It uses the interolog method to search for experimental protein complex structures that are homologous to the input sequences submitted by a user. These structures are then used as starting templates to generate protein complex models, which are returned to the user in Protein Data Bank format via email. The server supports modeling of both homo and hetero multimers and generally produces full atomic level models (including insertion/deletion regions) of protein complexes as long as at least one putative homologous template for the query sequences is found. The modeling pipeline behind Protinfo PPC has been rigorously benchmarked and proven to produce highly accurate protein complex models. The fully automated all atom comparative modeling service for protein complexes provided by Protinfo PPC server offers wide capabilities ranging from prediction of protein complex interactions to identification of possible interaction sites, which will be useful for researchers studying these topics. The Protinfo PPC web server is available at
PMCID: PMC2703994  PMID: 19420059
17.  Statistical Potential for Modeling and Ranking of Protein-Ligand Interactions 
Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site; and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes, but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF1) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScoreCSD and ITScore/SE, and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. 
The statistical potentials are available through the Integrative Modeling Platform (IMP) software package and the LigScore web server.
PMCID: PMC3246566  PMID: 22014038
statistical potential; reference state; binding pose; ligand enrichment
18.  FireDock: a web server for fast interaction refinement in molecular docking 
Nucleic Acids Research  2008;36(Web Server issue):W229-W232.
Structural details of protein–protein interactions are invaluable for understanding and deciphering biological mechanisms. Computational docking methods aim to predict the structure of a protein–protein complex given the structures of its single components. Protein flexibility and the absence of robust scoring functions pose a great challenge in the docking field. Due to these difficulties most of the docking methods involve a two-tier approach: coarse global search for feasible orientations that treats proteins as rigid bodies, followed by an accurate refinement stage that aims to introduce flexibility into the process. The FireDock web server, presented here, is the first web server for flexible refinement and scoring of protein–protein docking solutions. It includes optimization of side-chain conformations and rigid-body orientation and allows a high-throughput refinement. The server provides a user-friendly interface and a 3D visualization of the results. A docking protocol consisting of a global search by PatchDock and a refinement by FireDock was extensively tested. The protocol was successful in refining and scoring docking solution candidates for cases taken from docking benchmarks. We provide an option for using this protocol by automatic redirection of PatchDock candidate solutions to the FireDock web server for refinement. The FireDock web server is available at
PMCID: PMC2447790  PMID: 18424796
19.  PBSword: a web server for searching similar protein–protein binding sites 
Nucleic Acids Research  2012;40(Web Server issue):W428-W434.
PBSword is a web server designed for efficient and accurate comparisons and searches of geometrically similar protein–protein binding sites from a large-scale database. The basic idea of PBSword is that each protein binding site is first represented by a high-dimensional vector of ‘visual words’, which characterizes both the global and local shape features of the binding site. It then uses a scalable indexing technique to search for those binding sites whose visual-word representations are similar to that of the query binding site. Our system is able to return ranked results of binding sites in a short time from a database of 194 322 domain–domain binding sites. PBSword supports query by protein ID and by new structures uploaded by users. PBSword is a useful tool to investigate functional connections among proteins based on the local structures of binding sites and has potential applications to protein–protein docking and drug discovery. The system is hosted at
PMCID: PMC3394332  PMID: 22689645
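The visual-word search described in the PBSword abstract can be illustrated with a minimal sketch. It assumes each binding site has already been encoded as a histogram of visual-word counts, and it ranks database sites by cosine similarity with a brute-force scan. The abstract does not state which similarity measure or index PBSword uses, so both choices are assumptions, as are all function names and the toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty histogram is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_binding_sites(query, database, top_k=3):
    """Return up to top_k (site_id, score) pairs, most similar first.

    `database` maps a hypothetical site identifier to its visual-word
    histogram. A real system would replace this linear scan with a
    scalable index.
    """
    scored = [(cosine_similarity(query, vec), site_id)
              for site_id, vec in database.items()]
    # Sort by descending score; break ties by identifier for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(site_id, score) for score, site_id in scored[:top_k]]


# Toy histograms over a four-word vocabulary (made-up values).
toy_database = {
    "site_A": [3, 0, 1, 2],
    "site_B": [0, 4, 0, 1],
    "site_C": [1, 1, 1, 1],
}
query = [3, 0, 1, 2]
ranked = rank_binding_sites(query, toy_database)
for site_id, score in ranked:
    print(f"{site_id}: {score:.3f}")
```

Cosine similarity ignores the overall magnitude of a histogram, so two sites with the same mix of shape features score as similar even if one is described by more local patches than the other.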
20.  EvoDesign: de novo protein design based on structural and evolutionary profiles 
Nucleic Acids Research  2013;41(Web Server issue):W273-W280.
Protein design aims to identify new protein sequences of desirable structure and biological function. Most current de novo protein design methods rely on physics-based force fields to search for low free-energy states following Anfinsen’s thermodynamic hypothesis. A major obstacle of such approaches is the inaccuracy of the force field design, which cannot accurately describe the atomic interactions or distinguish correct folds. We developed a new web server, EvoDesign, to design optimal protein sequences of given scaffolds along with multiple sequence and structure-based features to assess the foldability and goodness of the designs. EvoDesign uses an evolution-profile–based Monte Carlo search with the profiles constructed from homologous structure families in the Protein Data Bank. A set of local structure features, including secondary structure, torsion angle and solvation, are predicted by single-sequence neural-network training and used to smooth the sequence motif and accommodate the physicochemical packing. The EvoDesign algorithm has been extensively tested in large-scale protein design experiments, which demonstrate enhanced foldability and structural stability of designed sequences compared with the physics-based designing methods. The EvoDesign server is freely available at
PMCID: PMC3692067  PMID: 23671331
21.  Corresponding Functional Dynamics across the Hsp90 Chaperone Family: Insights from a Multiscale Analysis of MD Simulations 
PLoS Computational Biology  2012;8(3):e1002433.
Understanding how local protein modifications, such as binding small-molecule ligands, can trigger and regulate large-scale motions of large protein domains is a major open issue in molecular biology. We address various aspects of this problem by analyzing and comparing atomistic simulations of Hsp90 family representatives for which crystal structures of the full length protein are available: mammalian Grp94, yeast Hsp90 and E.coli HtpG. These chaperones are studied in complex with the natural ligands ATP, ADP and in the Apo state. Common key aspects of their functional dynamics are elucidated with a novel multi-scale comparison of their internal dynamics. Starting from the atomic resolution investigation of internal fluctuations and geometric strain patterns, a novel analysis of domain dynamics is developed. The results reveal that the ligand-dependent structural modulations mostly consist of relative rigid-like movements of a limited number of quasi-rigid domains, shared by the three proteins. Two common primary hinges for such movements are identified. The first hinge, whose functional role has been demonstrated by several experimental approaches, is located at the boundary between the N-terminal and Middle-domains. The second hinge is located at the end of a three-helix bundle in the Middle-domain and unfolds/unpacks going from the ATP- to the ADP-state. This latter site could represent a promising novel druggable allosteric site common to all chaperones.
Author Summary
Understanding the connections between structure, binding, dynamics and function in proteins is one of the most fascinating problems in biology and is actively investigated experimentally and computationally. In the latter context, significant advancements are possible by exposing the causal link between the fine atomic-scale protein-ligand interactions and the large-scale protein motions. One ideal avenue to explore this relationship is given by proteins of the Hsp90 chaperone family. Their dynamics are regulated by ATP binding and hydrolysis, which trigger large-scale, functional conformational changes. Herein, we concentrated on three homologs with markedly different structural organization (mammalian Grp94, yeast Hsp90 and prokaryotic HtpG) and developed a novel computational multiscale approach to detect and characterize the salient traits of the functionally-oriented internal dynamics of the three chaperones. The comparative analysis, which exploits a novel highly simplified, yet viable, description of the protein internal dynamics, highlights fundamental mechanical aspects that govern the ligand-dependent conformational arrangements in all chaperones. For the three molecules, two corresponding regions are singled out as ligand-susceptible hinges for the large-scale internal motion. On the basis of this and other evidence it is suggested that these regions represent functionally relevant druggable substructures in the discovery of novel allosteric modulators.
PMCID: PMC3310708  PMID: 22457611
22.  The RosettaDock server for local protein–protein docking 
Nucleic Acids Research  2008;36(Web Server issue):W233-W238.
The RosettaDock server ( identifies low-energy conformations of a protein–protein interaction near a given starting configuration by optimizing rigid-body orientation and side-chain conformations. The server requires two protein structures as inputs and a starting location for the search. RosettaDock generates 1000 independent structures, and the server returns pictures, coordinate files and detailed scoring information for the 10 top-scoring models. A plot of the total energy of each of the 1000 models created shows the presence or absence of an energetic binding funnel. RosettaDock has been validated on the docking benchmark set and through the Critical Assessment of PRedicted Interactions blind prediction challenge.
PMCID: PMC2447798  PMID: 18442991
23.  DBD2BS: connecting a DNA-binding protein with its binding sites 
Nucleic Acids Research  2012;40(Web Server issue):W173-W179.
By binding to short and highly conserved DNA sequences in genomes, DNA-binding proteins initiate, enhance or repress biological processes. Accurately identifying such binding sites, often represented by position weight matrices (PWMs), is an important step in understanding the control mechanisms of cells. When given coordinates of a DNA-binding domain (DBD) bound with DNA, a potential function can be used to estimate the change of binding affinity after base substitutions, where the changes can be summarized as a PWM. This technique provides an effective alternative when the chromatin immunoprecipitation data are unavailable for PWM inference. To facilitate the procedure of predicting PWMs based on protein–DNA complexes or even structures of the unbound state, the web server, DBD2BS, is presented in this study. The DBD2BS uses an atom-level knowledge-based potential function to predict PWMs characterizing the sequences to which the query DBD structure can bind. For unbound queries, a list of 1066 DBD–DNA complexes (including 1813 protein chains) is compiled for use as templates for synthesizing bound structures. The DBD2BS provides users with an easy-to-use interface for visualizing the PWMs predicted based on different templates and the spatial relationships of the query protein, the DBDs and the DNAs. The DBD2BS is the first attempt to predict PWMs of DBDs from unbound structures rather than from bound ones. This approach increases the number of existing protein structures that can be exploited when analyzing protein–DNA interactions. In a recent study, the authors showed that the kernel adopted by the DBD2BS can generate PWMs consistent with those obtained from the experimental data. The use of DBD2BS to predict PWMs can be incorporated with sequence-based methods to discover binding sites in genome-wide studies.
Available at:,, and
PMCID: PMC3394304  PMID: 22693214
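The DBD2BS abstract states that estimated changes in binding affinity after base substitutions can be summarized as a PWM. One common way to do this, sketched below, is to turn per-position energy changes into base probabilities with Boltzmann weights. The abstract does not say that DBD2BS uses this conversion, so the formula, the `kT` scale, the function names and the toy energies are all assumptions made for illustration.

```python
import math

BASES = "ACGT"


def energies_to_pwm(delta_energies, kT=1.0):
    """Convert per-position energy changes into a probability PWM.

    `delta_energies` is a list of dicts, one per DNA position, mapping
    each base to a hypothetical energy change (lower means more
    favorable). Each column is normalized with Boltzmann weights:
    p(b) = exp(-dE_b / kT) / sum over bases of exp(-dE / kT).
    """
    pwm = []
    for column in delta_energies:
        weights = {b: math.exp(-column[b] / kT) for b in BASES}
        total = sum(weights.values())
        pwm.append({b: weights[b] / total for b in BASES})
    return pwm


def consensus(pwm):
    """Most probable base at each position."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in pwm)


# Toy energy changes for a two-position site (made-up values).
toy_energies = [
    {"A": 0.0, "C": 2.0, "G": 2.5, "T": 1.5},
    {"A": 1.0, "C": 1.2, "G": 0.0, "T": 3.0},
]
pwm = energies_to_pwm(toy_energies)
for position, column in enumerate(pwm, start=1):
    print(position, {b: round(p, 3) for b, p in column.items()})
print("consensus:", consensus(pwm))
```

A smaller `kT` sharpens each column toward its lowest-energy base, while a larger value flattens the matrix toward a uniform distribution.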
24.  PDTD: a web-accessible protein database for drug target identification 
BMC Bioinformatics  2008;9:104.
Target identification is important for modern drug discovery. With the advances in the development of molecular docking, potential binding proteins may be discovered by docking a small molecule to a repository of proteins with three-dimensional (3D) structures. To complete this task, a reverse docking program and a drug target database with 3D structures are necessary. To this end, we have developed a web server tool, TarFisDock (Target Fishing Docking), which has been used widely by others. Recently, we have constructed a protein target database, Potential Drug Target Database (PDTD), and have integrated PDTD with TarFisDock. This combination aims to assist target identification and validation.
PDTD is a web-accessible protein database for in silico target identification. It currently contains >1100 protein entries with 3D structures presented in the Protein Data Bank. The data are extracted from the literature and several online databases such as TTD, DrugBank and Thomson Pharma. The database covers diverse information on >830 known or potential drug targets, including protein and active-site structures in both PDB and mol2 formats, related diseases, biological functions as well as associated regulating (signaling) pathways. Each target is categorized by both nosology and biochemical function. PDTD supports keyword searches by PDB ID, target name and disease name. Data sets generated by PDTD can be viewed with molecular visualization plug-ins and can also be downloaded freely. Notably, PDTD is specifically designed for target identification. In conjunction with TarFisDock, PDTD can be used to identify binding proteins for small molecules. The results can be downloaded as a mol2 file with the binding pose of the probe compound and a list of potential binding targets ordered by their ranking scores.
PDTD serves as a comprehensive and unique repository of drug targets. Integrated with TarFisDock, PDTD is a useful resource to identify binding proteins for active compounds or existing drugs. Its potential applications include in silico drug target identification, virtual screening, and the discovery of the secondary effects of an old drug (i.e. new pharmacological usage) or an existing target (i.e. new pharmacological or toxic relevance). It may therefore be a valuable platform for pharmaceutical researchers. PDTD is available online at .
PMCID: PMC2265675  PMID: 18282303
25.  A structural bioinformatics approach for identifying proteins predisposed to bind linear epitopes on pre-selected target proteins 
We have developed a protocol for identifying proteins that are predisposed to bind linear epitopes on target proteins of interest. The protocol searches through the protein database for proteins (scaffolds) that are bound to peptides with sequences similar to accessible, linear epitopes on the target protein. The sequence match is considered more significant if residues calculated to be important in the scaffold–peptide interaction are present in the target epitope. The crystal structure of the scaffold–peptide complex is then used as a template for creating a model of the scaffold bound to the target epitope. This model can then be used in conjunction with sequence optimization algorithms or directed evolution methods to search for scaffold mutations that further increase affinity for the target protein. To test the applicability of this approach we targeted three disease-causing proteins: a tuberculosis virulence factor (TVF), the apical membrane antigen (AMA) from malaria, and hemagglutinin from influenza. In each case the best scoring scaffold was tested, and binders with Kds equal to 37 μM and 50 nM for TVF and AMA, respectively, were identified. A web server ( has been created for performing the scaffold search process with user-defined target sequences.
PMCID: PMC3601849  PMID: 23341643
epitope; protein engineering; protein–protein interaction; protein scaffold; structural bioinformatics
