Searching for well-fitting 3D oligopeptide fragments within a large collection of protein structures is an important task central to many analyses involving protein structures. This article reports a new web server, Super, dedicated to the task of rapidly screening the protein data bank (PDB) to identify all fragments that superpose with a query under a prespecified threshold of root-mean-square deviation (RMSD). Super relies on efficiently computing a mathematical bound on the commonly used structural similarity measure, RMSD of superposition. This allows the server to filter out a large proportion of fragments that are unrelated to the query; >99% of the total number of fragments in some cases. For a typical query, Super scans the current PDB containing over 80 500 structures (with ∼40 million potential oligopeptide fragments to match) in under a minute. Super web server is freely accessible from: http://lcb.infotech.monash.edu.au/super.
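The filtering idea can be illustrated with a very simple lower bound. The sketch below is not the bound used by Super, which the abstract does not specify; it assumes the elementary inequality that the RMSD after optimal superposition is never smaller than the difference between the two fragments' radii of gyration. The function names and the example coordinates are hypothetical.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    sq = sum((p[k] - centroid[k]) ** 2 for p in coords for k in range(3))
    return math.sqrt(sq / n)

def rmsd_lower_bound(frag_a, frag_b):
    """|Rg(a) - Rg(b)| never exceeds the RMSD after optimal superposition."""
    return abs(radius_of_gyration(frag_a) - radius_of_gyration(frag_b))

def prefilter(query, candidates, threshold):
    """Discard candidates whose lower bound already exceeds the threshold.

    Survivors still need a full superposition; the bound only rules fragments out.
    """
    return [c for c in candidates if rmsd_lower_bound(query, c) <= threshold]

# Hypothetical three-residue fragments (C-alpha positions in angstroms).
query = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(10.0, 5.0, 1.0), (13.8, 5.0, 1.0), (17.6, 5.0, 1.0)]
survivors = prefilter(query, [compact, shifted], threshold=1.0)
```

Because the bound costs only one pass over each fragment, it can be evaluated for every candidate before any rotation is computed, which is the general reason such filters pay off on tens of millions of fragments.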
3dSS is a web-based interactive computing server, primarily designed to aid researchers, to superpose two or several 3D protein structures. In addition, the server can be effectively used to find the invariant and common water molecules present in the superposed homologous protein structures. The molecular visualization tool RASMOL is interfaced with the server to visualize the superposed 3D structures with the water molecules (invariant or common) in the client machine. Furthermore, an option is provided to save the superposed 3D atomic coordinates in the client machine. To perform the above, users need to enter Protein Data Bank (PDB)-id(s) or upload the atomic coordinates in PDB format. This server uses a locally maintained PDB anonymous FTP server that is being updated weekly. This program can be accessed through our Bioinformatics web server at the URL or .
A central tenet of structural biology is that related proteins of common function share structural similarity. This has key practical consequences for the derivation and analysis of protein structures, and is exploited by the process of “molecular sieving” whereby a common core is progressively distilled from a comparison of two or more protein structures. This paper reports a novel web server for “sieving” of protein structures, based on the multiple structural alignment program MUSTANG.
“Sieved” models are generated from MUSTANG-generated multiple alignment and superpositions by iteratively filtering out noisy residue-residue correspondences, until the resultant correspondences in the models are optimally “superposable” under a threshold of RMSD. This residue-level sieving is also accompanied by iterative elimination of the poorly fitting structures from the input ensemble. Therefore, by varying the thresholds of RMSD and the cardinality of the ensemble, multiple sieved models are generated for a given multiple alignment and superposition from MUSTANG. To aid the identification of structurally conserved regions of functional importance in an ensemble of protein structures, Lesk-Hubbard graphs are generated, plotting the number of residue correspondences in a superposition as a function of its corresponding RMSD. The conserved “core” (or typically active site) shows a linear trend, which becomes exponential as divergent parts of the structure are included into the superposition.
The application addresses two fundamental problems in structural biology: First, the identification of common substructures among structurally related proteins—an important problem in characterization and prediction of function; second, generation of sieved models with demonstrated uses in protein crystallographic structure determination using the technique of Molecular Replacement.
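The residue-level sieving loop described above can be sketched as follows. This is a simplified illustration, not the server's implementation: it assumes the superposition is held fixed while pairs are removed, whereas a real sieving procedure would typically re-superpose after each removal. The function name and the example deviations are hypothetical.

```python
import math

def sieve(deviations, rmsd_threshold):
    """Drop the worst-fitting pairs until the RMSD of the rest is under the threshold.

    deviations: distance (in angstroms) between each pair of corresponding
    residues in a fixed superposition. Returns the kept deviations and a trace
    of (number of pairs, RMSD) points, the raw data for a Lesk-Hubbard-style plot.
    """
    kept = sorted(deviations)
    trace = []
    while kept:
        rmsd = math.sqrt(sum(d * d for d in kept) / len(kept))
        trace.append((len(kept), rmsd))
        if rmsd <= rmsd_threshold:
            break
        kept.pop()  # remove the pair with the largest deviation
    return kept, trace

# Hypothetical per-residue deviations: three core pairs and two divergent ones.
core, trace = sieve([0.3, 0.4, 0.5, 2.0, 6.0], rmsd_threshold=0.5)
```

Plotting the second element of each trace point against the first gives the curve in which a conserved core shows up as the slowly rising part.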
Most of the proteins in the Protein Data Bank (PDB) are oligomeric complexes consisting of two or more subunits that associate by rotational or helical symmetries. Despite the myriad of superimposition tools in the literature, we could not find any able to account for rotational symmetry and display the graphical results in the web browser.
BioSuper is a free web server that superimposes and calculates the root mean square deviation (RMSD) of protein complexes displaying rotational symmetry. To the best of our knowledge, BioSuper is the first tool of its kind that provides immediate interactive visualization of the graphical results in the browser, biomolecule generator capabilities, different levels of atom selection, sequence-dependent and structure-based superimposition types, and is the only web tool that takes into account the equivalence of atoms in side chains displaying symmetry ambiguity. BioSuper uses ICM program functionality as a core for the superimpositions and displays the results as text, HTML tables and 3D interactive molecular objects that can be visualized in the browser or in Android and iOS platforms with a free plugin.
BioSuper is a fast and functional tool that allows for pairwise superimposition of proteins and assemblies displaying rotational symmetry. The web server was created after our own frustration when attempting to superimpose flexible oligomers. We strongly believe that its user-friendly and functional design will be of great interest for structural and computational biologists who need to superimpose oligomeric proteins (or any protein). BioSuper web server is freely available to all users at http://ablab.ucsd.edu/BioSuper.
Protein structures often show similarities to one another that would not be seen at the sequence level. Given the coordinates of a protein chain, the SALAMI server at www.zbh.uni-hamburg.de/salami will search the protein data bank and return a set of similar structures without using sequence information. The results page lists the related proteins, details of the sequence and structure similarity and implied sequence alignments. Via a simple structure viewer, one can view superpositions of query and library structures and finally download superimposed coordinates. The alignment method is very tolerant of large gaps and insertions, and tends to produce slightly longer alignments than other similar programs.
PALI (release 1.2) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of homologous protein domains in various families. The data set of homologous protein structures has been derived by consulting the SCOP database (release 1.50) and the data set comprises 604 families of homologous proteins involving 2739 protein domain structures with each family made up of at least two members. Each member in a family has been structurally aligned with every other member in the same family (pairwise alignment) and all the members in the family are also aligned using simultaneous superposition (multiple alignment). The structural alignments are performed largely automatically, with manual interventions especially in the cases of distantly related proteins, using the program STAMP (version 4.2). Every family is also associated with two dendrograms, calculated using PHYLIP (version 3.5), one based on a structural dissimilarity metric defined for every pairwise alignment and the other based on similarity of topologically equivalent residues. These dendrograms enable easy comparison of sequence and structure-based relationships among the members in a family. Structure-based alignments with the details of structural and sequence similarities, superposed coordinate sets and dendrograms can be accessed conveniently using a web interface. The database can be queried for protein pairs with sequence or structural similarities falling within a specified range. Thus PALI forms a useful resource to help in analysing the relationship between sequence and structure variation at a given level of sequence similarity. PALI also contains over 653 ‘orphans’ (single member families). Using the web interface involving PSI_BLAST and PHYLIP it is possible to associate the sequence of a new protein with one of the families in PALI and generate a phylogenetic tree combining the query sequence and proteins of known 3-D structure. The database with the web interfaced search and dendrogram generation tools can be accessed at http://pa
Our web site (http://ekhidna.biocenter.helsinki.fi/dali_server) runs the Dali program for protein structure comparison. The web site consists of three parts: (i) the Dali server compares newly solved structures against structures in the Protein Data Bank (PDB), (ii) the Dali database allows browsing precomputed structural neighbourhoods and (iii) the pairwise comparison generates suboptimal alignments for a pair of structures. Each part has its own query form and a common format for the results page. The inputs are either PDB identifiers or novel structures uploaded by the user. The results pages are hyperlinked to aid interactive analysis. The web interface is simple and easy to use. The key purpose of interactive analysis is to check whether conserved residues line up in multiple structural alignments and how conserved residues and ligands cluster together in multiple structure superimpositions. In favourable cases, protein structure comparison can lead to evolutionary discoveries not detected by sequence analysis.
3dLOGO is a web server for the identification and analysis of conserved protein 3D substructures. Given a set of residues in a PDB (Protein Data Bank) chain, the server detects the matching substructure(s) in a set of user-provided protein structures, generates a multiple structure alignment centered on the input substructures and highlights other residues whose structural conservation becomes evident after the defined superposition. Conserved residues are proposed to the user for highlighting functional areas, deriving refined structural motifs or building sequence patterns. Residue structural conservation can be visualized through an expressly designed Java application, 3dProLogo, which is a 3D implementation of a sequence logo. The 3dLOGO server, with related documentation, is available at http://3dlogo.uniroma2.it/
FF (Fragment Finder) is a web-based interactive search engine developed to retrieve the user-desired similar 3D structural fragments from the selected subset of 25 or 90% non-homologous protein chains. The search is based on the comparison of the main chain backbone conformational angles (φ and ψ). Additionally, the queried motifs can be superimposed to find out how similar the structural fragments are, so that the information can be effectively used in molecular modeling. The engine has facilities to view the resultant superposed or individual 3D structure(s) on the client machine. The proposed web server is made freely accessible at the following URL: or .
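A comparison of backbone conformational angles can be sketched as below. This is an illustrative reconstruction, not the FF implementation: the dihedral routine is the standard four-point torsion calculation, while `fragments_match` and its 30 degree tolerance are hypothetical choices made for the example.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in the range (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def angle_difference(a, b):
    """Smallest absolute difference between two angles in degrees (periodic)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def fragments_match(angles_a, angles_b, tolerance=30.0):
    """Compare two lists of (phi, psi) pairs residue by residue.

    The tolerance is an assumed value, not one taken from the FF server.
    """
    if len(angles_a) != len(angles_b):
        return False
    return all(angle_difference(pa, pb) <= tolerance and
               angle_difference(sa, sb) <= tolerance
               for (pa, sa), (pb, sb) in zip(angles_a, angles_b))
```

For a residue i, φ would be `dihedral(C[i-1], N[i], CA[i], C[i])` and ψ would be `dihedral(N[i], CA[i], C[i], N[i+1])`. The periodic difference matters because angles such as 170° and -170° are only 20° apart.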
MovieMaker is a web server that allows short (∼10 s), downloadable movies of protein motions to be generated. It accepts PDB files or PDB accession numbers as input and automatically calculates, renders and merges the necessary image files to create colourful animations covering a wide range of protein motions and other dynamic processes. Users have the option of animating (i) simple rotation, (ii) morphing between two end-state conformers, (iii) short-scale, picosecond vibrations, (iv) ligand docking, (v) protein oligomerization, (vi) mid-scale nanosecond (ensemble) motions and (vii) protein folding/unfolding. MovieMaker does not perform molecular dynamics calculations. Instead it is an animation tool that uses a sophisticated superpositioning algorithm in conjunction with Cartesian coordinate interpolation to rapidly and automatically calculate the intermediate structures needed for many of its animations. Users have extensive control over the rendering style, structure colour, animation quality, background and other image features. MovieMaker is intended to be a general-purpose server that allows both experts and non-experts to easily generate useful, informative protein animations for educational and illustrative purposes. MovieMaker is accessible at .
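Cartesian coordinate interpolation of the kind mentioned above can be sketched in a few lines. The example assumes the two end-state conformers have already been superposed and list the same atoms in the same order; plain linear interpolation is used here for illustration and may differ from what MovieMaker actually does. The function name is hypothetical.

```python
def interpolate_frames(start, end, n_frames):
    """Linearly interpolate atom positions between two superposed conformers.

    start, end: lists of (x, y, z) tuples with matching atom order.
    Returns n_frames coordinate sets, the first equal to start and the last to end.
    """
    if len(start) != len(end):
        raise ValueError("conformers must contain the same number of atoms")
    if n_frames < 2:
        raise ValueError("need at least two frames")
    frames = []
    for f in range(n_frames):
        t = f / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        frames.append([tuple(s + t * (e - s) for s, e in zip(a, b))
                       for a, b in zip(start, end)])
    return frames

# One hypothetical atom moving from the origin to (2, 4, 6) in three frames.
frames = interpolate_frames([(0.0, 0.0, 0.0)], [(2.0, 4.0, 6.0)], 3)
```

Straight-line interpolation can distort bond lengths for large motions, which is one reason a careful superposition of the end states beforehand matters.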
Currently, the PDB contains approximately 29,000 protein structures comprising over 70,000 experimentally determined three-dimensional structures of over 5,000 different low molecular weight compounds. Information about these PDB ligands can be very helpful in the field of molecular modelling and prediction, particularly for the prediction of protein binding sites and function.
Here we present an Internet accessible database delivering PDB ligands in the MDL Mol file format which, in contrast to the PDB format, includes information about bond types. Structural similarity of the compounds can be detected by calculation of Tanimoto coefficients and by three-dimensional superposition. Topological similarity of PDB ligands to known drugs can be assessed via Tanimoto coefficients.
SuperLigands supplements the set of existing resources of information about small molecules bound to PDB structures. Allowing for three-dimensional comparison of the compounds as a novel feature, this database represents a valuable means of analysis and prediction in the field of biological and medical research.
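The Tanimoto coefficient used for topological similarity is the size of the intersection of two feature sets divided by the size of their union. A minimal sketch follows, assuming each compound has already been reduced to a set of fingerprint features; how SuperLigands derives its fingerprints is not stated in the abstract, and the feature identifiers below are hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient of two feature sets: |A and B| / |A or B|."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)

# Hypothetical fingerprints given as sets of "on" bit positions.
ligand = {1, 2, 3}
drug = {2, 3, 4}
similarity = tanimoto(ligand, drug)
```

The value ranges from 0 (no shared features) to 1 (identical feature sets).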
Protein structure research often deals with the comparison of two or more structures of the same protein, for instance when handling alternative structure models for the same protein, point mutants, molecule movements, structure predictions, etc. Often the difference between structures is small, restricted to a local neighborhood, and buried in structural "noise" due to trivial differences resulting from experimental artifacts. In such cases, whole-structure comparisons by means of structure superposition may be unsatisfactory and researchers have to perform a tedious process of manually superposing different segments individually and/or use different frames of reference, chosen roughly by educated guessing.
We have developed an algorithm to compare local structural differences between alternative structures of the same protein. We have implemented the algorithm through a computer program that performs the numerical evaluation and allows inspecting visually the results of the structure comparison. We have tested the algorithm on different kinds of model systems. Here we present the algorithm and some results to illustrate its characteristics.
This program may provide an insight into the local structural changes produced in a protein structure by different interactions or modifications. It is convenient for the general user and it can be applied to standard or specific tasks on protein structure research.
An appropriate structural superposition identifies similarities and differences between homologous proteins that are not evident from sequence alignments alone. We have coupled our Gaussian-weighted RMSD (wRMSD) tool with a sequence aligner and seed extension (SE) algorithm to create a robust technique for overlaying structures and aligning sequences of homologous proteins (HwRMSD). HwRMSD overcomes errors in the initial sequence alignment that would normally propagate into a standard RMSD overlay. SE can generate a corrected sequence alignment from the improved structural superposition obtained by wRMSD. HwRMSD’s robust performance and its superiority over standard RMSD are demonstrated over a range of homologous proteins. Its better overlay results in corrected sequence alignments with good agreement to HOMSTRAD. Finally, HwRMSD is compared to established structural alignment methods: FATCAT, SSM, CE, and Dalilite. Most methods are comparable at placing residue pairs within 2 Å, but HwRMSD places many more residue pairs within 1 Å, providing a clear advantage. Such high accuracy is essential in drug design, where small distances can have a large impact on computational predictions. This level of accuracy is also needed to correct sequence alignments in an automated fashion, especially for omics-scale analysis. HwRMSD can align homologs with low sequence identity and large conformational differences, cases where both sequence-based and structural-based methods may fail. The HwRMSD pipeline overcomes the dependency of structural overlays on initial sequence pairing and removes the need to determine the best sequence-alignment method, substitution matrix, and gap parameters for each unique pair of homologs.
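The effect of Gaussian weighting can be illustrated with a short sketch. It assumes weights of the form exp(-d²/c) applied to the distance d between each pair of corresponding atoms in a fixed overlay; the scaling constant `scale` is a hypothetical value, and the full method would also recompute the superposition with these weights, which is not shown.

```python
import math

def gaussian_weights(distances, scale=5.0):
    """Weight each atom pair by exp(-d^2 / scale); distant pairs count less."""
    return [math.exp(-(d * d) / scale) for d in distances]

def weighted_rmsd(distances, scale=5.0):
    """Gaussian-weighted RMSD for a fixed overlay (scale is an assumed constant)."""
    weights = gaussian_weights(distances, scale)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all weights underflowed; distances are too large for this scale")
    return math.sqrt(sum(w * d * d for w, d in zip(weights, distances)) / total)

def plain_rmsd(distances):
    """Unweighted RMSD of the same distances, for comparison."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

# Two well-fitting pairs and one pair in a displaced loop (angstroms, hypothetical).
distances = [0.5, 0.5, 10.0]
```

With these numbers the displaced pair dominates the plain RMSD but contributes almost nothing to the weighted value, which is how the weighting keeps a flexible region from dragging the overlay of the conserved part.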
Homolog; protein flexibility; sequence alignment; structure overlay; RMSD; structure alignment
The identification of evolutionarily conserved features of protein structures can provide insights into their functional and structural properties. Three methods have been developed and implemented as WWW tools, CAMPO, SCR_FIND and CHC_FIND, to analyze evolutionarily conserved residues (ECRs), structurally conserved regions (SCRs) and conserved hydrophobic contacts (CHCs) in protein families and superfamilies, on the basis of their 3D structures and the homologous sequences available. The programs identify protein segments that conserve a similar main-chain conformation, compute residue-to-residue hydrophobic contacts involving only apolar atoms common to all the 3D structures analyzed and allow the identification of conserved amino-acid sites among protein structures and their homologous sequences. The programs also allow the visualization of SCRs, CHCs and ECRs directly on the superposed structures and their multiple structural and sequence alignments. Tools and tutorials explaining their usage are available at , and .
The SWISS-MODEL Repository is a database of annotated three-dimensional comparative protein structure models generated by the fully automated homology-modelling pipeline SWISS-MODEL. The Repository currently contains about 300 000 three-dimensional models for sequences from the Swiss-Prot and TrEMBL databases. The content of the Repository is updated on a regular basis incorporating new sequences, taking advantage of new template structures becoming available and reflecting improvements in the underlying modelling algorithms. Each entry consists of one or more three-dimensional protein models, the superposed template structures, the alignments on which the models are based, a summary of the modelling process and a force field based quality assessment. The SWISS-MODEL Repository can be queried via an interactive website at http://swissmodel.expasy.org/repository/. Annotation and cross-linking of the models with other databases, e.g. Swiss-Prot on the ExPASy server, allow for seamless navigation between protein sequence and structure information. The aim of the SWISS-MODEL Repository is to provide access to an up-to-date collection of annotated three-dimensional protein models generated by automated homology modelling, bridging the gap between sequence and structure databases.
Summary: With the continuous growth of the RCSB Protein Data Bank (PDB), providing an up-to-date systematic structure comparison of all protein structures poses an ever growing challenge. Here, we present a comparison tool for calculating both 1D protein sequence and 3D protein structure alignments. This tool supports various applications at the RCSB PDB website. First, a structure alignment web service calculates pairwise alignments. Second, a stand-alone application runs alignments locally and visualizes the results. Third, pre-calculated 3D structure comparisons for the whole PDB are provided and updated on a weekly basis. These three applications allow users to discover novel relationships between proteins available either at the RCSB PDB or provided by the user.
Availability and Implementation: A web user interface is available at http://www.rcsb.org/pdb/workbench/workbench.do. The source code is available under the LGPL license from http://www.biojava.org. A source bundle, prepared for local execution, is available from http://source.rcsb.org
CE-MC server (http://cemc.sdsc.edu) provides a web-based facility for the alignment of multiple protein structures based on C-α coordinate distances, using combinatorial extension (CE) and Monte Carlo (MC) optimization methods. Alignments are possible for user-selected PDB (Protein Data Bank) chains as well as for user-uploaded structures or the combination of the two. The whole process of generating multiple structure alignments involves three distinct steps, i.e. all-to-all pairwise alignment using the CE algorithm, iterative global optimization of a multiple alignment using the MC algorithm and formatting MC results using the JOY program. The server can be used to get multiple alignments for up to 25 protein structural chains with the flexibility of uploading multiple coordinate files and performing multiple structure alignment for user-selected PDB chains. For large-scale jobs and local installation of the CE-MC program, users can download the source code and precompiled binaries from the web server.
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which “redundant” structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information.
We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).
structure comparison; structure alignment; structural differences; nonredundant; structure prediction
Calculation of the root mean square deviation (RMSD) between the atomic coordinates of two optimally superposed structures is a basic component of structural comparison techniques. We describe a quaternion based method, GPU-Q-J, that is stable with single precision calculations and suitable for graphics processor units (GPUs). The application was implemented on an ATI 4770 graphics card in C/C++ and Brook+ in Linux where it was 260 to 760 times faster than existing unoptimized CPU methods. Source code is available from the Compbio website http://software.compbio.washington.edu/misc/downloads/st_gpu_fit/ or from the author LHH.
The Nutritious Rice for the World Project (NRW) on World Community Grid predicted, de novo, the structures of over 62,000 small proteins and protein domains, returning a total of 10 billion candidate structures. Clustering ensembles of structures on this scale requires calculation of large similarity matrices consisting of RMSDs between each pair of structures in the set. As a real-world test, we calculated the matrices for 6 different ensembles from NRW. The GPU method was 260 times faster than the fastest existing CPU-based method and over 500 times faster than the method that had been previously used.
GPU-Q-J is a significant advance over previous CPU methods. It relieves a major bottleneck in the clustering of large numbers of structures for NRW. It also has applications in structure comparison methods that involve multiple superposition and RMSD determination steps, particularly when such methods are applied on a proteome and genome wide scale.
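A quaternion-based RMSD calculation can be sketched on the CPU as follows. This is not the authors' GPU code: it is a plain Python illustration of the general quaternion approach (largest eigenvalue of Horn's 4x4 key matrix), and the use of cyclic Jacobi rotations for the eigenvalue step is an assumption made here because it needs no external library. All function names are hypothetical.

```python
import math

def _centered(coords):
    """Translate the coordinates so that their centroid is at the origin."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in coords]

def _jacobi_eigenvalues(matrix, max_sweeps=50):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < 1e-30:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-150:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def quaternion_rmsd(coords_a, coords_b):
    """RMSD after optimal rigid superposition, without building the rotation."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    a, b = _centered(coords_a), _centered(coords_b)
    n = len(a)
    g = sum(x * x for p in a for x in p) + sum(x * x for p in b for x in p)
    # s[j][k] is the sum over atoms of a_j * b_k (the correlation matrix).
    s = [[sum(pa[j] * pb[k] for pa, pb in zip(a, b)) for k in range(3)]
         for j in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = max(_jacobi_eigenvalues(key))
    # Clamp at zero: rounding can make the difference slightly negative.
    return math.sqrt(max(0.0, (g - 2.0 * lam) / n))
```

Only the largest eigenvalue is needed for the RMSD, so no rotation matrix is ever constructed. That is what makes this formulation attractive when filling a large pairwise similarity matrix, since each pair is an independent small computation.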
Motivation: Functional similarity between proteins is evident at both the sequence and structure levels. SeSAW is a web-based program for identifying functionally or evolutionarily conserved motifs in protein structures by locating sequence and structural similarities, and quantifying these at the level of individual residues. Results can be visualized in 2D, as annotated alignments, or in 3D, as structural superpositions. An example is given for both an experimentally determined query structure and a homology model.
Availability and Implementation: The web server is located at http://www.pdbj.org/SeSAW/
Motivation: Comparing 3D structures of homologous RNA molecules yields information about sequence and structural variability. To compare large RNA 3D structures, accurate automatic comparison tools are needed. In this article, we introduce a new algorithm and web server to align large homologous RNA structures nucleotide by nucleotide using local superpositions that accommodate the flexibility of RNA molecules. Local alignments are merged to form a global alignment by employing a maximum clique algorithm on a specially defined graph that we call the ‘local alignment’ graph.
Results: The algorithm is implemented in a program suite and web server called ‘R3D Align’. The R3D Align alignment of homologous 3D structures of 5S, 16S and 23S rRNA was compared to a high-quality hand alignment. A full comparison of the 16S alignment with the other state-of-the-art methods is also provided. The R3D Align program suite includes new diagnostic tools for the structural evaluation of RNA alignments. The R3D Align alignments were compared to those produced by other programs and were found to be the most accurate, in comparison with a high quality hand-crafted alignment and in conjunction with a series of other diagnostics presented. The number of aligned base pairs as well as measures of geometric similarity are used to evaluate the accuracy of the alignments.
Availability: R3D Align is freely available through a web server http://rna.bgsu.edu/R3DAlign. The MATLAB source code of the program suite is also freely available for download at that location.
Supplementary data are available at Bioinformatics online.
Protein structure comparison, an important problem in structural biology, has two main applications: (i) comparing two protein structures in order to identify the similarities and differences between them, and (ii) searching for structures similar to a query structure. Many web-based resources for both applications are available, but all are based on rigid structural alignment algorithms. FATCAT server implements the recently developed flexible protein structure comparison algorithm FATCAT, which automatically identifies hinges and internal rearrangements in two protein structures. The server provides access to two algorithms: FATCAT-pairwise for pairwise flexible structure comparison and FATCAT-search for database searching for structurally similar proteins. Given two protein structures [in the Protein Data Bank (PDB) format], FATCAT-pairwise reports their structural alignment and the corresponding statistical significance of the similarity measured as a P-value. Users can view the superposition of the structures online in web browsers that support the Chime plug-in, or download the superimposed structures in PDB format. In FATCAT-search, users provide one query structure and the server returns a list of protein structures that are similar to the query, ordered by the P-values. In addition, FATCAT server can report the conformational changes of the query structure as compared to other proteins in the structure database. FATCAT server is available at http://fatcat.burnham.org.
The recent accumulation of large amounts of 3D structural data warrants a sensitive and automatic method to compare and classify these structures. We developed a web server for comparing protein 3D structures using the program Matras (http://biunit.aist-nara.ac.jp/matras). An advantage of Matras is its structure similarity score, which is defined as the log-odds of the probabilities, similar to Dayhoff's substitution model of amino acids. This score is designed to detect evolutionarily related (homologous) structural similarities. Our web server has three main services. The first one is a pairwise 3D alignment, which simply aligns two structures. A user can specify structures either by inputting PDB codes or by uploading PDB format files from the local machine. The second service is a multiple 3D alignment, which compares several protein structures. This program employs the progressive alignment algorithm, in which pairwise 3D alignments are assembled in the proper order. The third service is a 3D library search, which compares one query structure against a large number of library structures. We hope this server provides useful tools for insights into protein 3D structures.
KinDOCK is a new web server for the analysis of ATP-binding sites of protein kinases. This characterization is based on the docking of ligands already co-crystallized with other protein kinases. A structural library of protein kinase–ligand complexes has been extracted from the Protein Data Bank (PDB). This library can provide both potential ligands and their putative binding orientation for a given protein kinase. After protein–protein structural superposition, the ligands are transferred from the template complexes to the target protein kinase. The resulting complexes are evaluated using the program SCORE to compute a theoretical affinity. They can be dynamically visualized to allow a rapid mapping of important steric clashes and potential substitutions relevant for specificity and affinity. These characteristics allow a quick characterization of protein kinase active sites including conformation changes potentially required to accommodate particular ligands. Additionally, promising pharmacophores can be identified in the focussed library. These features will help to rationalize or optimize virtual screening (VS) on larger chemical compound libraries. The server and its documentation are freely available at .
HOMSTRAD (http://www-cryst.bioc.cam.ac.uk/homstrad/) is a collection of protein families, clustered on the basis of sequence and structural similarity. The database is unique in that the protein family sequence alignments have been specially annotated using the program JOY to highlight a wide range of structural features. Such data are useful for identifying key structurally conserved residues within the families. Superpositions of the structures within each family are also available and a sensitive structure-aided search engine, FUGUE, can be used to search the database for matches to a query protein sequence. Historically, HOMSTRAD families were generated using several key pieces of software, including COMPARER and MNYFIT, and held in a number of flat files and indexes. A new relational database version of HOMSTRAD, HOMSTRAD BETA (http://www-cryst.bioc.cam.ac.uk/homstradbeta/), is being developed using MySQL. This relational data structure provides more flexibility for future developments, reduces update times and makes data more easily accessible. Consequently it has been possible to add a number of new web features including a custom alignment facility. Altogether, this makes HOMSTRAD and its new BETA version an excellent resource both for comparative modelling and for identifying distant sequence/structure similarities between proteins.