Molecular dynamics (MD) simulations of double-stranded DNA with explicit water and small ions were performed with the zero-dipole summation (ZD) method, a recently developed non-Ewald method. Double-stranded DNA is highly charged and polar, with phosphate groups in its backbone and their counterions, and thus precise treatment of the long-range electrostatic interactions is always required to maintain the stable, native double-stranded form. A simple truncation method deforms it profoundly. In contrast, the ZD method, which considers the neutralities of charges and dipoles in a truncated subset, closely reproduced the electrostatic energies of the DNA system calculated by the Ewald method. The MD simulations using the ZD method provided a stable DNA system, with structures and dynamic properties similar to those produced by the conventional particle mesh Ewald method.
A symposium celebrating the 40th anniversary of the Protein Data Bank archive (PDB), organized by the Worldwide Protein Data Bank, was held at Cold Spring Harbor Laboratory (CSHL) October 28–30, 2011. PDB40’s distinguished speakers highlighted four decades of innovation in structural biology, from the early era of structural determination to future directions for the field.
Several non-Ewald methods for calculating electrostatic interactions have recently been developed, such as the Wolf method, the reaction field method, the pre-averaging method, and the zero-dipole summation method, for molecular dynamics simulations of various physical systems, including biomolecular systems. We review the theories of these approaches and their potential applications to molecular simulations, and discuss their relationships.
Molecular dynamics; Electrostatic interaction; Reaction field method; Pre-averaging method; Wolf method; Zero-dipole summation method
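As a rough illustration of the simplest of these non-Ewald schemes, the sketch below evaluates the damped, shifted pairwise sum in the form commonly given for the Wolf method. It is only a sketch under stated assumptions: reduced units (Coulomb constant of 1), no periodic images, and hypothetical names (`wolf_energy`, `r_cut`, `alpha`). It does not implement the zero-dipole summation method, which adds further terms.

```python
import math


def wolf_energy(charges, positions, r_cut, alpha):
    """Damped, shifted pairwise Coulomb energy in the style of the Wolf method.

    Assumptions: reduced units, an isolated (non-periodic) set of point
    charges, and a damping parameter alpha >= 0 chosen by the caller.
    """
    # Value of the damped potential at the cutoff, used to shift each pair term.
    shift = math.erfc(alpha * r_cut) / r_cut
    pair = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            if r < r_cut:
                pair += charges[i] * charges[j] * (math.erfc(alpha * r) / r - shift)
    # Self term that accompanies the charge-neutralizing shift.
    self_term = (shift / 2.0 + alpha / math.sqrt(math.pi)) * sum(q * q for q in charges)
    return pair - self_term
```

With `alpha = 0` the expression reduces to a plain shifted Coulomb sum, which makes the role of the neutralizing shift easy to check on a single ion pair.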
Recent clinical trials using antibodies with low toxicity and high efficiency have raised expectations for the development of next-generation protein therapeutics. However, the process of obtaining therapeutic antibodies remains time consuming and empirical. This review summarizes recent progress in the field of computer-aided antibody development, focusing mainly on antibody modeling, which is divided essentially into two parts: (i) modeling the antigen-binding site, also called the complementarity determining regions (CDRs), and (ii) predicting the relative orientations of the variable heavy (VH) and light (VL) chains. Among the six CDR loops, the greatest challenge is predicting the conformation of CDR-H3, which is the most important in antigen recognition. Further computational methods could be used in drug development based on crystal structures or homology models, including antibody–antigen docking and energy calculations with approximate potential functions. These methods should guide experimental studies to improve the affinities and physicochemical properties of antibodies. Finally, several successful examples of in silico structure-based antibody designs are reviewed. We also briefly review structure-based antigen or immunogen design, with application to rational vaccine development.
antibody design; antibody engineering; protein therapeutics; vaccine design
Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures.
Protein folding and protein–ligand docking have long persisted as important subjects in biophysics. Using multicanonical molecular dynamics (McMD) simulations with realistic expressions, i.e., all-atom protein models and an explicit solvent, free-energy landscapes have been computed for several systems, such as the folding of peptides/proteins composed of a few amino acids up to nearly 60 amino-acid residues, protein–ligand interactions, and coupled folding and binding of intrinsically disordered proteins. Recent progress in conformational sampling and its applications to biophysical systems are reviewed in this report, including descriptions of several outstanding studies. In addition, an algorithm and detailed procedures used for multicanonical sampling are presented along with the methodology of adaptive umbrella sampling. Both methods control the simulation so that low-probability regions along a reaction coordinate are sampled frequently. The reaction coordinate is the potential energy for multicanonical sampling and is a structural identifier for adaptive umbrella sampling. One might imagine that this probability control invariably enhances conformational transitions among distinct stable states, but this study examines the enhanced conformational sampling of a simple system and shows that reasonably well-controlled sampling slows the transitions. This slowing is induced by a rapid change of entropy along the reaction coordinate. We then provide a recipe to speed up the sampling by loosening the rapid change of entropy. Finally, we report all-atom McMD simulation results of various biophysical systems in an explicit solvent.
Molecular dynamics; Enhanced sampling; Generalized ensemble; Multicanonical; Canonical ensemble; Free-energy landscape
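One step that usually follows a multicanonical run is reweighting the sampled conformations back to a canonical ensemble. The sketch below shows that reweighting as generic importance sampling. It assumes each sample carries its potential energy and the logarithm of the non-canonical weight under which it was drawn; the function name and arguments are hypothetical and are not taken from the reviewed work.

```python
import math


def canonical_average(values, energies, log_weights, beta):
    """Canonical average of an observable from non-canonically weighted samples.

    Each sample i is assumed to have been drawn with relative weight
    exp(log_weights[i]); it is reweighted by exp(-beta * E_i) / weight_i.
    """
    log_w = [-beta * e - lw for e, lw in zip(energies, log_weights)]
    # Subtract the maximum before exponentiating to avoid overflow.
    shift = max(log_w)
    w = [math.exp(x - shift) for x in log_w]
    return sum(a * wi for a, wi in zip(values, w)) / sum(w)
```

If the samples were already drawn canonically (log-weights equal to `-beta * E`), every reweighting factor is 1 and the function returns the plain mean.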
The Protein Data Bank Japan (PDBj, http://pdbj.org) is a member of the worldwide Protein Data Bank (wwPDB) and accepts and processes the deposited data of experimentally determined macromolecular structures. While maintaining the archive in collaboration with other wwPDB partners, PDBj also provides a wide range of services and tools for analyzing structures and functions of proteins, which are summarized in this article. To enhance the interoperability of the PDB data, we have recently developed PDB/RDF, PDB data in the Resource Description Framework (RDF) format, along with its ontology in the Web Ontology Language (OWL) based on the PDB mmCIF Exchange Dictionary. Being in the standard format for the Semantic Web, the PDB/RDF data provide a means to integrate the PDB with other biological information resources.
Despite the availability of a large number of protein–protein interactions (PPIs) in several species, researchers are often limited to using very small subsets in a few organisms due to the high prevalence of spurious interactions. In spite of the importance of quality assessment of experimentally determined PPIs, a surprisingly small number of databases provide interactions with scores and confidence levels. We introduce HitPredict (http://hintdb.hgc.jp/htp/), a database with quality assessed PPIs in nine species. HitPredict assigns a confidence level to interactions based on a reliability score that is computed using evidence from sequence, structure and functional annotations of the interacting proteins. HitPredict was first released in 2005 and is updated annually. The current release contains 36 930 proteins with 176 983 non-redundant, physical interactions, of which 116 198 (66%) are predicted to be of high confidence.
This article is a tutorial for PDBj Mine, a new database and its interface for Protein Data Bank Japan (PDBj). In PDBj Mine, data are loaded from files in the PDBMLplus format (an extension of PDBML, PDB's canonical XML format, enriched with annotations), which are then served to PDBj users via the World Wide Web (WWW). We describe the basic design of the relational database (RDB) and web interfaces of PDBj Mine. The contents of PDBMLplus files are first broken into XPath entities, and these paths and data are indexed in a way that reflects the hierarchical structure of the XML files. The data for each XPath type are saved into a corresponding relational table named after the XPath itself. The generation of table definitions from the PDBMLplus XML schema is fully automated. For efficient search, frequently queried terms are compiled into a brief summary table. Casual users can perform a simple keyword search or use 'Advanced Search', which can specify various conditions on the entries. More experienced users can query the database using SQL statements, which can be constructed in a uniform manner. Thus, PDBj Mine achieves a combination of the flexibility of XML documents and the robustness of the RDB.
Database URL: http://www.pdbj.org/
The PiRaNhA web server is a publicly available online resource that automatically predicts the location of RNA-binding residues (RBRs) in protein sequences. The goal of functional annotation of sequences in the field of RNA binding is to provide predictions of high accuracy that require only small numbers of targeted mutations for verification. The PiRaNhA server uses a support vector machine (SVM), with position-specific scoring matrices, residue interface propensity, predicted residue accessibility and residue hydrophobicity as features. The server allows the submission of up to 10 protein sequences, and the predictions for each sequence are provided on a web page and via email. The prediction results are provided in sequence format with predicted RBRs highlighted, in text format with the SVM threshold score indicated and as a graph which enables users to quickly identify those residues above any specific SVM threshold. The graph effectively enables the increase or decrease of the false positive rate. When tested on a non-redundant data set of 42 protein sequences not used in training, the PiRaNhA server achieved an accuracy of 85%, specificity of 90% and a Matthews correlation coefficient of 0.41 and outperformed other publicly available servers. The PiRaNhA prediction server is freely available at http://www.bioinformatics.sussex.ac.uk/PIRANHA.
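The three performance figures quoted above follow from the standard confusion-matrix definitions. A minimal sketch, with a hypothetical function name and counts supplied by the caller rather than taken from the PiRaNhA benchmark:

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return (accuracy, specificity, Matthews correlation coefficient).

    tp, tn, fp, fn are counts of true/false positives/negatives, e.g.
    residues predicted as RNA-binding versus annotated as RNA-binding.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Specificity is undefined without negatives; 0.0 is returned by convention here.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, specificity, mcc
```

Because binding residues are typically a minority class, accuracy and specificity can be high while the Matthews correlation coefficient stays moderate, which is why all three are reported together.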
Motivation: Functional similarity between proteins is evident at both the sequence and structure levels. SeSAW is a web-based program for identifying functionally or evolutionarily conserved motifs in protein structures by locating sequence and structural similarities, and quantifying these at the level of individual residues. Results can be visualized in 2D, as annotated alignments, or in 3D, as structural superpositions. An example is given for both an experimentally determined query structure and a homology model.
Availability and Implementation: The web server is located at http://www.pdbj.org/SeSAW/
Hubs are proteins with a large number of interactions in a protein-protein interaction network. They are the principal agents in the interaction network and affect its function and stability. Their specific recognition of many different protein partners is of great interest from the structural viewpoint. Over the last few years, the structural properties of hubs have been extensively studied. We review the currently known features that are particular to hubs, possibly affecting their binding ability. Specifically, we look at the levels of intrinsic disorder, surface charge and domain distribution in hubs, as compared to non-hubs, along with differences in their functional domains.
protein-protein interactions; interaction networks; hubs; promiscuous binding
A molecular similarity measure has been developed using molecular topological graphs and atomic partial charges. Two kinds of topological graphs were used. One is the ordinary adjacency matrix and the other is a matrix that represents the minimum path length between each pair of atoms in the molecule. The ordinary adjacency matrix is suitable for comparing the local structures of molecules, such as functional groups, and the path-length matrix is suitable for comparing their global structures. The combination of these two matrices gave a similarity measure. This method was applied to in silico drug screening, and the results showed that it was effective as a similarity measure.
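The second matrix described above can be derived from the first. The sketch below builds a minimum-path-length matrix from an adjacency matrix with a breadth-first search from every atom. It covers only this construction; how the two matrices and the partial charges are combined into the similarity measure is not specified in the abstract and is not reproduced here. The function name is hypothetical.

```python
from collections import deque


def path_length_matrix(adjacency):
    """Minimum number of bonds between every pair of atoms.

    adjacency is a square 0/1 matrix (list of lists). Unreachable pairs,
    e.g. atoms in separate fragments, are marked with -1.
    """
    n = len(adjacency)
    dist = [[-1] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adjacency[u][v] and dist[start][v] < 0:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist
```

For a three-atom chain such as the carbon skeleton of propane, the two terminal atoms are not adjacent but are separated by a path length of 2.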
Protein–protein docking simulations can provide predicted structural models of complexes. In a docking simulation, several putative structural models are selected by scoring functions from an ensemble of many complex models. Scoring functions based on statistical analyses of heterodimers are usually designed to select, as the correct model, the complex model with the most abundant interaction mode found among the known complexes. However, because the formation schemes of heterodimers are extremely diverse, a single scoring function does not seem to be sufficient to describe the fitness of predicted models with interaction modes other than the most abundant one. Thus, it is necessary to classify the heterodimers in terms of their individual interaction modes, and then to construct a separate scoring function for each heterodimer type. In this study, we constructed a classification method for heterodimers based on the characteristics that discriminate near-native from decoy models, which were found by comparing the interfaces in terms of their complementarities in hydrophobicity, electrostatic potential and shape. Consequently, we found four heterodimer clusters, and then constructed multiple scoring functions, each optimized for one cluster. Our multiple scoring functions were applied to predictions in unbound docking.
classification of heterodimers; prediction of complex structures; scoring functions; protein-protein docking; CAPRI
A method was constructed to discriminate between biologically relevant interfaces and artificial crystal-packing contacts in crystal structures. The method evaluates protein-protein interfaces in terms of complementarities for hydrophobicity, electrostatic potential and shape on the protein surfaces, and chooses the most probable biological interfaces among all possible contacts in the crystal. The method uses a discriminator named “COMP”, which is a linear combination of the complementarities for the above three surface features and does not correlate with the contact area. The discrimination of homo-dimer interfaces from symmetry-related crystal-packing contacts based on the COMP value achieved a modest success rate. Subsequent detailed review of the discrimination results raised the success rate to about 88.8%. In addition, our discrimination method yielded some clues for understanding the interaction patterns in several examples in the PDB. Thus, the COMP discriminator can also be used as an indicator of the “biological-ness” of protein-protein interfaces.
protein-protein interaction; complementarity analysis; homo-dimer interface; crystal-packing contact; biological interfaces
We examined procedures for combining two different in silico drug-screening results to achieve a high hit ratio. When the 3D structure of the target protein and some active compounds are known, both structure-based and ligand-based in silico screening methods can be applied. In the present study, the machine-learning score modification multiple target screening (MSM-MTS) method was adopted as a structure-based screening method, and the machine-learning docking score index (ML-DSI) method was adopted as a ligand-based screening method. To combine the compound sets predicted by these two screening methods, we examined the product of the sets (consensus set) and the sum of the sets. As a result, the consensus set achieved a higher hit ratio than the sum of the sets and than either individual predicted set. In addition, the current combination was shown to be robust to structural diversity, both among different crystal structures and among snapshot structures from molecular dynamics simulations.
in silico; screening; consensus score; protein-based screening; protein-ligand docking; conformation of active site
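The two ways of combining the predicted sets correspond to set intersection (the consensus set) and set union (the sum of the sets). A minimal sketch of the comparison, assuming the hit ratio is the fraction of predicted compounds that are known actives; the function names and compound identifiers are hypothetical:

```python
def hit_ratio(predicted, actives):
    """Fraction of predicted compounds that are known actives."""
    return len(predicted & actives) / len(predicted) if predicted else 0.0


def combine(structure_based, ligand_based):
    """Return (consensus set, sum of the sets) for two predicted compound sets."""
    return structure_based & ligand_based, structure_based | ligand_based
```

In a toy case where the structure-based set is {a, b, c, d}, the ligand-based set is {c, d, e, f} and the actives are {c, d, e}, the consensus set {c, d} has a hit ratio of 1.0 while the union has 0.5, at the cost of missing the active compound e.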
Position-specific scoring matrices (PSSMs) are useful for detecting weak homology in protein sequence analysis, and they are thought to contain some essential signatures of the protein families. In order to elucidate what kind of ingredients constitute such family-specific signatures, we apply singular value decomposition to a set of PSSMs and examine the properties of dominant right and left singular vectors. The first right singular vectors were correlated with various amino acid indices including relative mutability, amino acid composition in protein interior, hydropathy, or turn propensity, depending on the protein. A significant correlation between the first left singular vector and a measure of site conservation was observed. It is shown that the contribution of the first singular component to the PSSMs acts to disfavor potentially, but falsely, functionally important residues at conserved sites. The second right singular vectors were highly correlated with hydrophobicity scales, and the corresponding left singular vectors with contact numbers of protein structures. It is suggested that sequence alignment with a PSSM is essentially equivalent to threading supplemented with functional information. In addition, singular vectors may be useful for analyzing and annotating the characteristics of conserved sites in protein families.
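To make the roles of the left and right singular vectors concrete, the sketch below extracts the first singular component of a matrix whose rows are sequence positions and whose columns are amino-acid types. It uses plain power iteration instead of a full singular value decomposition, purely as an illustration; the function name is hypothetical, and the uniform starting vector is an assumption that fails if it happens to be orthogonal to the leading right singular vector.

```python
import math


def leading_singular_triple(matrix, iterations=200):
    """Return (u, sigma, v): first left vector, singular value, first right vector.

    u has one entry per row (sequence position); v has one entry per
    column (amino-acid type). Computed by power iteration on M^T M.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # u = M v, then w = M^T u, i.e. one application of M^T M to v.
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return u, sigma, v
```

The left vector `u` can then be compared with per-site quantities such as conservation, and the right vector `v` with amino-acid indices such as hydrophobicity scales, which is the kind of correlation the abstract describes.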
We describe the role of the BioMagResBank (BMRB) within the Worldwide Protein Data Bank (wwPDB) and recent policies affecting the deposition of biomolecular NMR data. All PDB depositions of structures based on NMR data must now be accompanied by experimental restraints. A scheme has been devised that allows depositors to specify a representative structure and to define residues within that structure found experimentally to be largely unstructured. The BMRB now accepts coordinate sets representing three-dimensional structural models based on experimental NMR data of molecules of biological interest that fall outside the guidelines of the Protein Data Bank (i.e., the molecule is a peptide with 23 or fewer residues, a polynucleotide with 3 or fewer residues, a polysaccharide with 3 or fewer sugar residues, or a natural product), provided that the coordinates are accompanied by a representation of the covalent structure of the molecule (atom connectivity), assigned NMR chemical shifts, and the structural restraints used in generating the model. The BMRB now contains an archive of NMR data for metabolites and other small molecules found in biological systems.
Archived NMR data; Metabolomics; NMR structure; Structural restraints; Unstructured regions
The Worldwide Protein Data Bank (wwPDB; wwpdb.org) is the international collaboration that manages the deposition, processing and distribution of the PDB archive. The online PDB archive at ftp://ftp.wwpdb.org is the repository for the coordinates and related information for more than 47 000 structures, including proteins, nucleic acids and large macromolecular complexes that have been determined using X-ray crystallography, NMR and electron microscopy techniques. The members of the wwPDB–RCSB PDB (USA), MSD-EBI (Europe), PDBj (Japan) and BMRB (USA)–have remediated this archive to address inconsistencies that have been introduced over the years. The scope and methods used in this project are presented.
We have developed a method to predict ligand-binding sites in a new protein structure by searching for similar binding sites in the Protein Data Bank (PDB). The similarities are measured according to the shapes of the molecular surfaces and their electrostatic potentials. A new web server, eF-seek, provides an interface to our search method. It simply requires a coordinate file in the PDB format, and generates a prediction result as a virtual complex structure, with the putative ligands in a PDB format file as the output. In addition, the predicted interacting interface is displayed to facilitate the examination of the virtual complex structure on our own applet viewer with the web browser (URL: http://eF-site.hgc.jp/eF-seek).
Structure alignment methods offer the possibility of measuring distant evolutionary relationships between proteins that are not visible by sequence-based analysis. However, the question of how structural differences and similarities ought to be quantified in this regard remains open. In this study we construct a training set of sequence-unique CATH and SCOP domains, from which we develop a scoring function that can reliably identify domains with the same CATH topology and SCOP fold classification. The score is implemented in the ASH structure alignment package, for which the source code and a web service are freely available from the PDBj website.
The new ASH score shows increased selectivity and sensitivity compared with values reported for several popular programs using the same test set of 4,298,905 structure pairs, yielding an area of 0.96 under the receiver operating characteristic (ROC) curve. In addition, weak sequence homologies between similar domains are revealed that could not be detected by BLAST sequence alignment. Also, a subset of domain pairs is identified that exhibit high similarity, even though their CATH and SCOP classifications differ. Finally, we show that the ranking of alignment programs based solely on geometric measures depends on the choice of the quality measure.
ASH shows high selectivity and sensitivity with regard to domain classification, an important step in defining distantly related protein sequence families. Moreover, the CPU cost per alignment is competitive with the fastest programs, making ASH a practical option for large-scale structure classification studies.
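The area under the ROC curve quoted above can be read as the probability that a randomly chosen same-fold pair scores higher than a randomly chosen different-fold pair. The sketch below computes it directly from that pairwise definition, counting ties as one half. The function name is hypothetical, and the quadratic loop is for illustration only; a test set of several million pairs would call for a rank-based computation.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve from the pairwise-comparison definition.

    positive_scores: scores of pairs that share a classification.
    negative_scores: scores of pairs that do not.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

A value of 0.5 corresponds to a score that does not separate the two classes at all, and 1.0 to perfect separation.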
The worldwide Protein Data Bank (wwPDB) is the international collaboration that manages the deposition, processing and distribution of the PDB archive. The online PDB archive is a repository for the coordinates and related information for more than 38 000 structures, including proteins, nucleic acids and large macromolecular complexes that have been determined using X-ray crystallography, NMR and electron microscopy techniques. The founding members of the wwPDB are RCSB PDB (USA), MSD-EBI (Europe) and PDBj (Japan) [H.M. Berman, K. Henrick and H. Nakamura (2003) Nature Struct. Biol., 10, 980]. The BMRB group (USA) joined the wwPDB in 2006. The mission of the wwPDB is to maintain a single archive of macromolecular structural data that are freely and publicly available to the global community. Additionally, the wwPDB provides a variety of services to a broad community of users. The wwPDB website provides information about services provided by the individual member organizations and about projects undertaken by the wwPDB.
PreBI is a server that predicts biological interfaces in protein crystal structures, according to the complementarity and the area of the interface. The server accepts a coordinate file in the PDB format, and all of the possible interfaces are generated automatically, according to the symmetry operations given in the coordinate file. For all of the interfaces generated, the complementarities of the electrostatic potential, hydrophobicity and shape of the interfaces are analyzed, and the most probable biological interface is identified according to the combination of the degree of complementarity derived from the database analyses and the area of the interface. The results can be checked through an interactive viewer, and the most probable complex can be downloaded as atomic coordinates in the PDB format. PreBI is available at .
We introduce GASH, a new, publicly accessible program for structural alignment and superposition. Alignments are scored by the Number of Equivalent Residues (NER), a quantitative measure of structural similarity that can be applied to any structural alignment method. Multiple alignments are optimized by conjugate gradient maximization of the NER score within the genetic algorithm framework. Initial alignments are generated by the program Local ASH, and can be supplemented by alignments from any other program.
We compare GASH to DaliLite, CE, and our earlier program Global ASH on a difficult test set consisting of 3,102 structure pairs, as well as a smaller set derived from the Fischer-Eisenberg set. The extent of alignment crossover and the completeness of the initial set of alignments are examined. The quality of the superpositions is evaluated both by NER and by the number of aligned residues under three different RMSD cutoffs (2, 4, and 6 Å). In addition to the numerical assessment, the alignments for several biologically related structural pairs are discussed in detail.
Regardless of which criterion is used to judge the superposition accuracy, GASH achieves the best overall performance, followed by DaliLite, Global ASH, and CE. In terms of CPU usage, DaliLite, CE and GASH perform similarly for query proteins under 500 residues, but for larger proteins DaliLite is faster than GASH or CE. Both an HTTP interface and a Simple Object Access Protocol (SOAP) interface to the GASH program are available at .