The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, http://rcsb.org), the US data center for the global PDB archive, makes PDB data freely available to all users, from structural biologists to computational biologists and beyond. New tools and resources have been added to the RCSB PDB web portal in support of a ‘Structural View of Biology.’ Recent developments have improved the user experience, including the high-speed NGL Viewer that provides 3D molecular visualization in any web browser, improved support for data file download and enhanced organization of website pages for query, reporting and individual structure exploration. Structure validation information is now visible for all archival entries. PDB data have been integrated with external biological resources, including chromosomal position within the human genome; protein modifications; and metabolic pathways. PDB-101 educational materials have been reorganized into a searchable website and expanded to include new features such as the Geis Digital Archive.
Structures of biomolecular systems are increasingly computed by integrative modeling that relies on varied types of experimental data and theoretical information. We describe here the proceedings and conclusions from the first wwPDB Hybrid/Integrative Methods Task Force Workshop held at the European Bioinformatics Institute in Hinxton, UK, October 6 and 7, 2014. At the workshop, experts in various experimental fields of structural biology, experts in integrative modeling and visualization, and experts in data archiving addressed a series of questions central to the future of structural biology. How should integrative models be represented? How should the data and integrative models be validated? What data should be archived? How should the data and models be archived? What information should accompany the publication of integrative models?
integrative modeling; hybrid modeling; integrative structural biology; Protein Data Bank
DCC is a wrapper for third-party software packages to aid in structure factor analysis and validation. As the results are recorded in PDBx/mmCIF format, the output from DCC can be used in automatic data pipelines.
Since 2008, X-ray structure depositions to the Protein Data Bank archive (PDB) have required submission of experimental data in the form of structure factor files. RCSB PDB has developed the program DCC to allow worldwide PDB (wwPDB; http://wwpdb.org) biocurators, using a single command-line program, to invoke a number of third-party software packages to compare the model file with the experimental data. DCC functionality includes structure factor validation, electron-density map generation and slicing, local electron-density analysis, and residual B factor analysis. DCC outputs a summary containing various crystallographic statistics in PDBx/mmCIF format for use in automatic data processing and archiving pipelines.
Protein Data Bank; structure factor validation; utility programs; DCC
Summary: The Chemical Component Dictionary (CCD) is a chemical reference data resource that describes all residue and small molecule components found in Protein Data Bank (PDB) entries. The CCD contains detailed chemical descriptions for standard and modified amino acids/nucleotides, small molecule ligands and solvent molecules. Each chemical definition includes descriptions of chemical properties such as stereochemical assignments, chemical descriptors, systematic chemical names and idealized coordinates. The content, preparation, validation and distribution of this CCD chemical reference dataset are described.
Availability and implementation: The CCD is updated regularly in conjunction with the scheduled weekly release of new PDB structure data. The CCD and amino acid variant reference datasets are hosted in the public PDB ftp repository at ftp://ftp.wwpdb.org/pub/pdb/data/monomers/components.cif.gz, ftp://ftp.wwpdb.org/pub/pdb/data/monomers/aa-variants-v1.cif.gz, and its mirror sites, and can be accessed from http://wwpdb.org.
Supplementary data are available at Bioinformatics online.
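As a rough illustration of how the CCD reference data described above might be consumed, the following Python sketch reads a few `_chem_comp` items from PDBx/mmCIF-style text. The embedded excerpt is an abbreviated, illustrative sample rather than a verbatim copy of components.cif, and the function name `parse_chem_comp` is hypothetical. The parser assumes simple one-line key-value items and does not handle loops or multi-line (semicolon-delimited) values, which the full dictionary does contain.

```python
# Abbreviated, illustrative excerpt in PDBx/mmCIF key-value style (an assumption,
# not a verbatim copy of the distributed components.cif file).
SAMPLE_CCD = '''
data_ALA
#
_chem_comp.id        ALA
_chem_comp.name      ALANINE
_chem_comp.type      "L-PEPTIDE LINKING"
_chem_comp.formula   "C3 H7 N O2"
#
data_HOH
#
_chem_comp.id        HOH
_chem_comp.name      WATER
_chem_comp.type      NON-POLYMER
_chem_comp.formula   "H2 O"
'''


def parse_chem_comp(text):
    """Collect one-line _chem_comp.* items per data block (hypothetical helper).

    Returns a dict mapping each data block name to a dict of item -> value.
    Loops and multi-line values are ignored in this minimal sketch.
    """
    components = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("data_"):
            # A new data block starts a new chemical component definition.
            current = {}
            components[line[len("data_"):]] = current
        elif line.startswith("_chem_comp.") and current is not None:
            parts = line.split(None, 1)
            if len(parts) == 2:
                key = parts[0][len("_chem_comp."):]
                # Remove surrounding single or double quotes, if present.
                current[key] = parts[1].strip().strip("\"'")
    return components


components = parse_chem_comp(SAMPLE_CCD)
for comp_id, items in components.items():
    print(comp_id, items.get("name"), items.get("formula"))
```

For the real gzip-compressed file, the text could be read with the standard library call `gzip.open(path, "rt")` before parsing; a full mmCIF parser would be the more robust choice for production use.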
Three-dimensional Electron Microscopy (3DEM) has become a key experimental method in structural biology for a broad spectrum of biological specimens from molecules to cells. The EMDataBank project provides a unified portal for deposition, retrieval and analysis of 3DEM density maps, atomic models and associated metadata (emdatabank.org). We provide here an overview of the rapidly growing 3DEM structural data archives, which include maps in EM Data Bank and map-derived models in the Protein Data Bank. In addition, we describe progress and approaches toward development of validation protocols and methods, working with the scientific community, in order to create a validation pipeline for 3DEM data.
The RCSB Protein Data Bank (RCSB PDB, http://www.rcsb.org) provides access to 3D structures of biological macromolecules and is one of the leading resources in biology and biomedicine worldwide. Our efforts over the past 2 years focused on enabling a deeper understanding of structural biology and providing new structural views of biology that support both basic and applied research and education. Herein, we describe recently introduced data annotations including integration with external biological resources, such as gene and drug databases, new visualization tools and improved support for the mobile web. We also describe access to data files, web services and open access software components to enable software developers to more effectively mine the PDB archive and related annotations. Our efforts are aimed at expanding the role of 3D structure in understanding biology and medicine.
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) develops tools and resources that provide a structural view of biology for research and education. The RCSB PDB web site (http://www.rcsb.org) uses the curated 3D macromolecular data contained in the PDB archive to offer unique methods to access, report and visualize data. Recent activities have focused on improving methods for simple and complex searches of PDB data, creating specialized access to chemical component data and providing domain-based structural alignments. New educational resources, such as Author Profiles that display a researcher’s PDB entries in a timeline, are offered at the PDB-101 educational view of the main web site. To promote different kinds of access to the RCSB PDB, Web Services have been expanded, and an RCSB PDB Mobile application for the iPhone/iPad has been released. These improvements enable new opportunities for analyzing and understanding structure data.
This Meeting Review describes the proceedings and conclusions from the inaugural meeting of the Electron Microscopy Validation Task Force organized by the Unified Data Resource for 3DEM (http://www.emdatabank.org) and held at Rutgers University in New Brunswick, NJ on September 28 and 29, 2010. At the workshop, a group of scientists involved in collecting electron microscopy data, using the data to determine three-dimensional electron microscopy (3DEM) density maps, and building molecular models into the maps explored how to assess maps, models, and other data that are deposited into the Electron Microscopy Data Bank and Protein Data Bank public data archives. The specific recommendations resulting from the workshop aim to increase the impact of 3DEM in biology and medicine.
The Protein Structure Initiative’s Structural Biology Knowledgebase (SBKB, URL: http://sbkb.org) is an open web resource designed to turn the products of the structural genomics and structural biology efforts into knowledge that can be used by the biological community to understand living systems and disease. Here we will present examples of how to use the SBKB to enable biological research. For example, a protein sequence or Protein Data Bank (PDB) structure ID search will provide a list of related protein structures in the PDB, associated biological descriptions (annotations), homology models, structural genomics protein target status, experimental protocols, and the ability to order available DNA clones from the PSI:Biology-Materials Repository. A text search will find publication and technology reports resulting from the PSI’s high-throughput research efforts. Web tools that aid in research, including a system that accepts protein structure requests from the community, will also be described. Created in collaboration with the Nature Publishing Group, the Structural Biology Knowledgebase monthly update also provides a research library, editorials about new research advances, news, and an events calendar to present a broader view of structural genomics and structural biology.
Protein; Protein production; Structural biology; Structural databases; Structural genomics; Theoretical models
The RCSB Protein Data Bank (RCSB PDB) web site (http://www.pdb.org) has been redesigned to increase usability and to cater to a larger and more diverse user base. This article describes key enhancements and new features that fall into the following categories: (i) query and analysis tools for chemical structure searching, query refinement, tabulation and export of query results; (ii) web site customization and new structure alerts; (iii) pair-wise and representative protein structure alignments; (iv) visualization of large assemblies; (v) integration of structural data with the open access literature and binding affinity data; and (vi) web services and web widgets to facilitate integration of PDB data and tools with other resources. These improvements enable a range of new possibilities to analyze and understand structure data. The next generation of the RCSB PDB web site, as described here, provides a rich resource for research and education.
Cryo-electron microscopy reconstruction methods are uniquely able to reveal structures of many important macromolecules and macromolecular complexes. EMDataBank.org, a joint effort of the Protein Data Bank in Europe (PDBe), the Research Collaboratory for Structural Bioinformatics (RCSB) and the National Center for Macromolecular Imaging (NCMI), is a global ‘one-stop shop’ resource for deposition and retrieval of cryoEM maps, models and associated metadata. The resource unifies public access to the two major archives containing EM-based structural data: EM Data Bank (EMDB) and Protein Data Bank (PDB), and facilitates use of EM structural data of macromolecules and macromolecular complexes by the wider scientific community.
We describe the proceedings and conclusions from a “Workshop on Applications of Protein Models in Biomedical Research” that was held at University of California at San Francisco on 11 and 12 July, 2008. At the workshop, international scientists involved with structure modeling explored (i) how models are currently used in biomedical research, (ii) what the requirements and challenges for different applications are, and (iii) how the interaction between the computational and experimental research communities could be strengthened to advance the field.
Structural Genomics has been successful in determining the structures of many unique proteins in a high-throughput manner. Still, the number of known protein sequences is much larger than the number of experimentally solved protein structures. Homology (or comparative) modeling methods make use of experimental protein structures to build models for evolutionarily related proteins. Thereby, experimental structure determination efforts and homology modeling complement each other in the exploration of the protein structure space. One of the challenges in using model information effectively has been to access all models available for a specific protein in heterogeneous formats at different sites using various incompatible accession code systems. Often, structure models for hundreds of proteins can be derived from a given experimentally determined structure, using a variety of established methods. This has been done by all of the PSI centers, and by various independent modeling groups. The goal of the Protein Model Portal (PMP) is to provide a single portal which gives access to the various models that can be leveraged from PSI targets and other experimental protein structures. A single interface allows all existing pre-computed models across these various sites to be queried simultaneously, and provides links to interactive services for template selection, target-template alignment, model building, and quality assessment. The current release of the portal consists of 7.6 million model structures provided by different partner resources (CSMP, JCSG, MCSG, NESG, NYSGXRC, JCMM, ModBase, SWISS-MODEL Repository). The PMP is available at http://www.proteinmodelportal.org and from the PSI Structural Genomics Knowledgebase.
Protein model portal; PSI structural genomics knowledgebase; Comparative protein structure modeling; Homology modeling; Model database
The Protein Structure Initiative Structural Genomics Knowledgebase (PSI SGKB, http://kb.psi-structuralgenomics.org) has been created to turn the products of the PSI structural genomics effort into knowledge that can be used by the biological research community to understand living systems and disease. This resource provides central access to structures in the Protein Data Bank (PDB), along with functional annotations, associated homology models, worldwide protein target tracking information, available protocols and the potential to obtain DNA materials for many of the targets. It also offers the ability to search all of the structural and methodological publications and the innovative technologies that were catalyzed by the PSI's high-throughput research efforts. In collaboration with the Nature Publishing Group, the PSI SGKB provides a research library, editorials about new research advances, news and an events calendar to present a broader view of structural biology and structural genomics. By making these resources freely available, the PSI SGKB serves as a bridge to connect the structural biology and the greater biomedical communities.
A new data model for PDB entries of viruses and other biological assemblies with regular noncrystallographic symmetry is described.
A new scheme has been devised to represent viruses and other biological assemblies with regular noncrystallographic symmetry in the Protein Data Bank (PDB). The scheme describes existing and anticipated PDB entries of this type using generalized descriptions of deposited and experimental coordinate frames, symmetry and frame transformations. A simplified notation has been adopted to express the symmetry generation of assemblies from deposited coordinates and matrix operations describing the required point, helical or crystallographic symmetry. Complete and correct information for building full assemblies, subassemblies and crystal asymmetric units of all virus entries is now available in the remediated PDB archive.
virus structures; Protein Data Bank; database integration; uniform curation; point symmetry; helical symmetry; biological assemblies
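The idea of generating an assembly from deposited coordinates and matrix operations, as described in the virus data model abstract above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scheme's actual notation: each operation is taken to be a rotation matrix R and translation vector t applied as x' = R x + t, and the function names, the 4-fold point symmetry, the helical twist and rise values, and the two sample coordinates are all hypothetical.

```python
import math


def rotation_z(angle_deg):
    """Return a 3x3 rotation matrix for a rotation about the z axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_op(matrix, translation, xyz):
    """Apply x' = R x + t to a single (x, y, z) coordinate."""
    return tuple(
        sum(matrix[i][j] * xyz[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def build_assembly(coords, operations):
    """Return one transformed copy of the coordinates per operation."""
    return [[apply_op(R, t, p) for p in coords] for R, t in operations]


# Hypothetical deposited coordinates (two atoms only, for brevity).
deposited = [(1.0, 0.0, 0.0), (2.0, 0.5, 1.0)]

# Point symmetry example: 4-fold rotation about z, no translation.
c4_ops = [(rotation_z(90.0 * k), (0.0, 0.0, 0.0)) for k in range(4)]
point_assembly = build_assembly(deposited, c4_ops)

# Helical symmetry example: each step rotates by a twist and shifts along z.
twist_deg, rise = 30.0, 5.0  # illustrative values only
helical_ops = [(rotation_z(twist_deg * k), (0.0, 0.0, rise * k)) for k in range(3)]
helical_assembly = build_assembly(deposited, helical_ops)

print(len(point_assembly), "copies from point symmetry")
print(len(helical_assembly), "copies from helical symmetry")
```

Storing only the deposited coordinates plus a list of operations keeps the entry compact, since the full assembly can be regenerated on demand by applying each operation in turn.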