The Technology Portal of the Protein Structure Initiative Structural Biology Knowledgebase (PSI SBKB; http://technology.sbkb.org/portal/) is a web resource providing information about methods and tools that can be used to relieve bottlenecks in many areas of protein production and structural biology research. Several useful features are available on the web site, including multiple ways to search the database of over 250 technological advances, a link to videos of methods on YouTube, and access to a technology forum where scientists can connect, ask questions, get news, and develop collaborations. The Technology Portal is a component of the PSI SBKB (http://sbkb.org), which presents integrated genomic, structural, and functional information for all protein sequence targets selected by the Protein Structure Initiative. Created in collaboration with the Nature Publishing Group, the SBKB offers an array of resources for structural biologists, such as a research library, editorials about new research advances, a featured biological system each month, and a Functional Sleuth for searching protein structures of unknown function. An overview of the various features and examples of user searches highlight the information, tools, and avenues for scientific interaction available through the Technology Portal.
Database; Protein; Protein Production; Structural Biology; Structural Genomics; Technology
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) develops tools and resources that provide a structural view of biology for research and education. The RCSB PDB web site (http://www.rcsb.org) uses the curated 3D macromolecular data contained in the PDB archive to offer unique methods to access, report and visualize data. Recent activities have focused on improving methods for simple and complex searches of PDB data, creating specialized access to chemical component data and providing domain-based structural alignments. New educational resources are offered at the PDB-101 educational view of the main web site such as Author Profiles that display a researcher’s PDB entries in a timeline. To promote different kinds of access to the RCSB PDB, Web Services have been expanded, and an RCSB PDB Mobile application for the iPhone/iPad has been released. These improvements enable new opportunities for analyzing and understanding structure data.
A symposium celebrating the 40th anniversary of the Protein Data Bank archive (PDB), organized by the Worldwide Protein Data Bank, was held at Cold Spring Harbor Laboratory (CSHL) October 28–30, 2011. PDB40’s distinguished speakers highlighted four decades of innovation in structural biology, from the early era of structural determination to future directions for the field.
The E. coli trp repressor (trpR) homodimer recognizes its palindromic DNA-binding site through a pair of flexible helix-turn-helix (HTH) motifs displayed on an intertwined helical core. Flexible N-terminal arms mediate association between dimers bound to tandem DNA sites. The 2.5 Å X-ray structure of trpR crystallized in 30% (v/v) isopropanol reveals a substantial conformational rearrangement of HTH motifs and N-terminal arms, with the protein appearing in the unusual form of an ordered 3D domain-swapped supramolecular array. Small angle X-ray scattering measurements show that the self-association properties of trpR in solution are fundamentally altered by isopropanol.
The Protein Structure Initiative’s Structural Biology Knowledgebase (SBKB, URL: http://sbkb.org) is an open web resource designed to turn the products of the structural genomics and structural biology efforts into knowledge that can be used by the biological community to understand living systems and disease. Here we present examples of how to use the SBKB to enable biological research. For example, a protein sequence or Protein Data Bank (PDB) structure ID search will provide a list of related protein structures in the PDB, associated biological descriptions (annotations), homology models, structural genomics protein target status, experimental protocols, and the ability to order available DNA clones from the PSI:Biology-Materials Repository. A text search will find publication and technology reports resulting from the PSI’s high-throughput research efforts. Web tools that aid in research, including a system that accepts protein structure requests from the community, will also be described. Created in collaboration with the Nature Publishing Group, the Structural Biology Knowledgebase monthly update also provides a research library, editorials about new research advances, news, and an events calendar to present a broader view of structural genomics and structural biology.
Protein; Protein production; Structural biology; Structural databases; Structural genomics; Theoretical models
The RCSB Protein Data Bank (RCSB PDB, www.pdb.org) is a key online resource for structural biology and related scientific disciplines. The website is used on average by 165 000 unique visitors per month, and more than 2000 other websites link to it. The amount and complexity of PDB data as well as the expectations on its usage are growing rapidly. Therefore, ensuring the reliability and robustness of the RCSB PDB query and distribution systems is crucially important and increasingly challenging. This article describes quality assurance for the RCSB PDB website at several distinct levels, including: (i) hardware redundancy and failover, (ii) testing protocols for weekly database updates, (iii) testing and release procedures for major software updates and (iv) miscellaneous monitoring and troubleshooting tools and practices. As such it provides suggestions for how other websites might be operated.
Database URL: www.pdb.org
The RCSB Protein Data Bank (RCSB PDB) web site (http://www.pdb.org) has been redesigned to increase usability and to cater to a larger and more diverse user base. This article describes key enhancements and new features that fall into the following categories: (i) query and analysis tools for chemical structure searching, query refinement, tabulation and export of query results; (ii) web site customization and new structure alerts; (iii) pair-wise and representative protein structure alignments; (iv) visualization of large assemblies; (v) integration of structural data with the open access literature and binding affinity data; and (vi) web services and web widgets to facilitate integration of PDB data and tools with other resources. These improvements enable a range of new possibilities to analyze and understand structure data. The next generation of the RCSB PDB web site, as described here, provides a rich resource for research and education.
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) serves a community of users with diverse backgrounds and interests. In addition to processing, archiving and distributing structural data, it also develops educational resources and materials to enable people to utilize PDB data and to further a structural view of biology.
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) supports scientific research and education worldwide by providing an essential resource of information on biomolecular structures. In addition to serving as a deposition, data-processing and distribution center for PDB data, the RCSB PDB offers resources and online materials that different audiences can use to customize their structural biology instruction. These include resources for general audiences that present macromolecular structure in the context of a biological theme, method-based materials for researchers who take a more traditional approach to the presentation of structural science, and materials that mix theme-based and method-based approaches for educators and students. Through these efforts the RCSB PDB aims to enable optimal use of structural data by researchers, educators and students designing and understanding experiments in biology, chemistry and medicine, and by general users making informed decisions about their life and health.
Protein Data Bank; crystallographic education; macromolecular structures; biological crystallography
One obstacle to achieving complete understanding of the principles underlying sequence-dependent recognition of DNA is the paucity of structural data for DNA recognition sequences in their free (unbound) state. Here we carried out crystallization screening of 50 DNA duplexes containing cognate protein binding sites and obtained new crystal structures of free DNA binding sites for three distinct modes of DNA recognition: anti-parallel beta strands (MetR), helix-turn-helix motif + hinge helices (PurR), and zinc fingers (Zif268). Structural changes between free and protein-bound DNA are manifested differently in each case. The new DNA structures reveal that distinctive sequence-dependent DNA geometry dominates recognition by MetR, protein-induced bending of DNA dictates recognition by PurR, and deformability of DNA along the A-B continuum is important in recognition by Zif268. Together, our findings show that crystal structures of free DNA binding sites provide new information about the nature of protein-DNA interactions and thus lend insights towards a structural code for DNA recognition.
DNA structure; transcription factors; indirect readout; protein-DNA interactions; gene regulation
Recent structures of Escherichia coli catabolite activator protein (CAP) in complex with DNA, and in complex with RNA polymerase α subunit C-terminal domain (αCTD) and DNA, have yielded insights into how CAP binds DNA and activates transcription. Comparison of multiple structures of CAP-DNA complexes has revealed contributions of direct readout and indirect readout to DNA binding by CAP. The structure of the CAP-αCTD-DNA complex has provided the first structural description of interactions between a transcription activator and its functional target within the general transcription machinery. Using the structure of the CAP-αCTD-DNA complex, the structure of an RNAP-DNA complex, and restraints from biophysical, biochemical, and genetic experiments, it has been possible to construct detailed three-dimensional models of intact Class I and Class II transcription activation complexes.
catabolite activator protein (CAP); cAMP receptor protein (CRP); RNA polymerase; σ70; promoter; DNA binding; DNA bending; transcription activation
We describe the proceedings and conclusions from a “Workshop on Applications of Protein Models in Biomedical Research” that was held at the University of California, San Francisco, on 11 and 12 July 2008. At the workshop, international scientists involved with structure modeling explored (i) how models are currently used in biomedical research, (ii) what the requirements and challenges for different applications are, and (iii) how the interaction between the computational and experimental research communities could be strengthened to advance the field.
Structural Genomics has been successful in determining the structures of many unique proteins in a high throughput manner. Still, the number of known protein sequences is much larger than the number of experimentally solved protein structures. Homology (or comparative) modeling methods make use of experimental protein structures to build models for evolutionarily related proteins. Thereby, experimental structure determination efforts and homology modeling complement each other in the exploration of the protein structure space. One of the challenges in using model information effectively has been to access all models available for a specific protein in heterogeneous formats at different sites using various incompatible accession code systems. Often, structure models for hundreds of proteins can be derived from a given experimentally determined structure, using a variety of established methods. This has been done by all of the PSI centers, and by various independent modeling groups. The goal of the Protein Model Portal (PMP) is to provide a single portal which gives access to the various models that can be leveraged from PSI targets and other experimental protein structures. A single interface allows all existing pre-computed models across these various sites to be queried simultaneously, and provides links to interactive services for template selection, target-template alignment, model building, and quality assessment. The current release of the portal consists of 7.6 million model structures provided by different partner resources (CSMP, JCSG, MCSG, NESG, NYSGXRC, JCMM, ModBase, SWISS-MODEL Repository). The PMP is available at http://www.proteinmodelportal.org and from the PSI Structural Genomics Knowledgebase.
Protein model portal; PSI structural genomics knowledgebase; Comparative protein structure modeling; Homology modeling; Model database
The Protein Structure Initiative Structural Genomics Knowledgebase (PSI SGKB, http://kb.psi-structuralgenomics.org) has been created to turn the products of the PSI structural genomics effort into knowledge that can be used by the biological research community to understand living systems and disease. This resource provides central access to structures in the Protein Data Bank (PDB), along with functional annotations, associated homology models, worldwide protein target tracking information, available protocols and the potential to obtain DNA materials for many of the targets. It also offers the ability to search all of the structural and methodological publications and the innovative technologies that were catalyzed by the PSI's high-throughput research efforts. In collaboration with the Nature Publishing Group, the PSI SGKB provides a research library, editorials about new research advances, news and an events calendar to present a broader view of structural biology and structural genomics. By making these resources freely available, the PSI SGKB serves as a bridge to connect the structural biology and the greater biomedical communities.
A new data model for PDB entries of viruses and other biological assemblies with regular noncrystallographic symmetry is described.
A new scheme has been devised to represent viruses and other biological assemblies with regular noncrystallographic symmetry in the Protein Data Bank (PDB). The scheme describes existing and anticipated PDB entries of this type using generalized descriptions of deposited and experimental coordinate frames, symmetry and frame transformations. A simplified notation has been adopted to express the symmetry generation of assemblies from deposited coordinates and matrix operations describing the required point, helical or crystallographic symmetry. Complete correct information for building full assemblies, subassemblies and crystal asymmetric units of all virus entries is now available in the remediated PDB archive.
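The idea of generating an assembly from deposited coordinates by applying matrix operations can be sketched as follows. This is a minimal illustration only, not the PDB's data model or file format: the `Operator` layout (a 3x3 rotation plus a translation), the function names, and the example coordinates are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
# Hypothetical operator layout: (3x3 rotation matrix, translation vector).
Operator = Tuple[Sequence[Sequence[float]], Sequence[float]]


def apply_operator(op: Operator, atom: Point) -> Point:
    """Return rot @ atom + trans for a single atom position."""
    rot, trans = op
    return tuple(
        sum(rot[i][j] * atom[j] for j in range(3)) + trans[i]
        for i in range(3)
    )


def build_assembly(ops: Sequence[Operator], atoms: Sequence[Point]) -> List[List[Point]]:
    """Generate one copy of the deposited atoms per symmetry operator."""
    return [[apply_operator(op, atom) for atom in atoms] for op in ops]


# Illustrative point-symmetry operators: identity and a twofold rotation about z.
IDENTITY: Operator = (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0, 0, 0))
TWOFOLD_Z: Operator = (((-1, 0, 0), (0, -1, 0), (0, 0, 1)), (0, 0, 0))

# Made-up "deposited" coordinates for two atoms.
deposited = [(1.0, 2.0, 3.0), (4.0, 0.0, -1.0)]
assembly = build_assembly([IDENTITY, TWOFOLD_Z], deposited)
```

A full icosahedral or helical assembly would use a longer operator list in the same way; only the list of matrices changes, not the deposited coordinates.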
virus structures; Protein Data Bank; database integration; uniform curation; point symmetry; helical symmetry; biological assemblies
We describe the role of the BioMagResBank (BMRB) within the Worldwide Protein Data Bank (wwPDB) and recent policies affecting the deposition of biomolecular NMR data. All PDB depositions of structures based on NMR data must now be accompanied by experimental restraints. A scheme has been devised that allows depositors to specify a representative structure and to define residues within that structure found experimentally to be largely unstructured. The BMRB now accepts coordinate sets representing three-dimensional structural models based on experimental NMR data of molecules of biological interest that fall outside the guidelines of the Protein Data Bank (i.e., the molecule is a peptide with 23 or fewer residues, a polynucleotide with 3 or fewer residues, a polysaccharide with 3 or fewer sugar residues, or a natural product), provided that the coordinates are accompanied by a representation of the covalent structure of the molecule (atom connectivity), assigned NMR chemical shifts, and the structural restraints used in generating the model. The BMRB now contains an archive of NMR data for metabolites and other small molecules found in biological systems.
Archived NMR data; Metabolomics; NMR structure; Structural restraints; Unstructured regions
The catabolite activator protein (CAP) bends DNA in the CAP-DNA complex, typically introducing a sharp DNA kink, with a roll angle of ∼40° and a twist angle of ∼20°, between positions 6 and 7 of the DNA half-site, 5′-A1A2A3T4G5T6G7A8T9C10T11-3′ (“primary kink”). In previous work, we showed that CAP recognizes the nucleotide immediately 5′ to the primary-kink site, T6, through an “indirect-readout” mechanism involving sequence effects on energetics of primary-kink formation. In this work, to understand further this example of indirect readout, we have determined crystal structures of CAP-DNA complexes containing each possible nucleotide at position 6. The structures show that CAP can introduce a DNA kink at the primary-kink site with any nucleotide at position 6. The DNA kink is sharp with the consensus pyrimidine-purine step T6G7 and the nonconsensus pyrimidine-purine step C6G7 (roll angles of ∼42°, twist angles of ∼16°), but is much less sharp with the nonconsensus purine-purine steps A6G7 and G6G7 (roll angles of ∼20°, twist angles of ∼17°). We infer that CAP discriminates between consensus and non-consensus pyrimidine-purine steps at positions 6-7 solely based on differences in the energetics of DNA deformation, but that CAP discriminates between the consensus pyrimidine-purine step and non-consensus purine-purine steps at positions 6-7 both based on differences in the energetics of DNA deformation and based on qualitative differences in DNA deformation. The structures further show that CAP can achieve a similar, ∼46° per DNA half-site, overall DNA bend through a sharp DNA kink, a less sharp DNA kink, or a smooth DNA bend. 
Analysis of these and other crystal structures of CAP-DNA complexes indicates that there is a large, ∼28° per DNA half-site, out-of plane, component of CAP-induced DNA bending in structures not constrained by end-to-end DNA lattice interactions and that lattice contacts involving CAP tend to involve residues in or near biologically functional surfaces.
catabolite activator protein (CAP); cAMP receptor protein (CRP); protein-DNA interaction; protein-induced DNA bending; indirect readout
The RCSB Protein Data Bank (PDB) offers online tools, summary reports and target information related to the worldwide structural genomics initiatives from its portal at . There are currently three components to this site: Structural Genomics Initiatives contains information and links on each structural genomics site, including progress reports, target lists, target status, targets in the PDB and level of sequence redundancy; Targets provides combined target information, protocols and other data associated with protein structure determination; and Structures offers an assessment of the progress of structural genomics based on the functional coverage of the human genome by PDB structures, structural genomics targets and homology models. Functional coverage can be examined according to enzyme classification, gene ontology (biological process, cell component and molecular function) and disease.
RNA exhibits a large diversity of conformations. Three thousand nucleotides of 23S and 5S ribosomal RNA from a structure of the large ribosomal subunit were analyzed in order to classify their conformations. Fourier averaging of the six 3D distributions of torsion angles and analyses of the resulting pseudo electron maps, followed by clustering of the preferred combinations of torsion angles, were performed on this dataset. Eighteen non-A-type conformations and 14 A-RNA related conformations were discovered and their torsion angles were determined; their Cartesian coordinates are available.
The Protein Data Bank (PDB; http://www.pdb.org) is the primary source of information on the 3D structure of biological macromolecules. The PDB’s mandate is to disseminate this information in the most usable form and as widely as possible. The current query and distribution system is described and an alpha version of the future re-engineered system introduced.
A method to detect DNA-binding sites on the surface of a protein structure is important for functional annotation. This work describes the analysis of residue patches on the surface of DNA-binding proteins and the development of a method of predicting DNA-binding sites using a single feature of these surface patches. Surface patches and the DNA-binding sites were initially analysed for accessibility, electrostatic potential, residue propensity, hydrophobicity and residue conservation. From this, it was observed that the DNA-binding sites were, in general, amongst the top 10% of patches with the largest positive electrostatic scores. This knowledge led to the development of a prediction method in which patches of surface residues were selected such that they excluded residues with negative electrostatic scores. This method was used to make predictions for a data set of 56 non-homologous DNA-binding proteins. Correct predictions were made for 68% of the data set.
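The selection rule described above (discard patches containing negatively scored residues, then keep the most positive patches) can be illustrated with a toy sketch. This is not the authors' implementation: the function name `rank_patches`, the use of a summed per-residue score as the patch score, the `top_fraction` parameter, and the example values are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def rank_patches(patches: Dict[str, List[float]], top_fraction: float = 0.10) -> List[str]:
    """Return names of the highest-scoring patches that contain no negative residue.

    `patches` maps a hypothetical patch name to per-residue electrostatic scores.
    """
    # Exclude any patch that contains a residue with a negative score.
    eligible = {
        name: scores
        for name, scores in patches.items()
        if all(score >= 0 for score in scores)
    }
    # Rank the remaining patches by total score, most positive first.
    ranked = sorted(eligible, key=lambda name: sum(eligible[name]), reverse=True)
    # Keep the top fraction of all patches, but always at least one.
    keep = max(1, math.ceil(top_fraction * len(patches)))
    return ranked[:keep]


# Made-up scores for four surface patches.
example = {
    "A": [0.5, 1.2, 0.8],
    "B": [2.0, -0.3, 1.0],  # excluded: contains a negative residue
    "C": [0.1, 0.2, 0.0],
    "D": [1.5, 1.5, 0.4],
}
best = rank_patches(example)
top_half = rank_patches(example, top_fraction=0.5)
```

With these values, patch B is dropped before ranking, so it can never be predicted as a binding site regardless of its total score.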
The Protein Data Bank (PDB; http://www.pdb.org/) continues to be actively involved in various aspects of the informatics of structural genomics projects—developing and maintaining the Target Registration Database (TargetDB), organizing data dictionaries that will define the specification for the exchange and deposition of data with the structural genomics centers and creating software tools to capture data from standard structure determination applications.
The Protein Data Bank (PDB; http://www.pdb.org/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the progress that has been made in validating all data in the PDB archive and in releasing a uniform archive for the community. We have now produced a collection of mmCIF data files for the PDB archive (ftp://beta.rcsb.org/pub/pdb/uniformity/data/mmCIF/). A utility application that converts the mmCIF data files to the PDB format (called CIFTr) has also been released to provide support for existing software.
A detailed computational analysis of 32 protein–RNA complexes is presented. A number of physical and chemical properties of the intermolecular interfaces are calculated and compared with those observed in protein–double-stranded DNA and protein–single-stranded DNA complexes. The interface properties of the protein–RNA complexes reveal the diverse nature of the binding sites. van der Waals contacts played a more prevalent role than hydrogen bond contacts, and preferential binding to guanine and uracil was observed. The positively charged residue, arginine, and the single aromatic residues, phenylalanine and tyrosine, all played key roles in the RNA binding sites. A comparison between protein–RNA and protein–DNA complexes showed that whilst base and backbone contacts (both hydrogen bonding and van der Waals) were observed with equal frequency in the protein–RNA complexes, backbone contacts were more dominant in the protein–DNA complexes. Although similar modes of secondary structure interactions have been observed in RNA and DNA binding proteins, the current analysis emphasises the differences that exist between the two types of nucleic acid binding protein at the atomic contact level.
The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the data uniformity project that is underway to address the inconsistency in PDB data.
On the basis of a structural analysis of 240 protein-DNA complexes contained in the Protein Data Bank (PDB), we have classified the DNA-binding proteins involved into eight different structural/functional groups, which are further classified into 54 structural families. Here we present this classification and review the functions, structures and binding interactions of these protein-DNA complexes.