The Protein Data Bank in Europe (PDBe; pdbe.org) is actively involved in managing the international archive of biomacromolecular structure data as one of the partners in the Worldwide Protein Data Bank (wwPDB; wwpdb.org). PDBe also develops new tools to make structural data more widely and more easily available to the biomedical community. PDBe has developed a browser to access and analyze the structural archive using classification systems that are familiar to chemists and biologists. The PDBe web pages that describe individual PDB entries have been enhanced through the introduction of plain-English summary pages and iconic representations of the contents of an entry (PDBprints). In addition, the information available for structures determined by means of NMR spectroscopy has been expanded. Finally, the entire web site has been redesigned to make it substantially easier to use for expert and novice users alike. PDBe works closely with other teams at the European Bioinformatics Institute (EBI) and in the international scientific community to develop new resources with value-added information. The SIFTS initiative is an example of such a collaboration—it provides extensive mapping data between proteins whose structures are available from the PDB and a host of other biomedical databases. SIFTS is widely used by major bioinformatics resources.
The Protein Data Bank in Europe (PDBe; pdbe.org) (1) is the European partner in the Worldwide Protein Data Bank (wwPDB; wwpdb.org) (2), the international partnership that manages the Protein Data Bank (PDB) (3,4) archive of experimentally determined biomacromolecular structures. The other wwPDB partners are the Research Collaboratory for Structural Bioinformatics (RCSB) (5) and the BioMagResBank (BMRB) (6) in the USA, as well as the Protein Data Bank Japan (PDBj) (7). The four partners provide data deposition and annotation facilities for the experimental structural-biology community. This collaboration has resulted in a single, uniform archive for macromolecular structure data and has led to substantial improvements in the quality, consistency and integrity of the archive. The PDB is updated weekly with new and revised entries and is made available by all the wwPDB sites simultaneously at 0:00 UTC (Coordinated Universal Time) on Wednesdays. The archive is freely downloadable and is mirrored by many third-party sites.
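For sites that mirror the archive, the weekly release schedule described above can be turned into a simple calculation. The following is a minimal sketch, not part of any wwPDB software; the function name `next_pdb_release` is hypothetical and the only fact taken from the text is the release time of 0:00 UTC on Wednesdays.

```python
from datetime import datetime, timedelta, timezone

WEDNESDAY = 2  # datetime.weekday() counts Monday as 0, so Wednesday is 2


def next_pdb_release(now: datetime) -> datetime:
    """Return the first Wednesday 00:00 UTC strictly after `now`.

    `now` should be timezone-aware; it is converted to UTC first.
    """
    now_utc = now.astimezone(timezone.utc)
    # Midnight (UTC) at the start of the current day.
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    # Days until the coming Wednesday (0 if today is Wednesday).
    days_ahead = (WEDNESDAY - midnight.weekday()) % 7
    candidate = midnight + timedelta(days=days_ahead)
    # If that moment has already passed (or is exactly now), take next week.
    if candidate <= now_utc:
        candidate += timedelta(days=7)
    return candidate
```

A mirror could compare this value with the time of its last synchronisation to decide whether a new copy of the archive is due.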
The wwPDB partners each offer different and competing services to deliver the basic archival data along with value-added information, thus providing alternative and in some cases complementary ways for the user community to obtain biomacromolecular structure information. Historically, PDBe has provided the structural biology community with advanced tools and services for biomacromolecular structure search and analysis (8). PDBe has also been at the forefront of developing resources for X-ray crystallography, Nuclear Magnetic Resonance (NMR) spectroscopy and cryo-Electron Microscopy (EM). The Electron Microscopy Data Bank (EMDB; EMDataBank.org) (9,10) was established at the EBI in 2002 and is now managed and developed in collaboration with the RCSB and Baylor College of Medicine (EMDataBank.org this issue).
As the PDB approaches its 40th anniversary in 2011, PDBe has turned its attention to a fundamental problem facing the structural biology community: ‘how to make the wealth of structural data available to the larger biomedical community?’ Addressing this issue will require rethinking the ways in which structure data is delivered to users. Issues of data quality and validation are crucial to ensure that users with relatively little structural biology background can assess the quality of the data they want to use. In this article, we discuss the first steps towards providing better access to biomacromolecular structure data for expert and novice users alike. In addition, we describe new resources for value-added NMR data and the SIFTS project (11), which is the authoritative source of up-to-date residue-level annotation of protein structures in the PDB with data available in UniProt (12) and several other major biomedical databases.
As a first step towards making biomacromolecular structure data available to the biomedical community, PDBe has developed an interface that allows access to the 3D-structure data based on classification systems that are familiar and intuitive to molecular biologists, biochemists and other life scientists. This marks a shift from the traditional way of accessing PDB data based on PDB accession code or by searches using information regarding, for instance, a publication, a molecule name, a sequence or a related 3D structure.
The new browsing capability can be used not only to list and sort the relevant PDB entries, but also to analyze the structural knowledge embodied in the PDB in the context of the biological knowledge represented in various biological classification systems. One of the oldest and most widely used biochemical classification systems is the ‘Enzyme Classification’ (EC) (13), which classifies the known enzymes into functional families. Based on information from the IntEnz (14) database we have developed a new interactive interface to browse PDB data in the context of the Enzyme Classification.
The EC browser (pdbe.org/ec) enables users to retrieve and analyze information on any or all of the enzyme structures available in the PDB. Figure 1 shows the EC-browser interface, which consists of three components: a panel that enables users to select enzymes or enzyme classes of interest; a panel that displays information about the class of enzymes selected by the user; and a central panel that presents different views on the structural information available in the PDB and related resources about the selected enzyme class, organized in a number of tabs. Each tab provides a different view on the data:
All the data displayed in the browser is retrieved from the PDBe database in real time and all graphs are generated on the fly. Hence, the information shown is always up-to-date. The interface also allows users to download data presented in the central panel for further analysis or reporting. Browsing the structural archive in this fashion gives both expert and non-expert users an intuitive means for accessing and analyzing the wealth of information available in the PDB, using familiar biological or (bio-)chemical terms and classifications.
In addition to the EC browser, PDBe has developed two other browser modules that are based on the sequence-based protein-family classification system Pfam (17) (pdbe.org/pfam) and the structure-fold-based protein-family classification system CATH (15) (pdbe.org/cath), respectively, and further modules are under development. There is also a browser-like interface for analyzing the results of FASTA-based (20) sequence searches of the PDB (21) (pdbe.org/fasta). The functionality and interface of these browsers are very similar to those of the EC browser.
In order to convey key information about a PDB entry, PDBe has designed a set of intuitive icons called PDBlogos. In addition, PDBe has introduced PDBprints (pdbe.org/pdbprints), which are sets of seven icons that convey specific bits of information in a well-defined order. On the PDBe Atlas pages, PDBprints are used to give an at-a-glance overview of the contents of an entry. PDBprints also allow for easy comparison of PDB entries listed in the result lists of the PDBe search system. In the first release of PDBprints (summer 2010), the following categories of information are included:
Some of the icons may have either a grey or a colored background. If the background is grey, this implies that the corresponding feature, data or information is absent or unavailable. For example, Figure 2 shows the PDBprints for PDB entry 1atp (22), which immediately reveals that 1atp is a published crystal structure of a heterologously expressed mouse protein in complex with a ligand, for which the experimental data is available. The grey icon indicates that the entry contains no nucleic acid molecules.
PDBe has established itself as a provider of advanced services such as PDBeFold (23), PDBeMotif (24) and PDBePISA (16). These services are used not only by the experimental structural biology community, but also by the wider bioinformatics and biology community interested in information about biological assemblies or in comparing the folds or ligand-binding sites of structures available in the PDB. To make these services more easily accessible, we have redesigned our web site with two major objectives in mind: to make PDBe services and the PDB archive data more accessible to a broader user base, in particular novice and non-expert users, and to integrate services in a transparent manner.
The PDBe home page (pdbe.org) was redesigned to allow easy access to the PDB data and the advanced PDBe services (Figure 3). A ‘PDBe Tools’ panel provides links to some of the most popular search and analysis tools at PDBe, organized in problem-oriented sets (‘deposit’, ‘browse’, ‘search’, etc.). The central part of the home page provides access to a wealth of information about PDBe and its resources, services and tools. By default, the page opens on the ‘Home’ tab, which offers a number of quick ways to access information for a particular PDB entry. The ‘Sequence search’ sub-tab provides easy access to the FASTA-based (21) browser interface. Users can find additional information about various PDBe resources, tools and services via the ‘PDBe feature’ sub-tab. The ‘Quick access’ sub-tab, which is displayed by default, enables users to enter a PDB code and gain single-click access to a number of commonly requested information sources about that entry. At present, these include (i) the new English-language summary Atlas page (Figure 4); (ii) the PDB-formatted file from the archive; (iii) a page with links to files related to the PDB entry [e.g. mmCIF and PDBML files of the structure, experimental data and SIFTS (11) data]; (iv) the probable quaternary structure (derived by PDBePISA); (v) similar folds in the PDB [calculated by PDBeFold (23), based on the program SSM (25)]; and (vi) analyses of sequence motifs, 3D motifs, ligand interactions, etc. from PDBeMotif (24). The ‘Quick access’ sub-tab further allows users to search for PDB entries based on an external database identifier [e.g. from PubMed (26), UniProt (12), SCOP (27), Pfam (17), CATH (15) or GO (18)]. Finally, the ‘Quick access’ sub-tab provides a number of links that give access to a random PDB entry that satisfies a criterion relating to the method used to solve the structure, the type of molecules in the entry, or when the entry was released. These links are very useful for education and outreach purposes.
Novice users of the PDBe web site will benefit from the newly introduced Wizard. This Wizard tries to determine what a user is looking for based on answers to a series of questions. In most cases, it eventually presents users with a search form for direct access to the information or it suggests an appropriate resource or advanced service at PDBe. Users are also provided with a ‘Shortcut’ method that does not require use of the Wizard pages, should they wish to carry out similar searches in the future. Figure 5 shows an example of a series of Wizard pages.
The search bar at the top of the PDBe home page can be used to carry out a quick search for a specific PDB or EMDB (9,10) entry, or a quick keyword search of both databases. The two databases are queried simultaneously and the search results are classified based on the categories in which the search term was present. For instance, a search term such as ‘cancer’ may occur as part of a journal title, a publication title, a molecule name, a keyword or a domain or sequence family name, etc. This facility enables users to refine their search and only find results in a category of their interest.
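The classification of search results by category can be illustrated with a small sketch. It is not the PDBe implementation: the record layout (an `id` plus a `fields` dictionary keyed by category name) and the function name `classify_hits` are assumptions made for this example, and matching is a plain case-insensitive substring test.

```python
from collections import defaultdict


def classify_hits(records, term):
    """Group entry identifiers by the category in which `term` occurs.

    `records` is a list of dicts with a hypothetical layout:
    {"id": <entry identifier>, "fields": {<category>: <text>}}.
    """
    needle = term.lower()
    by_category = defaultdict(list)
    for record in records:
        for category, text in record["fields"].items():
            # Case-insensitive substring match within this category.
            if needle in text.lower():
                by_category[category].append(record["id"])
    return dict(by_category)
```

A user interested only in entries whose molecule name contains the term would then look at the ‘molecule name’ key of the result and ignore the other categories.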
A final feature to help users access PDBe resources is a set of easy-to-remember short-cut URLs (Table 1). For example, ‘pdbe.org’ gives direct access to the PDBe home page, while ‘pdbe.org/fold’ and ‘pdbe.org/pisa’ provide direct access to the PDBeFold and PDBePISA services, respectively. The URL ‘pdbe.org/1xyz’ links directly to the PDBe summary Atlas page for PDB entry ‘1xyz’, while ‘pdbe.org/download/1xyz’ gives direct access to the PDB file for that entry.
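Because the short-cut URLs follow a regular pattern, they are easy to generate in scripts. The sketch below only assembles the URL strings quoted in the text; the `http://` scheme, the helper names and the identifier check (four characters, the first a digit) are assumptions of this example, and no network access is performed.

```python
import re

BASE = "http://pdbe.org"  # scheme is an assumption; the text quotes 'pdbe.org'
PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")  # classic four-character PDB code


def _checked(pdb_id: str) -> str:
    """Return the lower-case PDB code, or raise ValueError if malformed."""
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB code: {pdb_id!r}")
    return pdb_id.lower()


def atlas_url(pdb_id: str) -> str:
    """Short-cut to the summary Atlas page, e.g. pdbe.org/1xyz."""
    return f"{BASE}/{_checked(pdb_id)}"


def download_url(pdb_id: str) -> str:
    """Short-cut to the PDB file, e.g. pdbe.org/download/1xyz."""
    return f"{BASE}/download/{_checked(pdb_id)}"


def service_url(name: str) -> str:
    """Short-cut to a named service, e.g. 'fold' or 'pisa'."""
    return f"{BASE}/{name}"
```

For instance, `atlas_url("1XYZ")` yields `http://pdbe.org/1xyz` and `service_url("pisa")` yields `http://pdbe.org/pisa`.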
The European Bioinformatics Institute (EBI) is home to a number of core bioinformatics databases and services that provide data relevant to the biomedical field. As part of the EBI, PDBe is in a unique position to enhance the annotation of biomacromolecular structures with data from other biological databases by cross-referencing and mapping to in-house resources. The ‘Structure Integration with Function, Taxonomy and Sequence’ initiative (SIFTS; pdbe.org/sifts) (11) is a close collaboration between the PDBe and UniProt (12) teams, with the goal of improving the integration of protein structure and sequence data. The project was started in 2001 and has resulted in the development of a robust mechanism for enhancing annotations and exchanging data between the major structure- and sequence-based resources.
The SIFTS procedure (S.V. et al., unpublished data) identifies the correct cross-reference in the UniProt database for every protein in a PDB entry. All the data for residue-level mapping between a PDB entry and the corresponding UniProt entry is generated using automated procedures once all the taxonomy and UniProt cross-reference information has been identified. To validate the mapping, the data is loaded into the PDBe database where data integrity checks are performed independently of the mapping process. The mapping information is enriched with cross-reference information from the NCBI taxonomy database (26,28), IntEnz (14), CATH (15), SCOP (27), InterPro (29), Pfam (17) and PubMed (26). This process is based on information either from the corresponding UniProt entry or from the links available to the PDB entry from the corresponding databases. While this process is very effective in gathering and cross-mapping these data resources, in relation to GO (18) terms it can introduce spurious mappings. This happens because all the GO terms are mapped onto the complete sequence in a UniProt entry, whereas a PDB entry may only contain a fragment or domain to which the GO annotation may not be applicable. To address this problem, PDBe has developed an improved mapping process for GO terms in collaboration with the InterPro and GOA teams at the EBI. The new process uses InterProScan (30) and considers only the domains present in the PDB entry to map the corresponding GO terms (18). All cross-reference data is made freely available in tab-delimited files and XML files from the PDBe ftp area (pdbe.org/sifts/ftp). SIFTS data is used by major bioinformatics resources such as RCSB (5), PDBsum (31), Pfam (17), SCOP (27), InterPro (29), several DAS server providers (32) and many research and service groups around the world to provide cross-reference information on their web pages.
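The reasoning behind the improved GO mapping can be made concrete with a toy example. The sketch below is not the SIFTS or InterProScan code: the `Domain` type, the function `go_terms_for_fragment` and the GO labels are hypothetical, and "present in the PDB entry" is interpreted, as an assumption, as any overlap between a domain and the UniProt residue range covered by the structure.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    name: str
    start: int            # first UniProt residue of the domain (inclusive)
    end: int              # last UniProt residue of the domain (inclusive)
    go_terms: frozenset   # GO terms associated with this domain


def go_terms_for_fragment(domains, frag_start, frag_end):
    """Collect GO terms only from domains that overlap the structure.

    `frag_start` and `frag_end` give the UniProt residue range covered
    by the PDB entry (inclusive), taken from the residue-level mapping.
    """
    terms = set()
    for domain in domains:
        # Two inclusive ranges overlap if each starts before the other ends.
        if domain.start <= frag_end and frag_start <= domain.end:
            terms |= domain.go_terms
    return terms
```

With a two-domain protein whose structure covers only the first domain, the terms attached to the second domain are left out, whereas a mapping onto the complete UniProt sequence would have transferred them to the PDB entry as well.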
Table 2 shows statistics for the PDB entries that have SIFTS-based cross-references as of September 2010.
PDBe has worked closely with the NMR community and BMRB to improve the data quality of NMR depositions in the PDB. NMR spectroscopy is an important structure-determination technique but has suffered from a lack of standards and tools, which has limited the reliability of the deposited data. Capturing experimental data is key to enhancing the reliability of NMR structures in the PDB. The deposition of restraints derived from the experimental data (such as NOE-based distance restraints) has been mandatory since February 2008, and by the end of 2010 the deposition of chemical shift information will also become mandatory.
To facilitate the deposition of NMR models, experimental data, restraints and other metadata, PDBe has developed the ‘Entry Completion Interface’ (ECI) (33). The software is based on the CCPN (34) framework and enables pooling of all NMR-related data in one project. The CCPN FormatConverter (34) can be used within ECI to import data from the output of commonly used NMR software. ECI also carries out basic validation of chemical shifts against standard values. The finalized CCPN project can be uploaded to PDBe using AutoDep (33,35). AutoDep also accepts chemical shifts as a separate file in NMR-STAR (V3.1) format and allows the input of referencing information for the relevant nuclei. In that case, an additional validation step is carried out to check the correspondence of the atom nomenclature between the coordinate and the chemical shift data. The chemical shift data is automatically forwarded to the BMRB for further annotation and archiving.
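The correspondence check between coordinate and chemical shift data amounts to a comparison of atom identifiers. The following sketch illustrates the idea only and does not reproduce the AutoDep validation step: atoms are represented, as an assumption, by tuples of chain, residue number, residue name and atom name, and the function name `nomenclature_mismatches` is hypothetical.

```python
def nomenclature_mismatches(coordinate_atoms, shift_atoms):
    """Return shift assignments that have no matching atom in the model.

    Both arguments are iterables of (chain, residue_number,
    residue_name, atom_name) tuples. The result is sorted so that
    reports are reproducible.
    """
    known = set(coordinate_atoms)
    return sorted(atom for atom in set(shift_atoms) if atom not in known)
```

A typical mismatch of this kind arises when one file names the backbone amide proton `H` and the other names it `HN`; the shift assigned to `HN` would then be reported because the coordinates contain no atom of that name.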
The PDBe Atlas pages contain details about the underlying experiments for every PDB entry. In the case of NMR entries, these pages have been redesigned to provide access to a wealth of publicly available information, some of which is unique to PDBe (Figure 6). For every NMR entry for which the appropriate data is available, the following information is provided (Table 3 lists the number of NMR entries for which each kind of information is available, as of September 2010):
PDBe works closely with its wwPDB partners, the structural biology community and bioinformatics resources at the EBI and elsewhere to improve the quality and consistency of the data in the PDB. It is actively involved in the wwPDB X-ray and NMR validation task forces and is implementing the recommendations of the X-ray validation task force (41) in a validation pipeline. This pipeline will be used by all wwPDB partners, both to validate new depositions to the PDB and to assess the quality of existing entries. The validation data will be crucial in helping experts and novices alike to access reliable structural information. In the next few years, PDBe will also endeavour to provide new ways to access and integrate structural and related data, especially for non-expert users. Simultaneously, PDBe will continue to develop advanced services aimed more specifically at the structural biology community.
PDBe gratefully acknowledges the support of the European Molecular Biology Laboratory (EMBL), the Wellcome Trust (grant number 088944), the European Union (213010 and 226073), the UK Biotechnology and Biological Sciences Research Council (BB/C512110/1, BB/G022577/1, BB/E007511/1), and the National Institutes of Health (R01GM079429-01A1). Funding for open access charge: Wellcome Trust.
Conflict of interest statement. None declared.
The authors wish to thank all collaborators and partners in the EBI, EMBL, wwPDB, EMDB, CCPN, CCP4, CCDC and other collaborative efforts, as well as the structural biology community for depositing their structures and experimental data in the PDB and EMDB.