PCDB (http://www.pcdb.unq.edu.ar) is a database of protein conformational diversity. For each protein, the database contains a redundant compilation of all the corresponding crystallographic structures obtained under different conditions. These structures can be considered different instances of protein dynamism. As a measure of conformational diversity we use the maximum RMSD obtained by comparing the structures deposited for each domain. The redundant structures were extracted following the CATH structural classification and cross-linked with additional information. In this way it is possible to relate a given amount of conformational diversity to different levels of information, such as protein function, presence of ligands and mutations, structural classification, active site information and organism taxonomy, among others. Currently the database contains 7989 domains with a total of 36581 structures from 4171 different proteins. The maximum RMSD registered is 26.7 Å and the average number of structures per domain is 4.5.
Protein conformational diversity is a key feature for understanding protein function. Since the early studies of Max Perutz on the T and R forms of hemoglobin, increasing experimental evidence supports the notion that the native state of a protein is not unique. In fact, the native state is better represented by an ensemble of conformers in equilibrium, describing the conformational diversity or dynamism of a protein (1). It has been shown that the ensemble description is essential to understand central biological aspects of protein function, such as the catalytic process of enzymes (2–4), protein–protein recognition (5–7), macromolecular processes such as DNA replication and protein folding by chaperonins (8), enzyme promiscuity (9), signal transduction (10,11) and the ability of proteins to develop new functions (a property known as ‘evolvability’) (12,13). Nevertheless, characterizing the equilibrium ensemble of conformers, which involves studying the structural and thermodynamic features of each individual conformer, remains a major challenge. Accordingly, different procedures have been applied to the study of protein dynamism. Experimentally, nuclear magnetic resonance (NMR) spectroscopy is among the most widely used approaches and is a promising and active area of research (14). Computationally, techniques such as coarse-grained molecular dynamics and Monte Carlo methods, used in combination with normal mode analysis, have proven to be useful tools for exploring the conformational landscape of proteins (15–19). Finally, a completely different approach to studying conformational diversity considers crystallographic structures of the same protein obtained under different conditions as snapshots or instances of protein dynamism.
This view is supported by the correlation found between the structural diversity observed in solution experiments, such as NMR measurements, and that observed among crystallographic structures of proteins obtained under different conditions (6,20–25). A good correlation has also been found when protein dynamism simulated with computational methods, such as molecular dynamics, was compared with solution structures from NMR (26,27).
With thousands of structures redundantly deposited in structural databases (28), the extent and distribution of conformational diversity can be explored for a large number of proteins, a scale not accessible with the methodologies mentioned above. In this paper we use this approach to develop a database of proteins with conformational diversity. Here, we describe PCDB (Protein Conformational diversity DataBase), its web functionality and possible applications.
PCDB is a database of proteins showing conformational diversity. As mentioned above, conformational diversity is estimated from a redundant collection of structures for each protein domain deposited in the database. PCDB was developed from CATH database v3.3, following its protein domain structural hierarchy and definitions (29). Briefly, CATH clusters protein domains using structural and sequence similarities in a hierarchy of nine levels called CATHSOLID, where the ‘D’ level assigns a number to each individual domain in the database and corresponds to the collection of different crystallographic structures for an individual protein. This level was used to build PCDB by collecting all the protein domains classified in CATH that have at least two different crystallographic structures. The current version of PCDB contains 7989 protein domains from 4171 proteins, comprising 36581 structures: 34775 crystallographic structures and 1806 determined by NMR (Table 1).
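The selection rule described above (keep only domains with at least two different structures), together with the per-domain averages reported in the abstract, can be illustrated with a short sketch. It assumes each structure record has already been assigned to a domain group; the key `domain_key` and the identifiers `dom1`, `s1` and so on are hypothetical placeholders, not CATH or PDB identifiers, and the code is not the PCDB build pipeline.

```python
from collections import defaultdict


def select_redundant_domains(records, min_structures=2):
    """Group (domain_key, structure_id) records and keep domains with enough structures.

    `domain_key` is a hypothetical identifier for the CATH 'D'-level group a
    structure belongs to; duplicate records of the same structure count once.
    """
    groups = defaultdict(set)
    for domain_key, structure_id in records:
        groups[domain_key].add(structure_id)  # a set ignores repeated records
    return {
        key: sorted(ids)
        for key, ids in groups.items()
        if len(ids) >= min_structures
    }


def summarize(domains):
    """Return (number of domains, number of structures, mean structures per domain)."""
    n_domains = len(domains)
    n_structures = sum(len(ids) for ids in domains.values())
    mean = n_structures / n_domains if n_domains else 0.0
    return n_domains, n_structures, mean


# Toy records with made-up identifiers
records = [
    ("dom1", "s1"), ("dom1", "s2"),
    ("dom2", "s3"),                  # only one structure: excluded
    ("dom3", "s4"), ("dom3", "s5"), ("dom3", "s6"),
]
domains = select_redundant_domains(records)
print(summarize(domains))
```

Here `dom2` is dropped because a single structure cannot show conformational diversity, leaving two domains and five structures.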
The structures collected for each protein domain could have been crystallized under the same or different conditions. If the conditions were the same, the RMSD between different structures is only about 0.1 to 0.4 Å (30). Larger RMSD values are expected when conformational diversity appears, which can happen when crystallization conditions vary among the structures considered. In fact, RMSD values as high as 23.4 Å have been reported in studies of redundant protein structures (28). For example, it is well established that the addition of ligands can shift the conformational equilibrium towards a high-affinity conformer, producing changes in tertiary structure (12,31,32). Other changes in crystallization conditions, such as modifications in the oligomerization state (33), pH and temperature, as well as the presence of mutations (34), can also modify the relative stability of conformers and thus produce differences between crystallographic structures of the same protein. In addition, sequence modifications or crystallographic errors could introduce conformational diversity unrelated to biological causes. Because our method of measuring conformational diversity relies on the quality of the crystallographic structures, several filters were applied when building the database. The criteria used to select the structures are explained below, and a general PCDB building scheme can be found in Supplementary Figure S1.
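Because every conformational diversity value in PCDB is an RMSD, a minimal sketch of the quantity itself may help. It assumes two equal-length lists of matched (x, y, z) atom coordinates in Å (for example Cα atoms) that have already been superposed; the optimal superposition step (performed in PCDB by a structural aligner) is not reproduced here. The 0.4 Å constant simply encodes the upper end of the same-condition range quoted above and is an illustrative threshold, not a PCDB parameter.

```python
import math

# Upper end of the RMSD range quoted for structures solved under the same conditions (Å)
SAME_CONDITIONS_MAX_RMSD = 0.4


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched, pre-superposed coordinate lists.

    Each argument is a list of (x, y, z) tuples in Å; element i of one list is
    assumed to correspond to element i of the other. No fitting is performed.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def beyond_same_condition_noise(value):
    """True if an RMSD exceeds what identical crystallization conditions typically give."""
    return value > SAME_CONDITIONS_MAX_RMSD


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
value = rmsd(a, b)  # sqrt(1.0 / 2), about 0.707 Å
print(round(value, 3), beyond_same_condition_noise(value))
```

Skipping the superposition keeps the sketch short, but RMSD values computed on unsuperposed coordinates would overestimate the true structural difference.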
In PCDB, the structures are linked with information contained in the PDB concerning the crystallization procedure and with supplementary data that could help explain the occurrence of conformational diversity. The factors considered are the presence of ligands, mutations, changes in the oligomeric state and changes in pH. The maximum RMSD (RMSDmax) among the redundant structures of each protein domain is used to evaluate the extent of the structural change. Using the data in PCDB, we found that at least one of these selected experimental features is involved in 74% of all domains (Table 2) and in 60% of the domains with an RMSDmax above 0.4 Å. Besides the information about the crystallization procedure, each protein deposited in PCDB was cross-linked with different databases. In this way, a given extent of conformational diversity measured by RMSDmax can be related to diverse biological and structural information, such as biological function [GO terms (35) and Enzyme Commission (EC) numbers (36)], structural classification [CATH (29)], taxonomy (NCBI taxonomic ID and genus and species names), metabolic pathway, subcellular location, protein interactions, protein family, presence of a characterized catalytic site [Catalytic Site Atlas (37)] and derived InterPro links (38).
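The 74% and 60% figures are simple proportions over domains. A sketch of how such proportions could be computed, assuming a hypothetical per-domain record that holds RMSDmax and the set of experimental features that differ among its structures; the feature labels and the toy values are illustrative, not PCDB field names or data.

```python
# Illustrative labels for the four experimental attributes considered in PCDB
FEATURES = frozenset({"ligand", "mutation", "oligomeric_state", "ph"})


def fraction_with_feature(domains, min_rmsd=None):
    """Fraction of domains in which at least one of FEATURES differs among structures.

    `domains` maps a domain id to {"rmsd_max": float, "features": set of labels}.
    If `min_rmsd` is given, only domains with rmsd_max > min_rmsd are considered.
    """
    selected = [
        d for d in domains.values()
        if min_rmsd is None or d["rmsd_max"] > min_rmsd
    ]
    if not selected:
        return 0.0
    # A non-empty intersection means at least one recognized feature is present
    hits = sum(1 for d in selected if d["features"] & FEATURES)
    return hits / len(selected)


toy = {
    "d1": {"rmsd_max": 0.2, "features": set()},
    "d2": {"rmsd_max": 1.5, "features": {"ligand"}},
    "d3": {"rmsd_max": 3.0, "features": {"mutation", "ph"}},
    "d4": {"rmsd_max": 0.9, "features": set()},
}
print(fraction_with_feature(toy))                # all domains
print(fraction_with_feature(toy, min_rmsd=0.4))  # RMSDmax above 0.4 Å only
```

Restricting to domains above the same-condition noise level changes the denominator, which is why the two proportions reported in the text differ.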
PCDB consists of a web application written in PHP and connected to a MySQL database. The database includes information derived from numerous biological databases and online servers, as well as data produced by in-house scripts and programs. The PCDB search tool is based on dynamic SQL queries generated in PHP, and its browsing capability is based on SQL stored procedures executed dynamically from PHP. PCDB was built using the redundant structures of each protein domain collected from CATH v3.3 (39) (see Supplementary Figure S1). The structures belonging to each protein domain were structurally aligned using MAMMOTH (40), and the RMSDmax between conformers was recorded. Information about crystallization conditions was extracted from PDBML/XML files, as were the oligomeric state and the presence of sequence modifications, mutations, deletions and missing residues. Post-translational modifications were extracted from the ‘Controlled vocabulary of post-translational modifications’ provided by UniProt. Information about catalytic residues was extracted from the Catalytic Site Atlas (37). Further biological information for each structure was extracted from different databases: PDB (30), SIFTS (http://www.ebi.ac.uk/msd/sifts/) and UniProt (41).
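Once pairwise RMSD values are available, RMSDmax is the largest value over all pairs of structures in a domain. The sketch below assumes a caller-supplied `pairwise_rmsd` function standing in for the MAMMOTH alignment step (it does not call MAMMOTH); it also returns the pair that attains the maximum, which is the pair of conformers whose superposition PCDB lets users retrieve. The toy RMSD values are made up.

```python
from itertools import combinations


def rmsd_max(structure_ids, pairwise_rmsd):
    """Return (largest pairwise RMSD, pair of structure ids) for one domain.

    `pairwise_rmsd(a, b)` is any caller-supplied function returning the RMSD of
    structures a and b after structural alignment (a stand-in for the aligner).
    All n * (n - 1) / 2 unordered pairs are evaluated.
    """
    if len(structure_ids) < 2:
        raise ValueError("RMSDmax needs at least two structures")
    best_value, best_pair = float("-inf"), None
    for a, b in combinations(structure_ids, 2):
        value = pairwise_rmsd(a, b)
        if value > best_value:
            best_value, best_pair = value, (a, b)
    return best_value, best_pair


# Toy pairwise RMSD table (Å); the values are made up for illustration
table = {
    frozenset({"s1", "s2"}): 0.3,
    frozenset({"s1", "s3"}): 4.2,
    frozenset({"s2", "s3"}): 3.9,
}
print(rmsd_max(["s1", "s2", "s3"], lambda a, b: table[frozenset({a, b})]))
```

Because the number of alignments grows quadratically with the number of structures, domains with many redundant structures account for most of the alignment cost.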
Conformational diversity is central to understanding protein function, so its characterization could have multiple applications. PCDB is designed to retrieve proteins with a given amount of conformational diversity measured by RMSDmax and allows this value to be related to different levels of information. There are two main ways to explore PCDB (Figure 1). The main search attribute is the extent of conformational diversity measured by RMSDmax. This search can be restricted using a set of four attributes (presence of ligands, presence of mutations, changes in oligomerization state and changes in pH) that characterize the experimental crystallization conditions of each structure. These attributes can be selected separately or in different combinations (Table 2) and can be used to explain the RMSDmax obtained for a given protein. In the example shown in Figure 1, we searched PCDB for proteins with an RMSDmax of 5–10 Å between their structures due to the presence of ligands. The resulting extent of conformational diversity can therefore be unambiguously associated with conformational changes upon ligand binding. Figure 1 also shows, below the search field, the field for customizing the output information. In this section it is possible to select different levels of information, such as structural classification, protein function or subcellular location, among others. It is also possible to retrieve the structural superposition of the conformers with the maximum RMSD. Similar searches can explore PCDB using a single attribute or a combination of the attributes producing conformational changes. Furthermore, the biological and structural data contained in the customizable output field could be used to explore different trends related to conformational diversity.
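The search in Figure 1 can be pictured as a parameterized SQL query. The sketch below uses Python's built-in sqlite3 module as a stand-in for the PHP/MySQL stack, with a hypothetical `domains` table that stores RMSDmax and one 0/1 column per attribute; the table name, column names and toy rows are assumptions, not the actual PCDB schema or data. Requiring ligand presence together with the absence of the other three attributes is one way to express "due to the presence of ligands".

```python
import sqlite3

# Hypothetical attribute columns; whitelisted because column names cannot be bound as parameters
ATTRIBUTES = ("ligand", "mutation", "oligomer", "ph")


def search_domains(conn, rmsd_low, rmsd_high, **flags):
    """Return domain ids with rmsd_low <= RMSDmax <= rmsd_high, largest RMSDmax first.

    Each keyword in `flags` (one of ATTRIBUTES) adds an equality filter:
    True requires the attribute, False requires its absence.
    """
    sql = "SELECT domain_id FROM domains WHERE rmsd_max BETWEEN ? AND ?"
    params = [rmsd_low, rmsd_high]
    for name in sorted(flags):
        if name not in ATTRIBUTES:
            raise ValueError(f"unknown attribute: {name}")
        sql += f" AND {name} = ?"
        params.append(1 if flags[name] else 0)
    sql += " ORDER BY rmsd_max DESC"
    return [row[0] for row in conn.execute(sql, params)]


def demo_connection():
    """In-memory toy database with made-up values, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE domains (domain_id TEXT, rmsd_max REAL, "
        "ligand INTEGER, mutation INTEGER, oligomer INTEGER, ph INTEGER)"
    )
    conn.executemany(
        "INSERT INTO domains VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("d1", 7.1, 1, 0, 0, 0),
            ("d2", 5.4, 0, 1, 0, 0),
            ("d3", 12.0, 1, 0, 0, 0),
            ("d4", 9.8, 1, 1, 0, 0),
        ],
    )
    return conn


conn = demo_connection()
print(search_domains(conn, 5, 10, ligand=True))
print(search_domains(conn, 5, 10, ligand=True, mutation=False, oligomer=False, ph=False))
```

In the toy data, the first query also returns `d4`, whose structures differ by both ligand and mutation; excluding the other attributes in the second query leaves only `d1`, whose RMSDmax can be attributed to ligand binding alone.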
We are interested in increasing the amount and diversity of the biological and structural data available for each domain in the database, to enable correlation studies between conformational diversity and a broad spectrum of physicochemical parameters. One of our near-future goals is to introduce sequence alignments for each deposited protein to derive evolutionary information, such as the relative conservation of different positions and evolutionary rates. The link between the pattern of residue substitution and the extent of conformational diversity is a promising route to a better understanding of protein evolution, yet it remains almost unexplored. In addition, following previous work, we would like to enrich PCDB by introducing structures from close homologous proteins (21) to increase the conformational representation of the deposited domains.
Two main features differentiate PCDB from other databases containing information about conformational diversity in proteins (42,43). First, PCDB uses experimentally determined structures; second, these data are related to biological and structural information that can help explain the observed extent of conformational diversity. In the present version, PCDB contains 7989 protein domains with a broad range of conformational diversity, from the trivial zero to an RMSDmax of 26.7 Å. PCDB could thus be an essential tool for understanding conformational diversity and, through it, protein function.
Supplementary Data are available at NAR Online.
Proyectos de Investigación Plurianuales (PIP) CONICET grant (112-200801-02849) and Universidad Nacional de Quilmes grant (53/B056); Ezequiel Juritz has a type II fellowship from CONICET. Funding for open access charge: CONICET and UNQ grants.
Conflict of interest statement. None declared.
We would like to thank the referees for their helpful comments. Sebastian Fernandez Alberti and Gustavo Parisi are members of the Research Career of the National Research Council (CONICET, Argentina).