Phospho3D is a database of three-dimensional (3D) structures of phosphorylation sites (P-sites) derived from the Phospho.ELM database. It also collects information on the residues surrounding the P-site in space (3D zones). In addition, the database provides the results of a large-scale structural comparison of the 3D zones versus a representative dataset of structures, thus associating each P-site with a number of structurally similar sites. The new version of Phospho3D presents an 11-fold increase in the number of 3D sites and incorporates several additional features, including new structural descriptors, the possibility of selecting non-redundant sets of 3D structures and the availability for download of non-redundant sets of structurally annotated P-sites. Moreover, it features P3Dscan, a new functionality that allows the user to submit a protein structure and scan it against the 3D zones collected in the Phospho3D database. Phospho3D version 2.0 is available at: http://www.phospho3d.org/.
In recent years, there has been increasing interest in the structural features of protein phosphorylation sites (P-sites). This can be ascribed to the steadily growing number of experimentally verified P-sites provided by high-throughput mass spectrometry-based proteomics techniques [e.g. (1)]. The simultaneous availability of an increasing number of three-dimensional (3D) structures is making it possible to infer the structural context for a significant number of P-sites. In order to identify the structural determinants of kinase specificity, some authors have tried to characterize the 3D environment of P-sites with the aim of pinpointing specific structural features (2–4). Phosphorylation site databases that also report structural data, as well as systematic structural analyses of P-sites, have recently appeared in the literature [for a review, see ref. (5)]. PHOSIDA (1), a phosphorylation site database, includes the predicted accessibility and secondary structure of each P-site. The mtcPTM (6) database stores homology models for proteins and protein domains that contain phosphorylated residues. Finally, many of the P-site predictors incorporating 3D-context information (1,4,7–10) display an improvement in performance with respect to predictors using sequence information only.
So far the structural attributes stored in P-site databases and incorporated in P-site predictors are essentially of two types: accessibility and secondary structure.
In 2007, we presented Phospho3D, a database of 3D structures of protein phosphorylation sites (11), derived from the Phospho.ELM database (12) and enriched with structural annotation at the residue level, including accessibility, secondary structure and residue conservation as provided by the Consurf-HSSP database (13). Phospho3D also stored and annotated the sequence flanking the P-site (10 residues) and the zone, i.e. the 3D region defined by the set of residues at a distance not exceeding 12Å from the phospho-instance.
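The zone definition above can be illustrated with a minimal sketch. It assumes one representative coordinate per residue (for example the Cα atom); the text does not state which atoms are used for the distance calculation, so this choice, the function name `zone` and the toy coordinates are all hypothetical.

```python
import math

ZONE_RADIUS = 12.0  # Å, the cutoff stated in the text


def zone(residues, psite_id, radius=ZONE_RADIUS):
    """Return the ids of residues lying within `radius` Å of the P-site.

    `residues` maps a residue id to an (x, y, z) tuple. Using a single
    representative atom per residue is an assumption of this sketch.
    The P-site itself is part of the returned zone.
    """
    center = residues[psite_id]
    return sorted(
        res_id
        for res_id, xyz in residues.items()
        if math.dist(center, xyz) <= radius
    )


# Toy coordinates (hypothetical, not taken from any PDB entry).
toy = {
    "S10": (0.0, 0.0, 0.0),   # the P-site
    "A11": (3.8, 0.0, 0.0),   # sequence neighbour
    "K45": (0.0, 12.0, 0.0),  # exactly at the cutoff, so included
    "L90": (9.0, 9.0, 0.0),   # about 12.7 Å away, so excluded
}
print(zone(toy, "S10"))  # ['A11', 'K45', 'S10']
```

Because the cutoff is spatial rather than sequential, a zone can contain residues that are far from the P-site in sequence, as K45 is in the toy example.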
Since then, the number of Phospho.ELM instances increased ~8-fold, rising from 5314 (in 1805 proteins) to 42474 (in 8718 proteins), and more than 26000 structures were added to the Protein Data Bank (PDB) (14).
Here we introduce Phospho3D version 2.0. Besides an 11-fold increase in the number of unique Phospho.ELM instances mapped onto 3D structures (compared to version 1.0), it incorporates several new features: additional structural descriptors of P-sites; the possibility of browsing the database by selecting non-redundant sets of 3D structures; the availability for download of multiple non-redundant sets of structurally annotated P-sites, intended as reliable benchmark datasets for training and testing predictors; and P3Dscan, a new functionality that allows the user to submit a protein structure and scan it against the 3D P-site zones collected in the Phospho3D database.
The updated Phospho3D database was constructed by collecting data from the latest release of the Phospho.ELM database (Version 9.0, August 2010), which currently stores about 42500 experimentally verified phosphorylation sites in 8718 substrate proteins, both manually extracted from the literature and obtained from mass spectrometry-based proteomics experiments. The correspondence between Phospho.ELM sequences and PDB chains was established by sequence alignment, requiring at least 98% sequence identity. P-sites in gapped regions of the alignment were discarded.
This resulted in 5387 mapped instances, corresponding to 1770 unique Phospho.ELM instances (897 Ser, 338 Thr and 535 Tyr) on 2158 protein chains.
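The mapping step described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline actually used: the alignment is taken as given (two gapped strings of equal length), identity is computed over columns where both sequences have a residue, and a P-site is discarded only when it is aligned to a gap. The function names and toy sequences are hypothetical.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over columns where neither sequence has a gap.

    This definition of identity is an assumption of the sketch.
    """
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def map_site(aln_query, aln_chain, site_pos, min_identity=98.0):
    """Map a 1-based P-site position in the query onto the chain.

    Returns the 1-based position in the chain sequence, or None if the
    identity threshold is not met or the site is aligned to a gap.
    """
    if len(aln_query) != len(aln_chain):
        raise ValueError("aligned sequences must have the same length")
    if percent_identity(aln_query, aln_chain) < min_identity:
        return None
    q_pos = c_pos = 0
    for q, c in zip(aln_query, aln_chain):
        if q != "-":
            q_pos += 1
        if c != "-":
            c_pos += 1
        if q != "-" and q_pos == site_pos:
            return None if c == "-" else c_pos
    return None  # site_pos lies outside the query sequence


# Toy alignment (hypothetical): the chain lacks the residue aligned to T3.
query = "MKTSAYLL"
chain = "MK-SAYLL"
print(map_site(query, chain, 4))  # 3: S4 of the query is residue 3 of the chain
print(map_site(query, chain, 3))  # None: T3 falls in a gap and is discarded
```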
Note that P-sites derived from mass spectrometry (MS) experiments should be treated with caution. Owing to the current procedures for MS data deposition, it is difficult to systematically detect whether a phospho-instance was identified in physiologically abnormal conditions (e.g. in proteins extracted from oncogenic tissues or in proteins that do not undergo phosphorylation, such as hemoglobin) (15). To help users detect such potentially problematic cases, we report, for each P-site, the nature of the original experiment (low- or high-throughput) and the corresponding literature reference (PMID). Moreover, we encourage users to carefully analyze the structural context of P-sites, which might be indicative of problems in the original data. One example is the Tyr phosphorylation site mapped to position 133 of the human hemoglobin subunit beta (UniProtKB:P68871), for which Phospho3D stores 43 PDB structures. In most of these structures the solvent accessibility of Y133 is zero, and it never exceeds 3.5%. This structural information suggests that the original data might not be reliable.
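The kind of check suggested by the hemoglobin example can be sketched in a few lines. The 5% threshold, the function name and the structure labels below are hypothetical choices made for illustration; Phospho3D itself does not prescribe a cutoff.

```python
def is_consistently_buried(accessibility_by_structure, threshold=5.0):
    """Return True if the P-site is buried in every available structure.

    `accessibility_by_structure` maps a structure label to the percentage
    solvent accessibility of the P-site residue in that structure.
    The default threshold (5%) is an assumption of this sketch.
    """
    values = list(accessibility_by_structure.values())
    if not values:
        raise ValueError("no structures available for this P-site")
    return max(values) < threshold


# Hypothetical values mimicking a site that is never more than 3.5% exposed.
site = {"structure_1": 0.0, "structure_2": 0.0, "structure_3": 3.5}
print(is_consistently_buried(site))  # True
```

A site flagged in this way is not necessarily an artifact, since conformational changes can expose buried residues, but it is a reasonable candidate for checking the original experiment and reference.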
The basic information stored in Phospho3D consists of the P-site instance, its flanking sequence (10 residues) and the P-site 3D zone, i.e. the set of residues in a 12Å radius surrounding the P-site in space. For each residue in the zone, Phospho3D 2.0 stores the following structural descriptors: secondary structure and solvent accessibility (in Å²) as defined by DSSP (16); percentage solvent accessibility, obtained by normalizing the DSSP solvent accessibility by the maximum accessibility value for each residue as determined in ref. (17); B-factor, computed as specified in ref. (18); occurrence in a cavity, together with the rank and volume of the cavity calculated with the SURFNET program (19); the depth index DPX (20) and the protrusion index CX (21), obtained using the PSAIA software (22); the CONSURF conservation score extracted from ConSurfDB (23); and the disorder probability provided by DisEMBL (24) according to three different criteria: (i) loops/coils as defined by DSSP, (ii) hot loops, i.e. loops with a high degree of mobility as determined by the temperature (B-) factor and (iii) missing coordinates in X-ray structures as defined by REMARK-465 entries in the PDB.
A detailed description of each structural attribute is reported in the website documentation.
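As an illustration of one of these descriptors, the percentage solvent accessibility is the DSSP accessibility divided by the maximum accessibility of the residue type, expressed as a percentage. The sketch below takes the maximum value as an argument; the number used in the example is a placeholder and not a value from ref. (17).

```python
def relative_accessibility(asa, max_asa):
    """Percentage solvent accessibility: 100 * ASA / maximum ASA.

    `asa` is the absolute accessibility in Å² (e.g. as reported by DSSP) and
    `max_asa` is the maximum accessibility for that residue type. Values are
    not capped, so an unusually exposed residue can exceed 100%.
    """
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    if asa < 0:
        raise ValueError("asa cannot be negative")
    return 100.0 * asa / max_asa


# Placeholder maximum (hypothetical, not taken from ref. 17).
print(relative_accessibility(30.0, 120.0))  # 25.0
```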
Phospho3D 2.0 also provides information derived from Phospho.ELM, such as, when available, the kinase(s) phosphorylating a given P-site. In addition, for each zone, it provides the results of a large-scale local structural comparison versus a non-redundant (sequence identity ≤20%) dataset of 487 PDB X-ray protein chains from eukaryotic organisms with experimental resolution ≤1.5Å. The comparison is carried out using the new version of the algorithm (25) and the same criteria for assessing structural similarity used in the previous database version (11), although more stringent thresholds are applied in this case, as described in the website documentation. Database queries can now be performed on seven sets: the whole collection of P-sites, the set of P-sites found in non-identical structures (PDB100) and P-sites found in PDB structures belonging to five redundancy sets, ranging from PDB90 to PDB20, where the number corresponds to the maximum sequence identity shared by the protein chains in the redundancy set. These sets have been determined using the PISCES resource (26).
Additionally, the P-site annotations at the residue level are available for download on the Phospho3D website. These can serve as benchmarks for training and testing P-site predictors and for analyses of P-site structural features.
Finally, Phospho3D 2.0 now links each entry to the corresponding Phospho.ELM instance and the kinase names to their UniProt ACs (27).
Phospho3D 2.0 provides a novel functionality that allows the user to upload a PDB-formatted structure and perform a local structural comparison against the 5387 zones (one for each Phospho.ELM mapped instance) stored in the database. The aim is to identify local structural similarities between the user's query structure and any of the structural patches containing a P-site. To help evaluate the structural context of each match, we provide its graphical display and a table reporting the structural information at the residue level for both the query and the target 3D matching patches. The comparison algorithm, which P3Dscan runs on the fly, is the same as the one used for the large-scale comparison whose results are stored in the database. The comparison results are also provided in text format for download.
As in the previous version, Phospho3D 2.0 can be searched by kinase name, by PDB identification code or by keyword. In this new version, however, the user can additionally select a redundancy set in order to avoid retrieving identical or very similar P-sites. The data returned to the user consist of a brief description of the PDB structure(s) that fulfill the search criteria and a list of instances presented along with associated information. In particular, each instance is now linked to the corresponding Phospho.ELM entry. For each P-site, the user can select three options related to the surrounding structural zone: a graphical view using the Jmol Java Applet (http://www.jmol.org), a tabular view reporting the zone annotation at the residue level or a list of 3D matches identified by local structural comparison. Each match can be visualized using Jmol.
The P3Dscan webpage can be reached from the Phospho3D homepage. Users upload a PDB-formatted file, choose a redundancy set of 3D zones they want to scan against their structure and run the comparison by clicking the ‘p3d scan’ button. P3Dscan results are displayed in tabular format (Figure 1). The result table can be sorted by increasing match score or decreasing RMSD. Each line of the table reports the information of a single match. A match can be graphically visualized by clicking on the corresponding button. Moreover, the tabular view button links to a window displaying structural annotation at the residue level, both for the query and the target 3D patches. The Result Table can also be downloaded in text format.
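Users who download the result table in text format may want to reorder it offline. The following sketch assumes the matches have already been parsed into dictionaries with `score` and `rmsd` fields; these field names, the target labels and the values are hypothetical and do not reflect the actual P3Dscan output format.

```python
def sort_matches(matches, key, descending=False):
    """Return the matches sorted by 'score' or 'rmsd'.

    The field names are assumptions of this sketch, not the real column
    headers of the P3Dscan result table.
    """
    if key not in ("score", "rmsd"):
        raise ValueError("key must be 'score' or 'rmsd'")
    return sorted(matches, key=lambda m: m[key], reverse=descending)


# Toy matches (hypothetical values).
matches = [
    {"target": "zone_a", "score": 6, "rmsd": 0.9},
    {"target": "zone_b", "score": 4, "rmsd": 0.5},
    {"target": "zone_c", "score": 5, "rmsd": 0.7},
]

by_score = sort_matches(matches, "score")                  # increasing match score
by_rmsd = sort_matches(matches, "rmsd", descending=True)   # decreasing RMSD
print([m["target"] for m in by_score])  # ['zone_b', 'zone_c', 'zone_a']
print([m["target"] for m in by_rmsd])   # ['zone_a', 'zone_c', 'zone_b']
```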
We performed a large-scale structural analysis of the P-sites stored in Phospho3D and plotted the statistical distributions of each 3D attribute used to annotate the P-sites in the database. The analysis was carried out separately for each redundancy set. The distributions for the P-sites falling on non-identical structures (PISCES PDB100) can be found at http://www.phospho3d.org/stats.py#3.
The new version of Phospho3D stores a markedly increased number of structurally annotated P-sites. In addition, it incorporates significant improvements, such as several new structural descriptors, non-redundant datasets and a tool, P3Dscan, for the analysis of uploaded protein structures.
We believe that this enhanced version of the database makes it possible to fully exploit available structural information on P-sites and use it to perform structural analyses and/or build P-site predictors.
Importantly, the Phospho3D update procedure is now completely automated, allowing regular and timely updates of the database with each new Phospho.ELM release.
This work was supported by Istituto Pasteur—Fondazione Cenci Bolognetti, Roma; the 7th EC Framework Programme LEISHDRUG project [grant number 223414]; and a ‘Juan de la Cierva’ fellowship to A.Z. Funding for open access charge: 7th EC Framework Programme LEISHDRUG project.
Conflict of interest statement. None declared.
Many thanks to Holger Dinkel (Phospho.ELM database) for technical support.