Phospho.ELM is developed and deployed with open source software. The database management system is PostgreSQL [12]. The software was developed in Python 2.2, using modules from the BioPython.org project for retrieving information from SWISS-PROT and the PyGreSQL module for PostgreSQL interfacing. The web interface software uses the CGI model framework [13].
The Phospho.ELM 1.0 database contained a dataset of 289 proteins. The current release (Phospho.ELM 2.0) has integrated data from PhosphoBase to give a total of 556 proteins (299 human, 52 mouse, 54 rat, and 151 from other species). The Phospho.ELM dataset represents the largest collection of experimentally verified phosphorylation sites: the annotated proteins contain 556 tyrosine, 913 serine and 234 threonine phosphorylation sites (instances) that are verified substrates for 119 different protein kinases (Table ).
Selected protein kinases, their class, the number of known protein substrates and the instances recorded in Phospho.ELM.
In the Phospho.ELM database, information is presented in two classes, instance and phosphoprotein. The key information consists of the phosphorylated site (instance) and its flanking sequence within a protein, for which experimental evidence has been found in the literature. Annotations to each instance include (where known) the kinase(s) that phosphorylate(s) the given site, the domain(s) that bind to the phosphorylated motif (this is particularly relevant for tyrosine phosphorylation, e.g. SH2), and a link to the ELM server to retrieve further information about the kinase and the regular expression used for prediction of kinase substrates (see Fig. ). Where available, hyperlinks are provided to protein structures containing phosphorylated residues [14]. Additional information for each protein kinase substrate includes the subcellular compartment (annotated with Gene Ontology terms [15]), tissue distribution, a list of interaction partners derived from the MINT database [17], and a diagram of a signaling pathway in which the protein is involved. When one is available, we provide a link to BioCarta (Charting Pathways of Life) [18]. Controlled vocabularies to describe experimental evidence [19] will soon be included in the database.
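As an illustration of what an instance record holds, the following is a minimal sketch of extracting a phosphorylated residue together with its flanking sequence. The function name `flanking_window`, the default flank width, and the example peptide are assumptions made for this sketch; they are not taken from the Phospho.ELM code base.

```python
PHOSPHO_RESIDUES = "STY"  # serine, threonine, tyrosine


def flanking_window(sequence, position, flank=4):
    """Return (residue, window) for a phosphosite.

    `position` is 1-based, as is usual in protein annotation.
    The window holds up to `flank` residues on each side of the site
    and is shorter when the site lies near a sequence terminus.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position %d is outside the sequence" % position)
    idx = position - 1
    residue = sequence[idx]
    if residue not in PHOSPHO_RESIDUES:
        raise ValueError("residue %s at %d is not S, T or Y" % (residue, position))
    start = max(0, idx - flank)
    end = min(len(sequence), idx + flank + 1)
    return residue, sequence[start:end]


# Hypothetical peptide, used only to exercise the function.
demo_sequence = "MKRASPTYGGLE"
demo_site = flanking_window(demo_sequence, 5, flank=3)
```

Rejecting positions that do not fall on S, T or Y is a design choice of the sketch: it catches off-by-one errors when positions are copied from the literature.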
Figure 1. The simplified Phospho.ELM database scheme. The key data objects are Substrates (phosphoprotein) and Instances for which relevant information is stored, as well as links to external databases. pkey and fkey stand for "primary key" and "foreign key", respectively.
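The substrate/instance relationship described in the scheme can be sketched as two tables linked by a foreign key. This is only an illustration: the table and column names are assumptions, and the standard-library `sqlite3` module stands in for PostgreSQL and PyGreSQL so that the example runs without a database server.

```python
import sqlite3

# Hypothetical, simplified schema: one row per phosphoprotein (substrate),
# many phosphorylation sites (instances) per substrate.
SCHEMA = """
CREATE TABLE substrate (
    substrate_id INTEGER PRIMARY KEY,          -- pkey
    accession    TEXT NOT NULL UNIQUE,
    name         TEXT NOT NULL,
    species      TEXT
);
CREATE TABLE instance (
    instance_id       INTEGER PRIMARY KEY,     -- pkey
    substrate_id      INTEGER NOT NULL
                      REFERENCES substrate(substrate_id),  -- fkey
    position          INTEGER NOT NULL,
    residue           TEXT NOT NULL CHECK (residue IN ('S', 'T', 'Y')),
    flanking_sequence TEXT,
    kinase            TEXT,
    binding_domain    TEXT,
    pubmed_id         TEXT
);
"""


def build_demo_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce fkeys by default
    conn.executescript(SCHEMA)
    return conn


conn = build_demo_db()
# Placeholder values, not real database entries.
conn.execute(
    "INSERT INTO substrate (substrate_id, accession, name, species) "
    "VALUES (1, 'ACC0001', 'DemoProtein', 'Homo sapiens')"
)
conn.execute(
    "INSERT INTO instance (substrate_id, position, residue, kinase) "
    "VALUES (1, 5, 'S', 'DemoKinase')"
)
```

Keeping instances in their own table lets one protein carry any number of sites, each with its own kinase, binding domain and literature reference.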
The database can be searched by protein name (for the substrate), by kinase name to get a list of known substrates, or by phosphopeptide-binding domain to retrieve all instances interacting with the given domain. An example of a search output is given in Fig. 2.
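The three search modes can be sketched as simple filters over instance records. The `Instance` class, the `search` and `substrates_of` functions, and the demo records below are hypothetical and exist only for illustration; the actual web interface queries the PostgreSQL database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instance:
    substrate: str
    position: int
    residue: str
    kinase: Optional[str] = None
    binding_domain: Optional[str] = None


# Placeholder records, not real database entries.
DEMO = [
    Instance("SubstrateA", 10, "Y", kinase="KinaseX", binding_domain="SH2"),
    Instance("SubstrateA", 42, "S", kinase="KinaseY"),
    Instance("SubstrateB", 7, "Y", kinase="KinaseX", binding_domain="SH2"),
    Instance("SubstrateC", 99, "T"),
]


def _matches(value, wanted):
    """Case-insensitive equality; a criterion of None matches everything."""
    if wanted is None:
        return True
    return value is not None and value.lower() == wanted.lower()


def search(records, substrate=None, kinase=None, binding_domain=None):
    """Return the instances that satisfy every criterion that was given."""
    return [
        r for r in records
        if _matches(r.substrate, substrate)
        and _matches(r.kinase, kinase)
        and _matches(r.binding_domain, binding_domain)
    ]


def substrates_of(records, kinase):
    """List the distinct substrates recorded for a kinase, sorted by name."""
    return sorted({r.substrate for r in search(records, kinase=kinase)})
```

For example, `substrates_of(DEMO, "KinaseX")` lists the substrates of one kinase, while `search(DEMO, binding_domain="SH2")` returns every instance annotated with that binding domain.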
Figure 2. A) Scheme for the PI3Kp85 protein with domains and phosphorylation sites. B) Output example of keyword search using PI3Kp85. Information about the phosphorylated sites includes the flanking sequence, the PubMed reference, the kinase responsible for the ...