The UbiProt database is intended as a tool for several purposes, chiefly the identification of proteins post-translationally modified by ubiquitin in high-throughput studies. Computational biology (e.g. comparative and evolutionary analysis of protein ubiquitylation) is another promising field of application for the database. The collected dataset will provide insight into the ubiquitin-dependent mechanisms controlling essential cellular events and, in the future, may help in developing new compounds for treating ubiquitin-associated human diseases.
At present, information on about 400 individual proteins from different organisms has been collected, and this work is still in progress. Our group continuously analyses a broad array of literature (approximately 40 journal articles per week) to collect new information on target proteins, identified ubiquitylation sites, structures of multi-ubiquitin chains and features of the ubiquitylation machinery. The corresponding fields are continuously renewed and new information blocks are added, especially those concerning the biological function of ubiquitylation. The database is updated as soon as suitable, reliable information becomes available. We also plan to extend the dataset with a detailed description of upstream and downstream components of the ubiquitin system, together with comprehensive information on the domain structure of target proteins, including ubiquitin-binding domains (reviewed in [12
UbiProt is more convenient for finding information about ubiquitylated proteins than more general databases. Although databases such as Swiss-Prot and the Human Protein Reference Database [13] also contain data on ubiquitylation, retrieving the information of interest from them poses several problems. First, anyone trying to obtain information about ubiquitylation from Swiss-Prot using the SRS system [14] will have difficulty formulating a query, because SRS handles complex queries poorly [15]. A query has to be formulated very precisely, so at least one relevant entry must be found before the main search. In addition, even precise queries work well only for proteins with identified ubiquitylation sites. The number of proteins with unknown ubiquitylation sites is much larger, and for these the query formulation rules may differ. Another problem is search redundancy: a search for ubiquitylated proteins in Swiss-Prot will most likely return not only the ubiquitylated proteins themselves but also numerous enzymes of the ubiquitylation cascade.
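The redundancy and site-dependence problems described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `Entry` record, its fields, the placeholder entry names and the site label are hypothetical and do not reflect the actual Swiss-Prot, SRS or UbiProt data models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified database record (not a real Swiss-Prot schema)."""
    name: str
    annotation: str        # free text, as a keyword search would see it
    is_substrate: bool     # True if the protein itself is ubiquitylated
    sites: tuple = ()      # known ubiquitylation sites; empty if unknown


# Placeholder entries; names and site labels are invented for the example.
ENTRIES = [
    Entry("substrate_with_site", "ubiquitylated at a known lysine", True, ("K123",)),
    Entry("substrate_without_site", "undergoes ubiquitylation; site unknown", True),
    Entry("cascade_enzyme", "catalyses ubiquitylation of target proteins", False),
    Entry("unrelated_protein", "involved in glycolysis", False),
]


def keyword_search(entries, keyword):
    """Naive free-text search: matches substrates and cascade enzymes alike."""
    return [e for e in entries if keyword in e.annotation.lower()]


def substrates_only(entries):
    """Keep only proteins that are themselves ubiquitylated."""
    return [e for e in entries if e.is_substrate]


hits = keyword_search(ENTRIES, "ubiquityl")      # redundant result set
curated = substrates_only(hits)                  # enzymes removed
with_sites = [e for e in curated if e.sites]     # what a site-based query finds

print([e.name for e in hits])
print([e.name for e in curated])
print([e.name for e in with_sites])
```

In this toy setting the keyword search also returns the cascade enzyme, while a query restricted to annotated sites misses the substrate whose site is unknown. A dedicated resource sidesteps both issues by recording the substrate status explicitly.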
Besides the search problems mentioned above, the established databases lack data. Many known ubiquitylation substrates do not appear as ubiquitylated proteins in Swiss-Prot (e.g. p53, BRCA1, MDM2, Ymer and others), and for many other proteins the precise ubiquitylation sites are not designated. The Human Protein Reference Database also cannot serve as a comprehensive reference source for ubiquitylation, because it contains data on only 18 ubiquitylated proteins, without details of the ubiquitylation type or the respective enzymes. This may be due to insufficient use of data from proteomic studies dedicated to ubiquitylation. Only two recent papers presenting results of high-throughput analysis [4] are currently reviewed in Swiss-Prot. Our dataset also draws on the other proteomic studies published so far [6
UbiProt aims to collect data from a number of resources, including the cited databases, and to make the information easily accessible after validation and annotation. It is more specific and more comprehensive than general sequence databases such as Swiss-Prot.
All scientists working on protein ubiquitylation are encouraged to collaborate in keeping the database up to date by submitting additional information and comments. A downloadable Excel form can be used for submitting new data.