The amount and diversity of data generated in proteomics experiments are very large, regardless of the workflow used. Automated systems to handle and process such data are therefore an important topic in bioinformatics today. Several systems have been developed with that goal. However, few of them allow experimental conditions to be recorded for every step used in protein identification. In this paper we present PRODIS, the Proteomics Data Integrated System, an integrated data management system designed to store both unprocessed (raw) and analyzed data from a proteomics workflow, with support for experiment tracking and project management. The experiment tracking feature connects the information produced by different types of experiments so that the researcher is presented with a picture of the experimental process, including all steps and conditions used. This feature makes it easier to access data that are related to each other and to specific experiments, increasing the reliability of the analysis.
The focus of PRODIS is to manage and store unprocessed data and experimental conditions, and to help researchers track the data generated by individual experiments and how these data relate to other experimental data, even before the experimental results have been completely processed. Because it was designed to manage rather than analyze data, PRODIS is complementary to other proteomics tools such as PRIDE, ProSe and MASPECTRAS. These systems manage data on proteomic identification, focusing especially on mass spectrometry and protein identification. The main concern of PRODIS is the storage of experimental data and of information on how the experiments were performed, regardless of the experiment type.
The scope of PRIDE is mainly the information on peptides and proteins identified by the user, together with the associated data and metadata: the sequence and coordinates of each peptide within the protein for which it provides evidence, any post-translational modifications, with coordinates given relative to the specific peptide on which they were found, the instrumentation used to perform the analysis, and the processed peak lists supporting the identifications. PRODIS, on the other hand, works as a system complementary to PRIDE. It stores general information on the conditions and protocols used in the identification experiments from sample extraction onward, providing a snapshot of the experiments performed and their relationships. This makes it possible to show the most successful experimental conditions and to point out possible mistakes, not only for MS experiments but also for the other steps in the proteomic analysis.
ProSe is similar to PRODIS in that it handles LC, MS and 2D-PAGE data, and it supports sample tracking. However, sample tracking only shows which experiments have been generated from a given sample; it cannot associate all related experiments, as the experiment tree constructed by PRODIS does. Using PRODIS it is possible to identify the experiments related to any experiment performed, not only to the initial sample. For example, it is possible to identify which spots generated an m/z list in an MS experiment, along with all data associated with it, such as gel images or files. The main strength of PRODIS is that it helps the researcher in the initial steps of proteomic analysis by storing experimental data and experiment attributes independently of the protein they relate to, and by allowing all information to be retrieved regardless of the analysis step being performed.
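The idea of an experiment tree can be illustrated with a minimal sketch. It assumes that each experiment records the single experiment it was derived from; the class names, field names and identifiers below are hypothetical and are not taken from the PRODIS schema or code base.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Experiment:
    """One node in a hypothetical experiment tree (not the PRODIS schema)."""
    exp_id: str
    kind: str                          # e.g. "sample", "2D-PAGE", "spot", "MS"
    parent_id: Optional[str] = None    # experiment this one was derived from
    files: List[str] = field(default_factory=list)


class ExperimentTree:
    """Stores experiments and the parent/child links between them."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Experiment] = {}
        self._children: Dict[str, List[str]] = {}

    def add(self, exp: Experiment) -> None:
        if exp.parent_id is not None and exp.parent_id not in self._nodes:
            raise KeyError(f"unknown parent: {exp.parent_id}")
        self._nodes[exp.exp_id] = exp
        self._children.setdefault(exp.exp_id, [])
        if exp.parent_id is not None:
            self._children[exp.parent_id].append(exp.exp_id)

    def ancestors(self, exp_id: str) -> List[Experiment]:
        """Walk upward from an experiment to the initial sample."""
        chain = []
        current = self._nodes[exp_id].parent_id
        while current is not None:
            node = self._nodes[current]
            chain.append(node)
            current = node.parent_id
        return chain

    def descendants(self, exp_id: str) -> List[Experiment]:
        """Collect every experiment derived from the given one (depth-first)."""
        found = []
        stack = list(reversed(self._children[exp_id]))
        while stack:
            node = self._nodes[stack.pop()]
            found.append(node)
            stack.extend(reversed(self._children[node.exp_id]))
        return found


def build_example_tree() -> ExperimentTree:
    """Sample -> gel -> spot -> MS run, with made-up identifiers and file names."""
    tree = ExperimentTree()
    tree.add(Experiment("S1", "sample"))
    tree.add(Experiment("G1", "2D-PAGE", parent_id="S1", files=["gel1.tif"]))
    tree.add(Experiment("P7", "spot", parent_id="G1"))
    tree.add(Experiment("M3", "MS", parent_id="P7", files=["m3_mz_list.txt"]))
    return tree
```

With such a structure, `build_example_tree().ancestors("M3")` walks from the MS run back through the spot and the gel to the sample, while `descendants("S1")` corresponds to plain sample tracking. Starting the walk from any node, rather than only from the sample, is what distinguishes the tree view from sample tracking in this sketch.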
MASPECTRAS focuses on mass spectrometry, and PRIDE on protein identification. Even though these systems offer some functionality for managing other types of experiments, PRODIS makes it simpler and more efficient to track the complete proteomics experimental process. ProSe handles more types of experiments, but its sample tracking functionality is not as comprehensive as the PRODIS experiment tree.
PRODIS uses a web server as the interface for data capture. Simple forms written in PHP are made available for data entry. The researcher uses this interface to upload the result files generated directly by the experiment. The files are processed by parsers that extract the data directly from the experiment results. A web-based interface makes data capture more portable, since users do not need to install any specific software to access the database for data import and export. This approach also makes data import and storage faster and less error-prone than manual input: the files are imported automatically, processed by the parsers and inserted directly into the database, without manual steps that could introduce typos or misnamed attributes. This is also an advantage over ProSe, which has a complex interface and, even after installation, may require the installation and use of “plugins” to perform some of its functions. Efficient use of ProSe must be preceded by an installation and/or customization step that requires technical expertise. PRODIS, on the other hand, although less flexible, is simpler to use. After installation, which consists of copying the PHP scripts to a directory accessible to the web server, accessing a web page is all that is required.
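The parse-then-insert flow described above can be sketched as follows. This is an illustration only: PRODIS itself is implemented in PHP, and the file layout (two whitespace-separated columns, m/z and intensity), the table name `peak` and the use of SQLite are assumptions made for the example, not details of the actual system.

```python
import sqlite3
from typing import Iterable, List, Tuple


def parse_peak_list(lines: Iterable[str]) -> List[Tuple[float, float]]:
    """Parse 'm/z intensity' pairs, skipping blank lines and '#' comments.

    The two-column layout is an assumed format, not a PRODIS specification.
    """
    peaks = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 columns, got {len(parts)}"
            )
        peaks.append((float(parts[0]), float(parts[1])))
    return peaks


def import_peaks(
    conn: sqlite3.Connection,
    experiment_id: str,
    peaks: List[Tuple[float, float]],
) -> int:
    """Insert parsed peaks for one experiment; returns the number of rows."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS peak ("
            "experiment_id TEXT NOT NULL, "
            "mz REAL NOT NULL, "
            "intensity REAL NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO peak (experiment_id, mz, intensity) VALUES (?, ?, ?)",
            [(experiment_id, mz, intensity) for mz, intensity in peaks],
        )
    return len(peaks)


def demo() -> int:
    """Import a small in-memory 'upload' and return the stored row count."""
    conn = sqlite3.connect(":memory:")
    uploaded = ["# m/z intensity", "842.51 1200.0", "", "1045.56 860.5"]
    import_peaks(conn, "M3", parse_peak_list(uploaded))
    return conn.execute("SELECT COUNT(*) FROM peak").fetchone()[0]
```

Because the parser rejects malformed lines before anything is written, and the insert runs inside a single transaction, a bad upload leaves the database unchanged. This is one way to obtain the reduced error rate that automatic import is meant to provide over manual entry.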
PRODIS has been designed to store data from experiments on several different organisms. At the moment, data from proteomic studies of S. viridicornis and S. mansoni are being collected to feed the database. These experiments have been performed in different laboratories by different research groups, demonstrating the usefulness of PRODIS, which can support proteomics research across a large scientific community. Moreover, this demonstrates the capability of the database to store data in different formats and from different research groups, which also underlines its flexibility.