J Biomol Tech. 2010 September; 21(3 Suppl): S57.
PMCID: PMC2918002

Analysis and Interpretation of Multiple Proteomic Datasets: Biologically Relevant Information Obtained in Less than 3 Hours

C. Ingrell, K. Pihakari, and O. Vorm
Proxeon, Denmark



A growing issue in proteomics is data interpretation, in particular the time-consuming steps of analyzing proteins and peptides identified by database search engines, extracting biologically meaningful information, and sharing results with co-workers and collaborators. This poster shows the application of a bioinformatics tool that was used to manage protein and peptide lists, put them into a biological context, and enable the results to be shared directly. Within 3 hours, output generated by protein database search engines from multiple proteomic datasets was translated into biological information and shared with collaborators.
The bioinformatics tool, ProteinCenter (Proxeon), uses biological annotations from multiple resources to produce a biologically relevant overview in large-scale proteomics studies. ProteinCenter consolidates public sequence databases into a comprehensive and consistent superset of 12 million protein sequences derived from over 50 million protein records from GenBank, RefSeq, EMBL, UniProt, Swiss-Prot, TrEMBL, PIR, IPI, PDB, Ensembl, etc., including more than 5 million outdated accession numbers. The ProteinCenter database is built using Sun Java technology and Microsoft mySQL database technology for optimal performance. Here we present the bioinformatic analysis and comparison of proteomics datasets derived from the PRIDE database, including an organ-specific proteome map for Arabidopsis thaliana and HUPO projects. Protein identifications were clustered using ProteinCenter algorithms based on indistinguishable proteins or sequence homology.
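As an illustration of clustering by indistinguishable proteins, the following minimal Python sketch groups protein accessions that are supported by exactly the same set of identified peptides. It is not ProteinCenter's algorithm, which is not described here; the function name `group_indistinguishable`, the accessions, and the peptide strings are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def group_indistinguishable(protein_peptides):
    """Group accessions whose identified peptide sets are identical.

    protein_peptides: dict mapping accession -> iterable of peptide sequences.
    Returns a sorted list of sorted accession lists, one list per group.
    """
    groups = defaultdict(list)
    for accession, peptides in protein_peptides.items():
        # frozenset ignores peptide order and duplicates, so two proteins
        # land in the same group only if their peptide evidence is identical.
        groups[frozenset(peptides)].append(accession)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical input: P1 and P2 share the same peptide evidence.
example = {
    "P1": ["PEPTIDEA", "PEPTIDEB"],
    "P2": ["PEPTIDEB", "PEPTIDEA"],
    "P3": ["PEPTIDEC"],
}
clusters = group_indistinguishable(example)
```

Here `clusters` places P1 and P2 in one group and P3 in its own. Clustering by sequence homology would require a sequence comparison step and is not covered by this sketch.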
The results of ProteinCenter data processing will be presented, including statistical analysis of over- and under-represented features such as Gene Ontology categories, Pfam-annotated proteins, signal peptide proteins, transmembrane-annotated proteins, enzymes, involvement in KEGG pathways, and others.
A bioinformatics tool that gives access to all major protein repositories enabled analysis and interpretation of multiple large-scale proteomics datasets, sourced directly from search engines, to be performed within 2-3 hours.
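The statistical analysis of over- and under-represented features mentioned above is commonly done with a hypergeometric test. The sketch below shows that test in Python under stated assumptions: the abstract does not say which test ProteinCenter applies, and the function names and the counts in the example are hypothetical.

```python
from math import comb


def over_representation_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated proteins in a dataset.

    N: proteins in the reference set, K: reference proteins with the annotation,
    n: proteins in the dataset, k: dataset proteins with the annotation.
    """
    total = comb(N, n)
    # comb returns 0 for impossible draws, so no extra bounds checks are needed.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def under_representation_p(k, n, K, N):
    """P(X <= k): chance of seeing at most k annotated proteins in a dataset."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(0, min(k, n, K) + 1))
    return hits / total


# Hypothetical counts: 10 reference proteins, 5 annotated, dataset of 5, all 5 annotated.
p_over = over_representation_p(5, 5, 5, 10)
```

With many annotation categories tested at once, the resulting p-values would normally need a multiple-testing correction, which this sketch omits.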

