J Biomol Tech. 2007 February; 18(1): 4.
PMCID: PMC2291831

P9-T ProteinExtractor—From Peptide ID to Protein ID

Abstract

In proteomics workflows, proteins are often digested first; the resulting peptides are then separated (e.g., by 2D-LC) and identified by mass spectrometry. In this process, the assignment of peptides to proteins is lost and has to be rebuilt by bioinformatic methods. We present ProteinExtractor, a module of the ProteinScape Bioinformatics Platform, which uses an empirical, iterative method to derive minimal protein lists from peptide search results, which may come from different search algorithms or different MS datasets.
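
The idea of iteratively deriving a minimal protein list can be illustrated with a small sketch. The abstract does not describe ProteinExtractor's actual rules, so the code below substitutes a generic greedy selection: in each round it picks the candidate protein that explains the most still-unexplained peptides. The function name, the input layout (protein ID mapped to a set of peptide sequences), and the alphabetical tie-breaking are all assumptions made for illustration.

```python
def minimal_protein_list(protein_to_peptides):
    """Greedy sketch (not ProteinExtractor's rules): iteratively select
    proteins until every identified peptide is explained."""
    # Hypothetical input: {protein_id: iterable of peptide sequences}
    candidates = {p: set(peps) for p, peps in protein_to_peptides.items()}
    unexplained = set()
    for peps in candidates.values():
        unexplained |= peps

    selected = []
    while unexplained:
        # Pick the protein covering the most unexplained peptides;
        # iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(candidates),
                   key=lambda p: len(candidates[p] & unexplained))
        selected.append(best)
        unexplained -= candidates[best]
    return selected


example = {
    "P1": {"AAK", "BBR", "CCK"},
    "P2": {"BBR", "CCK"},   # subset of P1, adds no new evidence
    "P3": {"DDR"},
    "P4": {"CCK", "DDR"},
}
print(minimal_protein_list(example))
```

In this toy input, P2 is never selected because all of its peptides are already explained by P1, which is the kind of redundancy a minimal list is meant to remove.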

ProteinExtractor uses an iterative approach to generate a minimal protein list. With composite database searches, ProteinExtractor also allows the false-positive rate of the protein list to be measured. Two datasets were analyzed: a test dataset (five recombinant proteins, 408 spectra, Bruker Ultraflex) and a real-life dataset (200,410 LC/ESI-MS/MS spectra, Bruker Esquire HCT-Ultra, and 11,619 LC/MALDI-MS/MS spectra, Bruker Ultraflex, both obtained from an analysis of proteins from the human cell line SW480).
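
How a composite database search can yield a false-positive rate may be sketched as follows, assuming the composite database contains real (target) entries plus decoy entries, and that decoy hits are recognizable by an ID prefix. The `DECOY_` prefix, the function name, and the choice of estimator (decoy hits divided by target hits) are assumptions; the abstract does not state which formula ProteinExtractor uses, and other conventions exist.

```python
def estimated_false_positive_rate(protein_ids, decoy_prefix="DECOY_"):
    """Estimate the false-positive rate of a protein list as
    (number of decoy hits) / (number of target hits).

    The decoy prefix is a hypothetical naming convention.
    """
    decoys = sum(1 for pid in protein_ids if pid.startswith(decoy_prefix))
    targets = len(protein_ids) - decoys
    if targets == 0:
        raise ValueError("no target proteins in the list")
    return decoys / targets


# Toy protein list: 40 target entries and 1 decoy entry.
protein_list = [f"PROT_{i:03d}" for i in range(40)] + ["DECOY_PROT_007"]
rate = estimated_false_positive_rate(protein_list)
print(f"estimated false-positive rate: {rate:.3f}")
```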

The most probable protein sequence entries contained in the test dataset were identified through intensive manual data interpretation by several mass spectrometry experts. With standard search algorithms, the correct protein sequence database entries were scattered over the first 171 protein ranks. Together with application specialists, we developed a set of rules to define a minimal protein list containing only those proteins (and isoforms) that can be unequivocally distinguished on the basis of MS/MS data. With these rules applied, the five correct proteins were ranked within the top eight protein candidates.

In the real-life dataset, the peptide search results of Mascot, Sequest, Phenyx, and ProteinSolver were merged using ProteinExtractor. Merging the results of all four search algorithms identified over 50% more proteins than using Mascot alone (with a false-positive rate of less than 2.5%). Merging the ESI and MALDI data identified another 25% more proteins.
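
Merging peptide search results from several algorithms can be pictured as taking the union of their peptide identifications while recording which algorithm supports each one. The sketch below is a simplified illustration only: the input layout, the function name, and the toy peptide sequences are assumptions, and real result files from Mascot, Sequest, Phenyx, or ProteinSolver would first need parsing and score filtering, which is not shown.

```python
from collections import defaultdict


def merge_search_results(results_by_engine):
    """Union peptide identifications across search engines.

    Hypothetical input: {engine_name: iterable of peptide sequences}.
    Returns {peptide: sorted list of engines that reported it}.
    """
    support = defaultdict(set)
    for engine, peptides in results_by_engine.items():
        for peptide in peptides:
            support[peptide].add(engine)
    return {pep: sorted(engines) for pep, engines in support.items()}


# Toy results; the sequences are invented for illustration.
toy_results = {
    "Mascot": ["AAK", "BBR"],
    "Sequest": ["BBR", "CCK"],
    "Phenyx": ["CCK", "DDR"],
}
merged = merge_search_results(toy_results)
mascot_only = {p for p, engines in merged.items() if "Mascot" in engines}
print(len(merged), "peptides after merging;", len(mascot_only), "from Mascot alone")
```

The merged peptide set can then serve as the input to protein list generation, which is where additional peptides found by only one algorithm can translate into additional protein identifications.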


Articles from Journal of Biomolecular Techniques : JBT are provided here courtesy of The Association of Biomolecular Resource Facilities