J Biomol Tech. 2007 February; 18(1): 1.
PMCID: PMC2291883

P1-S On the Manipulation and Comparison of Protein and Peptide Identification Results from MS Data: Walking on Eggs in the Format Jungle


A number of software tools for identifying proteins and peptides from MS data are available to proteomics researchers. They all share a common function: they process MS data and report the peptides and proteins that best match the input. Even when the comparison is restricted to sequence search engines, one observes heterogeneity in approaches, algorithms, input parameters, use of the available sequence databases, output information (scores, confidence levels, details of interpretation, etc.), and options for exporting results. The results obtained from different tools also vary in both content and form. Helping laboratory researchers manipulate results obtained from replicate analyses, or from submissions made to multiple search engines, is a challenge for bioinformatics.

Here we present our approach to representing different MS/MS identification results side by side. We describe some of the difficulties of obtaining appropriate exports from different search engines and of mapping the information they provide so that it can be aligned in a single interface. We address questions such as: which export format from each tool is most useful for aligning results; how to align proteins and peptides coming from two different sequence databases (NCBInr and SwissProt, for instance); how to interpret protein grouping in separate queries; and how to recognize that two proteins are the same when the sequence is not present in the result or when the database identifiers differ.
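As a minimal sketch of the alignment problem described above, and not the method implemented in any of the tools named in this abstract, the following Python code aligns peptide hits from two search engines by peptide sequence and pairs protein entries that carry different database identifiers but share peptides. The engine names, accessions, peptide sequences, and scores are invented for illustration, the row layout is a hypothetical simplification of a real export, and the code assumes that a higher score is better within each engine.

```python
from collections import defaultdict

# Hypothetical, simplified export rows:
# (engine, protein accession, peptide sequence, score).
# All values are invented; real exports differ in layout and scoring.
hits = [
    ("EngineA", "SP_0001", "PEPTIDEK", 55.2),
    ("EngineA", "SP_0001", "SAMPLER", 61.0),
    ("EngineB", "NR_0001", "PEPTIDEK", 3.1),
    ("EngineB", "NR_0001", "SAMPLER", 2.8),
    ("EngineB", "NR_0002", "TESTPEPTIDER", 2.2),
]


def align_peptides(rows):
    """Return {peptide: {engine: best score}} for a side-by-side view.

    Assumes a higher score is better within each engine; scores from
    different engines are not comparable and are only shown together.
    """
    table = defaultdict(dict)
    for engine, _accession, peptide, score in rows:
        key = peptide.upper()
        previous = table[key].get(engine)
        if previous is None or score > previous:
            table[key][engine] = score
    return dict(table)


def match_proteins(rows, engine_a, engine_b):
    """Pair accessions from two engines that share at least one peptide.

    This is a stand-in for identifier mapping when accessions come from
    different databases and the protein sequence is not in the export.
    """
    peptides = defaultdict(set)
    for engine, accession, peptide, _score in rows:
        peptides[(engine, accession)].add(peptide.upper())

    pairs = []
    for (eng_a, acc_a), set_a in peptides.items():
        if eng_a != engine_a:
            continue
        for (eng_b, acc_b), set_b in peptides.items():
            if eng_b != engine_b:
                continue
            shared = set_a & set_b
            if shared:
                pairs.append((acc_a, acc_b, sorted(shared)))
    return sorted(pairs)


aligned = align_peptides(hits)
protein_pairs = match_proteins(hits, "EngineA", "EngineB")
```

Matching on shared peptides is only a heuristic: two distinct proteins can share peptides, so a real comparison would also need the identifier and grouping information discussed above.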

As an illustrative example, we show how we convert outputs from Phenyx, Mascot, Sequest, or X!Tandem for use in the Phenyx result comparison feature and beyond. We also show how this effort will contribute to and profit from the development of AnalysisXML, a HUPO PSI standard XML format for capturing protein and peptide identification results.

