J Biomol Tech. 2010 September; 21(3 Suppl): S12.
PMCID: PMC2918209

Assessing and Interpreting Protein Identifications



In the first half of the previous decade, protein identification using mass spectrometry went through an arms race in which papers claimed progressively higher numbers of proteins. While well intended, the race was flawed in that reports failed to reflect the true number of protein species monitored. Counts were inflated, often many-fold, because similar protein database entries that shared some or all of their supporting evidence were reported as separate identifications. Formal protein inference or ‘protein grouping’ algorithms were developed to constrain reporting to the minimal set of proteins necessary to explain the observed peptide identification evidence. Although these algorithms emerged early on, it took journal publication guidelines to enforce their regular use. By present convention, the number of detected proteins is taken to be the number of protein groups, each group having significant independent supporting evidence; multiple database entries are listed within a protein group to indicate ambiguity among sequences that explain the same supporting evidence equally well. Although these details may seem merely semantic, their importance becomes obvious as soon as you try to do anything with a list of identifications. A basic but critical analysis such as comparing results across multiple samples cannot be done correctly without proper protein reporting. For example, in biomarker research, poor protein reporting can cause results from multiple samples to align poorly, which translates into a protein feature table with an artificially high level of ‘incomplete data’, that is, cases where an analyte does not have a measurement in all samples. Improper treatment of these issues may be partly responsible for the perception of reproducibility problems that dogs proteomics today.
This talk will briefly consider the basic concepts of protein inference in bottom-up proteomics workflows and then focus on practical considerations in how best to obtain and work with protein identifications.
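
To make the idea of protein grouping concrete, the following is a minimal sketch, not the method of any particular software tool. It assumes the input is a simple mapping from database entry to the set of peptides identified for it; the function name `group_proteins` and the example accessions and peptide labels are hypothetical. Entries with identical peptide evidence are merged into one group, and a greedy set-cover pass then selects groups until every peptide is explained. Greedy selection only approximates the minimal set, and real tools also weigh peptide confidence, which is ignored here.

```python
from collections import defaultdict


def group_proteins(protein_to_peptides):
    """Return a list of (group_members, newly_explained_peptides) tuples.

    protein_to_peptides: dict mapping a database entry name to an
    iterable of peptide identifiers observed for that entry.
    """
    # Step 1: entries with identical peptide evidence are indistinguishable,
    # so they are merged into a single candidate group.
    by_evidence = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        by_evidence[frozenset(peptides)].append(protein)

    # Drop entries with no evidence and sort members for stable output.
    candidates = {ev: sorted(members)
                  for ev, members in by_evidence.items() if ev}

    # Step 2: greedy set cover. Repeatedly report the group that explains
    # the most still-unexplained peptides (ties broken by member names).
    unexplained = set().union(*candidates)
    groups = []
    while unexplained:
        best = min(candidates,
                   key=lambda ev: (-len(ev & unexplained), candidates[ev]))
        groups.append((candidates[best], sorted(best & unexplained)))
        unexplained -= best
        del candidates[best]
    return groups


# Hypothetical example: five database entries, but only two groups have
# independent supporting evidence.
example = {
    "P1": {"pepA", "pepB", "pepC"},
    "P2": {"pepA", "pepB", "pepC"},  # same evidence as P1
    "P3": {"pepB", "pepC"},          # subset of P1's evidence
    "P4": {"pepD", "pepE"},
    "P5": {"pepE"},                  # subset of P4's evidence
}

for members, peptides in group_proteins(example):
    print(members, peptides)
```

In this example a naive count would report five proteins, whereas the grouped report contains two: one group listing P1 and P2 as ambiguous alternatives, and one containing P4. Entries P3 and P5 are not reported because their evidence is fully explained by other groups.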

