The field of proteomics has used mass spectrometry (MS) techniques to provide qualitative results that describe the protein complement of complex protein samples [1]. Researchers also use modifications of these MS technologies for the quantitative analysis of proteins in complex samples [1], and often hundreds to thousands of proteins are quantified per experiment. Some quantitative techniques involve peptide isotopic labeling [4]. In contrast, label-free techniques have focused on analysis of MS/MS peak heights or observed peptide spectral count information [9]. Peptides are produced in an enzymatic digestion of the protein mixture, often using trypsin, which generally cleaves the proteins at the C-terminus of lysine or arginine amino acid residues [13].
Spectral counting techniques typically infer the relative quantity of a protein by counting the number of MS-detected tryptic peptides associated with the protein being quantified as a fraction of all observed peptide counts. However, spectral counting can be confounded by the fact that the likelihood of peptide detection by MS can vary greatly from one peptide to another, depending on the physicochemical properties of the peptide sequence. These properties can affect final MS detection in several ways: they influence how well peptides are recovered during the cation-exchange and reversed-phase LC stages of sample preparation, how efficiently a peptide ionizes in the ion source of a particular MS instrument, and how the peptide behaves during mass analysis in MS and MS/MS modes [9]. Properties such as peptide length, mass, amino acid composition, solubility, and net charge can all impact peptide detection. This variability in peptide detection can lead to errors in assessing the abundance of the parent protein that produced the tryptic peptides.
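As a point of reference, basic spectral counting as described above can be sketched in a few lines of Python. This is only an illustration, not code from any particular tool: the function name `spectral_count_fractions` and the example counts are hypothetical.

```python
def spectral_count_fractions(counts):
    """Return each protein's share of all observed peptide spectral counts.

    `counts` maps a protein identifier to its observed spectral count.
    The identifiers and numbers used below are made up for illustration.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no spectral counts were observed")
    return {protein: count / total for protein, count in counts.items()}


# Hypothetical observed spectral counts for three proteins.
observed_counts = {"protA": 30, "protB": 10, "protC": 10}
fractions = spectral_count_fractions(observed_counts)
```

Note that this estimate treats every peptide as equally detectable, which is exactly the assumption that variable peptide detectability violates.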
Lu et al. [16] have described a novel technique for protein quantitation, Absolute Protein Expression measurements (APEX), in which machine learning techniques are used to improve quantitation results over basic spectral counting. In the APEX technique, a supervised classification algorithm is used to predict the probability of peptide detection by MS based on the peptide's physicochemical properties. For each protein in the sample, the expected number of peptide observations (spectral counts) is computed based on the predicted MS detectability of the corresponding tryptic peptides. In other words, the computationally predicted (expected) spectral counts account for the variable peptide detection probabilities related to peptide physicochemical properties and the specific MS technology in use.
More formally, the APEX technique, given by equation 1 [16], is a modified spectral counting method in which the total observed spectral count for protein i (ni) is normalized by a computationally predicted count (Oi) for one molecule of protein i. The computed values are weighted based on the protein identification probability (pi). A relative APEX score is obtained by dividing by the sum of the values for all N proteins being quantified. The user-supplied normalization factor C, typically an estimate of total protein concentration, converts the relative abundance values into absolute terms.
$$\mathrm{APEX}_i = \frac{n_i\,p_i / O_i}{\sum_{k=1}^{N} n_k\,p_k / O_k} \times C \qquad (1)$$
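The calculation described for equation 1 can be sketched as follows: each protein's observed count ni is weighted by its identification probability pi, divided by its expected count Oi, normalized by the sum of these values over all proteins, and scaled by C. This is a minimal sketch under those stated definitions, not the APEX tool's own code; the function name `apex_scores`, the input layout, and the example numbers are assumptions made for illustration.

```python
def apex_scores(proteins, c):
    """Compute APEX-style abundances from (n_i, p_i, O_i) triples.

    `proteins` maps a protein identifier to a tuple of
    (observed spectral count n_i, identification probability p_i,
     expected spectral count O_i). `c` is the normalization factor C.
    The layout of this input is an assumption for illustration.
    """
    weighted = {}
    for name, (n_i, p_i, o_i) in proteins.items():
        if o_i <= 0:
            raise ValueError(f"O_i must be positive for {name}")
        # Observed count, weighted by identification probability,
        # normalized by the expected count for one molecule.
        weighted[name] = n_i * p_i / o_i

    total = sum(weighted.values())
    if total == 0:
        raise ValueError("all weighted counts are zero")

    # Divide by the sum over all proteins, then scale by C.
    return {name: c * value / total for name, value in weighted.items()}


# Hypothetical inputs: (n_i, p_i, O_i) per protein, with C = 1000.
example = {
    "protA": (30, 1.0, 10.0),
    "protB": (10, 1.0, 5.0),
    "protC": (5, 0.8, 4.0),
}
scores = apex_scores(example, c=1000.0)
```

By construction the scores sum to C, so each value can be read as that protein's share of the total, expressed in the units of C.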
APEX abundance estimates are absolute in the sense that they are not relative to a second dataset representing a different condition or control, as is done in some relative protein quantitation methods such as SILAC [8]. Also, the abundance estimates within a sample are normalized and can be readily compared to estimates from other samples. While a particular protein's abundance is presented relative to all proteins within the sample, multiplication by C puts the abundance values into absolute terms.
This paper describes a new software tool, the APEX Quantitative Proteomics Tool, an implementation of the APEX technique for the quantitation of proteins based on LC-MS/MS proteomics results. The main role of the tool is to compute APEX protein abundance values using equation 1; however, the tool also supports preparation of prior information, such as derivation of Oi values for the proteins under study, as well as post-processing data analysis.
The APEX tool supports three primary processing tasks, as shown in Figure 1. The first task is the construction of a training data set that relates prior peptide MS data to a set of peptide physicochemical properties; this data set is used to predict peptide MS detection probabilities. The peptide MS detection probabilities are needed to estimate expected spectral counts for each protein (Oi values). The use of prior (user-defined) MS data ensures that the later calculations reflect the specific laboratory protocols, MS instrumentation, instrument settings, the particular proteins under study, and other factors that could influence peptide detection. The training data can be an independent high-quality MS dataset or even an experimental dataset.
Figure 1. Primary Processing Tasks within the APEX Quantitative Proteomics Tool. The flowchart illustrates the three major processing tasks within the APEX tool. Processes are depicted in rounded rectangles, while data inputs are shown as rectangles.
The second processing task is the generation of an Oi value for each protein under study. This step uses the generated training data (peptide physicochemical properties and peptide MS detection calls) to build a classifier that predicts peptide detection probabilities. Each protein sequence from a supplied FASTA sequence file undergoes an in silico trypsin digestion, and each resulting peptide is assigned an MS detection probability. The probabilities for all peptides derived from protein i are summed to produce the protein's Oi value. This Oi value is the predicted peptide detection (spectral) count for one molecule of protein i.
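The Oi computation described above can be sketched as an in silico digestion followed by a sum of per-peptide detection probabilities. The sketch below makes several simplifying assumptions that are not taken from the tool: it cleaves after every lysine (K) or arginine (R), ignores missed cleavages and any other cleavage exceptions, and uses a hypothetical stand-in function, `toy_detection_probability`, in place of the trained classifier.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after every K or R (simplified trypsin rule)."""
    peptides = []
    start = 0
    for pos, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:pos + 1])
            start = pos + 1
    # Keep the C-terminal peptide if the sequence does not end in K or R.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def toy_detection_probability(peptide):
    """Hypothetical stand-in for a trained classifier's output.

    A real classifier would use physicochemical properties of the peptide;
    this placeholder only looks at length, purely for illustration.
    """
    return 0.5 if len(peptide) >= 5 else 0.0


def expected_spectral_count(sequence, detection_probability):
    """Sum per-peptide detection probabilities to obtain an O_i value."""
    return sum(detection_probability(p) for p in tryptic_digest(sequence))


# Made-up example sequence, not a real protein entry.
sequence = "MKTAYIAKQRQISFVK"
peptides = tryptic_digest(sequence)
o_i = expected_spectral_count(sequence, toy_detection_probability)
```

Passing the probability function as an argument keeps the digestion step independent of whichever classifier is eventually used to score the peptides.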
The third processing task uses the previously generated Oi values and LC-MS/MS experimental results, which provide ni and pi, to produce protein abundance values according to equation 1. These quantitation results can be piped into several post-processing tools.