Mass spectrometry is a method of choice for quantifying low-abundance proteins and peptides in many biological studies. Here, we describe a range of computational aspects of protein and peptide quantitation, including methods for finding and integrating mass spectrometric peptide peaks, and detecting interference to obtain a robust measure of the amount of proteins present in samples.
Mass spectrometry (MS)-based quantitative proteomics has been applied to solve a wide variety of biological problems, and several MS-based workflows have been developed for protein and peptide quantitation (Fig. 1). In mass spectrometric quantitation methods it is usually assumed that the measured signal has a linear dependence on the amount of material in the sample for the entire range of amounts being studied. A prerequisite for accurate quantitation is that unwanted experimental variations in sample extraction, preparation, and analysis be minimized, and it is therefore critical that each step in the workflow is optimized for reproducibility.
One way of optimizing the reproducibility is to label the samples with stable isotopes, mix them together, and perform the subsequent sample-handling steps on the mixed sample. The earlier in the workflow the stable isotope label is introduced and the samples are mixed, the smaller the effect of variations in sample handling. Metabolic labeling (1, 2) provides the earliest possible introduction of stable isotope labels into the sample (Fig. 1a). Here, labels are introduced as isotopically distinct metabolic precursors, and the samples can be mixed before all subsequent steps in the workflow. It is important to monitor the level of label incorporation; this can be done, for example, by using two heavy labels that are incorporated into the samples with equal efficiency (3). When metabolic labeling is not feasible, stable isotope labels can also be introduced later in the workflow (4–9) by heavy isotope labeling of proteins (Fig. 1b, c) or peptides (Fig. 1d–f). In general, stable isotope labels need to be designed carefully to avoid introducing systematic errors caused by dissimilar behavior of the compounds carrying different labels. For example, it has been observed that hydrogen/deuterium substitution in the heavy label can affect the retention time of the labeled peptides, whereas ¹²C/¹³C substitution has no observable effect on retention time (10).
Label-free methods (11–13) for quantitation are often used when the introduction of stable isotopes is impractical (e.g., in many animal studies) or the cost is prohibitive (e.g., in biomarker studies where a relatively large number of samples need to be analyzed). Three label-free quantitation workflows are shown in Fig. 1g–i. In these workflows the different samples are analyzed separately, and it is therefore critical that each step of the workflow is carefully optimized for reproducibility. In label-free quantitation workflows, the peptide ion peaks are usually integrated and used as a measure of quantity. This allows the quantities of proteins and peptides to be compared across samples (Fig. 1g), or the absolute quantity to be calculated using a standard curve (Fig. 1h). The peptide fragment ions can also be used for quantitation by integrating one or more of their peaks (Fig. 1i), as, for example, in Multiple Reaction Monitoring (MRM) (14). Using fragment ions for quantitation provides increased specificity because, in addition to requiring that the mass of the precursor ion be close to its predicted mass, the masses of the fragment ions are also required to be correct. Because peptides fragment in a sequence-specific manner, additional specificity can be gained by requiring that the relative intensities of the fragment ions do not deviate from the expected intensities. Alternative methods for quantitation using fragment mass spectra do not integrate peaks but are based on the results of searching protein sequence collections (see Note 1).
Currently, there are several software packages available for analysis of data from these different workflows where the quantitation is done by integrating peaks of ions that correspond to peptides or their fragments (see Note 2 for a few examples). Here, we describe how the mass spectra are processed to allow for finding the peptide peaks, detecting interference, and integrating the peaks to obtain a measure of the amount of material present in the samples.
Peptide peaks of interest for quantitation may range from smooth peaks with a large signal-to-noise ratio to noisy peaks that are barely above the background. The width of these peaks is, however, characteristic of the resolution of the mass spectrometer, the data acquisition parameters used, and the mass-to-charge ratio (m/z) of the peptide. Therefore, peaks can readily be detected by scanning the mass spectra for local maxima of the expected width (see Note 3). In addition, peptides are observed in mass spectrometry not as a single peak but as a cluster of peaks, because small amounts of stable heavy isotopes are present in nature (e.g., 1.11% ¹³C) and each peptide contains many carbon atoms. The relative intensities of the peaks in these isotope clusters are characteristic of the atomic composition of the peptides and depend strongly on the peptide mass (Fig. 2a–c, see Note 4).
A majority of quantitation experiments are performed by coupling liquid chromatography with mass spectrometry, which introduces a retention time dimension. In these experiments the same peptide is usually observed at several adjacent time points (Fig. 2d–g), with highly abundant peptides typically being observed over larger time windows than low-abundance peptides. But even with separation in both m/z and retention time, it is not uncommon to have unwanted interference between peaks from different peptides (Fig. 2e, g).
The following characteristics of peptide peaks can be used as filters to differentiate them from interfering and non-peptide peaks: (1) the width of individual peaks in m/z and retention time, (2) the intensity distribution of the isotope clusters, and (3) the measured peptide m/z. These characteristics are shown in Fig. 3 for two peptides. The width of individual peaks as a function of m/z is highly characteristic of the instrument parameters and varies very little, so a narrow peak width filter can be used. The width of individual peaks as a function of retention time (Fig. 3a–c, j–l) shows larger variation. This variation depends mainly on the peak intensity and the elution time, although strong peptide-sequence-dependent variation can also be observed, and therefore a wider filter must be applied. High-accuracy measurement of peptide mass is a sensitive and selective filter that is highly reproducible even at the tails of the peak where the intensity is low (Fig. 3g–i, p–r). The shape of the isotope distribution is also a sensitive and selective filter that can be used to detect interference from other peaks (Fig. 3d–f, m–o). A convenient measure of the similarity of isotope distributions is the dot product (see Note 5) between them (Fig. 3f, o). The dot product can be applied to compare sets containing any number of peaks, for example, to detect interference when a set of fragment ions is monitored in an MRM experiment. In the example shown in Fig. 3, dot product analysis of the chromatograms shown in the panels on the right shows that only the first isotope cluster corresponds to the peptide of sequence YVLTQPPSVSVAPGQTAR, while the second and third peaks are interfering peaks from peptides whose first three isotope peaks have a similar m/z but different relative intensities.
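The mass-accuracy and retention-time width filters described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the proton mass is a standard constant, but the function names, the tolerance in parts per million (ppm), and the allowed retention-time width range are hypothetical parameters that would have to be tuned for a given instrument and chromatography setup.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton

def theoretical_mz(monoisotopic_mass, charge):
    """m/z of the [M + zH]z+ ion for a neutral monoisotopic mass M."""
    return (monoisotopic_mass + charge * PROTON_MASS) / charge

def ppm_error(measured_mz, expected_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - expected_mz) / expected_mz * 1e6

def passes_filters(measured_mz, expected_mz, rt_width, ppm_tol, rt_width_range):
    """Narrow m/z filter combined with a deliberately wider retention-time width filter.

    ppm_tol and rt_width_range are hypothetical, instrument-specific settings.
    """
    min_width, max_width = rt_width_range
    mass_ok = abs(ppm_error(measured_mz, expected_mz)) <= ppm_tol
    width_ok = min_width <= rt_width <= max_width
    return mass_ok and width_ok
```

The asymmetry between a tight ppm tolerance and a broad width range mirrors the observation that m/z accuracy varies little while retention-time width varies with intensity, elution time, and sequence.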
The quantity of peptides is measured by calculating the height or the area of the corresponding peaks in the ion chromatograms. Careful background subtraction is essential for accurate determination of both the height and the area of peaks (see Note 6). The advantage of using the height of the peak as the measure of quantity is the simplicity and robustness of its calculation (e.g., the average or median height for a few points around the centroid can be used). The peak height is a good measure of quantity if the width of the peak does not vary between samples and the signal is strong with little noise. In contrast, the peak area is a better measure of quantity when there is substantial noise because many more data points are used, but it is much more sensitive to interference from other peaks because a larger region of the m/z and retention time space is used. The difficulty in calculating the peak area lies in deciding where the peak ends and the background starts in both the m/z and retention time dimensions. This determination can be very challenging for peaks with long tails. It is also important to use the same peak limits for a specific peptide in all samples. One way of circumventing the problem of finding the peak limits is to select a function, fit its parameters (e.g., centroid, width, skewness) to the peak, and integrate the function. However, it is often not straightforward to find a function that fits all peaks in the spectrum well.
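A minimal sketch of the two quantity measures discussed above, assuming the ion chromatogram has already been background-subtracted and that the apex index and peak limits are known. The function names are hypothetical, and the trapezoidal rule is one simple choice of integration method, not one prescribed by the text.

```python
from statistics import median

def peak_height(intensities, apex, half_window=2):
    """Median of the intensities at the apex and a few points on either side."""
    lo = max(0, apex - half_window)
    hi = min(len(intensities), apex + half_window + 1)
    return median(intensities[lo:hi])

def peak_area(times, intensities, start, end):
    """Trapezoidal integral between fixed peak limits (inclusive indices)."""
    area = 0.0
    for k in range(start, end):
        # Area of the trapezoid between consecutive points k and k + 1
        area += 0.5 * (intensities[k] + intensities[k + 1]) * (times[k + 1] - times[k])
    return area
```

Passing the same peak limits (here as indices into a common, aligned time grid) for a given peptide in every sample follows the recommendation to use identical limits across samples.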
In many quantitation studies more than one experiment (i.e., replicates and/or multiple samples) is performed. This requires the matching of the peptides quantified in the different experiments. For successful matching of peptides, the retention time scales of all experiments have to be aligned, because there are always uncontrolled variations in the experimental conditions that affect the peptide retention times in a nonlinear manner. This alignment can be done by identifying peaks present in all experiments that can be used as landmarks. These peaks are matched across experiments using either their mass and retention time, or their identity as determined by tandem MS. A smooth function is fitted to the retention times of these landmarks and used for aligning the retention times of all quantified peptides. The residual difference in retention time for the landmarks can be used to estimate the uncertainty in the alignment.
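The landmark-based alignment can be sketched as below, assuming the landmark peaks have already been matched between a run and a reference run. Using a low-degree least-squares polynomial as the smooth function is an assumption made for illustration (splines or local regression are common alternatives), and all function names are hypothetical. The root-mean-square residual at the landmarks serves as the estimate of alignment uncertainty.

```python
import math

def fit_polynomial(x, y, degree=2):
    """Least-squares polynomial fit; returns coefficients c0, c1, ..., c_degree."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum x^(i+j) and b[i] = sum y * x^i
    A = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - rest) / A[i][i]
    return coeffs

def evaluate(coeffs, t):
    return sum(c * t ** i for i, c in enumerate(coeffs))

def align(landmark_rt_run, landmark_rt_ref, degree=2):
    """Return (mapping from run RT to reference RT, RMS residual at the landmarks)."""
    # Centre the run times to keep the normal equations well conditioned
    centre = sum(landmark_rt_run) / len(landmark_rt_run)
    shifted = [t - centre for t in landmark_rt_run]
    coeffs = fit_polynomial(shifted, landmark_rt_ref, degree)
    residuals = [evaluate(coeffs, s) - r for s, r in zip(shifted, landmark_rt_ref)]
    rms = math.sqrt(sum(e * e for e in residuals) / len(residuals))
    return (lambda t: evaluate(coeffs, t - centre)), rms
```

The fit needs at least `degree + 1` landmarks with distinct retention times; in practice many more are used so that the residuals give a meaningful uncertainty estimate.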
For some mass spectrometers, the m/z scale needs to be calibrated between experiments. This mass calibration can be done using the same landmarks as used for retention time alignment. When experiments are aligned in retention time and are mass calibrated, the quantified peptides can be matched within windows determined by the uncertainty in the retention time and the m/z.
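Once retention times are aligned and m/z values are calibrated, matching within uncertainty windows might look like the following sketch. Features are hypothetical `(mz, rt)` tuples; `rt_tol` and `ppm_tol` stand for the window widths, which in practice would be derived from the alignment residuals and the mass calibration error.

```python
def match_features(features_a, features_b, rt_tol, ppm_tol):
    """Pair each feature in run A with the closest feature in run B inside both windows.

    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    for i, (mz_a, rt_a) in enumerate(features_a):
        best, best_dist = None, None
        for j, (mz_b, rt_b) in enumerate(features_b):
            ppm = abs(mz_b - mz_a) / mz_a * 1e6
            drt = abs(rt_b - rt_a)
            if ppm <= ppm_tol and drt <= rt_tol:
                # Rank candidates by their window-normalized distance
                dist = (ppm / ppm_tol) ** 2 + (drt / rt_tol) ** 2
                if best_dist is None or dist < best_dist:
                    best, best_dist = j, dist
        if best is not None:
            matches.append((i, best))
    return matches
```

This greedy version can assign two features from one run to the same feature in the other; a production tool would typically enforce one-to-one matches.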
The measured intensities of peptide peaks commonly vary from experiment to experiment in a global manner. It is therefore advisable to design experiments so that only a few of the quantified peptides have changes related to the hypothesis, while the majority of peptides change only because of random variations in the experimental conditions. The randomly changing peptides can be used to normalize the overall intensity, using either the median of their intensity ratios or a smooth, intensity-dependent function fitted to the measured intensity ratios.
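Median-based normalization can be sketched as follows, assuming peptide intensities for two runs are stored in dictionaries keyed by a peptide identifier (a hypothetical layout). Working with log2 ratios is a convenience choice; the intensity-dependent smooth-function alternative mentioned above is not shown.

```python
import math
from statistics import median

def median_normalize(intensities_a, intensities_b):
    """Scale run B so that the median log2 ratio B/A over shared peptides is zero.

    Returns (normalized intensities of B, scaling factor that was removed).
    """
    shared = [p for p in intensities_a if p in intensities_b]
    log_ratios = [math.log2(intensities_b[p] / intensities_a[p]) for p in shared]
    factor = 2 ** median(log_ratios)
    return {p: v / factor for p, v in intensities_b.items()}, factor
```

Because the median is used, a minority of peptides with genuine, hypothesis-related changes has little influence on the scaling factor.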
Protein quantity can be estimated from measured peptide quantities. There are, however, several factors that can make estimates of protein quantity uncertain even when highly accurate peptide quantities have been obtained. Because only a few peptides are typically measured for a given protein, these peptides might not be sufficient to define all isoforms of the protein present in the sample; that is, some of the peptide sequences might be shared with other proteins, making them suitable only for quantitating the group of proteins. A few peptides might also be modified, and the change in the amount of the modified and unmodified forms of the protein is often not the same. Despite these issues, a reasonable estimate of the protein quantity can often be obtained even when only a few of its peptides are quantified. When many peptides are observed for a given protein, it may even be possible to calculate the variation in quantity of several isoforms.
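One simple way to roll peptide measurements up to a protein estimate is to take the median log ratio of the peptides that map uniquely to that protein. This is an illustrative assumption rather than a method prescribed above; the dictionaries and the function name are hypothetical, and modified peptides would have to be removed from the input beforehand, for the reasons given in the text.

```python
from statistics import median

def protein_log_ratio(peptide_log_ratios, peptide_to_proteins, protein):
    """Median log ratio over peptides unique to `protein`; None if there are none.

    Shared peptides (mapping to more than one protein) are skipped, since they
    only quantify the group of proteins.
    """
    unique = [
        ratio
        for pep, ratio in peptide_log_ratios.items()
        if peptide_to_proteins.get(pep) == {protein}
    ]
    return median(unique) if unique else None
```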
The significance of a measured change in quantity can be calculated if the distribution of random quantity changes (due to uncontrolled variation of experimental conditions) is known (Fig. 4a). This distribution can be obtained by analysis of technical and biological replicates. When it is known, the significance of a measured change in quantity can be calculated by integrating under the curve from the measured change in quantity to infinity and dividing this area by the area under the entire distribution of random changes. The resulting value is the probability that a change at least as large as the measured one would arise from random variation alone, that is, the one-sided p-value for the null hypothesis that there is no real change in quantity; a small value is evidence for rejecting that hypothesis. The distribution of random quantity changes is strongly dependent on the experimental conditions and the workflow that is chosen. For example, for label-free quantitation the distribution of random quantity changes depends on the number of replicates obtained (Fig. 4b–g). It is therefore important to design quantitation experiments that minimize the width of the distribution of random quantity changes, so that small nonrandom changes can be detected.
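With a finite set of replicate-derived random changes, the integral described above can be approximated by counting. The sketch below is a one-sided empirical estimate under that assumption; `random_changes` is a hypothetical list of quantity changes (e.g., log ratios) observed between replicates.

```python
def empirical_p_value(measured_change, random_changes):
    """Fraction of random (null) changes at least as large as the measured change.

    Discrete analogue of integrating the null distribution from the measured
    change to infinity and dividing by its total area.
    """
    exceed = sum(1 for c in random_changes if c >= measured_change)
    return exceed / len(random_changes)
```

For changes in either direction, the same count can be made on absolute values. With few replicates the empirical distribution is coarse: the smallest nonzero p-value it can report is one divided by the number of random changes.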
This work was supported by funding provided by the National Institutes of Health Grants RR00862, RR022220, NS050276, and CA126485, the Carl Trygger foundation, and the Swedish research council.
1. Alternative methods for quantitation search fragment mass spectra against a protein sequence collection and use the search results for quantitation. One method uses the number of different fragment mass spectra that identify a peptide as a measure of its quantity (15). Another method calculates a measure based on the fraction of the protein sequence covered by the identified peptides (16). However, because of limited statistics, these alternative methods, which are not based on peak integration, are generally less accurate when only a few fragment spectra or peptides are observed for a given protein. On the other hand, they are less sensitive to interference and can often be more robust.
2. There are many software packages available for quantitation. A few examples of freely available software are listed below:
| Software (Ref.) | Workflow | URL |
| MaxQuant (18, 19) | SILAC | http://www.maxquant.org/ |
| Pview (21) | SILAC, label-free | http://compbio.cs.princeton.edu/pview/ |
3. Consider a mass spectrum in which $I(k)$ is the measured intensity at point $k$, with $1 \le k \le N$, where $N$ is the total number of points in the spectrum. Peaks are detected by calculating, for each point $l$ in the spectrum, the sum of the intensities over the expected peak width $w_l$,

$$S(l) = \sum_{|k - l| \le w_l/2} I(k),$$

and detecting local maxima in $S(l)$. When the spectrum contains appreciable noise, the signal-to-noise ratio is calculated as the ratio of the root mean square (RMS) of the intensities over the peak,

$$\mathrm{RMS} = \sqrt{\frac{1}{n_l} \sum_{|k - l| \le w_l/2} \left( I(k) - \hat{I} \right)^2},$$

where $\hat{I}$ is the mean intensity over the peak and $n_l$ is the number of points in the window, to the RMS of the intensities in a nearby region where there are no peaks (see Note 6).
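The peak detection and signal-to-noise calculation in this note can be sketched in Python as below. It is a simplified illustration under stated assumptions: the window width `w` is a fixed odd number of points rather than varying with m/z, windows are truncated at the spectrum edges, and the RMS is computed about the mean intensity Î, as in the definition above. Function names are hypothetical.

```python
import math

def window_sums(intensity, w):
    """S(l): sum of I(k) over a window of w points centred on each point l."""
    half = w // 2
    n = len(intensity)
    return [sum(intensity[max(0, l - half):min(n, l + half + 1)]) for l in range(n)]

def local_maxima(values):
    """Indices l where values[l] is strictly greater than both neighbours."""
    return [l for l in range(1, len(values) - 1)
            if values[l] > values[l - 1] and values[l] > values[l + 1]]

def rms_about_mean(values):
    """Root mean square deviation of the values from their mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def signal_to_noise(intensity, apex, w, noise_start, noise_end):
    """RMS over the peak window divided by RMS over a peak-free region."""
    half = w // 2
    peak = intensity[max(0, apex - half):apex + half + 1]
    noise = intensity[noise_start:noise_end]
    return rms_about_mean(peak) / rms_about_mean(noise)
```

A perfectly flat noise region has zero RMS and would make the ratio undefined, so real data would need a guard for that case.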
4. Peptides are observed as clusters of peaks in mass spectrometry because of the presence of small amounts of stable heavy isotopes in nature (e.g., 0.015% ²H, 1.11% ¹³C, 0.366% ¹⁵N, 0.038% ¹⁷O, 0.200% ¹⁸O, 0.75% ³³S, 4.21% ³⁴S, and 0.02% ³⁶S). The intensities of the isotope distribution can be calculated accurately by including all possible isotopes. The largest effect comes from ¹³C, and a first-order estimate of the relative peak intensities is given by the binomial distribution

$$T_m = \binom{n}{m} p^m (1 - p)^{n - m},$$

where $T_m$ is the intensity of peak $m$ in the distribution, $m$ is the number of ¹³C atoms, $n$ is the total number of carbon atoms in the peptide, and $p$ is the probability for ¹³C (i.e., 1.11%). The isotope distribution of peptides is strongly dependent on the peptide mass because the number of atoms increases with mass, and with it the probability of containing one or more of the naturally occurring heavy isotopes.
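The first-order ¹³C model can be evaluated directly, as in the sketch below (the function name is hypothetical; `math.comb` requires Python 3.8 or later). It ignores all isotopes other than ¹³C, so it only approximates the full distribution.

```python
from math import comb

def carbon_isotope_envelope(n_carbons, n_peaks=4, p13c=0.0111):
    """First-order isotope pattern from 13C only: T_m = C(n, m) p^m (1 - p)^(n - m)."""
    return [comb(n_carbons, m) * p13c ** m * (1 - p13c) ** (n_carbons - m)
            for m in range(n_peaks)]
```

In this model the ratio of the second to the first peak is n·p/(1 − p), so for a peptide with about 100 carbon atoms the peak containing one ¹³C is already more intense than the monoisotopic peak, illustrating the strong mass dependence noted above.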
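5. The normalized dot product between the measured intensities $\mathbf{I} = (I_1, I_2, \ldots, I_n)$ and the theoretical intensities $\mathbf{T} = (T_1, T_2, \ldots, T_n)$ of the isotope distribution is given by

$$\frac{\sum_{i=1}^{n} I_i T_i}{\sqrt{\sum_{i=1}^{n} I_i^2}\,\sqrt{\sum_{i=1}^{n} T_i^2}}.$$

The normalized dot product ranges from −1 to 1; because intensities are non-negative, in practice it lies between 0 and 1. If the measured and theoretical intensities are proportional (i.e., identical after scaling), the normalized dot product is 1, and any difference in their relative intensities results in a lower value.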
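The normalized dot product translates directly into code. In the sketch below (function name hypothetical), measured intensities that are an exact multiple of the theoretical pattern give a value of 1 up to rounding, while an interfering signal that inflates one isotope peak lowers the score.

```python
import math

def normalized_dot_product(measured, theoretical):
    """Cosine similarity between measured and theoretical intensity vectors."""
    num = sum(i * t for i, t in zip(measured, theoretical))
    den = math.sqrt(sum(i * i for i in measured)) * math.sqrt(sum(t * t for t in theoretical))
    return num / den
```

The same function applies unchanged to a set of monitored fragment-ion intensities in an MRM experiment, compared against their expected relative intensities.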
6. Low-frequency background can be removed by fitting a smooth curve to the regions of the mass spectrum where there are no peaks. This smoothing can, for example, be achieved by applying a very wide and strong smoothing function to the entire spectrum, which will result in a smooth function slightly higher than the background. Subsequently, points in the original spectrum that are far above this smooth curve (i.e., the peaks) are removed. The smoothing procedure is repeated, this time without including the peaks, to produce a smooth function that will closely follow the background of the spectrum (25).
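The two-pass background estimate can be sketched as follows. The wide centred moving average, the absolute `threshold` used to flag points as "far above" the first curve, and the function names are all assumptions made for illustration; the text does not specify the smoothing function or the exact criterion.

```python
import math

def moving_average(values, width, keep=None):
    """Centred moving average over `width` points, skipping points where keep[k] is False."""
    n = len(values)
    half = width // 2
    if keep is None:
        keep = [True] * n
    smoothed = []
    for l in range(n):
        window = [values[k] for k in range(max(0, l - half), min(n, l + half + 1)) if keep[k]]
        smoothed.append(sum(window) / len(window) if window else math.nan)
    return smoothed

def estimate_background(values, width, threshold):
    # Pass 1: wide smoothing of the whole spectrum; peaks pull it slightly above the background
    first = moving_average(values, width)
    # Treat points more than `threshold` above the first curve as peak points
    keep = [v - s <= threshold for v, s in zip(values, first)]
    # Pass 2: repeat the smoothing without the peak points
    second = moving_average(values, width, keep)
    # If every point in a window was flagged, fall back to the first-pass value
    return [s1 if math.isnan(s2) else s2 for s1, s2 in zip(first, second)]
```

Subtracting the returned curve point by point from the spectrum gives the background-corrected intensities. Real spectra may need more than two passes or a noise-scaled threshold.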