A major unmet need in LC-MS/MS-based proteomics analyses is a set of tools for quantitative assessment of system performance and evaluation of technical variability. Here we describe 46 system performance metrics for monitoring chromatographic performance, electrospray source stability, MS1 and MS2 signals, dynamic sampling of ions for MS/MS, and peptide identification. Applied to data sets from replicate LC-MS/MS analyses, these metrics displayed consistent, reasonable responses to controlled perturbations. The metrics typically displayed variations less than 10% and thus can reveal even subtle differences in performance of system components. Analyses of data from interlaboratory studies conducted under a common standard operating procedure identified outlier data and provided clues to specific causes. Moreover, interlaboratory variation reflected by the metrics indicates which system components vary the most between laboratories. Application of these metrics enables rational, quantitative quality assessment for proteomics and other LC-MS/MS analytical applications.
LC-MS/MS provides the most widely used technology platform for proteomics analyses of purified proteins, simple mixtures, and complex proteomes. In a typical analysis, protein mixtures are proteolytically digested, the peptide digest is fractionated, and the resulting peptide fractions then are analyzed by LC-MS/MS (1, 2). Database searches of the MS/MS spectra yield peptide identifications and, by inference and assembly, protein identifications. Depending on protein sample load and the extent of peptide fractionation used, LC-MS/MS analytical systems can generate from hundreds to thousands of peptide and protein identifications (3). Many variations of LC-MS/MS analytical platforms have been described, and the performance of these systems is influenced by a number of experimental design factors (4).
Comparison of data sets obtained by LC-MS/MS analyses provides a means to evaluate the proteomic basis for biologically significant states or phenotypes. For example, data-dependent LC-MS/MS analyses of tumor and normal tissues enabled unbiased discovery of proteins whose expression is enhanced in cancer (5–7). Comparison of data-dependent LC-MS/MS data sets from phosphotyrosine peptides in drug-responsive and -resistant cell lines identified differentially regulated phosphoprotein signaling networks (8, 9). Similarly, activity-based probes and data-dependent LC-MS/MS analysis were used to identify differentially regulated enzymes in normal and tumor tissues (10). All of these approaches assume that the observed differences reflect differences in the proteomic composition of the samples analyzed rather than analytical system variability. The validity of this assumption is difficult to assess because of a lack of objective criteria to assess analytical system performance.
The problem of variability poses three practical questions for analysts using LC-MS/MS proteomics platforms. First, is the analytical system performing optimally for the reproducible analysis of complex proteomes? Second, can the sources of suboptimal performance and variability be identified, and can the impact of changes or improvements be evaluated? Third, can system performance metrics provide documentation to support the assessment of proteomic differences between biologically interesting samples?
Currently, the most commonly used measure of variability in LC-MS/MS proteomics analyses is the number of confident peptide identifications (11–13). Although consistency in numbers of identifications may indicate repeatability, the numbers do not indicate whether system performance is optimal or which components require optimization. One well characterized source of variability in peptide identifications is the automated sampling of peptide ion signals for acquisition of MS/MS spectra by instrument control software, which results in stochastic sampling of lower abundance peptides (14). Variability certainly also arises from sample preparation methods (e.g. protein extraction and digestion). A largely unexplored source of variability is the performance of the core LC-MS/MS analytical system, which includes the LC system, the MS instrument, and system software. The configuration, tuning, and operation of these system components govern sample injection, chromatography, electrospray ionization, MS signal detection, and sampling for MS/MS analysis. These characteristics all are subject to manipulation by the operator and thus provide means to optimize system performance.
Here we describe the development of 46 metrics for evaluating the performance of LC-MS/MS system components. We have implemented a freely available software pipeline that generates these metrics directly from LC-MS/MS data files. We demonstrate their use in characterizing sources of variability in proteomics platforms, both for replicate analyses on a single instrument and in the context of large interlaboratory studies conducted by the National Cancer Institute-supported Clinical Proteomic Technology Assessment for Cancer (CPTAC) Network.
A list of the 46 metrics described in this report is shown schematically in Fig. 1. Short descriptions of the metrics, their assigned reference codes, and the direction indicating improved performance for each can be found in Table I. Although the following section lists the currently used metrics, updates will be described with each release. Additionally, all of the data described in this work can be downloaded from the ProteomeCommons Tranche network at http://cptac.tranche.proteomecommons.org/rudnicketal.html.
Metrics C-1A and C-1B report the fraction of peptides with repeat identifications either >4 min earlier (C-1A) or later (C-1B) than the identification nearest the chromatographic peak maximum. Early identifications indicate bleed (typically non-retained, hydrophilic peptides), whereas later identifications arise from peak tailing of either overloaded peptides or peptides with poor chromatographic behavior. These are reported as fractions of all peptide identifications.
Metric C-2A reports the retention time period over which the middle 50% of the identified peptides eluted, and C-2B is the rate of peptide identification during that period. Sample LC-MS chromatograms from an analysis of a yeast proteome tryptic digest on three LTQ instruments are depicted in Fig. 2. The longer the period over which peptides elute, the more time is available to acquire MS2 spectra and the more peptides are likely to be sampled and identified. In this work, this period (C-2A) is defined as the time over which the middle 50% of the identified peptides elute, which corresponds to the difference between the end of the first and the beginning of the last retention quartiles (the “interquartile range”). Various measures described later are computed only during this central period (e.g. C-2B, the identification rate in unique peptides/min) to reject the uninformative early and late periods of chromatography and to correct for differences in absolute start and end times when comparing chromatograms.
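As an illustration only, the middle-50% elution window and the identification rate within it could be computed along the following lines. This is a minimal Python sketch, not the pipeline's code: the function names, the input layout (one retention time per unique peptide), and the "inclusive" quartile convention are assumptions made for the example.

```python
from statistics import quantiles


def middle_elution_window(retention_times_min):
    """Return (q1, q3, width) for identified-peptide retention times (min).

    The width q3 - q1 plays the role of C-2A. The quartile interpolation
    rule ("inclusive") is an assumption; the text does not specify one.
    """
    q1, _, q3 = quantiles(retention_times_min, n=4, method="inclusive")
    return q1, q3, q3 - q1


def identification_rate(peptide_rts):
    """C-2B-like rate: unique peptides per minute inside the window.

    peptide_rts: dict mapping a unique peptide to its retention time (min).
    """
    q1, q3, width = middle_elution_window(list(peptide_rts.values()))
    n_in_window = sum(1 for rt in peptide_rts.values() if q1 <= rt <= q3)
    return n_in_window / width


if __name__ == "__main__":
    demo = {"pepA": 10.0, "pepB": 20.0, "pepC": 30.0, "pepD": 40.0, "pepE": 50.0}
    print(middle_elution_window(list(demo.values())))
    print(identification_rate(demo))
```

Restricting later metrics to the same (q1, q3) interval is what makes runs with different absolute start and end times comparable.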
Metrics C-3A, C-3B, C-4A, and C-4B report chromatographic peak widths for identified peptide peaks. Sharper chromatographic peaks generate higher signal intensities and can reduce oversampling, thereby increasing the diversity of peptides identified. Peak widths (full width at half-maximum) were calculated as RT2 − RT1, the difference between the retention times at half-maximum on either side of the peak. Peak width medians (C-3A) and interquartile distances for these values (C-3B) are reported. C-3B is a measure of the distribution of peak widths for all peptides. Smaller values indicate higher degrees of peak uniformity. Peak widths were also calculated on smaller sets of early (decile 1, C-4A) or late (decile 10, C-4B) eluting peptides to measure peak broadening at the extremes of the gradient.
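The peak width summaries can be sketched as follows. This Python example is hypothetical: it assumes each peak is supplied as an (apex time, RT1, RT2) tuple with the half-maximum crossing times already determined, that deciles are taken by apex retention time, and that the median is the statistic reported for the early and late deciles.

```python
from statistics import median, quantiles


def peak_width_metrics(peaks):
    """Summarize FWHM peak widths for identified peptides.

    peaks: list of (apex_rt, rt1, rt2) tuples in one time unit, where rt1
    and rt2 are the half-maximum crossing times (assumed precomputed).
    """
    ordered = sorted(peaks)  # order peaks by apex retention time
    widths = [rt2 - rt1 for _, rt1, rt2 in ordered]
    q1, _, q3 = quantiles(widths, n=4, method="inclusive")
    tenth = max(1, len(ordered) // 10)  # size of one decile
    return {
        "C-3A": median(widths),           # median peak width
        "C-3B": q3 - q1,                  # interquartile distance of widths
        "C-4A": median(widths[:tenth]),   # earliest-eluting decile
        "C-4B": median(widths[-tenth:]),  # latest-eluting decile
    }


if __name__ == "__main__":
    demo = [(float(i), i - i / 2, i + i / 2) for i in range(1, 11)]
    print(peak_width_metrics(demo))
```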
Peptide elution order can be used to measure elution differences early (hydrophilic) and late (hydrophobic) in the chromatographic gradient. C-6A and C-6B are calculated by first identifying the N(a, b) peptides in common for a pair of runs, a and b, and then sorting all peptides in each run by elution time. If we define R1(a, b) as the rank of the earliest eluting peptide (rank 1) in run a that is also present in run b and R1(b, a) as the equivalent rank of the earliest co-occurring peptide in run b, then R1(a, b) − R1(b, a) is a measure of the number of extra early eluting peptides in run a (C-6A). To make this more robust, we find the maximum of the difference Rn(a, b) − Rn(b, a) from n = 1 to n = N(a, b)/10. C-6B is calculated similarly but for the high ranking, co-occurring peptides. This maximum is divided by the total number of peptides identified in the run, giving the fractional excess (or if negative, deficit) of hydrophobic peptides. For both measures, average differences between all intraseries runs and all interseries runs are reported.
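One possible reading of the early-elution calculation (C-6A) is sketched below in Python. Several details are assumptions rather than statements from the text: the signed maximum is taken literally, the normalization uses the number of peptides identified in run a, and at least one rank is always compared. The function and variable names are hypothetical.

```python
def early_elution_excess(run_a, run_b):
    """Fractional excess of early-eluting peptides in run a relative to run b.

    run_a, run_b: dicts mapping peptide -> retention time.
    """
    common = set(run_a) & set(run_b)
    if not common:
        raise ValueError("runs share no identified peptides")

    def common_ranks(run):
        # Rank every peptide in the run by elution time (rank 1 = earliest)
        # and keep only the ranks of peptides seen in both runs.
        ordered = sorted(run, key=run.get)
        return [rank for rank, pep in enumerate(ordered, start=1) if pep in common]

    ranks_a = common_ranks(run_a)
    ranks_b = common_ranks(run_b)
    n_max = max(1, len(common) // 10)  # n = 1 .. N(a, b)/10
    largest = max(ranks_a[n] - ranks_b[n] for n in range(n_max))
    return largest / len(run_a)


if __name__ == "__main__":
    shared = {f"c{i}": float(i) for i in range(1, 11)}
    run_b = dict(shared)
    # Run a has two extra early peptides; the shared ones elute 2 min later.
    run_a = {"x1": 1.0, "x2": 2.0, **{p: rt + 2.0 for p, rt in shared.items()}}
    print(early_elution_excess(run_a, run_b))
    print(early_elution_excess(run_b, run_a))
```

A positive value suggests extra early-eluting peptides in the first run, and a negative value suggests a deficit. The late-eluting counterpart (C-6B) would apply the same idea from the other end of the ranking.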
Metrics DS-1A and DS-1B are measures of peptide ion oversampling, which is controlled by the dynamic exclusion settings in the acquisition software. Ideally, these settings minimize wasteful multiple sampling of a peptide ion by permitting just one MS2 spectrum for a given peptide ion over a chromatographic peak. Ratios of singly to doubly (DS-1A) and doubly to triply (DS-1B) identified peptide ions are reported as measures of oversampling. The ratio of spectrum identifications to peptide ion identifications provides a more general measure of oversampling but mixes the effects of distorted chromatographic peaks and bleed with effects more directly originating from dynamic sampling.
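The oversampling ratios reduce to counting how many peptide ions were identified once, twice, or three times. A minimal Python sketch follows, assuming one (sequence, charge) entry per identified MS2 spectrum; the input layout and the handling of empty denominators are choices made for this example.

```python
from collections import Counter


def oversampling_ratios(identified_ions):
    """Return (DS-1A, DS-1B) style ratios.

    identified_ions: iterable with one (sequence, charge) entry for every
    identified MS2 spectrum, so repeats mark repeated sampling.
    """
    per_ion = Counter(identified_ions)         # identifications per ion
    times_seen = Counter(per_ion.values())     # ions seen once, twice, ...
    once, twice, thrice = times_seen[1], times_seen[2], times_seen[3]
    # An empty denominator means no oversampling at that level; infinity
    # is used here as an illustrative convention.
    ds_1a = once / twice if twice else float("inf")
    ds_1b = twice / thrice if thrice else float("inf")
    return ds_1a, ds_1b


if __name__ == "__main__":
    ions = (
        [("A", 2), ("B", 2), ("C", 2), ("D", 3)]  # four ions seen once
        + [("E", 2)] * 2 + [("F", 2)] * 2          # two ions seen twice
        + [("G", 2)] * 3                           # one ion seen three times
    )
    print(oversampling_ratios(ions))
```

Higher ratios correspond to less repeat sampling.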
The numbers of MS (DS-2A) and MS2 (DS-2B) spectra acquired during the middle 50% peptide retention period (C-2A) indicate the effective speed of sampling over the most information-rich section of the chromatogram. If, for a given MS spectrum, there is insufficient signal to reach the target threshold or the dynamic exclusion settings are not working as expected, the numbers of MS or MS2 spectra may vary substantially between technical replicates.
Metrics DS-3A and DS-3B describe peak sampling. Ideally, chromatographic peaks will be sampled at their maximum intensity. However, current methods make sampling decisions prior to peak maximization, and larger chromatographic peaks are often sampled well before reaching the maximum. Because an m/z value chosen for MS2 sampling is typically placed on an exclusion list for the theoretical remainder of the peak, it is important that the amount of signal be sufficient to trigger acquisition of an MS2 spectrum with adequate signal to noise (S/N) for successful peptide identification. The trade-off is that the threshold value should not be so high as to prevent lower intensity peaks from being sampled. By monitoring the median ratio of the maximum MS1 intensity value over the MS1 intensity at the sampling time for all identified peptides (DS-3A), it is possible to look for variability in chromatographic peak sampling. To approximate the sampling ratio for less abundant analytes, ratios for peptides in the bottom 50% by MS1 abundance are calculated separately (DS-3B). The reciprocal values for these metrics or the percentage of the total peak height at sampling is also useful and perhaps more intuitive. MS1 maximum peak heights were found by linear interpolation from the extracted ion chromatograms for each precursor m/z using a newly developed method (15–18).
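Given the interpolated peak maxima, the sampling ratios are simple medians. The Python sketch below assumes each identified peptide is represented by a pair (maximum MS1 intensity, MS1 intensity at the sampling time) and that "bottom 50% by MS1 abundance" refers to ranking by the maximum intensity. Both are assumptions of this example.

```python
from statistics import median


def peak_sampling_ratios(peptides):
    """Return (DS-3A, DS-3B) style values.

    peptides: list of (max_ms1_intensity, ms1_intensity_at_sampling) pairs
    for identified peptides; intensities are assumed to be positive.
    """
    def ratio(pair):
        peak_max, at_sampling = pair
        return peak_max / at_sampling

    all_ratios = [ratio(p) for p in peptides]
    # Rank by maximum MS1 intensity and keep the less abundant half.
    ranked = sorted(peptides, key=lambda p: p[0])
    lower_half = ranked[: max(1, len(ranked) // 2)]
    low_ratios = [ratio(p) for p in lower_half]
    return median(all_ratios), median(low_ratios)


if __name__ == "__main__":
    demo = [(100.0, 50.0), (200.0, 50.0), (400.0, 100.0), (800.0, 100.0)]
    print(peak_sampling_ratios(demo))
```

A ratio of 1 would mean sampling exactly at the peak maximum; the reciprocal gives the fraction of peak height reached at sampling.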
Metrics IS-1A and IS-1B are measures of electrospray stability. Short term electrospray stability is monitored as the median of ratios of MS1 total ion current for each pair of adjacent scans, the larger ion current divided by the smaller, over the interquartile time interval C-2A. Electrospray drop-off was monitored by counting the number of events where the total ion current fell or jumped by more than 10-fold between adjacent full MS scans (IS-1A and IS-1B). Values greater than zero for either of these metrics indicate spray tip “sputter.”
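A sketch of these electrospray checks in Python is given below. It assumes the total ion current (TIC) values have already been restricted to full MS scans within the C-2A interval and are strictly positive. Falls and jumps are returned under descriptive keys rather than metric codes, because the assignment of codes to directions is not taken from this example.

```python
from statistics import median


def electrospray_stability(tic, fold=10.0):
    """Summarize scan-to-scan TIC stability.

    tic: TIC values of consecutive full MS scans (positive numbers).
    Returns the median adjacent-scan ratio (larger / smaller) and the
    number of falls and jumps exceeding `fold` between adjacent scans.
    """
    pairs = list(zip(tic, tic[1:]))
    ratios = [max(a, b) / min(a, b) for a, b in pairs]
    falls = sum(1 for a, b in pairs if a > fold * b)  # signal dropped
    jumps = sum(1 for a, b in pairs if b > fold * a)  # signal rose
    return {"median_ratio": median(ratios), "falls": falls, "jumps": jumps}


if __name__ == "__main__":
    # One sputter event: the signal collapses for one scan, then recovers.
    print(electrospray_stability([100.0, 110.0, 5.0, 100.0, 100.0]))
```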
Instrument setup and tuning, including electrospray optimization, affect the distribution of peptide ion m/z values. This is monitored as the median precursor m/z (IS-2) of the identified peptide ions. This measure is sensitive to sample loading with higher concentrations of peptides generating higher m/z values, presumably at least in part due to the increasing preference for lower charge states with fewer charges available per peptide.
The availability of protons during electrospray ionization, the relative sequence length of peptides, and pH can all affect the charge state ratios of the identified peptides. Because doubly charged peptide ions (2+) represent the majority of the identified species from a typical tryptic digest, the ratios of 1+/2+ (IS-3A), 3+/2+ (IS-3B), and 4+/2+ (IS-3C) allow monitoring of the stability of the distributions. Perturbations in these ratios (or in the median precursor m/z) either run to run or between series may also correlate with decreased overlap of the observed peptide identifications.
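The charge state ratios can be illustrated with a few lines of Python. The input (one charge value per identified peptide ion) is an assumption of this sketch, as is raising an error when no 2+ ions are present.

```python
from collections import Counter


def charge_state_ratios(charges):
    """Return charge state ratios relative to 2+ identifications.

    charges: iterable of integer charge states, one per identified ion.
    """
    counts = Counter(charges)
    n2 = counts[2]
    if n2 == 0:
        raise ValueError("no 2+ identifications; ratios are undefined")
    return {
        "IS-3A": counts[1] / n2,  # 1+/2+
        "IS-3B": counts[3] / n2,  # 3+/2+
        "IS-3C": counts[4] / n2,  # 4+/2+
    }


if __name__ == "__main__":
    print(charge_state_ratios([1] * 2 + [2] * 10 + [3] * 5))
```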
Reported intensities are the most direct measure of signal strength, although we have found that they can vary substantially between runs and labs for no clear reason. Metrics reporting these values over the C-2A interval are MS1-2B, the median TIC for full MS1 spectra, and MS1-3B, the median of maximum MS1 intensities for peptides. MS1-3A is a measure of “dynamic range,” which is taken as the ratio of the 95th/5th percentile of MS1 maximum intensities for identified peptides over C-2A.
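For illustration, the median of the peptide MS1 maxima and the 95th/5th percentile "dynamic range" could be computed as in the Python sketch below. The percentile interpolation rule is an assumption, and the input is taken to be the MS1 maximum intensities of identified peptides already limited to the C-2A interval.

```python
from statistics import median, quantiles


def ms1_intensity_metrics(max_intensities):
    """Return MS1-3B (median of maxima) and MS1-3A (95th/5th percentile).

    max_intensities: MS1 maximum intensities of identified peptides.
    """
    # n=20 yields cut points at 5%, 10%, ..., 95%.
    cuts = quantiles(max_intensities, n=20, method="inclusive")
    p5, p95 = cuts[0], cuts[-1]
    return {"MS1-3B": median(max_intensities), "MS1-3A": p95 / p5}


if __name__ == "__main__":
    print(ms1_intensity_metrics([float(i) for i in range(1, 22)]))
```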
The maximum-to-median signal (a measure of S/N) in both MS1 (MS1-2A) and MS2 (MS2-2) spectra has proven to be a more stable measure of signal strength than absolute intensities. However, because these metrics can depend on thresholding and centroiding, they could vary between classes of instruments and data processing systems. For the data generated in this study with Thermo ion trap instruments, this did not appear as a problem, and this measure served well for monitoring run-to-run variation. The total base peak-normalized abundance provides a related measure but can be sensitive to the m/z range and background ions. Numbers of reported peaks (MS2-3) provide another highly correlated measure and are also reported. We note the general difficulty of determining whether variations in signal intensity arise from amounts injected or changes in instrument sensitivity. In fact, none of the measures described can reliably distinguish between these two factors.
Median ion injection times for MS1 (MS1-1) and MS2 (MS2-1) spectra are also reported. Short times should be associated with high signal levels or low threshold settings. In many cases, the median is also the maximum, so mean ion injection times are also reported. Reduced ion injection times occur when analyzing higher sample loads but should be relatively stable between technical replicates.
Because intensity is used to select ions for fragmentation, intensity variation of the same peptide in different runs is a major source of run-to-run variability. A measure of this variability is the median relative deviation of peptide ion intensities in common between two runs. Absolute differences in peptide ion intensity between the two runs must first be corrected. This is done by sorting an array of the ratios of MS1 maximum intensities for peptide ions found in pairs of runs. The quartile values (Qn is the value for the nth quartile) yielding the median relative deviations are calculated using the following expression.
The average for within series (MS1-4A) and the ratio of within/between series (MS1-4B) are reported.
Another measure of performance is the fraction of identified (scores above threshold) MS2 spectra at different MS1 maximum peptide intensity quartiles (MS2-4A–D). These values indicate to what extent identifications are sensitive to MS1 signal strength.
An additional class of metrics reports precursor accuracy (i.e. error associated with the identified peptides). All mass error measurements are derived from 2+ peptide ions only. Additionally, mass errors >0.45 m/z were rejected for high resolution Orbitrap and FT MS instruments; they are largely due to incorrect monoisotopic mass assignments at acquisition time. MS1-5A reports the median difference between the theoretical precursor m/z and the measured precursor m/z value as reported in the scan header. Reported monoisotopic values were used if available. MS1-5B is the mean of the absolute differences. MS1-5C is the median value of the real differences in ppm, and MS1-5D is the interquartile distance for the distribution used to determine MS1-5C.
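The precursor mass error summaries can be sketched in Python as follows. The sign convention (measured minus theoretical), the use of the theoretical m/z as the ppm denominator, and the quartile rule are assumptions of this example. The input is assumed to contain only 2+ ions that passed the rejection filter described above.

```python
from statistics import mean, median, quantiles


def precursor_mass_errors(pairs):
    """Summarize precursor m/z errors.

    pairs: list of (theoretical_mz, measured_mz) for identified 2+ ions.
    """
    diffs = [measured - theoretical for theoretical, measured in pairs]
    ppm = [1e6 * (measured - theoretical) / theoretical
           for theoretical, measured in pairs]
    q1, _, q3 = quantiles(ppm, n=4, method="inclusive")
    return {
        "MS1-5A": median(diffs),                # median m/z difference
        "MS1-5B": mean(abs(d) for d in diffs),  # mean absolute difference
        "MS1-5C": median(ppm),                  # median error in ppm
        "MS1-5D": q3 - q1,                      # interquartile distance, ppm
    }


if __name__ == "__main__":
    demo = [(500.0, 500.001), (500.0, 500.002), (500.0, 500.003)]
    print(precursor_mass_errors(demo))
```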
Peptide identifications from MS2 spectra, needed for many of the above metrics, were made by matching spectra to those in a reference spectrum library (19). This method, long used in gas chromatography-MS, measures the similarity of reference and search spectra. A “dot product”-based metric measures similarity (20). This function is not only widely used for gas chromatography-MS but has been found to be effective for MS/MS identification and has recently been used in several peptide spectrum-matching search methods (19, 21–23). Spectrum libraries were derived from spectra assigned to peptide ions by sequence search engines (23–25). The yeast library (NIST yeast_consensus_final_true_lib (June 30, 2008)) contained 79,990 spectra and was derived from a large number of analyses made by many laboratories, including those of the CPTAC Network, and is available on the web. The chicken egg yolk spectral library (NIST chicken_consensus_final_true_lib (December 5, 2008)) contained 4,437 spectra and was generated from many dozens of runs at NIST. Both libraries are available from the authors (S. E. Stein) on request. Score thresholds were fixed for all analyses to yield an overall false discovery rate of 1% estimated by searching decoy libraries of unrelated organisms and eliminating homologous matches. In cases where a spectrum identified more than one peptide, only the peptide identification with the highest score was used.
ReAdW4Mascot2.exe version 2.1 (ConvVer 20081119b), an extension of ReAdW.exe (Patrick Pedrioli, Institute for Systems Biology), was used to extract peak lists to mzXML or Mascot Generic format (MGF). The converter was run using the following arguments: for LTQ data: -sep1 -NoPeaks1 -c -MaxPI; and for Orbitrap data: -sep1 -NoPeaks1 -c -MaxPI -ChargeMgfOrbi -MonoisoMgfOrbi. The search engine used was SpectraST (version 3.0, TPP v4.0 JETSTREAM revision 2, Build 200807011544) (19). For spectral library searches with SpectraST, enzyme specificity (trypsin), numbers of missed cleavages permitted (≤2), fixed modifications (none), and variable modifications (carbamidomethyl-Cys, oxidized Met, N-terminal acetyl, pyro-Glu, and pyro-carbamidomethyl cysteine) were determined by the contents of the spectral library. Mass tolerance for precursor ions was 2 m/z for LTQ data and 1.0 m/z for Orbitrap data. Mass tolerance for fragment ions is not adjustable in SpectraST. SpectraST does not select candidate spectra based on charge. Therefore, charge information in the peak lists is ignored. The cutoff score/expectation value for accepting individual MS/MS spectra was an fval of 0.45. This threshold was based on a global FDR calculation using decoy spectra. FDR was calculated using the formula
where FP is 2 times the number of false positive matches at or above this score and TP is the number of true positive matches at or above this score. Overlapping decoy and target spectra were removed prior to the analysis, and FP was scaled according to the ratio of the target and decoy library size. Decoy spectra were actual peptide spectra from an unrelated organism, and the distribution of random matches between target and decoy libraries was approximately equal.
P-2C, a good overall measure of performance, is defined as the number of distinct identified tryptic peptide sequences, ignoring modifications and charge state. Numbers of unique semitryptic peptides (truncated tryptic peptides) were also counted, and the ratio of semitryptic/tryptic peptides is reported (P-3). Because semitryptic peptides can be formed by sample degradation or in the ionization source, higher total peptide values do not necessarily reflect better performance. This metric should be useful for determining how complete a digest is between preparations or for assessing how variable in-source fragmentation is between runs of the same sample. Also reported are the total numbers of identified spectra (P-2A) and the number of identified precursor ions (P-2B). The median score of identified peptides is also reported (P-1). In the case of the default analysis pipeline, this is the median SpectraST (19) fval for all peptide identifications. A score threshold of 0.45 was applied to all analyses. Relative decreases in the median score can indicate a reduction in MS2 S/N or other problems resulting from a divergence in similarity to consensus library spectra.
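The identification counts described here amount to counting distinct items at different levels of detail. The Python sketch below is hypothetical: the record layout is invented for the example, tryptic status is assumed to be supplied by the search results, and a precursor ion is taken to be a distinct (modified sequence, charge) pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identification:
    sequence: str      # bare amino acid sequence, modifications removed
    modified: str      # sequence with modification annotations
    charge: int
    fully_tryptic: bool


def identification_metrics(ids):
    """Return P-2A, P-2B, P-2C, and P-3 style values from identifications."""
    tryptic = {i.sequence for i in ids if i.fully_tryptic}
    semitryptic = {i.sequence for i in ids if not i.fully_tryptic}
    return {
        "P-2A": len(ids),                                  # identified spectra
        "P-2B": len({(i.modified, i.charge) for i in ids}),  # precursor ions
        "P-2C": len(tryptic),                              # distinct sequences
        "P-3": len(semitryptic) / len(tryptic) if tryptic else float("nan"),
    }


if __name__ == "__main__":
    demo = [
        Identification("PEPTIDEK", "PEPTIDEK", 2, True),
        Identification("PEPTIDEK", "PEPTIDEK", 3, True),
        Identification("PEPTIDEK", "PEPTIDEK", 2, True),
        Identification("AAMK", "AAM[ox]K", 2, True),
        Identification("TIDEK", "TIDEK", 2, False),
    ]
    print(identification_metrics(demo))
```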
We have used a newly developed metrics pipeline to calculate all of the values presented in this study (Fig. 3). The software consists of 1) a data extraction/feature-finding algorithm derived from ReAdW.exe (SourceForge) and 2) a peptide identification engine (SpectraST (19) and the open mass spectrometry search algorithm (OMSSA) (24) are currently integrated) followed by 3) a program that calculates all of the metrics from the extracted data and 4) a program that generates statistics for the data series.
The pipeline is currently used to analyze Thermo Fisher .RAW files, which requires Thermo Xcalibur™ to be installed. Updates for handling other vendor-specific files are being developed. The entire work flow is driven by a Perl script that is directed to one or more directories of raw data files and produces an output file in tab-delimited text format. This software is available for download.
Fresh chicken egg yolk was evaporated in vacuo, and 1-mg samples were stirred in 100 μl of 6 M urea, 0.1 M Tris buffer, pH 8, for 2 h at room temperature. Cysteines were reduced and alkylated with 1 mmol of dithiothreitol (1 h) followed by addition of 4 mmol of iodoacetamide for 1 h at room temperature. Excess iodoacetamide was incubated with excess dithiothreitol (4 mmol) for 1 h. The sample was then diluted with water to a total volume of 1 ml, mixed with 20 μg of Promega sequencing grade modified trypsin, and stirred at 37 °C for 18 h. After digestion, the solution was acidified with 20 μl of 50% formic acid. The digest was divided into three samples. One was refrigerated for 2 weeks and analyzed (sample 1). A second aliquot was frozen for 2 weeks, thawed, and then analyzed (sample 2). The third was evaporated to dryness in vacuo, redissolved in water at the same concentration, frozen for 2 weeks, then thawed, and analyzed (sample 3).
Five technical replicate analyses of each sample were done by LC-MS/MS using an LC-Packings Ultimate 3000 HPLC system (Dionex Corp., Sunnyvale, CA) coupled to a Thermo LTQ linear ion trap mass spectrometer (Thermo Scientific, Waltham, MA). Samples (2 μl) were injected onto a Dionex C18 Acclaim PepMap 300 column (300-μm inner diameter, 15 cm long) and eluted with a gradient of water (A)/acetonitrile (B), each containing 0.1% formic acid as follows: 0–40 min, 0–50% B; 40–45 min, 50–95% B; 45–48 min, 95% B; 48–50 min, 95–0% B; 50–60 min, 0% B. The flow rate was 4 μl/min. The eluent passed through a 15-μm silica tip (New Objective, Woburn, MA) and was sprayed into the LTQ mass spectrometer. Each MS scan was followed by eight MS/MS spectra of the eight most intense peaks, taken in reverse order. A dynamic exclusion time of 20 s was used with a list size of 500. The collision energy was set to 35%.
An aliquot of the tryptic digest of the CPTAC yeast reference material (see supplemental material) was redissolved in 0.1% formic acid in water at a concentration of 1 μg/μl. Solutions at 400 ng/μl, 40 ng/μl, 4 ng/μl, and 400 pg/μl were made by serial dilution and were analyzed by LC-MS/MS using a nanoLC-2D LC pump (Eksigent Technologies, Dublin, CA) coupled to a Thermo LTQ mass spectrometer (Thermo Scientific). For each series of LC-MS/MS runs, samples were analyzed back to back from low to high concentration. Blank gradient runs were done between each series. Samples were loaded onto an Atlantis dC18 trap column at 4.5 μl/min (Waters Corp.) and eluted onto a 100-μm × 10-cm BioBasic C18 IntegraFrit column (New Objective) connected to a 20-μm SilicaTip with a 10-μm tip (New Objective). Peptides were separated at 450 nl/min with the following gradient: 2–30.5% B over 90 min, 30.5–90% B over 10 min, 90% B for 2 min, 90–2% B over 3 min, and 2% B for 10 min (A = 0.1% formic acid in water, and B = 0.1% formic acid in acetonitrile). Except for the LC analysis described above, all other instrument settings and parameters were according to the CPTAC Network Study 6 standard operating procedure (SOP) (see the supplemental material).
Complete descriptions of the CPTAC yeast protein materials, their preparation, and digestion and of the SOPs for CPTAC Network Studies 5 and 6 described here are presented in the supplemental material.
We developed 46 performance metrics, which map to LC-MS/MS system components as shown in Fig. 1. A list annotated with brief descriptions of each metric is provided in Table I; in the text, we refer to them by category and code. The metrics map to functional system components including the liquid chromatography system, the MS instrument (electrospray source and MS1 and MS2 signal intensities), the MS instrument control system (dynamic sampling), and the data analysis system (peptide identification outputs after database searching).
The metrics were first derived empirically through examination of run-to-run or lab-to-lab differences in data sets. In some cases, characteristics of the data sets suggested relevant metrics. For example, it was noted that on one LC-MS/MS system early eluting peptides were absent. Based on this observation, an algorithm was written to calculate the median peptide elution rank order differences in the early part of the chromatogram. Further logical examination of variability in the chromatography led to a focus on quantification of the observed variation and thus to the related chromatography metrics. In another example, average precursor m/z values for identified peptides were calculated for each run because the data were easily accessible. Differences between instruments for this metric were later attributed to differences in tuning protocols. Variations of this iterative “observe, quantify, evaluate, and refine” approach have yielded over 100 metrics of which 46 are reported here. Further development of performance metrics is a continuing focus of our work (at NIST).
We initially evaluated the behavior of the metrics in replicate analyses of three similar samples on a single LC-MS/MS system. An egg yolk protein extract was digested with trypsin and stored refrigerated for 2 weeks (preparation A, six LC-MS/MS replicates); frozen for 2 weeks, thawed, and analyzed (preparation B, six replicates); or evaporated in vacuo, redissolved at the same concentration, stored frozen for 2 weeks, thawed, and analyzed (preparation C, five runs). The replicates were each analyzed on a Thermo LTQ linear ion trap mass spectrometer. The median intraseries (replicates within a preparation) and interseries (between preparations) deviations are given in Table II. The %dev values represent the median absolute difference between all pairs of measurements expressed as a percentage of the median value. The average %dev within replicates for each preparation was less than 2% under these conditions and less than 3% across all runs for the three preparations. As expected, variation was greater on average between preparations A, B, and C than between replicate analyses. For example, the variability between preparations in the number of tryptic peptide counts (P-2C) was nearly 3 times greater than the average intraseries variability. This reflects the expected behavior of this metric in which modest differences between the three preparations should give rise to varying peptide identifications.
The average ratio of interseries to intraseries %dev in Table II was 1.29, which indicates that the metrics vary slightly more due to modest sample differences than to variation between replicate analyses. Thus, the results in Table II indicate that the metrics are stable across analytical replicates and provide estimates of normal ranges for the metrics for typical laboratory handling.
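The %dev statistic used in Table II can be illustrated with a short Python sketch. The intraseries form follows the stated definition (median absolute difference between all pairs, as a percentage of the median value). The interseries form shown here, which pairs measurements from different series and normalizes by the median of all values, is an assumption of this example.

```python
from itertools import combinations, product
from statistics import median


def percent_dev(values):
    """Intraseries %dev for one metric across replicate runs."""
    diffs = [abs(a - b) for a, b in combinations(values, 2)]
    return 100.0 * median(diffs) / median(values)


def percent_dev_between(series):
    """Interseries %dev: pairs are drawn from different series only.

    series: list of lists, one list of metric values per preparation.
    """
    diffs = [
        abs(a - b)
        for s1, s2 in combinations(series, 2)
        for a, b in product(s1, s2)
    ]
    all_values = [v for s in series for v in s]
    return 100.0 * median(diffs) / median(all_values)


if __name__ == "__main__":
    prep_a = [100.0, 102.0, 104.0]
    prep_b = [110.0, 112.0, 114.0]
    print(percent_dev(prep_a))
    print(percent_dev_between([prep_a, prep_b]))
```

The ratio of the interseries to the intraseries value for each metric then indicates how much of the variation is attributable to sample differences rather than replicate-to-replicate variation.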
To evaluate the response of the metrics to controlled experimental variation, we analyzed a tryptic digest of the NCI-CPTAC yeast proteome reference material in triplicate at 10 sample loads ranging from 1.6 to 6,000 ng. Patterns of variation of the metrics with sample load illustrate their response characteristics (Fig. 4). Tryptic peptide identification metrics P-2A, P-2B, and P-2C all increased with load, but increases were marginal at loads above 100 ng (Fig. 4a). This was paralleled by increases in MS1 signal intensity parameters MS1-2B and MS1-3B together with a concomitant drop-off in ion injection time (MS1-1) (Fig. 4e) and an increase in median S/N for identified MS1 (MS1-2A) and MS2 spectra (MS2-2) for identified peptides (Fig. 4f). These metrics quantitatively illustrate the well known relationship between higher sample load, greater MS1 and MS2 signal intensities, higher spectral quality, and greater numbers of successful database matches. The increased variation in the metric values at lower sample loads is also presumably due to the diminishing concentration of the analytes. The fraction of identified MS2 spectra in each of the four MS1 quartiles also increased with sample loading up to 100 ng (MS2-4A through MS2-4D) (Fig. 4f) as expected. Metrics of ion source performance indicated that, although the median precursor m/z for identified peptides (IS-2) increased only slightly with sample load, the number of 1+ species identified relative to 2+ species (IS-3A) increased sharply at loads above 160 ng (Fig. 4c). This may reflect limited availability of protons at the source at higher analyte concentrations.
Some of the chromatography metrics exhibited striking dependence on sample load (Fig. 4b). Most notably, C-2B (peptide identification rate) increased only at lower loading concentrations, whereas the “bleed” metric C-1A increased across the entire range. This latter metric was not accompanied by corresponding changes in peak width metrics (C-3A and C-3B), which suggests that C-1A simply reflects load rather than loss of column performance. This is consistent with the observation that peptide identifications increased across the load range, albeit more slowly at the highest concentrations.
These data demonstrate that the metrics correctly represent well understood relationships among system components and demonstrate sensitivity to “real world” variables (e.g. sample load). In this example, the practical inference drawn from the combined metrics is that system performance improves dramatically with incremental sample loading increases below 160 ng as reflected collectively by identifications, chromatography, and MS signal intensities, whereas improvement at higher loads is not as significant. This illustrates the value of the metrics for rational, quantitative optimization of system performance.
The CPTAC interlaboratory studies (CPTAC Studies 5 and 6) involved analyses of a common yeast proteome reference sample on multiple Thermo LTQ and LTQ-Orbitrap instruments in several laboratories. The yeast extract was digested with trypsin, and aliquots were distributed for analysis. Although the participating laboratories used different LC systems and autosamplers with these MS instruments, they also used an SOP, which standardized sample loading, chromatography, MS instrument tuning, and dynamic sampling (see supplemental material).
As part of Study 6, the yeast digest was analyzed in triplicate on four different LTQ-Orbitrap systems in three different laboratories (Fig. 5). Inspection of the peptide identification summary (Fig. 5a) indicates an approximately 40% reduction in the number of peptide identifications by LTQ-Orbitrap@86 compared with the other three Orbitraps. Inspection of the other metrics indicates that dynamic oversampling parameters DS-1A and DS-1B were lower for LTQ-Orbitrap@86 than for the others (Fig. 5d), indicating excessive repeat sampling of peptide ion signals for MS/MS. This would have the observed effect of lowering peptide identifications. Another major characteristic of the underperformance of instrument “@86” is the reduced value of the ratios of 3+/2+ charge states (IS-3B) and 4+/2+ charge states (IS-3C) for identified peptides (Fig. 5c). (No 4+ peptide ions were identified for this instrument.) This may indicate a shift in the distribution of all peptide ion charge states and may have led to an increased fraction of 1+ peptide ions, which were excluded from MS/MS in the Orbitraps. Compliance with SOP dynamic exclusion settings was verified by inspection of the data file headers. However, further investigation identified inadequate formic acid content in the LC mobile phase as the likely cause, which is consistent with both a decreased proportion of higher charge state ions (IS-3B and IS-3C) and increased oversampling (fall in the value of DS-1A).
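The charge-state ratios referred to above (IS-3A, 1+/2+; IS-3B, 3+/2+; IS-3C, 4+/2+) are simple count ratios over identified peptides. The following Python fragment is a minimal sketch of that calculation, not the software used in the study; the function name `charge_state_ratios` and the input (one precursor charge per identified peptide) are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def charge_state_ratios(charges: Iterable[int]) -> Dict[str, float]:
    """Ratios of 1+, 3+, and 4+ identifications to 2+ identifications.

    `charges` holds one precursor charge per identified peptide
    (hypothetical input format chosen for this sketch).
    """
    counts = Counter(charges)
    n2 = counts[2]
    if n2 == 0:
        raise ValueError("no 2+ identifications; ratios are undefined")
    return {
        "IS-3A": counts[1] / n2,  # 1+ relative to 2+
        "IS-3B": counts[3] / n2,  # 3+ relative to 2+
        "IS-3C": counts[4] / n2,  # 4+ relative to 2+
    }
```

A run in which no 4+ peptide ions are identified, as reported for instrument “@86,” returns an IS-3C value of zero under this sketch.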
The Study 6 data also identified performance metrics whose variation had little or no impact on peptide identifications, including the MS1 and MS2 signal intensity metrics (Fig. 5, c and e) for the case of “Orbitrap@86.” Also noteworthy is the modest dip in peptide identifications in the second run for instrument LTQ-Orbitrap@65P (Fig. 5a). This occurred together with evidence of electrospray instability as indicated by 10-fold jumps or drops in MS1 signal intensity (IS-1A and IS-1B; Fig. 5c) in adjacent full scans. Manual inspection of the corresponding data file revealed a periodic “sawtooth” profile for the base peak chromatogram, indicating electrospray instability.
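The criterion behind IS-1A and IS-1B, 10-fold jumps or drops in MS1 signal intensity between adjacent full scans, can be illustrated with a short Python sketch. It is not the implementation used in the study: the function name `count_intensity_jumps`, the input (a list of per-scan MS1 intensities in acquisition order), the inclusive threshold, and the skipping of scans with zero signal are all assumptions made here.

```python
from typing import Sequence, Tuple


def count_intensity_jumps(
    intensities: Sequence[float], fold: float = 10.0
) -> Tuple[int, int]:
    """Count adjacent-scan intensity changes of at least `fold`.

    Returns (jumps, drops), where a jump is an increase of >= fold and
    a drop is a decrease of >= fold between consecutive full scans.
    """
    jumps = 0
    drops = 0
    for prev, curr in zip(intensities, intensities[1:]):
        if prev <= 0 or curr <= 0:
            continue  # assumption: ignore scans without usable signal
        ratio = curr / prev
        if ratio >= fold:
            jumps += 1
        elif ratio <= 1.0 / fold:
            drops += 1
    return jumps, drops
```

Under these assumptions, a stable spray yields counts near zero, whereas a periodic “sawtooth” base peak profile yields repeated jumps and drops.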
Another noteworthy aspect of the Study 6 data is the high degree of reproducibility of the chromatography metrics across instruments. Although the four Orbitraps used three different combinations of LC and autosampler systems, the SOP specified stationary phase, column measurements, flow rates, and injection parameters. Peptide elution periods, peak widths, and chromatographic bleed were generally consistent across instruments and replicate analyses, illustrating the feasibility of SOP-driven studies of LC-MS platforms, even when different hardware components may be used.
In CPTAC Study 5, the yeast digest was analyzed in six replicates on three LTQ and three Orbitrap instruments in five laboratories. The metric C-6A, which detects differences in the relative numbers of early and late eluting peptides within and between labs, identified large differences within replicates for “LTQ2@95” (Fig. 6a). These differences were reflected in increased values for chromatographic bleed metrics C-1A and C-1B and decreased peptide identification rate (C-2B) (Fig. 6b) and lowered numbers of peptide identifications (Fig. 6c) in the fourth and fifth runs in the series. This enabled identification of wash solvent cross-contamination of the sample injection loop as a probable cause. Correction of the problem partially restored the metrics and peptide identifications to values comparable to the other systems (see Fig. 6, “LTQ2@95-rep”).
Whereas Table II describes variation in the metrics for a single instrument in replicate analyses, the CPTAC interlaboratory studies provided a means to evaluate variability across multiple laboratories, instruments, and analyses. Fig. 7a displays average intralab %dev values for the metrics for all six instruments (three LTQs and three Orbitraps) in CPTAC Study 5 (six replicate yeast digest analyses). The metrics have been sorted within categories from lowest to highest. The plotted size of each colored bar represents variation within a laboratory, and the accompanying error bar represents the %dev of the intralab %dev (i.e. it estimates the range of %dev values across the labs). These values as displayed are not intended to represent interlab variability, which would require comparing measurements between instrument classes directly, but to approximate the variability of the intralab %dev values. For most of the metrics, these values are comparable and average less than 10%. The most highly variable metrics describe MS1 signal intensity, dynamic sampling, and chromatographic bleed. Indeed, this latter metric was the most variable both within and between labs, also consistent with the egg yolk studies presented in Table I.
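The two-level summary plotted in Fig. 7a (an average intralab %dev with an error bar given by the %dev of the intralab %dev values) can be sketched as follows. This Python fragment is illustrative only. Because the definition of %dev is not restated in this section, the sketch substitutes the sample standard deviation expressed as a percentage of the mean, which is an assumption; the function names and the dictionary input are likewise hypothetical, and a different spread function can be passed in through `dev`.

```python
from statistics import mean, stdev
from typing import Callable, Dict, Sequence, Tuple


def percent_dev(values: Sequence[float]) -> float:
    """Stand-in for %dev (assumption): sample SD as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


def intralab_summary(
    runs_by_lab: Dict[str, Sequence[float]],
    dev: Callable[[Sequence[float]], float] = percent_dev,
) -> Tuple[float, float]:
    """Summarize one metric across laboratories.

    Returns (average intralab %dev, %dev of the intralab %dev values),
    corresponding to the bar height and error bar described in the text.
    """
    per_lab = [dev(values) for values in runs_by_lab.values()]
    return mean(per_lab), dev(per_lab)
```

For example, two laboratories whose replicate values each vary by 10% give an average intralab value of 10 and an error bar of zero, showing that the error bar reflects how much the within-lab variation differs between labs rather than how far the lab averages lie apart.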
This comparison can be extended to interlab values for the LTQs and Orbitraps (Fig. 7, b and c). Whereas Fig. 7a depicts the variation within labs, Fig. 7, b and c, reveal the values that vary most between labs. The main finding is that interlaboratory variation in peptide identifications and most other performance metrics is comparable between the LTQ and Orbitrap systems, even though there are large differences in the average values for some metrics between them (not shown). Differences in these mass analyzers result in different MS1 signal intensities, but this also results in greater variation in both MS1 and dynamic sampling metrics for the Orbitraps (Fig. 7, b versus c). This could also be reflected by the fact that one of the Orbitraps outperformed the others by more than 20% in the number of unique tryptic peptide identifications in this study (data not shown). Nevertheless, variations in MS2 metrics appear comparable for LTQs and Orbitraps.
Fig. 7, a–c, also provide a broad perspective on which analytical techniques pose the greatest challenges to intralaboratory and interlaboratory standardization. For example, the metrics with the highest variation describe peptide chromatography (e.g. C-1A and C-1B), which is subject to more variable influences than any other component of the LC-MS/MS system. Even modest variations in mobile phase composition, gradient delivery, flow control stability, sample contaminants, and the composition of previous samples can influence peptide elution. We note that the relative variability of C-1A and C-1B corresponds to the typical experience in chromatography where peak tailing (C-1B) is a more common problem than peak bleed or “fronting” (C-1A).
Troubleshooting poor system performance in LC-MS/MS-based proteomics typically involves a combination of experience, intuition, and a highly subjective evaluation of limited data (e.g. “what does the chromatogram look like?”). Although causes of commonly encountered system problems are well known and can be rationalized in retrospect, the combinatorial possibilities of malfunction among multiple components may preclude a systematic approach to diagnosis. The 46 performance metrics described here enable the implementation of a quantitative, integrated approach to system troubleshooting. These metrics enable diagnosis of poor system performance, such as the ~40% decline in peptide identifications by one Orbitrap compared with others in an interlaboratory study (Fig. 5).
However, performance metrics can help diagnose much more subtle, yet important, problems. These metrics provide the first effective means to deal with modest decrements in system performance, which are frequently encountered yet difficult to diagnose. The metrics are quite sensitive to small changes in performance: most have %dev values of less than 10%. The sensitivity of these metrics enabled detection of electrospray instability as a contributing cause of modestly diminished peptide identification performance in one replicate analysis of a yeast proteome digest in CPTAC Study 6 (Fig. 5, LTQ-Orbitrap@65P).
A relatively large number of metrics (46 presented here) could be considered excessive for purposes of troubleshooting system performance. However, the availability of over 40 metrics does not imply that all metrics are always used together for diagnostic purposes. Indeed, doing so would rarely be necessary. In the examples we present here, specific system malfunctions are indicated by changes in smaller subsets of metrics. On the other hand, the advantage of the relatively large number of metrics is that they reflect diverse components of the system.
A complete record of system performance should become a critical element of quality control documentation for LC-MS data sets. This requirement for documentation of system performance is important in many applications in proteomics, particularly where analysis and comparison of LC-MS data sets provide the basis for identifying distinct characteristics of biological systems. Apparent differences between phenotypes are detected by comparing data sets from multiple technical replicate analyses of the corresponding samples. A key assumption underlying this approach is that observed differences represent true proteomic differences rather than variability in system performance. This is particularly important when biological differences between samples are relatively modest as analyses must be able to discern differences comprising a small subset of proteome components. The metrics we describe here could provide an unambiguous basis for quantitatively defining platform stability and could enable identification of outlier data that would otherwise confound biological comparisons.
We also have shown here in the context of the CPTAC interlaboratory studies that these metrics display stability in behavior across multiple laboratories and systems. These observations not only further define the utility and normal ranges of the metrics but also provide insights into the aspects of multilaboratory studies that pose the greatest barriers to platform standardization. However, the implementation of an SOP in CPTAC Study 6 effectively normalized key features of the chromatography (Fig. 5b), a remarkable achievement in view of the use of different LC systems by the participating laboratories. We note that SOPs were used in the CPTAC studies to enable comparisons under conditions where key system variables were held constant, thus enabling identification of sources of variability in peptide and protein detection. The SOPs represented a balance between performance optimization for peptide detection and practical considerations for interlaboratory studies. They do not represent fully optimized methods and are not intended as prescriptive for the proteomics research community.
This work on performance metrics was done with Thermo LTQ and LTQ-Orbitrap instruments, which are commonly used for LC-MS/MS proteomics and were the principal instruments available in our laboratories (at NIST) and in the participating CPTAC laboratories. Although a few of the metrics we describe (e.g. ion injection times) are not applicable to other instruments, such as quadrupole-time of flight instruments, most of these metrics can be applied to any electrospray LC-MS/MS instrument platform used for proteomics. Extension of the metrics to these other systems will require software to extract data from instrument data files and reference data sets to define the behavior of individual metrics in different instrument platforms.
The metrics described here represent a subset of a larger body of measurements explored in our studies of LC-MS systems. Although these metrics were applied to data-dependent LC-MS/MS analyses, subsets of these metrics and variations thereof could be applied similarly to high resolution LC-MS “MS1 profiling” systems or to LC-multiple reaction monitoring-MS on triple quadrupole and quadrupole-ion trap instruments. Indeed, many of the metrics are applicable to the analysis of LC-MS systems for non-peptide analytes, including metabolites, lipids, carbohydrates, and other molecule classes. Implementation of these performance metrics will be facilitated by the distribution of software for extracting the metrics directly from raw data files and by development of graphical user interfaces and integration with standard proteome analysis work flows. We are continuing development and evaluation of new performance metrics and will make available software to facilitate their implementation in the analytical community.
* This work was supported, in whole or in part, by an interagency agreement between the NCI, National Institutes of Health and the National Institute of Standards and Technology and by National Institutes of Health Grants U24CA126476, U24CA126485, U24CA126480, U24CA126477, and U24CA126479 as part of the NCI Clinical Proteomic Technologies for Cancer (http://proteomics.cancer.gov) initiative. A component of this initiative is the Clinical Proteomic Technology Assessment for Cancer (CPTAC) teams, which include the Broad Institute of MIT and Harvard (with the Fred Hutchinson Cancer Research Center, Massachusetts General Hospital, the University of North Carolina at Chapel Hill, the University of Victoria, and the Plasma Proteome Institute), Memorial Sloan-Kettering Cancer Center (with the Skirball Institute at New York University), Purdue University (with Monarch Life Sciences, Indiana University, Indiana University-Purdue University Indianapolis, and the Hoosier Oncology Group), University of California, San Francisco (with the Buck Institute for Age Research, Lawrence Berkeley National Laboratory, the University of British Columbia, and the University of Texas M. D. Anderson Cancer Center), and Vanderbilt University School of Medicine (with the University of Texas M. D. Anderson Cancer Center, the University of Washington, and the University of Arizona). Contracts from NCI, National Institutes of Health through SAIC to Texas A&M University funded statistical support for this work. A complete listing of the CPTAC Network can be found at http://proteomics.cancer.gov.
The on-line version of this article (available at http://www.mcponline.org) contains supplemental material, including detailed descriptions of the preparation and characterization of the yeast protein extracts used in these studies and standard operating procedures for LC-MS/MS analyses in CPTAC Studies 5 and 6.
1 The abbreviations used are: