J Am Soc Mass Spectrom. Author manuscript; available in PMC 2010 September 27.
Published in final edited form as:
PMCID: PMC2946190
NIHMSID: NIHMS234627

Comparison of Different Signal Thresholds on Data Dependent Sampling in Orbitrap and LTQ Mass Spectrometry for the Identification of Peptides and Proteins in Complex Mixtures

Abstract

We evaluate the effect of ion-abundance threshold settings for data dependent acquisition on a hybrid LTQ-Orbitrap mass spectrometer, analyzing features such as the total number of spectra collected, the signal to noise ratio of the full MS scans, the spectral quality of the tandem mass spectra acquired, and the number of peptides and proteins identified from a complex mixture. We find that increasing the threshold for data dependent acquisition generally decreases the quantity but increases the quality of the spectra acquired. This is especially true when the threshold setting is set above the noise level of the full MS scan. We compare two distinct experimental configurations: one where full MS scans are acquired in the Orbitrap analyzer, while tandem MS scans are acquired in the LTQ analyzer and one where both full MS and tandem MS scans are acquired in the LTQ analyzer. We examine the number of spectra, peptides, and proteins identified under various threshold conditions, and we find that the optimal threshold setting is at or below the respective noise level of the instrument regardless of whether the full MS scan is performed in the Orbitrap or in the LTQ analyzer. When comparing the high-throughput identification performance of the two analyzers, we conclude that, used at optimal threshold levels, the LTQ and the Orbitrap identify similar numbers of peptides and proteins. The higher scan speed of the LTQ, which results in more spectra being collected, is roughly compensated by the higher mass accuracy of the Orbitrap, which results in improved database searching and peptide validation software performance.

1. Introduction

Proteomics has emerged as a large-scale approach to study the functions and physiological roles of proteins. This approach has been facilitated by the creation of genome sequence resources for many different organisms that serve as model systems for the study of biological processes. Mass spectrometry based approaches have capitalized on the availability of sequence resources to speed the interpretation of data. By using predefined sequences, mass spectrometry data is matched to sequences in the database rather than interpreted de novo. One of the most common approaches for proteomics makes use of tandem mass spectrometers and collision induced dissociation (CID) to create data indicative of the amino acid sequence of a protein. In a “bottom up” approach, a protein is first digested with protease(s) and then subjected to analysis by liquid chromatography in conjunction with tandem mass spectrometry (LC/MS/MS). Peptide ions are subjected to tandem mass spectrometry and the spectra are then searched through sequence databases to identify the amino acid sequence with the best fit. When a protein mixture is subjected to proteolysis prior to analysis, this approach is referred to as “shotgun proteomics” and uses the tandem mass spectra obtained from each peptide to assign the presence of proteins in the mixture.1,2

The analysis of complex protein mixtures using a “shotgun proteomic” approach is made possible through computer control of an instrument’s operation using “data-dependent-acquisition” (DDA).3 Shotgun proteomics is dependent on the efficient and rapid acquisition of tandem mass spectra. Most commercial mass spectrometry (MS) instruments have some form of software to control tandem mass spectrometry experiments of precursor ions selected from a previously acquired full scan. By employing high scan speeds and sampling rates, more peptide ions are acquired per unit of time, resulting in the acquisition of a larger number of tandem mass spectra. A common approach for a “data-dependent” experiment is to trigger the acquisition of product ion spectra based on the intensity of ions detected in full scan data.3 Thus, precursor ions above a pre-set ion abundance threshold trigger the instrument to automatically perform CID on those precursor ions. Acquiring high quality tandem mass spectra is essential for proper fragment ion assignment and matching to sequences in database searches, as well as de novo interpretation.

Several acquisition parameters affect the collection of tandem mass spectra using data-dependent acquisition. Lynn et al. examined the effect of signal-averaging full and MS/MS scans on protein identification in order to balance duty cycle and ion injection time against spectral quality.4 Regardless of the type of data-dependent procedure used, the signal to noise of precursor ions ultimately impacts the detection of ions for data dependent acquisition. The detection limit is determined not only by the limit of detection of the mass spectrometer, but also by the different types of noise present in the system. Therefore, a reduction of noise to improve the selection of peptide ion signals can increase the acquisition of spectra which represent peptide ions. The minimum ion abundance threshold set to trigger DDA differs between researchers, and there has never been a systematic analysis of different strategies to assess which approach might lead to an increase in protein identifications or potentially a greater number of false positive identifications. In general, post-acquisition data processing has been used to minimize the number of false positives by removing poor quality data rather than altering the data acquisition strategy. One approach is to run data-dependent experiments using a threshold set to the level of chemical noise in an LC analysis to promote the sampling of low abundance peptides, but this is done at the expense of collecting “junk” MS/MS spectra. An assumption is that chemical noise is typically higher in the precursor ion scans of an ion trap mass spectrometer and thus ions can be selected from amongst the noise that may lead to correct identifications. This approach is used both to improve the limit of detection and to improve the dynamic range of an analysis. Another approach is to use higher thresholds, which acquires far fewer tandem MS spectra and risks missing low abundance peptides.5 In this study we evaluate the effects of using different ion abundance threshold values on peptide and protein identification using a linear ion trap-Orbitrap (LTQ-Orbitrap) and a linear ion trap (LTQ) mass spectrometer.

2. Experimental

2.1. Sample Preparation and Digestion

Yeast whole-cell lysate was prepared using a previously published protocol.6,7 Yeast cells were collected by centrifugation at 1000 g. The soluble fraction was denatured with 8 M urea in 100 mM Tris (pH 8.5), then reduced and alkylated with 5 mM Tris(2-carboxyethyl) phosphine hydrochloride (TCEP, Roche Applied Science, Palo Alto, CA) and 10 mM iodoacetamide (IAM, Sigma) in 100 mM Tris (pH 8.5). Digestion was performed in the presence of 5 mM calcium chloride (Sigma) using endoproteinase Lys-C (Roche Diagnostics, Indianapolis, IN) followed by sequencing grade trypsin (Promega, Madison, WI).8 The resulting peptide mixtures were acidified with 5% formic acid, and a ~5 μg aliquot was pressure-loaded onto an equilibrated reversed-phase (RP) column. The microcapillary column was constructed by slurry packing ~10 cm of C18 material (5 μm, 125 Å, Aqua, Phenomenex) into a 100 μm fused silica capillary, which had been previously pulled to a tip diameter of ~5 μm using a Sutter Instruments laser puller (Sutter Manufacturing, Novato, CA). Separations were performed on an Agilent 1100 quaternary HPLC (Palo Alto, CA). The buffer solutions used were 5% acetonitrile/0.1% formic acid (buffer A) and 80% acetonitrile/0.1% formic acid (buffer B). A 120 min gradient from 0–100% buffer B was used.

2.2. Mass Spectrometry Methods

Data-dependent tandem mass spectrometry (MS/MS) analysis was performed using an LTQ-Orbitrap hybrid mass spectrometer (ThermoFisher, San Jose, CA). Full MS spectra were acquired in profile mode, with a mass range of m/z 600–2000, either in the Orbitrap analyzer with resolution set at 15,000 (with preview scan mode disabled) or in the LTQ using the normal scan rate, followed by four MS/MS events in the linear ion trap. To prevent repetitive analysis, dynamic exclusion was enabled with a repeat count of 1, a repeat duration of 30 seconds, and an exclusion list size of 100. MS scan functions and HPLC solvent gradients were controlled by the XCalibur data system (ThermoFisher, San Jose, CA). All tandem mass spectra were collected using a normalized collision energy of 35% and an isolation window of 3 Daltons. One micro scan was applied for all experiments in the Orbitrap or LTQ. Maximum ion injection times for full scans were 500 ms in the Orbitrap and 50 ms in the LTQ, and 100 ms for MS/MS scans. AGC targets for the LTQ and Orbitrap were 3e4 and 5e5 (full scan) and 1e4 and 2e5 (MSn scan), respectively.
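As an illustration of how an ion abundance threshold interacts with the selection of four MS/MS events and with dynamic exclusion, the following minimal Python sketch picks precursors from one survey scan. It is not the instrument control software: the function name `select_precursors`, the `mz_tol` matching window, and the example peak list are hypothetical, and timing aspects of dynamic exclusion (repeat duration, list size) are left out.

```python
def select_precursors(peaks, threshold, excluded_mz, top_n=4, mz_tol=1.5):
    """Pick up to top_n precursors from one full MS scan (illustrative only).

    peaks       : list of (mz, intensity) tuples from the survey scan
    threshold   : minimum ion abundance required to trigger MS/MS
    excluded_mz : m/z values currently on the dynamic exclusion list
    mz_tol      : hypothetical half-width (in m/z) for matching the exclusion list
    """
    candidates = [
        (mz, intensity)
        for mz, intensity in peaks
        if intensity >= threshold
        and all(abs(mz - ex) > mz_tol for ex in excluded_mz)
    ]
    # Most intense ions are fragmented first.
    candidates.sort(key=lambda p: p[1], reverse=True)
    return candidates[:top_n]


# Hypothetical survey scan, made up for illustration.
scan = [(650.3, 8e3), (702.4, 2.5e4), (801.9, 4e5),
        (950.5, 1.2e4), (1203.7, 9e4), (1500.2, 3e4)]

low = select_precursors(scan, threshold=1e4, excluded_mz=[801.9])
high = select_precursors(scan, threshold=1e5, excluded_mz=[801.9])
```

With the low threshold, four ions are selected even though the most intense one is excluded; with the high threshold, nothing is selected in this scan, which mirrors the drop in acquired spectra discussed in the Results.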

2.3. Software

2.3.1 Spectral quality determination

Spectral quality was evaluated using a recently developed extension of a previously published algorithm.9 We define the measure of quality to be the fraction of b and y ions observed among the peaks of high intensity. More specifically, we define

$$\mathrm{Quality} = \frac{N_b + N_y}{2 \cdot \mathrm{Length} - 2},$$

where $N_b$ and $N_y$ are the number of b and y ion peaks, respectively, whose intensity ranks are less than 100, and Length is the number of amino acids in the peptide. This feature runs from 0.0 to 1.0, from no b and y ion peaks present among the top 100 peaks of a spectrum, to all b and y ions present among the top 100 peaks. The quality measure of spectra, as defined above, can be determined a posteriori, after the database searching algorithm has identified the matching peptide and the corresponding b and y ions.
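The quality measure can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' software. It assumes the denominator is the total number of possible b and y ions, 2·Length − 2 (a peptide of n residues has n − 1 b ions and n − 1 y ions), which is what makes the measure run from 0.0 to 1.0. It also assumes intensity ranks are 0-based, so that "less than 100" means "among the top 100 peaks". The function name and the example ranks are hypothetical.

```python
def spectral_quality(b_ranks, y_ranks, length, top=100):
    """Fraction of possible b and y ions found among the `top` most intense peaks.

    b_ranks, y_ranks : 0-based intensity ranks of the peaks matched to b and y ions
    length           : number of amino acids in the identified peptide
    """
    if length < 2:
        raise ValueError("peptide must have at least two residues")
    n_b = sum(1 for r in b_ranks if r < top)
    n_y = sum(1 for r in y_ranks if r < top)
    # A peptide of `length` residues has (length - 1) b ions and (length - 1) y ions.
    return (n_b + n_y) / (2 * length - 2)


# Hypothetical 10-residue peptide: 3 b ions and 4 y ions rank inside the top 100.
q = spectral_quality([3, 10, 50, 150], [0, 5, 20, 99, 100, 300], length=10)
```

Here `q` equals 7/18, since 7 of the 18 possible fragment ions fall within the top 100 peaks.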

We determined a predicted quality score (Q) based on two features that have been shown9 to be good predictors of spectral quality: Qdiffs, the likelihood that two peaks in the spectrum differ by the mass of an amino acid, and Qcomplements, the likelihood that a pair of peaks in the spectrum have complementary m/z-values summing to the mass of the parent ion. Both features are weighted by the intensities of the respective peaks, as described in the supplementary information.

Quality scores were calculated for the tandem mass spectra collected. All spectra, including those with very low scores, were then analyzed by the database searching algorithm, as described in the next section.

2.3.2 Database searching and analysis of tandem mass spectra

Full MS and tandem mass spectra were extracted from raw files,10 and the tandem mass spectra were searched against a Saccharomyces cerevisiae protein database. The database contained 5,873 protein sequences (the translations of all systematically named ORFs, downloaded as FASTA-formatted sequences from the Saccharomyces Genome Database, release of December 16, 2005) and 123 common contaminant proteins, for a total of 5,996 target database sequences. In order to calculate confidence levels and false discovery rates, a decoy database containing the reversed sequences of the 5,996 proteins was appended to the target database,11 and the SEQUEST2 algorithm was used to find the best matching sequences from the combined database.
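Building a combined target/decoy database by sequence reversal can be sketched as follows. This is a minimal illustration rather than the tool actually used; the `Reverse_` prefix, the function name, and the toy sequences are hypothetical.

```python
def append_reversed_decoys(targets, prefix="Reverse_"):
    """Return a new dict holding every target plus its reversed decoy.

    targets : dict mapping protein identifier -> amino acid sequence
    prefix  : hypothetical tag marking decoy entries
    """
    combined = dict(targets)
    for protein_id, sequence in targets.items():
        combined[prefix + protein_id] = sequence[::-1]
    return combined


# Toy sequences, made up for illustration.
toy_db = {"P1": "MKTAYIAK", "P2": "GSHMLEDPR"}
combined_db = append_reversed_decoys(toy_db)
```

The combined database has twice as many entries as the target database, so applying the same idea to 5,996 targets would give 11,992 searchable sequences.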

SEQUEST searches were done on an Intel Xeon 80-processor cluster running under the Linux operating system. The peptide mass search tolerance was set to 3 Da for spectra acquired on the LTQ instrument, and 50 ppm for spectra acquired on the hybrid LTQ-Orbitrap instrument. Since the peak selected for MS/MS analysis by the instrument control software is often not a monoisotopic ion, the search algorithm considers multiple isotopes, with a 50 ppm mass tolerance for each possible theoretical isotope peak. For comparison, we also performed a “mock” analysis of the LTQ-Orbitrap data, ignoring the accurate precursor mass information, and using a peptide mass search tolerance of 3 Da (in essence, treating the LTQ-Orbitrap data the same as LTQ data). The mass of the amino acid cysteine was statically modified by +57.02146 Da, to take into account the carboxyamidomethylation of the sample, and we considered a variable modification of +15.99491 Da (oxidation) on the amino acid methionine. No enzymatic cleavage conditions were imposed on the database search, so the search space included all candidate peptides whose theoretical mass fell within the mass tolerance window, regardless of their tryptic status.
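The isotope-aware precursor tolerance can be illustrated with a short Python sketch. It is not SEQUEST's implementation: the function names, the `max_isotopes` limit, and the example masses are hypothetical, and the isotope spacing is taken as the 13C − 12C mass difference.

```python
ISOTOPE_SPACING = 1.0033548  # Da, mass difference between 13C and 12C


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def matches_precursor(observed_mass, candidate_mono_mass,
                      tol_ppm=50.0, max_isotopes=3):
    """Return the matching isotope index (0 = monoisotopic), or None.

    max_isotopes is a hypothetical limit on how many isotope peaks are tried.
    """
    for k in range(max_isotopes + 1):
        theoretical = candidate_mono_mass + k * ISOTOPE_SPACING
        if abs(ppm_error(observed_mass, theoretical)) <= tol_ppm:
            return k
    return None


# A precursor picked on the first 13C isotope peak of a 1500.0000 Da peptide.
isotope_hit = matches_precursor(1501.0100, 1500.0000)
# A mass half a dalton off matches no isotope peak within 50 ppm.
no_hit = matches_precursor(1500.5000, 1500.0000)
```

At a mass of 1500 Da, 50 ppm corresponds to 0.075 Da, so a 3 Da window admits far more candidate peptides than the accurate-mass search does.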

The validity of peptide/spectrum matches was assessed in DTASelect12,13 using two SEQUEST-defined parameters, the cross-correlation score (XCorr) and normalized difference in cross-correlation scores (DeltaCN). For Orbitrap samples using the accurate precursor mass information, a third scoring parameter was included: DeltaMass, the absolute difference between the experimental precursor ion mass and the nearest theoretical isotope peak. The search results were grouped by charge state (+1, +2, +3, and +4), tryptic status (fully tryptic, half-tryptic, and non-tryptic), and modification status (modified and unmodified peptides), resulting in 24 distinct sub-groups. In each one of these sub-groups, the distribution of XCorr, DeltaCN, and DeltaMass values for (a) direct and (b) decoy database hits was obtained, then the direct and decoy subsets were separated by discriminant analysis. Outlier points in the two distributions (for example, matches with very low XCorr but very high DeltaCN) were discarded. Full separation of the direct and decoy subsets is not generally possible; therefore, the discriminant score cut-off was set to give a false discovery rate of 5%, estimated from the number of accepted decoy database peptides. This procedure was independently performed on each data subset, resulting in a false discovery rate independent of tryptic status, modification status, or charge state.
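The idea of choosing a score cut-off from accepted decoy hits can be sketched as below for a single sub-group. This is a simplified stand-in, not DTASelect's discriminant analysis: it uses one generic score per match, assumes scores are distinct, and estimates the false discovery rate as accepted decoys divided by accepted targets, which is one common convention and is an assumption here. The function name and the example scores are hypothetical.

```python
def score_cutoff_for_fdr(matches, max_fdr=0.05):
    """Lowest score cut-off whose accepted set stays within max_fdr.

    matches : list of (score, is_decoy) tuples for one sub-group
    Returns None if no cut-off satisfies the requested rate.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Assumed estimator: accepted decoys / accepted targets.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Made-up scores for illustration: True marks a decoy (reversed) hit.
example = [(5.0, False), (4.5, False), (4.0, False), (3.5, False),
           (3.0, False), (2.9, True), (2.5, False), (2.0, True)]
strict = score_cutoff_for_fdr(example, max_fdr=0.05)
loose = score_cutoff_for_fdr(example, max_fdr=0.25)
```

In the procedure described above, a cut-off of this kind would be derived separately for each of the 24 sub-groups.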

In addition, a minimum sequence length of 7 amino acid residues was required, and each protein on the list was supported by at least two peptide identifications, with a minimum sequence coverage of 5%. These additional requirements resulted in the elimination of most decoy database and false positive hits, as these tended to be overwhelmingly present as proteins identified by single peptide matches, or with very low sequence coverage. After this last filtering step, both the protein and peptide false discovery rates were reduced to below 0.5%.
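These protein-level requirements can be expressed as a simple filter. The sketch below is illustrative only: the source does not say how sequence coverage is computed or whether the two peptides must be distinct, so the code assumes coverage is the fraction of protein residues covered by accepted peptides and counts distinct sequences. The function name and the example peptides are hypothetical.

```python
def passes_protein_filter(protein_length, peptides,
                          min_pep_len=7, min_peptides=2, min_coverage=0.05):
    """Apply the minimum length, peptide count, and coverage requirements.

    peptides : list of (start, sequence) with 0-based start positions
    """
    kept = [(start, seq) for start, seq in peptides if len(seq) >= min_pep_len]
    # Assumption: the two supporting peptides must be distinct sequences.
    if len({seq for _, seq in kept}) < min_peptides:
        return False
    covered = set()
    for start, seq in kept:
        covered.update(range(start, start + len(seq)))
    return len(covered) / protein_length >= min_coverage


# Two made-up peptides of 8 and 9 residues cover 17 residues in total.
hits = [(10, "AAAAAAAK"), (50, "LLLLLLLLR")]
ok_300 = passes_protein_filter(300, hits)   # 17/300 is about 5.7%
ok_400 = passes_protein_filter(400, hits)   # 17/400 is about 4.3%
```

The same two peptides therefore support a 300-residue protein but not a 400-residue one, which shows how the coverage requirement penalizes long proteins with sparse evidence.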

This procedure was applied to each data set, to ensure a uniform identification standard. For this study, setting the same standard across the pool of experiments is more important than the particular choice of peptide and protein false discovery rates. We repeated the analysis allowing for higher (5%) or lower (0.1%) false discovery rates (data not shown), and, although the number of identifications changed depending on the choice of filtering criteria, we found that the overall trends and conclusions stayed the same.

2.3.3 Signal to noise evaluation in the full MS scans

A method to evaluate the signal to noise of precursor ions in MS scans that immediately precede identified tandem mass spectra was developed in house. For each MS spectrum, the peaks were ranked by intensity, and the bottom 10% of peaks were considered chemical noise. The 10% value can be user modified, and we tested several values, with similar results. The background noise level was then defined as the average intensity of these peaks, and the signal to noise of a precursor ion was defined as the ratio of that peak’s intensity to the background noise level.
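The signal to noise estimate described above translates directly into code. The following Python sketch is not the in-house program; the function names and the example intensities are hypothetical, and taking at least one peak as noise for very small spectra is an added assumption.

```python
def noise_level(intensities, noise_fraction=0.10):
    """Average intensity of the lowest `noise_fraction` of peaks in an MS scan."""
    if not intensities:
        raise ValueError("spectrum has no peaks")
    ranked = sorted(intensities)
    # Assumption: always use at least one peak, even for tiny spectra.
    n_noise = max(1, int(len(ranked) * noise_fraction))
    return sum(ranked[:n_noise]) / n_noise


def signal_to_noise(precursor_intensity, intensities, noise_fraction=0.10):
    """Ratio of a precursor's intensity to the background noise level."""
    noise = noise_level(intensities, noise_fraction)
    if noise == 0:
        raise ValueError("noise level is zero")
    return precursor_intensity / noise


# Made-up scan with 20 peaks: the two weakest (100 and 300) define the noise.
scan_intensities = [100, 300] + [1000 * i for i in range(1, 19)]
background = noise_level(scan_intensities)
sn = signal_to_noise(18000, scan_intensities)
```

For this example the background is 200 intensity units and the most intense peak has S/N = 90; changing `noise_fraction` corresponds to the user-modifiable 10% value mentioned in the text.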

The signal to noise was calculated for all acquired precursors of tandem mass spectra, in both the LTQ and Orbitrap analyzers, for different ion abundance threshold values.

3. Results and Discussion

The main goal of this study was to determine the effects of the ion abundance trigger threshold value on peptide and protein identification in shotgun proteomics. Intuitively, increasing the level of the ion abundance threshold should have a negative effect on the number of tandem mass spectra collected, as well as a positive effect on the overall quality of tandem mass spectra. Since both of these factors affect peptide identification in database searches, it is of significant interest to determine the threshold level(s) which result in optimal peptide and protein identification. It has been noted in the past that large numbers of tandem mass spectra remain unmatched in database searches (sometimes as many as 70–80%). The low identification rate is typically attributed to missed matches in database searches, suggesting that unmatched spectra may represent peptides with unanticipated modifications or sequence variations. It is also possible that the low success rate could simply be a result of setting threshold parameters low enough that the mass spectrometer is collecting tandem mass spectra predominantly from chemical noise. This study was designed to evaluate various ion abundance acquisition levels, to determine the appropriate level that minimizes acquisition of spectra from noise and maximizes matches to real peptide sequences.

3.1 Number of Tandem Mass Spectra Collected as a Function of Threshold Trigger Value

A typical choice for the ion abundance threshold value is at the level of chemical noise in the LC/MS/MS system for a given instrument. In this study we determined this value by a solvent blank analysis prior to acquisition of data from the sample. Data was collected using either the LTQ or the Orbitrap for MS scan acquisition, followed by tandem mass spectra acquired in the LTQ analyzer. Figure 1 shows the base peak chromatogram of solvent blank acquired in the LTQ analyzer: the upper part (a) of the figure shows the entire run, while the lower part (b) depicts the amplified baseline of the chromatogram between t = 22 minutes and t = 32 minutes. From Figure 1b, one can estimate the noise level in the LTQ full MS scan at 1e4 intensity units. Using this method, the level of noise in the LC/MS analyses of this study was determined to be approximately 1e4 and 1e5 intensity units for MS scans acquired in the LTQ and Orbitrap, respectively.

Figure 1
(a) Base peak chromatogram of solvent blank for a single phase run on the LTQ analyzer (the intensity of solvent impurity at about 3e5). (b) Amplified baseline of chromatogram between time points t = 21 minutes and t = 32 minutes. The average peak intensity ...

Starting from these levels, we collected data on the same sample at threshold values from 1e1 to 1e8 intensity units (only 1e1 to 1e7 results are presented in this paper, as the instruments were not able to acquire any MS2 spectra at the 1e8 threshold condition). Each measurement was made using a 5 μg aliquot of a trypsin digested soluble yeast cell lysate, and three replicate measurements were made at each threshold value using each analyzer for MS scans. Tables 1a and 1b show the average and standard deviation for the number of spectra acquired and identified as a function of threshold level, for the LTQ and Orbitrap analyzers (the number of spectra acquired and identified in each individual replicate experiment is presented in Supplementary Tables 1a and 1b). Figure 2 displays a comparison of the average number of spectra, over the 3 replicates, as a function of threshold value, for the LTQ and Orbitrap analyzers. We find that the number of tandem mass spectra acquired generally decreases with increasing threshold value. An ANOVA analysis of the number of spectra acquired as a function of the threshold value shows that, in general, the threshold value has a statistically significant effect on data acquisition (p-value ≪ 0.05). However, when the analysis is limited to those thresholds well below the noise level (1e1 to 1e3), there is no statistically significant change in the number of acquired spectra. For low threshold values, neither an ANOVA test nor pair-wise t-tests (presented in Table 3) of the number of spectra acquired as a function of the threshold value rejects the null hypothesis. This is an indication that the instrument can always find candidate precursors within the chemical noise, and reflects its inability to differentiate peptide precursor ions from chemical noise. In contrast, above the 1e4 threshold for LTQ and 1e5 for Orbitrap, a sharp drop in the number of tandem mass spectra collected is observed with each 10 fold increase in the threshold levels. Going from a 1e4 to a 1e5 threshold on the LTQ results in a very significant change in the number of acquired spectra (t-test, p = 0.00052 ≪ 0.05). Likewise, going from a 1e5 to a 1e6 threshold on the Orbitrap results in a very significant change (t-test, p = 0.000017 ≪ 0.05). Finally, at threshold levels of 1e6 and 1e7, for LTQ and Orbitrap respectively, very few tandem mass spectra are collected at all.

Figure 2
Comparison of the number of spectra acquired by the LTQ (red) and Orbitrap (blue) analyzers as a function of threshold trigger level. Each data point was obtained by averaging over 3 replicate experiments.

Table 1
Total number of acquired spectra, and the number of identified spectra, peptides, and proteins, as a function of threshold trigger values on the LTQ analyzer (a) and Orbitrap analyzer (b). Each experiment was performed in triplicate, and the average values ...

Table 3
Statistical analysis of the effect of threshold values for (a) the LTQ and (b) the Orbitrap: t-tests for the number of acquired spectra and number of identified spectra, peptides and proteins. Values listed in boldface show statistically significant ...

When comparing the two instruments, the LTQ has the upper hand at low threshold levels, due to its faster scan speed. It acquires approximately 20% more spectra than the Orbitrap when the threshold is set at or below 1e4 intensity units. However, the LTQ’s acquisition performance drops faster at higher threshold levels, where it collects significantly fewer spectra. Table 2 shows a statistical comparison summary (t-test p-values) of the two analyzers. The number of spectra collected by the two analyzers is significantly different (p < 0.05) for all threshold values except 1e5. For values lower than 1e5 the LTQ collects more spectra; for values higher than 1e5 the Orbitrap collects more spectra, with a cross-over region (statistically insignificant difference) around 1e5.

Table 2
Statistical comparison of the LTQ and Orbitrap analyzers: t-tests for the number of acquired spectra and number of identified spectra, peptides and proteins. Each p-value listed in the table is derived from a corresponding t-test using 3 replicate experiments ...

So far, the data suggests that setting the threshold level at approximately the noise level may be the optimal choice. Going too far above the noise level results in a sharp drop in the number of acquired spectra, while going too far below brings no apparent benefits. The following sections further analyze this problem, by examining the quality of the collected spectra and the resulting peptide identifications.

3.2 Quality of Tandem Mass Spectra Collected as a Function of Threshold Trigger Value

One possible effect of a low threshold value is the acquisition of lower quality tandem mass spectra. On one hand, such spectra could include spectra derived from chemical noise, which would hinder the acquisition of real peptide spectra as well as burden the computational system. In addition, this approach may also acquire low quality peptide spectra derived from low abundance precursor ions. These real, but poor quality peptide tandem mass spectra could be difficult to identify, because they typically have fewer sequence ions present, as well as poor signal to noise.

On the other hand, the acquisition of low abundant peptide spectra could potentially lead to increased peptide and protein identifications, if the identification software is capable of handling the lower overall spectral quality. For example, using the accurate precursor mass information derived from an Orbitrap full MS scan may result in a confident peptide identification, even though the tandem mass spectrum itself is of low quality.

To better understand the trade-offs associated with MS acquisition thresholds, we scored all tandem mass spectra collected at different threshold values with the quality score Q, as defined earlier in the paper. Quality scores were averaged for each of the three analyses at each threshold value. Figures 3a and 3b show the normalized quality score distributions for LTQ and Orbitrap analyzers, respectively. Regardless of which analyzer was used for survey scan acquisition, a shift in scores is observed as the threshold is increased, with a general trend of higher threshold values resulting in higher quality scores. An ANOVA analysis of the data confirms that the average quality score is significantly influenced by the threshold value (p-value ≪ 0.05 for either the LTQ or Orbitrap data). It is important to note that the quality scores are similar for the LTQ and Orbitrap analyzers. This is a reflection of the fact that the quality scores only measure the features of the tandem mass spectra, which are acquired in the LTQ analyzer in both methods, and not the features of the full MS scans.

Figure 3
Distribution of normalized quality scores for tandem mass spectra, as a function of threshold trigger level. (a) Precursor scan in the LTQ analyzer. (b) Precursor scan in the Orbitrap analyzer.

Defining good quality spectra as those whose quality score Q is larger than 0.2 (an empirical cut-off we typically use to filter out low quality spectra prior to database searching), we then display in Supplementary Figure 1 the percentage of good quality spectra acquired as a function of the threshold value. This shows that at threshold values higher than the noise level, most of the spectra acquired are of good quality, and therefore should be assigned confident identifications by the database searching and post-validation algorithms.

3.3 Signal to Noise Ratio as a Function of Threshold Trigger Value

In data dependent acquisition, precursor ions are typically selected for MS/MS analysis based on their intensities from MS scans. Because the chemical noise level changes throughout an LC analysis, we chose to investigate the distribution of signal to noise (S/N) of precursor ions for each of the different threshold values we analyzed. To accomplish this, we used a simple method to determine the S/N of every precursor ion selected for MS/MS analysis (see Section 2.3.3 for a description of the software).

Figures 4a and 4b show plots representing the signal to noise distributions of the precursor ions collected for each of the thresholds employed, for the LTQ and Orbitrap analyzers, respectively. The S/N distributions show an increasing trend with respect to the threshold value regardless of whether LTQ or Orbitrap MS scans were employed. The lower threshold trigger values (1e2 – 1e4) show similar signal to noise distributions; this illustrates the fact that the threshold has little effect on the average signal to noise ratio when it is set lower than the noise level. As the threshold is increased to 1e5 and 1e6, both data sets show a shift towards significantly higher S/N.

Figure 4
Distributions of signal to noise ratios (S/N) of the precursor ions for three replicate experiments in (a) the LTQ analyzer and (b) the Orbitrap analyzer, as a function of threshold trigger values.

One noticeable difference between the two data sets is that the measured S/N levels for the largest thresholds tended to be more accurate on the Orbitrap than on the LTQ. For example, when using the 1e7 threshold, which is approximately 100 times above the Orbitrap noise level, most of the S/N measurements were near or above 100. In contrast, the LTQ data obtained from the 1e6 threshold (also approximately 100 times higher than the noise level) shows most of the S/N measurements to be in the 20–30 range, with only a very few measurements approaching 100. It should be noted that changes in the noise level throughout the LC run would be expected to contribute to error in these types of comparisons, although we do not feel it is the major source of the differences between the two datasets. Instead, we feel the difference can most likely be attributed to the much larger ion capacity of the Orbitrap and the subsequent increase in intra-spectral dynamic range which has been shown to be in excess of 5000.14

3.4 Number of Peptide and Protein Identifications as a Function of Threshold Trigger Values

The overall goal of this study was to evaluate the effect of threshold trigger values on the efficacy of peptide and protein identification. Tables 1a and 1b and Supplementary Tables 1a and 1b show the total number of spectra collected, the number of spectra identified, and the number of peptides and proteins identified by the analyzers using different threshold values. Table 2 presents a summary comparison of the LTQ and Orbitrap analyzers, expressed as p-values derived from statistical t-tests for all the quantities calculated in Table 1, while Supplementary Tables 2a and 2b present the output of a series of t-tests comparing results for different pairs of threshold values. A graphical comparison of the number of peptides identified by the two analyzers is displayed in Figure 5. The data was analyzed using SEQUEST and DTASelect, as described in the Experimental section. Filtering parameters were dynamically and uniformly set such that both the protein and peptide false identification rates were below 0.5% for each data set. The results presented in these tables and figures are further discussed below.

Figure 5
Comparison of the total number of peptides confidently identified in the LTQ experiments (red) and the Orbitrap experiments (blue), as a function of threshold trigger level. Each data point was obtained by averaging over 3 replicate experiments.

Several conclusions stand out from these tables and figures: first, it is essential that the threshold trigger value not be set higher than the noise level of each respective analyzer. Increasing the threshold value significantly above the noise level results in a sharp drop in the number of spectra acquired. Although the spectra collected at high threshold values are of mostly high quality and their identification rate is very high, their quality does not make up for the lost quantity of spectra not acquired. On the other hand, the number of spectra, peptides and proteins identified does not significantly change as long as the threshold is set at or below the noise level. Tables 3a and 3b display t-test results for the LTQ and Orbitrap analyzers, respectively, showing that no significant changes in peptide or protein identification (p < 0.05) occur below 1e4 for the LTQ and below 1e5 for the Orbitrap. One-way ANOVA and Tukey tests confirm these findings (data not shown). In conclusion, while a single optimal value of the threshold may not exist, it is apparent that the entire range of threshold values below the noise level produces statistically similar outcomes in peptide and protein identification.

To find the “optimal” setting, other factors may be considered. For example, a user may avoid setting the threshold value extremely low (well below the noise level). This would only result, according to the data in Tables 1 and 3, in a marginal increase in the number of spectra acquired, without a corresponding increase in the number of identifications. Since there are no benefits in acquiring low-quality, unidentifiable spectra, and there are potential drawbacks (additional storage and perhaps additional data filtering required, depending on the bioinformatics pipeline), we conclude that a threshold value set at or just below the noise level should be optimal.

Second, the overall high-throughput shotgun proteomics performance of the LTQ analyzer is comparable to that of the more sophisticated Orbitrap analyzer. The main advantage of the LTQ is its higher scan speed, which allows it to acquire approximately 20% more spectra. All other things being equal, the LTQ would identify more peptides than the Orbitrap. Supplementary Table 1 shows the results of the mock analysis of the Orbitrap-LTQ data set, in which the high mass accuracy of the Orbitrap was ignored and the data were analyzed using the same parameters and conditions as those for the LTQ data. The numbers of spectra, peptides, and proteins identified were indeed significantly lower than those for the LTQ analysis.

However, the high mass accuracy of the Orbitrap allows it to confidently identify peptides with very low correlation scores. Figure 6 shows the distribution of cross-correlation-based Z-scores [15] and mass deviation values for peptide matches to spectra whose precursors were isolated in the Orbitrap analyzer. The Orbitrap confidently identifies many peptides whose database searching scores are poor, but whose theoretical masses are within a few ppm of the experimental values (the bottom end of the high-confidence region). Since the discriminant analysis on the Orbitrap data uses an extra dimension (mass deviation), a lower database searching score is needed to achieve a false discovery rate of less than 0.5%. This advantage of the Orbitrap analyzer allows it to close the gap with the LTQ analyzer. Table 2 and Figure 5 illustrate this comparison, showing that the two instruments offer similar performance in high-throughput peptide identification.
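The effect of the extra mass-deviation dimension can be sketched with a simple target-decoy calculation. This is not the discriminant analysis used in the study: it substitutes plain rectangular cuts (a minimum score plus an optional ppm window), estimates the false discovery rate as decoy matches divided by forward matches (one common convention, assumed here), and uses a hypothetical handful of peptide-spectrum matches. All names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PSM:
    score: float    # hypothetical database search score (higher is better)
    ppm: float      # precursor mass deviation in ppm
    is_decoy: bool  # True for matches to the reversed (decoy) database


def decoy_fdr(psms: List[PSM], min_score: float,
              max_abs_ppm: Optional[float] = None) -> Tuple[float, int]:
    """Return (estimated FDR, number of forward matches) after filtering."""
    kept = [p for p in psms
            if p.score >= min_score
            and (max_abs_ppm is None or abs(p.ppm) <= max_abs_ppm)]
    decoys = sum(p.is_decoy for p in kept)
    targets = len(kept) - decoys
    return (decoys / targets if targets else 0.0), targets


def lowest_cutoff(psms: List[PSM], max_fdr: float,
                  max_abs_ppm: Optional[float] = None) -> Optional[float]:
    """Lowest score cutoff whose estimated FDR stays within max_fdr."""
    best = None
    for cutoff in sorted({p.score for p in psms}, reverse=True):
        fdr, targets = decoy_fdr(psms, cutoff, max_abs_ppm)
        if targets > 0 and fdr <= max_fdr:
            best = cutoff
    return best


# HYPOTHETICAL matches: decoys tend to have large mass deviations.
EXAMPLE_PSMS = [
    PSM(5.0, 1.0, False), PSM(4.0, -2.0, False), PSM(3.0, 0.5, False),
    PSM(2.5, 40.0, True), PSM(2.0, 1.5, False), PSM(1.5, -35.0, True),
    PSM(1.0, 2.0, False),
]

print(lowest_cutoff(EXAMPLE_PSMS, max_fdr=0.005))                   # score only
print(lowest_cutoff(EXAMPLE_PSMS, max_fdr=0.005, max_abs_ppm=5.0))  # score + ppm window
```

In this toy data set the ppm window removes the decoy matches outright, so lower-scoring forward matches can be accepted at the same false discovery rate.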

Figure 6
Distribution of cross-correlation-based Z-scores and mass deviation values for peptide matches to spectra whose precursors were isolated in the Orbitrap analyzer. The black dots represent forward database matches, while the red dots represent decoy (reverse) ...

Conclusion

An evaluation of the effect of the threshold setting used to trigger data dependent acquisition has been performed on a complex peptide mixture using single dimension liquid chromatography on an LTQ-Orbitrap mass spectrometer. In this study we kept the experimental conditions under a fixed standard protocol in order to focus on the variation of a single parameter (the acquisition threshold). However, it is important to note that other factors, such as protein loading amount or concentration, may also affect the optimal instrument threshold, since the signal-to-noise ratio (S/N) would depend on the sample concentration.

We analyzed the data by determining the number of tandem mass spectra acquired, the quality of the spectra obtained, the signal-to-noise ratio distribution and the number of peptides and proteins identified. We find that the number of tandem mass spectra obtained is relatively constant at threshold values set at or below the noise level. This may reflect the fact that data dependent acquisition is attempting to acquire a tandem mass spectrum at every opportunity. By examining the spectra in terms of both spectral quality measures and signal-to-noise ratios, we observe that the quality of tandem mass spectra is poorer at lower threshold values. We find that the optimal threshold setting for the best balance between the quantity and quality of spectra collected is at the respective noise levels of the LTQ and Orbitrap analyzers. Finally, we conclude that, for high-throughput protein identification, the two analyzers perform similarly well: the higher scan speed of the LTQ allows it to acquire more spectra, while the higher mass accuracy of the Orbitrap allows a higher proportion of the acquired spectra to be correctly identified.

Supplementary Material

Supplementary Files

Acknowledgments

The authors thank James Wohlschlegel and Jeffrey R. Johnson for their valuable comments and discussions, and thank James Wohlschlegel for the yeast whole-cell lysate sample. C.C.L.W. is supported by a National Research Grant (NIH) No. P41 RR011823-10. D.C. is supported by NIH Grant No. 5R01-MH067880. J.D.V. is supported by an NIH Service Award fellowship. T.X. is supported by NIH Grant No. DE016267. J.R.Y. is supported by NIH Grants No. RR011823 and 5R01-MH067880.

References

1. Link AJ, Eng J, Schieltz DM, Carmack E, Mize GJ, Morris DR, Garvik BM, Yates JR., III Direct Analysis of Protein Complexes Using Mass Spectrometry. Nat Biotechnol. 1999;17:676–682. [PubMed]
2. Eng J, McCormack A, Yates JR., III An Approach to Correlate MS/MS Data to Amino Acid Sequences in a Protein Database. J Am Soc Mass Spectrom. 1994;5:976–989. [PubMed]
3. Stahl DC, Swiderek KM, Davis MT, Lee TD. Data-Controlled Automation of Liquid Chromatography/Tandem Mass Spectrometry Analysis of Peptide Mixtures. J Am Soc Mass Spectrom. 1996;7:523–540. [PubMed]
4. Wenner BR, Lynn BC. Factors That Affect Ion Trap Data-Dependent MS/MS in Proteomics. J Am Soc Mass Spectrom. 2004;15:150–157. [PubMed]
5. Mann M, Meng CK, Fenn JB. Interpreting Mass Spectra of Multiply Charged Ions. Anal Chem. 1989;61:1702–1708.
6. Washburn MP, Ulaszek R, Deciu C, Schieltz DM, Yates JR., III Analysis of Quantitative Proteomic Data Generated via Multidimensional Protein Identification Technology. Anal Chem. 2002;74:1650–1657. [PubMed]
7. MacCoss MJ, Wu CC, Lui HB, Sadygov R, Yates JR., III A Correlation Algorithm for the Automated Quantitative Analysis of Shotgun Proteomics Data. Anal Chem. 2003;75:6912–6921. [PubMed]
8. McDonald WH, Ohi R, Miyamoto DT, Mitchison TJ, Yates JR., III Comparison of Three Directly Coupled HPLC MS/MS Strategies for Identification from Complex Mixtures: Single-Dimension LC-MS/MS, 2-phase MudPIT, and 3-phase MudPIT. Int J Mass Spectrom. 2002;219:245–251.
9. Bern M, Goldberg D, McDonald WH, Yates JR., III Automatic Quality Assessment of Peptide Tandem Mass Spectra. Bioinformatics. 2004;20(Suppl 1):149–154. [PubMed]
10. McDonald WH, Tabb DL, Sadygov RG, MacCoss MJ, Venable J, Graumann J, Johnson JR, Cociorva D, Yates JR., III MS1, MS2, and SQT – Three Unified, Compact, and Easily Parsed File Formats for the Storage of Shotgun Proteomic Spectra and Identifications. Rapid Commun Mass Spectrom. 2004;18:2162–2168. [PubMed]
11. Peng J, Elias JE, Thoreen CC, Licklider LJ, Gygi SP. Evaluation of Multidimensional Chromatography Coupled with Tandem Mass Spectrometry (LC/LC-MS/MS) for Large-scale Protein Analysis: the Yeast Proteome. J Proteome Res. 2003;2:43–50. [PubMed]
12. Cociorva D, Tabb DL, Yates JR., III Validation of Tandem Mass Spectrometry Database Search Results Using DTASelect. Curr Protocols Bioinformatics. 2007 Jan;Chapter 13(Unit 13.4) [PubMed]
13. Tabb DL, McDonald WH, Yates JR., III DTASelect and Contrast: Proteomic Tools for Filtering, Summarizing, and Comparing Tandem Mass Spectrometry Results. J Proteome Res. 2002;1:21–36. [PMC free article] [PubMed]
14. Venable JD, Wohlschlegel J, McClatchy DB, Park SK, Yates JR., III Relative Quantification of Stable Isotope Labeled Peptides Using a Linear Ion Trap-Orbitrap Hybrid Mass Spectrometer. Anal Chem. 2007;79:3056–3064. [PMC free article] [PubMed]
15. Xu T, Venable JD, Park SK, Cociorva D, Lu B, Liao L, Wohlschlegel J, Hewel J, Yates JR., III ProLuCID, a fast and sensitive tandem mass spectra-based protein identification program. Mol Cell Prot. 2006;5(10):S174–S174. 671. Suppl S.