Mol Cell Proteomics. 2015 February; 14(2): 405–417.
Published online 2014 November 30. doi:  10.1074/mcp.O114.041376
PMCID: PMC4350035

Preprocessing Significantly Improves the Peptide/Protein Identification Sensitivity of High-resolution Isobarically Labeled Tandem Mass Spectrometry Data*

Quanhu Sheng,§** Rongxia Li,** Jie Dai,** Qingrun Li, Zhiduan Su, Yan Guo,§ Chen Li, Yu Shyr,§ and Rong Zeng

Abstract

Isobaric labeling techniques coupled with high-resolution mass spectrometry have been widely employed in proteomic workflows requiring relative quantification. For each high-resolution tandem mass spectrum (MS/MS), isobaric labeling techniques can be used not only to quantify the peptide from different samples by reporter ions, but also to identify the peptide it is derived from. Because the ions related to isobaric labeling may act as noise in database searching, the MS/MS spectrum should be preprocessed before peptide or protein identification. In this article, we demonstrate that there are a lot of high-frequency, high-abundance isobaric related ions in the MS/MS spectrum, and removing isobaric related ions combined with deisotoping and deconvolution in MS/MS preprocessing procedures significantly improves the peptide/protein identification sensitivity. The user-friendly software package TurboRaw2MGF (v2.0) has been implemented for converting raw TIC data files to mascot generic format files and can be downloaded for free from https://github.com/shengqh/RCPA.Tools/releases as part of the software suite ProteomicsTools. The data have been deposited to the ProteomeXchange with identifier PXD000994.

Mass spectrometry-based proteomics has been widely applied to investigate protein mixtures derived from tissue, cell lysates, or body fluids (1, 2). Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS)1 is the most popular strategy for analyzing protein/peptide mixtures in shotgun proteomics (3). Large-scale protein/peptide mixtures are separated by liquid chromatography followed by online detection by tandem mass spectrometry. The capabilities of proteomics rely greatly on the performance of the mass spectrometer. With the improvement of MS technology, proteomics has benefited significantly from high resolution and excellent mass accuracy (4). In recent years, based on the higher efficiency of higher energy collision dissociation (HCD), a new “high–high” strategy (high-resolution MS as well as MS/MS (tandem MS)) has been applied instead of the “high–low” strategy (high-resolution MS, e.g. in the Orbitrap, and low-resolution MS/MS, e.g. in the ion trap) to obtain high quality MS/MS data as well as full MS data in shotgun proteomics. Both full MS scans and MS/MS scans can be performed, and the whole cycle time of MS detection is compatible with the chromatographic time scale (5).

High-resolution measurement is one of the most important features in mass spectrometric applications. In the high–high strategy, high-resolution and accurate spectra are obtained in MS/MS scans as well as full MS scans, which makes isotopic peaks distinguishable from one another and thus enables the easy calculation of precise charge states and monoisotopic masses. During an LC-MS/MS experiment, a multiply charged precursor ion (peptide) is usually isolated and fragmented, and fragment ions in multiple charge states are then generated and collected. After full extraction of peak lists from the original tandem mass spectra, the commonly used search engines (e.g. Mascot (6), Sequest (7)) have no capability to distinguish isotopic peaks or recognize charge states, so every product ion is considered under all charge-state hypotheses during the database search for protein identification. These multiple charge states of fragment ions and their isotopic cluster peaks can be incorrectly assigned by the search engine, which can cause false peptide identifications. To overcome this issue, the high-resolution MS/MS spectra must be preprocessed before they are submitted for identification. There are usually two major preprocessing steps used for high-resolution MS/MS data: deisotoping and deconvolution (8, 9). Deisotoping removes all peaks of an isotopic cluster except the monoisotopic peak. Deconvolution translates multiply charged ions to singly charged ions and accumulates the intensity of each fragment ion by summing the intensities from all of its charge states. After these two preprocessing steps, the resulting spectra are simpler and cleaner and allow more precise database searching and more accurate bioinformatics analysis.
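The two steps described above can be sketched as follows. This is a minimal illustration, not the algorithm implemented in TurboRaw2MGF: it assumes every centroided peak already carries a charge assignment (0 meaning unknown), uses a fixed tolerance in Daltons, and takes the isotope spacing as 1.003355 Da divided by the charge. The `Peak` tuple and both function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical peak record; charge 0 means "charge unknown".
Peak = namedtuple("Peak", "mz intensity charge")

PROTON = 1.007276       # mass of a proton in Da (assumed value)
ISOTOPE_GAP = 1.003355  # 13C - 12C mass difference in Da (assumed value)


def deisotope(peaks, tol=0.01):
    """Keep only the first (monoisotopic) peak of each isotopic series."""
    peaks = sorted(peaks, key=lambda p: p.mz)
    is_isotope = [False] * len(peaks)
    kept = []
    for i, p in enumerate(peaks):
        if is_isotope[i]:
            continue
        kept.append(p)
        if p.charge < 1:
            continue  # cannot predict isotope spacing without a charge
        expected = p.mz + ISOTOPE_GAP / p.charge
        for j in range(i + 1, len(peaks)):
            q = peaks[j]
            if q.mz > expected + tol:
                break
            if (not is_isotope[j] and q.charge == p.charge
                    and abs(q.mz - expected) <= tol):
                is_isotope[j] = True
                expected = q.mz + ISOTOPE_GAP / p.charge
    return kept


def deconvolute(peaks, tol=0.01):
    """Convert every peak to its singly charged m/z and sum coinciding peaks."""
    singly = sorted(
        (p.mz * max(p.charge, 1) - (max(p.charge, 1) - 1) * PROTON, p.intensity)
        for p in peaks
    )
    merged = []
    for mz, intensity in singly:
        if merged and mz - merged[-1][0] <= tol:
            merged[-1][1] += intensity  # same fragment seen in another charge state
        else:
            merged.append([mz, intensity])
    return [Peak(mz, intensity, 1) for mz, intensity in merged]


spectrum = [
    Peak(500.0, 100.0, 2), Peak(500.5017, 60.0, 2), Peak(501.00336, 20.0, 2),
    Peak(998.992724, 50.0, 1),
]
clean = deconvolute(deisotope(spectrum))
```

In this toy spectrum the doubly charged cluster collapses to its monoisotopic peak, which is then merged with the singly charged peak of the same fragment, so `clean` holds a single peak whose intensity is the sum of the two.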

With the capacity to analyze multiple samples simultaneously, stable isotope labeling approaches have been widely used in quantitative proteomics. Stable isotope labeling approaches are categorized as metabolic labeling (SILAC, stable isotope labeling by amino acids in cell culture) and chemical labeling (10, 11). Peptides labeled by the SILAC approach are quantified by precursor ions in full MS spectra, whereas peptides that have been isobarically labeled by chemical means are quantified by reporter ions in MS/MS spectra. There are two similar isobaric chemical labeling methods: (1) isobaric tag for relative and absolute quantification (iTRAQ), and (2) tandem mass tag (TMT) (12, 13). These reagents contain an amino-reactive group that specifically reacts with N-terminal amino groups and epsilon-amino groups of lysine residues to label digested peptides in a typical shotgun proteomics experiment. There are four different channels of isobaric tags: TMT two-plex, iTRAQ four-plex, TMT six-plex, and iTRAQ eight-plex (12–16). The number before “plex” denotes the number of samples that can be analyzed simultaneously by the mass spectrometer. Peptides labeled with different isotopic variants of the tag show identical or similar mass and appear as a single peak in full scans. This single peak may be selected for subsequent MS/MS analysis. In an MS/MS scan, the masses of the reporter ions (114 to 117 for iTRAQ four-plex, 113 to 121 for iTRAQ eight-plex, and 126 to 131 for TMT six-plex upon CID or HCD activation) are associated with the corresponding samples, and their intensities represent the relative abundances of the labeled peptides. Meanwhile, the other ions in the MS/MS spectra can be used for peptide identification. Because of the multiplexing capability, isobaric labeling methods combined with bottom-up proteomics have been widely applied for accurate quantification of proteins on a global scale (14, 17–19). Although mostly associated with peptide labeling, these isobaric labeling methods have also been applied at the protein level (20–23).

For the proteomic analysis of isobarically labeled peptides/proteins with the “high–high” MS strategy, the common consensus is that accurate reporter ions contribute to more accurate quantification. However, there is no evidence to show how the ions related to isobaric labeling affect peptide/protein identification or what preprocessing steps should be taken for high-resolution isobarically labeled MS/MS spectra. To demonstrate the effectiveness and importance of preprocessing, we examined how combinations of preprocessing steps improved peptide/protein identification sensitivity in database searching. Several combinations of data preprocessing steps were applied for high-throughput data analysis, including deisotoping to keep only monoisotopic mass peaks, deconvolution of ions with multiple charge states, and preservation of the top 10 peaks in every 100 Dalton mass range. After systematic analysis of high-resolution isobarically labeled spectra, we further processed the spectra and removed interfering ions that were not related to the peptide. Our results suggested that preprocessing isobarically labeled high-resolution tandem mass spectra significantly improved the peptide/protein identification sensitivity.

EXPERIMENTAL PROCEDURES

Sample Preparation

The Goto-Kakizaki (GK) rat liver tissue was mixed with SDT-lysis buffer (2% SDS, 0.1 M DTT, and 0.1 M Tris-HCl, pH 7.6) and then heated for 5 min at 100 °C. After that, the tissue layers were cooled to room temperature, sonicated for 60 s at 100 W, and then centrifuged at 16,000 × g for 30 min at 20 °C to remove cell debris. The protein concentration was determined by measuring tryptophan fluorescence as described (24). Briefly, 1 μl of sample or tryptophan standard (100 ng/μl) was added to 3 ml of 8 M urea buffer (8 M urea and 20 mM Tris-HCl, pH 7.6). Fluorescence was excited at 295 nm and measured at 350 nm. The slits were set at 10 nm.

Six hundred micrograms of liver tissue from GK rat was digested by the FASP procedure as described (25) with small modifications. Each sample was transferred to a 10k filter (Pall Corporation, Port Washington, NY) and centrifuged at 10,000 × g for 20 min at 20 °C. 200 μl of UA buffer (8 M urea and 0.1 M Tris-HCl, pH 8.5) was added and centrifuged at 10,000 × g for 20 min again. This step was repeated once. Then, the concentrate was mixed with 100 μl of 50 mM IAA in UA buffer and incubated for an additional 40 min at room temperature in darkness. After that, IAA was removed by centrifugation at 10,000 × g for 20 min. Following dilution with 200 μl of UA buffer and centrifugation twice, 200 μl of 200 mM triethylammonium bicarbonate (TEAB) buffer (pH 8.5) was added and centrifuged at 10,000 × g for 20 min. This step was repeated four times. Finally, 100 μl of 50 mM TEAB buffer (pH 8.5) and trypsin (1:50, enzyme to protein) were added to the filter, and after 4 h, another 50 μg of trypsin was added. The samples were digested for 20 h at 37 °C and peptides were collected by centrifugation at 16,000 × g. To increase the yield of peptides, the filter was washed twice with 500 μl of 0.5 M TEAB buffer (pH 8.5). The peptide solutions were dried in a vacuum concentrator.

The trypsin digestion of 100 μg protein from each sample was processed as described elsewhere. iTRAQ labeling was done following the manufacturer's instructions (AB SCIEX, Foster City, CA). Briefly, for each four- or eight-plex experiment, 100 μg of dried peptide mixture powder from each digested sample was reconstituted with 30 μl of 0.5 mM TEAB buffer (pH 8.5). Each peptide solution was labeled at room temperature for 2 h with one iTRAQ reagent vial (four-plex mass tag 114, 115, 116, 117 or eight-plex mass tag 113, 114, 115, 116, 117, 118, 119, 121) previously reconstituted with 70 μl of anhydrous acetonitrile (ACN). After 2 h, 100 μl of ddH2O was added to each tube to quench the iTRAQ reaction, followed by incubation at room temperature for 30 min. The contents of all iTRAQ reagent-labeled sample tubes were combined into one tube for the four-plex and eight-plex experiments, respectively. Then, labeled samples were dried down by evaporation in a SpeedVac to obtain a brown pellet. 100 μl of water was added to the tube and the sample was dried completely. Prior to MS analysis, samples were desalted on an Empore C18 47 mm Disk (3M). Just prior to nano-LC, the fractions were resuspended in 20 μl of H2O with 0.1% (v/v) TFA.

LC-MS/MS Analysis

The reverse phase-high performance liquid chromatography (RP-HPLC) separation was achieved on an UltiMate 3000 RSLC nanoLC System (Dionex, now Thermo Fisher Scientific) equipped with a self-packed tip column (75 μm × 240 mm; C18, 1.9 μm) using a 180 min gradient at a flow rate of 150 nl/min. An LTQ-Orbitrap Velos instrument (Thermo Fisher Scientific) was operated in data-dependent mode. MS full scans were acquired in the range m/z 300–2000. The mass spectrometer was set so that each full MS scan was followed by MS/MS of the ten most intense ions with charge ≥ +2, with the following Dynamic Exclusion™ settings: repeat counts, 1; repeat duration, 30 s; exclusion duration, 180 s. The normalized collision energy for MS2 was 45.0%. Full MS scans and MS/MS scans were acquired at a resolution of 30,000 in profile mode and 7500 in centroid mode respectively, with the lock mass option enabled for the 445.120025 ion. Data were acquired using Xcalibur software.

b/y Free Windows

b/y free windows are two mass windows of a specific mass spectrum in which no B ion or Y ion can occur. Assume that the mass of the isobaric tag is M, that trypsin was used as the protease, and that the isobaric tag is attached to both the N terminus of the peptide and lysine (K). For a spectrum with singly charged precursor mass MH+, the b/y free windows of that spectrum can then be calculated as below. Because only fully tryptic peptides are considered in data analysis, the last amino acid of the peptide will be either arginine (R) with mass 156 or lysine with mass 128. Given that glycine (G) is the smallest amino acid with mass 57, the minimum and maximum masses of B and Y ions can be calculated by formulas (1–4):

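$$\min(B) = M + G + H \qquad (1)$$

$$\max(B) = MH^{+} - \min(R,\ K + M) - H_2O \qquad (2)$$

$$\min(Y) = \min(R,\ K + M) + H_2O + H \qquad (3)$$

$$\max(Y) = MH^{+} - M - G \qquad (4)$$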

where H2O is the mass of water and H is the mass of hydrogen. Then, the b/y free window in the low mass range is from 0 to minimum (minimum (B), minimum (Y)) and the b/y free window in the high mass range is from maximum (maximum (B), maximum (Y)) to infinity.
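As an illustration of the window calculation, the sketch below computes both b/y free windows for a given precursor. It rests on several assumptions that are not stated in the text: the smallest b ion is the tagged N-terminal glycine, the smallest y ion is the lighter of an untagged arginine and a tagged lysine, and the largest b and y ions are the complements of the smallest y and b ions. The monoisotopic residue masses, the proton mass used for H, and the tag masses (144.1021 for iTRAQ four-plex, 304.2054 for iTRAQ eight-plex) are commonly quoted values supplied here for illustration, and the function name is hypothetical.

```python
# Monoisotopic masses in Da (assumed values, not taken from the article).
G, K, R = 57.02146, 128.09496, 156.10111   # glycine, lysine, arginine residues
H2O, H = 18.01056, 1.00728                 # water, proton

ITRAQ4_TAG = 144.1021   # assumed tag mass for iTRAQ four-plex
ITRAQ8_TAG = 304.2054   # assumed tag mass for iTRAQ eight-plex


def by_free_windows(mh, tag):
    """Return (low_upper, high_lower) for singly charged precursor mass mh.

    The low window is [0, low_upper) and the high window is (high_lower, inf).
    """
    c_term = min(R, K + tag)        # lightest C-terminal residue (lysine carries the tag)
    min_b = tag + G + H             # b1 of an N-terminally tagged glycine
    max_b = mh - c_term - H2O       # complement of the smallest y1 ion
    min_y = c_term + H2O + H        # smallest y1 ion
    max_y = mh - tag - G            # complement of the smallest b1 ion
    return min(min_b, min_y), max(max_b, max_y)


low4, high4 = by_free_windows(1500.0, ITRAQ4_TAG)
low8, high8 = by_free_windows(1500.0, ITRAQ8_TAG)

# Is the intact tag ion (Label+) inside the low mass b/y free window?
label4_in_low = ITRAQ4_TAG + H < low4
label8_in_low = ITRAQ8_TAG + H < low8
```

Under these assumptions the low window ends at the y1 ion of arginine for both reagents, so the four-plex tag ion falls inside the window while the eight-plex tag ion does not.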

Ion Frequency and Abundance Analysis

Only the spectra with precursor charges 2, 3, and 4 were used to detect high frequency ions. The ion frequency and ion abundance distribution in each sample were generated by the software “Raw Ion Frequency Statistic Builder,” which is also part of ProteomicsTools. The charge, mass to charge ratio (m/z), and abundance of each ion were extracted from each MS/MS spectrum through Thermo's MS File Reader interface. The abundance of the ions in each MS/MS spectrum was normalized to the range [0..1]. Ions with relative abundance less than 0.01 were discarded. All remaining ions were deconvoluted to the corresponding singly charged ions by formula (5). Ions without charge information were treated as singly charged.

$$M_{1+} = (m/z) \times z - (z - 1) \times H \qquad (5)$$

where H is the mass of hydrogen.
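The normalization, abundance filter, and charge conversion described above can be sketched as follows. This is an illustration only: it assumes that normalization means dividing by the most abundant ion of the spectrum, that the conversion to a singly charged ion is (m/z) × z − (z − 1) × H, and it uses the proton mass for H. The function names are hypothetical.

```python
H = 1.00728  # Da; proton mass used for H (assumption)


def to_singly_charged(mz, charge):
    """Convert an ion to its singly charged m/z; unknown charge is treated as 1."""
    z = charge if charge and charge > 0 else 1
    return mz * z - (z - 1) * H


def preprocess_ions(ions, min_relative=0.01):
    """ions: list of (mz, abundance, charge) tuples from one MS/MS spectrum.

    Returns a list of (singly_charged_mz, relative_abundance).
    """
    if not ions:
        return []
    top = max(abundance for _, abundance, _ in ions)
    if top <= 0:
        return []
    result = []
    for mz, abundance, charge in ions:
        relative = abundance / top      # scale to [0, 1] (assumed: relative to base peak)
        if relative >= min_relative:    # discard ions below 0.01
            result.append((to_singly_charged(mz, charge), relative))
    return result


example = preprocess_ions([(500.0, 200.0, 2), (300.0, 1.0, 1), (400.0, 100.0, 0)])
```

Here the doubly charged ion at m/z 500.0 is moved to about 998.9927, the ion at 300.0 is dropped because its relative abundance is 0.005, and the ion without charge information stays at 400.0.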

Ions in different deconvoluted spectra whose masses differed by less than 20 parts per million (ppm) were considered identical. The ion frequency and the average relative abundance of each ion were calculated from all the MS/MS spectra in the sample. Ions with frequency larger than 0.3 and average relative abundance larger than 0.05 were defined as high frequency ions and classified into five categories: “Rep+,” “Label+,” “Y1,” “b/y free,” and “Unknown.” “Rep+” denotes that an ion is a reporter ion. “Label+” denotes that an ion is an isobaric tag ion with both reporter group and balance group. “Y1” denotes that an ion is a first Y series ion. Because trypsin was used in the sample preparation, a Y1 ion was produced from either lysine (K) or arginine (R). “b/y free” denotes that the mass of the ion is located in the b/y free windows of that spectrum. All other ions belonged to the “Unknown” category. An ion within one of the first four categories (Rep+, Label+, Y1, and b/y free) was considered annotated. For each deconvoluted tandem mass spectrum (forward spectrum), a backward spectrum was generated by subtracting the mass of each forward ion from the mass of the precursor. The backward ions were filtered and annotated in the same fashion as the forward ions, except that ions with mass equal to “Label+” were marked as “Precursor-Label+.” “Precursor-Label+” denotes a precursor ion without the isobaric tag. The ions annotated as Rep+, Label+, and Precursor-Label+ are not related to the peptide and therefore can be confidently removed during data preprocessing. The ions annotated as b/y free in the low mass range are very likely not related to the peptide either, but it is still possible that they are actually multiply charged ions that lack charge information in the spectrum.
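The frequency analysis above can be sketched as follows. This is a simplified illustration, not the Raw Ion Frequency Statistic Builder: it groups ions greedily by sorted mass with the first mass of each group as the anchor, counts each spectrum once per group, and averages abundance over the observations in the group. Those choices, and all function names, are assumptions.

```python
def backward_spectrum(precursor_mass, spectrum):
    """Mirror a forward spectrum: precursor mass minus each forward ion mass."""
    return [(precursor_mass - mass, abundance) for mass, abundance in spectrum]


def ion_statistics(spectra, ppm=20.0):
    """spectra: list of spectra, each a list of (mass, relative_abundance).

    Returns a list of (mean_mass, frequency, mean_relative_abundance).
    """
    ions = sorted(
        (mass, abundance, index)
        for index, spectrum in enumerate(spectra)
        for mass, abundance in spectrum
    )
    groups = []
    for mass, abundance, index in ions:
        if groups and (mass - groups[-1]["anchor"]) / groups[-1]["anchor"] * 1e6 <= ppm:
            group = groups[-1]          # within 20 ppm of the group anchor
        else:
            group = {"anchor": mass, "masses": [], "abundances": [], "spectra": set()}
            groups.append(group)
        group["masses"].append(mass)
        group["abundances"].append(abundance)
        group["spectra"].add(index)
    total = len(spectra)
    return [
        (sum(g["masses"]) / len(g["masses"]),
         len(g["spectra"]) / total,
         sum(g["abundances"]) / len(g["abundances"]))
        for g in groups
    ]


def high_frequency_ions(stats, min_frequency=0.3, min_abundance=0.05):
    """Apply the thresholds from the text: frequency > 0.3, abundance > 0.05."""
    return [s for s in stats if s[1] > min_frequency and s[2] > min_abundance]


spectra = [
    [(145.109, 0.5), (300.0, 0.02)],
    [(145.110, 0.7)],
    [(145.1085, 0.6), (500.0, 1.0)],
    [(600.0, 1.0)],
]
high = high_frequency_ions(ion_statistics(spectra))
```

In this toy sample only the ion near 145.109 is seen in more than 30% of the spectra, so it is the only entry in `high`.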

Data Preprocessing

The tandem mass spectra were extracted by TurboRaw2MGF (v1.3.4) for database searching. Four fixed criteria were used to filter out low quality spectra: (1) the required precursor mass range was 400 to 5000 Daltons, (2) the minimum ion absolute abundance was 1.0, (3) the minimum ion count of a spectrum was 15, and (4) the minimum total ion absolute abundance of a spectrum was 100. Four processing options were also provided in TurboRaw2MGF: deisotoping to keep monoisotopic mass peaks, deconvolution of ions with multiple charge states, preservation of the top 10 peaks in every 100 Dalton mass range, and removal of the ions that may not be related to the peptide. The spectra that passed the fixed criteria and were processed with a combination of the four options were saved in mascot generic format for further database searching.
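The fixed quality criteria and the top-10-per-100-Dalton option can be sketched as follows. The sketch assumes that ions below the minimum abundance are dropped before the ion count and total abundance are checked, and that the 100 Dalton windows are aligned at multiples of 100; neither detail is given in the text, and the function names are hypothetical.

```python
from collections import defaultdict


def passes_quality(precursor_mass, peaks, min_mass=400.0, max_mass=5000.0,
                   min_abundance=1.0, min_count=15, min_total=100.0):
    """peaks: list of (mz, abundance). Apply the four fixed criteria."""
    kept = [(mz, a) for mz, a in peaks if a >= min_abundance]
    return (min_mass <= precursor_mass <= max_mass
            and len(kept) >= min_count
            and sum(a for _, a in kept) >= min_total)


def top_n_per_window(peaks, n=10, width=100.0):
    """Keep the n most abundant peaks in every window of the given width."""
    bins = defaultdict(list)
    for mz, abundance in peaks:
        bins[int(mz // width)].append((mz, abundance))
    kept = []
    for window in bins.values():
        kept.extend(sorted(window, key=lambda p: p[1], reverse=True)[:n])
    return sorted(kept)


# Twenty peaks between 100 and 119 m/z with rising abundance, plus one at 250.
peaks = [(100.0 + i, float(i + 1)) for i in range(20)] + [(250.0, 5.0)]
filtered = top_n_per_window(peaks)
```

Of the twenty peaks in the 100 to 200 window only the ten most abundant (m/z 110 to 119) survive, while the lone peak at 250 is kept unchanged.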

Database Searching

Five engines were used for database searching: Mascot (v2.2.2) (6), Comet (2014.01 rev. 1) (26), MyriMatch (v2.2.140) (27), OMSSA (v2.1.9) (28), and X! Tandem (2013.09.01.1) (29). All MS/MS spectra were searched against a composite target-decoy rat Uniprot database (Version 20120222), in which each protein sequence was followed by its reversed amino acid sequence. Trypsin was set as the protease. Carbamidomethylation of cysteine (+57.021464) and iTRAQ labeling of the N terminus and lysine were set as fixed modifications. Oxidation of methionine (+15.994915) was set as a variable modification. One missed cleavage site was allowed. The tolerances for peptides and fragment ions were set at 10 ppm and 0.02 Daltons respectively. SearchGUI (30) was used for MyriMatch and OMSSA searching. BuildSummary (31) was used to generate confident peptide and protein lists with a false discovery rate ≤ 0.01.
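The target-decoy construction and the false discovery rate cutoff can be sketched as follows. This is a generic illustration, not the procedure implemented in BuildSummary: the FDR is estimated here as decoys divided by targets above a score cutoff, which is one common estimator, and the `REVERSED_` prefix and function names are hypothetical.

```python
def target_decoy(entries, prefix="REVERSED_"):
    """entries: list of (name, sequence). Follow each protein with its reverse."""
    database = []
    for name, sequence in entries:
        database.append((name, sequence))
        database.append((prefix + name, sequence[::-1]))
    return database


def score_threshold(psms, max_fdr=0.01):
    """psms: iterable of (score, is_decoy).

    Returns the lowest score cutoff at which decoys / targets <= max_fdr,
    or None if no cutoff qualifies.
    """
    targets = decoys = 0
    best = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = score
    return best


database = target_decoy([("P1", "MKR")])
psms = [(200 - i, False) for i in range(200)] + [(150.5, True)]
cutoff = score_threshold(psms)
```

With 200 target matches and a single decoy match, the estimated FDR over the whole list is 0.005, so every match down to the lowest score passes the cutoff.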

Software Development

We implemented our preprocessing steps in a user friendly software package named TurboRaw2MGF (v2.0). The previous version of TurboRaw2MGF was developed for low-resolution tandem mass spectra and was integrated into the package ProtQuantSuite (32). TurboRaw2MGF (v2.0) was developed using the C# programming language and was compiled in Microsoft Visual Studio 2012 Professional Edition. The software is fully compatible with Windows-based operating systems with dotNET framework v4.5. TurboRaw2MGF (v2.0) and its source code can be downloaded freely from https://github.com/shengqh/RCPA.Tools/releases/. The manual of TurboRaw2MGF (v2.0) can be viewed at https://github.com/shengqh/RCPA.Tools/wiki/.

Data Availability

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (33) via the PRIDE partner repository with the data set identifier PXD000994 and DOI 10.6019/PXD000994.

To access the data please visit: http://tinyurl.com/pdbkesj

Username: ku.ca.ibe@69760reweiver

Password: jWjYoiuT

RESULTS

Isobaric Related Mass Range

Table I illustrates some important ion properties in isobaric labeling methods. For iTRAQ4 spectra, the mass of a Label+ ion is within the low mass b/y free window, and the mass of a Precursor-Label+ ion is also within the high mass b/y free window. The isobaric related mass ranges include both low and high b/y free windows. For iTRAQ8 spectra, the mass of a Label+ ion is not within the low mass b/y free window and the mass of a Precursor-Label+ ion is also not within the high mass b/y free window. The isobaric related mass ranges not only include both low and high b/y free windows but also include the mass range around Label+ ion and Precursor-Label+ ion within a specific tolerance, which was 20 ppm in our study.

Table I
Ion characteristics of isobaric labeling methods

Ion Frequency and Abundance

Tables II and III show the high frequency forward ions in iTRAQ4 and iTRAQ8 tandem mass spectra respectively. Almost all high frequency forward ions in iTRAQ4 tandem mass spectra were annotated, except 429.0888. Even with the majority of high frequency ions annotated, there were still more ions left unannotated in the iTRAQ8 tandem mass spectra than in iTRAQ4 tandem mass spectra.

Table II
High frequency ions in iTRAQ4 tandem mass spectra
Table III
High frequency ions in iTRAQ8 tandem mass spectra. Rep+: singly charged reporter ion, Label+: singly charged isobaric tag ion, Y1(R): y1 ion from peptide with 3′ terminal amino acid R, Y1(K): y1 ion from peptide with 3′ terminal amino ...

For backward ions, only 144.1 (frequency = 0.3316, median of abundance = 0.207) from iTRAQ4 tandem mass spectra with double precursor charge and 304.1997 (frequency = 0.380, median of abundance = 0.199) from iTRAQ8 tandem mass spectra with double precursor charge passed the criteria. Both ions were annotated as Precursor-Label+.

Also, the frequency and abundance of reporter ions in both the iTRAQ4 and iTRAQ8 data sets decreased as the corresponding precursor charge increased.

Identification Sensitivity Improvement

We evaluated how the combination of preprocessing steps affected the peptide/protein identification sensitivity at the same peptide/protein false discovery rate of 0.01. Table IV lists 16 methods with different combinations of five processing options used in the data preprocessing.

Table IV
16 preprocessing methods with different combinations of three preprocessing steps

Fig. 1 illustrates the identification results from the iTRAQ4 and iTRAQ8 data sets using five search engines. The bigger the point of a method in the graph, the more identifications that method achieved with the same engine and the same isobaric labeling method. The red circle indicates the preprocessing method that achieved the most identifications among all 16 methods. In the iTRAQ4 data set, Mascot, MyriMatch, OMSSA, and X! Tandem achieved the most spectrum, peptide, and two-hit protein identifications when isobaric related ions were preprocessed, although the top-performing method was not the same for every engine. In the iTRAQ8 data set, only Mascot, OMSSA, and X! Tandem achieved the most two-hit protein identifications when isobaric related ions were preprocessed. The preprocessing did not significantly improve the Comet identification sensitivity in either the iTRAQ4 or the iTRAQ8 data set.

Fig. 1.
Identification improvement rank of 16 preprocessing methods in five searching engines and two isobaric labeling approaches. The size of spot indicates the rank of method based on identification performance. The bigger the spot, the better the identification ...

Fig. 2 illustrates the identification improvement of 15 preprocessing methods compared with the non-preprocessing method in the iTRAQ4 and iTRAQ8 data sets. Among all five search engines, Mascot identification sensitivity was significantly improved by most of the preprocessing methods. The identification sensitivity of MyriMatch, OMSSA, and X! Tandem was moderately improved by some of the preprocessing methods. The identification sensitivity of Comet was not improved by most of the preprocessing methods. The detailed identification summary is provided in supplemental Tables S1–S10.

Fig. 2.
The spectrum/peptide/two-hit protein identification improvement percentage of 16 preprocessing methods in five searching engines and two isobaric labeling approaches. Mascot achieved most identification improvement among five engines while Comet achieved ...

Comparing method 2 to method 1 in Tables V and VI indicates that deisotoping and deconvolution significantly improved the Mascot spectrum identification for iTRAQ4 and iTRAQ8, from 16,442 to 18,286 (increased 11.2%) and from 8817 to 10,219 (increased 15.9%) respectively. Comparing method 3 to method 1 shows that keeping the top 10 ions in each 100 Dalton window decreased the Mascot identification sensitivity for the iTRAQ4 data set but increased the identification sensitivity for the iTRAQ8 data set. Identified spectrum counts were moderately increased for iTRAQ4 (from 16,442 to 17,912, increased 8.9%) and significantly increased for iTRAQ8 (from 8817 to 12,012, increased 36.2%) by removing isobaric tag ions and the ions in the low mass range b/y free window (comparing method 5 to method 1). Comparing methods 5, 6, and 7 to method 1 indicates that removing any one of the three isobaric related ion types improved Mascot identification sensitivity in both the iTRAQ4 and iTRAQ8 data sets, except for the ions in the high mass range b/y free window in the iTRAQ4 data set. Finally, comparing method 10 to method 1 in Table V indicates that deisotoping, deconvolution, and removing isobaric ions improved the Mascot spectrum identification from 16,442 to 19,118 (increased 16.3%), the peptide identification from 6275 to 7148 (increased 13.9%), and the two-hit protein identification from 950 to 1013 (increased 6.6%) in the iTRAQ4 data set. Comparing method 16 to method 1 in Table VI indicates that deisotoping, deconvolution, and removing all possible isobaric related ions improved the Mascot spectrum identification from 8817 to 13,240 (increased 50.2%), the peptide identification from 3349 to 4671 (increased 39.5%), and the two-hit protein identification from 612 to 766 (increased 25.2%) in the iTRAQ8 data set.
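The percentages quoted above are relative increases over the unprocessed baseline (method 1). A short check of the arithmetic, using only the counts given in the text (the function name is illustrative):

```python
def percent_increase(before, after):
    """Relative increase over the baseline, in percent, rounded to one decimal."""
    return round((after - before) / before * 100, 1)


# iTRAQ4, method 10 versus method 1
itraq4 = {
    "spectra": percent_increase(16442, 19118),
    "peptides": percent_increase(6275, 7148),
    "two_hit_proteins": percent_increase(950, 1013),
}

# iTRAQ8, method 16 versus method 1
itraq8 = {
    "spectra": percent_increase(8817, 13240),
    "peptides": percent_increase(3349, 4671),
    "two_hit_proteins": percent_increase(612, 766),
}
```

Both dictionaries reproduce the figures reported in the paragraph above.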

Table V
Identification result from iTRAQ4 dataset using Mascot
Table VI
Identification result from iTRAQ8 dataset using Mascot

Mascot Score Improvement by Data Preprocessing

We evaluated how the Mascot peptide identification scores were improved by preprocessing tandem mass spectra before database searching. The scores of the peptide-spectrum matches identified by methods 1 and 10 in the iTRAQ4 data set and by methods 1 and 16 in the iTRAQ8 data set were extracted (see supplemental Table S11). Fig. 3 indicates that data preprocessing before database searching improved the identification scores of a majority of spectra in both the iTRAQ4 and iTRAQ8 data sets. A p value of 2.2e-16 from the Wilcoxon rank sum test indicates that the score improvement in the iTRAQ8 data set was significantly higher than in the iTRAQ4 data set.

Fig. 3.
Mascot score improvement after preprocessing tandem mass spectra. Both top two density plots and bottom two violin plots indicated that the majority of the spectra gained score improvement with data preprocessing in both iTRAQ4 and iTRAQ8 data sets. ...

C-terminal Peptide Identification

Because the tryptic peptide generated from the protein carboxyl terminus (C-terminal peptide) usually does not follow the assumption that the Y1 ion is either Y1(K) or Y1(R), which we use for calculating the b/y free window, we checked how those peptides were identified before and after data preprocessing. The scores of the C-terminal peptides identified by methods 1 and 10 in the iTRAQ4 data set and by methods 1 and 16 in the iTRAQ8 data set were extracted (see supplemental Table S12). In Fig. 4, the top two Venn diagrams indicate that preprocessing also increases C-terminal peptide identification sensitivity in both the iTRAQ4 and iTRAQ8 data sets, and the bottom two scatter plots indicate that the Mascot scores of a majority of the commonly identified C-terminal peptides also increased after preprocessing.

Fig. 4.
C-terminal peptide identification improvement in iTRAQ4 and iTRAQ8 data sets after preprocessing tandem mass spectra. The top two Venn diagrams indicated that preprocessing also increased C-terminal peptide identification sensitivity in both iTRAQ4 and ...

DISCUSSION

We annotated the high frequency ions in isobarically labeled tandem mass spectra. The majority of high frequency ions in the iTRAQ4 and iTRAQ8 data sets could be annotated as reporter ions (Rep+), isobaric tag ions (Label+), Y1 ions, or ions in the b/y free window. More unannotated ions were observed in the iTRAQ8 data set than in the iTRAQ4 data set. This may be caused by the iTRAQ8 isobaric labeling tag being more complex than the iTRAQ4 tag, which could introduce more byproduct ions during isolation in the mass spectrometer. Reporter ions and isobaric tag ions are isobaric ions and can be confidently removed from the MS/MS spectrum for database searching. The other high frequency ions in the b/y free windows are very likely introduced not by the peptide itself but by either the isobaric labeling procedure or the mass spectrometry system. Those ions might be removed to de-noise the tandem mass spectra and improve identification sensitivity. However, it is still possible that the ions in the low mass range b/y free window are actually multiply charged b/y ions whose charges cannot be estimated from the mass spectrum; removing such ions may decrease the identification sensitivity. The benefit of removing the ions in the b/y free window may vary between isobaric labeling methods and search engines. With fewer ions in the low mass b/y free window in the iTRAQ4 data set than in the iTRAQ8 data set (supplemental Fig. S1), removing isobaric ions only may be more suitable for iTRAQ4 data, and removing ions in the low mass b/y free window may be more suitable for iTRAQ8 data. We also observed a few high frequency ions outside of the b/y free windows, including 429.0888. Without conclusive evidence about their origin, we did not remove them in this study.

We also examined the factors that might affect the sensitivity of peptide identification. Our results showed that the combination of deisotoping/deconvolution and removing isobaric related ions significantly improved the Mascot identification sensitivity and moderately improved MyriMatch, X! Tandem, and OMSSA identification sensitivity for both iTRAQ4 and iTRAQ8 data sets. Comet was only slightly affected by preprocessing procedure. We further validated our results using an independent TMT6 data set using Mascot. The analysis results from this TMT6 data set also showed similar peptide/protein identification sensitivity improvement (See supplemental Table S13). Based on our results, we conclude that removing isobaric related ions combined with deisotoping/deconvolution is highly recommended for preprocessing isobarically labeled MS/MS spectra before database search, especially for Mascot search engine.

The complexity of the isobaric labeling tag significantly affects the identification sensitivity improvement after preprocessing tandem mass spectra. Keeping the top 10 ions in each 100 Dalton window slightly decreased the Mascot peptide identification sensitivity in the iTRAQ4 data sets, regardless of whether it was combined with deisotoping and deconvolution. This may indicate that the high-resolution mass spectra in our iTRAQ4 data set were so clean that keeping the top 10 ions in each 100 Daltons was not necessary during data preprocessing. This finding may require additional validation in other independent iTRAQ4 data sets. On the other hand, keeping the top 10 ions in each 100 Dalton window slightly increased the Mascot peptide identification sensitivity in the iTRAQ8 data sets. Compared with method 1, a combination of deisotoping/deconvolution, keeping the top 10 ions in each 100 Dalton window, and removing isobaric related ions (method 16) improved identified spectra, peptides, and two-hit proteins for iTRAQ8 over iTRAQ4 by 32.7%, 36.4%, and 18.5% respectively. This suggests that preprocessing is more crucial for iTRAQ8 than for iTRAQ4 data.

We validated the identification improvement of the C-terminal peptides. C-terminal peptides might not end with “K” or “R,” which voids our assumption for the b/y free window calculation that Y1 ions come from either K or R. The result indicated that data preprocessing not only improved the Mascot scores of most C-terminal peptides but also increased the identification sensitivity of C-terminal peptides, even though the ions in the low mass b/y free window were removed.

We implemented TurboRaw2MGF (v2.0) with a user friendly GUI. The GUI allows users to conveniently convert the data generated by high-resolution mass spectrometers (such as the Thermo LTQ-Orbitrap Velos) to mascot generic format files. TurboRaw2MGF also supports filtering spectra based on user-defined mass ranges. For example, the user may define 428.75–429.25 to remove the 429.0888 ion. TurboRaw2MGF (v2.0) offers other conveniences to users: conversion from mzData and mzXML format files to mascot generic format files is supported, and multiple files can be converted in batch mode. TurboRaw2MGF is free, and it will be consistently supported in the coming years.
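The user-defined mass range filter mentioned above amounts to dropping every peak whose m/z falls inside one of the listed ranges. A minimal sketch of that idea, assuming inclusive range boundaries and a peak list of (m/z, abundance) pairs (the function name is hypothetical and this is not the TurboRaw2MGF code):

```python
def remove_mass_ranges(peaks, ranges):
    """Drop peaks whose m/z lies inside any (low, high) range, bounds inclusive."""
    return [
        (mz, abundance)
        for mz, abundance in peaks
        if not any(low <= mz <= high for low, high in ranges)
    ]


remaining = remove_mass_ranges(
    [(429.0888, 10.0), (500.0, 5.0)],
    [(428.75, 429.25)],
)
```

With the range 428.75–429.25 from the text, the 429.0888 peak is removed and the peak at 500.0 is kept.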

Acknowledgments

We thank the GSA program by Thermo. We are grateful to Margot Bjoring for her editorial support. The data deposition to the ProteomeXchange Consortium was supported by the PRIDE Team, EBI.

Footnotes

Author contributions: Q.S., J.D., Y.S., and R.Z. designed research; R.L., Q.L., Z.S., and C.L. performed research; Q.S. analyzed data; Q.S., R.L., J.D., and Y.G. wrote the paper.

* This work was supported by grants from the Ministry of Science and Technology (2011CB910200, 2014CB910500, 2011CB910600) and a grant from the National Natural Science Foundation of China (31130034).

This article contains supplemental Fig. S1 and Tables S1 to S13.

1 The abbreviations used are: MS/MS, tandem mass spectrometry; LC, liquid chromatography; m/z, mass-to-charge ratio; SILAC, stable isotope labeling by amino acids in cell culture; iTRAQ, isobaric tag for relative and absolute quantification; TMT, tandem mass tag.

REFERENCES

1. Yates J. R., 3rd, Gilchrist A., Howell K. E., Bergeron J. J. (2005) Proteomics of organelles and large cellular structures. Nat. Rev. Mol. Cell Biol. 6, 702–714
2. Walther T. C., Mann M. (2010) Mass spectrometry-based proteomics in cell biology. J. Cell Biol. 190, 491–500
3. Wolters D. A., Washburn M. P., Yates J. R., 3rd (2001) An automated multidimensional protein identification technology for shotgun proteomics. Anal. Chem. 73, 5683–5690
4. Mann M., Kelleher N. L. (2008) Precision proteomics: the case for high-resolution and high mass accuracy. Proc. Natl. Acad. Sci. U.S.A. 105, 18132–18138
5. Olsen J. V., Schwartz J. C., Griep-Raming J., Nielsen M. L., Damoc E., Denisov E., Lange O., Remes P., Taylor D., Splendore M., Wouters E. R., Senko M., Makarov A., Mann M., Horning S. (2009) A dual pressure linear ion trap Orbitrap instrument with very high sequencing speed. Mol. Cell Proteomics 8, 2759–2769
6. Perkins D. N., Pappin D. J., Creasy D. M., Cottrell J. S. (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567
7. Eng J. K., McCormack A. L., Yates J. R. (1994) An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectr. 5, 976–989
8. Carvalho P. C., Xu T., Han X., Cociorva D., Barbosa V. C., Yates J. R., 3rd (2009) YADA: a tool for taking the most out of high-resolution spectra. Bioinformatics 25, 2734–2736
9. Liu X., Inbar Y., Dorrestein P. C., Wynne C., Edwards N., Souda P., Whitelegge J. P., Bafna V., Pevzner P. A. (2010) Deconvolution and database search of complex tandem mass spectra of intact proteins: a combinatorial approach. Mol. Cell Proteomics 9, 2772–2782
10. Ong S. E., Blagoev B., Kratchmarova I., Kristensen D. B., Steen H., Pandey A., Mann M. (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell Proteomics 1, 376–386
11. Bantscheff M., Schirle M., Sweetman G., Rick J., Kuster B. (2007) Quantitative mass spectrometry in proteomics: a critical review. Anal. Bioanal. Chem. 389, 1017–1031
12. Thompson A., Schafer J., Kuhn K., Kienle S., Schwarz J., Schmidt G., Neumann T., Johnstone R., Mohammed A. K., Hamon C. (2003) Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. Anal. Chem. 75, 1895–1904
13. Ross P. L., Huang Y. N., Marchese J. N., Williamson B., Parker K., Hattan S., Khainovski N., Pillai S., Dey S., Daniels S., Purkayastha S., Juhasz P., Martin S., Bartlet-Jones M., He F., Jacobson A., Pappin D. J. (2004) Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell Proteomics 3, 1154–1169
14. Aggarwal K., Choe L. H., Lee K. H. (2006) Shotgun proteomics using the iTRAQ isobaric tags. Brief. Funct. Genomics Proteomics 5, 112–120
15. Choe L., D'Ascenzo M., Relkin N. R., Pappin D., Ross P., Williamson B., Guertin S., Pribil P., Lee K. H. (2007) 8-plex quantitation of changes in cerebrospinal fluid protein expression in subjects undergoing intravenous immunoglobulin treatment for Alzheimer's disease. Proteomics 7, 3651–3660
16. Dayon L., Hainard A., Licker V., Turck N., Kuhn K., Hochstrasser D. F., Burkhard P. R., Sanchez J. C. (2008) Relative quantification of proteins in human cerebrospinal fluids by MS/MS using 6-plex isobaric tags. Anal. Chem. 80, 2921–2931
17. Leitner A., Lindner W. (2009) Chemical tagging strategies for mass spectrometry-based phospho-proteomics. Methods Mol. Biol. 527, 229–243
18. Treumann A., Thiede B. (2010) Isobaric protein and peptide quantification: perspectives and issues. Expert Rev. Proteomics 7, 647–653
19. Coombs K. M. (2011) Quantitative proteomics of complex mixtures. Expert Rev. Proteomics 8, 659–677
20. Wiese S., Reidegeld K. A., Meyer H. E., Warscheid B. (2007) Protein labeling by iTRAQ: a new tool for quantitative mass spectrometry in proteome research. Proteomics 7, 340–350
21. Prudova A., auf dem Keller U., Butler G. S., Overall C. M. (2010) Multiplex N-terminome analysis of MMP-2 and MMP-9 substrate degradomes by iTRAQ-TAILS quantitative proteomics. Mol. Cell Proteomics 9, 894–911
22. Sinclair J., Timms J. F. (2011) Quantitative profiling of serum samples using TMT protein labelling, fractionation and LC-MS/MS. Methods 54, 361–369
23. Hung C. W., Tholey A. (2012) Tandem mass tag protein labeling for top-down identification and quantification. Anal. Chem. 84, 161–170
24. Nielsen P. A., Olsen J. V., Podtelejnikov A. V., Andersen J. R., Mann M., Wisniewski J. R. (2005) Proteomic mapping of brain plasma membrane proteins. Mol. Cell Proteomics 4, 402–408
25. Cox J., Mann M. (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372
26. Eng J. K., Jahan T. A., Hoopmann M. R. (2013) Comet: an open-source MS/MS sequence database search tool. Proteomics 13, 22–24
27. Tabb D. L., Fernando C. G., Chambers M. C. (2007) MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J. Proteome Res. 6, 654–661
28. Geer L. Y., Markey S. P., Kowalak J. A., Wagner L., Xu M., Maynard D. M., Yang X., Shi W., Bryant S. H. (2004) Open mass spectrometry search algorithm. J. Proteome Res. 3, 958–964
29. Craig R., Beavis R. C. (2004) TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20, 1466–1467
30. Vaudel M., Barsnes H., Berven F. S., Sickmann A., Martens L. (2011) SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics 11, 996–999
31. Sheng Q., Dai J., Wu Y., Tang H., Zeng R. (2012) BuildSummary: using a group-based approach to improve the sensitivity of peptide/protein identification in shotgun proteomics. J. Proteome Res. 11, 1494–1502
32. Mann B., Madera M., Sheng Q., Tang H., Mechref Y., Novotny M. V. (2008) ProteinQuant Suite: a bundle of automated software tools for label-free quantitative proteomics. Rapid Commun. Mass Spectrom. 22, 3823–3834
33. Vizcaino J. A., Deutsch E. W., Wang R., Csordas A., Reisinger F., Rios D., Dianes J. A., Sun Z., Farrah T., Bandeira N., Binz P. A., Xenarios I., Eisenacher M., Mayer G., Gatto L., Campos A., Chalkley R. J., Kraus H. J., Albar J. P., Martinez-Bartolome S., Apweiler R., Omenn G. S., Martens L., Jones A. R., Hermjakob H. (2014) ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 32, 223–226
