Summary: YADA can deisotope and decharge high-resolution mass spectra from large peptide molecules, link the precursor monoisotopic peak information to the corresponding tandem mass spectrum, and account for different co-fragmenting ion species (multiplexed spectra). We describe how YADA enables a pipeline consisting of ProLuCID and DTASelect for analyzing large-scale middle-down proteomics data.
Supplementary information: Supplementary data are available at Bioinformatics online.
Stable isotope tracing with ultra-high resolution Fourier transform-ion cyclotron resonance-mass spectrometry (FT-ICR-MS) can provide simultaneous determination of hundreds to thousands of metabolite isotopologue species without the need for chromatographic separation. Therefore, this experimental metabolomics methodology may allow the tracing of metabolic pathways starting from stable-isotope-enriched precursors, which can improve our mechanistic understanding of cellular metabolism. However, contributions to the observed intensities arising from the stable isotope's natural abundance must be subtracted (deisotoped) from the raw isotopologue peaks before interpretation. Previously posed deisotoping problems are sidestepped due to the isotopic resolution and identification of individual isotopologue peaks. This peak resolution and identification come from the very high mass resolution and accuracy of FT-ICR-MS and present an analytically solvable deisotoping problem, even in the context of stable-isotope enrichment.
We present both a computationally feasible analytical solution and an algorithm to this newly posed deisotoping problem, which both work with any amount of 13C or 15N stable-isotope enrichment. We demonstrate this algorithm and correct for the effects of 13C natural abundance on a set of raw isotopologue intensities for a specific phosphatidylcholine lipid metabolite derived from a 13C-tracing experiment.
Correction for the effects of 13C natural abundance on a set of raw isotopologue intensities is computationally feasible when the raw isotopologues are isotopically resolved and identified. Such correction makes qualitative interpretation of stable isotope tracing easier and is required before attempting a more rigorous quantitative interpretation of the isotopologue data. The presented implementation is very robust with increasing metabolite size. Error analysis of the algorithm will be straightforward due to low relative error from the implementation itself. Furthermore, the algorithm may serve as an independent quality control measure for a set of observed isotopologue intensities.
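As an illustration of why this correction is tractable once every isotopologue is resolved, the following is a minimal sketch of a generic binomial correction, not the authors' published algorithm. It assumes a metabolite whose isotopologues are indexed by their number of 13C atoms, a 13C natural abundance of about 1.07%, and no other elements contributing; the function names are hypothetical.

```python
from math import comb

P13C = 0.0107  # assumed approximate natural abundance of 13C


def natural_abundance_shift(true_intensities, p=P13C):
    """Forward model: spread each labeled isotopologue over heavier ones.

    true_intensities[j] is the intensity of the species carrying j 13C atoms
    from the tracer; its remaining n - j carbons pick up natural 13C
    binomially.
    """
    n = len(true_intensities) - 1  # number of carbons in the metabolite
    observed = [0.0] * (n + 1)
    for j, intensity in enumerate(true_intensities):
        for m in range(j, n + 1):
            k = m - j  # extra 13C atoms contributed by natural abundance
            observed[m] += intensity * comb(n - j, k) * p**k * (1 - p) ** (n - m)
    return observed


def correct_natural_abundance(observed, p=P13C):
    """Invert the forward model by solving the triangular system in order."""
    n = len(observed) - 1
    corrected = []
    for m in range(n + 1):
        # Intensity that lighter, already corrected species spill into peak m.
        spill = sum(
            corrected[j] * comb(n - j, m - j) * p ** (m - j) * (1 - p) ** (n - m)
            for j in range(m)
        )
        corrected.append((observed[m] - spill) / (1 - p) ** (n - m))
    return corrected
```

Because the system is lower triangular, each corrected intensity depends only on lighter isotopologues, so the correction proceeds peak by peak without any matrix inversion.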
Protonated molecular peptide ions and their product ions generated by tandem mass spectrometry appear as isotopologue clusters due to the natural isotopic variations of carbon, hydrogen, nitrogen, oxygen and sulfur. Quantitation of the isotopic composition of peptides can be employed in experiments involving isotope effects, isotope exchange, isotopic labeling by chemical reactions, and studies of metabolism by stable isotope incorporation. Both ion trap and quadrupole-time of flight mass spectrometry are shown to be capable of determining the isotopic composition of peptide product ions obtained by tandem mass spectrometry with both precision and accuracy. Tandem mass spectra obtained in profile-mode of clusters of isotopologue ions are fit by non-linear least squares to a series of Gaussian peaks (described in the accompanying manuscript) which quantify the Mn/M0 values which define the isotopologue distribution (ID). To determine the isotopic composition of product ions from their ID, a new algorithm that predicts the Mn/M0 ratios is developed which obviates the need to determine the intensity of all of the ions of an ID. Consequently, a precise and accurate determination of the isotopic composition of a product ion may be obtained from only the initial values of the ID; however, the entire isotopologue cluster must be isolated prior to fragmentation. Following optimization of the molecular ion isolation width, fragmentation energy and detector sensitivity, the presence of isotopic excess (2H, 13C, 15N, 18O) is readily determined within 1%. The ability to determine the isotopic composition of sequential product ions permits the isotopic composition of individual amino acid residues in the precursor ion to be determined.
isotopologue distribution; mass isotopomer distribution; tandem mass spectrometry; deuterium incorporation; isotopic excess; isotope quantitation; H/D exchange; protein turnover
We developed the LipidQA (Lipid Qualitative/Quantitative Analysis) software platform to identify and quantitate complex lipid molecular species in biological mixtures. LipidQA can process raw electronic data files from the TSQ-7000 triple stage quadrupole and LTQ linear ion trap mass spectrometers from Thermo-Finnigan and the Q-TOF hybrid quadrupole/time-of-flight instrument from Waters-Micromass and could readily be modified to accommodate data from others. The program processes multiple spectra in a few seconds and includes a deisotoping algorithm that increases the accuracy of structural identification and quantitation. Identification is achieved by comparing MS2 spectra obtained in a data-dependent manner to a library of reference spectra of complex lipids that we have acquired or constructed from established fragmentation rules. The current form of the algorithm can process data acquired in negative or positive ion mode for glycerophospholipid species of all major head-group classes.
glycerophospholipid; electrospray ionization; computer algorithm; quantitation; identification; database searching
Tandem mass spectrometry (MS/MS) is frequently used in the identification of peptides and proteins. Typical proteomic experiments rely on algorithms such as SEQUEST and MASCOT to compare thousands of tandem mass spectra against the theoretical fragment ion spectra of peptides in a database. The probabilities that these spectrum-to-sequence assignments are correct can be determined by statistical software such as PeptideProphet or through estimations based on reverse or decoy databases. However, many of the software applications that assign probabilities for MS/MS spectra to sequence matches were developed using training datasets from 3D ion-trap mass spectrometers. Given the variety of types of mass spectrometers that have become commercially available over the last five years, we sought to generate a dataset of reference data covering multiple instrumentation platforms to facilitate both the refinement of existing computational approaches and the development of novel software tools. We analyzed the proteolytic peptides in a mixture of tryptic digests of 18 proteins, named the “ISB standard protein mix”, using 8 different mass spectrometers. These include linear and 3D ion traps, two quadrupole time-of-flight platforms (qq-TOF) and two MALDI-TOF-TOF platforms. The resulting dataset, which has been named the Standard Protein Mix Database, consists of over 1.1 million spectra in 150+ replicate runs on the mass spectrometers. The data were inspected for quality of separation and searched using SEQUEST. All data, including the native raw instrument and mzXML formats and the PeptideProphet validated peptide assignments, are available at http://regis-web.systemsbiology.net/PublicDatasets/.
Proteomics; reference dataset; database search software; standard protein mix; Standard Protein Mix Database
In a bottom-up shotgun approach, the proteins of a mixture are enzymatically digested, separated, and analyzed via tandem mass spectrometry. The mass spectra relating fragment ion intensities (abundance) to the mass-to-charge ratio are used to deduce the amino acid sequence and identify the peptides and proteins. The variables that influence intensity were characterized using a multi-factorial mixed-effects model, a ten-fold cross-validation, and stepwise feature selection on 6,352,528 fragment ions from 61,543 peptide ions. Intensity was higher in fragment ions that did not have neutral mass loss relative to any mass loss or that had a +1 charge state. Peptide ions classified for proton mobility as non-mobile had the lowest intensity of all mobility levels. Higher basic residue (arginine, lysine or histidine) counts in the peptide ion and low counts in the fragment ion were associated with lower fragment ion intensities. Higher counts of proline in peptide and fragment ions were associated with lower intensities. These results are consistent with the mobile proton theory. Opposite trends between peptide and fragment ion counts and intensity may be due to the different impact of the factor under consideration at different stages of the MS/MS experiment or to the different distribution of observations across peptide and fragment ion levels. Presence of basic residues at all three positions next to the fragmentation site was associated with lower fragment ion intensity. The presence of proline proximal to the fragmentation site enhanced fragmentation and had the opposite trend when located distant from the site. A positive association between fragment ion intensity and presence of sulfur residues (cysteine and methionine) in the vicinity of the fragmentation site was identified. These results highlight the multi-factorial nature of fragment ion intensity and could improve the algorithms for peptide identification and the simulation in tandem mass spectrometry experiments.
Fragment ion; Peptide ion; MS/MS; Proton mobility; Stepwise selection; Neutral mass loss; Proline
Peptide sequence identification using tandem mass spectrometry remains a major challenge for complex proteomic studies. Peptide matching algorithms require the accurate determination of both the mass and charge of the precursor ion and accommodate uncertainties in these properties by using a wide precursor mass tolerance and by testing, for each spectrum, several possible candidate charges. Using a data acquisition strategy that includes obtaining narrow mass-range MS1 “zoom” scans, we describe here a post-acquisition algorithm, dubbed MAZIE, that accurately determines the charge and monoisotopic mass of precursor ions on a low-resolution Thermo LTQ-XL mass spectrometer. This is achieved by examining the isotopic distribution obtained in the preceding MS1 zoom spectrum and comparing it to theoretical distributions for candidate charge states from +1 to +4. MAZIE then writes modified data files with the corrected monoisotopic mass and charge. We have validated MAZIE results by comparing the sequence search results obtained with the MAZIE-generated data files to results using the unmodified data files. Using two different search algorithms and a false discovery rate filter, we found that MAZIE-interpreted data resulted in 80% (using SEQUEST) and 30% (using OMSSA) more high-confidence sequence identifications. Analyses of these results indicate that the accurate determination of the precursor ion mass greatly facilitates the ability to differentiate between true and false positive matches, while the determination of the precursor ion charge reduces the overall search time but does not significantly reduce the ambiguity of interpreting the search results. MAZIE is distributed as an open-source PERL script.
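The link between isotope peak spacing and charge can be sketched as follows. This is a deliberately simplified illustration, not MAZIE itself, which compares full isotopic distributions against theoretical ones: here the charge is chosen only from the mean spacing between adjacent isotope peaks, which should be close to 1.00335/z. The function names and constants are assumptions made for the example.

```python
C13_C12_DELTA = 1.00335  # approximate 13C - 12C mass difference in Da
PROTON = 1.007276        # approximate proton mass in Da


def infer_charge(isotope_mz, max_charge=4):
    """Pick the candidate charge whose expected spacing best fits the peaks.

    isotope_mz: ascending m/z values of adjacent peaks in one isotope cluster.
    """
    spacings = [b - a for a, b in zip(isotope_mz, isotope_mz[1:])]
    if not spacings:
        raise ValueError("need at least two isotope peaks")
    mean_spacing = sum(spacings) / len(spacings)
    return min(
        range(1, max_charge + 1),
        key=lambda z: abs(mean_spacing - C13_C12_DELTA / z),
    )


def neutral_monoisotopic_mass(mono_mz, charge):
    """Convert a protonated monoisotopic m/z to the neutral mass."""
    return (mono_mz - PROTON) * charge
```

For instance, peaks near m/z 500.0, 500.5 and 501.0 would be assigned a charge of +2 under this rule, and the neutral mass then follows from the lowest peak.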
Although tandem mass spectrometry (MS/MS) has become an integral part of proteomics, intensity patterns in MS/MS spectra are rarely weighted heavily in most widely used algorithms because they are not yet fully understood. Here a knowledge mining approach is demonstrated to discover fragmentation intensity patterns and elucidate the chemical factors behind such patterns. Fragmentation intensity information from 28 330 ion trap peptide MS/MS spectra of different charge states and sequences went through unsupervised clustering using a penalized K-means algorithm. Without any prior chemistry assumptions, four clusters with distinctive fragmentation patterns were obtained. A decision tree was generated to investigate peptide sequence motif and charge state status that caused these fragmentation patterns. This data-mining scheme is generally applicable for any large data sets. It bypasses the common prior knowledge constraints and reports on the overall peptide fragmentation behavior. It improves the understanding of gas-phase peptide dissociation and provides a foundation for new or improved protein identification algorithms.
data mining; cluster analysis; K-means algorithm; penalized K-means algorithm; CART; statistical analysis; quantile map; peptide; MS/MS; intensity; ion trap; CID; fragmentation pattern; dissociation pattern; pairwise cleavage; Xxx-Zzz; cleavage pair; Fisher information; Pro; Gly; Asp; Glu; Arg; Lys
We describe a method for assessing the quality of mass spectra and improving the reliability of relative ratio estimations from 18O-water labeling experiments acquired on low-resolution mass spectrometers. The mass profiles of heavy and light peptide pairs are often affected by artifacts, including co-eluting contaminant species, noise signal, and instrumental fluctuations in measuring ion position and abundance levels. Such artifacts distort the profiles, leading to erroneous ratio estimations and thus reducing the reliability of ratio estimations in high-throughput quantification experiments.
We used support vector machines (SVMs) to filter out mass spectra that deviated significantly from expected theoretical isotope distributions. We built an SVM classifier with a decision function which assigns a score to every mass profile based on such spectral features as mass accuracy, signal-to-noise ratio, and differences between experimental and theoretical isotopic distributions.
The classifier was trained using a dataset obtained from samples of mouse renal cortex. We then tested it on protein samples (bovine serum albumin), mixed in five different ratios of labeled and unlabeled species. We demonstrated that filtering the data using our SVM classifier results in as much as a nine-fold reduction in the coefficient of variance of peptide ratios, thus significantly improving the reliability of ratio estimations.
support vector machines; stable-isotope labeling; signal-to-noise ratio; isotope distribution; mass accuracy
Peptide and protein identification via tandem mass spectrometry (MS/MS) lies at the heart of proteomic characterization of biological samples. Several algorithms are able to search, score, and assign peptides to large MS/MS datasets. Most popular methods, however, underutilize the intensity information available in the tandem mass spectrum due to the complex nature of the peptide fragmentation process, thus contributing to loss of potential identifications. We present a novel probabilistic scoring algorithm called Context-Sensitive Peptide Identification (CSPI) based on highly flexible Input-Output Hidden Markov Models (IO-HMM) that capture the influence of peptide physicochemical properties on their observed MS/MS spectra. We use several local and global properties of peptides and their fragment ions from literature. Comparison with two popular algorithms, Crux (re-implementation of SEQUEST) and X!Tandem, on multiple datasets of varying complexity, shows that peptide identification scores from our models are able to achieve greater discrimination between true and false peptides, identifying up to ∼25% more peptides at a False Discovery Rate (FDR) of 1%. We evaluated two alternative normalization schemes for fragment ion-intensities, a global rank-based and a local window-based. Our results indicate the importance of appropriate normalization methods for learning superior models. Further, combining our scores with Crux using a state-of-the-art procedure, Percolator, we demonstrate the utility of using scoring features from intensity-based models, identifying ∼4-8 % additional identifications over Percolator at 1% FDR. IO-HMMs offer a scalable and flexible framework with several modeling choices to learn complex patterns embedded in MS/MS data.
High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino acid polymorphisms.
In this study, a new de novo sequencing algorithm, called Vonode, has been developed specifically for analysis of such high-resolution tandem mass spectra. To fully exploit the high mass accuracy of these spectra, a unique scoring system is proposed to evaluate sequence tags based primarily on mass accuracy information of fragment ions. Consensus sequence tags were inferred for 11,422 spectra with an average peptide length of 5.5 residues from a total of 40,297 input spectra acquired in a 24-hour proteomics measurement of Rhodopseudomonas palustris. The accuracy of inferred consensus sequence tags was 84%. According to our comparison, the performance of Vonode was shown to be superior to the PepNovo v2.0 algorithm, in terms of the number of de novo sequenced spectra and the sequencing accuracy.
Here, we improved de novo sequencing performance by developing a new algorithm specifically for high-resolution tandem mass spectral data. The Vonode algorithm is freely available for download at http://compbio.ornl.gov/Vonode.
Collision-induced dissociation (CID) is a common ion activation technique used to energize mass-selected peptide ions during tandem mass spectrometry. Characteristic fragment ions form from the cleavage of amide bonds within a peptide undergoing CID, allowing the inference of its amino acid sequence. The statistical characterization of these fragment ions is essential for improving peptide identification algorithms and for understanding the complex reactions taking place during CID. An examination of 1465 ion trap spectra from doubly charged tryptic peptides reveals several trends important to understanding this fragmentation process. While less abundant than y ions, b ions are present in sufficient numbers to aid sequencing algorithms. Fragment ions exhibit a characteristic series-specific relationship between their masses and intensities. Each residue influences fragmentation at adjacent amide bonds, with Pro quantifiably enhancing cleavage at its N-terminal amide bond and His increasing the formation of b ions at its C-terminal amide bond. Fragment ions corresponding to a formal loss of ammonia appear preferentially in peptides containing Gln and Asn. These trends are partially responsible for the complexity of peptide tandem mass spectra.
Tandem mass spectrometry has become particularly useful for the rapid identification and characterization of protein components of complex biological mixtures. Powerful database search methods have been developed for peptide identification, such as SEQUEST and MASCOT, which are implemented by comparing the mass spectra obtained from unknown proteins or peptides with theoretically predicted spectra derived from protein databases. However, the majority of spectra generated from a mass spectrometry experiment are of too poor quality to be interpreted, while some high-quality spectra cannot be interpreted by one method but can perhaps be interpreted by others. Hence a filtering algorithm that removes spectra of poor quality prior to the database search is appealing.
This paper proposes a support vector machine (SVM) based approach to assess the quality of tandem mass spectra. Each mass spectrum is mapped onto 16 proposed features that describe its quality. Based on the results from SEQUEST, four SVM classifiers with the 16 features as input are trained and tested on ISB data and TOV data, respectively. The superior performance of the proposed SVM classifiers is illustrated both by comparison with existing classifiers and by validation in terms of MASCOT search results.
The proposed method can be employed to effectively remove poor-quality spectra before spectral searching, and also to find more peptides or post-translationally modified peptides from high-quality spectra using different search engines or de novo methods.
Currently, tandem mass spectrometry (MSMS) of peptides is a dominant technique used to identify peptides and consequently proteins. Peptide fragmentation inside the mass analyzer typically yields a spectrum containing several different groups of ions. The mass-to-charge (m/z) values of these ions can be exactly calculated following simple rules based on the possible peptide fragmentation reactions. But the (relative) intensities of the particular ions cannot be simply predicted from the amino acid sequence of the peptide. This study presents initial work towards developing a fundamental theoretical approach to ion intensity elucidation by utilizing quantum mechanical computations.
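The "simple rules" for fragment m/z values can be made concrete with a minimal sketch for singly charged b and y ions. The monoisotopic residue masses below are standard rounded values, the table covers only the residues needed for the example peptide, and the function name is hypothetical; other ion types and higher fragment charge states are ignored.

```python
# Monoisotopic residue masses in Da (rounded standard values; only the
# residues needed for this example are listed).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "V": 99.06841,
    "L": 113.08406,
    "K": 128.09496,
}
WATER = 18.01056
PROTON = 1.00728


def b_y_ions(sequence):
    """Return m/z lists of singly charged b and y ions (b1.., y1..)."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    n = len(masses)
    # b_i: first i residues plus one proton.
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    # y_i: last i residues plus water plus one proton.
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


b_ions, y_ions = b_y_ions("GAVLK")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

The m/z positions follow directly from the sequence in this way, whereas nothing in this calculation says which of the ions will be intense.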
MSMS spectra of the doubly charged GAVLK peptide were collected on electrospray ion trap mass spectrometers using low energy modes of fragmentation. Density functional theory (DFT) calculations were performed on the population of ion precursors to determine the fragment ion intensities corresponding to a Boltzmann distribution of the protonation of nitrogens in the peptide backbone amide bonds.
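The Boltzmann weighting mentioned above can be sketched generically as follows. The relative energies, the temperature of 298.15 K and the function name are illustrative assumptions only; they are not values or code from this study, and the mapping from populations to fragment intensities is not reproduced here.

```python
from math import exp

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)


def boltzmann_populations(relative_energies_kj_mol, temperature_k=298.15):
    """Fractional populations of states from their relative energies.

    p_i = exp(-E_i / RT) / sum_j exp(-E_j / RT)
    """
    rt = R_KJ_PER_MOL_K * temperature_k
    lowest = min(relative_energies_kj_mol)
    # Shifting by the lowest energy leaves the ratios unchanged and avoids
    # overflow for large absolute energies.
    weights = [exp(-(e - lowest) / rt) for e in relative_energies_kj_mol]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical relative energies (kJ/mol) of three protonation sites.
print(boltzmann_populations([0.0, 5.0, 10.0]))
```

Under this weighting, a protonation site only a few kJ/mol higher in energy is already markedly less populated at room temperature.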
We were able to a) predict the intensity order of the y and b ions in agreement with experimental observation; and b) predict the relative intensities of y ions with errors not exceeding the experimental variation.
These results suggest that the GAVLK peptide fragmentation process in the ion trap mass spectrometer is predominantly driven by the thermodynamic stability of the precursor ions formed upon ionization of the sample. The computational approach presented in this manuscript successfully calculated ion intensities in the mass spectra of this doubly charged tryptic peptide, based solely on its amino acid sequence. As such, this work indicates a potential of incorporating quantum mechanical calculations into mass spectrometry based algorithms for molecular identification.
Shotgun proteomics experiments are dependent upon database search engines to identify peptides from tandem mass spectra. Many of these algorithms score potential identifications by evaluating the number of fragment ions matched between each peptide sequence and an observed spectrum. These systems, however, generally do not distinguish between matching an intense peak and matching a minor peak. We have developed a statistical model to score peptide matches that is based upon the multivariate hypergeometric distribution. This scorer, part of the “MyriMatch” database search engine, places greater emphasis on matching intense peaks. The probability that the best match for each spectrum has occurred by random chance can be employed to separate correct matches from random ones. We evaluated this software on data sets from three different laboratories employing three different ion trap instruments. Employing a novel system for testing discrimination, we demonstrate that stratifying peaks into multiple intensity classes improves the discrimination of scoring. We compare MyriMatch results to those of Sequest and X!Tandem, revealing that it is capable of higher discrimination than either of these algorithms. When minimal peak filtering is employed, performance plummets for a scoring model that does not stratify matched peaks by intensity. On the other hand, we find that MyriMatch discrimination improves as more peaks are retained in each spectrum. MyriMatch also scales well to tandem mass spectra from high-resolution mass analyzers. These findings may indicate limitations for existing database search scorers that count matched peaks without differentiating them by intensity. This software and source code is available under Mozilla Public License at this URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
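The multivariate hypergeometric probability at the heart of such a scorer can be sketched as below. This shows only the textbook probability mass function, not the MyriMatch implementation; how peaks are binned and assigned to intensity classes is left out, and the function name and example counts are assumptions.

```python
from math import comb, prod


def multivariate_hypergeom_pmf(class_sizes, drawn):
    """Probability of drawing exactly drawn[i] items from each class.

    class_sizes[i]: number of items of class i in the population (for
    example, m/z bins holding peaks of intensity class i, plus empty bins).
    drawn[i]: number of items of class i among the draws (for example,
    predicted fragments that landed in bins of that class).
    """
    if len(class_sizes) != len(drawn):
        raise ValueError("class_sizes and drawn must have the same length")
    if any(k > size for size, k in zip(class_sizes, drawn)):
        return 0.0
    numerator = prod(comb(size, k) for size, k in zip(class_sizes, drawn))
    return numerator / comb(sum(class_sizes), sum(drawn))


# Hypothetical spectrum: 5 intense bins, 10 medium, 20 weak, 165 empty;
# 12 predicted fragments matched 3, 2, 1 and 6 bins of those classes.
print(multivariate_hypergeom_pmf([5, 10, 20, 165], [3, 2, 1, 6]))
```

Because the intense class is small, matching several of its members is far less likely by chance than matching the same number of weak peaks, which is how stratification rewards intense matches.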
Proteomics; Identification; Statistical Distribution; Reversed Database; Peak Filtering
Protein-amide proton hydrogen-deuterium exchange (HDX) is used to investigate protein conformation, conformational changes and surface binding sites for other molecules. To our knowledge, software tools to automate data processing and analysis from sample fractionating (LC-MALDI) mass-spectrometry-based HDX workflows are not publicly available.
An integrated data pipeline (Solvent Explorer/TOF2H) has been developed for the processing of LC-MALDI-derived HDX data. Based on an experiment-wide template, and taking an ab initio approach to chromatographic and spectral peak finding, initial data processing is based on accurate mass-matching to fully deisotoped peaklists accommodating, in MS/MS-confirmed peptide library searches, ambiguous mass-hits to non-target proteins. Isotope-shift re-interrogation of library search results allows quick assessment of the extent of deuteration from peaklist data alone. During raw spectrum editing, each spectral segment is validated in real time, consistent with the manageable spectral numbers resulting from LC-MALDI experiments. A semi-automated spectral-segment editor includes a semi-automated or automated assessment of the quality of all spectral segments as they are pooled across an XIC peak for summing, centroid mass determination, building of rates plots on-the-fly, and automated back exchange correction. The resulting deuterium uptake rates plots from various experiments can be averaged, subtracted, re-scaled, error-barred, and/or scatter-plotted from individual spectral segment centroids, compared to solvent exposure and hydrogen bonding predictions, and receive a color suggestion for 3D visualization. This software lends itself to a "divorced" HDX approach in which MS/MS-confirmed peptide libraries are built via nano or standard ESI without source modification, and HDX is performed via LC-MALDI using a standard MALDI-TOF. The complete TOF2H package includes additional (e.g., LC analysis) modules.
"TOF2H" provides a comprehensive HDX data analysis package that has accelerated the processing of LC-MALDI-based HDX data in the authors' lab from weeks to hours. It runs in a standard MS Windows (XP or Vista) environment, and can be downloaded or obtained from the authors at no cost.
Mass spectrometers can produce a large number of tandem mass spectra. Unfortunately, these spectra are contaminated with noise. Noise can degrade the quality of tandem mass spectra and thus increase the false positives and false negatives in peptide identification. Therefore, it is appealing to develop an approach to denoising tandem mass spectra.
We propose a novel approach to denoising tandem mass spectra. The proposed approach consists of two modules: spectral peak intensity adjustment and intensity local maximum extraction. In the spectral peak intensity adjustment module, we introduce five features to describe the quality of each peak. Based on these features, a score is calculated for each peak and is used to adjust its intensity. As a result, the intensity will be adjusted to a local maximum if a peak is a signal peak, and it will be decreased if the peak is a noisy one. The second module uses a morphological reconstruction filter to remove the peaks whose intensities are not the local maxima of the spectrum. Experiments have been conducted on two ion trap tandem mass spectral datasets: ISB and TOV. Experimental results show that our algorithm can remove about 69% of the peaks of a spectrum. At the same time, the number of spectra that can be identified by Mascot algorithm increases by 31.23% and 14.12% for the two tandem mass spectra datasets, respectively.
The proposed denoising algorithm can be integrated into current popular peptide identification algorithms such as Mascot to improve the reliability of assigning peptides to spectra.
Availability of the software
The software created from this work is available upon request.
Protein quantification is an essential step in many proteomics experiments. A number of labeling approaches have been proposed and adopted in mass spectrometry (MS) based relative quantification. mTRAQ, one of the stable isotope labeling methods, is amine-specific and available in triplex format, so that the sample throughput can be doubled compared with duplex reagents.
Methods and results
Here we propose a novel data analysis algorithm for peptide quantification in triplex mTRAQ experiments. It improves the accuracy of quantification in two ways. First, it identifies and separates the triplex isotopic clusters of a peptide in each full MS scan. We designed a schematic model of triplex overlapping isotopic clusters, and separated triplex isotopic clusters by solving cubic equations deduced from the schematic model. Second, it automatically determines the elution areas of peptides. Some peptides have similar masses and elution times, so their elution areas can overlap. Our algorithm successfully identified the overlaps and found accurate elution areas. We validated our algorithm using standard protein mixture experiments.
We showed that our algorithm was able to accurately quantify peptides in triplex mTRAQ experiments. Its software implementation is compatible with Trans-Proteomic Pipeline (TPP), and thus enables high-throughput analysis of proteomics data.
Tandem mass spectra (MS/MS) produced using electron transfer dissociation (ETD) differ from those derived from collision-activated dissociation (CAD) in several important ways. Foremost, the predominant fragment ion series are different: c- and z•-type ions are favored in ETD spectra while b- and y-type ions comprise the bulk of the CAD spectra. Additionally, ETD spectra possess specific neutral losses and charge-reduced precursors. Most database search algorithms were designed to analyze CAD spectra, and have only recently been adapted to accommodate c- and z•-ions; however, inclusion of these additional spectral features can hinder identification, leading to lower confidence scores and decreased sensitivity. Therefore, it is important to pre-process spectral data prior to submission to a database search to remove those features which cause complications. Here, we demonstrate the effect of removing these features on the number of identifications at a 1% false discovery rate (FDR) using the open mass spectrometry search algorithm (OMSSA). When analyzing two biological replicates of a yeast protein extract in three total analyses, the number of identifications with a ~1% FDR increased from ~4611 to ~5931 upon spectral pre-processing, an increase of ~28.6%. We outline the most effective pre-processing methods, and provide free software containing these algorithms.
Post-translational modifications (PTMs) play important roles in biological processes and molecular functions. Tandem mass spectrometry (MS/MS) has become one of the most promising methods for the detection and characterization of PTMs. Most MS/MS-based studies on the detection of PTMs have achieved good performance by searching protein databases with short peptide sequence tags or by exhaustive searching on one PTM. It is challenging to identify two PTMs from a single tandem mass spectrum. In this paper, we proposed a new algorithm to detect two PTMs of unknown types. First, we constructed a Pair of Peak Set (PPS), which is composed of pairs of peaks that have the highest sum of intensities. We used the PPS instead of the whole experimental spectrum to compare with the theoretical spectra from a peptide sequence database. Second, we revealed the relationship between the PPS and the whole experimental spectrum. Third, a series of logic conditions was proposed to detect PTMs from an MS/MS spectrum. The conditions were used to filter a peptide sequence database and obtain candidate hits. Finally, we used a scoring function to rank the candidate hits. We applied the method to a large MS/MS dataset, and the experimental results demonstrated that the proposed method identified PTMs of any type in a blind mode better than existing methods. In future work, we will use this method to identify more than two PTMs from an MS/MS spectrum.
tandem mass spectrum; post-translational modification (PTM); peptide sequence tags
Analysis of multiple LC-MS based metabolomic studies is carried out to determine overlaps and differences among various experiments. For example, in large metabolic biomarker discovery studies involving hundreds of samples, it may be necessary to conduct multiple experiments, each involving a subset of the samples due to technical limitations. The ions selected from each experiment are analyzed to determine overlapping ions. One of the challenges in comparing the ion lists is the presence of a large number of derivative ions such as isotopes, adducts, and fragments. These derivative ions and the retention time drifts need to be taken into account during comparison.
We implemented an ion annotation-assisted method to determine overlapping ions in the presence of derivative ions. Following this, each ion is represented by the monoisotopic mass of its cluster. This mass is then used to determine overlaps among the ions selected across multiple experiments.
The resulting ion list provides better coverage and more accurate identification of metabolites compared to the traditional method in which overlapping ions are selected on the basis of individual ion mass.
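The overlap step can be sketched as follows, assuming each ion cluster has already been reduced to its monoisotopic mass by the annotation step. The function name, the (mass, retention time) tuple layout, and the tolerance defaults of 10 ppm and 30 s are hypothetical choices for illustration, not values taken from the study.

```python
def overlapping_ions(exp_a, exp_b, ppm=10.0, rt_tol=30.0):
    """Pair ions from two experiments that agree in mass and retention time.

    exp_a, exp_b: lists of (monoisotopic_mass, rt_seconds) tuples, one per
                  annotated ion cluster.
    ppm:          mass tolerance in parts per million (assumed default).
    rt_tol:       allowed retention time drift in seconds (assumed default).
    """
    pairs = []
    for mass_a, rt_a in exp_a:
        for mass_b, rt_b in exp_b:
            # Mass tolerance scales with the mass itself.
            mass_ok = abs(mass_a - mass_b) <= mass_a * ppm * 1e-6
            # The retention time window absorbs drift between runs.
            rt_ok = abs(rt_a - rt_b) <= rt_tol
            if mass_ok and rt_ok:
                pairs.append(((mass_a, rt_a), (mass_b, rt_b)))
    return pairs
```

Because isotopes, adducts, and fragments of one metabolite collapse to a single cluster mass before this comparison, each metabolite contributes one candidate per experiment rather than one per derivative ion.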
The advantages and disadvantages of acquiring tandem mass spectra by collision-induced dissociation (CID) of peptides in linear ion trap – Fourier-transform hybrid instruments are described. These instruments offer the possibility to transfer fragment ions from the linear ion trap to the FT-based analyzer for analysis with both high resolution and high mass accuracy. In addition, performing CID during the transfer of ions from the linear ion trap (LTQ) to the FT analyzer is also possible in instruments containing an additional collision cell (i.e., the “C-trap” in the LTQ-Orbitrap), resulting in tandem mass spectra over the full m/z range and not limited by the ejection q value of the LTQ. Our results show that these scan modes have lower duty cycles than tandem mass spectra acquired in the LTQ with nominal mass resolution, and typically result in fewer peptide identifications during data-dependent analysis of complex samples. However, the higher measured mass accuracy and resolution provide more specificity and hence a lower false positive ratio for the same number of true positives during database search of peptide tandem mass spectra. In addition, the search for modified and unexpected peptides is greatly facilitated with this data acquisition mode. It is therefore concluded that acquisition of tandem mass spectral data with high measured mass accuracy and resolution is a competitive alternative to “classical” data acquisition strategies, especially in situations of complex searches from large databases, searches for modified peptides, or for peptides resulting from unspecific cleavages.
Peptides are commonly identified by searching tandem mass spectrometric data against a protein sequence collection; the protein sequences are digested in silico according to the specificity of the enzyme used in the experiment, and the fragments of each peptide are calculated. With this approach, the information in the tandem mass spectra cannot fully be utilized because we can neither predict the probability for observing a peptide, nor accurately calculate the intensities of the fragment ions from the sequence. In contrast, if we search a spectrum library that has been built from experimental data, we can more effectively utilize the information, because the search space is minimized by searching only mass spectra that have previously been observed, and the intensities of the fragment ions in the query spectrum and the library spectrum are similar. Here, we present a method for constructing and using annotated peptide spectrum libraries (ASL).
The ASL were created from a set of consensus spectra associated with peptide sequences from approximately 13,000,000 confidently assigned experimental tandem spectra from the Global Proteome Machine Database, using a four-stage curation pipeline to improve reliability. The current ASL collection contains data for six model eukaryotic species: human, mouse, dog, cow, rat, and budding yeast. The average number of spectra per gene in the ASL ranged from 6.9 (human) to 3.8 (cow). The peptide sequences in these libraries are sequence aligned with the corresponding ENSEMBL, SWISS-PROT, IPI, or SGD accession numbers. A high-speed search engine, X! Hunter, was constructed to use these libraries; it can identify peptides from sets of experimental mass spectra at a rate of 20,000 spectra/second.
The speed and sensitivity of the search engine are compared with standard techniques, and application of ASL to the high-speed screening of experimental data and instrument control is discussed.
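For contrast with library searching, the in silico digestion used by conventional sequence database searches can be sketched as below. Trypsin is taken as the example enzyme, with the commonly used rule of cleaving after K or R unless the next residue is P; the enzyme choice, the function name, and the `missed_cleavages` parameter are assumptions for illustration and are not specified in the text.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Cleave after K or R, except when the next residue is P."""
    pieces, start = [], 0
    for i, residue in enumerate(protein):
        next_res = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_res != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    # Whatever follows the last cleavage site is the C-terminal peptide.
    if start < len(protein):
        pieces.append(protein[start:])
    # Join neighbouring pieces to model up to `missed_cleavages` skipped sites.
    peptides = []
    for n in range(missed_cleavages + 1):
        for j in range(len(pieces) - n):
            peptides.append("".join(pieces[j:j + n + 1]))
    return peptides
```

Every peptide produced this way enters the search space whether or not it has ever been observed, which is the inefficiency that a library of previously observed spectra avoids.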
Searching spectral libraries in tandem mass spectrometry (MS/MS) is an important new approach to improving the quality of peptide and protein identification. The idea relies on the observation that ion intensities in an MS/MS spectrum of a given peptide are generally reproducible across experiments, and thus, matching between spectra from an experiment and the spectra of previously identified peptides stored in a spectral library can lead to better peptide identification compared to the traditional database search. However, the use of libraries is greatly limited by their coverage of peptide sequences: even for well-studied organisms a large fraction of peptides have not been previously identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides based on the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions between similar peptides tend to be similar. In accordance with this observation, we develop a neighbor-based approach which first selects peptides that are likely to have spectra similar to the target peptide and then combines their spectra using a weighted K-nearest neighbor method to accurately predict fragment ion intensities corresponding to the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of spectral libraries available from the National Institute of Standards and Technology by 20–60%, although the values vary with peptide length and charge state. We find that the overall best search performance is achieved when spectral libraries are supplemented by the high quality predicted spectra.
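The combination step of a weighted K-nearest neighbor prediction can be sketched as follows. The sketch assumes that a similarity weight has already been computed for each neighboring peptide and that fragment intensities are keyed by ion label (for example "b2" or "y3"); the data layout, the function name, and the default `k` are hypothetical, and the neighbor selection criteria of the actual method are not reproduced.

```python
def predict_intensities(neighbors, k=3):
    """Predict fragment intensities as a similarity-weighted average.

    neighbors: list of (weight, spectrum) pairs, where spectrum maps an
               ion label such as "b2" to a normalized intensity.
    k:         number of most similar neighbors to combine (assumed default).
    """
    # Keep the k neighbors with the largest similarity weights.
    top = sorted(neighbors, key=lambda n: n[0], reverse=True)[:k]
    total = sum(weight for weight, _ in top)
    # Collect every ion label seen in the selected neighbors.
    ions = set().union(*(spectrum.keys() for _, spectrum in top))
    # An ion missing from a neighbor contributes zero intensity.
    return {ion: sum(w * s.get(ion, 0.0) for w, s in top) / total
            for ion in sorted(ions)}
```

With weights 0.9 and 0.1 and `k=2`, an ion observed at intensity 1.0 in the closer neighbor and absent from the other is predicted at 0.9.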
The acquisition of high-resolution tandem mass spectra (MS/MS) is becoming more prevalent in proteomics, but most researchers employ peptide identification algorithms that were designed prior to this development. Here we demonstrate new software, Morpheus, designed specifically for high–mass accuracy data, based on a simple score that is little more than the number of matching products. For a diverse collection of datasets from a variety of organisms (E. coli, yeast, human) acquired on a variety of instruments (quadrupole–time of flight, ion trap orbitrap, quadrupole–orbitrap) in different laboratories, Morpheus gives more spectrum, peptide, and protein identifications at a 1% false discovery rate (FDR) than Mascot, Open Mass Spectrometry Search Algorithm (OMSSA), and Sequest. Additionally, Morpheus is 1.5 to 4.6 times faster—depending on the dataset—than the next fastest algorithm, OMSSA. Morpheus was developed in C#.NET and is available free and open source under a permissive license.
proteomics; mass spectrometry; database search algorithm; software
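A score that is little more than the number of matching products can be sketched as below. This is an illustration and not the Morpheus source code: the function name and the 10 ppm default are assumptions, and the fractional term (the share of spectrum intensity explained by matched peaks, used here to break ties between equal counts) is likewise an assumption of this sketch.

```python
def matching_product_score(theoretical_mz, peaks, ppm=10.0):
    """Count theoretical products matched by a peak, plus matched intensity share.

    theoretical_mz: predicted product m/z values for a candidate peptide.
    peaks:          list of (mz, intensity) tuples from the observed spectrum.
    ppm:            product mass tolerance in parts per million (assumed default).
    """
    matched_peaks = set()
    n_matched = 0
    for t in theoretical_mz:
        # Indices of observed peaks within tolerance of this product.
        hits = [i for i, (mz, _) in enumerate(peaks)
                if abs(mz - t) <= t * ppm * 1e-6]
        if hits:
            n_matched += 1
            matched_peaks.update(hits)
    total = sum(inten for _, inten in peaks)
    # The fraction is below 1 unless every peak is matched, so the integer
    # count of matching products dominates the ranking.
    fraction = sum(peaks[i][1] for i in matched_peaks) / total if total else 0.0
    return n_matched + fraction
```

A tight ppm tolerance is what makes such a simple count discriminating: with high mass accuracy data, random matches between theoretical products and observed peaks become rare.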