Data generated from liquid chromatography coupled to high-resolution mass spectrometry (LC-MS)-based studies of a biological sample can contain large amounts of biologically significant information in the form of proteins, peptides, and metabolites. Interpreting this data involves inferring the masses and abundances of biomolecules injected into the instrument. Because of the inherent complexity of mass spectral patterns produced by these biomolecules, the analysis is significantly enhanced by using visualization capabilities to inspect and confirm results. In this paper we describe Decon2LS, an open-source software package for automated processing and visualization of high-resolution MS data. Drawing extensively on algorithms developed over the last ten years for ICR2LS, Decon2LS packages the algorithms as a rich set of modular, reusable processing classes for performing diverse functions such as reading raw data, routine peak finding, theoretical isotope distribution modelling, and deisotoping. Because the source code is openly available, these functionalities can now be used to build derivative applications relatively quickly. In addition, Decon2LS provides an extensive set of visualization tools, such as high-performance chart controls.
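Theoretical isotope distribution modelling, one of the building blocks listed above, can be illustrated with a minimal sketch that convolves per-element isotope abundances at nominal-mass resolution. This is not Decon2LS code: the function names are hypothetical, the abundances are standard natural values, and isotopic fine structure is ignored.

```python
"""Nominal-mass isotope distribution by repeated convolution (illustrative sketch)."""

# Natural isotopic abundances, indexed by nominal mass offset from the lightest isotope.
ABUNDANCES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a, b, max_len):
    """Discrete convolution of two abundance vectors, truncated to max_len peaks.

    Truncation is exact for the retained peaks because every isotope offset is
    non-negative, so dropped high-mass terms never feed back into low-mass ones.
    """
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def isotope_distribution(composition, n_peaks=6):
    """Relative abundances of M0, M1, ... scaled so the tallest peak is 1.0."""
    dist = [1.0]
    for element, count in composition.items():
        for _ in range(count):
            dist = convolve(dist, ABUNDANCES[element], n_peaks)
    top = max(dist)
    return [p / top for p in dist]
```

For example, `isotope_distribution({"C": 100})` makes M1 the tallest peak, because 100 carbons at about 1.07% 13C carry roughly one heavy atom per molecule on average.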
Decon2LS supports processing of multiple raw data formats and offers a variety of processing options, including peak processing, deisotoping, and isotope composition analysis. Deisotoping can be performed on an individual scan, an individual dataset, or multiple datasets using batch processing. Other processing options include creating a two-dimensional view of mass and liquid chromatography (LC) elution time features, generating spectrum files for tandem MS data, creating total intensity chromatograms, and visualizing theoretical peptide profiles. Applying Decon2LS to deisotope datasets acquired on different instruments yielded a large number of features that can be used to identify and quantify peptides in the biological sample.
Decon2LS is an efficient software package for discovering and visualizing features in proteomics studies that require automated interpretation of mass spectra. Besides being easy to use, fast, and reliable, Decon2LS is also open-source, which allows developers in the proteomics and bioinformatics communities to reuse and refine the algorithms to meet individual needs.
Decon2LS source code, installer, and tutorials may be downloaded free of charge at .
Methods based on isotope labeling and LC-MS/MS analysis are becoming a standard approach for the simultaneous identification and quantification of proteins on the proteomic scale, as they provide high accuracy, precision, sensitivity, speed, and compatibility with protein pre-fractionation. However, due to the high complexity of these experiments, a software platform is required that allows for efficient data acquisition, interpretation, and validation.
We describe such a validation platform for quantitative proteomics, which supports different types of mass spectrometers (e.g., MALDI-TOF/TOF, ESI-ion trap, -Q-OTOF, and -FTMS) and any label chemistry, including multiplex labels. The software assists in the optimization of LC and MS instrument-specific parameters by providing 2D views of the whole LC-MS/MS dataset (SurveyViewer). The SurveyViewer facilitates tasks such as fast assessment of the mass calibration or of the fragmentation efficiency of isobaric labeling tags.
Protein regulation is derived from peptide regulation distributions using lognormal theory and median statistics. Median statistics are also used to eliminate outliers automatically. Box-and-whisker plots and bar graphs assist in evaluating the quantitative results and provide the basis for manually accepting or rejecting individual quantitative data.
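The abstract does not spell out the outlier rule, so the following is an assumption-laden sketch of median-based summarisation in log space: peptide ratios are log2-transformed (the lognormal assumption), outliers are flagged with a median absolute deviation (MAD) rule, and the protein ratio is the back-transformed median of the remaining values. The threshold `k=3.0` and the function name are hypothetical, not WARP-LC settings.

```python
import math
import statistics


def protein_ratio(peptide_ratios, k=3.0):
    """Summarise peptide ratios into one protein ratio via a log-space median.

    Returns (protein_ratio, number_of_outliers_removed).
    """
    logs = [math.log2(r) for r in peptide_ratios]
    med = statistics.median(logs)
    mad = statistics.median(abs(x - med) for x in logs)
    scale = 1.4826 * mad  # MAD -> standard-deviation equivalent for normal data
    if scale == 0:
        kept = logs  # no spread to judge outliers against
    else:
        kept = [x for x in logs if abs(x - med) <= k * scale]
    return 2 ** statistics.median(kept), len(logs) - len(kept)
```

Working in log space makes up- and down-regulation symmetric (a ratio of 2 and a ratio of 0.5 are equally far from 1), and the median keeps a single aberrant peptide from dragging the protein estimate.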
Direct links to sequence-annotated raw spectra of a particular regulated protein or peptide with plots of the identified amino acid sequence allow for efficient case study and curation of unexpected results, and the interactive validation of uncertain results.
The described features of the software platform WARP-LC are exemplified in LC-MS/MS studies of protein mixtures using ICPL, iTRAQ, and SILAC. Essential methodological issues such as quantification based on median vs. average statistics and peak areas vs. peak intensities were evaluated. It appears that the monoisotopic peak intensities of de-isotoped isotopic clusters are preferable for quantification, as the average quantification errors are approximately 20% lower.
Summary: YADA can deisotope and decharge high-resolution mass spectra from large peptide molecules, link the precursor monoisotopic peak information to the corresponding tandem mass spectrum, and account for different co-fragmenting ion species (multiplexed spectra). We describe how YADA enables a pipeline consisting of ProLuCID and DTASelect for analyzing large-scale middle-down proteomics data.
Supplementary information: Supplementary data are available at Bioinformatics online.
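YADA's own algorithm is not described here, but decharging rests on a simple relation: adjacent isotope peaks of an ion with charge z are spaced by roughly 1.00336/z in m/z (the 13C-12C mass difference), and the neutral monoisotopic mass is z × (m/z − m_proton). The sketch below uses hypothetical function names and assumes the monoisotopic peak is already known; for large peptides that peak is often too weak to observe, which is why practical tools fit a model isotope pattern instead.

```python
PROTON = 1.007276467   # proton mass, Da
C13_SPACING = 1.003355  # 13C - 12C mass difference, Da


def charge_from_spacing(mzs, max_charge=30):
    """Estimate charge z from the mean m/z gap between adjacent isotope peaks."""
    gaps = [b - a for a, b in zip(mzs, mzs[1:])]
    mean_gap = sum(gaps) / len(gaps)
    z = round(C13_SPACING / mean_gap)
    if not 1 <= z <= max_charge:
        raise ValueError(f"implausible charge {z}")
    return z


def decharge(mono_mz, z):
    """Neutral monoisotopic mass from the monoisotopic m/z of a protonated ion."""
    return z * (mono_mz - PROTON)
```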
Mass spectrometry (MS) is an essential analytical tool in proteomics. Many existing algorithms for peptide detection are based on isotope template matching and usually work at different charge states separately, making them ineffective at detecting overlapping peptides and low-abundance peptides.
We present BPDA, a Bayesian approach for peptide detection in data produced by MS instruments with high enough resolution to baseline-resolve isotopic peaks, such as MALDI-TOF and LC-MS. We model the spectra as a mixture of candidate peptide signals, and the model is parameterized by MS physical properties. BPDA is based on a rigorous statistical framework and avoids problems, such as voting and ad hoc thresholding, generally encountered in algorithms based on template matching. It systematically evaluates all possible combinations of peptide candidates to interpret a given spectrum, and iteratively finds the best-fitting peptide signal in order to minimize the mean squared error of the inferred spectrum relative to the observed spectrum. In contrast to previous detection methods, BPDA performs deisotoping and deconvolution of mass spectra simultaneously, which enables better identification of weak peptide signals and produces higher sensitivity and more robust results. Unlike template-matching algorithms, BPDA can handle complex data where features overlap. Our experimental results indicate that BPDA performs well on simulated data and real MS data sets, for various resolutions and signal-to-noise ratios, and compares very favorably with commonly used commercial and open-source software, such as flexAnalysis, OpenMS, and Decon2LS, in terms of sensitivity and detection accuracy.
Unlike previous detection methods, which only employ isotopic distributions and work at each single charge state alone, BPDA takes into account the charge state distribution as well, thus lending information to better identify weak peptide signals and produce more robust results. The proposed approach is based on a rigorous statistical framework, which avoids problems generally encountered in algorithms based on template matching. Our experiments indicate that BPDA performs well on both simulated data and real data, and compares very favorably with commonly used commercial and open-source software. The BPDA software can be downloaded from http://gsp.tamu.edu/Publications/supplementary/sun10a/bpda.
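To make the idea of iteratively adding the best-fitting peptide signal concrete, the sketch below greedily explains a spectrum as a non-negative sum of candidate templates, at each step choosing the template and least-squares amplitude that most reduce the squared residual. This is a matching-pursuit-style simplification, not BPDA itself: it has no Bayesian priors and no explicit charge-state-distribution model, and all names are hypothetical.

```python
import numpy as np


def greedy_fit(spectrum, templates, max_components=10, min_gain=1e-6):
    """Greedily explain `spectrum` as a non-negative sum of candidate templates.

    spectrum: 1-D intensity array on a fixed m/z grid.
    templates: dict name -> 1-D array on the same grid (e.g. one candidate
        peptide's isotope pattern at one charge state).
    Returns a list of (name, amplitude) and the final residual.
    """
    residual = np.asarray(spectrum, dtype=float).copy()
    chosen = []
    for _ in range(max_components):
        best = None
        for name, t in templates.items():
            t = np.asarray(t, dtype=float)
            norm2 = t @ t
            if norm2 == 0:
                continue
            # Least-squares amplitude for this template, clipped at zero.
            amp = max(residual @ t / norm2, 0.0)
            new_res = residual - amp * t
            gain = residual @ residual - new_res @ new_res  # drop in squared error
            if best is None or gain > best[2]:
                best = (name, amp, gain)
        if best is None or best[2] < min_gain:
            break
        name, amp, _ = best
        residual -= amp * np.asarray(templates[name], dtype=float)
        chosen.append((name, amp))
    return chosen, residual
```

Greedy selection is cheap but can be misled by overlapping candidates; evaluating combinations jointly, as the abstract describes for BPDA, is what addresses that weakness.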
Protonated molecular peptide ions and their product ions generated by tandem mass spectrometry appear as isotopologue clusters due to the natural isotopic variations of carbon, hydrogen, nitrogen, oxygen and sulfur. Quantitation of the isotopic composition of peptides can be employed in experiments involving isotope effects, isotope exchange, isotopic labeling by chemical reactions, and studies of metabolism by stable isotope incorporation. Both ion trap and quadrupole-time-of-flight mass spectrometry are shown to be capable of determining the isotopic composition of peptide product ions obtained by tandem mass spectrometry with both precision and accuracy. Profile-mode tandem mass spectra of isotopologue ion clusters are fit by non-linear least squares to a series of Gaussian peaks (described in the accompanying manuscript), which quantify the Mn/M0 values that define the isotopologue distribution (ID). To determine the isotopic composition of product ions from their ID, a new algorithm is developed that predicts the Mn/M0 ratios and obviates the need to determine the intensities of all of the ions of an ID. Consequently, a precise and accurate determination of the isotopic composition of a product ion may be obtained from only the initial values of the ID; however, the entire isotopologue cluster must be isolated prior to fragmentation. Following optimization of the molecular ion isolation width, fragmentation energy and detector sensitivity, the presence of isotopic excess (2H, 13C, 15N, 18O) is readily determined within 1%. The ability to determine the isotopic composition of sequential product ions permits the isotopic composition of individual amino acid residues in the precursor ion to be determined.
isotopologue distribution; mass isotopomer distribution; tandem mass spectrometry; deuterium incorporation; isotopic excess; isotope quantitation; H/D exchange; protein turnover
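The Mn/M0 ratios above depend on elemental composition. For the first ratio there is an exact closed form: M1/M0 = Σ_e n_e · p_e,1/p_e,0, where n_e is the number of atoms of element e and p_e,1/p_e,0 is the abundance ratio of its +1 Da isotope to its lightest isotope. The sketch below uses this relation to recover a deuterium atom fraction from an observed M1/M0, assuming only hydrogen deviates from natural abundance. It is not the algorithm of the paper (which draws on several Mn/M0 values), and the function names are hypothetical.

```python
# Natural abundance ratio p(+1 Da isotope) / p(lightest isotope) per element.
RATIO_M1 = {
    "C": 0.0107 / 0.9893,
    "H": 0.000115 / 0.999885,
    "N": 0.00364 / 0.99636,
    "O": 0.00038 / 0.99757,
    "S": 0.0075 / 0.9499,
}


def m1_over_m0(composition, ratios=RATIO_M1):
    """Exact M1/M0: each atom contributes independently to the +1 Da peak."""
    return sum(n * ratios[e] for e, n in composition.items())


def deuterium_fraction(observed_ratio, composition):
    """Solve M1/M0 for the 2H atom fraction, other elements held at natural abundance."""
    other = sum(n * RATIO_M1[e] for e, n in composition.items() if e != "H")
    q = (observed_ratio - other) / composition["H"]  # q = f / (1 - f)
    return q / (1 + q)
```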
Mass spectrometers can produce a large number of tandem mass spectra. Unfortunately, these spectra are contaminated with noise, which can degrade their quality and thus increase the numbers of false positives and false negatives in peptide identification. It is therefore desirable to develop an approach for denoising tandem mass spectra.
We propose a novel approach to denoising tandem mass spectra. The proposed approach consists of two modules: spectral peak intensity adjustment and intensity local maximum extraction. In the spectral peak intensity adjustment module, we introduce five features to describe the quality of each peak. Based on these features, a score is calculated for each peak and used to adjust its intensity. As a result, the intensity of a signal peak is adjusted to a local maximum, whereas the intensity of a noise peak is decreased. The second module uses a morphological reconstruction filter to remove the peaks whose intensities are not local maxima of the spectrum. Experiments have been conducted on two ion trap tandem mass spectral datasets: ISB and TOV. Experimental results show that our algorithm can remove about 69% of the peaks of a spectrum. At the same time, the number of spectra that can be identified by the Mascot algorithm increases by 31.23% and 14.12% for the two tandem mass spectral datasets, respectively.
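The second module's goal, keeping only peaks that are local intensity maxima, can be approximated with a plain sliding-window test. The sketch below is a simpler stand-in for the morphological reconstruction filter described above, not the authors' implementation; the function name and the ±2 m/z window are arbitrary assumptions.

```python
def local_maxima(peaks, half_window=2.0):
    """Keep (mz, intensity) peaks that are the most intense within +/- half_window m/z."""
    peaks = sorted(peaks)
    kept = []
    for mz, inten in peaks:
        neighbours = [i for m, i in peaks if abs(m - mz) <= half_window]
        if inten >= max(neighbours):
            kept.append((mz, inten))
    return kept
```

The quadratic neighbour scan is fine for a few hundred peaks per MS/MS spectrum; a two-pointer window would make it linear.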
The proposed denoising algorithm can be integrated into current popular peptide identification algorithms such as Mascot to improve the reliability of assigning peptides to spectra.
Availability of the software
The software created from this work is available upon request.
Stable isotope tracing with ultra-high resolution Fourier transform-ion cyclotron resonance-mass spectrometry (FT-ICR-MS) can provide simultaneous determination of hundreds to thousands of metabolite isotopologue species without the need for chromatographic separation. Therefore, this experimental metabolomics methodology may allow the tracing of metabolic pathways starting from stable-isotope-enriched precursors, which can improve our mechanistic understanding of cellular metabolism. However, contributions to the observed intensities arising from the stable isotope's natural abundance must be subtracted (deisotoped) from the raw isotopologue peaks before interpretation. Previously posed deisotoping problems are sidestepped due to the isotopic resolution and identification of individual isotopologue peaks. This peak resolution and identification come from the very high mass resolution and accuracy of FT-ICR-MS and present an analytically solvable deisotoping problem, even in the context of stable-isotope enrichment.
We present both a computationally feasible analytical solution and an algorithm for this newly posed deisotoping problem, both of which work with any amount of 13C or 15N stable-isotope enrichment. We demonstrate the algorithm by correcting for the effects of 13C natural abundance on a set of raw isotopologue intensities for a specific phosphatidylcholine lipid metabolite derived from a 13C-tracing experiment.
Correction for the effects of 13C natural abundance on a set of raw isotopologue intensities is computationally feasible when the raw isotopologues are isotopically resolved and identified. Such correction makes qualitative interpretation of stable isotope tracing easier and is required before attempting a more rigorous quantitative interpretation of the isotopologue data. The presented implementation is very robust with increasing metabolite size. Error analysis of the algorithm will be straightforward due to low relative error from the implementation itself. Furthermore, the algorithm may serve as an independent quality control measure for a set of observed isotopologue intensities.
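The correction has a triangular structure: a tracer isotopologue with k labeled carbons can contribute only to observed peaks M+i with i ≥ k, through natural 13C in its remaining n − k carbons, so observed_i = Σ_{k≤i} true_k · C(n−k, i−k) · p^(i−k) · (1−p)^(n−i). Forward substitution therefore suffices. The sketch below assumes 100% tracer purity and considers 13C only (no 2H, 15N, 17O, or 18O contributions); it illustrates the general idea and is not necessarily the authors' formulation. The function name is hypothetical.

```python
from math import comb

P13C = 0.0107  # natural 13C abundance


def correct_13c(observed, n_carbons, p=P13C):
    """Remove natural-abundance 13C contributions from isotopologue intensities.

    observed[i] is the raw intensity of the isotopologue with i extra 13C
    (i = 0..n_carbons). Returns the intensities attributable to the tracer.
    """
    n = n_carbons
    corrected = [0.0] * (n + 1)
    for i in range(n + 1):
        # Spill-over into peak i from already-solved tracer isotopologues k < i.
        spill = sum(
            corrected[k] * comb(n - k, i - k) * p ** (i - k) * (1 - p) ** (n - i)
            for k in range(i)
        )
        # Diagonal term: isotopologue i with no natural 13C among its other carbons.
        corrected[i] = (observed[i] - spill) / (1 - p) ** (n - i)
    return corrected
```

With noisy data some corrected values can come out slightly negative, which is one way such a correction can double as a quality check on the observed intensities.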
Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. The peptide fragmentation spectra generated by these workflows exhibit characteristic fragmentation patterns that can be used to identify the peptide. In other fields, where the compounds of interest do not have the convenient linear structure of peptides, fragmentation spectra are identified by comparing new spectra with libraries of identified spectra, an approach called spectral matching. In contrast to sequence-based tandem mass spectrometry search engines used for peptides, spectral matching can make use of the intensities of fragment peaks in library spectra to assess the quality of a match. We evaluate a hidden Markov model approach (HMMatch) to spectral matching, in which many examples of a peptide's fragmentation spectrum are summarized in a generative probabilistic model that captures the consensus and variation of each peak's intensity. We demonstrate that HMMatch has good specificity and superior sensitivity, compared to sequence database search engines such as X!Tandem. HMMatch achieves good results from relatively few training spectra, is fast to train, and can evaluate many spectra per second. A statistical significance model permits HMMatch scores to be compared with each other, and with other peptide identification tools, on a unified scale. HMMatch shows a similar degree of concordance with X!Tandem, Mascot, and NIST's MS Search, as they do with each other, suggesting that each tool can assign peptides to spectra that the others miss. Finally, we show that it is possible to extrapolate HMMatch models beyond a single peptide's training spectra to the spectra of related peptides, expanding the application of spectral matching techniques beyond the set of peptides previously observed.
computational molecular biology; mass spectroscopy; HMM; peptide identification; algorithms
Successful application of crosslinking combined with mass spectrometry for studying proteins and protein complexes requires specifically-designed crosslinking reagents, experimental techniques, and data analysis software. Using isotopically-coded ("heavy and light") versions of the crosslinker and cleavable crosslinking reagents is analytically advantageous for mass spectrometric applications and provides a "handle" that can be used to distinguish crosslinked peptides of different types, and to increase the confidence of the identification of the crosslinks.
Here, we describe a program suite designed for the analysis of mass spectrometric data obtained with isotopically-coded cleavable crosslinkers. The suite contains three programs: DX, DXDX, and DXMSMS. DX searches the mass spectra for the presence of ion signal doublets resulting from the light and heavy isotopic forms of the isotopically-coded crosslinking reagent used. DXDX searches for possible mass matches between cleaved and uncleaved isotopically-coded crosslinks based on the established chemistry of the cleavage reaction for a given crosslinking reagent. DXMSMS assigns the crosslinks to the known protein sequences, based on the isotopically-coded and un-coded MS/MS fragmentation data of uncleaved and cleaved peptide crosslinks.
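The core of a doublet search reduces to finding peak pairs whose m/z difference equals Δm/z for some charge z, where Δm is the mass difference between the heavy and light crosslinker forms. The sketch below is hypothetical: Δm, the tolerance, and the intensity-ratio filter are illustrative parameters, not DX settings. As an example, a label carrying four deuterium atoms would give Δm = 4 × 1.006277 ≈ 4.0251 Da.

```python
def find_doublets(peaks, delta_mass, charges=(1, 2, 3, 4), tol=0.01, max_ratio=3.0):
    """Return (light_mz, heavy_mz, z) pairs separated by delta_mass / z in m/z.

    peaks: list of (mz, intensity) with positive intensities.
    max_ratio: reject pairs whose intensities differ by more than this factor,
        assuming light and heavy forms were mixed in roughly equal amounts.
    """
    peaks = sorted(peaks)
    hits = []
    for i, (mz_l, int_l) in enumerate(peaks):
        for mz_h, int_h in peaks[i + 1:]:
            for z in charges:
                if abs((mz_h - mz_l) - delta_mass / z) <= tol:
                    if max(int_l, int_h) / min(int_l, int_h) <= max_ratio:
                        hits.append((mz_l, mz_h, z))
    return hits
```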
The combination of these three programs, which are tailored to the analytical features of the specific isotopically-coded cleavable crosslinking reagents used, represents a powerful software tool for automated high-accuracy peptide crosslink identification. See: http://www.creativemolecules.com/CM_Software.htm
Motivation: The addition of ion mobility spectrometry to liquid chromatography-mass spectrometry experiments requires new, or updated, software tools to facilitate data processing.
Results: We introduce a command line software application LC-IMS-MS Feature Finder that searches for molecular ion signatures in multidimensional liquid chromatography-ion mobility spectrometry-mass spectrometry (LC-IMS-MS) data by clustering deisotoped peaks with similar monoisotopic mass, charge state, LC elution time and ion mobility drift time values. The software application includes an algorithm for detecting and quantifying co-eluting chemical species, including species that exist in multiple conformations that may have been separated in the IMS dimension.
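Clustering deisotoped peaks by monoisotopic mass, charge state, LC elution time, and drift time can be sketched as greedy, intensity-ordered seeding. The tolerances and names below are assumptions for illustration; the actual LC-IMS-MS Feature Finder algorithm may differ. Peaks that agree in mass and elution time but differ in drift time end up in separate features, mirroring conformers separated in the IMS dimension.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    mass: float        # seed monoisotopic mass, Da
    charge: int
    lc_time: float     # seed LC elution time
    drift_time: float  # seed IMS drift time
    members: list = field(default_factory=list)


def cluster_peaks(peaks, ppm=10.0, lc_tol=0.5, drift_tol=0.3):
    """Group peaks given as (mass, charge, lc_time, drift_time, intensity)."""
    features = []
    # Most intense peaks seed features; weaker peaks join a compatible seed.
    for mass, z, lc, dt, inten in sorted(peaks, key=lambda p: p[4], reverse=True):
        for f in features:
            if (f.charge == z
                    and abs(mass - f.mass) / f.mass * 1e6 <= ppm
                    and abs(lc - f.lc_time) <= lc_tol
                    and abs(dt - f.drift_time) <= drift_tol):
                f.members.append((mass, lc, dt, inten))
                break
        else:
            features.append(Feature(mass, z, lc, dt, [(mass, lc, dt, inten)]))
    return features
```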
Availability: LC-IMS-MS Feature Finder is available as a command-line tool for download at http://omics.pnl.gov/software/LC-IMS-MS_Feature_Finder.php. The Microsoft .NET Framework 4.0 is required to run the software. All other dependencies are included with the software package. Use of this software is limited to non-profit research (see README).
Supplementary data are available at Bioinformatics online.
In a bottom-up shotgun approach, the proteins of a mixture are enzymatically digested, separated, and analyzed via tandem mass spectrometry. The mass spectra, which relate fragment ion intensity (abundance) to mass-to-charge ratio, are used to deduce the amino acid sequence and identify the peptides and proteins. The variables that influence intensity were characterized using a multi-factorial mixed-effects model, ten-fold cross-validation, and stepwise feature selection on 6,352,528 fragment ions from 61,543 peptide ions. Intensity was higher in fragment ions with no neutral mass loss than in those with any mass loss, and in fragment ions with a +1 charge state. Peptide ions classified as non-mobile for proton mobility had the lowest intensity of all mobility levels. Higher basic residue (arginine, lysine or histidine) counts in the peptide ion and low counts in the fragment ion were associated with lower fragment ion intensities. Higher counts of proline in peptide and fragment ions were associated with lower intensities. These results are consistent with the mobile proton theory. Opposite trends between peptide and fragment ion counts and intensity may be due to the different impact of the factor under consideration at different stages of the MS/MS experiment, or to the different distribution of observations across peptide and fragment ion levels. The presence of basic residues at all three positions next to the fragmentation site was associated with lower fragment ion intensity. The presence of proline proximal to the fragmentation site enhanced fragmentation and had the opposite trend when located distant from the site. A positive association between fragment ion intensity and the presence of sulfur-containing residues (cysteine and methionine) in the vicinity of the fragmentation site was identified. These results highlight the multi-factorial nature of fragment ion intensity and could improve algorithms for peptide identification and the simulation of tandem mass spectrometry experiments.
Fragment ion; Peptide ion; MS/MS; Proton mobility; Stepwise selection; Neutral mass loss; Proline
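The proton mobility classes mentioned above are commonly assigned from the precursor charge and the basic residue counts: mobile if the charge exceeds the number of Arg + Lys + His, partially mobile if it exceeds the number of Arg but not Arg + Lys + His, and non-mobile otherwise. The abstract does not restate this definition, so the sketch below treats it as an assumption; the function name is hypothetical.

```python
def proton_mobility(sequence, charge):
    """Classify a peptide ion as mobile, partially mobile, or non-mobile."""
    n_arg = sequence.count("R")
    n_basic = n_arg + sequence.count("K") + sequence.count("H")
    if charge > n_basic:
        return "mobile"            # at least one proton not sequestered by a basic site
    if charge > n_arg:
        return "partially mobile"  # protons exceed Arg but not all basic residues
    return "non-mobile"            # every proton can be held by an arginine
```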
The PepArML meta-search peptide identification platform provides a unified search interface to seven search engines; a robust cluster, grid, and cloud computing scheduler for large-scale searches; and an unsupervised, model-free, machine-learning-based result combiner, which selects the best peptide identification for each spectrum, estimates false-discovery rates, and outputs pepXML format identifications. The meta-search platform supports Mascot; Tandem with native, k-score, and s-score scoring; OMSSA; MyriMatch; and InsPecT with MS-GF spectral probability scores — reformatting spectral data and constructing search configurations for each search engine on the fly. The combiner selects the best peptide identification for each spectrum based on search engine results and features that model enzymatic digestion, retention time, precursor isotope clusters, mass accuracy, and proteotypic peptide properties, requiring no prior knowledge of feature utility or weighting. The PepArML meta-search peptide identification platform often identifies 2–3 times more spectra than individual search engines at 10% FDR.
Proteomics; Mass-Spectrometry; Machine-Learning; Cloud-Computing
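PepArML estimates false-discovery rates, but the abstract does not say how. A common technique, shown here as a swapped-in illustration rather than PepArML's method, is target-decoy estimation: among identifications scoring at or above a threshold, FDR ≈ decoys / targets, converted to monotone q-values before filtering. The function name is hypothetical and score ties are not treated specially.

```python
def psms_at_fdr(psms, max_fdr=0.10):
    """Return target PSMs whose target-decoy q-value is at most max_fdr.

    psms: list of (score, is_decoy) tuples; higher scores are better.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were set at this PSM.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR achievable at this or any lower threshold.
    qvals = fdrs[:]
    for i in range(len(qvals) - 2, -1, -1):
        qvals[i] = min(qvals[i], qvals[i + 1])
    return [p for p, q in zip(ranked, qvals) if not p[1] and q <= max_fdr]
```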
Solution-phase hydrogen/deuterium exchange monitored by mass spectrometry is an excellent tool to study protein-protein interactions and conformational changes in biological systems, especially when traditional methods such as X-ray crystallography or nuclear magnetic resonance are not feasible. Peak overlap among the dozens of proteolytic fragments (including those from autolysis of the protease) can be severe, due to high protein molecular weight(s) and the broad isotopic distributions due to multiple deuterations of many peptides. In addition, different subunits of a protein complex can yield isomeric proteolytic fragments. Here, we show that depletion of 13C and/or 15N for one or more protein subunits of a complex can greatly simplify the mass spectra, increase the signal-to-noise ratio of the depleted fragment ions, and remove ambiguity in assignment of the m/z values to the correct isomeric peptides. Specifically, it becomes possible to monitor the exchange progress for two isobaric fragments originating from two or more different subunits within the complex, without having to resort to tandem mass spectrometry techniques that can lead to deuterium scrambling in the gas phase. Finally, because the isotopic distribution for a small to medium-size peptide is essentially just the monoisotopic species (12C_c 1H_h 14N_n 16O_o 32S_s), it is not necessary to deconvolve the natural abundance distribution for each partially deuterated peptide during HDX data reduction.
Fourier Transform; Ion Cyclotron Resonance Mass Spectrometry; FTMS; Protein Complexes; Troponin
Modern mass spectrometers are now capable of producing hundreds of thousands of tandem (MS/MS) spectra per experiment, making the translation of these fragmentation spectra into peptide matches a common bottleneck in proteomics research. When coupled with experimental designs that enrich for post-translational modifications such as phosphorylation and/or include isotopically labeled amino acids for quantification, shotgun sequencing places additional burdens on this computational infrastructure. To address this issue, we have developed a new database searching program that utilizes the massively parallel compute capabilities of a graphical processing unit (GPU) to produce peptide spectral matches in a very high throughput fashion. Our program, named Tempest, combines efficient database digestion and MS/MS spectral indexing on a CPU with fast similarity scoring on a GPU. In our implementation, the entire similarity score, including the generation of full theoretical fragmentation spectra for candidate peptides and their comparison to experimental spectra, is computed on the GPU. Although Tempest uses the classical SEQUEST XCorr score as a primary metric for evaluating similarity for spectra collected at unit resolution, we have developed a new “Accelerated Score” for MS/MS spectra collected at high resolution that is based on a computationally inexpensive dot product but exhibits scoring accuracy similar to the classical XCorr. In our experience, Tempest provides compute-cluster level performance in an affordable desktop computer.
search algorithm; peptide spectral matching; graphical processing unit; GPU; tandem mass spectrometry; peptide identification
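The Accelerated Score is described only as being based on a dot product. A minimal sketch of a binned, normalised dot product between singly charged theoretical b/y ions and an experimental peak list follows; the bin width, square-root intensity scaling, and function names are assumptions, and this is not Tempest's scoring function. Residue masses are standard monoisotopic values for unmodified amino acids.

```python
import math
from collections import defaultdict

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(peptide):
    """Singly charged b and y fragment m/z values for an unmodified peptide."""
    masses = [RESIDUE[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion of length i
        ions.append(sum(masses[i:]) + WATER + PROTON)  # complementary y ion
    return ions


def binned(mzs, weights, width):
    """Sum weights into integer m/z bins of the given width."""
    vec = defaultdict(float)
    for mz, w in zip(mzs, weights):
        vec[int(mz / width)] += w
    return vec


def dot_score(peptide, spectrum, width=0.02):
    """Cosine similarity between binned theoretical and experimental spectra."""
    theo = by_ions(peptide)
    t = binned(theo, [1.0] * len(theo), width)
    e = binned([mz for mz, _ in spectrum], [math.sqrt(i) for _, i in spectrum], width)
    dot = sum(v * e.get(k, 0.0) for k, v in t.items())
    norm = math.sqrt(sum(v * v for v in t.values()) * sum(v * v for v in e.values()))
    return dot / norm if norm else 0.0
```

A dot product over sparse bins costs one pass over the theoretical ions, which is why this family of scores is attractive compared with XCorr's background-subtracted cross-correlation.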
Protein-amide proton hydrogen-deuterium exchange (HDX) is used to investigate protein conformation, conformational changes and surface binding sites for other molecules. To our knowledge, software tools to automate data processing and analysis from sample fractionating (LC-MALDI) mass-spectrometry-based HDX workflows are not publicly available.
An integrated data pipeline (Solvent Explorer/TOF2H) has been developed for the processing of LC-MALDI-derived HDX data. Based on an experiment-wide template, and taking an ab initio approach to chromatographic and spectral peak finding, initial data processing is based on accurate mass-matching to fully deisotoped peaklists; in MS/MS-confirmed peptide library searches, this matching accommodates ambiguous mass-hits to non-target proteins. Isotope-shift re-interrogation of library search results allows quick assessment of the extent of deuteration from peaklist data alone. During raw spectrum editing, each spectral segment is validated in real time, consistent with the manageable spectral numbers resulting from LC-MALDI experiments. A semi-automated spectral-segment editor includes a semi-automated or automated assessment of the quality of all spectral segments as they are pooled across an XIC peak for summing, centroid mass determination, building of rates plots on-the-fly, and automated back-exchange correction. The resulting deuterium uptake rates plots from various experiments can be averaged, subtracted, re-scaled, error-barred, and/or scatter-plotted from individual spectral segment centroids, compared to solvent exposure and hydrogen bonding predictions, and assigned a color suggestion for 3D visualization. This software lends itself to a "divorced" HDX approach in which MS/MS-confirmed peptide libraries are built via nano or standard ESI without source modification, and HDX is performed via LC-MALDI using a standard MALDI-TOF. The complete TOF2H package includes additional modules (e.g., LC analysis).
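Centroid-based deuterium uptake with back-exchange correction is commonly computed by normalising against undeuterated (m_0) and fully deuterated (m_100) controls: D = (m_t − m_0)/(m_100 − m_0) × N, where N is the number of exchangeable backbone amide hydrogens. TOF2H's exact correction is not given in the text, so this is an assumed, generic formulation with hypothetical function names; all centroids must be neutral masses, or m/z values at the same charge.

```python
def centroid(peaks):
    """Intensity-weighted mean m/z of an isotope cluster given as (mz, intensity)."""
    total = sum(i for _, i in peaks)
    return sum(mz * i for mz, i in peaks) / total


def deuterium_uptake(m_t, m_0, m_100, n_exchangeable):
    """Back-exchange-corrected number of deuteria incorporated at time t."""
    return (m_t - m_0) / (m_100 - m_0) * n_exchangeable
```

For instance, a peptide with 10 exchangeable amides whose fully deuterated control shifts by only 8 Da (20% back exchange) and whose time point shifts by 4 Da is corrected to 5 deuteria rather than 4.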
"TOF2H" provides a comprehensive HDX data analysis package that has accelerated the processing of LC-MALDI-based HDX data in the authors' lab from weeks to hours. It runs in a standard MS Windows (XP or Vista) environment, and can be downloaded or obtained from the authors at no cost.
De novo interpretation of tandem mass spectrometry (MS/MS) spectra provides sequences for searching protein databases when limited sequence information is present in the database. Our objective was to define a strategy for this type of homology-tolerant database search. Homology searches, using MS-Homology software, were conducted with 20, 10, or 5 of the most abundant peptides from 9 proteins, based either on precursor trigger intensity or on total ion current, and allowing for 50%, 30%, or 10% mismatch in the search. Protein scores were corrected by subtracting a threshold score that was calculated from random peptides. The highest (p < .01) corrected protein scores (i.e., above the threshold) were obtained by submitting 20 peptides and allowing 30% mismatch. Using these criteria, protein identification based on ion mass searching using MS/MS data (i.e., Mascot) was compared with that obtained using homology search. The highest-ranking protein was the same using Mascot, homology search using the 20 most intense peptides, or homology search using all peptides, for 63.4% of 112 spots from two-dimensional polyacrylamide gel electrophoresis gels. For these proteins, the percent coverage was greatest using Mascot compared with the use of all or just the 20 most intense peptides in a homology search (25.1%, 18.3%, and 10.6%, respectively). Finally, 35% of de novo sequences completely matched the corresponding known amino acid sequence of the matching peptide. This percentage increased when the search was limited to the 20 most intense peptides (44.0%). After identifying the protein using MS-Homology, a peptide mass search may increase the percent coverage of the protein identified.
homology search; mass spectrometry; bioinformatics
We sought to evaluate the reproducibility of a liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based approach to measure the stable-isotope enrichment of in vivo-labeled muscle ATP synthase β subunit (β-F1-ATPase), a protein directly involved in ATP production whose abundance is reduced under a variety of circumstances. Muscle was obtained from a rat infused with stable-isotope-labeled leucine. The muscle was homogenized, β-F1-ATPase was immunoprecipitated, and the protein was resolved using 1D-SDS PAGE. Following trypsin digestion of the isolated protein, the resultant peptide mixtures were subjected to analysis by HPLC-ESI-MS/MS, which resulted in the detection of multiple β-F1-ATPase peptides. Three peptides unique to β-F1-ATPase contained a leucine residue, were detected with high intensity relative to other peptides, and were assigned with >95% probability to β-F1-ATPase. These peptides were specifically targeted for fragmentation to assess their stable-isotope enrichment based on MS/MS peak areas calculated from extracted ion chromatograms for selected labeled and unlabeled fragment ions. Results showed the best linearity (R2 = 0.99) in the detection of MS/MS peak areas for both labeled and unlabeled fragment ions, over a wide range of amounts of injected protein, specifically for the β-F1-ATPase134-143 peptide. Measured stable-isotope enrichment was highly reproducible for the β-F1-ATPase134-143 peptide (CV = 2.9%). Further, using mixtures of synthetic labeled and unlabeled peptides, we determined that there is an excellent linear relationship (R2 = 0.99) between measured and predicted enrichment for percent enrichments ranging between 0.009% and 8.185% for the β-F1-ATPase134-143 peptide. The described method provides a reliable way to measure the stable-isotope enrichment of in vivo-labeled muscle β-F1-ATPase based on the determination of the enrichment of the β-F1-ATPase134-143 peptide.
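The text does not define enrichment or CV numerically. Assuming enrichment is the labeled fraction of the summed labeled and unlabeled fragment-ion peak areas, and CV is the sample standard deviation divided by the mean, both reduce to a few lines. Function names are hypothetical, and some studies instead report the tracer-to-tracee ratio (labeled/unlabeled).

```python
import statistics


def percent_enrichment(labeled_area, unlabeled_area):
    """Labeled share of total fragment-ion peak area, in percent (assumed definition)."""
    return 100.0 * labeled_area / (labeled_area + unlabeled_area)


def cv_percent(values):
    """Coefficient of variation of replicate measurements, in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)
```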
Motivation: In recent years stable isotopic labeling has become a standard approach for quantitative proteomic analyses. Among the many available isotopic labeling strategies, metabolic labeling is attractive for the excellent internal control it provides. However, analysis of data from metabolic labeling experiments can be complicated because the spacing between labeled and unlabeled forms of each peptide depends on its sequence, and is thus variable from analyte to analyte. As a result, one generally needs to know the sequence of a peptide to identify its matching isotopic distributions in an automated fashion. In some experimental situations it would be necessary or desirable to match pairs of labeled and unlabeled peaks from peptides of unknown sequence. This article addresses this largely overlooked problem in the analysis of quantitative mass spectrometry data by presenting an algorithm that not only identifies isotopic distributions within a mass spectrum, but also annotates matches between natural abundance light isotopic distributions and their metabolically labeled counterparts. The algorithm works in two stages: first, we annotate the isotopic peaks using a modified version of the IDM algorithm described last year; then we use a probabilistic classifier supplemented by dynamic programming to find the metabolically labeled matched isotopic pairs. Such a method is needed for high-throughput quantitative proteomic and metabolomic experiments measured via mass spectrometry.
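Why the light/heavy spacing varies with sequence is easiest to see for uniform 15N labeling, used here as an assumed example since the abstract does not name the label: each nitrogen adds 15N − 14N ≈ 0.997035 Da, so the mass shift is that constant times the peptide's nitrogen count, divided by the charge for m/z. The helper names are hypothetical.

```python
# Nitrogen atoms per amino-acid residue (backbone N plus side-chain N).
NITROGENS = {
    "A": 1, "R": 4, "N": 2, "D": 1, "C": 1, "E": 1, "Q": 2, "G": 1, "H": 3, "I": 1,
    "L": 1, "K": 2, "M": 1, "F": 1, "P": 1, "S": 1, "T": 1, "W": 2, "Y": 1, "V": 1,
}
N15_MINUS_N14 = 0.997035  # mass difference between 15N and 14N, Da


def nitrogen_count(sequence):
    """Total nitrogen atoms in an unmodified peptide."""
    return sum(NITROGENS[aa] for aa in sequence)


def heavy_shift_mz(sequence, charge):
    """m/z offset between the fully 15N-labeled and unlabeled forms."""
    return nitrogen_count(sequence) * N15_MINUS_N14 / charge
```

Two peptides of similar mass can therefore have quite different shifts (compare an arginine-rich peptide with a glycine-rich one), which is why pairing without a known sequence requires searching over plausible spacings.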
Results: The primary result of this article is that the dynamic programming approach performs well given perfect isotopic distribution annotations. Our algorithm achieves a true positive rate of 99% and a false positive rate of 1% using perfect isotopic distribution annotations. When the isotopic distributions are annotated given ‘expert’ selected peaks, the same algorithm achieves a true positive rate of 77% and a false positive rate of 1%. Finally, when annotating using machine-selected peaks, which may contain noise, the dynamic programming algorithm gives a true positive rate of 36% and a false positive rate of 1%. It is important to mention that these rates arise from the requirement of exact annotations of both the light and heavy isotopic distributions. In our evaluations, a match is considered ‘entirely incorrect’ if it is missing even one peak or contains an extraneous peak. If we only require that the ‘monoisotopic’ peaks exist within the two matched distributions, our algorithm obtains a true positive rate of 45% and a false positive rate of 1% on the ‘machine’ selected data. Changes to the algorithm's scoring function and training example generation improve our ‘monoisotopic’ peak score true positive rate to 65% while obtaining a false positive rate of 2%. All results were obtained within 10-fold cross-validation of 41 mass spectra with a mass-to-charge range of 800–4000 m/z. A total of 713 isotopic distributions and 255 matched isotopic pairs were hand-annotated for this study.
Availability: Programs are available via http://www.cs.wisc.edu/~mcilwain/IDM/
The considerable progress in high-throughput proteomics analysis via liquid chromatography-electrospray ionization-tandem mass spectrometry over the last decade has been fueled to a large degree by continuous improvements in instrumentation. High-throughput identification experiments are based on peptide sequencing and are largely accomplished through tandem mass spectrometry, with ion trap and trap-based instruments having become broadly adopted analytical platforms. To satisfy increasingly demanding requirements for depth of characterization and throughput, we present a newly developed dual-pressure linear ion trap mass spectrometer (LTQ Velos) that offers increased sensitivity, afforded by a new source design, and achieves practical cycle times half as long as those of an LTQ XL, while improving or maintaining the quality of MS/MS fragmentation spectra. These improvements resulted in a substantial increase in the detection and identification of both proteins and unique peptides from the complex proteome of Caenorhabditis elegans, compared with existing platforms. The greatly increased ion flux into the mass spectrometer, combined with improved isolation of low-abundance precursor ions, increased the detection of low-abundance peptides. Cumulatively, these improvements resulted in substantially greater penetration into the baker’s yeast (Saccharomyces cerevisiae) proteome compared with the LTQ XL. Alternatively, the faster cycle times of the new instrument allowed higher throughput for a given depth of proteome analysis, with more peptides and proteins identified in 60 min on an LTQ Velos than in 180 min on an LTQ XL. When mass analysis was carried out at a resolution in excess of 25,000 FWHM, it became possible to isotopically resolve a small intact protein and its fragments, opening possibilities for top-down experiments.
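A rough sense of why about 25,000 FWHM suffices for a small intact protein comes from comparing an ion's m/z with the spacing of its isotopic peaks. The sketch below uses the simplifying criterion that adjacent isotopic peaks count as resolved when their spacing equals the peak FWHM; real separation requirements are stricter, and the function name is hypothetical.

```python
PROTON = 1.007276       # proton mass, Da
C13_SPACING = 1.003355  # 13C minus 12C mass difference, Da


def required_resolving_power(neutral_mass: float, charge: int) -> float:
    """Resolving power (m/z divided by FWHM) at which adjacent isotopic
    peaks are just separated, assuming spacing == FWHM counts as resolved."""
    mz = (neutral_mass + charge * PROTON) / charge
    spacing = C13_SPACING / charge   # isotopic peaks sit 1/z Th apart
    return mz / spacing
```

Because both the m/z and the isotope spacing scale as 1/z, the required resolving power is close to the neutral mass in Da regardless of charge state, so under this crude criterion a 25,000 FWHM instrument reaches proteins of roughly the low tens of kDa.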
Quantification of protein expression by means of mass spectrometry (MS) has been introduced in various proteomics studies. In particular, two label-free quantification methods, spectral counting and spectral feature analysis, have been extensively investigated in a wide variety of proteomic studies. The cornerstone of both methods is peptide identification based on a proteomic database search and subsequent estimation of peptide retention time. However, both often suffer from restrictive database searches and inaccurate estimation of the liquid chromatography (LC) retention time. Furthermore, conventional peptide identification methods based on database or spectral library search algorithms, such as SEQUEST or SpectraST, respectively, have been found to provide neither the best match nor high-scoring matches. Lastly, these methods are limited in that target peptides cannot be identified unless they have previously been generated and stored in the database or spectral library.
To overcome these limitations, we propose a novel method, Q-FISH (Quantification method based on Finding the Identical Spectral set for a Homogenous peptide), which estimates a peptide's abundance from its tandem mass spectrometry (MS/MS) spectra through direct comparison of experimental spectra. In essence, Q-FISH compares all possible pairs of experimental spectra in order to identify both known and novel proteins, significantly enhancing identification accuracy by grouping replicate spectra from the same peptide targets.
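The all-pairs comparison and grouping of replicate spectra can be sketched as follows. The text does not state which similarity measure or grouping rule Q-FISH uses, so this sketch swaps in two common choices: cosine similarity of unit-width binned spectra and single-linkage grouping via union-find. The threshold and bin width are hypothetical parameters.

```python
import math
from itertools import combinations


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz // bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    return vec


def cosine(u, v):
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_spectra(spectra, threshold=0.8, bin_width=1.0):
    """Compare every pair of spectra; link pairs above threshold (single linkage)."""
    vecs = [bin_spectrum(s, bin_width) for s in spectra]
    parent = list(range(len(spectra)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(spectra)), 2):
        if cosine(vecs[i], vecs[j]) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(spectra)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The cluster size then serves as a spectral count for the underlying peptide. The pairwise loop is quadratic in the number of spectra, which is the cost of comparing all pairs directly.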
We applied Q-FISH to nano-LC-MS/MS data obtained from human hepatocellular carcinoma (HCC) and normal liver tissue samples to identify peptides differentially expressed between the normal and disease samples. From a total of 44,318 spectra obtained through MS/MS analysis, Q-FISH yielded 14,747 clusters. Of these, 5,777 clusters were identified only in the HCC sample, 6,648 only in the normal tissue sample, and 2,323 in both the HCC and normal tissue samples. Although peptide clusters found in only one sample would be interesting to investigate, we further examined the spectral clusters identified in both the HCC and normal samples, since our goal is to identify and quantitatively assess differentially expressed peptides. The next step was to perform a beta-binomial test to isolate peptides differentially expressed between the HCC and normal tissue samples. This test yielded 84 peptides with significantly different spectral counts between the HCC and normal tissue samples. We independently identified 50 and 95 peptides by SEQUEST, of which 24 and 56 peptides, respectively, were found to be known biomarkers for human liver cancer. Comparing the Q-FISH and SEQUEST results, we found that 22 of the 84 differentially expressed peptides from Q-FISH were also identified by SEQUEST. Remarkably, of these 22 peptides discovered by both Q-FISH and SEQUEST, 13 are known to be associated with human liver cancer and the remaining 9 are known to be associated with other cancers.
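A beta-binomial test on spectral counts can be sketched as follows, assuming one plausible setup that the text does not spell out. Each shared cluster has k HCC spectra out of n total. Under the null hypothesis the HCC fraction follows a Beta distribution with mean p0 (for example, the HCC share of all spectra), and a hypothetical precision parameter a + b controls overdispersion. Only the standard library is used, and the two-sided p-value sums all outcomes no more likely than the observed one.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(k: int, n: int, a: float, b: float) -> float:
    """P(X = k) for X ~ BetaBinomial(n, a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def betabinom_two_sided_p(k: int, n: int, p0: float, precision: float = 20.0) -> float:
    """Two-sided p-value for observing k of n spectra in one sample.

    p0 is the expected fraction under the null; `precision` (a + b) is a
    hypothetical overdispersion setting, with larger values approaching a binomial.
    """
    a, b = p0 * precision, (1.0 - p0) * precision
    pmf = [betabinom_pmf(x, n, a, b) for x in range(n + 1)]
    observed = pmf[k]
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))
```

The beta-binomial is chosen over a plain binomial because replicate spectral counts usually vary more than binomial sampling alone would predict.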
We proposed a novel statistical method, Q-FISH, for accurately identifying protein species and simultaneously quantifying the expression levels of identified peptides from mass spectrometry data. Q-FISH analysis of human HCC and normal liver tissue samples identified many protein biomarkers that are highly relevant to HCC. Q-FISH can be a useful tool for both peptide identification and quantification in mass spectrometry data analysis. It may also prove more effective than SEQUEST and other standard methods in discovering novel protein biomarkers.
A better understanding of the mechanisms involved in gas-phase fragmentation of peptides is essential for the development of more reliable algorithms for high-throughput protein identification using mass spectrometry (MS). Current methodologies depend predominantly on the m/z values of fragment ions, and the knowledge provided by the intensity information in MS/MS spectra has not been fully exploited. Indeed, spectral intensity information is very rarely used by the algorithms currently employed for high-throughput protein identification.
In this work, a Bayesian neural network approach is employed to analyze the ion intensity information present in 13,878 different MS/MS spectra. The influence of a library of 35 features on peptide fragmentation is examined under different proton mobility conditions. Useful rules governing peptide fragmentation are identified, and the subsets of features with significant influence on the fragmentation pathways of peptides are characterised. An intensity model is built on the selected features, and the model can accurately predict the intensity patterns of given MS/MS spectra. The predictions include not only the mean spectral intensities but also their variances, which can be used to tolerate noise and systematic biases within experimental MS/MS spectra.
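Predicted variances are useful because they allow an observed spectrum to be scored against the model while tolerating noisy fragments. The sketch below uses an independent Gaussian log-likelihood per fragment. This scoring function is an assumption made for illustration, not necessarily how the model's outputs were used in this work.

```python
import math


def gaussian_log_likelihood(observed, predicted_mean, predicted_var):
    """Sum of per-fragment Gaussian log-densities of normalized intensities.

    Fragments with large predicted variance contribute a smaller penalty for
    deviating from the predicted mean (illustrative scoring only).
    """
    if not (len(observed) == len(predicted_mean) == len(predicted_var)):
        raise ValueError("inputs must have equal length")
    ll = 0.0
    for y, mu, var in zip(observed, predicted_mean, predicted_var):
        ll += -0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)
    return ll
```

For a large deviation, a fragment predicted with high variance scores better than one predicted with low variance, so intrinsically noisy fragment types are not over-penalized.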
The intensity patterns of fragmentation spectra are informative and can be used to analyze the influence of various characteristics of fragmented peptides on their fragmentation pathway. The features with significant influence can be used in turn to predict spectra intensities. Such information can help develop more reliable algorithms for peptide and protein identification.
Glycomics quintavariate-informed quantification (GlyQ-IQ) is a biologically guided glycomics analysis tool for identifying N-glycans in liquid chromatography–mass spectrometry (LC–MS) data. Glycomics LC–MS data sets have convoluted extracted ion chromatograms that are challenging to deconvolve with software. Deconvolving the LC dimension into its constituent peaks is critical in glycomics data sets because chromatographic peaks correspond to different intact glycan structural isomers. The biologically targeted analysis approach offers several key advantages over traditional LC–MS data processing. A priori information about each target glycan's elemental composition improves sensitivity, because the exact isotope profile can be used to focus chromatogram generation and LC peak fitting on the isotopic species with the highest intensity. Glycan target annotation uses glycan family relationships and in-source fragmentation, in addition to high-specificity LC–MS feature detection, to improve the specificity of the analysis. The GlyQ-IQ software was developed in this work and used to profile N-glycan compositions from human serum LC–MS data sets. A case study demonstrates how GlyQ-IQ identifies and removes confounding chromatographic peaks from high-mannose glycan isomers in human blood serum. In addition, GlyQ-IQ was used to generate a broad human serum N-glycan profile from a high-resolution nanoelectrospray liquid chromatography–tandem mass spectrometry (nESI-LC–MS/MS) data set. A total of 156 glycan compositions and 640 glycan isomers were detected from a single sample. Over 99% of the GlyQ-IQ glycan-feature assignments passed manual validation and are backed by high-resolution mass spectra.
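Focusing on the most intense isotopic species requires a theoretical isotope profile computed from the target's elemental composition. The sketch below builds one by convolving approximate natural isotope abundances, binned by nominal mass offset, for C, H, N, and O only. It then picks the most intense peak. It is not GlyQ-IQ's implementation, and the function names are hypothetical.

```python
# Approximate natural isotope abundances, indexed by nominal mass offset (+0, +1, +2).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}


def convolve(a, b, max_len=None):
    """Discrete convolution of two abundance vectors, optionally truncated."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out[:max_len] if max_len else out


def isotope_profile(composition, max_peaks=10):
    """Relative abundances of the M+0, M+1, ... peaks for a CHNO composition.

    Truncation only discards high-offset peaks; lower entries remain exact.
    """
    profile = [1.0]
    for element, count in composition.items():
        for _ in range(count):
            profile = convolve(profile, ISOTOPES[element], max_peaks)
    return profile


def most_intense_isotope(composition):
    """Index of the highest-abundance isotopic peak (0 = monoisotopic)."""
    profile = isotope_profile(composition)
    return max(range(len(profile)), key=profile.__getitem__)
```

For small molecules the monoisotopic peak dominates. For large glycans the maximum shifts to M+1 or beyond, so building chromatograms on that peak rather than on the monoisotopic peak is what improves sensitivity.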
An important step in mass spectrometry (MS)-based proteomics is the identification of peptides by their fragment spectra. Regardless of the identification score achieved, almost all tandem MS (MS/MS) spectra contain remaining peaks that are not assigned by the search engine. Human experts may be able to explain these peaks, but the scale of modern proteomics experiments makes manual annotation impractical. In computer science, expert systems are a mature technology for implementing a list of rules derived from interviews with practitioners. Here we develop such an expert system, drawing on literature knowledge as well as a large body of high-mass-accuracy, pure fragmentation spectra. Interestingly, we find that even with high-mass-accuracy data, rule sets can quickly become too complex, leading to over-annotation. We therefore establish a rigorous false discovery rate, calculated by random insertion of peaks from a large collection of other MS/MS spectra, and use it to develop an optimized knowledge base. This rule set correctly annotates almost all peaks of medium or high abundance. For high-resolution HCD data, the median intensity coverage of fragment peaks in MS/MS spectra increases from 58% with search engine annotation alone to 86%. The resulting annotation performance surpasses that of a human expert, especially on complex spectra such as those of larger phosphorylated peptides. Our system is also applicable to high-resolution collision-induced dissociation data. It is available both as part of MaxQuant and via a webserver that requires only an MS/MS spectrum and the corresponding peptide sequence and outputs publication-quality annotated MS/MS spectra (www.biochem.mpg.de/mann/tools/). It provides expert knowledge to beginners in the field of MS-based proteomics and helps advanced users focus on unusual and possibly novel types of fragment ions.
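The two quantities above, intensity coverage and a false discovery rate estimated by inserting peaks from other spectra, can be sketched as follows. This is a simplified illustration and not the MaxQuant implementation. Annotation here is a plain ppm-tolerance match against theoretical fragment m/z values. The FDR estimate assumes the chance-annotation rate of inserted decoy peaks also applies to every real peak, which gives a conservative upper estimate. All names and defaults are hypothetical.

```python
import random


def annotate(peaks_mz, theoretical, tol_ppm=10.0):
    """Indices of peaks within tol_ppm of any theoretical fragment m/z."""
    hits = set()
    for i, mz in enumerate(peaks_mz):
        tol = mz * tol_ppm * 1e-6
        if any(abs(mz - t) <= tol for t in theoretical):
            hits.add(i)
    return hits


def intensity_coverage(peaks, theoretical, tol_ppm=10.0):
    """Fraction of total spectrum intensity carried by annotated peaks."""
    hits = annotate([mz for mz, _ in peaks], theoretical, tol_ppm)
    total = sum(intensity for _, intensity in peaks)
    return sum(peaks[i][1] for i in hits) / total if total else 0.0


def decoy_fdr(theoretical, decoy_pool, n_real_annotated, n_real,
              n_insert=1000, tol_ppm=10.0, seed=0):
    """Estimate FDR from how often randomly inserted foreign peaks get annotated."""
    rng = random.Random(seed)
    decoys = [rng.choice(decoy_pool) for _ in range(n_insert)]
    chance_rate = len(annotate(decoys, theoretical, tol_ppm)) / n_insert
    expected_false = chance_rate * n_real
    return min(1.0, expected_false / n_real_annotated) if n_real_annotated else 0.0
```

As rules are added, the set of theoretical m/z values grows and the chance-annotation rate rises. This is the over-annotation effect that the decoy-based FDR is meant to expose.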
Current experimental techniques, especially those applying liquid chromatography mass spectrometry, have made high-throughput proteomic studies possible. The increase in throughput, however, also raises concerns about the accuracy of identification and quantification. Most experimental procedures select, in a given MS scan, only a few of the most intense parent ions, each of which is fragmented (MS2) separately; most other minor co-eluted peptides with similar chromatographic retention times are ignored, and their information is lost.
We have computationally investigated the possibility of enhancing information retrieval during a given LC/MS experiment by selecting the two or three most intense parent ions for simultaneous fragmentation. To mimic the spectra of co-eluted peptides, a set of spectra was created by superimposing a number of MS2 spectra, each of which can be identified with high confidence by all the search methods tested. The resulting convoluted spectra were used to evaluate the ability of several database search methods (SEQUEST, Mascot, X!Tandem, OMSSA, and RAId_DbS) to identify true peptides from superimposed spectra of co-eluted peptides. Using these simulated spectra, we show that all the database search methods eventually gain in the number of true peptides identified when the compound spectra of co-eluted peptides are used.
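The superposition step can be sketched as follows. Each MS2 spectrum is scaled to its own base peak, optionally weighted to mimic different precursor abundances, and then merged, with peaks closer than a tolerance combined into one intensity-weighted peak. The normalization, weights, and merge tolerance are assumptions for illustration; the text does not describe how the spectra were combined.

```python
def superimpose(spectra, tol=0.5, weights=None):
    """Merge (m/z, intensity) peak lists into one compound spectrum.

    Each spectrum is scaled to its base peak and multiplied by an optional
    weight; peaks within `tol` of the previous merged peak are combined.
    """
    weights = weights or [1.0] * len(spectra)
    merged = []
    for spectrum, w in zip(spectra, weights):
        base = max(intensity for _, intensity in spectrum)
        merged.extend((mz, w * intensity / base) for mz, intensity in spectrum)
    merged.sort()
    out = []
    for mz, intensity in merged:
        if out and mz - out[-1][0] <= tol:
            prev_mz, prev_i = out[-1]
            total = prev_i + intensity
            # intensity-weighted mean m/z of the combined peak
            out[-1] = ((prev_mz * prev_i + mz * intensity) / total, total)
        else:
            out.append((mz, intensity))
    return out
```

Passing unequal `weights` (for example, 1.0 and 0.3) approximates a dominant precursor co-fragmented with a minor co-eluting one.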
Open peer review
Reviewed by Vlad Petyuk (nominated by Arcady Mushegian), King Jordan and Shamil Sunyaev. For the full reviews, please go to the Reviewers' comments section.
Tandem mass spectrometry (MSMS) of peptides is currently a dominant technique used to identify peptides and, consequently, proteins. Peptide fragmentation inside the mass analyzer typically produces a spectrum containing several different groups of ions. The mass-to-charge (m/z) values of these ions can be calculated exactly by following simple rules based on the possible peptide fragmentation reactions. However, the (relative) intensities of particular ions cannot simply be predicted from the amino acid sequence of the peptide. This study presents initial work towards a fundamental theoretical approach to elucidating ion intensities using quantum mechanical computations.
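The "simple rules" for fragment m/z values are shown below for b and y ions using monoisotopic residue masses. The residue table covers only the five residues of GAVLK, and the function names are hypothetical. A b_i ion is the sum of the first i residues plus the charging protons. A y_i ion is the sum of the last i residues plus water plus the charging protons.

```python
# Monoisotopic residue masses (Da) for the residues of GAVLK only.
MONO = {"G": 57.02146, "A": 71.03711, "V": 99.06841, "L": 113.08406, "K": 128.09496}
PROTON = 1.007276
WATER = 18.010565


def b_y_ions(seq, charge=1):
    """Return (b1..b(n-1), y1..y(n-1)) m/z values at the given charge."""
    b, y = [], []
    for i in range(1, len(seq)):
        b_mass = sum(MONO[aa] for aa in seq[:i])
        y_mass = sum(MONO[aa] for aa in seq[i:]) + WATER
        b.append((b_mass + charge * PROTON) / charge)
        y.append((y_mass + charge * PROTON) / charge)
    return b, y[::-1]  # reverse so y[0] is y1


def precursor_mz(seq, charge):
    """m/z of the [M + zH]z+ precursor ion."""
    return (sum(MONO[aa] for aa in seq) + WATER + charge * PROTON) / charge
```

For GAVLK this gives a [M+2H]2+ precursor near m/z 244.166 and singly charged y1 at about 147.113. These positions are fixed by the sequence. Their relative heights are what the quantum mechanical approach tries to predict.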
MSMS spectra of the doubly charged GAVLK peptide were collected on electrospray ion trap mass spectrometers using low-energy fragmentation modes. Density functional theory (DFT) calculations were performed on the population of precursor ions to determine the fragment ion intensities corresponding to a Boltzmann distribution of protonation among the nitrogens of the peptide backbone amide bonds.
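The Boltzmann step converts computed relative energies of the alternative protonated precursors into populations. Under the thermodynamic picture described here, relative cleavage propensities at each amide bond are taken to follow these populations. The sketch below uses kJ/mol and a default of 298.15 K. The effective ion temperature in the trap is an assumption left as a parameter, and any energies passed in are placeholders rather than the DFT values from this study.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol K)


def boltzmann_populations(rel_energies_kj, temperature_k=298.15):
    """Fractional populations of protonation sites from relative energies.

    Energies are shifted by their minimum before exponentiating for numerical
    stability; the shift cancels on normalization.
    """
    e_min = min(rel_energies_kj)
    weights = [math.exp(-(e - e_min) / (R_KJ * temperature_k)) for e in rel_energies_kj]
    z = sum(weights)
    return [w / z for w in weights]
```

A site 5 kJ/mol above the lowest-energy site is populated at roughly one seventh of its level at room temperature, so small energy differences already produce clearly different predicted fragment intensities.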
We were able to (a) predict the order of y and b ion intensities in agreement with the experimental observations, and (b) predict the relative intensities of y ions with errors not exceeding the experimental variation.
These results suggest that fragmentation of the GAVLK peptide in the ion trap mass spectrometer is driven predominantly by the thermodynamic stability of the precursor ions formed upon ionization of the sample. The computational approach presented in this manuscript successfully calculated the ion intensities in the mass spectra of this doubly charged tryptic peptide based solely on its amino acid sequence. As such, this work indicates the potential of incorporating quantum mechanical calculations into mass spectrometry-based algorithms for molecular identification.