Mass spectrometry (MS)-based shotgun proteomics allows protein identifications even in complex biological samples. Protein abundances can then be estimated from the counts of MS/MS spectra attributable to each protein, provided that one corrects for differential MS-detectability of the contributing peptides. We describe the use of a method, APEX, which calculates Absolute Protein EXpression levels based on learned correction factors, MS/MS spectral counts, and each protein's probability of correct identification.
The APEX-based calculations consist of three parts: (1) Using training data, peptide sequences, and their sequence properties, a model is built to estimate the MS-detectability (Oi) of any given protein. (2) Absolute abundances of the proteins measured in an MS/MS experiment are calculated from spectral counts, identification probabilities, and the learned Oi values. (3) Simple statistics allow significance analysis of differential expression between two distinct biological samples, i.e., comparison of relative protein abundances. APEX-based protein abundances span more than four orders of magnitude and are applicable to mixtures of hundreds to thousands of proteins from any type of organism.
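As a rough illustration of step (2), the Python sketch below assumes the commonly cited APEX form, APEX_i = (p_i n_i / O_i) / Σ_k (p_k n_k / O_k) × C. Here n_i is the spectral count, p_i the identification probability, O_i the learned detectability, and C an externally supplied estimate of the total number of protein molecules in the sample. C is an assumption: this abstract does not define it. The function name and dictionary inputs are hypothetical.

```python
from typing import Dict


def apex_abundances(spectral_counts: Dict[str, float],
                    id_probabilities: Dict[str, float],
                    detectability: Dict[str, float],
                    total_molecules: float) -> Dict[str, float]:
    """Hypothetical helper: APEX-style absolute abundance per protein."""
    # Probability-weighted, detectability-corrected count: p_i * n_i / O_i
    weighted = {
        protein: id_probabilities[protein] * spectral_counts[protein] / detectability[protein]
        for protein in spectral_counts
    }
    denominator = sum(weighted.values())
    # Each protein's share of the corrected counts, scaled to the assumed total C
    return {protein: w / denominator * total_molecules for protein, w in weighted.items()}


# Two proteins with equal counts; P1 is assumed to be half as detectable as P2
print(apex_abundances({"P1": 10, "P2": 10}, {"P1": 1.0, "P2": 1.0},
                      {"P1": 0.5, "P2": 1.0}, 3000))
```

Dividing by O_i is the correction for differential detectability. At equal spectral counts, the less detectable protein is estimated to be more abundant.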
Quantitative proteomics; Protein expression; Label-free mass spectrometry; Spectral counting
Proteomics is the large-scale study of proteins, particularly their expression, structures and functions. This still-emerging combination of technologies aims to describe and characterize all expressed proteins in a biological system. Because mass spectrometers have upper limits on the masses they can detect, proteins are usually digested into peptides, which are then separated, identified and quantified from this complex enzymatic digest. The drawback of digesting proteins first and then analyzing the peptide cleavage fragments by mass spectrometry is that the huge number of peptides generated overwhelms direct mass spectral analysis. The objective of the liquid chromatography approach to proteomics is to fractionate peptide mixtures so as to enable and maximize identification and quantification of the component peptides by mass spectrometry. This review focuses on existing multidimensional liquid chromatographic (MDLC) platforms developed for proteomics and their application in combination with other techniques such as stable isotope labeling. We also provide some perspectives on likely future developments.
multi-dimensional liquid chromatography; stable isotope labeling; label free; proteomics
Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. The peptide fragmentation spectra generated by these workflows exhibit characteristic fragmentation patterns that can be used to identify the peptide. In other fields, where the compounds of interest do not have the convenient linear structure of peptides, fragmentation spectra are identified by comparing new spectra with libraries of identified spectra, an approach called spectral matching. In contrast to sequence-based tandem mass spectrometry search engines used for peptides, spectral matching can make use of the intensities of fragment peaks in library spectra to assess the quality of a match. We evaluate a hidden Markov model approach (HMMatch) to spectral matching, in which many examples of a peptide's fragmentation spectrum are summarized in a generative probabilistic model that captures the consensus and variation of each peak's intensity. We demonstrate that HMMatch has good specificity and superior sensitivity compared to sequence database search engines such as X!Tandem. HMMatch achieves good results from relatively few training spectra, is fast to train, and can evaluate many spectra per second. A statistical significance model permits HMMatch scores to be compared with each other, and with other peptide identification tools, on a unified scale. HMMatch agrees with X!Tandem, Mascot, and NIST's MS Search to about the same degree as those tools agree with each other, suggesting that each tool can assign peptides to spectra that the others miss. Finally, we show that it is possible to extrapolate HMMatch models beyond a single peptide's training spectra to the spectra of related peptides, expanding the application of spectral matching techniques beyond the set of peptides previously observed.
computational molecular biology; mass spectroscopy; HMM; peptide identification; algorithms
A goal of proteomics is to distinguish between states of a biological system by identifying protein expression differences. Liu et al. demonstrated a method to perform semi-relative protein quantitation in shotgun proteomics data by correlating the number of tandem mass spectra obtained for each protein, or "spectral count", with its abundance in a mixture; however, two issues have remained open: how to normalize spectral counting data and how to efficiently pinpoint differences between profiles. Moreover, Chen et al. recently showed how to increase the number of identified proteins in shotgun proteomics by analyzing samples with different MS-compatible detergents while performing proteolytic digestion. The latter approach introduced new data analysis challenges, since replicate readings are not acquired.
To address the open issues above, we present a program termed PatternLab for proteomics. This program implements existing strategies and adds two new methods to pinpoint differences in protein profiles. The first method, ACFold, addresses experiments with fewer than three replicates from each state or with assays acquired by different protocols, as described by Chen et al. ACFold uses a combined criterion based on expression fold changes, the AC test, and the false-discovery rate, and can supply a "bird's-eye view" of differentially expressed proteins. The other method addresses experimental designs having multiple readings from each state and is referred to as nSVM (natural support vector machine) because of its roots in evolutionary computing and in statistical learning theory. Our observations suggest that nSVM's niche comprises projects that select a minimum set of proteins for classification purposes, for example, the development of an early detection kit for a given pathology. We demonstrate the effectiveness of each method on experimental data and compare them with existing strategies.
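The AC test mentioned above is usually the Audic and Claverie test. It gives the probability of observing y counts for a protein in one sample, given x counts in the other, with total counts N1 and N2:

p(y|x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1))

The minimal Python sketch below implements that formula and its upper tail. It does not reproduce how ACFold combines the test with fold change and the false-discovery rate, or the thresholds ACFold uses. The function names are hypothetical.

```python
import math


def ac_probability(x: int, y: int, n1: float, n2: float) -> float:
    """Audic-Claverie probability of y counts in sample 2 given x counts in sample 1."""
    r = n2 / n1  # ratio of total spectral counts between the two samples
    log_p = (y * math.log(r)
             + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
             - (x + y + 1) * math.log(1 + r))
    return math.exp(log_p)


def ac_upper_tail(x: int, y: int, n1: float, n2: float) -> float:
    """One-sided p-value P(Y >= y | x) for an increase in sample 2."""
    # p(y|x) sums to 1 over y >= 0, so take the complement of the finite lower tail.
    # (For very small p-values a direct tail sum would be more precise.)
    lower = sum(ac_probability(x, k, n1, n2) for k in range(y))
    return max(0.0, 1.0 - lower)


print(ac_upper_tail(2, 12, 10000, 10000))  # small: 12 spectra vs 2 at equal depth
```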
PatternLab offers easy, unified access to a variety of feature selection and normalization strategies, each with its own niche. Additionally, graphing tools are available to aid in the analysis of high-throughput experimental data. PatternLab is available at .
To study differential protein expression in complex biological samples, strategies for rapid, highly reproducible and accurate quantification are necessary. Isotope labeling and fluorescent labeling techniques have been widely used in quantitative proteomics research. However, researchers are increasingly turning to label-free shotgun proteomics techniques for faster, cleaner, and simpler results. Mass spectrometry-based label-free quantitative proteomics falls into two general categories. The first measures changes in chromatographic ion intensity, such as peptide peak areas or peak heights. The second is based on spectral counting of identified proteins. In this paper, we discuss the technologies behind these label-free quantitative methods, the associated statistics, available computational software, and their applications in complex proteomics studies.
Peptide identification via tandem mass spectrometry is the basic task of current proteomics research. Owing to the complexity of mass spectra, the majority of them cannot be interpreted at present, and a major reason is the presence of unexpected or unknown protein post-translational modifications.
This paper describes an efficient and sequence database-independent approach to detecting abundant post-translational modifications in high-accuracy peptide mass spectra. The approach is based on the observation that the spectra of a modified peptide and its unmodified counterpart are correlated with each other in their peptide masses and retention time. Frequently occurring peptide mass differences in a data set imply possible modifications, while small and consistent retention time differences provide orthogonal supporting evidence. We propose to use a bivariate Gaussian mixture model to discriminate modification-related spectral pairs from random ones. Due to the use of two-dimensional information, accurate modification masses and confident spectral pairs can be determined as well as the quantitative influences of modifications on peptide retention time.
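The first idea above is that frequently recurring mass differences between spectrum pairs with similar retention times point to candidate modifications. The Python sketch below illustrates that idea under simplifying assumptions. It bins pairwise precursor mass differences and keeps only pairs within a fixed retention-time window.

It does **not** fit the bivariate Gaussian mixture model the authors describe; the hard retention-time cutoff stands in for that model. The bin width, window, and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def frequent_mass_shifts(spectra: List[Tuple[float, float]],
                         bin_width: float = 0.01,
                         max_rt_shift: float = 5.0,
                         top: int = 5) -> List[Tuple[float, int]]:
    """Most frequent precursor mass differences among spectrum pairs.

    spectra: (peptide mass in Da, retention time in minutes) per spectrum.
    """
    counts: Counter = Counter()
    for (m1, rt1), (m2, rt2) in combinations(spectra, 2):
        if abs(rt1 - rt2) > max_rt_shift:
            continue  # crude stand-in for the 2-D model: drop pairs far apart in RT
        delta = abs(m1 - m2)
        if delta < bin_width:
            continue  # near-identical masses are likely the same peptide
        counts[round(delta / bin_width)] += 1
    return [(b * bin_width, n) for b, n in counts.most_common(top)]


# Two peptides, each seen with and without a +79.966 Da shift at nearby retention times
pairs = [(1000.0, 10.0), (1079.966, 11.0), (1500.0, 20.0), (1579.966, 21.0)]
print(frequent_mass_shifts(pairs))
```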
Experiments on two glycoprotein data sets demonstrate that our method can effectively detect abundant modifications and spectral pairs. By including the discovered modifications in the database search or by propagating peptide assignments between paired spectra, an average of 10% more spectra are interpreted.
We describe Abacus, a computational tool for extracting spectral counts from tandem mass spectrometry based proteomic datasets. The program aggregates data from multiple experiments, adjusts spectral counts to accurately account for peptides shared across multiple proteins, and performs common normalization steps. It can also output the spectral count data at the gene level, thus simplifying the integration and comparison between gene and protein expression data. Abacus is compatible with the widely used Trans-Proteomic Pipeline suite of tools and comes with a graphical user interface making it easy to interact with the program. The main aim of Abacus is to streamline the analysis of spectral count data by providing an automated, easy to use solution for extracting this information from proteomic datasets for subsequent, more sophisticated statistical analysis.
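The shared-peptide adjustment and normalization described above can be sketched in Python. The sketch assumes one common scheme that is **not** taken from Abacus itself:

- Each shared spectral count is split among its proteins in proportion to their unique counts, with an even split when none has unique evidence.
- The adjusted counts are then converted to a normalized spectral abundance factor (NSAF): count divided by protein length, normalized to sum to 1.

Abacus's exact formulas and output columns are not reproduced here, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def adjusted_spectral_counts(unique: Dict[str, int],
                             shared: List[Tuple[List[str], int]]) -> Dict[str, float]:
    """Add shared-peptide spectra to proteins in proportion to their unique spectra."""
    adjusted = {p: float(c) for p, c in unique.items()}
    for proteins, count in shared:
        weights = [unique.get(p, 0) for p in proteins]
        total = sum(weights)
        for p, w in zip(proteins, weights):
            # Even split when no protein in the group has unique evidence
            share = w / total if total > 0 else 1.0 / len(proteins)
            adjusted[p] = adjusted.get(p, 0.0) + count * share
    return adjusted


def nsaf(counts: Dict[str, float], lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized spectral abundance factor: (count / length) scaled to sum to 1."""
    saf = {p: c / lengths[p] for p, c in counts.items()}
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}


counts = adjusted_spectral_counts({"A": 3, "B": 1}, [(["A", "B"], 4)])
print(counts, nsaf(counts, {"A": 100, "B": 50}))
```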
Label free quantification; spectral counts; software; tandem mass spectrometry; protein inference; shared peptides
Quantification of protein expression by means of mass spectrometry (MS) has been introduced in various proteomics studies. In particular, two label-free quantification methods, spectral counting and spectral feature analysis, have been extensively investigated in a wide variety of proteomic studies. The cornerstone of both methods is peptide identification based on a proteomic database search and subsequent estimation of peptide retention time. However, they often suffer from restrictive database searches and inaccurate estimation of the liquid chromatography (LC) retention time. Furthermore, conventional peptide identification methods based on database or spectral library search algorithms, such as SEQUEST or SpectraST, have been found to provide neither the best match nor high-scored matches. Lastly, these methods are limited in that target peptides cannot be identified unless they have previously been generated and stored in the database or spectral libraries.
To overcome these limitations, we propose a novel method, namely Quantification method based on Finding the Identical Spectral set for a Homogenous peptide (Q-FISH) to estimate the peptide's abundance from its tandem mass spectrometry (MS/MS) spectra through the direct comparison of experimental spectra. Intuitively, our Q-FISH method compares all possible pairs of experimental spectra in order to identify both known and novel proteins, significantly enhancing identification accuracy by grouping replicated spectra from the same peptide targets.
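The core step described above is comparing all pairs of experimental spectra and grouping replicate spectra of the same peptide. It can be sketched in Python under assumptions of our own:

- peaks binned at 1 Da;
- cosine similarity as the spectrum-to-spectrum score;
- single-linkage grouping (union-find) above a fixed threshold.

Q-FISH's actual similarity measure and clustering criterion are not given in this abstract, and the names and threshold below are hypothetical. Counting a cluster's members per sample would then give a spectral count per peptide.

```python
import math
from typing import Dict, List, Sequence, Tuple

Spectrum = Sequence[Tuple[float, float]]  # (m/z, intensity) peaks


def binned_vector(peaks: Spectrum, bin_width: float = 1.0) -> Dict[int, float]:
    vec: Dict[int, float] = {}
    for mz, intensity in peaks:
        b = int(mz // bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    return vec


def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_spectra(spectra: List[Spectrum], threshold: float = 0.8,
                    bin_width: float = 1.0) -> List[List[int]]:
    """Single-linkage clusters of spectra whose pairwise cosine >= threshold."""
    vectors = [binned_vector(s, bin_width) for s in spectra]
    parent = list(range(len(spectra)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):  # all pairwise comparisons
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)
    clusters: Dict[int, List[int]] = {}
    for i in range(len(spectra)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```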
We applied Q-FISH to Nano-LC-MS/MS data obtained from human hepatocellular carcinoma (HCC) and normal liver tissue samples to identify differentially expressed peptides between the normal and disease samples. For a total of 44,318 spectra obtained through MS/MS analysis, Q-FISH yielded 14,747 clusters. Among these, 5,777 clusters were identified only in the HCC sample, 6,648 clusters only in the normal tissue sample, and 2,323 clusters in both the HCC and normal tissue samples. Although peptide clusters found in only one sample merit investigation, we further examined the spectral clusters identified in both the HCC and normal samples, since our goal is to identify and quantitatively assess differentially expressed peptides. The next step was a beta-binomial test to isolate differentially expressed peptides between the HCC and normal tissue samples. This test yielded 84 peptides with significantly different spectral counts between the HCC and normal tissue samples. We independently identified 50 and 95 peptides by SEQUEST, of which 24 and 56 peptides, respectively, were found to be known biomarkers for human liver cancer. Comparing the Q-FISH and SEQUEST results, we found that 22 of the 84 differentially expressed peptides found by Q-FISH were also identified by SEQUEST. Remarkably, of these 22 peptides discovered by both Q-FISH and SEQUEST, 13 are known for human liver cancer and the remaining 9 are known to be associated with other cancers.
We proposed a novel statistical method, Q-FISH, for accurately identifying protein species and simultaneously quantifying the expression levels of identified peptides from mass spectrometry data. Q-FISH analysis of human HCC and normal liver tissue samples identified many protein biomarkers that are highly relevant to HCC. Q-FISH can be a useful tool for both peptide identification and quantification in mass spectrometry data analysis. It may also prove more effective than SEQUEST and other standard methods in discovering novel protein biomarkers.
The fission yeast Schizosaccharomyces pombe is a widely used model organism to study basic mechanisms of eukaryotic biology, but unlike other model organisms, its proteome remains largely uncharacterized. Using a shotgun proteomics approach based on multidimensional prefractionation and tandem mass spectrometry, we have detected ∼30% of the theoretical fission yeast proteome. Applying statistical modelling to normalize spectral counts to the number of predicted tryptic peptides, we have performed label-free quantification of 1465 proteins. The fission yeast protein data showed considerable correlations with mRNA levels and with the abundance of orthologous proteins in budding yeast. Functional pathway analysis indicated that the mRNA–protein correlation is strong for proteins involved in signalling and metabolic processes, but increasingly discordant for components of protein complexes, which clustered in groups with similar mRNA–protein ratios. Self-organizing map clustering of large-scale protein and mRNA data from fission and budding yeast revealed coordinate but not always concordant expression of components of functional pathways and protein complexes. This finding reaffirms at the protein level the considerable divergence in gene expression patterns of the two model organisms that was noticed in previous transcriptomic studies.
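Normalizing spectral counts to the number of predicted tryptic peptides can be illustrated with a minimal Python sketch. The sketch makes several assumptions:

- Trypsin cleaves after K or R unless the next residue is P.
- There are no missed cleavages.
- Only peptides of 6 to 30 residues count as observable.
- The count is simply divided by that number.

The statistical model actually used in the study is not reproduced, and the length limits and function names are hypothetical.

```python
import re
from typing import Dict, List


def tryptic_peptides(sequence: str, min_len: int = 6, max_len: int = 30) -> List[str]:
    """In-silico trypsin digest: cleave after K/R unless followed by P."""
    # Zero-width split (Python 3.7+): position after K or R, not followed by P
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if min_len <= len(p) <= max_len]


def peptide_normalized_counts(counts: Dict[str, int],
                              sequences: Dict[str, str]) -> Dict[str, float]:
    """Spectral count divided by the number of predicted observable tryptic peptides."""
    result = {}
    for protein, count in counts.items():
        n = len(tryptic_peptides(sequences[protein]))
        result[protein] = count / n if n else float("nan")
    return result


print(tryptic_peptides("AAAAAAKGGGGGGRPSSSSSSRLLLLLL"))
```

The R followed by P is not cleaved, so the middle peptide keeps both segments.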
fission yeast; LC-MS/MS; mRNA–protein correlation; relative protein quantification; protein profiling
We report the performance of capillary zone electrophoresis coupled with an electrokinetically pumped electrospray interface and an Orbitrap-Velos mass spectrometer for high-sensitivity protein analysis. We first investigated the system for quantitation of the tryptic digest of bovine serum albumin (BSA). The system produced outstanding linearity with respect to peak height, number of peptide IDs, and spectral counts across the range of 12 nM to 750 nM (60 amol to 3.5 fmol) of BSA injected. One peptide produced a detection limit of 0.3 nM (1.5 amol) injected. We also analyzed 700 pg of a tryptic digest prepared from a RAW264.7 cell lysate; 10 proteins were identified in triplicate analyses after filtering the data at a high peptide confidence level. This sample size corresponds to the protein content of ~10 eukaryotic cells.
Electrokinetically driven sheath flow interface; CZE-ESI-MS/MS; Protein digests
The in vitro stationary phase proteome of the human pathogen Shigella dysenteriae serotype 1 (SD1) was quantitatively analyzed in Coomassie Blue G250 (CBB)-stained 2D gels. More than 450 proteins were identified, 271 of which were associated with distinct gel spots. In parallel, we employed 2D-LC-MS/MS followed by the label-free computationally modified spectral counting method APEX for absolute protein expression measurements. Of the 4502 genome-predicted SD1 proteins, 1148 were identified at a false discovery rate of 5% and quantitated using 2D-LC-MS/MS and APEX. The dynamic range of the APEX method was approximately one order of magnitude higher than that of CBB-stained spot intensity quantitation. A squared Pearson correlation analysis revealed a reasonably good correlation (R² = 0.67) for protein quantities surveyed by both methods. The correlation was lower for protein subsets with specific physicochemical properties, such as low Mr values and high hydropathy scores. Stoichiometric ratios of subunits of protein complexes characterized in E. coli were compared with APEX quantitative ratios of orthologous SD1 protein complexes. A high correlation was observed for subunits of soluble cellular protein complexes in several cases, demonstrating versatile applications of the APEX method in quantitative proteomics.
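The squared Pearson correlation (R²) used above to compare APEX values with spot intensities can be computed as in the short Python sketch below. Log-transforming both measures first is our assumption, a common choice for abundances spanning orders of magnitude; the abstract does not say whether the values were transformed. The function name is hypothetical.

```python
import math
from typing import Sequence


def squared_pearson(x: Sequence[float], y: Sequence[float], log10: bool = True) -> float:
    """R^2 between two abundance measures, optionally on a log10 scale."""
    if log10:
        x = [math.log10(v) for v in x]
        y = [math.log10(v) for v in y]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```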
Motivation: Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but de novo peptide sequencing algorithms to analyze tandem mass (MS/MS) spectra are lagging behind. Although existing de novo sequencing tools perform well on certain types of spectra [e.g. Collision Induced Dissociation (CID) spectra of tryptic peptides], their performance often deteriorates on other types of spectra, such as Electron Transfer Dissociation (ETD), Higher-energy Collisional Dissociation (HCD) spectra or spectra of non-tryptic digests. Thus, rather than developing a new algorithm for each spectrum type, we develop a universal de novo sequencing algorithm called UniNovo that works well for all types of spectra and even for spectral pairs (e.g. CID/ETD spectral pairs). UniNovo uses an improved scoring function that captures the dependencies between different ion types, where such dependencies are learned automatically using a modified offset frequency function.
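For orientation, the Python sketch below shows the classic offset frequency function, not UniNovo's modified version. For every annotated spectrum and every N-terminal prefix residue mass, it histograms the offsets between peak m/z values and that prefix mass. Peaks of a given ion type pile up at a characteristic offset, for example about +1.007 for singly charged b ions.

It assumes singly charged fragments and ignores intensities. It does not model the dependencies between ion types that UniNovo learns. The window, bin width, and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def prefix_masses(peptide: str) -> List[float]:
    """Residue masses of all proper N-terminal prefixes of the peptide."""
    masses, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def offset_frequencies(annotated: Iterable[Tuple[str, List[float]]],
                       window: float = 40.0, bin_width: float = 0.5) -> Counter:
    """Histogram of (peak m/z - prefix mass) over annotated spectra."""
    counts: Counter = Counter()
    for peptide, peak_mzs in annotated:
        for prefix in prefix_masses(peptide):
            for mz in peak_mzs:
                offset = mz - prefix
                if abs(offset) <= window:
                    counts[round(offset / bin_width) * bin_width] += 1
    return counts


# b1 and b2 ions of GAK both fall near offset +1.0
print(offset_frequencies([("GAK", [58.029, 129.066])]))
```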
Results: The performance of UniNovo is compared with PepNovo+, PEAKS and pNovo using various types of spectra. The results show that UniNovo outperforms the other tools on ETD spectra and performs better than or comparably to them on CID and HCD spectra. UniNovo also estimates the probability that each reported reconstruction is correct, using simple statistics that are readily obtained from a small training dataset. We demonstrate that the estimation is accurate for all tested types of spectra (including CID, HCD, ETD, CID/ETD and HCD/ETD spectra of trypsin, LysC or AspN digested peptides).
Availability: UniNovo is implemented in Java and tested on Windows, Ubuntu and OS X machines. UniNovo is available at http://proteomics.ucsd.edu/Software/UniNovo.html along with the manual.
Supplementary data are available at Bioinformatics online.
Spectral counting, a promising method for quantifying relative changes in protein abundance in mass spectrometry-based proteomic analysis, was compared to metabolic stable isotope labeling using 15N/14N "heavy/light" peptide pairs. The data were drawn primarily from a Methanococcus maripaludis experiment comparing a wild-type strain with a mutant deficient in a key enzyme of energy metabolism. The dataset contained both proteome and transcriptome measurements. The normalization technique used previously for the isotopic measurements was inappropriate for spectral counting, but a simple adjustment for sampling frequency was sufficient for normalization. This adjustment was satisfactory both for M. maripaludis, an organism that showed relatively little expression change between the wild-type and mutant strains, and for Porphyromonas gingivalis, an intracellular pathogen that shows widespread expression changes between intracellular and extracellular conditions. Spectral counting showed lower overall sensitivity, defined in terms of detecting a two-fold change in protein expression; to achieve the same level of quantitative proteome coverage as the stable isotope method, it would have required approximately twice as many mass spectra.
The analysis of tandem mass (MS/MS) data to identify and quantify proteins is hampered by the heterogeneity of file formats at the raw spectral data, peptide identification, and protein identification levels. Different mass spectrometers output their raw spectral data in a variety of proprietary formats, and the various methods that assign peptides to MS/MS spectra and infer protein identifications from those assignments each write their results in different formats. Here we describe an MS/MS analysis platform, the Trans-Proteomic Pipeline, which uses open XML file formats to store data at the raw spectral data, peptide, and protein levels. This platform enables uniform analysis and exchange of MS/MS data generated on a variety of instruments, with peptides assigned by a variety of database search programs. We demonstrate this by applying the pipeline to data sets generated by ThermoFinnigan LCQ, ABI 4700 MALDI-TOF/TOF, and Waters Q-TOF instruments, and searched in turn using SEQUEST, Mascot, and COMET.
analysis platform; open XML; proteomics
A key problem in computational proteomics is distinguishing between correct and false peptide identifications. We argue that evaluating the error rates of peptide identifications is not unlike computing generating functions in combinatorics. We show that the generating functions and their derivatives (spectral energy and spectral probability) represent new features of tandem mass spectra that, similarly to Δ-scores, significantly improve peptide identifications. Furthermore, the spectral probability provides a rigorous solution to the problem of computing statistical significance of spectral identifications. The spectral energy/probability approach improves the sensitivity-specificity trade-off of existing MS/MS search tools, addresses the notoriously difficult problem of “one-hit-wonders” in mass spectrometry, and often eliminates the need for decoy database searches. We therefore argue that the generating function approach has the potential to increase the number of peptide identifications in MS/MS searches.
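The generating-function idea above can be made concrete with a small dynamic program. It computes, for a precursor mass M and a score threshold t, the total probability of all amino-acid strings of mass M whose score reaches t. This is the spectral probability.

The Python sketch below rests on simplifying assumptions, and its function names are hypothetical:

- Integer (nominal) residue masses.
- Each of the 20 amino acids has probability 1/20.
- Integer scores taken from a spectral vector S[m], added at every prefix mass m the string passes through.

It illustrates the technique and is not the authors' implementation.

```python
from typing import Dict, List, Sequence

# Nominal residue masses of the 20 amino acids (L/I and K/Q share masses)
NOMINAL_MASSES = [57, 71, 87, 97, 99, 101, 103, 113, 113, 114,
                  115, 128, 128, 129, 131, 137, 147, 156, 163, 186]


def score_distribution(spectral_vector: Sequence[int], precursor_mass: int) -> Dict[int, float]:
    """Map score -> probability over random strings of total mass precursor_mass."""
    # table[m][s] = probability that a random string has prefix mass m and score s
    table: List[Dict[int, float]] = [dict() for _ in range(precursor_mass + 1)]
    table[0][0] = 1.0
    weight = 1.0 / len(NOMINAL_MASSES)
    for m in range(1, precursor_mass + 1):
        cell = table[m]
        for aa_mass in NOMINAL_MASSES:
            prev = m - aa_mass
            if prev < 0:
                continue
            for score, prob in table[prev].items():
                new_score = score + spectral_vector[m]
                cell[new_score] = cell.get(new_score, 0.0) + prob * weight
    return table[precursor_mass]


def spectral_probability(spectral_vector: Sequence[int], precursor_mass: int,
                         threshold: int) -> float:
    """Probability that a random string of this mass scores at least `threshold`."""
    dist = score_distribution(spectral_vector, precursor_mass)
    return sum(p for s, p in dist.items() if s >= threshold)
```

With an all-zero vector and M = 114, only "GG" and "N" reach the precursor mass, giving 1/400 + 1/20 = 0.0525.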
For proteomic analysis using tandem mass spectrometry, linear ion trap instruments provide unsurpassed sensitivity but detect low-mass peptide fragments unreliably, precluding their use with iTRAQ reagent-labeled samples. While the popular LTQ linear ion trap supports analyzing iTRAQ reagent-labeled peptides via pulsed Q dissociation (PQD), its effectiveness remains questionable. Using a standard mixture, we found that careful tuning of the relative collision energy is necessary for fragmenting iTRAQ reagent-labeled peptides, and that increasing microscan acquisition and repeat count improves quantification but identifies somewhat fewer peptides. We developed software that calculates abundance ratios by summing reporter ion intensities across all spectra matched to each protein, which maximizes accuracy. In testing, results obtained with optimized LTQ-PQD settings plus our software corresponded closely to those obtained with a Qstar instrument. Thus, we demonstrate the effectiveness of LTQ-PQD for analyzing iTRAQ reagent-labeled peptides and provide guidelines for successful quantitative proteomic studies.
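The summing strategy described above is easy to sketch in Python. Reporter intensities from every spectrum matched to a protein are added per channel, and the ratio is taken once at the protein level. The input layout, channel labels, and function name are hypothetical and are not the authors' software.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


def protein_ratios(psms: List[Tuple[str, Dict[str, float]]],
                   numerator: str, denominator: str) -> Dict[str, float]:
    """Protein-level reporter ratio from intensities summed over all matched spectra.

    psms: (protein id, {reporter channel: intensity}) for each matched spectrum.
    """
    totals: DefaultDict[str, DefaultDict[str, float]] = defaultdict(lambda: defaultdict(float))
    for protein, reporters in psms:
        for channel, intensity in reporters.items():
            totals[protein][channel] += intensity
    ratios = {}
    for protein, channels in totals.items():
        den = channels.get(denominator, 0.0)
        ratios[protein] = channels.get(numerator, 0.0) / den if den > 0 else float("nan")
    return ratios


psms = [("P1", {"114": 100.0, "115": 200.0}),
        ("P1", {"114": 50.0, "115": 100.0}),
        ("P2", {"114": 10.0, "115": 5.0})]
print(protein_ratios(psms, numerator="115", denominator="114"))
```

Summing before dividing lets intense, better-measured spectra dominate the ratio. Averaging per-spectrum ratios would weight noisy, low-intensity spectra equally.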
quantitative proteomics; iTRAQ; linear ion trap; pulsed-Q-dissociation
Shotgun proteomics provides the most powerful analytical platform for global inventory of complex proteomes using liquid chromatography−tandem mass spectrometry (LC−MS/MS) and allows a global analysis of protein changes. Nevertheless, sampling of complex proteomes by current shotgun proteomics platforms is incomplete, and this contributes to variability in assessment of peptide and protein inventories by spectral counting approaches. Thus, shotgun proteomics data pose challenges in comparing proteomes from different biological states. We developed an analysis strategy using quasi-likelihood Generalized Linear Modeling (GLM), included in a graphical interface software package (QuasiTel) that reads standard output from protein assemblies created by IDPicker, an HTML-based user interface to query shotgun proteomic data sets. This approach was compared to four other statistical analysis strategies: Student's t test, Wilcoxon rank test, Fisher's Exact test, and Poisson-based GLM. We analyzed the performance of these tests to identify differences in protein levels based on spectral counts in a shotgun data set in which equimolar amounts of 48 human proteins were spiked at different levels into whole yeast lysates. Both GLM approaches and the Fisher Exact test performed adequately, each with its own limitations. We subsequently compared the proteomes of normal tonsil epithelium and head and neck squamous cell carcinoma (HNSCC) using this approach and identified 86 proteins with differential spectral counts between normal tonsil epithelium and HNSCC. We selected 18 proteins from this comparison for verification of protein levels between the individual normal and tumor tissues using liquid chromatography−multiple reaction monitoring mass spectrometry (LC−MRM-MS). This analysis confirmed the magnitude and direction of the protein expression differences in all 6 proteins for which reliable data could be obtained.
Our analysis demonstrates that shotgun proteomic data sets from different tissue phenotypes are sufficiently rich in quantitative information and that statistically significant differences in protein spectral counts reflect the underlying biology of the samples.
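For a single protein and two groups, a quasi-likelihood Poisson GLM has a closed form. The model uses a log link, a group indicator, and an offset of log(total spectral count per run). The fitted rate in each group is the group's summed counts divided by its summed totals. The log rate ratio has Poisson variance 1/Σy_A + 1/Σy_B, and the quasi-likelihood step inflates that variance by the Pearson dispersion estimate.

The Python sketch below follows that derivation. It is not QuasiTel's code. The normal-approximation p-value and the absence of a dispersion floor are our assumptions; many implementations floor the dispersion at 1. The function name is hypothetical, and groups with zero total counts are not handled.

```python
import math
from typing import Dict, Sequence


def quasi_poisson_two_group(counts_a: Sequence[float], totals_a: Sequence[float],
                            counts_b: Sequence[float], totals_b: Sequence[float]) -> Dict[str, float]:
    """Two-group Poisson GLM with offset log(total) and Pearson-estimated dispersion."""
    sum_a, sum_b = sum(counts_a), sum(counts_b)
    rate_a = sum_a / sum(totals_a)  # MLE of the per-spectrum rate in group A
    rate_b = sum_b / sum(totals_b)
    log_ratio = math.log(rate_b / rate_a)
    # Pearson chi-square of the fitted means, divided by residual degrees of freedom
    pearson = sum((y - rate_a * t) ** 2 / (rate_a * t) for y, t in zip(counts_a, totals_a))
    pearson += sum((y - rate_b * t) ** 2 / (rate_b * t) for y, t in zip(counts_b, totals_b))
    dispersion = pearson / (len(counts_a) + len(counts_b) - 2)
    se = math.sqrt(dispersion * (1.0 / sum_a + 1.0 / sum_b))
    z = log_ratio / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided, normal approximation
    return {"log_ratio": log_ratio, "dispersion": dispersion, "z": z, "p_value": p_value}


print(quasi_poisson_two_group([10, 12], [1000, 1000], [20, 24], [1000, 1000]))
```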
Shotgun proteomics provides the most powerful analytical platform for global inventory of complex proteomes, but incomplete sampling poses challenges in comparing protein inventories by spectral counting approaches. We developed a statistical method based on quasi-likelihood modeling and demonstrate that it compares favorably to other statistical tests. Statistically significant spectral count differences were confirmed by MRM, demonstrating that the observed protein level differences reflect the underlying biology of the samples.
LC−MS/MS; shotgun proteomics; multiple reaction monitoring (MRM); head and neck carcinoma; Generalized Linear Model; spectral counting
Spectral counting is a strategy to quantitate relative protein concentrations in pre-digested protein mixtures analyzed by liquid chromatography online with tandem mass spectrometry. In this work we applied combinations of normalization and statistical (feature selection) methods to spectral counting data to verify whether we could pinpoint which and how many proteins were differentially expressed when comparing complex protein mixtures. These combinations were evaluated on real but controlled, and therefore verifiable, experiments in which protein markers were spiked into yeast lysates at different concentrations to simulate differences. The following normalization methods were applied: total signal, Z-normalization, hybrid normalization, and log preprocessing. The feature selection methods were: Golub's index, Student's t-test, a strategy based on the weighting used in a support vector machine model (SVM-F), and support vector machine recursive feature elimination. The results showed that Z-normalization combined with SVM-F correctly identified which and how many protein markers were added to the yeast lysates for all concentrations tested. The software we used is available at http://pcarvalho.com/patternlab.
MudPIT; feature selection; SVM; spectral counting; feature ranking
The field of proteomics involves the characterization of the peptides and proteins expressed in a cell under specific conditions. Proteomics has made rapid advances in recent years following the sequencing of the genomes of an increasing number of organisms. A prominent technology for high throughput proteomics analysis is the use of liquid chromatography coupled to Fourier transform ion cyclotron resonance mass spectrometry (LC-FTICR-MS). Meaningful biological conclusions can best be made when the peptide identities returned by this technique are accompanied by measures of accuracy and confidence.
After a tryptically digested protein mixture is analyzed by LC-FTICR-MS, the observed masses and normalized elution times of the detected features are statistically matched to the theoretical masses and elution times of known peptides listed in a large database. The probability of matching is estimated for each peptide in the reference database using statistical classification methods assuming bivariate Gaussian probability distributions on the uncertainties in the masses and the normalized elution times.
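The matching step above can be sketched in Python for one detected feature. Each candidate peptide receives its prior times a bivariate Gaussian likelihood of the mass and elution-time errors, and the weights are normalized over the candidates.

The sketch makes several assumptions:

- Mass error is measured in Da.
- The standard deviations and correlation are supplied by the caller.
- The candidate list is treated as exhaustive, with no separate "no match" option.

These choices, and the function name, are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def match_probabilities(feature: Tuple[float, float],
                        candidates: List[Tuple[str, float, float, float]],
                        sigma_mass: float, sigma_net: float,
                        rho: float = 0.0) -> Dict[str, float]:
    """Posterior match probability of one feature against candidate peptides.

    feature: (observed mass in Da, observed normalized elution time)
    candidates: (peptide id, theoretical mass, predicted NET, prior probability)
    """
    obs_mass, obs_net = feature
    weights = {}
    for peptide, mass, net, prior in candidates:
        dx = (obs_mass - mass) / sigma_mass  # standardized mass error
        dy = (obs_net - net) / sigma_net     # standardized elution-time error
        q = (dx * dx - 2 * rho * dx * dy + dy * dy) / (1 - rho * rho)
        # The Gaussian normalizing constant is identical for every candidate
        # and cancels in the normalization below, so it is omitted.
        weights[peptide] = prior * math.exp(-0.5 * q)
    total = sum(weights.values())
    if total == 0.0:
        return {peptide: 0.0 for peptide in weights}
    return {peptide: w / total for peptide, w in weights.items()}
```

Raising the prior of peptides from expected proteins shifts probability toward them when two candidates fit the data similarly well.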
A database of 69,220 features from 32 LC-FTICR-MS analyses of a tryptically digested bovine serum albumin (BSA) sample was matched to a database populated with 97% false positive peptides. The percentage of high confidence identifications was found to be consistent with other database search procedures. BSA database peptides were identified with high confidence on average in 14.1 of the 32 analyses. False positives were identified on average in just 2.7 analyses.
Using a priori probabilities that distinguish peptides from expected and unexpected proteins performed better in identifying target peptides than using equal a priori probabilities. This is because a large percentage of the target peptides were similar to unexpected peptides that had been included as false positives. The use of triplicate analyses with a "2 out of 3" reporting rule was shown to reject false positives very effectively.
Differential analysis of whole cell proteomes by mass spectrometry has largely been applied using various forms of stable isotope labeling. While metabolic stable isotope labeling has been the method of choice, it is often not possible to apply such an approach. Four different label free ways of calculating expression ratios in a classic “two-state” experiment are compared: signal intensity at the peptide level, signal intensity at the protein level, spectral counting at the peptide level, and spectral counting at the protein level. The quantitative data were mined from a dataset of 1245 qualitatively identified proteins, about 56% of the protein encoding open reading frames from Porphyromonas gingivalis, a Gram-negative intracellular pathogen being studied under extracellular and intracellular conditions. Two different control populations were compared against P. gingivalis internalized within a model human target cell line. The q-value statistic, a measure of false discovery rate previously applied to transcription microarrays, was applied to proteomics data. For spectral counting, the most logically consistent estimate of random error came from applying the locally weighted scatter plot smoothing procedure (LOWESS) to the most extreme ratios generated from a control technical replicate, thus setting upper and lower bounds for the region of experimentally observed random error.
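The q-value step mentioned above can be sketched in Python. Each p-value is scaled by π0 · m / rank, and a running minimum is taken from the largest p-value downward. This sketch assumes π0 is given, with a default of 1, which reduces to Benjamini-Hochberg adjusted p-values. Storey's estimation of π0 and the LOWESS-based error model from the study are not reproduced. The function name is hypothetical.

```python
from typing import List, Sequence


def q_values(p_values: Sequence[float], pi0: float = 1.0) -> List[float]:
    """q-value for each p-value, given an assumed proportion of true nulls pi0."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in p
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * p_values[i] * m / rank)
        q[i] = running_min
    return q
```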
spectral count; Porphyromonas gingivalis; q-value; quantitative proteomics; G test
One of the most popular methods to prepare tryptic peptides for bottom-up proteomic analysis is in-gel digestion. To date, there have been few studies comparing various digestion methods. In this study, we compare the efficiency of several popular in-gel digestion methods, along with new technologies that may improve digestion efficiency, using a human epidermoid carcinoma cell lysate protein standard. The efficiency of each protocol was based on the average number of proteins identified and their respective sequence coverage and relative quantitation using spectral counting. The importance of this study lies in its comparison of pre-existing in-gel digestion methods with those that use newly developed technologies that may introduce the potential for a more cost-effective digestion, higher protein yield, and an overall reduction in processing time. The following four protocols were compared: an overnight in-gel digestion protocol; an overnight in-gel digestion protocol, in which we remove the vacuum centrifugation steps; in-gel digestion in a barometric pressure cycler; and in-gel digestion in a scientific microwave. Several variables were tested for increased digestion efficiency and decreased keratin contamination. Statistical analysis was performed on replicate samples to determine significant differences between protocols.
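Sequence coverage, one of the efficiency measures above, is the percentage of a protein's residues covered by at least one identified peptide. The minimal Python sketch below assumes exact substring matching of peptide sequences and is not the study's software. The function name is hypothetical.

```python
from typing import Iterable


def sequence_coverage(protein: str, peptides: Iterable[str]) -> float:
    """Percent of residues in `protein` covered by any identified peptide."""
    covered = [False] * len(protein)
    for pep in set(peptides):
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)
```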
mass spectrometry; proteomics
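Sequence coverage is one of the metrics used to rank the digestion protocols. It is the fraction of a protein's residues covered by at least one identified peptide. The minimal sketch below makes several simplifying assumptions. Peptides are plain amino-acid strings matched exactly against the protein sequence. Modifications and the I/L ambiguity are ignored. The function name `sequence_coverage` is hypothetical.

```python
def sequence_coverage(protein, peptides):
    """Fraction of protein residues covered by at least one identified peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue  # skip empty strings rather than matching everywhere
        start = protein.find(pep)
        # Mark every occurrence of the peptide in the protein sequence.
        while start != -1:
            for pos in range(start, start + len(pep)):
                covered[pos] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)
```

For instance, the peptides `"TAYIAK"` and `"QR"` cover 8 of the 10 residues of `"MKTAYIAKQR"`, giving a coverage of 0.8. Peptides that do not occur in the sequence are simply ignored.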
One of the most popular methods to prepare tryptic peptides for bottom-up proteomic analysis is in-gel digestion. To date, there have been few studies comparing various digestion protocols. In this study, we compare the efficiency of several popular in-gel digestion protocols, along with new technologies that may improve digestion efficiency, using a human epidermoid carcinoma cell lysate protein standard. The efficiency of each protocol will be assessed by the number of proteins identified, their sequence coverage, and their relative quantitation by spectral counting. The importance of this study lies in its comparison of pre-existing in-gel digestion methods with newly developed technologies. These technologies offer the potential for more cost-effective digestion, higher protein yield, and an overall reduction in processing time. The following four protocols will be compared: Shevchenko's overnight protocol (Methods in Molecular Biology 1999;122:383-397), in-gel digestion in a barometric pressure cycler (Pressure Biosciences, Boston, MA), and in-gel digestion in a scientific microwave (CEM, Mathews, NC). In addition, several variables will be tested for their effects on digestion efficiency and keratin contamination, including the elimination of vacuum centrifugation and the use of modified and non-modified trypsin. Statistical analysis will be performed on replicate samples to determine whether there are any significant differences between protocols.
Plasma biomarker studies rely on the differential expression of proteins between treatment groups or between diseased and control populations. Most mass spectrometry-based methods of protein quantitation, however, detect and quantitate peptides, not intact proteins. For peptide-based protein quantitation to be accurate, the digestion protocols used in proteomic analyses must be both efficient and reproducible. Very few studies, however, have compared plasma denaturation/digestion protocols using absolute quantitation methods. In this paper, 14 combinations of the following treatments were compared for their effectiveness in improving subsequent tryptic digestion:

- heat;
- solvents (acetonitrile, methanol, trifluoroethanol);
- chaotropic agents (guanidine hydrochloride, urea);
- surfactants (sodium dodecyl sulfate (SDS) and sodium deoxycholate (DOC)).

These digestion protocols were evaluated by quantitating the production of proteotypic tryptic peptides from 45 moderate- to high-abundance plasma proteins. Quantitation used tandem mass spectrometry in multiple reaction monitoring mode, with a mixture of stable-isotope-labeled analogues of these proteotypic peptides as internal standards. When the digestion efficiencies of the 14 methods were compared, both surfactants (SDS and DOC) increased the overall yield of tryptic peptides from these 45 proteins relative to the more commonly used urea protocol. SDS, however, can seriously interfere with subsequent mass spectrometry. DOC, on the other hand, can be easily removed from samples by acid precipitation. A reproducibility study with 5 replicate digestions was also performed. In it, DOC and SDS with a 9 h digestion time produced the highest average digestion efficiencies (~80%). They also showed the highest average reproducibility (<5% error, defined as the relative deviation from the mean value).
However, because of potential interferences resulting from the use of SDS, we recommend DOC with a 9 h digestion procedure as the optimum protocol.
protein digestion; deoxycholate; urea; sodium dodecyl sulfate; heat denaturation; solvent denaturation
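Digestion efficiency and reproducibility come from a stable-isotope-labeled internal standard measured by multiple reaction monitoring. The sketch below shows the arithmetic under explicit assumptions.

- The endogenous peptide amount is the light/heavy peak-area ratio times the spiked amount of heavy standard.
- Efficiency divides that amount by the amount expected if every protein molecule released one copy of the proteotypic peptide. The expected amount is assumed known from the protein's concentration.
- "Relative deviation from the mean" is read here as the average of |x - mean| / mean over replicates. This is an interpretation, not the paper's stated formula.

All function names are hypothetical.

```python
from statistics import mean

def peptide_amount(light_area, heavy_area, heavy_spike_fmol):
    """Endogenous peptide amount (fmol) from the light/heavy MRM peak-area ratio."""
    return light_area / heavy_area * heavy_spike_fmol

def digestion_efficiency(light_area, heavy_area, heavy_spike_fmol, expected_fmol):
    """Fraction of the expected proteotypic peptide actually released by digestion.

    Assumes one copy of the peptide per protein molecule, so expected_fmol is
    the protein amount loaded (a hypothetical input for this sketch).
    """
    return peptide_amount(light_area, heavy_area, heavy_spike_fmol) / expected_fmol

def mean_relative_deviation(values):
    """Average of |x - mean| / mean over replicate measurements (assumed definition)."""
    m = mean(values)
    return mean(abs(v - m) / m for v in values)
```

For example, a light/heavy area ratio of 0.8 with 100 fmol of heavy standard and 100 fmol of expected peptide gives an efficiency of 0.8. Replicate efficiencies of 0.78, 0.80 and 0.82 give a mean relative deviation of about 1.7%. These numbers are illustrative only, not data from the study.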
The primary utility of trypsin digestion in proteomics is that it cleaves proteins at predictable locations. It is also notable for yielding peptides that terminate in the basic residues arginine and lysine. In ion trap tandem mass spectrometry, tryptic peptides fragment to produce prominent C-terminal y series ions. Alternative proteolytic digests may produce peptides that do not follow these rules. In this study, we examine 2568 peptides generated by proteinase K digestion, which yields peptides with a greater diversity of basic-residue content. We show that the position of basic residues within a peptide influences the peak intensities of b and y series ions. A basic residue near the N-terminus of a peptide can lead to prominent b series peaks rather than the intense y series peaks associated with tryptic peptides. The effects of the presence and position of arginine, lysine, and histidine are explored separately and in combination. Arg has the strongest effect, followed by His and then Lys. Fragment ions containing basic residues produce more intense peaks than those without. Doubly charged precursor ions have generally been modeled as producing only singly charged fragment ions. However, fragment ions that contain two basic residues may accept both protons during fragmentation. By characterizing the influence of basic residues on the gas-phase fragmentation of peptides, this research enables more accurate fragmentation models for peptide identification algorithms.
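The argument above turns on which b and y fragments carry basic residues, and hence a mobile proton. The sketch below enumerates the b and y ions of an unmodified peptide using standard monoisotopic residue masses. It counts the Arg/Lys/His residues in each fragment. It also computes m/z at a chosen charge, so fragments with two basic residues can be checked at 2+. The sketch does not predict peak intensities, which is what the study characterizes. The names `fragment_ions` and `mz` are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard, unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
BASIC = set("RKH")

def fragment_ions(peptide):
    """List (label, uncharged fragment mass, basic residue count) for all b and y ions.

    b ions use the sum of N-terminal residue masses; y ions add one water
    to the sum of C-terminal residue masses.
    """
    ions = []
    n = len(peptide)
    for i in range(1, n):
        prefix, suffix = peptide[:i], peptide[i:]
        b_mass = sum(RESIDUE_MASS[aa] for aa in prefix)
        y_mass = sum(RESIDUE_MASS[aa] for aa in suffix) + WATER
        ions.append((f"b{i}", b_mass, sum(aa in BASIC for aa in prefix)))
        ions.append((f"y{n - i}", y_mass, sum(aa in BASIC for aa in suffix)))
    return ions

def mz(fragment_mass, charge):
    """m/z of a fragment carrying `charge` protons."""
    return (fragment_mass + charge * PROTON) / charge
```

For `"GR"` this gives b1 at about m/z 58.029 and y1 at about m/z 175.119 (both 1+). Filtering `fragment_ions(...)` for a basic count of at least 2 and applying `mz(mass, 2)` lists the candidate doubly charged fragments discussed above.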
Immunoaffinity depletion with antibodies to the top 7 or top 14 high-abundance plasma proteins is used to enhance detection of lower-abundance proteins in both shotgun and targeted proteomic analyses. We evaluated the effects of top 7/top 14 immunodepletion on the shotgun proteomic analysis of human plasma. Our goal was to evaluate the impact of immunodepletion on protein detection across the detectable range of abundance. The depletion columns afforded highly repeatable and efficient plasma protein fractionation, and relatively few nontargeted proteins were captured by them. Unfractionated and immunodepleted plasma were analyzed by peptide isoelectric focusing (IEF) followed by liquid chromatography-tandem mass spectrometry (LC-MS/MS). These analyses showed that depletion enriched nontargeted plasma proteins by an average of 4-fold, as assessed by MS/MS spectral counting. Either top 7 or top 14 immunodepletion increased the number of identified proteins by 25% compared with unfractionated plasma. Although 23 low-abundance (<10 ng mL−1) plasma proteins were detected, they accounted for only 5–6% of total protein identifications in immunodepleted plasma. In both unfractionated and immunodepleted plasma, the 50 most abundant plasma proteins accounted for 90% of cumulative spectral counts and precursor ion intensities, leaving little capacity to sample lower-abundance proteins. Even with immunodepletion, untargeted proteomic analyses using current LC-MS/MS platforms cannot be expected to efficiently discover low-abundance, disease-specific biomarkers in plasma.
plasma; high-abundance protein depletion; multiple affinity removal system; isoelectric focusing; shotgun proteomics
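The observation that the 50 most abundant proteins accounted for 90% of cumulative spectral counts is a simple top-N share. The sketch below computes that share from a mapping of protein identifiers to spectral counts. The function name `top_n_share` and the example counts are hypothetical, not data from the study.

```python
def top_n_share(spectral_counts, n):
    """Fraction of all spectral counts contributed by the n most abundant proteins.

    spectral_counts maps a protein identifier to its MS/MS spectral count.
    """
    counts = sorted(spectral_counts.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(counts[:n]) / total
```

With counts `{"ALB": 500, "APOA1": 200, "TF": 150, "P4": 100, "P5": 50}`, the top 2 proteins account for 0.7 of all spectra and the top 3 for 0.85. Applied with n = 50 to a real plasma run, a value near 0.9 would reproduce the saturation effect described above.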