Related Articles

1.  Label-Free Protein Quantitation Using Weighted Spectral Counting 
Methods in molecular biology (Clifton, N.J.)  2012;893:10.1007/978-1-61779-885-6_20.
Mass spectrometry (MS)-based shotgun proteomics allows protein identifications even in complex biological samples. Protein abundances can then be estimated from the counts of MS/MS spectra attributable to each protein, provided that one corrects for differential MS-detectability of the contributing peptides. We describe the use of a method, APEX, which calculates Absolute Protein EXpression levels based on learned correction factors, MS/MS spectral counts, and each protein's probability of correct identification.
The APEX-based calculations consist of three parts: (1) Using training data, peptide sequences and their sequence properties, a model is built that can be used to estimate MS-detectability (Oi) for any given protein. (2) Absolute abundances of proteins measured in an MS/MS experiment are calculated with information from spectral counts, identification probabilities and the learned Oi values. (3) Simple statistics allow for significance analysis of differential expression in two distinct biological samples, i.e., measuring relative protein abundances. APEX-based protein abundances span more than four orders of magnitude and are applicable to mixtures of hundreds to thousands of proteins from any type of organism.
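As an illustration of step (2), the following minimal Python sketch normalizes weighted spectral counts into abundance estimates. It assumes the commonly cited form of the APEX estimate, APEX_i = C × (n_i·p_i/O_i) / Σ_k (n_k·p_k/O_k), where n_i is the spectral count, p_i the identification probability, O_i the learned detectability, and C an assumed total number of protein molecules per cell. The function name, the input layout, and the example values are hypothetical and are not taken from the chapter.

```python
def apex_abundances(proteins, total_molecules=1.0):
    """Estimate per-protein abundances from weighted spectral counts.

    proteins: dict mapping a protein name to a tuple
        (spectral_count n_i, identification_probability p_i, detectability O_i).
    total_molecules: assumed normalization constant C (hypothetical value).
    """
    weighted = {}
    for name, (n_i, p_i, o_i) in proteins.items():
        if o_i <= 0:
            raise ValueError(f"detectability O_i must be positive for {name}")
        # Correct the observed count for identification confidence
        # and for how readily the protein's peptides are detected.
        weighted[name] = n_i * p_i / o_i

    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no spectral counts to normalize")

    # Scale each protein's share of the corrected counts by C.
    return {name: total_molecules * w / total for name, w in weighted.items()}


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = {
        "protein_A": (100, 1.0, 10.0),
        "protein_B": (30, 0.9, 3.0),
        "protein_C": (5, 0.8, 4.0),
    }
    for name, value in apex_abundances(example, total_molecules=2_000_000).items():
        print(f"{name}: {value:.0f} molecules (estimated)")
```

Dividing by O_i is what separates this from raw spectral counting: a protein whose peptides are hard to detect is credited with more molecules per observed spectrum.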
PMCID: PMC3654649  PMID: 22665309
Quantitative proteomics; Protein expression; Label-free mass spectrometry; Spectral counting
2.  Proteome-wide systems analysis of a cellulosic biofuel-producing microbe 
We apply mass spectrometry-based ReDi proteomics to quantify the Clostridium phytofermentans proteome during fermentation of cellulosic substrates. ReDi proteomics gives accurate, low-cost quantification of an extra- and intracellular microbial proteome. When combined with physiological measurements, these methods form a general systems biology strategy to evaluate the efficiency of cellulosic bioconversion and to identify enzyme targets to engineer for improving this process. C. phytofermentans expressed more than 100 carbohydrate-active enzymes, of which distinct subsets were upregulated on cellulose and hemicellulose. Numerous extracellular enzymes cleave insoluble plant polysaccharides into oligosaccharides, which are transported into the cell to be further degraded by intracellular carbohydratases. Sugars are catabolized by EMP glycolysis incorporating alternative glycolytic enzymes to maximize the ATP yield of anaerobic metabolism. During cellulosic fermentation, cells adhered to the substrate and altered metabolic processes such as upregulation of tryptophan and nicotinamide synthesis proteins and repression of proteins for fatty acid metabolism and cell motility. These diverse metabolic changes highlight how a systems approach can identify novel ways to optimize cellulosic fermentation.
Cellulose is the world's most abundant renewable, biological energy source (Leschine, 1995). Microbial fermentation of cellulosic biomass could sustainably provide enough ethanol for 65% of US ground transportation fuel at current levels (Somerville, 2006). However, cellulose in plant biomass is packaged into a crystalline matrix, making biomass deconstruction a key roadblock to using it as a feedstock (Houghton et al, 2006). A promising strategy to overcome biomass recalcitrance is consolidated bioprocessing (Lynd et al, 2002), which uses microbes such as Clostridium phytofermentans to both secrete enzymes to depolymerize biomass and then ferment the resulting hexose and pentose sugars to a biofuel such as ethanol. The C. phytofermentans genome encodes 161 carbohydrate-active enzymes (CAZy) including 108 glycoside hydrolases spread across 39 families (Cantarel et al, 2009), highlighting the elaborate set of enzymes needed to break down different cellulosic polysaccharides. Faced with the complexity of metabolizing biomass, systems biology strategies are needed to comprehensively identify which cellulolytic and metabolic enzymes are used to ferment different cellulosic substrates.
This study presents a systems-level analysis of how C. phytofermentans ferments different cellulosic substrates that incorporates quantitative mass spectrometry-based proteomics of over 2500 proteins. Protein concentrations within each carbon source treatment were calculated by machine learning-supported spectral counting (Absolute Protein EXpression, APEX) (Lu et al, 2007). Protein levels on hemicellulose and cellulose relative to glucose were determined using reductive methylation (Hsu et al, 2003; Boersema et al, 2009), here called ReDi labeling, to chemically incorporate hydrogen or deuterium isotopes at lysines and N-terminal amines of tryptic peptides. We show that ReDi proteomics gives accurate, low-cost quantification of a microbial proteome and can be used to discern extracellular proteins. Further, we combine these quantitative proteomics with detailed measurements of growth, biomass consumption, fermentation product analyses, and electron microscopy. Together, these methods form a general strategy to evaluate the efficiency of cellulosic bioconversion and to identify enzyme targets to engineer for improving this process (Figure 1).
We found that fermentation of cellulosic substrates by C. phytofermentans involves secretion of numerous CAZy as well as proteins for binding of extracellular solutes, proteolysis, and motility. The most highly expressed protein in the proteome is a secreted protein that appears to compose a surface layer to support the cell and anchor cell surface proteins, including some enzymes for plant degradation. Once the secreted CAZy cleave insoluble plant polysaccharides into oligosaccharides, the oligosaccharides are taken into the cell to be further degraded by intracellular CAZy, enabling more efficient sugar transport, conserving energy by phosphorolytic cleavage, and ensuring that the sugar monomers are not available to competing microbes. Sugars are catabolized by EMP glycolysis incorporating reversible, PPi-dependent glycolytic enzymes, and pyruvate ferredoxin oxidoreductase. The genome encodes seven alcohol dehydrogenases, among which two iron-dependent enzymes are highly expressed and likely facilitate the high ethanol yields. Growth on cellulose also resulted in indirect changes such as increased tryptophan and nicotinamide synthesis and repression of fatty acid synthesis. We distilled the data into a model showing the highly expressed enzymes enabling efficient cellulosic fermentation by C. phytofermentans (Figure 7). Collectively, these data help us understand how bacteria recycle plant biomass and work towards enabling the use of plant biomass as a low-cost chemical feedstock.
Fermentation of plant biomass by microbes like Clostridium phytofermentans recycles carbon globally and can make biofuels from inedible feedstocks. We analyzed C. phytofermentans fermenting cellulosic substrates by integrating quantitative mass spectrometry of more than 2500 proteins with measurements of growth, enzyme activities, fermentation products, and electron microscopy. Absolute protein concentrations were estimated using Absolute Protein EXpression (APEX); relative changes between treatments were quantified with chemical stable isotope labeling by reductive dimethylation (ReDi). We identified the different combinations of carbohydratases used to degrade cellulose and hemicellulose, many of which were secreted based on quantification of supernatant proteins, as well as the repertoires of glycolytic enzymes and alcohol dehydrogenases (ADHs) enabling ethanol production at near maximal yields. Growth on cellulose also resulted in diverse changes such as increased expression of tryptophan synthesis proteins and repression of proteins for fatty acid metabolism and cell motility. This study gives a systems-level understanding of how this microbe ferments biomass and provides a rational, empirical basis to identify engineering targets for industrial cellulosic fermentation.
PMCID: PMC3049413  PMID: 21245846
bioenergy; clostridium; proteomics
3.  MRM screening/biomarker discovery with linear ion trap MS: a library of human cancer-specific peptides 
BMC Cancer  2009;9:96.
The discovery of novel protein biomarkers is essential in the clinical setting to enable early disease diagnosis and increase survivability rates. To facilitate differential expression analysis and biomarker discovery, a variety of tandem mass spectrometry (MS/MS)-based protein profiling techniques have been developed. For achieving sensitive detection and accurate quantitation, targeted MS screening approaches, such as multiple reaction monitoring (MRM), have been implemented.
MCF-7 breast cancer protein cellular extracts were analyzed by 2D-strong cation exchange (SCX)/reversed phase liquid chromatography (RPLC) separations interfaced to linear ion trap MS detection. MS data were interpreted with the Sequest-based Bioworks software (Thermo Electron). In-house developed Perl-scripts were used to calculate the spectral counts and the representative fragment ions for each peptide.
In this work, we report on the generation of a library of 9,677 peptides (p < 0.001), representing ~1,572 proteins from human breast cancer cells, that can be used for MRM/MS-based biomarker screening studies. For each protein, the library provides the number and sequence of detectable peptides, the charge state, the spectral count, the molecular weight, the parameters that characterize the quality of the tandem mass spectrum (p-value, DeltaM, Xcorr, DeltaCn, Sp, no. of matching a, b, y ions in the spectrum), the retention time, and the top 10 most intense product ions that correspond to a given peptide. Only proteins identified by at least two spectral counts are listed. The experimental distribution of protein frequencies, as a function of molecular weight, closely matched the theoretical distribution of proteins in the human proteome, as provided in the SwissProt database. The amino acid sequence coverage of the identified proteins ranged from 0.04% to 98.3%. The highest-abundance proteins in the cellular extract had a molecular weight (MW)<50,000.
Preliminary experiments have demonstrated that putative biomarkers that are not detectable by conventional data-dependent MS acquisition methods in complex unfractionated samples can be reliably identified with the information provided in this library. Based on the spectral count, the quality of a tandem mass spectrum and the m/z values for a parent peptide and its most abundant daughter ions, MRM conditions can be selected to enable the detection of target peptides and proteins.
PMCID: PMC2670839  PMID: 19327145
4.  Enhanced peptide quantification using spectral count clustering and cluster abundance 
BMC Bioinformatics  2011;12:423.
Quantification of protein expression by means of mass spectrometry (MS) has been introduced in various proteomics studies. In particular, two label-free quantification methods, spectral counting and spectral feature analysis, have been extensively investigated in a wide variety of proteomic studies. The cornerstone of both methods is peptide identification based on a proteomic database search and subsequent estimation of peptide retention time. However, they often suffer from restrictive database search and inaccurate estimation of the liquid chromatography (LC) retention time. Furthermore, conventional peptide identification methods based on the spectral library search algorithms such as SEQUEST or SpectraST have been found to provide neither the best match nor high-scored matches. Lastly, these methods are limited in the sense that target peptides cannot be identified unless they have been previously generated and stored in the database or spectral libraries.
To overcome these limitations, we propose a novel method, namely Quantification method based on Finding the Identical Spectral set for a Homogenous peptide (Q-FISH) to estimate the peptide's abundance from its tandem mass spectrometry (MS/MS) spectra through the direct comparison of experimental spectra. Intuitively, our Q-FISH method compares all possible pairs of experimental spectra in order to identify both known and novel proteins, significantly enhancing identification accuracy by grouping replicated spectra from the same peptide targets.
We applied Q-FISH to Nano-LC-MS/MS data obtained from human hepatocellular carcinoma (HCC) and normal liver tissue samples to identify differentially expressed peptides between the normal and disease samples. For a total of 44,318 spectra obtained through MS/MS analysis, Q-FISH yielded 14,747 clusters. Among these, 5,777 clusters were identified only in the HCC sample, 6,648 clusters only in the normal tissue sample, and 2,323 clusters both in the HCC and normal tissue samples. While it will be interesting to investigate peptide clusters found in only one sample, we further examined the spectral clusters identified in both the HCC and normal samples, since our goal is to identify and assess differentially expressed peptides quantitatively. The next step was to perform a beta-binomial test to isolate differentially expressed peptides between the HCC and normal tissue samples. This test resulted in 84 peptides with significantly differential spectral counts between the HCC and normal tissue samples. We independently identified 50 and 95 peptides by SEQUEST, of which 24 and 56 peptides, respectively, were found to be known biomarkers for human liver cancer. Comparing Q-FISH and SEQUEST results, we found that 22 of the 84 differentially expressed peptides found by Q-FISH were also identified by SEQUEST. Remarkably, of these 22 peptides discovered by both Q-FISH and SEQUEST, 13 peptides are known for human liver cancer and the remaining 9 peptides are known to be associated with other cancers.
We proposed a novel statistical method, Q-FISH, for accurately identifying protein species and simultaneously quantifying the expression levels of identified peptides from mass spectrometry data. Q-FISH analysis on human HCC and liver tissue samples identified many protein biomarkers that are highly relevant to HCC. Q-FISH can be a useful tool both for peptide identification and quantification on mass spectrometry data analysis. It may also prove to be more effective in discovering novel protein biomarkers than SEQUEST and other standard methods.
PMCID: PMC3234305  PMID: 22034872
5.  Absolute quantification of microbial proteomes at different states by directed mass spectrometry 
The developed, directed mass spectrometry workflow allows the generation of consistent and system-wide quantitative maps of microbial proteomes in a single analysis. Application to the human pathogen L. interrogans revealed mechanistic proteome changes over time involved in pathogenic progression and antibiotic defense, and new insights about the regulation of absolute protein abundances within operons.
The developed, directed proteomic approach allowed consistent detection and absolute quantification of 1680 proteins of the human pathogen L. interrogans in a single LC–MS/MS experiment. The comparison of 25 extensive, consistent and quantitative proteome maps revealed new insights about the proteome changes involved in pathogenic progression and antibiotic defense of L. interrogans, and about the regulation of protein abundances within operons. The generated time-resolved data sets are compatible with pattern analysis algorithms developed for transcriptomics, including hierarchical clustering and functional enrichment analysis of the detected profile clusters. This is the first study that describes the absolute quantitative behavior of any proteome over multiple states and represents the most comprehensive proteome abundance pattern comparison for any organism to date.
Over the last decade, mass spectrometry (MS)-based proteomics has evolved as the method of choice for system-wide proteome studies and now allows for the characterization of several thousands of proteins in a single sample. Despite these great advances, redundant monitoring of protein levels over large sample numbers in a high-throughput manner remains a challenging task. New directed MS strategies have been shown to overcome some of the current limitations, thereby enabling the acquisition of consistent and system-wide data sets of proteomes with low-to-moderate complexity at high throughput.
In this study, we applied this integrated, two-stage MS strategy to investigate global proteome changes in the human pathogen L. interrogans. In the initial discovery phase, 1680 proteins (out of around 3600 gene products) could be identified (Schmidt et al, 2008) and, by focusing precious MS-sequencing time on the most dominant, specific peptides per protein, all proteins could be accurately and consistently monitored over 25 different samples within a few days of instrument time in the following scoring phase (Figure 1). Additionally, the co-analysis of heavy reference peptides enabled us to obtain absolute protein concentration estimates for all identified proteins in each perturbation (Malmström et al, 2009). The detected proteins did not show any biases against functional groups or protein classes, including membrane proteins, and span an abundance range of more than three orders of magnitude, a range that is expected to cover most of the L. interrogans proteome (Malmström et al, 2009).
To elucidate mechanistic proteome changes over time involved in pathogenic progression and antibiotic defense of L. interrogans, we generated time-resolved proteome maps of cells perturbed with serum and three different antibiotics at sublethal concentrations that are currently used to treat Leptospirosis. This yielded an information-rich proteomic data set that describes, for the first time, the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date. Using this unique property of the data set, we could quantify protein components of entire pathways across several time points and subject the data sets to cluster analysis, a tool that was previously limited to the transcript level due to incomplete sampling on the protein level (Figure 4). Based on these analyses, we could demonstrate that Leptospira cells adjust the cellular abundance of a certain subset of proteins and pathways as a general response to stress while other parts of the proteome respond highly specifically. The cells furthermore react to individual treatments by ‘fine tuning’ the abundance of certain proteins and pathways in order to cope with the specific cause of stress. Intriguingly, the most specific and significant expression changes were observed for proteins involved in motility, tissue penetration and virulence after serum treatment, where we tried to simulate the host environment. While many of the detected protein changes demonstrate good agreement with available transcriptomics data, most proteins showed a poor correlation. This includes potential virulence factors, like Loa22 or OmpL1, with confirmed expression in vivo that were significantly up-regulated on the protein level, but not on the mRNA level, strengthening the importance of proteomic studies.
The high resolution and coverage of the proteome data set enabled us to further investigate protein abundance changes of co-regulated genes within operons. This suggests that although most proteins within an operon respond to regulation synchronously, bacterial cells seem to have subtle means to adjust the levels of individual proteins or protein groups outside of the general trend, a phenomenon that was recently also observed on the transcript level of other bacteria (Güell et al, 2009).
The method can be implemented with standard high-resolution mass spectrometers and software tools that are readily available in the majority of proteomics laboratories. It is scalable to any proteome of low-to-medium complexity and can be extended to post-translational modifications or peptide-labeling strategies for quantification. We therefore expect the approach outlined here to become a cornerstone for microbial systems biology.
Over the past decade, liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS) has evolved into the main proteome discovery technology. Up to several thousand proteins can now be reliably identified from a sample and the relative abundance of the identified proteins can be determined across samples. However, the remeasurement of substantially similar proteomes, for example those generated by perturbation experiments in systems biology, at high reproducibility and throughput remains challenging. Here, we apply a directed MS strategy to detect and quantify sets of pre-determined peptides in tryptic digests of cells of the human pathogen Leptospira interrogans at 25 different states. We show that in a single LC–MS/MS experiment around 5000 peptides, covering 1680 L. interrogans proteins, can be consistently detected and their absolute expression levels estimated, revealing new insights about the proteome changes involved in pathogenic progression and antibiotic defense of L. interrogans. This is the first study that describes the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date.
PMCID: PMC3159967  PMID: 21772258
absolute quantification; directed mass spectrometry; Leptospira interrogans; microbiology; proteomics
6.  Comparative Shotgun Proteomics Using Spectral Count Data and Quasi-Likelihood Modeling 
Journal of Proteome Research  2010;9(8):4295-4305.
Shotgun proteomics provides the most powerful analytical platform for global inventory of complex proteomes using liquid chromatography−tandem mass spectrometry (LC−MS/MS) and allows a global analysis of protein changes. Nevertheless, sampling of complex proteomes by current shotgun proteomics platforms is incomplete, and this contributes to variability in assessment of peptide and protein inventories by spectral counting approaches. Thus, shotgun proteomics data pose challenges in comparing proteomes from different biological states. We developed an analysis strategy using quasi-likelihood Generalized Linear Modeling (GLM), included in a graphical interface software package (QuasiTel) that reads standard output from protein assemblies created by IDPicker, an HTML-based user interface to query shotgun proteomic data sets. This approach was compared to four other statistical analysis strategies: Student t test, Wilcoxon rank test, Fisher’s Exact test, and Poisson-based GLM. We analyzed the performance of these tests to identify differences in protein levels based on spectral counts in a shotgun data set in which equimolar amounts of 48 human proteins were spiked at different levels into whole yeast lysates. Both GLM approaches and the Fisher Exact test performed adequately, each with their unique limitations. We subsequently compared the proteomes of normal tonsil epithelium and HNSCC using this approach and identified 86 proteins with differential spectral counts between normal tonsil epithelium and HNSCC. We selected 18 proteins from this comparison for verification of protein levels between the individual normal and tumor tissues using liquid chromatography−multiple reaction monitoring mass spectrometry (LC−MRM-MS). This analysis confirmed the magnitude and direction of the protein expression differences in all 6 proteins for which reliable data could be obtained. 
Our analysis demonstrates that shotgun proteomic data sets from different tissue phenotypes are sufficiently rich in quantitative information and that statistically significant differences in protein spectral counts reflect the underlying biology of the samples.
Shotgun proteomics provides the most powerful analytical platform for global inventory of complex proteomes but incomplete sampling poses challenges in comparing protein inventories by spectral counting approaches. We developed a statistical method based on quasi-likelihood modeling and demonstrate that it compares favorably to other statistical tests. Statistically significant spectral count differences were confirmed by MRM demonstrating that the observed protein level differences reflect the underlying biology of the samples.
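To make one of the comparison tests named above concrete, here is a minimal sketch of a two-sided Fisher's Exact test applied to the spectral counts of a single protein in two samples. It is a generic standard-library implementation, not the QuasiTel code or the authors' quasi-likelihood model. The table layout (protein spectra versus all other spectra, per sample) and the example counts are assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical):
        a = spectra for the protein in sample 1, b = all other spectra in sample 1
        c = spectra for the protein in sample 2, d = all other spectra in sample 2
    Returns the p-value: the total probability of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x protein spectra in sample 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_obs = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Made-up counts: 30 of 10,000 spectra in sample 1 vs 10 of 10,000 in sample 2.
    p_value = fisher_exact_two_sided(30, 9_970, 10, 9_990)
    print(f"two-sided p-value: {p_value:.4g}")
```

Because the test conditions on the table margins, it ignores replicate-to-replicate variability, which is one reason model-based approaches such as the GLMs described above are considered alongside it.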
PMCID: PMC2920032  PMID: 20586475
LC−MS/MS; shotgun proteomics; multiple reaction monitoring (MRM); head and neck carcinoma; Generalized Linear Model; spectral counting
7.  EP6 Quantitative Proteomics 
There are numerous approaches to study the proteome in a quantitative manner. All rely heavily on optimized sample preparation and appropriate statistical analysis of resulting datasets. This session will cover the following aspects of quantitative proteomics approaches:
Quantitative profiling of the membrane proteome requires special considerations not addressed in typical mass-spectrometry analyses. Optimized sample preparation and separation strategies will be discussed in the context of enriched membrane fractions and a quantitative proteomics platform using stable isotopes. In shotgun proteomics, a complex protein mixture is first digested to peptides, which are then analyzed by a combination of nanoflow chromatography and tandem mass spectrometry. The effects of subtle changes in sample preparation and chromatographic conditions in the characterization of complex mixtures will be presented. A discovery-based mass spectrometry approach using a bench-top LTQ linear ion trap and in-house written software for label-free differential protein profiling will be presented. This approach is quite comprehensive and is compatible with even the most inexpensive mass spectrometers. For proteins not detected routinely using our discovery-based approaches, we have applied selected reaction monitoring using a TSQ Quantum Ultra. This approach has been used to identify and quantify proteins at the low ng/mL level in plasma without any prior fractionation. A software pipeline has been developed to go from hypothesized proteins of interest derived from the literature to predicted hSRM transitions, collision offsets, and predicted chromatographic retention times. The combination of both discovery- and hypothesis-driven proteomics using nanoflow separations and tandem mass spectrometry provides us with unparalleled sensitivity and dynamic range in characterizing complex mixtures. Spectrum counting is an appealing and relatively straightforward approach for quantitative proteomics. Since the spectrum count of a protein in a proteomic analysis is the total number of peptides, not just unique peptides, detected and identified for a given protein, searching criteria and false-positive minimization are important.
There are several different versions of spectral counting currently in use, but each approach has shared core characteristics. An additional important consideration for quantitative proteomic analysis is the use of replicates for statistical analysis and determining the proper statistical test to use based on the overall structure of the datasets. This presentation will describe the foundation of spectral counting and the modifications to this approach used by different researchers. In addition, selected examples of the biological implementation of these approaches will be described.
PMCID: PMC2292016
8.  Impact of Protein Stability, Cellular Localization, and Abundance on Proteomic Detection of Tumor-Derived Proteins in Plasma 
PLoS ONE  2011;6(7):e23090.
Tumor-derived, circulating proteins are potentially useful as biomarkers for detection of cancer, for monitoring of disease progression, regression and recurrence, and for assessment of therapeutic response. Here we interrogated how a protein's stability, cellular localization, and abundance affect its observability in blood by mass-spectrometry-based proteomics techniques. We performed proteomic profiling on tumors and plasma from two different xenograft mouse models. A statistical analysis of this data revealed protein properties indicative of the detection level in plasma. Though 20% of the proteins identified in plasma were tumor-derived, only 5% of the proteins observed in the tumor tissue were found in plasma. Both intracellular and extracellular tumor proteins were observed in plasma; however, after normalizing for tumor abundance, extracellular proteins were seven times more likely to be detected. Although proteins that were more abundant in the tumor were also more likely to be observed in plasma, the relationship was nonlinear: Doubling the spectral count increased detection rate by only 50%. Many secreted proteins, even those with relatively low spectral count, were observed in plasma, but few low abundance intracellular proteins were observed. Proteins predicted to be stable by dipeptide composition were significantly more likely to be identified in plasma than less stable proteins. The number of tryptic peptides in a protein was not significantly related to the chance of a protein being observed in plasma. Quantitative comparison of large versus small tumors revealed that the abundance of proteins in plasma as measured by spectral count was associated with the tumor size, but the relationship was not one-to-one; a 3-fold decrease in tumor size resulted in a 16-fold decrease in protein abundance in plasma. 
This study provides quantitative support for a tumor-derived marker prioritization strategy that favors secreted and stable proteins over all but the most abundant intracellular proteins.
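The two fold-change statements above can be read as scaling exponents. The sketch below assumes a power-law relationship, y ∝ x^k, which the abstract does not state; under that assumption it recovers k from each reported pair of fold changes. The function names are hypothetical.

```python
import math


def power_law_exponent(fold_x, fold_y):
    """Exponent k such that a fold_x change in x gives a fold_y change in y,
    assuming y is proportional to x**k (an assumption, not a reported result)."""
    if fold_x <= 0 or fold_y <= 0 or fold_x == 1:
        raise ValueError("fold changes must be positive and fold_x must not be 1")
    return math.log(fold_y) / math.log(fold_x)


def predicted_fold(fold_x, exponent):
    """Fold change in y predicted for a given fold change in x."""
    return fold_x ** exponent


if __name__ == "__main__":
    # Doubling the spectral count raised the detection rate by 50% (1.5-fold).
    k_detect = power_law_exponent(2.0, 1.5)
    # A 3-fold change in tumor size went with a 16-fold change in plasma abundance.
    k_size = power_law_exponent(3.0, 16.0)
    print(f"detection rate vs spectral count: k = {k_detect:.3f}")
    print(f"plasma abundance vs tumor size:   k = {k_size:.3f}")
    print(f"predicted detection fold for 4x spectral count: {predicted_fold(4.0, k_detect):.2f}")
```

An exponent below 1 corresponds to the sublinear gain in detection with tumor abundance, and an exponent above 1 to the steeper-than-proportional dependence of plasma abundance on tumor size.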
PMCID: PMC3146523  PMID: 21829587
9.  Spectral counting assessment of protein dynamic range in cerebrospinal fluid following depletion with plasma-designed immunoaffinity columns 
Clinical proteomics  2011;8(1):6.
In cerebrospinal fluid (CSF), which is a rich source of biomarkers for neurological diseases, identification of biomarkers requires methods that allow reproducible detection of low abundance proteins. It is therefore crucial to decrease dynamic range and improve assessment of protein abundance.
We applied LC-MS/MS to compare the performance of two CSF enrichment techniques that immunodeplete either albumin alone (IgYHSA) or 14 high-abundance proteins (IgY14). In order to estimate the dynamic range of the proteins identified, we measured protein abundance with the APEX spectral counting method.
Both immunodepletion methods improved the number of low-abundance proteins detected (3-fold for IgYHSA, 4-fold for IgY14). The 10 most abundant proteins following immunodepletion accounted for 41% (IgY14) and 46% (IgYHSA) of CSF protein content, whereas they accounted for 64% in non-depleted samples, thus demonstrating significant enrichment of low-abundance proteins. Defined proteomics experiment metrics showed overall good reproducibility of the two immunodepletion methods and MS analysis. Moreover, offline peptide fractionation in IgYHSA sample allowed a 4-fold increase of proteins identified (520 vs. 131 without fractionation), without hindering reproducibility.
The novelty of this study was to show the advantages and drawbacks of these methods side by side. Taking into account the improved detection and potential loss of non-target proteins following extensive immunodepletion, it is concluded that both depletion methods combined with spectral counting may be of interest before further fractionation, when searching for CSF biomarkers. Given the reliable identification and quantitation obtained with the APEX algorithm, it may be considered a cheap and quick alternative for studying sample proteomic content.
PMCID: PMC3167203  PMID: 21906361
CSF; APEX; Biomarkers; depletion column; enrichment; low-abundance proteins
10.  2DB: a Proteomics database for storage, analysis, presentation, and retrieval of information from mass spectrometric experiments 
BMC Bioinformatics  2008;9:302.
The amount of information stemming from proteomics experiments involving (multidimensional) separation techniques, mass spectrometric analysis, and computational analysis is ever-increasing. Data from such an experimental workflow needs to be captured, related and analyzed. Biological experiments within this scope produce heterogeneous data ranging from pictures of one or two-dimensional protein maps and spectra recorded by tandem mass spectrometry to text-based identifications made by algorithms which analyze these spectra. Additionally, peptide and corresponding protein information needs to be displayed.
In order to handle the large amount of data from computational processing of mass spectrometric experiments, automatic import scripts are available and the necessity for manual input to the database has been minimized. Information is in a generic format which abstracts from specific software tools typically used in such an experimental workflow. The software is therefore capable of storing and cross analysing results from many algorithms. A novel feature and a focus of this database is to facilitate protein identification by using peptides identified from mass spectrometry and link this information directly to respective protein maps. Additionally, our application employs spectral counting for quantitative presentation of the data. All information can be linked to hot spots on images to place the results into an experimental context. A summary of identified proteins, containing all relevant information per hot spot, is automatically generated, usually upon either a change in the underlying protein models or due to newly imported identifications. The supporting information for this report can be accessed in multiple ways using the user interface provided by the application.
We present a proteomics database which aims to greatly reduce evaluation time of results from mass spectrometric experiments and enhance result quality by allowing consistent data handling. Import functionality, automatic protein detection, and summary creation act together to facilitate data analysis. In addition, supporting information for these findings is readily accessible via the graphical user interface provided. The database schema and the implementation, which can easily be installed on virtually any server, can be downloaded in the form of a compressed file from our project webpage.
PMCID: PMC2475538  PMID: 18605993
11.  Cysteinyl Peptide Capture for Shotgun Proteomics: Global Assessment of Chemoselective Fractionation 
Journal of Proteome Research  2010;9(10):5461-5472.
The complexity of cell and tissue proteomes presents one of the most significant technical challenges in proteomic biomarker discovery. Multidimensional liquid chromatography−tandem mass spectrometry (LC−MS/MS)-based shotgun proteomics can be coupled with selective enrichment of cysteinyl peptides (Cys-peptides) to reduce sample complexity and increase proteome coverage. Here we evaluated the impact of Cys-peptide enrichment on global proteomic inventories. We employed a new cleavable thiol-reactive biotinylating probe, N-(2-(2-(2-(2-(3-(1-hydroxy-2-oxo-2-phenylethyl)phenoxy)acetamido)ethoxy)-ethoxy)ethyl)-5-(2-oxohexahydro-1H-thieno[3,4-d]imidazol-4-yl)pentanamide (IBB), to capture Cys-peptides after digestion. Treatment of tryptic digests with the IBB reagent followed by streptavidin capture and mild alkaline hydrolysis releases a highly purified population of Cys-peptides with a residual S-carboxymethyl tag. Isoelectric focusing (IEF) followed by LC−MS/MS of Cys-peptides significantly expanded proteome coverage in Saccharomyces cerevisiae (yeast) and in human colon carcinoma RKO cells. IBB-based fractionation enhanced detection of Cys-proteins in direct proportion to their cysteine content. The degree of enrichment typically was 2−8-fold but ranged up to almost 20-fold for a few proteins. Published copy number annotation for the yeast proteome enabled benchmarking of MS/MS spectral count data to yeast protein abundance and revealed selective enrichment of cysteine-rich, lower abundance proteins. Spectral count data further established this relationship in RKO cells. Enhanced detection of low abundance proteins was due to the chemoselectivity of Cys-peptide capture, rather than simplification of the peptide mixture through fractionation.
Chemoselective enrichment of cysteinyl peptides (Cys-peptides) has been used in proteome analyses to reduce sample complexity and increase proteome coverage. We evaluated the impact of Cys-peptide enrichment on global proteomic inventories by multidimensional liquid chromatography−tandem mass spectrometry (LC−MS/MS)-based shotgun proteomics. Enhanced detection of low abundance proteins was due to the chemoselectivity of Cys-peptide capture, rather than simplification of the peptide mixture through fractionation.
PMCID: PMC2948434  PMID: 20731415
12.  Abacus: A computational tool for extracting and pre-processing spectral count data for label-free quantitative proteomic analysis 
Proteomics  2011;11(7):1340-1345.
We describe Abacus, a computational tool for extracting spectral counts from tandem mass spectrometry based proteomic datasets. The program aggregates data from multiple experiments, adjusts spectral counts to accurately account for peptides shared across multiple proteins, and performs common normalization steps. It can also output the spectral count data at the gene level, thus simplifying the integration and comparison between gene and protein expression data. Abacus is compatible with the widely used Trans-Proteomic Pipeline suite of tools and comes with a graphical user interface making it easy to interact with the program. The main aim of Abacus is to streamline the analysis of spectral count data by providing an automated, easy to use solution for extracting this information from proteomic datasets for subsequent, more sophisticated statistical analysis.
PMCID: PMC3113614  PMID: 21360675
Label free quantification; spectral counts; software; tandem mass spectrometry; protein inference; shared peptides
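One common way to adjust spectral counts for peptides shared across proteins is to split each shared peptide's count in proportion to the unique counts of the proteins it maps to. The sketch below illustrates that idea only; it is an assumption made for illustration, not Abacus's documented algorithm, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def adjusted_spectral_counts(peptides):
    """peptides: list of (spectral_count, [proteins the peptide maps to])."""
    # First pass: counts from peptides that map to exactly one protein.
    unique = defaultdict(float)
    for count, proteins in peptides:
        if len(proteins) == 1:
            unique[proteins[0]] += count

    # Second pass: split shared counts by each protein's unique evidence.
    adjusted = defaultdict(float, unique)
    for count, proteins in peptides:
        if len(proteins) > 1:
            weights = [unique[p] for p in proteins]
            total = sum(weights)
            for p, w in zip(proteins, weights):
                # With no unique evidence at all, fall back to an even split.
                share = w / total if total > 0 else 1.0 / len(proteins)
                adjusted[p] += count * share
    return dict(adjusted)


# Hypothetical input: A has 6 unique spectra, B has 2, and 4 are shared.
print(adjusted_spectral_counts([(6, ["A"]), (2, ["B"]), (4, ["A", "B"])]))
# {'A': 9.0, 'B': 3.0}
```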
13.  Corra: Computational framework and tools for LC-MS discovery and targeted mass spectrometry-based proteomics 
BMC Bioinformatics  2008;9:542.
Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, they are generally not comparable to each other in terms of functionality, user interfaces, information input/output, and do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists, and other researchers not trained in bioinformatics, who wish to use LC-MS-based quantitative proteomics.
We have developed Corra, a computational framework and tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, and statistical algorithms, originally developed for microarray data analyses, appropriate for LC-MS data analysis. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intense data processing and statistical analyses can run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling.
The Corra computational framework leverages computational innovation to enable biologists or other researchers to process, analyze and visualize LC-MS data with what would otherwise be a complex and not user-friendly suite of tools. Corra enables appropriate statistical analyses, with controlled false-discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such, addresses an unmet need in the LC-MS proteomics field.
PMCID: PMC2651178  PMID: 19087345
14.  ProtQuant: a tool for the label-free quantification of MudPIT proteomics data 
BMC Bioinformatics  2007;8(Suppl 7):S24.
Effective and economical methods for quantitative analysis of high throughput mass spectrometry data are essential to meet the goals of directly identifying, characterizing, and quantifying proteins from a particular cell state. Multidimensional Protein Identification Technology (MudPIT) is a common approach used in protein identification. Two types of methods are used to detect differential protein expression in MudPIT experiments: those involving stable isotope labelling and the so-called label-free methods. Label-free methods are based on the relationship between protein abundance and sampling statistics such as peptide count, spectral count, probabilistic peptide identification scores, and sum of peptide Sequest XCorr scores (ΣXCorr). Although a number of label-free methods for protein quantification have been described in the literature, there are few publicly available tools that implement these methods. We describe ProtQuant, a Java-based tool for label-free protein quantification that uses the previously published ΣXCorr method for quantification and includes an improved method for handling missing data.
ProtQuant was designed for ease of use and portability for the bench scientist. It implements the ΣXCorr method for label-free protein quantification from MudPIT datasets. ProtQuant has a graphical user interface, accepts multiple file formats, is not limited by the size of the input files, and can process any number of replicates and any number of treatments. In addition, ProtQuant implements a new method for dealing with missing values for peptide scores used for quantification. The new algorithm, called ΣXCorr*, uses "below threshold" peptide scores to provide meaningful non-zero values for missing data points. We demonstrate that ΣXCorr* produces an average reduction in false positive identifications of differential expression of 25% compared to ΣXCorr.
ProtQuant is a tool for protein quantification built for multi-platform use with an intuitive user interface. ProtQuant efficiently and uniquely performs label-free quantification of protein datasets produced with Sequest and provides the user with facilities for data management and analysis. Importantly, ProtQuant is available as a self-installing executable for the Windows environment used by many bench scientists.
PMCID: PMC2099493  PMID: 18047724
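The difference between ΣXCorr and ΣXCorr* might be pictured roughly as below. The abstract does not give the exact rule, so the fallback logic here is an assumed reading (use below-threshold scores only when no peptide passes the threshold); the function names and the threshold value are hypothetical and do not come from ProtQuant.

```python
def sum_xcorr(scores, threshold):
    """Sum of peptide XCorr scores at or above the threshold."""
    return sum(s for s in scores if s >= threshold)


def sum_xcorr_star(scores, threshold):
    """Assumed reading of the ΣXCorr* idea: when no peptide passes the
    threshold, use the below-threshold scores instead of reporting zero."""
    passing = sum_xcorr(scores, threshold)
    if passing > 0:
        return passing
    return sum(s for s in scores if 0 < s < threshold)


# Hypothetical scores for one protein in one replicate.
weak_hits = [1.5, 0.5]
print(sum_xcorr(weak_hits, threshold=2.5))       # 0
print(sum_xcorr_star(weak_hits, threshold=2.5))  # 2.0
```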
15.  Comprehensive analysis of protein digestion using six trypsins reveals the origin of trypsin as a significant source of variability in proteomics 
Journal of proteome research  2013;12(12):5666-5680.
Trypsin is an endoprotease commonly used for sample preparation in proteomics experiments. Importantly, protein digestion is dependent on multiple factors, including the trypsin origin and digestion conditions. In-depth characterization of trypsin activity could lead to improved reliability of peptide detection and quantitation in both targeted and discovery proteomics studies. To this end, we assembled a data analysis pipeline and suite of visualization tools for quality control and comprehensive characterization of pre-analytical variability in proteomics experiments. Using these tools, we evaluated six available proteomics-grade trypsins and their digestion of a single purified protein, human serum albumin (HSA). HSA was aliquoted and then digested for 2 or 18 hours for each trypsin, and the resulting digests were desalted and analyzed in triplicate by reversed phase liquid chromatography - tandem mass spectrometry. Peptides were identified and quantified using the NIST MSQC pipeline and a comprehensive HSA mass spectral library. We performed a statistical analysis of peptide abundances from different digests, and further visualized the data using the principal component analysis and quantitative protein “sequence maps”. While the performance of individual trypsins across repeat digests was reproducible, significant differences were observed depending on the origin of the trypsin (i.e., bovine vs. porcine). Bovine trypsins produced a higher number of peptides containing missed cleavages, whereas porcine trypsins produced more semi-tryptic peptides. In addition, many cleavage sites showed variable digestion kinetics patterns, evident from the comparison of peptide abundances in 2 hour vs. 18 hour digests. Overall, this work illustrates effects of an often neglected source of variability in proteomics experiments: the origin of the trypsin.
PMCID: PMC4076643  PMID: 24116745
proteomics; mass spectrometry; trypsin; digestion; endoprotease specificity; peptide abundance; variability; missed cleavages; label-free quantification; statistical analysis
16.  Urinary proteome analysis of irritable bowel syndrome (IBS) symptom subgroups 
Journal of proteome research  2012;11(12):5650-5662.
Irritable bowel syndrome (IBS) is a functional gastrointestinal (GI) disorder characterized by chronic abdominal pain associated with alterations in bowel function. Given the heterogeneity of the symptoms, multiple pathophysiologic factors are suspected to play a role. We classified women with IBS into four subgroups based on distinct symptom profiles. In-depth shotgun proteomic analysis was carried out to profile the urinary proteomes to identify possible proteins associated with these subgroups. First void urine samples with urine creatinine level ≥ 100 mg/dL were used after excluding samples that tested positive for blood. Urine from ten subjects representing each symptom subgroup was pooled for proteomic analysis. The urine proteome was analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) using a data-independent method known as Precursor Acquisition Independent From Ion Count (PAcIFIC) that allowed extended detectable dynamic range. Differences in protein quantities were determined by peptide spectral counting followed by validation of select proteins with ELISA or a targeted selected reaction monitoring (LC-SRM/MS) approach. Four IBS symptom subgroups were selected: 1) Constipation, 2) Diarrhea + Low Pain, 3) Diarrhea + High Pain, and 4) High Pain + High Psychological Distress. A fifth group consisted of Healthy Control subjects. From comparisons of quantitative spectral counting data among the symptom subgroups and controls, a total of 18 proteins that showed quantitative differences in relative abundance and possible physiological relevance to IBS were selected for further investigation. Three of the 18 proteins were chosen for validation by either ELISA or SRM. An elevated expression of gelsolin (GSN) was associated with the high pain groups. Trefoil Factor 3 (TFF3) levels were higher in IBS groups compared to controls. In this study the IBS patients subclassified by predominant symptoms showed differences in urine proteome levels. 
Proteins showing distinctive changes are involved in homeostasis of intestinal function and inflammatory response. These findings warrant future studies with larger, independent cohorts to enable more extensive assessment and validation of urinary protein markers as a diagnostic tool in adults with IBS.
PMCID: PMC3631108  PMID: 22998556
biomarker; irritable bowel syndrome; mass spectrometry; proteomics; urine; women
17.  Comprehensive analysis of the mouse renal cortex using two-dimensional HPLC – tandem mass spectrometry 
Proteome Science  2008;6:15.
Proteomic methodologies increasingly have been applied to the kidney to map the renal cortical proteome and to identify global changes in renal proteins induced by diseases such as diabetes. While progress has been made in establishing a renal cortical proteome using 1-D or 2-DE and mass spectrometry, the number of proteins definitively identified by mass spectrometry has remained surprisingly small. Low coverage of the renal cortical proteome as well as our interest in diabetes-induced changes in proteins found in the renal cortex prompted us to perform an in-depth proteomic analysis of mouse renal cortical tissue.
We report a large scale analysis of mouse renal cortical proteome using SCX prefractionation strategy combined with HPLC – tandem mass spectrometry. High-confidence identification of ~2,000 proteins, including cytoplasmic, nuclear, plasma membrane, extracellular and unknown/unclassified proteins, was obtained by separating tryptic peptides of renal cortical proteins into 60 fractions by SCX prior to LC-MS/MS. The identified proteins represented the renal cortical proteome with no discernible bias due to protein physicochemical properties, subcellular distribution, biological processes, or molecular function. The highest ranked molecular functions were characteristic of tubular epithelium, and included binding, catalytic activity, transporter activity, structural molecule activity, and carrier activity. Comparison of this renal cortical proteome with published human urinary proteomes demonstrated enrichment of renal extracellular, plasma membrane, and lysosomal proteins in the urine, with a lack of intracellular proteins. Comparison of the most abundant proteins based on normalized spectral abundance factor (NSAF) in this dataset versus a published glomerular proteome indicated enrichment of mitochondrial proteins in the former and cytoskeletal proteins in the latter.
A whole tissue extract of the mouse kidney cortex was analyzed by an unbiased proteomic approach, yielding a dataset of ~2,000 unique proteins identified with strict criteria to ensure a high level of confidence in protein identification. As a result of extracting all proteins from the renal cortex, we identified an exceptionally wide range of renal proteins in terms of pI, MW, hydrophobicity, abundance, and subcellular location. Many of these proteins, such as low-abundance proteins, membrane proteins and proteins with extreme values in pI or MW are traditionally under-represented in 2-DE-based proteomic analysis.
PMCID: PMC2412861  PMID: 18501002
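The normalized spectral abundance factor (NSAF) used above to rank proteins divides each protein's spectral count by its length and then normalizes by the sum of those ratios over all proteins in the run. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
def nsaf(spectral_counts, lengths):
    """NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j)."""
    # Spectral abundance factor: spectral count per residue of protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total <= 0:
        raise ValueError("at least one protein must have a non-zero count")
    return {p: value / total for p, value in saf.items()}


# Hypothetical counts and lengths (in residues), not data from the study.
counts = {"P1": 10, "P2": 30}
lengths = {"P1": 100, "P2": 300}
print(nsaf(counts, lengths))  # both proteins get 0.5
```

Although P2 has three times the spectral count, it is also three times longer, so both proteins end up with the same NSAF value; NSAF values within a run sum to 1.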
18.  Equivalence of Protein Inventories Obtained from Formalin-fixed Paraffin-embedded and Frozen Tissue in Multidimensional Liquid Chromatography-Tandem Mass Spectrometry Shotgun Proteomic Analysis 
Formalin-fixed paraffin-embedded (FFPE) tissue specimens comprise a potentially valuable resource for retrospective biomarker discovery studies, and recent work indicates the feasibility of using shotgun proteomics to characterize FFPE tissue proteins. A critical question in the field is whether proteomes characterized in FFPE specimens are equivalent to proteomes in corresponding fresh or frozen tissue specimens. Here we compared shotgun proteomic analyses of frozen and FFPE specimens prepared from the same colon adenoma tissues. Following deparaffinization, rehydration, and tryptic digestion under mild conditions, FFPE specimens corresponding to 200 μg of protein yielded ∼400 confident protein identifications in a one-dimensional reverse phase liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis. The major difference between frozen and FFPE proteomes was a decrease in the proportions of lysine C-terminal to arginine C-terminal peptides observed, but these differences had little effect on the proteins identified. No covalent peptide modifications attributable to formaldehyde chemistry were detected by analyses of the MS/MS datasets, which suggests that undetected, cross-linked peptides comprise the major class of modifications in FFPE tissues. Fixation of tissue for up to 2 days in neutral buffered formalin did not adversely impact protein identifications. Analysis of archival colon adenoma FFPE specimens indicated equivalent numbers of MS/MS spectral counts and protein group identifications from specimens stored for 1, 3, 5, and 10 years. Combination of peptide isoelectric focusing-based separation with reverse phase LC-MS/MS identified 2554 protein groups in 600 ng of protein from frozen tissue and 2302 protein groups from FFPE tissue with at least two distinct peptide identifications per protein. Analysis of the combined frozen and FFPE data showed a 92% overlap in the protein groups identified. 
Comparison of gene ontology categories of identified proteins revealed no bias in protein identification based on subcellular localization. Although the status of posttranslational modifications was not examined in this study, archival samples displayed a modest increase in methionine oxidation, from ∼17% after one year of storage to ∼25% after 10 years. These data demonstrate the equivalence of proteome inventories obtained from FFPE and frozen tissue specimens and provide support for retrospective proteomic analysis of FFPE tissues for biomarker discovery.
PMCID: PMC2722776  PMID: 19467989
19.  ABRF Research Group Development and Characterization of a Proteomics Normalization Standard Consisting of 1,000 Stable Isotope Labeled Peptides 
The ABRF Proteomics Standards Research Group (sPRG) is reporting the progress of a two-year study (2012–2014) which focuses on the generation of an interassay, interspecies, and interlaboratory peptide standard that can be used for normalization of protein abundance measurements in mass spectrometry based quantitative proteomics analyses. The standard has been formulated as two mixtures: 1,000 stable isotope 13C/15N-labeled (SIL) synthetic peptides alone, and peptides mixed with a tryptic digest of a HEK 293 cell lysate. The sequences of the synthetic peptides were derived from 552 proteins conserved across proteomes of commonly analyzed species: Homo sapiens, Mus musculus and Rattus norvegicus. The selected peptides represent a full range of hydrophobicities and isoelectric points, typical of tryptic peptides derived from complex proteomic samples. The standard was designed to represent proteins of various concentrations, spanning three orders of magnitude. First year efforts were focused on selection of appropriate protein and peptide candidates, peptide synthesis, quality assessment and LC-MS/MS evaluation conducted in laboratories of sPRG members. Using a variety of instrumental configurations and bioinformatics approaches, a thorough characterization of all 1,000 peptides was established. In the second year, the group launched the study to the entire proteomics community. A lyophilized mixture of HEK 293 tryptic digest cell lysate spiked with the 1,000 SIL peptide standards was provided to each participant. Also provided were a Skyline tutorial, tutorial datasets, three MS/MS spectral libraries generated from linear ion-trap (CID), Q-TOF/QQQ (CID), or Orbitrap (HCD) instrumentation, and a Panorama data repository. Participants were asked to analyze the sample in triplicate and calculate ratios of the spiked SIL to endogenous peptides and coefficients of variation for each peptide. 
Over 40 datasets were returned, and results following thorough characterization of the standard using various instrumental configurations will be reported.
PMCID: PMC4162257
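The per-peptide calculation requested of participants (ratio of spiked SIL to endogenous signal across triplicate runs, plus a coefficient of variation) can be sketched as below. The function name and peak areas are hypothetical, and the study's own Skyline-based workflow is not reproduced here.

```python
from statistics import mean, stdev


def ratio_and_cv(sil_areas, endogenous_areas):
    """Return (mean SIL/endogenous ratio, CV in percent) over replicate runs."""
    ratios = [s / e for s, e in zip(sil_areas, endogenous_areas)]
    m = mean(ratios)
    # Sample standard deviation (n - 1 denominator) relative to the mean.
    return m, stdev(ratios) / m * 100.0


# Hypothetical peak areas for one peptide measured in triplicate.
mean_ratio, cv_percent = ratio_and_cv([200.0, 210.0, 190.0], [100.0, 100.0, 100.0])
print(round(mean_ratio, 3), round(cv_percent, 3))  # 2.0 5.0
```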
20.  MaXIC-Q Web: a fully automated web service using statistical and computational methods for protein quantitation based on stable isotope labeling and LC–MS 
Nucleic Acids Research  2009;37(Web Server issue):W661-W669.
Isotope labeling combined with liquid chromatography–mass spectrometry (LC–MS) provides a robust platform for analyzing differential protein expression in proteomics research. We present a web service, called MaXIC-Q Web, for quantitation analysis of large-scale datasets generated from proteomics experiments using various stable isotope-labeling techniques, e.g. SILAC, ICAT and user-developed labeling methods. It accepts spectral files in the standard mzXML format and search results from SEQUEST, Mascot and ProteinProphet as input. Furthermore, MaXIC-Q Web uses statistical and computational methods to construct two kinds of elution profiles for each ion, namely, PIMS (projected ion mass spectrum) and XIC (extracted ion chromatogram) from MS data. Toward accurate quantitation, a stringent validation procedure is performed on PIMSs to filter out peptide ions interfered with by co-eluting peptides or noise. The areas of XICs determine ion abundances, which are used to calculate peptide and protein ratios. Since MaXIC-Q Web adopts stringent validation on spectral data, it achieves high accuracy so that manual validation effort can be substantially reduced. Furthermore, it provides various visualization diagrams and comprehensive quantitation reports so that users can conveniently inspect quantitation results. In summary, MaXIC-Q Web is a user-friendly, interactive, robust, generic web service for quantitation based on ICAT and SILAC labeling techniques.
PMCID: PMC2703943  PMID: 19528069
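The statement that XIC areas determine ion abundances, which then give peptide ratios, can be illustrated with trapezoidal integration of two chromatograms. This is a simplified sketch under stated assumptions: the function names and data are hypothetical, and MaXIC-Q Web's PIMS validation and peak-boundary detection are not modeled.

```python
def xic_area(times, intensities):
    """Area under an extracted ion chromatogram by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


def peptide_ratio(heavy, light):
    """Heavy/light abundance ratio from two (times, intensities) pairs."""
    return xic_area(*heavy) / xic_area(*light)


# Hypothetical chromatograms sampled at the same retention times (minutes).
light = ([0.0, 1.0, 2.0], [0.0, 10.0, 0.0])
heavy = ([0.0, 1.0, 2.0], [0.0, 20.0, 0.0])
print(peptide_ratio(heavy, light))  # 2.0
```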
21.  Comparison of spectral counting and metabolic stable isotope labeling for use with quantitative microbial proteomics 
The Analyst  2006;131(12):1335-1341.
Spectral counting, a promising method for quantifying relative changes in protein abundance in mass spectrometry-based proteomic analysis, was compared to metabolic stable isotope labeling using 15N/14N “heavy/light” peptide pairs. The data were drawn primarily from a Methanococcus maripaludis experiment comparing a wild-type strain with a mutant deficient in a key enzyme relevant to energy metabolism. The dataset contained both proteome and transcriptome measurements. The normalization technique used previously for the isotopic measurements was inappropriate for spectral counting, but a simple adjustment for sampling frequency was sufficient for normalization. This adjustment was satisfactory both for M. maripaludis, an organism that showed relatively little expression change between the wild-type and mutant strains, and Porphyromonas gingivalis, an intracellular pathogen that has demonstrated widespread changes between intracellular and extracellular conditions. Spectral counting showed lower overall sensitivity defined in terms of detecting a two-fold change in protein expression, and in order to achieve the same level of quantitative proteome coverage as the stable isotope method, it would have required approximately doubling the number of mass spectra collected.
PMCID: PMC2660848  PMID: 17124542
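A "simple adjustment for sampling frequency" can be pictured as dividing each protein's spectral count by the total number of spectra collected in that run before comparing conditions. The abstract does not give the exact formula, so the sketch below is an assumption; in particular, the pseudocount used to avoid division by zero is added here for illustration only.

```python
import math


def normalized_log2_ratio(count_a, total_a, count_b, total_b, pseudocount=1.0):
    """log2 ratio of sampling frequencies for one protein in runs A and B."""
    # Frequency = share of all spectra in the run assigned to this protein.
    freq_a = (count_a + pseudocount) / total_a
    freq_b = (count_b + pseudocount) / total_b
    return math.log2(freq_a / freq_b)


# Hypothetical counts: 7 of 1000 spectra in run A versus 3 of 2000 in run B.
print(normalized_log2_ratio(7, 1000, 3, 2000))  # approximately 2.0
```

Without the division by run totals, the raw counts (7 vs. 3) would understate the difference, because run B collected twice as many spectra overall.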
22.  Refining comparative proteomics by spectral counting to account for shared peptides and multiple search engines 
Analytical and bioanalytical chemistry  2012;404(4):1115-1125.
Spectral counting has become a widely used approach for measuring and comparing protein abundance in label-free shotgun proteomics. However, when analyzing complex samples, the ambiguity of matching between peptides and proteins greatly affects the assessment of peptide and protein inventories, differentiation, and quantification. Meanwhile, the configuration of database searching algorithms that assign peptides to MS/MS spectra may produce different results in comparative proteomic analysis. Here, we present three strategies to improve comparative proteomics through spectral counting. We show that comparing spectral counts for peptide groups rather than for protein groups forestalls problems introduced by shared peptides. We demonstrate the advantage and flexibility of this new method in two datasets. We present four models to combine four popular search engines that lead to significant gains in spectral counting differentiation. Among these models, we demonstrate a powerful vote counting model that scales well for multiple search engines. We also show that semi-tryptic searching outperforms tryptic searching for comparative proteomics. Overall, these techniques considerably improve protein differentiation on the basis of spectral count tables.
PMCID: PMC3717168  PMID: 22552787
Label-free comparative proteomics; Spectral counting; Combining database search engines
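The vote counting idea for combining search engines can be sketched as accepting a peptide assignment for a spectrum only when enough engines agree on it. The abstract does not specify the model's details, so the agreement rule, the `min_votes` parameter, and the engine names below are all assumptions made for illustration.

```python
from collections import Counter


def vote_count(assignments, min_votes=2):
    """assignments: {engine_name: {spectrum_id: peptide}}.

    Returns {spectrum_id: peptide} for spectra where the most frequently
    assigned peptide received at least min_votes votes.
    """
    votes = {}
    for engine_result in assignments.values():
        for spectrum, peptide in engine_result.items():
            votes.setdefault(spectrum, Counter())[peptide] += 1

    accepted = {}
    for spectrum, counter in votes.items():
        peptide, n = counter.most_common(1)[0]
        if n >= min_votes:
            accepted[spectrum] = peptide
    return accepted


# Hypothetical results from three engines for two spectra.
results = {
    "engine_a": {"s1": "PEPK", "s2": "AAAK"},
    "engine_b": {"s1": "PEPK", "s2": "CCCK"},
    "engine_c": {"s1": "PEPK"},
}
print(vote_count(results))  # {'s1': 'PEPK'}
```

The accepted assignments could then feed a spectral count table; adding a fourth or fifth engine only adds another entry to the input dictionary.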
23.  Utilizing Spectral Counting to Quantitatively Characterize Tandem Removal of Abundant Proteins (TRAP) in Human Plasma 
Analytical chemistry  2010;82(24):10179-10185.
Biomarker discovery efforts in serum and plasma are greatly hindered by the presence of high abundance proteins that prevent the detection and quantification of less abundant, yet biologically significant proteins. The most common method for addressing this problem is to specifically remove the few abundant proteins through immunoaffinity depletion/subtraction. Herein, we improved upon this method by utilizing multiple depletion columns in series, so as to increase the efficiency of the abundant protein removal and augment the detection/identification of less abundant plasma proteins. Spectral counting was utilized to make quantitative comparisons between un-depleted plasma, plasma depleted with a single depletion column, and plasma depleted using two or three depletion columns in tandem. In the un-depleted plasma only 29 lower abundance protein groups were identified with the top-scoring protein from each group having a median spectral count of 3, while in the plasma processed using a single HSA depletion column 61 such protein groups were identified with a median spectral count of 8. In comparison, 76 less abundant protein groups were identified with a median spectral count of 11.5 in the two column setup (i.e., HSA followed by MARS Hu14). However, in the ultimate depleted plasma sample, which was created using three depletion columns in tandem, the number of less abundant protein groups identified increased to 81 and the median, average spectral count for the top-scoring proteins from each group increased to 15 counts per protein. Moreover, exogenous B-type Natriuretic Peptide-32, which was added to the plasma as a detection benchmark at 12 μg/mL, was only detected in the plasma sample depleted using three depletion columns in tandem. Collectively, these data demonstrate that this method, Tandem Removal of Abundant Proteins or TRAP, provides superior removal efficiency compared to traditional applications and improves the depth of proteome coverage in plasma.
PMCID: PMC3654688  PMID: 21090636
24.  How to Generate High Quality Protein Interaction Maps 
Affinity purification followed by mass spectrometry (AP-MS) has become a commonly used method for the identification of protein-protein interactions and protein complexes. We will start with a review of the most commonly used experimental AP-MS workflows, with an emphasis on the experimental design and data analysis challenges typically encountered in such studies. One of the foremost challenges of interactome mapping is a large number of false positive protein interactions present in unfiltered datasets. We will review computational and informatics strategies for detecting specific protein interaction partners in AP-MS experiments, with a focus on incomplete (as opposed to genome-wide) interactome mapping studies. These strategies range from standard statistical approaches, to empirical scoring schemes optimized for a particular type of data, to advanced computational frameworks. The common denominator among these methods is the use of label-free quantitative information such as spectral counts or peptide ion intensities that can be extracted from MS data. We will discuss in more detail the current state of the computational tool SAINT developed in our lab. We will present its extension to intensity-based data, and compare the two quantitative strategies (spectral counts and intensities) in the context of AP-MS studies. We will also discuss related issues such as combining multiple biological or technical replicates, and dealing with data generated using different tagging strategies. We then present a new resource – the Contaminant Repository for Affinity Purification – a central repository to store, annotate, statistically analyze and disseminate lists of background contaminants likely to be observed in AP-MS studies. 
We will show how the contaminant repository, coupled with statistical scoring tools such as SAINT, can significantly improve the ability of individual researchers, especially in small-scale studies, to filter out likely false interactions based on the analysis of protein abundance profiles across multiple control experiments annotated in the repository.
PMCID: PMC3635379
25.  aLFQ: an R-package for estimating absolute protein quantities from label-free LC-MS/MS proteomics data 
Bioinformatics  2014;30(17):2511-2513.
Motivation: The determination of absolute quantities of proteins in biological samples is necessary for multiple types of scientific inquiry. While relative quantification has been commonly used in proteomics, few proteomic datasets measuring absolute protein quantities have been reported to date. Various technologies have been applied using different types of input data, e.g. ion intensities or spectral counts, as well as different absolute normalization strategies. To date, a user-friendly and transparent software supporting large-scale absolute protein quantification has been lacking.
Results: We present a bioinformatics tool, termed aLFQ, which supports the commonly used absolute label-free protein abundance estimation methods (TopN, iBAQ, APEX, NSAF and SCAMPI) for LC-MS/MS proteomics data, together with validation algorithms enabling automated data analysis and error estimation.
Availability and implementation: aLFQ is written in R and freely available under the GPLv3 from CRAN. Instructions and example data are provided in the R-package. The raw data can be obtained from the PeptideAtlas raw data repository (PASS00321).
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC4147881  PMID: 24753486
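Two of the intensity-based estimators named in the aLFQ abstract can be sketched in a few lines: TopN averages the N most intense peptides of a protein, and iBAQ divides the summed peptide intensity by the number of theoretically observable peptides. The code below is written in Python for illustration and is not the aLFQ R API; function names and intensities are hypothetical, and both functions assume a non-empty intensity list.

```python
def top_n(peptide_intensities, n=3):
    """Mean intensity of the n most intense peptides of a protein."""
    top = sorted(peptide_intensities, reverse=True)[:n]
    return sum(top) / len(top)


def ibaq(peptide_intensities, n_theoretical_peptides):
    """Summed peptide intensity per theoretically observable peptide."""
    return sum(peptide_intensities) / n_theoretical_peptides


# Hypothetical peptide intensities for one protein (arbitrary units).
intensities = [9.0, 6.0, 3.0, 1.0]
print(top_n(intensities, n=3))                      # 6.0
print(ibaq(intensities, n_theoretical_peptides=5))  # 3.8
```

Both values are relative indices; converting them to absolute quantities would require a calibration against proteins of known concentration, which is outside this sketch.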