PMCC

Related Articles

1.  Label-Free Protein Quantitation Using Weighted Spectral Counting 
Methods in molecular biology (Clifton, N.J.)  2012;893:10.1007/978-1-61779-885-6_20.
Mass spectrometry (MS)-based shotgun proteomics allows protein identifications even in complex biological samples. Protein abundances can then be estimated from the counts of MS/MS spectra attributable to each protein, provided that one corrects for differential MS-detectability of the contributing peptides. We describe the use of a method, APEX, which calculates Absolute Protein EXpression levels based on learned correction factors, MS/MS spectral counts, and each protein's probability of correct identification.
The APEX-based calculations consist of three parts: (1) Using training data of peptide sequences and their sequence properties, a model is built that can be used to estimate the MS-detectability (Oi) of any given protein. (2) Absolute abundances of proteins measured in an MS/MS experiment are calculated from spectral counts, identification probabilities and the learned Oi values. (3) Simple statistics allow for significance analysis of differential expression between two distinct biological samples, i.e., measuring relative protein abundances. APEX-based protein abundances span more than four orders of magnitude and are applicable to mixtures of hundreds to thousands of proteins from any type of organism.
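Step (2) can be written as APEX_i = (p_i · n_i / O_i) / Σ_k (p_k · n_k / O_k) · C, where n_i is the spectral count, p_i the identification probability, O_i the learned detectability (the Oi above), and C the total protein amount (e.g., molecules per cell) that the estimates are scaled to. The following is a minimal sketch, assuming the O_i values have already been predicted by a trained detectability model (not shown); the function name `apex_abundances` and the example numbers are hypothetical.

```python
from typing import Dict, Tuple

def apex_abundances(
    proteins: Dict[str, Tuple[int, float, float]],
    total_molecules: float,
) -> Dict[str, float]:
    """Distribute total_molecules across proteins in proportion to p_i * n_i / O_i.

    proteins maps a protein ID to (n_i spectral count, p_i identification
    probability, O_i expected MS-detectability from a trained model).
    """
    weights = {}
    for pid, (n, p, o) in proteins.items():
        if o <= 0:
            raise ValueError(f"O_i must be positive for {pid}")
        # Correct the observed spectral count for detectability and ID confidence.
        weights[pid] = p * n / o
    total = sum(weights.values())
    if total == 0:
        return {pid: 0.0 for pid in proteins}
    # Normalize so that the estimates sum to total_molecules (the constant C).
    return {pid: w / total * total_molecules for pid, w in weights.items()}


# Usage with made-up numbers: three proteins, C = 2000 molecules.
example = apex_abundances(
    {"A": (30, 1.0, 3.0), "B": (10, 0.9, 1.0), "C": (4, 0.5, 2.0)},
    total_molecules=2000,
)
```

Protein A has three times the spectral count of B, but after dividing by its higher detectability it ends up with a similar estimate, which is the point of the correction factor.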
doi:10.1007/978-1-61779-885-6_20
PMCID: PMC3654649  PMID: 22665309
Quantitative proteomics; Protein expression; Label-free mass spectrometry; Spectral counting
2.  Proteome-wide systems analysis of a cellulosic biofuel-producing microbe 
We apply mass spectrometry-based ReDi proteomics to quantify the Clostridium phytofermentans proteome during fermentation of cellulosic substrates. ReDi proteomics gives accurate, low-cost quantification of an extracellular and intracellular microbial proteome. When combined with physiological measurements, these methods form a general systems biology strategy to evaluate the efficiency of cellulosic bioconversion and to identify enzyme targets to engineer for improving this process. C. phytofermentans expressed more than 100 carbohydrate-active enzymes, of which distinct subsets were upregulated on cellulose and hemicellulose. Numerous extracellular enzymes cleave insoluble plant polysaccharides into oligosaccharides, which are transported into the cell to be further degraded by intracellular carbohydratases. Sugars are catabolized by EMP glycolysis incorporating alternative glycolytic enzymes to maximize the ATP yield of anaerobic metabolism. During cellulosic fermentation, cells adhered to the substrate and altered metabolic processes, including upregulation of tryptophan and nicotinamide synthesis proteins and repression of proteins for fatty acid metabolism and cell motility. These diverse metabolic changes highlight how a systems approach can identify novel ways to optimize cellulosic fermentation.
Cellulose is the world's most abundant renewable, biological energy source (Leschine, 1995). Microbial fermentation of cellulosic biomass could sustainably provide enough ethanol for 65% of US ground transportation fuel at current levels (Somerville, 2006). However, cellulose in plant biomass is packaged into a crystalline matrix, making biomass deconstruction a key roadblock to using it as a feedstock (Houghton et al, 2006). A promising strategy to overcome biomass recalcitrance is consolidated bioprocessing (Lynd et al, 2002), which uses microbes such as Clostridium phytofermentans both to secrete enzymes that depolymerize biomass and to ferment the resulting hexose and pentose sugars to a biofuel such as ethanol. The C. phytofermentans genome encodes 161 carbohydrate-active enzymes (CAZy) including 108 glycoside hydrolases spread across 39 families (Cantarel et al, 2009), highlighting the elaborate set of enzymes needed to break down different cellulosic polysaccharides. Given the complexity of metabolizing biomass, systems biology strategies are needed to comprehensively identify which cellulolytic and metabolic enzymes are used to ferment different cellulosic substrates.
This study presents a systems-level analysis of how C. phytofermentans ferments different cellulosic substrates that incorporates quantitative mass spectrometry-based proteomics of over 2500 proteins. Protein concentrations within each carbon source treatment were calculated by machine learning-supported spectral counting (Absolute Protein EXpression, APEX) (Lu et al, 2007). Protein levels on hemicellulose and cellulose relative to glucose were determined using reductive methylation (Hsu et al, 2003; Boersema et al, 2009), here called ReDi labeling, to chemically incorporate hydrogen or deuterium isotopes at lysines and N-terminal amines of tryptic peptides. We show that ReDi proteomics gives accurate, low-cost quantification of a microbial proteome and can be used to discern extracellular proteins. Further, we combine these quantitative proteomics with detailed measurements of growth, biomass consumption, fermentation product analyses, and electron microscopy. Together, these methods form a general strategy to evaluate the efficiency of cellulosic bioconversion and to identify enzyme targets to engineer for improving this process (Figure 1).
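In ReDi labeling, each free amine of a tryptic peptide (the N-terminus and every lysine side chain) gains a dimethyl group, so the light and heavy forms of a peptide appear as a pair of peaks separated by a fixed mass per label. The sketch below computes that spacing and a per-protein ratio. It assumes the heavy channel is CD2O with NaBH3CN (net +C2D4, about 32.0564 Da, versus +C2H4, about 28.0313 Da, for the light channel), ignores edge cases such as acetylated protein N-termini, and summarizes a protein by the median peptide log2 ratio; these choices and all function names are illustrative assumptions, not the study's pipeline.

```python
import math
from statistics import median
from typing import Iterable, Tuple

# Approximate monoisotopic mass added per dimethylated amine.
LIGHT_DIMETHYL = 28.0313  # CH2O + NaBH3CN: net +C2H4
HEAVY_DIMETHYL = 32.0564  # CD2O + NaBH3CN: net +C2D4 (assumed heavy channel)

def redi_label_count(peptide: str) -> int:
    """Labeled amines on a tryptic peptide: the N-terminus plus each lysine."""
    return 1 + peptide.upper().count("K")

def pair_spacing_mz(peptide: str, charge: int) -> float:
    """m/z gap between the light and heavy forms of a peptide at a given charge."""
    if charge < 1:
        raise ValueError("charge must be >= 1")
    return redi_label_count(peptide) * (HEAVY_DIMETHYL - LIGHT_DIMETHYL) / charge

def protein_log2_ratio(pairs: Iterable[Tuple[float, float]]) -> float:
    """Median peptide log2(heavy/light) intensity ratio for one protein."""
    ratios = [math.log2(h / l) for h, l in pairs if h > 0 and l > 0]
    if not ratios:
        raise ValueError("no quantifiable peptide pairs")
    return median(ratios)
```

For example, a doubly charged peptide with one lysine carries two labels, so its light and heavy peaks sit about 4.03 m/z apart. The median is used so that a single mis-quantified peptide pair does not dominate the protein ratio.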
We found that fermentation of cellulosic substrates by C. phytofermentans involves secretion of numerous CAZy as well as proteins for binding of extracellular solutes, proteolysis, and motility. The most highly expressed protein in the proteome is a secreted protein that appears to compose a surface layer that supports the cell and anchors cell surface proteins, including some enzymes for plant degradation. Once the secreted CAZy cleave insoluble plant polysaccharides into oligosaccharides, the oligosaccharides are taken into the cell to be further degraded by intracellular CAZy, which enables more efficient sugar transport, conserves energy by phosphorolytic cleavage, and ensures that the sugar monomers are not available to competing microbes. Sugars are catabolized by EMP glycolysis incorporating reversible, PPi-dependent glycolytic enzymes and pyruvate ferredoxin oxidoreductase. The genome encodes seven alcohol dehydrogenases, among which two iron-dependent enzymes are highly expressed and likely facilitate the high ethanol yields. Growth on cellulose also resulted in indirect changes such as increased tryptophan and nicotinamide synthesis and repression of fatty acid synthesis. We distilled the data into a model showing the highly expressed enzymes enabling efficient cellulosic fermentation by C. phytofermentans (Figure 7). Collectively, these data help explain how bacteria recycle plant biomass and work towards enabling the use of plant biomass as a low-cost chemical feedstock.
Fermentation of plant biomass by microbes like Clostridium phytofermentans recycles carbon globally and can make biofuels from inedible feedstocks. We analyzed C. phytofermentans fermenting cellulosic substrates by integrating quantitative mass spectrometry of more than 2500 proteins with measurements of growth, enzyme activities, fermentation products, and electron microscopy. Absolute protein concentrations were estimated using Absolute Protein EXpression (APEX); relative changes between treatments were quantified with chemical stable isotope labeling by reductive dimethylation (ReDi). We identified the different combinations of carbohydratases used to degrade cellulose and hemicellulose, many of which were secreted based on quantification of supernatant proteins, as well as the repertoires of glycolytic enzymes and alcohol dehydrogenases (ADHs) enabling ethanol production at near maximal yields. Growth on cellulose also resulted in diverse changes such as increased expression of tryptophan synthesis proteins and repression of proteins for fatty acid metabolism and cell motility. This study gives a systems-level understanding of how this microbe ferments biomass and provides a rational, empirical basis to identify engineering targets for industrial cellulosic fermentation.
doi:10.1038/msb.2010.116
PMCID: PMC3049413  PMID: 21245846
bioenergy; clostridium; proteomics
3.  MRM screening/biomarker discovery with linear ion trap MS: a library of human cancer-specific peptides 
BMC Cancer  2009;9:96.
Background
The discovery of novel protein biomarkers is essential in the clinical setting to enable early disease diagnosis and improve survival rates. To facilitate differential expression analysis and biomarker discovery, a variety of tandem mass spectrometry (MS/MS)-based protein profiling techniques have been developed. For achieving sensitive detection and accurate quantitation, targeted MS screening approaches, such as multiple reaction monitoring (MRM), have been implemented.
Methods
MCF-7 breast cancer protein cellular extracts were analyzed by 2D-strong cation exchange (SCX)/reversed phase liquid chromatography (RPLC) separations interfaced to linear ion trap MS detection. MS data were interpreted with the Sequest-based Bioworks software (Thermo Electron). In-house developed Perl-scripts were used to calculate the spectral counts and the representative fragment ions for each peptide.
Results
In this work, we report on the generation of a library of 9,677 peptides (p < 0.001), representing ~1,572 proteins from human breast cancer cells, that can be used for MRM/MS-based biomarker screening studies. For each protein, the library provides the number and sequence of detectable peptides, the charge state, the spectral count, the molecular weight, the parameters that characterize the quality of the tandem mass spectrum (p-value, DeltaM, Xcorr, DeltaCn, Sp, no. of matching a, b, y ions in the spectrum), the retention time, and the top 10 most intense product ions that correspond to a given peptide. Only proteins identified with at least two spectral counts are listed. The experimental distribution of protein frequencies, as a function of molecular weight, closely matched the theoretical distribution of proteins in the human proteome, as provided in the SwissProt database. The amino acid sequence coverage of the identified proteins ranged from 0.04% to 98.3%. The highest-abundance proteins in the cellular extract had a molecular weight (MW) < 50,000.
Conclusion
Preliminary experiments have demonstrated that putative biomarkers that are not detectable by conventional data-dependent MS acquisition methods in complex, unfractionated samples can be reliably identified with the information provided in this library. Based on the spectral count, the quality of a tandem mass spectrum and the m/z values of a parent peptide and its most abundant daughter ions, MRM conditions can be selected to enable the detection of target peptides and proteins.
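An MRM transition pairs a parent (precursor) m/z with a daughter (fragment) m/z. The study's in-house Perl scripts are not shown, so the following is a minimal Python sketch of the theoretical side only: it computes the [M + zH]z+ precursor m/z and the singly charged b- and y-ion ladder of a peptide. It assumes unmodified residues (e.g., no carbamidomethyl cysteine) and standard monoisotopic masses; choosing which fragments to monitor still requires the measured intensities that the library records. Function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def precursor_mz(seq: str, charge: int) -> float:
    """m/z of the [M + zH]z+ parent ion."""
    return (peptide_mass(seq) + charge * PROTON) / charge

def fragment_ladder(seq: str) -> List[Tuple[str, float]]:
    """Singly charged b- and y-ion m/z values (candidate daughter ions)."""
    ions = []
    for i in range(1, len(seq)):
        b = sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON
        y = sum(RESIDUE_MASS[aa] for aa in seq[-i:]) + WATER + PROTON
        ions.append((f"b{i}", b))
        ions.append((f"y{i}", y))
    return ions

# Usage: candidate transitions for the doubly charged test peptide PEPTIDE.
parent = precursor_mz("PEPTIDE", 2)
daughters = dict(fragment_ladder("PEPTIDE"))
```

A transition such as (parent, daughters["y5"]) would then be kept or discarded depending on how intense that fragment actually is in the library spectrum.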
doi:10.1186/1471-2407-9-96
PMCID: PMC2670839  PMID: 19327145
4.  Enhanced peptide quantification using spectral count clustering and cluster abundance 
BMC Bioinformatics  2011;12:423.
Background
Quantification of protein expression by mass spectrometry (MS) has been introduced in various proteomics studies. In particular, two label-free quantification methods, spectral counting and spectral feature analysis, have been extensively investigated in a wide variety of proteomic studies. The cornerstone of both methods is peptide identification based on a proteomic database search and subsequent estimation of peptide retention time. However, they often suffer from restrictive database searches and inaccurate estimation of the liquid chromatography (LC) retention time. Furthermore, conventional peptide identification methods based on database or spectral library search algorithms, such as SEQUEST or SpectraST, have been found to provide neither the best match nor high-scored matches. Lastly, these methods are limited in that target peptides cannot be identified unless they have previously been generated and stored in the database or spectral library.
To overcome these limitations, we propose a novel method, namely Quantification method based on Finding the Identical Spectral set for a Homogenous peptide (Q-FISH) to estimate the peptide's abundance from its tandem mass spectrometry (MS/MS) spectra through the direct comparison of experimental spectra. Intuitively, our Q-FISH method compares all possible pairs of experimental spectra in order to identify both known and novel proteins, significantly enhancing identification accuracy by grouping replicated spectra from the same peptide targets.
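The abstract does not specify Q-FISH's similarity measure or grouping rule, so the following is only a generic sketch of the underlying idea: compare all pairs of experimental spectra, group replicate spectra of the same peptide, and use each group's size as its spectral count. Binned cosine similarity, a precursor m/z tolerance, single-linkage grouping, the thresholds, and all function names are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Peak = Tuple[float, float]            # (m/z, intensity)
Spectrum = Tuple[float, List[Peak]]   # (precursor m/z, fragment peaks)

def binned(peaks: Sequence[Peak], bin_width: float) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    vec: Dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        vec[key] = vec.get(key, 0.0) + intensity
    return vec

def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_spectra(spectra: Sequence[Spectrum], min_cosine: float = 0.9,
                    precursor_tol: float = 0.5,
                    bin_width: float = 1.0) -> List[List[int]]:
    """Single-linkage grouping of spectra whose precursors agree within
    precursor_tol and whose binned fragment spectra reach min_cosine."""
    vecs = [binned(peaks, bin_width) for _, peaks in spectra]
    parent = list(range(len(spectra)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(spectra)):
        for j in range(i + 1, len(spectra)):  # all pairs: O(n^2) comparisons
            same_precursor = abs(spectra[i][0] - spectra[j][0]) <= precursor_tol
            if same_precursor and cosine(vecs[i], vecs[j]) >= min_cosine:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(spectra)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Usage: two replicate spectra, one unrelated spectrum, one with another precursor.
spectra = [
    (500.0, [(100.1, 10.0), (200.2, 20.0), (300.3, 30.0)]),
    (500.1, [(100.2, 11.0), (200.3, 19.0), (300.4, 31.0)]),
    (500.0, [(150.0, 10.0), (250.0, 20.0)]),
    (800.0, [(100.1, 10.0), (200.2, 20.0), (300.3, 30.0)]),
]
clusters = cluster_spectra(spectra)
cluster_counts = [len(c) for c in clusters]
```

The exhaustive pairwise loop mirrors the "all possible pairs" description but grows quadratically; for tens of thousands of spectra, pre-sorting by precursor m/z so that only nearby spectra are compared would be the obvious refinement.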
Results
We applied Q-FISH to Nano-LC-MS/MS data obtained from human hepatocellular carcinoma (HCC) and normal liver tissue samples to identify differentially expressed peptides between the normal and disease samples. For a total of 44,318 spectra obtained through MS/MS analysis, Q-FISH yielded 14,747 clusters. Among these, 5,777 clusters were identified only in the HCC sample, 6,648 clusters only in the normal tissue sample, and 2,323 clusters in both the HCC and normal tissue samples. While it would be interesting to investigate the peptide clusters found in only one sample, we further examined the spectral clusters identified in both the HCC and normal samples, since our goal is to identify and quantitatively assess differentially expressed peptides. The next step was to perform a beta-binomial test to isolate differentially expressed peptides between the HCC and normal tissue samples. This test yielded 84 peptides with significantly different spectral counts between the HCC and normal tissue samples. We independently identified 50 and 95 peptides by SEQUEST, of which 24 and 56 peptides, respectively, were found to be known biomarkers for human liver cancer. Comparing the Q-FISH and SEQUEST results, we found that 22 of the 84 differentially expressed peptides identified by Q-FISH were also identified by SEQUEST. Remarkably, of these 22 peptides, 13 are known to be associated with human liver cancer and the remaining 9 with other cancers.
Conclusions
We proposed a novel statistical method, Q-FISH, for accurately identifying protein species and simultaneously quantifying the expression levels of identified peptides from mass spectrometry data. Q-FISH analysis of human HCC and normal liver tissue samples identified many protein biomarkers that are highly relevant to HCC. Q-FISH can be a useful tool for both peptide identification and quantification in mass spectrometry data analysis. It may also prove to be more effective in discovering novel protein biomarkers than SEQUEST and other standard methods.
doi:10.1186/1471-2105-12-423
PMCID: PMC3234305  PMID: 22034872
5.  Absolute quantification of microbial proteomes at different states by directed mass spectrometry 
The developed, directed mass spectrometry workflow makes it possible to generate consistent and system-wide quantitative maps of microbial proteomes in a single analysis. Application to the human pathogen L. interrogans revealed mechanistic proteome changes over time involved in pathogenic progression and antibiotic defense, and new insights into the regulation of absolute protein abundances within operons.
The developed, directed proteomic approach allowed consistent detection and absolute quantification of 1680 proteins of the human pathogen L. interrogans in a single LC–MS/MS experiment. The comparison of 25 extensive, consistent and quantitative proteome maps revealed new insights about the proteome changes involved in pathogenic progression and antibiotic defense of L. interrogans, and about the regulation of protein abundances within operons. The generated time-resolved data sets are compatible with pattern analysis algorithms developed for transcriptomics, including hierarchical clustering and functional enrichment analysis of the detected profile clusters. This is the first study that describes the absolute quantitative behavior of any proteome over multiple states and represents the most comprehensive proteome abundance pattern comparison for any organism to date.
Over the last decade, mass spectrometry (MS)-based proteomics has evolved as the method of choice for system-wide proteome studies and now allows for the characterization of several thousands of proteins in a single sample. Despite these great advances, repeated monitoring of protein levels across large numbers of samples in a high-throughput manner remains a challenging task. New directed MS strategies have been shown to overcome some of the current limitations, thereby enabling the acquisition of consistent and system-wide data sets of proteomes with low-to-moderate complexity at high throughput.
In this study, we applied this integrated, two-stage MS strategy to investigate global proteome changes in the human pathogen L. interrogans. In the initial discovery phase, 1680 proteins (out of around 3600 gene products) could be identified (Schmidt et al, 2008) and, by focusing precious MS-sequencing time on the most dominant, specific peptides per protein, all proteins could be accurately and consistently monitored over 25 different samples within a few days of instrument time in the following scoring phase (Figure 1). Additionally, the co-analysis of heavy reference peptides enabled us to obtain absolute protein concentration estimates for all identified proteins in each perturbation (Malmström et al, 2009). The detected proteins did not show any biases against functional groups or protein classes, including membrane proteins, and span an abundance range of more than three orders of magnitude, a range that is expected to cover most of the L. interrogans proteome (Malmström et al, 2009).
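The abstract states that co-analysed heavy reference peptides yielded absolute concentration estimates (Malmström et al, 2009) but gives no formulas. The sketch below shows one common way such estimates are made, and is an assumption rather than the authors' exact procedure: a few anchor proteins are quantified from the light/heavy peak-area ratio against a known spiked amount, and the remaining proteins are estimated from a log-log calibration on the mean intensity of their three most intense peptides. Function names and all numbers are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence, Tuple

def anchor_amount(light_area: float, heavy_area: float, spiked_fmol: float) -> float:
    """Amount of an anchor protein from the light/heavy ratio of an endogenous
    peptide and its spiked heavy reference peptide."""
    if heavy_area <= 0:
        raise ValueError("heavy_area must be positive")
    return light_area / heavy_area * spiked_fmol

def top3_signal(peptide_intensities: Sequence[float]) -> float:
    """Mean intensity of the (up to) three most intense peptides of a protein."""
    return mean(sorted(peptide_intensities, reverse=True)[:3])

def fit_calibration(signals: Sequence[float],
                    amounts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(amount) = intercept + slope * log10(signal)."""
    xs = [math.log10(s) for s in signals]
    ys = [math.log10(a) for a in amounts]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need anchors with different signals")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def estimate_amount(signal: float, intercept: float, slope: float) -> float:
    """Apply the anchor calibration to a protein without a reference peptide."""
    return 10 ** (intercept + slope * math.log10(signal))

# Usage with made-up anchors whose signal is exactly proportional to amount.
intercept, slope = fit_calibration([1000.0, 10000.0, 100000.0], [1.0, 10.0, 100.0])
unknown = estimate_amount(5000.0, intercept, slope)
```

Fitting in log-log space keeps high-abundance anchors from dominating the calibration across the roughly three orders of magnitude mentioned above.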
To elucidate mechanistic proteome changes over time involved in pathogenic progression and antibiotic defense of L. interrogans, we generated time-resolved proteome maps of cells perturbed with serum and with three different antibiotics currently used to treat leptospirosis, applied at sublethal concentrations. This yielded an information-rich proteomic data set that describes, for the first time, the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date. Using this unique property of the data set, we could quantify protein components of entire pathways across several time points and subject the data sets to cluster analysis, a tool that was previously limited to the transcript level due to incomplete sampling on the protein level (Figure 4). Based on these analyses, we could demonstrate that Leptospira cells adjust the cellular abundance of a certain subset of proteins and pathways as a general response to stress while other parts of the proteome respond in a highly specific manner. The cells furthermore react to individual treatments by 'fine tuning' the abundance of certain proteins and pathways in order to cope with the specific cause of stress. Intriguingly, the most specific and significant expression changes were observed for proteins involved in motility, tissue penetration and virulence after serum treatment, which was intended to simulate the host environment. While many of the detected protein changes demonstrate good agreement with available transcriptomics data, most proteins showed a poor correlation. This includes potential virulence factors, like Loa22 or OmpL1, with confirmed expression in vivo that were significantly up-regulated on the protein level, but not on the mRNA level, strengthening the importance of proteomic studies.
The high resolution and coverage of the proteome data set enabled us to further investigate protein abundance changes of co-regulated genes within operons. The results suggest that although most proteins within an operon respond to regulation synchronously, bacterial cells seem to have subtle means to adjust the levels of individual proteins or protein groups outside of the general trend, a phenomenon that was recently also observed on the transcript level in other bacteria (Güell et al, 2009).
The method can be implemented with standard high-resolution mass spectrometers and software tools that are readily available in the majority of proteomics laboratories. It is scalable to any proteome of low-to-medium complexity and can be extended to post-translational modifications or peptide-labeling strategies for quantification. We therefore expect the approach outlined here to become a cornerstone for microbial systems biology.
Over the past decade, liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS) has evolved into the main proteome discovery technology. Up to several thousand proteins can now be reliably identified from a sample and the relative abundance of the identified proteins can be determined across samples. However, the remeasurement of substantially similar proteomes, for example those generated by perturbation experiments in systems biology, at high reproducibility and throughput remains challenging. Here, we apply a directed MS strategy to detect and quantify sets of pre-determined peptides in tryptic digests of cells of the human pathogen Leptospira interrogans at 25 different states. We show that in a single LC–MS/MS experiment around 5000 peptides, covering 1680 L. interrogans proteins, can be consistently detected and their absolute expression levels estimated, revealing new insights about the proteome changes involved in pathogenic progression and antibiotic defense of L. interrogans. This is the first study that describes the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date.
doi:10.1038/msb.2011.37
PMCID: PMC3159967  PMID: 21772258
absolute quantification; directed mass spectrometry; Leptospira interrogans; microbiology; proteomics
6.  Comparative Shotgun Proteomics Using Spectral Count Data and Quasi-Likelihood Modeling 
Journal of Proteome Research  2010;9(8):4295-4305.
Shotgun proteomics provides the most powerful analytical platform for global inventory of complex proteomes using liquid chromatography−tandem mass spectrometry (LC−MS/MS) and allows a global analysis of protein changes. Nevertheless, sampling of complex proteomes by current shotgun proteomics platforms is incomplete, and this contributes to variability in assessment of peptide and protein inventories by spectral counting approaches. Thus, shotgun proteomics data pose challenges in comparing proteomes from different biological states. We developed an analysis strategy using quasi-likelihood Generalized Linear Modeling (GLM), included in a graphical interface software package (QuasiTel) that reads standard output from protein assemblies created by IDPicker, an HTML-based user interface to query shotgun proteomic data sets. This approach was compared with four other statistical analysis strategies: Student's t test, Wilcoxon rank test, Fisher's exact test, and Poisson-based GLM. We analyzed the performance of these tests to identify differences in protein levels based on spectral counts in a shotgun data set in which equimolar amounts of 48 human proteins were spiked at different levels into whole yeast lysates. Both GLM approaches and Fisher's exact test performed adequately, each with its own limitations. We subsequently compared the proteomes of normal tonsil epithelium and head and neck squamous cell carcinoma (HNSCC) using this approach and identified 86 proteins with differential spectral counts between normal tonsil epithelium and HNSCC. We selected 18 proteins from this comparison for verification of protein levels between the individual normal and tumor tissues using liquid chromatography−multiple reaction monitoring mass spectrometry (LC−MRM-MS). This analysis confirmed the magnitude and direction of the protein expression differences for all 6 proteins for which reliable data could be obtained.
Our analysis demonstrates that shotgun proteomic data sets from different tissue phenotypes are sufficiently rich in quantitative information and that statistically significant differences in protein spectral counts reflect the underlying biology of the samples.
Shotgun proteomics provides the most powerful analytical platform for global inventory of complex proteomes but incomplete sampling poses challenges in comparing protein inventories by spectral counting approaches. We developed a statistical method based on quasi-likelihood modeling and demonstrate that it compares favorably to other statistical tests. Statistically significant spectral count differences were confirmed by MRM demonstrating that the observed protein level differences reflect the underlying biology of the samples.
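QuasiTel's internals are not described here, so the following is a minimal, hypothetical sketch of the quasi-likelihood idea for a single protein: a Poisson log-linear model with a group indicator and each run's total spectral count as an offset, whose Wald standard error is inflated by a dispersion factor φ estimated from the Pearson χ² statistic. With only two groups the maximum-likelihood fit has a closed form, so no GLM library is needed. Flooring φ at 1 and using a normal rather than t reference distribution are simplifications of this sketch, and the function name is an assumption.

```python
import math
from typing import Sequence, Tuple

def quasi_poisson_test(
    counts_a: Sequence[int], totals_a: Sequence[int],
    counts_b: Sequence[int], totals_b: Sequence[int],
) -> Tuple[float, float, float, float]:
    """Compare one protein's spectral counts between groups A and B.

    Model: counts ~ quasi-Poisson with log link, a group indicator, and
    log(total spectral counts of the run) as offset. Returns
    (log rate ratio B/A, dispersion phi, z statistic, two-sided p).
    """
    ya, yb = sum(counts_a), sum(counts_b)
    if ya == 0 or yb == 0:
        raise ValueError("zero counts in a group; the Wald test is undefined")
    # Closed-form maximum-likelihood rates for this two-group model.
    rate_a, rate_b = ya / sum(totals_a), yb / sum(totals_b)

    # Pearson chi-square over all runs gives the dispersion estimate.
    pearson = 0.0
    for counts, totals, rate in ((counts_a, totals_a, rate_a),
                                 (counts_b, totals_b, rate_b)):
        for y, n in zip(counts, totals):
            mu = rate * n
            pearson += (y - mu) ** 2 / mu
    df = len(counts_a) + len(counts_b) - 2
    # Conservative choice: never let phi fall below the Poisson value of 1.
    phi = max(1.0, pearson / df) if df > 0 else 1.0

    log_ratio = math.log(rate_b / rate_a)
    se = math.sqrt(phi * (1.0 / ya + 1.0 / yb))
    z = log_ratio / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided, normal approximation
    return log_ratio, phi, z, p

# Usage with made-up triplicates normalized to 1000 total spectra per run.
result = quasi_poisson_test([10, 12, 11], [1000] * 3, [30, 33, 27], [1000] * 3)
```

When replicate counts scatter more than Poisson sampling predicts, φ exceeds 1 and the standard error widens accordingly; this is how a quasi-likelihood model tempers the overconfident p-values a plain Poisson GLM would give.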
doi:10.1021/pr100527g
PMCID: PMC2920032  PMID: 20586475
LC−MS/MS; shotgun proteomics; multiple reaction monitoring (MRM); head and neck carcinoma; Generalized Linear Model; spectral counting
7.  EP6 Quantitative Proteomics 
There are numerous approaches to study the proteome in a quantitative manner. All rely heavily on optimized sample preparation and appropriate statistical analysis of resulting datasets. This session will cover the following aspects of quantitative proteomics approaches:
Quantitative profiling of the membrane proteome requires special considerations not addressed in typical mass spectrometry analyses. Optimized sample preparation and separation strategies will be discussed in the context of enriched membrane fractions and a quantitative proteomics platform using stable isotopes.

In shotgun proteomics, a complex protein mixture is first digested to peptides, which are then analyzed by a combination of nanoflow chromatography and tandem mass spectrometry. The effects of subtle changes in sample preparation and chromatographic conditions on the characterization of complex mixtures will be presented. A discovery-based mass spectrometry approach using a bench-top LTQ linear ion trap and in-house written software for label-free differential protein profiling will be presented. This approach is quite comprehensive and is compatible with even the least expensive mass spectrometers. For proteins not routinely detected by our discovery-based approaches, we have applied selected reaction monitoring using a TSQ Quantum Ultra. This approach has been used to identify and quantify proteins at the low ng/mL level in plasma without any prior fractionation. A software pipeline has been developed to go from hypothesized proteins of interest derived from the literature to predicted hSRM transitions, collision offsets, and predicted chromatographic retention times. The combination of both discovery- and hypothesis-driven proteomics using nanoflow separations and tandem mass spectrometry provides us with unparalleled sensitivity and dynamic range in characterizing complex mixtures.

Spectrum counting is an appealing and relatively straightforward approach for quantitative proteomics. Because the spectrum count of a protein is the total number of MS/MS spectra identified for its peptides, not just the number of unique peptides detected and identified, search criteria and false-positive minimization are important.
There are several different versions of spectral counting currently in use, but each approach has shared core characteristics. An additional important consideration for quantitative proteomic analysis is the use of replicates for statistical analysis and determining the proper statistical test to use based on the overall structure of the datasets. This presentation will describe the foundation of spectral counting and the modifications to this approach used by different researchers. In addition, selected examples of the biological implementation of these approaches will be described.
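As an illustration of one widely used spectral-counting variant (the session text does not say which variants it covers), the normalized spectral abundance factor (NSAF) divides each protein's spectral count by its length before normalizing across the run: NSAF_i = (SpC_i / L_i) / Σ_j (SpC_j / L_j). A minimal sketch, with a hypothetical function name and made-up counts:

```python
from typing import Dict

def nsaf(spectral_counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized spectral abundance factor for each protein in one run."""
    saf = {}
    for protein, count in spectral_counts.items():
        if lengths[protein] <= 0:
            raise ValueError(f"invalid length for {protein}")
        saf[protein] = count / lengths[protein]  # length-corrected spectral count
    total = sum(saf.values())
    if total == 0:
        return {protein: 0.0 for protein in saf}
    return {protein: value / total for protein, value in saf.items()}

# Usage: B has the most spectra but is three times longer than A.
example = nsaf({"A": 20, "B": 30, "C": 10}, {"A": 100, "B": 300, "C": 50})
```

Longer proteins yield more tryptic peptides and therefore more spectra, so the length correction moves B from the highest raw count to the lowest NSAF (0.2, versus 0.4 for A and C).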
PMCID: PMC2292016
8.  Cysteinyl Peptide Capture for Shotgun Proteomics: Global Assessment of Chemoselective Fractionation 
Journal of Proteome Research  2010;9(10):5461-5472.
The complexity of cell and tissue proteomes presents one of the most significant technical challenges in proteomic biomarker discovery. Multidimensional liquid chromatography−tandem mass spectrometry (LC−MS/MS)-based shotgun proteomics can be coupled with selective enrichment of cysteinyl peptides (Cys-peptides) to reduce sample complexity and increase proteome coverage. Here we evaluated the impact of Cys-peptide enrichment on global proteomic inventories. We employed a new cleavable thiol-reactive biotinylating probe, N-(2-(2-(2-(2-(3-(1-hydroxy-2-oxo-2-phenylethyl)phenoxy)acetamido)ethoxy)-ethoxy)ethyl)-5-(2-oxohexahydro-1H-thieno[3,4-d]imidazol-4-yl)pentanamide (IBB), to capture Cys-peptides after digestion. Treatment of tryptic digests with the IBB reagent followed by streptavidin capture and mild alkaline hydrolysis releases a highly purified population of Cys-peptides with a residual S-carboxymethyl tag. Isoelectric focusing (IEF) followed by LC−MS/MS of Cys-peptides significantly expanded proteome coverage in Saccharomyces cerevisiae (yeast) and in human colon carcinoma RKO cells. IBB-based fractionation enhanced detection of Cys-proteins in direct proportion to their cysteine content. The degree of enrichment typically was 2−8-fold but ranged up to almost 20-fold for a few proteins. Published copy number annotation for the yeast proteome enabled benchmarking of MS/MS spectral count data to yeast protein abundance and revealed selective enrichment of cysteine-rich, lower abundance proteins. Spectral count data further established this relationship in RKO cells. Enhanced detection of low abundance proteins was due to the chemoselectivity of Cys-peptide capture, rather than simplification of the peptide mixture through fractionation.
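IBB capture retains only cysteine-containing tryptic peptides, so how much a protein can benefit from the enrichment depends on how many such peptides its sequence yields. A minimal sketch of listing those peptides in silico, assuming the simplified trypsin rule "cleave after K or R, but not before P" with no missed cleavages; the function names and the toy sequence are hypothetical.

```python
from typing import List

def tryptic_digest(sequence: str, min_length: int = 1) -> List[str]:
    """Cleave C-terminal to K or R unless followed by P; no missed cleavages."""
    seq = sequence.upper()
    peptides, start = [], 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide not ending in K/R
    return [p for p in peptides if len(p) >= min_length]

def cys_peptides(sequence: str, min_length: int = 1) -> List[str]:
    """Tryptic peptides containing at least one cysteine (thiol-capturable)."""
    return [p for p in tryptic_digest(sequence, min_length) if "C" in p]

# Usage on a toy sequence: the R-P bond is not cleaved.
digest = tryptic_digest("AKCDRPEKWCR")
captured = cys_peptides("AKCDRPEKWCR")
```

Raising `min_length` (for example to 6) would roughly mimic excluding peptides too short to identify confidently; the exact cutoff is an assumption.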
Chemoselective enrichment of cysteinyl peptides (Cys-peptides) has been used in proteome analyses to reduce sample complexity and increase proteome coverage. We evaluated the impact of Cys-peptide enrichment on global proteomic inventories by multidimensional liquid chromatography−tandem mass spectrometry (LC−MS/MS)-based shotgun proteomics. Enhanced detection of low abundance proteins was due to the chemoselectivity of Cys-peptide capture, rather than simplification of the peptide mixture through fractionation.
doi:10.1021/pr1007015
PMCID: PMC2948434  PMID: 20731415
9.  Impact of Protein Stability, Cellular Localization, and Abundance on Proteomic Detection of Tumor-Derived Proteins in Plasma 
PLoS ONE  2011;6(7):e23090.
Tumor-derived, circulating proteins are potentially useful as biomarkers for detection of cancer, for monitoring of disease progression, regression and recurrence, and for assessment of therapeutic response. Here we interrogated how a protein's stability, cellular localization, and abundance affect its observability in blood by mass-spectrometry-based proteomics techniques. We performed proteomic profiling on tumors and plasma from two different xenograft mouse models. A statistical analysis of this data revealed protein properties indicative of the detection level in plasma. Though 20% of the proteins identified in plasma were tumor-derived, only 5% of the proteins observed in the tumor tissue were found in plasma. Both intracellular and extracellular tumor proteins were observed in plasma; however, after normalizing for tumor abundance, extracellular proteins were seven times more likely to be detected. Although proteins that were more abundant in the tumor were also more likely to be observed in plasma, the relationship was nonlinear: Doubling the spectral count increased detection rate by only 50%. Many secreted proteins, even those with relatively low spectral count, were observed in plasma, but few low abundance intracellular proteins were observed. Proteins predicted to be stable by dipeptide composition were significantly more likely to be identified in plasma than less stable proteins. The number of tryptic peptides in a protein was not significantly related to the chance of a protein being observed in plasma. Quantitative comparison of large versus small tumors revealed that the abundance of proteins in plasma as measured by spectral count was associated with the tumor size, but the relationship was not one-to-one; a 3-fold decrease in tumor size resulted in a 16-fold decrease in protein abundance in plasma. 
This study provides quantitative support for a tumor-derived marker prioritization strategy that favors secreted and stable proteins over all but the most abundant intracellular proteins.
doi:10.1371/journal.pone.0023090
PMCID: PMC3146523  PMID: 21829587
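The two nonlinear relationships reported in entry 9 can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, a power-law form (output proportional to input raised to an exponent b); the abstract does not state that any such model was fitted, so the exponents are only a way of restating the reported fold changes.

```python
import math

def implied_exponent(fold_in: float, fold_out: float) -> float:
    """Exponent b such that fold_out == fold_in ** b (illustrative power-law assumption)."""
    return math.log(fold_out) / math.log(fold_in)

# Doubling the spectral count -> 1.5x detection rate (sub-linear response)
b_detect = implied_exponent(2.0, 1.5)

# 3-fold smaller tumor -> 16-fold lower plasma abundance (super-linear response)
b_size = implied_exponent(3.0, 16.0)

print(f"detection exponent: {b_detect:.2f}")
print(f"tumor-size exponent: {b_size:.2f}")
```

Under this reading, detection grows with spectral count at an exponent of about 0.58 (below 1), while plasma abundance falls with tumor size at an exponent of about 2.5 (above 1).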
10.  2DB: a Proteomics database for storage, analysis, presentation, and retrieval of information from mass spectrometric experiments 
BMC Bioinformatics  2008;9:302.
Background
The amount of information stemming from proteomics experiments involving (multidimensional) separation techniques, mass spectrometric analysis, and computational analysis is ever-increasing. Data from such an experimental workflow need to be captured, related and analyzed. Biological experiments within this scope produce heterogeneous data, ranging from images of one- or two-dimensional protein maps and spectra recorded by tandem mass spectrometry to text-based identifications made by algorithms that analyze these spectra. Additionally, peptide and corresponding protein information needs to be displayed.
Results
In order to handle the large amount of data from computational processing of mass spectrometric experiments, automatic import scripts are available, and the need for manual input to the database has been minimized. Information is stored in a generic format that abstracts away from the specific software tools typically used in such an experimental workflow. The software is therefore capable of storing and cross-analysing results from many algorithms. A novel feature and focus of this database is to facilitate protein identification by using peptides identified from mass spectrometry and linking this information directly to the respective protein maps. Additionally, our application employs spectral counting for quantitative presentation of the data. All information can be linked to hot spots on images to place the results into an experimental context. A summary of identified proteins, containing all relevant information per hot spot, is generated automatically, usually when the underlying protein models change or new identifications are imported. The supporting information for this report can be accessed in multiple ways using the user interface provided by the application.
Conclusion
We present a proteomics database which aims to greatly reduce evaluation time of results from mass spectrometric experiments and enhance result quality by allowing consistent data handling. Import functionality, automatic protein detection, and summary creation act together to facilitate data analysis. In addition, supporting information for these findings is readily accessible via the graphical user interface provided. The database schema and the implementation, which can easily be installed on virtually any server, can be downloaded in the form of a compressed file from our project webpage.
doi:10.1186/1471-2105-9-302
PMCID: PMC2475538  PMID: 18605993
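The per-hot-spot summary with spectral counts described in entry 10 amounts to grouping peptide-spectrum matches by gel spot and protein. The following is a minimal sketch under assumed names: the `PSM` record layout and `summarize_spots` function are hypothetical and are not the 2DB schema.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import NamedTuple

class PSM(NamedTuple):
    """One peptide-spectrum match (hypothetical record layout)."""
    spot_id: str      # hot spot on the gel image
    spectrum_id: str
    peptide: str
    protein: str

def summarize_spots(psms: list[PSM]) -> dict[str, Counter]:
    """Spectral count per protein, grouped by gel spot."""
    summary: dict[str, Counter] = defaultdict(Counter)
    seen = set()
    for psm in psms:
        key = (psm.spot_id, psm.spectrum_id, psm.protein)
        if key in seen:  # count each spectrum at most once per protein and spot
            continue
        seen.add(key)
        summary[psm.spot_id][psm.protein] += 1
    return dict(summary)
```

Re-running such a function after each import is one simple way to keep a summary in step with newly imported identifications.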
11.  Spectral counting assessment of protein dynamic range in cerebrospinal fluid following depletion with plasma-designed immunoaffinity columns 
Clinical proteomics  2011;8(1):6.
Background
In cerebrospinal fluid (CSF), which is a rich source of biomarkers for neurological diseases, identification of biomarkers requires methods that allow reproducible detection of low abundance proteins. It is therefore crucial to decrease dynamic range and improve assessment of protein abundance.
Results
We applied LC-MS/MS to compare the performance of two CSF enrichment techniques that immunodeplete either albumin alone (IgYHSA) or 14 high-abundance proteins (IgY14). To estimate the dynamic range of the proteins identified, we measured protein abundance with the APEX spectral counting method.
Both immunodepletion methods increased the number of low-abundance proteins detected (3-fold for IgYHSA, 4-fold for IgY14). The 10 most abundant proteins following immunodepletion accounted for 41% (IgY14) and 46% (IgYHSA) of CSF protein content, compared with 64% in non-depleted samples, demonstrating significant enrichment of low-abundance proteins. Defined proteomics experiment metrics showed overall good reproducibility of the two immunodepletion methods and of the MS analysis. Moreover, offline peptide fractionation of the IgYHSA sample increased the number of proteins identified 4-fold (520 vs. 131 without fractionation) without hindering reproducibility.
Conclusions
The novelty of this study was to show the advantages and drawbacks of these methods side by side. Taking into account the improved detection and the potential loss of non-target proteins following extensive immunodepletion, we conclude that both depletion methods combined with spectral counting may be of interest before further fractionation when searching for CSF biomarkers. Given the reliable identification and quantitation obtained with the APEX algorithm, it may be considered a cheap and quick alternative for studying the proteomic content of a sample.
doi:10.1186/1559-0275-8-6
PMCID: PMC3167203  PMID: 21906361
CSF; APEX; Biomarkers; depletion column; enrichment; low-abundance proteins
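The APEX estimate used in entry 11 is usually written as APEX_i = (p_i n_i / O_i) / Σ_k (p_k n_k / O_k) × C, where p_i is the identification probability, n_i the spectral count, O_i the expected detectability, and C a normalization constant such as an estimate of total protein molecules. The sketch below implements that formula, plus a hypothetical helper for the "share of the top 10 proteins" metric quoted in the abstract; how the authors computed that share is an assumption here.

```python
def apex_abundances(p, n, o, total_molecules):
    """APEX-style absolute abundance estimates.

    p: identification probabilities (0-1); n: spectral counts;
    o: expected detectability O_i per protein; total_molecules: scaling constant C.
    """
    weights = [pi * ni / oi for pi, ni, oi in zip(p, n, o)]
    denom = sum(weights)
    if denom == 0:
        raise ValueError("no observed spectra to normalize")
    return [w / denom * total_molecules for w in weights]

def top_k_share(abundances, k=10):
    """Fraction of total abundance carried by the k most abundant proteins (hypothetical helper)."""
    ranked = sorted(abundances, reverse=True)
    return sum(ranked[:k]) / sum(ranked)
```

Dividing by O_i is what corrects for proteins that yield many detectable peptides per molecule, so two proteins with equal spectral counts can receive different abundance estimates.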
12.  Abacus: A computational tool for extracting and pre-processing spectral count data for label-free quantitative proteomic analysis 
Proteomics  2011;11(7):1340-1345.
We describe Abacus, a computational tool for extracting spectral counts from tandem mass spectrometry-based proteomic datasets. The program aggregates data from multiple experiments, adjusts spectral counts to accurately account for peptides shared across multiple proteins, and performs common normalization steps. It can also output the spectral count data at the gene level, thus simplifying the integration and comparison of gene and protein expression data. Abacus is compatible with the widely used Trans-Proteomic Pipeline suite of tools and comes with a graphical user interface, making it easy to interact with the program. The main aim of Abacus is to streamline the analysis of spectral count data by providing an automated, easy-to-use solution for extracting this information from proteomic datasets for subsequent, more sophisticated statistical analysis.
doi:10.1002/pmic.201000650
PMCID: PMC3113614  PMID: 21360675
Label free quantification; spectral counts; software; tandem mass spectrometry; protein inference; shared peptides
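Entry 12 mentions adjusting spectral counts for peptides shared across proteins. A common approach is to split shared spectra in proportion to each protein's unique spectral count. The sketch below implements that generic approach. It is not necessarily the exact rule Abacus uses, and the function name and input layout are hypothetical.

```python
def apportion_shared(unique_counts, shared_groups):
    """Distribute shared spectral counts in proportion to unique counts.

    unique_counts: {protein: unique spectral count}
    shared_groups: list of (proteins_sharing_the_peptides, shared_spectral_count)
    """
    adjusted = {prot: float(c) for prot, c in unique_counts.items()}
    for proteins, shared in shared_groups:
        total_unique = sum(unique_counts.get(p, 0) for p in proteins)
        for p in proteins:
            if total_unique > 0:
                share = unique_counts.get(p, 0) / total_unique
            else:
                share = 1 / len(proteins)  # no unique evidence: split evenly
            adjusted[p] = adjusted.get(p, 0.0) + shared * share
    return adjusted
```

For example, if protein A has 6 unique spectra, protein B has 2, and they share 4 spectra, A receives 3 of the shared spectra and B receives 1, giving adjusted counts of 9 and 3.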
13.  Corra: Computational framework and tools for LC-MS discovery and targeted mass spectrometry-based proteomics 
BMC Bioinformatics  2008;9:542.
Background
Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, these tools generally differ in functionality, user interfaces, and information input/output, and they do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists and other researchers not trained in bioinformatics who wish to use LC-MS-based quantitative proteomics.
Results
We have developed Corra, a computational framework and tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, as well as statistical algorithms originally developed for microarray data analysis, so that they are appropriate for LC-MS data. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intense data processing and statistical analyses can run on a remote server while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling.
Conclusion
The Corra computational framework leverages computational innovation to enable biologists and other researchers to process, analyze and visualize LC-MS data with what would otherwise be a complex and user-unfriendly suite of tools. Corra enables appropriate statistical analyses, with controlled false-discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such, addresses an unmet need in the LC-MS proteomics field.
doi:10.1186/1471-2105-9-542
PMCID: PMC2651178  PMID: 19087345
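Entry 13 refers to statistical analyses "with controlled false-discovery rates". The Benjamini-Hochberg step-up procedure is the standard way to control FDR across many peptide features. The sketch below shows that procedure generically. The abstract does not say which FDR method Corra applies, so treat this as an illustration of the concept rather than Corra's implementation.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices of hypotheses rejected at FDR level q (Benjamini-Hochberg step-up)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ranks by ascending p-value
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank  # largest rank meeting the BH bound
    return sorted(order[:k_max])
```

The step-up rule rejects every hypothesis ranked at or below the largest rank k that satisfies p_(k) ≤ k·q/m, even if some smaller ranks individually fail the bound.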
14.  Utilizing Spectral Counting to Quantitatively Characterize Tandem Removal of Abundant Proteins (TRAP) in Human Plasma 
Analytical chemistry  2010;82(24):10179-10185.
Biomarker discovery efforts in serum and plasma are greatly hindered by the presence of high-abundance proteins that prevent the detection and quantification of less abundant, yet biologically significant, proteins. The most common method for addressing this problem is to specifically remove the few abundant proteins through immunoaffinity depletion/subtraction. Herein, we improved upon this method by utilizing multiple depletion columns in series, so as to increase the efficiency of abundant protein removal and augment the detection/identification of less abundant plasma proteins. Spectral counting was utilized to make quantitative comparisons between un-depleted plasma, plasma depleted with a single depletion column, and plasma depleted using two or three depletion columns in tandem. In the un-depleted plasma, only 29 lower-abundance protein groups were identified, with the top-scoring protein from each group having a median spectral count of 3, while in the plasma processed using a single HSA depletion column, 61 such protein groups were identified with a median spectral count of 8. In comparison, 76 less abundant protein groups were identified with a median spectral count of 11.5 in the two-column setup (i.e., HSA followed by MARS Hu14). Finally, in the most extensively depleted plasma sample, which was created using three depletion columns in tandem, the number of less abundant protein groups identified increased to 81 and the median spectral count for the top-scoring protein from each group increased to 15 counts per protein. Moreover, exogenous B-type Natriuretic Peptide-32, which was added to the plasma as a detection benchmark at 12 μg/mL, was detected only in the plasma sample depleted using three depletion columns in tandem. Collectively, these data demonstrate that this method, Tandem Removal of Abundant Proteins or TRAP, provides superior removal efficiency compared to traditional applications and improves the depth of proteome coverage in plasma.
doi:10.1021/ac102248d
PMCID: PMC3654688  PMID: 21090636
15.  ABRF Research Group Development and Characterization of a Proteomics Normalization Standard Consisting of 1,000 Stable Isotope Labeled Peptides 
The ABRF Proteomics Standards Research Group (sPRG) is reporting the progress of a two-year study (2012–2014) focused on generating an interassay, interspecies, and interlaboratory peptide standard that can be used to normalize protein abundance measurements in mass spectrometry-based quantitative proteomics analyses. The standard has been formulated as two mixtures: 1,000 stable isotope 13C/15N-labeled (SIL) synthetic peptides alone, and the peptides mixed with a tryptic digest of a HEK 293 cell lysate. The sequences of the synthetic peptides were derived from 552 proteins conserved across the proteomes of commonly analyzed species: Homo sapiens, Mus musculus and Rattus norvegicus. The selected peptides represent a full range of hydrophobicities and isoelectric points typical of tryptic peptides derived from complex proteomic samples. The standard was designed to represent proteins at various concentrations spanning three orders of magnitude. First-year efforts focused on selecting appropriate protein and peptide candidates, peptide synthesis, quality assessment, and LC-MS/MS evaluation conducted in the laboratories of sPRG members. Using a variety of instrumental configurations and bioinformatics approaches, all 1,000 peptides were thoroughly characterized. In the second year, the group opened the study to the entire proteomics community. Each participant received a lyophilized mixture of HEK 293 tryptic digest cell lysate spiked with the 1,000 SIL peptide standards, along with a Skyline tutorial, tutorial datasets, three MS/MS spectral libraries generated from linear ion-trap (CID), Q-TOF/QQQ (CID), or Orbitrap (HCD) instrumentation, and a Panorama data repository. Participants were asked to analyze the sample in triplicate and to calculate, for each peptide, the ratio of spiked SIL to endogenous peptide and the coefficient of variation. 
Over 40 datasets were returned, and results following thorough characterization of the standard using various instrumental configurations will be reported.
PMCID: PMC4162257
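Entry 15 asks participants to report, per peptide, the SIL-to-endogenous ratio across triplicate runs and its coefficient of variation. Below is a minimal sketch of that calculation, assuming that peak areas per replicate are already available and that the CV is computed from the sample standard deviation. The function name and inputs are hypothetical and are not part of the study protocol.

```python
import statistics

def ratio_cv(sil_areas, endogenous_areas):
    """Per-replicate SIL/endogenous ratios, their mean, and CV (%) across replicates."""
    ratios = [s / e for s, e in zip(sil_areas, endogenous_areas)]
    mean = statistics.mean(ratios)
    # Sample standard deviation; requires at least two replicates
    cv_percent = statistics.stdev(ratios) / mean * 100
    return ratios, mean, cv_percent
```

With SIL areas of 190, 200 and 210 against a constant endogenous area of 100, the ratios are 1.9, 2.0 and 2.1, giving a mean of 2.0 and a CV of 5%.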
16.  ProtQuant: a tool for the label-free quantification of MudPIT proteomics data 
BMC Bioinformatics  2007;8(Suppl 7):S24.
Background
Effective and economical methods for quantitative analysis of high throughput mass spectrometry data are essential to meet the goals of directly identifying, characterizing, and quantifying proteins from a particular cell state. Multidimensional Protein Identification Technology (MudPIT) is a common approach used in protein identification. Two types of methods are used to detect differential protein expression in MudPIT experiments: those involving stable isotope labelling and the so-called label-free methods. Label-free methods are based on the relationship between protein abundance and sampling statistics such as peptide count, spectral count, probabilistic peptide identification scores, and sum of peptide Sequest XCorr scores (ΣXCorr). Although a number of label-free methods for protein quantification have been described in the literature, there are few publicly available tools that implement these methods. We describe ProtQuant, a Java-based tool for label-free protein quantification that uses the previously published ΣXCorr method for quantification and includes an improved method for handling missing data.
Results
ProtQuant was designed for ease of use and portability for the bench scientist. It implements the ΣXCorr method for label-free protein quantification from MudPIT datasets. ProtQuant has a graphical user interface, accepts multiple file formats, is not limited by the size of the input files, and can process any number of replicates and any number of treatments. In addition, ProtQuant implements a new method for dealing with missing values for peptide scores used for quantification. The new algorithm, called ΣXCorr*, uses "below threshold" peptide scores to provide meaningful non-zero values for missing data points. We demonstrate that ΣXCorr* reduces false-positive identifications of differential expression by 25% on average compared with ΣXCorr.
Conclusion
ProtQuant is a tool for protein quantification built for multi-platform use with an intuitive user interface. ProtQuant efficiently and uniquely performs label-free quantification of protein datasets produced with Sequest and provides the user with facilities for data management and analysis. Importantly, ProtQuant is available as a self-installing executable for the Windows environment used by many bench scientists.
doi:10.1186/1471-2105-8-S7-S24
PMCID: PMC2099493  PMID: 18047724
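Entry 16 describes ΣXCorr, a per-protein sum of Sequest XCorr scores, and ΣXCorr*, which fills in missing values using "below threshold" scores. The sketch below shows the summation and one plausible reading of the fill-in step: when a protein has no passing scores in a replicate, its below-threshold scores are summed instead of reporting zero. The fill-in rule and function name are assumptions, not ProtQuant's published algorithm.

```python
def sum_xcorr(psms, threshold, fill_missing=True):
    """Per-protein sum of XCorr scores (SigmaXCorr) for one replicate.

    psms: iterable of (protein, xcorr). Scores >= threshold are summed.
    If fill_missing is True and a protein has no passing scores, its
    below-threshold scores are summed instead, so the value is non-zero
    rather than missing (a hypothetical reading of SigmaXCorr*).
    """
    passing, below = {}, {}
    for protein, xcorr in psms:
        bucket = passing if xcorr >= threshold else below
        bucket[protein] = bucket.get(protein, 0.0) + xcorr
    result = dict(passing)
    if fill_missing:
        for protein, score in below.items():
            result.setdefault(protein, score)  # only fills proteins with no passing score
    return result
```

Without the fill-in, a protein missing from one replicate contributes a zero (or an undefined value) to the comparison, which can inflate apparent fold changes.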
17.  Urinary proteome analysis of irritable bowel syndrome (IBS) symptom subgroups 
Journal of proteome research  2012;11(12):5650-5662.
Irritable bowel syndrome (IBS) is a functional gastrointestinal (GI) disorder characterized by chronic abdominal pain associated with alterations in bowel function. Given the heterogeneity of the symptoms, multiple pathophysiologic factors are suspected to play a role. We classified women with IBS into four subgroups based on distinct symptom profiles. In-depth shotgun proteomic analysis was carried out to profile the urinary proteomes and identify possible proteins associated with these subgroups. First-void urine samples with a urine creatinine level ≥ 100 mg/dL were used after excluding samples that tested positive for blood. Urine from ten subjects representing each symptom subgroup was pooled for proteomic analysis. The urine proteome was analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) using a data-independent method known as Precursor Acquisition Independent From Ion Count (PAcIFIC) that allowed an extended detectable dynamic range. Differences in protein quantities were determined by peptide spectral counting, followed by validation of selected proteins with ELISA or a targeted selected reaction monitoring (LC-SRM/MS) approach. Four IBS symptom subgroups were selected: 1) Constipation, 2) Diarrhea + Low Pain, 3) Diarrhea + High Pain, and 4) High Pain + High Psychological Distress. A fifth group consisted of healthy control subjects. From comparisons of quantitative spectral counting data among the symptom subgroups and controls, a total of 18 proteins that showed quantitative differences in relative abundance and possible physiological relevance to IBS were selected for further investigation. Three of the 18 proteins were chosen for validation by either ELISA or SRM. Elevated expression of gelsolin (GSN) was associated with the high pain groups. Trefoil Factor 3 (TFF3) levels were higher in IBS groups compared to controls. In this study, IBS patients subclassified by predominant symptoms showed differences in urine proteome levels. 
Proteins showing distinctive changes are involved in homeostasis of intestinal function and inflammatory response. These findings warrant future studies with larger, independent cohorts to enable more extensive assessment and validation of urinary protein markers as a diagnostic tool in adults with IBS.
doi:10.1021/pr3004437
PMCID: PMC3631108  PMID: 22998556
biomarker; irritable bowel syndrome; mass spectrometry; proteomics; urine; women
18.  Comprehensive analysis of protein digestion using six trypsins reveals the origin of trypsin as a significant source of variability in proteomics
Journal of proteome research  2013;12(12):5666-5680.
Trypsin is an endoprotease commonly used for sample preparation in proteomics experiments. Importantly, protein digestion is dependent on multiple factors, including the trypsin origin and digestion conditions. In-depth characterization of trypsin activity could lead to improved reliability of peptide detection and quantitation in both targeted and discovery proteomics studies. To this end, we assembled a data analysis pipeline and suite of visualization tools for quality control and comprehensive characterization of pre-analytical variability in proteomics experiments. Using these tools, we evaluated six available proteomics-grade trypsins and their digestion of a single purified protein, human serum albumin (HSA). HSA was aliquoted and then digested for 2 or 18 hours with each trypsin, and the resulting digests were desalted and analyzed in triplicate by reversed phase liquid chromatography - tandem mass spectrometry. Peptides were identified and quantified using the NIST MSQC pipeline and a comprehensive HSA mass spectral library. We performed a statistical analysis of peptide abundances from different digests, and further visualized the data using principal component analysis and quantitative protein “sequence maps”. While the performance of individual trypsins across repeat digests was reproducible, significant differences were observed depending on the origin of the trypsin (i.e., bovine vs. porcine). Bovine trypsins produced a higher number of peptides containing missed cleavages, whereas porcine trypsins produced more semi-tryptic peptides. In addition, many cleavage sites showed variable digestion kinetics, evident from the comparison of peptide abundances in 2-hour vs. 18-hour digests. Overall, this work illustrates the effects of an often-neglected source of variability in proteomics experiments: the origin of the trypsin.
doi:10.1021/pr400611h
PMCID: PMC4076643  PMID: 24116745
proteomics; mass spectrometry; trypsin; digestion; endoprotease specificity; peptide abundance; variability; missed cleavages; label-free quantification; statistical analysis
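Entry 18 contrasts peptides with missed cleavages against semi-tryptic peptides. The sketch below shows how each category can be assigned using the simplified trypsin rule (cleavage C-terminal to K or R, but not before P). It is a generic illustration, not the NIST MSQC pipeline's logic. Protein termini are treated as valid cleavage ends, and `str.find` locates only the first occurrence of the peptide.

```python
def missed_cleavages(peptide: str) -> int:
    """Count internal K/R residues not followed by P (simplified trypsin rule)."""
    return sum(
        1
        for i, aa in enumerate(peptide[:-1])  # the C-terminal residue is not internal
        if aa in "KR" and peptide[i + 1] != "P"
    )

def cleavage_type(protein: str, peptide: str) -> str:
    """Classify a peptide as tryptic, semi-tryptic, or non-tryptic within its protein."""
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)  # exclusive end index
    n_term_ok = start == 0 or (protein[start - 1] in "KR" and protein[start] != "P")
    c_term_ok = end == len(protein) or (protein[end - 1] in "KR" and protein[end] != "P")
    if n_term_ok and c_term_ok:
        return "tryptic"
    if n_term_ok or c_term_ok:
        return "semi-tryptic"
    return "non-tryptic"
```

In the toy protein "MAGKLLSRAPEKVVGR", the peptide "LLSR" is fully tryptic, "LSR" is semi-tryptic because only its C-terminus follows the rule, and "LLSRAPEK" is tryptic but contains one missed cleavage.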
19.  Comprehensive analysis of the mouse renal cortex using two-dimensional HPLC – tandem mass spectrometry 
Proteome Science  2008;6:15.
Background
Proteomic methodologies increasingly have been applied to the kidney to map the renal cortical proteome and to identify global changes in renal proteins induced by diseases such as diabetes. While progress has been made in establishing a renal cortical proteome using 1-D or 2-DE and mass spectrometry, the number of proteins definitively identified by mass spectrometry has remained surprisingly small. Low coverage of the renal cortical proteome as well as our interest in diabetes-induced changes in proteins found in the renal cortex prompted us to perform an in-depth proteomic analysis of mouse renal cortical tissue.
Results
We report a large-scale analysis of the mouse renal cortical proteome using a strong cation exchange (SCX) prefractionation strategy combined with HPLC – tandem mass spectrometry. High-confidence identification of ~2,000 proteins, including cytoplasmic, nuclear, plasma membrane, extracellular and unknown/unclassified proteins, was obtained by separating tryptic peptides of renal cortical proteins into 60 fractions by SCX prior to LC-MS/MS. The identified proteins represented the renal cortical proteome with no discernible bias due to protein physicochemical properties, subcellular distribution, biological processes, or molecular function. The highest ranked molecular functions were characteristic of tubular epithelium, and included binding, catalytic activity, transporter activity, structural molecule activity, and carrier activity. Comparison of this renal cortical proteome with published human urinary proteomes demonstrated enrichment of renal extracellular, plasma membrane, and lysosomal proteins in the urine, with a lack of intracellular proteins. Comparison of the most abundant proteins based on normalized spectral abundance factor (NSAF) in this dataset versus a published glomerular proteome indicated enrichment of mitochondrial proteins in the former and cytoskeletal proteins in the latter.
Conclusion
A whole tissue extract of the mouse kidney cortex was analyzed by an unbiased proteomic approach, yielding a dataset of ~2,000 unique proteins identified with strict criteria to ensure a high level of confidence in protein identification. As a result of extracting all proteins from the renal cortex, we identified an exceptionally wide range of renal proteins in terms of pI, MW, hydrophobicity, abundance, and subcellular location. Many of these proteins, such as low-abundance proteins, membrane proteins and proteins with extreme values in pI or MW are traditionally under-represented in 2-DE-based proteomic analysis.
doi:10.1186/1477-5956-6-15
PMCID: PMC2412861  PMID: 18501002
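The normalized spectral abundance factor used in entry 19 divides each protein's spectral count by its length and then normalizes across all proteins in the run: NSAF_i = (SpC_i / L_i) / Σ_j (SpC_j / L_j). Here is a minimal sketch of that formula. The dictionary-based inputs are an illustrative choice.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor per protein.

    spectral_counts: {protein: spectral count}; lengths: {protein: length in residues}.
    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j)
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}
```

Two proteins with 10 spectra each but lengths of 100 and 200 residues receive NSAF values of 2/3 and 1/3. The length correction keeps long proteins from appearing more abundant simply because they yield more peptides.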
20.  Equivalence of Protein Inventories Obtained from Formalin-fixed Paraffin-embedded and Frozen Tissue in Multidimensional Liquid Chromatography-Tandem Mass Spectrometry Shotgun Proteomic Analysis
Formalin-fixed paraffin-embedded (FFPE) tissue specimens comprise a potentially valuable resource for retrospective biomarker discovery studies, and recent work indicates the feasibility of using shotgun proteomics to characterize FFPE tissue proteins. A critical question in the field is whether proteomes characterized in FFPE specimens are equivalent to proteomes in corresponding fresh or frozen tissue specimens. Here we compared shotgun proteomic analyses of frozen and FFPE specimens prepared from the same colon adenoma tissues. Following deparaffinization, rehydration, and tryptic digestion under mild conditions, FFPE specimens corresponding to 200 μg of protein yielded ∼400 confident protein identifications in a one-dimensional reverse phase liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis. The major difference between frozen and FFPE proteomes was a decrease in the proportions of lysine C-terminal to arginine C-terminal peptides observed, but these differences had little effect on the proteins identified. No covalent peptide modifications attributable to formaldehyde chemistry were detected by analyses of the MS/MS datasets, which suggests that undetected, cross-linked peptides comprise the major class of modifications in FFPE tissues. Fixation of tissue for up to 2 days in neutral buffered formalin did not adversely impact protein identifications. Analysis of archival colon adenoma FFPE specimens indicated equivalent numbers of MS/MS spectral counts and protein group identifications from specimens stored for 1, 3, 5, and 10 years. Combination of peptide isoelectric focusing-based separation with reverse phase LC-MS/MS identified 2554 protein groups in 600 ng of protein from frozen tissue and 2302 protein groups from FFPE tissue with at least two distinct peptide identifications per protein. Analysis of the combined frozen and FFPE data showed a 92% overlap in the protein groups identified. 
Comparison of gene ontology categories of identified proteins revealed no bias in protein identification based on subcellular localization. Although the status of posttranslational modifications was not examined in this study, archival samples displayed a modest increase in methionine oxidation, from ∼17% after one year of storage to ∼25% after 10 years. These data demonstrate the equivalence of proteome inventories obtained from FFPE and frozen tissue specimens and provide support for retrospective proteomic analysis of FFPE tissues for biomarker discovery.
doi:10.1074/mcp.M800518-MCP200
PMCID: PMC2722776  PMID: 19467989
21.  MaXIC-Q Web: a fully automated web service using statistical and computational methods for protein quantitation based on stable isotope labeling and LC–MS 
Nucleic Acids Research  2009;37(Web Server issue):W661-W669.
Isotope labeling combined with liquid chromatography–mass spectrometry (LC–MS) provides a robust platform for analyzing differential protein expression in proteomics research. We present a web service, called MaXIC-Q Web (http://ms.iis.sinica.edu.tw/MaXIC-Q_Web/), for quantitation analysis of large-scale datasets generated from proteomics experiments using various stable isotope-labeling techniques, e.g. SILAC, ICAT and user-developed labeling methods. It accepts spectral files in the standard mzXML format and search results from SEQUEST, Mascot and ProteinProphet as input. Furthermore, MaXIC-Q Web uses statistical and computational methods to construct two kinds of elution profiles for each ion, namely, PIMS (projected ion mass spectrum) and XIC (extracted ion chromatogram), from MS data. Toward accurate quantitation, a stringent validation procedure is performed on PIMSs to filter out peptide ions affected by interference from co-eluting peptides or noise. The areas of XICs determine ion abundances, which are used to calculate peptide and protein ratios. Because MaXIC-Q Web applies stringent validation to spectral data, it achieves high accuracy, so manual validation effort can be substantially reduced. Furthermore, it provides various visualization diagrams and comprehensive quantitation reports so that users can conveniently inspect quantitation results. In summary, MaXIC-Q Web is a user-friendly, interactive, robust, generic web service for quantitation based on ICAT and SILAC labeling techniques.
doi:10.1093/nar/gkp476
PMCID: PMC2703943  PMID: 19528069
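Entry 21 states that XIC areas determine ion abundances, which are then used for peptide ratios. The sketch below integrates XIC traces with the trapezoidal rule and takes a light-to-heavy ratio. The integration method and the ratio direction are assumptions, since MaXIC-Q Web's exact procedure is not given in the abstract, and both traces are assumed to be sampled at the same retention times.

```python
def xic_area(times, intensities):
    """Area under an extracted ion chromatogram by the trapezoidal rule."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have equal length")
    return sum(
        (t1 - t0) * (y0 + y1) / 2
        for t0, t1, y0, y1 in zip(times, times[1:], intensities, intensities[1:])
    )

def light_heavy_ratio(times, light, heavy):
    """Light/heavy abundance ratio from two XICs sampled at the same retention times."""
    return xic_area(times, light) / xic_area(times, heavy)
```

Integrating over the whole elution profile, instead of using a single apex intensity, makes the ratio less sensitive to scan-to-scan noise.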
22.  MultiAlign: a multiple LC-MS analysis tool for targeted omics analysis 
BMC Bioinformatics  2013;14:49.
Background
MultiAlign is a free software tool that aligns multiple liquid chromatography-mass spectrometry datasets to one another by clustering mass and chromatographic elution features across datasets. Applicable to both label-free proteomics and metabolomics comparative analyses, the software can be operated in several modes. For example, clustered features can be matched to a reference database to identify analytes, used to generate abundance profiles, linked to tandem mass spectra based on parent precursor masses, and culled for targeted liquid chromatography-tandem mass spectrometric analysis. MultiAlign is also capable of tandem mass spectral clustering to describe proteome structure and find similarity in subsequent sample runs.
Results
MultiAlign was applied to two large proteomics datasets obtained from liquid chromatography-mass spectrometry analyses of environmental samples. Peptides in the datasets for a microbial community that had a known metagenome were identified by matching mass and elution time features to those in an established reference peptide database. Results compared favorably with those obtained using existing tools such as VIPER, but with the added benefit of being able to trace clusters of peptides across conditions to existing tandem mass spectra. MultiAlign was further applied to detect clusters across experimental samples derived from a reactor biomass community for which no metagenome was available. Several clusters were culled for further analysis to explore changes in the community structure. Lastly, MultiAlign was applied to liquid chromatography-mass spectrometry-based datasets obtained from a previously published study of wild type and mitochondrial fatty acid oxidation enzyme knockdown mutants of human hepatocarcinoma to demonstrate its utility for analyzing metabolomics datasets.
Conclusion
MultiAlign is an efficient software package for finding similar analytes across multiple liquid chromatography-mass spectrometry feature maps, as demonstrated here for both proteomics and metabolomics experiments. The software is particularly useful for proteomic studies where little or no genomic context is known, such as with environmental proteomics.
doi:10.1186/1471-2105-14-49
PMCID: PMC3599190  PMID: 23398735
Metabolomics; Proteomics; Mass spectrometry; Liquid chromatography; Spectral clustering; Alignment
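Entry 22 describes clustering mass and chromatographic elution features across datasets. The sketch below is a deliberately simple greedy version. Features are sorted by mass, and each one joins the first cluster whose mean mass is within a ppm tolerance and whose mean normalized elution time (NET) is within a NET tolerance. This is not MultiAlign's actual algorithm, and the tolerances and class names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    mass: float     # monoisotopic mass (Da)
    net: float      # normalized elution time, 0-1
    dataset: str

@dataclass
class Cluster:
    features: list = field(default_factory=list)

    @property
    def mass(self):
        return sum(f.mass for f in self.features) / len(self.features)

    @property
    def net(self):
        return sum(f.net for f in self.features) / len(self.features)

def cluster_features(features, ppm_tol=10.0, net_tol=0.02):
    """Greedy clustering: add each feature to the first cluster within both tolerances."""
    clusters = []
    for f in sorted(features, key=lambda x: x.mass):
        for c in clusters:
            ppm = abs(f.mass - c.mass) / c.mass * 1e6
            if ppm <= ppm_tol and abs(f.net - c.net) <= net_tol:
                c.features.append(f)
                break
        else:  # no existing cluster matched
            clusters.append(Cluster([f]))
    return clusters
```

A cluster that contains features from several datasets can then be used to build an abundance profile across conditions, or matched against a reference database by its mean mass and NET.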
23.  Comparison of spectral counting and metabolic stable isotope labeling for use with quantitative microbial proteomics 
The Analyst  2006;131(12):1335-1341.
Summary
Spectral counting, a promising method for quantifying relative changes in protein abundance in mass spectrometry-based proteomic analysis, was compared to metabolic stable isotope labeling using 15N/14N “heavy/light” peptide pairs. The data were drawn primarily from a Methanococcus maripaludis experiment comparing a wild-type strain with a mutant deficient in a key enzyme relevant to energy metabolism. The dataset contained both proteome and transcriptome measurements. The normalization technique used previously for the isotopic measurements was inappropriate for spectral counting, but a simple adjustment for sampling frequency was sufficient for normalization. This adjustment was satisfactory both for M. maripaludis, an organism that showed relatively little expression change between the wild-type and mutant strains, and for Porphyromonas gingivalis, an intracellular pathogen that has demonstrated widespread changes between intracellular and extracellular conditions. Spectral counting showed lower overall sensitivity, defined in terms of detecting a two-fold change in protein expression, and to achieve the same level of quantitative proteome coverage as the stable isotope method, it would have required approximately doubling the number of mass spectra collected.
doi:10.1039/b610957h
PMCID: PMC2660848  PMID: 17124542
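The abstract mentions a simple adjustment for sampling frequency and a two-fold-change criterion but does not give the formula. The sketch below assumes the adjustment amounts to dividing each protein's spectral count by the total number of spectra in its run. The pseudocount, the function names and the example counts are hypothetical illustrations, not the published procedure.

```python
import math
from typing import Dict, List


def log2_ratios(wt: Dict[str, int], mut: Dict[str, int],
                pseudocount: float = 0.5) -> Dict[str, float]:
    """log2(mutant / wild-type) spectral-count ratio per protein, after scaling
    each run by its total spectral count (assumed sampling-frequency adjustment)."""
    wt_total = sum(wt.values())
    mut_total = sum(mut.values())
    ratios: Dict[str, float] = {}
    for protein in sorted(set(wt) | set(mut)):
        # The pseudocount keeps proteins seen in only one run from giving log2(0).
        wt_frac = (wt.get(protein, 0) + pseudocount) / wt_total
        mut_frac = (mut.get(protein, 0) + pseudocount) / mut_total
        ratios[protein] = math.log2(mut_frac / wt_frac)
    return ratios


def two_fold_changes(ratios: Dict[str, float], threshold: float = 1.0) -> List[str]:
    """Proteins whose normalized ratio is at least two-fold (|log2 ratio| >= 1)."""
    return [p for p, r in ratios.items() if abs(r) >= threshold]


# Hypothetical counts: the mutant run collected 180 spectra, the wild type 100.
wt = {"A": 50, "B": 10, "C": 40}
mut = {"A": 100, "B": 60, "C": 20}
r = log2_ratios(wt, mut)
print(two_fold_changes(r))  # ['B', 'C']
```

Without normalization, protein A would appear to double simply because the mutant run sampled more spectra. After scaling by run totals, its ratio stays near 1 (log2 of about 0.14). Low counts also make such ratios noisy, which is consistent with the lower sensitivity reported for spectral counting.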
24.  Introducing Inexperienced Users to Label-free Quantitative Proteomics 
This presentation will describe our experiences moving facility users and collaborators from protein identification experiments to quantitative proteomics experiments. The platform is nano-liquid chromatography with data-dependent tandem mass spectrometry on a high-resolution instrument. This setup is typical of bottom-up proteomics experiments around the world and yields high-resolution precursor ion information, low-resolution product ion information, and retention time. The focus will be on single run-to-run comparisons rather than more complex multidimensional (MudPIT) experiments.
Typically, new users require 'protein identification' of a band on a gel. When they are presented with a list of protein hits, they start to ask questions about quantification. At this point we introduce the concept of spectral counting: put simply, the most abundant protein in the sample yields the most identified peptides. We also describe the exceptions and caveats at this stage.
As users become more comfortable with the data, they usually go on to perform a quantitative proteomics experiment, for example, comparing two nLC-MS/MS data files from two different conditions. Because users at this stage are already familiar with protein identification software, we supplement it with a popular commercial spectral counting algorithm that lets them view and manipulate their data outside the facility. As users gain more experience in quantitative proteomics, they can be mentored in experimental design and statistics. At some point the user typically asks whether there is a more rigorous quantification strategy than spectral counting. We then discuss the use of signal intensity for specific precursors and the software packages that use this information. The presentation will be illustrated with examples ranging from radiation-induced changes in bodily fluids to the response of the brain dialysate peptidome to drugs of abuse.
PMCID: PMC3630544
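The abstract contrasts spectral counting ("the most abundant protein yields the most identified peptides") with quantification based on precursor signal intensity. The sketch below computes both from a list of identified spectra so the two rankings can be compared. The input format, function names, protein names and intensities are all hypothetical. Summing per-spectrum precursor intensities is a deliberate simplification: intensity-based software typically integrates extracted ion chromatograms instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize(psms: Iterable[Tuple[str, float]]) -> Tuple[Dict[str, int], Dict[str, float]]:
    """psms: (protein, precursor_intensity) pairs, one per identified MS/MS spectrum.
    Returns (spectral count per protein, summed precursor intensity per protein)."""
    counts: Dict[str, int] = defaultdict(int)
    intensity: Dict[str, float] = defaultdict(float)
    for protein, precursor_intensity in psms:
        counts[protein] += 1
        intensity[protein] += precursor_intensity
    return dict(counts), dict(intensity)


def rank_by(metric: Dict[str, float]) -> List[str]:
    """Proteins ordered from highest to lowest value of the metric."""
    return sorted(metric, key=metric.get, reverse=True)


# Hypothetical single-run identifications.
psms = [("ALB", 5e6), ("ALB", 4e6), ("ALB", 1e6), ("APOA1", 2e7), ("APOA1", 1.5e7)]
counts, intensity = summarize(psms)
print(rank_by(counts))     # ['ALB', 'APOA1']
print(rank_by(intensity))  # ['APOA1', 'ALB']
```

Here the two measures disagree: one protein yields more identified spectra, while the other has much stronger precursor signal. This is the kind of caveat that prompts users to ask about intensity-based strategies.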
25.  COMPASS: a suite of pre- and post-search proteomics software tools for OMSSA 
Proteomics  2011;11(6):1064-1074.
Here we present the Coon OMSSA Proteomic Analysis Software Suite (COMPASS): a free and open-source software pipeline for high-throughput analysis of proteomics data, designed around the Open Mass Spectrometry Search Algorithm. We detail a synergistic set of tools for protein database generation, spectral reduction, peptide false discovery rate analysis, peptide quantitation via isobaric labeling, protein parsimony and protein false discovery rate analysis, and protein quantitation. We strive for maximum ease of use, providing graphical user interfaces and working with data files in the original instrument vendor format. Results are stored in plain-text comma-separated values files, which are easy to view and manipulate with a text editor or spreadsheet program. We illustrate the operation and efficacy of COMPASS using two LC–MS/MS datasets. The first, a highly annotated mixture of standard proteins and manually validated contaminants, demonstrates the identification workflow. The second, a set of yeast peptides labeled with isobaric stable isotope tags and mixed in known ratios, demonstrates the quantitative workflow. For these two datasets, COMPASS performs as well as or better than the current de facto standard, the Trans-Proteomic Pipeline.
doi:10.1002/pmic.201000616
PMCID: PMC3049964  PMID: 21298793
Informatics; Protein identification; Protein quantitation; Proteomics; Software
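COMPASS includes peptide false discovery rate (FDR) analysis, but the abstract does not say how the FDR is estimated. The sketch below shows a generic target-decoy estimate (FDR = decoys / targets above a score threshold) converted to q-values. This is a common approach but an assumption here, not COMPASS's documented algorithm. The function names, score scale and example data are hypothetical, and tied scores are not given any special treatment.

```python
from typing import List, Tuple


def target_decoy_qvalues(psms: List[Tuple[float, bool]]) -> List[float]:
    """psms: (score, is_decoy) pairs, higher score = better match.
    Returns a q-value for each PSM, in input order."""
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were set at this PSM.
        fdr[i] = decoys / max(targets, 1)
    # q-value: smallest FDR at which this PSM would still be accepted.
    q = [0.0] * len(psms)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return q


def accept(psms: List[Tuple[float, bool]], max_q: float = 0.01) -> List[int]:
    """Indices of target (non-decoy) PSMs with q-value <= max_q."""
    q = target_decoy_qvalues(psms)
    return [i for i, (_, is_decoy) in enumerate(psms) if not is_decoy and q[i] <= max_q]


# Hypothetical search results: (score, is_decoy).
psms = [(50, False), (45, False), (40, True), (38, False), (30, True)]
print(accept(psms))             # [0, 1]
print(accept(psms, max_q=0.4))  # [0, 1, 3]
```

Taking the running minimum makes q-values monotone in score, so loosening the threshold never drops a PSM that a stricter threshold accepted.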
