Data-independent tandem mass spectrometry isolates and fragments all of the molecular species within a given mass-to-charge window, regardless of whether a precursor ion was detected within the window. For shotgun proteomics on complex protein mixtures, data-independent MS/MS offers certain advantages over traditional data-dependent MS/MS: identification of low-abundance peptides with insignificant precursor peaks; more direct relative quantification, free of biases caused by competing precursors and dynamic exclusion; and faster throughput due to simultaneous fragmentation of multiple peptides. However, data-independent MS/MS, especially on low-resolution ion-trap instruments, strains standard peptide identification programs because the peptide precursor mass is known less precisely and many spectra are composed of two or more peptides. Here we describe a computer program called DeMux that deconvolves mixture spectra and improves the peptide identification rate by ~25%. We compare the number of identifications made by data-independent and data-dependent MS/MS at the peptide and protein levels: conventional data-dependent MS/MS makes more identifications but is less reproducible from run to run.
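Why mixture spectra are common in data-independent MS/MS follows from simple arithmetic: every precursor whose m/z falls inside the isolation window is fragmented together. The sketch below is not DeMux; it uses hypothetical peptide names and masses to list the ions that a wide window co-isolates compared with a narrow one.

```python
# Sketch: which peptide precursor ions are co-isolated by one isolation window?
# Peptide names and neutral masses are hypothetical, for illustration only.
PROTON_MASS = 1.007276  # mass of a proton (Da)


def precursor_mz(neutral_mass, charge):
    """m/z of the [M + zH]^z+ ion of a peptide with the given neutral mass."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def coisolated(peptide_masses, window_low, window_high, charges=(2, 3)):
    """Return (peptide, charge, m/z) for every ion whose m/z lies in [low, high)."""
    hits = []
    for name, mass in peptide_masses.items():
        for z in charges:
            mz = precursor_mz(mass, z)
            if window_low <= mz < window_high:
                hits.append((name, z, mz))
    return hits


masses = {"pepA": 1000.50, "pepB": 1004.00, "pepC": 1500.00}
wide = coisolated(masses, 500.0, 510.0)    # 10 m/z wide, DIA-style window
narrow = coisolated(masses, 501.2, 501.3)  # narrow window around pepA 2+
print([(n, z) for n, z, _ in wide])    # [('pepA', 2), ('pepB', 2), ('pepC', 3)]
print([(n, z) for n, z, _ in narrow])  # [('pepA', 2)]
```

Every ion in the wide window contributes fragments to the same spectrum, which is the situation a mixture-deconvolution step has to untangle.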
We describe the use of a targeted proteomics approach, Selected Reaction Monitoring (SRM) mass spectrometry, to detect and assess RNAi-mediated depletion or ‘knockdown’ of specific proteins from human cells and from Drosophila flies. This label-free approach does not require any specific reagents to confirm the depletion of RNAi target protein(s) in unfractionated cell or whole organism extracts. The protocol described here is general, can be developed rapidly and can be multiplexed to detect and measure multiple proteins at once. Furthermore, the methodology can be extended to any tandem mass spectrometer, making it widely accessible. This methodology will be applicable to a wide range of basic science and clinical questions where RNAi-mediated protein depletion needs to be verified, or where differences in relative abundance of target proteins need to be rapidly assessed between samples.
Mass spectrometry; selected reaction monitoring; RNAi
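One common way to express knockdown from SRM data is the fraction of target protein remaining in the RNAi sample relative to a control. The sketch below assumes, which the abstract does not specify, that each peptide is quantified as the sum of its transition peak areas and that the target is normalized to an unaffected reference protein; the function names and peak areas are hypothetical.

```python
def peptide_area(transition_areas):
    """Sum the integrated peak areas of a peptide's SRM transitions."""
    return sum(transition_areas)


def fraction_remaining(target_rnai, reference_rnai, target_ctrl, reference_ctrl):
    """Target/reference ratio in the RNAi sample divided by the same ratio in the control."""
    return (target_rnai / reference_rnai) / (target_ctrl / reference_ctrl)


# Hypothetical transition peak areas (arbitrary units).
target_rnai = peptide_area([1200, 800, 400])       # 2400
reference_rnai = peptide_area([5000, 3000, 2000])  # 10000
target_ctrl = peptide_area([6000, 4000, 2000])     # 12000
reference_ctrl = peptide_area([5000, 3000, 2000])  # 10000
remaining = fraction_remaining(target_rnai, reference_rnai, target_ctrl, reference_ctrl)
print(f"{remaining:.0%} of the target protein remains")  # 20% of the target protein remains
```

Normalizing to a reference protein is a design choice that compensates for differences in total sample loading between the RNAi and control extracts.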
Abalone, a broadcast spawning marine mollusk, is an important model for molecular interactions and positive selection in fertilization, but the focus has previously been on only two sperm proteins, lysin and sp18. We used genomic and proteomic techniques to bring new insights to this model by characterizing the testis transcriptome and sperm proteome of the red abalone Haliotis rufescens. One pair of homologous, testis-specific proteins contains a secretion signal and is small, abundant, and associated with the acrosome. Comparative analysis revealed that homologs are extremely divergent between species and show strong evidence for positive selection. The acrosomal localization and rapid evolution of these proteins indicate that they play an important role in fertilization and could be involved in the species specificity of sperm-egg interactions in abalone. Our genomic and proteomic characterization of abalone fertilization resulted in the identification of interesting, novel peptides that have eluded detection in this important model system for 20 years.
Statistical process control (SPC) is a robust set of tools that aids in the visualization, detection, and identification of assignable causes of variation in any process that creates products, services, or information. A tool termed Statistical Process Control in Proteomics (SProCoP) has been developed that implements aspects of SPC (e.g., control charts and Pareto analysis) in the Skyline proteomics software. It monitors five quality control metrics in a shotgun or targeted proteomic workflow, none of which requires peptide identification. The source code, written in the R statistical language, runs directly from the Skyline interface, which supports the use of raw data files from several mass spectrometry vendors. Via control charts, it provides real-time evaluation of chromatographic performance (e.g., retention time reproducibility, peak asymmetry, and resolution) and mass spectrometric performance (targeted peptide ion intensity and, for high resolving power instruments, mass measurement accuracy). Thresholds are experiment- and instrument-specific; they are determined empirically from user-defined quality control standards and allow random noise to be distinguished from systematic error. Finally, Pareto analysis summarizes the performance metrics and directs the user to metrics with high variance. The utility of these charts for evaluating proteomic experiments is illustrated in two case studies.
Quality Control; Statistical Process Control; Proteomics; Mass Spectrometry; Shewhart Control Charts
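The control-chart and Pareto logic described for SProCoP can be sketched in a few lines. This Python sketch is not SProCoP's R code: it assumes Shewhart limits at the baseline mean ± 3 standard deviations computed from QC runs, and a Pareto table that ranks metrics by their number of out-of-control points. Metric names and values are hypothetical.

```python
import statistics


def control_limits(baseline, k=3.0):
    """Shewhart limits (lower, center, upper) from baseline QC measurements."""
    center = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    return center - k * sd, center, center + k * sd


def out_of_control(values, limits):
    """Indices of runs whose metric value falls outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def pareto(violations):
    """Rank metrics by violation count and add the cumulative percentage."""
    total = sum(violations.values())
    rows, running = [], 0
    for metric, count in sorted(violations.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((metric, count, 100.0 * running / total if total else 0.0))
    return rows


# Hypothetical retention times (min) of one QC peptide in baseline and new runs.
limits = control_limits([10.0, 10.2, 9.8, 10.1, 9.9])
print(out_of_control([10.0, 10.6, 9.4, 10.3], limits))  # [1, 2]
print(pareto({"retention time": 3, "peak width": 1}))
```

Deriving the limits from the user's own QC standards, rather than fixed tolerances, is what makes the thresholds experiment- and instrument-specific.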
Reversed-phase liquid chromatography is the most commonly used separation method for shotgun proteomics. Nanoflow chromatography has emerged as the preferred chromatography method because of its increased sensitivity and separation. Despite its common use, a wide range of parameters and conditions is used across research groups. These parameters affect the quality of the chromatographic separation, which is critical to maximizing the number of peptide identifications and minimizing ion suppression. Here we examined the relationship between column length, gradient length, peptide identifications, and peptide peak capacity. We found that while longer columns and gradients generally increase peptide identifications, the degree of improvement depends on both parameters and diminishes at longer column and gradient lengths. Peak capacity, in comparison, increased more linearly with column and gradient length. We discuss the discrepancy between these two results and some of the considerations that should be taken into account when choosing the chromatographic conditions for a proteomics experiment.
Shotgun proteomics; nanoflow liquid chromatography; peak capacity
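Gradient peak capacity is commonly estimated as n_c = 1 + t_g / w, where t_g is the gradient time and w is the average baseline (4σ) peak width. Whether this study used exactly this definition is an assumption; the sketch converts measured full widths at half maximum (FWHM) to 4σ widths assuming Gaussian peaks. The widths are hypothetical.

```python
import math

# For a Gaussian peak, FWHM = 2*sqrt(2*ln 2)*sigma, so 4*sigma = FWHM * 4 / 2.3548...
FWHM_TO_4SIGMA = 4.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def peak_capacity(gradient_min, fwhm_min):
    """Gradient peak capacity n_c = 1 + t_g / w from a list of peak FWHMs (min)."""
    mean_fwhm = sum(fwhm_min) / len(fwhm_min)
    baseline_width = mean_fwhm * FWHM_TO_4SIGMA
    return 1.0 + gradient_min / baseline_width


# Hypothetical: the same 0.25 min FWHM on a 60 min and a 120 min gradient.
print(round(peak_capacity(60, [0.25]), 1))
print(round(peak_capacity(120, [0.25]), 1))
```

Doubling the gradient doubles n_c only if peaks do not broaden; the measured widths, not the gradient time alone, decide how close to linear the increase is.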
We report an algorithm designed for the calibration of low-resolution peptide mass spectra. Our algorithm is implemented in a program called FineTune, which corrects systematic mass measurement error in one minute, with no input required besides the mass spectra themselves. The mass measurement accuracy for a set of spectra collected on an LTQ-Velos improved 20-fold, from −0.1776 ± 0.0010 m/z to 0.0078 ± 0.0006 m/z after calibration (mean ± 95% confidence interval). The precision of mass measurement also improved because the calibration corrects non-linear variation in mass measurement accuracy across the m/z range.
Mass measurement accuracy; shotgun proteomics; linear ion trap
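The abstract does not spell out FineTune's algorithm, so the following is only a hedged illustration of correcting non-linear, m/z-dependent error. It assumes pairs of observed and expected m/z are already available (how FineTune derives reference values from the spectra alone is not covered here), takes the median error in fixed-width m/z bins, and subtracts the offset of the nearest populated bin. The 100 m/z bin width and the data are arbitrary choices.

```python
from statistics import median


def fit_binned_offsets(observed_mz, expected_mz, bin_width=100.0):
    """Median mass error (observed - expected) in each fixed-width m/z bin."""
    errors_by_bin = {}
    for obs, exp in zip(observed_mz, expected_mz):
        errors_by_bin.setdefault(int(obs // bin_width), []).append(obs - exp)
    return {b: median(errs) for b, errs in errors_by_bin.items()}


def calibrate(mz, offsets, bin_width=100.0):
    """Subtract the offset of this m/z bin, or of the nearest populated bin."""
    b = int(mz // bin_width)
    nearest = min(offsets, key=lambda k: abs(k - b))
    return mz - offsets[nearest]


# Hypothetical reference pairs whose error differs across the m/z range.
offsets = fit_binned_offsets([400.2, 450.2, 800.1, 850.1], [400.0, 450.0, 800.0, 850.0])
print(round(calibrate(420.2, offsets), 4), round(calibrate(1000.1, offsets), 4))  # 420.0 1000.0
```

A single global offset would remove only the average error; a per-bin offset also removes error that varies with m/z, which is what improves precision.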
Multiple reaction monitoring (MRM) has recently become the method of choice for targeted quantitative measurement of proteins using mass spectrometry. The method, however, is limited in the number of peptides that can be measured in one run. This number can be markedly increased by scheduling the acquisition if the accurate retention time (RT) of each peptide is known.
Here we present iRT, an empirically derived dimensionless peptide-specific value that allows for highly accurate RT prediction. The iRT of a peptide is a fixed number relative to a standard set of reference iRT-peptides that can be transferred across laboratories and chromatographic systems.
We show that iRT facilitates the setup of multiplexed experiments with acquisition windows more than four times narrower than those required with in silico RT predictions, resulting in improved quantification accuracy. iRTs can be determined by any laboratory and shared transparently. The iRT concept has been implemented in Skyline, the most widely used software for MRM experiments.
Mass spectrometry; multiplexing; proteomics methods; optimization; quantitative analysis
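Applying iRT on a given system amounts to a linear calibration: measure the reference peptides, regress their observed RT on their iRT values, then convert each target's iRT to a predicted RT and center a scheduling window on it. The sketch assumes an ordinary least-squares line; the reference and target values and the peptide name are hypothetical, not the published iRT standard values.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def schedule(target_irt, slope, intercept, window_min):
    """Map each target's iRT to a predicted RT (min) and a (start, end) window."""
    windows = {}
    for peptide, irt in target_irt.items():
        rt = slope * irt + intercept
        windows[peptide] = (rt - window_min / 2, rt + window_min / 2)
    return windows


# Hypothetical reference peptides: iRT values and RTs observed on this LC system.
ref_irt = [0.0, 50.0, 100.0]
ref_rt = [10.0, 35.0, 60.0]
slope, intercept = fit_line(ref_irt, ref_rt)
print(schedule({"TARGETPEPTIDEK": 20.0}, slope, intercept, window_min=4.0))
# {'TARGETPEPTIDEK': (18.0, 22.0)}
```

Because only the slope and intercept depend on the instrument, the same target iRT values can be reused on another LC system after re-measuring the reference peptides.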
In mass spectrometry-based proteomics, data-independent acquisition (DIA) strategies can acquire a single dataset that is useful for both identification and quantification of detectable peptides in a complex mixture. Despite this, DIA is often overlooked because its data are noisier, a result of the typical five- to tenfold reduction in precursor selectivity compared with data-dependent acquisition or selected reaction monitoring. We demonstrate a multiplexing technique that improves precursor selectivity fivefold.
Data Independent Acquisition; Q-Exactive; Multiplexing; Targeted Proteomics; Shotgun Proteomics
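Multiplexed DIA regains narrow-window selectivity by isolating several narrow windows per scan and later solving for each window's contribution. As a hedged sketch, assume each observed fragment intensity is the sum of contributions from the windows co-isolated in that scan. The design matrix and intensities below are hypothetical, and the solver is plain least squares with negative values clipped to zero; a true non-negative least-squares solver would be more appropriate for real, noisy data.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def demultiplex(design, observed):
    """Least-squares per-window intensities via the normal equations, clipped at 0."""
    k = len(design[0])
    ata = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    atb = [sum(row[i] * y for row, y in zip(design, observed)) for i in range(k)]
    return [max(0.0, v) for v in solve_linear(ata, atb)]


# Rows = scans, columns = narrow windows; 1 means the window was co-isolated in that scan.
design = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
]
true_signal = [3.0, 0.0, 5.0, 2.0]  # hypothetical fragment intensity per window
observed = [sum(d * s for d, s in zip(row, true_signal)) for row in design]
print([round(v, 6) for v in demultiplex(design, observed)])  # [3.0, 0.0, 5.0, 2.0]
```

The design must vary which windows are combined from scan to scan; if the matrix is rank deficient, the individual windows cannot be separated.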
Hardklör and Krönik are software tools for feature detection and data reduction of high-resolution mass spectra. Hardklör reduces peptide isotope distributions to a single monoisotopic mass and charge state, and can deconvolve overlapping peptide isotope distributions. Krönik filters, validates, and summarizes peptide features identified with Hardklör from data obtained during liquid chromatography-mass spectrometry (LC-MS). Both tools have a simple user interface and can be run on nearly any desktop computer. These tools are freely available from http://proteome.gs.washington.edu/software/hardklor.
proteomics; mass spectrometry; liquid chromatography; high resolution; feature detection; deisotoping; peptide isotope distribution
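Reducing an isotope distribution to a monoisotopic mass and charge rests on simple arithmetic: adjacent isotope peaks are separated by approximately 1.00336/z in m/z, the mass difference between 13C and 12C divided by the charge. Hardklör itself fits modeled isotope distributions and resolves overlaps; this sketch, with hypothetical peak lists and tolerance, only shows the spacing and mass arithmetic.

```python
C13_C12_SPACING = 1.0033548  # mass difference between 13C and 12C (Da)
PROTON_MASS = 1.007276       # mass of a proton (Da)


def infer_charge(peak_mzs, max_charge=6, tol=0.01):
    """Charge whose expected isotope spacing (1.00335 / z) matches the mean spacing."""
    gaps = [b - a for a, b in zip(peak_mzs, peak_mzs[1:])]
    spacing = sum(gaps) / len(gaps)
    for z in range(1, max_charge + 1):
        if abs(spacing - C13_C12_SPACING / z) <= tol:
            return z
    return None


def monoisotopic_mass(mono_mz, charge):
    """Neutral monoisotopic mass from the monoisotopic peak m/z and the charge."""
    return charge * (mono_mz - PROTON_MASS)


# Hypothetical isotope peaks of a 2+ ion (monoisotopic peak first).
peaks = [500.0, 500.50168, 501.00335]
z = infer_charge(peaks)
print(z, round(monoisotopic_mass(peaks[0], z), 4))  # 2 997.9854
```

At high charge states the expected spacings crowd together, which is one reason full distribution fitting is more robust than spacing alone.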
Understanding the genetic basis of reproductive isolation promises insight into speciation and the origins of biological diversity. While progress has been made in identifying genes underlying barriers to reproduction that function after fertilization (post-zygotic isolation), we know much less about earlier-acting pre-zygotic barriers. Of particular interest are barriers involved in mating and fertilization that can evolve extremely rapidly under sexual selection, suggesting they may play a prominent role in the initial stages of reproductive isolation. A significant challenge to the field of speciation genetics is developing new approaches for identifying candidate genes underlying these barriers, particularly among non-traditional model systems. We employ powerful proteomic and genomic strategies to study the genetic basis of conspecific pollen precedence, an important component of pre-zygotic reproductive isolation among yellow monkeyflowers (Mimulus spp.) resulting from male pollen competition. We use isotopic labeling in combination with shotgun proteomics to identify more than 2,000 male function (pollen tube) proteins within maternal reproductive structures (styles) of M. guttatus flowers where pollen competition occurs. We then sequence array-captured pollen tube exomes from a large outcrossing population of M. guttatus, and identify those genes with evidence of selective sweeps or balancing selection consistent with a role in pollen competition. We also test for evidence of positive selection on these genes more broadly across yellow monkeyflowers, because a signal of adaptive divergence is a common feature of genes causing reproductive isolation. Together, the molecular evolution studies identify 159 pollen tube proteins that are candidate genes for conspecific pollen precedence.
Our work demonstrates how powerful proteomic and genomic tools can be readily adapted to non-traditional model systems, allowing for genome-wide screens towards the goal of identifying the molecular basis of genetically complex traits.
Barriers to reproduction are necessary for generating new species. Little is known about the genes underlying reproductive barriers, particularly those that function prior to fertilization, but their identity is of great interest as they offer insight into the genetic mechanisms and evolutionary forces generating biological diversity. In this work, we use an emerging plant model system for speciation studies (yellow monkeyflowers, species of Mimulus) to identify genes that might influence the relative competitive abilities of male pollen from the same versus different species within the maternal flower's style. This is a common reproductive barrier among plant taxa known as conspecific pollen precedence (CPP), and is analogous to sperm competition during animal fertilization. We first identify the pollen proteins that are found within the style where pollen competition occurs, and then screen these for evidence that may indicate which genes have been targets of pollen competition (a form of sexual selection among individuals of a population) or adaptive diversification among species of yellow monkeyflowers (a common feature of genes underlying reproductive barriers). Our evolutionary analyses identify 159 candidates that may function in reproductive isolation of yellow monkeyflowers, and provide some of the first broad perspectives on evolution of plant reproductive genes.
We report the implementation of front-end higher-energy collision-induced dissociation (fHCD) on a bench-top dual-pressure linear ion trap. Software and hardware modifications, described in detail vide infra, were employed to allow isolated ions to undergo collisions with ambient gas molecules in an intermediate multipole (q00) of the instrument. Results comparing the performance of fHCD and resonance excitation collision-induced dissociation (RE-CID) in terms of injection time, total number of scans, efficiency, mass measurement accuracy (MMA), unique peptide identifications, and spectral quality of labile modified peptides are presented. fHCD is approximately 23% as efficient as RE-CID and, depending on the search algorithm, identifies 6.6% more (Mascot) or 15% fewer (Sequest) peptides (q < 0.01) than RE-CID from a soluble whole-cell lysate (Caenorhabditis elegans). fHCD offers a clear advantage for the analysis of phosphorylated and glycosylated (O-GlcNAc) peptides: the average cross-correlation score (XCorr) for fHCD spectra was significantly greater (p < 0.05) than for spectra collected using RE-CID.
The identification of proteins from spectra derived from a tandem mass spectrometry experiment involves several challenges: matching each observed spectrum to a peptide sequence, ranking the resulting collection of peptide-spectrum matches, assigning statistical confidence estimates to the matches, and identifying the proteins. The present work addresses algorithms to rank peptide-spectrum matches. Many of these algorithms, such as PeptideProphet, IDPicker, and Q-ranker, follow a similar methodology that includes representing peptide-spectrum matches as feature vectors and using optimization techniques to rank them. We propose a richer and more flexible feature set representation that is based on the parametrization of the SEQUEST XCorr score and that can be used by all of these algorithms. This extended feature set allows more effective ranking of the peptide-spectrum matches based on the target-decoy strategy than a baseline feature set lacking these XCorr-based features. Ranking with the extended feature set gives a 10–40% improvement in the number of distinct peptide identifications across a range of q-value thresholds. While this work is inspired by the model of the theoretical spectrum and the similarity measure between spectra used specifically by SEQUEST, the method itself can be applied to the output of any database search. Further, our approach can be trivially extended beyond XCorr to any linear operator that can serve as a similarity score between experimental spectra and peptide sequences.
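The point that XCorr is a linear operator can be made concrete. In one common formulation, XCorr is the dot product of the theoretical and observed spectra at zero offset minus the mean dot product over the nonzero offsets up to ±75 bins. Because this is linear in the observed spectrum, the background can be subtracted from the observed spectrum once, after which every candidate peptide is scored with a single dot product. This sketch omits SEQUEST's binning and intensity normalization, is not the parametrization proposed in this work, and uses hypothetical binned spectra.

```python
def shifted_dot(theory, observed, offset):
    """Sum of theory[i] * observed[i + offset], ignoring bins outside the spectrum."""
    n = len(observed)
    return sum(t * observed[i + offset] for i, t in enumerate(theory) if 0 <= i + offset < n)


def xcorr_direct(theory, observed, max_offset=75):
    """Dot product at zero offset minus the mean dot product over nonzero offsets."""
    background = sum(shifted_dot(theory, observed, tau)
                     for tau in range(-max_offset, max_offset + 1) if tau != 0)
    return shifted_dot(theory, observed, 0) - background / (2 * max_offset)


def preprocess(observed, max_offset=75):
    """Apply the background subtraction to the observed spectrum once."""
    n = len(observed)
    processed = []
    for i, y in enumerate(observed):
        lo, hi = max(0, i - max_offset), min(n, i + max_offset + 1)
        neighbors = sum(observed[lo:hi]) - y  # neighbors within the offset range, excluding bin i
        processed.append(y - neighbors / (2 * max_offset))
    return processed


def xcorr_fast(theory, processed):
    """XCorr as a single dot product against the preprocessed spectrum."""
    return sum(t * y for t, y in zip(theory, processed))


# Hypothetical binned spectra (same length); a small max_offset keeps the example readable.
theory = [0, 1, 0, 0, 1, 0, 0, 1]
observed = [0.2, 1.0, 0.1, 0.0, 0.8, 0.3, 0.0, 0.5]
print(round(xcorr_direct(theory, observed, 2), 6),
      round(xcorr_fast(theory, preprocess(observed, 2)), 6))  # 2.05 2.05
```

Both functions return the same score; the preprocessed form is cheaper when many candidate peptides are scored against one spectrum.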
Filter-aided sample preparation (FASP) and a new sample preparation method using a modified commercial SDS removal spin column are quantitatively compared in terms of their performance for shotgun proteomic experiments in three complex proteomic samples: a Saccharomyces cerevisiae lysate (insoluble fraction), a Caenorhabditis elegans lysate (soluble fraction), and a human embryonic kidney cell line (HEK293T). The characteristics and total number of peptides and proteins identified are compared between the two procedures. Compared with FASP, the SDS spin column procedure affords a conservative 4-fold improvement in throughput, is more reproducible and less expensive (i.e., requires fewer materials), and identifies 30–107% more peptides at q ≤ 0.01. The peptides identified by the SDS spin column procedure are more hydrophobic than those identified by the FASP procedure, as indicated by the distribution of GRAVY scores. Ultimately, these improvements translate into up to a 50% increase in proteins identified by 2 or more peptides.
Bottom-up proteomics; shotgun proteomics; protein identifications; sample preparation protocols; sodium dodecyl sulfate
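The GRAVY score used to compare hydrophobicity is the mean Kyte-Doolittle hydropathy over a sequence's residues. A minimal sketch; the example peptides are arbitrary and not taken from this study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return sum(values) / len(values)


for peptide in ["LVIFAGK", "DESEKR"]:
    print(peptide, round(gravy(peptide), 2))
# LVIFAGK 1.83
# DESEKR -3.28
```

Higher (more positive) GRAVY values indicate more hydrophobic peptides, so a shift of the score distribution toward positive values reflects the hydrophobicity difference between the two procedures.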
Yellow dwarf viruses cause the most economically important virus diseases of cereal crops worldwide and are transmitted by aphid vectors. The identification of aphid genes and proteins mediating virus transmission is critical for developing agriculturally sustainable virus management practices and for understanding viral strategies for circulative movement in all insect vectors. Two cyclophilin B proteins, S28 and S29, were identified previously in populations of Schizaphis graminum that differed in their ability to transmit the RPV strain of Cereal yellow dwarf virus (CYDV-RPV). The presence of S29 was correlated with F2 genotypes that were efficient virus transmitters. The present study revealed that the two proteins are isoforms distinguished by a single amino acid change. The distribution of the two alleles was determined in 12 F2 genotypes segregating for CYDV-RPV transmission capacity and in 11 genetically independent, field-collected S. graminum biotypes. Transmission efficiency for CYDV-RPV was determined in all genotypes and biotypes. The S29 isoform was present in all genotypes or biotypes that efficiently transmit CYDV-RPV, and more specifically in genotypes that efficiently transport virus across the hindgut. We confirmed a direct interaction between CYDV-RPV and both S28 and S29 using purified virus and bacterially expressed, His-tagged S28 and S29 proteins. Importantly, S29 failed to interact with a closely related virus that is transported across the aphid midgut. We tested for in vivo interactions using an aphid-virus co-immunoprecipitation strategy coupled with a bottom-up LC-MS/MS analysis on a Q Exactive mass spectrometer. This analysis enabled us to identify a third cyclophilin protein, cyclophilin A, interacting directly or in complex with purified CYDV-RPV.
Taken together, these data provide evidence that both cyclophilin A and B interact with CYDV-RPV, and these interactions may be important but not sufficient to mediate virus transport from the hindgut lumen into the hemocoel.
We report a targeted proteomics method to measure the in vivo turnover of four proteins in sequential tracheal aspirates obtained from human newborn infants with respiratory distress syndrome. We detected enrichment for all targeted proteins approximately 3 hours after the start of infusion of [5,5,5-2H3] leucine, with secretion times ranging from 1.2 to 2.5 hours and half-lives ranging from 10 to 21 hours. Complement factor B, a component of the alternative pathway of complement activation, had a half-life approximately twice as long as those of the other three proteins. In addition, the kinetics of mature and carboxy-terminal tryptic peptides from the same protein (surfactant protein B) were not statistically different (p = 0.49).
Protein Turnover; Respiratory Distress Syndrome; Selected Reaction Monitoring; SRM; Protein Kinetics; Protein Metabolism
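Secretion times and half-lives like those reported here come from fitting a tracer enrichment curve. The sketch assumes a simplified model that is not necessarily the one used in this study: after a lag (the secretion time), enrichment rises as E(t) = E_p(1 − exp(−k(t − t_lag))) toward a known plateau E_p, for example the precursor leucine enrichment. Linearizing gives ln(1 − E/E_p) = −k·t + k·t_lag, so a straight-line fit yields k, t_lag, and the half-life ln 2 / k. The function names and data are hypothetical and simulated.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def fit_turnover(times_h, enrichment, plateau):
    """Return (half-life, lag) in hours from enrichment values below a known plateau."""
    points = [(t, e) for t, e in zip(times_h, enrichment) if 0.0 < e < plateau]
    xs = [t for t, _ in points]
    ys = [math.log(1.0 - e / plateau) for _, e in points]
    slope, intercept = fit_line(xs, ys)
    k = -slope  # fractional turnover rate (1/h)
    return math.log(2.0) / k, intercept / k


# Simulated data: half-life 15 h, lag 2 h, plateau enrichment 0.10.
k_true = math.log(2.0) / 15.0
times = [3.0, 5.0, 8.0, 12.0, 20.0]
enrichment = [0.10 * (1.0 - math.exp(-k_true * (t - 2.0))) for t in times]
half_life, lag = fit_turnover(times, enrichment, plateau=0.10)
print(round(half_life, 2), round(lag, 2))  # 15.0 2.0
```

On real, noisy data a non-linear fit of the untransformed model usually weights the points more sensibly than the log-linear shortcut shown here.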
We investigate the role of mitochondrial oxidative stress in mitochondrial proteome remodelling using mouse models of heart failure induced by pressure overload.
Methods and results
We demonstrate that overexpression of catalase targeted to mitochondria (mCAT) attenuates pressure overload-induced heart failure in mice. An improved method of label-free unbiased analysis of the mitochondrial proteome was applied to the mouse model of heart failure induced by transverse aortic constriction (TAC). A total of 425 mitochondrial proteins were compared between wild-type and mCAT mice receiving TAC or sham surgery. The changes in the mitochondrial proteome in heart failure included decreased abundance of proteins involved in fatty acid metabolism and increased abundance of proteins involved in glycolysis, apoptosis, the mitochondrial unfolded protein response and proteolysis, transcriptional and translational control, and developmental processes, as well as responses to stimuli. Overexpression of mCAT better preserved proteins involved in fatty acid metabolism and attenuated the increases in apoptotic and proteolytic enzymes. Interestingly, gene ontology analysis also showed that monosaccharide metabolic processes and protein folding/proteolysis were overrepresented in response to TAC only in mCAT mice, not in wild-type mice.
This is the first study to demonstrate that scavenging mitochondrial reactive oxygen species (ROS) by mCAT not only attenuates most of the mitochondrial proteome changes in heart failure, but also induces a subset of unique alterations. These changes represent processes that are adaptive to the increased work and metabolic requirements of pressure overload, but which are normally inhibited by overproduction of mitochondrial ROS.
Mitochondria; Oxidative stress; Proteome; Pressure overload; Cardiomyopathy
High-field asymmetric waveform ion mobility spectrometry (FAIMS) has been used increasingly in recent years as an additional method of ion separation and selection prior to mass spectrometry. The FAIMS electrodes are relatively simple to design and fabricate for laboratories wishing to implement their own FAIMS designs. However, constructing the electronics needed to produce the required high-magnitude asymmetric electric field, oscillating at a frequency of several hundred kilohertz, is not trivial. Here we present an entirely custom-built electronics setup capable of supplying the required waveforms and voltages. The apparatus is relatively simple and inexpensive to implement. We also present data acquired on this system demonstrating the use of FAIMS as a gas-phase ion filter interface to an ion trap mass spectrometer.
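A FAIMS waveform must be strongly asymmetric in amplitude yet average to zero over each period. A widely used idealization is the bisinusoidal waveform V(t) = D[(2/3)sin(ωt) + (1/3)sin(2ωt − π/2)], whose peaks are +D and −D/2. The abstract does not state which waveform shape the custom electronics produce, so this sketch, with a hypothetical dispersion voltage D of 4000 V at 750 kHz, only illustrates the target signal.

```python
import math


def bisinusoidal(t, dispersion_voltage, frequency_hz):
    """Idealized FAIMS waveform: D * [(2/3) sin(wt) + (1/3) sin(2wt - pi/2)]."""
    w = 2.0 * math.pi * frequency_hz
    return dispersion_voltage * (
        (2.0 / 3.0) * math.sin(w * t) + (1.0 / 3.0) * math.sin(2.0 * w * t - math.pi / 2.0)
    )


def one_period(dispersion_voltage, frequency_hz, samples=3600):
    """Sample exactly one period of the waveform at evenly spaced times."""
    period = 1.0 / frequency_hz
    return [bisinusoidal(i * period / samples, dispersion_voltage, frequency_hz)
            for i in range(samples)]


# Hypothetical settings: 4000 V dispersion voltage at 750 kHz.
wave = one_period(4000.0, 750e3)
print(round(max(wave)), round(min(wave)))  # 4000 -2000
```

The 2:1 ratio between the positive and negative peaks, with a zero time average, is what lets ions with field-dependent mobility drift, so that a small DC compensation voltage can select which ions pass.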
Spectral counting methods provide an easy means of identifying proteins with differing abundances between complex mixtures using shotgun proteomics data. The crux spectral-counts command, implemented as part of the Crux software toolkit, implements four previously reported spectral counting methods: the spectral index (SIN), the exponentially modified protein abundance index (emPAI), the normalized spectral abundance factor (NSAF), and the distributed normalized spectral abundance factor (dNSAF).
We compared the four spectral counting metrics in terms of their reproducibility and their linearity with respect to each protein’s abundance. Our analysis suggests that NSAF yields the most reproducible counts across technical and biological replicates, and that SIN and NSAF both achieve the best linearity.
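Two of these metrics have short closed forms. NSAF divides each protein's spectral count by its length and normalizes by the sum of that ratio over all proteins; emPAI is 10^(N_observed / N_observable) − 1, where N_observed and N_observable are the numbers of observed and theoretically observable peptides. This sketch is an illustration with hypothetical counts, not the Crux implementation, and it omits SIN and dNSAF.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index."""
    return 10 ** (n_observed / n_observable) - 1


# Hypothetical proteins: equal spectral counts, but B is twice as long as A.
print(nsaf({"A": 10, "B": 10}, {"A": 100, "B": 200}))
print(round(empai(2, 4), 3))  # 2.162
```

The length normalization in NSAF is why protein A, with the same spectral count as the longer protein B, receives twice B's abundance estimate.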
With the crux spectral-counts command, Crux provides open-source modular methods to analyze mass spectrometry data for identifying and now quantifying peptides and proteins. The C++ source code, compiled binaries, spectra and sequence databases are available at
There are ongoing incidents in which aircraft engine lubricant containing tricresyl phosphates (TCPs) contaminates aircraft cabins. Some individuals have experienced tremors or other neurological symptoms that may last for many months following exposure. Mass spectrometric (MS) protocols are being developed to determine the percentage of “biomarker proteins” that are modified by such exposures, specifically on active site serines. Both plasma butyrylcholinesterase (BChE) and red cell acylpeptide hydrolase (APH) are readily inhibited by 2-(o-cresyl)-4H-1:3:2:benzodioxaphosphoran-2-one (CBDP) or phenyl saligenin cyclic phosphate (PSP) and have the potential to provide information about an individual's level of exposure. We have developed immunomagnetic bead-based single-step purification protocols for both BChE and APH and have characterized the active site serine adducts of BChE by MS.
Biomarkers; Tricresyl phosphate; CBDP; Butyrylcholinesterase; Acylpeptide hydrolase; Aerotoxic syndrome
Selected reaction monitoring (SRM) is a powerful tandem mass spectrometry method that can be used to monitor target peptides within a complex protein digest. The specificity and sensitivity of the approach, as well as its capability to multiplex the measurement of many analytes in parallel, have made it a technology of particular promise for hypothesis-driven proteomics. An underappreciated step in developing an assay to measure many peptides in parallel is the time and effort necessary to establish a usable assay. Here we report the use of shotgun proteomics data to expedite the selection of SRM transitions for target peptides of interest. Tandem mass spectrometry data acquired on an LTQ ion trap mass spectrometer can accurately predict which fragment ions will produce the greatest signal in an SRM assay on a triple quadrupole mass spectrometer. Furthermore, we present a scoring routine that compares the targeted SRM chromatogram data with an MS/MS spectrum acquired by data-dependent acquisition and stored in a library. This scoring routine is invaluable in determining which signal in the chromatogram from a complex mixture best represents the target peptide. These algorithmic developments have been implemented in a software package that is available from the authors upon request.
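The two steps described above can be sketched under stated assumptions: transitions are chosen as the N most intense fragment ions in the library spectrum, and each candidate chromatographic peak is scored by the normalized dot product (cosine similarity) between its transition peak areas and the library intensities. The authors' scoring routine may differ; fragment labels, intensities, and peak names are hypothetical.

```python
import math


def top_transitions(library, n=3):
    """Pick the n most intense fragment ions from a library spectrum."""
    return sorted(library, key=library.get, reverse=True)[:n]


def normalized_dot_product(observed, expected):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(o * e for o, e in zip(observed, expected))
    norm = math.sqrt(sum(o * o for o in observed)) * math.sqrt(sum(e * e for e in expected))
    return dot / norm if norm else 0.0


def best_peak(candidate_areas, expected):
    """Candidate peak whose transition areas best match the library intensities."""
    return max(candidate_areas,
               key=lambda name: normalized_dot_product(candidate_areas[name], expected))


library = {"y7": 100.0, "y6": 40.0, "y5": 80.0, "b3": 10.0}
chosen = top_transitions(library)  # ['y7', 'y5', 'y6']
expected = [library[ion] for ion in chosen]
# Areas of the chosen transitions for two candidate peaks in the chromatogram.
candidates = {"peak at 12.1 min": [50.0, 40.0, 20.0], "peak at 14.3 min": [10.0, 60.0, 5.0]}
print(chosen, best_peak(candidates, expected))
```

The score depends only on the relative pattern of transition areas, so a weak true peak can still outscore a more intense interfering signal.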
The identification of peptides by microcapillary liquid chromatography-tandem mass spectrometry (µLC-MS/MS) has become routine because of the development of fast scanning mass spectrometers, data-dependent acquisition, and database searching algorithms. However, many peptides within the detection limit of the mass spectrometer remain unidentified because MS/MS sampling speed is limited, despite the dynamic range and peak capacity of the instrument. We have developed an automated approach that uses the mass spectra from high-resolution µLC-MS data to define the molecular species present in the mixture and directs the acquisition of MS/MS spectra to precursors that were missed in prior analyses. This approach increases the coverage of the molecular species sampled by MS/MS and consequently the number of peptides and proteins identified during the acquisition of technical or biological replicates using a simple one-dimensional chromatographic separation. The combination of a unique workflow and custom software contributes to the improved identification of molecular features detected in proteomics experiments on complex protein mixtures.
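At its core, directing MS/MS to missed precursors is a set difference between features detected in high-resolution MS scans and precursors already sampled by MS/MS, matched within m/z and retention-time tolerances. The tolerances and data below are hypothetical, the quadratic scan is for clarity only, and the authors' software is not reproduced here.

```python
def within_ppm(mz_a, mz_b, ppm_tol):
    """True if two m/z values agree within ppm_tol parts per million."""
    return abs(mz_a - mz_b) / mz_a * 1e6 <= ppm_tol


def missed_features(features, sampled, ppm_tol=10.0, rt_tol_min=0.5):
    """Features (m/z, RT) with no MS/MS-sampled precursor within both tolerances."""
    return [
        (mz, rt) for mz, rt in features
        if not any(within_ppm(mz, s_mz, ppm_tol) and abs(rt - s_rt) <= rt_tol_min
                   for s_mz, s_rt in sampled)
    ]


features = [(500.2500, 10.0), (600.3000, 20.0), (700.4000, 30.0)]
sampled = [(500.2502, 10.1), (700.4000, 35.0)]  # second precursor lies outside the RT tolerance
print(missed_features(features, sampled))  # [(600.3, 20.0), (700.4, 30.0)]
```

The returned features would form an inclusion list for the next replicate run; the retention-time tolerance matters because the same m/z can belong to different peptides eluting at different times.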
Proteomics experiments on complex mixtures have benefited greatly from the advent of fast-scanning ion trap mass spectrometers. However, the complexity and dynamic range of mixtures analyzed using shotgun proteomics are still beyond what can be sampled by data-dependent acquisition. Furthermore, the total liquid chromatography-mass spectrometry (LC-MS) peak capacity is not sufficient to resolve the precursors within these mixtures, let alone acquire tandem mass spectra on all of them. Here we describe the application of a high-field asymmetric waveform ion mobility spectrometry (FAIMS) device as an interface to an ion trap mass spectrometer. The dynamic range and peak capacity of the nanoflow LC-FAIMS-MS analysis were assessed using a complex tryptic digest of S. cerevisiae proteins. By adding this relatively simple device to the front of the mass spectrometer, we obtain a >8-fold increase in peak capacity and a >5-fold increase in dynamic range without increasing the length of the LC-MS analysis. Thus, adding FAIMS to the front of a tabletop mass spectrometer can achieve the peak capacity of multidimensional protein identification technology (MudPIT) while increasing throughput by a factor of 12.
We report a method for high-throughput, cost-efficient empirical discovery of optimal proteotypic peptides and fragment ions for targeted proteomics applications using in vitro-synthesized proteins. We demonstrate the approach using human transcription factors, which are typically difficult, low-abundance targets, with an overall success rate of 98%. We further show that targeted proteomic assays developed using our approach facilitate robust in vivo quantification of human transcription factors.
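Empirical screening still starts from a list of candidate peptides per protein. A minimal sketch, assuming standard trypsin rules (cleave after K or R unless followed by P), a 7 to 25 residue length window, and uniqueness across the proteome; these filters are common defaults rather than the criteria reported in this work, and the protein names and sequences are hypothetical.

```python
def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def proteotypic_candidates(proteome, min_len=7, max_len=25):
    """Per protein, tryptic peptides in the length window that occur in no other protein."""
    digests = {name: tryptic_digest(seq) for name, seq in proteome.items()}
    owners = {}
    for name, peptides in digests.items():
        for peptide in set(peptides):
            owners.setdefault(peptide, set()).add(name)
    return {name: [p for p in peptides if min_len <= len(p) <= max_len and len(owners[p]) == 1]
            for name, peptides in digests.items()}


proteome = {"P1": "MKPEPTIDERAAK", "P2": "GGGGGGGKSAMPLERK", "P3": "SAMPLERK"}
print(proteotypic_candidates(proteome))
# {'P1': ['MKPEPTIDER'], 'P2': ['GGGGGGGK'], 'P3': []}
```

The in silico list only narrows the search; which candidates actually ionize and fragment well is what the empirical screen with synthesized proteins determines.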
Advances in Fourier transform mass spectrometry have made the acquisition of high-resolution and accurate mass measurements routine on a chromatographic time-scale. Here we report an algorithm, Hardklör, for the rapid and robust analysis of high-resolution mass spectra acquired in shotgun proteomics experiments. Our algorithm is demonstrated in the analysis of an Escherichia coli enriched membrane fraction. The peptides are analyzed by microcapillary HPLC on an LTQ-Orbitrap mass spectrometer with data-dependent acquisition of MS/MS spectra. Hardklör detects 211,272 total peptide isotope distributions over a two-hour analysis (75 min gradient) in only a small fraction of the time required to acquire the data. From these data there are 13,665 distinct, chromatographically persistent peptide isotope distributions. Hardklör is also used to assess the quality of the product ion spectra and finds that more than 11.2% of the MS/MS spectra are composed of fragment ions from multiple different molecular species. Additionally, a method is reported that enzymatically labels N-linked glycosylation sites on proteins, creating a unique isotope signature that can be detected with Hardklör. Using the protein invertase, Hardklör identifies 18O-labeled peptide isotope distributions of four glycosylation sites. The speed and robustness of the algorithm make it a versatile tool that can be used in many different areas of mass spectrometry data analysis.
High-throughput proteomics experiments involving tandem mass spectrometry produce large volumes of complex data that require sophisticated computational analyses. As such, the field offers many challenges for computational biologists. In this article, we briefly introduce some of the core computational and statistical problems in the field and then describe a variety of outstanding problems that readers of PLoS Computational Biology might be able to help solve.