Data-independent tandem mass spectrometry isolates and fragments all of the molecular species within a given mass-to-charge window, regardless of whether a precursor ion was detected within the window. For shotgun proteomics on complex protein mixtures, data-independent MS/MS offers certain advantages over the traditional data-dependent MS/MS: identification of low-abundance peptides with insignificant precursor peaks; more direct relative quantification, free of biases caused by competing precursors and dynamic exclusion; and faster throughput due to simultaneous fragmentation of multiple peptides. However, data-independent MS/MS, especially on low-resolution ion-trap instruments, strains standard peptide identification programs, because of less precise knowledge of the peptide precursor mass and large numbers of spectra composed of two or more peptides. Here we describe a computer program called DeMux that deconvolves mixture spectra and improves the peptide identification rate by ~25%. We compare the number of identifications made by data-independent and data-dependent MS/MS at the peptide and protein levels: conventional data-dependent MS/MS makes a greater number of identifications but is less reproducible from run to run.
The identification of proteins from spectra derived from a tandem mass spectrometry experiment involves several challenges: matching each observed spectrum to a peptide sequence, ranking the resulting collection of peptide-spectrum matches, assigning statistical confidence estimates to the matches, and identifying the proteins. The present work addresses algorithms to rank peptide-spectrum matches. Many of these algorithms, such as PeptideProphet, IDPicker, or Q-ranker, follow similar methodology that includes representing peptide-spectrum matches as feature vectors and using optimization techniques to rank them. We propose a richer and more flexible feature set representation that is based on the parametrization of the SEQUEST XCorr score and that can be used by all of these algorithms. This extended feature set allows a more effective ranking of the peptide-spectrum matches based on the target-decoy strategy, in comparison to a baseline feature set devoid of these XCorr-based features. Ranking using the extended feature set gives 10–40% improvement in the number of distinct peptide identifications relative to a range of q-value thresholds. While this work is inspired by the model of the theoretical spectrum and the similarity measure between spectra used specifically by SEQUEST, the method itself can be applied to the output of any database search. Further, our approach can be trivially extended beyond XCorr to any linear operator that can serve as similarity score between experimental spectra and peptide sequences.
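The target-decoy strategy mentioned above can be illustrated with a minimal sketch of q-value estimation from a scored list of peptide-spectrum matches. This is not the code of PeptideProphet, IDPicker, or Q-ranker; the function name, the input layout, and the simple decoys/targets FDR estimate are assumptions chosen for illustration.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values for target PSMs.

    psms: list of (score, is_decoy) tuples; higher score = better match.
    Returns a list of (score, q_value) for target PSMs, best score first.
    Assumes a target and a decoy database of equal size (hypothetical setup).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Simple FDR estimate at this score threshold, capped at 1.
        fdrs.append(min(1.0, decoys / max(targets, 1)))
    # q-value = smallest FDR at which this PSM would still be accepted.
    qvals = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()
    return [(score, q) for (score, is_decoy), q in zip(ranked, qvals)
            if not is_decoy]


if __name__ == "__main__":
    example = [(5.0, False), (4.0, False), (3.0, True), (2.0, False), (1.0, True)]
    for score, q in target_decoy_qvalues(example):
        print(score, round(q, 3))
```

Counting the distinct peptides accepted at a given q-value threshold, for rankings produced with and without the extended feature set, is the kind of comparison the abstract reports.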
Yellow dwarf viruses cause the most economically important virus diseases of cereal crops worldwide and are transmitted by aphid vectors. The identification of aphid genes and proteins mediating virus transmission is critical to develop agriculturally sustainable virus management practices and to understand viral strategies for circulative movement in all insect vectors. Two cyclophilin B proteins, S28 and S29, were identified previously in populations of Schizaphis graminum that differed in their ability to transmit the RPV strain of Cereal yellow dwarf virus (CYDV-RPV). The presence of S29 was correlated with F2 genotypes that were efficient virus transmitters. The present study revealed that the two proteins were isoforms and that a single amino acid change distinguished S28 from S29. The distribution of the two alleles was determined in 12 F2 genotypes segregating for CYDV-RPV transmission capacity and in 11 genetically independent, field-collected S. graminum biotypes. Transmission efficiency for CYDV-RPV was determined in all genotypes and biotypes. The S29 isoform was present in all genotypes or biotypes that efficiently transmit CYDV-RPV and more specifically in genotypes that efficiently transport virus across the hindgut. We confirmed a direct interaction between CYDV-RPV and both S28 and S29 using purified virus and bacterially expressed, his-tagged S28 and S29 proteins. Importantly, S29 failed to interact with a closely related virus that is transported across the aphid midgut. We tested for in vivo interactions using an aphid-virus co-immunoprecipitation strategy coupled with a bottom-up LC-MS/MS analysis using a Q Exactive mass spectrometer. This analysis enabled us to identify a third cyclophilin protein, cyclophilin A, interacting directly or in complex with purified CYDV-RPV.
Taken together, these data provide evidence that both cyclophilin A and B interact with CYDV-RPV, and these interactions may be important but not sufficient to mediate virus transport from the hindgut lumen into the hemocoel.
We report the implementation of front-end higher energy collision induced dissociation (fHCD) on a bench-top dual pressure linear ion trap. Software and hardware modifications, described in detail vide infra, were employed to allow isolated ions to undergo collisions with ambient gas molecules in an intermediate multipole (q00) of the instrument. Results comparing the performance of fHCD and resonance excitation collision induced dissociation (RE-CID) in terms of injection time, total number of scans, efficiency, mass measurement accuracy (MMA), unique peptide identifications, and spectral quality of labile modified peptides are presented. fHCD is approximately 23% as efficient as RE-CID and, depending on the search algorithm, it identifies 6.6% more or 15% fewer peptides (q<0.01) from a soluble whole-cell lysate (Caenorhabditis elegans) than RE-CID using Mascot or Sequest search algorithms, respectively. fHCD offers a clear advantage for the analysis of phosphorylated and glycosylated (O-GlcNAc) peptides, as the average cross-correlation score (XCorr) for spectra using fHCD was statistically greater (p<0.05) than for spectra collected using RE-CID.
We report a method to measure in vivo turnover of four proteins from sequential tracheal aspirates obtained from human newborn infants with respiratory distress syndrome using targeted proteomics. We detected enrichment for all targeted proteins approximately 3 hours from the start of infusion of [5,5,5-2H3] leucine, secretion times that varied from 1.2 to 2.5 hours, and half lives that ranged between 10 and 21 hours. Complement factor B, a component of the alternative pathway of complement activation, had an ~2-fold longer half life than the other three proteins. In addition, the kinetics of mature and carboxy-terminal tryptic peptides from the same protein (surfactant protein B) were not statistically different (p=0.49).
Protein Turnover; Respiratory Distress Syndrome; Selected Reaction Monitoring; SRM; Protein Kinetics; Protein Metabolism
Filter aided sample preparation (FASP) and a new sample preparation method using a modified commercial SDS removal spin column are quantitatively compared in terms of their performance for shotgun proteomic experiments in three complex proteomic samples: a Saccharomyces cerevisiae lysate (insoluble fraction), a Caenorhabditis elegans lysate (soluble fraction), and a human embryonic kidney cell line (HEK293T). The characteristics and total number of peptides and proteins identified are compared between the two procedures. The SDS spin column procedure affords a conservative 4-fold improvement in throughput, is more reproducible and less expensive (i.e., requires fewer materials), and identifies 30–107% more peptides at q≤0.01 than the FASP procedure. The peptides identified by SDS spin column are more hydrophobic than species identified by the FASP procedure, as indicated by the distribution of GRAVY scores. Ultimately, these improvements correspond to as much as a 50% increase in protein identifications with 2 or more peptides.
Bottom-up proteomics; shotgun proteomics; protein identifications; sample preparation protocols; sodium dodecyl sulfate
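The GRAVY score used above to compare peptide hydrophobicity is the mean Kyte-Doolittle hydropathy value over the residues of a sequence. The sketch below shows that calculation; the function name and the example peptides are illustrative assumptions and are not taken from the study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Grand average of hydropathy: mean hydropathy per residue.

    Raises KeyError for non-standard residue letters and ValueError
    for an empty sequence.
    """
    if not peptide:
        raise ValueError("empty peptide sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide.upper()) / len(peptide)


if __name__ == "__main__":
    # Hypothetical peptides: one hydrophobic, one hydrophilic.
    for seq in ("ILVAFLK", "DEKSRNK"):
        print(seq, round(gravy(seq), 2))
```

A positive score indicates a more hydrophobic peptide, so a shift of the score distribution toward higher values is what "more hydrophobic" means in this comparison.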
We investigate the role of mitochondrial oxidative stress in mitochondrial proteome remodelling using mouse models of heart failure induced by pressure overload.
Methods and results
We demonstrate that overexpression of catalase targeted to mitochondria (mCAT) in mice attenuates pressure overload-induced heart failure. An improved method of label-free unbiased analysis of the mitochondrial proteome was applied to the mouse model of heart failure induced by transverse aortic constriction (TAC). A total of 425 mitochondrial proteins were compared between wild-type and mCAT mice receiving TAC or sham surgery. The changes in the mitochondrial proteome in heart failure included decreased abundance of proteins involved in fatty acid metabolism and increased abundance of proteins involved in glycolysis, apoptosis, the mitochondrial unfolded protein response and proteolysis, transcription and translational control, and developmental processes, as well as responses to stimuli. Overexpression of mCAT better preserved proteins involved in fatty acid metabolism and attenuated the increases in apoptotic and proteolytic enzymes. Interestingly, gene ontology analysis also showed that monosaccharide metabolic processes and protein folding/proteolysis were overrepresented only in mCAT mice, and not in wild-type mice, in response to TAC.
This is the first study to demonstrate that scavenging mitochondrial reactive oxygen species (ROS) by mCAT not only attenuates most of the mitochondrial proteome changes in heart failure, but also induces a subset of unique alterations. These changes represent processes that are adaptive to the increased work and metabolic requirements of pressure overload, but which are normally inhibited by overproduction of mitochondrial ROS.
Mitochondria; Oxidative stress; Proteome; Pressure overload; Cardiomyopathy
Spectral counting methods provide an easy means of identifying proteins with differing abundances between complex mixtures using shotgun proteomics data. The crux spectral-counts command, part of the Crux software toolkit, implements four previously reported spectral counting methods: the spectral index (SIN), the exponentially modified protein abundance index (emPAI), the normalized spectral abundance factor (NSAF), and the distributed normalized spectral abundance factor (dNSAF).
We compared the reproducibility and the linearity relative to each protein’s abundance of the four spectral counting metrics. Our analysis suggests that NSAF yields the most reproducible counts across technical and biological replicates, and both SIN and NSAF achieve the best linearity.
With the crux spectral-counts command, Crux provides open-source modular methods to analyze mass spectrometry data for identifying and now quantifying peptides and proteins. The C++ source code, compiled binaries, spectra and sequence databases are available at
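Two of the spectral counting metrics named above have compact published definitions, sketched here for illustration. This is not the Crux implementation: the function names and input layout are assumptions, and SIN and dNSAF are omitted because they need fragment intensities and shared-peptide bookkeeping.

```python
def nsaf(proteins):
    """Normalized spectral abundance factor.

    proteins: dict mapping protein id -> (spectral_count, length_in_residues).
    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), so values sum to 1.
    """
    saf = {pid: count / length for pid, (count, length) in proteins.items()}
    total = sum(saf.values())
    return {pid: value / total for pid, value in saf.items()}


def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index.

    emPAI = 10 ** (observed peptides / observable peptides) - 1.
    """
    return 10 ** (n_observed / n_observable) - 1


if __name__ == "__main__":
    # Hypothetical counts: same spectral count, different protein lengths.
    print(nsaf({"protA": (10, 100), "protB": (10, 200)}))
    print(round(empai(5, 10), 3))
```

The length normalization in NSAF is what keeps a long protein from appearing more abundant simply because it yields more peptides.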
There are ongoing events where aircraft engine lubricant containing tricresyl phosphates (TCPs) contaminates aircraft cabins. Some individuals have experienced tremors or other neurological symptoms that may last for many months following exposures. Mass spectrometric (MS) protocols are being developed to determine the percentage of “biomarker proteins” that are modified by such exposures, specifically on active site serines. Both plasma butyrylcholinesterase (BChE) and red cell acylpeptide hydrolase (APH) are readily inhibited by 2-(o-cresyl)-4H-1:3:2:benzodioxaphosphoran-2-one (CBDP) or phenyl saligenin cyclic phosphate (PSP) and have the potential to provide information about the level of exposure of an individual. We have developed immunomagnetic bead-based single-step purification protocols for both BChE and APH and have characterized the active site serine adducts of BChE by MS.
Biomarkers; Tricresyl phosphate; CBDP; Butyrylcholinesterase; Acylpeptide hydrolase; Aerotoxic syndrome
High field asymmetric waveform ion mobility spectrometry (FAIMS) has been used increasingly in recent years as an additional method of ion separation and selection prior to mass spectrometry. The FAIMS electrodes are relatively simple to design and fabricate for laboratories wishing to implement their own FAIMS designs. However, construction of the electronics apparatus needed to produce the required high magnitude asymmetric electric field oscillating at a frequency of several hundred kilohertz is not trivial. Here we present an entirely custom-built electronics setup capable of supplying the required waveforms and voltages. The apparatus is relatively simple and inexpensive to implement. We also present data acquired on this system demonstrating the use of FAIMS as a gas phase ion filter interface to an ion trap mass spectrometer.
We report a method for high-throughput, cost-efficient empirical discovery of optimal proteotypic peptides and fragment ions for targeted proteomics applications using in vitro-synthesized proteins. We demonstrate the approach using human transcription factors, which are typically difficult, low-abundance targets, with an overall success rate of 98%. We show further that targeted proteomic assays developed using our approach facilitate robust in vivo quantification of human transcription factors.
Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNaseI, leaving nucleotide-resolution footprints. Using genomic DNaseI footprinting across 41 diverse cell and tissue types, we detected 45 million factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNaseI cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein-DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50 base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation, and pluripotency.
chromatin; protein occupancy; DNaseI footprinting; ENCODE; regulation
Selected reaction monitoring (SRM) is a powerful tandem mass spectrometry method that can be used to monitor target peptides within a complex protein digest. The specificity and sensitivity of the approach, as well as its capability to multiplex the measurement of many analytes in parallel, have made it a technology of particular promise for hypothesis-driven proteomics. An underappreciated cost in developing an assay to measure many peptides in parallel is the time and effort necessary to establish a usable assay. Here we report the use of shotgun proteomics data to expedite the selection of SRM transitions for target peptides of interest. Tandem mass spectrometry data acquired on an LTQ ion trap mass spectrometer can accurately predict which fragment ions will produce the greatest signal in an SRM assay using a triple quadrupole mass spectrometer. Furthermore, we present a scoring routine that can compare the targeted SRM chromatogram data with an MS/MS spectrum acquired by data-dependent acquisition and stored in a library. This scoring routine is invaluable in determining which signal in the chromatogram from a complex mixture best represents the target peptide. These algorithmic developments have been implemented in a software package that is available from the authors upon request.
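The abstract does not specify the scoring routine, so the following is only a sketch of one plausible way to compare SRM transition intensities with a library MS/MS spectrum: a normalized dot product (cosine similarity) over the same fragment ions. The similarity measure, function names, and data are all assumptions, not the authors' method.

```python
import math


def normalized_dot_product(observed, library):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(observed, library))
    norm_obs = math.sqrt(sum(x * x for x in observed))
    norm_lib = math.sqrt(sum(y * y for y in library))
    if norm_obs == 0 or norm_lib == 0:
        return 0.0
    return dot / (norm_obs * norm_lib)


def best_matching_peak(candidate_peaks, library_intensities):
    """Pick the chromatographic peak whose transition ratios best match.

    candidate_peaks: dict mapping retention time -> list of transition peak
    areas, in the same fragment-ion order as library_intensities.
    """
    return max(
        candidate_peaks,
        key=lambda rt: normalized_dot_product(candidate_peaks[rt],
                                              library_intensities),
    )


if __name__ == "__main__":
    library = [100.0, 50.0, 10.0]   # hypothetical library intensities
    peaks = {
        12.3: [10.0, 50.0, 100.0],  # wrong relative intensities
        15.8: [200.0, 100.0, 20.0], # same ratios as the library
    }
    print(best_matching_peak(peaks, library))
```

Because the measure depends only on relative intensities, a peak can match the library well even when its absolute signal differs greatly from that of the library spectrum.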
The identification of peptides by microcapillary liquid chromatography-tandem mass spectrometry (µLC-MS/MS) has become routine because of the development of fast scanning mass spectrometers, data-dependent acquisition, and database searching algorithms. However, many peptides within the detection limit of the mass spectrometer remain unidentified because of limitations in MS/MS sampling speed despite the dynamic range and peak capacity of the instrument. We have developed an automated approach that uses the mass spectra from high resolution µLC-MS data to define the molecular species present in the mixture and directs the acquisition of MS/MS spectra to precursors that were missed in prior analyses. This approach increases the coverage of the molecular species sampled by MS/MS and consequently the number of peptides and proteins identified during the acquisition of technical or biological replicates using a simple one-dimensional chromatographic separation. The combination of a unique workflow and custom software contributes to the improved identification of molecular features detected in proteomics experiments on complex protein mixtures.
Proteomics experiments on complex mixtures have benefited greatly from the advent of fast-scanning ion trap mass spectrometers. However, the complexity and dynamic range of mixtures analyzed using shotgun proteomics are still beyond what can be sampled by data-dependent acquisition. Furthermore, the total liquid chromatography-mass spectrometry (LC-MS) peak capacity is not sufficient to resolve the precursors within these mixtures, let alone acquire tandem mass spectra on all of them. Here we describe the application of a high-field asymmetric waveform ion mobility spectrometry (FAIMS) device as an interface to an ion trap mass spectrometer. The dynamic range and peak capacity of the nanoflow LC-FAIMS-MS analysis were assessed using a complex tryptic digest of S. cerevisiae proteins. By adding this relatively simple device to the front of the mass spectrometer, we obtain a >8-fold increase in peak capacity and a >5-fold increase in dynamic range, without increasing the length of the LC-MS analysis. Thus, the addition of FAIMS to the front of a table top mass spectrometer can obtain the peak capacity of multidimensional protein identification technology (MudPIT) while increasing the throughput by a factor of 12.
High-throughput proteomics experiments involving tandem mass spectrometry produce large volumes of complex data that require sophisticated computational analyses. As such, the field offers many challenges for computational biologists. In this article, we briefly introduce some of the core computational and statistical problems in the field and then describe a variety of outstanding problems that readers of PLoS Computational Biology might be able to help solve.
Traditionally, protein turnover has been measured using stable isotope labeled (SIL) tracers. The labeled tracer is incorporated into proteins; the proteins of interest are isolated, hydrolyzed into their amino acid constituents, and derivatized; and enrichment is measured via gas chromatography mass spectrometry. This method has significant limitations, including low throughput, and its accuracy can be compromised by the efficacy of the protein isolation step, which limits each experiment to a single abundant protein that is easily purified. Herein, we present a method to determine protein turnover on a global scale using in-house developed software and shotgun proteomics. These developments allow for the determination of protein kinetics of over 1000 proteins in only a few hours. The method for producing labeled alveolar type 2 cells will be described in detail. The cells were grown in media containing 100% 2H3-leucine and were harvested at three time points: 4, 8, and 24 hours. Samples were analyzed by nano-flow liquid chromatography coupled to an LTQ-FT-ICR MS. In-house developed software, termed Topograph, was used to determine enrichment values and calculate half lives of all identified leucine-containing peptides. Preliminary results demonstrate that half lives could be calculated for approximately 1400 of the 2000 proteins detected. In addition, 2 different peptides from mature surfactant protein B, which is an essential component of properly functioning surfactant, showed close agreement (20.0 and 21.1 hours). This method will be used as a model to study various genetic regulators of surfactant composition and protein metabolism. We also demonstrate that a similar stable isotope tracer strategy can be applied in vivo toward measuring surfactant protein B using targeted proteomics.
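The abstract does not describe how Topograph computes half lives, so the sketch below shows only the simplest model consistent with the experiment: first-order turnover with a fully labeled precursor pool. Under that assumption the newly synthesized fraction is f(t) = 1 - exp(-k t) and the half life is ln(2)/k. The function name and the data are hypothetical.

```python
import math


def fit_half_life(time_points_h, fraction_new):
    """Fit a first-order turnover rate and return the half life in hours.

    Assumes f(t) = 1 - exp(-k * t), i.e. -ln(1 - f) = k * t, and fits k by
    least squares through the origin. Each fraction must be in [0, 1).
    """
    ys = [-math.log(1.0 - f) for f in fraction_new]
    k = (sum(t * y for t, y in zip(time_points_h, ys))
         / sum(t * t for t in time_points_h))
    return math.log(2) / k


if __name__ == "__main__":
    # Synthetic, noise-free data generated for a 20-hour half life.
    k_true = math.log(2) / 20.0
    times = [4.0, 8.0, 24.0]
    fractions = [1.0 - math.exp(-k_true * t) for t in times]
    print(round(fit_half_life(times, fractions), 2))
```

With only three time points the fit is sensitive to noise at the longest labeling time, which is one reason agreement between independent peptides of the same protein is a useful check.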
Variation in RNA, protein, and metabolite levels among individuals is an important source of physiological and phenotypic differences within and between species. However, relatively little is known about the magnitude and genetic basis of these high-dimensional molecular phenotypes. Yeast provide an ideal model system for the genetic dissection of complex and quantitative traits, and whole-genome sequences are accumulating for dozens of Saccharomyces cerevisiae strains isolated from natural, industrial, and lab environments. We grew a diverse selection of sequenced strains in continuous culture and used a randomized and replicated study design. We exploited all the technologies in the Yeast Resource Center to obtain high quality and high coverage measurements of RNA, protein, metabolite, and morphological phenotypes. The resulting data sets provide a unique and powerful opportunity to combine comparative functional genomics data with comparative sequence analyses and delineate the genetic architecture of complex and quantitative phenotypes in yeast. Our initial analyses indicate that a high degree of strain-to-strain variation exists at all systems levels, and that this variation largely correlates with strain relatedness as measured by sequence comparison. Variation in RNA levels correlates with the corresponding peptides and related metabolites in complex ways. These experiments have resulted in an important large-scale data set of thousands of quantitative traits collected in a carefully designed randomized study, which will provide novel insights into the magnitude and patterns of natural variation of molecular and morphological phenotypes, as well as preliminary insights into their genetic basis.
Proteomics experiments based on Selected Reaction Monitoring (SRM, also referred to as Multiple Reaction Monitoring or MRM) are being used to target large numbers of protein candidates in complex mixtures. At present, instrument parameters are often optimized for each peptide, a time and resource intensive process. Large SRM experiments are greatly facilitated by having the ability to predict MS instrument parameters that work well with the broad diversity of peptides they target. For this reason, we investigated the impact of using simple linear equations to predict the collision energy (CE) on peptide signal intensity and compared it with the empirical optimization of the CE for each peptide and transition individually. Using optimized linear equations, the difference between predicted and empirically derived CE values was found to be an average gain of only 7.8% of total peak area. We also found that existing commonly used linear equations fall short of their potential, and should be recalculated for each charge state and when introducing new instrument platforms. We provide a fully automated pipeline for calculating these equations and individually optimizing CE of each transition on SRM instruments from Agilent, Applied Biosystems, Thermo-Scientific and Waters in the open source Skyline software tool (http://proteome.gs.washington.edu/software/skyline).
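The linear equations discussed above have the form CE = slope × (precursor m/z) + intercept, with a separate slope and intercept per charge state. The following sketch fits such equations by ordinary least squares from empirically optimized values. It is not Skyline code, and the function names and numbers are illustrative assumptions rather than values for any vendor's instrument.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_ce_equations(observations):
    """Fit one CE equation per precursor charge state.

    observations: list of (charge, precursor_mz, optimal_ce) tuples.
    Returns a dict mapping charge -> (slope, intercept).
    """
    by_charge = {}
    for charge, mz, ce in observations:
        by_charge.setdefault(charge, []).append((mz, ce))
    return {
        charge: fit_line([p[0] for p in pts], [p[1] for p in pts])
        for charge, pts in by_charge.items()
    }


def predict_ce(equations, charge, precursor_mz):
    """Predict the collision energy for a precursor of the given charge."""
    slope, intercept = equations[charge]
    return slope * precursor_mz + intercept


if __name__ == "__main__":
    # Hypothetical optimized CE values (not real instrument data).
    data = [(2, 400.0, 15.0), (2, 600.0, 21.0), (2, 800.0, 27.0),
            (3, 500.0, 14.0), (3, 700.0, 18.0)]
    equations = fit_ce_equations(data)
    print(equations)
    print(predict_ce(equations, 2, 500.0))
```

Fitting per charge state follows the abstract's recommendation that the equations be recalculated for each charge state and for each new instrument platform.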
Proper centromere function is critical to maintain genomic stability and to prevent aneuploidy, a hallmark of tumors and birth defects. A conserved feature of all eukaryotic centromeres is an essential histone H3 variant called CENP-A that requires a centromere targeting domain (CATD) for its localization. Although proteolysis prevents CENP-A from mislocalizing to euchromatin, regulatory factors have not been identified. Here, we identify an E3 ubiquitin ligase called Psh1 that leads to the degradation of Cse4, the budding yeast CENP-A homolog. Cse4 overexpression is toxic to psh1Δ cells and results in euchromatic localization. Strikingly, the Cse4 centromere targeting domain is a key regulator of its stability and helps Psh1 discriminate Cse4 from histone H3. Taken together, we propose that the CATD has a previously unknown role in maintaining the exclusive localization of Cse4 by preventing its mislocalization to euchromatin via Psh1-mediated degradation.
The problem of identifying proteins from a shotgun proteomics experiment has not been definitively solved. Identifying the proteins in a sample requires ranking them, ideally with interpretable scores. In particular, “degenerate” peptides, which map to multiple proteins, have made such a ranking difficult to compute. The problem of computing posterior probabilities for the proteins, which can be interpreted as confidence in a protein’s presence, has been especially daunting. Previous approaches have either ignored the peptide degeneracy problem completely or addressed it by computing a heuristic set of proteins or heuristic posterior probabilities, or by estimating the posterior probabilities with sampling methods. We present a probabilistic model for protein identification in tandem mass spectrometry that recognizes peptide degeneracy. We then introduce graph-transforming algorithms that facilitate efficient computation of protein probabilities, even for large data sets. We evaluate our identification procedure on five different well-characterized data sets and demonstrate our ability to efficiently compute high-quality protein posteriors.
Electron-transfer dissociation (ETD) induces fragmentation along the peptide backbone by transferring an electron from a radical anion to a protonated peptide. In contrast with collision induced dissociation, side chains and modifications such as phosphorylation are left intact through the ETD process. Because the precursor charge state is an important input to MS/MS sequence database search tools, the ability to accurately determine the precursor charge is helpful for the identification process. Furthermore, because ETD can be applied to large, highly charged peptides, the need for accurate precursor charge state determination is magnified. Otherwise, each spectrum must be searched repeatedly using a large range of possible precursor charge states. To address this problem, we have developed an ETD charge state prediction tool based on support vector machine classifiers that is demonstrated to exhibit superior classification accuracy while minimizing the overall number of predicted charge states. The tool is freely available, open source, cross platform compatible, and demonstrated to perform well when compared with an existing charge state prediction tool. The program is available from http://code.google.com/p/etdz/.
electron transfer dissociation; charge state prediction; support vector machine; tandem mass spectrometry
Advances in Fourier transform mass spectrometry have made the acquisition of high-resolution and accurate mass measurements routine on a chromatographic time-scale. Here we report an algorithm, Hardklör, for the rapid and robust analysis of high resolution mass spectra acquired in shotgun proteomics experiments. Our algorithm is demonstrated in the analysis of an Escherichia coli enriched membrane fraction. The mass spectrometry data of the respective peptides are acquired by micro-capillary HPLC on an LTQ-Orbitrap mass spectrometer with data-dependent acquisition of MS/MS spectra. Hardklör detects 211,272 total peptide isotope distributions over a two hour analysis (75 min gradient) in only a small fraction of the time required to acquire the data. From these data there are 13,665 distinct, chromatographically persistent peptide isotope distributions. Hardklör is also used to assess the quality of the product ion spectra and finds that more than 11.2% of the MS/MS spectra are composed of fragment ions from multiple different molecular species. Additionally, a method is reported that enzymatically labels N-linked glycosylation sites on proteins, creating a unique isotope signature that can be detected with Hardklör. Using the protein invertase, Hardklör identifies 18O-labeled peptide isotope distributions of four glycosylation sites. The speed and robustness of the algorithm create a versatile tool that can be used in many different areas of mass spectrometry data analysis.
Knowledge of protein structures and protein-protein interactions is essential for understanding of biological processes. Recent advances in protein crosslinking and mass spectrometry (MS) have shown significant potential to contribute to this area. Here we report a novel method to rapidly and accurately identify crosslinked peptides based on their unique isotope signature when digested in the presence of H₂¹⁸O. This method overcomes the need for specially synthesized crosslinkers and/or multiple MS runs required by other techniques. We validated our method by performing a ‘blind’ analysis of 5 proteins/complexes of known structure. Side chain repacking calculations using Rosetta show that 17 of our 20 positively identified crosslinks fit the published atomic structures. The remaining 3 crosslinks are likely due to protein aggregation. The accuracy and rapid throughput of our workflow will advance the use of protein crosslinking in structural biology.
Protein Crosslinking; Isotope signature; Structure; Protein-protein interaction; Mass spectrometry; Rosetta; Hardklör; PepLynx
The Purkinje cell degeneration (pcd) mouse is a recessive model of neurodegeneration, involving cerebellum and retina. Purkinje cell death in pcd is dramatic, as >99% of Purkinje neurons are lost in three weeks. Loss-of-function of Nna1 causes pcd, and Nna1 is a highly conserved zinc carboxypeptidase. To determine the basis of pcd, we implemented a two-pronged approach, combining characterization of loss-of-function phenotypes of the Drosophila Nna1 orthologue (NnaD) with proteomics analysis of pcd mice. Reduced NnaD function yielded larval lethality, with survivors displaying phenotypes that mirror disease in pcd. Quantitative proteomics revealed expression alterations for glycolytic and oxidative phosphorylation enzymes. Nna proteins localize to mitochondria, loss of NnaD / Nna1 produces mitochondrial abnormalities, and pcd mice display altered proteolytic processing of Nna1 interacting proteins. Our studies indicate that Nna1 loss-of-function results in altered bioenergetics and mitochondrial dysfunction, and suggest that pcd shares pathogenic features with neurodegenerative disorders such as Parkinson's disease.