Understanding the genetic basis of reproductive isolation promises insight into speciation and the origins of biological diversity. While progress has been made in identifying genes underlying barriers to reproduction that function after fertilization (post-zygotic isolation), we know much less about earlier acting pre-zygotic barriers. Of particular interest are barriers involved in mating and fertilization that can evolve extremely rapidly under sexual selection, suggesting they may play a prominent role in the initial stages of reproductive isolation. A significant challenge to the field of speciation genetics is developing new approaches for identification of candidate genes underlying these barriers, particularly among non-traditional model systems. We employ powerful proteomic and genomic strategies to study the genetic basis of conspecific pollen precedence, an important component of pre-zygotic reproductive isolation among yellow monkeyflowers (Mimulus spp.) resulting from male pollen competition. We use isotopic labeling in combination with shotgun proteomics to identify more than 2,000 male function (pollen tube) proteins within maternal reproductive structures (styles) of M. guttatus flowers where pollen competition occurs. We then sequence array-captured pollen tube exomes from a large outcrossing population of M. guttatus, and identify those genes with evidence of selective sweeps or balancing selection consistent with their role in pollen competition. We also test for evidence of positive selection on these genes more broadly across yellow monkeyflowers, because a signal of adaptive divergence is a common feature of genes causing reproductive isolation. Together the molecular evolution studies identify 159 pollen tube proteins that are candidate genes for conspecific pollen precedence. 
Our work demonstrates how powerful proteomic and genomic tools can be readily adapted to non-traditional model systems, allowing for genome-wide screens towards the goal of identifying the molecular basis of genetically complex traits.
Barriers to reproduction are necessary for generating new species. Little is known about the genes underlying reproductive barriers, particularly those that function prior to fertilization, but their identity is of great interest as they offer insight into the genetic mechanisms and evolutionary forces generating biological diversity. In this work, we use an emerging plant model system for speciation studies (yellow monkeyflowers, species of Mimulus) to identify genes that might influence the relative competitive abilities of male pollen from the same versus different species within the maternal flower's style. This is a common reproductive barrier among plant taxa known as conspecific pollen precedence (CPP), and is analogous to sperm competition during animal fertilization. We first identify the pollen proteins that are found within the style where pollen competition occurs, and then screen these for evidence that may indicate which genes have been targets of pollen competition (a form of sexual selection among individuals of a population) or adaptive diversification among species of yellow monkeyflowers (a common feature of genes underlying reproductive barriers). Our evolutionary analyses identify 159 candidates that may function in reproductive isolation of yellow monkeyflowers, and provide some of the first broad perspectives on evolution of plant reproductive genes.
We report an algorithm designed for the calibration of low resolution peptide mass spectra. Our algorithm is implemented in a program called FineTune, which corrects systematic mass measurement error in one minute, with no input required besides the mass spectra themselves. The mass measurement accuracy for a set of spectra collected on an LTQ-Velos improved 20-fold from −0.1776 ± 0.0010 m/z to 0.0078 ± 0.0006 m/z after calibration (mean ± 95% confidence interval). The precision in mass measurement was improved due to the correction of non-linear variation in mass measurement accuracy across the m/z range.
Mass measurement accuracy; shotgun proteomics; linear ion trap
The identification of proteins from spectra derived from a tandem mass spectrometry experiment involves several challenges: matching each observed spectrum to a peptide sequence, ranking the resulting collection of peptide-spectrum matches, assigning statistical confidence estimates to the matches, and identifying the proteins. The present work addresses algorithms to rank peptide-spectrum matches. Many of these algorithms, such as PeptideProphet, IDPicker, or Q-ranker, follow similar methodology that includes representing peptide-spectrum matches as feature vectors and using optimization techniques to rank them. We propose a richer and more flexible feature set representation that is based on the parametrization of the SEQUEST XCorr score and that can be used by all of these algorithms. This extended feature set allows a more effective ranking of the peptide-spectrum matches based on the target-decoy strategy, in comparison to a baseline feature set devoid of these XCorr-based features. Ranking using the extended feature set gives 10–40% improvement in the number of distinct peptide identifications relative to a range of q-value thresholds. While this work is inspired by the model of the theoretical spectrum and the similarity measure between spectra used specifically by SEQUEST, the method itself can be applied to the output of any database search. Further, our approach can be trivially extended beyond XCorr to any linear operator that can serve as similarity score between experimental spectra and peptide sequences.
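The target-decoy ranking evaluation mentioned above can be illustrated with a minimal sketch. It assumes each peptide-spectrum match carries a score (higher is better) and a decoy flag, and it uses the simple decoys/targets estimate of the false discovery rate; the function names and toy scores are hypothetical and are not taken from PeptideProphet, IDPicker, or Q-ranker.

```python
def target_decoy_qvalues(psms):
    """Assign q-values to PSMs given as (score, is_decoy) pairs.

    Assumption: higher scores are better, and FDR at a threshold is
    estimated as (#decoys) / (#targets) scoring at or above it.
    Returns (score, is_decoy, q_value) tuples, best score first.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # The q-value is the smallest FDR at which a PSM would still be accepted,
    # so take a running minimum from the worst score upwards.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()
    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def count_accepted_targets(psms, threshold=0.01):
    """Number of target PSMs with q-value at or below the threshold."""
    return sum(
        1 for _, is_decoy, q in target_decoy_qvalues(psms)
        if not is_decoy and q <= threshold
    )


if __name__ == "__main__":
    toy = [(10, False), (9, False), (8, True), (7, False), (6, False)]
    for row in target_decoy_qvalues(toy):
        print(row)
```

Comparing `count_accepted_targets` for two feature sets over a range of thresholds is one way to express the kind of improvement reported above, although counting distinct peptides would additionally require collapsing PSMs by sequence.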
Yellow dwarf viruses cause the most economically important virus diseases of cereal crops worldwide and are transmitted by aphid vectors. The identification of aphid genes and proteins mediating virus transmission is critical to develop agriculturally sustainable virus management practices and to understand viral strategies for circulative movement in all insect vectors. Two cyclophilin B proteins, S28 and S29, were identified previously in populations of Schizaphis graminum that differed in their ability to transmit the RPV strain of Cereal yellow dwarf virus (CYDV-RPV). The presence of S29 was correlated with F2 genotypes that were efficient virus transmitters. The present study revealed the two proteins were isoforms, and a single amino acid change distinguished S28 and S29. The distribution of the two alleles was determined in 12 F2 genotypes segregating for CYDV-RPV transmission capacity and in 11 genetically independent, field-collected S. graminum biotypes. Transmission efficiency for CYDV-RPV was determined in all genotypes and biotypes. The S29 isoform was present in all genotypes or biotypes that efficiently transmit CYDV-RPV and more specifically in genotypes that efficiently transport virus across the hindgut. We confirmed a direct interaction between CYDV-RPV and both S28 and S29 using purified virus and bacterially expressed, His-tagged S28 and S29 proteins. Importantly, S29 failed to interact with a closely related virus that is transported across the aphid midgut. We tested for in vivo interactions using an aphid-virus co-immunoprecipitation strategy coupled with a bottom-up LC-MS/MS analysis using a Q Exactive mass spectrometer. This analysis enabled us to identify a third cyclophilin protein, cyclophilin A, interacting directly or in complex with purified CYDV-RPV.
Taken together, these data provide evidence that both cyclophilin A and B interact with CYDV-RPV, and these interactions may be important but not sufficient to mediate virus transport from the hindgut lumen into the hemocoel.
Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNaseI, leaving nucleotide-resolution footprints. Using genomic DNaseI footprinting across 41 diverse cell and tissue types, we detected 45 million factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNaseI cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein-DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50 base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation, and pluripotency.
chromatin; protein occupancy; DNaseI footprinting; ENCODE; regulation
We report a method to measure in vivo turnover of four proteins from sequential tracheal aspirates obtained from human newborn infants with respiratory distress syndrome using targeted proteomics. We detected enrichment for all targeted proteins approximately 3 hours from the start of infusion of [5,5,5-2H3] leucine, secretion times that varied from 1.2 to 2.5 hours, and half lives that ranged between 10 and 21 hours. Complement factor B, a component of the alternative pathway of complement activation, had an ~2-fold longer half life than the other three proteins. In addition, the kinetics of mature and carboxy-terminal tryptic peptides from the same protein (surfactant protein B) were not statistically different (p=0.49).
Protein Turnover; Respiratory Distress Syndrome; Selected Reaction Monitoring; SRM; Protein Kinetics; Protein Metabolism
With a 1 in 6 lifetime risk and over 200,000 diagnoses per year in the United States, prostate cancer is the most common noncutaneous cancer in men. The discrepancy between the number of diagnoses and the mortality risk (1 in 35) has led to scrutiny of the clinical management of the disease. The incidence of the disease has more than doubled since the FDA approval of the prostate-specific antigen test. Despite its low specificity, the test has decreased the proportion of metastatic cancers at diagnosis. However, it has also increased the total number of cancers diagnosed, with the majority being either indolent disease or completely benign conditions. This discrepancy between increased treatment and decreased disease aggressiveness has led to much criticism that prostate cancer is “overdiagnosed”, leading to unnecessary treatment. To this end, it is of interest to develop protein markers or panels of markers that are more indicative of, and specific to, disease severity than those currently available. Herein, we begin this endeavor by determining the precision of protein expression from different regions of benign tissue and Gleason grade 6 cancer tissue within the same individual using laser capture microdissection coupled to LC-MS.
After LCM, tissues were lysed and digested using established laboratory procedures. 500 ng of protein was separated by LC and analyzed on a Velos Orbitrap mass spectrometer operating in a top-10 data-dependent mode. Data were searched using Sequest and imported into Skyline for differential abundance determination. Statistical design of experiments was used to optimize the data-dependent settings and afforded a 33% increase in unique peptide identifications over our current default DDA settings. 2000 proteins were identified, including prostate specific antigen and prostatic acid phosphatase. Label-free methods will be used to develop an intra-individual variability index for both benign and cancerous tissue, which will aid in statistically identifying differences.
Mass-spectrometry-based proteomics has become an important component of biological research. Numerous proteomics methods have been developed to identify and quantify the proteins in biological and clinical samples1, identify pathways affected by endogenous and exogenous perturbations2, and characterize protein complexes3. Despite successes, the interpretation of vast proteomics datasets remains a challenge. There have been several calls for improvements and standardization of proteomics data analysis frameworks, as well as for an application-programming interface for proteomics data access4,5. In response, we have developed the ProteoWizard Toolkit, a robust set of open-source, software libraries and applications designed to facilitate proteomics research. The libraries implement the first-ever, non-commercial, unified data access interface for proteomics, bridging field-standard open formats and all common vendor formats. In addition, diverse software classes enable rapid development of vendor-agnostic proteomics software. Additionally, ProteoWizard projects and applications, building upon the core libraries, are becoming standard tools for enabling significant proteomics inquiries.
Hematopoietic protein-1 (Hem-1) is a hematopoietic cell specific member of the WAVE (Wiskott-Aldrich syndrome verprolin-homologous protein) complex, which regulates filamentous actin (F-actin) polymerization in many cell types including immune cells. However, the roles of Hem-1 and the WAVE complex in erythrocyte biology are not known. In this study, we utilized mice lacking Hem-1 expression due to a non-coding point mutation in the Hem1 gene to show that absence of Hem-1 results in microcytic, hypochromic anemia characterized by abnormally shaped erythrocytes with aberrant F-actin foci and decreased lifespan. We find that Hem-1 and members of the associated WAVE complex are normally expressed in wildtype erythrocyte progenitors and mature erythrocytes. Using mass spectrometry and global proteomics, Coomassie staining, and immunoblotting, we find that the absence of Hem-1 results in decreased representation of essential erythrocyte membrane skeletal proteins including α- and β- spectrin, dematin, p55, adducin, ankyrin, tropomodulin 1, band 3, and band 4.1. Hem1−/− erythrocytes exhibit increased protein kinase C-dependent phosphorylation of adducin at Ser724, which targets adducin family members for dissociation from spectrin and actin, and subsequent proteolysis. Increased adducin Ser724 phosphorylation in Hem1−/− erythrocytes correlates with decreased protein expression of the regulatory subunit of protein phosphatase 2A (PP2A), which is required for PP2A-dependent dephosphorylation of PKC targets. These results reveal a novel, critical role for Hem-1 in the homeostasis of structural proteins required for formation and stability of the actin membrane skeleton in erythrocytes.
We report the implementation of front-end higher energy collision induced dissociation (fHCD) on a bench-top dual pressure linear ion trap. Software and hardware modifications, described in detail vide infra, were employed to allow isolated ions to undergo collisions with ambient gas molecules in an intermediate multipole (q00) of the instrument. Results comparing the performance of fHCD and resonance excitation collision induced dissociation (RE-CID) in terms of injection time, total number of scans, efficiency, mass measurement accuracy (MMA), unique peptide identifications, and spectral quality of labile modified peptides are presented. fHCD is approximately 23% as efficient as RE-CID and, depending on the search algorithm, it identifies 6.6% more or 15% fewer peptides (q<0.01) from a soluble whole-cell lysate (Caenorhabditis elegans) than RE-CID using the Mascot or Sequest search algorithms, respectively. fHCD offers a clear advantage for the analysis of phosphorylated and glycosylated (O-GlcNAc) peptides, as the average cross-correlation score (XCorr) for spectra collected using fHCD was statistically greater (p<0.05) than for spectra collected using RE-CID.
Capillary electrophoresis can provide fast and efficient separations of peptides. However, the high speed separation and limited loading capacity of capillary electrophoresis requires the use of a fast and sensitive detector. While laser-induced fluorescence provides exquisite sensitivity and millisecond response time, it inherently generates a low information content signal. In contrast, mass spectrometry provides an information rich signal that is attractive for peptide analysis. The recently introduced Velos-Orbitrap mass spectrometer is capable of fast and sensitive tandem MS acquisition and simultaneous high accuracy MS acquisition, which is well suited for coupling with fast and efficient separation methods for peptide analysis. We evaluated this instrument as a detector for peptide separation by capillary electrophoresis. In MS mode, we observed low attomole detection limits for a number of peptides in a tryptic digest of standard proteins with high mass resolution (30,000 at m/z 400). The response time of the Orbitrap at this resolution was ~0.70 seconds, which was adequate to reconstruct the peak shape and area of our electrophoretic peaks. The linear ion-trap successfully recorded tandem MS spectra of tryptic peptides at 20 nM concentration.
We investigate the role of mitochondrial oxidative stress in mitochondrial proteome remodelling using mouse models of heart failure induced by pressure overload.
Methods and results
We demonstrate that overexpression of catalase targeted to mitochondria (mCAT) in mice attenuates pressure overload-induced heart failure. An improved method of label-free unbiased analysis of the mitochondrial proteome was applied to the mouse model of heart failure induced by transverse aortic constriction (TAC). A total of 425 mitochondrial proteins were compared between wild-type and mCAT mice receiving TAC or sham surgery. The changes in the mitochondrial proteome in heart failure included decreased abundance of proteins involved in fatty acid metabolism, an increased abundance of proteins in glycolysis, apoptosis, mitochondrial unfolded protein response and proteolysis, transcription and translational control, and developmental processes as well as responses to stimuli. Overexpression of mCAT better preserved proteins involved in fatty acid metabolism and attenuated the increases in apoptotic and proteolytic enzymes. Interestingly, gene ontology analysis also showed that monosaccharide metabolic processes and protein folding/proteolysis were overrepresented only in mCAT mice, and not in wild-type mice, in response to TAC.
This is the first study to demonstrate that scavenging mitochondrial reactive oxygen species (ROS) by mCAT not only attenuates most of the mitochondrial proteome changes in heart failure, but also induces a subset of unique alterations. These changes represent processes that are adaptive to the increased work and metabolic requirements of pressure overload, but which are normally inhibited by overproduction of mitochondrial ROS.
Mitochondria; Oxidative stress; Proteome; Pressure overload; Cardiomyopathy
Filter aided sample preparation (FASP) and a new sample preparation method using a modified commercial SDS removal spin column are quantitatively compared in terms of their performance for shotgun proteomic experiments in three complex proteomic samples: a Saccharomyces cerevisiae lysate (insoluble fraction), a Caenorhabditis elegans lysate (soluble fraction), and a human embryonic kidney cell line (HEK293T). The characteristics and total number of peptides and proteins identified are compared between the two procedures. The SDS spin column procedure affords a conservative 4-fold improvement in throughput, is more reproducible and less expensive (i.e., requires fewer materials), and identifies 30–107% more peptides at q≤0.01 than the FASP procedure. The peptides identified by the SDS spin column procedure are more hydrophobic than those identified by the FASP procedure, as indicated by the distribution of GRAVY scores. Ultimately, these improvements correspond to as much as a 50% increase in protein identifications with 2 or more peptides.
Bottom-up proteomics; shotgun proteomics; protein identifications; sample preparation protocols; sodium dodecyl sulfate
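The GRAVY comparison referred to in the abstract above can be sketched as follows. GRAVY (grand average of hydropathy) is the mean Kyte-Doolittle hydropathy value over the residues of a peptide, with higher values indicating a more hydrophobic sequence. The function name and the example peptides are hypothetical and are not taken from the study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Return the mean Kyte-Doolittle hydropathy of a peptide sequence.

    Raises ValueError for an empty sequence and KeyError for residues
    outside the 20 standard amino acids.
    """
    if not peptide:
        raise ValueError("peptide sequence must not be empty")
    residues = peptide.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


if __name__ == "__main__":
    # Hypothetical peptides chosen only to show the sign convention.
    for seq in ("ILVAF", "KRDEN"):
        print(seq, round(gravy(seq), 2))
```

Comparing the distributions of such scores for the peptides identified by each preparation method is one way to make the hydrophobicity claim concrete.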
Spectral counting methods provide an easy means of identifying proteins with differing abundances between complex mixtures using shotgun proteomics data. The crux spectral-counts command, part of the Crux software toolkit, implements four previously reported spectral counting methods: the spectral index (SIN), the exponentially modified protein abundance index (emPAI), the normalized spectral abundance factor (NSAF), and the distributed normalized spectral abundance factor (dNSAF).
We compared the reproducibility and the linearity relative to each protein’s abundance of the four spectral counting metrics. Our analysis suggests that NSAF yields the most reproducible counts across technical and biological replicates, and both SIN and NSAF achieve the best linearity.
With the crux spectral-counts command, Crux provides open-source modular methods to analyze mass spectrometry data for identifying and now quantifying peptides and proteins. The C++ source code, compiled binaries, spectra and sequence databases are available at
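Two of the four metrics named above have compact closed forms and can be sketched in a few lines. This is an illustration of the published definitions of NSAF and emPAI, not the Crux implementation; the function names and the toy counts are hypothetical, and SIN and dNSAF are omitted because they need fragment intensities and shared-peptide bookkeeping.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein.

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), where SpC is the
    spectral count and L the protein length in residues.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("at least one protein needs a non-zero spectral count")
    return {p: value / total for p, value in saf.items()}


def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index.

    emPAI = 10 ** (observed peptides / observable peptides) - 1.
    """
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10 ** (n_observed / n_observable) - 1


if __name__ == "__main__":
    # Toy values for illustration only.
    counts = {"protA": 10, "protB": 10}
    lengths = {"protA": 100, "protB": 200}
    print(nsaf(counts, lengths))
    print(empai(5, 10))
```

Because NSAF values sum to one within a sample, they are directly comparable across replicates, which is relevant to the reproducibility comparison described above.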
Circulative transmission of viruses in the Luteoviridae, such as cereal yellow dwarf virus (CYDV), requires a series of precisely orchestrated interactions between virus, plant, and aphid proteins. Natural selection has favored these viruses to be retained in the phloem to facilitate acquisition and transmission by aphids. We show that treatment of infected oat tissue homogenate with sodium sulfite reduces transmission of the purified virus by aphids. Transmission electron microscopy data indicated no gross change in virion morphology due to treatments. However, treated virions were not acquired by aphids through the hindgut epithelial cells and were not transmitted when injected directly into the hemocoel. Analysis of virus preparations using nanoflow liquid chromatography coupled to tandem mass spectrometry revealed a number of host plant proteins co-purifying with viruses, some of which were lost following sodium sulfite treatment. Using targeted mass spectrometry, we show data suggesting that several of the virus-associated host plant proteins accumulated to higher levels in aphids that were fed on CYDV-infected plants compared to healthy plants. We propose two hypotheses to explain these observations, and these are not mutually exclusive: (a) that sodium sulfite treatment disrupts critical virion-host protein interactions required for aphid transmission, or (b) that host infection with CYDV modulates phloem protein expression in a way that is favorable for virus uptake by aphids. Importantly, the genes coding for the plant proteins associated with virus may be examined as targets in breeding cereal crops for new modes of virus resistance that disrupt phloem-virus or aphid-virus interactions.
There are ongoing events where aircraft engine lubricant containing tricresyl phosphates (TCPs) contaminates aircraft cabins. Some individuals have experienced tremors or other neurological symptoms that may last for many months following exposures. Mass spectrometric (MS) protocols are being developed to determine the percentage of “biomarker proteins” that are modified by such exposures, specifically on active site serines. Both plasma butyrylcholinesterase (BChE) and red cell acylpeptide hydrolase (APH) are readily inhibited by 2-(o-cresyl)-4H-1:3:2:benzodioxaphosphoran-2-one (CBDP) or phenyl saligenin cyclic phosphate (PSP) and have the potential to provide information about the level of exposure of an individual. We have developed immunomagnetic bead-based single-step purification protocols for both BChE and APH and have characterized the active site serine adducts of BChE by MS.
Biomarkers; Tricresyl phosphate; CBDP; Butyrylcholinesterase; Acylpeptide hydrolase; Aerotoxic syndrome
We report a method for high-throughput, cost-efficient empirical discovery of optimal proteotypic peptides and fragment ions for targeted proteomics applications using in vitro-synthesized proteins. We demonstrate the approach using human transcription factors, which are typically difficult, low-abundance targets, with an overall success rate of 98%. We show further that targeted proteomic assays developed using our approach facilitate robust in vivo quantification of human transcription factors.
Despite advances in metabolic and postmetabolic labeling methods for quantitative proteomics, there remains a need for improved label-free approaches. This need is particularly pressing for workflows that incorporate affinity enrichment at the peptide level, where isobaric chemical labels such as isobaric tags for relative and absolute quantitation and tandem mass tags may prove problematic or where stable isotope labeling with amino acids in cell culture labeling cannot be readily applied. Skyline is a freely available, open source software tool for quantitative data processing and proteomic analysis. We expanded the capabilities of Skyline to process ion intensity chromatograms of peptide analytes from full scan mass spectral data (MS1) acquired during HPLC MS/MS proteomic experiments. Moreover, unlike existing programs, Skyline MS1 filtering can be used with mass spectrometers from four major vendors, which allows results to be compared directly across laboratories. The new quantitative and graphical tools now available in Skyline specifically support interrogation of multiple acquisitions for MS1 filtering, including visual inspection of peak picking and both automated and manual integration, key features often lacking in existing software. In addition, Skyline MS1 filtering displays retention time indicators from underlying MS/MS data contained within the spectral library to ensure proper peak selection. The modular structure of Skyline also provides well defined, customizable data reports and thus allows users to directly connect to existing statistical programs for post hoc data analysis. To demonstrate the utility of the MS1 filtering approach, we have carried out experiments on several MS platforms and have specifically examined the performance of this method to quantify two important post-translational modifications: acetylation and phosphorylation, in peptide-centric affinity workflows of increasing complexity using mouse and human models.
High-throughput proteomics experiments involving tandem mass spectrometry produce large volumes of complex data that require sophisticated computational analyses. As such, the field offers many challenges for computational biologists. In this article, we briefly introduce some of the core computational and statistical problems in the field and then describe a variety of outstanding problems that readers of PLoS Computational Biology might be able to help solve.
Traditionally, protein turnover has been measured using stable isotope labeled (SIL) tracers. The labeled tracer is incorporated into proteins; the proteins of interest are isolated, hydrolyzed into their amino acid constituents, and derivatized; and enrichment is measured via gas chromatography mass spectrometry. This method has significant limitations, including low throughput. In addition, its accuracy can be compromised by the efficacy of the protein isolation step, which limits each experiment to a single abundant protein that is easily purified. Herein, we present a method to determine protein turnover on a global scale using in-house developed software and shotgun proteomics. These developments allow for the determination of protein kinetics of over 1000 proteins in only a few hours. The method for producing labeled alveolar type 2 cells will be described in detail. The cells were grown in media containing 100% 2H3-leucine and were harvested at 3 time points: 4, 8, and 24 hours. Samples were analyzed by nano-flow liquid chromatography coupled to an LTQ-FT-ICR MS. In-house developed software, termed Topograph, was used to determine enrichment values and calculate half lives of all identified leucine-containing peptides. Preliminary results demonstrate that half lives could be calculated for approximately 1400 of the 2000 proteins detected. In addition, 2 different peptides from mature surfactant protein B, which is an essential component of properly functioning surfactant, showed close agreement (20.0 and 21.1 hours). This method will be used as a model to study various genetic regulators of surfactant composition and protein metabolism. We also demonstrate that a similar stable isotope tracer strategy can be applied in vivo toward measuring surfactant protein B using targeted proteomics.
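The relationship between labeling time points and a half life can be illustrated with a minimal sketch. It assumes simple first-order turnover with a fully labeled precursor pool, so that the fraction of newly synthesized (labeled) peptide follows f(t) = 1 − exp(−kt) and the half life is ln(2)/k. This is an assumption made for illustration and is not a description of how Topograph computes its values; the function names and the numbers are hypothetical.

```python
import math


def fit_turnover_rate(times_h, fraction_labeled):
    """Estimate a first-order rate constant k (per hour).

    Assumed model: f(t) = 1 - exp(-k * t). Linearizing gives
    -ln(1 - f) = k * t, which is fitted by least squares through the origin.
    """
    numerator = 0.0
    denominator = 0.0
    for t, f in zip(times_h, fraction_labeled):
        if not 0.0 <= f < 1.0:
            raise ValueError("labeled fractions must lie in [0, 1)")
        y = -math.log(1.0 - f)
        numerator += t * y
        denominator += t * t
    if denominator == 0.0:
        raise ValueError("need at least one non-zero time point")
    return numerator / denominator


def half_life(rate_constant):
    """Half life in hours for a first-order rate constant per hour."""
    return math.log(2) / rate_constant


if __name__ == "__main__":
    # Synthetic fractions generated from a 20 h half life, sampled at the
    # 4, 8, and 24 h time points; not data from the study.
    k_true = math.log(2) / 20.0
    times = [4.0, 8.0, 24.0]
    fractions = [1.0 - math.exp(-k_true * t) for t in times]
    print(half_life(fit_turnover_rate(times, fractions)))
```

With only three time points, the fit is sensitive to noise in the latest sample, where 1 − f is smallest, so a weighted or non-linear fit may be preferable in practice.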
Variation in RNA, protein, and metabolite levels among individuals is an important source of physiological and phenotypic differences within and between species. However, relatively little is known about the magnitude and genetic basis of these high-dimensional molecular phenotypes. Yeast provide an ideal model system for the genetic dissection of complex and quantitative traits, and whole-genome sequences are accumulating for dozens of Saccharomyces cerevisiae strains isolated from natural, industrial, and lab environments. We grew a diverse selection of sequenced strains in continuous culture and used a randomized and replicated study design. We exploited all the technologies in the Yeast Resource Center to obtain high quality and high coverage measurements of RNA, protein, metabolite, and morphological phenotypes. The resulting data sets provide a unique and powerful opportunity to combine comparative functional genomics data with comparative sequence analyses and delineate the genetic architecture of complex and quantitative phenotypes in yeast. Our initial analyses indicate that a high degree of strain-to-strain variation exists at all systems levels, and that this variation largely correlates with strain relatedness as measured by sequence comparison. Variation in RNA levels correlates with the corresponding peptides and related metabolites in complex ways. These experiments have resulted in an important large-scale data set of thousands of quantitative traits collected in a carefully designed randomized study, which will provide novel insights into the magnitude and patterns of natural variation of molecular and morphological phenotypes, as well as preliminary insights into their genetic basis.
Proteomics experiments based on Selected Reaction Monitoring (SRM, also referred to as Multiple Reaction Monitoring or MRM) are being used to target large numbers of protein candidates in complex mixtures. At present, instrument parameters are often optimized for each peptide, a time and resource intensive process. Large SRM experiments are greatly facilitated by having the ability to predict MS instrument parameters that work well with the broad diversity of peptides they target. For this reason, we investigated the impact of using simple linear equations to predict the collision energy (CE) on peptide signal intensity and compared it with the empirical optimization of the CE for each peptide and transition individually. Using optimized linear equations, the difference between predicted and empirically derived CE values was found to be an average gain of only 7.8% of total peak area. We also found that existing commonly used linear equations fall short of their potential, and should be recalculated for each charge state and when introducing new instrument platforms. We provide a fully automated pipeline for calculating these equations and individually optimizing CE of each transition on SRM instruments from Agilent, Applied Biosystems, Thermo-Scientific and Waters in the open source Skyline software tool (http://proteome.gs.washington.edu/software/skyline).
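A linear collision energy equation of the kind discussed above can be sketched as follows, assuming the conventional form CE = slope × (precursor m/z) + intercept, fitted separately for each precursor charge state from empirically optimized values. The function names and the coefficients in the example are hypothetical; this is not the Skyline pipeline.

```python
def fit_ce_equation(precursor_mz, optimal_ce):
    """Ordinary least-squares fit of CE = slope * m/z + intercept.

    Intended to be called once per precursor charge state with the
    empirically optimal CE of each peptide.
    """
    n = len(precursor_mz)
    if n < 2 or n != len(optimal_ce):
        raise ValueError("need at least two paired (m/z, CE) observations")
    mean_x = sum(precursor_mz) / n
    mean_y = sum(optimal_ce) / n
    sxx = sum((x - mean_x) ** 2 for x in precursor_mz)
    if sxx == 0:
        raise ValueError("precursor m/z values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(precursor_mz, optimal_ce))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_ce(mz, slope, intercept):
    """Predicted collision energy for a precursor m/z."""
    return slope * mz + intercept


if __name__ == "__main__":
    # Hypothetical doubly charged precursors lying exactly on a line.
    mz = [400.0, 500.0, 600.0, 700.0]
    ce = [0.034 * m + 3.314 for m in mz]
    slope, intercept = fit_ce_equation(mz, ce)
    print(slope, intercept, predict_ce(550.0, slope, intercept))
```

Fitting one equation per charge state, and refitting for each new instrument platform, follows the recommendation in the abstract above.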
Proper centromere function is critical to maintain genomic stability and to prevent aneuploidy, a hallmark of tumors and birth defects. A conserved feature of all eukaryotic centromeres is an essential histone H3 variant called CENP-A that requires a centromere targeting domain (CATD) for its localization. Although proteolysis prevents CENP-A from mislocalizing to euchromatin, regulatory factors have not been identified. Here, we identify an E3 ubiquitin ligase called Psh1 that leads to the degradation of Cse4, the budding yeast CENP-A homolog. Cse4 overexpression is toxic to psh1Δ cells and results in euchromatic localization. Strikingly, the Cse4 centromere targeting domain is a key regulator of its stability and helps Psh1 discriminate Cse4 from histone H3. Taken together, we propose that the CATD has a previously unknown role in maintaining the exclusive localization of Cse4 by preventing its mislocalization to euchromatin via Psh1-mediated degradation.
The problem of identifying proteins from a shotgun proteomics experiment has not been definitively solved. Identifying the proteins in a sample requires ranking them, ideally with interpretable scores. In particular, “degenerate” peptides, which map to multiple proteins, have made such a ranking difficult to compute. The problem of computing posterior probabilities for the proteins, which can be interpreted as confidence in a protein’s presence, has been especially daunting. Previous approaches have either ignored the peptide degeneracy problem completely, addressed it by computing a heuristic set of proteins or heuristic posterior probabilities, or estimated the posterior probabilities with sampling methods. We present a probabilistic model for protein identification in tandem mass spectrometry that recognizes peptide degeneracy. We then introduce graph-transforming algorithms that facilitate efficient computation of protein probabilities, even for large data sets. We evaluate our identification procedure on five different well-characterized data sets and demonstrate our ability to efficiently compute high-quality protein posteriors.
Electron-transfer dissociation (ETD) induces fragmentation along the peptide backbone by transferring an electron from a radical anion to a protonated peptide. In contrast with collision induced dissociation, side chains and modifications such as phosphorylation are left intact through the ETD process. Because the precursor charge state is an important input to MS/MS sequence database search tools, the ability to accurately determine the precursor charge is helpful for the identification process. Furthermore, because ETD can be applied to large, highly charged peptides, the need for accurate precursor charge state determination is magnified. Otherwise, each spectrum must be searched repeatedly using a large range of possible precursor charge states. To address this problem, we have developed an ETD charge state prediction tool based on support vector machine classifiers that is demonstrated to exhibit superior classification accuracy while minimizing the overall number of predicted charge states. The tool is freely available, open source, cross platform compatible, and demonstrated to perform well when compared with an existing charge state prediction tool. The program is available from http://code.google.com/p/etdz/.
electron transfer dissociation; charge state prediction; support vector machine; tandem mass spectrometry