Results 1-22 (22)
 

1.  Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates 
In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.
doi:10.1016/j.gpb.2012.11.004
PMCID: PMC3737598  PMID: 23499924
Fragmentation; Basicity; Fragment size; Ordinal regression
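The ordinal regression idea in the abstract above can be illustrated with a generic proportional-odds model. This is a minimal sketch, not Basophile's published equation: the feature choice (fragment length fraction, basic residue fraction), the weights, and the thresholds are all hypothetical values chosen only to make the example run.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probabilities(features, weights, thresholds):
    """Proportional-odds model: P(class <= j) = sigmoid(threshold_j - w.x).

    thresholds must be strictly increasing; the result has
    len(thresholds) + 1 class probabilities that sum to 1.
    """
    eta = sum(w * x for w, x in zip(weights, features))
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs

# Hypothetical fragment from a triply charged precursor:
# features = (fragment length / peptide length, fraction of basic residues in fragment)
features = (0.6, 0.5)
weights = (2.0, 3.0)        # hypothetical coefficients
thresholds = (1.5, 4.0)     # hypothetical cut points separating charge 1 | 2 | 3
charge_probs = ordinal_probabilities(features, weights, thresholds)
```

A search engine could then predict only those fragment charge states whose probability exceeds some cutoff, rather than every charge below that of the precursor.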
2.  Halogen Photoelimination from Dirhodium Phosphazane Complexes via Chloride-Bridged Intermediates 
Halogen photoelimination is a critical step in HX-splitting photocatalysis. Herein, we report the photoreduction of a pair of valence-isomeric dirhodium phosphazane complexes, and suggest that a common intermediate is accessed in the photochemistry of both mixed-valent and valence-symmetric complexes. The results of these investigations suggest that halogen photoelimination proceeds by two sequential photochemical reactions: ligand dissociation followed by subsequent halogen elimination.
doi:10.1039/C3SC50462J
PMCID: PMC3819227  PMID: 24224081
3.  QuaMeter: Multivendor Performance Metrics for LC–MS/MS Proteomics Instrumentation 
Analytical chemistry  2012;84(14):5845-5850.
LC-MS/MS-based proteomics studies rely on stable analytical system performance that can be evaluated by objective criteria. The National Institute of Standards and Technology (NIST) introduced the MSQC software to compute diverse metrics from experimental LC-MS/MS data, enabling quality analysis and quality control (QA/QC) of proteomics instrumentation. In practice, however, several attributes of the MSQC software prevent its use for routine instrument monitoring. Here, we present QuaMeter, an open-source tool that improves MSQC in several aspects. QuaMeter can directly read raw data from instruments manufactured by different vendors. The software can work with a wide variety of peptide identification software for improved reliability and flexibility. Finally, QC metrics implemented in QuaMeter are rigorously defined and tested. The source code and binary versions of QuaMeter are available under Apache 2.0 License at http://fenchurch.mc.vanderbilt.edu.
doi:10.1021/ac300629p
PMCID: PMC3730131  PMID: 22697456
4.  The 2012/2013 PRG Study: Assessing Longitudinal Variability in Routine Peptide LC-MS/MS Analysis 
The PRG study for 2012-2013 was intended to catalog critical parameters of variability influencing LC-MS/MS data quality within laboratories over a nine month period between March and November, 2012. This study was intended to determine intra-laboratory reproducibility and inform participants of key areas of variability in routine peptide mass spectrometry analyses. Aliquots of a dried, digested protein mixture were sent to all participants with the expectation that once per month a new vial will be reconstituted and analyzed using routine LC-MS and data-dependent MS/MS acquisition settings. Of key importance in the design of this study is the lack of a standard operating protocol. The goal was to measure the degree of reproducibility within a lab as it applies to their established HPLC and MS settings and QC measures. A survey was conducted with each sample submission to catalog individual laboratory practices, instrument configurations, acquisition settings, and routine and non-routine maintenance procedures. Over 80 participants submitted at least one data set, and 36 participants completed the study with 8 or more submissions over the 9 month period. Survey data revealed the vast majority of laboratories (>90%) perform routine QC to determine system suitability, but there was considerable variability in the type and frequency of QC analysis. Collected raw data was searched using identical parameters by the PRG and analyzed for more than 40 MS and MS/MS metrics using the software QuaMeter. The software tool generates metrics that assess multiple properties of LC-MS/MS, from extracted ion chromatogram peak width to total ion current distribution and MS sampling rates. Both identification-dependent and identification-independent metrics can be generated. The variability within these metrics across time was analyzed for each participant and correlative relationships with survey results will be presented.
PMCID: PMC3635275
5.  [No title available] 
The PRG study for 2012–2013 was intended to catalog critical parameters of variability influencing LC-MS/MS data quality within laboratories over a nine month period between March and November, 2012. This study was intended to determine intra-laboratory reproducibility and inform participants of key areas of variability in routine peptide mass spectrometry analyses. Aliquots of a dried, digested protein mixture were sent to all participants with the expectation that once per month a new vial will be reconstituted and analyzed using routine LC-MS and data-dependent MS/MS acquisition settings. Of key importance in the design of this study is the lack of a standard operating protocol. The goal was to measure the degree of reproducibility within a lab as it applies to their established HPLC and MS settings and QC measures. A survey was conducted with each sample submission to catalog individual laboratory practices, instrument configurations, acquisition settings, and routine and non-routine maintenance procedures. Over 80 participants submitted at least one data set, and 36 participants completed the study with 8 or more submissions over the 9 month period. Survey data revealed the vast majority of laboratories (90%) perform routine QC to determine system suitability, but there was considerable variability in the type and frequency of QC analysis. Collected raw data was searched using identical parameters by the PRG and analyzed for more than 40 MS and MS/MS metrics using the software QuaMeter. The software tool generates metrics that assess multiple properties of LC-MS/MS, from extracted ion chromatogram peak width to total ion current distribution and MS sampling rates. Both identification-dependent and identification-independent metrics can be generated. The variability within these metrics across time was analyzed for each participant and correlative relationships with survey results will be presented.
PMCID: PMC3635300
6.  A Cross-platform Toolkit for Mass Spectrometry and Proteomics 
Nature biotechnology  2012;30(10):918-920.
Mass-spectrometry-based proteomics has become an important component of biological research. Numerous proteomics methods have been developed to identify and quantify the proteins in biological and clinical samples [1], identify pathways affected by endogenous and exogenous perturbations [2], and characterize protein complexes [3]. Despite successes, the interpretation of vast proteomics datasets remains a challenge. There have been several calls for improvements and standardization of proteomics data analysis frameworks, as well as for an application-programming interface for proteomics data access [4,5]. In response, we have developed the ProteoWizard Toolkit, a robust set of open-source software libraries and applications designed to facilitate proteomics research. The libraries implement the first-ever, non-commercial, unified data access interface for proteomics, bridging field-standard open formats and all common vendor formats. In addition, diverse software classes enable rapid development of vendor-agnostic proteomics software. ProteoWizard projects and applications, building upon the core libraries, are also becoming standard tools for enabling significant proteomics inquiries.
doi:10.1038/nbt.2377
PMCID: PMC3471674  PMID: 23051804
7.  The HUPO proteomics standards initiative- mass spectrometry controlled vocabulary 
Controlled vocabularies (CVs), i.e. a collection of predefined terms describing a modeling domain, used for the semantic annotation of data, and ontologies are used in structured data formats and databases to avoid inconsistencies in annotation, to have a unique (and preferably short) accession number and to give researchers and computer algorithms the possibility for more expressive semantic annotation of data. The Human Proteome Organization (HUPO)–Proteomics Standards Initiative (PSI) makes extensive use of ontologies/CVs in their data formats. The PSI-Mass Spectrometry (MS) CV contains all the terms used in the PSI MS–related data standards. The CV contains a logical hierarchical structure to ensure ease of maintenance and the development of software that makes use of complex semantics. The CV contains terms required for a complete description of an MS analysis pipeline used in proteomics, including sample labeling, digestion enzymes, instrumentation parts and parameters, software used for identification and quantification of peptides/proteins and the parameters and scores used to determine their significance. Owing to the range of topics covered by the CV, collaborative development across several PSI working groups, including proteomics research groups, instrument manufacturers and software vendors, was necessary. In this article, we describe the overall structure of the CV, the process by which it has been developed and is maintained and the dependencies on other ontologies.
Database URL: http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo
doi:10.1093/database/bat009
PMCID: PMC3594986  PMID: 23482073
8.  Pepitome: evaluating improved spectral library search for identification complementarity and quality assessment 
Journal of Proteome Research  2012;11(3):1686-1695.
Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines.
doi:10.1021/pr200874e
PMCID: PMC3292681  PMID: 22217208
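The dot product score that the abstract above contrasts with Pepitome's statistical scoring can be sketched as follows. This is a minimal illustration, not the SpectraST or Pepitome implementation; the 1.0 m/z bin width, the function names, and the peak lists are assumptions.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned

def normalized_dot_product(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two binned spectra (0 = no overlap, 1 = identical)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two peaks that fall in the same bin count as a full match however far apart they sit inside it, which is one way to see the abstract's point that this score ignores fragment ion m/z discrepancies.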
9.  Supporting tool suite for production proteomics 
Bioinformatics  2011;27(22):3214-3215.
Summary: The large amount of data produced by proteomics experiments requires effective bioinformatics tools for the integration of data management and data analysis. Here we introduce a suite of tools developed at Vanderbilt University to support production proteomics. We present the Backup Utility Service tool for automated instrument file backup and the ScanSifter tool for data conversion. We also describe a queuing system to coordinate identification pipelines and the File Collector tool for batch copying analytical results. These tools are individually useful but collectively reinforce each other. They are particularly valuable for proteomics core facilities or research institutions that need to manage multiple mass spectrometers. With minor changes, they could support other types of biomolecular resource facilities.
Availability and Implementation: Source code and executable versions are available under Apache 2.0 License at http://www.vicc.org/jimayersinstitute/data/
Contact: daniel.liebler@vanderbilt.edu
doi:10.1093/bioinformatics/btr544
PMCID: PMC3208394  PMID: 21965817
10.  ScanRanker: Quality Assessment of Tandem Mass Spectra via Sequence Tagging 
Journal of proteome research  2011;10(7):2896-2904.
In shotgun proteomics, protein identification by tandem mass spectrometry relies on bioinformatics tools. Despite recent improvements in identification algorithms, a significant number of high quality spectra remain unidentified for various reasons. Here we present ScanRanker, an open-source tool that evaluates the quality of tandem mass spectra via sequence tagging with reliable performance in data from different instruments. The superior performance of ScanRanker enables it not only to find unassigned high quality spectra that evade identification through database search, but also to select spectra for de novo sequencing and cross-linking analysis. In addition, we demonstrate that the distribution of ScanRanker scores predicts the richness of identifiable spectra among multiple LC-MS/MS runs in an experiment, and ScanRanker scores assist the process of peptide assignment validation to increase confident spectrum identifications. The source code and executable versions of ScanRanker are available from http://fenchurch.mc.vanderbilt.edu.
doi:10.1021/pr200118r
PMCID: PMC3128668  PMID: 21520941
spectral quality; sequence tagging; bioinformatics; tandem mass spectrometry; cross-linking
11.  The mzIdentML Data Standard for Mass Spectrometry-Based Proteomics Results 
Molecular & Cellular Proteomics : MCP  2012;11(7):M111.014381.
We report the release of mzIdentML, an exchange standard for peptide and protein identification data, designed by the Proteomics Standards Initiative. The format was developed by the Proteomics Standards Initiative in collaboration with instrument and software vendors, and the developers of the major open-source projects in proteomics. Software implementations have been developed to enable conversion from most popular proprietary and open-source formats, and mzIdentML will soon be supported by the major public repositories. These developments enable proteomics scientists to start working with the standard for exchanging and publishing data sets in support of publications and they provide a stable platform for bioinformatics groups and commercial software vendors to work with a single file format for identification data.
doi:10.1074/mcp.M111.014381
PMCID: PMC3394945  PMID: 22375074
12.  Sequence tagging reveals unexpected modifications in toxicoproteomics 
Chemical research in toxicology  2011;24(2):204-216.
Toxicoproteomic samples are rich in posttranslational modifications (PTMs) of proteins. Identifying these modifications via standard database searching can incur significant performance penalties. Here we describe the latest developments in TagRecon, an algorithm that leverages inferred sequence tags to identify modified peptides in toxicoproteomic data sets. TagRecon identifies known modifications more effectively than the MyriMatch database search engine. TagRecon outperformed state of the art software in recognizing unanticipated modifications from LTQ, Orbitrap, and QTOF data sets. We developed user-friendly software for detecting persistent mass shifts from samples. We follow a three-step strategy for detecting unanticipated PTMs in samples. First, we identify the proteins present in the sample with a standard database search. Next, identified proteins are interrogated for unexpected PTMs with a sequence tag-based search. Finally, additional evidence is gathered for the detected mass shifts with a refinement search. Application of this technology on toxicoproteomic data sets revealed unintended cross-reactions between proteins and sample processing reagents. Twenty-five proteins in rat liver showed signs of oxidative stress when exposed to potentially toxic drugs. These results demonstrate the value of mining toxicoproteomic data sets for modifications.
doi:10.1021/tx100275t
PMCID: PMC3042045  PMID: 21214251
13.  The 2012 PRG study: Assessing Longitudinal Variability in Routine Peptide LC-MS/MS Analysis 
The PRG study for 2012 is intended to catalog critical parameters of variability influencing LC-MS/MS data quality within labs over a nine month period between March and November, 2012. This study is intended to inform participant labs of key areas of variability in their routine qualitative and quantitative analyses. A dried digested protein mix is sent to labs in aliquots with the expectation that once per month a new vial will be reconstituted and analyzed using routine LC-MS and data-dependent MS/MS acquisition settings. Participants will return the raw data to a centralized server for analysis. The analysis consists of 42 MS and MS/MS metrics that have been determined through the efforts of the CPTC consortium and implemented in open source software from NIST (“MSQC”) and Vanderbilt University (“QuaMeter”). Of key importance in the design of this study is the lack of a standard operating protocol. The concept is to determine variability within a lab when that lab uses their own routine settings and QC measures. A survey is conducted with each sample submission to catalog changes in operators, acquisition settings, as well as routine and non-routine maintenance procedures. To date, 95 labs in 23 countries have requested samples. Within these labs are 25 different models of mass spectrometers from 6 commercial vendors.
PMCID: PMC3630542
14.  The 2012 PRG Study: Assessing Longitudinal Variability in Routine Peptide LC-MS/MS Analysis 
The PRG study for 2012 is intended to catalog critical parameters of variability influencing LC-MS/MS data quality within labs over a nine month period between March and November, 2012. This study is intended to inform participant labs of key areas of variability in their routine qualitative and quantitative analyses. A dried digested protein mix is sent to labs in aliquots with the expectation that once per month a new vial will be reconstituted and analyzed using routine LC-MS and data-dependent MS/MS acquisition settings. Participants will return the raw data to a centralized server for analysis. The analysis consists of 42 MS and MS/MS metrics that have been determined through the efforts of the CPTC consortium and implemented in open source software from NIST (“MSQC”) and Vanderbilt University (“QuaMeter”). Of key importance in the design of this study is the lack of a standard operating protocol. The concept is to determine variability within a lab when that lab uses their own routine settings and QC measures. A survey is conducted with each sample submission to catalog changes in operators, acquisition settings, as well as routine and non-routine maintenance procedures. To date, 95 labs in 23 countries have requested samples. Within these labs are 25 different models of mass spectrometers from 6 commercial vendors.
PMCID: PMC3630553
15.  TraML—A Standard Format for Exchange of Selected Reaction Monitoring Transition Lists 
Molecular & Cellular Proteomics : MCP  2011;11(4):R111.015040.
Targeted proteomics via selected reaction monitoring is a powerful mass spectrometric technique affording higher dynamic range, increased specificity and lower limits of detection than other shotgun mass spectrometry methods when applied to proteome analyses. However, it involves selective measurement of predetermined analytes, which requires more preparation in the form of selecting appropriate signatures for the proteins and peptides that are to be targeted. There is a growing number of software programs and resources for selecting optimal transitions and the instrument settings used for the detection and quantification of the targeted peptides, but the exchange of this information is hindered by a lack of a standard format. We have developed a new standardized format, called TraML, for encoding transition lists and associated metadata. In addition to introducing the TraML format, we demonstrate several implementations across the community, and provide semantic validators, extensive documentation, and multiple example instances to demonstrate correctly written documents. Widespread use of TraML will facilitate the exchange of transitions, reduce time spent handling incompatible list formats, increase the reusability of previously optimized transitions, and thus accelerate the widespread adoption of targeted proteomics via selected reaction monitoring.
doi:10.1074/mcp.R111.015040
PMCID: PMC3322582  PMID: 22159873
16.  TagRecon: High-Throughput Mutation Identification through Sequence Tagging 
Journal of proteome research  2010;9(4):1716-1726.
Shotgun proteomics produces collections of tandem mass spectra that contain all the data needed to identify mutated peptides from clinical samples. Identifying these sequence variations, however, has not been feasible with conventional database search strategies, which require exact matches between observed and expected sequences. Searching for mutations as mass shifts on specified residues through database search can incur significant performance penalties and generate substantial false positive rates. Here we describe TagRecon, an algorithm that leverages inferred sequence tags to identify unanticipated mutations in clinical proteomic data sets. TagRecon identifies unmodified peptides as sensitively as the related MyriMatch database search engine. In both LTQ and Orbitrap data sets, TagRecon outperformed state of the art software in recognizing sequence mismatches from data sets with known variants. We developed guidelines for filtering putative mutations from clinical samples, and we applied them in an analysis of cancer cell lines and an examination of colon tissue. Mutations were found in up to 6% of identified peptides, and only a small fraction corresponded to dbSNP entries. The RKO cell line, which is DNA mismatch repair deficient, yielded more mutant peptides than the mismatch repair proficient SW480 line. Analysis of colon cancer tumor and adjacent tissue revealed hydroxyproline modifications associated with extracellular matrix degradation. These results demonstrate the value of using sequence tagging algorithms to fully interrogate clinical proteomic data sets.
doi:10.1021/pr900850m
PMCID: PMC2859315  PMID: 20131910
mutation; bioinformatics; hydroxyproline; sequence tagging
17.  Skyline: an open source document editor for creating and analyzing targeted proteomics experiments 
Bioinformatics  2010;26(7):966-968.
Summary: Skyline is a Windows client application for targeted proteomics method creation and quantitative data analysis. It is open source and freely available for academic and commercial use. The Skyline user interface simplifies the development of mass spectrometer methods and the analysis of data from targeted proteomics experiments performed using selected reaction monitoring (SRM). Skyline supports using and creating MS/MS spectral libraries from a wide variety of sources to choose SRM filters and verify results based on previously observed ion trap data. Skyline exports transition lists to and imports the native output files from Agilent, Applied Biosystems, Thermo Fisher Scientific and Waters triple quadrupole instruments, seamlessly connecting mass spectrometer output back to the experimental design document. The fast and compact Skyline file format is easily shared, even for experiments requiring many sample injections. A rich array of graphs displays results and provides powerful tools for inspecting data integrity as data are acquired, helping instrument operators to identify problems early. The Skyline dynamic report designer exports tabular data from the Skyline document model for in-depth analysis with common statistical tools.
Availability: Single-click, self-updating web installation is available at http://proteome.gs.washington.edu/software/skyline. This web site also provides access to instructional videos, a support board, an issues list and a link to the source code project.
Contact: brendanx@u.washington.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq054
PMCID: PMC2844992  PMID: 20147306
18.  mzML—a Community Standard for Mass Spectrometry Data 
Molecular & Cellular Proteomics : MCP  2010;10(1):R110.000133.
Mass spectrometry is a fundamental tool for discovery and analysis in the life sciences. With the rapid advances in mass spectrometry technology and methods, it has become imperative to provide a standard output format for mass spectrometry data that will facilitate data sharing and analysis. Initially, the efforts to develop a standard format for mass spectrometry data resulted in multiple formats, each designed with a different underlying philosophy. To resolve the issues associated with having multiple formats, vendors, researchers, and software developers convened under the banner of the HUPO PSI to develop a single standard. The new data format incorporated many of the desirable technical attributes from the previous data formats, while adding a number of improvements, including features such as a controlled vocabulary with validation tools to ensure consistent usage of the format, improved support for selected reaction monitoring data, and immediately available implementations to facilitate rapid adoption by the community. The resulting standard data format, mzML, is a well tested open-source format for mass spectrometer output files that can be readily utilized by the community and easily adapted for incremental advances in mass spectrometry technology.
doi:10.1074/mcp.R110.000133
PMCID: PMC3013463  PMID: 20716697
19.  IDPicker 2.0: Improved Protein Assembly with High Discrimination Peptide Identification Filtering 
Journal of proteome research  2009;8(8):3872-3881.
Tandem mass spectrometry-based shotgun proteomics has become a widespread technology for analyzing complex protein mixtures. A number of database searching algorithms have been developed to assign peptide sequences to tandem mass spectra. Assembling the peptide identifications to proteins, however, is a challenging issue because many peptides are shared among multiple proteins. IDPicker is an open-source protein assembly tool that derives a minimum protein list from peptide identifications filtered to a specified False Discovery Rate. Here, we update IDPicker to increase confident peptide identifications by combining multiple scores produced by database search tools. By segregating peptide identifications for thresholding using both the precursor charge state and the number of tryptic termini, IDPicker retrieves more peptides for protein assembly. The new version is more robust against false positive proteins, especially in searches using multispecies databases, by requiring additional novel peptides in the parsimony process. IDPicker has been designed for incorporation in many identification workflows by the addition of a graphical user interface and the ability to read identifications from the pepXML format. These advances position IDPicker for high peptide discrimination and reliable protein assembly in large-scale proteomics studies. The source code and binaries for the latest version of IDPicker are available from http://fenchurch.mc.vanderbilt.edu/.
doi:10.1021/pr900360j
PMCID: PMC2810655  PMID: 19522537
bioinformatics; parsimony; protein assembly; protein inference; false discovery rate
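Filtering peptide identifications to a specified False Discovery Rate, as described in the abstract above, is commonly done with a target-decoy estimate. The sketch below is an illustration under stated assumptions, not IDPicker's code: it assumes a single score where higher is better, a boolean decoy flag per identification, and the estimator FDR = 2 x decoys / (targets + decoys). Score ties are not handled specially.

```python
def score_threshold_for_fdr(psms, max_fdr):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; higher score means a better match.
    Returns None if no cutoff satisfies the requested FDR.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Assumed estimator: each decoy hit implies one false target hit.
        fdr = 2.0 * decoys / (targets + decoys)
        if fdr <= max_fdr:
            best_cutoff = score
    return best_cutoff
```

To mimic the segregation the abstract describes, identifications would first be grouped by precursor charge state and number of tryptic termini, and this function applied to each group separately.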
20.  Proteomic Parsimony through Bipartite Graph Analysis Improves Accuracy and Transparency 
Journal of proteome research  2007;6(9):3549-3557.
Assembling peptides identified from LC–MS/MS spectra into a list of proteins is a critical step in analyzing shotgun proteomics data. As one peptide sequence can be mapped to multiple proteins in a database, naïve protein assembly can substantially overstate the number of proteins found in samples. We model the peptide–protein relationships in a bipartite graph and use efficient graph algorithms to identify protein clusters with shared peptides and to derive the minimal list of proteins. We test the effects of this parsimony analysis approach using MS/MS data sets generated from a defined human protein mixture, a yeast whole cell extract, and a human serum proteome after MARS column depletion. The results demonstrate that the bipartite parsimony technique not only simplifies protein lists but also improves the accuracy of protein identification. We use bipartite graphs for the visualization of the protein assembly results to render the parsimony analysis process transparent to users. Our approach also groups functionally related proteins together and improves the comprehensibility of the results. We have implemented the tool in the IDPicker package. The source code and binaries for this protein assembly pipeline are available under Mozilla Public License at the following URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
doi:10.1021/pr070230d
PMCID: PMC2810678  PMID: 17676885
parsimony analysis; bipartite graph; shotgun proteomics; LC-MS/MS; protein assembly
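The parsimony step in the abstract above, deriving a short protein list from shared peptides, can be approximated with a greedy set cover. This is a minimal sketch and not the IDPicker implementation: the greedy heuristic does not guarantee a truly minimal list, and the function name and input layout are assumptions.

```python
from collections import defaultdict

def parsimonious_proteins(protein_to_peptides):
    """Greedy cover of all peptides by protein groups.

    protein_to_peptides: dict mapping protein name -> set of peptide sequences.
    Returns a list of protein groups (each a sorted list of names); proteins
    with identical peptide sets are indistinguishable and share one group.
    """
    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)

    uncovered = set().union(*groups.keys())
    selected = []
    while uncovered:
        # Choose the group explaining the most uncovered peptides;
        # break ties alphabetically so the result is deterministic.
        best = min(groups, key=lambda peps: (-len(peps & uncovered), sorted(groups[peps])))
        selected.append(sorted(groups[best]))
        uncovered -= best
    return selected

example = {
    "A": {"p1", "p2", "p3"},
    "B": {"p1", "p2"},   # subsumed by A
    "C": {"p3"},         # subsumed by A
    "D": {"p4"},
    "E": {"p4"},         # indistinguishable from D
}
protein_list = parsimonious_proteins(example)
```

In this example, proteins B and C are dropped because A already explains their peptides, while D and E are reported together as one group.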
21.  DirecTag: Accurate Sequence Tags from Peptide MS/MS through Statistical Scoring 
Journal of proteome research  2008;7(9):3838-3846.
In shotgun proteomics, tandem mass spectra of peptides are typically identified through database search algorithms such as Sequest. We have developed DirecTag, an open-source algorithm to infer partial sequence tags directly from observed fragment ions. This algorithm is unique in its implementation of three separate scoring systems to evaluate each tag on the basis of peak intensity, m/z fidelity, and complementarity. In data sets from several types of mass spectrometers, DirecTag reproducibly exceeded the accuracy and speed of InsPecT and GutenTag, two previously published algorithms for this purpose. The source code and binaries for DirecTag are available from http://fenchurch.mc.vanderbilt.edu.
doi:10.1021/pr800154p
PMCID: PMC2810657  PMID: 18630943
sequence tagging; bioinformatics; de novo; multi-platform; peptide identification
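Inferring sequence tags from fragment ions, as in the abstract above, starts from the observation that consecutive fragment peaks differ by the mass of one amino acid residue. The sketch below only enumerates candidate tags; it does not implement DirecTag's intensity, m/z fidelity, or complementarity scoring. It assumes singly charged fragments, a partial table of monoisotopic residue masses, and a hypothetical 0.01 m/z tolerance.

```python
# Monoisotopic residue masses (partial table; L and I are isobaric).
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "E": 129.04259, "F": 147.06841, "Y": 163.06333,
}

def find_tags(peak_mzs, tag_length=3, tolerance=0.01):
    """Enumerate residue strings whose masses match successive peak gaps."""
    peaks = sorted(peak_mzs)
    tags = set()

    def extend(last_mz, tag):
        if len(tag) == tag_length:
            tags.add(tag)
            return
        for mz in peaks:
            gap = mz - last_mz
            if gap <= 0:
                continue
            for residue, mass in RESIDUE_MASSES.items():
                if abs(gap - mass) <= tolerance:
                    extend(mz, tag + residue)

    for start in peaks:
        extend(start, "")
    return sorted(tags)

# Hypothetical peak list: gaps of G, A, S after the 200.0 peak, plus one unrelated peak.
tags = find_tags([200.0, 257.02146, 328.05857, 415.0906, 500.0])
```

Tags are read in order of increasing m/z, so whether a tag corresponds to the forward or reversed peptide sequence depends on which ion series produced the peaks.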
22.  MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis 
Journal of proteome research  2007;6(2):654-661.
Shotgun proteomics experiments are dependent upon database search engines to identify peptides from tandem mass spectra. Many of these algorithms score potential identifications by evaluating the number of fragment ions matched between each peptide sequence and an observed spectrum. These systems, however, generally do not distinguish between matching an intense peak and matching a minor peak. We have developed a statistical model to score peptide matches that is based upon the multivariate hypergeometric distribution. This scorer, part of the “MyriMatch” database search engine, places greater emphasis on matching intense peaks. The probability that the best match for each spectrum has occurred by random chance can be employed to separate correct matches from random ones. We evaluated this software on data sets from three different laboratories employing three different ion trap instruments. Employing a novel system for testing discrimination, we demonstrate that stratifying peaks into multiple intensity classes improves the discrimination of scoring. We compare MyriMatch results to those of Sequest and X!Tandem, revealing that it is capable of higher discrimination than either of these algorithms. When minimal peak filtering is employed, performance plummets for a scoring model that does not stratify matched peaks by intensity. On the other hand, we find that MyriMatch discrimination improves as more peaks are retained in each spectrum. MyriMatch also scales well to tandem mass spectra from high-resolution mass analyzers. These findings may indicate limitations for existing database search scorers that count matched peaks without differentiating them by intensity. The software and source code are available under Mozilla Public License at this URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
doi:10.1021/pr0604054
PMCID: PMC2525619  PMID: 17269722
Proteomics; Identification; Statistical Distribution; Reversed Database; Peak Filtering
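The multivariate hypergeometric distribution named in the abstract above gives the chance of drawing a particular mix of items from several classes without replacement. The sketch below computes that probability and shows why matches to intense peaks are less likely to arise by chance. It is an illustration, not MyriMatch's scorer; the class sizes (numbers of m/z bins per intensity class) and match counts are hypothetical.

```python
from math import comb, log

def multivariate_hypergeom_pmf(class_sizes, draws_per_class):
    """P(drawing exactly draws_per_class[i] items from each class i)
    when sum(draws_per_class) items are drawn without replacement."""
    total = sum(class_sizes)
    n_draws = sum(draws_per_class)
    numerator = 1
    for size, drawn in zip(class_sizes, draws_per_class):
        numerator *= comb(size, drawn)   # comb returns 0 if drawn > size
    return numerator / comb(total, n_draws)

# Hypothetical spectrum: 10 intense, 20 medium, 40 weak peak bins, 930 empty bins.
class_sizes = [10, 20, 40, 930]
# Ten predicted fragments; three match peaks, seven land on empty bins.
p_intense = multivariate_hypergeom_pmf(class_sizes, [3, 0, 0, 7])
p_weak = multivariate_hypergeom_pmf(class_sizes, [0, 0, 3, 7])
score_intense = -log(p_intense)
score_weak = -log(p_weak)
```

Because there are fewer intense peaks to hit at random, three intense matches have a lower chance probability, and therefore a higher score, than three weak matches.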
