1.  Rescue of the 1947 Zika Virus Prototype Strain with a Cytomegalovirus Promoter-Driven cDNA Clone 
mSphere  2016;1(5):e00246-16.
ABSTRACT
The recent Zika virus (ZIKV) outbreak has been linked to severe pathogenesis. Here, we report the construction of a plasmid carrying a cytomegalovirus (CMV) promoter-expressed prototype 1947 Uganda MR766 ZIKV cDNA that can initiate infection following direct plasmid DNA transfection of mammalian cells. Incorporation of a synthetic intron in the nonstructural protein 1 (NS1) region of the ZIKV polyprotein reduced viral cDNA-associated toxicity in bacteria. High levels of infectious virus were produced following transfection of the plasmid bearing the wild-type MR766 ZIKV genome, but not one with a disruption to the viral nonstructural protein 5 (NS5) polymerase active site. Multicycle growth curve and plaque assay experiments indicated that the MR766 virus resulting from plasmid transfection exhibited growth characteristics that were more similar to its parental isolate than previously published 2010 Cambodia and 2015 Brazil cDNA-rescued ZIKV. This ZIKV infectious clone will be useful for investigating the genetic determinants of ZIKV infection and pathogenesis and should be amenable to construction of diverse infectious clones expressing reporter proteins and representing a range of ZIKV isolates.
IMPORTANCE The study of ZIKV, which has become increasingly important with the recent association of this virus with microcephaly and Guillain-Barré syndrome, would benefit from an efficient strategy to genetically manipulate the virus. This work describes a model system to produce infectious virus in cell culture. We created a plasmid carrying the prototype 1947 Uganda MR766 ZIKV genome that both was stable in bacteria and could produce high levels of infectious virus in mammalian cells through direct delivery of this DNA. Furthermore, growth properties of this rescued virus closely resembled those of the viral isolate from which it was derived. This model system will provide a simple and effective means to study how ZIKV genetics impact viral replication and pathogenesis.
doi:10.1128/mSphere.00246-16
PMCID: PMC5040786  PMID: 27704051
Zika virus; cell culture; flavivirus; infectious clones
2.  Reproducibility of Differential Proteomic Technologies in CPTAC Fractionated Xenografts 
Journal of Proteome Research  2015;15(3):691-706.
The NCI Clinical Proteomic Tumor Analysis Consortium (CPTAC) employed a pair of reference xenograft proteomes for initial platform validation and ongoing quality control of its data collection for The Cancer Genome Atlas (TCGA) tumors. These two xenografts, representing basal and luminal-B human breast cancer, were fractionated and analyzed on six mass spectrometers in a total of 46 replicates divided between iTRAQ and label-free technologies, spanning a total of 1095 LC–MS/MS experiments. These data represent a unique opportunity to evaluate the stability of proteomic differentiation by mass spectrometry over many months of time for individual instruments or across instruments running dissimilar workflows. We evaluated iTRAQ reporter ions, label-free spectral counts, and label-free extracted ion chromatograms as strategies for data interpretation (source code is available from http://homepages.uc.edu/~wang2x7/Research.htm). From these assessments, we found that differential genes from a single replicate were confirmed by other replicates on the same instrument from 61 to 93% of the time. When comparing across different instruments and quantitative technologies, using multiple replicates, differential genes were reproduced by other data sets from 67 to 99% of the time. Projecting gene differences to biological pathways and networks increased the degree of similarity. These overlaps send an encouraging message about the maturity of technologies for proteomic differentiation.
doi:10.1021/acs.jproteome.5b00859
PMCID: PMC4779376  PMID: 26653538
Differential proteomics; label-free; iTRAQ; quality control; xenografts; technology assessment; CPTAC
3.  proBAMsuite, a Bioinformatics Framework for Genome-Based Representation and Analysis of Proteomics Data 
To facilitate genome-based representation and analysis of proteomics data, we developed a new bioinformatics framework, proBAMsuite, in which a central component is the protein BAM (proBAM) file format for organizing peptide spectrum matches (PSMs) within the context of the genome. proBAMsuite also includes two R packages, proBAMr and proBAMtools, for generating and analyzing proBAM files, respectively. Applying proBAMsuite to three recently published proteomics datasets, we demonstrated its utility in facilitating efficient genome-based sharing, interpretation, and integration of proteomics data. First, the interpretation of proteomics data is significantly enhanced with the rich genomic annotation information. Second, PSMs can be easily reannotated using user-specified gene annotation schemes and assembled into both protein and gene identifications. Third, using the genome as a common reference, proBAMsuite facilitates seamless proteomics and proteogenomics data integration. Finally, proBAM files can be readily visualized in genome browsers and thus bring proteomics data analysis to a general audience beyond the proteomics community. Results from this study establish proBAMsuite as a useful bioinformatics framework for proteomics and proteogenomics research.
doi:10.1074/mcp.M115.052860
PMCID: PMC4813696  PMID: 26657539
4.  Viral Determinants of miR-122-Independent Hepatitis C Virus Replication 
mSphere  2015;1(1):e00009-15.
ABSTRACT
Hepatitis C virus (HCV) replication requires binding of the liver-specific microRNA (miRNA) miR-122 to two sites in the HCV 5′ untranslated region (UTR). Although we and others have shown that viral genetics impact the amount of active miR-122 required for replication, it is unclear if HCV can replicate in the complete absence of this miRNA. To probe the absolute requirements for miR-122 and the genetic basis for those requirements, we used clustered regularly interspaced short palindromic repeat (CRISPR) technology to knock out miR-122 in Huh-7.5 cells and reconstituted these knockout (KO) cells with either wild-type miR-122 or a mutated version of this miRNA. We then characterized the replication of the wild-type virus, as well as a mutated HCV bearing 5′ UTR substitutions to restore binding to the mutated miR-122, in miR-122 KO Huh-7.5 cells expressing no, wild-type, or mutated miR-122. We found that while replication was most efficient when wild-type or mutated HCV was provided with the matched miR-122, inefficient replication could be observed in cells expressing the mismatched miR-122 or no miR-122. We then selected viruses capable of replicating in cells expressing noncognate miR-122 RNAs. Unexpectedly, these viruses contained multiple mutations throughout their first 42 nucleotides that would not be predicted to enhance binding of the provided miR-122. These mutations increased HCV RNA replication in cells expressing either the mismatched miR-122 or no miR-122. These data provide new evidence that HCV replication can occur independently of miR-122 and provide unexpected insights into how HCV genetics influence miR-122 requirements.
IMPORTANCE Hepatitis C virus (HCV) is the leading cause of liver cancer in the Western Hemisphere. HCV infection requires miR-122, which is expressed only in liver cells, and thus is one reason that replication of this virus occurs efficiently only in cells of hepatic origin. To understand how HCV genetics impact miR-122 usage, we knocked out miR-122 using clustered regularly interspaced short palindromic repeat (CRISPR) technology and adapted virus to replicate in the presence of noncognate miR-122 RNAs. In doing so, we identified viral mutations that allow replication in the complete absence of miR-122. This work provides new insights into how HCV genetics influence miR-122 requirements and proves that replication can occur without this miRNA, which has broad implications for how HCV tropism is maintained.
doi:10.1128/mSphere.00009-15
PMCID: PMC4863629  PMID: 27303683
hepatitis C virus; microRNA; miR-122; CRISPR
5.  The Molecular Epidemiological Study of HCV Subtypes among Intravenous Drug Users and Non-Injection Drug Users in China 
PLoS ONE  2015;10(10):e0140263.
Background
More than half of intravenous drug users (IDUs) in China suffer from the Hepatitis C virus (HCV). The virus is also more prevalent in non-injection drug users (NIDUs) than in the general population. However, not much is known about HCV subtype distribution in these populations.
Methods
Our research team conducted a cross-sectional study in four provinces in China. We sampled 825 IDUs and 244 NIDUs (1162 total), genotyped each DU’s virus, and performed a phylogenetic analysis to differentiate HCV subtypes.
Results
Nucleic acid testing (NAT) determined that 82% (952/1162) of samples were HCV positive; we subtyped 90% (859/952) of these. We found multiple HCV subtypes: 3b (249, 29.0%), 3a (225, 26.2%), 6a (156, 18.2%), 1b (137, 15.9%), 6n (50, 5.9%), 1a (27, 3.1%), and 2a (15, 1.7%). An analysis of subtype distributions adjusted for province found statistically significant differences between HCV subtypes in IDUs and NIDUs.
Discussion
HCV subtypes 3b, 3a, 6a, and 1b were the most common in our study, together accounting for 89% of infections. The subtype distribution differences we found between IDUs and NIDUs suggested that sharing syringes was not the most likely pathway for HCV transmission in NIDUs. However, further studies are needed to elucidate how NIDUs were infected.
doi:10.1371/journal.pone.0140263
PMCID: PMC4605846  PMID: 26466103
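The headline proportions in this abstract can be recomputed from the counts it reports. The short sketch below is only an arithmetic check on the published numbers; the helper name `share` is an illustrative choice and not part of the study.

```python
# Subtype counts exactly as listed in the Results paragraph above.
subtype_counts = {
    "3b": 249, "3a": 225, "6a": 156, "1b": 137,
    "6n": 50, "1a": 27, "2a": 15,
}


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Total number of subtyped samples (the abstract reports 859).
subtyped = sum(subtype_counts.values())

# Combined count for the four most common subtypes named in the Discussion.
top_four = sum(subtype_counts[s] for s in ("3b", "3a", "6a", "1b"))

print(f"subtyped samples:   {subtyped}")
print(f"HCV-positive rate:  {share(952, 1162):.1f}%")
print(f"subtyping rate:     {share(subtyped, 952):.1f}%")
print(f"top-four share:     {share(top_four, subtyped):.1f}%")
```

Rounded to whole percentages, these values should agree with the 82%, 90%, and 89% quoted in the abstract.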
6.  Correcting systematic bias and instrument measurement drift with mzRefinery 
Bioinformatics  2015;31(23):3838-3840.
Motivation: Systematic bias in mass measurement adversely affects data quality and negates the advantages of high precision instruments.
Results: We introduce the mzRefinery tool for calibration of mass spectrometry data files. Using confident peptide spectrum matches, three different calibration methods are explored and the optimal transform function is chosen. After calibration, systematic bias is removed and the mass measurement errors are centered at 0 ppm. Because it is part of the ProteoWizard package, mzRefinery can read and write a wide variety of file formats.
Availability and implementation: The mzRefinery tool is part of msConvert, available with the ProteoWizard open source package at http://proteowizard.sourceforge.net/
Contact: samuel.payne@pnnl.gov
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btv437
PMCID: PMC4653383  PMID: 26243018
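To make "errors centered at 0 ppm" concrete, the sketch below computes mass errors in parts per million from confident matches and removes a single global offset. This is a deliberately simple illustration: the abstract does not name the three calibration methods that mzRefinery explores, so the median-shift approach, the function names, and the example m/z values here are all assumptions rather than the tool's implementation.

```python
from statistics import median


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass measurement error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def estimate_global_shift(pairs):
    """Median ppm error over (observed, theoretical) m/z pairs.

    Assumption: one constant ppm offset describes the systematic bias.
    """
    return median(ppm_error(obs, theo) for obs, theo in pairs)


def apply_shift(observed_mz: float, shift_ppm: float) -> float:
    """Undo a constant ppm offset on an observed m/z value."""
    return observed_mz / (1.0 + shift_ppm * 1e-6)


# Hypothetical confident matches with errors of +4, +5, and +6 ppm.
pairs = [
    (500.0 * (1 + 4e-6), 500.0),
    (750.5 * (1 + 5e-6), 750.5),
    (1000.25 * (1 + 6e-6), 1000.25),
]

shift = estimate_global_shift(pairs)
residuals = [ppm_error(apply_shift(obs, shift), theo) for obs, theo in pairs]

print(f"estimated shift: {shift:.3f} ppm")
print("residual errors:", [f"{r:.3f}" for r in residuals])
```

After the correction, the residual errors are spread around zero instead of around the original offset; a real calibration may also need to model drift over time or dependence on m/z, which this sketch ignores.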
7.  Proteogenomic characterization of human colon and rectal cancer 
Nature  2014;513(7518):382-387.
Summary
We analyzed proteomes of colon and rectal tumors previously characterized by the Cancer Genome Atlas (TCGA) and performed integrated proteogenomic analyses. Somatic variants displayed reduced protein abundance compared to germline variants. mRNA transcript abundance did not reliably predict protein abundance differences between tumors. Proteomics identified five proteomic subtypes in the TCGA cohort, two of which overlapped with the TCGA “MSI/CIMP” transcriptomic subtype, but had distinct mutation, methylation, and protein expression patterns associated with different clinical outcomes. Although copy number alterations showed strong cis- and trans-effects on mRNA abundance, relatively few of these extend to the protein level. Thus, proteomics data enabled prioritization of candidate driver genes. The chromosome 20q amplicon was associated with the largest global changes at both mRNA and protein levels; proteomics data highlighted potential 20q candidates including HNF4A, TOMM34 and SRC. Integrated proteogenomic analysis provides functional context to interpret genomic abnormalities and affords a new paradigm for understanding cancer biology.
doi:10.1038/nature13438
PMCID: PMC4249766  PMID: 25043054
8.  QC metrics from CPTAC raw LC-MS/MS data interpreted through multivariate statistics 
Analytical chemistry  2014;86(5):2497-2509.
Shotgun proteomics experiments integrate a complex sequence of processes, any of which can introduce variability. Quality metrics computed from LC-MS/MS data have relied upon identifying MS/MS scans, but a new mode for the QuaMeter software produces metrics that are independent of identifications. Rather than evaluating each metric independently, we have created a robust multivariate statistical toolkit that accommodates the correlation structure of these metrics and allows for hierarchical relationships among data sets. The framework enables visualization and structural assessment of variability. Study 1 for the Clinical Proteomics Technology Assessment for Cancer (CPTAC), which analyzed three replicates of two common samples at each of two time points among 23 mass spectrometers in nine laboratories, provided the data to demonstrate this framework, and CPTAC Study 5 provided data from complex lysates under Standard Operating Procedures (SOPs) to complement these findings. Identification-independent quality metrics enabled the differentiation of sites and run-times through robust principal components analysis and subsequent factor analysis. Dissimilarity metrics revealed outliers in performance, and a nested ANOVA model revealed the extent to which all metrics or individual metrics were impacted by mass spectrometer and run time. Study 5 data revealed that even when SOPs have been applied, instrument-dependent variability remains prominent, although it may be reduced, while within-site variability is reduced significantly. Finally, identification-independent quality metrics were shown to be predictive of identification sensitivity in these data sets. QuaMeter and the associated multivariate framework are available from http://fenchurch.mc.vanderbilt.edu and http://homepages.uc.edu/~wang2x7/, respectively.
doi:10.1021/ac4034455
PMCID: PMC3982976  PMID: 24494671
9.  IDPQuantify: Combining Precursor Intensity with Spectral Counts for Protein and Peptide Quantification 
Journal of proteome research  2013;12(9):4111-4121.
Differentiating and quantifying protein differences in complex samples produces significant challenges in sensitivity and specificity. Label-free quantification can draw from two different information sources: precursor intensities and spectral counts. Intensities are accurate for calculating protein relative abundance, but values are often missing due to peptides that are identified sporadically. Spectral counting can reliably reproduce difference lists, but differentiating peptides or quantifying all but the most concentrated protein changes is usually beyond its abilities. Here we developed new software, IDPQuantify, to align multiple replicates using principal component analysis, extract accurate precursor intensities from MS data, and combine intensities with spectral counts for significant gains in differentiation and quantification. We have applied IDPQuantify to three comparative proteomic datasets featuring gold standard protein differences spiked in complicated backgrounds. The software is able to associate peptides with peaks that are otherwise left unidentified to increase the efficiency of protein quantification, especially for low-abundance proteins. By combining intensities with spectral counts from IDPicker, it gains an average of 30% more true positive differences among top differential proteins. IDPQuantify quantifies protein relative abundance accurately in these test datasets to produce good correlations between known and measured concentrations.
doi:10.1021/pr400438q
PMCID: PMC3804902  PMID: 23879310
precursor ion intensity; principal component analysis; retention time mapping; protein differentiation; comparative proteomics; quantitative proteomics; spectral counting; CPTAC
10.  Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates 
In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.
doi:10.1016/j.gpb.2012.11.004
PMCID: PMC3737598  PMID: 23499924
Fragmentation; Basicity; Fragment size; Ordinal regression
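The "Naive model" described in this abstract can be sketched in a few lines: every peptide bond is cleaved, and each resulting b and y fragment is predicted at every charge below the precursor charge. The code below illustrates that baseline only; it is not Basophile's ordinal regression, and the function name and the example peptide are assumptions made for this sketch. Residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def naive_fragments(peptide: str, precursor_charge: int):
    """Enumerate b/y ions for every bond at every charge below the precursor's.

    Returns a list of (ion_type, ion_index, charge, m/z) tuples.
    """
    fragments = []
    for i in range(1, len(peptide)):  # one cleavage per peptide bond
        b_mass = sum(MONO[aa] for aa in peptide[:i])
        y_mass = sum(MONO[aa] for aa in peptide[i:]) + WATER
        # Charges 1 .. precursor_charge - 1, as in the description above.
        for z in range(1, precursor_charge):
            fragments.append(("b", i, z, (b_mass + z * PROTON) / z))
            fragments.append(("y", len(peptide) - i, z, (y_mass + z * PROTON) / z))
    return fragments


frags = naive_fragments("PEPTIDE", precursor_charge=3)
print(len(frags), "predicted fragments")
for ion_type, index, charge, mz in frags[:4]:
    print(f"{ion_type}{index} (+{charge}): {mz:.4f}")
```

The fragment count grows as 2 x (length - 1) x (charge - 1), which shows why highly charged precursors accumulate many predicted fragments that a charge-aware model could prune.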
11.  Halogen Photoelimination from Dirhodium Phosphazane Complexes via Chloride-Bridged Intermediates 
Halogen photoelimination is a critical step in HX-splitting photocatalysis. Herein, we report the photoreduction of a pair of valence-isomeric dirhodium phosphazane complexes, and suggest that a common intermediate is accessed in the photochemistry of both mixed-valent and valence-symmetric complexes. The results of these investigations suggest that halogen photoelimination proceeds by two sequential photochemical reactions: ligand dissociation followed by subsequent halogen elimination.
doi:10.1039/C3SC50462J
PMCID: PMC3819227  PMID: 24224081
12.  QuaMeter: Multivendor Performance Metrics for LC–MS/MS Proteomics Instrumentation 
Analytical chemistry  2012;84(14):5845-5850.
LC-MS/MS-based proteomics studies rely on stable analytical system performance that can be evaluated by objective criteria. The National Institute of Standards and Technology (NIST) introduced the MSQC software to compute diverse metrics from experimental LC-MS/MS data, enabling quality analysis and quality control (QA/QC) of proteomics instrumentation. In practice, however, several attributes of the MSQC software prevent its use for routine instrument monitoring. Here, we present QuaMeter, an open-source tool that improves MSQC in several aspects. QuaMeter can directly read raw data from instruments manufactured by different vendors. The software can work with a wide variety of peptide identification software for improved reliability and flexibility. Finally, QC metrics implemented in QuaMeter are rigorously defined and tested. The source code and binary versions of QuaMeter are available under Apache 2.0 License at http://fenchurch.mc.vanderbilt.edu.
doi:10.1021/ac300629p
PMCID: PMC3730131  PMID: 22697456
13.  The 2012/2013 PRG Study: Assessing Longitudinal Variability in Routine Peptide LC-MS/MS Analysis 
The PRG study for 2012-2013 was intended to catalog critical parameters of variability influencing LC-MS/MS data quality within laboratories over a nine-month period between March and November, 2012. This study was intended to determine intra-laboratory reproducibility and inform participants of key areas of variability in routine peptide mass spectrometry analyses. Aliquots of a dried, digested protein mixture were sent to all participants with the expectation that once per month a new vial would be reconstituted and analyzed using routine LC-MS and data-dependent MS/MS acquisition settings. Of key importance in the design of this study is the lack of a standard operating protocol. The goal was to measure the degree of reproducibility within a lab as it applies to their established HPLC and MS settings and QC measures. A survey was conducted with each sample submission to catalog individual laboratory practices, instrument configurations, acquisition settings, and routine and non-routine maintenance procedures. Over 80 participants submitted at least one data set, and 36 participants completed the study with 8 or more submissions over the 9-month period. Survey data revealed the vast majority of laboratories (>90%) perform routine QC to determine system suitability, but there was considerable variability in the type and frequency of QC analysis. Collected raw data was searched using identical parameters by the PRG and analyzed for more than 40 MS and MS/MS metrics using the software QuaMeter. The software tool generates metrics that assess multiple properties of LC-MS/MS, from extracted ion chromatogram peak width to total ion current distribution and MS sampling rates. Both identification-dependent and identification-independent metrics can be generated. The variability within these metrics across time was analyzed for each participant and correlative relationships with survey results will be presented.
PMCID: PMC3635275
14.  [No title available] 
The PRG study for 2012–2013 was intended to catalog critical parameters of variability influencing LC-MS/MS data quality within laboratories over a nine month period between March and November, 2012. This study was intended to determine intra-laboratory reproducibility and inform participants of key areas of variability in routine peptide mass spectrometry analyses. Aliquots of a dried, digested protein mixture were sent to all participants with the expectation that once per month a new vial will be reconstituted and analyzed using routine LC-MS and data-dependent MS/MS acquisition settings. Of key importance in the design of this study is the lack of a standard operating protocol. The goal was to measure the degree of reproducibility within a lab as it applies to their established HPLC and MS settings and QC measures. A survey was conducted with each sample submission to catalog individual laboratory practices, instrument configurations, acquisition settings, and routine and non-routine maintenance procedures. Over 80 participants submitted at least one data set, and 36 participants completed the study with 8 or more submissions over the 9 month period. Survey data revealed the vast majority of laboratories (90%) perform routine QC to determine system suitability, but there was considerable variability in the type and frequency of QC analysis. Collected raw data was searched using identical parameters by the PRG and analyzed for more than 40 MS and MS/MS metrics using the software QuaMeter. The software tool generates metrics that assess multiple properties of LC-MS/MS, from extracted ion chromatogram peak width to total ion current distribution and MS sampling rates. Both identification-dependent and identification-independent metrics can be generated. The variability within these metrics across time was analyzed for each participant and correlative relationships with survey results will be presented.
PMCID: PMC3635300
15.  A Cross-platform Toolkit for Mass Spectrometry and Proteomics 
Nature biotechnology  2012;30(10):918-920.
Mass-spectrometry-based proteomics has become an important component of biological research. Numerous proteomics methods have been developed to identify and quantify the proteins in biological and clinical samples [1], identify pathways affected by endogenous and exogenous perturbations [2], and characterize protein complexes [3]. Despite successes, the interpretation of vast proteomics datasets remains a challenge. There have been several calls for improvements and standardization of proteomics data analysis frameworks, as well as for an application-programming interface for proteomics data access [4,5]. In response, we have developed the ProteoWizard Toolkit, a robust set of open-source software libraries and applications designed to facilitate proteomics research. The libraries implement the first-ever, non-commercial, unified data access interface for proteomics, bridging field-standard open formats and all common vendor formats. In addition, diverse software classes enable rapid development of vendor-agnostic proteomics software. Additionally, ProteoWizard projects and applications, building upon the core libraries, are becoming standard tools for enabling significant proteomics inquiries.
doi:10.1038/nbt.2377
PMCID: PMC3471674  PMID: 23051804
16.  The HUPO proteomics standards initiative- mass spectrometry controlled vocabulary 
Controlled vocabularies (CVs), i.e. a collection of predefined terms describing a modeling domain, used for the semantic annotation of data, and ontologies are used in structured data formats and databases to avoid inconsistencies in annotation, to have a unique (and preferably short) accession number and to give researchers and computer algorithms the possibility for more expressive semantic annotation of data. The Human Proteome Organization (HUPO)–Proteomics Standards Initiative (PSI) makes extensive use of ontologies/CVs in their data formats. The PSI-Mass Spectrometry (MS) CV contains all the terms used in the PSI MS–related data standards. The CV contains a logical hierarchical structure to ensure ease of maintenance and the development of software that makes use of complex semantics. The CV contains terms required for a complete description of an MS analysis pipeline used in proteomics, including sample labeling, digestion enzymes, instrumentation parts and parameters, software used for identification and quantification of peptides/proteins and the parameters and scores used to determine their significance. Owing to the range of topics covered by the CV, collaborative development across several PSI working groups, including proteomics research groups, instrument manufacturers and software vendors, was necessary. In this article, we describe the overall structure of the CV, the process by which it has been developed and is maintained and the dependencies on other ontologies.
Database URL: http://psidev.cvs.sourceforge.net/viewvc/psidev/psi/psi-ms/mzML/controlledVocabulary/psi-ms.obo
doi:10.1093/database/bat009
PMCID: PMC3594986  PMID: 23482073
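Because the CV is distributed as an OBO file with a hierarchical structure, software can read term stanzas and follow `is_a` links to resolve a term's parents. The sketch below is a minimal reader for that stanza layout; the `EX:` accessions and term names are hypothetical placeholders rather than entries from psi-ms.obo, and a production tool would more likely use a dedicated ontology library.

```python
# Hypothetical excerpt in OBO stanza layout; the IDs and names are placeholders.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: parent term

[Term]
id: EX:0000002
name: child term
is_a: EX:0000001 ! parent term

[Typedef]
id: part_of
name: part_of
"""


def parse_obo_terms(text: str) -> dict:
    """Collect id, name, and is_a parents for each [Term] stanza."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Only [Term] stanzas are kept; other stanza types are skipped.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, _, value = line.partition(":")
        value = value.split(" ! ")[0].strip()  # drop the trailing "! comment"
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
    return terms


def ancestors(terms: dict, term_id: str) -> list:
    """Walk is_a links upward and return every ancestor accession once."""
    seen = []
    stack = list(terms[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(terms.get(parent, {"is_a": []})["is_a"])
    return seen


terms = parse_obo_terms(SAMPLE_OBO)
print(terms["EX:0000002"]["name"], "->", ancestors(terms, "EX:0000002"))
```

Resolving ancestors this way is what lets software treat a specific term as an instance of a broader category without hard-coding every accession.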
17.  Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates 
In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.
doi:10.1016/j.gpb.2012.11.004
PMCID: PMC3737598  PMID: 23499924
Fragmentation; Basicity; Fragment size; Ordinal regression
18.  Pepitome: evaluating improved spectral library search for identification complementarity and quality assessment 
Journal of Proteome Research  2012;11(3):1686-1695.
Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines.
doi:10.1021/pr200874e
PMCID: PMC3292681  PMID: 22217208
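The dot product score that this abstract criticizes can be illustrated with a small example. The sketch below bins two peak lists and computes a normalized dot product; the unit bin width, function names, and peak values are assumptions chosen for illustration, and this is the baseline scoring approach, not Pepitome's statistical scoring.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width: float = 1.0) -> dict:
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def normalized_dot_product(peaks_a, peaks_b, bin_width: float = 1.0) -> float:
    """Cosine-style similarity between two binned spectra (0 to 1)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical (m/z, intensity) peak lists.
library = [(100.1, 10.0), (200.1, 5.0)]
shifted = [(100.5, 10.0), (200.5, 5.0)]    # same bins, 0.4 m/z offset
unrelated = [(150.2, 10.0), (250.2, 5.0)]  # no shared bins

print(normalized_dot_product(library, library))
print(normalized_dot_product(library, shifted))
print(normalized_dot_product(library, unrelated))
```

The shifted spectrum scores the same as an exact match because both sets of peaks fall into the same bins, which is one way to see how a binned dot product ignores fragment ion m/z discrepancies.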
19.  Supporting tool suite for production proteomics 
Bioinformatics  2011;27(22):3214-3215.
Summary: The large amount of data produced by proteomics experiments requires effective bioinformatics tools for the integration of data management and data analysis. Here we introduce a suite of tools developed at Vanderbilt University to support production proteomics. We present the Backup Utility Service tool for automated instrument file backup and the ScanSifter tool for data conversion. We also describe a queuing system to coordinate identification pipelines and the File Collector tool for batch copying analytical results. These tools are individually useful but collectively reinforce each other. They are particularly valuable for proteomics core facilities or research institutions that need to manage multiple mass spectrometers. With minor changes, they could support other types of biomolecular resource facilities.
Availability and Implementation: Source code and executable versions are available under Apache 2.0 License at http://www.vicc.org/jimayersinstitute/data/
Contact: daniel.liebler@vanderbilt.edu
doi:10.1093/bioinformatics/btr544
PMCID: PMC3208394  PMID: 21965817
20.  ScanRanker: Quality Assessment of Tandem Mass Spectra via Sequence Tagging 
Journal of proteome research  2011;10(7):2896-2904.
In shotgun proteomics, protein identification by tandem mass spectrometry relies on bioinformatics tools. Despite recent improvements in identification algorithms, a significant number of high quality spectra remain unidentified for various reasons. Here we present ScanRanker, an open-source tool that evaluates the quality of tandem mass spectra via sequence tagging with reliable performance in data from different instruments. The superior performance of ScanRanker enables it not only to find unassigned high quality spectra that evade identification through database search, but also to select spectra for de novo sequencing and cross-linking analysis. In addition, we demonstrate that the distribution of ScanRanker scores predicts the richness of identifiable spectra among multiple LC-MS/MS runs in an experiment, and ScanRanker scores assist the process of peptide assignment validation to increase confident spectrum identifications. The source code and executable versions of ScanRanker are available from http://fenchurch.mc.vanderbilt.edu.
doi:10.1021/pr200118r
PMCID: PMC3128668  PMID: 21520941
spectral quality; sequence tagging; bioinformatics; tandem mass spectrometry; cross-linking
21.  The mzIdentML Data Standard for Mass Spectrometry-Based Proteomics Results 
Molecular & Cellular Proteomics : MCP  2012;11(7):M111.014381.
We report the release of mzIdentML, an exchange standard for peptide and protein identification data, designed by the Proteomics Standards Initiative. The format was developed in collaboration with instrument and software vendors, and the developers of the major open-source projects in proteomics. Software implementations have been developed to enable conversion from most popular proprietary and open-source formats, and mzIdentML will soon be supported by the major public repositories. These developments enable proteomics scientists to start working with the standard for exchanging and publishing data sets in support of publications, and they provide a stable platform for bioinformatics groups and commercial software vendors to work with a single file format for identification data.
doi:10.1074/mcp.M111.014381
PMCID: PMC3394945  PMID: 22375074
22.  Sequence tagging reveals unexpected modifications in toxicoproteomics 
Chemical research in toxicology  2011;24(2):204-216.
Toxicoproteomic samples are rich in posttranslational modifications (PTMs) of proteins. Identifying these modifications via standard database searching can incur significant performance penalties. Here we describe the latest developments in TagRecon, an algorithm that leverages inferred sequence tags to identify modified peptides in toxicoproteomic data sets. TagRecon identifies known modifications more effectively than the MyriMatch database search engine. TagRecon outperformed state-of-the-art software in recognizing unanticipated modifications from LTQ, Orbitrap, and QTOF data sets. We developed user-friendly software for detecting persistent mass shifts from samples. We follow a three-step strategy for detecting unanticipated PTMs in samples. First, we identify the proteins present in the sample with a standard database search. Next, identified proteins are interrogated for unexpected PTMs with a sequence tag-based search. Finally, additional evidence is gathered for the detected mass shifts with a refinement search. Application of this technology to toxicoproteomic data sets revealed unintended cross-reactions between proteins and sample processing reagents. Twenty-five proteins in rat liver showed signs of oxidative stress when exposed to potentially toxic drugs. These results demonstrate the value of mining toxicoproteomic data sets for modifications.
doi:10.1021/tx100275t
PMCID: PMC3042045  PMID: 21214251
23.  The 2012 PRG study: Assessing Longitudinal Variability in Routine Peptide LC-MS/MS Analysis 
The PRG study for 2012 is intended to catalog critical parameters of variability influencing LC-MS/MS data quality within labs over a nine-month period between March and November 2012. This study is intended to inform participant labs of key areas of variability in their routine qualitative and quantitative analyses. A dried digested protein mix is sent to labs in aliquots with the expectation that once per month a new vial will be reconstituted and analyzed using routine LC-MS and data-dependent MS/MS acquisition settings. Participants will return the raw data to a centralized server for analysis. The analysis consists of 42 MS and MS/MS metrics that have been determined through the efforts of the CPTC consortium and implemented in open source software from NIST (“MSQC”) and Vanderbilt University (“QuaMeter”). Of key importance in the design of this study is the lack of a standard operating protocol. The concept is to determine variability within a lab when that lab uses its own routine settings and QC measures. A survey is conducted with each sample submission to catalog changes in operators, acquisition settings, and routine and non-routine maintenance procedures. To date, 95 labs in 23 countries have requested samples. Within these labs are 25 different models of mass spectrometers from 6 commercial vendors.
PMCID: PMC3630542
24.  The 2012 PRG Study: Assessing Longitudinal Variability in Routine Peptide LC-MS/MS Analysis 
The PRG study for 2012 is intended to catalog critical parameters of variability influencing LC-MS/MS data quality within labs over a nine-month period between March and November 2012. This study is intended to inform participant labs of key areas of variability in their routine qualitative and quantitative analyses. A dried digested protein mix is sent to labs in aliquots with the expectation that once per month a new vial will be reconstituted and analyzed using routine LC-MS and data-dependent MS/MS acquisition settings. Participants will return the raw data to a centralized server for analysis. The analysis consists of 42 MS and MS/MS metrics that have been determined through the efforts of the CPTC consortium and implemented in open source software from NIST (“MSQC”) and Vanderbilt University (“QuaMeter”). Of key importance in the design of this study is the lack of a standard operating protocol. The concept is to determine variability within a lab when that lab uses its own routine settings and QC measures. A survey is conducted with each sample submission to catalog changes in operators, acquisition settings, and routine and non-routine maintenance procedures. To date, 95 labs in 23 countries have requested samples. Within these labs are 25 different models of mass spectrometers from 6 commercial vendors.
PMCID: PMC3630553
25.  TraML—A Standard Format for Exchange of Selected Reaction Monitoring Transition Lists 
Molecular & Cellular Proteomics : MCP  2011;11(4):R111.015040.
Targeted proteomics via selected reaction monitoring is a powerful mass spectrometric technique affording higher dynamic range, increased specificity and lower limits of detection than other shotgun mass spectrometry methods when applied to proteome analyses. However, it involves selective measurement of predetermined analytes, which requires more preparation in the form of selecting appropriate signatures for the proteins and peptides that are to be targeted. There is a growing number of software programs and resources for selecting optimal transitions and the instrument settings used for the detection and quantification of the targeted peptides, but the exchange of this information is hindered by a lack of a standard format. We have developed a new standardized format, called TraML, for encoding transition lists and associated metadata. In addition to introducing the TraML format, we demonstrate several implementations across the community, and provide semantic validators, extensive documentation, and multiple example instances to demonstrate correctly written documents. Widespread use of TraML will facilitate the exchange of transitions, reduce time spent handling incompatible list formats, increase the reusability of previously optimized transitions, and thus accelerate the widespread adoption of targeted proteomics via selected reaction monitoring.
doi:10.1074/mcp.R111.015040
PMCID: PMC3322582  PMID: 22159873
