1.  Comprehensive analysis of protein digestion using six trypsins reveals the origin of trypsin as a significant source of variability in proteomics
Journal of proteome research  2013;12(12):5666-5680.
Trypsin is an endoprotease commonly used for sample preparation in proteomics experiments. Importantly, protein digestion is dependent on multiple factors, including the trypsin origin and digestion conditions. In-depth characterization of trypsin activity could lead to improved reliability of peptide detection and quantitation in both targeted and discovery proteomics studies. To this end, we assembled a data analysis pipeline and suite of visualization tools for quality control and comprehensive characterization of pre-analytical variability in proteomics experiments. Using these tools, we evaluated six available proteomics-grade trypsins and their digestion of a single purified protein, human serum albumin (HSA). HSA was aliquoted and then digested for 2 or 18 hours for each trypsin, and the resulting digests were desalted and analyzed in triplicate by reversed-phase liquid chromatography-tandem mass spectrometry. Peptides were identified and quantified using the NIST MSQC pipeline and a comprehensive HSA mass spectral library. We performed a statistical analysis of peptide abundances from different digests, and further visualized the data using principal component analysis and quantitative protein “sequence maps”. While the performance of individual trypsins across repeat digests was reproducible, significant differences were observed depending on the origin of the trypsin (i.e., bovine vs. porcine). Bovine trypsins produced a higher number of peptides containing missed cleavages, whereas porcine trypsins produced more semi-tryptic peptides. In addition, many cleavage sites showed variable digestion kinetics patterns, evident from the comparison of peptide abundances in 2-hour vs. 18-hour digests. Overall, this work illustrates the effects of an often neglected source of variability in proteomics experiments: the origin of the trypsin.
PMCID: PMC4076643  PMID: 24116745
proteomics; mass spectrometry; trypsin; digestion; endoprotease specificity; peptide abundance; variability; missed cleavages; label-free quantification; statistical analysis
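The terms "missed cleavage" and "semi-tryptic" used in the abstract above can be made concrete with a small sketch. It assumes the commonly used rule that trypsin cleaves after K or R unless the next residue is P; the function names and the toy sequence are hypothetical and are not taken from the NIST MSQC pipeline.

```python
import re

def cleavage_sites(sequence):
    """Positions after which cleavage is expected: after K or R, not before P.

    A position p means the bond between sequence[p-1] and sequence[p].
    The C-terminus itself is not reported as a site.
    """
    return [m.end() for m in re.finditer(r"[KR](?!P)", sequence)
            if m.end() < len(sequence)]

def count_missed_cleavages(peptide):
    """Number of expected cleavage sites left uncut inside a peptide."""
    return len(cleavage_sites(peptide))

def classify_peptide(protein, peptide):
    """Label a peptide by how many of its termini match expected cleavage sites."""
    start = protein.find(peptide)  # first occurrence only, enough for a sketch
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    sites = set(cleavage_sites(protein))
    n_term_ok = start == 0 or start in sites
    c_term_ok = end == len(protein) or end in sites
    if n_term_ok and c_term_ok:
        return "tryptic"
    if n_term_ok or c_term_ok:
        return "semi-tryptic"
    return "non-tryptic"

# Toy sequence (hypothetical, not HSA): sites fall after the K at position 3
# and after the K at position 10; the R is followed by P and is skipped.
toy_protein = "AAKGGRPCCKDD"
```

With this toy sequence, `classify_peptide(toy_protein, "GGRPCCK")` gives a fully tryptic peptide, while `"GRPCCK"` is semi-tryptic because only its C-terminus falls on an expected site.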
2.  Using ProHits to store, annotate and analyze affinity purification - mass spectrometry (AP-MS) data 
Affinity purification coupled with mass spectrometry (AP-MS) is a robust technique used to identify protein-protein interactions. With recent improvements in sample preparation, and dramatic advances in MS instrumentation speed and sensitivity, this technique is becoming more widely used throughout the scientific community. To meet the needs of research groups both large and small, we have developed software solutions for tracking, scoring and analyzing AP-MS data. Here, we provide details for the installation and utilization of ProHits, a Laboratory Information Management System designed specifically for AP-MS interaction proteomics. This protocol explains: (i) how to install the complete ProHits system, including modules for the management of mass spectrometry files and the analysis of interaction data, and (ii) alternative options for the use of pre-existing search results in simpler versions of ProHits, including a virtual machine implementation of our ProHits Lite software. We also describe how to use the main features of the software to analyze AP-MS data.
PMCID: PMC3669397  PMID: 22948730
Affinity purification coupled with mass spectrometry; Data analysis; Virtual machine; Statistical models; Protein-protein interactions
3.  Adaptive Discriminant Function Analysis and Re-ranking of MS/MS Database Search Results for Improved Peptide Identification in Shotgun Proteomics 
Journal of proteome research  2008;7(11):4878-4889.
Robust statistical validation of peptide identifications obtained by tandem mass spectrometry and sequence database searching is an important task in shotgun proteomics. PeptideProphet is a commonly used computational tool that computes confidence measures for peptide identifications. In this paper, we investigate several limitations of the PeptideProphet modeling approach, including the use of fixed coefficients in computing the discriminant search score and selection of the top scoring peptide assignment per spectrum only. To address these limitations, we describe an adaptive method in which a new discriminant function is learned from the data in an iterative fashion. We extend the modeling framework to go beyond the top scoring peptide assignment per spectrum. We also investigate the effect of clustering the spectra according to their spectrum quality score followed by cluster-specific mixture modeling. The analysis is carried out using data acquired from a mixture of purified proteins on four different types of mass spectrometers, as well as using a complex human serum dataset. A special emphasis is placed on the analysis of data generated on high mass accuracy instruments.
PMCID: PMC3744223  PMID: 18788775
Tandem Mass Spectrometry; Database searching; Peptide Identification; Statistical Modeling; Adaptive Discriminant Analysis; Mass Accuracy; Decoy Sequences
4.  Improved Sequence Tag Generation Method for Peptide Identification in Tandem Mass Spectrometry 
Journal of proteome research  2008;7(10):4422-4434.
The sequence tag-based peptide identification methods are a promising alternative to the traditional database search approach. However, a more comprehensive analysis, optimization, and comparison with established methods are necessary before these methods can gain widespread use in the proteomics community. Using the InsPecT open source code base (Tanner et al., Anal Chem. 2005, 77:4626–39), we present an improved sequence tag generation method that directly incorporates multi-charged fragment ion peaks present in many tandem mass spectra of higher charge states. We also investigate the performance of sequence tagging under different settings using control datasets generated on five different types of mass spectrometers, as well as using a complex phosphopeptide-enriched sample. We also demonstrate that additional modeling of InsPecT search scores using a semi-parametric approach incorporating the accuracy of the precursor ion mass measurement provides additional improvement in the ability to discriminate between correct and incorrect peptide identifications. The overall superior performance of the sequence tag-based peptide identification method is demonstrated by comparison with a commonly used SEQUEST/PeptideProphet approach.
PMCID: PMC3744226  PMID: 18785767
Proteomics; Tandem Mass Spectrometry; Peptide Identification; Database Searching; De Novo Sequencing; Algorithms; Statistical Analysis
5.  Comparison of MS2-only, MSA, and MS2/MS3 Methodologies for Phosphopeptide Identification 
Journal of proteome research  2009;8(2):887-899.
Current mass spectrometers provide a number of alternative methodologies for producing tandem mass spectra specifically for phosphopeptide analysis. In particular, generation of MS3 spectra in a data-dependent manner upon detection of the neutral loss of a phosphoric acid in MS2 spectra is a popular technique for circumventing the problem of poor phosphopeptide backbone fragmentation. The newer Multistage Activation method provides another option. Both these strategies require additional cycle time on the instrument and therefore reduce the number of spectra that can be measured in the same amount of time. Additional informatics is often required to make the most efficient use of the additional information provided by these spectra as well. This work presents a comparison of several commonly used mass spectrometry methods for the study of phosphopeptide-enriched samples: an MS2-only method, a Multistage Activation method, and an MS2/MS3 data-dependent neutral loss method. Several strategies for dealing effectively with the resulting MS3 data in the latter approach are also presented and compared. The overall goal is to infer whether any one methodology performs significantly better than another for identifying phosphopeptides. On data presented here, the Multistage Activation methodology is demonstrated to perform optimally and does not result in significant loss of unique peptide identifications.
PMCID: PMC2734953  PMID: 19072539
Protein phosphorylation; mass spectrometry; MS3; Multistage Activation; phosphoproteomics; bioinformatics; peptide identification; database search
6.  Reconstructing targetable pathways in lung cancer by integrating diverse omics data 
Nature communications  2013;4:2617.
Global ‘multi-omics’ profiling of cancer cells harbours the potential for characterizing the signaling networks associated with specific oncogenes. Here we profile the transcriptome, proteome and phosphoproteome in a panel of non-small cell lung cancer (NSCLC) cell lines in order to reconstruct targetable networks associated with KRAS dependency. We develop a two-step bioinformatics strategy addressing the challenge of integrating these disparate data sets. We first define an ‘abundance-score’ combining transcript, protein and phospho-protein abundances to nominate differentially abundant proteins and then use the Prize Collecting Steiner Tree algorithm to identify functional sub-networks. We identify three modules centered on KRAS and MET; LCK and PAK1; and β-Catenin. We validate activation of these proteins in KRAS-dependent (KRAS-Dep) cells and perform functional studies defining LCK as a critical gene for cell proliferation in KRAS-Dep but not KRAS-independent NSCLCs. These results suggest that LCK is a potential druggable target protein in KRAS-Dep lung cancers.
PMCID: PMC4107456  PMID: 24135919
7.  GSK3β controls epithelial-mesenchymal transition and tumor metastasis by CHIP-mediated degradation of Slug 
Oncogene  2013;33(24):3172-3182.
Glycogen synthase kinase 3 beta (GSK3β) is highly inactivated in epithelial cancers and is known to inhibit tumor migration and invasion. The zinc-finger-containing transcriptional repressor, Slug, represses E-cadherin transcription and enhances epithelial-mesenchymal transition (EMT). In this study, we find that the GSK3β-pSer9 level is associated with the expression of Slug in non-small cell lung cancer (NSCLC). GSK3β-mediated phosphorylation of Slug facilitates Slug protein turnover. Proteomic analysis reveals that the C-terminus of Hsc70-interacting protein (CHIP) interacts with wild-type Slug (wtSlug). Knockdown of CHIP stabilizes the wtSlug protein and reduces Slug ubiquitylation and degradation. In contrast, nonphosphorylatable Slug-4SA is not degraded by CHIP. The accumulation of nondegradable Slug may further lead to the repression of E-cadherin expression and promote cancer cell migration, invasion, and metastasis. Our findings provide evidence of a de novo GSK3β-CHIP-Slug pathway that may be involved in the progression of metastasis in lung cancer.
PMCID: PMC4096338  PMID: 23851495
GSK3β; Slug; CHIP; post-translational modification
8.  A Global Protein Kinase and Phosphatase Interaction Network in Yeast 
Science (New York, N.Y.)  2010;328(5981):1043-1046.
The interactions of protein kinases and phosphatases with their regulatory subunits and substrates underpin cellular regulation. We identified a kinase and phosphatase interaction (KPI) network of 1844 interactions in budding yeast by mass spectrometric analysis of protein complexes. The KPI network contained many dense local regions of interactions that suggested new functions. Notably, the cell cycle phosphatase Cdc14 associated with multiple kinases that revealed roles for Cdc14 in mitogen-activated protein kinase signaling, the DNA damage response, and metabolism, whereas interactions of the target of rapamycin complex 1 (TORC1) uncovered new effector kinases in nitrogen and carbon metabolism. An extensive backbone of kinase-kinase interactions cross-connects the proteome and may serve to coordinate diverse cellular responses.
PMCID: PMC3983991  PMID: 20489023
9.  The Yeast Sks1p Kinase Signaling Network Regulates Pseudohyphal Growth and Glucose Response 
PLoS Genetics  2014;10(3):e1004183.
The yeast Saccharomyces cerevisiae undergoes a dramatic growth transition from its unicellular form to a filamentous state, marked by the formation of pseudohyphal filaments of elongated and connected cells. Yeast pseudohyphal growth is regulated by signaling pathways responsive to reductions in the availability of nitrogen and glucose, but the molecular link between pseudohyphal filamentation and glucose signaling is not fully understood. Here, we identify the glucose-responsive Sks1p kinase as a signaling protein required for pseudohyphal growth induced by nitrogen limitation and coupled nitrogen/glucose limitation. To identify the Sks1p signaling network, we applied mass spectrometry-based quantitative phosphoproteomics, profiling over 900 phosphosites for phosphorylation changes dependent upon Sks1p kinase activity. From this analysis, we report a set of novel phosphorylation sites and highlight Sks1p-dependent phosphorylation in Bud6p, Itr1p, Lrg1p, Npr3p, and Pda1p. In particular, we analyzed the Y309 and S313 phosphosites in the pyruvate dehydrogenase subunit Pda1p; these residues are required for pseudohyphal growth, and Y309A mutants exhibit phenotypes indicative of impaired aerobic respiration and decreased mitochondrial number. Epistasis studies place SKS1 downstream of the G-protein coupled receptor GPR1 and the G-protein RAS2 but upstream of or at the level of cAMP-dependent PKA. The pseudohyphal growth and glucose signaling transcription factors Flo8p, Mss11p, and Rgt1p are required to achieve wild-type SKS1 transcript levels. SKS1 is conserved, and deletion of the SKS1 ortholog SHA3 in the pathogenic fungus Candida albicans results in abnormal colony morphology. Collectively, these results identify Sks1p as an important regulator of filamentation and glucose signaling, with additional relevance towards understanding stress-responsive signaling in C. albicans.
Author Summary
Eukaryotic cells respond to nutritional and environmental stress through complex regulatory programs controlling cell metabolism, growth, and morphology. In the budding yeast Saccharomyces cerevisiae, conditions of limited nitrogen and/or glucose can initiate a dramatic growth transition wherein the yeast cells form extended multicellular filaments resembling the true hyphal tubes of filamentous fungi. The formation of these pseudohyphal filaments is governed by core regulatory pathways that have been studied for decades; however, the mechanism by which these signaling systems are integrated is less well understood. We find that the protein kinase Sks1p contributes to the integration of signals for nitrogen and/or glucose limitation, resulting in pseudohyphal growth. We implemented a mass spectrometry-based approach to profile phosphorylation events across the proteome dependent upon Sks1p kinase activity and identified phosphorylation sites important for mitochondrial function and pseudohyphal growth. Our studies place Sks1p in the regulatory context of a well-known pseudohyphal growth signaling pathway. We further find that SKS1 is conserved and required for stress-responsive colony morphology in the principal opportunistic human fungal pathogen Candida albicans. Thus, Sks1p is part of the mechanism integrating glucose-responsive cell signaling and pseudohyphal growth, and its function is required for colony morphology linked with virulence in C. albicans.
PMCID: PMC3945295  PMID: 24603354
10.  Sparsely correlated hidden Markov models with application to genome-wide location studies 
Bioinformatics  2013;29(5):533-541.
Motivation: Multiply correlated datasets have become increasingly common in genome-wide location analysis of regulatory proteins and epigenetic modifications. Their correlation can be directly incorporated into a statistical model to capture underlying biological interactions, but such modeling quickly becomes computationally intractable.
Results: We present sparsely correlated hidden Markov models (scHMM), a novel method for performing simultaneous hidden Markov model (HMM) inference for multiple genomic datasets. In scHMM, a single HMM is assumed for each series, but the transition probability in each series depends on not only its own hidden states but also the hidden states of other related series. For each series, scHMM uses penalized regression to select a subset of the other data series and estimate their effects on the odds of each transition in the given series. Following this, hidden states are inferred using a standard forward–backward algorithm, with the transition probabilities adjusted by the model at each position, which helps retain the order of computation close to fitting independent HMMs (iHMM). Hence, scHMM is a collection of inter-dependent non-homogeneous HMMs, capable of giving a close approximation to a fully multivariate HMM fit. A simulation study shows that scHMM achieves comparable sensitivity to the multivariate HMM fit at a much lower computational cost. The method was demonstrated in the joint analysis of 39 histone modifications, CTCF and RNA polymerase II in human CD4+ T cells. scHMM reported fewer high-confidence regions than iHMM in this dataset, but scHMM could recover previously characterized histone modifications in relevant genomic regions better than iHMM. In addition, the resulting combinatorial patterns from scHMM could be better mapped to the 51 states reported by the multivariate HMM method of Ernst and Kellis.
Availability: The scHMM package can be freely downloaded from and is recommended for use in a linux environment.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3582268  PMID: 23325620
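The inference step described in the scHMM abstract above (a standard forward-backward pass in which the transition probabilities are adjusted at each position) can be sketched as follows. This is a minimal illustration rather than the scHMM package's implementation: the function name and argument layout are assumptions, and the position-specific transition matrices are simply taken as given, whereas scHMM is described as estimating them by penalized regression on the hidden states of related series.

```python
def forward_backward(init, transitions, emissions):
    """Posterior state probabilities for a non-homogeneous HMM.

    init:        K initial state probabilities
    transitions: T-1 matrices; transitions[t][i][j] = P(state j at t+1 | state i at t)
    emissions:   T lists; emissions[t][k] = P(observation at t | state k)
    Returns a list of T lists of K posterior probabilities.
    """
    T, K = len(emissions), len(init)

    def normalize(values):
        total = sum(values)
        return [v / total for v in values]

    # Forward pass, rescaled at each position to avoid numerical underflow.
    alpha = [normalize([init[k] * emissions[0][k] for k in range(K)])]
    for t in range(1, T):
        A = transitions[t - 1]  # matrix specific to this position
        alpha.append(normalize([
            emissions[t][j] * sum(alpha[t - 1][i] * A[i][j] for i in range(K))
            for j in range(K)
        ]))

    # Backward pass, also rescaled; the scale factors cancel in the posterior.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        A = transitions[t]
        beta[t] = normalize([
            sum(A[i][j] * emissions[t + 1][j] * beta[t + 1][j] for j in range(K))
            for i in range(K)
        ])

    # Posterior is proportional to alpha * beta at each position.
    return [normalize([alpha[t][k] * beta[t][k] for k in range(K)])
            for t in range(T)]
```

Because each position only needs its own K-by-K matrix, the cost stays on the order of T·K² per series, the same as for an ordinary HMM. That is consistent with the abstract's remark that the order of computation stays close to fitting independent HMMs.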
11.  The CRAPome: a Contaminant Repository for Affinity Purification Mass Spectrometry Data 
Nature methods  2013;10(8):730-736.
Affinity purification coupled with mass spectrometry (AP-MS) is now a widely used approach for the identification of protein-protein interactions. However, for any given protein of interest, determining which of the identified polypeptides represent bona fide interactors versus those that are background contaminants (e.g. proteins that interact with the solid-phase support, affinity reagent or epitope tag) is a challenging task. While the standard approach is to identify nonspecific interactions using one or more negative controls, most small-scale AP-MS studies do not capture a complete, accurate background protein set. Fortunately, negative controls are largely bait-independent. Hence, aggregating negative controls from multiple AP-MS studies can increase coverage and improve the characterization of background associated with a given experimental protocol. Here we present the Contaminant Repository for Affinity Purification (the CRAPome) and describe the use of this resource to score protein-protein interactions. The repository (currently available for Homo sapiens and Saccharomyces cerevisiae) and computational tools are freely available online at
PMCID: PMC3773500  PMID: 23921808
12.  Analyzing protein-protein interactions from affinity purification-mass spectrometry data with SAINT 
Significance Analysis of INTeractome (SAINT) is a software package for scoring protein-protein interactions based on label-free quantitative proteomics data (e.g. spectral count or intensity) in affinity purification – mass spectrometry (AP-MS) experiments. SAINT allows bench scientists to select bona fide interactions and remove non-specific interactions in an unbiased manner. However, there is no ‘one-size-fits-all’ statistical model for every dataset, since the experimental design varies across studies. Key variables include the number of baits, the number of biological replicates per bait, and control purifications. Here we give a detailed account of input data format, control data, selection of high confidence interactions, and visualization of filtered data. We explain additional options for customizing the statistical model for optimal filtering in specific datasets. We also discuss a graphical user interface of SAINT in connection to the LIMS system ProHits which can be installed as a virtual machine on Mac OSX or PC Windows computers.
PMCID: PMC3446209  PMID: 22948729
Protein-protein interactions; Label-free quantitative proteomics; Affinity purification – mass spectrometry (AP-MS); Statistical model
13.  Comparative analysis of different label-free mass spectrometry based protein abundance estimates and their correlation with RNA-Seq gene expression data 
Journal of proteome research  2012;11(4):2261-2271.
An increasing number of studies involve integrative analysis of gene and protein expression data taking advantage of new technologies such as next-generation transcriptome sequencing (RNA-Seq) and highly sensitive mass spectrometry (MS) instrumentation. Thus, it becomes interesting to revisit the correlative analysis of gene and protein expression data using more recently generated datasets. Furthermore, within the proteomics community there is a substantial interest in comparing the performance of different label-free quantitative proteomic strategies. Gene expression data can be used as an indirect benchmark for such protein-level comparisons. In this work we use publicly available mouse data to perform a joint analysis of genomic and proteomic data obtained on the same organism. First, we perform a comparative analysis of different label-free protein quantification methods (intensity-based and spectral count-based, and using various associated data normalization steps) using several software tools on the proteomic side. Similarly, we perform correlative analysis of gene expression data derived using microarray and RNA-Seq methods on the genomic side. We also investigate the correlation between gene and protein expression data, and various factors affecting the accuracy of quantitation at both levels. It is observed that spectral count-based protein abundance metrics, which are easy to extract from any published data, are comparable to intensity-based measures with respect to correlation with gene expression data. The results of this work should be useful for designing robust computational pipelines for extraction and joint analysis of gene and protein expression data in the context of integrative studies.
PMCID: PMC3744887  PMID: 22329341
14.  SAINT-MS1: protein-protein interaction scoring using label-free intensity data in affinity purification – mass spectrometry experiments 
Journal of proteome research  2012;11(4):2619-2624.
We present a statistical method, SAINT-MS1, for scoring protein-protein interactions based on the label-free MS1 intensity data from affinity purification - mass spectrometry (AP-MS) experiments. The method is an extension of Significance Analysis of INTeractome (SAINT), a model-based method previously developed for spectral count data. We reformulated the statistical model for the log-transformed intensity data, including adequate treatment of missing observations, i.e. interactions whose quantitative data are inconsistent over replicate purifications. We demonstrate the performance of SAINT-MS1 using two recently published datasets: a small LTQ-Orbitrap dataset with three replicate purifications of a single human bait protein and control purifications, and a larger Drosophila dataset targeting the insulin receptor/target of rapamycin signaling pathway generated using an LTQ-FT instrument. Using the Drosophila dataset, we also compare and discuss the performance of SAINT analysis based on spectral count and MS1 intensity data in terms of the recovery of orthologous and literature-curated interactions. Given rapid advances in high mass accuracy instrumentation and intensity-based label-free quantification software, we expect that SAINT-MS1 will become a useful tool allowing improved detection of protein interactions in label-free AP-MS data, especially in the low abundance range.
PMCID: PMC3744231  PMID: 22352807
protein-protein interaction; interaction scoring; affinity purification; mass spectrometry; spectral counts; intensity
15.  Computational and informatics strategies for identification of specific protein interaction partners in affinity purification mass spectrometry experiments 
Proteomics  2012;12(10):1639-1655.
Analysis of protein interaction networks and protein complexes using affinity purification and mass spectrometry (AP/MS) is among the most commonly used and successful applications of proteomics technologies. One of the foremost challenges of AP/MS data is the large number of false positive protein interactions present in unfiltered datasets. Here we review computational and informatics strategies for detecting specific protein interaction partners in AP/MS experiments, with a focus on incomplete (as opposed to genome-wide) interactome mapping studies. These strategies range from standard statistical approaches, to empirical scoring schemes optimized for a particular type of data, to advanced computational frameworks. The common denominator among these methods is the use of label-free quantitative information such as spectral counts or integrated peptide intensities that can be extracted from AP/MS data. We also discuss related issues such as combining multiple biological or technical replicates, and dealing with data generated using different tagging strategies. Computational approaches for benchmarking of scoring methods are discussed, and the need for generation of reference AP/MS datasets is highlighted. Finally, we discuss the possibility of more extended modeling of experimental AP/MS data, including integration with external information such as protein interaction predictions based on functional genomics data.
PMCID: PMC3744239  PMID: 22611043
Proteomics; Affinity Purification; Mass Spectrometry; Protein Interactions; Statistical Models; Label-free Quantification; Integrative Analysis
16.  The functional interactome landscape of the human histone deacetylase family 
This study presents the first global protein interaction network for all 11 human HDACs in T cells and an integrative mass spectrometry approach for profiling relative interaction stability within isolated protein complexes.
T-cell lines stably expressing each of the human HDACs (1-11), C-terminally tagged with both EGFP and FLAG, were generated using retroviral transduction. Affinity purification coupled to mass spectrometry-based proteomics (AP-MS) was used to build the first global protein interaction network for all eleven human HDACs in T cells. An optimized label-free AP-MS and computational workflow was developed for profiling relative interaction stability among isolated protein complexes. HDAC11 is a member of the “survival of motor neuron” protein complex with a functional role in mRNA splicing.
Histone deacetylases (HDACs) are a diverse family of essential transcriptional regulatory enzymes that function through the spatial and temporal recruitment of protein complexes. As the composition and regulation of HDAC complexes are only partially characterized, we built the first global protein interaction network for all 11 human HDACs in T cells. Integrating fluorescence microscopy, immunoaffinity purifications, quantitative mass spectrometry, and bioinformatics, we identified over 200 unreported interactions for both well-characterized and lesser-studied HDACs, a subset of which were validated by orthogonal approaches. We establish HDAC11 as a member of the survival of motor neuron complex and pinpoint a functional role in mRNA splicing. We designed a complementary label-free and metabolic-labeling mass spectrometry-based proteomics strategy for profiling interaction stability among different HDAC classes, revealing that HDAC1 interactions within chromatin-remodeling complexes are largely stable, while transcription factors preferentially exist in rapid equilibrium. Overall, this study represents a valuable resource for investigating HDAC functions in health and disease, encompassing emerging themes of HDAC regulation in cell cycle and RNA processing and a deeper functional understanding of HDAC complex stability.
PMCID: PMC3964310  PMID: 23752268
HDAC; I-DIRT; interactions; proteomics; SAINT
17.  The Nucleotide Synthesis Enzyme CAD Inhibits NOD2 Antibacterial Function in Human Intestinal Epithelial Cells 
Gastroenterology  2012;142(7):1483-92.e6.
Polymorphisms that reduce the function of nucleotide-binding oligomerization domain (NOD)2, a bacterial sensor, have been associated with Crohn’s disease (CD). No proteins that regulate NOD2 activity have been identified as selective pharmacologic targets. We sought to discover regulators of NOD2 that might be pharmacologic targets for CD therapies.
Carbamoyl phosphate synthetase/aspartate transcarbamylase/dihydroorotase (CAD) is an enzyme required for de novo pyrimidine nucleotide synthesis; it was identified as a NOD2-interacting protein by immunoprecipitation-coupled mass spectrometry. CAD expression was assessed in colon tissues from individuals with and without inflammatory bowel disease by immunohistochemistry. The interaction between CAD and NOD2 was assessed in human HCT116 intestinal epithelial cells by immunoprecipitation, immunoblot, reporter gene, and gentamicin protection assays. We also analyzed human cell lines that express variants of NOD2 and the effects of RNA interference, overexpression and CAD inhibitors.
CAD was identified as a NOD2-interacting protein expressed at increased levels in the intestinal epithelium of patients with CD compared with controls. Overexpression of CAD inhibited NOD2-dependent activation of nuclear factor κB and p38 mitogen-activated protein kinase, as well as intracellular killing of Salmonella. Reduction of CAD expression or administration of CAD inhibitors increased NOD2-dependent signaling and antibacterial functions of NOD2 variants that are and are not associated with CD.
The nucleotide synthesis enzyme CAD is a negative regulator of NOD2. The antibacterial function of NOD2 variants that have been associated with CD increased in response to pharmacologic inhibition of CAD. CAD is a potential therapeutic target for CD.
PMCID: PMC3565430  PMID: 22387394
NLR; Innate Immunity; IBD; PALA
18.  Keeping Track of Interactomes Using the ProHits LIMS 
Affinity purification coupled with mass spectrometry (AP-MS) is a robust technique used to identify protein-protein interactions. With recent improvements in sample preparation, and dramatic advances in MS instrumentation speed and sensitivity, this technique is becoming more widely used throughout the scientific community. To meet the needs of research groups both large and small, we have developed software solutions for tracking, scoring and analyzing AP-MS data. Here, we provide details for the installation and utilization of ProHits, a Laboratory Information Management System designed specifically for AP-MS interaction proteomics that we distribute freely to the scientific community at, and which is under continuous development. The complete ProHits solution [1] performs scheduled backup of mass spectrometry data and initiates database searches (Mascot, X!Tandem, COMET, SEQUEST and the output from the TransProteomics Pipeline are now supported). It stores search results and enables linking the mass spectrometry data to entries in the relational database module called “Analyst”, which is also available as a stand-alone application (including as an easy-to-install virtual machine implementation [2]). ProHits Analyst is organized in a hierarchical manner by project, bait, experiment and sample and also serves as an electronic notebook. When a sample is created, mass spectrometry search results can be uploaded. Search results can be explored using a series of viewers, filtered based on mass spectrometry quality, frequency of detection or background lists, viewed in Cytoscape-Web or exported to text or as a PSI XML format for deposition in interaction databases. Importantly, however, search results can be further analyzed using the SAINT statistical tool, which is seamlessly integrated within ProHits to derive interaction confidence scores [3-5].
Through its integration with a number of open source tools and public repositories, ProHits facilitates transparent analysis and reporting of AP-MS data.
References: [1] PMID: 20944583; [2] PMID: 22948730; [3] PMID: 20489023; [4] PMID: 21131968; [5] PMID: 22948729
PMCID: PMC3635280
19.  Computational Analysis of Unassigned High Quality MS/MS Spectra in Proteomic Datasets 
Proteomics  2010;10(14):2712-2718.
In a typical shotgun proteomics experiment, a significant number of high quality MS/MS spectra remain “unassigned”. The main focus of this work is to improve our understanding of various sources of unassigned high quality spectra. To achieve this, we designed an iterative computational approach for more efficient interrogation of MS/MS data. The method involves multiple stages of database searching with different search parameters, spectral library searching, blind searching for modified peptides, and genomic database searching. The method is applied to a large publicly available shotgun proteomic dataset.
PMCID: PMC3517130  PMID: 20455209
Tandem mass spectrometry; unassigned spectra; spectral quality assessment; interactive database search; post translational modification; peptide polymorphisms; novel peptides
20.  A statistical model-building perspective to identification of MS/MS spectra with PeptideProphet 
BMC Bioinformatics  2012;13(Suppl 16):S1.
PeptideProphet is a post-processing algorithm designed to evaluate the confidence in identifications of MS/MS spectra returned by a database search. In this manuscript we describe the "what and how" of PeptideProphet in a manner aimed at statisticians and life scientists who would like to gain a more in-depth understanding of the underlying statistical modeling. The theory and rationale behind the mixture-modeling approach taken by PeptideProphet are discussed from a statistical model-building perspective, followed by a description of how a model can be used to express confidence in the identification of individual peptides or sets of peptides. We also demonstrate how to evaluate the quality of model fit and select an appropriate model from several available alternatives. We illustrate the use of PeptideProphet in association with the Trans-Proteomic Pipeline, a free suite of software used for protein identification.
PMCID: PMC3489532  PMID: 23176103
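The mixture-modeling idea in the abstract above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that scores of correct and incorrect identifications each follow a normal distribution with known parameters; the function names, the default parameters and the choice of two normal components are hypothetical and are not taken from PeptideProphet itself. The posterior probability of a correct identification then follows from Bayes' rule.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_correct(score, prior_correct, correct=(3.0, 1.0), incorrect=(0.0, 1.0)):
    """Posterior probability that an identification with this score is correct.

    `correct` and `incorrect` are hypothetical (mean, sd) pairs for the two
    mixture components; `prior_correct` is the mixing proportion of correct
    identifications.
    """
    pos = prior_correct * normal_pdf(score, *correct)
    neg = (1.0 - prior_correct) * normal_pdf(score, *incorrect)
    return pos / (pos + neg)
```

In practice the component parameters and the mixing proportion would be estimated from the data (for example with an EM algorithm) rather than fixed in advance as they are here.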
21.  Comprehensive analysis of proteins of pH fractionated samples using monolithic LC/MS/MS, intact MW measurement and MALDI-QIT-TOF MS 
A comprehensive platform that integrates information from the protein and peptide levels by combining various MS techniques has been employed for the analysis of proteins in fully malignant human breast cancer cells. The cell lysates were subjected to chromatofocusing fractionation, followed by tryptic digestion of pH fractions for on-line monolithic RP-HPLC interfaced with linear ion trap MS analysis for rapid protein identification. This unique approach of direct analysis of pH fractions resulted in the identification of large numbers of proteins from several selected pH fractions, in which approximately 1.5 μg of each of the pH fraction digests was consumed for an analysis time of ca 50 min. In order to combine valuable information retained at the protein level with the protein identifications obtained from the peptide level information, the same pH fraction was analyzed using nonporous (NPS)-RP-HPLC/ESI-TOF MS to obtain intact protein MW measurements. In order to further validate the protein identification procedures from the fraction digest analysis, NPS-RP-HPLC separation was performed for off-line protein collection to closely examine each protein using MALDI-TOF MS and MALDI-quadrupole ion trap (QIT)-TOF MS, and excellent agreement of protein identifications was consistently observed. It was also observed that the comparison to intact MW and other MS information was particularly useful for analyzing proteins whose identifications were suggested by one sequenced peptide from fraction digest analysis.
PMCID: PMC3426914  PMID: 17206599
pH fractionation; intact protein MW; LC/MS/MS; MALDI-QIT-TOF; monolith
22.  Label-free quantitative proteomics and SAINT analysis enable interactome mapping for the human Ser/Thr protein phosphatase 5 
Proteomics  2011;11(8):1508-1516.
Affinity-purification coupled to mass spectrometry (AP-MS) represents a powerful and proven approach for the analysis of protein-protein interactions. However, the detection of true interactions for proteins that are commonly considered background contaminants is currently a limitation of AP-MS. Here using spectral counts and the new statistical tool, Significance Analysis of INTeractome (SAINT), true interaction between the serine/threonine phosphatase 5 (PP5) and a chaperonin, heat shock protein 90 (Hsp90), is discerned. Furthermore, we report and validate a new interaction between PP5 and an Hsp90 adaptor protein, stress-induced phosphoprotein 1 (STIP1; HOP). Mutation of PP5, replacing key basic amino acids (K97A and R101A) in the tetratricopeptide repeat (TPR) region known to be necessary for interactions with Hsp90, abolished both the known interaction of PP5 with Cdc37 and the novel interaction of PP5 with STIP1. Taken together, the results presented demonstrate the usefulness of label-free quantitative proteomics and statistical tools to discriminate between noise and true interactions, even for proteins normally considered as background contaminants.
PMCID: PMC3086140  PMID: 21360678
Protein interactions; Hsp90; protein phosphatase; PP5; affinity purification-mass spectrometry; contaminant filtering; SAINT
23.  Abacus: A computational tool for extracting and pre-processing spectral count data for label-free quantitative proteomic analysis 
Proteomics  2011;11(7):1340-1345.
We describe Abacus, a computational tool for extracting spectral counts from tandem mass spectrometry based proteomic datasets. The program aggregates data from multiple experiments, adjusts spectral counts to accurately account for peptides shared across multiple proteins, and performs common normalization steps. It can also output the spectral count data at the gene level, thus simplifying the integration and comparison between gene and protein expression data. Abacus is compatible with the widely used Trans-Proteomic Pipeline suite of tools and comes with a graphical user interface making it easy to interact with the program. The main aim of Abacus is to streamline the analysis of spectral count data by providing an automated, easy to use solution for extracting this information from proteomic datasets for subsequent, more sophisticated statistical analysis.
PMCID: PMC3113614  PMID: 21360675
Label free quantification; spectral counts; software; tandem mass spectrometry; protein inference; shared peptides
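The abstract above mentions adjusting spectral counts for peptides shared across proteins. One common scheme, sketched below in Python, splits the count of each shared peptide among its proteins in proportion to their unique-peptide counts. This is an assumed illustration, not necessarily the rule Abacus implements; the function name and the input layout are hypothetical.

```python
from collections import defaultdict


def adjusted_spectral_counts(peptide_counts, peptide_to_proteins):
    """Distribute shared-peptide spectral counts across proteins.

    peptide_counts: dict mapping peptide -> spectral count.
    peptide_to_proteins: dict mapping peptide -> list of protein IDs.
    """
    # First pass: counts from peptides that map to exactly one protein.
    unique = defaultdict(float)
    for pep, n in peptide_counts.items():
        prots = peptide_to_proteins[pep]
        if len(prots) == 1:
            unique[prots[0]] += n

    # Second pass: split shared peptides in proportion to unique counts,
    # falling back to an even split when no protein has unique evidence.
    adjusted = defaultdict(float, unique)
    for pep, n in peptide_counts.items():
        prots = peptide_to_proteins[pep]
        if len(prots) > 1:
            total = sum(unique[p] for p in prots)
            for p in prots:
                weight = unique[p] / total if total > 0 else 1.0 / len(prots)
                adjusted[p] += n * weight
    return dict(adjusted)
```

For example, with unique counts of 6 for protein P1 and 2 for protein P2, a shared peptide with 4 spectra contributes 3 to P1 and 1 to P2.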
24.  Modularity and hormone sensitivity of the Drosophila melanogaster insulin receptor/target of rapamycin interaction proteome 
First systematic analysis of the evolutionary conserved InR/TOR pathway interaction proteome in Drosophila. Quantitative mass spectrometry revealed that 22% of identified protein interactions are regulated by the growth hormone insulin affecting membrane proximal as well as intracellular signaling complexes. Systematic RNA interference linked a significant fraction of network components to the control of dTOR kinase activity. Combined biochemical and genetic data suggest dTTT, a dTOR-containing complex required for cell growth control by dTORC1 and dTORC2 in vivo.
Cellular growth is a fundamental process that requires constant adaptations to changing environmental conditions, like growth factor and nutrient availability, energy levels and more. Over the years, the insulin receptor/target of rapamycin pathway (InR/TOR) emerged as a key signaling system for the control of metazoan cell growth. Genetic screens carried out in the fruit fly Drosophila melanogaster identified key InR/TOR pathway components and their relationships. Phenotypes such as altered cell growth are likely to emerge from perturbed dynamic networks containing InR/TOR pathway components, which stably or transiently interact with other cellular proteins to form complexes and networks thereof. Systematic studies on the topology and dynamics of protein interaction networks become therefore highly relevant to gain systems level understanding of deregulated cell growth. Despite much progress in genetic analysis only few systematic protein interaction studies have been reported for Drosophila, which in most cases lack quantitative information representing the dynamic nature of such networks. Here, we present the first quantitative affinity purification mass spectrometry (AP–MS/MS) analysis on the evolutionary conserved InR/TOR signaling network in Drosophila. Systematic RNAi-based functional analysis of identified network components revealed key components linked to the regulation of the central effector kinase dTOR. This includes also dTTT, a novel dTOR-containing complex required for the control of dTORC1 and dTORC2 in vivo.
For systematic AP–MS analysis, we generated Drosophila Kc167 cell lines inducibly expressing affinity-tagged bait proteins previously linked to InR/TOR signaling. Bait expressing Kc167 cell lines were harvested before and after insulin stimulation for subsequent affinity purification. Following LC–MS/MS analysis and probabilistic data filtering using SAINT (Choi et al, 2010), we generated a quantitative network model from 97 high confidence protein–protein interactions and 58 network components (Figure 2). The presented network displayed a high degree of orthologous interactions conserved also in human cells and identified a number of novel molecular interactions with InR/TOR signaling components for future hypothesis driven analysis.
To measure insulin-induced changes within the InR/TOR interaction proteome, we applied a recently introduced label-free quantitative MS approach (Rinner et al, 2007). The obtained quantitative data suggest that 22% of all interactions in the network are regulated by insulin. Major changes could be observed within the membrane proximal InR/chico/PI3K signaling complexes, and also in 14-3-3 protein containing signaling complexes and dTORC1, a complex that contains besides dTOR all major orthologous proteins found also in human mTORC1 including the two dTORC1 substrates d4E-BP (Thor) and S6 Kinase (S6K). Insulin triggered both dissociation and association of dTORC1 proteins. Among the proteins that showed enhanced binding to dTORC1 upon insulin stimulation we found Unkempt, a RING-finger protein with a proposed role in ubiquitin-mediated protein degradation (Lores et al, 2010). Besides dTORC1 our systematic AP–MS analysis also revealed the presence of dTORC2, the second major TOR complex in Drosophila. dTORC2 contains the Drosophila orthologs of human mTORC2 proteins, but in contrast to dTORC1 was not affected upon insulin stimulation. Interestingly, we also found a specific set of proteins that were not linked to the canonical TOR complexes TORC1 and TORC2 in dTOR purifications. These include LqfR (liquid facets related), Pontin, Reptin, Spaghetti and the gene product of CG16908. We found the same set of proteins when we used CG16908 as a bait, suggesting complex formation among the identified proteins. None of the dTORC1/2 components besides dTOR was identified in CG16908 purifications, indicating that these proteins form dTOR complexes distinct from dTORC1 and dTORC2. Based on known interaction information from other species and data obtained from this study we refer to this complex as dTTT (Drosophila TOR, TELO2, TTI1) (Horejsi et al, 2010; Hurov et al, 2010; Kaizuka et al, 2010).
A directed quantitative MS analysis of dTOR complex components suggests that dTORC1 is the most abundant dTOR complex we identified in Kc167 cells.
We next studied the potential roles of the identified network components for controlling the activity of the dInR/TOR pathway using systematic RNAi depletion and quantitative western blotting to measure the changes in abundance of phosphorylated substrates of dTORC1 (Thor/d4E-BP, dS6K) and dTORC2 (dPKB) in RNAi-treated cells (Figure 5). Overall, we could identify 16 proteins (out of 58) whose depletion caused an at least 50% increase or decrease in the levels of phosphorylated d4E-BP, S6K and/or PKB compared with control GFP RNAi. Besides established pathway components, we found several novel regulators within the dInR/TOR interaction network. For example, RNAi against the novel insulin-regulated dTORC1 component Unkempt resulted in enhanced phosphorylation of the dTORC1 substrate d4E-BP, which suggests a negative role for Unkempt on dTORC1 activity. In contrast, depletion of CG16908 and LqfR caused hypo-phosphorylation of all dTOR substrates similar to dTOR itself, suggesting a positive role for the dTTT complex on dTOR activity. Subsequently, we tested whether dTTT components also play a role in dTOR-mediated cell growth in vivo. Depletion of both dTTT components, CG16908 and LqfR, in the Drosophila eye resulted in a substantial decrease in eye size. Likewise, FLP-FRT-mediated mitotic recombination resulted in CG16908 and LqfR mutant clones with a similar reduced growth phenotype as observed in dTOR mutant clones. Hence, the combined biochemical and genetic analysis revealed dTTT as a dTOR-containing complex that is required for the activity of both dTORC1 and dTORC2 and thus plays a critical role in controlling cell growth.
Taken together, these results illustrate how a systematic quantitative AP–MS approach when combined with systematic functional analysis in Drosophila can reveal novel insights into the dynamic organization of regulatory networks for cell growth control in metazoans.
Using quantitative mass spectrometry, this study reports how insulin affects the modularity of the interaction proteome of the Drosophila InR/TOR pathway, an evolutionary conserved signaling system for the control of metazoan cell growth. Systematic functional analysis linked a significant number of identified network components to the control of dTOR activity and revealed dTTT, a dTOR complex required for in vivo cell growth control by dTORC1 and dTORC2.
Genetic analysis in Drosophila melanogaster has been widely used to identify a system of genes that control cell growth in response to insulin and nutrients. Many of these genes encode components of the insulin receptor/target of rapamycin (InR/TOR) pathway. However, the biochemical context of this regulatory system is still poorly characterized in Drosophila. Here, we present the first quantitative study that systematically characterizes the modularity and hormone sensitivity of the interaction proteome underlying growth control by the dInR/TOR pathway. Applying quantitative affinity purification and mass spectrometry, we identified 97 high confidence protein interactions among 58 network components. In all, 22% of the detected interactions were regulated by insulin affecting membrane proximal as well as intracellular signaling complexes. Systematic functional analysis linked a subset of network components to the control of dTORC1 and dTORC2 activity. Furthermore, our data suggest the presence of three distinct dTOR kinase complexes, including the evolutionary conserved dTTT complex (Drosophila TOR, TELO2, TTI1). Subsequent genetic studies in flies suggest a role for dTTT in controlling cell growth via a dTORC1- and dTORC2-dependent mechanism.
PMCID: PMC3261712  PMID: 22068330
cell growth; InR/TOR pathway; interaction proteome; quantitative mass spectrometry; signaling
25.  A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics 
Journal of proteomics  2010;73(11):2092-2123.
This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide to spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from peptide to protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues.
PMCID: PMC2956504  PMID: 20816881
Proteomics; Bioinformatics; Mass Spectrometry; Peptide Identification; Protein Inference; Statistical Models; False Discovery Rates
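As a rough illustration of the global error rate estimation mentioned in the abstract above, the following Python sketch estimates a false discovery rate with the widely used target-decoy strategy. The abstract does not name this strategy, so its use here is an assumption; the function name and the (score, is_decoy) input layout are likewise hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate the FDR among target matches scoring at or above threshold.

    psms: iterable of (score, is_decoy) pairs, one per peptide-spectrum match,
    from a search against a combined target and decoy database.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    # Decoy hits above the threshold approximate the number of false target hits.
    return decoys / targets if targets else 0.0
```

Raising the threshold generally lowers the estimated FDR at the cost of fewer accepted identifications, so the threshold is typically chosen as the lowest score that keeps the estimate under a target rate.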
