1.  Micro-RNAs in regenerating lungs: an integrative systems biology analysis of murine influenza pneumonia 
BMC Genomics  2014;15(1):587.
Tissue regeneration in the lungs is gaining increasing interest as a potential influenza management strategy. In this study, we explored the role of microRNAs, short non-coding RNAs involved in post-transcriptional regulation, during pulmonary regeneration after influenza infection.
We profiled miRNA and mRNA expression levels following lung injury and tissue regeneration using a murine influenza pneumonia model. BALB/c mice were infected with a sub-lethal dose of influenza A/PR/8(H1N1) virus, and their lungs were harvested at 7 and 15 days post-infection to evaluate the expression of ~300 miRNAs along with ~36,000 genes using microarrays. A global network was constructed between differentially expressed miRNAs and their potential target genes with particular focus on the pulmonary repair and regeneration processes to elucidate the regulatory role of miRNAs in the lung repair pathways. The miRNA arrays revealed a global down-regulation of miRNAs. TargetScan analyses also revealed specific miRNAs highly involved in targeting relevant gene functions in repair such as miR-290 and miR-505 at 7 dpi; and let-7, miR-21 and miR-30 at 15 dpi.
The significantly differentially regulated miRNAs are implicated in the activation or suppression of cellular proliferation and stem cell maintenance, which are required during the repair of the damaged lungs. These findings provide opportunities in the development of novel repair strategies in influenza-induced pulmonary injury.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-587) contains supplementary material, which is available to authorized users.
PMCID: PMC4108790  PMID: 25015185
Lung repair; Pulmonary regeneration; Influenza pneumonia; miRNAs; miRNome; Transcriptome
2.  A Global Protein Kinase and Phosphatase Interaction Network in Yeast 
Science (New York, N.Y.)  2010;328(5981):1043-1046.
The interactions of protein kinases and phosphatases with their regulatory subunits and substrates underpin cellular regulation. We identified a kinase and phosphatase interaction (KPI) network of 1844 interactions in budding yeast by mass spectrometric analysis of protein complexes. The KPI network contained many dense local regions of interactions that suggested new functions. Notably, the cell cycle phosphatase Cdc14 associated with multiple kinases that revealed roles for Cdc14 in mitogen-activated protein kinase signaling, the DNA damage response, and metabolism, whereas interactions of the target of rapamycin complex 1 (TORC1) uncovered new effector kinases in nitrogen and carbon metabolism. An extensive backbone of kinase-kinase interactions cross-connects the proteome and may serve to coordinate diverse cellular responses.
PMCID: PMC3983991  PMID: 20489023
3.  Sparsely correlated hidden Markov models with application to genome-wide location studies 
Bioinformatics  2013;29(5):533-541.
Motivation: Multiply correlated datasets have become increasingly common in genome-wide location analysis of regulatory proteins and epigenetic modifications. Their correlation can be directly incorporated into a statistical model to capture underlying biological interactions, but such modeling quickly becomes computationally intractable.
Results: We present sparsely correlated hidden Markov models (scHMM), a novel method for performing simultaneous hidden Markov model (HMM) inference for multiple genomic datasets. In scHMM, a single HMM is assumed for each series, but the transition probability in each series depends on not only its own hidden states but also the hidden states of other related series. For each series, scHMM uses penalized regression to select a subset of the other data series and estimate their effects on the odds of each transition in the given series. Following this, hidden states are inferred using a standard forward–backward algorithm, with the transition probabilities adjusted by the model at each position, which helps retain the order of computation close to fitting independent HMMs (iHMM). Hence, scHMM is a collection of inter-dependent non-homogeneous HMMs, capable of giving a close approximation to a fully multivariate HMM fit. A simulation study shows that scHMM achieves comparable sensitivity to the multivariate HMM fit at a much lower computational cost. The method was demonstrated in the joint analysis of 39 histone modifications, CTCF and RNA polymerase II in human CD4+ T cells. scHMM reported fewer high-confidence regions than iHMM in this dataset, but scHMM could recover previously characterized histone modifications in relevant genomic regions better than iHMM. In addition, the resulting combinatorial patterns from scHMM could be better mapped to the 51 states reported by the multivariate HMM method of Ernst and Kellis.
Availability: The scHMM package is freely available for download and is recommended for use in a Linux environment.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3582268  PMID: 23325620
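The forward–backward step described in the scHMM abstract above, in which the transition probabilities may differ at every position, can be illustrated with a minimal sketch. This is not the scHMM package's code: the function name `forward_backward` and its argument layout are hypothetical, and the per-position transition matrices are taken as given. How they would be derived from the hidden states of other series by penalized regression is not shown.

```python
from typing import List, Sequence


def forward_backward(
    init: Sequence[float],
    trans: Sequence[Sequence[Sequence[float]]],
    emis: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Posterior state probabilities for a non-homogeneous HMM (illustrative).

    init[k]        : P(state at position 0 = k)
    trans[t][j][k] : P(state at t+1 = k | state at t = j); one matrix per position
    emis[t][k]     : likelihood of the observation at position t under state k
    """
    n, n_states = len(emis), len(init)

    # Forward pass, normalised at each position to avoid numerical underflow.
    first = [init[k] * emis[0][k] for k in range(n_states)]
    total = sum(first)
    alpha = [[a / total for a in first]]
    for t in range(1, n):
        cur = [
            emis[t][k]
            * sum(alpha[t - 1][j] * trans[t - 1][j][k] for j in range(n_states))
            for k in range(n_states)
        ]
        total = sum(cur)
        alpha.append([a / total for a in cur])

    # Backward pass, also normalised at each position.
    beta = [[1.0] * n_states for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [
            sum(
                trans[t][j][k] * emis[t + 1][k] * beta[t + 1][k]
                for k in range(n_states)
            )
            for j in range(n_states)
        ]
        total = sum(cur)
        beta[t] = [b / total for b in cur]

    # Posterior at each position is proportional to alpha * beta.
    posterior = []
    for t in range(n):
        joint = [alpha[t][k] * beta[t][k] for k in range(n_states)]
        total = sum(joint)
        posterior.append([g / total for g in joint])
    return posterior
```

Because the transition matrix is looked up by position, the cost stays linear in sequence length, which is consistent with the abstract's remark that the computation remains close to that of fitting independent HMMs.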
4.  The CRAPome: a Contaminant Repository for Affinity Purification Mass Spectrometry Data 
Nature methods  2013;10(8):730-736.
Affinity purification coupled with mass spectrometry (AP-MS) is now a widely used approach for the identification of protein-protein interactions. However, for any given protein of interest, determining which of the identified polypeptides represent bona fide interactors versus those that are background contaminants (e.g. proteins that interact with the solid-phase support, affinity reagent or epitope tag) is a challenging task. While the standard approach is to identify nonspecific interactions using one or more negative controls, most small-scale AP-MS studies do not capture a complete, accurate background protein set. Fortunately, negative controls are largely bait-independent. Hence, aggregating negative controls from multiple AP-MS studies can increase coverage and improve the characterization of background associated with a given experimental protocol. Here we present the Contaminant Repository for Affinity Purification (the CRAPome) and describe the use of this resource to score protein-protein interactions. The repository (currently available for Homo sapiens and Saccharomyces cerevisiae) and computational tools are freely available online.
PMCID: PMC3773500  PMID: 23921808
5.  Reinvestigation of Aminoacyl-TRNA Synthetase Core Complex by Affinity Purification-Mass Spectrometry Reveals TARSL2 as a Potential Member of the Complex 
PLoS ONE  2013;8(12):e81734.
Twenty different aminoacyl-tRNA synthetases (ARSs) link each amino acid to its cognate tRNA. Individual ARSs are also associated with various non-canonical activities involved in neuronal diseases, cancer and autoimmune diseases. Among them, eight ARSs (D, EP, I, K, L, M, Q and RARS), together with three ARS-interacting multifunctional proteins (AIMPs), are currently known to assemble the multi-synthetase complex (MSC). However, the cellular function and global topology of MSC remain unclear. In order to understand the complex interaction within MSC, we conducted affinity purification-mass spectrometry (AP-MS) using each of AIMP1, AIMP2 and KARS as a bait protein. Mass spectrometric data were funneled into SAINT software to distinguish true interactions from background contaminants. A total of 40, 134 and 101 proteins for the respective baits scored above a SAINT probability of 0.9 in HEK 293T cells. Complex-forming ARSs, such as DARS, EPRS, IARS, KARS, LARS, MARS, QARS and RARS, were consistently found to interact with each bait. Variants such as AIMP2-DX2 and AIMP1 isoform 2 were found with specific peptides in KARS precipitates. Relative enrichment analysis of the mass spectrometric data demonstrated that TARSL2 (threonyl-tRNA synthetase like-2) was highly enriched with the ARS-core complex. The interaction was further confirmed by coimmunoprecipitation of TARSL2 with other ARS core-complex components. We suggest TARSL2 as a new component of ARS core-complex.
PMCID: PMC3846882  PMID: 24312579
6.  Analyzing protein-protein interactions from affinity purification-mass spectrometry data with SAINT 
Significance Analysis of INTeractome (SAINT) is a software package for scoring protein-protein interactions based on label-free quantitative proteomics data (e.g. spectral count or intensity) in affinity purification – mass spectrometry (AP-MS) experiments. SAINT allows bench scientists to select bona fide interactions and remove non-specific interactions in an unbiased manner. However, there is no 'one-size-fits-all' statistical model for every dataset, since the experimental design varies across studies. Key variables include the number of baits, the number of biological replicates per bait, and control purifications. Here we give a detailed account of input data format, control data, selection of high confidence interactions, and visualization of filtered data. We explain additional options for customizing the statistical model for optimal filtering in specific datasets. We also discuss a graphical user interface of SAINT in connection to the LIMS system ProHits which can be installed as a virtual machine on Mac OSX or PC Windows computers.
PMCID: PMC3446209  PMID: 22948729
Protein-protein interactions; Label-free quantitative proteomics; Affinity purification – mass spectrometry (AP-MS); Statistical model
7.  Using ProHits to store, annotate and analyze affinity purification - mass spectrometry (AP-MS) data 
Affinity purification coupled with mass spectrometry (AP-MS) is a robust technique used to identify protein-protein interactions. With recent improvements in sample preparation, and dramatic advances in MS instrumentation speed and sensitivity, this technique is becoming more widely used throughout the scientific community. To meet the needs of research groups both large and small, we have developed software solutions for tracking, scoring and analyzing AP-MS data. Here, we provide details for the installation and utilization of ProHits, a Laboratory Information Management System designed specifically for AP-MS interaction proteomics. This protocol explains: (i) how to install the complete ProHits system, including modules for the management of mass spectrometry files and the analysis of interaction data, and (ii) alternative options for the use of pre-existing search results in simpler versions of ProHits, including a virtual machine implementation of our ProHits Lite software. We also describe how to use the main features of the software to analyze AP-MS data.
PMCID: PMC3669397  PMID: 22948730
Affinity purification coupled with mass spectrometry; Data analysis; Virtual machine; Statistical models; Protein-protein interactions
8.  Adaptive Discriminant Function Analysis and Re-ranking of MS/MS Database Search Results for Improved Peptide Identification in Shotgun Proteomics 
Journal of proteome research  2008;7(11):4878-4889.
Robust statistical validation of peptide identifications obtained by tandem mass spectrometry and sequence database searching is an important task in shotgun proteomics. PeptideProphet is a commonly used computational tool that computes confidence measures for peptide identifications. In this paper, we investigate several limitations of the PeptideProphet modeling approach, including the use of fixed coefficients in computing the discriminant search score and selection of the top scoring peptide assignment per spectrum only. To address these limitations, we describe an adaptive method in which a new discriminant function is learned from the data in an iterative fashion. We extend the modeling framework to go beyond the top scoring peptide assignment per spectrum. We also investigate the effect of clustering the spectra according to their spectrum quality score followed by cluster-specific mixture modeling. The analysis is carried out using data acquired from a mixture of purified proteins on four different types of mass spectrometers, as well as using a complex human serum dataset. A special emphasis is placed on the analysis of data generated on high mass accuracy instruments.
PMCID: PMC3744223  PMID: 18788775
Tandem Mass Spectrometry; Database searching; Peptide Identification; Statistical Modeling; Adaptive Discriminant Analysis; Mass Accuracy; Decoy Sequences
9.  SAINT-MS1: protein-protein interaction scoring using label-free intensity data in affinity purification – mass spectrometry experiments 
Journal of proteome research  2012;11(4):2619-2624.
We present a statistical method, SAINT-MS1, for scoring protein-protein interactions based on the label-free MS1 intensity data from affinity purification - mass spectrometry (AP-MS) experiments. The method is an extension of Significance Analysis of INTeractome (SAINT), a model-based method previously developed for spectral count data. We reformulated the statistical model for the log-transformed intensity data, including adequate treatment of missing observations, i.e. interactions whose quantitative data are inconsistent over replicate purifications. We demonstrate the performance of SAINT-MS1 using two recently published datasets: a small LTQ-Orbitrap dataset with three replicate purifications of a single human bait protein and control purifications, and a larger Drosophila dataset targeting the insulin receptor/target of rapamycin signaling pathway generated using an LTQ-FT instrument. Using the Drosophila dataset, we also compare and discuss the performance of SAINT analysis based on spectral count and MS1 intensity data in terms of the recovery of orthologous and literature-curated interactions. Given rapid advances in high mass accuracy instrumentation and intensity-based label-free quantification software, we expect that SAINT-MS1 will become a useful tool allowing improved detection of protein interactions in label-free AP-MS data, especially in the low abundance range.
PMCID: PMC3744231  PMID: 22352807
protein-protein interaction; interaction scoring; affinity purification; mass spectrometry; spectral counts; intensity
10.  Label-free quantitative proteomics reveals differentially regulated proteins in the latex of sticky diseased Carica papaya L. plants 
Journal of Proteomics  2012;75(11):3191-3198.
Papaya meleira virus (PMeV) is so far the only described laticifer-infecting virus, the causal agent of papaya (Carica papaya L.) sticky disease. The effects of PMeV on the laticifers’ regulatory network were addressed here through the proteomic analysis of papaya latex. Using both 1-DE- and 1D-LC-ESI-MS/MS, 160 unique papaya latex proteins were identified, representing 122 new proteins in the latex of this plant. Quantitative analysis by normalized spectral counting revealed 10 down-regulated proteins in the latex of diseased plants: 9 cysteine proteases (chymopapain) and 1 latex serine proteinase inhibitor. A repression of papaya latex proteolytic activity during PMeV infection was hypothesized. This was further confirmed by enzymatic assays that showed a reduction of cysteine-protease-associated proteolytic activity in the diseased papaya latex. These findings are discussed in the context of plant responses against pathogens and may greatly contribute to understanding the roles of laticifers in plant stress responses.
PMCID: PMC3381983  PMID: 22465191
Carica papaya; Label-free quantitative proteomics; Latex; Mass spectrometry; Plant proteomics
11.  Keeping Track of Interactomes Using the ProHits LIMS 
Affinity purification coupled with mass spectrometry (AP-MS) is a robust technique used to identify protein-protein interactions. With recent improvements in sample preparation, and dramatic advances in MS instrumentation speed and sensitivity, this technique is becoming more widely used throughout the scientific community. To meet the needs of research groups both large and small, we have developed software solutions for tracking, scoring and analyzing AP-MS data. Here, we provide details for the installation and utilization of ProHits, a Laboratory Information Management System designed specifically for AP-MS interaction proteomics that we distribute freely to the scientific community, and which is under continuous development. The complete ProHits solution [1] performs scheduled backup of mass spectrometry data and initiates database searches (Mascot, X!Tandem, COMET, SEQUEST and the output from the Trans-Proteomic Pipeline are now supported). It stores search results and enables linking the mass spectrometry data to entries in the relational database module called “Analyst”, which is also available as a stand-alone application (including as an easy-to-install virtual machine implementation [2]). ProHits Analyst is organized in a hierarchical manner by project, bait, experiment and sample and also serves as an electronic notebook. When a sample is created, mass spectrometry search results can be uploaded. Search results can be explored using a series of viewers, filtered based on mass spectrometry quality, frequency of detection or background lists, viewed in Cytoscape-Web or exported to text or as a PSI XML format for deposition in interaction databases. Importantly, however, search results can be further analyzed using the SAINT statistical tool, which is seamlessly integrated within ProHits to derive interaction confidence scores [3-5].
Through its integration with a number of open source tools and public repositories, ProHits facilitates transparent analysis and reporting of AP-MS data.
References: [1] PMID: 20944583; [2] PMID: 22948730; [3] PMID: 20489023; [4] PMID: 21131968; [5] PMID: 22948729
PMCID: PMC3635280
12.  A Web Resource for Improved Analysis of AP-MS Protein Interaction Data 
Affinity purification coupled with mass spectrometry (AP-MS) is now a widely used approach for the identification of protein-protein interactions. However, for any given protein of interest, determining which of the identified polypeptides represent bona fide interactors versus those that are background contaminants (e.g. proteins that interact with the solid-phase support, affinity reagent or epitope tag) is a challenging task. While the standard approach is to identify nonspecific interactions using one or more negative controls, most small-scale AP-MS studies do not capture a complete, accurate background protein set. Fortunately, since negative controls are largely bait-independent, we reasoned that the negative controls generated by the proteomics research community could be developed as a resource for scoring AP-MS data.
Here we present the Contaminant Repository for Affinity Purification (The CRAPome), currently containing AP-MS data from 343 control purifications conducted by 11 different research groups. Users employ an intuitive graphical user interface to explore the database, by either querying one protein at a time, downloading background contaminant lists for selected experimental conditions, or uploading their own data (alongside their own negative controls when available) and performing data analysis. The CRAPome database scores contaminants vs. true interactors based on semi-quantitative mass spectrometry data (normalized spectral counts) embedded in most mass spectrometry experiments. The Significance Analysis of INTeractome (SAINT) scoring scheme and a simpler Fold Change calculation (FC score) are used to score user-supplied data and return a ranked list of putative interactors. We also describe database structure and composition, provide examples of the use of this resource to filter contaminants with properly chosen controls, and demonstrate the utility of the scoring scheme for identifying bona fide interaction partners. The CRAPome accommodates a variety of purification schemes and, while currently focused on human data, will be expanded to other species.
PMCID: PMC3635329
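The idea of a fold-change score mentioned in the abstract above can be sketched as the ratio of a prey's average spectral count in bait purifications to its average in negative controls. This is an illustration only and not the CRAPome's actual FC definition: the function name `fold_change_score`, the use of simple means, and the `pseudo_count` added to avoid division by zero are all assumptions, and the counts are assumed to be already normalized.

```python
from statistics import mean
from typing import Sequence


def fold_change_score(
    bait_counts: Sequence[float],
    control_counts: Sequence[float],
    pseudo_count: float = 1.0,  # assumed smoothing term, not from the source
) -> float:
    """Ratio of mean spectral count in bait runs to mean count in control runs."""
    if not bait_counts or not control_counts:
        raise ValueError("need at least one bait and one control purification")
    return (mean(bait_counts) + pseudo_count) / (mean(control_counts) + pseudo_count)


# A prey seen with 10 and 14 spectra in two bait replicates,
# and 1 and 3 spectra in two negative controls.
score = fold_change_score([10, 14], [1, 3])
```

Preys can then be ranked by this score, with higher values indicating stronger enrichment over background.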
13.  Label-free quantitative proteomics and SAINT analysis enable interactome mapping for the human Ser/Thr protein phosphatase 5 
Proteomics  2011;11(8):1508-1516.
Affinity-purification coupled to mass spectrometry (AP-MS) represents a powerful and proven approach for the analysis of protein-protein interactions. However, the detection of true interactions for proteins that are commonly considered background contaminants is currently a limitation of AP-MS. Here using spectral counts and the new statistical tool, Significance Analysis of INTeractome (SAINT), true interaction between the serine/threonine phosphatase 5 (PP5) and a chaperonin, heat shock protein 90 (Hsp90), is discerned. Furthermore, we report and validate a new interaction between PP5 and an Hsp90 adaptor protein, stress-induced phosphoprotein 1 (STIP1; HOP). Mutation of PP5, replacing key basic amino acids (K97A and R101A) in the tetratricopeptide repeat (TPR) region known to be necessary for interactions with Hsp90, abolished both the known interaction of PP5 with Cdc37 and the novel interaction of PP5 with STIP1. Taken together, the results presented demonstrate the usefulness of label-free quantitative proteomics and statistical tools to discriminate between noise and true interactions, even for proteins normally considered as background contaminants.
PMCID: PMC3086140  PMID: 21360678
Protein interactions; Hsp90; protein phosphatase; PP5; affinity purification-mass spectrometry; contaminant filtering; SAINT
14.  When One and One Gives More than Two: Challenges and Opportunities of Integrative Omics 
Frontiers in Genetics  2012;2:105.
Since the dawn of the post-genomic era a myriad of novel high-throughput technologies have been developed that are capable of measuring thousands of biological molecules at once, giving rise to various “omics” platforms. These advances offer the unique opportunity to study how individual parts of a biological system work together to produce emerging phenotypes. Today, many research laboratories are moving toward applying multiple omics platforms to analyze the same biological samples. In addition, network information of interacting molecules is being incorporated more and more into the analysis and interpretation of these multiple omics datasets, which provides novel ways to integrate multiple layers of heterogeneous biological information into a single coherent picture. Here, we provide a perspective on how such recent “integrative omics” efforts are likely going to shift biological paradigms once again, and what challenges lie ahead.
PMCID: PMC3262227  PMID: 22303399
data integration; omics; systems biology; statistical data analysis
15.  Modularity and hormone sensitivity of the Drosophila melanogaster insulin receptor/target of rapamycin interaction proteome 
First systematic analysis of the evolutionary conserved InR/TOR pathway interaction proteome in Drosophila. Quantitative mass spectrometry revealed that 22% of identified protein interactions are regulated by the growth hormone insulin affecting membrane proximal as well as intracellular signaling complexes. Systematic RNA interference linked a significant fraction of network components to the control of dTOR kinase activity. Combined biochemical and genetic data suggest dTTT, a dTOR-containing complex required for cell growth control by dTORC1 and dTORC2 in vivo.
Cellular growth is a fundamental process that requires constant adaptations to changing environmental conditions, like growth factor and nutrient availability, energy levels and more. Over the years, the insulin receptor/target of rapamycin pathway (InR/TOR) emerged as a key signaling system for the control of metazoan cell growth. Genetic screens carried out in the fruit fly Drosophila melanogaster identified key InR/TOR pathway components and their relationships. Phenotypes such as altered cell growth are likely to emerge from perturbed dynamic networks containing InR/TOR pathway components, which stably or transiently interact with other cellular proteins to form complexes and networks thereof. Systematic studies on the topology and dynamics of protein interaction networks therefore become highly relevant to gain systems level understanding of deregulated cell growth. Despite much progress in genetic analysis, only a few systematic protein interaction studies have been reported for Drosophila, which in most cases lack quantitative information representing the dynamic nature of such networks. Here, we present the first quantitative affinity purification mass spectrometry (AP–MS/MS) analysis on the evolutionary conserved InR/TOR signaling network in Drosophila. Systematic RNAi-based functional analysis of identified network components revealed key components linked to the regulation of the central effector kinase dTOR. This also includes dTTT, a novel dTOR-containing complex required for the control of dTORC1 and dTORC2 in vivo.
For systematic AP–MS analysis, we generated Drosophila Kc167 cell lines inducibly expressing affinity-tagged bait proteins previously linked to InR/TOR signaling. Bait expressing Kc167 cell lines were harvested before and after insulin stimulation for subsequent affinity purification. Following LC–MS/MS analysis and probabilistic data filtering using SAINT (Choi et al, 2010), we generated a quantitative network model from 97 high confidence protein–protein interactions and 58 network components (Figure 2). The presented network displayed a high degree of orthologous interactions conserved also in human cells and identified a number of novel molecular interactions with InR/TOR signaling components for future hypothesis driven analysis.
To measure insulin-induced changes within the InR/TOR interaction proteome, we applied a recently introduced label-free quantitative MS approach (Rinner et al, 2007). The obtained quantitative data suggest that 22% of all interactions in the network are regulated by insulin. Major changes could be observed within the membrane proximal InR/chico/PI3K signaling complexes, and also in 14-3-3 protein containing signaling complexes and dTORC1, a complex that contains, besides dTOR, all major orthologous proteins also found in human mTORC1, including the two dTORC1 substrates d4E-BP (Thor) and S6 Kinase (S6K). Insulin triggered both dissociation and association of dTORC1 proteins. Among the proteins that showed enhanced binding to dTORC1 upon insulin stimulation we found Unkempt, a RING-finger protein with a proposed role in ubiquitin-mediated protein degradation (Lores et al, 2010). Besides dTORC1, our systematic AP–MS analysis also revealed the presence of dTORC2, the second major TOR complex in Drosophila. dTORC2 contains the Drosophila orthologs of human mTORC2 proteins, but in contrast to dTORC1 was not affected by insulin stimulation. Interestingly, we also found a specific set of proteins that were not linked to the canonical TOR complexes TORC1 and TORC2 in dTOR purifications. These include LqfR (liquid facets related), Pontin, Reptin, Spaghetti and the gene product of CG16908. We found the same set of proteins when we used CG16908 as a bait, suggesting complex formation among the identified proteins. None of the dTORC1/2 components besides dTOR was identified in CG16908 purifications, indicating that these proteins form dTOR complexes distinct from dTORC1 and dTORC2. Based on known interaction information from other species and data obtained from this study we refer to this complex as dTTT (Drosophila TOR, TELO2, TTI1) (Horejsi et al, 2010; Hurov et al, 2010; Kaizuka et al, 2010).
A directed quantitative MS analysis of dTOR complex components suggests that dTORC1 is the most abundant dTOR complex we identified in Kc167 cells.
We next studied the potential roles of the identified network components for controlling the activity of the dInR/TOR pathway using systematic RNAi depletion and quantitative western blotting to measure the changes in abundance of phosphorylated substrates of dTORC1 (Thor/d4E-BP, dS6K) and dTORC2 (dPKB) in RNAi-treated cells (Figure 5). Overall, we could identify 16 proteins (out of 58) whose depletion caused an at least 50% increase or decrease in the levels of phosphorylated d4E-BP, S6K and/or PKB compared with control GFP RNAi. Besides established pathway components, we found several novel regulators within the dInR/TOR interaction network. For example, RNAi against the novel insulin-regulated dTORC1 component Unkempt resulted in enhanced phosphorylation of the dTORC1 substrate d4E-BP, which suggests a negative role for Unkempt on dTORC1 activity. In contrast, depletion of CG16908 and LqfR caused hypo-phosphorylation of all dTOR substrates similar to dTOR itself, suggesting a positive role for the dTTT complex on dTOR activity. Subsequently, we tested whether dTTT components also play a role in dTOR-mediated cell growth in vivo. Depletion of both dTTT components, CG16908 and LqfR, in the Drosophila eye resulted in a substantial decrease in eye size. Likewise, FLP-FRT-mediated mitotic recombination resulted in CG16908 and LqfR mutant clones with a similar reduced growth phenotype as observed in dTOR mutant clones. Hence, the combined biochemical and genetic analysis revealed dTTT as a dTOR-containing complex that is required for the activity of both dTORC1 and dTORC2 and thus plays a critical role in controlling cell growth.
Taken together, these results illustrate how a systematic quantitative AP–MS approach when combined with systematic functional analysis in Drosophila can reveal novel insights into the dynamic organization of regulatory networks for cell growth control in metazoans.
Using quantitative mass spectrometry, this study reports how insulin affects the modularity of the interaction proteome of the Drosophila InR/TOR pathway, an evolutionary conserved signaling system for the control of metazoan cell growth. Systematic functional analysis linked a significant number of identified network components to the control of dTOR activity and revealed dTTT, a dTOR complex required for in vivo cell growth control by dTORC1 and dTORC2.
Genetic analysis in Drosophila melanogaster has been widely used to identify a system of genes that control cell growth in response to insulin and nutrients. Many of these genes encode components of the insulin receptor/target of rapamycin (InR/TOR) pathway. However, the biochemical context of this regulatory system is still poorly characterized in Drosophila. Here, we present the first quantitative study that systematically characterizes the modularity and hormone sensitivity of the interaction proteome underlying growth control by the dInR/TOR pathway. Applying quantitative affinity purification and mass spectrometry, we identified 97 high confidence protein interactions among 58 network components. In all, 22% of the detected interactions were regulated by insulin affecting membrane proximal as well as intracellular signaling complexes. Systematic functional analysis linked a subset of network components to the control of dTORC1 and dTORC2 activity. Furthermore, our data suggest the presence of three distinct dTOR kinase complexes, including the evolutionary conserved dTTT complex (Drosophila TOR, TELO2, TTI1). Subsequent genetic studies in flies suggest a role for dTTT in controlling cell growth via a dTORC1- and dTORC2-dependent mechanism.
PMCID: PMC3261712  PMID: 22068330
cell growth; InR/TOR pathway; interaction proteome; quantitative mass spectrometry; signaling
16.  Global Analysis of Protein Palmitoylation in African Trypanosomes 
Eukaryotic Cell  2011;10(3):455-463.
Many eukaryotic proteins are posttranslationally modified by the esterification of cysteine thiols to long-chain fatty acids. This modification, protein palmitoylation, is catalyzed by a large family of palmitoyl acyltransferases that share an Asp-His-His-Cys Cys-rich domain but differ in their subcellular localizations and substrate specificities. In Trypanosoma brucei, the flagellated protozoan parasite that causes African sleeping sickness, protein palmitoylation has been observed for a few proteins, but the extent and consequences of this modification are largely unknown. We undertook the present study to investigate T. brucei protein palmitoylation at both the enzyme and substrate levels. Treatment of parasites with an inhibitor of total protein palmitoylation caused potent growth inhibition, yet there was no effect on growth by the separate, selective inhibition of each of the 12 individual T. brucei palmitoyl acyltransferases. This suggested either that T. brucei evolved functional redundancy for the palmitoylation of essential palmitoyl proteins or that palmitoylation of some proteins is catalyzed by a noncanonical transferase. To identify the palmitoylated proteins in T. brucei, we performed acyl biotin exchange chemistry on parasite lysates, followed by streptavidin chromatography, two-dimensional liquid chromatography-tandem mass spectrometry protein identification, and QSpec statistical analysis. A total of 124 palmitoylated proteins were identified, with an estimated false discovery rate of 1.0%. This palmitoyl proteome includes all of the known palmitoyl proteins in procyclic-stage T. brucei as well as several proteins whose homologues are palmitoylated in other organisms. Their sequences demonstrate the variety of substrate motifs that support palmitoylation, and their identities illustrate the range of cellular processes affected by palmitoylation in these important pathogens.
PMCID: PMC3067466  PMID: 21193548
17.  MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines 
Journal of proteome research  2011;10(7):2949-2958.
Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for all possible PSMs and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for all detected proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses.
PMCID: PMC3128686  PMID: 21488652
integrative analysis; database search; peptide identification
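The abstract above does not spell out how false discovery rates follow from per-PSM probabilities. One common convention, assumed here and not taken from MSblender itself, is to rank PSMs by posterior probability and estimate the FDR of the accepted set as the mean of (1 - probability). A minimal sketch, with a hypothetical function name and made-up input values:

```python
def accept_psms_at_fdr(probabilities, target_fdr):
    """Return (accepted probabilities, estimated FDR) for the largest
    top-ranked set whose estimated FDR does not exceed target_fdr.

    Assumption: each value is the posterior probability that a PSM is
    correct, so (1 - p) is its expected contribution to the error count.
    """
    ranked = sorted(probabilities, reverse=True)
    error_sum = 0.0
    best_count = 0
    best_fdr = 0.0
    for count, p in enumerate(ranked, start=1):
        error_sum += 1.0 - p
        fdr = error_sum / count  # mean expected error among accepted PSMs
        if fdr <= target_fdr:
            best_count = count
            best_fdr = fdr
    return ranked[:best_count], best_fdr


# Illustrative values only, not data from the paper.
accepted, fdr = accept_psms_at_fdr([0.99, 0.5, 0.95, 0.2, 0.9], 0.06)
```

With these illustrative inputs the three highest-scoring PSMs are accepted; adding the fourth would push the estimated FDR well above the target.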
18.  SAINT: Probabilistic Scoring of Affinity Purification - Mass Spectrometry Data 
Nature methods  2010;8(1):70-73.
We present SAINT (Significance Analysis of INTeractome), a computational tool that assigns confidence scores to protein-protein interaction data generated using affinity-purification coupled to mass spectrometry (AP-MS). The method utilizes label-free quantitative data and constructs separate distributions for true and false interactions to derive the probability of a bona fide protein-protein interaction. We demonstrate that SAINT is applicable to data of different scales and protein connectivity and allows for the transparent analysis of AP-MS data.
PMCID: PMC3064265  PMID: 21131968
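The idea of scoring an interaction from separate distributions for true and false interactions can be illustrated with a two-component mixture and Bayes' rule. The sketch below is not the SAINT model: the Poisson form, the fixed rates, and the prior are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def poisson_pmf(count, rate):
    """Probability of observing `count` events under a Poisson(rate) model."""
    return math.exp(-rate) * rate ** count / math.factorial(count)


def interaction_probability(count, true_rate, false_rate, prior_true):
    """Posterior probability that a spectral count comes from a true interaction.

    Assumptions (illustrative only): counts for true and false interactions
    follow Poisson distributions with known rates, and prior_true is the
    prior fraction of true interactions.
    """
    true_part = prior_true * poisson_pmf(count, true_rate)
    false_part = (1.0 - prior_true) * poisson_pmf(count, false_rate)
    return true_part / (true_part + false_part)


# A prey seen with 10 spectra scores near 1; one seen with 0 spectra scores near 0.
high = interaction_probability(10, true_rate=10.0, false_rate=1.0, prior_true=0.1)
low = interaction_probability(0, true_rate=10.0, false_rate=1.0, prior_true=0.1)
```

In practice the distribution parameters would be estimated from the data rather than fixed in advance.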
19.  Prestroke Proteomic Changes in Cerebral Microvessels in Stroke-Prone, Transgenic[hCETP]-Hyperlipidemic, Dahl Salt-Sensitive Hypertensive Rats 
Molecular Medicine  2011;17(7-8):588-598.
Stroke is the third leading cause of death in the United States with high rates of morbidity among survivors. The search to fill the unequivocal need for new therapeutic approaches would benefit from unbiased proteomic analyses of animal models of spontaneous stroke in the prestroke stage. Since brain microvessels play key roles in neurovascular coupling, we investigated prestroke microvascular proteome changes. Proteomic analysis of cerebral cortical microvessels (cMVs) was done by tandem mass spectrometry comparing two prestroke time points. Metaprotein-pathway analyses of proteomic spectral count data were done to identify risk factor–induced changes, followed by QSPEC-analyses of individual protein changes associated with increased stroke susceptibility. We report 26 cMV proteome profiles from male and female stroke-prone and non–stroke-prone rats at 2 months and 4.5 months of age prior to overt stroke events. We identified 1,934 proteins by two or more peptides. Metaprotein pathway analysis detected age-associated changes in energy metabolism and cell-to-microenvironment interactions, as well as sex-specific changes in energy metabolism and endothelial leukocyte transmigration pathways. Stroke susceptibility was associated independently with multiple protein changes associated with ischemia, angiogenesis or involved in blood brain barrier (BBB) integrity. Immunohistochemical analysis confirmed aquaporin-4 and laminin-α1 induction in cMVs, representative of proteomic changes with >65 Bayes factor (BF), associated with stroke susceptibility. Altogether, proteomic analysis demonstrates significant molecular changes in ischemic cerebral microvasculature in the prestroke stage, which could contribute to the observed model phenotype of microhemorrhages and postischemic hemorrhagic transformation. These pathways comprise putative targets for translational research of much needed novel diagnostic and therapeutic approaches for stroke.
PMCID: PMC3146600  PMID: 21519634
20.  Metabolites of Purine Nucleoside Phosphorylase (NP) in Serum Have the Potential to Delineate Pancreatic Adenocarcinoma 
PLoS ONE  2011;6(3):e17177.
Pancreatic Adenocarcinoma (PDAC), the fourth highest cause of cancer-related deaths in the United States, has the most aggressive presentation, resulting in a very short median survival time for the affected patients. Early detection of PDAC is confounded by a lack of specific markers, which has motivated the use of high-throughput molecular approaches to delineate potential biomarkers. To pursue identification of a distinct marker, this study profiled the secretory proteome in 16 PDAC, 2 carcinoma in situ (CIS) and 7 benign patients using label-free mass spectrometry coupled to 1D-SDS-PAGE and Strong Cation-Exchange Chromatography (SCX). A total of 431 proteins were detected, of which 56 were found to be significantly elevated in PDAC. Included in this differential set were Parkinson disease autosomal recessive, early onset 7 (PARK 7) and Alpha Synuclein (aSyn), both of which are known to be pathognomonic to Parkinson's disease, as well as metabolic enzymes like Purine Nucleoside Phosphorylase (NP), which has been exploited as a therapeutic target in cancers. Tissue Microarray analysis confirmed higher expression of aSyn and NP in ductal epithelia of pancreatic tumors compared to benign ducts. Furthermore, the extent of both aSyn and NP staining positively correlated with tumor stage and perineural invasion, while their intensity of staining correlated with the existence of metastatic lesions in the PDAC tissues. From the biomarker perspective, NP protein levels were higher in PDAC sera, and serum levels of its downstream metabolites guanosine and adenosine were able to distinguish PDAC from benign in an unsupervised hierarchical classification model. Overall, this study describes for the first time elevated levels of aSyn in PDAC and highlights the potential of evaluating NP protein expression and levels of its downstream metabolites to develop a multiplex panel for non-invasive detection of PDAC.
PMCID: PMC3063153  PMID: 21448452
21.  A Double-Layered Mixture Model for the Joint Analysis of DNA Copy Number and Gene Expression Data 
Journal of Computational Biology  2010;17(2):121-137.
Copy number aberration is a common form of genomic instability in cancer. Gene expression is closely tied to cytogenetic events by the central dogma of molecular biology, and serves as a mediator of copy number changes in disease phenotypes. Accordingly, it is of interest to develop proper statistical methods for jointly analyzing copy number and gene expression data. This work describes a novel Bayesian inferential approach for a double-layered mixture model (DLMM) which directly models the stochastic nature of copy number data and identifies abnormally expressed genes due to aberrant copy number. Simulation studies were conducted to illustrate the robustness of DLMM under various settings of copy number aberration frequency, confounding effects, and signal-to-noise ratio in gene expression data. Analysis of a real breast cancer data set shows that DLMM is able to identify expression changes specifically attributable to copy number aberration in tumors and that a sample-specific index built based on the selected genes is correlated with relevant clinical information.
PMCID: PMC3148827  PMID: 20170400
cancer genomics; statistics
22.  Hierarchical hidden Markov model with application to joint analysis of ChIP-chip and ChIP-seq data 
Bioinformatics  2009;25(14):1715-1721.
Motivation: Chromatin immunoprecipitation (ChIP) experiments followed by array hybridization, or ChIP-chip, is a powerful approach for identifying transcription factor binding sites (TFBS) and has been widely used. Recently, massively parallel sequencing coupled with ChIP experiments (ChIP-seq) has been increasingly used as an alternative to ChIP-chip, offering cost-effective genome-wide coverage and resolution up to a single base pair. For many well-studied TFs, both ChIP-seq and ChIP-chip experiments have been applied and their data are publicly available. Previous analyses have revealed substantial technology-specific binding signals despite strong correlation between the two sets of results. Therefore, it is of interest to see whether the two data sources can be combined to enhance the detection of TFBS.
Results: In this work, a hierarchical hidden Markov model (HHMM) is proposed for combining data from ChIP-seq and ChIP-chip. In HHMM, inference results from individual HMMs in ChIP-seq and ChIP-chip experiments are summarized by a higher level HMM. Simulation studies show the advantage of HHMM when data from both technologies co-exist. Analysis of two well-studied TFs, NRSF and CCCTC-binding factor (CTCF), also suggests that HHMM yields improved TFBS identification in comparison to analyses using individual data sources or a simple merger of the two.
Availability: Source code for the software ChIPmeta is freely available for download at∼hwchoi/, implemented in C and supported on linux.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2732365  PMID: 19447789
23.  Analysis of protein complexes through model-based biclustering of label-free quantitative AP-MS data 
Affinity purification followed by mass spectrometry (AP-MS) has become a common approach for identifying protein–protein interactions (PPIs) and complexes. However, data analysis and visualization often rely on generic approaches that do not take advantage of the quantitative nature of AP-MS. We present a novel computational method, nested clustering, for biclustering of label-free quantitative AP-MS data. Our approach forms bait clusters based on the similarity of quantitative interaction profiles and identifies submatrices of prey proteins showing consistent quantitative association within bait clusters. In doing so, nested clustering effectively addresses the problem of overrepresentation of interactions involving bait proteins as compared with proteins only identified as preys. The method does not require specification of the number of bait clusters, which is an advantage over existing model-based clustering methods. We illustrate the performance of the algorithm using two published intermediate scale human PPI data sets, which are representative of the AP-MS data generated from mammalian cells. We also discuss general challenges of analyzing and interpreting clustering results in the context of AP-MS data.
PMCID: PMC2913403  PMID: 20571534
clustering; mass spectrometry; protein complexes; protein–protein interaction; spectral counts
24.  Global Associations between Copy Number and Transcript mRNA Microarray Data: An Empirical Study 
Cancer Informatics  2008;6:17-23.
With an increasing number of cancer profiling studies assaying both transcript mRNA and copy number expression levels, a natural question then involves the potential to combine information across the two types of genomic data. In this article, we perform a study to assess the nature of association between the two types of data across several experiments. We report on several interesting findings: 1) global correlation between gene expression and copy number is relatively weak but consistent across studies; 2) there is strong evidence for a cis-dosage effect of copy number on gene expression; 3) segmenting the copy number levels helps to improve correlations.
PMCID: PMC2623285  PMID: 19259399
circular binary segmentation; high-dimensional data; machine learning; two-color microarray platform
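The per-gene association between copy number and expression that the abstract refers to can be illustrated with a Pearson correlation computed across samples. This is a minimal sketch under assumptions: the gene names, values, and dictionary layout are hypothetical, and the paper's actual analysis (including segmentation of copy number levels) is not reproduced here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def per_gene_correlation(copy_number, expression):
    """Correlate copy number with expression for each gene present in both inputs.

    Both arguments map a gene name to a list of values, one per sample,
    with samples in the same order.
    """
    return {
        gene: pearson(copy_number[gene], expression[gene])
        for gene in copy_number
        if gene in expression
    }


# Made-up values for two hypothetical genes across four samples.
copy_number = {"GENE_A": [1, 2, 3, 4], "GENE_B": [2, 2, 3, 1]}
expression = {"GENE_A": [2, 4, 6, 8], "GENE_B": [5, 1, 2, 4]}
correlations = per_gene_correlation(copy_number, expression)
```

A gene whose expression rises in step with its own copy number, as GENE_A does here, is the kind of pattern a cis-dosage effect would produce.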
25.  A Latent Variable Approach for Meta-Analysis of Gene Expression Data from Multiple Microarray Experiments 
BMC Bioinformatics  2007;8:364.
With the explosion in data generated using microarray technology by different investigators working on similar experiments, it is of interest to combine results across multiple studies.
In this article, we describe a general probabilistic framework for combining high-throughput genomic data from several related microarray experiments using mixture models. A key feature of the model is the use of latent variables that represent quantities that can be combined across diverse platforms. We consider two methods for estimation of an index termed the probability of expression (POE). The first, reported in previous work by the authors, involves Markov Chain Monte Carlo (MCMC) techniques. The second method is a faster algorithm based on the expectation-maximization (EM) algorithm. The methods are illustrated with application to a meta-analysis of datasets for metastatic cancer.
The statistical methods described in the paper are available as an R package, metaArray 1.8.1, from Bioconductor.
PMCID: PMC2246152  PMID: 17900369
