Related Articles

1.  Computational Biomarker Pipeline from Discovery to Clinical Implementation: Plasma Proteomic Biomarkers for Cardiac Transplantation 
PLoS Computational Biology  2013;9(4):e1002963.
Recent technical advances in the field of quantitative proteomics have stimulated a large number of biomarker discovery studies of various diseases, providing avenues for new treatments and diagnostics. However, inherent challenges have limited the successful translation of candidate biomarkers into clinical use, thus highlighting the need for a robust analytical methodology to transition from biomarker discovery to clinical implementation. We have developed an end-to-end computational proteomic pipeline for biomarker studies. At the discovery stage, the pipeline emphasizes different aspects of experimental design, appropriate statistical methodologies, and quality assessment of results. At the validation stage, the pipeline focuses on the migration of the results to a platform appropriate for external validation, and the development of a classifier score based on corroborated protein biomarkers. At the last stage towards clinical implementation, the main aims are to develop and validate an assay suitable for clinical deployment, and to calibrate the biomarker classifier using the developed assay. The proposed pipeline was applied to a biomarker study in cardiac transplantation aimed at developing a minimally invasive clinical test to monitor acute rejection. Starting with an untargeted screening of the human plasma proteome, five candidate biomarker proteins were identified. Rejection-regulated proteins reflect cellular and humoral immune responses, acute phase inflammatory pathways, and lipid metabolism biological processes. A multiplex multiple reaction monitoring mass-spectrometry (MRM-MS) assay was developed for the five candidate biomarkers and validated by enzyme-linked immunosorbent assays (ELISA) and immunonephelometric assays (INA). A classifier score based on corroborated proteins demonstrated that the developed MRM-MS assay provides an appropriate methodology for an external validation, which is still in progress.
Plasma proteomic biomarkers of acute cardiac rejection may offer a relevant post-transplant monitoring tool to effectively guide clinical care. The proposed computational pipeline is highly applicable to a wide range of biomarker proteomic studies.
Author Summary
Novel proteomic technology has led to the generation of vast amounts of biological data and the identification of numerous potential biomarkers. However, computational approaches to translate this information into knowledge capable of impacting clinical care have been lagging. We propose a computational proteomic pipeline for biomarker studies that is founded on the combination of advanced statistical methodologies. We demonstrate our approach through the analysis of data obtained from heart transplant patients. Heart transplantation is the gold standard treatment for patients with end-stage heart failure, but is complicated by episodes of immune rejection that can adversely impact patient outcomes. Current rejection monitoring approaches are highly invasive, requiring a biopsy of the heart. This work aims to reduce the need for biopsies, and demonstrate the power and utility of computational approaches in proteomic biomarker discovery. Our work utilizes novel high-throughput proteomic technology combined with advanced statistical techniques to identify blood markers that guide the decision as to whether a biopsy is warranted, reduce the number of unnecessary biopsies, and ultimately diagnose the presence of rejection in heart transplant patients. Additionally, the proposed computational methodologies can be applied to a range of proteomic biomarker studies of various diseases and conditions.
PMCID: PMC3617196  PMID: 23592955
2.  EP6 Quantitative Proteomics 
There are numerous approaches to study the proteome in a quantitative manner. All rely heavily on optimized sample preparation and appropriate statistical analysis of resulting datasets. This session will cover the following aspects of quantitative proteomics approaches:
Quantitative profiling of the membrane proteome requires special considerations not addressed in typical mass-spectrometry analyses. Optimized sample preparation and separation strategies will be discussed in the context of enriched membrane fractions and a quantitative proteomics platform using stable isotopes.
In shotgun proteomics, a complex protein mixture is first digested to peptides, which are then analyzed by a combination of nanoflow chromatography and tandem mass spectrometry. The effects of subtle changes in sample preparation and chromatographic conditions in the characterization of complex mixtures will be presented. A discovery-based mass spectrometry approach using a bench-top LTQ linear ion trap and in-house written software for label-free differential protein profiling will be presented. This approach is quite comprehensive and is compatible with even the most inexpensive mass spectrometers. For proteins not detected routinely using our discovery-based approaches, we have applied selected reaction monitoring using a TSQ Quantum Ultra. This approach has been used to identify and quantify proteins at the low ng/mL level in plasma without any prior fractionation. A software pipeline has been developed to go from hypothesized proteins of interest derived from the literature to predicted hSRM transitions, collision offsets, and predicted chromatographic retention times. The combination of both discovery- and hypothesis-driven proteomics using nanoflow separations and tandem mass spectrometry provides us with unparalleled sensitivity and dynamic range in characterizing complex mixtures.
Spectrum counting is an appealing and relatively straightforward approach for quantitative proteomics. Since the spectrum count of a protein in a proteomic analysis is the total number of peptides, not just unique peptides, detected and identified for a given protein, searching criteria and false-positive minimization are important.
There are several different versions of spectral counting currently in use, but each approach has shared core characteristics. An additional important consideration for quantitative proteomic analysis is the use of replicates for statistical analysis and determining the proper statistical test to use based on the overall structure of the datasets. This presentation will describe the foundation of spectral counting and the modifications to this approach used by different researchers. In addition, selected examples of the biological implementation of these approaches will be described.
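As a minimal sketch of one spectral-counting variant, the snippet below divides each protein's spectral count by its length and then normalizes by the run total. This length-normalized form is only one common choice, not necessarily the version covered in the presentation, and the protein names, counts, and lengths are hypothetical.

```python
def normalized_spectral_abundance(spectral_counts, lengths):
    """Return a length-normalized spectral abundance factor per protein.

    spectral_counts: dict protein -> total identified spectra
                     (all peptides, not just unique ones).
    lengths:         dict protein -> protein length in residues.
    """
    # Counts per residue correct for longer proteins yielding more peptides.
    per_residue = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(per_residue.values())
    if total == 0:
        raise ValueError("no spectra were counted")
    # Dividing by the run total makes values comparable between runs.
    return {p: value / total for p, value in per_residue.items()}


# Hypothetical example data, for illustration only.
counts = {"protA": 40, "protB": 10, "protC": 5}
lengths = {"protA": 400, "protB": 100, "protC": 250}
nsaf = normalized_spectral_abundance(counts, lengths)
```

Here protA and protB receive the same value despite a fourfold difference in raw counts, because protA is four times longer. Replicate runs would still be needed before applying any statistical test.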
PMCID: PMC2292016
3.  Coherent pipeline for biomarker discovery using mass spectrometry and bioinformatics 
BMC Bioinformatics  2010;11:437.
Robust biomarkers are needed to improve microbial identification and diagnostics. Proteomics methods based on mass spectrometry can be used for the discovery of novel biomarkers through their high sensitivity and specificity. However, there has been a lack of a coherent pipeline connecting biomarker discovery with established approaches for evaluation and validation. We propose such a pipeline that uses in silico methods for refined biomarker discovery and confirmation.
The pipeline has four main stages: Sample preparation, mass spectrometry analysis, database searching and biomarker validation. Using the pathogen Clostridium botulinum as a model, we show that the robustness of candidate biomarkers increases with each stage of the pipeline. This is enhanced by the concordance shown between various database search algorithms for peptide identification. Further validation was done by focusing on the peptides that are unique to C. botulinum strains and absent in phylogenetically related Clostridium species. From a list of 143 peptides, 8 candidate biomarkers were reliably identified as conserved across C. botulinum strains. To avoid discarding other unique peptides, a confidence scale has been implemented in the pipeline giving priority to unique peptides that are identified by a union of algorithms.
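The uniqueness filter and confidence scale described above can be sketched roughly as follows. The sketch assumes that confidence is simply the number of search algorithms that identified a peptide, which is an illustrative reading, not the paper's actual scale; the algorithm names and peptide sequences are hypothetical.

```python
def rank_unique_peptides(target_hits, related_peptides):
    """Rank peptides unique to the target organism by algorithm support.

    target_hits:      dict algorithm name -> set of peptides identified
                      in the target strains by that algorithm.
    related_peptides: set of peptides found in related species.
    Returns a list of (peptide, number_of_supporting_algorithms),
    best supported first.
    """
    support = {}
    for algorithm, peptides in target_hits.items():
        for peptide in peptides:
            # Discard peptides that also occur in related species.
            if peptide not in related_peptides:
                support.setdefault(peptide, set()).add(algorithm)
    scored = [(pep, len(algs)) for pep, algs in support.items()]
    # Sort by support (descending), then alphabetically for stable output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical identifications, for illustration only.
hits = {
    "algo1": {"PEPA", "PEPB", "PEPC"},
    "algo2": {"PEPA", "PEPC"},
    "algo3": {"PEPA", "PEPD"},
}
related = {"PEPC"}
ranking = rank_unique_peptides(hits, related)
```

Peptides found by only one algorithm stay in the ranking with a lower score instead of being discarded, which mirrors the stated aim of not losing other unique peptides.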
This study demonstrates that implementing a coherent pipeline which includes intensive bioinformatics validation steps is vital for discovery of robust biomarkers. It also emphasises the importance of proteomics based methods in biomarker discovery.
PMCID: PMC2939613  PMID: 20796299
4.  Gel-Based and Gel-Free Quantitative Proteomics Approaches at a Glance 
Two-dimensional gel electrophoresis (2-DE) is widely applied and remains the method of choice in proteomics; however, pervasive 2-DE-related concerns undermine its prospects as a dominant separation technique in proteome research. Consequently, the state-of-the-art shotgun techniques are slowly taking over and utilising the rapid expansion and advancement of mass spectrometry (MS) to provide a new toolbox of gel-free quantitative techniques. When coupled to MS, the shotgun proteomic pipeline can fuel new routes in sensitive and high-throughput profiling of proteins, leading to a high accuracy in quantification. Although label-based approaches, either chemical or metabolic, gained popularity in quantitative proteomics because of their multiplexing capacity, these approaches are not without drawbacks. The burgeoning label-free methods are tag independent and suitable for all kinds of samples. The challenges in quantitative proteomics are more prominent in plants due to difficulties in protein extraction, the high abundance of some proteins in green tissue, and the absence of well-annotated and completed genome sequences. The goal of this perspective essay is to present the balance between the strengths and weaknesses of the available gel-based and gel-free methods and their application to plants. The latest trends in peptide fractionation amenable to MS analysis are also discussed.
PMCID: PMC3508552  PMID: 23213324
5.  SP9 Adapting Shotgun Proteomics Platforms for Clinical Proteomics 
Analysis of tissues from cancers, precancers, and normal tissues provides a means to identify candidate markers for disease detection. The only proteomic technology platform capable of large-scale inventory and identification of serum proteins is shotgun proteomics, in which proteins are first digested to peptides and then the peptides are subjected to analysis by multidimensional liquid chromatography-tandem MS (LC-MS-MS). However, current implementations of shotgun proteome analyses are limited in both sample throughput and reproducibility in identification and detection, particularly for lower-abundance proteins. Here, we describe efforts to refine, standardize, and implement shotgun proteomics platforms for application to high-throughput analysis of clinical tissue specimens. Our guiding principles in developing and standardizing shotgun proteome analysis platforms, in order of decreasing priority, are to (1) achieve sufficient reproducibility to allow single analyses to replace multiple replicates, (2) reduce the amount of MS instrument time required for analysis, thus increasing throughput, and (3) achieve the greatest sensitivity and depth of coverage possible, with the ultimate goal of equaling or exceeding the performance of lower-throughput shotgun proteome analyses in current use. Refinement of the multidimensional LC-MS-MS platform is focused on (1) improving the reproducibility and standardization of peptide separations by replacing strong cation exchange separations with isoelectric focusing on immobilized pH gradient strips; (2) employing new methods to acquire MS-MS spectra in LC-MS-MS analyses using hybrid LTQ-Orbitrap instruments; (3) applying new data-analysis algorithms and software to identify peptides and proteins from MS-MS data and to quantify with label-free methods. 
A major challenge is the statistical comparison of multiple complex datasets derived by shotgun analyses to identify tissue-specific proteomic characteristics that can be selected as candidate markers.
PMCID: PMC2291836
6.  ATAQS: A computational software tool for high throughput transition optimization and validation for selected reaction monitoring mass spectrometry 
BMC Bioinformatics  2011;12:78.
Since its inception, proteomics has essentially operated in a discovery mode with the goal of identifying and quantifying the maximal number of proteins in a sample. Increasingly, proteomic measurements are also supporting hypothesis-driven studies, in which a predetermined set of proteins is consistently detected and quantified in multiple samples. Selected reaction monitoring (SRM) is a targeted mass spectrometric technique that supports the detection and quantification of specific proteins in complex samples at high sensitivity and reproducibility. Here, we describe ATAQS, an integrated software platform that supports all stages of targeted, SRM-based proteomics experiments including target selection, transition optimization and post acquisition data analysis. This software will significantly facilitate the use of targeted proteomic techniques and contribute to the generation of highly sensitive, reproducible and complete datasets that are particularly critical for the discovery and validation of targets in hypothesis-driven studies in systems biology.
We introduce a new open source software pipeline, ATAQS (Automated and Targeted Analysis with Quantitative SRM), which consists of a number of modules that collectively support the SRM assay development workflow for targeted proteomic experiments (project management and generation of protein, peptide and transitions and the validation of peptide detection by SRM). ATAQS provides a flexible pipeline for end-users by allowing the workflow to start or end at any point of the pipeline, and for computational biologists, by enabling the easy extension of java algorithm classes for their own algorithm plug-in or connection via an external web site.
This integrated system supports all steps in a SRM-based experiment and provides a user-friendly GUI that can be run by any operating system that allows the installation of the Mozilla Firefox web browser.
Targeted proteomics via SRM is a powerful new technique that enables the reproducible and accurate identification and quantification of sets of proteins of interest. ATAQS is the first open-source software that supports all steps of the targeted proteomics workflow. ATAQS also provides software API (Application Program Interface) documentation that enables the addition of new algorithms to each of the workflow steps. The software, installation guide and sample dataset can be found in
PMCID: PMC3213215  PMID: 21414234
7.  A pipeline that integrates the discovery and verification of plasma protein biomarkers reveals candidate markers for cardiovascular disease 
Nature Biotechnology  2011;29(7):635-643.
We developed a pipeline to integrate the proteomic technologies used from the discovery to the verification stages of plasma biomarker identification and applied it to identify early biomarkers of cardiac injury from the blood of patients undergoing a therapeutic, planned myocardial infarction (PMI) for treatment of hypertrophic cardiomyopathy. Sampling of blood directly from patient hearts before, during and after controlled myocardial injury ensured enrichment for candidate biomarkers and allowed patients to serve as their own biological controls. LC-MS/MS analyses detected 121 highly differentially expressed proteins, including previously credentialed markers of cardiovascular disease and >100 novel candidate biomarkers for myocardial infarction (MI). Accurate inclusion mass screening (AIMS) qualified a subset of the candidates based on highly specific, targeted detection in peripheral plasma, including some markers unlikely to have been identified without this step. Analyses of peripheral plasma from controls and patients with PMI or spontaneous MI by quantitative multiple reaction monitoring mass spectrometry or immunoassays suggest that the candidate biomarkers may be specific to MI. This study demonstrates that modern proteomic technologies, when coherently integrated, can yield novel cardiovascular biomarkers meriting further evaluation in large, heterogeneous cohorts.
PMCID: PMC3366591  PMID: 21685905
Electrophoresis  2009;30(23):4063-4070.
A compelling need exists for the development of technologies that facilitate and accelerate the discovery of novel protein biomarkers with therapeutic and diagnostic potential. Comparisons among shotgun proteome technologies, including capillary isotachophoresis (CITP)-based multidimensional separations and multidimensional liquid chromatography system, are therefore performed in this study regarding their abilities to address the challenges of protein complexity and relative abundance inherent in glioblastoma multiforme derived cancer stem cells. Comparisons are conducted using a single processed protein digest with equal sample loading, identical second dimension separation (reversed phase liquid chromatography) and mass spectrometry conditions, and consistent search parameters and cutoff established by the target-decoy determined false discovery rate.
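As a rough illustration of a target-decoy cutoff, the sketch below ranks peptide-spectrum matches by score and estimates the false discovery rate as decoys divided by targets among the accepted matches. That estimator, the score scale, and the example data are assumptions for illustration; the study does not specify them here.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Find the lowest score cutoff whose accepted set meets max_fdr.

    psms: list of (score, is_decoy) tuples; a higher score is better.
    Returns the cutoff score, or None if no cutoff satisfies max_fdr.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything accepted down to this score.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical matches, for illustration only.
psms = [(9.0, False), (8.5, False), (8.0, False), (7.0, False),
        (6.0, True), (5.5, False), (5.0, True)]
cutoff = score_cutoff_at_fdr(psms, max_fdr=0.1)
```

Applying the same cutoff rule to every platform is what makes the identification counts comparable across the separations being tested.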
Besides achieving superior overall proteome performance in total peptide, distinct peptide, and distinct protein identifications, analytical reproducibility of the CITP proteome platform coupled with the spectral counting approach is determined by a Pearson R² value of 0.98 and a coefficient of variation of 15% across all proteins quantified. In contrast, extensive fraction overlapping in strong cation exchange greatly limits the ability of multidimensional liquid chromatography separations for mining deeper into the tissue proteome as evidenced by the poor coverage in various protein functional categories and key protein pathways. The CITP proteomic technology, equipped with selective analyte enrichment and ultrahigh resolving power, is expected to serve as a critical component in the overall toolset required for biomarker discovery via shotgun proteomic analysis of tissue specimens.
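The two reproducibility figures quoted above can be computed along the following lines. This is a minimal sketch assuming per-protein spectral counts from replicate runs; how the study aggregated the coefficient of variation across proteins is not stated, and the example counts are hypothetical.

```python
from statistics import mean, stdev


def pearson_r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov * cov / (var_x * var_y)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Hypothetical spectral counts for four proteins in two replicate runs.
run1 = [10, 20, 30, 40]
run2 = [12, 19, 33, 41]
r2 = pearson_r_squared(run1, run2)

# Hypothetical counts for one protein across three replicates.
cv = coefficient_of_variation([9, 10, 11])
```

An R² close to 1 indicates that replicate runs rank and scale proteins consistently, while the coefficient of variation expresses the run-to-run spread for a protein relative to its mean count.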
PMCID: PMC3465977  PMID: 19960471
Biomarker; Capillary Electrophoresis; Mass Spectrometry; Strong Cation Exchange Chromatography; Tissue Proteomics
9.  Structural and functional protein network analyses predict novel signaling functions for rhodopsin 
Proteomic analyses, literature mining, and structural data were combined to generate an extensive signaling network linked to the visual G protein-coupled receptor rhodopsin. Network analysis suggests novel signaling routes to cytoskeleton dynamics and vesicular trafficking.
Using a shotgun proteomic approach, we identified the protein inventory of the light sensing outer segment of the mammalian photoreceptor.
These data, combined with literature mining, structural modeling, and computational analysis, offer a comprehensive view of signal transduction downstream of the visual G protein-coupled receptor rhodopsin.
The network suggests novel signaling branches downstream of rhodopsin to cytoskeleton dynamics and vesicular trafficking.
The network serves as a basis for elucidating physiological principles of photoreceptor function and suggests potential disease-associated proteins.
Photoreceptor cells are neurons capable of converting light into electrical signals. The rod outer segment (ROS) region of the photoreceptor cells is a cellular structure made of a stack of around 800 closed membrane disks loaded with rhodopsin (Liang et al, 2003; Nickell et al, 2007). In disc membranes, rhodopsin arranges itself into paracrystalline dimer arrays, enabling optimal association with the heterotrimeric G protein transducin as well as additional regulatory components (Ciarkowski et al, 2005). Disruption of these highly regulated structures and processes by germline mutations is the cause of severe blinding diseases such as retinitis pigmentosa, macular degeneration, or congenital stationary night blindness (Berger et al, 2010).
Traditionally, signal transduction networks have been studied by combining biochemical and genetic experiments addressing the relations among a small number of components. More recently, large throughput experiments using different techniques like two hybrid or co-immunoprecipitation coupled to mass spectrometry have added a new level of complexity (Ito et al, 2001; Gavin et al, 2002, 2006; Ho et al, 2002; Rual et al, 2005; Stelzl et al, 2005). However, in these studies, space, time, and the fact that many interactions detected for a particular protein are not compatible, are not taken into consideration. Structural information can help discriminate between direct and indirect interactions and more importantly it can determine if two or more predicted partners of any given protein or complex can simultaneously bind a target or rather compete for the same interaction surface (Kim et al, 2006).
In this work, we build a functional and dynamic interaction network centered on rhodopsin on a systems level, using six steps: In step 1, we experimentally identified the proteomic inventory of the porcine ROS, and we compared our data set with a recent proteomic study from bovine ROS (Kwok et al, 2008). The union of the two data sets was defined as the ‘initial experimental ROS proteome’. After removal of contaminants and applying filtering methods, a ‘core ROS proteome’, consisting of 355 proteins, was defined.
In step 2, proteins of the core ROS proteome were assigned to six functional modules: (1) vision, signaling, transporters, and channels; (2) outer segment structure and morphogenesis; (3) housekeeping; (4) cytoskeleton and polarity; (5) vesicles formation and trafficking, and (6) metabolism.
In step 3, a protein-protein interaction network was constructed based on literature mining. Because the experimental evidence for most interactions came from co-immunoprecipitation or pull-down experiments, and because many of the edges in the network are supported by a single piece of experimental evidence, often derived from high-throughput approaches, we refer to this network as the ‘fuzzy ROS interactome’. Structural information was used to predict binary interactions, based on the finding that similar domain pairs are likely to interact in a similar way (‘nature repeats itself’) (Aloy and Russell, 2002). To increase confidence in the resulting network, edges supported by a single piece of evidence not coming from yeast two-hybrid experiments were removed, except for interactions where the evidence was the existence of a three-dimensional structure of the complex itself, or of a highly homologous complex. This curated static network (‘high-confidence ROS interactome’) comprises 660 edges linking the majority of the nodes. By considering only edges supported by at least one piece of evidence of direct binary interaction, we end up with a ‘high-confidence binary ROS interactome’. We next extended the published core pathway (Dell'Orco et al, 2009) using evidence from our high-confidence network. We find several new direct binary links to different cellular functional processes (Figure 4): the active rhodopsin interacts with Rac1 and the GTP form of Rho. There is also a connection between active rhodopsin and Arf4, as well as PDEδ with Rab13 and the GTP-bound form of Arl3 that links the vision cycle to vesicle trafficking and structure. We see a connection between PDEδ with prenyl-modified proteins, such as several small GTPases, as well as with rhodopsin kinase. Further, our network reveals several direct binary connections between Ca²⁺-regulated proteins and cytoskeleton proteins; these are CaMK2A with actinin, calmodulin with GAP43 and S100B, and PKC with 14-3-3 family members.
In step 4, part of the network was experimentally validated using three different approaches to identify physical protein associations that would occur under physiological conditions: (i) co-segregation/co-sedimentation experiments, (ii) immunoprecipitations combined with mass spectrometry and/or subsequent immunoblotting, and (iii) utilizing the glycosylated N-terminus of rhodopsin to isolate its associated protein partners by Concanavalin A affinity purification. In total, 60 co-purification and co-elution experiments supported interactions that were already in our literature network, and new evidence from 175 co-IP experiments in this work was added. Next, we aimed to provide additional independent experimental confirmation for two of the novel networks and functional links proposed based on the network analysis: (i) the proposed complex between Rac1, RhoA, CRMP-2, tubulin, and ROCK II in ROS was investigated by culturing retinal explants in the presence of a ROCK II-specific inhibitor (Figure 6). While morphology of the retinas treated with ROCK II inhibitor appeared normal, immunohistochemistry analyses revealed several alterations on the protein level. (ii) We supported the hypothesis that PDEδ could function as a GDI for Rac1 in ROS, by demonstrating that PDEδ and Rac1 co-localize in ROS and that PDEδ could dissociate Rac1 from ROS membranes in vitro.
In step 5, we use structural information to distinguish between mutually compatible (‘AND’) or excluded (‘XOR’) interactions. This enables breaking a network of nodes and edges into functional machines or sub-networks/modules. In the vision branch, both ‘AND’ and ‘XOR’ gates synergize. This may allow dynamic tuning of light and dark states. However, all connections from the vision module to other modules are ‘XOR’ connections suggesting that competition, in connection with local protein concentration changes, could be important for transmitting signals from the core vision module.
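The compatible-versus-exclusive distinction can be illustrated with a simplified sketch that assumes each partner's binding surface on a hub protein is available as a set of residue positions. Two partners are labeled ‘XOR’ if their surfaces overlap and ‘AND’ otherwise. The actual analysis relies on three-dimensional structural data; the partner names and residue numbers here are hypothetical.

```python
from itertools import combinations


def classify_partner_pairs(binding_sites):
    """Label each pair of partners of one hub protein as 'AND' or 'XOR'.

    binding_sites: dict partner name -> set of hub residue positions
                   contacted by that partner.
    """
    gates = {}
    for a, b in combinations(sorted(binding_sites), 2):
        # Shared residues mean the partners compete for the same surface.
        overlap = binding_sites[a] & binding_sites[b]
        gates[(a, b)] = "XOR" if overlap else "AND"
    return gates


# Hypothetical binding surfaces on a hub protein, for illustration only.
sites = {
    "partnerA": {10, 11, 12, 13},
    "partnerB": {12, 13, 14},
    "partnerC": {200, 201, 202},
}
gates = classify_partner_pairs(sites)
```

In this toy example partnerA and partnerB compete for the hub, while partnerC can bind at the same time as either of them.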
In the last step, we map and functionally characterize the known mutations that produce blindness.
In summary, this represents the first comprehensive, dynamic, and integrative rhodopsin signaling network, which can be the basis for integrating and mapping newly discovered disease mutants, to guide protein or signaling branch-specific therapies.
Orchestration of signaling, photoreceptor structural integrity, and maintenance needed for mammalian vision remain enigmatic. By integrating three proteomic data sets, literature mining, computational analyses, and structural information, we have generated a multiscale signal transduction network linked to the visual G protein-coupled receptor (GPCR) rhodopsin, the major protein component of rod outer segments. This network was complemented by domain decomposition of protein–protein interactions and then qualified for mutually exclusive or mutually compatible interactions and ternary complex formation using structural data. The resulting information not only offers a comprehensive view of signal transduction induced by this GPCR but also suggests novel signaling routes to cytoskeleton dynamics and vesicular trafficking, predicting an important level of regulation through small GTPases. Further, it demonstrates a specific disease susceptibility of the core visual pathway due to the uniqueness of its components present mainly in the eye. As a comprehensive multiscale network, it can serve as a basis to elucidate the physiological principles of photoreceptor function, identify potential disease-associated genes and proteins, and guide the development of therapies that target specific branches of the signaling pathway.
PMCID: PMC3261702  PMID: 22108793
protein interaction network; rhodopsin signaling; structural modeling
10.  A complete mass spectrometric map for the analysis of the yeast proteome and its application to quantitative trait analysis 
Nature  2013;494(7436):266-270.
Complete reference maps or datasets, like the genomic map of an organism, are highly beneficial tools for biological and biomedical research. Attempts to generate such reference datasets for a proteome so far failed to reach complete proteome coverage, with saturation apparent at approximately two thirds of the proteomes tested, even for the most thoroughly characterized proteomes. Here, we used a strategy based on high-throughput peptide synthesis and mass spectrometry to generate a close to complete reference map (97% of the genome-predicted proteins) of the S. cerevisiae proteome. We generated two versions of this mass spectrometric map one supporting discovery- (shotgun) and the other hypothesis-driven (targeted) proteomic measurements. The two versions of the map, therefore, constitute a complete set of proteomic assays to support most studies performed with contemporary proteomic technologies. The reference libraries can be browsed via a web-based repository and associated navigation tools. To demonstrate the utility of the reference libraries we applied them to a protein quantitative trait locus (pQTL) analysis, which requires measurement of the same peptides over a large number of samples with high precision. Protein measurements over a set of 78 S. cerevisiae strains revealed a complex relationship between independent genetic loci, impacting on the levels of related proteins. Our results suggest that selective pressure favors the acquisition of sets of polymorphisms that maintain the stoichiometry of protein complexes and pathways.
PMCID: PMC3951219  PMID: 23334424
S. cerevisiae; selected reaction monitoring; SRM; MRM; spectral library; peptide library; mass spectrometric map; protein QTL
11.  A Semiautomated Framework for Integrating Expert Knowledge into Disease Marker Identification 
Disease Markers  2013;35(5):513-523.
Background. The availability of large complex data sets generated by high throughput technologies has enabled the recent proliferation of disease biomarker studies. However, a recurring problem in deriving biological information from large data sets is how to best incorporate expert knowledge into the biomarker selection process. Objective. To develop a generalizable framework that can incorporate expert knowledge into data-driven processes in a semiautomated way while providing a metric for optimization in a biomarker selection scheme. Methods. The framework was implemented as a pipeline consisting of five components for the identification of signatures from integrated clustering (ISIC). Expert knowledge was integrated into the biomarker identification process using the combination of two distinct approaches; a distance-based clustering approach and an expert knowledge-driven functional selection. Results. The utility of the developed framework ISIC was demonstrated on proteomics data from a study of chronic obstructive pulmonary disease (COPD). Biomarker candidates were identified in a mouse model using ISIC and validated in a study of a human cohort. Conclusions. Expert knowledge can be introduced into a biomarker discovery process in different ways to enhance the robustness of selected marker candidates. Developing strategies for extracting orthogonal and robust features from large data sets increases the chances of success in biomarker identification.
PMCID: PMC3809975  PMID: 24223463
12.  Absolute quantification of microbial proteomes at different states by directed mass spectrometry 
The developed, directed mass spectrometry workflow allows the generation of consistent and system-wide quantitative maps of microbial proteomes in a single analysis. Application to the human pathogen L. interrogans revealed mechanistic proteome changes over time involved in pathogenic progression and antibiotic defense, and new insights about the regulation of absolute protein abundances within operons.
The developed, directed proteomic approach allowed consistent detection and absolute quantification of 1680 proteins of the human pathogen L. interrogans in a single LC–MS/MS experiment.
The comparison of 25 extensive, consistent and quantitative proteome maps revealed new insights about the proteome changes involved in pathogenic progression and antibiotic defense of L. interrogans, and about the regulation of protein abundances within operons.
The generated time-resolved data sets are compatible with pattern analysis algorithms developed for transcriptomics, including hierarchical clustering and functional enrichment analysis of the detected profile clusters.
This is the first study that describes the absolute quantitative behavior of any proteome over multiple states and represents the most comprehensive proteome abundance pattern comparison for any organism to date.
Over the last decade, mass spectrometry (MS)-based proteomics has evolved as the method of choice for system-wide proteome studies and now allows for the characterization of several thousands of proteins in a single sample. Despite these great advances, redundant monitoring of protein levels over large sample numbers in a high-throughput manner remains a challenging task. New directed MS strategies have shown to overcome some of the current limitations, thereby enabling the acquisition of consistent and system-wide data sets of proteomes with low-to-moderate complexity at high throughput.
In this study, we applied this integrated, two-stage MS strategy to investigate global proteome changes in the human pathogen L. interrogans. In the initial discovery phase, 1680 proteins (out of around 3600 gene products) could be identified (Schmidt et al, 2008) and, by focusing precious MS-sequencing time on the most dominant, specific peptides per protein, all proteins could be accurately and consistently monitored over 25 different samples within a few days of instrument time in the following scoring phase (Figure 1). Additionally, the co-analysis of heavy reference peptides enabled us to obtain absolute protein concentration estimates for all identified proteins in each perturbation (Malmström et al, 2009). The detected proteins did not show any biases against functional groups or protein classes, including membrane proteins, and span an abundance range of more than three orders of magnitude, a range that is expected to cover most of the L. interrogans proteome (Malmström et al, 2009).
To elucidate mechanistic proteome changes over time involved in pathogenic progression and antibiotic defense of L. interrogans, we generated time-resolved proteome maps of cells perturbed with serum and three different antibiotics at sublethal concentrations that are currently used to treat leptospirosis. This yielded an information-rich proteomic data set that describes, for the first time, the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date. Using this unique property of the data set, we could quantify protein components of entire pathways across several time points and subject the data sets to cluster analysis, a tool that was previously limited to the transcript level due to incomplete sampling at the protein level (Figure 4). Based on these analyses, we could demonstrate that Leptospira cells adjust the cellular abundance of a certain subset of proteins and pathways as a general response to stress while other parts of the proteome respond in a highly specific manner. The cells furthermore react to individual treatments by ‘fine tuning’ the abundance of certain proteins and pathways in order to cope with the specific cause of stress. Intriguingly, the most specific and significant expression changes were observed for proteins involved in motility, tissue penetration and virulence after serum treatment, where we tried to simulate the host environment. While many of the detected protein changes demonstrate good agreement with available transcriptomics data, most proteins showed a poor correlation. This includes potential virulence factors, like Loa22 or OmpL1, with confirmed expression in vivo that were significantly up-regulated on the protein level, but not on the mRNA level, strengthening the importance of proteomic studies.
The high resolution and coverage of the proteome data set enabled us to further investigate protein abundance changes of co-regulated genes within operons. This suggests that although most proteins within an operon respond to regulation synchronously, bacterial cells seem to have subtle means to adjust the levels of individual proteins or protein groups outside of the general trend, a phenomenon that was recently also observed on the transcript level of other bacteria (Güell et al, 2009).
The method can be implemented with standard high-resolution mass spectrometers and software tools that are readily available in the majority of proteomics laboratories. It is scalable to any proteome of low-to-medium complexity and can be extended to post-translational modifications or peptide-labeling strategies for quantification. We therefore expect the approach outlined here to become a cornerstone for microbial systems biology.
Over the past decade, liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS) has evolved into the main proteome discovery technology. Up to several thousand proteins can now be reliably identified from a sample and the relative abundance of the identified proteins can be determined across samples. However, the remeasurement of substantially similar proteomes, for example those generated by perturbation experiments in systems biology, at high reproducibility and throughput remains challenging. Here, we apply a directed MS strategy to detect and quantify sets of pre-determined peptides in tryptic digests of cells of the human pathogen Leptospira interrogans at 25 different states. We show that in a single LC–MS/MS experiment around 5000 peptides, covering 1680 L. interrogans proteins, can be consistently detected and their absolute expression levels estimated, revealing new insights about the proteome changes involved in pathogenic progression and antibiotic defense of L. interrogans. This is the first study that describes the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date.
PMCID: PMC3159967  PMID: 21772258
absolute quantification; directed mass spectrometry; Leptospira interrogans; microbiology; proteomics
13.  Shared Immunoproteome for ovarian cancer diagnostics and immunotherapy: Potential theranostic approach to cancer 
Journal of proteome research  2007;6(7):2509-2517.
Elimination of cancer through early detection and treatment is the ultimate goal of cancer research, and is especially critical for ovarian and other forms of cancers typically diagnosed at very late stages and that have very poor response rates. Proteomics has opened new avenues for the discovery of diagnostic and therapeutic targets. Immunoproteomics, which defines the subset of proteins involved in the immune response, holds considerable promise for providing a better understanding of the early stage immune response to cancer as well as important insights into antigens that may be suitable for immunotherapy. Early administration of immunotherapeutic vaccines can potentially have profound effects on prevention of metastasis and may potentially cure through efficient and complete tumor elimination. We developed a mass-spectrometry-based method to identify novel autoantibody-based serum biomarkers for the early diagnosis of ovarian cancer that uses native tumor-associated proteins immunoprecipitated by autoantibodies from sera obtained from cancer patients and from cancer-free controls to identify autoantibody signatures that occur at high frequency only in cancer patient sera. Interestingly, we identified a subset of more than 50 autoantigens that were also processed and presented by MHC class I molecules on the surfaces of ovarian cancer cells and thus common to the two immunological processes of humoral and cell-mediated immunity. These shared autoantigens were highly representative of families of proteins with roles in key processes in carcinogenesis and metastasis, such as cell cycle regulation, cell proliferation, apoptosis, tumor suppression and cell adhesion. Autoantibodies appearing at the early stages of cancer suggest that this detectable immune response to the developing tumor can be exploited as early stage biomarkers for the development of ovarian cancer diagnostics. 
Correspondingly, because the T cell immune response depends on MHC class I processing and presentation of peptides, proteins identified as going through this pathway are potential candidates for the development of immunotherapeutics designed to activate a T cell immune response to cancer. To the best of our knowledge, this is the first comprehensive study that identifies and categorizes proteins that are involved in both humoral and cell-mediated immunity against ovarian cancer, and it may have broad implications for the discovery and selection of theranostic molecular targets for cancer therapeutics and diagnostics in general.
PMCID: PMC2533805  PMID: 17547437
Immunoproteomics; auto-antigens; ovarian cancer; immunotherapy; bio-marker; early diagnosis
14.  Biomarkers in inflammatory bowel diseases: Current status and proteomics identification strategies 
Unambiguous diagnosis of the two main forms of inflammatory bowel disease (IBD), ulcerative colitis (UC) and Crohn’s disease (CD), represents a challenge in the early stages of the diseases. The diagnosis may be established several years after the debut of symptoms. Hence, protein biomarkers for early and accurate diagnosis could help clinicians improve treatment of individual patients. Moreover, the biomarkers could help physicians predict disease courses and, in this way, identify patients in need of intensive treatment. Patients with low risk of disease flares may avoid treatment with medications with the concomitant risk of adverse events. In addition, identification of disease- and course-specific biomarker profiles can be used to identify biological pathways involved in disease development and treatment. Knowledge of disease mechanisms in general can lead to improved future development of preventive and treatment strategies. Thus, the clinical use of a panel of biomarkers represents a diagnostic and prognostic tool of potentially great value. The technological development in recent years within proteomic research (determination and quantification of the complete protein content) has made the discovery of novel biomarkers feasible. Several IBD-associated protein biomarkers are known, but none have been successfully implemented in daily use to distinguish CD and UC patients. The intestinal tissue remains an obvious place to search for novel biomarkers, for which blood, urine or stool can later be screened. Considering the protein complexity encountered in intestinal biopsy samples and the recent development within the field of mass spectrometry-driven quantitative proteomics, a more thorough and accurate biomarker discovery endeavor can be performed today than ever before.
In this review, we report the current status of the proteomics IBD biomarkers and discuss various emerging proteomic strategies for identifying and characterizing novel biomarkers, as well as suggesting future targets for analysis.
PMCID: PMC3964395  PMID: 24696607
Inflammatory bowel disease; Biomarker; Proteomics; Citrullination; Ulcerative colitis; Crohn’s disease; Posttranslational modification
15.  A Metaproteomic Approach to Study Human-Microbial Ecosystems at the Mucosal Luminal Interface 
PLoS ONE  2011;6(11):e26542.
Aberrant interactions between the host and the intestinal bacteria are thought to contribute to the pathogenesis of many digestive diseases. However, studying the complex ecosystem at the human mucosal-luminal interface (MLI) is challenging and requires an integrative systems biology approach. Therefore, we developed a novel method integrating lavage sampling of the human mucosal surface, high-throughput proteomics, and a unique suite of bioinformatic and statistical analyses. Shotgun proteomic analysis of secreted proteins recovered from the MLI confirmed the presence of both human and bacterial components. To profile the MLI metaproteome, we collected 205 mucosal lavage samples from 38 healthy subjects, and subjected them to high-throughput proteomics. The spectral data were subjected to a rigorous data processing pipeline to optimize suitability for quantitation and analysis, and then were evaluated using a set of biostatistical tools. Compared to the mucosal transcriptome, the MLI metaproteome was enriched for extracellular proteins involved in response to stimulus and immune system processes. Analysis of the metaproteome revealed significant individual-related as well as anatomic region-related (biogeographic) features. Quantitative shotgun proteomics established the identity and confirmed the biogeographic association of 49 proteins (including 3 functional protein networks) demarcating the proximal and distal colon. This robust and integrated proteomic approach is thus effective for identifying functional features of the human mucosal ecosystem and for gaining a fresh understanding of the basic biology and disease processes at the MLI.
PMCID: PMC3221670  PMID: 22132074
16.  Development of Biomarkers for Screening Hepatocellular Carcinoma Using Global Data Mining and Multiple Reaction Monitoring 
PLoS ONE  2013;8(5):e63468.
Hepatocellular carcinoma (HCC) is one of the most common and aggressive cancers and is associated with a poor survival rate. Clinically, the level of alpha-fetoprotein (AFP) has been used as a biomarker for the diagnosis of HCC. The discovery of useful biomarkers for HCC, focused solely on the proteome, has been difficult; thus, wide-ranging global data mining of genomic and proteomic databases from previous reports would be valuable in screening biomarker candidates. Further, multiple reaction monitoring (MRM), based on triple quadrupole mass spectrometry, has been effective with regard to high-throughput verification, complementing antibody-based verification pipelines. In this study, global data mining was performed using 5 types of HCC data to screen for candidate biomarker proteins: cDNA microarray, copy number variation, somatic mutation, epigenetic, and quantitative proteomics data. Next, we applied MRM to verify HCC candidate biomarkers in individual serum samples from 3 groups: a healthy control group, patients who have been diagnosed with HCC (Before HCC treatment group), and HCC patients who underwent locoregional therapy (After HCC treatment group). After determining the relative quantities of the candidate proteins by MRM, we compared their expression levels between the 3 groups, identifying 4 potential biomarkers: the actin-binding protein anillin (ANLN), filamin-B (FLNB), complement C4-A (C4A), and AFP. The combination of 2 markers (ANLN, FLNB) improved the discrimination of the before HCC treatment group from the healthy control group compared with AFP. We conclude that the combination of global data mining and MRM verification enhances the screening and verification of potential HCC biomarkers. This efficacious integrative strategy is applicable to the development of markers for cancer and other diseases.
PMCID: PMC3661589  PMID: 23717429
17.  Liquid Chromatography-Tandem and MALDI imaging mass spectrometry analyses of RCL2/CS100-fixed paraffin embedded tissues: proteomics evaluation of an alternate fixative for biomarker discovery 
Journal of proteome research  2009;8(12):5619-5628.
Human tissues are an important source of biological material for the discovery of novel biomarkers. Fresh-frozen tissue could represent an ideal supply of archival material for molecular investigations. However, immediate flash freezing is usually not possible, especially for rare or valuable tissue samples such as biopsies. Here, we investigated the compatibility of RCL2/CS100, a non-crosslinking, non-toxic, and non-volatile organic fixative, with shotgun proteomic analyses. Several protein extraction protocols compatible with mass spectrometry were investigated from RCL2/CS100-fixed and fresh-frozen colonic mucosa, breast, and prostate tissues. The peptides and proteins identified from RCL2/CS100 tissue were then comprehensively compared with those identified from matched fresh-frozen tissues using a bottom-up strategy based on nano-reversed phase liquid chromatography coupled with tandem mass spectrometry (nanoRPLC-MS/MS). Results showed that similar peptides could be identified in both archival conditions and the proteome coverage was not obviously compromised by the RCL2/CS100 fixation process. NanoRPLC-MS/MS of laser capture microdissected RCL2/CS100-fixed tissues gave the same amount of biological information as that recovered from whole RCL2/CS100-fixed or frozen tissues. We next performed MALDI tissue profiling and imaging mass spectrometry and observed a high level of agreement in protein expression as well as excellent agreement between the images obtained from RCL2/CS100-fixed and fresh-frozen tissue samples. These results suggest that RCL2/CS100-fixed tissues are suitable for shotgun proteomic analyses and tissue imaging. More importantly, this alternate fixative opens the door to the analysis of small, valuable, and rare target lesions that are usually inaccessible to complementary biomarker-driven genomic and proteomic research.
PMCID: PMC2924679  PMID: 19856998
18.  Challenges and Solutions in Proteomics 
Current Genomics  2007;8(1):21-28.
The accelerated growth of proteomics data presents both opportunities and challenges. Large-scale proteomic profiling of biological samples such as cells, organelles or biological fluids has led to the discovery of numerous key and novel proteins involved in many biological/disease processes including cancers, as well as to the identification of novel disease biomarkers and potential therapeutic targets. While proteomic data analysis has been greatly assisted by the many bioinformatics tools developed in recent years, a careful analysis of the major steps and flow of data in a typical high-throughput analysis reveals a few gaps that still need to be filled to fully realize the value of the data. To facilitate functional and pathway discovery for large-scale proteomic data, we have developed an integrated proteomic expression analysis system, iProXpress, which facilitates protein identification using a comprehensive sequence library and functional interpretation using integrated data. With its modular design, iProXpress complements and can be integrated with other software in a proteomic data analysis pipeline. This novel approach to complex biological questions involves the interrogation of multiple data sources, thereby facilitating hypothesis generation and knowledge discovery from the genomic-scale studies and fostering disease diagnosis and drug development.
PMCID: PMC2474689  PMID: 18645629
Proteomic profiling; high-throughput analysis; biomarkers; bioinformatic tools; iProXpress; sequence library; pathway discovery; stage specific proteins
19.  Towards Systematic Discovery of Signaling Networks in Budding Yeast Filamentous Growth Stress Response Using Interventional Phosphorylation Data 
PLoS Computational Biology  2013;9(6):e1003077.
Reversible phosphorylation is one of the major mechanisms of signal transduction, and signaling networks are critical regulators of cell growth and development. However, few of these networks have been delineated completely. Towards this end, quantitative phosphoproteomics is emerging as a useful tool enabling large-scale determination of relative phosphorylation levels. However, phosphoproteomics differs from classical proteomics by a more extensive sampling limitation due to the limited number of detectable sites per protein. Here, we propose a comprehensive quantitative analysis pipeline customized for phosphoproteome data from interventional experiments for identifying key proteins in specific pathways, discovering the protein-protein interactions and inferring the signaling network. We also made an effort to partially compensate for the missing value problem, a chronic issue for proteomics studies. The dataset used for this study was generated using SILAC (Stable Isotope Labeling with Amino acids in Cell culture) technique with interventional experiments (kinase-dead mutations). The major components of the pipeline include phosphopeptide meta-analysis, correlation network analysis and causal relationship discovery. We have successfully applied our pipeline to interventional experiments identifying phosphorylation events underlying the transition to a filamentous growth form in Saccharomyces cerevisiae. We identified 5 high-confidence proteins from meta-analysis, and 19 hub proteins from correlation analysis (Pbi2p and Hsp42p were identified by both analyses). All these proteins are involved in stress responses. Nine of them have direct or indirect evidence of involvement in filamentous growth. In addition, we tested four of our predicted proteins, Nth1p, Pbi2p, Pdr12p and Rcn2p, by interventional phenotypic experiments and all of them present differential invasive growth, providing prospective validation of our approach. 
This comprehensive pipeline presents a systematic way for discovering signaling networks using interventional phosphoproteome data and can suggest candidate proteins for further investigation. We anticipate the methodology to be applicable as well to other interventional studies via different experimental platforms.
Author Summary
Signal transduction is a ubiquitous and essential mechanism regulating cellular functions, including responses to environmental stress. Dysfunction of signaling pathways results in a variety of diseases, including cancer, diabetes, and cardiovascular disease. Phosphorylation regulates the activity of signaling and target proteins at different cellular locations and controls activation and inactivation of signal pathways. Here, we provide an analysis of phosphoproteome datasets from yeast, utilizing kinase mutants versus wild type strains. In order to provide an objective approach to identify candidate proteins involved in the transition to a filamentous growth form, we proposed and applied a comprehensive pipeline incorporating statistical and mathematical methods to investigate the phosphoproteome data from multiple perspectives. This included phosphorylation variation in response to a single mutant, phosphorylation variation patterns over multiple mutants, and the relationships represented by these patterns. We make an effort to discover the components and targets of the signaling network, infer the network structure, and to find the relationships of changes of protein phosphorylation to cellular functions, specifically in response to stress in the context of filamentous growth.
PMCID: PMC3694812  PMID: 23825934
20.  Carbonic Anhydrase I as a New Plasma Biomarker for Prostate Cancer 
ISRN Oncology  2012;2012:768190.
Serum prostate-specific antigen (PSA) levels ranging from 4 to 10 ng/mL are considered a diagnostic gray zone for detecting prostate cancer because biopsies reveal no evidence of cancer in 75% of these subjects. Our goal was to discover a new highly specific biomarker for prostate cancer by analyzing plasma proteins using a proteomic technique. Enriched plasma proteins from 25 prostate cancer patients and 15 healthy controls were analyzed using a label-free quantitative shotgun proteomics platform called 2DICAL (2-dimensional image converted analysis of liquid chromatography and mass spectrometry), and the data were searched for candidate biomarkers. Among the 40,678 identified mass spectrum (MS) peaks, 117 peaks significantly differed between prostate cancer patients and healthy controls. Ten peaks matched carbonic anhydrase I (CAI) by tandem MS. Independent immunological assays revealed that plasma CAI levels in 54 prostate cancer patients were significantly higher than those in 60 healthy controls (P = 0.022, Mann-Whitney U test). In the PSA gray-zone group, the discrimination rate of prostate cancer patients increased by considering plasma CAI levels. CAI can potentially serve as a valuable plasma biomarker and the combination of PSA and CAI may have great advantages for diagnosing prostate cancer in patients with gray-zone PSA levels.
PMCID: PMC3506895  PMID: 23213568
21.  High Quality Catalog of Proteotypic Peptides from Human Heart 
Journal of proteome research  2008;7(11):5055-5061.
Proteomics research is beginning to expand beyond the more traditional shotgun analysis of protein mixtures to include targeted analyses of specific proteins using mass spectrometry. Integral to the development of a robust assay based on targeted mass spectrometry is prior knowledge of which peptides provide an accurate and sensitive proxy of the originating gene product (i.e., proteotypic peptides). To develop a catalog of “proteotypic peptides” in human heart, TRIzol extracts of left-ventricular tissue from nonfailing and failing human heart explants were optimized for shotgun proteomic analysis using Multidimensional Protein Identification Technology (MudPIT). Ten replicate MudPIT analyses were performed on each tissue sample and resulted in the identification of 30 605 unique peptides with a q-value ≤ 0.01, corresponding to 7138 unique human heart proteins. Experimental observation frequencies were assessed and used to select over 4476 proteotypic peptides for 2558 heart proteins. This human cardiac data set can serve as a public reference to guide the selection of proteotypic peptides for future targeted mass spectrometry experiments monitoring potential protein biomarkers of human heart diseases.
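The selection by experimental observation frequency described above can be illustrated with a minimal Python sketch. It assumes each replicate analysis is available as a list of identified peptide sequences; the function names and the `min_frequency` cutoff are hypothetical, since the abstract does not state the threshold that was actually used.

```python
from collections import Counter


def observation_frequency(runs):
    """Return the fraction of runs in which each peptide was identified.

    `runs` is a list of replicate analyses, each a list of peptide sequences.
    """
    counts = Counter()
    for run in runs:
        counts.update(set(run))  # count a peptide at most once per run
    n_runs = len(runs)
    return {peptide: c / n_runs for peptide, c in counts.items()}


def select_proteotypic(runs, min_frequency=0.5):
    """Keep peptides seen in at least `min_frequency` of the runs.

    The 0.5 default is an illustrative assumption, not the study's cutoff.
    """
    freq = observation_frequency(runs)
    return sorted(p for p, f in freq.items() if f >= min_frequency)


# Toy data: four replicate runs with made-up peptide labels.
example_runs = [["A", "B"], ["A"], ["A", "C", "C"], ["B"]]
print(select_proteotypic(example_runs))  # ['A', 'B']
```

Counting each peptide once per run keeps repeated spectra within a single run from inflating the frequency.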
PMCID: PMC2765113  PMID: 18803417
proteotypic peptides; targeted mass spectrometry; human heart explant; dilated cardiomyopathy; MudPIT
22.  A Comprehensive Peptidome Profiling Technology for the Identification of Early Detection Biomarkers for Lung Adenocarcinoma 
PLoS ONE  2011;6(4):e18567.
Mass spectrometry-based peptidomics approaches have proven their usefulness in several areas such as the discovery of physiologically active peptides or biomarker candidates derived from various biological fluids including blood and cerebrospinal fluid. However, to identify biomarkers that are reproducible and clinically applicable, development of a novel technology, which enables rapid, sensitive, and quantitative analysis using hundreds of clinical specimens, has been eagerly awaited. Here we report an integrative peptidomic approach for identification of lung cancer-specific serum peptide biomarkers. It is based on the one-step effective enrichment of peptidome fractions (molecular weight of 1,000–5,000) with size exclusion chromatography in combination with the precise label-free quantification analysis of nano-LC/MS/MS data set using Expressionist proteome server platform. We applied this method to 92 serum samples well-managed with our SOP (standard operating procedure) (30 healthy controls and 62 lung adenocarcinoma patients), and quantitatively assessed the detected 3,537 peptide signals. Among them, 118 peptides showed significantly altered serum levels between the control and lung cancer groups (p<0.01 and fold change >5.0). Subsequently we identified peptide sequences by MS/MS analysis and further assessed the reproducibility of Expressionist-based quantification results and their diagnostic powers by MRM-based relative-quantification analysis for 96 independently prepared serum samples and found that APOA4 273–283, FIBA 5–16, and LBN 306–313 should be clinically useful biomarkers for both early detection and tumor staging of lung cancer. Our peptidome profiling technology can provide simple, high-throughput, and reliable quantification of a large number of clinical samples, which is applicable for diverse peptidome-targeting biomarker discoveries using any type of biological specimen.
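The joint filter quoted above (p<0.01 and fold change >5.0) can be sketched in a few lines of Python. This is only an illustration: the `PeptideSignal` record, its field names, and the choice of a symmetric fold change (larger group mean divided by the smaller, so that both increases and decreases count as "altered") are assumptions, and the p-values are taken as precomputed because the abstract does not name the statistical test.

```python
from dataclasses import dataclass


@dataclass
class PeptideSignal:
    name: str
    mean_control: float  # mean signal intensity in the control group
    mean_cancer: float   # mean signal intensity in the cancer group
    p_value: float       # precomputed by whatever test the study used


def fold_change(signal):
    """Symmetric fold change: larger group mean over smaller, always >= 1."""
    high = max(signal.mean_control, signal.mean_cancer)
    low = min(signal.mean_control, signal.mean_cancer)
    if low <= 0:
        raise ValueError("intensities must be positive")
    return high / low


def select_candidates(signals, p_max=0.01, fc_min=5.0):
    """Keep signals that pass both the p-value and the fold-change cutoff."""
    return [s for s in signals if s.p_value < p_max and fold_change(s) > fc_min]


# Made-up signals for illustration only.
signals = [
    PeptideSignal("pep1", 100.0, 800.0, 0.001),  # 8-fold up, significant
    PeptideSignal("pep2", 900.0, 100.0, 0.005),  # 9-fold down, significant
    PeptideSignal("pep3", 100.0, 300.0, 0.001),  # only 3-fold
    PeptideSignal("pep4", 100.0, 900.0, 0.200),  # not significant
]
print([s.name for s in select_candidates(signals)])  # ['pep1', 'pep2']
```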
PMCID: PMC3075260  PMID: 21533267
23.  Determination of burn patient outcome by large-scale quantitative discovery proteomics 
Critical care medicine  2013;41(6):1421-1434.
Emerging proteomics techniques can be used to establish proteomic outcome signatures and to identify candidate biomarkers for survival following traumatic injury. We applied high-resolution liquid chromatography-mass spectrometry (LC-MS) and multiplex cytokine analysis to profile the plasma proteome of survivors and non-survivors of massive burn injury to determine the proteomic survival signature following a major burn injury.
Proteomic discovery study.
Five burn hospitals across the U.S.
Thirty-two burn patients (16 non-survivors and 16 survivors), 19–89 years of age, were admitted within 96 h of injury to the participating hospitals with burns covering >20% of the total body surface area and required at least one surgical intervention.
Measurements and Main Results
We found differences in circulating levels of 43 proteins involved in the acute phase response, hepatic signaling, the complement cascade, inflammation, and insulin resistance. Thirty-two of the proteins identified were not previously known to play a role in the response to burn. IL-4, IL-8, GM-CSF, MCP-1, and β2-microglobulin correlated well with survival and may serve as clinical biomarkers.
These results demonstrate the utility of these techniques for establishing proteomic survival signatures and for use as a discovery tool to identify candidate biomarkers for survival. This is the first clinical application of a high-throughput, large-scale LC-MS-based quantitative plasma proteomic approach for biomarker discovery for the prediction of patient outcome following burn, trauma or critical illness.
PMCID: PMC3660437  PMID: 23507713
burn; inflammation; proteomic profiling; plasma proteins; LC-MS; biomarker
24.  Salivary Proteomics for Oral Cancer Biomarker Discovery 
This study aims to explore the presence of informative protein biomarkers in the human saliva proteome and to evaluate their potential for detection of oral squamous cell carcinoma (OSCC).
Experimental Design
Whole saliva samples were collected from patients (n = 64) with OSCC and matched healthy subjects (n = 64). The proteins in pooled whole saliva samples of patients with OSCC (n = 16) and matched healthy subjects (n = 16) were profiled using shotgun proteomics based on C4 reversed-phase liquid chromatography for prefractionation, capillary reversed-phase liquid chromatography with quadrupole time-of-flight mass spectrometry, and Mascot sequence database searching. Immunoassays were used for validation of the candidate biomarkers on a new group of OSCC (n = 48) and matched healthy subjects (n = 48). Receiver operating characteristic analysis was exploited to evaluate the diagnostic value of discovered candidate biomarkers for OSCC.
Subtractive proteomics revealed several salivary proteins at differential levels between the OSCC patients and matched control subjects. Five candidate biomarkers were successfully validated using immunoassays on an independent set of OSCC patients and matched healthy subjects. The combination of these candidate biomarkers yielded a receiver operating characteristic value of 93%, sensitivity of 90%, and specificity of 83% in detecting OSCC.
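For readers unfamiliar with the metrics reported above, the following Python sketch shows how sensitivity, specificity, and the area under the receiver operating characteristic curve are commonly computed. The function names and the toy scores are illustrative assumptions; the abstract does not describe how the combined biomarker score was built, so none of this reproduces the study's actual calculation.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(patient_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen patient scores higher
    than a randomly chosen control, with ties counted as one half.
    """
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


# Hypothetical counts and scores, for illustration only.
sens, spec = sensitivity_specificity(tp=9, fn=1, tn=8, fp=2)
print(sens, spec)  # 0.9 0.8
print(roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))  # 8 of 9 pairs, about 0.889
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of this size; rank-based formulas give the same value more efficiently.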
Patient-based saliva proteomics is a promising approach to searching for OSCC biomarkers. The discovery of these new targets may lead to a simple clinical tool for the noninvasive diagnosis of oral cancer. Long-term longitudinal studies with large populations of individuals with oral cancer and those who are at high risk of developing oral cancer are needed to validate these potential biomarkers.
PMCID: PMC2877125  PMID: 18829504
25.  Proteogenomic strategies for identification of aberrant cancer peptides using large-scale Next Generation Sequencing data 
Proteomics  2014;14(0):2719-2730.
Cancer is driven by the acquisition of somatic DNA lesions. Distinguishing the early driver mutations from subsequent passenger mutations is key to molecular sub-typing of cancers, understanding cancer progression, and the discovery of novel biomarkers. The advances of genomics technologies (whole-genome, exome, and transcript sequencing, collectively referred to as NGS (Next Generation Sequencing)) have fueled recent studies on somatic mutation discovery. However, the vision is challenged by the complexity, redundancy, and errors in genomic data, and the difficulty of investigating the proteome translated portion of aberrant genes using only genomic approaches. Combinations of proteomic and genomic technologies are increasingly being employed.
Various strategies have been employed to allow the usage of large scale NGS data for conventional MS/MS searches. This paper provides a discussion of applying different strategies relating to large database search and FDR (False Discovery Rate)-based error control, and their implications for cancer proteogenomics. Moreover, it extends and develops the idea of a unified genomic variant database that can be searched by any mass spectrometry sample. A total of 879 BAM files downloaded from the TCGA repository were used to create a 4.34 GB unified FASTA database which contained 2,787,062 novel splice junctions, 38,464 deletions, 1,105 insertions, and 182,302 substitutions. Proteomic data from a single ovarian carcinoma sample (439,858 spectra) was searched against the database. By applying the most conservative FDR measure, we have identified 524 novel peptides and 65,578 known peptides at a 1% FDR threshold. The novel peptides include interesting examples of doubly mutated peptides, frame-shifts, and non-sample-recruited mutations, which emphasize the strength of our approach.
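The abstract does not say how the 1% FDR was estimated. One widely used option in MS/MS database searching is the target-decoy approach, sketched below in Python as an assumption rather than as the authors' method. Each peptide-spectrum match is represented as a hypothetical `(score, is_decoy)` pair, and FDR at a score cutoff is approximated as the number of decoy hits divided by the number of target hits at or above that cutoff.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    `psms` is an iterable of (score, is_decoy) pairs, higher score = better.
    FDR is estimated as decoys / targets among matches at or above the cutoff.
    Returns None when no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = 0
    decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score  # accepting everything down to here is OK
    return best_cutoff


# Toy matches for illustration only.
matches = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
print(score_threshold_at_fdr(matches, max_fdr=0.01))  # 9.0
print(score_threshold_at_fdr(matches, max_fdr=0.5))   # 7.0
```

With very large variant databases such as the one described, novel and known peptides are often assessed separately because the novel search space is much larger; this sketch does not attempt that refinement.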
PMCID: PMC4256132  PMID: 25263569
