Robust biomarkers are needed to improve microbial identification and diagnostics. Proteomics methods based on mass spectrometry can be used to discover novel biomarkers owing to their high sensitivity and specificity. However, there has been no coherent pipeline connecting biomarker discovery with established approaches for evaluation and validation. We propose such a pipeline, which uses in silico methods for refined biomarker discovery and confirmation.
The pipeline has four main stages: sample preparation, mass spectrometry analysis, database searching and biomarker validation. Using the pathogen Clostridium botulinum as a model, we show that the robustness of candidate biomarkers increases with each stage of the pipeline. Robustness is further enhanced by the concordance between different database search algorithms for peptide identification. Further validation focused on peptides that are unique to C. botulinum strains and absent from phylogenetically related Clostridium species. From a list of 143 peptides, 8 candidate biomarkers were reliably identified as conserved across C. botulinum strains. To avoid discarding other unique peptides, the pipeline implements a confidence scale that gives priority to unique peptides identified by a combination of search algorithms.
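The uniqueness filter, strain conservation check and confidence scale described above can be sketched with plain set operations. This is a minimal sketch under stated assumptions: the confidence scale is taken to be simply the number of search algorithms that reported a unique peptide, and the algorithm names and peptide identifiers are hypothetical placeholders, not the pipeline's actual scoring or data.

```python
from typing import Dict, List, Set


def conserved_across_strains(strain_peptides: List[Set[str]]) -> Set[str]:
    """Peptides detected in every strain (intersection of per-strain sets)."""
    return set.intersection(*strain_peptides) if strain_peptides else set()


def confidence_scale(hits: Dict[str, Set[str]], related_species: Set[str]) -> Dict[str, int]:
    """Score each unique peptide by how many search algorithms reported it.

    hits: algorithm name -> set of identified peptides (hypothetical input).
    related_species: peptides observed in phylogenetically related species.
    """
    all_hits = set().union(*hits.values())
    unique = all_hits - related_species  # drop peptides shared with related species
    return {pep: sum(pep in found for found in hits.values()) for pep in unique}


# Toy example; algorithm names and peptides are placeholders, not study data.
hits = {
    "algo_A": {"PEP1", "PEP2", "PEP3"},
    "algo_B": {"PEP1", "PEP3"},
    "algo_C": {"PEP1"},
}
scores = confidence_scale(hits, related_species={"PEP3"})
# Highest confidence first; ties broken alphabetically.
ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)  # [('PEP1', 3), ('PEP2', 1)]
```

Counting algorithms keeps lower-confidence unique peptides in the list rather than discarding them, which matches the stated motivation for the scale.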
This study demonstrates that a coherent pipeline that includes intensive bioinformatics validation steps is vital for the discovery of robust biomarkers. It also emphasises the importance of proteomics-based methods in biomarker discovery.
Recent technical advances in quantitative proteomics have stimulated a large number of biomarker discovery studies of various diseases, providing avenues for new treatments and diagnostics. However, inherent challenges have limited the successful translation of candidate biomarkers into clinical use, highlighting the need for a robust analytical methodology to transition from biomarker discovery to clinical implementation. We have developed an end-to-end computational proteomic pipeline for biomarker studies. At the discovery stage, the pipeline emphasizes different aspects of experimental design, appropriate statistical methodologies, and quality assessment of results. At the validation stage, the pipeline focuses on migrating the results to a platform appropriate for external validation, and on developing a classifier score based on corroborated protein biomarkers. At the last stage, towards clinical implementation, the main aims are to develop and validate an assay suitable for clinical deployment, and to calibrate the biomarker classifier using the developed assay. The proposed pipeline was applied to a biomarker study in cardiac transplantation aimed at developing a minimally invasive clinical test to monitor acute rejection. Starting with an untargeted screening of the human plasma proteome, five candidate biomarker proteins were identified. Rejection-regulated proteins reflect cellular and humoral immune responses, acute phase inflammatory pathways, and lipid metabolism. A multiplex multiple reaction monitoring mass spectrometry (MRM-MS) assay was developed for the five candidate biomarkers and validated by enzyme-linked immunosorbent assays (ELISA) and immunonephelometric assays (INA). A classifier score based on corroborated proteins demonstrated that the developed MRM-MS assay provides an appropriate methodology for external validation, which is still in progress.
Plasma proteomic biomarkers of acute cardiac rejection may offer a relevant post-transplant monitoring tool to effectively guide clinical care. The proposed computational pipeline is highly applicable to a wide range of biomarker proteomic studies.
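The abstract does not specify how the classifier score is computed. One common form, sketched here purely as an assumption, is a logistic function of a weighted sum of standardized (z-scored) log2 protein levels. The protein names, reference samples, weights and intercept below are hypothetical placeholders; in practice the weights would be estimated from training data.

```python
import math
from statistics import mean, stdev
from typing import Dict, List


def classifier_score(sample: Dict[str, float],
                     reference: List[Dict[str, float]],
                     weights: Dict[str, float],
                     intercept: float = 0.0) -> float:
    """Logistic score from a weighted sum of z-scored log2 protein levels.

    reference: samples used to standardize each protein (e.g. non-rejection).
    weights/intercept: hypothetical values, normally fitted on training data.
    """
    linear = intercept
    for protein, w in weights.items():
        ref = [math.log2(r[protein]) for r in reference]
        z = (math.log2(sample[protein]) - mean(ref)) / stdev(ref)
        linear += w * z
    return 1.0 / (1.0 + math.exp(-linear))


# Placeholder proteins P1/P2 and weights; not the study's biomarkers.
reference = [{"P1": 2.0, "P2": 10.0}, {"P1": 8.0, "P2": 40.0}]
weights = {"P1": 1.0, "P2": -0.5}
print(classifier_score({"P1": 4.0, "P2": 20.0}, reference, weights))
```

A sample at the reference mean for every protein scores about 0.5 with a zero intercept; recalibrating the classifier for a new assay platform, as the pipeline describes, would amount to re-estimating the reference statistics and weights on that platform.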
Novel proteomic technology has led to the generation of vast amounts of biological data and the identification of numerous potential biomarkers. However, computational approaches to translate this information into knowledge capable of impacting clinical care have been lagging. We propose a computational proteomic pipeline for biomarker studies that is founded on the combination of advanced statistical methodologies. We demonstrate our approach through the analysis of data obtained from heart transplant patients. Heart transplantation is the gold standard treatment for patients with end-stage heart failure, but is complicated by episodes of immune rejection that can adversely impact patient outcomes. Current rejection monitoring approaches are highly invasive, requiring a biopsy of the heart. This work aims to reduce the need for biopsies and to demonstrate the power and utility of computational approaches in proteomic biomarker discovery. Our work combines novel high-throughput proteomic technology with advanced statistical techniques to identify blood markers that guide the decision as to whether a biopsy is warranted, reduce the number of unnecessary biopsies, and ultimately diagnose the presence of rejection in heart transplant patients. Additionally, the proposed computational methodologies can be applied to a range of proteomic biomarker studies of various diseases and conditions.
Two-dimensional gel electrophoresis (2-DE) is widely applied and remains the method of choice in proteomics; however, pervasive 2-DE-related concerns undermine its prospects as a dominant separation technique in proteome research. Consequently, state-of-the-art shotgun techniques are gradually taking over, exploiting the rapid expansion and advancement of mass spectrometry (MS) to provide a new toolbox of gel-free quantitative techniques. When coupled to MS, the shotgun proteomic pipeline opens new routes to sensitive, high-throughput protein profiling with high quantitative accuracy. Although label-based approaches, either chemical or metabolic, have gained popularity in quantitative proteomics because of their multiplexing capacity, these approaches are not without drawbacks. The burgeoning label-free methods are tag-independent and suitable for all kinds of samples. The challenges in quantitative proteomics are more prominent in plants because of difficulties in protein extraction, the high abundance of a few proteins in green tissue, and the absence of well-annotated, complete genome sequences. The goal of this perspective essay is to weigh the strengths and weaknesses of the available gel-based and gel-free methods and their application to plants. The latest trends in peptide fractionation amenable to MS analysis are also discussed.
Hepatocellular carcinoma (HCC) is one of the most common and aggressive cancers and is associated with a poor survival rate. Clinically, the level of alpha-fetoprotein (AFP) has been used as a biomarker for the diagnosis of HCC. The discovery of useful HCC biomarkers by focusing solely on the proteome has been difficult; thus, wide-ranging global data mining of genomic and proteomic databases from previous reports would be valuable in screening biomarker candidates. Further, multiple reaction monitoring (MRM), based on triple quadrupole mass spectrometry, has been effective for high-throughput verification, complementing antibody-based verification pipelines. In this study, global data mining was performed using 5 types of HCC data to screen for candidate biomarker proteins: cDNA microarray, copy number variation, somatic mutation, epigenetic, and quantitative proteomics data. Next, we applied MRM to verify HCC candidate biomarkers in individual serum samples from 3 groups: a healthy control group, patients who had been diagnosed with HCC (Before HCC treatment group), and HCC patients who underwent locoregional therapy (After HCC treatment group). After determining the relative quantities of the candidate proteins by MRM, we compared their expression levels between the 3 groups, identifying 4 potential biomarkers: the actin-binding protein anillin (ANLN), filamin-B (FLNB), complement C4-A (C4A), and AFP. The combination of 2 markers (ANLN, FLNB) discriminated the before HCC treatment group from the healthy control group better than AFP did. We conclude that the combination of global data mining and MRM verification enhances the screening and verification of potential HCC biomarkers. This efficacious integrative strategy is applicable to the development of markers for cancer and other diseases.
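Discrimination by a single marker versus a two-marker combination is typically compared with the area under the ROC curve (AUC). The sketch below is an illustration only: it assumes the combination is an unweighted sum of per-marker z-scores standardized against the control group, which may differ from how the study combined ANLN and FLNB, and the values are invented toy numbers.

```python
from statistics import mean, stdev
from typing import Dict, List


def auc(cases: List[float], controls: List[float]) -> float:
    """Empirical ROC AUC: probability that a case outscores a control (ties count 1/2)."""
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def panel_score(samples: List[Dict[str, float]],
                controls: List[Dict[str, float]],
                markers: List[str]) -> List[float]:
    """Unweighted sum of per-marker z-scores, standardized against the control group."""
    stats = {}
    for m in markers:
        vals = [c[m] for c in controls]
        stats[m] = (mean(vals), stdev(vals))
    return [sum((s[m] - stats[m][0]) / stats[m][1] for m in markers) for s in samples]


# Toy relative quantities (not study data).
controls = [{"ANLN": 1.0, "FLNB": 1.0}, {"ANLN": 2.0, "FLNB": 2.0}, {"ANLN": 3.0, "FLNB": 3.0}]
cases = [{"ANLN": 3.0, "FLNB": 1.5}, {"ANLN": 1.5, "FLNB": 3.0}]
markers = ["ANLN", "FLNB"]

auc_anln = auc([s["ANLN"] for s in cases], [c["ANLN"] for c in controls])
auc_panel = auc(panel_score(cases, controls, markers), panel_score(controls, controls, markers))
print(round(auc_anln, 3), round(auc_panel, 3))
```

Because the controls here are used both to standardize and to evaluate, this AUC is in-sample; a credible comparison would use an independent cohort.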
Summary and recent advances
Mass spectrometry, specifically the analysis of complex peptide mixtures by liquid chromatography and tandem mass spectrometry (shotgun proteomics), has been at the center of proteomics research for the last decade. To overcome some of the fundamental limitations of the approach, including its limited sensitivity and high degree of redundancy, new proteomics workflows are being developed. Among these, targeted methods, in which specific peptides are selectively isolated, identified and quantified, are particularly promising. Here we summarize recent incremental advances in shotgun proteomics methods and outline emerging targeted workflows. The development of targeted approaches, with their ability to detect and quantify identical, non-redundant sets of proteins in multiple repeat analyses, will be critically important for the application of proteomics to biomarker discovery and validation, and to systems biology research.
A compelling need exists for technologies that facilitate and accelerate the discovery of novel protein biomarkers with therapeutic and diagnostic potential. This study therefore compares shotgun proteome technologies, including capillary isotachophoresis (CITP)-based multidimensional separations and a multidimensional liquid chromatography system, with respect to their ability to address the challenges of protein complexity and relative abundance inherent in cancer stem cells derived from glioblastoma multiforme. Comparisons are conducted using a single processed protein digest with equal sample loading, identical second-dimension separation (reversed-phase liquid chromatography) and mass spectrometry conditions, and consistent search parameters, with a cutoff set by the target-decoy false discovery rate.
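A target-decoy false discovery rate cutoff of the kind mentioned above can be sketched as follows. This assumes a concatenated target/decoy search in which each peptide-spectrum match (PSM) carries a score (higher is better) and a decoy flag, and it uses the simple decoys/targets ratio as the FDR estimate. That is one common estimator, not necessarily the one used in the study, and tied scores are not grouped here.

```python
from typing import List, Tuple


def score_cutoff_at_fdr(psms: List[Tuple[float, bool]], max_fdr: float = 0.01) -> float:
    """Lowest score whose estimated FDR (decoys/targets at or above it) is <= max_fdr.

    psms: (score, is_decoy) pairs; higher scores indicate better matches.
    Returns float('inf') if no cutoff satisfies max_fdr.
    """
    targets = decoys = 0
    cutoff = float("inf")
    # Walk from best to worst score, tracking cumulative target and decoy counts.
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Toy PSMs: (score, is_decoy).
psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
print(score_cutoff_at_fdr(psms, max_fdr=0.5))  # 7.0
```

Applying the same cutoff rule to every platform is what makes identification counts comparable across the separation methods being tested.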
Besides achieving superior overall proteome performance in total peptide, distinct peptide, and distinct protein identifications, the CITP proteome platform coupled with spectral counting shows high analytical reproducibility, with a Pearson R² of 0.98 and a coefficient of variation of 15% across all proteins quantified. In contrast, extensive fraction overlap in strong cation exchange greatly limits the ability of multidimensional liquid chromatography separations to mine deeper into the tissue proteome, as evidenced by poor coverage of various protein functional categories and key protein pathways. The CITP proteomic technology, with selective analyte enrichment and ultrahigh resolving power, is expected to serve as a critical component of the overall toolset required for biomarker discovery via shotgun proteomic analysis of tissue specimens.
Biomarker; Capillary Electrophoresis; Mass Spectrometry; Strong Cation Exchange Chromatography; Tissue Proteomics
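The two reproducibility figures reported for the CITP platform, a Pearson R² between replicate runs and a coefficient of variation (CV) of spectral counts, can be computed as sketched below. The abstract does not say how the per-protein CVs were summarized, so taking the median is an assumption, and the spectral counts are invented.

```python
from statistics import mean, median, stdev
from typing import List


def pearson_r2(x: List[float], y: List[float]) -> float:
    """Squared Pearson correlation between two replicate runs."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def median_cv_percent(counts_per_protein: List[List[float]]) -> float:
    """Median coefficient of variation (%) of spectral counts across replicate runs.

    counts_per_protein[i] holds the counts of protein i in each replicate.
    Proteins with a zero mean are skipped.
    """
    cvs = [100 * stdev(c) / mean(c) for c in counts_per_protein if mean(c) > 0]
    return median(cvs)


# Toy spectral counts for four proteins in two replicate runs.
run1 = [12, 40, 7, 103]
run2 = [14, 38, 8, 99]
print(round(pearson_r2(run1, run2), 3))
print(round(median_cv_percent([[a, b] for a, b in zip(run1, run2)]), 1))
```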
High-throughput technologies can now identify hundreds of candidate protein biomarkers for any disease with relative ease. However, because there are no assays for the majority of proteins and de novo immunoassay development is prohibitively expensive, few candidate biomarkers are tested in clinical studies. We tested whether the analytical performance of a biomarker identification pipeline based on targeted mass spectrometry would be sufficient for data-dependent prioritization of candidate biomarkers, de novo development of assays and multiplexed biomarker verification. We used a data-dependent triage process to prioritize a subset of putative plasma biomarkers from >1,000 candidates previously identified using a mouse model of breast cancer. Eighty-eight novel quantitative assays based on selected reaction monitoring mass spectrometry were developed, multiplexed and evaluated in 80 plasma samples. Thirty-six proteins were verified as being elevated in the plasma of tumor-bearing animals. The analytical performance of this pipeline suggests that it should support the use of an analogous approach with human samples.
We developed a pipeline to integrate the proteomic technologies used from the discovery to the verification stages of plasma biomarker identification and applied it to identify early biomarkers of cardiac injury from the blood of patients undergoing a therapeutic, planned myocardial infarction (PMI) for treatment of hypertrophic cardiomyopathy. Sampling of blood directly from patient hearts before, during and after controlled myocardial injury ensured enrichment for candidate biomarkers and allowed patients to serve as their own biological controls. LC-MS/MS analyses detected 121 highly differentially expressed proteins, including previously credentialed markers of cardiovascular disease and >100 novel candidate biomarkers for myocardial infarction (MI). Accurate inclusion mass screening (AIMS) qualified a subset of the candidates based on highly specific, targeted detection in peripheral plasma, including some markers unlikely to have been identified without this step. Analyses of peripheral plasma from controls and patients with PMI or spontaneous MI by quantitative multiple reaction monitoring mass spectrometry or immunoassays suggest that the candidate biomarkers may be specific to MI. This study demonstrates that modern proteomic technologies, when coherently integrated, can yield novel cardiovascular biomarkers meriting further evaluation in large, heterogeneous cohorts.
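Targeted screening such as accurate inclusion mass screening depends on knowing the precursor m/z of each candidate peptide in advance. The sketch below computes monoisotopic [M + zH]^z+ values for an inclusion list. It assumes unmodified residues (for example, no carbamidomethylated cysteine), and the list layout and the test peptide are illustrative choices, not the study's instrument format or candidates.

```python
from typing import Iterable, List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids, no modifications.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565   # added once for the peptide termini
PROTON = 1.007276   # mass of a proton


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of the [M + zH]^z+ precursor ion."""
    return (monoisotopic_mass(peptide) + charge * PROTON) / charge


def inclusion_list(peptides: Iterable[str],
                   charges: Tuple[int, ...] = (2, 3)) -> List[Tuple[str, int, float]]:
    """(peptide, charge, m/z) entries for each peptide at each charge state."""
    return [(p, z, round(precursor_mz(p, z), 4)) for p in peptides for z in charges]


print(inclusion_list(["PEPTIDE"]))
```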
Although the field of mass spectrometry-based proteomics is still in its infancy, recent developments in targeted proteomic techniques have left the field poised to impact the clinical protein biomarker pipeline now more than at any other time in history. For proteomics to meet its potential for finding biomarkers, clinicians, statisticians, epidemiologists and chemists must work together in an interdisciplinary approach. These interdisciplinary efforts will have the greatest chance of success if participants from each discipline have a basic working knowledge of the other disciplines. To that end, the purpose of this review is to provide a nontechnical overview of the emerging and evolving roles that mass spectrometry (especially targeted modes of mass spectrometry) can play in the biomarker pipeline, in the hope of making the technology more accessible to the broader community for biomarker discovery efforts. Additionally, the technologies discussed are broadly applicable to proteomic studies and are not restricted to biomarker discovery.
targeted proteomics; multiple reaction monitoring; selected reaction monitoring; biomarker; mass spectrometry
Biomarker discovery produces lists of candidate markers whose presence and level must be subsequently verified in serum or plasma. Verification represents a paradigm shift from unbiased discovery approaches to targeted, hypothesis-driven methods and relies upon specific, quantitative assays optimized for the selective detection of target proteins. Many protein biomarkers of clinical currency are present at or below the nanogram/milliliter range in plasma and have been inaccessible to date by MS-based methods. Using multiple reaction monitoring coupled with stable isotope dilution mass spectrometry, we describe here the development of quantitative, multiplexed assays for six proteins in plasma that achieve limits of quantitation in the 1–10 ng/ml range with percent coefficients of variation from 3 to 15% without immunoaffinity enrichment of either proteins or peptides. Sample processing methods with sufficient throughput, recovery, and reproducibility to enable robust detection and quantitation of candidate biomarker proteins were developed and optimized by addition of exogenous proteins to immunoaffinity depleted plasma from a healthy donor. Quantitative multiple reaction monitoring assays were designed and optimized for signature peptides derived from the test proteins. Based upon calibration curves using known concentrations of spiked protein in plasma, we determined that each target protein had at least one signature peptide with a limit of quantitation in the 1–10 ng/ml range and linearity typically over 2 orders of magnitude in the measurement range of interest. Limits of detection were frequently in the high picogram/milliliter range. 
These levels of assay performance represent up to a 1000-fold improvement compared with direct analysis of proteins in plasma by MS and were achieved by simple, robust sample processing involving abundant protein depletion and minimal fractionation by strong cation exchange chromatography at the peptide level prior to LC-multiple reaction monitoring/MS. The methods presented here provide a solid basis for developing quantitative MS-based assays of low level proteins in blood.
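Limits of detection and quantitation can be derived from calibration curves built from spiked protein at known concentrations. The sketch below fits an ordinary least-squares line and applies the widely used 3.3σ/slope (LOD) and 10σ/slope (LOQ) conventions, where σ is the residual standard deviation. The abstract does not state how its limits were defined, so these formulas are an assumption, and the concentrations and responses are invented.

```python
import math
from statistics import mean
from typing import List, Tuple


def linear_fit(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y = intercept + slope * x; also returns residual SD."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    s_res = math.sqrt(sum(r * r for r in residuals) / (len(x) - 2))
    return slope, intercept, s_res


def lod_loq(x: List[float], y: List[float]) -> Tuple[float, float]:
    """LOD = 3.3 * s_res / slope and LOQ = 10 * s_res / slope (assumed convention)."""
    slope, _, s_res = linear_fit(x, y)
    return 3.3 * s_res / slope, 10 * s_res / slope


# Toy calibration: spiked concentration (ng/ml) vs. light/heavy peak-area ratio.
conc = [1, 2, 3, 4]
response = [1.1, 1.9, 3.1, 3.9]
slope, intercept, s_res = linear_fit(conc, response)
lod, loq = lod_loq(conc, response)
print(round(slope, 3), round(lod, 3), round(loq, 3))
```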
Serum prostate-specific antigen (PSA) levels of 4 to 10 ng/mL are considered a diagnostic gray zone for detecting prostate cancer because biopsies reveal no evidence of cancer in 75% of these subjects. Our goal was to discover a new, highly specific biomarker for prostate cancer by analyzing plasma proteins with a proteomic technique. Enriched plasma proteins from 25 prostate cancer patients and 15 healthy controls were analyzed using a label-free quantitative shotgun proteomics platform called 2DICAL (2-dimensional image converted analysis of liquid chromatography and mass spectrometry), and candidate biomarkers were searched for. Of the 40,678 mass spectrometry (MS) peaks identified, 117 differed significantly between prostate cancer patients and healthy controls. Ten peaks matched carbonic anhydrase I (CAI) by tandem MS. Independent immunological assays revealed that plasma CAI levels in 54 prostate cancer patients were significantly higher than those in 60 healthy controls (P = 0.022, Mann-Whitney U test). In the PSA gray-zone group, discrimination of prostate cancer patients improved when plasma CAI levels were also considered. CAI may serve as a valuable plasma biomarker, and the combination of PSA and CAI may offer substantial advantages for diagnosing prostate cancer in patients with gray-zone PSA levels.
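The Mann-Whitney U test used above to compare plasma CAI levels can be written out with the standard library. This sketch uses the large-sample normal approximation without tie or continuity correction, so for small samples its p-values will differ from those of exact implementations; the CAI values shown are hypothetical.

```python
import math
from typing import List, Tuple


def mann_whitney_u(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test, normal approximation, no tie/continuity correction.

    Returns (U for sample x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    # U counts pairs in which the x value exceeds the y value; ties contribute one half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Hypothetical plasma CAI levels (arbitrary units), not study data.
cai_cancer = [5.2, 6.1, 7.4, 4.9, 6.8]
cai_control = [3.1, 4.0, 2.7, 5.0, 3.6]
print(mann_whitney_u(cai_cancer, cai_control))
```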
Aberrant interactions between the host and the intestinal bacteria are thought to contribute to the pathogenesis of many digestive diseases. However, studying the complex ecosystem at the human mucosal-luminal interface (MLI) is challenging and requires an integrative systems biology approach. Therefore, we developed a novel method integrating lavage sampling of the human mucosal surface, high-throughput proteomics, and a unique suite of bioinformatic and statistical analyses. Shotgun proteomic analysis of secreted proteins recovered from the MLI confirmed the presence of both human and bacterial components. To profile the MLI metaproteome, we collected 205 mucosal lavage samples from 38 healthy subjects and subjected them to high-throughput proteomics. The spectral data were passed through a rigorous data processing pipeline to optimize their suitability for quantitation and analysis, and were then evaluated using a set of biostatistical tools. Compared to the mucosal transcriptome, the MLI metaproteome was enriched for extracellular proteins involved in response to stimulus and immune system processes. Analysis of the metaproteome revealed significant individual-related as well as anatomic region-related (biogeographic) features. Quantitative shotgun proteomics established the identity and confirmed the biogeographic association of 49 proteins (including 3 functional protein networks) demarcating the proximal and distal colon. This robust and integrated proteomic approach is thus effective for identifying functional features of the human mucosal ecosystem and offers a fresh understanding of the basic biology and disease processes at the MLI.
Elimination of cancer through early detection and treatment is the ultimate goal of cancer research, and is especially critical for ovarian and other cancers that are typically diagnosed at very late stages and have very poor response rates. Proteomics has opened new avenues for the discovery of diagnostic and therapeutic targets. Immunoproteomics, which defines the subset of proteins involved in the immune response, holds considerable promise for providing a better understanding of the early stage immune response to cancer, as well as important insights into antigens that may be suitable for immunotherapy. Early administration of immunotherapeutic vaccines can potentially have profound effects on the prevention of metastasis and may achieve a cure through efficient and complete tumor elimination. We developed a mass-spectrometry-based method to identify novel autoantibody-based serum biomarkers for the early diagnosis of ovarian cancer. The method uses native tumor-associated proteins immunoprecipitated by autoantibodies from the sera of cancer patients and cancer-free controls to identify autoantibody signatures that occur at high frequency only in cancer patient sera. Interestingly, we identified a subset of more than 50 autoantigens that were also processed and presented by MHC class I molecules on the surfaces of ovarian cancer cells, and are thus common to the two immunological processes of humoral and cell-mediated immunity. These shared autoantigens were highly representative of protein families with roles in key processes in carcinogenesis and metastasis, such as cell cycle regulation, cell proliferation, apoptosis, tumor suppression and cell adhesion. Autoantibodies appearing at the early stages of cancer suggest that this detectable immune response to the developing tumor can be exploited as an early stage biomarker for the development of ovarian cancer diagnostics.
Correspondingly, because the T cell immune response depends on MHC class I processing and presentation of peptides, proteins that pass through this pathway are potential candidates for the development of immunotherapeutics designed to activate a T cell immune response to cancer. To the best of our knowledge, this is the first comprehensive study to identify and categorize proteins involved in both humoral and cell-mediated immunity against ovarian cancer, and it may have broad implications for the discovery and selection of theranostic molecular targets for cancer therapeutics and diagnostics in general.
Immunoproteomics; auto-antigens; ovarian cancer; immunotherapy; bio-marker; early diagnosis
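The selection of autoantigens that occur at high frequency only in patient sera, and their overlap with MHC class I presented proteins, can be sketched as a frequency filter followed by a set intersection. The frequency thresholds (at least 50% of case sera, no control sera) and all antigen and serum identifiers are hypothetical choices for illustration, not the criteria used in the study.

```python
from typing import Dict, Set


def frequency(sera: Dict[str, Set[str]], antigen: str) -> float:
    """Fraction of sera in which the antigen was immunoprecipitated."""
    return sum(antigen in found for found in sera.values()) / len(sera)


def cancer_specific_antigens(case_sera: Dict[str, Set[str]],
                             control_sera: Dict[str, Set[str]],
                             min_case_freq: float = 0.5,
                             max_control_freq: float = 0.0) -> Set[str]:
    """Antigens frequent in case sera and rare (here: absent) in control sera."""
    candidates = set().union(*case_sera.values())
    return {a for a in candidates
            if frequency(case_sera, a) >= min_case_freq
            and frequency(control_sera, a) <= max_control_freq}


cases = {"case1": {"AG1", "AG2"}, "case2": {"AG1"}, "case3": {"AG1", "AG3"}}
controls = {"ctrl1": {"AG2"}, "ctrl2": set()}
mhc_class_i = {"AG1", "AG4"}  # hypothetical source proteins of MHC class I ligands

specific = cancer_specific_antigens(cases, controls)
shared = specific & mhc_class_i  # candidates common to humoral and cell-mediated immunity
print(sorted(specific), sorted(shared))  # ['AG1'] ['AG1']
```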
Protein biomarkers are critical for the diagnosis, prognosis, and treatment of disease. The transition from protein biomarker discovery to verification can be a rate-limiting step in the clinical development of new diagnostics. Liquid chromatography-selected reaction monitoring mass spectrometry (LC-SRM MS) is becoming an important tool for biomarker verification studies in highly complex biological samples. Analyte enrichment or sample fractionation is often necessary to reduce sample complexity and improve the sensitivity of SRM for quantitation of clinically relevant biomarker candidates present in the low ng/mL range in blood. In this paper, we describe an alternative method of sample preparation for LC-SRM MS that does not rely on the availability of antibodies. This new platform is based on selective enrichment of proteotypic peptides from complex biological peptide mixtures via isoelectric focusing (IEF) on a digital ProteomeChip (dPC™) for SRM quantitation using a triple quadrupole (QQQ) instrument with an LC-Chip (Chip/Chip/SRM). To demonstrate the value of this approach, the Chip/Chip/SRM platform was optimized using prostate specific antigen (PSA) added to female plasma as a model system. The combination of immunodepletion of albumin and IgG with peptide fractionation on the dPC, followed by SRM analysis, resulted in a limit of quantitation of PSA added to female plasma of ~1–2.5 ng/mL with a CV of ~13%. The optimized platform was applied to measure PSA levels in the plasma of a small cohort of male patients with prostate cancer (PCa) and healthy matched controls, with concentrations ranging from 1.5 to 25 ng/mL. A good correlation (r² = 0.9459) was observed between standard clinical ELISA tests and the SRM-based assay. Our data demonstrate that the combination of IEF on the dPC and SRM (Chip/Chip/SRM) can be successfully applied for verification of low-abundance protein biomarkers in complex samples.
Isoelectric focusing; IEF; digital ProteomeChip; dPC; selected reaction monitoring; SRM; prostate specific antigen; PSA; QQQ; LC-Chip
Osteosarcoma (OSA) is the most common primary bone tumor of dogs and carries a poor prognosis despite aggressive treatment. An improved understanding of the biology of OSA is critically needed to allow for development of novel diagnostic, prognostic, and therapeutic tools. The surface-exposed proteome (SEP) of a cancerous cell includes a multifarious array of proteins critical to cellular processes such as proliferation, migration, adhesion, and inter-cellular communication. The specific aim of this study was to define a SEP profile of two validated canine OSA cell lines and a normal canine osteoblast cell line utilizing a biotinylation/streptavidin system to selectively label, purify, and identify surface-exposed proteins by mass spectrometry (MS) analysis. Additionally, we sought to validate a subset of our MS-based observations via quantitative real-time PCR, Western blot and semi-quantitative immunocytochemistry. Our hypothesis was that MS would detect differences in the SEP composition between the OSA and the normal osteoblast cells.
Shotgun MS identified 133 putative surface proteins when the output from all samples was combined, with good consistency between biological replicates. Eleven of the MS-detected proteins underwent gene expression analysis by PCR; all were actively transcribed, but their expression levels varied. Western blotting of whole cell lysates from all three cell lines was effective for thrombospondin-1, CYR61 and CD44, and indicated that all three proteins were present in each cell line. Semi-quantitative immunofluorescence indicated that CD44 was expressed at much higher levels on the surface of the OSA cell lines than on the normal osteoblast cell line.
The results of the present study identified numerous differences, and similarities, in the SEP of canine OSA cell lines and normal canine osteoblasts. The PCR, Western blot, and immunocytochemistry results, for the subset of proteins evaluated, were generally supportive of the mass spectrometry data. These methods may be applied to other cell lines, or other biological materials, to highlight unique and previously unrecognized differences between samples. While this study yielded data that may prove useful for OSA researchers and clinicians, further refinements of the described techniques are expected to yield greater accuracy and produce a more thorough SEP analysis.
Dog; Proteomics; Osteosarcoma; Mass spectrometry; Biotinylation
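Consistency between biological replicates in a protein identification experiment can be quantified, for example, with the pairwise Jaccard index of the identified protein sets, while the combined list is their union. The abstract does not say which consistency metric was used, so the Jaccard index is an assumption, and the replicate lists are placeholders.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def replicate_overlap(replicates: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Pairwise Jaccard index between all replicate protein lists."""
    return {(i, j): jaccard(replicates[i], replicates[j])
            for i, j in combinations(sorted(replicates), 2)}


# Placeholder identifications for two biological replicates.
reps = {"rep1": {"CD44", "THBS1", "CYR61"}, "rep2": {"CD44", "THBS1"}}
combined = set().union(*reps.values())  # output of all samples combined
print(len(combined), replicate_overlap(reps))
```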
The application of “omics” technologies to biological samples generates hundreds to thousands of biomarker candidates; however, a discouragingly small number make it through the pipeline to clinical use. This is in large part due to the incredible mismatch between the large numbers of biomarker candidates and the paucity of reliable assays and methods for validation studies. We desperately need a pipeline that relieves this bottleneck between biomarker discovery and validation. This paper reviews the requirements for technologies to adequately credential biomarker candidates for costly clinical validation and proposes methods and systems to verify biomarker candidates. Models involving pooling of clinical samples, where appropriate, are discussed. We conclude that current proteomic technologies are on the cusp of significantly affecting translation of molecular diagnostics into the clinic.
Biomarker verification; Multiple reaction monitoring; Targeted proteomics
Biomarkers are needed to overcome critical roadblocks in the development of disease-modifying therapeutics for neurodegenerative diseases. Evolving genome-wide expression technologies can comprehensively search for molecular biomarkers and allow fascinating insights into the expanding complexity of the human transcriptome. The technology has matured to the point where some applications are deemed reliable enough for use in patient care. In the neurosciences, it has led to the discoveries of osteopontin in multiple sclerosis and SORL1/LR11 in Alzheimer's, and recent studies indicate its potential for identifying neurogenomic biomarkers. Advances in pre-analytical and analytical methods are improving search efficiency and reproducibility and may lead to a pipeline of biomarker candidates suitable for development into future neurologic diagnostics.
gene expression; transcriptional profiling; microarray; biomarker; blood; biological fluids; variation; stability; reproducibility; validation; Parkinson's disease; Alzheimer's disease; multiple sclerosis; SORL1; LR11; α-synuclein
To critically review and illustrate current methodologic and statistical considerations for bladder cancer biomarker discovery and evaluation.
Original, review, and methodological articles, and editorials were reviewed and summarized.
Biomarkers may be useful at multiple stages of bladder cancer management: early detection, diagnosis, staging, prognosis, and treatment; however, few novel biomarkers are currently used in clinical practice. The reasons for this disjunction are manifold and reflect the long and difficult pathway from candidate biomarker discovery to clinical assay, and the lack of coherent and comprehensive processes (pipelines) for biomarker development. Conceptually, the development of new biomarkers should resemble therapeutic drug evaluation: a highly regulated process with carefully defined phases from discovery to human application. In a further effort to address the pervasive problem of inadequacies in the design, analysis, and reporting of biomarker prognostic studies, a set of reporting recommendations is discussed. For example, biomarkers should provide unique information that adds to known clinical and pathologic information. Conventional multivariable analyses are not sufficient to demonstrate improved prediction of outcomes. A predictive model that includes a new putative biomarker needs to show a clinically significant improvement in performance over the same model without it in order to claim any real benefit. Towards this end, proper model building, avoidance of overfitting, and external validation are crucial. In addition, it is important to choose appropriate performance measures dependent on outcome and prediction type and to avoid the use of cut-points. Biomarkers that yield a continuous score potentially provide more useful information than cut-points, since risk lies on a continuum. Combining complementary and independent biomarkers is likely to capture the biologic potential of a tumor better than any single biomarker. Finally, methods that incorporate clinical consequences, such as decision curve analysis, are crucial to the evaluation of biomarkers.
Attention to sound design and statistical practice should be delivered as early as possible and will help maximize the promise of biomarkers for patient care. Studies should include a measure of predictive accuracy and clinical decision-analysis. External validation using data from an independent cohort provides the strongest evidence that a model is valid. There is a need for adequately assessed clinical biomarkers in bladder cancer.
biomarker; diagnosis; prognosis; treatment; nomogram; decision-analysis; bladder cancer; statistics; statistical analysis
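Decision curve analysis, recommended above, compares a model's net benefit across threshold probabilities p_t with the default strategies of treating everyone and treating no one (net benefit 0). A minimal sketch using the standard net benefit formula, NB = TP/n − (FP/n) · p_t/(1 − p_t), is shown below. Note that p_t is the clinical risk threshold at which treatment is chosen, not a biomarker cut-point; the predicted risks and outcomes are toy values.

```python
from typing import Sequence


def net_benefit(risks: Sequence[float], outcomes: Sequence[int], pt: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold probability pt."""
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= pt and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(outcomes: Sequence[int], pt: float) -> float:
    """Reference strategy: treat every patient."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Toy predicted risks from a model and observed outcomes (1 = event).
risks = [0.9, 0.8, 0.3, 0.1]
outcomes = [1, 0, 1, 0]
for pt in (0.1, 0.2, 0.5):
    print(pt, net_benefit(risks, outcomes, pt), net_benefit_treat_all(outcomes, pt))
```

Plotting both curves over a clinically plausible range of p_t shows where, if anywhere, a model with the new biomarker adds value over treat-all and treat-none.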
Recent technological developments in proteomics have shown promise for identifying novel biomarkers of various diseases. Such technologies can investigate multiple samples and generate large numbers of data end-points. Two promising proteomics technologies are mass spectrometry, including instruments based on surface enhanced laser desorption/ionization, and protein microarrays. Proteomics data must, however, undergo analytical processing using bioinformatics. Because of limitations in proteomics tools, including shortcomings in bioinformatics analysis, predictive bioinformatics can be used as an alternative strategy before performing elaborate, high-throughput proteomics procedures. This review describes mass spectrometry, protein microarrays, and bioinformatics and their roles in biomarker discovery, and highlights the significance of integrating proteomics and bioinformatics.
proteomics; mass spectrometry; protein microarrays; surface enhanced laser desorption/ionization; bioinformatics
Predictive medicine, utilizing the ProteinChip® Array technology, will develop through the implementation of novel biomarkers and multimarker patterns for detecting disease, determining patient prognosis, monitoring drug effects such as efficacy or toxicity, and for defining treatment options. These biomarkers may also serve as novel protein drug candidates or protein drug targets. In addition, the technology can be used for discovering small molecule drugs or for defining their mode of action utilizing protein-based assays. In this review, we describe the following applications of the ProteinChip Array technology: (1) discovery and identification of novel inhibitors of HIV-1 replication, (2) serum and tissue proteome analysis for the discovery and development of novel multimarker clinical assays for prostate, breast, ovarian, and other cancers, and (3) biomarker and drug discovery applications for neurological disorders.
Verification of candidate biomarkers relies upon specific, quantitative assays optimized for selective detection of target proteins, and is increasingly viewed as a critical step in the discovery pipeline that bridges unbiased biomarker discovery to preclinical validation. Although individual laboratories have demonstrated that multiple reaction monitoring (MRM) coupled with isotope dilution mass spectrometry can quantify candidate protein biomarkers in plasma, reproducibility and transferability of these assays between laboratories have not been demonstrated. We describe a multilaboratory study to assess reproducibility, recovery, linear dynamic range and limits of detection and quantification of multiplexed, MRM-based assays, conducted by NCI-CPTAC. Using common materials and standardized protocols, we demonstrate that these assays can be highly reproducible within and across laboratories and instrument platforms, and are sensitive to low µg/ml protein concentrations in unfractionated plasma. We provide data and benchmarks against which individual laboratories can compare their performance and evaluate new technologies for biomarker verification in plasma.
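Reproducibility within and across laboratories is commonly summarized as a coefficient of variation (CV = SD / mean). The sketch below computes intra-laboratory CVs from replicate measurements and an inter-laboratory CV from the per-laboratory means. The laboratory names and concentrations are hypothetical. This is a simplified illustration, not the exact statistical procedure of the multilaboratory study.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent (sample SD / mean * 100)."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical replicate concentrations (ug/mL) of one peptide, measured by
# MRM with isotope-labelled internal standard in three laboratories.
measurements = {
    "lab_A": [2.0, 2.1, 1.9, 2.0],
    "lab_B": [2.2, 2.3, 2.1, 2.2],
    "lab_C": [1.8, 1.9, 1.9, 1.8],
}

# Intra-laboratory CV: spread of replicates inside each laboratory.
intra_lab_cv = {lab: cv_percent(v) for lab, v in measurements.items()}
# Inter-laboratory CV: spread of the per-laboratory means.
inter_lab_cv = cv_percent([mean(v) for v in measurements.values()])

for lab, cv in intra_lab_cv.items():
    print(f"{lab}: intra-lab CV = {cv:.1f}%")
print(f"inter-lab CV = {inter_lab_cv:.1f}%")
```

With these made-up numbers, each laboratory is internally tight (CVs of about 3 to 4%), but systematic offsets between sites raise the inter-laboratory CV to roughly 9%. Transferability studies are designed to expose exactly this kind of between-site difference.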
Annotated formalin-fixed, paraffin-embedded (FFPE) tissue archives constitute a valuable resource for retrospective biomarker discovery. However, proteomic exploration of archival tissue is impeded by extensive formalin-induced covalent cross-linking, so robust methodology enabling proteomic profiling of archival resources is needed. Recent work is beginning to support the feasibility of biomarker discovery in archival tissues, but further developments in extraction methods compatible with quantitative approaches are urgently needed. We report a cost-effective extraction methodology permitting quantitative proteomic analysis of small amounts of FFPE tissue for biomarker investigation. This surfactant/heat-based approach results in effective and reproducible protein extraction from FFPE tissue blocks. In combination with a liquid chromatography−mass spectrometry-based label-free quantitative proteomics methodology, the protocol enables robust, representative, and quantitative analysis of the archival proteome. Preliminary validation studies in renal cancer tissues have typically identified 250−300 proteins per 500 ng of tissue by 1D LC−MS/MS. Extraction from FFPE and fresh frozen tissue blocks was comparable, and tumor/normal differential expression patterns were preserved (205 proteins, r = 0.682; p < 10^-15). The initial methodology presented here provides a quantitative approach for assessing the potential suitability of the vast FFPE tissue archives as an alternative resource for biomarker discovery, and it will allow exploration of methods to increase depth of coverage and investigate the impact of preanalytical factors.
We report a cost-effective, single-tube extraction methodology permitting quantitative proteomic analysis of small amounts of FFPE tissue for biomarker investigation. This surfactant/heat-based approach results in effective and reproducible protein extraction from FFPE tissue blocks. In combination with liquid chromatography−mass spectrometry-based label-free quantitation, the protocol enables robust, representative, and quantitative analysis of the archival proteome. Preliminary validation of the methodology in renal cancer tissues is presented.
Formalin-fixed paraffin-embedded (FFPE) tissue; proteomics; label-free quantitation; renal cell carcinoma; tissue biomarkers; biomarker discovery; archival tissue
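The agreement between FFPE and fresh frozen tissue reported above (r = 0.682 over 205 proteins) is a correlation of tumor/normal differential expression between the two sample types. A minimal sketch of that comparison follows. It computes log2(tumor/normal) per protein for each tissue type and then a Pearson correlation over the proteins quantified in both. The protein IDs and intensities are hypothetical. Using log2 ratios and Pearson's r is an assumption about the form of the comparison, not a description of the authors' exact pipeline.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log2_ratios(intensities):
    """log2(tumor / normal) per protein from (tumor, normal) intensity pairs."""
    return {p: math.log2(t / n) for p, (t, n) in intensities.items()}


# Hypothetical label-free intensities (tumor, normal) per protein.
ffpe = {"P1": (200.0, 100.0), "P2": (50.0, 100.0), "P3": (400.0, 100.0),
        "P4": (100.0, 100.0), "P5": (25.0, 100.0)}
frozen = {"P1": (160.0, 100.0), "P2": (60.0, 100.0), "P3": (300.0, 100.0),
          "P4": (120.0, 100.0), "P5": (30.0, 100.0)}

ffpe_lr, frozen_lr = log2_ratios(ffpe), log2_ratios(frozen)
shared = sorted(ffpe_lr.keys() & frozen_lr.keys())  # proteins quantified in both
r = pearson_r([ffpe_lr[p] for p in shared], [frozen_lr[p] for p in shared])
print(f"{len(shared)} shared proteins, r = {r:.3f}")
```

Working on log ratios makes up- and down-regulation symmetric around zero. Without the log transform, a few strongly up-regulated proteins would dominate the correlation.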
Unbiased discovery proteomics strategies have the potential to identify large numbers of novel biomarkers that can improve diagnostic and prognostic testing in a clinical setting and may help guide therapeutic interventions. When large numbers of candidate proteins are identified, it may be difficult to validate candidate biomarkers in a timely and efficient fashion from patient plasma samples that are event-driven, of finite volume and irreplaceable, such as at the onset of acute graft-versus-host disease (GVHD), a potentially life-threatening complication of allogeneic hematopoietic stem cell transplantation (HSCT).
Here we describe the process of performing commercially available ELISAs for six validated GVHD proteins, IL-2Rα [5], TNFR1 [6], HGF [7], IL-8 [8], elafin [2], and REG3α [3] (also known as PAP1), in a sequential fashion to minimize freeze-thaw cycles, thawed plasma time, and plasma usage. In this procedure, the ELISAs are performed in a sequential order determined by the sample dilution factor established in our laboratory. We use the manufacturers' ELISA kits and protocols, with minor adjustments to facilitate optimal sequential performance. The resulting plasma biomarker concentrations can then be compiled and analyzed for significant findings within a patient cohort. Although these biomarkers are currently for research use only, their incorporation into clinical care is being investigated in clinical trials.
This technique can be applied to perform ELISAs for multiple proteins/cytokines of interest on the same sample(s) provided the samples do not need to be mixed with other reagents. If ELISA kits do not come with pre-coated plates, 96-well half-well plates or 384-well plates can be used to further minimize use of samples/reagents.
Medicine; Issue 68; ELISA; Sequential ELISA; Cytokine; Blood plasma; biomarkers; proteomics; graft-versus-host disease; Small sample; Quantification
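The plasma cost of a sequential run comes down to simple arithmetic. Each assay consumes wells × well volume / dilution factor of neat plasma per sample. The sketch below orders a set of assays by dilution factor and totals the plasma needed. The assay names, dilution factors, well counts, and volumes are hypothetical placeholders, not the kit-specific values used in the protocol. The "most dilute first" ordering is likewise an illustrative assumption, since the protocol's actual order is not given here. Dead volume and pipetting overage are ignored.

```python
from dataclasses import dataclass


@dataclass
class Assay:
    name: str              # hypothetical assay label
    dilution_factor: int   # e.g. 10 means 1 part plasma in 10 parts total
    wells: int             # replicate wells per sample
    well_volume_ul: float  # diluted sample loaded per well (uL)


def plasma_needed_ul(assay: Assay) -> float:
    """Neat plasma consumed by one sample in one assay (no dead volume)."""
    return assay.wells * assay.well_volume_ul / assay.dilution_factor


assays = [
    Assay("marker_A", dilution_factor=2, wells=2, well_volume_ul=50.0),
    Assay("marker_B", dilution_factor=100, wells=2, well_volume_ul=50.0),
    Assay("marker_C", dilution_factor=10, wells=2, well_volume_ul=100.0),
]

# Hypothetical rule: run the most dilute assay first.
run_order = sorted(assays, key=lambda a: a.dilution_factor, reverse=True)
total_ul = sum(plasma_needed_ul(a) for a in run_order)

for a in run_order:
    print(f"{a.name}: 1:{a.dilution_factor}, {plasma_needed_ul(a):.1f} uL plasma")
print(f"total neat plasma per sample: {total_ul:.1f} uL")
```

With these placeholder numbers, the assay run at the lowest dilution dominates plasma use. That is why the dilution factor is the natural key for planning a sequential run from a single thawed aliquot.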
Shotgun proteome analysis platforms based on multidimensional liquid chromatography-tandem mass spectrometry (LC-MS/MS) provide a powerful means to discover biomarker candidates in tissue specimens. Analysis platforms must balance sensitivity for peptide detection, reproducibility of detected peptide inventories, and analytical throughput for the protein amounts commonly present in tissue biospecimens (<100 µg), such that platform stability is sufficient to detect modest changes in complex proteomes. We compared shotgun proteomics platforms by analyzing tryptic digests of whole cell and tissue proteomes using strong cation exchange (SCX) and isoelectric focusing (IEF) separations of peptides prior to LC-MS/MS analysis on an LTQ-Orbitrap hybrid instrument. IEF separations provided superior reproducibility and resolution for peptide fractionation from samples corresponding to both large (100 µg) and small (10 µg) protein inputs. SCX generated more peptide and protein identifications than IEF with small (10 µg) samples, whereas the two platforms yielded similar numbers of identifications with large (100 µg) samples. In nine replicate analyses of tryptic peptides from 50 µg of colon adenocarcinoma protein, the proteins detected by both platforms accounted for 77% of all proteins detected by either method. IEF approached maximal detection more quickly, with 90% of IEF-detectable medium-abundance proteins (those detected with a total of 3–4 peptides) detected within three replicate analyses. In contrast, the SCX platform required six replicates to detect 90% of SCX-detectable medium-abundance proteins. The high reproducibility and efficient resolution of IEF peptide separations make the IEF platform superior to the SCX platform for biomarker discovery via shotgun proteomic analysis of tissue specimens.
shotgun proteomics; isoelectric focusing; ion exchange; LTQ-Orbitrap; cancer
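The two comparison metrics above are straightforward set operations on protein identifications. Platform overlap is shared proteins divided by all proteins detected by either platform. Replicate saturation is the number of cumulative replicates needed to reach 90% of everything a platform detects. The sketch below uses hypothetical protein IDs and replicate contents. It also omits the restriction to medium-abundance proteins (3–4 peptides) used in the study.

```python
def overlap_percent(a: set, b: set) -> float:
    """Shared proteins as a percentage of all proteins detected by either set."""
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


def replicates_to_fraction(replicates: list, fraction: float = 0.9) -> int:
    """Smallest number of replicates whose cumulative union covers `fraction`
    of all proteins detected across every replicate."""
    total = set().union(*replicates)
    seen = set()
    for i, rep in enumerate(replicates, start=1):
        seen |= rep
        if len(seen) >= fraction * len(total):
            return i
    return len(replicates)


# Hypothetical identifications per replicate run.
ief_reps = [
    {"P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"},
    {"P1", "P2", "P3", "P4", "P5", "P6", "P7", "P9"},
    {"P1", "P2", "P3", "P4", "P5", "P6", "P8", "P10"},
]
scx_reps = [
    {"P1", "P2", "P3", "P4", "P11"},
    {"P1", "P2", "P5", "P6", "P12"},
    {"P3", "P4", "P7", "P8"},
    {"P1", "P9", "P13"},
]

ief_all, scx_all = set().union(*ief_reps), set().union(*scx_reps)
overlap = overlap_percent(ief_all, scx_all)
print(f"overlap: {overlap:.1f}% of all proteins detected")
print("IEF replicates to 90%:", replicates_to_fraction(ief_reps))
print("SCX replicates to 90%:", replicates_to_fraction(scx_reps))
```

In this toy data, the more reproducible platform (IEF-like) re-detects mostly the same proteins in every run, so it saturates in fewer replicates. This is the behavior the abstract reports for IEF.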
Proteogenomics has the potential to advance genome annotation through high-quality peptide identifications derived from mass spectrometry experiments, which demonstrate that a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function by discovering novel genes and gene structures that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates, leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. This error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five “incorrect” targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and the PSMs reported. These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes, while still controlling for
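The target:decoy estimate discussed above can be sketched in a few lines. Sort PSMs by score, estimate FDR at each threshold as the number of decoys divided by the number of targets scoring at or above it, and convert to q-values (the minimum FDR at that threshold or any less stringent one). This sketch assumes a concatenated target-decoy search, higher-is-better scores, and no special handling of tied scores. The PSM list is hypothetical, and PEP estimation is not shown.

```python
from typing import List, Tuple


def q_values(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) sorted by descending score."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # FDR estimate at this threshold: decoys / targets above it.
        fdrs.append(decoys / max(targets, 1))
    # q-value: smallest FDR achievable at this threshold or a looser one.
    qs = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qs[i] = running_min
    return [(s, d, q) for (s, d), q in zip(ranked, qs)]


def accepted_targets(psms, fdr_threshold: float = 0.01) -> int:
    """Number of target PSMs accepted at the given q-value threshold."""
    return sum(1 for _s, d, q in q_values(psms) if not d and q <= fdr_threshold)


# Hypothetical PSMs: (search score, is_decoy).
psms = [(95.0, False), (90.0, False), (88.0, False), (85.0, True),
        (80.0, False), (78.0, False), (70.0, True), (65.0, False),
        (60.0, True), (55.0, False)]

print("accepted at 1% FDR:", accepted_targets(psms, 0.01))
print("accepted at 25% FDR:", accepted_targets(psms, 0.25))
```

The estimate depends only on how target and decoy scores are distributed. Changing the database, for example by adding frames that are unlikely to code for protein, alters how many decoys score highly relative to targets and therefore how many target PSMs pass a fixed threshold.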
proteogenomics; peptide spectrum match; false discovery rate; posterior error probability; expressed