The principal goal in proteomics is to extract biologically or clinically meaningful information from large-scale studies in order to provide new insights into fundamental biological processes or to find new means of diagnosing or treating disease. Many labs now have methods and machinery in place that make possible the robust generation of many thousands of protein identifications each day. This, however, has exposed a new major bottleneck in the proteomics workflow: the problem of analyzing the wealth of protein identifications to find the relatively few proteins that actually lend support to biological hypotheses or that have potential medical relevance.
This presentation shows the application of a new bioinformatics tool that almost entirely removes this bottleneck. The new technology was specifically developed to help researchers gain a fast overview of biologically relevant features in vast protein datasets and rapidly zoom in on single proteins or subsets of proteins of particular interest. In a matter of minutes, the output from the MS database search software was turned into more biologically meaningful information. This was done through sequential steps that: (1) collapsed all protein redundancies into non-redundant lists; (2) filtered and sorted these lists based on experimental observations and automatically added biological sequence annotation; and (3) compared and combined lists of annotated proteins to elucidate differences and overlaps between multiple experimental datasets.
The presentation will show through examples how the new technology can be used in proteomics experiments to accelerate the otherwise tedious process of making biological sense of lists of protein accession codes. We present the efficient data mining and categorizing of several large datasets of proteins, including data imported from the PRIDE database and HUPO projects.
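The three sequential steps described for this tool (collapse redundancies, filter/sort on annotation, compare lists) can be illustrated with a minimal sketch. It is not the tool's actual implementation: the input layout (accession, protein group, peptide count), the rule of keeping the accession with the most peptides per group, and the function names `collapse_redundant`, `filter_and_sort` and `compare_lists` are all hypothetical choices made for illustration.

```python
# Hypothetical input: rows of (accession, protein_group, peptide_count) as they
# might be exported from an MS database search; the grouping field is assumed.

def collapse_redundant(hits):
    """Step 1: keep one representative accession per protein group,
    choosing the accession with the most supporting peptides."""
    best = {}
    for accession, group, peptides in hits:
        if group not in best or peptides > best[group][1]:
            best[group] = (accession, peptides)
    return {accession: peptides for accession, peptides in best.values()}


def filter_and_sort(proteins, annotations, keyword, min_peptides=2):
    """Step 2: keep proteins whose annotation text contains `keyword` and that
    meet a minimum peptide count, sorted by peptide count (highest first)."""
    kept = [
        (accession, peptides)
        for accession, peptides in proteins.items()
        if peptides >= min_peptides and keyword in annotations.get(accession, "")
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def compare_lists(list_a, list_b):
    """Step 3: overlap and differences between two non-redundant protein lists."""
    a, b = set(list_a), set(list_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Usage with toy data (accessions and annotations are made up).
run_a = collapse_redundant([("P1", "g1", 5), ("P1b", "g1", 3), ("P2", "g2", 1), ("P3", "g3", 4)])
run_b = collapse_redundant([("P1", "g1", 2), ("P4", "g4", 6)])
annotations = {"P1": "kinase; membrane", "P3": "kinase", "P4": "transporter"}
print(filter_and_sort(run_a, annotations, "kinase"))  # [('P1', 5), ('P3', 4)]
print(compare_lists(run_a, run_b)["shared"])          # {'P1'}
```

Representing each non-redundant list as a dictionary keyed by accession makes the comparison step a matter of plain set operations, which scales easily to many datasets.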
Advances in proteomics technology offer great promise for understanding and treating the molecular basis of disease. The past decade of proteomics research (the study of dynamic protein expression, post-translational modifications, cellular and sub-cellular protein distribution, and protein-protein interactions) has culminated in the identification of many disease-related biomarkers and potential new drug targets. While proteomics remains the tool of choice for discovery research, new innovations in proteomic technology now offer the potential for proteomic profiling to become standard practice in the clinical laboratory. Indeed, protein profiles can serve as powerful diagnostic markers and can predict treatment outcome in many diseases, in particular cancer. A number of technical obstacles remain before routine proteomic analysis can be achieved in the clinic; however, the standardisation of methodologies and the dissemination of proteomic data into publicly available databases are starting to overcome these hurdles. At present, the most promising application of proteomics is the screening of specific subsets of protein biomarkers for certain diseases, rather than large-scale full protein profiling. Armed with these technologies, we stand on the threshold of an era of individualised, patient-tailored therapy. This review summarises the advances in proteomics that have propelled us into this exciting age of clinical proteomics, and highlights the future work required for this to become a reality.
The accelerated growth of proteomics data presents both opportunities and
challenges. Large-scale proteomic profiling of biological samples such as cells,
organelles or biological fluids has led to the discovery of numerous key and novel
proteins involved in many biological/disease processes including cancers, as
well as to the identification of novel disease biomarkers and potential
therapeutic targets. While proteomic data analysis has been greatly assisted by
the many bioinformatics tools developed in recent years, a careful analysis of
the major steps and flow of data in a typical high-throughput analysis reveals a
few gaps that still need to be filled to fully realize the value of the data. To
facilitate functional and pathway discovery for large-scale proteomic data, we
have developed an integrated proteomic expression analysis system, iProXpress,
which facilitates protein identification using a comprehensive sequence library
and functional interpretation using integrated data. With its modular design,
iProXpress complements and can be integrated with other software in a proteomic
data analysis pipeline. This novel approach to complex biological questions
involves the interrogation of multiple data sources, thereby facilitating
hypothesis generation and knowledge discovery from genomic-scale studies and
fostering disease diagnosis and drug development.
Proteomic profiling; high-throughput analysis; biomarkers; bioinformatic tools; iProXpress; sequence library; pathway discovery; stage specific proteins
The flatworm Schistosoma mansoni is a blood fluke parasite that causes schistosomiasis, a debilitating disease that occurs throughout the developing world. Current schistosomiasis control strategies are mainly based on chemotherapy, but many researchers believe that the best long-term strategy to control schistosomiasis is immunization with an antischistosomiasis vaccine combined with drug treatment. Several papers on Schistosoma mansoni vaccine and drug development have been published in the past few years, reflecting the importance of this field of study. The advent of technologies that allow large-scale studies of genes and proteins has had a remarkable impact on the screening of new and potential vaccine candidates in schistosomiasis. In this postgenomic scenario, bioinformatic technologies have emerged as important tools to mine transcriptomic, genomic, and proteomic databases. These new perspectives are leading to a new round of rational vaccine development. Herein, we discuss different strategies to identify potential S. mansoni vaccine candidates using computational vaccinology.
Recent technological developments in proteomics have shown promising initiatives in identifying novel biomarkers of various diseases. Such technologies are capable of investigating multiple samples and generating large numbers of data end-points. Two promising examples are mass spectrometry, including instruments based on surface-enhanced laser desorption/ionization, and protein microarrays. Proteomics data must, however, undergo analytical processing using bioinformatics. Because of limitations in proteomics tools, including shortcomings in bioinformatics analysis, predictive bioinformatics can be used as an alternative strategy before performing elaborate, high-throughput proteomics procedures. This review describes mass spectrometry, protein microarrays, and bioinformatics and their roles in biomarker discovery, and highlights the significance of integration between proteomics and bioinformatics.
proteomics; mass spectrometry; protein microarrays; surface enhanced laser desorption/ionization; bioinformatics
Since the advent of the new proteomics era more than a decade ago, large-scale studies of protein profiling have been used to identify distinctive molecular signatures in a wide array of biological systems, spanning areas of basic biological research, clinical diagnostics, and biomarker discovery directed toward therapeutic applications. Recent advances in protein separation and identification techniques have significantly improved proteomic approaches, leading to enhancement of the depth and breadth of proteome coverage.
Proteomic signatures specific for multiple diseases, including cancer and pre-invasive lesions, are emerging. This article summarizes, in simple terms, relevant proteomic and other omics clues used in the discovery and development of diagnostic and prognostic biomarkers that are applicable to all clinical fields, thereby helping to improve the application of clinical proteomic strategies in translational medicine research.
Proteomics is the large-scale study of the structure and function of proteins in complex biological samples. Such an approach has the potential to elucidate the complex nature of the organism. Current proteomic tools allow large-scale, high-throughput analyses for the detection, identification, and functional investigation of proteomes. Advances in protein fractionation and labeling techniques have extended protein identification to include the least abundant proteins. In addition, proteomics has been complemented by the analysis of posttranslational modifications and by techniques for the quantitative comparison of different proteomes. However, the major limitation of proteomic investigations remains the complexity of biological structures and physiological processes, which makes the path of exploration full of difficulties and pitfalls. The quantity of data acquired with new techniques places new demands on data processing and analysis. This article provides a brief overview of currently available proteomic techniques and their applications, followed by a detailed description of their advantages and technical challenges. Some solutions to circumvent technical difficulties are proposed.
Following the genomic era, proteomics encompasses a wide variety of techniques that study the protein content of cells, tissues, or organisms and that allow the isolation of proteins of interest. It offers a choice between gel-based and gel-free (shotgun) methods. Applications of proteomic technology may address three principal objectives in several biomedical and clinical research domains, such as osteoarthritis: (i) to understand the physiopathology or underlying mechanisms leading to a disease or associated with a particular model, (ii) to find disease-specific biomarkers, and (iii) to identify new therapeutic targets. This review gathers most of the data regarding proteomic techniques and their applications to osteoarthritis research. It also reports technical limitations and solutions, for example in sample preparation. Proteomics opens wide perspectives in biochemical research, but many technical matters remain to be solved.
Since the advent of the new proteomics era more than a decade ago, large-scale studies of protein profiling have been exploited to identify the distinctive molecular signatures in a wide array of biological systems spanning areas of basic biological research, various disease states, and biomarker discovery directed toward therapeutic applications. Recent advances in protein separation and identification techniques have significantly improved proteomics approaches, leading to enhancement of the depth and breadth of proteome coverage. Proteomic signatures specific for invasive lung cancer and preinvasive lesions have begun to emerge. In this review we provide a critical assessment of the state of recent advances in proteomic approaches and the biological lessons they have yielded, with specific emphasis on the discovery of biomarker signatures for the early detection of lung cancer.
proteomics; biomarker; early detection; lung cancer
A perspective overview is given describing the current development of multiplex mass spectrometry assay technology platforms used for high-throughput clinical sample analysis. The development of targeted therapies with novel personalized medicine drugs will require new tools for monitoring efficacy and outcome that rely both on the quantification of biomarkers related to disease progression and on the measurement of disease-specific pathway/signaling proteins.
Bioinformatics developments play a central role in clinical proteomics, where targeted peptide expression in health and disease is investigated in small-, medium- and large-scale clinical studies.
An outline is presented describing applications of the selected reaction monitoring (SRM) mass spectrometry assay principle. This assay format enables the simultaneous measurement of multiple protein biomarkers and is developing rapidly throughout the community. The Human Proteome Organization (HUPO) recently launched the Human Proteome Project (HPP), which will map the organization of proteins on a chromosome-by-chromosome basis using the SRM technology platform. Specific examples are presented of an SRM multiplex quantitative assay platform dedicated to the cardiovascular disease area, screening the Apo A1, Apo A4, Apo B, Apo CI, Apo CII, Apo CIII, Apo D, Apo E, Apo H, and CRP biomarkers used in daily diagnostic routines in clinical hospitals globally. We also provide data from prostate cancer studies that have identified a variety of PSA isoforms characterized by high-resolution separation interfaced to mass spectrometry.
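The abstract does not specify how the multiplexed SRM panel is quantified. One common approach, assumed here rather than taken from the source, is stable-isotope dilution: a known amount of a heavy-labelled version of each target peptide is spiked into the sample, and the endogenous amount is estimated from the light/heavy ratio of transition peak areas. The sketch below shows only that arithmetic; the peptide labels, peak areas, spike amounts and function names are hypothetical.

```python
# Stable-isotope-dilution sketch: each biomarker is measured through one
# peptide, with a known amount of heavy-labelled standard spiked in.
# Peptide labels and numbers below are hypothetical.

def quantify_peptide(light_areas, heavy_areas, heavy_fmol):
    """Estimate the endogenous (light) peptide amount in fmol.

    Sums integrated peak areas over the monitored transitions and assumes the
    light and heavy forms co-elute and respond identically, so their area
    ratio equals their amount ratio.
    """
    heavy_total = sum(heavy_areas)
    if heavy_total <= 0:
        raise ValueError("heavy standard not detected")
    return sum(light_areas) / heavy_total * heavy_fmol


def quantify_panel(panel):
    """Apply quantify_peptide to every biomarker in a multiplexed SRM run."""
    return {
        marker: quantify_peptide(light, heavy, spike)
        for marker, (light, heavy, spike) in panel.items()
    }


panel = {
    "ApoA1 peptide": ([1200.0, 800.0], [1000.0, 1000.0], 50.0),
    "CRP peptide": ([300.0, 100.0], [800.0, 800.0], 10.0),
}
print(quantify_panel(panel))  # {'ApoA1 peptide': 50.0, 'CRP peptide': 2.5}
```

Because every biomarker carries its own internal standard, the same calculation applies unchanged to each entry of the panel, which is what makes the assay straightforward to multiplex.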
For the past decade, the development of genomic technology has revolutionized modern biological research and drug discovery. Functional genomic analyses enable biologists to analyze genetic events on a global scale, and they have been widely used in gene discovery, biomarker determination, disease classification, and drug target identification. In this article, we provide an overview of the current and emerging tools involved in genomic studies, including expression arrays, microRNA arrays, array CGH, ChIP-on-chip, methylation arrays, mutation analysis, genome-wide association studies, proteomic analysis, integrated functional genomic analysis and related bioinformatic and biostatistical analyses. Using human liver cancer as an example, we provide further information on how these genomic approaches can be applied in cancer research.
Functional genomics; arrays; cancer
Proteomics is a rapidly developing field that opens new horizons in many research areas of the life sciences. In medicine, proteomics promises to accelerate the discovery of new drug targets and protein disease markers useful for in vitro diagnosis. In this article, we review the current proteomics technologies for biomarker discovery and validation, which include two-dimensional gel electrophoresis, one- and two-dimensional liquid chromatography, and proteomic microarrays. We also review proteomic strategies for studying protein–protein interactions and identifying post-translational modifications. Selecting the most effective technology, or combination of technologies, is required to maximize the interpretation and utility of the data.
2D electrophoresis; multiple dimensional chromatography; post-translational modification; proteomic microarrays; proteomics
Proteomics technologies have revolutionized cell biology and biochemistry by providing powerful new tools to characterize complex proteomes, multiprotein complexes and post-translational modifications. Although proteomics technologies could address important problems in clinical and translational cancer research, attempts to use proteomics approaches to discover cancer biomarkers in biofluids and tissues have been largely unsuccessful and have given rise to considerable skepticism. The National Cancer Institute has taken a leading role in facilitating the translation of proteomics from research to clinical application through its Clinical Proteomic Technologies for Cancer initiative. This article highlights the building of a more reliable and efficient protein biomarker development pipeline that incorporates three steps: discovery, verification and qualification. In addition, we discuss the merits of multiple reaction monitoring mass spectrometry, a multiplex targeted proteomics platform that has emerged as a potentially promising, high-throughput protein biomarker measurement technology for preclinical ‘verification’.
biomarker; multiple reaction monitoring mass spectrometry; proteomics; verification
Proteomics has been proposed as one of the key technologies in the postgenomic era. So far, however, the comprehensive analysis of cellular proteomes has been a challenge because of the dynamic nature and complexity of the multitude of proteins in cells and tissues. Various approaches have been established for analyzing the proteins in a cell at a given state, and mass spectrometry (MS) has proven to be an efficient and versatile tool. MS-based proteomics approaches have advanced significantly beyond the initial identification of proteins to the comprehensive characterization and quantification of proteomes and their posttranslational modifications (PTMs). Despite these advances, new technologies are still being developed to profile and analyze cellular proteomes more completely and efficiently. In this review, we focus on MS-based techniques, describe basic approaches for MS-based profiling of cellular proteomes and analysis methods to identify proteins in complex mixtures, and discuss the different approaches for quantitative proteome analysis. Finally, we briefly discuss novel developments for the analysis of PTMs. Altered levels of PTMs, sometimes in the absence of changes in protein expression, are often linked to cellular responses and disease states, and the comprehensive analysis of a cellular proteome would not be complete without identifying and quantifying the extent of protein PTMs.
quantitative proteomics; isotopic labeling; phosphoproteomics
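As an illustration of the quantitative, isotope-labelling approaches this review discusses, the sketch below computes heavy/light log2 ratios per protein and median-centres them. Median normalization is one common choice, not necessarily the one the review recommends; it assumes most proteins are unchanged between the two labelled states. The input layout, the 2-fold cutoff, and the function names are hypothetical.

```python
import math
from statistics import median


def normalized_log2_ratios(intensities):
    """intensities: {protein: (heavy, light)} from an isotopic-labelling run.

    Returns log2(heavy/light) per protein, median-centred so that the typical
    protein has ratio 0 (assumes most proteins are unchanged).
    Proteins with a missing (zero) channel are skipped.
    """
    raw = {
        protein: math.log2(heavy / light)
        for protein, (heavy, light) in intensities.items()
        if heavy > 0 and light > 0
    }
    centre = median(raw.values())
    return {protein: ratio - centre for protein, ratio in raw.items()}


def changed(ratios, min_abs_log2=1.0):
    """Proteins whose normalized ratio reaches a fold-change cutoff (2-fold by default)."""
    return {protein for protein, ratio in ratios.items() if abs(ratio) >= min_abs_log2}


ratios = normalized_log2_ratios({"A": (200, 100), "B": (100, 100), "C": (400, 100), "D": (0, 50)})
print(ratios)                   # {'A': 0.0, 'B': -1.0, 'C': 1.0}
print(sorted(changed(ratios)))  # ['B', 'C']
```

The same ratio calculation could in principle be applied to PTM-site intensities, although site-level data usually also need correcting for changes in the abundance of the parent protein.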
The revolutionary growth in computation speed and memory storage capacity has fueled a new era in the analysis of biological data. Hundreds of microbial genomes and many eukaryotic genomes, including a cleaner draft of the human genome, have been sequenced, raising the expectation of better control of microorganisms. The goals are lofty: the development of rational drugs and antimicrobial agents, of new enhanced bacterial strains for bioremediation and pollution control, of better and easier-to-administer vaccines, and of protein biomarkers for various bacterial diseases, as well as a better understanding of host-bacteria interaction to prevent bacterial infections. In the last decade, the development of many new bioinformatics techniques and integrated databases has facilitated the realization of these goals. Current research in bioinformatics can be classified into: (i) genomics, the sequencing and comparative study of genomes to identify gene and genome functionality; (ii) proteomics, the identification and characterization of protein-related properties and the reconstruction of metabolic and regulatory pathways; (iii) cell visualization and simulation to study and model cell behavior; and (iv) application to the development of drugs and antimicrobial agents. In this article, we focus on the techniques and their limitations in genomics and proteomics. Bioinformatics research can be classified under three major approaches: (1) analysis based on available experimental wet-lab data, (2) the use of mathematical modeling to derive new information, and (3) an integrated approach that combines search techniques with mathematical modeling.
The major impact of bioinformatics research has been the automation of genome sequencing, the automated development of integrated genomics and proteomics databases, automated genome comparisons to identify genome function, automated derivation of metabolic pathways, and gene expression analysis to derive regulatory pathways. Further contributions include statistical, clustering and data-mining techniques to derive protein-protein and protein-DNA interactions; modeling of the 3D structure of proteins and 3D docking between proteins and biochemicals for rational drug design; difference analysis between pathogenic and non-pathogenic strains to identify candidate genes for vaccines and antimicrobial agents; and whole-genome comparison to understand microbial evolution. The development of bioinformatics techniques has accelerated biological discovery through the automated analysis of large numbers of microbial genomes. We are on the verge of using all this knowledge to understand cellular mechanisms at the systemic level. By pruning out dead ends, the bioinformatics techniques developed so far have the potential to facilitate (i) the discovery of causes of diseases, (ii) vaccine and rational drug design, and (iii) improved, cost-effective agents for bioremediation. Despite this fast-paced global effort, current analysis is limited by the lack of gene-functionality data from wet-lab experiments, the lack of computer algorithms to explore vast amounts of data with unknown functionality, the limited availability of protein-protein and protein-DNA interaction data, and the lack of knowledge of the temporal and transient behavior of genes and pathways.
Genome sequencing and bioinformatics have provided the full hypothetical proteome of many pathogenic organisms. Advances in microarray and mass spectrometry technologies have also yielded large output datasets of possible target proteins/genes. However, the challenge remains to identify new targets for drug discovery from this wealth of information. Further analysis uses bioinformatics and/or molecular biology tools to validate the findings. This is time-consuming and expensive, and could fail to yield novel drugs if protein purification and crystallography are impossible. To pre-empt this, a researcher may want to rapidly filter the output datasets for proteins that show good homology to proteins that have already been structurally characterised or that are already targets of known drugs. Critically, researchers developing novel antibiotics need to exclude proteins that show close homology to any human protein, as future inhibitors are likely to cross-react with the host protein, causing off-target toxicity effects later in clinical trials.
To solve many of these issues, we have developed a free online resource called Genomes2Drugs, which ranks sequences to identify proteins that are (i) homologous to previously crystallized proteins or (ii) targets of known drugs, but are (iii) not homologous to human proteins. When tested on the Plasmodium falciparum malarial genome, the program correctly enriched the ranked list for known drug target proteins.
Genomes2Drugs rapidly identifies proteins that are likely to succeed in drug discovery pipelines. This free online resource helps in the identification of potential drug targets. Importantly, the program further highlights proteins that are likely to be inhibited by FDA-approved drugs. These drugs can then be rapidly moved into Phase IV clinical studies under ‘change-of-application’ patents.
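The three Genomes2Drugs criteria can be illustrated with a toy filter-and-rank sketch. This is not the resource's actual scoring scheme, which the abstract does not describe: here each candidate is assumed to carry three percent-identity values (for example from homology searches against structure, drug-target and human protein databases), candidates above a hypothetical 30% human-identity cutoff are discarded, and the remainder are ranked by the sum of the other two values. The `Candidate` class, the cutoff and the protein names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    structure_identity: float    # best % identity to a protein of known 3D structure
    drug_target_identity: float  # best % identity to a target of a known drug
    human_identity: float        # best % identity to any human protein


def rank_targets(candidates, human_cutoff=30.0):
    """Discard candidates too similar to a human protein (criterion iii), then
    rank the rest by combined similarity to solved structures (i) and known
    drug targets (ii). The additive score is an illustrative assumption."""
    kept = [c for c in candidates if c.human_identity < human_cutoff]
    return sorted(kept, key=lambda c: c.structure_identity + c.drug_target_identity, reverse=True)


proteome = [
    Candidate("protein_a", 80.0, 60.0, 10.0),
    Candidate("protein_b", 90.0, 90.0, 70.0),  # close human homologue: excluded
    Candidate("protein_c", 40.0, 0.0, 20.0),
]
print([c.name for c in rank_targets(proteome)])  # ['protein_a', 'protein_c']
```

Treating human homology as a hard filter rather than a score penalty reflects the abstract's point that close human homologues risk off-target toxicity regardless of how attractive they otherwise look.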
Proteomic technologies have undergone significant development in recent years, which has led to extensive advances in protein research. Proteomic approaches have now been applied to many scientific areas, including basic research, the diagnosis of various diseases and malignant tumours, biomarker discovery and other therapeutic applications. In addition, proteomics-driven research articles examining reproductive biology and medicine are becoming increasingly common. The key challenge for this field is to move from lists of identified proteins to biological information about protein function. The present article reviews the available scientific literature related to spermatogenesis. In addition, this study uses two-dimensional electrophoresis coupled with mass spectrometry (2DE-MS) and liquid chromatography (LC)-MS to construct a series of proteome profiles describing spermatogenesis. This large-scale identification of proteins provides a rich resource for elucidating the mechanisms underlying male fertility and infertility.
infertility; male fertility; proteomics; spermatogenesis
The consortium formed to sequence the entire genome of a selected clone of the human malaria parasite, Plasmodium falciparum, has almost finished its work. Huge tracts of the genome are already available as fully assembled chromosomes or large contigs, and initial annotation is at an advanced stage. Post-genomic research is, in one sense, the continuation of annotation: biological atlases are being created, and preliminary attempts at global descriptions of gene transcription and proteome analysis are underway. Comparison of substantial amounts of genome data from both closely and more distantly related organisms can facilitate the identification of genes themselves, coordinately regulated gene expression groups, gene function and genome organization. Models of malaria can fulfil these functions and, in addition, provide an experimental system in which predictions can be tested and basic experimental investigations performed on numerous aspects of disease, pathology, and parasite-host and parasite-vector interactions. Comparative genomics in Plasmodium has already been shown to play informative roles in completing annotation and elucidating gene function. These roles will be illustrated by example and used as the basis for a discussion of the utility of genome information and malaria models in realizing the desired product of Plasmodium genomics: the development of malaria therapies.
Oncoproteomics is the study of proteins and their interactions in a cancer cell by proteomic technologies. Proteomic research first came to the fore with the introduction of two-dimensional gel electrophoresis. At the turn of the century, proteomics was increasingly applied to cancer research with the widespread introduction of mass spectrometry and protein chips. There is intense interest in applying proteomics to foster an improved understanding of cancer pathogenesis, to develop new tumor biomarkers for diagnosis, and to enable early detection using proteomic portraits of samples. Oncoproteomics has the potential to revolutionize clinical practice, including cancer diagnosis and screening based on proteomic platforms as a complement to histopathology, individualized selection of therapeutic combinations that target the entire cancer-specific protein network, real-time assessment of therapeutic efficacy and toxicity, and rational modulation of therapy based on changes in the cancer protein network associated with prognosis and drug resistance. Oncoproteomics is also applied to the discovery of new therapeutic targets and to the study of drug effects. Following the successful completion of the Human Genome Project, the wave of proteomics has raised the curtain on the postgenome era. The study of oncoproteomics provides a better understanding of neoplasia. In this article, the discovery of cancer biomarkers in recent years is reviewed. The challenges ahead and the perspectives of oncoproteomics for biomarker development are also addressed. With a wealth of information that can be applied to a broad spectrum of biomarker research projects, this review serves as a reference for biomarker researchers, scientists working in proteomics and bioinformatics, oncologists, pharmaceutical scientists, biochemists, biologists, and chemists.
Bioinformatics is the application of omics science, information technology, mathematics and statistics in the field of biomarker detection. Clinical bioinformatics can be applied to identify and validate new biomarkers, to improve current methods of monitoring disease activity, and to identify new therapeutic targets. Acute lung injury (ALI)/acute respiratory distress syndrome (ARDS) affects a large number of patients and carries a poor prognosis. The present review focuses mainly on progress in understanding disease heterogeneity through evolving biological, genomic, and genetic approaches, and on the role of clinical bioinformatics in the pathogenesis and treatment of ALI/ARDS. The remarkable advances in clinical bioinformatics offer a new way of understanding disease pathogenesis, diagnosis and treatment.
Acute lung injury; Acute respiratory distress syndrome; Genomics; Proteomics; Metabolomics; Bioinformatics
Information visualization techniques, which take advantage of the bandwidth of human vision, are powerful tools for organizing and analyzing a large amount of data. In the postgenomic era, information visualization tools are indispensable for biomedical research. This paper aims to present an overview of current applications of information visualization techniques in bioinformatics for visualizing different types of biological data, such as from genomics, proteomics, expression profiling and structural studies. Finally, we discuss the challenges of information visualization in bioinformatics related to dealing with more complex biological information in the emerging fields of systems biology and systems medicine.
Information visualization; Bioinformatics
Genomic, proteomic, and other omic-based approaches are now broadly used in biomedical research to facilitate the understanding of disease mechanisms and the identification of molecular targets and biomarkers for therapeutic and diagnostic development. While Omics technologies and bioinformatics tools for analyzing Omics data are rapidly advancing, the functional analysis and interpretation of the data remain challenging because of the generally long workflows inherent to Omics experiments. We adopt a strategy that emphasizes the use of curated knowledge resources coupled with expert-guided examination and interpretation of Omics data for the selection of potential molecular targets. We describe a downstream workflow and procedures for functional analysis that focus on biological pathways, from which molecular targets can be derived and proposed for experimental validation.
Proteomics; Genomics; Bioinformatics; Biological pathways; Cell signaling; Databases; Molecular targets; Biomarkers
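The pathway-focused functional analysis described above is not specified in statistical detail. A common first step in such workflows, assumed here rather than taken from the source, is an over-representation test: for each curated pathway, a one-sided hypergeometric test asks whether the list of hits contains more pathway members than random selection from the measured background would give. The sketch below uses only `math.comb` from the standard library; pathway names and protein identifiers are hypothetical, and a real analysis would also correct for multiple testing.

```python
from math import comb


def enrichment_p_value(overlap, n_hits, pathway_size, background_size):
    """One-sided hypergeometric P(X >= overlap): the chance of drawing at least
    `overlap` pathway members when `n_hits` proteins are picked at random from
    a background of `background_size` proteins containing `pathway_size` members."""
    total = comb(background_size, n_hits)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, n_hits - k)
        for k in range(overlap, min(n_hits, pathway_size) + 1)
    )
    return tail / total


def rank_pathways(hits, pathways, background):
    """Return (pathway, overlap, p-value) tuples sorted by p-value."""
    hits = set(hits) & background
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        overlap = len(hits & members)
        p = enrichment_p_value(overlap, len(hits), len(members), len(background))
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


background = {f"prot{i}" for i in range(10)}
pathways = {"pathway_1": [f"prot{i}" for i in range(5)], "pathway_2": ["prot5", "prot6", "prot7"]}
hits = [f"prot{i}" for i in range(5)]
for name, overlap, p in rank_pathways(hits, pathways, background):
    print(name, overlap, round(p, 4))
# pathway_1 5 0.004
# pathway_2 0 1.0
```

Restricting both hits and pathway members to the measured background matters: testing against all known proteins would overstate enrichment for pathways that happen to be well covered by the experiment.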
Proteomics refers to the study of the entire set of proteins in a given cell or tissue. With the extensive development of protein separation, mass spectrometry, and bioinformatics technologies, clinical proteomics has shown its potential as a powerful approach for biomarker discovery, particularly in the area of oncology. More than 130 exploratory studies have defined candidate markers in serum, gastrointestinal (GI) fluids, or cancer tissue. In this article, we introduce the commonly adopted proteomic technologies and describe results of a comprehensive review of studies that have applied these technologies to GI oncology, with a particular emphasis on developments in the last 3 years. We discuss reasons why the more than 130 studies to date have had little discernible clinical impact, and we outline steps that may allow proteomics to realize its promise for early detection of disease, monitoring of disease recurrence, and identification of targets for individualized therapy.
Clinical proteomics; Gastrointestinal oncology; Mass spectrometry; Biomarker discovery
Systems biology aims to integrate multiple biological data types such as genomics, transcriptomics and proteomics across different levels of structure and scale; it represents an emerging paradigm in the scientific process that challenges the reductionism that has dominated biomedical research for hundreds of years. Systems biology will nevertheless only be successful if the technologies on which it is based are able to deliver the required type and quality of data. In this review we discuss how well positioned proteomics is to deliver the data necessary to support meaningful systems modelling in parasite biology. We summarise the current state of identification proteomics in parasites, but argue that a new generation of quantitative proteomics data is now needed to underpin effective systems modelling. We discuss the challenges faced in acquiring more complete knowledge of protein post-translational modifications, protein turnover and protein-protein interactions in parasites. Finally, we highlight the central role of proteome informatics in ensuring that proteomics data are readily accessible to the user community and can be translated and integrated with other relevant data types.
Systems biology; proteomics; transcriptomics; genomics; parasites; host-parasite interactions; Descartes; reductionism
The EMBnet Conference 2008, focusing on 'Leading Applications and Technologies in Bioinformatics', was organized by the European Molecular Biology network (EMBnet) to celebrate its 20th anniversary. Since its foundation in 1988, EMBnet has been working to promote collaborative development of bioinformatics services and tools to serve the European community of molecular biology laboratories. This conference was the first meeting organized by the network that was open to the international scientific community outside EMBnet. The conference covered a broad range of research topics in bioinformatics with a main focus on new achievements and trends in emerging technologies supporting genomics, transcriptomics and proteomics analyses such as high-throughput sequencing and data managing, text and data-mining, ontologies and Grid technologies. Papers selected for publication, in this supplement to BMC Bioinformatics, cover a broad range of the topics treated, providing also an overview of the main bioinformatics research fields that the EMBnet community is involved in.