MicroRNAs (miRNAs) are small noncoding RNAs that play important roles in posttranscriptional regulation of gene expression. Mature miRNAs associate with the RNA interference silencing complex to repress mRNA translation and/or degrade mRNA transcripts. Mass spectrometry-based proteomics has enabled identification of several core components of the canonical miRNA processing pathway and their posttranslational modifications which are pivotal in miRNA regulatory mechanisms. The use of quantitative proteomic strategies has also emerged as a key technique for experimental identification of miRNA targets by allowing direct determination of proteins whose levels are altered because of translational suppression. This review focuses on the role of proteomics and labeling strategies to understand miRNA biology.
Cell biology; iTRAQ; miRNA; Multiple reaction monitoring; Noncoding RNA; SILAC
Advances in mass spectrometry-based proteomics have enabled the incorporation of proteomic data into systems approaches to biology. However, development of analytical methods has lagged behind. Here we describe an empirical Bayes framework for quantitative proteomics data analysis. The method provides a statistical description of each experiment, including the number of proteins that differ in abundance between two samples, the experiment's statistical power to detect them, and the false-positive probability of each protein.
We analyzed two types of mass spectrometric experiments. First, we showed that the method identified the protein targets of small molecules in affinity purification experiments with high precision. Second, we re-analyzed a mass spectrometric data set designed to identify proteins regulated by microRNAs. Our results were supported by sequence analysis of the 3′ UTR regions of predicted target genes, and we found that the previously reported conclusion that a large fraction of the proteome is regulated by microRNAs was not supported by our statistical analysis of the data.
Our results highlight the importance of rigorous statistical analysis of proteomic data, and the method described here provides a statistical framework to robustly and reliably interpret such data.
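The kind of per-protein false-positive probability described above can be illustrated with a generic two-group mixture model. The sketch below is not the authors' framework: the standard normal null, the zero-centered wide alternative, and the fixed values of `pi0`, `alt_mean` and `alt_sd` are all illustrative assumptions, and a real empirical Bayes analysis would estimate these quantities from the data.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def null_posterior(z, pi0=0.9, alt_mean=0.0, alt_sd=3.0):
    """Posterior probability that a protein with standardized log-ratio z is unchanged.

    Two-group model (assumed): unchanged proteins follow N(0, 1) and
    changed proteins follow N(alt_mean, alt_sd); pi0 is the assumed
    fraction of unchanged proteins.
    """
    f0 = normal_pdf(z, 0.0, 1.0)
    f1 = normal_pdf(z, alt_mean, alt_sd)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)


def summarize(z_scores, **model):
    """Return per-protein null posteriors and the expected number of changed proteins."""
    posteriors = [null_posterior(z, **model) for z in z_scores]
    expected_changed = sum(1.0 - p for p in posteriors)
    return posteriors, expected_changed
```

Calling `summarize([0.1, -0.4, 5.2])` returns one posterior per protein plus the expected count of changed proteins; a protein is reported as changed only when its null posterior falls below a threshold chosen by the analyst.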
Recent technological developments in proteomics have shown promise for identifying novel biomarkers of various diseases. Such technologies are capable of investigating multiple samples and generating large numbers of data endpoints. Two promising proteomics technologies are mass spectrometry, including an instrument based on surface enhanced laser desorption/ionization, and protein microarrays. Proteomics data must, however, undergo analytical processing using bioinformatics. Due to limitations in proteomics tools, including shortcomings in bioinformatics analysis, predictive bioinformatics can be utilized as an alternative strategy prior to performing elaborate, high-throughput proteomics procedures. This review describes mass spectrometry, protein microarrays, and bioinformatics and their roles in biomarker discovery, and highlights the significance of integration between proteomics and bioinformatics.
proteomics; mass spectrometry; protein microarrays; surface enhanced laser desorption/ionization; bioinformatics
In this review we examine the current state of analytical methods in proteomics. The conventional methodology using two-dimensional electrophoresis gels and mass spectrometry is discussed, with particular reference to the advantages and shortcomings thereof. Two recently published methods which offer an alternative approach are presented and discussed, with emphasis on how they can provide information not available via two-dimensional gel electrophoresis. These two methods are the isotope-coded affinity tags approach of Gygi et al. and the two-dimensional liquid chromatography–tandem mass spectrometry approach as presented by Link et al. We conclude that both of these new techniques represent significant advances in analytical methodology for proteome analysis. Furthermore, we believe that in the future biological research will continue to be enhanced by the continuation of such developments in proteomic analytical technology.
Proteomics is the study of proteomes, which are the collections of proteins expressed in cells. Whereas genomes are essentially invariant in different cells in an organism, proteomes vary from cell to cell, with time and as a function of environmental stimuli and stress. The integration of new mass spectrometry (MS) methods, data analysis algorithms, and information from databases of protein and gene sequences has enabled the characterization of proteomes. Many environmental agents directly or indirectly generate reactive electrophiles that covalently modify proteins. Although considerable evidence supports a key role for protein adducts in adverse effects of chemicals, limitations in analytical technology have slowed progress in this area. New applications of liquid chromatography-tandem mass spectrometry (LC-MS-MS) now offer the potential to identify protein targets of reactive electrophiles and to map adducts at the level of amino acid sequence. Use of the data-analysis tools Sequest and SALSA (Scoring Algorithm for Spectral Analysis) together with LC-MS-MS analyses of protein digests enables the identification of modified forms of proteins in a sample. These approaches can map adducts to specific amino acids in protein targets and are being adapted to searches for protein adducts in complex proteomes. These tools will facilitate the identification of new biomarkers of chemical exposure and studies of mechanisms by which protein modifications contribute to the adverse effects of environmental exposures.
The ‘omics’ approaches – genomics, proteomics and metabolomics – are based on high-throughput, high-information-content analysis. Using these approaches, as opposed to targeting one or a few analytes, a holistic understanding of the composition of a sample can be obtained. These approaches have revolutionized sample-analysis and data-processing protocols. In metabolomic studies, hundreds of small molecules are simultaneously analyzed using analytical platforms (e.g., gas chromatography-mass spectrometry (GC-MS) or liquid chromatography coupled to tandem mass spectrometry (LC-MS2)). This philosophy of holistic analysis and the application of high-throughput, high-information-content analysis offer several advantages. In this article, we compare the conventional analytical approach of one or a few analytes per sample to the LC-MS2-based metabolomics-type approach in the context of pharmaceutical and environmental analysis.
Chromatography; Environmental; LC-MS2; Metabolomics; Pharmaceutical
The continuously growing interest in small regulatory RNA exploration is one of the important factors that have inspired the recent development of new high-throughput techniques such as DNA microarrays or next generation sequencing. Each of these methods offers some significant advantages, but at the same time each of them is expensive, laborious and challenging, especially in terms of data analysis. Therefore, there is still a need to develop new analytical methods enabling the fast, simple and cost-effective examination of complex RNA mixtures. Recently, increasing attention has been focused on the RNA degradome as a potential source of riboregulators. Accordingly, we attempted to employ two-dimensional gel electrophoresis as a quick and uncomplicated method of profiling the RNA degradome in plant or human cells. This technique has been successfully used in proteome analysis. However, its application in nucleic acid studies has been very limited. Here we demonstrate that two-dimensional electrophoresis is a technique which allows one to quickly and cost-effectively identify and compare the accumulation profiles of 10–90 nucleotide long RNAs in various cells and organs.
Electronic supplementary material
The online version of this article (doi:10.1007/s11033-011-0718-1) contains supplementary material, which is available to authorized users.
RNA degradome; 2D-PAGE; Small non-coding RNA; RNA biomarkers
The completion of the sequencing of the human genome and the concurrent, rapid development of high-throughput proteomic methods have resulted in an increasing need for automated approaches to archive proteomic data in a repository that enables the exchange of data among researchers and also accurate integration with genomic data. PeptideAtlas addresses these needs by identifying peptides by tandem mass spectrometry (MS/MS), statistically validating those identifications and then mapping identified sequences to the genomes of eukaryotic organisms. A meaningful comparison of data across different experiments generated by different groups using different types of instruments is enabled by the implementation of a uniform analytic process. This uniform statistical validation ensures a consistent and high-quality set of peptide and protein identifications. The raw data from many diverse proteomic experiments are made available in the associated PeptideAtlas repository in several formats. Here we present a summary of our process and details about the Human, Drosophila and Yeast PeptideAtlas builds.
Proteomic technologies are used to study the complexity of proteins, their roles and biological functions. Proteomics is based on the premise that the diversity of proteins, comprising their isoforms and posttranslational modifications (PTMs), underlies biology. Among annotated human cardiac proteins, 62% have at least one PTM (phosphorylation currently dominating) while ~25% have more than one type of modification. The field of proteomics strives to observe and quantify this protein diversity. It represents a broad group of technologies and methods arising from analytical protein biochemistry, analytical separation, mass spectrometry and bioinformatics. Since the 1990s, proteomic analysis has been increasingly used in cardiovascular research. Technology development and adaptation have been at the heart of this progress. Technology matures, becomes routine and ultimately becomes obsolete, replaced by newer methods. Due to extensive methodological improvements, many proteomic studies today observe 1000-5000 proteins. Only five years ago this was not feasible. Even so, there are still roadblocks. Nowadays, there is a focus on obtaining better characterization of protein isoforms and specific PTMs. Consequently, new techniques for identification and quantification of modified amino acid residues are required, as is the assessment of SNPs in addition to determination of the structural and functional consequences. In this series, four articles provide concrete examples of how proteomics can be incorporated into cardiovascular research and address specific biological questions. They also illustrate how novel discoveries can be made and how proteomic technology has continued to evolve.
Proteomics; technology; protein isoform; posttranslational modification; polymorphism
Protein modification by ubiquitin is a central regulatory mechanism in eukaryotic cells. Recent proteomics developments in mass spectrometry enable systematic analysis of cellular components in the ubiquitin pathway. Here, we review the advances in analyzing ubiquitinated substrates, determining modified lysine residues, quantifying polyubiquitin chain topologies, and profiling deubiquitinating enzymes based on their activity. Moreover, proteomic approaches have been developed for probing the interactome of the proteasome and for identifying proteins with ubiquitin-binding domains. Similar strategies have also been applied to studies of modification by ubiquitin-like proteins. These strategies are discussed with respect to their advantages, limitations and potential improvements. While the utilization of current methodologies has rapidly expanded the known scope of protein modification by the ubiquitin family, a more active role in functional studies is anticipated with the emergence of quantitative mass spectrometry.
Quantitative targeted proteomics has recently taken front stage in the proteomics community. Centered on multiple reaction monitoring–mass spectrometry (MRM–MS) methodologies, quantitative targeted proteomics is being used in the verification of global proteomics data, the discovery of lower abundance proteins, protein post-translational modifications, discrimination of select highly homologous protein isoforms and as the final step in biomarker discovery. Although MRM–MS is an older methodology long used in small molecule analysis, the proteomics community is making great technological strides to develop it as the next method to address previously challenging issues in global proteomics experimentation, namely dynamic range, identification of post-translational modifications, and sensitivity and selectivity of measurement, which will undoubtedly further biomedical knowledge. This brief review provides a general introduction to MRM–MS and highlights its novel application in targeted quantitative proteomic experiments.
absolute quantification; quantitative proteomics; mass spectrometry; multiple reaction monitoring; stable isotope dilution; targeted proteomics
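The stable isotope dilution principle named in the keywords above can be sketched in a few lines: an isotope-labeled (heavy) peptide is spiked in at a known amount, and the endogenous (light) amount is inferred from the ratio of summed transition peak areas. The function name, argument names and the choice to sum transitions before taking the ratio are illustrative assumptions, not a prescribed MRM–MS workflow.

```python
def absolute_amount_fmol(light_areas, heavy_areas, heavy_spiked_fmol):
    """Estimate the endogenous peptide amount by stable isotope dilution.

    light_areas / heavy_areas: peak areas of matching MRM transitions for
    the endogenous peptide and its heavy-labeled standard (hypothetical
    inputs). heavy_spiked_fmol: known amount of standard added.
    """
    if len(light_areas) != len(heavy_areas):
        raise ValueError("light and heavy transitions must be paired")
    heavy_total = sum(heavy_areas)
    if heavy_total <= 0:
        raise ValueError("heavy standard signal must be positive")
    # Amount scales linearly with the light/heavy area ratio.
    return sum(light_areas) / heavy_total * heavy_spiked_fmol
```

For example, a light signal twice as large as that of a 50 fmol heavy standard corresponds to an estimated 100 fmol of endogenous peptide.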
An increasing number of studies involve integrative analysis of gene and protein expression data, taking advantage of new technologies such as next-generation transcriptome sequencing (RNA-Seq) and highly sensitive mass spectrometry (MS) instrumentation. Thus, it becomes interesting to revisit the correlative analysis of gene and protein expression data using more recently generated datasets. Furthermore, within the proteomics community there is substantial interest in comparing the performance of different label-free quantitative proteomic strategies. Gene expression data can be used as an indirect benchmark for such protein-level comparisons. In this work we use publicly available mouse data to perform a joint analysis of genomic and proteomic data obtained on the same organism. First, on the proteomic side, we perform a comparative analysis of different label-free protein quantification methods (intensity-based and spectral count-based, with various associated data normalization steps) using several software tools. Similarly, on the genomic side, we perform a correlative analysis of gene expression data derived using microarray and RNA-Seq methods. We also investigate the correlation between gene and protein expression data, and various factors affecting the accuracy of quantitation at both levels. We observe that spectral count-based protein abundance metrics, which are easy to extract from any published data, are comparable to intensity-based measures with respect to correlation with gene expression data. The results of this work should be useful for designing robust computational pipelines for extraction and joint analysis of gene and protein expression data in the context of integrative studies.
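As an illustration of a spectral count-based abundance metric, the sketch below computes the normalized spectral abundance factor (NSAF), in which each protein's spectral count is divided by its length and the result is normalized to sum to one across the sample. The abstract does not state which spectral count metrics were compared, so the choice of NSAF and the dictionary-based interface are assumptions made for illustration.

```python
def nsaf(spectral_counts, lengths):
    """Compute NSAF values for one sample.

    spectral_counts: mapping of protein ID -> number of identified spectra.
    lengths: mapping of protein ID -> protein length in residues.
    Returns a mapping of protein ID -> NSAF (values sum to 1).
    """
    # Spectral abundance factor: counts corrected for protein length.
    saf = {pid: spectral_counts[pid] / lengths[pid] for pid in spectral_counts}
    total = sum(saf.values())
    if total <= 0:
        raise ValueError("no spectra to normalize")
    return {pid: value / total for pid, value in saf.items()}
```

With equal counts, a protein half as long receives twice the NSAF, which reflects the length correction that raw spectral counts lack.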
The relatively small numbers of proteins and fewer possible posttranslational modifications in microbes provide a unique opportunity to comprehensively characterize their dynamic proteomes. We have constructed a Peptide Atlas (PA) for 62.7% of the predicted proteome of the extremely halophilic archaeon Halobacterium salinarum NRC-1 by compiling approximately 636,000 tandem mass spectra from 497 mass spectrometry runs in 88 experiments. Analysis of the PA with respect to biophysical properties of constituent peptides, functional properties of parent proteins of detected peptides, and performance of different mass spectrometry approaches has helped highlight plausible strategies for improving proteome coverage and selecting signature peptides for targeted proteomics. Notably, discovery of a significant correlation between absolute abundances of mRNAs and proteins has helped identify low abundance of proteins as the major limitation in peptide detection. Furthermore, we have discovered that iTRAQ labeling for quantitative proteomic analysis introduces a significant bias in peptide detection by mass spectrometry. Therefore, despite identifying at least one proteotypic peptide for almost all proteins in the PA, a context-dependent selection of proteotypic peptides appears to be the most effective approach for targeted proteomics.
Peptide Atlas; Halobacterium; iTRAQ; bioinformatics; archaea; proteomics
A mass spectrometry analysis of the yeast proteome shows that complex mixture analysis is not limited by sensitivity but by a combination of dynamic range and by effective sequencing speed.
Mass spectrometry has become a powerful tool for the analysis of large numbers of proteins in complex samples, enabling much of proteomics. Due to various analytical challenges, so far no proteome has been sequenced completely. O'Shea, Weissman and co-workers have recently determined the copy number of yeast proteins, making this proteome an excellent model system to study factors affecting coverage.
To probe the yeast proteome in depth and determine factors currently preventing complete analysis, we grew yeast cells, extracted proteins and separated them by one-dimensional gel electrophoresis. Peptides resulting from trypsin digestion were analyzed by liquid chromatography mass spectrometry on a linear ion trap-Fourier transform mass spectrometer with very high mass accuracy and sequencing speed. We achieved unambiguous identification of more than 2,000 proteins, including very low abundant ones. Effective dynamic range was limited to about 1,000 and effective sensitivity to about 500 femtomoles, far from the subfemtomole sensitivity possible with single proteins. We used SILAC (stable isotope labeling by amino acids in cell culture) to generate one-to-one pairs of true peptide signals and investigated if sensitivity, sequencing speed or dynamic range were limiting the analysis.
Advanced mass spectrometry methods can unambiguously identify more than 2,000 proteins in a single proteome. Complex mixture analysis is not limited by sensitivity but by a combination of dynamic range (high abundance peptides preventing sequencing of low abundance ones) and by effective sequencing speed. Substantially increased coverage of the yeast proteome appears feasible with further development in software and instrumentation.
The chemical modification of protein thiols by reduction and alkylation is common in the preparation of proteomic samples for analysis by mass spectrometry (MS). Modification at other functional groups has received less attention in MS-based proteomics. Amine modification (Lys, N-termini) by reductive dimethylation or by acylation (e.g. iTRAQ labeling) has recently gained some popularity in peptide-based approaches (bottom-up MS). Modification at acidic groups (Asp, Glu, C-termini) has been explored very minimally. Here, we describe a sequential labeling strategy that enables complete modification of thiols, amines, and acids on peptides or small intact proteins. This method includes (1) the reduction and alkylation of thiols, (2) the reductive dimethylation of amines, and (3) the amidation of acids with any of several amines. This chemical modification scheme offers several options both for the incorporation of stable isotopes for relative quantification and for improving peptides or proteins as MS analytes.
mass spectrometry (MS); stable isotope labeling; acylation; dimethylation; amidation; proteomics; protein derivatization; peptide derivatization
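The three-step labeling scheme described above implies a predictable mass shift for a fully modified peptide, which the following sketch estimates from the sequence. Several details are assumptions that the abstract does not specify: thiol alkylation is taken to be carbamidomethylation (as with iodoacetamide), the default amidating amine is methylamine, every site is assumed to react completely, and the special case of an N-terminal proline (which cannot be dimethylated) is ignored.

```python
# Monoisotopic mass shifts in Da per modified site (assumed reagents).
CARBAMIDOMETHYL = 57.021464   # Cys thiol alkylation, e.g. with iodoacetamide
DIMETHYL = 28.031300          # reductive dimethylation of one primary amine
METHYLAMIDE = 13.031634       # amidation of one carboxylic acid with methylamine


def labeling_mass_shift(peptide, amidation_shift=METHYLAMIDE):
    """Total mass shift for a peptide after the three labeling steps.

    peptide: one-letter amino acid sequence (unmodified, linear).
    amidation_shift: per-acid shift for the chosen amine (hypothetical default).
    """
    seq = peptide.upper()
    thiols = seq.count("C")
    amines = seq.count("K") + 1                    # Lys side chains + N-terminus
    acids = seq.count("D") + seq.count("E") + 1    # Asp, Glu + C-terminus
    return (thiols * CARBAMIDOMETHYL
            + amines * DIMETHYL
            + acids * amidation_shift)
```

Passing a different `amidation_shift`, or isotopically shifted values for any of the constants, is one way to model the stable isotope options mentioned in the abstract.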
Mass spectrometry based structural proteomics approaches for probing protein structures are increasingly gaining in popularity. The potential for such studies is limited because of the lack of analytical techniques for the automated interpretation of resulting data. In this paper, a suite of algorithms called ProtMapMS is developed, integrated, and implemented specifically for the comprehensive automatic analysis of mass spectrometry data obtained for protein structure studies using covalent labeling. The functions include data format conversion, mass spectrum interpretation, detection and verification of all peptide species, confirmation of the modified peptide products, and quantification of the extent of peptide modification. The results thus obtained provide valuable data for use in combination with computational approaches for protein structure modeling. The structures of both monomeric and hexameric forms of insulin were investigated by oxidative protein footprinting followed by high-resolution mass spectrometry. The resultant data was analyzed both manually and using ProtMapMS without any manual intervention. The results obtained using the two methods were found to be in close agreement and overall were consistent with predictions from the crystallographic structure.
Mass spectrometry-based investigation of clinical samples enables the high-throughput identification of protein biomarkers. We provide an overview of mass spectrometry-based proteomic techniques that are applicable to the investigation of clinical samples. We address sample collection, protein extraction and fractionation, mass spectrometry modalities, and quantitative proteomics. Finally, we examine the limitations and further potential of such technologies. Liquid chromatography fractionation coupled with tandem mass spectrometry is well suited to handle mixtures of hundreds or thousands of proteins. Mass spectrometry-based proteome elucidation can reveal potential biomarkers and aid in the development of hypotheses for downstream investigation of the molecular mechanisms of disease.
chronic pancreatitis; biomarkers; pancreas; mass spectrometry
“Proteomics” refers to the systematic analysis of proteins. It complements other “omics” technologies such as genomics and transcriptomics in elucidating the identity of the proteins of an organism and understanding their functions. Proteomics is used in many areas of research, such as discovery of markers for diagnosis and vaccine candidates, understanding pathogenic mechanisms, the study of expression patterns at different time points and in response to different stimuli, and elucidating functional protein networks. Proteomics analysis involves sample preparation, protein separation, and protein identification. The ‘heart’ of current proteomics is mass spectrometry, with LC-MS/MS and MALDI-TOF/TOF being commonly used equipment. However, the high costs of the equipment, software, and databases, and the need for skilled personnel, limit the wide utilization of this technology in less developed countries. Therefore, there needs to be sharing of facilities, better networking and collaboration among our scientists and laboratories to take advantage of this powerful technology.
analysis; mass-spectrometry; proteins; proteomics; technology
A renewed interest in non-coding RNA (ncRNA) has led to the discovery of novel RNA species and post-transcriptional ribonucleoside modifications, and an emerging appreciation for the role of ncRNA in RNA epigenetics. Although much can be learned by amplification-based analysis of ncRNA sequence and quantity, there is a significant need for direct analysis of RNA, which has led to numerous methods for purification of specific ncRNA molecules. However, no single method allows purification of the full range of cellular ncRNA species. To this end, we developed a multidimensional chromatographic platform to resolve, isolate and quantify all canonical ncRNAs in a single sample of cells or tissue, as well as novel ncRNA species. The applicability of the platform is demonstrated in analyses of ncRNA from bacteria, human cells and plasmodium-infected reticulocytes, as well as a viral RNA genome. Among the many potential applications of this platform are a system-level analysis of the dozens of modified ribonucleosides in ncRNA, characterization of novel long ncRNA species, enhanced detection of rare transcript variants and analysis of viral genomes.
Recent advances in mass spectrometry have led us to a new era of proteomic analysis. Human saliva can be easily collected; however, the complexity of the salivary proteome has in the past prevented the use of saliva for proteomic analysis. Here we review the development of proteomic analyses for human saliva and focus on the use of a new mass spectrometric technology known as surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF). SELDI-TOF, a modification of matrix-assisted laser desorption/ionization mass spectrometry (MALDI-TOF), combines the precision of mass spectrometry with the high-throughput nature of protein arrays known as Protein Chips. This technology shows a promising future for salivary proteomic analysis in monitoring treatments and diseases, as well as in novel biomarker discovery.
Proteomics; mass spectrometry; biomarker; biomarker discovery; proteomic profiling; protein expression; SELDI; SELDI-TOF; protein chips
Mass spectrometry is a powerful tool with much promise in global proteomic studies. The discipline of statistics offers robust methodologies to extract and interpret high-dimensional mass-spectrometry data and will be a valuable contributor to the field. Here, we describe the process by which data are produced, characteristics of the data, and the analytical preprocessing steps that are taken in order to interpret the data and use it in downstream statistical analyses. Because of the complexity of data acquisition, statistical methods developed for gene expression microarray data are not directly applicable to proteomic data. Areas in need of statistical research for proteomic data include alignment, experimental design, abundance normalization, and statistical analysis.
Experimental design; Fourier transform; Mass calibration; Mass spectrometry; Normalization
Recent advances in the field of RNA research have provided compelling evidence implicating microRNA (miRNA) and long non-coding RNA molecules in many diverse and substantial biological processes, including transcriptional and post-transcriptional regulation of gene expression, genomic imprinting, and modulation of protein activity. Thus, studies of non-coding RNA (ncRNA) may contribute to the discovery of possible biomarkers in human cancers. Considering that the response to chemotherapy can differ amongst individuals, researchers have begun to isolate and identify the genes responsible. Identification of the targets of cancer-associated ncRNAs can suggest that networks of these ncRNAs linked to oncogenes or tumor suppressors play pivotal roles in cancer development. Moreover, these ncRNAs are attractive drug targets since they may be differentially expressed in malignant versus normal cells and regulate expression of critical proteins in the cell. This review focuses on ncRNAs that are differentially expressed in malignant tissue, and discusses some of the challenges arising from their use as potential biomarkers of tumor properties.
biomarkers; cancer; prognostic; non-coding RNA
The fields of mass spectrometry (MS) and stem cell biology have expanded greatly in the past twenty years. Taken alone, these fields occupy entirely different branches of science; however, the points where they overlap provide valuable insight, both in the biological and technical arenas. From a biological perspective, MS-based proteomics offers the capacity to follow post-transcriptional regulation and signaling that are 1) fundamental to pluripotency and differentiation, 2) largely beyond the reach of genomic technologies, and 3) otherwise difficult or impossible to examine on a large-scale. At the same time, addressing questions fundamental to stem cell biology has compelled proteomic researchers to pursue more sensitive and creative ways to probe the proteome, both in a targeted and high-throughput manner. Here, we highlight experiments that straddle proteomics and stem cell biology, with an emphasis on studies that apply mass spectrometry to dissect pluripotency and differentiation.
Mass spectrometry; Embryonic stem cell; Differentiation; Large-scale analysis; Quantitative proteomics
The scope of gas phase ion/ion chemistry accessible to mass spectrometry is largely defined by the available tools. Due to the development of novel instrumentation, a wide range of reaction phenomenologies has been noted, many of which have been studied extensively and exploited for analytical applications. This perspective presents the development of mass spectrometry-based instrumentation for the study of gas phase ion/ion chemistry in which at least one of the reactants is multiply charged. The instrument evolution is presented within the context of three essential elements required for any ion/ion reaction study: the ionization source(s), the reaction vessel or environment, and the mass analyzer. Ionization source arrangements have included source combinations that allow for reactions between multiply charged ions of one polarity and singly charged ions of opposite polarity, arrangements that enable the study of reactions of multiply charged ions of opposite polarity, and most recently, arrangements that allow for ion formation from more than two ion sources. Gas phase ion/ion reaction studies have been performed at near atmospheric pressure in flow reactor designs and within electrodynamic ion traps operated in the mTorr range. With an ion trap as the reaction vessel, ionization and reaction processes can be independently optimized and ion/ion reactions can be implemented within the context of MSn experiments. Spatial separation of the reaction vessel from the mass analyzer allows for the use of any form of mass analysis in conjunction with ion/ion reactions. Time-of-flight mass analysis, for example, has provided significant improvements in mass analysis figures of merit relative to mass filters and ion traps.
Currently, glycans are attracting attention from the scientific community as potential biomarkers or as posttranslational modifications (PTMs) of therapeutic proteins. However, structural characterization of glycoproteins and glycopeptides remains analytically challenging. Here, we report on the implementation of a novel acquisition strategy termed higher-energy collision dissociation-accurate mass-product-dependent electron transfer dissociation (HCD-PD-ETD) on a hybrid linear ion trap-orbitrap mass spectrometer. This acquisition strategy uses the complementary fragmentations of ETD and HCD for glycopeptide analysis in an intelligent fashion. Furthermore, the approach minimizes user input for optimizing instrumental parameters and enables straightforward detection of glycopeptides. ETD spectra are only acquired when glycan oxonium ions from MS/MS HCD are detected. The advantage of this approach is that it streamlines data analysis and improves dynamic range and duty cycle. Here, we present the benefits of HCD-PD-ETD relative to the traditional alternating HCD/ETD for a trainer set consisting of a twelve-protein mixture that contains two glycoproteins, human serotransferrin and ovalbumin, and is contaminated with two others, bovine alpha 1 acid glycoprotein (bAGP) and bovine fetuin.
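The product-dependent trigger described above, in which an ETD scan is acquired only when glycan oxonium ions appear in the HCD spectrum, can be sketched as a simple decision rule. This is an offline illustration rather than instrument control code: the list of diagnostic m/z values, the 10 ppm tolerance and the 5% relative intensity cutoff are illustrative assumptions, not parameters reported in the abstract.

```python
# Commonly cited glycan oxonium ions (assumed diagnostic list), m/z for z = 1.
OXONIUM_MZ = (138.0550, 204.0867, 366.1396)  # HexNAc fragment, HexNAc, HexHexNAc


def has_oxonium_ion(hcd_peaks, tol_ppm=10.0, min_rel_intensity=0.05):
    """Return True if any sufficiently intense HCD peak matches an oxonium ion.

    hcd_peaks: list of (m/z, intensity) tuples from one HCD MS/MS spectrum.
    """
    if not hcd_peaks:
        return False
    base_peak = max(intensity for _, intensity in hcd_peaks)
    if base_peak <= 0:
        return False
    for mz, intensity in hcd_peaks:
        if intensity < min_rel_intensity * base_peak:
            continue  # ignore low-level noise peaks
        for ref in OXONIUM_MZ:
            if abs(mz - ref) / ref * 1e6 <= tol_ppm:
                return True
    return False


def next_scan(hcd_peaks):
    """Decide whether the same precursor should also be fragmented by ETD."""
    return "ETD" if has_oxonium_ion(hcd_peaks) else "HCD_ONLY"
```

Because ETD is requested only for spectra that pass this check, non-glycosylated precursors do not consume ETD scan time, which is the duty-cycle benefit the abstract attributes to the strategy.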