Related Articles

1.  Absolute quantification of microbial proteomes at different states by directed mass spectrometry 
The directed mass spectrometry workflow developed here generates consistent, system-wide quantitative maps of microbial proteomes in a single analysis. Application to the human pathogen L. interrogans revealed mechanistic proteome changes over time involved in pathogenic progression and antibiotic defense, and new insights about the regulation of absolute protein abundances within operons.
The developed, directed proteomic approach allowed consistent detection and absolute quantification of 1680 proteins of the human pathogen L. interrogans in a single LC–MS/MS experiment. The comparison of 25 extensive, consistent and quantitative proteome maps revealed new insights about the proteome changes involved in pathogenic progression and antibiotic defense of L. interrogans, and about the regulation of protein abundances within operons. The generated time-resolved data sets are compatible with pattern analysis algorithms developed for transcriptomics, including hierarchical clustering and functional enrichment analysis of the detected profile clusters. This is the first study that describes the absolute quantitative behavior of any proteome over multiple states and represents the most comprehensive proteome abundance pattern comparison for any organism to date.
Over the last decade, mass spectrometry (MS)-based proteomics has evolved as the method of choice for system-wide proteome studies and now allows for the characterization of several thousands of proteins in a single sample. Despite these great advances, redundant monitoring of protein levels over large sample numbers in a high-throughput manner remains a challenging task. New directed MS strategies have been shown to overcome some of the current limitations, thereby enabling the acquisition of consistent and system-wide data sets of proteomes with low-to-moderate complexity at high throughput.
In this study, we applied this integrated, two-stage MS strategy to investigate global proteome changes in the human pathogen L. interrogans. In the initial discovery phase, 1680 proteins (out of around 3600 gene products) could be identified (Schmidt et al, 2008) and, by focusing precious MS-sequencing time on the most dominant, specific peptides per protein, all proteins could be accurately and consistently monitored over 25 different samples within a few days of instrument time in the following scoring phase (Figure 1). Additionally, the co-analysis of heavy reference peptides enabled us to obtain absolute protein concentration estimates for all identified proteins in each perturbation (Malmström et al, 2009). The detected proteins did not show any biases against functional groups or protein classes, including membrane proteins, and span an abundance range of more than three orders of magnitude, a range that is expected to cover most of the L. interrogans proteome (Malmström et al, 2009).
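The role of the heavy reference peptides can be illustrated with a simple isotope-dilution calculation. The sketch below is a simplification, not the authors' pipeline: it assumes a single heavy peptide spiked at a known amount, a measured light/heavy intensity ratio, and a known number of cells in the digest. The function name, parameters and example values are all hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_per_cell(light_to_heavy_ratio, spiked_fmol, n_cells):
    """Estimate protein copies per cell from a light/heavy peptide ratio.

    Assumes the endogenous (light) amount equals ratio * spiked heavy amount,
    and that one peptide corresponds to one protein molecule.
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    endogenous_mol = light_to_heavy_ratio * spiked_fmol * 1e-15  # fmol -> mol
    return endogenous_mol * AVOGADRO / n_cells


# Hypothetical values: ratio 0.5, 100 fmol heavy peptide, 1e8 cells digested.
estimate = copies_per_cell(light_to_heavy_ratio=0.5, spiked_fmol=100.0, n_cells=1e8)
print(f"{estimate:.0f} copies per cell")
```

In practice only a subset of proteins has a matching heavy peptide, so such anchor values are typically used to calibrate the signal intensities of the remaining proteins.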
To elucidate mechanistic proteome changes over time involved in pathogenic progression and antibiotic defense of L. interrogans, we generated time-resolved proteome maps of cells perturbed with serum and three different antibiotics at sublethal concentrations that are currently used to treat leptospirosis. This yielded an information-rich proteomic data set that describes, for the first time, the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date. Using this unique property of the data set, we could quantify protein components of entire pathways across several time points and subject the data sets to cluster analysis, a tool that was previously limited to the transcript level due to incomplete sampling at the protein level (Figure 4). Based on these analyses, we could demonstrate that Leptospira cells adjust the cellular abundance of a certain subset of proteins and pathways as a general response to stress, while other parts of the proteome respond in a highly specific manner. The cells furthermore react to individual treatments by 'fine-tuning' the abundance of certain proteins and pathways in order to cope with the specific cause of stress. Intriguingly, the most specific and significant expression changes were observed for proteins involved in motility, tissue penetration and virulence after serum treatment, where we tried to simulate the host environment. While many of the detected protein changes demonstrate good agreement with available transcriptomics data, most proteins showed a poor correlation. This includes potential virulence factors, like Loa22 or OmpL1, with confirmed expression in vivo that were significantly up-regulated on the protein level, but not on the mRNA level, strengthening the importance of proteomic studies.
The high resolution and coverage of the proteome data set enabled us to further investigate protein abundance changes of co-regulated genes within operons. This analysis suggests that although most proteins within an operon respond to regulation synchronously, bacterial cells seem to have subtle means to adjust the levels of individual proteins or protein groups outside of the general trend, a phenomenon that was recently also observed on the transcript level of other bacteria (Güell et al, 2009).
The method can be implemented with standard high-resolution mass spectrometers and software tools that are readily available in the majority of proteomics laboratories. It is scalable to any proteome of low-to-medium complexity and can be extended to post-translational modifications or peptide-labeling strategies for quantification. We therefore expect the approach outlined here to become a cornerstone for microbial systems biology.
Over the past decade, liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS) has evolved into the main proteome discovery technology. Up to several thousand proteins can now be reliably identified from a sample and the relative abundance of the identified proteins can be determined across samples. However, the remeasurement of substantially similar proteomes, for example those generated by perturbation experiments in systems biology, at high reproducibility and throughput remains challenging. Here, we apply a directed MS strategy to detect and quantify sets of pre-determined peptides in tryptic digests of cells of the human pathogen Leptospira interrogans at 25 different states. We show that in a single LC–MS/MS experiment around 5000 peptides, covering 1680 L. interrogans proteins, can be consistently detected and their absolute expression levels estimated, revealing new insights about the proteome changes involved in pathogenic progression and antibiotic defense of L. interrogans. This is the first study that describes the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date.
doi:10.1038/msb.2011.37
PMCID: PMC3159967  PMID: 21772258
absolute quantification; directed mass spectrometry; Leptospira interrogans; microbiology; proteomics
2.  The Genome Organization of Thermotoga maritima Reflects Its Lifestyle 
PLoS Genetics  2013;9(4):e1003485.
The generation of genome-scale data is becoming more routine, yet the subsequent analysis of omics data remains a significant challenge. Here, an approach that integrates multiple omics datasets with bioinformatics tools was developed that produces a detailed annotation of several microbial genomic features. This methodology was used to characterize the genome of Thermotoga maritima, a phylogenetically deep-branching, hyperthermophilic bacterium. Experimental data were generated for whole-genome resequencing, transcription start site (TSS) determination, transcriptome profiling, and proteome profiling. These datasets, analyzed in combination with bioinformatics tools, served as a basis for the improvement of gene annotation, the elucidation of transcription units (TUs), the identification of putative non-coding RNAs (ncRNAs), and the determination of promoters and ribosome binding sites. This revealed many distinctive properties of the T. maritima genome organization relative to other bacteria. This genome has a high number of genes per TU (3.3), a paucity of putative ncRNAs (12), and few TUs with multiple TSSs (3.7%). Quantitative analysis of promoters and ribosome binding sites showed increased sequence conservation relative to other bacteria. The 5′UTRs follow an atypical bimodal length distribution composed of “Short” 5′UTRs (11–17 nt) and “Common” 5′UTRs (26–32 nt). Transcriptional regulation is limited by a lack of intergenic space for the majority of TUs. Lastly, a high fraction of annotated genes are expressed independent of growth state and a linear correlation of mRNA/protein is observed (Pearson r = 0.63, p < 2.2×10^−16, t-test). These distinctive properties are hypothesized to be a reflection of this organism's hyperthermophilic lifestyle and could yield novel insights into the evolutionary trajectory of microbial life on earth.
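The reported mRNA/protein correlation is a Pearson coefficient with an associated t-test. As a minimal sketch of how such a statistic is computed, the code below uses only the standard library; the paired values are invented for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical log-scale abundances for five genes (illustration only).
mrna = [1.0, 2.0, 3.0, 4.0, 5.0]
protein = [1.2, 1.9, 3.5, 3.8, 5.1]
r = pearson_r(mrna, protein)
print(f"r = {r:.3f}, t = {t_statistic(r, len(mrna)):.2f}")
```

The p-value follows from comparing the t statistic with a Student t distribution with n − 2 degrees of freedom; with thousands of genes, even a moderate r yields a very small p-value.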
Author Summary
Genomic studies have greatly benefited from the advent of high-throughput technologies and bioinformatics tools. Here, a methodology integrating genome-scale data and bioinformatics tools is developed to characterize the genome organization of the hyperthermophilic, phylogenetically deep-branching bacterium Thermotoga maritima. This approach elucidates several features of the genome organization and enables comparative analysis of these features across diverse taxa. Our results suggest that the genome of T. maritima is reflective of its hyperthermophilic lifestyle. Ultimately, constraints imposed on the genome have negative impacts on regulatory complexity and phenotypic diversity. Investigating the genome organization of Thermotogae species will help resolve various causal factors contributing to the genome organization such as phylogeny and environment. Applying a similar analysis of the genome organization to numerous taxa will likely provide insights into microbial evolution.
doi:10.1371/journal.pgen.1003485
PMCID: PMC3636130  PMID: 23637642
3.  Methods for visual mining of genomic and proteomic data atlases 
BMC Bioinformatics  2012;13:58.
Background
As the volume, complexity and diversity of the information that scientists work with on a daily basis continues to rise, so too does the requirement for new analytic software. The analytic software must resolve the tension between the need to allow for a high level of scientific reasoning and the requirement for an intuitive, easy-to-use tool that does not demand specialist, and often arduous, training. Information visualization provides a solution to this problem, as it allows for direct manipulation of and interaction with diverse and complex data. The challenge facing bioinformatics researchers is how to apply this knowledge to data sets that are continually growing in a field that is rapidly changing.
Results
This paper discusses an approach to the development of visual mining tools capable of supporting the mining of massive data collections used in systems biology research, and also discusses lessons that have been learned providing tools for both local researchers and the wider community. Example tools were developed which are designed to enable the exploration and analyses of both proteomics and genomics based atlases. These atlases represent large repositories of raw and processed experiment data generated to support the identification of biomarkers through mass spectrometry (the PeptideAtlas) and the genomic characterization of cancer (The Cancer Genome Atlas). Specifically the tools are designed to allow for: the visual mining of thousands of mass spectrometry experiments, to assist in designing informed targeted protein assays; and the interactive analysis of hundreds of genomes, to explore the variations across different cancer genomes and cancer types.
Conclusions
The mining of massive repositories of biological data requires the development of new tools and techniques. Visual exploration of the large-scale atlas data sets allows researchers to mine data to find new meaning and make sense at scales from single samples to entire populations. Providing linked task-specific views that allow a user to start from points of interest (from diseases to single genes) enables targeted exploration of thousands of spectra and genomes. As the composition of the atlases changes, and our understanding of the biology increases, new tasks will continually arise. It is therefore important to provide the means to make the data available in a suitable manner in as short a time as possible. We have done this through the use of common visualization workflows, into which we rapidly deploy visual tools. These visualizations follow common metaphors where possible to assist users in understanding the displayed data. Rapid development of tools and task-specific views allows researchers to mine large-scale data almost as quickly as it is produced. Ultimately these visual tools enable new inferences, new analyses and further refinement of the large-scale data being provided in atlases such as PeptideAtlas and The Cancer Genome Atlas.
doi:10.1186/1471-2105-13-58
PMCID: PMC3352268  PMID: 22524279
4.  Empirical Bayes Analysis of Quantitative Proteomics Experiments 
PLoS ONE  2009;4(10):e7454.
Background
Advances in mass spectrometry-based proteomics have enabled the incorporation of proteomic data into systems approaches to biology. However, development of analytical methods has lagged behind. Here we describe an empirical Bayes framework for quantitative proteomics data analysis. The method provides a statistical description of each experiment, including the number of proteins that differ in abundance between 2 samples, the experiment's statistical power to detect them, and the false-positive probability of each protein.
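The per-protein false-positive probability mentioned above can be illustrated with a generic two-group mixture model, a common empirical Bayes construction. The sketch below is not the authors' model: it assumes each protein has a z-like score, that unchanged proteins follow a standard normal distribution, and that changed proteins follow a normal distribution with fixed, hypothetical parameters. Only the null proportion pi0 is estimated from the data, by EM.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_null(z, pi0, alt_mean, alt_sd):
    """P(protein unchanged | score z) under the two-group mixture."""
    f0 = pi0 * normal_pdf(z, 0.0, 1.0)
    f1 = (1.0 - pi0) * normal_pdf(z, alt_mean, alt_sd)
    return f0 / (f0 + f1)


def fit_pi0(scores, alt_mean, alt_sd, n_iter=50):
    """EM estimate of the null proportion with both component densities fixed."""
    pi0 = 0.5
    for _ in range(n_iter):
        pi0 = sum(posterior_null(z, pi0, alt_mean, alt_sd) for z in scores) / len(scores)
    return pi0


# Hypothetical scores: six near zero, two clearly shifted.
scores = [0.1, -0.3, 0.5, -1.1, 0.8, 3.2, 2.9, -0.2]
pi0_hat = fit_pi0(scores, alt_mean=3.0, alt_sd=1.0)
for z in scores:
    print(f"z = {z:5.1f}  P(null | z) = {posterior_null(z, pi0_hat, 3.0, 1.0):.3f}")
```

The "empirical" part is that the prior proportion is learned from all proteins in the experiment and then feeds back into each protein's individual posterior probability.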
Methodology/Principal Findings
We analyzed 2 types of mass spectrometric experiments. First, we showed that the method identified the protein targets of small-molecules in affinity purification experiments with high precision. Second, we re-analyzed a mass spectrometric data set designed to identify proteins regulated by microRNAs. Our results were supported by sequence analysis of the 3′ UTR regions of predicted target genes, and we found that the previously reported conclusion that a large fraction of the proteome is regulated by microRNAs was not supported by our statistical analysis of the data.
Conclusions/Significance
Our results highlight the importance of rigorous statistical analysis of proteomic data, and the method described here provides a statistical framework to robustly and reliably interpret such data.
doi:10.1371/journal.pone.0007454
PMCID: PMC2759080  PMID: 19829701
5.  PEPPI: a peptidomic database of human protein isoforms for proteomics experiments 
BMC Bioinformatics  2010;11(Suppl 6):S7.
Abstract
Background
Protein isoform generation, which may derive from alternative splicing, genetic polymorphism, and posttranslational modification, is an essential source of molecular diversity in eukaryotic cells. Previous studies have shown that protein isoforms play critical roles in disease diagnosis, risk assessment, sub-typing, prognosis, and treatment outcome predictions. Understanding the types, presence, and abundance of different protein isoforms in different cellular and physiological conditions is a major task in functional proteomics, and may pave the way to molecular biomarker discovery for human diseases. In tandem mass spectrometry (MS/MS) based proteomics analysis, peptide peaks with exact matches to protein sequence records in the proteomics database may be identified with mass spectrometry (MS) search software. However, due to limited annotation and poor coverage of protein isoforms in proteomics databases, high-throughput protein isoform identifications, particularly those arising from alternative splicing and genetic polymorphism, have not been possible.
Results
Therefore, we present the PEPtidomics Protein Isoform Database (PEPPI, http://bio.informatics.iupui.edu/peppi), a comprehensive database of computationally-synthesized human peptides that can identify protein isoforms derived from either alternatively spliced mRNA transcripts or SNP variations. We collected genome, pre-mRNA alternative splicing and SNP information from Ensembl. We synthesized in silico isoform transcripts that cover all exons and theoretically possible junctions of exons and introns, as well as all their variations derived from known SNPs. With three case studies, we further demonstrated that the database can help researchers discover and characterize new protein isoform biomarkers from experimental proteomics data.
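The idea of computationally synthesized, variant-specific peptides can be sketched for the SNP case as follows. This is an illustration only, not the PEPPI pipeline: it assumes tryptic cleavage (after K or R, but not before P), a single amino-acid substitution, and an invented protein sequence. It does not cover exon-junction peptides.

```python
import re


def tryptic_peptides(protein, min_len=1):
    """Split a protein after K or R, except when the next residue is P."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if len(p) >= min_len]


def apply_substitution(protein, position, new_residue):
    """Return the sequence with one residue replaced (position is 1-based)."""
    i = position - 1
    return protein[:i] + new_residue + protein[i + 1:]


def variant_specific_peptides(reference, variant):
    """Peptides produced by the variant that the reference does not produce."""
    reference_set = set(tryptic_peptides(reference))
    return [p for p in tryptic_peptides(variant) if p not in reference_set]


# Hypothetical reference sequence and a Y5C substitution.
reference = "MKTAYLAKQRGSPEK"
variant = apply_substitution(reference, 5, "C")
print(tryptic_peptides(reference))
print(variant_specific_peptides(reference, variant))
```

A peptide that appears only in the variant digest is the kind of entry that lets a spectrum search distinguish one isoform from another.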
Conclusions
We developed a new tool for the proteomics community to characterize protein isoforms from MS-based proteomics experiments. By cataloguing each peptide configuration in the PEPPI database, users can study genetic variations and alternative splicing events at the proteome level. They can also batch-download peptide sequences in FASTA format to search for MS/MS spectra derived from human samples. The database can help generate novel hypotheses on molecular risk factors and molecular mechanisms of complex diseases, leading to identification of potentially highly specific protein isoform biomarkers.
doi:10.1186/1471-2105-11-S6-S7
PMCID: PMC3026381  PMID: 20946618
6.  Status of complete proteome analysis by mass spectrometry: SILAC labeled yeast as a model system 
Genome Biology  2006;7(6):R50.
A mass spectrometry analysis of the yeast proteome shows that complex mixture analysis is not limited by sensitivity but by a combination of dynamic range and by effective sequencing speed.
Background
Mass spectrometry has become a powerful tool for the analysis of large numbers of proteins in complex samples, enabling much of proteomics. Due to various analytical challenges, so far no proteome has been sequenced completely. O'Shea, Weissman and co-workers have recently determined the copy number of yeast proteins, making this proteome an excellent model system to study factors affecting coverage.
Results
To probe the yeast proteome in depth and determine factors currently preventing complete analysis, we grew yeast cells, extracted proteins and separated them by one-dimensional gel electrophoresis. Peptides resulting from trypsin digestion were analyzed by liquid chromatography mass spectrometry on a linear ion trap-Fourier transform mass spectrometer with very high mass accuracy and sequencing speed. We achieved unambiguous identification of more than 2,000 proteins, including very low abundant ones. Effective dynamic range was limited to about 1,000 and effective sensitivity to about 500 femtomoles, far from the subfemtomole sensitivity possible with single proteins. We used SILAC (stable isotope labeling by amino acids in cell culture) to generate one-to-one pairs of true peptide signals and investigated if sensitivity, sequencing speed or dynamic range were limiting the analysis.
Conclusion
Advanced mass spectrometry methods can unambiguously identify more than 2,000 proteins in a single proteome. Complex mixture analysis is not limited by sensitivity but by a combination of dynamic range (high abundance peptides preventing sequencing of low abundance ones) and by effective sequencing speed. Substantially increased coverage of the yeast proteome appears feasible with further development in software and instrumentation.
doi:10.1186/gb-2006-7-6-r50
PMCID: PMC1779535  PMID: 16784548
7.  EP3 Fundamentals of Protein Sequence Characterization by Mass Spectrometry 
The first section of the tutorial will describe the instrumentation typically used in biological mass spectrometry applications related to protein identification. We focus on the relevant ionization techniques, common mass analyzers, and sample introduction systems. Attention will be given to properties, such as mass accuracy and mass resolution, which are important to protein characterization and database search strategies for protein identification. Practical considerations regarding the selection and use of instruments as well as troubleshooting information will be offered throughout the presentation.
The fundamentals of basic protein sequence characterization, including post-translational modifications, by mass spectrometry will be presented in the second section of the tutorial. Emphasis is placed on the use of tandem mass spectrometry at the peptide level to confirm and in some cases derive partial peptide sequence, identify post-translationally modified sequences, and localize the specific site of attachment. We will describe the basic principles of peptide fragmentation by collision-induced dissociation and how to use these principles to interpret MS/MS spectra. Basic sample preparation protocols compatible with mass spectrometry analysis will be described.
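Interpreting a collision-induced dissociation spectrum usually starts from the expected b- and y-ion series of a candidate peptide. The sketch below computes singly charged, monoisotopic m/z values for an unmodified peptide; the mass table covers the 20 standard residues, and the example peptide is arbitrary. Modifications, higher charge states and neutral losses are deliberately left out.

```python
# Monoisotopic residue masses in daltons (standard amino acids only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i - 1] is b_i (first i residues); y_ions[i - 1] is y_i (last i residues).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("PEPTIDE")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:9.4f}   y{i} = {y:9.4f}")
```

A useful consistency check when reading spectra is that each complementary pair b_i and y_(n−i) sums to the neutral peptide mass plus two protons.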
The third section of the tutorial will focus on mass spectrometric analyses of protein mixtures (proteomes). Besides the sheer number of proteins, the range of concentrations in certain samples is frequently an impediment to a complete analysis. Various fractionation, capture, and depletion methods will be described for dealing with very complex protein mixtures. Some of these capture methods also provide additional information regarding post-translational modifications. A brief description of database search methods for protein identification will be followed by a more extensive discussion of validating the search results. Finally, brief descriptions of protein quantitation methods will be presented, and their various advantages and disadvantages will be discussed.
PMCID: PMC2291869
8.  Cell-type specific analysis of translating RNAs in developing flowers reveals new levels of control 
Combining translating ribosome affinity purification with RNA-seq for cell-specific profiling of translating RNAs in developing flowers. Cell type comparisons of cell type-specific hormone responses, promoter motifs, coexpressed cognate binding factor candidates, and splicing isoforms. Widespread post-transcriptional regulation at both the intron splicing and translational stages. A new class of noncoding RNAs associated with polysomes.
What constitutes a differentiated cell type? How much do cell types differ in their transcription of genes? The development and functions of tissues rely on constant interactions among distinct and nonequivalent cell types. Answering these questions will require quantitative information on transcriptomes, proteomes, protein–protein interactions, protein–nucleic acid interactions, and metabolomes at cellular resolution. The systems approaches emerging in biology promise to explain properties of biological systems based on genome-wide measurements of expression, interaction, regulation, and metabolism. To facilitate a systems approach, it is essential first to capture such components in a global manner, ideally at cellular resolution.
Recently, microarray analysis of transcriptomes has been extended to a cellular level of resolution by using laser microdissection or fluorescence-activated sorting (for review, see Nelson et al, 2008). These methods have been limited by stresses associated with cellular separation and isolation procedures, and biases associated with mandatory RNA amplification steps. A newly developed method, translating ribosome affinity purification (TRAP; Zanetti et al, 2005; Heiman et al, 2008; Mustroph et al, 2009), circumvents these problems by epitope-tagging a ribosomal protein in specific cellular domains to selectively purify polysomes. We combined TRAP with deep sequencing, which we term TRAP-seq, to provide cell-level spatiotemporal maps for Arabidopsis early floral development at single-base resolution.
Flower development in Arabidopsis has been studied extensively and is one of the best understood aspects of plant development (for review, see Krizek and Fletcher, 2005). Genetic analysis of homeotic mutants established the ABC model, in which three classes of regulatory genes, A, B and C, work in a combinatorial manner to confer organ identities of four whorls (Coen and Meyerowitz, 1991). Each class of regulatory gene is expressed in a specific and evolutionarily conserved domain, and the action of the class A, B and C genes is necessary for specification of organ identity (Figure 1A).
Using TRAP-seq, we purified cell-specific translating mRNA populations, which we and others call the translatome, from the A, B and C domains of early developing flowers, in which floral patterning and the specification of floral organs is established. To achieve temporal specificity, we used a floral induction system to facilitate collection of early stage flowers (Wellmer et al, 2006). The combination of TRAP-seq with domain-specific promoters and this floral induction system enabled fine spatiotemporal isolation of translating mRNA in specific cellular domains, and at specific developmental stages.
Multiple lines of evidence confirmed the specificity of this approach, including detecting the expression in expected domains but not in other domains for well-studied flower marker genes and known physiological functions (Figures 1B–D and 2A–C). Furthermore, we provide numerous examples from flower development in which a spatiotemporal map of rigorously comparable cell-specific translatomes makes possible new views of the properties of cell domains not evident in data obtained from whole organs or tissues, including patterns of transcription and cis-regulation, new physiological differences among cell domains and between flower stages, putative hormone-active centers, and splicing events specific for flower domains (Figure 2A–D). Such findings may provide new targets for reverse genetics studies and may aid in the formulation and validation of interaction and pathway networks.
Besides cellular heterogeneity, the transcriptome is regulated at several steps through the life of mRNA molecules, which are not directly accessible through traditional transcriptome profiling of total mRNA abundance. By comparing the translatome and transcriptome, we integratively profiled two key posttranscriptional control points, intron splicing and translation state. From our translatome-wide profiling, we (i) confirmed that both posttranscriptional regulation control points were used by a large portion of the transcriptome; (ii) identified a number of cis-acting features within the coding or noncoding sequences that correlate with splicing or translation state; and (iii) revealed correlation between each regulation mechanism and gene function. Our transcriptome-wide surveys have highlighted target genes whose transcripts are probably under extensive posttranscriptional regulation during flower development.
Finally, we reported the finding of a large number of polysome-associated ncRNAs. About one-third of all annotated ncRNAs in the Arabidopsis genome were observed to co-purify with polysomes. Coding capacity analysis confirmed that most of them are real ncRNAs without conserved ORFs. The group of polysome-associated ncRNAs reported in this study is a potential new addition to the expanding riboregulator catalog; they could have roles in translational regulation during early flower development.
Determining both the expression levels of mRNA and the regulation of its translation is important in understanding specialized cell functions. In this study, we describe both the expression profiles of cells within spatiotemporal domains of the Arabidopsis thaliana flower and the post-transcriptional regulation of these mRNAs, at nucleotide resolution. We express a tagged ribosomal protein under the promoters of three master regulators of flower development. By precipitating tagged polysomes, we isolated cell type-specific mRNAs that are probably translating, and quantified those mRNAs through deep sequencing. Cell type comparisons identified known cell-specific transcripts and uncovered many new ones, from which we inferred cell type-specific hormone responses, promoter motifs and coexpressed cognate binding factor candidates, and splicing isoforms. By comparing translating mRNAs with steady-state overall transcripts, we found evidence for widespread post-transcriptional regulation at both the intron splicing and translational stages. Sequence analyses identified structural features associated with each step. Finally, we identified a new class of noncoding RNAs associated with polysomes. Findings from our profiling lead to new hypotheses in the understanding of flower development.
doi:10.1038/msb.2010.76
PMCID: PMC2990639  PMID: 20924354
Arabidopsis; flower; intron; transcriptome; translation
9.  Clusters of Internally Primed Transcripts Reveal Novel Long Noncoding RNAs 
PLoS Genetics  2006;2(4):e37.
Non-protein-coding RNAs (ncRNAs) are increasingly being recognized as having important regulatory roles. Although much recent attention has focused on tiny 22- to 25-nucleotide microRNAs, several functional ncRNAs are orders of magnitude larger in size. Examples of such macro ncRNAs include Xist and Air, which in mouse are 18 and 108 kilobases (Kb), respectively. We surveyed the 102,801 FANTOM3 mouse cDNA clones and found that Air and Xist were present not as single, full-length transcripts but as a cluster of multiple, shorter cDNAs, which were unspliced, had little coding potential, and were most likely primed from internal adenine-rich regions within longer parental transcripts. We therefore conducted a genome-wide search for regional clusters of such cDNAs to find novel macro ncRNA candidates. Sixty-six regions were identified, each of which mapped outside known protein-coding loci and which had a mean length of 92 Kb. We detected several known long ncRNAs within these regions, supporting the basic rationale of our approach. In silico analysis showed that many regions had evidence of imprinting and/or antisense transcription. These regions were significantly associated with microRNAs and transcripts from the central nervous system. We selected eight novel regions for experimental validation by northern blot and RT-PCR and found that the majority represent previously unrecognized noncoding transcripts that are at least 10 Kb in size and predominantly localized in the nucleus. Taken together, the data not only identify multiple new ncRNAs but also suggest the existence of many more macro ncRNAs like Xist and Air.
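A genome-wide search for regional clusters of cDNAs can be approximated by grouping mapped intervals that lie close together on the same chromosome. The sketch below is a generic interval-clustering routine, not the authors' procedure: the gap threshold, the minimum cluster size and the coordinates are hypothetical, and the study's additional filters (unspliced, low coding potential, internal priming) are not modeled.

```python
def cluster_intervals(intervals, max_gap, min_members):
    """Group (chrom, start, end) intervals into regional clusters.

    Intervals on the same chromosome are chained together when the next one
    starts within max_gap bases of the current cluster's end. Clusters with
    fewer than min_members intervals are discarded.
    """
    clusters = []
    current = None
    for chrom, start, end in sorted(intervals):
        if (current is not None and chrom == current["chrom"]
                and start - current["end"] <= max_gap):
            current["end"] = max(current["end"], end)
            current["members"] += 1
        else:
            if current is not None and current["members"] >= min_members:
                clusters.append(current)
            current = {"chrom": chrom, "start": start, "end": end, "members": 1}
    if current is not None and current["members"] >= min_members:
        clusters.append(current)
    return clusters


# Hypothetical cDNA alignments.
cdnas = [
    ("chr1", 100, 200), ("chr1", 250, 400), ("chr1", 10000, 10100),
    ("chr2", 150, 300), ("chr1", 420, 500),
]
for region in cluster_intervals(cdnas, max_gap=100, min_members=2):
    print(region)
```

Each surviving region would then be a candidate locus to check against known protein-coding annotations.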
Synopsis
The human genome has been sequenced, and, intriguingly, less than 2% specifies the information for the basic protein building blocks of our bodies. So, what does the other 98% do? It now appears that the mammalian genome also specifies the instructions for many previously undiscovered “non protein-coding RNA” (ncRNA) genes. However, what these ncRNAs do is largely unknown. In recent years, strategies have been designed that have successfully identified hundreds of short ncRNAs—termed microRNAs—many of which have since been shown to act as genetic regulators. Also known to be functionally important are a handful of ncRNAs orders of magnitude larger in size than microRNAs. The availability of complete genome and comprehensive transcript sequences allows for the systematic discovery of more large ncRNAs. The authors developed a computational strategy to screen the mouse genome and identify large ncRNAs. They detected existing large ncRNAs, thus validating their approach, but, more importantly, discovered more than 60 other candidates, some of which were subsequently confirmed experimentally. This work opens the door to a virtually unexplored world of large ncRNAs and beckons future experimental work to define the cellular functions of these molecules.
doi:10.1371/journal.pgen.0020037
PMCID: PMC1449886  PMID: 16683026
10.  Unique Signatures of Long Noncoding RNA Expression in Response to Virus Infection and Altered Innate Immune Signaling 
mBio  2010;1(5):e00206-10.
Studies of the host response to virus infection typically focus on protein-coding genes. However, non-protein-coding RNAs (ncRNAs) are transcribed in mammalian cells, and the roles of many of these ncRNAs remain enigmas. Using next-generation sequencing, we performed a whole-transcriptome analysis of the host response to severe acute respiratory syndrome coronavirus (SARS-CoV) infection across four founder mouse strains of the Collaborative Cross. We observed differential expression of approximately 500 annotated, long ncRNAs and 1,000 nonannotated genomic regions during infection. Moreover, studies of a subset of these ncRNAs and genomic regions showed the following. (i) Most were similarly regulated in response to influenza virus infection. (ii) They had distinctive kinetic expression profiles in type I interferon receptor and STAT1 knockout mice during SARS-CoV infection, including unique signatures of ncRNA expression associated with lethal infection. (iii) Over 40% were similarly regulated in vitro in response to both influenza virus infection and interferon treatment. These findings represent the first discovery of the widespread differential expression of long ncRNAs in response to virus infection and suggest that ncRNAs are involved in regulating the host response, including innate immunity. At the same time, virus infection models provide a unique platform for studying the biology and regulation of ncRNAs.
IMPORTANCE
Most studies examining the host transcriptional response to infection focus only on protein-coding genes. However, there is growing evidence that thousands of non-protein-coding RNAs (ncRNAs) are transcribed from mammalian genomes. While most attention to the involvement of ncRNAs in virus-host interactions has been on small ncRNAs such as microRNAs, it is becoming apparent that many long ncRNAs (>200 nucleotides [nt]) are also biologically important. These long ncRNAs have been found to have widespread functionality, including chromatin modification and transcriptional regulation and serving as the precursors of small RNAs. With the advent of next-generation sequencing technologies, whole-transcriptome analysis of the host response, including long ncRNAs, is now possible. Using this approach, we demonstrated that virus infection alters the expression of numerous long ncRNAs, suggesting that these RNAs may be a new class of regulatory molecules that play a role in determining the outcome of infection.
doi:10.1128/mBio.00206-10
PMCID: PMC2962437  PMID: 20978541
11.  Identification of CRISPR and riboswitch related RNAs among novel noncoding RNAs of the euryarchaeon Pyrococcus abyssi 
BMC Genomics  2011;12:312.
Background
Noncoding RNA (ncRNA) has been recognized as an important regulator of gene expression networks in Bacteria and Eucaryota. Little is known about ncRNA in thermococcal archaea except for the eukaryotic-like C/D and H/ACA modification guide RNAs.
Results
Using a combination of in silico and experimental approaches, we identified and characterized novel P. abyssi ncRNAs transcribed from 12 intergenic regions, ten of which are conserved throughout the Thermococcales. Several of them accumulate in the late-exponential phase of growth. Analysis of the genomic context and sequence conservation amongst related thermococcal species revealed two novel P. abyssi ncRNA families. The CRISPR family is comprised of crRNAs expressed from two of the four P. abyssi CRISPR cassettes. The 5'UTR derived family includes four conserved ncRNAs, two of which have features similar to known bacterial riboswitches. Several of the novel ncRNAs have sequence similarities to orphan OrfB transposase elements. Based on RNA secondary structure predictions and experimental results, we show that three of the twelve ncRNAs include Kink-turn RNA motifs, arguing for a biological role of these ncRNAs in the cell. Furthermore, our results show that several of the ncRNAs are subjected to processing events by enzymes that remain to be identified and characterized.
Conclusions
This work proposes a revised annotation of CRISPR loci in P. abyssi and expands our knowledge of ncRNAs in the Thermococcales, thus providing a starting point for studies needed to elucidate their biological function.
doi:10.1186/1471-2164-12-312
PMCID: PMC3124441  PMID: 21668986
12.  Analytical Utility of Mass Spectral Binning in Proteomic Experiments by SPectral Immonium Ion Detection (SPIID)*  
Unambiguous identification of tandem mass spectra is a cornerstone in mass-spectrometry-based proteomics. As the study of post-translational modifications (PTMs) by means of shotgun proteomics progresses in depth and coverage, the ability to correctly identify PTM-bearing peptides is essential, increasing the demand for advanced data interpretation. Several PTMs are known to generate unique fragment ions during tandem mass spectrometry, the so-called diagnostic ions, which unequivocally identify a given mass spectrum as related to a specific PTM. Although such ions offer tremendous analytical advantages, algorithms to decipher MS/MS spectra for the presence of diagnostic ions in an unbiased manner are currently lacking. Here, we present a systematic spectral-pattern-based approach for the discovery of diagnostic ions and new fragmentation mechanisms in shotgun proteomics datasets. The developed software tool is designed to analyze large sets of high-resolution peptide fragmentation spectra independent of the fragmentation method, instrument type, or protease employed. To benchmark the software tool, we analyzed large higher-energy collisional activation dissociation datasets of samples containing phosphorylation, ubiquitylation, SUMOylation, formylation, and lysine acetylation. Using the developed software tool, we were able to identify known diagnostic ions by comparing histograms of modified and unmodified peptide spectra. Because the investigated tandem mass spectra data were acquired with high mass accuracy, unambiguous interpretation and determination of the chemical composition for the majority of detected fragment ions was feasible. Collectively we present a freely available software tool that allows for comprehensive and automatic analysis of analogous product ions in tandem mass spectra and systematic mapping of fragmentation mechanisms related to common amino acids.
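The comparison of histograms of modified and unmodified peptide spectra can be sketched as follows. This is a simplified illustration, not the SPIID implementation: the function names, the `bin_width` and `min_diff` parameters, and the toy peak lists are assumptions. Each spectrum is reduced to a list of fragment m/z values, the m/z axis is binned, and bins that occur much more often in modified spectra are reported as candidate diagnostic ions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def bin_frequencies(spectra: Iterable[List[float]],
                    bin_width: float = 0.01) -> Dict[int, float]:
    """Fraction of spectra with at least one peak in each m/z bin."""
    spectra = list(spectra)
    counts: Counter = Counter()
    for peaks in spectra:
        # Count each bin once per spectrum, however many peaks fall in it.
        counts.update({int(round(mz / bin_width)) for mz in peaks})
    return {b: n / len(spectra) for b, n in counts.items()}


def candidate_diagnostic_bins(modified: Iterable[List[float]],
                              unmodified: Iterable[List[float]],
                              bin_width: float = 0.01,
                              min_diff: float = 0.5) -> List[Tuple[float, float]]:
    """Return (bin m/z, frequency difference) for bins enriched in modified spectra.

    min_diff is an illustrative threshold, not a value from the paper.
    """
    mod = bin_frequencies(modified, bin_width)
    unmod = bin_frequencies(unmodified, bin_width)
    hits = []
    for b, freq in mod.items():
        diff = freq - unmod.get(b, 0.0)
        if diff >= min_diff:
            hits.append((b * bin_width, diff))
    # Largest enrichment first.
    return sorted(hits, key=lambda h: -h[1])


# Toy peak lists, invented for this example.
modified = [[126.091, 300.2], [126.092, 410.5], [126.09, 300.2]]
unmodified = [[300.2, 500.1], [410.5], [300.2]]
hits = candidate_diagnostic_bins(modified, unmodified)
```

With high-mass-accuracy data a narrow bin keeps unrelated fragments apart; the appropriate width depends on the instrument and is left as a parameter here.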
doi:10.1074/mcp.O113.035915
PMCID: PMC4125726  PMID: 24895383
13.  The majority of total nuclear-encoded non-ribosomal RNA in a human cell is 'dark matter' un-annotated RNA 
BMC Biology  2010;8:149.
Background
Discovery that the transcriptional output of the human genome is far more complex than predicted by the current set of protein-coding annotations and that most RNAs produced do not appear to encode proteins has transformed our understanding of genome complexity and suggests new paradigms of genome regulation. However, the fraction of all cellular RNA whose function we do not understand and the fraction of the genome that is utilized to produce that RNA remain controversial. This is not simply a bookkeeping issue because the degree to which this un-annotated transcription is present has important implications with respect to its biologic function and to the general architecture of genome regulation. For example, efforts to elucidate how non-coding RNAs (ncRNAs) regulate genome function will be compromised if that class of RNAs is dismissed as simply 'transcriptional noise'.
Results
We show that the relative mass of RNA whose function and/or structure we do not understand (the so-called 'dark matter' RNAs), as a proportion of all non-ribosomal, non-mitochondrial human RNA (mt-RNA), can be greater than that of protein-encoding transcripts. This observation is obscured in studies that focus only on polyA-selected RNA, a method that enriches for protein-coding RNAs and at the same time discards the vast majority of RNA prior to analysis. We further show the presence of a large number of very long, abundantly transcribed regions (100's of kb) in intergenic space and further show that expression of these regions is associated with neoplastic transformation. These overlap some regions found previously in normal human embryonic tissues and raise an interesting hypothesis as to the function of these ncRNAs in both early development and neoplastic transformation.
Conclusions
We conclude that 'dark matter' RNA can constitute the majority of non-ribosomal, non-mitochondrial-RNA and a significant fraction arises from numerous very long, intergenic transcribed regions that could be involved in neoplastic transformation.
doi:10.1186/1741-7007-8-149
PMCID: PMC3022773  PMID: 21176148
14.  Relevant phosphoproteomic and mass spectrometry: approaches useful in clinical research 
Background
"It's not what we do, it's the way that we do it". Never has this maxim been truer in proteomics than now. Mass Spectrometry-based proteomics/phosphoproteomics tools are critical to understand the structure and dynamics (spatial and temporal) of signalling that engages and migrates through the entire proteome. Approaches such as affinity purification followed by Mass Spectrometry (MS) have been used to elucidate relevant biological questions disease vs. health. Thousands of proteins interact via physical and chemical association. Moreover, certain proteins can covalently modify other proteins post-translationally. These post-translational modifications (PTMs) ultimately give rise to the emergent functions of cells in sequence, space and time.
Findings
Understanding the functions of phosphorylated proteins thus requires one to study proteomes as linked-systems rather than collections of individual protein molecules. Indeed, the interacting proteome or protein-network knowledge has recently received much attention, as network-systems (signalling pathways) are effective snapshots in time, of the proteome as a whole. MS approaches are clearly essential, in spite of the difficulties of some low abundance proteins for future clinical advances.
Conclusion
Clinical proteomics-MS has come a long way in the past decade in terms of technology/platform development, protein chemistry, and together with bioinformatics and other OMICS tools to identify molecular signatures of diseases based on protein pathways and signalling cascades. Hence, there is great promise for disease diagnosis, prognosis, and prediction of therapeutic outcome on an individualized basis. However, and as a general rule, without correct study design, strategy and implementation of robust analytical methodologies, the efforts, efficiency and expectations to make biomarkers (especially phosphorylated kinases) a useful reality in the near future, can easily be hampered.
doi:10.1186/2001-1326-1-2
PMCID: PMC3552569  PMID: 23369602
Phosphoproteomics; Mass spectrometry; Clinical research
15.  Quantification of mRNA and protein and integration with protein turnover in a bacterium 
Determination of the average cellular copy number of 400 proteins under different growth conditions and integration with protein turnover and absolute mRNA levels reveals the dynamics of protein expression in the genome-reduced bacterium Mycoplasma pneumoniae.
Our study provides a fine-grained, quantitative picture in unprecedented detail in an established model organism for systems-wide studies. Our integrative approach reveals a novel, dynamic view on the processes, interactions and regulations underlying the central dogma pathway and the composition of protein complexes. Simulations using our quantitative data on mRNA, protein and turnover show how an organism copes with stochastic noise in gene expression in vivo. Our data serve as an important resource for colleagues both within our field of research and in related disciplines.
A hallmark of Systems Biology is the integration of diverse, large quantitative data sets with the aim to gain novel insights into how biological processes work. We measured individual mRNA and protein abundances as well as protein turnover in the bacterium Mycoplasma pneumoniae. This human pathogen is an ideal model organism for organism-wide studies. It can be readily cultured under laboratory conditions and it has a very small genome with only 690 protein-coding genes. This comparably low complexity allows for the exhaustive analysis of major cellular biomolecules, avoiding constraints introduced by limitations of available analysis techniques.
Using a recently developed mass spectrometry-based approach, we determined the average cellular copy number for over 400 individual proteins under different growth and stress conditions. The 20 most abundant proteins, including Elongation factor Tu, cellular chaperones, and proteins involved in metabolizing glucose, the major energy source of M. pneumoniae, account for nearly 44% of the total cellular protein mass. We observed abundance changes of many expected and several unexpected proteins in response to cellular stress, such as heat shock, DNA damage and osmotic stress, as well as along batch culture growth over 4 days.
Integration of the protein abundance data with quantitative mRNA measurements revealed a modest correlation between these two classes of biomolecules. However, for several classical stress-induced proteins, we observed a correlated induction of mRNA and protein in response to heat shock. A focused analysis of mRNA–protein abundance dynamics during batch culture growth suggested that the regulation of gene expression is largely decoupled from protein dynamics in M. pneumoniae, indicating extensive post-transcriptional and post-translational regulation influencing the cellular mRNA–protein ratios.
To investigate the factors influencing the cellular protein abundance, we measured individual protein turnover rates by mass spectrometry using a label-chase approach involving stable isotope-labelled amino acids. The average half-life of a protein in M. pneumoniae is 23 h. Based on the measured quantitative mRNA data, the protein abundances and their half-lives, we established an ordinary differential equations model for the estimation of individual in vivo protein degradation and translation efficiency rates. We found that translation efficiency rather than protein turnover is the dominating factor influencing protein abundance. Using our abundance and turnover data, we additionally performed stochastic simulations of gene expression. We observed that long protein half-life and low translational efficiency buffer gene expression noise propagating from low cellular mRNA levels in vivo.
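As a rough illustration of the kind of ordinary differential equation model mentioned above, consider the simplest one-protein form, dP/dt = k_tl * mRNA - k_deg * P, with k_deg = ln(2) / half-life. This form, the function names, and the parameter values are assumptions made for the sketch; the abstract does not give the authors' actual equations, and a fuller model would likely also account for dilution by cell growth.

```python
import math


def degradation_rate(half_life_h: float) -> float:
    """First-order degradation rate constant (per hour) from a half-life."""
    return math.log(2) / half_life_h


def steady_state_protein(mrna: float, k_tl: float, half_life_h: float) -> float:
    """Steady state of dP/dt = k_tl*mrna - k_deg*P, i.e. P* = k_tl*mrna/k_deg."""
    return k_tl * mrna / degradation_rate(half_life_h)


def protein_over_time(mrna: float, k_tl: float, half_life_h: float,
                      p0: float, t_h: float) -> float:
    """Analytic solution of the same equation for constant mRNA level."""
    k_deg = degradation_rate(half_life_h)
    p_ss = k_tl * mrna / k_deg
    return p_ss + (p0 - p_ss) * math.exp(-k_deg * t_h)


# Hypothetical numbers: 1 mRNA copy, 2 proteins per mRNA per hour,
# and the 23 h average half-life reported in the abstract.
p_ss = steady_state_protein(mrna=1.0, k_tl=2.0, half_life_h=23.0)
p_23h = protein_over_time(mrna=1.0, k_tl=2.0, half_life_h=23.0, p0=0.0, t_h=23.0)
```

In this simplified form the steady-state abundance scales linearly with both the translation rate and the half-life, and a protein starting from zero reaches half of its steady-state level after one half-life, so long half-lives make abundance respond slowly to fluctuations in mRNA level.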
We compared the abundance ratios of proteins associating into complexes in vivo with their expected functional stoichiometries. We observed that for stable protein complexes, such as the GroEL/ES chaperonin or DNA gyrase, our measured abundance ratios reflected the expected subunit stoichiometries. More dynamic protein complexes, such as the DnaK/J/GrpE chaperone system or RNA polymerase, showed several unusual subunit ratios, pointing towards transient interaction of sub-stoichiometric subunits for function. A detailed, quantitative analysis of the ribosome, the largest cellular protein complex, revealed large abundance differences of the 51 subunits. This observation indicates a multi-functionality for several, abundant ribosomal proteins.
Finally, a comparison of the determined average cellular protein abundances with a different pathogenic bacterium, Leptospira interrogans, revealed that cellular protein abundances closely reflect their respective lifestyles.
Our study represents an organism-wide, quantitative analysis of cellular protein abundances. Integrating our proteomics data with determined mRNA levels and protein turnover rates reveals insights into the dynamic interplay and regulation of mRNA and proteins, the central biomolecules of a cell.
Biological function and cellular responses to environmental perturbations are regulated by a complex interplay of DNA, RNA, proteins and metabolites inside cells. To understand these central processes in living systems at the molecular level, we integrated experimentally determined abundance data for mRNA, proteins, as well as individual protein half-lives from the genome-reduced bacterium Mycoplasma pneumoniae. We provide a fine-grained, quantitative analysis of basic intracellular processes under various external conditions. Proteome composition changes in response to cellular perturbations reveal specific stress response strategies. The regulation of gene expression is largely decoupled from protein dynamics and translation efficiency has a higher regulatory impact on protein abundance than protein turnover. Stochastic simulations using in vivo data show how low translation efficiency and long protein half-lives effectively reduce biological noise in gene expression. Protein abundances are regulated in functional units, such as complexes or pathways, and reflect cellular lifestyles. Our study provides a detailed integrative analysis of average cellular protein abundances and the dynamic interplay of mRNA and proteins, the central biomolecules of a cell.
doi:10.1038/msb.2011.38
PMCID: PMC3159969  PMID: 21772259
mRNA–protein; Mycoplasma pneumoniae; protein homeostasis; protein turnover; quantitative proteomics
16.  The Coding and Noncoding Architecture of the Caulobacter crescentus Genome 
PLoS Genetics  2014;10(7):e1004463.
Caulobacter crescentus undergoes an asymmetric cell division controlled by a genetic circuit that cycles in space and time. We provide a universal strategy for defining the coding potential of bacterial genomes by applying ribosome profiling, RNA-seq, global 5′-RACE, and liquid chromatography coupled with tandem mass spectrometry (LC-MS) data to the 4-megabase C. crescentus genome. We mapped transcript units at single base-pair resolution using RNA-seq together with global 5′-RACE. Additionally, using ribosome profiling and LC-MS, we mapped translation start sites and coding regions with near complete coverage. We found most start codons lacked corresponding Shine-Dalgarno sites although ribosomes were observed to pause at internal Shine-Dalgarno sites within the coding DNA sequence (CDS). These data suggest a more prevalent use of the Shine-Dalgarno sequence for ribosome pausing rather than translation initiation in C. crescentus. Overall 19% of the transcribed and translated genomic elements were newly identified or significantly improved by this approach, providing a valuable genomic resource to elucidate the complete C. crescentus genetic circuitry that controls asymmetric cell division.
Author Summary
Caulobacter crescentus is a model system for studying asymmetric cell division, a fundamental process that, through differential gene expression in the two daughter cells, enables the generation of cells with different fates. To explore how the genome directs and maintains asymmetry upon cell division, we performed a coordinated analysis of multiple genomic and proteomic datasets to identify the RNA and protein coding features in the C. crescentus genome. Our integrated analysis identifies many new genetic regulatory elements, adding significant regulatory complexity to the C. crescentus genome. Surprisingly, 75.4% of protein coding genes lack a canonical translation initiation sequence motif (the Shine-Dalgarno site) which hybridizes to the 3′ end of the ribosomal RNA allowing translation initiation. We find Shine-Dalgarno sites primarily inside of genes where they cause translating ribosomes to pause, possibly allowing nascent proteins to correctly fold. With our detailed map of genomic transcription and translation elements, a systems view of the genetic network that controls asymmetric cell division is within reach.
doi:10.1371/journal.pgen.1004463
PMCID: PMC4117421  PMID: 25078267
17.  Stanford University Mass Spectrometry 
Journal of Biomolecular Techniques : JBT  2010;21(3 Suppl):S70-S71.
CF-11
Stanford University Mass Spectrometry (SUMS) is Stanford University's central core facility for mass spectrometry-based analysis. SUMS wears several hats, as the Vincent Coates Foundation Mass Spectrometry Laboratory named in honor of a generous gift from Vincent and Stella Coates; a Stanford Bio-X core facility, embodying the Bio-X spirit of interdisciplinary communication and collaboration; and the Proteomics Shared Resource of the Stanford Comprehensive Cancer Center. The laboratory's expertise and support are available to researchers throughout Stanford University, Stanford Medical Center, and beyond. SUMS users have broad analytical needs and interests, ranging from general qualitative analysis to targeted quantitative assays, and proteomics to metabolomics. A total of 11 mass spectrometers interfaced with analytical- and capillary-scale HPLC and UPLC, as well as GC front ends, support these research projects: Single quad GC-MS and LC-MS instruments are operated as open access systems, available 24/7 to trained users. Projects are run by staff scientists on one or more of single quad, ion trap, triple quad, Q-Tof, hybrid Orbitrap, and benchtop Orbitrap instruments, matching the requirements of the projects to the strengths of the instrumentation. The expertise and enthusiasm of the SUMS staff are the bedrock of the laboratory. In addition to making available state-of-the-art, user-friendly facilities and services, SUMS enables education, method development, and new applications development, designed to meet the rapidly evolving needs of researchers.
PMCID: PMC2918057
18.  OmicsHub Proteomics Software Tool 
RP-6
OmicsHub Proteomics integrates all the steps of a mass spectrometry experiment in a single platform, reducing time and data management complexity. The proteomics data automation and data management/analysis provided by OmicsHub Proteomics solve the typical problems your lab members encounter on a daily basis and make life easier when performing tasks such as multiple search engine support, pathways integration or custom report generation for external customers. OmicsHub has been designed as a central data management system to collect, analyze and annotate proteomics experimental data, enabling users to automate tasks. OmicsHub Proteomics helps laboratories to easily meet proteomics standards such as PRIDE or FuGE and works with controlled vocabulary experiment annotation. The software enables your lab members to take greater advantage of the unique capabilities of the Mascot and Phenyx search engines for protein identification. Multiple searches can be launched at once, allowing peak list data from several spots or chromatograms to be sent concurrently to Mascot/Phenyx. OmicsHub Proteomics works for both LC and gel workflows. The system allows proteomics data generated from different mass spectrometry instruments to be stored and compared in a single platform instead of requiring specific software for each of them. It is a web application that installs on a single server and needs just a web browser for access. All experimental actions are user-stamped and date-stamped, allowing audit tracking of every action performed in OmicsHub. Some of the main features of OmicsHub Proteomics are protein identification, biological annotation, report customization, PRIDE standard, pathways integration, grouping of protein results to remove redundancy, peak filtering and FDR cutoff for decoy databases. OmicsHub Proteomics is flexible enough for parsers for new file formats to be easily added, and fits your budget, with a very competitive price for its perpetual license.
PMCID: PMC2918172
19.  Computational prediction of novel non-coding RNAs in Arabidopsis thaliana 
BMC Bioinformatics  2009;10(Suppl 1):S36.
Background
Non-coding RNA (ncRNA) genes do not encode proteins but produce functional RNA molecules that play crucial roles in many key biological processes. Recent genome-wide transcriptional profiling studies using tiling arrays in organisms such as human and Arabidopsis have revealed a great number of transcripts, a large portion of which have little or no capability to encode proteins. This unexpected finding suggests that the currently known repertoire of ncRNAs may only represent a small fraction of ncRNAs of the organisms. Thus, efficient and effective prediction of ncRNAs has become an important task in bioinformatics in recent years. Among the available computational methods, the comparative genomic approach seems to be the most powerful to detect ncRNAs. The recent completion of the sequencing of several major plant genomes has made the approach possible for plants.
Results
We have developed a pipeline to predict novel ncRNAs in the Arabidopsis (Arabidopsis thaliana) genome. It starts by comparing the expressed intergenic regions of Arabidopsis as provided in two whole-genome high-density oligo-probe arrays from the literature with the intergenic nucleotide sequences of all completely sequenced plant genomes including rice (Oryza sativa), poplar (Populus trichocarpa), grape (Vitis vinifera), and papaya (Carica papaya). By using multiple sequence alignment, a popular ncRNA prediction program (RNAz), wet-bench experimental validation, protein-coding potential analysis, and stringent screening against various ncRNA databases, the pipeline resulted in 16 families of novel ncRNAs (with a total of 21 ncRNAs).
Conclusion
In this paper, we undertake a genome-wide search for novel ncRNAs in the genome of Arabidopsis by a comparative genomics approach. The identified novel ncRNAs are evolutionarily conserved between Arabidopsis and other recently sequenced plants, and may conduct interesting novel biological functions.
doi:10.1186/1471-2105-10-S1-S36
PMCID: PMC2648795  PMID: 19208137
20.  The quantitative proteomes of human-induced pluripotent stem cells and embryonic stem cells 
An in-depth proteomic comparison of human-induced pluripotent stem cells, and their parent fibroblast cells, with embryonic stem cells shows that the reprogramming process comprehensively remodels protein expression levels, creating cells that closely resemble natural stem cells.
We present here a large proteomic characterization of human embryonic stem cells, human-induced pluripotent stem cells and their parental fibroblast cell lines. Overall, 97.8% of the 2683 quantified proteins in four experiments showed no significant differences in abundance between hESC and hiPSC, highlighting the high similarity of these pluripotent cell lines. In total, 58 proteins were found significantly differentially expressed between hiPSCs and hESCs. The observed low overlap of these proteins with previous transcriptomic studies suggests that those differences do not reflect a recurrent molecular signature.
Human embryonic stem cells (hESCs) are capable of self-renewal and multi-lineage differentiation. However, the use of hESCs for clinical treatment entails ethical issues as they are derived from human embryos. Recently, reprogramming of somatic cells to an embryonic stem cell-like state, named induced pluripotent stem cells (iPSCs), was achieved through ectopic expression of defined factors. In addition to their clinical potential, hiPSCs represent a unique tool to develop cellular models for human diseases as well. Although current functional assays (e.g., tetraploid complementation) have confirmed the pluripotency of hiPSCs, there might still be significant differences (e.g., differentiation potential) when compared with their natural hESC counterparts. Consequently, an extensive molecular characterization to address differences and similarities between these two pluripotent cell lines seems to be a prerequisite before any clinical application is conducted. Although great efforts, mainly at the genomic level, have been made to address how similar hESCs and hiPSCs are, the definite answer to this fundamental question is currently still debated. Direct assessment of protein levels has yet to be incorporated into these integrative systems-level analyses. Protein levels are tuned by intricate mechanisms of gene expression regulation and it has recently been documented that mRNA and protein levels poorly correlate in mouse ESCs. Here, we use in-depth quantitative proteomics to gain insights into the differences and similarities in the protein content of two hiPS cell lines, their precursor IMR90 and 4Skin fibroblast cell lines and one hES cell line, providing novel molecular signatures that may assist in filling a gap in the understanding of pluripotency.
To study the degree of similarity, at the protein level, between hiPSCs and hESCs, four MS-based proteomic experiments were designed that use our in-house developed triplex dimethyl labeling chemistry followed by extensive fractionation by strong cation exchange (SCX) chromatography to reduce the sample complexity. High-resolution LC-MS/MS with dedicated fragmentation schemes (i.e., electron transfer dissociation, collision-induced dissociation and higher-energy collision dissociation) was subsequently used to maximize peptide identification rates. A total of 348 LC-MS/MS analyses (including technical and biological replicates) were performed. We confidently identified 1 593 446 peptide spectrum matches (peptide FDR<1%) corresponding to 10 628 unique protein groups (protein FDR∼4%). Using the extracted ion chromatograms, we also estimated the absolute abundance of the proteins within the samples spanning six orders of magnitude. To the best of our knowledge, the coverage obtained in this study represents the largest achieved by any proteomics screen on pluripotent cells.
Most importantly, our results indicate that the reprogramming process remodeled the proteome of both fibroblast cell lines to a profile that closely resembles the pluripotent hESCs proteome: 97.8% of the quantified proteins (2638 proteins in all four experiments) showed nonsignificant changes. Nevertheless, a small fraction of 58 proteins, mainly related to metabolism, antigen processing and cell adhesion, was found significantly regulated between hiPSCs and hESCs. A comparison of the regulated proteins to previously published transcriptomic studies showed a low overlap, highlighting the emerging notion that differences between both pluripotent cell lines rather reflect experimental conditions than a recurrent molecular signature. On the other side, the inclusion of the two parental fibroblast cell lines in our analysis allowed us to study changes in the proteome at both the starting and end points of the reprogramming process. As expected, the vast majority of the proteins (73.4%) showed differential expression between the parental fibroblasts and the reprogrammed pluripotent cells.
To find out if the differences observed in our study were a consequence of transcriptional or translational regulation, we performed paired genome-wide gene expression analyses on the same six samples that were used for the proteomic profiling. Overall, we observed a good correlation between mRNA and protein levels (r∼0.7). These results further authenticated the proteomic measurements and implied a high degree of control at the transcriptional level. Nevertheless, numerous genes were found uncorrelated highlighting the necessity of complementing transcriptomic-based approaches with proteomics.
Assessing relevant molecular differences between human-induced pluripotent stem cells (hiPSCs) and human embryonic stem cells (hESCs) is important, given that such differences may impact their potential therapeutic use. Controversy surrounds recent gene expression studies comparing hiPSCs and hESCs. Here, we present an in-depth quantitative mass spectrometry-based analysis of hESCs, two different hiPSCs and their precursor fibroblast cell lines. Our comparisons confirmed the high similarity of hESCs and hiPSCs at the proteome level as 97.8% of the proteins were found unchanged. Nevertheless, a small group of 58 proteins, mainly related to metabolism, antigen processing and cell adhesion, was found significantly differentially expressed between hiPSCs and hESCs. A comparison of the regulated proteins with previously published transcriptomic studies showed a low overlap, highlighting the emerging notion that differences between both pluripotent cell lines rather reflect experimental conditions than a recurrent molecular signature.
doi:10.1038/msb.2011.84
PMCID: PMC3261715  PMID: 22108792
human embryonic stem cells; human-induced pluripotent stem cells; proteomics; quantitation
21.  The maturing of proteomics in cardiovascular research 
Circulation research  2011;108(4):490-498.
Proteomic technologies are used to study the complexity of proteins, their roles and biological functions. Proteomics is based on the premise that the diversity of proteins, comprising their isoforms and post-translational modifications (PTMs), underlies biology. Among annotated human cardiac proteins, 62% have at least one PTM (phosphorylation currently dominating), while ~25% have more than one type of modification. The field of proteomics strives to observe and quantify this protein diversity. It represents a broad group of technologies and methods arising from analytical protein biochemistry, analytical separation, mass spectrometry and bioinformatics. Since the 1990s, proteomic analysis has been increasingly applied in cardiovascular research. Technology development and adaptation have been at the heart of this progress. Technology matures, becoming routine and ultimately obsolete as it is replaced by newer methods. Due to extensive methodological improvements, many proteomic studies today observe 1000-5000 proteins. Only five years ago this was not feasible. Even so, there are still roadblocks. Nowadays, there is a focus on obtaining better characterization of protein isoforms and specific PTMs. Consequently, new techniques for identification and quantification of modified amino acid residues are required, as is the assessment of SNPs, in addition to determination of the structural and functional consequences. In this series, four articles provide concrete examples of how proteomics can be incorporated into cardiovascular research and address specific biological questions. They also illustrate how novel discoveries can be made and how proteomic technology has continued to evolve.
doi:10.1161/CIRCRESAHA.110.226894
PMCID: PMC3500592  PMID: 21335431
Proteomics; technology; protein isoform; posttranslational modification; polymorphism
22.  Proteome Regulation during Olea europaea Fruit Development 
PLoS ONE  2013;8(1):e53563.
Background
Widespread in the Mediterranean basin, Olea europaea trees are gaining worldwide popularity for the nutritional and cancer-protective properties of the oil, mechanically extracted from ripe fruits. Fruit development is a physiological process with remarkable impact on the modulation of the biosynthesis of compounds affecting the quality of the drupes as well as the final composition of the olive oil. Proteomics offers the possibility to dig deeper into the major changes during fruit development, including the important phase of ripening, and to classify temporal patterns of protein accumulation occurring during these complex physiological processes.
Methodology/Principal Findings
In this work, we started monitoring the proteome variations associated with olive fruit development by using comparative proteomics coupled to mass spectrometry. Proteins extracted from drupes at three different developmental stages were separated on 2-DE and subjected to image analysis. 247 protein spots were revealed as differentially accumulated. Proteins were identified from a total of 121 spots and discussed in relation to olive drupe metabolic changes occurring during fruit development. In order to evaluate if changes observed at the protein level were consistent with changes of mRNAs, proteomic data produced in the present work were compared with transcriptomic data elaborated during previous studies.
Conclusions/Significance
This study identifies a number of proteins responsible for quality traits of cv. Coratina, with particular regard to proteins associated with the metabolism of fatty acids, phenolic and aroma compounds. Proteins involved in fruit photosynthesis have also been identified and their pivotal contribution to oleogenesis has been discussed. To date, this study represents the first characterization of the olive fruit proteome during development, providing new insights into fruit metabolism and the oil accumulation process.
doi:10.1371/journal.pone.0053563
PMCID: PMC3547947  PMID: 23349718
23.  A proteomic chronology of gene expression through the cell cycle in human myeloid leukemia cells 
eLife  2014;3:e01630.
Technological advances have enabled the analysis of cellular protein and RNA levels with unprecedented depth and sensitivity, allowing for an unbiased re-evaluation of gene regulation during fundamental biological processes. Here, we have chronicled the dynamics of protein and mRNA expression levels across a minimally perturbed cell cycle in human myeloid leukemia cells using centrifugal elutriation combined with mass spectrometry-based proteomics and RNA-Seq, avoiding artificial synchronization procedures. We identify myeloid-specific gene expression and variations in protein abundance, isoform expression and phosphorylation at different cell cycle stages. We dissect the relationship between protein and mRNA levels for both bulk gene expression and for over 6000 genes individually across the cell cycle, revealing complex, gene-specific patterns. This data set, one of the deepest surveys to date of gene expression in human cells, is presented in an online, searchable database, the Encyclopedia of Proteome Dynamics (http://www.peptracker.com/epd/).
DOI: http://dx.doi.org/10.7554/eLife.01630.001
eLife digest
Cells are complex environments: at any one time, thousands of different genes act as molecular templates to produce messenger RNA (mRNA) molecules, which themselves are templates used to produce proteins. However, not all genes are active at all times inside all cells: as cells grow and divide as part of the cell division cycle, genes are switched on and off on a regular basis. Similarly, the patterns of mRNA and protein production are different in, say, immune and skin cells.
In recent years, the tools available for detecting mRNA molecules and proteins have become more powerful, allowing researchers to move beyond just measuring the total amounts of mRNA and protein in the cell to now measuring individual amounts of specific mRNA and protein molecules encoded by specific genes. However, it has been a challenge to make these measurements at different stages of the cell cycle. Most of the methods used to do this have involved artificially ‘arresting’ the cell cycle, which can lead to side effects that are difficult to account for.
Ly et al. have now overcome these problems using a combination of three methods to measure the levels of mRNA and protein molecules associated with over 6000 genes in human cancer cells derived from myeloid leukemia. Exploiting the fact that cells change size during the cell cycle, Ly et al. used a centrifugation technique to separate cells based on their size and, therefore, the stage of the cell cycle they were at, thus avoiding the need to arrest the cell cycle. An approach called RNA-Seq was then employed to measure the levels of the different mRNA molecules in the cells, and a device called a mass spectrometer was used to identify and measure the levels of many different proteins.
In addition to being able to follow the level of mRNA and protein production for a large number of genes throughout the cell division cycle, while also obtaining detailed information about how many of the proteins are modified, Ly et al. discovered that—contrary to expectations—low numbers of mRNA molecules were sometimes associated with high numbers of the corresponding protein, and vice versa. This work provides a better understanding of the complex relationship between the levels of an mRNA and its corresponding protein product, and also demonstrates how it may be possible to detect subtle but important differences between cell types and disease states, including different types of cancer.
DOI: http://dx.doi.org/10.7554/eLife.01630.002
doi:10.7554/eLife.01630
PMCID: PMC3936288  PMID: 24596151
proteomics; mass spectrometry; RNA-Seq; cell cycle; transcriptomics; human
24.  Computational Biomarker Pipeline from Discovery to Clinical Implementation: Plasma Proteomic Biomarkers for Cardiac Transplantation 
PLoS Computational Biology  2013;9(4):e1002963.
Recent technical advances in the field of quantitative proteomics have stimulated a large number of biomarker discovery studies of various diseases, providing avenues for new treatments and diagnostics. However, inherent challenges have limited the successful translation of candidate biomarkers into clinical use, thus highlighting the need for a robust analytical methodology to transition from biomarker discovery to clinical implementation. We have developed an end-to-end computational proteomic pipeline for biomarker studies. At the discovery stage, the pipeline emphasizes different aspects of experimental design, appropriate statistical methodologies, and quality assessment of results. At the validation stage, the pipeline focuses on the migration of the results to a platform appropriate for external validation, and the development of a classifier score based on corroborated protein biomarkers. At the last stage towards clinical implementation, the main aims are to develop and validate an assay suitable for clinical deployment, and to calibrate the biomarker classifier using the developed assay. The proposed pipeline was applied to a biomarker study in cardiac transplantation aimed at developing a minimally invasive clinical test to monitor acute rejection. Starting with an untargeted screening of the human plasma proteome, five candidate biomarker proteins were identified. Rejection-regulated proteins reflect cellular and humoral immune responses, acute phase inflammatory pathways, and lipid metabolism biological processes. A multiplex multiple reaction monitoring mass-spectrometry (MRM-MS) assay was developed for the five candidate biomarkers and validated by enzyme-linked immunosorbent assays (ELISA) and immunonephelometric assays (INA). A classifier score based on corroborated proteins demonstrated that the developed MRM-MS assay provides an appropriate methodology for an external validation, which is still in progress.
Plasma proteomic biomarkers of acute cardiac rejection may offer a relevant post-transplant monitoring tool to effectively guide clinical care. The proposed computational pipeline is highly applicable to a wide range of biomarker proteomic studies.
Author Summary
Novel proteomic technology has led to the generation of vast amounts of biological data and the identification of numerous potential biomarkers. However, computational approaches to translate this information into knowledge capable of impacting clinical care have been lagging. We propose a computational proteomic pipeline for biomarker studies that is founded on the combination of advanced statistical methodologies. We demonstrate our approach through the analysis of data obtained from heart transplant patients. Heart transplantation is the gold standard treatment for patients with end-stage heart failure, but is complicated by episodes of immune rejection that can adversely impact patient outcomes. Current rejection monitoring approaches are highly invasive, requiring a biopsy of the heart. This work aims to reduce the need for biopsies, and demonstrate the power and utility of computational approaches in proteomic biomarker discovery. Our work utilizes novel high-throughput proteomic technology combined with advanced statistical techniques to identify blood markers that guide the decision as to whether a biopsy is warranted, reduce the number of unnecessary biopsies, and ultimately diagnose the presence of rejection in heart transplant patients. Additionally, the proposed computational methodologies can be applied to a range of proteomic biomarker studies of various diseases and conditions.
doi:10.1371/journal.pcbi.1002963
PMCID: PMC3617196  PMID: 23592955
25.  A multidimensional platform for the purification of non-coding RNA species 
Nucleic Acids Research  2013;41(17):e168.
A renewed interest in non-coding RNA (ncRNA) has led to the discovery of novel RNA species and post-transcriptional ribonucleoside modifications, and an emerging appreciation for the role of ncRNA in RNA epigenetics. Although much can be learned by amplification-based analysis of ncRNA sequence and quantity, there is a significant need for direct analysis of RNA, which has led to numerous methods for purification of specific ncRNA molecules. However, no single method allows purification of the full range of cellular ncRNA species. To this end, we developed a multidimensional chromatographic platform to resolve, isolate and quantify all canonical ncRNAs in a single sample of cells or tissue, as well as novel ncRNA species. The applicability of the platform is demonstrated in analyses of ncRNA from bacteria, human cells and plasmodium-infected reticulocytes, as well as a viral RNA genome. Among the many potential applications of this platform are a system-level analysis of the dozens of modified ribonucleosides in ncRNA, characterization of novel long ncRNA species, enhanced detection of rare transcript variants and analysis of viral genomes.
doi:10.1093/nar/gkt668
PMCID: PMC3783195  PMID: 23907385
