Results 1-17 (17)
 

1.  “Add to Subtract”: A Simple Method to Remove Complex Background Signals from the 1H Nuclear Magnetic Resonance Spectra of Mixtures 
Analytical Chemistry  2011;84(2):994-1002.
Due to its highly reproducible and quantitative nature, and minimal requirements for sample preparation or separation, 1H nuclear magnetic resonance (NMR) spectroscopy is widely used for profiling small-molecule metabolites in biofluids. However, 1H NMR spectra contain many overlapped peaks. In particular, blood serum/plasma and diabetic urine samples contain high concentrations of glucose, which produce strong peaks between 3.2 and 4.0 ppm. Signals from most metabolites in this region are overwhelmed by the glucose background signals and become invisible. We propose a simple “Add to Subtract” background subtraction method, and show that it can reduce the glucose signals by 98% to allow retrieval of the hidden information. This procedure includes adding a small drop of concentrated glucose solution to the sample in the NMR tube, mixing, waiting for an equilibration time, and acquisition of a second spectrum. The glucose-free spectra are then generated by spectral subtraction using Bruker Topspin software. Subsequent multivariate statistical analysis can then be used to identify biomarker candidate signals for distinguishing different types of biological samples. The principle of this approach is generally applicable for all quantitative spectral data and should find utility in a variety of NMR-based mixture analyses as well as in metabolite profiling.
doi:10.1021/ac202548n
PMCID: PMC3282557  PMID: 22221170
1H NMR; metabolomics; metabolite profiling; glucose; signal suppression; mixture analysis; blood; urine
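The subtraction step described in the abstract above can be illustrated with a small numerical sketch. This is not the authors' Topspin procedure; it assumes that both spectra share the same ppm axis, that signals add linearly, that dilution from the added drop is negligible, and that at least one point contains glucose signal only. The function name `add_to_subtract` and the toy intensities are hypothetical.

```python
def add_to_subtract(original, spiked, glucose_only_idx):
    """Suppress glucose in `original` using a glucose-spiked spectrum.

    Assumes both spectra share the same ppm axis and that signals add linearly.
    `glucose_only_idx` lists points assumed to contain glucose signal only.
    """
    # Least-squares scale (through the origin) mapping spiked glucose onto original glucose.
    num = sum(original[i] * spiked[i] for i in glucose_only_idx)
    den = sum(spiked[i] ** 2 for i in glucose_only_idx)
    ratio = num / den
    # Glucose cancels; other peaks survive, attenuated by roughly (1 - ratio).
    difference = [o - ratio * s for o, s in zip(original, spiked)]
    return difference, ratio


# Toy data (made up): index 1 is glucose-only, indices 2 and 3 carry a metabolite.
glucose = [0.0, 10.0, 0.0, 5.0, 0.0]
metabolite = [0.0, 0.0, 1.0, 2.0, 0.0]
original = [m + g for m, g in zip(metabolite, glucose)]
spiked = [m + 3.0 * g for m, g in zip(metabolite, glucose)]  # glucose tripled, dilution ignored

difference, ratio = add_to_subtract(original, spiked, glucose_only_idx=[1])
recovered = [d / (1.0 - ratio) for d in difference]  # undo the attenuation
```

In this toy case the metabolite peak hidden under glucose at index 3 is recovered after rescaling. With real spectra the scale factor would also absorb dilution, so the rescaling step is only approximate.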
3.  A statistical model-building perspective to identification of MS/MS spectra with PeptideProphet 
BMC Bioinformatics  2012;13(Suppl 16):S1.
PeptideProphet is a post-processing algorithm designed to evaluate the confidence in identifications of MS/MS spectra returned by a database search. In this manuscript we describe the "what and how" of PeptideProphet in a manner aimed at statisticians and life scientists who would like to gain a more in-depth understanding of the underlying statistical modeling. The theory and rationale behind the mixture-modeling approach taken by PeptideProphet is discussed from a statistical model-building perspective followed by a description of how a model can be used to express confidence in the identification of individual peptides or sets of peptides. We also demonstrate how to evaluate the quality of model fit and select an appropriate model from several available alternatives. We illustrate the use of PeptideProphet in association with the Trans-Proteomic Pipeline, a free suite of software used for protein identification.
doi:10.1186/1471-2105-13-S16-S1
PMCID: PMC3489532  PMID: 23176103
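The mixture-modeling idea in the abstract above can be sketched as a two-component model in which each search score is drawn from either a "correct" or an "incorrect" distribution, and Bayes' rule gives the probability that an identification is correct. The sketch below is a simplification and not PeptideProphet's actual model: both components are taken to be Gaussian, and the means, standard deviations and prior are hypothetical fixed values rather than estimates fitted to data.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_correct(score, prior_correct, correct=(3.0, 1.0), incorrect=(0.0, 1.0)):
    """Posterior probability that an identification with this score is correct.

    `correct` and `incorrect` are (mean, sd) pairs; the defaults are illustrative only.
    """
    num = prior_correct * normal_pdf(score, *correct)
    den = num + (1.0 - prior_correct) * normal_pdf(score, *incorrect)
    return num / den


def estimated_error_rate(probabilities, threshold):
    """Expected fraction of incorrect identifications among those accepted at `threshold`."""
    accepted = [p for p in probabilities if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


scores = [0.0, 1.5, 4.0]  # made-up scores
probs = [posterior_correct(s, prior_correct=0.5) for s in scores]
```

The per-identification posteriors express confidence in individual peptides, while averaging `1 - p` over an accepted set gives one way to express confidence in a set of peptides.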
4.  Statistical protein quantification and significance analysis in label-free LC-MS experiments with complex designs 
BMC Bioinformatics  2012;13(Suppl 16):S6.
Background
Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is widely used for quantitative proteomic investigations. The typical output of such studies is a list of identified and quantified peptides. The biological and clinical interest is, however, usually focused on quantitative conclusions at the protein level. Furthermore, many investigations ask complex biological questions by studying multiple interrelated experimental conditions. Therefore, there is a need in the field for generic statistical models to quantify protein levels even in complex study designs.
Results
We propose a general statistical modeling approach for protein quantification in arbitrarily complex experimental designs, such as time course studies, or those involving multiple experimental factors. The approach summarizes the quantitative experimental information from all the features and all the conditions that pertain to a protein. It enables both protein significance analysis between conditions, and protein quantification in individual samples or conditions. We implement the approach in an open-source R-based software package MSstats suitable for researchers with a limited statistics and programming background.
Conclusions
We demonstrate, using as examples two experimental investigations with complex designs, that a simultaneous statistical modeling of all the relevant features and conditions yields a higher sensitivity of protein significance analysis and a higher accuracy of protein quantification as compared to commonly employed alternatives. The software is available at http://www.stat.purdue.edu/~ovitek/Software.html.
doi:10.1186/1471-2105-13-S16-S6
PMCID: PMC3489535  PMID: 23176351
Label-free LC-MS/MS; linear mixed effects models; protein quantification; quantitative proteomics; statistical design of experiments
5.  Noise reduction in genome-wide perturbation screens using linear mixed-effect models 
Bioinformatics  2011;27(16):2173-2180.
Motivation: High-throughput perturbation screens measure the phenotypes of thousands of biological samples under various conditions. The phenotypes measured in the screens are subject to substantial biological and technical variation. At the same time, in order to enable high throughput, it is often impossible to include a large number of replicates, and to randomize their order throughout the screens. Distinguishing true changes in the phenotype from stochastic variation in such experimental designs is extremely challenging, and requires adequate statistical methodology.
Results: We propose a statistical modeling framework that is based on experimental designs with at least two controls profiled throughout the experiment, and a normalization and variance estimation procedure with linear mixed-effects models. We evaluate the framework using three comprehensive screens of Saccharomyces cerevisiae, which involve 4940 single-gene knock-out haploid mutants, 1127 single-gene knock-out diploid mutants and 5798 single-gene overexpression haploid strains. We show that the proposed approach (i) can be used in conjunction with practical experimental designs; (ii) allows extensions to alternative experimental workflows; (iii) enables a sensitive discovery of biologically meaningful changes; and (iv) strongly outperforms the existing noise reduction procedures.
Availability: All experimental datasets are publicly available at www.ionomicshub.org. The R package HTSmix is available at http://www.stat.purdue.edu/~ovitek/HTSmix.html.
Contact: ovitek@stat.purdue.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr359
PMCID: PMC3150043  PMID: 21685046
6.  Developmental Changes in the Metabolic Network of Snapdragon Flowers 
PLoS ONE  2012;7(7):e40381.
Evolutionary and reproductive success of angiosperms, the most diverse group of land plants, relies on visual and olfactory cues for pollinator attraction. Previous work has focused on elucidating the developmental regulation of pathways leading to the formation of pollinator-attracting secondary metabolites such as scent compounds and flower pigments. However, to date little is known about how flowers control their entire metabolic network to achieve the highly regulated production of metabolites attracting pollinators. Integrative analysis of transcripts and metabolites in snapdragon sepals and petals over flower development performed in this study revealed a profound developmental remodeling of gene expression and metabolite profiles in petals, but not in sepals. Genes up-regulated during petal development were enriched in functions related to secondary metabolism, fatty acid catabolism, and amino acid transport, whereas down-regulated genes were enriched in processes involved in cell growth, cell wall formation, and fatty acid biosynthesis. The levels of transcripts and metabolites in pathways leading to scent formation were coordinately up-regulated during petal development, implying transcriptional induction of metabolic pathways preceding scent formation. Developmental gene expression patterns in the pathways involved in scent production were different from those of glycolysis and the pentose phosphate pathway, highlighting distinct developmental regulation of secondary metabolism and primary metabolic pathways feeding into it.
doi:10.1371/journal.pone.0040381
PMCID: PMC3394800  PMID: 22808147
7.  Identification and quantification of metabolites in 1H NMR spectra by Bayesian model selection 
Bioinformatics  2011;27(12):1637-1644.
Motivation: Nuclear magnetic resonance (NMR) spectroscopy is widely used for high-throughput characterization of metabolites in complex biological mixtures. However, accurate interpretation of the spectra in terms of identities and abundances of metabolites can be challenging, in particular in crowded regions with heavy peak overlap. Although a number of computational approaches for this task have recently been proposed, they are not entirely satisfactory in either accuracy or extent of automation.
Results: We introduce a probabilistic approach, Bayesian Quantification (BQuant), for fully automated database-based identification and quantification of metabolites in local regions of 1H NMR spectra. The approach represents the spectra as mixtures of reference profiles from a database, and infers the identities and the abundances of metabolites by Bayesian model selection. We show using a simulated dataset, a spike-in experiment and a metabolomic investigation of plasma samples that BQuant outperforms the available automated alternatives in accuracy for both identification and quantification.
Availability: The R package BQuant is available at: http://www.stat.purdue.edu/~ovitek/BQuant-Web/.
Contact: ovitek@stat.purdue.edu; zhengc@purdue.edu
Supplementary Information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr118
PMCID: PMC3106181  PMID: 21398670
8.  Multivariate Statistical Identification of Human Bladder Carcinomas Using Ambient Ionization Imaging Mass Spectrometry 
Diagnosis of human bladder cancer in untreated tissue sections is achieved by using imaging data from desorption electrospray ionization mass spectrometry (DESI-MS) combined with multivariate statistical analysis. We use the distinctive DESI-MS glycerophospholipid (GP) mass spectral profiles to visually characterize and formally classify twenty pairs (40 tissue samples) of human cancerous and adjacent normal bladder tissue samples. The individual ion images derived from the acquired profiles correlate with standard histological hematoxylin and eosin (H&E)-stained serial sections. The profiles allow us to classify the disease status of the tissue samples with high accuracy as judged by reference histological data. To achieve this, the data from the twenty pairs were divided into a training set and a validation set. Spectra from the tumor and normal regions of each of the tissue sections in the training set were used for orthogonal projection to latent structures (O-PLS) treated partial least-squares discriminant analysis (PLS-DA). This predictive model was then validated by using the validation set and showed a 5% error rate for classification and a misclassification rate of 12%. It was also used to create synthetic images of the tissue sections showing pixel-by-pixel disease classification of the tissue and these data agreed well with the independent classification that uses histological data by a certified pathologist. This represents the first application of multivariate statistical methods for classification by ambient ionization although these methods have been applied previously to other MS imaging methods. The results are encouraging in terms of the development of a method that could be utilized in a clinical setting through visualization and diagnosis of intact tissue.
doi:10.1002/chem.201001692
PMCID: PMC3050580  PMID: 21284043
cancer; desorption electrospray ionization; lipidomics; molecular imaging; multivariate statistics; mass spectrometry
9.  Computational Mass Spectrometry–Based Proteomics 
PLoS Computational Biology  2011;7(12):e1002277.
doi:10.1371/journal.pcbi.1002277
PMCID: PMC3228769  PMID: 22144880
10.  Phosphoproteomic Analysis Reveals Interconnected System-Wide Responses to Perturbations of Kinases and Phosphatases in Yeast 
Science Signaling  2010;3(153):rs4.
The phosphorylation and dephosphorylation of proteins by kinases and phosphatases constitute an essential regulatory network in eukaryotic cells. This network supports the flow of information from sensors through signaling systems to effector molecules, and ultimately drives the phenotype and function of cells, tissues, and organisms. Dysregulation of this process has severe consequences and is one of the main factors in the emergence and progression of diseases, including cancer. Thus, major efforts have been invested in developing specific inhibitors that modulate the activity of individual kinases or phosphatases; however, it has been difficult to assess how such pharmacological interventions would affect the cellular signaling network as a whole. Here, we used label-free, quantitative phosphoproteomics in a systematically perturbed model organism (Saccharomyces cerevisiae) to determine the relationships between 97 kinases, 27 phosphatases, and more than 1000 phosphoproteins. We identified 8814 regulated phosphorylation events, describing the first system-wide protein phosphorylation network in vivo. Our results show that, at steady state, inactivation of most kinases and phosphatases affected large parts of the phosphorylation-modulated signal transduction machinery, and not only the immediate downstream targets. The observed cellular growth phenotype was often well maintained despite the perturbations, arguing for considerable robustness in the system. Our results serve to constrain future models of cellular signaling and reinforce the idea that simple linear representations of signaling pathways might be insufficient for drug development and for describing organismal homeostasis.
doi:10.1126/scisignal.2001182
PMCID: PMC3072779  PMID: 21177495
11.  A Coastal Cline in Sodium Accumulation in Arabidopsis thaliana Is Driven by Natural Variation of the Sodium Transporter AtHKT1;1 
PLoS Genetics  2010;6(11):e1001193.
The genetic model plant Arabidopsis thaliana, like many plant species, experiences a range of edaphic conditions across its natural habitat. Such heterogeneity may drive local adaptation, though the molecular genetic basis remains elusive. Here, we describe a study in which we used genome-wide association mapping, genetic complementation, and gene expression studies to identify cis-regulatory expression level polymorphisms at the AtHKT1;1 locus, encoding a known sodium (Na+) transporter, as being a major factor controlling natural variation in leaf Na+ accumulation capacity across the global A. thaliana population. A weak allele of AtHKT1;1 that drives elevated leaf Na+ in this population has been previously linked to elevated salinity tolerance. Inspection of the geographical distribution of this allele revealed its significant enrichment in populations associated with the coast and saline soils in Europe. The fixation of this weak AtHKT1;1 allele in these populations is genetic evidence supporting local adaptation to these potentially saline impacted environments.
Author Summary
The unusual geographical distribution of certain animal and plant species has provided puzzling questions to the scientific community regarding the interrelationship of evolutionary and geographic histories for generations. With DNA sequencing, such puzzles have now extended to the geographical distribution of genetic variation within a species. Here, we explain one such puzzle in the European population of Arabidopsis thaliana, where we find that a version of a gene encoding for a sodium-transporter with reduced function is almost uniquely found in populations of this plant growing close to the coast or on known saline soils. This version of the gene has previously been linked with elevated salinity tolerance, and its unusual distribution in populations of plants growing in coastal regions and on saline soils suggests that it is playing a role in adapting these plants to the elevated salinity of their local environment.
doi:10.1371/journal.pgen.1001193
PMCID: PMC2978683  PMID: 21085628
12.  Correlation between y-Type Ions Observed in Ion Trap and Triple Quadrupole Mass Spectrometers 
Journal of Proteome Research  2009;8(9):4243-4251.
Multiple reaction monitoring mass spectrometry (MRM-MS) is a technique for high-sensitivity targeted analysis. In proteomics, MRM-MS can be used to monitor and quantify a peptide based on the production of expected fragment peaks from the selected peptide precursor ion. The choice of which fragment ions to monitor in order to achieve maximum sensitivity in MRM-MS can potentially be guided by existing MS/MS spectra. However, because the majority of discovery experiments are performed on ion trap platforms, there is concern in the field regarding the generalizability of these spectra to MRM-MS on a triple quadrupole instrument. In light of this concern, many operators perform an optimization step to determine the most intense fragments for a target peptide on a triple quadrupole mass spectrometer. We have addressed this issue by targeting, on a triple quadrupole, the top six y-ion peaks from ion trap-derived consensus library spectra for 258 doubly charged peptides from three different sample sets and quantifying the observed elution curves. This analysis revealed a strong correlation between the y-ion peak rank order and relative intensity across platforms. This suggests that y-type ions obtained from ion trap-based library spectra are well-suited for generating MRM-MS assays for triple quadrupoles and that optimization is not required for each target peptide.
doi:10.1021/pr900298b
PMCID: PMC2802215  PMID: 19603825
multiple reaction monitoring (MRM); selective reaction monitoring (SRM); triple quadrupole; ion trap; mass spectrometer; y-ions; spectral library; spectral correlation
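Agreement in y-ion rank order between two platforms, as discussed in the abstract above, can be quantified with a rank correlation. The sketch below uses Spearman's coefficient as one reasonable choice; the abstract does not state which statistic the authors used. The intensities are made up, and the simple formula assumes no tied values.

```python
def ranks(values):
    """Rank values from most intense (rank 1) to least intense; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for two equal-length sequences without ties."""
    n = len(x)
    squared_rank_diffs = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * squared_rank_diffs / (n * (n * n - 1))


# Hypothetical intensities of the same six y-ions of one peptide on two instruments.
ion_trap = [100.0, 80.0, 60.0, 40.0, 20.0, 10.0]
triple_quad = [90.0, 95.0, 50.0, 30.0, 25.0, 5.0]
rho = spearman(ion_trap, triple_quad)
```

Here only the two most intense ions swap places, so the coefficient stays close to 1, which is the kind of outcome that would support reusing ion trap library spectra to pick transitions.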
13.  Interdependence of Signal Processing and Analysis of Urine 1H NMR Spectra for Metabolic Profiling 
Analytical Chemistry  2009;81(15):6080-6088.
Metabolic profiling of urine presents challenges due to the extensive random variation of metabolite concentrations, and to dilution resulting from changes in the overall urine volume. Thus, statistical analysis methods play a particularly important role; however, appropriate choices of these methods are not straightforward. Here we investigate constant and variance-stabilization normalization of raw and peak picked spectra, for use with exploratory analysis (principal component analysis) and confirmatory analysis (ordinary and Empirical Bayes t-test) in 1H NMR-based metabolic profiling of urine. We compare the performance of these methods using urine samples spiked with known metabolites according to a Latin square design. We find that analysis of peak picked and log-transformed spectra is preferred, and that signal processing and statistical analysis steps are interdependent. While variance-stabilizing transformation is preferred in conjunction with principal component analysis, constant normalization is more appropriate for use with a t-test. Empirical Bayes t-test provides more reliable conclusions when the number of samples in each group is relatively small. Performance of these methods is illustrated using a clinical metabolomics experiment on patients with type 1 diabetes to evaluate the effect of insulin deprivation.
doi:10.1021/ac900424c
PMCID: PMC2789356  PMID: 19950923
Metabolomics; Metabolite profiling; NMR spectroscopy; Normalization; Moderated t-test; Logarithmic transformation; Urine; Diabetes
14.  Differential Plasma Glycoproteome of p19ARF Skin Cancer Mouse Model Using the Corra Label-Free LC-MS Proteomics Platform 
Clinical Proteomics  2008;4(3-4):105.
A proof-of-concept demonstration of the use of label-free quantitative glycoproteomics for biomarker discovery workflow is presented here, using a mouse model for skin cancer as an example. Blood plasma was collected from 10 control mice, and 10 mice having a mutation in the p19ARF gene, conferring on them a high propensity to develop skin cancer after carcinogen exposure. We enriched for N-glycosylated plasma proteins, ultimately generating deglycosylated forms of the modified tryptic peptides for liquid chromatography mass spectrometry (LC-MS) analyses. LC-MS runs for each sample were then performed with a view to identifying proteins that were differentially abundant between the two mouse populations. We then used a recently developed computational framework, Corra, to perform peak picking and alignment, and to compute the statistical significance of any observed changes in individual peptide abundances. Once determined, the most discriminating peptide features were then fragmented and identified by tandem mass spectrometry with the use of inclusion lists. We next assessed the identified proteins to see if there were sets of proteins indicative of specific biological processes that correlate with the presence of disease, and specifically cancer, according to their functional annotations. As expected for such sick animals, many of the proteins identified were related to host immune response. However, a significant number of proteins also directly associated with processes linked to cancer development, including proteins related to the cell cycle, localisation, transport, and cell death. Additional analysis of the same samples in profiling mode, and in triplicate, confirmed that replicate MS analysis of the same plasma sample generated less variation than that observed between plasma samples from different individuals, demonstrating that the reproducibility of the LC-MS platform was sufficient for this application. These results thus show that an LC-MS-based workflow can be a useful tool for the generation of candidate proteins of interest as part of a disease biomarker discovery effort.
doi:10.1007/s12014-008-9018-8
PMCID: PMC2821048  PMID: 20157627
Skin cancer; LC-MS; Label-free protein quantification; Biomarker discovery; Systems biology; Targeted peptide sequencing; Glycoproteomics; Plasma
15.  Quantification of the Compositional Information Provided by Immonium Ions on a Quadrupole-Time-of-Flight Mass Spectrometer 
Analytical Chemistry  2008;80(14):5596-5606.
Immonium ions have been largely overlooked during the rapid expansion of mass spectrometry-based proteomics largely due to the dominance of ion trap instruments in the field. However, immonium ions are visible in hybrid quadrupole-time-of-flight (QTOF) mass spectrometers, which are now widely available. We have created the largest database to date of high-confidence sequence assignments to characterize the appearance of immonium ions in CID spectra using a QTOF instrument under “typical” operating conditions. With these data, we are able to demonstrate excellent correlation between immonium ion peak intensity and the likelihood of the appearance of the expected amino acid in the assigned sequence for phenylalanine, tyrosine, tryptophan, proline, histidine, valine, and the indistinguishable leucine and isoleucine residues. In addition, we have clearly demonstrated a positional effect whereby the proximity of the amino acid generating the immonium ion to the amino terminal of the peptide correlates with the strength of the immonium ion peak. This compositional information provided by the immonium ion peaks could substantially improve algorithms used for spectral assignment in mass spectrometry analysis using QTOF platforms.
doi:10.1021/ac8006076
PMCID: PMC2638499  PMID: 18564857
16.  Getting Started in Computational Mass Spectrometry–Based Proteomics 
PLoS Computational Biology  2009;5(5):e1000366.
doi:10.1371/journal.pcbi.1000366
PMCID: PMC2668757  PMID: 19492072
17.  Corra: Computational framework and tools for LC-MS discovery and targeted mass spectrometry-based proteomics 
BMC Bioinformatics  2008;9:542.
Background
Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, they are generally not comparable to each other in terms of functionality, user interfaces, information input/output, and do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists, and other researchers not trained in bioinformatics, who wish to use LC-MS-based quantitative proteomics.
Results
We have developed Corra, a computational framework and tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, and statistical algorithms, originally developed for microarray data analyses, appropriate for LC-MS data analysis. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intense data processing and statistical analyses can run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling.
Conclusion
The Corra computational framework leverages computational innovation to enable biologists or other researchers to process, analyze and visualize LC-MS data with what would otherwise be a complex and not user-friendly suite of tools. Corra enables appropriate statistical analyses, with controlled false-discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such, addresses an unmet need in the LC-MS proteomics field.
doi:10.1186/1471-2105-9-542
PMCID: PMC2651178  PMID: 19087345
