1.  Shaking Alone Induces De Novo Conversion of Recombinant Prion Proteins to β-Sheet Rich Oligomers and Fibrils 
PLoS ONE  2014;9(6):e98753.
The formation of β-sheet rich prion oligomers and fibrils from native prion protein (PrP) is thought to be a key step in the development of prion diseases. Many methods are available to convert recombinant prion protein into β-sheet rich fibrils using various chemical denaturants (urea, SDS, GdnHCl), high temperature, phospholipids, or mildly acidic conditions (pH 4). Many of these methods also require shaking or another form of agitation to complete the conversion process. We have found that shaking alone causes the conversion of recombinant PrP to β-sheet rich oligomers and fibrils at near physiological pH (pH 5.5 to pH 6.2) and temperature. This conversion does not require any denaturant, detergent, or any other chemical cofactor. Interestingly, this conversion does not occur when the water-air interface is eliminated in the shaken sample. We have analyzed shaking-induced conversion using circular dichroism, resolution enhanced native acidic gel electrophoresis (RENAGE), electron microscopy, Fourier transform infrared spectroscopy, thioflavin T fluorescence and proteinase K resistance. Our results show that shaking causes the formation of β-sheet rich oligomers with a population distribution ranging from octamers to dodecamers and that further shaking causes a transition to β-sheet fibrils. In addition, we show that shaking-induced conversion occurs for a wide range of full-length and truncated constructs of mouse, hamster and cervid prion proteins. We propose that this method of conversion provides a robust, reproducible and easily accessible model for scrapie-like amyloid formation, allowing the generation of milligram quantities of physiologically stable β-sheet rich oligomers and fibrils. These results may also have interesting implications regarding our understanding of prion conversion and propagation both within the brain and via techniques such as protein misfolding cyclic amplification (PMCA) and quaking induced conversion (QuIC).
PMCID: PMC4043794  PMID: 24892647
2.  Metabolomic Analysis of Cold Acclimation of Arctic Mesorhizobium sp. Strain N33 
PLoS ONE  2013;8(12):e84801.
Arctic Mesorhizobium sp. N33, isolated from nodules of Oxytropis arctobia in Canada’s eastern Arctic, has a growth temperature range from 0°C to 30°C and is a well-known cold-adapted rhizobium. The key molecular mechanisms underlying cold adaptation in Arctic rhizobia remain completely unknown. Since the concentration and contents of metabolites are closely related to stress adaptation, we applied GC-MS and NMR to identify and quantify fatty acids and water-soluble compounds possibly related to low-temperature acclimation in strain N33. Bacterial cells were grown at three different temperatures (4°C, 10°C and 21°C). Cells grown at 21°C were also cold-exposed to 4°C for different times (2, 4, 8, 60 and 240 minutes). We found that the poly-unsaturated linoleic acids 18∶2 (9, 12) and 18∶2 (6, 9) were more abundant in cells growing at 4 or 10°C than in cells cultivated at 21°C. The mono-unsaturated phospho/neutral fatty acid myristoleic acid 14∶1(11) was the most significantly overexpressed (45-fold) after 1 hour of exposure to 4°C. As reported in the literature, these fatty acids play important roles in cold adaptability by maintaining cell membrane fluidity and by providing energy to cells. Analysis of water-soluble compounds revealed that isobutyrate, sarcosine, threonine and valine accumulated to higher levels during exposure to 4°C. These metabolites might play a role in conferring cold acclimation to strain N33 at 4°C, probably by acting as cryoprotectants. Isobutyrate was highly upregulated (19.4-fold) during growth at 4°C, suggesting that this compound is a precursor for the cold-regulated fatty acid modifications involved in low-temperature adaptation.
PMCID: PMC3875568  PMID: 24386418
3.  SMPDB 2.0: Big Improvements to the Small Molecule Pathway Database 
Nucleic Acids Research  2013;42(D1):D478-D484.
The Small Molecule Pathway Database (SMPDB, is a comprehensive, colorful, fully searchable and highly interactive database for visualizing human metabolic, drug action, drug metabolism, physiological activity and metabolic disease pathways. SMPDB contains >600 pathways, nearly 75% of which are not found in any other database. All SMPDB pathway diagrams are extensively hyperlinked and include detailed information on the relevant tissues, organs, organelles, subcellular compartments, protein cofactors, protein locations, metabolite locations, chemical structures and protein quaternary structures. Since its last release in 2010, SMPDB has undergone substantial upgrades and significant expansion. In particular, the total number of pathways in SMPDB has grown by >70%. Additionally, every previously entered pathway has been completely redrawn, standardized, corrected, updated and enhanced with additional molecular or cellular information. Many SMPDB pathways now include transporter proteins as well as much more physiological, tissue, target organ and reaction compartment data. Thanks to the development of a standardized pathway drawing tool (called PathWhiz), all SMPDB pathways are now much more easily drawn and far more rapidly updated. PathWhiz has also allowed all SMPDB pathways to be saved in BioPAX format. Significant improvements to SMPDB’s visualization interface now make the browsing, selection, recoloring and zooming of pathways far easier and far more intuitive. Because of its utility and breadth of coverage, SMPDB is now integrated into several other databases including HMDB and DrugBank.
PMCID: PMC3965088  PMID: 24203708
4.  DrugBank 4.0: shedding new light on drug metabolism 
Nucleic Acids Research  2013;42(D1):D1091-D1097.
DrugBank ( is a comprehensive online database containing extensive biochemical and pharmacological information about drugs, their mechanisms and their targets. Since it was first described in 2006, DrugBank has rapidly evolved, both in response to user requests and in response to changing trends in drug research and development. Previous versions of DrugBank have been widely used to facilitate drug and in silico drug target discovery. The latest update, DrugBank 4.0, has been further expanded to contain data on drug metabolism and on absorption, distribution, metabolism, excretion and toxicity (ADMET), as well as other kinds of quantitative structure activity relationship (QSAR) information. These enhancements are intended to facilitate research in xenobiotic metabolism (both prediction and characterization), pharmacokinetics, pharmacodynamics and drug design/discovery. For this release, >1200 drug metabolites (including their structures, names, activity, abundance and other detailed data) have been added along with >1300 drug metabolism reactions (including metabolizing enzymes and reaction types) and dozens of drug metabolism pathways. Another 30 predicted or measured ADMET parameters have been added to each DrugCard, bringing the average number of quantitative ADMET values for Food and Drug Administration-approved drugs close to 40. Reference nuclear magnetic resonance and MS spectra have been added for almost 400 drugs, as well as spectral and mass matching tools to facilitate compound identification. This expanded collection of drug information is complemented by a number of new or improved search tools, including one that provides a simple analysis of drug–target, –enzyme and –transporter associations to provide insight into drug–drug interactions.
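The drug–target, –enzyme and –transporter association analysis mentioned above can be illustrated with a minimal sketch: two drugs that share a target, metabolizing enzyme or transporter are flagged as candidates for a possible interaction. The function `shared_proteins`, the dictionary layout and every drug and protein identifier below are hypothetical; this is not DrugBank's search tool and uses no DrugBank data.

```python
from itertools import combinations

# Hypothetical drug-protein associations grouped by role (NOT real DrugBank data).
ASSOCIATIONS = {
    "drug_a": {"target": {"P1"}, "enzyme": {"ENZ_X"}, "transporter": {"T1"}},
    "drug_b": {"target": {"P2"}, "enzyme": {"ENZ_X"}},
    "drug_c": {"target": {"P3"}, "transporter": {"T2"}},
}


def shared_proteins(associations):
    """For every pair of drugs, list the proteins they share, grouped by role.

    Pairs sharing nothing are omitted; a shared enzyme or transporter is only a
    hint of a possible pharmacokinetic interaction, not evidence of one.
    """
    result = {}
    for a, b in combinations(sorted(associations), 2):
        shared = {}
        for role in ("target", "enzyme", "transporter"):
            common = associations[a].get(role, set()) & associations[b].get(role, set())
            if common:
                shared[role] = sorted(common)
        if shared:
            result[(a, b)] = shared
    return result
```

With the toy data above, only `drug_a` and `drug_b` are flagged, because both are associated with the hypothetical enzyme `ENZ_X`.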
PMCID: PMC3965102  PMID: 24203711
5.  pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins 
Nucleic Acids Research  2013;42(D1):D326-D335.
The goal of pE-DB ( is to serve as an openly accessible database for the deposition of structural ensembles of intrinsically disordered proteins (IDPs) and of denatured proteins based on nuclear magnetic resonance spectroscopy, small-angle X-ray scattering and other data measured in solution. Owing to the inherent flexibility of IDPs, solution techniques are particularly appropriate for characterizing their biophysical properties, and structural ensembles in agreement with these data provide a convenient tool for describing the underlying conformational sampling. Database entries consist of (i) primary experimental data with descriptions of the acquisition methods and algorithms used for the ensemble calculations, and (ii) the structural ensembles consistent with these data, provided as a set of models in Protein Data Bank format. pE-DB is open for submissions from the community, and is intended as a forum for disseminating the structural ensembles and the methodologies used to generate them. While the need to represent IDP structures is clear, methods for determining and evaluating the structural ensembles are still evolving. The availability of the pE-DB database is expected to promote the development of new modeling methods and lead to a better understanding of how function arises from disordered states.
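Because pE-DB entries provide each ensemble as a set of models in PDB format, simple per-model descriptors can be computed directly from the coordinates. The sketch below is an illustration rather than part of pE-DB: it splits a multi-model PDB file on its ENDMDL records, reads x, y and z from the fixed-width ATOM/HETATM columns 31-54, and reports an unweighted radius of gyration for each model. The function names are hypothetical, and weighting every atom equally instead of by mass is a simplifying assumption.

```python
import math


def split_models(pdb_text):
    """Return one list of (x, y, z) tuples per model in a PDB-format string."""
    models, current = [], []
    for line in pdb_text.splitlines():
        record = line[:6]
        if record in ("ATOM  ", "HETATM"):
            # Fixed-width PDB columns (1-based): x = 31-38, y = 39-46, z = 47-54.
            current.append((float(line[30:38]), float(line[38:46]),
                            float(line[46:54])))
        elif record == "ENDMDL":
            models.append(current)
            current = []
    if current:  # a single-model file may have no ENDMDL record
        models.append(current)
    return models


def radius_of_gyration(coords):
    """Unweighted radius of gyration: RMS distance of atoms from their centroid."""
    n = len(coords)
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                         for x, y, z in coords) / n)


def ensemble_rg(pdb_text):
    """Radius of gyration of every model in the ensemble, in file order."""
    return [radius_of_gyration(model) for model in split_models(pdb_text)]
```

The spread of the returned values gives a rough sense of how expanded or compact the conformers in an ensemble are.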
PMCID: PMC3964940  PMID: 24174539
6.  Phenol-Explorer 3.0: a major update of the Phenol-Explorer database to incorporate data on the effects of food processing on polyphenol content 
Polyphenols are a major class of bioactive phytochemicals whose consumption may play a role in the prevention of a number of chronic diseases such as cardiovascular diseases, type II diabetes and cancers. Phenol-Explorer, launched in 2009, is the only freely available web-based database on the content of polyphenols in food and their in vivo metabolism and pharmacokinetics. Here we report the third release of the database (Phenol-Explorer 3.0), which adds data on the effects of food processing on polyphenol contents in foods. Data on >100 foods, covering 161 polyphenols or groups of polyphenols before and after processing, were collected from 129 peer-reviewed publications and entered into new tables linked to the existing relational design. The effect of processing on polyphenol content is expressed in the form of retention factor coefficients, or the proportion of a given polyphenol retained after processing, adjusted for change in water content. The result is the first database on the effects of food processing on polyphenol content and, following the model initially defined for Phenol-Explorer, all data may be traced back to original sources. The new update will allow polyphenol scientists to more accurately estimate polyphenol exposure from dietary surveys.
Database URL:
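The retention factor described above can be sketched as a small calculation. This is a minimal sketch, assuming the adjustment for water content is made by comparing polyphenol contents on a dry-matter basis; the exact formula used by Phenol-Explorer is not stated here, and the function name `retention_factor` is hypothetical.

```python
def retention_factor(c_raw, water_raw, c_processed, water_processed):
    """Fraction of a polyphenol retained after processing (assumed dry-matter basis).

    c_raw, c_processed: polyphenol content per 100 g of fresh food before/after processing.
    water_raw, water_processed: water content as a fraction in [0, 1).
    """
    if not (0 <= water_raw < 1 and 0 <= water_processed < 1):
        raise ValueError("water content must be a fraction in [0, 1)")
    # Express both contents per unit of dry matter so that water gained or lost
    # during processing (e.g. boiling, drying) is not mistaken for a change in
    # polyphenol content.
    dry_raw = c_raw / (1.0 - water_raw)
    dry_processed = c_processed / (1.0 - water_processed)
    return dry_processed / dry_raw
```

For example, a raw food with 50 mg/100 g at 80% water (250 mg per 100 g dry matter) and its processed form with 30 mg/100 g at 85% water (200 mg per 100 g dry matter) give a retention factor of 0.8.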
PMCID: PMC3792339  PMID: 24103452
7.  Identification and characterization of ϕH111-1 
Bacteriophage  2013;3(4):e26649.
Characterization of prophages in sequenced bacterial genomes is important for virulence assessment, evolutionary analysis, and phage application development. The objective of this study was to identify complete, inducible prophages in the cystic fibrosis (CF) clinical isolate Burkholderia cenocepacia H111. Using the prophage-finding program PHAge Search Tool (PHAST), we identified three putative intact prophages in the H111 sequence. Virions were readily isolated from H111 culture supernatants following extended incubation. Using shotgun cloning and sequencing, one of these virions (designated ϕH111-1 [vB_BceM_ϕH111-1]) was identified as the infective particle of a PHAST-detected intact prophage. ϕH111-1 has an extremely broad host range with respect to B. cenocepacia strains and is predicted to use lipopolysaccharide (LPS) as a receptor. Bioinformatics analysis indicates that the prophage is 42,972 base pairs in length, encodes 54 proteins, and shows relatedness to the virion morphogenesis modules of AcaML1 and “Vhmllikevirus” myoviruses. As ϕH111-1 is active against a broad panel of clinical strains and encodes no putative virulence factors, it may be therapeutically effective for Burkholderia infections.
PMCID: PMC3829948  PMID: 24265978
prophage identification; PHAST; bioinformatics; phage therapy; Burkholderia cepacia complex
8.  The Human Urine Metabolome 
PLoS ONE  2013;8(9):e73076.
Urine has long been a “favored” biofluid among metabolomics researchers. It is sterile, easy to obtain in large volumes, largely free from interfering proteins or lipids and chemically complex. However, this chemical complexity has also made urine a particularly difficult substrate to fully understand. As a biological waste material, urine typically contains metabolic breakdown products from a wide range of foods, drinks, drugs, environmental contaminants, endogenous waste metabolites and bacterial by-products. Many of these compounds are poorly characterized and poorly understood. In an effort to improve our understanding of this biofluid we have undertaken a comprehensive, quantitative, metabolome-wide characterization of human urine. This involved both computer-aided literature mining and comprehensive, quantitative experimental assessment/validation. The experimental portion employed NMR spectroscopy, gas chromatography mass spectrometry (GC-MS), direct flow injection mass spectrometry (DFI/LC-MS/MS), inductively coupled plasma mass spectrometry (ICP-MS) and high performance liquid chromatography (HPLC) experiments performed on multiple human urine samples. This multi-platform metabolomic analysis allowed us to identify 445 and quantify 378 unique urine metabolites or metabolite species. The different analytical platforms were able to identify (quantify) a total of: 209 (209) by NMR, 179 (85) by GC-MS, 127 (127) by DFI/LC-MS/MS, 40 (40) by ICP-MS and 10 (10) by HPLC. Our use of multiple metabolomics platforms and technologies allowed us to identify several previously unknown urine metabolites and to substantially enhance the level of metabolome coverage. It also allowed us to critically assess the relative strengths and weaknesses of different platforms or technologies. The literature review led to the identification and annotation of another 2206 urinary compounds and was used to help guide the subsequent experimental studies.
An online database containing the complete set of 2651 confirmed human urine metabolite species, their structures (3079 in total), concentrations, related literature references and links to their known disease associations is freely available at
PMCID: PMC3762851  PMID: 24023812
9.  INMEX—a web-based tool for integrative meta-analysis of expression data 
Nucleic Acids Research  2013;41(Web Server issue):W63-W70.
The widespread application of various ‘omics’ technologies in biomedical research, together with the emergence of public data repositories, has resulted in a plethora of data sets for almost any given physiological state or disease condition. Properly combining or integrating these data sets with similar basic hypotheses can help reduce study bias, increase statistical power and improve overall biological understanding. However, the difficulties in data management and the complexities of analytical approaches have significantly limited the data integration needed for meta-analysis. Here, we introduce integrative meta-analysis of expression data (INMEX), a user-friendly web-based tool designed to support meta-analysis of multiple gene-expression data sets, as well as to enable integration of data sets from gene expression and metabolomics experiments. INMEX contains three functional modules. The data preparation module supports flexible data processing, annotation and visualization of individual data sets. The statistical analysis module allows researchers to combine multiple data sets based on P-values, effect sizes, rank orders and other features. The significant genes can be examined in the functional analysis module for enriched Gene Ontology terms or Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, or for expression profile visualization. INMEX has built-in support for common gene/metabolite identifiers (IDs), as well as 45 popular microarray platforms for human, mouse and rat. Complex operations are performed through a user-friendly web interface in a step-by-step manner. INMEX is freely available at
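INMEX is described as combining data sets based on P-values, among other options. One standard way to combine independent P-values for a single gene is Fisher's method, sketched below; whether INMEX uses exactly this statistic is an assumption, and the name `fisher_combined_p` is hypothetical. Under the null hypothesis the statistic X = -2 * sum(ln p_i) over k studies follows a chi-squared distribution with 2k degrees of freedom, and for even degrees of freedom the upper tail has a closed form, so only the standard library is needed.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values for one gene with Fisher's method.

    Uses the closed-form chi-squared survival function for 2k degrees of freedom:
    P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j          # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total
```

For a single study the combined value equals the input P-value; two studies that each give P = 0.05 combine to roughly 0.017, reflecting the added evidence.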
PMCID: PMC3692077  PMID: 23766290
10.  Development of Ecom50 and Retention Index Models for Non-Targeted Metabolomics: Identification of 1,3-dicyclohexylurea in Human Serum by HPLC/Mass Spectrometry 
The goal of many metabolomic studies is to identify the molecular structure of endogenous molecules that are differentially expressed among sample or treatment groups. The identified compounds can then be used to gain an understanding of disease mechanisms. Unfortunately, despite recent advances in a variety of analytical techniques, small-molecule (<1000 Da) identification remains difficult. Rarely can a chemical structure be determined from experimental “features” such as retention time, exact mass, and collision induced dissociation spectra. Thus, without knowing the structure, biological significance remains obscure. In this study we explore an identification method in which the measured exact mass of an unknown is used to query available chemical databases to compile a list of candidate compounds. Predictions are made for the candidates using models of experimental features that have been measured for the unknown. The predicted values are used to filter the candidate list by eliminating compounds with predicted values substantially different from the unknown. The intent is to reduce the list of candidates to a reasonable number that can be obtained and measured for confirmation. To facilitate this exploration, we measured data and created models for two experimental features: MS Ecom50 (the energy in eV required to fragment 50% of a selected precursor ion) and HPLC retention index. Using a dataset of 52 compounds, Ecom50 models were developed based on both Molconn and CODESSA structural descriptors. These models gave r² values of 0.89 to 0.94 depending on the number of inputs, the modeling algorithm chosen, and whether neutral or protonated structures were used. The retention index model was developed with 400 compounds using a back propagation artificial neural network and 33 Molconn structure descriptors. External validation gave v2 = 0.86 and a standard error of 38 retention index units.
As a test of the validity of the filtering approach, the Ecom50 and retention index models, along with exact mass and collision induced dissociation spectra matching, were used to identify 1,3-dicyclohexylurea in human plasma. This compound was not previously known to exist in human biofluids, and it shared its elemental formula with 315 other candidate compounds downloaded from PubChem. These results suggest that the use of Ecom50 and retention index predictive models can improve non-targeted metabolite structure identification using HPLC/MS-derived structural features.
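The filtering step can be sketched as follows: candidates retrieved by exact-mass search are kept only if every predicted feature lies within a tolerance of the value measured for the unknown. The `Candidate` class, the tolerances and all example values are hypothetical; in particular, setting the retention-index tolerance to twice the reported standard error of 38 units is an assumption made for illustration, not the authors' criterion.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    predicted_ecom50: float  # eV; would come from an Ecom50 model (hypothetical here)
    predicted_ri: float      # retention index units; would come from an RI model


def filter_candidates(candidates, measured_ecom50, measured_ri,
                      ecom50_tol, ri_tol):
    """Keep candidates whose predicted features all lie within tolerance of the unknown."""
    return [
        c for c in candidates
        if abs(c.predicted_ecom50 - measured_ecom50) <= ecom50_tol
        and abs(c.predicted_ri - measured_ri) <= ri_tol
    ]


# Hypothetical example: three candidates sharing the unknown's elemental formula.
candidates = [
    Candidate("candidate_1", predicted_ecom50=1.9, predicted_ri=610.0),
    Candidate("candidate_2", predicted_ecom50=2.0, predicted_ri=820.0),
    Candidate("candidate_3", predicted_ecom50=3.1, predicted_ri=630.0),
]
# Assumed tolerances: RI within 2 x 38 units (twice the reported standard error).
kept = filter_candidates(candidates, measured_ecom50=2.0, measured_ri=600.0,
                         ecom50_tol=0.3, ri_tol=2 * 38)
```

Here `candidate_2` fails on retention index and `candidate_3` on Ecom50, so only `candidate_1` would be carried forward for confirmation with an authentic standard.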
PMCID: PMC3376006  PMID: 22489687
metabolomics; retention index; Ecom50; mass spectrometry; HPLC; database searching; structure identification; molecular topology; Molconn; Codessa; artificial neural network
11.  Nanopore Analysis of Wild-Type and Mutant Prion Protein (PrPC): Single Molecule Discrimination and PrPC Kinetics 
PLoS ONE  2013;8(2):e54982.
Prion diseases are fatal neurodegenerative diseases associated with the conversion of cellular prion protein (PrPC) in the central nervous system into the infectious isoform (PrPSc). The mechanics of conversion are almost entirely unknown, with understanding stymied by the lack of an atomic-level structure for PrPSc. A number of pathogenic PrPC mutants exist that are characterized by an increased propensity for conversion into PrPSc and that differ from wild-type by only a single amino-acid point mutation in their primary structure. These mutations are known to perturb the stability and conformational dynamics of the protein. Understanding how this occurs may provide insight into the mechanism of PrPC conversion. In this work we sought to explore wild-type and pathogenic mutant prion protein structure and dynamics by analysis of the current fluctuations through an organic α-hemolysin nanometer-scale pore (nanopore) in which a single prion protein has been captured electrophoretically. In doing this, we find that wild-type and D178N mutant PrPC (a PrPC mutant associated with both Fatal Familial Insomnia and Creutzfeldt-Jakob disease) exhibit easily distinguishable current signatures and kinetics inside the pore, and we further demonstrate, with the use of Hidden Markov Model signal processing, accurate discrimination between these two proteins at the single molecule level based on the kinetics of a single PrPC capture event. Moreover, we present a four-state model to describe wild-type PrPC kinetics in the pore as a first step in our investigation of the differences in kinetics and conformational dynamics between wild-type and D178N mutant PrPC.
These results demonstrate the potential of nanopore analysis for highly sensitive, real-time protein and small molecule detection based on single molecule kinetics inside a nanopore, and show the utility of this technique as an assay to probe differences in stability between wild-type and mutant prion proteins at the single molecule level.
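The single-molecule discrimination described above rests on asking which of two hidden Markov models better explains one capture event. A minimal sketch, assuming the current trace has already been quantized into discrete blockade levels (0 = shallower, 1 = deeper) and using two toy two-state models with made-up parameters: the paper's fitted models (including the four-state wild-type model) and their emission distributions are not reproduced here, and all names are hypothetical. The event is assigned to whichever model gives the higher log-likelihood under the scaled forward algorithm.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-emission HMM.

    start[i]: initial probability of state i
    trans[i][j]: probability of moving from state i to state j
    emit[i][o]: probability that state i emits symbol o
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)          # P(obs[t] | obs[:t]); rescale to avoid underflow
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p


def classify(obs, models):
    """Return the name of the model with the highest log-likelihood for obs."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


EMIT = [[0.9, 0.1], [0.1, 0.9]]  # state 0 mostly emits level 0, state 1 level 1
# Hypothetical toy models (start, trans, emit); NOT fitted to any PrPC data.
MODELS = {
    "slow_switching": ([0.5, 0.5], [[0.95, 0.05], [0.05, 0.95]], EMIT),
    "fast_switching": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], EMIT),
}
```

A trace with long dwells at each level (for example ten 0s followed by ten 1s) is assigned to the slow-switching model, while a rapidly alternating trace is assigned to the fast-switching one.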
PMCID: PMC3564863  PMID: 23393562
12.  An improved method to detect correct protein folds using partial clustering 
BMC Bioinformatics  2013;14:11.
Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit poor runtime performance on large decoy sets. We hypothesized that a more efficient “partial” clustering approach in combination with an improved scoring scheme could significantly improve both the speed and performance of existing candidate selection methods.
We propose a new scheme that performs rapid but incomplete clustering on protein decoys. Our method detects structurally similar decoys (measured using either Cα RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated our new clustering strategy with several different scoring functions to assess both the performance and speed in identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method can improve the correct fold detection rate as assessed by two different quality criteria. This improvement is significantly better than that achieved by two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency testing shows that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite.
The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still allows superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction performance.
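The idea of extracting representatives without clustering every decoy can be illustrated with a generic sketch; this is not HS-Forest, whose algorithmic details are not given here. In practice `distance` would be Cα RMSD after superposition (or a GDT-TS-based dissimilarity); the random-sample-then-greedy-cover strategy, the function name and all parameters are hypothetical. Only the sampled decoys are compared against the full set, so the number of distance evaluations grows as sample_size × N rather than N².

```python
import random


def partial_cluster_representatives(decoys, distance, threshold,
                                    sample_size, n_reps, seed=0):
    """Pick up to n_reps representative decoys without clustering every decoy.

    Each sampled decoy's neighbourhood is the set of decoys within `threshold`.
    Representatives are chosen greedily as the sampled decoy whose neighbourhood
    covers the most decoys not already covered by earlier representatives.
    """
    rng = random.Random(seed)
    sample = rng.sample(range(len(decoys)), min(sample_size, len(decoys)))
    neighbours = {
        i: {j for j in range(len(decoys))
            if distance(decoys[i], decoys[j]) <= threshold}
        for i in sample
    }
    reps, covered = [], set()
    while neighbours and len(reps) < n_reps:
        best = max(neighbours, key=lambda i: len(neighbours[i] - covered))
        if not neighbours[best] - covered:
            break  # every remaining neighbourhood is already covered
        reps.append(best)
        covered |= neighbours.pop(best)
    return reps
```

In a toy run with one-dimensional "decoys" (20 points near 0 and 5 points near 10, distance = absolute difference, threshold 1), the first representative comes from the larger, denser group, which mirrors the usual assumption that the most populated structural neighbourhood is the best candidate for the correct fold.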
PMCID: PMC3626854  PMID: 23323835
13.  Chapter 3: Small Molecules and Disease 
PLoS Computational Biology  2012;8(12):e1002805.
“Big” molecules such as proteins and genes continue to capture the imagination of most biologists, biochemists and bioinformaticians. “Small” molecules, on the other hand, are the molecules that most biologists, biochemists and bioinformaticians prefer to ignore. However, it is becoming increasingly apparent that small molecules such as amino acids, lipids and sugars play a far more important role in all aspects of disease etiology and disease treatment than previously realized. This particular chapter focuses on an emerging field of bioinformatics called “chemical bioinformatics” – a discipline that has evolved to help address the blended chemical and molecular biological needs of toxicogenomics, pharmacogenomics, metabolomics and systems biology. In the following pages we will cover several topics related to chemical bioinformatics. First, a brief overview of some of the most important or useful chemical bioinformatic resources will be given. Second, a more detailed overview will be given of those particular resources that allow researchers to connect small molecules to diseases. This section will focus on describing a number of recently developed databases or knowledgebases that explicitly relate small molecules – either as the treatment, symptom or cause – to disease. Finally, a short discussion will be provided on newly emerging software tools that exploit these databases as a means to discover new biomarkers or even new treatments for disease.
PMCID: PMC3531289  PMID: 23300405
14.  Virtual Screening and Biological Evaluation of Inhibitors Targeting the XPA-ERCC1 Interaction 
PLoS ONE  2012;7(12):e51329.
Nucleotide excision repair (NER) removes many types of DNA lesions including those induced by UV radiation and platinum-based therapy. Resistance to platinum-based therapy correlates with high expression of ERCC1, a major element of the NER machinery. The interaction between ERCC1 and XPA is essential for a successful NER function. Therefore, one way to regulate NER is by inhibiting the activity of ERCC1 and XPA.
Methodology/Principal Findings
Here we continued our earlier efforts aimed at the identification and characterization of novel inhibitors of the ERCC1-XPA interaction. We used a refined virtual screening approach combined with a biochemical and biological evaluation of the compounds for their ability to interact with ERCC1 and to sensitize cells to UV radiation. Our findings reveal a new validated ERCC1-XPA inhibitor that significantly sensitized colon cancer cells to UV radiation indicating a strong inhibition of the ERCC1-XPA interaction.
NER is a major factor in acquiring resistance to platinum-based therapy. Regulating the NER pathway has the potential of improving the efficacy of platinum treatments. One approach that we followed is to inhibit the essential interaction between the two NER elements, ERCC1 and XPA. Here, we performed virtual screening against the ERCC1-XPA interaction and identified novel inhibitors that block the XPA-ERCC1 binding. The identified inhibitors significantly sensitized colon cancer cells to UV radiation indicating a strong inhibition of the ERCC1-XPA interaction.
PMCID: PMC3522735  PMID: 23272099
15.  The Metabolomic Profile of Umbilical Cord Blood in Neonatal Hypoxic Ischaemic Encephalopathy 
PLoS ONE  2012;7(12):e50520.
Hypoxic ischaemic encephalopathy (HIE) in newborns can cause significant long-term neurological disability. The insult is a complex injury characterised by energy failure and disruption of cellular homeostasis, leading to mitochondrial damage. The importance of individual metabolic pathways, and their interaction in the disease process is not fully understood. The aim of this study was to describe and quantify the metabolomic profile of umbilical cord blood samples in a carefully defined population of full-term infants with HIE.
Methods and Findings
The injury severity was defined using both the modified Sarnat score and continuous multichannel electroencephalography. Using these classification systems, our population was divided into those with confirmed HIE (n = 31), asphyxiated infants without encephalopathy (n = 40) and matched controls (n = 71). All had umbilical cord blood drawn and biobanked at −80°C within 3 hours of delivery. A combined direct injection and LC-MS/MS assay (AbsolutIDQ p180 kit, Biocrates Life Sciences AG, Innsbruck, Austria) was used for the metabolomic analyses of the samples. Targeted metabolomic analysis showed a significant alteration between study groups in 29 metabolites from 3 distinct classes (Amino Acids, Acylcarnitines, and Glycerophospholipids). Nine of these metabolites were significantly altered only between neonates with HIE and matched controls, while 14 were significantly altered in both study groups. The multivariate discriminant analysis models showed clear multifactorial metabolite associations with both asphyxia and HIE. A logistic regression model using 5 metabolites clearly delineates the severity of asphyxia and classifies HIE infants with an AUC of 0.92. These data describe widespread disruption not only to energy pathways but also to nitrogen and lipid metabolism in both asphyxia and HIE.
This study shows that a multi-platform targeted approach to metabolomic analyses using accurately phenotyped and meticulously biobanked samples provides insight into the pathogenesis of perinatal asphyxia. It highlights the potential for metabolomic technology to develop a diagnostic test for HIE.
PMCID: PMC3515614  PMID: 23227182
16.  Translational biomarker discovery in clinical metabolomics: an introductory tutorial 
Metabolomics  2012;9(2):280-299.
Metabolomics is increasingly being applied towards the identification of biomarkers for disease diagnosis, prognosis and risk prediction. Unfortunately, among the many published metabolomic studies focusing on biomarker discovery, there is very little consistency and relatively little rigor in how researchers select, assess or report their candidate biomarkers. In particular, few studies report any measure of sensitivity or specificity, or provide receiver operating characteristic (ROC) curves with associated confidence intervals. Even fewer studies explicitly describe or release the biomarker model used to generate their ROC curves. This is surprising given that for biomarker studies in most other biomedical fields, ROC curve analysis is generally considered the standard method for performance assessment. Because the ultimate goal of biomarker discovery is the translation of those biomarkers to clinical practice, it is clear that the metabolomics community needs to start “speaking the same language” in terms of biomarker analysis and reporting, especially if it wants to see metabolite markers being routinely used in the clinic. In this tutorial, we will first introduce the concept of ROC curves and describe their use in single biomarker analysis for clinical chemistry. This includes the construction of ROC curves, understanding the meaning of area under ROC curves (AUC) and partial AUC, as well as the calculation of confidence intervals. The second part of the tutorial focuses on biomarker analyses within the context of metabolomics. This section describes different statistical and machine learning strategies that can be used to create multi-metabolite biomarker models and explains how these models can be assessed using ROC curves. In the third part of the tutorial we discuss common issues and potential pitfalls associated with different analysis methods and provide readers with a list of nine recommendations for biomarker analysis and reporting.
To help readers test, visualize and explore the concepts presented in this tutorial, we also introduce a web-based tool called ROCCET (ROC Curve Explorer & Tester, ROCCET was originally developed as a teaching aid, but it can also serve as a training and testing resource to help metabolomics researchers build biomarker models and conduct a range of common ROC curve analyses for biomarker studies.
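The ROC concepts the tutorial introduces (constructing the curve, the area under it, a confidence interval and an optimal threshold) can be sketched in plain Python. This is a minimal illustration, not ROCCET's implementation; the function names are hypothetical, the optimal threshold uses Youden's index (sensitivity + specificity - 1) as one common choice, and the confidence interval is a simple percentile bootstrap.

```python
import random


def roc_curve(scores, labels):
    """ROC points (FPR, TPR, threshold) as the threshold sweeps from high to low.

    labels are 1 for cases and 0 for controls; higher scores indicate "case".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one case and one control")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0, float("inf"))]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        # Samples tied at the same score are consumed together as one ROC step.
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos, thr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]))


def youden_threshold(points):
    """Threshold maximising TPR - FPR (sensitivity + specificity - 1)."""
    return max(points, key=lambda p: p[1] - p[0])[2]


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(auc(roc_curve([scores[i] for i in idx], ys)))
    aucs.sort()
    return aucs[int((alpha / 2) * n_boot)], aucs[int((1 - alpha / 2) * n_boot) - 1]
```

For eight samples with scores 0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2 and labels 1, 1, 0, 1, 0, 0, 1, 0, the AUC is 0.75: 12 of the 16 case/control pairs are ranked correctly.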
Electronic supplementary material
The online version of this article (doi:10.1007/s11306-012-0482-9) contains supplementary material, which is available to authorized users.
PMCID: PMC3608878  PMID: 23543913
Biomarker analysis; ROC curve; AUC; Confidence intervals; Optimal threshold; Sample size; Bootstrapping; Cross validation; Biomarker validation and reporting
17.  HMDB 3.0—The Human Metabolome Database in 2013 
Nucleic Acids Research  2012;41(D1):D801-D807.
The Human Metabolome Database (HMDB) ( is a resource dedicated to providing scientists with the most current and comprehensive coverage of the human metabolome. Since its first release in 2007, the HMDB has been used to facilitate research for nearly 1000 published studies in metabolomics, clinical biochemistry and systems biology. The most recent release of HMDB (version 3.0) has been significantly expanded and enhanced over the 2009 release (version 2.0). In particular, the number of annotated metabolite entries has grown from 6500 to more than 40 000 (more than a sixfold increase). This enormous expansion is a result of the inclusion of both ‘detected’ metabolites (those with measured concentrations or experimental confirmation of their existence) and ‘expected’ metabolites (those for which biochemical pathways are known or human intake/exposure is frequent but the compound has yet to be detected in the body). The latest release also has greatly increased the number of metabolites with biofluid or tissue concentration data, the number of compounds with reference spectra and the number of data fields per entry. In addition to this expansion in data quantity, new database visualization tools and new data content have been added or enhanced. These include better spectral viewing tools, more powerful chemical substructure searches, an improved chemical taxonomy and better, more interactive pathway maps. This article describes these enhancements to the HMDB, which was previously featured in the 2009 NAR Database Issue. (Note to referees: HMDB 3.0 will go live on 18 September 2012.)
PMCID: PMC3531200  PMID: 23161693
18.  ECMDB: The E. coli Metabolome Database 
Nucleic Acids Research  2012;41(D1):D625-D630.
The Escherichia coli Metabolome Database (ECMDB) is a comprehensively annotated metabolomic database containing detailed information about the metabolome of E. coli (K-12). Modelled closely on the Human and Yeast Metabolome Databases, the ECMDB contains >2600 metabolites with links to ∼1500 different genes and proteins, including enzymes and transporters. The information in the ECMDB has been collected from dozens of textbooks, journal articles and electronic databases. Each metabolite entry in the ECMDB contains an average of 75 separate data fields, including comprehensive compound descriptions, names and synonyms, chemical taxonomy, compound structural and physicochemical data, bacterial growth conditions and substrates, reactions, pathway information, enzyme data, gene/protein sequence data and numerous hyperlinks to images, references and other public databases. The ECMDB also includes an extensive collection of intracellular metabolite concentration data compiled from our own work as well as other published metabolomic studies. This information is further supplemented with thousands of fully assigned reference nuclear magnetic resonance and mass spectrometry spectra obtained from pure E. coli metabolites that we (and others) have collected. Extensive searching, relational querying and data browsing tools are also provided that support text, chemical structure, spectral, molecular weight and gene/protein sequence queries. Because of E. coli’s importance as a model organism for biologists and as a biofactory for industry, we believe this kind of database could have considerable appeal not only to metabolomics researchers but also to molecular biologists, systems biologists and individuals in the biotechnology industry.
PMCID: PMC3531117  PMID: 23109553
19.  Phenol-Explorer 2.0: a major update of the Phenol-Explorer database integrating data on polyphenol metabolism and pharmacokinetics in humans and experimental animals 
Phenol-Explorer, launched in 2009, is the only comprehensive web-based database on the content in foods of polyphenols, a major class of food bioactives that receive considerable attention due to their role in the prevention of diseases. Polyphenols are rarely absorbed and excreted in their ingested forms but are extensively metabolized in the body, and until now, no database has allowed the recall of identities and concentrations of polyphenol metabolites in biofluids after the consumption of polyphenol-rich sources. Knowledge of these metabolites is essential in the planning of experiments whose aim is to elucidate the effects of polyphenols on health. Release 2.0 is the first major update of the database, allowing the rapid retrieval of data on the biotransformations and pharmacokinetics of dietary polyphenols. Data on 375 polyphenol metabolites identified in urine and plasma were collected from 236 peer-reviewed publications on polyphenol metabolism in humans and experimental animals and added to the database by means of an extended relational design. Pharmacokinetic parameters have been collected and can be retrieved in both tabular and graphical form. The web interface has been enhanced and now allows the filtering of information according to various criteria. Phenol-Explorer 2.0, which will be periodically updated, should prove to be an even more useful and capable resource for polyphenol scientists because bioactivities and health effects of polyphenols are dependent on the nature and concentrations of metabolites reaching the target tissues. The Phenol-Explorer database is publicly available and can be found online at
Database URL:
PMCID: PMC3414821  PMID: 22879444
20.  Calculation of Local Water Densities in Biological Systems — A Comparison of Molecular Dynamics Simulations and the 3D-RISM-KH Molecular Theory of Solvation 
Water plays a unique role in all living organisms. Not only is it nature’s ubiquitous solvent, but it also actively takes part in many cellular processes. In particular, the structure and properties of interfacial water near biomolecules like proteins are often related to the function of the respective molecule. It can therefore be highly instructive to study the local water density around solutes in cellular systems, particularly when solvent-mediated forces like the hydrophobic effect are relevant. Computational methods like molecular dynamics (MD) simulations seem well suited to study these systems at the atomic level. However, due to sampling requirements, it is not clear that MD simulations are indeed the method of choice to obtain converged densities at a given level of precision. We here compare the calculation of local water densities with two different methods, MD simulations and the three-dimensional reference interaction site model with the Kovalenko-Hirata closure (3D-RISM-KH). In particular, we investigate the convergence of the local water density to assess the required simulation times for different levels of resolution. Moreover, we provide a quantitative comparison of the densities calculated with MD and with 3D-RISM-KH, and investigate the effect of the choice of the water model for both methods. Our results show that 3D-RISM-KH yields density distributions that are very similar to those from MD up to a 0.5 Å resolution, but at a significantly reduced computational cost. The combined use of MD and 3D-RISM-KH emerges as a promising approach for efficient solvent sampling in dynamical systems.
PMCID: PMC3407544  PMID: 21174421
interfacial water; solvation; confined water; hydrophobic effect; MD; box size
21.  Comparative analysis of essential collective dynamics and NMR-derived flexibility profiles in evolutionarily diverse prion proteins 
Prion  2011;5(3):188-200.
Collective motions on ns-µs time scales are known to have a major impact on protein folding, stability, binding and enzymatic efficiency. It is also believed that these motions may have an important role in the early stages of prion protein misfolding and prion disease. In an effort to accurately characterize these motions and their potential influence on misfolding and prion disease transmissibility, we have conducted a combined analysis of molecular dynamics simulations and NMR-derived flexibility measurements over a diverse range of prion proteins. Using a recently developed numerical formalism, we have analyzed the essential collective dynamics (ECD) for prion proteins from eight different species including human, cow, elk, cat, hamster, chicken, turtle and frog. We also compared the numerical results with flexibility profiles generated by the random coil index (RCI) from NMR chemical shifts. Prion protein backbone flexibility derived from experimental NMR data and from theoretical computations show strong agreement with each other, demonstrating that it is possible to predict the observed RCI profiles employing the numerical ECD formalism. Interestingly, flexibility differences in the loop between the second β strand (S2) and the second α helix (HB) appear to distinguish prion proteins from species that are susceptible to prion disease and those that are resistant. Our results show that the different levels of flexibility in the S2-HB loop in various species are predictable via the ECD method, indicating that ECD may be used to identify disease resistant variants of prion proteins, as well as the influence of prion protein mutations on disease susceptibility or misfolding propensity.
PMCID: PMC3226046  PMID: 21869604
prion proteins structural stability; molecular dynamics simulation; essential collective dynamics; protein dynamic domains; biomolecular NMR; rigid loop
22.  METAGENassist: a comprehensive web server for comparative metagenomics 
Nucleic Acids Research  2012;40(Web Server issue):W88-W95.
With recent improvements in DNA sequencing and sample extraction techniques, the quantity and quality of metagenomic data are now growing exponentially. This abundance of richly annotated metagenomic data and bacterial census information has spawned a new branch of microbiology called comparative metagenomics. Comparative metagenomics involves the comparison of bacterial populations between different environmental samples, different culture conditions or different microbial hosts. However, in order to do comparative metagenomics, one typically requires a sophisticated knowledge of multivariate statistics and/or advanced software programming skills. To make comparative metagenomics more accessible to microbiologists, we have developed a freely accessible, easy-to-use web server for comparative metagenomic analysis called METAGENassist. Users can upload their bacterial census data from a wide variety of common formats, using either amplified 16S rRNA data or shotgun metagenomic data. Metadata concerning environmental, culture, or host conditions can also be uploaded. During the data upload process, METAGENassist also performs an automated taxonomic-to-phenotypic mapping. Phenotypic information covering nearly 20 functional categories such as GC content, genome size, oxygen requirements, energy sources and preferred temperature range is automatically generated from the taxonomic input data. Using this phenotypically enriched data, users can then perform a variety of multivariate and univariate data analyses including fold change analysis, t-tests, PCA, PLS-DA, clustering and classification. To facilitate data processing, users are guided through a step-by-step analysis workflow using a variety of menus, information hyperlinks and check boxes. METAGENassist also generates colorful, publication quality tables and graphs that can be downloaded and used directly in the preparation of scientific papers. METAGENassist is available at
PMCID: PMC3394294  PMID: 22645318
23.  MetaboAnalyst 2.0—a comprehensive server for metabolomic data analysis 
Nucleic Acids Research  2012;40(Web Server issue):W127-W133.
First released in 2009, MetaboAnalyst was a relatively simple web server designed to facilitate metabolomic data processing and statistical analysis. With continuing advances in metabolomics along with constant user feedback, it became clear that a substantial upgrade to the original server was necessary. MetaboAnalyst 2.0, which is the successor to MetaboAnalyst, represents just such an upgrade. MetaboAnalyst 2.0 now contains dozens of new features and functions including new procedures for data filtering, data editing and data normalization. It also supports multi-group data analysis, two-factor analysis as well as time-series data analysis. These new functions have also been supplemented with: (i) a quality-control module that allows users to evaluate their data quality before conducting any analysis, (ii) a functional enrichment analysis module that allows users to identify biologically meaningful patterns using metabolite set enrichment analysis and (iii) a metabolic pathway analysis module that allows users to perform pathway analysis and visualization for 15 different model organisms. In developing MetaboAnalyst 2.0 we have also substantially improved its graphical presentation tools. All images are now generated using anti-aliasing and are available over a range of resolutions, sizes and formats (PNG, TIFF, PDF, PostScript, or SVG). To improve its performance, MetaboAnalyst 2.0 is now hosted on a much more powerful server with substantially modified code to take advantage of the server’s multi-core CPUs for computationally intensive tasks. MetaboAnalyst 2.0 also maintains a collection of 50 or more FAQs and more than a dozen tutorials compiled from user queries and requests. A downloadable version of MetaboAnalyst 2.0, along with detailed instructions for local installation, is now available as well.
PMCID: PMC3394314  PMID: 22553367
24.  Multi-platform characterization of the human cerebrospinal fluid metabolome: a comprehensive and quantitative update 
Genome Medicine  2012;4(4):38.
Human cerebrospinal fluid (CSF) is known to be a rich source of small molecule biomarkers for neurological and neurodegenerative diseases. In 2007, we conducted a comprehensive metabolomic study and performed a detailed literature review on metabolites that could be detected (via metabolomics or other techniques) in CSF. A total of 308 detectable metabolites were identified, of which only 23% were shown to be routinely identifiable or quantifiable with the metabolomics technologies available at that time. The continuing advancement in analytical technologies along with the growing interest in CSF metabolomics has led us to re-visit the human CSF metabolome and to re-assess both its size and the level of coverage that can be achieved with today's technologies.
We used five analytical platforms, including nuclear magnetic resonance (NMR), gas chromatography-mass spectrometry (GC-MS), liquid chromatography-mass spectrometry (LC-MS), direct flow injection-mass spectrometry (DFI-MS/MS) and inductively coupled plasma-mass spectrometry (ICP-MS) to perform quantitative metabolomics on multiple human CSF samples. This experimental work was complemented with an extensive literature review to acquire additional information on reported CSF compounds, their concentrations and their disease associations.
NMR, GC-MS and LC-MS methods allowed the identification and quantification of 70 CSF metabolites (as previously reported). DFI-MS/MS allowed the quantification of 78 metabolites (6 acylcarnitines, 13 amino acids, hexose, 42 phosphatidylcholines, 2 lyso-phosphatidylcholines and 14 sphingolipids), while ICP-MS provided quantitative results for 33 metal ions in CSF. Literature analysis led to the identification of 57 more metabolites. In total, 476 compounds have now been confirmed to exist in human CSF.
The use of improved metabolomic and other analytical techniques has led to a 54% increase in the known size of the human CSF metabolome over the past 5 years. Commonly available metabolomic methods, when combined, can now routinely identify and quantify 36% of the 'detectable' human CSF metabolome. Our experimental work measured 78 new metabolites that, to our knowledge, have not previously been reported in human CSF. An updated CSF metabolome database containing the complete set of 476 human CSF compounds, their concentrations, related literature references and links to their known disease associations is freely available at the CSF metabolome database.
PMCID: PMC3446266  PMID: 22546835
25.  Fatty Acid Composition of Developing Sea Buckthorn (Hippophae rhamnoides L.) Berry and the Transcriptome of the Mature Seed 
PLoS ONE  2012;7(4):e34099.
Sea buckthorn (Hippophae rhamnoides L.) is a hardy, fruit-producing plant known historically for its medicinal and nutraceutical properties. The most recognized product of sea buckthorn is its fruit oil, composed of seed oil that is rich in essential fatty acids, linoleic (18∶2ω-6) and α-linolenic (18∶3ω-3) acids, and pulp oil that contains high levels of monounsaturated palmitoleic acid (16∶1ω-7). Sea buckthorn is fast gaining popularity as a source of functional food and nutraceuticals, but currently has few genomic resources; therefore, we explored the fatty acid composition of Canadian-grown cultivars (ssp. mongolica) and the sea buckthorn seed transcriptome using the 454 GS FLX sequencing technology.
GC-MS profiling of fatty acids in seeds and pulp of berries indicated that the seed oil contained linoleic and α-linolenic acids at 33–36% and 30–36%, respectively, while the pulp oil contained palmitoleic acid at 32–42%. 454 sequencing of sea buckthorn cDNA collections from mature seeds yielded 500,392 sequence reads, which identified 89,141 putative unigenes represented by 37,482 contigs and 51,659 singletons. Functional annotation by Gene Ontology and computational prediction of metabolic pathways indicated that primary metabolism (protein>nucleic acid>carbohydrate>lipid) and fatty acid and lipid biosynthesis pathways were highly represented categories. Sea buckthorn sequences related to fatty acid biosynthesis genes in Arabidopsis were identified, and a subset of these was examined for transcript expression at four developmental stages of the berry.
This study provides the first comprehensive genomic resources represented by expressed sequences for sea buckthorn, and demonstrates that the seed oil of Canadian-grown sea buckthorn cultivars contains high levels of linoleic acid and α-linolenic acid in a close to 1∶1 ratio, which is beneficial for human health. These data provide the foundation for further studies on sea buckthorn oil, the enzymes involved in its biosynthesis, and the genes involved in the general hardiness of sea buckthorn against environmental conditions.
PMCID: PMC3338740  PMID: 22558083
