The emerging field of “metabolomics,” in which a large number of small molecule metabolites from body fluids or tissues are detected quantitatively in a single step, promises immense potential for early diagnosis, therapy monitoring and for understanding the pathogenesis of many diseases. Metabolomics methods are mostly focused on the information rich analytical techniques of nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS). Analysis of the data from these high-resolution methods using advanced chemometric approaches provides a powerful platform for translational and clinical research, and diagnostic applications. In this review, the current trends and recent advances in NMR- and MS-based metabolomics are described with a focus on the development of advanced NMR and MS methods, improved multivariate statistical data analysis and recent applications in the area of cancer, diabetes, inborn errors of metabolism, and cardiovascular diseases.
While vast progress in the fields of genomics and proteomics has occurred, additional evidence of biological end points of human diseases is highly desired for disease diagnosis, prognosis and therapeutic development. Thus, the detection of metabolites that are involved in human diseases or that can be used to help develop new drugs, using cells, tissue, organs or biological fluids, has risen in prominence over the past several years [1–6]. The promising field of metabolomics, and the closely related areas of metabonomics and metabolite profiling, involve the quantitative detection of multiple small molecule metabolites in biological systems. An improved understanding of biological systems at the molecular level, i.e., systems biology, is anticipated to result from the metabolomics approach, especially when combined with genomics and proteomics information. Perhaps even more importantly, truly “personalized medicine” is anticipated to become a reality through the advancement of metabolomics and other ‘omic’ sciences. A major advantage in the application of metabolomics stems from an improved ability to detect up to many hundreds of metabolites in parallel, which provides an efficient method for monitoring altered biochemistry. It is thought that the human body contains approximately 3000 to 5000 detectable metabolites, a sizable fraction of which have already been identified. In addition, metabolite concentration alterations are often amplified when compared to those of gene expression or protein levels, making the detection of metabolite profiles a relatively sensitive measure of biological status. Changes in biological status are then inferred from perturbations in the concentrations and fluxes of specific endogenous metabolites involved in a number of key disease-related or other specific cellular pathways. Thus, metabolomics can reveal crucial information that is closely related to the current disease or therapeutic status.
More generally, the metabolic profile of biological specimens is affected by numerous factors such as diet, age, ethnicity, drugs, lifestyle or gut microfloral populations, and these factors need to be either controlled or deconvoluted in order to obtain information specific to disease. A number of articles providing information on the background of NMR- and MS-based metabolomics and related areas, its various applications and technologies, as well as the advantages and limitations of the metabolomics approach have appeared [6, 9–19].
Among the analytical techniques that can be employed for metabolomics applications, nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS) are the most common. A number of methodologies within these two technological areas are currently being developed specifically to deal with the types of complex samples analyzed in metabolomics studies. NMR spectroscopy is known as one of the premier methods for the analyses of multi-component mixtures as it requires little or no sample preparation; is rapid, non-destructive, and non-invasive; and provides highly reproducible results (coefficient of variation ~1–2%). Peaks in the NMR spectra can be reliably assigned to specific metabolic species, based on their chemical shifts and multiplet patterns, and thus NMR provides a wealth of information on the identity and quantity of a large number of metabolites in parallel from a single experiment. With advanced high-throughput NMR methodology, up to 200 samples can be measured within a day with the assistance of flow-injection probes and automated liquid handlers. The detection limit can also be decreased to 10’s of ng by the use of high field magnets, cryogenically cooled probes, microcoil probes equipped to handle very small samples, and methodologies that couple NMR to liquid chromatography and solid phase extraction [20–22]. On the other hand, the intrinsic high sensitivity (typically pg level) of MS detection makes it an important method for measuring metabolites in complex biofluids. A variety of MS methods in combination with separation techniques such as gas-chromatography (GC) and liquid chromatography (LC) and their variants have been used in numerous metabolomics investigations [19, 23–27]. Recently, a variety of promising atmospheric sample introduction MS methods have been developed that require essentially no sample separation or preparation [28–30].
Data from NMR and MS experiments are generally complex since they contain qualitative/quantitative information on upwards of several hundreds of metabolites. Multivariate statistical analyses are thus used for data reduction and in particular for differentiating biofluids samples into “disease” and “control” populations based on the differences in signals of multiple metabolites (Figure 1). A variety of statistical methodologies exist and many are now easily accessible via commercial software or “freeware,” and these methods provide extremely helpful tools for filtering the large amounts of data and for accessing the often-subtle biochemical perturbations latent in the spectra [31–35]. In addition, these approaches are used to extract single biomarkers or sets of biomarkers with the best properties for the assessment of disease status. Validation of such putative biomarkers is of great importance, as is the biological understanding of the disease that can provide additional validation in the application of metabolomics.
Urine and blood serum or plasma are the most commonly used biofluids for metabolomics-based studies for the simple reasons that they both contain hundreds to thousands of detectable metabolites and can be obtained non- or minimally invasively. A number of other fluids such as cerebrospinal fluid, bile, seminal fluid, amniotic fluid, synovial fluid, gut aspirate and saliva have also been studied [36–38]. More recently, metabolic profiling of intact tissue and of its lipid and aqueous metabolite extracts has been gaining importance for biomarker detection.
Compared with other biofluids, the analysis of urine provides certain obvious advantages. The relatively low concentrations of proteins and high concentrations of low molecular weight compounds minimize sample preparation and result in high quality measurements due to the narrow line widths of the spectral peaks in the NMR spectra. These characteristics enhance the process of biomarker identification by NMR for both diagnostic and monitoring applications. However, the high salt content of urine is more challenging for MS measurements which typically require some sample pretreatment.
Blood maintains normal homeostasis in the human body through constant regulatory mechanisms, and hence metabolic profiling of serum/plasma provides a global view of the instantaneous metabolic status. Moreover, blood perfuses essentially all living cells in the human body and thus is anticipated to carry vital information on virtually every cell. Unlike urine, the NMR spectrum of serum/plasma includes both narrow signals from small molecule metabolites and broad signals from proteins and lipids. A variety of spectral editing methods are used to selectively detect the signals of either small or large molecules. MS analysis of serum is normally carried out using extracts and, in the case of GC/MS, derivatization procedures.
Metabolic profiling using intact tissue has gained momentum as an approach for understanding the molecular basis of diseases. This interest stems from the fact that biomarkers due to pathophysiological stress are anticipated to be more highly concentrated in the pathological source for diseases such as cancer. The latest technological advancements in NMR have reduced the required sample quantity to as little as a few mg, so that even biopsy tissue is sufficient to obtain good quality NMR spectra with resolution that is comparable to solution-state spectra. The rich metabolic profile of tissue is thought to be particularly useful for guiding the detection of biomarkers in more easily accessible biofluids.
A number of one- and two-dimensional NMR methods are currently used for metabolomics applications. A recent article provides NMR experimental protocols for the common biological samples used in metabolomics along with protocols for sample preparation.
Currently, the simple one pulse sequence and the one-dimensional nuclear Overhauser enhancement spectroscopy (NOESY) sequence with water suppression are the most commonly used NMR methods for metabolomics applications. Water suppression in the one pulse experiment depends more critically on good shimming. On the other hand, 1D NOESY is more robust and provides a flatter baseline under similar conditions. A number of pulse sequences are available, all of which are designed to effectively suppress the high intensity water signal while leaving the metabolite signals intact [41, 42]. A commonly used method for suppressing the broad signals from large molecules (such as in tissue or serum samples) is the Carr–Purcell–Meiboom–Gill (CPMG) sequence. This sequence is generally robust, and has been widely used in studies to date. In contrast, the so-called “diffusion edited” NMR experiment may be used for observing signals from large molecules such as lipids. The 1D selective TOCSY experiment has been successfully applied to metabolomics studies to detect metabolites quantitatively even if they are found at concentrations 10–100 times below those of the major components. This approach has been shown to be highly useful for detecting targeted metabolites in biological samples.
2D NMR methods are highly useful for reducing the spectral complexity and obtaining connectivity between the nuclei to make assignments and identify metabolites. However, these methods have not been widely used in metabolomics to date because of their increased acquisition time, data size, and complexity in data analysis. Nevertheless, a small but growing number of papers report using 2D approaches in metabolomics studies [46–49]. The most commonly used include 2D-J spectroscopy, correlation spectroscopy (COSY), total correlation spectroscopy (TOCSY), heteronuclear single quantum coherence (HSQC) spectroscopy and heteronuclear multiple bond correlation (HMBC) experiments. 2D J-resolved (JRES) spectroscopy is attractive for metabolomics studies because it can lead to a substantial simplification of the spectra. One drawback of this method is that the signal integrals are strongly influenced by T2 relaxation during the generally long t1 evolution period, and hence only a relative quantification of metabolite concentrations is possible.
The use of 13C NMR in metabolomics, while attractive from the standpoint of improved resolution, has been limited due to the low natural abundance (~1.1%) and low gyromagnetic ratio of 13C nuclei, and therefore 13C NMR requires unacceptably long data acquisition times. To improve this situation, a 13C labeling approach that can be carried out directly in aqueous solution at ambient temperature has been introduced. The analysis of complex mixtures such as urine, serum or other biofluids is improved because the approach can be combined with fast 2D (1H-13C) heteronuclear experiments to yield spectra with good signal-to-noise ratios. The method has been applied to identify patients with inborn errors of metabolism.
For the non-invasive metabolic profiling of tissue specimens, a technique called high resolution magic angle spinning (HRMAS) NMR spectroscopy is utilized. In this approach, a rotor containing the tissue sample is spun, typically at 3 to 6 kHz, at an angle of 54.7° relative to the applied magnetic field, resulting in high-resolution, liquid-like NMR spectra. Most of the common NMR pulse techniques can be used for tissue NMR applications. 1H HRMAS NMR spectra of tissue samples can generally be obtained using a small quantity of intact tissue (~5 to 20 mg).
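The 54.7° spinning angle quoted above is the so-called magic angle, at which the orientation-dependent factor (3cos²θ − 1) that scales line-broadening interactions such as dipolar coupling and chemical shift anisotropy goes to zero. A minimal Python sketch of that calculation follows; the function names are illustrative only.

```python
import math


def p2(theta_degrees: float) -> float:
    """Second Legendre polynomial, (3*cos^2(theta) - 1) / 2, for an angle in degrees."""
    c = math.cos(math.radians(theta_degrees))
    return 0.5 * (3.0 * c * c - 1.0)


def magic_angle_degrees() -> float:
    """Angle at which p2 vanishes: theta = arccos(1 / sqrt(3))."""
    return math.degrees(math.acos(1.0 / math.sqrt(3.0)))


if __name__ == "__main__":
    theta = magic_angle_degrees()
    print(round(theta, 2))        # 54.74
    print(abs(p2(theta)) < 1e-12)  # True
```

Spinning exactly along the field (0°) gives p2 = 1 and spinning perpendicular to it (90°) gives p2 = −0.5, so only the magic angle removes the orientation-dependent term entirely.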
MS methods coupled with prior separation modalities such as gas chromatography (GC), liquid chromatography (LC), and capillary electrophoresis (CE) provide enormous amounts of chemical information for metabolomics studies [19, 23–26, 53]. A range of MS instruments, including quadrupoles, triple quads, ion traps, and time-of-flight mass analyzers, are commonly used. Tandem MS (MS/MS or even MSn) methods are often used to validate the identity of unknown molecules. Fourier-transform ion cyclotron resonance (FT-ICR) provides an (expensive) alternative approach with extremely high resolution and a mass accuracy better than 1 ppm. The introduction of Orbitrap MS provides an alternative high resolution mass analyzer that detects ions with very high mass accuracy.
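Mass accuracy figures such as "better than 1 ppm" are conventionally expressed as the relative deviation of the measured m/z from the theoretical m/z, in parts per million. The short Python sketch below shows that arithmetic; the m/z values used in the example are made up for illustration and do not correspond to any particular metabolite.

```python
def mass_error_ppm(measured_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million (ppm)."""
    if theoretical_mz <= 0:
        raise ValueError("theoretical m/z must be positive")
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


if __name__ == "__main__":
    # Hypothetical values: a 0.0001 offset at m/z 200 is a 0.5 ppm error.
    error = mass_error_ppm(200.0001, 200.0000)
    print(abs(error) < 1.0)  # True: within a 1 ppm tolerance
```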
LC-MS is currently the most important MS-based approach for metabolomics applications because of its sensitivity and rich information content. Biofluids such as urine can be directly injected into the LC system, while samples such as serum require minimal sample preparation such as protein precipitation. LC-MS is considered to be a moderately high throughput method. The recently developed ultrahigh pressure liquid chromatography (UPLC) approach has significantly improved the chromatographic resolution, and reduced the limit of detection by 3–5 fold. Common challenges for LC- (and GC-) MS measurements are the inter-batch variation and the fact that the separation process makes the analysis time-consuming. In addition, the selectivity of LC/GC-MS to specific classes of analytes provides both benefits and complications.
The high separation efficiency and reproducibility of GC-MS also make it a very useful tool for metabolomics. Depending on the sample preparation conditions, GC-MS can be applied to the analysis of a wide range of metabolite classes including ketones, aldehydes, alcohols, esters, sulfides, sugars, sugar-phosphates, sugar-alcohols, organic acids, amino acids, lipids, peptides, alkaloids, amines and amides. Unlike LC-MS, GC-MS often requires rather extensive sample preparation steps such as chemical derivatization before the analysis. A new technology that incorporates a second GC column (2D GC-MS) is very promising for metabolomics applications because of the additional resolution it provides [27, 56].
As an alternative to chromatographic separation, sample effusion methods and several recently introduced atmospheric sample introduction methods appear promising for fast screening purposes. Recently developed techniques such as EESI (extractive electrospray ionization)-MS, DESI (desorption electrospray ionization)-MS, and DART (direct analysis in real time)-MS promise new avenues for metabolic profiling of human body specimens, particularly since these methods require little or no sample preparation or extraction. DESI is carried out by directing a charged and nebulized solvent toward an analyte of interest (which could be a biofluid sample directly deposited onto a piece of filter paper). DART-MS, which uses a stream of excited metastable He gas and hot N2 to volatilize and ionize analytes of interest, also provides real-time information on analytes of interest and can be used to identify metabolites. Another atmospheric method is EESI, which utilizes two colliding spray sources for ionization and introduction into the MS. EESI has been used recently to study human breath samples, and to analyze rat urine to identify dietary changes. MALDI (matrix-assisted laser desorption/ionization)-MS has been investigated for the simultaneous detection of several metabolites using a synthetic cocktail of 30 metabolites, both separately and after spiking into a microbial extract. A recent study that compared several ionization methods quantitatively using serum found more metabolite species by laser desorption on silica compared to other methods, indicating its potential for biomarker detection.
A summary of potentially useful NMR and MS methods for metabolomics applications is shown in Table 1.
Both NMR and MS data contain up to thousands of signals arising from the many hundreds of detected metabolites. Analysis of such complex data is extremely challenging, and thus a variety of pattern recognition methods are used to simplify the data. A number of data pre-treatments are generally required before statistical analysis can be performed to obtain meaningful information on healthy and disease samples.
For NMR data, baseline correction is used to reduce the effect of any non-ideal offsets in individual data. This is particularly important for low-abundance but potentially important metabolites that have small peaks and are more prone to baseline artifacts than high-abundance metabolites. Statistical analysis of spectral data sets requires each spectral peak (or variable) be compared throughout all observations (samples). Misalignment will jeopardize the construction of an appropriate model, which readily produces incorrect metabolic patterns and erroneous identification of potential biomarkers. NMR spectra can be aligned using a reference compound (for example trimethylsilylpropionic acid-d4 sodium salt, TSP or sodium 4,4-dimethyl-4-silapentane-1-sulphonate-d6, DSS). Metabolomic data analysis packages such as KnowItAll (BioRad, Philadelphia, PA), AMIX (Bruker, Billerica, MA) and others allow rapid analysis of biosamples with improved NMR data alignment. The alignment of LC- and GC-MS data is often more challenging because of the tendency of peaks to shift or even reverse in the chromatographic separation as well as the size of the data sets involved. A number of attempts at solving these challenges have been made [62–66], which are typically based on either pattern recognition, time warping, or similarity calculations. This important and challenging area is still evolving.
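As a rough illustration of reference-based alignment for NMR data, the Python sketch below shifts a chemical shift axis so that the tallest point inside a search window (assumed here to be the TSP or DSS singlet) lands at 0 ppm. The function name, the window limits and the toy spectrum are assumptions made for this example; they are not taken from KnowItAll, AMIX or any other package.

```python
def reference_to_peak(ppm, intensity, window=(-0.2, 0.2), target=0.0):
    """Return a new ppm axis shifted so the tallest point inside `window`
    (assumed to be the reference compound) sits at `target` ppm."""
    low, high = window
    candidates = [i for i, x in enumerate(ppm) if low <= x <= high]
    if not candidates:
        raise ValueError("no data points inside the reference window")
    peak_index = max(candidates, key=lambda i: intensity[i])
    offset = ppm[peak_index] - target
    return [x - offset for x in ppm]


if __name__ == "__main__":
    # Toy spectrum whose reference peak is displaced by +0.05 ppm.
    ppm_axis = [0.10, 0.05, 0.00, -0.05]
    signal = [1.0, 9.0, 2.0, 1.0]
    aligned = reference_to_peak(ppm_axis, signal)
    print([round(x, 2) for x in aligned])  # [0.05, 0.0, -0.05, -0.1]
```

A single global shift of this kind only corrects referencing errors; peak-specific shifts (for example from pH or ionic strength differences) would need the more elaborate alignment approaches mentioned above.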
Data binning is often used and follows spectral alignment. An important advantage of data binning is that it reduces the effect of peak misalignments. At the extremes, binning size does affect biomarker exploration, so judicious use of binning is advised. In any event, the full resolution NMR or MS spectra are recoverable if necessary for metabolite identification.
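The binning step can be sketched in a few lines of Python: intensities falling within each fixed-width interval of the chemical shift (or m/z) axis are summed into one variable. The bin width of 0.04 ppm used in the example is an assumption chosen for illustration, as are the function name and the toy data.

```python
def bin_spectrum(axis, intensity, start, stop, width):
    """Sum intensities into equal-width bins covering [start, stop).

    Points outside the range are ignored."""
    n_bins = int(round((stop - start) / width))
    bins = [0.0] * n_bins
    for x, y in zip(axis, intensity):
        index = int((x - start) // width)
        if 0 <= index < n_bins:
            bins[index] += y
    return bins


if __name__ == "__main__":
    ppm_axis = [0.01, 0.02, 0.05, 0.07, 0.11]
    signal = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(bin_spectrum(ppm_axis, signal, start=0.0, stop=0.12, width=0.04))
    # [3.0, 7.0, 5.0]
```

Because a peak that drifts slightly still falls into the same bin, small misalignments are absorbed; the cost is that neighbouring peaks narrower than the bin width are merged.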
Data scaling, which allows the emphasis of smaller concentration metabolites, is often used. A number of scaling methods are popular, including variance scaling (division by the standard deviation of the peak intensities across the set of spectra) and Pareto scaling (division by the square root of the standard deviation). Log scaling has been used to reduce the size of very large and dominant peaks. The data are then typically mean centered by subtracting the average of all the spectra. For samples such as urine, data normalization is also generally performed to reduce any dependence on overall concentration differences.
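The difference between variance scaling and Pareto scaling can be seen in a small Python sketch. It assumes the data are arranged with one row per spectrum and one column per variable (bin or peak), and it mean centers each column as part of the same step; the function name and the toy values are illustrative only.

```python
import math
import statistics


def scale_columns(data, method="pareto"):
    """Mean center each column and divide by a method-dependent factor.

    method: "center"   -> mean centering only
            "variance" -> divide by the standard deviation
            "pareto"   -> divide by the square root of the standard deviation
    """
    columns = list(zip(*data))
    scaled_columns = []
    for column in columns:
        mean = statistics.mean(column)
        sd = statistics.stdev(column)
        if method == "center":
            divisor = 1.0
        elif method == "variance":
            divisor = sd
        elif method == "pareto":
            divisor = math.sqrt(sd)
        else:
            raise ValueError(f"unknown scaling method: {method}")
        if divisor == 0:
            divisor = 1.0  # constant column: leave it at zero after centering
        scaled_columns.append([(value - mean) / divisor for value in column])
    return [list(row) for row in zip(*scaled_columns)]


if __name__ == "__main__":
    # Two spectra, two variables; the second variable is 100x more intense.
    spectra = [[1.0, 100.0], [3.0, 300.0]]
    print(scale_columns(spectra, "variance"))
    print(scale_columns(spectra, "pareto"))
```

With variance scaling the two columns end up identical, so the weak and the intense variable carry equal weight; with Pareto scaling the intense variable is still about ten times larger here, so large peaks are damped but not fully equalized.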
Modern multivariate statistical methods have become an essential part of the metabolomics field. While feature selection through the use of p-values is of high utility, the need to build predictive models based on multiple biomarkers necessitates the use of multivariate methods. These methods are also quite useful to reduce the dimensionality of the NMR/MS data, and to extract the maximum information from the data. A variety of such methods are capable of analyzing several thousand inputs or “variables” and their corresponding intensities. These statistical approaches are broadly classified into two categories: “unsupervised” and “supervised” methods.
Unsupervised methods classify the spectra without knowledge of the class of the biological specimens (such as disease or control) by using the NMR frequencies or MS m/z values and their intensities for each sample as the sole inputs. Principal component analysis (PCA) is the most commonly used unsupervised method in multivariate analysis. PCA generates orthogonal and ranked principal components (PCs) that explain the variance in the data. The PCs are essentially a new set of data descriptors (basis set or axes) obtained from linear combinations of the variables (metabolite signals) from the NMR or MS data. Hierarchical cluster analysis (HCA) aims to define natural clusters based on comparing distances between pairs of samples (or variables): small distances between samples imply that the samples share similar metabolite content representing similar physiological properties, dietary habits or disease grades, etc. HCA represents analytical results in the form of a dendrogram, and facilitates the visualization of different categories at a given similarity level. In biomarker discovery, HCA is usually used as a supporting method to more powerful methods such as PCA in order to target key individual metabolites or spectral regions that correlate most strongly with class membership. Proton NMR spectroscopy of sera coupled with PCA and HCA has been successfully used in discriminating 120 serum samples into three baseline clusters and two treatment clusters to detect variations in the metabolism of lipids resulting from statin treatments. K nearest neighbor (KNN) analysis is a method of classification based on the similarity within classes. Each spectrum can be treated as a point in a multi-dimensional space. In KNN, the Euclidean distance between every pair of spectra is first calculated. The class assignment of one sample is based on the majority vote of its nearest neighbors.
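The KNN procedure described above (Euclidean distances followed by a majority vote) can be sketched as follows in Python. Note that the vote presupposes that the neighbouring spectra already carry class labels; the function name, the value k = 3 and the two-variable toy spectra are assumptions made for this example.

```python
import math
from collections import Counter


def knn_classify(train_spectra, train_labels, query, k=3):
    """Assign `query` the majority label among its k nearest training spectra."""
    distances = sorted(
        (math.dist(spectrum, query), label)
        for spectrum, label in zip(train_spectra, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Each spectrum is reduced to two variables for readability.
    spectra = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    labels = ["control", "control", "control", "disease", "disease", "disease"]
    print(knn_classify(spectra, labels, [0.5, 0.5]))  # control
    print(knn_classify(spectra, labels, [5.5, 5.5]))  # disease
```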
Supervised methods require a training data set in which the outcome (i.e., disease or healthy) is known and used to build a (hopefully) predictive model. After training, the model can be used on a test set to classify unknown samples and measure the predictive accuracy of the model. Supervised methods are very useful for detecting subtle differences between similar samples in order to identify potential biomarker candidates. Cross validation is used to test the robustness of putative biomarker candidates during the training process. Supervised techniques can be appropriate to force classification (such as in determining which metabolites distinguish between groups) or to regress a pattern against a trend (such as correlating a temporal progression with metabolic changes). Methods for supervised pattern recognition include partial least squares discriminant analysis (PLS-DA) and soft independent modeling of class analogies (SIMCA), which are extensively used tools for the classification of spectroscopic results. Other methods, including orthogonal signal correction (OSC), genetic programming and neural networks, are also used. OSC and PLS-DA have recently been combined to extend the power of supervised methods in metabolomics analyses, and this approach has quickly gained wide usage. In general, it is extremely important to validate the findings of PCA, PLS or other methods using extensive cross validation and, in particular, a second set of samples (preferably blinded and from a second location). Ultimately, biological validation, involving a disease hypothesis specifically related to the discovered biomarkers, will likely be required before acceptance by the medical and scientific communities can be anticipated.
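To make the train/test logic of cross validation concrete, the Python sketch below runs a k-fold loop around a deliberately simple nearest-centroid classifier. The classifier is a stand-in chosen to keep the example self-contained; it is not PLS-DA, SIMCA or OSC, and the function names, fold assignment and toy data are all assumptions of this example.

```python
import math


def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to `sample`."""
    centroids = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return min(centroids, key=lambda label: math.dist(centroids[label], sample))


def k_fold_accuracy(x, y, k=3):
    """Hold out every k-th sample in turn, train on the rest, and return
    the fraction of held-out samples that are classified correctly."""
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, len(x), k))
        train_x = [x[i] for i in range(len(x)) if i not in test_idx]
        train_y = [y[i] for i in range(len(x)) if i not in test_idx]
        for i in test_idx:
            if nearest_centroid_predict(train_x, train_y, x[i]) == y[i]:
                correct += 1
    return correct / len(x)


if __name__ == "__main__":
    spectra = [[0, 0], [5, 5], [0, 1], [5, 6], [1, 0], [6, 5]]
    labels = ["control", "disease", "control", "disease", "control", "disease"]
    print(k_fold_accuracy(spectra, labels, k=3))  # 1.0
```

The key point is that each sample is predicted by a model that never saw it during training. Cross validation within one cohort still does not replace the independent, preferably blinded, second sample set recommended above.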
Correlation methods, either within one spectroscopic method or used to combine NMR and MS data, have been shown to be effective in identifying metabolites based on NMR chemical shifts and m/z values [70,71]. Statistical total correlation spectroscopy (STOCSY) has been useful in grouping the multiple peaks that arise from a single metabolite, thus simplifying identification. Statistical heterospectroscopy (SHY) operates through the analysis of the intrinsic covariance between signal intensities in the same and related molecules measured by different techniques across cohorts of samples. Further, it is also possible to combine the results of PCA from NMR and MS. This is so because the principal components of NMR data can be treated as independent of those of MS data. Such combined analysis may be useful for larger data sets where 2D score plots are insufficient to differentiate the samples.
There has been an explosive growth in the application of NMR- and MS-based metabolomics driven by the potential for earlier disease detection, therapy monitoring, and ultimately for reaching the goal of personalized medicine. In particular, metabolomics studies have been focused on the identification of metabolites associated with a number of diseases including cancer, diabetes, inborn errors of metabolism and cardiovascular diseases. In general, these early studies are promising; however, validation studies are critically needed to confirm the identity and generality of the putative biomarkers. Validation studies normally involve several independent sample sets for the same disease with even representation of such factors as gender, age, ethnicity, co-morbidity from other diseases, and geographical origin.
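The idea behind correlation-based identification is that signals belonging to the same molecule rise and fall together across samples. A minimal Python sketch, assuming one row per sample and one column per spectral variable, computes the Pearson correlation of every variable with a chosen "driver" variable; the function names and toy intensities are illustrative and this is not the published implementation of any of the methods named above.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant variable carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def correlate_with_driver(data, driver_index):
    """Correlation of each column of `data` with the driver column."""
    columns = list(zip(*data))
    driver = columns[driver_index]
    return [pearson(driver, column) for column in columns]


if __name__ == "__main__":
    # Three samples, three variables; variable 1 tracks variable 0 exactly.
    intensities = [[1.0, 2.0, 5.0], [2.0, 4.0, 3.0], [3.0, 6.0, 4.0]]
    print([round(r, 2) for r in correlate_with_driver(intensities, 0)])
    # [1.0, 1.0, -0.5]
```

Variables with a correlation near 1 to the driver are candidates for peaks of the same metabolite. The same calculation applied between NMR variables and MS variables measured on the same samples gives the cross-platform covariance that heterospectroscopic approaches exploit.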
Cancer is typically detected radiographically, and often in late stage when therapy options are limited. Therefore, there is a high demand for alternative, earlier and chemically based detection modalities. NMR- and MS-based metabolomics tools have the potential for early diagnosis and even therapy management. Recent developments include the exploration of cancer biomarkers and disease pathways in both humans and animal cancer models (Table 2).
A number of investigations have been carried out to establish breast cancer biomarkers [73–78], with a majority of these focused on biomarker detection directly in breast cancer tumors. The correlation of multiple metabolites such as lactate, lipids, phosphocholine, choline and glycine with the cancer was observed using a variety of 1D and 2D 1H or 31P high resolution MAS NMR methods. Based on multivariate statistical analysis of the NMR data, tumor and non-involved tissues could be classified with a high specificity (100%) and sensitivity (82%). Differences in mammary epithelial cell lines, such as upregulation of fatty acid synthesis, have been detected using combined 2D NMR and GC-MS methods. Attempts to detect breast cancer from the analysis of exhaled breath have also been made. Volatile organic compounds were targeted in breath from women with abnormal mammograms and biopsies. In this study, cancer patients and controls were distinguished with a sensitivity and specificity of 94.1% and 73.8%, respectively.
A 1H NMR study of preoperative serum specimens has been carried out to detect epithelial ovarian cancer. NMR data were analyzed using PCA and SIMCA to classify the patients into ovarian cancer and non-cancer (benign ovarian cysts and healthy controls) subgroups. Statistical analysis distinguished cancer patients from benign and control samples with 97–100% accuracy. However, the putative biomarkers consisted of two non-specific signals emanating from the lipid region, and 3-hydroxybutyrate, which has been seen in a number of other metabolomics studies and may result from gut microfloral metabolism. In another study, multivariate analysis of GC-MS data from ovarian tumor and borderline tissue showed a classification accuracy of 88%. In this study, 51 metabolites were found to be significantly different between the two types of tissue.
Recently, NMR-based metabolomics has been applied to explore liver cancer biomarkers. Differentiation of both low-grade and high-grade tumors from adjacent non-involved tissue was obtained using 1H HRMAS NMR in combination with multivariate statistical analysis. Interestingly, apart from showing metabolic differences between cancer and non-cancer tissue, the analysis showed distinct metabolic differences between low-grade and high-grade tumors. Significant alterations in the levels of metabolites such as lactate, phosphorylethanolamine, phosphocholine, amino acids, triglycerides, glucose, and glycogen were detected.
Altered metabolic profiles in pancreatic cancer have been studied using plasma and tissue samples [82, 83]. 1H NMR analysis of plasma was based on the hypothesis that the reported altered insulin and glucose levels in pancreatic cancer result in an altered lipid profile in the blood. NMR spectra of the extracted plasma lipids were subjected to statistical analysis using PLS-DA. The sensitivity, specificity and overall accuracy of detecting pancreatic cancer were reported to be 96%, 88% and 92%, respectively, when 4 NMR spectral regions were used for the discrimination, and 98%, 94% and 96%, respectively, when 5 regions were used. Mass spectrometric analysis indicated a decrease of phospholipids in the cancer samples. These results are complementary to the metabolic profile recently derived from the 1H HRMAS studies of tissue from animal models of pancreatic cancer. In that study, phosphocholine and glycerophosphocholine were found to decrease in pancreatic cancer.
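Sensitivity, specificity and overall accuracy, as quoted throughout these studies, all derive from the four cells of a confusion matrix. The Python sketch below shows the arithmetic; the counts in the example are hypothetical, chosen only so that the ratios come out at percentages of a similar size to those quoted, and they are not the cohort sizes of any study discussed here.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: diseased samples classified as diseased
    fn: diseased samples classified as control
    tn: control samples classified as control
    fp: control samples classified as diseased
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Hypothetical counts: 50 patients and 50 controls.
    sens, spec, acc = diagnostic_metrics(tp=48, fn=2, tn=44, fp=6)
    print(f"{sens:.0%} {spec:.0%} {acc:.0%}")  # 96% 88% 92%
```

When the two groups are the same size, accuracy is simply the mean of sensitivity and specificity; with unbalanced groups it is weighted toward the larger one, which is one reason why sensitivity and specificity are usually reported alongside it.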
Metabolite characterization of cervical tumors has been demonstrated using 1H HRMAS NMR and multivariate statistical analysis. Malignant tissue of the cervix showed higher levels of cholines and amino acids compared to non-malignant tissue. Very recently, HRMAS NMR studies were performed to explore apoptosis in cervical cancer. The spectra were analysed for lipid and non-lipid metabolites using one pulse and spin-echo experiments, respectively. Significant correlations were found between the tumor cell fraction and glucose concentration, between tumor cell density and glycerophosphocholine concentration, and in the ratio of glycerophosphocholine to choline. The results suggest an application of the method to explore the role of apoptosis in the course of the disease.
Advancements in metabolite-based detection of lung cancer have been made over a number of years [86–91]. The majority of these investigations focus on the detection of volatile organic compounds in breath samples using various collection strategies and highly sensitive GC-MS detection. A large number of volatile organic compounds have been shown to distinguish between lung cancer and controls [86, 88–91]; however, achieving a diagnostic accuracy above 90% has been challenging. Lung cancer biomarker investigations have also been carried out using body fluids such as serum and urine [72,87]. From the GC-MS analysis of serum, higher concentrations of two aldehydes, hexanal and heptanal, have been shown to distinguish lung cancer patients from controls. Metabolite profiles in urine from xenograft mouse models of lung cancer have been explored using both NMR and the new DESI-MS technique. Urine from both cancerous and control mice was subjected to NMR and MS combined with multivariate statistical analysis. PCA of both the NMR and MS data identified a large number of differentiating metabolites, many of which were localized to the purine metabolism pathway.
A combined metabolomic and proteomic approach was employed to study a mouse model of prostate cancer . Multivariate analysis including O-PLS was applied to interpret the differences in plasma metabolomic and proteomic profiles. Correlations between a serotransferrin precursor and both tyrosine and 3-D-hydroxybutyrate, and between a decreased concentration of tyrosine and an increased presence of gelsolin were observed. Several metabolomic studies on prostate cancer focused on the analysis of tumors and seminal fluid [93–99] A combined 1H HRMAS NMR and quantitative histopathology study on same tumor specimens showed a linear correlation between the concentration of spermine measured by NMR and the volume percentage of normal prostatic epithelial cells quantified by histopathology . These findings highlight the role of NMR as an effective tool for investigating the inhibitory mechanism of spermine in humans. A combined in vivo and in vitro NMR, and histopathology study indicate the potential utility of translating ex vivo derived biomarkers for improved clinical interpretation of prostate cancer using in vivo NMR . In this study, healthy glandular tissue was discriminated from prostate cancer based on high citrate and polyamines, and low choline, phosphocholine and glycerophosphocholine. In addition, concentrations of taurine, myo-inositol, and scyllo-inositol were all higher in cancer compared to healthy glandular and stromal tissues. A computer model of tissue pathology based on metabolic profiles derived from HRMAS NMR was proposed for prostate cancer . The results of NMR and computer aided tissue analysis showed a linear correlation between them for both normal epithelium and prostate cancer. The diagnostic capability of NMR spectroscopy for predicting prostate cancer was tested using the multivariate analysis of HRMAS NMR metabolic and quantitative histopathology data of 199 tissue samples from 82 cancer patients . 
Recently, the concentrations of several prostate metabolites were determined using 1H HRMAS NMR and compared between normal and cancer tissue. Concentrations of phosphocholine/glycerophosphocholine, total choline, lactate, and alanine were higher in prostate cancer than in healthy glandular and healthy stromal tissues, while citrate and polyamine concentrations were significantly higher in healthy glandular tissues than in healthy stromal or prostate cancer tissues. A 1H NMR study utilizing seminal fluid indicated that citrate-based prostate cancer detection outperforms prostate specific antigen testing, and a more recent 1H NMR study highlights the use of myo-inositol and spermine, in addition to citrate, for detecting the presence of prostate cancer.
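The linear correlation described above between NMR-measured spermine and the volume percentage of normal epithelium can be illustrated with a Pearson coefficient. The following is only a minimal sketch: the function name `pearson_r` and all numerical values are assumptions made for illustration and are not taken from the cited studies.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of squares and cross-products about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

# Hypothetical values: epithelial volume (%) and spermine signal (arbitrary units).
epithelium_pct = [10, 25, 40, 55, 70]
spermine = [0.8, 1.9, 3.1, 4.2, 5.0]

r = pearson_r(epithelium_pct, spermine)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1 would be consistent with a linear relationship, although with so few points it says little about the strength of the association in a real cohort.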
The search for metabolite biomarkers of renal cancer in tissue samples began soon after HRMAS was developed. Subsequently, renal tumors have been extensively investigated employing the latest technological advancements in HRMAS NMR and multivariate analysis [101,102]. Conventional 1D NMR methods such as one pulse and spin-echo as well as 2D experiments such as J-resolved, TOCSY and 1H-13C HMQC have been utilized for the resonance assignments. Unsupervised and supervised multivariate statistical analyses of the HRMAS NMR data focused on classifying normal and renal carcinoma tissue, and showed a clear distinction based on NMR signal intensities. A linear discriminant analysis was used to classify normal and tumor tissues with 100% accuracy. Recently, metabolic profiles of normal cortex and medulla samples were compared with those of malignant tissue, also using HRMAS NMR. Compared to normal cells, renal carcinoma cells had lower organic osmolyte and higher lipid concentrations. In papillary renal cell carcinoma, on the other hand, the taurine concentration was higher and the lipid signals were absent. Kind et al. have evaluated three MS-based analytical methods to identify potential biomarkers for renal cancer. The combined approach gives good coverage of the urinary metabolites, several of which the authors believe may be useful for diagnosis.
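To make the idea of linear discriminant analysis on tissue spectra concrete, the sketch below fits a two-class Fisher discriminant to two features per sample. It is an illustration under stated assumptions, not the procedure of the cited work: the function names `fisher_lda_2d` and `classify`, the choice of two features (an osmolyte signal and a lipid signal), and every value are hypothetical.

```python
def fisher_lda_2d(class0, class1):
    """Fit a two-class Fisher discriminant for samples with two features.

    Returns (w, threshold); a sample x is assigned to class 1 when
    w[0]*x[0] + w[1]*x[1] > threshold.
    """
    def mean(rows):
        n = len(rows)
        return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

    def scatter(rows, m):
        # Within-class scatter: sum of outer products of deviations from the mean.
        s = [[0.0, 0.0], [0.0, 0.0]]
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s

    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Discriminant direction w = Sw^-1 (m1 - m0).
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Decision threshold at the midpoint of the projected class means.
    threshold = 0.5 * (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1]))
    return w, threshold

# Hypothetical [osmolyte, lipid] signal intensities in arbitrary units.
normal = [[8.0, 2.0], [7.5, 2.5], [8.5, 1.8], [7.8, 2.2]]
tumor = [[4.0, 6.0], [4.5, 5.5], [3.8, 6.3], [4.2, 5.8]]

w, threshold = fisher_lda_2d(normal, tumor)

def classify(x):
    """Return 0 (normal) or 1 (tumor) for a two-feature sample."""
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0

print([classify(x) for x in normal + tumor])
```

Classifying the same samples used for fitting, as done here, tends to overstate accuracy; a real evaluation would hold out samples or use cross-validation.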
Several recent studies have focused on the identification of differential metabolites in brain cancers [104–112]. Detailed assignments of the biochemical compounds in brain tumors have been made using a combination of in vivo and ex vivo analyses and employing several 1D and 2D NMR experiments. A large number of metabolites have been shown to differentiate brain tumor from normal tissue, and metabolic ratios are used to achieve the highest sensitivity. Because in vivo MR spectroscopy of the brain is readily amenable to clinical application, attempts have been made to establish the link between MR spectroscopy and neuropathological analysis utilizing the metabolic profile obtained using ex vivo high resolution NMR spectroscopy [104,106]. A recent study combined HRMAS NMR and micro scale genomics. It was shown that tissue samples as small as 2 mg could be successfully used for HRMAS experiments and that minute mRNA amounts yielded high-quality genomic data. This is one of the interesting new applications in which metabolomics and genomics have been combined. In this case, alterations in the expression of Kennedy pathway genes and dysregulation of the Sonic Hedgehog pathway were observed in the pathogenesis of cancer. Classification of patients by metastasis and prediction of survival in brain cancer patients were carried out by performing multivariate analysis of the HRMAS NMR data. While the PCA results clearly showed a trend in clustering due to the origin of the metastases, PLS analysis indicated distinct clustering of the spectra of the patients who died less than 5 months after surgery. Although further validation is needed, these results indicate the potential for clinical applications in the management of brain cancer patients.
Metabolomics is ideal for studying metabolic diseases, and has already been applied to both type 1 and type 2 diabetes utilizing a range of biological specimens including urine, serum/plasma and tissue (Table 3). 1H NMR analysis of urine has identified a number of significantly changing metabolites, including acetate, lactate, citrate, glycine, alanine, hippurate, trimethylamine-N-oxide, and dimethylamine [113,114]. Multivariate statistical analyses of the 1H NMR data from human, rat and mouse urine demonstrate metabolic similarities among the three species, including responses associated with type 2 diabetes involving glucose metabolism, the TCA cycle, and nucleotide and methylamine metabolism. Another recent study has used quantitative NMR-based metabolomics to correlate differentiating metabolites in induced diabetes in rats. Significant disturbances in several metabolic pathways including glucose metabolism and the TCA cycle, the alanine pathway, the Cori cycle, the acetate switch, and choline metabolism, as well as a contribution from gut microbial metabolism, were identified from the analysis. A dramatic loss in the correlation among the detected metabolites was observed. Studies on animal diabetic models aimed at understanding insulin resistance induced by a high fat diet have shown increased concentrations of lipids, lactate, pyruvate, glucose, fucose, phosphatidylcholine, trimethylamine N-oxide and methylamines in plasma. The effect of gut microbiota on the fatty liver phenotype in insulin-resistant mice has been studied, and a model linking impaired glucose homeostasis and nonalcoholic fatty liver disease (NAFLD) with reduced mammalian availability of choline was deduced using a number of biomarkers such as choline, phosphatidylcholine and methylamines.
1H NMR-based metabolomics was applied to assess diabetes-induced nephropathy, and the results indicate high positive as well as negative predictive values (89% and 83.6%, respectively), which are comparable to those derived from clinical biochemistry data (95.5% and 79.2%, respectively). NMR of other nuclei such as 13C and 31P has been extensively used to understand the pathogenesis of diabetes [120–125]. For example, 13C NMR was used to investigate metabolic responses to a dextrose challenge and to understand the role of reduced glycogen synthesis in muscle insulin resistance. Measurements of glycogen synthesis were made using 13C NMR to study the effect of insulin resistance on both type 1 and type 2 diabetes [122, 123]. 31P NMR was employed to measure the glucose transport/phosphorylation activity in situ. The results correlated well with the glycogen synthesis rate as measured by 13C NMR [124,125].
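The positive and negative predictive values quoted above follow directly from a two-by-two table of test results against true disease status: PPV = TP / (TP + FP) and NPV = TN / (TN + FN). The short sketch below shows the arithmetic; the function name `predictive_values` and the counts are hypothetical and are not the data behind the percentages reported in the cited study.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) as fractions from confusion-matrix counts.

    tp, fp: true and false positive calls; tn, fn: true and false negative calls.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one positive call and one negative call")
    ppv = tp / (tp + fp)  # fraction of positive calls that are truly diseased
    npv = tn / (tn + fn)  # fraction of negative calls that are truly healthy
    return ppv, npv

# Hypothetical counts chosen only to illustrate the calculation.
ppv, npv = predictive_values(tp=40, fp=10, tn=45, fn=5)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Both values depend on how common the condition is in the tested group, so figures obtained in one cohort may not carry over to a population with a different prevalence.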
Several investigations have demonstrated the application of metabolomics to diabetes using a combination of MS and multivariate statistics [126–129]. Plasma was analysed using LC-MS and the data subjected to PCA and PLS-DA, focusing on phospholipid metabolites. Type 2 diabetes patients could be distinguished from controls based on the differences in the spectral features. Utilizing the advanced UPLC-MS approach, nearly 10,000 ions in rat plasma have been detected, and the ability of such metabolic data to distinguish among three rat strains (obese, lean and lean/obese) was demonstrated using multivariate analysis. Nevertheless, the detection of so many features raises the question of how much validation is required. Targeted metabolic profiling using GC-MS has enabled detection of plasma fatty acids including non-esterified fatty acids (NEFA) and esterified fatty acids. This study detected a number of additional putative biomarkers, and allowed a comprehensive understanding of the role of NEFA and the effect of treatment with thiazolidinediones.
Finally, interest in combining NMR and MS methods for the study of diabetes is growing [130–132]. The metabolic regulatory mechanisms in diabetes were investigated by obtaining metabolic profiles in plasma from normal Wistar-derived and Zucker (fa/fa) obese rats using multiple analytical platforms including NMR, UPLC-MS and GC-MS. PCA of the data readily detected the differences in the metabolite profiles between the two rat strains. For example, a number of biomarkers including cholesterol, arachidonic acid, oleic acid, hexadecanoic acid, monooleoylglycerol and low and very low density lipoproteins were higher in the Zucker rats. Another recent study combining 1H HRMAS NMR, GC-MS and LC-MS examined the metabolic perturbations in type 2 diabetes and obesity. Both PCA and PLS analysis showed dramatic alterations in the levels of several metabolites such as glucose, glutamine, alanine and lactate, and indicated perturbations in glycolysis, the TCA cycle, and gluconeogenesis.
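Many of the studies above use PCA to show that two groups of samples separate along the first principal component. The sketch below illustrates this on a small samples-by-metabolites table. It is a simplified illustration rather than the workflow of any cited study: the function name `first_principal_component` is hypothetical, the intensities are invented, and power iteration on the covariance matrix stands in for the chemometrics software normally used.

```python
def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for PC1 of mean-centered data.

    rows: list of samples, each a list of metabolite intensities.
    PC1 is found by power iteration on the sample covariance matrix.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p  # starting vector for the iteration
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Scores are the projections of the centered samples onto PC1.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores

# Invented intensities for three metabolites (columns); the first is
# elevated in the "disease" group. Not data from any cited study.
control = [[5.0, 1.0, 2.0], [5.2, 1.1, 1.9], [4.8, 0.9, 2.1], [5.1, 1.0, 2.0]]
disease = [[9.8, 1.0, 2.1], [10.1, 1.1, 2.0], [10.3, 0.9, 1.9], [9.9, 1.0, 2.0]]

loadings, scores = first_principal_component(control + disease)
for label, s in zip(["control"] * 4 + ["disease"] * 4, scores):
    print(f"{label}: PC1 score = {s:+.2f}")
```

In this toy table the two groups fall on opposite sides of zero along PC1, and the loadings indicate that the first metabolite drives the separation. Real data would usually be scaled before PCA, and any separation seen would need independent validation.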
Inborn errors of metabolism (IEM) form a large and diverse group of diseases. A majority of these involve single genetic defects that affect a specific enzyme. Individually, IEM are rare; however, their collective incidence is relatively high and lies in the range of 1 per 1,400 to 1 per 5,000 live births in the United States. In most of the disorders, problems arise due to the accumulation of metabolites that are toxic or interfere with normal function. Often, IEM are difficult to diagnose since clinical signs and symptoms overlap among the different diseases. Biochemical tests are often nonspecific and gene analysis is not always conclusive. Moreover, for several metabolic disorders, comprehensive analytical techniques have not been established, making such diseases difficult to diagnose. For example, N-acetylated metabolites in urine that are involved in several IEM are not easily detectable. Currently, GC/MS or tandem MS is used to detect up to 80 different IEM, although these analyses are typically based on the detection of single metabolite biomarkers. Metabolomics may have a role to play in changing the detection of at least some of these IEM. NMR spectroscopy is very useful for the diagnosis of a number of IEM [134–138], and has been successfully combined with MS using a metabolomics approach. Metabolomics appears to be particularly promising for identifying additional potential IEM biomarkers, and may prove useful for detecting borderline cases, or for sub-classifying the diseases.
Coronary heart disease (CHD) provides another excellent target for metabolomics diagnostic development. Angiography, a current diagnostic modality for CHD, is both expensive and invasive. Hence, advancement in metabolomics-based assessment of CHD is highly desirable. In view of this opportunity, lipid metabolites have been assessed for some time using NMR spectroscopy to understand the risk of CHD [141,142]. The diagnostic utility of the NMR-based metabolomics approach was assessed using multivariate statistical analysis of serum from individuals with no evidence of stenosis (normal coronary arteries, NCA) or with severe CHD, defined as at least 50% stenosis (triple vessel disease, TVD). Disease and control subjects were distinctly separated by analysis using PLS-DA and OSC. Classification was achieved mainly from subtle differences in the lipid signals of the NMR spectra between the two groups. These findings were in conformity with the results of independent studies made on individuals with and without CHD. Subsequently, NMR spectroscopy was evaluated as a diagnostic method on patients with hypertension, and the results clearly distinguished low/normal systolic blood pressure (SBP) serum samples from borderline and high SBP samples. However, NMR-based diagnosis of CHD depends mainly on the major lipid regions of the 1H NMR spectra, and many variables such as diet, gender, lifestyle and drugs affect lipid composition; failing to take such confounding variables into account can lead to false conclusions. A study was performed to determine the predictive power of the NMR-based method in groups of male patients. This study achieved a mere 36.2% and 6.2% predictive accuracy at the 99% confidence level for untreated and treated groups, respectively, indicating that the NMR-based metabolomics method still lacks sufficient diagnostic accuracy.
A very recent review presents an evaluation of the metabolomics approach with emphasis on CHD risk assessment and diagnosis using 1H NMR of plasma. It highlights the potential utility of combining in vitro 1H NMR-based metabolomics and in vivo multicontrast magnetic resonance imaging for early diagnosis and multiphase risk assessment of atherothrombosis.
In metabolomics, multi-component measurements are primarily focused on the information-rich analytical techniques of MS and NMR spectroscopy. Analysis of the complex data from these methods using advanced multivariate statistics provides a powerful platform for diagnostic applications, as well as for translational and clinical research. During the past several years, numerous developments have taken place in the field, including a variety of advances in the analytical methods for high throughput measurements, improved statistical approaches for classifying samples based on subtle changes, and applications in the area of disease diagnosis and toxicity assessment.
Today, MS methods provide high sensitivity; however, reproducibility is still a concern, as is the unique identification of unknown and interesting metabolites. NMR methods allow the identification and quantification of metabolites down to µM limits, facilitated by cryogenic probes, micro-coil NMR and isotope labeling; however, spectral complexity is still a problem due to the high degree of signal overlap. Numerous multivariate statistical methods are readily available due to their incorporation into user-friendly software packages. These advancements, combined with the recently developed databases of human metabolites and metabolic information and the vast body of metabolic pathway information currently available, greatly benefit the metabolomics field.
A majority of metabolomics studies have focused on using a single analytical method, either NMR spectroscopy or MS. Given the complexity of biological systems, it would be more prudent to exploit both methods in parallel, at least in the developmental stages of the field, to derive more meaningful information on metabolic variations in health and disease. The high reproducibility of NMR and the high sensitivity of MS provide both supplementary and complementary data important for biomarker identification and validation. Combined multivariate analysis of data from NMR and MS is expected to provide more useful information than either approach alone.
There has been explosive growth in both NMR- and MS-based metabolomics studies and applications, with a number of these being applied to assess important diseases. Metabolomics-based methods will likely have significant clinical diagnostic utility for numerous inborn errors of metabolism because such diseases generally exhibit massive metabolic disturbances. Risk assessment of cardiovascular diseases using NMR-based metabolomics is promising because the abundant lipoproteins can be easily detected by NMR, thus avoiding the often tedious separation procedures. In fact, NMR spectroscopy is already being applied for individual risk assessments. The addition of MS-based metabolomics approaches might well improve the diagnostic accuracy and utility.
Currently, metabolomics applications to diseases such as cancer and diabetes have provided better insights into the altered metabolic pathways and the disease pathogenesis. However, for applications in early diagnosis the technology is still in its evolutionary stages. Factors such as diet, age, gender, lifestyle, drugs and environment contribute immensely to human bio-complexity, and identification of subtle metabolic variations associated with early cancer or diabetes pathogenesis is a great challenge. Further studies focused on the deconvolution of such confounding effects are required. In general, NMR detects relatively highly concentrated metabolites, and it is generally thought to lack sufficient sensitivity to detect the low-concentration, more specific early biomarkers. Although the latest technological advancements have demonstrated a dramatic reduction in the detection limit of NMR using pure substances, such methods are still not considered high throughput for routine analysis of biological samples. Combining the latest advancements in NMR methods with targeted metabolic profiling using sensitivity-enhanced approaches such as isotope labeling may achieve a breakthrough in biomarker detection.
While MS is intrinsically highly sensitive and capable of detecting early biomarkers, problems with reproducibility arising from the chromatography of biofluids and factors such as matrix effects and/or ion suppression still present considerable challenges. However, in light of recent methodological improvements in both NMR and MS, as well as in multivariate statistical methods, it can be envisaged that metabolomics will emerge as a sensitive and convenient approach for early disease diagnosis. The five year outlook for metabolomics is very strong given the rapid developments in the field, as well as the important lessons learned from early studies in the genomics and proteomics fields.
Papers of special note have been highlighted as:
• of interest
•• of considerable interest