Metabolomics, a methodology for measuring small-molecule metabolite profiles and fluxes in biological matrices following genetic modification or exogenous challenges, has become an important component of systems biology, complementing genomics, transcriptomics and proteomics (Nicholson and Wilson, 2003; Fernie et al., 2004; Griffin, 2006). Because of the comprehensive nature of metabolite measurement and the capacity to detect subtle changes in a large dataset, metabolomics has found broad application. Examples include the identification of biomarkers and the unravelling of pathophysiological mechanisms in many scientific fields, such as plant biology (Schauer and Fernie, 2006), microbiology (van der Werf et al., 2005), toxicology (Nicholson et al., 2002) and disease diagnosis (Brindle et al., 2002). While metabolomics is often used to refer to studies of endogenous metabolites, in our view this term should also be extended to the environmental chemicals and drugs to which we are exposed. The human metabolome, for example, is many times bigger than the 2,180 or so calculated endogenous chemicals that are purely the products of the genome, transcriptome and proteome in isolation (Wishart et al., 2007), since food, medicines, bacterial metabolites (gut flora), and environmental chemicals (diet, air, pollution) furnish a far greater proportion of this metabolite pool. For example, the food mutagen PhIP and its metabolites (see below), which are not purely of endogenous origin, are surely present in the tissues of any person who consumes cooked, as opposed to boiled, food. Therefore, it will not be possible in the immediate future to distinguish pure endobiotics from pure xenobiotics and from xenobiotics that have been modified by the proteome. In order not to involve ourselves in futile labeling of constituents of the metabolome, we take the word “metabolomics” to cover all small-molecule constituents of biological fluids and tissues that can be identified and quantitated, irrespective of their origin. Broadening the coverage of metabolomics in this way has a clear advantage over an endogenous compound-only approach: it resolves the dilemma of deciding whether a given chemical is a component of the metabolome. Examples include acetaminophen glucuronide, which derives approximately half of its mass from the exogenous parent drug and the other half (the glucuronic acid moiety) from endogenous glucose metabolism, and the life-essential sodium ions present in body fluids, which are purely xenobiotic.
The most widely used analytical instruments in metabolomic research are nuclear magnetic resonance (NMR) spectrometers and mass spectrometers (MS). Other techniques, such as electrochemistry or infrared spectroscopy, have also been adopted, but their application is limited by the lack of detailed structural information that they provide (Kristal et al., 2007). The pros and cons of using NMR or MS in metabolomic research have been discussed extensively in several recent reviews (Schlotterbeck et al., 2006; Pan and Raftery, 2007). Overall, NMR has advantages such as the non-destructive nature of sample preparation and the comprehensive coverage of chemical species, while MS offers much better sensitivity and resolution as well as high-throughput capacity.
Attempts to improve the limiting sensitivity of ¹H NMR for biological samples into the nanomolar range have included the application of extremely high static magnetic field strengths, up to 21.6 Tesla (920 MHz ¹H spectrometer; Hashi et al., 2002), and the use of a probe that is cryogenically cooled to 4.5 K in order to increase signal-to-noise ratios (Kikuchi et al., 2004). However, the metabolomic implementation of such cutting-edge instrumentation is still awaited. Another exciting application of NMR in metabolomic research has been the use of so-called heteronuclear NMR methodologies, which are particularly pertinent to plant metabolomics. Here, plants grown exclusively on ¹³C- and ¹⁵N-labeled nutrient sources have been investigated using two-dimensional heteronuclear single quantum correlation spectroscopy (2D-HSQC NMR). ¹³C and ¹⁵N 2D-HSQC have furnished valuable insights into the effects of xenobiotic stresses on the systems biology of both wild-type and mutant Arabidopsis thaliana plants (Kikuchi et al., 2004). Such technologies, however, have so far found limited application in mammalian drug metabolism studies.
Although many seminal metabolomic studies were conducted using NMR, an increasing number of studies based on MS technology have been published recently, owing to the abovementioned advantages and the wide availability of MS instruments (Want et al., 2007). Prepared biological samples can be introduced into a mass spectrometer by direct injection, gas chromatography (GC), liquid chromatography (LC) or capillary electrophoresis (Dettmer et al., 2007). Among these, LC-MS is the most widely used platform, since LC-based sample introduction results in less ion suppression and higher resolution than direct infusion, and it avoids the chemical derivatization that is generally required for GC-MS. Therefore, this review will focus on LC-MS-based metabolomic techniques and their application to xenobiotic metabolism.
Traditionally, HPLC has been the mainstay of LC instrumentation. The recent development of ultra-performance liquid chromatography (UPLC), using 1–2 micron-size particles, brings much higher resolution for analyte separation and a lower limit of detection for ions (Wilson et al., 2005). MS instruments with high-resolution mass measurement, such as time-of-flight (TOF) or Fourier transform (FT) mass spectrometers, are preferred, since accurate masses of xenobiotic metabolites are not only the prerequisite for peak identification and collection across samples, but also readily furnish empirical formulae, and thus candidate chemical structures. Furthermore, the sensitivity of metabolite detection can be improved by optimizing the gradient or composition of the mobile phase to improve ion spray and ionization. It should also be noted that LC-MS measurement in the absence of authentic standards is non-quantitative for either major or minor metabolites. However, it can provide an estimate of the relative abundances of identified metabolites if the metabolites are ionized with comparable efficiency and elute at positions in the chromatogram that are subject to similar ion suppression effects. Overall, high-resolution and reproducible LC-MS measurement lays the foundation for subsequent data processing and multivariate data analysis (MDA).
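To illustrate how an accurate mass can narrow down candidate empirical formulae, the following minimal sketch computes monoisotopic masses and the parts-per-million (ppm) error against a measured m/z. The candidate list, the measured value, the 5 ppm tolerance and the assumption of a protonated ion ([M+H]+) are illustrative choices, not values taken from the studies cited above.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes of C, H, N and O.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646  # proton mass (u), added to model a singly protonated ion [M+H]+


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic masses for a simple formula such as 'C8H9NO2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def matching_formulae(measured_mz, candidates, tol_ppm=5.0):
    """Return candidates whose [M+H]+ lies within tol_ppm of measured_mz."""
    hits = []
    for formula in candidates:
        theoretical = monoisotopic_mass(formula) + PROTON
        error = ppm_error(measured_mz, theoretical)
        if abs(error) <= tol_ppm:
            hits.append((formula, round(theoretical, 4), round(error, 1)))
    return hits


# Hypothetical candidates: acetaminophen, acetaminophen glucuronide,
# and an unrelated formula. The measured m/z is made up for illustration.
candidates = ["C8H9NO2", "C14H17NO8", "C9H13NO"]
hits = matching_formulae(328.1030, candidates)
print(hits)
```

In practice the candidate list would come from a formula generator or a database search, and the tolerance would be set by the mass accuracy of the instrument.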
Appropriate data processing is required to prepare chromatographic and spectral data for MDA (Katajamaa and Oresic, 2007). General procedures include data condensation and reduction by centroiding and deisotoping mass spectra; chromatographic alignment to reduce the variation in retention time; filtering to remove noise or background signals; and peak recognition and collection by setting threshold windows for mass (m/z) and retention time (RT). To decrease the influence of systematic and sample biases (such as the degree of urine dilution), MS data should also be normalized by either the parameters of the complete dataset (such as total ion count or median ion count) or the intensities of single or multiple internal standards (such as creatinine in the case of urine; Sysi-Aho et al., 2007). Consequently, a multivariate dataset containing information about sample identities, ion identities (RT and m/z values) and relative ion intensities can be generated (Figure 1). Processed datasets can be used directly for MDA, or be further statistically transformed and scaled according to the properties of the data and the purpose of the analysis. To identify latent components or principal components (PCs) in a complex dataset, data are projected onto a new coordinate system based on a pattern recognition algorithm or MDA method (Schlotterbeck et al., 2006). Thereafter, a model containing one or multiple PCs can be established to represent a large portion of the examined dataset. In contrast to other statistical techniques, such as the t-test and ANOVA, an established MDA model and its PCs can be presented in the scores plot, in which sample-PC and sample-sample relationships can be visualized. In LC-MS-based metabolomics, the spatial distance between two samples in the scores plot reflects their differences in chemical composition. When clear sample clustering is observed in the scores plot, the contribution of individual ions to PCs and to the group separation can be further examined in the loadings plot, in which the relationships between ions and PCs are depicted. With appropriate MDA modeling, ions contributing to the sample separation can be detected in the loadings plot and be further characterized.
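The peak collection and normalization steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: peaks are grouped on a fixed grid of m/z and RT windows (real software uses more careful alignment), the window widths are arbitrary, and the two "urine" peak lists are invented so that one sample is exactly twice as dilute as the other.

```python
from collections import defaultdict


def bin_key(mz, rt, mz_window=0.01, rt_window=0.1):
    """Assign a peak to a grid cell defined by m/z and RT windows (assumed widths)."""
    return (round(mz / mz_window), round(rt / rt_window))


def build_matrix(samples, mz_window=0.01, rt_window=0.1):
    """Collect peaks across samples into a samples-by-ions intensity table.

    samples maps a sample name to a list of (mz, rt, intensity) tuples.
    Returns the sorted ion keys and a dict of intensity rows per sample.
    """
    table = defaultdict(dict)
    for name, peaks in samples.items():
        for mz, rt, intensity in peaks:
            key = bin_key(mz, rt, mz_window, rt_window)
            table[key][name] = table[key].get(name, 0.0) + intensity
    ions = sorted(table)
    matrix = {name: [table[ion].get(name, 0.0) for ion in ions] for name in samples}
    return ions, matrix


def normalize_tic(matrix):
    """Scale each sample so that its intensities sum to 1 (total ion count)."""
    normalized = {}
    for name, row in matrix.items():
        total = sum(row)
        normalized[name] = [x / total if total else 0.0 for x in row]
    return normalized


# Invented peak lists: urine_B is a twofold dilution of urine_A.
samples = {
    "urine_A": [(152.0706, 2.31, 4000.0), (328.1027, 3.42, 6000.0)],
    "urine_B": [(152.0709, 2.33, 2000.0), (328.1030, 3.44, 3000.0)],
}
ions, matrix = build_matrix(samples)
normalized = normalize_tic(matrix)
print(ions)
print(normalized)
```

After normalization the two samples have the same relative ion intensities, which is the intended effect of correcting for dilution. Normalizing to an internal standard instead would divide each row by the intensity of the chosen reference ion.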
Figure 1. Procedures of LC-MS-based metabolomics. Chromatographic and spectral data are acquired by high-resolution LC-MS. Subsequent data processing, such as centroiding, deisotoping, filtering, and peak recognition, yields a data matrix containing information on ...
Two major categories of MDA methods, unsupervised and supervised, have been widely used in metabolomic data analysis. In unsupervised MDA, sample classification is unknown or intentionally withheld from the analytical software, while in supervised MDA this information is provided to the software for the purpose of model construction. The most popular unsupervised method is principal components analysis (PCA). Because PCA is blind to sample classification, the markers identified in a robust PCA model can usually be validated. Supervised MDA encompasses many methods, including partial least squares (PLS), orthogonal partial least squares (OPLS), soft independent modeling of class analogy (SIMCA) and partial least squares-discriminant analysis (PLS-DA) (Trygg et al., 2007). The selection of a supervised MDA method is determined by the properties of the data and the aim of the analysis.
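As a rough illustration of how PCA yields scores (one value per sample) and loadings (one value per ion), the sketch below extracts the first principal component by power iteration. Power iteration is used here only to keep the example free of dependencies; metabolomics software typically relies on other algorithms, and the small intensity table is invented for illustration.

```python
import math


def mean_center(X):
    """Subtract the column (ion) mean from every entry."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def first_pc(X, iterations=200):
    """Return (loadings, scores) of the first principal component.

    X is a samples-by-ions list of lists. Assumes the data have non-zero
    variance and that the starting direction is not orthogonal to PC1.
    """
    Xc = mean_center(X)
    p = len(Xc[0])
    w = [1.0 / math.sqrt(p)] * p  # starting direction
    for _ in range(iterations):
        t = [sum(x * wi for x, wi in zip(row, w)) for row in Xc]  # t = Xc w
        w = [sum(t[i] * Xc[i][j] for i in range(len(Xc))) for j in range(p)]  # Xc^T t
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    scores = [sum(x * wi for x, wi in zip(row, w)) for row in Xc]
    return w, scores


# Invented relative ion intensities: rows are samples, columns are ions.
# The third ion is elevated in the two "treated" samples.
names = ["control_1", "control_2", "treated_1", "treated_2"]
X = [
    [0.50, 0.30, 0.01],
    [0.52, 0.29, 0.02],
    [0.40, 0.30, 0.20],
    [0.41, 0.28, 0.21],
]
loadings, scores = first_pc(X)
for name, score in zip(names, scores):
    print(name, round(score, 3))
print("loadings:", [round(v, 3) for v in loadings])
```

The control and treated samples fall on opposite sides of zero in the scores, and the ion with the largest absolute loading is the one that differs most between the groups. The sample labels are never passed to `first_pc`, which is what makes the analysis unsupervised.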