An industry panel—Steven Fischer (Agilent Technologies), Suma Ramagiri (AB SCIEX), John Ryals (Metabolon), Mark Sanders (Thermo Fisher Scientific), John Shockcor (Waters Corporation), and Joe Shambaugh (Genedata)—was asked to discuss the benefits, drawbacks, and areas of future development for targeted versus global untargeted profiling as tools for metabolic phenotyping.
In general, mastering the tools of chromatographic separation takes precedence over metabolite identification. Normal-phase and reversed-phase chromatography are complementary for global small-molecule separation and identification.36
Supercritical fluid chromatography, for example, may provide a new modality for future global lipid analysis.37
Derivatization can aid targeted LC/MS/MS analysis, as used in amino acid38,39 and acyl carnitine analysis.40,41
A targeted quantitative approach, using GC/MS and LC/MS/MS, is the best first approach for any metabolomics/lipidomics problem. This should be followed by a global profiling paradigm, first aimed at getting the best possible exact MS data, in particular with retention time locked databases, and subsequently re-run to obtain MS/MS data to aid database searching. The principal challenges in global profiling are the creative use of algorithms for separating peaks from noise, optimal data mining paradigms and databases, and, for biofluids, determining the source of the metabolites identified.42–47
Figure 7. Overview of metabolomic data generation and data analysis. The flowchart used for metabolite extraction, data mining, and metabolite identification is detailed. This illustrates sample preparation, mass spectrometric analysis, peak extraction/identification ...
Metabolite biomarkers include those synthesized in vivo and those derived from exogenous sources, including the microbiome. The consensus was that humans are capable of synthesizing roughly 2500 compounds. As reviewed in Dunn et al., 2,000–7,000 metabolic features can be detected in a serum or plasma sample. A single metabolite can be detected as different ion types: for example, as protonated and deprotonated ions, adduct ions, isotopomers, fragment ions, dimers, and trimers. Therefore, the large number of metabolic features detected corresponds to a smaller number of actual metabolites.49
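To make this redundancy concrete, the following minimal Python sketch computes the m/z values at which a single neutral metabolite would be expected to appear for a few common ion types. The selection of ion types, the function name `expected_mz`, and the use of glucose as the example are illustrative assumptions and do not come from the panel discussion; the mass constants are commonly tabulated approximate values.

```python
from dataclasses import dataclass

# Approximate, commonly tabulated masses in Da (assumed values for illustration).
PROTON_MASS = 1.007276       # mass of H+ (hydrogen atom minus one electron)
SODIUM_ION_MASS = 22.989218  # mass of Na+ (sodium atom minus one electron)


@dataclass(frozen=True)
class IonType:
    name: str          # conventional label, e.g. "[M+H]+"
    n_molecules: int   # copies of the neutral molecule M in the ion
    mass_shift: float  # mass added to (or removed from) n_molecules * M, in Da
    charge: int        # absolute charge state


# A small, hypothetical selection of ion types; real data contain many more.
ION_TYPES = [
    IonType("[M+H]+", 1, PROTON_MASS, 1),      # protonated ion
    IonType("[M-H]-", 1, -PROTON_MASS, 1),     # deprotonated ion
    IonType("[M+Na]+", 1, SODIUM_ION_MASS, 1), # sodium adduct
    IonType("[2M+H]+", 2, PROTON_MASS, 1),     # protonated dimer
]


def expected_mz(neutral_mass):
    """Return a dict mapping each ion label to its expected m/z."""
    return {
        ion.name: (ion.n_molecules * neutral_mass + ion.mass_shift) / ion.charge
        for ion in ION_TYPES
    }


if __name__ == "__main__":
    glucose = 180.06339  # monoisotopic mass of C6H12O6, in Da
    for label, mz in expected_mz(glucose).items():
        print(f"{label:8s} {mz:.4f}")
```

Even this short list yields four distinct features for one compound, which is why feature counts overstate the number of metabolites actually present.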
Humans may contain more molecules than they are able to directly synthesize, due to microbiome metabolism, drugs, or dietary supplements. Differences in the number of compounds found in human plasma at different facilities stem, in part, from whether pooled human samples or individual test subjects were used, as well as from the particular MS platform used. Pooled plasma samples have as many as 2000 compounds (Fischer, private communication), while individual subjects have at least 500–600 compounds.50
The dataset derived from untargeted mass spectrum analysis may be very noisy: on unit mass and/or accurate mass instruments, noise can account for ~80% of the total data collected.51
Optimal peak identification/separation of sample peaks from chemical noise, and clustering of their GC/MS and LC/MS data before library search for metabolite identification, is facilitated by software packages such as Mass Profiler Professional, Thermo Scientific Sieve,52 Genedata Expressionist for Mass Spec,53 Transomics, and XCMS.45
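As a rough illustration of separating sample peaks from background, the sketch below keeps only peaks whose intensity exceeds a multiple of the median intensity. It is a deliberately simple stand-in and is not the algorithm used by any of the packages named above; the function name `filter_peaks`, the factor of 3, and the example intensities are assumptions. The median is used as the noise estimate on the premise, stated above, that most recorded data points are noise.

```python
from statistics import median


def filter_peaks(peaks, snr_factor=3.0):
    """Keep peaks whose intensity is at least snr_factor times the median.

    peaks: list of (mz, intensity) tuples.
    snr_factor: hypothetical threshold multiplier, chosen for illustration.
    """
    if not peaks:
        return []
    # When most points are noise, the median intensity approximates the
    # noise level and is not dominated by a few strong sample peaks.
    noise_level = median([intensity for _, intensity in peaks])
    threshold = snr_factor * noise_level
    return [(mz, intensity) for mz, intensity in peaks if intensity >= threshold]


if __name__ == "__main__":
    # Invented example: six low-intensity noise points and two sample peaks.
    example = [
        (100.0, 10), (101.0, 12), (102.0, 11), (103.0, 9),
        (181.07, 500), (104.0, 13), (105.0, 10), (203.05, 800),
    ]
    print(filter_peaks(example))
```

Production feature extractors typically also use chromatographic peak shape and isotope patterns, which this sketch ignores.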
The current data mining paradigm involves extracting data using a naive feature extractor and performing compound identification on the reconstructed spectra. Untargeted mass spectrum analysis is facilitated by assembly of a database composed of a large number of library standards. Each standard entry can have a number of features, such as a retention time index, MS spectra, and MS/MS fragmentation spectra, obtained at different collision energies. Retention time libraries can be machine- and column-specific, as different machines have different sensitivities, and some problems requiring nano-UPLC will necessarily have a different retention time library than standard UPLC. As mentioned, due to the redundancy of the ion spectra, each library entry may have ~10 or more features, as each molecular standard can be associated with 5–10 ion features.48,51,54
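One way to picture such a library entry and its use is sketched below in Python. The class `LibraryEntry`, its field names, the tolerances, and the example values are all hypothetical and do not describe the schema of any database discussed here; the sketch only shows how an observed feature might be matched on both m/z (within a ppm tolerance) and retention time.

```python
from dataclasses import dataclass, field


@dataclass
class LibraryEntry:
    """Hypothetical record for one library standard."""
    name: str
    precursor_mz: float
    retention_time_min: float
    # MS/MS spectra keyed by collision energy in eV; each spectrum is a
    # list of (mz, intensity) tuples.
    msms_by_collision_energy: dict = field(default_factory=dict)


def ppm_error(observed_mz, reference_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def match_feature(observed_mz, observed_rt, library, ppm_tol=10.0, rt_tol_min=0.2):
    """Return entries agreeing with the feature in both m/z and retention time.

    The default tolerances are illustrative assumptions; appropriate values
    depend on the instrument and the chromatographic method.
    """
    return [
        entry for entry in library
        if abs(ppm_error(observed_mz, entry.precursor_mz)) <= ppm_tol
        and abs(observed_rt - entry.retention_time_min) <= rt_tol_min
    ]


if __name__ == "__main__":
    # Two invented isobaric standards that differ only in retention time.
    library = [
        LibraryEntry("standard_A", 181.0707, 1.50),
        LibraryEntry("standard_B", 181.0707, 5.00),
    ]
    hits = match_feature(181.0710, 1.55, library)
    print([entry.name for entry in hits])
```

Because retention times are machine- and column-specific, a record like this is only meaningful together with the chromatographic method under which it was acquired.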
The current Agilent-METLIN database and MS/MS library contains ~45,000 compounds, with ~9,000 compounds having MS/MS spectra.44
METLIN data has been acquired using a collision cell shared by triple quadrupole and qTOF machines. MS/MS spectra are collected in both positive- and negative-ion mode and at 10, 20, and 40 eV collision energies. Those spectra that have at least one ion with ~1000 counts of signal are retained for entry into the MS/MS library. The spectra are edited to only include ion signals coming from the standard, and the reported mass is corrected to its theoretical mass.44
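The retention rule described above can be sketched in a few lines of Python. This is an illustration of the stated criterion only, not METLIN's curation code: the function names, the data layout, and the treatment of "~1000 counts" as a hard threshold of 1000 are assumptions, and the manual editing and mass-correction steps are not modeled.

```python
MIN_COUNTS = 1000  # assumed hard cutoff standing in for "~1000 counts"


def has_sufficient_signal(peaks, min_counts=MIN_COUNTS):
    """True if at least one ion in the spectrum reaches min_counts."""
    return any(counts >= min_counts for _, counts in peaks)


def select_spectra(spectra, min_counts=MIN_COUNTS):
    """Keep only spectra that pass the signal criterion.

    spectra: dict keyed by (polarity, collision_energy_ev), where each value
    is a list of (mz, counts) tuples.
    """
    return {
        key: peaks for key, peaks in spectra.items()
        if has_sufficient_signal(peaks, min_counts)
    }


if __name__ == "__main__":
    # Invented spectra for one standard at the three collision energies.
    acquired = {
        ("positive", 10): [(181.07, 5200), (163.06, 800)],
        ("positive", 20): [(163.06, 1500), (145.05, 900)],
        ("positive", 40): [(85.03, 400), (69.03, 350)],
    }
    print(sorted(select_spectra(acquired)))
```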
GC/MS metabolite identifications are facilitated by well-defined MS conditions and libraries, as reviewed by Kind and Fiehn,55 and METLIN, Mass Frontier, and m/z Cloud are establishing databases that together cover a wide variety of MS platforms.
The loose fit of MSn spectra with the METLIN database suggests that MS and MSn spectra generated on LTQ-Orbitrap machines are best identified by Mass Frontier.55
The larger the database, the better it works, and the m/z Cloud community-based effort aims to establish a comprehensive library of high quality spectral trees to improve the structural elucidation of unknowns by identifying compounds even when they are not present in the library, using spectral tree searches. For example, individual MSn spectra can be searched against the m/z Cloud library to retrieve structural or substructural hits. The challenge is reassembly, which can be expert-motivated and have input from correlations with other metabolites to assemble the puzzle.55
Lipidomic database searches benefit from the LipidMaps initiative,56 which has resulted in dedicated commercially available in silico lipid databases such as LipidView (AB SCIEX), SimLipid (Biosoft), or Lipid Search (MKI), enabling one to uniquely identify over 20,000 lipid species using characteristic lipid fragments.50
The use of pathways as a means to interpret metabolomics data acquired using non-targeted data acquisition strategies opens up a different approach to data mining. By using pathways for biological interpretation, the researcher has defined the metabolites in the pathway(s) as a target compound list. The identified target list then can be used for statistical analysis57 rather than just analyzing features. This compound list can be used as the template for further mining the pathway(s) using targeted identification and data extraction.
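A minimal Python sketch of this idea follows: the metabolites of the selected pathways become the target list, and the identified compounds are restricted to that list before statistical analysis. The function names, the dictionary layout, and the abundance values are hypothetical; only a few glycolysis intermediates are listed, for illustration.

```python
def target_list(pathways, selected):
    """Union of the metabolites belonging to the selected pathways."""
    targets = set()
    for pathway_name in selected:
        targets.update(pathways[pathway_name])
    return targets


def restrict_to_targets(identified, targets):
    """Keep only identified compounds that appear in the target list.

    identified: dict mapping compound name to its abundance values
    across samples.
    """
    return {
        compound: abundances
        for compound, abundances in identified.items()
        if compound in targets
    }


if __name__ == "__main__":
    # Partial, illustrative pathway definition.
    pathways = {
        "glycolysis": {"glucose", "glucose 6-phosphate",
                       "fructose 6-phosphate", "pyruvate"},
    }
    # Invented abundances for three identified compounds in three samples.
    identified = {
        "glucose": [1.0, 1.2, 0.9],
        "pyruvate": [0.4, 0.5, 0.3],
        "caffeine": [2.1, 0.0, 1.7],
    }
    targets = target_list(pathways, ["glycolysis"])
    print(sorted(restrict_to_targets(identified, targets)))
```

The resulting subset can then be passed to whatever statistical analysis is planned, and the same target list can drive a later targeted re-extraction of the raw data.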
Future developmental work could center on matching possible metabolites at successive nodes, integrating searching with pathway databases for both GC and LC. For GC, this would involve theoretical calculations of derivatization effects.58
As compounds are actually identified, a database can be created that records this information for future use in compound identification. Another possibility is to use genome-wide association studies (GWAS)59 data to see if there is an association with the molecule of interest. At times, associating a particular allele with specific metabolite biomarkers may suggest a known gene or a gene of a known class.
An unknown compound can be identified, tracked, and quantitated with relative or semi-quantification even though its true identity is not known. If such a molecule becomes an important biomarker, there are several approaches that can be used to either suggest an identity or get clues as to the identity. Biochemicals are typically not independent variables; they change in groups that are related biosynthetically or functionally, and statistical correlative methods can be of use to postulate relationships. Important biomarkers identified in this manner can have their mass accurately determined, atomic composition calculated, and identification made more complete by using MSn analysis. Such approaches can give scientists better ideas about the identity of the metabolite, its molecular composition, and the pathways involved in its metabolism.
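The step from an accurately determined mass to a candidate atomic composition can be illustrated with a brute-force search. The Python sketch below enumerates C, H, N, O formulas whose monoisotopic mass falls within a ppm tolerance of a measured neutral mass. It is a simplified illustration under stated assumptions, not the method of any tool named in this article: the element set, the search limits, the 5 ppm default, and the crude hydrogen-count ceiling are all choices made for the example, and real tools add isotope-pattern and chemical-plausibility filters.

```python
# Monoisotopic masses in Da for the most abundant isotopes.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def formula_string(counts):
    """Format element counts as a formula, e.g. C6H12O6."""
    parts = []
    for element in ("C", "H", "N", "O"):
        n = counts[element]
        if n == 1:
            parts.append(element)
        elif n > 1:
            parts.append(f"{element}{n}")
    return "".join(parts)


def candidate_formulas(neutral_mass, ppm_tol=5.0, max_c=20, max_n=5, max_o=10):
    """Return (formula, mass) pairs within ppm_tol of neutral_mass."""
    tol = neutral_mass * ppm_tol / 1e6
    hits = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                heavy = c * MASS["C"] + n * MASS["N"] + o * MASS["O"]
                remainder = neutral_mass - heavy
                if remainder < -tol:
                    break  # adding more oxygen only overshoots further
                # Nearest whole number of hydrogens that fills the remainder.
                h = round(remainder / MASS["H"])
                # Crude ceiling: hydrogen count of a fully saturated molecule.
                if h < 0 or h > 2 * c + n + 2:
                    continue
                mass = heavy + h * MASS["H"]
                if abs(mass - neutral_mass) <= tol:
                    counts = {"C": c, "H": h, "N": n, "O": o}
                    hits.append((formula_string(counts), mass))
    return hits


if __name__ == "__main__":
    for formula, mass in candidate_formulas(180.06339):
        print(formula, round(mass, 5))
```

The number of candidates grows quickly with mass and with looser tolerances, which is one reason MSn fragmentation data and correlations with related metabolites remain necessary for a confident identification.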