Analytical Variation (LC/MS + XCMS)
The importance of statistical strategies in metabolomics has been the subject of recent reviews27 discussing several data-analysis tools, including XCMS.23, 28–33
Pooled human plasma and human CSF extracts were used to assess the analytical variability of the LC/MS platform and the XCMS analysis procedure. Plasma samples from individual patients were pooled to create a representative average sample, and the CSF samples were pooled in the same way. Analytical replicates of a pooled plasma or CSF sample were then run as a series of injections, each separated by a shorter run that washed and re-equilibrated the column. The first five injections served to condition the system before the series of sample injections, and the data from these conditioning injections were excluded from the analysis. A principal component analysis showed that conditioning the system greatly improved the reproducibility of the leading sample injections. This trend has been reported previously, and Zelena et al. have shown that the number of injections required to condition the system may vary for different biofluids.34
The data were processed with XCMS independently for each biofluid, which detected approximately 5,300 features in the human plasma replicates and 4,900 features in the human CSF replicates in the mass range of m/z 100 to m/z 1000. We estimate the number of distinct metabolites to be substantially less than the number of detected features, since a single metabolite may give rise to multiple mass spectral peaks, including the molecular ion, in-source fragments, adducts, and dimers, and each of these is reported in XCMS as a separate peak, or “feature”. Additionally, there will be some spurious peaks arising from background and noise that are classified as signals.
The accompanying table (Measured variation in human plasma and CSF) summarizes the measured analytical variation in the replicate runs. The median coefficient of variation (CV) was calculated from the peak areas of all features reported by XCMS. Peak areas showed high analytical precision, with a median CV of 16% or less across replicate analyses for both biofluids.
Measured variation in human plasma and CSF.
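The median CV described above can be sketched in a few lines. This is only an illustration: the feature IDs and peak areas are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(peak_areas):
    """Return the CV in percent: standard deviation divided by the mean.

    ASSUMPTION: the sample standard deviation (n - 1) is used.
    """
    return stdev(peak_areas) / mean(peak_areas) * 100.0


def median_cv(feature_areas):
    """Median CV (%) across features.

    feature_areas maps a feature ID to its peak areas over the replicates.
    """
    return median([coefficient_of_variation(a) for a in feature_areas.values()])


# Hypothetical peak areas for three features over three replicate injections.
replicates = {
    "F1": [90.0, 100.0, 110.0],
    "F2": [80.0, 100.0, 120.0],
    "F3": [95.0, 100.0, 105.0],
}
print(f"median CV = {median_cv(replicates):.1f}%")
```

Using the median rather than the mean keeps a handful of very noisy features from dominating the summary.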
Several reports suggest the importance of considering biological variability in human biofluid analysis since metabolic profiles may be sensitive to genetic, dietary, lifestyle, or environmental factors.35, 36
Variation in sample handling during the collection and post-collection procedures can also appear as inter-individual variation.
A challenge in the application of metabolomics for disease biomarker discovery is to determine the effect of human variation on metabolite profiles. The success of metabolomics in non-human model systems is partially due to the controlled nature of such variation in the subjects, environmentally, behaviorally, and genetically. Typically, animal studies may have several specimens and contain several subjects per group or category.37
In human studies, the inability to control the genetics, living conditions, diet, and other factors leads to the question of the number of human patients required to form statistically meaningful datasets.38
Typical chromatograms for human plasma and human CSF are shown in Figure 2; plasma is dominated by peaks arising from lysophospholipids. XCMS was used to compare the data from the individual samples that had been pooled for the analytical replicate analysis. The results are summarized in the table of measured variation in human plasma and CSF. XCMS detected approximately 3,600 features in the human plasma biological replicates and 4,800 features in the human CSF biological replicates in the mass range of m/z 100 to m/z 1000. The total number of detected features decreased for the plasma biological replicates relative to the pooled plasma analytical replicates, but this was not observed for the CSF replicates. The majority of features showed low inter-individual variation, suggesting that there are characteristic plasma and CSF metabolite profiles that include a core set of relatively low-variance metabolites in healthy individuals. The median CV of the features detected in human plasma biological replicates was 46%. Human CSF biological replicates showed less variation than the plasma replicates, with a median CV of 35%.
Figure 2 Total Ion Chromatogram (TIC) of human plasma (A, top) and CSF (B, bottom). The following metabolites appear in the plasma chromatogram: a: methionine b: leucine/isoleucine c: phenylalanine d: tryptophan e: caffeine f, g, h: lysophospholipids.
Biomarker discovery typically relies on comparing peak areas to statistically analyze metabolite changes between two or more phenotypic groups. Based on integrated peak areas, XCMS reports variations between two groups, including the average fold change and the t-test p-value. To determine the magnitude of apparently significant changes that XCMS would detect in a within-group comparison of healthy individuals, 12 biological replicates of human CSF were divided into two groups of 6, and XCMS was run to detect changes between the two groups. Of the 4,800 features detected, 11 were reported to change by 2-fold or greater at the 95% confidence level (p-value < 0.05), and only 3 of these at the 99% confidence level (p-value < 0.01). The test was repeated for 14 human plasma biological replicates, split into two groups of 7, and the number of potential “false positives” was slightly higher: of the 3,600 features detected, 48 were reported to change by 2-fold or greater at the 95% confidence level, but only 5 of these at the 99% confidence level. For both human plasma and CSF, only one feature was reported to change by 3-fold or greater at the 99% confidence level. Based on these calculations, applying a filter that retains only changes of 2-fold or greater should minimize the effect of the biological variability inherent in a healthy control group.
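The within-group comparison above amounts to counting features that pass both a fold-change and a p-value threshold when healthy samples are split into two arbitrary groups. The sketch below illustrates that counting step only. It is not the XCMS procedure: to stay within the Python standard library it substitutes an exact permutation test on the difference of group means for the t-test that XCMS reports, and the function names and peak areas are hypothetical.

```python
from itertools import combinations


def fold_change(group_a, group_b):
    """Ratio of the larger group mean to the smaller one (always >= 1)."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    low, high = sorted((mean_a, mean_b))
    return float("inf") if low == 0 else high / low


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for the difference in group means.

    ASSUMPTION: stands in for the t-test p-value reported by XCMS.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_gap(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = mean_gap(sum(group_a))
    splits = extreme = 0
    for members in combinations(pooled, n_a):
        splits += 1
        # Small relative tolerance so exact ties with the observed gap count.
        if mean_gap(sum(members)) >= observed * (1 - 1e-9):
            extreme += 1
    return extreme / splits


def count_flagged(features, min_fold=2.0, alpha=0.05):
    """Count features passing both the fold-change and the p-value filter."""
    return sum(
        1
        for group_a, group_b in features
        if fold_change(group_a, group_b) >= min_fold
        and permutation_p_value(group_a, group_b) < alpha
    )


# Hypothetical peak areas for two features, each split into two groups of 6.
hypothetical_features = [
    ([100, 110, 90, 105, 95, 100], [300, 310, 290, 305, 295, 300]),
    ([100, 200, 150, 120, 180, 160], [110, 190, 140, 130, 170, 150]),
]
print(count_flagged(hypothetical_features, min_fold=2.0, alpha=0.05))
```

With groups of 6 the exact test enumerates 924 splits per feature, and with groups of 7 it enumerates 3,432, so full enumeration stays cheap at the group sizes used in this study.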
When matched plasma and CSF from the same individuals were compared (for a sample group of 10 male individuals), less variation in the basal metabolite profiles was observed in CSF than in plasma. The accompanying comparison figure illustrates integrated intensities and CVs for plasma and CSF sampled from these 10 individuals. To understand the nature of the differences, we compared the composition of each biofluid. Of the shared features, 70% have a greater intensity in plasma, which is dominated by lysophospholipids. Wishart et al. have previously characterized the human CSF metabolome, showing that lysophospholipids are present at very low concentrations in CSF.39
Because of ion suppression, the relative intensities of features in this study cannot serve as a measure of concentration in such different biofluids, but they give an idea of the large-scale differences between plasma and CSF. Ion suppression results from the limited charge available to all metabolites eluting from the column at a particular time and from the larger number of readily ionizable phospholipids in plasma.
Comparison of matched human plasma and CSF from 10 individuals. A) Integrated intensity of features in human plasma versus CSF. 70 percent of shared features have a greater intensity in plasma. B) Coefficient of Variation in human plasma versus CSF.
A graph of the CV in plasma versus the CV in CSF (panel B of the matched-sample comparison figure) further supports the higher variation of plasma compared to CSF. A large number of features show high variation in plasma, with a CV near 80–90%, whereas in CSF these same features have a CV of 50% or less. These results show that metabolite intensities in plasma and CSF are correlated, but that metabolite profiles are more conserved across individuals in CSF than in plasma.
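The two matched-sample comparisons above (intensity and CV, plasma versus CSF) reduce to the same calculation: the fraction of shared features whose plasma value exceeds the CSF value. A minimal sketch follows, assuming features have already been matched between the two biofluids under a common ID. The text does not describe how that matching was done, and the IDs and values are hypothetical.

```python
def fraction_higher_in_plasma(plasma_values, csf_values):
    """Fraction of shared features whose plasma value exceeds the CSF value.

    Both arguments map a feature ID to one number per feature, for example
    its mean integrated intensity or its CV across individuals.
    """
    shared = plasma_values.keys() & csf_values.keys()
    if not shared:
        raise ValueError("no shared features between the two biofluids")
    higher = sum(1 for f in shared if plasma_values[f] > csf_values[f])
    return higher / len(shared)


# Hypothetical mean intensities; two features occur in only one biofluid.
plasma = {"F1": 500.0, "F2": 300.0, "F3": 50.0, "F4": 80.0, "P_only": 10.0}
csf = {"F1": 100.0, "F2": 120.0, "F3": 70.0, "F4": 20.0, "C_only": 5.0}
print(f"{fraction_higher_in_plasma(plasma, csf):.0%} of shared features are higher in plasma")
```

Passing per-feature CVs instead of intensities gives the corresponding fraction for the CV comparison.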
A principal component analysis of the biological replicates of plasma and CSF shows more variability in individual plasma metabolite profiles than in CSF metabolite profiles. The PCA plot illustrates tight clustering of the CSF samples, whereas a much greater spread is observed among the plasma samples, which include an outlier.
PCA of human plasma (red triangles) and human CSF (blue triangles) metabolomics datasets. Note the greater spread in plasma compared with CSF.
Scatter plots, shown in Figure 5, illustrate the relationship between intensity and CV for both analytical and biological replicates of each biofluid. In the biological replicates of human CSF (Figure 5B), a dense cluster of features with medium intensity and low CV (< 20%) appears. A similar cluster also appears in the biological replicates of human plasma. Selecting features from such low-variation clusters might aid in choosing candidate biomarkers that benefit from efficient ionization, reproducible detection, and low variation among healthy individuals.
Figure 5 Left. Histogram showing the frequency distribution of the coefficient of variation (CV). Right. Plot of integrated intensity versus the CV. A) Analytical replicates of human CSF. B) Biological replicates of human CSF. C) Analytical replicates of human ...
The left-hand side of Figure 5 displays histograms of the number of features binned according to their CV. In both plasma and CSF, the histograms of the analytical replicates are similar, with a narrow distribution and a maximum at a CV of approximately 10%. For the biological replicates, the envelope of the histogram is wider than for the analytical replicates, corresponding to a greater number of features with a higher CV, and the histogram is shifted to the right as a result of biological variability. In the histogram of the plasma biological replicates there is a cluster of features that appear to be conserved, with low variation among individuals. This cluster is not evident in the histogram of the CSF biological replicates, indicating that there is much less variation in CSF metabolite profiles overall.
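The CV histograms discussed above can be built by counting features per CV bin. The sketch below assumes a bin width of 10 percentage points, which is not stated in the text, and uses hypothetical CV values.

```python
from collections import Counter


def cv_histogram(cv_values, bin_width=10.0):
    """Count features per CV bin; keys are the lower bin edges in percent.

    ASSUMPTION: bin_width=10.0 is illustrative, not taken from the paper.
    """
    counts = Counter(int(cv // bin_width) * bin_width for cv in cv_values)
    return dict(sorted(counts.items()))


# Hypothetical per-feature CVs (%).
cvs = [4.0, 8.5, 12.0, 15.5, 18.0, 35.0, 62.0]
histogram = cv_histogram(cvs)
fullest_bin = max(histogram, key=histogram.get)
print(histogram)
print(f"most populated bin starts at {fullest_bin:.0f}% CV")
```

Comparing the position of the most populated bin and the width of the distribution between analytical and biological replicates reproduces the kind of shift and broadening described in the text.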
Finally, we compared the biological variation of ions of unique m/z in each biofluid. The average CV was calculated for groups of features binned according to their m/z values (Figure 6). While no direct correlation between CV and m/z was observed for either biofluid, the average CV was more consistent over different m/z ranges for CSF than for plasma. The relative consistency in CV versus m/z for both plasma and CSF suggests that many features of each biofluid have comparable biological variability. To further investigate the variability of unique features in each biofluid and to compare the variation of specific features in plasma with those in CSF, we identified several metabolites present in both biofluids. These metabolites are listed in the table of MS/MS parent ions, fragment ions, and retention times, with their experimentally measured m/z values and their product ions, which were used for identification against authentic model compounds in the METLIN database. A model match is shown in supplemental figure S1 for the metabolite tyrosine, including a box plot depicting the variation in tyrosine intensities in plasma and CSF. Retention times, listed in the same table, were also taken into account to identify metabolites accurately. The variation and intensity of the identified metabolites were compared across matched plasma and CSF from 10 male individuals. The table of individual metabolite variation lists the variation and intensity observed for each metabolite in plasma versus CSF, and a detailed summary is given in Supplementary Table S1. The relative change in the identified metabolites was consistent with the median biological CV determined for all peaks. These observations, together with the analysis of intra-group variation in healthy individuals, indicate that fold changes above 2 in metabolomics studies investigating plasma or CSF are statistically relevant with respect to the inherent variability of a healthy control group.
Figure 6 Plot of the average CV over a range of m/z values for biological replicates of plasma (blue squares) and CSF (red triangles). The total number of features over each range of m/z values is represented by the relative size of the corresponding icon (i.e., ...
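The binned averages plotted in Figure 6 can be sketched as follows. The m/z 100 to 1000 range comes from the text, while the bin width of 100 m/z units is an assumption, and the feature list is hypothetical. Each bin reports both the average CV and the number of features, matching the two quantities encoded in the figure (position and icon size).

```python
from collections import defaultdict
from statistics import mean


def mean_cv_by_mz_bin(features, bin_width=100.0, mz_min=100.0, mz_max=1000.0):
    """Average CV per m/z bin.

    features is an iterable of (mz, cv_percent) pairs. Returns a dict mapping
    each bin's lower m/z edge to (mean CV, number of features in the bin).
    ASSUMPTION: bin_width=100.0 is illustrative, not taken from the paper.
    """
    bins = defaultdict(list)
    for mz, cv in features:
        if not (mz_min <= mz < mz_max):
            continue  # skip features outside the half-open range [mz_min, mz_max)
        lower_edge = mz_min + int((mz - mz_min) // bin_width) * bin_width
        bins[lower_edge].append(cv)
    return {edge: (mean(cvs), len(cvs)) for edge, cvs in sorted(bins.items())}


# Hypothetical (m/z, CV %) pairs for a handful of features.
features = [(150.1, 40.0), (180.2, 50.0), (520.3, 30.0), (760.5, 20.0), (790.0, 40.0)]
for edge, (avg_cv, n) in mean_cv_by_mz_bin(features).items():
    print(f"m/z {edge:.0f}-{edge + 100:.0f}: mean CV {avg_cv:.1f}% over {n} features")
```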
MS/MS parent ions, fragment ions and chromatographic retention times of identified metabolites in human plasma and CSF.
Variation of individual metabolites in matched human plasma and CSF from 10 male individuals.