We begin with the first experiment (fixed concentration of Raman scatterers in all 49 samples). The observed Raman spectra of the 49 samples are shown in panel (a) of the figure below, and the associated DRS spectra in panel (b). To better illustrate the relative change in the DRS spectra due to changing optical properties, the DRS spectra are normalized to 4% Intralipid with no absorber added (μs = 73.6 cm⁻¹). Note that any of the samples could in theory be chosen as the basis for normalization. The DRS spectra were not normalized to a white reflectance standard such as Spectralon (>99%, Labsphere) because the reference glucose and creatinine concentrations were measured in an aqueous solution. Thus, the “intrinsic” Raman spectrum includes the effects of water absorption, and it was not necessary to correct the turbid samples for this effect. Because the DRS spectra were internally normalized, Rd is here the relative diffuse reflectance rather than the absolute diffuse reflectance. This difference has no effect on the functionality of the IRS correction method. As can be seen in panel (b), even though the intensity variations in this wavelength region (830-960 nm) are substantial, the spectral variations from sample to sample, owing to changes in Intralipid and water absorption, are minimal. Thus, we can use the λ-independent correction method.
(a). Raman spectra (b). Relative DRS spectra normalized to 4% intralipid with no absorber present.
For the λ-independent method, we compare the observed number of molecules measured (NOBS) for each sample to the actual reference concentration (NREF). For convenience, NOBS was obtained through OLS analysis, in which the measured Raman spectrum is fit by a linear combination of the spectra of each component in the physical tissue model (e.g., water, Intralipid, India ink, creatinine, fused silica cuvette, etc.). The fit coefficient for the (constant) concentration of the Raman scatterer, in this case creatinine, is NOBS. (It does not matter which Raman scatterer is used as the probe to calibrate f(Rd), as the shape of this function is not analyte-dependent.) In the absence of turbidity, the ratio NOBS/NREF should be 1. However, owing to the effects of absorption and scattering, this value varied significantly, from 0.34 to 1.14. NOBS/NREF for each of the 49 samples is displayed in the figure below as a function of μa and μs.
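The OLS step described above can be sketched in Python as follows. This is a minimal illustration on synthetic data, not the authors' code: the Gaussian stand-in spectra, the coefficient values, and the helper name `ols_coefficients` are all assumptions made for the example.

```python
import numpy as np

def ols_coefficients(measured, components):
    """Fit `measured` (n_pixels,) as a linear combination of the rows of
    `components` (n_components, n_pixels) and return the fit coefficients."""
    coeffs, *_ = np.linalg.lstsq(components.T, measured, rcond=None)
    return coeffs

# Hypothetical pure-component spectra: Gaussian bands on an arbitrary axis.
axis = np.linspace(0.0, 1.0, 200)

def band(center, width):
    return np.exp(-0.5 * ((axis - center) / width) ** 2)

components = np.vstack([
    band(0.2, 0.05),   # stand-in for water
    band(0.5, 0.08),   # stand-in for Intralipid
    band(0.8, 0.03),   # stand-in for creatinine
])
true_coeffs = np.array([2.0, 1.5, 0.4])   # made-up contributions
measured = true_coeffs @ components        # noise-free synthetic spectrum

coeffs = ols_coefficients(measured, components)
n_obs = coeffs[2]   # the creatinine coefficient plays the role of N_OBS
```

With real spectra the model matrix would hold one measured spectrum per constituent, and the recovered coefficient for the probe analyte would be divided by NREF to form the ratio discussed above.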
The ratio of observed to actual values of an analyte at constant concentration showing significant deviation from 1 resulting from turbidity distortions, plotted as a function of μa and μs.
Following Eq. (2), values of (NOBS/NREF)·μt are plotted against Rd in the figure below. Data collected from the two experiments are combined. The fit to the data is f(Rd), which for these experiments is best represented by an exponential function. Although the data collection periods were two weeks apart and the sample was replaced several times, it can be seen that f(Rd) remains constant, as would be expected for similar geometry and anisotropy, g.
(NOBS/NREF)·μt versus Rd for experiments 1 and 2, showing reproducible curvature.
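As a rough illustration of how a calibration curve of this kind might be fitted, the sketch below assumes the simple form f(Rd) = a·exp(b·Rd) and fits it by linear least squares on a log scale. The text states only that the fit is exponential, so the functional form, the parameter values, and the synthetic data are assumptions.

```python
import numpy as np

def fit_exponential(rd, y):
    """Fit y = a * exp(b * rd) by a straight-line fit to log(y); needs y > 0."""
    b, log_a = np.polyfit(rd, np.log(y), 1)   # slope first, then intercept
    return np.exp(log_a), b

# Synthetic data: y stands for (N_OBS / N_REF) * mu_t at each sample.
rd = np.linspace(0.2, 1.2, 25)      # relative diffuse reflectance (made up)
a_true, b_true = 5.0, 2.0           # hypothetical curve parameters
y = a_true * np.exp(b_true * rd)

a_fit, b_fit = fit_exponential(rd, y)

def f_rd(r):
    """Fitted calibration curve evaluated at reflectance r."""
    return a_fit * np.exp(b_fit * r)
```

A log-scale fit weights relative rather than absolute deviations; a nonlinear least-squares fit on the original scale would be an equally reasonable choice for noisy data.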
The noise in the plot of (NOBS/NREF)·μt versus Rd, i.e., data points that do not lie on the curve, is a result of limited experimental signal-to-noise. The creatinine signal is much smaller than the signals from most of the other constituents; therefore, slight modeling errors and noise in the spectral data contribute to error in extracting the OLS fit coefficient for creatinine. As noted above, any Raman scatterer can be used to determine the IRS calibration curve, f(Rd). For example, Intralipid itself may be used, provided that its apparent concentration change due to varying optical properties is deconvolved from its real concentration change. Processed accordingly, the Intralipid data, with their higher signal-to-noise ratio, yield a tight f(Rd) curve whose curvature is equivalent to that of the creatinine-derived curve, but with no appreciable spread. However, to prevent potential confusion on the part of the reader, we opted to present the analysis using the fixed-concentration Raman scatterer, creatinine.
Once f(Rd) was obtained, NOBS for each sample was multiplied by the corresponding μt of that sample and divided by f(Rd) to obtain NPRED, which may be compared to NREF for assessing prediction accuracy. The root-mean-square error of prediction (RMSEP) for the uncorrected data (NOBS) was 36%, versus an RMSEP for the IRS-corrected data (NPRED) of 6%. Thus, IRS significantly improved the prediction accuracy.
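The correction step and the error metric can be written compactly. The sketch below uses made-up values of μt and f(Rd), and it assumes that the percentage RMSEP is computed relative to the reference value; the text does not spell out that normalization.

```python
import numpy as np

def irs_correct(n_obs, mu_t, f_values):
    """N_PRED = N_OBS * mu_t / f(Rd), as described in the text."""
    return n_obs * mu_t / f_values

def rmsep_percent(pred, ref):
    """RMS prediction error as a percentage of the reference (assumed form)."""
    return 100.0 * np.sqrt(np.mean(((pred - ref) / ref) ** 2))

# Made-up example: constant reference concentration, varying turbidity.
n_ref = np.full(5, 10.0)
mu_t = np.array([20.0, 40.0, 60.0, 80.0, 100.0])    # hypothetical values
f_values = np.array([18.0, 30.0, 39.0, 44.0, 50.0]) # hypothetical f(Rd)
n_obs = n_ref * f_values / mu_t                      # turbidity-distorted values

n_pred = irs_correct(n_obs, mu_t, f_values)
error_uncorrected = rmsep_percent(n_obs, n_ref)
error_corrected = rmsep_percent(n_pred, n_ref)
```

Because the synthetic NOBS values here are generated from the same f(Rd) that is used for the correction, the corrected error is zero by construction; with measured data the residual error reflects noise and fit quality.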
The second data set was designed to illustrate the effectiveness of the IRS correction method under an implicit calibration framework [18]. The data were analyzed twice, first using OLS (explicit calibration) and then via PLS (implicit calibration). Prior to implicit analysis, the 10 sequential Raman spectra for each sample were averaged and smoothed with a Savitzky-Golay filter.
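The preprocessing step can be sketched with NumPy alone. The window length (7) and polynomial order (2) below are assumptions, since the text does not report the filter settings, and the edge handling (repeating the end values) is one choice among several. SciPy's `scipy.signal.savgol_filter` offers a ready-made alternative.

```python
import numpy as np

def savgol_weights(window, polyorder):
    """Savitzky-Golay smoothing weights for the centre of an odd-length window."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    vander = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps the window samples to the fitted
    # polynomial's constant term, i.e. its value at the window centre.
    return np.linalg.pinv(vander)[0]

def average_and_smooth(frames, window=7, polyorder=2):
    """Average repeated frames (n_frames, n_pixels), then smooth along pixels."""
    mean_spectrum = frames.mean(axis=0)
    padded = np.pad(mean_spectrum, window // 2, mode="edge")
    weights = savgol_weights(window, polyorder)
    # Reversing the weights makes np.convolve perform a correlation.
    return np.convolve(padded, weights[::-1], mode="valid")

# Synthetic example: 10 noisy frames of a single Gaussian band.
rng = np.random.default_rng(0)
pixels = np.arange(200)
clean = np.exp(-0.5 * ((pixels - 100) / 10.0) ** 2)
frames = clean + rng.normal(0.0, 0.05, size=(10, pixels.size))
smoothed = average_and_smooth(frames)
```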
The Raman spectra were then split into calibration and validation sets following two different procedures. In the first procedure, the 50 sample spectra were randomly split into 36 calibration and 14 prediction sample spectra in each of 500 independent iterations. For comparison between the uncorrected analysis, in which sampling volume changes are ignored, and IRS, the reference values used in conjunction with the calibration data set to generate the regression vector [18] were either the actual reference values, NREF, or, for IRS application, NOBS, as calculated from Eq. (2). The number of PLS factors employed in either case was 6, which is in accordance with the number of known variables and the size of the calibration set. The resulting calibration algorithms were applied to the same validation set to compare predictive capabilities. In the case of IRS application, a final step was needed to convert the predicted values NPRED′ to NPRED, again via Eq. (2). A boxplot of the 500 RMSEP values for glucose generated with uncorrected and IRS-corrected data is shown in the figure below. The mean and standard deviation are 0.56 and 0.11 for the uncorrected data, and 0.43 and 0.08 for the corrected data, so implementing IRS reduces both the mean and the standard deviation of the prediction error by more than 20%.
Boxplot showing reduction in mean and standard deviation of prediction error by application of IRS. Values were derived from 500 unique splittings of 50 samples into 36 calibration and 14 prediction sample sets.
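The repeated-split comparison can be sketched as below. Everything here is illustrative: the PLS1 routine is a generic textbook NIPALS variant rather than the authors' implementation, the spectra are synthetic, `scale` stands for f(Rd)/μt per sample, and 4 factors are used because the synthetic model has four varying components (the study used 6).

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """PLS1 regression via NIPALS; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)        # weight vector
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        qk = yk @ t / tt              # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - qk * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

def rmsep(pred, ref):
    return np.sqrt(np.mean((pred - ref) ** 2))

def split_once(X, n_ref, scale, n_cal, n_factors, rng):
    """One random split; returns (uncorrected RMSEP, IRS RMSEP)."""
    idx = rng.permutation(len(n_ref))
    cal, val = idx[:n_cal], idx[n_cal:]
    # Uncorrected: calibrate directly against N_REF.
    raw = pls1_predict(pls1_fit(X[cal], n_ref[cal], n_factors), X[val])
    # IRS: calibrate against N_OBS = N_REF * f(Rd) / mu_t, then convert back.
    n_obs = n_ref * scale
    irs = pls1_predict(pls1_fit(X[cal], n_obs[cal], n_factors), X[val])
    irs = irs / scale[val]
    return rmsep(raw, n_ref[val]), rmsep(irs, n_ref[val])

# Synthetic spectra whose amplitudes are distorted by `scale` (all made up).
rng = np.random.default_rng(0)
n_samples, n_pixels = 50, 40
axis = np.linspace(0.0, 1.0, n_pixels)

def band(center, width):
    return np.exp(-0.5 * ((axis - center) / width) ** 2)

n_ref = rng.uniform(5.0, 15.0, n_samples)        # reference concentrations
scale = rng.uniform(0.4, 1.1, n_samples)         # f(Rd) / mu_t per sample
other = rng.uniform(0.5, 1.5, (n_samples, 2))    # two interfering constituents
X = (np.outer(n_ref * scale, band(0.3, 0.05))
     + np.outer(other[:, 0] * scale, band(0.5, 0.10))
     + np.outer(other[:, 1] * scale, band(0.7, 0.07))
     + np.outer(scale, band(0.5, 0.40))          # broad background
     + rng.normal(0.0, 0.01, (n_samples, n_pixels)))

errors = np.array([split_once(X, n_ref, scale, 36, 4, rng) for _ in range(500)])
raw_rmsep, irs_rmsep = errors[:, 0], errors[:, 1]
```

Both calibrations are evaluated on the same validation samples within each split, mirroring the comparison described in the text. The means and standard deviations of `raw_rmsep` and `irs_rmsep` are the quantities a boxplot of this kind would summarize.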
In the second procedure, the samples were split into calibration and validation sets according to the μs′/μa value, where μs′ = μs(1 − g) with g = 0.8. Samples having μs′/μa values less than 65 comprised the calibration set, and samples having μs′/μa values higher than 65 comprised the validation set (Fig. 7(a)). Such a splitting replicates the scenario of not incorporating the full range of optical property variability into the calibration set. Typically, this scenario results in a calibration algorithm that is not robust. In other words, applying the generated calibration algorithm to samples lying outside the range of the calibration set gives much larger prediction error.
Fig. 7 (a). 50 samples split into 36 calibration and 14 prediction samples based on optical property values. (b). Standard error of cross-validation (SECV) and standard error of prediction (SEP) for samples split into calibration and prediction sets based on ...
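A split of this kind reduces to a threshold on the ratio of reduced scattering to absorption. In the sketch below, the optical property values are made up and the helper name `split_by_optical_ratio` is hypothetical; a sample with a ratio of exactly 65, a case the text does not address, is placed in the validation set.

```python
import numpy as np

def split_by_optical_ratio(mu_s, mu_a, g=0.8, threshold=65.0):
    """Return boolean masks (calibration, validation) based on mu_s' / mu_a."""
    mu_s_reduced = mu_s * (1.0 - g)      # reduced scattering coefficient
    ratio = mu_s_reduced / mu_a
    calibration = ratio < threshold
    return calibration, ~calibration

mu_s = np.array([50.0, 75.0, 100.0, 125.0])   # cm^-1, made-up values
mu_a = np.array([0.30, 0.20, 0.50, 0.35])     # cm^-1, made-up values
cal_mask, val_mask = split_by_optical_ratio(mu_s, mu_a)
```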
Indeed, for uncorrected analysis of the data (using NREF for the reference values as described above), the standard error of cross-validation (SECV) for the calibration set is 0.47 mM, whereas the standard error of prediction (SEP) for the validation set is nearly twice as large, 0.86 mM. For the IRS analysis, the SECV is 0.41 mM and, in contrast to the uncorrected analysis, the SEP remains nearly level at 0.42 mM (Fig. 7(b)). This is an improvement of over 50% compared to the uncorrected SEP. This scenario illustrates the advantages of the IRS correction method. By replacing the reference values with the actual number of analyte molecules in the sampling-volume region, NOBS, the multivariate calibration technique is better able to lock onto the signal from the analyte of interest, generating a more robust model and greatly reducing the prediction error.
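The relative improvements quoted in this section follow directly from the reported values, as the short check below shows. The helper name `relative_reduction` is introduced only for this illustration.

```python
def relative_reduction(before, after):
    """Reduction from `before` to `after`, in percent of `before`."""
    return 100.0 * (before - after) / before

mean_reduction = relative_reduction(0.56, 0.43)  # random splits, mean RMSEP
std_reduction = relative_reduction(0.11, 0.08)   # random splits, RMSEP std. dev.
sep_reduction = relative_reduction(0.86, 0.42)   # optical-property split, SEP (mM)
```

These evaluate to roughly 23%, 27%, and 51%, consistent with the "more than 20%" and "over 50%" figures stated in the text.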