Glucose profiles of two typical, representative subjects, #6 and #8, are used to initially evaluate the adequacy of the proposed approach (Figure 1). Raw iSense subcutaneous current measurements (nA) are mapped to glucose concentrations (mg/dl) by performing a linear regression between all reference capillary BG measurements, collected 20 times a day for 5 days, and the raw iSense sensor data, and then applying the regression fit to the entire subcutaneous sensor record. The regression fit for each subject is computed from, and applied to, that subject's data only. Mapped sensor data are henceforth taken as the “gold standard” against which glucose model predictions are compared.
Figure 1. Subcutaneous glucose measurements over approximately 5 consecutive days with the iSense continuous glucose monitoring system for two typical individuals with type 1 diabetes, subject #6 (continuous line) and subject #8 (dotted line).
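The per-subject calibration described above can be sketched as follows. This is a minimal illustration, assuming ordinary least squares via `numpy.polyfit` and sensor currents already time-matched to the reference measurements; the function names and sample values are hypothetical and are not taken from the study.

```python
import numpy as np

def fit_calibration(current_at_ref_nA, ref_bg_mgdl):
    """Fit the line BG ~ slope * current + intercept by least squares."""
    slope, intercept = np.polyfit(current_at_ref_nA, ref_bg_mgdl, deg=1)
    return float(slope), float(intercept)

def apply_calibration(current_nA, slope, intercept):
    """Map raw sensor current (nA) to glucose concentration (mg/dl)."""
    return slope * np.asarray(current_nA, dtype=float) + intercept

# Hypothetical paired samples for one subject (invented for illustration):
# sensor current at the times of the reference capillary measurements.
current_at_ref = np.array([10.0, 14.0, 18.0, 25.0, 31.0])
ref_bg = np.array([80.0, 112.0, 139.0, 196.0, 240.0])

slope, intercept = fit_calibration(current_at_ref, ref_bg)

# Apply the same fit to the full (here: tiny, invented) sensor record.
mapped_glucose = apply_calibration([12.0, 20.0, 28.0], slope, intercept)
```

Because the fit is computed separately for each subject, `fit_calibration` would be called once per subject and its result applied only to that subject's record.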
To determine whether AR models can represent glucose data by fully capturing the correlations in the time series, we first develop an AR model for a specific subject using a portion of that subject's time-series data and then use the model to compute the prediction residuals of the series, i.e., the point-by-point differences between measured and predicted glucose values. The top panel of Figure 2 shows the measured and (1-minute-ahead) predicted glucose levels obtained with an AR model of order m = 10 using the first 2000 minutes, i.e., the first 2000 data points, of subject #6's data. The prediction residuals, illustrated in the bottom panel of Figure 2, are small, indicating a good model fit.
Figure 2. Model fit and residual error. Model fit for subject #6 based on an autoregressive model using the first 2000 minutes, or data points (top), and the corresponding residual error between model fit and sensor data (bottom).
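The fit-then-residual procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the coefficients are estimated by plain (unregularized) least squares with no intercept term, which may differ from the estimation procedure actually used, and the glucose series is synthetic. The names `lag_matrix`, `fit_ar`, and `one_step_residuals` are hypothetical.

```python
import numpy as np

def lag_matrix(y, m):
    """Rows [y[n-1], y[n-2], ..., y[n-m]] for n = m, ..., len(y)-1."""
    return np.column_stack([y[m - 1 - i : len(y) - 1 - i] for i in range(m)])

def fit_ar(y, m=10):
    """Least-squares AR(m) coefficients b with y[n] ~ sum_i b[i] * y[n-1-i]."""
    y = np.asarray(y, dtype=float)
    b, *_ = np.linalg.lstsq(lag_matrix(y, m), y[m:], rcond=None)
    return b

def one_step_residuals(y, b):
    """Measured minus 1-step-ahead predicted values, aligned with y[m:]."""
    y = np.asarray(y, dtype=float)
    m = len(b)
    return y[m:] - lag_matrix(y, m) @ b

# Synthetic 1-minute "glucose" series (invented; not study data).
rng = np.random.default_rng(0)
t = np.arange(6000)
glucose = 150 + 40 * np.sin(2 * np.pi * t / 300) + rng.normal(0, 1.0, t.size)

train = glucose[:2000]             # first 2000 minutes, as in the text
b = fit_ar(train, m=10)            # model order m = 10, as in the text
residuals = one_step_residuals(train, b)
```

Small residuals relative to the signal amplitude indicate a good fit; whether they are also uncorrelated is a separate question.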
Moreover, to determine the capability of the model fit in capturing correlations in time-series glucose data, we compute the autocorrelation of the residuals and check for whiteness in the correlation [12]. An appropriate model that fully captures the correlation in the data should result in uncorrelated residuals, consisting of white noise. For purely white noise residuals, the autocorrelation coefficients, normalized between −1 and 1, attain a value equal to one for a zero shift (i.e., zero delay) and a value equal to zero for all other shifts in the signal. For practical applications, where the residuals are not purely white and the autocorrelation coefficients for nonzero shifts hover around zero, a degree of certainty of the whiteness of the residuals may be inferred by computing approximate confidence intervals around the autocorrelation coefficients.
Figure 3 shows the autocorrelation function of the residual error corresponding to the model for subject #6 depicted in Figure 2. The approximate 95% confidence intervals of the autocorrelation coefficients about zero, determined by the Portmanteau test [12], are illustrated by the shaded area. Figure 3 suggests that the 10th-order AR model captures most, but not all, of the correlations in the glucose data. This is evidenced by the apparent structure remaining in the residuals, which leads to minor but noticeable sinusoidal-type oscillations and may reflect misspecification of the selected regression model. A model that completely captures the correlations in data for a specific subject, however, is not necessarily the best overall model if we wish the model to have good generalization capabilities and be portable from individual to individual.
Figure 3. Autocorrelation function and associated 95% confidence interval of the model fit in Figure 2.
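The whiteness check on the residuals can be sketched as follows. This is a minimal illustration that uses the common large-sample band of ±1.96/√N around zero as a stand-in for the approximate 95% confidence interval; that choice is an assumption and may not match the exact procedure of the cited test. The function names and the simulated residuals are hypothetical.

```python
import numpy as np

def autocorrelation(residuals, max_lag):
    """Normalized autocorrelation coefficients for lags 0..max_lag."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    denom = np.dot(r, r)
    return np.array([np.dot(r[: len(r) - k], r[k:]) / denom
                     for k in range(max_lag + 1)])

def white_noise_bound(n, z=1.96):
    """Approximate 95% band half-width for white noise of length n."""
    return z / np.sqrt(n)

def fraction_outside(residuals, max_lag):
    """Share of nonzero-lag coefficients that fall outside the band."""
    acf = autocorrelation(residuals, max_lag)
    bound = white_noise_bound(len(residuals))
    return float(np.mean(np.abs(acf[1:]) > bound))

# Simulated residuals (invented): white noise vs. noise with leftover structure.
rng = np.random.default_rng(1)
white = rng.normal(0, 1, 2000)
structured = white + 0.5 * np.sin(2 * np.pi * np.arange(2000) / 100)

frac_white = fraction_outside(white, max_lag=50)
frac_structured = fraction_outside(structured, max_lag=50)
```

For white residuals, only a small share of the coefficients is expected to leave the band; residual oscillations like those described above push many more coefficients outside it.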
Next, we investigate the predictive power of AR models by determining the accuracy of the glucose-level predictions as a function of prediction horizon. Employing the AR model discussed earlier, developed using the first 2000 minutes of subject #6's data, we predict this subject's glucose levels for the remaining 4000 minutes for arbitrarily selected but clinically useful 30-, 60-, and 120-minute prediction horizons (Figure 4). That is, the prediction for any given future time point is made 30, 60, or 120 minutes before that time. For example, predictions for 3000 minutes are made at 2970, 2940, and 2880 minutes for the 30-, 60-, and 120-minute prediction horizons, respectively. As expected, the prediction accuracy decreases as the prediction horizon increases. This may be quantified by computing the root mean square error (RMSE) between measured and predicted glucose levels over the 4000 predicted points. Figure 4 shows that while predictions for the 30-minute horizon are quite accurate, exhibiting a small prediction delay and an RMSE of 22.2 mg/dl, the accuracy deteriorates for longer horizons, with considerable phase shifts and a larger RMSE (53.8 mg/dl) for the 120-minute-ahead predictions.
Figure 4. Autoregressive model predictions for subject #6 for three different prediction horizons: 30 minutes (top), 60 minutes (middle), and 120 minutes (bottom). The first 2000 minutes or data points (shaded area) are used to “learn” the model …
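The horizon-dependent predictions and the RMSE described above can be sketched as follows. This is a minimal illustration, assuming 1-minute sampling and that an h-minute-ahead prediction is obtained by iterating the AR recursion h times from the last available sample; the function names are hypothetical. The demonstration uses a noiseless sinusoid, which satisfies an exact second-order recursion, rather than glucose data.

```python
import numpy as np

def predict_ahead(history, b, horizon):
    """Iterate y[n] = sum_i b[i] * y[n-1-i] for `horizon` steps past `history`."""
    b = np.asarray(b, dtype=float)
    m = len(b)
    # Newest sample first, so window[i] lines up with b[i].
    window = np.asarray(history, dtype=float)[-m:][::-1].copy()
    for _ in range(horizon):
        nxt = float(np.dot(b, window))
        window = np.concatenate(([nxt], window[:-1]))
    return float(window[0])

def horizon_predictions(y, b, horizon, start):
    """Predict y[t] for t >= start using only samples up to index t - horizon."""
    y = np.asarray(y, dtype=float)
    return np.array([predict_ahead(y[: t - horizon + 1], b, horizon)
                     for t in range(start, len(y))])

def rmse(measured, predicted):
    """Root mean square error between two equally long sequences."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((measured - predicted) ** 2)))

# Demonstration: sin(w n) obeys y[n] = 2 cos(w) y[n-1] - y[n-2] exactly.
w = 2 * np.pi / 300
y = np.sin(w * np.arange(600))
b = np.array([2 * np.cos(w), -1.0])

preds_30 = horizon_predictions(y, b, horizon=30, start=200)
error_30 = rmse(y[200:], preds_30)
```

With real data, the same call with `horizon` set to 30, 60, or 120 and `start` set to the end of the training window gives one RMSE per horizon, which is the comparison summarized in the text.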
To further assess the utility of the predictions using clinically acceptable metrics, we perform Clarke error grid analysis [13], which maps pairs of sensor and predicted glucose concentrations into five zones, A to E, of varying degrees of accuracy and inaccuracy of glucose estimations. Values in zones A and B are clinically acceptable, whereas values in zones C, D, and E are potentially dangerous, with an increasing chance of incorrect treatment as the points move from zones C to D to E.
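The zone assignment described above can be sketched as follows. This is a minimal illustration that follows one widely circulated formulation of the Clarke zone boundaries (in mg/dl); the exact boundary conventions used for the results reported here are not stated in the text, so the thresholds below should be read as an assumption. The function names and sample pairs are hypothetical.

```python
from collections import Counter

def clarke_zone(reference, predicted):
    """Return the Clarke error grid zone ('A'..'E') for one pair in mg/dl."""
    y, yp = float(reference), float(predicted)
    # Zone A: both hypoglycemic, or prediction within 20% of the reference.
    if (y <= 70 and yp <= 70) or (0.8 * y <= yp <= 1.2 * y):
        return "A"
    # Zone E: prediction and reference on opposite extremes.
    if (y >= 180 and yp <= 70) or (y <= 70 and yp >= 180):
        return "E"
    # Zone C: errors that would lead to overcorrection.
    if (70 <= y <= 290 and yp >= y + 110) or \
       (130 <= y <= 180 and yp <= (7.0 / 5.0) * y - 182):
        return "C"
    # Zone D: failure to detect hypo- or hyperglycemia.
    if (y >= 240 and 70 <= yp <= 180) or \
       (y <= 175.0 / 3.0 and 70 <= yp <= 180) or \
       (175.0 / 3.0 <= y <= 70 and yp >= (6.0 / 5.0) * y):
        return "D"
    # Zone B: everything else (benign deviations).
    return "B"

def zone_percentages(reference_values, predicted_values):
    """Percentage of pairs falling in each zone."""
    zones = [clarke_zone(r, p) for r, p in zip(reference_values, predicted_values)]
    counts = Counter(zones)
    return {z: 100.0 * counts.get(z, 0) / len(zones) for z in "ABCDE"}

# Invented pairs for illustration (not study data).
reference = [100, 100, 250, 200, 60]
predicted = [105, 130, 100, 60, 65]
summary = zone_percentages(reference, predicted)
acceptable = summary["A"] + summary["B"]   # clinically acceptable share
```

Summing zones A and B, as in the last line, gives the single "clinically acceptable" percentage used when comparing models.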
The Clarke error grid in Figure 5 shows the 30-minute prediction horizon results for the corresponding 4000 predictions in Figure 4. The results are also summarized in the Clarke error grid table (under the column marked 6-6, indicating that a model derived from subject #6's data is used to predict subject #6). The majority of the points (85.3%) lie in zone A, 13.3% in zone B, and the remaining 1.4% in zone D. The table also shows results for the corresponding 60-minute prediction horizon, where 66.2% of the pairs lie in zone A, 31.1% in zone B, 0.6% in zone C, and 2.1% in zone D.
Figure 5. Clarke error grid analysis for the 30-minute prediction horizon for the last 4000 data points of subject #6's data using a model based on that subject's first 2000 data points (6-6 results in the top panel of the Clarke error grid table). Over 85% of the pairs fall in zone …
Clarke Error Grid Analysis for 30- and 60-minute Prediction Horizons for AR Models Based on Subjects #6 and #8
The Clarke error grid is used here (as one of two performance metrics) because of its clinical acceptability and as a common basis for comparison with other glucose-management algorithms [3]. We note, however, that it has limitations in assessing the performance of CGM devices [14]. In particular, it does not account for temporal dependencies in the signal and only provides a composite analysis, where all errors are treated equally (as a percentage) without accounting for consistent errors.
Finally, to determine whether AR models can be made portable from individual to individual without any need for model tuning, we apply the model developed using subject #6's data to predict the entire glucose-level profile (6000 minutes) of subject #8 for the three prediction horizons (6-8 results in Figure 6). The results are very similar (in terms of RMSE) to those in Figure 4, where a model developed using subject #6's data is used to predict unseen data for subject #6 (6-6), and to those obtained when the first 2000 minutes of subject #8's data are used to predict that subject's remaining 4000 minutes (8-8 results not shown). The results are also similar in terms of the Clarke error grid analysis summarized in the Clarke error grid table. Comparing 8-8 results with 6-8 results, we notice only a slight deterioration in the percentage of points falling in zones A plus B, for both 30- and 60-minute prediction horizons, when the model for subject #6 is used to predict subject #8.
Figure 6. Autoregressive model predictions where the model based on the first 2000 minutes of subject #6's data is employed to predict subject #8 (6-8 results). The predictions are provided for three prediction horizons: 30 minutes (top panel), 60 minutes (middle …
Similarly, Figure 7 shows the results when a model developed using the first 2000 minutes of subject #8's data (with order m = 10) is used to predict the entire glucose-level profile for subject #6 (8-6 results). Comparison of the prediction accuracy over the last 4000 minutes between this model and the one developed using subject #6's data indicates modest deterioration, with 8-6 results (Figure 7) showing slightly higher RMSEs than 6-6 results (Figure 4). The Clarke error grid analysis summarized in the Clarke error grid table indicates a small degradation when the model for subject #8 is used to predict subject #6 (8-6 versus 6-6), suggesting that results achieved with portable models are not significantly inferior to those obtained with individually tuned models.
Figure 7. Autoregressive model predictions where the model based on the first 2000 minutes of subject #8's data is employed to predict subject #6 (8-6 results). The predictions are provided for three prediction horizons: 30 minutes (top panel), 60 minutes (middle …
These results suggest that there is very little interindividual variability in the autocorrelation of time-series glucose data. Indeed, comparative analysis of the AR model coefficients b for subjects #6 and #8 indicates that the three most significant (latest) coefficients, which are orders of magnitude larger than the remaining seven coefficients, are very similar.
To provide further evidence of the portability of AR models, we perform cross-subject predictions over all nine subjects who passed the modeling exclusion criteria. The comparison table for these nine subjects reports two sets of predictions for each subject on the basis of RMSE and Clarke error grid analysis (zones A plus B). In the first set, predictions for each subject are obtained by using the subject's first 2000 minutes to develop the subject's model (with m = 10), which is subsequently used to predict the remaining 4000 minutes of glucose data (labeled as “self”). In the second set, each subject's model, developed as discussed earlier, is used to predict the entire glucose profile (6000 minutes) for each of the other eight subjects. The table entries for these cross-subject predictions are the average results and associated standard deviations for each subject, based on predictions for that subject made with the models of the other eight subjects (“cross-subject”).
Comparison of Individually Tuned (Self) and Portable (Cross-Subject) Model Performance for the Nine Subjects Who Passed Modeling Exclusion Criteria
The comparison table shows that, as expected, for both metrics the 30-minute-ahead predictions are consistently more accurate than the 60-minute predictions. In terms of the Clarke error grid, for 30-minute-ahead predictions, 95.8 to 100.0% of the results fall in the clinically acceptable zones A and B. More significantly, the results indicate that there is only a modest decrement in performance between individually tuned self-models and portable cross-subject models. For example, for the 30-minute-ahead predictions, the maximum decrement in performance, observed in both subjects #6 and #15, is only 2.8 percentage points (98.6% versus 95.8% for #6). Of additional importance is the negligible variance of the cross-subject results for each of the nine subjects, indicating that the model of any one subject is capable of adequately predicting each of the other eight subjects. These results strongly support the hypothesis that AR models can be made portable.
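The self versus cross-subject comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: coefficients come from plain least squares, h-minute-ahead predictions are obtained by iterating the AR recursion, and the nine "subjects" are synthetic series with invented parameters and hypothetical IDs, so the resulting numbers say nothing about the study data. All function names are hypothetical.

```python
import numpy as np

M, TRAIN, HORIZON = 10, 2000, 30   # model order, training minutes, horizon

def fit_ar(y, m):
    """Least-squares AR(m) coefficients b with y[n] ~ sum_i b[i] * y[n-1-i]."""
    X = np.column_stack([y[m - 1 - i : len(y) - 1 - i] for i in range(m)])
    b, *_ = np.linalg.lstsq(X, y[m:], rcond=None)
    return b

def rmse_at_horizon(y, b, horizon, start):
    """RMSE of horizon-step-ahead predictions of y[start:] (vectorized)."""
    m = len(b)
    targets = np.arange(start, len(y))
    origins = targets - horizon        # last sample available per prediction
    # Row k holds y[origin], y[origin-1], ..., y[origin-m+1] (newest first).
    W = np.stack([y[origins - i] for i in range(m)], axis=1)
    for _ in range(horizon):
        W = np.column_stack([W @ b, W[:, :-1]])
    return float(np.sqrt(np.mean((W[:, 0] - y[targets]) ** 2)))

def synthetic_subject(seed, n=6000):
    """Invented glucose-like series; parameters are arbitrary."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    slow = 40 * np.sin(2 * np.pi * t / rng.uniform(250, 400) + rng.uniform(0, 6.28))
    fast = 15 * np.sin(2 * np.pi * t / rng.uniform(90, 150) + rng.uniform(0, 6.28))
    return 150 + slow + fast + rng.normal(0, 0.5, n)

subjects = {sid: synthetic_subject(seed=sid) for sid in range(1, 10)}
models = {sid: fit_ar(y[:TRAIN], M) for sid, y in subjects.items()}

results = {}
for sid, y in subjects.items():
    # "Self": own model, evaluated on the minutes not used for training.
    self_rmse = rmse_at_horizon(y, models[sid], HORIZON, start=TRAIN)
    # "Cross-subject": every other subject's model on the entire profile,
    # starting at the first target that has a full window of past samples.
    cross = [rmse_at_horizon(y, models[other], HORIZON, start=HORIZON + M - 1)
             for other in models if other != sid]
    results[sid] = (self_rmse, float(np.mean(cross)), float(np.std(cross, ddof=1)))
```

Each entry of `results` holds the self RMSE followed by the mean and standard deviation of the eight cross-subject RMSEs, mirroring the layout of the comparison described in the text; the same loop could tally zones A plus B instead of RMSE.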