This study evaluated the accuracy and large inaccuracy of the Freestyle® Navigator (FSN) (Abbott Diabetes Care, Alameda, CA) and Dexcom® SEVEN® PLUS (DSP) (Dexcom, Inc., San Diego, CA) continuous glucose monitoring (CGM) systems during closed-loop studies.
Paired CGM and plasma glucose values (7,182 data pairs) were collected every 15–60 min from 32 adults (36.2±9.3 years) and 20 adolescents (15.3±1.5 years) with type 1 diabetes who participated in closed-loop studies. Levels 1, 2, and 3 of large sensor error, in order of increasing severity, were defined as absolute relative deviation greater than or equal to ±40%, ±50%, and ±60% at a reference glucose level of ≥6 mmol/L or absolute deviation greater than or equal to ±2.4 mmol/L, ±3.0 mmol/L, and ±3.6 mmol/L at a reference glucose level of <6 mmol/L.
Median absolute relative deviation was 9.9% for FSN and 12.6% for DSP. Proportions of data points in Zones A and B of Clarke error grid analysis were similar (96.4% for FSN vs. 97.8% for DSP). Large sensor over-reading, which increases risk of insulin over-delivery and hypoglycemia, occurred two- to threefold more frequently with DSP than FSN (once every 2.5, 4.6, and 10.7 days of FSN use vs. 1.2, 2.0, and 3.7 days of DSP use for Level 1–3 errors, respectively). At Levels 2 and 3, large sensor errors lasting 1h or longer were absent with FSN but persisted with DSP.
FSN and DSP differ substantially in the frequency and duration of large inaccuracy despite only modest differences in conventional measures of numerical and clinical accuracy. Further evaluations are required to confirm that FSN is more suitable for integration into closed-loop delivery systems.
Because of lifelong dependency on exogenous insulin, type 1 diabetes imposes a heavy burden on the individual and society. Maintaining good glycemic control while avoiding hypoglycemia remains an elusive goal for many subjects with type 1 diabetes despite progress in therapeutic options such as insulin pump therapy and continuous glucose monitoring (CGM).1,2 Novel approaches are needed before a cure in the form of cell-based therapy or immunotherapy becomes available.
Closed-loop insulin delivery systems, modulating insulin delivery in glucose-responsive fashion, have the potential to revolutionize the treatment of type 1 diabetes.3 The last decade has witnessed a rapid expansion of research into closed-loop systems informed by subcutaneous glucose levels measured by commercial CGM devices.4–12 Despite significant progress, continuous subcutaneous glucose monitors are less accurate than capillary glucose measurements. Some of these deviations are physiological, such as at times of rapid glucose change, reflecting the time taken for the movement of glucose between different body compartments,13 but at other times sensors deviate from reference glucose because of inaccurate calibration, pressure dropouts, sensor dislodgment, and other artifacts. Such unpredictable sensor deviations of sufficient length and magnitude may lead to temporary insulin over- or under-delivery during closed-loop operation, the impact of which will depend on factors such as prevailing glucose, relation to meals and exercise, and insulin sensitivity. Despite these subject-specific and circumstantial factors, larger and prolonged sensor inaccuracies are more likely to cause clinically significant insulin over- or under-delivery during closed-loop operation, leading to hypo- or hyperglycemia.
The objective of the current study was to explore novel means to assess CGM inaccuracy and determine the frequency and duration of large sensor errors during our previous closed-loop studies using two different types of commercially available monitors: the Freestyle® Navigator (FSN) (Abbott Diabetes Care, Alameda, CA) and the Dexcom® SEVEN® PLUS (DSP) (Dexcom Inc., San Diego, CA).
We analyzed retrospectively data from five closed-loop studies performed at a clinical research facility.5,7,14–16 All study protocols were approved by the Research Ethics Committee and carried out in accordance with the Declaration of Helsinki. Written informed consent was obtained from participants. Subjects 15 years and younger provided written assent with accompanying written consent provided by a parent or guardian.
Between April 2007 and August 2011, 32 adults including 12 pregnant women (36.2±9.3 years old) and 20 adolescents (15.3±1.5 years old) were enrolled from adult and pediatric diabetes clinics at Cambridge, Norwich, and London, United Kingdom. Key inclusion criteria were type 1 diabetes mellitus (World Health Organization criteria or confirmed C-peptide negative) over at least 1 year and treatment with insulin pump therapy. Key exclusion criteria were concurrent illness or medications likely to interfere with interpretation of study results, poor glycemic control (glycated hemoglobin >10%), recurrent severe hypoglycemia, significant hypoglycemia unawareness, and clinically significant nephropathy, neuropathy, or retinopathy.
The closed-loop studies adopted a randomized crossover design and were conducted at the Wellcome Trust Clinical Research Facility, Addenbrooke's Hospital, Cambridge. Subjects attended the research facility on two or three occasions separated by 1–5 weeks to contrast alternative treatments (closed-loop insulin delivery and conventional treatment5,7,14,15 or two closed-loop visits16). Study visits lasted between 18 and 36 h. Subjects ate main meals (40–100 g of carbohydrates) and received premeal insulin boluses calculated according to individual insulin-to-carbohydrate ratios. Four studies involved periods of exercise (lasting 30–60 min). During closed-loop visits, basal insulin delivery was directed by a model predictive control algorithm according to sensor glucose levels.4,5 Preprogrammed insulin pump rates were applied during conventional treatment. Further details about individual studies are provided in Supplementary Table S1 (Supplementary Data are available online at www.liebertonline.com/dia).
Sensors were inserted 24–48h before the start of each visit and were calibrated according to the manufacturer's instructions except in one study (DSP), which adopted two additional fingerstick calibrations over the 24-h study period. FSN was calibrated using the built-in glucometer, and DSP was calibrated using quality-controlled LifeScan OneTouch® (Ortho Clinical Diagnostics, High Wycombe, United Kingdom) or Bayer Contour® (Bayer plc, Newbury, United Kingdom) glucometers. Venous blood samples were taken to measure reference plasma glucose by the YSI 2300 STAT Plus™ analyzer (Yellow Springs Instrument, Farnborough, UK) every 60min or every 15–30min after meals, after exercise, and at low glucose levels.
Numerical accuracy was assessed by absolute deviation, absolute relative deviation (ARD), and International Organization for Standardization (ISO) criteria. Clinical accuracy was evaluated using Clarke error grid analysis.18
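The two deviation measures named above can be sketched in a few lines. This is a minimal illustration, assuming the usual definitions (absolute deviation as |sensor − reference|, ARD as that difference divided by the reference value, in percent); the function names and the sample pairs are hypothetical and are not study data.

```python
from statistics import mean, median


def absolute_deviation(sensor, reference):
    """Absolute deviation in the same units as the inputs (e.g., mmol/L)."""
    return abs(sensor - reference)


def absolute_relative_deviation(sensor, reference):
    """ARD in percent, relative to the reference plasma glucose value."""
    return abs(sensor - reference) / reference * 100.0


def summarize_ard(pairs):
    """Return (mean ARD, median ARD) for a list of (sensor, reference) pairs."""
    ards = [absolute_relative_deviation(s, r) for s, r in pairs]
    return mean(ards), median(ards)


# Illustrative (sensor, reference) pairs in mmol/L, not taken from the study.
pairs = [(5.5, 5.0), (9.0, 10.0), (7.8, 6.0)]
mean_ard, median_ard = summarize_ard(pairs)
print(f"mean ARD {mean_ard:.1f}%, median ARD {median_ard:.1f}%")
```

Reporting both the mean and the median is useful because a few large deviations pull the mean upward while leaving the median largely unchanged.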
Three levels of large sensor error were defined: Level 1 (least severe) to Level 3 (most severe). ARD at a plasma glucose level of ≥6 mmol/L was used to distinguish between error levels (Table 1). At a plasma glucose level of <6 mmol/L, absolute deviation was used. Error levels were quantified during sensor over-reading and sensor under-reading. The former increases the risk of hypoglycemia; the latter increases the risk of hyperglycemia. A representative CGM and reference glucose trace presenting the three error levels is shown in Supplementary Figure S1.
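The level definitions can be expressed as a small classifier. The sketch below assumes the thresholds stated in the abstract (40%, 50%, and 60% ARD at a reference of ≥6 mmol/L; 2.4, 3.0, and 3.6 mmol/L absolute deviation below 6 mmol/L); the function name and return convention are hypothetical, and Table 1 remains the authoritative definition.

```python
def classify_large_error(sensor, reference):
    """Classify one sensor-reference pair (both in mmol/L).

    Returns (level, direction): level is 0 (no large error) to 3, and
    direction is "over", "under", or None when level is 0.
    """
    deviation = sensor - reference
    if reference >= 6.0:
        # Relative criterion: ARD in percent.
        magnitude = abs(deviation) / reference * 100.0
        thresholds = (40.0, 50.0, 60.0)
    else:
        # Absolute criterion: deviation in mmol/L.
        magnitude = abs(deviation)
        thresholds = (2.4, 3.0, 3.6)
    # The level is the number of thresholds met or exceeded.
    level = sum(magnitude >= t for t in thresholds)
    if level == 0:
        return 0, None
    return level, "over" if deviation > 0 else "under"


print(classify_large_error(10.0, 6.0))  # large over-reading at reference >= 6 mmol/L
print(classify_large_error(2.5, 5.0))   # under-reading judged by absolute deviation
print(classify_large_error(6.5, 6.0))   # small deviation, no large error
```

Switching to absolute deviation below 6 mmol/L avoids flagging small absolute differences that would look large in relative terms when the reference value is low.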
Incidence was evaluated as the number of events over the total duration of sensor use expressed as number of events per 100 days of sensor use. Events separated by 30min or longer were counted independently. The duration of events was calculated using 1-min CGM and 1-min interpolated reference plasma glucose data. For numerical accuracy metrics and sensor error duration data, a repeated-measures linear regression model was fitted to native values (or the ranked normal transformation for non-normally distributed variables) to compare the two CGM systems. Plasma glucose, glucose variability, and sensor performance during each admission were compared using the independent-samples t test for normally distributed data or the Mann–Whitney U test for non-normally distributed variables. Calculations and comparisons were carried out using MATLAB version R2011b (MathWorks®, Natick, MA) and SPSS version 19 (IBM Software, Portsmouth, United Kingdom). Data are presented as mean (SD) or median (interquartile range) values unless stated otherwise. A value of P<0.05 was considered statistically significant.
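The event counting and incidence calculation can be illustrated as follows. This is a sketch under stated assumptions: flagged time points lie on a 1-min grid, time points closer than 30 min to the end of the previous event are merged into it, and incidence is the number of events divided by days of sensor use, multiplied by 100. The function names and the sample data are hypothetical, and the authors' exact merging rule may differ.

```python
def group_events(flagged_minutes, min_gap_min=30):
    """Merge flagged 1-min time points into events.

    Assumption: a flagged minute joins the current event when it is less
    than `min_gap_min` after that event's last flagged minute; a gap of
    30 min or longer starts a new, independently counted event.
    """
    events = []  # each event is [first_minute, last_minute, flagged_count]
    for minute in sorted(flagged_minutes):
        if events and minute - events[-1][1] < min_gap_min:
            events[-1][1] = minute
            events[-1][2] += 1
        else:
            events.append([minute, minute, 1])
    return events


def incidence_per_100_days(n_events, sensor_hours):
    """Number of events per 100 days of sensor use."""
    return n_events / (sensor_hours / 24.0) * 100.0


# Illustrative flagged minutes since sensor start, not study data.
flags = [10, 11, 12, 20, 100, 101]
events = group_events(flags)
incidence = incidence_per_100_days(len(events), sensor_hours=48)
print(events, incidence)
```

In this example the flag at minute 20 is merged into the first event because it follows minute 12 by less than 30 min, whereas the flags at minutes 100 and 101 form a second event.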
Paired CGM and reference glucose values were collected over 2,988h, yielding 7,182 sensor–reference plasma glucose (YSI) pairs. Duration of sensor use was similar for FSN (1,548h) and DSP (1,440h) (Table 2). Because of differences in sampling frequency, more sensor–YSI pairs were collected in studies using FSN (FSN, 4,218 pairs; DSP, 2,964 pairs), but the percentage of pairs in hypoglycemia (<3.9 mmol/L) was similar (9.3% vs. 9.5%, respectively). The exposure to post-exercise periods was longer for FSN (75h) compared with DSP (36h), and exposure to postprandial conditions was higher for DSP (168 meals) than FSN (136 meals) (see Supplementary Table S2). Further information about studies such as mean glucose and variability and sensor accuracy is shown in Supplementary Table S3. Pooled analysis showed that studies involving FSN achieved lower plasma glucose and lower variability than DSP (Supplementary Table S4), predominantly because of the study involving pregnant women, which incidentally also had the highest ARD for FSN.
Numerical and clinical accuracy is summarized in Table 2, indicating a slightly better performance by FSN. Over the observed glucose range from 1.6 to 20.7 mmol/L, mean ARD was slightly lower with FSN compared with DSP (13.9% vs. 16.4%, respectively; P=0.002). Median ARD was also slightly lower, and percentage of data points in Clarke error grid analysis Zone A and ISO criteria was higher with FSN. During euglycemia (3.9–10 mmol/L) and hyperglycemia (>10 mmol/L), FSN was more accurate, and during hypoglycemia DSP was more accurate. Proportions of subjects with a median ARD greater than 15% were 25% with FSN and 42% with DSP.
Incidence of large sensor over-reading was two to three times higher with DSP than FSN across the three error levels (Fig. 1). With FSN, the incidence of large sensor over-reading was 40.0, 21.8, and 9.3 per 100 days at Level 1, 2, and 3, respectively, indicating that large sensor over-reading at Level 1, 2, and 3 occurred, respectively, once every 2.5, 4.6, and 10.7 days of FSN use. With DSP, the incidence of large sensor over-reading was 83.3, 50.0, and 27.0, respectively, per 100 days, so that an event occurred every 1.2, 2.0, and 3.7 days, respectively. Incidence of large error declined as the error level increased but more so for FSN than DSP. Incidence of large sensor under-reading was similar between FSN and DSP. Although we have pooled the incidence of sensor over- and under-reading across the entire glucose range (Fig. 1), it may be relevant to stratify the incidence during hypoglycemia, euglycemia, and hyperglycemia separately as the anticipated detrimental impact may vary. The incidence of sensor over-reading and under-reading stratified according to reference glucose is shown in Supplementary Tables S5 and S6. In particular, there were no sensor over-reading episodes for FSN above a glucose level of 10 mmol/L. The percentage of data points in each error level is shown in Supplementary Table S9. The percentage of data points with sensor over-reading was consistently higher with DSP across all three error levels, while the percentage of data points in sensor under-reading was comparable. Number and percentage of admissions with none, one, two, three, four, and five or more sensor errors are shown in Supplementary Table S10.
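The conversion from incidence per 100 days to the average interval between events is a simple reciprocal, sketched here with the over-reading incidences quoted above. The function name is hypothetical.

```python
def days_between_events(incidence_per_100_days):
    """Average number of days of sensor use per event."""
    return 100.0 / incidence_per_100_days


# Over-reading incidences per 100 days at Levels 1-3, as reported in the text.
fsn_incidence = [40.0, 21.8, 9.3]
dsp_incidence = [83.3, 50.0, 27.0]

fsn_days = [days_between_events(x) for x in fsn_incidence]
dsp_days = [days_between_events(x) for x in dsp_incidence]

for level, (f, d) in enumerate(zip(fsn_days, dsp_days), start=1):
    print(f"Level {level}: FSN every {f:.2f} days, DSP every {d:.2f} days")
```

The results match the reported intervals to within rounding. The Level 3 FSN value comes out near 10.75 days rather than exactly 10.7, presumably because the published incidence of 9.3 is itself rounded.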
The duration of large errors combining sensor over-reading and under-reading is shown in Figure 2 and Supplementary Tables S7 and S8. Median sensor error duration at Level 3 tended to be higher with DSP but did not reach statistical significance: 11 (2–16) min for FSN versus 29 (8–42) min for DSP (P=0.052). At Level 1, 2, and 3, respectively, 52%, 57%, and 78% of FSN errors and 57%, 41%, and 58% of DSP errors were less than 30min long. Large errors lasting longer than 1h were absent with FSN at Level 2 and 3 but persisted with DSP (respectively, 0% and 0% vs. 12.5% and 10.6% of all events) (Fig. 3).
We propose novel measures of CGM performance with a focus on reliability. We define three levels of large sensor inaccuracy according to severity and anticipated detrimental impact on safety of closed-loop glucose control. The proposed metrics identified important differences in the incidence of large sensor over-reading between two commercial CGM systems despite modest differences in traditional measures of numerical and clinical accuracy. Sensor over-reading occurred two to three times more commonly with DSP than with FSN, and at error Level 2 and 3 no FSN error lasted longer than 1 h. In contrast, the incidence of sensor under-reading was similar between the two sensors.
Traditionally, the accuracy of CGM devices is expressed using numerical and clinical standards.19 Numerical accuracy is based on mean or median ARD and ISO criteria,20 whereas clinical accuracy is often expressed using the Clarke or consensus error grid analyses.18,21 Many of these measures, including the Clarke and consensus error grids, were developed to assess the accuracy of point-of-care glucose testing devices. Despite potential limitations, numerical and clinical accuracy metrics are widely used to assess CGM devices.22–25
Despite widespread use, existing criteria may not be appropriate for analyzing accuracy of CGM from a closed-loop perspective. For example, a sensor reading of 180mg/dL when the actual glucose level is 110mg/dL, nearly 64% sensor over-reading, is classified as a benign sensor error (Zone B) in Clarke error grid analysis but, if prolonged, may lead to harmful insulin over-delivery during closed-loop operations.26 Continuous glucose error grid analysis is a more recently developed tool, specifically designed to evaluate the CGM data in terms of both point accuracy and rate of change accuracy.27 However, none of the above measures provides information about the duration, severity, and incidence of individual large sensor errors. Yet, at an individual level, it is the largest and longest sensor inaccuracy that is likely to have the greatest impact on closed-loop insulin delivery, and at present there are no established criteria to evaluate the performance of CGM from a closed-loop perspective. Computer simulation studies suggest occurrence of significant hypoglycemia (glucose level of ≤36mg/dL) when the sensor error is consistently more than 45%.26 It is conceivable that a milder degree of hypoglycemia may occur at much lower degrees of sensor error.
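The 180 versus 110 mg/dL example can be checked numerically and set against the error levels used in this study. The sketch assumes a conversion factor of about 18.016 mg/dL per mmol/L for glucose and the relative thresholds of 40%, 50%, and 60% that apply at a reference of ≥6 mmol/L; the variable names are hypothetical.

```python
MGDL_PER_MMOLL = 18.016  # assumed glucose conversion factor (mg/dL per mmol/L)

sensor_mgdl = 180.0
reference_mgdl = 110.0

# Relative over-reading in percent of the reference value.
over_reading_pct = (sensor_mgdl - reference_mgdl) / reference_mgdl * 100.0

# Reference in mmol/L decides which criterion applies (relative at >= 6 mmol/L).
reference_mmoll = reference_mgdl / MGDL_PER_MMOLL
uses_relative_criterion = reference_mmoll >= 6.0

# Count how many relative thresholds are met or exceeded.
level = sum(over_reading_pct >= t for t in (40.0, 50.0, 60.0))

print(f"over-reading {over_reading_pct:.1f}% at reference {reference_mmoll:.1f} mmol/L")
print(f"relative criterion applies: {uses_relative_criterion}, level: {level}")
```

Under these assumptions, a reading that Clarke error grid analysis places in Zone B meets the Level 3 threshold, which illustrates why zone-based metrics may understate the risk relevant to closed-loop insulin delivery.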
Assessment of the incidence and duration of large sensor errors is an important determinant for the use of CGM systems in closed-loop systems for outpatient use. Our findings are particularly relevant for closed-loop operation but may also have relevance for routine clinical use of CGM, where erroneous sensor readings may be used to adjust insulin doses despite common advice to base treatment decisions on fingerstick values.
To our knowledge, this is the first study quantifying the incidence and duration of large inaccuracy for commercially available CGM systems. In the absence of established guidelines, we identified Level 1–3 errors (Table 1), building on our previous simulation work26 as well as real-life experience of closed-loop studies.4–7
Measured as median ARD, previous studies have shown FSN accuracy to be in the range of 9–12%22,23 and DSP accuracy to be around 13%.24 Our results concur. Proportions of data points in Zones A and B of Clarke error grid analysis in our study were broadly similar (96.4% for FSN vs. 97.8% for DSP), in agreement with previously published data.22,24 However, the percentage in Zone B was much higher with DSP (18% for FSN vs. 27% for DSP), particularly in the sensor over-reading zone. Thus an increased percentage of values in Zone B may be an indicator of large sensor inaccuracies, suggesting that Clarke error grid analysis may not provide sufficient granularity to evaluate inaccuracies.
Strengths of the present study include addressing an unmet need to assess reliability of CGM systems and using data from studies performed under uniform single-center controlled conditions. The single most important limitation of our study is the use of different studies, with differences in exposure to postprandial and post-exercise conditions, to assess the accuracy of two CGM systems. However, all participating subjects had type 1 diabetes and were on insulin pump therapy before and during the studies. Studies involving DSP had higher mean glucose and higher glucose variability, and we cannot exclude a possible impact of this on sensor performance. However, we were investigating large sensor errors in excess of 40%, and it is unlikely that the observed considerable differences originated in small differences in mean glucose or variability. All studies were conducted prospectively, but the current analysis was undertaken retrospectively, and the determination of incidence and duration of large sensor error was not an a priori end point. DSP was over-calibrated in one study, but during other studies calibration was according to the manufacturer's instructions. Of note, DSP is being replaced by Gen 4, which is already available in Europe and has reported improved performance, providing an exciting opportunity for closed-loop systems.28,29
In summary, our data suggest significant differences in the rates of large sensor over-reading between two commercially available CGM systems with fairly comparable clinical accuracy. This highlights limitations of current accuracy metrics for analyzing CGM data. Based on our findings FSN may be safer than DSP during closed-loop operations, but further studies where both CGM systems are evaluated in the same patient under identical study conditions are required before firm conclusions can be made. Further work is required to understand the full clinical significance of large sensor errors.
We thank study participants, parents of adolescents, and staff at the Wellcome Trust Clinical Research Facility, Cambridge University Hospitals NHS Foundation Trust, Addenbrooke's Hospital, Cambridge, United Kingdom. Studies described in this manuscript were funded by the Juvenile Diabetes Research Foundation, Diabetes UK, U.S. National Institutes of Health, and European Commission under the 7th Framework Programme. Additional support by the NIHR Cambridge Biomedical Research Centre and the Medical Research Council Centre for Obesity and Related Metabolic Diseases is acknowledged. We thank Abbott Diabetes Care and Dexcom for support; these companies provided no funding and played no other role in clinical studies or data analysis.
R.H. reports having received speaker honoraria from Minimed Medtronic, LifeScan, Eli Lilly, and Novo Nordisk, is serving on an advisory panel for Animas and Minimed Medtronic, is receiving license fees from BBraun, and has served as a consultant to BBraun and Profil. M.L.E. reports having received speaker honoraria from Abbott Diabetes Care and Animas and is serving on an advisory board for Medtronic, Roche, and Cellnovo. J.M.A., D.E., K.K., K.C., H.R.M., C.L.A., M.E.W., A.H., D.B.D., M.N., and L.L. report no competing financial interests. R.H. conceptualized the study, acts as the guarantor, and held overall responsibility for the conduct of closed-loop studies at Cambridge. R.H. and L.L. co-designed the present analysis. L.L., M.N., and R.H. wrote the manuscript. M.N. and L.L. were responsible for data analysis. L.L., J.A., D.E., K.K., and K.C. conducted clinical studies. M.E.W. and A.H. contributed to algorithm development and simulation studies. M.L.E., H.R.M., D.B.D., and C.L.A. acted as Principal Clinical Investigators for closed-loop studies.