The SUPREME-DM project has united a large consortium of researchers with extensive expertise in childhood, adult, and gestational diabetes to identify more than 1 million unique individuals with diabetes from comprehensive EHR and administrative data of 11 integrated health systems, of whom 428,349 had incident diabetes. Because the DataLink is constructed from comprehensive inpatient, outpatient, pharmaceutical dispensing, and laboratory results data available from the EHR, clinical, and administrative databases of each of these health care systems, and because these data are extracted from defined populations with a known denominator, the DataLink is a unique resource for conducting comparative effectiveness research, surveillance, and epidemiologic studies of unprecedented scale and clinical detail.
Use of registries has enhanced medical care for patients with diabetes in individual health care systems for 2 decades (1). Indeed, several of the participating sites were early developers of diabetes registries derived from electronic data (2,3,11,15,16) and have used these registries for clinical care, quality improvement, and research purposes. However, these registries have traditionally been limited to patients served by only 1 health care delivery system, and variation in how registries were created has impeded cross-system comparisons (9). One major goal of the SUPREME-DM DataLink is to standardize data definitions across participating systems to provide the best possible estimates of diabetes and its complications. The variability among organizations in the proportion of people with diabetes and the source of recognition of incident cases emphasizes the need for this next step.
Although useful for more limited analyses, other previous or existing electronic registries cannot provide equivalent data for analysis. For example, in conjunction with the Centers for Disease Control and Prevention, a collaboration of 3 managed care organizations developed a unified system in 1998 for conducting diabetes surveillance, tracking health services, and delivering preventive care (17). That system has not been maintained. The Department of Veterans Affairs (VA) has an excellent linked national database of VA patients that has been used to identify patients with diabetes, but the population is not representative because patients are predominantly male and sicker than the overall population of patients with diabetes (8). One diabetes database recently developed by the University of Pittsburgh Medical Center (UPMC) represents a more heterogeneous population, combining data from a large number of insurers (18) but covering only a single region. It is unclear whether the UPMC database will be routinely refreshed or whether a denominator of patients with and without diabetes can be easily identified, an essential component for estimating rates of diabetes and its complications. After assessment of other US diabetes databases, we believe the SUPREME-DM DataLink is unique in its size, comprehensiveness, and geographic coverage.
Currently, the best estimates of US adult diabetes prevalence emerge from analyses of the National Health and Nutrition Examination Survey (NHANES). Those data suggest that 7.7% of the US adult population (aged ≥20 y) had diabetes in 2005–2006 (19). Similarly, we found that 6.9% of all enrollees (including children) in the SUPREME-DM DataLink have diabetes. NHANES identifies people with diabetes on the basis of self-report and by a single, unconfirmed elevated laboratory test result. Our DataLink has much more robust parameters to confirm diabetes status. Furthermore, as a cross-sectional survey, NHANES can estimate diabetes prevalence but not diabetes incidence. The longitudinal nature of the DataLink will allow the estimation of the incidence of diabetes and its complications, a unique feature that holds promise for future research and national surveillance efforts.
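The distinction between prevalence and incidence drawn above can be illustrated with a minimal sketch. It is not the DataLink's actual method: the `Member` record, its field names, the function names, and the 4-member cohort are all hypothetical and serve only to show why a known denominator and longitudinal follow-up are needed to estimate incidence. Point prevalence needs only a snapshot of enrolled members, whereas an incidence rate needs each member's time at risk.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Member:
    """Hypothetical enrollment record; field names are illustrative only."""
    enrolled_from: date
    enrolled_to: date
    onset: Optional[date] = None  # date the case definition was first met, if ever


def point_prevalence(members: List[Member], on: date) -> float:
    """Share of members enrolled on `on` who already meet the case definition."""
    enrolled = [m for m in members if m.enrolled_from <= on <= m.enrolled_to]
    if not enrolled:
        return 0.0
    cases = sum(1 for m in enrolled if m.onset is not None and m.onset <= on)
    return cases / len(enrolled)


def incidence_rate(members: List[Member], start: date, end: date) -> float:
    """New cases per person-year at risk between `start` and `end`."""
    new_cases = 0
    person_days = 0
    for m in members:
        at_risk_from = max(start, m.enrolled_from)
        at_risk_to = min(end, m.enrolled_to)
        if at_risk_from > at_risk_to:
            continue  # not enrolled during the observation window
        if m.onset is not None and m.onset < at_risk_from:
            continue  # prevalent case: never at risk within the window
        if m.onset is not None and m.onset <= at_risk_to:
            new_cases += 1
            at_risk_to = m.onset  # time at risk ends at onset
        person_days += (at_risk_to - at_risk_from).days
    if person_days == 0:
        return 0.0
    return new_cases / (person_days / 365.25)


# Hypothetical 4-member cohort, invented for illustration.
cohort = [
    Member(date(2005, 1, 1), date(2009, 12, 31)),
    Member(date(2005, 1, 1), date(2009, 12, 31), onset=date(2005, 3, 1)),
    Member(date(2005, 1, 1), date(2009, 12, 31), onset=date(2007, 1, 1)),
    Member(date(2006, 1, 1), date(2009, 12, 31)),
]

prevalence_mid_2006 = point_prevalence(cohort, date(2006, 6, 30))
rate_2006_2007 = incidence_rate(cohort, date(2006, 1, 1), date(2007, 12, 31))
```

In this sketch a cross-sectional snapshot is enough for `point_prevalence`, but `incidence_rate` cannot be computed without enrollment dates and onset dates for every member, which is the kind of information a longitudinal resource with a defined population supplies.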
As recently noted by the Institute of Medicine (IOM), no surveillance system operates nationally and in a coordinated manner to integrate current and emerging data (20). The IOM report calls for a system that includes data on incidence and prevalence over time, primary and secondary prevention (including early detection), health outcomes following surveillance, representative samples, and disparities, noting that EHR data will play a key role in the surveillance of chronic disease. The SUPREME-DM DataLink answers that call by using the actual medical records of more than 15 million people. The comprehensive EHR data available to the DataLink can be used to conduct population-based studies of the complications of diabetes while accounting for a wide range of demographic and clinical characteristics that independently contribute to risk. Furthermore, by examining data before and after diabetes diagnosis, the SUPREME-DM DataLink can be used to study the complete natural history of hyperglycemia and its associated complications.
Despite our standardized definition of diabetes, we observed variation across sites in how members with incident diabetes were initially identified. In addition to differences in the demographic makeup of the site-specific populations, there are several possible explanations. Although each of the 11 sites participating in SUPREME-DM is an integrated health care delivery system, their organizational structures differ (even across the 6 Kaiser Permanente regions). Furthermore, laboratory tests may not use the same reference ranges in all sites, and the use of hemoglobin A1c (HbA1c) assays, although moving toward standardization, could introduce variation. Differences in how providers code diagnoses during outpatient encounters or the inclusion of diagnostic codes linked to laboratory procedures or prescriptions could also introduce variation in the identification of diabetes across sites. Incomplete data capture at some sites, specifically of laboratory tests conducted outside the system or prescriptions filled outside of system pharmacies, could also contribute to variation. Site differences in ascertainment may lead to apparent but artificial differences in diabetes duration or severity — a topic for future SUPREME-DM research. These possibilities are all under investigation. Despite these potential sources of variation in diabetes identification across sites, however, it is likely that a patient with diabetes in any of the systems will be recognized in a reasonably short period of time, especially when multiple data sources such as pharmacy, diagnosis codes, and laboratory results are used for this purpose. Indeed, approximately 85% of diabetes cases in all sites had multiple indications.
As with any observational data collected for health care and payment, there are potential limitations to the SUPREME-DM DataLink. Inconsistencies in data availability (eg, not all sites can distinguish between random and fasting glucose tests) may preclude use of the DataLink for certain purposes or require exclusion of some participating centers from specific analyses. Unrecognized or unmeasurable differences among our study sites in the use of EHRs and the completeness of data could lead to inaccuracies and potential bias in the estimation of diabetes incidence and prevalence. Findings from the patient populations of integrated health care delivery systems may not generalize to patients managed in less integrated settings, patients in other geographic areas, or uninsured populations. A common case identification algorithm was used to identify members with diabetes across all SUPREME-DM sites, but we did not have the resources to individually validate each case through medical record review. Thus, investigators conducting ancillary studies should use caution when approaching individual health plan members, because an occasional member with a coded diagnosis may not truly have diabetes. An additional limitation is the inability to distinguish with a high level of precision between members with type 1 diabetes and those with type 2 diabetes. Finally, the date of diabetes diagnosis, an important element in analyses of the natural history and clinical outcomes of diabetes, is not known for 60% of the diabetes cases.
We are expanding the DataLink to include members at risk for developing diabetes on the basis of elevated fasting glucose, glucose tolerance, or HbA1c tests that do not meet diagnostic criteria for diabetes, and to identify women with gestational diabetes. Data for additional years (2010–2012) will be added as they become available. The SUPREME-DM DataLink is a valuable resource that provides an opportunity to conduct comparative effectiveness research, epidemiologic surveillance including longitudinal analyses, and population-based care management studies of people with diabetes, gestational diabetes, and prediabetes, and to explore associated risk factors, complications, and health outcomes in new ways. The DataLink also provides an excellent source for pragmatic clinical trials of preventive or treatment interventions to improve the health and quality of care for people with diabetes.