Data Sources and Study Population
We used data from the Healthcare Data Analysis Information Group in Milwaukee, Wisconsin, for pharmacy and laboratory results, and from the central data repository for the VHA in Austin, Texas, for demographics and for inpatient and outpatient utilization with associated International Classification of Diseases, Ninth Revision (ICD-9) codes. Veterans who used the VHA in Fiscal Year 1999 (FY99) or Fiscal Year 2000 (FY00) were considered to have diabetes if, in at least 1 year, they received a diabetes medication, had an ICD-9 code of 250.xx (diabetes) associated with more than one outpatient encounter, or had that code associated with any inpatient encounter (Hebert et al. 1999; Health Plan Employer Data and Information Set [HEDIS®] 2004). This definition identified 1,476,921 A1c tests on 486,558 patients.
Hemoglobin A1c is the percentage of hemoglobin with an attached glucose moiety. Because red blood cells, which carry hemoglobin, have an average lifespan of 120 days, A1c values reflect average ambient glucose over the preceding 120 days, although average glucose over the last 30 days disproportionately determines the measured A1c (Tahara and Shima 1995). Thus, for each individual we included only A1c values separated by at least 30 days, excluding 38,186 tests. We also eliminated 1,481 A1c values that fell outside the plausible physiologic range (3–18 percent).
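The two test-level exclusions can be sketched in Python as follows. This is a minimal illustration, not the study's code: the record layout (`A1cTest`) and the function names are hypothetical, and we assume that the 30-day gap is measured from the patient's previously retained test and that the spacing rule is applied before the plausible-range filter, in the order described above.

```python
from datetime import date
from typing import NamedTuple


class A1cTest(NamedTuple):
    """Hypothetical record layout for one A1c lab result."""
    patient_id: str
    test_date: date
    value: float  # A1c, percent


MIN_GAP_DAYS = 30              # tests closer together than this are dropped
PLAUSIBLE_RANGE = (3.0, 18.0)  # percent, inclusive bounds


def enforce_spacing(tests, min_gap_days=MIN_GAP_DAYS):
    """Keep a test only if it falls at least min_gap_days after the
    patient's previously retained test (assumed rule)."""
    kept = []
    last_kept = {}
    for t in sorted(tests, key=lambda r: (r.patient_id, r.test_date)):
        prev = last_kept.get(t.patient_id)
        if prev is None or (t.test_date - prev).days >= min_gap_days:
            kept.append(t)
            last_kept[t.patient_id] = t.test_date
    return kept


def filter_a1c_tests(tests):
    """Apply the spacing rule, then drop physiologically implausible values."""
    lo, hi = PLAUSIBLE_RANGE
    return [t for t in enforce_spacing(tests) if lo <= t.value <= hi]
```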
Considerable variation in A1c assay methodology has been described within the United States and the VHA (Little et al. 2001; Safford et al. 2003). Because the National Glycohemoglobin Standardization Program (NGSP) is currently implementing nationwide standardization, such variation is likely to diminish in the near future. In FY99 and FY00, however, many labs in the United States, including several VHA facilities, used methods not certified by the NGSP, and other VHA facilities used multiple methods, some certified and others not. To minimize this variation, we excluded all 618,877 A1c tests performed with uncertified methods.
Many VHA patients receive care at multiple facilities within a given year; for example, 11 percent of the patients in our sample had A1c tests at more than one facility. To evaluate facility-level differences in quality of care, we assigned each patient a “home facility” for the study period (FY99 and FY00), defined as the facility at which the patient made the most outpatient diabetes-related visits.
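A minimal sketch of the home-facility assignment, assuming one record per outpatient diabetes-related visit. The function name and the tie-breaking rule (the smallest facility identifier among facilities tied for the most visits) are hypothetical; the text does not say how ties were resolved.

```python
from collections import Counter, defaultdict


def assign_home_facility(visits):
    """Map each patient to the facility with the most outpatient
    diabetes-related visits.

    visits: iterable of (patient_id, facility_id), one pair per visit.
    """
    counts = defaultdict(Counter)
    for patient_id, facility_id in visits:
        counts[patient_id][facility_id] += 1

    home = {}
    for patient_id, per_facility in counts.items():
        most = max(per_facility.values())
        # Hypothetical tie-break: smallest facility id among the tied facilities.
        home[patient_id] = min(f for f, n in per_facility.items() if n == most)
    return home
```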
Some VHA facilities were reorganized in FY00, some did not provide lab assay method information, and some reported lab data incompletely (Kerr et al. 2002). We therefore excluded 32,342 tests at 28 facilities. Additionally, 19 facilities had fewer than 100 A1c tests over the study period, far fewer than the typical VHA facility. Because of this scarcity of data, we judged that growth curve estimates for these facilities would be too unstable to compare reliably with those for larger facilities, and we dropped these facilities and their 1,274 tests from the analysis.
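The minimum-volume rule for facilities can be sketched as below. The helper is hypothetical, and we assume each test is attributed to the facility that performed it; only the threshold of 100 tests comes from the text.

```python
from collections import Counter

MIN_TESTS_PER_FACILITY = 100


def drop_sparse_facilities(test_facilities, min_tests=MIN_TESTS_PER_FACILITY):
    """Identify facilities with at least min_tests A1c tests.

    test_facilities: one facility id per A1c test (assumed attribution).
    Returns (set of retained facility ids, number of tests dropped).
    """
    counts = Counter(test_facilities)
    kept = {f for f, n in counts.items() if n >= min_tests}
    dropped = sum(n for f, n in counts.items() if f not in kept)
    return kept, dropped
```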
The final study sample consisted of 284,895 patients from 125 facilities with 816,721 A1c tests. The study was approved by our institution's Institutional Review Board.
The outcome variable of interest was patient A1c level. Available covariates included the number and dates of all A1c lab tests for each individual, lab methodology, and case-mix adjustment variables (which included demographic information, marital status, diabetes severity, and the Charlson index of comorbid illnesses). The Charlson index was selected to capture potential illness-related influences on treatment decisions to achieve glycemic control.
We operationalized two domains of time: “relative time” and “absolute time.” Relative time was used to model individual patients' patterns of change in A1c values over the course of the study period; the relative time associated with a given A1c value was the number of months elapsed from the patient's first recorded A1c test during the study period to the date of that test. Absolute time, defined as the number of days elapsed from the start of FY99 to the A1c test date, was used to model the seasonal population-level fluctuations in A1c that we observed in preliminary analyses (see the figure of mean monthly A1c below).
Mean Monthly A1c for Veterans Health Administration Veteran Users with Diabetes in Fiscal Year 1999 and Fiscal Year 2000, by Month
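The two time scales can be computed from test dates as in this sketch. It assumes the U.S. federal fiscal year convention (FY99 began October 1, 1998) and converts days to months using an average month length of 365.25/12 days; the text does not state how months were counted, so that conversion, like the function name, is an assumption.

```python
from datetime import date

FY99_START = date(1998, 10, 1)  # U.S. federal fiscal year 1999 began October 1, 1998
DAYS_PER_MONTH = 365.25 / 12    # assumed average month length


def time_variables(test_dates):
    """For one patient's A1c test dates, return (relative_months, absolute_days)
    for each test, in chronological order.

    relative_months: months since the patient's first test in the study period.
    absolute_days:   days since the start of FY99 (drives the seasonal term).
    """
    ordered = sorted(test_dates)
    first = ordered[0]
    return [
        ((d - first).days / DAYS_PER_MONTH, (d - FY99_START).days)
        for d in ordered
    ]
```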
We found small but significant differences among the NGSP-certified lab methods. We therefore included A1c lab method in the model, with categories denoted method B (Bio-Rad, Hercules, CA), method T (Tosoh, Tokyo), and other.
Available demographic factors included age, sex, race/ethnicity, and marital status. Age has been associated with A1c levels, with older individuals having lower values (Harris et al. 1999; Zhang et al. 2000). Therefore, we included age at the start of the study period as a baseline covariate. Patient sex was also included in the model. Race was categorized as white, African American, Hispanic, Native American, Asian, other, and unknown. Marital status (married or not married) was included because it has been associated with A1c levels in older men (Zhang et al. 2000; Safford et al. 2003) and may reflect social support.
Insulin use has been closely associated with the duration and severity of diabetes (UKPDS Group 1998). The patient's first diabetes treatment recorded in the study period (insulin, no insulin, or missing) was used as a measure of baseline diabetes severity.
The models ascribed to each facility an expected trajectory (“growth curve”) of A1c values for patients who used that facility as their home facility. We used the estimated facility-level growth curve slopes to compare outcomes of diabetes care among the facilities, both with and without case-mix adjustment. The hierarchical linear models are described below; more detailed descriptions are available in the online Appendix (http://www.stat.rutgers.edu/~slothrop/VA/Appendix.htm).
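The general framework of the model has two parts, time-invariant terms and time-variant terms, together with a random error term: A1c = (time-invariant terms) + (time-variant terms) + error. As in the usual linear regression model, the error terms were treated as independent, mean-zero normal variables.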
The time-invariant terms in the model included the certified A1c lab method and the case-mix adjustment variables. The time-variant terms included seasonal and growth curve terms. The seasonal term was a linear combination of a sine and a cosine with a period of 1 year; including both allowed the phase (starting point) of the cycle to be determined empirically. The growth curve term was linear in relative time. Each patient was ascribed an individual growth curve for A1c values over relative time, with coefficients containing both fixed and random components. The fixed component of the growth curve coefficients, which differed by facility, described the pattern of change over time for an average patient at that patient's home facility. The random component modeled deviations of the patient-specific growth curve from the facility-level average growth curve. The random effects were treated as independent mean-zero normal variables.
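One way to write the model just described, in notation introduced here for illustration (it is not taken from the Appendix): let $i$ index patients, $j$ the A1c tests of patient $i$, $h(i)$ the patient's home facility, $s_{ij}$ absolute time in days, $t_{ij}$ relative time in months, and $\mathbf{x}_i$ the time-invariant covariates (lab method and case-mix variables). The 365.25-day seasonal period is also an assumption. The second line shows why including both a sine and a cosine lets the data determine the phase: any such pair equals a single sinusoid with amplitude $R$ and phase shift $\phi$.

```latex
y_{ij} = \mathbf{x}_i^{\top}\boldsymbol{\gamma}
       + \alpha_1 \sin\!\left(\frac{2\pi s_{ij}}{365.25}\right)
       + \alpha_2 \cos\!\left(\frac{2\pi s_{ij}}{365.25}\right)
       + \left(\beta_{0,h(i)} + b_{0i}\right)
       + \left(\beta_{1,h(i)} + b_{1i}\right) t_{ij}
       + \varepsilon_{ij},
\qquad b_{0i} \sim N(0, \sigma_0^2),\; b_{1i} \sim N(0, \sigma_1^2),\; \varepsilon_{ij} \sim N(0, \sigma^2)

\alpha_1 \sin\theta + \alpha_2 \cos\theta = R \sin(\theta + \phi),
\qquad R = \sqrt{\alpha_1^2 + \alpha_2^2},\quad \cos\phi = \alpha_1 / R,\quad \sin\phi = \alpha_2 / R
```

Here $\beta_{0,h}$ and $\beta_{1,h}$ are the facility-specific intercept and slope that are compared across facilities, and $b_{0i}$, $b_{1i}$ are the patient-level random deviations.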
The random-effects model requires estimating random effects for each patient, which for 284,895 patients entails intensive computation. Because of this practical limitation, we fit the model with generalized estimating equations (GEEs) rather than implementing the random-effects model directly (Liang and Zeger 1986). The GEE approach reduces the computing load by avoiding estimation of the random effects. For continuous responses such as A1c levels, the population-averaged parameters of a linear model coincide with the fixed effects of the random-effects model, so the GEE approach yields consistent estimates of the same fixed effects (Diggle, Liang, and Zeger 1996).
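To illustrate how GEE avoids estimating per-patient random effects, here is a minimal pure-Python sketch for a single linear growth curve y = b0 + b1·t. It assumes an independence working correlation (the text does not state which working correlation was used), under which the Gaussian-identity GEE point estimates reduce to ordinary least squares, with standard errors from the cluster-robust (sandwich) variance using patients as clusters. The function name is hypothetical, no small-sample correction is applied, and the real model has many more covariates.

```python
import math


def gee_linear_independence(clusters):
    """GEE fit of y = b0 + b1 * t with an independence working correlation
    (Gaussian family, identity link).

    Under independence the point estimates equal ordinary least squares;
    standard errors use the cluster-robust sandwich with patients as clusters.

    clusters: list of per-patient lists of (t, y) pairs.
    Returns ((b0, b1), (se_b0, se_b1)).
    """
    # Bread: X'X and X'y accumulated over all observations, x = (1, t).
    sxx = [[0.0, 0.0], [0.0, 0.0]]
    sxy = [0.0, 0.0]
    for obs in clusters:
        for t, y in obs:
            x = (1.0, t)
            for a in range(2):
                sxy[a] += x[a] * y
                for b in range(2):
                    sxx[a][b] += x[a] * x[b]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    beta = [inv[a][0] * sxy[0] + inv[a][1] * sxy[1] for a in range(2)]

    # Meat: sum over patients of U_i U_i', where U_i = sum_j x_ij * residual_ij.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for obs in clusters:
        u = [0.0, 0.0]
        for t, y in obs:
            r = y - (beta[0] + beta[1] * t)
            u[0] += r
            u[1] += t * r
        for a in range(2):
            for b in range(2):
                meat[a][b] += u[a] * u[b]

    def matmul(p, q):
        return [[sum(p[a][k] * q[k][b] for k in range(2)) for b in range(2)]
                for a in range(2)]

    cov = matmul(matmul(inv, meat), inv)  # sandwich: bread^-1 * meat * bread^-1
    # max(..., 0.0) guards against tiny negative values from rounding.
    se = tuple(math.sqrt(max(cov[a][a], 0.0)) for a in range(2))
    return (beta[0], beta[1]), se
```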
A common measure for model evaluation is R², interpreted as the proportion of variation explained by the regression model. In the longitudinal context, no generally accepted counterpart of R² exists, so we defined a “working” R². For each patient, we computed the mean observed A1c value and the mean predicted A1c value. The working R² was then computed as in the usual regression setting with independent observations, using the patient mean observed and predicted values in place of individual observations (see Appendix for details).
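A sketch of the working R², assuming that the usual-regression formula meant here is R² = 1 − SSE/SST applied to patient-level means; the exact definition is in the Appendix and may differ, and the function name is hypothetical.

```python
def working_r2(patients):
    """Working R^2 computed on patient-level means.

    patients: list of (observed_values, predicted_values), one pair per patient.
    Assumes the usual R^2 = 1 - SSE/SST, applied to the patient means.
    """
    obs_means = [sum(obs) / len(obs) for obs, _ in patients]
    pred_means = [sum(pred) / len(pred) for _, pred in patients]
    grand_mean = sum(obs_means) / len(obs_means)
    sse = sum((o - p) ** 2 for o, p in zip(obs_means, pred_means))
    sst = sum((o - grand_mean) ** 2 for o in obs_means)
    return 1.0 - sse / sst
```

Averaging within patients first means that patients with many tests do not dominate the measure, which is one reason a patient-mean version is a natural stand-in when observations are correlated.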
The facility-specific intercepts of the growth curve model represented initial mean A1c at that facility, and the slopes represented mean monthly A1c change. We used the facility slopes to compare diabetes outcomes. A facility whose expected patient A1c trajectory brought A1c levels into control rapidly may have provided better care than facilities whose trajectories brought levels into control more slowly or not at all. A facility with a more negative slope was therefore considered to perform better than one whose slope was less negative or positive. To account for multiple comparisons, we constructed simultaneous confidence intervals (SCIs) for the facility slopes using the Studentized maximum modulus method (Hsu 1996). We also assessed changes in facility slopes between the models with and without case-mix variables.
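For large degrees of freedom and independent estimates, the Studentized maximum modulus critical value for k intervals reduces to the value c solving (2Φ(c) − 1)^k = 1 − α, where Φ is the standard normal distribution function. The sketch below uses that normal approximation; the exact method in Hsu (1996) uses finite degrees of freedom, and the helper names are hypothetical.

```python
from statistics import NormalDist


def smm_critical_value(k, alpha=0.05):
    """Large-sample studentized maximum modulus critical value.

    For k independent, approximately normal estimates, returns c such that
    P(max_i |Z_i| <= c) = 1 - alpha, i.e. (2 * Phi(c) - 1) ** k = 1 - alpha.
    """
    return NormalDist().inv_cdf((1 + (1 - alpha) ** (1 / k)) / 2)


def simultaneous_intervals(estimates, ses, alpha=0.05):
    """Simultaneous intervals estimate +/- c * SE for all k estimates."""
    c = smm_critical_value(len(estimates), alpha)
    return [(b - c * s, b + c * s) for b, s in zip(estimates, ses)]
```

With k = 125 facilities and α = 0.05 the critical value is roughly 3.5 rather than 1.96, which is why simultaneous intervals are noticeably wider than unadjusted ones.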
The facilities were also ranked on the slope estimates, from most rapid glycemic improvement (steepest negative slope) to least rapid improvement or glycemic deterioration. We constructed confidence intervals for the ranks using a slight modification of the simulation approaches of Goldstein and Spiegelhalter (1996) and Hsu (1996) (see Appendix for details).
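The general idea behind simulation-based rank intervals can be sketched as follows: draw each facility's slope from a normal distribution centered at its estimate, rank each set of draws, and summarize each facility's simulated ranks by percentiles. This sketch produces marginal (per-facility) intervals under an independence assumption; it does not reproduce the authors' modification or the simultaneous intervals reported in the Results, and all names are hypothetical.

```python
import random


def rank_intervals(estimates, ses, n_sims=2000, level=0.90, seed=1):
    """Marginal simulation-based rank intervals.

    Rank 1 = most negative slope (fastest A1c improvement).
    Returns a (lower, upper) rank interval for each facility.
    """
    rng = random.Random(seed)
    k = len(estimates)
    sims = [[] for _ in range(k)]
    for _ in range(n_sims):
        draws = [rng.gauss(b, s) for b, s in zip(estimates, ses)]
        order = sorted(range(k), key=lambda i: draws[i])
        for rank, i in enumerate(order, start=1):
            sims[i].append(rank)

    tail = (1 - level) / 2
    intervals = []
    for ranks in sims:
        ranks.sort()
        lower = ranks[int(tail * (n_sims - 1))]
        upper = ranks[int((1 - tail) * (n_sims - 1))]
        intervals.append((lower, upper))
    return intervals
```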
We also investigated an alternative explanation for the overall declining trend in A1c values: that patients with worsening glycemic control preferentially left the VA system. For this analysis, we selected patients who were alive at the end of the 2-year period, determining vital status from the Beneficiary Identification and Records Locator Subsystem. We then defined two subgroups: those who likely remained in the system, that is, who used services in both years of the observation period (n = 142,426), and those who left it (“out-migrators”), that is, who used services only in the first year (n = 51,054). We fitted the hierarchical growth curve models described for the main analysis to each subgroup separately, using first-year data only. A significant difference between the two subgroups' average A1c decline rates, with slower decline among out-migrators, would indicate that out-migrators more often experienced worsening control.
All statistical analyses were performed using SAS version 8.2. The significance level for inclusion of covariates was p < .05.
The table below compares the study sample with the overall population of VHA users with diabetes in FY99 and FY00. The mean age of subjects in the study sample was 64 years; 98 percent were male, 70 percent were white, 61 percent were married, and 31 percent used insulin at entry into the sample. These characteristics differed only slightly from those of the overall VHA diabetes patient population. In both groups, mean A1c levels decreased monotonically with age. Patients who were unmarried, used insulin, or were African American or Hispanic had higher A1c values than their counterparts. A1c levels were slightly but significantly higher in winter than in summer, and monthly A1c values trended downward over the study period (see the figure of mean monthly A1c by month).
Comparison of the Study Sample and the Overall Population of VHA Users with Diabetes in FY99 and FY00
In the growth curve models, the facility slopes represented the average monthly change in A1c for patients at a given home facility. In the model without case-mix adjustment, the facility slopes averaged −0.0148 A1c units per month (SD 0.0193, range −0.074 to 0.042). Of the 125 facilities, 105 had negative slopes and 20 had monthly rises in A1c. The 95 percent SCIs for the slopes (see the plot of monthly A1c change rates below) showed significant differences among the estimates; 70 of the negative slopes and five of the positive slopes had SCIs that did not include zero.
Plot of Monthly A1c Change Rates, with 95 Percent Simultaneous Confidence Intervals, for Average Patient by Facility
Most facility slope estimates were also negative in the case-mix adjusted model: 102 of 125 facilities had negative adjusted slopes, with mean −0.0131 (SD 0.0188, range −0.079 to 0.043). The 95 percent SCIs for the case-mix adjusted slopes also showed significant differences among the estimates, and 68 of the negative slopes and five of the positive slopes had SCIs that did not include zero. The graph of these slopes and SCIs was remarkably similar to the plot of the unadjusted slopes and hence is not shown.
We found some differences between the growth curve parameter estimates with and without case-mix adjustment. The facility intercepts, or initial mean A1c values for patients at each facility, averaged 7.17 (SE 0.252) and 7.84 (SE 0.265) for the models with and without case-mix adjustment, respectively, an expected difference reflecting the entry of more explanatory variables into the model. The facility slopes averaged −0.0131 (SD 0.0188) and −0.0148 (SD 0.0193), respectively, and were thus much more similar. After case-mix adjustment, three facilities changed from a negative to a positive slope and no facility changed from a positive to a negative slope. The model without case-mix adjustment had a working R² of 1.77 percent and the case-mix adjusted model had a working R² of 14.45 percent, reflecting considerable explanatory power of the case-mix adjustment variables.
The table of parameter estimates below displays the fixed parameter estimates of the case-mix adjusted model. The case-mix parameter estimates largely mirrored the descriptive statistics. For example, the youngest patients (<55 years) had higher A1c than older patients. Being unmarried or using insulin was also associated with higher A1c, and Hispanics and African Americans had higher A1c than whites. Lab assay method B yielded slightly but significantly higher A1c than other certified lab assay methods. Seasonal effects were modest but significant in both models.
Parameter Estimates for Case-Mix Variables, Seasonal Terms, and A1c Lab Method*
We also explored higher-order polynomial growth curve models, which showed little improvement over the simple linear growth curve models. In the subanalysis of out-migrators, the average A1c decline rate was −0.033 (SE 0.0056) for the “out-migrator” group and −0.027 (SE 0.0046) for patients who remained in VA care. Normal-based tests indicated that each decline rate differed significantly from zero (p < .001 for both), but there was no significant difference between them (p = .407). Thus, the overall declining trend in A1c values could not be attributed to patients with worsening control “out-migrating.”
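The normal-based comparison of the two subgroup decline rates can be approximately reproduced from the reported estimates, assuming the two estimates are independent (they come from disjoint patient groups fitted separately). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def compare_rates(est1, se1, est2, se2):
    """Two-sided normal test of H0: rate1 == rate2 for independent estimates."""
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Out-migrators vs. patients who remained in VA care (values from the text).
z, p = compare_rates(-0.033, 0.0056, -0.027, 0.0046)
```

With the reported values this gives z of about −0.83 and a two-sided p-value of about 0.41, consistent with the reported p = .407 given rounding of the inputs.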
One method for comparing the quality of health care is ranking performance on a specified criterion. In this study, we ranked facilities on the facility slopes from the case-mix adjusted model. To maximize clinical relevance, we limited ranking to the 105 facilities that had at least 50 patients with two or more A1c values. The rank-assignment figure below shows facility ranks with 90 percent two-sided CIs for the 105 facilities, allowing assessment of comparative performance with the specified certainty. The relative performance of some facilities was estimated fairly precisely, that of others less so. For instance, the facility ranked 105th (most positive slope) had a 90 percent SCI of [101, 105], so we can be confident that it performed poorly relative to other facilities. However, the facility ranked 102nd had a 90 percent SCI of [34, 105], so we can ascertain only that it was not in the first quartile of performers. For the top 10 performers (ranks 1–10) we obtained 90 percent upper confidence bounds for the ranks, and for the bottom 10 performers (ranks 96–105) we obtained 90 percent lower confidence bounds. Three of the 10 best performers had CIs contained within the top decile and another four had CIs contained within the top quartile. Four of the 10 worst performers had CIs contained in the bottom decile, and another four had CIs contained in the bottom half.
Diabetes Quality-of-Care Rank Assignments from Best (1) to Worst (105), with 90 Percent Simultaneous Confidence Intervals
We also examined changes in facility ranks between the models with and without case-mix adjustment. Despite fairly modest differences in growth curve parameter estimates, some substantial changes in rank were evident when case-mix variables were included. After case-mix adjustment, seven of 105 facilities moved to a better quartile of performance, nine moved to a worse quartile, and 89 remained in the same quartile. Seven of 10 facilities remained at or above the 90th percentile of performance and eight of 10 remained at or below the 10th percentile; four of six remained at or above the 95th percentile and five of six remained at or below the 5th percentile.
Current approaches to comparing health care systems on quality-of-care measures rely on cross-sectional methodology, in many instances because they depend on chart abstraction. Because cross-sectional evaluation of intermediate outcomes such as A1c levels cannot measure within-patient change over time, it is difficult to assess care in stable populations of individuals with chronic disease. Furthermore, quality comparisons for diabetes care currently do not adjust for case mix or duration of health system care. For example, health care systems that serve largely older individuals, such as Medicare beneficiaries, may have better A1c values than those serving inner-city minority populations simply because of population age and ethnicity. Similarly, if newer enrollees are healthier, cross-sectional measures can appear better for reasons unrelated to quality. Although cross-sectional measures of the quality of diabetes care have made major contributions and supported demonstrable improvement, they cannot evaluate the continuing care given over time to individuals with chronic diseases (Zhang et al. 2000; Weiner and Long 2004). The longitudinal approach presented here measures changes in A1c over time within individuals, overcoming some of the problems of cross-sectional measures and capturing a new domain of quality measurement.
We found considerable facility-level variation in controlling A1c over time for patients within semi-autonomous units of this large national health care system. On average, case-mix adjusted facility-level A1c values changed by −0.314 over 2 years (range −1.90 to 1.03), suggesting that overall VHA diabetes care improved. This level of change is clinically meaningful: a patient with an A1c of 7.5 percent who began diabetes care at a facility at the start of the study period would on average have had an A1c of 7.2 percent by its end. Most of the 125 facilities (102 with case-mix adjustment, 105 without) showed negative monthly rates of change in A1c. Facilities with rising within-patient A1c levels over time might be targets for quality improvement efforts, especially those whose slope estimates had SCIs that did not include zero. Such slope estimate comparisons can be highly useful for quality improvement purposes.
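The 2-year figures follow from the mean adjusted monthly slope under the linear growth curve, multiplied by the 24 months of FY99 and FY00. This quick check uses only values reported in the Results; the variable names are illustrative.

```python
MONTHS = 24                    # FY99 plus FY00
mean_adjusted_slope = -0.0131  # mean case-mix adjusted slope, A1c units per month
slope_range = (-0.079, 0.043)  # range of adjusted facility slopes

two_year_change = mean_adjusted_slope * MONTHS
range_change = tuple(s * MONTHS for s in slope_range)

start_a1c = 7.5                # illustrative patient from the text
end_a1c = start_a1c + two_year_change
print(f"mean 2-year change: {two_year_change:.3f}")
print(f"range: {range_change[0]:.2f} to {range_change[1]:.2f}")
print(f"A1c after 2 years: {end_a1c:.1f}")
```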
We also assigned ranks to facilities based on slope estimates. Identifying facilities in the lowest decile of performance is a common approach for determining which facilities could improve quality of care. Fairness demands high confidence in such rank assignments, although statistical variability is often a barrier to achieving it. Even after case-mix adjustment and limiting rank assignment to facilities with sufficient data, we found considerable uncertainty in ranking, similar to other reports (Marshall and Spiegelhalter 1998). Nevertheless, four facilities identified in the lowest decile of performance had rank CIs lying entirely within the lowest decile, indicating that these facilities likely did perform “worse.” The use of ranks for accountability has been debated, but ranks may be highly useful for quality improvement purposes (Schneider and Epstein 1996; Marshall and Spiegelhalter 1998).
We found some effect of case-mix adjustment on diabetes care evaluation: the case-mix adjusted slope estimates differed modestly from those without case-mix adjustment, and the composition of the extreme rank deciles changed. Of the 20 facilities at or above the 90th or at or below the 10th percentile of performance without case-mix variables, five moved to a different decile in the case-mix adjusted model. Consequently, case-mix adjustment may be warranted for longitudinal evaluations of A1c rates of change in an accountability context.
One strength of this study was the ability to describe a large population using administrative data, which have been shown to have high reliability compared with chart abstraction data (Kerr et al. 2002). Additionally, the VHA has effectively reduced barriers to accessing care and pharmacy services, diminishing the variation due to accessibility that can overwhelm quality-of-care studies in other settings (Jha et al. 2003).
This study also has limitations. First, restricting the study to the VHA, which serves primarily older men, may limit the generalizability of our findings. Second, we controlled variation in A1c measurement by using only values from NGSP-certified methods, yet one method gave consistently slightly higher values than the other certified methods. Current standards for NGSP certification are more stringent than those in place at the time of this study (National Glycohemoglobin Standardization Program [NGSP] 2004), and the ongoing national implementation of the NGSP should improve future A1c comparability. Third, because this was an observational study based on existing administrative data, potentially important confounders were not available for analysis. For example, medication adherence may be a very important determinant but is difficult to ascertain from administrative data, especially for insulin. Fourth, loss to follow-up is an important problem in studies like this one. Our subanalysis showed that patients leaving the system had A1c control similar to that of patients who remained in VA care, so a “healthy survivor effect” was unlikely to account for the findings. Finally, for ranking purposes we used only facilities with at least two measures on 50 or more individuals, to maximize the stability of estimates. Because enrollment in private-sector health plans lasts about 2.5 years on average, the number of individuals with this much data may be limited in typical Health Plan Employer Data and Information Set samples of 411 chart reviews (HEDIS® 2004). However, increasing emphasis on frequent A1c measurement by the American Diabetes Association (2004) and the National Committee for Quality Assurance (HEDIS® 2004) is likely to increase these numbers in the future.
In summary, we demonstrated significant differences among VHA facilities in A1c change rates for average patients within a given facility. Our results also indicate the likely importance of risk adjustment if the results are to be used for accountability purposes. This use of longitudinal data may offer distinct advantages over cross-sectional methods by distinguishing within-subject effects from population effects, thereby capturing patient response to the clinical setting. We suggest that evaluating and comparing these within-patient change rates may prove a useful new approach to studying the quality of ongoing care for populations with chronic disease, and may identify facilities that warrant closer evaluation for best (and more problematic) practices.