A1c levels are widely used to assess quality of diabetes care provided by health care systems. Currently, cross-sectional measures are commonly used for such assessments.
To study within-patient longitudinal changes in A1c levels at Veterans Health Administration (VHA) facilities as an alternative to cross-sectional measures of quality of diabetes care.
Longitudinal study using institutional data on individual patient A1c level over time (October 1, 1998–September 30, 2000) with time variant and invariant covariates.
One hundred and twenty-five VHA facilities nationwide, October 1, 1998–September 30, 2000.
Diabetic veteran users with A1c measurement performed using National Glycohemoglobin Standardization Program (NGSP) certified A1c lab assay methods.
Characteristics unlikely to reflect quality of care but known to influence A1c levels: demographics and baseline illness severity.
Monthly change in A1c for average patient cared for at each facility.
The preponderance of facilities showed monthly declines in within-patient A1c over the study period (mean change of −0.0148 A1c units per month, range −0.074 to 0.042). Individual facilities varied in their monthly change, with 105 facilities showing monthly declines (70 significant at .05 level) and 20 showing monthly increases (5 significant at .05 level). Case-mix adjustment resulted in modest changes (mean change of −0.0131 case-mix adjusted A1c units per month, range −0.079 to 0.043). Facilities were ranked from worst to best, with attached 90 percent confidence intervals. Among the bottom 10 ranked facilities, four remained within the bottom decile with 90 percent confidence.
There is substantial variation in facility-level longitudinal changes in A1c levels. We propose that evaluation of change in A1c levels over time can be used as a new measure to reflect quality of care provided to populations of individuals with chronic disease.
Improving the quality of health care is a fundamental concern in the Institute of Medicine's landmark 2001 report Crossing the Quality Chasm (Committee on Quality Health Care in America, Institute of Medicine 2000). Central to this are measures used to evaluate quality of care. Current efforts at measuring quality-of-care focus primarily on processes of care because comparisons of receipt of processes are relatively straightforward. With diabetes patients, for example, glycated hemoglobin (A1c) testing is a central component of the widely implemented Diabetes Quality Improvement Project performance measures. Intermediate health outcomes are also desirable for use in such assessments because they represent a more distal effect of care and are more directly linked to risks of outcomes than processes of care. One such intermediate health outcome for diabetes patients is A1c level, which has been associated with microvascular complications (Diabetes Control and Complications Trial Research Group 1993; U.K. Prospective Diabetes Study [UKPDS] Group 1998). Especially as they can be lowered with medications, A1c levels are considered evidence-based performance measures for quality-of-care assessments. Consequently, A1c testing is recommended at least annually for all diabetes patients and therefore provides a readily available pool of information.
However, A1c values are also related to patient-level factors extraneous to health care systems (Chin et al. 2000), complicating attribution of less-than-optimal A1c values to lapses in quality of care. Indeed, even in the high-quality environment of landmark clinical trials (Wright et al. 2002), A1c levels rose over time within most individuals. This reflects in part the limitation of currently available treatments and in part other factors such as patient age or severity of diabetes. To account for this, A1c levels can be case-mix adjusted to result in more meaningful quality-of-care assessments (Iezzoni 1997; Zhang et al. 2000). However, case-mix adjustment methodology is not yet well established for intermediate health outcomes such as A1c levels, and most attention has been focused on cross-sectional analyses.
In addition to cross-sectional analyses, another potentially useful strategy would be to evaluate each patient's A1c level over time within a health care system. Multiple measures on individuals over time, termed “longitudinal data” (Diggle, Liang, and Zeger 1996), are crucial for assessing change over time within patients. Longitudinal data enable differentiation of cohort effects (patient population changes over time) from within-subject time effects. For example, more aggressive screening may identify individuals earlier in their disease course. Serial cross-sectional assessments could therefore show declining A1c levels that are due to selection biases rather than to better diabetes care. By following individuals' glycemic control over time, these sorts of cohort effects can effectively be separated from the effect of the health system. Longitudinal methods may give a better sense of how a patient's health care system is treating the disease over time than do cross-sectional data.
Longitudinal data can be analyzed using the random-effects growth curve model (Bryk and Raudenbush 1992). This method can be used to model an average A1c growth curve for all patients utilizing a given health care system for diabetes care, describing how its patients' A1c values changed on average over time. The deviation of patient-level change in A1c from the average A1c growth curve for a given health care system is accounted for by random effects in the model. These random effects incorporate the correlation of multiple A1c measurements made on each patient (Diggle, Liang, and Zeger 1996). This use of the growth curve methodology is attractive because it should result in meaningful assessments of quality of care as reflected in how health care systems influence patient A1c levels over time, or in other words, the patient response to the clinical setting.
We studied the longitudinal growth curve model as a new approach to evaluating quality of care among diabetic veterans in 125 of the Veterans Health Administration's (VHA) facilities nationwide. We hypothesized that some facilities would be more or less successful at controlling A1c levels over time, as reflected in facility-level average monthly A1c growth curves over 2 years. We also examined the importance of case-mix adjustment in the longitudinal context.
We used data from the Healthcare Data Analysis Information Group in Milwaukee, Wisconsin for pharmacy and laboratory results and the central data repository for the VHA in Austin, TX for demographics and inpatient and outpatient utilization with associated International Classification of Diseases, 9th Revision (ICD-9) codes. Veterans who used the VHA in Fiscal Year 1999 (FY99) or Fiscal Year 2000 (FY00) and in at least 1 year received a diabetes medication, had an ICD-9 code 250.xx (diabetes) associated with more than one outpatient encounter, or had any inpatient encounter were considered to have diabetes (Hebert et al. 1999; Health Plan Employer Data and Information Set [HEDIS®] 2004). This identified 1,476,921 A1c tests on 486,558 patients.
Hemoglobin A1c is the percentage of hemoglobin with an attached glucose moiety. As red blood cells, the repositories of hemoglobin, have an average lifespan of 120 days, A1c values reflect average ambient glucose values over the past 120 days, although the last 30 days' average glucose disproportionately determines the measured A1c (Tahara and Shima 1995). Thus, we only included A1c values for each individual that were separated by at least 30 days, excluding 38,186 tests. We also eliminated 1,481 A1c values that fell outside the plausible physiologic range (3–18 percent).
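The two exclusion rules above can be illustrated with a minimal sketch. The text does not say how the 30-day rule was applied to runs of closely spaced tests, so the greedy rule below (spacing measured from the last retained test), the inclusive range bounds, and the order of the two filters are all assumptions; the function name and the example data are hypothetical.

```python
from datetime import date

MIN_GAP_DAYS = 30              # minimum spacing between retained tests (from the text)
PLAUSIBLE_RANGE = (3.0, 18.0)  # plausible A1c range in percent (from the text)


def filter_tests(tests):
    """Return the retained (date, a1c) tests for one patient.

    `tests` is a list of (date, a1c) tuples in any order.
    """
    low, high = PLAUSIBLE_RANGE
    # Assumption: the range bounds themselves count as plausible values,
    # and implausible values are removed before the spacing rule is applied.
    plausible = sorted(t for t in tests if low <= t[1] <= high)
    kept = []
    for test_date, a1c in plausible:
        # Assumption: greedy rule, spacing measured from the last retained test.
        if not kept or (test_date - kept[-1][0]).days >= MIN_GAP_DAYS:
            kept.append((test_date, a1c))
    return kept


example = [
    (date(1998, 10, 5), 8.1),
    (date(1998, 10, 20), 7.9),   # 15 days after the first test: dropped
    (date(1998, 11, 10), 7.6),   # 36 days after the first test: kept
    (date(1999, 2, 1), 21.0),    # outside 3-18 percent: dropped
]
retained = filter_tests(example)
```

In this hypothetical example only the first and third tests are retained.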
Considerable lab assay methodology variations for A1c testing have been described within the United States and VHA (Little et al. 2001; Safford et al. 2003). Because the National Glycohemoglobin Standardization Program (NGSP) is currently implementing nationwide standardization, such variation is likely to diminish in the near future. In FY99 and FY00, however, many labs in the United States, including several VHA facilities, were using methods not certified by the NGSP. Other VHA facilities were using multiple methods, some certified and others not. To minimize this variation, we excluded all 618,877 uncertified A1c tests from the analysis.
Many VHA patients receive care at multiple facilities within a given year; for example, 11 percent of the patients in our sample had A1c tests at more than one facility. For evaluation of facility-level differences in quality of care, each patient was assigned a “home facility” for the study period (FY99 and FY00), defined as the facility at which the patient made the most outpatient diabetes-related visits.
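The home-facility rule can be sketched as a simple count over a patient's outpatient diabetes-related visits. The text does not describe how ties were resolved; breaking ties by the smallest facility identifier is an assumption made only so the sketch is deterministic, and the facility identifiers are hypothetical.

```python
from collections import Counter


def home_facility(visit_facilities):
    """Return the facility with the most outpatient diabetes-related visits.

    `visit_facilities` lists one facility ID per visit. Ties are broken by the
    smallest facility ID, an assumption made only for determinism.
    """
    counts = Counter(visit_facilities)
    return min(counts, key=lambda facility: (-counts[facility], facility))


visits = ["F012", "F047", "F012", "F012", "F047"]  # hypothetical facility IDs
assigned = home_facility(visits)  # three visits at F012 versus two at F047
```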
Some VHA facilities were reorganized in FY00, some did not provide lab assay method information, and some facilities reported lab data incompletely (Kerr et al. 2002). This resulted in the exclusion from the sample data of 32,342 tests at 28 facilities. Additionally, 19 facilities had fewer than 100 A1c tests over the study period, far fewer than the typical VHA facility. Because of this scarcity of data, it was judged that estimates of growth curves for these facilities would be too unstable to compare reliably to larger VHA facilities. Hence, they and their 1,274 tests were dropped from the analysis.
The final study sample consisted of 284,895 patients from 125 facilities with 816,721 A1c tests. Approval from our institution's Institutional Review Board was obtained.
The outcome variable of interest was patient A1c level. Available covariates included the number and dates of all A1c lab tests for each individual, lab methodology, and case-mix adjustment variables (which included demographic information, marital status, diabetes severity, and the Charlson index of comorbid illnesses). The Charlson index was selected to capture potential illness-related influences on treatment decisions to achieve glycemic control.
We operationalized two domains of time: “relative time” and “absolute time.” Relative time was used to model individual patients' pattern of change in A1c values over the course of the study period. The relative time associated with a patient's given A1c value was the number of months elapsed from the patient's first recorded A1c test during the study period to the given A1c test date. Absolute time was defined as the number of days elapsed from the start of FY99 to the given A1c test date, which was used to model the seasonal population-level fluctuations in A1c that we observed in preliminary analyses (see Figure 1).
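A minimal sketch of how the two time variables could be derived from a patient's test dates follows. The start of FY99 (October 1, 1998) comes from the study period stated above; the conversion from days to months using an average month length is an assumption, as the text does not give the exact rule, and the function name is hypothetical.

```python
from datetime import date

FY99_START = date(1998, 10, 1)   # start of the study period (from the text)
DAYS_PER_MONTH = 365.25 / 12     # assumption: average month length


def time_variables(test_dates):
    """Return (relative_months, absolute_days) for each test date of one patient."""
    first = min(test_dates)
    return [
        ((d - first).days / DAYS_PER_MONTH, (d - FY99_START).days)
        for d in test_dates
    ]


patient_dates = [date(1999, 1, 15), date(1999, 7, 15), date(2000, 1, 15)]
times = time_variables(patient_dates)
```

Relative time is zero at each patient's first test regardless of the calendar date, whereas absolute time places every test on a common calendar axis, which is what the seasonal term needs.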
We found small but significant differences among the NGSP certified lab methods. Thus, we included A1c lab methodology in the model, with categories denoted by method B (Biorad, Hercules, CA), method T (Tosoh, Tokyo), and other.
Available demographic factors included age, sex, race/ethnicity, and marital status. Age has been associated with A1c levels, with older individuals having lower values (Harris et al. 1999; Zhang et al. 2000). Therefore, we included age at onset of the study period as a baseline adjuster. Patient sex was also included in the model. Race was categorized as white, African American, Hispanic, Native American, Asian, other, and unknown. Marital status (married or not married) was included because it has been associated with A1c levels in older men (Zhang et al. 2000; Safford et al. 2003) and may reflect social support.
Insulin use has been closely associated with duration and severity of diabetes (UKPDS Group 1998). The patient's first diabetes treatment recorded in the study period (insulin, no insulin, and missing) was used as a measure of baseline diabetes severity.
The Charlson index reflects 1-year mortality risk based on the presence of selected illnesses (Charlson et al. 1987). This index has been adapted for use in administrative data (Deyo, Cherkin, and Ciol 1992), which we used to construct Charlson index scores to capture general illness burden. Charlson score was used as a time invariant measure of comorbidity, similar to other reports (Berlowitz et al. 1998; Chin et al. 2000; Zhang et al. 2000; Safford et al. 2003).
The models ascribed to each facility an expected trajectory (“growth curve”) of A1c values for patients utilizing that facility as their home facility. We used the estimated facility-level growth curve slopes to compare outcomes of diabetes care among the facilities, both with and without case-mix adjustment. The hierarchical linear models used are described below; more detailed descriptions are available at http://www.stat.rutgers.edu/~slothrop/VA/Appendix.htm
Two parts, along with a random error term, give the general framework of the model: A1c level = time invariant terms + time variant terms + random error. As with the usual linear regression models, the error terms were treated as independent, mean-zero normal variables.
The time invariant terms in the model included the certified A1c lab method and the case-mix adjustment variables. The time variant terms included seasonal and growth curve terms. The seasonal term consisted of a linear combination of sine and cosine with a period of 1 year; inclusion of both sine and cosine allowed the starting point of the cycle to be empirically determined. The growth curve term was linear in relative time. Each patient in the study was ascribed an individual growth curve for A1c values over relative time, with coefficients containing both fixed and random components. The fixed component of the growth curve coefficients, which differed by facility, described the pattern of change over time for an average patient in that patient's home facility. The random component of the growth curve coefficients modeled deviations of the patient-specific growth curve from the facility-level average growth curve. The random effects were treated as independent mean-zero normal variables.
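One way to write the model described above is sketched below. The notation is ours and is an assumption, not the authors' own: $y_{ik}$ is the $k$th A1c value of patient $i$, $j(i)$ is that patient's home facility, $x_i$ collects the lab method and case-mix variables, $s_{ik}$ is absolute time in days, $t_{ik}$ is relative time in months, and a year is taken as 365.25 days. The inclusion of a random intercept alongside the random slope is inferred from the phrase "growth curve coefficients."

```latex
\begin{align*}
y_{ik} &= x_i^{\top}\theta
        + \gamma_1 \sin\!\left(\frac{2\pi s_{ik}}{365.25}\right)
        + \gamma_2 \cos\!\left(\frac{2\pi s_{ik}}{365.25}\right)
        + \bigl(\alpha_{j(i)} + a_i\bigr)
        + \bigl(\beta_{j(i)} + b_i\bigr)\, t_{ik}
        + \varepsilon_{ik}, \\
a_i,\ b_i &\ \text{mean-zero normal random effects}, \qquad
\varepsilon_{ik} \sim N(0, \sigma^2)\ \text{independent}.
\end{align*}
% The sine-cosine pair is equivalent to a single sinusoid with free phase:
\begin{equation*}
\gamma_1 \sin(\omega s) + \gamma_2 \cos(\omega s) = A \sin(\omega s + \varphi),
\qquad A = \sqrt{\gamma_1^2 + \gamma_2^2}, \quad \tan\varphi = \frac{\gamma_2}{\gamma_1}.
\end{equation*}
```

Here $\alpha_{j}$ and $\beta_{j}$ are the facility-level intercept and slope, and the phase $\varphi$ is what lets the starting point of the seasonal cycle be estimated from the data.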
The random-effects model required estimation of random effects for each patient. With 284,895 patients, this entails intensive computational effort. Because of this practical limitation, generalized estimating equations (GEEs) were used to fit the model rather than implementing the random-effects model directly (Liang and Zeger 1986). The GEE approach relieved computing load by avoiding estimation of random effects. For continuous responses such as A1c levels, this approach produces consistent parameter estimates for the fixed effects that are identical to those produced by fitting the random-effects model (Diggle, Liang, and Zeger 1996).
A common term for model evaluation is R2, which is interpreted as the proportion of variation explained by the regression model. In the longitudinal context, no generally accepted counterpart of R2 exists and so we defined a “working” R2. For each patient, we computed the mean observed A1c value and the mean predicted A1c value. The working R2 was computed as in the usual regression setting with independent observations, using the patient mean observed and predicted values instead (see Appendix for details).
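The working R2 can be sketched as follows. The exact formula is in the Appendix, which is not reproduced here, so taking the "usual" R2 to be one minus the ratio of residual to total sum of squares is an assumption, as are the function name and the data layout.

```python
from statistics import mean


def working_r2(observed, predicted):
    """Working R2 from per-patient mean observed and mean predicted A1c.

    `observed` and `predicted` map a patient ID to that patient's list of
    observed and model-predicted A1c values, respectively.
    """
    patients = sorted(observed)
    obs_means = [mean(observed[p]) for p in patients]
    pred_means = [mean(predicted[p]) for p in patients]
    grand_mean = mean(obs_means)
    ss_total = sum((o - grand_mean) ** 2 for o in obs_means)
    ss_resid = sum((o - f) ** 2 for o, f in zip(obs_means, pred_means))
    # Assumption: the "usual" R2 is taken to be 1 - SS_resid / SS_total.
    return 1.0 - ss_resid / ss_total


# Hypothetical data for three patients.
obs = {"a": [7.0, 8.0], "b": [9.0, 9.0], "c": [6.0, 7.0]}
perfect = working_r2(obs, obs)  # predictions equal to observations
```

Averaging within patient first removes the within-patient correlation from the calculation, so each patient contributes a single pair of values.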
The facility-specific intercepts of the growth curve model represented initial mean A1c at that facility and the slopes represented mean monthly A1c change. We used the facility slopes to compare diabetes outcomes. A facility with an expected patient A1c trajectory that brought A1c levels into control rapidly may have provided better care than facilities with trajectories that brought levels into control more slowly or not at all. A facility with a more negative slope was therefore considered to be better performing than one whose slope was either less negative or positive. For multiple comparisons, we constructed simultaneous confidence intervals (SCIs) for the facility slopes using the Studentized maximum modulus method (Hsu 1996). We assessed changes in facility slopes between the models with and without case-mix variables.
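The idea behind simultaneous intervals can be sketched under simplifying assumptions. With independent, normally distributed slope estimates and effectively unlimited degrees of freedom, the Studentized maximum modulus critical value reduces to a normal quantile at an adjusted level; the code below uses that large-sample form and is not a reproduction of the authors' computation. The function names, the slope, and the standard error in the example are hypothetical.

```python
from statistics import NormalDist


def simultaneous_critical_value(n_intervals, alpha=0.05):
    """Large-sample critical value for n simultaneous two-sided intervals.

    Assumes independent, normally distributed estimates, so that
    P(max |Z_j| <= c) = (2 * Phi(c) - 1) ** n_intervals = 1 - alpha.
    """
    joint = (1.0 - alpha) ** (1.0 / n_intervals)
    return NormalDist().inv_cdf((1.0 + joint) / 2.0)


def simultaneous_interval(estimate, std_error, n_intervals, alpha=0.05):
    """Return (lower, upper) for one estimate among n simultaneous intervals."""
    c = simultaneous_critical_value(n_intervals, alpha)
    return (estimate - c * std_error, estimate + c * std_error)


# Hypothetical facility: slope -0.030 with standard error 0.005, 125 facilities.
crit = simultaneous_critical_value(125)
lower, upper = simultaneous_interval(-0.030, 0.005, 125)
```

For 125 intervals the critical value is roughly 3.5 rather than the familiar 1.96, which is why a slope must be estimated quite precisely before its simultaneous interval excludes zero.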
The facilities were also ranked based on the slope estimates, from most rapid glycemic improvement, or steepest negative slope, to least rapid/glycemic deterioration. We constructed confidence intervals for the ranks based on a slight modification of the simulation approach of Goldstein and Spiegelhalter (1996) and Hsu (1996) (see Appendix for details).
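The basic simulation idea for rank intervals can be sketched as below: draw slopes from their approximate sampling distributions, rank each draw, and take percentiles of the simulated ranks. This does not reproduce the authors' "slight modification," which is described only in their Appendix; independence between facilities, normal sampling distributions, and all names and defaults here are assumptions.

```python
import random


def rank_intervals(estimates, std_errors, level=0.90, n_sim=2000, seed=0):
    """Simulation-based intervals for ranks (rank 1 = most negative slope)."""
    rng = random.Random(seed)
    n = len(estimates)
    simulated_ranks = [[] for _ in range(n)]
    for _ in range(n_sim):
        # One simulated slope per facility, then rank the simulated slopes.
        draws = [rng.gauss(m, s) for m, s in zip(estimates, std_errors)]
        order = sorted(range(n), key=lambda j: draws[j])
        for rank, j in enumerate(order, start=1):
            simulated_ranks[j].append(rank)
    tail = (1.0 - level) / 2.0
    intervals = []
    for ranks in simulated_ranks:
        ranks.sort()
        lower = ranks[int(tail * n_sim)]
        upper = ranks[min(n_sim - 1, int((1.0 - tail) * n_sim))]
        intervals.append((lower, upper))
    return intervals


# Hypothetical slopes: precise estimates give tight rank intervals,
# imprecise ones give wide intervals.
tight = rank_intervals([-0.05, 0.0, 0.05], [1e-6, 1e-6, 1e-6])
wide = rank_intervals([-0.05, 0.0, 0.05], [0.1, 0.1, 0.1])
```

The contrast between the two calls mirrors the pattern reported in the results, where some facilities' ranks are pinned down closely and others span much of the range.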
We also investigated an alternative explanation for the overall declining trend of the A1c values: those with worsening glycemic control preferentially leaving the VA system. For this analysis, we selected patients who were alive at the end of the 2-year period by examining the Beneficiary Identification and Records Locator Subsystem for vital status. We then defined two subgroups of patients: those who likely remained in the system, that is, they utilized services in both years of the observation period (n = 142,426); and those who left it (“out-migrators”), that is, they had utilization only in the first year (n = 51,054). We then fitted the hierarchical growth curve models as described for the main analysis to each subgroup separately for the first year only. A difference between the average A1c decline rates of the two subgroups would indicate whether patients who “out-migrated” were more often experiencing worsening control.
All statistical analyses were performed using SAS version 8.2. The level of significance used for inclusion of covariates was p <.05.
Table 1 compares the sample used in this study to the overall population of VHA users with diabetes in FY99 and FY00. The mean age for subjects in the study sample was 64 years and 98 percent were male, 70 percent were white, 61 percent were married, and 31 percent used insulin at entry into the sample. These differed only slightly from the overall VHA diabetes patient population. For both groups, mean A1c levels decreased monotonically with age. Patients who were unmarried, using insulin, or were African American or Hispanic had higher A1c values compared with their counterparts. A1c levels were slightly but significantly higher in the winter than in the summer and monthly A1c values trended downward over the study period (see Figure 1).
In the growth curve models, the facility slopes represented the average monthly change in patients' A1c in a given home facility. In the model without case-mix adjustment, the facility slopes averaged −0.0148 A1c units per month (SD 0.0193, range −0.074 to 0.042). Of the 125 facilities, 105 had negative slopes, but 20 had monthly rises in A1c. The 95 percent SCIs for the slopes (see Figure 2) exhibited significant differences among the estimates. Additionally, 70 of the negative slopes and five of the positive slopes had SCIs which did not cross zero.
Most facility slope estimates were also negative in the case-mix adjusted model. In this model, 102 of 125 facilities had negative adjusted slopes, with mean −0.0131 (SD 0.0188, range −0.079 to 0.043). The 95 percent SCIs for the case-mix adjusted slopes also exhibited significant differences among the estimates. As well, 68 of the negative slopes and five of the positive slopes had SCIs which did not cross zero. The graph of these slopes and SCIs was remarkably similar to Figure 2 and hence is not shown.
We found some differences between the growth curve parameter estimates with and without case-mix adjustment. The facility intercepts, or initial mean A1c values for patients at each facility, averaged 7.17 (SE 0.252) and 7.84 (SE 0.265) for the models with and without case-mix adjustment, respectively, an unsurprising difference reflecting the entry of more explanatory variables into the model. The facility slopes were much more similar, averaging −0.0131 (SD 0.0188) with and −0.0148 (SD 0.0193) without case-mix adjustment. After case-mix adjustment, three facilities changed from negative to positive slope and no facility changed from a positive to negative slope. The model without case-mix adjustment had working R2=1.77 percent and the case-mix adjusted model had working R2=14.45 percent, reflecting considerable explanatory power for the case-mix adjustment variables.
Table 2 displays the case-mix adjusted model fixed parameter estimates. The case-mix parameter estimates largely mirrored the descriptive statistics. For example, the youngest patients (<55) had higher A1c than older patients. Being unmarried or using insulin was also associated with high A1c. Hispanics and African Americans had higher A1c compared with whites. Lab assay method B yielded slightly but significantly higher A1c than other certified lab assay methods. Seasonal effects were modest but significant in both models.
Higher-order polynomial growth curve models were also explored. These demonstrated little improvement over simple linear growth curve models. For the subanalysis on “out-migrators,” the average A1c decline rate was −0.033 (SE 0.0056) for patients in the “out-migrator” group and −0.027 (SE 0.0046) for the patients who remained in VA care. Normal-based statistical tests suggested that each of these two decline rates differed significantly from zero (p-values <.001 for both), but there was no significant difference between them (p-value=.407). This suggested that the overall declining trend of the A1c values could not be attributed to patients with worsening control “out-migrating.”
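The comparison of the two decline rates can be checked with a short normal-based calculation using the estimates and standard errors reported above. Treating the two subgroup estimates as independent, so that their variances add, is an assumption; the function name is hypothetical.

```python
from math import erfc, sqrt


def slope_difference_test(slope_a, se_a, slope_b, se_b):
    """Two-sided normal test for the difference between two slopes.

    Assumption: the two estimates are independent, so their variances add.
    """
    z = (slope_a - slope_b) / sqrt(se_a ** 2 + se_b ** 2)
    p_value = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Values reported in the text: out-migrators versus patients who remained.
z, p = slope_difference_test(-0.033, 0.0056, -0.027, 0.0046)
```

With these rounded inputs the p-value comes out at about 0.41, in line with the reported .407.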
One method for comparing the quality of health care is through ranking performance based on some specified criterion. In this study, we ranked facilities based on the facility slopes, using the case-mix adjusted model estimates. We limited ranking to those 105 facilities that had at least 50 patients with two or more A1c values to maximize clinical relevance. Figure 3 shows facility ranks with 90 percent two-sided CIs for the 105 facilities, allowing assessment of comparative performance with the specified certainty. Relative performance of some facilities was evaluated fairly precisely, others less so. For instance, the facility ranked 105th (most highly positive slope) had a 90 percent SCI of [101, 105], so that we can be confident it is a poor performer relative to other facilities. However, the facility ranked 102nd had a 90 percent SCI of [34, 105], so we can only ascertain that it is not in the first quartile of performers. For the top 10 performers (ranks 1–10) we obtained 90 percent upper CIs for the ranks and for the bottom 10 performers (ranks 96–105) we obtained 90 percent lower CIs for the ranks. Three out of the 10 best performers had CIs contained within the top decile and another four had CIs contained within the top quartile. Also, four of the 10 worst performers had CIs contained in the bottom decile, and another four had CIs contained in the bottom half.
We also examined change in facility ranks between the models with and without case-mix adjustment. Despite fairly modest differences in growth curve parameter estimates, some substantial changes in ranks were evident when case-mix variables were included. After case-mix adjustment, seven of 105 facilities improved their quartile of performance and nine moved to a lower quartile. Eighty-nine facilities remained at the same quartile with and without case-mix adjustment. Seven of 10 facilities remained at or above the 90th percentile of performance and eight of 10 facilities remained at or below the 10th percentile of performance while four of six remained at or above the 95th percentile and five of six remained at or below the 5th percentile.
Current approaches to comparison of health care systems based on quality-of-care measures rely on cross-sectional methodology because of the need to rely upon chart abstractions in many instances. As cross-sectional evaluation of intermediate outcomes, such as A1c levels, cannot measure within-patient change over time, it is difficult to assess care in stable populations of individuals with chronic disease. Furthermore, quality comparisons for diabetes care currently do not include adjustments for case mix or duration of health system care. For example, health care systems that serve largely older individuals, like Medicare beneficiaries, may have better A1c than those serving inner city minorities because of population age and ethnicity. Additionally, if newer enrollees are healthier, cross-sectional measures can appear better for similar nonquality reasons. Although currently used cross-sectional measures of quality of diabetes care have made major contributions and demonstrable progress toward improvement, they cannot evaluate continued health care given over time to individuals with chronic diseases (Zhang et al. 2000; Weiner and Long 2004). The longitudinal approach presented here measures changes in A1c over time within individuals, effectively overcoming some of the cross-sectional problems and capturing a new domain in quality measurement.
We described considerable facility-level variation in controlling A1c over time for patients within semi-autonomous units within this large national health care system. On average, case-mix adjusted facility-level A1c values changed by −0.314 A1c units (range −1.90 to 1.03) over 2 years, suggesting that overall VHA diabetes care improved. This level of change is clinically meaningful; a patient with A1c of 7.5 percent who began diabetes care at a facility at the start of the study period would on average have had A1c of 7.2 percent by the end of the study period. The majority of the 125 facilities (102 with case-mix adjustment, 105 without case-mix adjustment) demonstrated negative monthly change rates in A1c. The facilities with rising within-patient A1c levels over time might represent targets for quality improvement efforts, especially those whose slope estimates had SCIs that did not cross zero. Such slope estimate comparisons can be highly useful for quality improvement purposes.
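The 2-year figures appear to follow from the case-mix adjusted monthly slopes reported in the results, multiplied by 24 months. This reading is inferred from the numbers rather than stated explicitly, so the arithmetic below should be taken as a consistency check:

```latex
\begin{align*}
\text{mean:}\quad & -0.0131 \times 24 = -0.3144 \approx -0.314 \\
\text{minimum:}\quad & -0.079 \times 24 = -1.896 \approx -1.90 \\
\text{maximum:}\quad & \phantom{-}0.043 \times 24 = 1.032 \approx 1.03 \\
\text{example patient:}\quad & 7.5 - 0.314 = 7.186 \approx 7.2
\end{align*}
```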
We also assigned ranks to facilities based on slope estimates. Identifying facilities in the lowest decile of performance is a common approach when determining which facilities could improve quality of care. Fairness dictates high confidence in such rank assignments, although statistical variability remains a barrier to achieving high confidence in many situations. Even after case-mix adjustment and limiting rank assignment to facilities with sufficient data, we found considerable uncertainty in ranking, similar to other reports (Marshall and Spiegelhalter 1998). Nevertheless, four facilities identified in the lowest decile of performance had rank CIs remaining within the lowest decile of performance, indicating that these facilities likely did perform “worse.” Use of ranks for accountability has been debated but may be highly useful for quality improvement purposes (Schneider and Epstein 1996; Marshall and Spiegelhalter 1998).
We found some effect of case-mix adjustment on diabetes care evaluation, in that the case-mix adjusted slope estimates were modestly different from those without case-mix adjustment. We observed changes in composition of deciles of rank, with five out of 20 facilities identified at or below the 10th percentile of performance without case-mix variables moving into a better decile in the case-mix adjusted model. Consequently, case-mix adjustment may be warranted for longitudinal evaluations of A1c rates of change in an accountability context.
One of the strengths of this study was the ability to describe a large population by using administrative data, which has been demonstrated to have high reliability compared with chart abstraction data (Kerr et al. 2002). Additionally, the VHA has effectively reduced access barriers to care and pharmacy services, diminishing variation because of accessibility that can overwhelm quality-of-care studies in other settings (Jha et al. 2003).
There are also some limitations to this study. First, the restriction of the study to the VHA, which serves primarily older men, may limit the generalizability of our findings. Second, we controlled variation in A1c measurement by only using values from NGSP-certified methods, yet we found that one method had consistently slightly higher values than the other certified methods. Recent standards for NGSP certification have become more stringent than those in place at the time of this study (National Glycohemoglobin Standardization Program [NGSP] 2004). The ongoing implementation of the NGSP nationally should improve future A1c comparability. Third, as an observational study based on existing administrative data, potentially important confounders were not available for analysis. For example, medication adherence may be a very important determinant, but is difficult to ascertain from administrative data, especially for insulin utilization. Also, an important problem in studies like this is loss to follow-up. We performed a subanalysis that demonstrated that patients leaving the system were similar in A1c control to those who remained in VA care. Thus, a “healthy survivor effect” was unlikely to account for the findings. Finally, for ranking purposes, we used only facilities with at least two measures on 50 or more individuals to maximize stability of estimates. Because enrollment in private sector health plans is on average about 2.5 years, the individuals with this much data may be limited for typical Health Plan Employer Data and Information Set samples of 411 chart reviews (HEDIS® 2004). However, increasing emphasis on frequent A1c measurement by the American Diabetes Association (2004) and the National Committee for Quality Assurance (HEDIS® 2004) is likely to increase these numbers in the future.
In summary, we demonstrated significant differences among VHA facilities in A1c change rates for average patients within a given facility. Additionally, our results indicate the likely importance of risk adjustment if the results are to be used for accountability purposes. This use of longitudinal data may offer distinct advantages over cross-sectional methods by distinguishing within-subject effects from population effects, thereby determining patient response to the clinical setting. We suggest evaluation and comparison of these within-patient change rates may prove to be a useful new approach to studying the quality of ongoing care to a population with chronic disease, and may identify those facilities that could be more closely evaluated for best (and more problematic) practices.
This work was supported by a grant from the Veterans Administration Health Services Research and Development Program (IIR 00-072-1) (Dr. Pogach). The research is also partly supported by NSF SES-0241859 (Dr. Xie). Address questions about the statistical methodology to Dr. Xie at mxie@rutgers.edu.