The goal of most practice profiling efforts is to improve quality or efficiency by decreasing variation and providing incentives that move providers toward optimal care practices. However, for such a system to be effective, it requires a basic understanding of the sources of and reasons for this variation so that interventions can be appropriately targeted and resources expended wisely. This analysis demonstrates that there are sizeable differences in the amount of practice variation in diabetes care both across levels of care (PCP, provider group, and facility) and by type of indicator (resource use, process, and intermediate outcome). For nearly all of the indicators examined, the greatest amount of practice variation tended to be attributable to the facility level. For process measures, such as whether an HbA1c was measured, the estimated facility and PCP effects were generally comparable (9 percent attributable to facility, 8 percent attributable to PCP). However, for three resource use measures, the facility effect was at least six times the size of the PCP effect; and, for the intermediate outcomes, the facility effects ranged from two to sixty times the size of the PCP-level effect. The provider-group-level effects were negligible for most of the indicators.
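The attributable-variation figures above come from partitioning total variance across nested levels of care: each level's share is its estimated variance component divided by the total. As an illustrative sketch only, the variance components below are hypothetical, chosen to mirror the reported pattern for a process measure (roughly 9 percent facility, 8 percent PCP, negligible provider group):

```python
# Sketch of the variance decomposition behind the attributable-variation
# estimates. The component values are hypothetical illustrations, not the
# study's actual estimates.

def attributable_variation(components):
    """Return each level's share of total variance, as a percentage."""
    total = sum(components.values())
    return {level: 100.0 * v / total for level, v in components.items()}

# Hypothetical variance components from a nested random-effects model:
# facility, provider group, and PCP effects plus patient-level residual.
components = {
    "facility": 0.09,
    "provider_group": 0.001,
    "pcp": 0.08,
    "patient_residual": 0.829,
}

shares = attributable_variation(components)
for level, pct in shares.items():
    print(f"{level}: {pct:.1f}%")
```

In practice these components would be estimated with a multilevel (hierarchical) model rather than assumed, but the percentage-of-total calculation is the same.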
Indicators with the largest PCP effects tended to be process measures, such as whether a lab test was obtained. Having an LDL-C value successfully measured in the past year was the only indicator for which the PCP effect was substantially greater than the facility effect. The reason for this result is not entirely clear, especially since the PCP effect for whether a lipid profile was obtained is about the same as for LDL-C measured, yet the lipid profile measure also shows a facility effect comparable to the PCP effect. One key difference between the two measures is that an LDL-C value may be missing because LDL-C cannot be calculated for patients with elevated triglycerides, which in turn can result from not fasting prior to sample collection. Therefore, providers who see more patients in the afternoon could have a more difficult time obtaining fasting samples. However, patients seen in the morning have often not been fasting either, and this result could be related to other factors.
Furthermore, an issue of greater importance is whether patients of practitioners who obtain lipid or HbA1c tests more frequently than their peers actually achieve better levels of control. If not, can we legitimately consider these “quality” measures? Ordering a lab test is only a first step; it improves the quality of care only if it leads to better treatment and, ultimately, better health outcomes. Therefore, while process measures (like HbA1c obtained in the past year) may be more feasible to profile, such an effort could also be counterproductive, as well as a waste of time and money, if it allows providers to “game the system” without truly improving meaningful aspects of patient care.
It is somewhat disappointing that the greatest amount of PCP practice variation was observed for processes of care that have a relatively weak association with clinical outcomes (e.g., frequency of HbA1c testing), while there was almost no PCP-level variation in indicators for which there is stronger evidence that improvements should result in better patient outcomes (e.g., lipid and glycemic control). Even for indicators with a small PCP effect, changes in practice affecting only 2 percent of the variation could conceivably have an effect on outcomes (clinical or economic) that is important on an absolute scale. Nevertheless, a sufficient sample size and some detectable variability are necessary to generate accurate profiles; otherwise, resources and attention may be spent trying to address illusory differences in practice, or some practitioners may be unfairly penalized.
On the other hand, we feel that a particularly provocative finding is that a considerably larger PCP effect was detected for the combined LDL-C (intermediate outcome) and statin use (process) measure than for the standard intermediate outcome indicators. There are several very attractive features of such “linked” process–intermediate outcome measures. First, they are more clinically meaningful than either process measures or intermediate outcomes in isolation (Lasker, Shapiro, and Tucker 1992). Second, this type of indicator reflects an activity that is more controllable by the clinician and therefore may be more reliably profiled at the PCP level. Third, the clinician receives immediate credit for his or her actions (such as starting and titrating proven medical therapies) rather than being penalized for caring for sicker patients. Consequently, the use of “linked” measures may help avoid one of the potentially perverse incentives associated with profiling, whereby providers can more easily improve their intermediate outcome profiles by avoiding or deselecting patients than by improving their care (Hofer et al. 1999).
Generally, these results suggest that differences observed with many currently used performance indicators may be related more to facility-level factors, whether organizational characteristics or attributes of the patient population, than to the practice patterns of individual providers or provider groups. Moreover, profiling of PCPs using these indicators is apt to be less accurate, and perhaps less effective, than facility profiling because of the smaller amount of systematic detectable variation at the PCP level. For the process measures with the largest PCP effects (around 8 percent), generating a reliable profile would still require a panel of close to 50 patients per provider. While some PCPs have a panel containing 50 or more patients with diabetes, the number of those patients who belong to a particular health plan is often much smaller, decreasing the effective panel size if profiling is done by a single insurer. In addition, for several of the most important indicators (e.g., lipid control), panel sizes of at least 200 patients would be needed to produce reliable profiles. Two-thirds of the PCPs in this study had a panel of fewer than 50 patients with diabetes, and the median panel size was 24 patients. Similarly, the reported median panel size for a group of 250 PCPs at one HMO was 29 (Hofer et al. 1999
). Most facilities, on the other hand, have several hundred to several thousand patients with diabetes.
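The panel-size figures above follow from a standard reliability calculation for a provider mean. The text does not state the formula, so the Spearman–Brown form is an assumption here: with a provider-level intraclass correlation icc, the reliability of a profile averaged over n patients is n·icc/(1 + (n − 1)·icc), and the minimum panel for a target reliability (commonly 0.8) can be solved directly:

```python
import math

def profile_reliability(n, icc):
    """Reliability of a provider profile averaged over n patients,
    given the provider-level intraclass correlation (Spearman-Brown form)."""
    return n * icc / (1.0 + (n - 1) * icc)

def min_panel_size(icc, target=0.8):
    """Smallest panel size achieving the target reliability."""
    # Solving n*icc / (1 + (n-1)*icc) >= target for n; round before
    # ceiling to avoid floating-point overshoot at exact solutions.
    n = target * (1.0 - icc) / (icc * (1.0 - target))
    return math.ceil(round(n, 6))

# With a PCP effect of ~8% of the variance, a reliable (0.8) profile
# needs roughly 50 patients; at ~2%, roughly 200.
print(min_panel_size(0.08))  # 46
print(min_panel_size(0.02))  # 196
```

This is why a median panel of 24 diabetes patients per PCP is inadequate for most of these indicators, while facility panels of several hundred patients easily clear the threshold.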
The literature on variations in practice patterns is extensive. However, few studies explicitly identify the amount of variation attributable to individual practitioners, and even fewer consider the relative amount of variability attributable to other levels of care. Our findings on variation in individual provider practices are consistent with other published estimates. These estimates range from 2 percent of the variability in resource use at a teaching hospital (Hayward et al. 1994), to 3 percent of the variability in the prescribing rates of general practitioners (Davis and Gribben 1995), to 4 percent of the variability in outpatient visits for patients with diabetes (Hofer et al. 1999), to 10 percent or less of the variance in three measures of patient satisfaction (Sixma, Spreeuwenberg, and van der Pasch 1998). The largest effects to date were found by Orav et al. (1996), who estimated that practitioner effects accounted for 22 percent of the variability in follow-up of high serum glucose and 23 percent in the monitoring of patients on digoxin. However, the practitioner-level effects for other indicators assessed within the same study were much smaller (e.g., 9 percent of the variance for hematocrit screening, 3 percent for cancer screening, and from 4 percent to 15 percent for different pediatric care measures including gastroenteritis, otitis media, urinary tract infection, and well-child care) (Orav et al. 1996).
Case-mix adjustment resulted in minimal changes in the observed pattern of attributable variation across levels. However, case-mix adjustment is not a simple matter. The approach we used might be reasonable for adjusting outcomes such as mortality or resource use since individuals who have more illnesses or certain types of illnesses are often more likely to die or use more resources (Weiner et al. 1996
; Shwartz et al. 1994
). In contrast, the mere fact that a patient has other comorbidities (or certain sociodemographic characteristics) does not mean, for example, that we should stop monitoring glycemic control by obtaining an HbA1c, or that the patient will necessarily have poorer glycemic control. Therefore, without convincing biological, physiological, or epidemiological evidence, it may not be appropriate to adjust for these factors when looking at process measures or intermediate outcomes, as doing so would obscure what might be true differences in care quality (Hanchak and Schlackman 1995
). Additionally, incorporating case-mix information tended to decrease the amount of attributable variance at certain levels for some of the indicators, and the use of a more rigorous adjustment process could make the PCP and facility effects even smaller (Salem-Schatz et al. 1994
). Nonetheless, more work is needed to identify how patient-specific factors influence variability, particularly at the facility level, across a broad range of quality indicators.
There are limitations associated with this analysis. First, there were patients at each facility who were not assigned to a specific PCP and thus were not included in our analysis. These patients were less likely to have the specified tests completed, had slightly poorer values for the intermediate outcome measures, and used fewer resources. However, this pattern held at all facilities, and even though the proportion of unassigned patients varied by facility, we could not identify any systematic site-specific reasons for whether a patient did or did not have an assigned PCP. Additionally, while the results reported are based on models that excluded these unassigned patients, models including this group produced the same patterns and conclusions. Second, the lack of variation attributable to the provider group level may be due to the lack of consistency in group definitions across study sites, although several sites report that they are now actively promoting the development of more functional provider groups. Third, these analyses are based on data from one large regional health care system operated by the VA and may not be representative of other care systems. On the other hand, there are currently very few places outside the VA with the type of data required for such an analysis, and, as discussed above, analyses using data from other health systems suggest the results may be similar (e.g., Orav et al. 1996
; Hofer et al. 1999
). Nonetheless, further studies are needed to examine these issues both inside and outside the VA. Finally, this analysis focuses on diabetes-related measures only and it is possible that different results could be found with other condition-specific or generic indicators (or other aspects of care such as satisfaction and patient–provider communication) used in performance monitoring and profiling systems. Nevertheless, the diabetes indicators are among the most well-developed and widely used measures and this analysis includes some of the most common types of indicators that one is likely to encounter in any sort of profiling system.
In conclusion, this study suggests that a considerable amount of time and resources may be wasted in trying to develop and implement practice profiles of individual primary care providers using many of the currently popular quality indicators. Instead, efforts might be better spent on developing and evaluating indicators that are not designed just to grade providers but to support and promote specific, high-priority clinical actions. Likewise, increased emphasis on constructing and examining facility/clinic level profiles may be more productive. This includes the advancement of information systems for obtaining detailed clinical data; continued support for the creation and use of a consistent measurement set (e.g., HEDIS) that focuses on aspects of care that are truly important for improving patient outcomes; and finally, identifying what factors contribute to performance differences at the facility or clinic level, including characteristics of the patient population and the facility (e.g., academic affiliation, practitioner mix, implementation of special programs or clinics, and referral procedures). These steps will, in turn, help with initiating more targeted and prudent approaches to promoting improvements both in patient care and patient health outcomes.