Pay-for-performance systems and public reporting of performance data are becoming increasingly prevalent in our health care system.3 Although mostly overlooked in discussions of this topic, identifying the appropriate patient population for performance assessment is crucial to the design and implementation of these systems, because different methods of identifying patient populations can lead to large differences in observed performance that are, in fact, spurious. In this study, we present data demonstrating the potential impact of sample design decisions on health center assessments of cancer screening rates when patient populations vary across centers. These findings have potentially broad implications because variation of the type we observed in our health centers and simulated in our analyses is broadly applicable within primary care.
We find several notable results. First, regardless of sample design, large numbers of individuals were eligible for each of the screening tests we examined at all of the health centers. Second, the proportion of health center patients included in the denominator population varied widely, with almost three-fold variation between the most divergent sampling criteria. This implies that fairly subtle differences in selection criteria may lead to very different assessment groups in some settings. Finally, using simulation techniques, we show that variations in patient populations reflecting different levels of continuity of care across health centers could produce large apparent differences in health center performance even when, as in the scenario we designed, actual performance rates are identical for the populations of interest.
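To make this mechanism concrete, the sketch below illustrates the general simulation logic in Python. All parameters (true screening rates, population mixes, center names) are hypothetical values chosen for illustration and are not taken from our study data or from the simulations reported here.

```python
import random

random.seed(42)

# Hypothetical parameters for illustration only (not from the study):
# both centers have the same true screening rates within each patient group,
# but differ in their mix of continuity vs. low-continuity patients.
TRUE_RATE = {"continuity": 0.70, "low_continuity": 0.30}
CONTINUITY_SHARE = {"Center A": 0.85, "Center B": 0.50}
N_PATIENTS = 10_000

def simulate_center(continuity_share):
    """Observed screening rates under two denominator definitions."""
    patients = []
    for _ in range(N_PATIENTS):
        group = "continuity" if random.random() < continuity_share else "low_continuity"
        screened = random.random() < TRUE_RATE[group]
        patients.append((group, screened))

    # Inclusive denominator: every patient seen at the center counts.
    inclusive = sum(screened for _, screened in patients) / len(patients)

    # Restrictive denominator: only continuity patients count.
    cont = [screened for group, screened in patients if group == "continuity"]
    restrictive = sum(cont) / len(cont)
    return inclusive, restrictive

for center, share in CONTINUITY_SHARE.items():
    inclusive, restrictive = simulate_center(share)
    print(f"{center}: inclusive {inclusive:.1%}, restrictive {restrictive:.1%}")
```

Under the restrictive denominator, both hypothetical centers show essentially identical rates (about 70%); under the inclusive denominator, they appear to differ by roughly 14 percentage points, an artifact of population mix rather than of performance.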
The sample design chosen for performance assessment should vary according to the purposes of the assessment.4 Systems used for external purposes, including public release of performance information or pay-for-performance, should strive to minimize differences across physicians (or, in this case, health centers) that arise from differences in the patient populations served rather than from underlying performance. Alternatively, if assessment is used primarily to track performance or to support internal quality improvement efforts, one might prefer a more inclusive definition that would allow centers to identify populations of patients that might benefit from more intensive outreach or other efforts.
Our study also highlights the importance of choosing an appropriate denominator population in order to restore confidence in the validity of performance assessments. Physicians may lose faith in performance assessments if the patient selection criteria do not match their own perceptions of the patients for whom they are responsible. By showing that different sample designs can lead to different assessments, the present results suggest that denominator populations should be chosen carefully based on the target audience of the assessment results. For example, if a goal is to change physician behavior, the sample may need to match physicians' own definition of the patients for whom they are responsible.
Our study is subject to several important limitations. First, our empirical data collection was limited to administrative data from 15 sites at 9 health centers located in a single region of the country. However, the purpose of these data is to illustrate potential variation across health centers, and even with this small sample we saw important variation in the proportions of patients that would qualify under various sample design scenarios. A more important limitation is the lack of empirical data on the actual differences that might be observed among these populations. Although the overall estimated rates we used were consistent with prior data, we were not able to sample sufficient numbers of patients from each of the mutually exclusive groups to estimate this portion of the variation precisely. Nonetheless, we believe that our simulated results represent a reasonable range of what might be observed under these sample designs.
In summary, we find that performance assessments of community health centers are likely to be sensitive to the sample designs used to define the patient populations for whom they are responsible. The potential differences we observe are clinically meaningful and could have serious ramifications for CHCs if tied to funding or public reporting. Moreover, our results have implications for public reporting and pay-for-performance programs more broadly, which face similar methodological challenges. Our results suggest that assessment systems must carefully consider the definitions used to identify accountable populations and how such populations might differ across health care settings.