We found that roughly 65% of all primary care physicians active in the Medicare program work in practices with too few beneficiaries to reliably differentiate their practices’ performance from national quality and cost benchmarks. Only the largest primary care physician practices, which are also the least common, can be expected to have caseloads sufficient to detect statistically significant differences in performance.
To the best of our knowledge, ours is the first study to estimate the percentage of primary care physicians who care for sufficient numbers of Medicare patients to produce statistically reliable performance measurements at the practice level. Other studies have assessed statistical reliability based on the probability that a physician’s performance is the same after repeated measurements.15–17
We used power calculations instead because they are less prone to misclassification bias and better aligned with the goal of value-based purchasing, which seeks to differentiate performance between physicians. Because a sizable majority of primary care physicians work in practices whose caseloads are too small to detect 10% relative differences from national benchmarks on common cost and quality measures, our findings have several implications for value-based purchasing.
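To make the logic of such a power calculation concrete, the following is a minimal sketch of the caseload needed to detect a 10% relative difference between a practice's rate and a national benchmark rate. It assumes a one-sample, two-sided z-test with a 5% significance level and 80% power; these settings, the function name, and the benchmark rates in the usage lines are illustrative assumptions and are not taken from the study.

```python
from math import ceil, sqrt
from statistics import NormalDist


def patients_needed_for_rate(benchmark_rate, relative_diff=0.10,
                             alpha=0.05, power=0.80):
    """Approximate caseload for a one-sample, two-sided z-test of a proportion.

    benchmark_rate: national rate (hypothetical input, between 0 and 1)
    relative_diff:  relative difference to detect (0.10 means 10%)
    alpha, power:   assumed test settings, not values from the study
    """
    p0 = benchmark_rate
    # Practice rate assumed to sit 10% (relative) below the benchmark.
    p1 = benchmark_rate * (1 - relative_diff)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(p0 * (1 - p0))
                 + z_beta * sqrt(p1 * (1 - p1))) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


# Hypothetical benchmark rates: a common process measure and a rare event.
for rate in (0.50, 0.05):
    print(rate, patients_needed_for_rate(rate))
```

Under these assumptions, the required caseload grows sharply as the benchmark event becomes rarer, which is why rare outcomes are especially hard to measure reliably at the practice level.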
Our study suggests that rethinking the approach to performance measurement in ambulatory care may be necessary for the Medicare program. Many existing quality measures are already limited by their focus on particular sub-populations of patients, such as patients with diabetes aged 65 to 75 years and women aged 65 to 69 years. Preventable hospitalization and readmission are relatively rare events, and a multiplicity of factors outside of ambulatory care can influence their occurrence.18
Cost measures generally require large sample sizes because individual patients’ annual expenditures vary widely, even after risk adjustment.19
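The dependence of sample size on cost variability can be sketched with a standard one-sample formula for a mean, in which the required caseload scales with the square of the coefficient of variation (the standard deviation divided by the mean). The coefficient of variation used below is a hypothetical value chosen for illustration, and the 5% significance level and 80% power are assumed settings rather than figures from the study.

```python
from math import ceil
from statistics import NormalDist


def patients_needed_for_cost(cv, relative_diff=0.10, alpha=0.05, power=0.80):
    """Approximate caseload to detect a relative difference in mean cost.

    cv:            coefficient of variation of annual cost per patient
                   (hypothetical input)
    relative_diff: relative difference from the benchmark mean to detect
    alpha, power:  assumed test settings, not values from the study
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # n = ((z_alpha + z_beta) * sd / delta)^2 with sd and delta both
    # expressed as fractions of the benchmark mean.
    return ceil(((z_alpha + z_beta) * cv / relative_diff) ** 2)


# Illustrative only: doubling the variability roughly quadruples the caseload.
for cv in (1.0, 2.0):
    print(cv, patients_needed_for_cost(cv))
```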
Overcoming these limitations is possible by increasing the number of patients eligible for statistical analysis. One approach would be to pool patients from all payer sources. Another way to increase sample size would be to pool patients across a variety of measures rather than for each single measure, although a physician’s performance on different ambulatory care measures is poorly correlated.20–22
Alternatively, performance could be measured over a 2- or 3-year period, although this approach would not provide timely information on performance. New performance measures not available from claims data, such as measures of patient experience or work processes, might be devised, but collecting the data would add expense.
In addition to the current limitations of performance measurement, an interrelated challenge for value-based purchasing is identifying a model of accountability without excluding large portions of physicians and the patients receiving their care. The medical home has recently emerged as an attractive means to explicitly identify an accountable entity.23
Its most basic form would involve a prespecified clinician or set of clinicians responsible for a patient’s care. Yet, our findings suggest that it would be extremely difficult to surmount the limitations of performance measurement for a medical home the size of the typical primary care physician practice.
Therefore, the right model for measuring performance and fostering accountability may vary with the different ways in which physician practices are organized. We found that it is possible to reliably measure 10% relative differences in performance for the care of Medicare patients in practices with 50 or more primary care physicians. Few primary care physicians practice in groups of this size, but reliable measurement might still be possible if, for value-based purchasing purposes, smaller primary care physician practices were aggregated into virtual groups or networks of 50 or more primary care physicians. It has been suggested that organizations such as independent physician associations, physician-hospital associations, or accountable entities centered around the acute care hospital in which physicians admit the majority of their patients could serve as virtual groups.24–26 However, at present, there appear to be few independent physician associations and physician-hospital associations in which physicians are successfully working together to improve quality,27 and other than physician-hospital associations, accountable entities centered around acute care hospitals currently do not exist. A virtual group, therefore, would be effective only insofar as its physicians believed that they shared responsibility for working together to improve the care they provide, even though they remain in independent practices that are smaller than the virtual group.
Our study has several limitations. First, accurately matching physicians to their practices in a given year is challenging because physician turnover makes practice affiliation a moving target. To be more confident about practice assignment, we excluded primary care physicians affiliated with multiple practices; these primary care physicians had the same caseloads as primary care physicians with a single-practice affiliation. It is unlikely that we inaccurately assigned a disproportionate number of primary care physicians to the smallest practices, since the percentage of primary care physicians who were solo practitioners was lower than national survey estimates.28
If we inaccurately assigned a higher proportion of physicians to the largest practices, then any bias in our estimate of the physician distribution would overestimate the number of physicians affiliated with practices that could reliably detect relative differences in performance.
Second, counting all patients treated by primary care physicians in a physician practice, as opposed to assigning patients to unique practices, allowed us to calculate a conservative estimate of the proportion of practices of various sizes that could reliably detect relative differences in costs and quality. Had we assigned each patient to a unique practice, an even smaller number of primary care physician practices would have met the necessary caseload thresholds. Likewise, we did not account for clustering of patients within practices.
Third, larger groups of primary care physicians may be desirable for performance measurement, but there may be advantages to patients receiving care in smaller physician practices.
Finally, we chose a 10% relative difference in performance from national rates as representing a meaningful difference. We thought this level was an appropriate starting point for measuring costs and quality for 2 reasons. First, detecting smaller differences would require more patients and make it more challenging to accurately differentiate practices. Second, increases or decreases in costs or quality greater than 10% of a national average may be unrealistic. Power calculations could use larger relative differences for rare events such as preventable hospitalization and a log transformation to lessen the variability of ambulatory costs. Such approaches would reliably differentiate more primary care physician practices, yet the proportion of primary care physician practices with insufficient caseloads would remain considerable.
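How a log transformation and a larger relative difference could each lower the required caseload can be illustrated with a rough sketch. It assumes, purely for illustration, that annual costs are lognormally distributed, so that a raw-scale coefficient of variation cv corresponds to a log-scale standard deviation of sqrt(ln(1 + cv^2)), and that the comparison on the log scale concerns a practice whose typical cost sits a given percentage above the benchmark. The distributional assumption, the coefficient of variation, and the 5% significance level and 80% power are all hypothetical choices, not values from the study.

```python
from math import ceil, log, sqrt
from statistics import NormalDist


def patients_needed_log_cost(cv, relative_diff=0.10, alpha=0.05, power=0.80):
    """Approximate caseload for a one-sample z-test on log-transformed cost.

    Assumes lognormal costs (an illustrative assumption):
      log-scale sd = sqrt(ln(1 + cv^2))
      log-scale difference = ln(1 + relative_diff)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    sigma_log = sqrt(log(1 + cv ** 2))
    delta_log = log(1 + relative_diff)
    return ceil(((z_alpha + z_beta) * sigma_log / delta_log) ** 2)


# Hypothetical cv of 2.0; compare 10% and 20% relative differences.
for diff in (0.10, 0.20):
    print(diff, patients_needed_log_cost(2.0, relative_diff=diff))
```

Even under these favorable assumptions the required caseload remains in the hundreds or more per practice, consistent with the point that such adjustments help without eliminating the problem.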
In the absence of performance measurement approaches that amass larger numbers of eligible patients at the physician or practice level, the results from this study call into question the wisdom of pay-for-performance programs and quality reporting initiatives that focus on differentiating the value of care delivered to the Medicare population by primary care physicians. Novel measurement approaches appear to be needed for the twin purposes of performance assessment and accountability.