To examine the effects of incentive payment frequency on quality measures in a physician-specific pay-for-performance (P4P) experiment.
A multispecialty physician group practice.
In 2007, all primary care physicians (n=179) were randomized into two study arms differing by the frequency of incentive payment, either four quarterly bonus checks or a single year-end bonus (maximum of U.S.$5,000/year for both arms).
Data were extracted from electronic health records. Quality measure scores between the two arms over four quarters were compared.
There was no difference between the two arms in average quality measure score or in total bonus amount earned.
Physicians' responses to a P4P program with a small maximum bonus do not differ by frequency of bonus payment.
A rapidly growing number of health plans and payers in the United States are adopting some type of pay-for-performance (P4P) mechanism to reward quality of care in addition to the traditional payment structure that rewards volume of care provided. P4P programs are designed to offer financial incentives to groups of physicians (and occasionally to individual physicians) based on their performance on clinical and service quality measures. Implementation of P4P programs, however, cannot readily be informed by empirical evidence, as findings of existing studies are often inconsistent and details on program design and implementation are generally not well documented (Petersen et al. 2006). Organizational-level incentives from payers are generally not distributed to individual physicians or are shared equally (Hillman et al. 1998, 1999; Kouides et al. 1998; Amundson et al. 2003; Roski et al. 2003). Some recent studies examined financial rewards based on individual physicians' performance (Beaulieu and Horrigan 2005; Doran et al. 2006; Levin-Scherz, DeVita, and Timbie 2006; Gilmore et al. 2007), suggesting that even small bonuses could improve performance on incentivized measures. In P4P programs in published studies, incentives were typically provided annually. One study assessed quarterly bonuses, which were given to medical groups rather than individual physicians (Rosenthal et al. 2005).
Reinforcement theory from the psychology literature suggests that changing behavior by incentives is easiest when the linkage between behavior and incentive (or positive reinforcer) is clearest, and the reinforcers are placed in routine (Luthans and Kreitner 1985). Therefore, bonuses based on individual performance and provided promptly might be more effective than group-level and delayed bonuses. On the other hand, theories in organizational behavior suggest that the amount of payment contingent on performance, apart from any system effect, would be an important determinant of the effectiveness of incentives (Lawler 1968). Empirical evidence in nonhealth areas suggests that increasing the anticipated bonus amount may improve performance (Locke et al. 1980; Khan and Sherer 1990). To our knowledge, however, there is no empirical evidence on whether the frequency of payment by itself, given a fixed maximum achievable bonus with periodic reporting, would have an impact on performance.
Our study examined this question by taking advantage of a rare opportunity to conduct a randomized experiment within a P4P program. The notion we test here is that a check attached to the periodic quality measure report will garner a physician's attention and affect a physician's performance more than a quality measure report without immediate financial feedback. As a physician-specific P4P program for primary care physicians (PCPs) was being implemented in a large group practice, the physicians were randomized into one of two study arms differing only in the frequency of payment (quarterly versus year-end). Physicians in both arms continued receiving quarterly reminders on quality reporting, and all were newly eligible for the performance-based individual incentive payments. The overall favorable effect of the physician-specific P4P program on the incentivized quality measures is reported elsewhere (Chung et al. 2009); here we focus on whether the frequency of payment alone makes a difference in physicians' response to the incentive program.
Our study adds to the literature on physician P4P in two important ways. First, we provide the first empirical evidence on a specific method of implementing physician-specific performance-based payment in a practice setting. If increasing frequency of bonus payment, which incurs additional cost, does not make an improvement, a less frequent payment option would be more cost-effective. Second, in comparing the two methods, we use a randomized controlled design that allows for assessing the causal relationship between incentive frequency and performance improvement.
The study was conducted at Palo Alto Medical Clinic (PAMC) of the Palo Alto Medical Foundation (PAMF). The PAMF is a not-for-profit health care organization in Northern California, contracting in 2007 with three multispecialty physician groups. The PAMC currently provides care in five clinics operating in Fremont, Los Altos, Palo Alto, Redwood City, and Redwood Shores. Approximately 13 percent of the general population in the underlying geographic area is enrolled with PAMC. The data of this study came from the electronic health records (EHR) that have been in use at the study sites since 2000.
In 2007, PAMC implemented a physician-level P4P incentive program where the amount of the incentive was determined by individual physician performance. Physicians at PAMC are normally paid based on relative value-based units of service. All the PCPs at PAMC in family medicine, internal medicine, or pediatrics participated in the incentive program, and all the patients of the participating PCPs were considered for the performance evaluation, regardless of insurance plan.
The program at PAMC had two arms differing only by the frequency of incentive payment. Participating physicians were randomized to one of these two arms: one arm received four quarterly bonuses and the other received a single year-end bonus. Bonuses were calculated each quarter, paid to the quarterly group and accumulated for the year-end group. The incentive payments were included with the regular paychecks. The maximum achievable bonus was U.S.$5,000/year or U.S.$1,250/quarter, representing about 2 percent of the annual PCP salary.
The incentive program targets and incentives were developed via a consensus process by representatives of the participating physician departments. This included definitions of measures, eligible and qualifying patients for each measure, and formulae for incentive calculation. See Chung et al. (2009) for additional details on the program and quality measures.
PCPs at PAMC had been receiving reports, with quarterly updates, on performance on a variety of quality measures since 2003. With this reporting system, physicians were alerted by e-mail with an electronic link to a detailed quality score workbook describing their scores for each quality measure, peer physicians' scores (individually identified), and rank relative to other physicians in the department. With the implementation of the incentive program in 2007, the plan was for the quarterly score report to be sent on the 24th day of the month following the evaluation quarter and the paycheck to be delivered 2 weeks after the quarterly report. In the first quarter, however, there was a 2-month administrative delay in both the calculations and check. The quarterly group, therefore, did not receive the first quarter's bonus until July 2007 and received the second quarter's bonus 1 month after the first quarter's bonus. The other reports and bonuses were properly timed as planned.
Nine of 15 measures implemented in the incentive program were selected from measures routinely monitored and reported to the physicians for several years. These included three outcome measures for diabetes control (blood pressure ≤130/80 mmHg, HbA1C<7%, and LDL<100 mg/dL) and six process measures (prescription of asthma controller, cervical cancer screening, Chlamydia screening, colon cancer screening, whether the height and weight were measured and recorded, and documentation of tobacco use history).
The other six measures, all specific to pediatric patients, were newly adopted for the 2007 program. As physicians began to use the new measures, many raised questions about specific definitions of these metrics, and some metrics were modified during the year. We therefore excluded these pediatrics measures and the assessments of all pediatricians in the present analyses to ensure meaningful comparisons of scores over the four quarters.
Physicians received either four quarterly bonuses or single year-end bonuses; both were based on individual composite points calculated quarterly by an algorithm developed by the incentive program leadership. For each quality measure, the percent score (i.e., numerator/denominator × 100, where numerator was the number of patients receiving recommended care and denominator was the number of patients eligible for the recommended care) was calculated. Thresholds for performance ranging from minimally acceptable (1) to stretch (3) points were set based on previous scores (reflecting all the physicians in each department) for that measure. Only measures for which a physician had six or more patients in the quarter were considered qualifying. (The purpose of physicians setting the minimum criterion for the denominator in the incentive design was to prevent the percent score from being dominated by a few cases; we followed their approach.) The bonus for each quarter was calculated as (points earned for the qualifying measures/maximum achievable points) × U.S.$1,250, where the maximum achievable points=3 × number of qualifying measures.
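The quarterly bonus rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the program's actual algorithm: the `Measure` type, the function names, the threshold values in the example, and the convention that a score below the lowest threshold earns zero points are all assumptions made for the sketch. Only the six-patient minimum, the three-point maximum per measure, and the U.S.$1,250 quarterly cap come from the text.

```python
from typing import NamedTuple, Sequence

MAX_QUARTERLY_BONUS = 1250.0  # U.S.$ per quarter, from the program description
MIN_ELIGIBLE_PATIENTS = 6     # a measure qualifies only with six or more patients
MAX_POINTS_PER_MEASURE = 3    # the "stretch" threshold


class Measure(NamedTuple):
    """One quality measure for one physician in one quarter (hypothetical type)."""
    numerator: int                # patients receiving recommended care
    denominator: int              # patients eligible for recommended care
    thresholds: Sequence[float]   # ascending cut points for 1, 2, and 3 points


def percent_score(m: Measure) -> float:
    """Percent score: numerator / denominator x 100."""
    return 100.0 * m.numerator / m.denominator


def points_earned(m: Measure) -> int:
    # Assumption: one point per threshold met or exceeded, zero below the lowest.
    return sum(percent_score(m) >= t for t in m.thresholds)


def quarterly_bonus(measures: Sequence[Measure]) -> float:
    """(points earned / maximum achievable points) x U.S.$1,250."""
    qualifying = [m for m in measures if m.denominator >= MIN_ELIGIBLE_PATIENTS]
    if not qualifying:
        return 0.0  # assumption: no qualifying measure means no bonus
    earned = sum(points_earned(m) for m in qualifying)
    max_points = MAX_POINTS_PER_MEASURE * len(qualifying)
    return earned / max_points * MAX_QUARTERLY_BONUS


# Illustrative data only; thresholds and patient counts are invented.
example = [
    Measure(45, 60, (60, 70, 80)),  # 75 percent -> 2 points
    Measure(9, 10, (70, 80, 90)),   # 90 percent -> 3 points
    Measure(3, 5, (50, 60, 70)),    # fewer than six patients -> not qualifying
]
print(round(quarterly_bonus(example), 2))  # 1041.67
```

In this example the physician earns 5 of 6 achievable points on two qualifying measures, so the third measure neither adds to nor subtracts from the bonus.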
Figure 1 is an example of percent scores, thresholds, and achieved points for the “diabetes HbA1C control” measure among family medicine providers, as it appears in the provider workbook screen. All the physicians received instructions on how to use and interpret the quality score workbook that allows them to look at their own patients who were included in their numerators and denominators.
We used the percent score (0–100) of each qualifying measure as a main dependent variable. (This eliminates the “boundary effects” potentially arising from scores close to the threshold levels that affected bonuses.) We tested for a difference in the percent score in each quarter or in the trend of improvement over the four quarters between quarterly and year-end payment groups. We also examined the quarterly trend in bonus amount (either paid to the quarterly group or accumulated for the year-end group) between the two arms; this measure reflects the impact of thresholds on bonus payments. We included site-fixed effects and interaction terms between site and intervention arm to examine whether the intervention effect differed across clinic sites.
In all analyses, the first quarter score was the referent point. The unit of analysis was physician for each quarter. For each physician, observations from all nine measures were assessed. Within-physician correlations were taken into account using a physician random effects model. Statistical significance was considered at the p<.05 level. All statistical analysis was performed using STATA 10.0 (College Station, TX).
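One way to write down a model consistent with this description is given below. It is a sketch under stated assumptions, not the authors' exact specification: the symbols, the linear form, and the normality of the random effect are ours, and the text does not say whether measure-level indicators were included, so they are omitted here.

```latex
% y_{imq}: percent score of physician i on measure m in quarter q
% A_i: 1 if physician i is in the quarterly payment arm, 0 if year-end
% S_{is}: 1 if physician i practices at site s (one site omitted as reference)
% Quarter 1 is the referent, so quarter indicators run over k = 2, 3, 4
\begin{equation*}
y_{imq} = \beta_0
  + \sum_{k=2}^{4} \beta_k \,\mathbf{1}[q = k]
  + \gamma A_i
  + \sum_{k=2}^{4} \delta_k \, A_i \,\mathbf{1}[q = k]
  + \sum_{s} \theta_s S_{is}
  + \sum_{s} \phi_s \, S_{is} A_i
  + u_i + \varepsilon_{imq},
\qquad u_i \sim N(0, \sigma_u^2)
\end{equation*}
```

Under this notation, the coefficients on the arm-by-quarter terms capture whether the trend over the four quarters differs between arms, the site-by-arm terms capture site-specific variation in the intervention effect, and the physician random effect accounts for within-physician correlation.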
Among the 179 physicians initially randomized, 167 physicians were included in the program for all four quarters; 12 did not participate in the program for the whole year for various reasons (some left the medical group, others were on medical leave or sabbatical leave, and still others were working part time and did not have enough qualifying patients). After excluding physicians in the pediatrics department (n=43) who had very few eligible patients for the adult-focused measures, 124 physicians were included in the present study.
Among the 124 physicians, nearly all (n=120) had one or more qualifying measures all four quarters. Slightly less than half (44 percent) were in the quarterly paid group. Reflecting the random assignment, there was no difference in the average prebaseline performance scores between the two arms.
The frequency of payment, quarterly or year-end, did not affect the average quality score over the four quarters (result table not presented). The average quality score in each quarter, by study arm, is plotted in Figure 2. In the plot, the y axis is the average of the scores of the nine measures, weighting each measure equally. While there appeared to be a slight increasing trend in the third and fourth quarter scores as compared to the first and second quarter scores, which may be associated with the delay in the payment and reporting, the trend was not statistically significant.
Similarly, the average bonus amount did not differ by the frequency of payment, as seen in Figure 3. Total bonus amount received varied substantially across physicians, ranging from U.S.$425 to U.S.$4,484 (average U.S.$2,868, standard deviation U.S.$724). Trends over the four quarters in bonus amount (either received for quarterly arm or accumulated for year-end arm) between the two arms did not differ. There was no site-specific variation in the effect of payment frequency on either quality score or bonus amount.
Despite the promise and growing popularity of P4P as a method to compensate physicians, our current understanding of the specifics of how to design and implement such programs in real practice settings is lacking. The present study addresses frequency of payment, which is a potentially important factor to consider in implementing P4P in a clinical setting. In our P4P experiment, where incentives were given to individual physicians for their performance on a variety of quality measures, we found no differential improvement in overall quality measure scores based on the frequency of payment.
The main limitation of our study is that the impact of quarterly payment cannot be isolated from the impact of quarterly reporting. An alternative study design to assess the impact of reporting concurrently would use a third arm that only received year-end reporting along with bonus payment, but it was considered unethical to withhold the quarterly reports as part of a trial. Although the quarterly report had been sent to the physicians for several years, practice directors reported to the investigators that physicians in both arms suddenly began raising questions regarding the quarterly report after the implementation of the physician-level incentive program. While the purpose of the quarterly report was to remind physicians about quality monitoring, it may have become more effective in conjunction with the bonus program. If so, that is, if the bonus payment changed physicians' response to quality reporting, the change should be interpreted as an incremental effect of bonus payment. The question the present study sought to answer is whether the incremental effect differed by the frequency of payment, other factors being equal, including continuation of the existing quarterly reporting.
Findings of our study should be interpreted within the context of the setting: physicians in this large medical group have been exposed for several years to reporting of the measures that were the focus of the bonus program. With information technology tools already in place, physicians in both study arms could easily identify their own eligible patients for each measure, not only to check the validity of the measures but also to see for which patients improvements might be necessary. They could also compare their performance with that of other physicians. The effect of frequency of payment might have been different in another setting.
The maximum bonus offered in this P4P program was roughly 2.5 percent of the average physician's annual pay, and the average bonus (U.S.$2,868) was 1.4 percent of the average physician's annual pay. The magnitude of bonuses used in other studies examining physician-specific P4P incentives for quality improvement varies widely. Larger bonuses seem to be more effective in changing physicians' practice: a U.K. study showed significant improvement in measured quality with exceptionally generous bonuses (average of 35 percent of physician income; Doran et al. 2006). Thus, one reason our study did not show any effect of frequency of payment may be that the achievable bonus amount was too small, regardless of the frequency of payment.
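As a back-of-the-envelope check, the two percentages quoted in this paragraph are mutually consistent. The implied average annual pay below is an inference from those rounded figures, not a number reported in the study.

```latex
% Implied average annual pay, inferred from the U.S.$5,000 maximum at roughly 2.5 percent
\[
\text{implied average pay} \approx \frac{5{,}000}{0.025} = 200{,}000 \ \text{(U.S.\$)}
\]
% Average bonus as a share of that implied pay
\[
\frac{2{,}868}{200{,}000} \approx 0.0143 \approx 1.4\ \text{percent}
\]
```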
We did not formally analyze the costs of implementing quarterly versus annual bonuses. Because the P4P experiment was built on the existing performance evaluation and reporting system, incremental costs for the preparation of the bonus calculation were small. Quarterly reporting is clearly more expensive than annual, but the ongoing feedback is perceived to be valuable; the incremental cost of sending three additional checks per physician per year is small.
In conclusion, the frequency of payment itself, with no difference in the maximum bonus amount or in the frequency of reporting, may not substantially affect physicians' response to a P4P program. Future work should further investigate the effect of varying the frequency of reporting under the same financial incentive scheme, as well as the effects of varying bonus amounts.
Joint Acknowledgment/Disclosure Statement: This manuscript is derived from work supported under a contract with The Agency for Healthcare Research and Quality (contract #HHSA290200600023I). We appreciate inputs from Laurel Trujillo, MD, and Tomas Moran, MS, who provided data and contextual information about the P4P program, and Sally Kraft, MD, and Dan Dohan, PhD, research collaborators at the earlier stage of the program implementation.
Additional supporting information may be found in the online version of this article:
Appendix SA1: Author Matrix.