Compliance improved for all reported performance measures between Year 1 and Year 3 of the P4P demonstration project, both in hospitals receiving financial incentives and in other hospitals that were subject only to public reporting. P4P hospitals experienced larger unadjusted gains on some but not all targeted measures (Table 1).
Table 1. Unadjusted Average Process Compliance by Pay-for-Performance (P4P) and Public-Reporting-Only Hospitals, Fiscal Years 2003–2005
As shown in Table 1, P4P and reporting-only hospitals increased performance across both easy and difficult measures. Overall gains are nearly identical for P4P and early adopter non-P4P hospitals for AMI (3.5 percentage points versus 3.2). Early adopter hospitals actually make larger gains in use of ACE-inhibitors for left ventricular systolic dysfunction (LVSD), which is classified as a difficult task. P4P hospitals do exhibit larger gains in composite scores for both heart failure (7.8 versus 6.8 percentage points) and pneumonia (11.5 versus 10.1 percentage points) relative to the early adopter non-P4P hospitals. Contrary to the prediction that P4P hospitals would reduce effort on difficult tasks, incentivized hospitals make larger gains on hard tasks for both heart failure and pneumonia than comparison hospitals do.
Table 2 reports results from the first set of random effects regressions comparing P4P hospitals to public reporting early adopters. P4P hospitals score higher on easy tasks than control hospitals for AMI (α=0.93 percentage points, SE=0.36), heart failure (α=3.12, SE=2.68), and pneumonia (α=0.05, SE=0.21), though only the AMI effect is statistically significant. The differences between P4P and control hospitals for difficult tasks are small and insignificant. The P4P coefficient for heart failure is negative (α=−0.44, SE=0.90) as expected, but positive for AMI (α=0.44, SE=1.48) and pneumonia (α=1.04, SE=0.72). The P4P_h × time effects are positive and statistically significant for the hard pneumonia composite, indicating that P4P hospitals improve more rapidly on difficult tasks than unincentivized hospitals, contrary to our expectations.
Table 2. Pay-for-Performance (P4P) Participation, Initial Performance, and Hospital Process Compliance, Early Reporters Only: Fiscal Years 2003–2005
The regression evidence confirms our observation from the descriptive statistics: hospitals generally did not respond to P4P incentives as expected. Point estimates are small in magnitude; for example, the 0.93 percentage point increase in the easy AMI composite represents about 1 percent of the Year 1 mean score.
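The significance statements for the easy-task P4P coefficients can be checked directly from the reported point estimates and standard errors. The following is a minimal sketch, assuming a two-sided test of a zero coefficient under a standard normal approximation; the test actually used in the regressions is not stated in this section, and the function names and the 0.05 threshold are illustrative choices rather than part of the original analysis.

```python
import math


def two_sided_p(coef, se):
    """Two-sided p-value for H0: coef = 0, using a normal approximation (assumption)."""
    z = coef / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Easy-task P4P coefficients (percentage points) and standard errors as reported in the text.
easy_task_estimates = {
    "AMI": (0.93, 0.36),
    "heart failure": (3.12, 2.68),
    "pneumonia": (0.05, 0.21),
}


def significant_conditions(estimates, level=0.05):
    """Return the conditions whose coefficient differs from zero at the given level."""
    return [
        name
        for name, (coef, se) in estimates.items()
        if two_sided_p(coef, se) < level
    ]


if __name__ == "__main__":
    for name, (coef, se) in easy_task_estimates.items():
        print(f"{name}: z = {coef / se:.2f}, p = {two_sided_p(coef, se):.3f}")
    print("Significant at 0.05:", significant_conditions(easy_task_estimates))
```

Under this approximation only the AMI coefficient clears the 0.05 threshold, which agrees with the statement that only the AMI effect is statistically significant.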
The P4P incentives in PHQID are most relevant for high and low performers. Contrary to our expectations, we fail to find statistically significant effects for P4P hospitals at either end of the initial quality distribution relative to hospitals with average scores. In sensitivity analysis, we fail to observe significant P4P effects in models estimated separately by quintile. We compare P4P hospitals to early-adopter hospitals because we are concerned that other unobserved hospital characteristics, such as motivation to improve and prior improvement activity, correlate with P4P status and generate biased estimates. Early adopter public reporting hospitals have somewhat higher initial composite quality scores for all three incentivized conditions. In sensitivity analysis, models are estimated using the full public-reporting sample as a control group (Table SA1). Our results are essentially unchanged, though P4P coefficients are slightly larger in magnitude.
P4P incentives may be more salient for larger hospitals that are eligible for larger potential bonus payments. Table 3 reports regression results controlling for hospital volume and a volume × P4P_h interaction. We first omit the initial performance quintiles, which were insignificant determinants of process score in the first set of regressions. P4P main effects are positive and statistically significant for both AMI and heart failure easy tasks (α_AMI=1.24, SE=0.43; α_HF=5.2, SE=2.52). While the P4P effect remains small and statistically insignificant for the easy pneumonia composite, P4P hospitals exhibit significantly higher performance on the difficult pneumonia tasks (α_PN=2.20, SE=0.99).
Table 3. Pay-for-Performance (P4P) Participation, Volume, and Hospital Process Compliance among Early Reporters, Fiscal Years 2003–2005
Magnitudes of the P4P point estimates are reduced when we reintroduce initial performance quintiles Q_h and P4P_h × Q_h. Only the AMI easy P4P effect remains statistically significant, indicating a 1 percentage point higher process compliance score among P4P hospitals relative to public reporting only. P4P hospitals also improve compliance with hard measures of pneumonia care by an additional 2 percentage points in each of the second 2 years of the demonstration, the only significant difference in performance over time between P4P and comparison hospitals. The full P4P effect for the heart failure easy and pneumonia hard measures, including all interaction terms, is also both statistically insignificant and inconsistently signed for most combinations of hospital size, year, and initial performance quintile.
We sought additional evidence as to whether hospitals strategically substitute toward easy tasks in order to improve their scores. In Table 4, we examine the distribution of relative numbers of eligible patients across measures to understand the potential for effort substitution across targeted tasks. Hospitals have, on average, 5.9 times as many patients eligible for aspirin at admission for AMI (an easy measure) as are eligible for an ACE-inhibitor among those with LVSD (a difficult measure). This implies that if the average hospital faces marginal costs to provide an ACE-inhibitor for those with LVSD that are >5.9 times the marginal costs of aspirin at admission, it should substitute effort from the hard to the easy measure in order to maximize the P4P composite. The marginal cost ratio plausibly exceeds 5.9 for the average hospital in practice, yet we do not observe substitution. For some task pairs, the easy:difficult ratio is <1. Unless the harder task was substantially cheaper (at the margin) than the easy task, we would expect score-maximizing hospitals to have fully substituted toward the easier task by Year 3.
Table 4. Task Substitution Ratios of Difficult versus Easy Tasks
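The substitution threshold described for the aspirin and ACE-inhibitor pair can be written as a simple decision rule: a score-maximizing hospital favors the easy task whenever the hard-to-easy marginal cost ratio exceeds the easy-to-hard ratio of eligible patients. The sketch below only encodes that stated rule. The function name, the patient counts, and the cost figures are hypothetical, chosen so that the eligibility ratio equals the reported 5.9, and marginal costs are assumed constant for simplicity.

```python
def favors_easy_task(n_easy, n_hard, mc_easy, mc_hard):
    """Apply the threshold rule from the text.

    n_easy, n_hard: patients eligible for the easy and the hard measure.
    mc_easy, mc_hard: marginal costs of the easy and the hard task
                      (hypothetical inputs; not reported in the study).
    Returns True when the marginal cost ratio exceeds the eligibility ratio,
    i.e. when substituting effort toward the easy task raises the composite.
    """
    eligibility_ratio = n_easy / n_hard  # e.g. 5.9 for aspirin vs. ACE-inhibitor
    cost_ratio = mc_hard / mc_easy
    return cost_ratio > eligibility_ratio


if __name__ == "__main__":
    # Hypothetical counts that reproduce the reported average ratio of 5.9.
    n_easy, n_hard = 590, 100
    # Illustrative cost pairs only: hard task 10 times, then 3 times, as costly.
    print(favors_easy_task(n_easy, n_hard, mc_easy=1.0, mc_hard=10.0))
    print(favors_easy_task(n_easy, n_hard, mc_easy=1.0, mc_hard=3.0))
```

For task pairs whose easy:difficult eligibility ratio is below 1, the same rule favors the easy task unless the hard task is substantially cheaper at the margin, which matches the reasoning in the text.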
In regression analyses, we confirm that hospitals which face a lower marginal cost ratio for substitution (and therefore greater incentives to substitute) were not more likely to substitute toward easier tasks under P4P. We estimate our comprehensive specification of equation (1) including the full set of initial performance, year, and volume P4P interactions separately for each of the incentivized tasks (Table SA2). Among individual measures, the P4P main effect is statistically significant only for two of the easy AMI measures (aspirin at arrival and discharge) and one of the easy pneumonia measures (vaccination status). While hospitals in the highest quintile of performance score do not differentially respond to P4P incentives, hospitals in the lowest performance quintile for heart failure care exhibit higher scores for one easy (smoking cessation counseling, α=6.8 percentage points, SE=2.7) and one difficult measure (left ventricular assessment, α=3.0 percentage points, SE=1.46).
We conducted additional sensitivity analyses to confirm our results. Our main findings (that P4P is associated with a 1 percentage point gain in compliance for easy AMI tasks but is not related to performance on heart failure or pneumonia measures) are robust across multiple specifications. Findings persist when we reestimate equation (1) using the natural logarithm of compliance score as the dependent variable and in a seemingly unrelated regression model with the change in score as the dependent variable, which allows the error terms to correlate across conditions.