Objective To investigate whether the incentive scheme for UK general practitioners led them to neglect activities not included in the scheme.
Design Longitudinal analysis of achievement rates for 42 activities (23 included in incentive scheme, 19 not included) selected from 428 identified indicators of quality of care.
Setting 148 general practices in England (653500 patients).
Main outcome measures Achievement rates projected from trends in the pre-incentive period (2000-1 to 2002-3) and actual rates in the first three years of the scheme (2004-5 to 2006-7).
Results Achievement rates improved for most indicators in the pre-incentive period. There were significant increases in the rate of improvement in the first year of the incentive scheme (2004-5) for 22 of the 23 incentivised indicators. Achievement for these indicators reached a plateau after 2004-5, but quality of care in 2006-7 remained higher than that predicted by pre-incentive trends for 14 incentivised indicators. There was no overall effect on the rate of improvement for non-incentivised indicators in the first year of the scheme, but by 2006-7 achievement rates were significantly below those predicted by pre-incentive trends.
Conclusions There were substantial improvements in quality for all indicators between 2001 and 2007. Improvements associated with financial incentives seem to have been achieved at the expense of small detrimental effects on aspects of care that were not incentivised.
Over the past two decades funders and policy makers worldwide have experimented with initiatives to change physicians’ behaviour and improve the quality and efficiency of medical care.1 Success has been mixed, and attention has recently turned to payment mechanism reform, in particular offering direct financial incentives to providers for delivering high quality care.2 In 2004 in the UK the Quality and Outcomes Framework (QOF) was introduced—a mechanism intended to improve quality by linking up to 25% of general practitioners’ income to achievement of publicly reported quality targets for several chronic conditions.3
Should these incentives succeed, the potential benefits for patients with the relevant conditions are considerable.4 Incentives might also improve general organisation of care, benefiting processes and conditions beyond those covered by the incentives.5 Financial incentives have several potential unintended consequences, however. For example, they might result in diminished provider professionalism, neglect of patients for whom quality targets are perceived to be more difficult to achieve, and widening of health inequalities.6 7 Doctors might also focus on the conditions linked to incentives and neglect other conditions8 or, where certain activities are incentivised within the management of a particular condition, might neglect other activities for patients with that condition.
Practices in England generally performed well on incentivised activities in the first year of the UK incentive scheme, and overall performance improved over the next two years.9 10 11 It is not known, however, how much of this improvement is attributable to the incentive scheme and how much to underlying trends in quality improvement. There is also little evidence on the impact of the incentives on activities lying outside the incentive scheme.
Investigating these issues is problematic because performance data were not routinely collected before the scheme’s implementation, and afterwards data were collected only at the practice level for activities included in the framework. Evidence from small patient groups suggests that achievement of incentivised activities did accelerate on the introduction of the scheme, with some positive spillover to non-incentivised activities for incentivised conditions in the first year12 but not for non-incentivised conditions.13
The aim of our study is to use a longitudinal dataset at the patient level to examine changes in performance after the introduction of the incentive scheme for processes that became part of the incentive scheme and for processes that did not, and to compare the two groups.
The Quality and Outcomes Framework, introduced in 2004, links up to 25% of UK family practitioner income to performance on 76 clinical quality indicators and 70 indicators relating to organisation of care and patient experience.3 Of the clinical indicators, 10 relate to maintaining disease registers, 56 to processes of care (such as measuring disease parameters and giving treatments), and 10 to intermediate outcomes (such as controlling blood pressure). Indicators are periodically reviewed, and can be adjusted or dropped from the scheme altogether, with new indicators being introduced. Physicians are permitted to use their clinical judgment to exclude inappropriate patients from achievement calculations (“exception report”). Practices are awarded points based on the proportion of patients for whom targets are achieved, between a lower achievement threshold of 40% for most indicators (that is, practices must achieve the targets for over 40% of patients to receive any points) and an upper threshold that varies according to the indicator. In 2007 each point earned the practice £125 (€141; $202), adjusted for patient population size and disease prevalence. A maximum of 1000 points was available, equating to £31000 per physician.
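The payment rule described above can be illustrated with a minimal sketch. It assumes that points rise linearly between the lower and upper thresholds, which the text does not state explicitly. The function name `qof_points`, the 90% upper threshold, and the 10 point indicator are hypothetical values chosen for illustration, and the adjustment for list size and disease prevalence is omitted.

```python
def qof_points(achievement, lower, upper, max_points):
    """Points earned on one indicator.

    Assumption (not stated in the text): points scale linearly between
    the lower and upper achievement thresholds.
    """
    if achievement <= lower:
        # Practices must exceed the lower threshold to earn any points.
        return 0.0
    if achievement >= upper:
        # No additional points are available above the upper threshold.
        return float(max_points)
    return max_points * (achievement - lower) / (upper - lower)


# Value per point in 2007, before adjustment for list size and prevalence.
POUNDS_PER_POINT = 125

# Hypothetical indicator: 10 points, thresholds of 40% and 90%,
# with the target met for 65% of eligible patients.
points = qof_points(achievement=0.65, lower=0.40, upper=0.90, max_points=10)
payment = points * POUNDS_PER_POINT
print(f"points: {points:.2f}, unadjusted payment: £{payment:.0f}")
```

Under this rule a practice above the upper threshold gains nothing financially from further improvement on that indicator.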
Patient level data were extracted from the General Practice Research Database (GPRD), which contains anonymised, patient based data on morbidity, prescribing, treatment, and referral collected from over 500 general practices, covering about 7% of the UK population (4.4 million patients).14 Data are in Read code format, a hierarchical system used to code clinical data. Additional data on prescriptions and test results are available as free-text entries. We selected a sample of 148 practices that provided data to the GPRD continuously between January 2000 and December 2007, structured to include a range of list (patient panel) sizes. Selected practices were nationally representative in terms of patient sex and age distribution and area socioeconomic deprivation but had a relatively large average list size, reflecting a bias towards larger practices in the GPRD (table 1). Overall, the selected practices performed marginally better than national practices on the clinical indicators in the Quality and Outcomes Framework (table 1). A random selection of 4500 patients registered for at least one day between 1 January 2000 and 31 December 2007 was drawn from each practice. For practices with fewer than 4501 patients, all patients were selected. The final sample consisted of 653500 patients. Patients with relevant conditions were identified from their diagnostic Read codes (see appendix on bmj.com). Patients for whom targets were met were identified from the relevant Read codes and free-text terms.
TD, SC, JMV, CS, and MR initially identified 428 quality of care indicators—combinations of processes of care and patient groups—from published indicators of quality of care that had been developed by a recognised method or which had broad professional consensus (for example, derived from the British National Formulary, published national guidelines, or authoritative statements from specialist societies).15 16 17 18 19 20 21 22 23 24 25 26
Indicators were rejected if the activity did not fall under the remit of all general practices or if data to assess performance were not available in practices’ electronic patient records. In order to exclude, as far as possible, the effects on indicator achievement of factors other than inclusion in the incentive scheme, we also rejected indicators for which there was, during the period of observation, a substantial change in the evidence base, in coding procedures, or in availability of data from electronic laboratory reports. Our analysis measured the effects of incentives over both the short term (one year) and longer term (three years), and we therefore also rejected indicators that were incorporated into, or dropped from, the incentive scheme after the first year. Examples of rejected indicators are given in the appendix.
After this process, 42 indicators were selected for the study. The selected indicators were divided into two categories, incentivised indicators (that is, those included in the Quality and Outcomes Framework, where the particular process was incentivised for the particular patient group) or non-incentivised indicators (that is, those not included in the Quality and Outcomes Framework). Using an a priori schema, we further classified the indicators into two types of clinical activity: measurement (such as monitoring blood pressure) or prescribing (such as prescribing β blockers). Research has shown that practices approach these types of activities differently under the incentive scheme: for example, rates of exception reporting (identifying patients as unsuitable for a clinical indicator) averaged 3% for measurement activities and 13% for treatment or prescribing activities in 2005-6.27 The levels of achievement that are practically attainable also differ between the activity types, with achievement rates generally lower for prescribing than for measurement activities. We therefore analysed indicators of the two activity types separately. Indicators of a third activity type (intermediate outcomes) were excluded as no examples of non-incentivised outcome indicators could be identified.
The four combinations of indicator category and activity type (that is, incentivised measurement, incentivised prescribing, non-incentivised measurement, and non-incentivised prescribing) were designated “indicator groups” for our analysis. Table 2 describes the individual quality indicators.
We divided the study period into financial years (1 April to 31 March) to correspond to the assessment periods for the financial incentives. We designated 2000-1, 2001-2, and 2002-3 as pre-intervention time points; 2003-4 as a preparatory phase (when details of the forthcoming quality targets were in the public domain but incentives were not yet available); and 2004-5, 2005-6, and 2006-7 as post-intervention time points. Patients were included in the sample for a year provided they were registered with their practice for the entire year. All available events were then examined, and patients with relevant diagnostic or activity codes were included in the denominator (Di) for a given quality indicator. From these patients, those for whom the indicator was met within the required time frame were included in the numerator (Ni). We excluded patients from non-incentivised indicators where the activity was incentivised for a co-existing condition—for example, patients with coronary heart disease were excluded from indicator C8 (cholesterol measurement for patients with hypertension). For each practice, annual achievement rates on each indicator were calculated as Ni/Di. We applied logit transformations to achievement rates to reduce floor and ceiling effects.28
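The rate calculation and transformation described above can be sketched as follows. This is an illustration only, not the authors' code: the function names are hypothetical, and clipping rates of exactly 0 or 1 by a small `eps` is an assumption made here so that the logit stays finite; the text does not say how such rates were handled.

```python
import math


def achievement_rate(numerator, denominator):
    """Practice level achievement on one indicator: Ni / Di."""
    if denominator <= 0:
        raise ValueError("indicator has no eligible patients")
    return numerator / denominator


def logit(p, eps=1e-6):
    """Log odds of a proportion.

    Assumption: proportions are clipped to [eps, 1 - eps] so that rates
    of exactly 0 or 1 do not produce infinite values.
    """
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def inv_logit(x):
    """Back transform from the logit scale to a proportion."""
    return 1 / (1 + math.exp(-x))


# Hypothetical practice: target met for 45 of 60 eligible patients.
rate = achievement_rate(45, 60)
print(rate, logit(rate), inv_logit(logit(rate)))
```

On the logit scale a given step near 0% or 100% corresponds to a smaller change in the percentage than the same step near 50%, which is how the transformation reduces floor and ceiling effects.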
We analysed and compared indicator groups with respect to the difference between achievement predicted from the trend in the pre-incentive period (2000-1 to 2002-3) and actual achievement in both 2004-5 (the first year of the incentive scheme) and 2006-7 (the third year). Data were analysed as a two level, mixed, multivariate regression with indicator crossed with practice. Covariates included predicted achievement rates and control variables for differences in disease registers (indicator denominators, patient age, and patient sex). We derived estimates of means and standard errors for each indicator group, first controlling for differences in covariates across time points and practices within indicators (model 1), then adding control for differences between indicators (model 2). Full details of each model appear in the appendix on bmj.com. We first examined the impact of the incentive scheme on each of our four indicator groups separately, and then used post-estimation tests to compare between incentivised and non-incentivised indicators for each of the two activity types. We also analysed individual indicators. All statistical comparisons were made at an α level of 5%. Analyses were performed using Stata (version 11). For presentation purposes, we report means back transformed from the logit scale to percentages.
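The core comparison, actual achievement against achievement projected from the pre-incentive trend, can be illustrated for a single indicator in a single practice. This sketch is a deliberate simplification and not the two level mixed model used in the study: it fits an ordinary least squares line to three logit transformed rates and extrapolates it. All rates shown are hypothetical, and coding the years as 0 to 6 is an assumption made for convenience.

```python
import math


def logit(p):
    # No clipping here: rates of exactly 0 or 1 would raise an error.
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_rate(pre_years, pre_rates, target_year):
    """Extrapolate the pre-incentive trend on the logit scale, then back transform."""
    slope, intercept = fit_line(pre_years, [logit(r) for r in pre_rates])
    return inv_logit(intercept + slope * target_year)


# Hypothetical data. Years coded 0 = 2000-1, 1 = 2001-2, 2 = 2002-3,
# 4 = 2004-5, 6 = 2006-7. The preparatory year 2003-4 (coded 3) is not used.
pre_years = [0, 1, 2]
pre_rates = [0.40, 0.45, 0.50]
observed = {4: 0.70, 6: 0.72}

for year, actual in observed.items():
    expected = projected_rate(pre_years, pre_rates, year)
    print(f"year {year}: projected {expected:.3f}, actual {actual:.3f}, "
          f"difference {actual - expected:+.3f}")
```

A positive difference means achievement exceeded the pre-incentive projection; a negative one means it fell short.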
Over the pre-intervention period (2000-1 to 2002-3) achievement increased significantly for 32 of the 42 indicators, decreased significantly for two, and did not change for eight. In 2002-3 achievement varied from 1.4% for indicator C3 (women aged >50 with depression who have had thyroid function tests) to 98.7% for indicator D7 (patients treated with sumatriptan who do not have angina) (see table 3 for full list). There were systematic differences between indicator groups for the rate of improvement (P<0.001), with incentivised measurement indicators (that is, measurement indicators that became incentivised in 2004-5) having the fastest overall rate of improvement, and non-incentivised prescribing indicators the slowest overall rate (figure). At the end of the pre-incentive period (2002-3) indicator groups also differed in mean performance scores (P<0.001), with the highest overall score for non-incentivised prescribing indicators and lowest for non-incentivised measurement indicators.
In 2004-5, achievement rates for incentivised indicators were significantly higher than rates projected from pre-incentive trends for all 17 measurement indicators, and five of the six prescribing indicators (table 3). The increase in achievement above that predicted from pre-incentive trends varied from 1.2% to 37.7%, with four indicators (all relating to measuring smoking status) having increases of over 30%. Both groups of incentivised indicators showed overall increases in achievement significantly above the predicted rates (table 4).
For the non-incentivised indicators, there were significant increases in achievement above the predicted rates for two of the nine measurement indicators, and a significant decrease for one (that is, achievement was below that predicted from pre-incentive trends). There was no significant overall effect for this group of indicators. Two of the 10 non-incentivised prescribing indicators had a significant increase above predicted achievement rates, three had significant decreases, and there was no effect for the remaining five. Overall, there was no significant effect for this group of indicators (table 4). There was, however, substantial heterogeneity of effect within this group of indicators.
For both types of activity (measurement and prescribing), increases in achievement above predicted rates were significantly larger for incentivised indicators than non-incentivised indicators under both model 1 and model 2 (table 4).
There were significant increases in achievement rates for all four indicator groups between 2004-5 (the first intervention year) and 2006-7 (the third intervention year). With the exception of non-incentivised measurement indicators, these increases, although significant, were small (<3%, see table 4).
For the incentivised indicators, achievement remained significantly above projected rates in 2006-7 for 10 of the 17 measurement indicators, and four of the six prescribing indicators (table 3). However, rates were significantly below projections for five measurement indicators and one prescribing indicator. Even so, both incentivised groups continued to have overall achievement rates significantly higher than predicted.
For the non-incentivised indicators, achievement was significantly above projected rates in 2006-7 for one of the nine measurement indicators, but significantly below for seven. The overall achievement rate for this group was significantly lower than predicted (by 5.6%). Four of the 10 prescribing indicators had achievement rates significantly lower than predicted, and one had a significantly higher rate. The overall achievement rate for this group of indicators was also significantly lower than predicted (table 4), although heterogeneity of effect was high.
Relative to predicted values, overall achievement rates in 2006-7 for non-incentivised indicators were significantly below those for incentivised indicators, for both measurement and prescribing activities (table 4).
The success of quality improvement initiatives depends not only on whether they achieve their intended outcomes, but on their unintended consequences. With “pay for performance” schemes there is a risk that rewarding performance of certain clinical activities will divert attention from other, unrewarded activities. The original UK Quality and Outcomes Framework was developed with a wide range of indicators, 76 clinical and 70 non-clinical, in part to prevent doctors focusing on too narrow a range of activities and to encourage a broader systematic approach to improving quality of care, although quality improvement initiatives implemented in response to the incentives tended to focus specifically on the incentivised conditions and activities.29 We used patient-level data from general practices using electronic patient records from 1999 onwards to examine trends in achievement for incentivised and non-incentivised activities.
Our study is subject to limitations. First, any changes in the consistency and accuracy of data recording over time would affect our findings, particularly for incentivised indicators. Such changes are more likely to affect indicators assessed by practices (such as body mass index) than indicators assessed by third parties such as laboratories (for example, serum creatinine concentration). However, we found no consistent relation between changes in achievement rates over time and the agent responsible for measuring indicator parameters.
Second, changes in case mix over time might have affected achievement rates, particularly if there were changes in case finding activity under the incentive scheme. To account for this, we controlled for changes over time in indicator denominators and age and sex profiles. We also found no substantial differences in patterns of achievement for different cohorts of patients (such as diabetic patients diagnosed in 2000-1 compared with patients diagnosed in later years).
Third, although the sampled practices were nationally representative in terms of patient demographics, they might have been atypical in terms of organisation and quality of care. Collectively, they had marginally higher achievement scores on incentivised indicators than the national average in each intervention year,30 which might be attributable to higher baseline performance in the pre-incentive period or a greater response to the incentives. However, with respect to their general pattern of achievement for the incentivised indicators—high in the first year followed by diminishing improvements in the second and third years—the sampled practices followed the national trend.8
Fourth, we were highly conservative in selecting indicators, focusing on processes not subject to other forms of incentive and for which guidelines remained unchanged throughout the study period. This limits the generalisability of our findings, although comparison with trends in achievement rates for non-selected incentivised indicators suggests that the selected indicators were not atypical.
In addition to examining trends in achievement for incentivised and non-incentivised aspects of care, we made direct comparisons between these. To strengthen this comparison we grouped indicators, a priori, by activity type. We also controlled for different baseline achievement rates and trends, and for changes in disease registers over time. Nevertheless, the subgroups remained non-equivalent in important respects, both in terms of patient groups and care processes. For example, in contrast to other subgroups, most non-incentivised prescribing indicators related to “negative” (patient safety) activities, and the general focus of the incentives on promoting recommended processes of care rather than on avoiding proscribed activities may have diverted attention from patient safety issues. Activities that were incorporated into the incentive scheme may also have been accorded greater importance by clinicians even before the scheme was implemented.
Non-equivalence does not invalidate our findings on the effect of the incentives on incentivised and non-incentivised indicators, but it raises the question of whether such non-equivalence can account for the observed post-intervention differences between the groups. To investigate further, we used data from the first two pre-intervention years to predict rates of achievement in the last pre-intervention year (2002-3) and found only small deviations from expectation for all four indicator groups. This implies that uncontrolled sources of non-equivalence did not produce group differences in the immediate pre-intervention period, and that the post-intervention differences we observed developed after that point (see appendix on bmj.com). However, as the financial incentives were introduced simultaneously nationwide, with no control practices, we could not test our assumption that in the absence of the incentives the pre-intervention achievement trends would have continued.
Of the 23 incentivised indicators we analysed, measurement indicators improved at the fastest rate in the pre-intervention period, from a relatively low baseline in 2000-1. After the introduction of incentives, achievement rates increased substantially for all measurement indicators, with increases above projected rates in 2004-5 of up to 38%. Of the six indicators with the greatest increases, five entailed recording smoking status—a technically straightforward activity that had a low baseline (either because practices were not asking most of their patients before the incentives or were not recording that they were doing so), required minimal patient compliance, and was monitored exclusively by the practices. Collectively, prescribing activities had higher baseline achievement rates than measurement activities in 2000-1, slower increases in achievement rates in the pre-intervention period, and smaller increases above projected rates in the first intervention year (2004-5) of between 1.2% and 8.3%. Achievement rates for all incentivised indicators reached a plateau in the second and third years of the scheme, and as a result only 14 of the 23 incentivised indicators had achievement rates significantly higher than rates projected from pre-intervention trends after three years.
For non-incentivised indicators the introduction of financial incentives seemed to have little overall impact in the first year, with quality continuing to improve at around the pre-intervention rate. In the second and third years, although quality continued to improve, the rate of improvement slowed. By 2006-7 quality was significantly worse than projected from pre-incentive trends, most notably for measurement activities, and was also significantly below—relative to projection—the quality for incentivised activities.
The general improvement in quality before the introduction of the financial incentives suggests that the quality initiatives implemented in the UK at that time—including clinical audits, development of information technology, and creation of quality oriented statutory bodies—were having an effect in improving quality of care. The Quality and Outcomes Framework built on this infrastructure and introduced a national set of quality targets (clinical guidelines) supported by computer prompts and feedback and backed by financial and reputational incentives. The scheme was associated with additional increases in performance in its first year across a broad range of incentivised activities, but the rate of improvement was not sustained. There are several possible explanations for this. Practices might have improved their recording procedures in the first year of the scheme in response to the incentives, effectively “correcting” achievement rates, particularly for some measurement indicators. After this, the rate of quality improvement seemed to plateau. Alternatively, practices might have reached their achievement limit by 2006-7, with the incentive scheme hastening their arrival at this ceiling. A third explanation is that practices relaxed their efforts after the first year of the scheme. Most practices attained achievement rates above the maximum achievement thresholds (the level of achievement required to secure maximum remuneration under the scheme) for most incentivised indicators in the first year, and thereafter had no financial incentive for further improvement.
Quality of primary care was generally improving in England in the early 2000s. The introduction of an incentive scheme seemed to accelerate this trend for incentivised activities, but quality quickly reached a plateau. Incentives had little apparent impact on non-incentivised activities in the short term, but seem to have had some detrimental effects in the longer term, possibly because of practices focusing on patients for whom rewards applied. Some aspects of the UK Quality and Outcomes Framework and its setting may have limited its impact on non-incentivised activities (such as the lack of financial penalties, the wide range of indicators, and the existence of other quality initiatives) and other aspects may have exacerbated its impact (such as the large size of the incentives and the appearance of incentive related prompts on clinical computing systems during consultations). Findings may be different under schemes with different incentive structures operating in different settings. Nevertheless, these findings show some important limitations of financial incentive schemes in health care, and the importance of monitoring, as far as possible, activities that are not incentivised in addition to those that are when determining the effects of such schemes.
We are grateful for comments provided by Arnold Epstein and Meredith Rosenthal of the Harvard School of Public Health during the drafting of the paper.
Contributors: TD participated in the planning of the study, analysis and interpretation of data, and drafting and editing the manuscript, and is guarantor for the integrity of the data and accuracy of the data analysis. EK participated in the planning of the study, analysis and interpretation of data, and drafting and editing the manuscript. JMV, SC, CR, and DR participated in the planning of the study, analysis and interpretation of data, and editing the final manuscript. All authors saw and approved the final version of the manuscript and had full access to all of the data in the study.
Funding: There was no direct funding for this study, but the National Primary Care Research and Development Centre receives core funding from the UK Department of Health. The views expressed are those of the authors and not necessarily those of the Department of Health. At the time of writing, TD was supported by a grant from the Commonwealth Fund as a Harkness Fellow in Health Care Policy and Practice.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; MR served as an academic advisor to the government and British Medical Association negotiating teams during the development of the UK pay for performance scheme during 2001 and 2002.
Ethical approval: Not required: study based on publicly available data.
Data sharing: Technical appendix and statistical code available from the corresponding author (firstname.lastname@example.org). The dataset was derived from the General Practice Research Database and is not available from the authors, but it can be derived on application to GPRD.
Cite this as: BMJ 2011;342:d3590