Outcomes of care are a blunt instrument for judging performance and should be replaced, say Richard J Lilford, Celia A Brown, and Jon Nicholl
Healthcare organisations are increasingly scrutinised by external agencies, such as the Healthcare Commission in England and Medicare in the United States. Such agencies increasingly concern themselves with the quality of care and not just measures of throughput, such as waiting times and the average length of hospital stay. Measures of clinical quality are also likely to be used increasingly to monitor the performance of individual doctors.1 But how should quality be measured? The intuitive response is to measure the outcomes of care—after all, patients use the service to improve their health outcomes. We argue that this beguiling solution has serious disadvantages because of the poor correlation between outcome and quality and that use of outcome as a proxy for quality is a greater problem when the data are used for some purposes than for others.
Data on quality can be used either for internal quality improvement or for external reporting. In the first scenario, data are collected by an organisation or individual for internal audit in the spirit of continuous improvement (quality circles, total quality management, plan-do-study-act cycles, Kaizen, etc). In the second scenario, monitoring is imposed externally by health service funders for purposes of accountability (performance management). When results lie above or below some predefined threshold, funders may use the data to prompt further investigation in a completely non-pejorative manner. Alternatively, they may use data as the basis for sanction or reward. For example, hospitals may be given ratings that determine managerial freedoms and financial reward, or a doctor may be suspended. We shall refer to use of data for sanction or reward as data for judgment. It is such use that is particularly problematic.
The main disadvantage of measuring outcomes arises from the low signal to noise ratio: outcomes are likely to be affected by factors other than the quality of care. A recent systematic review showed that, although statistically significant, the correlation between the quality of clinical practice and hospital mortality is low2 and hence mortality is neither a sensitive nor a specific test for quality. Modelling shows that big differences in the quality of care are likely to be lost in mortality statistics3 and that over half of the institutions with the worst quality of care are likely to have mortality in the normal range and vice versa.4 The situation may be worse at the community level.5
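The weak-correlation point can be illustrated with a small Monte Carlo sketch (all figures are hypothetical, not taken from the cited studies): if standardised quality and mortality scores share only a modest correlation, most of the worst-quality decile of institutions still falls within the "normal" mortality range.

```python
import math
import random

random.seed(0)
r = 0.3          # assumed modest quality-mortality correlation (hypothetical)
n = 100_000

# Simulate standardised quality scores and correlated mortality scores;
# higher mortality is worse, so the correlation is applied with a minus sign.
pairs = []
for _ in range(n):
    q = random.gauss(0, 1)
    m = -r * q + math.sqrt(1 - r * r) * random.gauss(0, 1)
    pairs.append((q, m))

# Worst-quality decile; "normal" mortality taken as within 2 SD of the mean.
worst = sorted(pairs)[: n // 10]
share_normal = sum(1 for q, m in worst if abs(m) < 2) / len(worst)
print(round(share_normal, 2))   # well over half look normal on mortality
```

With these assumed parameters the bulk of the poorest-quality decile is indistinguishable from average on mortality alone, which is the sense in which mortality is an insensitive test of quality.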
It is a myth that the problem of poor correlation between quality and outcomes can be solved by statistical adjustment for risk (the risk adjustment fallacy).6 Risk adjustment does not remove the problems of bias in rankings for two reasons:
Firstly, risk adjustment cannot allow for case mix variables that have not been measured (perhaps because they are unknown) and are therefore omitted from the statistical model. Nor can it allow for differences in definitions (or in how the same definitions are applied) to either numerators or denominators. For instance, differences in discharge policies (perhaps influenced by availability of a local hospice) will affect the types of patients included in the statistics.
Secondly, risk adjustment is sensitive to modelling assumptions. Adjustment may even increase bias if the risk associated with the risk factor is not constant across the groups being compared.7 8 For example, the effect of age on mortality may be greater in some groups (such as those from low socioeconomic backgrounds) than in others. If this is the case, risk adjustment will under-adjust for groups in which age has the largest effect. The predicted mortality will be lower than the observed mortality, and the playing field is tilted against clinicians or institutions in places where the age effect is greatest.
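The under-adjustment mechanism can be sketched numerically (all risks here are hypothetical, chosen only for illustration): a risk model fitted with a pooled age effect predicts too few deaths for a provider whose population experiences a steeper age effect, so that provider's standardised mortality ratio exceeds 1 even when quality of care is identical.

```python
import random

random.seed(0)

def mortality(age_effect, n=100_000):
    """Average mortality when each year of age over 60 adds
    `age_effect` to a hypothetical baseline risk of 0.05."""
    deaths = 0
    for _ in range(n):
        age = random.uniform(60, 90)
        deaths += random.random() < 0.05 + age_effect * (age - 60)
    return deaths / n

# A risk model fitted across all providers uses the pooled age effect...
expected = mortality(0.005)
# ...but in this provider's (say, deprived) population age bites harder.
observed = mortality(0.008)

smr = observed / expected   # standardised mortality ratio > 1, quality equal
print(round(smr, 2))
```

The apparent excess mortality here is entirely an artefact of the mis-specified age effect, not of any difference in care.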
The problems of risk adjustment do not just apply to mortality but also to other outcomes, such as surgical site infections, for which the definition varies widely from place to place. Using surrogate outcomes, such as the proportion of diabetic patients whose measure of glucose control exceeds some threshold, brings in further confounding variables, such as systematic differences in patients' willingness to adhere to treatment. Even using patient satisfaction to rank individuals or institutions for humane care is potentially misleading since the results may be confounded by systematic differences in expectations.9
There may be some topics where the signal to noise ratio is much better than in the examples cited above and hence where a sizeable portion of the variance in outcome is made up of variance in quality. If such examples exist, they are likely to arise among the most technically demanding services such as paediatric cardiac surgery. Claims and counter claims have been made about the role of outcome monitoring for coronary artery bypass surgery.10 We suspect that even in these topics case selection and other differences between institutions have the major role. We therefore believe those who wish to use such outcomes for performance management within an accountability framework must first prove that they are strongly correlated (not just statistically associated) with quality.
With some possible exceptions, outcomes are clearly neither sensitive nor specific measures of quality. Managers and clinicians therefore quite properly distrust them. This can induce perverse incentives—staff apply their ingenuity to altering or discrediting the figures rather than tackling quality or safety, patients are nudged into more severe prognostic categories, treatment may be targeted at patients with the best prognosis (who are often those with the least capacity to benefit), and there are even cases where statutory data have been altered.11 12
The problem that outcome data are poor barometers of clinical quality is compounded by their inability to discriminate between good and poor performers13 and by the lack of information they convey about how improvements should be made. In education, for example, it is now standard practice to include a comment and not just a grade when assessing an essay or assignment. Using outcomes to trigger sanctions or rewards may induce a sense of shame or institutional stigma6—the feeling of diminished status that comes of being branded bad without being told what the problem is.
When outcomes are used to judge the performance of individual clinicians further problems arise. Firstly, the results are less precise than they are at institutional level. Secondly, outcomes synthesise all of the processes received by the patient and therefore reflect the activities of many clinicians and support services.
Measures of clinical process have many advantages over outcomes. These advantages are particularly important if policy makers insist on using data for judgment. Clearly, the processes selected for scrutiny must comprise accepted and scientifically valid tenets of clinical care: do patients with a fractured neck of the femur get surgery within 24 hours? Are patients on ventilation nursed in a semi-prone position? Do clinicians monitor respiratory rate on the acute medical wards and, if so, do they respond promptly to signs of deterioration?
Such measures are not a panacea. The measures themselves must be valid and important. Furthermore, process measures are not immune from case mix bias; sicker patients challenge the system more than those who are not so sick, so the playing field is tilted against those who care for more vulnerable patients. Nevertheless, we believe that process measures have four fundamental advantages over outcomes:
Reduction of case mix bias—Using opportunity for error rather than the number of patients treated as the denominator reduces the confounding that arises when one clinician or institution cares for sicker patients than another.14 This is because sicker patients present more opportunities for clinical process errors (of either omission or commission). Expressing errors as a function of opportunities for those errors adjusts (at least in part) for case mix bias. This method cannot be used when outcomes are assessed because the patient is the smallest possible unit of aggregation under these circumstances.
Lack of stigma—The message is “improve X,” not “you are bad.” For this reason they are less likely to prompt perverse solutions. Arguably it is easier and more natural to improve the care process than to try to discredit the measure (see below).
Prompt wider action—Process measures encourage action from all organisations or individuals with room for improvement, not just a small proportion of outliers. Shifting the whole distribution will achieve a larger health gain than simply improving the performance of those in the bottom tail, as the figure shows. Assuming a normal distribution of quality, a shift of 10% would result in a health gain of 10%. However, improving the performance of the bottom 10% would produce a gain of 7.2%, even if this threshold distinguished perfectly between good and poorly performing units. Furthermore, organisations do not fail simultaneously across all dimensions of safety and quality. Rather, they have particular strengths and weaknesses, and improvement efforts can be targeted where they are needed: there is no need to produce a summary measure across criteria. In fact, we found no correlation between adherence to various evidence based quality criteria in 20 randomly selected UK maternity units.15 Thus, a hospital with above average recorded outcomes is still likely to have room for improvement in many aspects of care.
Useful for delayed events—Process measures are more useful than outcomes when the contingent adverse event is markedly delayed (such as failing to monitor patients with diabetes for proteinuria or to administer anti-D immunoglobulin when a rhesus negative woman gives birth to a rhesus positive baby).
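The opportunity-based denominator described in the first advantage above can be made concrete with a hypothetical two-hospital example: both hospitals err on 2% of opportunities, but the hospital with sicker patients faces more opportunities per patient and so looks worse on a per-patient count.

```python
# Illustrative figures only: both hospitals err on 2% of opportunities,
# but hospital A's sicker patients each present 10 opportunities for
# error against 4 at hospital B.
p_error = 0.02
opportunities = {"A": 10, "B": 4}

per_patient = {
    hosp: 1 - (1 - p_error) ** k    # chance of at least one error per patient
    for hosp, k in opportunities.items()
}

# Per patient, A looks more than twice as error-prone as B...
print({h: round(v, 3) for h, v in per_patient.items()})
# ...but per opportunity the two hospitals are identical (2%).
```

Dividing errors by opportunities rather than by patients removes this spurious difference, which is the sense in which the opportunity denominator adjusts (at least in part) for case mix.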
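The shift-the-distribution argument in the third advantage can also be sketched numerically. The simulation below does not reproduce the article's 10% and 7.2% figures (the underlying model for those is not specified here); with hypothetical parameters it only illustrates that a modest shift of the whole quality distribution can outweigh eliminating the bottom decile's entire shortfall.

```python
import random
import statistics

random.seed(1)
# Hypothetical quality scores: mean 100, SD 10.
quality = [random.gauss(100, 10) for _ in range(200_000)]

# Intervention A: shift the whole distribution up by 1 unit (1% of the mean).
gain_shift = 1.0

# Intervention B: bring the bottom decile up to the 10th centile.
cutoff = statistics.quantiles(quality, n=10)[0]     # 10th centile
gain_tail = statistics.fmean(max(cutoff - q, 0) for q in quality)

print(round(gain_shift, 2), round(gain_tail, 2))    # A exceeds B
```

Under these assumptions the whole-distribution shift delivers roughly twice the average gain of perfectly remediating the bottom tail, consistent with the article's qualitative claim.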
Process standards used in performance management should be valid in that they must either be self evident measures of quality or be evidence based. However, validity is not sufficient—the standards must also be genuinely important to health care. This is because the opportunity cost of improving some processes may exceed the contingent gains.16 Worse, healthcare providers may put their efforts into the monitored processes at the expense of those that are not monitored.17 One way to ameliorate this effect may be to elicit clinical standards from professional societies or consortia of providers and users of health care. There are plenty of important, evidence based criteria, and health services fall well short of full compliance.18 19 The Royal College of Obstetricians and Gynaecologists produced clinical guidelines as early as 199320 and a before and after study showed massive change in line with the evidence.15
The different methods of measuring process all have advantages and disadvantages. Broadly speaking, measures can be either explicit or implicit (although both methods can be combined). Explicit measures use a set of predetermined criteria (checklists). Implicit measures rely on expert review of a set of case notes. Although the explicit method has greater reliability (greater inter-observer agreement), the implicit method covers more dimensions of care because it can detect errors that might not have been specified on a predetermined checklist.21
Where outcomes are a specific measure of quality, externally imposed performance management by outcome may be effective. Collection of outcome data for cardiac surgery in the UK seems to have raised standards, although debate continues about whether the observed improvements exceed the secular trend.10 In the more common scenario where outcomes are not a specific measure of quality, process measures are a better method of judgment. However, process is expensive to measure as it currently requires access to patients' case notes. Evaluating case notes is time intensive and requires staff with clinical expertise. Electronic patient records and an increase in coded information in these records should make monitoring easier.
The cost of obtaining process measurements (and the contingent action) needs to be compared to the value (in terms of health benefit) of the improvement in quality that results from providers' responses to initial measurements. Although much of the evidence of effectiveness relates to bottom-up improvement programmes,22 there is also empirical support for the effectiveness of top-down performance management using process measures.23 24
Although process measures are the most suitable tool for performance management, measurement of outcomes remains important. Outcomes are useful for research, particularly for generating hypotheses. Here, simply finding an association between a variable (such as staff-patient ratios) and outcome (such as mortality) may be sufficient to prompt investigation, even when the strength of the association is low. Outcome data can also be used as a form of process control, such that institutions with abrupt changes in outcome or whose outcomes deviate by a large amount (three standard deviations or more is a sensible threshold25) can be further investigated. For example, an outbreak of hospital acquired infection may be traced to a problem in the water supply. Lastly, the public are entitled to have access to outcome data, although such outcomes should always be published with a proper warning about their limitations.
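The process-control use of outcomes can be sketched as a minimal control-chart rule (all counts hypothetical): flag an institution for further investigation, not sanction, when its latest outcome deviates from its own baseline by more than three standard deviations.

```python
import statistics

# Hypothetical yearly infection counts per 1000 admissions for one hospital;
# the final year shows an abrupt change.
rates = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 7.9]

baseline = rates[:-1]
mean = statistics.fmean(baseline)
sd = statistics.stdev(baseline)

latest = rates[-1]
flag = abs(latest - mean) > 3 * sd   # trigger investigation, not judgment
print(flag)
```

The three-standard-deviation threshold keeps false alarms rare while still catching abrupt shifts such as the infection-outbreak example above.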
We thank the referees for helpful comments.
Contributors and sources: This paper is the result of a synthesis of prolonged writing, review, and consultation with experts in the field. RJL has had many opportunities to debate and test the argument presented here, most notably as a participant at numerous Pennyhill Anglo-American health summits. RJL conceived the article, which was subsequently drafted by RJL, CAB, and JN. All three authors approved the final manuscript. RJL is the guarantor.
Competing interests: None declared.
Provenance and peer review: Not commissioned, externally peer reviewed.