Commentaries on patient safety in the United States five years after the publication of two key reports on patient safety in 2000 were characterised by some despair at an apparent lack of progress.
19 Our data suggest that a more encouraging story on patient safety in the NHS can now be told.
Baseline performance across hospitals was already high on many criteria relating to quality, leaving little room for improvement. Over 90% of patients with an acute exacerbation of obstructive airways disease received steroids when indicated, and rates of perioperative prophylaxis against venous thrombosis and infection approached 100%, corroborating an earlier study.
20 Where scope for improvement existed, we found many examples of improved, and none of worsening, practice. Vigilance in relation to monitoring vital signs on acute medical wards and use of severity scoring, observed in our study of SPI1, continued to improve. A strong upward trend in recording intraoperative temperature was noted. Rates of handwashing seemed to have increased significantly, and the incidence of C difficile and MRSA infection fell. Though results of the staff survey showed little change over time, the survey of patients showed improvement across all five prespecified dimensions, suggesting a better experience for patients. There was even an improvement in medical history taking. Adverse event rates (3.03% in our study) seemed similar to those reported in the Harvard medical practice study (3.7%), which was based on data collected in 1984.
21 We found low levels of preventability among adverse events overall (about 20%) and among deaths (less than 10%). If these findings are corroborated, they have implications for future evaluations and performance management, as the signal (preventable adverse events) seems to be buried in a lot of noise (non-preventable adverse events).
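The signal to noise point can be made concrete with a rough calculation. The sketch below is illustrative only: it uses the two figures quoted in the text (an adverse event rate of 3.03% and a preventable share of about 20%), while the sample of 1000 reviewed cases and the assumption that the rate applies per case are hypothetical.

```python
# Figures quoted in the text.
ADVERSE_EVENT_RATE = 0.0303   # 3.03% adverse event rate in this study
PREVENTABLE_FRACTION = 0.20   # "about 20%" of adverse events judged preventable

# Assumption for illustration only: a hypothetical review of 1000 cases,
# with the adverse event rate applied per case.
cases_reviewed = 1000

adverse_events = cases_reviewed * ADVERSE_EVENT_RATE
preventable_events = adverse_events * PREVENTABLE_FRACTION
non_preventable_events = adverse_events - preventable_events

print(f"Expected adverse events: {adverse_events:.1f}")
print(f"  preventable (signal): {preventable_events:.1f}")
print(f"  non-preventable (noise): {non_preventable_events:.1f}")
```

Under these assumptions, only about six of roughly 30 expected adverse events per 1000 cases would be preventable, so even a large proportional reduction in preventable events would shift the overall adverse event rate only slightly.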
The data we collected on SPI2 suggest that an additional effect of SPI is difficult to detect over and above the improvements occurring across the health service generally. Indeed, in a reversal of our evaluation of SPI1, organisational climate as measured by the staff survey favoured the controls. Adherence rates for many of the specific criteria reflecting quality of care remained high over time in both groups of hospitals, possibly reflecting a long history of quality improvement in areas such as perioperative care. Those areas that underwent marked improvement did so to a similar degree in both sets of hospitals. One exception was the drop in mortality among acute medical cases in SPI2 hospitals and an unexplained rise in control hospitals, such that the difference in differences would have been just significant if we had selected a P<0.05 threshold. Any suggestion of a difference in mortality rates resulting from difference in the quality of care, however, does not align well with the review of the quality of care observed among those same case notes. The observed difference in overall mortality cannot be accounted for by a difference in preventable deaths as only seven of the 91 deaths fell into the possibly preventable category. Overall, though there is considerable evidence of good or improved quality and safety in NHS hospitals, we could not detect a net effect attributable to SPI2 with our study measures. This largely mirrors the evaluation of SPI1,
1 though the latter did show an effect of SPI on the quality of monitoring the respiratory rate. Table 13 summarises the effects of both phases of SPI versus control, in terms of direction of the point estimate and degree of significance.
Table 13. Summary of directions of effects of SPI across all quantitative evaluations of SPI1 and SPI2. Significant results are indicated
Strengths and weaknesses of this study
The argument we made in the companion article,
1 and elsewhere,
3 that studies of quality improvement interventions should follow predefined protocols and incorporate contemporaneous controls is reinforced by our study of SPI2, where many end points improved significantly across both SPI2 hospitals and controls. We have also shown the importance of using a difference in difference approach to analysis to overcome the ambiguities of single difference studies. This method is widely used in economics research, where there are policy and other changes occurring during the implementation of a programme,
22 and it is clearly suitable for evaluations of quality improvement programmes in healthcare. We have also shown the need to allow for learning/fatigue effects in reviewing.
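The logic of the difference in difference approach can be sketched in a few lines. This is a minimal illustration, not the analysis used in the study: the function name and all four rates are hypothetical, chosen only to show how a large before and after improvement can shrink once the change in controls is subtracted.

```python
def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Change in the intervention group minus change in the control group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Hypothetical adherence rates (%), not data from this study.
spi_before, spi_after = 60.0, 75.0
control_before, control_after = 58.0, 72.0

# A single difference (before v after in SPI hospitals only) ignores the
# improvement that also occurred in controls.
single_difference = spi_after - spi_before

did = difference_in_differences(spi_before, spi_after,
                                control_before, control_after)

print(f"Before and after change in SPI hospitals: {single_difference:.1f} points")
print(f"Difference in differences: {did:.1f} points")
```

With these made-up figures the single difference is 15 percentage points, but the difference in differences is only 1 point, because controls improved by 14 points over the same period. A full analysis would also need a measure of uncertainty around the estimate, which this sketch omits.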
A particular strength of our study arises from its possibilities for triangulation. While available funding did not permit us to build further qualitative studies into the design, we did have various internal controls. Findings on the use of handwashing materials and rates of two different types of infection support the hypothesis of general improvement in this area. The observation that vital signs were recorded with increasing diligence and that risk scoring was used more often supports the idea that patients at risk of deterioration were being monitored more diligently. Mortality rates on acute medical wards could be triangulated, not only by an audit of compliance with process standards, but also by scrutinising each death in the sample to see if it could have been caused by poor care.
With hindsight, there are some things that we would do differently in this study. We would not measure all prescribing errors as this is expensive, and many errors are minor and of uncertain validity as surrogates for serious error.
23 We would instead concentrate on errors whose serious nature had been established. The reliability reviewer was new to case note reviews, and although several training sessions took place we would approach this training more systematically in the future. Nevertheless, previous work has also found that reliability for holistic reviews is lower than reliability for explicit reviews.
24 We are developing and evaluating a novel tool based on review of case notes of patients who die in hospital, where each death is scored on a sliding scale of preventability. The aim is to produce a reliable measure of the proportion of hospital deaths that are preventable.
Possible improvements in clinical areas not studied
Though we had an explicit rationale for the clinical areas in which we focused our study, improvements could have occurred in areas that we did not study, such as ventilator acquired pneumonia and central line infections in intensive care units. If improvements did occur in these areas, it is possible that a greater “dose” of SPI was administered in these settings (for example, more activity by SPI “change agents”) or that such settings were more responsive to change than those we studied.
Improvements below the level of statistical detection
The absence of an additive SPI effect detected by our study does not exclude smaller effects that might none the less be cost effective. The threshold in England under which an intervention is judged cost effective is about £30 000 (€35 000; $48 000) per quality adjusted life year (QALY). The SPI would, therefore, need to save fewer than seven lives with a mean duration of five healthy years (ignoring discounting) to justify the SPI1 investment of £775 000 per hospital (and an even smaller magnitude of effect would be cost effective at the smaller costs in SPI2 hospitals). An effect of this magnitude cannot be excluded in a study of any feasible size; with many hundreds of deaths taking place in each hospital in each year the signal would be lost in the noise.
25 None the less, large effects postulated in advance of the study have been excluded, at least in the areas examined. The study was, after all, large enough to detect temporal improvements. The 50% and 30% reductions in adverse events that were aims of SPI1 and SPI2, respectively, were unnecessarily large in the sense that much smaller effect sizes would justify the costs of the intervention.
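The break-even arithmetic behind the cost effectiveness argument can be reproduced as follows. This is a rough sketch using only the figures given in the text (threshold of about £30 000 per QALY, SPI1 cost of £775 000 per hospital, five healthy years per life saved, no discounting); the function name is hypothetical, and treating each healthy year as one full QALY is a simplifying assumption.

```python
import math

THRESHOLD_PER_QALY = 30_000        # pounds per QALY, as quoted in the text
SPI1_COST_PER_HOSPITAL = 775_000   # pounds per hospital, as quoted in the text
HEALTHY_YEARS_PER_LIFE = 5         # mean healthy years per life saved, no discounting


def break_even(cost, threshold=THRESHOLD_PER_QALY, years=HEALTHY_YEARS_PER_LIFE):
    """Return (QALYs needed, lives needed) for the cost to meet the threshold.

    Assumes each healthy year gained counts as one full QALY.
    """
    qalys_needed = cost / threshold
    lives_needed = qalys_needed / years
    return qalys_needed, lives_needed


qalys, lives = break_even(SPI1_COST_PER_HOSPITAL)
whole_lives = math.ceil(lives)  # smallest whole number of lives that meets the threshold

print(f"QALYs needed per hospital: {qalys:.1f}")
print(f"Lives needed per hospital: {lives:.2f} (that is, {whole_lives} whole lives)")
```

On these assumptions about 26 QALYs, or roughly five to six lives per hospital, would be enough, which is consistent with the statement that fewer than seven lives would need to be saved.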
SPI hospitals might have been less sensitive to the intervention
The study was not randomised, and we cannot exclude the possibility that SPI hospitals as a whole were less sensitive to the intervention than controls. There were few differences at baseline, however, and where there was room for improvement among controls, similar room was available for SPI hospitals. It is also possible that SPI works better in some types of hospital than others.
26 We did not have statistical power to test for such interactions.
Possible suboptimal specification or implementation of SPI
Some of the reasons for the absence of an additional detectable SPI effect might lie in the design and implementation of the programme. While interviews conducted with senior staff in the study of SPI1
1 emphasised the “bottom-up” nature of the intervention, this was not necessarily how it was perceived by most ward staff. Despite the enthusiasm and broad understanding of the principles underlying the SPI at a strategic level, the programme and organisational theories of change might not have been sufficiently explicit. For example, no formal protocol for the intervention was published. There is evidence from the qualitative work in SPI1 that the scale of the SPI task was perceived as huge and demanding of resource. There were also suggestions that there was a need for the programme to be purposefully and actively led in each clinical setting, rather than assuming spontaneous “spread” from one setting to another. More work before the intervention might have identified with more precision how and under what conditions the programme would work best and would have more completely specified the underlying theories.
Optimising design and execution of quality improvement programmes is clearly necessary for many reasons, not least to avert the risk of damaging the credibility of such programmes as a whole. A combination of a more explicit programme theory and organisational theory of change, including better specification of the method of vertical and horizontal spread, might, for example, have explicitly confronted the six “universal challenges” for quality improvement (structural, political, cultural, educational, emotional, and physical/technological),
27 and it might have focused more attention on ensuring clinical engagement and use of clinical networks. Such an approach might have encouraged an earlier recognition that the intervention was broad relative to resources and might have identified that effects were likely to be localised in response to “dose” of intervention. In that case a more focused and less ambitious intervention, and somewhat narrower evaluation, might have been a better strategy.
There is also an argument that participation in SPI could secure greater long term commitment to quality and safety in participating hospitals and that improvements made in the intervention hospitals will either surface at a later date or be sustained better. This hypothesis can be tested only with further data collection, but it is possible that any effect of SPI might be in the form of “stickiness”; intervention hospitals might potentially be better equipped to show sustained improvements after the policy spotlight has moved elsewhere.
Contemporaneous policy and professional forces in the control environment
SPI coincided with a period of unprecedented increase in NHS funding that could have contributed to many of the improvements observed. An important reason for the absence of an additional effect of SPI might lie in the extent of the policy level programmes and initiatives that were largely contemporaneous with the SPI, shared some of its goals, principles, and methods, and acted forcefully on the control environment. For example, the “cleanyourhands” campaign promoted the same goal of improved hand hygiene as the SPI and began around the same time. In addition, the Health Act 2006 introduced new legislation on mandatory requirements for prevention and control of infections associated with healthcare and is likely to have exerted further pressures on hospitals.
Perhaps most importantly, several initiatives with features similar to IHI-style techniques and principles had increasing impact on policy at around the time that the SPI (which was mentored by the IHI) was launched. For example, the Department of Health’s Saving Lives programme, beginning in June 2005 with a revised version in 2007,
28 included a self assessment tool for trusts to assess their managerial and clinical performance and a set of “high impact interventions” that were similar to the IHI bundles and were aimed at several clinical processes also targeted by the SPI. The interest in IHI-like interventions might indeed have been prompted or inspired at least in part by the SPI; the House of Commons committee report on patient safety, for example, lists the SPI’s beginning in 2004 as among the important policy developments in the patient safety timeline.
29 It is also relevant that many of these policy initiatives had already been anticipated by consensus in professional societies and medical colleges, and thus enjoyed considerable professional legitimacy—a crucial factor in promoting safe and effective practice.
30 Patient safety and quality improvement were also, during the period when SPI was being implemented, drawing increasing attention from journals, professional meetings, and conferences. The SPI programme was thus being implemented at a time when the momentum towards quality improvement was accelerating and when it might itself have been one of the forces implicated in the momentum.
Given that many of the changes in practice being urged at a policy level were so similar to the SPI, and the resource directed at the SPI was relatively small, the SPI itself might not have been a sufficient additional “dose” to generate further detectable differences in participating hospitals: from £270 000 to £775 000 spent over 18 months in hospitals with annual budgets of £150m to £300m might simply be too small. This is perhaps most vividly illustrated by the disappearance in SPI2 of the positive impact on measures of compliance with monitoring and response to vital signs deterioration that we found in SPI1. This probably occurred because guidelines on recognition and response to acutely ill patients were issued by the National Institute for Health and Clinical Excellence (NICE) in 2007,
31 just as SPI2 was getting started. The detectable effects of SPI could have been muted compared with a situation where no similar policy changes were occurring.
In clinical research it has long been known that outcomes tend to improve over time, with the result that before and after studies systematically exaggerate treatment effects compared with studies with contemporaneous controls.
32 In clinical research temporal trends are usually the result of various factors apart from the intervention of interest, although exceptions exist—for example, HIV drugs and prostate specific antigen screening diffused into widespread use before evaluations were complete.
33 34 This risk of pre-evaluation diffusion is arguably greater in the case of management interventions that are multi-faceted, not easily containable, and are promoted as part of “continuous improvement” strategies. While new medicines are generally evaluated before they can be licensed and adopted, service interventions can more easily come into general use and generate social reinforcement before a formal evaluation has been put in place. Indeed growing interest in the intervention might be the stimulus both for increasing adoption and for the evaluation. The evidence provided above suggests that something like that happened with SPI and might have occurred in the provocatively null result of the MERIT study of rapid response service on medical wards.
35 Our results suggest the occurrence of a phenomenon where the measured effect of an intervention is attenuated by similar changes happening more generally. This should be distinguished from the phenomenon of contamination, where the control group receives (some of) the intervention targeted at the study group.
36 In the case of the SPI, “contamination” is an inappropriate descriptor as the study was “anamnestic” and controls were selected after the SPI had been put in place. SPI implementation was well under way when controls were selected and the controls were not exposed to the extensive and expensive mentoring process that SPI entailed. We propose, rather, that a “rising tide” phenomenon was at work; both control and SPI sites were subject to the same tidal forces, and these same latent factors were the source of both a change in practice and the perceived need to evaluate these changes. Under these circumstances it is still worth evaluating an intervention, but this is more akin to evaluating “dose” in clinical research; the idea is pragmatic and aims to find out whether the marginal gain of an extra “push” is worth the marginal expenditure or, at least, to provide some evidence to inform such a judgment.
Conclusions
Our studies show encouraging signs of improvements in quality and safety in the NHS in England, but detected only one specific improvement as a result of SPI, and that was confined to the first phase of the programme. Any detectable effects of such interventions might take time to surface. Such interventions are likely to benefit from clarity about the theories of change underlying the programme, recognition of the scale of resource and organisational support required to make patient safety efforts work, and improved understanding of how practitioners, middle managers, and organisational systems can be better supported in the face of daunting complexity and multiple priorities. Robust methods are needed to draw appropriate conclusions about the impact of quality improvement efforts.
What is already known on this topic
- There are many examples of evaluations of interventions to improve the quality of specific clinical processes, but fewer attempting evaluations of system-wide change in whole hospitals
- The second phase of an attempt to effect system-wide change, the Safer Patients Initiative, was rolled out in 10 hospitals in England and 10 hospitals in other countries of the UK from March 2007 to September 2008
What this study adds
- Patient safety has improved across the NHS on many of the measures used in our study of English hospitals
- No additional effect of the Safer Patients Initiative could be detected
- Several possible explanations for the absence of an additional effect of the programme can be offered, including a “rising tide” phenomenon where improvements in patient safety were driven by common forces across the NHS