Efficient and evidence-based medical device and equipment prioritization is of particular importance in low-income countries due to constraints in financing capacity, physical infrastructure and human resource capabilities.
This paper outlines a medical device prioritization method developed in the first instance for the Republic of South Sudan. The simple algorithm offered here is a starting point for the procurement and selection of medical devices and can be regarded as a screening test identifying those decisions that require more labour-intensive health economic modelling.
A heuristic method, such as the one presented here, is appropriate for reaching many medical device prioritization decisions in low-income settings. Further investment and purchasing decisions that cannot be reached so simply require more complex health economic modelling approaches.
Electronic supplementary material
The online version of this article (doi:10.1186/s12962-014-0027-3) contains supplementary material, which is available to authorized users.
Medical devices; Equipment; Prioritization; Purchasing; Selection; Low-income country
Stepped-wedge cluster randomised trials (SW-CRTs) are being used with increasing frequency in health service evaluation. Conventionally, these studies are cross-sectional in design with equally spaced steps, with an equal number of clusters randomised at each step and data collected at each and every step. Here we introduce several variations on this design and consider implications for power.
One modification we consider is the incomplete cross-sectional SW-CRT, where the number of clusters varies at each step or where data are not collected at some steps, for example during implementation or transition periods. We show that the parallel CRT with staggered but balanced randomisation can be considered a special case of the incomplete SW-CRT, as can the parallel CRT with baseline measures. We extend these designs to allow for multiple layers of clustering, for example, wards within a hospital. Building on results for complete designs, power and detectable difference are derived using a Wald test, obtaining the variance–covariance matrix of the treatment effect under a generalised linear mixed model. These variations are illustrated by several real examples.
We recommend that whilst the impact of transition periods on power is likely to be small, where they are a feature of the design they should be incorporated. We also show examples in which the power of a SW-CRT increases as the intra-cluster correlation (ICC) increases and demonstrate that the impact of the ICC is likely to be smaller in a SW-CRT compared with a parallel CRT, especially where there are multiple levels of clustering. Finally, through this unified framework, the efficiency of the SW-CRT and the parallel CRT can be compared.
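The power derivation described above can be sketched numerically. The snippet below uses the Hussey and Hughes cluster-period-mean formulation for a cross-sectional stepped-wedge design with fixed period effects; this is a minimal illustration under those assumptions, and the paper's exact model (e.g. with multiple levels of clustering) is more general.

```python
import numpy as np
from statistics import NormalDist

def sw_crt_power(design, icc, sigma2, m, delta, alpha=0.05):
    """Power for a cross-sectional stepped-wedge CRT via a Wald test,
    using the variance of the GLS treatment-effect estimator from a
    linear mixed model on cluster-period means (Hussey & Hughes style).
    design: clusters x periods 0/1 matrix (1 = intervention period)."""
    X = np.asarray(design, float)
    k, t = X.shape
    tau2 = icc * sigma2                    # between-cluster variance
    se2 = (1 - icc) * sigma2 / m           # variance of a cluster-period mean
    V = tau2 * np.ones((t, t)) + se2 * np.eye(t)   # per-cluster covariance
    Vinv = np.linalg.inv(V)
    info = np.zeros((t + 1, t + 1))        # fixed period effects + treatment
    for i in range(k):
        Xi = np.hstack([np.eye(t), X[i][:, None]])
        info += Xi.T @ Vinv @ Xi
    var_trt = np.linalg.inv(info)[-1, -1]  # variance of treatment effect
    nd = NormalDist()
    return nd.cdf(abs(delta) / var_trt ** 0.5 - nd.inv_cdf(1 - alpha / 2))

# classic complete stepped wedge: 4 clusters, one crossing over per step
design = [[0, 1, 1, 1, 1],
          [0, 0, 1, 1, 1],
          [0, 0, 0, 1, 1],
          [0, 0, 0, 0, 1]]
power = sw_crt_power(design, icc=0.05, sigma2=1.0, m=20, delta=0.4)
```

An incomplete design is handled by the same machinery once rows of the per-cluster design matrix corresponding to uncollected periods are dropped.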
stepped-wedge; cluster; sample size; multiple levels of clustering
To evaluate the cost–effectiveness of pulse oximetry – compared with no peri-operative monitoring – during surgery in low-income countries.
We considered the use of tabletop and portable, hand-held pulse oximeters among patients of any age undergoing major surgery in low-income countries. From earlier studies we obtained baseline mortality and the effectiveness of pulse oximeters to reduce mortality. We considered the direct costs of purchasing and maintaining pulse oximeters as well as the cost of supplementary oxygen used to treat hypoxic episodes identified by oximetry. Health benefits were measured in disability-adjusted life-years (DALYs) averted and benefits and costs were both discounted at 3% per year. We used recommended cost–effectiveness thresholds – both absolute and relative to gross domestic product (GDP) per capita – to assess if pulse oximetry is a cost–effective health intervention. To test the robustness of our results we performed sensitivity analyses.
In 2013 prices, tabletop and hand-held oximeters were found to have annual costs of 310 and 95 United States dollars (US$), respectively. Assuming the two types of oximeter have identical effectiveness, a single oximeter used for 22 procedures per week averted 0.83 DALYs per annum. The tabletop and hand-held oximeters cost US$ 374 and US$ 115 per DALY averted, respectively. For any country with a GDP per capita above US$ 677 the hand-held oximeter was found to be cost–effective if it prevented just 1.7% of anaesthetic-related deaths or 0.3% of peri-operative mortality.
Pulse oximetry is a cost–effective intervention for low-income settings.
Endometrial cancer is the most common genital tract carcinoma among women in developed countries, with most women presenting with stage 1 disease. Adjuvant progestagen therapy has been advocated following primary surgery to reduce the risk of recurrence of disease.
To evaluate the effectiveness and safety of adjuvant progestagen therapy for the treatment of endometrial cancer.
We searched the Cochrane Gynaecological Cancer Group Trials Specialised Register, the Cochrane Central Register of Controlled Trials (CENTRAL, Issue 2, 2009), and MEDLINE and EMBASE up to April 2009.
Randomised controlled trials (RCTs) of progestagen therapy in women who have had surgery for endometrial cancer.
Data collection and analysis
Two review authors independently abstracted data and assessed risk of bias. Risk ratios (RRs) comparing survival in women who did and did not receive progestagen were pooled in random effects meta-analyses.
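The random-effects pooling described above can be sketched as inverse-variance weighting on the log risk-ratio scale with a DerSimonian-Laird estimate of between-trial variance; the review's actual software may differ, and the inputs below are hypothetical:

```python
import math

def dl_random_effects(log_rrs, ses):
    """DerSimonian-Laird random-effects pooling of log risk ratios.
    Returns the pooled RR and its approximate 95% confidence interval."""
    w = [1 / s ** 2 for s in ses]                       # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, log_rrs)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rrs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(log_rrs) - 1)) / c)       # between-trial variance
    w_re = [1 / (s ** 2 + tau2) for s in ses]           # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_rrs)) / sum(w_re)
    se = sum(w_re) ** -0.5
    return (math.exp(pooled),
            (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)))

# hypothetical log RRs and standard errors from four trials
rr, ci = dl_random_effects([0.05, -0.10, 0.02, 0.08], [0.12, 0.15, 0.10, 0.20])
```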
Seven trials assessing 4556 women were identified. Three trials included women with stage one disease only, whereas four included women with more advanced disease. Meta-analysis of four trials showed that there was no significant difference in the risk of death at five years between adjuvant progestagen therapy and no further treatment (RR = 1.00, 95% CI 0.85 to 1.18). This conclusion is also robust to single trial analyses at 4 and 7 years and in one trial across all points in time using a hazard ratio (HR). There was also no significant difference between progestagen therapy and control in terms of the risk of death from endometrial cancer, cardiovascular disease and intercurrent disease. Relapse of disease appeared to be reduced by progestagen therapy in one trial (HR = 0.71, 95% CI 0.52 to 0.97 and 5 year RR = 0.74, 95% CI 0.58 to 0.96), but there was no evidence of a difference in disease recurrence in another trial at 7 years (RR = 1.34, 95% CI 0.79 to 2.27).
There is no evidence to support the use of adjuvant progestagen therapy in the primary treatment of endometrial cancer. There have now been several RCTs which have failed to establish a role for adjuvant progestagen therapy after primary treatment for endometrial cancer, and therefore further trials in this field are probably not justified.
Antineoplastic Agents, Hormonal [*therapeutic use]; Cause of Death; Chemotherapy, Adjuvant; Endometrial Neoplasms [*drug therapy; mortality; pathology; therapy]; Hydroxyprogesterones [*therapeutic use]; Medroxyprogesterone Acetate [*therapeutic use]; Progestins [*therapeutic use]; Randomized Controlled Trials as Topic; Female; Humans
Medical device procurement processes for low- and middle-income countries (LMICs) are a poorly understood and researched topic. To support LMIC policy formulation in this area, international public health organizations and research institutions issue a large body of predominantly grey literature including guidelines, manuals and recommendations. We propose to undertake a systematic review to identify and explore the medical device procurement methodologies suggested within this and further literature. Procurement facilitators and barriers will be identified, and methodologies for medical device prioritization under resource constraints will be discussed.
Searches of both bibliographic and grey literature will be conducted to identify documents relating to the procurement of medical devices in LMICs. Data will be extracted according to protocol on a number of pre-specified issues and variables. First, data relating to the specific settings described within the literature will be noted. Second, information relating to medical device procurement methodologies will be extracted, including prioritization of procurement under resource constraints, the use of evidence (e.g. cost-effectiveness evaluations, burden of disease data) as well as stakeholders participating in procurement processes. Information relating to prioritization methodologies will be extracted in the form of quotes or keywords, and analysis will include qualitative meta-summary. Narrative synthesis will be employed to analyse data otherwise extracted. The PRISMA guidelines for reporting will be followed.
The current review will identify recommended medical device procurement methodologies for LMICs. Prioritization methods for medical device acquisition will be explored. Relevant stakeholders, facilitators and barriers will be discussed. The review is aimed at both LMIC decision makers and the international research community and hopes to offer a first holistic conceptualization of this topic.
Developing countries; Prioritization; Procurement; Medical devices
To understand how the results of laboratory tests are communicated to patients in primary care and perceptions on how the process may be improved.
Qualitative study employing staff focus groups.
Four UK primary care practices.
Staff involved in the communication of test results.
Five main themes emerged from the data: (i) the default method for communicating results differed between practices; (ii) clinical impact of results and patient characteristics such as anxiety level or health literacy influenced methods by which patients received their test result; (iii) which staff member had responsibility for the task was frequently unclear; (iv) barriers to communicating results existed, including there being no system or failsafe in place to determine whether results were returned to a practice or patient; (v) staff envisaged problems with a variety of test result communication methods discussed, including use of modern technologies, such as SMS messaging or online access.
Communication of test results is a complex yet core primary care activity necessitating flexibility by both patients and staff. Dealing with the results from increasing numbers of tests is resource intensive and pressure on practice staff can be eased by greater utilization of electronic communication. Current systems appear vulnerable with no routine method of tracing delayed or missing results. Instead, practices only become aware of missing results following queries from patients. The creation of a test communication protocol for dissemination among patients and staff would help ensure both groups are aware of their roles and responsibilities.
Diagnostic tests; medical errors/patient safety; practice management; primary care; qualitative research/study; quality of care.
This protocol concerns the assessment of cost-effectiveness of hospital health information technology (HIT) in four hospitals. Two of these hospitals are acquiring ePrescribing systems incorporating extensive decision support, while the other two will implement systems incorporating more basic clinical algorithms. Implementation of an ePrescribing system will have diffuse effects over myriad clinical processes, so the protocol has to deal with a large amount of information collected at various ‘levels’ across the system.
The method we propose is use of Bayesian ideas as a philosophical guide.
Assessment of cost-effectiveness requires a number of parameters in order to measure incremental cost utility or benefit – the effectiveness of the intervention in reducing frequency of preventable adverse events; utilities for these adverse events; costs of HIT systems; and cost consequences of adverse events averted. There is no single end-point that adequately and unproblematically captures the effectiveness of the intervention; we therefore plan to observe changes in error rates and adverse events in four error categories (death, permanent disability, moderate disability, minimal effect). For each category we will elicit and pool subjective probability densities from experts for reductions in adverse events, resulting from deployment of the intervention in a hospital with extensive decision support. The experts will have been briefed with quantitative and qualitative data from the study and external data sources prior to elicitation. Following this, there will be a process of deliberative dialogues so that experts can “re-calibrate” their subjective probability estimates. The consolidated densities assembled from the repeat elicitation exercise will then be used to populate a health economic model, along with salient utilities. The credible limits from these densities can define thresholds for sensitivity analyses.
The protocol we present here was designed for evaluation of ePrescribing systems. However, the methodology we propose could be used whenever research cannot provide a direct and unbiased measure of comparative effectiveness.
ePrescribing; Health information technology; Cost-effectiveness; Adverse events; Bayesian elicitation; Probability densities
CT colonography (CTC) may be an acceptable test for colorectal cancer screening but bowel preparation can be a barrier to uptake. This study tested the hypothesis that prospective screening invitees would prefer full-laxative preparation with higher sensitivity and specificity for polyps, despite greater burden, over less burdensome reduced-laxative or non-laxative alternatives with lower sensitivity and specificity.
Discrete choice experiment.
Online, web-based survey.
2819 adults (45–54 years) from the UK responded to an online invitation to take part in a cancer screening study. Quota sampling ensured that the sample reflected key demographics of the target population and had no relevant bowel disease or medical qualifications. The analysis comprised 607 participants.
After receiving information about screening and CTC, participants completed 3–4 choice scenarios. Scenarios showed two hypothetical forms of CTC with different permutations of three attributes: preparation, sensitivity and specificity for polyps.
Primary outcome measures
Participants considered the trade-offs in each scenario and stated their preferred test (or chose neither).
Preparation and sensitivity for polyps were both significant predictors of preferences (coefficients: −3.834 to −6.346 for preparation, 0.207–0.257 for sensitivity; p<0.0005). These attributes predicted preferences to a similar extent. Realistic specificity values were non-significant (−0.002 to 0.025; p=0.953). Contrary to our hypothesis, probabilities of selecting tests were similar for realistic forms of full-laxative, reduced-laxative and non-laxative preparations (0.362–0.421). However, they were substantially higher for hypothetical improved forms of reduced-laxative or non-laxative preparations with better sensitivity for polyps (0.584–0.837).
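In a discrete choice experiment, attribute coefficients such as those above translate into predicted choice probabilities through a multinomial logit model. A toy sketch follows; the utility values are hypothetical and the attribute coding of the fitted model is not specified in the abstract:

```python
import math

def logit_choice_probs(utilities):
    """Multinomial-logit choice probabilities from alternative utilities:
    P(i) = exp(U_i) / sum_j exp(U_j)."""
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical utilities for two test profiles in one choice scenario
probs = logit_choice_probs([0.8, 0.3])
```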
Uptake of CTC following non-laxative or reduced-laxative preparations is unlikely to be greater than following full-laxative preparation as perceived gains from reduced burden may be diminished by reduced sensitivity. However, both attributes are important so a more sensitive form of reduced-laxative or non-laxative preparation might improve uptake substantially.
Preventive Medicine; Public Health; Radiology & Imaging
To establish the relative weighting given by patients and healthcare professionals to gains in diagnostic sensitivity versus loss of specificity when using CT colonography (CTC) for colorectal cancer screening.
Materials and Methods
Following ethical approval and informed consent, 75 patients and 50 healthcare professionals undertook a discrete choice experiment in which they chose between “standard” CTC and “enhanced” CTC that raised diagnostic sensitivity 10% for either cancer or polyps in exchange for varying levels of specificity. We established the relative increase in false-positive diagnoses participants traded for an increase in true-positive diagnoses.
Data from 122 participants were analysed. There were 30 (25%) non-traders for the cancer scenario and 20 (16%) for the polyp scenario. For cancer, the 10% gain in sensitivity was traded up to a median 45% (IQR 25 to >85) drop in specificity, equating to 2250 (IQR 1250 to >4250) additional false-positives per additional true-positive cancer, at 0.2% prevalence. For polyps, the figure was 15% (IQR 7.5 to 55), equating to 6 (IQR 3 to 22) additional false-positives per additional true-positive polyp, at 25% prevalence. Tipping points were significantly higher for patients than professionals for both cancer (85 vs 25, p<0.001) and polyps (55 vs 15, p<0.001). Patients were willing to pay significantly more for increased sensitivity for cancer (p = 0.021).
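The conversion from a specificity drop into "additional false positives per additional true positive" follows from prevalence; a sketch of one way to derive it, which for the cancer scenario comes out close to the reported figure of 2250:

```python
def extra_fp_per_extra_tp(prevalence, sens_gain, spec_drop):
    """Additional false positives traded for each additional true
    positive, per person screened, at the stated disease prevalence."""
    extra_tp = prevalence * sens_gain          # new cases detected
    extra_fp = (1 - prevalence) * spec_drop    # new false alarms
    return extra_fp / extra_tp

# cancer scenario: 0.2% prevalence, 10% sensitivity gain, 45% specificity drop
ratio = extra_fp_per_extra_tp(0.002, 0.10, 0.45)   # roughly 2250
```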
When screening for colorectal cancer, patients and professionals believe gains in true-positive diagnoses are worth much more than the negative consequences of a corresponding rise in false-positives. Evaluation of screening tests should account for this.
To systematically evaluate the evidence across surgical specialties as to whether staples or sutures better improve patient and provider level outcomes.
A systematic review of systematic reviews and panoramic meta-analysis of pooled estimates.
Eleven systematic reviews, including 13,661 observations, met the inclusion criteria. In orthopaedic surgery sutures were found to be preferable, and for closure of the appendiceal stump sutures were protective against both surgical site infection and post-surgical complications. However, staples were protective against leak in ileocolic anastomosis. For all other surgery types the evidence was inconclusive, with wide confidence intervals including the possibility of preferential outcomes for surgical site infection or post-surgical complications with either staples or sutures. Whilst reviews showed substantial variation in mean differences in operating time (I2 94%), there was clear evidence of a reduction in average operating time with staples across all surgery types. Few reviews reported on length of stay, but the three that did (I2 0%, including 950 observations) showed a non-significant reduction in length of stay, with evidence of publication bias (P-value for Egger test 0.05).
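The I2 values quoted above are derived from Cochran's Q statistic; a one-line sketch:

```python
def i_squared(q, df):
    """Higgins' I^2 (%): the share of variability across studies
    attributable to heterogeneity rather than chance, computed from
    Cochran's Q and its degrees of freedom (number of studies - 1)."""
    return max(0.0, (q - df) / q) * 100.0

# e.g. Q = 10 across 10 studies (df = 9) gives I^2 of 10%
example = i_squared(10, 9)
```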
Evidence across surgical specialties indicates that wound closure with staples reduces the mean operating time. Despite including several thousand observations, no clear evidence of superiority emerged for either staples or sutures with respect to surgical site infection, post surgical complications, or length of stay.
Evaluation of predictive value of liver function tests (LFTs) for the detection of liver-related disease in primary care.
A prospective observational study.
11 UK primary care practices.
Patients (n=1290) with an abnormal eight-panel LFT (but no previously diagnosed liver disease).
Main outcome measures
Patients were investigated by recording clinical features, repeating LFTs, performing specific tests for individual liver diseases, and abdominal ultrasound scanning. Patients were characterised as having: hepatocellular disease; biliary disease; tumours of the hepato-biliary system; and none of the above. The relationship between LFT results and disease categories was evaluated by stepwise regression and logistic discrimination, with adjustment for demographic and clinical factors. True and False Positives generated by all possible LFT combinations were compared with a view towards optimising the choice of analytes in the routine LFT panel.
Regression methods showed that alanine aminotransferase (ALT) was associated with hepatocellular disease (32 patients), while alkaline phosphatase (ALP) was associated with biliary disease (12 patients) and tumours of the hepatobiliary system (9 patients). A restricted panel of ALT and ALP was an efficient choice of analytes, comparing favourably with the complete panel of eight analytes, provided that 48 False Positives can be tolerated to obtain one additional True Positive. Repeating a complete panel in response to an abnormal reading is not the optimal strategy.
The LFT panel can be restricted to ALT and ALP when the purpose of testing is to exclude liver disease in primary care.
Primary Care; Epidemiology; Chemical Pathology
Medication errors are an important source of potentially preventable morbidity and mortality. The PINCER study, a cluster randomised controlled trial, is one of the world’s first experimental studies aiming to reduce the risk of such medication related potential for harm in general practice. Bayesian analyses can improve the clinical interpretability of trial findings.
Experts were asked to complete a questionnaire to elicit opinions of the likely effectiveness of the intervention for the key outcomes of interest - three important primary care medication errors. These were averaged to generate collective prior distributions, which were then combined with trial data to generate Bayesian posterior distributions. The trial data were analysed in two ways: firstly replicating the trial reported cohort analysis acknowledging pairing of observations, but excluding non-paired observations; and secondly as cross-sectional data, with no exclusions, but without acknowledgement of the pairing. Frequentist and Bayesian analyses were compared.
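When both the pooled elicited prior and the likelihood summarised from the trial are approximately normal (e.g. on a log odds-ratio scale), the posterior has the familiar precision-weighted closed form. A minimal sketch with hypothetical numbers; the study's actual prior distributions and likelihood may take a different form:

```python
def normal_posterior(prior_mean, prior_sd, data_mean, data_sd):
    """Conjugate normal update: precision-weighted combination of an
    elicited prior with the effect estimate from the trial data."""
    wp, wd = 1 / prior_sd ** 2, 1 / data_sd ** 2   # precisions
    mean = (wp * prior_mean + wd * data_mean) / (wp + wd)
    sd = (wp + wd) ** -0.5
    return mean, sd

# hypothetical: sceptical prior centred on no effect (log OR = 0),
# trial data suggesting a benefit
post_mean, post_sd = normal_posterior(0.0, 0.5, -0.7, 0.3)
```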
Bayesian evaluations suggest that the intervention is able to reduce the likelihood of one of the medication errors by about 50% (estimated to be between 20% and 70%). However, for the other two main outcomes considered, the evidence that the intervention is able to reduce the likelihood of prescription errors is less conclusive.
Clinicians are interested in what trial results mean to them, as opposed to what trial results suggest for future experiments. This analysis suggests that the PINCER intervention is strongly effective in reducing the likelihood of one of the important errors; not necessarily effective in reducing the other errors. Depending on the clinical importance of the respective errors, careful consideration should be given before implementation, and refinement targeted at the other errors may be something to consider.
Effective use of the laryngeal mask airway (LMA) requires learning proper insertion technique in normal patients undergoing routine surgical procedures. However, there is a move towards simulation training for learning practical clinical skills, such as LMA placement. The evidence linking different amounts of mannequin simulation training to the undergraduate clinical skill of LMA placement in real patients is limited. The purpose of this study was to compare the effectiveness in vivo of two LMA placement simulation courses of different durations.
Medical students (n = 126) enrolled in a randomised controlled trial. Seventy-eight of these students completed the trial. The control group (n = 38) received brief mannequin training while the intervention group (n = 40) received additional more intensive mannequin training as part of which they repeated LMA insertion until they were proficient. The anaesthetists supervising LMA placements in real patients rated the participants' performance on assessment forms. Participants completed a self-assessment questionnaire.
Additional mannequin training was not associated with improved performance (37% of intervention participants received an overall placement rating of > 3/5 on their first patient compared to 48% of the control group, X2 = 0.81, p = 0.37). The agreement between the participants and their instructors in terms of LMA placement success rates was poor to fair. Participants reported that mannequins were poor at mimicking reality.
The results suggest that the value of extended mannequin simulation training in the case of LMA placement is limited. Educators considering simulation for the training of practical skills should reflect on the extent to which the in vitro simulation mimics the skill required and the degree of difficulty of the procedure.
Cluster randomised controlled trials (CRCTs) are frequently used in health service evaluation. Assuming an average cluster size, required sample sizes are readily computed for both binary and continuous outcomes, by estimating a design effect or inflation factor. However, where the number of clusters is fixed in advance, but it is possible to increase the number of individuals within each cluster, as is frequently the case in health service evaluation, sample size formulae have been less well studied.
We systematically outline sample size formulae (including required number of randomisation units, detectable difference and power) for CRCTs with a fixed number of clusters, to provide a concise summary for both binary and continuous outcomes. Extensions to the case of unequal cluster sizes are provided.
For trials with a fixed number of equal sized clusters (k), the trial will be feasible provided the number of clusters is greater than the product of the number of individuals required under individual randomisation (nI) and the estimated intra-cluster correlation (ρ). So, a simple rule is that the number of clusters (k) will be sufficient provided: k > nI × ρ.
Where this is not the case, investigators can determine the maximum available power to detect the pre-specified difference, or the minimum detectable difference under the pre-specified value for power.
Designing a CRCT with a fixed number of clusters might mean that the study will not be feasible, leading to the notion of a minimum detectable difference (or a maximum achievable power), irrespective of how many individuals are included within each cluster.
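The feasibility rule above can be checked numerically. The sketch below assumes a continuous outcome and the usual two-sided z-approximation; whether nI counts individuals per arm or in total follows the paper's convention, which the abstract leaves implicit.

```python
from statistics import NormalDist

def n_individual(delta, sd, alpha=0.05, power=0.8):
    """Sample size per arm under individual randomisation,
    two-sided z-approximation for a continuous outcome."""
    z = NormalDist().inv_cdf
    return 2 * ((z(1 - alpha / 2) + z(power)) * sd / delta) ** 2

def fixed_clusters_feasible(k, delta, sd, icc, **kw):
    """Rule from the abstract: with k clusters fixed, no increase in
    cluster size can rescue the trial unless k > nI * rho."""
    return k > n_individual(delta, sd, **kw) * icc

# e.g. detecting a 0.25 SD difference with ICC 0.05 needs roughly 13 clusters
feasible_20 = fixed_clusters_feasible(20, delta=0.25, sd=1.0, icc=0.05)  # True
feasible_10 = fixed_clusters_feasible(10, delta=0.25, sd=1.0, icc=0.05)  # False
```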
We aimed to assess whether routine data produced by an electronic prescribing system might be useful in identifying doctors at higher risk of making a serious prescribing error.
Retrospective analysis of prescribing by junior doctors over 12 months using an electronic prescribing information and communication system. The system issues a graded series of prescribing alerts (low-level, intermediate, and high-level), and warnings and prompts to respond to abnormal test results. These may be overridden or heeded, except for high-level prescribing alerts, which are indicative of a potentially serious error and impose a ‘hard stop’.
A large teaching hospital.
All junior doctors in the study setting.
Main outcome measures
Rates of prescribing alerts and laboratory warnings and doctors' responses.
Altogether 848,678 completed prescriptions issued by 381 doctors (median 1538 prescriptions per doctor, interquartile range [IQR] 328–3275) were analysed. We identified 895,029 low-level alerts (median 1033 per 1000 prescriptions per doctor, IQR 903–1205) with a median of 34% (IQR 31–39%) heeded; 172,434 intermediate alerts (median 196 per 1000 prescriptions per doctor, IQR 159–266), with a median of 23% (IQR 16–30%) heeded; and 11,940 high-level ‘hard stop’ alerts. Doctors vary greatly in the extent to which they trigger and respond to alerts of different types. The rate of high-level alerts showed weak correlation with the rate of intermediate prescribing alerts (correlation coefficient, r = 0.40, P < 0.001); very weak correlation with low-level alerts (r = 0.12, P = 0.019); and weak (and sometimes negative) correlation with propensity to heed test-related warnings or alarms. The degree of correlation between generation of intermediate and high-level alerts is insufficient to identify doctors at high risk of making serious errors.
Routine data from an electronic prescribing system should not be used to identify doctors who are at risk of making serious errors. Careful evaluation of the kinds of quality assurance questions for which routine data are suitable will be increasingly valuable.
Tests of cognitive ability are probably the best method at present
Liver function tests (LFTs) are ordered in large numbers in primary care, and the Birmingham and Lambeth Liver Evaluation Testing Strategies (BALLETS) study was set up to assess their usefulness in patients with no pre-existing or self-evident liver disease. All patients were tested for chronic viral hepatitis thereby providing an opportunity to compare various strategies for detection of this serious treatable disease.
This study uses data from the BALLETS cohort to compare various testing strategies for viral hepatitis in patients who had received an abnormal LFT result. The aim was to inform a strategy for identification of patients with chronic viral hepatitis. We used a cost-minimisation analysis to define a base case and then calculated the incremental cost per case detected to inform a strategy that could guide testing for chronic viral hepatitis.
Of the 1,236 study patients with an abnormal LFT, 13 had chronic viral hepatitis (nine hepatitis B and four hepatitis C). The strategy advocated by the current guidelines (repeating the LFT with a view to testing for specific disease if it remained abnormal) was less efficient (more expensive per case detected) than a simple policy of testing all patients for viral hepatitis without repeating LFTs. A more selective strategy of testing only those patients born in countries where viral hepatitis is prevalent provided high efficiency with little loss of sensitivity. A notably high alanine aminotransferase (ALT) level (greater than twice the upper limit of normal) on the initial test had high predictive value, but was insensitive, missing half the cases of viral infection.
Based on this analysis and on widely accepted clinical principles, a "fast and frugal" heuristic was produced to guide general practitioners with respect to diagnosing cases of viral hepatitis in asymptomatic patients with abnormal LFTs. It recommends testing all patients where a clear clinical indication of infection is present (e.g. evidence of intravenous drug use), followed by testing all patients who originated from countries where viral hepatitis is prevalent, and finally testing those who have a notably raised ALT level (more than twice the upper limit of normal). Patients not picked up by this efficient algorithm had a risk of chronic viral hepatitis that is lower than the general population.
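The "fast and frugal" heuristic described above is a short sequential decision list; a sketch of the three checks follows (function and argument names are illustrative, not from the paper):

```python
def should_test_for_viral_hepatitis(clinical_indication: bool,
                                    from_prevalent_country: bool,
                                    alt: float, alt_uln: float) -> bool:
    """Sequential checks from the abstract's heuristic:
    1) clear clinical indication of infection (e.g. intravenous drug use),
    2) origin in a country where viral hepatitis is prevalent,
    3) notably raised ALT (more than twice the upper limit of normal)."""
    if clinical_indication:
        return True
    if from_prevalent_country:
        return True
    return alt > 2 * alt_uln

# e.g. no risk factors but an ALT of 120 against an upper limit of 40
decision = should_test_for_viral_hepatitis(False, False, 120, 40)  # True
```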
Objective To independently evaluate the impact of the second phase of the Health Foundation’s Safer Patients Initiative (SPI2) on a range of patient safety measures.
Design A controlled before and after design. Five substudies: survey of staff attitudes; review of case notes from high risk (respiratory) patients in medical wards; review of case notes from surgical patients; indirect evaluation of hand hygiene by measuring hospital use of handwashing materials; measurement of outcomes (adverse events, mortality among high risk patients admitted to medical wards, patients’ satisfaction, mortality in intensive care, rates of hospital acquired infection).
Setting NHS hospitals in England.
Participants Nine hospitals participating in SPI2 and nine matched control hospitals.
Intervention The SPI2 intervention was similar to the SPI1, with somewhat modified goals, a slightly longer intervention period, and a smaller budget per hospital.
Results One of the scores (organisational climate) showed a significant (P=0.009) difference in rate of change over time, which favoured the control hospitals, though the difference was only 0.07 points on a five point scale. Results of the explicit case note reviews of high risk medical patients showed that certain practices improved over time in both control and SPI2 hospitals (and none deteriorated), but there were no significant differences between control and SPI2 hospitals. Monitoring of vital signs improved across control and SPI2 sites. This temporal effect was significant for monitoring the respiratory rate at both the six hour (adjusted odds ratio 2.1, 99% confidence interval 1.0 to 4.3; P=0.010) and 12 hour (2.4, 1.1 to 5.0; P=0.002) periods after admission. There was no significant effect of SPI for any of the measures of vital signs. Use of a recommended system for scoring the severity of pneumonia improved from 1.9% (1/52) to 21.4% (12/56) of control and from 2.0% (1/50) to 41.7% (25/60) of SPI2 patients. This temporal change was significant (7.3, 1.4 to 37.7; P=0.002), but the difference in difference was not significant (2.1, 0.4 to 11.1; P=0.236). There were no notable or significant changes in the pattern of prescribing errors, either over time or between control and SPI2 hospitals. Two items of medical history taking (exercise tolerance and occupation) showed significant improvement over time, across both control and SPI2 hospitals, but no additional SPI2 effect. The holistic review showed no significant changes in error rates either over time or between control and SPI2 hospitals. The explicit case note review of perioperative care showed that adherence rates for two of the four perioperative standards targeted by SPI2 were already good at baseline, exceeding 94% for antibiotic prophylaxis and 98% for deep vein thrombosis prophylaxis. 
Intraoperative monitoring of temperature improved over time in both groups, but this was not significant (1.8, 0.4 to 7.6; P=0.279), and there were no additional effects of SPI2. A dramatic rise in consumption of soap and alcohol hand rub was similar in control and SPI2 hospitals (P=0.760 and P=0.889, respectively), as was the corresponding decrease in rates of Clostridium difficile and meticillin resistant Staphylococcus aureus infection (P=0.652 and P=0.693, respectively). Mortality rates of medical patients included in the case note reviews in control hospitals increased from 17.3% (42/243) to 21.4% (24/112), while in SPI2 hospitals they fell from 10.3% (24/233) to 6.1% (7/114) (P=0.043). Fewer than 8% of deaths were classed as avoidable; changes in proportions could not explain the divergence of overall death rates between control and SPI2 hospitals. There was no significant difference in the rate of change in mortality in intensive care. Patients’ satisfaction improved in both control and SPI2 hospitals on all dimensions, but again there were no significant changes between the two groups of hospitals.
Conclusions Many aspects of care are already good or improving across the NHS in England, suggesting considerable improvements in quality across the board. These improvements are probably due to contemporaneous policy activities relating to patient safety, including those with features similar to the SPI, and the emergence of professional consensus on some clinical processes. This phenomenon might have attenuated the incremental effect of the SPI, making it difficult to detect. Alternatively, the full impact of the SPI might be observable only in the longer term. The conclusion of this study could have been different if concurrent controls had not been used.
Objectives To conduct an independent evaluation of the first phase of the Health Foundation’s Safer Patients Initiative (SPI), and to identify the net additional effect of SPI and any differences in changes in participating and non-participating NHS hospitals.
Design Mixed method evaluation involving five substudies, using a before and after design.
Setting NHS hospitals in the United Kingdom.
Participants Four hospitals (one in each country in the UK) participating in the first phase of the SPI (SPI1); 18 control hospitals.
Intervention The SPI1 was a compound (multi-component) organisational intervention delivered over 18 months that focused on improving the reliability of specific frontline care processes in designated clinical specialties and promoting organisational and cultural change.
Results Senior staff members were knowledgeable and enthusiastic about SPI1. There was a small (0.08 points on a 5 point scale) but significant (P<0.01) effect in favour of the SPI1 hospitals in one of 11 dimensions of the staff questionnaire (organisational climate). Qualitative evidence showed only modest penetration of SPI1 at medical ward level. Although SPI1 was designed to engage staff from the bottom up, it did not usually feel like this to those working on the wards, and questions about legitimacy of some aspects of SPI1 were raised. Of the five components to identify patients at risk of deterioration—monitoring of vital signs (14 items); routine tests (three items); evidence based standards specific to certain diseases (three items); prescribing errors (multiple items from the British National Formulary); and medical history taking (11 items)—there was little net difference between control and SPI1 hospitals, except in relation to quality of monitoring of acute medical patients, which improved on average over time across all hospitals. Recording of respiratory rate increased to a greater degree in SPI1 than in control hospitals; in the second six hours after admission recording increased from 40% (93) to 69% (165) in control hospitals and from 37% (141) to 78% (296) in SPI1 hospitals (odds ratio for “difference in difference” 2.1, 99% confidence interval 1.0 to 4.3; P=0.008). Use of a formal scoring system for patients with pneumonia also increased over time (from 2% (102) to 23% (111) in control hospitals and from 2% (170) to 9% (189) in SPI1 hospitals), which favoured controls and was not significant (0.3, 0.02 to 3.4; P=0.173). There were no improvements in the proportion of prescription errors and no effects that could be attributed to SPI1 in non-targeted generic areas (such as enhanced safety culture). 
On some measures, the lack of effect could be because compliance was already high at baseline (such as use of steroids in over 85% of cases where indicated), but even when there was more room for improvement (such as in quality of medical history taking), there was no significant additional net effect of SPI1. There were no changes over time or between control and SPI1 hospitals in errors or rates of adverse events in patients in medical wards. Mortality increased from 11% (27) to 16% (39) among controls and decreased from 17% (63) to 13% (49) among SPI1 hospitals, but the risk adjusted difference was not significant (0.5, 0.2 to 1.4; P=0.085). Poor care was a contributing factor in four of the 178 deaths identified by review of case notes. The survey of patients showed no significant differences apart from an increase in perception of cleanliness in favour of SPI1 hospitals.
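The "difference in difference" odds ratio reported for respiratory rate recording can be illustrated with a crude (unadjusted) calculation from the quoted proportions; the published estimate (2.1) comes from an adjusted model, so this sketch, which uses only the percentages in the abstract, does not match it exactly.

```python
# Crude difference-in-difference odds ratio for respiratory rate recording
# (second six hours after admission), using the proportions quoted in the
# abstract: control 40% -> 69%, SPI1 37% -> 78%. The published OR (2.1)
# comes from an adjusted model, so this unadjusted figure differs slightly.

def odds(p):
    """Convert a proportion to odds."""
    return p / (1.0 - p)

# Within-group change expressed as an odds ratio (after vs before)
control_change = odds(0.69) / odds(0.40)   # ~3.34
spi1_change = odds(0.78) / odds(0.37)      # ~6.04

# Difference in difference on the odds ratio scale
did_or = spi1_change / control_change
print(round(did_or, 2))  # 1.81 crude, vs 2.1 adjusted in the paper
```

Working on the odds ratio scale means the "difference" in each group is itself a ratio, and the between-group comparison is a ratio of those ratios.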
Conclusions The introduction of SPI1 was associated with improvements in one of the types of clinical process studied (monitoring of vital signs) and one measure of staff perceptions of organisational climate. There was no additional effect of SPI1 on other targeted issues nor on other measures of generic organisational strengthening.
In the third in a series of articles on evaluating eHealth, Richard Lilford and colleagues consider the evaluation of health IT systems as they are rolled out following preimplementation testing.
Background and objective Utilities (values representing preferences) for healthcare priority setting are typically obtained indirectly by asking patients to fill in a quality of life questionnaire and then converting the results to a utility using population values. We compared such utilities with those obtained directly from patients or the public.
Design Review of studies providing both a direct and indirect utility estimate.
Selection criteria Papers reporting comparisons of utilities obtained directly (standard gamble or time trade off) or indirectly (European quality of life 5D [EQ-5D], short form 6D [SF-6D], or health utilities index [HUI]) from the same patient.
Data sources PubMed and Tufts database of utilities.
Statistical methods Sign test for paired comparisons between direct and indirect utilities; least squares regression to describe average relations between the different methods.
Main outcome measures Mean utility scores (or median if means unavailable) for each method, and differences in mean (median) scores between direct and indirect methods.
Results We found 32 studies yielding 83 instances where direct and indirect methods could be compared for health states experienced by adults. The direct methods used were standard gamble in 57 cases and time trade off in 60 (34 used both); the indirect methods were EQ-5D (67 cases), SF-6D (13), HUI-2 (5), and HUI-3 (37). Mean utility values were 0.81 (standard gamble) and 0.77 (time trade off) for the direct methods; for the indirect methods: 0.59 (EQ-5D), 0.63 (SF-6D), 0.75 (HUI-2) and 0.68 (HUI-3).
Discussion Direct methods of estimating utilities tend to result in higher health ratings than the more widely used indirect methods, and the difference can be substantial. Use of indirect methods could have important implications for decisions about resource allocation: for example, non-lifesaving treatments are relatively more favoured in comparison with lifesaving interventions than when using direct methods.
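The sign test used for the paired comparisons between direct and indirect utilities can be sketched in a few lines; the utility pairs below are hypothetical, for illustration only, and are not data from the review.

```python
from math import comb

def sign_test_p(direct, indirect):
    """Two-sided exact sign test for paired utility estimates.

    Counts pairs where the direct utility exceeds the indirect one,
    ignoring ties, and computes a two-sided binomial P value under the
    null hypothesis that either direction is equally likely.
    """
    diffs = [d - i for d, i in zip(direct, indirect) if d != i]
    n = len(diffs)
    k = sum(1 for x in diffs if x > 0)          # direct > indirect
    # Two-sided P value: double the smaller tail, capped at 1
    tail = sum(comb(n, j) for j in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical paired utilities for eight health states (illustration only)
direct   = [0.85, 0.80, 0.78, 0.90, 0.70, 0.88, 0.75, 0.82]
indirect = [0.60, 0.65, 0.55, 0.72, 0.74, 0.61, 0.58, 0.66]
print(sign_test_p(direct, indirect))  # 0.0703125
```

The sign test ignores the magnitude of each difference, which suits paired utilities measured on instruments with different scales.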
Objective To assess the validity of case mix adjustment methods used to derive standardised mortality ratios for hospitals, by examining the consistency of relations between risk factors and mortality across hospitals.
Design Retrospective analysis of routinely collected hospital data comparing observed deaths with deaths predicted by the Dr Foster Unit case mix adjustment method.
Setting Four acute National Health Service hospitals in the West Midlands (England) with case mix adjusted standardised mortality ratios ranging from 88 to 140.
Participants 96 948 (April 2005 to March 2006), 126 695 (April 2006 to March 2007), and 62 639 (April to October 2007) admissions to the four hospitals.
Main outcome measures Presence of large interaction effects between case mix variables and hospital in a logistic regression model, indicating non-constant risk relations, and plausible mechanisms that could give rise to such effects.
Results Large, significant (P≤0.0001) interaction effects were seen with several case mix adjustment variables. For two of these variables, the Charlson (comorbidity) index and emergency admission, interaction effects could be explained credibly by differences in clinical coding and admission practices across hospitals.
Conclusions The Dr Foster Unit hospital standardised mortality ratio is derived from an internationally adopted/adapted method that uses at least two variables (the Charlson comorbidity index and emergency admission) that are unsafe for case mix adjustment, because their inclusion may actually increase the very bias that case mix adjustment is intended to reduce. Claims that variations in hospital standardised mortality ratios from the Dr Foster Unit reflect differences in quality of care are less than credible.
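The non-constant risk relations at issue can be made concrete with stratum-specific odds ratios: if the odds ratio for death associated with a case mix variable differs sharply between hospitals, a single pooled adjustment coefficient will mis-adjust. The counts below are hypothetical, constructed only to illustrate the coding mechanism described in the results.

```python
# Hypothetical counts illustrating a non-constant risk relation: the odds
# ratio for death associated with a high Charlson (comorbidity) index is
# computed separately for two hospitals. Divergent ratios correspond to a
# large hospital-by-variable interaction effect in a logistic model, so a
# common case mix adjustment coefficient would be inappropriate.

def death_odds_ratio(deaths_hi, alive_hi, deaths_lo, alive_lo):
    """Odds ratio for death, high vs low Charlson index."""
    return (deaths_hi / alive_hi) / (deaths_lo / alive_lo)

# Hospital A: thorough comorbidity coding, so "high Charlson" is common
or_a = death_odds_ratio(deaths_hi=60, alive_hi=540, deaths_lo=40, alive_lo=960)
# Hospital B: sparse coding, so "high Charlson" marks only the sickest patients
or_b = death_odds_ratio(deaths_hi=45, alive_hi=105, deaths_lo=55, alive_lo=1395)

print(round(or_a, 2), round(or_b, 2))  # 2.67 10.87
```

Under the coding mechanism hypothesised in the paper, the same variable carries very different prognostic meaning at the two hospitals, which is exactly what a large interaction term detects.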
The importance of antenatal care (ANC) for improving perinatal outcomes is well established. However, access to ANC in Kenya has hardly changed in the past 20 years. This study aims to identify the determinants of attending ANC and the association between attendance and behavioural and perinatal outcomes (live births and healthy birthweight) for women in the Kwale region of Kenya.
A cohort survey of 1,562 perinatal outcomes (response rate 100%) was conducted during 2004–05 in the catchment areas for five Ministry of Health dispensaries in two divisions of the Kwale region. The associations between background and behavioural decisions on ANC attendance and perinatal outcomes were explored using univariate analysis and multivariate logistic regression models with backwards-stepwise elimination. The outputs from these analyses were reported as odds ratios (OR) with 95% confidence intervals (CI).
Only 32% (506/1,562) of women reported having any ANC. Women with secondary education or above (adjusted OR 1.83; 95% CI 1.06–3.15) were more likely to attend for ANC, while those living further than 5 km from a dispensary were less likely to attend (OR 0.29; 95% CI 0.22–0.39). Paradoxically, however, the number of ANC visits increased with distance from the dispensary (OR 1.46; 95% CI 1.33–1.60). Women attending ANC at least twice were more likely to have a live birth (vs. stillbirth) in both multivariate models. Women attending for two ANC visits (but not more than two) were more likely to have a healthy weight baby (OR 4.39; 95% CI 1.36–14.15).
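The adjusted odds ratios and 95% confidence intervals reported in these results are conventionally obtained by exponentiating logistic regression coefficients; a minimal sketch follows. The coefficient and standard error are illustrative values chosen so that the output matches the education odds ratio quoted above (1.83; 1.06–3.15), not published model output.

```python
from math import exp

def or_with_ci(beta, se, z=1.96):
    """Odds ratio and 95% CI from a logistic regression coefficient.

    beta is the log odds ratio and se its standard error; the CI is
    computed on the log scale and then exponentiated.
    """
    return exp(beta), exp(beta - z * se), exp(beta + z * se)

# Illustrative values only: log odds of ANC attendance for secondary
# education or above, with a standard error chosen to reproduce the
# interval reported in the abstract
or_, lo, hi = or_with_ci(beta=0.604, se=0.277)
print(round(or_, 2), round(lo, 2), round(hi, 2))  # 1.83 1.06 3.15
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself, as in the intervals quoted above.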
The low attendance for ANC, combined with a positive relationship between attendance and perinatal outcomes for the women in the Kwale region, highlights the need for further research to understand reasons for attendance and non-attendance, and for strategies to be put in place to improve attendance for ANC.
Background and Aims
The standard whole-colon tests used to investigate patients with symptoms of colorectal cancer are barium enema and colonoscopy. Colonoscopy is the reference test but is technically difficult, resource intensive, and associated with adverse events, especially in the elderly. Barium enema is safer but has reduced sensitivity for cancer. CT colonography ("virtual colonoscopy") is a newer alternative that may combine high sensitivity for cancer with safety and patient acceptability. The SIGGAR trial aims to determine the diagnostic efficacy, acceptability, and economic costs associated with this new technology.
The SIGGAR trial is a multi-centre randomised comparison of CT colonography versus standard investigation (barium enema or colonoscopy), the latter determined by individual clinician preference. Diagnostic efficacy for colorectal cancer and colonic polyps measuring 1 cm or larger will be determined, as will the physical and psychological morbidity associated with each diagnostic test, the latter via questionnaires developed from qualitative interviews. The economic costs of making or excluding a diagnosis will be determined for each diagnostic test and information from the trial and other data from the literature will be used to populate models framed to summarise the health effects and costs of alternative approaches to detection of significant colonic neoplasia in patients of different ages, prior risks and preferences. This analysis will focus particularly on the frequency, clinical relevance, costs, and psychological and physical morbidity associated with detection of extracolonic lesions by CT colonography.
Recruitment commenced in March 2004 and at the time of writing (July 2007) 5025 patients have been randomised. A lower than expected prevalence of end-points in the barium enema sub-trial has caused an increase in sample size. In addition to the study protocol, we describe our approach to recruitment, notably the benefits of extensive piloting, the use of a sham-randomisation procedure, which was employed to determine whether centres interested in participating were likely to be effective in practice, and the provision of funding for dedicated sessions for a research nurse at each centre to devote specifically to this trial.
Current Controlled Trials ISRCTN95152621
Outcomes of care are a blunt instrument for judging performance and should be replaced, say Richard J Lilford, Celia A Brown, and Jon Nicholl