Study concept and design: Dudley, Kernisan.
Acquisition of data: Kernisan, Dudley.
Analysis and interpretation of data: Kernisan, Dudley, Boscardin, Landefeld, Lee.
Drafting of the manuscript: Kernisan, Dudley.
Critical revision of the manuscript for important intellectual content: Kernisan, Dudley, Lee, Boscardin, Landefeld.
Statistical Analysis: Boscardin, Dudley, Kernisan.
Obtained funding: Landefeld, Dudley.
Study supervision: Dudley, Landefeld.
The Leapfrog Hospital Survey allows hospitals to self-report the steps they have taken towards implementing the “Safe Practices for Better Healthcare” endorsed by the National Quality Forum. Currently Leapfrog ranks hospital performance on the Safe Practices Leap by quartiles, and presents this information to the public on its website. It is unknown how well a hospital's resulting Safe Practices Score correlates with outcomes such as inpatient mortality.
To determine the relationship between hospitals’ Safe Practices Scores and risk-adjusted inpatient mortality rates.
Observational analysis of discharge data for all urban U.S. hospitals completing the 2006 Safe Practices Leap and identifiable in the Nationwide Inpatient Sample (NIS). Leapfrog provided a Safe Practices Score (SPS) for each hospital, as well as three alternative scores based on shorter versions of the original survey. Hierarchical logistic regression was used to determine the relationship between quartiles of SPS and risk-adjusted inpatient mortality, after adjusting for hospitals’ discharge volume and teaching status. Subgroup analyses were done on patients older than 65 years and patients with greater than 5% expected mortality.
Inpatient risk-adjusted mortality, by quartiles of survey score.
155 of 1075 (14%) Leapfrog hospitals were identifiable in the NIS (1,772,064 discharges). Raw observed mortality in the primary sample was 2.09%. Fully adjusted mortality rates (95% confidence intervals in parentheses) by quartile of SPS, from lowest to highest, were 1.97% (1.78-2.18%), 2.04% (1.84-2.25%), 1.96% (1.77-2.16%), 2.00% (1.80-2.22%); p for linear trend =0.99. Results were similar in the subgroup analyses. None of the three alternative survey scores was associated with risk-adjusted inpatient mortality, although p values for linear trends were lower (0.80, 0.20, 0.11).
In this sample of 14% of all hospitals nationally that completed the Safe Practices Leap, survey scores were not significantly associated with risk-adjusted inpatient mortality.
The Leapfrog Group is a well-known non-profit business coalition that provides information regarding hospital safety and quality to its members (large companies that purchase health care) and to consumers1-3. Its primary method of evaluating hospitals is via voluntary participation in the Leapfrog Hospital Survey. Initially these annual surveys assessed hospitals’ adoption of three “Leaps” that the organization believed would improve patient safety: computerized physician order entry, intensive care unit physician staffing, and evidence-based referrals for high-mortality surgeries. In 2004 a fourth section was added to the survey, the Safe Practices Leap, to allow hospitals to report efforts towards implementing the National Quality Forum's “Safe Practices for Better Healthcare”.4
Approximately 1100 urban hospitals have completed the Safe Practices survey in recent years, and these results are reported to the public on the internet (www.leapfroggroup.org). Nevertheless, it remains unclear how well quality as assessed by the Safe Practices survey (consisting of hospitals’ self-report of structural and process measures) correlates with outcomes of interest to patients and policymakers, such as mortality. To date each of the first three Leaps has been the subject of multiple peer-reviewed studies 5-11, and a recent study examining the first three Leaps collectively did find some positive associations between survey performance and reduced mortality 12. However, little has been published regarding the Safe Practices survey, which is the most time-consuming component to complete. In particular, to our knowledge it has not yet been confirmed that higher scores on the survey correlate with actual outcomes. This issue is pertinent because performance on the survey is ranked by quartiles on the internet, which likely suggests to consumers that hospitals in the highest quartile provide safer care than those in lower quartiles.13
In this paper we present an analysis examining the relationship between urban hospitals’ scores on the 2006 Safe Practices survey and risk-adjusted inpatient mortality. To address questions of generalizability and response bias we also present comparisons between hospitals that participated in the survey and those that did not.
Mortality data was obtained from the most recent version of the Nationwide Inpatient Sample (NIS) available in late 2007, which was the 2005 NIS14. This database of inpatient discharge data is collected via federal-state partnership as part of the Agency for Healthcare Research and Quality's Healthcare Cost & Utilization Project. The 2005 NIS contains administrative data on all discharges from 1,054 hospitals located in 37 States, approximating a 20-percent stratified sample of U.S. hospitals (7,995,048 discharges in total)15. 633 hospitals are classified as urban. However, only 24 states allow the release of hospital-identifying information, leaving 400 identifiable urban hospitals in the 2005 NIS. Included in the NIS data are classification and severity measures for each patient record, including an All Patient Refined Diagnosis Related Group (APR-DRG), and APR-DRG mortality risk subclass.16 The APR-DRG system is a well-known proprietary classification system developed by 3M Health Information Systems, which uses diagnosis codes, procedure codes, and other administrative data to classify patients into base disease categories, as well as assign them to 1 of 4 levels of mortality risk (minor, moderate, major, and extreme risk of dying) within each base disease category17, 18.
Leapfrog provided survey data on 1075 urban hospitals that had participated in the 2006 Safe Practices survey. Of these, 679 were located within the 24 states providing hospital-identified data to the NIS. As the survey is completed by hospitals in the spring, the 2006 survey captures safety practices in place during the 2005 calendar year.
We discussed our analysis with the survey's authors before starting. They were interested in streamlining their survey, and asked us to consider an “Action SPS”, obtained through a different scoring methodology. This “action-focused” methodology assigned points for a given safe practice solely on the basis of a hospital's answer to the most “actionable” item in the action portion of each Safe Practice survey section. (The original scoring methodology gave credit for establishing systems of awareness, accountability, ability (capacity building investments), and action. See Box 1.) In 2008 the survey was reduced from 27 to 13 Safe Practices.* We accordingly repeated our analyses using an “SPS-13” score and an “Action SPS-13” score based on the 13 retained practices.
The sample for the primary analysis evaluating the association between Safe Practices Score (SPS) and mortality consisted of discharges from those urban U.S. hospitals which had completed the Safe Practices survey and were also identifiable in the 2005 NIS (155 hospitals). We also planned subgroup analyses for two populations in which we hypothesized that inpatient mortality might be more sensitive to adherence to Safe Practices: patients older than 65 years and patients with greater than 5% expected mortality.
Discharges excluded from the analysis were those of patients under age 18, oncology patients, solid organ transplant patients, and patients transferred to or from another acute facility. Application of the exclusion criteria to discharges from urban hospitals in the 24 hospital-identifying NIS states reduced the number of eligible discharges from 4,873,959 to 3,672,146. The exclusions were standard per the recommendations of 3M Health Information Systems for use of the APR-DRG system in risk-adjustment18.
The primary predictor of interest was each hospital's 2006 Safe Practices Score (SPS). The maximum total SPS score possible was 1000. An additional predictor of interest was each hospital's 2006 Action Safe Practices Score (ASPS). We also subsequently tested SPS and ASPS scores based on the 13 safe practices that were retained in the 2008 survey (SPS-13 and ASPS-13, respectively).
To examine the relationship between survey scores and inpatient mortality, we built hierarchical logistic regression models, known more broadly as generalized linear mixed models19, with each discharged patient as the unit of analysis and a random intercept for each hospital to capture the correlation of patients within a hospital. Because the distribution of survey scores was heavily skewed, we categorized the scores into quartiles prior to using them as a predictor in our models. Categorizing the score into quartiles also seemed appropriate as Leapfrog itself rates hospitals on its website based on which quartile of survey score they fall into.
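The quartile categorization described above can be illustrated with a minimal sketch. It assumes that cut points are taken from the observed score distribution and that ties fall into the lower group; the paper does not state the exact rule used. The function name and the example scores are hypothetical and are not Leapfrog data.

```python
from statistics import quantiles


def assign_quartiles(scores):
    """Return a quartile label (1 = lowest, 4 = highest) for each score."""
    # Three cut points that split the observed scores into four groups.
    # Assumption: "inclusive" interpolation; the paper does not specify a method.
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    labels = []
    for score in scores:
        if score <= q1:
            labels.append(1)
        elif score <= q2:
            labels.append(2)
        elif score <= q3:
            labels.append(3)
        else:
            labels.append(4)
    return labels


# Hypothetical, skewed survey scores out of 1000 (illustration only).
example_scores = [610, 742, 801, 855, 902, 934, 951, 968, 975, 982, 990, 1000]
example_quartiles = assign_quartiles(example_scores)
```

With a skewed distribution such as this one, the lowest quartile spans a much wider range of scores than the highest, which is one reason a categorical predictor may be easier to interpret than the raw score.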
To adjust for mortality risk, we used the population of discharges from urban hospitals in the 24 hospital-identifying NIS states (3,672,146 discharges) to calculate the observed mortality rate for each combination of APR-DRG and APR-DRG mortality risk score. We then assigned to each patient the mortality risk associated with his or her APR-DRG and APR-DRG mortality risk category and used this expected mortality risk as an adjustor in our logistic regression models.
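As an illustration of this risk-adjustment step, the sketch below computes the observed death rate within each combination of APR-DRG and mortality risk subclass in a reference population, then attaches that rate to each discharge as its expected mortality risk. The field names (`apr_drg`, `risk_subclass`, `died`), the category labels, and the toy records are assumptions made for the example; they do not reflect the NIS file layout or the authors' Stata code.

```python
from collections import defaultdict


def expected_mortality_by_cell(discharges):
    """Observed death rate for each (APR-DRG, risk subclass) combination."""
    deaths = defaultdict(int)
    totals = defaultdict(int)
    for d in discharges:
        cell = (d["apr_drg"], d["risk_subclass"])
        totals[cell] += 1
        deaths[cell] += d["died"]  # died is 1 for an inpatient death, else 0
    return {cell: deaths[cell] / totals[cell] for cell in totals}


def attach_expected_risk(discharges, cell_rates):
    """Return copies of the discharges with an expected_risk field added."""
    return [
        dict(d, expected_risk=cell_rates[(d["apr_drg"], d["risk_subclass"])])
        for d in discharges
    ]


# Toy reference population with hypothetical category labels.
reference = [
    {"apr_drg": "DRG_A", "risk_subclass": 1, "died": 0},
    {"apr_drg": "DRG_A", "risk_subclass": 1, "died": 0},
    {"apr_drg": "DRG_A", "risk_subclass": 1, "died": 0},
    {"apr_drg": "DRG_A", "risk_subclass": 1, "died": 1},
    {"apr_drg": "DRG_A", "risk_subclass": 4, "died": 1},
    {"apr_drg": "DRG_A", "risk_subclass": 4, "died": 0},
]
cell_rates = expected_mortality_by_cell(reference)
adjusted = attach_expected_risk(reference, cell_rates)
```

In a regression model, the resulting `expected_risk` value would enter as a patient-level covariate alongside the hospital-level predictors.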
Additional hospital characteristics included in the model were volume of discharges and whether the hospital was a teaching hospital. These were treated as control variables to account for the possibility that a hospital's Safe Practices Score may indicate better quality of care through mechanisms not mediated by hospital size and teaching status. Rural hospitals were excluded from our analysis because: 1) Leapfrog does not target these hospitals, and 2) the co-linearity of rural location and hospital discharge volume would have rendered our models extremely unstable. (Rural hospitals participating in the survey collectively only accounted for 71,169 discharges in the NIS.)
In our models, survey quartile categorizations were entered as three dummy variables with the highest quartile as reference group to produce odds ratios for mortality versus the highest SPS quartile, as well as adjusted mortality rates. A test for linear trend of the log-odds of mortality was conducted using a likelihood ratio test with the linear orthogonal polynomial contrast.
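The coding of the quartile predictor can be sketched as follows. The example shows indicator (dummy) variables with the highest quartile as the reference group, together with the standard linear orthogonal polynomial contrast coefficients for four equally spaced levels (-3, -1, 1, 3). The function and variable names are hypothetical, and the sketch covers only the coding of the predictor, not the model fitting or the likelihood ratio test.

```python
def quartile_dummies(quartile, reference=4):
    """Indicator variables for each quartile other than the reference group."""
    return {
        f"q{level}": int(quartile == level)
        for level in (1, 2, 3, 4)
        if level != reference
    }


# Linear orthogonal polynomial contrast for four equally spaced levels.
# The coefficients sum to zero; software often rescales them to unit length.
LINEAR_CONTRAST = {1: -3, 2: -1, 3: 1, 4: 3}


def linear_trend_score(quartile):
    """Contrast coefficient used when testing for a linear trend."""
    return LINEAR_CONTRAST[quartile]


lowest = quartile_dummies(1)   # only the q1 indicator is set
highest = quartile_dummies(4)  # reference group: all indicators are zero
```

Under this coding, the exponentiated coefficient on each indicator is the odds ratio for mortality in that quartile relative to the highest quartile.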
A post-analysis power calculation was done to assess the difference in mortality that our data would permit us to detect. Data were available on 1,772,064 admissions clustered in 155 hospitals of interest. A conservative estimate of 0.025 for the intrahospital correlation of risk-adjusted mortality (others20 have estimated this correlation to be below 0.01) implies an effective sample size of approximately 6000. Assuming an overall mortality rate of 2%, we calculated that we had 80% power to detect a one percentage point linear increase in mortality from the 1st to the 4th quartile of Safe Practices Score, assuming two-tailed alpha=0.05. We also calculated that to have 80% power to detect a mortality increase from 1.9% (Q1) to 2.1% (Q4) would require data from 500 hospitals.
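The effective sample size quoted above can be approximated with the usual design-effect formula for clustered data, 1 + (m - 1) x ICC, where m is the mean number of admissions per hospital. The sketch below assumes equal cluster sizes, which is a simplification; the paper does not state the exact formula used in nQuery, and the function name is hypothetical.

```python
def effective_sample_size(n_total, n_clusters, icc):
    """Approximate effective sample size for clustered observations.

    Assumes equal cluster sizes and the design effect 1 + (m - 1) * icc.
    """
    mean_cluster_size = n_total / n_clusters
    design_effect = 1 + (mean_cluster_size - 1) * icc
    return n_total / design_effect


# Figures reported in the text: 1,772,064 admissions in 155 hospitals,
# with an assumed intrahospital correlation of 0.025.
n_eff = effective_sample_size(1_772_064, 155, 0.025)
```

Under these assumptions the result is a little under 6,200, in line with the approximate figure of 6000 given in the text. Because each hospital contributes many thousands of admissions, even a small intrahospital correlation reduces the effective sample size by more than two orders of magnitude.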
As the human subjects data used for this study were completely de-identified and not collected for the purpose of this research, this study was considered exempt from institutional review board approval, per the policies of the University of California San Francisco's Committee on Human Research. All analyses were performed using Stata/MP version 10.0 (StataCorp, College Station, Texas), except for the power calculations, which were performed using nQuery version 6.02. A two-sided significance level of 0.05 was used for all hypothesis tests.
Of the 155 urban hospitals in the NIS which had completed the Safe Practices survey, 34% were teaching hospitals and 66% were non-teaching hospitals (Table 1). Within the NIS, hospitals participating in the survey tended to have higher volumes of discharges than hospitals that had not participated in the survey. Survey scores for participating hospitals identifiable in the NIS were similar to survey scores for participating hospitals not identifiable in the NIS (Table 1). The distribution of survey scores was heavily skewed, with most hospitals scoring above 770/1000, regardless of the scoring methodology used.
In 2005 there were 1,772,064 discharges from the 155 hospitals of interest, of which 37,033 resulted in an inpatient death (2.09%). Quartiles of Safe Practices Score (SPS) were not a significant predictor of mortality, whether or not we adjusted for expected mortality rate or added volume of discharges and teaching status to our models (Table 2). From lowest to highest quartiles of Safe Practices Score, inpatient mortality rates adjusted for patient and hospital characteristics were (95% confidence intervals in parentheses) 1.97% (1.78-2.18%), 2.04% (1.84-2.25%), 1.96% (1.77-2.16%), and 2.00% (1.80-2.22%); p for trend across four quartiles = 0.99. Similarly, quartiles of performance using the SPS-13 (which assesses hospitals regarding the 13 Safe Practices retained in the 2008 survey) did not predict inpatient mortality: fully-adjusted mortality rates from lowest to highest quartile were 1.99% (1.80-2.21%), 2.00% (1.81-2.21%), 2.02% (1.82-2.24%), and 1.95% (1.75-2.16%); p for trend across four quartiles = 0.80.
Quartiles of survey performance using the action-based scoring methodology (ASPS) also did not significantly predict inpatient mortality. Fully-adjusted inpatient mortality rates from lowest to highest quartile of ASPS were 2.08% (1.87-2.30%), 2.02% (1.83-2.23%), 2.02% (1.83-2.24%), and 1.86% (1.68-2.06%); p for trend across four quartiles = 0.20. Using ASPS-13, fully-adjusted inpatient mortality rates from lowest to highest quartile were 2.12% (1.91-2.35%), 2.01% (1.82-2.22%), 2.01% (1.82-2.22%), and 1.85% (1.66-2.05%); p for trend across four quartiles = 0.11.
To see if survey scores might be more predictive of mortality in patients who might be at higher risk of death, we repeated our analyses with two subgroups of patients: those age 65 and older (n=721,497) and those with 5% or greater expected mortality risk (n=163,138). Overall in-hospital mortality rates in these populations were 3.9% and 18.3%, respectively. Quartiles of survey score were not predictive in these subgroups (Tables 3 & 4).
Finally, we also examined the data to see if hospitals that had chosen to participate in the Safe Practices survey had fewer inpatient deaths than other hospitals in the NIS (within the 24 states providing hospital-identifying information). Fully adjusted mortality (with 95% confidence intervals in parentheses) among hospitals participating in the survey (1,772,064 discharges within 155 hospitals) was 1.96% (1.83-2.09%) compared to 2.06% (1.94-2.19%) among hospitals not participating (1,892,725 discharges within 243 hospitals), p = 0.45.
On the internet, hospital performance on the Safe Practices survey is ranked by quartiles, which may suggest to consumers that hospitals in higher quartiles are safer than hospitals in lower quartiles. In this first study of the relationship between Safe Practices survey scores and hospital outcomes, we studied a national sample of hospitals and found no relationship between quartiles of score and in-hospital mortality, regardless of whether or not we adjusted for expected mortality risk and certain hospital characteristics (Table 2).
A hospital's performance on each Safe Practice is evaluated in the survey via several questions assessing institutional systems to promote awareness, accountability, ability (capacity-building investments), and action (see Box 1). Part of the rationale for designing the survey this way originally may have been to provide “training wheels” and give hospitals credit for creating systems that could eventually support full implementation of a given Safe Practice. However, awarding survey points for hospital administrative structures raises the possibility that the survey is capturing excessive noise, which may be overwhelming an important signal.
It seemed plausible that inpatient mortality could relate to whether actions were being taken to implement the Safe Practice. For this reason the survey's creators proposed the alternative “action-based” scoring method which assigned all points for a Safe Practice based on whether or not the hospital indicated that it had implemented the key action for the practice. In our analysis, using this “action-based” scoring method slightly improved the ability of the survey ranking to predict in-hospital mortality, although the association was not statistically significant. Focusing on actions in the future may improve the survey's ability to discriminate between high-quality and low-quality hospitals.
The Safe Practices survey has recently been shortened from 27 to 13 Safe Practices, largely in response to feedback regarding the considerable time required to complete the survey. Our findings indicate that Safe Practices scores based on the 13 retained practices are unlikely to be significantly associated with inpatient mortality even if scoring is limited to actions taken, as was done in the ASPS-13. The findings do not rule out a modest association between the ASPS-13 and hospital mortality. As presented in our results (Table 2), the difference in risk-adjusted mortality between the best and worst quartiles determined by ASPS-13 was 0.27% (p-value for trend = 0.11). This difference in absolute mortality risk corresponds to a “number needed to treat” of approximately 370, which some would find clinically significant. Nonetheless, the likelihood that the observed difference arose by chance is increased by the facts that multiple statistical tests were performed and that associations were not observed in the high risk groups in which they were most strongly hypothesized.
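The "number needed to treat" figure follows directly from the reported quartile rates, as the short sketch below shows. It uses the fully adjusted ASPS-13 mortality rates for the lowest (2.12%) and highest (1.85%) quartiles; the function name is hypothetical.

```python
def number_needed_to_treat(risk_without, risk_with):
    """Reciprocal of the absolute risk reduction between two groups."""
    absolute_risk_reduction = risk_without - risk_with
    if absolute_risk_reduction <= 0:
        raise ValueError("risk_with must be lower than risk_without")
    return 1 / absolute_risk_reduction


# Adjusted mortality: 2.12% in the lowest ASPS-13 quartile, 1.85% in the highest.
nnt = number_needed_to_treat(0.0212, 0.0185)
```

The absolute difference of 0.27 percentage points gives a value of about 370. Here this means roughly 370 admissions to a highest-quartile rather than a lowest-quartile hospital per death avoided, if the observed difference were real and causal.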
Given the voluntary nature of this self-reported survey, it is also plausible that a lack of correlation with mortality might be due to confounding associated with a “healthy volunteer” effect. Hospitals that have already engaged in improving quality of care may be more likely to want to participate in the Safe Practices survey. However, in our study we found that participation in the survey was not predictive of lower risk-adjusted mortality. This would suggest that our negative findings are unlikely to be due to safer hospitals being more likely to participate in the survey.
To our knowledge, this is the first peer-reviewed analysis that has sought to assess whether better performance on the Safe Practices survey correlates with outcomes that are indicative of improved patient safety. Given that two recently published analyses21, 22 in the business/medical literature used Leapfrog's Safe Practices survey as the metric of hospital quality, it seems clear that additional efforts to explore the validity and value of this well-publicized1, 23, 24 quality measure are needed. As the patient-safety movement grows in importance, hospitals face increasingly complex choices regarding improvement and reporting to the public. Likewise, consumers are faced with multiple sources of information on hospital quality, and are encouraged to choose a facility based on this information. Despite a lack of evidence demonstrating the validity of the Safe Practices survey, the survey is well-known and is influential. For this reason, validating the survey rankings as a measure and as a source of accurate information for consumers and researchers is important.
Our findings suggest, however, that the survey as currently designed does not discriminate between hospitals with higher and lower inpatient mortality. Some will question our choice of overall inpatient mortality as the outcome of interest. We acknowledge that very valid concerns about use of mortality rates as a measure of hospital quality of care have been raised25-27. Despite this, risk-adjusted mortality rates remain among the most commonly reported outcomes in both the published literature and in public reports of hospital quality. Furthermore, the Institute of Medicine has cited prevention of inpatient deaths as an important reason to focus on patient safety.28 Until consumer-oriented hospital quality reports become explicit as to what benefits consumers can expect from a “safer” hospital, consumers will almost certainly assume a safer hospital is one in which a patient is less likely to die.29 Thus, our analyses are consistent with the most likely consumer interpretation of the data presented on the internet.
Our study results suggest several points regarding the Safe Practices survey that would be valuable for the greater patient safety community to consider. An important issue is whether the SPS is measuring what needs to be measured. Many of the Safe Practices are processes to improve care, yet in its current form the survey is measuring the “processes around the process”. This often gives hospitals credit for what essentially may be good intentions. This also gives points for having structures that may support implementation of a Safe Practice, rather than only awarding credit when the Safe Practice is being consistently followed. Such a scoring system likely is vulnerable to inflation of scores. Of note, most hospitals score quite well on the survey (Table 1). It may be that all hospitals truly are doing well on the Safe Practices; however, it seems more likely that the survey as currently designed is unable to discriminate between truly high and low adherence to the Safe Practices.
It may also be that too much is being measured. Steps have already been taken to address this by reducing the survey to 13 practices, but our results suggest that this alone is unlikely to improve the survey's ability to correlate with inpatient mortality.
Finally, it is unknown how well a hospital executive's report of actions being taken to support and implement safety practices correlates with actual activity within that hospital. It may not be reasonable to assume that hospitals are doing what their executives say they are. Our study results call into question the use of a lengthy unaudited survey as a tool for measuring adherence to Safe Practices. Further research to examine how well self-reported activities regarding hospital safety practices correlate with actual activities within the hospital will be helpful in determining the value of self-reported safety data.
Our study has important limitations. The most significant limitation is that our main analysis only had enough power to detect differences in mortality of one percentage point or greater. Although at a policy and epidemiology level the observed 0.2 percentage point difference in mortality rate between the first and fourth quartiles of performance using ASPS scoring is potentially important, we calculated that we would have needed on the order of 11 million admissions grouped in 500 hospitals to demonstrate statistical significance for a difference of this magnitude. Such an analysis was not feasible given the data available from the largest hospital data set in the nation. Furthermore, even if a statistically significant association were to be found with a larger sample, our results indicate that the overall magnitude of the relationship between survey score and mortality would almost certainly remain quite small, and would be of unclear utility to individual consumers.
A second limitation is that we did not study other outcomes that might be responsive to adoption of the Safe Practices, such as complications. It is possible that high performance on the survey does correlate with decreased complication rates, or other outcomes of interest to purchasers and policymakers. However, complication rates are difficult to accurately measure using administrative data. Finally, our primary analysis examines 155 of the 1075 urban hospitals that participated in the Safe Practices survey, raising the question of whether our findings can be generalized to the many survey participants which were not in the Nationwide Inpatient Sample (NIS). However, the NIS is designed to approximate a stratified random sample of U.S. hospitals, and individual hospitals cannot choose whether or not to be included, so there is no reason to believe that survey participants that were included in the NIS differed in any systematic fashion from those that were not in the NIS. Furthermore, we found that survey participants had similar survey scores whether or not they were in the NIS. Hence, although we cannot exclude the possibility that our findings are not generalizable to other hospitals participating in the survey, we have not identified any specific reason to question the generalizability of our findings.
In summary, although a recent study has found that hospitals performing well on the first 3 Leaps of the Leapfrog Hospital Survey do have lower risk-adjusted mortality12, our analysis was unable to find a correlation between better performance on the 4th Leap (the Safe Practices survey) and lower risk-adjusted mortality. It is possible that inviting hospitals to self-report on their patient safety practices and then assigning them to quartiles of Safe Practices Score is not an effective way to assess hospital quality and safety. Our findings should not be interpreted, however, as indicating that the Safe Practices are not important, or that the Safe Practices cannot be measured in an informative and valid way. Rather, future work should seek to establish valid methods for assessing adherence to the Safe Practices. Further research is needed to determine how performance on the Safe Practices survey or other instruments designed to measure Safe Practices performance may correlate with other outcomes of interest to patients and policymakers.
We are grateful to the Leapfrog Group for sharing its data with us.
Funding/Support: This study was supported by the UCSF Division of Geriatrics, the UCSF Institute for Health Policy Studies, and the San Francisco VA Medical Center. Dr. Kernisan was supported by a T32 research fellowship, followed by a VA Quality Scholars fellowship. Dr. Dudley's work was supported by a Robert Wood Johnson Foundation Investigator Award in Health Policy. This study was also supported by the Leapfrog Group, which provided its data at no cost.
Conflicts of interest and financial disclosures: Dr. Dudley is an occasional consultant for the Leapfrog Group but is not paid for this work. He serves on the Leapfrog Group's Hospital Rewards Program Steering Committee. Work on this article was not financially supported by the Leapfrog Group. Drs. Kernisan, Lee, Boscardin, and Landefeld report no financial or other conflicts of interest.
Role of the Sponsor: The Leapfrog Group was involved in the design of the study and provided data, but was not involved in the analysis or interpretation of the data. The Leapfrog Group was not involved in the preparation of the manuscript. They did review the manuscript and provided minor corrections of factual statements about their survey program. They were not asked to approve the final manuscript. The Robert Wood Johnson Foundation was not involved in the design of the study, the analyses performed, or the preparation of the manuscript.
*A list of the 27 and the 13 Safe Practices is available in an online Appendix.