Surgical site infections (SSIs) are increasingly used as a measure of hospital quality.8,9
This study demonstrates that SSI rates are a reliable measure of hospital quality when an adequate number of cases have been reported. When the number of cases is low (< 65), more than 50% of the variability between hospitals is due to statistical noise. When fewer than 94 cases are reported, reliability falls below the accepted threshold of 70%. Furthermore, for hospitals in the highest tertile by caseload, quality was the largest contributor to variation in outcomes. Although patient factors are important for explaining variation at the level of the individual, they contributed little overall to the variation in hospital outcomes.
Reliability is primarily driven by the number of cases and frequency of the outcome.7, 10
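The relationship between caseload, outcome frequency, and reliability can be sketched with a simple signal-to-noise calculation. In the sketch below, reliability is the ratio of between-hospital (signal) variance to total variance, where the noise term is the binomial sampling variance of an observed rate. The mean SSI rate and between-hospital variance used here are hypothetical illustration values, not estimates from this study, so the caseload thresholds it produces will not exactly reproduce the thresholds reported above.

```python
# Illustrative sketch: reliability of an observed hospital SSI rate as a
# function of caseload. The parameters p (mean SSI rate) and tau2
# (between-hospital variance) are hypothetical, chosen for illustration only.

def reliability(n, p=0.10, tau2=0.002):
    """Reliability = signal variance / (signal variance + noise variance)."""
    noise = p * (1 - p) / n  # binomial sampling variance of the observed rate
    return tau2 / (tau2 + noise)

# Reliability rises toward 1 as the number of reported cases grows.
for n in (25, 65, 94, 200):
    print(n, round(reliability(n), 2))
```

As the sketch shows, reliability depends jointly on the number of cases (which shrinks the noise term) and the frequency of the outcome (which sets the size of the sampling variance), consistent with the point made above.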
Previous studies have evaluated the reliability of other outcome measures. Hofer et al evaluated the reliability of physician performance measures for diabetic care such as number of physician visits and hospitalizations, laboratory resource use, and adequacy of glycemic control as measured by hemoglobin A1c. They found that these performance measures, even after adjustment for case-mix, were only 40% reliable, meaning that 60% of the variation between physicians was due to noise.11
Adams et al demonstrated that physician cost-profile scores, based on resource use for all episodes of care, were largely unreliable. Vascular surgery cost-profiles had the lowest median reliability (0.05) among the specialties, with median reliabilities ranging from 0.05 to 0.79.10
Using 2007 ACS NSQIP data, Osborne et al demonstrated that as vascular surgery case volume increased across quartiles, the proportion of variation in mortality across hospitals due to statistical noise decreased from 94% in the lowest quartile to 64% in the highest quartile and the reliability of mortality as a quality indicator improved.12
Presently, about half of ACS NSQIP hospitals collecting data on colon resections submit enough cases to meet the threshold for 70% reliability. Despite the demonstrated validity of the ACS NSQIP methodology,5-6 ensuring the reliability of its outcome measures is necessary to prevent misclassification of hospitals when ranking hospital performance. For example, Osborne et al demonstrated that 43% of hospitals participating in the ACS NSQIP vascular surgery program were misclassified into the wrong quartile when standard regression methods were used, including 51% of the top quartile and 26% of the bottom quartile.12
This misclassification can have significant implications for quality improvement efforts, public perception, and hospital finances in an era of pay for performance.
One potential solution might be to require ACS NSQIP participating hospitals to submit at least 94 colon cases to achieve at least 70% reliability. However, if other outcomes and types of surgeries are included, the number of cases that would need to be reported to ensure reliability might be prohibitive, particularly for low volume hospitals. Furthermore, increasing reporting requirements would demand more time and effort from the clinical nurse reviewer and could reduce the quality of other data collection efforts. The new generation of ACS NSQIP will address the tension between cost containment and sufficient sampling to ensure reliability by using a 100% sampling strategy only for selected high risk procedures.13
An alternative solution would be to use a novel technique known as reliability adjustment.
Reliability adjustment is being increasingly used in quality measurement. This technique uses empirical Bayes methods to adjust for measurement error (“noise”), which is usually due to low sample size or low event rates. As a result, unreliable outcomes from low volume hospitals will move closer to the mean, while more reliable estimates from higher volume hospitals will remain relatively stable. For example, low volume hospitals may be incorrectly classified as having extreme performance under standard analytic models when their results are due to chance alone. Reliability adjustment would move those estimates closer to the mean and decrease the likelihood of classifying them as outliers.

The disadvantages of reliability adjustment include the potential for overestimation of performance for low volume hospitals with high SSI rates; by shrinking their risk-adjusted outcomes towards the mean, we may be obscuring quality problems at low volume providers. To avoid this problem, low volume hospitals with poor outcomes should be closely scrutinized, and additional methods for evaluating quality of care in these hospitals should be considered. Lastly, no prospective studies have demonstrated the superiority of reliability adjustment. Nonetheless, Dimick et al have demonstrated using cohort data that reliability adjustment for uncommon major surgical procedures such as abdominal aortic aneurysm repair or pancreatic resection significantly reduced variations in hospital mortality rates and improved the ability to predict future low mortality.14
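The shrinkage behavior described above can be sketched in a few lines: each hospital's observed rate is pulled toward the overall mean in proportion to its unreliability. The overall mean rate and between-hospital variance below are hypothetical illustration values, not estimates from this study.

```python
# Illustrative sketch of reliability adjustment via empirical Bayes shrinkage.
# overall_mean and tau2 are hypothetical parameters for illustration only.

def shrink(observed_rate, n, overall_mean=0.10, tau2=0.002):
    """Shrink an observed rate toward the overall mean by its reliability."""
    noise = overall_mean * (1 - overall_mean) / n  # sampling variance
    w = tau2 / (tau2 + noise)                      # reliability weight in [0, 1]
    return w * observed_rate + (1 - w) * overall_mean

# A low volume hospital with an extreme observed rate is pulled strongly
# toward the mean; a high volume hospital with the same rate moves little.
print(shrink(0.25, 20))   # 20 cases: heavily shrunken
print(shrink(0.25, 400))  # 400 cases: nearly unchanged
```

This illustrates both the appeal and the caveat noted above: the low volume hospital is far less likely to be flagged as an outlier by chance, but a genuinely poor performer with few cases is pulled toward average as well.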
This study has several limitations. First, the study uses ACS NSQIP data, which selects a representative sample of cases to determine risk-adjusted outcomes. Use of only some rather than all cases may underestimate the reliability of SSIs and overestimate the percentage of low reliability hospitals. However, this methodology currently forms the basis for participating hospitals’ quality improvement efforts. Second, only colon resection cases were included in the analysis. Because colon resection is a common and high-risk procedure, the reliability of superficial SSIs in this study may be higher than that for other procedures. It remains unclear whether superficial SSIs are reliable across all surgical procedures, and whether pooling rates across procedures to increase reliability would be appropriate for guiding hospital quality improvement efforts.
In conclusion, superficial SSI rates after colon resections are a reliable indicator of hospital quality when the number of cases is adequate, likely because both the procedure and the outcome are common. Particularly given the implications of misclassifying hospitals and surgeons based on performance, consideration should be given to methods that increase the reliability of measured outcomes, such as the 100% sampling of targeted high risk procedures planned for the new generation of ACS NSQIP and/or reliability adjustment.