In 2007, Medicare introduced a web-based model to aid the public in selecting hospitals (Francis 2007). While motivated by legitimate concerns about the instability of estimates of quality in small hospitals, the random effects model used by Medicare assumes hospital quality is a random variable unrelated to hospital volume, in contrast to a substantial empirical literature that shows strong relationships between hospital volume and AMI mortality.
The assumption in the CMS Hospital Compare random effects model developed by Krumholz et al. (2006a) is that there is no association between hospital volume, or other hospital characteristics, and the quality of care. By using such a random effects model to evaluate hospital quality, Medicare has, in effect, assumed away any volume–outcome effect and reassigned (recalculated or shrunken) adjusted mortality rates in low-volume hospitals, thereby reducing any volume–outcome association that may be present. This recalculation is inherent in the decision to exclude volume and other hospital characteristics from the shrinkage model. A better model would allow the data to speak to the issue of the association between AMI risk and hospital attributes, such as hospital volume and other facility characteristics, rather than assuming there is no such association. If one estimates using shrinkage, one must shrink toward a reasonable model.
As a group, small hospitals are performing below average, yet in the Hospital Compare model, one by one, these small hospitals' outcomes are shrunken to the expected mortality rate of the entire population and are reported to be no different from average. See Tables 3 and 4 of Appendix SA2 for simulation studies displaying this finding.
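The mechanism can be sketched in a small simulation. All numbers below are hypothetical, and the weight formula is a stylized empirical-Bayes stand-in for illustration, not the actual CMS estimator: when the shrinkage target is the grand mean, each small hospital's estimate is pulled far toward that mean, and the volume–outcome gap largely disappears.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numbers: 50 small hospitals (30 AMI cases each, true
# mortality 20%) and 50 large hospitals (500 cases each, true 12%).
n_small, n_large = 50, 50
cases_small, cases_large = 30, 500
deaths_small = rng.binomial(cases_small, 0.20, n_small)
deaths_large = rng.binomial(cases_large, 0.12, n_large)

obs_small = deaths_small / cases_small
obs_large = deaths_large / cases_large

# Grand mean mortality over all cases: the shrinkage target when the
# model excludes volume, as in the Hospital Compare random effects model.
grand_mean = (deaths_small.sum() + deaths_large.sum()) / (
    n_small * cases_small + n_large * cases_large
)

# Stylized shrinkage: weight n / (n + k) on the observed rate, with
# k = 100 a made-up prior strength chosen only for illustration.
k = 100.0

def shrink(obs, n, target):
    w = n / (n + k)
    return w * obs + (1 - w) * target

shrunk_small = shrink(obs_small, cases_small, grand_mean)
shrunk_large = shrink(obs_large, cases_large, grand_mean)

print(f"grand mean mortality:  {grand_mean:.3f}")
print(f"small: observed {obs_small.mean():.3f} -> shrunken {shrunk_small.mean():.3f}")
print(f"large: observed {obs_large.mean():.3f} -> shrunken {shrunk_large.mean():.3f}")
```

In this sketch the small hospitals' observed rate of roughly 20 percent is reported, after shrinkage, as only a few points above the grand mean, so each small hospital looks close to average even though the group as a whole is clearly worse.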
In a recent report, Dimick et al. (2009) demonstrate the usefulness of using both mortality and volume in predictive models in surgery, and Li et al. (2009) suggest that random effects models used to compare nursing home quality appear to underestimate the poor performance of smaller nursing homes. Mukamel et al. (2010) have also shown that the random effects model can incorrectly shrink to the grand mean when systematic differences occur in the population. The Leapfrog Group has also recently implemented a random effects model with volume. For some related theory, see Cox and Wong (2010).
One may argue that hospital volume, or even other hospital characteristics relevant for AMI, should be included in the Hospital Compare random effects model. Normand, Glickman, and Gatsonis (1997)
have contended that when there are systematic differences across providers, “… more accurate estimates of provider specific adjusted outcomes will be obtained by inclusion of relevant provider characteristics.” In the case of the Hospital Compare random effects model, these provider characteristics should include hospital volume and possibly other factors associated with better outcomes, such as nurse-to-bed ratio, nurse mix, technology status, and resident-to-bed ratio (Silber et al. 2007; Silber et al. 2009).
We included volume in the model through the use of indicators for hospital volume. This simple approach avoids difficulties that might arise when the shape of the volume–outcome relationship is estimated mostly from the largest hospitals. One can certainly include hospital volume as a continuous variable in the model, but one must model the relationship correctly; in particular, we found the relationship is not linear on the logit scale. One could also use various forms of local regression (see Ruppert, Wand, and Carroll 2003).
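The effect of volume indicators can be sketched with the same stylized shrinkage as before (hypothetical numbers; the weight formula is a made-up illustration, not the CMS estimator): with an indicator for the low-volume stratum in the model, each hospital is shrunken toward the mean of its own volume stratum rather than the grand mean, so the volume–outcome gradient survives the shrinkage.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical strata: 50 small hospitals (30 cases, true mortality 20%)
# and 50 large hospitals (500 cases, true mortality 12%).
cases = np.array([30] * 50 + [500] * 50)
truth = np.array([0.20] * 50 + [0.12] * 50)
deaths = rng.binomial(cases, truth)
obs = deaths / cases

# The low-volume indicator plays the role of the volume covariate:
# the shrinkage target becomes each stratum's own mortality rate.
small = cases == 30
target = np.where(small,
                  deaths[small].sum() / cases[small].sum(),
                  deaths[~small].sum() / cases[~small].sum())

k = 100.0  # made-up prior strength, as in the earlier sketch
w = cases / (cases + k)
shrunk = w * obs + (1 - w) * target

gap = shrunk[small].mean() - shrunk[~small].mean()
print(f"shrunken small-hospital mean: {shrunk[small].mean():.3f}")
print(f"shrunken large-hospital mean: {shrunk[~small].mean():.3f}")
print(f"volume-outcome gap after shrinkage: {gap:.3f}")
```

Here individual small hospitals are still stabilized toward their peer group, but the reported gap between small and large hospitals remains close to the true gap, rather than being shrunk away.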
The Hospital Compare website is intended to guide patients to better hospitals. For that task, it is reasonable, perhaps imperative, to use patterns such as the volume–outcome relationship, which are only visible when data from many hospitals are combined. That task is concerned with providing good advice to most patients. For an individual small hospital, no model can determine with precision what the mortality rate may be. However, we can say with great certainty that the typical small hospital performs worse than the typical large hospital. The Hospital Compare website would do better to provide the public with this information, rather than suggest that any individual small hospital is no different from the mean of all hospitals. To assert that each individual small hospital is average because we lack sufficient evidence to reject the null hypothesis that this individual hospital is average is to make the familiar though serious error of asserting the null hypothesis is true because one lacks sufficient evidence to show that it is false. Given small numbers, Hospital Compare can say that there is too little data available to suggest that the individual small hospital is different from its peer group of other small hospitals, while stressing that Hospital Compare knows for certain only that small hospitals have higher death rates as a group than large hospitals.
Guiding patients to capable hospitals is one task, but evaluating the performance of a particular hospital administrator at a particular small hospital is another very different task. In one single small hospital, the observed mortality rate is, again, too unstable to be used to evaluate the unique performance of that hospital, and a random effects model will of necessity lump a small hospital with other small hospitals having similar attributes (if the random effects model includes volume). Therefore, a random effects model of whatever kind could not recognize a unique superb hospital administrator who had overcome all the problems presented by low volume to produce care of superior quality. For this reason, while a random effects model may provide patients with guidance when selecting among potential hospitals, a random effects model of whatever kind should not be used to assign grades to the performance of individual providers when their volume is low, as there may be excellent small hospitals with some characteristics that typically predict poor outcomes.
When a reasonable model is used to shrink mortality rates, those rates may more accurately describe the population of hospitals as a whole, but there will always be uncertainty about whether the shrinkage has improved the estimate for an individual hospital with low volume. If the shrunken rates are used to guide policy for the population as a whole, then individual small hospitals have no basis for complaint; however, if the shrunken rates are misused to evaluate individual small hospitals based largely on the performance of many other small hospitals, then complaint is justified. Shrinkage may be useful in performing one task and useless or harmful in performing a very different task.
Care is needed in deciding which hospital attributes to use in a random effects model that shrinks predictions toward the model. One concern here is with gaming, that is, with manipulating one's attributes so as to be shrunken toward a better prediction, without actually improving the quality of care. Gaming is likely to occur if random effects models are misused to evaluate and reward or punish the performance of individual hospitals. Just as gaming through upcoding (Green and Wintfeld 1995) and selection (Werner and Asch 2005) can occur with patient characteristics, so too may there be gaming of hospital characteristics. Presumably, it would not be too difficult to uncover hospitals that manipulate volume through mergers or accounting practices. A more difficult problem may occur when a random effects model includes an indicator of a potentially effective technology at the hospital, because it may be possible to acquire the technology without putting it to effective use. In this case, the quality of care at that hospital would not improve, but its prediction would improve merely because other hospitals use the same technology effectively.
In summary, there is a considerable literature on the volume–outcome relationship, which consistently shows lower AMI mortality risk at higher volume hospitals, a relationship that exists within the data used by Medicare for the Hospital Compare model. However, the Hospital Compare random effects model uses low volume to “shrink” individual small hospitals, one by one, back to the overall national mean; hence, the model underestimates the degree of poor performance in low-volume hospitals. If shrinkage is used to guide patients toward hospitals with superior outcomes, it is important to respect and preserve patterns in national data, such as the volume–outcome relationship, and not allow the shrinkage to remove those patterns.