Our systematic review of health care efficiency measures identified a large number of existing measures, the majority appearing in the published literature. The measures in the published literature were typically developed by academic researchers, whereas the measures in the gray literature were developed by vendors and are proprietary. There is almost no overlap between the measures found in the published literature and those in the gray literature, suggesting that the driving forces behind research and practice result in very different choices of measure. Many of the measures in the published literature rely on methods such as stochastic frontier analysis (SFA) and data envelopment analysis (DEA). The complexity of these methods may have inhibited the use of these measures beyond research, because measurement results can be sensitive to a multitude of specification choices and can be difficult to interpret. The vendor-developed measures, although relatively few in number, are used much more widely by providers, purchasers, and payers than are the measures in the published literature. We observed very little convergence around a “consensus” set of efficiency measures that have been used repeatedly. However, we did observe some convergence around several general approaches—SFA and DEA are most commonly used in research, while episode groupers such as Episode Treatment Groups (ETGs) and Medical Episode Groups (MEGs) appear to have the greatest market share in “real-world” applications. In this way, efficiency measurement differs from quality measurement, where in recent years a fair degree of consensus has formed around core sets of measures.
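To illustrate why DEA-based measures can be sensitive to specification, the following sketch computes the standard input-oriented CCR efficiency score by solving its linear program with SciPy. This is a minimal didactic example, not any of the reviewed measures: the data, function name, and choice of inputs and outputs are hypothetical, and changing which inputs or outputs are included will change the resulting scores.

```python
import numpy as np
from scipy.optimize import linprog

def dea_efficiency(X, Y, k):
    """Input-oriented CCR DEA efficiency score for decision-making unit k.

    X: (n, m) array of inputs for n units; Y: (n, s) array of outputs.
    Maximizes the weighted-output sum u'y_k subject to v'x_k = 1 and
    u'y_j - v'x_j <= 0 for every unit j, with nonnegative weights u, v.
    """
    n, m = X.shape
    s = Y.shape[1]
    # Decision variables are stacked as [u_1..u_s, v_1..v_m].
    c = np.concatenate([-Y[k], np.zeros(m)])            # linprog minimizes, so negate
    A_eq = np.concatenate([np.zeros(s), X[k]]).reshape(1, -1)  # normalize v'x_k = 1
    b_eq = [1.0]
    A_ub = np.hstack([Y, -X])                           # u'y_j - v'x_j <= 0 for all j
    b_ub = np.zeros(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (s + m))
    return -res.fun                                     # efficiency in (0, 1]

# Hypothetical data: two providers, one input (cost) and one output (episodes).
X = np.array([[2.0], [4.0]])
Y = np.array([[2.0], [2.0]])
print(dea_efficiency(X, Y, 0), dea_efficiency(X, Y, 1))
```

Even in this toy case, the score for each unit depends entirely on which inputs and outputs the analyst chooses to include, which is the specification sensitivity noted above.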
The state of the art in health care efficiency measurement contrasts sharply with the measurement of health care quality in several other ways. Unlike the evolution of most quality measures, the efficiency measures in use are not typically derived from practice standards drawn from the research literature, professional medical associations, or expert panels (Schuster, McGlynn, and Brook 1998). Unlike most quality measures, efficiency measures in both the published and gray literature have been subjected to few rigorous evaluations of their performance characteristics, including reliability, validity, and sensitivity to the methods used. Existing evaluation criteria could be applied to efficiency measures (e.g., the National Quality Forum criteria for quality measures). Measurement scientists would prefer that steps be taken to improve these metrics in the laboratory before implementing them in operational settings; purchasers and health plans, however, are willing to use measures without such testing in the belief that the measures will improve with use.
A central issue in efficiency measurement is the potential for differences in the quality of the outputs used. Only one of the measures reviewed explicitly incorporated quality of care into the efficiency measure, although several included quality as an explanatory factor in analyses of variation in efficiency or in a separate comparison. Therefore, almost all of the purported efficiency measures reviewed would be classified as “cost of care” measures under the AQA definition, not true “efficiency measures.” As discussed above, methods for accounting for quality are not well developed at this time, and preliminary testing shows that results are sensitive to the approach used. These measures could be used under the assumption that variations in quality of care across groups are modest. This assumption may be reasonable for certain comparisons; for example, several HEDIS measures of cardiac care now show high mean performance with minimal variation. Similarly, the most recent New York State Cardiac Surgery Reporting System results show no statistically significant differences in risk-adjusted mortality among 38 of 39 hospitals. For other types of comparisons, it is likely that quality does vary substantially. Evidence is lacking on the variation in quality for important types of measures, such as the widely used cost-per-episode measures. If systematic differences in quality do exist, measures that do not account for them would reflect the cost of care only, not efficiency. This could provide an incentive for physicians to selectively treat lower-risk patients, potentially increasing disparities in care (Newhouse 1994).
There are several additional unresolved issues in the specification of efficiency measures. These issues include risk adjustment of episode-based measures; attribution of episodes, patients, or other outputs to specific providers when care is dispersed over multiple providers; levels of reliability for detecting differences among entities; and differences between proprietary grouper methodologies (Milstein and Lee 2007). One concern of providers who are the subject of efficiency measurement is that proprietary measures are not sufficiently transparent to allow for definitive answers to these and related measurement questions (American Medical Association 2007). These concerns, along with the quality measurement issue, have been the basis of legal action against users of efficiency measures (Massachusetts Medical Society 2008).
Our findings are subject to several important limitations. We excluded studies based on non-U.S. data sources, primarily because we judged that studies using U.S. data would be most relevant to the task we were contracted to perform. It is possible, however, that including the non-U.S. literature would have identified additional measures of potential interest. Other exclusion criteria we used may also have omitted some relevant studies. For example, our review did not find any measures at the health system level, but several authors have examined “waste” in the U.S. health care system and other issues related to efficiency, estimating that approximately 30 percent of spending is wasteful (Reid et al. 2005).
An important limitation common to systematic reviews is the quality of the original studies. Substantial work has been done to develop criteria for judging the design and execution of studies of the effectiveness of health care interventions, and these criteria are routinely used in systematic reviews of interventions. However, we are unaware of any agreed-upon criteria for assessing the design or execution of a study of a health care efficiency measure. We did evaluate whether studies assessed the scientific soundness of their measures and found that such assessment was mostly lacking.
Notwithstanding these limitations, our systematic review suggests that the state of efficiency measurement lags far behind that of quality measurement in health care. Stakeholders are under pressure to make use of existing efficiency measures and “learn on the job.” However, use of these measures without a better understanding of their measurement properties is likely to engender resistance from the providers being measured and could lead to unintended consequences. Measures that do not account for differences in the quality of the outputs being compared could lead purchasers to favor providers who achieve favorable cost profiles by withholding necessary care. Measures that are not reliable enough to detect true differences could lead patients to change physicians, disrupting continuity of care without achieving better value for health care spending. Going forward, it will be essential to balance the desire for tools that facilitate improving the efficiency of care delivery against the risk of producing inaccurate or misleading information from tools that are not ready for prime time.