We describe a systematic, explicit method of ranking quality measures using regularly updated clinical practice guidelines and prospectively collected performance data. Measures are ranked based on their potential to improve a prospectively defined outcome in a specified patient population, rather than on their ability to increase institutional concordance values. When applied to a comprehensive set of breast-cancer process-of-care measures, the highest-ranking measures recommend (1) chemotherapy for node-negative, hormone-receptor-positive tumors measuring 1.1-3 cm, (2) hormone therapy for node-positive, hormone-receptor-positive tumors, (3) chemotherapy for node-positive, hormone-receptor-positive tumors, (4) radiation therapy following breast-conserving surgery, and (5) hormone therapy for node-negative, hormone-receptor-positive tumors measuring 1.1-3 cm.
Higher-ranking measures tend to have more eligible patients and demonstrate a larger difference between the highest and overall concordance values. Sometimes measures with many eligible patients (#6) or many non-concordant patients (#14 and 16) do not rank highly, because they offer relatively limited potential for improvement. Since the rankings depend largely on the relative number of eligible patients per measure and this factor should be reasonably consistent across systems of care, the quality measures that rank highly in this analysis could have broad relevance beyond the institutions that provided performance data. However, additional studies should assess the reproducibility of the data used to assess concordance and the validity of the measures considered high priority by our analysis before these measures are implemented widely by other health care systems.
Treatments with few eligible patients rarely rank highly, in part because their corresponding measures cannot have a large number of non-concordant patients. This reinforces the need to appropriately scale quality measures to the population and organization being assessed. Treatments that confer only modest improvements in outcomes for individual patients sometimes rank highly (#18). This occurs when many patients do not receive recommended treatments and benchmark concordance values are much greater than overall concordance values. The fact that such measures rank highly underscores the importance this approach places on improving the outcomes of a population rather than the outcomes of individuals.
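The interplay among eligible patients, concordance gaps, and per-patient benefit described above can be illustrated with a minimal sketch. It assumes a simple multiplicative score (eligible patients × benchmark-minus-overall concordance × a benefit weight); this form, the names `Measure`, `potential_gain`, and `rank_measures`, and all numbers are hypothetical illustrations, not the scoring formula or data used in our analysis.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    label: str                    # hypothetical identifier, not a measure number from the study
    eligible: int                 # number of patients eligible for the recommendation
    overall_concordance: float    # fraction concordant across all institutions
    benchmark_concordance: float  # highest concordance achieved by any institution
    benefit: float                # assumed relative magnitude-of-benefit weight


def potential_gain(m: Measure) -> float:
    """Assumed score: patients who could become concordant, weighted by benefit."""
    gap = max(m.benchmark_concordance - m.overall_concordance, 0.0)
    return m.eligible * gap * m.benefit


def rank_measures(measures: list[Measure]) -> list[Measure]:
    """Order measures from largest to smallest potential gain."""
    return sorted(measures, key=potential_gain, reverse=True)


# Hypothetical inputs chosen only to mirror the three patterns discussed in the text.
measures = [
    Measure("A", eligible=1000, overall_concordance=0.95,
            benchmark_concordance=0.97, benefit=1.0),  # many eligible, small gap
    Measure("B", eligible=800, overall_concordance=0.60,
            benchmark_concordance=0.90, benefit=0.5),  # modest benefit, large gap
    Measure("C", eligible=40, overall_concordance=0.50,
            benchmark_concordance=0.90, benefit=1.0),  # few eligible, large gap
]

for m in rank_measures(measures):
    print(m.label, round(potential_gain(m), 1))
```

Under these assumed inputs, the measure with a modest per-patient benefit but a large concordance gap (B) ranks first, the measure with many eligible patients but little room for improvement (A) ranks second, and the measure with few eligible patients (C) ranks last, which matches the qualitative behavior described above.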
Using the highest concordance achieved by an institution as the goal for all institutions is advantageous, because it defines a level of performance that is feasible and highlights circumstances where interventions may be more likely to work. However, this approach has its limitations. First, it fails to prioritize situations where care is universally non-concordant (i.e., all institutions perform below 100% and no institution demonstrates a significantly higher concordance). While this is a potential weakness of our approach, such situations do not necessarily represent areas where attention, and quality improvement resources, should be focused. There may be other explanations for consistently non-concordant care. Moreover, systematically identifying a realistic benchmark when all institutions exhibit the same level of care is difficult. Second, it inherently prioritizes situations for which there is substantial variability in performance from center to center. One could argue this often occurs when data are conflicting and experts disagree. However, deriving measures from consensus-based guidelines, as was done for this analysis, helps to minimize this risk.
Our approach to prioritizing quality measures relies on qualitative estimates of the benefits associated with treatments as determined by a survey of a relatively small group of expert breast cancer clinicians. We considered using the results of clinical trials to estimate these benefits, or to calculate the incremental quality-adjusted life years generated by treatments. However, the published data on breast cancer outcomes were too inconsistent to estimate these benefits reliably and consistently for each recommendation. Clinical trials rarely select the same outcomes (DFS, recurrence-free survival, etc.), end-points (5 years, 10 years, etc.), or patient populations. Furthermore, the estimates provided by clinical trials often compare the outcomes associated with recommended treatments to the outcomes associated with experimental treatments, not the outcomes associated with common non-concordant treatments. Our priority was to use the same estimation method for each recommendation. The approach we chose is simple, practical, and reproducible. It is reassuring that we identified an association between magnitude-of-benefit estimates and DFS, and important to note that the final rankings are only modestly sensitive to the magnitude-of-benefit estimates.
Our goal was to prioritize quality measures based on their potential to improve DFS and QOL. Certainly, these are not the only outcomes that need to be considered. We realize our rankings would have been different if the goal had been different. For example, if we had prioritized measures based on their potential to improve overall survival, then some measures would have ranked lower (#6) and others would have ranked higher (#22). Moreover, treatment effectiveness is not the only important component of health care quality that needs to be addressed. The Institute of Medicine considers patient safety, patient centeredness, and timeliness of care to be equally important aspects of health care quality.46 While some of the measures included in our analysis, such as the ‘over-use’ measures, address these other components of quality, these ‘over-use’ measures were often not prioritized highly by our methodology. If one believes all components of quality should receive balanced attention, then it may be necessary to develop unique measures for each component of quality and prioritize them separately. Doing so, however, would be challenging because there are relatively few reliable measures and it is hard to define clear, quantifiable goals for these other aspects of health care quality.
While we used quality measures to help identify where potentially ameliorable gaps in quality of care exist, there are other applications for quality measures (e.g., public reporting, grading providers and paying-for-performance). The measures identified as high priority in our analysis may not be ideally suited for these other applications. Unfortunately, quality measures are frequently not tailored to the different purposes for which they are used or the groups to which they are applied. To make quality measurement more efficient and effective, one may have to develop unique measures for these different applications.
It is important to recognize that our prioritization methodology requires a comprehensive set of quality measures and an ability to estimate the impact recommended treatments have on outcomes. Unfortunately, it is not always possible to define an extensive set of measures or estimate the impact of treatments. Our approach also requires a detailed patterns-of-care database – a resource that may not be available in many centers. If non-NCCN centers exhibit different patterns of care than NCCN centers, then all institutions will have to repeat the analysis to identify their own, unique high priority quality measures. However, the resources required to do this could be prohibitive. Finally, this methodology does not preclude the need to reevaluate practice performance as clinical evidence, practice patterns, and quality measures change. The recommendation for chemotherapy in hormone-receptor-positive, node-negative, breast cancer was in line with the highest-level evidence when it was created, but emerging data now suggest chemotherapy may only benefit a subset of these patients. While the measure based on this recommendation (#18) ranked highly in this analysis, it might rank differently in the future, as evidence and practice patterns change.
A few organizations have described criteria for identifying where quality improvement efforts should focus their resources. In addition to those enumerated by the Institute of Medicine (discussed above)43, authors have recommended considering impact on health, meaningfulness to consumers, potential for quality improvement, and susceptibility to influence by the health care system.46 Some researchers have proposed selecting quality measures based on their clinical impact, reliability, feasibility, scientific acceptability, usefulness, and potential for improvement.10,47 Each set of criteria could be used to generate quality measures, and the last set has been used to identify several widely accepted measures. However, we are not aware of any previous efforts that use explicit criteria to prioritize a set of measures in a systematic way or that identify which measures are most likely to help achieve a particular outcome.
Several organizations have described quality measures for breast cancer.9,14,16,48-50 The National Quality Forum recommended four: needle biopsy before excision, radiation therapy following breast-conserving surgery for women under 70, combination chemotherapy within 60 days of surgery for hormone-receptor-negative breast cancer > 1 cm, and axillary node dissection or sentinel node biopsy for stage I-IIb breast cancer.16 The RAND Corporation endorsed three: offer modified radical mastectomy or breast-conserving surgery, radiation therapy within 6 weeks of surgery or chemotherapy for women who have breast-conserving surgery, and adjuvant systemic therapy (combination chemotherapy and/or tamoxifen) for women over age 50 with positive nodes.14
These quality measures have limitations. Some are not supported by high-quality clinical evidence. Others do not clearly define a population of eligible patients or recommend a specific treatment. Several relate to aspects of care for which it is hard to identify a measurable process that a quality improvement program could target. Most importantly, all were selected as consensus measures by expert panels, without considering actual patterns-of-care data or impact on outcomes. While they overlap somewhat with the recommendations prioritized by our analysis, we identified several unique measures (e.g., #18 and 19). Moreover, some of the measures selected by other organizations and supported by high-quality evidence did not rank near the top of our list (e.g., #22 and 23), because few patients were eligible for these recommendations and there was not much room for improvement. All of the measures included in our analysis were derived from evidence- and consensus-based clinical practice guidelines. Analyses performed by the NCCN pre hoc ensure the measures are feasible and reliable. Most importantly, the highest-ranking measures in our analysis identify clinical areas where practice performance is sub-optimal and a change in practice performance can substantially improve outcomes.
The systematic method of prioritizing quality measures that we describe represents a significant departure from previous efforts to identify priority areas for quality improvement. The methodology is simple and flexible, and could easily be applied to other practice settings, data sources, and diseases, or used to rank measures across different diseases. The breast cancer quality measures that ranked highly in our analysis represent key leverage points that may have broad relevance beyond the institutions that contributed performance data. In conjunction with the NCCN, the American Society of Clinical Oncology used the results of our analysis to help select its breast cancer quality measures.11 Widespread use of the methods described above could increase the efficiency and efficacy of quality improvement efforts and improve the outcomes of people who rely on our health care system.