While many quality measures have been created, there is no consensus regarding which are most important. We sought to develop a simple, explicit strategy for prioritizing breast cancer quality measures based on their potential to highlight areas where quality improvement efforts could most impact a population.
Using performance data for 9,019 breast cancer patients treated at 10 National Comprehensive Cancer Network institutions, we assessed concordance relative to 30 reliable, valid breast cancer process-based treatment measures. We identified four attributes that indicated there was room for improvement and characterized the extent of burden imposed by failing to follow each measure: number of non-concordant patients, concordance across all institutions, highest concordance at any one institution, and magnitude of benefit associated with concordant care. For each measure, we used data from the concordance analyses to derive the first 3 attributes and surveyed expert breast cancer physicians to estimate the fourth. A simple algorithm incorporated these attributes and produced a final score for each measure; these scores were used to rank the measures.
We successfully prioritized quality measures using explicit, objective methods and actual performance data. The number of non-concordant patients had the greatest influence on the rankings. The highest-ranking measures recommended chemotherapy and hormone therapy for hormone-receptor positive tumors, and radiation therapy after breast-conserving surgery.
This simple, explicit approach is a significant departure from methods used previously, and effectively identifies breast cancer quality measures that have broad clinical relevance. Systematically prioritizing quality measures could increase the efficiency and efficacy of quality improvement efforts and substantially improve outcomes.
There is widespread consensus that the quality of health care in the United States is suboptimal.1-3 However, there is little consensus regarding how best to address this problem. Many different quality improvement strategies have been tested, with varying degrees of success.4-8 One common strategy has been to develop quality measures – estimates of the degree of adherence to practice standards – to serve as evidence-based tools for evaluating and improving quality of care. While quality measures have been proposed for nearly all aspects of medical care, few have been used widely, and even when measures have been implemented it is hard to assess the impact they have had on health care.
The process of creating quality measures usually involves a panel of experts who consider a number of factors – such as clinical evidence, feasibility of measurement, and potential impact on patients – and then come to consensus.9-13 However, in many cases consensus panels do not analyze clinical evidence or potential impact on patients in a consistent and explicit manner, and do not consider actual practice performance data when creating quality measures. As a result, the measures created by these panels are not always based on high quality evidence14,15, may not identify situations for which practice performance is clearly sub-optimal16,17, and frequently cannot highlight which processes of care should be targeted to effect improvements in quality14.
Measuring practice performance is an expensive and time-consuming process, and resources are limited. There are therefore strong incentives to identify which of the available and scientifically acceptable measures are most likely to impact quality of care and should be given the highest priority. If practice performance is already high or a guideline applies to very few patients, then measuring quality relative to that guideline may be an inefficient use of resources because it will never translate into a significant improvement in outcomes at the population level.
Just as there are a growing number of quality measures, there are also a growing number of ways to use quality measures to influence practice performance. For example, they have been used to inform providers and institutions about their performance18, direct payment incentives19-21, and publicly report performance grades22-24. However, these efforts have yielded only limited success and there is no agreement regarding which approach is best. Moreover, it is possible that quality measures are not one-size-fits-all tools, and that different quality measures are required for different purposes. Some have suggested that quality measures could instead be used to identify where to target quality improvement efforts.25 In the past, quality improvement programs have selected areas for intervention based on perceived importance, anecdotal evidence, financial impact, or intuition. Developing a systematic, explicit method of prioritizing a set of measures based on how well they identify high-priority targets for quality improvement could offer significant advantages, but would require detailed patterns-of-care data and accurate knowledge regarding the impact of recommended treatments on patients' outcomes.
Cancer care is no exception to the quality problem. Investigators have identified many circumstances in which cancer patients do not receive treatments proven effective.15,17,26,27 Breast cancer has been a prime focus of quality assessment efforts because it is prevalent, effective treatments exist, and public interest is strong. Despite extensive efforts to measure the quality of breast cancer care13,17,28,29 and disseminate treatment recommendations via clinical practice guidelines30-32, significant disparities between recommended treatments and actual patterns of care persist33-36.
The National Comprehensive Cancer Network (NCCN), an alliance of twenty U.S. cancer centers, has developed a comprehensive set of evidence-based cancer guidelines12,37 and a prospective database on patterns-of-care at member institutions. Using these resources, it has generated an extensive set of evidence-based breast cancer quality measures.38 We wanted to build upon this work and develop an integrated approach to quality improvement. Our objectives were (1) to develop an explicit, transparent, and simple strategy for prioritizing quality measures based on their potential to highlight areas where quality improvement strategies could improve the outcomes of a population, and (2) to apply this methodology to a comprehensive set of breast-cancer quality measures to identify those that offer the greatest potential to improve disease-free survival (DFS) and quality of life (QOL).
The NCCN has produced and regularly updates a comprehensive set of evidence-based cancer treatment guidelines.12,37 The NCCN also supports a prospective database for women with breast cancer treated at participating member institutions that collects all the information needed to assess concordance relative to the guidelines, including detailed patient, treatment, and outcomes variables.39,40 The eligibility criteria and data collection procedures for the database have been described previously.41,42 Some data elements, including socio-demographic features, are collected from a survey administered to patients when they first present. Other variables, including stage and treatments, are collected through a series of routine chart reviews by trained, dedicated abstractors. Rigorous quality assurance processes ensure the data are reliable.
To assess practice performance, we applied the methods originally developed by Weeks and colleagues.38 First, a quality measure is created for each definitive guideline recommendation. Then, information from the outcomes database is used to calculate concordance with each measure – the number of patients who receive a recommended treatment divided by the number eligible to receive that treatment. As the guidelines are updated to reflect new and emerging data, the measures are modified so they always conform with the most current version of the guidelines. Concordance is always assessed relative to the recommendations in place when a patient is treated. The reliability and validity of each measure are evaluated by the NCCN as part of its yearly internal concordance report.
We assessed concordance with 30 quality measures derived from the 2003 NCCN guidelines (Table 1) – the most current guidelines for which at least one year of follow-up data are available. Twelve recommendations apply to chemotherapy, eight to hormonal therapy, six to radiation therapy, and four to surgery. Twenty-one recommend for and nine recommend against a treatment. We assessed concordance among women with newly diagnosed stage 0-III breast cancer. Women 70 or older were excluded, because the NCCN guidelines report there are insufficient data to define chemotherapy recommendations for this cohort. Data from ten centers that volunteered to participate in the outcomes database are included in this analysis: City of Hope, Dana-Farber, Fox Chase, M.D. Anderson, Roswell Park, University of Michigan, Ohio State University, H. Lee Moffitt, University of Nebraska, and Stanford University. Institutional Review Boards (IRBs) from each center approved the data collection, transmission and storage protocols.
Organizations that develop quality measures, such as the Institute of Medicine, recommend considering factors such as the extent of burden imposed by a condition, the extent of the gap between current practice and evidence-based practice, the likelihood the gap can be closed, and the relevance of an area to a broad range of individuals,43 when creating measures. Our goal was to develop a systematic and explicit method of incorporating these factors into the historically subjective, consensus-based quality measure development process. We endeavored to prioritize a set of feasible and scientifically acceptable quality measures using factors similar to those enumerated by the Institute of Medicine.
We identified four key attributes that could be used to help prioritize quality measures. The overall concordance, defined as the number of patients who receive a recommended therapy divided by the number eligible to receive that therapy, serves as a measure of performance across all institutions and helps characterize the extent of the gap between current and evidence-based practice. We considered using mean institutional concordance instead, to reduce the influence of large centers, but wanted the overall assessment to reflect performance at the population rather than the institution level. The number of patients who did not receive concordant care, calculated as the number of eligible patients minus the number who receive the recommended treatment, shows how relevant a measure is for a broad range of individuals. The highest concordance achieved by any one institution represents a realistic benchmark that all institutions should strive to achieve, and helps identify an achievable goal. Concordance values from institutions enrolling fewer than 10 patients on a recommendation were considered unreliable, and were excluded when determining the highest concordance for that recommendation.
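The three data-derived attributes can be illustrated with a minimal sketch. It assumes that per-institution counts of eligible and concordant patients are available for a measure; the names `InstitutionCounts`, `measure_attributes`, and `MIN_ELIGIBLE` are hypothetical and are not part of the NCCN database or of the original analysis, which was performed in SAS.

```python
from dataclasses import dataclass

# Institutions with fewer eligible patients than this are excluded
# when determining the highest (benchmark) concordance.
MIN_ELIGIBLE = 10


@dataclass
class InstitutionCounts:
    eligible: int    # patients eligible for the recommended treatment
    concordant: int  # eligible patients who received it


def measure_attributes(counts):
    """Return (overall concordance, non-concordant patients, highest concordance).

    Concordance values are fractions between 0 and 1. The highest concordance
    is None when no institution has at least MIN_ELIGIBLE eligible patients.
    """
    eligible = sum(c.eligible for c in counts)
    concordant = sum(c.concordant for c in counts)
    if eligible == 0:
        raise ValueError("no eligible patients for this measure")

    # Pooled across institutions, so the value reflects the population
    # rather than the mean of institutional values.
    overall = concordant / eligible
    non_concordant = eligible - concordant

    reliable = [c.concordant / c.eligible
                for c in counts if c.eligible >= MIN_ELIGIBLE]
    highest = max(reliable) if reliable else None
    return overall, non_concordant, highest


# Hypothetical counts for one measure at three institutions.
example_counts = [
    InstitutionCounts(eligible=100, concordant=80),
    InstitutionCounts(eligible=50, concordant=45),
    InstitutionCounts(eligible=5, concordant=5),  # too few patients for the benchmark
]
overall, non_concordant, highest = measure_attributes(example_counts)
```

In this illustration the third institution contributes to the overall concordance and to the non-concordant count, but its 100% concordance is ignored when setting the benchmark because it enrolled fewer than 10 eligible patients.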
The impact of a quality improvement program on a population depends not only on the number of patients who could benefit, but also on the magnitude of benefit experienced by each patient. To fully consider the extent of burden imposed by a condition when prioritizing quality measures, a fourth feature was identified – how much better would a population's outcomes be if patients who had received non-concordant care instead received concordant care. To estimate the impact specific treatments have on outcomes, we surveyed a panel of physicians who have expertise in breast cancer and familiarity with the NCCN guidelines. The panel included 22 medical, radiation and surgical oncologists from 19 U.S. cancer centers who participated in the NCCN breast cancer guidelines panel. The survey was developed using an iterative approach, with serial rounds of question development followed by feedback from test subjects. The Dana-Farber IRB reviewed the survey and its associated methodologies.
The goal of the survey was to have respondents consider both DFS and QOL, and generate a single estimate of the benefit experienced by patients who received the recommended treatment instead of a common non-concordant treatment. First, participants were asked to estimate the improvement in DFS as the percent absolute benefit at five years, and the improvement in QOL as “greatly favors the recommended treatment,” “slightly favors the recommended treatment,” “no difference,” “slightly favors the non-concordant treatment,” or “greatly favors the non-concordant treatment.” Then, participants were asked to consider both factors and report one estimate of the magnitude of benefit using a seven point scale: none (0), minimal (1), small (2), moderate (3), large (4), very large (5), or substantial (6). The mean magnitude-of-benefit for each recommendation was divided by six (the maximum value) to produce a fractional magnitude-of-benefit estimate.
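The conversion from survey ratings to a fractional magnitude-of-benefit estimate amounts to a mean divided by the scale maximum. A minimal sketch follows; the function name `fractional_benefit` and the example ratings are hypothetical, and the range check is an added safeguard rather than a step described in the text.

```python
from statistics import mean

MAX_RATING = 6  # "substantial", the top of the seven-point scale (0 to 6)


def fractional_benefit(ratings):
    """Mean respondent rating for one recommendation, scaled to the range 0 to 1."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 0 or r > MAX_RATING for r in ratings):
        raise ValueError("ratings must lie between 0 and 6")
    return mean(ratings) / MAX_RATING


# Hypothetical ratings from three respondents: moderate, large, very large.
example_fraction = fractional_benefit([3, 4, 5])
```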
The four attributes described above were incorporated into a single algorithm (Figure 1). The highest concordance minus the overall concordance was divided by 100% minus the overall concordance, and this result was multiplied by the number of non-concordant encounters. In essence, the algorithm computed the number of patients whose care would have to be converted from non-concordant to concordant to raise the overall concordance to the benchmark value. This number was then multiplied by the fractional magnitude-of-benefit estimate to derive a final score.
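The scoring step described above can be sketched as follows, with concordance values expressed as fractions rather than percentages. The function names and the example measures are hypothetical. Returning zero when overall concordance is already 100%, and clamping a negative gap to zero, are assumptions added here to keep the sketch well defined; the text does not describe how such cases were handled.

```python
def final_score(overall, highest, non_concordant, benefit_fraction):
    """Score one measure from its four attributes.

    overall, highest: concordance as fractions between 0 and 1
    non_concordant:   eligible patients who did not receive concordant care
    benefit_fraction: mean magnitude-of-benefit rating divided by six
    """
    if not (0.0 <= overall <= 1.0 and 0.0 <= highest <= 1.0):
        raise ValueError("concordance values must be fractions between 0 and 1")
    if overall >= 1.0:
        return 0.0  # assumption: no gap to close, so no potential gain

    # Share of the non-concordant patients whose care must change for the
    # overall concordance to reach the benchmark (clamped at zero: assumption).
    gap_fraction = max(0.0, highest - overall) / (1.0 - overall)
    patients_to_convert = gap_fraction * non_concordant
    return patients_to_convert * benefit_fraction


def rank_measures(measures):
    """Return (name, score) pairs sorted from highest to lowest score."""
    scores = {name: final_score(*attrs) for name, attrs in measures.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical measures: (overall, highest, non_concordant, benefit_fraction).
example_measures = {
    "measure A": (0.80, 0.95, 200, 0.5),
    "measure B": (0.90, 0.92, 500, 0.8),
    "measure C": (0.60, 0.70, 40, 0.3),
}
ranking = rank_measures(example_measures)
```

For measure A, raising concordance from 80% to the 95% benchmark closes three quarters of the gap, which corresponds to converting 150 of the 200 non-concordant patients; weighting by a fractional benefit of 0.5 gives a score of about 75. Measure B ranks first in this illustration despite a small gap, because it has many non-concordant patients and a large estimated benefit.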
To explore whether the algorithm gives too much or too little weight to one contributing factor, sensitivity analyses were performed to assess the effect of adjusting the weights of the four variables on the rankings. Adjustments included squaring each variable and performing a natural logarithmic transformation of the skewed number-of-discordant-patients variable. Spearman correlation coefficients were used to assess the relationship between the magnitude of benefit estimates and the survival and quality-of-life estimates, and to explore the association between the final scores and the four contributing variables. Trend lines were derived using simple linear regression. Statistical analyses were performed using SAS software version 9.1 (Cary, NC). A two-sided P value < 0.05 was considered significant.
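As an illustration of this kind of sensitivity check, the sketch below compares rankings from a baseline score with rankings obtained after log-transforming the number of non-concordant patients, using a Spearman rank correlation. It is written in plain Python for readability and is not the SAS code used in the study; the helper names, the example data, and the way the transformation enters the score are all assumptions.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


def score(gap_fraction, non_concordant, benefit, transform=lambda n: n):
    """Final score with an optional transformation of the patient count."""
    return gap_fraction * transform(non_concordant) * benefit


# Hypothetical measures: (gap fraction, non-concordant patients, fractional benefit).
measures = [(0.75, 200, 0.5), (0.40, 900, 0.3), (0.90, 30, 0.8), (0.20, 60, 0.6)]

baseline = [score(*m) for m in measures]
log_weighted = [score(*m, transform=math.log) for m in measures]
rho = spearman(baseline, log_weighted)
```

A correlation near 1 would indicate that the transformation leaves the rankings largely unchanged; lower values indicate that the ranking depends on how heavily the patient count is weighted.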
Concordance analyses were performed on 9,019 women with newly diagnosed breast cancer. A majority (56%) were 50-69 years old; the rest were under 50. At diagnosis, 14.3% had DCIS, 38.4% had stage I, 40.3% had stage II, and 7.1% had stage III breast cancer. The Eastern Cooperative Oncology Group performance status was ≥ 1 in 12.9%; the Charlson comorbidity score44,45 was ≥ 1 in 18.4%. Most (83.1%) were Caucasian non-Hispanic; 7.2% were African-American and 9.8% were of another race-ethnic background.
Magnitude-of-benefit estimates came from the survey of expert breast cancer clinicians. Thirteen physicians from twelve institutions completed the survey (response rate 59%). To assess the validity of the magnitude-of-benefit estimates as a measure of DFS and QOL, we compared magnitude-of-benefit estimates with DFS and QOL estimates separately for recommendations stating that treatments should and should not be administered (Figure 2). As expected, there were significant positive relationships between magnitude-of-benefit and DFS estimates among recommendations for treatment (Spearman correlation = 0.80; p < 0.001), and between magnitude-of-benefit and QOL estimates among recommendations against treatment (Spearman correlation = 0.92; p < 0.001). Respondents identified little DFS benefit, but reported a range of magnitude of benefit estimates for recommendations stating treatments should not be administered.
Quality measures were ranked based on their final scores. The five highest-ranking measures, listed in Table 2, assess the care offered to 65% of the cohort. Together, they encompass 45% of all the episodes where a recommended treatment was not provided. Sensitivity analyses in which we adjusted the weights of the four variables, as described above, yielded rankings that correlated highly (Spearman correlation ≥ 0.8; P<0.01) with those generated by the simplest algorithm. Therefore, the simplest algorithm (Figure 1) was selected for further analyses.
The final scores and the four values used to generate these scores for each measure are presented in parallel in Figure 3. The number of non-concordant patients showed the greatest correlation with, and had the greatest influence on, the final scores (Spearman correlation = 0.90; P < 0.001). The magnitude-of-benefit estimates, highest concordance values, and overall concordance values demonstrated smaller, borderline significant correlations with the final scores (Spearman correlations = 0.75 [P=0.06], 0.77 [P=0.06], and 0.72 [P=0.07], respectively).
We describe a systematic, explicit method of ranking quality measures using regularly updated clinical practice guidelines and prospectively collected performance data. Measures are ranked based on their potential to improve a prospectively defined outcome in a specified patient population, rather than on their ability to increase institutional concordance values. When applied to a comprehensive set of breast cancer process-of-care measures, the highest-ranking measures recommend (1) chemotherapy for node-negative, hormone-receptor positive tumors measuring 1.1-3 cm, (2) hormone therapy for node-positive, hormone-receptor positive tumors, (3) chemotherapy for node-positive, hormone-receptor positive tumors, (4) radiation therapy following breast-conserving surgery, and (5) hormone therapy for node-negative, hormone-receptor positive tumors measuring 1.1-3 cm.
Higher-ranking measures tend to have more eligible patients and demonstrate a larger difference between the highest and overall concordance values. Sometimes measures with many eligible patients (#6) or many non-concordant patients (#14 and 16) do not rank highly, because they offer relatively limited potential for improvement. Since the rankings depend largely on the relative number of eligible patients per measure and this factor should be reasonably consistent across systems of care, the quality measures that rank highly in this analysis could have broad relevance beyond the institutions that provided performance data. However, additional studies should assess the reproducibility of the data used to assess concordance and the validity of the measures considered high priority by our analysis before these measures are implemented widely by other health care systems.
Treatments with few eligible patients rarely rank highly, in part because their corresponding measures cannot have a large number of non-concordant patients. This reinforces the need to appropriately scale quality measures to the population and organization being assessed. Treatments that confer only modest improvements in outcomes for individual patients sometimes rank highly (#18). This occurs when many patients do not receive recommended treatments and benchmark concordance values are much greater than overall concordance values. The fact that such measures rank highly underscores the importance this approach places on improving the outcomes of a population rather than the outcomes of individuals.
Using the highest concordance achieved by an institution as the goal for all institutions is advantageous, because it defines a level of performance that is feasible and highlights circumstances where interventions may be more likely to work. However, this approach has its limitations. First, it fails to prioritize situations where care is universally non-concordant (i.e., all institutions perform below 100% and no institution demonstrates a significantly higher concordance). While this is a potential weakness of our approach, such situations do not necessarily represent areas where attention, and quality improvement resources, should be focused. There may be other explanations for consistently non-concordant care. Moreover, systematically identifying a realistic benchmark when all institutions exhibit the same level of care is difficult. Second, it inherently prioritizes situations for which there is substantial variability in performance from center to center. One could argue this often occurs when data are conflicting and experts disagree. However, deriving measures from consensus-based guidelines, as was done for this analysis, helps to minimize this risk.
Our approach to prioritizing quality measures relies on qualitative estimates of the benefits associated with treatments as determined by a survey of a relatively small group of expert breast cancer clinicians. We considered using the results of clinical trials to estimate these benefits, or to calculate the incremental quality-adjusted life years generated by treatments. However, the published data on breast cancer outcomes were too inconsistent to estimate these benefits reliably and consistently for each recommendation. Clinical trials rarely select the same outcomes (DFS, recurrence-free-survival, etc.), end-points (5 years, 10 years, etc.), or patient populations. Furthermore, the estimates provided by clinical trials often compare the outcomes associated with recommended treatments to the outcomes associated with experimental treatments, not the outcomes associated with common non-concordant treatments. Our priority was to use the same estimation method for each recommendation. The approach we chose is simple, practical, and reproducible. It is reassuring that we identified an association between magnitude-of-benefit estimates and DFS, and important to note that the final rankings are only modestly sensitive to the magnitude-of-benefit estimates.
Our goal was to prioritize quality measures based on their potential to improve DFS and QOL. Certainly, these are not the only outcomes that need to be considered. We realize our rankings would have been different if the goal had been different. For example, if we had prioritized measures based on their potential to improve overall survival, then some measures would have ranked lower (#6) and others would have ranked higher (#22). Moreover, treatment effectiveness is not the only important component of health care quality that needs to be addressed. The Institute of Medicine considers patient safety, patient centeredness and timeliness of care to be equally important aspects of health care quality.46 While some of the measures included in our analysis, such as the ‘over-use’ measures, address these other components of quality, these ‘over-use’ measures were often not prioritized highly by our methodology. If one believes all components of quality should receive balanced attention, then it may be necessary to develop unique measures for each component of quality and prioritize them separately. Doing so, however, would be challenging because there are relatively few reliable measures and it is hard to define clear, quantifiable goals for these other aspects of health care quality.
While we used quality measures to help identify where potentially ameliorable gaps in quality of care exist, there are other applications for quality measures (e.g., public reporting, grading providers and paying-for-performance). The measures identified as high priority in our analysis may not be ideally suited for these other applications. Unfortunately, quality measures are frequently not tailored to the different purposes for which they are used or the groups to which they are applied. To make quality measurement more efficient and effective, one may have to develop unique measures for these different applications.
It is important to recognize that our prioritization methodology requires a comprehensive set of quality measures and an ability to estimate the impact recommended treatments have on outcomes. Unfortunately, it is not always possible to define an extensive set of measures or estimate the impact of treatments. Our approach also requires a detailed patterns-of-care database – a resource that may not be available in many centers. If non-NCCN centers exhibit different patterns of care than NCCN centers, then all institutions will have to repeat the analysis to identify their own, unique high priority quality measures. However, the resources required to do this could be prohibitive. Finally, this methodology does not preclude the need to reevaluate practice performance as clinical evidence, practice patterns, and quality measures change. The recommendation for chemotherapy in hormone-receptor-positive, node-negative breast cancer was in line with the highest-level evidence when it was created, but emerging data now suggest chemotherapy may only benefit a subset of these patients. While the measure based on this recommendation (#18) ranked highly in this analysis, it might rank differently in the future, as evidence and practice patterns change.
A few organizations have described criteria for identifying where quality improvement efforts should focus their resources. In addition to those enumerated by the Institute of Medicine (discussed above)43, authors have recommended considering impact on health, meaningfulness to consumers, potential for quality improvement, and susceptibility to influence by the health care system.46 Some researchers have proposed selecting quality measures based on their clinical impact, reliability, feasibility, scientific acceptability, usefulness, and potential for improvement.10,47 Each set of criteria could be used to generate quality measures, and the last set has been used to identify several widely accepted measures. However, we are not aware of any previous efforts that use explicit criteria to prioritize a set of measures in a systematic way or that identify which measures are most likely to help achieve a particular outcome.
Several organizations have described quality measures for breast cancer.9,14,16,48-50 The National Quality Forum recommended four: needle biopsy before excision, radiation therapy following breast-conserving surgery for women under 70, combination chemotherapy within 60 days of surgery for hormone-receptor negative breast cancer > 1 cm, and axillary node dissection or sentinel node biopsy for stage I-IIb breast cancer.16 The RAND Corporation endorsed three: offer modified radical mastectomy or breast-conserving surgery, radiation therapy within 6 weeks of surgery or chemotherapy for women who have breast-conserving surgery, and adjuvant systemic therapy (combination chemotherapy and/or tamoxifen) for women over age 50 with positive nodes.14
These quality measures have limitations. Some are not supported by high-quality clinical evidence. Others do not clearly define a population of eligible patients or recommend a specific treatment. Several relate to aspects of care for which it is hard to identify a measurable process that a quality improvement program could target. Most importantly, all were selected as consensus measures by expert panels, without considering actual patterns-of-care data or impact on outcomes. While they overlap somewhat with the recommendations prioritized by our analysis, we identified several unique measures (e.g., #18 and 19). Moreover, some of the measures selected by other organizations and supported by high-quality evidence did not rank near the top of our list (e.g., #22 and 23), because few patients were eligible for these recommendations and there was not much room for improvement. All of the measures included in our analysis were derived from evidence- and consensus-based clinical practice guidelines. Analyses performed by the NCCN pre hoc ensure the measures are feasible and reliable. Most importantly, the highest-ranking measures in our analysis identify clinical areas where practice performance is sub-optimal and a change in practice performance can substantially improve outcomes.
The systematic method of prioritizing quality measures that we describe represents a significant departure from previous efforts to identify priority areas for quality improvement. The methodology is simple and flexible, and could easily be applied to other practice settings, data sources, and diseases, or used to rank measures across different diseases. The breast cancer quality measures that ranked highly in our analysis represent key leverage points that may have broad relevance beyond the institutions that contributed performance data. In conjunction with the NCCN, the American Society of Clinical Oncology used the results of our analysis to help select their breast cancer quality measures.11 Widespread use of the methods described above could increase the efficiency and efficacy of quality improvement efforts and improve the outcomes of people who rely on our health care system.
This work was supported in part by grant P50 CA89393 from the National Cancer Institute to Dana-Farber Cancer Institute. Dr. Hassett received salary support from R25 CA092203. The sponsors had no direct influence on the design of the study, analysis of the data, interpretation of the results, or writing of the manuscript.
Michael J. Hassett, Department of Medical Oncology, Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115; P: 617-632-6631; F: 617-632-3161.
Melissa E. Hughes, Department of Medical Oncology, Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115; P: 617-632-2268; F: 617-632-3161.
Joyce C. Niland, Division of Information Sciences, City of Hope National Medical Center, 1500 E. Duarte Road, Duarte, CA 91010-3000; P: 626-359-8111; F: 626-301-8802; Email: jniland/at/coh.org.
Rebecca Ottesen, Division of Information Sciences, City of Hope National Medical Center, 1500 E. Duarte Road, Duarte, CA 91010-3000; P: 805-594-0441; F: 805-594-0442.
Stephen B. Edge, Department of Breast and Soft Tissue Surgery, Roswell Park Cancer Institute, Elm & Carlton Streets, Buffalo, NY 14263; P: 716-845-5789; F: 716-845-3434.
Michael A. Bookman, Fox Chase Cancer Center, 333 Cottman Avenue, W11/OPD Dept, Philadelphia, PA 19111; P: 215-728-2987.
Robert W. Carlson, Stanford Hospital & Clinics, Stanford Cancer Center, 875 Blake Wilbur Drive; Room #2236, Stanford, CA 94305-5826; P: 650-725-6457; F: 650-498-4696.
Richard L. Theriault, University of Texas M. D. Anderson Cancer Center, 1515 Holcombe Boulevard, Unit 424, Houston, TX 77030; P: 713-792-2817; F: 713-794-4385.
Jane C. Weeks, Department of Medical Oncology, Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115; P: 617-632-2509; F: 617-632-2270.