Mammography quality assurance programs have been in place for over a decade. We studied radiologists’ self-reported performance goals for accuracy in screening mammography and compared them to published recommendations.
A mailed survey of radiologists at mammography registries in seven states within the Breast Cancer Surveillance Consortium (BCSC) assessed radiologists’ performance goals for interpreting screening mammograms. Self-reported goals were compared to published American College of Radiology (ACR) recommended desirable ranges for recall rate and false positive rate, positive predictive value of biopsy recommendation (PPV2), and cancer detection rate. Radiologists’ goals for interpretive accuracy within desirable range were evaluated for associations with their demographic characteristics, clinical experience and receipt of audit reports.
The survey response rate was 71% (257 of 364 radiologists). The percentage of radiologists reporting goals within desirable ranges was 79% for recall rate, 22% for false positive rate, 39% for PPV2, and 61% for cancer detection rate. The range of reported goals was 0 to 100% for false-positive rate and PPV2. Primary academic affiliation, receiving more hours of breast imaging continuing medical education (CME), and receiving audit reports at least annually were associated with desirable PPV2 goals. Radiologists reporting desirable cancer detection rate goals were more likely to have interpreted mammograms for 10 or more years, and > 1,000 mammograms per year.
Many radiologists report goals for their accuracy when interpreting screening mammograms that fall outside of published desirable benchmarks, particularly for false positive rate and PPV2, indicating an opportunity for education.
Of all the specialties within radiology, breast imaging lends itself to the objective assessment of interpretive performance. As information technology infrastructure in medicine develops, more specialties may be added. Benchmarks for desirable interpretation in breast imaging have been published for the U.S. and Europe 1-3. Many countries now mandate that audit performance data be collected and reviewed so that administrators and radiologists know how well they are performing 4, 5. It is not clear, however, what impact the efforts to collect and review audit data are having on individual radiologists, or whether radiologists have goals for their own performance that align with published benchmarks.
Educational studies have shown that when clinicians understand that a gap exists between their performance and national targets, they can be predisposed to change their behavior 7, 8. Interpretive accuracy in mammography could be improved if radiologists are motivated by recognizing a gap between their individual performance and desirable benchmarks. One web-based continuing medical education (CME) intervention used individual radiologists’ own recall rate data and compared them to the rates of a large cohort of their peers 9. Radiologists with inappropriately high recall rates were able to develop specific plans to improve their recall rates once they recognized the need to improve. For improvements to occur, radiologists must recognize the difference between their own performance and desired targets, which is potentially feasible given the collection and review of Mammography Quality Standards Act (MQSA) audit data. However, it is not clear whether radiologists are aware of common desirable performance goal ranges.
In this study, we surveyed a large number of community-based radiologists all working in breast imaging and asked them to indicate their personal goals for interpretive performance. We sought to determine the proportion of radiologists’ personal performance goals within published benchmarks and which, if any, characteristics of the radiologists and their practices would be associated with having goals within desirable benchmarks.
All radiologists who interpreted screening mammograms in 2005-2006 in the National Cancer Institute-funded Breast Cancer Surveillance Consortium (BCSC) 10, 11 were invited to complete a self-administered mailed survey. This included seven sites representing distinct geographic regions of the U.S. (California, Colorado, North Carolina, New Mexico, New Hampshire, Vermont, and Washington). Most radiologists participating in the consortium are community-based, and routinely receive audit reports 12. All procedures were Health Insurance Portability and Accountability Act compliant, and all BCSC sites and the Statistical Coordinating Center received a Federal Certificate of Confidentiality and other protection for the identities of the physicians who are subjects of this research. Institutional Review Boards (IRB) of the study university and all seven BCSC sites approved the study.
The survey was developed by a multi-disciplinary team of experts in breast imaging, clinical medicine, health services research, biostatistics, epidemiology, behavioral sciences, and educational psychology, and was extensively pilot-tested. The content and development of the survey have been previously described in detail 6. Our primary outcomes included self-reported goals for various measures of interpretive performance. Radiologists were asked to report the goal, or value they would like to achieve, for recall rate, false positive rate, positive predictive value of biopsy recommendation (PPV2), and cancer detection rate per 1,000 screening mammograms. All performance measures were defined in the survey (Table 1). We also assessed the frequency with which radiologists received performance audits (none, once per year, more than once per year) 6.
Surveys (available online at http://breastscreening.cancer.gov/collaborations/favor_ii_mammography_practice_survey.pdf) were administered to radiologists between January 2006 and September 2007, depending on each BCSC site’s funding mechanism and IRB status. Incentives to complete the survey varied among the seven sites and included bookstore gift cards worth $25-$50 for radiologists (seven sites) and for mammography facility administrators and/or technologists (four sites), as well as the fourth edition of the BI-RADS manual 2 for participating facilities (four sites). Once each site obtained completed surveys with informed consent, the data were double-entered and discrepancies were corrected. Encrypted data were sent to the BCSC Statistical Coordinating Center for pooled analyses.
Recommendations from the Agency for Health Care Policy and Research (AHCPR) 1994 clinical practice guideline and the Breast Imaging Reporting and Data System (BI-RADS) manual published by the American College of Radiology (ACR) were used to develop a list of “desirable goals” for mammography performance outcomes 2, 13. Although these goals were not considered guidelines for current medical practice in the U.S., they represent the most recent consensus statement regarding target ranges for accuracy in mammography interpretation. We defined desirable performance goals based on the BI-RADS manual for recall rate, PPV2, and cancer detection rate 2. While false positive rate is not explicitly defined in the BI-RADS manual, it is easily calculated (1-specificity) and was included in this study. We modified the lower bound of recall rate and false positive rate to exclude 0-2%, because rates lower than 2% would not be considered desirable or realistic in the U.S. for mammography screening. The desirable performance ranges we used for analysis were recall rate 2-10%, false positive rate 2-10%, PPV2 25-40%, and cancer detection rate 2-10 per 1,000 screening exams (Table 1).
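As an illustration only, the classification of a reported goal against these desirable ranges can be sketched in a few lines of Python. The function and dictionary names are hypothetical and are not taken from the study's analysis code (the analyses were run in SAS); treating the range bounds as inclusive is also an assumption.

```python
# Desirable ranges as stated in the text. Inclusive bounds are an assumption.
DESIRABLE_RANGES = {
    "recall_rate": (2.0, 10.0),            # percent
    "false_positive_rate": (2.0, 10.0),    # percent
    "ppv2": (25.0, 40.0),                  # percent
    "cancer_detection_rate": (2.0, 10.0),  # per 1,000 screening exams
}


def classify_goal(measure, goal):
    """Label a self-reported goal relative to the desirable range.

    Returns 'no response' when the item was left blank (goal is None),
    otherwise 'below', 'within', or 'above'.
    """
    if goal is None:
        return "no response"
    low, high = DESIRABLE_RANGES[measure]
    if goal < low:
        return "below"
    if goal > high:
        return "above"
    return "within"


# Hypothetical responses from one radiologist.
example = {
    "recall_rate": 8.0,
    "false_positive_rate": 0.0,
    "ppv2": 90.0,
    "cancer_detection_rate": None,
}
for measure, goal in example.items():
    print(measure, classify_goal(measure, goal))
```

The four labels correspond to the categories reported in the results (below, within, above, and no response).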
We also compared radiologists’ self-reported goals to the published U.S. benchmark 25-75% range of performance for BCSC radiologists 14, and those of their highest performing peers, defined as the lowest 0-24 percentile for recall and false positive rates, and highest 76-100 percentiles for PPV2 and cancer detection rates. We used this cohort of radiologists for comparison because it is a very large, generalizable sample of community radiologists from seven U.S. geographic regions for whom well-documented performance measures are available. The BCSC performance ranges are based on 4,032,556 screening mammography examinations performed between 1996 and 2005 at 152 mammography facilities by 803 radiologists. The BCSC inter-quartile ranges were: recall rate 6.4-13.3%, false positive rate 7.5-14.0%, PPV2 18.8-32.0%, and cancer detection rate 3.2-5.8/1,000 screening exams (Table 1) 14, 15.
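The peer comparison differs from the desirable-range comparison in that the direction of "better" depends on the measure: lower is better for recall and false positive rates, higher is better for PPV2 and cancer detection rate. The sketch below, in Python, uses the inter-quartile bounds quoted above; the function name, the labels, and the handling of values exactly on a quartile boundary are assumptions made for illustration.

```python
# BCSC inter-quartile ranges quoted in the text: (25th percentile, 75th percentile).
BCSC_IQR = {
    "recall_rate": (6.4, 13.3),            # percent
    "false_positive_rate": (7.5, 14.0),    # percent
    "ppv2": (18.8, 32.0),                  # percent
    "cancer_detection_rate": (3.2, 5.8),   # per 1,000 screening exams
}

# For these measures a lower value indicates better performance.
LOWER_IS_BETTER = {"recall_rate", "false_positive_rate"}


def compare_to_peers(measure, goal):
    """Place a reported goal relative to the BCSC quartiles.

    Boundary values are counted as inside the inter-quartile range
    (an assumption; the text does not specify this).
    """
    q25, q75 = BCSC_IQR[measure]
    if q25 <= goal <= q75:
        return "interquartile"
    if measure in LOWER_IS_BETTER:
        is_better = goal < q25
    else:
        is_better = goal > q75
    return "top quartile" if is_better else "bottom quartile"


print(compare_to_peers("recall_rate", 5.0))
print(compare_to_peers("ppv2", 35.0))
print(compare_to_peers("cancer_detection_rate", 4.0))
```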
We calculated the proportions of radiologists who reported performance goals within the desirable range, goals above and below the range, and among those who did not respond. For some analyses, these categories were further collapsed into within desirable range versus outside of range. For such analyses, we assumed that “no response” indicated that a radiologist had performance goals outside of the desirable range. Because 13.6% (35/257) of radiologists did not respond to items on performance goals, we also examined a restricted cohort limited to radiologists who responded to at least one of these items (n=222). In addition, we determined the proportion of radiologists whose self-reported performance goals fell within the U.S. inter-quartile performance range of BCSC radiologists.
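The effect of the "no response" assumption on the denominator can be made concrete with a small Python sketch. The data are invented, and the restricted calculation here simply drops blanks for a single item, which is a simplification of the restricted cohort described above (radiologists who answered at least one goal item).

```python
def proportion_within(goals, low, high, count_nonresponse_as_outside=True):
    """Proportion of goals inside [low, high].

    goals: list of numbers, with None marking a blank survey item.
    When count_nonresponse_as_outside is True, blanks stay in the
    denominator and count as outside the range; otherwise they are dropped.
    """
    within = sum(1 for g in goals if g is not None and low <= g <= high)
    if count_nonresponse_as_outside:
        denominator = len(goals)
    else:
        denominator = sum(1 for g in goals if g is not None)
    return within / denominator if denominator else float("nan")


# Hypothetical recall-rate goals (percent) from five radiologists.
recall_goals = [5.0, 8.0, 12.0, None, 10.0]

full_cohort = proportion_within(recall_goals, 2.0, 10.0)
responders_only = proportion_within(recall_goals, 2.0, 10.0,
                                    count_nonresponse_as_outside=False)
print(full_cohort, responders_only)
```

Counting non-response as outside the range can only lower the reported proportion, so the full-cohort figures are the more conservative of the two.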
We then used chi-squared statistics to assess associations between having goals within desirable range or not and radiologist characteristics (demographics, practice type, breast imaging experience, mammography volume, and audit frequency). Finally, we repeated these analyses with the restricted cohort described above. All statistically significant associations are reported at the P<0.05 level, and p-values are two-sided. Data analyses were conducted by using SAS® software, Version 9.2 (SAS institute, Cary, NC).
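For readers who want to see the form of such a test, a Pearson chi-squared statistic for a 2x2 table (goal within range or not, by a binary radiologist characteristic) can be computed with the Python standard library alone. This is a sketch, not the SAS procedure used in the study: it applies no continuity correction, the counts are invented, and the function name is hypothetical. With one degree of freedom the two-sided p-value follows from the complementary error function.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: rows = academic / non-academic affiliation,
# columns = PPV2 goal within / outside the desirable range.
chi2, p = chi_squared_2x2(30, 20, 40, 110)
print(round(chi2, 3), p < 0.05)
```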
Of 364 eligible radiologists, 257 (71%) responded to the survey and 222 completed question(s) related to performance goals. Figure 1 demonstrates the distribution of radiologists’ stated performance goals; many false positive rate and PPV2 goals fall well above the desirable range. The percentage of radiologists reporting goals within the desirable range was 79% for recall rate, 22% for false positive rate, 39% for PPV2, and 61% for cancer detection rate (Figure 2A). Radiologists were more likely to report goals above the desirable range for false positive rate and PPV2 than for recall and cancer detection rates. A much smaller proportion of the reported goals fell within the BCSC interquartile range (25-75%): recall rate 48%, false positive rate 7%, PPV2 28%, and cancer detection rate 24% (Figure 2B). Those with goals consistent with the highest performing quartile of their peers included: recall rate 33%, false positive rate 24%, PPV2 39%, and cancer detection rate 38%.
Radiologists’ performance goals stratified by radiologist characteristics are shown in Table 2. Most radiologists were men (72%), not affiliated with an academic medical center (81%), and working more than 40 hours per week in breast imaging (60%). No radiologist characteristics were significantly associated with reporting goals within the desirable range for recall rate or false positive rate. PPV2 goals within the desirable range were associated with having a primary academic affiliation; completing 30 or more hours of breast imaging CME over the past three years (compared to ≥15 hours); and receiving more than one audit report per year. Radiologists between 45-54 years of age; interpreting mammograms for 10-19 years compared to <10 years; and with annual interpretive volume >1,000 mammograms/year were more likely to report desirable cancer detection rate goals (Table 2). When the analyses were repeated with the cohort limited to radiologists who answered at least one of the questions on performance goals (n=222), relationships between radiologists characteristics and outcome measures did not change (data not shown).
Only 21 of 257 (8%) radiologists received fellowship training in breast imaging, and although a higher proportion of this group reported performance goals within the desirable range, these findings did not achieve statistical significance.
When the analysis was limited to radiologists who answered at least one of the questions on performance goals (n=222), 90% fell within the desirable range for recall rate, 25% for false positive rate, 45% for PPV2, and 70% for cancer detection rate. In this subgroup, the proportions of radiologists reporting goals that fell within the BCSC interquartile range (25-75%) were: recall rate 56%, false positive rate 8%, PPV2 32%, and cancer detection rate 28% (data not shown).
Quality assurance programs for breast cancer screening services are intended to utilize audit data to improve clinical outcomes 3, 4, and helping clinicians understand the gap between their own performance and national targets has been demonstrated to predispose physicians to change 7, 8. In this study, many radiologists reported goals for their interpretive performance that were either above or below published desirable benchmarks. Self-reported goals for recall rate and cancer detection rate, two measures well understood by interpreting radiologists, were most closely aligned with published goal ranges. For false positive rate and PPV2, a majority of radiologists reported goals that fell outside of desirable ranges, with relatively even dispersion of reported goals between 0 and 100%. This indicates that many radiologists are not familiar with false positive rate and PPV2 or they have unrealistic goals for these measures.
For over a decade the Mammography Quality Standards Act (MQSA) has legislated that radiologists review mammography outcome data 4; however, a 2005 Institute of Medicine report on Improving Breast Imaging Quality Standards 16 noted that interpretation by radiologists remains quite variable. Attempts to identify predictors of accuracy in mammography interpretation have studied a wide array of potential characteristics, and found that fellowship training was the only trait associated with better interpretive performance 6. Another study suggested that a learning curve exists, peaking approximately five years after completion of residency 17. Thus education of motivated radiologists may have the potential to improve mammography interpretation, including shifting the learning curve forward in time.
Education plays a role in this study also. Radiologists with more hours of CME in breast imaging and primary affiliations with academic medical centers were more likely to report PPV2 goals within desirable range. Receiving audit feedback at least annually was also associated with PPV2 goals within range. One study of community U.S. radiologists with high recall rates demonstrated the practical application of the combination of education and audit feedback, by using web-based CME to compare their recall rates to their peers’ and motivate them to set appropriate recall rate goals 9. Education is used in the United Kingdom for radiologists participating in the Breast Screening Program who train biannually using test sets, and if indicated, receive additional training specific to their areas of identified weakness 18. Before specific education to improve radiologists’ performance can occur, radiologists must be motivated to improve by recognizing the gap between their own performance and desired ranges.
The medical audit is recognized as one of the best quality assurance tools because it can identify performance strengths and weaknesses. In contrast to many European countries 3, audit feedback to radiologists in the U.S. is variable in content and format 12. Recall and cancer detection rates were clearly identified on all audit reports received by radiologists in this study 12, and were the measures most often reported within desirable range. In contrast, false positive rate was not explicitly presented in any audit, and only some of the audits reported PPV2 by name 12. Thus, our findings suggest that clear, specific reporting of individual performance feedback juxtaposed with desirable ranges could help radiologists visualize their own performance gap, and theoretically motivate them to seek specific ways to improve.
In many European Union (E.U.) countries, radiologists typically specialize in breast imaging and interpret high volumes of mammograms, 5,000 exams/year or more 18. In the U.S., however, radiologists are often not breast specialists and annual volumes of exams required of radiologists are relatively low (480 screening mammograms per year). Studies of radiologists’ volume and accuracy, though not consistent, have generally found that radiologists interpreting higher volumes have lower false positive rates without increased cancer detection 19, 20. In this study, community practicing radiologists identifying desirable goals for cancer detection were more likely to have greater interpretive volume and more years interpreting.
While most radiologists reported goals within desirable range for recall rate (79%), few reported goals within desirable range for false positive rate (22%). False positive rate is not explicitly defined in ACR guidelines, and radiologists had the most difficulty identifying appropriate goals for this measure. Recall rate and false positive rate are numerically similar because the small number of cancers in a screening population (4.7 cancers per 1,000 screening exams) means that most mammograms that are recalled are false positive 14. The significance of false positive exams has been highlighted in the literature 21-23 and frequently discussed in the lay press 24, and the potential harms of overdiagnosis are increasingly being acknowledged 25. It is possible that clearly defining false positive rate in future guidelines and explicitly reporting this rate in audit data could improve radiologists’ awareness and understanding of this commonly used measure. In conceptual terms, it is important for radiologists to understand their false positive rates, because while working toward maximizing cancer detection, they should attempt to minimize the burden of false positive work-ups.
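The numerical similarity of recall rate and false positive rate can be checked with a short calculation, sketched here in Python. It treats the 4.7 per 1,000 figure as the proportion of screening exams with cancer, and it assumes a sensitivity of 80% and a false positive rate of 10%; both of these values are hypothetical and chosen only to illustrate the arithmetic.

```python
def recall_rate(false_positive_rate, sensitivity, cancer_prevalence):
    """Overall recall rate as the sum of true positive and false positive recalls.

    All arguments and the return value are proportions between 0 and 1.
    """
    true_positive_recalls = sensitivity * cancer_prevalence
    false_positive_recalls = false_positive_rate * (1.0 - cancer_prevalence)
    return true_positive_recalls + false_positive_recalls


prevalence = 4.7 / 1000   # figure quoted in the text, used here as prevalence
fpr = 0.10                # hypothetical false positive rate
sens = 0.80               # hypothetical sensitivity (assumption)

rr = recall_rate(fpr, sens, prevalence)
share_false_positive = fpr * (1.0 - prevalence) / rr

# Under these assumptions the recall rate is about 10.3%, within half a
# percentage point of the false positive rate, and over 95% of recalled
# exams are false positives.
print(round(rr, 4), round(share_false_positive, 3))
```

Because cancers are so rare among screened women, any plausible sensitivity changes the recall rate by well under one percentage point, which is why a goal that is reasonable for recall rate is also roughly reasonable for false positive rate.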
A unique strength of the present study is our comparison of study radiologists’ reported goals with the actual benchmark performance of a large cohort of community practicing radiologists from the Breast Cancer Surveillance Consortium (BCSC), demonstrating that radiologists’ reported goals did not emulate the accuracy of their highest performing peers. Other strengths include the participation of a large number of community radiologists, the survey response rate of 71%, which is higher than the rate for most physician surveys 26, and the availability of detailed information on the audits received by the radiologists.
One limitation of our study is that it did not address whether individual radiologists whose performance goals fall within desirable ranges are more likely to have better actual interpretive performance. Given the clinical relevance of this point, additional work is recommended to assess a potential link. It is also important to note that 35 radiologists did not respond to any of the survey items on performance goals. Of these 35 radiologists, all but one responded to two subsequent survey items about CME. Thus their non-response seems unrelated to survey fatigue.
In conclusion, many radiologists in our study reported goals for their own interpretation of screening mammograms that fall outside of published desirable benchmarks, particularly for false positive rate and PPV2. Knowledge of desirable performance ranges is a necessary step in interpreting audit data 14. Further work is warranted to evaluate whether explicitly defining and reporting target goals on individual performance audits results in improved understanding by radiologists of their own level of performance, and ultimately in improved clinical accuracy.
This work was supported by the National Cancer Institute and Agency for Healthcare Research and Quality (1R01 CA10762), the National Cancer Institute (1K05 CA104699; Breast Cancer Surveillance Consortium: U01CA63740, U01CA86076, U01CA86082, U01CA63736, U01CA70013, U01CA69976, U01CA63731, U01CA70040), the Breast Cancer Stamp Fund, and the American Cancer Society, made possible by a generous donation from the Longaberger Company’s Horizon of Hope Campaign (SIRGS-07-271-01, SIRGS-07-272-01, SIRGS-07-273-01, SIRGS-07-274-01, SIRGS-07-275-01, SIRGS-06-281-01). We thank the participating mammography facilities and radiologists for the data they have provided for this study. A list of the BCSC investigators and procedures for requesting BCSC data for research purposes are provided at: http://breastscreening.cancer.gov/. We also thank Raymond Harris, Ph.D., for his careful review and comments on the manuscript.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.