Radiologists reading mammograms are mandated to produce performance data (audit reports) for their MQSA-certified institution, and these measures are deliberately designed to serve as performance improvement metrics. However, in our study we noted that radiologists were relatively good at estimating their recall and cancer detection rates, but most were unable to accurately estimate their false-positive rate or PPV2. Radiologists tended to underestimate their recall rate and overestimate their false-positive, cancer detection, and PPV2 rates. Many radiologists perceive themselves as having better interpretive performance than they actually do. This is an important finding, because without an accurate understanding of their performance, it is unrealistic to expect radiologists to know whether improvement is needed and which areas are most in need of it. Performance feedback should therefore include both definitions of the performance measures and results displayed relative to national guidelines or peer performance, to help motivated physicians improve where needed.
While almost all radiologists (96%) reported receiving audit reports, receipt of these data did not appear to fully inform them of their own performance on the outcome measures in this study. We had hypothesized that receipt of audit reports, clinical experience, and fellowship training would all improve radiologists' accuracy at estimating their own interpretive performance, but we found minimal evidence of such a relationship. Only a higher volume of interpreted mammograms was associated with more accurate estimation of recall rate, and radiologists who more frequently used numbers or statistics when discussing mammography results with patients were more accurate in estimating their cancer detection rate. While audit report information was available to all of the study radiologists because they work at a BCSC facility, the audit data varied across sites, with most sites providing information at the radiologist level but others at the facility level only [11]. The type of information provided also varies across sites, with all reports providing recall rates and none reporting false-positive rates. An important next step would be to evaluate how audit reports are actually reviewed and considered by individual radiologists. Research on physician behavior change indicates that predisposing physicians to change requires showing them the gap between their own performance and national targets [12]. Prior work in this area suggests that the format of audit reports may make a difference in how physicians use data to improve their clinical practice [6].
Another important finding of our study was how few radiologists were able to accurately estimate their false-positive rate. Only 62% of radiologists even attempted to provide an estimate, and among those who did, only 28% provided an accurate one. The values provided suggest that some radiologists may have confused false-positive rate with specificity when completing the survey, even though definitions were provided. However, even if we assume that radiologists who provided very high estimates of their false-positive rate (i.e., >50%) were actually reporting their specificity, and recalculate the false-positive rate from those values as 100 minus specificity, still only 36% accurately estimated their false-positive rate. Further, in a recent study of this same population that evaluated whether radiologists could identify a reasonable goal for these performance measures, only 22% reported goals for false-positive rate within the range recommended by the American College of Radiology [14]. False-positive rates are much higher in the US than in Canada and European countries with screening programs [15], likely due in part to malpractice concerns. This nonetheless represents an opportunity for improvement, in order to minimize the negative consequences of over-treatment, anxiety, and cost for women [16]. It is also possible that radiologists interpreting screening mammograms do not typically conduct the diagnostic work-ups on these same patients and thus are not always aware of the outcome. It will be difficult to motivate radiologists to reduce their false-positive rates (while maintaining sensitivity) if they do not understand what their false-positive rate currently is, how it is calculated, or how it compares with their peers'.
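To make the arithmetic concrete, the sketch below computes the audit measures discussed here from basic screening outcomes, including the specificity correction described above. This is a minimal illustration based on standard audit definitions; the function names and example values are ours, not taken from the BCSC audit or the survey.

```python
def recall_rate(positive_exams, total_exams):
    """Recall rate: proportion of screening exams with a positive
    (recall) interpretation."""
    return positive_exams / total_exams

def cancer_detection_rate(screen_detected_cancers, total_exams):
    """Cancer detection rate, conventionally reported per 1,000 screens."""
    return 1000 * screen_detected_cancers / total_exams

def ppv2(cancers_after_biopsy_rec, biopsy_recommendations):
    """PPV2: proportion of exams with a biopsy recommendation
    that yielded a cancer diagnosis."""
    return cancers_after_biopsy_rec / biopsy_recommendations

def false_positive_rate(false_positives, exams_without_cancer):
    """False-positive rate: proportion of exams in women without
    cancer that were interpreted as positive.
    Equivalently, FPR = 1 - specificity."""
    return false_positives / exams_without_cancer

def corrected_fpr_estimate(survey_estimate_pct):
    """The correction applied in the text: a survey estimate above 50%
    is assumed to actually be specificity, so FPR = 100 - specificity."""
    if survey_estimate_pct > 50:
        return 100 - survey_estimate_pct  # treat the value as specificity
    return survey_estimate_pct

# Hypothetical example: a reported "false-positive rate" of 91% is
# reinterpreted as 91% specificity, i.e., a 9% false-positive rate.
print(corrected_fpr_estimate(91))  # -> 9
```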
Only one previous study has compared estimated versus actual mammography performance, but it was restricted to three geographic regions in the US and did not evaluate false-positive and cancer detection rates [5]. Our analysis was also conducted on a more recent survey, such that radiologists had five additional years of audit feedback about their performance and more cumulative years of actual performance data from the BCSC with which to estimate it. Our results are not directly comparable to this earlier report, as the statistical methods differed and the screening population in the previous study was more restrictive.
Our study has several strengths. We achieved a good response rate to our survey (68.6%), and we examined radiologists' perceived interpretive performance both relative to their peers and against their actual mammography performance data from clinical practices in six geographically distinct regions of the US. This suggests that the findings of our study are generally applicable across the country.
One weakness of our study was the low response rate for estimating false-positive and cancer detection rates, which may indicate that participants were not comfortable estimating these two measures. We suspect that radiologists purposely skipped these questions, as all but one radiologist who left them blank answered the subsequent survey questions. The variability of audit data across BCSC sites is also a limitation. However, since almost all study radiologists reported receiving audit reports, radiologists could have looked up their performance measures in their BCSC audits; thus, our results could overestimate the percentage of US radiologists who can accurately describe their interpretive performance.
For most performance measures, radiologists overestimate their ability, including perceiving their screening interpretive performance as better than their peers', and they have particular difficulty estimating their false-positive rate and PPV2. Given these findings, opportunities for improving radiologists' understanding of their performance could include a standardized facility and physician audit reporting form, such as a "Radiologist Report Card" for screening mammography interpretation, with clear reporting of recall rate, PPV2, false-positive rate, and cancer detection rate relative to national guidelines or a peer cohort. Similarly, a website that allows radiologists to compare themselves with their peers in the United States and other countries may improve their ability to understand their own interpretive performance measures and to know whether, and in which areas, their interpretive performance needs improvement [17]. Development of widely available CME specific to MQSA reporting of performance measures, including radiologists' individual audit data relative to their peers', could also be an effective tool for self-assessment and could ultimately improve clinical interpretation. Routine submission of local data to the National Mammography Database, developed by the American College of Radiology (https://nrdr.acr.org/Portal/NMD/Main/page.aspx, accessed 12/1/11), is another powerful tool that should assist radiologists in understanding their own performance relative to their peers.
Radiologists perceive their performance to be better than it actually is and at least as good as that of their peers, and they have particular difficulty estimating their false-positive rate and PPV2. Future study of strategies to improve audit feedback to, and education of, radiologists is warranted; in the meantime, encouraging radiologists to join the ACR National Mammography Database would address many of the gaps identified here.