Community radiologists varied substantially in their interpretation of screening mammograms; adjustment for differences in the patient population, the testing situation, and radiologist characteristics reduced the variability in false-positive rates by half but did not eliminate it. Before adjustment, the false-positive interpretation rates of the 24 radiologists ranged from 2.6% to 15.9%; after full adjustment for patient, testing, and radiologist characteristics that may influence false-positive readings, the range narrowed to 3.5%–7.9%. Although patient, testing, and radiologist characteristics were all important predictors of false-positive rates, radiologist characteristics accounted for more of the variability among radiologists in this study than we had anticipated, probably because the patient populations and testing characteristics were similar across radiologists. These characteristics may not be similar in other settings; therefore, it will typically be important to adjust for all of these variables when studying radiologists' variability.
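The study's actual statistical model is not reproduced here; as a minimal sketch of how such adjustment is commonly done, the following code (all variable names and data are hypothetical and synthetic) fits an exam-level logistic regression on patient, testing, and radiologist covariates and then predicts false-positive probabilities at one fixed covariate profile, so that the remaining differences reflect the radiologist rather than the case mix.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Minimal sketch, NOT the study's actual model: adjusting false-positive
# rates for patient, testing, and radiologist covariates. All variable
# names and data below are hypothetical.
rng = np.random.default_rng(0)
n = 5000
exams = pd.DataFrame({
    "years_since_training": rng.integers(1, 40, n),  # radiologist covariate
    "patient_age": rng.integers(40, 80, n),          # patient covariate
    "prior_films": rng.integers(0, 2, n),            # testing covariate
})
# Simulate an outcome whose odds of a false positive fall with years since
# training, consistent with the pattern described in the text.
lin = (-2.0 - 0.04 * exams["years_since_training"]
       - 0.01 * (exams["patient_age"] - 60) - 0.3 * exams["prior_films"])
exams["false_positive"] = rng.binomial(1, 1 / (1 + np.exp(-lin)))

fit = smf.logit("false_positive ~ years_since_training + patient_age"
                " + prior_films", data=exams).fit(disp=False)

# "Adjusted" rates: predict at a single fixed patient/testing profile so
# that differences reflect the radiologist covariate, not case mix.
profile = pd.DataFrame({"years_since_training": [5, 30],
                        "patient_age": 60, "prior_films": 1})
print(fit.predict(profile))  # adjusted false-positive probabilities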
The most important radiologist characteristics appeared to be age and time since graduation from medical school: younger radiologists and those more recently trained had higher rates of false-positive mammograms. That their false-positive mammographic examination rates were two to four times those of older radiologists is especially noteworthy, because it is reasonable to hypothesize that the most recently trained readers would be more accurate than mammographers trained long ago. It is possible that the younger radiologists also missed fewer cancers than the older mammographers, because their training emphasized sensitivity over specificity.
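To make that sensitivity/specificity trade-off concrete, the sketch below (all numbers invented for illustration) shows how lowering the recall threshold on a continuous suspicion score raises sensitivity at the cost of more false positives; a reader trained to err on the side of recall is, in effect, operating at a lower threshold.

import numpy as np

# Illustration only (synthetic data): a lower "recall" threshold on a
# suspicion score trades specificity for sensitivity.
rng = np.random.default_rng(1)
scores_cancer = rng.normal(2.0, 1.0, 50)    # hypothetical scores, cancers
scores_benign = rng.normal(0.0, 1.0, 5000)  # hypothetical scores, benign

for threshold in (2.5, 1.5, 0.5):           # stricter -> laxer reader
    sensitivity = np.mean(scores_cancer >= threshold)
    false_positive_rate = np.mean(scores_benign >= threshold)
    print(f"threshold {threshold}: sensitivity {sensitivity:.2f}, "
          f"false-positive rate {false_positive_rate:.3f}")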
Variability has been noted in many areas of clinical medicine (28). Microscopic review of breast tissue slides involves an element of subjectivity similar to that of mammographic interpretation. For example, in the diagnosis of ductal carcinoma in situ, agreement among five pathologists with a standard interpretation on a test set of 24 breast tissue slides ranged from 71% to 92%, with individual false-positive rates ranging from 0% to 20% (30). Obviously, the CIs around the individual rates would be wide, given the small sample size, but the similarities with our findings in mammography are striking.
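As a rough illustration of just how wide those CIs are with so few cases, the sketch below computes Wilson 95% intervals for false-positive proportions on denominators the size of the 24-slide test set; the counts are hypothetical, and statsmodels' proportion_confint is used only as one standard way to compute such intervals.

from statsmodels.stats.proportion import proportion_confint

# Hypothetical counts on a small denominator like the 24-slide test set:
# even moderate rates carry very wide 95% Wilson confidence intervals.
for false_positives, n_slides in [(0, 24), (2, 24), (5, 24)]:
    low, high = proportion_confint(false_positives, n_slides, method="wilson")
    print(f"{false_positives}/{n_slides}: rate {false_positives/n_slides:.1%}, "
          f"95% CI {low:.1%} to {high:.1%}")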
Several studies (2) have indicated that significant variability exists in the interpretation of mammograms. This variability implies that individual radiologists' false-positive rates can span a wide range, which can be both alarming and expensive for the patient (10). By better understanding the sources of variability in mammography interpretation, we can identify potential areas for improvement. The ultimate goal is to enhance mammography performance by reducing the rate of false-positive interpretations while maintaining high sensitivity and accuracy.
It has long been known that certain clinical and demographic characteristics of women make accurate reading of mammograms more difficult (31). More recently, several studies (34) have shown that the time between mammograms and the availability of previous studies for comparison also affect accuracy. However, less attention has been directed to secular trends in false-positive mammographic examination rates. We found that rates almost doubled in this community setting between 1985 and 1993. This increase may be related to fear of medical malpractice litigation, given the prominence in North America of malpractice suits for delayed detection of breast cancer.
Strengths of this study include the fact that it was conducted in a community setting with radiologists who had a broad range of years of experience and had worked in different types of clinical settings. Data were available on the radiologists, the patients, and the testing characteristics, all of which were controlled for in the analysis. Most prior studies of radiologists' variability in mammography were done in a testing situation, which might not be representative of real-life clinical practice (8).
The limitations of our study include the fact that the radiologists did not read the same films, so direct comparisons are not possible (although we did adjust for patient characteristics in the models). Only 45 women were diagnosed with breast cancer; thus, we did not analyze sensitivity. In addition, some of the radiologists read fewer than 100 mammograms during the 8.5-year study period, which makes comparisons difficult because the CIs around their rates are wide. It should be noted, however, that these radiologists read additional films outside this study cohort; thus, these numbers do not represent the total number of mammograms they read during the study period. Furthermore, the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS™) classification was not in use at the time of the study (36). Although use of BI-RADS™ may ultimately lead to less variability among radiologists, this has not yet been shown to be the case (5). The false-positive rates for our participating radiologists were lower than the national average; thus, our results may underestimate the variability among radiologists elsewhere. Finally, the data in this study cover 1985 through 1993, and radiologists' reading patterns may have changed since then.
Given the retrospective nature of this study, data on some variables were not available, which may have resulted in misclassification errors. For example, we lacked data on several potentially important radiologist-related factors, such as fiscal incentives, medical malpractice concerns, and comfort with ambiguity in clinical decision making; future research should include these variables. Adjustment for these and other variables may further decrease the variability in false-positive rates.
In summary, community radiologists varied widely in their false-positive rates for screening mammograms. This variability was affected not only by the kinds of patients seen but also by the radiologists' age and experience: younger radiologists and those more recently trained had higher rates of false-positive interpretations. Unlike studies based on test sets of films, this study examined radiologists' decisions as they naturally occur in clinical practice. That adjustment halved the variability among radiologists in false-positive mammographic readings underscores the importance of adjusting for patient and radiologist characteristics when attempting to understand variability in clinical medicine.