In an analysis of probabilities of false-positive recall or biopsy recommendation using registry data collected in community practice, we estimated that the risk of a FP result was higher following a biennial screening interval than an annual interval. However, after 10 years of repeat screening at approximately annual or biennial intervals, the cumulative probability of receiving at least one FP recall or biopsy recommendation was lower with biennial compared to annual screening whether women started screening at age 40 or age 50. Compared to annual screening, biennial screening was associated with a non-statistically significant absolute increase of 2 and 3 percent in the proportion of women diagnosed with late-stage cancer in a cohort of those who developed cancer.
Our estimates of a woman’s cumulative probability of a FP mammogram result after repeat screening are higher than previously reported (10, 11, 13, 15). This is partly explained by a higher probability of FP results at each exam for our cohort. Our estimate of the FP recall probability at a single screening round was 16.3% at first exam and 9.6% at subsequent exams, compared to estimates of 6.5% for other cohorts (10). Additionally, previous methods to estimate the cumulative FP probability assumed that censoring was non-informative, which leads to underestimation if women at higher risk of a FP are more likely to be observed for fewer screening rounds (20).
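To illustrate how per-exam probabilities compound over repeat screening, the sketch below computes the chance of at least one FP recall under the simplifying assumption that screening rounds are independent and share the single-round probabilities quoted above (16.3% at first exam, 9.6% at subsequent exams). This is not the censoring-adjusted model used in the analysis reported here, and the function name is hypothetical, so its output should not be read as one of the study estimates.

```python
def cumulative_fp_probability(p_first, p_subsequent, n_rounds):
    """Probability of at least one FP result in n_rounds of screening.

    Assumes rounds are independent (an illustrative simplification,
    not the modeling approach of the study).
    """
    if n_rounds < 1:
        return 0.0
    # Probability that every round is free of a FP result.
    p_no_fp = (1.0 - p_first) * (1.0 - p_subsequent) ** (n_rounds - 1)
    return 1.0 - p_no_fp


# Single-round probabilities taken from the text; round counts are illustrative.
ten_rounds = cumulative_fp_probability(0.163, 0.096, 10)
five_rounds = cumulative_fp_probability(0.163, 0.096, 5)
print(f"10 rounds: {ten_rounds:.2f}, 5 rounds: {five_rounds:.2f}")
```

Under these assumptions the function returns roughly 0.66 for 10 rounds and 0.44 for 5 rounds. Model-based estimates can differ because they account for covariates, variation among radiologists, the screening interval, and informative censoring.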
Our analysis identified covariates associated with FP mammography results that resemble those in previous reports. Positive associations between FP recall and previous breast biopsies, family history of breast cancer, postmenopausal hormone therapy use, more recent exam year, and time between screening exams have been reported previously, as have negative associations between FP recall and older age and comparison film availability (6, 15, 18, 27–29). We also identified statistically significant associations between FP recall and family history of breast cancer, exam year, time since previous mammogram, and availability of comparison films. Surprisingly, we found that older age was associated with FP recall only at the first exam. This induced small differences in the cumulative probability of FP recall by starting age.
Few previous studies have estimated the cumulative probability of FP mammography results after repeat screening in U.S. community practice. We searched PubMed Central using the terms “cumulative”, “false positive”, and “mammography” to identify all studies evaluating the cumulative probability of FP recall and biopsy recommendation after repeat screening mammography. From among these, we reviewed the titles and abstracts to identify all studies providing estimates of the cumulative FP probability based on screening mammography in the U.S. We then searched all references citing these papers using Web of Science and reviewed titles and abstracts of these manuscripts to identify any additional references we may have missed. This review found 7 studies reporting cumulative FP probabilities for repeat screening mammography in the U.S. (10–13, 15, 20, 30). Elmore (1998) reported a 49.1% probability after 10 rounds of screening (10). Christiansen (2000) found a FP probability of 22% after five screening mammograms under biennial screening for an intermediate-risk woman and median radiologist (15), compared to estimates for our population of 38–40%. Studies of benign biopsy have found a probability of 8–9% after 10 screening rounds (12, 13), which is similar to our estimate. Based on our review, we believe ours is the first study to incorporate covariate effects and variation among radiologists into estimates of cumulative FP biopsy recommendation rates.
Our results on the risk of late-stage cancer following annual and biennial screening intervals are similar to those previously reported. A previous BCSC study found a statistically significantly higher proportion of late-stage cancers among women aged 40–49 participating in biennial compared to annual screening (28% vs. 21%), but no significant difference among women aged 50–59 (22% vs. 21%) (24). Although we found no statistically significant absolute difference in the overall proportion of late-stage cancers with biennial compared to annual screening, our findings could not exclude an increase in late-stage cancer of as much as 7.8% among women in their 40s and as much as 5.7% among women in their 50s, based on the upper confidence bound of the estimate of absolute difference. The relatively broad confidence limits around our estimates of difference are likely attributable to the small sample size available for our analysis of incident cancer. A larger future study is required to exclude the possibility of a clinically significant increase in late-stage cancer with biennial compared to annual screening, or even a smaller and less clinically significant decrease.
We have investigated two types of FP mammography results: recall for additional imaging and recommendation for biopsy. Our definitions of FP recall and biopsy recommendation are consistent with the BI-RADS Atlas, which distinguishes these two types of false positives (21). Previous research on the effects of FP mammograms suggests that women receiving a FP recall or benign biopsy experienced elevated anxiety and distress (31). Benign biopsy poses additional risks of pain and scarring (32, 33). Thus FP recalls, although common, exert smaller effects than do FP biopsy recommendations. Both the relative frequency and severity of these two types of FP results should be considered when evaluating the harms of screening mammography.
Most screening mammograms had an initial assessment of negative or benign (BI-RADS 1 or 2) or of BI-RADS 0 (needs additional imaging). Most in the latter category resolved on further evaluation to a negative or benign result; about 10% were interpreted as having suspicious abnormalities, and status continued to be unresolved (BI-RADS 0) or was missing for about 19%. This could be because the woman did not return for follow-up imaging within 90 days of her screening mammogram or because she went to a facility outside the BCSC. In our analysis these observations have been defined as recalls but have been excluded from biopsy recommendation analyses. If the sub-group with missing final assessments is likely to go on to receive a biopsy recommendation, then this would tend to bias estimates of FP biopsy recommendation downward. However, these missing observations make up only 6% of the total sample, so the magnitude of this bias is expected to be small.
Our study has limitations. Although it was based on a large sample, it included 10 or more rounds of screening for a very small number of women, so our cumulative probability estimates after 10 years of annual screening depend on statistical modeling. However, we were able to incorporate information from women with fewer than 10 exams using statistical methods developed for this purpose that accommodate informative censoring; previous methods for estimating cumulative FP probabilities are downwardly biased when FP recall is more common among women with fewer observed rounds of screening, as in our cohort (20).
We lacked information on radiologist characteristics associated with FP recall. Previous research has identified radiologist characteristics, such as fellowship training and years of experience, that influence FP recall (14, 16, 17, 34, 35). We attempted to capture differences in radiologist FP rates using random effects to estimate FP recall and biopsy recommendation variability in the middle 50% of radiologists. Variation is even larger when comparing radiologists with the highest and lowest FP rates.
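As a rough illustration of what a "middle 50% of radiologists" range means, the sketch below assumes a normally distributed radiologist random intercept on the logit scale, so that the middle half of radiologists lies within about 0.674 standard deviations of the median. The distributional form, the standard deviation of 0.5, and the function name are hypothetical choices made for illustration; they are not the fitted values from this study.

```python
import math
from statistics import NormalDist


def middle_half_range(median_probability, sigma_logit):
    """FP probability range spanned by the 25th to 75th percentile radiologist.

    Assumes a normal random intercept on the logit scale (illustrative only).
    """
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of a standard normal
    center = math.log(median_probability / (1.0 - median_probability))

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return expit(center - z75 * sigma_logit), expit(center + z75 * sigma_logit)


# 0.096 is the subsequent-exam recall probability from the text;
# sigma_logit = 0.5 is a hypothetical between-radiologist standard deviation.
low, high = middle_half_range(0.096, 0.5)
print(f"middle 50% of radiologists: {low:.3f} to {high:.3f}")
```

A larger between-radiologist standard deviation widens this interval, and the gap between the highest and lowest radiologists is wider still because it reaches into the tails of the distribution.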
Most mammograms in this analysis were film-screen exams. Digital screening mammography is rapidly becoming the predominant screening modality, with 76.2% of accredited facilities using full-field digital machines as of May 1, 2011 (36). However, research on the performance of digital mammography has indicated similar specificity, and hence FP rates, for digital and film-screen exams (37, 38). A slight, non-statistically significant decrease in specificity has been observed for some sub-groups (38). Such a decrease would result in increased FP probabilities relative to those observed in this study.
The study’s cumulative FP risk estimates apply only to the first 10 years of screening. Over the course of a lifetime of screening, beginning screening 10 years earlier would result in an additional 10 screening mammograms under annual screening and 5 under biennial, and the lifetime risk of FP mammography results will thereby be increased. We could not estimate lifetime cumulative FP risks because doing so would require extrapolation beyond the length of observation in the current study. We found no statistically significant difference in FP recall probabilities between women aged 60 and over and those aged 40–44 years, but estimated that FP biopsy recommendation probabilities were statistically significantly higher in women aged 65 or older. Therefore, cumulative FP biopsy recommendation probabilities for the ten years beginning at age 60 might be higher than those we have reported for women who began screening at younger ages.
In summary, we estimate that after 10 years of annual screening, a majority of women will receive at least one FP recall, and 7–9% will receive a FP biopsy recommendation. Both probabilities are lower with biennial screening. In a population of women diagnosed with cancer, we also identified a non-statistically significant increase in the proportion diagnosed with late-stage cancer after biennial screening compared to annual. Biennial screening thus decreases the risk of FP results but may also attenuate the benefits of routine screening. Women and physicians should be aware of the possibility of these harms associated with different screening intervals so they can make informed decisions about screening and be prepared for what to expect when they receive their results. They should also ensure that prior mammograms, when they exist, are available to the interpreting radiologist, as these data suggest that availability of prior studies may halve the odds of a FP recall.
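Because an odds ratio is not a ratio of probabilities, halving the odds does not exactly halve the probability of a FP recall. The minimal sketch below converts an assumed odds ratio of exactly 0.5 into a probability, using the 9.6% subsequent-exam recall probability quoted earlier purely as an illustrative baseline; the helper name is hypothetical.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Return the probability obtained by scaling the baseline odds."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Illustrative inputs: baseline from the text, odds ratio assumed to be 0.5.
with_priors = apply_odds_ratio(0.096, 0.5)
print(f"FP recall probability after halving the odds: {with_priors:.3f}")
```

With these inputs the probability falls from 9.6% to about 5.0%, slightly more than half the baseline. The gap between halved odds and halved probability grows as the baseline probability increases.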