Within a large, geographically diverse sample of mammography facilities, CAD was associated with statistically significant decreases in specificity and PPV1, a statistically nonsignificant increase in sensitivity, and no statistically significant improvement in either cancer detection rates or the prognostic characteristics of incident breast cancers. The nonsignificant increase in sensitivity was attributable to greater detection of DCIS with CAD rather than to increased detection of invasive breast cancer.
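The performance measures summarized above can be illustrated with a short calculation. The sketch below assumes conventional screening definitions: sensitivity is the proportion of cancers diagnosed during follow-up that followed a positive screen, specificity is the proportion of cancer-free women with a negative screen, PPV1 is the proportion of positive screens followed by a cancer diagnosis, and cancer detection is expressed per 1000 screens. The study's own operational definitions, including its follow-up window, may differ in detail. All counts, and the names `ScreeningCounts` and `performance`, are hypothetical and serve only to show how additional recalls that yield few additional cancers lower specificity and PPV1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Hypothetical 2x2 tallies of screening mammograms with follow-up."""
    true_pos: int   # positive screen, cancer diagnosed during follow-up
    false_pos: int  # positive screen, no cancer diagnosed
    false_neg: int  # negative screen, cancer diagnosed during follow-up
    true_neg: int   # negative screen, no cancer diagnosed

    @property
    def total(self) -> int:
        return self.true_pos + self.false_pos + self.false_neg + self.true_neg


def performance(c: ScreeningCounts) -> dict:
    """Return conventional screening performance measures (assumed definitions)."""
    positives = c.true_pos + c.false_pos  # screens that led to recall
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "ppv1": c.true_pos / positives,
        "recall_rate": positives / c.total,
        "cancer_detection_per_1000": 1000 * c.true_pos / c.total,
    }


# Hypothetical counts for 10,000 screens read without and with CAD:
# CAD adds 151 recalls, only one of which leads to a cancer diagnosis.
without_cad = ScreeningCounts(true_pos=40, false_pos=950, false_neg=10, true_neg=9000)
with_cad = ScreeningCounts(true_pos=41, false_pos=1100, false_neg=9, true_neg=8850)

if __name__ == "__main__":
    for label, counts in [("without CAD", without_cad), ("with CAD", with_cad)]:
        metrics = performance(counts)
        print(label, {k: round(v, 4) for k, v in metrics.items()})
```

With these made-up counts, sensitivity rises from 0.80 to 0.82 while specificity falls from about 0.905 to 0.889 and PPV1 from about 0.040 to 0.036, matching the directions (not the magnitudes) of the associations reported here.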
This study offers important new insights into the effectiveness of CAD when used in real-world practice. First, the results suggest that CAD has a limited impact on breast cancer detection, particularly detection of invasive breast cancer. In adjusted analyses, CAD was not associated with improved detection of invasive breast cancer, more frequent early-stage diagnosis, or smaller invasive breast cancers. These findings raise concerns that CAD, as currently implemented in clinical practice, may have little or no impact on breast cancer mortality, which may depend on earlier detection of invasive breast cancer (6,7). Second, this study of CAD adoption within 25 BCSC facilities indicates that CAD has a more modest impact in typical practice than was observed in a previous analysis within seven BCSC facilities from 1998 to 2002 (10). The current findings are consistent with a meta-analysis suggesting a modest increase in recall rates with CAD and little or no impact on cancer detection rates (17).
Nishikawa and Pesce (33) argue that studies with matched designs, in which outcomes are compared before and after CAD application on the same mammogram, are the most accurate means of assessing the clinical impact of CAD (34). However, matched studies typically impose the restriction that radiologists can only upgrade final BI-RADS assessments after viewing CAD output, resulting in the recall of women who would not have been recalled in the absence of CAD (11,13,36,37). In our view, this design assesses the efficacy of CAD, that is, its clinical impact when used under optimal conditions. In contrast, this study assesses the effectiveness of CAD under everyday practice conditions, in which radiologists with variable experience and expertise may use CAD in a nonstandardized, idiosyncratic fashion (5). Some community radiologists, for example, may decide not to recall women because CAD did not mark an otherwise suspicious lesion.
This large-scale, population-based observational study also enabled assessment of the impact of CAD on important breast cancer outcomes such as DCIS detection and the stage distribution of invasive cancers. These outcomes may be impossible to assess with adequate statistical power in matched studies or even randomized trials (38). Although our analyses may lack sufficient power to exclude a small benefit of CAD for invasive breast cancer detection, the principal contribution of CAD may be increased detection of DCIS, a precancerous lesion with an ill-defined long-term prognosis (39). Point estimates and 95% confidence intervals from this study may be useful in statistical models of long-term breast cancer outcomes among women screened with and without CAD (40). Such models could quantify the potential for CAD to reduce breast cancer mortality, possible overdiagnosis of DCIS, patient preferences for earlier treatment of DCIS vs later treatment of invasive cancer, the harms of additional false-positive mammograms, and societal costs (including both supplemental fees for CAD use and the costs of added diagnostic testing).
We performed post hoc analyses to explore potential heterogeneity across facilities and over time. Because these analyses were conducted post hoc, their results should be interpreted cautiously, particularly the resulting confidence intervals and P values. These analyses suggest that substantial initial changes in specificity and sensitivity after CAD implementation subsequently attenuated within six BCSC facilities that were included in a previous report (10). The performance impact of CAD at these facilities may have diminished as radiologists gained experience with CAD. Similar attenuation, however, was not apparent within the 19 other CAD facilities that are reported on here for the first time. It is possible that more recent versions of CAD software have a less potent influence on interpretation, although a recent comparative study suggests that earlier and later versions of CAD software do not differ greatly in performance (41). It is also possible that training in CAD use varies across radiologists and facilities, leading to variable interpretive impacts of CAD. In addition, the overall performance changes associated with CAD may be driven by substantial impacts within some facilities, whereas CAD may have little interpretive impact when implemented in most community facilities. Variability in the impact of CAD on recall rates was also observed in a meta-analysis (17) and across three sites in a trial comparing mammogram interpretation by a single radiologist with CAD versus interpretation by two radiologists (double reading) (42).
Although a prior analysis of BCSC data indicated increased biopsy risk with CAD (10), this analysis reveals a decline in the rate of biopsy recommendation over time regardless of CAD use. In the Women’s Health Initiative, women randomly assigned to hormone therapy (HT) had a nearly two-thirds greater cumulative risk of breast biopsy (29), and reduced HT use after publication of the Women’s Health Initiative results in July 2002 may largely explain the observed reduction in biopsy recommendations (32). Biopsy recommendations declined despite reduced specificity within facilities that implemented CAD, suggesting that most CAD-induced recalls were resolved without biopsy.
A limitation of this study is the absence of digital mammography data. Although CAD algorithms perform a similar alerting function in the film-screen and digital environments, film-screen mammograms must be digitized before CAD analysis, and digitization may introduce noise and adversely affect performance. However, small retrospective studies suggest that the performance impacts of CAD are similar in digital (43–46) and film-screen environments (8,11,19).
Because prior research suggests that facilities apply CAD to nearly all mammograms after implementation (10), these analyses assumed that all mammograms were interpreted with CAD after implementation, which is another limitation of this study. To the extent that facilities did not use CAD on all mammograms, results may be biased toward the null. Although the analyses account for salient patient factors, unmeasured radiologist or facility characteristics may affect results. Results were similar, however, in analyses in which CAD effects were conditional on facility, which would control to some extent for potentially confounding facility factors. Similarly, results were unchanged when analyses were restricted to mammograms interpreted by radiologists who were present at facilities both before and after CAD implementation, thereby controlling to some extent for changes in radiologist staffing over time. Although the number of breast cancers diagnosed after CAD implementation (>1000) is greater than in previous samples, larger samples may be needed to detect small increases in sensitivity or cancer detection with CAD. Finally, this study lacks data on the CAD products that each facility used, so the potentially distinct impacts of different products could not be investigated.
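The concern that incomplete CAD use would bias results toward the null can be made concrete with a simple mixture calculation. The sketch below assumes, hypothetically, that each mammogram read after implementation either used CAD or did not, that CAD is the only difference between the two kinds of reads, and that no other secular trends are present; it is not the regression model used in this study. The function names `observed_post_rate` and `observed_difference` and the example rates are illustrative only, not study data.

```python
def observed_post_rate(rate_without: float, rate_with: float, frac_without_cad: float) -> float:
    """Rate observed after implementation if a fraction of reads skipped CAD.

    Hypothetical model: post-implementation reads are a mixture of
    non-CAD reads (weight frac_without_cad) and CAD reads (the remainder).
    """
    if not 0.0 <= frac_without_cad <= 1.0:
        raise ValueError("frac_without_cad must lie in [0, 1]")
    return frac_without_cad * rate_without + (1.0 - frac_without_cad) * rate_with


def observed_difference(rate_without: float, rate_with: float, frac_without_cad: float) -> float:
    """Observed post-minus-pre difference; equals (1 - f) times the true difference."""
    return observed_post_rate(rate_without, rate_with, frac_without_cad) - rate_without


if __name__ == "__main__":
    # Illustrative recall rates (not study data): 10% without CAD, 12% with CAD.
    true_diff = 0.12 - 0.10
    for f in (0.0, 0.1, 0.25, 0.5):
        obs = observed_difference(0.10, 0.12, f)
        print(f"fraction read without CAD = {f:.2f}: "
              f"observed difference = {obs:.4f} (true difference {true_diff:.4f})")
```

Because the observed difference equals (1 - f) times the true difference, any fraction f of post-implementation mammograms read without CAD shrinks the estimated CAD effect toward zero, whether the true effect is an increase (as for recall) or a decrease (as for specificity).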
Among a large sample of US mammography facilities, CAD was associated with statistically significantly decreased specificity and PPV1. CAD was not associated with improved sensitivity for invasive breast cancer, increased rates of breast cancer detection, or more favorable stage or size of invasive breast cancers. CAD is now applied to the large majority of screening mammograms in the United States, with annual direct Medicare costs exceeding $30 million (2). As currently implemented in US practice, CAD appears to increase a woman’s risk of being recalled for further testing after screening mammography while yielding equivocal health benefits.