|Home | About | Journals | Submit | Contact Us | Français|
Computer-aided detection (CAD) is applied during screening mammography for millions of US women annually, although it is uncertain whether CAD improves breast cancer detection when used by community radiologists.
We investigated the association between CAD use during film-screen screening mammography and specificity, sensitivity, positive predictive value, cancer detection rates, and prognostic characteristics of breast cancers (stage, size, and node involvement). Records from 684 956 women who received more than 1.6 million film-screen mammograms at Breast Cancer Surveillance Consortium facilities in seven states in the United States from 1998 to 2006 were analyzed. We used random-effects logistic regression to estimate associations between CAD and specificity (true-negative examinations among women without breast cancer), sensitivity (true-positive examinations among women with breast cancer diagnosed within 1 year of mammography), and positive predictive value (breast cancer diagnosed after positive mammograms) while adjusting for mammography registry, patient age, time since previous mammography, breast density, use of hormone replacement therapy, and year of examination (1998–2002 vs 2003–2006). All statistical tests were two-sided.
Of 90 total facilities, 25 (27.8%) adopted CAD and used it for an average of 27.5 study months. In adjusted analyses, CAD use was associated with statistically significantly lower specificity (OR = 0.87, 95% confidence interval [CI] = 0.85 to 0.89, P < .001) and positive predictive value (OR = 0.89, 95% CI = 0.80 to 0.99, P = .03). A non-statistically significant increase in overall sensitivity with CAD (OR = 1.06, 95% CI = 0.84 to 1.33, P = .62) was attributed to increased sensitivity for ductal carcinoma in situ (OR = 1.55, 95% CI = 0.83 to 2.91; P = .17), although sensitivity for invasive cancer was similar with or without CAD (OR = 0.96, 95% CI = 0.75 to 1.24; P = .77). CAD was not associated with higher breast cancer detection rates or more favorable stage, size, or lymph node status of invasive breast cancer.
CAD use during film-screen screening mammography in the United States is associated with decreased specificity but not with improvement in the detection rate or prognostic characteristics of invasive breast cancer.
Computer-aided detection (CAD) is used by many radiologists to analyze and interpret mammograms before making a final recommendation. Although three out of four screening mammograms include CAD in the United States, it is unclear if CAD use improves breast cancer detection and justifies the potential associated risks and costs.
Using data from 684 956 women and more than 1.6 million mammograms administered at Breast Cancer Surveillance Consortium facilities that implemented CAD between 1998 and 2006, the relationships between performance, cancer detection rates, and breast cancer prognostic characteristics and CAD implementation were investigated.
CAD implementation was associated with decreased specificity and positive predictive value. CAD use was not associated with increased detection rates or more favorable prognostic characteristics.
The health benefits of CAD use during screening mammograms remain unclear, and the data indicate that the associated costs may outweigh the potential health benefits.
Training of radiologists in CAD use and interpretation could vary widely. For this study, digitized mammography data were not available, thus film-screen mammograms were used for analysis. Film-screen mammograms are digitized before CAD, potentially introducing noise and adversely affecting performance. Also, the analysis assumed that after a facility implemented CAD, all subsequent mammograms were analyzed with CAD, introducing potential bias.
From the Editors
Computer-aided detection (CAD) is a radiological device designed to assist radiologists during mammography interpretation. CAD software algorithms analyze data from mammogram images to identify patterns associated with underlying breast cancers (1). After a radiologist completes an initial mammogram assessment, CAD marks potential abnormalities on the image for the consideration of the radiologist before making a final recommendation. In film mammography, CAD is coupled with a device to convert film mammograms to digital images and a viewing board while CAD is integrated directly into digital mammography environments.
CAD received Food and Drug Administration approval in 1998 on the basis of small studies, which suggested that CAD could increase breast cancer detection without undue increases in recall rates. Congress subsequently amended the Medicare statute to include supplemental coverage for CAD use, and CAD is now applied on nearly three of four screening mammograms performed on American women (2). Analogous CAD systems are under development to assist interpretation of computed tomography and magnetic resonance imaging of the breast, chest, colon, liver, and prostate (3).
Despite broad acceptance and use, it is unclear if the benefits of CAD during screening mammography outweigh its potential risks and costs (4,5). Ideally, CAD would lead to earlier detection of high-risk cancers, particularly invasive tumors, by improving sensitivity for these cancers and reducing the incidence of advanced stage breast cancer (6,7). However, the high sensitivity of CAD for mammographic calcifications may shift diagnostic attention to relatively indolent cancers, such as ductal carcinoma in situ (DCIS), which often present with calcifications (8–13). In contrast, CAD is less likely to mark invasive cancers presenting as uncalcified masses (8,9) and may therefore have little or no impact on the detection of higher-risk invasive breast cancer (14,15). Although meta-analyses suggest that CAD statistically significantly increases recall rates (16,17), clinical studies have generally included too few women with breast cancer to clarify whether CAD differentially affects detection of in situ vs invasive breast cancer, or whether CAD is associated with improved prognostic characteristics of breast cancer such as more localized stage or smaller tumor size (11–13,18,19). Such data are critical for understanding the potential of CAD to decrease breast cancer mortality—the ultimate goal of breast cancer screening.
In a study of CAD implementation from 1998 to 2002 within the national Breast Cancer Surveillance Consortium (BCSC), CAD was associated with decreased specificity and positive predictive value (PPV1), and increased recall and biopsy rates, but no difference in cancer detection rates was observed (10). However, this early study did not include all BCSC sites, and of the 43 facilities, only seven used CAD for an average of 18 months from 1998 to 2002. The study also did not evaluate the prognostic characteristics of breast cancers such as tumor stage and size. In this study, we therefore examined performance impacts, cancer detection rates, and breast cancer prognostic characteristics associated with CAD implementation within 25 of 90 BCSC facilities from 1998 to 2006. We hypothesized that CAD would be associated with reduced specificity and that any increase in sensitivity associated with CAD would be attributable to increased detection of DCIS rather than invasive breast cancer. We further hypothesized that the prognostic characteristics of invasive breast cancer would be similar with and without CAD.
We studied mammography facilities participating in the BCSC, a federally supported collaboration that links mammography data to cancer outcomes in California, Colorado, North Carolina, New Hampshire, New Mexico, Washington, and Vermont (20). BCSC data quality is rigorously monitored, and BCSC sites have received institutional review board approval for active or passive consenting processes or a waiver of consent to enroll participants, link data, and perform analytical studies. All procedures are compliant with the Health Insurance Portability and Accountability Act, and BCSC sites have received a Federal Certificate of Confidentiality to protect the identities of patients, physicians, and facilities. We used standard BCSC definitions to identify screening mammogram samples (as opposed to mammograms performed for diagnostic purposes) and to define mammogram performance outcomes (eg, specificity, sensitivity, PPV1, and cancer detection rates) (20). Practice patterns and outcomes within the BCSC are broadly representative of the practice of mammography in the United States (21).
We included mammograms from facilities that averaged at least 100 film-screen screening mammograms per month of clinical activity from January 1, 1998, to December 31, 2006. Of 93 potentially eligible facilities, we excluded three because CAD use could not be definitively determined. Within the remaining 90 facilities, we included screening mammograms for women aged 40 years or older, defined as bilateral mammograms designated by radiologists as routine screening. Although 17 (19%) facilities adopted digital mammography during the study period, we excluded digital mammograms because too few were performed to distinguish the performance impacts of digital mammography and CAD (n = 23 830 digital mammograms of 1 644 936 total mammograms, 1.45%).
We defined mammograms with Breast Imaging Reporting and Data System (BI-RADS; American College of Radiology, Reston, VA) assessments of 0, 4, or 5 as positive and mammograms with BI-RADS assessments of 1 and 2 as negative. Mammograms with a BI-RADS assessment of 3 were considered positive if the radiologist also recommended immediate evaluation but were considered negative if the radiologist did not recommend immediate evaluation (22). We ascertained breast cancer diagnoses (either DCIS or invasive breast cancer) for 1 year after the screening mammography by linkage with regional Surveillance, Epidemiology, and End Results registries or with local or statewide tumor registries. This methodology captures both incident breast cancers diagnosed by mammography and interval breast cancers that arise within 1 year of screening mammography.
We defined specificity as the proportion of screening mammograms that were negative among women who were not diagnosed with breast cancer. We defined sensitivity as the proportion of screening mammograms that were positive among women diagnosed with breast cancer (including interval breast cancers) within 1 year of screening mammography. We defined PPV1 as the proportion of women diagnosed with breast cancer after a positive screening mammogram (23).
We computed the recall rate as the proportion of all screening mammograms that were positive and the cancer detection rate as the number of cancers that were preceded by positive mammograms per 1000 screening mammograms. The cancer detection rate can also be computed as the incidence rate of breast cancer (per 1000 screening mammograms) multiplied by sensitivity. For each mammogram, we determined whether radiologists recommended a breast biopsy, a fine needle aspiration, or a surgical consultation within 90 days. We classified invasive breast cancers by the following prognostic characteristics: American Joint Committee on Cancer stage (24), tumor diameter (in millimeters), and presence of regional lymph node involvement. On the basis of cancer characteristics associated with increased mortality (7) and decreased survival (25,26), we defined tumors (including both DCIS and invasive cancer) as advanced if they were designated stage II–IV, if tumors measured greater than 15 mm in diameter, or if lymph node involvement was observed.
Of 90 total facilities, 25 (27.8%) implemented CAD during the study period. Because a previous survey indicated that CAD was used on most if not all mammograms after implementation (10), we classified mammograms performed after facility implementation of CAD as exposed to CAD. We defined mammograms interpreted at facilities that never implemented CAD or before CAD implementation as non-CAD. We excluded mammograms interpreted during the initial 3 months of CAD use at each facility because radiologists may have been adapting to the technology during this period. Excluding this 3-month period, facilities used CAD for an average of 27.5 months (range = 4–60 months). The BCSC does not collect data on the specific CAD product used by individual facilities.
We included patient variables associated with interpretive accuracy including age, breast density, and time since previous mammogram (27,28). Radiologists report BI-RADS breast density on each mammogram. We also specified whether patients were using hormone therapy (HT) because HT has been associated with increased risk of abnormal mammography, breast biopsy, and breast cancer (29–31). Patients report HT use on self-administered questionnaires before mammography.
We performed bivariate analyses to compare patient characteristics and unadjusted outcomes by CAD use. We stratified descriptive analyses by examination year (January 1, 1998, to December 31, 2002, vs January 1, 2003, to December 31, 2006), because US breast cancer incidence declined in mid-2002 when many women stopped using HT following the publication of Women’s Health Initiative findings demonstrating a link between HT and breast cancer (31–33). For mammograms interpreted at facilities that implemented CAD, we stratified by CAD use rather than year. However, because the median year of CAD implementation was 2003, most post-CAD mammograms (88.5%) were interpreted from January 1, 2003, to December 31, 2006.
We performed random-effects logistic regression to model binary outcomes for each performance measure as functions of CAD use while adjusting for patient characteristics, year (1998–2002 vs 2003–2006), and BCSC registry. For specificity, the model estimated the adjusted relative odds of a true-negative mammogram among women without breast cancer diagnosed within 1 year of mammography. Similarly, for sensitivity, the model estimated the adjusted relative odds of a true-positive mammogram among women diagnosed with breast cancer within 1 year of mammography. For PPV1, the models estimated the adjusted relative odds of a breast cancer diagnosis within 1 year of a positive mammogram. For breast cancer detection rates, the model estimated the adjusted relative odds of a breast cancer diagnosis within 1 year of a positive mammogram among all mammograms. Patient covariates were included as categorical variables to allow adjustment for potentially nonlinear relationships between outcomes and covariates. Patient age on the mammogram date was categorized as 40–44 years, 45–49 years, 50–54 years, 55–59 years, 60–64 years, 65–69 years, 70–74 years, or 75 years or older. Breast density was categorized by BI-RADS density category as follows: almost entirely fat, scattered fibroglandular tissue, heterogeneously dense, and extremely dense. Time since previous mammography was categorized as no previous mammogram, 9–15 months, 16–20 months, 21–27 months, or 28 months or longer. Models included facility-level random effects to account for correlation of outcomes within facility and varying baseline performance within facilities independent of CAD and patient factors.
We obtained similar results with generalized estimation equations, conditional logistic regression (conditioning outcomes on facility), and after restricting the random-effects analyses to the 78.6% of mammograms that were interpreted by radiologists who interpreted mammograms at facilities before and after CAD implementation (or from 1998–2002 to 2003–2006 at non-CAD facilities). We repeated the analyses for cancer detection outcomes after excluding mammograms performed more than 12 months after CAD introduction. We performed this sensitivity analysis to address concerns that an initial increase in cancer detection rates with CAD (eg, during the first year of CAD use) may be obscured in analyses including many years of post-implementation data (34).
We assessed the potential impact of outliers by creating scatterplots and computing extreme studentized deviates of pre- to post-CAD changes in specificity and sensitivity within facilities (35), which suggested that one or both measures were outlying within four facilities. When we repeated principal regression models of specificity, sensitivity, and PPV1 after excluding mammograms from these four facilities, the CAD-associated odds ratio (OR) for specificity was slightly attenuated, but CAD-associated odds ratios for sensitivity and PPV1 were essentially unchanged. We therefore report results from analyses including mammograms from all facilities.
We conducted a secondary post hoc analysis to assess whether initial performance changes with CAD subsequently moderated over time and whether performance impacts differed across facilities. Across all facilities that implemented CAD, we computed unadjusted specificity, sensitivity, and PPV1 across three time periods encompassing the transition to CAD (pre-CAD, 3–12 months post-CAD, and >12 months post-CAD). We made similar comparisons within the subset of facilities that exhibited statistically significant declines in specificity after CAD implementation in a previous report (n = 6) (10) and the remaining CAD facilities that are new to this report (n = 19). Analyses were conducted with SAS, Version 9.2 (SAS Institute, Cary, NC). Statistical tests were two-sided and P less than .05 was considered significant.
From January 1, 1998, to December 31, 2006, 684 956 women received more than 1.6 million screening mammograms at 90 BSCS facilities, of which 25 (27.8%) facilities adopted CAD and used it for an average of 27.5 study months. The mammograms were interpreted by 793 radiologists, including 154 (19.4%) at facilities with CAD. Of the 684 956 women included in the study, 7722 (1.1%) were diagnosed with breast cancer within 1 year of screening, including 1013 women who were diagnosed with breast cancer within 1 year of screening at facilities with CAD (13.1% of all patients). The proportion of women who used HT declined substantially across the study period in all facilities (Table 1).
At facilities that never implemented CAD, specificity increased slightly but statistically significantly from 91.1% between 1998 and 2002 to 91.3% between 2003 and 2006 (difference = 0.2%, 95% confidence interval [CI] of the difference = 0.04% to 0.28%, P = .008) (Table 2). The recall rate correspondingly decreased statistically significantly from 9.3% between 1998 and 2002 to 9.1% between 2003 and 2006 (difference = −0.2%, 95% CI of the difference = −0.30% to −0.05%, P = .005). In contrast, at facilities that implemented CAD, specificity decreased statistically significantly from 91.9% to 91.4% after CAD implementation (difference = −0.5%, 95% CI of the difference = −0.67% to −0.39, P < .001), whereas the recall rate increased from 8.4% to 8.9% (difference = 0.5%, 95% CI of the difference = 0.36% to 0.64%, P < .001). From the 1998–2002 to 2003–2006 time periods, sensitivity increased statistically significantly from 80.7% to 84.0% at facilities that did not implement CAD (difference = 3.3%, 95% CI of the difference = 1.1% to 5.6%, P = .005). However, at facilities that implemented CAD, sensitivity did not statistically significantly change from 79.7% before CAD to 81.1% after CAD implementation (difference = 1.4%, 95% CI of the difference = −1.6% to 4.5%, P = .35). At facilities that did not implement CAD, PPV1 changed little across the time periods (difference = −0.1%, 95% CI of the difference = −0.4% to 0.2%, P = .63). However, PPV1 statistically significantly decreased from 4.3% to 3.6% after the transition to CAD use at facilities that implemented CAD (difference = −0.7%, 95% CI of the difference = −1.0% to −0.3%, P < .001). The number of biopsies recommended per 1000 mammograms declined statistically significantly both at facilities that never implemented CAD (difference = −1.3 biopsy recommendations per 1000 mammograms, 95% CI of the difference = −1.8 to −0.8 biopsy recommendations per 1000 mammograms, P < .001) and at facilities that implemented CAD (difference = −2.2 biopsy recommendations per 1000 mammograms, 95% CI of the difference = −2.7 to −1.7 biopsy recommendations per 1000 mammograms, P < .001).
Between 1998–2002 and 2003–2006, the breast cancer detection rate at facilities that did not implement CAD was stable (4.2 cancers detected per 1000 mammograms in 1998–2002 vs 4.0 cancers detected per 1000 mammograms in 2003–2006), as were the detection rates of invasive breast cancer (3.3 invasive cancers detected per 1000 mammograms in 1998–2002 vs 3.2 invasive cancers detected per 1000 mammograms in 2003–2006) and DCIS (0.9 DCIS detected per 1000 mammograms in 1998–2002 vs 0.9 DCIS detected per 1000 mammograms in 2003–2006) (Table 2). Among invasive cancers diagnosed in women screened at facilities that never implemented CAD, stage distribution, tumor size, and lymph node status were similar among cancers diagnosed in 1998–2002 and 2003–2006. Among all breast cancers diagnosed in women screened at these facilities, the proportion of advanced cancers was similar in both time periods.
At facilities that implemented CAD, the overall cancer detection rate declined statistically significantly from before CAD implementation to after CAD implementation (difference = −0.4 cancers detected per 1000 mammograms, 95% CI of the difference = −0.7 to −0.1 cancers detected per 1000 mammograms, P = .01) (Table 2). The decline in the overall cancer detection rate was attributed mainly to a statistically significant decline in the detection rate of invasive breast cancer, whereas the detection rate of DCIS remained stable. Among invasive cancers, tumor size and lymph node status were similar before and after CAD implementation, as were the proportions of all cancers that were advanced (including both DCIS and invasive cancer). Although the stage distributions of invasive cancers were generally similar before and after CAD implementation, an overall test found a statistically significantly less favorable stage distribution after CAD implementation (P = .036), with a greater proportion of stage III invasive cancers and smaller proportions of stage I and II cancers compared with the period before CAD implementation.
After adjusting for BCSC registry, patient characteristics, HT use, and interpretation year, CAD use was associated with statistically significantly lower specificity (OR = 0.87, 95% CI = 0.85 to 0.89, P < .001), a non-statistically significant increase in sensitivity (OR = 1.06, 95% CI = 0.84 to 1.33, P = .62), and statistically significantly lower PPV1 (OR = 0.89, 95% CI = 0.80 to 0.99, P = .03) (Figure 1). When the analysis of sensitivity was restricted to invasive cancers only, CAD use was no longer associated with increased sensitivity (OR = 0.96, 95% CI = 0.75 to 1.24, P = .77). In contrast, when the analysis was restricted to DCIS, CAD use was associated with greater sensitivity (OR = 1.55, 95% CI = 0.83 to 2.91, P = .17), although the association was non-statistically significant.
Furthermore, after adjusting for BCSC registry, patient characteristics, HT use, and interpretation year, CAD use was not associated with statistically significant differences in the odds of biopsy recommendation (OR = 0.99, 95% CI = 0.93 to 1.05, P = .66), overall breast cancer detection (OR = 1.01, 95% CI = 0.92 to 1.12, P = .79), the detection of invasive breast cancer (OR = 0.97, 95% CI = 0.86 to 1.08, P = .54), the diagnosis of stage I invasive cancer compared with later stage invasive cancer (OR = 0.90, 95% CI = 0.73 to 1.11, P = .34), or the diagnosis of invasive tumors of 15 mm or less in size compared with greater than 15 mm (OR = 0.92, 95% CI = 0.74 to 1.15, P = .46) (Figure 1). Also, CAD use was not associated with statistically significant differences in the diagnosis of invasive tumors without vs with nodal involvement (OR = 1.05, 95% CI = 0.84 to 1.30, P = .68), or the diagnosis of advanced vs non-advanced breast cancer (OR = 0.96, 95% CI = 0.80 to 1.15, P = .67) (Figure 1). However, CAD was associated with a non-statistically significant increase in the rate of detection of DCIS (OR = 1.16, 95% CI = 0.95 to 1.42; P = .14). In an analysis that excluded mammograms performed more than 12 months after CAD implementation, there remained no statistically significant association between CAD and the detection of any breast cancer (OR = 1.06, 95% CI = 0.93 to 1.22, P = .36).
In post hoc analyses assessing variability in CAD impacts over time and across the 25 facilities that implemented CAD, specificity declined from 91.9% to 91.5% from pre-CAD compared with 3–12 months post-CAD (difference = −0.4%, 95% CI of the difference = −0.7 to −0.3%, P < .001) and remained similarly decreased more than 12 months after CAD implementation (Table 3). Although sensitivity increased from 79.7% pre-CAD to 83.1% during the 3–12 months after CAD implementation, the change was non-statistically significant (difference = 3.4%, 95% CI of the difference = −0.9% to 7.8%, P = .14), and sensitivity after 12 months of CAD use was similar to the pre-CAD period. PPV1 declined from 4.3% in the pre-CAD period to 3.7% during the period of 3–12 months post-CAD implementation (difference = −0.6%, 95% CI of the difference = −1.1% to −0.1%, P = .02) and remained similarly decreased after 12 months of CAD use.
The interpretive impact of CAD, however, varied across facilities. Of seven CAD facilities included in a previous report (10), six were included in the current study, whereas one was excluded because it averaged no more than 100 mammograms per month. Among these six facilities (Group 1), specificity decreased from 90.3% pre-CAD to 87.5% during the 3–12 months post-CAD (difference = −2.8%, 95% CI of the difference = −3.2% to −2.3%, P < .001) before increasing to 88.8% after 12 months of CAD use (difference vs 3–12 months post-CAD = 1.3%, 95% CI of the difference = 0.8% to 1.8%, P < .001) (Table 3). At these six facilities, sensitivity increased statistically significantly from 79.8% pre-CAD to 88.9% during the 3–12 months after CAD implementation (difference = 9.1%, 95% CI of the difference = 2.2% to 16.0%, P = .02) but then decreased to 81.6% after 12 months of CAD use (difference vs pre-CAD period = 1.8%, 95% CI of the difference = −4.3% to 7.8%, P = .57). PPV1 decreased at these facilities from 4.1% pre-CAD to 3.6% during the 3–12 months post-CAD (difference = −0.5%, 95% CI of the difference = −1.3% to 0.4%, P = .30) but decreased further to 3.0% after 12 months of CAD implementation (difference vs 3–12 months = −0.6%, 95% CI of the difference = −1.4% to 0.2%, P = .11). Compared with the pre-CAD period, PPV1 was statistically significantly lower after 12 months of CAD use at these facilities (difference = −1.1%, 95% CI of the difference −1.7% to −0.5%, P < .001).
In contrast, among 19 other facilities that implemented CAD (Group 2), specificity, sensitivity, and PPV1 changed little from the pre- to the post-CAD period. Indeed, unadjusted specificity increased slightly from 92.3% pre-CAD to 92.8% during the 3–12 months after CAD implementation (difference = 0.5%, 95% CI of the difference = 0.2% to 0.7%, P < .001) and remained similarly increased (92.9%) after 12 months of CAD use (difference vs pre-CAD period = 0.6%, 95% CI of the difference = 0.4% to 0.8%, P < .001).
Within a large geographically diverse sample of mammography facilities, CAD was associated with statistically significant decreases in specificity and PPV1, a non-statistically significant increase in sensitivity, but no statistically significant improvements in either cancer detection rates or the prognostic characteristics of incident breast cancers. The non-statistically significant increase in sensitivity was attributed to greater detection of DCIS with CAD rather than increased detection of invasive breast cancer.
This study offers important new insights regarding the effectiveness of CAD when used in real-world practice. First, the results suggest a limited impact of CAD on breast cancer detection, particularly with respect to invasive breast cancer detection. In adjusted analyses, CAD was not associated with improved detection of invasive breast cancer, increased early stage diagnosis, or smaller-sized invasive breast cancers. These findings raise concerns that CAD, as currently implemented in clinical practice, may have little or no impact on breast cancer mortality, which may depend on earlier detection of invasive breast cancer (6,7). Second, this study of CAD adoption within 25 BCSC facilities indicates that CAD has a modest impact in typical practice compared with a previous analysis within seven BCSC facilities from 1998 to 2002 (10). The current findings are consistent with a meta-analysis that suggests a modest increase in recall rates with CAD with little or no impact on cancer detection rates (17).
Nishikawa and Pesce (33) argue that studies with matched designs are the most accurate means of assessing CAD clinical impacts (ie, in which outcomes are compared before and after CAD application on the same mammogram) (34). However, matched studies typically impose the restriction that radiologists can only upgrade final BI-RADS assessments after viewing CAD output, resulting in the recall of women who would not have been recalled in the absence of CAD (11,13,36,37). In our view, this design assesses the efficacy of CAD, or its clinical impact when used under optimal conditions. In contrast, this study assesses the effectiveness of CAD use in actual everyday practice conditions, in which radiologists with variable experience and expertise may use CAD in a nonstandardized idiosyncratic fashion (5). Some community radiologists, for example, may decide not to recall women because of the absence of CAD marks on otherwise suspicious lesions.
This large-scale population-based observational study also enabled assessment of the impact of CAD on important breast cancer outcomes such as DCIS detection and the stage distribution of invasive cancers. These outcomes may be impossible to assess with adequate statistical power within matched studies or even randomized trials (38). Although our analyses may lack sufficient power to exclude a small benefit of CAD in terms of invasive breast cancer detection, the principal contribution of CAD may be increased detection for DCIS—a precancerous lesion with an ill-defined long-term prognosis (39). Point estimates and 95% confidence intervals from this study may be useful in statistical models of long-term breast cancer outcomes among women screened with and without CAD (40). Such models could quantify the potential for CAD to improve breast cancer mortality, possible overdiagnosis of DCIS, patient preferences for earlier treatment of DCIS vs later treatment of invasive cancer, the harms of additional false-positive mammograms, and societal costs (including both supplemental fees for CAD use and the costs of added diagnostic testing).
We performed post hoc analyses to explore potential heterogeneity across facilities and over time. Because the analyses were conducted post hoc, these results should be interpreted cautiously, particularly the resultant confidence intervals and P values. These analyses suggest that substantial initial changes in specificity and sensitivity after CAD implementation subsequently attenuated within six BCSC facilities that were included in a previous report (10). The performance impact of CAD at these facilities may have diminished as radiologists gained experience with CAD. Similar adjustment effects, however, were not apparent within 19 other CAD facilities that are reported on here for the first time. It is possible that more recent versions of CAD software have a less potent influence on interpretation, although a recent comparative study suggests that earlier compared with later versions of CAD software do not differ greatly in performance (41). It is also possible that training in CAD use may vary across radiologists and facilities, leading to variable interpretive impacts of CAD. In addition, overall performance changes associated with CAD may be attributed to substantive impacts within some facilities, whereas CAD may have little interpretive impact when implemented in most community facilities. Variability in the impact of CAD on recall rates was also observed in a meta-analysis (17) and across three sites in a trial comparing mammogram interpretation by a single radiologist with CAD versus interpretation by two radiologists (double reading) (42).
Although prior analysis of BCSC data indicated increased biopsy risk with CAD (10), this analysis reveals a decline in the rate of biopsy recommendation over time regardless of CAD use. In the Women’s Health Initiative, women randomly assigned to HT had nearly two-thirds greater cumulative breast biopsy risk (29), and reduced HT use after publication of the Women’s Health Initiative results in July 2002 may largely explain the observed reduction in biopsy recommendations (32). Biopsy recommendations declined despite reduced specificity within facilities that implemented CAD, suggesting that most CAD-induced recalls were resolved without biopsy.
A limitation of this study is the absence of digital mammography data. Whereas CAD algorithms perform a similar alerting function in the film-screen and digital environments, film-screen mammograms must be digitized before CAD analysis, and digitization may introduce noise and adversely affect performance. However, small retrospective studies suggest that the performance impacts of CAD are similar when used in digital (43–46) and film-screen environments (8,11,19).
Because prior research suggests that facilities apply CAD on nearly all mammograms after implementation (10), these analyses assumed that all mammograms were interpreted with CAD after implementation—another limitation of this study. To the extent that facilities did not use CAD on all mammograms, results may be biased toward the null. As the analyses account for salient patient factors, unmeasured radiologist or facility characteristics may affect results. Results were similar, however, in analyses in which CAD effects were conditional on the basis of the facility, which would control to some extent for potentially confounding facility factors. Similarly, results were unchanged when analyses were restricted to mammograms interpreted by radiologists who were present at facilities before and after CAD implementation, therefore controlling to some extent for changes in radiologist staffing over time. Although the number of women with breast cancers diagnosed after CAD implementation (>1000 cancers) is greater than that observed in previous samples, larger samples may be needed to detect small increases in sensitivity or cancer detection with CAD. Finally, another limitation of this study is the lack of data on the CAD products that each facility used, so the potentially distinct impacts of different products could not be investigated.
Among a large sample of US mammography facilities, CAD was associated with statistically significantly decreased specificity and PPV1. CAD was not associated with improved sensitivity for invasive breast cancer, increased rates of breast cancer detection, or more favorable stage or size of invasive breast cancers. CAD is now applied to the large majority of screening mammograms in the United States with annual direct Medicare costs exceeding $30 million (2). As currently implemented in US practice, CAD appears to increase a woman’s risk of being recalled for further testing after screening mammography while yielding equivocal health benefits.
This work was supported by the National Cancer Institute–funded Breast Cancer Surveillance Consortium co-operative agreement (grants U01CA63740, U01CA86076, U01CA86082, U01CA63736, U01CA70013, U01CA69976, U01CA63731, and U01CA70040), the National Cancer Institute (grant CA104699 to J.G.E.), and the American Cancer Society (grant MRSGT-05-214-01-CPPB to J.J.F.). The collection of cancer data used in this study was supported in part by several state public health departments and cancer registries throughout the U.S. For a full description of these sources, please see: http://breastscreening.cancer.gov/work/acknowledgement.html.
The funders did not have any involvement in the design of the study; the collection, analysis, and interpretation of the data; the writing of the article; or the decision to submit the article for publication. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health. We thank the participating women, mammography facilities, and radiologists for the data they have provided for this study. A list of the BCSC investigators and procedures for requesting BCSC data for research purposes are provided at: http://breastscreening.cancer.gov/.
SHT was an employee of the National Cancer Institute (NCI) during the writing of this article but any opinions are his alone and not reflective of the opinions of the NCI or the US Federal Government. CD holds stock options in and serves as a medical consultant for Hologic, Inc (a manufacturer of CAD equipment), which had no role in data collection, analysis, or interpretation.