Our study revealed a number of important differences and similarities in the performance and effectiveness of screening mammography in Vermont and Norway. Opportunistic screening in Vermont was associated with a considerably higher recall rate and a lower screen detection rate than the organized screening program in Norway. Analyses based on woman-years of follow-up revealed statistically significantly higher interval cancer rates and overall detection rates (interval plus screen detection) in Vermont than in Norway for invasive cancer, DCIS, and total cancer. However, tumor size and lymph node involvement were more favorable for invasive interval cancers in Vermont than for those in Norway. Despite these differences, Vermont and Norway were similar with respect to the prognostic characteristics of all invasive cancers (screen-detected and interval cancers) diagnosed in women who had undergone screening during the study period.
The recall rate in Vermont was nearly four times that in Norway, regardless of the screening interval examined. The higher recall rate in Vermont could have been driven by radiologists' concerns about malpractice lawsuits (8). However, Elmore et al. (20) found no association between radiologists' medical malpractice experience or concerns and their recall rates in three regions of the United States (Washington, Colorado, and New Hampshire). The lower recall rates in Norway and other European countries may instead reflect a recall standard set by the screening programs, together with regular monitoring of the recall rate to ensure compliance (4). Contrary to expectation, the lower recall rate in Norway was not associated with either a lower screen detection rate or a higher interval cancer rate than in Vermont.
Screening performance measures are influenced by several factors. The screen detection and interval cancer rates are interrelated; both depend on the frequency of screening, the cancer incidence, and the accuracy of the mammographic assessment. The lower rate of screen-detected cancers per 1000 screens in Vermont than in Norway was expected, because most Vermont women were screened twice as often as Norwegian women and because longer intervals between examinations generally increase the detection rate per screen by improving sensitivity (18). Results of our logistic regression analysis, which controlled for screening interval, suggest that the difference in screen detection rates between Vermont and Norway was due primarily to the difference in screening interval. This analysis assumed that the relationship between screening interval and detection rate was the same in Vermont and Norway. Although the results for the 2-year and greater-than-2-year screening intervals are consistent with this assumption, we were unable to test it fully because there were no 1-year screens in Norway.
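Why a lower per-screen detection rate is expected with annual screening can be made explicit by comparing the two denominators used in this section. The following is a simplified sketch using standard rate definitions; the symbols, and the assumption that each screen is followed by roughly $\Delta$ years of follow-up, are illustrative and are not taken from the study's methods.

```latex
\begin{aligned}
\mathrm{SDR}_{\text{screens}} &= \frac{C_{\text{screen}}}{N_{\text{screens}}} \times 1000
  && \text{(screen-detected cancers per 1000 screens)} \\
R_{\text{wy}} &= \frac{C}{W} \times 1000
  && \text{(cancers per 1000 woman-years of follow-up)} \\
W &\approx \Delta \, N_{\text{screens}}
  && \text{(assumed: each screen followed by about } \Delta \text{ years)} \\
\mathrm{SDR}_{\text{wy}} &= \frac{C_{\text{screen}}}{W} \times 1000
  \approx \frac{\mathrm{SDR}_{\text{screens}}}{\Delta}
  && \text{(screen-detected cancers per 1000 woman-years)}
\end{aligned}
```

Under this approximation, an annual program ($\Delta = 1$) and a biennial program ($\Delta = 2$) have equal screen-detected cancer rates per 1000 woman-years when the biennial program's per-screen rate is twice the annual program's, so a lower per-screen rate with annual screening does not by itself indicate poorer performance.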
Other factors may also have contributed to the differences in recall and screen detection rates that we observed between Vermont and Norway. First, all screening mammograms in Norway are independently double read, and discrepancies are resolved by arbitration. By contrast, during the study period, most mammograms in Vermont were single read, although some were supplemented by computer-aided detection and by double reading, with or without arbitration. Independent double reading has previously been associated with a higher screen-detected cancer rate and, depending on the recall policy, a lower recall rate (21–24). Despite this evidence, health insurance companies in Vermont and Medicare do not reimburse for double reading, yet they do reimburse for computer-aided detection, even though the evidence that computer-aided detection improves mammography accuracy is equivocal (25,26). A second factor that could have affected screening accuracy in our study is the experience of the radiologists interpreting the mammograms (27–29). All radiologists in the NBCSP are mammography specialists and are required to read at least 5000 screening mammograms each year (4,17). By contrast, most radiologists in Vermont are generalists who interpret all types of radiological images, and very few read as many as 5000 screening mammograms per year (15,21). At present, there is no consensus about how radiologist volume, experience, and training influence the accuracy of screening mammography (26–28). Although some studies (4,27,30) suggest that specialists are more accurate mammography readers than generalists, the term “specialist” was not clearly defined in those studies.
The factors described above that may have lowered the rate of screen-detected cancer in Vermont relative to Norway may also have contributed to Vermont's higher interval cancer rate. However, their effects should be offset, to a large extent, by more frequent screening: with a shorter screening interval, cancers that are not detectable at screening have less time to become clinically detected before the next screen. It was therefore surprising that women in Vermont had a higher interval cancer rate per 1000 woman-years of follow-up than women in Norway, as well as a higher probability of interval cancer regardless of the time since screening. These findings suggest that Vermont women and/or their health care providers may pursue evaluation of symptoms and clinical findings more readily than their Norwegian counterparts. The predetermined 24-month screening interval and scheduled examinations in Norway may make women more likely to wait for their next personal invitation, even if they have symptoms. This possibility is supported by our finding that interval cancers diagnosed in Vermont were smaller and less likely to have lymph node involvement than those diagnosed in Norway.
Although the more favorable tumor characteristics observed in Vermont benefit the women diagnosed with interval cancer, they may not have a sizable impact on the overall effectiveness of screening in terms of mortality reduction, because interval cancers account for only 20%–30% of the breast cancers detected in screened women (31,32). Most interval cancers in Norway were diagnosed during the second year after screening, which suggests that a shorter screening interval might lead to earlier detection and, thus, more prognostically favorable tumor characteristics. However, a previous study from Norway that examined the characteristics of invasive interval cancers by time since the last mammogram found that, although tumor size increased with time since the last screen, there were no substantial differences in other tumor characteristics, such as tumor grade, lymph node involvement, or estrogen or progesterone receptor status (33).
The tumor characteristics of all invasive cancers (screen-detected plus interval cancers) did not differ between Vermont and Norway, even though Norwegian women had a longer screening interval and invasive interval cancers with less favorable tumor characteristics than Vermont women. This finding is consistent with a previous study by White et al. (34), which found no additional risk of late-stage breast cancer among US women aged 50 years or older who were screened biennially rather than annually, and is further supported by a study by Wai et al. (35), which showed no difference in 5-year survival among women aged 50–74 years who underwent annual vs biennial screening mammography.
All DCIS rates (screen detection, interval cancer, and overall) computed per 1000 woman-years were lower in Norway than in Vermont. Lower detection of DCIS with longer screening intervals has been reported previously (35), as has a higher proportion of DCIS among young women (36). One possible explanation for the higher detection of DCIS in Vermont is that, since 2001, approximately one-third of screening mammograms in Vermont were performed with digital imaging, which provides improved image contrast that may increase the detection of cancers, particularly DCIS, in dense breasts (37). Only about 8% of the Norwegian screens included in this study were digital mammograms. The higher rate of DCIS in Vermont may also reflect the use of computer-aided detection, which can increase the detection of DCIS (25); computer-aided detection has not been implemented in the Norwegian screening program. The somewhat higher proportion of biopsies among women screened in the United States than among women screened in Norway (3,38) may also have contributed to the higher proportion of DCIS in Vermont. The detection of DCIS, a preinvasive lesion, is controversial: some believe that low-grade DCIS lesions are being overdiagnosed and, consequently, that women are being treated for disease that is not life threatening or clinically relevant (39,40). However, some DCIS progresses to invasive cancer (40,41). Without knowing which cases of DCIS are likely to progress, it is impossible to determine whether the higher rate of DCIS in Vermont was a beneficial or an adverse outcome for the women who were screened.
Our study has several limitations. First, although great attention was paid to creating comparable variables, some results may have been influenced by subtle differences in the Vermont and Norwegian data definitions and collection procedures. However, our results for both Vermont and Norway were similar to those of other studies within each respective country or continent (3,8,35,38), which suggests that the variables measured what they were designed to measure. Second, not all variables that influence mammography accuracy were collected in both countries. For example, mammographic breast density was collected only in Vermont, so we were unable to adjust for this possible confounder. However, we have no reason to believe that the distribution of breast density differs between the two populations. Finally, the absence of 1-year screens in Norway limited our ability to fully distinguish the effects of screening interval from other potential differences in mammography performance between Vermont and Norway.
In conclusion, screening in Vermont and Norway yielded comparable overall results. However, what works in one country may not work in the other. For example, it is unclear how the effectiveness of biennial opportunistic screening in Vermont would compare with that of biennial organized screening in the Norwegian program. Implementing biennial screening mammography in Vermont without a corresponding reduction in the interval cancer rate could have a negative impact on the prognosis of future interval cancers. Adopting biennial screening in the United States would reduce the number of mammograms performed, which might give radiologists time to perform independent double reading and could help offset its financial cost. Independent double reading with consensus probably accounts for the lower interval cancer rate and lower recall rate in Norway.
Our results demonstrate that despite its longer screening interval, the organized population-based screening program in Norway achieved outcomes similar to those of opportunistic screening in Vermont. Norwegian women underwent half as many screening mammograms as Vermont women, and the recall rate in Norway was statistically significantly lower than that in Vermont, yet the tumor characteristics of all invasive cancers diagnosed in screened Norwegian women did not differ statistically significantly from those of cancers diagnosed in screened Vermont women. Although more frequent screening in Norway might lead to interval cancers with more prognostically favorable tumor characteristics, it is unclear whether a shorter screening interval would decrease breast cancer mortality among screened Norwegian women.