Estimating the cumulative risk of at least one FP screening test after repeated rounds of screening is important for understanding the potential harms associated with a screening program. However, estimating this risk is challenging because typically not all individuals will receive all recommended screening rounds within the study period, and some may drop out of the screening program altogether. An additional complication arises if the FP risk differs depending on the number of screening rounds attended. We reviewed existing statistical methods that fall under two general frameworks: conditional approaches, which estimate risk for the subgroup of subjects who choose to attend a specified number of screening tests, and marginal approaches, which marginalize over the number of screening tests subjects choose to attend. The conditional approaches rely on the assumption that the probability of a FP result at each round of screening is independent of the total number of screening rounds attended. By contrast, the marginal approach allows for variation in the FP risk as a function of the number of screening rounds attended but relies on the assumption that the risk of a first FP result at each screening round after censoring is constant.
We used 13 years of data on screening mammograms that the BCSC collected on over 150,000 women to illustrate available statistical approaches for estimating the cumulative FP risk and to evaluate the appropriateness of modeling assumptions. We found evidence that assumptions of both approaches do not hold for the BCSC population. Specifically, at almost every screening round, women who returned for subsequent screening mammograms had lower FP rates than did those who did not return. These differences were not mitigated by adjusting for baseline age, average interval between screening exams, year of first exam, and registry site. Assumptions of the marginal model are untestable because they refer to the FP risk among women who are censored, which is by definition unobserved. However, strong trends in the probability of a first FP across screening rounds among women who were not censored suggest that the assumption of constant risk after censoring is unrealistic in this context. Thus, the estimated cumulative probability of a FP result after 10 screening rounds of 58.2% based on the conditional model likely underestimates the true risk. By comparison, the estimate of 77.0% based on the marginal approach likely overestimates the risk. We proposed an extension to Xu and colleagues' approach that allowed for variation in the FP probability associated with the total number of screening rounds attended and variation in the FP probability across screening rounds. Based on our extended model, we estimated the cumulative risk of a FP test after 10 screening exams to be 63.3%.
Our estimates of cumulative risk are higher than those reported in previous studies. For instance, using data from Harvard Pilgrim Health Care in Boston, MA and a conditional cumulative risk model, Elmore et al.[6] estimated that 49.1% of women would experience at least one FP by their tenth screening mammogram. However, this estimate does not account for such possible confounders as the sample's higher-than-normal rate of family history, irregular screening intervals, and presence or absence of prior comparison films; nor does it address variability across different risk groups or among radiologists. Also, in this study population, the observed FP rate at a single exam across all rounds of screening was only 6.3%, notably lower than that found by the BCSC and other U.S. studies and populations (see e.g. Rosenberg et al.[19
Our estimates are also notably higher than those for European screening programs. The FP risk in the triennial NHS Breast Screening Programme conducted in the U.K. is 7.8% at the first screening mammogram and 2.8% at subsequent screening mammograms[23]. A woman attending all screening rounds in this program would participate in 7 rounds of screening over 20 years and would experience a cumulative FP risk of 22.2%, assuming independence of the FP risk and duration of screening. In a study of the Norwegian screening program[24], the FP rate after 10 biennial screening mammograms over 20 years was estimated to be 20.8%, projected from 3 screening rounds. The FP rate after 10 biennial screening mammograms over 20 years in a Spanish screening program[25] was estimated to be 32.4%, projected from 4 screening rounds. Increased risk was found to be associated with previous benign breast disease, perimenopausal status, high body mass index, and younger age. In a study of the Danish screening program[26], the FP rate after 10 biennial screening mammograms over 20 years was estimated to be 15.8-21.5% for Copenhagen and 8.1-9.6% for Fyn, projected from 3-5 screening rounds and assuming independence between exams.
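The 22.2% figure quoted above for the NHS programme can be reproduced with a short calculation. The following is a minimal sketch, assuming that FP results at successive rounds are independent and that the per-round risks are exactly the published 7.8% and 2.8%; the function and variable names are ours and purely illustrative, not part of any cited analysis.

```python
def cumulative_fp_risk(per_round_risks):
    """Probability of at least one FP across rounds, assuming independence.

    per_round_risks: iterable of per-round FP probabilities (values in [0, 1]).
    """
    prob_no_fp = 1.0
    for p in per_round_risks:
        # Multiply in the chance of avoiding a FP at this round.
        prob_no_fp *= 1.0 - p
    return 1.0 - prob_no_fp


# NHS figures quoted in the text: 7.8% at the first screen,
# 2.8% at each of the 6 subsequent screens (7 rounds over 20 years).
nhs_risks = [0.078] + [0.028] * 6
nhs_cumulative = cumulative_fp_risk(nhs_risks)
print(f"{nhs_cumulative:.1%}")  # 22.2%
```

The same function accepts any sequence of round-specific risks, so it can also illustrate how a higher risk at the first screen contributes to the cumulative total. It does not address censoring or dependence between the FP risk and the number of rounds attended, which are the central difficulties discussed in this paper.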
The European studies are not directly comparable to performance in the context of American clinical practice because screening practice differs markedly between Europe and the United States[27]. Specifically, European screening programs typically have biennial screening with a much greater volume of screening mammograms interpreted per radiologist and typically screening mammograms are double-read, resulting in markedly lower callback rates than in U.S. practices. The lower callback rate in these studies results in lower cumulative risk of a FP result. The results from Europe are also based on a relatively small number of rounds of screening (3-5) observed per woman. Estimates of the FP rate over 10 rounds of screening are extrapolated from this course of observation.
In addition to model-based estimates of the cumulative FP risk, empirical estimates in the BCSC cohort are also higher than those previously reported. In a study of women undergoing screening mammography at Massachusetts General Hospital Avon Comprehensive Breast Center, the empirical cumulative FP risk among women receiving 10 screening mammograms within a 10-year period was 29.2%[20]. We contrast this with our empirical cumulative FP risk of 44.5% among women receiving 10 or more screening exams. There are several reasons why we might expect the estimate based on the BCSC data to be higher than that in previous studies. First, our sample excluded women who reported previous screening mammograms prior to the first exam captured by the BCSC. Including women who had undergone previous screening would tend to underestimate the FP risk because the risk is highest at the first mammogram. Additionally, our follow-up period spanned more than 10 years, allowing for longer intervals between screening exams, which are associated with an increased FP risk[21]. Other differences may exist between our study population and that used in previous studies. We believe that the FP risk among BCSC women, a nationally inclusive cross-section of women participating in screening mammography in a community setting, is likely to most closely reflect the FP experience of women in the United States.
To more fully understand the risk of a FP after multiple rounds of screening, additional extensions of existing models are needed. First, it is important to consider how the cumulative risk of a FP depends on baseline and time-varying covariates, and to account for the wide variability that has been observed in radiologist interpretive performance[30]. Our analysis of the BCSC data has not accounted for these sources of variability. In an extension of the work of Elmore et al.[6], Christiansen et al.[8] addressed the role of possible confounders and between-radiologist variability in performance using the Harvard Pilgrim population. The predicted risk of a FP after 9 mammograms varied across radiologists and as a function of woman-level risk factors from 5% to 100%. Between-radiologist variability in performance was found to be very large, with radiologist effects swamping the impact of all other covariates included in the model. Analogous extensions are needed for the marginal model. To more fully understand the FP risk in the BCSC population, we will undertake analyses incorporating woman-level risk factors and between-radiologist variability in future studies. Second, marginal methods for estimating the cumulative probability of a FP result that allow for greater flexibility in patterns of FP probabilities across screening rounds are needed. We have proposed a simple extension of the marginal method that makes assumptions about FP rates among censored women that are likely to be more appropriate in the context of screening mammography. However, more general extensions that would be appropriate to other screening tests are needed.
In this analysis of the BCSC data, we have focused on the FP recall rate for mammography. FP recalls represent the most prevalent harm of mammography. However, the actual impact of a FP recall is much smaller than the impact of other types of FP events such as biopsies. Appropriate evaluation of a screening program should take into account both the probability of a given harm and its cost. The FP recall risk discussed in this paper represents a common though non-invasive cost of mammography. Previous research on the impact of FP mammograms suggests that women receiving a FP recall experience elevated anxiety and distress[5]. While the evidence indicates that FP screening results are stressful, for most women the adverse effects are transitory. Moreover, a survey by Schwartz et al.[34] revealed that women were highly aware of FP results and highly accepting of FPs as a necessary cost of breast cancer screening, although the women surveyed significantly underestimated the likelihood of experiencing a FP finding over a 10-year period. In addition to the FP recall risk, statistical methods discussed in this paper can be used to estimate the risk of other FP events associated with screening tests. In future research we plan to apply the statistical methods discussed here to estimation of other potential harms of mammography such as the FP biopsy risk.
To date, international studies have shown highly variable risk of a FP result for women receiving routine mammography in regular screening programs. The cumulative risk observed in this analysis of women in the BCSC is substantial, and considerably higher than previously projected rates for U.S. women[6]. Despite the high FP risk estimated in the BCSC population, this number should not be used in isolation to question the balance of benefits and harms of mammography screening programs. The estimates of the cumulative risk of a FP for the BCSC women are average estimates for the BCSC population that do not account for starting ages, screening intervals, differential risk, and other factors that may influence the FP rate. While we believe that women should be informed that their risk of one or more FP mammograms is relatively high over a decade or more of regular screening, the estimated rate in our analysis is an overall rate that does not account for individual risk or other influencing factors, and thus is not easily tailored to an individual woman.
Insofar as screening exams are not diagnostic exams, a certain rate of FPs must be anticipated and accepted given the limitations of the current technology and the goal of detecting small breast cancers. Thus, the relative harm of a FP recall must be weighed against both the frequency and benefit of early cancer detection. The possibility should not be overlooked that women may experience greater anticipatory concerns about FP results if experts overly emphasize the harms, or present pros and cons as if they were of equal importance. Moreover, the inconvenience and anxiety associated with a FP mammogram are likely to be highly variable.
Accurate estimation of the FP risk under various common conditions is an important part of program evaluation, goal setting, and identification of strategies that might be used to reduce the FP rate without compromising test sensitivity. FP risk estimates also allow us to best inform women undergoing screening what they should expect during their participation in an early detection program. Finally, as with most performance indicators in screening, an observed or estimated rate is hardly immutable. With targeted interventions the FP rate could be reduced without also reducing sensitivity