Optimal US screening strategies remain controversial. We use six simulation models to evaluate screening outcomes under varying strategies.
The models incorporate common data on incidence, mammography characteristics, and treatment effects. We evaluate varying initiation and cessation ages applied annually or biennially and calculate mammograms, mortality reduction (vs. no screening), false-positives, unnecessary biopsies and over-diagnosis.
The lifetime risk of breast cancer death starting at age 40 is 3% and is reduced by screening. Screening biennially maintains 81% (range 67% to 99%) of annual screening benefits with fewer false-positives. Biennial screening from 50–74 reduces the probability of breast cancer death from 3% to 2.3%. Screening annually from 40 to 84 only lowers mortality an additional one-half of one percent to 1.8% but requires substantially more mammograms and yields more false-positives and over-diagnosed cases.
Decisions about screening strategy depend on preferences for benefits vs. potential harms and resource considerations.
Early randomized trials of mammography demonstrated significant reductions in breast cancer mortality associated with screening from ages 50 to 69 years.1,2 Unfortunately, there were only small numbers of women over age 70 who enrolled in the trials, so results for this age group remain inconclusive. Recently the Age trial clearly demonstrated a small, but significant reduction in breast cancer mortality associated with screening women ages 40 to 49.3,4 All of these clinical trials varied in ages included, screening intervals, and use of invitations to screen vs. actual screening use, making it difficult to synthesize results to make public health recommendations.
In our prior work, we developed models of breast cancer incidence and mortality in the United States. These models are well suited for synthesizing data to estimate the effect of screening under a variety of policies5,6 since they can hold selected conditions (e.g., screening intervals) constant, permitting direct comparison of strategies. Because all models make assumptions about unobservable events, we have collaborated to use several models to provide a range of plausible results.6 Our prior results indicated that biennial screening of average-risk women captured the majority of the mortality benefits of annual screening with approximately half as many mammography resources and false-positive results.7 These data, together with a review of the most current trial results,4 were used by the United States Preventive Services Task Force to inform their most recent guidelines for breast cancer screening. The Task Force recommended that screening occur every other year among average-risk women ages 50 to 74. The upper age limit was new relative to the prior guidelines. For women younger than 50, they suggested that women discuss individual risk and preferences for potential harms with their providers before considering routine screening.8
These recommendations generated considerable controversy9–14 and the issuance of diverging guidelines from professional groups, generally recommending more intensive screening than suggested by the Task Force (e.g., 15). The majority of the controversy centered on the balance of benefits and harms of screening women ages 40 to 49, with little discussion about the upper age limit.
In this paper we present data from our established models comparing screening strategies,7 including new analyses of approaches that correspond to the different guidelines currently promoted in the US. These results are intended to make explicit the expected population outcomes for the strategies adopted.
The models were developed independently within the Cancer Intervention and Surveillance Modeling Network (CISNET) of the National Cancer Institute (NCI).6,7,16 Because no personal health information was included, only population-based de-identified data, the research was considered exempt by the institutional review boards. The models have been described elsewhere.7,17–23 Briefly, the models share common features and inputs but differ in some ways. Four models include ductal carcinoma in situ (DCIS): model E [Erasmus], model G [Georgetown-Einstein], model M [MD Anderson Cancer Center], and model W [Wisconsin-Harvard]. Models E and W specifically assume that some portions of DCIS are non-progressive and do not result in death; model W also assumes that some cases of small invasive cancer are non-progressive. Model S [Stanford] and model D [Dana-Farber] include only invasive cancer. Some groups model breast cancer in stages, but three (models E, S, and W) use tumor size and tumor growth. The models also differ by whether treatment affects the hazard for death from breast cancer (models G, S, and D), results in a cure for some fraction of cases (models E and W), or both (model M). Despite these differences, in previous collaborations7 the models came to similar qualitative estimates of the relative contributions of screening and treatment to observed decreases in deaths from breast cancer.
The models begin with estimates of breast cancer incidence and mortality trends without screening and treatment and then overlay screening use and improvements in survival associated with treatment.7 We use a cohort of women born in 1960 and follow them beginning at age 25 years for their entire lives. Breast cancer is generally depicted as having a preclinical screening-detectable period (sojourn time) and a clinical detection point. On the basis of mammography sensitivity (or thresholds of detection), screening identifies disease in the preclinical screening-detection period and results in the identification of earlier-stage or smaller tumors than might occur via clinical detection, resulting in reduction in breast cancer mortality. Age, estrogen receptor status, and tumor size-or stage-specific treatment have independent effects on mortality. Women can die of breast cancer or of other causes.
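The idea of a preclinical screening-detectable period can be illustrated with a toy calculation. The sketch below is not taken from any of the six models: the function names, the example ages, and the assumption of perfect mammography sensitivity are all ours, chosen only to show how a screening schedule interacts with sojourn time.

```python
from typing import List, Optional


def screening_ages(start: int, stop: int, interval: int) -> List[int]:
    """Ages at which a woman is screened, e.g. 50, 52, ..., 74 for biennial 50-74."""
    return list(range(start, stop + 1, interval))


def age_at_screen_detection(preclinical_onset: float, sojourn_time: float,
                            screens: List[int]) -> Optional[int]:
    """Return the first screening age inside the preclinical detectable window.

    Assumption (hypothetical): sensitivity is 100%, so any screen inside the
    window detects the tumor. The actual models use age-specific sensitivity.
    """
    clinical_age = preclinical_onset + sojourn_time  # age at clinical detection
    for age in screens:
        if preclinical_onset <= age < clinical_age:
            return age
    return None  # tumor surfaces clinically before any screen catches it


biennial = screening_ages(50, 74, 2)
annual = screening_ages(50, 74, 1)

# Illustrative slow-growing tumor: detectable from age 60.5, sojourn time 3 years.
slow_biennial = age_at_screen_detection(60.5, 3.0, biennial)
slow_annual = age_at_screen_detection(60.5, 3.0, annual)

# Illustrative fast-growing tumor: sojourn time 0.6 years, falls between annual screens.
fast_annual = age_at_screen_detection(60.2, 0.6, annual)

print(slow_biennial, slow_annual, fast_annual)
```

In this toy setting the slow-growing tumor is caught under either schedule, one year earlier with annual screening, whereas the fast-growing tumor surfaces clinically between two annual screens.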
We used a common set of age-specific variables for breast cancer incidence, mammography test characteristics, treatment algorithms and effects, and non-breast cancer competing causes of death.7 Each model also included additional model-specific inputs (or intermediate outputs) to represent preclinical detectable times, lead time, dwell time within stages of disease, and stage distribution in unscreened versus screened women on the basis of their specific model structure.7,17–23
We use an age–period–cohort model to estimate what average breast cancer incidence rates would have been without screening.24 This method considers the effect of age, temporal trends in risk by cohort, and time period. Because we do not have data on future incidence of breast cancer, we extrapolated forward assuming that future age-specific incidence increases as women age based on the last observed patterns. To isolate the effect of technical effectiveness of screening and to assess the effect of screening on mortality while holding treatment constant, models assume 100% adherence to screening and indicated treatment.
We used data on age-specific mammography sensitivity (and specificity) as observed in the Breast Cancer Surveillance Consortium (BCSC) program for initial and subsequent mammography performed at either annual or biennial intervals.25
All women who have estrogen receptor-positive invasive tumors receive a hormonal treatment (tamoxifen if age at diagnosis is <50 years and anastrozole if ≥50 years) and non-hormonal treatment with an anthracycline-based regimen. Women with estrogen receptor-negative invasive tumors receive non-hormonal therapy only. Women with DCIS who have estrogen receptor-positive tumors receive hormonal therapy only.26 Treatment effectiveness is based on synthesis of recent clinical trials and is modeled as a proportionate reduction in mortality risk or the proportion cured.27–29
We estimate the cumulative probability of unscreened women dying of breast cancer from age 40 years to death. Screening benefit is then calculated as the percentage of reduction in breast cancer mortality (vs. no screening). Benefits are cumulated over the lifetime of the cohort to capture reductions in breast cancer mortality (or life-years gained) occurring years after the start of screening, after considering non-breast cancer mortality.30
Three different potential screening harms were examined: false-positive mammograms, unnecessary biopsies, and over-diagnosis. False-positive mammograms are the number of mammograms read as abnormal or needing further follow-up in women without cancer divided by the total number of positive screening mammograms based on the specificity reported in the BCSC.25 Unnecessary biopsies are the proportion of women with false-positive screening results who receive a biopsy.31 Over-diagnosis is the proportion of cases in each strategy that would not have clinically surfaced in a woman's lifetime (because of lack of progressive potential or death from competing mortality) among all cases arising from age 40 years onward.
We compared model results for 20 strategies among average-risk women. To rank the screening strategies, we first look at the results of each model independently. For a particular model, a strategy that requires more mammograms (our measure of resource use) but has a lower relative percentage of mortality reduction is considered inefficient or “dominated” by other strategies. We then evaluate strategies on the basis of results from all six models together.
After eliminating all dominated strategies, we represent the remaining strategies as points on a graph plotting the average number of mammograms versus the percentage of mortality reduction. We obtained the efficiency frontier for each graph by identifying the sequence of points that represent the largest incremental gain in percentage of mortality reduction per additional screening mammography. Screening strategies that fall on this frontier are the most efficient (that is, no alternative exists that provides more benefit for fewer mammograms performed).
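The dominance check and frontier construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used by the modeling groups: the function name, the strategy labels A to F, and all numbers are invented placeholders. The second step also drops points lying below the line joining two neighboring frontier points, which is one common way to read "largest incremental gain per additional mammogram".

```python
from typing import List, Tuple

# (label, average mammograms, % mortality reduction); all values hypothetical
Strategy = Tuple[str, float, float]


def efficiency_frontier(strategies: List[Strategy]) -> List[Strategy]:
    """Return the strategies on the efficiency frontier, ordered by mammograms.

    Assumes every strategy uses more than zero mammograms.
    """
    # Sort by resource use; among ties, the higher benefit comes first.
    ordered = sorted(strategies, key=lambda s: (s[1], -s[2]))

    # Step 1: drop dominated strategies (at least as many mammograms, no more benefit).
    undominated: List[Strategy] = []
    best_benefit = float("-inf")
    for s in ordered:
        if s[2] > best_benefit:
            undominated.append(s)
            best_benefit = s[2]

    # Step 2: starting from no screening, repeatedly move to the strategy with
    # the largest gain in mortality reduction per additional mammogram.
    frontier: List[Strategy] = []
    current: Strategy = ("no screening", 0.0, 0.0)
    remaining = undominated
    while remaining:
        nxt = max(remaining,
                  key=lambda s: (s[2] - current[2]) / (s[1] - current[1]))
        frontier.append(nxt)
        current = nxt
        remaining = [s for s in remaining if s[1] > current[1]]
    return frontier


example: List[Strategy] = [
    ("A", 10.0, 20.0),
    ("B", 20.0, 30.0),
    ("C", 20.0, 25.0),  # same mammograms as B, less benefit: dominated
    ("D", 30.0, 32.0),  # below the line from B to E: not on the frontier
    ("E", 40.0, 40.0),
    ("F", 15.0, 18.0),  # more mammograms than A, less benefit: dominated
]

frontier = efficiency_frontier(example)
print([label for label, _, _ in frontier])
```

With these placeholder values the frontier consists of A, B, and E. In the analysis described here the same procedure would be applied to each model's results separately before comparing across models.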
In our previous work, results of each model accurately projected independently estimated trends in the absence of intervention and closely approximated modern stage distributions and observed mortality trends.7,17–19,21–23 Using six models to project a range of plausible screening outcomes provides implicit cross-validation, with the range of results from the models as a measure of uncertainty.
The cumulative probability of dying from breast cancer from age 40 to the end of a woman's life is a median of 3.0% across the models. Thus, if a particular screening strategy leads to a 10% breast cancer mortality reduction, then the probability of breast cancer death would be reduced from 3.0% to 2.7%, or 3 deaths averted per 1000 women screened.
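The conversion from a relative mortality reduction to an absolute one can be written out explicitly. The snippet below simply reproduces the arithmetic in the preceding paragraph; the function name is ours and is introduced only for illustration.

```python
from typing import Tuple


def absolute_effect(baseline_risk: float, relative_reduction: float) -> Tuple[float, float]:
    """Convert a relative mortality reduction into absolute terms.

    Returns (lifetime risk with screening, deaths averted per 1000 women).
    """
    risk_with_screening = baseline_risk * (1.0 - relative_reduction)
    deaths_averted_per_1000 = (baseline_risk - risk_with_screening) * 1000
    return risk_with_screening, deaths_averted_per_1000


# Median baseline risk of 3.0% and a 10% relative mortality reduction, as in the text.
risk, averted = absolute_effect(baseline_risk=0.030, relative_reduction=0.10)
print(f"{risk:.1%} lifetime risk, {averted:.0f} deaths averted per 1000 women")
```

The same function applies to any strategy's percentage mortality reduction, which makes clear why large relative differences between strategies translate into small absolute differences when the baseline risk is 3%.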
The six models produce consistent results on the ranking of the strategies in terms of reduction in breast cancer mortality (Table 1). Eight approaches are “efficient” in all models (that is, not dominated, because they provide additional mortality reductions for added use of mammography); seven of these have a biennial interval, and all but two begin at age 50 years. In all models, biennial screening strategies that start at age 50 years and continue through age 74 or 79 are of fairly similar efficiency. Strategies that include screening until age 84 years provide further improvements in outcome, albeit at a small added increment (Fig. 1).
To examine the effect of screening interval, we calculated for each screening strategy and model the proportion of the annual benefit (in terms of mortality reduction) that could be achieved by biennial screening. Biennial screening maintains an average of 81% (range across strategies and models, 67% to 99%) of the benefits achieved by annual screening and yields only half as many false positives. The proportion of biopsies that occur because of these false-positive results that are retrospectively deemed unnecessary (that is, the woman did not have cancer) is about 7%; therefore, many more women will undergo unnecessary biopsies under annual screening than biennial screening (Table 2).
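The proportion of annual benefit retained by biennial screening is a simple ratio computed per strategy and per model, then summarized as a mean and range. The sketch below shows the calculation only; the three "models" and their mortality reductions are hypothetical placeholders, not results from this study.

```python
from statistics import mean
from typing import Dict, Tuple

# Hypothetical % mortality reductions for one age range: (annual, biennial).
reductions: Dict[str, Tuple[float, float]] = {
    "model 1": (30.0, 24.0),
    "model 2": (25.0, 21.0),
    "model 3": (40.0, 30.0),
}


def retained_share(annual: float, biennial: float) -> float:
    """Fraction of the annual-screening benefit kept under biennial screening."""
    return biennial / annual


shares = {name: retained_share(a, b) for name, (a, b) in reductions.items()}
summary = (mean(shares.values()), min(shares.values()), max(shares.values()))
print(f"mean {summary[0]:.0%}, range {summary[1]:.0%} to {summary[2]:.0%}")
```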
If screening begins at age 40 years (vs. 50 years) and continues to age 79 years (we did not model an ending age of 74 when starting at age 40), all models project additional, albeit small, reductions in breast cancer mortality for both annual and biennial screening (Table 3). However, more false-positive results occur in strategies that include screening from ages 40 to 49 years (Table 2). Continuing screening to age 84 years (vs. 50–74 years) results in a median increase in percentage of mortality reduction of 6.5% (range 4% to 8%) and 5.5% (range 4% to 7%) under annual and biennial intervals, respectively. However, these benefits are accompanied by an accelerating rate of increase in the risk of over-diagnosis in the older age groups, mostly because of growing rates of competing causes of mortality (not shown).
To compare results for exemplar strategies that have been recommended in the US, we compared the incremental mortality reductions of the most intensive screening regimen (40 to 84 annually) to screening every other year from ages 50 to 74. Screening biennially from ages 50 to 74 lowers mortality by 23%. This means that the 3% chance of death without screening is reduced to 2.3% (3.0% minus 23.2% of 3%). Extending screening beyond biennial examinations from 50 to 74 to annual screening from ages 40 to 84 results in an additional 15.5% mortality reduction (Table 4), reducing the 2.3% probability of death further to 1.8%. In other words, most women do not die of breast cancer and the lifetime risk of death is only reduced by one half of one percent with more intensive screening.
Results for ranking of strategies and all conclusions were similar to the base analyses under different assumptions about test sensitivity (e.g., 10% increase in sensitivity).
This collaborative modeling project demonstrates that the choice of optimal breast cancer screening strategies is complex. All six modeling groups concluded that the most efficient screening strategies are those that include a biennial screening interval. Initiation of screening at age 40 provides small added benefits but is accompanied by a large increase in the number of screening examinations and a high false-positive rate. Extending screening beyond age 74 yields moderate mortality reductions and lower false-positive rates but at the expense of some women being over-diagnosed. The absolute difference in lifetime probability of death that would be expected under the Task Force recommendation for biennial screening from ages 50 to 74 compared to annual screening from ages 40 to 84 is very small (0.5%).
Screening intervals are somewhat arbitrary. Screening every one month or every six months would detect the greatest number of cancers, but would be infeasible in terms of time, mammography resources and the weight of false positive exams. The finding in this study that biennial screening is more efficient than annual screening is consistent with previous screening trials, most of which used 2-year intervals.1,2 The efficiency of biennial screening is largely due to the biology of breast cancer and the specificity of mammography. Slow growing tumors are much more common than rapidly growing tumors, and the ratio of slow to fast growing tumors increases with age,32 so that little survival benefit is lost between screening every year versus every other year. For the small sub-set of younger women with aggressive, faster-growing tumors, even annual screening is not likely to confer a survival advantage. Since the specificity of mammography is less than 100%, the less often screening occurs, the lower the number of false positive results and unnecessary biopsies.
In all models, some reductions in breast cancer mortality, albeit small, were seen with strategies initiating screening at age 40 versus age 50. This is consistent with the recent Age trial in the United Kingdom.3,4 The small magnitude of benefit is attributable to the low incidence of disease from age 40 to 49 and the low sensitivity of mammography in this age group. The same factors that lead to the small benefits in the younger age group also contribute to the harms of screening – high rates of false positive screens and unnecessary biopsies. In addition, since the proportion of DCIS is highest in younger women, screen detection of DCIS that may not be clinically significant could be considered a further harm. Thus, decisions to screen before age 50 largely depend on women's willingness to tolerate these harms for a small chance that cancer is present and that screening will reduce the probability of death from that cancer.
At the other end of the age spectrum, all six models found that screening beyond age 74 remains on the efficiency frontier. This result is consistent with previously reported results of screening benefit from observational and modeled data.33–36 As with the situation for younger women, any benefits of screening older women must be balanced against possible harms. For instance, the probability of over-diagnosis accelerates among women over age 74. Model estimates for the oldest age groups also have some uncertainty built in because of the limited primary data on natural history of breast cancer and the absence of screening trial data after age 74 years.
It is logical to assume that more screening will save substantially more lives. However, when comparing the strategy recommended by the US Preventive Services Task Force8 to the most intensive regimen we evaluated (annual screening from ages 40 to 84), there was only a one-half of one percent additional reduction in the lifetime probability of death from breast cancer. This somewhat counter-intuitive result is based on several factors, including the fact that most women never develop breast cancer and that, when cancer is diagnosed, treatment is very effective in avoiding death for most women. Additional variables, such as slow tumor growth rates and low incidence rates in young women, also mean that a less intensive screening schedule still maintains the majority of the benefits of more intensive strategies and uses far fewer mammography screening and diagnostic resources.
The collaboration of six groups with different modeling philosophies and approaches to estimate the same end-points by using a common set of data provides an excellent opportunity to cross-replicate results and to depict uncertainty related to modeling assumptions and structure by providing a range of results. The resulting conclusions about the ranking of screening strategies were very robust and should provide greater credibility than inferences based on one model alone.
Despite our consistent results, our study had some limitations. First, our models project mortality reductions similar to those observed in clinical trials, but the range of results includes higher mortality reductions than seen in the trials because we model lifetime screening (vs. for the period of the trial and for invitations to screening) and assume adherence to all screening and treatment. The trials followed women for limited numbers of years and have some non-adherence. Second, we do not consider morbidity associated with surgery for screening-detected disease37 or decrements in quality of life associated with false-positive results, living with earlier knowledge of a cancer diagnosis, or over-diagnosis.38 Third, in estimating lifetime results, we projected breast cancer trends from background incidence rates of a 1960 birth cohort extrapolated forward in time. However, future background incidence (and mortality) may change as the result of different forces, and/or results may vary for groups with higher than average risk.39 Fourth, we assumed 100% adherence to screening and treatment to evaluate program efficacy. Benefits will always fall short of the projected results because adherence is not perfect. If actual adherence varies systematically by age or other factors, the ranking of strategies could change. Finally, we did not include costs in our analysis, although the average number of mammograms per woman (and false-positive results) provides some proxy of resource consumption. Even with these acknowledged limitations, the models demonstrate meaningful, qualitatively similar outcomes.
Choices about optimal ages of initiation and cessation will ultimately depend on program goals, resources, weight attached to the balance of harms and benefits, and considerations of efficiency and equity.
The authors thank the Breast Cancer Surveillance Consortium (BCSC) investigators, their participating mammography facilities, and radiologists for the data they provided, which were used to inform some of our model data input variables. A list of the BCSC investigators and procedures for requesting BCSC data for research purposes is provided at http://breastscreening.cancer.gov/.
Funding: This work was done under contracts from the Agency for Healthcare Research and Quality (AHRQ) and NCI and grants from the NCI. The NCI provided some data and technical assistance and AHRQ reviewed the manuscript. Model results are the sole responsibility of the investigators.
Grant Support: By NCI cooperative agreement U01 CA152958–01 and NCI Grants RC2CA148577 and U01CA086076. Data collection in the BCSC was supported by NCI-funded BCSC cooperative agreements (U01CA63740, U01CA86076, U01CA86082, U01CA63736, U01CA70013, U01CA69976, U01CA63731, and U01CA70040) and several U.S. state public health departments and cancer registries. CISNET data management and Web site support were provided by Cornerstone Systems Northwest (NCI contract HHSN261200800002C).
‡This work was done by 6 independent modeling teams from Dana-Farber Cancer Institute; Erasmus Medical Center; Georgetown University Medical Center, Lombardi Comprehensive Cancer Center (Dr. Mandelblatt, principal investigator); Harvard School of Public Health, Harvard Medical School, Harvard Pilgrim Health Care/University of Wisconsin (Dr. Stout, principal investigator); MD Anderson Comprehensive Cancer Center (Dr. Berry, principal investigator); and Stanford University (Dr. Plevritis, principal investigator). Drs. Mandelblatt and Cronin were the writing and coordinating committee for the project; all other collaborators are listed in alphabetical order. Dr. Feuer was responsible for overall CISNET project direction.
Presented at the St. Gallen Breast Cancer Conference, St. Gallen, Switzerland, March 2011.
Conflict of interest statement: The authors have no conflict of interest to declare.