The Gail model underestimated the number of invasive breast cancers by 13% to 14% among women age 50 years or older from two large, contemporary cohorts. As the RRs of the risk factors used in the model were generally similar to those observed in our cohorts, we attributed the overall underprediction to differences between the age-specific breast cancer rates used in the Gail model (SEER 1983 to 1987) and those of our study populations. The overall calibration of the model improved greatly when we replaced the SEER breast cancer rates from 1983 to 1987 with those from the main follow-up period of our studies, 1995 to 2003, which captured the higher breast cancer incidence rates of this period.13
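The dependence of calibration on baseline rates can be illustrated with a minimal sketch. This is not the Gail model itself; all rates, RRs, and follow-up times below are hypothetical, chosen only to show that the expected-to-observed (E/O) ratio scales directly with the baseline rate assumed by the model.

```python
# Illustrative sketch (hypothetical numbers, not the Gail model): expected
# case counts in a calibration study scale with the baseline rates used.

def expected_cases(baseline_rate, rel_risks, person_years):
    """Expected cancers = sum over women of baseline rate x RR x follow-up time."""
    return sum(baseline_rate * rr * py for rr, py in zip(rel_risks, person_years))

rel_risks = [1.0, 1.3, 0.8, 2.1]      # hypothetical individual RRs from model covariates
person_years = [8.0, 8.0, 7.5, 6.0]   # hypothetical follow-up per woman

e_model = expected_cases(0.0030, rel_risks, person_years)   # older, lower baseline rate
e_cohort = expected_cases(0.0035, rel_risks, person_years)  # higher contemporary rate

# If the cohort truly experiences the higher rates, the model's E/O ratio
# falls below 1, i.e., the model underpredicts:
print(round(e_model / e_cohort, 2))  # -> 0.86
```

Because the RRs cancel in the ratio, the E/O ratio here equals the ratio of the two baseline rates; this is why similar RRs but outdated baseline rates produce systematic underprediction.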
The model was also well calibrated in PLCO when risk was projected for the period of 2003 to 2006. This improvement in model calibration occurred because, after a substantial drop in invasive breast cancer rates in 2003,13–15 rates during this later period were closer to the SEER rates used for the Gail model than were the rates during the 1995-to-2003 period.
Our results highlight the importance of validating risk prediction models in contemporary cohorts and of periodically updating model components, including baseline rates and RR estimates, when necessary. This principle also needs to be applied before the model is used in national and international populations that differ from those used for model development. To ensure that the Breast Cancer Risk Assessment Tool remains useful to clinicians and researchers for assessing breast cancer risk in white women in the United States, an updated version based on the most recent breast cancer incidence rates from SEER will be available online.
Changes in breast cancer incidence rates throughout the 1990s most likely explain the differences between the results of this study and those of two previous validation studies2,10
that concluded the model was well calibrated. The projection period for both of these studies began in 1992 and ended in 1997.10
However, breast cancer incidence increased by approximately 2% per year throughout the 1990s among women age 45 to 59 years and by 0.6% per year among women age 60 years or older, and incidence began to decline in 2000 and 2001 for the age groups of 45 to 59 years and 60 years or older, respectively.13
This moderate decline was followed by a sharp drop in invasive breast cancer incidence rates in 2003 among women age 50 years or older.13–15
In our validation cohorts, there was little follow-up before 1995. All women in NIH-AARP entered the cohort in 1995 or 1996. In PLCO, less than 10% of the women entered the cohort before 1995. Underprediction by the Gail model of approximately 20% was also seen in the Women's Health Initiative on the basis of an analysis of breast cancers diagnosed within 5 years of entry onto the study (September 1993 to September 2000).12
However, the authors did not attribute the discrepancy between their results and those of previous validation studies to differences in underlying rates but to more complete case ascertainment that resulted from higher rates of mammographic screening and biopsy.12
The overprediction observed among women with a family history of breast cancer may be attributed to lower RR estimates for family history of breast cancer in the validation cohorts than in the Gail model. However, family history of breast cancer was reported by less than 15% of women in each cohort; therefore, differences in the RRs likely had a small impact on the overall calibration.
Our validation study has some limitations. We restricted our analysis to postmenopausal women because of limited numbers of premenopausal women in the cohorts. By design, all women in these studies were at least 50 years old at the start of follow-up. We also restricted the analysis to white women because of limited numbers of non-white women in these cohorts and because other models may better assess breast cancer risk in non-white populations.22
We did not have information about atypical hyperplasia in either cohort or the number of breast biopsies in PLCO; thus, we had to set those variables to unknown in the computation of the absolute risk estimates. This may have contributed to underprediction of risk in a small subset of women, as the Gail model program assigns the lowest RR for the specific variable to the unknown category. However, in a sensitivity analysis that used imputed values for the number of biopsies in PLCO, the Gail model still significantly underpredicted breast cancer risk.
This lack of information also highlights that an important practical limitation to model validation is the availability of model risk factors in large, contemporary cohorts. Although many of the risk factors in the Gail model, such as age at menarche or age at first live birth, are routinely obtained on questionnaires, the number of biopsies and the presence or absence of atypical hyperplasia are not. Ideally, those questions would be added to standard epidemiologic questionnaires.
Strengths of our study include the use of large, contemporary cohorts with good ascertainment of breast cancer occurrences. The age-adjusted invasive breast cancer incidence rates were comparable to those of SEER 1995 to 2003, which suggests that the women in the cohorts were representative of U.S. white women in this age group.
In conclusion, the calibration of risk prediction models is sensitive to trends in underlying population rates. In our study, higher baseline rates between 1995 and 2003 in the validation cohorts compared with those used in the model resulted in substantial underprediction of invasive breast cancers. As a consequence of the underprediction during this period, approximately 13% to 14% of women had projected risks below the recommended threshold of 1.66% for use of tamoxifen or raloxifene; conversely, with a well-calibrated model, they might have been eligible for, and might in fact have benefitted from, chemoprevention.5
This study highlights the importance of using appropriate baseline rates in absolute risk models and the importance of calibration of risk prediction models for clinical decision making.