The Gail model combines relative risks (RRs) for five breast cancer risk factors with age-specific breast cancer incidence rates and competing mortality rates from the Surveillance, Epidemiology, and End Results (SEER) program from 1983 to 1987 to predict risk of invasive breast cancer over a given time period. Motivated by changes in breast cancer incidence during the 1990s, we evaluated the model's calibration in two recent cohorts.
We included white, postmenopausal women from the National Institutes of Health (NIH) –AARP Diet and Health Study (NIH-AARP, 1995 to 2003), and the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO, 1993 to 2006). Calibration was assessed by comparing the number of breast cancers expected from the Gail model with that observed. We then evaluated calibration by using an updated model that combined Gail model RRs with 1995 to 2003 SEER invasive breast cancer incidence rates.
Overall, the Gail model significantly underpredicted the number of invasive breast cancers in NIH-AARP, with an expected-to-observed ratio of 0.87 (95% CI, 0.85 to 0.89), and in PLCO, with an expected-to-observed ratio of 0.86 (95% CI, 0.82 to 0.90). The updated model was well-calibrated overall, with an expected-to-observed ratio of 1.03 (95% CI, 1.00 to 1.05) in NIH-AARP and an expected-to-observed ratio of 1.01 (95% CI, 0.97 to 1.06) in PLCO. Of women age 50 to 55 years at baseline, 13% to 14% had a projected Gail model 5-year risk lower than the recommended threshold of 1.66% for use of tamoxifen or raloxifene but ≥ 1.66% when using the updated model. The Gail model was well calibrated in PLCO when the prediction period was restricted to 2003 to 2006.
This study highlights that model calibration is important to ensure the usefulness of risk prediction models for clinical decision making.
The Gail model,1 publicly available in a slightly modified version2 as the Breast Cancer Risk Assessment Tool (http://www.cancer.gov/bcrisktool), predicts a woman's risk of developing invasive breast cancer over a specified period of time (eg, 5 or 10 years), given her age and risk factor profile. The probability of developing breast cancer is computed by combining relative risks (RRs) estimated from the Breast Cancer Detection Demonstration Project (BCDDP)1 with attributable risks, age-specific breast cancer incidence rates, and competing mortality rates from all other causes from the Surveillance, Epidemiology, and End Results (SEER) program for the years 1983 to 1987.
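The combination of relative risks with age-specific incidence and competing mortality rates can be illustrated with a minimal discrete-time absolute risk calculation. This is a hedged sketch of the general technique, not the actual Gail model implementation; the annual rates below are hypothetical placeholders, not SEER 1983 to 1987 values.

```python
# Discrete-time sketch of an absolute-risk calculation in the style of the
# Gail model: combine a woman's relative risk with age-specific baseline
# breast cancer hazards and competing mortality hazards. All rates here are
# hypothetical placeholders, NOT actual SEER or BCDDP values.

def absolute_risk(age, years, rr, incidence, mortality):
    """Approximate probability of developing breast cancer within `years`
    years starting at `age`, given relative risk `rr` and dicts of annual
    baseline incidence h1(t) and competing mortality h2(t) by year of age."""
    survival = 1.0  # probability of being alive and cancer-free so far
    risk = 0.0
    for t in range(age, age + years):
        h1 = incidence[t] * rr   # individualized breast cancer hazard
        h2 = mortality[t]        # hazard of dying from competing causes
        risk += survival * h1    # probability of a diagnosis in this year
        survival *= (1.0 - h1 - h2)
    return risk

# Hypothetical annual rates per woman-year for ages 60 to 64.
inc = {t: 0.0035 for t in range(60, 65)}
mort = {t: 0.010 for t in range(60, 65)}
print(round(absolute_risk(60, 5, rr=1.5, incidence=inc, mortality=mort), 4))
```

Because competing mortality discounts later years, the 5-year risk is slightly below the naive sum of the annual individualized hazards.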
Several large chemoprevention studies of tamoxifen (Nolvadex; AstraZeneca, Wilmington, DE) and raloxifene (Evista; Lilly, Indianapolis, IN) used a 5-year Gail model–projected breast cancer risk ≥ 1.66% to determine eligibility.3,4 Recent American Society of Clinical Oncology (ASCO) guidelines also refer to this risk threshold in their recommendations for use of these agents and note the common use of this threshold in risk assessment.5 The guidelines indicate that premenopausal women and postmenopausal women with low risk of adverse effects and a 5-year projected invasive breast cancer risk ≥ 1.66% may benefit from tamoxifen and/or raloxifene for prevention of estrogen receptor–positive breast cancer.5 Inherent in this recommendation is the assumption that the threshold comes from a well-calibrated model, that is, there is good agreement between the expected numbers of invasive breast cancers predicted by the model and the number of breast cancers that will develop in a population.6–8 If the model is not well calibrated, then the threshold of 1.66% may not be appropriate.
Validation studies of the Gail model have been published previously.2,9–12 The model was well calibrated among white women in the placebo arm of the Breast Cancer Prevention Trial who received annual screening2 and among regularly screened women with a family history of breast cancer.9 The model slightly underpredicted the number of breast cancers in the Nurses' Health Study10 by 6%. Substantial underprediction was reported in two other, more recent, studies: 40% among women with atypical hyperplasia,11 defined somewhat more stringently than in BCDDP, and 20% in the Women's Health Initiative.12
U.S. invasive breast cancer incidence rates increased during the 1990s, declined moderately starting in 2000, and substantially dropped in 2003.13–15 Recognizing that rate changes may affect model calibration (and therefore also explain some of the discrepancies between previous validation studies), we validated the Gail model among white, postmenopausal women from two large, population-based studies with follow-up through recent calendar years, the National Institutes of Health (NIH) –AARP Diet and Health Study (NIH-AARP), and the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO).
The present validation analysis was conducted among women from NIH-AARP and PLCO. As described previously in greater detail,16 the NIH-AARP study is a large, ongoing, prospective study that recruited AARP members from the states of California, Florida, Pennsylvania, New Jersey, North Carolina, and Louisiana and the metropolitan areas of Atlanta, Georgia and Detroit, Michigan from 1995 to 1996. At baseline, it included more than 200,000 women age 50 to 71 years. The PLCO study has also been described previously in detail.17 In brief, PLCO is a multicenter (n = 10) screening trial that enrolled approximately 77,500 women, age 55 to 74 years at baseline, between 1993 and 2001. Women were eligible if they had no history of lung, colorectal, or ovarian cancer and were neither undergoing cancer treatment nor participating in other screening or prevention trials. Women who had undergone bilateral oophorectomy or were taking tamoxifen were initially ineligible but were later included. From NIH-AARP and PLCO, we included white, postmenopausal women with known parity, no history of in situ or invasive breast cancer, and age younger than 90 years at the start of follow-up. Women with questionnaires completed by proxy were excluded from NIH-AARP. The NIH-AARP study was approved by the National Cancer Institute (NCI) Special Studies institutional review board. PLCO was approved by institutional review boards at the NCI and individual screening centers.
In NIH-AARP, invasive breast cancers diagnosed between entry into the cohort and December 31, 2003 were ascertained through linkage with state cancer registries. In PLCO, cancers diagnosed between entry into the study and May 31, 2006 were ascertained through annual mailed questionnaires to participants and subsequently were confirmed by medical records. Reports from physicians and relatives were also collected when available. The National Death Index and cancer registries as available were periodically searched to enhance completeness of end point ascertainment.
Risk factors included in the Gail model were obtained from baseline questionnaires. The model includes age at menarche (≥ 14, 12 to 13, or < 12 years), number of previous breast biopsies (0, 1, or ≥ 2), presence of atypical hyperplasia in a biopsy (yes or no), age at first live birth (< 20, 20 to 24, 25 to 29/nulliparous, or ≥ 30 years), and number of first-degree relatives with a history of breast cancer (0, 1, ≥ 2). NIH-AARP ascertained age at menarche in slightly different categories from those in the Gail model. We therefore randomly reallocated women in the 11-to-12 and 13-to-14 categories of NIH-AARP to the less-than-12 and 12-to-13 and to the 12-to-13 and ≥ 14 categories, respectively, by using the age at menarche distribution from two other cohorts.18,19 Number of breast biopsies (PLCO) and atypical hyperplasia (both studies) were not ascertained and were therefore set to unknown in the risk calculations. For each variable, the Gail model assigns the lowest RR to the unknown category. At baseline, NIH-AARP asked whether a woman had affected sisters and/or daughters but did not ask the number of each. If a woman answered yes, we assumed that she had one affected sister and/or one affected daughter.
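The reallocation of NIH-AARP age-at-menarche categories can be sketched as a weighted random draw, where each nonmatching category is split between the two overlapping Gail model categories. The split probabilities and labels below are hypothetical placeholders, not the distributions actually taken from the two reference cohorts.18,19

```python
import random

# Sketch of randomly reallocating NIH-AARP age-at-menarche categories to
# Gail model categories. The split probabilities below are hypothetical
# placeholders, NOT the actual distributions from the reference cohorts.

random.seed(7)

# Hypothetical P(Gail category | NIH-AARP category):
SPLIT = {
    "11-12": {"<12": 0.40, "12-13": 0.60},   # age 11 -> <12, age 12 -> 12-13
    "13-14": {"12-13": 0.55, ">=14": 0.45},  # age 13 -> 12-13, age 14 -> >=14
}

def reallocate(nih_category):
    """Randomly map an NIH-AARP menarche category to a Gail model category."""
    if nih_category not in SPLIT:
        return nih_category  # category already matches the Gail model
    gail_cats, probs = zip(*SPLIT[nih_category].items())
    return random.choices(gail_cats, weights=probs)[0]

sample = [reallocate("11-12") for _ in range(1000)]
print(sample.count("<12") / 1000)  # close to the assumed split of 0.40
```

In expectation, this preserves the marginal distribution of the Gail model categories implied by the external reference data while leaving individual assignments random.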
To assess model calibration, we first compared RR estimates from NIH-AARP and PLCO to those used in the Gail risk prediction model. We estimated RRs and corresponding 95% CIs from Cox proportional hazards models. Women were observed from age at baseline questionnaire until age at invasive breast cancer diagnosis or censoring. Censoring events were death, loss to follow-up, or end of follow-up (NIH-AARP, December 31, 2003; PLCO, May 31, 2006). In PLCO, we also censored women at in situ breast tumor diagnosis because of incomplete follow-up of subsequent invasive breast cancer in those women. However, fewer than 1% of women in PLCO had an in situ diagnosis during follow-up, so few subsequent invasive tumors were missed. Of note, we obtained virtually identical RR estimates in NIH-AARP, whether or not we censored at in situ breast tumor diagnosis.
Next, we obtained the absolute risk estimate of invasive breast cancer for each woman from age at baseline until the end of the study, loss to follow-up, age of 90 years, or diagnosis of an in situ breast tumor (in PLCO only) by using her baseline risk factor profile. Model calibration was assessed by comparing the observed number (O) of invasive breast cancers for each cohort with the expected number (E) predicted by the model, obtained by summing individual absolute risk estimates across all women, or for women in a particular risk factor category. CIs for expected-to-observed ratios were calculated by assuming a Poisson distribution for the observed numbers of cases. Model discrimination was assessed by using the area under the receiver-operating characteristic curve (AUC).
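The expected-to-observed comparison can be reproduced with a standard large-sample interval that treats the observed count as Poisson. This is one common construction consistent with the Poisson assumption stated above, not necessarily the authors' exact method; the expected count below is back-calculated from the reported NIH-AARP ratio of 0.87 and 5,665 observed cancers.

```python
import math

def eo_ratio_ci(expected, observed, z=1.96):
    """Expected-to-observed ratio with an approximate 95% CI, treating the
    observed count as Poisson (normal approximation, valid for large counts)."""
    half_width = z * math.sqrt(observed)
    return (expected / observed,
            expected / (observed + half_width),   # lower bound of E/O
            expected / (observed - half_width))   # upper bound of E/O

# NIH-AARP: 5,665 observed invasive cancers; the expected count is
# back-calculated here from the reported ratio of 0.87 (about 4,929).
ratio, lo, hi = eo_ratio_ci(expected=4929, observed=5665)
print(f"E/O = {ratio:.2f} (95% CI, {lo:.2f} to {hi:.2f})")
```

With these inputs the sketch reproduces the reported interval of 0.85 to 0.89, since with thousands of cases the normal approximation to the Poisson is very accurate.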
Finally, we assessed the calibration of an updated model. This model combined the Gail model RRs with SEER invasive breast cancer incidence rates and competing mortality rates from the period corresponding to our cohorts, 1995 to 2003. Computation of the expected number of breast cancers was identical to that of the original Gail model.
In NIH-AARP, 5,665 invasive breast cancers were diagnosed among 181,979 women. In PLCO, 2,223 cancers were reported among 64,868 women. The median age at entry was similar in NIH-AARP (62.8 years) and PLCO (62.3 years). The mean prediction period was slightly shorter in NIH-AARP (7.5 years) than in PLCO (8.6 years). The distribution of Gail model factors was similar between the two cohorts (Table 1). The RRs for age at menarche and number of breast biopsies (NIH-AARP only) were similar to those in the Gail model (Table 2). The Gail model includes an interaction term between number of biopsies and age category (< 50 years and ≥ 50 years) that we could not evaluate, because all women were age 50 years or older. RRs in strata defined by family history and age at first live birth were substantially lower than those used in the Gail model (Table 2). A second interaction term between age at first birth and number of affected first-degree relatives was not significant in either cohort (P for interaction = .89 and .06 for NIH-AARP and PLCO, respectively).
Overall, the Gail model significantly underestimated the number of breast cancer occurrences in both cohorts (Table 3), by 13% for NIH-AARP (expected-to-observed ratio, 0.87; 95% CI, 0.85 to 0.89) and by 14% for PLCO (expected-to-observed ratio, 0.86; 95% CI, 0.82 to 0.90). Underprediction was also observed in all strata defined by age at menarche (both studies) and number of breast biopsies (NIH-AARP). The Gail model significantly overpredicted the number of breast cancers among women with a family history of breast cancer, particularly among women with a family history of breast cancer who first gave birth before age 25 years.
The overpredictions in subgroups defined by age at first live birth and family history of breast cancer can be explained by lower RR estimates for these factors in the validation cohorts. Overall, however, the RRs of the risk factors used in the model were similar to those observed in our cohorts. We therefore suspected that the substantial, overall underprediction was due to lower age-specific rates used in the Gail model (SEER 1983 to 1987) compared with the rates in our cohorts, particularly for ages younger than 70 years (Table 4).
SEER invasive breast cancer incidence rates from 1995 to 2003 were substantially higher than those of 1983 to 1987 and generally were closer to the age-specific incidence rates of our cohorts (Table 4). The overall, age-adjusted invasive breast cancer incidence rate for SEER 1983 to 1987 was 343.0 per 100,000 person-years.20 The overall, age-adjusted rate for SEER 1995 to 2003 was 389.99 per 100,000 person-years,21 which was similar to NIH-AARP (391.9) and somewhat lower than PLCO (422.9).
After updating the absolute risk prediction model by replacing 1983 to 1987 SEER breast cancer incidence rates with those from 1995 to 2003, the overall expected-to-observed ratio was 1.03 (95% CI, 1.00 to 1.05) for NIH-AARP and was 1.01 (95% CI, 0.97 to 1.06) for PLCO, which indicated good calibration (Table 5). The updated model slightly overpredicted cancer occurrence among women in NIH-AARP with an early age at menarche and among women who reported at least two breast biopsies. Similar to the Gail model, the number of cancers among women with a family history of breast cancer was overestimated. Recognizing that invasive breast cancer rates dropped substantially in 2003,13–15 we examined calibration of the Gail model restricted to the 2003-to-2006 period among women in PLCO who were alive and had not been diagnosed with a breast cancer as of January 1, 2003. The Gail model was well calibrated for that period in PLCO (overall expected-to-observed ratio, 1.00; 95% CI, 0.94 to 1.08). Discriminatory accuracy was unchanged by the recalibration; the overall AUCs for the Gail and updated models were 0.58 in NIH-AARP and 0.59 in PLCO.
We then examined the effect of the recalibration on the identification of women who could potentially benefit from chemoprevention on the basis of current recommendations (Table 6). Of women age 50 to 55 years at baseline, 14.0% (n = 2,978) in NIH-AARP and 13.0% (n = 13) in PLCO had a projected 5-year risk less than 1.66% by using the Gail model but ≥ 1.66% by using the well-calibrated model. On the basis of the Gail model estimates, these women would not be considered for tamoxifen or raloxifene; conversely, on the basis of the well-calibrated model, they might have been eligible and, in fact, benefitted from chemoprevention.5
The Gail model underestimated the number of invasive breast cancers by 13% to 14% among women age 50 years or older from two large, contemporary cohorts. As the RRs of the risk factors used in the model were generally similar to those observed in our cohorts, we attributed the overall underprediction to differences between the age-specific breast cancer rates used in the Gail model (SEER 1983 to 1987) and those of our study populations. The overall calibration of the model improved greatly by replacing SEER breast cancer rates from 1983 to 1987 with those from the main follow-up period of our studies, 1995 to 2003, which captured the higher breast cancer incidence rates during this period.13 The model was also well calibrated in PLCO when risk was projected for the period of 2003 to 2006. This improvement in calibration occurred because, after the substantial drop in invasive breast cancer rates in 2003,13–15 population rates were closer to the SEER rates used in the Gail model than to rates during the 1995-to-2003 period.
Our results highlight the importance of validating risk prediction models in contemporary cohorts and of periodically updating model components, including baseline rates and/or RR estimates, as necessary. This principle also applies before a model is used in national and international populations that differ from those used for model development. To ensure that the Breast Cancer Risk Assessment Tool remains useful to clinicians and researchers to assess breast cancer risk in white women in the United States, an updated version will be available online and will be based on the most recent breast cancer incidence rates from SEER.
Changes in breast cancer incidence rates throughout the 1990s most likely explain differences between the results of this study and those of two previous validation studies2,10 that concluded the model was well calibrated. The projection period for both of these studies began in 1992 and ended in 199710 or 1998.2 However, breast cancer incidence increased by approximately 2% per year throughout the 1990s among women age 45 to 59 years and 0.6% per year among women age 60 years or older, and incidence began to decline in 2000 and 2001 for the age groups of 45 to 59 years and 60 years or older, respectively.13 This moderate decline was followed by a sharp drop in invasive breast cancer incidence rates in 2003 among women age 50 years or older.13–15 In our validation cohorts, there was little follow-up before 1995. All women in NIH-AARP entered the cohort in 1995 or 1996. In PLCO, less than 10% of the women entered the cohort before 1995. Underprediction of the Gail model by approximately 20% was also seen in the Women's Health Initiative on the basis of an analysis of breast cancers diagnosed within 5 years of entry (September 1993 to September 2000) onto the study.12 However, the authors did not attribute the discrepancy between their results and those of previous validation studies to differences in underlying rates but to more complete case ascertainment that resulted from higher rates of mammographic screening and biopsy.12
The overprediction observed among women with a family history of breast cancer may be attributed to lower RR estimates for family history of breast cancer in the validation cohorts than the Gail model. However, family history of breast cancer was reported by less than 15% of women in each cohort; therefore, differences in the RRs likely had a small impact on the overall calibration.
Our validation study has some limitations. We restricted our analysis to postmenopausal women because of limited numbers of premenopausal women in the cohorts. By design, all women in these studies were at least 50 years old at the start of follow-up. We also restricted the analysis to white women because of limited numbers of non-white women in these cohorts and because other models may better assess breast cancer risk in non-white populations.22 We did not have information about atypical hyperplasia in either cohort or the number of breast biopsies in PLCO; thus, we had to set those variables to unknown in the computation of the absolute risk estimates. This may have contributed to underprediction of risk in a small subset of women, as the Gail model program assigns the lowest RR for the specific variable to the unknown category. However, in a sensitivity analysis that used imputed values for the number of biopsies in PLCO, the Gail model still significantly underpredicted breast cancer risk.
This lack of information also highlights that an important practical limitation to model validation is the availability of model risk factors in large, contemporary cohorts. Although many of the risk factors in the Gail model, such as age at menarche or age at first live birth, are routinely obtained on questionnaires, the number of biopsies and the presence or absence of atypical hyperplasia are not. Ideally, those questions would be added to standard epidemiologic questionnaires.
Strengths of our study include the use of large, contemporary cohorts with good ascertainment of breast cancer occurrences. The age-adjusted invasive breast cancer incidence rates were comparable to those of SEER 1995 to 2003, which suggests that women in the cohorts are representative of U.S. white women in this age group.
In conclusion, the calibration of risk prediction models is sensitive to trends in underlying population rates. In our study, higher baseline rates between 1995 and 2003 in the validation cohorts compared with those used in the model resulted in substantial underprediction of invasive breast cancers. As a consequence of the underprediction during this period, approximately 13% to 14% of women had a projected 5-year risk lower than the recommended threshold of 1.66% for use of tamoxifen or raloxifene; conversely, on the basis of the well-calibrated model, they might have been eligible and, in fact, benefitted from chemoprevention.5 This study highlights the importance of using appropriate baseline rates in absolute risk models and the importance of calibration of risk prediction models for clinical decision making.
We thank Mitchell Gail for helpful comments and discussions; B.J. Stone for editorial review; the reviewers for helpful comments; the participants in the NIH-AARP Diet and Health Study for their outstanding cooperation; Sigurd Hermansen and Kerry Grace Morrissey from Westat for study outcomes ascertainment and management; and Leslie Carroll at Information Management Services for data support and analysis.
Cancer incidence data from the Atlanta metropolitan area were collected by the Georgia Center for Cancer Statistics, Department of Epidemiology, Rollins School of Public Health, Emory University. Cancer incidence data from California were collected by the California Department of Health Services, Cancer Surveillance Section. Cancer incidence data from the Detroit metropolitan area were collected by the Michigan Cancer Surveillance Program, Community Health Administration, State of Michigan. The Florida cancer incidence data used in this report were collected by the Florida Cancer Data System under contract to the Department of Health (DOH). The views expressed herein are solely those of the authors and do not necessarily reflect those of the contractor or DOH. Cancer incidence data from Louisiana were collected by the Louisiana Tumor Registry, Louisiana State University Medical Center in New Orleans. Cancer incidence data from New Jersey were collected by the New Jersey State Cancer Registry, Cancer Epidemiology Services, New Jersey State Department of Health and Senior Services. Cancer incidence data from North Carolina were collected by the North Carolina Central Cancer Registry. Cancer incidence data from Pennsylvania were supplied by the Division of Health Statistics and Research, Pennsylvania Department of Health, Harrisburg, Pennsylvania. The Pennsylvania Department of Health specifically disclaims responsibility for any analyses, interpretations or conclusions. Cancer incidence data from Arizona were collected by the Arizona Cancer Registry, Division of Public Health Services, Arizona Department of Health Services. Cancer incidence data from Texas were collected by the Texas Cancer Registry, Cancer Epidemiology and Surveillance Branch, Texas Department of State Health Services. 
Cancer incidence data from Nevada were collected by the Nevada Central Cancer Registry, Center for Health Data and Research, Bureau of Health Planning and Statistics, State Health Division, State of Nevada Department of Health and Human Services.
Supported by the Intramural Research Program of the National Institutes of Health and the National Cancer Institute.
Authors' disclosures of potential conflicts of interest and author contributions are found at the end of this article.
The author(s) indicated no potential conflicts of interest.
Conception and design: Sara J. Schonfeld, Patricia Hartge, James V. Lacey Jr, Kala Visvanathan, Ruth M. Pfeiffer
Financial support: Arthur Schatzkin
Administrative support: Arthur Schatzkin
Provision of study materials or patients: Robert T. Greenlee, James V. Lacey Jr, Arthur Schatzkin
Collection and assembly of data: Robert T. Greenlee, James V. Lacey Jr, Yikyung Park, Arthur Schatzkin
Data analysis and interpretation: Sara J. Schonfeld, David Pee, Patricia Hartge, James V. Lacey Jr, Kala Visvanathan, Ruth M. Pfeiffer
Manuscript writing: Sara J. Schonfeld, Robert T. Greenlee, James V. Lacey Jr, Yikyung Park, Kala Visvanathan, Ruth M. Pfeiffer
Final approval of manuscript: Sara J. Schonfeld, David Pee, Robert T. Greenlee, Patricia Hartge, James V. Lacey Jr, Yikyung Park, Arthur Schatzkin, Kala Visvanathan, Ruth M. Pfeiffer