A breast cancer risk prediction model for black women, developed from data in the Women’s Contraceptive and Reproductive Experiences (CARE) study, has been validated in women aged 50 years or older but not among younger women or for specific breast cancer subtypes.
We assessed calibration and discrimination of the CARE model in the Black Women’s Health Study (BWHS) with data from 45 942 women aged 30 to 69 years at baseline.
During a mean follow-up of 9.5 years, we identified 852 invasive breast cancers. The CARE model predicted 749.6 breast cancers, yielding an expected-to-observed (E/O) ratio of 0.88 (95% confidence interval [CI] = 0.82 to 0.94). The E/O ratio did not appreciably differ between women aged less than 50 years and those aged 50 years or older. The model underpredicted risk to the greatest degree among women aged 25 years or older at birth of first child (E/O = 0.71, 95% CI = 0.63 to 0.81); the model was well calibrated among women aged less than 25 years at birth of first child. The prevalence of later age at birth of first child was higher in the BWHS than in the CARE study, and breast cancer incidence was higher in the BWHS compared with national rates used in the CARE model. With respect to discriminatory accuracy, the concordance statistic was 0.57 (95% CI = 0.55 to 0.59) for breast cancer overall, 0.59 (95% CI = 0.57 to 0.61) for estrogen receptor (ER)-positive breast cancer, and 0.54 (95% CI = 0.50 to 0.57) for ER-negative breast cancer.
The CARE model underpredicted breast cancer risk in the BWHS, at least in part because of older age at first birth in this cohort, which led to higher breast cancer incidence rates. Our results suggest that inclusion of age at first birth may improve model performance. Discriminatory accuracy was modest and worse for ER-negative breast cancer.
Risk prediction models for breast cancer are used to counsel women on their individualized risk and determine eligibility for recruitment into prevention trials. The Gail model (1) performs well in white women (2,3) but underestimates risk for black women (4). The Gail model was modified for black women based on data from black women aged 35 to 64 years in the Women’s Contraceptive and Reproductive Experiences (CARE) study (4). The resulting CARE model uses information on a woman’s age, age at menarche, number of previous breast biopsies, and number of first-degree relatives with breast cancer to estimate absolute breast cancer risk over a specified period. In a validation study among postmenopausal black women aged 50 to 79 years in the Women’s Health Initiative (WHI), the model was reported to be well calibrated (ie, the number of observed breast cancers was similar to that predicted by the model), but the average age-specific discriminatory accuracy (ie, the probability that a randomly selected woman with breast cancer has a higher predicted risk than a randomly selected unaffected woman) was only 0.555 (4).
The CARE model has yet to be validated among black women aged less than 50 years. Because black women are more likely than white women to be diagnosed with breast cancer before age 40 years (5), it is particularly important to have an effective prediction model for younger black women. In addition, prediction models for breast cancer in black women have not addressed molecular subtypes separately.
We used data from the Black Women’s Health Study (BWHS) to assess the calibration and discriminatory accuracy of the CARE model among both younger and older black women and to assess the model’s predictive ability for estrogen receptor–positive (ER+) and estrogen receptor–negative (ER−) breast cancer.
The BWHS, an ongoing follow-up study of black women, was established in 1995 when 59 000 black women aged 21 to 69 years completed a self-administered baseline questionnaire that collected information on demographic characteristics, lifestyle factors, and medical history. Biennial follow-up questionnaires ascertain incident breast cancer. Participants indicated their informed consent by completing the questionnaires. The Boston University Medical Campus Institutional Review Board approved the study.
We restricted the analysis to 48 080 women who were aged 30 years or older at the start of follow-up on January 1, 1996. After exclusion of women who had a history of cancer (n = 1859) or died (n = 64) before start of follow-up or with missing information on any risk factors in the model (n = 215), this analysis included 45 942 women.
Incident diagnoses of breast cancer were ascertained by self-report on biennial follow-up questionnaires from 1997 to 2005. We learned of deaths from family members, the US Postal Service, and the National Death Index. We identified 1084 incident breast cancers, and 1007 (93%) were confirmed by medical record or by cancer registry data from 24 states in which 96% of participants resided at baseline. The analysis included the 852 confirmed invasive breast cancers; information on estrogen receptor status was available for 73%.
The CARE model includes terms for age at menarche, number of first-degree relatives with breast cancer, and an interaction between age and number of previous breast biopsies. Data on these factors were obtained on the BWHS baseline questionnaire. Because the baseline question on breast biopsy asked about ever biopsy, our primary analyses classified women as having either no biopsies or one or more biopsies. On the 2011 questionnaire, participants reported the total number of previous breast biopsies and their age(s) at the first two biopsies; the number of biopsy examinations a woman had undergone before her age at baseline was used in a sensitivity analysis in which the number of biopsies considered was zero, one, or two or more.
The baseline questionnaire collected information on parity, age at birth of first child, age at menopause, type of menopause, height, current weight, and weight at age 18 years. Women who reported having had a hysterectomy but who retained one or both ovaries were classified as premenopausal if their current age was younger than 43 years (10th percentile of age at natural menopause in the BWHS), as postmenopausal if their age was 57 years or older (90th percentile of age at natural menopause), and as uncertain menopausal status if they were aged 43 to 56 years. Body mass index was calculated as weight in kilograms divided by the square of height in meters.
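The classification rule and the body mass index calculation described above can be written out as a minimal sketch. The function names are hypothetical and the code is illustrative only; it is not the code used in the study.

```python
def menopausal_status_after_hysterectomy(age_years: float) -> str:
    """Classify a woman who had a hysterectomy but retained one or both ovaries.

    Cut points follow the text: 43 and 57 years are the 10th and 90th
    percentiles of age at natural menopause in the BWHS.
    """
    if age_years < 43:
        return "premenopausal"
    if age_years >= 57:
        return "postmenopausal"
    return "uncertain"  # ages 43 to 56 years


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Weight in kilograms divided by the square of height in meters."""
    return weight_kg / height_m ** 2
```

For example, a woman weighing 60 kg with a height of 1.60 m has a body mass index of 60 / 2.56, or about 23.4 kg/m².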
Cox proportional hazards models in SAS version 9.3 (SAS Institute, Cary, NC) were used with BWHS data to estimate relative risks (RRs) and 95% confidence intervals (CIs) for risk of breast cancer associated with the factors included in the CARE model using coding as described in Gail et al. (4): age at menarche (coded as 0 for 14 years or older, or 1 for younger than 14 years), number of first-degree relatives with breast cancer (coded as 0, 1, or 2 for none, one, or more than one based on mother’s and sisters’ histories of breast cancer, respectively), and an interaction between age (coded as 0 for younger than 50 years, or 1 for 50 years or older) and number of previous breast biopsies (coded as 0 for none, or 1 for at least one biopsy examination). Women contributed person-years from the beginning of follow-up on January 1, 1996, until the diagnosis of breast cancer, death, loss to follow-up, or the end of follow-up on December 31, 2005, whichever occurred first. There were no departures from the proportional hazards assumption, which was tested by a likelihood ratio test comparing models with and without cross-product terms for each exposure with time period (1996–2001 vs 2001–2005).
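The covariate coding and the person-time rule described above are sketched below in Python, purely as an illustration; the analysis itself was run in SAS. The function names, the dictionary keys, and the use of 365.25 days per year are assumptions. The cross-product is shown only to make the interaction coding concrete, and how these terms enter the model follows Gail et al. (4), which is not reproduced here.

```python
from datetime import date

START = date(1996, 1, 1)    # beginning of follow-up
END = date(2005, 12, 31)    # end of follow-up


def code_care_covariates(age, age_at_menarche, n_relatives, n_biopsies):
    """Return the 0/1/2 codes described in the text (key names are hypothetical)."""
    age50 = 1 if age >= 50 else 0
    biopsy = 1 if n_biopsies >= 1 else 0
    return {
        "menarche": 1 if age_at_menarche < 14 else 0,  # 1 = younger than 14 years
        "relatives": min(n_relatives, 2),              # 0, 1, or 2 (more than one)
        "age50": age50,
        "biopsy": biopsy,
        "age50_x_biopsy": age50 * biopsy,              # illustrative cross-product
    }


def person_years(diagnosis=None, death=None, lost=None):
    """Person-years from START to the earliest of the events or END."""
    candidates = [d for d in (diagnosis, death, lost, END) if d is not None]
    exit_date = min(candidates)
    return (exit_date - START).days / 365.25  # 365.25 days/year is an assumption
```

A woman diagnosed on January 1, 2000, would contribute 4.0 person-years under this rule, whereas a woman followed to the end of the study without an event would contribute just under 10.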
To estimate absolute breast cancer risk for each woman during follow-up based on the CARE model, we used a SAS macro available from the National Cancer Institute (http://dceg.cancer.gov/tools/riskassessment/care). We calculated the expected number of breast cancers by summing the predicted absolute risk for each participant. To assess calibration of the CARE model, we compared the expected (E) number of breast cancers with the observed (O) number. The 95% confidence intervals for E/O ratios were calculated assuming a Poisson distribution as follows: $(E/O)\exp(\pm 1.96 \times O^{-1/2})$ (4). An E/O ratio greater than one indicates that the model overestimates risk of breast cancer, whereas an E/O ratio less than one indicates underestimation. E/O ratios were compared across categories of risk factors using Wald tests for heterogeneity.
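The calibration calculation can be illustrated with a short Python sketch. The function names are hypothetical; the confidence interval follows the Poisson-based formula given above, and the worked example uses the overall counts reported in this article (749.6 expected, 852 observed).

```python
import math


def expected_count(predicted_risks):
    """Expected number of cases: the sum of each woman's predicted absolute risk."""
    return sum(predicted_risks)


def eo_ratio_ci(expected, observed, z=1.96):
    """E/O ratio with the interval (E/O) * exp(+/- z / sqrt(O))."""
    ratio = expected / observed
    factor = math.exp(z / math.sqrt(observed))
    return ratio, ratio / factor, ratio * factor


ratio, lower, upper = eo_ratio_ci(749.6, 852)
print(f"E/O = {ratio:.2f} (95% CI = {lower:.2f} to {upper:.2f})")
# prints: E/O = 0.88 (95% CI = 0.82 to 0.94)
```

Because the interval is symmetric on the log scale, the lower and upper limits are obtained by dividing and multiplying the ratio by the same factor.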
To evaluate the model across levels of risk, we categorized women into quintiles of 10-year predicted risk. We stratified the data by 5-year age groups and also created age-adjusted quintiles of predicted risk to assess the model as a function of risk factors apart from age.
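The text does not specify how the age-adjusted quintiles were constructed. One plausible construction, sketched here under that assumption, ranks women by predicted risk within each 5-year age group and assigns quintiles within the group. The function names are hypothetical, and ties are broken by input order, which is a simplification.

```python
from collections import defaultdict


def assign_quintiles(risks):
    """Assign quintile 1 (lowest risk) to 5 (highest) by rank."""
    n = len(risks)
    order = sorted(range(n), key=lambda i: risks[i])
    quintiles = [0] * n
    for rank, i in enumerate(order):
        quintiles[i] = rank * 5 // n + 1
    return quintiles


def age_adjusted_quintiles(ages, risks, width=5):
    """Assumed construction: quintiles computed within each age group."""
    groups = defaultdict(list)
    for i, age in enumerate(ages):
        groups[int(age) // width].append(i)
    result = [0] * len(risks)
    for members in groups.values():
        within = assign_quintiles([risks[i] for i in members])
        for i, q in zip(members, within):
            result[i] = q
    return result
```

Ranking within age groups removes the contribution of age to the ordering, so that differences across quintiles reflect the other risk factors in the model.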
To assess the discriminatory accuracy of the model, we used the concordance statistic (ie, the area under the receiver operating characteristic curve), which corresponds to the probability that a randomly selected woman with breast cancer has a higher predicted risk than a randomly selected unaffected woman. Random classification of women with breast cancer and women without breast cancer results in a concordance statistic of 0.5, whereas perfect classification provides a concordance statistic of 1.0. To assess the discriminatory accuracy for women of a given age, we estimated age-specific concordance statistics in 5-year intervals and calculated the unweighted average of these estimates. All statistical tests were two-sided. A P value less than .05 was considered statistically significant.
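The concordance statistic and its unweighted age-specific average can be sketched as follows. This is an illustrative implementation, not the study's code: the function names are hypothetical, tied pairs are counted as one half (a common convention that the text does not state), and age groups lacking either cases or non-cases are skipped (also an assumption).

```python
from collections import defaultdict


def concordance(case_risks, noncase_risks):
    """Proportion of case/non-case pairs in which the case has the higher risk."""
    score = 0.0
    for c in case_risks:
        for n in noncase_risks:
            if c > n:
                score += 1.0
            elif c == n:
                score += 0.5  # ties count one half (assumed convention)
    return score / (len(case_risks) * len(noncase_risks))


def average_age_specific_concordance(ages, risks, is_case, width=5):
    """Unweighted mean of concordance statistics over age groups of given width."""
    strata = defaultdict(lambda: ([], []))  # (case risks, non-case risks)
    for age, risk, case in zip(ages, risks, is_case):
        strata[int(age) // width][0 if case else 1].append(risk)
    stats = [concordance(cases, noncases)
             for cases, noncases in strata.values() if cases and noncases]
    if not stats:
        raise ValueError("no age group contains both cases and non-cases")
    return sum(stats) / len(stats)
```

The pairwise loop is quadratic in the number of women and is written for clarity rather than speed; a rank-based calculation gives the same value for a large cohort.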
During a mean of 9.5 years of follow-up among 45 942 women, 852 women with invasive breast cancer were identified, with a median age at diagnosis of 52 years. The relative risks derived in the BWHS for age at menarche and family history of breast cancer were similar to those in the CARE study (4) (Table 1). However, among women aged less than 50 years in the BWHS, the relative risk of 1.65 associated with one or more breast biopsies was statistically significantly higher than the relative risk of 1.20 for one biopsy in the CARE study (P heterogeneity = .003). Among older women, the relative risk of 1.23 for one or more breast biopsies in the BWHS was compatible with the relative risk of 1.07 for one biopsy in the CARE study.
Overall, the CARE model predicted that 749.6 women would develop breast cancer, compared with 852 that were observed, for an E/O ratio of 0.88 (95% CI = 0.82 to 0.94) (Table 2). Calibration of the model did not differ appreciably between women aged less than 50 years at baseline (E/O = 0.92, 95% CI = 0.84 to 1.00) and women aged 50 years or older (E/O = 0.82, 95% CI = 0.74 to 0.91; P heterogeneity = .15).
With regard to factors included in the CARE model, the underprediction of breast cancer risk was greatest among women with a previous biopsy (E/O = 0.76, 95% CI = 0.65 to 0.88), but the difference from those with no biopsy (E/O = 0.91, 95% CI = 0.84 to 0.98) was not statistically significant (P heterogeneity = .08). The E/O did not differ statistically significantly across categories of age at menarche and family history of breast cancer.
The CARE model does not include age at birth of first child, which is used in the Gail model. In this study, the E/O ratio among women with a first birth before age 25 years was 0.96 (95% CI = 0.88 to 1.06), but the model underpredicted risk among women who were aged 25 years or older at the birth of their first child (E/O = 0.71, 95% CI = 0.63 to 0.81; P heterogeneity = .002). The model also underestimated the number of breast cancers among women with a body mass index less than 20 kg/m² at age 18 years (E/O = 0.79, 95% CI = 0.71 to 0.86) but was well calibrated among women with a body mass index of 20 kg/m² or greater at age 18 years (E/O = 0.97, 95% CI = 0.88 to 1.07; P heterogeneity = .008).
We assessed calibration of an expanded CARE model that used relative risks from the CARE study for age at birth of first child and for the interaction of age at birth of first child with the number of first-degree relatives with breast cancer (4). This expanded model underpredicted risk (E/O = 0.89, 95% CI = 0.83 to 0.95) to a similar degree as the CARE model. In the CARE study, the log relative risk parameters in the expanded model were 0.0014 (95% CI = −0.077 to 0.080) for age at first birth (coded ordinally), 0.424 (95% CI = 0.150 to 0.698) for number of first-degree relatives with breast cancer, and 0.0485 (95% CI = −0.161 to 0.258) for the interaction between the two variables (4). In the BWHS, the corresponding log relative risks were higher but comparable with those in the CARE study: 0.0457 (95% CI = −0.0337 to 0.125), 0.362 (95% CI = 0.0738 to 0.650), and 0.116 (95% CI = −0.0596 to 0.292), respectively. In the Gail model for white women, the log relative risks were 0.219 (95% CI = 0.149 to 0.289), 0.958 (95% CI = 0.688 to 1.229), and −0.191 (95% CI = −0.334 to −0.0476), respectively (1). When we used the Gail model relative risks in an expanded CARE model, there was a statistically significant overestimation of risk, with an E/O of 1.13 (95% CI = 1.05 to 1.20).
Age-specific breast cancer incidence rates for black women that were used in the CARE model, which came from the Surveillance, Epidemiology, and End Results (SEER) program, 1994 to 1998, were generally lower than those in the BWHS (Table 3); the standardized incidence ratio (SIR) was 0.88 (95% CI = 0.82 to 0.94). The proportion of women who were aged less than 25 years at their first child’s birth was 70% in the CARE study compared with 50% in the BWHS. Among women in the BWHS who were aged less than 25 years at their first child’s birth, breast cancer incidence rates were similar to SEER rates (SIR = 0.96, 95% CI = 0.88 to 1.06). In analyses that were restricted to BWHS participants aged less than 25 years at first child’s birth, calibration was generally good across categories of risk factors (Supplementary Table 1, available online). The exception was women with a previous breast biopsy, among whom there remained an underprediction of risk (E/O = 0.81, 95% CI = 0.65 to 0.99).
Table 4 presents data on discriminatory accuracy. The average age-specific concordance statistic for total invasive breast cancer was 0.57 (95% CI = 0.55 to 0.59). The average age-adjusted concordance statistics were 0.59 (95% CI = 0.57 to 0.61) for ER+ breast cancer and 0.54 (95% CI = 0.50 to 0.57) for ER− breast cancer, and the difference was greatest among women aged less than 40 years. The overall concordance statistic for invasive breast cancer, unadjusted for age, was 0.66 (95% CI = 0.65 to 0.68).
Another way to assess model discrimination is to examine relative risks according to age-adjusted quintiles of predicted risk. Compared with the lowest quintile of predicted risk, the relative risk of all invasive breast cancer for the highest risk quintile was 1.83 (95% CI = 1.47 to 2.28) (Table 5). The comparable relative risk was considerably higher for ER+ breast cancer (RR = 2.09, 95% CI = 1.50 to 2.91) than for ER− breast cancer (RR = 1.38, 95% CI = 0.91 to 2.07).
Among the 25 947 participants who provided information on the number of previous breast biopsies, 2.3% reported two or more previous benign biopsy examinations. Calibration and discrimination for the CARE model were similar regardless of how the number of biopsy examinations was coded (as 0, 1, and ≥2 or as 0 and ≥1; data not shown).
In this cohort of black women aged 30 years or older, the CARE model underestimated the number of invasive breast cancers overall by 12%. The initial validation of the CARE model was carried out in postmenopausal women aged 50 to 79 years in the WHI (4). The observed-to-expected ratio of 1.08, which corresponds to an E/O ratio of 0.92, was not statistically significantly different from 1.0 (95% CI = 0.97 to 1.20). Breast cancer incidence rates in the BWHS are higher than SEER rates used in the CARE model, which may explain, at least in part, why the CARE model underestimated risk in the BWHS. The CARE model may be well calibrated in a population with incidence rates similar to SEER rates. A greater proportion of women in the BWHS have a college education relative to black women from the same birth cohorts (6), and the mother’s average age at first child’s birth in the BWHS is higher than that in the general population of black women (7), reflecting delayed childbearing in women with greater educational attainment (8). Later age at first child’s birth is associated with increased risks of breast cancer in the BWHS (9) and in other studies of black women (10,11). Thus, the higher proportion of participants with older age at first child’s birth in the BWHS may have contributed to the higher breast cancer incidence rates relative to SEER rates. By contrast, BWHS incidence rates among women aged less than 25 years at first child’s birth were comparable with those in SEER.
Calibration of the CARE model in the BWHS was not materially different between women aged less than 50 years and those aged 50 years or older. However, there was statistically significant heterogeneity in the findings according to age at first child’s birth: calibration of the model was good among women with a younger age at first child’s birth (age <25 years) but poor among women with older age at first child’s birth. The report on validation of the model in the WHI did not present performance of the model across strata of age at first child’s birth (4). Whereas the Gail model includes age at first child’s birth, this factor was not included in the CARE model because the estimated relative risk for age at first child’s birth was very close to 1.0 (4). When we incorporated relative risks for age at first child’s birth and its interaction with family history into an expanded CARE model, the performance of the model was almost unchanged. However, when we used relative risk estimates for these two variables from the Gail model for white women, which are appreciably higher than in the CARE study, there was a statistically significant overprediction of risk. An expanded model that includes age at first child’s birth and uses a larger relative risk than that estimated in the CARE study may perform better across various study populations of black women.
As in this study, the WHI validation study of the CARE model found an underestimation of risk in women with a previous breast biopsy (4). The WHI included only surgical biopsies, whereas the BWHS included all biopsies without regard to type, as in the CARE study. The relative risks used in the CARE model for history of benign breast biopsy were lower than relative risks estimated in the BWHS. It is possible that the inclusion of a larger relative risk for previous breast biopsy would improve calibration of the model.
We found the discriminatory accuracy of the CARE model, as measured by an age-adjusted concordance statistic of 0.57, was modest, similar to what has been observed in previous validations of the CARE (4) and Gail (3) risk models. The overall concordance statistic of 0.66, unadjusted for age, has limited interpretability. It is largely dependent on the age range of the study population, given the strong association between age and breast cancer risk.
Discriminatory accuracy of the CARE model in the BWHS was better for ER+ breast cancer than for ER− breast cancer. Relative risks by quintile of predicted risk, another measure of discrimination, were also higher for ER+ breast cancer than for ER− breast cancer. The better discrimination for ER+ breast cancer is consistent with findings in white women for both the Gail model (12) and the Rosner–Colditz model (13). Risk factors for ER+ breast cancer are better understood than those for ER− cancer, and therefore risk models will tend to better predict ER+ breast cancer (14). Models developed in white women may be less accurate for black women because the latter are more likely than white women to be diagnosed with ER− tumors (15).
The inclusion of young women is a strength of this study. The only previous validation of the CARE model was restricted to postmenopausal women aged 50 years or older, and it did not evaluate ER+ and ER− breast cancer separately (4).
Study limitations include incomplete data on the number of previous breast biopsies for some women. However, results for calibration of the CARE model were similar among the subset of women for whom we had complete information. The lack of data on number of breast biopsies in some validation studies of the Gail model also did not appreciably influence the results (3,16). We included only breast cancers that were confirmed as invasive; we were able to confirm a high proportion (93%) of self-reported diagnoses of breast cancer, but incomplete ascertainment would bias the E/O ratio toward the null.
In summary, the CARE model underestimated breast cancer risk in the BWHS, which might be because of the higher breast cancer incidence in this cohort than in black women nationally. Age at first child’s birth may be an important factor to include in a prediction model for black women, as in the Gail model for white women. In this study, the discriminatory power of the CARE model in black women—like that of previous breast cancer risk prediction models in white populations—was modest. Discrimination was worse for ER− breast cancer, indicating a need to develop a better risk model for women with this breast cancer subtype.
This work was supported by Susan G. Komen for the Cure (KG111112) and the National Cancer Institute (R01CA058420). The content of this article is solely the responsibility of the authors and does not necessarily represent the official views of Susan G. Komen for the Cure, the National Cancer Institute, or the National Institutes of Health.
Data on breast cancer pathology were obtained from several state cancer registries (AZ, CA, CO, CT, DC, DE, FL, GA, IN, IL, KY, LA, MA, MD, MI, NC, NJ, NY, OK, PA, SC, TN, TX, and VA), and results reported do not necessarily represent their views. The authors had full responsibility for the study design, data collection, analysis and interpretation of the data, writing the manuscript, and decision to submit the manuscript for publication.
We gratefully acknowledge the continuing dedication of the Black Women’s Health Study participants and staff.