Benign breast diseases (BBD) encompass a number of histologic subtypes, with varying risks of subsequent breast cancer. Information on previous benign breast disease biopsies has been incorporated into breast cancer risk prediction models; however, the type of histologic lesion has not been taken into account. Given the substantial heterogeneity in breast cancer risk dependent on type of benign lesion, we evaluated whether incorporating this level of detail improves the discriminatory power of risk classification models.
Using data from the Nurses’ Health Study breast cancer nested case-control study (240 cases; 1036 controls), we determined predictors of categories of BBD lesions and developed imputation models. The type of BBD, imputed for each cohort member reporting a diagnosis, was added to a modified version of the Rosner-Colditz breast cancer risk prediction model.
Compared to the model with only previous BBD (yes/no), the model with categories of benign breast disease was significantly improved (p<0.0001). Overall, including type of BBD increased the concordance statistic from 0.628 to 0.635. Using risk reclassification, inclusion of the type of BBD resulted in a 17% increase in incidence per increase of 1 risk decile, holding the model without BBD type risk decile constant.
Although these data suggest that inclusion of category of BBD may improve breast cancer risk classification, the clinical utility of such a model will depend on the consistency of histologic classification of benign breast disease lesions.
There is consistent evidence that women diagnosed with benign breast disease (BBD) are at an increased risk of breast cancer compared to women without a history of benign breast disease1. Though BBD encompasses a number of histologic subtypes, women with such a diagnosis have double the risk of breast cancer compared to women without. Increased morphologic data have refined our ability to estimate a woman’s risk of subsequent breast cancer. Compared to women without benign breast disease or with nonproliferative lesions, the risk of developing breast cancer increases 1.5 to two-fold for women with proliferative changes without atypia and three to five-fold for women with atypical hyperplasia 2–5.
The initiation of wide-scale mammographic screening has made a diagnosis of benign breast disease a common occurrence. Because of the strong association between having a BBD and subsequent risk of breast cancer, information on previous benign breast disease has been incorporated into breast cancer risk prediction models6, 7. However, the level of detail is usually limited to ever having a previous BBD (yes/no)6 or number of previous biopsies8 and does not take into account the histologic category of the benign lesion. The original Gail model7 included the number of previous biopsies and presence of atypical hyperplasia (yes/no) and the Tyrer-Cuzick model includes both previous atypical hyperplasia and lobular carcinoma in situ (LCIS)9. Breast cancer risk prediction models have been used to determine eligibility for clinical trials10 and to identify women at high risk who may benefit from chemoprevention11. Although these models perform well in estimating the number of breast cancers that will occur in a population, their ability to discriminate between individuals who will develop breast cancer and those who will not is modest. Given the substantial heterogeneity in risk of breast cancer depending on type of benign lesion, we evaluated whether incorporating this level of histologic detail improves the discriminatory power of breast cancer risk classification models.
The Nurses’ Health Study (NHS) cohort was initiated in 1976, when 121,700 US registered nurses ages 30 to 55 returned an initial questionnaire. Every 2 years, information on reproductive variables, body mass index, exogenous hormone use, and disease outcomes has been updated. This study was approved by the Committee on Human Subjects at Brigham and Women’s Hospital.
The population of women whose data have been used in this analysis has been described in detail 6, 12. Briefly, we excluded women with unknown, inconsistent or out of range reports for height, weight in 1976 or at age 18, age at menarche or menopause, or each pregnancy, parity and duration or type of postmenopausal hormone use. Additionally, women with a simple hysterectomy were excluded, as were women with prevalent cancers in 1976 or no follow-up after 1978. Overall, 75,022 participants remained in the analysis. These women contributed 1,167,715 person years from 1980 to 2000, during which 3,221 incident, invasive cases of breast cancer occurred.
We conducted a case–control study nested within the subcohort of participants in the NHS and Nurses’ Health Study II (NHS II) with a biopsy-confirmed BBD. Similar to NHS, the NHS II is also an ongoing cohort study of over 116,000 US female nurses who were 25 to 42 years of age in 1989 when the study was initiated. The methods developed to follow participants and confirm incident cancers and death in the study have been described previously13.
Beginning with the initial NHS questionnaire in 1976, participants have been asked on every biennial questionnaire to report any diagnosis of fibrocystic disease or other BBD. Beginning in 1982, the NHS questionnaires asked whether the participant had a biopsy-confirmed BBD. The initial 1989 NHS II questionnaire and all subsequent questionnaires also asked participants to report any diagnosis of BBD and to indicate whether it was confirmed by biopsy. Among women for whom we were able to obtain specimens, 95% of self-reported BBD was confirmed by pathology review14, 15. On centralized review, the pathologists considered women with the following histologies to have a benign breast disease: cysts, apocrine metaplasia, mild hyperplasia, fibroadenoma, moderate or florid hyperplasia, intraductal papilloma, sclerosing adenosis, atypical ductal hyperplasia and atypical lobular hyperplasia.
Within the subcohort of women with a biopsy-confirmed BBD, eligible cases were women who reported a first diagnosis of breast cancer between 1976 and return of the 1996 questionnaire (NHS) or between 1989 and the return of the 1995 questionnaire (NHS II). Incident breast cancer cases in both cohorts were identified through the nurses’ own reports and were confirmed by review of medical records. Eligible controls were women who did not have a diagnosis of breast cancer at the time the case was diagnosed and also had a previous biopsy-confirmed BBD. Controls were matched to cases on year of birth and year of biopsy. Attempts were made to identify four matched controls for each case, when possible.
We identified incident confirmed breast cancer cases diagnosed after return of the initial questionnaire through the 1996/1995 follow-up cycle and controls who also reported a previous biopsy-confirmed BBD. This nested case-control study has been described in detail previously16. Briefly, a total of 1,310 cases were originally identified for this study, and 5,273 matched controls were selected. More than 70% of eligible participants confirmed their BBD biopsy and granted permission to review their pathology slides. We received specimens for 465 cases and 1,939 controls. There were no significant differences in the success of obtaining slides between cases and controls. Approximately 98% of pathology specimens obtained were of good quality and were evaluated by study pathologists (431 cases and 1,869 controls). After excluding participants whose benign biopsy specimens were of poor quality or had no breast tissue, evidence of carcinoma in situ or invasive carcinoma, invalid dates of diagnosis, or insufficient information on laterality, there were a total of 395 breast cancer cases and 1,610 controls16, 17.
Hematoxylin and eosin stained biopsy slides were independently reviewed by one of two collaborating pathologists (SJS, JLC) in a blinded fashion. Any slide identified as having atypia or questionable atypia was jointly reviewed by the two pathologists. For each set of slides reviewed, a detailed work sheet was completed. BBDs were classified according to the Page classification system 18 into one of three categories: nonproliferative, proliferative without atypia (PWOA), or atypical hyperplasia (AH). To mimic the larger population that would be used in the risk prediction modeling, only cases and controls meeting the specific inclusion criteria described above were included in the analysis. Thus, women with unknown type of menopause, simple hysterectomy (and therefore unknown age at menopause), or unknown age at menopause were not included (n=729). There were a total of 1,276 women (240 cases and 1,036 controls) included from the nested case-control study.
We fit the log-incidence model of breast cancer to incident breast cancer cases. This model has been described in detail previously 6, 19. We assume that incidence at time t (I_t) is proportional to the number of cell divisions C_t accumulated throughout life up to age t, that is, for some constant k,
$$I_t = k\,C_t$$
The cumulative number of breast cell divisions is factored as follows:
$$C_t = C_0 \times \frac{C_1}{C_0} \times \frac{C_2}{C_1} \times \cdots \times \frac{C_t}{C_{t-1}}$$
Thus λi = Ci+1 / Ci represents the rate of increase of breast cell divisions from age i to age i+1. Log (λi) is assumed to be a linear function of risk factors that are relevant at age i. The set of risk factors and their magnitude may vary according to the stage of reproductive life. Details of the representation of the Ci are given in Colditz and Rosner6.
The general rationale for a log-incidence model is that the number of precancerous cells increases multiplicatively with time, but that historical exposures differentially affect the rate of increase. Specifically, for breast cancer, the number of precancerous cells is assumed to increase annually at the rate of exp(β0) prior to menopause for nulliparous women; at the rate exp(β0 + β1s) prior to menopause for parous women with parity = s, and so forth. Finally, the number of pre-cancerous cells increases immediately after the first birth by exp[β2 (t1 - t0)], where t1 is the age at first birth and t0 is age at menarche. The incidence rate of breast cancer is assumed to be approximately proportional to the number of precancerous cells.
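The multiplicative accumulation described above can be sketched in a few lines of Python. This is a minimal illustration for a premenopausal woman only, not the fitted Rosner-Colditz model: the function name, the coefficient values, and the choice to start accumulation at menarche are all assumptions made for the example.

```python
import math

def log_cell_divisions(age, menarche, birth_ages=(),
                       beta0=0.10, beta1=-0.003, beta2=0.002):
    """Return log(C_age / C_menarche) for a premenopausal woman.

    beta0, beta1, beta2 are hypothetical values, not published estimates.
    """
    births = sorted(birth_ages)
    total = 0.0
    for i in range(menarche, age):
        parity = sum(1 for b in births if b <= i)  # parity s at age i
        total += beta0 + beta1 * parity            # log(lambda_i)
    if births and births[0] <= age:
        # one-time change at first birth: beta2 * (t1 - t0)
        total += beta2 * (births[0] - menarche)
    return total

# Relative risk for a 1-year later menarche, nulliparous woman aged 40
rr_nulliparous = math.exp(log_cell_divisions(40, 13) - log_cell_divisions(40, 12))
# Same comparison for a woman with a first birth at age 25
rr_parous = math.exp(log_cell_divisions(40, 13, (25,)) - log_cell_divisions(40, 12, (25,)))
```

Under these assumptions the two ratios reduce to exp(−β0) and exp[−(β0 + β2)], which is the relative-risk reading of the parameters given in the text.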
The log-incidence model was fit using iteratively-reweighted least squares, with PROC NLIN in SAS statistical software version 9.1 (Cary, North Carolina). The parameters of the model are readily interpretable in a relative risk (RR) context. For example, exp (−β0) = RR for a 1-year increase in age at menarche among nulliparous women, exp [−(β0 + β2)] = RR for a 1-year increase in age at menarche among parous women, and so forth. In this analysis, women were censored if they developed other types of cancer except non-melanoma skin cancer or if they died.
Ideally, we would have information on type of BBD from centralized pathology review for each study participant. However, since this was not logistically possible, we used an indirect approach to impute the probability of each of the categories of BBD among women who reported a diagnosis of BBD.
Let x1 = nonproliferative BBD, x2= proliferative BBD without atypia, x3=atypical hyperplasia, and ẕ = other covariates in the risk prediction model.
From the main study, we can obtain Pr(D|ẕ) given by
under the rare disease assumption.
We want to estimate Pr(D|x1, x2, x3, ẕ), where under the rare disease assumption
From the nested case-control study, we can estimate δ1*, δ2*, and δ3* based on the polytomous logistic regression model. Indeed, we could in principle also estimate β̱* from the breast cancer nested case-control study, but the estimates will be very imprecise, due to the small sample size.
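For orientation, one log-linear form consistent with the symbols named above is sketched here. The intercepts α and α* and the coefficient vector β̱ are notation assumed for this illustration, and the exact parameterization (for example, how women without BBD enter the model) may differ from the one used in the analysis.

$$\Pr(D \mid \underline{z}) \approx \exp\!\left(\alpha + \underline{\beta}^{\top}\underline{z}\right) \qquad (1)$$

$$\Pr(D \mid x_1, x_2, x_3, \underline{z}) \approx \exp\!\left(\alpha^{*} + \delta_1^{*}x_1 + \delta_2^{*}x_2 + \delta_3^{*}x_3 + \underline{\beta}^{*\top}\underline{z}\right) \qquad (2)$$

Under the rare disease assumption the logistic and exponential forms nearly coincide, which is why each exponentiated coefficient can be read as a relative risk.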
Therefore, we used the main study population to estimate the parameters in Equation 2 by estimating x1, x2, and x3 for all subjects in the main study reporting a BBD. Using the breast cancer nested case-control study in which type of BBD was determined for each participant, we developed a polytomous logistic regression model to predict outcome of proliferative benign breast disease without atypia and atypical hyperplasia in comparison to nonproliferative benign breast disease. The covariates included as predictors of type of BBD in the model were: age at biopsy, menopausal status at biopsy, nulliparous at biopsy, early breast cancer case (within 8 years of biopsy), and late breast cancer case (≥8 years after biopsy).
In the breast cancer nested case-control study, cases are more likely than controls to have a proliferative BBD and specifically atypical hyperplasia. The rationale for including case status as a covariate in these equations is to account for this relationship in the main study as well. Early cases were considered those below the median time from biopsy to breast cancer diagnosis (8 years), and late cases were breast cancer cases diagnosed 8 or more years after BBD biopsy. We then applied the estimates obtained from the nested case-control study to the larger cohort to estimate the probability of having each of the three types of BBD lesions (p1, p2, p3).
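As a rough illustration of how a polytomous (multinomial) logistic model turns covariates into the three probabilities (p1, p2, p3), the sketch below uses nonproliferative BBD as the reference category. The covariate names and every coefficient value are hypothetical placeholders, not the estimates in Table 2.

```python
import math

def bbd_type_probabilities(covariates, coef_pwoa, coef_ah):
    """Return (p1, p2, p3) for nonproliferative, PWOA, and AH.

    covariates, coef_pwoa, coef_ah are dicts keyed by covariate name;
    the reference category (nonproliferative) has linear predictor 0.
    """
    eta_pwoa = sum(coef_pwoa[name] * value for name, value in covariates.items())
    eta_ah = sum(coef_ah[name] * value for name, value in covariates.items())
    denom = 1.0 + math.exp(eta_pwoa) + math.exp(eta_ah)
    return 1.0 / denom, math.exp(eta_pwoa) / denom, math.exp(eta_ah) / denom

# Hypothetical woman: biopsy at age 45, premenopausal, parous, late case
woman = {"intercept": 1, "age_at_biopsy": 45, "postmenopausal": 0,
         "nulliparous": 0, "early_case": 0, "late_case": 1}
# Hypothetical coefficients (log odds relative to nonproliferative BBD)
coef_pwoa = {"intercept": -1.0, "age_at_biopsy": 0.01, "postmenopausal": -0.2,
             "nulliparous": 0.3, "early_case": 0.2, "late_case": 0.3}
coef_ah = {"intercept": -3.0, "age_at_biopsy": 0.02, "postmenopausal": 0.1,
           "nulliparous": 0.3, "early_case": 1.3, "late_case": 1.0}

p1, p2, p3 = bbd_type_probabilities(woman, coef_pwoa, coef_ah)
```

The three probabilities always sum to 1, so they can be used directly as cut points for a single uniform random draw.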
We then imputed the type of BBD for women in the larger cohort who reported a BBD, applying the betas from the polytomous logistic regression models to estimate the probability of each category of BBD for each woman reporting benign breast disease in the larger cohort. To impute the type of BBD for each woman with BBD in the main study, we drew a random number (u) using the RANUNI function of SAS. If u<p1, we designated the woman as having nonproliferative BBD; if p1 ≤ u <p1 + p2 we designated the woman as having proliferative BBD without atypia; if u≥p1 + p2 we designated the woman as having atypical hyperplasia.
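The assignment rule above is an inverse-CDF draw from a three-category distribution. A minimal Python sketch follows, with Python's `random` module standing in for the SAS RANUNI function; the function name, the seed, and the example probabilities are assumptions made for illustration.

```python
import random

def assign_bbd_type(u, p1, p2):
    """Map a uniform draw u in [0, 1) to a BBD category.

    p1 = Pr(nonproliferative), p2 = Pr(proliferative without atypia);
    the remaining mass 1 - p1 - p2 is atypical hyperplasia.
    """
    if u < p1:
        return "nonproliferative"
    if u < p1 + p2:
        return "PWOA"
    return "AH"

rng = random.Random(2024)  # fixed seed so the illustration is reproducible
# Hypothetical probabilities for one woman
imputed = assign_bbd_type(rng.random(), p1=0.50, p2=0.40)
```

Repeating the draw with a different random stream for each imputation gives the separate imputed data sets that are later combined.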
We then fit Equation 2 using the imputed indicators x̂1, x̂2, and x̂3 instead of x1, x2, and x3, thus obtaining the model:
Since the parameter estimates above (Equation 3) may be influenced by random error, we repeated this imputation approach four additional times and used multiple imputation20 to combine estimates from the separate imputations to obtain an overall estimate. In addition, we included an additional category for BBD that could not be classified into one of the three categories because they were missing necessary information (e.g., age at BBD, menopausal status at BBD) for the prediction model (n=7,707).
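Estimates from the five imputations (the original plus four repeats) have to be pooled into one estimate and one variance. The sketch below assumes the standard combining rules for multiple imputation (Rubin's rules); the text cites multiple imputation without spelling out the formulas, and the numbers here are invented for illustration.

```python
import statistics

def combine_imputations(estimates, variances):
    """Pool per-imputation estimates and their variances.

    Returns (pooled estimate, total variance), where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)        # average sampling variance
    between = statistics.variance(estimates)   # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical log relative risks and variances from five imputations
log_rr = [1.0, 1.2, 1.1, 0.9, 1.3]
var_log_rr = [0.01, 0.01, 0.01, 0.01, 0.01]
pooled, total_var = combine_imputations(log_rr, var_log_rr)
```

The between-imputation term is what carries the extra uncertainty due to not observing the true BBD type.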
To assess the additional predictive power of category of benign breast disease, we computed age-specific (5-year age groups) deciles of the risk function with BBD included as a yes/no variable, but without category of BBD (model A), and then including imputed BBD category (model B). From the cross-classification of model A risk decile × model B risk decile, we then compared the observed number of cases in specific risk deciles of model B with the expected number of cases within strata defined by model A risk decile. Specifically, let Xij = number of breast cancer cases, Nij = number of person-years, and P̂ij = Xij/Nij = estimated incidence rate within the ith age-specific risk decile for model A and the jth age-specific risk decile for model B, and let ln(Pij) = αi + β(j−1) 21, 22. Then exp(β) is the estimated rate ratio, and exp(β) − 1 the estimated percent increase, in breast cancer incidence for an increase of one model B risk decile, holding the model A risk decile constant.
In addition, to assess the additional predictive ability of our risk prediction models, we used the area under the receiver operating characteristic (ROC) curve (i.e., the concordance or C statistic). This statistic ranges from 0.5 to 1.0 and represents the probability that, for a randomly selected pair of women, one with breast cancer and one without breast cancer, the woman with breast cancer has the higher estimated disease probability. Also, we compared the C statistic for different risk prediction rules23.
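The pairwise definition of the C statistic can be computed directly, as in the sketch below. This brute-force version is meant only to make the definition concrete; counting ties as one half is an assumed convention, and the scores are made up.

```python
def c_statistic(scores, labels):
    """Probability that a random case has a higher score than a random control.

    scores: estimated risks; labels: 1 = breast cancer case, 0 = non-case.
    Tied pairs count as 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Two cases and two non-cases with hypothetical predicted risks
c = c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```

In this toy example three of the four case/non-case pairs are ordered correctly, so the statistic is 0.75.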
As has been demonstrated previously, within this data set, women with proliferative BBD lesions are at an increased risk of breast cancer. Women with proliferative disease without atypia are at a 30% increased risk (RR=1.29, 95%CI 0.93–1.79; Table 1) and those with atypical hyperplasia are at a 3.5-fold increased risk of breast cancer (RR=3.47, 95%CI 2.26–5.34; Table 1) relative to women with nonproliferative BBD.
Breast cancer case status and nulliparity were the strongest predictors of type of BBD (Table 2). In addition, age and menopausal status at biopsy were modest predictors of type of BBD. The effect of nulliparity was similar for PWOA and AH; all other variables included in the model varied for PWOA and AH. No other variables from the Rosner and Colditz model were significantly associated with type of BBD.
The type of BBD, imputed for each cohort member reporting a BBD, was added to a modified version of the Rosner and Colditz model (Table 3). In total, 1,164,494 person-years with 3,221 breast cancer cases were included in this analysis. Women with nonproliferative BBD were at a nonsignificant 10% increased risk of breast cancer relative to women without a BBD (RR=1.10, 95%CI 0.97–1.25). Compared with women without BBD, women with PWOA had a 47% increased risk of breast cancer (RR=1.47; 95%CI 1.34–1.61), and women with atypical hyperplasia had a 3-fold increased risk of breast cancer (RR=3.02; 95%CI 2.57–3.55). Women with unclassified type of BBD had a 50% increased risk of breast cancer relative to women without BBD (RR=1.49, 95%CI 1.31–1.69). Compared with using only BBD (yes/no), adding specific categories of benign breast disease significantly improved the model (difference in -2Log Likelihood=1331.86, 3df, p<0.0001). Overall, including type of BBD increased the concordance statistic from 0.628 to 0.635 (Table 4). Because not all women will have BBD, we also calculated the area under the ROC curve separately for women with BBD and women without BBD. Among women with BBD, 422,986 person-years and 1,576 breast cancer cases contributed to this analysis. In the population with BBD, the improvement in the concordance statistic with type of BBD was 0.03, while there was no improvement in the statistic when applied to women without BBD.
Cross-classifying model A (without category of BBD) risk deciles with model B (with BBD category) risk deciles (Table 5) reveals that there are substantial differences in estimated incidence. Overall, the observed number of cases was higher than the expected when the model B decile was high and lower than expected when the model B decile was low, relative to model A. The overall slope was β = 0.16 (p< 0.001) indicating a significant estimated 17% increase in breast cancer incidence for an increase of one model B age-specific risk decile, holding the age-specific model A risk decile constant. Thus, adding category of BBD to the risk prediction model increases its predictive power.
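The step from the slope to the percent increase is a one-line calculation under the model ln(Pij) = αi + β(j−1):

$$\exp(\hat\beta) = \exp(0.16) \approx 1.17, \qquad \exp(\hat\beta) - 1 \approx 0.17 = 17\%$$

As an extrapolation that assumes the log-linear trend holds across all ten deciles, moving from the lowest to the highest model B decile within a fixed model A decile would correspond to

$$\exp(9 \times 0.16) = \exp(1.44) \approx 4.2$$

that is, roughly a four-fold difference in incidence.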
Previous work in the Nurses’ Health Study suggests that age at menarche and menopause may modify the association between BBD and subsequent breast cancer risk 6. Similar to what is seen in the larger cohort 6, women without BBD experience the protective effects of late age at menarche, while those with any type of BBD do not (Table 6). In contrast, an early age at menopause appears to be protective for all women regardless of BBD status (Table 6).
In secondary analyses, we utilized the type of BBD determined by central review for the subset of women whose specimens had undergone centralized review, rather than the imputed type of BBD, and results were nearly identical. In addition, we also conducted analyses in which imputation of type of BBD was restricted to those women whose first BBD was confirmed by biopsy and had the necessary information, and an additional 10,008 women with BBD not confirmed by biopsy were included in the unclassified BBD category. These results were very similar to when imputation was conducted on all women with a BBD regardless of biopsy status.
In the Nurses’ Health Study, we found that type of benign breast disease category, as imputed from a nested case-control study with centralized pathology review, added significantly to a modified Rosner-Colditz breast cancer risk prediction model. Using risk reclassification, inclusion of the type of BBD resulted in a 17% increase in incidence per increase of 1 risk decile, holding the model without BBD type risk decile constant. The increase in the C-statistic was also statistically significant, especially when restricted to women with BBD. The Rosner-Colditz breast cancer risk prediction model is a log-incidence model, which fits numerous time-varying epidemiologic risk factors efficiently to a large data set. The complex nature of breast cancer incidence, with many time-dependent risk factors, requires prediction models that account for change in risk factors over time. Such models outperform traditional approaches that fit indicator variables with fixed effects across time 24. To use this model, one needs to record year of birth, age at menarche, age at first birth and at each subsequent birth, age at menopause and type of menopause, history of benign breast disease, family history of breast cancer in mother or sister, height, weight at age 18 and currently, use of postmenopausal hormones (including type and duration of use), and alcohol intake. Although the model requires a more extensive list of personal factors than the Gail or Tyrer-Cuzick model, each of these characteristics represents an established reproductive or behavioral risk factor for breast cancer25.
There are a few additional differences between the Rosner-Colditz model and other breast cancer risk prediction models. Of note, the Gail model 28 does not include details of menopause or use of postmenopausal hormones in its prediction algorithm. These are clearly established risk factors 26 and accordingly the model performance after including these factors is improved. We have not compared this model against the model developed by Tyrer and Cuzick 9, which incorporated BRCA1 and BRCA2 estimation and a hypothetical low penetrance gene, as well as some personal risk factors (including age at menarche, age at first birth, height, BMI, and age at menopause). With respect to incorporating benign breast diseases into these models, the Gail model 2 includes number of biopsies and presence of atypical hyperplasia8, and the Tyrer-Cuzick model includes previous atypical hyperplasia and lobular carcinoma in situ (yes/no) only 9.
The strengths of this study include the large size of the cohort, prospectively collected data, and centralized pathology review for a subset of women. By using both risk factors and breast cancer case status in polytomous logistic regression models, our imputed categories of benign breast disease as applied to the larger cohort accounted for both the association between category of BBD and breast cancer and the correlation between BBD and other risk factors already in the risk prediction model.
A limitation of this study is that we did not have category of BBD determined by central pathology review on all cohort members with BBD. In imputing category of benign breast disease, only age, menopausal status and nulliparity at the time of BBD, and breast cancer case status were significant predictors. Although some degree of misclassification is expected in the imputed categories of BBD, the relative risk estimates for breast cancer were nearly identical to those observed in our own and other studies with centralized pathology review. Although our nested case-control study from which the imputation model was developed was limited to women with a biopsy-confirmed BBD, our primary analysis imputed type of BBD for all women reporting a benign breast disease. Results of secondary analyses in which imputation was restricted to women whose first benign breast disease was confirmed by biopsy were very similar. One explanation for the similar results is that women without a biopsy-confirmed BBD may have a distribution of histologic classifications similar to that of women whose BBD is confirmed by biopsy. An alternative explanation may reflect the methods used in the current study. In this study, we used a woman’s first self-report of BBD to impute the type of BBD she had and did not update with subsequent reports of BBD. Thus, it is possible that a woman with a diagnosis of BBD without biopsy confirmation may in fact go on to have a second BBD which is biopsy-confirmed. Given that the model is similarly improved when imputation is applied to all women with BBD regardless of biopsy status, we presented those as our primary results.
With inclusion of imputed category of BBD, the C-statistic increased from 0.628 to 0.635. There is increasing acknowledgment of the limitations of using the ROC curve in evaluating risk prediction 27–29. The relationship of one risk factor, or a combination of risk factors, with disease must be very strong (relative risks on the order of 100-200 between exposed and unexposed) to serve as a screening tool at the individual level 30–32.
Although the measures of model change suggest significant improvements, the magnitude of the effects is modest. One factor contributing to this is that the change to the model only applies to a subset of women: those with benign breast disease. In addition, the original model includes a variable for BBD. The beta estimate of this parameter in the original model is very similar in magnitude to that for women classified with proliferative BBD without atypia and women with BBD that could not be classified. Thus, estimates from the modified Rosner-Colditz model with BBD category will differ from those of the original model for only a small percentage of women. The majority of the population did not report a BBD and thus their estimated risk will not change. Thus, on a population level, the effect of including these variables is small, and the greatest impact is on women with BBD.
Recent work has demonstrated improvements in the area under the ROC curve when mammographic density 33 and breast cancer genetic susceptibility loci 34 were added to the National Cancer Institute’s Breast Cancer Risk Assessment Tool (BCRAT). There was an increase in average area under the ROC curve of 0.047 with the addition of mammographic density33, and 0.025 with the addition of 7 breast cancer SNPs 34. The improvement in the prediction model with the addition of mammographic density and genetic SNPs is similar to what we observed when we restricted the analysis to women with BBD (difference in AUC=0.03).
Although these data suggest that inclusion of category of BBD may improve breast cancer risk classification, the clinical utility of such a model will depend on the consistency of histologic classification of BBD lesions. Continued expansion of current models with other risk factors that can be estimated on everyone (e.g., mammographic density 35) may further improve breast cancer risk classification.
We thank Drs. Stuart Schnitt and James Connolly for their expertise in breast pathology and review of the Nurses’ Health Study benign breast disease slides. We also thank the participants of the Nurses’ Health Study for their continued participation and dedication to the study.
Sources of Funding/Support: Public Health Service Grants CA046475, CA087969, SPORE in Breast Cancer CA089393, from the National Cancer Institute, National Institutes of Health, and the Breast Cancer Research Foundation. Dr. Colditz is supported in part by an American Cancer Society Cissy Hornung Clinical Research Professorship.
Financial Disclosures: None