Conceived and designed the experiments: RMP ARK JVL PH. Analyzed the data: RMP YP DP. Contributed reagents/materials/analysis tools: RTG SSB AH BR. Wrote the first draft of the manuscript: RMP ARK. Contributed to the writing of the manuscript: RMP YP ARK JVL MHG PH. ICMJE criteria for authorship read and met: RMP YP ARK JVL DP RTG SSB AH BR MHG PH. Agree with manuscript results and conclusions: RMP YP ARK JVL DP RTG SSB AH BR MHG PH. Designed the software used in analysis: DP.
Breast, endometrial, and ovarian cancers share some hormonal and epidemiologic risk factors. While several models predict absolute risk of breast cancer, there are few models for ovarian cancer in the general population, and none for endometrial cancer.
Using data on white, non-Hispanic women aged 50+ y from two large population-based cohorts (the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial [PLCO] and the National Institutes of Health–AARP Diet and Health Study [NIH-AARP]), we estimated relative and attributable risks and combined them with age-specific US-population incidence and competing mortality rates. All models included parity. The breast cancer model additionally included estrogen and progestin menopausal hormone therapy (MHT) use, other MHT use, age at first live birth, menopausal status, age at menopause, family history of breast or ovarian cancer, benign breast disease/biopsies, alcohol consumption, and body mass index (BMI); the endometrial model included menopausal status, age at menopause, BMI, smoking, oral contraceptive use, MHT use, and an interaction term between BMI and MHT use; the ovarian model included oral contraceptive use, MHT use, and family history of breast or ovarian cancer. In independent validation data (Nurses' Health Study cohort), the breast and ovarian cancer models were well calibrated; expected-to-observed cancer ratios were 1.00 (95% confidence interval [CI]: 0.96–1.04) for breast cancer and 1.08 (95% CI: 0.97–1.19) for ovarian cancer. The number of endometrial cancers was significantly overestimated, expected/observed=1.20 (95% CI: 1.11–1.29). The areas under the receiver operating characteristic curves (AUCs; discriminatory power) were 0.58 (95% CI: 0.57–0.59), 0.59 (95% CI: 0.56–0.63), and 0.68 (95% CI: 0.66–0.70) for the breast, ovarian, and endometrial models, respectively.
These models predict absolute risks for breast, endometrial, and ovarian cancers from easily obtainable risk factors and may assist in clinical decision-making. Limitations are the modest discriminatory ability of the breast and ovarian models and that these models may not generalize to women of other races.
Please see later in the article for the Editors' Summary
Several statistical models predict a woman's probability or absolute risk of developing invasive breast cancer based on her age and reproductive, medical, and lifestyle factors. Many risk factors associated with breast cancer are also associated with the risk of other gynecologic cancers. For example, four of the seven risk factors used in the publicly available Breast Cancer Risk Assessment Tool (BCRAT; http://www.cancer.gov/bcrisktool) are also strongly associated with ovarian and endometrial cancer risks: current age, age at menarche, parity, and first-degree family history of breast cancer. Therefore, a woman with a high breast cancer risk due to combinations of these risk factors likely also has above-average endometrial or ovarian cancer risk. However, while some models have been proposed for predicting ovarian cancer risk, no model that predicts the absolute risk of endometrial cancer is available, even though endometrial cancer is the fourth most common cancer in women, and 1 in 38 women will be diagnosed with endometrial cancer during her lifetime. With rates of obesity increasing and rates of hysterectomy declining in many regions of the US, endometrial cancer incidence may rise further, so it is important to identify women at highest risk for this disease.
Absolute risk prediction models provide useful information for health care providers and patients and aid in the design and recruitment phase of studies of preventive interventions. Several large chemoprevention studies of tamoxifen (Nolvadex; AstraZeneca) and raloxifene (Evista; Lilly) used BCRAT-projected breast cancer risk to determine eligibility and to estimate required sample sizes. These models can also aid clinical management of women at elevated risk of one or more of these outcomes. For example, if a woman presents with abnormal uterine bleeding but is found not to have endometrial cancer in a subsequent workup, an estimate of her risk of developing endometrial cancer over the next 5 or 10 y may help her and her physician decide on further steps, including having a hysterectomy or taking progestin. Knowing her risk of endometrial cancer in addition to her risk of breast cancer could also inform a woman's decision about whether to consider tamoxifen for breast cancer prevention, as tamoxifen reduces breast cancer risk but increases the risk of endometrial cancer. Additionally, knowing her ovarian cancer risk might influence her decision regarding prophylactic bilateral oophorectomy to reduce her risk of breast cancer.
We developed absolute risk prediction models for breast, endometrial, and ovarian cancer by combining data from two large prospective cohorts at the National Cancer Institute (NCI)—the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) and the National Institutes of Health–AARP Diet and Health Study (NIH-AARP)—and from incidence and competing mortality rates in the NCI's Surveillance, Epidemiology, and End Results Program (SEER). The SEER cancer registries cover approximately 28% of the US population (http://seer.cancer.gov/about/overview.html).
All models were validated using independent data from the Nurses' Health Study (NHS).
We used data on white, non-Hispanic women from two NCI cohorts: the PLCO cohort and the NIH-AARP cohort.
PLCO has been described in detail previously. In brief, PLCO enrolled 78,232 women, aged 55 to 74 y at baseline, between November 1993 and June 2001 at ten screening centers. Women were eligible if they had no history of lung, colorectal, or ovarian cancer and were neither undergoing cancer treatment nor participating in other screening or prevention trials. Women who had undergone bilateral oophorectomy or were taking tamoxifen were initially ineligible but were later included. Women randomized to the screening arm of the trial received a single-view chest X-ray annually for 4 y to screen for lung cancer; a CA (cancer antigen) 125 blood test annually for 6 y and a transvaginal ultrasound yearly for 4 y, both to screen for ovarian cancer; and flexible sigmoidoscopy at the beginning of the trial and either 3 y or 5 y later to screen for colorectal cancer. Institutional review boards at the NCI and screening centers approved the study.
At entry, participants completed a self-administered lifestyle questionnaire. Cancers were ascertained via annual study updates and death certificates, and verified via review of medical records.
NIH-AARP has been described previously. It included 567,169 men and women who, in 1995–1996, were 50–71 y old and resided in one of eight states. Participants returned a self-administered baseline questionnaire and a second, more detailed questionnaire, sent 6 mo after the baseline questionnaire. The NCI Special Studies Institutional Review Board approved the study.
Cancer cases were identified through linkage with state cancer registries, with a 90% completeness rate. All cases had a histologic diagnosis. Vital status was ascertained through annual linkage to the Social Security Administration Death Master File and the National Death Index Plus.
In each study, we restricted the analysis to non-Hispanic, white women who completed the baseline questionnaire, had follow-up information, and had no personal history of the cancer of interest at baseline. For the ovarian and endometrial cancer models, we further excluded women with bilateral oophorectomy or hysterectomy, respectively (see Table 1 for further details on exclusions). After these exclusions, the NIH-AARP and PLCO study populations included 191,604 and 64,440 women, respectively, for the breast cancer analysis, 114,931 and 42,821 women, respectively, for the endometrial cancer analysis, and 151,165 and 58,282 women, respectively, for the ovarian cancer analysis.
Independent data from the NHS cohort from July 1990 to June 2004 were used to validate our models. The NHS cohort included 121,701 women aged 30–55 y in 1976. All participants gave informed consent at enrollment. With the same exclusions as for the relative risk (RR) models, the validations were based on 57,906 women for the breast cancer model and 37,241 for the endometrial cancer model. For the ovarian cancer absolute risk model, we additionally excluded women who reported removal of a single ovary or ovarian surgery with unknown status of the ovaries during follow-up, resulting in 56,638 women for validation. See Table 2 for details on exclusions. Breast and ovarian cancer incidence rates were similar to those in SEER, but endometrial cancer incidence was substantially lower, with an overall incidence at ages 50–74 y of 42.16 per 100,000 person-years, compared to 78.1 per 100,000 person-years in SEER for the same age range (see Tables 3 and 4 for further details).
We estimated separate RR models for breast, endometrial, and ovarian cancer using Cox proportional hazards regression (SAS, version 9.1; SAS Institute), with age as the timescale. For each cancer, the primary outcome was incident, invasive epithelial cancer. For each outcome, we first estimated RRs and 95% confidence intervals (CIs) separately for each cohort. For the final models, however, we used RRs from a Cox model fitted to the combined cohorts that included a study indicator (NIH-AARP versus PLCO) as a covariate. Analyses of the combined data that instead stratified the baseline hazard function of the Cox model on study yielded results that agreed with this regression approach to the third decimal place. Proportionality of the hazard functions was assessed by visual inspection of hazard plots and Schoenfeld residuals.
For each RR model, women were considered at risk from the age at study entry (randomization for PLCO and completion of the baseline questionnaire for NIH-AARP) until the age at the earliest of the following: (1) diagnosis of the cancer of interest, (2) death, or (3) administrative censoring (for PLCO, the most recent annual study update through December 31, 2005; for NIH-AARP, December 31, 2003). In PLCO, women were also censored at the date of unconfirmed self-reported cancers and of non-epithelial cancers (International Classification of Diseases for Oncology, Third Edition, morphology codes 8000 to 8573). For the breast cancer RR analysis, women were additionally censored at diagnosis of breast carcinoma in situ, and for the ovarian cancer RR analysis, at diagnosis of an ovarian tumor of low malignant potential in PLCO.
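The at-risk interval described above can be sketched in Python. This is a minimal illustration, not the authors' SAS code: the `Participant` record and `follow_up` function are hypothetical names, ages are in years, and the rule that a diagnosis wins a tie with another exit is an assumption. The output is the (entry, exit, event) triple that a Cox model with age as the timescale and delayed entry needs.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Participant:
    entry_age: float                    # age at randomization (PLCO) or baseline questionnaire (NIH-AARP)
    admin_censor_age: float             # age on the administrative censoring date
    dx_age: Optional[float] = None      # age at diagnosis of the cancer of interest
    death_age: Optional[float] = None   # age at death
    # Hypothetical extra censoring ages, e.g. in situ diagnosis or an unconfirmed self-report
    other_censor_ages: List[float] = field(default_factory=list)


def follow_up(p: Participant) -> Tuple[float, float, int]:
    """Return (entry_age, exit_age, event) for a left-truncated Cox model on the age scale.

    A woman leaves the risk set at the earliest of diagnosis, death, extra censoring,
    or administrative censoring; event is 1 only when that earliest exit is a diagnosis.
    """
    exits = [(p.admin_censor_age, 0)]
    exits += [(age, 0) for age in p.other_censor_ages]
    if p.death_age is not None:
        exits.append((p.death_age, 0))
    if p.dx_age is not None:
        exits.append((p.dx_age, 1))
    # Assumption: on a tie, the sort key (age, -event) lets the diagnosis count as the event.
    exit_age, event = min(exits, key=lambda e: (e[0], -e[1]))
    if exit_age <= p.entry_age:
        raise ValueError("exit age must be after entry age")
    return p.entry_age, exit_age, event
```

For example, a woman who entered at 58 y, had an in situ diagnosis at 61 y, and an invasive diagnosis at 64 y would be censored at 61 y with event = 0 in the breast cancer analysis.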
We considered the following risk factors (coding in parentheses): body mass index (BMI; <25, 25 to <30, 30 to <35, 35 to <40, and 40+ kg/m2), age at menarche (see comment on coding below), number of live-born children (parity; 0, 1, 2, 3–4, 5–9, 10+), age at first birth (no children or at age <16, 16–19, 20–24, 25–29, 30–34, 35–39, 40+ y), duration of oral contraceptive (OC) use (never or <1, 1–4, 5–9, 10+ y), menopausal status and age at natural menopause (still menstruating or age <40, 40–44, 45–49, 50–54, ≥55 y), status and duration of menopausal hormone therapy (MHT) use (never, current use, former use; 0, <5, 5–9, 10+ y), status and duration of estrogen and progestin MHT use (never, current use, former use; <1, 2, 3, …, 9, 10+ y), duration of unopposed estrogen MHT use (current use, former use, never; <1, 2, 3, …, 9, 10+ y), history of benign breast disease (yes/no) or breast biopsy (0, 1, 2, 3+ biopsies), first-degree family history of breast cancer (0, 1, 2+ first-degree relatives—mother, daughter, or sister—with a breast cancer diagnosis), first-degree family history of ovarian cancer (0, 1, 2+ first-degree relatives—mother, daughter, or sister—with an ovarian cancer diagnosis), previous gynecologic surgery (defined as hysterectomy and/or partial or bilateral oophorectomy), history of endometriosis (yes/no), history of uterine fibroids (yes/no), use of tobacco (never, former, current smoker; cigarettes per day smoked) and alcohol consumption (drinks/day), and serum CA 125 level at baseline (PLCO only). For PLCO, we tested potential interactions with randomization arm and, for ovarian cancer, method of detection (screen-detected versus not screen-detected). In most instances, the PLCO and NIH-AARP questionnaires allowed for identically coded variables. 
To harmonize age at menarche (PLCO categories <10, 10–11, 12–13, 14–15, 16+ y; NIH-AARP categories ≤12, 13–14, 15+ y), we randomly allocated women in the overlapping PLCO categories (e.g., 12–13 y) to the NIH-AARP categories (≤12 y or 13–14 y) using the age at menarche distribution from the nationally representative NHANES study. For OC use, we randomly assigned women in the PLCO ≤1 y category to the <1 y and 1+ y categories.
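The random allocation of overlapping categories can be sketched as follows, assuming that a woman in an overlapping PLCO category (for example, 12–13 y) is assigned to the lower NIH-AARP category with probability equal to the conditional share of the lower single year of age in a reference distribution. The proportions in `MENARCHE_PROPORTIONS` are hypothetical placeholders, not the NHANES values the authors used, and the function names are illustrative.

```python
import random

# HYPOTHETICAL single-year proportions of age at menarche (placeholders, not NHANES estimates).
MENARCHE_PROPORTIONS = {12: 0.30, 13: 0.28, 14: 0.15, 15: 0.06}


def split_probability(lower_year: int, upper_year: int) -> float:
    """P(menarche at lower_year | menarche at lower_year or upper_year)."""
    p_low = MENARCHE_PROPORTIONS[lower_year]
    return p_low / (p_low + MENARCHE_PROPORTIONS[upper_year])


def harmonize_menarche(plco_category: str, rng: random.Random) -> str:
    """Map a PLCO age-at-menarche category onto the NIH-AARP coding (<=12, 13-14, 15+)."""
    if plco_category in ("<10", "10-11"):
        return "<=12"  # lies entirely inside the NIH-AARP <=12 category
    if plco_category == "16+":
        return "15+"   # lies entirely inside the NIH-AARP 15+ category
    if plco_category == "12-13":
        return "<=12" if rng.random() < split_probability(12, 13) else "13-14"
    if plco_category == "14-15":
        return "13-14" if rng.random() < split_probability(14, 15) else "15+"
    raise ValueError(f"unknown PLCO category: {plco_category!r}")


# Allocate 10,000 women reporting 12-13 y; the share assigned to <=12 should be near 0.30 / 0.58.
rng = random.Random(2024)
allocated = [harmonize_menarche("12-13", rng) for _ in range(10_000)]
share_le12 = allocated.count("<=12") / len(allocated)
```

Seeding the generator makes the allocation reproducible; a real analysis would replace the placeholder proportions with survey-weighted estimates.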
Alcohol consumption was not ascertained in the PLCO control arm, and thus RRs for association with alcohol consumption were estimated from the PLCO intervention arm and NIH-AARP. RRs for duration of estrogen and progestin MHT use and duration of unopposed estrogen MHT use were estimated from women in the NIH-AARP cohort who had responded to the second questionnaire. Information for benign breast disease was missing on 20% of the women in the dataset, and we thus created an indicator for missing values. For all other variables, women with missing values were excluded as the number missing was very small (<5% for all variables; Table 5). We first assessed all risk factors listed above as possible predictors for each cancer as main effects. Final models included only variables that were significant in multivariable models with p<0.01. We chose a stringent p-value as we did not want to include variables with modest RRs that would not improve prediction. Model building was repeated using stepwise variable selection in Cox proportional hazards models and led to the same variables being selected for each cancer. We also assessed the significance at p<0.01 of all first-order interaction terms of variables included in the final models. We fitted variables with trends whenever appropriate. For all risk factors, the reference category was the lowest risk category, to facilitate attributable risk (AR) computations. To accommodate missing data, we calculated AR for the breast cancer model in women with complete data. Using time on study as the timescale, adjustment for calendar period or additional censoring at age of diagnosis of any other cancer (e.g., censoring the breast cancer models at diagnosis of endometrial cancer or ovarian cancer) or adjustment for gynecologic surgery did not change the results appreciably.
The absolute risk π(a,b) of cancer c (c = breast, endometrial, or ovarian cancer) in the age interval (a,b] is the probability of developing cancer c during that interval, given that a woman is alive and free of cancer c at the beginning of the interval. The absolute risk is reduced by death from causes other than cancer c and is defined by

$$\pi(a,b)=\int_a^b \lambda_1(t)\,\exp\left\{-\int_a^t\left[\lambda_1(u)+\lambda_2(u)\right]du\right\}dt. \qquad (1)$$
In Equation 1, λ1 and λ2 are the cause-specific hazard rates for cancer c and for competing causes, respectively.
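When the hazards are constant within age intervals, as in the piecewise-constant approximation described in the next paragraph, the integrals in Equation 1 have a closed form: each piece contributes the probability of surviving to its start, times λ1/(λ1+λ2), times the probability of any event within the piece. The sketch below evaluates that sum; the function name and the hazard values in the example are hypothetical, hazards are per person-year, and this is an illustration rather than the authors' released SAS software.

```python
import math
from typing import Sequence


def absolute_risk(a: float, b: float, breaks: Sequence[float],
                  h1: Sequence[float], h2: Sequence[float]) -> float:
    """Equation 1 with hazards that are constant within age intervals.

    breaks = [t0, t1, ..., tK] are increasing age cut points; h1[k] (cancer c) and
    h2[k] (competing causes) apply on [t_k, t_{k+1}). Returns pi(a, b).
    """
    if not (len(h1) == len(h2) == len(breaks) - 1):
        raise ValueError("need one (h1, h2) pair per age interval")
    if not (breaks[0] <= a < b <= breaks[-1]):
        raise ValueError("projection interval must lie within the age grid")
    risk = 0.0
    cum_hazard = 0.0  # integral of h1 + h2 from a to the start of the current piece
    for k in range(len(breaks) - 1):
        lo, hi = max(a, breaks[k]), min(b, breaks[k + 1])
        if hi <= lo:
            continue  # piece k does not overlap (a, b]
        total = h1[k] + h2[k]
        width = hi - lo
        if total > 0:
            # Survive to `lo`, then have cancer c as the first event within this piece.
            risk += math.exp(-cum_hazard) * h1[k] / total * (1.0 - math.exp(-total * width))
        cum_hazard += total * width
    return risk


# Hypothetical 10-y projection from age 50 with two 5-y pieces.
risk_10y = absolute_risk(50, 60, [50, 55, 60], [0.002, 0.003], [0.004, 0.006])
```

With constant hazards the result reduces to λ1/(λ1+λ2)·(1 − exp[−(λ1+λ2)(b−a)]), a convenient check on any implementation.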
We modeled λ1(a,x) = λ10(a) rr1(a,x) as the product of the age-specific baseline hazard rate λ10(a) and the RR model rr1(a,x) = exp(β'x), where x denotes covariates. We did not include covariates in the hazard λ2 for competing causes of death. The age-specific baseline hazard rates λ10(a) were computed by multiplying the age-specific SEER incidence rates λ1*(a) by one minus the estimated AR, i.e., λ10(a) = λ1*(a)(1 − AR1(a)), as outlined previously. The AR for cancer c was obtained as one minus the ratio of the number of women in the dataset to the sum of the multivariate RR estimates for cancer c over all women. We approximated Equation 1 using piecewise constant hazards over 5-y age intervals.
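The three steps in this paragraph (the log-linear RR, the AR computed as one minus N divided by the sum of the RRs, and the conversion of SEER rates to baseline hazards) can be sketched as below. The function names, the use of one AR for all ages, and all numeric inputs in the example are illustrative assumptions.

```python
import math
from typing import List, Sequence


def relative_risk(beta: Sequence[float], x: Sequence[float]) -> float:
    """rr(x) = exp(beta'x) for one woman's coded covariates."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))


def attributable_risk(rrs: Sequence[float]) -> float:
    """AR = 1 - N / sum of RRs over all N women (RRs relative to the lowest-risk category)."""
    return 1.0 - len(rrs) / sum(rrs)


def baseline_hazards(seer_rates: Sequence[float], ar: float) -> List[float]:
    """lambda10(a) = lambda1*(a) * (1 - AR), one value per 5-y age interval."""
    return [rate * (1.0 - ar) for rate in seer_rates]


def covariate_hazards(baseline: Sequence[float], rr: float) -> List[float]:
    """lambda1(a, x) = lambda10(a) * rr(x); rr does not vary with age in this sketch."""
    return [h * rr for h in baseline]


# Made-up example: four women with RRs 1, 1, 2, 4 give AR = 1 - 4/8 = 0.5.
ar = attributable_risk([1.0, 1.0, 2.0, 4.0])
h10 = baseline_hazards([0.002, 0.003], ar)  # hypothetical SEER rates per person-year
h1_x = covariate_hazards(h10, relative_risk([math.log(2.0)], [1]))
```

Because every RR is at least 1 when the reference category is the lowest-risk one, the AR lies in [0, 1); it is 0 only if every woman is in the reference category.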
In SEER, endometrial and ovarian cancer incidence rates are calculated based on all women in the population in a given age group. These rates are thus lower than rates based only on women with an intact uterus or ovaries, respectively. We therefore adjusted the age-specific SEER rates by dividing them by the proportion of women who had not had a hysterectomy (estimated from the Behavioral Risk Factor Surveillance System [BRFSS] survey for the same areas included in SEER) or an oophorectomy (estimated from NHANES) (see Tables 4, S1, S2, and S3). In Table S1, we also present rates for models that allow for the possibility that a woman might have a hysterectomy or an oophorectomy during the projection interval, by adding hysterectomy and oophorectomy rates to the competing risks for the two models, respectively.
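A minimal sketch of the two adjustments, assuming rates per person-year and a surgery prevalence given as a proportion. The function names are hypothetical, and the 40% hysterectomy prevalence in the example is a placeholder, not a BRFSS estimate.

```python
def rate_among_intact(population_rate: float, prop_with_surgery: float) -> float:
    """Incidence among women with the organ intact = population rate / proportion without surgery."""
    if not 0.0 <= prop_with_surgery < 1.0:
        raise ValueError("prop_with_surgery must be in [0, 1)")
    return population_rate / (1.0 - prop_with_surgery)


def competing_hazard_with_surgery(other_cause_mortality: float, surgery_rate: float) -> float:
    """Variant allowing surgery during the projection: organ removal is added as a competing event."""
    return other_cause_mortality + surgery_rate


# Hypothetical inputs: 78.1 per 100,000 person-years and 40% hysterectomy prevalence.
adjusted = rate_among_intact(78.1e-5, 0.40)  # about 130 per 100,000 among women with a uterus
```

The first function raises the incidence rate to the at-risk population; the second leaves incidence alone and instead shortens the time a woman remains at risk.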
We compared the expected and the observed numbers of cases overall and in subgroups defined by risk factor combinations. The expected number of cases was calculated by summing the individual projected probabilities, given the baseline covariate values for each woman from entry (July 1990) into the NHS cohort to June 2004. The 95% CIs for the expected/observed (E/O) ratios were calculated using the normal approximation to the Poisson distribution of the observed count O: $(E/O)\exp(\pm 1.96/\sqrt{O})$. An E/O ratio above one indicates that the model overestimates cancer risk, and an E/O ratio below one indicates that the model underestimates cancer risk. We evaluated the discriminatory accuracy of the prediction models using the area under the receiver operating characteristic curve (AUC), with 95% bootstrap CIs.
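The validation statistics can be sketched as follows. The E/O interval uses (E/O)exp(±1.96/√O), which reproduces the breast (0.96–1.04) and ovarian (0.97–1.19) intervals reported in the Results from the stated counts. The AUC is computed as the Mann–Whitney probability that a case has a higher projected risk than a non-case, with a percentile bootstrap for the CI; treating the outcome as binary over the whole follow-up and ignoring censoring is a simplifying assumption that may differ from the authors' exact procedure. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def eo_ratio_ci(expected: float, observed: int, z: float = 1.96) -> Tuple[float, float, float]:
    """E/O ratio with CI (E/O) * exp(+/- z / sqrt(O)), treating O as Poisson."""
    ratio = expected / observed
    half_width = z / math.sqrt(observed)  # approximate SE of log(O) is 1/sqrt(O)
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


def auc(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """P(random case has higher projected risk than random non-case); ties count 1/2."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for rc in cases:  # O(n_cases * n_controls), fine for a sketch
        for r0 in controls:
            score += 1.0 if rc > r0 else 0.5 if rc == r0 else 0.0
    return score / (len(cases) * len(controls))


def bootstrap_auc_ci(risks: Sequence[float], outcomes: Sequence[int],
                     n_boot: int = 1000, seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap 95% CI for the AUC, resampling women with replacement."""
    rng = random.Random(seed)
    n = len(risks)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [outcomes[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples lacking either cases or non-cases
            stats.append(auc([risks[i] for i in idx], ys))
    stats.sort()
    return stats[int(0.025 * (len(stats) - 1))], stats[int(0.975 * (len(stats) - 1))]


ratio, lo, hi = eo_ratio_ci(2930, 2934)  # breast cancer counts from the NHS validation
```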
The characteristics of the women in the study and cancer incidence rates are shown in Table 1.
Of the 240,712 women used to fit the final breast cancer RR model, 7,695 were diagnosed with invasive breast cancer. The following variables were included in the final RR model: BMI (<25, 25 to <30, 30 to <35, 35+ kg/m2), estrogen and progestin MHT use (never, <10, 10+ y), other MHT use (no, yes), parity (0, 1+ children), age at first birth (<25, 25–29, 30+ y), premenopausal (no, yes), age at menopause (<50, 50 to <55, 55+ y), benign breast disease (no, yes), family history of breast or ovarian cancer (no, yes), and alcohol consumption (0, <1, 1+ drinks/day). The largest RRs per category increase in the model (Table 6) were obtained for having used estrogen and progestin MHT, RR=1.40 (95% CI: 1.32–1.49) per category increase in duration, and having a history of benign breast disease/biopsy, RR=1.40 (95% CI: 1.33–1.48).
Of the 146,679 women included in the final endometrial cancer model, 1,559 were diagnosed with endometrial carcinoma. The final RR model included BMI (<25, 25 to <30, 30 to <35, 35 to <40, 40+ kg/m2), MHT use (never, <10, 10+ y), parity (0, 1–2, 3+ children), premenopausal (no, yes), age at menopause (<50, 50 to <55, 55+ y), smoking (never, former, current), OC use (<1, 1+ y), and an interaction term between MHT use and an indicator variable that was one for BMI<25 kg/m2 and zero otherwise. The largest RRs in the model were obtained for BMI, fitted with a trend RR=1.72 per category increase (95% CI: 1.65–1.80); being a never smoker, with RR=1.47 (95% CI: 1.22–1.78) compared to a current smoker; and the interaction term between MHT use and the BMI<25 kg/m2 indicator, RR=1.61 (95% CI: 1.43–1.81) (Table 6).
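Because the RR model is log-linear (rr = exp(β'x)), a variable fitted with a trend multiplies the RR by the same factor at each category step. Assuming the five BMI categories are scored 0 to 4 and that MHT never use is coded so the BMI<25 × MHT interaction term is zero, the reported per-category RR of 1.72 implies the BMI contrasts sketched below; all other covariates are held fixed, and the names are illustrative.

```python
import math

BMI_TREND_RR = 1.72  # reported RR per one-category increase in BMI (endometrial model)
BMI_CATEGORIES = ["<25", "25-<30", "30-<35", "35-<40", "40+"]  # assumed trend scores 0..4


def bmi_relative_risk(category: str) -> float:
    """RR versus BMI <25 kg/m2 under a log-linear trend, for MHT never users (interaction term = 0)."""
    score = BMI_CATEGORIES.index(category)  # raises ValueError for an unknown label
    # A trend adds log(1.72) to the linear predictor per step: exp(score * log(1.72)) = 1.72**score.
    return math.exp(score * math.log(BMI_TREND_RR))


for cat in BMI_CATEGORIES:
    print(cat, round(bmi_relative_risk(cat), 2))
```

Under these assumptions, the 40+ kg/m2 category carries roughly 1.72^4 ≈ 8.75 times the endometrial cancer hazard of the <25 kg/m2 category, which helps explain why the highest projected endometrial risks occur in the heaviest women.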
Among the 199,973 women included in the ovarian cancer analysis, 844 were diagnosed with ovarian carcinoma. The final RR model included family history of breast or ovarian cancer (no, yes), duration of MHT use (never, <10, 10+ y), parity (0, 1–2, 3+ children), and OC use (<1, 1+ y) (Table 6). The largest RR was seen for OC never use compared to OC ever use, RR=1.36 (95% CI: 1.17–1.59).
The AR estimates used to compute baseline hazard rates for the models were 0.52 for breast cancer, 0.81 for endometrial cancer, and 0.43 for ovarian cancer.
Table 7 shows examples of 10- and 20-y absolute risks for several risk profiles and for initial ages 50 and 65 y. Breast cancer absolute risk estimates ranged from 1.57% to 21.78% for 10-y projections and from 3.64% to 35.11% for 20-y projections. Endometrial cancer risk ranged from 0.35% to 10.50% for 10-y projections and from 1.22% to 17.08% for 20-y projections. The highest 20-y endometrial cancer risk, 17.08%, was seen for a 65-y-old woman in the 40+ kg/m2 BMI category who had never smoked. For specific risk factor combinations, endometrial cancer absolute risk can be higher than breast cancer absolute risk. For example, profile 8 (Table 7) corresponds to a premenopausal, nulliparous 50-y-old woman with a BMI of 35 kg/m2 who never smoked, has no family history of breast or ovarian cancer, and never had a breast biopsy. Her 10-y endometrial cancer absolute risk is 4.9%, while her 10-y absolute risk of breast cancer is only 1.62%. The 10-y absolute risk of ovarian cancer ranged from 0.28% to 0.96%, and the 20-y risk from 0.74% to 1.77%.
Comprehensive tabulations of 5-, 10-, and 20-y risks for 50- and 60-y-old women for all possible combinations of risk factors for the three outcomes, and software written in SAS to compute absolute risk estimates for any age, projection length, and combination of risk factors, are freely available for download under Breast/Endometrial/Ovarian Risk Assessment at http://dceg.cancer.gov/tools/risk-assessment.
A total of 2,934 incident breast cancers were diagnosed among women included in the analyses, and the model predicted 2,930, resulting in an E/O ratio of 1.00 (95% CI: 0.96–1.04) (Table 9). The model significantly underestimated the number of breast cancers in premenopausal women. For all other variables, the E/O ratios were not statistically significantly different from unity. The overall AUC (discriminatory power) in the NHS cohort was 0.58 (95% CI: 0.57–0.59). We also compared the performance of the new breast cancer model with that of NCI's BCRAT on the same validation data. BCRAT predicted 2,947 cases, resulting in E/O=1.00 (95% CI: 0.97–1.04). However, BCRAT significantly overpredicted the number of cases in cells defined by family history, using either the family history coding of the new model or that of BCRAT (Table 9). For women with a family history of breast or ovarian cancer, BCRAT yielded E/O=1.30 (95% CI: 1.19–1.42), and for women with one and two relatives with breast cancer, E/O=1.30 (95% CI: 1.18–1.43) and E/O=1.78 (95% CI: 1.25–2.55), respectively. For the new breast cancer model, the predicted number of cases did not significantly differ from the observed number, with E/O=0.98 (95% CI: 0.89–1.08) for women with one first-degree relative with breast cancer and E/O=0.72 (95% CI: 0.50–1.02) for women with two or more relatives with breast cancer. The overall AUC for BCRAT in the NHS validation data, 0.56 (95% CI: 0.55–0.58), was significantly lower than the AUC for the new breast cancer model (paired t-test, p<0.001).
The endometrial cancer absolute risk model predicted 640 cancers, but only 532 incident endometrial cancers were observed, resulting in E/O=1.20 (95% CI: 1.11–1.30). The model significantly overestimated the observed NHS endometrial cancers in most subgroups (Table 10). The number of endometrial cancers was underestimated for women in the highest BMI category and for premenopausal women, and significantly underestimated for women who reported taking MHT for 10 or more years. The AUC was 0.68 (95% CI: 0.66–0.70).
A total of 377 incident ovarian cancers were diagnosed among women included in the analyses, and the model predicted 406 (Table 10), resulting in E/O=1.08 (95% CI: 0.97–1.19). The number of ovarian cancers was overestimated for all covariate categories, albeit not statistically significantly, with the exception of the nulliparous group and women taking MHT for 10 or more years. The AUC was 0.59 (95% CI: 0.56–0.63).
Current American Society of Clinical Oncology guidelines indicate that premenopausal women, and postmenopausal women with a low risk of side effects, who have a 5-y projected BCRAT risk ≥1.66% may benefit from tamoxifen and/or raloxifene for breast cancer prevention. Among the 3,837 premenopausal women aged 50–55 y at baseline in the NHS cohort who reported having a uterus during follow-up, 784 had a 5-y absolute breast cancer risk ≥1.66%. Of those, three women had a ≥2% 5-y absolute endometrial cancer risk. Tamoxifen reduces breast cancer risk by approximately 50% and increases endometrial cancer risk approximately 4-fold in older women. Gail et al. and Freedman et al. weighed the risks and benefits of tamoxifen assuming average age-specific risks for all health outcomes other than breast cancer. Using models for both breast cancer and endometrial cancer yields a more accurate weighing of risks and benefits. For example, based on an average 5-y risk of endometrial cancer of 0.41%, tamoxifen would increase absolute endometrial cancer risk by 1.23% (from 0.41% to 1.64%) while reducing a 2.5% breast cancer risk by 1.25%. These risks and benefits are nearly equal. If, instead, the woman had an endometrial cancer risk of 2%, tamoxifen would increase her endometrial cancer risk by 6% (from 2% to 8%), which greatly exceeds the reduction in breast cancer risk. In carefully balancing risks and benefits, however, the comparative lethality of the two outcomes may also be an important factor in deciding whether to take tamoxifen.
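The risk-benefit arithmetic in this paragraph can be reproduced with a small function, assuming, as the text does, a 50% reduction in breast cancer risk and a 4-fold increase in endometrial cancer risk. The helper name is hypothetical; it compares absolute changes only and does not weight the outcomes by lethality or include other effects of tamoxifen.

```python
def tamoxifen_absolute_changes(breast_risk_pct: float, endometrial_risk_pct: float,
                               breast_rr: float = 0.5, endometrial_rr: float = 4.0):
    """Return (absolute breast cancer reduction, absolute endometrial cancer increase) in percentage points."""
    breast_reduction = breast_risk_pct * (1.0 - breast_rr)            # e.g. 2.5% -> 1.25% avoided
    endometrial_increase = endometrial_risk_pct * (endometrial_rr - 1.0)  # 4-fold risk adds 3x baseline
    return breast_reduction, endometrial_increase


for endo in (0.41, 2.0):
    benefit, harm = tamoxifen_absolute_changes(2.5, endo)
    print(f"5-y endometrial risk {endo}%: breast -{benefit:.2f} pts, endometrial +{harm:.2f} pts")
```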
We developed models that predict individualized probabilities of developing breast, endometrial, and ovarian cancers among US white women aged 50+ y. We chose these three cancers because they share several risk factors, presumably reflecting a common hormonal etiology, and because management decisions may depend jointly on these risks.
To our knowledge, there is no other absolute risk model for endometrial cancer, despite the fact that it is the fourth most common cancer in women and its absolute risks are quite high, particularly in obese women (Table 7). Knowledge of endometrial cancer risk might inform decision-making about diagnostic workup, clinical management, surgical interventions, and the use of agents, such as tamoxifen or unopposed estrogen, that increase endometrial cancer risk. Such a model might also aid in designing intervention trials to prevent endometrial cancer and in identifying women with elevated risk who might benefit from such interventions. The endometrial cancer model may also be useful in assessing the burden of that cancer in the general population, which may increase, as more than a third of all US women now have a BMI of 30 kg/m2 or higher. In combination with data on trends in the prevalence of obesity, one could use the model to investigate the extent to which the increasing prevalence of obesity accounts for the significant 2% per year increase in endometrial cancer incidence seen among white women in 2006–2010.
Unlike most other models for breast cancer, our model includes factors that are potentially modifiable for the individual or in populations over time, such as alcohol consumption, BMI, and use of MHT. This model had slightly better discriminatory accuracy (AUC=0.58) in the validation data than the widely used BCRAT (“Gail model”) (AUC=0.56), which predicts breast cancer risk based on reproductive factors, number of breast biopsies, and atypical hyperplasia. However, despite this increase in AUC, the discriminatory accuracy of the breast cancer absolute risk model is still modest, which limits its clinical applicability, particularly for screening. Several breast cancer risk prediction models include non-modifiable risk factors such as family history, mammographic density, and reproductive and medical factors such as age at menarche and number of breast biopsies. A few models for US women include potentially modifiable risk factors, such as BMI and alcohol use. Our model also incorporates use, and duration of use, of combined estrogen and progestin MHT and of other types of MHT. Use of estrogen and progestin MHT shared the largest RR in our model with benign breast disease (RR=1.40 per one-category increase in duration). Petracci et al. illustrate public health and counseling applications of a breast cancer model with modifiable risk factors in Italian women, and similar calculations could be performed for the US with the model we developed. These calculations could also aid in understanding the impact of increases in obesity on US breast cancer incidence.
For ovarian cancer, a model based on the NHS includes age at menopause, age at menarche, OC use, and tubal ligation.
To build the models, we combined data from two large prospective studies of well-characterized populations. We used another large cohort, the NHS cohort, to independently validate our models. The RR estimates agreed well with those in the NHS cohort for all three models. The discriminatory power, as assessed by the AUC, was 0.58 and 0.59 for the breast and ovarian cancer models, respectively. While these values indicate modest discriminatory ability, they are similar to those reported for other risk models for these cancers. The AUC for endometrial cancer was 0.68, which is larger than that seen for most models of cancer incidence.
The breast and ovarian cancer models were well calibrated in the NHS cohort; however, because women in the NHS cohort were censored at the age of diagnosis of an in situ breast cancer, the breast cancer model may have slightly underestimated risk. The endometrial cancer model significantly overpredicted the number of endometrial cancers. This reflects the fact that the NHS cohort has considerably lower endometrial cancer rates (Table 3) than those seen in SEER (Table 4). Further studies of the calibration of this model in additional cohorts would be desirable to assess its applicability to the general US population.
Well-calibrated risk models, even those with modest discriminatory accuracy, have several public health applications. These include designing cancer prevention trials, assessing the absolute burden of disease in the population and in subgroups, and gauging the potential absolute reductions in risk from preventive strategies. Using risk models to select individuals for screening or other interventions usually requires high discriminatory accuracy. Nevertheless, well-calibrated risk models with modest discriminatory accuracy can also aid in individual decision-making. Such models can provide realistic information on level of risk that is useful in making decisions, such as whether or not to have a mammogram. Such models are also useful in decisions on whether or not to take an intervention that has both beneficial and harmful health effects.
There are several potential limitations to our models. The exact number of breast biopsies, an important predictor in BCRAT, was not available in our cohorts, and the models were restricted to women aged 50+ y. Our RR models were built using white, non-Hispanic women and may not generalize to other races or ethnicities. We adjusted the age-specific endometrial and ovarian cancer incidence rates for the prevalence of hysterectomy and oophorectomy among US women estimated from population-based surveys, but some residual error may exist. Another limitation is that MHT use was probably more prevalent during the period of study of PLCO and NIH-AARP, from 1993 to 2005, than it is currently. While such changes in the prevalence of risk factors would not affect model calibration, they could reduce the variation of risk in the population and hence reduce the discriminatory accuracy of the models. Among the strengths of our models are the large study sample size, nearly complete end point ascertainment, and information on most of the important risk factors.
Our models are not intended to predict the probability of the three cancers among women known to be at much higher than average risk, e.g., women with a mutation in BRCA1 or BRCA2 or with hereditary non-polyposis colorectal cancer (HNPCC). Each model is applicable to women without a prior diagnosis of that particular cancer, and thus in principle the breast cancer model can be applied to predict breast cancer risk for women with a prior diagnosis of any other cancer, including endometrial cancer. However, in applying the risk estimates, one needs to consider that a woman's risk may be altered by treatment for a previous cancer.
In conclusion, we developed and assessed models that project the probabilities of developing breast, endometrial, or ovarian cancer among white, non-Hispanic women aged 50+ y. These models might improve the ability to identify potential participants for research studies and assist in clinical decision-making related to the risks of these cancers.
Competing mortality rates per 100,000, 1992–2006, from causes other than breast, endometrial, or ovarian cancers obtained from US mortality data for white women. These data are collected by the National Center for Health Statistics.
Estimates from the Behavioral Risk Factor Surveillance System for percent of women with hysterectomy in SEER 13 areas, white women, 5-y age groups, 1992–2007 (2001 excluded).
Estimates from the NHANES 1999–2000 survey for percent of women with bilateral oophorectomy, white women, 5-y age groups.
We thank Craig Williams, Information Management Systems, Rockville, Maryland, for help with data management, and Susan Hankinson, University of Massachusetts Amherst, Amherst, Massachusetts, for helpful discussions regarding the NHS.
The manuscript was developed with support from the Intramural Research Program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health. No funding bodies had any role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.