Overall Approach to Constructing a Risk Model and Data Sources

Estimates of relative and attributable risks for breast cancer risk factors were obtained from case–control data from the Asian American Breast Cancer Study (4), which enrolled women of Chinese, Japanese, and Filipino ancestry who were living in California and Hawaii during the accrual period, April 1, 1983, through June 30, 1987. The women were not necessarily born in the United States. Estimated relative risks from this study were tested for heterogeneity across ethnicities and then assumed to be homogeneous, so that the same estimates were used for all ethnicities. An estimated attributable risk was obtained that represented Chinese, Japanese, and Filipino women in the SEER population. The resulting estimates of relative and attributable risks were combined with ethnicity-specific invasive breast cancer incidence rates and with national non-breast cancer mortality rates from SEER to produce separate estimates of absolute risk for American women categorized as Chinese, Japanese, Filipino, Hawaiian, Other Pacific Islander, and Other Asian.

The study methods for the population-based Asian American Breast Cancer Study have been described in detail previously by Ziegler et al. (4). Women of Chinese, Japanese, and Filipino ethnicity with histologically confirmed, first primary incident breast cancer diagnosed between the ages of 20 and 55 years were identified through population-based cancer registries in San Francisco–Oakland, California; Los Angeles County, California; and Oahu, Hawaii, for the period April 1, 1983, through June 30, 1987. All three registries are currently members of the SEER program (http://seer.cancer.gov), but the California registries were not members of the SEER program during the accrual period. Control subjects of the same ethnicity, age, and residence were identified through random-digit dialing in the two California areas and through the Hawaii Health Surveillance Program. The final study population consisted of 597 case patients (70% of eligible case patients) and 966 control subjects (75% of eligible control subjects). Eight case patients and 14 control subjects were excluded from the analysis because they were missing information on one or more covariates. The estimates of relative and attributable risks were therefore based on the 589 case patients and 952 control subjects with complete information on the risk factors in Table 1. Some eligible subjects did not participate in the Asian American Breast Cancer Study because they refused or died [for details, see reference (4)].

**Table 1.** Relative risks estimated from the Asian American Breast Cancer Study for all ethnicities combined and relative risks from the NCI Breast Cancer Risk Assessment Tool or Gail model (3)*

Age- and ethnicity-specific invasive breast cancer incidence rates for Chinese, Japanese, Filipino, Other Asians (excluding the previous three groups), native Hawaiians, and Other Pacific Islanders (excluding native Hawaiians) were obtained from the SEER Detailed Asian/Pacific Islander Database for January 1, 1998, through December 31, 2002. We used the US 2000 Census to estimate women-years of exposure over the 5 years (1998–2002) of SEER incidence data collection (6). We use the term “ethnicity” to denote these six groups, although “Asian” and “Hawaiian or Other Pacific Islander” have been distinguished as different races (http://www.whitehouse.gov/omb/fedreg_directive_15). The database covered three metropolitan areas and nine states (Atlanta, Detroit, Seattle and Puget Sound, California, Connecticut, Hawaii, Iowa, Kentucky, Louisiana, New Jersey, New Mexico, and Utah) and thirteen Asian or Pacific Islander groups (Chinese, Japanese, Filipino, Asian Indian or Pakistani combined, Korean, Vietnamese, Laotian, Kampuchean, Guamanian, Samoan, Tongan, and native Hawaiian). These reporting areas covered 54% of the total US Asian and Pacific Islander population, including 53% of Chinese, 71% of Japanese, and 69% of Filipinos in the United States (6).

Ethnicity for incident invasive breast cancer case patients was obtained from medical records by the SEER cancer registries. Because more than 99.95% of cancer diagnoses in SEER include only one ethnicity designation, the SEER Detailed Asian/Pacific Islander Database used a single ethnicity to classify case patients (7). The corresponding numbers of women at risk (rate denominators) were based on the US 2000 Census, which allowed individuals to report multiple ethnicities. Therefore, SEER incidence rates were calculated using two different methods for determining the number of women at risk. The first method counted women who self-reported only one ethnicity on the US 2000 Census; the second counted women who self-reported one or more ethnicities, at least one of which was the group of interest. Because the first method overestimates the true incidence rate (its denominator is too small) and the second underestimates it (its denominator is too large), we calculated a simple average of the two incidence rates. Calculations not reported here indicated that this simple average performs well over a range of (unknown) fractions of women who check multiple ethnicities on a census form but would declare a specific ethnicity when forced to choose. This procedure was used to calculate rates separately for Chinese, Japanese, Filipino, Other Asians, native Hawaiians, and Other Pacific Islanders. For native Hawaiians, however, SEER incidence rates were calculated using only the multiple race/ethnicity denominators, because a case patient with any native Hawaiian ancestry is classified as native Hawaiian in SEER (6). The resulting age- and ethnicity-specific breast cancer incidence rates are shown in Supplementary Table 1 (available online).
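The rate-averaging step can be sketched in Python. This is a minimal illustration under stated assumptions: the function name `averaged_rate`, all counts, and the approximation of 5 years of women-years as five times a single 2000 Census count are hypothetical and are not taken from the SEER database.

```python
"""Hypothetical sketch: average incidence rates from two census denominators."""


def averaged_rate(cases: int, single_ethnicity_wy: float, any_mention_wy: float,
                  per: float = 100_000.0) -> float:
    """Simple average of two incidence rates, expressed per `per` women-years.

    single_ethnicity_wy: women-years for women reporting only this ethnicity
        (smaller denominator, so this rate tends to be too high).
    any_mention_wy: women-years for women reporting this ethnicity alone or
        with others (larger denominator, so this rate tends to be too low).
    """
    if not 0 < single_ethnicity_wy <= any_mention_wy:
        raise ValueError("expected 0 < single-ethnicity <= any-mention denominator")
    high_rate = cases / single_ethnicity_wy * per
    low_rate = cases / any_mention_wy * per
    return (high_rate + low_rate) / 2.0


# Illustrative numbers only. Assumption: women-years over 1998-2002 are
# approximated as 5 x the 2000 Census count of women in the age group.
YEARS = 5
women_single = 200_000   # hypothetical single-ethnicity count
women_any = 240_000      # hypothetical "alone or in combination" count
rate = averaged_rate(120, women_single * YEARS, women_any * YEARS)
```

Here the two rates are 12.0 and 10.0 per 100,000 women-years, so `rate` is 11.0. For native Hawaiians, the text instead uses only the multiple race/ethnicity denominator, which would correspond to returning `cases / any_mention_wy * per` rather than the average.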

To account for competing risks from non-breast cancer mortality, age- and ethnicity-specific non-breast cancer mortality rates for January 1, 1998, through December 31, 2002, were obtained through SEER from the National Center for Health Statistics (NCHS; http://www.cdc.gov/nchs) (6) for Chinese, Japanese, Filipino, Other Asians (excluding the previous three groups), native Hawaiians, and Other Pacific Islanders (excluding native Hawaiians). The database covered seven states (California, Hawaii, Illinois, New Jersey, New York, Texas, and Washington) and nine APA groups (Chinese, Japanese, Filipino, Indian only, Korean, Vietnamese, Guamanian, Samoan, and native Hawaiian). These reporting areas covered 68% of the total US Asian and Pacific Islander population, including 74% of Chinese, 77% of Japanese, and 79% of Filipinos in the United States (6). Ethnicity for non-breast cancer deaths was obtained from state vital records. Because vital records usually include only a single race or ethnicity designation, the NCHS data used only single race or ethnicity information to classify deaths. We calculated the census denominators as described above for breast cancer incidence rates and used them to calculate mortality rates in the same way (Supplementary Table 1, available online).

Statistical Methods

The basic approach is described in Gail et al. (3). First, we developed a multivariable relative risk model from the Asian American Breast Cancer Study data using the risk factors in Gail et al. (3). Then, we obtained baseline age-specific breast cancer incidence rates by multiplying the age- and ethnicity-specific SEER rates by one minus the common population attributable risk estimated from the Asian American Breast Cancer Study. Finally, we projected absolute risk for an APA woman with specific risk factors by multiplying her multivariable relative risk by the baseline age- and ethnicity-specific breast cancer incidence rate and taking age- and ethnicity-specific competing risks into account. Further details follow.

Age at diagnosis was used for case patients. A comparable age was assigned to control subjects as follows. The mean difference between the date of interview and the date of diagnosis was computed for case patients within strata defined by ethnicity, study location, year of birth in 5-year intervals, and age at interview category (above and below the median age of case patients at interview). This mean difference was subtracted from the age at interview of each control woman in that stratum to obtain a comparable age for each control subject.
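The control-age assignment can be sketched as a stratified mean subtraction. This is a hypothetical illustration: the `Subject` record, the choice to compute the median age at interview over all case patients, the placement of ties at the median in the upper group, and the 5-year birth cohorts defined by `birth_year // 5` are our assumptions, not details stated in the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, median
from typing import Dict, List, Optional, Tuple


@dataclass
class Subject:
    ethnicity: str
    location: str
    birth_year: int
    age_at_interview: float
    # Years from diagnosis to interview; known only for case patients.
    lag_years: Optional[float] = None


def stratum(s: Subject, median_case_age: float) -> Tuple:
    # 5-year birth cohort and above/below the case median age at interview
    # (ties go to the upper group: an assumption).
    return (s.ethnicity, s.location, s.birth_year // 5,
            s.age_at_interview >= median_case_age)


def comparable_control_ages(cases: List[Subject], controls: List[Subject]) -> List[float]:
    med = median(c.age_at_interview for c in cases)
    lags: Dict[Tuple, List[float]] = defaultdict(list)
    for c in cases:
        lags[stratum(c, med)].append(c.lag_years)
    mean_lag = {key: mean(v) for key, v in lags.items()}
    # A control in a stratum without case patients raises KeyError here;
    # the text does not say how such controls were handled.
    return [ctl.age_at_interview - mean_lag[stratum(ctl, med)] for ctl in controls]


# Hypothetical subjects for illustration.
cases = [
    Subject("Chinese", "Los Angeles", 1940, 46.0, 1.0),
    Subject("Chinese", "Los Angeles", 1941, 47.0, 2.0),
    Subject("Chinese", "Los Angeles", 1950, 44.0, 0.5),
]
controls = [
    Subject("Chinese", "Los Angeles", 1942, 48.0),
    Subject("Chinese", "Los Angeles", 1951, 40.0),
]
ages = comparable_control_ages(cases, controls)
```

In this toy example the first control shares a stratum with the two older case patients (mean lag 1.5 years) and the second with the youngest (lag 0.5 years), giving comparable ages of 46.5 and 39.5.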

Initially, ethnicity-specific odds ratios were obtained by logistic regression, fitted separately for Chinese, Japanese, and Filipino women in the Asian American Breast Cancer Study, with the same independent variables as in Gail et al. (3) (see Table 1) but with age also included as a continuous variable and with dummy variables for location. Adding age squared or a cubic spline in age had a negligible effect on the log-relative odds for the other risk factors. Because age was included in the model to control for confounding of the estimated effects of the other risk factors, we present only analyses with age as a continuous linear term. The log-relative odds model included main effects for four variables: age at first live birth (AGEFLB), coded as 0, 1, 2, or 3 for ages younger than 20, 20–24, 25–29 or nulliparous, and older than 29 years, respectively; number of affected first-degree female relatives (NUMREL), coded as 0 or 1 for none or at least one, respectively, based on the breast cancer histories of mothers, sisters, and daughters as of the date of interview; age at menarche (AGEMEN), coded as 0, 1, or 2 for 14 years or older, 12–13 years, or younger than 12 years, respectively; and number of benign surgical and needle breast biopsies (NBIOPS), coded as 0, 1, or 2 for zero, one, or more than one biopsy, respectively. To avoid counting the biopsy that led to the diagnosis of breast cancer in a case patient, we excluded biopsies occurring within 3 years of the date of interview, because breast cancer case patients could be ascertained and interviewed up to 3 years after diagnosis. We also excluded any biopsy that occurred at the same age as the breast cancer diagnosis. Unlike previous models (3), this model included no interactions between age and NBIOPS or between AGEFLB and NUMREL, and NUMREL was coded as 0 or 1 rather than as 0, 1, or 2.
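The risk factor coding above can be written as small functions. This is a sketch: the function names are hypothetical, ages are assumed to be whole years, and `None` is our assumed marker for a nulliparous woman. The biopsy count passed in is assumed to have already had the exclusions above applied.

```python
from itertools import product
from typing import Optional


def code_ageflb(age_first_birth: Optional[int]) -> int:
    # None = nulliparous, grouped with 25-29 years (code 2).
    if age_first_birth is None:
        return 2
    if age_first_birth < 20:
        return 0
    if age_first_birth < 25:
        return 1
    if age_first_birth < 30:
        return 2
    return 3


def code_numrel(n_affected_relatives: int) -> int:
    # 0 = no affected mother, sister, or daughter; 1 = at least one.
    return 1 if n_affected_relatives > 0 else 0


def code_agemen(age_at_menarche: int) -> int:
    # 0 = 14 years or older, 1 = 12-13 years, 2 = younger than 12 years.
    if age_at_menarche >= 14:
        return 0
    return 1 if age_at_menarche >= 12 else 2


def code_nbiops(n_benign_biopsies: int) -> int:
    # 0, 1, or 2 (= more than one); count taken after the exclusions above.
    return min(n_benign_biopsies, 2)


# Levels: AGEFLB 4, NUMREL 2, AGEMEN 3, NBIOPS 3.
N_COMBINATIONS = len(list(product(range(4), range(2), range(3), range(3))))
```

The product of the category counts, 4 × 2 × 3 × 3, gives the 72 risk factor combinations that appear later in the construction of approximate confidence limits.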

Formal tests of heterogeneity of the log odds ratio parameters for the four risk factors among Chinese, Japanese, and Filipino women were not statistically significant. We therefore computed common log odds ratio parameters for the covariates in Table 1 by fitting a logistic regression that included 18 intercepts, one for each combination of ethnicity (three levels), location (three levels), and age (<50 and ≥50 years), as well as age as a continuous variable and the variables in Table 1. The estimated log odds ratios for the variables in Table 1 and their estimated variance–covariance matrix are given in Supplementary Table 2 (available online).
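The pooled model with 18 stratum-specific intercepts can be sketched with NumPy and a Newton–Raphson fit. This is an illustration under assumptions: the coded risk factors are assumed to enter the linear predictor as numeric scores, the helper names are hypothetical, and the simulated data are synthetic, not study data. The inverse observed information returned by `fit_logistic` plays the role of the variance–covariance matrix mentioned above.

```python
import numpy as np


def design_matrix(ethnicity, location, age, risk_factors):
    """Stratum indicators (no global intercept), continuous age, coded risk factors.

    ethnicity, location: integer codes 0-2; age: years;
    risk_factors: (n, 4) array of AGEFLB, NUMREL, AGEMEN, NBIOPS codes.
    """
    age = np.asarray(age, dtype=float)
    age_group = (age >= 50).astype(int)
    # One cell per (ethnicity, location, age group): 3 * 3 * 2 = 18 intercepts.
    cell = (np.asarray(ethnicity) * 3 + np.asarray(location)) * 2 + age_group
    intercepts = np.eye(18)[cell]
    return np.column_stack([intercepts, age, np.asarray(risk_factors, dtype=float)])


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum likelihood logistic regression by Newton-Raphson.

    Returns coefficients and the inverse observed information matrix.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(info)


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 4000
eth = rng.integers(0, 3, n)
loc = rng.integers(0, 3, n)
age = rng.uniform(20, 55, n)
rf = np.column_stack([rng.integers(0, 4, n), rng.integers(0, 2, n),
                      rng.integers(0, 3, n), rng.integers(0, 3, n)])
X = design_matrix(eth, loc, age, rf)
# Columns: 0-17 intercepts, 18 age, 19 AGEFLB, 20 NUMREL, 21 AGEMEN, 22 NBIOPS.
true_beta = np.concatenate([np.full(18, -2.0), [0.02], [0.1, 0.7, 0.1, 0.3]])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
beta_hat, cov_hat = fit_logistic(X, y)
```

Omitting a global intercept keeps the 18 indicator columns from being collinear with a constant term, so each stratum gets its own baseline log odds while the risk factor coefficients are shared.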

To compute an attributable risk (AR) that is representative of the entire SEER population of Chinese ($C$), Japanese ($J$), and Filipino ($F$) women, we defined the weight for Chinese women as:

where $D_C$ is the number of Chinese breast cancer case patients in SEER for the years 2000–2005, $d_C$ is the total number of Chinese breast cancer case patients with complete covariate data in the Asian American Breast Cancer Study, and the corresponding terms for the Japanese and Filipino groups are defined similarly, as are the weights for Japanese and Filipino women. The factor $F(t) = 1 - \mathrm{AR}(t)$ for the combined group at age $t$ is given by a weighted version of the formula of Bruzzi et al. (11) as follows:

where the sums of reciprocal estimated relative risks are over the case patients of age $t$ with complete data in the various subgroups of the Asian American Breast Cancer Study. This formula was applied separately to case patients aged 49 years or younger and to case patients aged 50 years or older. The weights in equations 1 and 2 are proportional to the weights in the Appendix and yield the same results, because the proportionality factor cancels from the ratios in the Appendix.
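The weighted Bruzzi calculation can be sketched as follows. As a hypothesis, not a transcription of equations 1 and 2, we assume each subgroup weight is the inverse sampling fraction $w_g = D_g/d_g$ and that $F(t) = \sum_g w_g \sum_{i \in g,\,t} 1/\widehat{RR}_i \big/ \sum_g w_g n_g(t)$, where $n_g(t)$ is the number of subgroup-$g$ case patients of age $t$. Under that form, $F(t)$ is algebraically identical to a weighted average of subgroup-specific $F_g(t) = n_g(t)^{-1}\sum_i 1/\widehat{RR}_i$ with weights proportional to $w_g n_g(t)$, which the second function checks. All counts and relative risks below are hypothetical.

```python
from typing import Dict, Sequence


def weighted_bruzzi(case_rrs: Dict[str, Sequence[float]], weights: Dict[str, float]) -> float:
    """F = 1 - AR for one age group (assumed form of the weighted Bruzzi formula).

    case_rrs[g]: estimated relative risks of subgroup-g case patients in this
                 age group (complete data only).
    weights[g]:  representation weight for subgroup g.
    """
    num = sum(weights[g] * sum(1.0 / rr for rr in rrs) for g, rrs in case_rrs.items())
    den = sum(weights[g] * len(rrs) for g, rrs in case_rrs.items())
    return num / den


def as_weighted_average(case_rrs: Dict[str, Sequence[float]], weights: Dict[str, float]) -> float:
    # Same quantity as an average of subgroup-specific F_g,
    # with weights proportional to weights[g] * (number of cases in g).
    f_g = {g: sum(1.0 / rr for rr in rrs) / len(rrs) for g, rrs in case_rrs.items()}
    mass = {g: weights[g] * len(rrs) for g, rrs in case_rrs.items()}
    return sum(mass[g] * f_g[g] for g in case_rrs) / sum(mass.values())


# Hypothetical toy inputs: SEER counts D and study counts d per subgroup.
D = {"C": 300, "J": 200, "F": 250}
d = {"C": 2, "J": 3, "F": 1}
w = {g: D[g] / d[g] for g in D}          # assumed inverse sampling fractions
rrs = {"C": [1.0, 2.0], "J": [1.0, 1.5, 3.0], "F": [2.0]}
F_pooled = weighted_bruzzi(rrs, w)
F_avg = as_weighted_average(rrs, w)
```

In this toy every case patient falls in the single age group, so $w_g n_g = D_g$ and the average is exactly SEER-weighted; both functions return 483.33/750 ≈ 0.6444.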

Equation 2 also equals the SEER-weighted average of ethnicity-specific estimates of one minus attributable risk:

To compute absolute risks, we used the age- and ethnicity-specific invasive breast cancer incidence rates $h^*(t)$ from Supplementary Table 1 (available online) and estimated the baseline hazard as $h_1(t) = h^*(t)F(t)$. The hazard $h_2(t)$ of age- and ethnicity-specific mortality from causes other than breast cancer was also obtained from Supplementary Table 1 (available online). Using equation 6 in Gail et al. (3) with 1-year interval widths, we combined $h_1$, $h_2$, and the relative risk (RR) to project individualized absolute risks for various initial and final ages and combinations of risk factors.
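A sketch of the absolute risk projection follows. We assume, rather than quote, that equation 6 of Gail et al. (3) is the standard competing-risks sum over intervals with piecewise-constant hazards: in each 1-year interval, the woman's breast cancer hazard $\mathrm{RR}\,h_1(t)$ competes with $h_2(t)$, and the interval's contribution is weighted by the probability of reaching it free of both events. The rate functions and the values of $F$ in the usage lines are hypothetical.

```python
import math
from typing import Callable


def absolute_risk(start_age: int, end_age: int, rr: Callable[[int], float],
                  h1: Callable[[int], float], h2: Callable[[int], float]) -> float:
    """Probability of invasive breast cancer in [start_age, end_age) with
    competing mortality, hazards constant within each 1-year interval.

    rr(t): relative risk at age t; h1(t): baseline hazard h*(t) F(t);
    h2(t): hazard of death from causes other than breast cancer.
    """
    risk = 0.0
    cum = 0.0  # cumulative total hazard up to the start of the current year
    for t in range(start_age, end_age):
        lam1 = rr(t) * h1(t)
        total = lam1 + h2(t)
        if total > 0:
            risk += (lam1 / total) * math.exp(-cum) * (1.0 - math.exp(-total))
        cum += total
    return risk


# Hypothetical inputs chosen only to exercise the function.
F = {"<50": 0.8, ">=50": 0.7}            # hypothetical 1 - AR by age group


def h_star(t: int) -> float:
    return 0.0010 if t < 50 else 0.0020  # hypothetical SEER-like rates


def h1(t: int) -> float:
    return h_star(t) * (F["<50"] if t < 50 else F[">=50"])


risk_10yr = absolute_risk(45, 55, lambda t: 1.5, h1, lambda t: 0.002)
```

With no competing mortality and a constant hazard of 0.01 per year, the sum telescopes to $1 - e^{-0.1}$ over 10 years, which is a convenient check.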

For a combination of risk factors with relative risk RR compared with a woman who has all risk factors at their lowest risk level, we computed the variance of the estimate of $\mathrm{RR} \times F(t)$, and confidence intervals on it, using the influence function approach of Graubard and Fears (12) (see “Appendix”). Treating $h^*$ and $h_2$ as known quantities, we estimated the variance of the estimated absolute risk by a Taylor series expansion in $\mathrm{RR} \times F(t)$. We then obtained 95% confidence intervals that are symmetric on the logit scale by adding and subtracting 1.96 times the estimated SE of the logit-transformed absolute risk, and applied the inverse logit transform to these limits to obtain 95% confidence intervals on the absolute risk. A computer program in SAS (13) is available to compute such confidence limits for any combination of initial and final ages and risk factors.
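The Taylor-series and logit steps can be sketched as below. The delta-method SE of the logit, $\mathrm{SE}(p)/[p(1-p)]$, and the central-difference derivative are our choices for illustration; the toy risk function $1 - e^{-\theta H}$ (no competing mortality) and the values of $H$, $\theta$, and its SE are hypothetical. The influence-function variance of $\mathrm{RR} \times F(t)$ from Graubard and Fears (12) is not reproduced; it is taken as given.

```python
import math
from typing import Callable, Tuple


def delta_se(f: Callable[[float], float], theta: float, se_theta: float,
             eps: float = 1e-6) -> float:
    """First-order Taylor (delta-method) SE of f(theta), derivative by central difference."""
    deriv = (f(theta + eps) - f(theta - eps)) / (2.0 * eps)
    return abs(deriv) * se_theta


def logit_ci(p: float, se_p: float, z: float = 1.96) -> Tuple[float, float]:
    """CI for an absolute risk p, symmetric on the logit scale."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lp = math.log(p / (1.0 - p))
    se_logit = se_p / (p * (1.0 - p))  # delta method for logit(p)

    def expit(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-x))

    return expit(lp - z * se_logit), expit(lp + z * se_logit)


# Hypothetical toy: theta plays the role of RR x F(t); H is a cumulative
# baseline hazard over the projection interval.
H = 0.05


def toy_risk(theta: float) -> float:
    return 1.0 - math.exp(-theta * H)


theta_hat, se_theta = 1.3, 0.2
p_hat = toy_risk(theta_hat)
se_p = delta_se(toy_risk, theta_hat, se_theta)
lower, upper = logit_ci(p_hat, se_p)
```

Because the limits are symmetric on the logit scale, they are asymmetric on the probability scale and always stay inside (0, 1); for a small risk such as this one, the upper limit lies farther from the point estimate than the lower limit.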

We prepared a graph that gives approximate confidence intervals by generating confidence limits for a wide range of absolute risks corresponding to various choices of risk factors and risk projection intervals for Chinese, Japanese, Filipino, Hawaiian, Other Pacific Islander, and Other Asian women. We separately regressed the upper and the lower confidence limits calculated from the variance estimates (see “Appendix”) on the absolute risk, $\phi(x)$, and on $\phi^2(x)$. The points to which the regressions were fitted were chosen to cover a broad range of absolute risks. For each of the 14 starting ages (20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, and 85 years), we considered projection intervals of length 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, and 70 years, subject to the constraint that the starting age plus the length of the projection interval was at most 90 years. This yielded 105 age intervals over which projections were made. For each such age interval, we computed the absolute risk for each of the 72 possible risk factor combinations, resulting in 105 × 72 = 7560 pairs for each ethnic group. Thus, there were 6 × 7560 = 45 360 estimates of absolute risk and corresponding upper and lower confidence limits. The regressions explained 99.1% of the variation in upper confidence limits and 98.4% of the variation in lower confidence limits; thus, each fitted locus provided a good fit to the calculated confidence limits in these 45 360 scenarios. The coefficients $a$, $b$, and $c$ in the regressions $a + b\phi(x) + c\phi^2(x)$ were −0.0053, 1.6270, and −0.4808 for the upper confidence limit and 0.0026, 0.6219, and 0.0038 for the lower confidence limit.
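The scenario count and the fitted quadratic loci can be checked directly. The enumeration mirrors the grid described above, and the coefficients are the reported ones; the helper name `approx_limits` is hypothetical.

```python
from itertools import product

STARTS = range(20, 90, 5)    # 20, 25, ..., 85: 14 starting ages
LENGTHS = range(5, 75, 5)    # 5, 10, ..., 70 years
intervals = [(a, a + length) for a, length in product(STARTS, LENGTHS) if a + length <= 90]
n_scenarios = len(intervals) * 72 * 6   # age intervals x risk factor combinations x ethnic groups

# Reported regression coefficients (a, b, c) for a + b*phi + c*phi^2.
UPPER = (-0.0053, 1.6270, -0.4808)
LOWER = (0.0026, 0.6219, 0.0038)


def approx_limits(phi: float):
    """Approximate (lower, upper) 95% confidence limits for an absolute risk phi."""
    def quad(coef):
        a, b, c = coef
        return a + b * phi + c * phi ** 2
    return quad(LOWER), quad(UPPER)


lo_05, up_05 = approx_limits(0.05)
```

For an absolute risk of 0.05 the loci give approximately (0.0337, 0.0748). The loci are fitted approximations: near $\phi = 0$ the lower locus is 0.0026, which exceeds $\phi$ itself, so they are meaningful only over the range of risks used to fit them.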

To assess the calibration of the AABCS model, we evaluated it in independent data from APA women in the WHI. We performed separate validation studies for Chinese, Japanese, Filipino, Other Asian (excluding the previous three groups), native Hawaiian, and Other Pacific Islander (excluding native Hawaiians) women. For each woman in a given category, such as Japanese women aged 50–59 years, we used the AABCS model to compute her probability of developing invasive breast cancer from her age at entry, her risk factors, and the age she would attain if she survived to the end of the original WHI follow-up on August 15, 2008. The sum of these probabilities over the women in category $i$ was the expected count, $E_i$, which we compared with the corresponding observed number of women with incident invasive breast cancer, $O_i$. In each category, we computed the ratio of the observed count ($O$) to the expected count ($E$) of invasive breast cancers, $O/E$, and a 95% confidence interval for $O/E$. In addition, $P$ values for goodness of fit were calculated for mutually exclusive and exhaustive categories of breast cancer risk factors, such as age at entry, age at menarche, number of biopsies, age at first live birth, and number of affected first-degree relatives. These $P$ values were obtained from the statistic $\chi^2 = \sum (O - E)^2 / E$, with degrees of freedom equal to the number of categories. For a single category $i$, the value $(O_i - E_i)^2 / E_i$ was compared with a $\chi^2$ distribution with one degree of freedom, and the corresponding $P$ value was two-sided. To summarize results over ethnic subgroups, we summed the $E$ and $O$ values for a given exposure category, such as age group 50–59 years or number of biopsies, over the six ethnic subgroups.
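The calibration statistics can be sketched in plain Python. The confidence interval uses a log-scale Poisson approximation, $(O/E)\exp(\pm 1.96/\sqrt{O})$, which is an assumption here (a common choice for O/E ratios), not necessarily the exact limits used in the study. The chi-square tail probability uses closed forms valid for integer degrees of freedom so the sketch needs no external libraries; the observed and expected counts are illustrative, not study results.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    tail = math.erfc(math.sqrt(y))
    tail += math.exp(-y) * sum(y ** (i - 0.5) / math.gamma(i + 0.5)
                               for i in range(1, (df - 1) // 2 + 1))
    return tail


def oe_ratio_ci(observed: int, expected: float, z: float = 1.96) -> Tuple[float, float, float]:
    # Assumed log-scale Poisson approximation: SE(log O) ~ 1/sqrt(O); needs O > 0.
    ratio = observed / expected
    half = z / math.sqrt(observed)
    return ratio, ratio * math.exp(-half), ratio * math.exp(half)


def goodness_of_fit(observed: Sequence[int], expected: Sequence[float]) -> Tuple[float, float]:
    # Degrees of freedom = number of categories, as in the text.
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf(stat, len(observed))


# Illustrative counts only.
ratio, ci_low, ci_high = oe_ratio_ci(10, 8.0)
stat, p_value = goodness_of_fit([10, 12], [8.0, 12.0])
```

Summing $O$ and $E$ over the six ethnic subgroups before calling these functions gives the pooled summaries described above.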

The concordance statistic, or area under the receiver operating characteristic curve (AUC), is the probability that a randomly selected case patient has a higher projected absolute invasive breast cancer risk than a randomly selected control subject (14). To estimate how much the factors in the AABCS model contribute to discriminatory accuracy for women of a given age, we estimated age-specific concordance statistics in two age intervals (50–59 and ≥60 years) with data from WHI and computed the unweighted average of these age-specific estimates. We used the nonparametric estimator of Wieand et al. (15), which accounts for ties and provides estimates of SEs.
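A minimal sketch of the age-specific concordance and its unweighted average follows. It uses the Mann–Whitney form with ties counted as one-half, which matches the definition above; it does not reproduce the SE computation of Wieand et al. (15). The function names and the projected risks are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def concordance(case_risks: Sequence[float], control_risks: Sequence[float]) -> float:
    """Mann-Whitney estimate of P(case risk > control risk), ties counted 1/2."""
    score = 0.0
    for r1 in case_risks:
        for r0 in control_risks:
            if r1 > r0:
                score += 1.0
            elif r1 == r0:
                score += 0.5
    return score / (len(case_risks) * len(control_risks))


def average_age_specific_auc(groups: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Unweighted mean of age-specific AUCs; groups maps age band -> (cases, controls)."""
    aucs = [concordance(cases, controls) for cases, controls in groups.values()]
    return sum(aucs) / len(aucs)


# Hypothetical projected risks for illustration.
groups = {
    "50-59": ([0.03, 0.02], [0.02, 0.01]),
    ">=60": ([0.04], [0.05, 0.03]),
}
auc_50s = concordance(*groups["50-59"])
auc_avg = average_age_specific_auc(groups)
```

Here the 50–59 band has 3 concordant pairs and 1 tie out of 4 (AUC 0.875), the ≥60 band has 1 of 2 (AUC 0.5), and the unweighted average is 0.6875. Averaging within age bands keeps age itself, a strong predictor across bands, from inflating the measure of how well the other risk factors discriminate.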