Cancer Incidence Data
We analyze incidence rates for the following cancers: bladder cancer in black versus white women; colorectal cancer in black versus white women; kidney cancer in black versus white men; pancreas cancer in black versus white women; and oral cancer in black versus white men. These cancers were selected from a larger study that will be presented separately.
We obtained case and population data from NCI's SEER program. SEER integrates cancer incidence data from 17 population-based registries with meticulous and consistent data collection and standards that together cover approximately 26 percent of the US population [19]. Our analysis covers the 16-year period from January 1, 1990 through December 31, 2005 using data released by SEER in April 2008. For this analysis, we tabulated the rates into four 4-year time periods (1990-1993, 1994-1997, 1998-2001, 2002-2005) and sixteen 4-year age-at-diagnosis groups (ages 21-24, 25-28, …, 81-84 years old) spanning nineteen partially overlapping 8-year birth cohorts from 1909 to 1981 (referred to by mid-year of birth). We excluded the youngest or oldest age group whenever the corresponding number of cases was zero.
We will use the following notation. For any given cancer and population group, matrix $Y = [Y_{pa}]$ contains the numbers of cancer diagnoses (counts) in calendar period $p = 1, \ldots, P$ and age group $a = 1, \ldots, A$, and matrix $O = [O_{pa}]$ contains the corresponding person-years. The diagonals of matrices $Y$ and $O$, from upper right to lower left, represent successive birth cohorts indexed by $c = p - a + A$, from the oldest cohort ($c = 1$) observed only for the oldest age interval $a = A$ during the earliest calendar period $p = 1$, to the youngest cohort ($c = C = P + A - 1$) observed only for the youngest age interval $a = 1$ during the most recent calendar period $p = P$. The observed incidence rates, expressed per 100,000 person-years, are $\lambda_{pa} = Y_{pa} / O_{pa}$. The expected log rates are $\rho_{pa} = \log(E(\lambda_{pa}))$. We assume the counts are realizations of Poisson processes with means $O_{pa} E(\lambda_{pa})$.
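The indexing described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the cohort index is taken to be c = p − a + A (which matches the description of the oldest cohort at a = A, p = 1 and the youngest at a = 1, p = P), and the counts passed to the rate function are made-up numbers, not SEER data.

```python
# Table dimensions taken from the data description: 4 periods, 16 age groups.
P, A = 4, 16
FIRST_PERIOD_START = 1990   # first calendar year of period p = 1
FIRST_AGE = 21              # youngest age in age group a = 1
WIDTH = 4                   # years per period and per age group


def cohort_index(p, a, n_age=A):
    """Cohort index c = p - a + A for 1-based period p and age group a."""
    return p - a + n_age


def cohort_mid_year(p, a):
    """Approximate mid-year of birth: period midpoint minus age midpoint."""
    period_mid = FIRST_PERIOD_START + WIDTH * (p - 1) + WIDTH // 2
    age_mid = FIRST_AGE + WIDTH * (a - 1) + WIDTH // 2
    return period_mid - age_mid


def rate_per_100k(cases, person_years):
    """Observed incidence rate per 100,000 person-years."""
    return 1e5 * cases / person_years


# Map every cell (p, a) to its cohort and keep one mid-year per cohort.
mid_year_by_cohort = {}
for p in range(1, P + 1):
    for a in range(1, A + 1):
        mid_year_by_cohort[cohort_index(p, a)] = cohort_mid_year(p, a)

print(len(mid_year_by_cohort))                                # number of cohorts
print(mid_year_by_cohort[1], mid_year_by_cohort[P + A - 1])   # oldest and youngest
print(rate_per_100k(30, 150_000))                             # hypothetical cell
```

Under these assumptions the sketch yields P + A − 1 cohorts whose mid-years run from the oldest to the youngest cohort in 4-year steps, in line with the nineteen overlapping 8-year cohorts described in the text.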
The APC Model
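APC analysis assumes an underlying true model for the expected rates in the population with log-linear effects for age, period, and cohort:

$$\rho_{pa} = \mu + \alpha_a + \pi_p + \gamma_c. \tag{1}$$

It is computationally convenient to make an orthogonal decomposition of the underlying APC effects and apply standard identifiability conditions [1]:

$$\rho_{pa} = \mu + \alpha_L (a - \bar{a}) + \pi_L (p - \bar{p}) + \gamma_L (c - \bar{c}) + \tilde{\alpha}_a + \tilde{\pi}_p + \tilde{\gamma}_c. \tag{2}$$

In equation (2), $\bar{a} = [(A + 1)/2]$ and $\bar{p} = [(P + 1)/2]$, where $[\,\cdot\,]$ is the greatest integer function. Also, $\bar{c} = \bar{p} - \bar{a} + A$, so the values of $\bar{a}$, $\bar{p}$, and $\bar{c}$ define convenient (but arbitrary) central or referent indices of age, period, and cohort, respectively.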
The intercept $\mu$ parameterizes the log rate at the “center” of the table, while $\alpha_L$, $\pi_L$, and $\gamma_L$ parameterize log-linear trends according to age, period, and cohort, respectively. The parameters $\tilde{\alpha}_a$, $\tilde{\pi}_p$, and $\tilde{\gamma}_c$ represent deviations from the regular trends due to non-linear influences associated with age, period, and cohort, respectively. By construction, the deviations have mean 0 and are orthogonal to the linear trends. It is well recognized that this model is not identifiable: because $c = p - a + A$, the regular trends $\alpha_L$, $\pi_L$, and $\gamma_L$ are not separable, and model parameters cannot be estimated without imposing one additional linear constraint.
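The non-identifiability can be checked numerically. The sketch below is an illustration with hypothetical trend values: it assumes the cohort index c = p − a + A and central indices defined with the greatest integer function, and shows that shifting the three linear trends by (+t, −t, +t) leaves the linear part of the log rates unchanged in every cell.

```python
def linear_predictor(alpha_L, pi_L, gamma_L, n_age, n_period):
    """Linear part of the APC model on a period-by-age grid (1-based indices)."""
    a_bar = (n_age + 1) // 2
    p_bar = (n_period + 1) // 2
    c_bar = p_bar - a_bar + n_age
    table = []
    for p in range(1, n_period + 1):
        row = []
        for a in range(1, n_age + 1):
            c = p - a + n_age
            row.append(alpha_L * (a - a_bar)
                       + pi_L * (p - p_bar)
                       + gamma_L * (c - c_bar))
        table.append(row)
    return table


def max_abs_difference(t1, t2):
    """Largest absolute cell-wise difference between two tables."""
    return max(abs(x - y) for r1, r2 in zip(t1, t2) for x, y in zip(r1, r2))


# Hypothetical trend values; t is an arbitrary shift.
base = linear_predictor(0.30, 0.02, -0.05, n_age=16, n_period=4)
t = 0.7
shifted = linear_predictor(0.30 + t, 0.02 - t, -0.05 + t, n_age=16, n_period=4)
only_period_changed = linear_predictor(0.30, 0.50, -0.05, n_age=16, n_period=4)

print(max_abs_difference(base, shifted))              # zero up to rounding error
print(max_abs_difference(base, only_period_changed))  # clearly nonzero
```

Because the two parameter sets give the same expected rates, no amount of data can distinguish them; only combinations that are invariant to the shift t can be estimated.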
As shown by Holford [2], if one fits the model using Poisson regression with the additional identifiability constraint that the coefficient of $(p - \bar{p})$ in equation (2) equals 0, it follows that

$$\rho_{pa} = \mu + (\alpha_L + \pi_L)(a - \bar{a}) + (\pi_L + \gamma_L)(c - \bar{c}) + \tilde{\alpha}_a + \tilde{\pi}_p + \tilde{\gamma}_c, \tag{3}$$

i.e., the coefficient of the age trend $(a - \bar{a})$ provides an estimate of $(\alpha_L + \pi_L)$, the coefficient of the cohort trend $(c - \bar{c})$ provides an estimate of $(\pi_L + \gamma_L)$, and the difference of these two coefficients provides an estimate of $(\alpha_L - \gamma_L)$.
This solution maximizes the Poisson log likelihood, and gives the same fitted rates as all other constraints that maximize the likelihood. Furthermore, each of the parameters in equation (3) is identifiable. Following conventions we refer to $(\alpha_L + \pi_L)$ as the “longitudinal age trend” [20], to $(\pi_L + \gamma_L)$ as the “net drift” [4], and to $(\alpha_L + \pi_L) - (\pi_L + \gamma_L)$ as the “cross-sectional age trend” [20].
It is important to recognize that application of this particular identifiability constraint in no way imposes the assumption that the true πL = 0. Rather, it reflects that the observable effects of πL cannot be separated from those for αL and γL, but can only be seen indirectly through the estimated values of identifiable quantities such as the net drift, longitudinal age trend, and cross-sectional age trend.
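A small noise-free illustration may help fix the three identifiable trends. The sketch below assumes a purely linear APC model (no deviations), the cohort index c = p − a + A, and hypothetical parameter values with a deliberately nonzero period trend. Stepping along a cohort, along an age group, and within a period recovers the longitudinal age trend, the net drift, and the cross-sectional age trend, respectively. In a real analysis these quantities are estimated by Poisson regression rather than by differencing.

```python
def log_rates(mu, alpha_L, pi_L, gamma_L, n_age, n_period):
    """Expected log rates rho[p-1][a-1] for a purely linear APC model."""
    a_bar = (n_age + 1) // 2
    p_bar = (n_period + 1) // 2
    c_bar = p_bar - a_bar + n_age
    return [[mu
             + alpha_L * (a - a_bar)
             + pi_L * (p - p_bar)
             + gamma_L * ((p - a + n_age) - c_bar)
             for a in range(1, n_age + 1)]
            for p in range(1, n_period + 1)]


# Hypothetical true values; pi_L is deliberately nonzero.
alpha_L, pi_L, gamma_L = 0.30, 0.04, -0.01
rho = log_rates(-9.0, alpha_L, pi_L, gamma_L, n_age=16, n_period=4)

# 0-based cell rho[p][a]; an interior cell so that all neighbours exist.
p, a = 1, 5
longitudinal_age_trend = rho[p + 1][a + 1] - rho[p][a]   # same cohort, one age step
net_drift = rho[p + 1][a] - rho[p][a]                    # same age group, one period step
cross_sectional_age_trend = rho[p][a + 1] - rho[p][a]    # same period, one age step

print(longitudinal_age_trend, alpha_L + pi_L)
print(net_drift, pi_L + gamma_L)
print(cross_sectional_age_trend, alpha_L - gamma_L)
```

The three printed pairs agree up to rounding even though the period trend used to generate the table is not zero, which is the point made in the text: the constraint is a computational device, not an assumption about the true period trend.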
Contrasting Two Groups or Tumor Types: Proportional Hazards and the APC Model
Now we consider the comparison of two sets of independent cause-specific hazard rates with expectations $E(\lambda_{pa}^{(1)})$ and $E(\lambda_{pa}^{(0)})$, type 1 and type 0 hazards, say, over the same ages $a = 1, \ldots, A$, periods $p = 1, \ldots, P$, and cohorts $c = 1, \ldots, P + A - 1$, assuming that a separate APC model holds for each hazard type. This general setup can be applied in many situations, for example, to compare hazard rates for the same tumor type in two population subgroups, or hazard rates for different tumor types in the same subgroup. In any given application, one hazard rate is the ‘type 1’ hazard, and the other is the ‘type 0’ hazard. For example, in our analysis of bladder cancer in black versus white women, the incidence of bladder cancer in white women is the ‘type 0’ hazard, and the incidence of bladder cancer in black women is the ‘type 1’ hazard.
Figure 1. Descriptive analysis of cancer incidence rates in blacks and whites, for selected cancers, using data from the NCI SEER database. A) Female bladder cancer; data for APC analysis included all 2186 cases in blacks and 29,148 cases in whites; panel shows …
We will address the following general questions: Under what circumstances do the hazard rates follow a proportional hazards (PH) model? If PH holds, how does one estimate the relative hazards? Clearly, it might be the case that proportionality holds in an absolute sense for all $p$ and $a$. In practice, this model might be too restrictive. Therefore, we also consider more flexible models of proportionality. For this purpose, it seems natural to develop PH models that correspond to standard descriptive plots for rates, specifically: rates by age stratified on cohort, rates by period stratified on age, and rates by age stratified on period [18,21,22]. In the next section, we develop each of these PH models from an analytical perspective, and provide illustrative examples.
The PH-A Model: Absolute Proportionality
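We will say that proportionality holds absolutely (PH-A, “A” for absolute) if the expected log rates $\rho_{pa}^{(1)}$ and $\rho_{pa}^{(0)}$ for the type 1 and type 0 hazards satisfy

$$\rho_{pa}^{(1)} - \rho_{pa}^{(0)} = \psi \quad \text{for all } p \text{ and } a. \tag{4}$$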
When PH-A holds, the quantity exp (ψ) equals the rate ratio for the type 1 hazard relative to the type 0 hazard, and this value is constant over age, period, and cohort. Clearly, the data are consistent with absolute proportionality if and only if there are no significant differences between any of the APC parameters across hazard types, except for the intercepts.
To illustrate, rates of female bladder cancer in blacks versus whites are consistent with absolute proportionality. For each age group, plots of age-specific rates (logarithmically scaled) versus calendar period reveal essentially parallel curves in blacks versus whites, and the gaps between the curves do not vary significantly by age (for clarity, only two of the 16 age groups are plotted). Using methods described subsequently, the estimated overall black-to-white incidence rate ratio $RR_{1:0}$ is 0.64 (95% Confidence Interval [CI]: 0.56 – 0.72).
Figure 2. Black-versus-white incidence rate ratios and 95% point-wise confidence limits for selected cancers, estimated from SEER using comparative APC analysis. Models were fitted to the data summarized in the descriptive analysis. A) Female bladder cancer; panel shows fitted rate …
The PH-L Model: Proportionality within Cohorts
We will say that PH holds on the “natural” or longitudinal time scale of age, if the logarithms of the age-specific hazard rates are shifted by a constant θc whenever the experience of the same birth cohort c is evaluated for the type 1 and type 0 hazards. This condition defines a PH model stratified on birth cohort, which we will denote as PH-L (“L” for longitudinal). To illustrate, rates of female colorectal cancer in blacks and whites are consistent with PH-L (three of 19 birth cohorts are shown). For each cohort, plots of age-specific rates are essentially parallel in blacks versus whites, although the gap between each pair of cohort-specific curves varies by cohort.
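In general, because longitudinal follow-up of cohorts corresponds to rates along the diagonals of the rate matrices, for each cohort indexed by $c = 1, \ldots, P + A - 1$, the youngest observed age interval is $a_{\min}(c) = \max(1, A + 1 - c)$, and the oldest observed age interval is $a_{\max}(c) = \min(A, A + P - c)$. Therefore, in terms of the expected log rates $\rho_{pa}^{(1)}$ and $\rho_{pa}^{(0)}$ for the type 1 and type 0 hazards, PH-L holds when

$$\rho_{pa}^{(1)} - \rho_{pa}^{(0)} = \theta_c \quad \text{for } a = a_{\min}(c), \ldots, a_{\max}(c) \text{ and } p = c + a - A, \tag{5}$$

for all $c$. The term $\theta_c$ equals the logarithm of the rate ratio for the type 1 hazard versus the type 0 hazard, and is a function only of cohort when PH-L holds. Note that PH holds absolutely (PH-A) if PH-L holds and $\theta_c = \theta$ for all $c$. Hence, if the gaps between the cohort-specific curves were all equal, the rates would be PH-A.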
Assuming that PH-L holds, the estimated black-to-white incidence rate ratio $\exp(\hat{\theta}_c)$ for female colorectal cancer is increasingly elevated in successive cohorts of black versus white women, up to a peak value of 1.86 (95% CI: 1.6 – 2.1) for the 1945 cohort. In subsequent cohorts the incidence rate ratio declines; no significant excess in black women is apparent in the limited available follow-up of cohorts born since 1969. The model-based estimates for the 19 cohorts concisely summarize the descriptive data, including data for the three cohorts shown in the descriptive plots.
The PH-T Model: Proportionality within Age Groups
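We will say that PH holds over time (PH-T, “T” for calendar time) if the expected log rates $\rho_{pa}^{(1)}$ and $\rho_{pa}^{(0)}$ for the type 1 and type 0 hazards satisfy

$$\rho_{pa}^{(1)} - \rho_{pa}^{(0)} = \delta_a \quad \text{for all } p = 1, \ldots, P, \tag{6}$$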
for each age group indexed by a = 1,…, A. If PH-T holds, then a plot of the logarithms of the age-specific rates for the type 1 and type 0 hazards versus calendar time will be parallel whenever the same age groups are compared. The term δa equals the logarithm of the rate ratio for the type 1 hazard versus the type 0 hazard, and is a function only of age when PH-T holds. Note that PH-A holds if PH-T holds and δa = δ for all a.
Rates of male kidney cancer in blacks and whites are consistent with PH-T (two of 16 age groups are shown). Plots of age-specific rates over calendar time are essentially parallel in blacks versus whites, and the gap between the curves varies by age group. The estimated black-to-white incidence rate ratio $\exp(\hat{\delta}_a)$ indicates generally higher rates in black versus white men, but more so among men in their 40s and 50s than men in their 60s and 70s.
The PH-X Model: Proportionality within Calendar Periods
We will say that PH holds in cross-section (PH-X, “X” for cross-section), if the expected log rates $\rho_{pa}^{(1)}$ and $\rho_{pa}^{(0)}$ for the type 1 and type 0 hazards satisfy

$$\rho_{pa}^{(1)} - \rho_{pa}^{(0)} = \phi_p \quad \text{for all } a = 1, \ldots, A, \tag{7}$$

for each period indexed by p = 1,…, P. If PH-X holds, then a plot of the logarithm of the rates versus age for the type 1 and type 0 hazards will be parallel within any given calendar period. The term ϕp equals the logarithm of the rate ratio for the type 1 hazard versus the type 0 hazard, and is a function only of period when PH-X holds. Note that PH-A holds if PH-X holds and ϕp = ϕ for all p.
Rates of female pancreas cancer are consistent with PH-X (two of four periods are shown). The cross-sectional age-specific rates are essentially parallel in blacks and whites within any given calendar period, but the gap varies by period. The estimated black-to-white incidence rate ratio $\exp(\hat{\phi}_p)$ is significantly elevated during each study period, yet declines over time, from a maximum value of 1.63 (95% CI: 1.4 – 2.0) during the 1990 – 1993 period, to a minimum of 1.20 (95% CI: 1.0 – 1.4) during the 2002 – 2005 period.
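The four definitions can be compared side by side with a short descriptive check. The sketch below is not the formal testing procedure developed later in the paper; it simply takes a period-by-age table of log rate differences (type 1 minus type 0), assumes the cohort index c = p − a + A, and reports within which strata the differences are constant. The function name and the example table are hypothetical.

```python
def ph_models_satisfied(diff, tol=1e-9):
    """Report which PH definitions an exact table of log rate differences meets.

    diff[p-1][a-1] is the type 1 minus type 0 log rate in period p, age group a.
    """
    n_period, n_age = len(diff), len(diff[0])
    strata = {"PH-L": {}, "PH-T": {}, "PH-X": {}}
    for p in range(1, n_period + 1):
        for a in range(1, n_age + 1):
            value = diff[p - 1][a - 1]
            strata["PH-L"].setdefault(p - a + n_age, []).append(value)  # same cohort
            strata["PH-T"].setdefault(a, []).append(value)              # same age group
            strata["PH-X"].setdefault(p, []).append(value)              # same period

    def constant(values):
        return max(values) - min(values) <= tol

    result = {name: all(constant(v) for v in groups.values())
              for name, groups in strata.items()}
    result["PH-A"] = constant([v for row in diff for v in row])
    return result


n_period, n_age = 4, 16
# Hypothetical differences that depend on cohort only.
cohort_only = [[0.05 * (p - a + n_age) for a in range(1, n_age + 1)]
               for p in range(1, n_period + 1)]
# Hypothetical differences that are the same in every cell.
flat = [[0.4] * n_age for _ in range(n_period)]

print(ph_models_satisfied(cohort_only))
print(ph_models_satisfied(flat))
```

With observed rates the differences are noisy, so an exact check like this is only useful for building intuition; formal assessment requires the tests described in the following sections.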
Characterizing the PH Models
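In this section we develop necessary and sufficient conditions for each type of proportionality, and derive corresponding expressions for the relative hazards. We assume that a separate APC model holds for both of the type $j$ hazards, $j = 0, 1$, with type-specific values for each APC parameter, so that the expected log rates equal

$$\rho_{pa}^{(j)} = \mu_j + (\alpha_{L,j} + \pi_{L,j})(a - \bar{a}) + (\pi_{L,j} + \gamma_{L,j})(c - \bar{c}) + \tilde{\alpha}_{a,j} + \tilde{\pi}_{p,j} + \tilde{\gamma}_{c,j}. \tag{8}$$

Clearly, PH-A holds when all identifiable parameters except for the intercepts are equal. Now consider necessary and sufficient conditions for PH-L. Write $\Delta$ for the difference between the type 1 and type 0 values of a parameter, for example $\Delta\mu = \mu_1 - \mu_0$. From equation (8), it is generally the case that the difference between the logarithms of the type 1 and type 0 hazards equals

$$\rho_{pa}^{(1)} - \rho_{pa}^{(0)} = \theta_c + \Delta(\alpha_L + \pi_L)(a - \bar{a}) + \Delta\tilde{\alpha}_a + \Delta\tilde{\pi}_p, \qquad \theta_c = \Delta\mu + \Delta(\pi_L + \gamma_L)(c - \bar{c}) + \Delta\tilde{\gamma}_c. \tag{9}$$

In equation (9) the expression for $\theta_c$ depends only on $c$, while the other terms vary with $a$ and $p$. Hence PH-L holds if and only if the following conditions are true:

$$\Delta(\alpha_L + \pi_L) = 0, \qquad \Delta\tilde{\alpha}_a = 0 \ \text{for all } a, \qquad \Delta\tilde{\pi}_p = 0 \ \text{for all } p. \tag{10}$$

In other words, PH-L holds when the longitudinal age trends, the age deviations, and the period deviations are all equal across the type 1 and type 0 hazards. Furthermore, PH holds absolutely (PH-A) when it is also the case that

$$\Delta(\pi_L + \gamma_L) = 0, \qquad \Delta\tilde{\gamma}_c = 0 \ \text{for all } c. \tag{11}$$

That is, PH-A holds if PH-L holds, and the net drifts and cohort deviations are also equal. In that case, $\theta_c = \theta = (\mu_1 - \mu_0) = \psi$.

Next, consider conditions for PH-T. By expressing equation (8) in terms of age and period, using $c - \bar{c} = (p - \bar{p}) - (a - \bar{a})$, it can be shown that

$$\rho_{pa}^{(1)} - \rho_{pa}^{(0)} = \delta_a + \Delta(\pi_L + \gamma_L)(p - \bar{p}) + \Delta\tilde{\pi}_p + \Delta\tilde{\gamma}_c, \qquad \delta_a = \Delta\mu + \Delta(\alpha_L - \gamma_L)(a - \bar{a}) + \Delta\tilde{\alpha}_a. \tag{12}$$

In equation (12) the expression for $\delta_a$ varies only with $a$, while the other terms depend on $p$ and $c$. This expression demonstrates that PH-T holds if and only if the following conditions are true:

$$\Delta(\pi_L + \gamma_L) = 0, \qquad \Delta\tilde{\pi}_p = 0 \ \text{for all } p, \qquad \Delta\tilde{\gamma}_c = 0 \ \text{for all } c. \tag{13}$$

In other words, PH-T holds when the net drifts, the period deviations, and the cohort deviations are all equal across the type 1 and type 0 hazards. Furthermore, PH holds absolutely (PH-A) when it is also the case that

$$\Delta(\alpha_L - \gamma_L) = 0, \qquad \Delta\tilde{\alpha}_a = 0 \ \text{for all } a. \tag{14}$$

That is, PH-A holds if PH-T holds, and the cross-sectional age trends and age deviations are also equal. In that case, $\delta_a = \delta = (\mu_1 - \mu_0) = \psi$. If equations (13) hold but equations (14) do not hold, then the expression for $\delta_a$ in equation (12) concisely describes the dependence of the relative hazard on age.

Finally, consider conditions for PH-X. By rearranging the terms in equation (8), it follows that

$$\rho_{pa}^{(1)} - \rho_{pa}^{(0)} = \phi_p + \Delta(\alpha_L - \gamma_L)(a - \bar{a}) + \Delta\tilde{\alpha}_a + \Delta\tilde{\gamma}_c, \qquad \phi_p = \Delta\mu + \Delta(\pi_L + \gamma_L)(p - \bar{p}) + \Delta\tilde{\pi}_p. \tag{15}$$

This expression demonstrates that PH-X holds if and only if the following conditions are true:

$$\Delta(\alpha_L - \gamma_L) = 0, \qquad \Delta\tilde{\alpha}_a = 0 \ \text{for all } a, \qquad \Delta\tilde{\gamma}_c = 0 \ \text{for all } c. \tag{16}$$

In other words, PH-X holds when the cross-sectional age trends, the age deviations, and the cohort deviations are all equal across the type 1 and type 0 hazards. Furthermore, PH holds absolutely (PH-A) when it is also the case that

$$\Delta(\pi_L + \gamma_L) = 0, \qquad \Delta\tilde{\pi}_p = 0 \ \text{for all } p. \tag{17}$$

That is, PH-A holds if PH-X holds, and the net drifts and period deviations are also equal. In that case, $\phi_p = \phi = (\mu_1 - \mu_0) = \psi$. If equations (16) hold but equations (17) do not hold, then the expression for $\phi_p$ in equation (15) concisely describes the dependence of the relative hazard on period.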
As summarized in the accompanying table, each PH model requires specific linear constraints on the APC parameters. For each model, a simple expression is available for the relative hazard that depends only on the remaining parameters, which are free to vary.
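The PH-L characterization can be verified numerically. The following sketch is an illustration under stated assumptions: log rates are built in the identifiable parameterization (intercept, longitudinal age trend, net drift, and deviations), the cohort index is c = p − a + A, all parameter values are hypothetical, and the helper `detrend` is an assumed way of generating deviations with mean 0 that are orthogonal to a linear trend. The two hazard types share the longitudinal age trend and the age and period deviations, and the check confirms that the log rate difference then depends on cohort only.

```python
import random


def detrend(values):
    """Remove the mean and linear trend so the result has mean 0 and no slope."""
    n = len(values)
    x = [i - (n - 1) / 2 for i in range(n)]
    mean = sum(values) / n
    slope = sum(xi * v for xi, v in zip(x, values)) / sum(xi * xi for xi in x)
    return [v - mean - slope * xi for xi, v in zip(x, values)]


def log_rate(par, p, a, n_age, n_period):
    """Expected log rate for 1-based period p and age group a."""
    a_bar, p_bar = (n_age + 1) // 2, (n_period + 1) // 2
    c, c_bar = p - a + n_age, p_bar - a_bar + n_age
    return (par["mu"]
            + par["age_trend"] * (a - a_bar)      # longitudinal age trend
            + par["net_drift"] * (c - c_bar)
            + par["age_dev"][a - 1] + par["period_dev"][p - 1] + par["cohort_dev"][c - 1])


rng = random.Random(0)
A, P = 16, 4
C = P + A - 1


def random_deviations(n):
    return detrend([rng.gauss(0, 0.1) for _ in range(n)])


# Type 0 parameters (all values hypothetical).
type0 = {"mu": -9.0, "age_trend": 0.30, "net_drift": 0.01,
         "age_dev": random_deviations(A), "period_dev": random_deviations(P),
         "cohort_dev": random_deviations(C)}
# Type 1 keeps the PH-L quantities but has its own intercept, net drift,
# and cohort deviations.
type1 = dict(type0, mu=-8.6, net_drift=0.03, cohort_dev=random_deviations(C))

c_bar = (P + 1) // 2 - (A + 1) // 2 + A
theta = {c: (type1["mu"] - type0["mu"])
            + (type1["net_drift"] - type0["net_drift"]) * (c - c_bar)
            + (type1["cohort_dev"][c - 1] - type0["cohort_dev"][c - 1])
         for c in range(1, C + 1)}

worst = max(abs(log_rate(type1, p, a, A, P) - log_rate(type0, p, a, A, P) - theta[p - a + A])
            for p in range(1, P + 1) for a in range(1, A + 1))
print(worst)   # zero up to floating-point rounding
```

Swapping which blocks of parameters are shared gives the analogous checks for PH-T and PH-X.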
Proportional Hazards Models for Cancer Rates
It is convenient to develop Wald tests for the PH hypotheses because the required ingredients are standard outputs of an APC analysis. For each test, the number of degrees of freedom corresponds to the number of parameters that must be equal under the model. For example, PH-L requires that the longitudinal age trends, the age deviations, and the period deviations are all equal across the type 1 and type 0 hazards, resulting in 1 + (A − 2) + (P − 2) = (A + P − 3) df. Corresponding df are 2 (A + P) − 5 for PH-A, A + 2P − 4 for PH-T, and 2A + P − 4 for PH-X.
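The degrees of freedom quoted above follow from counting the parameters that each hypothesis equates. A minimal sketch of that bookkeeping is given below; the function name is hypothetical, and the count assumes A − 2, P − 2, and (A + P − 1) − 2 free deviations for age, period, and cohort, respectively.

```python
def ph_test_df(n_age, n_period):
    """Degrees of freedom for each PH test, by counting equated parameters."""
    n_cohort = n_age + n_period - 1
    age_dev = n_age - 2         # free age deviations
    period_dev = n_period - 2   # free period deviations
    cohort_dev = n_cohort - 2   # free cohort deviations
    return {
        # two trends (longitudinal age trend and net drift) plus all deviations
        "PH-A": 2 + age_dev + period_dev + cohort_dev,
        # longitudinal age trend, age deviations, period deviations
        "PH-L": 1 + age_dev + period_dev,
        # net drift, period deviations, cohort deviations
        "PH-T": 1 + period_dev + cohort_dev,
        # cross-sectional age trend, age deviations, cohort deviations
        "PH-X": 1 + age_dev + cohort_dev,
    }


# Dimensions of the full table used in this analysis: 16 age groups, 4 periods.
print(ph_test_df(16, 4))
```

For a table with excluded age groups, the same function can be called with the reduced number of age groups.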
The vector of model parameters for the type $g$ hazard, $g = 0, 1$, is denoted $\psi_g$ and collects the intercept, the trend parameters, and the age, period, and cohort deviations. The parameters are asymptotically normally distributed. The first and last age, period, and cohort deviations are excluded from $\psi_g$ because these are determined from the others through the identifiability constraints shown in equation (2). The variance-covariance matrix $\mathrm{Var}(\hat{\psi}_g)$ can be estimated from the Poisson log-likelihoods for the numbers of type 0 and type 1 events. If over-dispersion is suspected for one or both sets of hazard rates, the variance-covariance matrices can be scaled by dispersion parameters estimated from the usual deviance statistic $D(\cdot,\cdot)$ relative to its residual degrees of freedom [23]. Using the independence assumption, PH-A, PH-L, PH-T, and PH-X are linear hypotheses that can be evaluated using a Wald Chi-Square test with the general form

$$X^2 = (\hat{\psi}_1 - \hat{\psi}_0)^{\mathsf{T}} C^{\mathsf{T}} \left[ C \left( \widehat{\mathrm{Var}}(\hat{\psi}_1) + \widehat{\mathrm{Var}}(\hat{\psi}_0) \right) C^{\mathsf{T}} \right]^{-} C (\hat{\psi}_1 - \hat{\psi}_0) \sim \chi^2_{df(C)}, \tag{18}$$

where $[\,\cdot\,]^{-}$ denotes a generalized inverse, $C$ is a square contrast matrix, conformable with $\psi_g$, with a 1 on the diagonal for each parameter that is being compared across hazard types, and 0 elsewhere, and $df(C)$ is the degrees of freedom for the test. Alternatively, likelihood ratio tests (LRT) can be constructed. Details are presented in Appendix A.1.
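A dependency-free sketch of a Wald statistic of this kind is shown below. It is an illustration, not the authors' implementation: the function names, the example estimates, and the covariance matrices are all hypothetical. It assumes independent estimates for the two hazard types, so their covariance matrices add, and it compares a chosen subset of parameters, which is equivalent to using a 0/1 diagonal contrast matrix and inverting the corresponding sub-block. Converting the statistic to a P value requires a chi-square distribution function from a statistics library, which is omitted here.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def wald_statistic(est1, est0, cov1, cov0, compared, dispersion1=1.0, dispersion0=1.0):
    """Wald chi-square for equality of the parameters indexed by `compared`.

    Returns (statistic, degrees of freedom). Each covariance matrix is scaled
    by its own dispersion parameter before the two are added.
    """
    diff = [est1[i] - est0[i] for i in compared]
    cov = [[dispersion1 * cov1[i][j] + dispersion0 * cov0[i][j] for j in compared]
           for i in compared]
    weighted = solve(cov, diff)
    return sum(d * w for d, w in zip(diff, weighted)), len(compared)


# Hypothetical three-parameter example; parameters 1 and 2 are compared.
est1 = [1.0, 0.5, 0.2]
est0 = [0.2, 0.1, 0.2]
cov = [[0.02, 0.0, 0.0],
       [0.0, 0.01, 0.005],
       [0.0, 0.005, 0.01]]
statistic, df = wald_statistic(est1, est0, cov, cov, compared=[1, 2])
print(statistic, df)
```

Passing dispersion values greater than 1 inflates the covariance and shrinks the statistic, which is the conservative behaviour intended when over-dispersion is allowed for.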
To illustrate the Wald tests, we summarize test results for the examples presented above. Rates of female bladder cancer in blacks and whites are consistent with PH-A, because none of the four Wald tests for proportionality flags any significant lack-of-fit (all $P \ge 0.29$). Rates of female colorectal cancer in blacks and whites are consistent with PH-L because departures from PH-L are not significant ($P = 0.52$), yet departures from each of the three other PH models are significant. Rates of male kidney cancer in blacks and whites are consistent with PH-T; $P = 0.75$ for PH-T versus $P < 0.01$ for each of the other three tests. Rates of female pancreas cancer are consistent with PH-X; $P = 0.14$ for PH-X versus $P \le 0.02$ for each of the other three tests.
In practice, non-proportionality is also a common finding. Rates of oral cancer in black versus white men provide an example. The non-proportionality is particularly striking when the rates are stratified by period (the curves cross for the earlier of the two calendar periods shown). Lack-of-fit is convincingly detected by each of the four Wald tests.
Figure 3. Black-versus-white incidence rates and incidence rate ratios for oral cancer in males, by age and calendar year, based on 5,516 cases in blacks and 42,468 cases in whites in SEER, using comparative APC analysis. A) Observed rates by age for two selected …
As illustrated above, the following logic can be applied to select a model. The data are consistent with PH-A when none of the four PH tests detects significant lack-of-fit. When two of the three hypotheses PH-L, PH-T, or PH-X are rejected but one is not, the data are consistent with the latter model. None of the models provides a completely adequate fit if all four PH hypotheses are rejected. When this is the case we say the rate ratios are significantly heterogeneous. Importantly, one cannot use the tests to prove that a given model holds. Rather, absence of a significant difference, especially when the significance level is well above the usual 5%, constitutes evidence that the hypothesis provides a reasonable working model.
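The selection logic just described can be written down as a small decision rule. The sketch below is one possible reading of that logic, with a hypothetical function name and a conventional 5% significance level as an assumed default. The P values in the usage example are illustrative placeholders chosen to be consistent with the summaries reported above (only 0.52, 0.75, and 0.14 are taken from the text).

```python
def select_ph_model(p_values, alpha=0.05):
    """Apply the model selection logic to the P values of the four PH tests.

    p_values maps "PH-A", "PH-L", "PH-T", "PH-X" to the P value of each test.
    """
    restricted = ("PH-L", "PH-T", "PH-X")
    rejected = {name: p_values[name] < alpha for name in ("PH-A",) + restricted}
    if not any(rejected.values()):
        return "PH-A"              # no test detects lack-of-fit
    if all(rejected.values()):
        return "heterogeneous"     # none of the PH models fits
    retained = [name for name in restricted if not rejected[name]]
    if len(retained) == 1:
        return retained[0]         # two restricted models rejected, one retained
    return "ambiguous"


# Illustrative inputs; values other than 0.52, 0.75, and 0.14 are placeholders.
examples = {
    "female bladder": {"PH-A": 0.40, "PH-L": 0.35, "PH-T": 0.29, "PH-X": 0.50},
    "female colorectal": {"PH-A": 0.001, "PH-L": 0.52, "PH-T": 0.001, "PH-X": 0.001},
    "male kidney": {"PH-A": 0.005, "PH-L": 0.005, "PH-T": 0.75, "PH-X": 0.005},
    "female pancreas": {"PH-A": 0.02, "PH-L": 0.02, "PH-T": 0.01, "PH-X": 0.14},
}
for cancer, p_values in examples.items():
    print(cancer, select_ph_model(p_values))
```

As the text cautions, a returned label indicates a reasonable working model rather than proof that the model holds.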
One interesting question is whether two sets of rates that are not PH-A can satisfy two of the three restricted PH models, PH-L and PH-T, say. This cannot be so, as shown in Appendix A.2. In principle, therefore, the existence (or not) of a parsimonious summary of the relative hazards can be determined from the data. Unfortunately, experience shows that the goodness-of-fit tests may not clearly indicate which model is best.
Uncertainty about the parameter estimates lies at the root of this problem. The accompanying figure illustrates the situation assuming no differences in age, period, or cohort deviations, so the model depends exclusively on the trend parameters. The difference between the net drifts is plotted on the $y$-axis, the difference between the longitudinal age trends is plotted on the $x$-axis, and the line $z = \{x - y = 0\}$ corresponds to equality of the cross-sectional age trends $(\alpha_{L,g} - \gamma_{L,g})$, $g = 0, 1$. PH-L holds if the differences between the expected parameters fall on the $y$-axis, PH-T if the values fall on the $x$-axis, PH-X if the values fall on line $z$, and PH-A if the values lie at the origin. Otherwise PH does not hold.
The model selection problem in comparative APC analysis.
Uncertainty about the trend parameters can be quantified by a joint confidence ellipse. It is very important that the uncertainty not be understated; therefore, in routine practice we almost always allow for a separate over-dispersion parameter for each hazard type. When the confidence region contains the origin, or intersects one and only one of the y-axis, x-axis, or line z, then the corresponding choice of model is unambiguously PH-A, PH-L, PH-T, or PH-X, respectively. Similarly, if the region excludes the x-axis, the y-axis, and line z, there is strong evidence for heterogeneity because the data support none of the PH models. The best-fitting model will be ambiguous when the confidence ellipse intersects any two of the three lines, for example, x = 0 and y = 0. In these three scenarios, two of four PH models can be rejected with confidence, but the remaining two are both consistent with the data.
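The geometric reasoning can be sketched in code. The example below is an assumption-laden illustration, not the procedure used in the paper: it takes the confidence region to be the usual normal-theory ellipse based on a chi-square quantile with 2 degrees of freedom, places the difference in longitudinal age trends on the x-axis and the difference in net drifts on the y-axis, and assumes equal deviations. An ellipse centered at the estimate reaches a line through the origin exactly when the squared standardized distance to that line does not exceed the quantile. The function name and inputs are hypothetical.

```python
import math


def consistent_models(dx, dy, var_x, var_y, cov_xy, level=0.95):
    """List the PH models whose defining point or line the ellipse reaches.

    dx: estimated difference in longitudinal age trends (x-axis)
    dy: estimated difference in net drifts (y-axis)
    """
    q = -2.0 * math.log(1.0 - level)   # chi-square quantile with 2 df

    def reaches_line(nx, ny):
        # Squared standardized distance from the estimate to nx*x + ny*y = 0.
        projected = nx * dx + ny * dy
        variance = nx * nx * var_x + 2 * nx * ny * cov_xy + ny * ny * var_y
        return projected ** 2 / variance <= q

    det = var_x * var_y - cov_xy ** 2
    origin_distance = (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det
    if origin_distance <= q:
        return ["PH-A"]
    hits = []
    if reaches_line(1, 0):     # y-axis: equal longitudinal age trends
        hits.append("PH-L")
    if reaches_line(0, 1):     # x-axis: equal net drifts
        hits.append("PH-T")
    if reaches_line(1, -1):    # line z: equal cross-sectional age trends
        hits.append("PH-X")
    return hits


# Hypothetical estimates, each with variance 0.01 and no covariance.
print(consistent_models(0.0, 0.5, 0.01, 0.01, 0.0))    # one line reached
print(consistent_models(0.2, 0.3, 0.01, 0.01, 0.0))    # two lines reached: ambiguous
print(consistent_models(1.0, -1.0, 0.01, 0.01, 0.0))   # no line reached: heterogeneity
```

An empty list corresponds to heterogeneity, a single entry to an unambiguous choice, and two or more entries to the ambiguous scenarios described above.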
Even when PH does not hold, the fitted rates will always be less noisy than the raw data. In this situation it is still valid to consider the rate ratios as a joint function of age and period (or of age and cohort). For this purpose, patterns may be seen more clearly in the model-based estimates of the rate ratios than in the corresponding empirical estimates. This is also a valid approach when the best-fit model is ambiguous.
To illustrate, fitted rate ratios reveal a clear secular pattern in oral cancer incidence in black versus white men. During the initial study period (1990 – 1993), incidence was significantly higher in black men ages 37 through 64 years compared to white men, whereas rates in younger and older black and white men were similar. By the final study period (2002 – 2005), the rate ratios had moderated for the high-risk age groups, from initial values of about 2, to significantly lower final values of about 1.2.