Standard descriptive methods for the analysis of cancer surveillance data include canonical plots based on the Lexis diagram, directly age-standardized rates (ASR), estimated annual percentage change (EAPC), and joinpoint regression. The age-period-cohort (APC) model has been used less often. Here, we argue that it merits much broader use. Firstly, we describe close connections between estimable functions of the model parameters and standard quantities such as the ASR, EAPC, and joinpoints. Estimable functions have the added value of being fully adjusted for period and cohort effects, and are generally more precise. Secondly, the APC model provides the descriptive epidemiologist with powerful new tools, including rigorous statistical methods for comparative analyses and the ability to project the future burden of cancer. We illustrate these principles using invasive female breast cancer incidence in the United States, but these concepts apply equally well to other cancer sites for incidence or mortality.
Cancer incidence and mortality rates are closely monitored to track the burden of cancer and its evolution in populations (1-4), provide etiological clues (5-11), reveal disparity (12-14), and gauge the dissemination of screening modalities (15-17) and therapeutic innovations (18, 19). A standard “toolbox” of graphical and quantitative methods has evolved to handle the needs of cancer surveillance researchers. Perhaps the most widely used methods are classical descriptive plots based on the Lexis diagram (20-22), directly age-standardized rates (ASR) (23), estimated annual percentage change (EAPC) (24), and the joinpoint regression method (25). The underlying philosophy is agnostic and empirical; hence standard tools are particularly well suited to descriptive, exploratory, and hypothesis-generating studies.
At the same time, the age-period-cohort (APC) model has been developed in the statistics literature as a mathematical counter-point to purely descriptive approaches (20, 26-33). The APC model is based on fundamental generalized linear model theory (34); in principle, it allows the descriptive epidemiologist to both generate and test hypotheses. However, although the APC model is generally accepted, our sense is it remains more of a niche methodology than an integral part of mainstream practice.
We believe two misunderstandings have slowed the uptake of the APC approach. Firstly, there are concerns about the “identifiability problem” of the APC model (27, 28). Secondly, close connections between the classical toolbox and the APC model have not been clearly spelled out in the literature. In this commentary, we will attempt to clarify both misunderstandings and thereby make the case that the APC model merits much wider use.
We will develop this commentary using as a concrete example the incidence of invasive female breast cancers in the United States. For this purpose, we obtained age-specific case and population data from the National Cancer Institute’s Surveillance, Epidemiology, and End Results 9 Registries Database (SEER9) for the 36-year time period from 1973 through 2008 (November 2010 submission) (35).
In general, for any given cancer and population group, the matrix $Y = [Y_{pa}]$, $p = 1, \ldots, P$, $a = 1, \ldots, A$, contains the number of cancer diagnoses in calendar period $p$ and age group $a$, and the matrix $O = [O_{pa}]$ contains the corresponding person-years. The observed incidence rates per 100,000 person-years are $\lambda_{pa} = 10^5 \, Y_{pa}/O_{pa}$, and the expected log rates are $\rho_{pa} = \log(E(Y_{pa})/O_{pa})$.
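As a minimal sketch of the rate calculation just described, the following computes observed rates per 100,000 person-years from a case matrix and a person-years matrix (rows are periods, columns are age groups). The function name and the counts are made up for illustration and are not SEER data.

```python
def observed_rates(cases, person_years, per=100_000):
    """Return the P x A matrix of observed rates per `per` person-years.

    cases[p][a] is the number of diagnoses and person_years[p][a] the
    person-years at risk in period p and age group a.
    """
    if len(cases) != len(person_years):
        raise ValueError("cases and person_years need the same number of periods")
    rates = []
    for y_row, o_row in zip(cases, person_years):
        if len(y_row) != len(o_row):
            raise ValueError("cases and person_years need the same number of age groups")
        rates.append([per * y / o for y, o in zip(y_row, o_row)])
    return rates


# Hypothetical counts: 2 periods x 2 age groups.
example_cases = [[10, 40], [12, 50]]
example_person_years = [[200_000, 100_000], [240_000, 100_000]]
example_rates = observed_rates(example_cases, example_person_years)
```

With these made-up inputs the younger age group has a rate of 5 per 100,000 in both periods, while the older group rises from 40 to 50 per 100,000.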
It is instructive to think of the rate matrix in terms of its corresponding Lexis diagram (Figure 1), which makes visually clear how the diagonals of matrices Y and O, from upper right to lower left, represent successive birth cohorts indexed by c = p − a + A, from the oldest (c = 1) to the youngest (c = C = P + A − 1). From this perspective, it becomes clear that a new cohort enters prospective follow-up with each consecutive calendar period. For this reason, one can think of a registry as a “cohort of cohorts.” Because cancer registries are operated in perpetuity, over time, a substantial number of birth cohorts are followed. Our example includes C = 24 nominal 8-year cohorts born from 1892 through 1984 (referred to by mid-year of birth).
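The cohort indexing can be sketched in a few lines. The values P = 9 and A = 16 below are an assumption chosen only because they are consistent with 36 years of data and C = 24; the text does not state them, and the function names are hypothetical.

```python
def cohort_index(p, a, n_age):
    """1-based cohort index c = p - a + A for period p and age group a."""
    return p - a + n_age


def cohort_labels(n_period, n_age):
    """P x A matrix whose (p, a) entry is the cohort index of that cell."""
    return [
        [cohort_index(p, a, n_age) for a in range(1, n_age + 1)]
        for p in range(1, n_period + 1)
    ]


# Assumed dimensions (not given in the text): 9 periods and 16 age groups.
P, A = 9, 16
labels = cohort_labels(P, A)

n_cohorts = len({c for row in labels for c in row})  # distinct diagonals
oldest = cohort_index(1, A, A)    # oldest age group in the first period
youngest = cohort_index(P, 1, A)  # youngest age group in the last period
```

Each diagonal of the matrix shares one cohort index, so the number of distinct indices equals P + A − 1.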
APC analysis is based on a log-linear model for the expected rates with additive effects for age, period, and cohort:
$$\rho_{pa} = \mu + \alpha_a + \pi_p + \gamma_c \qquad (1)$$
The generic additive effects in equation (1) can be partitioned into linear and non-linear components (28). There are a number of equivalent ways to make this partition while incorporating the fundamental constraint that c = p − a. Two of the most useful (36) are the age-period form
and the age-cohort form
Notation and parameters are summarized in Table 1. Importantly, all the parameters in equations (2) and (3) can be estimated from the data without imposing additional constraints, and fitted rates from both forms are identical.
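For orientation, one way the two forms can be written is sketched below. It assumes that each set of effects is split into a linear trend (αL, πL, γL) plus deviations, written here with tildes, and that the time scales are centered at reference values (bars). The tilde and bar notation is ours and may differ from the notation of equations (2) and (3) and Table 1. The only step used is the constraint c = p − a on the centered scales.

```latex
\begin{align*}
\rho_{pa} &= \mu + \alpha_L (a - \bar{a}) + \pi_L (p - \bar{p}) + \gamma_L (c - \bar{c})
             + \tilde{\alpha}_a + \tilde{\pi}_p + \tilde{\gamma}_c \\
% substitute (c - \bar{c}) = (p - \bar{p}) - (a - \bar{a}): age-period form
          &= \mu + (\alpha_L - \gamma_L)(a - \bar{a}) + (\pi_L + \gamma_L)(p - \bar{p})
             + \tilde{\alpha}_a + \tilde{\pi}_p + \tilde{\gamma}_c \\
% substitute (p - \bar{p}) = (c - \bar{c}) + (a - \bar{a}): age-cohort form
          &= \mu + (\alpha_L + \pi_L)(a - \bar{a}) + (\pi_L + \gamma_L)(c - \bar{c})
             + \tilde{\alpha}_a + \tilde{\pi}_p + \tilde{\gamma}_c
\end{align*}
```

In this sketch only the combinations αL − γL, αL + πL, and πL + γL appear as coefficients, which is why both forms can be fitted without further constraints and give the same fitted rates.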
There is a close correspondence between the APC parameters and estimable functions in Table 1 and the fundamental aspects of the data that are investigated using the standard descriptive toolbox. Before highlighting some of these connections, we attempt to shed further light on the much-discussed identifiability problem.
The aspect of identifiability in question concerns whether log-linear trends in rates can uniquely be attributed to the influences of age, period, or cohort, quantified by parameters αL, πL, and γL. Mathematically, it has been shown by Holford (28) that one cannot do this without imposing additional unverifiable assumptions, because the three time scales are co-linear (cohort equals period minus age, c = p − a). This issue has often implicitly been held out as a unique and unfortunate limitation of the APC model. In fact, the same issue affects time-to-event analysis of any cohort study.
To see this, consider the following thought experiment. Suppose one enrolls a cohort of exchangeable persons of identical age (e.g., the 1956 birth cohort in Figure 1) and follows them longitudinally over a decade for cancer. At the end of the study, one observes that the log incidence rate increases linearly with age. It is natural to attribute this trend entirely to the effects of ageing, and equate the age-associated slope to the value of a parameter αL.
However, suppose one had also assembled an identical cohort of persons of the same age, but this study had been conducted ten years earlier. It is possible that the age-associated slopes of the two studies would be very different, if disease-causing exposures out of experimental control had been increasing or decreasing in prevalence over time. Hence, the observed age-associated slope actually estimates parameter (αL + πL) or longitudinal age trend (LAT in Figure 1) (32), where αL is the component of the trend that is attributable to aging and πL is the component of the trend due to the net impact of unknown and uncontrollable exposures over successive calendar-periods.
A similar issue affects any cross-sectional analysis. To “control” for the effects of ageing, suppose one studied in succession over time an event rate in persons of the same age (e.g., age group 65-69 years in Figure 1), to estimate the slope of the time-trend πL. By definition, each successive group in this cross-sectional study was born a year later. Hence, both unknown factors and factors out of experimental control associated with birth cohort could also play a role. Therefore, the observed slope over time actually estimates a parameter (πL + γL) or net drift in Figure 1 (29, 30), where πL is the component of the trend that is attributable to calendar time and γL is the component of the trend attributable to the successive cohorts enrolled in the study.
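The point of the two thought experiments can be checked numerically. The sketch below keeps only the linear part of the model and uses made-up slope values; it shows that shifting the three slopes by (+t, −t, +t) leaves every log rate unchanged when c = p − a, while the sums αL + πL and πL + γL stay fixed. The function names are hypothetical.

```python
def linear_log_rate(alpha_l, pi_l, gamma_l, a, p):
    """Linear part of the log rate, with cohort tied to age and period."""
    c = p - a  # the collinearity constraint
    return alpha_l * a + pi_l * p + gamma_l * c


def shift(alpha_l, pi_l, gamma_l, t):
    """A different set of slopes that produces the same log rates."""
    return alpha_l + t, pi_l - t, gamma_l + t


base = (0.05, 0.01, 0.02)       # made-up (alpha_L, pi_L, gamma_L)
alt = shift(*base, t=0.03)      # equally compatible with the data

grid = [(a, p) for a in range(1, 6) for p in range(1, 6)]
max_gap = max(
    abs(linear_log_rate(*base, a, p) - linear_log_rate(*alt, a, p))
    for a, p in grid
)

lat_base, lat_alt = base[0] + base[1], alt[0] + alt[1]          # alpha_L + pi_L
drift_base, drift_alt = base[1] + base[2], alt[1] + alt[2]      # pi_L + gamma_L
```

The individual slopes differ between the two parameter sets, yet the data cannot tell them apart; only the longitudinal age trend and the net drift are pinned down.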
These simple thought experiments, Figure 1, and Table 1 illustrate an important ‘uncertainty principle’ regarding the measurement of absolute rates in cohorts. Interestingly, this principle is seldom considered in the context of most epidemiological cohort and case-control studies, perhaps because these studies have a fairly narrow accrual window and often focus on relative rates rather than absolute rates. In contrast, this issue is often central in the analysis of registry data, because the follow-up has sufficient breadth and depth to reveal long-term secular trends in the population associated with age, period, and cohort. Indeed, a unique role of registry studies is to identify and quantify such trends, thereby providing direction and guidance regarding the needs for targeted analytical studies.
The APC model provides a unique set of best-fitting log incidence rates, $\hat{\rho}_{pa}$ or equivalently $\hat{\rho}_{ca}$, obtained by plugging maximum likelihood estimators into equations (2) or (3), respectively. The corresponding variances are readily calculated. In our experience the fitted rates have an appealing amount of smoothing, and we use them routinely in our studies (36-45), especially for rare cancer outcomes. Experience suggests that for “moderate” sized rate matrices (in terms of A and P), the APC model smooths the data conservatively, about as much as a 3-point moving average, yielding around a 40-60% reduction in the width of the confidence intervals. Of course, the precise amount of noise reduction depends on a number of technical details, including whether over-dispersion is present or accounted for.
This application of the APC model is illustrated in Figure 2 for the breast cancer data. The age-standardized rates (ASRs) over time calculated using the observed rates are nearly identical to the ASRs calculated using the APC fitted rates. However, the point-wise confidence intervals for the fitted rates are substantially narrower, by around 40% averaged over the 10-year time period.
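For readers less familiar with direct standardization, the ASR for one calendar period is a weighted average of the age-specific rates, with weights taken from a standard population. The sketch below uses made-up rates and weights, not a published standard, and the function name is hypothetical; the same function applies whether the inputs are observed or APC-fitted rates.

```python
def age_standardized_rate(age_rates, std_weights):
    """Directly age-standardized rate: weighted mean of age-specific rates.

    std_weights are standard-population counts or proportions, one per
    age group; they are normalized here so they need not sum to 1.
    """
    if len(age_rates) != len(std_weights):
        raise ValueError("one weight is required per age group")
    total = sum(std_weights)
    return sum(w * r for w, r in zip(std_weights, age_rates)) / total


# Hypothetical age-specific rates per 100,000 and standard weights.
asr_rates = [20.0, 100.0, 300.0]
asr_weights = [50, 30, 20]
asr_value = age_standardized_rate(asr_rates, asr_weights)
```

Here the weighted average is (50 × 20 + 30 × 100 + 20 × 300) / 100 = 100 per 100,000.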
The APC parameter called the net drift (Table 1 and equations (2) and (3)) estimates the same quantity as the EAPC of the ASR, i.e., the overall long-term secular trend. The point estimates for these quantities are almost identical for the breast cancer data in Figure 2: net drift = 0.83% per year (95% CI: 0.78 to 0.85%/yr) and EAPC = 0.78% per year (0.18 to 1.39%/yr). However, for this example, the confidence interval is much narrower for the net drift.
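As a reminder of how an EAPC is commonly obtained, the sketch below regresses the log of the ASR on calendar year and converts the slope to a percentage. It uses unweighted least squares and a synthetic series that grows by exactly 1% per year; the use of unweighted regression, the function name, and the data are all assumptions made for illustration.

```python
import math


def eapc(years, rates):
    """EAPC from a least-squares fit of log(rate) on calendar year.

    Returns 100 * (exp(slope) - 1), the percentage change per year.
    """
    n = len(years)
    logs = [math.log(r) for r in rates]
    year_mean = sum(years) / n
    log_mean = sum(logs) / n
    sxy = sum((y - year_mean) * (l - log_mean) for y, l in zip(years, logs))
    sxx = sum((y - year_mean) ** 2 for y in years)
    slope = sxy / sxx
    return 100.0 * (math.exp(slope) - 1.0)


# Synthetic ASR series that rises by exactly 1% per year.
eapc_years = list(range(2000, 2010))
eapc_rates = [100.0 * 1.01 ** (y - 2000) for y in eapc_years]
eapc_value = eapc(eapc_years, eapc_rates)
```

Because the synthetic series is exactly log-linear, the fitted EAPC recovers the 1% annual increase up to rounding error.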
We introduced a novel estimable function called the fitted age-at-onset curve to summarize the longitudinal (i.e., cohort-specific) age-associated natural history (Table 1 and Figure 3) (46). By construction, the fitted curve extrapolates from observed age-specific rates over the full range of birth cohorts to estimate past, current, and future rates for the referent cohort, e.g., the 1932 cohort in this example. The fitted age-at-onset curve provides a longitudinal age-specific rate curve that is adjusted for both calendar-period and birth-cohort effects. We view it as an improved version of the cross-sectional age-specific rate curve, which is not adjusted for period and cohort effects (47). The fitted curve has proven very useful in practice (38-40, 42-44, 46, 48).
Finally, period deviations in the APC model (Table 1) identify changes over time; such change points are often analyzed non-parametrically using joinpoint regression methods (25). Similarly, cohort deviations can provide an explanation for joinpoint patterns in age-specific rates over time.
There are many useful extensions to the basic APC model. Estimable functions are amenable to formal hypothesis tests (29, 30). Parameters associated with age, period, and cohort can be smoothed (49). Parametric assumptions about the shape of the age incidence curve derived from mathematical models of carcinogenesis can be incorporated (50). Other extensions have included parametric (33) and nonparametric (51, 52) assessments of changes in period and cohort deviations, and simultaneous modeling of a moderate or large number of strata, such as geographic areas, using Bayes and Empirical Bayes methods (53).
Recently, we developed novel methods to compare age-related natural histories and time-trends between distinct event rates assuming that separate APC models hold for each (36). Using this approach one can formally contrast the incidence of a given tumor such as breast cancer in two populations, say Black versus White women (46), or the incidence of two tumor subtypes in the same population, say, ER positive versus ER negative breast cancers ((46), supplemental Figure). We demonstrated that two event rates are proportional over age, period, or cohort if and only if certain sets of APC parameters are all equal across the respective event-specific models (36). We also developed corresponding tests of proportionality and estimators of rate ratios.
A number of authors have forecast future cancer rates using the APC model (54-58). Projections quantify the future implications of current trends, for example, the impact of a net drift of 1% versus 2% over time, or the future impact of recent changes in birth cohort patterns.
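To give a feel for the 1% versus 2% comparison, the sketch below simply compounds a constant annual drift from a hypothetical current rate. This is a deliberately crude illustration that ignores age structure and period and cohort deviations; it is not one of the APC projection methods cited above, and the function name and starting rate are made up.

```python
def project_rate(current_rate, drift_percent, years_ahead):
    """Compound a constant annual percentage change over years_ahead years."""
    return current_rate * (1.0 + drift_percent / 100.0) ** years_ahead


# Hypothetical starting rate of 100 per 100,000, projected 20 years ahead.
rate_drift_1 = project_rate(100.0, 1.0, 20)  # roughly 122 per 100,000
rate_drift_2 = project_rate(100.0, 2.0, 20)  # roughly 149 per 100,000
```

Even under this simplification, doubling the drift from 1% to 2% more than doubles the projected increase over 20 years (about 22% versus about 49%).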
Successful technological evolution builds on effective design. This is just as true for statistical methods as for computers and cellular phones. We have argued here that the APC model provides a useful evolutionary extension to the standard armamentarium of methods available to the descriptive epidemiologist. The APC model is not a replacement for existing methods, which are popular and successful. Rather, it provides a refined means of estimating the same quantities, while also adding useful new capabilities, such as formal methods for comparing two sets of rates or projecting the future cancer burden.
Using the APC model, cancer registry data can be analyzed in the same spirit as any other epidemiological cohort using the same concepts, such as proportional hazards, confounding, and effect modification/interaction. Importantly, because cancer registries follow a cohort of cohorts, analysis of registry data can reveal fundamental changes in population rates that are not usually discernible in standard cohort or case-control studies.
Currently, software for APC analysis is available only through fairly specialized packages (SAS, R, Matlab). Good stand-alone software, together with education and training, is needed if the full potential of the APC model is to be exploited by descriptive epidemiologists.
This research was supported by the Intramural Research Program of the National Institutes of Health, National Cancer Institute. All of the authors had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
DISCLAIMER: None of the co-authors has a financial conflict of interest that would have affected this research.