A simple, computationally efficient procedure for analyzing time period and birth cohort effects on the distribution of age-specific cancer incidence rates is proposed. Assuming that cohort effects for neighboring cohorts are almost equal and using the Log-Linear Age-Period-Cohort Model, this procedure allows one to evaluate temporal trends and birth cohort variations of any type of cancer without prior knowledge of the hazard function. This procedure was used to estimate the influence of time period and birth cohort effects on the distribution of the age-specific incidence rates of first primary, microscopically confirmed lung cancer (LC) cases from the SEER9 database. It was shown that, since 1975, the time period effect coefficients for men increase up to 1980 and then decrease until 2004. For women, these coefficients increase from 1975 up to 1990 and then remain nearly constant. The LC birth cohort effect coefficients for men and women increase from the cohort of 1890–94 until the cohort of 1925–29, then decrease until the cohort of 1950–54, and then remain almost unchanged. Overall, LC incidence rates, adjusted by period and cohort effects, increase up to the age of about 72–75, turn over, and then fall after the age of 75–78. The peak of the adjusted rates in men is around the age of 77–78, while in women it is around the age of 72–73. These results suggest that the age distribution of the incidence rates in both men and women falls at old ages.
It is well recognized that aging plays a fundamental role in the development of cancer in the human adult population. To describe the relationship between cancer incidence rates and the age of cancer presentation, several mathematical models have been proposed (see, for example,1–5 and references therein). In these models, the distribution of incidence rates is presented as a set of numbers, Ii,j(ti), of new cases of a particular type of cancer that have been diagnosed during a given time period, j, per 100,000 population at each of the considered age intervals, ti.
Often, five-year-long age intervals are considered. For instance, 100 years of human life span can be divided into 20 five-year intervals: 0–4, 5–9, 10–14, …, 95–99. The center of each interval can represent the corresponding age. The values Ii,j(ti), representing cancer incidence in each of these age intervals, are called the age-specific cancer incidence rates (or simply cancer rates). These rates can be collected during one or several calendar years. In the mathematical modeling of the age distribution of cancer rates, a five-year time period is commonly used. For instance, Ii,j(ti) may represent the numbers of new cancer cases in each of the aforementioned age intervals diagnosed in the j-th time period, which corresponds to, for example, 2000–2004. Analyses of this type of data are called cross-sectional studies, because these studies aim to analyze data on people of different ages in the same time period.
The cross-sectional studies of the cancer incidence rate distributions in aging are different from the longitudinal studies of the analogous distributions. In the cross-sectional studies, the number of new cancer diagnoses can be counted simultaneously for different cohorts of people in a given time period, while in the longitudinal studies these data must be obtained for the same cohort of people but in different time periods. Each of these types of studies, cross-sectional vs. longitudinal, has its own advantages and disadvantages. For instance, it is clear that data for cross-sectional studies can be obtained much faster than for longitudinal studies. In fact, to perform the aforementioned cross-sectional study, one has to collect data over a time period of five years (2000–2004), while for the analogous longitudinal study, using a cohort of people born, say, in 1905–1909, one must collect the corresponding incidence rates over 100 years to get data for all of the considered age intervals. In addition, studies of cross-sectional data can provide clues to possible effects of the time period during which the data were collected. For instance, implementation of new diagnostic techniques in a particular time period could influence the detection of a given cancer type at earlier ages (age-period effect). On the other hand, longitudinal studies (in contrast to cross-sectional studies) can determine the influence of cohort effects on the age distribution of cancer rates (age-cohort effect). For example, dietary and life-style habits characteristic of a given generation of people can affect the cancer incidence rates.
In this connection, it must be emphasized that cross-sectional and longitudinal studies of cancer incidence rates performed independently can provide inconsistent or even confusing results. For instance, using the SEER (Surveillance, Epidemiology, and End Results) database,6 Harding and colleagues4 recently analyzed the distribution of age-specific cancer incidence rates. For the vast majority of the examined cancers they found that the rates collected during three time periods, 1979–83, 1989–93 and 1999–2003 (cross-sectional data), increase up to the age of about 80 years and then fall at the oldest ages. However, the longitudinal data for lung cancer (LC) presented by Holford7 showed that the LC cohort risk increases with age, while the LC time period risk falls at old age. Moreover, it was suggested that there was no turnover if both time period and cohort effects on cancer rates were considered.8
Accounting for time period and cohort effects (the age-period-cohort model) represents a main challenge for mathematical modeling of the relationship between cancer incidence rates and the age of cancer presentation. This is because, mathematically, this problem falls into the category of so-called identifiability problems with multiple estimators.9 In general, the parameters to be determined (i.e. the estimates of the time period and cohort effects) as a solution of the considered problem cannot be unambiguously identified. In other words, multiple estimators can provide equally good solutions for the problem, and the “true” age-period-cohort effects are difficult (if not impossible) to estimate simultaneously.9–15 The only hope for solving this problem (i.e. for obtaining consistent estimates of the period and cohort effects) is to impose an additional assumption on the data that are used.
Until recently, mathematical modeling of the relationship between cancer incidence rates and the age of cancer presentation has been performed exclusively using cross-sectional data.1–4 To address this shortcoming, we propose a simple, computationally efficient procedure for analyzing time period and birth cohort effects on the distribution of the age-specific incidence rates of cancers. Assuming that cohort effects for neighboring cohorts are almost equal and using the Log-Linear Age-Period-Cohort (LLAPC) Model, this procedure allows one to evaluate temporal trends and birth cohort variations of any type of cancer without prior knowledge of the hazard function. The proposed approach was used to analyze the influence of the time period and birth cohort effects on the LC incidence rate distributions. Only first primary, microscopically confirmed cases from the SEER9 database6 over the period of 1975–2004 were considered. Using this approach, which is valid for any hazard function, we demonstrated that the time period trends in LC are different in men and women, while the cohort trends are similar. We also demonstrated that the distribution of these incidence rates falls at old ages, even after accounting for time period and birth cohort effects.
We describe a novel, computationally efficient procedure for the analysis of the time period and birth cohort effects in the frame of the LLAPC model. This procedure is tested on the example of LC.
In our study, we used data from only the SEER registries6 that correspond to the following nine (SEER9) areas: Atlanta, Connecticut, Detroit, Hawaii, Iowa, New Mexico, San Francisco-Oakland, Seattle-Puget Sound, and Utah. We used these nine registries, rather than the current set of seventeen, because the longitudinal nature of our study requires us to use data dating back two decades when there were only nine registries. First primary, microscopically confirmed LC cases from the SEER9 database for patients with known gender and race were considered to be “filtered” data, whereas the cases where such filtering was not performed were considered to be “raw” data. We used only filtered data that are more reliable and homogeneous than raw data.5,16 The incidence rates, I(t), expressed per 100,000 persons and age-adjusted by the direct method to the 2000 United States standard population,17 and their standard errors, SE, were utilized. The data were combined in six five-year cross-sectional time periods: 1975–79, …, 2000–2004. The gender-specific incidence rates were grouped into 18 five-year age groups: 17 groups, ranging from 0 to 84 years, and the 18th group that included all cases for ages 85+.
Table 1 presents approximations of the observed incidence rates, Ii,j(ti), as the product of a hazard function, h(ti) (which is a function of age t), the time period effect coefficient, vj, and the birth cohort effect coefficient, ul:
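In LaTeX notation, the product approximation just described can be written as follows. This is a reconstruction from the definitions in the text; the use of ≈ rather than = is an assumption.

```latex
\begin{equation}
I_{i,j}(t_i) \approx h(t_i)\, v_j\, u_l ,
\qquad i = 1,\dots,n;\quad j = 1,\dots,m;\quad l = 1,\dots,k
\tag{1}
\end{equation}
```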
where i, j, and l denote the given age interval, time period, and cohort, correspondingly; n, m, and k are numbers of the age intervals, time periods, and cohorts, correspondingly.
In this table, the approximations of the cross-sectional data for six time periods 1975–79, …, 2000–2004 (index j = 1, …, 6) are shown in columns, while the approximations of the incidence rates for the same cohort groups (longitudinal data) are located along diagonals. We used only the data for the groups over age 30 (i = 7, …, 18), because the incidence rates for these groups were significant (according to SEER practice, the number of cases should exceed 15 to be statistically significant). We consider 17 birth cohorts (l = 1, …, 17), corresponding to birth year ranges of 1890–94, …, 1970–74. From this table one can see that l can be presented as l = j − i + 18.
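The index relation l = j − i + 18 can be illustrated with a minimal Python sketch. The function names `cohort_index` and `cohort_label` are hypothetical, and the cohort label attached to the open-ended 85+ group is only approximate.

```python
# Index conventions follow the text: age groups i = 7..18 (30-34, ..., 85+),
# time periods j = 1..6 (1975-79, ..., 2000-2004), cohorts l = 1..17.

def cohort_index(i, j):
    """Cohort index of the table cell (age group i, time period j)."""
    return j - i + 18

def cohort_label(l, first_start=1890, width=5):
    """Birth-year range of cohort l, e.g. l = 1 -> '1890-94'."""
    start = first_start + width * (l - 1)
    return f"{start}-{(start + width - 1) % 100:02d}"

# Group the table cells by cohort: each list is one diagonal of the table,
# i.e. the longitudinal data available for that cohort.
cells_by_cohort = {}
for j in range(1, 7):
    for i in range(7, 19):
        cells_by_cohort.setdefault(cohort_index(i, j), []).append((i, j))
```

Here `cells_by_cohort[l]` lists the (i, j) cells on the diagonal of cohort l; cohorts at the edges of the table are observed in fewer cells than those in the middle.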
Assuming that the numbers of cases have a Poisson distribution and the mathematical form of the hazard function is known a priori and the LLAPC model is used, one can make adjustments by using the maximum likelihood method for assessing the birth cohort and time period effect coefficients as well as parameters of the hazard function. These coefficients can be estimated by anchoring one time period coefficient (v = 1) and one birth cohort effect coefficient (u = 1).8,18,19 Note: the results of this procedure depend on the hazard function, and also the time period and cohort, to which the coefficients are anchored.
Below we describe a procedure that provides results independent of the hazard function. The hazard function values, presented in Table 1, can be canceled out by dividing the corresponding elements of the neighboring columns with indices j and j + 1 or j + 1 and j. Then from (1), one can obtain a pair of systems:
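Written out in LaTeX, the two systems obtained by this division would take the following form. This is a reconstruction from (1), using l = j − i + 18 so that the same age group in period j + 1 belongs to cohort l + 1; the index ranges are those stated in the text.

```latex
\begin{align}
\frac{I_{i,j}(t_i)}{I_{i,j+1}(t_i)} &\approx \frac{v_j}{v_{j+1}} \cdot \frac{u_l}{u_{l+1}} \tag{2} \\
\frac{I_{i,j+1}(t_i)}{I_{i,j}(t_i)} &\approx \frac{v_{j+1}}{v_j} \cdot \frac{u_{l+1}}{u_l} \tag{3}
\end{align}
% i = 7, ..., 18;  j = 1, ..., 5;  l = j - i + 18
```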
In (2) and (3), Ii,j(ti) can be considered as the observed values of the normally distributed variables with known standard errors, SEi,j. (Because we are using only the incidence rates with numbers of cases larger than 15, the normal distribution can be used instead of the Poisson distribution). Coefficients of variation for these rates can be estimated as:
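In the usual definition, which is assumed here, the coefficient of variation of a rate is its standard error divided by the rate itself. The symbol CV is introduced only for illustration.

```latex
\begin{equation}
\mathrm{CV}_{i,j} = \frac{SE_{i,j}}{I_{i,j}(t_i)}
\tag{4}
\end{equation}
```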
Note: (2) provides 12 × 5 conditional equations for assessing five ratios of the time period coefficients (vj/vj+1, j = 1, …, 5), and 16 ratios of the cohort effect coefficients (ul/ul+1, l = 1, …, 16). Analogously, (3) provides 12 × 5 conditional equations for assessing five ratios of the time period coefficients (vj+1/vj, j = 1, …, 5), and 16 ratios of the cohort effect coefficients (ul+1/ul, l = 1, …, 16). Here, a problem of parameter identifiability arises.7,9–16 In particular, these systems do not have a single set of best estimates of vj+1/vj and ul+1/ul or of vj/vj+1 and ul/ul+1. Indeed, suppose we obtained the best estimates of vj+1/vj and ul+1/ul. If we now multiply the estimates of vj+1/vj by a constant and divide the estimates of ul+1/ul by the same constant, the resulting set of new estimates will be as good as the initially assumed “best” estimates.
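The rescaling argument can be checked numerically with a short Python sketch. The ratio values below are made up for illustration, and `predicted_ratios` is a hypothetical helper that evaluates the product of a period ratio and a cohort ratio for every cell.

```python
import math

# Hypothetical ratios, for illustration only (not estimates from SEER data).
v_ratio = [1.10, 1.05, 0.98, 0.95, 0.93]                 # v_{j+1}/v_j, j = 1..5
u_ratio = [1.0 + 0.01 * (l - 8) for l in range(1, 17)]   # u_{l+1}/u_l, l = 1..16

def predicted_ratios(v_ratio, u_ratio):
    """Model value of I_{i,j+1}/I_{i,j} for i = 7..18 and j = 1..5."""
    table = {}
    for i in range(7, 19):
        for j in range(1, 6):
            l = j - i + 18                      # cohort of cell (i, j)
            table[(i, j)] = v_ratio[j - 1] * u_ratio[l - 1]
    return table

c = 1.7  # any nonzero constant
original = predicted_ratios(v_ratio, u_ratio)
rescaled = predicted_ratios([c * r for r in v_ratio], [r / c for r in u_ratio])

# Both parameter sets predict identical rate ratios in all 12 x 5 cells.
same = all(math.isclose(original[key], rescaled[key]) for key in original)
```

Because the constant cancels in every cell, no fit to the observed ratios can distinguish the two parameter sets.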
In order to solve this identifiability problem, additional assumptions are required.9–15 Assuming that any pair of neighboring cohorts has a cohort effect coefficient ratio close to 1, these ratios can be set equal to 1 in (2) and (3). The rationale behind this assumption is that adjacent cohorts usually overlap in time and thus the values of their cohort effect coefficients should be close. Now, for estimating the five ratios vj/vj+1 and the five ratios vj+1/vj, one has a pair of systems:
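With the cohort ratios set to 1, the pair of systems would reduce to the following form, reconstructed here from the description above.

```latex
\begin{align}
\frac{I_{i,j}(t_i)}{I_{i,j+1}(t_i)} &\approx \frac{v_j}{v_{j+1}} \tag{5} \\
\frac{I_{i,j+1}(t_i)}{I_{i,j}(t_i)} &\approx \frac{v_{j+1}}{v_j} \tag{6}
\end{align}
% i = 7, ..., 18;  j = 1, ..., 5
```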
When the coefficients of variation (4) are small, the standard errors of the ratios Ii,j(ti)/Ii,j+1(ti) and Ii,j+1(ti)/Ii,j(ti) can be calculated by the standard rules of error propagation.20
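A minimal Python sketch of the first-order propagation rule for a ratio is given below. It assumes that the numerator and denominator are independent, which the text does not state explicitly; the function name `ratio_with_se` and the numerical values are hypothetical.

```python
import math

def ratio_with_se(a, se_a, b, se_b):
    """Ratio a/b and its first-order standard error.

    Assumes a and b are independent and that both coefficients of
    variation (se_a/a and se_b/b) are small.
    """
    r = a / b
    # Relative errors of independent quantities add in quadrature.
    se_r = abs(r) * math.sqrt((se_a / a) ** 2 + (se_b / b) ** 2)
    return r, se_r

# Hypothetical rates per 100,000 with their standard errors.
r, se_r = ratio_with_se(120.0, 3.0, 100.0, 4.0)
```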
It can be shown that when the numerators and denominators in (5) and (6) are normally distributed and their coefficients of variation are small, the ratios presented in (5) and (6) will also be approximately normally distributed. Indeed, let us assume that we have random variables A1 = a1 + ε1 and A2 = a2 + ε2, where a1 ≠ 0 and a2 ≠ 0 are constants and ε1 and ε2 are normally distributed random variables with zero means and standard deviations σ1 and σ2, correspondingly. When the coefficients of variation are small (i.e. σ1/a1 ≪ 1 and σ2/a2 ≪ 1), one can expand the ratios A1/A2 and A2/A1 in bivariate Taylor series around a1/a2 and a2/a1, respectively, and consider their linear approximations:
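The first-order terms of these expansions, with ε1 and ε2 denoting the zero-mean normal deviations of A1 and A2 from a1 and a2, follow from standard calculus and can be written as:

```latex
\begin{align}
\frac{A_1}{A_2} &\approx \frac{a_1}{a_2} + \frac{1}{a_2}\,\varepsilon_1 - \frac{a_1}{a_2^{2}}\,\varepsilon_2 , \\
\frac{A_2}{A_1} &\approx \frac{a_2}{a_1} + \frac{1}{a_1}\,\varepsilon_2 - \frac{a_2}{a_1^{2}}\,\varepsilon_1 .
\end{align}
```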
Because ε1 and ε2 are normally distributed variables, these linear combinations will also be normally distributed.
In the considered incidence rate data, the coefficients of variation are less than 0.1; therefore, the errors of the observed incidence rate ratios of the systems (5) and (6) can be considered normally distributed. For estimation of vj/vj+1 and vj+1/vj, a least squares method can be applied, and the most efficient estimates of these ratios are the weighted means of the observed values Ii,j(ti)/Ii,j+1(ti) and Ii,j+1(ti)/Ii,j(ti), respectively, averaged over the index i (the weights are the reciprocals of the squares of their standard errors). The SE of the estimates (vj/vj+1)* and (vj+1/vj)* can also be calculated in a standard way (here and below, asterisks (*) designate the corresponding estimates):
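For the ratios vj/vj+1, the weighted mean and its standard error would take the usual inverse-variance form shown below. The symbols r and w are introduced here only for illustration, and the SE expression is the standard result for independent observations, assumed rather than taken from the source. The estimates of vj+1/vj are obtained in the same way from the reciprocal ratios.

```latex
\begin{align}
r_{i,j} &= \frac{I_{i,j}(t_i)}{I_{i,j+1}(t_i)}, \qquad
w_{i,j} = \frac{1}{SE(r_{i,j})^{2}}, \\
\left(\frac{v_j}{v_{j+1}}\right)^{*} &= \frac{\sum_{i=7}^{18} w_{i,j}\, r_{i,j}}{\sum_{i=7}^{18} w_{i,j}}, \qquad
SE\!\left[\left(\frac{v_j}{v_{j+1}}\right)^{*}\right] = \left(\sum_{i=7}^{18} w_{i,j}\right)^{-1/2}.
\end{align}
```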
After anchoring any time period coefficient (for example, assuming that v6 = 1 and SE(v6) = 0), one can obtain step by step the following estimates of vj* derived by the estimated ratios (vj+1/vj)*:
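With the anchor v6 = 1, the step-by-step estimates described here would follow a simple backward recursion, reconstructed from the sentence above:

```latex
\begin{equation}
v_6^{*} = 1, \qquad
v_j^{*} = \frac{v_{j+1}^{*}}{\left(v_{j+1}/v_j\right)^{*}}, \qquad j = 5, 4, \dots, 1 .
\end{equation}
```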
For the other anchored time period coefficients, the estimates of vj* can be derived analogously by means of the estimates (vj/vj+1)* and (vj+1/vj)*. The SE of vj* can also be calculated by the standard rules of error propagation.
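The estimation and anchoring steps can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the error propagation treats successive ratio estimates as independent, which is a simplifying assumption (neighboring ratios share a column of data and are in fact correlated).

```python
import math

def weighted_mean(values, ses):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)

def anchor_chain(step_ratios, step_ses):
    """Period coefficients from estimated ratios, anchored at the last period.

    step_ratios[j] estimates v_{j+2}/v_{j+1} in the 1-based notation of the
    text (list index 0 is v_2/v_1). The last coefficient is fixed at 1, SE 0.
    """
    m = len(step_ratios) + 1
    v = [0.0] * m
    se = [0.0] * m
    v[m - 1] = 1.0
    for j in range(m - 2, -1, -1):
        r, se_r = step_ratios[j], step_ses[j]
        v[j] = v[j + 1] / r
        # Assumption: errors of successive steps are independent, so
        # relative errors add in quadrature.
        rel = math.sqrt((se[j + 1] / v[j + 1]) ** 2 + (se_r / r) ** 2)
        se[j] = v[j] * rel
    return v, se
```

In use, `weighted_mean` would be applied once per pair of neighboring periods to the twelve observed ratios, and the resulting five estimates and standard errors passed to `anchor_chain`.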
After estimating the time period coefficients, the incidence rates can be corrected for the time effects and the following system can be obtained from (2) and (3):
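A reconstruction of this system in LaTeX is given below. The tilde notation for the time-corrected rates, and their definition as the observed rates divided by the estimated period coefficients, are assumptions consistent with the model (1).

```latex
\begin{align}
\tilde{I}_{i,j}(t_i) &= \frac{I_{i,j}(t_i)}{v_j^{*}}, \notag \\
\frac{\tilde{I}_{i,j}(t_i)}{\tilde{I}_{i,j+1}(t_i)} &\approx \frac{u_l}{u_{l+1}} \tag{10} \\
\frac{\tilde{I}_{i,j+1}(t_i)}{\tilde{I}_{i,j}(t_i)} &\approx \frac{u_{l+1}}{u_l} \tag{11}
\end{align}
% i = 7, ..., 18;  j = 1, ..., 5;  l = j - i + 18
```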
Here the rates on the left side of (10) and (11) are the incidence rates corrected for the time effects. By the standard rules of error propagation, one can calculate the SE of the incidence rate ratios presented on the left side of (10) and (11). Now, there are 12 × 5 conditional equations for assessing the 16 ratios of the cohort effect coefficients ul/ul+1 and the same number of equations for assessing ul+1/ul. As for the ratios of the time period coefficients, the ratios ul/ul+1 and ul+1/ul can be estimated by the weighted means of the incidence rate ratios on the left side of systems (10) and (11). The weights are the reciprocals of the squares of the SE of the incidence rate ratios. Then, by choosing an anchored cohort coefficient, say u9 = 1 (SE(u9) = 0), all cohort coefficients and their SE can be estimated step by step by a procedure analogous to the one used for the time period coefficients:
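With the anchor u9 = 1, the stepwise estimation would proceed in both directions from the anchored cohort. The recursion below is a reconstruction by analogy with the time period coefficients, not the source's own display:

```latex
\begin{align}
u_9^{*} &= 1, \\
u_{l+1}^{*} &= u_l^{*} \left(\frac{u_{l+1}}{u_l}\right)^{*}, \qquad l = 9, 10, \dots, 16, \\
u_l^{*} &= u_{l+1}^{*} \left(\frac{u_l}{u_{l+1}}\right)^{*}, \qquad l = 8, 7, \dots, 1 .
\end{align}
```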
After evaluating the time period and cohort effect coefficients, one can divide the initial incidence rates, Ii,j(ti), by the product of vj* and ul* to obtain the incidence rates corrected for time period and cohort effects:
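In LaTeX, this final correction can be written as shown below; the superscript "corr" is a hypothetical label for the corrected rates.

```latex
\begin{equation}
I^{\mathrm{corr}}_{i,j}(t_i) = \frac{I_{i,j}(t_i)}{v_j^{*}\, u_l^{*}}, \qquad l = j - i + 18 .
\end{equation}
```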
The aforementioned approach looks similar to the one used in Ref. 18. However, the approach used in Ref. 18 is based on the assumption that the birth cohort effects are absent. This allows the authors of Ref. 18 to evaluate the coefficients vj. Then, using the obtained time period effects, they correct the observed incidence rates and, after that, estimate the ul coefficients. In contrast to Ref. 18, our approach uses the assumption that cohort effects for any two neighboring cohorts are almost equal. As can be seen, this assumption is not as strong as the one used in Ref. 18. In addition, our procedure can assess time period and cohort coefficients without knowing the mathematical form of the hazard function, while the procedure described in Ref. 18 requires prior knowledge of this function.
Table 2 shows the time period distributions (presented in columns) of the first primary, microscopically confirmed LC incidence rates for women. The observed patterns of the cross-sectional data are shown along columns and those of the longitudinal data along diagonals. The cross-sectional data and the longitudinal data for the three consecutive cohorts that contain observations for the elderly exhibit turnovers at old age. Analogous observations for men also have turnovers (data not shown).
The longitudinal patterns shown in Table 2 are different from those presented in Ref. 7. In Ref. 7, using the raw LC incidence rates for women collected in Connecticut during the years 1940–1984, it was shown that the longitudinal risks always increased with age (see Table 1 in Ref. 7). This discrepancy can be explained by the fact that, in contrast to Ref. 7, where raw data were used, we analyzed filtered SEER9 data collected during 1975–2004.
After estimating the time period and birth cohort effects, the incidence rates were adjusted to the 2000–2004 time period and to the 1945–1949 birth cohort. The adjusted rates are shown in Figures 1B and 2B. As can be seen, the adjusted LC incidence rate distributions for both men and women have turnovers at old ages.
The adjusted rates for men and women increase (starting from the age of 30), reach a maximum, and then fall at old ages. The differences are only in the age at which the distributions reach the maximum and in the maximum values of the corresponding incidence rates. For men, this maximum is near the age of 77–78, while for women it is near the age of 72–73. These patterns are different from the linear patterns (up to the age of 85) obtained in Ref. 8 by accounting for time period and cohort effects on cancer. Again, this discrepancy can be explained by the use of raw data and an a priori assumed form of the hazard function for the time period and birth cohort adjustments in Ref. 8, whereas we utilized only filtered data and our approach is independent of the hazard function.
Panels C and D of Figures 1 and 2 show the changes of the time period and cohort effect coefficients for men and women, correspondingly. The time period effect coefficients for men increase from 1975 to 1980 and then decrease until 2004. For women, these coefficients increase from 1975 to 1990 and then remain nearly constant. The birth cohort effect coefficients for men and women are similar; they increase from the cohort of 1890–94 until the cohort of 1925–29, then decrease until the cohort of 1950–54, and after that remain almost unchanged. It is possible that the observed temporal differences of the LC rates in men and women can be explained by gender-specific smoking habits, as was suggested in Ref. 21 (see also references in that paper).
For analyses of the time period and birth cohort effects on the distribution of the age-specific incidence rates of cancers, a simple, computationally efficient procedure, which does not require any prior knowledge of the hazard function, was proposed. Our approach uses the LLAPC model and assumes that cohort effects for neighboring cohorts are almost equal. The proposed procedure was used for analyzing the influence of the time period and birth cohort effects on the LC incidence rate distributions. However, this procedure can be applied for different types of cancers as well as for epidemiological studies of chronic diseases.
We found that the incidence rates of first primary, microscopically confirmed LC cases from the SEER9 database, adjusted by period and cohort effects, increase for both women and men, then turn over (at ages of about 72–73 and 77–78 for women and men, correspondingly) and fall at older ages. Thus, by utilizing the longitudinal and cross-sectional data and by accounting for time period and cohort effects, we have demonstrated that the LC incidence rates have a turnover at old ages and that the age at which this turnover takes place is gender-specific. The explanation of this phenomenon should be a subject for future studies.
The authors report no conflicts of interest.