Cancer Inform. 2010; 9: 67–78.

Published online 2010 April 14.

PMCID: PMC2867636

Eppley Cancer Institute, University of Nebraska Medical Center, 986805 Nebraska Medical Center, Omaha, NE. Email: ssherm@unmc.edu

Copyright © the author(s), publisher and licensee Libertas Academica Ltd.

This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

An efficient computing procedure for estimating age-specific hazard functions by the log-linear age-period-cohort (LLAPC) model is proposed. This procedure accounts for the influence of time period and birth cohort effects on the distribution of age-specific cancer incidence rates and estimates the hazard function for populations with different exposures to a given categorical risk factor. For these populations, the ratio of the corresponding age-specific hazard functions is proposed for use as a measure of relative hazard. This procedure was used for estimating the risks of lung cancer (LC) for populations living in different geographical areas. For this purpose, the LC incidence rates in white men and women in three geographical areas (San Francisco-Oakland, Connecticut and Detroit), collected from the SEER 9 database during 1975–2004, were utilized. It was found that in white men the averaged relative hazard (an average of the relative hazards over all ages) of LC in Connecticut vs. San Francisco-Oakland is 1.31 ± 0.02, while in Detroit vs. San Francisco-Oakland this averaged relative hazard is 1.53 ± 0.02. In white women, the analogous averaged relative hazards in Connecticut vs. San Francisco-Oakland and Detroit vs. San Francisco-Oakland are 1.22 ± 0.02 and 1.32 ± 0.02, respectively. The proposed computing procedure can be used for assessing hazard functions for other categorical risk factors, such as gender, race, lifestyle, diet, obesity, etc.

In cancer epidemiology, the risk of getting a cancer at a given age (*t*) is evaluated by the age-specific incidence rate, *I*(*t*), expressed as the number of cases of a particular type of cancer per 100,000 population. Along with age, race and gender, as well as time period and birth-cohort effects,^{1–4} incidence rates also depend on other risk factors, such as geographical area, dietary factors, lifestyle habits, etc., which can be viewed as categorical variables.

Over the last 50 years, finding a direct relationship between observed incidence rates and the risk factors that determine them has been one of the main challenges of cancer epidemiology. Some progress in solving this problem has been achieved with the log-linear model.^{5,6} The log-linear age-period-cohort (LLAPC) model is used to account for age, time period and birth-cohort effects.^{7–10} According to this model, the age-specific incidence rate of a cancer can be presented as a product of the time period and birth cohort coefficients and an unknown age-specific hazard function, i.e. the risk of getting the cancer at a given age. Recently,^{11} we extended the LLAPC model to cases in which the mathematical form of the hazard function is unknown and proposed a novel computational procedure that separates the problem of estimating the time period and birth cohort coefficients from the problem of estimating the unknown hazard function.

In the present work, we expand the use of LLAPC model for characterizing unknown hazard functions for populations with different exposures to categorical risk factors (different categories of a categorical variable). In our model, the dissimilarity in exposure is presented by different descriptive categories of the corresponding categorical variable.

The proposed procedure was used for estimating the age-specific hazard functions of lung cancer (LC) for gender- and race-specific populations living in different geographical areas. For this purpose, we utilized data on LC incidence rates observed in white men and women in three geographical areas (San Francisco-Oakland, Connecticut and Detroit), collected during 1975–2004. The estimates were obtained from the observed cancer incidence rates after these rates had been corrected for time period and birth cohort effects. The corrections were made by the approach described in our previous work.^{11}

We have found that the LC hazard functions associated with living in these geographical areas have different amplitudes, but the overall shape of these functions is very similar. We have shown that geographical area risk factors influence the LC age-specific hazard functions in approximately the same manner in all ages.

Thus, in this work we provide a proof-of-concept that the proposed computing procedure can be successfully applied for estimating the influences of categorical risk factors on the hazard functions for a particular type of cancer.

According to the LLAPC model of cancer presentation in aging, the observed incidence rates can be expressed as the product of unknown time period and birth cohort coefficients and an unknown hazard function. This function represents the risk of getting cancer at a given age independently of the time period and birth cohort effects. Until recently, the use of this model in cancer epidemiology was limited to cases in which the mathematical form of the hazard function is known *a priori* (for instance, the form of the hazard function can be taken from a biological model of cancer development),^{8} although the parameters of this function can be unknown. In this case, the time period coefficients, $v_j$, the birth cohort coefficients, $u_l$, and the hazard function, $h(t_i)$, are related to the observed incidence rates by:

$${I}_{i,j}({t}_{i})={v}_{j}{u}_{l}h({t}_{i});\hspace{0.17em}i=1,\dots ,n;j=1,\dots ,m;l=1,\dots ,k$$

(1)

In (1), *I _{i,j}*(

In practice, the identifiability problem can be solved by making additional assumptions. For instance, Luebeck and Moolgavkar^{8} solved this problem by assuming that within each age interval the observed cancer cases have a Poisson distribution and that the mathematical form of the hazard function is given *a priori*. The unknown parameters were fitted within the LLAPC model by the maximum likelihood method, which assessed the birth cohort and time period effect coefficients as well as the parameters of the hazard function. The iteration process started from the assumption that the cohort effect is absent. The coefficients were estimated by anchoring one time period coefficient (*v* = 1) and one birth cohort effect coefficient (*u* = 1). Thus, the results obtained by this procedure depend on the hazard function used and on the time period and cohort to which the coefficients are anchored.
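The role of the anchors can be illustrated with a short worked equation. This is only an elementary sketch of one source of ambiguity, not the derivation used in ref. 8; the constants $a$ and $b$ are symbols introduced here for illustration. Taking logarithms of (1) gives the additive (log-linear) form, and rescaling the three factors by positive constants that cancel leaves every incidence rate unchanged:

$$\begin{array}{l}\log {I}_{i,j}({t}_{i})=\log {v}_{j}+\log {u}_{l}+\log h({t}_{i}),\\ (a{v}_{j})(b{u}_{l})\left[\frac{h({t}_{i})}{ab}\right]={v}_{j}{u}_{l}h({t}_{i}),\hspace{0.17em}a>0,\hspace{0.17em}b>0\end{array}$$

Fixing one time period coefficient and one birth cohort coefficient at 1 pins down $a$ and $b$, which removes this particular scaling freedom.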

Recently,^{11} we extended the LLAPC model of cancer presentation in aging to cases in which the mathematical form of the hazard function is unknown. In contrast to the previously used methods, this simple, computationally efficient method^{11} estimates the time period and birth cohort coefficients without any *a priori* knowledge of the hazard function. The only assumption used in that method is that the cohort effect coefficients of neighboring cohorts are nearly the same. Thus, the birth cohort and time period effect coefficients obtained by the method^{11} depend only on the time period and cohort to which the coefficients are anchored, but not on the unknown hazard function. This allows one to separate the problem of estimating the time period and birth cohort coefficients from the problem of estimating the unknown hazard function. Moreover, as we show below, the procedure^{11} allows one to estimate the age-specific hazard function associated with a given category of a categorical risk factor.

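Let us denote by $I_{i,j,c}(t_i)$ the incidence rate observed in the $i$-th age interval during the $j$-th time period in a population exposed to category $c$ of a given categorical risk factor. For each category, the LLAPC model (1) takes the form: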

$$\begin{array}{l}{I}_{i,j,c}({t}_{i})={v}_{j,c}{u}_{l,c}{h}_{c}({t}_{i})\hspace{0.17em}i=1,\dots ,n,\\ \hspace{0.17em}j=1,\dots ,m,l=1,\dots ,k\end{array}$$

(2)

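Here, $v_{j,c}$ and $u_{l,c}$ are the time period and birth cohort coefficients for category $c$, and $h_c(t_i)$ is the corresponding hazard function.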

As can be seen from (2), the hazard function depends on the category, *c*, as well as on age. Using our procedure,^{11} one can obtain the estimates of the time period and birth cohort coefficients, ${v}_{j,c}^{*}$ and ${u}_{l,c}^{*}$, and their standard errors, $\mathit{\text{SE}}({v}_{j,c}^{*})$ and $\mathit{\text{SE}}({u}_{l,c}^{*})$ (here and below the asterisk denotes estimates, as well as estimators). Again, a distinguishing feature of the procedure^{11} is that these estimates are obtained without using any information on the hazard function, $h_c(t_i)$.

Using the obtained estimates of the time period and birth cohort coefficients, ${v}_{j,c}^{*}$ and ${u}_{l,c}^{*}$, the observed incidence rates can be corrected for these effects in the following way:

$${I}_{i,j,c}^{*}({t}_{i})=\frac{{I}_{i,j,c}({t}_{i})}{{v}_{j,c}^{*}{u}_{l,c}^{*}};\hspace{0.17em}i=1,\dots ,n;\hspace{0.17em}j=1,\dots ,m;\hspace{0.17em}l=1,\dots ,k$$

(3)

In calculations we use only the incidence rates when the number of cases is larger than 15. Therefore, to characterize the error distributions of the incidence rates, the normal distribution (instead of the Poisson distribution usually used) can be utilized.^{12} It can be shown that when coefficients of variation of the *I _{i,j,c}*(

According to the standard rules of error propagation,^{13} the squared standard error of ${I}_{i,j,c}^{*}({t}_{i})$, given by (3), can be calculated by the following formula:

$$\begin{array}{l}{\mathit{\text{SE}}}^{2}[{I}_{i,j,c}^{*}({t}_{i})]={\left(\frac{1}{{v}_{j,c}^{*}{u}_{l,c}^{*}}\right)}^{2}{\mathit{\text{SE}}}^{2}[{I}_{i,j,c}({t}_{i})]\\ \qquad +{\left[-\frac{{I}_{i,j,c}({t}_{i})}{{v}_{j,c}^{*2}{u}_{l,c}^{*}}\right]}^{2}{\mathit{\text{SE}}}^{2}({v}_{j,c}^{*})\\ \qquad +{\left[-\frac{{I}_{i,j,c}({t}_{i})}{{v}_{j,c}^{*}{u}_{l,c}^{*2}}\right]}^{2}{\mathit{\text{SE}}}^{2}({u}_{l,c}^{*})\end{array}$$

(4)

where the coefficients multiplying the squared standard errors are the squares of the partial derivatives of ${I}_{c}^{*}$ with respect to $I_c$, ${v}_{c}^{*}$ and ${u}_{c}^{*}$, respectively.
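Formulas (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `corrected_rate` and all numerical inputs are hypothetical, and the three error sources are treated as independent, as formula (4) implies.

```python
import math


def corrected_rate(rate, se_rate, v, se_v, u, se_u):
    """Return (I*, SE[I*]) for one cell, following formulas (3) and (4).

    rate, se_rate : observed incidence rate and its standard error
    v, se_v       : estimated time period coefficient and its standard error
    u, se_u       : estimated birth cohort coefficient and its standard error
    """
    corrected = rate / (v * u)  # formula (3)
    variance = (
        (1.0 / (v * u)) ** 2 * se_rate ** 2        # (dI*/dI)^2 * SE^2[I]
        + (rate / (v ** 2 * u)) ** 2 * se_v ** 2   # (dI*/dv)^2 * SE^2(v)
        + (rate / (v * u ** 2)) ** 2 * se_u ** 2   # (dI*/du)^2 * SE^2(u)
    )
    return corrected, math.sqrt(variance)


# Hypothetical inputs, not taken from the paper's tables.
i_star, se_star = corrected_rate(
    rate=120.0, se_rate=6.0, v=1.2, se_v=0.03, u=0.8, se_u=0.02
)
print(i_star, se_star)
```

For a pure product or quotient such as (3), formula (4) is equivalent to adding the squared relative errors of the three inputs, which is a convenient sanity check on any implementation.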

From (2) and (3) one can obtain the following system of conditional equations:

$${I}_{i,j,c}^{*}({t}_{i})={h}_{c}({t}_{i});\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}i=1,\dots ,n;\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}j=1,\dots ,m$$

(5)

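From (5) it can be seen that the value of the hazard function, $h_c(t_i)$, at each age interval can be assessed by the weighted least squares method, which gives the weighted average of the corrected incidence rates over the time periods: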

$${h}_{c}^{*}({t}_{i})=\frac{{\sum}_{j=1}^{m}{w}_{i,j}{I}_{i,j,c}^{*}({t}_{i})}{{\sum}_{j=1}^{m}{w}_{i,j}}$$

(6)

In (6), the weights, $w_{i,j}$, are the reciprocals of the squared standard errors of the ${I}_{i,j,c}^{*}({t}_{i})$ given by formula (4). The squared standard error of the corresponding estimate, ${\mathit{\text{SE}}}^{2}[{h}_{c}^{*}({t}_{i})]$, can be easily obtained from (6):

$${\mathit{\text{SE}}}^{2}[{h}_{c}^{*}({t}_{i})]=\frac{1}{{\sum}_{j=1}^{m}{w}_{i,j}}=\frac{1}{{\sum}_{j=1}^{m}1/{\mathit{\text{SE}}}^{2}[{I}_{i,j,c}^{*}({t}_{i})]}$$

(7)

(Note, when variables on the left side of the conditional equations (5) are normally distributed with known standard errors, the least square estimators, ${h}_{c}^{*}({t}_{i})$, will be also normally distributed.)
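Formulas (6) and (7) amount to an inverse-variance weighted mean over the time periods. The following sketch illustrates this for a single age interval and a single category; the function name `weighted_hazard` and the input values are hypothetical.

```python
import math


def weighted_hazard(corrected_rates, standard_errors):
    """Return (h*, SE[h*]) for one age interval, following (6) and (7).

    corrected_rates : corrected rates I* for the m time periods
    standard_errors : their standard errors from formula (4)
    """
    weights = [1.0 / se ** 2 for se in standard_errors]  # w = 1 / SE^2
    total = sum(weights)
    estimate = sum(w * x for w, x in zip(weights, corrected_rates)) / total
    return estimate, math.sqrt(1.0 / total)  # formula (7) gives SE^2 = 1 / sum(w)


# Hypothetical corrected rates for three time periods.
h_star, se_h = weighted_hazard([100.0, 110.0, 104.0], [5.0, 10.0, 5.0])
print(h_star, se_h)
```

The less precise middle value receives a quarter of the weight of the other two, so the estimate stays close to their average.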

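From (3)–(4) and (5)–(7) it follows that the estimates, ${h}_{c}^{*}({t}_{i})$, and their *SE* can be calculated from the observed incidence rates, $I_{i,j,c}(t_i)$, the estimated coefficients, ${v}_{j,c}^{*}$ and ${u}_{l,c}^{*}$, and the corresponding standard errors.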

For populations with different exposures to the considered risk factor, the ratios of the corresponding age-specific hazard functions can be used as a measure of relative hazard. Let us denote by ${h}_{0}^{*}({t}_{i})$ and ${h}_{1}^{*}({t}_{i})$ (*i* = 1,...,*n*) the estimates of the hazard function corresponding to two categories, coded as 0 and 1. Then, at a given age interval, $t_i$, the ratio, ${r}_{1|0}^{*}({t}_{i})={h}_{1}^{*}({t}_{i})/{h}_{0}^{*}({t}_{i})$, is an estimate of the relative hazard for the population coded as 1 vs. the population coded as 0. The relative hazard averaged over all age intervals can be estimated by the weighted average:

$${R}_{1|0}^{*}=\frac{{\sum}_{i=1}^{n}{w}_{i}{r}_{1|0}^{*}({t}_{i})}{{\sum}_{i=1}^{n}{w}_{i}}$$

(8)

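In (8), the weights, $w_i$, are the reciprocals of the squared standard errors of the ${r}_{1|0}^{*}({t}_{i})$. The squared standard error of the averaged relative hazard is then: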

$${\mathit{\text{SE}}}^{2}[{R}_{1|0}^{*}]=\frac{1}{{\sum}_{i=1}^{n}{w}_{i}}=\frac{1}{{\sum}_{i=1}^{n}1/{\mathit{\text{SE}}}^{2}[{r}_{1|0}^{*}({t}_{i})]}$$

(9)
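A minimal sketch of formulas (8) and (9) follows. The formula for the standard error of each ratio is not given in this passage, so the sketch assumes first-order error propagation for a quotient of two independent estimates; that assumption, the function names and the numerical inputs are all illustrative rather than taken from the paper.

```python
import math


def relative_hazard(h1, se1, h0, se0):
    """Ratio r = h1 / h0 and its SE.

    ASSUMPTION: first-order error propagation with h1 and h0 independent.
    """
    r = h1 / h0
    se_r = r * math.sqrt((se1 / h1) ** 2 + (se0 / h0) ** 2)
    return r, se_r


def averaged_relative_hazard(ratios, ratio_ses):
    """Weighted average R* and SE[R*], following formulas (8) and (9)."""
    weights = [1.0 / se ** 2 for se in ratio_ses]
    total = sum(weights)
    average = sum(w * r for w, r in zip(weights, ratios)) / total
    return average, math.sqrt(1.0 / total)


# Hypothetical hazard estimates for two age intervals.
pairs = [
    relative_hazard(130.0, 6.5, 100.0, 5.0),
    relative_hazard(260.0, 13.0, 200.0, 10.0),
]
R, se_R = averaged_relative_hazard([p[0] for p in pairs], [p[1] for p in pairs])
print(R, se_R)
```

With more categories, the same two functions can be applied to each ratio against the reference category.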

Analogously, taking ${h}_{0}^{*}({t}_{i})$ as a standard, for multiple categories of a given risk factor (coded as *c* = 0, 1, 2, 3, …), the ratios

$$\begin{array}{l}{r}_{1|0}^{*}({t}_{i})={h}_{1}^{*}({t}_{i})/{h}_{0}^{*}({t}_{i}),\\ {r}_{2|0}^{*}({t}_{i})={h}_{2}^{*}({t}_{i})/{h}_{0}^{*}({t}_{i}),\\ {r}_{3|0}^{*}({t}_{i})={h}_{3}^{*}({t}_{i})/{h}_{0}^{*}({t}_{i}),\\ \dots \end{array}$$

(10)

will give the corresponding estimates of the relative hazard at a given age interval, $t_i$, for populations exposed to the categories *c* = 1, 2, 3, …, relative to the population exposed to category *c* = 0.

As a test-bed for the proposed procedure for evaluating hazard functions, we analyzed the LC risks associated with geographical area. We used a data preparation protocol analogous to the one described in our previous work.^{11} The first primary, microscopically confirmed LC cases for white men and women collected during 1975–2004 were extracted from the SEER 9 registries. Data for three geographical areas were utilized in our study: (i) San Francisco-Oakland, (ii) Connecticut, and (iii) Detroit, coded as *c* = 0, *c* = 1, and *c* = 2, respectively. LC incidence rates, expressed per 100,000 persons, were age-adjusted by the direct method to the 2000 United States standard population.^{15} The *SE* of the age-adjusted incidence rates were calculated as described in the SEER documentation.^{16}

The obtained incidence rates were grouped into six five-year cross-sectional time periods. These periods were indexed by *j*: 1975–79 (*j* = 1); 1980–84 (*j* = 2); 1985–89 (*j* = 3); 1990–94 (*j* = 4); 1995–99 (*j* = 5); and 2000–04 (*j* = 6). Each of these subsets was grouped into 18 five-year age groups: 17 groups, ranging from 0 to 84 years, and an 18th group that included all cases for ages 85+. These groups were indexed by *i* in the following way: 0–4 (*i* = 1), 5–9 (*i* = 2), 10–14 (*i* = 3), …, 80–84 (*i* = 17), 85+ (*i* = 18). We used only the data for the age groups 35–39 and older (*i* = 8, 9, …, 18), because only for these groups were the case counts large enough for the incidence rates to be estimated reliably. We considered 16 birth cohorts (*l* = 1, 2, …, 16), corresponding to birth year ranges of 1890–94, …, 1965–69.
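The indexing above can be checked with a small sketch. The relation between the cohort index and the age and period indices is not written out in this passage; the formula `l = 18 - i + j` below is inferred from the stated ranges (nominal cohort start year = period start year minus age group start year) and should be read as an assumption. The function names are hypothetical.

```python
def age_group_start(i):
    """Lower age bound of age group i (i = 1 for 0-4, ..., i = 18 for 85+)."""
    return 5 * (i - 1)


def period_start(j):
    """First calendar year of time period j (j = 1 for 1975-79)."""
    return 1975 + 5 * (j - 1)


def cohort_index(i, j):
    """Nominal birth cohort index l for age group i in period j (inferred)."""
    return 18 - i + j


def cohort_start(l):
    """First nominal birth year of cohort l (l = 1 for 1890-94)."""
    return 1890 + 5 * (l - 1)


# The 11 age groups used (i = 8..18) and 6 periods span 16 nominal cohorts.
cohorts = sorted({cohort_index(i, j) for i in range(8, 19) for j in range(1, 7)})
print(cohorts[0], cohorts[-1], len(cohorts))
print(cohort_start(cohort_index(8, 6)))  # youngest group in the latest period
```

Under this mapping the anchor cohort used later in the text, 1925–1929, corresponds to *l* = 8.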

Thus, the age-adjusted incidence rates of LC in white men (as well as in white women) in the three considered geographical areas were presented as the following sets of values: $I_{i,j,c}(t_i)$; *i* = 8, …, 18; *j* = 1, …, 6; *c* = 0, 1, 2.

The procedure described in our previous work^{11} was used to estimate the time period and birth cohort coefficients (and their *SE*) for the LC age-adjusted incidence rates in white men and women in each of the three considered geographical areas. Estimates of the time period and birth cohort coefficients, ${v}_{j,c}^{*}$ and ${u}_{l,c}^{*}$ (*c* = 0, 1, 2), were obtained using ${v}_{6,c}^{*}=1$ (time period 2000–2004) and ${u}_{8,c}^{*}=1$ (cohort 1925–1929) as anchors. The estimates, ${I}_{i,j,c}^{*}({t}_{i})$, and their standard errors were obtained by formulas (3) and (4), respectively. Finally, estimates of the hazard function, ${h}_{c}^{*}({t}_{i})$, and their *SE* were obtained by formulas (6) and (7).

Figure 1 presents the incidence rates observed in men during the six (five-year long) time periods of 1975–2004 in San Francisco-Oakland (panel A), Connecticut (panel B), and Detroit (panel C). Panels A–C of Figure 2 present the analogous rates observed in women. As can be seen from panels A, B and C, the observed incidence rates differ markedly across the six time periods. This significantly complicates the study of the relationship between the observed incidence rates and age.

Figure 1. Lung cancer incidence rates in white men during six (five-year) time periods of 1975–2004 in (**A**) San Francisco-Oakland, (**B**) Connecticut, and (**C**) Detroit. (**D**) Estimates of the age-specific hazard functions in these areas (error bars indicate standard **...**

Figure 2. Lung cancer incidence rates in white women during six (five-year) time periods of 1975–2004 in (**A**) San Francisco-Oakland, (**B**) Connecticut, and (**C**) Detroit. (**D**) Estimates of the age-specific hazard functions in these areas (error bars indicate **...**

Tables 1 and 2 present the estimates of the age-specific hazard functions (as well as their *SE*) of LC for the considered geographical areas in men and women, respectively. These estimates are shown graphically in panels D of Figures 1 and 2. As can be seen from these panels, the estimated values of the corresponding hazard functions exhibit definite patterns with common features: an exponential rise (from about age 40 until about age 70), a turnover (in the age interval of 70–80) and a fast fall (at older ages). Interestingly, the absolute values of the hazard function of LC determined for men in the San Francisco-Oakland area appear to be systematically lower than the corresponding estimates for the Connecticut or Detroit areas. Analogous distributions are observed for the hazard functions of LC determined for women in these areas. Based on these observations, we hypothesized that the risk factors of LC associated with geographical area uniformly influence the values of the age-specific hazard functions.

Table 1. Estimates of the age-specific hazard functions, ${h}_{0}^{*}({t}_{i})$, ${h}_{1}^{*}({t}_{i})$, and ${h}_{2}^{*}({t}_{i})$, and their standard errors (*SE*) for white men in three geographical areas: San Francisco-Oakland, Connecticut, and Detroit.

Table 2. Estimates of the age-specific hazard functions, ${h}_{0}^{*}({t}_{i})$, ${h}_{1}^{*}({t}_{i})$, and ${h}_{2}^{*}({t}_{i})$, and their standard errors (*SE*) for white women in three geographical areas: San Francisco-Oakland, Connecticut, and Detroit.

To test this hypothesis, we used the age-specific hazard function of the San Francisco-Oakland area as a standard to estimate the relative age-specific hazards, ${r}_{1|0}^{*}({t}_{i})={h}_{1}^{*}({t}_{i})/{h}_{0}^{*}({t}_{i})$ (and their *SE*), for Connecticut vs. the San Francisco-Oakland area and the relative age-specific hazards, ${r}_{2|0}^{*}({t}_{i})={h}_{2}^{*}({t}_{i})/{h}_{0}^{*}({t}_{i})$ (and their *SE*), for Detroit vs. the San Francisco-Oakland area. The obtained estimates of the relative hazards (and their *SE*) of LC for men and women are given in Table 3 (Connecticut vs. San Francisco-Oakland) and Table 4 (Detroit vs. San Francisco-Oakland).

Table 3. Estimates of the age-specific hazard function ratios and their standard errors (*SE*) for Connecticut vs. San Francisco-Oakland.

Table 4. Estimates of the age-specific hazard function ratios and their standard errors (*SE*) for Detroit vs. San Francisco-Oakland.

For graphical analysis of the estimates of the age-specific relative hazards, ${r}_{1|0}^{*}({t}_{i})$ and ${r}_{2|0}^{*}({t}_{i})$, we used 95% confidence intervals (95% CI), ${r}_{1|0}^{*}({t}_{i})\pm 1.96\cdot \mathit{\text{SE}}[{r}_{1|0}^{*}({t}_{i})]$ and ${r}_{2|0}^{*}({t}_{i})\pm 1.96\cdot \mathit{\text{SE}}[{r}_{2|0}^{*}({t}_{i})]$. Preliminary analysis showed that the estimates of the age-specific relative hazards fluctuate slightly around constants that depend on the considered geographical area and gender. To determine these constants, we applied linear regression analysis. In this case, the most efficient estimates of the corresponding constants are given by formula (8). We determined the estimates of the averaged relative hazards of LC in the Connecticut vs. San Francisco-Oakland areas, ${R}_{1|0}^{*}$, and in the Detroit vs. San Francisco-Oakland areas, ${R}_{2|0}^{*}$. The *SE* of the corresponding estimates were calculated by formula (9).

Outliers (i.e. those points which have large influence on the resulting fit) were excluded by the standard procedures of the linear regression analysis.^{14} After omitting these outliers, the estimates of the constants were recomputed.

Calculations showed that for men living in Connecticut vs. San Francisco-Oakland, the estimate of the averaged relative hazard (±*SE*) of LC is 1.31 ± 0.02, while for men living in Detroit vs. San Francisco-Oakland this estimate is 1.53 ± 0.02. Analogous calculations suggest that for women living in Connecticut vs. San Francisco-Oakland, the averaged relative hazard is 1.22 ± 0.02, while for women living in Detroit vs. San Francisco-Oakland this hazard is 1.32 ± 0.02.

In Figure 3, panel (A) shows the graph of the relative hazards with their 95% CI, ${r}_{1|0}^{*}({t}_{i})\pm 1.96\cdot \text{SE}[{r}_{1|0}^{*}({t}_{i})]$, for white men in Connecticut vs. San Francisco-Oakland. Panel (B) of this figure shows the relative hazards with 95% CI, ${r}_{2|0}^{*}({t}_{i})\pm 1.96\cdot \text{SE}[{r}_{2|0}^{*}({t}_{i})]$, for men in Detroit vs. San Francisco-Oakland. Analogously, panels A and B in Figure 4 show the relative hazards with 95% CI, for white women. On these panels, the horizontal line indicates the average of the relative hazards and error bars indicate the 95% CI.

Figure 3. The estimates of the age-specific relative hazards in white men: (**A**) for Connecticut vs. San Francisco-Oakland and (**B**) for Detroit vs. San Francisco-Oakland. Error bars indicate 95% confidence intervals. Open circles indicate outliers. Dashed line indicates **...**

Figure 4. The estimates of the age-specific relative hazards in white women: (**A**) for Connecticut vs. San Francisco-Oakland and (**B**) for Detroit vs. San Francisco-Oakland. Error bars indicate 95% confidence intervals. Open circles indicate outliers. Dashed line indicates **...**

Assuming that the estimate of the averaged relative hazard is equal to the mathematical expectation of this estimator, the estimates of the relative hazards can be compared with the averaged relative hazard. When the 95% CI of the relative hazard intersects with the corresponding averaged relative hazard, this relative hazard can be considered as statistically indistinguishable from the averaged value.
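The comparison described above can be sketched as a simple interval check. The function names and the age-specific values below are hypothetical; only the averaged value 1.31 (men, Connecticut vs. San Francisco-Oakland) is taken from the text.

```python
def ci95(estimate, se):
    """95% confidence interval: estimate +/- 1.96 * SE."""
    half_width = 1.96 * se
    return estimate - half_width, estimate + half_width


def consistent_with_average(ratio, se_ratio, averaged):
    """True when the 95% CI of an age-specific ratio contains the average."""
    low, high = ci95(ratio, se_ratio)
    return low <= averaged <= high


averaged = 1.31  # averaged relative hazard reported in the text
# Hypothetical age-specific relative hazards and standard errors.
print(consistent_with_average(1.40, 0.06, averaged))
print(consistent_with_average(1.60, 0.05, averaged))
```

In the first call the interval covers 1.31, so that point would be treated as statistically indistinguishable from the average; in the second it does not.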

Analysis of Figures 3 and 4 suggests that the age-specific relative hazards of LC are nearly constant and depend on the geographical area and gender. In fact, data presented in Table 3 (after excluding one outlier) show that the risk of LC in Connecticut vs. San Francisco-Oakland is about 1.3 times higher for men, whereas for women it is about 1.2 times higher. Analogously, data in Table 4 (after excluding outliers) show that for men in Detroit vs. San Francisco-Oakland this risk is about 1.5 times higher, while for women it is about 1.3 times higher. Note that the trends apparent in Figures 3 and 4 are visually exaggerated, because the scale of the *x* axis in these figures is about 100 times smaller than the scale of the *y* axis. Regression analysis showed that the slopes of the linear regression lines for men in Connecticut vs. San Francisco-Oakland (Fig. 3A) and Detroit vs. San Francisco-Oakland (Fig. 3B) are 0.0023 (*SE* of 0.0009) and 0.0014 (*SE* of 0.0020), respectively. The analogous slopes for women in Connecticut vs. San Francisco-Oakland (Fig. 4A) and Detroit vs. San Francisco-Oakland (Fig. 4B) are −0.0038 (*SE* of 0.0012) and −0.0056 (*SE* of 0.0023), respectively. We also found that even when outliers are not excluded, the slopes for men and women do not exceed 0.008 (i.e. the values of the slopes are always near zero). This suggests that the age-specific relative hazards of LC are nearly constant.
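A slope check of this kind can be sketched with a closed-form straight-line fit. The text does not say whether the regression was weighted, so weighting each point by 1/SE² is an assumption here, as are the function name and the input values (age interval midpoints and ratios are made up for illustration).

```python
import math


def weighted_line_fit(ages, ratios, ratio_ses):
    """Fit ratio = intercept + slope * age by weighted least squares.

    ASSUMPTION: weights are 1 / SE^2 with known, independent errors.
    Returns (intercept, slope, SE of slope).
    """
    w = [1.0 / se ** 2 for se in ratio_ses]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, ages))
    swy = sum(wi * y for wi, y in zip(w, ratios))
    swxx = sum(wi * x * x for wi, x in zip(w, ages))
    swxy = sum(wi * x * y for wi, x, y in zip(w, ages, ratios))
    delta = sw * swxx - swx ** 2
    slope = (sw * swxy - swx * swy) / delta
    intercept = (swxx * swy - swx * swxy) / delta
    se_slope = math.sqrt(sw / delta)
    return intercept, slope, se_slope


# Hypothetical relative hazards at four age-interval midpoints.
ages = [42.5, 52.5, 62.5, 72.5]
ratios = [1.28, 1.33, 1.30, 1.34]
ses = [0.05, 0.05, 0.05, 0.05]
intercept, slope, se_slope = weighted_line_fit(ages, ratios, ses)
print(slope, se_slope)
```

A slope that is small compared with its standard error is consistent with a relative hazard that does not change with age.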

Based on this analysis, we suggest that the risk factors of LC associated with the geographical area uniformly influence the values of the age-specific hazard functions. This is illustrated by Figures 5 and 6, which show that after adjustment by the corresponding averaged relative hazard, the shapes of the age-specific hazard functions for white men and women living in Connecticut and Detroit are almost identical to the corresponding age-specific hazard functions for white men and women living in the San Francisco-Oakland area. For Connecticut and Detroit, the hazard functions were adjusted to the hazard function of the San Francisco-Oakland area by dividing the hazard function values by the corresponding value of the averaged relative hazard.
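The adjustment itself is a single division, sketched below. The function name and hazard values are hypothetical; scaling the standard errors by the same constant treats the averaged relative hazard as fixed, which is an assumption of this sketch and not a statement from the text.

```python
def adjust_hazard(hazards, standard_errors, averaged_relative_hazard):
    """Divide hazard values (and, by assumption, their SE) by R*."""
    adjusted = [h / averaged_relative_hazard for h in hazards]
    adjusted_ses = [se / averaged_relative_hazard for se in standard_errors]
    return adjusted, adjusted_ses


# Hypothetical Connecticut hazard values; 1.31 is the averaged relative
# hazard for men (Connecticut vs. San Francisco-Oakland) from the text.
adjusted, adjusted_ses = adjust_hazard([131.0, 262.0], [6.55, 13.1], 1.31)
print(adjusted, adjusted_ses)
```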

Figure 5. Comparison of age-specific hazard functions of lung cancer in white men unadjusted (**A** and **C**) and adjusted (**B** and **D**) for geographical location. Error bars indicate standard errors. **A**) Unadjusted hazard functions in Connecticut and San Francisco-Oakland. **...**

In this work, we proposed an efficient computing procedure for estimation of the age-specific hazard functions in the LLAPC model. This procedure is based on the novel approach for analysis of time period and birth cohort effects on the distribution of the age-specific cancer incidence rates, developed in our previous work.^{11}

The procedure proposed in the present work allows one to estimate the age-specific hazard functions for populations with different exposures to a given categorical risk factor. The ratios of hazard functions for populations with different exposures to a given categorical risk factor are used for characterizing relative age-specific hazards of cancers.

As a proof-of-concept that this procedure can be used to evaluate the influence of categorical risk factors on the age-specific hazard functions, we estimated LC risk for populations living in different geographical areas. For this purpose, we utilized data on the LC incidence rates in white men and women, collected in the San Francisco-Oakland, Connecticut and Detroit areas during 1975–2004.

We have found that the hazard functions of LC in white men and women associated with living in these geographical areas differ in amplitude, but their overall shapes are similar, i.e. the geographical area risk factors influence the LC age-specific hazard functions in approximately the same manner at all ages. We have shown that in white men the averaged relative hazard of LC in Connecticut vs. San Francisco-Oakland is 1.31 ± 0.02, while in Detroit vs. San Francisco-Oakland this relative hazard is 1.53 ± 0.02. In white women, the analogous relative hazards in Connecticut vs. San Francisco-Oakland and Detroit vs. San Francisco-Oakland are 1.22 ± 0.02 and 1.32 ± 0.02, respectively.

We suggest that the proposed computing procedure can be used for assessing hazard functions for other categorical risk factors, such as gender, race, lifestyle, diet, obesity, etc.

This work was partially supported by 5 P30 CA36727 (NIH) grant and LB506 grant (Nebraska Department of Health). Authors acknowledge Dr. Leo Kinarsky for fruitful discussion and helpful comments.

**Disclosures**

This manuscript has been read and approved by all authors. This paper is unique and is not under consideration by any other publication and has not been published elsewhere. The authors and peer reviewers of this paper report no conflicts of interest. The authors confirm that they have permission to reproduce any copyrighted material.

1. Clayton D, Schifflers E. Models for temporal variation in cancer rates. I: age-period and age-cohort models. Statistics in Medicine. 1987;6:449–67.

2. Clayton D, Schifflers E. Models for temporal variation in cancer rates. II: age-period-cohort models. Statistics in Medicine. 1987;6:469–81.

3. Holford TR. Understanding the effects of age, period, and cohort on incidence and mortality rates. Annu Rev Public Health. 1991;12:425–57.

4. Moolgavkar SH, Lee JAH, Stevens RG. Analysis of vital statistical data. In: Rothman K, Greenland S, editors. Modern Epidemiology. 2nd Ed. Lippincott-Raven; PA: 1998. pp. 482–97.

5. Selvin S. Statistical Analysis of Epidemiologic Data. 3rd Ed. Oxford University Press; 2004. pp. 263–90.

6. Holford T. Multivariate Methods in Epidemiology. Oxford University Press; 2002. pp. 205–26.

7. Fu WJ. A smoothing cohort model in age-period-cohort analysis with applications to homicide arrest rates and lung cancer mortality rates. Sociol Method Res. 2008;36:327–61.

8. Luebeck EG, Moolgavkar SH. Multistage carcinogenesis and the incidence of colorectal cancer. Proc Natl Acad Sci U S A. 2002;99:15095–100.

9. Meza R, Jeon J, Moolgavkar SH, Luebeck EG. Age-specific incidence of cancer: phases, transitions, and biological implications. Proc Natl Acad Sci U S A. 2008;105:16284–9.

10. Moolgavkar SH, Meza R, Turim J. Pleural and peritoneal mesotheliomas in SEER: age effects and temporal trends, 1973–2005. Cancer Causes Control. 2009;20(6):935–44.

11. Mdzinarishvili T, Gleason MX, Sherman S. A novel approach for analysis of the log-linear age-period-cohort model: Application to Lung Cancer Incidence. Cancer Informatics. 2009;7:271–80.

12. Devore JL, Berk KN. Modern Mathematical Statistics with Applications. Duxbury Press; 2007. p. 838.

13. Lindberg V. Guide to uncertainties and error propagation. Rochester, NY; c1999–2003 [updated 2003 Aug; cited 2009 Feb 2]. Available from: http://www.rit.edu/cos/uphysics/uncertainties/Uncertainties.html

14. Chatterjee S, Hadi AS, Price B. Regression analysis by example. 3rd Ed. Wiley; New York: 2000. p. 18.

15. Surveillance, Epidemiology, and End Results (SEER) Program. Standard Populations (Millions) for Age-Adjustment [cited 2009 Feb 2]. Available from: http://seer.cancer.gov/stdpopulations/stdpop.singleagesthru99.txt

16. Surveillance, Epidemiology, and End Results (SEER) Program. Rate Algorithms [cited 2009 Feb 2]. Available from: http://seer.cancer.gov/seerstat/WebHelp/Rate_Algorithms.htm
