|Home | About | Journals | Submit | Contact Us | Français|
A study is undertaken to evaluate the temporal projection methods that are applied by the American Cancer Society to predict four-year-ahead projections.
We obtained cancer mortality data recorded in each year from 1969 through 2007 for the US overall and for each state from the National Center for Health Statistics. Based on the mortality data through 2000, 2001, 2002 and 2003, we projected 4-years ahead to estimate the expected number of cancer deaths in 2004, 2005, 2006, 2007, respectively, in the US and in each state using five projection methods. These predictive estimates are compared to the observed number of deaths that occurred for all cancers combined and 47 cancer sites at the national level, and 21 cancer sites at the state level.
Among the models we compared, the joinpoint regression model with modified BIC selection produced estimates that are closest to the actual number of deaths Overall, we find the 4-year-ahead projection using any one method has larger error than 3-year-ahead projection of death counts using the same method. However, 4-year-ahead projection from the new method performs better than the 3-year-ahead projection from the current State Space method.
Joinpoint regression modified BIC model has the smallest error of all the models considered for four-year-ahead projection of cancer deaths to the current year for the United States overall and for each state. This method will be used by the American Cancer Society to project the number of cancer deaths starting in 2012.
For more than half a century, the American Cancer Society has published the number of cancer deaths expected to occur in the current year for the total US and for individual states1–2. These estimates have traditionally been available in January in order to provide the contemporary burden of cancer in that particular year. However, in the past few years, these estimates were published as late as June because of delays in the release of the final mortality data for the most recent data year from the National Center for Health Statistics, which are used in the computation of the estimates. The timing of release of final mortality data for the US is largely contingent on receipt of information on all deaths registered in all states. The delay in receipt of data from even one state may result in delays in the release of national data.
The estimated number of cancer deaths in the current year have been based on a statistical model (the state-space model 3) since 2004 that projects three years ahead of the most recent data year using the actual number of cancer deaths beginning in 1969. However, the estimated deaths in 2011 were based on a four-year-ahead projection using the actual number of cancer deaths from 1969–2007 4 because of a protracted delay in the release of the final mortality data for 2008.
In this paper, we compare the accuracy of the state-space model and four additional models for four-year-ahead projection of cancer deaths to the current year nationally and in each state in order to determine the best method of projection. The additional models are: (1) the Bayesian State-Space Model5, (2) two versions of the Joinpoint Regression Model (based on the permutation test and modified-BIC criteria for selecting the number of joinpoints)6, (3) the Nordpred Model7, and (4) the Vector Autoregressive Model via Hilbert-Huang Transform8.
Based on the historical data we used all the methods to project the number of deaths four years ahead to 2004, 2005, 2006, and 2007 using data through 2000, 2001, 2002 and 2003 respectively. Because it is not obvious if a longer historical series necessarily produces more accurate predictions, we tried all methods with both the full historical series (using data starting in 1969), and a shorter series using the last 15 years of historical data. These predictive estimates are compared to the observed number of deaths that occurred for all cancers combined and 47 cancer sites at the national level, and 21 cancer sites at the state level. In the following sections we describe the data used for this evaluation, develop criteria for evaluating the performance of the models, describe the five models, and summarize the results of our testing. Finally, select a method to be utilized for projecting cancer mortality counts by the American Cancer Society for 2012 and after.
We obtained cancer mortality data recorded in each year from 1969 through 2007 for the US overall and for each state from the National Center for Health Statistics. The cancer mortality data can also be retrieved from Surveillance Epidemiology and End Results (SEER)9 database of the National Cancer Institute. For the national data, we used all cancer sites combined and 47 individual cancer sites. For the state level data, we selected 21 individual cancer sites and all the cancer sites combined. Table 1 shows the cancer sites we used. The associated ICD codes can be viewed from SEER website (http://seer.cancer.gov/codrecode/1969+_d09172004/index.html).
The comparison is based on 4-year-ahead projection using data from 1969–2007. For full available years data points, we use mortality data from 1969 to 2000 to predict cancer deaths in 2004, data from 1969 to 2001 to predict cancer deaths in 2005, data from 1969 to 2002 to predict cancer deaths in 2006, and data from 1969 to 2003 to predict cancer deaths in 2007. To have a more thorough comparison among these different methods, we use various lengths of years of historical mortality data to project 4 years ahead. In addition to using the entire time span of the data beginning in 1969, we consider a shorter sequence of years to project future mortality because: (1) mortality data have been constantly improving in quality, and so a reliance on older more error-prone data may not be justified, and, (2) we noticed that the trend for larger cancer sites is approximately linear in the recent years, and it was not clear if older data would add more signal or noise to the prediction. In particular, we chose 15 years of data for projection because 15 years is the minimum number that the Nordpred method can be run on (see description of Nordpred below). For the shorter series, we used 15 years of mortality data from 1986 to 2000 to predict cancer deaths in 2004, data from 1987 to 2001 to predict cancer deaths in 2005, data from 1988 to 2002 to predict cancer deaths in 2006, and data from 1989 to 2003 to predict cancer deaths in 2007. For all the methods we compared the result using both the 15 year and all available years historical series.
To measure the accuracy of the prediction counts, for each year the prediction error is estimated by computing the predicted minus the observed value. We then find the ratio of the prediction error to the observed count, adding 0.5 to the denominator to avoid numerical error when the observed count is zero. The ratio is the percentage error of the prediction, compared to the observed count:
Finally, the average absolute relative deviation (AARD) is computed as the average of these error ratios over all the cases considered in the data. More specifically, assume s is the predicted mortality for specific scenario s, s=1,…,S, and that θs is the true observed count. Then . For example, when calculating AARD for all sites mortality at the state level, the observed data are death counts for each site-state combination from year 2004 to year 2007. There are 8088 cases across the states and cancer sites, and for each case, there are observed deaths from years 2004, 2005, 2006, and 2007. Therefore, S = 8088 (4) = 32352.
The AARD is interpreted as the average percent deviation from the true value (number of deaths) relative to the true value. This measure attempts to take into account the relative differences in observed mortality counts among different cancers and/or geographic areas as we attempt to assess the extent to which the estimates deviate from observed. We also obtain the AARD for each method considered and for categories of the mortality counts, from rarest to most common, both at the US and the state level. For example, the AARD for the less than 400 deaths category (“< 400”) is the average of percent deviation over many types of cancer in sparsely-populated states or rare cancers in any state for years 2004–2007.
Other measures of discrepancy are also obtained to compare the methods. The Maximum Absolute Relative Deviation (MARD) is a measure of the worst prediction error seen for each method over the categories of cancer types or numbers of deaths. T h e Mean Relative Sums of Squares Deviation (MRSSD) is similar to the AARD, only deviations are squared before computing the average, resulting in applying higher weights to larger deviations. The Root Mean Square Error (RMSE) is an estimate of variability of estimates about the true value. The Normalized Root Mean Square Error (NRMSE) is the RMSE expressed as a fraction of the mean. The Average Rank of the Relative Sums of Squares (ARRSS) is the average rank of deviations among the methods. (For the technical details, please see a technical report at http://surveillance.cancer.gov/reports/review/appendix_cancercounts.pdf, section A.5). These measures have similar results to the AARD, therefore in this study we use AARD as the default measure for comparison. Instead of declaring a single “winner” for each evaluation, we consider methods that are within 15% of the winner using the described metrics.
We use SS-F to represent the state-space method using full data, and SS-15 for the state-space method using 15 year data. The state space method has been used to project the cancer mortality counts three years ahead to the current calendar year since 20043,9. The method uses a local quadratic model to obtain projections of the time series of mortality counts four years into the future. The method proceeds as follows. First, to account for the uncertainty in measuring the observed mortality counts, the count at any point of time is assumed to be a realization of a random phenomenon, which has a normal distribution. This is the so-called measurement equation. The choice of normal distribution is made for the purposes of mathematical simplicity, and is a reasonable one when the mortality counts are high (such as when dealing with data at the national level).
Next, we model the average trajectory (i.e., the trajectory obtained by joining the averages of these normal distributions) using a quadratic trend. Since mortality trends can be reasonably expected to change over time due to policy changes and scientific advances, we additionally assume that the quadratic trend is not fixed but is instead locally varying. Such a locally varying quadratic trend is obtained by representing the year-to-year variation of trend in the form of a so-called state-space model10, whereby one relates the parameters (or state) of the model at a particular point to those of the previous point through a set of so-called transition equations. These transitions are assumed to have a stochastic component, and as before, errors in these transitions are assumed to have normal distributions.
The measurement equation, combined with the transition equations, completes the specification of the state-space model. An iterative procedure called the Kalman Filter11 can then be used to estimate the trend and project it into the future. The Kalman filter, however, requires that the variances in the measurement and transition equations be known. For our case, we estimate them using the method of moments, coupled with an additional step, whereby the estimates are also adjusted to minimize the sum of squares of prediction errors for the years where observed data are available. We refer to this step as the “tuning” process, in the sense that the parameters of the model are “tuned” to make the predictions close to the actual values. The whole model fitting and projection is done using the statistical software R12.
As an alternative method of projection of time series counts, we explored the use of dynamic generalized linear models, which were fitted using the Bayesian paradigm13. We will henceforth refer to this overall method as the Bayes method. The method proceeds as follows. Since the observed values are yearly mortality counts, we assumed that for any year, the observed mortality count is a realization of a random variable, taken for our purposes to have the Poisson distribution. The parameters of the Poisson distributions are assumed to be unknown and vary smoothly from year-to-year according to a random process. In particular, since the Poisson parameter has to be positive, we assume that the logarithm of the parameter (henceforth called the state) in any year is a normal perturbation around the state of the previous year, with the amount of variation in this perturbation constant from year-to-year. All these can be used to put together the likelihood function, which contains all the information on the unknown model parameters from the data. Combined with prior information on the unknown parameters in the form of distribution of the initial state (at time t=0) and the variance of the year-to-year transition of the states, the likelihood is used to obtain the posterior distribution of the parameters. Although the full posterior distribution is unavailable in a closed form, the univariate (and sometimes multivariate) conditional distributions are easy to sample from. We used a combination of various Markov chain Monte Carlo methods to generate observations from a Markov chain, that converges to the posterior distribution. Convergence was achieved within almost 100,000 iterations. After convergence, an additional 100,000 iterations of this chain were used in the generation of 4-year-ahead predictions, picking out every 100th iteration to reduce the autocorrelation. The generated predictions are estimates of the posterior of the corresponding predictive distributions. Detailed discussion of the Bayes state space methods with particular emphasis on modeling time series of counts in epidemiology can be found in Schmidt and Pereira5.
We use BSS-F to represent the Bayes state-space method using full data, and BSS-15 for the Bayes state-space method using 15 year data.
Joinpoint regression has been applied to cancer trend analysis to summarize trend changes by using segmented line regression6. Cancer incidence and mortality rates are modeled as a function of time which is composed of piecewise linear segments. The model is fitted by the least squares method at a given number of change-points, called joinpoints, and then the number of joinpoints is estimated. The Joinpoint software, available at http://surveillance.cancer.gov/joinpoint, provides the point and interval estimates of the slope parameters, which are equivalent to the annual percent change (APC) rates, as well as the point and interval estimates of the joinpoints themselves. To select the number of joinpoints, the first version (1.0a) of Joinpoint software applied the permutation test procedure, where the permutation distribution of the test statistic is used to calculate the p-value to see if the data show enough evidence to add more joinpoints. Since Version 1.0a was released in 1998, Joinpoint has been continuously updated to improve its accuracy and efficiency and to include additional features.
One of the updates is the addition of two other model selection procedures: Bayes Information Criterion (BIC) and modified Bayes Information Criterion (MBIC). The BIC was added as a faster alternative to the permutation procedure, which is computationally expensive, and the MBIC was proposed to improve the performance of the BIC that tends to over-fit the model. In this study, we used Joinpoint 3.5.1 and implemented both the permutation procedure and the MBIC. Joinpoint program allows a user to change model fitting specifications which include the maximum number of joinpoints, the minimum number of observations from a joinpoint to either end of the data, and the minimum number of observations between two joinpoints. In our comparison, we used a maximum of 2 joinpoints, a minimum of 4 observations from a joinpoint to either end of the data (including the joinpoint), and a minimum of 4 observations between two joinpoints (inclusive) for 15 years data (called JP–Perm–15 and JP-MBIC-15 for permutation-based and MBIC-based, respectively). For full data, we used a maximum of 5 joinpoints, a minimum of 4 observations from a joinpoint to either end of the data (including the joinpoint), and a minimum of 4 observations between two joinpoints (called JP–Perm–F and JP-MBIC-F for permutation-based and MBIC-based, respectively).
Spectral analysis is a complementary tool for analyzing time series. Fourier transform, wavelet transform and some other spectral analysis methods have been developed to analyze stationary time series data. These techniques are widely used by many scientists in the fields of electrical engineering, physics, among many others. Recently, the Hilbert-Huang Transform (HHT) based on empirical mode decomposition (EMD) was developed for nonlinear and non-stationary processes8. The advantage of HHT-EMD is that it does not require a set of pre-specified functions. Instead, it uses a set of adaptive intrinsic mode functions (IMF) derived from the time series data itself. Since its inception, the HHT-EMD methods have been applied to image processing, biomedical research, health monitoring and atmospheric engineering14.
To project the mortality count, we apply the EMD to decompose mortality data into several IMFs and then apply multivariate time-series technique to these IMFs for projection. Among many multivariate time series models, vector autoregressive analysis (VAR) is a standard instrument for forecasting procedure. VAR is a multivariate version of autoregressive model. While univariate time-series models may be useful for describing short-term correlation, the multivariate time-series models could give a better description of the underlying structure of the time series and give a better forecast. An R software package “vars” is used for obtaining the forecast counts.
We use VAR-F and VAR-15 to represent the vector autoregressive model via Hilbert-Huang transform method using full data and 15 year data respectively.
Time trends in incidence and mortality have also been modeled using the age-period-cohort (APC) model15. The data used in APC model consists of a set of age-specific counts tabulated for several periods of time. The Nordpred model is an age-period-cohort model, and it has been applied to model the mortality and incidence counts7. Interval widths for age and period are typically assumed to be equal. The “Nordpred” model used in this analysis is essentially Model B1 of Møller, et al.7. Model B1 assumes mortality counts follow a Poisson distribution with linear effects of age, period and cohort linked to the expected rate through a 5th root function. The period effect is further modeled as a linear regression on period with the regression coefficient being the drift parameter. In this analysis, predictions involved only 25% of the drift regression parameter for the first future period estimated rates and no drift for the second future period estimated rates.
To fully implement the Nordpred method in this study, the age and period variables information of the death counts are provided from the SEER database and the age and period variables are then grouped into 5-year intervals. We used 18 five-year age categories (0–4, 5–9, …, 85+), and seven or three five-year periods are used depending on the data used is full data or 15 years data. Age and period effects were for five-year groupings. Five year predictions were translated to single year estimates using linear interpolation. We use NP-F and NP-15 to represent the Nordpred model method using full data and 15 year data respectively.
Table 2 shows a summary statistics for comparison of 12 methods (including variations of the 5 basic methods) applied to the national data and the state data. For the national data, averaged over the 347 combinations of cancer sites and sex, JP-MBIC-15 has the smallest AARD (0.065). Any method with AARD within 15% of this value will be considered to be within the best method circle, and thus could be a candidate for being selected as the best method. With minimal AARD of 0.065, we obtain a threshold of 0.0748 for defining “best methods”. In addition to JP-MBIC-F, the methods NP-15 and BSS-15 fall into this category. These methods are shown in bold in the first row of Table 2.
For the state-level data. averaged over 8076 combinations of cancer sites, sex, and states, BSS-F has the smallest AARD (0.159). Adding 15% of 0.159, we obtain a threshold of 0.1825. The methods BSS-F, NP-F, SS-F, NP-15, JP-15, JP-MBIC-15 and BSS-15 all fall below the threshold. The methods that are within the threshold are shown in bold in row 2 of Table 2.
Taking into consideration both the national and state level data, only NP-15, JP-Perm-15, JP-MBIC-15 and BSS-15 are the remaining possible choices because they fall below the respective thresholds at both the national and state level. Notice that although BSS-F has the smallest AARD in the state level, it is not chosen because it is not in the group falling below the 15% threshold in the national data.
Cancer sites with larger mortality are considered more common. The last row in table 3 shows the average absolute relative deviation from all the methods compared for cancer sites with mortality counts greater than 200,000 cases. For these cancer sites with larger count, the national and state level large cancer sites are overlapped with each other. The method SS-F has the smallest value (AARD=0.011), and the methods JP-MBIC-15 (AARD=0.015), JP-Perm-15 (AARD=0.015), and BSS-15 (AARD=0.012) are not too far behind.
To further consider which method to use, we group the cancer sites by the number of deaths observed. This is to decide on whether one method is better depending on the rarity of cancer death. Tables 4 and and55 show the summary of the AARD from difference sizes considered when the methods are applied to national data and state level data respectively. At the national level, we group the cancer sites into 7 categories (<400, 401–1000, 1001–5K,5K–10K, 10K–20K, 20K–200K, and >200K). At the state level, the cancer sites are grouped into 12 categories (<25, 26–50, 51–100, 101–200, 201–300, 301–400, 401–1000, 1001–5000, 5K–10K, 10K–20K, 20K–200K, >200K). In each category, we first find the minimum AARD among the 12 methods, and then find a threshold by adding 15% of the minimum to it. Any method that falls below the threshold is considered a candidate for the best method. The minimum and the threshold for each group are listed in the last two columns of Tables 4 and and55.
Across the groups, the number of times the methods with AARD fall below the respective threshold is calculated and summarized in Table 5. The joinpoint regression modified BIC (JP-MBIC-15) method performs best at the national level and nation-state level combined. For JP-MBIC-15 method at the national level, 4 out of 7 groups are below the threshold, and at the state level, 7 out of 12 groups are below the threshold. Overall, JP-MBIC-15 has 11 out of 19 groups falling below the threshold, and it outperforms all other methods and is our final choice for mortality projection.
Figure 1 shows the trends of AARD for all the methods we compared at national level. Except for the SS-F method, all other methods show an increasing trend over the year-ahead projections. For example, they all have larger AARD for 4-year-ahead projection than 3–year-ahead projection except the SS-F method, as expected. Notice that SS-F is the current method that has been used by ACS for projecting the upcoming year mortality count since 2004. Figure 1 shows that SS-F performed worse than many methods we considered.
An important finding from the comparisons is that using the full time span of the data does not have an advantage over shorter time series in improving the accuracy of prediction. Figure 2 also shows that with the same method, using 15 years of data tends to perform better than 30 years of data. This is not always true at the state level where, for example, BSS-F is better than BSS-15. But in general, Figure 2 shows a similar pattern to Figure 1.
The comparison of variations of five methods of projecting the number of cancer deaths indicates that the JP-MBIC-15 model is better than any of the other methods at the national level for 4-year-ahead projection. At the state level, BSS-15 has a slightly smaller AARD than VAR-15, NP-F, and JP-MBIC-15. We could have used BSS-15 for state level data in projection, but we felt that JP-MBIC-15 performed reasonably well (with 15% of BSS-15). Since we felt that it would be better to use a single method if possible, our decision was to use, JP-MBIC-15, at both the state and national levels in Cancer Facts & Figures 2012 and Cancer Statistics, 2012.
Nordpred has been used in the Nordic countries and U.K. to predict cancer incidence and mortality7,16. It performs relatively well for the state-level data. For the State-space and Bayes state-space methods, their strengths are that they are both flexible and can easily adapt to changing trends. However, the State-Space method is better suited for common cancers, as it makes assumption of normality for the measurement equation. Both the VAR models and Bayesian state-space models are stochastic model-based methods which tend to capture sudden changes in trends and perform better in shorter time series and noisy data. In the companion paper about incidence rate projection, we found that these two methods perform the best among many others in incidence count projection.
The Joinpoint regression model was developed by the National Cancer Institute. The method has been used to evaluate cancer trends in the United States. The model needs to determine the number and locations of the joinpoints by permutation tests. Due to the computational burden of the permutation tests, a modified BIC test is incorporated to finding the joinpoints. The JP-MBIC-15 method is useful to identify changes in the trend, and is sensitive to changes in trends near the end of the series. This makes it suitable for short-term exploration.
The fluctuations in mortality are usually relatively attenuated compared to changes in incidence, thus the best methods recommended for incidence and mortality are different. Even though JP-MBIC-15 is determined to be the best overall method, it could be outperformed by other methods in some instances. Both the frequency of cancer deaths for a specific cancer site/geography combination and the length of the projection (i.e. 1, 2, 3, or 4 year) play a major role in how accurate the projection is. Nevertheless, JP-MBIC-15 outperforms the method previously used by ACS, SS-F, for 4-year-ahead projections. The AARD at the national level for 4-year-ahead projections using JP-MBIC-15 is 0.065, while that for SS-Full is 0.085.
One motivation to reexamine the methodologies for mortality projections was that ACS was forced to move from a three year to a four year projection because of delays in the arrival of the data. Despite the concern that 4 year projections would be significantly worse than 3 year projections, the 3 year AARD for the previously used model (SS-F) was larger than the 4 year AARD for the selected model (JP-MBIC-15). At the national level, the 3-year-ahead projection for SS-F methods has AARD=0.092 and the 4-year-ahead projection for JP-MBIC-15 has AARD = 0.071, while at the state level, the 3-year-ahead projection for SS-F methods has AARD=0.178 and the 4-year-ahead projection for JP-MBIC-15 has AARD = 0.162.
Because the methods used to project these counts ahead in time are applied to each state separately, differences in the size or age structure of the population or underlying incidence rates between states are irrelevant. That is, none of the projection methods borrow strength from or assume a correlation of results with neighboring states; each state projection is independent of the others. Therefore, it is only necessary to assume that there are no sudden changes to the state population characteristics at some point in time.
The NORDPRED method attempts to decompose the pattern of change in the deaths over time into changes due to age, period and cohort effects, then projects the period effects ahead in time, holding age effects constant. The other methods treat each state’s number of deaths as a time series of counts to be projected ahead based on prior trends in changes in the counts, regardless of the reason for those changes. If, for example, a state’s population is not growing in size but is becoming younger over time, we would expect the number of cancer deaths to be decreasing over time, as there are fewer older residents who are more likely to die from cancer. This age effect is but one of many that may affect the number of cancer deaths; the methods, with the exception of NORDPRED, do not attempt to identify each component of change in the counts, but simply attempt to model the change in the number of deaths which have resulted from an aggregate of causes.
K.G.’s work is supported by NIH’s contract HHSN261201000671P. H.J.K’s work is supported by NIH’s contract HHSN261200900681P. L.W.P’s work is supported by NIH’s contract HHSN261201100094P.
There are no financial disclosures from other authors.