Within limits, cancer registries record details of all primary tumours that arise in individuals in their catchment area, and routinely published incidence data include more than one cancer for some individuals (those with a second or multiple primaries). To quantify the burden of cancer in a population, epidemiologists have traditionally presented and compared age-standardised incidence rates, an approach still taken by the International Agency for Research on Cancer (IARC) in its Cancer Incidence in Five Continents series (Parkin *et al*, 2010), but these rates have little intuitive appeal. By contrast, the lay media likes to quote ‘lifetime risk' (e.g., ‘1 in 3 of us will get cancer at some point in our lives'), but the way this is currently calculated may not correspond to what one intuitively understands by the phrase, and the result is generally not comparable between populations or over time.

Here we briefly describe the existing methods that could be used to estimate lifetime risk, and explain why some of them are not advisable. We also highlight the distinction between the probability of getting cancer over a lifetime (the true lifetime risk) and the mean number of cancers per lifetime (which is often what the reported ‘lifetime risk' currently estimates). Finally, we propose a method for estimating the true lifetime risk from routine national statistics that takes competing risks into account and avoids treating second primaries in the same individual as if they were cancers in two different individuals. The effect of this adjustment is to reduce the estimated lifetime risk of cancer: the resulting estimate is close to that obtained when calculations can be performed on data including only first primaries in individuals (which we propose as the ‘gold standard' for calculating lifetime risk).

**Summary of existing techniques measuring cancer in the population, and the risk of developing cancer**

*Crude and age-standardised incidence rates* The simplest method of summarising the occurrence of disease in a population is the crude incidence rate. As the incidence of cancer varies hugely with age, this measure suffers from two main disadvantages: it has no ‘everyday' interpretation, and direct comparison between populations is likely to be misleading because of their different age structures. The effect of age can be controlled by direct age standardisation, but this generally gives a figure that is even less intuitively interpretable. Neither of these methods provides an estimate of the lifetime risk of developing cancer.
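The contrast between the two measures can be sketched in a few lines of Python. This is an illustration only: the three age bands, the case counts, the person-years and the standard-population weights are all made-up values, and the function names are ours.

```python
def crude_rate(cases, person_years, per=100_000):
    """Crude incidence rate: all cases divided by all person-years."""
    return per * sum(cases) / sum(person_years)


def direct_standardised_rate(cases, person_years, standard_weights, per=100_000):
    """Weighted mean of age-specific rates, with weights taken from a
    standard population (direct age standardisation)."""
    total_weight = sum(standard_weights)
    weighted = sum(
        w * c / py for c, py, w in zip(cases, person_years, standard_weights)
    )
    return per * weighted / total_weight


# Hypothetical data for three broad age bands (young, middle-aged, old).
# Both populations have identical age-specific rates; only the age
# structure differs (population B has more person-years at old ages).
cases_a, pyears_a = [10, 100, 400], [100_000, 100_000, 50_000]
cases_b, pyears_b = [10, 100, 800], [100_000, 100_000, 100_000]
weights = [50, 35, 15]  # illustrative standard-population shares

print(crude_rate(cases_a, pyears_a), crude_rate(cases_b, pyears_b))
print(direct_standardised_rate(cases_a, pyears_a, weights),
      direct_standardised_rate(cases_b, pyears_b, weights))
```

With these invented numbers the crude rates differ (about 204 and 303 per 100,000 person-years) purely because of age structure, whereas the directly standardised rates coincide at 160 per 100,000.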

*Cumulative rate* Day (1987) proposed a different method of age-standardising incidence rates, called the cumulative rate, defined as:

$$\text{cumulative rate} = \sum_{i=1}^{A} w_i \rho_i$$

where the summation runs up to age band $A$, $w_i$ is the width of the $i$th age band in years and $\rho_i$ is the age-specific annual incidence rate in the $i$th age band. This method has two advantages: first, as a form of directly standardised incidence rate, it allows immediate comparisons between populations; and second, it can be interpreted intuitively as an approximation to the cumulative risk an individual has of developing cancer up to a defined age, provided there are no competing risks. However, the cumulative rate is far from ideal. An upper age limit needs to be chosen, and the choice can have a substantial impact on the result. For instance, for many cancers the cumulative rate to age 85 is double the cumulative rate to age 75. If the upper age is set too low, say at age 75, then differences in cancer incidence or mortality between long-lived populations may be missed. Conversely, if the upper age is set too high, say at age 85, the intuitive interpretation as a risk of getting cancer is misleading because competing risks have not been taken into account and many individuals will die from an unrelated cause before reaching that age.
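As a minimal sketch, the cumulative rate is simply the band width multiplied by the age-specific rate, summed over age bands. The rates below are invented for illustration (expressed per person-year in 5-year bands) and are not taken from any registry.

```python
def cumulative_rate(widths, rates):
    """Sum of (band width in years) x (annual age-specific incidence rate)."""
    return sum(w * r for w, r in zip(widths, rates))


# Hypothetical annual incidence rates per person in 5-year age bands.
rates_0_to_74 = [0.0002] * 8 + [0.001, 0.002, 0.004, 0.006, 0.009, 0.013, 0.018]
rates_75_to_84 = [0.022, 0.025]

to_75 = cumulative_rate([5] * 15, rates_0_to_74)
to_85 = cumulative_rate([5] * 17, rates_0_to_74 + rates_75_to_84)
print(to_75, to_85)
```

With these assumed rates the cumulative rate is about 0.27 to age 75 and about 0.51 to age 85, which illustrates how strongly the result depends on the chosen upper age limit.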

*Cumulative risk* The cumulative rate can be converted into a true cumulative risk (Day, 1987) using the formula:

$$\text{cumulative risk} = 1 - \exp(-\text{cumulative rate})$$

and, to a very close approximation, if the cumulative rate to a particular age is ‘1 in *x*', then the cumulative risk to the same age will be ‘1 in (*x*+1/2)'. However, the correspondence between cumulative risk and cumulative rate is valid only if an individual can have at most one event, which is clearly not the case for the all-cancer incidence data routinely presented by cancer registries. Although the cumulative risk does not give an estimate of the risk of developing cancer over a lifetime, it has been used as an approximation to this when the upper age limit is chosen to be close to the average life expectancy of the population. However, neither the cumulative rate nor the cumulative risk takes competing risks into account, and hence both tend to overestimate the probability of developing cancer over a lifetime, and indeed up to a particular age.
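The conversion and the ‘1 in (*x*+1/2)' rule of thumb can be checked numerically. The sketch below uses only the exponential relationship between cumulative rate and cumulative risk; the function names are ours, and the values of *x* are arbitrary examples.

```python
import math


def cumulative_risk(cum_rate):
    """Convert a cumulative rate into a cumulative risk (no competing risks)."""
    return 1.0 - math.exp(-cum_rate)


def one_in(probability):
    """Express a probability as the x in '1 in x'."""
    return 1.0 / probability


for x in (4, 10, 50):
    exact = one_in(cumulative_risk(1.0 / x))
    print(f"rate 1 in {x}: risk 1 in {exact:.2f}, approximation 1 in {x + 0.5}")
```

For a cumulative rate of 1 in 4 the exact cumulative risk is about 1 in 4.52, and for 1 in 10 it is about 1 in 10.51, so the approximation improves as the rate becomes smaller.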

*‘Current probability' method* A realistic estimate of the lifetime risk of getting cancer can be obtained by estimating the number of cancers that would arise during the lifetime of a hypothetical birth cohort. This was done by Goldberg *et al* in 1956 to estimate ‘the probability of developing cancer', using a current life table and calculating the number of cases that would occur within each age band (on the basis of the person-years at risk, from the life table, and the current age-specific incidence rate). This approach was termed ‘current probability' by Esteve *et al* (1994). It takes competing risks into account and is not truncated at an arbitrary upper age, thus giving an estimate of lifetime risk. When truncated at the same age as a cumulative risk estimate, the ‘current probability' value obtained is lower because it allows for competing risks (Esteve *et al*, 1994; and below). However, comparisons of such lifetime risks between populations may not reflect differences in cancer incidence, because the life table is constructed from current all-cause mortality rates, which may differ between populations. Sasieni and Adams (1999) proposed using standard sex-specific life tables in an attempt to overcome this issue.
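The life-table calculation can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of Goldberg *et al* or Esteve *et al*: all-cause mortality is assumed constant (and strictly positive) within each age band, the final band is treated as open-ended, and the three age bands and all rates are invented.

```python
import math


def current_probability(widths, incidence, mortality, radix=1.0):
    """Expected number of cancers per member of a hypothetical birth cohort.

    widths    -- band widths in years (math.inf for an open-ended last band)
    incidence -- annual age-specific cancer incidence rates
    mortality -- annual age-specific all-cause mortality rates (assumed > 0)
    """
    alive = radix              # cohort members alive at the start of the band
    expected_cases = 0.0
    for w, rho, m in zip(widths, incidence, mortality):
        surviving = math.exp(-m * w)             # proportion surviving the band
        person_years = alive * (1.0 - surviving) / m
        expected_cases += person_years * rho     # cases arising in this band
        alive *= surviving
    return expected_cases / radix


# Hypothetical rates for ages 0-39, 40-74 and 75+.
incidence = [0.0003, 0.006, 0.02]
mortality = [0.001, 0.01, 0.1]

lifetime = current_probability([40, 35, math.inf], incidence, mortality)
to_75 = current_probability([40, 35], incidence[:2], mortality[:2])
cumulative_risk_to_75 = 1.0 - math.exp(-(40 * 0.0003 + 35 * 0.006))
print(lifetime, to_75, cumulative_risk_to_75)
```

With these assumed rates the value truncated at age 75 (about 0.18) is below the cumulative risk to the same age (about 0.20), because person-years are lost to deaths from all causes. Note that the function counts expected cases, so on data containing multiple primaries it estimates the mean number of tumours per person.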

**Table 1** Estimates of risk of developing any malignant neoplasm excluding non-melanoma skin cancer (NMSC), by calculation method; Scotland, 2001–2005: (a) males; (b) females

**Table 2** Estimates of risk of developing cancer, by site, by sex; Scotland, 2001–2005

When the ‘current probability' method is used on data containing only first primaries for all individuals, it provides an excellent estimate of lifetime risk (referred to here as the ‘gold standard'). However, when it is applied to routine incidence data, two implicit assumptions are made, neither of which is likely to be exactly true. One is that the incidence rates are based on a denominator of individuals who have never had cancer before; the other is that the numerator counts only first cancers. If these assumptions do not hold, one is calculating a cumulative lifetime rate rather than a lifetime risk. The issue of the numerator can be serious. Because routinely published incidence data include multiple primaries, the ‘current probability' method actually estimates the average number of primary tumours per person rather than the probability of getting cancer, and hence tends to overestimate the lifetime risk of getting cancer for all tumours.
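The difference between the mean number of tumours per person and the probability of getting cancer can be made concrete with a toy cohort. The counts below are entirely hypothetical and serve only to show that the two quantities diverge as soon as some individuals have more than one primary.

```python
def tumours_per_person(counts):
    """Mean number of primary tumours per person.

    counts maps k -> number of people with exactly k primaries in a lifetime.
    """
    people = sum(counts.values())
    tumours = sum(k * n for k, n in counts.items())
    return tumours / people


def probability_of_any_cancer(counts):
    """Proportion of people with at least one primary tumour."""
    people = sum(counts.values())
    affected = sum(n for k, n in counts.items() if k >= 1)
    return affected / people


# Hypothetical cohort of 1000 people, by number of lifetime primaries.
cohort = {0: 620, 1: 330, 2: 45, 3: 5}

print(tumours_per_person(cohort))         # mean tumours per person
print(probability_of_any_cancer(cohort))  # true lifetime risk in this cohort
```

In this invented cohort there are 0.435 tumours per person, but only 38% of people ever develop cancer, so counting every primary overstates the lifetime risk.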

*Devcan: the SEER analytical program adjusting the denominator in the current probability method* This package (available at http://surveillance.cancer.gov/devcan/) uses a method that differs from the current probability method only in the way it deals with data in 5-year age bands. In addition, the authors of Devcan are interested in the residual lifetime risk from a certain age, and hence need to calculate the number of people who will be alive and cancer free at that age. This has been addressed by statisticians working for the US National Cancer Institute (Wun *et al*, 1998; Fay *et al*, 2003), building on the earlier work of Goldberg *et al* (1956). Devcan assumes that data with only the first primary tumour per individual are available. When estimating the lifetime risk of getting any cancer (i.e., at any site) using routinely published registry data, Devcan makes no adjustment for cancers at different sites in the same individual. Even with access to a registry database, there is the issue of how far back the registry's records go and how much immigration there is into the registry's area. With many registries, an individual could have had an earlier cancer that is unknown to the registry.

*The proposed new method: the adjusted for multiple primaries method* The issue of multiple primary tumours being recorded in the same person has been recognised previously, notably by the National Cancer Institute (Feuer *et al*, 1993), which used incidence data containing only the first primary breast cancer diagnosis to calculate the lifetime risk of developing breast cancer. As far as we are aware, however, it has not been addressed with respect to the risk of any cancer. Here we present a correction to address the serious issue of multiple primaries within routinely published incidence data: the ‘adjusted for multiple primaries' (AMP) method.

Registries, such as the Scotland Cancer Registry, that are able to present data for only first occurrences offer an opportunity to assess the value of this correction, and the new method is illustrated in comparison with most of the methods described above to allow differences in the results of each approach to be examined. Additional analysis is undertaken using the new method on aggregated data in 5-year age groups, rather than on age in individual years, because this is the way routine incidence data are generally reported by cancer registries.