Mortality from prostate cancer (PCa) has decreased substantially in the United States, coinciding with the initiation of widespread prostate-specific antigen (PSA)–based screening. From 1994 to 2006, mortality rates declined by an average of 4% per year, the most rapid decline observed for any cancer site.^{1} Mathematical models have estimated that the stage migration induced by screening likely accounts for 45% to 70% of the observed reduction in PCa mortality through 2000.^{2} Notably, a similar decline was observed in Tyrol, Austria, after the introduction of a PSA screening program, compared with the rest of the country, where screening and curative treatment were uncommonly performed.^{3}

The European Randomized Study of Screening for Prostate Cancer (ERSPC) recently reported a 20% reduction in PCa mortality and a 41% reduction in metastatic disease at diagnosis in an intent-to-screen analysis conducted after a median follow-up time of 9 years.^{4} More recently, ERSPC estimated a mortality reduction of 31% after adjustment for noncompliance in the screening arm and contamination in the control arm.^{5} However, serious concerns were raised because the original ERSPC report included estimates indicating that a large number of men would have to be screened and treated to prevent one death from PCa.^{6} The number needed to treat (NNT) is a useful statistic to assess the balance of benefits and harms of an intervention.^{7} The goal of this study is to highlight some of the pitfalls in the calculation and interpretation of the NNT statistic and, in particular, to provide revised estimates of the NNT from the ERSPC trial accounting for the important effects of longer follow-up time.

Whether or not one accepts that PSA screening has a mortality benefit or at least reduces the incidence of metastatic disease, it must be acknowledged that screening programs engender costs at both the individual and societal level.^{8} Central to the debate over PSA screening are concerns regarding the diagnosis and treatment of tumors that may not cause harm.^{9} In the ERSPC trial, Schroder et al^{4} used the difference in cumulative mortality between the screening and control arms and the excess incidence of PCa in the screening arm to estimate an NNT of 48 to prevent one PCa death after a median follow-up time of 9 years. Because not every patient diagnosed with PCa will require treatment, NNT can be described more accurately in this context as the number needed to diagnose.^{10} The number needed to screen (NNS), which is simply the reciprocal of the absolute difference in cumulative mortality, was initially reported by ERSPC as 1,410 at the 9-year follow-up mark. This number can be reinterpreted as the number needed to be offered screening. The NNS was 1,068 when screening arm assignees who never underwent any screening were excluded.
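The arithmetic linking these statistics can be sketched as follows. This is a minimal illustration, not the ERSPC authors' code: the function names are hypothetical, and the two inputs (the absolute mortality difference and the excess incidence) are back-calculated here from the reported NNS of 1,410 and NNT of 48 rather than taken from the trial report.

```python
def number_needed_to_screen(abs_mortality_diff):
    """NNS: reciprocal of the absolute difference in cumulative mortality."""
    return 1.0 / abs_mortality_diff


def number_needed_to_diagnose(abs_mortality_diff, excess_incidence):
    """NNT (number needed to diagnose): excess diagnoses per death prevented."""
    return excess_incidence / abs_mortality_diff


# Values reported at a median follow-up of 9 years.
reported_nns = 1410
reported_nnt = 48

# Back-calculated inputs (assumptions derived from the reported statistics).
abs_mortality_diff = 1.0 / reported_nns        # about 0.71 deaths per 1,000 men
excess_incidence = reported_nnt / reported_nns  # about 34 diagnoses per 1,000 men

print(round(number_needed_to_screen(abs_mortality_diff)))
print(round(number_needed_to_diagnose(abs_mortality_diff, excess_incidence)))
```

Written this way, it is apparent that the NNT is the NNS multiplied by the excess incidence, so both statistics move together whenever the absolute mortality difference changes.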

Previous authors noted that the NNT statistic frequently has been used incorrectly in clinical trial reports in leading journals.^{11,12} NNT is easily understood when referring to proportions of patients assigned to each group at baseline but becomes more complex when dealing with differences in time-to-event data or event rates, which are based on actual person-time of observation. First, when rates, rather than proportions, are used as the basis for estimating NNT in the context of mortality, the NNT represents the amount of person-time (usually person-years), not the number of persons, that must be treated to prevent one death. Although this approach has been advocated as a way of standardizing the observation period and thus dealing with trials that have long and varying follow-up times for patients, the results are less intuitively appealing to clinicians, and their validity depends on the assumption that risk changes at a constant rate over time.^{12–14} Second, in almost all long-term trials such as ERSPC, some participants are removed from observation (ie, censored as a result of death or loss to follow-up) at varying points during follow-up, and the rates of censoring can also vary between treatment groups. Ignoring censoring, particularly differential censoring, can distort estimates of NNT that are based on simple proportions. This risk of distortion can be mitigated by instead calculating NNT based on the survival curves (or equivalent cumulative hazard functions [CHFs]) for each treatment group derived from common statistics such as the Kaplan-Meier or Nelson-Aalen estimators, which account for variations in follow-up time among patients.^{15} Schroder et al^{4} appear to have used the Nelson-Aalen CHF in estimating NNS and NNT. Because recalculation of the NNS and NNT using simple proportions based on the number of patients at baseline and the number of PCa deaths in each group yields nearly identical results, we assume that the pattern of censoring was approximately equivalent in each arm of the trial.
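To make the censoring-aware approach concrete, the sketch below computes a Nelson-Aalen cumulative hazard for each arm and converts the difference in cumulative risk into an NNT at a chosen time. It is an illustration under stated assumptions only: the function names are hypothetical, the follow-up data are invented toy values (not ERSPC data), and cumulative risk is taken as 1 − exp(−CHF).

```python
from math import exp


def nelson_aalen(times, events, t):
    """Nelson-Aalen cumulative hazard at time t.

    times  : follow-up time for each participant
    events : 1 if the participant died at that time, 0 if censored
    Sums d_i / n_i over distinct times <= t, where d_i is the number of
    deaths and n_i the number still at risk just before that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    chf = 0.0
    i = 0
    while i < len(data) and data[i][0] <= t:
        current = data[i][0]
        deaths = 0
        leaving = 0
        while i < len(data) and data[i][0] == current:
            deaths += data[i][1]
            leaving += 1
            i += 1
        chf += deaths / at_risk
        at_risk -= leaving  # deaths and censored both leave the risk set
    return chf


def nnt_from_chf(chf_control, chf_treated):
    """NNT as the reciprocal of the difference in cumulative risk."""
    risk_control = 1.0 - exp(-chf_control)
    risk_treated = 1.0 - exp(-chf_treated)
    return 1.0 / (risk_control - risk_treated)


# Toy data (hypothetical): five participants per arm, times in years.
control_times, control_events = [2, 3, 5, 7, 9], [1, 0, 1, 1, 0]
treated_times, treated_events = [2, 4, 6, 8, 9], [0, 1, 0, 1, 0]

h_control = nelson_aalen(control_times, control_events, 9)
h_treated = nelson_aalen(treated_times, treated_events, 9)
print(nnt_from_chf(h_control, h_treated))
```

Because censored participants are removed from the risk set without contributing a death, each arm's estimate reflects the person-time actually observed, which a simple proportion of deaths among those enrolled at baseline does not.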

In the current analysis, however, we focused on another concern that we believe has a major effect on interpretation of NNT and NNS in the ERSPC, namely the fact that these statistics are time specific and will change as the risks for the treatment groups either converge or diverge over time. By extracting hazard rate estimates from the authors' Nelson-Aalen curves and applying those rates in an appropriate model, we calculated predicted NNS and NNT estimates for different periods of follow-up.
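The time dependence of the NNS can be illustrated with a simple sketch. The model used here, a piecewise-constant hazard for each year of follow-up, is an assumption chosen for illustration and is not necessarily the model applied in this analysis; the hazard values are hypothetical and are not ERSPC estimates.

```python
from math import exp, inf


def cumulative_mortality(yearly_hazards, years):
    """Cumulative mortality after `years`, assuming a constant hazard within each year."""
    cumulative_hazard = sum(yearly_hazards[:years])
    return 1.0 - exp(-cumulative_hazard)


def nns_at(control_hazards, screening_hazards, years):
    """NNS at a given follow-up time; infinite while the arms have not diverged."""
    diff = (cumulative_mortality(control_hazards, years)
            - cumulative_mortality(screening_hazards, years))
    return inf if diff <= 0 else 1.0 / diff


# Hypothetical yearly PCa mortality hazards: the arms are identical for
# 6 years, after which the screening arm's hazard is halved.
control = [0.0004] * 12
screening = [0.0004] * 6 + [0.0002] * 6

for years in (6, 9, 12):
    print(years, nns_at(control, screening, years))
```

In this toy setting the NNS is undefined (infinite) while the curves coincide and then falls as they diverge, which is the behavior that makes any single-time-point NNS or NNS-derived NNT sensitive to the length of follow-up.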