A sophisticated reading of the randomized trial evidence suggests that, although screening for prostate cancer with prostate-specific antigen (PSA) can reduce cancer-specific mortality, it does so at considerable cost in terms of the number of men who need to be screened, biopsied, and treated to prevent one death. The challenge is to design screening programs that maximize benefits (reducing prostate cancer mortality) and minimize costs (overtreatment). Recent research has suggested that this can be achieved by risk-stratifying screening and biopsy; increasing reliance on active surveillance for low-risk cancer; restricting radical prostatectomy to high-volume surgeons; and using appropriately high-dose radiotherapy. In current U.S. practice, however, many men who are screened are unlikely to benefit, most men found to have low-risk cancers are referred for unnecessary curative treatment, and much treatment is given at low-volume centers.
Consideration of long-term trends in cancer mortality (1) indicates that disease-specific mortality rates generally fall only with decreased exposure to carcinogens (e.g., lung cancer, stomach cancer) or effective early detection (e.g., breast cancer, cervical cancer). Given that there is no obvious or widespread carcinogen associated with prostate cancer, it is therefore early detection that holds the greatest hope of reducing disease burden.
In many ways, prostate cancer is ideal for early detection. In general the disease is relatively slow growing, allowing a sufficient lead time for cancer to be identified before it becomes incurable, and there is a simple, noninvasive screening tool, namely testing for total prostate-specific antigen (PSA) levels in blood.
But two related factors complicate choices about screening for prostate cancer: its ubiquity and the toxicities associated with treatment. Autopsy studies show that a very high proportion of men dying from causes other than prostate cancer, and without a prior prostate cancer diagnosis, nonetheless have cancer detectable in the autopsy prostate (2). One contemporary estimate is that the rate of autopsy cancer is close to 40% by age 70 (3). This would be a more manageable problem if therapy for prostate cancer were harmless, but because of the anatomical proximity of the prostate to the rectum, bladder, urethra, and penis, curative treatment is associated with important risks of long-term erectile, urinary, and bowel dysfunction (4).
It is the combination of ubiquity and treatment harm that makes prostate cancer screening such a challenging decision. If prostate cancer were uncommon, then few would be affected by unnecessary treatment; if adverse effects of treatment were minor or uncommon, then unnecessary treatment would not be an important concern. In short, the dilemma for men considering screening is whether they should risk sexual dysfunction, incontinence or proctitis to treat a cancer that may never cause symptoms or shorten survival.
Contemporary prostate cancer screening is based on testing total PSA in blood. Total PSA comprises the sum of the unbound (“free”) PSA and PSA bound to inhibitors (“complexed PSA”) (5), with each form of PSA having different characteristics (6). There is an extensive literature on screening for prostate cancer. Studies include those of case-control design, comparing PSA screening history in men who died of prostate cancer with controls (7); cohort studies, comparing outcomes of patients presenting for treatment with and without PSA screening (8); ecologic studies, comparing prostate mortality in geographic areas with high versus low prevalence of screening (9); simulation studies, simulating the effects of screening based on computer models of prostate cancer development (10); randomized trials (11, 12); and meta-analyses of randomized trials (13, 14).
Much of this literature is unconvincing. Case-control studies of prostate cancer screening are prey to problems in ascertaining the exposure. It is widely accepted that prostate cancer is a slow-growing disease, and that the effects of screening will take a considerable period of time to become apparent. What matters, therefore, is screening behavior 10 or 15 years previously, which can be difficult to ascertain.
Cohort studies comparing outcomes of patients presenting clinically with those presenting after screening have a denominator problem. For example, imagine that, for every 1,000 men in the population, 80 had an indolent cancer detectable only by screening and 20 had an invariably fatal cancer that would eventually lead to clinical detection. In this scenario, screening could not possibly confer a benefit, but a cohort study would report a survival rate of 80% for screen-detected cancers (for every 100 cancers detected, 80 are indolent and 20 are fatal) compared with 0% survival for clinically detected cancers, which are detected only at a late stage.
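The arithmetic of this hypothetical scenario can be sketched in a few lines of Python. The function name and its arguments are illustrative assumptions; the counts (80 indolent and 20 fatal cancers per 1,000 men) are those of the example above. The sketch shows apparent survival differing sharply between the two cohorts while the number of deaths stays the same.

```python
def cohort_summary(indolent, fatal, screened):
    """Summarize the cancers a cohort study would observe.

    Assumes, as in the hypothetical scenario, that indolent cancers are
    found only by screening and that fatal cancers are always fatal.
    """
    detected = (indolent + fatal) if screened else fatal
    deaths = fatal  # screening changes nothing for the fatal cancers
    survival = (detected - deaths) / detected
    return detected, deaths, survival


for label, screened in (("screen-detected", True), ("clinically detected", False)):
    detected, deaths, survival = cohort_summary(80, 20, screened)
    print(f"{label}: {detected} cancers, {deaths} deaths, survival {survival:.0%}")
```

In both cohorts 20 men die; only the denominator changes, which is why the survival comparison says nothing about benefit.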
Ecologic studies are well known to be subject to important confounding. Two geographical areas that differ in rates of PSA screening will also likely differ in terms of treatment and may differ in genetic risk. Moreover, ecologic studies rely on administrative data, and there can be important variations in death certificate practices in different countries.
Simulation studies are based on assumptions about cancer behavior that are often open to interpretation and debate. As one example, computer models require a distribution for lead time; this was generally assumed to be exponential, until an empirical study recently demonstrated that prostate cancer lead time has close to a Gaussian distribution (15).
Randomized trials are widely thought to provide the best evidence as to the benefits and harms of interventions. Four randomized trials of prostate cancer screening have been reported to date, and we believe that two of these should not be considered in detail. The “Quebec” trial involved close to 50,000 men randomly selected to be invited to PSA screening or to no-screening control, but it did not include several key methodologic safeguards. In particular, prostate cancer mortality data were obtained from administrative data rather than being adjudicated by a committee blinded to treatment assignment (16). The “Norrköping” trial is relatively small—fewer than 1,200 men in the screening arm—and used only digital rectal examination (DRE) as the screening tool during the initial two rounds (six years) of screening, with PSA testing added halfway through the trial (17).
For these reasons, we focus on two large, methodologically rigorous trials: the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO trial), which took place at 10 separate sites in the United States (11), and the European Randomized Trial of Screening for Prostate Cancer (ERSPC trial), which included participants from the Netherlands, Sweden, Finland, Belgium, Spain, Italy, Switzerland, and France (12). Notably, the screening study in Göteborg, Sweden was originally an independent trial that later joined ERSPC.
The PLCO trial accrued close to 77,000 men aged 55–74 in 1993–2001. Participants were randomly assigned to six years of annual PSA tests and four years of DRE or control. Those with a PSA ≥4 ng/ml or a positive DRE were advised to seek further diagnostic evaluation, but no specific procedures as to biopsy indication were mandated in the study protocol. Prostate cancer mortality was adjudicated by reviewers who were blinded to treatment assignment.
The interim report of ERSPC included >180,000 men randomized in 1991–2003, of whom 162,000 were in the predefined “core” age group of 55–69 years of age. The screening approach varied slightly across the different centers but involved PSA tests every 2–4 years with DRE in some centers. An elevated PSA—3 ng/ml or higher in most centers—was considered an indication for biopsy. Cause of death was evaluated blind to treatment allocation, following a predefined algorithm.
The ERSPC and PLCO trials reported quite different results. Indeed, both reported early owing to diametrically opposite recommendations of their Data Safety and Monitoring Boards: ERSPC reported after the second interim data analysis that differences between groups crossed a predefined stopping boundary indicating effectiveness of screening; PLCO reported “on the basis of data showing a continuing lack of a significant difference in the death rate.” Specifically, PLCO reported a rate ratio of 1.13 (95% CI, 0.75–1.70) at seven years, whereas ERSPC reported a rate ratio of 0.80 (95% CI, 0.65–0.98; p = 0.04) at nine years follow-up. As required by the original approved study protocol, the randomized trial in Göteborg reported results separately from the main ERSPC report at 14 years follow-up (18). The rate ratio for prostate cancer death was 0.56 (95% CI, 0.39–0.82; p = 0.002).
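As a rough consistency check on these figures, an approximate two-sided p-value can be recovered from a rate ratio and its 95% confidence interval. The sketch below is not the trials' own analysis: it assumes the log rate ratio is approximately normally distributed, and the function name is an illustrative assumption.

```python
import math


def approx_two_sided_p(rate_ratio, ci_low, ci_high):
    """Approximate two-sided p-value from a ratio and its 95% CI.

    Assumes the log rate ratio is normally distributed, so the CI spans
    2 * 1.96 standard errors on the log scale.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    z = math.log(rate_ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))


# Rate ratios and 95% CIs as quoted in the text above.
reports = {
    "PLCO, 7 years": (1.13, 0.75, 1.70),
    "ERSPC, 9 years": (0.80, 0.65, 0.98),
    "Göteborg, 14 years": (0.56, 0.39, 0.82),
}
for name, (rr, low, high) in reports.items():
    print(f"{name}: approximate p = {approx_two_sided_p(rr, low, high):.3f}")
```

Because this is a normal approximation applied to rounded inputs, it need not reproduce the published p-values exactly, and published values may reflect adjustments that this sketch ignores.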
The three major reports of the effectiveness of prostate cancer screening report no effect, a modest effect, and a large effect. Careful analysis of methodologic differences between the three reports can help explain these apparently contradictory results.
The PLCO trial took place in the United States, where PSA testing has become common. Typical estimates are that 75% of age-appropriate U.S. men have ever had a PSA test, with 50% undergoing regular screening (19). This was reflected in PLCO, where ~45% of participants had had a PSA test in the prior three years. In contrast, the rate of PSA testing was far lower among ERSPC participants in the European countries. In the Swedish arm, for example, the proportion of men subject to PSA testing before randomization was likely <5% (20). The effect of baseline screening is to remove from the risk set those cases most likely to die from prostate cancer. This can easily be seen by comparing event rates between the two trials: the risk of prostate cancer mortality is approximately twice as high in ERSPC controls as in PLCO controls. The lower event rate reduces the statistical power of PLCO.
Reflecting community practice in the United States, many PLCO controls continued to undergo PSA testing despite being randomized to a no-screening group. In the very first year of the trial, the rate of PSA testing in the control group was 40%, and this increased to >50% by the end of the PLCO trial. Such contamination in ERSPC has been estimated to be 15% at most (21). The effect of contamination is clearly seen in prostate cancer incidence rates. The rates of prostate cancer in the screening arms are broadly comparable between PLCO and ERSPC—perhaps slightly lower in PLCO—but considerably different in the control arms. This is best illustrated by the relative risk of cancer in the two trials: in ERSPC, the incidence of prostate cancer was 70% higher in the screening arm than in controls; in PLCO, risk increased by only 20%.
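How contamination shrinks the observed difference in incidence can be illustrated with a deliberately simple toy model. It is an assumption made for illustration, not an analysis from either trial: full screening is taken to multiply detected incidence by a fixed factor, and a given fraction of controls is taken to be screened as intensively as the screening arm. The function name and the input values are hypothetical.

```python
def observed_incidence_ratio(true_ratio, contamination):
    """Incidence ratio (screening arm vs. control arm) under a toy model.

    Assumes full screening multiplies detected incidence by `true_ratio`
    and that a fraction `contamination` of controls is fully screened.
    """
    # Control-arm incidence relative to a completely unscreened population.
    control_incidence = (1 - contamination) + contamination * true_ratio
    return true_ratio / control_incidence


# Hypothetical inputs chosen for illustration only.
for c in (0.0, 0.15, 0.5):
    print(f"contamination {c:.0%}: ratio {observed_incidence_ratio(1.7, c):.2f}")
```

Under these assumptions the observed ratio falls toward 1 as contamination rises, which is the qualitative pattern described above. The model ignores incomplete compliance in the screening arm and differences in screening intensity.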
The PLCO trial reported risk ratios at 7 and 10 years; the survival curves for ERSPC are reported out to 15 years; for the Swedish arm in Göteborg, median follow-up is ~5 years longer than for ERSPC as a whole. It is arguable that this alone could explain the divergent results of the three publications. Not only is it entirely predictable that the effects of screening increase over time—men dying in the first few years after randomization would typically have aggressive or advanced cancers that would be unaffected by early detection—but comparison of the survival curves clearly shows that in ERSPC, there was little difference between the screening group and the control group at 7 and 10 years. It is only on longer-term follow-up that differences in favor of screening started to emerge.
There have been numerous criticisms of PLCO and ERSPC. Critiques of PLCO have largely focused on the issues of baseline screening and contamination described above. Criticisms of ERSPC have included the charge that the study is a mere meta-analysis, on the grounds that the screening protocol varied in each country (22). Given that meta-analysis is a well-established part of evidence-based medicine, this is a rather misplaced criticism. Moreover, the ERSPC authors conducted a careful analysis to examine whether results differed by study site, and they did not find evidence of important heterogeneity. A second criticism is that treatment differed by screening arm, with curative therapy more common in screened men, suggesting that the apparent effect of screening on mortality resulted from biases in treatment recommendations. However, the reported differences in treatment used as a basis for this argument were unadjusted for stage. Curative therapy is less likely to be recommended to a patient with advanced compared to early-stage cancer, irrespective of screening, and a higher proportion of patients in the screening arm presented with early-stage disease. There are few important between-group differences in treatment recommendations after adjusting for stage at presentation (23).
It has also been alleged that outcomes ascertainment may have been biased in ERSPC because it necessarily involved knowledge of treatment assignment. The paper cited in support of this argument found that patients with aggressive early treatment of prostate cancer were more likely to have “other cancer” as a cause of death (24). Yet this paper was based on vital statistics data, not on independent ascertainment by an expert committee, and it did not directly report underestimation of prostate cancer death. Moreover, there is a countervailing influence on effect estimates based on more direct evidence. Analysis of noncancer deaths in ERSPC data suggests excess mortality in the control arm; this is plausibly explained by deaths related to prostate cancer or its treatment that were not adjudicated as prostate cancer deaths (25).
Our own view is that these criticisms are either invalid or misplaced. In particular, issues of baseline screening and contamination in PLCO do not invalidate the trial, or suggest that it is flawed—only that it needs to be interpreted differently. Both trials address whether recommendations for additional screening would be of benefit; ERSPC found that increasing screening from a low baseline reduced cancer mortality, whereas PLCO suggested that modestly increased or reduced PSA testing in the United States is unlikely to change either behavior or cancer mortality.
It is interesting to consider what underpins the critiques of ERSPC and PLCO. One guess is that the critic wants to know how screening changes a man’s risks of mortality and overdiagnosis, and believes that one or the other of these trials does not provide a good estimate. We sympathize with this position. Indeed, we have argued that the main results reported by ERSPC and PLCO are not accurate estimates of screening benefits and harms for a contemporary man (26). The reason is very simple: the trials were designed in the early 1990s, with most events occurring in patients diagnosed in the middle of that decade, and prostate cancer research and practice have not stood still since then. In particular, several recent findings enhance our understanding of how we should screen for prostate cancer and therefore influence our estimates of the effects of PSA screening.
It is now understood that PSA fluctuates over time (27). Although these fluctuations are relatively modest in most men (28), it is inevitable that some men with PSAs above a certain threshold will have a PSA below that threshold on retesting. It is now routine practice to wait a few months and repeat the test for a man with an elevated PSA, recommending biopsy only if PSA remains high. This not only reduces the rate of biopsy but likely decreases overdiagnosis. Take the case of a 60-year-old man with a PSA usually close to 1 ng/ml who experiences a transient increase in PSA due to an infection (29). If this man is referred for immediate biopsy, it would not be unusual to find a cancer (30), but the cancer is highly unlikely to be lethal (31). Were the clinician to wait, see the PSA fall back to 1 ng/ml, and advise against biopsy, overdiagnosis would be reduced.
Recent research has convincingly demonstrated that surgeons with higher annual radical prostatectomy caseloads have both lower complication rates (32) and higher rates of surgical cure, with recurrence rates ~40% lower among highly experienced surgeons compared to those less experienced (33). This is likely to lead to differences in mortality.
Randomized trials have indicated a strong association between radiotherapy dose and cure rates. For example, increasing the dose to the prostate from 70 Gy to 79 Gy, which is only possible through the use of conformal therapy, decreases five-year recurrence rates from 21% to 9% (34).
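To put the quoted recurrence rates in absolute terms, the short sketch below computes the absolute and relative risk reduction and the corresponding number needed to treat. These derived figures are simple arithmetic on the 21% and 9% rates above and are not reported in the cited trial; the function name is an illustrative assumption.

```python
def risk_reduction(baseline_risk, treated_risk):
    """Return (absolute reduction, relative reduction, number needed to treat)."""
    absolute = baseline_risk - treated_risk
    relative = absolute / baseline_risk
    number_needed = 1 / absolute
    return absolute, relative, number_needed


# Five-year recurrence: 21% at 70 Gy versus 9% at 79 Gy, as quoted above.
absolute, relative, number_needed = risk_reduction(0.21, 0.09)
print(f"absolute reduction: {absolute:.0%}")
print(f"relative reduction: {relative:.0%}")
print(f"patients treated at the higher dose per recurrence avoided: {number_needed:.1f}")
```

On these figures, roughly eight patients would need to receive the higher dose to avoid one recurrence at five years.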
There is increasing recognition that many screen-detected cancers do not require immediate curative therapy. The concept of active surveillance is to follow low-risk patients with PSA and repeat biopsy, referring them to definitive treatment if there are signs of progression. Several reports suggest very low rates of prostate cancer mortality in active surveillance cohorts, with approximately two-thirds avoiding surgery or radiotherapy (35).
PSA is often thought of in diagnostic terms: men with a high PSA are more likely to have prostate cancer. But PSA is also very strongly prognostic of the long-term risk of advanced prostate cancer. For example, in an analysis of blood samples collected in 1981–1982 as part of the Malmö Preventive Medicine cohort, men with a PSA lower than the median (~1 ng/ml) at age 60 had only a 0.2% risk of prostate cancer death by age 85 (31). In men aged 44–50, 81% of advanced cancers diagnosed up to 30 years later occurred in men above the median, corresponding to a PSA of 0.65 ng/ml or more (36).
It has become increasingly apparent that there is a long lead time between screen-detected cancer and clinical diagnosis. One estimate based on analysis of a cohort followed for many years without screening is a mean lead time of 12–13 years (15). This suggests that shifting individually tailored screening earlier—both starting and stopping earlier—might beneficially affect both overdiagnosis and mortality.
These considerations suggest that estimates from the ERSPC trial, such as a risk ratio of 0.80, cannot be simplistically applied to contemporary patients. Participants in ERSPC were (in most centers) biopsied for any PSA elevation, even if PSA would have fallen below the biopsy threshold on follow-up. Surgical treatment was not generally at high-volume centers; indeed, radical prostatectomy volume was generally low in Europe in the 1990s. Similarly, radiotherapy doses were often <70 Gy, far lower than the 81-Gy dose common in contemporary U.S. practice. Most ERSPC participants were relatively old at first screen. In the Dutch arm, for example, the median age of men biopsied in the first round was 66 years, with 25% aged >70 years. Close to 20% of these men had advanced cancer, clinical stage T3 or T4. It cannot be reasonably doubted that a proportion of these men would have had tumors detected at a curable stage had they received PSA tests starting at age 45 or 50, as recommended in several guidelines. Moreover, men with consistently low PSAs continued to be screened, raising the risk of overdiagnosis.
These are not flaws in ERSPC or PLCO. These observations merely reflect how knowledge changes over the course of a long-term prospective randomized trial. Indeed, data from the trials themselves have been used to develop or confirm some of these ideas. For instance, ERSPC researchers analyzed data from their trial to confirm that PSA can be used to stratify risk. Mortality rates were very low for participants with PSA below 1 ng/ml at first screen (0.04 deaths per 1,000 life years) and approximately ninefold higher for those with PSA of 2–2.9 ng/ml (37). These data were reanalyzed to show that excluding men with PSA lower than 2 ng/ml from subsequent screening would markedly reduce the number of men needing to be screened, biopsied, and treated to prevent one prostate cancer death (38).
Prostate cancer screening is considered by many to be “controversial,” suggesting that much is left to be decided. But well-controlled research, with >250,000 men on randomized trials alone, has settled a number of important issues.
Applying some of these findings to current U.S. practice suggests that it is far from optimal. Many of those screened have little to gain from PSA testing: 25% of U.S. men aged 85 or older receive a yearly PSA test; of those aged 70 or older with a 50% or greater risk of death within five years, close to a third were screened for prostate cancer (19). Compounding this problem, such that overdiagnosis becomes overtreatment, active surveillance is a grossly underused modality, with ~90% of low-risk patients receiving curative therapy (47). Moreover, most patients are treated by low-volume surgeons (80% of U.S. surgeons who undertake radical prostatectomy perform 10 or fewer of these procedures per year), increasing complication rates and lowering the chance of cure (48). In other words, the way that PSA screening is currently implemented in the U.S. seems virtually tailor-made to minimize benefits and maximize harms.
To conclude, the question is not whether prostate cancer screening constitutes early detection or overdetection. The critical point concerns how we should screen in order to maximize benefits (reduced prostate cancer mortality) and minimize harms (avoidable treatment side effects).
Work on this review was supported in part by funds from David H. Koch, provided through the Prostate Cancer Foundation; the Sidney Kimmel Center for Prostate and Urologic Cancers; R33 CA 127768-02 grant to H. Lilja and P50-CA92629 SPORE grant to H.I. Scher from the National Cancer Institute; funding (grant no. 3455) to H. Lilja from the Swedish Cancer Society; and FiDiPro grant support to H. Lilja from TEKES.
Dr. Hans Lilja holds patents for free PSA and hK2 assays and is named as coinventor on a patent application for intact/nicked PSA assays. Dr. Monique Roobol is a board member of the ERSPC Foundation.