We report a systematic assessment of specificity in relation to its several possible determinants in a population-based trial. Our results show that a reasonably high specificity (above 90%) can be achieved with the PSA test in prostate cancer screening. Moreover, specificity decreases only slightly at repeat (incidence) screening, and this is entirely attributable to ageing of the study subjects.
Overall, specificity of serum PSA as a screening test for prostate cancer was slightly above 90%. A Canadian screening study with a cutoff of 3 ng ml−1 reported 90% specificity (Labrie et al, 1992) and similar findings were reported from the US (Mettlin et al, 1994). A volunteer-based study in the US reported specificity of 73% (Punglia et al, 2005). A meta-analysis estimated specificity of PSA as 93% at 4.0 ng ml−1 (Mistry and Cable, 2003).
Our study population may represent relatively low-risk men, as the trial is population-based and the subjects are fairly young. Yet, the incidence of prostate cancer in Finland is rather high in international comparison, with age-standardised incidence of 84 per 100 000 in 2002 (Ferlay et al, 2004). Owing to the representative study population, our findings are likely to be more applicable to the general population than those from volunteer-based studies. Furthermore, we used a consistent definition of specificity, with systematic evaluation of various factors affecting specificity within the screening trial.
Specificity was only slightly lower in the second screening round compared with the first. This was due to participants being older at the second round. The main factor is probably the strong increase in prevalence of benign prostatic hyperplasia with age. Introduction of a new biopsy regimen with increased number of cores may have also decreased the number of apparent FP screening findings (if a larger proportion of true-positive findings were detected). In both rounds, the specificity was higher in the young age groups. This finding indicates that specificity is likely to decrease at subsequent screening rounds, as age at screening increases.
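The role of ageing can be illustrated by treating overall specificity as a weighted average of age-specific specificities. The following is a minimal sketch under stated assumptions: the age groups, the age-specific values and the weights are all hypothetical and are not taken from the trial, and the function name `overall_specificity` is introduced here only for illustration.

```python
def overall_specificity(age_specific, weights):
    """Weighted mean of age-specific specificities.

    `weights` holds the share of disease-free men in each age group
    and must sum to 1.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(age_specific[age] * weights[age] for age in age_specific)


# Hypothetical age-specific specificities, held constant across both rounds.
spec_by_age = {"younger": 0.95, "middle": 0.92, "older": 0.88}

# Hypothetical age distributions: the cohort is older at the second round.
round1_weights = {"younger": 0.5, "middle": 0.3, "older": 0.2}
round2_weights = {"younger": 0.2, "middle": 0.4, "older": 0.4}

print(round(overall_specificity(spec_by_age, round1_weights), 3))
print(round(overall_specificity(spec_by_age, round2_weights), 3))
```

In this sketch the overall figure falls between rounds even though no age-specific value changes, which is the pattern expected when a decrease is attributable to ageing alone.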
Digital rectal examination as an ancillary test among men with intermediate PSA levels was associated with a lower rate of FP findings than F/T PSA and hence slightly higher specificity. The yield was also lower than with free PSA (2.1% vs 5.2% of men with PSA 3.0–3.9 ng ml−1). This is consistent with the findings from a Dutch screening trial, where the specificity of DRE was 91% (Schröder et al, 1998). However, the costs of DRE are substantially higher than those of F/T PSA determination in our trial, where a blood sample is drawn initially and can be used for determination of both total and free PSA, whereas DRE requires a separate visit to a urologist.
We estimated the specificity first by assuming that the proportion of false negatives (cancers among SN men surfacing during the screening interval) is negligible and can be ignored. This cross-sectional approach gives a measure that can be called relative specificity. Longitudinal analysis with correction for false-negative results (interval cases) is able to take into account the fact that many men with a negative biopsy do in fact harbour a latent cancer. Yet, adjustment for this did not materially affect the results. However, if all men harbouring a focal carcinoma in their prostates were classified as false negative, the situation would change dramatically, as such lesions have been very common in studies based on autopsy (Breslow et al, 1977; Kabalin et al, 1989) and on cystoprostatectomy specimens or prostate tissue removed in transurethral prostatectomy (Montie et al, 1989; Merrill and Wiggins, 2002). Studies based on natural history models have estimated that up to 45% of screen-detected cases may be due to overdiagnosis, that is, cancers that would not have surfaced clinically during the man's lifetime if unscreened (Etzioni et al, 2002; Draisma et al, 2003). Thus, latent or minimal disease is very frequent, and there are good grounds to argue that presence of malignant histological features alone does not constitute a true gold standard for clinically significant prostate cancer. This issue can also be seen as a problem of FP findings, if overdiagnosed cases (if identifiable) were to be classified as FP findings. Yet, they cannot be reliably identified by current means, even if the above argument were accepted. Both issues, however, emphasise the need for a definition of the diagnosis of prostate cancer. We have used the conventional approach, but taking the above uncertainties into account would have reduced the estimates of specificity.
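The difference between the cross-sectional (relative) and the longitudinal estimate can be made concrete with a small calculation. The sketch below is illustrative only: the function names and all counts are hypothetical and are not taken from the trial data. It assumes that specificity is the number of true negatives divided by the number of men without diagnosed cancer, and that the only correction applied is the removal of interval cancers from the screen-negative group.

```python
def relative_specificity(screen_negative, false_positive):
    """Cross-sectional estimate: every screen-negative man counts as a true negative."""
    return screen_negative / (screen_negative + false_positive)


def corrected_specificity(screen_negative, false_positive, interval_cancers):
    """Longitudinal estimate: interval cancers among screen-negative men are
    reclassified as false negatives and leave the disease-free group."""
    true_negative = screen_negative - interval_cancers
    return true_negative / (true_negative + false_positive)


# Hypothetical counts, chosen only to illustrate the arithmetic.
sn, fp, interval = 18_000, 1_500, 40

print(round(relative_specificity(sn, fp), 4))
print(round(corrected_specificity(sn, fp, interval), 4))
```

When interval cancers are few relative to the number of screen-negative men, as in these assumed counts, the two estimates differ only marginally.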
Not all men with screen-positive result attend diagnostic examinations, and the results may not be available, if medical care is sought outside the screening organisation. In our material, approximately 0.5% of all participants or 5% of screen-positive men did not undergo biopsy within the trial (in the study hospitals). In the screening programme, these men are classified as negatives, that is, no further procedures are undertaken (despite indications being fulfilled). This is problematic when evaluating a screening test. In calculation of specificity, these men were assumed to be true positives and FPs in the same proportion as those biopsied. Owing to the small number of such cases, this did not affect our estimate of test specificity.
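The proportional allocation of screen-positive men without a trial biopsy can be sketched as follows. This is a minimal illustration under assumptions: the function names and every count are hypothetical, and the calculation shows only the general idea of distributing unbiopsied men between true and false positives in the ratio observed among biopsied men, not the exact procedure used in the trial.

```python
def allocate_unbiopsied(tp_biopsied, fp_biopsied, unbiopsied):
    """Split unbiopsied screen-positive men into expected true and false
    positives, using the proportions observed among biopsied men."""
    fp_share = fp_biopsied / (tp_biopsied + fp_biopsied)
    extra_fp = unbiopsied * fp_share
    extra_tp = unbiopsied - extra_fp
    return tp_biopsied + extra_tp, fp_biopsied + extra_fp


def specificity(true_negative, false_positive):
    """True negatives divided by all men without diagnosed cancer."""
    return true_negative / (true_negative + false_positive)


# Hypothetical counts: 100 of 2000 screen-positive men (5%) have no trial biopsy.
tp, fp, missing, tn = 500, 1_400, 100, 18_000

tp_adj, fp_adj = allocate_unbiopsied(tp, fp, missing)
print(round(specificity(tn, fp), 4))      # biopsied men only
print(round(specificity(tn, fp_adj), 4))  # after proportional allocation
```

With these assumed counts the estimate moves by well under one percentage point, because the unbiopsied group is small relative to all screen-positive men.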
No consensus has been established as to the optimal use of PSA and several approaches have been proposed, including age-specific cutoffs and PSA relative to prostate volume (Gretzer and Partin, 2003). Cutoff values even lower than 4 ng ml−1 have been proposed and are being used in some screening projects (Labrie et al, 1992; Krumholtz et al, 2002; Punglia et al, 2005). In the European Randomized Study of Screening for Prostate Cancer (ERSPC), a cutoff level of 3 ng ml−1 instead of 4 ng ml−1 was associated with an increase in the proportion of test-positive findings from 1.6 to 5.1% (de Koning et al, 2002). Generally, both the proportion of screening-positive findings and detection rates have been higher in studies with combined modality screening (e.g., DRE and/or TRUS in addition to PSA). In our study, a limit of 3 ng ml−1 would have resulted in an increase in FP tests by more than a third. As the increase in screen-positive findings would be in the low PSA range, where prostate cancer prevalence is likely to be low and FP results more common than at higher PSA levels, adopting a lower cutoff level is likely to reduce specificity.
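The effect of a relative increase in FP tests on specificity can be sketched with simple arithmetic. The example below is an assumption-laden illustration: the function name and the counts are hypothetical, and the one-third increase is used only as a round figure in line with the "more than a third" reported above. It assumes that the additional FP findings are men who would have been true negatives at the higher cutoff.

```python
def specificity_after_fp_increase(true_negative, false_positive, relative_increase):
    """Specificity after a lower cutoff moves some former true negatives
    into the false-positive group; the disease-free total is unchanged."""
    extra_fp = false_positive * relative_increase
    disease_free = true_negative + false_positive
    return (true_negative - extra_fp) / disease_free


# Hypothetical counts at the higher cutoff.
tn, fp = 18_000, 1_500

print(round(specificity_after_fp_increase(tn, fp, 0.0), 4))    # unchanged cutoff
print(round(specificity_after_fp_increase(tn, fp, 1 / 3), 4))  # FP tests up by a third
```

Under these assumptions, a one-third rise in FP tests lowers specificity by the same number of percentage points as a third of the original FP rate.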
Age-specific cutoff values have been proposed for PSA in order to improve specificity of the test among older men (Oesterling et al, 1993). The rationale is that the prostate volume and prevalence of benign prostatic hyperplasia increase rapidly after 60 years of age. Use of age-specific cutoff levels would have resulted in a similar number of screen-positive findings in the first round, but substantially higher numbers in the second screening round. As no referrals or biopsy decisions were made based on the age-specific cutoff values, we were not able to directly assess the possible effect on specificity. Their use would have resulted in large numbers of screen-positive men in older age groups and lower numbers in younger age groups. Because specificity was inversely correlated with age, it is likely that use of age-specific cutoff values would have resulted in lower specificity.
There are two approaches for avoiding information bias owing to PSA-driven biopsy in assessment of the validity of the PSA test. First, it can be argued that everybody should receive the diagnostic test (prostate biopsy) when evaluating specificity, in order to completely identify those with disease. In some studies, all men have been biopsied, regardless of PSA result, which has resulted in detection of prostate cancer even at low PSA levels (Labrie et al, 1992; Thompson et al, 2004). These studies have also shown specificity for PSA similar to that in other studies (90–94%). Alternatively, the distortion from ‘affirming the consequent’ can be avoided when no test results are followed by diagnostic examination (Walter, 1999). In serum bank studies, PSA has been determined only afterwards and therefore has not affected the diagnosis (Gann et al, 1995; Hakama et al, 2001). Specificity in this context has been estimated as 91–94%. Furthermore, cases in the serum bank studies have been diagnosed mainly before the PSA screening era and are therefore also likely to avoid overdiagnosis.
In comparison with screening for other cancers, our results indicate similar or slightly lower specificity for PSA in prostate cancer screening. In mammography screening for breast cancer, specificity has ranged from 82 to 99%, being commonly slightly above 90% (Elmore et al, 2005). Fairly similar figures (86–100%) have been reported for the cervical smear in cervix cancer screening (Nanda et al, 2000; Cervix cancer screening 2005). In faecal occult blood testing for colorectal cancer, slightly higher specificity (95% or higher) has been found (Allison et al, 1996; Rozen et al, 2000).
We conclude that screening for prostate cancer based on PSA determination has acceptable specificity. It should, however, be further improved if such screening is to be adopted as public health policy. We do not recommend PSA screening before the results in terms of mortality from prostate cancer are known.