We previously developed and validated the Cancer of the Prostate Risk Assessment (CAPRA) score to predict prostate cancer recurrence based on pre-treatment clinical data. We aimed to develop a similar post-surgical score with improved accuracy via incorporation of pathologic data.
We analyzed 3837 prostatectomy patients in the CaPSURE national disease registry. Cox regression was used to determine the predictive power of preoperative prostate specific antigen (PSA), pathologic Gleason score (pGS), surgical margins (SM), extracapsular extension (ECE), seminal vesicle invasion (SVI), and lymph node invasion (LNI). Points were assigned based on the relative weights of these variables in predicting recurrence. The new post-surgical score (CAPRA-S) was tested and compared to a commonly-cited nomogram with proportional hazards analysis, the concordance (c) index, calibration plots, and decision-curve analysis.
16.8% of the men recurred; actuarial progression-free probability at 5 years was 78.0%. The CAPRA-S was determined by adding up to three points for PSA, up to three for pGS, one point each for ECE and LNI, and two points each for SM and SVI. The hazard ratio for each point increase in CAPRA-S score was 1.54 (95% CI 1.49–1.59), indicating a 2.4-fold increase in risk for each two point increase in score. The CAPRA-S c-index was 0.77, substantially higher than 0.66 for the pre-treatment CAPRA score and comparable to 0.76 for the nomogram. The CAPRA-S score performed better in both calibration and decision curve analyses.
The CAPRA-S offers good discriminatory accuracy, calibration, and ease of calculation for clinical and research settings.
An estimated 217,730 men will be diagnosed with prostate cancer in the United States in 2010, a figure accounting for 28% of all male cancer diagnoses. 32,050 deaths are anticipated, representing the second highest mortality burden of all cancers among men, but a comparatively small figure relative to the number of diagnoses.1 Risk assessment of prostate cancer is therefore essential, to identify both those men at high risk of cancer mortality who require aggressive, often multimodal, therapy, and those who are at relatively low risk and might be spared the potential impact of therapy on quality of life.
We have previously developed the UCSF Cancer of the Prostate Risk Assessment (CAPRA), a pre-treatment score based on patient age, prostate specific antigen (PSA), biopsy Gleason score, clinical T stage, and percent of biopsy cores positive.2 The CAPRA score predicts risk of cancer recurrence with accuracy at least as good as other pre-treatment risk prediction instruments,3, 4 yet can be calculated easily, without need for paper tables or computer software. The CAPRA score has been externally validated in US3, 5 and European4, 6 multi-institutional studies (with accuracy ranging from 0.66 to 0.81 and higher accuracy generally seen in academic compared to community-based cohorts). More recently, the score was demonstrated to predict recurrence with rapid PSA doubling time7 and was the first shown to predict metastasis, cancer-specific mortality, and all-cause mortality from time of diagnosis across multiple treatment modalities.8 Moreover, it outperforms competing nomograms in terms of calibration and decision curve analysis.4, 9 A similar instrument intended for patients receiving primary androgen deprivation therapy has been published recently.10
Three of the variables defining the CAPRA score, like other pre-treatment instruments11 —biopsy Gleason score, clinical T stage, and percent of biopsy cores positive—are by nature approximations and may therefore under- or over-estimate true grade and extent of cancer. An advantage therefore offered by radical prostatectomy is that additional prognostic information may be gleaned from the pathologic Gleason score (pGS), surgical margin (SM) status, and presence or absence of extracapsular extension (ECE), seminal vesicle invasion (SVI), and/or lymph node involvement (LNI). These additional data have proved helpful in previously reported risk instruments.11, 12 We aimed to develop a postoperative analog to the CAPRA score which would incorporate these variables and improve the accuracy of the prediction, without sacrificing the overall simplicity of the scoring system.
The Cancer of the Prostate Strategic Urologic Research Endeavor (CaPSURE™) is a national disease registry accruing men with biopsy-proven prostate adenocarcinoma, recruited from 40 urology practices, primarily community-based, across the United States. Informed consent is obtained from each patient under institutional review board supervision. Patients are treated according to their physicians’ usual practices, and are followed until time of death or withdrawal from the study. Additional details have been reported previously.13, 14 Eligibility for inclusion in the study was limited to men with prostate cancer diagnosed since 1992 who underwent prostatectomy as primary treatment and had at least six months of followup recorded in the registry. Those with clinically advanced disease (>cT3aN0M0) pre-operatively were ineligible, as were those who had received neoadjuvant or adjuvant hormonal and/or radiation therapy.
Detailed reporting of staging variables (ECE, SVI, SM) is variable among pathology reports accessioned to CaPSURE. In the main analysis, ECE, SVI, or SM reported as “unable to assess” were assumed to be negative; in a sensitivity analysis, cases without complete data for all variables were dropped. To examine whether cases with missing pathologic data (ECE, SVI, SM) differed from cases with complete data, we compared these groups with respect to their distributions of the original preoperative CAPRA score using a Wilcoxon rank-sum statistic. In all cases, patients with no lymphadenectomy performed were assumed to have negative LNI. Patients missing pathologic Gleason score and/or preoperative PSA were excluded.
The definition of biochemical recurrence was either 2 consecutive PSA values over 0.2 ng/ml15 or any secondary treatment at least six months following surgery (treatment within six months was assumed to be adjuvant). Men not experiencing recurrence—including those dying of other causes—were censored at date of the last available PSA.
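The recurrence rule above can be expressed as a short function. This is a minimal sketch, not the registry's actual code: the function name, the input layout, and the choice to date a PSA recurrence at the second (confirming) value are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

PSA_THRESHOLD = 0.2    # ng/ml, threshold stated in the text
ADJUVANT_WINDOW = 6.0  # months; earlier treatment is treated as adjuvant


def recurrence_month(
    psa_series: Sequence[Tuple[float, float]],
    secondary_treatment_month: Optional[float] = None,
) -> Optional[float]:
    """Return months from surgery to recurrence, or None if none is observed.

    psa_series holds (months_since_surgery, psa_ng_ml) pairs.
    """
    candidates = []
    ordered = sorted(psa_series)
    # PSA criterion: two consecutive values over the threshold.
    for (_, psa_prev), (t_next, psa_next) in zip(ordered, ordered[1:]):
        if psa_prev > PSA_THRESHOLD and psa_next > PSA_THRESHOLD:
            candidates.append(t_next)  # assumption: dated at the confirming value
            break
    # Treatment criterion: any secondary treatment at least six months after surgery.
    if (secondary_treatment_month is not None
            and secondary_treatment_month >= ADJUVANT_WINDOW):
        candidates.append(secondary_treatment_month)
    return min(candidates) if candidates else None
```

A patient for whom the function returns None would be censored at the date of the last available PSA.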
Our goal was to develop an instrument which would reflect the additional accuracy yielded by the pathologic data while preserving the straightforward and intuitive 0–10 scoring structure of the CAPRA score, with each predictor weighted by easy-to-recall integers, such that the possible scores would cover a broad range of recurrence risk. Candidate predictor variables were fit into a multivariable Cox proportional hazards regression model predicting likelihood of recurrence. Preoperative PSA and pGS were coded via 3 binary indicators contrasting low, medium, and high levels with very low risk levels, with cut-points similar to those used to define the CAPRA score (see Table 1). SM, ECE, SVI, and LNI were all dichotomous. All candidate variables were statistically significant independent predictors, so all were retained in the model.
The log hazard ratio parameter (β) estimates generated by the model were used to determine each indicator’s points to be assigned toward the new CAPRA post-Surgical (CAPRA-S) score, with points assigned per increment of β such that a 0–10 score would be obtained. Taking a similar approach as with the original CAPRA score,5 we achieved this goal by dividing each β by 0.45 and rounding to the nearest integer. Using the same thresholds as the original CAPRA score, the CAPRA-S score was also categorized in three groups at low (CAPRA-S 0–2), intermediate (CAPRA-S 3–5), and high (CAPRA-S ≥6) risk of recurrence.5 We illustrated the relationships with progression-free probability of the continuous and grouped CAPRA-S score using Kaplan-Meier plots.
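The conversion from β to points can be sketched as follows. The divisor 0.45 comes from the text; the function name is hypothetical, and the β values in the usage lines are made-up illustrations, not the Table 1 estimates.

```python
import math

BETA_PER_POINT = 0.45  # divisor stated in the text


def points_from_beta(beta: float) -> int:
    """Convert a log hazard ratio to integer points, rounding to nearest."""
    # floor(x + 0.5) rounds halves up; Python's round() would round halves
    # to the nearest even integer instead.
    return math.floor(beta / BETA_PER_POINT + 0.5)


# Illustrative (hypothetical) log hazard ratios and the points they would earn.
for beta in (0.20, 0.40, 0.95, 1.30):
    print(f"beta={beta:.2f} -> {points_from_beta(beta)} point(s)")
```

Under this rule a predictor earns one point for roughly every 0.45 units of log hazard, so each point corresponds to a hazard ratio of about exp(0.45) ≈ 1.57.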
The new CAPRA-S instrument’s predictive accuracy was first assessed via Cox analysis. The assumption of proportionality was examined via plots of the complementary log-log hazard and Schoenfeld residuals versus time, both of which demonstrated essentially parallel lines; a LOWESS smoothed mean drawn through the latter plot was horizontal. Confidence intervals (CIs) for the model were calculated with bootstrap correction. As a sensitivity analysis, the model was rerun with SM, ECE, and SVI data points labeled “unable to assess” considered missing rather than negative. As a measure of the variability of the score across its range, hazard ratios were estimated between each adjacent pair of CAPRA-S scores (1 vs 0, 2 vs 1, etc.).
Given the possibility of overfitting in evaluating the new score’s performance, a ten-fold cross-validated dataset was created to determine Kaplan-Meier estimates and 95% CI of the observed estimate of 5-year progression-free probability. To evaluate model discrimination, Harrell’s c-index was calculated for the CAPRA-S score as a continuous variable, as was a bootstrap-corrected estimate of the c-index in a sample of 100 datasets drawn with replacement from the original set. The latter is a nearly unbiased estimate of the external predictive discrimination.16 As a comparison, the c-index was also calculated for the postoperative nomogram published by Stephenson et al.12
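For readers unfamiliar with the c-index, the following is a minimal sketch of how a concordance index of this kind can be computed for a risk score with censored follow-up. It is a simplified illustration, not the software used in the study: pairs with tied follow-up times are skipped, tied scores count as half-concordant, and no bootstrap correction is applied. The function and argument names are assumptions.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[bool],
    scores: Sequence[float],
) -> float:
    """Fraction of usable patient pairs in which the higher score recurs first."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is an observed event
        for j in range(n):
            if times[j] > times[i]:  # patient j was still recurrence-free at times[i]
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of patients by time to recurrence.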
Model calibration at 5 years’ follow-up was examined graphically by plotting Kaplan-Meier estimates in the cross-validated dataset versus the model-predicted estimate for each level of the CAPRA-S score, with CIs based on the standard error of the log cumulative hazard. Calibration was also assessed via the Hosmer-Lemeshow chi-squared statistic on 5 degrees of freedom, after combining levels ≥7 of the CAPRA-S score so that at least 20 patients remained at risk in each level. This statistic is calculated by summing over CAPRA-S levels the squared differences between the observed and predicted progression-free probabilities, divided by the predicted estimates. Predicted and observed 5-year progression-free probability estimates from the postoperative nomogram were also plotted. Finally, decision curve analysis was used to compare the nomogram to the CAPRA-S score, based on progression-free survival analysis at 5 years using the cross-validated dataset.17
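The calibration statistic as described in words above can be written directly. This sketch follows only the verbal description (squared observed-minus-predicted difference divided by the predicted value, summed over score levels); the function name and the example probabilities are hypothetical, and the conversion to a p-value on 5 degrees of freedom is not shown.

```python
from typing import Sequence


def calibration_statistic(
    observed: Sequence[float],
    predicted: Sequence[float],
) -> float:
    """Sum over score levels of (observed - predicted)^2 / predicted."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((o - p) ** 2 / p for o, p in zip(observed, predicted))


# Hypothetical 5-year progression-free probabilities for three score levels.
observed = [0.95, 0.80, 0.60]
predicted = [0.94, 0.82, 0.55]
print(calibration_statistic(observed, predicted))
```

Small values of the statistic indicate that observed and predicted probabilities agree closely across levels.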
In addition to prediction of progression, the ability of the continuous CAPRA-S score to predict prostate cancer-specific mortality was assessed via competing risks regression.18 Cause of death in CaPSURE is ascertained from review of death certificates and annual query to the National Death Index.8
A total of 5507 men participating in CaPSURE and diagnosed since 1992 underwent radical prostatectomy. Thirty men were excluded for clinically advanced disease (T3b or N1) preoperatively, 554 for receiving neoadjuvant or adjuvant therapy, 345 for missing pGS or preoperative PSA, and 741 for less than 6 months of followup after surgery; 3837 men thus constituted the main analytic dataset. Those 686 (17.8%) men with ECE, SVI, and/or SM reported as “unable to assess” were not statistically different from the 3151 (82.2%) not missing these data in terms of preoperative CAPRA score (p=0.52).
Overall, 644 (16.8%) men undergoing prostatectomy recurred by PSA (N=478) or second treatment (N=166) criteria. Among men recurring, failures occurred at a mean ± standard deviation (SD) of 29 ± 24 and median of 23 months after surgery, and patients not failing were censored at a mean ± SD of 44 ± 29 and median of 37 months. A total of 1843 and 795 men, respectively, were at risk at 3 and 5 years; the 3- and 5-year actuarial progression-free probability rates for the whole cohort were 85.1% (95% CI 83.8–86.3) and 78.0% (76.2–80.0).
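Actuarial progression-free probabilities such as these are Kaplan-Meier (product-limit) estimates. The sketch below shows the basic calculation on a toy dataset; it is an illustration under simplifying assumptions (no confidence intervals, function and argument names invented here), not the analysis code used for the registry.

```python
from typing import Sequence


def kaplan_meier(
    times: Sequence[float],
    events: Sequence[bool],
    horizon: float,
) -> float:
    """Product-limit estimate of the progression-free probability at `horizon`."""
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    survival = 1.0
    # Distinct times at which at least one recurrence was observed.
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - failed / at_risk
    return survival


# Toy data: months of follow-up and whether recurrence was observed.
months = [10, 20, 30, 40]
recurred = [True, False, True, False]
print(kaplan_meier(months, recurred, horizon=36))
```

Censored patients contribute to the at-risk count until their last follow-up and then drop out without lowering the estimate.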
Table 1 summarizes the results of the model used to build the CAPRA-S score. All variables included in the model were statistically significant predictors of biochemical recurrence according to likelihood ratio statistics. Based on the log-hazard ratio parameter estimates from the model, up to three points may be assigned based on preoperative PSA, up to three points for pGS, one point each for ECE and LNI, and two points each for SM+ and SVI (figure 1). Although the maximum theoretical score is therefore 12, there were only seven men with CAPRA-S scores of 10, four with scores of 11, and none with scores of 12. For this reason scores over 9 were combined into a “CAPRA-S ≥9” category. The CAPRA-S score distribution is given in table 2.
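The point scheme just described can be summed mechanically. In the sketch below, the PSA and pGS points (0 to 3 each) are taken as inputs because the cut-points that define them are given in Table 1, which is not reproduced here; the function names are hypothetical, and the risk groups use the 0–2, 3–5, and ≥6 thresholds stated in the methods.

```python
def capra_s(
    psa_points: int,
    pgs_points: int,
    ece: bool,
    lni: bool,
    sm: bool,
    svi: bool,
) -> int:
    """Sum CAPRA-S points: PSA 0-3, pGS 0-3, ECE 1, LNI 1, SM 2, SVI 2."""
    if psa_points not in range(4) or pgs_points not in range(4):
        raise ValueError("PSA and pGS points must each be 0, 1, 2, or 3")
    return (psa_points + pgs_points
            + int(ece) + int(lni)
            + 2 * int(sm) + 2 * int(svi))


def risk_group(score: int) -> str:
    """Map a CAPRA-S score to the low / intermediate / high grouping."""
    if score <= 2:
        return "low"
    if score <= 5:
        return "intermediate"
    return "high"


# Example: 1 PSA point, 2 pGS points, positive margin, no ECE, SVI, or LNI.
score = capra_s(1, 2, ece=False, lni=False, sm=True, svi=False)
print(score, risk_group(score))  # 5 intermediate
```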
The mean hazard ratio (HR) for CAPRA-S as a continuous variable was 1.54 (95% CI 1.49–1.59), consistent with a 2-point increase in score representing on average a 2.4-fold increase in risk (1.54² ≈ 2.4). The HRs at each individual score level are summarized in table 2. Pairwise comparisons among adjacent CAPRA-S scores (CAPRA-S 1 vs 0, 2 vs 1, etc.) were all statistically significant except for the comparisons of CAPRA-S 4 vs. 3 and 7 vs. 6. The HRs for pairwise comparisons are also summarized in table 2.
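Because the score enters the Cox model as a continuous variable, the per-point hazard ratio compounds multiplicatively across points. The short sketch below reproduces that arithmetic for the point estimate and its confidence limits; the function name is an assumption, and the result describes the average trend rather than the level-specific HRs in table 2.

```python
def relative_hazard(hr_per_point: float, point_difference: int) -> float:
    """Hazard ratio implied by a given difference in score under a log-linear model."""
    return hr_per_point ** point_difference


# Point estimate and 95% CI limits for a 2-point increase in CAPRA-S.
for hr in (1.49, 1.54, 1.59):
    print(f"HR per point {hr:.2f} -> {relative_hazard(hr, 2):.2f} per 2 points")
```

With the reported estimate of 1.54 per point, a 2-point difference corresponds to a hazard ratio of about 2.37, which rounds to the 2.4-fold increase quoted in the text.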
The 3- and 5-year Kaplan-Meier progression-free probability estimates are presented in table 2 and illustrated in figure 2A. Figure 2B presents the corresponding estimates for patients with CAPRA-S scores grouped as 0–2, 3–5, and ≥6. The bootstrap-corrected c-index for the 10-level CAPRA-S is 0.77 (95% CI, 0.75–0.79); by comparison in this sample the c-index for the original CAPRA score is 0.69. Treating the unknown or “unable to assess” pathologic data as missing rather than negative decreased the analytic sample to 3151. The c-index increased slightly to 0.79, and the HR for each point increase in CAPRA-S to 1.57 (1.52–1.63). The c-index for the Stephenson nomogram, by comparison, was minimally lower at 0.76. Figure 3 presents the results of the decision curve analysis between the two instruments. In this dataset, the CAPRA-S score predictions result in greater net benefit across the range of risk thresholds compared to the nomogram predictions. Although the model calibrates very well at 5 years’ follow-up (Hosmer-Lemeshow p=1.0), there is some evidence of lack of fit at higher CAPRA-S scores, which included relatively few patients (Figure 4A). The nomogram, on the other hand (figure 4B) is consistently over-optimistic in its predictions relative to outcomes in CaPSURE.
Forty men died of prostate cancer at a median of 7.6 years, and 234 died of other causes at a median of 7.1 years during the observation period; the remainder were censored with respect to mortality. On competing risks analysis, the subhazard ratio for cancer-specific mortality for each single point increase in CAPRA-S score was 1.42 (95% CI 1.27–1.60).
Radical prostatectomy accounted for half of primary treatment selection among all patients enrolled in CaPSURE between 1990 and 2008.19 Population-based data estimate that approximately one-third of prostate cancer patients in the United States undergo the procedure.20 A subset of these will experience biochemical recurrence, of whom a fraction will progress to clinical recurrence and/or metastases and face disease-specific mortality. Postoperative PSA kinetics may help identify which patients are at greatest risk,21 but require multiple PSA assessments, potentially delaying interventions such as radiation or androgen ablation which have greater benefit with earlier administration for selected patients.22–24 Conversely, many patients with limited adverse pathologic features will not progress following surgery and could be spared the additional morbidity of further treatment.25
A recent review identified eight previously published instruments for prediction of biochemical outcomes following prostatectomy.11 A set of lookup tables, not included in this review, has also been published.26 Of these, the only externally validated models based on standard clinical and pathological variables with published accuracy estimates are a prediction formula by Bauer et al.,27 the postoperative nomogram originally developed by Kattan et al.,28 and the updated version published by Stephenson et al.12 The latter, while based on a recurrence definition using a PSA threshold of 0.4 ng/ml rather than 0.2 ng/ml, incorporates variables similar to those of the CAPRA-S (preoperative PSA, pGS, SM status, ECE, SVI, and LNI) and adds year of surgery. This instrument had a c-index of 0.86 for the development set, and 0.81 and 0.77–0.78 for validation studies in the same institution and another academic institution, respectively, and tended to overestimate somewhat the progression-free probability for patients at the higher end of the risk spectrum.12
The bootstrap-corrected c-index for the CAPRA-S score in this study is 0.77, which indicates good discriminatory accuracy. Moreover, the scoring system requires no paper tables or software, and with practice can be determined rapidly from memory. An individual patient’s likelihood of recurrence 3 and 5 years after surgery can be estimated from the figures given in Table 2. However, the absolute risks will vary across cohorts and surgical series, for which reason the CAPRA-S score is meant to be used primarily as a measure of relative risk. Additional validation studies will be required to determine how consistently the absolute risk predictions are calibrated across different clinical contexts.
Of note, two previous papers have performed direct head-to-head comparisons of the pre-treatment CAPRA score to popular nomograms: a U.S. study compared the CAPRA score to the original Kattan preoperative nomogram,3 and a European study compared it to the updated preoperative nomogram published by Stephenson et al.4 In both analyses the accuracy of the CAPRA score was similar to that of the nomograms, while the European study found that the CAPRA score performed better both in terms of calibration and in decision curve analysis.4
Likewise, in the present analysis, the CAPRA-S score and postoperative nomogram have similar discrimination as calculated by the c-index, but the CAPRA-S score performs better in calibration and decision curve analyses. This finding may reflect the application of a nomogram derived from a high-volume surgeon’s experience to broader community practice. We previously observed this phenomenon in analyzing the performance of the Kattan preoperative nomogram in CaPSURE29; in that study the preoperative nomogram was also somewhat over-optimistic when applied to the community-based data, though the miscalibration was not as great as we observe with the postoperative nomogram in this current analysis.
The CAPRA-S scores, while still concentrated among the lower scores, are more broadly distributed than the original pre-treatment CAPRA scores. The CAPRA-S score thus should be quite useful in practice for helping patients understand their risk of recurrence and the possible utility of adjuvant therapy. The score should also be applicable as a composite measure of disease risk in the research setting, both for consistent identification of eligible patients, and for risk-based subgroup classification of trial results. The CaPSURE data are multi-institutional and largely community-based, so should be robust in terms of external applicability.
Several caveats should be noted. First, completeness of pathologic data is variable in CaPSURE, reflecting the nature of the registry, with multiple clinicians contributing data, including pathology reports written to varying standards by an unknown number of pathologists. However, the sensitivity analysis is reassuring that the model is robust, and bootstrap correction of the confidence intervals on the parameter estimates supports the credibility of the results. We expected LNI to be more strongly predictive of recurrence; its relatively minor contribution to the CAPRA-S score likely reflects the very low prevalence of LNI in the dataset. This finding is typical of U.S. surgical series in which a relatively limited lymphadenectomy is usually performed; in series including higher risk patients, in which extended template lymph node dissection is employed, the prevalence of LNI is substantially higher.30–32 Many patients did not have a lymphadenectomy performed, so excluding those with unknown LNI status would be problematic. Of note, in the postoperative nomogram, LNI also contributes relatively few points—comparable to SM but less important than ECE or SVI.12
There is a degree of overlap between adjacent scores, particularly CAPRA-S scores 6 and 7. However, while the incremental increase in risk with increasing CAPRA-S score is not entirely smooth, the analysis of the score as a continuous variable and the pairwise comparisons presented confirm that in general each two point increase in CAPRA-S score indicates at least a doubling of risk of recurrence. The a priori establishment of thresholds for dividing the CAPRA-S scores among low-risk (0–2), intermediate-risk (3–5), and high-risk (6–10) should facilitate use of the score as a risk stratification tool in the clinical research setting. Like other U.S. surgical cohorts, CaPSURE includes mostly men at relatively low risk of progression, so the interpretation of the CAPRA-S score at higher risk levels may be less reliable.
Our definition of recurrence included the one favored by the American Urological Association Prostate Guidelines for Localized Prostate Cancer.15 However, biochemical recurrence does not necessarily correlate with ultimate mortality from prostate cancer.33 This analysis does indicate that the CAPRA-S score predicts prostate cancer-specific mortality. However, important future directions will include both external validation and assessment of the CAPRA-S score’s performance in predicting cause-specific and overall mortality with larger numbers of patients ultimately reaching these endpoints, as well as additional head-to-head comparisons of CAPRA-S against other postoperative risk models in terms of accuracy, discrimination, and calibration. These studies are planned in the near future, and will include cohorts with greater representation of men with relatively high-risk disease.
Incorporating pathological information, the CAPRA-S score offers good accuracy in predicting disease recurrence after prostatectomy, yet remains straightforward to calculate. No nomogram or scoring system can replace individualized clinician-patient decision-making, which must consider life expectancy, utilities for quality of life outcomes, and treatment preferences. However, we believe that given the accuracy and ease of use of the CAPRA-S score, this instrument will prove a useful tool both to inform decision-making after prostatectomy and to classify patients for future studies of adjuvant therapy.
CaPSURE is supported in part by Abbott Labs (Abbott Park, IL), and is additionally funded internally by the UCSF Department of Urology. This work was also supported by National Institutes of Health/National Cancer Institute University of California-San Francisco SPORE Special Program of Research Excellence P50CA89520. No sponsor had any role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; or preparation, review, or approval of the manuscript.