GEMCaP and Kattan scores
To evaluate the role of GEMCaP in predicting clinical status, confounding by known disease factors was avoided by selecting cases and controls with comparable baseline features (Supplementary Table S1; ). The risk classifications according to the fixed, floating, and integrated GEMCaP scores are shown in Supplementary Table S1 and include the 5-year postoperative PFP using the Kattan historical nomogram (13). All aCGH log2 ratios, along with probe information, are provided in Supplementary Table S2.
Summary of predictive model scores
The summary features for each of the four prediction models are displayed in . As would be expected, the three GEMCaP scores are highly correlated (P < 0.001 for each pairwise comparison), but none were correlated with the 5-year nomogram prediction of PFP (P > 0.35 for each comparison). A significant difference between cases and controls was observed in the nomogram distributions (P = 0.0001), and a borderline difference between clinical subsets was observed using the floating and the integrated GEMCaP scores (P = 0.08 and 0.09, respectively).
Agreement in risk classification among the models
The overall agreement between each of the GEMCaP models and the nomogram score was investigated. Note that this is not agreement with known disease recurrence status, but a summary of concurrence among the four methods. All three GEMCaP methods classified 31% of the patients as having a favorable risk and 31% as having an unfavorable risk of recurrence. Differences in classification occurred among the remaining third of the study sample. The classification of patients differed significantly between the integrated threshold method and both the fixed and floating approaches (McNemar's test: P = 0.02 for each comparison).
Overall, the Kattan nomogram classified 35% of the patients identically to all three GEMCaP methods, 26% favorable and 9% unfavorable (). Using the GEMCaP fixed method, agreement with the nomogram occurred for 61% of the patients, but together the two methods identified only 9% of the entire sample as being at increased risk of progression. A difference in classification between the nomogram and both the floating and integrated methods was observed (McNemar's test: P < 0.0001 and 0.002, respectively). Because of the differing classification, we investigated the agreement between individual models and the combination of GEMCaP scores with the Kattan nomogram.
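The paired comparisons above rest on McNemar's test, which looks only at patients whom two methods classify differently. The following is a minimal sketch of the exact (binomial) form of that test. The text does not state whether the exact or the chi-square version was used, so the exact form is an assumption, and the function names and toy labels are hypothetical and are not the study data.

```python
from math import comb


def discordant_counts(method_a, method_b):
    """Count patients the two methods classify differently.

    Each input is a list of booleans (True = unfavorable risk), one per patient.
    """
    a_only = sum(1 for a, b in zip(method_a, method_b) if a and not b)
    b_only = sum(1 for a, b in zip(method_a, method_b) if b and not a)
    return a_only, b_only


def mcnemar_exact(a_only, b_only):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = a_only + b_only
    if n == 0:
        return 1.0  # the two methods never disagree
    k = min(a_only, b_only)
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy labels for eight patients, NOT the study data.
fixed = [False, False, False, True, False, False, True, False]
floating = [True, True, True, True, True, False, True, True]

a_only, b_only = discordant_counts(fixed, floating)
p_value = mcnemar_exact(a_only, b_only)
```

Concordant patients do not enter the calculation, which is why two methods can agree on a majority of patients and still differ significantly by this test.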
Agreement between predicted and known recurrence status
The known postoperative recurrence status was used as the reference to evaluate the ability of the four proposed methods to predict outcome. The floating method had the highest sensitivity (80%), whereas the fixed method had the highest specificity (75%; ). The fixed threshold approach did not sufficiently identify cases, displaying a sensitivity of 43%, and the floating threshold method resulted in a specificity of 50% for identifying controls. Integration of the floating and fixed GEMCaP models achieved a sensitivity, specificity, and accuracy of ~65%. Changing the GEMCaP cut-point did not improve the accuracy of any of the three GEMCaP models. Because of the classification cut-point selected for the predicted 5-year PFP from the nomogram in this analysis, all control patients were correctly identified. With this cut-point, the sensitivity of the nomogram was only 40%, which is similar to the fixed thresholding results.
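For reference, sensitivity, specificity, and accuracy as used above can be computed from a binary prediction and the known status as in the sketch below. It assumes that a case (known recurrence or positive lymph nodes) is the positive class and that an unfavorable classification is a positive prediction. The function name and the toy labels are hypothetical and are not the study data.

```python
def classification_summary(is_case, predicted_unfavorable):
    """Sensitivity, specificity, and accuracy of a binary risk classification.

    is_case: list of booleans, True for cases and False for controls.
    predicted_unfavorable: list of booleans, True if the model flags high risk.
    """
    pairs = list(zip(is_case, predicted_unfavorable))
    tp = sum(1 for truth, pred in pairs if truth and pred)          # cases flagged
    fn = sum(1 for truth, pred in pairs if truth and not pred)      # cases missed
    tn = sum(1 for truth, pred in pairs if not truth and not pred)  # controls cleared
    fp = sum(1 for truth, pred in pairs if not truth and pred)      # controls flagged
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy example with four cases and four controls, NOT the study data.
status = [True, True, True, True, False, False, False, False]
flagged = [True, True, True, False, True, True, False, False]
summary = classification_summary(status, flagged)
```

A cut-point chosen so that every control is classified correctly fixes specificity at 100%, so for such a cut-point the informative quantity is the sensitivity.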
Among the 17 patients classified as favorable by all three GEMCaP models, there were five mismatches with the clinical status. The nomogram also misclassified two of these five patients. Similarly, among the 17 patients classified as unfavorable by all three GEMCaP thresholding approaches, five were mismatches with known status. The nomogram prediction also misclassified these five genomic mismatches, but incorrectly classified seven others in this unfavorable subset.
The nomogram score is a continuous variable with no accepted standard cut-points to indicate increased risk of recurrence. Because it is a validated method that is widely used by clinicians to estimate outcome, we defined a nomogram cut-point of above 40% that identifies all control patients in this study sample, as displayed in . Data points within ovals mark patients for whom the nomogram and GEMCaP classifications agree, at nomogram scores above 70% and below 40%. Both scoring systems misclassified cases (see data points within rectangle), and a similar number of cases and controls were misclassified by each approach (circles with values, >70%). All but one of the patients with intermediate nomogram scores (i.e., between 40% and 70%) had accurate GEMCaP classifications.
Detailed evaluation of agreement for cases
The difference between the nomogram and integrated classifications in identifying cases was explored further. For this study, patients were selected as cases if they had positive lymph nodes determined at the time of RP or recurred within 1 year of surgery. The postoperative nomogram PFP score decreases when a patient has positive lymph nodes, whereas cases who recurred within 1 year with negative lymph nodes would have a PFP estimate similar to that of the high-risk controls. Therefore, the nomogram had a low sensitivity when detecting true cases.
For all three GEMCaP methods, the distribution of the GEMCaP signature was consistent for all cases. In contrast, a significant difference was observed in the nomogram distributions between lymph node–positive cases and cases who recurred within 1 year of surgery (P = 0.0006). Even if the cut-point for the nomogram were increased, this difference would still be observed. There were 15 lymph node–negative cases in this study. GEMCaP identified 10 such cases, whereas the nomogram identified only 2 (1 sample overlapped). Descriptive data are shown in .
Summary predictive model
To combine these observations, a multivariate analysis was performed, assuming a logistic regression model to predict the observed disease recurrence status. Individually, only the nomogram was predictive of disease recurrence, which is consistent with the previous results indicating a difference in distributions of the PFP between cases and controls (). The three GEMCaP approaches using the actual scores all resulted in AUCs for the receiver operating characteristic curves in the range of 0.60 to 0.64, whereas the AUC for the nomogram was 0.81. When the GEMCaP scores were dichotomized, the binary outcomes using the integrated and floating threshold classifications were each significant predictors of disease status using the logistic model, but an increase in the AUC was not achieved. These significant results reflect the ~65% accuracy with either of these two approaches for a binary GEMCaP score ().
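The AUC values quoted above have a simple interpretation: the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The sketch below computes the AUC directly from that definition. It illustrates the statistic and is not the authors' analysis code, and the function name and toy scores are hypothetical. For the nomogram, a lower PFP means higher risk, so a score such as 1 - PFP would be passed in.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of risk scores.

    Higher scores are assumed to indicate higher risk of recurrence.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0   # case correctly ranked above control
            elif case == control:
                wins += 0.5   # a tie counts as half
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical toy risk scores, NOT the study data.
toy_cases = [0.9, 0.8, 0.4]
toy_controls = [0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_cases, toy_controls)
```

A binary classifier produces many tied scores, which limits the attainable AUC. That is one way a dichotomized score can be a significant predictor in a logistic model without raising the AUC.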
Importantly, the integrated and floating GEMCaP signatures were able to detect the cases with negative lymph nodes who recurred within 1 year of surgery more often than the nomogram (). Thus, the addition of a binary GEMCaP classification to the nomogram probability in predicting the known disease status was tested. For both the integrated and floating methods, in addition to the nomogram PFP, the GEMCaP classification was a significant, independent predictor of recurrence status (likelihood ratio tests: nomogram P = 0.0001; plus integrated P = 0.055; plus floating P = 0.02). This resulted in a simultaneous increase in sensitivity, specificity, and accuracy compared with the nomogram prediction alone, as well as an increase in the AUC for the receiver operating characteristic curve to 0.84 and 0.85, respectively (). This indicates the additional benefit of the GEMCaP signature in predicting disease progression.
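The likelihood ratio tests reported above compare a logistic model containing the nomogram PFP alone with one that also includes the binary GEMCaP classification. As a minimal sketch, assuming the two maximized log-likelihoods are already available from fitted models and that exactly one parameter is added, the test statistic and its P value can be obtained as follows. The function name and the log-likelihood values are hypothetical and are not taken from the study.

```python
from math import erfc, sqrt


def likelihood_ratio_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for adding one parameter to a nested model.

    Returns the test statistic and its chi-square (1 df) upper-tail P value.
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    statistic = max(statistic, 0.0)  # guard against tiny negative rounding error
    # For 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods, NOT values from the study.
lrt_statistic, lrt_p = likelihood_ratio_test_1df(loglik_reduced=-30.0, loglik_full=-28.0)
```

With one added binary predictor, the statistic is referred to a chi-square distribution with one degree of freedom, which is why the closed form with `erfc` suffices here.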