A J Vickers took part in the study design, study funding, and revising of the manuscript. Both A J Vickers and A M Cronin took part in the statistical analysis and drafting of the manuscript. B H Bochner took part in data acquisition, study design, and drafting and revising the manuscript. M W Kattan, M Gonen, P T Scardino, M I Milowsky, and G Dalbagni took part in drafting and revising the manuscript. All authors have seen and approved the final version of the manuscript.
Multivariable prediction models have been shown to predict cancer outcomes more accurately than cancer stage. The effects on clinical management are unclear. We aimed to determine whether a published multivariable prediction model for bladder cancer (“bladder nomogram”) improves medical decision making, using referral for adjuvant chemotherapy as a model.
We analyzed data from an international cohort study of 4462 patients who underwent cystectomy without chemotherapy between 1969 and 2004. The number of patients eligible for chemotherapy was determined using pathologic stage criteria (lymph node positive or stage pT3 or pT4), and for three cut-offs on the bladder nomogram (10%, 25% and 70% risk of recurrence with surgery alone). The number of recurrences was calculated by applying a relative risk reduction to eligible patients' baseline risk. Clinical net benefit was then calculated by combining recurrences and treatments, weighting the latter by a factor related to drug tolerability.
A nomogram cut-off outperformed pathologic stage for chemotherapy for every scenario of drug effectiveness and tolerability. For a drug with a relative risk of 0.80, where clinicians would treat no more than 20 patients to prevent one recurrence, use of the nomogram was equivalent to a strategy that resulted in 60 fewer chemotherapy treatments per 1000 patients without any increase in recurrence rates.
Referring cystectomy patients to adjuvant chemotherapy on the basis of a multivariable model is likely to lead to better patient outcomes than the use of pathological stage. Further research is warranted to evaluate the clinical effects of multivariable prediction models.
Many decisions in oncology depend, implicitly or explicitly, on predictions. These predictions are normally thought of in terms of “risk”: typically, we act when a patient is deemed at sufficiently high risk that the benefits of intervention, in terms of reducing risk, outweigh the harms, in terms of toxicities. Most commonly, decisions that involve prediction are based on risk categories. The most common system for risk categorization in cancer is cancer stage, with more aggressive treatments reserved for patients with higher stage disease. Stage can influence the extent of surgery, such as in breast or bladder cancer; the intensity of chemotherapy, such as in lymphoma; and whether adjuvant therapy is indicated, such as in bladder or colon cancer.
Recent years have seen an upsurge of interest in multivariable prediction models. Typically, these models provide a numerical estimate of risk in the form of a probability on the basis of several tumor and patient characteristics. The well-known “Kattan nomogram”, for example, provides the probability of prostate cancer recurrence after radical prostatectomy on the basis of stage, grade and prostate-specific antigen level1. Numerous similar models have been published for a variety of different cancers2–6 and for specific treatment decisions, such as adjuvant chemotherapy after breast cancer7. It seems reasonable that such models might predict more accurately than simple staging systems because they include additional prognostic information. For instance, a patient with high-grade organ-confined prostate cancer is likely at a higher risk of recurrence than a patient at a similar stage, but with low-grade disease. Empirical studies have confirmed that multivariable models provide more accurate predictions than American Joint Committee on Cancer (AJCC) staging, or other simple risk groupings, in a wide variety of cancers including melanoma8, gastric cancer9, pancreatic cancer10 and prostate cancer11. In the case of bladder cancer, the topic of the current paper, the predictive accuracy of a multivariable model for recurrence after radical cystectomy (the “bladder cancer nomogram”) was a concordance index of 0.75 compared to only 0.68 for the AJCC TNM staging and 0.62 for standard pathologic stage12.
Nonetheless, the clinical implications of an improved concordance index are not immediately obvious: we may be able to predict better who is at high risk of death from bladder cancer, but this may make little practical difference to patient care. We were interested in determining whether use of the bladder cancer nomogram would improve clinical decision making, such as whether a patient should be administered adjuvant chemotherapy. To address this question, we examined a recent multivariable prediction model, the bladder cancer nomogram, using decision analytic methods.
Collection of data for the International Bladder Cancer Nomogram Consortium has been previously described12. In total, data on 9064 patients who underwent radical cystectomy were collected from 13 institutions in 6 countries. Since we wished to estimate the risk of recurrence following radical cystectomy alone, we excluded patients who had received systemic adjuvant chemotherapy, neoadjuvant chemotherapy, or definitive pelvic radiotherapy (n=2001), and patients for whom pelvic radiotherapy was unknown (n=319). Of the remaining patients, 871 were not followed for recurrence, and 1411 were excluded due to missing data on variables included in the nomogram. Our sample therefore consisted of 4462 patients who were managed by radical cystectomy only, were followed for recurrence and had complete data for all predictors: sex, age, pathologic stage, histology, nodal status, grade and time between diagnosis and cystectomy. The proportion of patients by institution is similar to that previously reported12.
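The cohort arithmetic described above can be checked directly. The following is a minimal sketch using only the counts reported in the text; the variable names are illustrative and are not taken from the study's analysis code.

```python
# Counts as reported in the text; names are illustrative only.
collected = 9064
exclusions = {
    "chemotherapy or definitive pelvic radiotherapy": 2001,
    "pelvic radiotherapy status unknown": 319,
    "not followed for recurrence": 871,
    "missing data on nomogram variables": 1411,
}

# Apply each exclusion in the order given in the text.
remaining = collected
for reason, n in exclusions.items():
    remaining -= n
    print(f"after excluding {reason}: {remaining}")

analyzed = remaining
```

The final value of `analyzed` matches the 4462 patients stated as the analytic sample.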
Our interest in this paper is adjuvant rather than salvage chemotherapy. Adjuvant chemotherapy is commonly given to high-risk bladder cancer patients after radical cystectomy. Two separate groups have published meta-analyses of randomized trials and reported similar findings suggesting a survival benefit to adjuvant chemotherapy. Ruggeri et al. pooled data from five Phase III trials and reported a statistically significant improvement in disease-free survival in patients receiving adjuvant chemotherapy (hazard ratio of 0.65; 95% C.I. 0.54, 0.78)13. The advanced bladder cancer meta-analysis collaboration14 included an additional trial and performed individual patient data analysis: they report a hazard ratio of 0.68 (95% C.I. 0.53, 0.89). These numbers are equivalent to a relative risk of recurrence at 5 years of approximately 0.75. However, given the relatively limited number of patients included in these meta-analyses, fewer than 500, and the accordingly wide confidence intervals, we planned to use a variety of different estimates of relative risk in our analyses.
Eligibility for adjuvant chemotherapy therefore depends on a decision rule for identification of high risk patients. With respect to bladder cancer, patients with stage pT3 or pT4 disease, or those with positive nodes, are generally considered at high risk for recurrence and therefore recommended by standard guidelines as eligible for chemotherapy; node-negative patients with organ confined disease (less than pT3) are followed by observation only after surgery.15 We wished to compare this eligibility criterion with one of three prespecified rules based on the nomogram predicted risk of recurrence at five years: 10%, 25% and 70%.
Our overall statistical approach follows a previously published methodology16. In brief, to calculate sensitivity and specificity for survival time data, we first define x=1 if the patient is classified as being at high risk and x=0 otherwise; s(t) is the Kaplan-Meier survival probability at time t, predefined as five years. Following Begg et al.17 we use the following formula for sensitivity and specificity:
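By Bayes' rule, sensitivity and specificity at time t can be written in terms of Kaplan-Meier estimates. The display below is a sketch consistent with the definitions above rather than a quotation from the cited methodology. The notation is assumed here: T is the time to recurrence, s(t | x) is the Kaplan-Meier survival probability within risk group x, and P(x) is the proportion of patients in that group.

```latex
\[
\text{sensitivity} = P(x = 1 \mid T \le t)
  = \frac{\bigl[1 - s(t \mid x = 1)\bigr]\, P(x = 1)}{1 - s(t)},
\qquad
\text{specificity} = P(x = 0 \mid T > t)
  = \frac{s(t \mid x = 0)\, P(x = 0)}{s(t)}
\]
```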
Assuming a constant relative risk (RR), the proportion of patients who recur with a particular intervention scenario i can be given as:
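One such expression can be sketched under assumed notation. Let p = 1 − s(t) be the five-year recurrence probability without treatment, and let sens_i be the sensitivity of strategy i, that is, the share of patients destined to recur whom the strategy classifies as high risk and therefore treats. Treated patients who would otherwise have recurred now do so with probability RR, while untreated patients recur as before, which gives:

```latex
\[
\text{recurrence}_i
  = p\,(1 - \text{sens}_i) + p\,\text{sens}_i\,\mathrm{RR}
  = p\,\bigl[1 - \text{sens}_i\,(1 - \mathrm{RR})\bigr]
\]
```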
As we are treating recurrence as a binary event, relative risk is defined as the risk of recurrence at five years with treatment divided by five-year recurrence risk without treatment. Our primary analyses assumed that relative risk was constant across risk groups so that, for example, given a chemotherapy regimen associated with a relative risk of 0.9, a patient with a 50% probability of recurrence without treatment would have a risk of 45% on chemotherapy and a patient with a 10% baseline risk would have a 9% risk.
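The constant relative risk assumption can be illustrated with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the two-patient list in the usage note is an illustration built from the example in the text, not study data.

```python
def risk_on_treatment(baseline_risk: float, relative_risk: float) -> float:
    """Five-year recurrence risk on treatment under a constant relative risk."""
    return baseline_risk * relative_risk


def expected_recurrence_rate(baseline_risks, cutoff: float, relative_risk: float) -> float:
    """Mean five-year risk when only patients at or above the cutoff are treated."""
    total = 0.0
    for risk in baseline_risks:
        treated = risk >= cutoff
        total += risk_on_treatment(risk, relative_risk) if treated else risk
    return total / len(baseline_risks)
```

For the example in the text, `risk_on_treatment(0.50, 0.9)` gives 0.45 and `risk_on_treatment(0.10, 0.9)` gives approximately 0.09. If those two patients formed a cohort and only the one above a 25% cut-off were treated, `expected_recurrence_rate([0.50, 0.10], 0.25, 0.9)` would be approximately 0.275.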
As a sensitivity analysis, we considered three scenarios, where the treatment was most effective for patients at low, average, and high risk. A patient’s risk was estimated using the nomogram probability, which is justified because the nomogram has been shown to be well calibrated12. Figure 1 gives an illustration of how relative risk was varied by absolute risk for each scenario. The mean relative risk was kept constant between scenarios, such that the total number of recurrences was identical regardless of the presumed relationship between absolute and relative risk.
The proportion of patients who would be treated under each strategy is estimated by counting the proportion of patients in our data set meeting each criterion. Knowing the treatment and recurrence rates for each strategy does not necessarily identify the optimal approach. Often, when comparing two strategies, one will be associated with a lower rate of treatment but also a higher recurrence rate. To calculate whether the reduction in the number of patients receiving chemotherapy offsets the increase in recurrence rates, we need to consider the maximum number of patients a clinician would consider treating to prevent one recurrence. This is known as the “number-needed-to-treat threshold” (NNTT) and is a clinical judgment that can vary from clinician to clinician, and from patient to patient18. The NNTT is the reciprocal of the minimum clinically significant difference, a concept necessary to design and interpret randomized trials19, 20. NNTT is a measure of drug tolerability: an agent that is easy to take, inexpensive and associated with low toxicity would have a high NNTT; a toxic, inconvenient or expensive drug would have a low NNTT. Note that the NNTT is different from the usually reported number-needed-to-treat (NNT): NNT is calculated from the results of a study and tells us how many patients would need to be treated to prevent one event; NNTT is a clinical consideration independent of the results of any study, and tells us how a physician weights the harms of treatment against the benefits of avoiding an event.
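The distinction between NNT and NNTT can be made concrete. For a patient with baseline recurrence risk r and a constant relative risk RR, the absolute risk reduction is r(1 − RR) and the NNT is its reciprocal; treatment is worthwhile when that NNT does not exceed the NNTT. The sketch below is illustrative only: the function names are hypothetical, and the rearrangement into a risk threshold is offered here for illustration rather than taken from the text.

```python
def number_needed_to_treat(baseline_risk: float, relative_risk: float) -> float:
    """NNT implied by a baseline risk (> 0) and a constant relative risk (< 1)."""
    absolute_risk_reduction = baseline_risk * (1.0 - relative_risk)
    return 1.0 / absolute_risk_reduction


def risk_threshold(relative_risk: float, nntt: float) -> float:
    """Baseline risk at which the implied NNT equals the NNTT."""
    return 1.0 / (nntt * (1.0 - relative_risk))


def worth_treating(baseline_risk: float, relative_risk: float, nntt: float) -> bool:
    """True when the implied NNT does not exceed the clinician's NNTT."""
    return number_needed_to_treat(baseline_risk, relative_risk) <= nntt
```

With a relative risk of 0.8 and an NNTT of 20, `risk_threshold(0.8, 20)` is approximately 0.25: on this reading, a patient would need at least a 25% baseline risk for treatment to be worthwhile.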
We can then define “clinical net benefit”16 as follows:
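A form consistent with this description, and with the equivalences between recurrences and treatments reported in the results, is sketched below. The symbols are assumed notation: R_0 is the proportion of patients recurring if no patient is treated, R_i is the proportion recurring under strategy i, and T_i is the proportion treated under strategy i.

```latex
\[
\text{net benefit}_i = (R_0 - R_i) - \frac{T_i}{\mathrm{NNTT}}
\]
```

Because R_0 is the same for every strategy, it does not affect which strategy ranks highest.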
Because net benefit includes both treatments and recurrences, the optimal treatment strategy is the one with the highest net benefit, irrespective of the size of difference between strategies. Statistical analyses were conducted using Stata 9.2 (Stata Corp., College Station, TX).
Baseline characteristics of the 4462 patients in our sample are summarized in Table 1. The median age at cystectomy was 62 years (interquartile range 51, 70) and the majority of patients were male (77%). Approximately half were either lymph node positive or had pathologic stage ≥ pT3 (n=2466, 55%); 4081 (91%), 1835 (41%), and 365 (8%) patients had a ≥10%, ≥25%, and ≥70% 5-year nomogram probability of recurrence, respectively.
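The proportions eligible under each rule follow directly from the counts above. The following minimal sketch uses only the reported counts; the dictionary keys and variable names are illustrative.

```python
cohort = 4462

# Number of patients meeting each eligibility rule, as reported in the text.
eligible = {
    "pathologic stage": 2466,
    "nomogram >= 10%": 4081,
    "nomogram >= 25%": 1835,
    "nomogram >= 70%": 365,
}

# Share of the whole cohort that each rule would refer for chemotherapy.
share = {rule: n / cohort for rule, n in eligible.items()}

# Change in the number treated relative to the standard pathologic stage rule.
standard = eligible["pathologic stage"]
relative_change = {rule: (n - standard) / standard for rule, n in eligible.items()}

for rule in eligible:
    print(f"{rule}: {share[rule]:.1%} treated, {relative_change[rule]:+.1%} vs stage")
```

Relative to the stage rule, the 25% cut-off treats about a quarter fewer patients and the 10% cut-off about 65% more.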
There were 1069 recurrence events. The median follow-up for recurrence-free patients was 3.8 years. The Kaplan-Meier estimate of the 5-year probability of recurrence for the entire sample was 28% (95% C.I. 26%, 29%).
Table 2 shows the number of patients treated using each treatment strategy, and the expected number of recurrences given various levels of treatment effectiveness. Compared to the standard approach, using a nomogram probability of 25% as eligibility for postoperative chemotherapy reduces the number of patients treated by about a quarter, at the cost of a slight increase in recurrence rates; using a 10% nomogram probability increases the number of patients treated by about 65%, but this is associated with a large reduction in recurrence rates. Table 3 shows the net benefit for each strategy given a range of NNTT and relative risks. The strategy with the highest net benefit will have the optimal clinical results. Where drugs are very effective (relative risk of 0.6 or 0.7) or tolerable (NNTT of 35 or 50), the highest net benefit is for a nomogram cut-off of 10%. Where drugs are of more marginal benefit and poor tolerability (relative risk of 0.9; NNTT of 20), the nomogram cut-off of 70% is optimal. For the remaining scenarios a nomogram cut-off of 25% provides the highest net benefit. The differences between strategies are not trivial. For example, at a relative risk of 0.80 and an NNTT of 20, use of a 25% threshold has a net benefit 0.003 higher than the standard pathological groups. This is equivalent to a strategy that led to 3 fewer recurrences per 1000 patients without any change in the number of patients given chemotherapy, or to one associated with 60 fewer treatments per 1000 patients without any increase in recurrence rates.
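The conversion of a net benefit difference into recurrences or treatments per 1000 patients can be sketched as follows. This assumes net benefit takes the form "recurrences prevented minus treatments divided by NNTT", which is consistent with the equivalences quoted above; the function names are hypothetical.

```python
def net_benefit(recurrences_prevented: float, treated: float, nntt: float) -> float:
    """Recurrences prevented minus treatments, with treatments weighted by 1/NNTT."""
    return recurrences_prevented - treated / nntt


def equivalent_recurrences_per_1000(delta_net_benefit: float) -> float:
    """Fewer recurrences per 1000 patients at an unchanged treatment rate."""
    return 1000 * delta_net_benefit


def equivalent_treatments_per_1000(delta_net_benefit: float, nntt: float) -> float:
    """Fewer treatments per 1000 patients at an unchanged recurrence rate."""
    return 1000 * delta_net_benefit * nntt


delta = 0.003  # net benefit difference quoted in the text
nntt = 20

print(round(equivalent_recurrences_per_1000(delta)))       # 3
print(round(equivalent_treatments_per_1000(delta, nntt)))  # 60
```

The factor of 20 between the two figures is simply the NNTT: at this threshold, one recurrence avoided is valued the same as 20 treatments avoided.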
To explain these findings, figure 2 gives a distribution of predicted probabilities within each pathological grouping. A nomogram cut-off of 25% includes all patients with node-positive disease. However, a little more than a quarter of patients with pT3 or pT4, node-negative disease are at low risk (<25%) according to the nomogram and would therefore not be treated by chemotherapy; conversely, just under a quarter of patients with less than pT3 disease (organ confined tumors) are considered high risk by the nomogram.
We then repeated our analysis for all possible combinations of relative risk (0.50 to 0.99) and NNTT (1 to 250) and recorded which strategy had the highest net benefit. The results are shown in figure 3. For example, table 3 shows that for a NNTT of 20 and a relative risk of 0.8, the highest net benefit is obtained by treating only patients with at least a 25% risk of recurrence; a point corresponding to 0.8 on the x axis and 20 on the y axis of figure 3 is medium gray, indicating the 25% threshold as the optimal strategy. As expected, where adjuvant therapy is highly effective or tolerable, either all patients, or all patients except those with very low risk (nomogram probability less than 10%) should be treated; where therapy is of moderate effectiveness or tolerability, either no patients or only those at the highest risk (nomogram probability ≥ 70%) should be treated. For the remaining cases, the 25% cut-off is optimal. Of note is that for no combination of effectiveness or tolerability is the highest net benefit associated with the standard eligibility for chemotherapy. In other words, the nomogram outperforms the standard guideline for every plausible scenario of drug effect and tolerability: whatever a clinician believes about a particular drug, he or she should use a nomogram cut-off in order to decide which patients to treat.
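The grid search described above can be sketched in Python. The treated proportions and the 28% baseline recurrence rate are taken from the text, but the sensitivity of each rule is not reported here, so the sensitivity values below are hypothetical placeholders. The resulting map therefore illustrates the procedure only and does not reproduce figure 3. The net benefit form (recurrences prevented minus treatments divided by NNTT) is likewise an assumption.

```python
from collections import Counter

BASELINE = 0.28  # five-year recurrence probability reported for the cohort

# strategy -> (proportion treated, sensitivity)
# Treated proportions come from the text; sensitivities are HYPOTHETICAL placeholders.
STRATEGIES = {
    "treat none": (0.00, 0.00),
    "nomogram >= 70%": (0.08, 0.20),
    "nomogram >= 25%": (0.41, 0.75),
    "pathologic stage": (0.55, 0.80),
    "nomogram >= 10%": (0.91, 0.98),
    "treat all": (1.00, 1.00),
}


def net_benefit(treated: float, sensitivity: float, rr: float, nntt: float) -> float:
    """Recurrences prevented under a constant relative risk, minus treated / NNTT."""
    prevented = BASELINE * sensitivity * (1.0 - rr)
    return prevented - treated / nntt


def best_strategy(rr: float, nntt: float) -> str:
    """Name of the strategy with the highest net benefit for one (RR, NNTT) pair."""
    return max(STRATEGIES, key=lambda name: net_benefit(*STRATEGIES[name], rr, nntt))


# Relative risk from 0.50 to 0.99 in steps of 0.01; NNTT from 1 to 250.
grid = {
    (rr_pct / 100, nntt): best_strategy(rr_pct / 100, nntt)
    for rr_pct in range(50, 100)
    for nntt in range(1, 251)
}

# Tally how often each strategy is optimal across the grid.
print(Counter(grid.values()))
```

Substituting sensitivities estimated from the data would turn this into the actual analysis; with the placeholders, only the qualitative behavior at the extremes (treat none when the drug is ineffective and poorly tolerated, treat all when it is effective and well tolerated) should be relied upon.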
We excluded 2320 patients receiving chemotherapy or radiotherapy, or where other treatments were unknown; 19% (679/3551) with low stage (≤ pT2 and node negative); 32% (1536/4875) with high stage (≥ pT3 or node positive); and 16% (105/638) with unknown stage. Although a higher proportion of patients with high stage were excluded, our cohort comprised the vast majority of patients with both high and low pathologic stages. Accordingly, we see no reason to believe that treatment selection would have an important impact on our findings.
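The exclusion percentages above can be verified from the reported counts. This is a minimal sketch; the group labels and variable names are illustrative.

```python
# (excluded, total) by pathologic stage group, as reported in the text.
exclusions_by_stage = {
    "low stage (<= pT2 and node negative)": (679, 3551),
    "high stage (>= pT3 or node positive)": (1536, 4875),
    "unknown stage": (105, 638),
}

total_excluded = sum(excluded for excluded, _ in exclusions_by_stage.values())
total_collected = sum(total for _, total in exclusions_by_stage.values())

# Percentage of each stage group excluded for these reasons.
percent_excluded = {
    group: round(100 * excluded / total)
    for group, (excluded, total) in exclusions_by_stage.items()
}

print(total_excluded, total_collected, percent_excluded)
```

The group totals sum to the 9064 patients originally collected, and the excluded counts sum to 2320.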
Our primary analyses involved an assumption that the relative risk reduction associated with treatment was constant across risk groups. We therefore performed sensitivity analyses to check that our results were robust to this assumption. Results of these sensitivity analyses are displayed in table 4. Although it initially appears that conclusions about the optimal cut-point sometimes depend on our assumptions of how relative risk varies with absolute risk, analyses assuming constant relative risk give good results: generally there were only small differences in net benefit between the optimal strategy under constant relative risk and the optimal strategy where relative risk varies by baseline risk. Most importantly, the strategy chosen under the assumption of constant relative risk was superior to standard pathological risk groups across all but two of the sensitivity analyses conducted, and in both exceptions the difference was trivial (1 or 3 recurrences per 10,000 patients). Thus our conclusion that the nomogram improves clinical outcomes holds regardless of the relationship between absolute risk and relative risk reduction.
The nomogram was developed excluding patients who had received chemotherapy or radiotherapy – patients likely having the most unfavorable tumor characteristics. As a result, the predictions from the nomogram might underestimate the overall risk of recurrence. Therefore, as a second sensitivity analysis, we added a constant to the nomogram prediction for every patient (varying the constant from 2–5%) and repeated all analyses: none of our results were changed.
As a third sensitivity analysis, we repeated all analyses using only patients with transitional cell carcinoma, the predominant histology for bladder cancer in Europe and North America (see figure 3). Although there were some differences in results, our key finding was unaffected: one or other nomogram cut-off was superior to the standard eligibility criteria for chemotherapy for every combination of drug tolerability and effectiveness.
Our final sensitivity analysis accounted for the competing risk of death, that is, we defined the probability of recurrence by the cumulative incidence function instead of the Kaplan-Meier estimate21. The optimal treatment strategy under the competing risk analysis was the same as that under the primary analysis for all but one combination of NNTT and relative risk shown in Table 3.
Adjuvant chemotherapy after bladder cancer is subject to some debate; in particular, the data from randomized trials, though positive13, 14, are limited by inadequate numbers of patients. Nonetheless, adjuvant therapy is often given after radical cystectomy (it is recommended in standard guidelines15), and pathologic stage is the most common criterion to determine which patients receive it. Our analyses suggest that determining eligibility for adjuvant chemotherapy after radical cystectomy on the basis of a multivariable model will give superior clinical results compared to determining eligibility on the basis of pathologic characteristics alone. In a typical comparison, use of the nomogram would reduce the proportion of patients subjected to chemotherapy by 14 percentage points (from 55% to 41% of the cohort), with only a small increase in recurrences (0.4%). This improvement in outcome is obtained purely by changing a decision rule: there are no additional tests, procedures or treatments. The superior clinical performance of the nomogram appears to result from reclassification of an important proportion of patients: some patients with pT3 or pT4 disease would not be eligible for chemotherapy on the basis of the nomogram due to the absence of any other risk factor (such as a long time from diagnosis to treatment); conversely, some patients with <pT3 disease are eligible for chemotherapy by nomogram because they are defined as high risk for a reason other than pathologic stage.
Although previous studies have shown that multivariable models improve predictive accuracy compared to staging systems8–11, we believe that we are the first to show that use of such models would have beneficial effects with respect to a therapeutic decision. This finding has important consequences for cancer care. An enormous range of decisions about the care of the cancer patient are based on risk, with patients thought to be at higher risk subject to more intensive treatment or monitoring. At present, however, nearly all such decisions are based on simple risk stratifications, such as stage. We hypothesize that multivariable risk prediction models could be used to replace many of the decisions currently made on the basis of stage, including whether a patient receives surgery, chemotherapy and radiotherapy; how aggressively he or she is treated; the intensity of post-treatment follow-up; and eligibility for clinical trials. We have shown that changing the criteria we use to make decisions has important clinical consequences and so we further hypothesize that use of such models would improve cancer care either by decreasing the number of patients subject to unnecessary treatment or by decreasing event rates, as a result of better identification of patients requiring intervention.
Multivariable models have two additional advantages over crude risk stratifications based on criteria such as stage. First, multivariable models allow for individualization of care. Patients may differ with respect to the relative value they place on treatment toxicities and disease recurrence, and a model allows patients to vary with respect to the thresholds they use for treatment. Second, multivariable models allow for the addition of prognostic markers when and if they are shown to be of benefit. In the case of bladder cancer, for example, it has been suggested that markers such as cyclin E1, p53, p21, pRB, and p27 status might distinguish more from less aggressive tumors22, 23; similarly, genomic analyses have indicated that certain patterns of gene expression may be associated with cancer outcome24. Were these markers to be validated, it is unclear how they could be incorporated into a staging system without an unmanageable expansion of multiple categories (high stage / node negative / low p53 / low genomic risk; high stage / node negative / high p53 / low genomic risk; and so on). Conversely, such markers can be easily incorporated into multivariable models.
There are several possible limitations of our study. First, the same set of data was used to generate the predictive model as to assess it. That said, we do not believe that this results in statistical overfit, on the grounds that there are a very large number of events. Indeed, we conducted some preliminary analyses to estimate statistical “optimism”25 and found that this was close to zero (e.g. using a cutoff probability of 25% and a relative risk of 0.8, the optimism for the net benefit from the nomogram was 0.0001). It may also be that patients treated in the community differ systematically from the patients treated at the academic centers contributing data to the nomogram. But even if this was true, it is not clear that this would favor either the nomogram or the standard criteria for chemotherapy. For example, if stage was poorly assessed in a community setting, or if node removal was less extensive, this would reduce the predictive accuracy of the nomogram, but it would also affect the predictiveness of a decision rule based on stage and nodal status alone.
As discussed, a clear limitation of this study is that, despite being NCCN guideline standard of care, adjuvant therapy for bladder cancer is not unequivocally considered to be of benefit. However, it can be argued that the quality of evidence for adjuvant therapy is not of direct relevance to our findings. This is unless one takes the position that adjuvant therapy does not and cannot possibly work for bladder cancer, and should never be subject to clinical trial. In the absence of such a position, a decision will have to be made about which patients receive adjuvant therapy, whether as a clinical decision rule or the eligibility criteria for a trial. Our figure 3 shows that irrespective of the characteristics of the adjuvant agent – whether its toxicity is high or low; whether it is of great or only moderate effectiveness – the predicted clinical results will be superior if the nomogram rather than standard pathologic criteria are used to determine which patients receive treatment.
Although desirable in principle, subjecting a nomogram-based strategy for chemotherapy referral to prospective experimental trial is of doubtful feasibility. For example, imagine that we wanted to show that a nomogram-based strategy would lead to a similar recurrence rate as the standard pathologic criteria, but required fewer patients to be subjected to chemotherapy. A non-inferiority trial wishing to test that the nomogram did not increase recurrence rates by more than 1% might well need more than 25,000 patients.
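The scale of such a trial can be illustrated with a standard normal-approximation sample size calculation for a non-inferiority comparison of two proportions. The inputs are assumptions, since the text does not state those behind its figure: a true five-year recurrence rate of 28% in both arms (the cohort estimate), a one-sided alpha of 0.05, and 80% power. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def noninferiority_n_per_arm(event_rate: float, margin: float,
                             alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm to exclude an absolute increase of `margin` in event rate.

    Normal approximation, equal allocation, same true event rate in both arms,
    one-sided significance level `alpha`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    variance = 2 * event_rate * (1 - event_rate)
    return ceil((z_alpha + z_beta) ** 2 * variance / margin ** 2)


# Assumed inputs: 28% recurrence in both arms, 1% absolute non-inferiority margin.
n_per_arm = noninferiority_n_per_arm(event_rate=0.28, margin=0.01)
print(n_per_arm, 2 * n_per_arm)
```

Under these assumed inputs the requirement is on the order of 25,000 patients per arm; stricter choices of alpha or power raise it further.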
This paper is presented in the spirit of “proof-of-principle”. We believe that we are the first to show clearly that use of a prediction model to inform decisions about chemotherapy would improve clinical outcome. That said, there are several steps that would need to be taken before the prediction model could be used in the clinic. First, we believe that the prediction model itself should be updated. The original bladder nomogram was developed for general use, rather than for the specific purpose of chemotherapy referral. As a result, the data set used to develop the nomogram includes patients who would not be considered for adjuvant chemotherapy, such as those of advanced age, or those diagnosed with squamous cell carcinoma. Second, in this paper we chose illustrative cut-offs for the nomogram (10%, 25% and 70%). The alternative would be to choose optimal cut-offs for each of several scenarios of drug effectiveness and tolerability. Third, any statistical model would have to be implemented in a user-friendly format, perhaps in a web version similar to Adjuvantonline7.
We have shown that referring patients to chemotherapy on the basis of a multivariable model is likely to lead to better patient outcomes than use of pathological groups. Given the importance of this finding (that we can improve outcome merely by changing the basis on which we make decisions), we recommend research on other multivariable prediction models to determine their clinical effects.
The following are members of the International Bladder Cancer Nomogram Consortium: Director, Bernard H. Bochner, MD; Co-Director, Guido Dalbagni, MD. Statistical Group: Michael W. Kattan, PhD (Director), Paul Fearn (bioinformatics coordinator), Kinjal Vora, Hee Song Seo, Lauren Zoref; Mansura University: Hassan Abol-Enein, Mohamed A. Ghoneim; Memorial Sloan-Kettering Cancer Center: Bernard H. Bochner, Guido Dalbagni, Peter T. Scardino, Dean Bajorin; University of Southern California: Donald G. Skinner, John P. Stein, Gus Miranda; Ulm University: Jürgen E. Gschwend, MD, Bjoern G. Volkmer, MD, Richard E. Hautmann, MD; Vanderbilt University: Sam Chang, Michael Cookson, Joseph A. Smith; University of Bern: George Thalman, Urs E. Studer; University of Michigan: Cheryl T. Lee, James Montie, David Wood; Fundació Puigvert: Juan Palou, Humberto Villavicencio, Antonio Rosales; Laval University: Yves Fradet, Louis LaCombe, Pierre Simard; Johns Hopkins Medical Center: Mark P. Schoenberg; Baylor College of Medicine: Seth Lerner, Amnon Vazina; University of Padova Medical School: PierFrancesco Bassi; Keio University: Masaru Murai, Eiji Kikuchi.
This research was funded by a P50-CA92629 SPORE from the National Cancer Institute. The funding source had no role in study design, data collection, analysis, interpretation, writing or the decision to submit. The authors have no financial conflicts of interest.
Conflict of interest statement
Andrew Vickers had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. None of the authors have any relevant conflicts of interest.