We define D as the predicted difference in outcome between two treatment alternatives: Di is the individualized difference that varies between patients on the basis of a prediction model; D without a subscript is the group level estimate derived, for example, from the difference between event rates in a randomized trial. We define T in terms of the threshold for agreeing to treatment: T without subscript is a single threshold applied to all patients; Ti is an individualized threshold that differs from patient to patient based on personal values and circumstances. T is defined so that we opt for treatment if T < D and avoid treatment if T > D. If T = D, then we are in a state of equipoise and are unsure as to whether treatment is worthwhile.
Most of our discussion will refer to the binary case, where patients are at risk for an event ("event"), such as death or a recurrence of cancer. In this case both T and D are expressed in proportions: for example, if the death rate in a trial was 10% in controls and 7% in the treated group, D would be 3%. Nonetheless, the same notation can be used for a continuous endpoint, such as a pain score, or length of survival in days.
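As a minimal sketch of the notation in the binary case, the group-level difference D and the comparison with a threshold T can be written as below. The function names `group_level_difference` and `decision` are illustrative and do not come from the paper; the numbers are the 10% versus 7% example given above.

```python
def group_level_difference(control_event_rate: float, treated_event_rate: float) -> float:
    """Group-level D: absolute difference in event rates between the two arms."""
    return control_event_rate - treated_event_rate


def decision(d: float, t: float) -> str:
    """Compare a predicted difference (D or Di) with a threshold (T or Ti)."""
    if t < d:
        return "treat"
    if t > d:
        return "do not treat"
    # Exact equality of floating point values is fragile in practice;
    # it is kept here only to mirror the definition of equipoise.
    return "equipoise"


d = group_level_difference(0.10, 0.07)  # death rates in controls and treated
print(decision(d, 0.02))  # a patient with a threshold of 2%
print(decision(d, 0.05))  # a patient with a threshold of 5%
```

The same comparison applies whether the inputs are group-level (D, T) or individualized (Di, Ti); only the source of the numbers changes.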
Our notation has an immediate application for categorizing modes of decision making. In paternalistic medicine, or in guidelines based on expert opinion, D is compared with T. Evidence-based counselling of individual patients compares D with Ti. This paper is concerned with comparison of Di with Ti, patient counselling based on individualized risk prediction accounting for patient-specific preferences as described in the chemotherapy example, and comparisons of Di with T, where individualized predictions of treatment benefit are compared to a fixed threshold, such as in the cardiovascular medicine example.
Several different methods of developing prediction models to inform treatment decisions have been described. These include using the randomized trial data to develop the model[1] or creating a model for untreated patients on the basis of cohort studies, then using a fixed relative risk derived from a randomized trial to estimate risk in treated patients[2]. Models also differ in terms of the specific statistical techniques applied, for example, use of non-linear terms, bootstrap correction or model selection criteria. For the purposes of this paper, we will remain agnostic about the methods of model building, except that investigators need to be able to demonstrate the validity of their model independent of the methods we propose. This would include documentation of the data and methods used to build the model, measures of model accuracy, such as the concordance index[12], and correction for overfit.
Method for evaluation of prediction models: background theory
We propose an extension of decision-curve analysis, a previously published method for evaluating prediction models[13]. In its original formulation, decision-curve analysis was used for models that predicted the probability of an event, such as the presence of cancer outside the prostate capsule in a patient scheduled for prostatectomy. Here we use the method to predict the difference between the probabilities of an event under two conditions: treatment and control.
We start by noting that the value of a prediction model is usually evaluated by applying it to a data set and determining whether the predictions match the outcomes actually observed. For our current purposes, we are not interested in the predictions themselves, but in the decisions that result from these predictions. We therefore require a data set in which there is a decision between two treatments and a subsequent outcome. Randomized trials provide such a data set, except that the "decision" of which treatment to use is made by chance. Nonetheless, a randomly chosen treatment will be congruent with how a decision would have been made on the basis of the prediction model for a proportion of patients in the trial. Our proposal is therefore that the prediction model be applied to a data set from a randomized trial or meta-analysis and the results documented for patients whose randomized allocation is congruent with the recommendations of the prediction model. These results could then be compared with use of a group level estimate.
To make this comparison, we note that there are three strategies for applying the results of a randomized trial to clinical practice: treat all patients (the typical approach where treatment effects are sufficiently large), treat no patients (where the difference between groups is not thought to be clinically significant), and treat patients according to a prediction model. Applying each of these strategies to a group of patients will lead to a certain number being treated and a certain number experiencing the study event. For example, take a hypothetical trial of adjuvant chemotherapy with 2000 patients where the death rates were 35% and 40% for patients on treatment and control, respectively. Applying the strategy of "treat all" to 1000 patients would result in 1000 treatments and 350 events; treating no-one would result in no treatments and 400 events. Applying a prediction model, where only patients with a predicted benefit of T or more are treated, typically leads to a number of treatments and a number of events that is intermediate. For this hypothetical example, let us assume that the prediction model leads to 650 treatments and 355 events: the prediction model leads to some patients not being treated, as not all have an estimated Di of T or more, but some patients who would benefit from chemotherapy do not undergo chemotherapy and therefore experience an event. Hence, compared with treating all patients, the use of a prediction strategy will tend to reduce the number of treatments but increase the number of events.
We propose comparing the "net benefit" of the two strategies involving treatment: treat all patients vs. treat according to the prediction model. Net benefit is a concept often used in economic analysis and is simply benefits minus harms. In the case of a treatment, "benefits" are associated with reduction in the event rate compared to no additional treatment: in an adjuvant therapy trial, for instance, benefit would be a reduction in cancer recurrences or deaths compared to surgery alone. "Harms" are associated with the treatment itself: side-effects, risks, costs, inconvenience and so on.
To calculate net benefit we require a single scale for treatments and events. We have previously demonstrated[13] that the question "How many treatments are equivalent to one event?" is answered by the threshold at which a patient would opt for treatment, that is, T. We know from clinical practice that patients will demand that a treatment with important side-effects must lead to a relatively larger reduction in the risk of event than a treatment with trivial toxicities. In the appendix (see additional material file 1), we demonstrate that T is equivalent to the ratio between harms of treatment and those of an event. Thus, if a patient states that they would be unsure what to do if the benefit of treatment were an absolute 5% risk reduction, they are telling us that they consider an event to be about 20 times worse than the risks, side-effects and inconvenience of treatment.
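The arithmetic behind the "20 times worse" figure can be sketched as follows. The symbols H_event and H_treatment are introduced here purely for illustration and are not the paper's notation; this is a sketch of the equipoise argument, not a reproduction of the appendix derivation.

```latex
% At the threshold (equipoise), the expected harm avoided by treatment
% equals the harm of the treatment itself:
T \times H_{\text{event}} = H_{\text{treatment}}
\quad\Longrightarrow\quad
T = \frac{H_{\text{treatment}}}{H_{\text{event}}}

% For a stated threshold of an absolute 5% risk reduction:
\frac{H_{\text{event}}}{H_{\text{treatment}}} = \frac{1}{T} = \frac{1}{0.05} = 20
```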
Method for evaluation of prediction models: calculation of net benefit
We calculate net benefit as the decrease in the proportion of events associated with treatment minus the proportion of patients treated multiplied by T. That is, we combine treatments given and events by weighting the proportion of treatments by the ratio of harm from treatment and harm from event. The unit is therefore in terms of events, or, alternatively, the disutility of event is defined as 1.
Net benefit = decrease in event rate – treatment rate × T
More generally, we define net benefit as:
where n is the number of patients, 1 and 0 are indicators for treatment and no treatment respectively, x is the indicator for event, and i is an indicator for each patient. To illustrate calculation of net benefit, and as a simple check, we will use the hypothetical data above and set T at 5% (see table). As T is equal to the observed difference between groups, net benefit should be, and is, zero for the strategy of "treat all". The "treat by prediction" strategy has a net benefit of 0.0125, suggesting that, for a T of 5%, prediction is the best strategy.
Calculation of net benefit in a hypothetical data set for a T of 5%.
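The figures quoted for the hypothetical trial can be checked with a short calculation based on the verbal definition (net benefit = decrease in event rate minus treatment rate multiplied by T). The function name `net_benefit` and its argument names are illustrative; the counts are those of the hypothetical example (per 1000 patients: 400 events with no treatment, 350 events with "treat all", and 650 treatments with 355 events under the prediction model).

```python
def net_benefit(control_event_rate: float,
                strategy_event_rate: float,
                treatment_rate: float,
                t: float) -> float:
    """Decrease in event rate relative to no treatment, minus treatments weighted by T."""
    decrease_in_event_rate = control_event_rate - strategy_event_rate
    return decrease_in_event_rate - treatment_rate * t


T = 0.05   # treatment threshold of 5%
n = 1000   # patients to whom each strategy is applied

# "Treat all": 1000 treatments and 350 events, against 400 events untreated.
nb_treat_all = net_benefit(400 / n, 350 / n, 1000 / n, T)

# "Treat by prediction": 650 treatments and 355 events.
nb_by_model = net_benefit(400 / n, 355 / n, 650 / n, T)

print(f"treat all:           {nb_treat_all:.4f}")
print(f"treat by prediction: {nb_by_model:.4f}")
```

Up to floating point rounding, the two values agree with those given in the text: zero for "treat all" and 0.0125 for the prediction strategy.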
The net benefit function can also be applied to continuous endpoints. In this case x is the endpoint, such as depression score or number of days with pain, for each patient. When a high value of the endpoint is desirable, such as duration of survival, x should be replaced in the function by -x. For continuous endpoints, T is defined as the minimum improvement, such as a percent reduction in pain, that a patient would require before opting for treatment.
Net benefit is calculated separately for the strategy of treating all patients – in which case the event and treatment rates are simply the observed group level proportions – and for the strategy of treating patients according to the prediction model, where event and treatment rates are calculated by using the outcomes from patients whose randomized allocation is congruent with the recommendation of the prediction model. Our methodology for determining the effectiveness of a model is therefore as follows:
1. Obtain data from one or more randomized trials. The data should consist of the variables required by the prediction model, treatment assignment and an indicator as to whether the patient experienced an event.
2. Determine the number of patients and the number of events in the control and treatment groups.
3. Apply the prediction model to the data set and estimate the individualized prediction of treatment benefit, Di, for each patient.
4. Choose a value for the treatment threshold, T, based on consideration of the harms associated with treatment and those associated with an event.
5. Compare the estimate for Di with T for each patient: if Di > T, define patient as "Treatment recommended"; if Di < T define patient as "Treatment not recommended".
6. Identify all patients where the treatment recommendation is the same as the randomized assignment. For example, if T is 3%, and a patient in the treatment arm has an estimated Di of 8%, the patient's actual and recommended treatment are congruent. The patient would therefore be retained for analysis. A patient in the treatment arm with an estimated Di of 2%, or a patient in the control arm with an estimated Di of 12%, would not be included in the analysis (see table for illustrative data).
Hypothetical data for some example patients using a treatment threshold (T) of 5%.
7. Determine the total number of patients with congruent treatment recommendations, the number of these who have an event and the number who are treated.
8. Apply the net benefit function to the data from steps 2 and 7 to determine the relative value of treating everyone and treating according to the prediction model.
9. If appropriate, repeat for a range of T's.
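The steps above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Patient` class, the function names and the six example patients are invented for illustration; a tie (Di equal to T) is arbitrarily treated as "treatment not recommended"; and the decrease in event rate for the prediction strategy is taken relative to the event rate in the full control arm, which is one reading of how the data from steps 2 and 7 are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Patient:
    treated: bool  # randomized allocation: True = treatment arm
    event: bool    # True if the patient experienced the study event
    d_i: float     # model-predicted benefit Di (absolute risk reduction)


def event_rate(patients: List[Patient]) -> float:
    if not patients:
        raise ValueError("no patients to analyse")
    return sum(p.event for p in patients) / len(patients)


def net_benefit_treat_all(patients: List[Patient], t: float) -> float:
    # Step 2: observed event rates in the control and treatment arms.
    control = [p for p in patients if not p.treated]
    treated = [p for p in patients if p.treated]
    # Under "treat all" the treatment rate is 1.
    return (event_rate(control) - event_rate(treated)) - 1.0 * t


def net_benefit_by_model(patients: List[Patient], t: float) -> float:
    control = [p for p in patients if not p.treated]
    # Steps 5-6: keep patients whose randomized allocation matches the
    # recommendation. Assumption: Di == T counts as "not recommended".
    congruent = [p for p in patients if (p.d_i > t) == p.treated]
    # Step 7: event rate and treatment rate among congruent patients.
    congruent_event_rate = event_rate(congruent)
    treatment_rate = sum(p.treated for p in congruent) / len(congruent)
    # Step 8: decrease in event rate minus treatment rate weighted by T.
    return (event_rate(control) - congruent_event_rate) - treatment_rate * t


# Made-up patients for illustration only (not trial data).
patients = [
    Patient(treated=True, event=False, d_i=0.08),
    Patient(treated=True, event=True, d_i=0.02),
    Patient(treated=False, event=True, d_i=0.12),
    Patient(treated=False, event=False, d_i=0.01),
    Patient(treated=False, event=True, d_i=0.02),
    Patient(treated=True, event=False, d_i=0.10),
]

# Step 9: repeat for a range of thresholds.
for t in (0.03, 0.05):
    print(t, net_benefit_treat_all(patients, t), net_benefit_by_model(patients, t))
```

With so few patients the numbers themselves are meaningless; the point is only to show which patients are retained and how the two rates enter the calculation.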
We first applied our methods to a variety of simulated data sets (further details are available on request from the authors). In brief, our findings were that the value of prediction modelling increased with lower event rates, less effective treatment, and higher predictive accuracy. Conversely, prediction modelling was of less benefit if event rates were high, treatment was highly effective, or predictive accuracy was poor. We also found that, even where application of a prediction model was of benefit, there were patients for whom prediction modelling should not be applied. Typically these were patients with either very high or very low Ti: prediction modelling was often not suitable for these patients due to poor model calibration for patients at either very high or low risk, and high misclassification costs for patients with very low Ti.
Application to real data
We then applied our method to three real data sets. The first data set is from the ACCENT (Adjuvant Colon Cancer Endpoints) group, an international collaboration that collates and analyzes individual patient data from randomized trials of adjuvant chemotherapy for colorectal cancer. In 2004, ACCENT published a prediction model in the Journal of Clinical Oncology. This model estimates the probability that a patient will be disease-free and alive at five years with and without adjuvant therapy, depending on variables such as age, stage and nodal status. The prediction tool is available online[7]. The ACCENT data consist of 3302 patients enrolled in seven randomized trials and must therefore be seen as a "gold standard" for prediction models used to implement the results of randomized trials.
Unfortunately, relatively few areas in medical research benefit from large pooled-analyses of individual patient data. To understand some of the characteristics of modelling single randomized trials, in comparison to a gold standard, we chose one of the studies ("the Moertel trial") from the ACCENT pooled-analysis[14]. A full data set for this trial is available on the Internet[15]. We modelled this data set independently of the ACCENT model and data so as to simulate the real life situation of an analyst faced with new data. The Moertel trial[14] involved a comparison between levamisole, chemotherapy plus levamisole or surgery alone in 929 patients with stage III colorectal cancer. We focused on the comparison between chemotherapy plus levamisole versus surgery alone. Details of the modelling approach used are available on request from the authors.
In the case of adjuvant therapy for colon cancer, the general recommendation is in favor of treatment, that is, the difference between the proportion of deaths in treatment and control groups is generally considered clinically relevant. In such a case, the role of prediction modelling is to identify what is likely to be a minority of patients who are at decreased risk of disease recurrence and whose expected benefit is therefore lower than the group level estimate. Alternatively, consider the case where a treatment is effective, but the difference between groups is more modest. Here the role of prediction modelling is to identify a subgroup of patients at greater than average risk whose expected benefit from treatment would more than likely outweigh the costs and harms. An example of this situation is the use of 5-alpha-reductase inhibitors to prevent complications of benign prostatic hyperplasia (BPH). Though undoubtedly effective[16], the degree of benefit appears moderate, primarily because only a small proportion of men with untreated BPH experience clinically important events such as acute urinary retention or the need for surgery[17]. The drugs can be somewhat expensive, and there is a risk of sexual side-effects. To examine the role of prediction modelling when the benefit from treatment is modest, we obtained individual patient data from three randomized trials of Dutasteride (total n = 4294) for the prevention of complications from BPH[18]. Modelling of the Dutasteride data has been previously described[19