As it turns out, our method does not require that we obtain information regarding treatment preferences in this way. We use the *theoretical* relationship between the threshold probability of disease and the relative value of false-positive and false-negative results to ascertain the value of a prediction model. Take a group of patients scheduled for treatment by a surgeon who would be unsure whether to preserve or remove the seminal vesicle tip if the probability of SVI were 10%. We can now calculate each patient’s probability of SVI using the multivariable model, and class the result positive if it is equal to or higher than 10% and negative otherwise. Applying these results to the dataset yields the data shown in Table 1.

**Table 1** Relationship between true seminal vesicle invasion (SVI) status and result of prediction model with a positivity criterion of 10% predicted probability of SVI.

To place a value on this result, we fix *a* − *c*, the value of a true-positive result, at 1. We then obtain the value of a false-positive result, *b* − *d*, as −*p*_{t}/(1 − *p*_{t}). We can now calculate net benefit using the following formula (first attributed to Peirce^{14}):

$$\text{net benefit} = \frac{\text{true-positive count}}{n} - \frac{\text{false-positive count}}{n}\left(\frac{p_t}{1 - p_t}\right)$$

In this formula, the true- and false-positive counts are the numbers of patients with true- and false-positive results, and *n* is the total number of patients. In short, we subtract the proportion of all patients who are false-positive from the proportion who are true-positive, weighting the false positives by the relative harm of a false-positive and a false-negative result. In Table 1, where *p*_{t} is 10%, the true-positive count is 65, the false-positive count is 225 and the total number of patients (*n*) is 902. The net benefit is therefore (65/902) − (225/902) × (0.1/0.9) = 0.0443. A good model will have a high net benefit: the theoretical range of net benefit is from negative infinity to the incidence of disease.
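The arithmetic above can be checked with a few lines of code. The following is a minimal sketch rather than a reference implementation; the function name `net_benefit` and its parameter names are chosen here for illustration and do not come from the original analysis.

```python
def net_benefit(true_positives, false_positives, n, threshold):
    """Net benefit of classifying patients as positive at a threshold probability.

    Hypothetical helper: the name and signature are illustrative assumptions.
    """
    if not 0 <= threshold < 1:
        raise ValueError("threshold must be in the interval [0, 1)")
    # The odds at the threshold probability weight the false positives.
    weight = threshold / (1 - threshold)
    return true_positives / n - (false_positives / n) * weight


# Counts quoted in the text for a threshold probability of 10%.
model_nb = net_benefit(true_positives=65, false_positives=225, n=902, threshold=0.10)
print(round(model_nb, 4))  # 0.0443
```

Note that the weight 0.1/0.9 means a false positive costs one ninth as much as a true positive is worth at this threshold.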

To determine whether this value is a good one, that is, whether the prediction model should be used for a *p*_{t} of 10%, we need a comparison. The clinical alternative to using a prediction model is to assume that all patients are positive and treat them – as might be done for individuals possibly exposed to a dangerous infection easily treated with antibiotics – or assume that all patients are negative and offer no treatment, as is done for diseases for which there are no proven screening methods. The true- and false-positive counts for considering all patients negative are both 0, and hence the net benefit for leaving the seminal vesicle tip in all patients is 0. It follows that if the net benefit for the prediction model is greater than 0, it is better to use the model than to assume that everyone is negative. The true- and false-positive counts for the strategy of treating all patients are simply the numbers of patients with and without SVI, respectively. Calculating net benefit gives (87/902) − (815/902) × (0.1/0.9) = −0.0039 for the strategy of removing seminal vesicles in all patients. This is less than the net benefit of 0.0443 from the prediction model.
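The three-way comparison can be sketched in code as follows, using only the counts quoted in the text. The helper `net_benefit` and the dictionary layout are illustrative assumptions, not part of the original analysis.

```python
def net_benefit(true_positives, false_positives, n, threshold):
    """Hypothetical helper: net benefit at a given threshold probability."""
    weight = threshold / (1 - threshold)
    return true_positives / n - (false_positives / n) * weight


n, with_svi, without_svi = 902, 87, 815  # totals quoted in the text
threshold = 0.10

strategies = {
    # Model: 65 true positives and 225 false positives at the 10% cut-point.
    "prediction model": net_benefit(65, 225, n, threshold),
    # Treat all: every patient with SVI is a true positive, every other a false positive.
    "treat all": net_benefit(with_svi, without_svi, n, threshold),
    # Treat none: no positives of either kind.
    "treat none": net_benefit(0, 0, n, threshold),
}

best = max(strategies, key=strategies.get)
for name, value in strategies.items():
    print(f"{name}: {value:.4f}")
print("highest net benefit:", best)
```

Under these counts the model has the highest net benefit of the three strategies at a threshold of 10%.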

At a *p*_{t} of 10%, our prediction model is therefore better than both treating no one and treating everyone. However, patients differ as to how they rate possible side-effects of surgery. For example, a surgeon might be tempted to treat more aggressively a man who was impotent but had many responsibilities. For such a man, the surgeon might use a much lower *p*_{t}, say, 2%; that is, the seminal vesicle tip would be removed even if there was only a 2% chance of SVI. At this *p*_{t}, the strategies of treating all men and treating using the model are almost identical (net benefit of 0.0780 and 0.0782 respectively). Similarly, there is a difference of opinion between surgeons regarding the increase in recurrence risk associated with preservation of the seminal vesicle tip in a patient with SVI: some surgeons feel that even if a patient has SVI, it is unlikely that the seminal vesicle tip will be involved and, even then, it is not clear that preservation will inevitably lead to recurrence; other surgeons feel that leaving any part of a cancerous seminal vesicle will substantively increase recurrence rates. We therefore recommend repeating the above steps for different values of *p*_{t}. Hence:

1. Choose a value for *p*_{t}.
2. Calculate the number of true- and false-positive results using *p*_{t} as the cut-point for determining a positive or negative result.
3. Calculate the net benefit of the prediction model.
4. Vary *p*_{t} over an appropriate range and repeat steps 2 – 3.
5. Plot net benefit on the y axis against *p*_{t} on the x axis.
6. Repeat steps 1 – 5 for each model under consideration.
7. Repeat steps 1 – 5 for the strategy of assuming all patients are positive.
8. Draw a straight line parallel to the x-axis at y = 0 representing the net benefit associated with the strategy of assuming that all patients are negative.
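The steps above can be sketched in code. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the predicted probabilities and outcomes are toy values invented for the example, not the SVI dataset.

```python
def net_benefit(true_positives, false_positives, n, threshold):
    """Hypothetical helper: net benefit at a given threshold probability."""
    weight = threshold / (1 - threshold)
    return true_positives / n - (false_positives / n) * weight


def decision_curve(probabilities, outcomes, thresholds):
    """Return rows of (threshold, model, treat all, treat none) net benefit."""
    n = len(outcomes)
    cases = sum(outcomes)
    rows = []
    for pt in thresholds:  # step 4: vary the threshold
        # Step 2: a result is positive if predicted probability >= threshold.
        tp = sum(1 for p, y in zip(probabilities, outcomes) if p >= pt and y == 1)
        fp = sum(1 for p, y in zip(probabilities, outcomes) if p >= pt and y == 0)
        model = net_benefit(tp, fp, n, pt)                # step 3
        treat_all = net_benefit(cases, n - cases, n, pt)  # step 7
        rows.append((pt, model, treat_all, 0.0))          # step 8: treat none is 0
    return rows


# Toy data (assumed for illustration only): predicted risk and true status.
probabilities = [0.02, 0.05, 0.08, 0.12, 0.20, 0.35, 0.50, 0.70]
outcomes = [0, 0, 0, 1, 0, 0, 1, 1]
thresholds = [i / 100 for i in range(1, 51)]  # 1% to 50%

curve = decision_curve(probabilities, outcomes, thresholds)
for pt, model, treat_all, treat_none in curve[::10]:
    print(f"pt={pt:.2f} model={model:.4f} all={treat_all:.4f} none={treat_none:.4f}")
```

Step 5 is left out deliberately to keep the sketch free of dependencies; the rows could be passed to any plotting library with the threshold on the x axis and the three net benefit columns as separate lines. Step 6 amounts to calling `decision_curve` once per model.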

Applying these steps to our data gives the accompanying figure. We term this a “decision curve”. Note that, as expected, the two lines reflecting the strategies of “assume all patients have SVI” (i.e., treat all) and “assume no patients have SVI” (i.e., treat none) cross where *p*_{t} equals the prevalence. Also note that the prediction model is comparable to the strategy of treat all at low *p*_{t} and comparable to treat none at high *p*_{t}. This is because the probability of SVI predicted by the model ranges from a minimum of 1.8% to a maximum of 84.3%. Using the model for *p*_{t} < 1.8% or *p*_{t} > 84.3% therefore gives the same result as treat all or treat none, respectively. Between 50% and 84.3%, the value of the model is sometimes negative: this is due to random noise.
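The crossing point follows directly from the net benefit formula. As a short derivation, write π for the prevalence of SVI (a symbol introduced here for illustration; it is not used in the original text). Under treat all, the true-positive proportion is π and the false-positive proportion is 1 − π, so setting the net benefit equal to that of treat none gives:

$$
\begin{aligned}
\pi - (1 - \pi)\,\frac{p_t}{1 - p_t} &= 0 \\
\pi\,(1 - p_t) &= (1 - \pi)\,p_t \\
p_t &= \pi
\end{aligned}
$$

With 87 cases among 902 patients, π = 87/902 ≈ 0.096, so the treat-all line should meet the treat-none line at a *p*_{t} of roughly 9.6%.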

In between these two extremes there is a range of *p*_{t} where the prediction model is of value. In the case of SVI prediction this is between ~2% and ~50%. To determine whether the model is of clinical value, we need to consider the likely range of *p*_{t} in the population, that is, the typical threshold probabilities of SVI at which surgeons would opt for complete dissection of seminal vesicles. If it were the case that all surgeons would remove the seminal vesicle tip only if there were at least a 60 – 70% risk of SVI, the model would clearly have no clinical role. But it is unlikely that any surgeon would consider removal of a healthy seminal vesicle tip to be worse than failing to remove a potentially cancerous one. If, on the other hand, we assume that the likely range of *p*_{t} in the population is between 20% and 30%, we would use the model, because it is of clear benefit at these values of *p*_{t}.

In consultation with clinicians, we estimate that although few if any surgeons would ever have a *p*_{t} much above 10% for any patient, some may have a *p*_{t} approaching 1% or less in certain cases. This means that our prediction model will be of benefit in some, but not all, cases. Where *p*_{t} is less than 2%, the model is no better than a strategy of treating all patients; it is therefore of no value at these thresholds, and patients should have total seminal vesicle dissection. On the other hand, the model is never worse than the strategy of treating all patients, and because it is based on routinely collected data, it has no obvious downside. Therefore the model will be of use for clinicians who, at least some of the time, would opt for seminal vesicle tip preservation if a patient’s predicted probability of SVI was low.

If the prediction model required data from medical tests that were invasive, dangerous or involved expenditure of time, effort and money, we could use a slightly different formulation of net benefit:

$$\text{net benefit} = \frac{\text{true-positive count}}{n} - \frac{\text{false-positive count}}{n}\left(\frac{p_t}{1 - p_t}\right) - \text{harm of test}$$

The harm from the test is a “holistic” estimate of the negative consequence of having to take the test (cost, inconvenience, medical harms and so on) in the units of a true-positive result. For example, if a clinician or a patient thought that missing a case of disease was 50 times worse than having to undergo testing, the test harm would be rated as 0.02. Test harm can also be thought of in terms of the number of patients a clinician would subject to the test to find one case of disease if the test were perfectly accurate.
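As a rough illustration of how test harm would enter the calculation, the sketch below assumes that the harm is subtracted directly from net benefit, which is what expressing it "in the units of a true-positive result" suggests. The function name is hypothetical, and applying a harm of 0.02 to the SVI counts is purely illustrative, since the SVI model itself relies on routinely collected data.

```python
def net_benefit_with_harm(true_positives, false_positives, n, threshold, test_harm=0.0):
    """Hypothetical helper: net benefit minus a holistic test harm.

    Assumption: harm is expressed in units of a true-positive result and
    is subtracted once from the net benefit of the tested strategy.
    """
    weight = threshold / (1 - threshold)
    return true_positives / n - (false_positives / n) * weight - test_harm


# Missing a case judged 50 times worse than undergoing the test.
test_harm = 1 / 50  # 0.02

# Model-based strategy pays the test harm; treat all needs no test.
adjusted = net_benefit_with_harm(65, 225, 902, 0.10, test_harm)
treat_all = net_benefit_with_harm(87, 815, 902, 0.10)

print(round(adjusted, 4))   # 0.0243
print(round(treat_all, 4))  # -0.0039
```

In this hypothetical case the tested strategy would still have a higher net benefit than treating all or treating none at a threshold of 10%, although the margin is smaller than without the harm term.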

If the test were harmful in any way, it is possible that the net benefit of testing would be very close to or less than the net benefit of the “treat all” strategy for some *p*_{t}. In such cases we would recommend that the clinician have a careful discussion with the patient, and perhaps, if appropriate, implement a formal decision analysis. In this sense, interpretation of a decision curve is comparable to interpretation of a clinical trial: if an intervention is of clear benefit, it should be used; if it is clearly ineffective, it should not be used; if its benefit is likely sufficient for some, but not all, patients, a careful discussion with patients is indicated.