Risk prediction models based on medical history or results of tests are increasingly common in the cancer literature. An important use of these models is to make treatment decisions on the basis of estimated risk. The relative utility curve is a simple method for evaluating risk prediction in a medical decision-making framework. Relative utility curves have three attractive features for the evaluation of risk prediction models. First, they put risk prediction into perspective because relative utility is the fraction of the expected utility of perfect prediction obtained by the risk prediction model at the optimal cut point. Second, they do not require precise specification of harms and benefits because relative utility is plotted against a summary measure of harms and benefits (ie, the risk threshold). Third, they are easy to compute from standard tables of data found in many articles on risk prediction. An important use of relative utility curves is to evaluate the addition of a risk factor to the risk prediction model. To illustrate an application of relative utility curves, an analysis was performed on previously published data involving the addition of breast density to a risk prediction model for invasive breast cancer.
A risk prediction model is a mathematical model to predict the risk of developing disease or another health outcome on the basis of medical history and the results of tests. A primary use of risk prediction is to help make treatment decisions depending on a person’s risk for developing disease in the absence of treatment. (Throughout this commentary, the “absence of treatment” refers to the absence of the particular treatment contemplated for high-risk persons.) The risk prediction model is constructed from data on persons who did not receive the treatment and hence estimates the risk of disease in the absence of treatment. An important issue in evaluating risk prediction models is to determine whether or not it is worthwhile to include an additional risk factor in the risk prediction model, such as the inclusion of breast density in a model to predict the risk of invasive breast cancer (1). This issue is addressed by comparing risk prediction models with and without the additional risk factor.
Purely statistical measures for comparing two risk prediction models have limited use for medical decision making because they do not incorporate harms and benefits related to treatment decisions arising from the risk prediction model. Such statistical measures include the odds ratio for the additional risk factor, the difference in areas under the receiver operating characteristic (ROC) curves, the difference in fraction of persons who are in the lowest and highest risk categories (2), and the fractions of persons who are correctly and incorrectly reclassified in terms of the risk prediction model with the additional risk factor (1–3). In contrast, harms and benefits are explicitly incorporated into utility formulations for evaluating risk prediction models (4–8). However, these utility formulations are sometimes difficult to implement because of the need to specify multiple harms and benefits and the complexity of investigating the impact of varying the levels of harms and benefits.
The difficulties with the utility formulation can be reduced by using the recently developed methods of relative utility curves (9) and related decision curves (10,11). Relative utility is the fraction of the utility of perfect prediction that is achieved at the optimal cut point for a risk prediction model. The relative utility curve is a plot of relative utility vs risk threshold (ie, a summary measure of harms and benefits). Relative utility and decision curves make it easy to determine the impact of various risk thresholds. The focus in this commentary is on relative utility curves because, unlike decision curves, they provide perspective relative to perfect prediction. Evaluation of an additional risk factor in a risk prediction model is based on a comparison of relative utility curves for risk prediction models with and without the additional risk factor. These topics will be explained more fully below.
The computation and application of relative utility curves is illustrated with the evaluation of risk prediction for invasive breast cancer when there is a possible recommendation for chemoprevention, such as treatment with tamoxifen, for women at high risk for invasive breast cancer. The additional risk factor to be evaluated is breast density. Investigators have fit two risk prediction models, with and without breast density, to prospectively collected data from women who were older than 35 years, who had received at least one mammogram with a measurement of breast density, and who did not have cancer in the first 6 months of follow-up (1). Follow-up lasted approximately 5 years. Presumably, few persons in this study received chemoprevention for breast cancer, so that the risk of invasive breast cancer in the risk prediction model is in the absence of chemoprevention for breast cancer. For the risk prediction model labeled in this presentation as “Baseline Factors,” the investigators estimated the risk of invasive breast cancer as a function of the baseline variables of age, race, ethnicity, family history, and biopsy history. For the risk prediction model labeled here as “Baseline Factors and Breast Density,” the investigators estimated the risk of invasive breast cancer as a function of breast density as well as the other baseline variables. Details of the model fits, which can be found in the original publication (1), are not relevant to the evaluation of the risk prediction model. Key formulas for computing relative utility are presented in the text and in Table 1. Interested readers can find derivations of the formulas in the Supplementary Material (available online), much of which is based on Baker et al. (9).
The starting point for these computations is a table of data on the number of persons who develop disease in each risk group and the total number in each risk group; such tables are found in many publications on risk prediction models. Consider the example involving invasive breast cancer (Table 1). For each person in the aforementioned breast cancer study (1), a 5-year predicted risk of invasive breast cancer in the absence of chemoprevention was separately computed under each risk prediction model, “Baseline Factors” and “Baseline Factors and Breast Density.” For each risk prediction model, each person in the study was assigned to one of four risk groups on the basis of her predicted risk of invasive breast cancer: risk group 1 = predicted risks of less than 1%; risk group 2 = predicted risks of 1%–1.66%; risk group 3 = predicted risks of 1.67%–2.49%; and risk group 4 = predicted risks greater than or equal to 2.5%. For each risk prediction model, the data for computing relative utility curves are the numbers of persons in each risk group and the numbers of persons who developed invasive breast cancer in each risk group.
From the data table, it is straightforward to compute the estimated risks for each risk group and the overall probability of developing disease in the study. Let j = 1, 2, 3, or 4 (ie, according to the index of the four risk groups). Let xj denote the number of persons in risk group j who developed disease in the absence of treatment and let nj denote the number of persons in risk group j. Then the estimated risk of developing disease in the absence of treatment among persons in risk group j of the study is
$$r_j = \frac{x_j}{n_j},$$
the fraction of persons in the study whose predicted risk under the model falls into risk group j is
$$w_j = \frac{n_j}{\sum_s n_s},$$
and the estimated overall probability of developing disease in the absence of treatment in the study is
$$p = \frac{\sum_j x_j}{\sum_j n_j} = \sum_j r_j w_j.$$
With the four risk groups, p = r1w1 + r2w2 + r3w3 + r4w4, which is a weighted average of the estimated risks in each risk group (see Table 1 for an example of these calculations).
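The three quantities rj, wj, and p follow directly from the counts in a risk-group table. The sketch below computes them in plain Python; the function name `group_summaries` and the counts are hypothetical and are not the data of Table 1.

```python
def group_summaries(cases, totals):
    """Return (r, w, p) from per-group case counts x_j and group sizes n_j."""
    if len(cases) != len(totals):
        raise ValueError("cases and totals must have the same length")
    n_total = sum(totals)
    r = [x / n for x, n in zip(cases, totals)]  # r_j = x_j / n_j
    w = [n / n_total for n in totals]           # w_j = n_j / sum_s n_s
    p = sum(cases) / n_total                    # p = sum_j x_j / sum_j n_j
    return r, w, p

# Hypothetical counts for four risk groups (illustration only)
cases = [20, 25, 25, 20]
totals = [4000, 3000, 2000, 1000]
r, w, p = group_summaries(cases, totals)

# p is also the weighted average of the group risks
weighted = sum(rj * wj for rj, wj in zip(r, w))
```

Computing p from the pooled counts and from the weighted average of the rj gives the same value, which is a convenient check on a transcribed table.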
An ROC curve can be computed by using the estimated risks in the study. The ROC curve plots the true-positive rate, which is the estimated probability of a positive classification among persons who developed disease, vs the false-positive rate, which is the estimated probability of positive classification among persons who did not develop disease. When calculating false-positive rate and true-positive rate, a cut point j is specified to indicate that persons in risk groups j or higher are classified as positive, and summations are over risk groups indexed by s that are greater than or equal to cut point j. In terms of estimated risks, the false-positive rate (FPR) for cut point j is computed as
$$\mathrm{FPR}_j = \frac{\sum_{s \ge j} (1 - r_s)\, w_s}{1 - p},$$
and the true-positive rate (TPR) for cut point j is computed as
$$\mathrm{TPR}_j = \frac{\sum_{s \ge j} r_s\, w_s}{p}.$$
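For example, if there are four risk groups, the false-positive rate for cut point 2 is FPR2 = [(1 − r2)w2 + (1 − r3)w3 + (1 − r4)w4]/(1 − p). Another key quantity is the slope of the ROC curve to the left of the point (FPRj, TPRj), that is, the slope of the segment joining (FPRj+1, TPRj+1) and (FPRj, TPRj), where the point beyond the highest cut point is (0, 0). Because these two points differ only by the contribution of risk group j, the slope can be written as
$$\mathrm{ROCSlope}_j = \frac{r_j w_j / p}{(1 - r_j)\, w_j / (1 - p)} = \frac{1 - p}{p} \times \frac{r_j}{1 - r_j}.$$
Examples of these computations are shown in Table 1.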
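The ROC points and slopes can be computed from rj, wj, and p with a single loop over cut points. The sketch below assumes the risk groups are ordered from lowest to highest risk; the function name `roc_from_groups` and the group risks are hypothetical.

```python
def roc_from_groups(r, w, p):
    """FPR_j, TPR_j, and ROC slope to the left of (FPR_j, TPR_j) for each cut point."""
    k = len(r)
    fpr, tpr, slope = [], [], []
    for j in range(k):  # Python index j corresponds to cut point j + 1
        # persons in risk groups s >= j are classified as positive
        fpr.append(sum((1 - r[s]) * w[s] for s in range(j, k)) / (1 - p))
        tpr.append(sum(r[s] * w[s] for s in range(j, k)) / p)
        # rise r_j w_j / p over run (1 - r_j) w_j / (1 - p); w_j cancels
        slope.append((1 - p) / p * r[j] / (1 - r[j]))
    return fpr, tpr, slope

# Hypothetical group risks and group fractions (illustration only)
r = [0.005, 25 / 3000, 0.0125, 0.02]
w = [0.4, 0.3, 0.2, 0.1]
p = sum(rj * wj for rj, wj in zip(r, w))
fpr, tpr, slope = roc_from_groups(r, w, p)
```

The lowest cut point classifies everyone as positive, so its point is (1, 1); the slopes increase with the cut point because the group risks are ordered.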
In the application to risk prediction for invasive breast cancer, the ROC curve for the risk prediction model “Baseline Factors and Breast Density” was only slightly higher than the ROC curve for “Baseline Factors” (Figure 1), indicating that inclusion of breast density in the risk prediction model provided little improvement in classification accuracy. However, this comparison of ROC curves is limited because it does not evaluate the impact on medical decision making of including breast density in the risk prediction model. More details of the limitations of ROC curves for evaluating risk prediction will be discussed below.
Sometimes investigators want to evaluate risk prediction in a target population with an overall probability of developing disease that differs from the overall probability of developing disease in the study sample. In this situation, the estimated risk of disease in risk group j of the target population will differ from the estimated risk of disease in risk group j of the study and so needs to be computed. Let
p* = the overall probability of developing disease in the absence of treatment in the target population, a quantity that is considered to be known, and
rj* = estimated risk of developing disease in the absence of treatment among persons in risk group j of the target population.
If the ROC curve in the target population is the same as in the study, then the quantity rj* can be computed from p*, p, and rj by using the following formula:
$$r_j^* = \frac{r_j\, p^* (1 - p)}{r_j\, p^* (1 - p) + (1 - r_j)\, p\, (1 - p^*)}.$$
The slope of the ROC curve in the target population is
$$\mathrm{ROCSlope}_j^* = \frac{1 - p^*}{p^*} \times \frac{r_j^*}{1 - r_j^*},$$
which equals the slope of the ROC curve in the study.
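The conversion to a target population keeps the ROC curve fixed, so the odds of disease in each group are rescaled by the ratio of the overall odds. A minimal sketch, with hypothetical function names and illustrative values (p = 0.009 is hypothetical; p* = 0.010 is the value used in the breast cancer example):

```python
def target_group_risk(r_j, p, p_star):
    """Risk r_j* in group j of a target population with overall risk p_star,
    assuming the target population has the same ROC curve as the study."""
    num = r_j * p_star * (1 - p)
    return num / (num + (1 - r_j) * p * (1 - p_star))

def roc_slope(risk, overall):
    """ROC slope [(1 - overall)/overall] * [risk/(1 - risk)]."""
    return (1 - overall) / overall * risk / (1 - risk)

p, p_star = 0.009, 0.010
r_study = [0.005, 25 / 3000, 0.0125, 0.02]
r_target = [target_group_risk(rj, p, p_star) for rj in r_study]
```

Because the formula only rescales odds, the slope computed from (rj*, p*) matches the slope computed from (rj, p), and setting p* = p returns rj unchanged.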
For summarizing harms and benefits, the key quantity is the risk threshold, R, which is the risk of disease in the absence of treatment at which a person would be indifferent between receiving treatment for the disease and not receiving treatment for the disease. As will be discussed, the relative utility that ignores any harms of testing is a function of the harms and benefits of the possible outcomes of treatment and no treatment only through the risk threshold. This characteristic of relative utility is important because it obviates detailed specification of harms and benefits. For the application used in this commentary, the risk threshold is the 5-year risk of invasive breast cancer at which a woman would be indifferent between undertaking or not undertaking chemoprevention (ie, taking tamoxifen to prevent breast cancer).
Whether or not treatment is given in the absence of risk prediction provides important information about the range of risk thresholds and the part of the ROC curve that is relevant to a person (ie, the relevant region). If a person would not receive treatment in the absence of risk prediction, as is typically the case with the chemoprevention of breast cancer, the risk threshold for that person must be greater than the overall probability of developing disease in the absence of treatment, namely, R is greater than p*, and the relevant slope of the ROC curve must be greater than 1 (eg, j = 3 or 4 in this example). Conversely, if a person would receive treatment in the absence of risk prediction, the relevant region is that in which R is less than p* and the relevant slope of the ROC curve is less than 1 (eg, j = 1 or 2 in this example). The area under the ROC curve is generally an inappropriate measure of risk prediction because it includes the area under the part of the ROC curve outside the relevant region (9).
The term “utility” refers to either harms or benefits that are measured in the same units. The term “expected utility” refers to the average utility taking into account the probability of developing disease. The relative utility is the fraction of the expected utility of perfect prediction achieved at the optimal cut point for a risk prediction model. The optimal cut point for a risk prediction model is the cut point that maximizes the expected utility of prediction. A classic result in the theory of medical decision making is that for a person with risk threshold R, the optimal cut point j is associated with rj* = R or, equivalently, ROCSlopej = [(1 − p*)/p*] × [R/(1 − R)] (4,5,6,7,9,12). If the harms of testing for or measuring risk factors are ignored, the relative utility (RU) ignoring testing harms for a person with risk threshold R can be written simply as
$$\mathrm{RU}(\text{no test harm})_j = \begin{cases} \mathrm{TPR}_j - \mathrm{ROCSlope}_j \times \mathrm{FPR}_j, & \mathrm{ROCSlope}_j > 1, \\ (1 - \mathrm{FPR}_j) - (1 - \mathrm{TPR}_j)/\mathrm{ROCSlope}_j, & \mathrm{ROCSlope}_j < 1, \end{cases}$$
when cut point j is selected so that rj* equals R (ie, the optimal cut point to maximize the expected utility of prediction). The conditions ROCSlopej > 1 and ROCSlopej < 1 correspond to the relevant regions R > p* and R < p*, respectively, of the relative utility curve when, as required, each rj* corresponds to a risk threshold. As will be discussed below, the above formula facilitates computation of the maximum acceptable harm of testing for an additional risk factor.
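Given FPRj, TPRj, and ROCSlopej, the relative utility ignoring testing harms and the risk threshold at which cut point j is optimal take one line each. The sketch below assumes the two-case form RU = TPRj − ROCSlopej × FPRj when ROCSlopej > 1 (no treatment without risk prediction) and RU = (1 − FPRj) − (1 − TPRj)/ROCSlopej when ROCSlopej < 1 (treatment without risk prediction); the function names are hypothetical.

```python
def risk_threshold(slope_j, p_star):
    """Risk threshold R = r_j* at which cut point j is optimal,
    solving ROCSlope_j = [(1 - p*)/p*] * [R/(1 - R)] for R."""
    odds = slope_j * p_star / (1 - p_star)
    return odds / (1 + odds)

def relative_utility_no_test_harm(fpr_j, tpr_j, slope_j):
    """Relative utility ignoring testing harms at optimal cut point j."""
    if slope_j >= 1:
        # relevant region R >= p*: no treatment in the absence of risk prediction
        return tpr_j - slope_j * fpr_j
    # relevant region R < p*: treatment in the absence of risk prediction
    return (1 - fpr_j) - (1 - tpr_j) / slope_j
```

At ROCSlopej = 1 (ie, R = p*) both branches reduce to TPRj − FPRj, so assigning the boundary to either branch gives the same value. Perfect prediction (FPR = 0, TPR = 1) gives a relative utility of 1 in both regions.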
A relative utility curve is a plot of the relative utility when harms of testing are ignored, RU(no test harm)j, against the risk threshold R = rj*. In other words, the relative utility curve plots the fraction of the expected utility of perfect prediction obtained by the risk prediction model at the optimal cut point associated with the risk threshold R in the scenario when testing harms are ignored (Figure 2). Linear interpolation is used to compute relative utilities for values of R between plotted points. For a smooth ROC curve with the usual concave (curving inward) shape, the relative utility curve is highest when the risk threshold R = p* and decreases toward zero to the left and right of the point where R = p*. The theoretically highest possible value of the relative utility is 1, which corresponds to perfect prediction and provides a benchmark for evaluating risk prediction models. An attractive aspect of relative utility curves, which is also shared by decision curves (10), is that they show a range of values for R, rather than focusing on a single value for R, so that precise specification of R is not required.
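Linear interpolation between plotted (R, RU) points can be sketched with the standard-library `bisect` module. The function name `interpolate_ru` is hypothetical, and the sketch refuses to extrapolate beyond the plotted risk thresholds.

```python
from bisect import bisect_left

def interpolate_ru(points, R):
    """Linearly interpolate relative utility at risk threshold R
    from plotted (risk threshold, relative utility) points."""
    pts = sorted(points)
    xs = [x for x, _ in pts]
    if not xs[0] <= R <= xs[-1]:
        raise ValueError("R lies outside the plotted range of risk thresholds")
    i = bisect_left(xs, R)
    if xs[i] == R:  # R coincides with a plotted point
        return pts[i][1]
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    return y0 + (y1 - y0) * (R - x0) / (x1 - x0)
```

For example, `interpolate_ru([(0.01, 0.2), (0.02, 0.1)], 0.015)` returns a value halfway between the two plotted relative utilities.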
Another attractive aspect of relative utility curves is that two relative utility curves are aligned at the same risk threshold, which is ideal for making comparisons for a person with a given risk threshold. In contrast, two ROC curves are aligned at the same false-positive rates, which generally correspond to different slopes of the ROC curves and hence are associated with optimal cut points for different risk thresholds, making visual comparison at the same risk threshold difficult (2).
For purposes of illustration, the relative utility curves for the data on invasive breast cancer were computed for a target population with overall probability p* = 0.010 of developing invasive breast cancer in the absence of treatment (Figure 2). The relative utility curves in the relevant region were similar for the two models, “Baseline Factors” and “Baseline Factors and Breast Density.” For example, when R equals 0.013, the relative utility ignoring testing harms was 0.15 for “Baseline Factors” and 0.18 for “Baseline Factors and Breast Density.” Thus, adding breast density to the risk prediction model increased the relative utility ignoring testing harms by 0.03 (with standard error of 0.003). To put this increase in relative utility in perspective, the largest possible increase would be from 0.15 to 1.00, a difference of 0.85.
An additional risk factor will generally increase the performance of the risk prediction model but it comes at a “price” of the harm of testing for the additional risk factor. An additional test is deemed worthwhile if its harm is less than the maximum acceptable testing harm, which is the increased harm of testing for or measuring an additional risk factor that would exactly counterbalance the increased utility from better risk prediction with the inclusion of the additional risk factor.
The harm of an additional test or measurement, which could also include monetary costs, is measured as equivalent to a fraction of persons not treated among persons who would develop disease in the absence of treatment. When there is no treatment in the absence of risk prediction, so that the relevant region of the relative utility curve is R > p*, the maximum acceptable testing harm equals p* multiplied by the difference in relative utilities when the testing harm is ignored.
For the breast cancer example at a risk threshold R of 0.013, the maximum acceptable testing harm equals 0.0003, which is computed as a 0.03 increase in the relative utility ignoring testing harm multiplied by 0.01, the overall probability of developing disease in the target population, p*. Therefore, measuring breast density is worthwhile if the harm of measuring breast density is less than 0.0003—the equivalent of three in 10000 women not being treated with chemoprevention among those who would have developed invasive breast cancer in the absence of treatment for invasive breast cancer. Alternatively, one can say that breast density is worth measuring if one would be willing to exchange at least 1/0.0003 = 3333 measurements of breast density to treat with chemoprevention one woman who would develop invasive breast cancer in the absence of treatment for invasive breast cancer (9).
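The arithmetic above can be written as a small function. This is a sketch under the stated condition that the relevant region is R > p*; the function name is hypothetical, and the inputs are the values reported in the text for R = 0.013.

```python
def max_acceptable_testing_harm(p_star, ru_with, ru_without, R):
    """p* times the gain in relative utility ignoring testing harms;
    valid only in the relevant region R > p* (no treatment without prediction)."""
    if R <= p_star:
        raise ValueError("this form applies only in the region R > p*")
    return p_star * (ru_with - ru_without)

# Breast cancer example: RU 0.18 with breast density vs 0.15 without, p* = 0.010
harm = max_acceptable_testing_harm(p_star=0.010, ru_with=0.18, ru_without=0.15, R=0.013)
tests_per_treated_case = 1 / harm  # measurements traded per woman appropriately treated
```

The result reproduces the text: a maximum acceptable testing harm of 0.0003, or about 3333 measurements of breast density per woman treated who would otherwise develop invasive breast cancer.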
To avoid using the same data to estimate the risk prediction model and to compute the relative utility curve, the data should be randomly split into what are called training and test samples, the risk prediction model should be fit to the training sample, and the relative utility curve should be computed from data in the test sample. Standard errors can be computed by randomly repeating this procedure (13). Although the total sample for the risk prediction model for invasive breast cancer had been split into separate training and test samples, the published counts for the risk groups were from the total sample and not from the test sample, which would have been preferable for this analysis.
Relative utility curves should be computed from a concave ROC curve because concavity in the ROC curve is a prerequisite for optimal medical decision making (14). An ROC curve can be made concave by combining categories, smoothing, or creating a concave envelope of points. Alternatively, a concave ROC curve can be created by computing the estimated risk of developing disease in risk group j in the absence of treatment, rj, as an average of predicted risks of developing disease among persons in risk group j. This latter approach requires the validity of the model, which can be partly checked by comparing predicted and observed estimates, a process called calibration. Smoother relative utility curves could be created by computing false-positive rate and true-positive rate for each individual's predicted risk, as is done with decision curves (10).
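Of the options listed, the concave envelope of ROC points is easy to sketch: it is the upper convex hull of the points together with (0, 0) and (1, 1), computed here with a monotone-chain scan. The function name `concave_envelope` and the example points are hypothetical.

```python
def concave_envelope(points):
    """Upper hull (concave envelope) of ROC points (FPR, TPR), including
    (0, 0) and (1, 1); returns hull vertices sorted by FPR."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for pt in pts:
        # drop the last vertex while it lies on or below the chord to pt
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (pt[1] - y1) - (y2 - y1) * (pt[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(pt)
    return hull

# Hypothetical non-concave ROC points: (0.4, 0.55) lies below the envelope
env = concave_envelope([(0.2, 0.5), (0.4, 0.55), (0.6, 0.9)])
```

The envelope drops points that fall below a chord, so its successive slopes decrease, as required for the slope-based optimal cut points used above.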
The application that was presented in this commentary involved evaluating the addition of breast density to a risk prediction model for invasive breast cancer. A previous application involved evaluating the addition of a test for high-density lipoprotein to a risk prediction model for cardiovascular disease (9). The comparison of relative utility curves could also be used to evaluate the addition of a genomic test to a risk prediction model that is based on conventional risk factors, either in a cancer-prevention setting, such as the value of adding information on single nucleotide polymorphisms to known risk factors for breast cancer risk (8), or in a cancer treatment setting, such as the value of adding genetic information to established clinical prognostic factors in models to predict survival of lymphoma patients, survival of breast cancer patients, or diagnosis of lymph node metastases in head and neck cancer (15).
Division of Cancer Prevention in the National Cancer Institute and the National Institutes of Health.
The data were previously published (1).
The author had full responsibility for the analysis and interpretation of the data, the decision to submit the manuscript for publication, and the writing of the manuscript.