We can calculate an individual's estimated risk given data on his risk factors by use of the fitted risk model. For the prostate cancer example, the calculation (3) is of the logistic form

risk = exp(L)/{1 + exp(L)},

where L is the fitted linear combination of the risk factors (PSA level, DRE result, age, and prior biopsy status), DRE is digital rectal examination, and PSA is prostate-specific antigen.
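To make the calculation concrete, the logistic computation can be sketched in Python. The coefficients below are hypothetical placeholders for illustration only, not the fitted values of Thompson et al.

```python
import math

# Hypothetical coefficients for illustration only; the fitted values of
# Thompson et al. are available through the online calculator cited in the text.
COEF = {"intercept": -3.0, "log_psa": 0.8, "dre": 1.0, "age": 0.02, "prior_biopsy": -0.4}

def estimated_risk(psa, dre_abnormal, age, prior_biopsy):
    """Logistic-model risk: risk = exp(L) / (1 + exp(L)), with the linear
    predictor L built from the four risk factors."""
    L = (COEF["intercept"]
         + COEF["log_psa"] * math.log(psa)
         + COEF["dre"] * int(dre_abnormal)
         + COEF["age"] * age
         + COEF["prior_biopsy"] * int(prior_biopsy))
    return math.exp(L) / (1.0 + math.exp(L))

print(estimated_risk(psa=4.0, dre_abnormal=False, age=62, prior_biopsy=False))
```

With positive coefficients on log(PSA) and DRE, the calculated risk increases with PSA level and with an abnormal examination, as one would expect.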
This risk calculator of Thompson et al. (3) is available online at http://www.compass.fhcrc.org/edrnnci/bin/calculator/main.asp. We calculated the estimated risk for each of the individuals in the Prostate Cancer Prevention Trial. The predictiveness curve in figure 1 shows the distribution of risks. To create the curve, we ordered the risks from lowest to highest and plotted their values. We see that, at 90 percent on the x-axis, the risk value is 0.104. This indicates that, on the basis of the predictors in the model, 90 percent of subjects in the cohort have calculated risks below 0.104 and only 10 percent have risks at or above 0.104.
FIGURE 1. Predictiveness curve for the risk model that includes prostate-specific antigen, age, digital rectal examination, and prior biopsy as risk factors for high-grade prostate cancer, Prostate Cancer Prevention Trial, 1993–2003.
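The construction described above (order the fitted risks, plot each against its percentile) can be sketched as follows. The risks here are synthetic stand-ins, not the Prostate Cancer Prevention Trial data.

```python
import random

def predictiveness_curve(risks):
    """Predictiveness curve: fitted risks ordered from lowest to highest,
    each paired with its percentile v = rank / n."""
    ordered = sorted(risks)
    n = len(ordered)
    return [((i + 1) / n, r) for i, r in enumerate(ordered)]

def risk_at_percentile(risks, v):
    """Risk value at the v-th percentile (0 < v <= 1) of the risk distribution."""
    ordered = sorted(risks)
    idx = max(0, int(round(v * len(ordered))) - 1)
    return ordered[idx]

# Illustrative only: synthetic risks, not the fitted PCPT risks.
random.seed(1)
risks = [random.betavariate(1, 15) for _ in range(5519)]
curve = predictiveness_curve(risks)
print(risk_at_percentile(risks, 0.90))  # analogue of the 0.104 value in the text
```

Plotting the (percentile, risk) pairs in `curve` yields a monotone increasing curve of the kind shown in figure 1.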
Another way of using the graph is to start at a risk value on the y-axis and to read the corresponding percent on the x-axis. For example, with “risk = 0.20,” we see that the percent is 97.8 percent. That is, we estimate that 2.2 percent of the subjects in the cohort have estimated risks at or above 0.20. With “risk = 0.02,” the percent is 39.0 percent, indicating that 39.0 percent of the subjects in the cohort have risks below 0.02.
What does the graph offer that is not summarized in the fitted risk model itself? It shows the range and distribution of estimated risk levels associated with the model when it is applied to the population from which the cohort was drawn. Consider an individual who wants to use his calculated risk in deciding whether or not to have a biopsy. The decision is more straightforward if his estimated risk of disease is close to 0 or 1. If his calculated risk is in an equivocal range, the calculation is less helpful. Suppose, for illustration, that a 20 percent risk of high-grade disease is sufficiently high to recommend a biopsy and that a 2 percent risk is sufficiently low to decide against biopsy. Individuals whose risks are calculated in the range 0.02–0.20 may be unsure about whether or not to undergo biopsy. (A formal cost-benefit analysis that incorporates their risk of disease might be helpful, although specifying costs and benefits is always difficult.) A risk model will be most useful for individual decision making if calculated risks of having high-grade disease tend to exceed 20 percent or fall below 2 percent. We see from figure 1, however, that the prostate cancer risk model leaves the majority of men, 58.8 percent, in this indecisive risk region. Alternative thresholds might be chosen for defining high and low risk. If it is reasonable to assume that a man with a <5 percent risk of high-grade disease may defer further evaluation while a man with a >10 percent risk would prefer an evaluation, the corresponding indecisive risk region would contain only 25 percent of the population. It is important to keep in mind, however, that individuals typically do not distinguish between minor variations in risk, so we prefer to use the more extreme definitions of low and high risk in our illustrations.
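Counting how much of the cohort falls below, above, or between two decision thresholds is a one-line calculation per threshold; a minimal sketch, using hypothetical risks:

```python
def risk_fractions(risks, low, high):
    """Split a cohort's fitted risks at two decision thresholds: the
    fraction below `low`, the fraction at or above `high`, and the
    equivocal fraction in between."""
    n = len(risks)
    below = sum(r < low for r in risks) / n
    above = sum(r >= high for r in risks) / n
    return below, above, 1.0 - below - above

# Hypothetical risks; for the PCPT model the text reports fractions of
# 0.390 (below 0.02), 0.022 (at or above 0.20), and 0.588 (equivocal).
below, above, equivocal = risk_fractions(
    [0.01, 0.05, 0.15, 0.25, 0.30], low=0.02, high=0.20
)
print(below, above, equivocal)
```

Substituting the alternative thresholds (0.05, 0.10) would give the 25 percent indecisive region discussed above.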
A risk calculator should be derived from a risk model that fits the data well. The standard approach to evaluating model fit, that is, calibration, is to categorize subjects according to deciles (or other quantiles) of risk according to the model and to compare the average predicted risk with the observed proportion of events in each category. The Hosmer-Lemeshow statistic (4) uses this approach to formally test for goodness of fit. Interestingly, the predictiveness curve offers a graphical approach to assessing goodness of fit in this sense. At the midpoint of each decile of risk in figure 1, we superimpose the corresponding observed proportions of high-grade cancer. Visually, one can compare these observed proportions with the predictiveness curve, noting that the curve averaged over the decile category is the average model-predicted risk. An equivalent display often seen in practice is to plot the observed proportion versus the average risk (5, section 14.6). For our model, the Hosmer-Lemeshow statistic is 9.11 (p = 0.33), indicating that it fits the data rather well. However, the graphical display offers a more complete description of how observed and modeled risks compare. It shows the components of the test statistic. In addition, when there is particular interest in model fit in low- and high-risk groups, the display allows one to focus accordingly. We obtained similar results when the data were split into halves, with the model fit on one half and assessed with the Hosmer-Lemeshow statistic and corresponding graphic on the second half. This avoids the problem of fitting a model and assessing its fit with the same data.
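A minimal implementation of the Hosmer-Lemeshow statistic along these lines (quantile groups of predicted risk, squared observed-minus-expected event counts scaled by each group's binomial variance) might look like this; the p value would come from a chi-square distribution with groups − 2 degrees of freedom, which is omitted here:

```python
import random

def hosmer_lemeshow(p_hat, d, groups=10):
    """Hosmer-Lemeshow chi-square statistic: order subjects by predicted
    risk, split into quantile groups, and sum the squared differences
    between observed and expected event counts, each scaled by the
    group's binomial variance."""
    pairs = sorted(zip(p_hat, d))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        pbar = expected / len(chunk)
        denom = len(chunk) * pbar * (1 - pbar)
        if denom > 0:  # skip degenerate groups with pbar of 0 or 1
            stat += (observed - expected) ** 2 / denom
    return stat

# Calibrated-by-construction example: outcomes drawn from the predicted risks.
random.seed(2)
p = [0.01 + 0.29 * random.random() for _ in range(2000)]
d = [1 if random.random() < pi else 0 for pi in p]
print(hosmer_lemeshow(p, d))  # small values indicate good calibration
```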
In viewing the predictiveness curve, one must also be cognizant of sampling variability. Neither the risks nor the distributions of predictors in the population are known with certainty, and both components enter into the predictiveness curve. This can be addressed by using bootstrapping techniques (6) to calculate confidence intervals and p values. We used the simple bootstrap, resampling 5,519 subjects with replacement from the original data set, fitting the risk model with the four selected covariates, and calculating fitted risks for resampled subjects. When confidence intervals were calculated, a confidence level of 95 percent was used throughout. As an example, we noted that only 10 percent of the subjects have risks in excess of 0.104 (95 percent confidence interval (CI): 0.090, 0.120), indicating that the risk quantile is estimated rather precisely, at least assuming a correct form for the risk model. Similarly, the estimates and confidence intervals for the proportions of subjects with risks at or above 0.20 and below 0.02 are 0.022 (95 percent CI: 0.014, 0.034) and 0.390 (95 percent CI: 0.318, 0.467), respectively.
Different risk models can be compared through their predictiveness curves. In figure 2, we see that the predictiveness curve for PSA alone is almost identical to that of the more comprehensive model that includes the additional risk factors of age, prior biopsy, and digital rectal examination. The two models calculate risks below the 0.02 low-risk threshold for 36 percent and 39 percent of the population, respectively. Although the p value for this comparison, p = 0.05, is marginally statistically significant, the magnitude of the difference, 3 percent, is clinically insignificant. At the high-risk end of the scale, the PSA model puts 1.2 percent (95 percent CI: 0.7, 2.2) of subjects at or above the 0.20 risk level, while the more comprehensive model puts 2.2 percent (95 percent CI: 1.4, 3.4) of subjects in the high-risk range (p = 0.007). For comparison, we also include a simulated marker with much better performance. The simulated marker identifies 70.4 percent (95 percent CI: 66.7, 73.9) of the subjects as low risk and 6.3 percent (95 percent CI: 5.5, 7.2) as high risk, but it leaves 23.3 percent with calculated risks in the equivocal range between 0.02 and 0.20. This marker was simulated as a standard normal random variable for controls and a normal (mean = 2, standard deviation = 1) random variable for cases.
FIGURE 2. Predictiveness curves for prostate-specific antigen (PSA) alone, PSA and other factors, and the simulated marker (SIM), Prostate Cancer Prevention Trial, 1993–2003.
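The simulated marker's risk function follows from the stated normal model by Bayes' rule: with X ~ N(2, 1) in cases and N(0, 1) in controls, the likelihood ratio of the two densities is exp(2x − 2). The prevalence used below is a hypothetical value, since the passage does not restate the cohort prevalence.

```python
import math
import random

def risk_given_marker(x, prevalence):
    """Risk implied by the simulation model via Bayes' rule:
    X | case ~ N(2, 1) and X | control ~ N(0, 1), so the likelihood
    ratio of the two normal densities is exp(2x - 2)."""
    lr = math.exp(2.0 * x - 2.0)
    odds = lr * prevalence / (1.0 - prevalence)
    return odds / (1.0 + odds)

# Draw a cohort under an assumed (hypothetical) prevalence.
rng = random.Random(4)
prevalence = 0.05
markers = []
for _ in range(1000):
    case = rng.random() < prevalence
    markers.append(rng.gauss(2.0 if case else 0.0, 1.0))
risks = [risk_given_marker(x, prevalence) for x in markers]
print(sorted(risks)[900])  # 90th-percentile risk for this simulated cohort
```

Sorting `risks` and plotting them against their percentiles reproduces a predictiveness curve of the same shape as the SIM curve in figure 2.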
Another approach to comparing risk models is with the R-squared statistic, the proportion of explained variation generalized from linear to logistic regression (7). The values 0.053, 0.066, and 0.310 for PSA alone, for PSA and other factors, and for the simulated marker, respectively, corroborate the results depicted in the predictiveness curves. However, the interpretation of the R-squared value as the proportion of the variance in disease explained by the model is not very intuitive. Interestingly, R² can be calculated as a summary index from the predictiveness curve:

R² = ∫₀¹ {pred(v) − ρ}² dv / {ρ(1 − ρ)},

where ρ = disease prevalence in the study population, and pred(v) is the value of the risk at the vth percentile. The denominator term in R² is a standardization factor leading to values in the range from 0 (useless prediction) to 1 (perfect prediction). We find the display of the predictiveness curve more clinically useful than simply reporting its R² value.
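The R² summary can be computed directly from the fitted risks by approximating the integral with a sample mean. In this sketch ρ is taken as the mean fitted risk, an assumption that coincides with the prevalence when the model is well calibrated.

```python
def predictiveness_r_squared(risks):
    """R-squared as a summary of the predictiveness curve:
    R2 = integral of (pred(v) - rho)^2 dv, divided by rho * (1 - rho),
    with the integral approximated by the mean over the fitted risks.
    Here rho is taken as the mean fitted risk (an assumption; the text
    defines rho as disease prevalence)."""
    n = len(risks)
    rho = sum(risks) / n
    mean_sq = sum((r - rho) ** 2 for r in risks) / n
    return mean_sq / (rho * (1.0 - rho))

print(predictiveness_r_squared([0.5] * 10))             # uninformative model
print(predictiveness_r_squared([0.0] * 5 + [1.0] * 5))  # perfect prediction
```

The two printed extremes show the standardization at work: a flat curve at the prevalence gives 0, and risks of exactly 0 and 1 give 1.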
In our plots, we include a horizontal line located at the risk level equal to the prevalence. This corresponds to the predictiveness curve for a completely uninformative risk model, one that assigns all subjects equal risk. It serves as a reference curve. Moreover, mathematically, the positive area above the horizontal line but below the predictiveness curve must equal the negative area below the horizontal line but above the predictiveness curve. Better markers will show larger positive and negative areas, and we find that the horizontal line is a helpful visual aid.
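The equal-areas property can be checked numerically; ρ is approximated here by the mean fitted risk. Because the risks average to ρ, the deviations above and below the horizontal line must cancel.

```python
def signed_areas(risks):
    """Areas between the predictiveness curve and the horizontal line at
    the prevalence rho: the positive area where the curve lies above the
    line and the negative area where it lies below, each approximated by
    a sample mean over the fitted risks."""
    n = len(risks)
    rho = sum(risks) / n
    positive = sum(r - rho for r in risks if r > rho) / n
    negative = sum(rho - r for r in risks if r < rho) / n
    return positive, negative

pos, neg = signed_areas([0.1, 0.2, 0.6])
print(pos, neg)  # the two areas agree, as the text states
```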