A risk prediction marker is any measure that is used to predict a person’s risk of an event. It may be a quantitative measure, such as HDL cholesterol, or a qualitative measure, such as family history of disease. Risk predictors are also risk factors, in the sense that they will necessarily be strongly associated with the risk of disease. However, a large, statistically significant association does not guarantee that the marker yields useful risk predictions for much of the population.
A risk prediction model is a statistical model that combines information from several markers. Common types of models include logistic regression models, Cox proportional hazards models, and classification trees. Each type of model produces, for each individual, a predicted risk based on the information included in the model. Consider, for example, a model predicting breast cancer risk that includes age as the only predictor. The resulting risk prediction for a woman of a given age is simply the proportion of women her age who develop breast cancer. The woman’s predicted risk will change if more information is included in the model. For instance, if family history information is added, her predicted risk will be the proportion of women her age and with her family history who develop breast cancer.
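To make this concrete, the sketch below fits two logistic regression models, one with age alone and one with age plus family history, on simulated data and reports the resulting predicted risks. The variable names, coefficients, and event rates are purely hypothetical and chosen only for illustration; they are not taken from any study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical predictors: age (years) and family history of disease (0/1).
age = rng.uniform(40, 80, n)
family_history = rng.binomial(1, 0.15, n)

# Simulate events so that risk rises with age and with a family history.
true_logit = -7.0 + 0.08 * age + 0.9 * family_history
event = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# Model 1: age only.  Model 2: age plus family history.
X1 = age.reshape(-1, 1)
X2 = np.column_stack([age, family_history])
model1 = LogisticRegression().fit(X1, event)
model2 = LogisticRegression().fit(X2, event)

# Predicted risk for a 60-year-old woman, without and with her family history
# included in the model.
print(model1.predict_proba([[60.0]])[0, 1])        # conditions on age only
print(model2.predict_proba([[60.0, 1.0]])[0, 1])   # conditions on age and family history
```

The same woman receives a different predicted risk from the second model because her risk is now conditioned on her family history as well as her age.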
The purpose of a risk prediction model is to accurately stratify individuals into clinically relevant risk categories. This risk information can be used to guide clinical or policy decisions, for example about preventive interventions for individuals, or disease screening for subpopulations identified as high risk, or to select individuals for inclusion in clinical trials. The value of a risk prediction model for guiding these kinds of decisions can be judged by the extent to which the risk calculated from the model reflects the actual fraction of people in the population with events (its calibration); the proportions in which the population is stratified into clinically relevant risk categories (its stratification capacity); and the extent to which subjects with events are assigned to high risk categories and subjects without events are assigned to low risk categories (its classification accuracy).
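As a rough illustration of how these three properties can be read off a set of predicted risks, the sketch below uses simulated risks and outcomes; the risk categories (<5%, 5–20%, >20%) and the simulated distributions are hypothetical assumptions, not clinical recommendations.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical predicted risks and observed events (illustration only).
risk = rng.beta(2, 18, n)          # predicted risks, mean roughly 0.10
event = rng.binomial(1, risk)      # outcomes simulated to be consistent with the risks

# Stratify into (hypothetical) clinically motivated categories.
bins = [0.0, 0.05, 0.20, 1.0]
labels = ["<5%", "5-20%", ">20%"]
category = np.digitize(risk, bins[1:-1])   # 0, 1, 2 for the three categories

for k, label in enumerate(labels):
    in_cat = category == k
    print(label,
          f"share of population = {in_cat.mean():.2f}",          # stratification capacity
          f"mean predicted risk = {risk[in_cat].mean():.3f}",    # compare these two
          f"observed event rate = {event[in_cat].mean():.3f}")   # ... for calibration

# Classification accuracy for a high-risk rule (here, risk > 20%):
high = risk > 0.20
tpr = high[event == 1].mean()   # fraction of events assigned to the high-risk category
fpr = high[event == 0].mean()   # fraction of non-events assigned to the high-risk category
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```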
Risk prediction models are commonly evaluated using receiver operating characteristic (ROC) curves (e.g. (5)), which are standard tools for evaluating the discriminatory accuracy of diagnostic or screening markers. The ROC curve shows the true-positive rate versus the false-positive rate for rules that classify individuals using risk thresholds that vary over all possible values. ROC curves are generally not helpful for evaluating risk prediction models because they do not provide information about the actual risks the model predicts, or about the proportions of subjects who have high or low risk values. Moreover, when comparing ROC curves for two risk prediction models, the models are aligned according to their false-positive rates, meaning that different risk thresholds are applied to the two models in order to achieve the same false-positive rate. This is clearly inappropriate. In addition, the area under the ROC curve (AUC or C-statistic), a commonly reported summary measure that can be interpreted as the probability that the predicted risk for a subject with an event is higher than that for a subject without an event, has little direct clinical relevance because clinicians are never asked to compare risks for a pair of subjects, one who will go on to have the event and one who will not. Neither the ROC curve nor the AUC relates to the practical task of predicting risks for clinical decision making.
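For comparison, a minimal sketch of the standard ROC/AUC computation is given below, using scikit-learn on simulated data with hypothetical risks and outcomes. Note that the quantities it returns contain no information about the absolute risk values themselves or about how much of the population falls into any clinically relevant risk category.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical predicted risks and outcomes (illustration only).
risk = rng.beta(2, 18, n)
event = rng.binomial(1, risk)

# ROC curve: true-positive rate vs false-positive rate as the risk threshold
# sweeps over all possible values.
fpr, tpr, thresholds = roc_curve(event, risk)
auc = roc_auc_score(event, risk)
print(f"AUC = {auc:.3f}")

# The ROC curve is unchanged by any monotone transformation of the risks, so
# two models can have identical ROC curves yet assign very different absolute
# risks to the same subjects.
```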
Cook and colleagues propose using risk stratification tables to evaluate the incremental value of a new marker, that is, the benefit of adding a new marker (e.g. C-reactive protein) to an established set of risk predictors (e.g. Framingham risk predictors such as age, diabetes, cholesterol, smoking, and LDL levels) (1). In these stratification tables, risks calculated from models with and without the new marker are cross-tabulated. This approach represents a substantial improvement over the use of ROC methodology because it displays the risks calculated from the model and the proportions of individuals in the population who are stratified into the risk groups. We will provide an example of this approach, and show how information about model calibration, capacity for risk stratification, and classification accuracy can be derived from a risk stratification table and used to assess the added value of a marker for clinical and healthcare policy decisions.
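A minimal sketch of how such a cross-tabulation can be constructed is given below. The baseline predictor, the new marker, the simulated effects, and the risk categories are all hypothetical stand-ins, and pandas' crosstab is used only as one convenient way to build the table.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 10_000

# Hypothetical baseline predictor (stand-in for an established risk score)
# and a new marker under evaluation.
baseline = rng.normal(0, 1, n)
new_marker = rng.normal(0, 1, n)
logit = -2.5 + 0.8 * baseline + 0.5 * new_marker
event = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# Risks from the model without and with the new marker.
X_old = baseline.reshape(-1, 1)
X_new = np.column_stack([baseline, new_marker])
risk_old = LogisticRegression().fit(X_old, event).predict_proba(X_old)[:, 1]
risk_new = LogisticRegression().fit(X_new, event).predict_proba(X_new)[:, 1]

# Cross-tabulate the two sets of risks using (hypothetical) clinical categories.
cuts = [0.0, 0.05, 0.20, 1.0]
labels = ["<5%", "5-20%", ">20%"]
table = pd.crosstab(pd.cut(risk_old, cuts, labels=labels),
                    pd.cut(risk_new, cuts, labels=labels),
                    rownames=["model without marker"],
                    colnames=["model with marker"])
print(table)
```

In a table of this kind, the off-diagonal cells contain the subjects whose risk category changes when the new marker is added, and comparing observed event rates with the predicted risks within cells is one way to examine calibration.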