Accurate risk assessment and disease prognosis are essential in health care. To improve disease prevention and management, risk stratification (RS) rules are often developed to assign subjects to different risk groups, where each group corresponds to a particular intervention. For example, a commonly used RS rule in cardiovascular disease prevention stratifies patients into low, intermediate, and high risk groups. Patients are typically recommended to receive antihypertensive therapy if in the intermediate risk group and a statin if in the high risk group. In studies designed to develop RS rules, measurements of risk factors are often ascertained at baseline, and patients are followed over time for the occurrence of a certain clinical event. Since the risk of experiencing such an event may change over time, one must incorporate the time domain when constructing RS rules. For example, cardiovascular RS rules are often based on the risk of experiencing a cardiovascular event within 10 years of the measurement of the risk factors. In this paper, we are interested in stratification rules for the risk of experiencing an event within t years of marker measurement. Throughout, we use the terms “cases” and “controls” to denote subjects who will and will not, respectively, experience an event within t years in the absence of the RS-guided intervention of interest; this qualification is needed because a patient's disease status may change after receiving the RS-guided intervention.
When developing and evaluating RS rules, it is crucial to understand the potential clinical and financial costs associated with assigning patients to incorrect risk groups, where they receive suboptimal interventions. Unnecessary medication costs arise when controls are incorrectly assigned to high risk groups, while assigning cases to the low-risk category may lead to costs from life-years lost, reduced productivity, and subsequent medical care. This signifies the importance of precise risk prediction and rigorous evaluation of RS rules prior to their widespread use in clinical practice.
In practice, RS rules are often derived from risk prediction models built on a panel of markers. Based on the predicted risk from the model, future subjects are assigned to different risk categories and receive the corresponding intervention. There are 3 important steps in developing an effective RS rule: (1) constructing a regression model predictive of the clinical response of interest, (2) determining the appropriate risk categories corresponding to specific interventions, and (3) evaluating the resulting RS rule in an objective and transparent way. While most statistical methodological research focuses on the step of empirical model building, clear answers to the latter 2 steps remain elusive. When evaluating the performance of risk prediction models, measures of accuracy based on discrimination and calibration have been considered (Gail and Pfeiffer, 2005; Cook, 2007). Discrimination measures the ability of the risk prediction model to discriminate cases from controls. Calibration measures how well the predicted risk approximates the true conditional risk given the marker measurements. However, neither of these 2 types of measures is appropriate for evaluating the performance of RS. One of the most commonly used discrimination measures is the receiver operating characteristic (ROC) curve (Pepe, 2003). Since the ROC curve is scale invariant, a monotone transformation of the predicted risks does not affect the discriminatory accuracy but could lead to dramatic changes in the assignment of risk groups. Calibration measures such as the Hosmer–Lemeshow goodness-of-fit statistic are also inadequate because a perfectly calibrated model may perform poorly in RS if the available markers have little power in predicting the outcome. To comprehensively assess a risk model, Pepe and others (2008)
advocated the use of a predictiveness curve in conjunction with discriminatory measures. However, such an approach cannot be directly applied to evaluate the performance of RS-guided interventions. In the context of evaluating the incremental value of a new marker for risk reclassification, Pencina and others (2008)
proposed to measure the net reclassification improvement (NRI) based on the proportion of subjects reclassified into higher- or lower-risk categories. The NRI can be used to compare RS rules but not to evaluate a single RS rule. Furthermore, the NRI does not account for the differential costs associated with different types of incorrect assignment.
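As a concrete illustration of the measure just described (using made-up risk categories, not data from this paper), the category-based NRI can be computed as the net proportion of cases moving to a higher category minus the net proportion of controls moving up:

```python
def nri(old_cat, new_cat, is_case):
    """Category-based net reclassification improvement.

    Risk categories are coded as ordered integers (e.g. 0 = low,
    1 = intermediate, 2 = high).  NRI = (P(up|case) - P(down|case))
                                      - (P(up|control) - P(down|control)).
    """
    up = [n > o for o, n in zip(old_cat, new_cat)]
    down = [n < o for o, n in zip(old_cat, new_cat)]
    cases = [i for i, c in enumerate(is_case) if c]
    controls = [i for i, c in enumerate(is_case) if not c]
    net_case = (sum(up[i] for i in cases)
                - sum(down[i] for i in cases)) / len(cases)
    net_ctrl = (sum(up[i] for i in controls)
                - sum(down[i] for i in controls)) / len(controls)
    return net_case - net_ctrl

# Toy example: risk categories under an old and a new model for 6 subjects.
old = [0, 1, 1, 2, 0, 1]
new = [1, 1, 0, 2, 0, 2]
case = [True, True, False, True, False, False]
print(round(nri(old, new, case), 3))  # 0.333
```

Note that every reclassification contributes with the same unit weight, whether or not the move crosses a cutoff with large clinical consequences, which is exactly the cost-insensitivity noted above.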
The ultimate value of an RS rule can be represented as the extra total cost/benefit incurred when the RS-guided intervention is applied to the target population. To effectively construct and evaluate an RS rule, one should therefore have information on the financial and medical costs/benefits associated with the interventions. An ideal data set for evaluating the RS rule would thus consist of patients whose intervention status is known. With such a data set, along with the cost/benefit information on the interventions, one may comprehensively evaluate an RS rule based on the expected cost associated with incorrect assignment of risk groups. In this paper, we propose a unified framework to determine the optimal risk categorization and to quantify the value of the corresponding RS rule based on the expected costs, where the cost parameters are assumed to be given. As a simple example, patients may be stratified into low- or high-risk groups, where low-risk patients would be managed without intervention and high-risk patients would receive a treatment. Two types of costs may arise from such a stratification: the cost of unnecessary intervention for controls, denoted by C0
; and the cost of not receiving treatment for cases, denoted by C1
. When evaluating an RS rule that differentiates high- and low-risk patients, it is important to account for the trade-off between these 2 types of costs (Cantor and others, 1999; Obuchowski, 2003) and to develop an RS rule with a cutoff value that optimizes this trade-off. In Section 2, we discuss the relationship between costs and the optimal threshold values of RS rules based on a single marker. Procedures for comparing multiple RS rules are also discussed. These procedures are generalized in Section 3 to the setting where multiple risk factors are available for RS. The proposed methods are illustrated in Section 4 with a data set from the Cardiovascular Health Study (CHS) and with simulation studies. Some remarks are given in Section 5.
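The trade-off between the two costs can be sketched with the classic threshold argument (the cost values C0 and C1 below are hypothetical, and the subject's t-year risk p is assumed known and well calibrated): intervening on a subject costs C0(1 − p) in expectation (unnecessary treatment if a control), while withholding treatment costs C1·p (missed treatment if a case), so intervention is the cheaper action exactly when p exceeds C0/(C0 + C1).

```python
# Hypothetical cost values (not from the paper): C0 = cost of unnecessary
# intervention for a control, C1 = cost of a case not receiving treatment.
C0, C1 = 1.0, 4.0
threshold = C0 / (C0 + C1)  # cost-optimal risk cutoff; here 0.2

def expected_cost(p, treat):
    """Expected cost for a subject with true t-year event risk p."""
    return C0 * (1 - p) if treat else C1 * p

# At each risk level, the cheaper action agrees with the threshold rule.
for p in [0.05, 0.19, 0.21, 0.50]:
    treat_is_cheaper = expected_cost(p, True) < expected_cost(p, False)
    assert treat_is_cheaper == (p > threshold)
```

This is only the single-cutoff, two-group case; the asymmetry of the costs (here C1 = 4·C0) is what pushes the optimal cutoff below 50%.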