Risk reclassification for single factors can be examined using models with and without each risk factor in turn; that is, by comparing a model without a given risk factor to the full model. For CVD, relevant strata are 0-<5%, 5-<10%, 10-<20% and >=20% ten-year risk. Table 2 illustrates the risk reclassification for models with and without SBP, both including all other risk factors. The model without SBP categorized 86% of women into the lowest risk group, with a 10-year risk of <5%, 10% into the 5-<10% risk stratum, 3% into the 10-<20% risk stratum, and 1% at 20% or higher risk. The same was approximately true for the model including SBP, such that the ‘marginal’ proportions were very similar.

| **Table 2** Reclassification table comparing 10-year risk strata for models including risk factors for cardiovascular disease in the Women’s Health Study, with and without systolic blood pressure.^{*} |

A continuous analog of this table, which plots the predicted values from both models along with the category cut points, is provided in the accompanying figure. The figure plots the base-10 logarithm of the predicted risks from the two models and shows their spread and difference, with the diagonal line denoting the line of identity. Ideally, more cases will be above than below the line of identity. The dashed lines indicate the risk strata. The striated appearance is due to the use of categories for SBP, here 9 categories of 10 mmHg from <110 mmHg to 180 mmHg or more. The lines show by how much the predicted values can change when SBP increases or decreases by 10 mmHg.

The overall percent reclassified gives some indication of how many individuals would change risk categories, and possibly treatment decisions, under the new model. Of the 24,558 women, 2,022 (8%) were classified into different risk strata. The overall percent, however, is heavily influenced by the incidence of disease in the population. In the WHS, the majority of women were in the lowest category under both models. Those who were in the intermediate categories may be more relevant clinically, and demonstrate more shift in risk category. For example, of those at 5-<10% risk in the model without SBP, 40% were reclassified into higher or lower categories. Of those at 10-<20% risk, 36% were reclassified.
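The cross-classification and the percent reclassified described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the analysis: the function names and the six predicted risks are hypothetical, and only the cut points (5%, 10%, 20%) come from the text.

```python
from bisect import bisect_right

# Cut points for the 10-year CVD risk strata named in the text.
CUTS = [0.05, 0.10, 0.20]
LABELS = ["0-<5%", "5-<10%", "10-<20%", ">=20%"]

def stratum(risk):
    """Index (0..3) of the risk stratum that a predicted risk falls into."""
    return bisect_right(CUTS, risk)

def reclassification_table(risk_old, risk_new):
    """Cross-tabulate strata: rows = reduced model, columns = full model."""
    table = [[0] * len(LABELS) for _ in LABELS]
    for old, new in zip(risk_old, risk_new):
        table[stratum(old)][stratum(new)] += 1
    return table

def percent_reclassified(table):
    """Overall and per-row percent of individuals off the diagonal."""
    total = sum(map(sum, table))
    moved = total - sum(table[i][i] for i in range(len(table)))
    by_row = []
    for i, row in enumerate(table):
        n = sum(row)
        # An empty row has no defined percent, so report None for it.
        by_row.append(100.0 * (n - row[i]) / n if n else None)
    return 100.0 * moved / total, by_row

# Hypothetical predicted risks for six individuals (not WHS data).
risk_without = [0.02, 0.04, 0.06, 0.08, 0.12, 0.25]
risk_with = [0.03, 0.06, 0.04, 0.09, 0.22, 0.30]

table = reclassification_table(risk_without, risk_with)
overall, by_row = percent_reclassified(table)
```

The per-row percentages correspond to the stratum-specific figures quoted in the text (for example, the percent reclassified among those at 5-<10% risk under the reduced model), which are less dominated by the large lowest-risk group than the overall percent.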

More important for model fit than the simple percent reclassified, however, is a comparison of observed and expected rates of disease within each cross-classified category. This determines whether individuals are reclassified correctly, or whether the changes are due to chance. Observations in a reclassified cell are considered ‘correctly’ reclassified if the observed rate is closer to the new than to the old risk stratum. For example, 696 women were reclassified from <5% to 5-<10% 10-year risk. The observed 10-year risk based on a Kaplan-Meier estimate for these 696 women was 6.8%, which falls into the 5-<10% category. The average estimated risk for these women from the model without SBP was 4.0%, while that from the model with SBP was 6.1%, which is closer to the observed risk of 6.8%. Overall, 2,022 women were reclassified; 2,009 of these fell into cells with at least 20 women, for which the observed rate could be computed. Of these 2,009 women, 1,932 (96%) were reclassified correctly.
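The 696-women example can be checked with a short calculation. The sketch below assumes one particular reading of ‘correctly’ reclassified, namely that the observed rate lies nearer to the average predicted risk from the new model than to that from the old model; the function name is hypothetical, and the numbers are those quoted in the text.

```python
def correctly_reclassified(observed, mean_old, mean_new):
    """True if the observed rate is nearer the new model's mean prediction.

    Assumed operational reading of 'closer to the new than to the old
    risk stratum'; the text does not spell out the exact rule.
    """
    return abs(observed - mean_new) < abs(observed - mean_old)

# The 696 women moved from <5% to 5-<10%: values quoted in the text.
moved_up = correctly_reclassified(observed=0.068, mean_old=0.040, mean_new=0.061)

# Share of reclassified women, in cells of at least 20, who moved correctly.
pct_correct = 100.0 * 1932 / 2009
```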

Observed and average predicted rates, for cells with at least 20 observations, can be compared based on a chi-squared goodness-of-fit test within reclassified categories for each model separately (9). This is simply the familiar Hosmer-Lemeshow goodness-of-fit statistic, but applied to reclassified categories, and we refer to it as the Reclassification Calibration Statistic (see Glossary). It is calculated as

$$X^{2} = \sum_{k} \frac{\left(O_{k} - n_{k}\bar{p}_{k}\right)^{2}}{n_{k}\bar{p}_{k}\left(1 - \bar{p}_{k}\right)}$$

where

*n*_{k} is the number in cell k,

*O*_{k} is the observed number of events in cell k, and

*p̄*_{k} is the average predicted risk in cell k for the model under consideration.

Survival data can be incorporated by using the observed events and predicted risk as of a given time, such as 10 years. The Kaplan-Meier estimate of the observed risk can be used to accommodate censored data. The statistic follows an approximate chi-square distribution with K-2 degrees of freedom, where K is the number of cells with at least 20 observations in the table. In Table 2, K=11, and there are 9 degrees of freedom. As with the usual Hosmer-Lemeshow test, a significant result indicates poor fit. The test for SBP found that the model without SBP suffered from a strong lack of fit (X^{2} = 68.3, p<0.001). That for the model with SBP was X^{2} = 22.9 (p=0.006), which still indicated some lack of fit but to a much lesser extent.
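A minimal Python sketch of this calculation follows, assuming the usual Hosmer-Lemeshow form in which each cell contributes the squared difference between observed and expected events divided by n × p̄ × (1 − p̄). The function names are hypothetical. The number of events in the 696-women cell is approximated as n times the Kaplan-Meier risk, which is an assumption about how censored data enter. The tail probability uses a plain series expansion of the incomplete gamma function so that no external library is needed.

```python
import math

def cell_term(n, observed_events, mean_risk):
    """One cell's contribution: (O - n*p)^2 / (n*p*(1 - p))."""
    expected = n * mean_risk
    return (observed_events - expected) ** 2 / (expected * (1.0 - mean_risk))

def reclassification_calibration(cells, min_n=20):
    """Sum cell terms over cells with at least min_n people.

    cells: iterable of (n, observed_events, mean_predicted_risk).
    Returns (statistic, degrees of freedom = cells used - 2).
    """
    used = [c for c in cells if c[0] >= min_n]
    stat = sum(cell_term(n, o, p) for n, o, p in used)
    return stat, len(used) - 2

def chi2_sf(x, df):
    """Upper-tail chi-square probability via the incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= h / (a + n)
        total += term
    lower = total * math.exp(-h + a * math.log(h) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

# The 696-women cell from the text; events assumed to be n * KM risk.
n = 696
events = n * 0.068
term_without_sbp = cell_term(n, events, 0.040)
term_with_sbp = cell_term(n, events, 0.061)

# Tail probabilities for the statistics reported in the text (9 df).
p_without_sbp = chi2_sf(68.3, 9)
p_with_sbp = chi2_sf(22.9, 9)
```

In this sketch the single reclassified cell contributes far more to the statistic under the model without SBP than under the model with SBP, which is the pattern the overall test summarizes across all cells with at least 20 observations.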

Table 3 examines risk reclassification when each predictor is individually eliminated from the model, comparing each reduced model to the full model. The overall percent reclassified ranged from 3% for models with and without parental history of MI up to 13% for models with and without age. The percents reclassified within the intermediate risk categories of 5-<10% and 10-<20% were much higher, ranging from at least 13% up to 62% for age, suggesting more substantial changes within these risk strata. For each model comparison, over 95% of those reclassified were reclassified correctly when the variable was included. In comparing the observed to expected rates, the Reclassification Calibration Statistic, an X^{2} statistic, showed significant lack of fit in models excluding each variable. Although the full model sometimes also showed lack of fit within these cross-classified categories, it did so to a lesser degree, and it provided a better fit to the observed rates in each comparison. Thus, each of these variables improved the fit of the model to the observed rates of cardiovascular disease.

| **Table 3** Reclassification measures for deleting variables from the Reynolds Risk Score Model in the Women’s Health Study, using four categories: 0-<5%, 5-<10%, 10-<20%, 20%+.^{*} |