The original article that introduced the NRI illustrated its application with a 3-category risk stratification [9]. However, others have applied it with 4 or no categories [13]. Nothing in the definition of the NRI requires risk stratification into categories. The only requirement is that we define what constitutes upward and downward reclassification.
While in some fields risk categories are firmly established and patient care depends on them (for example, primary prevention of cardiovascular disease (CVD)), in other fields attempts to create meaningful risk categories lack sufficient information to either justify or promote them (all-cause mortality, diabetes, atrial fibrillation). Moreover, even when categories are firmly established (CVD prevention), their application is complicated by differing definitions of the endpoint of interest and thus different incidence rates for different models (hard CVD, full CVD, hard coronary heart disease (CHD), full CHD and so on). This can lead to different NRI values for the same marker added to different models.
What complicates matters further is the dependence of the category-based NRI on the selection and number of categories. We illustrate this phenomenon with a very simple example. Assume 8 subjects, 4 events and 4 non-events, with predicted probabilities of event from a given old (and useless) model of 0.2, 0.4, 0.6 and 0.8 for events and 0.2, 0.4, 0.6 and 0.8 for non-events. Furthermore, assume that the addition of a new marker adds 0.16 to the predicted probabilities of all event subjects and subtracts 0.16 from those of all non-event subjects. If we assume only two risk categories, below and above 0.5, the NRI equals 1/4 + 1/4 = 0.50 (event subject with original probability 0.4 moves up and non-event subject with original probability 0.6 moves down). For NRI with 3 risk categories determined by cut-off points of 0.33 and 0.67 we get 2/4 + 2/4 = 1.00 (event subjects with initial 0.2 and 0.6 move up and non-events with initial 0.4 and 0.8 move down). With 4 categories determined by cut-offs at 0.25, 0.50 and 0.75 we get 3/4 + 3/4 = 1.50 (every subject moves except the event with initial 0.8 and the non-event with initial 0.2), and the “no category” NRI, with upward and downward movement defined by any upward or downward change in predicted risk, equals the maximum possible value of 1 + 1 = 2.00. Similarly, it is not difficult to see that the NRI also depends on where the cut-off points are placed, not only on how many there are. For this reason, it may not always be true in practice that more categories mean a higher NRI.
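The arithmetic of this 8-subject example can be checked with a short script. The following is only a minimal sketch: the function names `category` and `nri` are illustrative, and the rule that a risk exactly equal to a cut-off falls into the upper category is an assumption (it does not affect this example, since no predicted risk coincides with a cut-off).

```python
from bisect import bisect_right

def category(p, cutoffs):
    """Index of the risk category containing p (0 = lowest).

    Assumed convention: a risk equal to a cut-off goes to the upper category.
    """
    return bisect_right(sorted(cutoffs), p)

def nri(old_events, new_events, old_nonevents, new_nonevents, cutoffs=None):
    """Return (event NRI, nonevent NRI, total NRI).

    With cutoffs=None any change in predicted risk counts as movement
    (category-free NRI); otherwise movement means a change of category.
    """
    def net_up(old, new):
        # proportion moving up minus proportion moving down
        score = 0
        for o, n in zip(old, new):
            if cutoffs is not None:
                o, n = category(o, cutoffs), category(n, cutoffs)
            score += (n > o) - (n < o)
        return score / len(old)

    event_nri = net_up(old_events, new_events)
    # for non-events, downward movement is the correct direction
    nonevent_nri = -net_up(old_nonevents, new_nonevents)
    return event_nri, nonevent_nri, event_nri + nonevent_nri

# Toy data from the text: the same old risks for events and non-events;
# the new marker adds 0.16 for events and subtracts 0.16 for non-events.
old_risk = [0.2, 0.4, 0.6, 0.8]
new_events = [p + 0.16 for p in old_risk]
new_nonevents = [p - 0.16 for p in old_risk]

for cutoffs in ([0.5], [0.33, 0.67], [0.25, 0.50, 0.75], None):
    print(cutoffs, nri(old_risk, new_events, old_risk, new_nonevents, cutoffs))
```

Run on the toy data, the total NRI comes out as 0.50, 1.00, 1.50 and 2.00 for the two-, three-, four-category and category-free versions respectively, even though the underlying predictions are identical in all four cases.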
The above discussion suggests that the category-less or continuous NRI is the most objective and versatile measure of improvement in risk prediction. Its definition remains consistent with formulas (1), with the only difference being the meaning of upward and downward reclassification. In the following sections we show its alternative interpretations and invariance to changing event rates. We argue that in cases where no established categories exist, it is more prudent to use a version of NRI which does not require categories, rather than trying to create them for one particular application. Moreover, in cases where a priori categories do exist, it is still worth reporting the continuous NRI for comparison with other applications.
In summary, two versions of NRI can be considered: one with categories, which should be used if categories are already established in the field and influence care decisions, and one without categories, which can be used universally. We introduce the following notation:
- NRI(0.20) for two-category NRI with cut-off at 0.20;
- NRI(0.06,0.20) for three-category NRI with cut-off points at 0.06 and 0.20;
- NRI(>0) or “continuous NRI” for NRI with no categories.
Furthermore, “event NRI” and “nonevent NRI” denote the two very useful subcomponents of the total NRI, with the former calculating the amount of correct reclassification among events and the latter among nonevents. We recommend reporting these along with the single summary NRI for fuller interpretation.
Of course, NRI = event NRI + nonevent NRI. We note that the original NRI was presented as a sum and not an average of the two subcomponents for “historical reasons”: this way it matches the approach taken in the definition of the integrated discrimination improvement (IDI), which in turn parallels the definition of Youden’s index [24] and the difference in logistic regression R-squares [10]. However, an average (NRI/2) could have an easier interpretation as the average weighted improvement in classification (if categories are present) or in discrimination (if no categories are present).
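For reference, the decomposition described above can be written out explicitly. This is a sketch in assumed notation: "up" and "down" stand for upward and downward reclassification as defined for the chosen version of the NRI, and the conditional-probability form is assumed to agree with the formulas (1) cited earlier, which are not reproduced in this section.

```latex
\begin{align*}
\text{event NRI}    &= P(\text{up} \mid \text{event}) - P(\text{down} \mid \text{event}), \\
\text{nonevent NRI} &= P(\text{down} \mid \text{nonevent}) - P(\text{up} \mid \text{nonevent}), \\
\text{NRI}          &= \text{event NRI} + \text{nonevent NRI}, \\
\tfrac{1}{2}\,\text{NRI} &= \tfrac{1}{2}\left(\text{event NRI} + \text{nonevent NRI}\right).
\end{align*}
```

Each subcomponent lies between -1 and 1, so the sum ranges from -2 to 2 while the average stays within -1 to 1.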
In general we do not recommend using more than 3 categories unless they are already established and there is a justifiable need for that many. It seems to us that 3 categories offer sufficient categorization – high category for individuals with high risk (who should be treated), low category for those with low risk (who do not need treatment) and the middle category for everyone else. The use of categories can only be justified by explicit care recommendations for individuals in each category and it is often unlikely that these would materially differ between two middle categories (for example 0.05-0.10 and 0.10-0.20 in cardiovascular disease prevention). If one feels a partition finer than 3 is needed, then the category-free NRI offers a better option.
We do realize that in some cases a category-based presentation may communicate results more effectively. Generally, we do not recommend the ad hoc creation of categories, as they should be based on multiple factors, including cost considerations. However, if one feels they are absolutely necessary to convey the message and no categories are established in the field, we recommend that categories be formed in a way that takes into account the event rate, the severity of the disease under study and the potential care recommendations based on the risk categories created.
We conclude this section with a comment about a modification of the “original” NRI introduced by Cook et al. [26]. They define “clinical NRI” as the amount of reclassification observed only in the “middle risk” group. It is important to note that such “clinical NRI” is meant to address a different question than the “original NRI”. The “original NRI” attempts to quantify the amount of improvement if the new marker were to be measured on everyone in the population of interest. “Clinical NRI” quantifies the amount of improvement offered by a strategy in which only individuals in the middle risk group have the new marker obtained, have their risk recalculated based on a function which includes the new marker and are reclassified if the new probability leads to a different risk category. As these two NRIs are based on different groups of individuals, they cannot be directly compared unless individuals in the high and low risk groups who do not change categories in this strategy are included and “clinical NRI” is translated into “original NRI”. The latter approach might offer a more complete picture of the effect of the two-step strategy outlined above.
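The distinction between these quantities can be illustrated with a small sketch. Everything here is an assumption made for illustration rather than the procedure of Cook et al.: the function names are hypothetical, three categories (two cut-offs) are assumed so that category 1 is the "middle risk" group, a risk equal to a cut-off is placed in the upper category, and the data are the 8-subject toy example used earlier in this section with cut-offs at 0.33 and 0.67.

```python
from bisect import bisect_right

def category(p, cutoffs):
    """Risk category index of p (assumed: p equal to a cut-off goes up)."""
    return bisect_right(cutoffs, p)

def categorical_nri(subjects, cutoffs):
    """NRI over (old_risk, new_risk, is_event) tuples."""
    net = {True: 0, False: 0}    # net upward moves per group
    count = {True: 0, False: 0}  # group sizes
    for old, new, is_event in subjects:
        move = category(new, cutoffs) - category(old, cutoffs)
        net[is_event] += (move > 0) - (move < 0)
        count[is_event] += 1
    if 0 in count.values():
        raise ValueError("need at least one event and one non-event")
    # up is correct for events, down is correct for non-events
    return net[True] / count[True] - net[False] / count[False]

def clinical_nri(subjects, cutoffs):
    """NRI restricted to subjects whose old risk is in the middle category."""
    middle = [s for s in subjects if category(s[0], cutoffs) == 1]
    return categorical_nri(middle, cutoffs)

def two_step_nri(subjects, cutoffs):
    """Whole-population NRI when only the middle group gets the new marker.

    Subjects outside the middle category keep their old risk, so they
    stay in the denominators but cannot be reclassified.
    """
    strategy = [
        (old, new if category(old, cutoffs) == 1 else old, is_event)
        for old, new, is_event in subjects
    ]
    return categorical_nri(strategy, cutoffs)

old_risk = [0.2, 0.4, 0.6, 0.8]
subjects = ([(p, p + 0.16, True) for p in old_risk]
            + [(p, p - 0.16, False) for p in old_risk])
cutoffs = [0.33, 0.67]

print("original NRI:", categorical_nri(subjects, cutoffs))
print("clinical NRI:", clinical_nri(subjects, cutoffs))
print("two-step NRI:", two_step_nri(subjects, cutoffs))
```

On this toy data the original and clinical NRI both equal 1.0, but they are computed over 4 and 2 subjects per outcome group respectively, whereas the two-step strategy evaluated over the whole population gives 0.5. The last figure is the one that is on the same footing as the original NRI.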