We considered two criteria for evaluating risk models. Both measure the concentration of risk in a population, and both have potential applications in public health and clinical epidemiology.

*PCF*(*q*), the proportion of cases who will be followed if the proportion *q* of the general population at highest risk is followed, has been recommended by Pharoah et al. (2002). In credit risk models, Hand and Henley (1997) used the “bad risk rate amongst accepts”, which is analogous to 1 − *PCF*, but neither Pharoah et al. nor Hand and Henley described methods of inference. For models of disease incidence, *PCF* can determine the proportion of those destined to develop disease who will receive screening or preventive interventions. Following disease diagnosis, it can predict the proportion of all patients destined to have a bad outcome who are among the patients selected for a treatment because of high risk. We introduced the criterion *PNF*(*p*), namely the proportion of the population at highest risk that needs to be followed in order that a proportion *p* of cases be followed, as a complementary guide for screening or intervention applications.
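To make the two definitions concrete, the following minimal Python sketch computes empirical versions from a hypothetical validation cohort in which each person has an estimated risk and an observed outcome (1 = case). This setting is an assumption for illustration: the function names `pcf` and `pnf` are hypothetical, ties in risk are broken by input order, and the estimators are simple plug-in proportions rather than the methods of inference discussed here.

```python
import math
from typing import Sequence


def pcf(risks: Sequence[float], outcomes: Sequence[int], q: float) -> float:
    """Empirical PCF(q): fraction of observed cases found among the
    ceil(n*q) people with the highest estimated risks."""
    n = len(risks)
    k = math.ceil(n * q - 1e-9)  # people followed; tolerance absorbs float error in n*q
    # Stable sort, so ties in risk keep their input order.
    order = sorted(range(n), key=lambda i: risks[i], reverse=True)
    return sum(outcomes[i] for i in order[:k]) / sum(outcomes)


def pnf(risks: Sequence[float], outcomes: Sequence[int], p: float) -> float:
    """Empirical PNF(p): smallest fraction of the population, followed in
    decreasing order of risk, that includes a fraction p of the observed cases."""
    n = len(risks)
    needed = p * sum(outcomes)
    order = sorted(range(n), key=lambda i: risks[i], reverse=True)
    covered = 0
    for followed, i in enumerate(order, start=1):
        covered += outcomes[i]
        if covered >= needed - 1e-9:
            return followed / n
    return 1.0
```

For a high-risk strategy with *q* = 0.2, `pcf(risks, outcomes, 0.2)` estimates the fraction of cases covered, and `pnf(risks, outcomes, 0.8)` estimates the fraction of the population that must be followed to cover 80% of cases.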

*PNF* could be adapted to high-risk preventive interventions (Rose, 1992), such as deciding what proportion of a population should be given statins to assure that at least 80% of all persons destined to have a myocardial infarction will have received statins beforehand. The quantity 1 − *PCF*(*q*) is the proportion of cases who will not be followed if the proportion 1 − *q* of the population at lowest risk is not followed. Thus *PCF* is useful for evaluating the effectiveness of an ongoing or proposed screening or prevention program, by determining what proportion of future cases will have participated. A “high-risk strategy” requires that *q* be small, say 20% or less. Whether that strategy will be effective in covering cases depends on whether *PCF*(*q*) is large enough.

*PNF* is useful in assessing the feasibility of the program required to cover a proportion *p* of future cases. Typically, one will want *p* to be 80% or more. Whether such a *p* is feasible depends on whether *PNF*(*p*) is small enough.

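*AUC*, unlike *PCF* and *PNF*, does not measure risk concentration. For example, suppose that risk is zero in non-cases and uniformly distributed in [0, 1] in cases, and that half of the population develops disease. Then *AUC* = 1, but *PCF*(0.1) = 0.2 and *PNF*(0.9) = 0.45: because cases make up half of the population, following the 10% at highest risk can cover at most 20% of cases, and covering 90% of cases requires following 45% of the population.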
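The numbers in this example follow from *F* = μ*G* + (1 − μ)*K* with μ = 0.5. As a check, the short Python sketch below evaluates the closed forms *PCF*(*q*) = *q*/μ and *PNF*(*p*) = μ*p*, which hold only for this particular hypothetical risk distribution and only for 0 < *q* ≤ μ; `example_pcf_pnf` is an illustrative name.

```python
from typing import Tuple


def example_pcf_pnf(mu: float, q: float, p: float) -> Tuple[float, float]:
    """PCF(q) and PNF(p) when every non-case has risk 0 (K = point mass at 0),
    case risks are Uniform(0, 1) (G(r) = r), and a fraction mu are cases, so
    F(r) = mu*G(r) + (1 - mu)*K(r) = (1 - mu) + mu*r on [0, 1]."""
    if not (0 < q <= mu and 0 < p <= 1):
        raise ValueError("closed form derived for 0 < q <= mu and 0 < p <= 1")
    # (1 - q) quantile of F: solve (1 - mu) + mu*r = 1 - q for r.
    f_quantile = (mu - q) / mu
    pcf = 1 - f_quantile                      # 1 - G(F^{-1}(1 - q)), equals q / mu
    # The (1 - p) quantile of G is 1 - p, so PNF(p) = 1 - F(1 - p), which equals mu * p.
    pnf = 1 - ((1 - mu) + mu * (1 - p))
    return pcf, pnf


# Every case has risk > 0 and every non-case has risk 0, so AUC = 1 here.
print(example_pcf_pnf(mu=0.5, q=0.1, p=0.9))  # roughly (0.2, 0.45), up to float rounding
```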

By relating *PCF* and *PNF* to the Lorenz curve of the risk distribution and its inverse, we adapted and developed the theory needed for inference on these quantities, for risks from a single model applied to the distribution of covariates in an independent validation sample. We developed methods for testing whether $PCF^{1} = PCF^{2}$ or $PNF^{1} = PNF^{2}$ for two models evaluated on the same independent sample with bivariate risk estimates $(r_{1i}, r_{2i})$. Our methods allow for correlation between $r_{1i}$ and $r_{2i}$. The work of Zheng and Cushing (2001) can be used to compare two *PCF*s with dependent data, but we are unaware of similar results for *PNF*. Methods are available for comparing two Lorenz curves from independent samples (e.g. Dardanoni and Forcina, 1999). When using such tests, care should be taken to assure that each model is well calibrated. Otherwise, $PCF^{1} > PCF^{2}$ or $PNF^{1} < PNF^{2}$ may simply reflect miscalibration of one or both models. We plan to explore the sensitivity of our methods to miscalibration in future simulation studies.
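The Lorenz-curve relation can be made concrete under the assumption that the model is well calibrated, so that a person with risk *r* contributes *r* expected cases. Then *PCF*(*q*) is the share of total risk held by the fraction *q* at highest risk, $1 - L(1-q)$ with *L* the Lorenz curve, and *PNF*(*p*) is obtained from the inverse. The Python sketch below (hypothetical function names) also compares two models scored on the same people with a paired bootstrap; this resampling approach is swapped in for illustration and is not the asymptotic theory described above.

```python
import math
import random
from typing import List, Sequence, Tuple


def pcf_lorenz(risks: Sequence[float], q: float) -> float:
    """PCF(q) = 1 - L(1 - q): share of total risk held by the fraction q of
    people at highest risk. Assumes calibrated risks (expected cases = sum of risks)."""
    r = sorted(risks, reverse=True)
    k = math.ceil(len(r) * q - 1e-9)
    return sum(r[:k]) / sum(r)


def pnf_lorenz(risks: Sequence[float], p: float) -> float:
    """PNF(p): smallest fraction of people, taken in decreasing order of risk,
    whose summed risk reaches a share p of the total (inverse Lorenz curve)."""
    r = sorted(risks, reverse=True)
    total = sum(r)
    running = 0.0
    for k, value in enumerate(r, start=1):
        running += value
        if running >= p * total - 1e-12:
            return k / len(r)
    return 1.0


def paired_bootstrap_pcf_diff(
    r1: Sequence[float], r2: Sequence[float], q: float,
    n_boot: int = 2000, seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Estimate of PCF^1(q) - PCF^2(q) with a percentile 95% interval.
    Whole subjects are resampled, so each bootstrap sample keeps the pairs
    (r1[i], r2[i]) together and hence preserves their correlation."""
    rng = random.Random(seed)
    n = len(r1)
    diffs: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(pcf_lorenz([r1[i] for i in idx], q)
                     - pcf_lorenz([r2[i] for i in idx], q))
    diffs.sort()
    lower = diffs[int(0.025 * (n_boot - 1))]
    upper = diffs[int(0.975 * (n_boot - 1))]
    return pcf_lorenz(r1, q) - pcf_lorenz(r2, q), (lower, upper)
```

As the text cautions, such a comparison is only meaningful if both sets of risks are well calibrated; otherwise the sum of risks no longer estimates the number of cases.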

The ideal data for estimating *PCF* and *PNF* are from a random sample from the distribution of covariates in an independent population of interest. This sample can be used to derive the distribution of risks for one or more models. If random samples of cases (*Y* = 1) and controls (*Y* = 0) are available from a population, with corresponding distributions of risk *G* and *K*, and if disease is rare so that *K* nearly represents *F*, then $\widehat{PCF}(q) = 1 - \hat{G}\{\hat{K}^{-1}(1-q)\}$ has a simple distribution theory because $\hat{K}$ and $\hat{G}$ are independent (Greenhouse and Mantel, 1950). The same is true for $\widehat{PNF}(p) = 1 - \hat{K}\{\hat{G}^{-1}(1-p)\}$. If the disease is not rare but the disease risk μ is known, one can use $F = \mu G + (1-\mu) K$ in the expressions for *PCF* and *PNF*. However, the corresponding distribution theory has not been developed.
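For case-control data, a plug-in sketch of these estimators follows. It assumes empirical CDFs Ĝ and K̂ from the case and control samples and takes each generalized inverse as the smallest observed risk at which the CDF reaches the target level. With `mu=None` it uses the rare-disease approximation *F* ≈ *K*; with a known μ it uses the mixture. It gives point estimates only (no variances), and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Optional, Sequence


def ecdf(sample: Sequence[float]) -> Callable[[float], float]:
    """Empirical CDF: x -> fraction of the sample <= x."""
    s = sorted(sample)
    return lambda x: bisect_right(s, x) / len(s)


def population_cdf(case_risks, control_risks, mu: Optional[float]):
    """Return G, K and F: F ~ K if mu is None (rare disease),
    otherwise the mixture F = mu*G + (1 - mu)*K with known disease risk mu."""
    G, K = ecdf(case_risks), ecdf(control_risks)
    F = K if mu is None else (lambda x: mu * G(x) + (1 - mu) * K(x))
    return G, K, F


def pcf_case_control(case_risks, control_risks, q, mu=None):
    """PCF(q) = 1 - G(F^{-1}(1 - q)), with F^{-1}(u) taken as the smallest
    observed risk r such that F(r) >= u."""
    G, _, F = population_cdf(case_risks, control_risks, mu)
    pooled = sorted(set(case_risks) | set(control_risks))
    threshold = next(r for r in pooled if F(r) >= (1 - q) - 1e-12)
    return 1 - G(threshold)


def pnf_case_control(case_risks, control_risks, p, mu=None):
    """PNF(p) = 1 - F(G^{-1}(1 - p)), with G^{-1}(u) the smallest case risk
    r such that G(r) >= u; intended for 0 < p < 1."""
    G, _, F = population_cdf(case_risks, control_risks, mu)
    g_quantile = next(r for r in sorted(case_risks) if G(r) >= (1 - p) - 1e-12)
    return 1 - F(g_quantile)
```

When μ is not small, the two options can give noticeably different values, because the quantiles of *K* then differ from those of *F*.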

Apart from calibration and the AUC, other criteria have been proposed to evaluate risk models. If loss functions can be specified, they can determine the optimal risk threshold for intervention, $t^{*}$, and indicate which risk model has smaller expected loss (Vickers and Elkin, 2006; Gail and Pfeiffer, 2005) or larger *PCF*(*q*) for $q = 1 - F(t^{*})$.
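As one hypothetical instance, consider a loss in which intervening on a non-case costs `cost_fp`, leaving a case without intervention costs `cost_fn`, and an intervened case incurs no loss; this simple loss is an assumption for illustration and not necessarily the one used in the cited papers. With calibrated risk *r*, intervening is preferable when (1 − *r*)·`cost_fp` < *r*·`cost_fn`, which gives the threshold `t* = cost_fp / (cost_fp + cost_fn)`. The sketch returns that threshold, *q* = 1 − *F*(*t**), the calibrated-risk *PCF* at that *q*, and the expected loss per person.

```python
from typing import Sequence, Tuple


def threshold_summary(
    risks: Sequence[float], cost_fp: float, cost_fn: float
) -> Tuple[float, float, float, float]:
    """Hypothetical two-cost loss: intervening on a non-case costs cost_fp,
    leaving a case without intervention costs cost_fn. With calibrated risk r,
    intervening is preferable when (1 - r)*cost_fp < r*cost_fn, i.e. r > t*."""
    t_star = cost_fp / (cost_fp + cost_fn)
    followed = [r for r in risks if r > t_star]
    q = len(followed) / len(risks)            # q = 1 - F(t*)
    pcf = sum(followed) / sum(risks)          # expected cases covered / all expected cases
    expected_loss = sum(
        (1 - r) * cost_fp if r > t_star else r * cost_fn for r in risks
    ) / len(risks)
    return t_star, q, pcf, expected_loss
```

Two models scored on the same people can then be compared on the expected loss (smaller is better) or on the returned *PCF*.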

Cook (2007) proposed reclassification criteria based on risk thresholds, and Pencina et al. (2008) proposed a related measure, the net reclassification index. Another criterion recommended by Pencina et al., the integrated discrimination improvement, is not tied to a particular risk threshold and can be regarded as a global criterion, like the AUC.
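For reference, the sketch below computes the two-category net reclassification index at a single risk threshold and the integrated discrimination improvement, following their usual definitions: NRI = [P(up | case) − P(down | case)] − [P(up | non-case) − P(down | non-case)], and IDI = the increase in mean predicted risk among cases minus the increase among non-cases. The function name `nri_idi` and the single-threshold setting are illustrative assumptions.

```python
from typing import Sequence, Tuple


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def nri_idi(old: Sequence[float], new: Sequence[float],
            outcomes: Sequence[int], threshold: float) -> Tuple[float, float]:
    """Two-category NRI at one risk threshold, and the IDI, for an old and a new model."""
    cases = [i for i, y in enumerate(outcomes) if y == 1]
    noncases = [i for i, y in enumerate(outcomes) if y == 0]

    def net_up(group):
        # Moved above the threshold minus moved below it, as a fraction of the group.
        up = sum(1 for i in group if old[i] < threshold <= new[i])
        down = sum(1 for i in group if new[i] < threshold <= old[i])
        return (up - down) / len(group)

    nri = net_up(cases) - net_up(noncases)
    # IDI: gain in mean predicted risk among cases minus gain among non-cases.
    idi = (_mean([new[i] for i in cases]) - _mean([old[i] for i in cases])) - (
        _mean([new[i] for i in noncases]) - _mean([old[i] for i in noncases])
    )
    return nri, idi
```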

Pepe et al. (2008) point out that the ROC curve, which is a plot of 1 − *G*(*r*) against 1 − *K*(*r*) as *r* varies, suppresses information on *r*. Instead, they recommend use of the predictiveness curve, a plot of *r* versus *F*(*r*), together with another plot with two curves: 1 − *G*(*r*) versus *F*(*r*) and 1 − *K*(*r*) versus *F*(*r*). Together, these three curves summarize the information on the risk distribution in the population, *F*(*r*), and the effects of various risk thresholds *r* on *P*(*R* > *r* | *Y* = 1) and *P*(*R* > *r* | *Y* = 0).
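The three curves can be tabulated from a cohort with estimated risks and observed outcomes. The Python sketch below, with the hypothetical name `pepe_curves`, returns one row per distinct risk value *r*: (*F*(*r*), *r*, 1 − *G*(*r*), 1 − *K*(*r*)). Plotting the second column against the first gives the predictiveness curve, and plotting the third and fourth columns against the first gives the two classification curves. Empirical CDFs are assumed and no smoothing is applied.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def pepe_curves(risks: Sequence[float],
                outcomes: Sequence[int]) -> List[Tuple[float, float, float, float]]:
    """Rows (F(r), r, P(R > r | Y = 1), P(R > r | Y = 0)) for each distinct risk r."""
    n = len(risks)
    all_r = sorted(risks)
    case_r = sorted(r for r, y in zip(risks, outcomes) if y == 1)
    ctrl_r = sorted(r for r, y in zip(risks, outcomes) if y == 0)
    rows = []
    for r in sorted(set(risks)):
        f = bisect_right(all_r, r) / n                       # F(r)
        tpr = 1 - bisect_right(case_r, r) / len(case_r)      # 1 - G(r)
        fpr = 1 - bisect_right(ctrl_r, r) / len(ctrl_r)      # 1 - K(r)
        rows.append((f, r, tpr, fpr))
    return rows
```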

Huang and Pepe (2009) showed that $ROC(q) = 1 - G \circ K^{-1}(1-q)$ and that $dROC(q)/dq$ can be used to estimate the predictiveness curve and the “total gain” statistic, introduced by Bura and Gastwirth (2001), if the disease prevalence μ is known. For rare diseases, $F \approx K$, and *PCF*(*q*) approximates *ROC*(*q*), but more work is needed to prove the conjecture that *PCF*(*q*) and its derivative can be used to approximate the predictiveness curve and total gain.
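The rare-disease approximation can be seen in a hypothetical closed-form example in which risk is uniform on [0, *c*] and outcomes are drawn with probability equal to risk, so that μ = *c*/2 and *F*, *G*, and *K* have simple forms. The sketch computes $PCF(q) = 1 - G\{F^{-1}(1-q)\}$ and $ROC(q) = 1 - G\{K^{-1}(1-q)\}$ exactly; it illustrates only the approximation of *ROC*(*q*) by *PCF*(*q*), not the conjecture about derivatives.

```python
import math
from typing import Tuple


def pcf_and_roc(c: float, q: float) -> Tuple[float, float]:
    """Hypothetical example: risk R ~ Uniform(0, c) with 0 < c <= 1 and
    Y | R ~ Bernoulli(R), so the disease risk is mu = c / 2. On the scale
    x = r / c: F(x) = x, G(x) = x**2, K(x) = (x - c * x**2 / 2) / (1 - c / 2)."""
    u = 1 - q
    pcf = 1 - u ** 2                  # 1 - G(F^{-1}(1 - q)), since F^{-1}(u) = u
    # K^{-1}(u): root in [0, 1] of (c/2) x^2 - x + u (1 - c/2) = 0.
    x = (1 - math.sqrt(1 - 2 * c * u * (1 - c / 2))) / c
    roc = 1 - x ** 2                  # ROC(q) = 1 - G(K^{-1}(1 - q))
    return pcf, roc


for c in (0.02, 0.2, 1.0):            # mu = 0.01, 0.1, 0.5
    print(c / 2, pcf_and_roc(c, q=0.2))
```

In this example *PCF*(*q*) ≤ *ROC*(*q*), as expected when cases tend to have higher risks than non-cases, and the gap is small when μ is small but large when μ = 0.5.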

The risk distribution *F* is central to the evaluation of risk models, because criteria such as expected loss, measures of relative dispersion of risk such as the Gini index, the risk distributions *G* and *K*, and the AUC are functionals of *F* (Gail and Pfeiffer, 2005), and because of its value in displaying information in the “integrated predictiveness and classification” plots of Pepe et al. (2008). We believe that two functionals of particular public health interest are *PCF*(*q*) and *PNF*(*p*).
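To illustrate that such criteria are functionals of *F* alone when risks are calibrated, the sketch below computes the AUC and the Gini index from a sample of risks without observed outcomes. It relies on the calibration assumption that a person with risk *r* contributes weight *r* to the case distribution *G* and weight 1 − *r* to the non-case distribution *K*; the function names are hypothetical, and the O(n²) loops are for clarity rather than speed.

```python
from typing import Sequence


def auc_from_risks(risks: Sequence[float]) -> float:
    """AUC as a functional of F alone, assuming calibrated risks: person i
    counts as a case with weight r_i and as a non-case with weight 1 - r_i.
    Ties count one half."""
    num = 0.0
    for ri in risks:          # risk of the (weighted) case
        for rj in risks:      # risk of the (weighted) non-case
            if ri > rj:
                num += ri * (1 - rj)
            elif ri == rj:
                num += 0.5 * ri * (1 - rj)
    return num / (sum(risks) * sum(1 - r for r in risks))


def gini_from_risks(risks: Sequence[float]) -> float:
    """Gini index of the risk distribution: mean absolute difference / (2 * mean)."""
    n = len(risks)
    mean = sum(risks) / n
    return sum(abs(a - b) for a in risks for b in risks) / (2 * n * n * mean)
```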