
Biometrics. Author manuscript; available in PMC 2012 September 1.

Published in final edited form as:

Published online 2010 December 14. doi: 10.1111/j.1541-0420.2010.01523.x

PMCID: PMC3135785

NIHMSID: NIHMS252083

Biostatistics Branch, National Cancer Institute, 6120 Executive Blvd, Bethesda, MD 20892, USA

The publisher's final edited version of this article is available at Biometrics


We propose and study two criteria to assess the usefulness of models that predict risk of disease incidence for screening and prevention, or the usefulness of prognostic models for management following disease diagnosis. The first criterion, the proportion of cases followed, *PCF*(*q*), is the proportion of individuals who will develop disease who are included in the proportion *q* of individuals in the population at highest risk. The second criterion is the proportion needed to follow-up, *PNF*(*p*), namely the proportion of the general population at highest risk that one needs to follow in order that a proportion *p* of those destined to become cases will be followed. *PCF*(*q*) assesses the effectiveness of a program that follows 100*q*% of the population at highest risk. *PNF*(*p*) assesses the feasibility of covering 100*p*% of cases by indicating how much of the population at highest risk must be followed. We show the relationship of those two criteria to the Lorenz curve and its inverse, and present distribution theory for estimates of *PCF* and *PNF*. We develop new methods, based on influence functions, for inference for a single risk model, and also for comparing the *PCF*s and *PNF*s of two risk models, both of which were evaluated in the same validation data.

Statistical models that predict disease incidence, disease recurrence or mortality following disease onset have broad public health and clinical applications. Disease incidence models are available for many diseases, including coronary heart disease (Wilson et al., 1998), colorectal cancer (Freedman et al., 2009) and other cancers (http://riskfactor.cancer.gov/cancer_risk_prediction/). Disease incidence models can be used to target preventive interventions to those with high enough risks to justify an intervention that has adverse effects (Gail et al., 1999) and to identify high risk individuals for intensive screening to detect disease in its early stages. Prognostic models are useful for understanding the risk of cancer recurrence (e.g. Stephenson et al., 2006) or death following cancer diagnosis (e.g. Albertsen et al., 2005). Such information can inform choices for a treatment whose side effects may outweigh the risk of death from the initial disease.

Before a model can be recommended for practical use, its performance characteristics need to be understood. General criteria to evaluate prediction models for dichotomous outcomes include predictive accuracy, proportion of variation explained, calibration and discrimination (Gail and Pfeiffer, 2005; Hand, 1981; Pepe, 2003). Most recent validation studies have emphasized calibration and discrimination. A model is called well calibrated (or unbiased) when the predicted probabilities agree with observed risk in subsets of the population and overall. Discriminatory accuracy, which assesses how much the distributions of risk predictions differ among those who subsequently did or did not develop disease, is often defined as the area under the receiver operating characteristic (ROC) curve (*AUC*) (see Pepe, 2003, page 67). Non-parametric inference on the *AUC* is based on the Mann-Whitney-Wilcoxon test.

We propose two measures of concentration of risk that are directly relevant to public health decisions. Only if most of the total population risk is concentrated in a small high risk subset will a prevention strategy that focuses on high risk groups be effective (Rose, 1992). If risk is concentrated, *AUC* will typically be large, but the reverse is not necessarily true (see Discussion). We define the “proportion of cases followed”, *PCF*(*q*), as the proportion of cases that would be followed in a program that followed the proportion *q* of the population at highest risk. If risk is concentrated in a small proportion of the population at highest risk, then *PCF*(*q*) will be high, even for small *q*. Pharoah et al. (2002) used this quantity to measure the usefulness of a risk model, but did not give methods of inference. We propose a new complementary criterion, the proportion needed to follow-up, *PNF*(*p*), namely the proportion of the general population at highest risk that one needs to follow in order that a proportion *p* of those destined to become cases will be followed. If risk is concentrated in a small proportion of the population at highest risk, *PNF*(*p*) will be small, even for large *p*.

By recognizing the relationship of *PCF* and *PNF* with the Lorenz curve of the distribution of risks in the population, we are able to draw on theory for the Lorenz curve and its inverse to derive their distribution theory. We develop new methods based on influence functions for inference for a single risk model, and also for comparing the correlated estimates of *PCF* and *PNF* for two risk models evaluated in the same validation data. In Section 2 we present the general framework, formally define *PCF* and *PNF*, and give nonparametric estimates for *PCF* and *PNF* with their asymptotic properties. We present results for simulated data (Section 3), a data example (Section 4) and discussion (Section 5).

We present notation and methods for disease incidence models, but they also apply to models of disease prognosis. We want to predict whether an individual will be diagnosed with a particular disease over a given time period, for example five years (*Y _{i}* = 1) or not (

$$F(r)={\displaystyle {\int}_{\{x:R(x)\le r\}}{\mathit{\text{dF}}}_{X}(x)}.$$

(1)

Because *X* includes continuous variables such as age, and length of the risk projection interval, we assume throughout that *F* is continuous, even though some results only require continuity at selected quantiles of *F*. From Gail and Pfeiffer (2005), we assume that the triplet (*Y _{i}*, π

$$G(r)=P(R\le r|Y=1)=\frac{1}{\mu}{\displaystyle {\int}_{0}^{r}\mathit{\text{tdF}}(t).}$$

(2)

Similarly, the distribution of risk in non-diseased persons (*Y* = 0) is given by

$$P(R\le r|Y=0)=\frac{1}{1-\mu}{\displaystyle {\int}_{0}^{r}(1-t)\mathit{\text{dF}}(t).}$$

(3)

Suppose the risk model *R* can be applied to an entire population to assign a disease risk *r* to each person. The model-based risks are then ranked, and a proportion of those individuals with highest risks of developing disease is followed up. Depending on the specific application, follow-up could consist of a diagnostic test, screening, or a preventive intervention. For example, the risk of colorectal cancer could be computed based on a risk model, and then a proportion of those with highest risk would receive more intensive screening with colonoscopy. Another example of “follow-up” arises in high risk preventive strategies (Rose, 1992) in which a proportion of the population at highest risk is given a preventive intervention, such as statins to prevent heart disease. In the context of prognosis after disease diagnosis, those at highest risk of dying might be followed-up by giving a special treatment.

The proportion of cases followed, *PCF*(*q*), is the proportion of individuals who will develop disease who are included in the proportion *q* of individuals in the population at highest risk. Let ξ_{1−q} denote the (1−*q*)th quantile of the population distribution *F*, that is *P*(*r* > ξ_{1−q}) = 1 − *F*(ξ_{1−q}) = *q*. To obtain *PCF*(*q*) we apply (2) to the corresponding population quantile, ξ_{1−q} = *F*^{−1}(1 − *q*), and get

$$\mathit{\text{PCF}}(q)=1-G({\xi}_{1-q})=1-G\circ {F}^{-1}(1-q)=1-\frac{1}{\mu}{\displaystyle {\int}_{0}^{{\xi}_{1-q}}\mathit{\text{tdF}}(t).}$$

(4)

Note that after a change of variable,

$$G({\xi}_{1-q})=\frac{1}{\mu}{\displaystyle {\int}_{0}^{{\xi}_{1-q}}\mathit{\text{tdF}}(t)=\frac{1}{\mu}}{\displaystyle {\int}_{0}^{1-q}{F}^{-1}(x)\mathit{\text{dx}}=L(1-q),}$$

(5)

which is also the Lorenz curve *L* of *F* evaluated at 1 − *q*. Thus *PCF*(*q*) = 1 − *L*(1 − *q*), where *L*(1 − *q*) is the ratio of the total population risk below the (1 − *q*)th quantile of *F* to the total risk in the population (Gastwirth, 1971; Goldie, 1977). Figure 1 (a) plots the Lorenz curve when *F* is a Beta(1,49) distribution for which *PCF*(0.1) = 0.325.

The proportion needed to follow-up, *PNF*(*p*), is the fraction of the general population at highest risks one needs to follow to assure that a fraction *p* of those destined to become cases will be followed. Hence the quantile ξ_{1−PNF} = *F*^{−1}(1 −*PNF*) satisfies

$$1-G({\xi}_{1-\mathit{\text{PNF}}(p)})=1-G\circ {F}^{-1}(1-\mathit{\text{PNF}}(p))=p.$$

Thus, from (5),

$$\mathit{\text{PNF}}(p)=1-{L}^{-1}(1-p).$$

(6)

Using a result by Goldie (1977), who called expression (2), $G(t)={\mu}^{-1}{\displaystyle {\int}_{0}^{t}\mathit{\text{xdF}}(x)}$, the “first-order moment distribution”, we can also calculate *PNF*(*p*) as

$$\mathit{\text{PNF}}(p)=1-\mu {\displaystyle {\int}_{0}^{1-p}\{1/{G}^{-1}(x)\}\mathit{\text{dx}},\text{\hspace{1em}}0\le p}\le 1.$$

(7)

Figure 1 (b) plots the Lorenz curve and shows the corresponding value of *PNF*(0.9) = 0.591 when *F* is Beta(1,49).
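The two values quoted for the Beta(1,49) example can be checked numerically. The sketch below is only an illustration, not the authors' code: it assumes *F*(*r*) = 1 − (1 − *r*)^*b* for a Beta(1, *b*) risk distribution, so that the case distribution *G* in (2) is Beta(2, *b*) with 1 − *G*(*r*) = (1 − *r*)^*b*(1 + *br*). The function names `beta1_pcf` and `beta1_pnf` are hypothetical.

```python
def beta1_pcf(q, b):
    """PCF(q) when risks follow Beta(1, b), using equation (4) in closed form."""
    xi = 1.0 - q ** (1.0 / b)      # (1-q)th quantile of F(r) = 1 - (1-r)^b
    return q * (1.0 + b * xi)      # 1 - G(xi) = (1-xi)^b * (1 + b*xi), with (1-xi)^b = q


def beta1_pnf(p, b, tol=1e-12):
    """PNF(p) for Beta(1, b): solve 1 - G(xi) = p by bisection, then return 1 - F(xi)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        tail_cases = (1.0 - mid) ** b * (1.0 + b * mid)  # 1 - G(mid), decreasing in mid
        if tail_cases > p:
            lo = mid               # threshold too low: more than p of cases lie above it
        else:
            hi = mid
    xi = 0.5 * (lo + hi)
    return (1.0 - xi) ** b         # 1 - F(xi)


print(round(beta1_pcf(0.1, 49), 3))  # close to the 0.325 quoted for Figure 1 (a)
print(round(beta1_pnf(0.9, 49), 3))  # close to the 0.591 quoted for Figure 1 (b)
```

The closed form is specific to a first shape parameter of 1; other beta distributions need the incomplete beta function or numerical integration.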

Assume that a validation cohort contains a random sample of individuals with measured covariates. Corresponding estimates *r*_{1},…,*r _{n}* are a random sample from the continuous risk distribution

$$\widehat{\mathit{\text{PCF}}}(q)=1-{L}_{n}(1-q)=1-{S}_{[n(1-q)]}/{S}_{n}.$$

Using Goldie’s (1977) result for the inverse function of the Lorenz curve, ${L}_{n}^{-1}$, for a fixed value of 1 − *p*, the *PNF*(*p*) is estimated as

$$\widehat{\mathit{\text{PNF}}}(p)=1-{L}_{n}^{-1}(1-p)=i/n,\text{\hspace{1em}}{S}_{i}/{S}_{n}<1-p\le {S}_{i+1}/{S}_{n},\text{\hspace{1em}}i=0,\dots ,n.$$

(8)

In what follows, we often suppress the arguments *q* and *p* for ease of exposition.
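As a rough illustration of the two nonparametric estimates, the following sketch computes them from a list of risks. It assumes that *S*_{i} denotes the sum of the *i* smallest risks in the sample and that [·] is the integer part. For $\widehat{\mathit{\text{PNF}}}$ it takes the smallest top fraction of the sample whose summed risk reaches a share *p* of the total, so its boundary convention is an assumption and may differ by 1/*n* from the indexing in (8). The function names are hypothetical.

```python
import math


def pcf_hat(risks, q):
    """Empirical PCF(q) = 1 - S_[n(1-q)] / S_n, with S_i the sum of the i smallest risks."""
    r = sorted(risks)                         # ascending order statistics
    n = len(r)
    k = math.floor(n * (1.0 - q) + 1e-9)      # integer part; epsilon guards float round-off
    return 1.0 - sum(r[:k]) / sum(r)


def pnf_hat(risks, p):
    """Smallest top fraction of the sample whose summed risk is at least p of the total."""
    r = sorted(risks, reverse=True)           # highest risk first
    total = sum(r)
    running = 0.0
    for m, value in enumerate(r, start=1):
        running += value
        if running >= p * total:
            return m / len(r)
    return 1.0


sample = [0.01, 0.02, 0.03, 0.04, 0.10]       # made-up risks, for illustration only
print(pcf_hat(sample, 0.2))                   # about 0.5: the top person carries half the risk
print(pnf_hat(sample, 0.9))                   # 0.8: four of five people are needed
```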

For continuous *F* with finite mean, *L _{n}*(1−

$$\sqrt{n}\{\widehat{\mathit{\text{PCF}}}(q)-\mathit{\text{PCF}}(q)\}=\sqrt{n}\{L(1-q)-{L}_{n}(1-q)\}\to N(0,{\sigma}_{\mathit{\text{PCF}}}^{2}).$$

(9)

In the Web Appendix we derive the influence function ϕ(*r*) for *L _{n}*,

$$\varphi (r)=-\frac{1}{\mu}(r-\mu )L(q)-\frac{1}{\mu}[{F}^{-1}(q)-r]I\{r\le {F}^{-1}(q)\}+\frac{1}{\mu}{F}^{-1}(q)q-L(q).$$

(10)

and use it to compute ${\sigma}_{\mathit{\text{PCF}}}^{2}=E{\varphi}^{2}$, that is given in the Web Appendix. By substituting empirical estimates *L _{n}*,

Almost sure convergence of $\widehat{\mathit{\text{PNF}}}$ to *PNF* for a given *p* holds for continuous *F* with finite mean (Goldie, 1977). For a fixed *p*, and assuming that a condition on the variation of the quantile process *F*^{−1} near 0 (equation 2.23 in Csörgö and Yu, 1999) holds,

$$\sqrt{n}\{\widehat{\mathit{\text{PNF}}}(p)-\mathit{\text{PNF}}(p)\}=\sqrt{n}\{{L}^{-1}(1-p)-{L}_{n}^{-1}(1-p)\}\to N(0,{\sigma}_{\mathit{\text{PNF}}}^{2}).$$

(11)

We evaluate ${\sigma}_{\mathit{\text{PNF}}}^{2}=E{\psi}^{2}$ by deriving the influence function

$$\psi (r)=-{L}^{-1}(1-p)+\frac{r}{{G}^{-1}(1-p)}[(1-p)-I\{r\le {G}^{-1}(1-p)\}]+I\{r\le {G}^{-1}(1-p)\},$$

(12)

and *E*ψ^{2} (Web Appendix). Substitution of empirical estimates for μ, *L*^{−1} and *G*^{−1} in equation (12) leads to $\widehat{\psi}({r}_{i})$ and ${\widehat{\sigma}}_{\mathit{\text{PNF}}}^{2}=(1/n){\displaystyle \sum {\widehat{\psi}}_{i}^{2}}$. As before, we use a bootstrap estimate of the variance in numerical studies.

Motivated by the asymptotic theory, we study 95% confidence intervals with the bootstrap variance estimates, $\widehat{\mathit{\text{PCF}}}\pm 1.96{\widehat{\sigma}}_{\mathit{\text{PCF}}}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\widehat{\mathit{\text{PNF}}}\pm 1.96{\widehat{\sigma}}_{\mathit{\text{PNF}}}$. The influence function representation is also valuable for computing the efficiency of the non-parametric estimates, compared to parametric estimates, and for comparing two models with respect to *PCF* and *PNF*, as shown next.
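A bootstrap interval of this kind might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: risks are resampled with replacement, the standard error is the standard deviation of the bootstrap replicates of $\widehat{\mathit{\text{PCF}}}$ itself (not of the √*n*-scaled estimate), and the names `pcf_hat` and `bootstrap_pcf_ci`, the seed and the simulated Beta(1,49) risks are all hypothetical.

```python
import math
import random


def pcf_hat(risks, q):
    """Empirical PCF(q): share of total risk carried by the top q of the sample."""
    r = sorted(risks)
    k = math.floor(len(r) * (1.0 - q) + 1e-9)
    return 1.0 - sum(r[:k]) / sum(r)


def bootstrap_pcf_ci(risks, q, n_boot=1000, seed=0):
    """Return (estimate, bootstrap SE, normal-theory 95% interval) for PCF(q)."""
    rng = random.Random(seed)
    n = len(risks)
    estimate = pcf_hat(risks, q)
    reps = []
    for _ in range(n_boot):
        resample = [risks[rng.randrange(n)] for _ in range(n)]  # draw n risks with replacement
        reps.append(pcf_hat(resample, q))
    mean = sum(reps) / n_boot
    se = math.sqrt(sum((x - mean) ** 2 for x in reps) / (n_boot - 1))
    return estimate, se, (estimate - 1.96 * se, estimate + 1.96 * se)


# Illustrative data only: 500 simulated risks from a Beta(1, 49) distribution.
data_rng = random.Random(1)
risks = [data_rng.betavariate(1, 49) for _ in range(500)]
print(bootstrap_pcf_ci(risks, q=0.1, n_boot=500))
```

The same resampling loop gives an interval for $\widehat{\mathit{\text{PNF}}}$ if the estimator inside it is swapped.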

*AUC* values are often used to compare the discriminatory ability of two models. The model with larger *AUC* value better separates the distributions of risk in cases and non-cases. For small values of *q* and high values of *p* that are practically relevant, risk is more concentrated at high levels for a model that has larger *PCF*(*q*) or a smaller *PNF*(*p*).

We assume that risk models 1 and 2 are evaluated in the same population, and we observe paired risk projections $({r}_{i}^{1},{r}_{i}^{2})$ for individuals *i* = 1, …, *n* in the population, from the joint distribution function of risks (*R*^{1}, *R*^{2}). We estimate $\widehat{{\mathit{\text{PCF}}}^{1}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\widehat{{\mathit{\text{PNF}}}^{1}},\text{and}\phantom{\rule{thinmathspace}{0ex}}\widehat{{\mathit{\text{PCF}}}^{2}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\widehat{{\mathit{\text{PNF}}}^{2}}$ from ${r}_{i}^{1}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}{r}_{i}^{2}$, *i* = 1, …, *n*, respectively.

To test whether, for fixed *q*, *PCF*^{1} = *PCF*^{2}, we use the statistic

$${T}_{\mathit{\text{PCF}}}=\frac{n{(\widehat{{\mathit{\text{PCF}}}^{1}}-\widehat{{\mathit{\text{PCF}}}^{2}})}^{2}}{{\widehat{V}}_{\mathit{\text{PCF}}}}.$$

(13)

We denote the influence for subject *i* for risk model 1 by ϕ_{i1}, and the corresponding influence for risk model 2 by ϕ_{i2}. The influence function representation $\sqrt{n}(\widehat{{\mathit{\text{PCF}}}^{1}}-\widehat{{\mathit{\text{PCF}}}^{2}})=\sqrt{n}{\displaystyle {\sum}_{i}({\varphi}_{1,i}-{\varphi}_{2,i})+{o}_{P}}$, and application of the central limit theorem prove asymptotic normality of $\sqrt{n}(\widehat{{\mathit{\text{PCF}}}^{1}}-\widehat{{\mathit{\text{PCF}}}^{2}})$, as *var*(ϕ_{1,i}−ϕ_{2,i}) exists because ϕ is bounded. We estimate ${\widehat{V}}_{\mathit{\text{PCF}}}$ using a bootstrap in simulations and the data example. For each bootstrap sample we draw

$$\widehat{\mathit{\text{var}}}({\widehat{\varphi}}_{1}-{\widehat{\varphi}}_{2})=n{\displaystyle \sum _{i=1}^{n}{({\widehat{\varphi}}_{i1}-{\widehat{\varphi}}_{i2})}^{2},}$$

(14)

where ${\widehat{\varphi}}_{1}$ and ${\widehat{\varphi}}_{2}$ are obtained as in Section 2.3. Using asymptotic normality, 95% confidence intervals for the difference can be computed as $(\widehat{{\mathit{\text{PCF}}}^{1}}-\widehat{{\mathit{\text{PCF}}}^{2}})\pm 1.96{({\widehat{V}}_{\mathit{\text{PCF}}})}^{1/2}$. Alternatively, confidence intervals for the differences could be computed based on the 0.025th and 0.975th quantiles from the bootstrap distribution. Similar remarks apply to

$${T}_{\mathit{\text{PNF}}}=\frac{n{(\widehat{{\mathit{\text{PNF}}}^{1}}-\widehat{{\mathit{\text{PNF}}}^{2}})}^{2}}{{\widehat{V}}_{\mathit{\text{PNF}}}},$$

(15)

where ${\widehat{V}}_{\mathit{\text{PNF}}}$ estimates

Asymptotically both *T _{PCF}* and
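The statistic in (13) could be computed along the following lines. This is a sketch under assumptions that are not taken from the text: the pairs of risks are resampled jointly with replacement so that the correlation between the two models is preserved, ${\widehat{V}}_{\mathit{\text{PCF}}}$ is taken to be *n* times the bootstrap variance of the difference, and the statistic is compared with 3.84, the 0.95 quantile of a chi-square distribution with one degree of freedom. The function names and the simulated risks are hypothetical.

```python
import math
import random


def pcf_hat(risks, q):
    """Empirical PCF(q): share of total risk carried by the top q of the sample."""
    r = sorted(risks)
    k = math.floor(len(r) * (1.0 - q) + 1e-9)
    return 1.0 - sum(r[:k]) / sum(r)


def paired_pcf_test(r1, r2, q, n_boot=1000, seed=0):
    """Return (difference, V_hat, T) for comparing PCF(q) of two models on the same subjects."""
    rng = random.Random(seed)
    n = len(r1)
    diff = pcf_hat(r1, q) - pcf_hat(r2, q)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample subjects, keeping pairs intact
        reps.append(pcf_hat([r1[i] for i in idx], q) - pcf_hat([r2[i] for i in idx], q))
    mean = sum(reps) / n_boot
    var_diff = sum((d - mean) ** 2 for d in reps) / (n_boot - 1)
    v_hat = n * var_diff                             # variance of sqrt(n) * difference
    t_stat = n * diff ** 2 / v_hat                   # undefined if the two models are identical
    return diff, v_hat, t_stat


# Illustrative data only: model 2 is a noisier, correlated version of model 1.
data_rng = random.Random(2)
model1 = [data_rng.betavariate(1, 49) for _ in range(500)]
model2 = [0.5 * r + 0.5 * data_rng.betavariate(1, 49) for r in model1]
diff, v_hat, t_stat = paired_pcf_test(model1, model2, q=0.1, n_boot=500)
print(diff, v_hat, t_stat, t_stat > 3.84)
```

Resampling subjects rather than each model's risks separately is what lets the variance estimate reflect the dependence between the two sets of predictions.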

We used simulations to investigate the properties of the nonparametric estimates $\widehat{\mathit{\text{PCF}}}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\widehat{\mathit{\text{PNF}}}$ defined in Section 2.3. We also used the influence functions in Section 2.2 to compute the variances of $\widehat{\mathit{\text{PCF}}}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\widehat{\mathit{\text{PNF}}}$ and to compare their efficiencies to maximum likelihood estimates (MLEs) under a fully parametric model. We assumed that observed risks *r _{i}*,

$$\mathit{\text{PCF}}(q)=1-B({F}^{-1}(1-q,\alpha ,\beta ),\alpha +1,\beta )/B(\alpha +1,\beta ),$$

and

$$\mathit{\text{PNF}}(p)=1-B({G}^{-1}(p,\alpha +1,\beta ),\alpha ,\beta )/B(\alpha ,\beta ).$$

To investigate the inefficiency of non-parametric estimates, we also estimated *PCF* and *PNF* by finding the MLEs and based on *r _{i}*,

Table 1 gives results for 1000 independent simulations each based on a random sample of size *n* = 500. Bootstrap variance estimates were based on *B* = 1000 replicates. The mean estimates of $\widehat{\mathit{\text{PCF}}}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\widehat{\mathit{\text{PNF}}}$ were very close to the theoretical values, and the coverage of the confidence intervals was close to the nominal 0.95 value. The bootstrap estimates of the standard deviations of ${n}^{1/2}\widehat{\mathit{\text{PCF}}}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}{n}^{1/2}\widehat{\mathit{\text{PNF}}}$ agreed closely with the theoretical estimates based on the influence functions. For example, for $\widehat{\mathit{\text{PCF}}}$ with (α, β) = (4.23, 207.27) and *q* = 0.10, the theoretical calculation and mean bootstrap estimates were 0.114 and 0.111 respectively. For *q* = 0.3 the theoretical and mean bootstrap estimates of standard deviation were both 0.144. Similar good agreement was found for $\widehat{\mathit{\text{PNF}}}$. Very similar results were found for parameters that yielded the same *AUC* values as Table 1, but with *E*(*R*) = 0.2 (Web Table 1).

Mean values of $\widehat{\text{PCF}}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\widehat{\text{PNF}}$ and of theoretical and bootstrap estimates of ${\{n\,\text{Var}(\widehat{\text{PCF}})\}}^{1/2}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}{\{n\,\text{Var}(\widehat{\text{PNF}})\}}^{1/2}$ with estimated coverage of 95% confidence intervals and asymptotic relative efficiency (ARE) compared to parametric analysis under the beta distribution. **...**

The model with the least discriminatory accuracy, (α, β) = (6.55, 320.95), detects only a fraction *PCF* = 0.178 of cases when the fraction *q* = 0.10 of those at highest risk is studied, whereas the model with the most discriminatory accuracy (α, β) = (1, 49), detects *PCF* = 0.325. Likewise, to follow a fraction *p* = 0.90 of cases requires that the fraction *PNF* = 0.808 of the general population at highest risk be followed for the least discriminating model and *PNF* = 0.591 for the most discriminating model studied.
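The theoretical values quoted above can be approximated without special functions. The sketch below is an illustration only: it integrates the beta density on a fine grid by the midpoint rule (a technique chosen here for self-containment, not one described in the text), accumulates *F* and the Lorenz curve *L* together, and reads off *PCF*(*q*) = 1 − *L*(1 − *q*) and *PNF*(*p*) = 1 − *L*^{−1}(1 − *p*). The function name and grid size are hypothetical.

```python
def beta_pcf_pnf(alpha, beta, q, p, cells=200_000):
    """Approximate PCF(q) and PNF(p) for Beta(alpha, beta) risks by midpoint integration."""
    h = 1.0 / cells
    mids = [(k + 0.5) * h for k in range(cells)]
    # Unnormalised beta density at each midpoint; normalising constants cancel in the ratios.
    w = [r ** (alpha - 1.0) * (1.0 - r) ** (beta - 1.0) for r in mids]
    mass = sum(w)                                    # proportional to the total probability
    moment = sum(r * wk for r, wk in zip(mids, w))   # proportional to the mean risk mu

    pcf = pnf = None
    cum_mass = cum_moment = 0.0
    for r, wk in zip(mids, w):
        cum_mass += wk
        cum_moment += r * wk
        F = cum_mass / mass                          # population distribution F(r)
        L = cum_moment / moment                      # Lorenz ordinate, equal to G(r)
        if pcf is None and F >= 1.0 - q:
            pcf = 1.0 - L                            # 1 - L(1 - q)
        if pnf is None and L >= 1.0 - p:
            pnf = 1.0 - F                            # 1 - L^{-1}(1 - p)
        if pcf is not None and pnf is not None:
            break
    return pcf, pnf


print(beta_pcf_pnf(1.0, 49.0, q=0.1, p=0.9))      # near the quoted 0.325 and 0.591
print(beta_pcf_pnf(6.55, 320.95, q=0.1, p=0.9))   # near the quoted 0.178 and 0.808
```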

The non-parametric estimates $\widehat{\mathit{\text{PCF}}}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\widehat{\mathit{\text{PNF}}}$ were less precise than corresponding parametric estimates, as assessed by the asymptotic relative efficiency (ARE), or variance ratio. For small *q*, $\widehat{\mathit{\text{PCF}}}$ has an ARE less than 0.5, or a variance more than twice as large as the parametric estimates. For example, *ARE* = 0.473 for (α, β) = (4.23, 207.27) for *q* = 0.10. The AREs of $\widehat{\mathit{\text{PNF}}}$ compared to the MLE ranged from 0.682 to 0.895 over the parameters and choices of *q* studied. Even though the non-parametric procedures are less efficient, they yield consistent estimates, whereas incorrectly specified parametric methods need not.

We examined the size and power of tests *T _{PCF}* in equation (13) and

The estimated correlations for the risk estimates (*r*_{i1}, *r*_{i2}) generated under this mechanism are close to the correlation used in the bivariate normal model (Table 2). For ρ = 0.25 the estimated correlations ranged from 0.19 to 0.29, and for ρ = 0.50 the estimated correlations ranged from 0.41 to 0.54 (Table 2).

Power of the tests T_{PCF} and T_{PNF} to compare PCF(q) and PNF(p) for two beta-distributed risk models, R_{1} with parameters (α_{1}, β_{1}) and R_{2} with parameters (α_{2}, β_{2}) for q = 0.1 and p = 0.9 for sample size n = 500 and $c={\chi}_{}^{}$ **...**

Table 2 shows the size for *T _{PCF}* and

We compared the theoretical power *P _{t}*(

For a few sets of parameters we compared the power of *T _{PCF}* and

Freedman et al. (2009) developed a model to predict colorectal cancer (CRC) incidence. We now compute *PCF* and *PNF* for that model in independent data on 108,016 women from the National Institutes of Health (NIH)-AARP Diet and Health Study, a prospective cohort study (Park et al., 2009).

Let *T* denote the time to CRC. An individual’s CRC risk over the age interval (*a*, *b*) is $r(x)=P(T\in (a,b)|T>a)={\displaystyle {\int}_{a}^{b}{\lambda}_{1}(t,X){S}^{*}(t-)\mathit{\text{dt}},}\phantom{\rule{thinmathspace}{0ex}}\text{where}\phantom{\rule{thinmathspace}{0ex}}{S}^{*}(t)=\text{exp}[-{\displaystyle {\int}_{0}^{t}\{{\lambda}_{1}(u,X)+}{\lambda}_{2}(u)\}\mathit{\text{du}}]$. Here λ_{1} denotes the CRC hazard and λ_{2} the hazard for competing causes of death. As CRC incidence differed among three different sites in the colon, Freedman et al. modeled λ_{1} = λ_{11} + λ_{12} + λ_{13}, as the sum of proximal (λ_{11}), distal (λ_{12}) and rectal (λ_{13}) colon cancer hazard rates, with ${\lambda}_{1i}={\mathit{\text{RR}}}_{i}(t,X){\lambda}_{1i}^{*}(t)$, *i* = 1, 2, 3, where ${\lambda}_{1i}^{*}$ denotes the age specific baseline hazard rate and *RR _{i}* the relative risk, which was estimated from case-control data. The ${\lambda}_{1i}^{*}(t)\mathrm{s}$ were obtained by multiplying the composite hazards from SEER cancer registries by one minus the attributable risk, which was estimated from the relative risks and the risk factor distribution in cases in the case-control study. Risk factors included an individual’s age, sex, sigmoidoscopy/colonoscopy history, polyps and other factors.

While the model was well calibrated overall among women in the AARP Diet and Health Study it showed lack of fit in some subgroups, possibly because the AARP questionnaire assessed sigmoidoscopy/colonoscopy history differently (Park et al., 2009). Because good estimates of *PCF* and *PNF* require good model calibration, we recalibrated the model. We first split the women in the AARP cohort randomly into a training and a test set, each comprised of 54,008 women, with 458 CRC cases in the training set and 467 in the test set. Following Cox (1958), we fit a logistic regression model with the log-transformed absolute risk estimate *r* as the independent variable to the training set. We then used logit(*r _{c}*) = −4.7019 + 0.5335 log(

In the test data, the model *r _{c}* had $\widehat{\mathit{\text{AUC}}}=0.624$. The modest discriminatory ability was also reflected in $\widehat{\mathit{\text{PCF}}}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\widehat{\mathit{\text{PNF}}}$. For example, for

Estimates $\widehat{\text{PCF}}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\widehat{\text{PNF}}$ (with 95% confidence intervals in parentheses) based on the observed distribution F of risks for colorectal cancer in AARP women. The corresponding observed proportion of cases PCF* (with 95% confidence interval) found among the **...**

We then compared the previous model (model 1) to model 2 that used a single hazard, ${\lambda}_{1}(t,X)=\mathit{\text{RR}}(X,t){\lambda}_{1}^{*}(t)$. Model 2 was fit to the original case-control data in Freedman et al. We applied the same recalibration to model 2, resulting in a logistic model, $\text{logit}({r}_{c}^{(2)})=-4.6959+0.5206\phantom{\rule{thinmathspace}{0ex}}\text{log}({r}_{2})$. Estimates ${r}_{c}^{(2)}$ were well calibrated in the test set.

Model 2 had ${\widehat{\mathit{\text{AUC}}}}_{2}=0.622$ in the test data, which was not significantly different from ${\widehat{\mathit{\text{AUC}}}}_{1}=0.624$ (*p* = 0.23). For *q* = 0.1, model 2 yielded a significantly lower ${\widehat{\mathit{\text{PCF}}}}_{2}=0.168$, 95%*CI*(0.167, 0.168) than model 1 (${\widehat{\mathit{\text{PCF}}}}_{1}=0.170$, *p* < 0.0001) (Table 3). Likewise, for *p* = 0.9, model 2 resulted in a significantly larger ${\widehat{\mathit{\text{PNF}}}}_{2}=0.811$, 95%*CI*(0.810, 0.811) than model 1 (${\widehat{\mathit{\text{PNF}}}}_{1}=0.808$, *p* < 0.0001). These small differences are statistically significant because the estimates are based on 54,008 women in the test data, resulting in high precision.

Even though model 1 is statistically significantly superior to model 2, the practical improvement is small. For example, to assure that 90% of future cases are screened in the test cohort, model 1 would require screening of 0.808 × 54,008 = 43,638 women, compared to 0.811 × 54,008 = 43,800 for model 2.
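The screening counts in this comparison follow directly from multiplying each estimated *PNF* by the size of the test cohort. A small check, with the hypothetical helper name `women_to_screen`, is:

```python
def women_to_screen(pnf, cohort_size):
    """Number of people to screen when a fraction pnf of the cohort must be followed."""
    return round(pnf * cohort_size)


n_test = 54_008                               # women in the test set
model1 = women_to_screen(0.808, n_test)
model2 = women_to_screen(0.811, n_test)
print(model1, model2, model2 - model1)        # 43638 43800 162
```

On these figures, model 1 spares 162 of roughly 43,800 screenings, which is the sense in which the practical gain is small.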

We considered two criteria for evaluating risk models that measure the concentration of risk in a population and that have potential application in public health and clinical epidemiology. *PCF*(*q*), the proportion of cases who will be followed if the proportion *q* of the general population at highest risk is followed, has been recommended by Pharoah et al. (2002). Hand and Henley (1997) used the “bad risk rate amongst accepts”, which is analogous to 1−*PCF*, in credit risk models, but neither Pharoah et al. nor Hand and Henley described methods of inference. For models of disease incidence, *PCF* can determine the proportion of those destined to develop disease who will receive screening or preventive interventions. Following disease diagnosis, it can predict the proportion of all patients destined to have a bad outcome among patients selected for a treatment based on high risk. We introduced the criterion *PNF*(*p*), namely the proportion of the population at highest risk that needs to be followed in order that a proportion *p* of cases be followed, as a complementary guide to screening or intervention applications. *PNF* could be adapted to high risk preventive interventions (Rose, 1992), such as deciding what proportion of a population should be given statins to assure that at least 80% of all persons destined to have a myocardial infarction shall have received statins beforehand. The quantity 1−*PCF*(*q*) assesses what proportion of cases will fail to be followed if a proportion 1−*q* at lowest risk is not followed. Thus, *PCF* is useful for evaluating the effectiveness of an ongoing or proposed screening or prevention program by determining what proportion of future cases shall have participated. A “high risk strategy” requires that *q* be small, say 20% or less. Whether that strategy will be effective in covering cases depends on whether *PCF*(*q*) is large enough. *PNF* is useful in assessing the feasibility of the program required to cover a proportion *p* of future cases. Typically, one will want *p* to be 80% or more. Whether such a *p* is feasible depends on whether *PNF*(*p*) is small enough.

*AUC*, unlike *PCF* and *PNF*, does not measure risk concentration. For example, suppose that risk is zero in non-cases and uniformly distributed in [0, 1] in cases, and that half of the population develops disease. Then *AUC* = 1, but *PCF*(0.1) = 0.2 and *PNF*(0.9) = 0.45.
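This toy example can be reproduced on a finite population. The sketch below is illustrative only and uses observed case status directly (the proportion of cases in the top *q* of the population, and the smallest top fraction containing a share *p* of cases) rather than the Lorenz-curve estimators; the *AUC* is computed as the Mann-Whitney probability. The function name and the choice of 200 cases are hypothetical.

```python
def toy_population_summary(n_cases=200, q=0.1, p=0.9):
    """AUC, case-based PCF(q) and PNF(p) for the toy population described in the text."""
    # Half the population are non-cases with risk 0; the other half are cases
    # whose risks are spread evenly over (0, 1).
    noncase_risks = [0.0] * n_cases
    case_risks = [(k + 0.5) / n_cases for k in range(n_cases)]
    people = [(r, 0) for r in noncase_risks] + [(r, 1) for r in case_risks]
    people.sort(key=lambda pair: pair[0], reverse=True)   # highest risk first
    n = len(people)

    # PCF(q): share of all cases found among the top q of the population.
    top = people[: round(q * n)]
    pcf = sum(y for _, y in top) / n_cases

    # PNF(p): smallest top fraction of the population containing a share p of cases.
    needed = round(p * n_cases)
    covered, pnf = 0, 1.0
    for m, (_, y) in enumerate(people, start=1):
        covered += y
        if covered >= needed:
            pnf = m / n
            break

    # AUC: probability that a random case outranks a random non-case (ties count 1/2).
    wins = sum(1.0 if rc > rn else 0.5 if rc == rn else 0.0
               for rc in case_risks for rn in noncase_risks)
    auc = wins / (len(case_risks) * len(noncase_risks))
    return auc, pcf, pnf


print(toy_population_summary())   # (1.0, 0.2, 0.45)
```

Perfect separation of cases from non-cases gives *AUC* = 1, yet the top tenth of the population still contains only a fifth of the cases.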

By relating *PCF* and *PNF* to the Lorenz curve of the risk distribution and its inverse, we adapted and developed the theory needed for inference on these quantities for risks from a single model applied to the distribution of covariates in an independent validation sample. We developed methods for testing whether *PCF*^{1} = *PCF*^{2} or *PNF*^{1} = *PNF*^{2} for two models evaluated on the same independent sample with bivariate risk estimates (*r*_{1i}, *r*_{2i}). Our methods allow for correlations between *r*_{1i} and *r*_{2i}. The work of Zheng and Cushing (2001) can be used to compare two *PCF*s with dependent data, but we are unaware of similar results for *PNF*. Methods are available for comparing two Lorenz curves from independent samples (e.g. Dardanoni and Forcina, 1999). When using such tests, care should be taken to assure that each model is well calibrated. Otherwise, *PCF*^{1} > *PCF*^{2} or *PNF*^{1} < *PNF*^{2} may simply reflect miscalibration of one or both models. We plan to explore the sensitivity of our methods to miscalibration in further simulations in future research.

The ideal data for estimating *PCF* and *PNF* are from a random sample from the distribution of covariates in an independent population of interest. This sample can be used to derive the distribution of risks for one or more models. If random samples of cases (*Y* = 1) and controls (*Y* = 0) are available from a population, with corresponding distributions of risk *G* and *K*, and if disease is rare so that *K* nearly represents *F*, then $\widehat{\mathit{\text{PCF}}}=1-\widehat{G}\circ {\widehat{K}}^{-1}(1-q)$ has a simple distribution theory because $\widehat{K}$ and *Ĝ* are independent (Greenhouse and Mantel, 1950). The same is true for $\widehat{\mathit{\text{PNF}}}=1-\widehat{K}\circ {\widehat{G}}^{-1}(1-p)$. If the disease is not rare but the disease risk μ is known, one can use *F* = μ*G* + (1−μ)*K* in expressions for *PCF* and *PNF*. However, the corresponding distribution theory is not developed.

Other criteria have been proposed to evaluate risk models, apart from calibration and the AUC. If loss functions can be specified, they can determine the optimal risk threshold for intervention, *t**, and indicate which risk model has smaller expected losses (Vickers and Elkin, 2006; Gail and Pfeiffer, 2005) or larger *PCF*(*q*) for *q* = 1 − *F*(*t**). Cook (2007) proposed reclassification criteria based on risk thresholds, and Pencina et al. (2008) proposed a related measure, the net reclassification index. Another criterion they recommended, the integrated discrimination improvement, is not tied to a particular risk threshold, and can be regarded as a global criterion, like the AUC. Pepe et al. (2008) point out that the ROC curve, which is a plot of 1−*G*(*r*) against 1−*K*(*r*) as *r* varies, suppresses information on *r*. Instead, they recommend use of the predictiveness curve, a plot of *r* versus *F*(*r*), together with another plot with two curves: 1−*G*(*r*) versus *F*(*r*) and 1−*K*(*r*) versus *F*(*r*). Together, these three curves summarize the information on the risk distribution in the population, *F*(*r*), and the effects of various risk thresholds *r* on *P*(*R* > *r*|*Y* = 1) and *P*(*R* > *r*|*Y* = 0).

Huang and Pepe (2009) showed that *ROC*(*q*) = 1−*G*○*K*^{−1}(1−*q*) and that *dROC*(*q*)/*dq* can be used to estimate the predictiveness curve and the “total gain” statistic, introduced by Bura and Gastwirth (2001), if the disease prevalence μ is known. For rare diseases, *F* ≈ *K*, and *PCF*(*q*) approximates *ROC*(*q*), but more work is needed to prove the conjecture that *PCF*(*q*) and its derivative can be used to approximate the predictiveness curve and total gain.

The risk distribution *F* is central to the evaluation of risk models, because criteria such as expected loss, measures of relative dispersion of risk such as the Gini index, risk distributions *G* and *K*, and the AUC are functionals of *F* (Gail and Pfeiffer, 2005), and because of its value in displaying information in the “integrated predictiveness and classification” plots of Pepe et al. (2008). We believe that two functionals of particular public health interest are *PCF*(*q*) and *PNF*(*p*).

We thank the reviewers for many helpful suggestions.

**Supplementary Materials**

The Web Appendix referenced in Sections 2 and 3 is available under the Paper Information link at the Biometrics website http://www.tibs.org/biometrics.

- Albertsen PC, Hanley JA, Fine J. 20-year outcomes following conservative management of clinically localized prostate cancer. Journal of the American Medical Association. 2005;293:2095–2101. [PubMed]
- Bura E, Gastwirth JL. The binary regression quantile plot: Assessing the importance of predictors in binary regression visually. Biometrical Journal. 2001;43:5–21.
- Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation. 2007;115:928–935. [PubMed]
- Cox DR. Two Further Applications of a Model for Binary Regression. Biometrika. 1958;45:562–565.
- Csörgő M, Yu H. Weak approximations for empirical Lorenz curves and their Goldie inverses of stationary observations. Advances in Applied Probability. 1999;31:698–719.
- Dardanoni V, Forcina A. Inference For The Lorenz Curve Ordering. Econometrics Journal. 1999;2:48–74.
- Freedman AN, Slattery ML, Ballard-Barbash R, Willis G, Cann BJ, Pee D, Gail MH, Pfeiffer RM. Colorectal cancer risk prediction tool for white men and women without known susceptibility. Journal of Clinical Oncology. 2009;27:686–693. [PMC free article] [PubMed]
- Gail MH, Gastwirth JL. Scale-free goodness-of-fit test for exponential distribution based on Lorenz curve. Journal of the American Statistical Association. 1978;73:787–793.
- Gail MH, Costantino JP, Bryant J, Croyle R, Freedman L, Helzlsouer K, Vogel V. Weighing the risks and benefits of tamoxifen treatment for preventing breast cancer. Journal of the National Cancer Institute. 1999;91:1829–1846. [PubMed]
- Gail MH, Pfeiffer RM. On criteria for evaluating models of absolute risks. Biostatistics. 2005;6:227–239. [PubMed]
- Gastwirth JL. A General Definition of the Lorenz Curve. Econometrica. 1971;39:1037–1039.
- Goldie CM. Convergence theorems for empirical Lorenz curves and their inverses. Advances in Applied Probability. 1977;9:765–791.
- Greenhouse SW, Mantel N. The evaluation of diagnostic tests. Biometrics. 1950;6:399–412. [PubMed]
- Hand DJ. Discrimination and classification. Chichester: Wiley; 1981.
- Hand DJ, Henley WE. Statistical classification methods in consumer credit scoring: a review. Journal of the Royal Statistical Society Series A. 1997;160:523–541.
- Huang Y, Pepe MS. A parametric ROC model-based approach for evaluating the predictiveness of continuous markers in case-control studies. Biometrics. 2009;65:1133–1144. [PMC free article] [PubMed]
- Huber PJ. Robust Statistical Procedures. Society for Industrial and Applied Mathematics; 1996.
- Park Y, Freedman AN, Gail MH, Pee D, Hollenbeck A, Schatzkin A, Pfeiffer RM. Validation of a Colorectal Cancer Risk Prediction Model Among White Patients Age 50 Years and Older. Journal of Clinical Oncology. 2009;27:694–698. [PMC free article] [PubMed]
- Pencina MJ, D’Agostino RB Sr, D’Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond. Statistics in Medicine. 2008;27(2):157–172. [PubMed]
- Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford Statistical Science Series: Oxford University Press; 2003.
- Pepe MS, Feng Z, Huang Y, Longton G, Prentice R, Thompson IM, Zheng Y. Integrating the predictiveness of a marker with its performance as a classifier. American Journal of Epidemiology. 2008;167:362–368. [PMC free article] [PubMed]
- Pharoah PDP, Antoniou A, Bobrow M, Zimmern RL, Easton DF, Ponder BAJ. Polygenic susceptibility to breast cancer and implications for prevention. Nature Genetics. 2002;31:33–36. [PubMed]
- Rose G. The Strategy of Preventive Medicine. Oxford Medical Publications, Oxford University Press; 1992.
- Stephenson AJ, Scardino PT, Eastham JA, Bianco FJ, Dotan ZA, Fearn PA, Kattan MW. Preoperative nomogram predicting the 10-year probability of prostate cancer recurrence after radical prostatectomy. Journal of the National Cancer Institute. 2006;98:715–717. [PMC free article] [PubMed]
- Vickers AJ, Elkin EB. Decision curve analysis: A novel method for evaluating prediction models. Medical Decision Making. 2006;26:565–574. [PMC free article] [PubMed]
- Wilson PWF, D’Agostino RB, Levy D, Belanger AM, Silbershatz H, Kannel WB. Prediction of coronary heart disease using risk factor categories. Circulation. 1998;97:1837–1847. [PubMed]
- Zheng BH, Cushing BJ. Statistical inference for testing inequality indices with dependent samples. Journal of Econometrics. 2001;101:315–335.
