

Article sections

- Abstract
- 1 Introduction
- 2 Estimating the ROC Curve
- 3 Sampling Variability
- 4 The roccurve Command
- 5 Summary Indices
- 6 The comproc Command
- 7 Remarks
- References


Stata J. Author manuscript; available in PMC 2010 March 1.

Published in final edited form as:

Stata J. 2009 March 1; 9(1): 1.

PMCID: PMC2774909

NIHMSID: NIHMS90148

Fred Hutchinson Cancer Research Center, Seattle, Washington, USA, Email: mspepe@u.washington.edu


The receiver operating characteristic (ROC) curve displays the capacity of a marker or diagnostic test to discriminate between two groups of subjects, cases versus controls. We present a comprehensive suite of Stata commands for performing ROC analysis. Nonparametric, semiparametric, and parametric estimators are calculated. Comparisons between curves are based on the area or partial area under the ROC curve. Alternatively, pointwise comparisons between ROC curves or inverse ROC curves can be made. Options to adjust these analyses for covariates and to perform ROC regression are described in a companion article. We use a unified framework by representing the ROC curve as the distribution of the marker in cases after standardizing it to the control reference distribution.

The receiver operating characteristic (ROC) curve displays the discriminatory capacity of a marker or test. Suppose *D* = 0 denotes controls and *D* = 1 denotes cases, and assume without loss of generality that larger values of *Y* are more indicative of a subject being a case. The ROC curve for a marker, *Y*, is a plot of the true positive rate TPR(*c*) = *P*[*Y* ≥ *c*|*D* = 1] versus the false positive rate FPR(*c*) = *P*[*Y* ≥ *c*|*D* = 0] for the thresholding criterion ‘*Y* ≥ *c*’, where *c* varies from −∞ to ∞. It is a monotone increasing function in the unit square, tied down at the boundary points (0,0) and (1,1). A perfect classifier has an ROC curve that rises steeply along the left axis to the point (FPR = 0, TPR = 1), while an uninformative marker has an ROC curve that is the diagonal 45° line. Key attributes of the ROC curve are that (i) it does not depend on the measurement units of *Y* and, more generally, is invariant to monotone increasing transformations of *Y*; (ii) it provides a common scale for comparing the performance of different markers; and (iii) it displays the range of performance levels that can be achieved by varying the threshold.
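To make the definition concrete, here is a minimal Python sketch (not part of the Stata package described here) that computes the empirical (FPR, TPR) pairs for the rule ‘*Y* ≥ *c*’ at every observed threshold. The function name `empirical_roc_points` and the simulated normal data are illustrative assumptions.

```python
import numpy as np

def empirical_roc_points(y, d):
    """Return (fpr, tpr) for the rule 'Y >= c', with c running over the
    distinct observed marker values from largest to smallest."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    cases, controls = y[d == 1], y[d == 0]
    thresholds = np.unique(y)[::-1]                # descending thresholds
    tpr = np.array([(cases >= c).mean() for c in thresholds])
    fpr = np.array([(controls >= c).mean() for c in thresholds])
    # prepend the (0, 0) corner, which corresponds to c = +infinity
    return np.r_[0.0, fpr], np.r_[0.0, tpr]

# Hypothetical data: 50 controls ~ N(0, 1) followed by 50 cases ~ N(1, 1)
rng = np.random.default_rng(1)
y = np.r_[rng.normal(0.0, 1.0, 50), rng.normal(1.0, 1.0, 50)]
d = np.r_[np.zeros(50), np.ones(50)]
fpr, tpr = empirical_roc_points(y, d)
```

Because only the ordering of the values enters the calculation, replacing `y` by `np.exp(y)` leaves `fpr` and `tpr` unchanged, which is attribute (i) above.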

Figure 1 shows empirical ROC curves for 2 pancreatic cancer biomarkers (Wieand, Gail, James, et al. 1989). The data can be downloaded from the Diagnostic and Biomarker Statistical Center website (http://www.fhcrc.org/labs/pepe/dabs/), or loaded directly into a Stata session:

*Figure 1.* Non-parametric ROC curves for two markers of pancreatic cancer. 90% confidence intervals for ROC(0.2) are displayed.

Let $F$ denote the right continuous cumulative distribution of $Y$ in the control population, $F(y) = P(Y < y \mid D = 0)$. We define a standardization of $Y$ for the $i$th subject with marker value $Y_i$ as

$$\text{pv}_i = F(Y_i),$$

which is the proportion of the control population with values below $Y_i$. In lay terms, $\text{pv}_i$ is the percentile of $Y_i$ in the control reference distribution.

*Result 1.* The ROC curve is the cumulative distribution of $1 - \text{pv}_D$,

$$\text{ROC}(f)=P[1-\text{pv}_D\le f],$$

where $\text{pv}_D$ denotes the standardized marker for a case.

*Proof.* Let $y$ be a marker threshold and note that the corresponding false positive rate $f$ satisfies $F(y) = 1 - f$. Let $Y_D$ denote the marker value from a random case. If the control distribution of $Y$ is continuous, then

$$\begin{aligned}\text{ROC}(f) &\equiv P[Y_D\ge y]\\ &=P[F(Y_D)\ge F(y)]\\ &=P[\text{pv}_D\ge 1-f]=P[1-\text{pv}_D\le f].\end{aligned}$$

If $F$ has discrete mass points, this proof also holds when $y$ is a mass point. If $y$ is not a mass point but $y^-$ and $y^+$ are the closest values, $y^- < y < y^+$, then $f = 1 - F(y^+)$ and $\text{ROC}(f) = P[Y_D >$

The representation in Result 1 suggests that ROC curve estimation can be accomplished in two steps:

- Estimate the reference cumulative distribution function (CDF), $F$, using controls, and calculate the corresponding standardized marker values for cases; and
- Estimate the cumulative distribution of the standardized marker values for cases.
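The two steps can be sketched in a few lines of Python. This sketch assumes the empirical control CDF with $F(y) = P(Y < y \mid D = 0)$, that is, the proportion of controls strictly below $y$, and the empirical CDF for the distribution of $1 - \text{pv}_D$; the helper names `percentile_values` and `roc_from_pv` are hypothetical and not part of `roccurve`.

```python
import numpy as np

def percentile_values(y_cases, y_controls):
    """Step 1: standardize each case marker to the empirical control
    distribution, pv = proportion of controls strictly below the case value."""
    controls = np.sort(np.asarray(y_controls, dtype=float))
    # side="left" counts the controls that are < each case value
    return np.searchsorted(controls, y_cases, side="left") / controls.size

def roc_from_pv(pv, f):
    """Step 2: empirical CDF of 1 - pv, evaluated at FPR value(s) f."""
    pv = np.asarray(pv, dtype=float)
    f = np.atleast_1d(np.asarray(f, dtype=float))
    return np.array([(1.0 - pv <= fk).mean() for fk in f])

# Hypothetical data: 40 controls ~ N(0, 1), 30 cases ~ N(1, 1)
rng = np.random.default_rng(2)
y_controls = rng.normal(0.0, 1.0, 40)
y_cases = rng.normal(1.0, 1.0, 30)
pv = percentile_values(y_cases, y_controls)
roc_02 = roc_from_pv(pv, 0.2)[0]   # estimated TPR at FPR = 0.2
```

For each FPR value the result equals the largest TPR achievable by any threshold whose FPR does not exceed that value, so the two-step estimate reproduces the empirical ROC curve obtained by sweeping thresholds directly.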

The empirical estimator of the control reference distribution can be employed; alternatively, a parametric model can be assumed. The `roccurve` command allows one to use either the empirical method or a normal parametric distribution model.

Marker values for cases are standardized using the estimator $\widehat{F}$. Write the standardized values as

$$\widehat{\text{pv}}_{Di}=\widehat{F}(Y_{Di}),\quad i=1,\dots,n_D,$$

where $n_D$ is the number of case observations.

The next step is to estimate the CDF of $1 - \text{pv}_D$, denoted by $H$. Either the empirical CDF can be used or a parametric model of the form

$$H(f)=g({\alpha}_{0}+{\alpha}_{1}{g}^{-1}(f))$$

where $g$ is a CDF. Observe that this form respects the restriction of the domain of $H$ to (0, 1). As a special case, when $g = \Phi$, the standard normal CDF, the corresponding ROC curve is binormal (Dorfman and Alf, 1969),

$$\text{ROC}(f)=H(f)=\mathrm{\Phi}({\alpha}_{0}+{\alpha}_{1}{\mathrm{\Phi}}^{-1}(f)).$$

The
`roccurve` command also allows the logistic form, *g*(·) = exp(·)/(1 + exp(·)), which gives rise to a bilogistic ROC curve (Ogilvie and Creelman, 1968).
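Both parametric forms share the structure $H(f) = g(\alpha_0 + \alpha_1 g^{-1}(f))$. The following sketch evaluates the binormal (probit) and bilogistic (logit) curves; the function name `roc_glm_curve` is hypothetical. With $\alpha_0 = 0$ and $\alpha_1 = 1$ both reduce to the uninformative diagonal $H(f) = f$.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}

def roc_glm_curve(f, alpha0, alpha1, link="probit"):
    """Evaluate H(f) = g(alpha0 + alpha1 * g^{-1}(f)) for 0 < f < 1."""
    if not 0.0 < f < 1.0:
        raise ValueError("f must lie strictly between 0 and 1")
    if link == "probit":      # binormal ROC curve
        return _STD_NORMAL.cdf(alpha0 + alpha1 * _STD_NORMAL.inv_cdf(f))
    if link == "logit":       # bilogistic ROC curve
        eta = alpha0 + alpha1 * math.log(f / (1.0 - f))
        return 1.0 / (1.0 + math.exp(-eta))
    raise ValueError("link must be 'probit' or 'logit'")
```

A positive intercept $\alpha_0$ lifts the curve above the diagonal, and the slope $\alpha_1$ controls its shape.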

To fit these parametric models, a set of discrete points on the FPR axis is chosen, $\{f_1, \dots, f_{n_p}\}$. For each case

In some applications one may want to model the ROC curve only over a restricted FPR range, $(a, b) \subset (0, 1)$, in which case the FPR points $\{f_1, \dots, f_{n_p}\}$ should span the interval $(a, b)$.
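The description of the fitting step is cut off in this copy. As an illustration only, the sketch below assumes the binary-regression ("ROC-GLM") formulation of Alonzo and Pepe (2002), cited in this article: for every case $i$ and FPR point $f_j$, the binary response is $U_{ij} = 1$ if $1 - \text{pv}_{Di} \le f_j$, and $U_{ij}$ is regressed on $g^{-1}(f_j)$ with link $g^{-1}$. The logit link is used because its Newton-Raphson fit is short; `fit_roc_glm_logit` is a hypothetical name, and this is not the `roccurve` code.

```python
import numpy as np

def fit_roc_glm_logit(pv_cases, fpr_points, n_iter=25):
    """Fit H(f) = expit(a0 + a1 * logit(f)) by logistic regression of the
    indicators U_ij = 1{1 - pv_i <= f_j} on logit(f_j)."""
    pv = np.asarray(pv_cases, dtype=float)
    f = np.asarray(fpr_points, dtype=float)
    # rows ordered case by case: (case 0, f_1..f_np), (case 1, f_1..f_np), ...
    u = (1.0 - pv[:, None] <= f[None, :]).astype(float).ravel()
    x = np.tile(np.log(f / (1.0 - f)), pv.size)   # same row order as u
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):                        # Newton-Raphson steps
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (u - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta  # (alpha0, alpha1)

# Hypothetical data: for Logistic(0, 1) controls and Logistic(1, 1) cases,
# the true curve is expit(1 + logit(f)), i.e. alpha0 = 1 and alpha1 = 1.
rng = np.random.default_rng(3)
y_controls = rng.logistic(0.0, 1.0, 2000)
y_cases = rng.logistic(1.0, 1.0, 2000)
pv = np.searchsorted(np.sort(y_controls), y_cases, side="left") / y_controls.size
alpha0, alpha1 = fit_roc_glm_logit(pv, np.linspace(0.05, 0.95, 19))
```

Restricting the FPR range only changes `fpr_points`: choosing points inside $(a, b)$ fits the curve over that interval alone.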

In Figure 2 we display four different estimators applied to data on the pancreatic cancer biomarker CA-125. The first estimator is the standard empirical ROC curve that results from standardizing with the right continuous empirical control reference distribution and applying the empirical CDF for *H*. This is precisely the same as the empirical estimator that is provided by Stata’s
`roctab` command. The second estimator is the semiparametric binormal estimator that again calculates the standardized values with the empirical control distribution for *Y* but employs a probit link function for *g*. This rank invariant semiparametric estimator requires less computation than the binormal estimator provided by Stata’s
`rocfit` command and appears to have similar efficiency (Alonzo and Pepe 2002). The third estimator assumes that the marker is normally distributed in controls and is not rank invariant. It calculates standardized values as

$${\text{pv}}_{Di}=\mathrm{\Phi}(({Y}_{Di}-\mathit{mean})/sd)$$

where (*mean*, *sd*) are the sample mean and standard deviation of the control observations. The fourth estimator is fully parametric. In addition to modeling the control reference distribution as normal it assumes the ROC curve is binormal. The two assumptions taken together are equivalent to assuming markers for both cases and controls are normally distributed. In practice the rank invariant estimators are more popular. Parametric models for the reference distribution have a more prominent role in settings where covariates affect marker distributions and covariate-specific distributions are difficult to estimate empirically (Janes, Longton and Pepe, 2008).
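The difference between the empirical and the normal standardization can be sketched as follows (helper names hypothetical). The empirical percentile values depend only on ranks, so they are unchanged by a monotone transformation of the marker; the normal-model values are not, which is what "not rank invariant" means here.

```python
import numpy as np
from statistics import NormalDist

def pv_empirical(y_cases, y_controls):
    """Proportion of control values strictly below each case value."""
    controls = np.sort(np.asarray(y_controls, dtype=float))
    return np.searchsorted(controls, y_cases, side="left") / controls.size

def pv_normal(y_cases, y_controls):
    """Phi((Y_D - mean) / sd), using the control sample mean and SD."""
    controls = np.asarray(y_controls, dtype=float)
    ref = NormalDist(controls.mean(), controls.std(ddof=1))
    return np.array([ref.cdf(v) for v in np.asarray(y_cases, dtype=float)])

# Hypothetical data in which the controls really are normal
rng = np.random.default_rng(4)
y_controls = rng.normal(0.0, 1.0, 5000)
y_cases = rng.normal(1.0, 1.0, 200)
pv_e = pv_empirical(y_cases, y_controls)
pv_n = pv_normal(y_cases, y_controls)
```

When the normal model is correct, the two sets of percentile values nearly coincide; after a log-normal-style transformation such as `np.exp`, only the empirical values stay the same.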

We use bootstrap resampling to calculate pointwise confidence intervals for the ROC curve, ROC(*f*), and for its inverse, ROC$^{-1}$(*t*). For a given false positive rate *f*, the $\alpha/2$ and $1 - \alpha/2$ quantiles of the bootstrap distribution of $\widehat{\text{ROC}}(f)$ are reported as the lower and upper limits of the $(1 - \alpha)$ confidence interval.
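A minimal sketch of this percentile interval, assuming the empirical estimator of ROC(*f*) and separate resampling of cases and controls; the helper names are hypothetical, and `np.quantile` uses linear interpolation, which may differ from the quantile rule used in Stata.

```python
import numpy as np

def roc_at(y_cases, y_controls, f):
    """Empirical ROC(f): share of cases with 1 - pv <= f."""
    controls = np.sort(y_controls)
    pv = np.searchsorted(controls, y_cases, side="left") / controls.size
    return np.mean(1.0 - pv <= f)

def bootstrap_roc_ci(y_cases, y_controls, f, level=0.90, nsamp=1000, seed=0):
    """Percentile interval: the alpha/2 and 1 - alpha/2 quantiles of the
    bootstrap distribution of ROC-hat(f)."""
    rng = np.random.default_rng(seed)
    y_cases = np.asarray(y_cases, dtype=float)
    y_controls = np.asarray(y_controls, dtype=float)
    reps = np.empty(nsamp)
    for b in range(nsamp):
        # resample cases and controls separately (case-control design)
        yd = rng.choice(y_cases, size=y_cases.size, replace=True)
        yc = rng.choice(y_controls, size=y_controls.size, replace=True)
        reps[b] = roc_at(yd, yc, f)
    alpha = 1.0 - level
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return roc_at(y_cases, y_controls, f), lo, hi

# Hypothetical data: 90% interval for ROC(0.2)
rng = np.random.default_rng(5)
est, lo, hi = bootstrap_roc_ci(rng.normal(1, 1, 50), rng.normal(0, 1, 50),
                               f=0.2, level=0.90, nsamp=500)
```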

The resampling must reflect the study design. If selection to the study was outcome dependent, that is if a case-control design was employed as is common in early phase studies (Pepe, Etzioni, Feng, et al. 2001), then resampling is done separately within case and control strata. On the other hand, if subjects were enrolled without regard to their outcome status, resampling is done accordingly from the entire dataset. In addition, if observations are clustered, for example if subjects contribute several observations to ROC curve estimation, the
`cluster()` option can be used to identify resampling clusters.
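The three resampling schemes can be sketched as a function that returns the row indices of one bootstrap sample. The name `bootstrap_indices` and the `design` argument are hypothetical. As a stated simplification, clusters are resampled from the whole dataset, ignoring the case and control strata; how `roccurve` combines `cluster()` with case-control sampling is not described here.

```python
import numpy as np

def bootstrap_indices(d, rng, design="case-control", cluster=None):
    """Row indices for one bootstrap sample.

    design="case-control": resample cases and controls separately, so the
        numbers of cases and controls stay fixed.
    design="pooled": resample rows from the whole dataset (cf. noccsamp).
    cluster: optional array of cluster ids; whole clusters are resampled.
    """
    d = np.asarray(d)
    if cluster is not None:
        # simplification: draw clusters from the whole dataset
        cluster = np.asarray(cluster)
        ids = np.unique(cluster)
        drawn = rng.choice(ids, size=ids.size, replace=True)
        return np.concatenate([np.flatnonzero(cluster == c) for c in drawn])
    if design == "pooled":
        return rng.integers(0, d.size, size=d.size)
    if design == "case-control":
        cases = np.flatnonzero(d == 1)
        controls = np.flatnonzero(d == 0)
        return np.concatenate([rng.choice(cases, cases.size, replace=True),
                               rng.choice(controls, controls.size, replace=True)])
    raise ValueError("design must be 'case-control' or 'pooled'")

# Hypothetical design: 30 cases, 70 controls, 20 clusters of 5 observations
d = np.r_[np.ones(30), np.zeros(70)]
cluster = np.repeat(np.arange(20), 5)
rng = np.random.default_rng(0)
idx_cc = bootstrap_indices(d, rng)
idx_pooled = bootstrap_indices(d, rng, design="pooled")
idx_cl = bootstrap_indices(d, rng, cluster=cluster)
```

Under the case-control scheme every bootstrap sample keeps exactly 30 cases, whereas the pooled scheme lets that number vary, mirroring cohort-style enrollment.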

The syntax for the
`roccurve` command is

`roccurve` *disease_var test_varlist [if] [in] [, options]*

where *disease_var* gives the name of the binary outcome variable (*D* = 1 for a case and *D* = 0 for a control) and *test_varlist* gives the names of the markers or tests for which ROC curves are to be calculated.

`pvcmeth`(*method*) specifies how $F$ is estimated. Options include `empirical` (the default), where $\widehat{F}$ is the empirical control marker distribution, and `normal`, which assumes a normal distribution and estimates the control mean and variance with the sample mean and variance.

`tiecorr` indicates that a correction for ties between case and control values is included in the empirical pv calculation. The correction matters only when calculating summary indices, such as the area under the ROC curve, which is discussed later. The tie-corrected pv for a case with marker $Y_i$ is the proportion of control values below $Y_i$ plus one half the proportion of control values equal to $Y_i$.

`rocmeth`(*method*) specifies whether the `empirical` (default) or a `parametric` model for the ROC curve is used.

`link`(*link*) is relevant only for a parametric ROC model: specify `probit` for a binormal model or `logit` for a bilogistic model.

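`interval`($a$ $b$ $n_p$) specifies the interval ($a$, $b$) over which the parametric ROC model is fit and the number, $n_p$, of FPR points in that interval.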

`roc`(*f*) specifies the false positive rate, *f*, at which the point estimate of ROC(*f*) and its confidence interval are calculated.

`rocinv`(*t*) specifies the true positive rate, *t*, at which the point estimate of ROC$^{-1}$(*t*) and its confidence interval are calculated.

`nograph` suppresses the ROC plots, which is useful when only the returned numerical results are needed.

*twoway options* include graph options that override the default axis options, titles, and overall graph appearance. Exceptions, which cannot be used, include marker type options and the `by()` option.

`offset` *(#)* specifies the x-axis offset from *f* or *t* used to place the second and subsequent confidence intervals for ROC(*f*) or ROC$^{-1}$(*t*), so that the interval bars for different markers do not overlap. It is relevant only if the `roc`(*f*) or `rocinv`(*t*) option is specified.

`nsamp` *(#)* specifies the number of bootstrap replications to be performed for estimating confidence intervals. The default is 1000 replications.

`noccsamp` specifies that bootstrap samples be drawn from the combined sample rather than sampling separately from cases and controls; case-control sampling is the default.

`cluster` *(varlist)* specifies variables identifying bootstrap resampling clusters.

`level` *(#)* specifies the confidence level, as a percentage, for confidence intervals.

There are options to create new variables.

`genrocvars` generates a new pair of variables, fpr# and tpr#, for each marker in the *test_varlist*, holding the ROC coordinates that correspond to the observed marker values. The empirical ROC curve, obtained with the empirical `rocmeth()`, results from connecting these points as a right-continuous step function. New variable names are numbered (#) according to the order of the variables in the *test_varlist*.

`genpcv` generates variables, pcv#, holding the percentile values for each marker in the *test_varlist*. The numbers (#) correspond to the order of the markers in the *test_varlist*.

`replace` requests that existing variables fpr#, tpr#, or pcv# be overwritten by `genpcv` or `genrocvars`.

There are also options to adjust the ROC curve estimates for covariates. These options are described in another article in this journal (Janes, Longton and Pepe, 2008).

Confidence limits for `roc`(*f*) or `rocinv`(*t*) and parameters for the ROC-GLM parametric curve fit are saved in `r()` when the corresponding options are specified:

Matrices

- `r(ROC_ci)`: *n* × 3 matrix of `roc`(*f*) or `rocinv`(*t*) estimates and confidence limits for the *n* markers of the *test_varlist*, returned when either option is specified.
- `r(BNParm)`: *n* × 2 matrix of binormal or bilogistic curve parameter estimates, returned when `rocmeth(parametric)` is specified.

The following code produced the plot in Figure 1:

`roccurve d y1 y2, roc(.2) level(90)`

The four estimators displayed in Figure 2 were produced using the following four commands:

`roccurve d y2, pvcmeth(empirical) rocmeth(nonparametric)`

`roccurve d y2, pvcmeth(empirical) rocmeth(parametric) link(probit)`

`roccurve d y2, pvcmeth(normal) rocmeth(nonparametric)`

`roccurve d y2, pvcmeth(normal) rocmeth(parametric)`

Measures derived from the ROC curve are used to summarize discriminatory accuracy. More importantly, they serve as the basis for test statistics to compare ROC curves. The most popular index is the area under the ROC curve (AUC), also known as the c-index or probability of correct ordering, AUC = Prob($Y_D > Y_{\bar D}$), where $Y_D$ and $Y_{\bar D}$ are the marker values of a randomly chosen case and a randomly chosen control.

For clinical applications we prefer to use ROC (or ROC$^{-1}$) at a specific point. Consider ROC(*f*): given that one is willing to accept a false positive rate *f*, what proportion of cases will be detected? This question is directly relevant to clinical practice. However, choosing a single FPR of interest can be difficult. A compromise is the partial AUC, which integrates the ROC curve over a range of false positive rates (McClish 1989, Thompson and Zucchini 1989). Since low FPRs are typically of interest, one can calculate the partial area between 0 and the largest acceptable FPR, denoted by $f_0$:

$$\text{pAUC}({f}_{0})={\int}_{0}^{{f}_{0}}\text{ROC}(f)df.$$

Interestingly, the classic nonparametric estimator of the AUC can be written as the sample mean of the *nonparametric* case percentile values (DeLong et al., 1988; Hanley and Hajian-Tilaki, 1997):

$$\widehat{\text{AUC}}_e=\sum_{i=1}^{n_D}\widehat{\text{pv}}_{Di}/n_D \tag{1}$$

When ties between case and control marker values are present, a correction for ties is necessary in calculating the percentile values so that ${\widehat{\text{AUC}}}_{e}$ corresponds to the trapezoidal empirical AUC:

$${\widehat{\text{pv}}}_{Di}^{c}={\widehat{\text{pv}}}_{Di}+\frac{1}{2}{\widehat{e}}_{i}$$

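where $\hat{e}_i$ is the proportion of control marker values equal to $Y_{Di}$. The nonparametric partial AUC estimator can likewise be written in terms of the case percentile values: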

$$\widehat{\text{pAUC}}_e(f_0)=\sum_{i=1}^{n_D}\max\left(\widehat{\text{pv}}_{Di}-(1-f_0),\,0\right)/n_D \tag{2}$$

again with the aforementioned tie correction for cases tied with controls.
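Expressions (1) and (2) with the tie correction can be sketched directly (helper names hypothetical). With the correction, the mean percentile value equals the proportion of case-control pairs in which the case is higher, counting ties as one half, which is the trapezoidal empirical AUC; and setting $f_0 = 1$ in (2) recovers (1).

```python
import numpy as np

def tie_corrected_pv(y_cases, y_controls):
    """pv^c = (share of controls below Y_D) + 0.5 * (share equal to Y_D)."""
    controls = np.sort(np.asarray(y_controls, dtype=float))
    y_cases = np.asarray(y_cases, dtype=float)
    below = np.searchsorted(controls, y_cases, side="left")
    at_or_below = np.searchsorted(controls, y_cases, side="right")
    return (below + 0.5 * (at_or_below - below)) / controls.size

def auc_hat(pv):
    """Expression (1): mean of the case percentile values."""
    return float(np.mean(pv))

def pauc_hat(pv, f0):
    """Expression (2): mean of max(pv - (1 - f0), 0)."""
    return float(np.mean(np.maximum(np.asarray(pv) - (1.0 - f0), 0.0)))

# Small hypothetical example with one case tied to a control value (3)
y_controls = np.array([1.0, 2.0, 3.0, 4.0])
y_cases = np.array([2.5, 3.0, 5.0])
pv_c = tie_corrected_pv(y_cases, y_controls)   # [0.5, 0.625, 1.0]
auc = auc_hat(pv_c)
pauc_02 = pauc_hat(pv_c, 0.2)
```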

By using a parametric model for the control reference distribution, the average of parametric case percentiles yields another estimator of the AUC. Analogously, expression (2) with parametric case percentiles provides a semiparametric partial AUC estimate. Note that tie corrections are not necessary when the estimated reference distribution is continuous.

In general, calculating areas and partial areas under parametric ROC curves requires numerical integration, and these quantities are not output by our programs. The one exception is the area under the binormal ROC curve, which has the closed-form expression $\mathrm{\Phi}({\alpha}_{0}/\sqrt{1+{\alpha}_{1}^{2}})$. Stata’s `rocfit` command provides this after fitting a binormal curve; our programs do not. We provide only estimates that are nonparametric with regard to the shape of the ROC curve. This is also true for the point estimates of ROC(*f*) and ROC$^{-1}$(*t*) that are output by the `comproc` command.
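The closed form can be checked against a direct numerical integration of the binormal curve $\Phi(\alpha_0 + \alpha_1\Phi^{-1}(f))$ over (0, 1). This is a small sketch with hypothetical function names; the midpoint rule is simply one convenient quadrature.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal

def binormal_auc_closed_form(a0, a1):
    """AUC = Phi(a0 / sqrt(1 + a1^2))."""
    return PHI.cdf(a0 / math.sqrt(1.0 + a1 * a1))

def binormal_auc_numeric(a0, a1, n=20000):
    """Midpoint-rule integral of Phi(a0 + a1 * Phi^{-1}(f)) over (0, 1)."""
    h = 1.0 / n
    return h * sum(PHI.cdf(a0 + a1 * PHI.inv_cdf((k + 0.5) * h))
                   for k in range(n))
```

For a partial area under a parametric curve, the same kind of quadrature would be needed over $(0, f_0)$, since no comparable closed form is available.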

To compare ROC curves, we calculate a confidence interval for the difference between ROC summary indices. A Wald statistic, the observed difference divided by its standard error, is compared with the standard normal distribution to obtain a *p*-value. Confidence intervals and standard errors are again derived from the bootstrap distribution of the estimators. The `comproc` command outputs results for one or more of the AUC, ROC(*f*), ROC$^{-1}$(*t*), or pAUC(*f*), where the fixed FPR = *f* or fixed TPR = *t* of interest is specified by the data analyst.
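A minimal sketch of the AUC comparison, assuming a paired design in which both markers are measured on the same subjects, resampling within the case and control strata, the bootstrap standard deviation as the standard error, and the difference defined as marker 2 minus marker 1. The function names are hypothetical and this is not the `comproc` code.

```python
import numpy as np
from statistics import NormalDist

def auc_empirical(y_cases, y_controls):
    """Mean tie-corrected case percentile value, as in expression (1)."""
    controls = np.sort(y_controls)
    lo = np.searchsorted(controls, y_cases, side="left")
    hi = np.searchsorted(controls, y_cases, side="right")
    return np.mean((lo + 0.5 * (hi - lo)) / controls.size)

def compare_auc(d, y1, y2, nsamp=500, seed=0):
    """Bootstrap Wald test of AUC(y2) - AUC(y1) for paired markers."""
    d, y1, y2 = (np.asarray(a, dtype=float) for a in (d, y1, y2))
    rng = np.random.default_rng(seed)
    cases, controls = np.flatnonzero(d == 1), np.flatnonzero(d == 0)

    def delta(ci, co):
        # both markers use the same resampled subjects (paired design)
        return auc_empirical(y2[ci], y2[co]) - auc_empirical(y1[ci], y1[co])

    est = delta(cases, controls)
    reps = np.array([delta(rng.choice(cases, cases.size),
                           rng.choice(controls, controls.size))
                     for _ in range(nsamp)])
    se = reps.std(ddof=1)                     # bootstrap standard error
    z = est / se                              # Wald statistic
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return est, se, z, p

# Hypothetical data: y1 is uninformative, y2 separates cases from controls
rng = np.random.default_rng(6)
d = np.r_[np.zeros(100), np.ones(100)]
y1 = rng.normal(0.0, 1.0, 200)
y2 = 1.5 * d + rng.normal(0.0, 1.0, 200)
est, se, z, p = compare_auc(d, y1, y2, nsamp=300)
```

Replacing `auc_empirical` with a pAUC or ROC(*f*) estimator gives the corresponding comparison for those indices.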

The syntax of the `comproc` command is

`comproc` *disease_var test_var1 [test_var2] [if] [in] [, options]*

where *disease_var* is the binary outcome status variable and *test_var1* and *test_var2* are the markers. If only one marker is specified, summary indices are output for that marker but no comparisons are made.

Options for percentile value calculation and for dealing with sampling variability are the same as those described above for the `roccurve` command. Options to include covariate adjustment in making comparisons are described in a companion paper (Janes, Longton and Pepe, 2008).

The options for summary indices to evaluate and to compare markers are:

`auc`, the area under the ROC curve

`pauc`(*f*), the partial area under the ROC curve between 0 and *f*

`roc`(*f*), ROC(*f*), the TPR value corresponding to FPR = *f*

`rocinv`(*t*), ROC$^{-1}$(*t*), the FPR value corresponding to TPR = *t*

`comproc` saves the following r-class results, where <*stat*> is one or more of `auc`, `pauc`, `roc`, or `rocinv`, corresponding to the requested summary statistics:

Scalars

- `r(`<*stat*>`1)`: statistic estimate for the 1st marker
- `r(`<*stat*>`2)`: statistic estimate for the 2nd marker
- `r(`<*stat*>`delta)`: estimated difference, <*stat*>`2` − <*stat*>`1`
- `r(se_`<*stat*>`1)`: bootstrap standard error estimate for the 1st marker statistic
- `r(se_`<*stat*>`2)`: bootstrap standard error estimate for the 2nd marker statistic
- `r(se_`<*stat*>`delta)`: bootstrap standard error estimate for the difference, <*stat*>`2` − <*stat*>`1`

In addition, many of the standard e-class bootstrap results left behind by `bstat` are available after running `comproc`.

The `comproc` command applied to the pancreatic cancer marker data shown in Figure 1 yielded the following results:

Observe that the bootstrap result tables are generated by Stata’s
`estat bootstrap` command.

Our programs rely on representing the ROC curve as the CDF of the case marker values after they are standardized to the control reference distribution. This representation gives rise to simple algorithms for calculating *standard* nonparametric estimators of the ROC, AUC, and pAUC(f). The representation also provides alternative estimators of the ROC and its summary indices that are semiparametric or fully parametric. In a companion article (Janes, Longton and Pepe, 2008) we describe methods for covariate adjustment and ROC regression. The percentile value representation is particularly useful in these settings.

Applications to continuous data are our focus. Although the methods can be applied to ordinal markers and diagnostic tests, some standard ROC methods for ordinal data are not included in our routines. In particular, our algorithm for fitting the binormal ROC model does not correspond to the Dorfman and Alf algorithm (Dorfman and Alf, 1969) for ordinal data. In addition, the AUC corresponding to a fitted binormal model is not output; rather, nonparametric AUC estimates are provided. We recommend the `rocfit` command in the main Stata package for fitting binormal models and calculating the corresponding AUCs with ordinal data.

The DABS Center website is a repository of information for the statistical evaluation of diagnostic tests and biomarkers. It includes datasets that can be used to gain familiarity with the methods and software described here. The do-files that implement all of the analyses presented in this paper can be downloaded using Stata’s `net` command: `net from http://www.stata-journal.com/software`

- Alonzo TA, Pepe MS. Distribution-free ROC analysis using binary regression techniques. Biostatistics. 2002;3:421–432.
- Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation. 2007;115:928–935.
- DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics. 1988;44:837–845.
- Dodd L, Pepe MS. Partial AUC estimation and regression. Biometrics. 2003;59(3):614–623.
- Dorfman DD, Alf E. Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals-rating method data. Journal of Mathematical Psychology. 1969;6:487–496.
- Hanley JA, Hajian-Tilaki KO. Sampling variability of nonparametric estimates of the areas under receiver operating characteristic curves: an update. Academic Radiology. 1997;4:49–58.
- Huang Y, Pepe MS. Biomarker evaluation using the controls as a reference population. Biostatistics. 2008 (under revision).
- Janes H, Longton GL, Pepe MS. Accommodating covariates in ROC analysis. The Stata Journal. 2008 (submitted).
- Ogilvie JC, Creelman CD. Maximum-likelihood estimation of receiver operating characteristic curve parameters. Journal of Mathematical Psychology. 1968;5:377–391.
- Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press; United Kingdom: 2003.
- Pepe MS, Cai T. The analysis of placement values for evaluating discriminatory measures. Biometrics. 2004;60(2):528–535.
- Pepe MS, Etzioni R, Feng Z, Potter JD, Thompson ML, Thornquist M, Winget M, Yasui Y. Phases of biomarker development for early detection of cancer. Journal of the National Cancer Institute. 2001;93(14):1054–1061.
- Pepe MS, Longton G. Standardizing diagnostic markers to evaluate and compare their performance. Epidemiology. 2005;16(5):598–603.
- Thompson ML, Zucchini W. On the statistical analysis of ROC curves. Statistics in Medicine. 1989;8:1277–1290.
- Wieand S, Gail MH, James BR, James KL. A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika. 1989;76:585–592.
