


Article sections

- Abstract
- 1. INTRODUCTION
- 2. TWO CONCEPTUAL FRAMEWORKS
- 3. FITTING MODELS TO DATA
- 4. Extending the Range of Research Questions to Comparing Tests
- DISCUSSION
- REFERENCES



Med Decis Making. Author manuscript; available in PMC 2011 July 1.

Published in final edited form as: Med Decis Making; published online 2010 February 10. doi: 10.1177/0272989X09357477

PMCID: PMC2905510

NIHMSID: NIHMS188363

Daryl E. Morris, Biostatistics and Biomathematics, Public Health Sciences Division, Fred Hutchinson Cancer Research Center and Department of Biostatistics, University of Washington, Seattle, Washington;

Address correspondence to Dr. Margaret Sullivan Pepe, Biostatistics and Biomathematics, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, M2-B500, Seattle, WA 98109; telephone: (206) 667-7398; fax: (206) 667-7004; email: mspepe@u.washington.edu

The publisher's final edited version of this article is available at Med Decis Making


Statistical evaluation of medical imaging tests used for diagnostic and prognostic purposes often employs receiver operating characteristic (ROC) curves. Two methods for ROC analysis are popular. The ordinal regression method is the standard approach used when evaluating tests with ordinal values. The direct ROC modeling method is a more recently developed approach, motivated by applications to tests with continuous values such as biomarkers.

In this paper, we compare the methods in terms of model formulations, interpretations of estimated parameters, the ranges of scientific questions that can be addressed with them, their computational algorithms and the efficiencies with which they use data.

We show that a strong relationship exists between the methods by demonstrating that they fit the same models when only a single test is evaluated. The ordinal regression models are typically alternative parameterizations of the direct ROC models, and vice versa. The direct method has two major advantages over the ordinal regression method: (i) estimated parameters relate directly to ROC curves, which facilitates interpretation of covariate effects on ROC performance; and (ii) comparisons between tests can be made directly within this framework. Such comparisons can accommodate covariate effects, and they can be made even between tests whose values are on different scales, such as a continuous biomarker test and an ordinal valued imaging test. In our simulation studies, the ordinal regression method provided slightly more precise parameter estimates.

While the ordinal regression method is slightly more efficient, the direct ROC modeling method has important advantages with regard to interpretation, and it offers a framework for addressing a broader range of scientific questions, including the comparison of tests.

Receiver operating characteristic (ROC) curves have long been used to characterize the inherent accuracy of medical tests for diagnosis and prognosis.^{1} They have been particularly popular in evaluating imaging modalities where images are rated on an ordinal scale according to the reader’s certainty of a positive diagnosis.^{2} In this context a large body of statistical methodology has developed for ROC analysis of ordinal rating data. These methods have drawn primarily on ordinal regression modeling methods.^{3} For example, the classic Dorfman and Alf algorithm^{4} for estimating binormal ROC curves employs ordinal regression methods, as does the Tosteson and Begg^{5} methodology for evaluating factors influencing test accuracy. This approach continues to be refined for addressing ever more complex statistical questions.^{6}

In parallel, over the past decade, a second body of literature on ROC analysis has developed, motivated by applications where test results are on a continuous scale; for example, biomarkers used for diagnosis and prognosis are typically measured on a continuous scale.^{7} We call this approach the direct ROC modeling approach and describe it in detail below. It has been noted that the direct ROC modeling methodology can also be applied to ordinal valued tests.^{8}^{,}^{9} The purpose of this paper is, first, to make explicit the close connections between the ordinal regression (OR) and direct ROC modeling (DM) methods for statistical evaluation of ROC curves. Second, we contrast the approaches in terms of their conceptual frameworks, the range of questions they address and the statistical efficiency with which they use data.

We illustrate with two applications. The first example concerns the interpretation of initial screening mammograms by radiologists using the BI-RADS scale.^{10} We compare 1000 women who were found to have breast cancer within one year of the mammogram with 1000 women with no diagnosis of breast cancer. Both groups were randomly sampled from larger populations in the Breast Cancer Surveillance Consortium (http://breastscreening.cancer.gov/). The BI-RADS scale is ordered in terms of increasing likelihood of breast cancer as follows: (1) Negative; (2) Benign finding; (3) Probably benign finding; (0) Need additional imaging; (4) Suspicious abnormality; and (5) Highly suggestive of malignancy. It has been shown that this ordering corresponds to increasing cancer rates.^{11} The zero category is intended as a placeholder until additional imaging resolves the uncertainty, but in practice the zero is often not replaced. Because few women without cancer had readings in category 5, we collapsed categories 4 and 5 for this example. For each subject, in addition to the image rating and case or control status, the dataset includes breast density recorded on the BI-RADS coding system: (1) almost entirely fat; (2) scattered fibroglandular densities; (3) heterogeneously dense; and (4) extremely dense. Very few women in this sample had breasts classified as almost entirely fat, so categories (1) and (2) were combined and labeled ‘not dense’. We seek to estimate the ROC curves associated with mammography for women at each level of breast density and to describe the effect, if any, of breast density on the accuracy of mammographic readings.
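The recoding just described can be made explicit in a few lines. This is a minimal sketch, assuming the raw BI-RADS assessment and density codes are stored as integers; the dictionary and function names are hypothetical and are not taken from the study's own code.

```python
# Hypothetical recoding of raw BI-RADS codes (not the study's own code).

# Assessment codes ordered by increasing likelihood of cancer: 1 < 2 < 3 < 0 < 4 < 5,
# with categories 4 and 5 collapsed, giving ordinal ratings Y = 1, ..., 5.
BIRADS_TO_RATING = {1: 1, 2: 2, 3: 3, 0: 4, 4: 5, 5: 5}

# Density codes 1 and 2 are combined and labeled 'not dense'.
DENSITY_TO_GROUP = {
    1: "not dense",
    2: "not dense",
    3: "heterogeneously dense",
    4: "extremely dense",
}


def recode(assessment, density):
    """Return (ordinal rating Y, density group) for one mammogram."""
    return BIRADS_TO_RATING[assessment], DENSITY_TO_GROUP[density]


print(recode(0, 2))  # prints (4, 'not dense')
```

After recoding, the ratings take values *r* = 1, …, 5, so there are four observable false positive rates.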

The second dataset is similarly set in the context of breast cancer diagnosis, but now a continuous valued biomarker is measured for each woman in addition to her mammographic reading. For women diagnosed with cancer, the stage of her disease is noted in the dataset. Some scientific questions of interest are: (i) to compare ROC curves for the biomarker and mammogram tests; (ii) to evaluate the accuracies of the tests for detecting late stage cancer compared with their capacities for detecting early stage cancer; and (iii) to determine if the relative performance of the tests varies with breast cancer stage.

The key distinction between the OR and DM approaches to ROC analysis is in the entities that are modeled statistically. The OR method formulates a statistical model for the probability distribution of the test results given case or control status and covariates. That is, we model Prob[*Y*=*r*|*D, X*] where *Y* is the image rating result (which takes values *r*=1,2,…,*R*, where *R* is the number of possible ratings), *D* is case or control status (*D*=1 for a case and *D*=0 for a control) and *X* denotes covariates. From this one can calculate the corresponding ROC curves for the test in populations with specified covariate values *X*=*x*. In contrast, the DM method directly formulates a statistical model for the ROC curves as an explicit function of *X*. In other words, OR models probability frequencies of ratings while DM models the relationship between those probability frequencies for cases and controls, which is the trade-off between true and false positive rates.

To see the correspondence between OR and DM model formulations, consider the classic binormal ROC curve without covariates, which is discussed in depth in Pepe (2003, sections 4.4–4.5). The binormal ROC curve assumes that the tradeoff between the false positive rates, *f*_{r} = Prob[*Y* ≥ *r* | *D* = 0], and the corresponding true positive rates, Prob[*Y* ≥ *r* | *D* = 1], follows the form

$$\text{ROC}({f}_{r})=\mathrm{\Phi}({\gamma}_{I}+{\gamma}_{S}{\mathrm{\Phi}}^{-1}({f}_{r}))$$

(1)

with Φ denoting the standard normal cumulative distribution function. The parameters γ_{I} and γ_{S} are the intercept and slope of the binormal ROC curve.

The OR formulation instead specifies a model for the probability frequencies of ratings conditional on case or control status, *D* =1 or 0, respectively:

$$\text{Prob}[Y<r|D]=\mathrm{\Phi}(\{{\theta}_{r}-{\alpha}_{1}D\}/\text{exp}\{\beta D\})$$

(2)

The θ_{r} are called intercepts or ‘cut points’ associated with the rating thresholds *r* = 2, …, *R*.

To calculate the ROC curve that corresponds to the OR model, recall that the ROC curve is defined as the true positive rate, Prob[*Y* ≥ *r* | *D* = 1], written as a function of the false positive rate, *f*_{r} = Prob[*Y* ≥ *r* | *D* = 0]. Under model (2),

$$\begin{array}{cc}\text{ROC}({f}_{r})=\text{Prob}[Y\ge r|D=1]\hfill & =1-\mathrm{\Phi}(\{{\theta}_{r}-{\alpha}_{1}\}/\text{exp}\{\beta \})\hfill \\ \hfill & =\mathrm{\Phi}(\{-{\theta}_{r}+{\alpha}_{1}\}/\text{exp}\{\beta \})\hfill \\ \hfill & =\mathrm{\Phi}(\{{\mathrm{\Phi}}^{-1}({f}_{r})+{\alpha}_{1}\}/\text{exp}\{\beta \})\hfill \end{array}$$

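the last equality following from the OR stipulation that *f*_{r} = Φ(−θ_{r}), that is, −θ_{r} = Φ^{−1}(*f*_{r}). Comparing this with equation (1), the OR model (2) is the binormal ROC model with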

$${f}_{r}=1-\mathrm{\Phi}({\theta}_{r}),\text{}{\gamma}_{S}=1/\text{exp}\{\beta \},\text{}{\gamma}_{I}={\alpha}_{1}/\text{exp}\{\beta \}$$

(3)

Conversely, one can start with the binormal ROC curve and show that one minus the true positive rate and one minus the false positive rate derived from it follow the OR formulation using the same correspondences between parameters, now written as:

$${\theta}_{r}=-{\mathrm{\Phi}}^{-1}({f}_{r}),\text{}\beta =\text{log}\{1/{\gamma}_{S}\},\text{}{\alpha}_{1}={\gamma}_{I}/{\gamma}_{S}$$

That is, when considering a single test and no covariates, the two models are equivalent, being simple reparameterizations of each other.
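The equivalence can be checked numerically. The Python sketch below uses only the standard library (`statistics.NormalDist` supplies Φ and Φ^{−1}); it maps the OR parameters of model (2) to DM parameters via (3) and back, and confirms that the true positive rates implied by the OR model lie exactly on the binormal curve (1). The parameter values and function names are arbitrary illustrations, not estimates or code from the paper.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Φ, PHI.inv_cdf is Φ^{-1}


def or_to_dm(thetas, alpha1, beta):
    """Equation (3): f_r = 1 - Φ(θ_r), γ_I = α_1/exp(β), γ_S = 1/exp(β)."""
    fprs = [1.0 - PHI.cdf(t) for t in thetas]
    return fprs, alpha1 / math.exp(beta), 1.0 / math.exp(beta)


def dm_to_or(fprs, gamma_i, gamma_s):
    """Inverse map: θ_r = -Φ^{-1}(f_r), α_1 = γ_I/γ_S, β = log(1/γ_S)."""
    thetas = [-PHI.inv_cdf(f) for f in fprs]
    return thetas, gamma_i / gamma_s, math.log(1.0 / gamma_s)


def tpr_or(theta, alpha1, beta):
    """Prob[Y >= r | D = 1] = 1 - Φ((θ_r - α_1)/exp(β)) under model (2)."""
    return 1.0 - PHI.cdf((theta - alpha1) / math.exp(beta))


def roc_binormal(f, gamma_i, gamma_s):
    """Equation (1): ROC(f) = Φ(γ_I + γ_S Φ^{-1}(f))."""
    return PHI.cdf(gamma_i + gamma_s * PHI.inv_cdf(f))


# Arbitrary illustrative OR parameters for R = 5 rating categories.
thetas, alpha1, beta = [-1.0, -0.3, 0.4, 1.1], 1.2, 0.25
fprs, gamma_i, gamma_s = or_to_dm(thetas, alpha1, beta)
for theta, f in zip(thetas, fprs):
    # Each OR-implied (FPR, TPR) point lies on the binormal ROC curve.
    assert abs(tpr_or(theta, alpha1, beta) - roc_binormal(f, gamma_i, gamma_s)) < 1e-9
```

Running `dm_to_or` on the output of `or_to_dm` recovers the original parameters up to floating point error, which is the numerical counterpart of the statement that the two models are reparameterizations of each other.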

Popular formulations for OR and DM models that include covariates are:

$$\text{OR}:\text{Prob}[Y<r|D,X]=\mathrm{\Phi}(\{{\theta}_{\mathit{\text{rX}}}-{\alpha}_{1}D-{\alpha}_{3}\mathit{\text{DX}}\}/\text{exp}\{\beta D\})$$

(4)

$$\text{DM}:{\text{ROC}}_{X}(f)=\mathrm{\Phi}({\gamma}_{I}+{\gamma}_{\mathit{\text{IX}}}X+{\gamma}_{S}{\mathrm{\Phi}}^{-1}(f))$$

(5)

where *f* ∈ {*f*_{2X}, *f*_{3X}, …, *f*_{RX}} are the false positive rates within the population with covariate value *X*. The parameters of the two models are related by

$${\gamma}_{S}=1/\text{exp}\{\beta \},\text{}{\gamma}_{I}={\alpha}_{1}/\text{exp}\{\beta \},\text{}{\gamma}_{\mathit{\text{IX}}}={\alpha}_{3}/\text{exp}\{\beta \},\text{}{f}_{\mathit{rX}}=1-\mathrm{\Phi}({\theta}_{\mathit{\text{rX}}})$$

(6)

The DM approach parameterizes the covariate specific ROC curve directly (i.e. the ROC curve for the population with covariate value *X*, ROC_{X}) as a binormal curve with intercept γ_{I} + γ_{IX}*X* and slope γ_{S}.

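Observe that no particular structure is assumed for θ_{rX}, or equivalently for *f*_{rX}. If instead the covariate is assumed to shift the cut points linearly, the OR and DM models become: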

$$\text{OR}:\text{Prob}[Y<r|D,X]=\mathrm{\Phi}(\{{\theta}_{r}-{\alpha}_{1}D-{\alpha}_{2}X-{\alpha}_{3}\text{DX}\}/\text{exp}\{\beta D\})$$

(7)

$$\begin{array}{cc}\text{DM}:{\text{ROC}}_{X}(f)\hfill & =\mathrm{\Phi}({\gamma}_{I}+{\gamma}_{\mathit{\text{IX}}}X+{\gamma}_{S}{\mathrm{\Phi}}^{-1}(f))\hfill \\ \hfill & {f}_{\mathit{\text{rX}}}=\mathrm{\Phi}(-{\theta}_{r}+{\alpha}_{2}X)\hfill \end{array}$$

(8)
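The equivalence of models (7) and (8) can be checked in the same way. The sketch below, again with arbitrary illustrative parameter values and hypothetical function names, computes the (FPR, TPR) points implied by the OR model (7) at two covariate values and confirms that they lie on the DM curve ROC_{X} with the γ parameters obtained from (6). It also shows that the false positive rates move with *X*, as the second line of (8) requires.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # PHI.cdf is Φ, PHI.inv_cdf is Φ^{-1}


def or_points(thetas, alpha1, alpha2, alpha3, beta, x):
    """(FPR, TPR) at each cut point θ_r implied by OR model (7) at covariate value x."""
    points = []
    for t in thetas:
        # Controls (D = 0): the scale is exp(0) = 1, so FPR = Φ(-θ_r + α_2 x).
        fpr = 1.0 - PHI.cdf(t - alpha2 * x)
        # Cases (D = 1): location α_1 + α_2 x + α_3 x, scale exp(β).
        tpr = 1.0 - PHI.cdf((t - alpha1 - alpha2 * x - alpha3 * x) / math.exp(beta))
        points.append((fpr, tpr))
    return points


def roc_dm(f, x, alpha1, alpha3, beta):
    """ROC_X(f) from (8), with γ_I, γ_IX and γ_S obtained from the OR parameters via (6)."""
    scale = math.exp(beta)
    gamma_i, gamma_ix, gamma_s = alpha1 / scale, alpha3 / scale, 1.0 / scale
    return PHI.cdf(gamma_i + gamma_ix * x + gamma_s * PHI.inv_cdf(f))


thetas = [-1.0, -0.3, 0.4, 1.1]  # arbitrary illustrative cut points
alpha1, alpha2, alpha3, beta = 1.2, 0.5, -0.4, 0.25
for x in (0.0, 1.0):
    for fpr, tpr in or_points(thetas, alpha1, alpha2, alpha3, beta, x):
        assert abs(tpr - roc_dm(fpr, x, alpha1, alpha3, beta)) < 1e-9
```

Note that α_{2} appears only in the false positive rates and not in `roc_dm`: a covariate that merely shifts the cut points leaves the ROC curve unchanged.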

Here, in the OR approach, the ‘cut point’ θ_{rX} is parameterized as θ_{r} − α_{2}*X*.

In the general DM framework as laid out by Pepe (2003, chapter 6), one formulates a regression model for covariate effects on test results in controls, *f*_{rX}, and a separate regression model for covariate effects on the ROC curve, denoted by ROC_{X}.

Another possibility is that covariates affect the ROC curve but not the image ratings in controls. A special example concerns disease specific covariates such as stage of breast cancer. The ROC curve comparing late stage cancer with controls is likely to be higher than that comparing early stage cancer with controls, so *X* = stage should enter the ROC model, perhaps as a term of the form γ_{IX}*X*. This covariate would not enter the model for false positive rates, because stage is defined only for cases.

The seminal paper by Tosteson and Begg^{5} on using OR for ROC analysis allowed covariates to enter the scale component as well as the location component:

$$\text{Prob}[Y<r|D,X]=\mathrm{\Phi}\left(\frac{\{{\theta}_{r}-{\alpha}_{1}D-{\alpha}_{2}X-{\alpha}_{3}\mathit{\text{DX}}\}}{\text{exp}\{{\beta}_{1}D+{\beta}_{2}X+{\beta}_{3}\mathit{\text{DX}}\}}\right)$$

(9)

This is equivalent to the DM formulation

$$\begin{array}{cc}{\text{ROC}}_{X}(f)\hfill & =\mathrm{\Phi}\left(\frac{{\alpha}_{1}+{\alpha}_{3}X+\text{exp}({\beta}_{2}X){\mathrm{\Phi}}^{-1}(f)}{\text{exp}\{{\beta}_{1}+({\beta}_{2}+{\beta}_{3})X\}}\right)\hfill \\ \hfill & {f}_{\mathit{\text{rX}}}=\mathrm{\Phi}(\{-{\theta}_{r}+{\alpha}_{2}X\}/\text{exp}\{{\beta}_{2}X\})\hfill \end{array}$$
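A short rearrangement of this display (our own algebra, offered as a reading aid rather than a result stated in the source) makes the complication explicit: for each fixed *X*, the implied ROC curve is still binormal, but its intercept *a*(*X*) and slope *b*(*X*) depend on *X* in different, non-linear ways:

$$\text{ROC}_{X}(f)=\mathrm{\Phi}\left(a(X)+b(X){\mathrm{\Phi}}^{-1}(f)\right),\qquad a(X)=\frac{{\alpha}_{1}+{\alpha}_{3}X}{\text{exp}\{{\beta}_{1}+({\beta}_{2}+{\beta}_{3})X\}},\qquad b(X)=\frac{\text{exp}({\beta}_{2}X)}{\text{exp}\{{\beta}_{1}+({\beta}_{2}+{\beta}_{3})X\}}=\text{exp}\{-{\beta}_{1}-{\beta}_{3}X\}$$

The slope is log-linear in *X* and involves only β_{1} and β_{3}, while the intercept combines α_{1} and α_{3} with all three scale parameters. When β_{2} = β_{3} = 0, the intercept reduces to (α_{1} + α_{3}*X*)/exp{β_{1}} and the slope to exp{−β_{1}}, which is model (8) with β = β_{1}.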

This formulation is unappealing because the effect of *X* on the ROC curve is complicated. It enters the ROC model in a non-linear fashion and does not give rise to simple summaries of the effect of *X* on test accuracy. An alternative DM formulation for an ROC model is written as

$${\text{ROC}}_{X}(f)=\mathrm{\Phi}({\gamma}_{I}+{\gamma}_{\mathit{\text{IX}}}X+({\gamma}_{S}+{\gamma}_{\mathit{\text{SX}}}X){\mathrm{\Phi}}^{-1}(f))$$

(10)

An advantage of this model is that the effects of *X* on the ROC intercept and slope are summarized succinctly in the parameters γ_{IX} and γ_{SX}, respectively.

The Tosteson and Begg model is rooted in a latent decision variable conceptual framework. In this framework, one considers that the observed ordinal test result arises from categorizing an unobservable latent decision variable *L* that has a normal distribution in cases and in controls, and that the cut points θ_{rX} are thresholds on *L*, so that a rating of *r* or higher is assigned when *L* ≥ θ_{rX}.

Since (9) and (10) are not equivalent models, one must choose between them when the ROC slope appears to depend on covariates other than disease status. The choice presumably depends on which model better fits the data and on the goal of the analysis: to summarize the effects of *X* on the ROC curve in a simple fashion, or to summarize the effects of *X* on the test result distributions.

To illustrate a variety of ROC analyses, Figure 1 shows ROC curves calculated from the Breast Cancer Surveillance Consortium data described earlier. The raw data are displayed in the top right panel of Figure 1 as empirical ROC curves for women in each of the 3 breast density categories. The top left panel shows a fitted binormal ROC model that ignores the covariate, breast density; this corresponds to the model in equation (1). The observed false positive rates {*f̂*_{2}, …, *f̂*_{5}} are also displayed. The middle panels show ROC curves for the three breast density categories modeled using equation (5), with the covariate *X* comprising two binary variables, *X* = (*X*_{1}, *X*_{2}), where *X*_{1} and *X*_{2} are dummy variables for “dense” and “extremely dense” breast density. That is, the ROC intercepts were allowed to vary with breast density, but the slopes were assumed to be the same in all three categories. In the left panel no assumptions were made about the effect of density on the false positive rates *f*_{rX}, or equivalently on the ‘cut points’, while in the right panel we assumed that they followed an ordinal regression model as in equations (7) and (8). The bottom panels show the ‘averaged’ ROC curve, which allows the ‘cut points’ (θ

Figure 1. Binormal ROC models for mammography including breast density as a covariate. Models can be formulated either within OR or DM frameworks. Top left panel: no covariate model, equations (1) and (2); Middle panels: ROC intercept depending on breast density **...**
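How a two-component covariate enters model (5) can be sketched as follows. The coefficient values are hypothetical placeholders with arbitrary signs, not the estimates behind Figure 1, and we assume that the “dense” category of the text is the BI-RADS ‘heterogeneously dense’ category, with ‘not dense’ as the reference.

```python
from statistics import NormalDist

PHI = NormalDist()  # PHI.cdf is Φ, PHI.inv_cdf is Φ^{-1}

# Hypothetical coefficients for illustration only (not the paper's estimates).
GAMMA_I = 1.0            # ROC intercept for the reference group ('not dense')
GAMMA_IX = (-0.2, -0.5)  # intercept shifts for X1 ('dense') and X2 ('extremely dense')
GAMMA_S = 0.9            # common ROC slope, shared by all density groups


def density_dummies(group):
    """Dummy coding X = (X1, X2) with 'not dense' as the reference category."""
    return int(group == "heterogeneously dense"), int(group == "extremely dense")


def roc_density(f, group):
    """Model (5): Φ(γ_I + γ_IX · X + γ_S Φ^{-1}(f)) with a two-component X."""
    x1, x2 = density_dummies(group)
    intercept = GAMMA_I + GAMMA_IX[0] * x1 + GAMMA_IX[1] * x2
    return PHI.cdf(intercept + GAMMA_S * PHI.inv_cdf(f))


for group in ("not dense", "heterogeneously dense", "extremely dense"):
    print(group, round(roc_density(0.1, group), 3))
```

Because the slope is shared, the three curves differ only through their intercepts, which is the structure of the middle panels described above.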

In the previous section we showed that DM and OR models are often alternative representations of the same model. We now consider how to fit models to data. Analysts have traditionally used different algorithms depending on whether they formulate the model as DM or as OR.

The OR formulation naturally gives rise to maximum likelihood algorithms for estimating parameters. However, standard software for ordinal regression maximum likelihood accommodates only location parameters, not scale parameters. Therefore, even the simplest binormal model (equation (2)) cannot be fit using standard ordinal regression algorithms, because the model includes the scale component exp(β*D*). For this simple model without covariates, several statistical software packages provide the Dorfman and Alf maximum likelihood estimation algorithm. More generally, methods for nonlinear models can be adapted. We wrote our own code in the R environment^{14} for maximum likelihood estimation of parameters. Methods for fitting these models in SAS with the NLMIXED procedure have been described;^{15} these apply when the scale component depends only on *D* and not on *X*.
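For readers who wish to write their own maximum likelihood code, the quantity to be maximized for model (2) is the multinomial log-likelihood of the rating counts in cases and controls. The Python sketch below computes it; the function names and the input layout (rating counts indexed by category) are our assumptions, not the authors' R code. In practice its negative would be passed to a general-purpose optimizer such as `scipy.optimize.minimize`, typically after reparameterizing the cut points so that they remain increasing.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # PHI.cdf is Φ


def category_probs(thetas, shift, scale):
    """Prob[Y = r] for r = 1..R, given increasing cut points θ_2 < ... < θ_R and
    Prob[Y < r] = Φ((θ_r - shift) / scale), with Prob[Y < 1] = 0 and Prob[Y < R + 1] = 1."""
    cum = [0.0] + [PHI.cdf((t - shift) / scale) for t in thetas] + [1.0]
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]


def loglik(thetas, alpha1, beta, counts0, counts1):
    """Log-likelihood of model (2); counts0[k] and counts1[k] are the numbers of
    controls and cases with rating k + 1."""
    p0 = category_probs(thetas, 0.0, 1.0)                # D = 0: Prob[Y < r] = Φ(θ_r)
    p1 = category_probs(thetas, alpha1, math.exp(beta))  # D = 1: Φ((θ_r - α_1)/exp(β))
    return (sum(n * math.log(p) for n, p in zip(counts0, p0))
            + sum(n * math.log(p) for n, p in zip(counts1, p1)))
```

Because cases and controls share the cut points θ_{r}, the case counts contribute information about θ_{r} as well; this is the sense in which maximum likelihood uses data from cases to estimate the false positive rates.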

Algorithms for fitting ROC models have been proposed within the general DM framework.^{8}^{,}^{16}^{,}^{17} Their implementation in the Stata software package^{18} is well developed and has been documented in detail.^{9}^{,}^{19} We also implemented the approach in R to perform simulation studies. The key steps in the algorithms are to (i) estimate the covariate specific rating distributions in controls, i.e. the false positive rates, {*f̂*_{rX},

The DM algorithms were originally developed for continuous test results where binary variables *U*_{if} are defined based on a set of

Figure 2. Biased ROC estimation when unobserved FPR values are employed with the DM fitting algorithm. The five observable ROC points are indicated by circles on the solid curve. The curve fitted by choosing equally spaced *f* values (circles and triangles) is indicated **...**

The DM algorithm estimates parameters in the false positive rate model first and then estimates parameters in the ROC model. Unlike the OR algorithm, it is not symmetric in its treatment of case and control data: if one switched the labeling of cases and controls, different estimates would result. Moreover, the DM algorithms are not maximum likelihood methods. In particular, parameters in the false positive rate model are estimated only with data from controls. In contrast, the OR algorithm estimates all parameters in both models simultaneously by maximizing the likelihood of all the data. As a consequence, data from cases can influence the estimated false positive rate parameters, which may lead to better efficiency for the OR fitting algorithm. The two-step DM approach allows the forms of the two models to differ, thereby providing flexibility. However, as noted earlier, if one constrains the forms to be the same (for example, both probit), one can fit the false positive rate and ROC models simultaneously with the same maximum likelihood algorithm that the OR approach uses, by reparameterizing them as a single OR model.
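The two-step structure can be illustrated for the no-covariate case. In the sketch below, step (i) takes the empirical false positive rates *f̂*_{r} from the controls only, and step (ii) fits a probit regression of the case indicators *U*_{ir} = 1[*Y*_{i} ≥ *r*] on Φ^{−1}(*f̂*_{r}), using only the observed FPR values. The sketch is in the spirit of the binary-regression formulation cited above, but the grouping of cases, the Fisher scoring routine and the function names are our own assumptions, not the authors' R or Stata code.

```python
from statistics import NormalDist

PHI = NormalDist()  # PHI.cdf is Φ, PHI.pdf is φ, PHI.inv_cdf is Φ^{-1}


def empirical_fprs(control_ratings, R):
    """Step (i): f̂_r = proportion of controls with rating >= r, for r = 2..R."""
    n0 = len(control_ratings)
    return [sum(y >= r for y in control_ratings) / n0 for r in range(2, R + 1)]


def fit_binormal_dm(control_ratings, case_ratings, R, max_iter=100, tol=1e-10):
    """Step (ii): probit regression of U_ir = 1[Y_i >= r] on Φ^{-1}(f̂_r), cases only.

    All cases share the covariate value Φ^{-1}(f̂_r) for a given r, so the binary
    U_ir are grouped into one binomial proportion per r and fitted by Fisher scoring.
    Returns estimates (gamma_I, gamma_S) of the binormal curve (1).
    """
    n1 = len(case_ratings)
    groups = []
    for r, f in zip(range(2, R + 1), empirical_fprs(control_ratings, R)):
        if 0.0 < f < 1.0:  # only observed FPR values with a finite probit transform
            tpr = sum(y >= r for y in case_ratings) / n1
            groups.append((PHI.inv_cdf(f), tpr))
    if len(groups) < 2:
        raise ValueError("need at least two usable FPR values to fit intercept and slope")

    gamma_i, gamma_s = 0.0, 1.0  # starting values
    for _ in range(max_iter):
        # Weighted least squares normal equations for the design rows (1, q).
        s11 = s12 = s22 = t1 = t2 = 0.0
        for q, p_obs in groups:
            eta = gamma_i + gamma_s * q
            mu, dens = PHI.cdf(eta), PHI.pdf(eta)
            w = n1 * dens ** 2 / (mu * (1.0 - mu))  # Fisher weight for a probit binomial
            z = eta + (p_obs - mu) / dens            # working response
            s11 += w
            s12 += w * q
            s22 += w * q * q
            t1 += w * z
            t2 += w * q * z
        det = s11 * s22 - s12 * s12
        new_i = (s22 * t1 - s12 * t2) / det
        new_s = (s11 * t2 - s12 * t1) / det
        done = abs(new_i - gamma_i) + abs(new_s - gamma_s) < tol
        gamma_i, gamma_s = new_i, new_s
        if done:
            break
    return gamma_i, gamma_s
```

The asymmetry noted in the text is visible in the code: controls enter only through `empirical_fprs`, while the likelihood maximized in step (ii) involves cases alone.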

When the DM and OR algorithms are used to fit the same model, it is of interest to know which is more efficient, in the sense of producing more precise estimates. Established statistical theory shows that maximum likelihood estimates are asymptotically optimal: they are consistent and have the least sampling variability.^{20} This general result implies that parameter estimates and ROC values derived from the OR fitting algorithm, which are maximum likelihood estimates, have the smallest standard deviations, at least as sample sizes become large. The optimality of maximum likelihood usually manifests in small samples too. Since the DM fitting algorithm is not maximum likelihood, we use simulation studies to investigate the extent to which it is suboptimal.

For our simulations, data were generated under a variety of scenarios that gave rise to different true binormal regression models. In all scenarios, subjects were first assigned their case or control status, and their ordinal marker *Y* was then derived by categorizing a continuous marker *L* generated from a normal distribution with standard deviation 1 and the mean shown in Table 1. Specifically, *Y* was obtained by categorizing *L* at the cut points {Φ^{−1}(.1), Φ^{−1}(.3), Φ^{−1}(.5), Φ^{−1}(.7), Φ^{−1}(.9)} = {−1.28, −0.52, 0.00, 0.52, 1.28}. In all, 5 scenarios were studied: one with no covariate, two with a categorical covariate and two with a continuous covariate. The covariate was generated from the same distribution in cases and controls, namely from 4 categories with equal frequencies for the categorical covariate and from a uniform distribution on (−1, 1) for the continuous covariate. We generated datasets with equal numbers of cases and controls (*n* = 100, 200, 500) and fit the appropriate models using the DM and OR algorithms.
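A data-generating step of this kind might look as follows. The cut points and covariate distributions follow the description above; the latent means of Table 1 are not reproduced here, so `mean_fn` is a hypothetical stand-in, and coding the categorical covariate as 0 to 3 is likewise our assumption.

```python
import random
from statistics import NormalDist

PHI = NormalDist()

# Cut points on the latent scale: Φ^{-1}(.1), Φ^{-1}(.3), Φ^{-1}(.5), Φ^{-1}(.7), Φ^{-1}(.9).
CUT_POINTS = [PHI.inv_cdf(p) for p in (0.1, 0.3, 0.5, 0.7, 0.9)]


def categorize(latent):
    """Ordinal marker Y in 1..6: one plus the number of cut points at or below L."""
    return 1 + sum(latent >= c for c in CUT_POINTS)


def simulate(n, mean_fn, covariate, rng):
    """Generate n controls (D = 0) and n cases (D = 1) as (D, X, Y) triples.

    covariate: 'categorical' draws X from 4 equally likely levels coded 0..3 (coding assumed);
               'continuous' draws X uniformly on (-1, 1).
    mean_fn(d, x): hypothetical stand-in for the latent means listed in Table 1.
    """
    data = []
    for d in (0, 1):
        for _ in range(n):
            if covariate == "categorical":
                x = rng.randrange(4)
            else:
                x = rng.uniform(-1.0, 1.0)
            latent = rng.gauss(mean_fn(d, x), 1.0)  # standard deviation 1, as in the text
            data.append((d, x, categorize(latent)))
    return data


# Example: a hypothetical scenario in which X raises the latent mean in cases only.
sample = simulate(200, lambda d, x: d * (1.0 + 0.5 * x), "continuous", random.Random(1))
```

Each simulated dataset would then be passed to both fitting algorithms, and the spread of the resulting estimates across replicates compared.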

Table 1. Simulation scenarios used to evaluate the relative efficiency of ordinal regression (OR) versus direct ROC modeling (DM) fitting algorithms. After assigning a subject his case-control status and covariate value, a normally distributed variable with mean **...**

Table 2 shows results of analyses when no covariates were involved. Recall that the OR maximum likelihood algorithm is the classic Dorfman-Alf method for estimating the binormal curve; Table 2 compares it with the DM method. Interestingly, the means and standard deviations of estimates calculated with the DM method are almost identical to those from the Dorfman-Alf algorithm, indicating that in this setting the DM fitting algorithm provides estimates that are very near the theoretically optimal maximum likelihood estimates.

A subset of our results, pertaining to models generated and fit when covariates affect both the false positive rates and the ROC curve, is shown in Table 3 and in Figure 3. Conclusions were similar for the scenarios not reported. As expected, the OR algorithms are somewhat more efficient, but the differences appear to be small. Interestingly, ROC values seem to be estimated with essentially the same precision by the DM and OR fitting algorithms. The FPR values associated with each category of *Y* appear to be estimated a little more precisely with the OR than with the DM method. An intuitive explanation is that the DM method uses only data from controls to estimate the FPR values, while the OR method also incorporates data from cases by exploiting all the modeling assumptions in the likelihood that is maximized.

Accuracy estimates based on simulated data when a continuous covariate *X* affects both false positive rates and ROC curves. ROC values associated with covariate values *X*=−.5 and *X*=.5 are displayed. The sample size is *n*=200 cases and controls. Shown **...**

Results of fitting the binormal models described in relation to Figure 1 using the OR and DM fitting algorithms are displayed in Table 4 and Table 5. Table 4 shows the estimated ROC values at false positive rates of 0.1, 0.3 and 0.5. The estimates agree reasonably well. The confidence intervals for the DM and OR fitting algorithms agree closely when covariate effects on false positive rates are assumed not to exist, or when they are assumed to follow an ordinal regression model. However, for these data the intervals appear to be substantially narrower with the maximum likelihood OR algorithm when the false positive rates are calculated separately within each breast density category.

ROC values estimated with the direct ROC modeling (DM) and ordinal regression (OR) algorithms applied to the Breast Cancer Surveillance Consortium mammography data. Shown are point estimates and confidence intervals in parentheses. Confidence intervals **...**

The DM approach can be applied to continuous tests essentially as we have described it here. The analogue of the OR approach for application to continuous tests is to model the continuous test result distributions for cases and for controls. Covariates can be incorporated into the models if appropriate. These two approaches to evaluating continuous diagnostic tests, DM and modeling test results, have been compared qualitatively.^{8}^{,}^{21} We refer the reader to previous publications since similar conclusions apply in the context we consider here, namely ordinal valued tests. Perhaps the most important advantage identified for the DM framework over the OR framework is that DM allows one to succinctly compare the ROC accuracies of tests. Moreover, it has been shown how such comparisons can be made while accommodating covariate effects on test results at the same time. We now explore this methodology when ordinal valued tests are involved.

Comparisons are possible in the DM framework even when the tests themselves are on different scales. As an example, consider the data displayed in Figure 4, where a continuous biomarker test and an ordinal valued imaging test are available for 1000 cases and 1000 controls (these data were simulated and are available online at http://labs.fhcrc.org/pepe/dabs/datasets.html). Binormal ROC curves comparing all cases combined with controls are displayed in the left panel of Figure 5. Define a covariate, *X*_{test}, that specifies the test: *X*_{test} = 0 for the ordinal imaging test and *X*_{test} = 1 for the continuous biomarker test. The following is a comprehensive ROC model that includes both tests:

$${\text{ROC}}_{{X}_{\text{test}}}(f)=\mathrm{\Phi}({\gamma}_{I}+{\gamma}_{{\mathit{\text{IX}}}_{\text{test}}}{X}_{\text{test}}+({\gamma}_{S}+{\gamma}_{{\mathit{\text{SX}}}_{\text{test}}}{X}_{\text{test}}){\mathrm{\Phi}}^{-1}(f))$$

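When *X*_{test} = 0, the ROC curve relates to the ordinal test and has intercept γ_{I} and slope γ_{S}; when *X*_{test} = 1, it relates to the biomarker test and has intercept γ_{I} + γ_{IX_{test}} and slope γ_{S} + γ_{SX_{test}}.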

Figure 4. Data distributions for imaging and biomarker tests for breast cancer in controls, in cases with early stage cancer (darker shading) and in cases with late stage cancer (lighter shading).

Figure 5. ROC curves comparing imaging versus biomarker tests for breast cancer. Left panel groups all diseased subjects together for comparison with controls. Right panel separately compares late stage cases and early stage cases with controls. Points displayed **...**

An additional covariate in this dataset is the stage of disease, defined only for cases. The curves in the right panel of Figure 5 incorporate this covariate into the comparison of the two tests. The biomarker test appears to be more accurate than the imaging test in detecting early stage disease, but the two tests have similar performance for distinguishing between late stage disease and controls. The DM framework allows us to make rigorous statistical inference about these comparisons. Define *X*_{sev} = 1 for late stage and *X*_{sev} = 0 for early stage. We fit the following model to our data:

$${\text{ROC}}_{X}=\mathrm{\Phi}({\gamma}_{I}+{\gamma}_{{\mathit{\text{IX}}}_{\text{test}}}{X}_{\text{test}}+{\gamma}_{{\mathit{\text{IX}}}_{\text{sev}}}{X}_{\text{sev}}+{\gamma}_{{\mathit{\text{IX}}}_{\text{int}}}{X}_{\text{test}}{X}_{\text{sev}}+{\gamma}_{S}{\mathrm{\Phi}}^{-1}(f))$$

after determining that the slope parameter was unaffected by either *X*_{test} or *X*_{sev}. The ROC curve for the baseline covariate values *X*_{sev} = 0 and *X*_{test} = 0 (early stage disease versus controls using the imaging test) is determined by estimates of its intercept γ̂_{I} = 0.438 (95% CI: 0.322, 0.557) and its slope

Methods to implement analyses for comparing tests using the DM framework have been described previously.^{8} Briefly, the data are arranged as one data record per test result. Thus in our setting each subject has 2 data records, one for the imaging test and one for the biomarker test. Each record contains the variables (*Y*, *D*, *X*_{test}, *X*_{sev}). Since the variable *X*_{sev} is relevant only for cases, it is coded as missing for controls. The *f̂*_{rX_{test}} values are calculated empirically for the imaging test. That is, the observed proportions of controls with imaging test ratings at or exceeding *r* give rise to the values *f̂*_{rX_{test}} when *X*_{test} = 0. We use the same values of *f̂*_{rX_{test}} for the continuous biomarker test. This means that the same values on the horizontal axis of the ROC plot are used for fitting the test specific ROC curves. The DM algorithm proceeds by calculating *U*_{r,i} values for
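The record layout and the shared false positive rates might be set up as follows. This is a sketch under stated assumptions: subjects are dictionaries with hypothetical keys `imaging`, `biomarker`, `case` and `late_stage`, and the biomarker cutoff for each *f̂*_{r} is taken as an empirical upper quantile of the control biomarker values, which is one simple choice rather than a procedure stated in the source.

```python
def long_format(subjects):
    """One record (Y, D, X_test, X_sev) per test result; X_sev is None for controls."""
    records = []
    for s in subjects:
        d = 1 if s["case"] else 0
        x_sev = (1 if s["late_stage"] else 0) if d == 1 else None
        records.append((s["imaging"], d, 0, x_sev))    # X_test = 0: ordinal imaging test
        records.append((s["biomarker"], d, 1, x_sev))  # X_test = 1: continuous biomarker
    return records


def imaging_fprs(records, R):
    """f̂_r: proportion of controls with imaging rating at or above r, for r = 2..R."""
    ratings = [y for y, d, x_test, _ in records if d == 0 and x_test == 0]
    return [sum(y >= r for y in ratings) / len(ratings) for r in range(2, R + 1)]


def biomarker_threshold(records, f):
    """A biomarker cutoff whose empirical FPR among controls is approximately f
    (the k-th largest control value, k = round(f * n0)); one simple choice among several."""
    values = sorted((y for y, d, x_test, _ in records if d == 0 and x_test == 1), reverse=True)
    k = max(1, round(f * len(values)))
    return values[k - 1]


def case_indicators(records, R):
    """Rows (U, f̂_r, X_test, X_sev) for each case record at each shared FPR value;
    U = 1 if the result is at or above the test-specific threshold for f̂_r."""
    rows = []
    for r, f in zip(range(2, R + 1), imaging_fprs(records, R)):
        if not 0.0 < f < 1.0:
            continue  # Φ^{-1}(f) would be infinite
        bio_cut = biomarker_threshold(records, f)
        for y, d, x_test, x_sev in records:
            if d == 1:
                cut = r if x_test == 0 else bio_cut
                rows.append((int(y >= cut), f, x_test, x_sev))
    return rows
```

The resulting rows would then be analyzed by a probit regression of *U* on Φ^{−1}(*f̂*_{r}), *X*_{test}, *X*_{sev} and *X*_{test}*X*_{sev}, mirroring the terms of the model fitted above.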

The main purpose of this article is to contrast the direct ROC modeling method that is popular for continuous tests with the ordinal regression method that is widely popular for image rating tests. Table 6 summarizes our findings. We show that when a single test is under consideration, the models are usually equivalent in the sense that they make the same modeling assumptions. One is typically a reparameterization of the other. We hope that this recognition will help unify these apparently discrepant approaches to ROC analysis.

Yet there are major advantages to using the DM framework. In particular, when several tests are under consideration, it can be used to address scientific questions concerning comparisons between diagnostic tests. We showed through an example that additional covariates can be incorporated as well. Further examples of this general approach, with applications to continuous tests, are provided in Pepe 2003 (section 6.4).^{8} Here we applied the methodology to compare a continuous test with an ordinal test. This sort of comparison cannot be done within the OR framework for ROC analysis.

One advantage of the OR framework is that statistically optimal maximum likelihood methods are naturally employed for estimation with data. A variety of algorithms have been proposed for model fitting in the DM framework. We investigated the efficiency of one algorithm relative to maximum likelihood using simulation studies. We found that when the binormal model holds for the ROC curves this algorithm has performance almost equal to maximum likelihood. However, when the data deviate from modeling assumptions, the DM and OR methods may produce different results.

An issue that arises frequently in evaluations of imaging tests is that readers rate multiple images, giving rise to observations that are clustered by reader. When many readers participate in a study, such as in the *Breast Cancer Surveillance Consortium* data, and one wants to make inference pertaining to the population of readers, random effect models are often entertained in the OR framework.^{6}^{,}^{22} Random effects for readers may pertain to the thresholding criteria, to the ROC curves, or to both. The DM framework does not yet accommodate random effects in ROC models. Extensions of the DM approach to accommodate random effects warrant further research.

Supported in part by R01 CA129934: Considerations of Covariates in Biomarker Studies and R01 GM054438: Statistical Methods for Medical Tests and Biomarkers by the National Institutes of Health (NIH) to Dr. Pepe at Fred Hutchinson Cancer Research Center. This work was also supported by the National Cancer Institute-funded Breast Cancer Surveillance Consortium co-operative agreement (U01CA63740, U01CA86076, U01CA86082, U01CA63736, U01CA70013, U01CA69976, U01CA63731, U01CA70040). We thank the participating women, mammography facilities, and radiologists for the data they have provided for this study. A list of the BCSC investigators and procedures for requesting BCSC data for research purposes are provided at: http://breastscreening.cancer.gov/.

Daryl E. Morris, Biostatistics and Biomathematics, Public Health Sciences Division, Fred Hutchinson Cancer Research Center and Department of Biostatistics, University of Washington, Seattle, Washington.

Margaret Sullivan Pepe, Biostatistics and Biomathematics, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, and Department of Biostatistics, University of Washington, Seattle, Washington, Email: mspepe@u.washington.edu.

William E. Barlow, Group Health Center for Health Studies, Seattle, Washington.

1. Swets JA, Pickett RM. Evaluation of diagnostic systems: Methods from signal detection theory. New York: Academic Press; 1982.

2. Metz CE. Some practical issues of experimental design and data analysis in radiologic ROC studies. Invest Radiol. 1989;24:234–245.

3. Agresti A. Categorical data analysis. New York: Wiley; 1990.

4. Dorfman DD, Alf E. Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals-rating method data. J Math Psychol. 1969;6:487–496.

5. Tosteson AAN, Begg CB. A general regression methodology for ROC curve estimation. Med Decis Making. 1988;8:204–215.

6. Ishwaran H, Gatsonis CA. A general class of hierarchical ordinal regression models with applications to correlated ROC analysis. Can J Stat. 2000;28:731–750.

7. Pepe MS, Feng Z, Janes H, Bossuyt P, Potter J. Pivotal evaluation of the accuracy of a biomarker used for classification or prediction: Standards for study design. J Natl Cancer Inst. 2008;100:1432–1438.

8. Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press; 2003.

9. Pepe MS, Longton G, Janes H. Estimation and comparison of receiver operating characteristic curves. Stata Journal. 2009;9(1):1–16.

10. D’Orsi CJ, Bassett LW, Berg WA. Breast Imaging Reporting and Data System, BI-RADS: Mammography. 4th ed. Reston, VA: American College of Radiology; 2003.

11. Barlow WE. Accuracy of Screening Mammography Interpretation by Characteristics of Radiologists. J Natl Cancer Inst. 2004:1840–1850.

12. Janes H, Pepe MS. Adjusting for covariates in studies of diagnostic, screening or prognostic markers: an old concept in a new setting. Am J Epidemiol. 2008;168:89–97.

13. Janes H, Pepe MS. Adjusting for covariate effects on classification accuracy using the covariate adjusted ROC curve. Biometrika. 2009;96:383–398.

14. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2008. ISBN 3-900051-07-0, URL http://www.R-project.org.

15. Gonen M. Analyzing Receiver Operating Characteristic Curves with SAS. SAS Publishing; 2007.

16. Alonzo TA, Pepe MS. Distribution-free ROC analysis using binary regression techniques. Biostatistics. 2002;3:421–432.

17. Pepe MS, Cai T. The analysis of placement values for evaluating discriminatory measures. Biometrics. 2004;60:528–535.

18. StataCorp. Stata Statistical Software: Release 10. College Station, TX: StataCorp LP; 2007.

19. Janes H, Longton G, Pepe M. Accommodating covariates in ROC analysis. Stata Journal. 2009;9(1):17–39.

20. Casella G, Berger R. Statistical Inference. 2nd Edition. Duxbury Advanced Series; 2002.

21. Pepe MS. Three approaches to regression analysis of receiver operating characteristic curves for continuous test results. Biometrics. 1998;54:124–135.

22. Zheng YY, Barlow WE, Cutter G. Assessing accuracy of mammography in the presence of verification bias and intrareader correlation. Biometrics. 2005;61:259–268.
