


Acad Radiol. Author manuscript; available in PMC 2010 March 24.

Published in final edited form as:

PMCID: PMC2844793

NIHMSID: NIHMS13467

College of Optical Sciences, The University of Arizona, 1630 East University Blvd., Tucson, Arizona 85721, (TEL) 520-626-7280, (FAX) 520-626-2892, (email) clarkson/at/radiology.arizona.edu

The publisher's final edited version of this article is available at Acad Radiol


Current approaches to ROC analysis use the MRMC (multiple-reader, multiple-case) paradigm, in which several readers read each case and their ratings (or scores) are used to estimate the area under the ROC curve or some other ROC-related parameter. Standard practice is to decompose the parameter of interest, according to a linear model, into terms that depend in various ways on the readers, cases and modalities. Although the methodological aspects of MRMC analysis have been studied in detail, the literature on the probabilistic basis of the individual terms is sparse. In particular, few papers state what probability law applies to each term or what underlying assumptions are needed to justify the assumed independence. When probability distributions are specified for these terms, they are assumed to be Gaussian.

This paper approaches the MRMC problem from a mechanistic perspective. For a single modality, three sources of randomness are included: the images, the reader skill and the reader uncertainty. The probability law on the reader scores is written in terms of three nested conditional probabilities, and random variables associated with this probability are referred to as triply stochastic.

In this paper, we present the probabilistic MRMC model and apply this model to the Wilcoxon statistic. The result is a seven-term expansion for the variance of the figure of merit. We relate the terms in this expansion to those in the standard, linear MRMC model. Finally, we use the probabilistic model to derive constraints on the coefficients in the seven-term expansion.

The multiple-reader, multiple-case paradigm is often used to assess the performance of a new medical-imaging system or to compare the performances of two or more such systems. In this paradigm, we first select a random sample of abnormal and normal cases. Each case is then read individually by every member of a sample of readers. Each reader produces, for each image, a test statistic that measures his or her confidence that an abnormality is present. This array of test statistics is used to generate a figure of merit. An important issue is the variance of this figure of merit as a function of the numbers of readers and cases. This is the issue addressed by standard, linear MRMC models [1–3] and by the probabilistic model presented here.

The linear model presupposes that the figure of merit can be decomposed as a sum of statistically uncorrelated terms. For a single modality there are 5 terms. The first term is the mean value of the figure of merit and is a constant. The remaining 4 terms, the reader term, the case term, the reader-case term, and the internal noise, are random variables. The reader term is a function of the reader sample only. The case term is a function of the case sample only. The reader-case term is a function of both samples. Finally, the internal noise term accounts for all other sources of variability not accounted for in the previous 3 terms.

The conventional assumption for the linear model is that the random terms in the linear decomposition are mutually independent and normally distributed [1]. As with any model-based decomposition, this assumption cannot be verified directly. In particular, a normality assumption cannot be strictly valid if the figure of merit is the area under the ROC curve: a normal random variable is unbounded, whereas this area must lie between 0 and 1.

In this paper, we present a probabilistic formulation of the MRMC problem that accounts for case variability, reader variability, and reader uncertainty. We then use the methods and concepts of doubly- and triply-stochastic variables to directly derive an exact seven-term decomposition of the variance of the Wilcoxon statistic [4, 5] as a function of the numbers of readers and cases. Our results extend those of others who have studied the statistical properties of the Wilcoxon or Mann-Whitney statistics [6–10]. This paper expands upon results first presented in [11]. The probabilistic model introduced in that paper has already been used by B. Gallas [12] to develop a “one-shot” estimate of the components of variance for the Wilcoxon statistic with multiple readers and multiple cases. Here, we provide details of the theoretical foundations and of the subsequent derivations of the components of variance for the Wilcoxon statistic. We also derive constraints on the MRMC coefficients that result from the theoretical model.

In the probabilistic development [11], there is no need to define intermediate, unobservable random variables. The probabilistic assumptions that go into our model are derived from the physics of, and intuition about, the problem, rather than being independence assumptions imposed on the conventional linear model to make the problem tractable. The probabilistic approach also allows us to derive constraints on the coefficients in the seven-term expansion of the variance that cannot be derived from a linear model. Indeed, the normality assumption used in the conventional linear model is inconsistent with the statistical properties used to derive these constraints.

Nevertheless, we show that we may rigorously define a decomposition of the figure of merit in terms of uncorrelated, but not necessarily independent or normal, random variables that correspond to the terms in the standard linear model. The variances of these random variables can be identified with terms, or combinations of terms, in the seven-term expansion. Finally, we show that the seven-term expansion turns into a ten-term expansion when replication of the entire study is considered.

MRMC methodology accounts for multiple readers each reading multiple cases. In general, we will assume that a reader analyzes an image (case) and produces a test statistic that signifies the reader’s confidence that the image is abnormal. We do not assume that a given reader will produce the same value for the test statistic on multiple readings of the same image. This is due to the internal noise or reader jitter inherent in the diagnostic process. Thus, the fundamental random quantities in the MRMC problem are the case sample, the reader sample, and the resulting array of test statistics.

The image matrix $\mathbf{G}$ (the cases) is composed of column vectors, each representing an image. We subdivide this matrix into submatrices of signal-absent cases (i.e., normal cases) and signal-present cases (i.e., abnormal cases):

(1)

The matrix $G_0$ is $M \times N_0$ and $G_1$ is $M \times N_1$, where $M$ is the number of pixels in an image, $N_0$ is the number of signal-absent cases, and $N_1$ is the number of signal-present cases. The full image matrix $\mathbf{G}$ is

(2)

(3)

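The $g_{0i}$ and $g_{1j}$ are the $M \times 1$ column vectors representing the individual signal-absent and signal-present images, respectively.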

The reader parameters are also formed into column vectors $\boldsymbol{\gamma}_r$, one for each of the $N_R$ readers, and collected into the matrix $\boldsymbol{\Gamma}$:

(4)

This is a $K \times N_R$ matrix, where $K$ is the number of parameters describing each reader.

A reader produces a test statistic for each image. For a given case and reader, this test statistic is a random variable because of internal noise. The test statistics for all of the readers and cases are collected into a matrix $\mathbf{T}$, which is subdivided into submatrices corresponding to signal-absent and signal-present cases:

(5)

This is an $N_R$ ×

(6)

We can also concatenate these row vectors to make a vector of all test statistics for a given reader:

(7)

We now make some statistical assumptions. The cases are assumed to be drawn independently from signal-absent and signal-present distributions. The reader parameter vectors are assumed to be drawn independently from a distribution of such vectors. The readers are also assumed to be independent of the cases. Finally, the joint conditional density of the noisy test statistics is the product of the conditional densities of the individual readers' test statistics, and each of these densities depends only on the given reader and the cases. These assumptions can be summarized as follows:

(8)

(9)

(10)

(11)

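The fact that the readers are independent of the cases does not imply that there is no reader-case interaction. In fact, the reader-case interaction is embodied in the distribution $pr_t(\mathbf{t}_r \mid \boldsymbol{\gamma}_r, \mathbf{G})$, which depends jointly on the reader parameters and the cases.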
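To make the three nested sources of randomness concrete, the sketch below draws one simulated study from a triply stochastic model. Everything in it is a hypothetical illustration rather than the authors' model: each case is a two-component feature vector standing in for the $M$ pixels, each reader parameter vector holds two template weights and an internal-noise standard deviation, and a test statistic is the reader's weighted sum of the case features plus Gaussian internal noise. It respects the independence assumptions stated above (independent cases, readers independent of each other and of the cases, internal noise conditionally independent given reader and case), while the product of reader weights and case features still produces a reader-case interaction.

```python
import random
from dataclasses import dataclass


@dataclass
class Reader:
    # Hypothetical reader parameter vector gamma: template weights plus internal-noise SD.
    weights: tuple
    noise_sd: float


def draw_case(signal_present, rng):
    """Draw one case g (a 2-component stand-in for an image) from its class distribution."""
    mean = (1.0, 0.5) if signal_present else (0.0, 0.0)
    return tuple(rng.gauss(m, 1.0) for m in mean)


def draw_reader(rng):
    """Draw one reader parameter vector from a (hypothetical) population of readers."""
    return Reader(weights=(rng.gauss(1.0, 0.3), rng.gauss(1.0, 0.3)),
                  noise_sd=rng.uniform(0.2, 0.8))


def read_case(reader, case, rng):
    """One noisy test statistic; it depends only on this reader and this case."""
    mean = sum(w * g for w, g in zip(reader.weights, case))
    return rng.gauss(mean, reader.noise_sd)  # fresh internal noise on every reading


def simulate_study(n0, n1, n_readers, rng):
    """Return (T0, T1): N_R x N_0 and N_R x N_1 lists of test statistics."""
    g0 = [draw_case(False, rng) for _ in range(n0)]
    g1 = [draw_case(True, rng) for _ in range(n1)]
    readers = [draw_reader(rng) for _ in range(n_readers)]
    t0 = [[read_case(r, g, rng) for g in g0] for r in readers]
    t1 = [[read_case(r, g, rng) for g in g1] for r in readers]
    return t0, t1


T0, T1 = simulate_study(n0=5, n1=7, n_readers=3, rng=random.Random(0))
```

Calling `read_case` twice with the same reader and case gives different values, which is the reader jitter described earlier; calling `simulate_study` again also redraws the cases and readers.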

If $x$ is a random variable with conditional PDF $pr_x(x \mid z, y)$, where $z$ is itself a random variable with conditional PDF $pr_z(z \mid y)$, then

(12)

stands for the conditional expectation of $f(x)$ conditioned on $y$. In this expression we average over the distribution of $x$ given $(z, y)$, and then average the result over the distribution of $z$ given $y$. To perform this operation we need the conditional densities $pr_x(x \mid z, y)$ and $pr_z(z \mid y)$:

(13)

However, from an operational point of view, we do not know *pr _{x}*(

(14)

Note that Eqn. 12 includes the case where $x$ is a deterministic function $x(y, z)$ of $y$ and $z$, in which case

(15)
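The nested average in Eqn. 12 can also be estimated by sampling: draw $z$ from $pr_z(z \mid y)$, then draw $x$ from $pr_x(x \mid z, y)$, and average $f(x)$. The sketch below does this with two levels of Monte Carlo. The Gaussian densities in the example are assumptions chosen so that the exact answer is known: with $z \mid y \sim N(y, 1)$ and $x \mid z, y \sim N(z, 1)$, we have $x \mid y \sim N(y, 2)$, so the nested average of $f(x) = x^2$ equals $2 + y^2$.

```python
import random


def nested_expectation(f, y, sample_z, sample_x, n_outer, n_inner, rng):
    """Monte Carlo estimate of the nested conditional average <<f(x)>_{x|z,y}>_{z|y}."""
    outer_sum = 0.0
    for _ in range(n_outer):
        z = sample_z(y, rng)  # z ~ pr_z(z | y)
        # Inner average over x ~ pr_x(x | z, y) with z held fixed.
        inner = sum(f(sample_x(z, y, rng)) for _ in range(n_inner)) / n_inner
        outer_sum += inner
    return outer_sum / n_outer


# Hypothetical example densities: z | y ~ N(y, 1) and x | z, y ~ N(z, 1).
estimate = nested_expectation(
    f=lambda x: x * x,
    y=1.0,
    sample_z=lambda y, rng: rng.gauss(y, 1.0),
    sample_x=lambda z, y, rng: rng.gauss(z, 1.0),
    n_outer=2000,
    n_inner=20,
    rng=random.Random(1),
)
# The exact value here is 2 + y**2 = 3.0; the estimate should lie close to it.
```

When $x$ is a deterministic function of $y$ and $z$ (Eqn. 15), `sample_x` can simply return that function's value and `n_inner=1` suffices.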

Initially we will assume that the figure of merit has the following form

$$
\hat{A}(\mathbf{T}) = \frac{1}{N_R}\sum_{r=1}^{N_R} \hat{a}(\mathbf{t}_r) \tag{16}
$$

where $\hat{a}(\mathbf{t})$ is some figure of merit for an individual reader. Later we will be more specific about this function.

As an example of the probabilistic method and the notation introduced above, we compute the mean and variance of the figure of merit shown in Eqn. 16. From the independence assumptions on the readers the mean of the figure of merit can be written as

(17)

The inner angle bracket averages over internal noise with the reader and case sample fixed. The outer angle bracket is then the average of this quantity over readers and case samples.

For the expectation of the square of $\hat{A}$ we have a double sum, which we decompose into a single sum over matching indices and a double sum over non-matching indices (see Appendix). The end result is

(18)

Putting these results together, we obtain an expression for the variance of $\hat{A}$ in terms of moments of $\hat{a}(\mathbf{t})$:

(19)
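The structure of Eqn. 19 can be checked exactly on a small discrete example. Conditional on the cases, the readers' values of $\hat{a}(\mathbf{t}_r)$ are independent and identically distributed (independent readers, conditionally independent internal noise), so the variance of their average splits into the variance over cases of the noise-and-reader-averaged $\hat{a}$, plus $1/N_R$ times the case average of the remaining reader-plus-noise variance. In the sketch below, which is our reading of Eqns. 18–19 rather than a transcription of them, the two case samples and the conditional distributions of $\hat{a}$ are invented for illustration; exact rational arithmetic compares the two-part formula with brute-force enumeration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model: two equally likely case samples G, and for each G the
# distribution of a single reader's a-hat over reader skill and internal noise.
A_HAT_GIVEN_G = {
    "G1": {Fraction(6, 10): Fraction(1, 2), Fraction(8, 10): Fraction(1, 2)},
    "G2": {Fraction(7, 10): Fraction(1, 4), Fraction(9, 10): Fraction(3, 4)},
}
P_G = {"G1": Fraction(1, 2), "G2": Fraction(1, 2)}


def exact_variance(n_readers):
    """Var[A-hat] by enumerating G and every combination of the readers' a-hat values."""
    mean = second = Fraction(0)
    for g, pg in P_G.items():
        dist = A_HAT_GIVEN_G[g]
        for values in product(dist, repeat=n_readers):
            p = pg
            for v in values:
                p *= dist[v]  # readers are conditionally independent given G
            a_big = sum(values) / n_readers
            mean += p * a_big
            second += p * a_big ** 2
    return second - mean ** 2


def two_part_formula(n_readers):
    """Case variance of the averaged a-hat plus (1/N_R) times the mean conditional variance."""
    abar = {g: sum(v * p for v, p in d.items()) for g, d in A_HAT_GIVEN_G.items()}
    m2 = {g: sum(v * v * p for v, p in d.items()) for g, d in A_HAT_GIVEN_G.items()}
    mean = sum(P_G[g] * abar[g] for g in P_G)
    e_abar_sq = sum(P_G[g] * abar[g] ** 2 for g in P_G)
    e_m2 = sum(P_G[g] * m2[g] for g in P_G)
    return (e_abar_sq - mean ** 2) + (e_m2 - e_abar_sq) / n_readers
```

Only the second part shrinks as readers are added; the first part is a floor set by case sampling.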

The three moments we need to calculate in order to proceed further are

(20)

(21)

(22)

Equation 19 is an exact expression for the variance of the overall figure of merit in terms of expectations of the single-reader figure of merit. In order to compute these moments we need to specify our single-reader figure of merit $\hat{a}(\mathbf{t})$. In the next section we compute these three moments when $\hat{a}(\mathbf{t})$ is the Wilcoxon statistic.

Suppose reader $\boldsymbol{\gamma}$ produces test statistics

(23)

with

(24)

(25)

The Wilcoxon statistic $\hat{a}(\mathbf{t})$, as a function of these test statistics, is

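$$
\hat{a}(\mathbf{t}) = \frac{1}{N_0 N_1}\sum_{i=1}^{N_0}\sum_{j=1}^{N_1} s(t_{1j} - t_{0i}) \tag{26}
$$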

In this equation $s(t)$ is the unit step function, although that fact will not play a role in most of the calculations.
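A direct implementation of the single-reader Wilcoxon statistic, together with its average over readers, is sketched below. It counts a pair only when $t_{1j} - t_{0i} > 0$, i.e., it takes $s(0) = 0$, which is consistent with the property $s^2(t) = s(t)$ used later; this tie convention is our assumption. Many AUC implementations instead give tied pairs a weight of 1/2, which matters only when test statistics can coincide. The function names are hypothetical.

```python
def wilcoxon_statistic(t0, t1):
    """Single-reader Wilcoxon statistic: fraction of (signal-absent, signal-present)
    pairs in which the signal-present case receives the higher test statistic."""
    count = sum(1 for a in t0 for b in t1 if b - a > 0)  # step function s(b - a)
    return count / (len(t0) * len(t1))


def reader_averaged_fom(T0, T1):
    """Average the single-reader statistics over the N_R readers (rows of T0 and T1)."""
    return sum(wilcoxon_statistic(r0, r1) for r0, r1 in zip(T0, T1)) / len(T0)


# Two readers; rows are readers, columns are cases.
T0 = [[0.1, 0.4], [0.0, 1.0]]
T1 = [[0.3, 0.5, 0.9], [0.5, 0.6, 2.0]]
first_reader = wilcoxon_statistic(T0[0], T1[0])  # 5 of 6 pairs correctly ordered
overall = reader_averaged_fom(T0, T1)            # mathematically (5/6 + 4/6) / 2
```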

We will also use one more statistical assumption

(27)

This equation tells us that, conditional on the reader and cases, the components of $\mathbf{t}$ are independent. It also tells us that the conditional distribution of the internal noise on an individual test statistic depends only on the reader parameter vector and the corresponding case. If, for example, the internal noise is Gaussian, then the mean and variance of the test statistic for a given reader will depend only on the case at hand and the reader parameter vector.

We will show that the statistical assumptions provided above imply that the variance of the Wilcoxon statistic can be expanded as

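$$
\mathrm{Var}[\hat{A}] = \frac{\alpha_1}{N_0} + \frac{\alpha_2}{N_1} + \frac{\alpha_3}{N_0 N_1} + \frac{\alpha_4}{N_R} + \frac{\alpha_5}{N_R N_0} + \frac{\alpha_6}{N_R N_1} + \frac{\alpha_7}{N_R N_0 N_1} \tag{28}
$$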

We will call this the seven-term expansion for the variance of $\hat{A}$ and find explicit expressions for the coefficients $\alpha_n$. These expressions will, in turn, lead to constraints on these coefficients. For any given set of values for

The three moments shown in Eqns. 20–22 are all that we need to derive Eqn. 28.
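Once the $\alpha_n$ are known, the seven-term expansion is easy to evaluate. The helper below assumes that Eqn. 28 has the form $\alpha_1/N_0 + \alpha_2/N_1 + \alpha_3/(N_0N_1) + \alpha_4/N_R + \alpha_5/(N_RN_0) + \alpha_6/(N_RN_1) + \alpha_7/(N_RN_0N_1)$, which is consistent with the later identification of the first three terms with the case variance and of the fourth with the reader variance. The coefficient values in the example are invented for illustration only. The sketch makes one consequence visible: adding readers shrinks only the last four terms, so the variance cannot fall below the case contribution however many readers are used.

```python
def seven_term_variance(alpha, n0, n1, n_readers):
    """Var[A-hat] from the seven-term expansion, given alpha = (alpha_1, ..., alpha_7)."""
    a1, a2, a3, a4, a5, a6, a7 = alpha
    case_part = a1 / n0 + a2 / n1 + a3 / (n0 * n1)                       # no 1/N_R factor
    reader_part = (a4 + a5 / n0 + a6 / n1 + a7 / (n0 * n1)) / n_readers  # all scale as 1/N_R
    return case_part + reader_part


# Hypothetical coefficients, chosen only to illustrate the dependence on N_R.
alpha = (0.02, 0.02, 0.01, 0.001, 0.01, 0.01, 0.005)
for n_readers in (1, 5, 25, 1000):
    print(n_readers, seven_term_variance(alpha, n0=50, n1=50, n_readers=n_readers))
```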

For the first moment (Eqn. 20) we have

(29)

(30)

The last equality introduces (** γ**,

(31)

The penultimate equality here introduces
, which is (** γ**,

For the second of the three moments we average over cases after squaring. This gives

(32)

This sum involves averaging over observers before multiplying and averaging over cases. By separating the sum according to whether both indices match, exactly one index matches (two cases), or neither matches, we get four terms:

(33)

We are now in a position to compute the first part of the overall variance (Eqn. 19), which is the variance of the noise-and-reader-averaged figure of merit with respect to the case randomness. The result is three terms

(34)

with the coefficients given by

(35)

(36)

and

(37)

These equations are very similar to those in Hoeffding [6] and Lehmann [10]. Using the independence of the cases, we can simplify these expressions. The results are

(38)

(39)

(40)

In the $\alpha_1$ expression, the quantity inside the square brackets is a random variable, since $g_1$ has been averaged over but $g_0$ has not. The coefficient $\alpha_1$ is then the variance of this random variable. Similar remarks apply to $\alpha_2$.

For the third moment in our list we square before doing any averaging. This leads to a fourfold sum

(41)

As before we can break this down into four sums depending on which indices match, and use our independence assumptions to reduce this expectation to four terms:

(42)

The last term may require some explanation, which is provided in the Appendix. Using the fact that $s^2(t) = s(t)$, the first term reduces to

(43)

We are now ready to compute the second part of the overall variance (Eqn. 19). Combining the expressions we just derived with earlier ones we have

(44)

with

(45)

(46)

(47)

and

(48)

The first two terms in the expression for $\alpha_5$ are the average of a conditional variance of a random variable. A similar simplification is possible for $\alpha_6$ and $\alpha_7$. The end results are alternate expressions for these coefficients (see Appendix):

(49)

(50)

(51)

(52)

The quantity in the outer angle brackets in Eqn. 50 is the variance of the step function averaged over internal noise and cases for the signal-present class. The random variables involved in computing this variance are the readers and the internal noise for a signal-absent case. This variance is then averaged over signal-absent cases. A similar description applies to the bracketed terms in $\alpha_6$ and $\alpha_7$.

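To gain more insight into the significance of $\alpha_1$, $\alpha_2$, and $\alpha_3$, we expand the step function $s(t_1 - t_0)$, averaged over internal noise and readers and regarded as a function of the case pair $(g_0, g_1)$, as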

(53)

where

(54)

(55)

Thus, $s_0(g_0)$ is $s(t_1 - t_0)$ averaged over internal noise, readers and signal-present cases when the signal-absent case is $g_0$. Similarly, $s_1(g_1)$ is $s(t_1 - t_0)$ averaged over internal noise, readers and signal-absent cases when the signal-present case is $g_1$. The random variable $\varepsilon(g_0, g_1)$ is defined by Eqn. 53. It is straightforward to verify that the following expectations and conditional expectations vanish:

(56)

(57)

(58)

(59)

These equations, combined with the fact that $g_0$ and $g_1$ are independent, imply that $s_0(g_0)$, $s_1(g_1)$ and $\varepsilon(g_0, g_1)$ are uncorrelated random variables. This then gives us the expansion

(60)

From this expansion, Eqns. 38–40, and the definitions above, we can identify the coefficients $\alpha_1$, $\alpha_2$ and $\alpha_3$:

(61)

(62)

(63)
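The decomposition above is a two-way analysis-of-variance (Hoeffding-type) split of a function of two independent arguments. The sketch below assumes, purely for illustration, that $g_0$ and $g_1$ are each uniformly distributed over a few values, so that the noise-and-reader-averaged step function becomes a small matrix `S[i][j]` with invented entries in [0, 1]. Row averages play the role of $s_0(g_0)$, column averages that of $s_1(g_1)$, and the residual after subtracting both (and adding back the overall mean once, our own centering convention) that of $\varepsilon(g_0, g_1)$. Exact rational arithmetic confirms that the three variances add up to the total, which, as we read Eqns. 60–63, is $\alpha_1 + \alpha_2 + \alpha_3$.

```python
from fractions import Fraction


def variance(values):
    """Population variance of equally weighted values."""
    values = list(values)
    m = sum(values) / Fraction(len(values))
    return sum((v - m) ** 2 for v in values) / len(values)


def two_way_decomposition(S):
    """Write S[i][j] = s0[i] + s1[j] - A + eps[i][j] for uniform g0 (rows), g1 (columns)."""
    n0, n1 = len(S), len(S[0])
    A = sum(sum(row) for row in S) / Fraction(n0 * n1)                       # overall mean
    s0 = [sum(row) / Fraction(n1) for row in S]                              # average over g1
    s1 = [sum(S[i][j] for i in range(n0)) / Fraction(n0) for j in range(n1)]  # average over g0
    eps = [[S[i][j] - s0[i] - s1[j] + A for j in range(n1)] for i in range(n0)]
    return A, s0, s1, eps


# Hypothetical noise-and-reader-averaged values of s(t1 - t0) for 2 x 3 case pairs.
S = [[Fraction(1, 2), Fraction(3, 4), Fraction(1)],
     [Fraction(1, 4), Fraction(1, 2), Fraction(5, 8)]]
A, s0, s1, eps = two_way_decomposition(S)
alpha1, alpha2 = variance(s0), variance(s1)
alpha3 = variance(e for row in eps for e in row)
total = variance(x for row in S for x in row)
```

The residual has zero row and column sums, which is the finite analogue of the vanishing conditional expectations that make the three pieces uncorrelated.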

A random variable that is constrained to be between 0 and 1 has a maximum variance of 1/4. This fact and Eqns. 60–63 above lead to the constraints

(64)

(65)

(66)

(67)

These constraints define a bounded region in the space of points $(\alpha_1, \alpha_2, \alpha_3)$ and thus allow us to compute, for any given values of $N_0$ and $N_1$, the maximum possible contribution to the variance of $\hat{A}$ from the first three terms in the seven-term expansion:

(68)

This bound represents a worst-case scenario; in practice we would expect this sum to be significantly smaller than the upper bound.
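Under one reading of the constraints, the worst case has a simple closed form. Assume that the feasible set for the first three coefficients is $\alpha_1, \alpha_2, \alpha_3 \ge 0$ together with $\alpha_1 + \alpha_2 + \alpha_3 \le 1/4$: nonnegative variances whose sum, by the uncorrelated expansion of Eqn. 60, is the variance of a quantity confined to [0, 1]. This is our assumption about the content of Eqns. 64–67. That set is a simplex, so the linear objective $\alpha_1/N_0 + \alpha_2/N_1 + \alpha_3/(N_0N_1)$ is maximized at a vertex, giving $1/(4\min(N_0, N_1))$. If Eqns. 64–67 impose further restrictions, the bound in Eqn. 68 can only be tighter. The sketch checks the vertex enumeration against the closed form.

```python
from fractions import Fraction


def max_case_contribution(n0, n1):
    """Largest alpha1/N0 + alpha2/N1 + alpha3/(N0*N1) over the assumed simplex
    {alpha_n >= 0, alpha1 + alpha2 + alpha3 <= 1/4}; a linear function attains
    its maximum over a simplex at one of the vertices."""
    quarter = Fraction(1, 4)
    coeffs = (Fraction(1, n0), Fraction(1, n1), Fraction(1, n0 * n1))
    vertices = [(0, 0, 0), (quarter, 0, 0), (0, quarter, 0), (0, 0, quarter)]
    return max(sum(a * c for a, c in zip(v, coeffs)) for v in vertices)


print(max_case_contribution(50, 100))  # 1/200, i.e. 1/(4 * min(50, 100))
```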

Equations 49–52 lead to the following constraints

(69)

(70)

(71)

(72)

These constraints define a bounded region in the space of points $(\alpha_4, \alpha_5, \alpha_6, \alpha_7)$. This allows us to compute, for any given $N_R$,

(73)

Again, we would expect this sum to be significantly smaller in practice. Nevertheless, we can now write an upper bound for the variance of $\hat{A}$:

(74)

This bound could be useful in simulations where the numbers of cases and readers are easy to change and computing the $\alpha_n$ would be tedious.

To compute the $\alpha_n$ in the full expansion for the variance of

(75)

(76)

(77)

(78)

one at the case-averaged level,

(79)

and two at the test statistic level

(80)

(81)

The $\alpha_n$ are then linear combinations of these moments.

We now wish to see how the expansion given above for the variance of $\hat{A}(\mathbf{T})$ compares with the more standard approach to MRMC, which uses an expansion into uncorrelated components [1,2]. For this purpose we set

$$
\hat{A} = \mu + r + c + rc + \varepsilon \tag{82}
$$

and define each term in this expansion in terms of averages. The first term, $\mu$, is the overall mean

(83)

The second term is the reader term

(84)

This random variable is a function of the reader sample $\boldsymbol{\Gamma}$. The third term is the case term

(85)

This random variable is a function of the case sample $\mathbf{G}$. The fourth term is the reader-case term

(86)

This random variable is a function of $\boldsymbol{\Gamma}$ and $\mathbf{G}$. The last term is the only one that depends on the internal noise of the readers, via the matrix of test statistics

(87)

We will call this the noise term. It is straightforward to show that

(88)

(89)

(90)

(91)

(92)

These equations, together with the independence of $r$ and $c$, can then be used to show that $r$, $c$, $rc$ and $\varepsilon$ are statistically uncorrelated. This fact gives us the following expansion for the variance of the figure of merit:

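$$
\mathrm{Var}[\hat{A}] = \mathrm{Var}[r] + \mathrm{Var}[c] + \mathrm{Var}[rc] + \mathrm{Var}[\varepsilon] \tag{93}
$$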
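The conditional-expectation definitions of the four random terms can be exercised on a toy model in which the reader, the case sample and the internal-noise state each take a few equally likely values and are mutually independent. Following the verbal descriptions in the text, the reader term is the average of $\hat{A}$ over noise and cases minus $\mu$, the case term is the average over noise and readers minus $\mu$, the reader-case term is the noise average minus $\mu$, $r$ and $c$, and the noise term is what remains. The function `fom` is invented for illustration and includes a reader-case interaction and reader-dependent noise. Exact arithmetic confirms that the component variances add up to the total, as the uncorrelated expansion requires.

```python
from fractions import Fraction
from itertools import product


def mean(xs):
    xs = list(xs)
    return sum(xs, Fraction(0)) / len(xs)


def linear_decomposition(fom, readers, cases, noises):
    """Return mu, r, c, rc, eps for A-hat = fom(reader, cases, noise), with all three
    arguments independent and uniformly distributed over the given values."""
    cond = {(g, k): mean(fom(g, k, n) for n in noises) for g, k in product(readers, cases)}
    mu = mean(cond.values())                                        # overall mean
    r = {g: mean(cond[g, k] for k in cases) - mu for g in readers}  # reader term
    c = {k: mean(cond[g, k] for g in readers) - mu for k in cases}  # case term
    rc = {(g, k): cond[g, k] - mu - r[g] - c[k] for g, k in cond}  # reader-case term
    eps = {(g, k, n): fom(g, k, n) - cond[g, k]                     # noise term
           for g, k, n in product(readers, cases, noises)}
    return mu, r, c, rc, eps


def fom(g, k, n):
    # Hypothetical figure of merit for reader g, case sample k and noise state n.
    return (Fraction(1, 2) + Fraction(g, 10) + Fraction(k, 20)
            + Fraction(g * k, 40) + Fraction(n * (g + 1), 50))


READERS, CASES, NOISES = (0, 1, 2), (0, 1), (-1, 1)
mu, r, c, rc, eps = linear_decomposition(fom, READERS, CASES, NOISES)
var = {name: mean(v * v for v in d.values())  # every term has zero mean
       for name, d in (("r", r), ("c", c), ("rc", rc), ("eps", eps))}
total = mean((fom(g, k, n) - mu) ** 2 for g, k, n in product(READERS, CASES, NOISES))
```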

We will now examine each term in this expansion.

The reader term may be written as follows

(94)

For the second moment of this random variable, which is also its variance because $r$ has zero mean, we have, via the now-familiar manipulations of the square of a sum,

(95)

(96)

The first equality follows from the independence of the readers, the second from the definition of *μ*, and the third from the definition of (** γ**,

(97)

Thus the variance of the reader term can be identified with the fourth term in the seven-term expansion for $\mathrm{Var}[\hat{A}]$.

For the case term we can write

(98)

(99)

(100)

The variance is given by

(101)

(102)

In other words, the first three terms in the seven-term expansion for $\mathrm{Var}[\hat{A}]$ comprise the variance of the case term.

It should be noted that if $N_1 = p\,N_{\text{total}}$ and $N_0 = (1 - p)\,N_{\text{total}}$, where $p$ is the prevalence, then the variance of the case term is given by

(103)

The first term in Eqn. 103 agrees with standard MRMC models [2]. The second term can contribute substantially when $N_{\text{total}}$ is small and becomes negligible for sufficiently large $N_{\text{total}}$.
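For reference, substituting $N_1 = p\,N_{\text{total}}$ and $N_0 = (1-p)\,N_{\text{total}}$ into the first three terms of the seven-term expansion, taken to be $\alpha_1/N_0 + \alpha_2/N_1 + \alpha_3/(N_0N_1)$, gives the split below; the symbol $p$ for the prevalence is ours, and the display is a reconstruction of what we understand Eqn. 103 to contain:

$$
\frac{\alpha_1}{N_0}+\frac{\alpha_2}{N_1}+\frac{\alpha_3}{N_0N_1}
= \frac{1}{N_{\text{total}}}\left[\frac{\alpha_1}{1-p}+\frac{\alpha_2}{p}\right]
+ \frac{\alpha_3}{p\,(1-p)\,N_{\text{total}}^{2}}
$$

The bracketed term falls off as $1/N_{\text{total}}$ and the $\alpha_3$ term as $1/N_{\text{total}}^2$, which is why the latter matters mainly for small studies or extreme prevalence.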

The reader/case term can be written as

(104)

For the variance we use the fact that $r$, $c$ and $rc$ are uncorrelated and have zero mean to get

(105)

This equation then gives us

(106)

The first term in the square brackets is a new moment that does not appear in the computation of the seven-term expansion for $\mathrm{Var}[\hat{A}]$. This moment is discussed further in the Appendix.

The noise term is explicitly given by

(107)

By rearranging the variance expansion for $\hat{A}$ we have

(108)

This then gives us

(109)

(110)

Note that it is $rc + \varepsilon$ that accounts for the last three terms in the seven-term expansion. Separating $rc + \varepsilon$ into $rc$ and $\varepsilon$ does not appear to be very useful at this point: moments appear in the individual variances of $rc$ and $\varepsilon$ that cancel out, and therefore do not appear in the expressions for the $\alpha_n$. It would therefore be somewhat wasteful to compute their variances separately. This situation changes when we consider replication.

Now we replicate the trial $K$ times with the same cases and readers, and assume that the internal reader noise is independent and identically distributed from one trial to the next (i.e., the readers do not learn anything between trials). The average figure of merit over the $K$ trials is then

(111)

The mean value of $\hat{A}_K$ is given by

(112)

For the variance we need the second moment, which can be expanded as

(113)

This expansion follows from the usual independence arguments. We can now write for the variance

(114)

The new moment we need is

(115)

This expansion follows from the conditional independence of the internal noise and the independence of the readers. Now we may write

(116)

The moments involved here have all been worked out above or in the Appendix. The result is a ten-term expansion which we will describe below.

We may also expand into uncorrelated components as before

(117)

(118)

The second line here follows from the conditional independence between trials. Now we have

(119)

where the first three variances are those given above, and the last variance is given by

(120)
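Replication affects only the noise term: averaging $K$ readings that share the same readers and cases leaves the noise average (and hence $\mu$, $r$, $c$ and $rc$) unchanged, while the conditional independence of the internal noise between trials divides the noise variance by $K$. The sketch below verifies $\mathrm{Var}[\hat{A}_K] = \mathrm{Var}[\hat{A}] - \mathrm{Var}[\varepsilon] + \mathrm{Var}[\varepsilon]/K$ by exact enumeration on an invented toy model with equally likely readers, case samples and noise states. It illustrates the principle behind the replicated expansion; it is not a reproduction of Eqns. 117–124.

```python
from fractions import Fraction
from itertools import product

READERS, CASES, NOISES = (0, 1), (0, 1, 2), (-1, 1)


def fom(g, c, n):
    # Hypothetical single-trial figure of merit for reader g, case sample c, noise state n.
    return (Fraction(1, 2) + Fraction(g, 10) + Fraction(c, 30)
            + Fraction(g * c, 60) + Fraction(n * (c + 1), 40))


def var_replicated(k):
    """Exact Var[A-hat_K]: same reader and cases, K i.i.d. noise states, averaged."""
    outcomes = []
    for g, c in product(READERS, CASES):
        for ns in product(NOISES, repeat=k):  # every equally likely noise sequence
            outcomes.append(sum(fom(g, c, n) for n in ns) / k)
    m = sum(outcomes) / len(outcomes)
    return sum((x - m) ** 2 for x in outcomes) / len(outcomes)


def noise_variance():
    """Var[eps]: the conditional variance over one noise draw, averaged over (g, c)."""
    total = Fraction(0)
    for g, c in product(READERS, CASES):
        vals = [fom(g, c, n) for n in NOISES]
        m = sum(vals) / len(vals)
        total += sum((v - m) ** 2 for v in vals) / len(vals)
    return total / (len(READERS) * len(CASES))
```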

The dependencies on numbers of cases, readers and trials are given by

(121)

(122)

(123)

(124)

Explicit expressions for the $\beta_n$ and the

We have developed a probabilistic framework for analyzing MRMC problems. We have applied this framework to the Wilcoxon statistic and derived an exact seven-term expansion for the variance of the figure of merit as a function of the numbers of readers and cases. We have used the probabilistic model to derive constraints on the coefficients in this expansion. These constraints, in turn, provide an upper bound on the variance of the Wilcoxon statistic. We introduced a linear decomposition of the figure of merit into uncorrelated random variables that are defined in terms of conditional expectations over the readers, cases, and test statistics. This linear decomposition has the same structure as the conventional MRMC decomposition. We have shown that the variances of the individual terms in the linear decomposition can be related to the terms in the seven-term expansion. Finally, we have shown that replication of the MRMC experiment results in a ten-term expansion.

In the future, we plan to validate this seven-term expansion of the variance of the Wilcoxon statistic in simulation. We will also apply this methodology to real data. We are especially interested in computing the variance of the Wilcoxon statistic for ideal, Bayesian observers which we calculate using Markov chain Monte Carlo techniques. Finally, we are working on the extension of the probabilistic model to account for multiple modalities as well as multiple readers and multiple cases.

We thank Drs. Charles Metz, Brandon Gallas and Robert Wagner for their many helpful discussions about this topic. This work was supported by NIH/NCI grant K01 CA87017 and by NIH/NIBIB grants R01 EB002146, R37 EB000803, P41 EB002035.

What follows is a derivation of Eqn. 18.

(125)

(126)

(127)

(128)

(129)

The second equality follows from the independence of the test statistics when the readers and cases are fixed. The third equality follows from the independence of the reader parameters, and the fact that they are identically distributed.

We start with the sum over all four indices with no matched indices in Eqn. 41,

(130)

(131)

(132)

(133)

(134)

The first equality follows from independence of the internal noise when readers and cases are fixed. The fourth equality follows from independence of cases.

The first step to derive Eqn. 50 is to rewrite the first term in Eqn. 46 as

(135)

(136)

(137)

where the first equality follows from conditional independence of the internal noise and the second equality from independence of the cases. The second step is to rewrite the second term in Eqn. 46 as

(138)

(139)

where again independence of cases is used. Now we use the fact that

The first moment in Eqn. 106 can be expanded as

(141)

Note that Var [*rc*] has no term that varies as
. This variance will only have terms that vary as (*N _{R}N*

**Publisher's Disclaimer: **This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

1. Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis: generalization to the population of readers and patients with the jackknife method. Investigative Radiology. 1992;27:723–731.

2. Beiden SV, Wagner RF, Campbell G. Components-of-variance models and multiple-bootstrap experiments: An alternative method for random-effects, receiver operating characteristic analysis. Academic Radiology. 2000;7:342–349.

3. Roe CA, Metz CE. Variance-component modeling in the analysis of receiver operating characteristic index estimates. Academic Radiology. 1997;4(8):587–600.

4. Wilcoxon F. Individual comparisons by ranking methods. Biometrics Bulletin. 1945;1:80–83.

5. Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics. 1947;18:50–60.

6. Hoeffding W. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics. 1948;19:293–325.

7. Bamber D. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology. 1975;12:387–415.

8. Noether GE. Elements of Nonparametric Statistics. New York: Wiley; 1967.

9. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics. 1988;44:837–845.

10. Lehmann EL. Consistency and unbiasedness of certain nonparametric tests. Annals of Mathematical Statistics. 1951;22:165–179.

11. Barrett HH, Kupinski MA, Clarkson E. Probabilistic foundations of the MRMC method. In: Medical Imaging 2005: Image Perception, Observer Performance, and Technology Assessment. SPIE; 2005. pp. 21–31.

12. Gallas BD. One-shot estimate of MRMC variance: AUC. Academic Radiology. 2006;13:353–362.
