Acad Radiol. Author manuscript; available in PMC 2010 March 24.
Published in final edited form as:
PMCID: PMC2844793

A Probabilistic Model for the MRMC Method. Part 1. Theoretical Development


Rationale and Objectives

Current approaches to ROC analysis use the MRMC (multiple-reader, multiple-case) paradigm in which several readers read each case and their ratings (or scores) are used to construct an estimate of the area under the ROC curve or some other ROC-related parameter. Standard practice is to decompose the parameter of interest according to a linear model into terms that depend in various ways on the readers, cases and modalities. Though the methodological aspects of MRMC analysis have been studied in detail, the literature on the probabilistic basis of the individual terms is sparse. In particular, few papers state what probability law applies to each term and what underlying assumptions are needed for the assumed independence. When probability distributions are specified for these terms, these distributions are assumed to be Gaussians.

Materials and Methods

This paper approaches the MRMC problem from a mechanistic perspective. For a single modality, three sources of randomness are included: the images, the reader skill and the reader uncertainty. The probability law on the reader scores is written in terms of three nested conditional probabilities, and random variables associated with this probability are referred to as triply stochastic.

Results and Discussion

In this paper, we present the probabilistic MRMC model and apply this model to the Wilcoxon statistic. The result is a seven-term expansion for the variance of the figure of merit. We relate the terms in this expansion to those in the standard, linear MRMC model. Finally, we use the probabilistic model to derive constraints on the coefficients in the seven-term expansion.

Keywords: ROC analysis, multiple reader multiple case, Wilcoxon statistic


The multiple-reader, multiple-case paradigm is often used to assess the performance of a new medical-imaging system or to compare the performances of two or more such systems. In this paradigm, we first select a random sample of abnormal and normal cases. Each of these cases is individually read by each member in a sample of readers. Each reader produces a test statistic for each image which measures his or her confidence that an abnormality is present. This array of test statistics is used to generate a figure of merit. An important issue is the variance of this figure of merit as a function of the number of readers and cases. This is the issue addressed by standard, linear MRMC models [1–3] and by the probabilistic model presented here.

The linear model presupposes that the figure of merit can be decomposed as a sum of statistically uncorrelated terms. For a single modality there are 5 terms. The first term is the mean value of the figure of merit and is a constant. The remaining 4 terms, the reader term, the case term, the reader-case term, and the internal noise, are random variables. The reader term is a function of the reader sample only. The case term is a function of the case sample only. The reader-case term is a function of both samples. Finally, the internal noise term accounts for all other sources of variability not accounted for in the previous 3 terms.

The conventional assumption for the linear model is that the random terms in the linear decomposition are mutually independent and normally distributed [1]. As with any model-based decomposition, this assumption cannot be verified directly. In particular, a normality assumption cannot be valid if the figure of merit is the area under the ROC curve since this quantity must be between 0 and 1.

In this paper, we present a probabilistic formulation of the MRMC problem. We account for case variability, reader variability, and reader uncertainty. We then use the methods and concepts of doubly- and triply-stochastic variables to directly derive an exact seven-term decomposition of the variance of the Wilcoxon statistic [4, 5] as a function of the numbers of readers and cases. Our results extend the work of others who have studied the statistical properties of the Wilcoxon or Mann-Whitney statistics [6–10]. This paper expands upon results first presented in [11]. The probabilistic model introduced in that paper has already been used by B. Gallas [12] to develop a “one-shot” estimate of the components of variance for the Wilcoxon statistic with multiple readers and multiple cases. Here, we provide details of the theoretical foundations and subsequent derivations for the components of variance for the Wilcoxon statistic. We also derive constraints on the MRMC coefficients that result from the theoretical model.

In the probabilistic development [11], there is no need to define intermediate and unobservable random variables. The probabilistic assumptions that go into our model are derived from the physics and intuition of the problem as opposed to the independence assumptions used for the conventional linear model to make the problem tractable. The probabilistic approach also allows us to derive constraints on the coefficients in the seven-term expansion of the variance which cannot be derived from a linear model. Indeed the normality assumption used in the conventional linear model is inconsistent with the statistical properties used to derive these constraints.

Nevertheless, we show that we may rigorously define a decomposition of the figure of merit in terms of uncorrelated, but not necessarily independent or normal, random variables that correspond to the terms in the standard linear model. The variances of these random variables can be identified with terms, or combinations of terms, in the seven-term expansion. Finally, we show that the seven-term expansion turns into a ten-term expansion when replication of the entire study is considered.


MRMC methodology accounts for multiple readers each reading multiple cases. In general, we will assume that a reader analyzes an image (case) and produces a test statistic that signifies the reader’s confidence that the image is abnormal. We do not assume that a given reader will produce the same value for the test statistic on multiple readings of the same image. This is due to the internal noise or reader jitter inherent in the diagnostic process. Thus, the fundamental random quantities in the MRMC problem are the case sample, the reader sample, and the resulting array of test statistics.

2.1 Cases, readers and test statistics

The image matrix G (the cases) is composed of column vectors each representing an image. We subdivide this matrix into submatrices of signal-absent cases (i.e., normal cases), G0, and signal-present cases (i.e., abnormal cases), G1.


The matrix G0 is M × N0 and G1 is M × N1, where M is the number of pixels in an image, N0 is the number of signal-absent cases, and N1 is the number of signal-present images. The full image matrix G is M × N with N = N0 + N1. The submatrices are decomposed into the individual case vectors as follows:



The g0i and g1j are column vectors of image data. For digital imaging, these column vectors are finite dimensional, although this assumption is not required for our probabilistic development.

The reader parameters are also formed into column vectors γr, one for each of NR readers, and then collected into the reader matrix Γ:


This is a K × NR matrix, where NR is the number of readers, and K is the dimension of a reader parameter vector. The reader vectors γr may be mathematical constructs, such as a template for a linear model observer, or they may simply be strings of numbers that are used to identify readers in an observer study. In fact, the γr do not even have to be numerical vectors; they could, for example, be the names of the radiologists in an observer study.

A reader produces a test statistic for each image. For a given case and reader this test statistic is a random variable due to internal noise. The test statistics for all of the readers and cases are collected into a matrix T. This matrix is subdivided into submatrices corresponding to signal-absent cases, T0, and signal-present cases, T1:


This is an NR × N matrix of the noisy values of the reader test statistics. Rows of the two submatrices correspond to individual readers and give a reader’s test statistics for all of the signal-absent images and all of the signal-present images, respectively:


We can also concatenate these row vectors to make a vector of all test statistics for a given reader:


2.2 Statistical assumptions

We make some statistical assumptions at this point. The cases are assumed to be drawn independently from signal-absent and signal-present distributions. The reader parameter vectors are assumed to be drawn independently from a distribution of such vectors. The readers are also assumed to be independent of the cases. Finally, the joint conditional density for the noisy test statistics is a product of conditional densities for the individual reader test statistics. Furthermore, this latter distribution depends only on the given reader and the cases. These assumptions can be summarized as follows:





The fact that the readers are independent from the cases does not imply that there is no reader-case interaction. In fact, the reader-case interaction is embodied in the distribution prt(tr|γr, G) which we discuss in more detail below. This independence assumption simply implies that the selection of the reader sample is not dependent on the selection of the case sample.
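The sampling hierarchy described by these assumptions can be sketched as a small simulation. This is a minimal illustration only: reducing each case to a single scalar, the Gaussian distributions, and the names `simulate_scores`, `signal`, `skill_sd` and `noise_sd` are assumptions made for the example and are not part of the model above, which places no such restrictions on the images or the reader parameters.

```python
import random

def simulate_scores(n_readers, n0, n1, seed=0,
                    signal=1.0, skill_sd=0.3, noise_sd=0.5):
    """Draw one toy MRMC data set with three nested sources of randomness.

    Every distribution below is an illustrative assumption.
    """
    rng = random.Random(seed)
    # 1) Cases: independent draws from signal-absent and signal-present laws.
    g0 = [rng.gauss(0.0, 1.0) for _ in range(n0)]
    g1 = [rng.gauss(signal, 1.0) for _ in range(n1)]
    # 2) Readers: drawn independently of each other and of the cases.
    gammas = [rng.gauss(1.0, skill_sd) for _ in range(n_readers)]

    # 3) Internal noise: given the reader and the case, each test statistic
    #    is an independent draw that depends only on that reader and case.
    def read(gamma, g):
        return gamma * g + rng.gauss(0.0, noise_sd)

    t0 = [[read(gamma, g) for g in g0] for gamma in gammas]  # N_R x N_0
    t1 = [[read(gamma, g) for g in g1] for gamma in gammas]  # N_R x N_1
    return t0, t1
```

The reader-case interaction enters only through `read`, which uses both the reader parameter and the case; the reader sample itself is drawn without reference to the case sample.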

2.3 A note on notation

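If x is a random variable with conditional PDF prx (x|y, z), conditioned on the random variables y and z, then

\[ \langle \langle f(x) \rangle_{x|y,z} \rangle_{z|y} = \int dz\, \mathrm{pr}_z(z|y) \int dx\, \mathrm{pr}_x(x|y,z)\, f(x) \qquad (12) \]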


stands for the conditional expectation of f(x) conditioned on y. In this expression we are averaging over the distribution of x given (z, y), and then averaging over the distribution of z given y. To perform this operation we need the conditional densities prx (x|y, z) and prz (z|y). The end result is a function of y. It appears that Eqn. 12 could be reduced to a single integral,

\[ \langle f(x) \rangle_{x|y} = \int dx\, \mathrm{pr}_x(x|y)\, f(x). \qquad (13) \]


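However, from an operational point of view, we do not know prx(x|y), whereas with the probabilistic assumptions above we can calculate or approximate the integrals in Eqn. 12. When z and y are independent, as will often be the case, Eqn. 12 reduces to

\[ \langle \langle f(x) \rangle_{x|y,z} \rangle_{z} = \int dz\, \mathrm{pr}_z(z) \int dx\, \mathrm{pr}_x(x|y,z)\, f(x). \qquad (14) \]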


Note that Eqn. 12 includes the case where x is a deterministic function x (y, z) of y and z, in which case

\[ \langle f(x(y,z)) \rangle_{z|y} = \int dz\, \mathrm{pr}_z(z|y)\, f(x(y,z)). \qquad (15) \]


2.4 Figure of merit

Initially we will assume that the figure of merit has the following form

\[ \hat{A}(T) = \frac{1}{N_R} \sum_{r=1}^{N_R} \hat{a}(t_r), \qquad (16) \]


where â (t) is some figure of merit for an individual reader. Later we will be more specific about this function.

2.5 Mean and Variance

As an example of the probabilistic method and the notation introduced above, we compute the mean and variance of the figure of merit shown in Eqn. 16. From the independence assumptions on the readers the mean of the figure of merit can be written as

\[ \langle \hat{A} \rangle = \langle \langle \hat{a}(t) \rangle_{t|\gamma,G} \rangle_{\gamma,G}. \qquad (17) \]


The inner angle bracket averages over internal noise with the reader and case sample fixed. The outer angle bracket is then the average of this quantity over readers and case samples.
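The nested averaging just described can be illustrated numerically. The sketch below is a toy Monte Carlo estimate, not part of the derivation: the scalar case summary, the Gaussian laws, the choice f(t) = t, and the name `nested_mean` are all assumptions made for the example. For this toy model the exact answer is the product of the mean reader parameter and the mean case value, which is 1.

```python
import random

def nested_mean(n_outer=2000, n_inner=20, seed=0):
    """Estimate an outer average (readers, cases) of an inner average (noise)."""
    rng = random.Random(seed)
    outer_total = 0.0
    for _ in range(n_outer):
        # Outer draw: one reader parameter and one case, drawn independently.
        gamma = rng.gauss(1.0, 0.3)
        g = rng.gauss(1.0, 1.0)
        # Inner average: internal noise only, with reader and case held fixed.
        inner = sum(gamma * g + rng.gauss(0.0, 0.5)
                    for _ in range(n_inner)) / n_inner
        outer_total += inner
    return outer_total / n_outer

estimate = nested_mean()
print(f"nested Monte Carlo estimate: {estimate:.3f} (exact value for this toy model: 1)")
```

The inner loop never changes `gamma` or `g`, which is what conditioning on the reader and the case means operationally.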

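For the expectation of the square of Â we have a double sum, which we decompose into a single sum where the indices match, and a double sum where the indices do not match (see Appendix). The end result is

\[ \langle \hat{A}^2 \rangle = \frac{1}{N_R} \langle \langle \hat{a}^2(t) \rangle_{t|\gamma,G} \rangle_{\gamma,G} + \frac{N_R - 1}{N_R} \langle [ \langle \langle \hat{a}(t) \rangle_{t|\gamma,G} \rangle_{\gamma} ]^2 \rangle_{G}. \qquad (18) \]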


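Putting the results we have so far together we get an expression for the variance of Â in terms of moments of â (t):

\[ \mathrm{Var}[\hat{A}] = \left\{ \langle [ \langle \langle \hat{a}(t) \rangle_{t|\gamma,G} \rangle_{\gamma} ]^2 \rangle_{G} - \langle \langle \hat{a}(t) \rangle_{t|\gamma,G} \rangle_{\gamma,G}^2 \right\} + \frac{1}{N_R} \left\{ \langle \langle \hat{a}^2(t) \rangle_{t|\gamma,G} \rangle_{\gamma,G} - \langle [ \langle \langle \hat{a}(t) \rangle_{t|\gamma,G} \rangle_{\gamma} ]^2 \rangle_{G} \right\}. \qquad (19) \]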


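The three moments we need to calculate in order to proceed further are

\[ \langle \langle \hat{a}(t) \rangle_{t|\gamma,G} \rangle_{\gamma,G}, \qquad (20) \]
\[ \langle [ \langle \langle \hat{a}(t) \rangle_{t|\gamma,G} \rangle_{\gamma} ]^2 \rangle_{G}, \qquad (21) \]
\[ \langle \langle \hat{a}^2(t) \rangle_{t|\gamma,G} \rangle_{\gamma,G}. \qquad (22) \]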


Equation 19 is an exact expression of the variance of the overall figure of merit in terms of expectations of the single-reader figure of merit. In order to compute these moments we need to specify our single-reader figure of merit â (t). In the next section we will compute these three moments when â (t) is the Wilcoxon statistic.

2.6 The Wilcoxon statistic

Suppose reader γ produces test statistics t given by,




The Wilcoxon statistic â (t) as a function of t is given by

\[ \hat{a}(t) = \frac{1}{N_0 N_1} \sum_{i=1}^{N_0} \sum_{j=1}^{N_1} s(t_{1j} - t_{0i}). \]


In this equation s (t) is the step function, although that fact will not play a role in most of the calculations.
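As a concrete illustration, the single-reader Wilcoxon statistic and its average over readers can be computed as follows. This is a minimal sketch: the function names are chosen for the example, and the step function is taken to be 1 for a strictly positive argument and 0 otherwise, which is an assumption about tie handling (ties have probability zero for continuous test statistics, and a 0/1-valued step keeps s²(t) = s(t)).

```python
def step(x):
    """Step function s(x): 1 if x > 0, else 0 (tie convention assumed)."""
    return 1.0 if x > 0 else 0.0

def wilcoxon(t0, t1):
    """Single-reader Wilcoxon statistic from signal-absent scores t0
    and signal-present scores t1."""
    total = sum(step(b - a) for a in t0 for b in t1)
    return total / (len(t0) * len(t1))

def figure_of_merit(T0, T1):
    """Average of the single-reader statistics over readers.

    T0 and T1 hold one row of scores per reader.
    """
    per_reader = [wilcoxon(r0, r1) for r0, r1 in zip(T0, T1)]
    return sum(per_reader) / len(per_reader)

# One reader, two signal-absent and two signal-present cases:
# three of the four (absent, present) pairs are correctly ordered.
print(wilcoxon([0.1, 0.4], [0.3, 0.8]))  # 0.75
```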

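We will also use one more statistical assumption

\[ \mathrm{pr}(t|\gamma, G) = \prod_{i=1}^{N_0} \mathrm{pr}(t_{0i}|\gamma, g_{0i}) \prod_{j=1}^{N_1} \mathrm{pr}(t_{1j}|\gamma, g_{1j}). \]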


This equation tells us that, conditional on the reader and cases, the components of t are independent. It also tells us that the conditional distribution for the internal noise on an individual test statistic only depends on the reader parameter vector and the corresponding case. If, for example, the internal noise is Gaussian, then the mean and variance of the test statistic for a given reader will depend only on the case at hand and the reader parameter vector.


3.1 The seven-term expansion

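We will show that the statistical assumptions provided above imply that the variance of the Wilcoxon statistic can be expanded as

\[ \mathrm{Var}[\hat{A}] = \frac{\alpha_1}{N_0} + \frac{\alpha_2}{N_1} + \frac{\alpha_3}{N_0 N_1} + \frac{\alpha_4}{N_R} + \frac{\alpha_5}{N_R N_0} + \frac{\alpha_6}{N_R N_1} + \frac{\alpha_7}{N_R N_0 N_1}. \qquad (28) \]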


We will call this the seven-term expansion for the variance of Â and find explicit expressions for the coefficients αn. These expressions will, in turn, lead to constraints on these coefficients. For any given set of values for NR, N0 and N1, these constraints can be used to provide an upper bound for Var [Â].

The three moments shown in Eqns. 20–22 are all that we need to derive Eqn. 28.
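Once the coefficients are known, evaluating the seven-term expansion for different study sizes is a one-line computation. The sketch below assumes the pairing of coefficients with sample sizes described in the text (α1 to α3 with cases only, α4 to α7 with an additional factor of 1/NR); the function name and the numerical coefficient values are hypothetical and chosen only for illustration.

```python
def seven_term_variance(alpha, n_readers, n0, n1):
    """Evaluate the seven-term expansion of Var[A-hat].

    alpha is a sequence (alpha_1, ..., alpha_7).
    """
    a1, a2, a3, a4, a5, a6, a7 = alpha
    # Terms that remain even with infinitely many readers (case variability).
    case_part = a1 / n0 + a2 / n1 + a3 / (n0 * n1)
    # Terms that shrink as more readers are added.
    reader_part = (a4 + a5 / n0 + a6 / n1 + a7 / (n0 * n1)) / n_readers
    return case_part + reader_part

# Hypothetical coefficient values, for illustration only.
alpha = (0.02, 0.02, 0.01, 0.005, 0.01, 0.01, 0.05)
for n_readers in (1, 5, 10):
    print(n_readers, seven_term_variance(alpha, n_readers, n0=50, n1=50))
```

Separating the two groups of terms makes explicit that adding readers cannot reduce the variance below the case contribution.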

3.1.1 Moment #1

For the first moment (Eqn. 20) we have


The last equality introduces s(γ, g0i, g1j), the average performance of reader γ on cases g0i and g1j. The average is over the internal noise for this reader. Since the cases are independent and identically distributed random vectors, and are independent from the readers, we have the first of the three moments


The penultimate equality here introduces \bar{\bar{s}}(g0, g1), which is s(γ, g0, g1) averaged over readers γ. Finally, it is notationally convenient to define μ as the overall mean of â (t). This will facilitate the comparison with the more standard approach to the MRMC problem below.

3.1.2 Moment #2

For the second of the three moments we average over cases after squaring. This gives


This sum involves averaging over observers before multiplying and averaging over cases. By separating the sum into the cases where both indices match, one index matches, and no indices match we get four terms


We are now in a position to compute the first part of the overall variance (Eqn. 19), which is the variance of the noise-and-reader-averaged figure of merit with respect to the case randomness. The result is three terms


with the coefficients given by





These equations are very similar to those in Hoeffding [6] and Lehmann [10]. By using independence of cases, we may simplify these expressions. The results are,


In the α1 expression the quantity inside the square brackets is a random variable since g1 has been averaged over but g0 has not. The coefficient α1 is then the variance of this random variable. Similar remarks apply to α2.

3.1.3 Moment #3

For the third moment in our list we square before doing any averaging. This leads to a fourfold sum


As before we can break this down into four sums depending on which indices match, and use our independence assumptions to reduce this expectation to four terms:


The last term may require some explanation, which is provided in the Appendix. If we use the fact that s²(t) = s(t), then the first term reduces to


We are now ready to compute the second part of the overall variance (Eqn. 19). Combining the expressions we just derived with earlier ones we have








The first two terms in the expression for α5 are the average of a conditional variance of a random variable. A similar simplification is possible for α6 and α7. The end results are alternate expressions for these coefficients (See Appendix),


The quantity in the outer angle brackets in Eqn. 50 is the variance of the step function averaged over internal noise and cases for the signal-present class. The random variables involved in computing this variance are the internal noise for a signal-absent case and readers. This variance is then averaged over signal-absent cases. A similar description can be applied to the bracketed terms in α6 and α7.

3.2 Bounds

To gain more insight into the significance of α1, α2, and α3, we expand \bar{\bar{s}}(g0, g1) as




Thus, s0 (g0) is s(t1 − t0) averaged over internal noise, readers and signal-present cases when the signal-absent case is g0. Similarly, s1 (g1) is s(t1 − t0) averaged over internal noise, readers and signal-absent cases when the signal-present case is g1. The random variable ε (g0, g1) is defined by Eqn. 53. It is straightforward to verify that the following expectations and conditional expectations vanish


These equations, combined with the fact that g0 and g1 are independent, imply that s0 (g0), s1 (g1) and ε (g0, g1) are uncorrelated random variables. This then gives us the expansion


From this expansion, Eqns. 38–40, and the definitions above we can identify the coefficients α1, α2 and α3.


A random variable that is constrained to be between 0 and 1 has a maximum variance of 1/4. This fact and Eqns. 60–63 above lead to the constraints


These constraints define a bounded region in the space of points (α1, α2, α3) and thus allow us to compute, for any given values of N0 and N1, the maximum possible contribution to the variance of Â from the first three terms in the seven-term expansion.


This bound represents a worst case scenario. In practice we could expect this sum to be significantly smaller than the upper bound.
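The fact underlying these constraints, that a random variable confined to the interval from 0 to 1 has variance at most 1/4, can be checked numerically. A variable with mean m on that interval has variance no larger than m(1 − m), which is the variance of a 0/1-valued variable with the same mean and is largest at m = 1/2. The sketch below is only an illustration; the names and the particular test distribution are chosen for the example.

```python
import random
import statistics

def bernoulli_variance(p):
    """Variance of a 0/1-valued random variable with mean p."""
    return p * (1.0 - p)

# The extreme case: all probability mass at the endpoints 0 and 1.
largest = max(bernoulli_variance(k / 100) for k in range(101))
print(largest)  # 0.25, attained at p = 0.5

# Any other distribution on [0, 1] has smaller variance; here an arbitrary
# skewed example (the cube of a uniform variate).
rng = random.Random(0)
sample = [rng.random() ** 3 for _ in range(1000)]
sample_variance = statistics.pvariance(sample)
print(sample_variance <= 0.25)  # True
```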

Equations 49–52 lead to the following constraints


These constraints define a bounded region in the space of points (α4, α5, α6, α7). This allows us to compute, for any given NR, N0 and N1, the maximum contribution of the last four terms in the seven-term expansion to the variance of Â, i.e.,


Again we could expect this sum to be significantly smaller in practice. However, we can now write an upper bound for the variance of Â,


This could be useful in simulations where the numbers of cases and readers are easy to change and the computations of the αn would be tedious.

3.3 The Moments Needed

To compute the αn in the full expansion for the variance of Â (Eqn. 28), we need four moments at the reader-averaged level





one at the case-averaged level,


and two at the test statistic level


The αn are then linear combinations of these moments.

3.4 Relationship to the Conventional Linear Model

We now wish to see how the expansion given above for the variance of Â (T) compares to the more standard approach to MRMC that uses an expansion into uncorrelated components [1,2]. For this purpose we set

\[ \hat{A} = \mu + r + c + rc + \varepsilon \]


and define each term in this expansion in terms of averages. The first term μ is the overall mean


The second term is the reader term


This random variable is a function of the reader sample Γ. The third term is the case term


This random variable is a function of the case sample G. Since Γ and G are statistically independent, the random variables r and c are also statistically independent. The fourth term is the reader/case term


This random variable is a function of Γ and G. The last term is the only one that depends on the internal noise of the readers via the matrix of test statistics T


We will call this the noise term. It is straightforward to show that


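These equations, together with the independence of r and c, can then be used to show that r, c, rc and ε are statistically uncorrelated. This fact gives us the following expansion for the variance of the figure of merit

\[ \mathrm{Var}[\hat{A}] = \mathrm{Var}[r] + \mathrm{Var}[c] + \mathrm{Var}[rc] + \mathrm{Var}[\varepsilon]. \]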


We will now examine each term in this expansion

3.4.1 Variance of the reader term

The reader term may be written as follows


For the second moment, which is also the variance, of this random variable we have, via the now familiar manipulations of the square of a sum,


The first equality follows from the independence of the readers, the second from the definition of μ, and the third from the definition of s(γ, g0, g1). The end result is that

\[ \mathrm{Var}[r] = \frac{\alpha_4}{N_R}. \]


Thus the variance of the reader term can be identified with the fourth term in the seven-term expansion for Var [Â].

3.4.2 Variance of the case term

For the case term we can write


The variance is given by

\[ \mathrm{Var}[c] = \frac{\alpha_1}{N_0} + \frac{\alpha_2}{N_1} + \frac{\alpha_3}{N_0 N_1}. \]


In other words, the first three terms in the seven-term expansion for Var [Â] comprise the variance of the case term.

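It should be noted that if N1 = φNtotal and N0 = (1 − φ)Ntotal, where φ is the prevalence, then the variance of the case term is given by,

\[ \mathrm{Var}[c] = \frac{1}{N_{total}} \left[ \frac{\alpha_1}{1 - \varphi} + \frac{\alpha_2}{\varphi} \right] + \frac{\alpha_3}{\varphi (1 - \varphi) N_{total}^2}. \qquad (103) \]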


The first term in Eqn. 103 agrees with standard MRMC models [2]. The second term can contribute substantially when Ntotal is small and will become negligible for Ntotal sufficiently large.
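The relative size of the two contributions can be explored numerically. The sketch below substitutes N1 = φNtotal and N0 = (1 − φ)Ntotal into the case-term variance α1/N0 + α2/N1 + α3/(N0N1) and returns the 1/Ntotal part and the 1/Ntotal² part separately. The function name and the coefficient values are hypothetical and chosen only for illustration.

```python
def case_term_variance(a1, a2, a3, prevalence, n_total):
    """Split the case-term variance into its 1/N and 1/N^2 parts.

    Uses N1 = prevalence * n_total and N0 = (1 - prevalence) * n_total.
    """
    first = (a1 / (1.0 - prevalence) + a2 / prevalence) / n_total
    second = a3 / (prevalence * (1.0 - prevalence) * n_total ** 2)
    return first, second

# Hypothetical coefficients; the second part fades as n_total grows.
for n_total in (10, 100, 1000):
    first, second = case_term_variance(0.01, 0.01, 0.04, 0.5, n_total)
    print(n_total, first, second, second / first)
```

With these illustrative values the ratio of the second part to the first falls in proportion to 1/Ntotal, which is the behavior described above.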

3.4.3 Variance of the reader/case term

The reader/case term can be written as


For the variance we use the fact that r, c and rc are uncorrelated and have zero mean values to get


This equation then gives us


A new moment appears here that does not appear in the computation of the seven-term expansion for Var [Â], i.e., the first term in the square brackets. This moment is discussed further in the Appendix.

3.4.4 Variance of the noise term

The noise term is explicitly given by


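By rearranging the variance expansion for Â we have

\[ \mathrm{Var}[\varepsilon] = \mathrm{Var}[\hat{A}] - \mathrm{Var}[r] - \mathrm{Var}[c] - \mathrm{Var}[rc]. \]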


This then gives us


Note that it is rc + ε that accounts for the last three terms in the seven-term expansion. It appears that the separation of rc+ε into rc and ε is not a very useful concept at this point. Moments appear in the individual variances of rc and ε that cancel out, and therefore do not appear in the expressions for the αn. It would therefore be somewhat wasteful to compute their variances separately. This situation changes when we consider replication.

3.5 Replication

Now we replicate the trial K times, with the same cases and readers, and assume that the internal reader noise is independent and identically distributed from one trial to the next (the readers are not learning anything). Then we have an average figure of merit for the K trials


The mean value of ÂK is given by


For the variance we need the second moment, which can be expanded as


This expansion follows from the usual independence arguments. We can now write for the variance


The new moment we need is


This expansion follows from the conditional independence of the internal noise and the independence of the readers. Now we may write


The moments involved here have all been worked out above or in the Appendix. The result is a ten-term expansion which we will describe below.

We may also expand into uncorrelated components as before


The second line here follows from the conditional independence between trials. Now we have


where the first three variances are given above, and the last variance is given by


The dependencies on numbers of cases, readers and trials are given by


Explicit expressions for the βn and the δn can be written out using equations already provided. The main point is that rc and εK are now distinguishable in terms of their variances. Of course, replicating an MRMC study in this way is probably not practical, except in simulation.
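Since replication is practical mainly in simulation, a small simulated example may help. The sketch below holds the cases and readers fixed and re-draws only the internal noise, so the spread it measures is the internal-noise variability alone; with independent, identically distributed noise from trial to trial, averaging K re-readings should reduce that spread by about a factor of K. The toy scoring model, the fixed case and reader values, and all names are assumptions made for the example.

```python
import random
import statistics

def wilcoxon(t0, t1):
    """Fraction of (absent, present) pairs that are correctly ordered."""
    wins = sum(1 for a in t0 for b in t1 if b > a)
    return wins / (len(t0) * len(t1))

def replicated_fom(g0, g1, gammas, k_trials, noise_sd, rng):
    """Reader-averaged Wilcoxon statistic, averaged over k_trials re-readings
    of the same cases by the same readers."""
    trial_values = []
    for _ in range(k_trials):
        per_reader = []
        for gamma in gammas:
            # Fresh internal noise on every reading; cases and reader fixed.
            t0 = [gamma * g + rng.gauss(0.0, noise_sd) for g in g0]
            t1 = [gamma * g + rng.gauss(0.0, noise_sd) for g in g1]
            per_reader.append(wilcoxon(t0, t1))
        trial_values.append(sum(per_reader) / len(per_reader))
    return sum(trial_values) / k_trials

def noise_variance(k_trials, n_repeats=300, seed=1):
    """Variance over internal noise only, for fixed (hypothetical) cases
    and readers."""
    rng = random.Random(seed)
    g0 = [-0.5, 0.0, 0.3, 0.8, -1.0]   # fixed signal-absent cases
    g1 = [0.5, 1.0, 1.4, 0.2, 2.0]     # fixed signal-present cases
    gammas = [0.8, 1.0, 1.2]           # fixed readers
    values = [replicated_fom(g0, g1, gammas, k_trials, 1.0, rng)
              for _ in range(n_repeats)]
    return statistics.pvariance(values)

var_1 = noise_variance(1)
var_8 = noise_variance(8)
print(var_1, var_8, var_8 / var_1)  # the ratio should be near 1/8
```

Because the readers and cases are never re-drawn here, the reader, case, and reader/case contributions do not appear in this spread at all, which is why replication separates the noise term from the reader/case term.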


We have developed a probabilistic framework for analyzing MRMC problems. We have applied this framework to the Wilcoxon statistic and derived an exact seven-term expansion for the variance of the figure of merit as a function of the numbers of readers and cases. We have used the probabilistic model to derive constraints on the coefficients in this expansion. These constraints, in turn, provide an upper bound on the variance of the Wilcoxon statistic. We introduced a linear decomposition of the figure of merit into uncorrelated random variables that are defined in terms of conditional expectations over the readers, cases, and test statistics. This linear decomposition has the same structure as the conventional MRMC decomposition. We have shown that the variances of the individual terms in the linear decomposition can be related to the terms in the seven-term expansion. Finally, we have shown that replication of the MRMC experiment results in a ten-term expansion.

In the future, we plan to validate this seven-term expansion of the variance of the Wilcoxon statistic in simulation. We will also apply this methodology to real data. We are especially interested in computing the variance of the Wilcoxon statistic for ideal, Bayesian observers which we calculate using Markov chain Monte Carlo techniques. Finally, we are working on the extension of the probabilistic model to account for multiple modalities as well as multiple readers and multiple cases.


We thank Drs. Charles Metz, Brandon Gallas and Robert Wagner for their many helpful discussions about this topic. This work was supported by NIH/NCI grant K01 CA87017 and by NIH/NIBIB grants R01 EB002146, R37 EB000803, P41 EB002035.


Derivation of Equation 18

What follows is a derivation of Eqn. 18.


The second equality follows from the independence of the test statistics when the readers and cases are fixed. The third equality follows from the independence of the reader parameters, and the fact that they are identically distributed.

Explanation of Equation 42

We start with the sum over all four indices with no matched indices in Eqn. 41,


The first equality follows from independence of the internal noise when readers and cases are fixed. The fourth equality follows from independence of cases.

Explanation of Equation 50

The first step to derive Eqn. 50 is to rewrite the first term in Eqn. 46 as




where the first equality follows from conditional independence of the internal noise and the second equality from independence of the cases. The second step is to rewrite the second term in Eqn. 46 as



where again independence of cases is used. Now we use the fact that


to get the result in Eqn. 50.

The New Moment in the Variance of the Reader/Case Term

The first moment in Eqn. 106 can be expanded as


Note that Var [rc] has no term that varies as N_R^{-1}. This variance will only have terms that vary as (N_R N_0)^{-1}, (N_R N_1)^{-1} and (N_R N_0 N_1)^{-1}.




1. Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis: Generalization to the population of readers and patients with the jackknife method. Investigative Radiology. 1992;27:723–731.
2. Beiden SV, Wagner RF, Campbell G. Components-of-variance models and multiple-bootstrap experiments: An alternative method for random-effects, receiver operating characteristic analysis. Academic Radiology. 2000;7:342–349.
3. Roe CA, Metz CE. Variance-component modeling in the analysis of receiver operating characteristic index estimates. Academic Radiology. 1997;4(8):587–600.
4. Wilcoxon F. Individual comparison of ranking methods. Biometrics. 1945;1:80–93.
5. Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics. 1947;18:50–60.
6. Hoeffding W. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics. 1948;19:293–325.
7. Bamber D. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology. 1975;12:387–415.
8. Noether GE. Elements of Nonparametric Statistics. New York: Wiley; 1967.
9. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics. 1988;44:837–845.
10. Lehmann EL. Consistency and unbiasedness of certain nonparametric tests. Annals of Mathematical Statistics. 1951;22:165–179.
11. Barrett HH, Kupinski MA, Clarkson E. Probabilistic foundations of the MRMC method. In: Medical Imaging 2005: Image Perception, Observer Performance, and Technology Assessment. SPIE; 2005. pp. 21–31.
12. Gallas BD. One-shot estimate of MRMC variance: AUC. Academic Radiology. 2006;13:353–362.