Current approaches to ROC analysis use the MRMC (multiple-reader, multiple-case) paradigm, in which several readers read each case and their ratings (or scores) are used to construct an estimate of the area under the ROC curve or some other ROC-related parameter. Standard practice is to decompose the parameter of interest according to a linear model into terms that depend in various ways on the readers, cases and modalities. Though the methodological aspects of MRMC analysis have been studied in detail, the literature on the probabilistic basis of the individual terms is sparse. In particular, few papers state what probability law applies to each term and what underlying assumptions are needed for the assumed independence. When probability distributions are specified for these terms, they are assumed to be Gaussian.
This paper approaches the MRMC problem from a mechanistic perspective. For a single modality, three sources of randomness are included: the images, the reader skill and the reader uncertainty. The probability law on the reader scores is written in terms of three nested conditional probabilities, and random variables associated with this probability are referred to as triply stochastic.
In this paper, we present the probabilistic MRMC model and apply this model to the Wilcoxon statistic. The result is a seven-term expansion for the variance of the figure of merit. We relate the terms in this expansion to those in the standard, linear MRMC model. Finally, we use the probabilistic model to derive constraints on the coefficients in the seven-term expansion.
The multiple-reader, multiple-case paradigm is often used to assess the performance of a new medical-imaging system or to compare the performances of two or more such systems. In this paradigm, we first select a random sample of abnormal and normal cases. Each of these cases is individually read by each member in a sample of readers. Each reader produces a test statistic for each image which measures his or her confidence that an abnormality is present. This array of test statistics is used to generate a figure of merit. An important issue is the variance of this figure of merit as a function of the number of readers and cases. This is the issue addressed by standard, linear MRMC models [1–3] and by the probabilistic model presented here.
The linear model presupposes that the figure of merit can be decomposed as a sum of statistically uncorrelated terms. For a single modality there are 5 terms. The first term is the mean value of the figure of merit and is a constant. The remaining 4 terms, the reader term, the case term, the reader-case term, and the internal noise, are random variables. The reader term is a function of the reader sample only. The case term is a function of the case sample only. The reader-case term is a function of both samples. Finally, the internal noise term accounts for all other sources of variability not accounted for in the previous 3 terms.
The conventional assumption for the linear model is that the random terms in the linear decomposition are mutually independent and normally distributed. As with any model-based decomposition, this assumption cannot be verified directly. In particular, a normality assumption cannot be valid if the figure of merit is the area under the ROC curve, since this quantity must be between 0 and 1.
In this paper, we present a probabilistic formulation of the MRMC problem. We account for case variability, reader variability, and reader uncertainty. We then use the methods and concepts of doubly- and triply-stochastic variables to directly derive an exact seven-term decomposition of the variance of the Wilcoxon statistic [4, 5] as a function of the numbers of readers and cases. Our results extend those of others who have studied the statistical properties of the Wilcoxon or Mann-Whitney statistics [6–10]. This paper expands upon results first presented in earlier work. The probabilistic model introduced there has already been used by B. Gallas to develop a “one-shot” estimate of the components of variance for the Wilcoxon statistic with multiple readers and multiple cases. Here, we provide details of the theoretical foundations and subsequent derivations for the components of variance for the Wilcoxon statistic. We also derive constraints on the MRMC coefficients that result from the theoretical model.
In the probabilistic development, there is no need to define intermediate and unobservable random variables. The probabilistic assumptions that go into our model are derived from the physics and intuition of the problem, as opposed to the independence assumptions used for the conventional linear model to make the problem tractable. The probabilistic approach also allows us to derive constraints on the coefficients in the seven-term expansion of the variance which cannot be derived from a linear model. Indeed, the normality assumption used in the conventional linear model is inconsistent with the statistical properties used to derive these constraints.
Nevertheless, we show that we may rigorously define a decomposition of the figure of merit in terms of uncorrelated, but not necessarily independent or normal, random variables that correspond to the terms in the standard linear model. The variances of these random variables can be identified with terms, or combinations of terms, in the seven-term expansion. Finally, we show that the seven-term expansion turns into a ten-term expansion when replication of the entire study is considered.
MRMC methodology accounts for multiple readers each reading multiple cases. In general, we will assume that a reader analyzes an image (case) and produces a test statistic that signifies the reader’s confidence that the image is abnormal. We do not assume that a given reader will produce the same value for the test statistic on multiple readings of the same image. This is due to the internal noise or reader jitter inherent in the diagnostic process. Thus, the fundamental random quantities in the MRMC problem are the case sample, the reader sample, and the resulting array of test statistics.
The image matrix G (the cases) is composed of column vectors each representing an image. We subdivide this matrix into submatrices of signal-absent cases (i.e., normal cases), G0, and signal-present cases (i.e., abnormal cases), G1.
The matrix G0 is M × N0 and G1 is M × N1, where M is the number of pixels in an image, N0 is the number of signal-absent cases, and N1 is the number of signal-present images. The full image matrix G is M × N with N = N0 + N1. The submatrices are decomposed into the individual case vectors as follows:
The g0i and g1j are column vectors of image data. For digital imaging, these column vectors are finite dimensional, although this assumption is not required for our probabilistic development.
The reader parameters are also formed into column vectors γr, one for each of NR readers, and then collected into the reader matrix Γ:
This is a K × NR matrix, where NR is the number of readers, and K is the dimension of a reader parameter vector. The reader vectors γr may be mathematical constructs, such as a template for a linear model observer, or they may simply be strings of numbers that are used to identify readers in an observer study. In fact, the γr do not even have to be numerical vectors; they could, for example, be the names of the radiologists in an observer study.
A reader produces a test statistic for each image. For a given case and reader this test statistic is a random variable due to internal noise. The test statistics for all of the readers and cases are collected into a matrix T. This matrix is subdivided into submatrices corresponding to signal-absent cases, T0, and signal-present cases, T1:
This is an NR × N matrix of the noisy values of the reader test statistics. Rows of the two submatrices correspond to individual readers and give a reader’s test statistics for all of the signal-absent images and all of the signal-present images, respectively:
We can also concatenate these row vectors to make a vector of all test statistics for a given reader:
We make some statistical assumptions at this point. The cases are assumed to be drawn independently from signal-absent and signal-present distributions. The reader parameter vectors are assumed to be drawn independently from a distribution of such vectors. The readers are also assumed to be independent of the cases. Finally, the joint conditional density for the noisy test statistics is a product of conditional densities for the individual reader test statistics. Furthermore, this latter distribution depends only on the given reader and the cases. These assumptions can be summarized as follows:
The fact that the readers are independent from the cases does not imply that there is no reader-case interaction. In fact, the reader-case interaction is embodied in the distribution prt(tr|γr, G) which we discuss in more detail below. This independence assumption simply implies that the selection of the reader sample is not dependent on the selection of the case sample.
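The independence assumptions stated above can be written compactly as factorizations of the joint densities. The display below is a reconstruction from the prose rather than a copy of the original equations. The density names pr_0, pr_1 and pr_γ are labels assumed here; pr_t(t_r|γ_r, G) is the notation used in the text.

```latex
\begin{align*}
\mathrm{pr}(\mathbf{G}) &= \prod_{i=1}^{N_0} \mathrm{pr}_0(\mathbf{g}_{0i})
                           \prod_{j=1}^{N_1} \mathrm{pr}_1(\mathbf{g}_{1j}), \\
\mathrm{pr}(\boldsymbol{\Gamma}) &= \prod_{r=1}^{N_R} \mathrm{pr}_\gamma(\boldsymbol{\gamma}_r), \\
\mathrm{pr}(\boldsymbol{\Gamma}, \mathbf{G}) &= \mathrm{pr}(\boldsymbol{\Gamma})\,\mathrm{pr}(\mathbf{G}), \\
\mathrm{pr}(\mathbf{T} \mid \boldsymbol{\Gamma}, \mathbf{G})
  &= \prod_{r=1}^{N_R} \mathrm{pr}_t(\mathbf{t}_r \mid \boldsymbol{\gamma}_r, \mathbf{G}).
\end{align*}
```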
If x is a random variable with conditional PDF prx (x|y, z), conditioned on the random variables y and z, then
stands for the conditional expectation of f(x) conditioned on y. In this expression we are averaging over the distribution of x given (z, y), and then averaging over the distribution of z given y. To perform this operation we need the conditional densities prx (x|y, z) and prz (z|y). The end result is a function of y. It appears that Eqn. 12 could be reduced to a single integral,
However, from an operational point of view, we do not know prx(x|y), whereas with the probabilistic assumptions above we can calculate or approximate the integrals in Eqn. 12. When z and y are independent, as will often be the case, Eqn. 12 reduces to
Note that Eqn. 12 includes the case where x is a deterministic function x (y, z) of y and z, in which case
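For reference, the nested expectation described above and its special cases can be written out explicitly. This is a reconstruction from the surrounding prose under the stated definitions. The bracket subscripts are an assumed notation, and the first line corresponds to the expression the text calls Eqn. 12.

```latex
\begin{align*}
\big\langle \langle f(x) \rangle_{x|y,z} \big\rangle_{z|y}
  &= \int dz\, \mathrm{pr}_z(z|y) \int dx\, \mathrm{pr}_x(x|y,z)\, f(x)
  && \text{(nested form)} \\
  &= \int dx\, \mathrm{pr}_x(x|y)\, f(x)
  && \text{(single-integral form)} \\
\big\langle \langle f(x) \rangle_{x|y,z} \big\rangle_{z}
  &= \int dz\, \mathrm{pr}_z(z) \int dx\, \mathrm{pr}_x(x|y,z)\, f(x)
  && \text{($z$ independent of $y$)} \\
\big\langle f\big(x(y,z)\big) \big\rangle_{z|y}
  &= \int dz\, \mathrm{pr}_z(z|y)\, f\big(x(y,z)\big)
  && \text{(deterministic $x$)}
\end{align*}
```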
Initially we will assume that the figure of merit has the following form
where â (t) is some figure of merit for an individual reader. Later we will be more specific about this function.
As an example of the probabilistic method and the notation introduced above, we compute the mean and variance of the figure of merit shown in Eqn. 16. From the independence assumptions on the readers the mean of the figure of merit can be written as
The inner angle bracket averages over internal noise with the reader and case sample fixed. The outer angle bracket is then the average of this quantity over readers and case samples.
For the expectation of the square of Â we have a double sum, which we decompose into a single sum where the indices match, and a double sum where the indices do not match (see Appendix). The end result is
Putting the results we have so far together we get an expression for the variance of Â in terms of moments of â (t):
The three moments we need to calculate in order to proceed further are
Equation 19 is an exact expression of the variance of the overall figure of merit in terms of expectations of the single-reader figure of merit. In order to compute these moments we need to specify our single-reader figure of merit â (t). In the next section we will compute these three moments when â (t) is the Wilcoxon statistic.
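The variance in terms of the three moments can be sketched as follows. This display is re-derived here from the stated independence assumptions rather than copied from the original Eqn. 19, so the arrangement of terms may differ from the original. The bracket subscripts are an assumed notation: the inner bracket averages over internal noise and readers with the case sample G fixed.

```latex
\begin{align*}
\langle \hat{A}^2 \rangle
  &= \frac{1}{N_R}\,\big\langle \hat{a}^2(\mathbf{t}) \big\rangle
   + \frac{N_R - 1}{N_R}\,
     \Big\langle \big\langle \hat{a}(\mathbf{t}) \big\rangle_{\mathbf{t},\boldsymbol{\gamma}|\mathbf{G}}^{2} \Big\rangle_{\mathbf{G}}, \\
\mathrm{Var}[\hat{A}]
  &= \underbrace{\Big\langle \big\langle \hat{a}(\mathbf{t}) \big\rangle_{\mathbf{t},\boldsymbol{\gamma}|\mathbf{G}}^{2} \Big\rangle_{\mathbf{G}}
     - \big\langle \hat{a}(\mathbf{t}) \big\rangle^{2}}_{\text{case part}}
   + \underbrace{\frac{1}{N_R}\Big[ \big\langle \hat{a}^2(\mathbf{t}) \big\rangle
     - \Big\langle \big\langle \hat{a}(\mathbf{t}) \big\rangle_{\mathbf{t},\boldsymbol{\gamma}|\mathbf{G}}^{2} \Big\rangle_{\mathbf{G}} \Big]}_{\text{reader-dependent part}} .
\end{align*}
```

The cross terms with unmatched reader indices factor because, for a fixed case sample, distinct readers and their internal noise are independent and identically distributed.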
Suppose reader γ produces test statistics t given by,
The Wilcoxon statistic â (t) as a function of t is given by
In this equation s (t) is the step function, although that fact will not play a role in most of the calculations.
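As a concrete illustration, the single-reader Wilcoxon statistic and its average over readers can be sketched in Python. This is a minimal sketch and not code from the paper: the function names are hypothetical, and the step function is assumed to take the value 0 at ties, which keeps it binary so that its square equals itself.

```python
import numpy as np

def step(x):
    """Step function s(x): 1 where x > 0, else 0 (tie value 0 is an assumption)."""
    return (np.asarray(x) > 0).astype(float)

def wilcoxon(t0, t1):
    """Single-reader Wilcoxon statistic for signal-absent scores t0 and signal-present scores t1."""
    t0 = np.asarray(t0, dtype=float)
    t1 = np.asarray(t1, dtype=float)
    # diffs[i, j] = t1[j] - t0[i] for every (signal-absent, signal-present) pair
    diffs = t1[np.newaxis, :] - t0[:, np.newaxis]
    # Average of the step function over all N0 * N1 pairs
    return step(diffs).mean()

def figure_of_merit(T0, T1):
    """Average the single-reader statistics over readers (rows of T0 and T1)."""
    return float(np.mean([wilcoxon(r0, r1) for r0, r1 in zip(T0, T1)]))
```

For example, with signal-absent scores (0, 2) and signal-present scores (1, 3), three of the four pairs are ranked correctly, so `wilcoxon` returns 0.75.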
We will also use one more statistical assumption
This equation tells us that, conditional on the reader and cases, the components of t are independent. It also tells us that the conditional distribution for the internal noise on an individual test statistic only depends on the reader parameter vector and the corresponding case. If, for example, the internal noise is Gaussian, then the mean and variance of the test statistic for a given reader will depend only on the case at hand and the reader parameter vector.
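The sampling hierarchy implied by these assumptions can be sketched as a small simulation. Everything specific in this sketch is an assumption made for illustration: each case is reduced to a single scalar feature, each reader parameter is a scalar gain, the internal noise is Gaussian with a fixed standard deviation, and the names `simulate_scores`, `case_sep`, `reader_spread` and `jitter` are hypothetical.

```python
import numpy as np

def simulate_scores(n_readers, n0, n1, rng, case_sep=1.0, reader_spread=0.3, jitter=0.5):
    """Draw one MRMC data set: cases, then readers, then noisy test statistics."""
    # Cases: i.i.d. draws from the signal-absent and signal-present distributions
    # (assumed scalar Gaussian features here).
    g0 = rng.normal(0.0, 1.0, size=n0)
    g1 = rng.normal(case_sep, 1.0, size=n1)
    # Readers: i.i.d. parameter draws, made without reference to the cases.
    gamma = rng.normal(1.0, reader_spread, size=n_readers)
    # Test statistics: conditionally independent given reader and cases; the mean
    # of each entry depends only on that reader's parameter and that one case.
    T0 = gamma[:, None] * g0[None, :] + rng.normal(0.0, jitter, size=(n_readers, n0))
    T1 = gamma[:, None] * g1[None, :] + rng.normal(0.0, jitter, size=(n_readers, n1))
    return T0, T1

rng = np.random.default_rng(0)
T0, T1 = simulate_scores(n_readers=4, n0=10, n1=12, rng=rng)
```

The returned arrays have one row per reader and one column per case, matching the layout of the submatrices T0 and T1 described earlier.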
We will show that the statistical assumptions provided above imply that the variance of the Wilcoxon statistic can be expanded as
We will call this the seven-term expansion for the variance of Â and find explicit expressions for the coefficients αn. These expressions will, in turn, lead to constraints on these coefficients. For any given set of values for NR, N0 and N1, these constraints can be used to provide an upper bound for Var [Â].
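Once the coefficients are known, evaluating the expansion for different study sizes is a one-line computation. The sketch below assumes a particular ordering of the terms, inferred from later statements in the text (α1 to α3 form the case term, α4 is the reader term, and α5 to α7 scale as 1/(NR N0), 1/(NR N1) and 1/(NR N0 N1)); the function name is hypothetical.

```python
def seven_term_variance(alpha, n_readers, n0, n1):
    """Evaluate the seven-term variance expansion for given study sizes.

    alpha is a sequence (a1, ..., a7); the pairing of each coefficient with
    its dependence on the sample sizes is an assumption stated in the text above.
    """
    a1, a2, a3, a4, a5, a6, a7 = alpha
    # Terms that depend on the numbers of cases only
    case_part = a1 / n0 + a2 / n1 + a3 / (n0 * n1)
    # Term that depends on the number of readers only
    reader_part = a4 / n_readers
    # Terms that depend on both readers and cases
    mixed_part = (a5 / n0 + a6 / n1 + a7 / (n0 * n1)) / n_readers
    return case_part + reader_part + mixed_part
```

This form makes it cheap to tabulate the variance over a grid of reader and case counts when planning a study.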
For the first moment (Eqn. 20) we have
The last equality introduces s̄(γ, g0i, g1j), the average performance of reader γ on cases g0i and g1j. The average is over the internal noise for this reader. Since the cases are independent and identically distributed random vectors, and are independent of the readers, we have the first of the three moments
The penultimate equality here introduces s̿(g0, g1), which is s̄(γ, g0, g1) averaged over readers γ. Finally, it is notationally convenient to define μ as the overall mean of â (t). This will facilitate the comparison with the more standard approach to the MRMC problem below.
For the second of the three moments we average over cases after squaring. This gives
This sum involves averaging over observers before multiplying and averaging over cases. By separating the sum into the cases where both indices match, one index matches, and no indices match we get four terms
We are now in a position to compute the first part of the overall variance (Eqn. 19), which is the variance of the noise-and-reader-averaged figure of merit with respect to the case randomness. The result is three terms
with the coefficients given by
In the α1 expression the quantity inside the square brackets is a random variable since g1 has been averaged over but g0 has not. The coefficient α1 is then the variance of this random variable. Similar remarks apply to α2.
For the third moment in our list we square before doing any averaging. This leads to a fourfold sum
As before we can break this down into four sums depending on which indices match, and use our independence assumptions to reduce this expectation to four terms:
The last term may require some explanation which is provided in the Appendix. If we use the fact that s2 (t) = s (t), then the first term reduces to
We are now ready to compute the second part of the overall variance (Eqn. 19). Combining the expressions we just derived with earlier ones we have
The first two terms in the expression for α5 are the average of a conditional variance of a random variable. A similar simplification is possible for α6 and α7. The end results are alternate expressions for these coefficients (See Appendix),
The quantity in the outer angle brackets in Eqn. 50 is the variance of the step function averaged over internal noise and cases for the signal-present class. The random variables involved in computing this variance are the internal noise for a signal-absent case and readers. This variance is then averaged over signal-absent cases. A similar description can be applied to the bracketed term in α6 and α7.
To gain more insight into the significance of α1, α2, and α3, we expand s̿(g0, g1), the step function averaged over internal noise and readers, as
Thus, s0 (g0) is s(t1 − t0) averaged over internal noise, readers and signal-present cases when the signal-absent case is g0. Similarly, s1 (g1) is s(t1 − t0) averaged over internal noise, readers and signal-absent cases when the signal-present case is g1. The random variable ε (g0, g1) is defined by Eqn. 53. It is straightforward to verify that the following expectations and conditional expectations vanish
These equations, combined with the fact that g0 and g1 are independent, imply that s0 (g0), s1 (g1) and ε (g0, g1) are uncorrelated random variables. This then gives us the expansion
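The decomposition described in this passage can be sketched as follows. It is reconstructed from the verbal definitions above and is not the original display. The double-bar symbol for the step function averaged over internal noise and readers is an assumed notation, and the identification of the three variances with α1, α2 and α3 follows from the earlier description of α1 as the variance of a quantity in which g1 has been averaged over but g0 has not.

```latex
\begin{align*}
\bar{\bar{s}}(\mathbf{g}_0, \mathbf{g}_1)
  &= \mu + \big[s_0(\mathbf{g}_0) - \mu\big] + \big[s_1(\mathbf{g}_1) - \mu\big]
     + \varepsilon(\mathbf{g}_0, \mathbf{g}_1), \\
s_0(\mathbf{g}_0) &= \big\langle \bar{\bar{s}}(\mathbf{g}_0, \mathbf{g}_1) \big\rangle_{\mathbf{g}_1}, \qquad
s_1(\mathbf{g}_1) = \big\langle \bar{\bar{s}}(\mathbf{g}_0, \mathbf{g}_1) \big\rangle_{\mathbf{g}_0}, \\
\big\langle \varepsilon(\mathbf{g}_0, \mathbf{g}_1) \big\rangle_{\mathbf{g}_1}
  &= \big\langle \varepsilon(\mathbf{g}_0, \mathbf{g}_1) \big\rangle_{\mathbf{g}_0} = 0, \\
\mathrm{Var}\big[\bar{\bar{s}}\big]
  &= \mathrm{Var}[s_0] + \mathrm{Var}[s_1] + \mathrm{Var}[\varepsilon]
   = \alpha_1 + \alpha_2 + \alpha_3 .
\end{align*}
```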
These constraints define a bounded region in the space of points (α1, α2, α3) and thus allow us to compute, for any given values of N0 and N1, the maximum possible contribution to the variance of Â from the first three terms in the seven-term expansion.
This bound represents a worst case scenario. In practice we could expect this sum to be significantly smaller than the upper bound.
These constraints define a bounded region in the space of points (α4, α5, α6, α7). This allows us to compute, for any given NR, N0 and N1, the maximum contribution of the last four terms in the seven-term expansion to the variance of Â, i.e.,
Again we could expect this sum to be significantly smaller in practice. However, we can now write an upper bound for the variance of Â,
This could be useful in simulations where the numbers of cases and readers are easy to change and the computations of the αn would be tedious.
To compute the αn in the full expansion for the variance of Â (Eqn. 28), we need four moments at the reader-averaged level
one at the case-averaged level,
and two at the test statistic level
The αn are then linear combinations of these moments.
We now wish to see how the expansion given above for the variance of Â (T) compares to the more standard approach to MRMC that uses an expansion into uncorrelated components [1,2]. For this purpose we set
and define each term in this expansion in terms of averages. The first term μ is the overall mean
The second term is the reader term
This random variable is a function of the reader sample Γ. The third term is the case term
This random variable is a function of the case sample G. Since Γ and G are statistically independent, the random variables r and c are also statistically independent. The fourth term is the reader/case term
This random variable is a function of Γ and G. The last term is the only one that depends on the internal noise of the readers via the matrix of test statistics T
We will call this the noise term. It is straightforward to show that
These equations, together with the independence of r and c, can then be used to show that r, c, rc and ε are statistically uncorrelated. This fact gives us the following expansion for the variance of the figure of merit
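The five terms described above can be collected in one place. The display below is a reconstruction from the verbal definitions (each term is a conditional expectation of the figure of merit with lower-order terms removed) and should be read as a sketch; the bracket subscripts are an assumed notation.

```latex
\begin{align*}
\hat{A}(\mathbf{T}) &= \mu + r + c + rc + \varepsilon, \\
\mu &= \big\langle \hat{A}(\mathbf{T}) \big\rangle, \\
r(\boldsymbol{\Gamma}) &= \big\langle \hat{A}(\mathbf{T}) \big\rangle_{\mathbf{T},\mathbf{G}|\boldsymbol{\Gamma}} - \mu, \\
c(\mathbf{G}) &= \big\langle \hat{A}(\mathbf{T}) \big\rangle_{\mathbf{T},\boldsymbol{\Gamma}|\mathbf{G}} - \mu, \\
rc(\boldsymbol{\Gamma},\mathbf{G}) &= \big\langle \hat{A}(\mathbf{T}) \big\rangle_{\mathbf{T}|\boldsymbol{\Gamma},\mathbf{G}} - r - c - \mu, \\
\varepsilon(\mathbf{T}) &= \hat{A}(\mathbf{T}) - \big\langle \hat{A}(\mathbf{T}) \big\rangle_{\mathbf{T}|\boldsymbol{\Gamma},\mathbf{G}}, \\
\mathrm{Var}[\hat{A}] &= \mathrm{Var}[r] + \mathrm{Var}[c] + \mathrm{Var}[rc] + \mathrm{Var}[\varepsilon].
\end{align*}
```

With these definitions each random term has zero mean, and the conditional expectation of each term over any variable it depends on vanishes, which is what makes the terms uncorrelated without requiring them to be independent or normal.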
We will now examine each term in this expansion
The reader term may be written as follows
For the second moment, which is also the variance, of this random variable we have, via the now familiar manipulations of the square of a sum,
The first equality follows from the independence of the readers, the second from the definition of μ, and the third from the definition of s̄(γ, g0, g1), the step function averaged over the internal noise of reader γ. The end result is that
Thus the variance of the reader term can be identified with the fourth term in the seven-term expansion for Var [Â].
For the case term we can write
The variance is given by
In other words, the first three terms in the seven-term expansion for Var [Â] comprise the variance of the case term.
It should be noted that if N1 = pNtotal and N0 = (1 − p)Ntotal, where p is the prevalence, then the variance of the case term is given by
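Written out, the dependence on prevalence takes the following form. This is a sketch obtained by substituting the two case counts into the first three terms of the seven-term expansion. The symbol p for the prevalence (the fraction of signal-present cases) is assumed here, as is the pairing of α1 with N0 and α2 with N1.

```latex
\begin{align*}
\mathrm{Var}[c]
  &= \frac{\alpha_1}{N_0} + \frac{\alpha_2}{N_1} + \frac{\alpha_3}{N_0 N_1} \\
  &= \frac{1}{N_{\mathrm{total}}}\left[\frac{\alpha_1}{1-p} + \frac{\alpha_2}{p}\right]
   + \frac{\alpha_3}{p\,(1-p)\,N_{\mathrm{total}}^{2}} .
\end{align*}
```

For a fixed total number of cases, the leading term grows without bound as p approaches 0 or 1, so very unbalanced samples inflate the case variance.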
The reader/case term can be written as
For the variance we use the fact that r, c and rc are uncorrelated and have zero mean values to get
This equation then gives us
A new moment appears here that does not appear in the computation of the seven-term expansion for Var [Â], i.e., the first term in the square brackets. This moment is discussed further in the Appendix.
The noise term is explicitly given by
By rearranging the variance expansion for Â we have
This then gives us
Note that it is rc + ε that accounts for the last three terms in the seven-term expansion. It appears that the separation of rc+ε into rc and ε is not a very useful concept at this point. Moments appear in the individual variances of rc and ε that cancel out, and therefore do not appear in the expressions for the αn. It would therefore be somewhat wasteful to compute their variances separately. This situation changes when we consider replication.
Now we replicate the trial K times, with the same cases and readers, and assume that the internal reader noise is independent and identically distributed from one trial to the next (the readers are not learning anything). Then we have an average figure of merit for the K trials
The mean value of ÂK is given by
For the variance we need the second moment, which can be expanded as
This expansion follows from the usual independence arguments. We can now write for the variance
The new moment we need is
This expansion follows from the conditional independence of the internal noise and the independence of the readers. Now we may write
The moments involved here have all been worked out above or in the Appendix. The result is a ten-term expansion which we will describe below.
We may also expand into uncorrelated components as before
The second line here follows from the conditional independence between trials. Now we have
where the first three variances are given above, and the last variance is given by
The dependencies on numbers of cases, readers and trials are given by
Explicit expressions for the βn and the δn can be written out using equations already provided. The main point is that we can now see that rc and εK are now distinguishable in terms of their variances. Of course replicating an MRMC study in this way is probably not practical, except in simulation.
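The structure of the ten-term expansion can be sketched numerically. The sketch below assumes that the variances of rc and of the trial-averaged noise term each have three components with the same dependence on NR, N0 and N1, and that the noise components are divided by the number of trials K because the internal noise is independent and identically distributed across trials. The function name and argument layout are hypothetical.

```python
def ten_term_variance(alpha_case, alpha_reader, beta, delta,
                      n_readers, n0, n1, n_trials):
    """Evaluate a ten-term variance expansion for a study replicated n_trials times.

    alpha_case: three case-term coefficients; alpha_reader: the reader-term
    coefficient; beta: three coefficients for Var[rc]; delta: three coefficients
    for the noise term. The pairing with sample sizes is an assumption.
    """
    a1, a2, a3 = alpha_case
    b1, b2, b3 = beta
    d1, d2, d3 = delta
    var_c = a1 / n0 + a2 / n1 + a3 / (n0 * n1)                  # case term
    var_r = alpha_reader / n_readers                            # reader term
    var_rc = (b1 / n0 + b2 / n1 + b3 / (n0 * n1)) / n_readers   # reader/case term
    # Noise term: averaging over independent trials divides its variance by K
    var_eps = (d1 / n0 + d2 / n1 + d3 / (n0 * n1)) / (n_readers * n_trials)
    return var_c + var_r + var_rc + var_eps
```

With a single trial, the β and δ coefficients only enter through their sums, which is why rc and the noise term cannot be separated without replication.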
We have developed a probabilistic framework for analyzing MRMC problems. We have applied this framework to the Wilcoxon statistic and derived an exact seven-term expansion for the variance of the figure of merit as a function of the numbers of readers and cases. We have used the probabilistic model to derive constraints on the coefficients in this expansion. These constraints, in turn, provide an upper bound on the variance of the Wilcoxon statistic. We introduced a linear decomposition of the figure of merit into uncorrelated random variables that are defined in terms of conditional expectations over the readers, cases, and test statistics. This linear decomposition has the same structure as the conventional MRMC decomposition. We have shown that the variances of the individual terms in the linear decomposition can be related to the terms in the seven-term expansion. Finally, we have shown that replication of the MRMC experiment results in a ten-term expansion.
In the future, we plan to validate this seven-term expansion of the variance of the Wilcoxon statistic in simulation. We will also apply this methodology to real data. We are especially interested in computing the variance of the Wilcoxon statistic for ideal, Bayesian observers which we calculate using Markov chain Monte Carlo techniques. Finally, we are working on the extension of the probabilistic model to account for multiple modalities as well as multiple readers and multiple cases.
We thank Drs. Charles Metz, Brandon Gallas and Robert Wagner for their many helpful discussions about this topic. This work was supported by NIH/NCI grant K01 CA87017 and by NIH/NIBIB grants R01 EB002146, R37 EB000803, P41 EB002035.
What follows is a derivation of Eqn. 18.
The second equality follows from the independence of the test statistics when the readers and cases are fixed. The third equality follows from the independence of the reader parameters, and the fact that they are identically distributed.
We start with the sum over all four indices with no matched indices in Eqn. 41,
The first equality follows from independence of the internal noise when readers and cases are fixed. The fourth equality follows from independence of cases.
where the first equality follows from conditional independence of the internal noise and the second equality from independence of the cases. The second step is to rewrite the second term in Eqn. 46 as
where again independence of cases is used. Now we use the fact that
to get the result in Eqn. 50.
The first moment in Eqn. 106 can be expanded as
Note that Var [rc] has no term that varies as (NR)−1. This variance will only have terms that vary as (NRN0)−1, (NRN1)−1 and (NRN0N1)−1.