|Home | About | Journals | Submit | Contact Us | Français|
We have used an approximate method developed by Barrett, Wilson, and Tsui for finding the ensemble statistics of the Maximum Likelihood-Expectation Maximization algorithm to compute task-dependent figures of merit as a function of stopping point. For comparison, human-observer performance was assessed through conventional psychophysics.
The results of our studies show the dependence of the optimal stopping point of the algorithm on the detection task. Comparisons of human and various model observers show that a channelized Hotelling observer with overlapping channels is the best predictor of human performance.
Studies of model-observer performance can generally be placed into one of two categories: studies in which statistical properties of the imaging system are well specified and hence model-observer performance is reported using ensemble statistics (mathematical formulas), and studies in which the statistical properties are not well specified and hence model-observer performance is reported using sample statistics usually through Monte-Carlo studies. In studies where the samples (images) may be difficult to obtain either because of computational intensity or large-sample requirements, the ensemble approach is preferable if the ensemble statistics can be computed. The goal of this work is to use ensemble statistical approaches to analyze model-observer performance on images produced by the Maximum Likelihood-Expectation Maximization (ML-EM) algorithm. For comparison, we also report human performance measured by conventional psychophysics.
The ML-EM algorithm has received considerable attention as a method of image reconstruction and restoration ( – ). It has the theoretically attractive properties of converging to a maximum-likelihood solution, implicitly applying a positivity constraint, and enforcing better agreement with the data in subsequent iterations. However it is not clear that these mathematical considerations translate into improved performance in diagnostic tasks. In the past, it has been difficult to study the statistical properties of this algorithm because of its nonlinear nature. However, a recent article by Barrett, Wilson, and Tsui  derives an approximate method for finding the necessary noise properties of this algorithm. We use these ensemble noise properties to compute model observer signal-to-noise ratios for task-based assessment of image quality. The present paper extends the work of Barrett, Wilson, and Tsui by considering stochastically defined image backgrounds in which the distribution of intensity within the object is presumed to follow a prior probability law.
Since humans are the end user of most medical images it is important to understand the effects of different choices made during the image reconstruction process on the resulting diagnostic performance. Due to the proliferation of reconstruction algorithms and the possibly large number of free parameters within a given algorithm, conventional psychophysical studies become infeasible for a thorough investigation of optimal processing. To this end, model observers and, specifically, linear model observers have been proposed as a more efficient way to optimize medical imaging systems for diagnostic performance.
In this work we consider the effect of stopping point on detection performance. Terminating the iterative scheme well before convergence is a simple way to reduce the high level of noise usually found in the unconstrained maximum-likelihood estimate of the reconstructed image ( – ). As with most methods of regularization, the strength with which the regularizer is applied (in this case the number of iterations the algorithm is allowed to run) is left to the user.
Our conclusions are twofold. Our observer studies indicate that the optimal stopping point is highly dependent on the task being considered. Similar results have been reported for linear iterative algorithms  using the ensemble statistical approach and for other forms of regularization in nonlinear algorithms  using Monte-Carlo methods. These results reinforce the argument that measures of image quality must take the diagnostic task into account. We also report on the ability of a number of proposed model observers to predict the outcome of human psychophysical studies.
In signal detection theory, one adopts the view that an image is a multidimensional random variable coming from one of two possible probability distributions. The distribution describing the image depends on whether the signal is actually present or absent in the image. The task of the observer is to decide from which distribution a given image comes. In theory, the observer makes its decision by forming a scalar response to the input image and subjecting this response to threshold. It is the statistics of this response variable that determine the observer performance. Detection performance measures are generally thought of as measures of separation between the distribution of responses to signal-present images and signal-absent images.
In this work we utilize the approach defined in  for objective assessment of image quality. Here the performance metric is the observer signal-to-noise ratio (SNR), the expected difference in the means of the two response distributions divided by their average variance. The model observers used in this work are all linear functions of the image data and therefore the SNR can be computed directly from the first and second-order statistics of the images. The SNR of a linear observer is defined by the following formula,
where Δs is the vector difference in the expected signal-present and signal-absent images, K is the average covariance of the two classes of images, and w is a vector representing the linear observer.
The images analyzed in this work come from a simulated two-dimensional parallel-beam single photon emission computed tomography (SPECT) imaging system. The tomographic imaging system is modeled as a linear system
where f is a vector representing the object of interest, H is the system matrix, n is a vector representing the noise associated with the system due to the Poisson statistics of gamma-ray emission, and g is the measured data. The ML-EM algorithm approximates the object of interest by a vector k+1 through the following iterative scheme
Approximate formulas for the statistical properties of images produced by ML-EM algorithm have been worked out in . The basic approach is to linearize each step of the iterative algorithm. The statistical properties of the algorithm are computed from the resulting sequence of linear transformations. A related approach is used in  to investigate resolution properties of penalized-likelihood algorithms. The expected reconstruction at a given stopping point is approximated by the image produced when the ML-EM algorithm is run on noiseless projection data. This vector is denoted by ak+1. The image covariance is approximated by
where Kg is the data covariance and Uk+1 represents the effects linearizing (3). The interested reader is referred to  and  for expressions regarding the structure of Uk+1. Extensive Monte-Carlo studies of this approximate covariance were performed in  to test the validity of the first-order approximation required for (4). Generally good agreement was found in the range of 50,000 expected counts per data set corresponding to about 60 expected counts per detector in the center of the image. In this range, the error in approximation was less than 10% at mid-range stopping points (more than 10 iterations and less than 200 iterations). In the present work, a much larger system is used (128 × 128 pixel images reconstructed at 64 angles as opposed to 32 × 32 pixel images from 32 angles), but with 400,000 expected counts in the data set, there are still about 60 expected counts per detector.
The data covariance takes two forms in this work. For the background-known-exactly (BKE) study, the Poisson statistics of the data collection process are the only source of variation in the collected data. Hence the data covariance is given by
where and Kf are the object mean and covariance respectively. The key requirement which allows us to extend the approximate to lumpy background data is relatively small object covariance. The addition of object covariance adds a source of variance to each data point which is 16% of the size of the variance due to Poisson counting statistics.
Human observer performance was assessed through conventional psychophysics. Two-alternative forced-choice experiments were conducted to estimate the probability of a correct identification (pc) in image pairs sampled from the signal-present and signal-absent distributions. The pc estimates were transformed to d′ – an estimated human signal-to-noise ratio – by the formula 
where Φ−1 is the inverse cumulative normal transformation.
The task considered was detection of a centered Gaussian bump against a flat or lumpy background. The radius of the signal – measured as the standard deviation of the Gaussian – was 4.0 pixels. Images were reconstructed within a window of radius 64 pixels inscribed in 128 × 128 pixel images. The signal contrast was 19.8% in the BKE images and 29.7% in the lumpy-background images. These contrasts were determined from pilot psychophysical studies to give an acceptable range of d′ values for human performance . Parallel-beam projections were collected at 64 angles equally spaced within a semicircle around the object. Because our objective was analysis of the reconstruction algorithm, perfect collimation of the projection data was assumed. Poisson noise in the data was set at 400,000 expected counts per data set.
Performance was assessed for a number of different model observers for comparison to the psychophysical results. One notable omission from the following list of observers is that of the Hotelling or optimal linear observer ( and ). Computation of the template w for this observer involves computing the pseudo-inverse of Kk+1, a 16,384 × 16,384 element matrix. While direct inversion of this matrix can be avoided by an iterative search for w , the large number of evaluations involving products with Kk+1 make this approach difficult as well. In addition, the Hotelling observer is independent of stopping point for a large class of linear algorithms .
The model observers considered here were:
This observer has a template w whose elements are 0 for pixels outside the region of interest and 1 for pixels inside the region of interest. For our studies, the region was a disk of radius 5.67 pixels corresponding to the standard deviation of point the signal profile. This observer model is similar to that used by Hanson  to analyze the ART algorithm.
This observer is given by the expected profile of the signal after reconstruction. From the results of , the observer template is well approximated by computing noiseless signal-present and signal-absent reconstructions and taking their difference. This expected signal difference, which we shall denote Δsk+1 is also used for Δs in Equation (1) for evaluating all model observer SNRs. The NPW observer is known to be optimal in stationary Gaussian white noise .
This observer uses a bank of frequency-selective filters to reduce the image to a much smaller number of filter responses. Optimal linear discrimination is then performed on this reduced set of responses. The channel model can be thought of as a bank of image templates represented by the matrix T. The template in each column of T is one of the channel filters represented in the spatial domain33. The observer template associated with a channelized Hotelling observer is given by 
Two channelized Hotelling observers are used here, each being defined by the radial frequency profile of its image templates. The first uses has four square non-overlapping channels (SQR). The second uses three channels with overlapping difference-of-Gaussian (DOG) profiles. Plots of the frequency profiles are seen in Figure 1. Channel models have been analyzed in (, , and ) as predictors of human performance.
Performance of the various model observers as a function of stopping point is given in Figure 2. The different observer models show marked differences in performance. Some observers – most noticeably the SQR observer – show improved performance in the lumpy background experiments. This is solely due to the increased contrast in these experiments. In lumpy backgrounds, all observers exhibited degraded performance relative to fixed backgrounds when the signals were of equal contrast.
Average human-observer performance results are plotted in Figure 3. Six observers participated in the studies, which consisted of 200 image pairs for each stopping point in both the fixed and lumpy backgrounds. All observers had participated in previous studies with similar images and were trained on an independent sample of 50 image pairs before taking each test. The experiments were randomized both in the order in which the image pairs were shown within a given test and in the order of the tests taken by each observer. The images were scaled for constant contrast across stopping point. In the fixed-background studies, the expected signal contrast was held constant at 7 grey levels of 256 in the display. In the lumpy background studies, the signal contrast was 10 grey levels.
The plots show a strong dependence on task in human performance. The fixed-background studies show a fairly consistent decrease in performance with subsequent iterations. The lumpy background studies indicate peaked performance in the range of 16 to 32 iterations.
The DOG observer was the only model observer that provided a reasonably consistent fit to the human-observer data in both backgrounds. Figure 3 also contains plots of the DOG SNR multiplied by a scale factor chosen to best fit the human data. Scaling of model-observer performance has been discussed in  as a way to incorporate degradation due to internal noise in the human observer when detecting localized signals. The least-squares scale values required for the DOG SNRs to fit the human data were 0.84 for the fixed background and 0.64 for the lumpy background. The χ2 goodness-of-fit statistics based on the error bars in the plots were 4.78 (p = 0.57) and 6.63 (p = 0.36) for the fixed and lumpy backgrounds respectively. These χ2 values must be interpreted carefully. While these goodness-of-fit statistics do not permit rejection at the usual p < 0.05 level, the relatively small sample sizes leave open the possibility that the experiments simply lacked the necessary statistical power to reject the fitted curves. In addition, this statistical analysis has not included the fact that the number, shape, and frequency range of the channels can all be adjusted. Hence the χ2 values only apply to resolving whether any channel model can provide a reasonable fit to our human data.
Visual inspection of the fitted curves is similarly ambivalent. While the model observer performance curves capture the general trends exhibited by the human data, closer inspection reveals a number of possible discrepancies. In both studies the fitted curves seem to undershoot human performance at the first iteration. The fit in the lumpy background experiments also seems to overestimate the high-iteration human performance. These differences are not statistically significant, but nevertheless they remain a source of concern and merit further study.
We have used the approximate ensemble statistical properties of the ML-EM algorithm to quantify detection performance of model observers for two different tasks – detection of a Gaussian bump in fixed and lumpy backgrounds. Standard psychophysical methods have been used to assess human performance. Our primary contribution is to have quantified the effect of the two tasks on optimal stopping point. All observers, both model and human, exhibit optimal performance at different stopping points in the different tasks. The more complex lumpy-background task requires more iterations of the ML-EM algorithm to achieve optimal performance. Similar behavior has been found in previous work on a different reconstruction algorithm . Our results highlight the need for task-based methods to answer questions involving the optimal processing of medical images.
Our second conclusion regards the ability of model observers to reliably predict (and therefore remove the necessity of assessing) human performance. As we have pointed out, the only model that was at all consistent with the human data in both backgrounds was the DOG observer. While this observer does fit the data within standard statistical tolerances, visual discrepancies between the human observer results and the scaled model observer performance leave some doubt of the prospects of generalizing to larger sample sizes and other tasks without further psychophysical evaluations. Furthermore, there is not yet a systematic way to choose parameters of the channel model that guarantee good agreement with human performance. In short, the DOG channelized Hotelling observer is not yet a predictive tool.
Future directions of this work will pursue the conclusions of the present work. Since the task has been shown to have a fundamental effect on observer performance, new tasks need to be explored. Perhaps most important of the future tasks will be those in which the signal is not confined to the center of the reconstruction. Other tasks of interest include varying the lumpy background parameters, size and shape of the signal, and quantifying the effect of penalty functions on Maximum-Likelihood reconstructions.
New model observers need to be considered as well as new tasks. This work underscores the limited capacity of conventional linear observers to predict human performance on tomographic reconstructions. Observers based on nonlinear estimation of signal parameters have been proposed as models for signal detection , . The ability of these observers to predict human performance in a wide range of detection tasks remains to be seen.
The authors would like to thank Jack Denny, Elizabeth Krupinski, and Ed Soares for many helpful discussions. We also wish to acknowledge the participants in the human-observer studies for their time and patience. This work was supported by NCI grant R01-CA52643.