Lesion detectability studies, such as the LROC study performed here, provide a powerful means of evaluating and ranking image quality improvements resulting from advances in PET imaging technology and algorithms. However, such studies have a number of inherent limitations and cannot comprehensively mimic the broad clinical tasks for which PET is used in cancer imaging. While this study showed a statistically significant improvement in the ability of observers to detect focal “hot” lesions on a structured noisy background for the more advanced algorithms, the same improvements will not necessarily translate directly to the clinical environment, where patient variability, motion artifacts, and imperfections in tracer distributions (such as FDG uptake in inflammatory lesions) come into play. As such, all available information, including studies of image characteristics (spatial resolution, contrast, noise, quantitation), lesion detectability, and clinical evaluations, should be considered carefully before implementing new algorithms for clinical use.
One of the greatest challenges and limitations of lesion detectability studies is the selection of the lesion population. Ideally, the lesion population would directly represent the clinically encountered lesion distribution. However, such a population is not well defined and varies greatly between tumor types; furthermore, cancer patients may harbor large numbers of undetected tumor cell masses (e.g., micrometastases) that cannot yet be characterized because we do not yet have the means to detect them. Broadly stated, clinical tumor activity levels range from “very low” (e.g., just above background) to as high as 30–40 times background (or even higher) for very metabolically active lesions; however, the clinical distribution within this range is variable and not well defined. When one also considers that it is difficult to obtain good statistical power for distinguishing moderately significant algorithmic improvements (due, in part, to observer fatigue when reading large numbers of images), basing the lesion distribution solely upon the expected clinical population becomes impractical.
In this study, we did not attempt to quantify the PLOC and ALROC that would be observed clinically (which would require, among other things, a truly clinically representative lesion population). Rather, our objective was to rank the reconstruction algorithms according to their relative lesion-detection performance, which greatly simplified the design of the lesion population. Many clinically encountered lesions would be either “obvious” (easily detectable with all reconstruction algorithms studied) or “invisible” (undetectable by current PET technologies, regardless of the algorithm used). Since all four algorithms would always succeed on obvious lesions and always fail on invisible ones, including such lesions in the test population would provide no information about the differences in performance between algorithms, yet would still contribute to observer fatigue. Thus, the test lesion population was restricted to lesions that were somewhat to very challenging, thereby emphasizing the differences (if any) between algorithms. While all of these lesions were of clinically relevant size (6–16 mm) and contrast (lesion:background ratios of 1.6:1 to 37:1), the test distribution was designed to maximize statistical power for differentiating the reconstruction algorithms. As a result, we obtained a valid ranking of the algorithms that should hold for a broader population of lesions, but little or no information about the clinical significance of these differences. In other words, one can infer that LOR-OSEM3D+PSF would provide better clinical performance than AW-OSEM3D, but one cannot predict from these results alone how much better it would be.
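The rationale above, that “obvious” and “invisible” lesions carry no discriminating information and only dilute statistical power, can be illustrated with a small Monte Carlo sketch. The detection probabilities, case counts, and the simple one-sided two-proportion test used here are illustrative assumptions, not values or methods from this study:

```python
import random
from math import sqrt

random.seed(0)

def simulate_study(n_cases, frac_challenging, p_a=0.60, p_b=0.75,
                   n_trials=2000):
    """Estimate the power to detect that algorithm B outperforms A when
    only a fraction of test lesions are 'challenging'. The remaining
    lesions are split between 'obvious' (both algorithms always detect)
    and 'invisible' (both always miss), so they never discriminate.
    All probabilities are hypothetical, chosen only for illustration."""
    wins = 0
    for _ in range(n_trials):
        hits_a = hits_b = 0
        for _ in range(n_cases):
            r = random.random()
            if r < frac_challenging:
                # Challenging lesion: outcome differs between algorithms.
                hits_a += random.random() < p_a
                hits_b += random.random() < p_b
            elif r < frac_challenging + (1 - frac_challenging) / 2:
                # 'Obvious' lesion: both algorithms succeed.
                hits_a += 1
                hits_b += 1
            # else 'invisible' lesion: both algorithms fail.
        # One-sided two-proportion z-test at alpha = 0.05.
        pa, pb = hits_a / n_cases, hits_b / n_cases
        pooled = (hits_a + hits_b) / (2 * n_cases)
        se = sqrt(2 * pooled * (1 - pooled) / n_cases) or 1e-9
        if (pb - pa) / se > 1.645:
            wins += 1
    return wins / n_trials

# Same number of reads; only the lesion mix changes.
print("all challenging:", simulate_study(200, 1.0))
print("70% filler:     ", simulate_study(200, 0.3))
```

Under these assumed detection probabilities, diluting the test set with non-discriminating lesions sharply reduces the power to separate the two algorithms at a fixed number of reads, which is the motivation for restricting the test population to challenging lesions.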
There are a number of other limitations to this study that should be considered. Though a PET/CT scanner was used, the observers read only the PET images and did not have access to the CT images (which did not mimic a clinical CT scan well). The lesions were spherical rather than spiculated, as many tumors would be; only a single body habitus was considered; no motion artifacts were present; and the background tracer distribution did not vary across the range that could be encountered clinically. The observers were presented with a single-slice image for each test case and thus could not use adjacent slices in their decisions. As such, they were not asked to search a 3D volume, nor were they allowed to identify multiple targets (such as with AFROC [35]), either of which would more closely mimic the clinical task. More advanced ROC methods could potentially overcome some of these limitations; however, such methods are less well studied than the method used here, and they would place additional demands on the experimental design and on the observers. Finally, as with all lesion-detection studies, the statistical power of this study was limited by the number of images read, and the statistical significance of the results should be considered carefully before drawing conclusions.