This is a response to a recent thought-provoking paper by Gur and Rockette (henceforth referred to as “the authors”) that raises issues regarding the applicability of free-response receiver operating characteristic (FROC) methodology to imaging system evaluations. Unlike many diagnostic tests, imaging provides information about the location(s) of disease, among other information, in addition to disease presence or absence. The receiver operating characteristic (ROC) method, however, considers only the presence or absence of disease and disregards location. For some clinical tasks the ROC method is the more relevant one: a task like detecting diffuse pulmonary fibrosis, which does not involve focal lesions, is appropriately analyzed by the ROC method. However, tasks such as detecting lung nodules in chest radiography, or microcalcifications in screening mammography, which involve detecting localized lesions, especially multiple lesions, are more appropriately handled by FROC analysis.
By way of disclosure, having worked in this area since ca. 1984, I am vested in FROC methodology. There are two other location-specific paradigms not mentioned in the authors' paper: the location ROC (LROC) paradigm and the region of interest (ROI) paradigm. In the LROC paradigm the radiologist provides an overall rating and marks the single most suspicious region. In the ROI paradigm the investigator segments the image into ROIs and the radiologist rates each ROI for presence of disease. Like FROC, these paradigms were developed to address the localization and multiple-lesion limitations of ROC methodology, and most of the issues attributed to FROC apply to these methods as well.
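Since the three location-specific paradigms differ mainly in what data they collect per case, a minimal sketch may help; the record structures and field names below are my own illustration, not any standard file format.

```python
# Illustrative per-case data under the three paradigms (hypothetical layout).
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class FrocCase:
    # FROC: zero or more marks, each a location plus a confidence rating
    marks: List[Tuple[Point, float]] = field(default_factory=list)

@dataclass
class LrocCase:
    # LROC: one overall rating and the single most suspicious location
    overall_rating: float
    most_suspicious: Point

@dataclass
class RoiCase:
    # ROI: investigator-defined regions, one rating per region
    roi_ratings: List[float] = field(default_factory=list)
```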
Neglect of location information implies suboptimal measurement precision, i.e., low statistical power, which diminishes the ability to detect differences between modalities, the most common application of observer performance studies. Early analysis tools that I developed drew fair criticism because they ignored correlations. This issue was resolved in 2004 by the jackknife alternative free-response operating characteristic (JAFROC) method, which was demonstrated to have substantially higher power than the ROC method and passed rigorous statistical validation. As evidenced by recent journal publications and proceedings papers at a major international conference on medical imaging, JAFROC usage is gaining acceptance. However, resistance to it is also increasing, which is to be expected as part of normal scientific discourse. Gur and Rockette have done a service to the imaging community by expressing their concerns publicly, and I am grateful to the Editor for giving me the opportunity to respond.
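To make the measurement concrete, here is a minimal sketch of the JAFROC figure of merit as commonly defined: the probability that a lesion rating exceeds the highest non-lesion-localization (NL) rating on a normal case. The function name and data layout are mine, not from any released JAFROC software.

```python
def jafroc_fom(normal_case_max_nl, lesion_ratings):
    """Wilcoxon-like JAFROC figure of merit (sketch).

    normal_case_max_nl : highest NL rating on each normal case;
        unmarked normal cases get a rating below the scale minimum
        (e.g., float("-inf")).
    lesion_ratings : rating of every lesion across all abnormal cases;
        unmarked lesions likewise get float("-inf").
    """
    total = 0.0
    for x in normal_case_max_nl:
        for y in lesion_ratings:
            if y > x:          # lesion rated higher than the normal case
                total += 1.0
            elif y == x:       # tie (including both unmarked)
                total += 0.5
    return total / (len(normal_case_max_nl) * len(lesion_ratings))
```

The “jackknife” in JAFROC refers to the significance-testing step: the figure of merit is recomputed with each case removed in turn, and the resulting pseudovalues are analyzed with an analysis-of-variance procedure to compare modalities.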
I agree with the authors on some of the issues: ambiguity of the acceptance target (how close a mark has to be to a lesion in order to be counted as a true positive); suitability of the figure of merit for multiple lesions with different clinical significances; handling of multiple views per case; and simulator-related issues such as distributional assumptions and lack of accounting for satisfaction of search. However, I choose to regard these as research opportunities and thank the authors for laying out a detailed research roadmap. This research, quite apart from its obvious application to modality assessment, could substantially extend our understanding of medical decision making. I am making progress in some of these areas, but others need to get involved. Unfortunately, if one accepts the authors' premise that ROC is more clinically relevant than FROC in location-specific tasks such as screening mammography, then few will be motivated to do research in FROC analysis.
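To illustrate the first issue, the acceptance target reduces to a proximity rule for scoring marks. Below is a minimal sketch assuming a simple Euclidean acceptance radius; the rule, the function, and its parameters are hypothetical illustrations, not a recommended criterion.

```python
import math

def classify_marks(marks, lesions, radius):
    """Score FROC marks against lesion locations (sketch).

    A mark within `radius` of any lesion center is counted as a
    lesion localization (true positive); all other marks are
    non-lesion localizations. Real studies must also decide how
    to handle several marks claiming the same lesion.

    marks, lesions : lists of (x, y) coordinates.
    """
    ll, nl = [], []
    for m in marks:
        if any(math.dist(m, les) <= radius for les in lesions):
            ll.append(m)
        else:
            nl.append(m)
    return ll, nl
```

The ambiguity is precisely that `radius` (and the distance measure itself) must be chosen by the investigator, and different choices can change which marks are credited.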
When a screening radiologist refers a patient to a colleague for further investigation, the location of the lesion, including which breast is involved, is crucial. Screening programs require documentation of lesion characteristics, including location, in addition to the overall recommendation to “recall” or “return to screening”. The location(s) identified at screening guide the subsequent diagnostic workup and the decision to biopsy the lesion(s). Just knowing that the woman has a malignancy somewhere in her breasts is obviously less helpful to the mammographer performing the diagnostic workup than knowing the locations and types of abnormalities detected at screening by a colleague. Neglecting location information can lead to the scenario in which the ROC paradigm credits the radiologist with detecting an abnormal condition when in fact the lesion was missed and a normal structure was mistaken for a lesion (“right for the wrong reason”). The clinical consequences of both mistakes are serious: the undetected cancer is allowed to grow, and a biopsy is performed at the wrong location (see #1 below).
Since claims are being made that ROC is clinically more relevant than FROC in some scenarios, a definition of clinical relevance is needed. Evaluation methods form a six-level hierarchy of efficacies: technical, diagnostic, diagnostic-thinking, therapeutic, patient-outcome and societal. I will interpret the “clinical relevance” of a measurement as its level in this hierarchy. The difficulty and cost of measurement increase as one moves up the hierarchy. At the lowest level, technical efficacy (e.g., spatial resolution) is easiest to measure. The level-2 ROC measurement has a reputation for being time consuming and costly. Level-3 measurements such as positive predictive values are even more laborious. One way of showing clinical relevance is to perform measurements at the higher level and show that they confirm the lower-level measurements regarding which modality is superior. If the performance difference is small, demonstrating clinical relevance can be very difficult. As an example, the initial optimistic expectations of mammography CAD, which were based on ROC studies, have not been confirmed in some large-scale clinical trials [8, 9]. Since it is difficult to prove the clinical relevance of ROC, one can hardly claim it is more relevant than FROC. Black and Dwyer studied the issue of global vs. local measures of accuracy and their effects on post-test probability of disease, which is a level-3 measure. They considered mediastinal lymph node metastasis (LNM), which is more likely to be present in the right lower paratracheal region (4R) than in the left lower paratracheal region (4L). As expected, the post-test probability is higher if the radiologist knows that LNM was found in 4R rather than 4L, and this knowledge will influence the subsequent action (e.g., biopsy or surgery). But if the location information is ignored, the post-test probability is the same in the two cases. Black and Dwyer conclude that “the local versus global distinction supports the commonsense notion that information pertaining to the anatomic distribution of disease is crucial for test interpretation”.
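Black and Dwyer's point can be expressed with Bayes' rule in odds form; the numbers below are illustrative, not theirs. Because the pre-test probability of LNM differs between stations 4R and 4L, the same positive local finding yields different post-test probabilities, a distinction a location-blind (global) reading cannot make.

```latex
% Post-test probability via Bayes' rule in odds form (illustrative numbers):
\[
  \text{post-test odds} = \text{pre-test odds} \times LR_{+},
  \qquad
  p_{\text{post}} = \frac{p_{\text{pre}}\,LR_{+}}{1 - p_{\text{pre}} + p_{\text{pre}}\,LR_{+}} .
\]
\[
  \text{For example, with } LR_{+} = 5:\quad
  p_{\text{pre}}^{4R} = 0.30 \Rightarrow p_{\text{post}}^{4R} \approx 0.68,
  \qquad
  p_{\text{pre}}^{4L} = 0.10 \Rightarrow p_{\text{post}}^{4L} \approx 0.36 .
\]
```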
One cannot fault the authors' recommendation that end-users become aware of issues with FROC and address them in the study design, but the authors fail to provide guidance on how one is to address them, apart from, by implication, not conducting an FROC study. The one example given of an appropriate application of the FROC method actually follows the ROI paradigm (see #8 below). An end-user may reasonably conclude from the authors' paper: “reconsider using FROC; consider using binary or ROC methods instead”.
What follows are specific responses to some of the more salient issues.
This work was supported by grants from the Department of Health and Human Services, National Institutes of Health, R01-EB005243 and R01-EB008688.