Sensitivity and specificity constitute the basic measures of the performance of diagnostic tests (Table 1). Sensitivity is defined as the number of true positive decisions divided by the number of actually positive cases, and specificity as the number of true negative decisions divided by the number of actually negative cases. When the results of a test fall into one of two clearly defined categories, such as the presence or absence of a disease, the test has only one pair of sensitivity and specificity values. However, in many diagnostic situations, making a binary decision is both difficult and impractical. Image findings may not be obvious or clear-cut, and the diagnostic confidence levels of the radiologists who interpret the findings may vary considerably. As a result, a single pair of sensitivity and specificity values is insufficient to describe the full range of diagnostic performance of a test.
| Table 1. The Decision Matrix. Sensitivity and Specificity of a Test are Defined as TP/D+ and TN/D-, Respectively |
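As a minimal sketch of the definitions above, the following Python function computes sensitivity and specificity from the four cells of a decision matrix. The function name and the example counts are hypothetical and were chosen only for illustration; they are not taken from any study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the cells of a 2x2 decision matrix."""
    d_pos = tp + fn  # D+: actually positive cases
    d_neg = tn + fp  # D-: actually negative cases
    if d_pos == 0 or d_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / d_pos, tn / d_neg


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=40, fn=10, tn=45, fp=5)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
# prints: sensitivity = 0.80, specificity = 0.90
```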
Consider an example of 70 patients with solitary pulmonary nodules who underwent plain chest radiography to determine whether the nodules were benign or malignant (Table 2). According to the biopsy results and/or follow-up evaluations, 34 patients actually had malignancies and 36 patients had benign lesions. Chest radiographs were interpreted according to a five-point scale: 1 (definitely benign), 2 (probably benign), 3 (possibly malignant), 4 (probably malignant), and 5 (definitely malignant). In this example, one can choose from four different cutoff levels to define a positive test for malignancy on the chest radiographs: ≥2 (i.e., the most liberal criterion), ≥3, ≥4, and 5 (i.e., the most stringent criterion). Therefore, there are four pairs of sensitivity and specificity values, one pair for each cutoff level, and the sensitivities and specificities depend on the cutoff level that is used to define the positive and negative test results (Table 3). As the cutoff level decreases, the sensitivity increases while the specificity decreases, and vice versa.
| Table 2. Results from Plain Chest Radiography of 70 Patients with Solitary Pulmonary Nodules |
| Table 3. Sensitivity, Specificity, and FPR for the Diagnosis of Malignant Solitary Pulmonary Nodules at Each Cutoff Level from the Plain Chest Radiography Study |
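The dependence of sensitivity and specificity on the cutoff level can be sketched in a few lines of Python. The rating counts below are hypothetical placeholders, not the values of Table 2; they were chosen only so that the totals match the 36 benign and 34 malignant cases described above. The function name `operating_points` is likewise an illustrative assumption.

```python
# Hypothetical counts per rating 1..5 (NOT the values of Table 2).
benign = [15, 10, 6, 3, 2]     # 36 patients with benign nodules
malignant = [2, 3, 5, 10, 14]  # 34 patients with malignant nodules


def operating_points(neg_counts, pos_counts):
    """For each cutoff c = 2..K, treat ratings >= c as a positive test and
    return a list of (cutoff, sensitivity, specificity, fpr) tuples."""
    n_neg, n_pos = sum(neg_counts), sum(pos_counts)
    points = []
    for c in range(2, len(pos_counts) + 1):
        tp = sum(pos_counts[c - 1:])  # malignant cases rated >= c
        fp = sum(neg_counts[c - 1:])  # benign cases rated >= c
        sens = tp / n_pos
        fpr = fp / n_neg
        points.append((c, sens, 1 - fpr, fpr))
    return points


for c, sens, spec, fpr in operating_points(benign, malignant):
    print(f"cutoff >= {c}: sensitivity={sens:.3f} "
          f"specificity={spec:.3f} FPR={fpr:.3f}")
```

With any such counts, lowering the cutoff can only move cases from the negative to the positive side, which is why sensitivity rises and specificity falls together.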
To deal with these multiple pairs of sensitivity and specificity values, one can draw a graph using the sensitivities as the y coordinates and the 1-specificities, or false positive rates (FPRs), as the x coordinates. Each discrete point on the graph, called an operating point, is generated by using a different cutoff level for a positive test result. An ROC curve can be estimated from these discrete points by assuming that the test results, or some unknown monotonic transformation thereof, follow a certain distribution. For this purpose, the assumption of a binormal distribution (i.e., two Gaussian distributions: one for the test results of those patients with benign solitary pulmonary nodules and the other for the test results of those patients with malignant solitary pulmonary nodules) is most commonly made (1, 2). The resulting curve is called the fitted or smooth ROC curve (1). The estimation of the smooth ROC curve based on a binormal distribution uses a statistical method called maximum likelihood estimation (MLE) (3). When a binormal distribution is used, the shape of the smooth ROC curve is entirely determined by two parameters. The first one, which is referred to as a, is the standardized difference in the means of the distributions of the test results for those subjects with and without the condition (Appendix) (2, 4). The other parameter, which is referred to as b, is the ratio of the standard deviations of the distributions of the test results for those subjects without versus those with the condition (Appendix) (2, 4).
Another way to construct an ROC curve is to connect all the points obtained at all the possible cutoff levels. In the previous example, there are four pairs of FPR and sensitivity values (Table 3), and the two endpoints of the ROC curve are (0, 0) and (1, 1), where each pair of values gives the FPR and the sensitivity, respectively. The resulting ROC curve is called the empirical ROC curve (1). The ROC curve illustrates the relationship between sensitivity and FPR. Because the ROC curve displays the sensitivities and FPRs at all possible cutoff levels, it can be used to assess the performance of a test independently of the decision threshold (5).
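The two constructions can be illustrated with a short Python sketch. It relies on the standard binormal form, in which the sensitivity at a given FPR is Φ(a + b·Φ⁻¹(FPR)), with Φ the standard normal distribution function. It also assumes that a is standardized by the standard deviation of the with-condition distribution, a common convention; the exact definition is the one given in the Appendix. The sketch does not perform MLE: the means, standard deviations, and operating points used are hypothetical values, and all function names are illustrative.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian: mean 0, SD 1


def binormal_params(mean_neg, sd_neg, mean_pos, sd_pos):
    """Return (a, b). Assumption: a is standardized by sd_pos;
    b is the SD ratio (without condition / with condition)."""
    a = (mean_pos - mean_neg) / sd_pos
    b = sd_neg / sd_pos
    return a, b


def binormal_sensitivity(fpr, a, b):
    """Sensitivity on the smooth binormal ROC curve at a given FPR."""
    if fpr <= 0.0:
        return 0.0  # endpoint (0, 0)
    if fpr >= 1.0:
        return 1.0  # endpoint (1, 1)
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fpr))


def empirical_roc(operating_points):
    """Sort (fpr, sensitivity) pairs and add the endpoints (0, 0) and (1, 1)."""
    return [(0.0, 0.0)] + sorted(operating_points) + [(1.0, 1.0)]


# Hypothetical parameters of the two Gaussian distributions (not fitted).
a, b = binormal_params(mean_neg=0.0, sd_neg=1.0, mean_pos=1.5, sd_pos=1.0)
for fpr in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"FPR={fpr:.2f} -> sensitivity={binormal_sensitivity(fpr, a, b):.3f}")

# Hypothetical operating points as (FPR, sensitivity) pairs.
print(empirical_roc([(0.58, 0.94), (0.31, 0.85), (0.14, 0.71), (0.06, 0.41)]))
```

Setting a = 0 and b = 1 makes the two distributions identical, and the smooth curve then reduces to the diagonal line on which sensitivity equals FPR.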