The classical paradigm for evaluating test performance compares the results of an index test with a reference test. When the reference test does not mirror the “truth” adequately well (e.g. is an “imperfect” reference standard), the typical (“naïve”) estimates of sensitivity and specificity are biased. One has at least four options when performing a systematic review of test performance when the reference standard is “imperfect”: (a) to forgo the classical paradigm and assess the index test’s ability to predict patient relevant outcomes instead of test accuracy (i.e., treat the index test as a predictive instrument); (b) to assess whether the results of the two tests (index and reference) agree or disagree (i.e., treat them as two alternative measurement methods); (c) to calculate “naïve” estimates of the index test’s sensitivity and specificity from each study included in the review and discuss in which direction they are biased; (d) mathematically adjust the “naïve” estimates of sensitivity and specificity of the index test to account for the imperfect reference standard. We discuss these options and illustrate some of them through examples.
KEY WORDS: medical test, diagnostic test, alloy standard