Medical tests are indispensable for clinicians and provide information that goes beyond what is available by clinical evaluation alone. Systematic reviews that attempt to determine the utility of a medical test are similar to other types of reviews, such as those that examine clinical and system interventions. In particular, a key consideration in a review is how much influence a particular study should have on the conclusions of the review. This chapter complements the original Methods Guide for Effectiveness and Comparative Effectiveness Reviews (hereafter referred to as the General Methods Guide) and focuses on issues of particular relevance to medical tests, especially the estimation of test performance (sensitivity and specificity).
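As a reminder of the two test performance measures named above, the following minimal sketch computes sensitivity and specificity from a 2x2 table that cross-classifies index test results against a reference standard. The class name, function names, and counts are hypothetical and chosen only for illustration; they are not taken from this Guide or from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-classifying an index test against a reference standard."""
    tp: int  # test positive, condition present (true positives)
    fp: int  # test positive, condition absent (false positives)
    fn: int  # test negative, condition present (false negatives)
    tn: int  # test negative, condition absent (true negatives)


def sensitivity(t: TwoByTwo) -> float:
    """Proportion of people with the condition whom the test calls positive."""
    return t.tp / (t.tp + t.fn)


def specificity(t: TwoByTwo) -> float:
    """Proportion of people without the condition whom the test calls negative."""
    return t.tn / (t.tn + t.fp)


# Hypothetical counts for illustration only.
table = TwoByTwo(tp=90, fp=20, fn=10, tn=180)
print(f"sensitivity = {sensitivity(table):.2f}")  # 90 / (90 + 10)
print(f"specificity = {specificity(table):.2f}")  # 180 / (180 + 20)
```

Bias in design, conduct, or reporting acts on the counts in such a table, so both estimates can be shifted even when the arithmetic itself is correct.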
The evaluation of study features that might influence the relative importance of a particular study has often been framed as an assessment of quality. Quality assessment—a broad term used to encompass the examination of factors such as systematic error, random error, adequacy of reporting, aspects of data analysis, applicability, specifying ethics approval and detailing sample size estimates—has been conceptualized in a variety of ways.2, 3
In addition, some schemes for quality assessment apply to individual studies and others to a body of literature. As a result, many different tools have been developed to formally evaluate the quality of studies of medical tests; however, there is no empirical evidence that any sort of score based on quantitative weights of individual study features can predict the degree to which a study is more or less “true.” In this context, systematic reviewers have not yet achieved consensus on the optimal criteria to assess study quality.
Two overarching questions that arise in considering quality in the sense of “value for judgment making” are: (1) are the results for the population and test in the study accurate and precise (also referred to globally as the study’s “internal validity”), and (2) is the study applicable to the patients relevant to the review (an assessment of “external validity” with regard to the purpose of the review)? The first question relates to both systematic error (lack of accuracy, here termed bias) and random error (lack of precision). The second question distinguishes the relevance of the study to its own population of interest (which relates to the potential for bias) from, most importantly for a systematic review, its relevance to the population represented in the key questions established at the outset of the review (i.e., applicability).
This chapter is part of the Methods Guide for Medical Test Reviews produced by the Agency for Healthcare Research and Quality (AHRQ) Evidence-Based Practice Centers (EPC) for AHRQ and the Journal of General Internal Medicine. As in the General Methods Guide, the major features that influence the importance of a study to key review questions are assessed separately. Chapter 6 of this Guide considers the evaluation of the applicability of a particular study to a key review question. Chapter 7 details the assessment of the quality of a body of evidence, and Chapter 8 covers the issue of random error, which can be addressed when considering all relevant studies through the use, if appropriate, of a summary measure combining study results. Thus, this chapter highlights key issues when assessing risk of bias in studies evaluating medical tests, that is, systematic error resulting from design, conduct, or reporting that can lead to over- or underestimation of test performance.
In conjunction with the General Methods Guide and the other eleven chapters in this Methods Guide for Medical Test Reviews, the objective is to provide a useful resource for authors and users of systematic reviews of medical tests.