“Grading” refers to the assessment of the strength of the body of evidence supporting a given statement or conclusion rather than to the quality of an individual study.1
Grading can be valuable for providing information to decisionmakers, such as guideline panels, clinicians, caregivers, insurers and patients who wish to use an evidence synthesis to promote improved patient outcomes.1,2
In particular, such grades allow decisionmakers to assess the degree to which any decision can rest on a body of evidence of high, moderate, or only low strength. That is, decisionmakers can make a more defensible recommendation about the use of a given intervention or test than they might make without the strength of evidence grade.
The Evidence-based Practice Center (EPC) Program supported by the Agency for Healthcare Research and Quality (AHRQ) has published guidance on assessing the strength of a body of evidence when comparing medical interventions.1,3
That guidance is based on the principles identified by the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) working group,4–6 with minor adaptations for EPCs. It is important to distinguish between the quality of a study and the strength of a body of evidence on diagnostic tests as assessed by the GRADE and EPC approaches. EPCs consider “The extent to which all aspects of a study’s design and conduct can be shown to protect against systematic bias, nonsystematic bias, and inferential error” to be the quality (or internal validity, or risk of bias) of an individual study.7
In contrast to the GRADE approach, the EPC approach prefers the term “strength of evidence” to “quality of evidence” for the grade of an evidence base for a given outcome, because the latter term is often equated with the quality of individual studies without consideration of the other domains for grading a body of evidence. An assessment of the strength of the entire body of evidence includes an assessment of the quality of the individual studies along with other domains. Although the GRADE approach can be used to make judgments about both the strength of an evidence base and the strength of recommendations, this chapter considers GRADE only as a tool for assessing the strength of an evidence base.
When assessing the strength of an evidence base, systematic reviewers should consider four principal domains: risk of bias, consistency, directness, and precision. Additionally, reviewers may wish to consider publication bias as a fifth principal domain, as recently suggested by the GRADE approach.6 Additional domains to consider are dose-response association, existence of plausible unmeasured confounders, and strength of association (i.e., magnitude of effect). Of note, GRADE considers applicability as an element of directness. This is distinct from the EPC approach, which encourages users to evaluate applicability as a separate component.
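As an illustration only, the domain structure described above can be sketched as a simple record that a review team might keep for each outcome and comparison. This is a minimal sketch and not part of the EPC or GRADE guidance: the class name, field names, and free-text judgments are all hypothetical. It assumes the four principal domains are risk of bias, consistency, directness, and precision, groups publication bias with the optional domains, and stores applicability separately, as the EPC approach encourages. It deliberately does not compute an overall grade, since that remains a reviewer judgment.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The four principal domains (assumed here: risk of bias, consistency,
# directness, precision).
REQUIRED_DOMAINS = ("risk_of_bias", "consistency", "directness", "precision")

# Optional domains named in the text; the identifiers are hypothetical.
ADDITIONAL_DOMAINS = (
    "publication_bias",
    "dose_response_association",
    "plausible_unmeasured_confounders",
    "strength_of_association",
)


@dataclass
class EvidenceAssessment:
    """Hypothetical record of domain judgments for one outcome and comparison."""

    outcome: str
    comparison: str
    judgments: Dict[str, str] = field(default_factory=dict)
    # Kept apart from directness, following the EPC approach.
    applicability: str = ""

    def record(self, domain: str, judgment: str) -> None:
        """Store a free-text judgment for a recognized domain."""
        if domain not in REQUIRED_DOMAINS + ADDITIONAL_DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
        self.judgments[domain] = judgment

    def missing_required(self) -> List[str]:
        """Return the principal domains that have not yet been assessed."""
        return [d for d in REQUIRED_DOMAINS if d not in self.judgments]


# Illustrative usage with made-up judgments.
assessment = EvidenceAssessment(
    outcome="sensitivity",
    comparison="index test vs. reference standard",
)
assessment.record("risk_of_bias", "hypothetical judgment: moderate")
assessment.record("consistency", "hypothetical judgment: consistent")
print(assessment.missing_required())  # ['directness', 'precision']
```

A record of this kind only helps confirm that every principal domain has been considered for each outcome; how the domain judgments combine into a strength of evidence grade is left to the reviewers.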
EPCs grade the strength of evidence for each of the relevant outcomes and comparisons identified in the key questions addressed in a systematic review. The process of defining the important intermediate and clinical outcomes of interest for diagnostic tests is further described in a previous article.8
Because most diagnostic test literature focuses on test performance (e.g., sensitivity and specificity), at least one key question will normally relate to that evidence. In the uncommon circumstance in which a diagnostic test is studied in the context of a clinical trial (e.g., test versus no test) with clinical outcomes as the study endpoint, the reader is referred to the Methods Guide for Effectiveness and Comparative Effectiveness Reviews on evaluating interventions.1,3
For other key questions, such as those related to analytic validity, clinical validity, and clinical utility, the principles described in the present document and the Methods Guide for Effectiveness and Comparative Effectiveness Reviews apply.
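For readers less familiar with the test performance measures mentioned above, the following minimal sketch shows how sensitivity and specificity are computed from the four cells of a 2x2 table comparing an index test against a reference standard. The function name and the counts are hypothetical and serve only as an illustration.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Compute sensitivity and specificity from 2x2 table counts.

    tp: diseased, test positive      fn: diseased, test negative
    fp: non-diseased, test positive  tn: non-diseased, test negative
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased subject")
    sensitivity = tp / (tp + fn)  # proportion of diseased subjects who test positive
    specificity = tn / (tn + fp)  # proportion of non-diseased subjects who test negative
    return sensitivity, specificity


# Made-up counts: 100 diseased and 100 non-diseased subjects.
sens, spec = sensitivity_specificity(tp=90, fp=20, fn=10, tn=80)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these made-up counts, sensitivity is 90/100 and specificity is 80/100.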
This paper is meant to complement the EPC Methods Guide for Comparative Effectiveness Reviews, not to be a complete review. Although we have written it as guidance for EPCs, we also intend it to be a useful resource for other investigators interested in conducting systematic reviews on diagnostic tests. The focus is on diagnostic tests, meaning tests that are used in the diagnostic and management strategy of a patient symptom or complaint, as opposed to prognostic tests, which are for predicting responsiveness to treatment. We outline the particular challenges that systematic reviewers face in grading the strength of a body of evidence on diagnostic test performance and then propose principles for addressing these challenges.