Over the past 2 decades, systematic reviews have risen in number, quality, and impact. The sheer volume of work is remarkable. For example, the annual number of meta-analyses (a subset of systematic reviews) indexed by MEDLINE has grown from 273 in 1990 to 4,526 in 2010. Well-conceived and well-written systematic reviews serve many functions for stakeholders. First, they help clinicians apply evidence from the medical literature to patient care by critically appraising and summarizing what is often, for a given topic, a large amount of published clinical investigation. Systematic reviews are particularly useful when substantial practice variation exists, actual practice differs from published standards of care, clinical guidelines differ in their recommendations, and a large body of recent literature provides new insights that may modify recommendations from those of published guidelines.
Second, they can provide the basis for establishing and revising clinical guidelines as well as many quality assessment metrics applied to physicians, group practices, and hospitals. Third, they can inform future research agendas by defining important unresolved questions. Lastly, they draw attention to differences in findings across studies addressing similar research questions and propose a basis for the conflicting results. For all of these reasons, their impact can be substantial. For example, in one study of 170 journals in the fields of general internal medicine, family practice, nursing, and mental health, the average impact factor for systematic reviews was 26.5.1 In contrast, the mean impact factor for the top 40 general medical journals is 7.4.
Guidelines to assist authors of systematic reviews in medicine have evolved. Published in 1999, the QUOROM (Quality of Reporting of Meta-analyses) guideline for reporting systematic reviews2 aimed to standardize and improve published reports of systematic reviews. Subsequent evolution of review methods, including increasingly rigorous assessments of the risk of bias and more frequent inclusion of observational data, prompted the development of an updated reporting tool, PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), which was published in 2009.3,4 PRISMA aims to standardize the reporting of systematic reviews; it offers less guidance to authors on the conduct and performance of such reviews. Guidelines also exist to assist authors in the conduct of reviews. Since 1994, the Cochrane Collaboration has published, and regularly updated, a detailed handbook for authors of systematic reviews.5 This methods guide focuses primarily on reviews of randomized controlled trials of interventions. While developed for authors of Cochrane reviews, the handbook is freely available and has been a helpful resource for other authors of systematic reviews.
One mission of the Agency for Healthcare Research and Quality (AHRQ) is to solicit and publish systematic reviews (evidence reports and technology assessments) on topics to improve the clinical practice and delivery of health-care services. In 1997, AHRQ formed the Evidence-based Practice Center (EPC) Program to commission and oversee these reviews.6 One of us (SC) directs the EPC program under the umbrella of the Center for Outcomes and Evidence, and one (DM) directed the Duke EPC. EPCs conduct reviews for use by a variety of groups, including national guideline groups, such as the US Preventive Services Task Force (USPSTF),7 which uses reviews to inform screening and prevention guidelines, and payers, such as the Centers for Medicare and Medicaid Services. To improve the quality and consistency of EPC reports, the Agency has published methods guidance, developed by EPC authors (Methods Guide for Effectiveness and Comparative Effectiveness Reviews).8 This guidance, along with that of other groups such as the Cochrane Collaboration, the USPSTF,9 and the Institute of Medicine,10 forms the basis for standards in the conduct of systematic reviews.
The editors of the AHRQ Methods Guide realized, however, that systematic reviews of medical tests pose unique challenges that are not adequately addressed in guidelines for authors of reviews of interventions or comparative efficacy. For example, the principal “outcome” of a study of a medical test is commonly a proxy or intermediate outcome. An illustration of this is ultrasound evaluation of the carotid arteries. The most common outcome in an evaluation of this test is the accuracy of the test in identifying clinically significant stenosis of the artery. Clinicians, while interested in this proxy outcome, would find more value in the ability of the test to predict clinically significant outcomes (such as 10-year risk of stroke or cardiovascular death) or the effect of subsequent carotid endarterectomy or stenting on stroke or death rates. A review of the operating characteristics of carotid ultrasound would optimally assess both the proxy result (as compared to a reference standard, in this case, invasive angiography) and the downstream result of testing on clinically significant outcomes.
Clinicians obtain medical tests for a number of non-overlapping reasons. These include screening, diagnosis, prognosis, and prediction of treatment response. In recognition of the unique challenges in conducting reviews of diagnostic tests, the Cochrane Collaboration has formed a working group specifically tasked with providing guidance in this arena. A draft version of their handbook, which is at present incomplete, has begun to address these challenges.11
AHRQ has also recognized the limitations of the Methods Guide for Effectiveness and Comparative Effectiveness Reviews (herein referred to as the General Methods Guide) when applied to studies of medical tests. In 2007, AHRQ convened an expert working meeting on the methodologic challenges in performing systematic reviews of medical tests. Four white papers were commissioned and presented on May 28–29, 2008.12 The discussions from this meeting formed the basis for the Medical Test EPC workgroups, led by DM, then director of the Center for Clinical Health Policy Research and of the Duke EPC. Three EPC workgroups identified and addressed practical challenges in each step of conducting systematic reviews of medical tests (understanding the context, performing the review, and synthesizing the evidence). From these workgroups, EPC authors wrote nine draft papers providing guidance on steps for systematically reviewing medical test evidence that were either not covered in the existing General Methods Guide or that illustrated how to apply the General Methods Guide to medical test evaluation. An additional two workgroups addressed issues unique to genetic and prognostic tests. Each paper underwent extensive peer review by EPC investigators, external peer review, and public comment.
The Society of General Internal Medicine (SGIM) and the editorial leadership of the Journal of General Internal Medicine recognize that academic general internists share with AHRQ the desire to improve the quality of systematic reviews of medical tests through dissemination of methods guides to potential authors. AHRQ approached the Journal’s editorial leadership and proposed a collaborative effort to review and publish this guide. The AHRQ Scientific Resource Center managed the peer and public review process through the usual Effective Health Care Program mechanisms. Two deputy editors from the Journal (GS and CU) reviewed the peer and public review comments, and author responses. All four of us reconciled any remaining issues and submitted a consensus letter to the corresponding author of each chapter with additional requests for revisions. In particular, we sought to expand the scope of the articles beyond EPC authors to provide relevant guidance to all authors of systematic reviews of medical tests. Likewise, we guided manuscript development so that the resulting chapters would be of value to readers of systematic reviews of medical tests who seek to determine the strengths and weaknesses of the review and its impact on clinical practice. We asked authors to identify potential differences between their chapters and the recommendations from the upcoming Cochrane handbook for systematic reviews of diagnostic test accuracy, and to comment on the basis for any disparities. The final versions of each chapter manuscript were submitted simultaneously to the Journal for typesetting and to AHRQ for public posting. AHRQ also developed online training modules for authors based on the content of these manuscripts.13
This supplement represents the final product of these efforts. The supplement covers 12 core aspects of the optimal conduct of systematic reviews of medical tests and serves as guidance for authors. However, each paper, or chapter, stands on its own. It is our sincere hope that EPC and non-EPC authors of systematic reviews, as well as other researchers and clinician readers, will find this collated Methods Guide for Systematic Reviews of Medical Tests to be helpful for the generation and appraisal of reviews, as well as the application of reviews to decision making about the use of specific medical tests in clinical practice.