Fig 1 shows the 89 reviews that met the inclusion criteria. Table 1 summarises the characteristics of the reviews. The reviews covered a range of types of diagnostic tests and tumour sites. We could not assess five of the 30 reviews assigned for detailed review because the number of studies included in the review was unclear. Tables 2-6 show the findings across the nine assessment domains. Items are classified according to whether they relate to the review, a single test within the review, or a single study within the review. Average agreement between duplicate data extractors was 80%, with most differences arising from reader error or from ambiguity in the reviews, particularly over the details of the reference test.
Table 1 Characteristics of included reviews (n=89)
Table 2 Assessment of reviews of diagnostic tests in patients with cancer, according to objectives and search, and participants and clinical setting
Table 3 Assessment of reviews of diagnostic tests in patients with cancer, according to study design
Table 4 Assessment of reviews of diagnostic tests in patients with cancer, according to characteristics of index and reference tests
Table 5 Assessment of reviews of diagnostic tests in patients with cancer, according to graphical display and study results
Table 6 Assessment of reviews of diagnostic tests in patients with cancer, according to meta-analysis, quality, and bias. Figures are percentage (number) of reviews or studies that had the assessment item
Objectives, inclusion criteria, and search
A clear statement of the objectives and inclusion criteria of the review is important for a systematic approach.23
Only when search strategies are reported can readers of the review appraise how well the review has avoided bias in locating studies.
The primary purpose of most reviews was to assess test accuracy; some did so as part of a clinical guideline or economic evaluation (table 2). Three quarters of the 89 reviews stated inclusion criteria, though the number of studies included was unclear in 15 reviews. Of the 25 reviews assessed in detail, 16 used study inclusion criteria relating to sample size or study design, and 15 discussed the appropriateness of the patient inclusion criteria used by the primary studies. Nearly a third (32%, 8/25) of the reviews searched two or more electronic databases, 80% reported their search terms, and 84% searched bibliography lists or other non-electronic sources.
Description of target condition, patients, and clinical setting
Clinical relevance and reliability require reporting of information on the target condition, patients, and clinical setting.22
Reporting severity of disease is important because, for example, the performance of many imaging techniques is related to tumour size.
Half of the 89 reviews did not report whether tumours were primary, recurrent, or metastatic (table 2). Only 17% (15/89) reported on the clinical setting, and 45% reported characteristics of patients for individual studies. Of 17 reviews of primary or recurrent tumours assessed in detail, 10 did not consider possible effects of tumour stage or grade on test performance. Reviews sometimes omitted information that had been collected—for example, 18% (16/89) of reviews collected information on the severity of disease but did not report it.
Study design
Consecutive prospective recruitment from a clinically relevant population of patients, with masked assessment of index and reference tests, is the recommended design to minimise bias and ensure clinical applicability of study results.
Twenty of the 25 reviews assessed in detail did not report, or were unclear about, whether included studies used consecutive recruitment of patients (table 3). Few reviews limited inclusion to study designs less prone to bias, namely consecutive (8%) or prospective (12%) studies. Sixty percent (15/25) discussed test masking. Poor reporting made it impossible to identify the inclusion of case-control designs.6
Description of index and reference tests
Both index and reference tests need to be clearly described for a review to be clinically relevant and transparent and to allow readers to judge the potential for verification and incorporation biases.6
Only 36% (9/25) of reviews reported the definition of a positive result for the index test (table 4). In 40% (10/25) it was unclear whether the included studies used the same or different index tests or procedures. When index tests were reported to vary between included studies, 71% (10/14) of reviews reported the index test for each study, and 86% (12/14) discussed the compatibility of the different tests.
Sixty-eight percent (17/25) of reviews assessed in detail reported the reference tests used in the review; 40% reported the reference test for each included study. Six reviews reported whether reference tests were used on all patients, a random sample, or a selected sample.
Reporting of individual study results and graphical presentation
We assessed the level of detail used to report the results of individual studies. Ideally reviews should report data from 2×2 tables for each study, or summary statistics of test performance. Graphs are efficient tools for reporting results and depicting variability between study results. Of the 89 reviews, 40% contained graphs of study findings, and 39% reported sensitivities and specificities, likelihood ratios, or predictive values (table 5). Over half (56%, 14/25) of the reviews assessed in detail provided adequate information to derive 2×2 tables for all included studies. Four reviews included tests with continuous outcomes but presented only dichotomised results; only three of these reported the cutpoint used.
Meta-analysis, quality, and bias
Appropriate use of meta-analysis can effectively summarise data across studies.24
Quality assessment is important to give readers an indication of the degree to which included studies are prone to bias.
Sixty-one percent (54/89) of reviews presented a meta-analysis (table 6), and 32% completed a formal assessment of quality. Twenty-three of the 25 reviews assessed in detail discussed the potential for bias. Spectrum bias was most commonly considered (80% of reviews), with verification bias and publication bias considered least often (40%).
Procedures in review
The reliability of a review depends partly on how it was done.23
Only 48% (12/25) of reviews provided information on review procedures, most reporting duplicate data extraction by two assessors (nine reviews), a method recommended to increase review reliability.
Assessment of overall review quality
Fig 2 shows quality scores for each domain, presented as star plots for the 25 reviews assessed in detail. Reviews of higher quality have longer spokes and larger areas within the stars. Reviews conducted for the three clinical guidelines and two health economic analyses were of particularly poor quality. Additional detailed assessment of seven further reviews of clinical practice guidelines included in our larger sample confirmed this pattern: four did not report the number of included studies, and the remaining three were of similar quality to the five in fig 2.
Fig 2 Star plots of methods and reporting quality of reviews. Each review assessed in detail is represented by a star plot of nine domains, indicating the percentage of a maximum score in each domain, with domain scores indicated by clockface directions
We identified two reviews with good overall methods and reporting that could serve as examples for new reviewers.25,26
Study quality was not related to page length, year of publication, assessment of an imaging technology, or the number of diseases or index tests assessed.