Systematic reviews of health care topics are valuable summaries of all pertinent studies on focused questions. However, finding all relevant primary studies for systematic reviews remains challenging.
To determine the performance of the Clinical Queries (CQ) sensitive search filter for diagnostic accuracy studies for retrieving studies for systematic reviews.
We compared the yield of the sensitive CQ diagnosis search filter for MEDLINE and EMBASE to retrieve studies in diagnostic accuracy systematic reviews (ACP Journal Club, 2006).
12 of 22 diagnostic accuracy reviews (452 included studies) met inclusion criteria. After excluding 11 studies not in MEDLINE or EMBASE, 95% of articles (417/441) were captured by the sensitive CQ diagnosis search filter (MEDLINE and EMBASE combined). Of 24 studies not retrieved by the filter, 22 were not diagnostic accuracy studies. Re-analysis of the CQ filter without these 22 non-diagnosis articles increased its performance to 99% (417/419). We found no substantive impact of the 2 articles missed by the CQ filter on the conclusions of the systematic reviews in which they were cited.
The sensitive CQ diagnostic search filter captured 99% of articles and 100% of substantive articles indexed in MEDLINE and EMBASE in diagnostic accuracy systematic reviews.
Systematic reviews are valuable resources for clinicians and researchers because they summarize all pertinent studies on a specific clinical question, can improve the understanding of inconsistencies among diverse evidence, help users to keep up with the medical literature, define future research agendas, and inform the management of health problems1, 2. However, finding all primary studies for systematic reviews is challenging because an overwhelming amount of information is available in the biomedical literature. In addition, complete, accurate retrieval is compromised by indexing inconsistencies and ambiguities, and lack of empirically validated searching filters (also referred to as search strategies and hedges)3.
One approach is to use a complex search filter based on the principles of library science, such as that of the InterTASC Information Specialists' Sub-Group4. A potential alternative is to use the simple but sensitive search filters originally designed to assist clinicians to find the current best evidence for clinical decisions. Our group developed empirically validated search filters for several purposes including retrieving higher quality studies of treatment5, diagnosis6, etiology7, prognosis8, and clinical prediction guides9. These are publicly available in the Clinical Queries (CQ) interface of MEDLINE (http://www.ncbi.nlm.nih.gov/entrez/query/static/clinical.html) as well as the limits screen of Ovid10 for MEDLINE, EMBASE, PsycINFO, and CINAHL. Three types of CQ search filters are available in Ovid: “sensitive” (retrieves a high proportion of relevant, on-target articles, but also many off-target articles, reflected in low precision), “specific” (somewhat lower sensitivity but fewer off-target retrievals), and “optimal” (best balance of sensitivity and specificity); the optimal strategy is not available on the CQ page in PubMed.
In this investigation, we sought to determine how well the MEDLINE and EMBASE sensitive CQ diagnostic search filters retrieved the diagnostic accuracy studies included in systematic reviews. We looked at this question in 2 ways. First, we determined whether the included studies were retrievable using the CQ filters. Second, we assessed whether using the CQ filters reduced the number of potentially relevant studies that needed to be screened after searching in MEDLINE and EMBASE.
The methods used to derive the CQ search filters in both MEDLINE and EMBASE have been described elsewhere5–9. In this study, we compared the yield of the sensitive CQ diagnostic filters (Table 1) with the studies included in a sample of systematic reviews of diagnostic accuracy from the ACP Journal Club collection (http://www.acpjc.org) for the year 2006 (search done in May, 2007).
Diagnostic accuracy reviews were searched from the ACP Journal Club web site by entering “Review” in the search field, and selecting “Diagnosis” from the “Article type” drop-down menu. We selected diagnostic accuracy reviews by first looking at each ACP Journal Club page of titles for reviews that were bannered as 2006. Study eligibility was determined by looking at each 2006 diagnosis review in full-text, which was downloaded using the PubMed identifying numbers hyperlink at the end of the citation of the original study. Eligibility criteria for including a diagnostic accuracy systematic review in our study were that it was published in 2006, incorporated a MEDLINE and EMBASE search as a data source, and that the review was available and downloadable in electronic format. When the diagnostic accuracy reviews were analyzed in full-text, we discovered that the systematic review by Wardlaw et al11 used our diagnostic CQ search filter as part of their strategy. To avoid “incorporation bias”, we added this additional criterion for eligibility: the systematic review could not use the CQ search filters. The included studies of each of the eligible diagnostic systematic reviews were documented in an Excel datasheet.
For each eligible diagnostic review topic, we ran the sensitive CQ diagnostic search filter in both MEDLINE and EMBASE using the Ovid Technologies interface. Starting with MEDLINE, each of the included studies for each systematic review was located by entering citation information in the search field. Once an included study was located, the “diagnosis (sensitivity)” option in the “Limits” tab was used to test if the article would be captured by the sensitive CQ diagnostic search filter.
We assessed the effect on the conclusions of a review of any included study that was not retrieved by the sensitive search filter. We defined an included study as having a “potential impact” on the summary measures and conclusions of the systematic review if it was one of the studies included in a meta-analysis, or if it was described in the Results section of the review in the context of any of the outcomes. We defined a non-retrieved article as having “no impact” on the systematic review if excluding it from the review’s analysis made no difference to the final conclusions compared with the results when the study was included.
To determine if the sensitive CQ diagnostic search filter reduces the number of studies that need to be screened after searching MEDLINE and EMBASE, we sought to replicate the search filters in the systematic reviews. We contacted the authors of the systematic reviews to obtain the exact filter used in their search.
Of 94 diagnostic accuracy reviews found in ACP Journal Club, 22 were published in 2006. On full-text review, 13 systematic reviews met our original inclusion criteria (both MEDLINE and EMBASE searches as data sources, and available in electronic format)11–23. The addition of the third eligibility criterion during full-text review resulted in the exclusion of 1 diagnostic review11, and 9 systematic reviews were excluded because they did not meet our original inclusion criteria: 1 did not explicitly include a MEDLINE search filter, 6 did not include an EMBASE search filter, and 2 were not available in electronic format, leaving 12 systematic reviews for our sample12–23. A total of 452 studies were included in the 12 systematic reviews (Table 2). Of these, 11 articles from 2 reviews12, 13 were abstracts from conference proceedings and were not indexed in either MEDLINE or EMBASE, and thus were excluded from the analysis (Figure 1).
Figure 1 shows the flow diagram of the process that was used to calculate the proportion of articles that were captured by the most sensitive CQ diagnosis search filter. After excluding the 11 abstracts not indexed in MEDLINE or EMBASE from the pool of 452 included studies, 95% of articles (417/441) were captured by the sensitive CQ search filter when results from MEDLINE and EMBASE were combined. Of these, 273 articles (62%) overlapped between MEDLINE and EMBASE, 114 articles (26%) were captured in MEDLINE but not EMBASE, and 30 articles (7%) were found in EMBASE only.
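The arithmetic behind these capture figures can be verified directly from the counts reported above; a minimal sketch (all numbers are taken from the text, and the rounding to 95% and 99% follows the article's reporting):

```python
# Recomputing the capture proportions from the counts stated in the text.
overlap = 273        # articles captured in both MEDLINE and EMBASE
medline_only = 114   # captured in MEDLINE but not EMBASE
embase_only = 30     # captured in EMBASE only

captured = overlap + medline_only + embase_only   # 417 articles
indexed = 452 - 11   # 441 included studies indexed in MEDLINE or EMBASE

print(f"{100 * captured / indexed:.1f}%")         # 94.6%, reported as 95%

# After excluding the 22 missed articles that were not diagnostic
# accuracy studies, the denominator drops to 419:
print(f"{100 * captured / (indexed - 22):.1f}%")  # 99.5%, reported as 99%
```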
The 24 articles (5%) that were missed by the search strategy came from 6 systematic reviews (Table 3)14–17, 22, 23. We explored the characteristics of the missed articles to determine why they were not retrieved by our sensitive filter. We reviewed the titles and abstracts of all 24 missed articles to determine if they were about diagnostic accuracy, using a simplified version of criteria previously developed by Wilczynski et al24: the study compared at least 2 diagnostic test procedures with one another. Of the 24 non-retrieved articles, 22 did not meet the criteria for a diagnosis study (Table 3). Treatment studies accounted for 16 (66.7%), 4 (16.7%) were classified as ‘something else’ (defined as: content of the study does not fit any of the definitions for other purpose categories [e.g., diagnosis, treatment]24), and 1 (4.2%) was an etiologic study25. The remaining article (4.2%) was classified as a prognosis study: although patients with dyspepsia underwent 2 tests (whole blood serology and endoscopy) to determine the frequency of gastroesophageal cancer26, the investigation did not include a direct comparison of these diagnostic tests for detecting gastroesophageal cancer.
We then re-calculated the proportion of articles that the sensitive CQ diagnostic search filter retrieved by excluding these 22 non-diagnosis articles from the sample, giving a rate of 99% (417/419). The missing 1% represents 2 articles27, 28, from separate systematic reviews14, 15, that were missed by the sensitive search filter. This proportion of articles was consistent with the performance characteristics of the sensitive diagnostic search filter in MEDLINE (sensitivity 98.6%, specificity 74.3%)6 and EMBASE (sensitivity 100%, specificity 70.4%)29.
We assessed the impact of these 2 articles by excluding their findings from their reviews to see if this would affect the conclusions of the reviews. The first missed diagnosis article (Kiilholma et al27) was one of 121 included studies of a systematic review by Martin et al14, which compared ≥2 diagnostic techniques with a gold standard (multichannel urodynamics) for diagnosing urinary incontinence. We found that the removal of this study from the systematic review had no impact on the results because the study was not used in any pooled analyses or described in the Results section for the outcomes.
The second non-retrieved diagnosis study (Dahele et al28) was one of 34 included studies of a systematic review that compared the performance of the endomysial antibody (EMA) test with two types of tTG antibody tests (i.e., human recombinant [hr] and guinea pig [gp]) to make recommendations for the most appropriate screening test for celiac disease. The sensitivities and specificities of the 34 studies were pooled in a meta-analysis. We re-calculated the meta-analysis of the included studies to determine if the removal of the study by Dahele et al would affect the overall results (using Meta-DiSc, version 1.4). We found an absolute increase of 0.3% in the pooled sensitivity for the tTG-Ab diagnostic test when the study by Dahele et al was removed, but no difference between the two sets of pooled specificities (Table 4). A similar absolute increase (0.2%) was found for the pooled sensitivities of the EMA diagnostic test when the study by Dahele et al was removed, again with no difference between the 2 sets of pooled specificities (Table 4). Because this change was so small and within the span of the 95% CIs (Table 4), we conclude that the review was not substantively affected by the exclusion of this single study.
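The leave-one-out logic of this sensitivity check can be sketched as follows. This is a simplified illustration using straightforward count-pooling over hypothetical 2×2 data (the function names and counts are invented for illustration; Meta-DiSc, which the review used, offers more sophisticated pooling models):

```python
# Leave-one-out check on pooled sensitivity and specificity,
# using simple count-pooling over hypothetical 2x2 tables.
def pooled_sens_spec(studies):
    """studies: list of (tp, fp, fn, tn) tuples; returns (sens, spec)."""
    tp = sum(s[0] for s in studies)
    fp = sum(s[1] for s in studies)
    fn = sum(s[2] for s in studies)
    tn = sum(s[3] for s in studies)
    return tp / (tp + fn), tn / (tn + fp)

def leave_one_out(studies, index):
    """Pooled estimates with the study at `index` removed."""
    subset = [s for i, s in enumerate(studies) if i != index]
    return pooled_sens_spec(subset)

# Hypothetical counts for illustration only (not the review's data)
studies = [(90, 5, 10, 95), (45, 3, 5, 47), (30, 2, 8, 40)]
print(pooled_sens_spec(studies))   # pooled estimates, all studies
print(leave_one_out(studies, 2))   # pooled estimates without study 3
```

Comparing the two outputs shows how much a single study shifts the pooled estimates; in the review, the shift attributable to the Dahele et al study was at most 0.3% in absolute terms.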
We reviewed these 2 articles, both of which include diagnostic accuracy data, to determine why they were not retrieved by the sensitive CQ diagnostic search filters. Dahele et al28 included no terms or phrases related to diagnosis or diagnostic accuracy testing in the title, abstract, or indexing terms in MEDLINE, and the article is not indexed in EMBASE. Our request to the National Library of Medicine to have the MEDLINE indexing reviewed resulted in re-indexing; this article is now retrieved by the sensitive CQ diagnostic filter. The article by Kiilholma et al27 has no diagnostic information in its EMBASE record. Its MEDLINE record includes the subheading ultrasonography, defined as the use of ultrasonography in the diagnosis of diseases; this subheading is not included in the sensitive CQ diagnostic filter.
We next sought to replicate the search filters of the systematic reviews to determine if the use of the sensitive CQ diagnostic search filter may have saved time during screening of studies after searching. We received a response from 7 authors (58%), but only 5 provided a detailed search filter for MEDLINE12, 13, 18, 21, 22. We included a 6th systematic review in our analysis because it provided the exact search filter used in the Ovid Technologies interface within the manuscript14. Using the Ovid interface in MEDLINE, we reproduced the search filters of these 6 systematic reviews by entering all search terms within the publication date parameters provided. Before testing, 3 of the 6 search filters required consensus among 3 investigators to clarify their interpretation with respect to Boolean operator placement and translation between the PubMed and Ovid interfaces. After the searches were entered, we applied the sensitive CQ diagnostic search filter to each search yield. Overall, 5 of the 6 systematic reviews showed a 35% to 63% reduction in the number of articles that would have to be assessed for relevance compared with retrievals without the MEDLINE sensitive CQ diagnostic search filter. In the 6th systematic review, the number of articles retrieved remained the same after applying the sensitive CQ diagnostic search filter.
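The screening-reduction figure for each review is simply the proportion of retrieved records eliminated when the CQ filter is combined (ANDed) with the review's own search. A minimal sketch, with invented record counts for illustration (the reviews in our sample fell in the 35% to 63% range):

```python
# Percent fewer records to screen after applying the CQ filter
# to a review's own search yield. Counts below are hypothetical.
def screening_reduction(unfiltered, filtered):
    """unfiltered: records from the review's search alone;
    filtered: records after ANDing the sensitive CQ filter."""
    return 100 * (unfiltered - filtered) / unfiltered

print(screening_reduction(2000, 900))  # 55.0 (a 55% reduction)
```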
We showed that the sensitive CQ diagnostic search filter performed extremely well, capturing 99% of included articles indexed in MEDLINE and EMBASE from our sample of 12 diagnostic systematic reviews. The original capture estimate of 95% improved once we determined that 92% of the 24 missed articles (22/24) were not about diagnosis. A large proportion of missed articles that were not about diagnosis occurred in reviews that addressed both diagnosis and treatment questions. Other explanations for non-retrieved articles include inconsistencies in indexing: 1 of the 2 non-retrieved diagnosis articles was missed because of incomplete indexing and has now been re-indexed. The second article missed by the sensitive CQ filter can be attributed to the operating characteristics of the filter itself, which does not include the diagnostic subheading ultrasonography. Further, the single missed diagnostic test study that was part of a pooled analysis did not have a substantive effect on the review conclusion.
Others have investigated the usefulness of search filters to identify diagnostic accuracy studies30, 31. Both of these studies concluded that search filters do not perform well enough to warrant their use in finding articles for systematic reviews. However, these studies differ in a number of ways from ours. First, neither examined the non-retrieved articles to determine whether they met the definition of diagnostic accuracy studies (including a comparison of at least 2 diagnostic tests) or whether these studies would have materially affected the conclusions of the reviews in which they appeared. Second, both confined their searching to MEDLINE. Restricting our analysis to articles indexed in MEDLINE, we observed 89% retrieval for our sensitive filter, without adjustment for studies that did not meet the definition of diagnostic accuracy; this is consistent with the findings of Leeflang et al (87% retrieval for our sensitive filter)30. Third, neither considered ways that the search filter results could be extended, for example by examining references in the appropriately retrieved studies. (We did not consider this either, as our retrieval was very high; however, examining references in retrieved studies would be a logical extension of their findings that could have tempered their negative conclusions.) Ritchie et al31 also included only 1 systematic review and used content terms that may have limited the retrieval of the search filters they tested. Leeflang et al30 also reported considerable variability in results from study to study, which could explain the differences in findings of the 3 investigations.
Our study aimed to evaluate the performance of an empirically derived search filter as a tool for retrieving articles for diagnostic accuracy systematic reviews. We used empirically validated search filters for both MEDLINE and EMBASE and examined in detail the original studies that were not retrieved by these filters, determining that most of the non-retrieved studies did not report comparisons of 2 or more tests, and that the 2 studies that did report such comparisons did not substantively alter the conclusions of the reviews in which they were reported. Furthermore, the confirmation that incorrect indexing was the reason that 1 of these 2 studies was not retrieved by our diagnostic search filter strengthens the interpretation of our findings. The search filters we used are readily available and can be used in the Ovid, PubMed, and EBSCO (CINAHL) interfaces. Our results show promise for clinicians and researchers conducting systematic reviews because the sensitive search filter is easily used and retrieved all key articles in MEDLINE and EMBASE. Additional searches would be needed for other databases and for studies reported only in abstract form in conference proceedings.
As for any search filter in MEDLINE or EMBASE, the retrieval of the sensitive filter includes many off-target (“false positive”) articles that need to be assessed and eliminated. We set out to test the efficiency of the sensitive CQ diagnostic search filter by comparing it with the search filters used in the 12 systematic reviews, but only 1 review provided the exact search strategy used14, and only 5 authors provided a detailed strategy for MEDLINE on request. We were unable to replicate the searches exactly, partly because of errors in the search filters, lack of detail concerning Boolean operator placement, or uncertainties about translation for the Ovid interface. This finding is consistent with previous reports that errors in search filters are frequently revealed when the strategy is provided in enough detail to attempt replication32. It also raises an important question: to what extent can systematic reviews be replicated?
Our study has some additional limitations. First, our findings are based on a sample of systematic reviews from ACP Journal Club. We chose ACP Journal Club as the source for reviews because it includes only higher quality systematic reviews from a limited but important set of general healthcare and specialty journals. Second, we looked only in MEDLINE and EMBASE for primary articles included in the reviews; we did not look for articles that were not indexed in MEDLINE or EMBASE but might be found in other databases (such as BIOSIS, PsycINFO, and so on) or the grey literature, which are considered important in the systematic review process. However, over 97% of the articles in the reviews we studied were indexed by MEDLINE, EMBASE, or both. The Cochrane Handbook for Diagnostic Test Accuracy Reviews does not recommend the use of search filters because “studies examining the accuracy of diagnostic tests are not well indexed in the electronic bibliographic databases, such as MEDLINE”33. Our search filter would appear to circumvent this limitation. However, we are not suggesting that use of our sensitive filter would obviate the need for additional searching for studies as part of the process of conducting a systematic review of diagnostic test accuracy studies. Rather, we believe that it is a useful tool for beginning such searching and, if verified in other studies, may suffice for the MEDLINE and EMBASE database searches. It should be noted that the appropriate content terms (typically terms related to the clinical problem or diagnostic test of interest) must be added to the CQ filters. Further studies are needed to assess the reproducibility of searches reported in systematic reviews, the relative precision of search filters, and the performance of search filters in other databases and for other purposes, such as reviews of therapy, prognosis, and etiology.