Objective: The ability to accurately identify articles about therapy in large bibliographic databases such as EMBASE is important for researchers and clinicians. Our study aimed to develop optimal search strategies for detecting sound treatment studies in EMBASE in the year 2000.
Methods: Hand searches of journals were compared with retrievals from EMBASE for candidate search strategies. Six trained research assistants reviewed fifty-five journals indexed in EMBASE and rated articles using purpose and quality indicators. Candidate search strategies were developed for identifying treatment articles and then tested, and the retrievals were compared with the hand-search data. The operating characteristics of the strategies were calculated.
Results: Three thousand eight hundred fifty articles were original studies on treatment, of which 1,256 (32.6%) were methodologically sound. Among combinations of search terms, the strategy with the best sensitivity (random:.tw. OR clinical trial:.mp. OR exp health care quality) achieved a sensitivity of 98.9% and a specificity of 72.0%. The strategy with the best specificity (double-blind:.mp. OR placebo:.tw. OR blind:.tw.) achieved a specificity over 96.0%, but with sensitivity reduced to 51.7%. A 3-term strategy achieved the best optimization of sensitivity and specificity (random:.tw. OR placebo:.mp. OR double-blind:.tw.), with both values over 92.0%.
Conclusion: Search strategies can achieve high performance for retrieving sound treatment studies in EMBASE.
The ongoing growth of the medical literature, with an estimated publication rate of over two million new articles per year, makes keeping up to date with the latest health care knowledge complex and labor intensive for researchers and impractical for busy clinicians [2, 3]. Nevertheless, determining the current state of knowledge is essential to both.
For questions of treatment, the randomized controlled trial is considered the most valid design for clinical intervention studies. Articles about therapy make up the largest proportion of clinically important articles in primary health care journals. Furthermore, the rising emphasis on evidence-based practice accentuates the need for simple, fast, reliable, and inexpensive ways of retrieving evidence that is both relevant to clinical practice and scientifically sound.
Increasingly, clinicians are performing their own searches and turning to large online biomedical literature databases to find information to support evidence-based decision making, such as choosing treatment options [6–8]. The success of a search depends heavily on the choice of database. MEDLINE is often searched as the first-choice database because it provides free access to broad coverage of the biomedical literature, including nursing, dentistry, communication disorders, paramedical professions, population and reproductive biology, and clinical and experimental medicine. EMBASE, by comparison, is not free but complements MEDLINE in several ways, providing greater coverage of European and non-English-language publications and broader coverage of key topics such as pharmaceuticals, psychiatry, toxicology, and alternative medicine. The estimated overlap between MEDLINE and EMBASE is only 30% to 50% [11–14]. Searchers comparing the two databases have concluded that relevant information would inevitably be missed if only one of them were searched [6, 11, 15]. Those wanting the most comprehensive coverage, particularly of drugs and other therapeutic regimens, would likely benefit from searching EMBASE in conjunction with MEDLINE.
Sifting through the vast amount of information in large, general bibliographic databases to locate the best health care evidence can be taxing, especially when most articles in these databases are clinically irrelevant, are ambiguous, or have poor methodologic quality, and when clinicians feel uncertain about their search skills. One solution for improving the accuracy of retrieval of studies of various designs and contexts in online bibliographic databases is the use of “hedges” or filters, including search strategies consisting of index terms and textwords. Although search strategies are not a completely satisfactory solution (for example, precision generally remains low in large, multipurpose databases containing a low concentration of relevant articles), they do help to narrow the search to a clinically important and sound subset of articles and have been described as influential and fundamental to information retrieval for evidence-based practice [17, 18].
Methodologic search strategies have been developed to improve the accuracy of retrieving treatment studies in MEDLINE [19–24]. Search strategies developed for MEDLINE cannot be directly translated for use in other databases such as EMBASE, because indexing practices vary and equivalent thesaurus terms do not necessarily exist across databases. Empirical studies on developing search strategies in EMBASE are scarce; we identified only one study that developed search strategies for retrieving diagnostic studies in EMBASE. To our knowledge, no empirical studies have been done to develop search strategies for detecting treatment studies in EMBASE.
In the 1990s, we developed MEDLINE search strategies on a small subset of 10 journals for articles pertaining to therapy, diagnosis, prognosis, and causation [26, 27]. We expanded and updated our work using data from 161 journals indexed in MEDLINE for the publishing year 2000 [28–33]. These strategies have been adapted for use in the Clinical Queries interface of MEDLINE <http://www.ncbi.nlm.nih.gov/entrez/query/static/clinical.html> and in other searching arenas.
Extending our earlier work, we report here the retrieval properties of single and combined terms for identifying methodologically sound studies on the prevention and treatment of health disorders in EMBASE. Although we acknowledge that indexing and terminology can evolve over time, we confined our manual search to the year 2000, having previously established in MEDLINE that search strategies are robust across publication periods (1991 and 2000).
We compared the operating characteristics of methodologic search strategies in EMBASE with a manual review of each article in each issue of fifty-five journals (Table 1) for the publishing year 2000. To evaluate EMBASE strategies designed to retrieve treatment studies, index terms and textwords related to research design features were run as search strategies. We treated the search strategies as “diagnostic tests” for sound treatment studies and the manual review of the literature as the “gold standard.” The sensitivity (or “recall”), specificity, precision, and accuracy of EMBASE searches were determined as shown in Figure 1. Specifically, for each EMBASE search strategy designed to retrieve sound treatment studies, sensitivity was defined as the proportion of relevant, high-quality articles that were retrieved; specificity as the proportion of all other articles (low-quality treatment studies and nontreatment articles) that were not retrieved; precision as the proportion of retrieved articles that were relevant and high quality; and accuracy as the proportion of all articles that were correctly classified.
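The four definitions above correspond to the cells of a 2 × 2 table that cross-classifies each article by filter result (retrieved or not) and hand-search rating (sound treatment study or not). The following minimal sketch computes them; the function name and the cell labels a to d are our illustrative choices, and the counts in the usage line are hypothetical, not study data.

```python
from dataclasses import dataclass


@dataclass
class OperatingCharacteristics:
    sensitivity: float
    specificity: float
    precision: float
    accuracy: float


def operating_characteristics(a: int, b: int, c: int, d: int) -> OperatingCharacteristics:
    """Filter performance from a 2 x 2 table against the hand-search gold standard.

    a: sound treatment studies retrieved (true positives)
    b: other articles retrieved (false positives)
    c: sound treatment studies missed (false negatives)
    d: other articles not retrieved (true negatives)
    """
    return OperatingCharacteristics(
        sensitivity=a / (a + c),             # share of sound studies that were retrieved
        specificity=d / (b + d),             # share of all other articles left out
        precision=a / (a + b),               # share of retrievals that are sound studies
        accuracy=(a + d) / (a + b + c + d),  # share of all articles correctly classified
    )


# Hypothetical counts, for illustration only.
result = operating_characteristics(a=90, b=60, c=10, d=840)
print(result)
```

Note that specificity and accuracy share the large "other articles" cell d, which is why they tend to move together when sound studies are rare.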
Six research assistants assessed all articles in each issue of the 55 journals for the year 2000. For articles in 7 purpose categories (causation, prognosis, diagnosis, treatment, economics, clinical prediction, and reviews), methodologic criteria were applied to determine if the article was scientifically sound. We used purpose category definitions to classify qualitative and cost studies but did not apply methodologic criteria to these types of studies. Purpose category definitions and corresponding methodologic rigor criteria have previously been published. Original articles (of interest to the health care of humans) pertaining to the prevention and treatment of diseases and health disorders were required to meet these methodologic criteria: random allocation of participants to comparison groups, outcome assessment of at least 80% of participants entering the investigation in 1 major analysis at any given follow-up assessment, and analysis consistent with study design. Research staff underwent training and intensive calibration before reviewing the 2000 literature, and interrater reliability (measured by the kappa statistic) for application of all methodologic criteria exceeded 80% beyond chance for all purpose categories.
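The kappa statistic mentioned above measures agreement between raters after discounting the agreement expected by chance, so a kappa above 0.80 corresponds to the "80% beyond chance" threshold. A minimal sketch of Cohen's kappa for two raters follows; the ratings are hypothetical, and the text does not specify whether a two-rater or multi-rater variant was used.

```python
from collections import Counter


def cohens_kappa(rater1: list[str], rater2: list[str]) -> float:
    """Cohen's kappa for two raters classifying the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("raters must classify the same, non-empty set of items")
    n = len(rater1)
    # Observed agreement: share of items both raters classified identically.
    observed = sum(x == y for x, y in zip(rater1, rater2)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    counts1, counts2 = Counter(rater1), Counter(rater2)
    expected = sum(counts1[k] * counts2[k] for k in counts1) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of 10 articles as meeting (or not) the methodologic criteria.
rater1 = ["sound"] * 4 + ["not sound"] * 6
rater2 = ["sound"] * 3 + ["not sound"] * 7
kappa = cohens_kappa(rater1, rater2)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 90% of articles, but 54% agreement would be expected by chance alone, so kappa is about 0.78, just below the 0.80 threshold.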
The journals were selected using an iterative process based on recommendations of clinicians and librarians, Science Citation Index impact factors provided by the Institute for Scientific Information, and the journals' ongoing yield of studies and reviews of scientific merit and clinical relevance for the disciplines of internal medicine, general medical practice, mental health, and general nursing practice (full list of journals available upon request from authors). Search strategies were developed on a 55-journal subset (Table 1), selected from the 135 EMBASE clinical journals initially reviewed because of their ongoing yield of sound and clinically relevant articles. We had previously developed search strategies in MEDLINE using 161 journals indexed in MEDLINE [28–33] but found that the strategies remained robust when developed in smaller journal subsets, which substantially decreased computation time. We also found that, when strategies were developed in 60% of the database and validated in the remaining 40%, there were no statistically significant differences in performance. Thus, we developed search strategies for EMBASE using all data from the 55 journals.
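The 60%/40% check described above, from the earlier MEDLINE work, can be sketched as a random split of article records into a derivation set (where strategies are built) and a validation set (where their performance is re-measured). The seeded shuffle and the function name `split_articles` are our assumptions; the text does not say how the split was drawn.

```python
import random


def split_articles(
    article_ids: list[str], derivation_share: float = 0.6, seed: int = 0
) -> tuple[list[str], list[str]]:
    """Randomly split article IDs into a derivation set and a validation set."""
    shuffled = list(article_ids)           # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # seeded generator gives a reproducible split
    cut = round(len(shuffled) * derivation_share)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical article identifiers, for illustration only.
ids = [f"article-{i}" for i in range(1000)]
derivation, validation = split_articles(ids)
print(len(derivation), len(validation))  # 600 400
```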
To construct a comprehensive set of search terms, we compiled an initial list of index terms and textwords and then sought input from clinicians and librarians in the United States and Canada through interviews with known searchers and requests at meetings and conferences. Individuals were asked which terms or phrases they used when searching for studies of treatment, causation, prognosis, diagnosis, economics, clinical prediction guides, reviews, costs, and qualitative studies. We compiled a list of 5,385 terms, of which 4,843 were unique and 3,524 returned results (list of tested terms available on request from authors). For example, tested search terms included the textwords “random,” “randomized trial,” “efficacy,” and “intention to treat”; the index term “clinical trial”; and the index term “treatment outcome,” exploded (an exploded search term also retrieves records indexed with the more specific, narrower terms beneath it).
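The effect of exploding an index term can be illustrated with a toy thesaurus. In this sketch, the thesaurus is a hypothetical mapping from each term to its immediate narrower terms (the narrower-term names are placeholders, not actual EMBASE thesaurus headings), and the record index is a hypothetical inverted index from term to record identifiers. It shows the retrieval logic only, not how EMBASE implements explosion internally.

```python
from collections import deque

# Toy thesaurus fragment: each term maps to its immediate narrower terms.
# The narrower-term names are placeholders, not real thesaurus headings.
NARROWER = {
    "treatment outcome": ["narrower term A", "narrower term B"],
    "narrower term A": ["narrower term A1"],
}


def explode(term: str) -> set[str]:
    """Return the term plus every narrower term beneath it (breadth-first)."""
    found, queue = {term}, deque([term])
    while queue:
        for child in NARROWER.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def search(index: dict[str, set[str]], term: str, exploded: bool = False) -> set[str]:
    """Records indexed with the term, or with any narrower term when exploded."""
    terms = explode(term) if exploded else {term}
    return set().union(*(index.get(t, set()) for t in terms))


# Hypothetical inverted index: term -> IDs of records indexed with it.
index = {
    "treatment outcome": {"r1"},
    "narrower term A1": {"r2"},
    "narrower term B": {"r3"},
    "unrelated term": {"r4"},
}
print(sorted(search(index, "treatment outcome")))                 # ['r1']
print(sorted(search(index, "treatment outcome", exploded=True)))  # ['r1', 'r2', 'r3']
```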
Individual search terms with sensitivity greater than 25% and specificity greater than 75% for a given purpose category were incorporated into the development of search strategies that combined 2 or more terms. All combinations of terms used the Boolean “OR,” for example, “random OR controlled.” The Boolean “AND” was not used because it limits retrieval to citations matching every term and therefore invariably compromised sensitivity. For the development of multiple-term search strategies to optimize either sensitivity or specificity, we tested all 2-term search strategies with sensitivity of at least 75% and specificity of at least 50%. For optimizing accuracy, 2-term search strategies with accuracy greater than 75% were considered for multiple-term development. Seven thousand one hundred sixty-four search strategies were tested in the development of treatment search filters.
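The screening and combination steps above can be sketched by treating each term's retrievals as a set of article IDs, so that Boolean “OR” becomes set union. This is a minimal sketch on toy data; the function names and the reading of the 2-term thresholds as a filter on candidate pairs are our assumptions, not the authors' software.

```python
from itertools import combinations


def performance(retrieved: set[str], sound: set[str], total: int) -> tuple[float, float]:
    """Sensitivity and specificity of one retrieval set against the hand-search standard."""
    true_pos = len(retrieved & sound)
    false_pos = len(retrieved) - true_pos
    return true_pos / len(sound), 1 - false_pos / (total - len(sound))


def screen_single_terms(hits: dict[str, set[str]], sound: set[str], total: int) -> dict[str, set[str]]:
    """Keep single terms with sensitivity > 25% and specificity > 75%."""
    kept = {}
    for term, retrieved in hits.items():
        sens, spec = performance(retrieved, sound, total)
        if sens > 0.25 and spec > 0.75:
            kept[term] = retrieved
    return kept


def two_term_or(hits: dict[str, set[str]], sound: set[str], total: int) -> list[tuple[str, float, float]]:
    """OR each pair of candidates (set union); keep pairs with sensitivity >= 75% and specificity >= 50%."""
    results = []
    for (t1, r1), (t2, r2) in combinations(hits.items(), 2):
        sens, spec = performance(r1 | r2, sound, total)
        if sens >= 0.75 and spec >= 0.50:
            results.append((f"{t1} OR {t2}", sens, spec))
    return results


# Toy data: 10 articles, 4 of them sound treatment studies (a1 to a4).
sound = {"a1", "a2", "a3", "a4"}
hits = {
    "random": {"a1", "a2", "a3", "x1"},
    "placebo": {"a2", "a4"},
    "controlled": {"a1", "x1", "x2", "x3"},
}
candidates = screen_single_terms(hits, sound, total=10)
pairs = two_term_or(candidates, sound, total=10)
print(sorted(candidates))
print(pairs)
```

In this toy run “controlled” is dropped at the single-term stage, and “random OR placebo” reaches 100% sensitivity while keeping specificity above 83%.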
Logistic regression approaches to developing search strategies were used when deriving treatment and prognostic hedges for MEDLINE. These logistic regression approaches, compared with the Boolean approach described above, did not improve search strategy performance. Hence, for the other purpose categories and databases including EMBASE, only the Boolean approach was used for search strategy development.
Indexing information was downloaded from EMBASE for 27,769 articles from the 55 journals from 2000 that were hand-searched. Of these, 3,850 were classified as original articles of treatment, of which 1,256 (32.6%) were methodologically sound. Search strategies were developed using all 27,769 articles. Thus, the strategies were tested for their ability to retrieve high-quality treatment articles from all other articles, including both poor-quality treatment studies and all nontreatment studies.
Table 2 shows the single terms with the best sensitivity, best specificity, and best optimization of sensitivity and specificity for detecting sound treatment studies in EMBASE in 2000. The term, “random:.mp.,” achieved the best sensitivity at 95.1%; even with sensitivity maximized, specificity was high at 92.5%. The single term, “randomized.tw.,” achieved the best specificity at 96.7%, albeit with a clear but expected reduction in sensitivity (63.2%). With specificity maximized, precision was increased to just over 47% (an absolute increase in precision of about 10% compared with the best sensitivity single term). The single term, “clinical trial:.mp.,” achieved the best balance of sensitivity (88.3%) and specificity (88.0%).
The operating characteristics of top performing combination strategies are shown in Table 3. The 3-term strategy “random:.tw. OR clinical trial:.mp. OR exp health care quality” yielded the best sensitivity (almost 99%) and had a specificity of 72.0%. Compared with the single term with the best sensitivity, “random:.mp.” (95.1% sensitivity, 92.5% specificity, and 37.6% precision), the 3-term strategy with the best sensitivity achieved an absolute gain in sensitivity of only 3.8% but with substantial absolute losses in specificity (20.5%) and precision (23.3%).
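Because precision depends on how many non-target articles a filter lets through, it can be back-calculated from a filter's sensitivity and specificity together with the article counts reported above (1,256 sound treatment articles among 27,769). The sketch below does this with the rounded percentages, so small discrepancies from the reported values are expected; the function name `precision_from` is our own.

```python
SOUND = 1_256         # methodologically sound treatment articles (hand-search standard)
TOTAL = 27_769        # all hand-searched articles with EMBASE records
OTHER = TOTAL - SOUND  # poor-quality treatment studies plus all nontreatment articles


def precision_from(sensitivity: float, specificity: float) -> float:
    """Precision implied by a filter's sensitivity and specificity in this article set."""
    true_pos = sensitivity * SOUND
    false_pos = (1 - specificity) * OTHER
    return true_pos / (true_pos + false_pos)


for label, sens, spec in [
    ("random:.mp.", 0.951, 0.925),
    ("random:.tw. OR clinical trial:.mp. OR exp health care quality", 0.989, 0.720),
]:
    print(f"{label}: precision ~ {precision_from(sens, spec):.1%}")
```

With these inputs it gives about 37.5% for “random:.mp.” (reported as 37.6%) and about 14.3% for the best-sensitivity 3-term strategy, consistent with the reported absolute loss in precision of 23.3%. The 28% of non-target articles let through by a specificity of 72.0% outnumber the sound studies retrieved by roughly six to one.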
For the 3-term strategy with the best sensitivity, replacing the term, “exp health care quality,” with the term, “exp treatment outcome,” led to an absolute increase in specificity of 5.8% (72.0% to 77.8%), with only a small 0.2% decrease in sensitivity (98.9% to 98.7%) (Table 3).
The 3-term strategy “double-blind:.mp. OR placebo:.tw. OR blind:.tw.” yielded the best specificity at 96.7%, but with a definite trade-off in sensitivity, which fell to 51.7%. With specificity maximized, precision rose markedly to 42.8%, an absolute increase of 28.5% compared with the 3-term strategy with the best sensitivity. When search terms were combined to optimize sensitivity and specificity, both values exceeded 92%. Accuracy was driven by specificity, because the large majority of articles were not sound treatment studies; for all top-performing terms, accuracy and specificity were very similar.
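Why accuracy tracks specificity can be seen by writing accuracy as a prevalence-weighted average of sensitivity and specificity, with prevalence π taken from the 1,256 sound treatment articles among the 27,769 articles. Using the rounded values reported for the best-specificity 3-term strategy, this is a back-of-the-envelope check rather than a figure from Table 3:

```latex
\text{accuracy} = \pi \cdot \text{sensitivity} + (1 - \pi) \cdot \text{specificity},
\qquad \pi = \frac{1256}{27769} \approx 0.045

\text{accuracy} \approx 0.045 \times 0.517 + 0.955 \times 0.967 \approx 0.947
```

Because π is small, the specificity term dominates, so accuracy stays within a few percentage points of specificity even when sensitivity is only about half.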
We have developed search strategies that can assist clinicians and researchers in retrieving methodologically sound treatment studies. The choice of search strategy should weigh the trade-off between sensitivity and specificity that best fits the purpose of the search. For example, if the purpose is to identify as complete a set as possible of relevant randomized controlled trials for a systematic review, a high-sensitivity strategy would be most suitable. For maximum coverage, broader but more labor-intensive approaches might also be considered, such as searching more than one database (e.g., EMBASE plus MEDLINE), hand-searching bibliographies of relevant articles and key journals, and contacting experts and pharmaceutical companies. On the other hand, if the purpose is to efficiently retrieve several key articles, a high-specificity strategy that reduces the number of nuisance hits might be suitable.
Our 3-term results in Table 3 show that, by using the best specificity instead of the best sensitivity strategy, sensitivity drops by an absolute 47.2%, from 98.9% (most sensitive strategy) to 51.7% (most specific strategy), meaning that almost 1 of every 2 clinically relevant and sound articles would be missed. Bachmann and colleagues have questioned whether the articles retrieved in searches are necessarily a random selection of those available. Therefore, using the most specific strategy (even though it has greater precision), which lacks good sensitivity, can be risky, because a biased representation of the knowledge base, drawn from only half of the available evidence, cannot be excluded. The strategy that best optimizes sensitivity and specificity has better precision than the most sensitive strategy, with a much smaller loss of sensitivity than the best specificity strategy. For searches of broad content areas with anticipated high citation yields that still require completeness, the best optimization strategy might be the most sensible option.
For example, suppose a search is done in EMBASE to identify articles on the effectiveness of therapy with herbal medicine. The search might begin with the single content term “herbal medicine,” which would give an enormous yield of 5,696 articles (Table 4). Combining this initial search with our combination strategy with the best sensitivity reduces the yield more than five-fold to 1,115 articles, but this is still a cumbersome subset. Nevertheless, the 1,115 retrievals might be worth sifting through for the purpose of being as inclusive as possible. Alternatively, combining the initial search with our combination strategy with the best specificity reduces the yield to a far more manageable 238 articles (about one-fifth the yield of the most sensitive strategy), although some important articles may not be detected. Finally, combining the initial search with our combination strategy with the best optimization of sensitivity and specificity yields an intermediate number of 427 hits. A quick scan of the 238 articles detected by the best specificity strategy reveals several articles on the effectiveness of herbal therapies (such as ginkgo, St. John's wort, ginseng, echinacea, saw palmetto, kava, and Chinese herbal medicine) that have been included in ACP Journal Club [37–39]. Because ACP Journal Club includes only articles that meet basic criteria for clinical relevance and scientific merit, this search identified at least several relevant and sound articles.
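The herbal medicine example amounts to combining a content term with one of the ORed filters using “AND”; AND appears only between the content search and the filter, never within the filter. The sketch below only assembles the query text and recomputes the yield reductions from Table 4; it does not run a search. The one-line, parenthesized query form is an illustrative assumption (an actual EMBASE interface may expect terms on numbered search lines that are then combined), and the names `STRATEGIES`, `YIELDS`, and `filtered_query` are hypothetical.

```python
# The three 3-term filters reported above, as printed (field suffixes and ":" truncation included).
STRATEGIES = {
    "best sensitivity": ["random:.tw.", "clinical trial:.mp.", "exp health care quality"],
    "best specificity": ["double-blind:.mp.", "placebo:.tw.", "blind:.tw."],
    "best optimization": ["random:.tw.", "placebo:.mp.", "double-blind:.tw."],
}

# Yields reported for the "herbal medicine" example (Table 4).
YIELDS = {
    "content only": 5_696,
    "best sensitivity": 1_115,
    "best optimization": 427,
    "best specificity": 238,
}


def filtered_query(content: str, strategy: str) -> str:
    """AND a content search with one methodologic filter whose terms are ORed together."""
    hedge = " OR ".join(STRATEGIES[strategy])
    return f"({content}) AND ({hedge})"


print(filtered_query("herbal medicine", "best specificity"))
for name in ("best sensitivity", "best optimization", "best specificity"):
    ratio = YIELDS["content only"] / YIELDS[name]
    print(f"{name}: {YIELDS[name]} hits, {ratio:.1f}-fold fewer than the content search alone")
```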
Although we developed our filters to retrieve treatment studies meeting certain criteria for methodologic rigor, this does not guarantee that the retrieved articles were conducted to the best methodologic standards. Ultimately, the end user is responsible for appraising the retrieved literature for quality and relevance before applying it to clinical practice. The quality of randomized controlled trials accepted for publication, for example, differs by journal, which may be attributable to variations in the rigor of the peer-review process and in the instructions given to authors and reviewers. Journal editors, by instructing authors to report their methodology fully, can help improve the accuracy of indexing in bibliographic databases and facilitate readers' appraisal of study quality.
Dissimilarities in bibliographic databases (e.g., coverage of different sets of journals, variability in indexing practices and thesaurus terms, and use of different search engines) make direct comparisons of our top-performing treatment filters for EMBASE with those for MEDLINE not meaningful. For example, one of the terms in the most sensitive combination strategy, “health care quality,” is a subject heading in EMBASE that is not supported as an index term in MEDLINE. Therefore, direct testing of this EMBASE strategy in the MEDLINE database is not possible. Conceivably, optimal filters in one database may not be top-performing filters in another database; hence, filters need to be developed for specific use in the intended database.
Search strategies are a helpful but imperfect solution to accurate literature retrieval in large online bibliographic databases. Even our top-performing strategies have generally low precision because EMBASE is such a large and broad-ranging database. Because precision in large multipurpose databases is inevitably low given the small concentration of relevant articles, it is especially important that the terminology used in therapeutic studies (e.g., methodologic terminology) be as accurate, explicit, and consistent as possible, to facilitate indexing and improve the success of a search. The thoroughness of indexing varies in bibliographic databases. Other factors that can affect the success of a search are the formulation of a well-defined clinical question, its translation into a searchable strategy, and the skill of the searcher.
Our study shows that the retrieval of methodologically sound articles on therapy and prevention in EMBASE can be enhanced by the use of several search filters. Although beneficial filters exist, such as those reported here, further work is needed to improve dissemination and publication of filters so that clinicians, researchers, and librarians are not only aware of them, but also have greater knowledge and proficiency in using them effectively.
Several search strategies can achieve high performance in retrieving methodologically sound studies on the prevention and treatment of diseases and health disorders in EMBASE. The optimal trade-off between sensitivity and specificity should be assessed based on the purpose of the search.
The Hedges Team includes Angela Eady, R. Brian Haynes, Susan Marks, Kathleen Ann McKibbon, Doug Morgan, Cindy Walker-Dilks, Stephen Walter, Stephen Werre, Nancy Wilczynski, and Sharon Wong, all in the Health Information Research Unit, Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada.
*Source of funding: National Library of Medicine, USA (grant no. 5R01LM06866-04).