We have documented search strategies that can help to discriminate relevant from nonrelevant articles for a number of categories of importance to those interested in HSR. Those who want all articles on a given topic and who have the time to sort out irrelevant ones will be best served by the most sensitive strategies. Those with little time who are looking for “a few good articles” on a given topic will likely be best served by the most specific strategies. Use of these strategies is straightforward, as the National Library of Medicine has translated our most sensitive and most specific strategies for public use at www.nlm.nih.gov/nichsr/hedges/search.html. The best strategies for optimizing the trade-off between sensitivity and specificity are shown in . When sensitivity was maximized, combinations of terms yielded considerably higher specificities than individual search terms did. For instance, in sensitive searches for high-quality appropriateness articles, combining terms raised specificity markedly, to near-perfect levels. When specificity was maximized for combinations of terms while keeping sensitivity at 50% or higher, we observed very high specificities for almost all HSR categories. Search performance, including the trade-off between sensitivity and specificity, was generally comparable to that found for topics of more direct interest to clinicians, such as treatment, diagnosis, prognosis and etiology.15 However, the methodologic standards for clinical topics are generally much higher, so the literature retrieved provides more robust answers.
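The two selection rules described above (maximize sensitivity; maximize specificity while keeping sensitivity at 50% or more) can be illustrated with a small sketch. Here sensitivity is the proportion of hand-search-relevant articles a strategy retrieves, and specificity is the proportion of nonrelevant articles it does not retrieve. The terms (term_a, term_b, term_c), the ten articles and the tie-breaking rule are all hypothetical assumptions, not the study's MEDLINE terms or data; strategies are treated as OR-combinations of terms.

```python
from itertools import combinations

# Hypothetical toy data, not from the study: each article has a hand-search
# judgement (True = relevant) and the set of placeholder search terms it matches.
ARTICLES = [
    (True, {"term_a", "term_b"}),
    (True, {"term_a"}),
    (True, {"term_c"}),
    (True, set()),          # relevant, but no term retrieves it
    (False, {"term_b"}),
    (False, {"term_c"}),
    (False, {"term_a"}),
    (False, set()),
    (False, set()),
    (False, set()),
]
TERMS = {"term_a", "term_b", "term_c"}


def evaluate(strategy):
    """Sensitivity, specificity and precision of an OR-combination of terms,
    measured against the hand-search judgements."""
    tp = fp = fn = tn = 0
    for relevant, matched in ARTICLES:
        retrieved = bool(matched & strategy)  # OR: any matching term retrieves it
        if relevant and retrieved:
            tp += 1
        elif relevant:
            fn += 1
        elif retrieved:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity, precision


def candidate_strategies(terms):
    """Every non-empty OR-combination of the candidate terms."""
    for size in range(1, len(terms) + 1):
        for combo in combinations(sorted(terms), size):
            yield frozenset(combo)


scored = [(s, *evaluate(s)) for s in candidate_strategies(TERMS)]

# Most sensitive: highest sensitivity, ties broken by specificity (an assumption).
most_sensitive = max(scored, key=lambda r: (r[1], r[2]))
# Most specific: highest specificity among strategies keeping sensitivity >= 50%.
most_specific = max((r for r in scored if r[1] >= 0.5), key=lambda r: (r[2], r[1]))

for label, (strategy, sens, spec, prec) in [
    ("most sensitive", most_sensitive),
    ("most specific", most_specific),
]:
    print(f"{label}: {' OR '.join(sorted(strategy))} "
          f"sens={sens:.2f} spec={spec:.2f} prec={prec:.2f}")
```

On this toy data the most sensitive strategy is term_a OR term_c (sensitivity 0.75, specificity 0.67), while the most specific is term_a alone (sensitivity 0.50, specificity 0.83). The relevant article that matches no term caps sensitivity below 100% whatever combination is chosen.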
Few search filters have been developed to retrieve HSR articles, and those that exist cover only a small range of topics. A pilot project in 2000 created preliminary search strategies for economics and qualitative research in the HSR literature (Allmang NA, Koonce TY. Health services research topic searches. Bethesda [MD]: National Library of Medicine; 2000. Unpublished report), but it lacked a gold standard against which to assess the quality of the searches. Search filters developed for the National Health Service Economic Evaluation Database,16 the Health Economic Evaluation Database17 and the London School of Economics (LSE) Strategy,18 all designed to retrieve economic evaluation articles, were compared with one another to generate a relative standard; this comparison gave estimates of sensitivity of 72% and specificity of 75% for the LSE strategy in MEDLINE.18 Our findings for economics articles appear somewhat better but are not directly comparable, because our gold standard was a hand search. Additional filters have been designed to retrieve articles on outcome measurement19 (only 3 strategies, based on hand searches of only 2 journals) and quality of care20 (for which only precision was measured).
Our study had some limitations. First, we could not identify reliable methodologic features for the HSR categories of appropriateness and cost that would help in retrieving the best studies. Second, the number of appropriateness articles in our database was small, so our estimates of search performance for that category are imprecise. Third, our database was not large enough to permit test–retest searches to validate the strategies. Fourth, we did not study the effect of combining research filters with content terms (such as a disease, technology or type of health service) and thus cannot report on the characteristics of such searches; such a study would have required considerably more resources than were available to us. Fifth, we tested only Ovid's search engine for MEDLINE; other search engines, including the National Library of Medicine's PubMed, may handle terms somewhat differently and so yield slightly different results.
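Why a small category gives imprecise estimates can be shown with a confidence interval for a proportion such as sensitivity. The sketch below is an illustration under stated assumptions: the counts are hypothetical (not the study's), and the Wilson score interval is one common choice for binomial proportions; the text does not say which interval method, if any, was used.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (e.g. sensitivity)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: the same observed sensitivity (80%) in a small and a
# large category of relevant articles.
for found, total in [(8, 10), (80, 100)]:
    lo, hi = wilson_interval(found, total)
    print(f"{found}/{total}: sensitivity 0.80, 95% CI {lo:.2f} to {hi:.2f}")
```

With the same observed sensitivity of 80%, the interval based on 10 articles spans roughly 0.49 to 0.94, whereas the one based on 100 articles spans roughly 0.71 to 0.87.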
The best search strategies found in our research leave some room for improvement. Better search performance may require maturation of research methods for HSR, similar to that seen in some forms of clinical research, as well as better indexing. Improvements may also be possible through more sophisticated search strategies, for example, with more search terms, use of other Boolean operators (“and,” “and not”), natural language processing and multivariate statistical techniques such as logistic regression and discriminant function analysis. In our limited experience with other Boolean operators and logistic regression for clinical topics such as diagnostic tests,21 we have observed trade-offs between sensitivity and specificity and no substantive improvements with more complex search strategies, but we have not attempted these approaches for HSR topics. We look forward to other researchers taking up the challenge of developing better search strategies for HSR.
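As a rough illustration of the logistic-regression idea mentioned above (which, as noted, the authors have not applied to HSR topics), the sketch below fits a model that predicts hand-search relevance from whether an article matches each of several candidate terms, then retrieves articles whose predicted probability meets a cutoff. Raising or lowering the cutoff trades specificity against sensitivity. The three terms, the ten articles and the fitting method (plain batch gradient ascent rather than a statistics package) are hypothetical assumptions for illustration only.

```python
import math

# Hypothetical data: each row flags whether an article matches three placeholder
# search terms (1 = match); LABELS holds the hand-search judgement (1 = relevant).
FEATURES = [
    [1, 1, 0], [1, 0, 0], [1, 0, 0], [1, 0, 1], [0, 1, 1],
    [0, 1, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0],
]
LABELS = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=1.0, iterations=10_000):
    """Fit an intercept and term weights by batch gradient ascent on the
    mean log-likelihood of a logistic regression model."""
    n, k = len(X), len(X[0])
    bias, weights = 0.0, [0.0] * k
    for _ in range(iterations):
        grad_b, grad_w = 0.0, [0.0] * k
        for row, target in zip(X, y):
            error = target - sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            grad_b += error
            for j, x in enumerate(row):
                grad_w[j] += error * x
        bias += lr * grad_b / n
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


def score(bias, weights, row):
    """Predicted probability that an article is relevant."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def retrieve(bias, weights, X, cutoff):
    """Indices of articles whose predicted relevance meets the cutoff."""
    return [i for i, row in enumerate(X) if score(bias, weights, row) >= cutoff]


bias, weights = fit_logistic(FEATURES, LABELS)
print("intercept", round(bias, 2), "weights", [round(w, 2) for w in weights])
for cutoff in (0.3, 0.5, 0.7):
    print(cutoff, retrieve(bias, weights, FEATURES, cutoff))
```

A lower cutoff always retrieves a superset of what a higher cutoff retrieves, so sweeping the cutoff traces out the same kind of sensitivity/specificity trade-off that the Boolean strategies show.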