We compared the retrieval performance of mental health content search terms in MEDLINE with a manual review (hand search) of each article for each issue of 29 journal titles for the year 2000. Overall, research staff hand searched 170 journal titles. These journals were chosen based on recommendations of clinicians and librarians, Science Citation Index Impact Factors provided by the Institute for Scientific Information, and ongoing assessment of their yield of studies and reviews of scientific merit and clinical relevance for the disciplines of internal medicine, general medical practice, mental health, and general nursing practice (list of journals provided by the authors upon request). Of these 170 hand searched journals, 161 were indexed in MEDLINE. Search strategies for the study we report here were developed using a 29-journal subset comprising the journals with the highest number of methodologically sound studies in the area of mental health, that is, those that contributed > 1 article to the journal Evidence-Based Mental Health (http://ebmh.bmjjournals.com) during the year 2000 (list of journals provided by the authors upon request).
We compiled a list of 3,395 index terms and textwords (list of terms tested provided by the authors upon request). This list was compiled after surveying 140 mental health specialists from around the world, reviewing the search strategies from 5 mental health-focused Cochrane groups, and mapping textwords to MeSH terms. Examples of the search terms tested are '(learn: adj problem)', 'schizoid', 'depression', and 'mania', all as textwords; 'phobic disorders' as an index term; and the index term 'aggression', exploded (i.e., a search term that automatically includes the narrower indexing terms beneath it in the MeSH hierarchy).
As part of a larger study [22], 6 trained, experienced research assistants read all issues of 170 journals for the publishing year 2000. Each article was rated using purpose and quality indicators and categorized into clinically relevant original studies, review articles, general papers, or case reports. The original and review articles were then categorized as 'pass' or 'fail' for methodologic rigor in the areas of therapy/quality improvement, diagnosis, prognosis, causation, economics, clinical prediction, and review articles. The research staff were rigorously calibrated before reviewing the journals and inter-rater agreement for identifying the format of articles (e.g., original study, review article) was 92% beyond chance (kappa statistic, 95% confidence interval (CI) 0.89 to 0.95). Inter-rater agreement for which articles met all scientific criteria (e.g., treatment study, diagnostic study) was 89% beyond chance (kappa statistic, CI 0.78 to 0.99) [22]. One research assistant then hand searched all articles in each issue of the 29 journal subset and indicated if the article was of interest to the area of mental health. The predetermined criteria for "of interest to mental health" were as follows:
Pharmacological interventions for persons with mental health problems; cognitive and behavioral approaches to helping any patient (e.g., cancer patients); etiology pertaining to mental health; diagnosis pertaining to mental health; or economic issues pertaining to mental health.
The proposed search strategies were treated as "diagnostic tests" for sound studies and the manual review (hand search) of the literature was treated as the "gold standard". We determined the sensitivity, specificity, precision, and accuracy of each single term and combinations of terms in MEDLINE using an automated process. Sensitivity for a given topic is defined as the proportion of high quality articles for that topic that are retrieved; specificity is the proportion of low quality articles not retrieved; precision is the proportion of retrieved articles that are of high quality; and accuracy is the proportion of all articles that are correctly classified.
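The four measures defined above can be illustrated with a minimal sketch. It assumes each search strategy is represented by the set of article identifiers it retrieves, and that the "positive" class is whatever the hand search flagged. The function name `diagnostic_metrics` and the toy identifiers are hypothetical and are not part of the automated process used in the study.

```python
from typing import Dict, Set


def diagnostic_metrics(
    retrieved: Set[str], relevant: Set[str], all_articles: Set[str]
) -> Dict[str, float]:
    """Score one search strategy against the hand-search gold standard."""
    # Ignore any identifiers that fall outside the hand-searched set.
    retrieved = retrieved & all_articles
    relevant = relevant & all_articles

    tp = len(retrieved & relevant)          # flagged by hand search and retrieved
    fp = len(retrieved - relevant)          # retrieved but not flagged
    fn = len(relevant - retrieved)          # flagged but missed by the search
    tn = len(all_articles) - tp - fp - fn   # not flagged and not retrieved

    def ratio(num: int, den: int) -> float:
        # Undefined proportions (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, len(all_articles)),
    }


# Invented toy data: 10 articles, 4 of which the hand search flagged.
all_articles = {f"a{i}" for i in range(1, 11)}
relevant = {"a1", "a2", "a3", "a4"}
retrieved = {"a1", "a2", "a3", "a5", "a6"}

metrics = diagnostic_metrics(retrieved, relevant, all_articles)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy example the search retrieves 3 of the 4 flagged articles and 2 of the 6 unflagged ones, so sensitivity is 3/4, specificity is 4/6, precision is 3/5, and accuracy is 7/10.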
Individual search terms with sensitivity > 15% and specificity > 80% for articles of interest to mental health were incorporated into the development of search strategies that included 2 or more terms. All combinations of terms used the Boolean OR, for example, "mania.tw. OR depression.sh.". For the development of multiple-term search strategies to optimize either sensitivity or specificity, we tested all 2-term search strategies with sensitivity at least 75% and specificity at least 50%. For optimizing accuracy, 2-term search strategies with accuracy > 75% were considered for multiple-term development. In total, 11,317 search strategies were tested in the development of mental health content search filters. To enhance the performance of the most sensitive mental health content search strategy, the single search terms with the highest sensitivity were successively added to the top performing 3-term search strategy until the best sensitivity was achieved while keeping specificity ≥50%.
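The screening and OR-combination steps described above can be sketched as follows. This is an illustration under stated assumptions, not the software used in the study: each term is represented by the set of article identifiers it retrieves, so a Boolean OR becomes a set union. The hit sets, the helper `sens_spec`, and the variable names are all invented for the example; only the thresholds come from the text.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def sens_spec(
    retrieved: Set[int], relevant: Set[int], universe: Set[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) against the gold standard."""
    tp = len(retrieved & relevant)
    fp = len(retrieved - relevant)
    fn = len(relevant - retrieved)
    tn = len(universe) - tp - fp - fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Invented toy data: 20 articles, of which 1-8 are "of interest to mental health".
universe = set(range(1, 21))
relevant = set(range(1, 9))
term_hits: Dict[str, Set[int]] = {
    "depression.tw.": {1, 2, 3, 4, 9},
    "mania.tw.": {5, 6, 10},
    "schizoid.tw.": {7},
    "exp aggression/": {7, 8, 9, 11},
}

# Step 1: keep single terms with sensitivity > 15% and specificity > 80%.
candidates: List[str] = []
for term, hits in term_hits.items():
    sens, spec = sens_spec(hits, relevant, universe)
    if sens > 0.15 and spec > 0.80:
        candidates.append(term)

# Step 2: OR every pair of surviving terms (set union) and keep pairs with
# sensitivity >= 75% and specificity >= 50% for further development.
kept_pairs: List[Tuple[str, float, float]] = []
for a, b in combinations(candidates, 2):
    sens, spec = sens_spec(term_hits[a] | term_hits[b], relevant, universe)
    if sens >= 0.75 and spec >= 0.50:
        kept_pairs.append((f"{a} OR {b}", sens, spec))

for strategy, sens, spec in kept_pairs:
    print(f"{strategy}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these invented hit sets, 'schizoid.tw.' is dropped at the single-term stage, and two of the three possible 2-term strategies pass the 75%/50% thresholds. ORing terms can only enlarge the retrieved set, so sensitivity never falls and specificity never rises as terms are added, which is why the specificity floor is needed.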
In addition to developing mental health content search strategies as just described, we also evaluated the performance of the methodologic search filters for treatment articles when "ANDed" with the mental health content filters.
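In set terms, "ANDing" two filters keeps only the articles retrieved by both. The sketch below is a minimal illustration, assuming each filter is represented by the set of article identifiers it retrieves; the function name `and_filters` and the hit sets are hypothetical.

```python
from typing import Set


def and_filters(content_hits: Set[int], methods_hits: Set[int]) -> Set[int]:
    """Boolean AND of two filters: articles retrieved by both."""
    return content_hits & methods_hits


# Invented hit sets for a content filter and a treatment (methods) filter.
content_hits = {1, 2, 3, 4, 5, 6}
treatment_hits = {2, 4, 6, 8, 10}

combined = and_filters(content_hits, treatment_hits)
print(sorted(combined))
```

Because the combined set is never larger than either input, ANDing tends to trade sensitivity for precision, which is why the combination needs to be evaluated separately from its parts.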