We included 49 028 articles in the analysis; 6568 articles (13.4%) were classified as original studies evaluating a treatment, of which 1587 (24.2%) met our methodological criteria. Overall, 3807 of 4862 proposed unique terms retrieved citations from Medline that could be used in assessment of terms. The development and validation datasets for assessing retrieval strategies included articles that passed and did not pass treatment criteria (930 and 29 397 articles, respectively, for the development dataset; 657 and 19 631 articles for the validation dataset). The validation dataset provided differences in performance that were statistically significant in only three of 36 comparisons, the greatest of which was 1.1% for one set of specificities (data not shown).
shows the operating characteristics for the single terms with the highest sensitivity and the highest specificity. The accuracy is driven by the specificity and thus the term with the best accuracy when keeping sensitivity more than 50% was “randomized controlled trial.pt.”. The single term that yielded the best precision while keeping sensitivity more than 50% was also “randomized controlled trial.pt.”, and this strategy also gave the optimal balance of sensitivity and specificity.
Best single terms for high sensitivity searches, high specificity searches, and searches that optimise balance between sensitivity and specificity for retrieving studies of treatment
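The four operating characteristics reported here can be derived from a two by two table of retrieval against the handsearch classification. The following is a minimal sketch, not the authors' software: the `Counts` class, the function name, and the retrieval counts are hypothetical, and only the row totals (930 and 29 397 articles) are taken from the development dataset. It assumes every denominator is non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    tp: int  # met criteria, retrieved by the term
    fn: int  # met criteria, not retrieved
    fp: int  # did not meet criteria, retrieved
    tn: int  # did not meet criteria, not retrieved


def operating_characteristics(c: Counts) -> dict[str, float]:
    """Sensitivity, specificity, precision, and accuracy as proportions."""
    total = c.tp + c.fn + c.fp + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "precision": c.tp / (c.tp + c.fp),
        "accuracy": (c.tp + c.tn) / total,
    }


# Hypothetical retrieval counts for one term. Only the row totals
# (930 passing, 29 397 not passing) come from the development dataset.
example = Counts(tp=837, fn=93, fp=588, tn=28809)
stats = operating_characteristics(example)

for name, value in stats.items():
    print(f"{name}: {value:.1%}")
```

Because articles that did not pass the criteria make up 29 397 of the 30 327 articles in the development dataset, accuracy in a calculation of this kind stays close to specificity, which is consistent with accuracy being driven by specificity.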
For strategies combining up to three terms, those yielding the highest sensitivity, specificity, and accuracy are shown in tables , , . Some two term strategies outperformed one term and multiple term strategies (). shows the top three search strategies optimising the trade-off between sensitivity and specificity.
Top three search strategies yielding highest sensitivity (keeping specificity >50%) with combinations of terms
Top three search strategies yielding highest specificity (keeping sensitivity >50%) based on combinations of up to three terms
Top three search strategies yielding highest accuracy (keeping sensitivity >50%) based on combinations of up to three terms
Top three search strategies for optimising sensitivity and specificity (based on absolute difference (sensitivity − specificity) <1%)
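The selection of combined strategies under a constraint (for example, highest specificity while keeping sensitivity above 50%) can be sketched as an exhaustive search over terms joined with OR. This is an illustration under stated assumptions rather than the procedure actually run: the term names, article identifiers, and function names are hypothetical, each term is represented by the set of article identifiers it retrieves, and both the passing and non-passing groups are assumed to be non-empty.

```python
from itertools import combinations


def evaluate(retrieved, relevant, all_ids):
    """Return (sensitivity, specificity) for one set of retrieved articles."""
    tp = len(retrieved & relevant)
    fp = len(retrieved - relevant)
    fn = len(relevant - retrieved)
    tn = len(all_ids) - tp - fp - fn
    return tp / (tp + fn), tn / (tn + fp)


def rank_by_specificity(term_hits, relevant, all_ids,
                        max_terms=3, min_sensitivity=0.5):
    """OR together up to max_terms terms; keep strategies whose sensitivity
    exceeds min_sensitivity, ordered by specificity (highest first)."""
    results = []
    for k in range(1, max_terms + 1):
        for combo in combinations(sorted(term_hits), k):
            # Boolean OR of terms is the union of their retrieved sets.
            retrieved = set().union(*(term_hits[t] for t in combo))
            sens, spec = evaluate(retrieved, relevant, all_ids)
            if sens > min_sensitivity:
                results.append((spec, sens, combo))
    results.sort(key=lambda r: (-r[0], -r[1], r[2]))
    return results


# Toy data: 20 articles, 4 of which meet the methodological criteria.
all_ids = set(range(1, 21))
relevant = {1, 2, 3, 4}
term_hits = {
    "term_a": {1, 2, 3, 10},
    "term_b": {3, 4, 11, 12},
    "term_c": {1, 5},
}

ranked = rank_by_specificity(term_hits, relevant, all_ids)
for spec, sens, combo in ranked[:3]:
    print(" OR ".join(combo), f"sensitivity={sens:.2f}", f"specificity={spec:.2f}")
```

In this toy example the single term ranks first on specificity, and adding terms with OR raises sensitivity at the cost of specificity. The same loop could apply the other constraints used in the tables, such as keeping the absolute difference between sensitivity and specificity below 1%, by changing the filter and the sort key.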
shows the best combination of terms for optimising the trade-off between sensitivity and specificity when using the boolean NOT to eliminate terms with the lowest sensitivity. Nonsignificant differences were shown when citations retrieved by the three terms “review tutorial.pt.”, “review academic.pt.”, and “selection criteri:.tw.” were removed from the strategy that optimised sensitivity and specificity.
Best combination of terms for optimising the trade-off between sensitivity and specificity in Medline when adding the boolean AND NOT
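The effect of the boolean NOT can be pictured as a set difference: citations retrieved by any excluded term are removed from those retrieved by the base strategy. The sketch below uses hypothetical article identifiers and a hypothetical helper name, and is not the Ovid syntax itself.

```python
def and_not(base_hits, excluded_term_hits):
    """Citations retrieved by the base strategy but by none of the excluded terms."""
    excluded = set().union(*excluded_term_hits)
    return base_hits - excluded


# Toy data: articles 1-4 meet the criteria; 10 and 11 do not.
base_hits = {1, 2, 3, 4, 10, 11}
review_hits = {10, 20}    # hypothetical hits for an excluded review term
criteria_hits = {3, 11}   # hypothetical hits for a second excluded term

remaining = and_not(base_hits, [review_hits, criteria_hits])
print(sorted(remaining))
```

In this toy example the exclusion removes two articles that do not meet the criteria (raising specificity) but also one that does (article 3, lowering sensitivity), which is the trade-off that has to be checked whenever terms are removed in this way.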
After the two term and three term computations, search strategies with sensitivity more than 50% and specificity more than 95% were further evaluated by adding search terms selected using logistic regression modelling. Initially, candidate terms for addition to the base strategy were ordered with the most significant first, using stepwise logistic regression, and then added to the model sequentially. The resulting logistic function (data not shown) determined the association between the predicted probabilities and observed responses. We selected the best one term, two term, three term, and four term strategies. Two were already evaluated (“randomized controlled trial.mp.” OR “randomized controlled trial.pt.” in and “randomized controlled trial.mp.” OR “randomized controlled trial.pt.” OR “double-blind:.tw.” in ). The other two strategies are listed in : both had high performance. We next took the 13 terms that had regression coefficients less than -2.0 (“predict.tw.”, “predict.mp.”, “economic.tw.”, “economic.mp.”, “survey.tw.”, “survey.mp.”, “hospital mortality.mp,tw.”, “hospital mortalit:.mp.”, “accuracy:.tw.”, “accuracy.tw.”, “accuracy.mp.”, “explode bias (epidemiology)”, and “longitudinal.tw.”) and NOTed these terms out of the four term search strategy to determine if these terms would improve the operating characteristic values (, last row). We found a small but insignificant decrease in sensitivity and increases in specificity, precision, and accuracy.
Top three term and four term search strategies using logistic regression techniques
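The idea of using regression coefficients to identify terms worth excluding can be illustrated with a small example. The sketch below is not the stepwise procedure used in this study: it fits an ordinary (non-stepwise) logistic regression by gradient ascent on invented data, with each article coded by whether a term retrieves it. The data, function names, and the cutoff of 0 are assumptions for illustration; the analysis reported here used a cutoff of -2.0 on the real data.

```python
import math


def fit_logistic(rows, labels, n_iter=2000, lr=0.1):
    """Fit logistic regression by batch gradient ascent; return (bias, weights)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(n_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of passing
            err = y - p
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias += lr * grad_b / n
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


def terms_below(terms, weights, threshold):
    """Terms whose coefficient is below the threshold (candidates for NOT)."""
    return [t for t, w in zip(terms, weights) if w < threshold]


# Invented data: each row is (retrieved by term 1, retrieved by term 2);
# the label is 1 if the article meets the methodological criteria.
terms = ["randomized controlled trial.pt.", "survey.tw."]
rows = [(1, 0)] * 5 + [(0, 1)] * 5 + [(0, 0)] * 3 + [(1, 1)] * 2
labels = [1, 1, 1, 1, 0] + [0, 0, 0, 0, 1] + [0, 0, 1] + [1, 0]

bias, weights = fit_logistic(rows, labels)
# Cutoff of 0 chosen for the toy data only; the study used -2.0.
flagged = terms_below(terms, weights, threshold=0.0)
print(flagged)
```

A negative coefficient means that retrieval by the term is associated with a lower probability that the article meets the criteria, which is why such terms are candidates for exclusion with NOT.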
We compared our best strategies for maximising sensitivity (sensitivity > 99% and specificity > 70%) and for maximising specificity while maintaining a high sensitivity (sensitivity > 94% and specificity > 97%). To ascertain if the less sensitive strategy (which had a much greater specificity) would miss important articles, we assessed the methodologically sound articles that had not been retrieved by the less sensitive strategy, using studies from the four major medical journals (BMJ, JAMA, Lancet, and New England Journal of Medicine). In total, 32 articles were missed by the less sensitive search, of which four were from these four journals. A practising clinician with training in methods for health research found only one of the four articles to be of substantial clinical importance.14 The indexing terms for this randomised controlled trial did not include “randomized controlled trial(pt)”. When we contacted the National Library of Medicine about indexing for this article, the article was reindexed and now the “missing” article would be retrieved.
We used our data to test 19 published strategies2-7,13 and we compared these with the best strategies for optimising sensitivity and specificity. The published strategies had a sensitivity range of 1.3% to 98.8% on the basis of our handsearched data. All of these were lower than our best sensitivity of 99.3%. The specificities for the published strategies ranged from 63.3% to 96.6%. Two strategies from Dumbrique6 outperformed our most specific strategy (specificity of 98.1% and 97.6% versus our 97.4%). Both of these strategies had a lower sensitivity than did our search strategy with the best specificity (42.0% and 92.8% v