Evidence-based medicine (EBM) is the process of applying the best available evidence gained from clinical research to the practice of medicine [1]. While this is certainly a desirable goal, a typical physician’s heavy workload can make it difficult to realize. Practicing physicians may not have time to consult the primary literature to identify the best available evidence for each and every patient. The actual practice of EBM therefore depends on clinicians having access to syntheses of the best available primary evidence applicable to their patients. These syntheses, such as systematic reviews (SRs), make the available evidence more accessible and usable in clinical practice. The Cochrane Collaboration states that an SR:
“…attempts to collate all empirical evidence that fits pre-specified eligibility criteria in order to answer a specific research question. It uses explicit, systematic methods that are selected with a view to minimizing bias, thus providing more reliable findings from which conclusions can be drawn and decisions made.”
SRs are literature reviews designed to locate, appraise, and synthesize the best available evidence from clinical studies of diagnosis, treatment, prognosis, or etiology, and provide informative empirical answers to specific medical questions. SRs inform medical recommendations, guiding both practice and policy, such as in the creation of published practice guidelines [3].
The process of creating and maintaining SRs is resource- and labor-intensive, typically requiring 6–12 months of effort, with the main expense being personnel time. There is ample evidence that SRs become outdated as research progresses, and thus need to be periodically updated [4]. Best practice in medicine is continually changing, requiring incorporation of new information as it becomes available, so SRs must undergo periodic updates in order to remain useful and accurate. Updates are costly in terms of both time and money, and can take as much time and effort as the original SR [6]. Typically, SR programs such as the Drug Effectiveness Review Project (DERP) can only assess a past SR topic for new literature once or twice a year, leading to a 6–12 month lag in recognizing new evidence and beginning the planning of an SR update.
Although research guidance exists on when and how to update SRs [6], the process is not well understood. A comparison by Shekelle of two methods (known as RAND and Ottawa) for determining whether an SR needs to be updated found that both begin with an initial literature search [8]. Neither method provides guidance on when to conduct the required literature search. The machine learning method proposed here provides exactly this guidance, and fits into the SR update process ahead of review commitment decision methods such as those assessed by Shekelle.
A survival analysis study of SRs by Shojania [5] found that the median time before an SR needed an update was 5.5 years. However, there was considerable variation around this median: 23% of reviews needed an update within 2 years, and 15% within just 1 year of publication. While a more active SR research topic area would logically require more frequent updates, Shojania also found that areas with more heterogeneous research tended to require more frequent updates as well, because new evidence is more likely to alter the previous findings by reducing the variation across results. Clearly, there is a strong need for informatics support in determining when an SR topic is due for an update.
Building on our prior work in applying automated document classification to work prioritization for SRs [9], in this paper we perform an initial investigation of the potential impact of automated document classification on the SR logistical process. While other researchers have investigated the use of machine learning in supporting EBM, most notably Aphinyanaphongs [12], Kilicoglu [13], and Matwin [15], this is the first study that we are aware of that specifically looks at the impact of machine learning methods on SR update scheduling. We seek to study the potential effect of automated document classification on the SR update process, in terms of need recognition, planning, and scheduling.
Here, we define a document classification task called New Update Alert. The idea behind New Update Alert is that as publications become available to the SR team, an automated document classification system may be able to determine which ones are most likely to be included in the SR update. When an article is detected that is likely to be included in the SR update, the system alerts the SR leader, perhaps via an automatically generated email message, or using a custom RSS (Really Simple Syndication) feed. For the purposes of this work, a publication becomes available to the review team when it is indexed in MEDLINE, and therefore is findable using the search queries previously designed for the SR topic. The algorithm looks at each article meeting the original review search criteria, and notifies the team about articles that it predicts as likely to be included in an update.
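The alerting loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the classifier studied in this paper: the `Article` record, the `new_update_alerts` function, the `toy_score` stand-in for a trained classifier, its cue words, the 0.5 threshold, and the example articles are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Article:
    pmid: str      # identifier of the indexed record (made-up values below)
    title: str
    abstract: str


def new_update_alerts(
    articles: Iterable[Article],
    score: Callable[[Article], float],
    threshold: float = 0.5,
) -> List[Article]:
    """Return the articles whose predicted inclusion score meets the threshold.

    `articles` is assumed to hold only records already retrieved by the
    SR topic's original search queries.
    """
    return [a for a in articles if score(a) >= threshold]


def toy_score(article: Article) -> float:
    # Hypothetical stand-in for a trained classifier: the fraction of
    # illustrative cue words found in the title and abstract.
    cues = ("randomized", "trial", "placebo")
    text = f"{article.title} {article.abstract}".lower()
    return sum(cue in text for cue in cues) / len(cues)


# Made-up records standing in for newly indexed publications.
incoming = [
    Article("0001", "A randomized placebo-controlled trial of drug X", ""),
    Article("0002", "Editorial: reflections on drug X", ""),
]

for article in new_update_alerts(incoming, toy_score, threshold=0.5):
    # A real system might send an email or add an RSS item here instead.
    print(f"ALERT: {article.pmid} {article.title}")
```

Passing the scoring function as a parameter keeps the alerting logic independent of whichever classifier is eventually used.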
We define a correct alert to be one that notifies the SR team about a publication that will be included in the eventual SR report update. These are publications that include new evidence regarding interventions, populations, or study designs relevant to the report. An incorrect alert is an alert about a publication that is not eventually included in the SR update; these are “false alarms”. The machine predictions are not perfect, and a range of settings trading off sensitivity and specificity is possible. Greater sensitivity means that the team will be notified about the publication of a greater fraction of the articles that will ultimately be included in the review update (true positives, TP), at the cost of more false alarms (false positives, FP). Furthermore, some publications may be more important than others: in addition to being included in the final SR, they may include specific novel or higher-quality evidence that could motivate the scheduling, priority, or initiation of a review update. We specifically annotate and study these important publications in the work described below. For the work described here, alerts are triggered when any potentially included publication is detected, whether or not it is a motivating publication.
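To make the trade-off concrete, the following minimal sketch counts correct alerts (TP) and false alarms (FP) at several alert thresholds and reports the resulting sensitivity and specificity. The function name `alert_tradeoff`, the scores, the inclusion labels, and the thresholds are all made up for illustration and are not results from this study.

```python
from typing import List, Sequence, Tuple


def alert_tradeoff(
    scores: Sequence[float],
    included: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Tuple[float, int, int, float, float]]:
    """For each threshold, return (threshold, TP, FP, sensitivity, specificity).

    An alert is raised when score >= threshold. `included` records whether
    each article was eventually included in the SR update.
    """
    n_included = sum(included)
    n_excluded = len(included) - n_included
    rows = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, included) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, included) if s >= t and not y)
        sensitivity = tp / n_included if n_included else 0.0
        specificity = (n_excluded - fp) / n_excluded if n_excluded else 0.0
        rows.append((t, tp, fp, sensitivity, specificity))
    return rows


# Made-up classifier scores and eventual inclusion decisions.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
included = [True, False, True, False, True, False]

rows = alert_tradeoff(scores, included, thresholds=[0.7, 0.5, 0.2])
for t, tp, fp, sens, spec in rows:
    print(f"threshold={t:.1f} TP={tp} FP={fp} "
          f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy data, lowering the threshold from 0.7 to 0.2 raises the number of correct alerts from 1 to 3 while the number of false alarms grows from 1 to 2, which is the pattern described above.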
New Update Alerts could be useful to the SR process in several ways. For example, the alerts could be used by the SR team to determine whether an SR needs an update, the urgency of the update, or when an update should be scheduled. Seeing potentially includable articles accumulate as they are published may be helpful in scheduling a review update. With a system providing New Update Alerts, reviewers could be made aware earlier of studies potentially impacting the SR scope, conclusions, or recommendations. By examining the articles that result in alerts, the reviewers could get a better initial idea of the quantity and quality of new information pertinent to an SR before actually scheduling or conducting the review update.
This would provide support for determining when to schedule an SR update, for example, whether a review update is needed as soon as possible or could be postponed for a time. Given that the resources to conduct SRs are specialized and limited, the ability to coordinate review update scheduling across the full set of a team’s review topics would be a great advantage in best applying those resources and supporting the current needs of the practice of EBM. Furthermore, this could play an important role in obtaining funding to support the review. Since many SRs are dependent upon outside funding, New Update Alerts could provide the SR team lead with timely and important information to share with a funding organization.
Here we study the performance of an initial classification system for New Update Alert, leaving the question of exactly what kind of user interface should deliver the alerts for future work.