Searching for pertinent literature is an essential part of every scientist's life. Intimate knowledge of the appropriate literature is critical at many stages of the scientific process: (i) familiarization with a new area by a young scientist or a scientist whose research is taking a new direction, (ii) monitoring the literature as the research progresses to capitalize on recent developments, measure one's competitiveness and avoid duplication of effort (1), (iii) development of reference lists during manuscript or grant application writing and (iv) compilation of suggested reviewers when called upon to do so as part of a manuscript submission to a journal. For mature scientists, the reasons for interaction with the literature expand: (i) development of very broad knowledge when writing, for example, a review article, (ii) mastery of new areas in the role of student mentor or examiner and (iii) acquisition of focused knowledge when called upon as a manuscript or grant application reviewer. For other scientific professionals, the literature is a resource for identifying colleagues: (i) identification of experts for advisory or steering committees, (ii) selection of reviewers for grants or proposals by government or private agencies, (iii) identification of experts for legal proceedings and testimony, (iv) identification by librarians of starting points into the literature for novice or lay individuals and (v) identification of manuscript reviewers by journal editors.
The primary portal for the biomedical literature is PubMed (2). This web-based tool searches the Medline database using keywords and Boolean operators. Choosing appropriate keywords requires some prior knowledge of the field, and the user often needs numerous iterations to sample the literature in the hope of finding the most relevant articles. Once the results of a query are presented, the list can be sorted by date, author or journal. Recent research has focused on improving the quality and navigation of the output (4–8).
There is sufficient information contained within the Medline database to overcome these limitations, given a tool with appropriate query entry and result presentation methods. Scientists and professionals either generate concentrated information, in the form of an abstract or other document, in the course of manuscript or grant writing, or are presented with it. Given this, the keyword selection and optimization process can be bypassed if natural language free text, such as an abstract, can be submitted directly to a literature search engine. To do this, we have developed eTBLAST, which uses a hybrid scheme to extract and weight keywords contained within the submitted query to identify a subset of literature in Medline, and then performs a sentence alignment to compute a final quantitative score as a measure of similarity and, presumably, relevance. The tool then outputs a list similar to that of PubMed, but ranked by this similarity score. At this point, scientists can interact with the most relevant Medline literature much as they have done traditionally via date, author or journal sorting in PubMed. This similarity-ranked output can be further processed to compile lists and present output views that add value for the specific uses just outlined: identifying the most frequent and prominent authors as experts or reviewers, identifying the most frequent journals as targets for submission and inspecting the publication rate over time as a measure of novelty and topic popularity. It should be noted that eTBLAST and PubMed both find similar abstracts, but by different methods, and that PubMed's Related Links is limited to finding similarity among the records currently in Medline, whereas eTBLAST accepts arbitrary text as a query. There are also numerous other Medline keyword-based search tools (CiteXplore, HubMed and GoPubMed, for example) (8–10), some of which have result post-processors with similar functionality (author and journal finding).
Summarized herein is a set of parsers for the eTBLAST code (11) that take an abstract or any other text as input and identify lists of ‘experts’, target journals and publication trends.
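To make the post-processing idea concrete, the following is a minimal sketch of how a similarity-ranked hit list might be tallied into candidate experts, target journals and a publication trend. It is not the eTBLAST implementation: the record layout (`score`, `authors`, `journal`, `year`), the function name `summarize`, the score-weighted author ranking and all sample values are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical similarity-ranked hits. The field names and all values are
# invented for illustration; they are not the actual eTBLAST output format.
hits = [
    {"score": 0.9, "authors": ["Author A", "Author B"], "journal": "Journal X", "year": 2005},
    {"score": 0.8, "authors": ["Author A"], "journal": "Journal Y", "year": 2005},
    {"score": 0.5, "authors": ["Author B", "Author C"], "journal": "Journal X", "year": 2004},
    {"score": 0.4, "authors": ["Author C"], "journal": "Journal X", "year": 2003},
]


def summarize(hits, top_n=3):
    """Tally authors, journals and publication years over a ranked hit list."""
    author_score = defaultdict(float)  # summed similarity score per author
    journal_count = Counter()          # number of hits per journal
    year_count = Counter()             # number of hits per publication year

    for hit in hits:
        for author in hit["authors"]:
            # Assumed weighting: each author is credited with the hit's score.
            author_score[author] += hit["score"]
        journal_count[hit["journal"]] += 1
        year_count[hit["year"]] += 1

    # Highest summed score first; ties are broken alphabetically by name.
    experts = sorted(author_score.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    journals = journal_count.most_common(top_n)
    trend = sorted(year_count.items())  # chronological (year, count) pairs
    return experts, journals, trend


experts, journals, trend = summarize(hits)
print("Candidate experts:", [name for name, _ in experts])
print("Candidate journals:", journals)
print("Hits per year:", trend)
```

Weighting authors by summed similarity score rather than by raw frequency is one plausible design choice among several; a simple count, or a count restricted to the top-ranked hits, would serve the same purpose.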