When exploring the biomedical literature for information relevant to our research, we rely heavily on search engines (e.g. PubMed) that return documents matching keyword-based queries. When a query consists of multiple keywords or terms, there is a need to restrict the positional distance between occurrences of the terms in a document. If the terms are found too far from each other in the text, it is very likely that the text does not, at least not explicitly by means of the given terms, describe any relationship between the concepts that the terms denote. We regard this positional restriction as crucial in seeking relational information, for example, when users attempt to find textual evidence of relations between given concepts in the literature. We provide a novel tool to address this need, with a special focus on the biomedical domain.
The tool presented here, named MedEvi, is a search engine that retrieves occurrences matching a given query together with their local context. It is inspired by keywords-in-context (KWIC) concordancers, which over the last few decades have revolutionized the field of lexicography, where the different senses of dictionary entries have to be defined in their authentic usage context (Sinclair, 1991). We believe that a concordancer is a good candidate for the above-mentioned information-seeking task, since it inherently deals with the local context of matching occurrences, where the evidence sought is much more likely to be found than in other parts of the retrieved documents.
The common limitation of existing concordancers, however, is that they consider only single-term queries. To deal with multiple-term queries effectively, we implement the positional restriction on top of a concordancer. This feature of MedEvi is similar to the concept of a proximity query (Baeza-Yates and Ribeiro-Neto, 1999) as implemented, for example, in the proximity search of Lucene queries and the defined adjacency operator of OVID database queries. The difference is that in those systems a proximity constraint, if used at all, must be stated explicitly in the query string (e.g. ‘A ADJn B’), whereas in MedEvi the positional restriction is applied to every query, and the maximum distance between query terms, analogous to the ‘n’ of ‘ADJn’, can be adjusted by users.
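The idea of combining a KWIC view with a compulsory positional restriction can be illustrated with a minimal sketch. It is not MedEvi's implementation: the function names `tokenize` and `proximity_matches`, the parameters `max_span` and `context`, and the simple word-level tokenization are all assumptions made for illustration, and only conjunctive (AND) queries are handled.

```python
import re
from itertools import product


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def proximity_matches(text, terms, max_span=10, context=3):
    """Return KWIC-style snippets in which every term occurs within max_span tokens.

    Each result is a tuple (first_pos, last_pos, snippet), where the snippet
    covers the matched span plus `context` tokens on either side.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    tokens = tokenize(text)
    # Token positions at which each query term occurs.
    positions = [
        [i for i, tok in enumerate(tokens) if tok == term.lower()]
        for term in terms
    ]
    snippets = []
    # Try every combination of one occurrence per term; if a term never
    # occurs, product() yields nothing and the result is empty.
    for combo in product(*positions):
        lo, hi = min(combo), max(combo)
        if hi - lo <= max_span:  # the positional restriction
            start = max(0, lo - context)
            end = min(len(tokens), hi + context + 1)
            snippets.append((lo, hi, " ".join(tokens[start:end])))
    return snippets


if __name__ == "__main__":
    sample = (
        "Insulin regulates glucose uptake in muscle cells. Many unrelated "
        "sentences follow here about other topics entirely and then glucose "
        "appears again."
    )
    for hit in proximity_matches(sample, ["insulin", "glucose"], max_span=3, context=2):
        print(hit)
```

In this sketch the second occurrence of "glucose" is ignored because it lies too far from "insulin", which is the behaviour the positional restriction is meant to enforce. Enumerating all combinations of occurrences is adequate for short texts; a real index would use sorted position lists instead.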
MedEvi allows multi-term queries composed with Boolean operators (e.g. AND, OR). It differs from other existing search engines that also allow multi-term queries [e.g. PubMed (http://www.ncbi.nlm.nih.gov/sites/entrez), HubMed (http://www.hubmed.org)]: while those search engines return a list of MEDLINE abstracts, MedEvi directly presents text fragments that may show semantic relations between the given terms. It also differs from other text mining tools that present text fragments, mostly sentences [e.g. iHOP (http://www.ihop-net.org/UniPub/iHOP/), MEDIE (http://www-tsujii.is.s.u-tokyo.ac.jp/medie/)]: while those tools focus on certain biological entities such as proteins (iHOP) (Hoffman and Valencia, 2005) or on certain grammatical structures such as subject-verb-object (MEDIE), MedEvi does not impose any syntactic or semantic restrictions and is therefore applicable to any biomedical domain. We explain the features of MedEvi in the next section.
Users of MedEvi have found the tool useful for finding evidence in the literature, for example, to see whether candidate chemicals are involved in a metabolic pathway, to identify the proteins that regulate given proteins, and to find whether a multi-term ontology concept actually appears in the literature, even with a high degree of syntactic variation. Note that the applications above are generally concerned with semantic relations between biomedical concepts. Selected example queries can be found on the MedEvi web page.