Overall, the concept-based approach improved on a keyword baseline. Results depended heavily on the quality of concept extraction provided by the MetaMap system. MetaMap identifies only UMLS concepts, which were then mapped to SNOMED-CT concepts. The rationale for converting to SNOMED-CT was its formal representation, which provides scope for future inference techniques. Experiments using UMLS concepts showed comparable performance. However, mapping between terminologies may result in a loss of meaning from the original query or document: certain UMLS concepts have no equivalent in SNOMED-CT. Such cases were found in the two worst-performing queries in our experiments, query 454.9 (asymptomatic varicose veins) and query 038.11 (methicillin susceptible staphylococcus aureus septicemia). Advances in medical NLP and the increasing popularity of SNOMED-CT are likely to yield further improvements to tools such as MetaMap, for example direct SNOMED-CT concept identification that avoids the mapping via UMLS. This would remove the mapping problem and, we conjecture, should improve our concept-based retrieval system.
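The loss introduced by the mapping step described above can be illustrated with a minimal sketch. It assumes a simple lookup table between terminologies; the identifiers (`UMLS_A`, `SCT_1`, and so on), the table `UMLS_TO_SNOMED` and the function `map_concepts` are hypothetical placeholders, not real concept codes and not part of MetaMap or of our system.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping table. Both sides use invented placeholder identifiers,
# not real UMLS or SNOMED-CT codes.
UMLS_TO_SNOMED: Dict[str, List[str]] = {
    "UMLS_A": ["SCT_1"],
    "UMLS_B": ["SCT_2", "SCT_3"],
    # "UMLS_C" deliberately has no entry: it stands for a UMLS concept
    # with no SNOMED-CT equivalent.
}


def map_concepts(umls_ids: List[str]) -> Tuple[List[str], List[str]]:
    """Return (mapped target identifiers, source identifiers with no equivalent)."""
    mapped: List[str] = []
    unmapped: List[str] = []
    for umls_id in umls_ids:
        targets = UMLS_TO_SNOMED.get(umls_id)
        if targets:
            mapped.extend(targets)
        else:
            # The concept is silently lost from the query or document
            # representation unless it is tracked explicitly, as here.
            unmapped.append(umls_id)
    return mapped, unmapped


mapped, unmapped = map_concepts(["UMLS_A", "UMLS_C", "UMLS_B"])
print(mapped)
print(unmapped)
```

Tracking the unmapped identifiers separately is one way to detect queries, such as the two noted above, whose meaning is partly lost in the conversion.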
The queries that performed well using our concept-based approach were often characterised by a number of possible variants in their keyword form. For example, query 530.81 (esophageal reflux) mapped to the SNOMED-CT concepts:
235595009 (Gastroesophageal reflux disease);
196600005 (Acid reflux &/or oesophagitis);
47268002 (Reflux); and
249496004 (Esophageal reflux finding).
In the keyword-based system, a query for esophageal reflux was unlikely to return documents that contain oesophagitis. However, in the concept-based approach, oesophagitis was represented in the query as part of concept 196600005. The average precision for this query improved from 0.1285 to 0.3414. Another example was query 042 (human immunodeficiency virus): relevant documents contained the abbreviation HIV but did not explicitly mention human immunodeficiency virus (average precision increased from 0.2332 to 0.4622 for this query).
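The difference between keyword and concept matching for the esophageal reflux example can be sketched as follows. This is a toy illustration only: the table `PHRASE_TO_CONCEPTS` is a hypothetical stand-in for a concept extractor such as MetaMap, the query-side concept identifiers are those listed above, and the assumption that a document mentioning oesophagitis is assigned concept 196600005 is made purely for the example.

```python
from typing import Dict, Set

# Toy phrase-to-concept table standing in for a real concept extractor.
# Query-side IDs come from the text; the document-side entry is an assumption.
PHRASE_TO_CONCEPTS: Dict[str, Set[str]] = {
    "esophageal reflux": {"235595009", "196600005", "47268002", "249496004"},
    "oesophagitis": {"196600005"},
}


def keywords(text: str) -> Set[str]:
    """Represent text as a set of lower-cased whitespace-separated tokens."""
    return set(text.lower().split())


def concepts(text: str) -> Set[str]:
    """Represent text as the set of concept IDs whose phrase occurs in it."""
    lowered = text.lower()
    found: Set[str] = set()
    for phrase, concept_ids in PHRASE_TO_CONCEPTS.items():
        if phrase in lowered:
            found |= concept_ids
    return found


query = "esophageal reflux"
document = "Patient presents with oesophagitis"  # invented example text

keyword_overlap = keywords(query) & keywords(document)
concept_overlap = concepts(query) & concepts(document)

print(keyword_overlap)  # no shared tokens
print(concept_overlap)  # the shared concept 196600005
```

In this sketch the query and document share no keyword, yet they share one concept, which is the effect that led to the improved average precision for this query.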
Our current system represents queries and documents as SNOMED-CT concepts but does not make use of the additional information provided by the relationships between concepts. Initial experiments using these relationships for query expansion proved difficult: certain queries showed significant improvement, while others showed significant degradation in performance. A more targeted approach that takes into account the semantic type (e.g. disease, treatment, symptom) of the specific query concept is required (this approach has been successful in other applications).7
The use of inter-concept relationships is the next step towards a system that supports the type of inference capabilities required to deal with the complex medical queries we have already outlined.
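One possible shape for the targeted expansion discussed above is sketched below. It is not our implementation: the concept names, the relationship names, the tables `RELATIONS` and `SEMANTIC_TYPE`, and the policy in `ALLOWED_RELATIONS` are all hypothetical and chosen only to show how the semantic type of a query concept could restrict which relationships are followed.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical concept graph: concept -> [(relationship, related concept)].
RELATIONS: Dict[str, List[Tuple[str, str]]] = {
    "disease_x": [("is_a", "disease_parent"), ("finding_site", "body_part_y")],
    "symptom_z": [("is_a", "symptom_parent")],
}

# Hypothetical semantic type of each query concept.
SEMANTIC_TYPE: Dict[str, str] = {
    "disease_x": "disease",
    "symptom_z": "symptom",
}

# Assumed policy: which relationships may be followed for each semantic type.
# Here symptoms are not expanded at all.
ALLOWED_RELATIONS: Dict[str, Set[str]] = {
    "disease": {"is_a", "finding_site"},
    "symptom": set(),
}


def expand(query_concepts: Set[str]) -> Set[str]:
    """Add related concepts, following only relationships allowed for each type."""
    expanded = set(query_concepts)
    for concept in query_concepts:
        semantic_type = SEMANTIC_TYPE.get(concept, "")
        allowed = ALLOWED_RELATIONS.get(semantic_type, set())
        for relation, target in RELATIONS.get(concept, []):
            if relation in allowed:
                expanded.add(target)
    return expanded


expanded = expand({"disease_x", "symptom_z"})
print(sorted(expanded))
```

Restricting expansion by semantic type in this way is intended to keep the gains seen on some queries while avoiding the degradation seen on others; which relationships to allow for each type would have to be determined empirically.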