
Results 1-10 (10)

1.  Evaluating measures of semantic similarity and relatedness to disambiguate terms in biomedical text 
Journal of biomedical informatics  2013;46(6):10.1016/j.jbi.2013.08.008.
In this article, we evaluate a knowledge-based word sense disambiguation (WSD) method that determines the intended concept associated with an ambiguous word in biomedical text using semantic similarity and relatedness measures. These measures quantify the degree of similarity or relatedness between concepts in the Unified Medical Language System (UMLS). The objective of this work is to develop a method that can disambiguate terms in biomedical text by exploiting similarity and relatedness information extracted from biomedical resources and to evaluate the efficacy of these measures on WSD.
We evaluate our method on a biomedical dataset (MSH-WSD) that contains 203 ambiguous terms and acronyms.
We show that information content-based measures derived from either a corpus or taxonomy obtain a higher disambiguation accuracy than path-based measures or relatedness measures on the MSH-WSD dataset.
The WSD system is open source and freely available. The MSH-WSD dataset is available from the National Library of Medicine.
PMCID: PMC3864022  PMID: 24012881
Natural Language Processing; NLP; Word Sense Disambiguation; WSD; semantic similarity and relatedness; biomedical documents
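The contrast between path-based and information content-based measures can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy taxonomy, the concept names, and the counts are hypothetical and are not taken from the UMLS or from the article, and the formulas are the standard textbook definitions of the path, Resnik, and Lin measures rather than the authors' implementation.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); not real UMLS content.
PARENT = {
    "disease": None,
    "infection": "disease",
    "cancer": "disease",
    "flu": "infection",
    "cold": "infection",
}
# Hypothetical corpus counts of each concept's own mentions.
COUNTS = {"disease": 2, "infection": 2, "cancer": 4, "flu": 6, "cold": 6}


def ancestors(c):
    """Return the path from c up to the root, c included."""
    path = []
    while c is not None:
        path.append(c)
        c = PARENT[c]
    return path


def lcs(a, b):
    """Least common subsumer: the first ancestor of a that also subsumes b."""
    b_anc = set(ancestors(b))
    return next(c for c in ancestors(a) if c in b_anc)


def path_sim(a, b):
    """Path measure: 1 / number of nodes on the shortest is-a path."""
    shared = lcs(a, b)
    nodes = ancestors(a).index(shared) + ancestors(b).index(shared) + 1
    return 1.0 / nodes


def ic(c):
    """Information content: -log P(c), counting c and all its descendants."""
    total = sum(COUNTS.values())
    freq = sum(n for d, n in COUNTS.items() if c in ancestors(d))
    return -math.log(freq / total)


def resnik(a, b):
    """Resnik measure: information content of the least common subsumer."""
    return ic(lcs(a, b))


def lin(a, b):
    """Lin measure: Resnik score normalized by the concepts' own IC."""
    denom = ic(a) + ic(b)
    return 2 * resnik(a, b) / denom if denom else 0.0
```

In this toy setting the path measure depends only on the number of nodes between two concepts, whereas the Resnik and Lin scores also depend on how specific their shared ancestor is according to the counts.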
2.  Evaluating Semantic Relatedness and Similarity Measures with Standardized MedDRA Queries 
A potential use of automated concept similarity and relatedness measures is to improve automatic detection of clinical text that relates to a condition indicative of an adverse drug reaction. This is also one of the purposes of the Medical Dictionary for Regulatory Activities (MedDRA) Standardized Queries (SMQ). An expert panel evaluates SMQs for their ability to detect a condition of interest and thus qualifies them as a reference standard for evaluating automated approaches. We compare similarity and relatedness measurement methods on rates of correctly identifying intra-category and inter-category concept pairs from SMQ data to create ROC curves of each method’s sensitivity and specificity. Results indicate that an information content measure, specifically the Resnik method, achieved the highest area under the curve among individual measures, but using two measures together as predictors, Resnik and Lin, obtained the highest score overall. Overall, using SMQ data proved a productive method of evaluating automated semantic relatedness and similarity scores.
PMCID: PMC3540472  PMID: 23304271
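As a rough illustration of the evaluation idea, the area under the ROC curve for one measure can be computed directly from its scores on labeled concept pairs. This is a minimal sketch, assuming intra-category pairs are labeled 1 and inter-category pairs 0; the function name and the example scores are hypothetical and do not come from the SMQ data.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive pair scores
    higher than a randomly chosen negative pair; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical similarity scores for six concept pairs.
EXAMPLE_SCORES = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
# 1 = both concepts in the same category, 0 = different categories.
EXAMPLE_LABELS = [1, 1, 1, 0, 0, 0]
```

A measure that ranks every intra-category pair above every inter-category pair would reach an AUC of 1.0, while a measure no better than chance would sit near 0.5.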
3.  Using SemRep to Label Semantic Relations Extracted from Clinical Text 
In this paper we examined the relationship between semantic relatedness among medical concepts found in clinical reports and biomedical literature. Our objective is to determine whether relations between medical concepts identified from Medline abstracts may be used to inform us as to the nature of the association between medical concepts that appear to be closely related based on their distribution in clinical reports. We used a corpus of 800k inpatient clinical notes as a source of data for determining the strength of association between medical concepts and the SemRep database as a source of labeled relations extracted from Medline abstracts. The same pair of medical concepts may be found with more than one predicate type in the SemRep database but often with different frequencies. Our analysis shows that predicate type frequency information obtained from the SemRep database appears to be helpful for labeling semantic relations obtained with measures of semantic relatedness and similarity.
PMCID: PMC3540517  PMID: 23304331
4.  Towards a Framework for Developing Semantic Relatedness Reference Standards 
Journal of biomedical informatics  2010;44(2):251-265.
Our objective is to develop a framework for creating reference standards for functional testing of computerized measures of semantic relatedness. Currently, research on computerized approaches to semantic relatedness between biomedical concepts relies on reference standards created for specific purposes using a variety of methods for their analysis. In most cases, these reference standards are not publicly available and the published information provided in manuscripts that evaluate computerized semantic relatedness measurement approaches is not sufficient to reproduce the results. Our proposed framework is based on the experiences of medical informatics and computational linguistics communities and addresses practical and theoretical issues with creating reference standards for semantic relatedness. We demonstrate the use of the framework on a pilot set of 101 medical term pairs rated for semantic relatedness by 13 medical coding experts. While the reliability of this particular reference standard is in the “moderate” range, we show that using clustering and factor analyses offers a data-driven approach to finding systematic differences among raters and identifying groups of potential outliers. We test two ontology-based measures of relatedness and provide both the reference standard containing individual ratings and the R program used to analyze the ratings as open-source. Currently, these resources are intended to be used to reproduce and compare results of studies involving computerized measures of semantic relatedness. Our framework may be extended to the development of reference standards in other research areas in medical informatics including automatic classification, information retrieval from medical records and vocabulary/ontology development.
PMCID: PMC3063326  PMID: 21044697
semantic relatedness; reference standards; reliability; inter-annotator agreement
5.  Knowledge-based Method for Determining the Meaning of Ambiguous Biomedical Terms Using Information Content Measures of Similarity 
In this paper, we introduce a novel knowledge-based word sense disambiguation method that determines the sense of an ambiguous word in biomedical text using semantic similarity or relatedness measures. These measures quantify the degree of similarity between concepts in the Unified Medical Language System (UMLS). The objective of this work was to develop a method that can disambiguate terms in biomedical text by exploiting similarity information extracted from the UMLS and to evaluate the efficacy of information content-based semantic similarity measures, which augment path-based information with probabilities derived from biomedical corpora. We show that information content-based measures obtain a higher disambiguation accuracy than path-based measures because they weight the path based on where it exists in the taxonomy coupled with the probability of the concepts occurring in a corpus of text.
PMCID: PMC3243213  PMID: 22195148
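One plausible reading of the knowledge-based approach is that each candidate sense is scored by its total similarity to the concepts found in the surrounding text, and the highest-scoring sense is chosen. The sketch below assumes exactly that scoring rule; the sense labels, context concepts, and similarity values are hypothetical placeholders, and any similarity function with the same signature could be passed in.

```python
def disambiguate(senses, context_concepts, sim):
    """Pick the candidate sense with the largest summed similarity to the context."""
    def total(sense):
        return sum(sim(sense, c) for c in context_concepts)
    return max(senses, key=total)


# Hypothetical similarity scores between senses of "cold" and context concepts.
SIM = {
    ("cold_temperature", "weather"): 0.8,
    ("cold_temperature", "virus"): 0.1,
    ("cold_temperature", "cough"): 0.1,
    ("common_cold", "weather"): 0.2,
    ("common_cold", "virus"): 0.9,
    ("common_cold", "cough"): 0.7,
}


def table_sim(a, b):
    """Look up a precomputed score; unknown pairs default to 0.0."""
    return SIM.get((a, b), 0.0)
```

For example, `disambiguate(["cold_temperature", "common_cold"], ["virus", "cough"], table_sim)` selects `"common_cold"` because its summed score (1.6) exceeds that of the temperature sense (0.2).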
6.  Semantic Similarity and Relatedness between Clinical Terms: An Experimental Study 
Automated approaches to measuring semantic similarity and relatedness can provide necessary semantic context information for information retrieval applications and a number of fundamental natural language processing tasks including word sense disambiguation. Challenges for the development of these approaches include the limited availability of validated reference standards and the need for better understanding of the notions of semantic relatedness and similarity in medical vocabulary. We present results of a study in which eight medical residents were asked to judge 724 pairs of medical terms for semantic similarity and relatedness. The results of the study confirm the existence of a measurable mental representation of semantic relatedness between medical terms that is distinct from similarity and independent of the context in which the terms occur. This study produced a validated publicly available dataset for developing automated approaches to measuring semantic relatedness and similarity.
PMCID: PMC3041430  PMID: 21347043
7.  UMLS-Interface and UMLS-Similarity : Open Source Software for Measuring Paths and Semantic Similarity 
A number of computational measures for determining semantic similarity between pairs of biomedical concepts have been developed using various standards and programming platforms. In this paper, we introduce two new open-source frameworks based on the Unified Medical Language System (UMLS). These frameworks consist of the UMLS-Similarity and UMLS-Interface packages. UMLS-Interface provides path information about UMLS concepts. UMLS-Similarity calculates the semantic similarity between UMLS concepts using several previously developed measures and can be extended to include new measures. We validate the functionality of these frameworks by reproducing the results from previous work. Our frameworks constitute a significant contribution to the field of biomedical Natural Language Processing by providing a common development and testing platform for semantic similarity measures based on the UMLS.
PMCID: PMC2815481  PMID: 20351894
8.  Using UMLS Concept Unique Identifiers (CUIs) for Word Sense Disambiguation in the Biomedical Domain 
This paper explores the use of Concept Unique Identifiers (CUIs) as assigned by MetaMap as features for a supervised learning approach to word sense disambiguation of biomedical text. We compare the use of CUIs that occur in abstracts containing an instance of the target word with using the CUIs that occur in sentences containing an instance of the target word. We also experiment with frequency cutoffs for determining which CUIs should be included as features. We find that a Naive Bayesian classifier where the features represent CUIs that occur two or more times in abstracts containing the target word attains an accuracy 9% greater than Leroy and Rindflesch’s approach, which includes features based on semantic types assigned by MetaMap. Our results are comparable to those of Joshi et al. and Liu et al., who use feature sets that do not contain biomedical information.
PMCID: PMC2655788  PMID: 18693893
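The combination of a frequency cutoff and a Naive Bayes classifier can be sketched as follows. This is an illustrative sketch only: it assumes a multinomial model with add-one smoothing, which the abstract does not specify, and the CUI strings and sense labels are made-up placeholders rather than MetaMap output.

```python
import math
from collections import Counter, defaultdict


def train(instances, min_count=2):
    """Fit a Naive Bayes model on (list_of_cuis, sense) training instances."""
    freq = Counter(c for cuis, _ in instances for c in cuis)
    # Frequency cutoff: keep only CUIs seen at least min_count times.
    vocab = {c for c, n in freq.items() if n >= min_count}
    sense_counts = Counter(sense for _, sense in instances)
    feat_counts = defaultdict(Counter)
    for cuis, sense in instances:
        feat_counts[sense].update(c for c in cuis if c in vocab)
    return vocab, sense_counts, feat_counts


def predict(model, cuis):
    """Return the sense with the highest smoothed log-probability."""
    vocab, sense_counts, feat_counts = model
    n = sum(sense_counts.values())
    best, best_lp = None, -math.inf
    for sense, k in sense_counts.items():
        lp = math.log(k / n)  # log prior of the sense
        denom = sum(feat_counts[sense].values()) + len(vocab)
        for c in cuis:
            if c in vocab:  # CUIs below the cutoff are ignored
                lp += math.log((feat_counts[sense][c] + 1) / denom)
        if lp > best_lp:
            best, best_lp = sense, lp
    return best


# Hypothetical training data: placeholder CUIs and sense labels.
TRAIN = [
    (["C_A", "C_B"], "sense1"),
    (["C_A", "C_C"], "sense1"),
    (["C_D", "C_E"], "sense2"),
    (["C_D", "C_B"], "sense2"),
]
```

With `min_count=2`, the placeholders `C_C` and `C_E` each occur once and are dropped from the feature set, so only `C_A`, `C_B`, and `C_D` influence the prediction.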
9.  A Comparative Study of Supervised Learning as Applied to Acronym Expansion in Clinical Reports 
Electronic medical records (EMR) constitute a valuable resource of patient specific information and are increasingly used for clinical practice and research. Acronyms present a challenge to retrieving information from the EMR because many acronyms are ambiguous with respect to their full form. In this paper we perform a comparative study of supervised acronym disambiguation in a corpus of clinical notes, using three machine learning algorithms: the naïve Bayes classifier, decision trees and Support Vector Machines (SVMs). Our training features include part-of-speech tags, unigrams and bigrams in the context of the ambiguous acronym. We find that the combination of these feature types results in consistently better accuracy than when they are used individually, regardless of the learning algorithm employed. The accuracy of all three methods when using all features consistently approaches or exceeds 90%, even when the baseline majority classifier is below 50%.
PMCID: PMC1839635  PMID: 17238371
10.  Abbreviation and Acronym Disambiguation in Clinical Discourse 
Use of abbreviations and acronyms is pervasive in clinical reports despite many efforts to limit the use of ambiguous and unsanctioned abbreviations and acronyms. Because many abbreviations and acronyms are ambiguous with respect to their sense, complete and accurate text analysis is impossible without identifying the sense that was intended for a given abbreviation or acronym. We present the results of an experiment in which we used the Google API to harvest contextual data from the Internet for a set of 8 acronyms found in clinical notes at the Mayo Clinic. We then used the contexts to disambiguate the sense of abbreviations in a manually annotated corpus.
PMCID: PMC1560669  PMID: 16779108
