Using Empirically Constructed Lexical Resources for Named Entity Recognition
Biomedical Informatics Insights 2013;6(Suppl 1):17-27.
Because of privacy concerns and the expense of creating an annotated corpus, the existing small annotated corpora might not contain enough examples for a statistical learner to extract all named entities precisely. In this work, we evaluate the value of automatically generated features based on distributional semantics for machine-learning named entity recognition (NER). The features we generated and experimented with include n-nearest words, support vector machine (SVM)-regions, and term clustering, all of which are considered distributional semantic features. Adding the n-nearest words feature to a baseline system increased the F-score more than adding a manually constructed lexicon did. Although the need for relatively small annotated corpora for retraining is not obviated, lexicons empirically derived from unannotated text can not only supplement manually created lexicons but also replace them. We observe this when extracting concepts from both biomedical literature and clinical notes.
doi:10.4137/BII.S11664
PMCID: PMC3702195  PMID: 23847424
natural language processing; distributional semantics; concept extraction; named entity recognition; empirical lexical resources
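
The abstract does not spell out how the n-nearest words feature is computed, so the following is only a minimal sketch of the general idea, under stated assumptions: word vectors are plain co-occurrence counts within a fixed window, similarity is cosine, and the corpus, the window size, and the function names (`cooccurrence_vectors`, `n_nearest_words`, `token_features`) are all hypothetical. It is not the authors' implementation.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Build a sparse count vector for each word from its neighbours.

    Assumption: raw co-occurrence counts stand in for whatever
    distributional model the paper actually used.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def n_nearest_words(word, vectors, n=3):
    """Return the n words whose vectors are most similar to `word`."""
    if word not in vectors:
        return []
    target = vectors[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:n]]


def token_features(word, vectors, n=3):
    """Turn the nearest words into named features for an NER tagger."""
    neighbours = n_nearest_words(word, vectors, n)
    return {f"nearest_{rank}": w for rank, w in enumerate(neighbours, 1)}


# Hypothetical unannotated corpus, already tokenized.
corpus = [
    ["patient", "given", "aspirin", "daily"],
    ["patient", "given", "ibuprofen", "daily"],
    ["fever", "treated", "with", "aspirin"],
]
vectors = cooccurrence_vectors(corpus, window=2)
print(token_features("aspirin", vectors, n=3))
```

In a sequence tagger, a feature dictionary like the one returned by `token_features` would be merged with the usual lexical and orthographic features for each token, so that words unseen in the annotated corpus can still share evidence with distributionally similar words from the unannotated text.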
