AMIA Summits Transl Sci Proc. 2012; 2012: 38.
Published online 2012 March 19.
PMCID: PMC3392069
Feasibility of pooling annotated corpora for clinical concept extraction
Kavishwar Wagholikar, MBBS, PhD,1 Manabu Torii, PhD,2 Siddhartha Jonnalagadda, PhD,1 and Hongfang Liu, PhD1
1Mayo Clinic, Rochester, MN;
2Georgetown University, Washington, DC
Abstract
Availability of annotated corpora has facilitated application of machine learning algorithms to concept extraction from clinical notes. However, preparing annotated corpora at individual institutions is expensive, and pooling annotated corpora from other institutions is a potential solution. In this paper we investigate whether pooling corpora from two different sources can improve the performance and portability of the resulting machine learning taggers for medical problem detection. Specifically, we pool corpora from the 2010 i2b2/VA NLP challenge and Mayo Clinic Rochester to evaluate taggers for recognition of medical problems. Contrary to our expectations, pooling the corpora is found to decrease the F1-score. We examine the annotation guidelines to identify factors that make the corpora incompatible, and suggest that the clinical NLP community develop a standard annotation guideline to allow compatibility of annotated corpora.
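For readers unfamiliar with the metric, the following minimal sketch shows how an F1-score can be computed for concept extraction. It assumes exact-match scoring over (document, start, end) spans, which is a common convention but is not confirmed by this abstract as the evaluation protocol used in the study. The function name `f1_score` and the toy spans are hypothetical and serve only as illustration.

```python
def f1_score(gold, predicted):
    """Exact-match F1 over collections of (doc_id, start, end) spans.

    Assumption: a predicted span counts as correct only if it matches
    a gold span exactly; partial overlaps receive no credit.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
gold = {("note1", 0, 2), ("note1", 5, 6), ("note2", 3, 4)}
predicted = {("note1", 0, 2), ("note2", 3, 4), ("note2", 8, 9), ("note2", 10, 11)}
score = f1_score(gold, predicted)
```

Under this sketch, a tagger trained on a pooled corpus and one trained on a single corpus would each be scored against the same held-out gold spans, and the two scores compared.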
Articles from AMIA Summits on Translational Science Proceedings are provided here courtesy of the American Medical Informatics Association.