Results 1-4 (4)
 

1.  A flexible framework for recognizing events, temporal expressions, and temporal relations in clinical text 
Objective
To provide a natural language processing method for the automatic recognition of events, temporal expressions, and temporal relations in clinical records.
Materials and Methods
A combination of supervised, unsupervised, and rule-based methods was used. Supervised methods include conditional random fields and support vector machines. A flexible automated feature selection technique was used to select the best subset of features for each supervised task. Unsupervised methods include Brown clustering on several corpora, which makes the overall method semisupervised.
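One common way to feed Brown clusters into a supervised tagger is to use prefixes of each word's cluster bit string as features, so that words in nearby clusters share the shorter prefixes. The sketch below illustrates that idea only; the abstract does not say how the cluster information was encoded, and the mapping `BROWN_PATHS`, the function `brown_prefix_features`, and the prefix lengths are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical word -> Brown cluster bit-string mapping (illustrative values only).
BROWN_PATHS: Dict[str, str] = {
    "fever": "110100",
    "cough": "110101",
    "yesterday": "0111",
}


def brown_prefix_features(
    token: str,
    paths: Dict[str, str],
    lengths: Sequence[int] = (2, 4, 6),
) -> List[str]:
    """Return one feature per prefix length that fits the token's cluster path."""
    path = paths.get(token.lower())
    if path is None:
        # Words absent from the clustering get a single "unknown" feature.
        return ["brown=UNK"]
    # Skip prefix lengths longer than the path so short paths do not repeat.
    return [f"brown{n}={path[:n]}" for n in lengths if n <= len(path)]


fever_feats = brown_prefix_features("Fever", BROWN_PATHS)
cough_feats = brown_prefix_features("cough", BROWN_PATHS)
shared = sorted(set(fever_feats) & set(cough_feats))
print(fever_feats)
print(shared)
```

In this toy mapping, "fever" and "cough" differ only in the last bit, so they share the length-2 and length-4 prefix features and a classifier can generalize across them.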
Results
On the 2012 Informatics for Integrating Biology and the Bedside (i2b2) shared task data, we achieved an overall event F1-measure of 0.8045, an overall temporal expression F1-measure of 0.6154, an overall temporal link detection F1-measure of 0.5594, and an end-to-end temporal link detection F1-measure of 0.5258. The most competitive system was our event recognition method, which ranked third out of the 14 participants in the event task.
Discussion
Analysis reveals the event recognition method has difficulty determining which modifiers to include/exclude in the event span. The temporal expression recognition method requires significantly more normalization rules, although many of these rules apply only to a small number of cases. Finally, the temporal relation recognition method requires more advanced medical knowledge and could be improved by separating the single discourse relation classifier into multiple, more targeted component classifiers.
Conclusions
Recognizing events and temporal expressions can be achieved accurately by combining supervised and unsupervised methods, even when only minimal medical knowledge is available. Temporal normalization and temporal relation recognition, however, are far more dependent on the modeling of medical knowledge.
doi:10.1136/amiajnl-2013-001619
PMCID: PMC3756268  PMID: 23686936
Natural Language Processing; Clinical Informatics; Medical Records Systems, Computerized
2.  A supervised framework for resolving coreference in clinical records 
Objective
To provide a method for the automatic resolution of coreference between medical concepts in clinical records.
Materials and methods
A multiple-pass sieve approach utilizing support vector machines (SVMs) at each pass was used to resolve coreference. Information such as lexical similarity, recency of a concept mention, synonymy based on Wikipedia redirects, and local lexical context was used to inform the method. Results were evaluated using an unweighted average of the MUC, CEAF, and B3 coreference evaluation metrics. The datasets used in these experiments were made available through the 2011 i2b2/VA Shared Task on Coreference.
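The evaluation scheme described above can be sketched as follows. This is a minimal illustration, not the shared task's official scorer: it computes B3 from its standard per-mention definition and then takes an unweighted mean of three F scores. It assumes the key and response partition the same set of mentions, and the MUC and CEAF values and the example mentions are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Set, Tuple

Cluster = Set[str]


def _cluster_of(clusters: Iterable[Cluster]) -> Dict[str, Cluster]:
    """Map each mention to the cluster that contains it."""
    return {mention: set(cluster) for cluster in clusters for mention in cluster}


def b_cubed(key: List[Cluster], response: List[Cluster]) -> Tuple[float, float, float]:
    """B3 precision, recall, and F for two partitions of the same mentions."""
    key_of = _cluster_of(key)
    resp_of = _cluster_of(response)
    mentions = list(key_of)
    # Per-mention overlap between the gold cluster and the predicted cluster.
    overlap = {m: len(key_of[m] & resp_of[m]) for m in mentions}
    precision = sum(overlap[m] / len(resp_of[m]) for m in mentions) / len(mentions)
    recall = sum(overlap[m] / len(key_of[m]) for m in mentions) / len(mentions)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


def unweighted_average(f_scores: Iterable[float]) -> float:
    """Plain arithmetic mean of the per-metric F scores."""
    values = list(f_scores)
    return sum(values) / len(values)


key = [{"the patient", "she", "her"}, {"a mass"}]
response = [{"the patient", "she"}, {"her", "a mass"}]
p, r, f_b3 = b_cubed(key, response)

# Hypothetical MUC and CEAF F scores, for illustration only.
overall = unweighted_average([0.80, 0.70, f_b3])
print(round(p, 3), round(r, 3), round(f_b3, 3), round(overall, 3))
```

Because the reported F score is an average over metrics, it need not equal the harmonic mean of the averaged precision and recall.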
Results
The method achieved an average F score of 0.821 on the ODIE dataset, with a precision of 0.802 and a recall of 0.845. These results compare favorably to the best-performing system with a reported F score of 0.827 on the dataset and the median system F score of 0.800 among the eight teams that participated in the 2011 i2b2/VA Shared Task on Coreference. On the i2b2 dataset, the method achieved an average F score of 0.906, with a precision of 0.895 and a recall of 0.918 compared to the best F score of 0.915 and the median of 0.859 among the 16 participating teams.
Discussion
Post hoc analysis revealed significant performance degradation on pathology reports. The pathology reports were characterized by complex synonymy and very few patient mentions.
Conclusion
The use of several simple lexical matching methods had the most impact on achieving competitive performance on the task of coreference resolution. Moreover, the ability to detect patients in electronic medical records helped to improve coreference resolution more than other linguistic analysis.
doi:10.1136/amiajnl-2012-000810
PMCID: PMC3422838  PMID: 22610493
Natural language processing; clinical informatics; medical records systems, computerized; semantic relations; statistical learning; machine learning; predictive modeling; privacy technology
3.  A Machine Learning Approach for Identifying Anatomical Locations of Actionable Findings in Radiology Reports 
Recognizing the anatomical location of actionable findings in radiology reports is an important part of the communication of critical test results between caregivers. One difficulty in identifying these locations is that they are not always stated in a simple, easy-to-identify manner. Natural language processing techniques are capable of recognizing the relevant anatomical location by processing a diverse set of lexical and syntactic contexts that correspond to the various ways that radiologists represent spatial relations. We report a precision of 86.2%, recall of 85.9%, and F1-measure of 86.0% for extracting the anatomical site of an actionable finding. Additionally, we report a precision of 73.8%, recall of 69.8%, and F1-measure of 71.8% for extracting an additional anatomical site that grounds underspecified locations. These results are promising for identifying locations, while error analysis reveals challenges in certain contexts. Future work will focus on incorporating new forms of medical language processing to improve performance and on transitioning our method to new types of clinical data.
PMCID: PMC3540484  PMID: 23304352
4.  A flexible framework for deriving assertions from electronic medical records 
Objective
This paper describes natural-language-processing techniques for two tasks: identification of medical concepts in clinical text, and classification of assertions, which indicate the existence, absence, or uncertainty of a medical problem. Because so many resources are available for processing clinical texts, there is interest in developing a framework in which features derived from these resources can be optimally selected for the two tasks of interest.
Materials and methods
The authors used two machine-learning (ML) classifiers: support vector machines (SVMs) and conditional random fields (CRFs). Because SVMs and CRFs can operate on a large set of features extracted from both clinical texts and external resources, the authors address the following research question: Which features need to be selected for obtaining optimal results? To this end, the authors devise feature-selection techniques which greatly reduce the amount of manual experimentation and improve performance.
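To make the idea of automated feature selection concrete, here is a minimal sketch of one generic strategy, greedy forward selection. The abstract does not specify which selection algorithm the authors devised, so this is an assumed stand-in rather than their method; `greedy_forward_selection`, `toy_score`, and the feature names are hypothetical. In practice the scoring callback would train and cross-validate an SVM or CRF on the chosen features.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple


def greedy_forward_selection(
    candidates: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    min_gain: float = 1e-9,
) -> Tuple[Set[str], float]:
    """Add the single best feature per round until no feature improves the score."""
    selected: Set[str] = set()
    best = score(frozenset(selected))
    remaining = set(candidates)
    while remaining:
        # Score every one-feature extension of the current set.
        trial = {f: score(frozenset(selected | {f})) for f in sorted(remaining)}
        top = max(trial, key=trial.get)
        if trial[top] - best < min_gain:
            break  # no remaining feature helps enough
        selected.add(top)
        remaining.remove(top)
        best = trial[top]
    return selected, best


# Toy stand-in for cross-validated F-measure: additive per-feature contributions.
WEIGHTS = {"word": 0.5, "pos": 0.2, "brown": 0.1, "suffix": -0.05}


def toy_score(features: FrozenSet[str]) -> float:
    return sum(WEIGHTS[f] for f in features)


chosen, final_score = greedy_forward_selection(WEIGHTS, toy_score)
print(sorted(chosen), round(final_score, 2))
```

The search evaluates on the order of n^2 feature sets for n candidates instead of all 2^n subsets, which is the kind of saving in manual experimentation that automated selection aims for.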
Results
The authors evaluated their approaches on the 2010 i2b2/VA challenge data. Concept extraction achieves 79.59 micro F-measure. Assertion classification achieves 93.94 micro F-measure.
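For readers unfamiliar with micro-averaging, the sketch below shows the standard definition: true positives, false positives, and false negatives are pooled across classes before precision, recall, and F are computed. The class names and counts are hypothetical and are not taken from the challenge data.

```python
from typing import Dict, Tuple


def micro_f_measure(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F from per-class (true pos, false pos, false neg) counts."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correct
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-class counts, for illustration only.
example_counts = {
    "present": (90, 10, 5),
    "absent": (40, 5, 10),
    "possible": (10, 5, 5),
}
micro_f = micro_f_measure(example_counts)
print(micro_f)
```

Pooling counts means frequent classes dominate the score, in contrast to a macro average, which weights every class equally.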
Discussion
Approaching medical concept extraction and assertion classification through ML-based techniques has the advantage of easily adapting to new data sets and new medical informatics tasks. However, ML-based techniques perform best when optimal features are selected. By devising promising feature-selection techniques, the authors obtain results that outperform the current state of the art.
Conclusion
This paper presents two ML-based approaches for processing language in the clinical texts evaluated in the 2010 i2b2/VA challenge. Their novel feature-selection methods make the techniques presented in this paper unique among the i2b2 participants.
doi:10.1136/amiajnl-2011-000152
PMCID: PMC3168311  PMID: 21724741
Medical informatics; natural language processing; decision-support systems, clinical; machine learning; clinical informatics; biomedical informatics; pediatrics; e-prescribing; human factors
