Results 1-4 (4)
 

author:("drouin, Cyril")
1.  Combining an Expert-Based Medical Entity Recognizer to a Machine-Learning System: Methods and a Case Study 
Biomedical Informatics Insights  2013;6(Suppl 1):51-62.
Medical entity recognition is currently generally performed by data-driven methods based on supervised machine learning. Expert-based systems, where linguistic and domain expertise are directly provided to the system, are often combined with data-driven systems. We present here a case study where an existing expert-based medical entity recognition system, Ogmios, is combined with a data-driven system, Caramba, based on a linear-chain Conditional Random Field (CRF) classifier. Our case study specifically highlights the risk of overfitting incurred by an expert-based system. We observe that it prevents the combination of the 2 systems from obtaining improvements in precision, recall, or F-measure, and analyze the underlying mechanisms through a post-hoc feature-level analysis. Wrapping the expert-based system alone as attributes input to a CRF classifier does boost its F-measure from 0.603 to 0.710, bringing it on par with the data-driven system. The generalization of this method remains to be further investigated.
doi:10.4137/BII.S11770
PMCID: PMC3776026  PMID: 24052691
natural language processing; information extraction; medical records; machine learning; hybrid methods; overfitting
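The idea of wrapping an expert-based system as attributes for a CRF classifier can be illustrated with a minimal sketch. It assumes the expert system emits one label per token; the label names, the feature names, and the functions `token_features` and `sentence_features` are hypothetical and are not taken from Ogmios or Caramba. The output is a list of per-token feature dictionaries, a format that many CRF toolkits accept.

```python
from typing import Dict, List


def token_features(tokens: List[str], expert_labels: List[str], i: int) -> Dict[str, object]:
    """Build the feature dict for token i, including the expert system's label.

    The expert label is treated as just another attribute, so the CRF can
    learn when to trust it and when to override it.
    """
    word = tokens[i]
    feats: Dict[str, object] = {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
        "suffix3": word[-3:].lower(),
        "expert": expert_labels[i],  # hypothetical expert-system output
    }
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
        feats["prev_expert"] = expert_labels[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
        feats["next_expert"] = expert_labels[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str], expert_labels: List[str]) -> List[Dict[str, object]]:
    """Return one feature dict per token for a whole sentence."""
    if len(tokens) != len(expert_labels):
        raise ValueError("tokens and expert_labels must have the same length")
    return [token_features(tokens, expert_labels, i) for i in range(len(tokens))]


if __name__ == "__main__":
    toks = ["Patient", "denies", "chest", "pain"]
    expert = ["O", "O", "B-problem", "I-problem"]  # illustrative labels only
    for f in sentence_features(toks, expert):
        print(f)
```

Because the expert label is only one feature among several, a classifier trained on such input can also reproduce the expert system's errors if that system was tuned on the same training data, which is the overfitting risk the abstract describes.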
2.  Hybrid methods for improving information access in clinical documents: concept, assertion, and relation identification 
Objective
This paper describes the approaches the authors developed while participating in the i2b2/VA 2010 challenge to automatically extract medical concepts and annotate assertions on concepts and relations between concepts.
Design
The authors' approaches rely on both rule-based and machine-learning methods. Natural language processing is used to extract features from the input texts; these features are then used in the authors' machine-learning approaches. The authors used Conditional Random Fields for concept extraction, and Support Vector Machines for assertion and relation annotation. Depending on the task, the authors tested various combinations of rule-based and machine-learning methods.
Results
The authors' assertion annotation system obtained an F-measure of 0.931, ranking fifth out of 21 participants at the i2b2/VA 2010 challenge. The authors' relation annotation system ranked third out of 16 participants with a 0.709 F-measure. The 0.773 F-measure the authors obtained on concept extraction did not make it to the top 10.
Conclusion
On the one hand, the authors confirm that the use of only machine-learning methods is highly dependent on the annotated training data, and thus obtained better results for well-represented classes. On the other hand, the use of only a rule-based method was not sufficient to deal with new types of data. Finally, the use of hybrid approaches combining machine-learning and rule-based approaches yielded higher scores.
doi:10.1136/amiajnl-2011-000154
PMCID: PMC3168313  PMID: 21597105
NLP; controlled terminologies and vocabularies; discovery and text and data mining methods; natural language processing; automated learning; medical informatics
3.  Extracting medical information from narrative patient records: the case of medication-related information 
Objective
While essential for patient care, information related to medication is often written as free text in clinical records and, therefore, difficult to use in computerized systems. This paper describes an approach to automatically extract medication information from clinical records, which was developed to participate in the i2b2 2009 challenge, as well as different strategies to improve the extraction.
Design
Our approach relies on a semantic lexicon and extraction rules as a two-phase strategy: first, drug names are recognized and, then, the context of these names is explored to extract drug-related information (mode, dosage, etc) according to rules capturing the document structure and the syntax of each kind of information. Different configurations are tested to improve this baseline system along several dimensions, particularly drug name recognition—this step being a determining factor to extract drug-related information. Changes were tested at the level of the lexicons and of the extraction rules.
Results
The initial system participating in i2b2 achieved good results (global F-measure of 77%). Further testing of different configurations substantially improved the system (global F-measure of 81%), performing well for all types of information (eg, 84% for drug names and 88% for modes), except for durations and reasons, which remain problematic.
Conclusion
This study demonstrates that a simple rule-based system can achieve good performance on the medication extraction task. We also showed that controlled modifications (lexicon filtering and rule refinement) were the improvements that best raised the performance.
doi:10.1136/jamia.2010.003962
PMCID: PMC2995678  PMID: 20819863
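The two-phase strategy described in this abstract (recognize drug names from a lexicon, then explore their context with rules) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' system: the lexicon `DRUG_LEXICON`, the patterns `DOSAGE_RE` and `MODE_RE`, the character window, and the function `extract_medications` are all hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical toy lexicon; a real semantic lexicon would be far larger.
DRUG_LEXICON = {"aspirin", "metformin", "lisinopril"}

# Hypothetical context rules for two kinds of drug-related information.
DOSAGE_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml|units?)\b", re.IGNORECASE)
MODE_RE = re.compile(r"\b(po|iv|im|sc|orally|intravenously)\b", re.IGNORECASE)


def extract_medications(sentence: str, window: int = 40) -> List[Dict[str, Optional[str]]]:
    """Phase 1: find lexicon drug names. Phase 2: search the text that follows."""
    results: List[Dict[str, Optional[str]]] = []
    for match in re.finditer(r"[A-Za-z]+", sentence):
        if match.group().lower() not in DRUG_LEXICON:
            continue  # phase 1: only words found in the lexicon are drug names
        # Phase 2: look at a fixed window of characters after the drug name.
        context = sentence[match.end(): match.end() + window]
        dosage = DOSAGE_RE.search(context)
        mode = MODE_RE.search(context)
        results.append({
            "drug": match.group(),
            "dosage": dosage.group(0) if dosage else None,
            "mode": mode.group(1).lower() if mode else None,
        })
    return results


if __name__ == "__main__":
    for med in extract_medications("Started aspirin 81 mg po daily and metformin 500 mg."):
        print(med)
```

In this sketch every later step depends on the drug name being found first, which mirrors the abstract's remark that drug name recognition is a determining factor. The fixed window is a simplification: it can reach into the context of a following drug, which document-structure and syntax rules would avoid.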
4.  Automatic computation of CHA2DS2-VASc score: Information extraction from clinical texts for thromboembolism risk assessment 
The CHA2DS2-VASc score is a 10-point scale which allows cardiologists to easily identify potential stroke risk for patients with non-valvular atrial fibrillation. In this article, we present a system based on natural language processing (lexicon and linguistic modules), including negation and speculation handling, which extracts medical concepts from French clinical records and uses them as criteria to compute the CHA2DS2-VASc score. We evaluate this system by comparing its computed criteria with those obtained by human reading of the same clinical texts, and by assessing the impact of the observed differences on the resulting CHA2DS2-VASc scores. Given 21 patient records, 168 instances of criteria were computed, with an accuracy of 97.6%, and the accuracy of the 21 CHA2DS2-VASc scores was 85.7%. All differences in scores trigger the same alert, which means that system performance on this test set yields similar results to human reading of the texts.
PMCID: PMC3243195  PMID: 22195104
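Once the criteria have been extracted, computing the score itself is simple arithmetic. The sketch below uses the standard CHA2DS2-VASc weights (two points each for age 75 or over and for prior stroke, TIA, or thromboembolism; one point for each other criterion) and assumes the extraction step has already resolved each criterion to a value. The `Criteria` class and the `cha2ds2_vasc` function are hypothetical names, not part of the system described in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Criteria:
    """Criteria assumed to be already extracted from a patient record."""
    congestive_heart_failure: bool
    hypertension: bool
    age: int
    diabetes: bool
    stroke_tia_thromboembolism: bool
    vascular_disease: bool
    female: bool


def cha2ds2_vasc(c: Criteria) -> int:
    """Return the CHA2DS2-VASc score (0 to 9) for the given criteria."""
    score = 0
    score += 1 if c.congestive_heart_failure else 0    # C
    score += 1 if c.hypertension else 0                # H
    if c.age >= 75:                                    # A2
        score += 2
    elif c.age >= 65:                                  # A (65 to 74)
        score += 1
    score += 1 if c.diabetes else 0                    # D
    score += 2 if c.stroke_tia_thromboembolism else 0  # S2
    score += 1 if c.vascular_disease else 0            # V
    score += 1 if c.female else 0                      # Sc
    return score


if __name__ == "__main__":
    patient = Criteria(
        congestive_heart_failure=False,
        hypertension=True,
        age=78,
        diabetes=True,
        stroke_tia_thromboembolism=False,
        vascular_disease=False,
        female=True,
    )
    print(cha2ds2_vasc(patient))
```

The reported 168 criterion instances are consistent with eight criteria for each of the 21 records. Because a single misread criterion changes the total by one or two points, score-level accuracy can be lower than criterion-level accuracy, as in the figures the abstract reports.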
