Search tips
Search criteria

Results 1-5 (5)

Clipboard (0)

Select a Filter Below

more »
Year of Publication
Document Types
author:("drouin, Cyril")
1.  Eventual situations for timeline extraction from clinical reports 
To identify the temporal relations between clinical events and temporal expressions in clinical reports, as defined in the i2b2/VA 2012 challenge.
To detect clinical events, we used rules and Conditional Random Fields. We built Random Forest models to identify event modality and polarity. To identify temporal expressions we built on the HeidelTime system. To detect temporal relations, we systematically studied their breakdown into distinct situations; we designed an oracle method to determine the most prominent situations and the most suitable associated classifiers, and combined their results.
We achieved F-measures of 0.8307 for event identification, based on rules, and 0.8385 for temporal expression identification. In the temporal relation task, we identified nine main situations in three groups, experimentally confirming shared intuitions: within-sentence relations, section-related time, and across-sentence relations. Logistic regression and Naïve Bayes performed best on the first and third groups, and decision trees on the second. We reached a 0.6231 global F-measure, improving by 7.5 points our official submission.
Carefully hand-crafted rules obtained good results for the detection of events and temporal expressions, while a combination of classifiers improved temporal link prediction. The characterization of the oracle recall of situations allowed us to point at directions where further work would be most useful for temporal relation detection: within-sentence relations and linking History of Present Illness events to the admission date. We suggest that the systematic situation breakdown proposed in this paper could also help improve other systems addressing this task.
PMCID: PMC3756272  PMID: 23571851
Natural Language Processing; Information Extraction; Medical Records; Chronology as Topic; Text Mining
2.  Combining an Expert-Based Medical Entity Recognizer to a Machine-Learning System: Methods and a Case Study 
Biomedical Informatics Insights  2013;6(Suppl 1):51-62.
Medical entity recognition is currently generally performed by data-driven methods based on supervised machine learning. Expert-based systems, where linguistic and domain expertise are directly provided to the system are often combined with data-driven systems. We present here a case study where an existing expert-based medical entity recognition system, Ogmios, is combined with a data-driven system, Caramba, based on a linear-chain Conditional Random Field (CRF) classifier. Our case study specifically highlights the risk of overfitting incurred by an expert-based system. We observe that it prevents the combination of the 2 systems from obtaining improvements in precision, recall, or F-measure, and analyze the underlying mechanisms through a post-hoc feature-level analysis. Wrapping the expert-based system alone as attributes input to a CRF classifier does boost its F-measure from 0.603 to 0.710, bringing it on par with the data-driven system. The generalization of this method remains to be further investigated.
PMCID: PMC3776026  PMID: 24052691
natural language processing; information extraction; medical records; machine learning; hybrid methods; overfitting
3.  Hybrid methods for improving information access in clinical documents: concept, assertion, and relation identification 
This paper describes the approaches the authors developed while participating in the i2b2/VA 2010 challenge to automatically extract medical concepts and annotate assertions on concepts and relations between concepts.
The authors'approaches rely on both rule-based and machine-learning methods. Natural language processing is used to extract features from the input texts; these features are then used in the authors' machine-learning approaches. The authors used Conditional Random Fields for concept extraction, and Support Vector Machines for assertion and relation annotation. Depending on the task, the authors tested various combinations of rule-based and machine-learning methods.
The authors'assertion annotation system obtained an F-measure of 0.931, ranking fifth out of 21 participants at the i2b2/VA 2010 challenge. The authors' relation annotation system ranked third out of 16 participants with a 0.709 F-measure. The 0.773 F-measure the authors obtained on concept extraction did not make it to the top 10.
On the one hand, the authors confirm that the use of only machine-learning methods is highly dependent on the annotated training data, and thus obtained better results for well-represented classes. On the other hand, the use of only a rule-based method was not sufficient to deal with new types of data. Finally, the use of hybrid approaches combining machine-learning and rule-based approaches yielded higher scores.
PMCID: PMC3168313  PMID: 21597105
NLP; controlled terminologies and vocabularies; discovery and text and data mining methods; natural-language processing; automated learning; natural language processing; medical Informatics
4.  Extracting medical information from narrative patient records: the case of medication-related information 
While essential for patient care, information related to medication is often written as free text in clinical records and, therefore, difficult to use in computerized systems. This paper describes an approach to automatically extract medication information from clinical records, which was developed to participate in the i2b2 2009 challenge, as well as different strategies to improve the extraction.
Our approach relies on a semantic lexicon and extraction rules as a two-phase strategy: first, drug names are recognized and, then, the context of these names is explored to extract drug-related information (mode, dosage, etc) according to rules capturing the document structure and the syntax of each kind of information. Different configurations are tested to improve this baseline system along several dimensions, particularly drug name recognition—this step being a determining factor to extract drug-related information. Changes were tested at the level of the lexicons and of the extraction rules.
The initial system participating in i2b2 achieved good results (global F-measure of 77%). Further testing of different configurations substantially improved the system (global F-measure of 81%), performing well for all types of information (eg, 84% for drug names and 88% for modes), except for durations and reasons, which remain problematic.
This study demonstrates that a simple rule-based system can achieve good performance on the medication extraction task. We also showed that controlled modifications (lexicon filtering and rule refinement) were the improvements that best raised the performance.
PMCID: PMC2995678  PMID: 20819863
5.  Automatic computation of CHA2DS2-VASc score: Information extraction from clinical texts for thromboembolism risk assessment 
The CHA2DS2-VASc score is a 10-point scale which allows cardiologists to easily identify potential stroke risk for patients with non-valvular fibrillation. In this article, we present a system based on natural language processing (lexicon and linguistic modules), including negation and speculation handling, which extracts medical concepts from French clinical records and uses them as criteria to compute the CHA2DS2-VASc score. We evaluate this system by comparing its computed criteria with those obtained by human reading of the same clinical texts, and by assessing the impact of the observed differences on the resulting CHA2DS2-VASc scores. Given 21 patient records, 168 instances of criteria were computed, with an accuracy of 97.6%, and the accuracy of the 21 CHA2DS2-VASc scores was 85.7%. All differences in scores trigger the same alert, which means that system performance on this test set yields similar results to human reading of the texts.
PMCID: PMC3243195  PMID: 22195104

Results 1-5 (5)