
Results 1-8 (8)

1.  Automatically Classifying Question Types for Consumer Health Questions 
AMIA Annual Symposium Proceedings  2014;2014:1018-1027.
We present a method for automatically classifying consumer health questions. Our thirteen question types are designed to aid in the automatic retrieval of medical answers from consumer health resources. To our knowledge, this is the first machine learning-based method specifically for classifying consumer health questions. We demonstrate how previous approaches to medical question classification are insufficient to achieve high accuracy on this task. Additionally, we describe, manually annotate, and automatically classify three important question elements that improve question classification over previous techniques. Our results and analysis illustrate the difficulty of the task and the future directions that are necessary to achieve high-performing consumer health question classification.
PMCID: PMC4420005  PMID: 25954411
2.  A flexible framework for recognizing events, temporal expressions, and temporal relations in clinical text 
To provide a natural language processing method for the automatic recognition of events, temporal expressions, and temporal relations in clinical records.
Materials and Methods
A combination of supervised, unsupervised, and rule-based methods was used. Supervised methods include conditional random fields and support vector machines. A flexible automated feature selection technique was used to select the best subset of features for each supervised task. Unsupervised methods include Brown clustering on several corpora, which makes the overall method semisupervised.
On the 2012 Informatics for Integrating Biology and the Bedside (i2b2) shared task data, we achieved an overall event F1-measure of 0.8045, an overall temporal expression F1-measure of 0.6154, an overall temporal link detection F1-measure of 0.5594, and an end-to-end temporal link detection F1-measure of 0.5258. The most competitive system was our event recognition method, which ranked third out of the 14 participants in the event task.
Analysis reveals the event recognition method has difficulty determining which modifiers to include/exclude in the event span. The temporal expression recognition method requires significantly more normalization rules, although many of these rules apply only to a small number of cases. Finally, the temporal relation recognition method requires more advanced medical knowledge and could be improved by separating the single discourse relation classifier into multiple, more targeted component classifiers.
Recognizing events and temporal expressions can be achieved accurately by combining supervised and unsupervised methods, even when only minimal medical knowledge is available. Temporal normalization and temporal relation recognition, however, are far more dependent on the modeling of medical knowledge.
PMCID: PMC3756268  PMID: 23686936
Natural Language Processing; Clinical Informatics; Medical Records Systems, Computerized
3.  Encephalitis and antibodies to DPPX, a subunit of Kv4.2 potassium channels 
Annals of neurology  2012;73(1):120-128.
To report a novel cell-surface autoantigen of encephalitis that is a critical regulatory subunit of the Kv4.2 potassium channels.
Four patients with encephalitis of unclear etiology and antibodies with a similar pattern of neuropil brain immunostaining were selected for autoantigen characterization. Techniques included immunoprecipitation, mass spectrometry, cell-based experiments with Kv4.2 and several dipeptidyl-peptidase-like protein-6 (DPPX) plasmid constructs, and comparative brain immunostaining of wild-type and DPPX-null mice.
Immunoprecipitation studies identified DPPX as the target autoantigen. A cell-based assay confirmed that all 4 patients, but none of 210 controls, had DPPX antibodies. Symptoms included agitation, confusion, myoclonus, tremor, and seizures (one case with a prominent startle response). All patients had pleocytosis, and three had severe prodromal diarrhea of unknown etiology. Given that DPPX “tunes up” the Kv4.2 potassium channels (involved in somatodendritic signal integration and attenuation of dendritic backpropagation of action potentials), we determined the epitope distribution in DPPX, DPP10 (a protein homologous to DPPX), and Kv4.2. Patients’ antibodies were found to be specific for DPPX and did not react with DPP10 or Kv4.2. The unexplained diarrhea led us to demonstrate robust expression of DPPX in the myenteric plexus, which reacted strongly with patients’ antibodies. The course of neuropsychiatric symptoms was prolonged and often associated with relapses while immunotherapy was being decreased. Long-term follow-up showed substantial improvement in 3 patients (1 was lost to follow-up).
Antibodies to DPPX associate with a protracted encephalitis characterized by CNS hyperexcitability (agitation, myoclonus, tremor, seizures), pleocytosis, and frequent diarrhea at symptom onset. The disorder is potentially treatable with immunotherapy.
PMCID: PMC3563722  PMID: 23225603
Antibodies; encephalitis; autoimmune; DPP6; DPPX; potassium channels
4.  A supervised framework for resolving coreference in clinical records 
A method for the automatic resolution of coreference between medical concepts in clinical records.
Materials and methods
A multiple pass sieve approach utilizing support vector machines (SVMs) at each pass was used to resolve coreference. Information such as lexical similarity, recency of a concept mention, synonymy based on Wikipedia redirects, and local lexical context were used to inform the method. Results were evaluated using an unweighted average of MUC, CEAF, and B3 coreference evaluation metrics. The datasets used in these research experiments were made available through the 2011 i2b2/VA Shared Task on Coreference.
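The multiple-pass sieve idea described above can be illustrated with a minimal Python sketch. It is not the authors' system: the abstract says each pass uses an SVM, whereas this sketch swaps in two hand-written, hypothetical predicates (`exact_match` and `same_head_word`) so that it stays self-contained. What it does show is the control flow of a sieve: passes run in order, each pass scans candidate antecedents from most recent to oldest, and clusters formed by earlier passes are carried into later ones.

```python
from typing import Callable, Dict, List

# A pass decides whether two mentions corefer. In the paper each pass is an
# SVM; the predicates below are hypothetical stand-ins for illustration only.
PassFn = Callable[[str, str], bool]


def exact_match(a: str, b: str) -> bool:
    """High-precision pass: identical strings, ignoring case."""
    return a.lower() == b.lower()


def same_head_word(a: str, b: str) -> bool:
    """Looser pass: the last token of each (non-empty) mention is the same."""
    return a.lower().split()[-1] == b.lower().split()[-1]


def resolve(mentions: List[str], passes: List[PassFn]) -> List[List[str]]:
    """Cluster mentions by applying each pass in order (a sieve)."""
    parent = list(range(len(mentions)))  # union-find forest over mention indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for decide in passes:
        for j in range(len(mentions)):
            # Scan earlier mentions, nearest first, so recency is preferred.
            for i in range(j - 1, -1, -1):
                if find(i) != find(j) and decide(mentions[i], mentions[j]):
                    parent[find(j)] = find(i)  # merge j's cluster into i's
                    break

    clusters: Dict[int, List[str]] = {}
    for idx, mention in enumerate(mentions):
        clusters.setdefault(find(idx), []).append(mention)
    return list(clusters.values())


chains = resolve(
    ["chest pain", "Chest pain", "the pain", "aspirin"],
    [exact_match, same_head_word],
)
print(chains)
```

Ordering the passes from strictest to loosest means that the most reliable links are made first and looser passes only extend clusters that already exist.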
The method achieved an average F score of 0.821 on the ODIE dataset, with a precision of 0.802 and a recall of 0.845. These results compare favorably to the best-performing system with a reported F score of 0.827 on the dataset and the median system F score of 0.800 among the eight teams that participated in the 2011 i2b2/VA Shared Task on Coreference. On the i2b2 dataset, the method achieved an average F score of 0.906, with a precision of 0.895 and a recall of 0.918 compared to the best F score of 0.915 and the median of 0.859 among the 16 participating teams.
Post hoc analysis revealed significant performance degradation on pathology reports. The pathology reports were characterized by complex synonymy and very few patient mentions.
The use of several simple lexical matching methods had the most impact on achieving competitive performance on the task of coreference resolution. Moreover, the ability to detect patients in electronic medical records helped to improve coreference resolution more than other linguistic analysis.
PMCID: PMC3422838  PMID: 22610493
Natural language processing; clinical informatics; medical records systems; computerized; semantic relations; statistical learning; machine learning; predictive modeling; privacy technology
5.  Extracting Actionable Findings of Appendicitis from Radiology Reports Using Natural Language Processing  
Radiology reports often contain findings about the condition of a patient which should be acted upon quickly. These actionable findings in a radiology report can be automatically detected to ensure that the referring physician is notified about such findings and to provide feedback to the radiologist that further action has been taken. In this paper we investigate a method for detecting actionable findings of appendicitis in radiology reports. The method identifies both individual assertions regarding the presence of appendicitis and other findings related to appendicitis using syntactic dependency patterns. All relevant individual statements from a report are collectively considered to determine whether the report is consistent with appendicitis. Evaluation on a corpus of 400 radiology reports annotated by two expert radiologists showed that our approach achieves a precision of 91%, a recall of 83%, and an F1-measure of 87%.
PMCID: PMC3845763  PMID: 24303268
6.  A Machine Learning Approach for Identifying Anatomical Locations of Actionable Findings in Radiology Reports 
Recognizing the anatomical location of actionable findings in radiology reports is an important part of the communication of critical test results between caregivers. One of the difficulties of identifying anatomical locations of actionable findings stems from the fact that anatomical locations are not always stated in a simple, easy to identify manner. Natural language processing techniques are capable of recognizing the relevant anatomical location by processing a diverse set of lexical and syntactic contexts that correspond to the various ways that radiologists represent spatial relations. We report a precision of 86.2%, recall of 85.9%, and F1-measure of 86.0 for extracting the anatomical site of an actionable finding. Additionally, we report a precision of 73.8%, recall of 69.8%, and F1-measure of 71.8 for extracting an additional anatomical site that grounds underspecified locations. This demonstrates promising results for identifying locations, while error analysis reveals challenges under certain contexts. Future work will focus on incorporating new forms of medical language processing to improve performance and transitioning our method to new types of clinical data.
PMCID: PMC3540484  PMID: 23304352
7.  A flexible framework for deriving assertions from electronic medical records 
This paper describes natural-language-processing techniques for two tasks: identification of medical concepts in clinical text, and classification of assertions, which indicate the existence, absence, or uncertainty of a medical problem. Because so many resources are available for processing clinical texts, there is interest in developing a framework in which features derived from these resources can be optimally selected for the two tasks of interest.
Materials and methods
The authors used two machine-learning (ML) classifiers: support vector machines (SVMs) and conditional random fields (CRFs). Because SVMs and CRFs can operate on a large set of features extracted from both clinical texts and external resources, the authors address the following research question: Which features need to be selected for obtaining optimal results? To this end, the authors devise feature-selection techniques which greatly reduce the amount of manual experimentation and improve performance.
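One common way to automate the choice of features is greedy forward selection, sketched below in Python. The abstract does not specify which selection procedure the authors used, so this is an assumption made for illustration. The function `greedy_forward_selection`, the feature names, and `toy_score` are all hypothetical; in practice the scoring function would train a classifier on the candidate subset and return a held-out or cross-validated F-measure.

```python
from typing import Callable, FrozenSet, Set, Tuple


def greedy_forward_selection(
    candidates: Set[str],
    score: Callable[[FrozenSet[str]], float],
    min_gain: float = 1e-9,
) -> Tuple[Set[str], float]:
    """Add one feature at a time, keeping the one that improves the score most.

    Stops when no remaining feature improves the score by more than min_gain.
    """
    selected: Set[str] = set()
    best = score(frozenset(selected))
    while True:
        best_feature = None
        best_score = best
        for feature in sorted(candidates - selected):  # sorted for determinism
            trial = score(frozenset(selected | {feature}))
            if trial > best_score + min_gain:
                best_feature, best_score = feature, trial
        if best_feature is None:
            return selected, best
        selected.add(best_feature)
        best = best_score


def toy_score(features: FrozenSet[str]) -> float:
    """Hypothetical stand-in for a cross-validated F-measure."""
    gains = {"word": 0.50, "pos_tag": 0.10, "brown_cluster": 0.08, "noise": -0.02}
    total = sum(gains[f] for f in features)
    # Pretend these two features are redundant with each other.
    if {"pos_tag", "brown_cluster"} <= features:
        total -= 0.08
    return total


chosen, chosen_score = greedy_forward_selection(
    {"word", "pos_tag", "brown_cluster", "noise"}, toy_score
)
print(sorted(chosen), round(chosen_score, 2))
```

The procedure evaluates far fewer subsets than an exhaustive search, which is the kind of reduction in manual experimentation the abstract refers to, at the cost of possibly missing the globally best subset.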
The authors evaluated their approaches on the 2010 i2b2/VA challenge data. Concept extraction achieves 79.59 micro F-measure. Assertion classification achieves 93.94 micro F-measure.
Approaching medical concept extraction and assertion classification through ML-based techniques has the advantage of easily adapting to new data sets and new medical informatics tasks. However, ML-based techniques perform best when optimal features are selected. By devising promising feature-selection techniques, the authors obtain results that outperform the current state of the art.
This paper presents two ML-based approaches for processing language in the clinical texts evaluated in the 2010 i2b2/VA challenge. By using novel feature-selection methods, the techniques presented in this paper are unique among the i2b2 participants.
PMCID: PMC3168311  PMID: 21724741
Medical informatics; natural language processing; decision-support systems; clinical; machine learning; clinical informatics; biomedical informatics; pediatrics; e-prescribing; human factors; medical informatics
8.  Automatic extraction of relations between medical concepts in clinical texts 
A supervised machine learning approach to discover relations between medical problems, treatments, and tests mentioned in electronic medical records.
Materials and methods
A single support vector machine classifier was used to identify relations between concepts and to assign their semantic type. Several resources such as Wikipedia, WordNet, General Inquirer, and a relation similarity metric inform the classifier.
The techniques reported in this paper were evaluated in the 2010 i2b2 Challenge and obtained the highest F1 score for the relation extraction task. When gold standard data for concepts and assertions were available, F1 was 73.7, precision was 72.0, and recall was 75.3. F1 is defined as 2*Precision*Recall/(Precision+Recall). Alternatively, when concepts and assertions were discovered automatically, F1 was 48.4, precision was 57.6, and recall was 41.7.
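As a quick check of the F1 definition given above, the following Python sketch computes F1 from the reported precision and recall values. The function name `f1_score` is chosen here for illustration, and the zero-denominator guard is an added convention that the abstract does not mention.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2*P*R / (P + R)."""
    if precision + recall == 0:
        return 0.0  # convention for the degenerate case (an assumption)
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the two evaluation settings.
gold_f1 = f1_score(72.0, 75.3)  # gold-standard concepts and assertions
auto_f1 = f1_score(57.6, 41.7)  # automatically discovered concepts and assertions
print(f"{gold_f1:.1f} {auto_f1:.1f}")
```

The computed values come out close to the reported 73.7 and 48.4. A small difference in the last digit can arise because the published precision and recall are themselves rounded.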
Although a rich set of features was developed for the classifier presented in this paper, little knowledge mining was performed from medical ontologies such as those found in UMLS. Future studies should incorporate features extracted from such knowledge sources, which we expect to further improve the results. Moreover, each relation discovery was treated independently; joint classification of relations may further improve the quality of results. Joint learning of the discovery of concepts, assertions, and relations may likewise improve the results of automatic relation extraction.
Lexical and contextual features proved to be very important in relation extraction from medical texts. When they are not available to the classifier, the F1 score decreases by 3.7%. Similarly, when similarity-based features are not available, the F1 score decreases by 1.1%.
PMCID: PMC3168312  PMID: 21846787
Natural language processing; clinical informatics; medical records systems; computerized; semantic relations
