Search tips
Search criteria

Results 1-7 (7)

Clipboard (0)

Select a Filter Below

more »
Year of Publication
Document Types
1.  À la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge 
An analysis of the timing of events is critical for a deeper understanding of the course of events within a patient record. The 2012 i2b2 NLP challenge focused on the extraction of temporal relationships between concepts within textual hospital discharge summaries.
Materials and methods
The team from the National Research Council Canada (NRC) submitted three system runs to the second track of the challenge: typifying the time-relationship between pre-annotated entities. The NRC system was designed around four specialist modules containing statistical machine learning classifiers. Each specialist targeted distinct sets of relationships: local relationships, ‘sectime’-type relationships, non-local overlap-type relationships, and non-local causal relationships.
The best NRC submission achieved a precision of 0.7499, a recall of 0.6431, and an F1 score of 0.6924, resulting in a statistical tie for first place. Post hoc improvements led to a precision of 0.7537, a recall of 0.6455, and an F1 score of 0.6954, giving the highest scores reported on this task to date.
Discussion and conclusions
Methods for general relation extraction extended well to temporal relations, and gave top-ranked state-of-the-art results. Careful ordering of predictions within result sets proved critical to this success.
PMCID: PMC3756270  PMID: 23523875
information extraction; temporal reasoning; natural language processing; relation extraction; clinical text
2.  Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010 
As clinical text mining continues to mature, its potential as an enabling technology for innovations in patient care and clinical research is becoming a reality. A critical part of that process is rigid benchmark testing of natural language processing methods on realistic clinical narrative. In this paper, the authors describe the design and performance of three state-of-the-art text-mining applications from the National Research Council of Canada on evaluations within the 2010 i2b2 challenge.
The three systems perform three key steps in clinical information extraction: (1) extraction of medical problems, tests, and treatments, from discharge summaries and progress notes; (2) classification of assertions made on the medical problems; (3) classification of relations between medical concepts. Machine learning systems performed these tasks using large-dimensional bags of features, as derived from both the text itself and from external sources: UMLS, cTAKES, and Medline.
Performance was measured per subtask, using micro-averaged F-scores, as calculated by comparing system annotations with ground-truth annotations on a test set.
The systems ranked high among all submitted systems in the competition, with the following F-scores: concept extraction 0.8523 (ranked first); assertion detection 0.9362 (ranked first); relationship detection 0.7313 (ranked second).
For all tasks, we found that the introduction of a wide range of features was crucial to success. Importantly, our choice of machine learning algorithms allowed us to be versatile in our feature design, and to introduce a large number of features without overfitting and without encountering computing-resource bottlenecks.
PMCID: PMC3168309  PMID: 21565856
natural language processing; semantics; classification/*methods; computerized medical records systems; patient discharge/*statistics & numerical data; text mining; concept detection; relation extraction; document coding; machine learning; modeling physiologic and disease processes; linking the genotype and phenotype; identifying genome and protein structure and function; visualization of data and knowledge
3.  ExaCT: automatic extraction of clinical trial characteristics from journal publications 
Clinical trials are one of the most important sources of evidence for guiding evidence-based practice and the design of new trials. However, most of this information is available only in free text - e.g., in journal publications - which is labour intensive to process for systematic reviews, meta-analyses, and other evidence synthesis studies. This paper presents an automatic information extraction system, called ExaCT, that assists users with locating and extracting key trial characteristics (e.g., eligibility criteria, sample size, drug dosage, primary outcomes) from full-text journal articles reporting on randomized controlled trials (RCTs).
ExaCT consists of two parts: an information extraction (IE) engine that searches the article for text fragments that best describe the trial characteristics, and a web browser-based user interface that allows human reviewers to assess and modify the suggested selections. The IE engine uses a statistical text classifier to locate those sentences that have the highest probability of describing a trial characteristic. Then, the IE engine's second stage applies simple rules to these sentences to extract text fragments containing the target answer. The same approach is used for all 21 trial characteristics selected for this study.
We evaluated ExaCT using 50 previously unseen articles describing RCTs. The text classifier (first stage) was able to recover 88% of relevant sentences among its top five candidates (top5 recall) with the topmost candidate being relevant in 80% of cases (top1 precision). Precision and recall of the extraction rules (second stage) were 93% and 91%, respectively. Together, the two stages of the extraction engine were able to provide (partially) correct solutions in 992 out of 1050 test tasks (94%), with a majority of these (696) representing fully correct and complete answers.
Our experiments confirmed the applicability and efficacy of ExaCT. Furthermore, they demonstrated that combining a statistical method with 'weak' extraction rules can identify a variety of study characteristics. The system is flexible and can be extended to handle other characteristics and document types (e.g., study protocols).
PMCID: PMC2954855  PMID: 20920176
4.  Automated Information Extraction of Key Trial Design Elements from Clinical Trial Publications 
Clinical trials are one of the most valuable sources of scientific evidence for improving the practice of medicine. The Trial Bank project aims to improve structured access to trial findings by including formalized trial information into a knowledge base. Manually extracting trial information from published articles is costly, but automated information extraction techniques can assist. The current study highlights a single architecture to extract a wide array of information elements from full-text publications of randomized clinical trials (RCTs). This architecture combines a text classifier with a weak regular expression matcher. We tested this two-stage architecture on 88 RCT reports from 5 leading medical journals, extracting 23 elements of key trial information such as eligibility rules, sample size, intervention, and outcome names. Results prove this to be a promising avenue to help critical appraisers, systematic reviewers, and curators quickly identify key information elements in published RCT articles.
PMCID: PMC2655966  PMID: 18999067
5.  Identifying Wrist Fracture Patients with High Accuracy by Automatic Categorization of X-ray Reports 
The authors performed this study to determine the accuracy of several text classification methods to categorize wrist x-ray reports. We randomly sampled 751 textual wrist x-ray reports. Two expert reviewers rated the presence (n = 301) or absence (n = 450) of an acute fracture of wrist. We developed two information retrieval (IR) text classification methods and a machine learning method using a support vector machine (TC-1). In cross-validation on the derivation set (n = 493), TC-1 outperformed the two IR based methods and six benchmark classifiers, including Naive Bayes and a Neural Network. In the validation set (n = 258), TC-1 demonstrated consistent performance with 93.8% accuracy; 95.5% sensitivity; 92.9% specificity; and 87.5% positive predictive value. TC-1 was easy to implement and superior in performance to the other classification methods.
PMCID: PMC1656960  PMID: 16929046
6.  LitMiner: integration of library services within a bio-informatics application 
This paper examines how the adoption of a subject-specific library service has changed the way in which its users interact with a digital library. The LitMiner text-analysis application was developed to enable biologists to explore gene relationships in the published literature. The application features a suite of interfaces that enable users to search PubMed as well as local databases, to view document abstracts, to filter terms, to select gene name aliases, and to visualize the co-occurrences of genes in the literature. At each of these stages, LitMiner offers the functionality of a digital library. Documents that are accessible online are identified by an icon. Users can also order documents from their institution's library collection from within the application. In so doing, LitMiner aims to integrate digital library services into the research process of its users.
Case study
This integration of digital library services into the research process of biologists results in increased access to the published literature.
In order to make better use of their collections, digital libraries should customize their services to suit the research needs of their patrons.
PMCID: PMC1626482  PMID: 17052341
7.  PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine 
BMC Bioinformatics  2003;4:11.
The majority of experimentally verified molecular interaction and biological pathway data are present in the unstructured text of biomedical journal articles where they are inaccessible to computational methods. The Biomolecular interaction network database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable task-size of backfilling the database could be reduced by using Support Vector Machine technology to first locate interaction information in the literature. We present an information extraction system that was designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND.
Cross-validation estimated the support vector machine's test-set precision, accuracy and recall for classifying abstracts describing interaction information was 92%, 90% and 92% respectively. We estimated that the system would be able to recall up to 60% of all non-high throughput interactions present in another yeast-protein interaction database. Finally, this system was applied to a real-world curation problem and its use was found to reduce the task duration by 70% thus saving 176 days.
Machine learning methods are useful as tools to direct interaction and pathway database back-filling; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at . Current capabilities allow searching for human, mouse and yeast protein-interaction information.
PMCID: PMC153503  PMID: 12689350

Results 1-7 (7)