Automated text summarisers that find the best clinical evidence reported in collections of medical literature are of potential benefit for the practice of Evidence Based Medicine (EBM). Research and development of text summarisers for EBM, however, is impeded by the lack of corpora to train and test such systems.
To produce a corpus for research in EBM summarisation.
We sourced the “Clinical Inquiries” section of the Journal of Family Practice (JFP) and obtained a sizeable sample of questions and evidence based summaries. We further processed the summaries by combining automated techniques, human annotations, and crowdsourcing techniques to identify the PubMed IDs of the references.
The corpus has 456 questions, 1,396 answer components, 3,036 answer justifications, and 2,908 references.
The corpus is now available for the research community at http://sourceforge.net/projects/ebmsumcorpus.
Evidence Based Medicine; corpora; text summarisation; natural language processing.
We present a method for automated medical textbook and encyclopedia summarization. Using statistical sentence extraction and semantic relationships, we extract sentences from text returned as part of an existing textbook search (similar to a book index). Our system guides users to the information they desire by summarizing the content of each relevant chapter or section returned through the search. The summary is tailored to contain sentences that specifically address the user’s search terms. Our clustering method selects sentences that contain concepts specifically addressing the context of the query term in each of the returned sections. Our method examines conceptual relationships from the UMLS and selects clusters of concepts using Expectation Maximization (EM). Sentences associated with the concept clusters are shown to the user. We evaluated whether our extracted summary provides a suitable answer to the user’s question.
Evidence-Based Medicine (EBM) has become a popular approach to medical decision making and is increasingly part of undergraduate and postgraduate medical education. EBM follows four steps: 1. formulate a clear clinical question from a patient’s problem; 2. search the literature for relevant clinical articles; 3. evaluate (critically appraise) the evidence for its validity and usefulness; 4. implement useful findings into clinical practice. This review describes the concepts, terminology and skills taught to attendees at EBM courses, focusing specifically on the approach taken to diagnostic questions. It covers how to ask an answerable clinical question, search for evidence, construct diagnostic critically appraised topics (CATs), and use sensitivity, specificity, likelihood ratios, kappa and phi statistics. It familiarises readers with the lexicon and techniques of EBM and allows better understanding of the needs of EBM practitioners.
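The diagnostic statistics named above all derive from a 2x2 table of a test result against a reference standard. The following is a minimal Python sketch, not taken from any EBM course material: the class `TwoByTwo` and the function `cohen_kappa` are hypothetical names, and the counts are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a 2x2 table of a diagnostic test against a reference standard."""
    tp: int  # test positive, condition present
    fp: int  # test positive, condition absent
    fn: int  # test negative, condition present
    tn: int  # test negative, condition absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def lr_positive(self) -> float:
        # LR+ = sensitivity / (1 - specificity)
        return self.sensitivity() / (1 - self.specificity())

    def lr_negative(self) -> float:
        # LR- = (1 - sensitivity) / specificity
        return (1 - self.sensitivity()) / self.specificity()


def cohen_kappa(both_yes: int, a_yes_b_no: int, a_no_b_yes: int, both_no: int) -> float:
    """Cohen's kappa for two raters making the same yes/no judgement."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    observed = (both_yes + both_no) / n
    # Agreement expected by chance, computed from each rater's marginal "yes" rate.
    p_a = (both_yes + a_yes_b_no) / n
    p_b = (both_yes + a_no_b_yes) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)
```

With 90 true positives, 20 false positives, 10 false negatives and 80 true negatives, sensitivity is 0.9 and specificity 0.8. That gives LR+ = 0.9 / 0.2 = 4.5 and LR- = 0.1 / 0.8 = 0.125. For two raters who agree on 40 "yes" and 45 "no" cases out of 100, kappa corrects the observed 0.85 agreement for chance and comes to 0.7.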
While previous authors have emphasized the importance of integrating and reinforcing evidence-based medicine (EBM) skills in residency, there are few published examples of such curricula. We designed an EBM curriculum to train family practice interns in essential EBM skills for information mastery using clinical questions generated by the family practice inpatient service. We sought to evaluate the impact of this curriculum on interns, residents, and faculty.
Interns (n = 13) were asked to self-assess their level of confidence in basic EBM skills before and after their 2-week EBM rotation. Residents (n = 21) and faculty (n = 12) were asked to assess how often the answers provided by the EBM intern to the inpatient service changed medical care. In addition, residents were asked to report how often they used their EBM skills and how often EBM concepts and tools were used in teaching by senior residents and faculty. Faculty were asked if the EBM curriculum had increased their use of EBM in practice and in teaching.
Interns significantly increased their confidence over the course of the rotation. Residents and faculty felt that the answers provided by the EBM intern provided useful information and led to changes in patient care. Faculty reported incorporating EBM into their teaching (92%) and practice (75%). Residents reported applying the EBM skills they learned to patient care (86%) and that these skills were reinforced in the teaching they received outside of the rotation (81%). All residents and 11 of 12 faculty felt that the EBM curriculum had improved patient care.
To our knowledge, this is the first published EBM curriculum using an individual block rotation format. As such, it may provide an alternative model for teaching and incorporating EBM into a residency program.
Automatic text summarization for a biomedical concept can help researchers efficiently extract the key points of a given topic from a large body of biomedical literature. In this paper, we present a method for generating a text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach has three stages: 1) we extract semantic relations in each sentence using the semantic knowledge representation tool SemRep; 2) we develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation; 3) for relations in the relevant set, we use an information retrieval based method to extract informative sentences that interpret them from the document collection and assemble these into the summary. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves performance and that our results are better than those of the MEAD system, a well-known tool for text summarization.
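The relation-then-sentence pipeline can be illustrated with a toy version. This sketch assumes the relations have already been extracted as simple subject-predicate-object triples; it does not reproduce SemRep's real output format. The ranking rule (count how many sentences assert each relation, then keep the shortest supporting sentence) is a hypothetical stand-in for the paper's retrieval methods, and the names `Relation` and `summarize_concept` are invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Relation(NamedTuple):
    """A subject-predicate-object triple (a simplified stand-in for SemRep output)."""
    subject: str
    predicate: str
    obj: str


def summarize_concept(query: str, extracted: list, top_k: int = 2) -> list:
    """Pick one supporting sentence for each of the top_k relations involving `query`.

    `extracted` is a list of (sentence, [Relation, ...]) pairs. Relations are ranked by
    how many sentences assert them, an illustrative heuristic rather than the
    paper's relation-level retrieval method.
    """
    support = defaultdict(list)
    for sentence, relations in extracted:
        for rel in relations:
            if query in (rel.subject, rel.obj):
                support[rel].append(sentence)
    # Most frequently asserted relations first; ties broken by the triple itself.
    ranked = sorted(support.items(), key=lambda item: (-len(item[1]), item[0]))
    summary = []
    for rel, sentences in ranked[:top_k]:
        best = min(sentences, key=len)  # shortest sentence as the most concise evidence
        if best not in summary:
            summary.append(best)
    return summary
```

A relation such as (Oseltamivir, TREATS, H1N1) that several sentences assert outranks one asserted only once. The summary is a short list of sentences, one per selected relation.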
Lately, there has been great interest in the application of information extraction methods to the biomedical domain, in particular to the extraction of relationships between genes, proteins, and RNA from scientific publications. The development and evaluation of such methods requires annotated domain corpora.
We present BioInfer (Bio Information Extraction Resource), a new public resource providing an annotated corpus of biomedical English. We describe an annotation scheme capturing named entities and their relationships along with a dependency analysis of sentence syntax. We further present ontologies defining the types of entities and relationships annotated in the corpus. Currently, the corpus contains 1100 sentences from abstracts of biomedical research articles annotated for relationships, named entities, as well as syntactic dependencies. Supporting software is provided with the corpus. The corpus is unique in the domain in combining these annotation types for a single set of sentences, and in the level of detail of the relationship annotation.
We introduce a corpus targeted at protein, gene, and RNA relationships which serves as a resource for the development of information extraction systems and their components such as parsers and domain analyzers. The corpus will be maintained and further developed with a current version being available at .
A method for automatic extraction of clinical temporal information would be of significant practical importance for deep medical language understanding, and a key to creating many successful applications, such as medical decision making and medical question answering. This paper proposes a rich statistical model for extracting temporal information from an extremely noisy clinical corpus. Besides the common linguistic, contextual and semantic features, highly restricted training sample expansion and the structural distance between a temporal expression and its related event expressions are also integrated into a supervised machine-learning approach. The learning method achieves almost 80% F-score in the extraction of five temporal classes, and nearly 75% F-score in identifying temporally related events. This process has been integrated into the document-processing component of an implemented clinical question answering system that focuses on answering patient-specific questions (See demonstration at http://hitrl.cs.usyd.edu.au/ICNS/).
Physicians have many questions when caring for patients, and frequently need to seek answers for their questions. Information retrieval systems (e.g., PubMed) typically return a list of documents in response to a user’s query. Frequently the number of returned documents is large and makes physicians’ information seeking “practical only ‘after hours’ and not in the clinical settings”. Question answering techniques are based on automatically analyzing thousands of electronic documents to generate short-text answers in response to clinical questions that are posed by physicians. We address physicians’ information needs and describe the design, implementation, and evaluation of the medical question answering system (MedQA). Although our long-term goal is to enable MedQA to answer all types of medical questions, we currently implement MedQA to integrate information retrieval, extraction, and summarization techniques to automatically generate paragraph-level text for definitional questions (i.e., “What is X?”). MedQA can be accessed at http://www.dbmi.columbia.edu/~yuh9001/research/MedQA.html.
The practice of evidence-based medicine (EBM) requires clinicians to integrate their expertise with the latest scientific research. But this is becoming increasingly difficult with the growing number of published articles. There is a clear need for better tools to improve clinicians' ability to search the primary literature. Randomized clinical trials (RCTs) are the most reliable source of evidence documenting the efficacy of treatment options. This paper describes the retrieval of key sentences from abstracts of RCTs as a step towards helping users find relevant facts about the experimental design of clinical studies.
Using Conditional Random Fields (CRFs), a popular and successful method for natural language processing problems, sentences referring to Intervention, Participants and Outcome Measures are automatically categorized. This is done by extending a previous approach for labeling sentences in an abstract for general categories associated with scientific argumentation or rhetorical roles: Aim, Method, Results and Conclusion. Methods are tested on several corpora of RCT abstracts. First, structured abstracts with headings that explicitly indicate Intervention, Participant and Outcome Measures are used. In addition, a manually annotated corpus of structured and unstructured abstracts is prepared for testing a classifier that identifies sentences belonging to each category.
Using CRFs, sentences can be labeled for the four rhetorical roles with F-scores of 0.93–0.98. This outperforms the use of Support Vector Machines. Furthermore, sentences can be automatically labeled for Intervention, Participant and Outcome Measures in unstructured abstracts, and in structured abstracts whose section headings do not explicitly indicate these three topics. F-scores of up to 0.83 and 0.84 are obtained for Intervention and Outcome Measure sentences, respectively.
Results indicate that some of the methodological elements of RCTs are identifiable at the sentence level in both structured and unstructured abstract reports. This is promising in that sentences labeled automatically could potentially form concise summaries, assist in information retrieval and finer-grained extraction.
Given a set of pre-defined medical categories used in Evidence Based Medicine, we aim to automatically annotate sentences in medical abstracts with these labels.
We constructed a corpus of 1,000 medical abstracts annotated by hand with specified medical categories (e.g. Intervention, Outcome). We explored the use of various features based on lexical, semantic, structural, and sequential information in the data, using Conditional Random Fields (CRF) for classification.
For the classification tasks over all labels, our systems achieved micro-averaged f-scores of 80.9% and 66.9% over datasets of structured and unstructured abstracts respectively, using sequential features. In labeling only the key sentences, our systems produced f-scores of 89.3% and 74.0% over structured and unstructured abstracts respectively, using the same sequential features. The results over an external dataset were lower (f-scores of 63.1% for all labels, and 83.8% for key sentences).
Of the features we used, the best for classifying any given sentence in an abstract were based on unigrams, section headings, and sequential information from preceding sentences. These features resulted in improved performance over a simple bag-of-words approach, and outperformed feature sets used in previous work.
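Micro-averaging, as in the f-scores reported above, pools the counts across all labels before computing precision and recall, so frequent labels weigh more than rare ones. The following minimal sketch assumes each sentence carries a set of zero or more labels. The function name `micro_f1` is hypothetical, and this is not the authors' evaluation script.

```python
def micro_f1(gold: list[set[str]], predicted: list[set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives over all labels.

    Each element is the set of labels assigned to one sentence, so a sentence may
    carry more than one label.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same sentences")
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # labels present in both
        fp += len(p - g)  # predicted but absent from the gold standard
        fn += len(g - p)  # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

Consider four sentences with gold labels {Intervention}, {Outcome}, {Population, Study Design} and {Background}, and predictions {Intervention}, {Background}, {Population} and {Background}. These give 3 true positives, 1 false positive and 2 false negatives, so precision is 0.75, recall is 0.6, and micro-F1 is 2/3.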
The discipline of Evidence Based Medicine (EBM) studies formal and quasi-formal methods for identifying high quality medical information and abstracting it in useful forms so that patients receive the best customized care possible. Current computer-based methods for finding high quality information in PubMed and similar bibliographic resources utilize search tools that employ preconstructed Boolean queries. These clinical queries are derived from a combined application of (a) user interviews, (b) ad-hoc manual document quality review, and (c) search over a constrained space of disjunctive Boolean queries. The present research explores the use of powerful text categorization (machine learning) methods to identify content-specific and high-quality PubMed articles. Our results show that models built with the proposed approach outperform the Boolean based PubMed clinical query filters in discriminatory power.
Access to health information by consumers is hampered by a fundamental language gap. Current attempts to close the gap leverage consumer oriented health information, which does not, however, have good coverage of slang medical terminology. In this paper, we present a Bayesian model to automatically align documents with different dialects (slang, common and technical) while extracting their semantic topics. The proposed diaTM model enables effective information retrieval, even when the query contains slang words, by explicitly modeling the mixtures of dialects in documents and the joint influence of dialects and topics on word selection. Simulations using consumer questions to retrieve medical information from a corpus of medical documents show that diaTM achieves a 25% improvement in information retrieval relevance by nDCG@5 over an LDA baseline.
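nDCG@5 scores only the top five retrieved documents. Each graded relevance judgement is discounted by the log of its rank, and the total is divided by the score of an ideally ordered list, so a perfect ranking scores 1. The following is a minimal sketch using the linear-gain variant. Some evaluations use the exponential gain 2^rel - 1 instead, and the abstract does not say which variant diaTM was evaluated with. The function names are hypothetical.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain: graded relevance divided by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: list[float], k: int = 5) -> float:
    """DCG of the system ranking divided by the DCG of the ideal (sorted) ranking.

    Simplification: the ideal ranking is built from the judged relevances of the
    retrieved list only, not from every judged document in the collection.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A relative improvement such as the reported 25% would then be (nDCG@5 of diaTM - nDCG@5 of LDA) / nDCG@5 of LDA, averaged over the consumer queries.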
Teaching of evidence-based medicine (EBM) has become widespread in medical education. Teaching the teachers (TTT) courses address the increased teaching demand and the need to improve the effectiveness of EBM teaching. We conducted a systematic review of assessment tools for EBM TTT courses, with the aim of summarising and appraising existing assessment methods for such courses.
We searched PubMed, BioMed, EmBase, Cochrane and Eric databases without language restrictions and included articles that assessed their participants. Study selection and data extraction were conducted independently by two reviewers.
Of 1230 potentially relevant studies, five papers met the selection criteria. There were no specific assessment tools for evaluating effectiveness of EBM TTT courses. Some of the material available might be useful in initiating the development of such an assessment tool.
There is a need for the development of educationally sound assessment tools for teaching the teachers courses in EBM, without which it would be impossible to ascertain if such courses have the desired effect.
To answer five research questions: Do Norwegian physicians know about the three important aspects of EBM? Do they use EBM methods in their clinical practice? What are their attitudes towards EBM? Has EBM in their opinion changed medical practice during the last 10 years? Do they use EBM based information sources?
Cross sectional survey in 2006.
966 doctors who responded to a questionnaire (70% response rate).
In total 87% of the physicians mentioned the use of randomised clinical trials as a key aspect of EBM, while 53% of them mentioned use of clinical expertise and only 19% patients' values. 40% of the respondents reported that their practice had always been evidence-based. Many respondents experienced difficulties in using EBM principles in their clinical practice because of lack of time and difficulties in searching EBM based literature. 80% agreed that EBM helps physicians towards better practice and 52% that it improves patients' health. As reasons for changes in medical practice 86% of respondents mentioned medical progress, but only 39% EBM.
The results of the study indicate that Norwegian physicians have limited knowledge of the key aspects of EBM but a positive attitude towards the concept. They had limited experience in the practice of EBM and were rather indifferent to the impact of EBM on medical practice. For solving a patient problem, physicians would rather consult a colleague than search evidence-based resources such as the Cochrane Library.
Constructing an answerable question and effectively searching the medical literature are key steps in practicing evidence-based medicine (EBM). This study aimed to identify the effectiveness of delivering a single workshop in EBM literature searching skills to medical students entering their first clinical years of study.
A randomized controlled trial was conducted with third-year undergraduate medical students. Participants were randomized to participate in a formal workshop in EBM literature searching skills, with EBM literature searching skills and perceived competency in EBM measured at one-week post-intervention via the Fresno tool and Clinical Effectiveness and Evidence-Based Practice Questionnaire.
A total of 121 participants were enrolled in the study, with 97 followed up post-intervention. There was no statistically significant difference in EBM literature searching skills between the two groups (mean difference = 0.007, P = 0.99). Students attending the EBM workshop were significantly more confident in their ability to construct clinical questions and had greater perceived awareness of information resources.
A single EBM workshop did not result in statistically significant changes in literature searching skills. Teaching and reinforcing EBM literature searching skills during both preclinical and clinical years may result in increased student confidence, which may facilitate student use of EBM skills as future clinicians.
Background and objectives
Creating a complete translation of a large vocabulary is a time-consuming task that requires skilled and knowledgeable medical translators. Our goal is to examine to what extent such a task can be alleviated by a specific natural language processing technique, word alignment in parallel corpora. We experiment with translation from English to French.
We built a large corpus of parallel English-French documents and automatically aligned it at the document, sentence and word levels using state-of-the-art alignment methods and tools. We then projected English terms from existing controlled vocabularies onto the aligned word pairs and examined the number and quality of the putative French translations obtained. We considered three American vocabularies present in the UMLS, each with a different translation status: the MeSH, SNOMED CT, and the MedlinePlus Health Topics.
We obtained several thousand new translations of our input terms, this number being closely linked to the number of terms in the input vocabularies.
Our study shows that alignment methods can extract a number of new term translations from large bodies of text with a moderate human reviewing effort, and can thus help a human translator obtain better translation coverage of an input vocabulary. Short-term perspectives include applying them to a corpus 20 times larger than the one used here, together with more focused methods for term extraction.
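Projecting a vocabulary term through word alignments amounts to finding each occurrence of the English term in an aligned sentence pair, collecting the French positions linked to it, and counting the French strings obtained. The sketch below assumes the alignments are already available as (English index, French index) pairs. It also takes the French span from the first to the last aligned position, which is a simplifying heuristic of this sketch and not necessarily the authors' method. `find_spans` and `project_terms` are hypothetical names.

```python
from collections import Counter, defaultdict


def find_spans(tokens: list[str], term: list[str]):
    """Yield (start, end) for every contiguous occurrence of `term` in `tokens`."""
    n = len(term)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == term:
            yield i, i + n


def project_terms(vocabulary: list[tuple], aligned_corpus: list) -> dict:
    """Count candidate French translations for each English vocabulary term.

    vocabulary: English terms as token tuples, e.g. ("renal", "failure").
    aligned_corpus: (english_tokens, french_tokens, links) triples, where links is a
    set of (english_index, french_index) word-alignment pairs (assumed input format).
    """
    candidates = defaultdict(Counter)
    for en, fr, links in aligned_corpus:
        for term in vocabulary:
            for start, end in find_spans(en, list(term)):
                targets = sorted(j for i, j in links if start <= i < end)
                if not targets:
                    continue  # unaligned occurrence: no candidate translation
                # Heuristic: take the French span covering all aligned positions.
                candidates[term][" ".join(fr[targets[0]:targets[-1] + 1])] += 1
    return candidates
```

The most frequent candidates for each term would then go to a human reviewer, in line with the moderate reviewing effort described above.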
To develop an automated system to extract medications and related information from discharge summaries as part of the 2009 i2b2 natural language processing (NLP) challenge. This task required accurate recognition of medication name, dosage, mode, frequency, duration, and reason for drug administration.
We developed an integrated system using several existing NLP components developed at Vanderbilt University Medical Center, which included MedEx (to extract medication information), SecTag (a section identification system for clinical notes), a sentence splitter, and a spell checker for drug names. Our goal was to achieve good performance with minimal to no specific training for this document corpus; thus, evaluating the portability of those NLP tools beyond their home institution. The integrated system was developed using 17 notes that were annotated by the organizers and evaluated using 251 notes that were annotated by participating teams.
The i2b2 challenge used standard measures, including precision, recall, and F-measure, to evaluate the performance of participating systems. There were two ways to determine whether an extracted textual finding is correct or not: exact matching or inexact matching. The overall performance for all six types of medication-related findings across 251 annotated notes was considered as the primary metric in the challenge.
Our system achieved an overall F-measure of 0.821 for exact matching (0.839 precision; 0.803 recall) and 0.822 for inexact matching (0.866 precision; 0.782 recall). The system ranked second out of 20 participating teams on overall performance at extracting medications and related information.
The results show that the existing MedEx system, together with other NLP components, can extract medication information in clinical text from institutions other than the site of algorithm development with reasonable performance.
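Exact and inexact matching differ only in when a system finding counts as a hit: identical character offsets, or any overlap with a gold finding of the same type. The sketch below is a simplified reading of those two criteria and not the official i2b2 scorer. Findings are assumed to be (start, end, field) tuples, and every name in it is hypothetical. The F-measure it uses is the harmonic mean of precision and recall, which reproduces the reported figures: 2(0.839)(0.803)/(0.839 + 0.803) ≈ 0.821 and 2(0.866)(0.782)/(0.866 + 0.782) ≈ 0.822.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def overlaps(a: tuple, b: tuple) -> bool:
    """True if two (start, end, ...) spans share at least one character position."""
    return a[0] < b[1] and b[0] < a[1]


def evaluate(gold: list, system: list, exact: bool = True) -> tuple:
    """Precision, recall and F for (start, end, field) findings, e.g. field = "dosage".

    Each gold finding can be matched at most once (greedy, in system order).
    """
    unmatched = list(gold)
    tp = 0
    for found in system:
        for g in unmatched:
            same_span = (g[0], g[1]) == (found[0], found[1])
            if g[2] == found[2] and (same_span if exact else overlaps(g, found)):
                unmatched.remove(g)  # safe: we stop iterating right after removal
                tp += 1
                break
    precision = tp / len(system) if system else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)
```

Under this reading, a dosage span that is two characters too short fails exact matching but passes inexact matching. Inexact scores are therefore never lower than exact scores on the same output.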
BioSimplify is an open source tool written in Java that introduces and facilitates the use of a novel model for sentence simplification tuned for automatic discourse analysis and information extraction (as opposed to sentence simplification for improving human readability). The model is based on a “shot-gun” approach that produces many different (simpler) versions of the original sentence by combining variants of its constituent elements. This tool is optimized for processing biomedical scientific literature such as the abstracts indexed in PubMed. We tested the tool's impact on protein-protein interaction (PPI) extraction: it improved the F-score of the PPI tool by around 7%, with an improvement in recall of around 20%. The BioSimplify tool and test corpus can be downloaded from https://biosimplify.sourceforge.net
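The "shot-gun" idea of combining variants of a sentence's constituents can be pictured as a Cartesian product over per-constituent alternatives. This Python sketch is only an illustration of that combinatorial step, not BioSimplify's Java implementation. It assumes the constituents and their simpler alternatives (including the option of dropping a constituent, written as an empty string) have already been identified. The function name `shotgun_variants` is hypothetical.

```python
from itertools import product


def shotgun_variants(constituents: list[list[str]]):
    """Yield every distinct sentence built by picking one alternative per constituent.

    constituents: for each constituent, a list of alternative renderings; an empty
    string means the constituent may be dropped (assumed representation).
    """
    seen = set()
    for choice in product(*constituents):
        sentence = " ".join(part for part in choice if part)
        if sentence and sentence not in seen:
            seen.add(sentence)
            yield sentence
```

For example, alternatives ["Phosphorylated p53", "p53"], ["directly binds", "binds"], ["MDM2"] and ["in vitro", ""] produce eight variants, from the full original down to "p53 binds MDM2". A downstream PPI extractor can then be run over every variant.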
The increasing availability of full-text biomedical articles will allow more biomedical knowledge to be extracted automatically with greater reliability. However, most Information Retrieval (IR) and Extraction (IE) tools currently process only abstracts. The lack of corpora has limited the development of tools that are capable of exploiting the knowledge in full-text articles. As a result, there has been little investigation into the advantages of full-text document structure, and the challenges developers will face in processing full-text articles.
We manually annotated passages from full-text articles that describe interactions summarised in a Molecular Interaction Map (MIM). Our corpus tracks the process of identifying facts to form the MIM summaries and captures any factual dependencies that must be resolved to extract the fact completely. For example, a fact in the results section may require a synonym defined in the introduction. The passages are also annotated with negated and coreference expressions that must be resolved.
We describe the guidelines for identifying relevant passages and possible dependencies. The corpus includes 2162 sentences from 78 full-text articles. Our corpus analysis demonstrates the necessity of full-text processing; identifies the article sections where interactions are most commonly stated; and quantifies the proportion of interaction statements requiring coherent dependencies. Further, it allows us to report on the relative importance of identifying synonyms and resolving negated expressions. We also experiment with an oracle sentence retrieval system using the corpus as a gold-standard evaluation set.
We introduce the MIM corpus, a unique resource that maps interaction facts in a MIM to annotated passages within full-text articles. It is an invaluable case study providing guidance to developers of biomedical IR and IE systems, and can be used as a gold-standard evaluation set for full-text IR tasks.
An efficient clinical process guideline (CPG) modeling service was designed that uses an enhanced intelligent search protocol. The need for a search system arises from the requirement for CPG models to be able to adapt to dynamic patient contexts, allowing them to be updated based on new evidence that arises from medical guidelines and papers.
A sentence category classifier combined with the AdaBoost.M1 algorithm was used to evaluate the contribution of the CPG to the quality of the search mechanism. Three annotators each tagged 340 sentences hand-chosen from the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure (JNC7) clinical guideline. The three annotators then carried out cross-validations of the tagged corpus. A transformation function is also used that extracts a predefined set of structural feature vectors determined by analyzing the sentential instance in terms of the underlying syntactic structures and phrase-level co-occurrences that lie beneath the surface of the lexical generation event.
The additional sub-filtering using a combination of multi-classifiers was found to be more effective than a single conventional Term Frequency-Inverse Document Frequency (TF-IDF)-based search system in pinpointing the page containing or adjacent to the guideline information.
We found that transformation has the advantage of exploiting the structural and underlying features which go unseen by the bag-of-words (BOW) model. We also realized that integrating a sentential classifier with a TF-IDF-based search engine enhances the search process by maximizing the probability of the automatically presented relevant information required in the context generated by the guideline authoring environment.
Knowledge Bases; Data Mining; Natural Language Processing
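A TF-IDF search with an additional classifier-based sub-filter can be sketched in a few lines. Pages are ranked by cosine similarity between TF-IDF vectors, and a predicate decides which pages remain eligible. In this sketch the predicate is a hypothetical stand-in for the sentence category classifier (it is not the AdaBoost.M1 model), the weighting is plain raw tf times log(N/df), and all names are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]):
    """Return a TF-IDF weight dict per tokenised document, plus document frequencies."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]
    return vectors, df


def rank_pages(pages: list[list[str]], query: list[str], keep=None) -> list[int]:
    """Return page indices sorted by cosine similarity to the query.

    keep: optional predicate on a page index, a hypothetical hook standing in for
    a sentence-category classifier used as a sub-filter.
    """
    vectors, df = tfidf_vectors(pages)
    n = len(pages)
    q = {t: c * math.log(n / df[t]) for t, c in Counter(query).items() if t in df}
    q_norm = math.sqrt(sum(w * w for w in q.values()))

    def cosine(v: dict) -> float:
        dot = sum(w * v.get(t, 0.0) for t, w in q.items())
        norm = math.sqrt(sum(w * w for w in v.values())) * q_norm
        return dot / norm if norm else 0.0

    scored = [(cosine(v), i) for i, v in enumerate(vectors) if keep is None or keep(i)]
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]
```

Because the filter is applied on top of the lexical ranking, it can only remove pages, never promote one that shares no terms with the query. Any extra precision it provides therefore comes from discarding lexically similar but irrelevant pages.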
Information retrieval applications have to present their output in the form of ranked lists. This requirement motivates researchers to develop methods that can automatically learn effective ranking models. Many existing methods perform analysis directly on the multidimensional features of query-document pairs and do not take users' interactive feedback into account. They thus incur high computational overhead and low retrieval performance due to indefinite query expressions. In this paper, we propose a Virtual Feature based Logistic Regression (VFLR) ranking method that conducts logistic regression on a set of essential but independent variables, called virtual features (VF). They are extracted via principal component analysis (PCA) with the user's relevance feedback. We then predict the ranking score of each queried document to produce a ranked list. We systematically evaluate our method using the LETOR 4.0 benchmark datasets. The experimental results demonstrate that the proposed method outperforms state-of-the-art methods in terms of Mean Average Precision (MAP), Precision at position k (P@k), and Normalized Discounted Cumulative Gain at position k (NDCG@k).
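Two of the ranking measures named above are easy to state precisely. P@k is the fraction of the top k results that are relevant. Average precision averages P@k over the ranks at which relevant documents appear, and MAP is the mean of average precision over queries. The sketch below treats relevance as binary, so graded judgements such as LETOR's would first have to be thresholded. That thresholding is an assumption of the sketch, and the function names are hypothetical.

```python
def precision_at_k(relevant: set, ranking: list, k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(relevant: set, ranking: list) -> float:
    """Mean of P@k taken at each rank k where a relevant document appears."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    # Relevant documents never retrieved contribute zero precision.
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: list) -> float:
    """runs: one (relevant_set, ranking) pair per query."""
    return sum(average_precision(rel, ranking) for rel, ranking in runs) / len(runs)
```

Suppose a query ranks d1, d2, d3, d4 and only d1 and d3 are relevant. Then P@2 is 0.5 and average precision is (1/1 + 2/3) / 2 ≈ 0.833.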
Over the past years, concerns have been rising about the use of Evidence-Based Medicine (EBM) in health care. The calls for an increase in the practice of EBM seem to be obstructed by many barriers preventing the implementation of evidence-based thinking and acting in general practice. This study aims to explore the barriers that Flemish GPs (General Practitioners) face in implementing EBM in routine clinical work and to identify possible strategies for integrating EBM into daily work.
We used a qualitative research strategy to gather and analyse data. We organised focus groups between September 2002 and April 2003. The focus group data were analysed using a combined strategy of 'between-case' analysis and 'grounded theory approach'. Thirty-one general practitioners participated in four focus groups. Purposeful sampling was used to recruit participants.
A basic classification model documents the influencing factors and actors on a micro-, meso- as well as macro-level. Patients, colleagues, competences, logistics and time were identified on the micro-level (the GPs' individual practice), commercial and consumer organisations on the meso-level (institutions, organisations) and health care policy, media and specific characteristics of evidence on the macro-level (policy level and international scientific community). Existing barriers and possible strategies to overcome these barriers were described.
In order to implement EBM in routine general practice, an integrated approach on different levels needs to be developed.
Tools to automatically summarize gene information from the literature have the potential to help genomics researchers better interpret gene expression data and investigate biological pathways. The task of finding information on sets of genes is common for genomic researchers, and PubMed is still the first choice because the most recent and original information can only be found in the unstructured, free text biomedical literature. However, finding information on a set of genes by manually searching and scanning the literature is a time-consuming and daunting task for scientists. We built and evaluated a query-based automatic summarizer of information on mouse genes studied in microarray experiments. The system clusters a set of genes by MeSH, GO and free text features and presents summaries for each gene by ranked sentences extracted from MEDLINE abstracts. Evaluation showed that the system seems to provide meaningful clusters and informative sentences are ranked higher by the algorithm.
Rapid growth in the scientific literature available on-line continues to motivate shifting data analysis from humans to computers. For example, greater knowledge of sentence characteristics indicative of interaction between two biological entities is needed to aid in the creation of better-performing information extraction tools for effectively using this rich body of information.
The Interaction Sentence Database (ISDB) allows users to retrieve sets of sentences fitting specified characteristics. To support this, a database of sentences from abstracts in MEDLINE was created. The sentences in the database all contain at least two biomolecule terms and one interaction-indicating term. A web interface to the database allows the user to query for sentences containing an interaction-indicating term, a single biomolecule name, or two biomolecule names, as well as for a list of biomolecules co-occurring with a given biomolecule in at least one sentence.
The system supports researchers needing conveniently available sets of sample sentences for corpus-based research on sentence properties. It also illustrates a model architecture for a sentence-based retrieval system which would be useful to people seeking information and knowledge on-line. ISDB can be freely accessed over the Web at http://bioinformatics.ualr.edu/cgi-bin/services/ISDB/isdb.cgi, and the processed database will be provided upon request.
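The ISDB inclusion rule (at least two biomolecule terms and one interaction-indicating term per sentence) and its co-occurrence query can be sketched as simple set operations over tokens. The lexicons below are tiny invented placeholders, since the abstract does not give ISDB's term lists or tokenisation. Counting distinct biomolecule names is also an assumption of this sketch.

```python
import re

# Placeholder lexicons for illustration only; ISDB's actual term lists are not given.
BIOMOLECULES = {"p53", "MDM2", "BRCA1", "RAD51"}
INTERACTION_TERMS = {"binds", "interacts", "phosphorylates", "inhibits"}


def tokens(sentence: str) -> list[str]:
    """Crude tokeniser: runs of letters, digits and hyphens."""
    return re.findall(r"[A-Za-z0-9-]+", sentence)


def is_interaction_sentence(sentence: str) -> bool:
    """Keep sentences with two or more distinct biomolecules and an interaction term."""
    toks = tokens(sentence)
    molecules = {t for t in toks if t in BIOMOLECULES}
    return len(molecules) >= 2 and any(t.lower() in INTERACTION_TERMS for t in toks)


def cooccurring(sentences: list[str], molecule: str) -> set[str]:
    """Biomolecules appearing with `molecule` in at least one stored sentence."""
    result = set()
    for s in sentences:
        toks = set(tokens(s))
        if molecule in toks:
            result |= (toks & BIOMOLECULES) - {molecule}
    return result
```

Filtering once at load time means every query against the stored sentences, whether by interaction term, one biomolecule or a pair of biomolecules, only searches sentences that already satisfy the inclusion rule.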
Healthcare institutions need timely patient information from various sources at the point-of-care. Evidence-based medicine (EBM) is a tool for proper and efficient incorporation of the results of research in decision-making. Characteristics of medical treatment processes and practical experience concerning the effect of EBM in the clinical process are surveyed.
A cross-sectional survey was conducted in Tehran hospitals in February-March 2012 among 51 clinical residents. The respondents were asked to apply EBM in clinical decision-making to answer questions about the effect of EBM in the clinical process. A valid and reliable questionnaire was used in this study.
EBM provides a framework for problem solving and improvement of processes. Most residents (76%) agreed that EBM could improve clinical decision making. Eighty-one percent of the respondents believed that EBM resulted in quick updating of knowledge. They believed that EBM was more useful for diagnosis than for treatment. There was a significant association between out-patients and in-patients in using electronic EBM resources.
Research findings were useful in clinical practice and decision making. The computerized guidelines are important tools for improving clinical process quality. When learning how to use IT, methods of search and evaluation of evidence for diagnosis, treatment and medical education are necessary. Purposeful use of IT in clinical processes reduces workload and improves decision-making.
Therapeutic Process; Evidence-Based Medicine; Decision Making; Health Information Technology; Guideline