PMCC

Related Articles

1.  iSMART: Ontology-based Semantic Query of CDA Documents 
The Health Level 7 Clinical Document Architecture (CDA) is widely accepted as the format for electronic clinical documents. With the rich ontological references in CDA documents, ontology-based semantic queries can be performed to retrieve CDA documents. In this paper, we present iSMART (interactive Semantic MedicAl Record reTrieval), a prototype system designed for ontology-based semantic query of CDA documents. The clinical information in CDA documents is extracted into RDF triples by a declarative XML-to-RDF transformer. An ontology reasoner is developed to infer additional information by combining the background knowledge from the SNOMED CT ontology. An RDF query engine is then leveraged to enable the semantic queries. The system has been evaluated using real clinical documents collected from a large hospital in southern China.
PMCID: PMC2815425  PMID: 20351883
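As a rough illustration of the extract, infer, and query steps described for iSMART, the sketch below uses plain Python tuples in place of an RDF store and a small hand-written is-a table in place of SNOMED CT. All concept names and document IDs are hypothetical, each concept is given a single parent for brevity (SNOMED CT itself allows several), and this is not iSMART's transformer or reasoner.

```python
# Hypothetical is-a edges standing in for SNOMED CT background knowledge.
IS_A = {
    "ViralPneumonia": "Pneumonia",
    "Pneumonia": "LungDisease",
    "Asthma": "LungDisease",
}

# Hypothetical triples, as an XML-to-RDF step might extract them from CDA documents.
TRIPLES = {
    ("doc1", "hasDiagnosis", "ViralPneumonia"),
    ("doc2", "hasDiagnosis", "Asthma"),
    ("doc3", "hasDiagnosis", "Fracture"),
}


def ancestors(concept):
    """Return every concept reachable by following is-a edges upward."""
    result = set()
    while concept in IS_A:
        concept = IS_A[concept]
        result.add(concept)
    return result


def infer(triples):
    """Add a triple for every ancestor of each asserted object concept."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        for parent in ancestors(obj):
            inferred.add((subject, predicate, parent))
    return inferred


def query(triples, predicate, obj):
    """Return the subjects matching a single (?s, predicate, obj) pattern."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


KB = infer(TRIPLES)
print(query(KB, "hasDiagnosis", "LungDisease"))  # ['doc1', 'doc2']
```

A plain lookup for `LungDisease` in the asserted triples finds nothing; only the inferred triples make `doc1` and `doc2` retrievable, which is the role the abstract gives the reasoner ahead of the query engine.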
2.  Remote access to anatomical information: an integration between semantic knowledge and visual data. 
A novel internet-based application is presented that provides access to anatomy knowledge through a symbolic modality, expressed by keywords taken from controlled or non-controlled terminology. The system is based on a database in which anatomical concepts have been organized into a hierarchical framework. Along with term queries that retrieve concepts containing or exactly matching the keyword used, the system also provides semantic access to anatomical information. Queries can be set up that retrieve concepts related to a particular meaning and sharing a particular relationship. Moreover, the application can refine the search of terms by querying the UMLS Knowledge Server. Anatomical image data have been integrated using the Visible Human Dataset. A set of these images has been indexed according to our anatomical classification and is used inside the application. The system has been implemented using Java client-server technology and works within standard Internet browsers.
PMCID: PMC2243893  PMID: 11079858
3.  COM3/369: Knowledge-based Information Systems: A new approach for the representation and retrieval of medical information 
Introduction
Present solutions for the representation and retrieval of medical information from online sources are not very satisfying. Either the retrieval process lacks precision and completeness, or the representation does not support the update and maintenance of the represented information. Most efforts are currently put into improving the combination of search engines and HTML-based documents. However, due to the current shortcomings of methods for natural language understanding, there are clear limitations to this approach. Furthermore, this approach does not solve the maintenance problem. At least for medical information exceeding a certain complexity, approaches that rely on structured knowledge representation and corresponding retrieval mechanisms seem to be required.
Methods
Knowledge-based information systems are based on the following fundamental ideas. The representation of information is based on ontologies that define the structure of the domain's concepts and their relations. Views on domain models are defined and represented as retrieval schemata. Retrieval schemata can be interpreted as canonical query types focussing on specific aspects of the provided information (e.g. diagnosis- or therapy-centred views). Based on these retrieval schemata, it can be decided which parts of the information in the domain model must be represented explicitly and formalised to support the retrieval process. Propositional logic is used as the representation language. All other information can be represented in a structured but informal way using text, images, etc. Layout schemata are used to assign layout information to retrieved domain concepts. Depending on the target environment, HTML or XML can be used.
Results
Based on this approach, two knowledge-based information systems have been developed. The 'Ophthalmologic Knowledge-based Information System for Diabetic Retinopathy' (OKIS-DR) provides information on diagnoses, findings, examinations, guidelines, and reference images related to diabetic retinopathy. OKIS-DR uses combinations of findings to specify the information that must be retrieved. The second system focuses on nutrition-related allergies and intolerances. Information on a patient's allergies and intolerances is used to retrieve general information on the specified combination of allergies and intolerances. As a special feature, the system generates tables showing food types and products that are tolerated or not tolerated by patients. Evaluation by external experts and user groups showed that the described approach of knowledge-based information systems increases the precision and completeness of knowledge retrieval. Due to the structured and non-redundant representation of information, the maintenance and update of the information can be simplified. Both systems are available as WWW-based online knowledge bases and CD-ROMs (cf. http://mta.gsf.de topic: products).
doi:10.2196/jmir.1.suppl1.e16
PMCID: PMC1761778
Knowledge-based Information Systems; Knowledge-based Systems; Information Retrieval
4.  FACTA: a text search engine for finding associated biomedical concepts 
Bioinformatics  2008;24(21):2559-2560.
Summary: FACTA is a text search engine for MEDLINE abstracts, which is designed particularly to help users browse biomedical concepts (e.g. genes/proteins, diseases, enzymes and chemical compounds) appearing in the documents retrieved by the query. The concepts are presented to the user in a tabular format and ranked based on the co-occurrence statistics. Unlike existing systems that provide similar functionality, FACTA pre-indexes not only the words but also the concepts mentioned in the documents, which enables the user to issue a flexible query (e.g. free keywords or Boolean combinations of keywords/concepts) and receive the results immediately even when the number of the documents that match the query is very large. The user can also view snippets from MEDLINE to get textual evidence of associations between the query terms and the concepts. The concept IDs and their names/synonyms for building the indexes were collected from several biomedical databases and thesauri, such as UniProt, BioThesaurus, UMLS, KEGG and DrugBank.
Availability: The system is available at http://www.nactem.ac.uk/software/facta/
Contact: yoshimasa.tsuruoka@manchester.ac.uk
doi:10.1093/bioinformatics/btn469
PMCID: PMC2572701  PMID: 18772154
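The idea of ranking concepts by how often they co-occur with the documents a query retrieves can be illustrated with two pre-built indexes, one from words to documents and one from documents to the concepts they mention. This is a minimal sketch assuming raw co-occurrence counts as the ranking score; the documents and concept labels are hypothetical, and FACTA's actual index structures and statistics may differ.

```python
from collections import Counter

# Hypothetical pre-built indexes: word -> documents, document -> concepts mentioned.
WORD_INDEX = {
    "apoptosis": {"d1", "d2", "d3"},
    "liver": {"d2", "d3", "d4"},
}
CONCEPT_INDEX = {
    "d1": {"gene:TP53", "disease:cancer"},
    "d2": {"gene:TP53", "disease:hepatitis"},
    "d3": {"gene:BCL2", "gene:TP53"},
    "d4": {"disease:hepatitis"},
}


def matching_docs(*words):
    """Documents containing all query words (a Boolean AND query)."""
    sets = [WORD_INDEX.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()


def rank_concepts(*words):
    """Rank concepts by how many matching documents mention them."""
    counts = Counter()
    for doc in matching_docs(*words):
        counts.update(CONCEPT_INDEX.get(doc, set()))
    # Descending count, then alphabetical, so ties have a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(rank_concepts("apoptosis", "liver"))
```

Because both indexes exist before any query arrives, answering a query needs only set intersections and one pass over the matching documents, which is in the spirit of the abstract's claim that pre-indexing concepts allows immediate results even for large result sets.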
5.  Issues in the Design of a Pilot Concept-Based Query Interface for the Neuroinformatics Information Framework 
Neuroinformatics  2008;6(3):229-239.
This paper describes a pilot query interface that has been constructed to help us explore a “concept-based” approach for searching the Neuroscience Information Framework (NIF). The query interface is concept-based in the sense that the search terms submitted through the interface are selected from a standardized vocabulary of terms (concepts) that are structured in the form of an ontology. The NIF contains three primary resources: the NIF Resource Registry, the NIF Document Archive, and the NIF Database Mediator. These NIF resources are very different in their nature and therefore pose challenges when designing a single interface from which searches can be automatically launched against all three resources simultaneously. The paper first briefly discusses several background issues involving the use of standardized biomedical vocabularies in biomedical information retrieval, and then presents a detailed example that illustrates how the pilot concept-based query interface operates. The paper concludes by discussing certain lessons learned in the development of the current version of the interface.
doi:10.1007/s12021-008-9035-9
PMCID: PMC2664632  PMID: 18953674
Data search; Web search; Ontologies; Database mediation; Data federation; Text search; Neuroscience
6.  A Bayesian Translational Framework for Knowledge Propagation, Discovery, and Integration Under Specific Contexts 
The immense corpus of biomedical literature existing today poses challenges in information search and integration. Many links between pieces of knowledge occur or are significant only under certain contexts—rather than under the entire corpus. This study proposes using networks of ontology concepts, linked based on their co-occurrences in annotations of abstracts of biomedical literature and descriptions of experiments, to draw conclusions based on context-specific queries and to better integrate existing knowledge. In particular, a Bayesian network framework is constructed to allow for the linking of related terms from two biomedical ontologies under the queried context concept. Edges in such a Bayesian network allow associations between biomedical concepts to be quantified and inference to be made about the existence of some concepts given prior information about others. This approach could potentially be a powerful inferential tool for context-specific queries, applicable to ontologies in other fields as well.
PMCID: PMC3392061  PMID: 22779044
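The edges described above quantify how likely one concept is given another under a queried context. A heavily simplified sketch of that quantity is shown below: it assumes each abstract is represented as a set of annotated ontology concepts and estimates a single conditional probability by counting, with add-one smoothing. It is not the paper's Bayesian network construction, and the annotations and concept labels are hypothetical.

```python
# Hypothetical annotations: each abstract is a set of ontology concepts.
ANNOTATIONS = [
    {"ctx:liver", "go:apoptosis", "disease:hepatitis"},
    {"ctx:liver", "go:apoptosis"},
    {"ctx:liver", "disease:hepatitis"},
    {"ctx:brain", "go:apoptosis", "disease:hepatitis"},
]


def conditional(target, given, context, docs=ANNOTATIONS, alpha=1.0):
    """Estimate P(target | given, context) from co-occurrence counts.

    Only documents annotated with `context` are counted, so the estimate
    is context-specific. Add-alpha smoothing over the two outcomes
    (target present / absent) avoids division by zero.
    """
    in_context = [d for d in docs if context in d and given in d]
    hits = sum(1 for d in in_context if target in d)
    return (hits + alpha) / (len(in_context) + 2 * alpha)


print(conditional("disease:hepatitis", "go:apoptosis", "ctx:liver"))
print(conditional("disease:hepatitis", "go:apoptosis", "ctx:brain"))
```

The same pair of concepts receives 0.5 under the liver context and about 0.67 under the brain context here, which mirrors the abstract's point that some links are significant only under certain contexts.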
7.  A Query Expansion Framework in Image Retrieval Domain Based on Local and Global Analysis 
We present an image retrieval framework based on automatic query expansion in a concept feature space by generalizing the vector space model of information retrieval. In this framework, images are represented by vectors of weighted concepts similar to the keyword-based representation used in text retrieval. To generate the concept vocabularies, a statistical model is built by utilizing Support Vector Machine (SVM)-based classification techniques. The images are represented as “bag of concepts” that comprise perceptually and/or semantically distinguishable color and texture patches from local image regions in a multi-dimensional feature space. To explore the correlation between the concepts and overcome the assumption of feature independence in this model, we propose query expansion techniques in the image domain from a new perspective based on both local and global analysis. For the local analysis, the correlations between the concepts based on the co-occurrence pattern, and the metrical constraints based on the neighborhood proximity between the concepts in encoded images, are analyzed by considering local feedback information. We also analyze the concept similarities in the collection as a whole in the form of a similarity thesaurus and propose an efficient query expansion based on the global analysis. The experimental results on a photographic collection of natural scenes and a biomedical database of different imaging modalities demonstrate the effectiveness of the proposed framework in terms of precision and recall.
doi:10.1016/j.ipm.2010.12.001
PMCID: PMC3150552  PMID: 21822350
Image Retrieval; Vector Space Model; Support Vector Machine; Relevance Feedback; Query Expansion
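The global-analysis idea (build a concept similarity thesaurus from the whole collection, add the most similar concepts to the query, then rank by cosine similarity in the vector space model) can be sketched over bag-of-concepts vectors. This is a minimal sketch assuming raw concept weights, cosine similarity between concept columns as the thesaurus, and a hypothetical expansion weight `beta`; the paper's SVM-based concept vocabulary, local feedback analysis, and exact weighting are not reproduced.

```python
import math

# Hypothetical collection: each image is a "bag of concepts" with weights.
IMAGES = {
    "img1": {"sky": 2.0, "water": 1.0},
    "img2": {"sky": 1.0, "cloud": 2.0},
    "img3": {"grass": 2.0, "tree": 1.0},
}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def concept_vectors(images):
    """Transpose the collection into concept -> {image: weight}."""
    cols = {}
    for img, vec in images.items():
        for concept, w in vec.items():
            cols.setdefault(concept, {})[img] = w
    return cols


def expand(query, images, beta=0.5, top_k=1):
    """Global expansion: add each query concept's top_k most similar concepts."""
    cols = concept_vectors(images)
    expanded = dict(query)
    for q in query:
        if q not in cols:
            continue
        scores = [(cosine(cols[q], cols[c]), c) for c in cols if c not in query]
        for sim, c in sorted(scores, reverse=True)[:top_k]:
            if sim > 0:
                expanded[c] = expanded.get(c, 0.0) + beta * sim
    return expanded


def rank(query, images):
    """Image IDs ordered by cosine similarity to the query, best first."""
    return sorted(images, key=lambda img: cosine(query, images[img]), reverse=True)


q = {"cloud": 1.0}
for query in (q, expand(q, IMAGES)):
    print({img: round(cosine(query, vec), 3) for img, vec in IMAGES.items()})
```

With the original query, `img1` shares no concept with it and scores zero; after "sky" (which co-occurs with "cloud" in `img2`) is added, `img1` receives a nonzero score. This is the kind of term-independence gap that the abstract's expansion techniques target.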
8.  Automation and integration of components for generalized semantic markup of electronic medical texts. 
Our group has built an information retrieval system based on a complex semantic markup of medical textbooks. We describe the construction of a set of web-based knowledge-acquisition tools that expedites the collection and maintenance of the concepts required for text markup and the search interface required for information retrieval from the marked text. In the text markup system, domain experts (DEs) identify sections of text that contain one or more elements from a finite set of concepts. End users can then query the text using a predefined set of questions, each of which identifies a subset of complementary concepts. The search process matches that subset of concepts to relevant points in the text. The current process requires that the DE invest significant time to generate the required concepts and questions. We propose a new system--called ACQUIRE (Acquisition of Concepts and Queries in an Integrated Retrieval Environment)--that assists a DE in two essential tasks in the text-markup process. First, it helps her to develop, edit, and maintain the concept model: the set of concepts with which she marks the text. Second, ACQUIRE helps her to develop a query model: the set of specific questions that end users can later use to search the marked text. The DE incorporates concepts from the concept model when she creates the questions in the query model. The major benefit of the ACQUIRE system is a reduction in the time and effort required for the text-markup process. We compared the process of concept- and query-model creation using ACQUIRE to the process used in previous work by rebuilding two existing models that we previously constructed manually. We observed a significant decrease in the time required to build and maintain the concept and query models.
PMCID: PMC2232691  PMID: 10566457
9.  An introduction to information retrieval: applications in genomics 
The pharmacogenomics journal  2002;2(2):96-102.
Information retrieval (IR) is the field of computer science that deals with the processing of documents containing free text, so that they can be rapidly retrieved based on keywords specified in a user’s query. IR technology is the basis of Web-based search engines, and plays a vital role in biomedical research, because it is the foundation of software that supports literature search. Documents can be indexed both by the words they contain and by the concepts that can be matched to domain-specific thesauri; concept matching, however, poses several practical difficulties that make it unsuitable for use by itself. This article provides an introduction to IR and summarizes various applications of IR and related technologies to genomics.
PMCID: PMC3137130  PMID: 12049181
information retrieval; full-text indexing; text processing; genomics
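Indexing documents by both words and thesaurus concepts, as described above, can be shown with a small inverted index queried by Boolean AND. This is a minimal sketch: the documents and the thesaurus that maps surface words to a shared concept ID are hypothetical, and real systems add stemming, stop-word handling, and ranking.

```python
import re
from collections import defaultdict

DOCS = {
    1: "BRCA1 mutations and breast cancer risk",
    2: "Tumor suppressor genes in breast carcinoma",
    3: "Gene expression in yeast",
}

# Hypothetical thesaurus mapping surface words to a shared concept ID.
THESAURUS = {"cancer": "C:neoplasm", "carcinoma": "C:neoplasm", "tumor": "C:neoplasm"}


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: term -> set of doc IDs; concept IDs are indexed alongside words."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index[tok].add(doc_id)
            if tok in THESAURUS:
                index[THESAURUS[tok]].add(doc_id)
    return index


def search(index, *terms):
    """Boolean AND over index terms (words or concept IDs)."""
    results = [index.get(t, set()) for t in terms]
    return sorted(set.intersection(*results)) if results else []


INDEX = build_index(DOCS)
print(search(INDEX, "breast", "cancer"))      # word match only
print(search(INDEX, "breast", "C:neoplasm"))  # concept match also finds "carcinoma"
```

Keeping the word entries next to the concept entries means a missing or ambiguous thesaurus mapping does not make a document unreachable, which fits the abstract's caution against relying on concept matching by itself.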
10.  A Maximum-Entropy approach for accurate document annotation in the biomedical domain 
Journal of Biomedical Semantics  2012;3(Suppl 1):S2.
The increasing volume of scientific literature on the Web and the absence of efficient tools for classifying and searching documents are the two most important factors that influence the speed of search and the quality of the results. Previous studies have shown that the use of ontologies makes it possible to process document and query information at the semantic level, which greatly improves the search for relevant information and takes one step further towards the Semantic Web. A fundamental step in these approaches is the annotation of documents with ontology concepts, which can also be seen as a classification task. In this paper we address this issue for the biomedical domain and present a new automated and robust method, based on a Maximum Entropy approach, for annotating biomedical literature documents with terms from the Medical Subject Headings (MeSH).
The experimental evaluation shows that the suggested Maximum Entropy approach for annotating biomedical documents with MeSH terms is highly accurate, robust to the ambiguity of terms, and can provide very good performance even when a very small number of training documents is used. More precisely, we show that the proposed algorithm obtained an average F-measure of 92.4% (precision 99.41%, recall 86.77%) for the full range of the explored terms (4,078 MeSH terms), and that the algorithm’s performance is resilient to terms’ ambiguity, achieving an average F-measure of 92.42% (precision 99.32%, recall 86.87%) in the explored MeSH terms which were found to be ambiguous according to the Unified Medical Language System (UMLS) thesaurus. Finally, we compared the results of the suggested methodology with a Naive Bayes and a Decision Trees classification approach, and we show that the Maximum Entropy based approach performed with higher F-Measure in both ambiguous and monosemous MeSH terms.
doi:10.1186/2041-1480-3-S1-S2
PMCID: PMC3337257  PMID: 22541593
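The F-measure reported above is the harmonic mean of precision and recall. Plugging the reported average precision and recall into that formula gives about 0.927 rather than 0.924. A gap like this is expected if precision, recall, and F-measure were each averaged over MeSH terms, because the mean of per-term harmonic means is generally not the harmonic mean of the mean precision and recall; that averaging scheme is an assumption suggested by the word "average", not something the abstract spells out. The per-term values below are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1); 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# F1 computed from the averaged precision and recall reported in the abstract.
print(round(f_measure(0.9941, 0.8677), 4))  # 0.9266

# Hypothetical per-term (precision, recall) pairs illustrating macro-averaging.
per_term = [(1.0, 1.0), (1.0, 0.5)]
mean_f = sum(f_measure(p, r) for p, r in per_term) / len(per_term)
mean_p = sum(p for p, _ in per_term) / len(per_term)
mean_r = sum(r for _, r in per_term) / len(per_term)
print(round(mean_f, 4), round(f_measure(mean_p, mean_r), 4))  # 0.8333 0.8571
```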
11.  Extraction of SNOMED concepts from medical record texts. 
Clinicians have traditionally documented patient data using natural language text. With the increasing prevalence of computer systems in health care, an increasing amount of medical record text will be stored electronically. However, for such textual documents to be indexed, shared, and processed adequately by computers, it will be important to be able to identify concepts in the documents using a common medical terminology. Automated methods for extracting concepts in a standard terminology would enhance retrieval and analysis of medical record data. This paper discusses a method for extracting concepts from medical record documents using the medical terminology SNOMED-III (Systematized Nomenclature of Human and Veterinary Medicine, Version III). The technique employs a linear least squares fit that maps training set phrases to SNOMED concepts. This mapping can be used for unknown text inputs in the same domain as the training set to predict SNOMED concepts that are contained in the document. We have implemented the method in the domain of congestive heart failure for history and physical exam texts. Our system has a reasonable response time. We tested the system over a range of thresholds. The system performed with 90% sensitivity and 83% specificity at the lowest threshold, and 42% sensitivity and 99.9% specificity at the highest threshold.
PMCID: PMC2247882  PMID: 7949915
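The mapping step described above (fit a linear least-squares map from training-set phrases to concepts, then score unseen text and apply a threshold) can be sketched with NumPy. This assumes a binary bag-of-words phrase representation and uses hypothetical concept labels rather than real SNOMED codes; the paper's feature set and training data are not reproduced.

```python
import numpy as np

# Hypothetical vocabulary and concept labels (not real SNOMED codes).
VOCAB = ["dyspnea", "shortness", "breath", "edema", "ankle", "swelling"]
CONCEPTS = ["C_dyspnea", "C_edema"]

TRAINING = [
    (["dyspnea"], "C_dyspnea"),
    (["shortness", "of", "breath"], "C_dyspnea"),
    (["ankle", "edema"], "C_edema"),
    (["ankle", "swelling"], "C_edema"),
]


def vectorize(tokens):
    """Binary bag-of-words over VOCAB; out-of-vocabulary tokens are ignored."""
    return np.array([1.0 if w in tokens else 0.0 for w in VOCAB])


X = np.vstack([vectorize(toks) for toks, _ in TRAINING])
Y = np.array([[1.0 if c == label else 0.0 for c in CONCEPTS] for _, label in TRAINING])

# Least-squares fit of a linear map W so that X @ W approximates Y.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)


def predict(tokens, threshold=0.5):
    """Return the concepts whose predicted score exceeds the threshold."""
    scores = vectorize(tokens) @ W
    return {c for c, s in zip(CONCEPTS, scores) if s > threshold}


print(predict(["shortness", "of", "breath"]))  # {'C_dyspnea'}
print(predict(["swelling"], threshold=0.3))    # {'C_edema'}
```

Lowering the threshold lets partial matches such as "swelling" alone through, trading specificity for sensitivity; this is the same trade-off the abstract reports across its range of thresholds.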
12.  User centered and ontology based information retrieval system for life sciences 
BMC Bioinformatics  2012;13(Suppl 1):S4.
Background
Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by Semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications, and the Medical Subject Headings (MeSH) are the basis of the indexing of biomedical publications and of the information retrieval process offered by PubMed. However, current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources, and no explanation of their relevance to the query is provided. Users may thus be confused by the selection and have no idea how to adapt their queries so that the results match their expectations.
Results
This paper describes an information retrieval system that relies on a domain ontology to widen the set of relevant documents retrieved, and that uses a graphical rendering of query results to favor user interaction. Semantic proximities between ontology concepts and aggregation models are used to assess each document's adequacy with respect to a query. The selected documents are displayed in a semantic map that provides graphical indications of the extent to which they match the user's query; this human-machine interface favors a more interactive and iterative exploration of the data corpus by facilitating query concept weighting and visual explanation. We illustrate the benefit of this information retrieval system with two case studies, one of which aims at collecting human genes related to transcription factors involved in the hemopoiesis pathway.
Conclusions
The ontology-based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user-centred application in which the system highlights relevant information to provide decision support.
doi:10.1186/1471-2105-13-S1-S4
PMCID: PMC3434427  PMID: 22373375
13.  Supporting Ontology-based Keyword Search over Medical Databases 
The proliferation of medical terms poses a number of challenges in the sharing of medical information among different stakeholders. Ontologies are commonly used to establish relationships between different terms, yet their role in querying has not been investigated in detail. In this paper, we study the problem of supporting ontology-based keyword search queries on a database of electronic medical records. We present several approaches to support this type of query, study the advantages and limitations of each approach, and summarize the lessons learned as best practices.
PMCID: PMC2656038  PMID: 18998839
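One straightforward way to support an ontology-based keyword query over a relational table is to expand the keyword into the terms it subsumes and then query with a parameterized `IN` list. The sketch below uses Python's built-in `sqlite3` with an in-memory table; the narrower-term table, the `records` schema, and the patient data are hypothetical, and this expansion strategy may or may not correspond to any of the approaches the paper compares.

```python
import sqlite3

# Hypothetical ontology fragment: term -> narrower (subsumed) terms.
NARROWER = {
    "ischemic heart disease": ["myocardial infarction", "angina pectoris"],
    "angina pectoris": ["unstable angina"],
}


def expand_keyword(term):
    """Return the term plus all narrower terms, found depth-first."""
    terms, stack = [], [term]
    while stack:
        t = stack.pop()
        if t not in terms:
            terms.append(t)
            stack.extend(NARROWER.get(t, []))
    return terms


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (patient_id INTEGER, diagnosis TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(1, "myocardial infarction"), (2, "unstable angina"), (3, "asthma")],
)


def search(keyword):
    """Keyword search expanded with ontology descendants, via a parameterized IN list."""
    terms = expand_keyword(keyword)
    placeholders = ",".join("?" for _ in terms)
    sql = f"SELECT patient_id FROM records WHERE diagnosis IN ({placeholders}) ORDER BY patient_id"
    return [row[0] for row in conn.execute(sql, terms)]


print(search("ischemic heart disease"))  # [1, 2]
```

Expanding at query time keeps the stored data untouched, but the `IN` list grows with the size of the subsumed hierarchy; materializing the expansion in a separate table is a common alternative with the opposite trade-off.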
14.  Testing Three Problem List Terminologies in a Simulated Data Entry Environment 
Three Problem List Terminologies (PLT) were tested using a web-based application simulating a clinical data entry environment to evaluate coverage and coding efficiency. The three PLTs were: the CORE Problem List Subset of SNOMED CT, a clinical subset extracted from the full SNOMED CT and the PLT currently used at the Mayo Clinic. Candidate problem statements were randomly extracted from free text problem list entries contained in two electronic medical record systems. Physician reviewers searched for concepts in one of the three PLTs that most closely matched a problem statement. Altogether 45 reviewers reviewed 15 problems each. The coverage of the much smaller CORE Subset was comparable to Clinical SNOMED for combined exact or partial matches. The CORE Subset required the shortest time to find a concept. This may be related to the smaller size of the pick lists for the CORE Subset.
PMCID: PMC3243193  PMID: 22195098
15.  MachineProse: an Ontological Framework for Scientific Assertions 
Objective: The idea of testing a hypothesis is central to the practice of biomedical research. However, the results of testing a hypothesis are published mainly in the form of prose articles. Encoding the results as scientific assertions that are both human and machine readable would greatly enhance the synergistic growth and dissemination of knowledge.
Design: We have developed MachineProse (MP), an ontological framework for the concise specification of scientific assertions. MP is based on the idea that an assertion constitutes a fundamental unit of knowledge. This contrasts with current approaches, which use discrete concept terms from domain ontologies for annotation and in which assertions are only inferred heuristically.
Measurements: We use illustrative examples to highlight the advantages of MP over the use of the Medical Subject Headings (MeSH) system and keywords in indexing scientific articles.
Results: We show how MP makes it possible to carry out semantic annotation of publications that is machine readable and allows for precise search capabilities. In addition, when used by itself, MP serves as a knowledge repository for emerging discoveries. A prototype for proof of concept has been developed that demonstrates the feasibility and novel benefits of MP. As part of the MP framework, we have created an ontology of relationship types with about 100 terms optimized for the representation of scientific assertions.
Conclusion: MachineProse is a novel semantic framework that we believe may be used to summarize research findings, annotate biomedical publications, and support sophisticated searches.
doi:10.1197/jamia.M1910
PMCID: PMC1447552  PMID: 16357355
16.  Combining Text Classification and Hidden Markov Modeling Techniques for Structuring Randomized Clinical Trial Abstracts 
Randomized clinical trial (RCT) papers provide reliable information about the efficacy of medical interventions. Current keyword-based search methods for retrieving medical evidence overload users with irrelevant information, as these methods often do not take into consideration the semantics encoded within abstracts and the search query. Personalized semantic search, intelligent clinical question answering and medical evidence summarization aim to solve this information overload problem. Most of these approaches would benefit significantly if the information available in the abstracts were structured into meaningful categories (e.g., background, objective, method, result and conclusion). While many journals use a structured abstract format, the majority of RCT abstracts remain unstructured.
We have developed a novel automated approach to structuring RCT abstracts by combining text classification and Hidden Markov Modeling (HMM) techniques. The results of our approach (precision of 0.94, recall of 0.93) are a significant improvement over previously reported work on automated sentence categorization in RCT abstracts.
PMCID: PMC1839538  PMID: 17238456
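A common way to combine a sentence classifier with an HMM is to treat the classifier's per-sentence label scores as emission scores and decode the most likely label sequence with the Viterbi algorithm, so that the usual section order (background, then method, then result, then conclusion) is enforced softly. The sketch below assumes that combination, uses classifier scores directly as emissions for simplicity, and uses hypothetical probabilities over a reduced label set; it is not the authors' trained model.

```python
import math

LABELS = ["background", "method", "result", "conclusion"]
# Hypothetical start and transition probabilities favoring the usual section order.
START = {"background": 0.7, "method": 0.2, "result": 0.05, "conclusion": 0.05}
TRANS = {
    "background": {"background": 0.5, "method": 0.4, "result": 0.05, "conclusion": 0.05},
    "method": {"background": 0.05, "method": 0.5, "result": 0.4, "conclusion": 0.05},
    "result": {"background": 0.05, "method": 0.05, "result": 0.5, "conclusion": 0.4},
    "conclusion": {"background": 0.05, "method": 0.05, "result": 0.05, "conclusion": 0.85},
}


def viterbi(emissions):
    """Most likely label sequence given per-sentence label scores (log space)."""
    best = [{s: math.log(START[s]) + math.log(emissions[0][s]) for s in LABELS}]
    back = []
    for scores in emissions[1:]:
        prev, cur, ptr = best[-1], {}, {}
        for s in LABELS:
            p = max(LABELS, key=lambda q: prev[q] + math.log(TRANS[q][s]))
            cur[s] = prev[p] + math.log(TRANS[p][s]) + math.log(scores[s])
            ptr[s] = p
        best.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(LABELS, key=lambda s: best[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical classifier scores for a four-sentence abstract.
SENTENCES = [
    {"background": 0.8, "method": 0.1, "result": 0.05, "conclusion": 0.05},
    {"background": 0.1, "method": 0.7, "result": 0.1, "conclusion": 0.1},
    {"background": 0.1, "method": 0.1, "result": 0.35, "conclusion": 0.45},
    {"background": 0.05, "method": 0.05, "result": 0.1, "conclusion": 0.8},
]
print([max(s, key=s.get) for s in SENTENCES])  # classifier alone
print(viterbi(SENTENCES))                      # with sequence constraints
```

On its own, the classifier labels the third sentence a conclusion; the transition model makes "result" the better fit between a methods sentence and a conclusion.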
17.  Selective retrieval of pre- and post-coordinated SNOMED concepts. 
In general, it is very straightforward to store concept identifiers in electronic medical records and represent them in messages. Information models typically specify the fields that can contain coded entries. For each of these fields there may be additional constraints governing exactly which concept identifiers are applicable. However, because modern terminologies such as SNOMED CT are compositional, allowing concept expressions to be pre-coordinated within the terminology or post-coordinated within the medical record, there remains the potential to express a concept in more than one way. Often, the various representations are similar, but not equivalent. This paper describes an approach for retrieving these pre- and post-coordinated concept expressions: (1) Create concept expressions using a logically well-structured terminology (e.g., SNOMED CT) according to the rules of a well-specified information model (in this paper we use the HL7 RIM); (2) Transform pre- and post-coordinated concept expressions into a normalized form; (3) Transform queries into the same normalized form. The normalized instances can then be directly compared to the query. Several implementation considerations have been identified. Transformations into a normal form and execution of queries that require traversal of hierarchies need to be optimized. A detailed understanding of the information model and the terminology model are prerequisites. Queries based on the semantic properties of concepts are only as complete as the semantic information contained in the terminology model. Despite these considerations, the approach appears powerful and will continue to be refined.
PMCID: PMC2244193  PMID: 12463817
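Steps (2) and (3) above can be sketched by rewriting every expression into a primitive focus concept plus a set of attribute-value pairs, so that a pre-coordinated concept and an equivalent post-coordinated expression compare equal. This is a minimal sketch under stated assumptions: the concept names, attribute names, and definition table are hypothetical (not SNOMED CT content), and it skips hierarchy traversal, which the abstract notes must be optimized in practice.

```python
# Hypothetical definitions: pre-coordinated concept -> (parent focus, attributes).
DEFINITIONS = {
    "FractureOfFemur": ("Fracture", {("findingSite", "Femur")}),
    "FractureOfLeftFemur": ("FractureOfFemur", {("laterality", "Left")}),
}


def normalize(focus, refinements=()):
    """Rewrite an expression into (primitive focus, frozenset of attribute pairs)."""
    attrs = set(refinements)
    while focus in DEFINITIONS:
        focus, defined = DEFINITIONS[focus]
        attrs |= defined
    return focus, frozenset(attrs)


def matches(record_expr, query_expr):
    """A record matches if it has the query's focus and at least the query's attributes."""
    r_focus, r_attrs = record_expr
    q_focus, q_attrs = query_expr
    return r_focus == q_focus and q_attrs <= r_attrs


RECORDS = {
    # Pre-coordinated entry.
    "r1": normalize("FractureOfLeftFemur"),
    # Post-coordinated entry expressing the same meaning.
    "r2": normalize("Fracture", {("findingSite", "Femur"), ("laterality", "Left")}),
    "r3": normalize("Fracture", {("findingSite", "Tibia")}),
}

query = normalize("FractureOfFemur")
print(sorted(r for r, e in RECORDS.items() if matches(e, query)))  # ['r1', 'r2']
print(RECORDS["r1"] == RECORDS["r2"])  # True
```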
18.  MedEvi: Retrieving textual evidence of relations between biomedical concepts from Medline 
Bioinformatics  2008;24(11):1410-1412.
Summary: Search engines running on MEDLINE abstracts have been widely used by biologists to find publications that are related to their research. The existing search engines such as PubMed, however, have limitations when applied to the task of seeking textual evidence of relations between given concepts. The limitations are mainly due to the problem that the search engines do not effectively deal with multi-term queries which may imply semantic relations between the terms. To address this problem, we present MedEvi, a novel search engine that imposes positional restriction on occurrences matching multi-term queries, based on the observation that terms with semantic relations which are explicitly stated in text are not found too far from each other. MedEvi further identifies additional keywords of biological and statistical significance from the local context of matching occurrences in order to help users reformulate their queries for better results.
Availability: http://www.ebi.ac.uk/tc-test/textmining/medevi/
Contact: kim@ebi.ac.uk
doi:10.1093/bioinformatics/btn117
PMCID: PMC2387223  PMID: 18400773
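A positional restriction of the kind described above can be implemented by finding the shortest token span that contains every query term and accepting the text only if that span is short enough. The sketch below uses a standard sliding-window search; whitespace tokenization and the `max_span` limit of 10 tokens are assumptions for illustration, not MedEvi's actual settings.

```python
from collections import Counter


def min_window(tokens, terms):
    """Length (in tokens) of the shortest span containing every query term, or None."""
    need = set(terms)
    if not need:
        return 0
    counts = Counter()
    covered, best, left = 0, None, 0
    for right, tok in enumerate(tokens):
        if tok in need:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink from the left while every term is still inside the window.
        while covered == len(need):
            span = right - left + 1
            if best is None or span < best:
                best = span
            ltok = tokens[left]
            if ltok in need:
                counts[ltok] -= 1
                if counts[ltok] == 0:
                    covered -= 1
            left += 1
    return best


def matches(text, terms, max_span=10):
    """True if all terms occur within max_span consecutive tokens."""
    tokens = text.lower().split()
    span = min_window(tokens, [t.lower() for t in terms])
    return span is not None and span <= max_span


print(matches("p53 directly activates mdm2 transcription", ["p53", "mdm2"]))  # True
print(matches("p53 " + "filler " * 20 + "mdm2", ["p53", "mdm2"]))            # False
```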
19.  Intelligent Focusing in Knowledge Indexing and Retrieval: The Relatedness Tool 
Most present-day information retrieval systems use the presence or absence of certain words to decide which documents are appropriate for a user's query. This approach has had certain successes, but it fails to capture relationships between the concepts represented by the words, and hence reduces the potential specificity of both indexing and searching of documents. A richer representation of the semantics of documents and queries, and methods for reasoning about these representations, have been provided by artificial intelligence. Navigational tools for browsing and authoring knowledge bases (KBs) add a convenient technique for focusing in the complex landscape of semantic representations. The center of such representations is usually a frame or a semantic network system. We are developing a prototype Unified Medical Language System (UMLS) taxonomy to represent objects and relationships in medicine. One focus of our research is improved methods for indexing and querying repositories of biomedical literature. The technique we propose is based on the notion of relatedness of concepts. To this end, we define heuristics that find related concepts and apply them to the UMLS taxonomy. Preliminary results from experiments with the implemented heuristics demonstrate their potential usefulness.
PMCID: PMC2245195
20.  Searching biomedical databases on complementary medicine: the use of controlled vocabulary among authors, indexers and investigators 
Background
The optimal retrieval of a literature search in biomedicine depends on the appropriate use of Medical Subject Headings (MeSH), descriptors and keywords among authors and indexers. We hypothesized that authors, investigators and indexers in four biomedical databases are not consistent in their use of terminology in Complementary and Alternative Medicine (CAM).
Methods
Based on a research question addressing the validity of spinal palpation for the diagnosis of neuromuscular dysfunction, we developed four search concepts with their respective controlled vocabulary and key terms. We calculated the frequency of MeSH, descriptors, and keywords used by authors in titles and abstracts in comparison to standard practices in semantic and analytic indexing in MEDLINE, MANTIS, CINAHL, and Web of Science.
Results
Multiple searches resulted in the final selection of 38 relevant studies that were indexed at least in one of the four selected databases. Of the four search concepts, validity showed the greatest inconsistency in terminology among authors, indexers and investigators. The use of spinal terms showed the greatest consistency. Of the 22 neuromuscular dysfunction terms provided by the investigators, 11 were not contained in the controlled vocabulary and six were never used by authors or indexers. Most authors did not seem familiar with the controlled vocabulary for validity in the area of neuromuscular dysfunction. Recently, standard glossaries have been developed to assist in the research development of manual medicine.
Conclusions
Searching biomedical databases for CAM is challenging due to inconsistent use of controlled vocabulary and indexing procedures in different databases. A standard terminology should be used by investigators in conducting their search strategies, and by authors when writing titles and abstracts and submitting keywords for publications.
doi:10.1186/1472-6882-3-3
PMCID: PMC166167  PMID: 12846931
21.  Reliability of SNOMED-CT Coding by Three Physicians using Two Terminology Browsers 
SNOMED-CT has been promoted as a reference terminology for electronic health record (EHR) systems. Many important EHR functions are based on the assumption that medical concepts will be coded consistently by different users. This study is designed to measure agreement among three physicians using two SNOMED-CT terminology browsers to encode 242 concepts from five ophthalmology case presentations in a publicly-available clinical journal. Inter-coder reliability, based on exact coding match by each physician, was 44% using one browser and 53% using the other. Intra-coder reliability testing revealed that a different SNOMED-CT code was obtained up to 55% of the time when the two browsers were used by one user to encode the same concept. These results suggest that the reliability of SNOMED-CT coding is imperfect, and may be a function of browsing methodology. A combination of physician training, terminology refinement, and browser improvement may help increase the reproducibility of SNOMED-CT coding.
PMCID: PMC1839418  PMID: 17238317
22.  Comparison of Ontology-based Semantic-Similarity Measures 
Semantic-similarity measures quantify the similarity between concepts in a given ontology. Potential applications for these measures include search, data mining, and knowledge discovery in database or decision-support systems that use ontologies. To date, the different semantic-similarity approaches have not been compared on a single ontology. Such a comparison can offer insight into the validity of the different approaches. We compared three approaches to semantic similarity (based on expert opinion, on the ontology alone, and on information content) using four metrics applied to SNOMED-CT. We found poor agreement between the metrics based on information content and the ontology-only metric. The metric based only on the ontology structure correlated most with expert opinion. Our results suggest that ontology-only metrics may be preferable to information-content-based metrics, and point to the need for more research on validating the different approaches.
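To make the distinction between "ontology only" and "information content" metrics concrete, here is a sketch of one representative of each: a shortest-path similarity computed purely from the is-a hierarchy, and Resnik's similarity, which scores two concepts by the information content of their most informative common ancestor. The abstract does not name the four metrics it evaluated, so these are illustrative choices, and the toy hierarchy and annotation counts are hypothetical rather than SNOMED-CT data.

```python
import math
from collections import deque
from typing import Dict, List

# Hypothetical toy is-a hierarchy (child -> parents); not real SNOMED-CT content.
PARENTS: Dict[str, List[str]] = {
    "disorder": [],
    "infection": ["disorder"],
    "pneumonia": ["infection"],
    "bronchitis": ["infection"],
    "fracture": ["disorder"],
}


def upward_distances(concept: str) -> Dict[str, int]:
    """Breadth-first search up the hierarchy: ancestor (incl. itself) -> edge count."""
    dist = {concept: 0}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        for parent in PARENTS[node]:
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def path_similarity(a: str, b: str) -> float:
    """Ontology-only metric: 1 / (1 + shortest path via a common ancestor)."""
    da, db = upward_distances(a), upward_distances(b)
    shortest = min(da[c] + db[c] for c in da.keys() & db.keys())
    return 1.0 / (1.0 + shortest)


def information_content(counts: Dict[str, int]) -> Dict[str, float]:
    """IC(c) = -log p(c), where p(c) counts annotations of c and its descendants."""
    total = sum(counts.values())
    subsumed = {c: 0 for c in PARENTS}
    for concept, n in counts.items():
        for ancestor in upward_distances(concept):
            subsumed[ancestor] += n
    return {c: -math.log(k / total) for c, k in subsumed.items() if k > 0}


def resnik_similarity(a: str, b: str, ic: Dict[str, float]) -> float:
    """IC-based metric: IC of the most informative common ancestor."""
    common = upward_distances(a).keys() & upward_distances(b).keys()
    # Ancestors never observed in the corpus have no IC estimate and are skipped.
    return max((ic[c] for c in common if c in ic), default=0.0)


# Hypothetical annotation counts from an imagined corpus.
ic = information_content({"pneumonia": 6, "bronchitis": 2, "fracture": 2})
print(path_similarity("pneumonia", "bronchitis"))  # 0.333...
print(path_similarity("pneumonia", "fracture"))  # 0.25
print(round(resnik_similarity("pneumonia", "bronchitis", ic), 4))  # 0.2231
```

The path metric depends only on the hierarchy, whereas the Resnik score changes whenever the corpus counts change; that dependence on an external corpus is one way the two families of metrics can disagree.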
PMCID: PMC2655943  PMID: 18999312
23.  Automated Text Markup for Information Retrieval from an Electronic Textbook of Infectious Disease 
The information needs of practicing clinicians frequently require textbook or journal searches. Making these sources available in electronic form speeds up these searches, but precision (i.e., the fraction of retrieved documents that are relevant) remains low. Transforming search terms into canonical concepts does not greatly improve the precision of traditional keyword search.
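The precision definition in parentheses can be written out directly. The minimal sketch below treats retrieval results and relevance judgments as sets of document identifiers; the function name and chapter identifiers are hypothetical.

```python
from typing import Iterable


def precision(retrieved: Iterable[str], relevant: Iterable[str]) -> float:
    """Fraction of retrieved documents that are relevant."""
    retrieved_set = set(retrieved)
    if not retrieved_set:
        raise ValueError("precision is undefined when nothing is retrieved")
    return len(retrieved_set & set(relevant)) / len(retrieved_set)


# Hypothetical search: 4 chapters returned, 2 of them relevant.
print(precision(["ch1", "ch2", "ch3", "ch4"], ["ch2", "ch4", "ch7"]))  # 0.5
```

A relevant chapter that the search missed (ch7 here) does not lower precision; that shortfall is measured by recall instead.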
Kim et al. have designed and built a prototype system (MYCIN II) for computer-based information retrieval from a forthcoming electronic textbook of infectious disease. The system requires manual indexing by experts in the form of complex text markup. However, this markup process is time-consuming (about 3 person-hours to generate, review, and transcribe the index for each of 218 chapters).
We have designed and implemented a system to semiautomate the markup process. The system, information extraction for semiautomated indexing of documents (ISAID), uses query models and existing information-extraction tools to help any user, including the author of the source material, mark up tertiary information sources quickly and accurately.
PMCID: PMC2232314
24.  Determining correspondences between high-frequency MedDRA concepts and SNOMED: a case study 
Background
The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is being advocated as the foundation for encoding clinical documentation. While the electronic medical record is likely to play a critical role in pharmacovigilance (the detection of adverse events due to medications), adverse events are currently classified and reported using the Medical Dictionary for Regulatory Activities (MedDRA). Complete, high-quality MedDRA-to-SNOMED CT mappings can therefore facilitate pharmacovigilance.
The existing mappings, as determined through the Unified Medical Language System (UMLS), are partial, and record only one-to-one correspondences even though SNOMED CT can be used compositionally. Efforts to map previously unmapped MedDRA concepts would be most productive if focused on concepts that occur frequently in actual adverse event data.
We aimed to identify aspects of MedDRA that complicate mapping to SNOMED CT, to determine patterns in unmapped high-frequency MedDRA concepts, and to identify types of integration errors in the mapping of MedDRA to UMLS.
Methods
Using one year's data from the US Food and Drug Administration's Adverse Event Reporting System, we identified the MedDRA preferred terms that collectively accounted for 95% of both Adverse Events and Therapeutic Indications records. After eliminating those already mapped to SNOMED CT, we attempted to map the remaining 645 Adverse-Event and 141 Therapeutic-Indications preferred terms with software assistance.
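The selection step, keeping the preferred terms that together account for 95% of records, can be sketched as a cumulative-frequency cutoff over terms sorted from most to least frequent, applied separately to each record type. The abstract does not describe the exact procedure, so this greedy selection, the helper name `terms_covering` and the term counts are illustrative assumptions.

```python
from collections import Counter
from typing import List


def terms_covering(counts: Counter, coverage: float = 0.95) -> List[str]:
    """Return the most frequent terms whose records together reach `coverage`
    of all records (hypothetical helper; greedy by descending frequency)."""
    total = sum(counts.values())
    if total == 0:
        return []
    selected, cumulative = [], 0
    for term, n in counts.most_common():
        if cumulative / total >= coverage:
            break  # target share of records already reached
        selected.append(term)
        cumulative += n
    return selected


# Hypothetical record counts per preferred term.
record_counts = Counter({"term_a": 50, "term_b": 30, "term_c": 15, "term_d": 5})
print(terms_covering(record_counts))  # ['term_a', 'term_b', 'term_c']
```

Sorting by frequency first keeps the selected set as small as possible for a given coverage, which matches the stated goal of concentrating mapping effort on concepts that occur frequently.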
Results
All but 46 Adverse-Event and 7 Therapeutic-Indications preferred terms could be composed using SNOMED CT concepts, and none of these compositions required more than three SNOMED CT concepts. The common composition patterns are described in the paper. About 30% of both Adverse-Event and Therapeutic-Indications preferred terms corresponded to single SNOMED CT concepts: the correspondence was detectable by human inspection but had been missed during the integration process, which had created duplicate concepts in UMLS.
Conclusions
Identification of composite mapping patterns, and the types of errors that occur in the MedDRA content within UMLS, can focus larger-scale efforts on improving the quality of such mappings, which may assist in the creation of an adverse-events ontology.
doi:10.1186/1472-6947-10-66
PMCID: PMC2988705  PMID: 21029418
25.  Automated Semantic Indexing of Figure Captions to Improve Radiology Image Retrieval 
Objective
We explored automated concept-based indexing of unstructured figure captions to improve retrieval of images from radiology journals.
Design
The MetaMap Transfer program (MMTx) was used to map the text of 84,846 figure captions from 9,004 peer-reviewed, English-language articles to concepts in three controlled vocabularies from the UMLS Metathesaurus, version 2006AA. Sampling procedures were used to estimate the standard information-retrieval metrics of precision and recall, and to evaluate the degree to which concept-based retrieval improved image retrieval.
Measurements
Precision was estimated from a sample of 250 concepts and recall from a sample of 40 concepts. The authors measured how much concept-based retrieval improved upon keyword-based retrieval in a random sample of 10,000 search queries issued by users of a radiology image search engine.
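Because precision and recall were estimated from samples, each estimate carries sampling uncertainty, which the confidence intervals in the results express. The abstract does not state how its intervals were computed. The sketch below uses the simple normal-approximation (Wald) interval for a proportion, clipped to [0, 1], purely as an illustration; it is not expected to reproduce the reported intervals exactly. The function name and sample counts are hypothetical.

```python
import math
from typing import Tuple


def wald_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Point estimate and normal-approximation CI for a sampled proportion,
    clipped to [0, 1]. z = 1.96 gives 95% two-sided coverage."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical sample: 45 of 50 sampled concept matches judged correct.
p, lo, hi = wald_interval(45, 50)
print(f"{p:.3f} ({lo:.3f}-{hi:.3f})")  # 0.900 (0.817-0.983)
```

With small samples or proportions near 1, as with a recall estimate from 40 concepts, the Wald interval can extend past 1 and must be clipped; the Wilson score interval is a common alternative that avoids this.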
Results
Estimated precision was 0.897 (95% confidence interval, 0.857–0.937). Estimated recall was 0.930 (95% confidence interval, 0.838–1.000). In 5,535 of 10,000 search queries (55%), concept-based retrieval found results not identified by simple keyword matching; in 2,086 searches (21%), more than 75% of the results were found by concept-based search alone.
Conclusion
Concept-based indexing of radiology journal figure captions achieved very high precision and recall, and significantly improved image retrieval.
doi:10.1197/jamia.M2945
PMCID: PMC2732225  PMID: 19261938
