Results 1-25 (866862)

Related Articles

1.  SPIN Query Tools for De-identified Research on a Humongous Database 
The Shared Pathology Informatics Network (SPIN), a research initiative of the National Cancer Institute, will allow for the retrieval of more than 4 million pathology reports and specimens. In this paper, we describe the special query tool as developed for the Indianapolis/Regenstrief SPIN node, integrated into the ever-expanding Indiana Network for Patient Care (INPC). This query tool allows for the retrieval of de-identified data sets using complex logic and auto-coded final diagnoses, and it intrinsically supports multiple types of statistical analyses. The new SPIN/INPC database represents a new generation of the Regenstrief Medical Record system – a centralized, but federated system of repositories.
PMCID: PMC1560587  PMID: 16779093
2.  Integrating an automatic classification method into the medical image retrieval process 
Combining low-level features that represent the content of medical images with high level features that are saved with images would allow the expansion of text queries submitted to Content Based Image Retrieval (CBIR) systems. Expanding these text queries would allow CBIR systems to respond more effectively to specific queries when retrieving medical images. We hypothesized that adding an automatic classification method to the current retrieval process would help improve the performance of the University at Buffalo Medical Text and Images Retrieval System (UBMedTIRS). This paper illustrates the results of our approach and its implications for expanding query statements in medical image information retrieval (IR) systems.
PMCID: PMC2655992  PMID: 18999165
3.  Natural Language Access to a Melanoma Data Base 
This paper describes ongoing research towards developing a system that will allow physicians personal access to patient medical data through natural language queries to support both patient management and clinical research. A prototype system has been implemented for a small data base on malignant melanoma. The physician can input queries in English that retrieve specified data for particular patients or for groups of patients satisfying certain characteristics, that perform simple calculations, that allow browsing through the data base, and that assist in identifying relations among attributes. The system supports dialogue interactions; that is, the user can follow a line of inquiry to test a particular hypothesis by entering a sequence of queries that depend on each other. Classes of questions that can be processed are described and examples using the system are given.
PMCID: PMC2231721
4.  Abstraction-based Temporal Data Retrieval for a Clinical Data Repository 
Disease and patient care processes often create characteristic states, trends, and temporal patterns in clinical events and observations, called temporal abstractions. Identifying patient populations who share similar abstractions may be useful for clinical research, outcomes studies, and quality assurance. In these settings, abstractions may be specific to a query, and thus allowing the specification of abstractions directly in the query would be desirable. We propose a query language for specifying and retrieving clinical data sets that allows specifying abstractions directly, and automatically selects data for retrieval based on the presence of abstractions inferred from the data. We describe the language and a prototype implementation, demonstrate its features with two queries constructed in response to clinical researcher-initiated data requests submitted to our institution’s Clinical Data Repository, and report preliminary results from an evaluation of the implementation’s performance.
PMCID: PMC2655874  PMID: 18693907
5.  A Prototype System to Support Evidence-based Practice 
Translating evidence into clinical practice is a complex process that depends on the availability of evidence, the environment into which the research evidence is translated, and the system that facilitates the translation. This paper presents InfoBot, a system designed for automatic delivery of patient-specific information from evidence-based resources. A prototype system has been implemented to support development of individualized patient care plans. The prototype explores possibilities to automatically extract patients’ problems from the interdisciplinary team notes and query evidence-based resources using the extracted terms. Using 4,335 de-identified interdisciplinary team notes for 525 patients, the system automatically extracted biomedical terminology from 4,219 notes and linked resources to 260 patient records. Sixty of those records (15 each for Pediatrics, Oncology & Hematology, Medical & Surgical, and Behavioral Health units) have been selected for an ongoing evaluation of the quality of automatically proactively delivered evidence and its usefulness in development of care plans.
PMCID: PMC2656073  PMID: 18998835
6.  Generic queries for meeting clinical information needs. 
This paper describes a model for automated information retrieval in which questions posed by clinical users are analyzed to establish common syntactic and semantic patterns. The patterns are used to develop a set of general-purpose questions called generic queries. These generic queries are used in responding to specific clinical information needs. Users select generic queries in one of two ways. The user may type in questions, which are then analyzed, using natural language processing techniques, to identify the most relevant generic query; or the user may indicate patient data of interest and then pick one of several potentially relevant questions. Once the query and medical concepts have been determined, an information source is selected automatically, a retrieval strategy is composed and executed, and the results are sorted and filtered for presentation to the user. This work makes extensive use of the National Library of Medicine's Unified Medical Language System (UMLS): medical concepts are derived from the Metathesaurus, medical queries are based on semantic relations drawn from the UMLS Semantic Network, and automated source selection makes use of the Information Sources Map. The paper describes research currently under way to implement this model and reports on experience and results to date.
PMCID: PMC225762  PMID: 8472005
7.  Performance evaluation of unified medical language system®'s synonyms expansion to query PubMed 
Background
PubMed is the main access point to medical literature on the Internet. To enhance the performance of its information retrieval tools, primarily for non-indexed citations, the authors propose a method: expanding users' queries using Unified Medical Language System (UMLS) synonyms, i.e., all the terms gathered under one unique Concept Unique Identifier.
Methods
This method was evaluated using queries constructed to emphasize the differences between this new method and the current PubMed automatic term mapping. Four experts assessed citation relevance.
Results
Using UMLS, we were able to retrieve new citations in 45.5% of queries, which implies a small increase in recall. The new strategy led to a heterogeneous 23.7% mean increase in non-indexed citations retrieved. Of these, 82% had been published less than 4 months earlier. The overall mean precision was 48.4% but differed according to the evaluators, ranging from 36.7% to 88.1% (inter-rater agreement was poor: kappa = 0.34).
Conclusions
This study highlights the need for specific search tools for each type of user and use case. The proposed strategy may be useful for retrieving recent scientific advances.
doi:10.1186/1472-6947-12-12
PMCID: PMC3309945  PMID: 22376010
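The synonym expansion described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the synonym table, its concept identifiers, and the function names `expand_term` and `expand_query` are all hypothetical, and the Boolean query syntax is a generic one rather than PubMed's actual automatic term mapping.

```python
# Hypothetical synonym table keyed by concept identifier. The identifiers and
# term lists are invented for illustration and are not taken from the UMLS.
SYNONYMS = {
    "C_EXAMPLE_1": ["heart attack", "myocardial infarction"],
    "C_EXAMPLE_2": ["high blood pressure", "hypertension"],
}


def expand_term(term, synonyms=SYNONYMS):
    """Return every term sharing a concept with `term`, or just `term`."""
    needle = term.lower()
    for terms in synonyms.values():
        if needle in terms:
            return sorted(terms)
    return [needle]


def expand_query(terms, synonyms=SYNONYMS):
    """OR the synonyms of each term together, then AND the groups."""
    groups = []
    for term in terms:
        variants = expand_term(term, synonyms)
        groups.append("(" + " OR ".join(f'"{v}"' for v in variants) + ")")
    return " AND ".join(groups)


print(expand_query(["heart attack", "aspirin"]))
# prints: ("heart attack" OR "myocardial infarction") AND ("aspirin")
```

Terms without an entry in the table pass through unchanged, so the expansion can only add alternatives to a query, which is consistent with the reported gain in recall.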
8.  Remote Access MicroMeSH: A Microcomputer System for Searching the MEDLINE Database 
This paper describes Remote Access - MicroMeSH (RAMM), a powerful but easy-to-use microcomputer system for searching the medical literature. RAMM uses MicroMeSH [1], a system for accessing the National Library of Medicine's (NLM) Medical Subject Headings (MeSH) vocabulary, to facilitate off-line creation and refinement of highly specific MEDLINE search queries. Using these queries, RAMM automatically searches and retrieves citations from the MEDLINE databases through the NLM's Medical Literature Analysis and Retrieval System (MEDLARS). RAMM is used by both staff and students at Harvard Medical School. As search query creation and citation review are performed off-line, the cost of on-line searching is minimized.
PMCID: PMC2245285
9.  User centered and ontology based information retrieval system for life sciences 
BMC Bioinformatics  2012;13(Suppl 1):S4.
Background
Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications, and the Medical Subject Headings vocabulary is the basis of the indexing of biomedical publications and of the information retrieval process offered by PubMed. However, current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources, and no explanation of their adequacy to the query is provided. Users may thus be confused by the selection and have no idea how to adapt their queries so that the results match their expectations.
Results
This paper describes an information retrieval system that relies on a domain ontology to widen the set of relevant documents that is retrieved and that uses a graphical rendering of query results to favor user interactions. Semantic proximities between ontology concepts and aggregating models are used to assess documents' adequacy with respect to a query. The selection of documents is displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of the data corpus by facilitating query concept weighting and visual explanation. We illustrate the benefit of using this information retrieval system on two case studies, one of which aims at collecting human genes related to transcription factors involved in the hemopoiesis pathway.
Conclusions
The ontology-based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user-centred application in which the system highlights relevant information to support decision making.
doi:10.1186/1471-2105-13-S1-S4
PMCID: PMC3434427  PMID: 22373375
10.  Leveraging medical thesauri and physician feedback for improving medical literature retrieval for case queries 
Objective
This paper presents a study of methods for medical literature retrieval for case queries, in which the goal is to retrieve literature articles similar to a given patient case. In particular, it focuses on analyzing the performance of state-of-the-art general retrieval methods and improving them by the use of medical thesauri and physician feedback.
Materials and Methods
The Kullback–Leibler divergence retrieval model with Dirichlet smoothing is used as the state-of-the-art general retrieval method. Pseudo-relevance feedback and term weighting methods are proposed by leveraging the MeSH and UMLS thesauri. Evaluation is performed on a test collection recently created for the ImageCLEF medical case retrieval challenge.
Results
Experimental results show that a well-tuned state-of-the-art general retrieval model achieves a mean average precision of 0.2754, but the performance can be improved by over 40% to 0.3980, through the proposed methods.
Discussion
The results over the ImageCLEF test collection, which is currently the best collection available for the task, are encouraging. There are, however, limitations due to small evaluation set size. The analysis shows that further refinement of the methods is necessary before they can be really useful in a clinical setting.
Conclusion
Medical case-based literature retrieval is a critical search application that presents a number of unique challenges. This analysis shows that the state-of-the-art general retrieval models are reasonably good for the task, but the performance can be significantly improved by developing new task-specific retrieval models that incorporate medical thesauri and physician feedback.
doi:10.1136/amiajnl-2011-000293
PMCID: PMC3422816  PMID: 22437075
Case search; clinical (L01.700.508.300.190); computer-assisted (L01.700.508.100); decision making; decision support systems; decision support techniques (L01.700.508.190); high-performance and large-scale computing; information management (L01.399); information retrieval; information storage and retrieval (L01.700.508.280); language models; machine learning; medical case-based retrieval; medical case retrieval; medical informatics (L01.313.500); natural language processing; semantic weighing; statistical analysis of large datasets; uncertain reasoning and decision theory; visualization of data and knowledge
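The baseline named in the Materials and Methods above, a language model with Dirichlet smoothing, can be sketched as below. This is an illustrative simplification, not the paper's system: it scores documents by Dirichlet-smoothed log query likelihood, which is rank-equivalent to KL-divergence retrieval only under the assumption that the query model is the plain maximum-likelihood estimate. The toy documents, the function name `dirichlet_score`, and the default `mu=2000.0` are assumptions made for the example.

```python
import math
from collections import Counter


def dirichlet_score(query, doc, collection_counts, collection_len, mu=2000.0):
    """Log query likelihood of `doc` under a Dirichlet-smoothed unigram model.

    p(w | doc) = (count(w, doc) + mu * p(w | collection)) / (len(doc) + mu)
    """
    doc_counts = Counter(doc)
    score = 0.0
    for word in query:
        p_coll = collection_counts.get(word, 0) / collection_len
        if p_coll == 0.0:
            continue  # word never seen in the collection: it cannot discriminate
        p_doc = (doc_counts[word] + mu * p_coll) / (len(doc) + mu)
        score += math.log(p_doc)
    return score


# Toy collection standing in for the literature articles.
docs = {
    "d1": "chest pain after exercise in elderly patient".split(),
    "d2": "skin rash after antibiotic treatment in child".split(),
}
coll = Counter(word for doc in docs.values() for word in doc)
coll_len = sum(coll.values())

query = "chest pain".split()
ranking = sorted(
    docs,
    key=lambda name: dirichlet_score(query, docs[name], coll, coll_len),
    reverse=True,
)
print(ranking)  # prints: ['d1', 'd2']
```

For reference, the reported gain from 0.2754 to 0.3980 mean average precision is a relative improvement of (0.3980 - 0.2754) / 0.2754, or about 44.5%, which matches the abstract's "over 40%".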
11.  SAPHIRE International: a tool for cross-language information retrieval. 
The world's foremost medical literature is written in English, yet much of the world does not speak English as a primary language. This has led to increasing research interest in cross-language information retrieval, where textual databases are queried in languages other than the one in which they are written. We describe enhancements to the SAPHIRE concept-retrieval system, which maps free-text documents and queries to concepts in the UMLS Metathesaurus, that allow it to accept text input and provide Metathesaurus concept output in any of six languages: English, German, French, Russian, Spanish, and Portuguese. An example of the use of SAPHIRE International is shown in the CliniWeb catalogue of clinically-oriented Web pages. A formative evaluation of German terms shows that additional work is required in handling plural and other suffix variants as well as expanding the breadth of synonyms in the UMLS Metathesaurus.
PMCID: PMC2232200  PMID: 9929304
12.  An EHR Prototype Using Structured ISO/EN 13606 Documents to Respond to Identified Clinical Information Needs of Diabetes Specialists: A Controlled Study on Feasibility and Impact 
Cross-institutional longitudinal Electronic Health Records (EHR), as currently being introduced in Austria, increase the challenge of information overload for healthcare professionals. We developed an innovative cross-institutional EHR query prototype that offers extended query options, including searching for specific information items or sets of information items. The available query options were derived from a systematic analysis of information needs of diabetes specialists during patient encounters. The prototype operates in an IHE-XDS-based environment where ISO/EN 13606-structured documents are available.
We conducted a controlled study with seven diabetes specialists to assess the feasibility and impact of this EHR query prototype on efficient retrieval of patient information to answer typical clinical questions. The controlled study showed that the specialists were quicker and more successful (measured in percentage of expected information items found) in finding patient information than with the standard full-document search options. The participants also appreciated the extended query options.
PMCID: PMC3540470  PMID: 23304308
13.  Searching Electronic Health Records for Temporal Patterns in Patient Histories: A Case Study with Microsoft Amalga 
As electronic health records (EHR) become more widespread, they enable clinicians and researchers to pose complex queries that can benefit immediate patient care and deepen understanding of medical treatment and outcomes. However, current query tools make complex temporal queries difficult to pose, and physicians have to rely on computer professionals to specify the queries for them. This paper describes our efforts to develop a novel query tool implemented in a large operational system at the Washington Hospital Center (Microsoft Amalga, formerly known as Azyxxi). We describe our design of the interface to specify temporal patterns and the visual presentation of results, and report on a pilot user study looking for adverse reactions following radiology studies using contrast.
PMCID: PMC2655947  PMID: 18999158
14.  Query Networks for Medical Information Retrieval-Assigning Probabilistic Relationships 
Query networks are specializations of Belief networks used in information retrieval. We hypothesize that query networks can be incorporated into medical information systems in at least two ways: First, the relative values of nodes in the query networks can be used to initiate searches based on query term-weights. Second, query models can incorporate reader feedback and can become simple task-specific user models. If large query networks are to be useful, one must find means to assign reasonable “default” values to those nodes and edges which are not explicitly defined by some other means. This paper presents preliminary data assessing the suitability of various default heuristic query network edge assignment functions. Early evidence suggests that query networks using default assignment functions exhibit behavior consistent with that expected from an information retrieval aid.
PMCID: PMC2245571
15.  Scotch and Sodas: Semantically Oriented Data Access and Storage for a Medical Record DBMS 
This paper describes current research directed toward a system that will allow health professionals simplified access to patient medical data for patient care and clinical research. A prototype medical record database management system has been implemented for a large database of patient records collected at an HMO. A menu-selection technique is used by the health professional for question-type specification, and natural language phrases involving medical terminology are used for specification of selection criteria. The data is semantically structured and encoded in an attempt to provide comprehensive and efficient retrieval of relevant information. A sample system interaction is described.
PMCID: PMC2231913
16.  Query Log Analysis of an Electronic Health Record Search Engine 
We analyzed a longitudinal collection of query logs of a full-text search engine designed to facilitate information retrieval in electronic health records (EHR). The collection, 202,905 queries and 35,928 user sessions recorded over a course of 4 years, represents the information-seeking behavior of 533 medical professionals, including frontline practitioners, coding personnel, patient safety officers, and biomedical researchers, for patient data stored in EHR systems. In this paper, we present descriptive statistics of the queries, a categorization of information needs manifested through the queries, as well as temporal patterns of the users’ information-seeking behavior. The results suggest that information needs in the medical domain are substantially more sophisticated than those that general-purpose web search engines need to accommodate. Therefore, we see a significant challenge, along with significant opportunities, in providing intelligent query recommendations to facilitate information retrieval in EHR.
PMCID: PMC3243246  PMID: 22195150
17.  MLTrends: Graphing MEDLINE term usage over time 
The MEDLINE database of medical literature is routinely used by researchers and doctors to find articles pertaining to their area of interest. Insight into historical changes in research areas and use of scientific language may be gained by chronological analysis of the 18 million records currently in the database; however, such analysis is generally complex and time consuming. The authors’ MLTrends web application graphs term usage in MEDLINE over time, allowing the determination of emergence dates for biomedical terms and historical variations in term usage intensity. Terms considered are individual words or quoted phrases, which may be combined using Boolean operators. MLTrends can plot the number of records in MEDLINE per year whose titles or abstracts match each queried term for multiple terms simultaneously. The MEDLINE database is stored and indexed on the MLTrends server, allowing queries to be completed and graphs generated in less than one second. Queries may be performed on all titles and/or abstracts in MEDLINE and can include stop words. The resulting graphs may be normalized by total publications or words per year to facilitate term usage comparison between years. This makes MLTrends a powerful tool for rapid graphical evaluation of the evolution of biomedical research and language. MLTrends may be used at: http://www.ogic.ca/mltrends
PMCID: PMC2990277  PMID: 20333611
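The per-year counting and normalization described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the MLTrends implementation: the records are invented, matching is a plain case-insensitive substring test rather than an indexed word or phrase search, and the function name `term_usage_by_year` is hypothetical.

```python
from collections import Counter

# Invented records standing in for MEDLINE entries: (year, title or abstract).
RECORDS = [
    (2001, "Gene expression profiling in breast cancer"),
    (2001, "A survey of hospital information systems"),
    (2002, "Microarray analysis of gene expression"),
    (2002, "Gene therapy trials"),
]


def term_usage_by_year(records, term, normalize=False):
    """Count records per year whose text contains `term`.

    With normalize=True, each count is divided by the total number of
    records published that year, so that years can be compared.
    """
    needle = term.lower()
    totals = Counter(year for year, _ in records)
    hits = Counter(year for year, text in records if needle in text.lower())
    if normalize:
        return {year: hits[year] / totals[year] for year in sorted(totals)}
    return {year: hits[year] for year in sorted(totals)}


print(term_usage_by_year(RECORDS, "gene"))                  # {2001: 1, 2002: 2}
print(term_usage_by_year(RECORDS, "gene", normalize=True))  # {2001: 0.5, 2002: 1.0}
```

Normalizing by the yearly total separates a genuine rise in a term's popularity from the overall growth of the database.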
18.  A graphical tool for ad hoc query generation. 
Medical data are characterized by complex taxonomies and evolving terminology. Questions that clinicians, medical administrators, and researchers may wish to answer using medical databases are not easily formulated as SQL queries. In this paper we describe a graphical tool that facilitates formulation of ad hoc questions as SQL queries. This tool manages multiple attribute hierarchies and creates SQL query strings by navigating through the hierarchies. This interactive tool has been optimized using indexing to improve the overall speed of the query building and the data retrieval process. Indexed queries performed 5 to 100 times faster than query strings. However, query string generation time depends on the size of the taxonomies describing the hierarchies, while the index generation time depends on the size of the data warehouse.
PMCID: PMC2232066  PMID: 9929270
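One way to picture building an SQL query string by navigating an attribute hierarchy, as this abstract describes, is sketched below. Everything here is an assumption made for illustration: the hierarchy contents, the `diagnoses` table and `code` column, and the function names `leaves` and `build_query` are hypothetical and do not come from the paper.

```python
# Hypothetical attribute hierarchy: each node maps to its child nodes.
HIERARCHY = {
    "cardiovascular": ["hypertension", "arrhythmia"],
    "arrhythmia": ["atrial fibrillation"],
}


def leaves(node, hierarchy=HIERARCHY):
    """Collect the leaf values under `node` by walking the hierarchy."""
    children = hierarchy.get(node, [])
    if not children:
        return [node]
    found = []
    for child in children:
        found.extend(leaves(child, hierarchy))
    return found


def build_query(node, table="diagnoses", column="code"):
    """Return a parameterized SQL string and its values for a hierarchy node."""
    values = leaves(node)
    placeholders = ", ".join("?" for _ in values)
    sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
    return sql, values


sql, params = build_query("cardiovascular")
print(sql)     # SELECT * FROM diagnoses WHERE code IN (?, ?)
print(params)  # ['hypertension', 'atrial fibrillation']
```

Because the string is rebuilt by walking the hierarchy, its cost grows with the size of the taxonomy, in line with the abstract's remark about query string generation time.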
19.  Intelligent Focusing in Knowledge Indexing and Retrieval: The Relatedness Tool 
Most present-day information retrieval systems use the presence or absence of certain words to decide which documents are appropriate for a user's query. This approach has had certain successes, but it fails to capture relationships between concepts represented by the words, and hence reduces the potential specificity of both indexing and searching of documents. A richer representation of the semantics of documents and queries, and methods for reasoning about these representations, have been provided by artificial intelligence. Navigational tools for browsing and authoring knowledge bases (KBs) add a convenient technique for focusing in the complex landscape of semantic representations. The center of such representations is usually a frame or a semantic network system. We are developing a prototype Unified Medical Language System (UMLS) taxonomy to represent objects and relationships in medicine. One focus of our research is improved methods for indexing and querying repositories of biomedical literature. The technique which we propose is based on the notion of relatedness of concepts. To this end we define heuristics which find related concepts and apply them to the UMLS taxonomy. Preliminary results from experiments with the implemented heuristics demonstrate their potential usefulness.
PMCID: PMC2245195
20.  Beyond Information Retrieval—Medical Question Answering 
Physicians have many questions when caring for patients, and frequently need to seek answers to their questions. Information retrieval systems (e.g., PubMed) typically return a list of documents in response to a user’s query. Frequently the number of returned documents is large and makes physicians’ information seeking “practical only ‘after hours’ and not in the clinical settings”. Question answering techniques are based on automatically analyzing thousands of electronic documents to generate short-text answers in response to clinical questions that are posed by physicians. The authors address physicians’ information needs and describe the design, implementation, and evaluation of the medical question answering system (MedQA). Although our long-term goal is to enable MedQA to answer all types of medical questions, we currently implement MedQA to integrate information retrieval, extraction, and summarization techniques to automatically generate paragraph-level text for definitional questions (i.e., “What is X?”). MedQA can be accessed at http://www.dbmi.columbia.edu/~yuh9001/research/MedQA.html.
PMCID: PMC1839371  PMID: 17238385
21.  Automatic inference of indexing rules for MEDLINE 
BMC Bioinformatics  2008;9(Suppl 11):S11.
Background:
Indexing is a crucial step in any information retrieval system. In MEDLINE, a widely used database of the biomedical literature, the indexing process involves the selection of Medical Subject Headings in order to describe the subject matter of articles. The need for automatic tools to assist MEDLINE indexers in this task is growing with the increasing number of publications being added to MEDLINE.
Methods:
In this paper, we describe the use and the customization of Inductive Logic Programming (ILP) to infer indexing rules that may be used to produce automatic indexing recommendations for MEDLINE indexers.
Results:
Our results show that this original ILP-based approach outperforms manual rules when they exist. In addition, the use of ILP rules also improves the overall performance of the Medical Text Indexer (MTI), a system producing automatic indexing recommendations for MEDLINE.
Conclusion:
We expect the sets of ILP rules obtained in this experiment to be integrated into MTI.
doi:10.1186/1471-2105-9-S11-S11
PMCID: PMC2586750  PMID: 19025687
22.  OntoQuest: A Physician Decision Support System based on Ontological Queries of the Hospital Database 
OntoQuest is a physician decision support system that mines the hospital database for previous decisions made in cases similar to the current one. For example, OntoQuest displays a list of the medications prescribed to similar historical patients, against which the physician may compare the choice of medication for the current patient. This information retrieval is accomplished using ontological queries. Unlike a regular database query, an ontological query is able to account for semantic similarity between patients. To implement the ontological query, we propose a method for computing the ontological similarity between patients represented by sets of ICD-9 diagnoses. We have tested the OntoQuest prototype on a pilot data set of 2077 patients. Finally, we compare the OntoQuest performance to conventional database queries. We believe that OntoQuest can be extended to compare services, quality and outcomes among patient and provider groups.
PMCID: PMC1839267  PMID: 17238419
23.  Ranking the whole MEDLINE database according to a large training set using text indexing 
BMC Bioinformatics  2005;6:75.
Background
The MEDLINE database contains over 12 million references to scientific literature, with about 3/4 of recent articles including an abstract of the publication. Retrieval of entries using queries with keywords is useful for human users that need to obtain small selections. However, particular analyses of the literature or database developments may need the complete ranking of all the references in the MEDLINE database as to their relevance to a topic of interest. This report describes a method that does this ranking using the differences in word content between MEDLINE entries related to a topic and the whole of MEDLINE, in a computational time appropriate for an article search query engine.
Results
We tested the capabilities of our system to retrieve MEDLINE references which are relevant to the subject of stem cells. We took advantage of the existing annotation of references with terms from the MeSH hierarchical vocabulary (Medical Subject Headings, developed at the National Library of Medicine). A training set of 81,416 references was constructed by selecting entries annotated with the MeSH term stem cells or some child in its subtree. Frequencies of all nouns, verbs, and adjectives in the training set were computed, and the ratios of word frequencies in the training set to those in the entire MEDLINE were used to score references. Self-consistency of the algorithm, benchmarked with a test set containing the training set and an equal number of references randomly selected from MEDLINE, was better using nouns (79%) than adjectives (73%) or verbs (70%). The evaluation of the system with 6,923 references not used for training, containing 204 articles relevant to stem cells according to a human expert, indicated a recall of 65% for a precision of 65%.
Conclusion
This strategy appears to be useful for predicting the relevance of MEDLINE references to a given concept. The method is simple and can be used with any user-defined training set. Choice of the part of speech of the words used for classification has important effects on performance. Lists of words, scripts, and additional information are available from the web address .
doi:10.1186/1471-2105-6-75
PMCID: PMC1274266  PMID: 15790421
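The frequency-ratio scoring described in the Results above can be sketched as follows. The abstract does not say how per-word ratios are combined into a reference score, so the averaging of log ratios, the add-one smoothing, the toy token lists, and the function names `word_ratios` and `score` are all assumptions made for this illustration; part-of-speech filtering is omitted.

```python
import math
from collections import Counter


def word_ratios(training_docs, background_docs, smoothing=1.0):
    """Log ratio of each word's relative frequency: training vs. background."""
    train = Counter(word for doc in training_docs for word in doc)
    back = Counter(word for doc in background_docs for word in doc)
    vocab = set(train) | set(back)
    n_train = sum(train.values()) + smoothing * len(vocab)
    n_back = sum(back.values()) + smoothing * len(vocab)
    return {
        word: math.log(
            ((train[word] + smoothing) / n_train)
            / ((back[word] + smoothing) / n_back)
        )
        for word in vocab
    }


def score(doc, ratios):
    """Mean log ratio over a document's known words (assumed combination)."""
    known = [ratios[word] for word in doc if word in ratios]
    return sum(known) / len(known) if known else 0.0


# Toy token lists; the background includes the training set, as the
# "whole of MEDLINE" includes the topic-related entries.
training = [["stem", "cell", "differentiation"], ["stem", "cell", "niche"]]
background = training + [
    ["cardiac", "surgery", "outcome"],
    ["renal", "failure", "cohort"],
]

ratios = word_ratios(training, background)
print(score(["stem", "cell"], ratios) > score(["cardiac", "surgery"], ratios))
# prints: True
```

Sorting every reference by such a score yields the complete ranking the abstract describes, and a threshold on the score sets the trade-off between recall and precision.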
24.  Besides Precision & Recall: Exploring Alternative Approaches to Evaluating an Automatic Indexing Tool for MEDLINE 
Objective
This paper explores alternative approaches for the evaluation of an automatic indexing tool for MEDLINE, complementing the traditional precision and recall method.
Materials and methods
The performance of MTI, the Medical Text Indexer used at NLM to produce MeSH recommendations for biomedical journal articles, is evaluated on a random set of MEDLINE citations. The evaluation examines semantic similarity at the term level (indexing terms). In addition, the documents retrieved by queries resulting from MTI index terms for a given document are compared to the PubMed related citations for this document.
Results
Semantic similarity scores between sets of index terms are higher than the corresponding Dice similarity scores. Overall, 75% of the original documents and 58% of the top ten related citations are retrieved by queries based on the automatic indexing.
Conclusions
The alternative measures studied in this paper confirm previous findings and may be used to select particular documents from the test set for a more thorough analysis.
PMCID: PMC1839480  PMID: 17238409
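The Dice similarity mentioned in the Results above is the standard set-overlap coefficient, 2|A ∩ B| / (|A| + |B|). A minimal sketch follows; the index-term sets are invented examples, and treating two empty sets as identical is a convention chosen here, not something stated in the abstract.

```python
def dice(a, b):
    """Dice coefficient between two collections of index terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return 2 * len(a & b) / (len(a) + len(b))


# Invented example: terms suggested automatically vs. assigned manually.
automatic = {"Humans", "Stem Cells", "Mice"}
manual = {"Humans", "Stem Cells", "Rats"}
print(round(dice(automatic, manual), 3))  # prints: 0.667
```

Dice credits only exact matches, so near-misses such as "Mice" against "Rats" count for nothing. A semantic similarity measure that gives partial credit to related terms would be expected to score at least as high, which is in line with the reported result.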
25.  iSMART: Ontology-based Semantic Query of CDA Documents 
The Health Level 7 Clinical Document Architecture (CDA) is widely accepted as the format for electronic clinical documents. With the rich ontological references in CDA documents, ontology-based semantic queries can be performed to retrieve CDA documents. In this paper, we present iSMART (interactive Semantic MedicAl Record reTrieval), a prototype system designed for ontology-based semantic query of CDA documents. The clinical information in CDA documents is extracted into RDF triples by a declarative XML to RDF transformer. An ontology reasoner is developed to infer additional information by combining the background knowledge from the SNOMED CT ontology. Then an RDF query engine is leveraged to enable the semantic queries. This system has been evaluated using real clinical documents collected from a large hospital in southern China.
PMCID: PMC2815425  PMID: 20351883
