Related Articles

1.  A Virtual Notebook for biomedical work groups. 
During the past several years, Baylor College of Medicine has made a substantial commitment to the use of information technology in support of its corporate and academic programs. The concept of an Integrated Academic Information Management System (IAIMS) has proved central in our planning, and the IAIMS activities that we have undertaken with funding from the National Library of Medicine have proved to be important extensions of our technology development. Here we describe our Virtual Notebook system, a conceptual and technologic framework for task coordination and information management in biomedical work groups. When fully developed and deployed, the Virtual Notebook will improve the functioning of basic and clinical research groups in the college, and it currently serves as a model for the longer-term development of our entire information management environment.
PMCID: PMC227118  PMID: 3046694
2.  IAIMS development at Baylor College of Medicine. 
At Baylor College of Medicine, we are developing the technical and intellectual resources needed to realize the Integrated Academic Information Management System (IAIMS) concept fully. The substantial technical, organizational, and financial commitments involved demand that we align our efforts with the strategic purposes of the college. The support of science, therefore, has become the principal, but not exclusive, focus of Baylor's IAIMS effort. Even so, the information technology architecture we have created for biomedical research is proving valuable in other settings as well. And the infrastructure we are creating--the communications architecture and the linkages to information resources--serves many purposes in addition to those of research. The architecture accommodates a diversity of workstations, networks, and informational and computational servers, which gives us the greatest possible chance of transferring the fruits of our Phase III development to other academic medical centers.
PMCID: PMC225664  PMID: 1326367
3.  Ranking the whole MEDLINE database according to a large training set using text indexing 
BMC Bioinformatics  2005;6:75.
The MEDLINE database contains over 12 million references to scientific literature, with about 3/4 of recent articles including an abstract of the publication. Retrieval of entries using queries with keywords is useful for human users who need to obtain small selections. However, particular analyses of the literature or database developments may need the complete ranking of all the references in the MEDLINE database as to their relevance to a topic of interest. This report describes a method that does this ranking using the differences in word content between MEDLINE entries related to a topic and the whole of MEDLINE, in a computational time appropriate for an article search query engine.
We tested the capabilities of our system to retrieve MEDLINE references which are relevant to the subject of stem cells. We took advantage of the existing annotation of references with terms from the MeSH hierarchical vocabulary (Medical Subject Headings, developed at the National Library of Medicine). A training set of 81,416 references was constructed by selecting entries annotated with the MeSH term "stem cells" or some child in its subtree. Frequencies of all nouns, verbs, and adjectives in the training set were computed and the ratios of word frequencies in the training set to those in the entire MEDLINE were used to score references. Self-consistency of the algorithm, benchmarked with a test set containing the training set and an equal number of references randomly selected from MEDLINE, was better using nouns (79%) than adjectives (73%) or verbs (70%). The evaluation of the system with 6,923 references not used for training, containing 204 articles relevant to stem cells according to a human expert, indicated a recall of 65% for a precision of 65%.
This strategy appears to be useful for predicting the relevance of MEDLINE references to a given concept. The method is simple and can be used with any user-defined training set. Choice of the part of speech of the words used for classification has important effects on performance. Lists of words, scripts, and additional information are available from the web address .
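The frequency-ratio scoring can be sketched in a few lines of Python. This is a minimal illustration, not the authors' published scripts: it assumes references arrive already tokenized and filtered to one part of speech (for example nouns), uses add-one smoothing (an assumption; the abstract does not say how zero counts are handled), and scores a reference by summing the log frequency ratios of its distinct words.

```python
import math
from collections import Counter


def word_log_ratios(training_docs, background_docs):
    """log(relative frequency in training set / relative frequency in background).

    Each document is a list of words, assumed already filtered to one part of
    speech. Add-one smoothing (an assumption of this sketch) avoids log(0).
    """
    train = Counter(w for doc in training_docs for w in doc)
    background = Counter(w for doc in background_docs for w in doc)
    vocab = set(train) | set(background)
    n_train = sum(train.values()) + len(vocab)
    n_back = sum(background.values()) + len(vocab)
    return {
        w: math.log(((train[w] + 1) / n_train) / ((background[w] + 1) / n_back))
        for w in vocab
    }


def score(doc, ratios):
    """Sum the log ratios of the distinct words in one reference."""
    return sum(ratios.get(w, 0.0) for w in set(doc))


# Toy data: the background stands in for "all of MEDLINE" and includes the training set.
training = [["stem", "cell", "differentiation"], ["stem", "cell", "marrow"]]
background = training + [["heart", "failure", "trial"], ["renal", "failure", "dialysis"]]
ratios = word_log_ratios(training, background)

candidates = {"A": ["stem", "cell", "niche"], "B": ["heart", "failure", "outcome"]}
ranked = sorted(candidates, key=lambda k: score(candidates[k], ratios), reverse=True)
print(ranked)  # ['A', 'B']
```

Words that are over-represented in the topic set push a reference up and words typical of the rest of MEDLINE push it down; words never seen contribute nothing.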
PMCID: PMC1274266  PMID: 15790421
4.  Tools for loading MEDLINE into a local relational database 
BMC Bioinformatics  2004;5:146.
Researchers who use MEDLINE for text mining, information extraction, or natural language processing may benefit from having a copy of MEDLINE that they can manage locally. The National Library of Medicine (NLM) distributes MEDLINE in eXtensible Markup Language (XML)-formatted text files, but it is difficult to query MEDLINE in that format. We have developed software tools to parse the MEDLINE data files and load their contents into a relational database. Although the task is conceptually straightforward, the size and scope of MEDLINE make the task nontrivial. Given the increasing importance of text analysis in biology and medicine, we believe a local installation of MEDLINE will provide helpful computing infrastructure for researchers.
We developed three software packages that parse and load MEDLINE, and ran each package to install separate instances of the MEDLINE database. For each installation, we collected data on loading time and disk-space utilization to provide examples of the process in different settings. Settings differed in terms of commercial database-management system (IBM DB2 or Oracle 9i), processor (Intel or Sun), programming language of installation software (Java or Perl), and methods employed in different versions of the software. The loading times for the three installations were 76 hours, 196 hours, and 132 hours, and disk-space utilization was 46.3 GB, 37.7 GB, and 31.6 GB, respectively. Loading times varied due to a variety of differences among the systems. Loading time also depended on whether data were written to intermediate files or not, and on whether input files were processed in sequence or in parallel. Disk-space utilization depended on the number of MEDLINE files processed, amount of indexing, and whether abstracts were stored as character large objects or truncated.
Relational database (RDBMS) technology supports indexing and querying of very large datasets, and can accommodate a locally stored version of MEDLINE. RDBMS systems support a wide range of queries and facilitate certain tasks that are not directly supported by the application programming interface to PubMed. Because there is variation in hardware, software, and network infrastructures across sites, we cannot predict the exact time required for a user to load MEDLINE, but our results suggest that performance of the software is reasonable. Our database schemas and conversion software are publicly available at .
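A minimal sketch of the parse-and-load step, using only Python's standard library with SQLite standing in for the DB2 and Oracle systems the authors used. It streams the XML with `iterparse` instead of reading a whole file into memory and keeps only the PMID, title, and abstract; the single-table schema is a hypothetical simplification, not the authors' published schema.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# A tiny stand-in for one MEDLINE XML file (real files hold many thousands of records).
SAMPLE = b"""<MedlineCitationSet>
  <MedlineCitation>
    <PMID>15471541</PMID>
    <Article>
      <ArticleTitle>Tools for loading MEDLINE into a local relational database</ArticleTitle>
      <Abstract><AbstractText>Researchers who use MEDLINE for text mining may benefit from a local copy.</AbstractText></Abstract>
    </Article>
  </MedlineCitation>
</MedlineCitationSet>"""


def load_citations(source, conn):
    """Stream MedlineCitation records from source into a citation table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS citation "
        "(pmid INTEGER PRIMARY KEY, title TEXT, abstract TEXT)"
    )
    rows = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "MedlineCitation":
            continue
        pmid = int(elem.findtext("PMID"))
        title = elem.findtext("Article/ArticleTitle")
        # Structured abstracts have several AbstractText parts; itertext()
        # also keeps text nested inside inline markup.
        parts = ["".join(t.itertext()) for t in elem.iterfind("Article/Abstract/AbstractText")]
        abstract = " ".join(parts) or None  # None when there is no abstract
        rows.append((pmid, title, abstract))
        elem.clear()  # drop this record's children once they have been read
    conn.executemany("INSERT OR REPLACE INTO citation VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
n = load_citations(io.BytesIO(SAMPLE), conn)
title = conn.execute("SELECT title FROM citation WHERE pmid = 15471541").fetchone()[0]
print(n, title)
```

Whether abstracts are stored whole (as here) or truncated is one of the choices the abstract above reports as affecting disk-space utilization.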
PMCID: PMC524480  PMID: 15471541
5.  Using Hypertext to Facilitate Information Sharing in Biomedical Research Groups 
As part of our effort to create an Integrated Academic Information Management System at Baylor College of Medicine, we are developing information technology to support the efforts of scientific work groups. Many of our ideas in this regard are embodied in a system called the Virtual Notebook, which is intended to facilitate information sharing and management in such groups. Here we discuss the foundations of that system: a hypertext system that we have developed using a relational database and the distributable interface that we have written in the X Window System.
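One way to picture a hypertext layer built on a relational database is as two tables: one of nodes (pages) and one of typed links between them. The schema below is hypothetical, since the abstract does not publish the Virtual Notebook's tables, and SQLite stands in for whatever database the system actually used.

```python
import sqlite3

# Hypothetical schema: hypertext nodes plus directed, labelled links.
SCHEMA = """
CREATE TABLE node (id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT);
CREATE TABLE link (
    source INTEGER NOT NULL REFERENCES node(id),
    target INTEGER NOT NULL REFERENCES node(id),
    label  TEXT,
    PRIMARY KEY (source, target)
);
"""


def linked_titles(conn, node_id):
    """Titles of the nodes reachable by following one link from node_id."""
    rows = conn.execute(
        "SELECT n.title FROM link l JOIN node n ON n.id = l.target "
        "WHERE l.source = ? ORDER BY n.title",
        (node_id,),
    )
    return [title for (title,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", [
    (1, "Lab notebook page", "Experiment 12 notes"),
    (2, "Protocol: PCR", "Standard PCR protocol"),
    (3, "Gel image", "Scanned gel"),
])
conn.executemany("INSERT INTO link VALUES (?, ?, ?)", [(1, 2, "uses"), (1, 3, "result")])
print(linked_titles(conn, 1))  # ['Gel image', 'Protocol: PCR']
```

Keeping links in their own table means following a hyperlink is an ordinary indexed join, and several users' interfaces can share one consistent store.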
PMCID: PMC2245694
6.  G-Bean: an ontology-graph based web tool for biomedical literature retrieval 
BMC Bioinformatics  2014;15(Suppl 12):S1.
Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph based biomedical search engine, to search biomedical articles in MEDLINE database more efficiently.
G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on user's search intention: after the user selects any article from the existing search results, G-Bean analyzes user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles.
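Personalized PageRank differs from ordinary PageRank only in where the random surfer teleports: back to the query's own concepts rather than to any node. A minimal power-iteration sketch follows; the toy concept graph, the damping factor of 0.85, and the fixed iteration count are assumptions for illustration, and G-Bean's subsequent TF-IDF re-ranking and top-500 cutoff are not shown.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Personalized PageRank by power iteration.

    graph: dict mapping every node to a list of its neighbours.
    seeds: the query's concepts; teleportation returns only to these.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: send its mass back to the seeds.
                for s in seeds:
                    nxt[s] += damping * rank[n] / len(seeds)
        rank = nxt
    return rank


# Hypothetical fragment of an ontology graph.
concepts = {
    "stem cells": ["hematopoietic stem cells", "cell differentiation"],
    "hematopoietic stem cells": ["stem cells", "bone marrow"],
    "cell differentiation": ["stem cells"],
    "bone marrow": ["hematopoietic stem cells"],
    "heart failure": ["cardiomyopathy"],
    "cardiomyopathy": ["heart failure"],
}
rank = personalized_pagerank(concepts, {"stem cells"})
for concept in sorted(rank, key=rank.get, reverse=True):
    print(f"{concept:25s} {rank[concept]:.3f}")
```

Concepts unreachable from the seeds, such as the cardiology pair here, score zero, which is what makes the ranking specific to the query.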
Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at
G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user.
PMCID: PMC4243180  PMID: 25474588
7.  The development of a client application for the collaborative social and medical services system. 
This paper describes the design and implementation of a client application for the Baylor College of Medicine Teen Health Clinics. The application is the front end to the Collaborative Social and Medical Services System (CSMSS) under development by Baylor's Medical Informatics and Computing Research Program [8]. The application provides distributed access to an underlying object-oriented database system. A process-driven and patient-centered design will provide staff members with a complete set of services, including forms for data entry and viewing, query, and access management, to facilitate efficient and effective delivery of services. Role-specific interfaces will be supplied for clerks, nurses, nurse practitioners, physicians, and social workers. The client application is being designed using object-oriented methodologies and technologies with the C++ programming language, and will operate within a Microsoft Windows operating environment, utilizing Object Linking and Embedding for application interoperability.
PMCID: PMC2247750  PMID: 7950000
8.  MScanner: a classifier for retrieving Medline citations 
BMC Bioinformatics  2008;9:108.
Keyword searching through PubMed and other systems is the standard means of retrieving information from Medline. However, ad-hoc retrieval systems do not meet all of the needs of databases that curate information from literature, or of text miners developing a corpus on a topic that has many terms indicative of relevance. Several databases have developed supervised learning methods that operate on a filtered subset of Medline, to classify Medline records so that fewer articles have to be manually reviewed for relevance. A few studies have considered generalisation of Medline classification to operate on the entire Medline database in a non-domain-specific manner, but existing applications lack speed, available implementations, or a means to measure performance in new domains.
MScanner is an implementation of a Bayesian classifier that provides a simple web interface for submitting a corpus of relevant training examples in the form of PubMed IDs and returning results ranked by decreasing probability of relevance. For maximum speed it uses the Medical Subject Headings (MeSH) and journal of publication as a concise document representation, and takes roughly 90 seconds to return results against the 16 million records in Medline. The web interface provides interactive exploration of the results, and cross-validated performance evaluation on the relevant input against a random subset of Medline. We describe the classifier implementation, cross-validate it on three domain-specific topics, and compare its performance to that of an expert PubMed query for a complex topic. In cross-validation on the three sample topics against 100,000 random articles, the classifier achieved excellent separation of relevant and irrelevant article score distributions, ROC areas between 0.97 and 0.99, and averaged precision between 0.69 and 0.92.
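The abstract names a Bayesian classifier over MeSH terms and journal, but not its exact model or smoothing. As a hedged sketch, the code below learns per-feature log-likelihood ratios from relevant and background citations (in MScanner's evaluation the background was a random subset of Medline) and sums them over the features a citation has. Scoring only present features is a simplification of a full Bernoulli naive Bayes, which would also weigh absent features; the `journal:` prefix and the toy records are illustrative assumptions.

```python
import math
from collections import Counter


def train_weights(relevant, background, alpha=1.0):
    """Per-feature log-likelihood ratios, log P(f|relevant) - log P(f|background).

    Each citation is a set of features: its MeSH terms plus "journal:<title>".
    P(f|class) is the smoothed fraction of that class's citations containing f.
    """
    rel = Counter(f for doc in relevant for f in doc)
    bg = Counter(f for doc in background for f in doc)
    n_rel, n_bg = len(relevant), len(background)
    return {
        f: math.log((rel[f] + alpha) / (n_rel + 2 * alpha))
        - math.log((bg[f] + alpha) / (n_bg + 2 * alpha))
        for f in set(rel) | set(bg)
    }


def score(citation, weights, prior_log_odds=0.0):
    """Log-odds of relevance, summing evidence over the features present."""
    return prior_log_odds + sum(weights.get(f, 0.0) for f in citation)


relevant = [
    {"Stem Cells", "Cell Differentiation", "journal:Blood"},
    {"Stem Cells", "Bone Marrow", "journal:Blood"},
]
background = [
    {"Heart Failure", "journal:Lancet"},
    {"Stem Cells", "journal:Nature"},
    {"Renal Dialysis", "journal:Lancet"},
    {"Heart Failure", "Cardiomyopathy", "journal:Circulation"},
]
weights = train_weights(relevant, background)

candidates = {
    "c1": {"Stem Cells", "journal:Blood"},
    "c2": {"Heart Failure", "journal:Lancet"},
}
ranked = sorted(candidates, key=lambda k: score(candidates[k], weights), reverse=True)
print(ranked)  # ['c1', 'c2']
```

Because each citation is reduced to a small set of MeSH and journal features, scoring all of Medline is a sum of dictionary lookups per record, which is consistent with the speed the abstract reports.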
MScanner is an effective non-domain-specific classifier that operates on the entire Medline database, and is suited to retrieving topics for which many features may indicate relevance. Its web interface simplifies the task of classifying Medline citations, compared to building a pre-filter and classifier specific to the topic. The data sets and open source code used to obtain the results in this paper are available on-line and as supplementary material, and the web interface may be accessed at .
PMCID: PMC2263023  PMID: 18284683
9.  Accelerating Best Care at Baylor Dallas 
A culture of quality improvement (QI) is needed to bridge the gap between possible STEEEP™ (safe, timely, effective, efficient, equitable, and patient-centered) care and actual usual care. Baylor Health Care System (BHCS) developed Accelerating Best Care at Baylor (ABC Baylor), an innovative educational program that teaches health care leaders the theory and techniques of rapid-cycle QI. Course participants learn general principles of continuous QI, as well as health care–specific QI techniques, and finish the course by designing and implementing their own QI project. ABC Baylor has been employed in a variety of settings and has spread its success to other organizations, especially small and rural hospitals. These hospitals, like BHCS, have demonstrated sustained improvements that are due in part to the use of ABC Baylor and its reliance on specific modules that focus on health care safety, service, equity, and chronic disease management. ABC Baylor training and consulting are part of the overall culture and infrastructure that have allowed BHCS to achieve success in its improvement journey, including the receipt of several national awards and the achievement of high reliability in compliance with Centers for Medicare and Medicaid Services core measures of processes of care related to heart failure, acute myocardial infarction, community-acquired pneumonia, and surgical care. The culture of rapid-cycle QI facilitated by ABC Baylor serves to link BHCS's vision and goals to practical execution.
PMCID: PMC2760161  PMID: 19865500
10.  Integrating Clinical Medicine into Biomedical Graduate Education to Promote Translational Research: Strategies from Two New PhD Programs 
For several decades, a barrier has existed between research and clinical medicine, making it difficult for aspiring scientists to gain exposure to human pathophysiology and access to clinical/translational research mentors during their graduate training. In 2005, the Howard Hughes Medical Institute announced the Med Into Grad initiative to support graduate programs that integrate clinical knowledge into PhD biomedical training, with the goal of preparing a new cadre of translational researchers to work at the interface of the basic sciences and clinical medicine. Two institutions, Baylor College of Medicine and the Cleveland Clinic/Case Western Reserve University, developed new PhD programs in translational biology and/or molecular medicine. These programs teach the topics and skills that today’s translational researchers must learn as well as expose students to clinical medicine. In this article, the authors compare and contrast the history, implementation, and evaluation of the Translational Biology and Molecular Medicine program at Baylor College of Medicine and the Molecular Medicine program at the Cleveland Clinic/Case Western Reserve University. The authors also demonstrate the feasibility of creating a multidisciplinary graduate program in molecular medicine that integrates pathophysiology and clinical medicine without extending training time. They conclude with a discussion of the similarities in training approaches that exist despite the fact that each program was independently developed and offer observations that emerged during their collaboration that may benefit others who are considering developing similar programs.
PMCID: PMC3529996  PMID: 23165264
11.  Adherence to HAART: A Systematic Review of Developed and Developing Nation Patient-Reported Barriers and Facilitators 
PLoS Medicine  2006;3(11):e438.
Adherence to highly active antiretroviral therapy (HAART) medication is the greatest patient-enabled predictor of treatment success and mortality for those who have access to drugs. We systematically reviewed the literature to determine patient-reported barriers and facilitators to adhering to antiretroviral therapy.
Methods and Findings
We examined both developed and developing nations. We searched the following databases from inception to June 2005: AMED, Campbell Collaboration, CINAHL, Cochrane Library, Embase, ERIC, MEDLINE, and NHS EED. We retrieved studies conducted in both developed and developing nation settings that examined barriers and facilitators addressing adherence. Both qualitative and quantitative studies were included. We independently, in duplicate, extracted data reported in qualitative studies addressing adherence. We then examined all quantitative studies addressing the barriers and facilitators identified in the qualitative studies. To place the findings of the qualitative studies in a generalizable context, we meta-analyzed the surveys to determine a best estimate of the overall prevalence of issues. We included 37 qualitative studies and 47 studies using a quantitative methodology (surveys). Seventy-two studies (35 qualitative) were conducted in developed nations, while the remaining 12 (two qualitative) were conducted in developing nations. Important barriers reported in both economic settings included fear of disclosure, concomitant substance abuse, forgetfulness, suspicions of treatment, regimens that are too complicated, number of pills required, decreased quality of life, work and family responsibilities, falling asleep, and access to medication. Important facilitators reported by patients in developed nation settings included having a sense of self-worth, seeing positive effects of antiretrovirals, accepting their seropositivity, understanding the need for strict adherence, making use of reminder tools, and having a simple regimen. Among 37 separate meta-analyses examining the generalizability of these findings, we found large heterogeneity.
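The abstract reports pooled prevalence estimates and "large heterogeneity" but does not name the estimator. One standard choice for pooling survey proportions is the DerSimonian-Laird random-effects model on the logit scale, with heterogeneity summarized by I²; the sketch below assumes that method, and the study counts in the usage line are invented for illustration.

```python
import math


def pool_proportions(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    Assumes 0 < events < totals in every study (otherwise apply a continuity
    correction first). Returns (pooled proportion, I^2 as a percentage).
    """
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]  # logit(p)
    v = [1 / e + 1 / (n - e) for e, n in zip(events, totals)]    # variance of logit(p)
    w = [1 / vi for vi in v]                                      # fixed-effect weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))     # Cochran's Q
    df = len(y) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0               # between-study variance
    w_star = [1 / (vi + tau2) for vi in v]                        # random-effects weights
    y_random = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return 1 / (1 + math.exp(-y_random)), i2


# Hypothetical surveys: patients reporting one barrier, out of those surveyed.
pooled, i2 = pool_proportions([30, 45, 12], [100, 150, 80])
print(f"pooled prevalence {pooled:.2f}, I^2 {i2:.0f}%")
```

I² near 0% means the surveys agree beyond chance variation; values well above 50% signal the kind of heterogeneity the review reports, in which case a single pooled figure should be read cautiously.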
We found that important barriers to adherence are consistent across multiple settings and countries. Research is urgently needed to determine patient-important factors for adherence in developing world settings. Clinicians should use this information to engage in open discussion with patients to promote adherence and identify barriers and facilitators within their own populations.
An analysis of qualitative and quantitative studies found consistent barriers to adherence to HIV therapy across multiple settings and countries, ranging from access to medication to problems with complicated regimens.
Editors' Summary
The World Health Organization has estimated that in 2005, about 38 million people worldwide were living with HIV/AIDS; the mortality caused by HIV/AIDS is very high. Antiretroviral drugs are effective at controlling the disease and extending life span. However, people must follow the drug regimens exactly in order to keep levels of HIV low, prevent it from becoming resistant to drugs, and stop the illness from progressing. Yet many people find it very difficult to take antiretroviral drugs precisely as they should. There is already some evidence from research studies on the reasons why this is the case. These studies take two different research approaches: “qualitative” methods, which try to find out about attitudes and behaviors using focus groups, interviews, or other techniques; and “quantitative” methods, which try to find out about people's opinions and experience using surveys with set questions for the participants to answer, and then count the different responses.
Why Was This Study Done?
The investigators wanted to put together all of the available evidence from published research studies (called doing a “systematic review”) on which factors affected people's adherence to antiretroviral drugs. They wanted to do a systematic review because it is thought to be a very rigorous way of appraising all the available evidence (although there is considerable debate about the value of using such a method to analyze the results of qualitative research).
What Did the Researchers Do and Find?
The study team searched biomedical literature databases as well as conference abstracts and research registries using a defined set of search queries. They screened all the scientific papers they found; those reporting results of original research into factors affecting antiretroviral adherence were then analyzed in more detail. In total, 84 relevant studies were identified, of which 37 used “qualitative” methods (focus groups, interviews, open-ended questioning) and 47 used “quantitative” methods (surveys). Most of these studies had been carried out in the developed world. The researchers then extracted the factors affecting adherence from the original studies, which could be either “positive” factors (helping adherence) or “negative” ones (making adherence more difficult). They classified the factors into four key themes: “patient related” (e.g., seeing positive results, fear of disclosure, being depressed); “beliefs about medication” (e.g., faith in how well the drugs worked, side effects); “daily schedules” (e.g., using reminder tools, disruptions to routine); and “interpersonal relationships” (e.g., trusting relations with health-care provider; social isolation).
Many barriers to adherence were common to both developed and developing settings. Some factors were unique to the studies conducted in the developing world, such as financial constraints and problems with traveling to get access to treatment. Fear of disclosure was an important barrier identified in many of the studies.
What Do These Findings Mean?
The researchers combined the results of many different studies and identified factors that help or obstruct adherence to antiretroviral treatment. Because some of these factors were common to the different settings, greater weight can be placed on them. Only 12 of the studies included in this research were from the developing world, where the majority of HIV/AIDS patients live; hence more work is needed to examine and address the factors influencing antiretroviral adherence in these parts of the world. This study provides researchers and health policy makers with a starting point for changes that might help to ensure greater adherence to antiretroviral treatment.
Additional Information.
Please access these Web sites via the online version of this summary at
Medline Plus information on AIDS medicines (Medline Plus is a service of the US National Library of Medicine and the National Institutes of Health)
Joint United Nations Programme on HIV/AIDS has information about the state of the HIV/AIDS epidemic worldwide
The World Health Organization has an HIV/AIDS program site providing comprehensive information on the HIV/AIDS epidemic worldwide
The World Health Organization pages on antiretroviral therapy
PMCID: PMC1637123  PMID: 17121449
12.  Using Wireless Handheld Computers to Seek Information at the Point of Care: An Evaluation by Clinicians 
To evaluate: (1) the effectiveness of wireless handheld computers for online information retrieval in clinical settings; (2) the role of MEDLINE® in answering clinical questions raised at the point of care.
A prospective single-cohort study: accompanying medical teams on teaching rounds, five internal medicine residents used and evaluated MD on Tap, an application for handheld computers, to seek answers in real time to clinical questions arising at the point of care.
All transactions were stored by an intermediate server. Evaluators recorded clinical scenarios and questions, identified MEDLINE citations that answered the questions, and submitted daily and summative reports of their experience. A senior medical librarian corroborated the relevance of the selected citation to each scenario and question.
Evaluators answered 68% of 363 background and foreground clinical questions during rounding sessions using a variety of MD on Tap features in an average session length of less than four minutes. The evaluator, the number and quality of query terms, the total number of citations found for a query, and the use of auto-spellcheck significantly contributed to the probability of query success.
Handheld computers with Internet access are useful tools for healthcare providers to access MEDLINE in real time. MEDLINE citations can answer specific clinical questions when several medical terms are used to form a query. The MD on Tap application is an effective interface to MEDLINE in clinical settings, allowing clinicians to quickly find relevant citations.
PMCID: PMC2213482  PMID: 17712085
13.  Information from Searching Content with an Ontology-Utilizing Toolkit (iSCOUT) 
Journal of Digital Imaging  2012;25(4):512-519.
Radiology reports are permanent legal documents that serve as the official interpretation of imaging tests. Manual analysis of textual information contained in these reports requires significant time and effort. This study describes the development and initial evaluation of a toolkit that enables automated identification of relevant information from within these largely unstructured text reports. We developed and made publicly available a natural language processing toolkit, Information from Searching Content with an Ontology-Utilizing Toolkit (iSCOUT). Core functions are included in the following modules: the Data Loader, Header Extractor, Terminology Interface, Reviewer, and Analyzer. The toolkit enables search for specific terms and retrieval of (radiology) reports containing exact term matches as well as similar or synonymous term matches within the text of the report. The Terminology Interface is the main component of the toolkit. It allows query expansion based on synonyms from a controlled terminology (e.g., RadLex or National Cancer Institute Thesaurus [NCIT]). We evaluated iSCOUT document retrieval of radiology reports that contained liver cysts, and compared precision and recall with and without using NCIT synonyms for query expansion. iSCOUT retrieved radiology reports with documented liver cysts with a precision of 0.92 and recall of 0.96, utilizing NCIT. This recall (i.e., utilizing the Terminology Interface) is significantly better than the recall obtained with either of the two search terms alone (0.72, p = 0.03 for liver cyst and 0.52, p = 0.0002 for hepatic cyst). iSCOUT reliably assembled relevant radiology reports for a cohort of patients with liver cysts with significant improvement in document retrieval when utilizing controlled lexicons.
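The effect of synonym-based query expansion on recall can be illustrated with a toy example. The synonym table, report texts, and relevance judgments below are hypothetical, and matching is a plain case-insensitive substring test; it does not handle negation ("no liver cyst" would still match) or model iSCOUT's Reviewer and Analyzer modules.

```python
# Hypothetical synonym table; a real system would draw these from a
# controlled terminology such as RadLex or NCIT.
SYNONYMS = {"liver cyst": ["hepatic cyst"]}


def expand(term, synonyms):
    """The query term plus its synonyms, lower-cased."""
    term = term.lower()
    return {term} | {s.lower() for s in synonyms.get(term, [])}


def search(reports, terms):
    """IDs of reports whose text contains any term (plain substring match)."""
    return {rid for rid, text in reports.items() if any(t in text.lower() for t in terms)}


def precision_recall(retrieved, relevant):
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


reports = {
    1: "Simple liver cyst in segment 7.",
    2: "Small hepatic cyst, unchanged.",
    3: "No focal hepatic lesion.",
    4: "Hepatic cysts are again noted.",
}
relevant = {1, 2, 4}  # reports judged to document a liver cyst

print(precision_recall(search(reports, {"liver cyst"}), relevant))                # (1.0, 0.3333333333333333)
print(precision_recall(search(reports, expand("liver cyst", SYNONYMS)), relevant))  # (1.0, 1.0)
```

Adding synonyms raises recall without hurting precision here; on real reports, expansion can also pull in false positives, which is why the study measures both.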
PMCID: PMC3389089  PMID: 22349993
Controlled vocabulary; Natural language processing; Information storage and retrieval
14.  Complementary use of the SciSearch database for improved biomedical information searching. 
The use of at least two complementary online biomedical databases is generally considered critical for biomedical scientists seeking to keep fully abreast of recent research developments as well as to retrieve the highest number of relevant citations possible. Although the National Library of Medicine's MEDLINE is usually the database of choice, this paper illustrates the benefits of using another database, the Institute for Scientific Information's SciSearch, when conducting a biomedical information search. When a simple query about red wine consumption and coronary artery disease was posed simultaneously in both MEDLINE and SciSearch, a greater number of relevant citations were retrieved through SciSearch. This paper also provides suggestions for carrying out a comprehensive biomedical literature search in a rapid and efficient manner by using SciSearch in conjunction with MEDLINE.
PMCID: PMC226327  PMID: 9549014
15.  Use of CD-ROM MEDLINE by Medical Students of the College of Medicine, University of Lagos, Nigeria 
Use of information technology in information acquisition, especially MEDLINE on CD-ROM and online, has been evaluated in several localities and regions, especially in the advanced countries. Use of MEDLINE on CD-ROM is still very poor among the medical students of the University of Lagos, Lagos, Nigeria, due to lack of awareness, insufficient personal computers, nonperiodic training, and the high cost of using the facility. Due to financial constraints, MEDLINE online and sufficiently networked computer systems are not available.
To report on the situation in Nigeria, a developing country, so as to compare the current awareness of searching MEDLINE on CD-ROM among the medical students at the University of Lagos with the awareness of their overseas counterparts. This is the first step toward setting up an online PubMed search as well as expanding the computer systems and network.
Essentially based on cross-sectional proportional sampling using structured questionnaires, in-depth interviews, and focus-group discussions among the medical students and library staff. The study involved the medical students in their second year to sixth (final) year of study.
Of the 250 students interviewed, 130 (52%) were aware of MEDLINE on CD-ROM searches as a means of information retrieval. Only 60 (24%) had used MEDLINE on CD-ROM — 2% had used MEDLINE on CD-ROM more than 9 times; 4%, 7 to 9 times; 8%, 4 to 6 times; and 10%, 1 to 3 times. Of the students who used MEDLINE on CD-ROM search, 22% used it in preparing for examinations, 24% in research, 6% in patient care, and 26% in preparation of assignments and clinical cases. Lack of awareness (52%) and cost of undertaking MEDLINE on CD-ROM search (46%) were identified as important factors that discouraged the use of MEDLINE on CD-ROM.
Though the above factors were recognized as important, it was concluded that the reasons for the poor use of MEDLINE on CD-ROM are multifactorial. Poor use of MEDLINE on CD-ROM could be attributed to these critical underlying factors: nonavailability of networked personal computers, which should be connected to a central server; lack of mandatory assignments to the medical students that would specifically require use of MEDLINE on CD-ROM; financial constraints on the university management; and infrequent periodic orientation on use of MEDLINE on CD-ROM. It was therefore suggested that the number of personal computers should be increased and that the library staff should periodically train the preclinical and clinical medical students in searching MEDLINE on CD-ROM. These steps would enable the medical students to benefit from online PubMed searching when it becomes fully operational in the future.
PMCID: PMC1550553  PMID: 12746212
MEDLINE use; students, medical; MEDLINE search; MEDLINE assignment; MEDLINE service; Nigeria; information retrieval; medical library; libraries, medical
16.  MED38/465: Innovative Medical Education in an Integrated Framework of Case-Based Learning and Web-Based Training 
With increasing network bandwidth and computing power, the Internet will become more and more important in education. Web-Based Training (WBT) systems support flexible learning because they are platform-independent, adaptive, and easy to install, update, and administer. Using standards such as HTML and Java, WBT systems can access the many resources that the Internet offers worldwide. As part of the Virtual University approach VIROR, the laboratory for Computer-Based Training in medicine at the University of Heidelberg is developing the Web-Based Training system CAMPUS, which integrates medical cases with systematic knowledge from many different sources.
The main components of CAMPUS are the authoring system, the presentation component, and a repository for systematic medical knowledge. A medical author uses the authoring tool to edit medical cases. The presentation tool can be used both as a case simulator and for case retrieval. During case simulation, the student can obtain further information about specific situations within a case: from any case situation, he or she can jump to integrated encyclopaedias, digital libraries, and databases (such as Medline or Cochrane) to gather the knowledge needed to solve the current case problem. To bridge medical cases and systematic knowledge, CAMPUS uses MeSH as a medium for semantic interoperability. Because the author assigns MeSH codes to particular case objects, the presentation component can jump directly to the relevant topic in the systematic knowledge base or encyclopaedia. The system is based on a seven-tier architecture and is implemented entirely in Java as a client-server system that communicates via Java RMI. The central knowledge base will be implemented as an XML repository and enriched with resources from the Internet. Medical cases and the simulation logic are stored in a relational database.
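The MeSH bridge between case objects and systematic knowledge can be illustrated with a small lookup structure. This is a minimal sketch, not the CAMPUS implementation (which is written in Java over RMI and an XML repository); the class names, the `jump_targets` helper, and the `MESH-A`-style codes are hypothetical placeholders rather than real MeSH identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class CaseObject:
    """One element of a medical case, tagged by the author with MeSH codes."""
    label: str
    mesh_codes: list[str] = field(default_factory=list)


class KnowledgeBase:
    """Hypothetical systematic-knowledge store indexed by MeSH code."""

    def __init__(self) -> None:
        self._topics: dict[str, list[str]] = {}

    def add_topic(self, mesh_code: str, topic: str) -> None:
        self._topics.setdefault(mesh_code, []).append(topic)

    def topics_for(self, mesh_code: str) -> list[str]:
        return list(self._topics.get(mesh_code, []))


def jump_targets(obj: CaseObject, kb: KnowledgeBase) -> list[str]:
    """Collect every knowledge-base topic reachable from a case object."""
    targets: list[str] = []
    for code in obj.mesh_codes:
        for topic in kb.topics_for(code):
            if topic not in targets:  # preserve order, skip duplicates
                targets.append(topic)
    return targets
```

Because the link is made through the shared code rather than through hard-wired references, new encyclopaedia or library entries become reachable from existing cases as soon as they are indexed under the same code.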
At present, prototypes of the CAMPUS authoring-system and presentation component are available and the first microbiology and pediatric cases are implemented in the system. In the future, primary efforts will be made in acquiring new medical cases as well as systematic knowledge.
CAMPUS has a high degree of adaptiveness and presents a patient in a realistic way using the advantages of Web-based Training. It is being developed in close co-operation with instructional psychologists and medical faculty members who want to integrate the system in the curriculum of medicine at the University of Heidelberg.
PMCID: PMC1761743
Java; CBT; WBT; Case Simulation; Internet
17.  Development and Validation of Filters for the Retrieval of Studies of Clinical Examination From Medline 
Efficiently finding clinical examination studies—studies that quantify the value of symptoms and signs in the diagnosis of disease—is becoming increasingly difficult. Filters developed to retrieve studies of diagnosis from Medline lack specificity because they also retrieve large numbers of studies on the diagnostic value of imaging and laboratory tests.
The objective was to develop filters for retrieving clinical examination studies from Medline.
We developed filters in a training database and validated them in a testing database. We created the training database by hand searching 161 journals (n = 52,636 studies). We evaluated the recall and precision of 65 candidate single-term filters in identifying studies in the training database that reported the sensitivity and specificity of symptoms or signs. To identify the best combinations of these search terms, we used recursive partitioning. The best-performing filters in the training database, as well as 13 previously developed filters, were evaluated in a testing database (n = 431,120 studies). We also examined how reviewing the reference lists of included articles affected recall.
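Recall and precision of a filter can be computed directly from the set of studies it retrieves and the hand-searched set of relevant studies. The sketch below assumes studies are identified by string IDs; the helper names are hypothetical, and it shows only simple Boolean OR/AND combinations of terms, not the recursive partitioning (decision-tree) procedure the authors used to choose combinations.

```python
def evaluate_filter(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (recall, precision) for one search filter.

    retrieved: IDs of the studies the filter returns
    relevant:  IDs of the studies identified as relevant by hand searching
    """
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def combine_or(*filters: set[str]) -> set[str]:
    """A study is retrieved if ANY of the term filters retrieves it."""
    return set().union(*filters)


def combine_and(*filters: set[str]) -> set[str]:
    """A study is retrieved only if ALL of the term filters retrieve it."""
    return set.intersection(*filters) if filters else set()
```

OR-ing terms can only raise recall and typically lowers precision, while AND-ing does the reverse; this trade-off is why a filter with very high recall can still have precision well below 1% in a large database.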
In the training database, the single-term filters with the highest recall (95%) and the highest precision (8.4%) were diagnosis[subheading] and “medical history taking”[MeSH], respectively. The multiple-term filter developed using recursive partitioning (the RP filter) had a recall of 100% and a precision of 89% in the training database. In the testing database, the Haynes-2004-Sensitive filter (recall 98%, precision 0.13%) and the RP filter (recall 89%, precision 0.52%) showed the best performance. The recall of these two filters increased to 99% and 94% respectively with review of the reference lists of the included articles.
Recursive partitioning appears to be a useful method of developing search filters. The empirical search filters proposed here can assist in the retrieval of clinical examination studies from Medline; however, because of the low precision of the search strategies, retrieving relevant studies remains challenging. Improving precision may require systematic changes in the tagging of articles by the National Library of Medicine.
PMCID: PMC3222198  PMID: 22011384
Medline; filter; hedge; clinical examination; recursive partitioning
18.  A decade of experience in the development and implementation of tissue banking informatics tools for intra and inter-institutional translational research 
Tissue banking informatics deals with standardized annotation, collection and storage of biospecimens that can further be shared by researchers. Over the last decade, the Department of Biomedical Informatics (DBMI) at the University of Pittsburgh has developed various tissue banking informatics tools to expedite translational medicine research. In this review, we describe the technical approach and capabilities of these models.
Clinical annotation of biospecimens requires data retrieval from various clinical information systems and the de-identification of the data by an honest broker. Based upon these requirements, DBMI, with its collaborators, has developed both Oracle-based organ-specific data marts and a more generic, model-driven architecture for biorepositories. The organ-specific models are developed utilizing Oracle server tools and software applications and the model-driven architecture is implemented in a J2EE framework.
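The honest-broker step can be pictured as stripping direct identifiers and replacing them with a stable pseudonym that only the broker can regenerate. This is a minimal sketch under stated assumptions: the field names in `IDENTIFIER_FIELDS` and the keyed-hash (HMAC-SHA256) pseudonym scheme are illustrative choices, not the DBMI design, and real clinical de-identification must also handle dates, free-text reports, and other quasi-identifiers that this sketch ignores.

```python
import hashlib
import hmac

# Hypothetical direct identifiers removed before data leave the broker
IDENTIFIER_FIELDS = {"name", "mrn", "address", "phone"}


def pseudonym(mrn: str, secret_key: bytes) -> str:
    """Keyed hash of the medical record number; only the broker holds the key."""
    return hmac.new(secret_key, mrn.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def deidentify(record: dict[str, str], secret_key: bytes) -> dict[str, str]:
    """Return a copy of a clinical record with identifiers replaced by a study ID."""
    clean = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
    clean["study_id"] = pseudonym(record["mrn"], secret_key)
    return clean
```

Using a keyed hash rather than a plain hash means the same patient always maps to the same study ID (so annotations from several clinical systems can be linked), while researchers without the key cannot recompute the mapping from a known record number.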
The organ-specific biorepositories implemented by DBMI include the Cooperative Prostate Cancer Tissue Resource, the Pennsylvania Cancer Alliance Bioinformatics Consortium, the EDRN Colorectal and Pancreatic Neoplasm Database, and the Specialized Programs of Research Excellence (SPORE) Head and Neck Neoplasm Database. The model-based architecture is represented by the National Mesothelioma Virtual Bank. These biorepositories provide researchers with thousands of well-annotated biospecimens that are searchable through query interfaces available via the Internet.
These systems, developed and supported by our institute, serve to form a common platform for cancer research to accelerate progress in clinical and translational research. In addition, they provide a tangible infrastructure and resource for exposing research resources and biospecimen services in collaboration with the clinical anatomic pathology laboratory information system (APLIS) and the cancer registry information systems.
PMCID: PMC2941965  PMID: 20922029
Tissue banking informatics; information models for translational research
19.  Dynamic summarization of bibliographic-based data 
Traditional information retrieval techniques typically return excessive output when directed at large bibliographic databases. Natural Language Processing applications strive to extract salient content from the excessive data. Semantic MEDLINE, a National Library of Medicine (NLM) natural language processing application, highlights relevant information in PubMed data. However, Semantic MEDLINE implements manually coded schemas, accommodating few information needs. Currently, there are only five such schemas, while many more would be needed to realistically accommodate all potential users. The aim of this project was to develop and evaluate a statistical algorithm that automatically identifies relevant bibliographic data; the new algorithm could be incorporated into a dynamic schema to accommodate various information needs in Semantic MEDLINE, and eliminate the need for multiple schemas.
We developed a flexible algorithm named Combo that combines three statistical metrics, the Kullback-Leibler Divergence (KLD), Riloff's RlogF metric (RlogF), and a new metric called PredScal, to automatically identify salient data in bibliographic text. We downloaded the citations retrieved by a PubMed search query addressing the genetic etiology of bladder cancer. The citations were processed with SemRep, an NLM rule-based application that produces semantic predications. The SemRep output was then processed by Combo, by the standard Semantic MEDLINE genetics schema, and independently by the KLD and RlogF metrics on their own. We evaluated each summarization method using an existing reference standard within the task-based context of genetic database curation.
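Two of Combo's ingredients can be sketched as per-item saliency scores over predication counts. This is a hedged illustration: it scores each item by its contribution p(x)·log(p(x)/q(x)) to the KL divergence between a topic-specific and a background distribution, and uses RlogF in its classic form from Riloff's pattern-extraction work (log2(F)·F/N). How the paper adapts these metrics to SemRep predications, and how PredScal and the combination step work, are not described here, so they are not reproduced; the `eps` floor for unseen background items is an assumption.

```python
import math
from collections import Counter


def kld_scores(target: Counter, background: Counter, eps: float = 1e-9) -> dict[str, float]:
    """Per-item contribution p(x) * ln(p(x) / q(x)) to the KL divergence."""
    t_total = sum(target.values())
    b_total = sum(background.values())
    scores = {}
    for item, count in target.items():
        p = count / t_total
        q = background.get(item, 0) / b_total if b_total else 0.0
        scores[item] = p * math.log(p / max(q, eps))  # eps avoids division by zero
    return scores


def rlogf_scores(relevant: Counter, overall: Counter) -> dict[str, float]:
    """Classic RlogF: log2(F) * F / N, F = count in relevant text, N = overall count."""
    scores = {}
    for item, f in relevant.items():
        n = overall[item]
        scores[item] = math.log2(f) * f / n if n else 0.0
    return scores


def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Items with the highest saliency scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Note that RlogF gives zero weight to anything seen only once (log2(1) = 0), which favours predications that recur across many citations.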
Combo asserted 74 genetic entities implicated in bladder cancer development, whereas the traditional schema asserted 10 genetic entities; the KLD and RlogF metrics individually asserted 77 and 69 genetic entities, respectively. Combo achieved 61% recall and 81% precision, with an F-score of 0.69. The traditional schema achieved 23% recall and 100% precision, with an F-score of 0.37. The KLD metric achieved 61% recall, 70% precision, with an F-score of 0.65. The RlogF metric achieved 61% recall, 72% precision, with an F-score of 0.66.
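The reported F-scores can be checked against the reported recall and precision, assuming the balanced F-score (the harmonic mean of the two, often written F1). The `reported` table simply restates the figures from the paragraph above.

```python
def f_score(recall: float, precision: float) -> float:
    """Balanced F-score: harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall, precision, reported F-score) from the results above
reported = {
    "Combo": (0.61, 0.81, 0.69),
    "Schema": (0.23, 1.00, 0.37),
    "KLD": (0.61, 0.70, 0.65),
    "RlogF": (0.61, 0.72, 0.66),
}

for name, (r, p, f) in reported.items():
    print(f"{name}: recomputed {f_score(r, p):.3f}, reported {f}")
```

Recomputing from the rounded percentages reproduces each reported value to within 0.01 (Combo comes out at about 0.70), which is consistent with the summary statistics being the balanced F-score.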
Semantic MEDLINE summarization using the new Combo algorithm outperformed a conventional summarization schema in a genetic database curation task. It could potentially streamline information acquisition for other needs without requiring multiple hand-built saliency schemas.
PMCID: PMC3042900  PMID: 21284871
20.  A Study on Pubmed Search Tag Usage Pattern: Association Rule Mining of a Full-day Pubmed Query Log 
The practice of evidence-based medicine requires efficient searching of biomedical literature databases such as PubMed/MEDLINE. Retrieval performance depends heavily on the effective use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand how end users employ search tags in PubMed/MEDLINE searches.
A PubMed query log file containing anonymous user identification, timestamps, and query text was obtained from the National Library of Medicine. Inconsistent records were removed from the dataset, and the search tags were extracted from the query texts. A total of 2,917,159 queries issued by 613,061 users were selected for this study. The frequent co-occurrences and usage patterns of the search tags were analyzed using an association mining algorithm.
Search tag usage was low (11.38% of all queries), and only 2.95% of queries contained two or more tags. Three out of four users used no search tag, and about two-thirds of those users issued fewer than four queries. Among queries containing at least one tagged search term, the average number of search tags was almost half the total number of search terms. Navigational search tags were used more frequently than informational search tags. Although no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches.
The low percentage of search tag usage implies that PubMed/MEDLINE users either do not make wide use of PubMed/MEDLINE features, are not aware of them, or rely solely on the recall-oriented query translation performed by PubMed's Automatic Term Mapping. Users need further education and interactive search applications to use search tags effectively and so fulfill their biomedical information needs from PubMed/MEDLINE.
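Tag extraction and pairwise association rules can be sketched in a few lines. This is a simplified illustration, not the study's pipeline: it assumes tags appear in square brackets (as in `smith j[au]`), normalizes them to lowercase, and computes support and confidence only for two-tag rules, which is the first level of an Apriori-style association mining algorithm rather than a full implementation.

```python
import re
from collections import Counter
from itertools import combinations

TAG_PATTERN = re.compile(r"\[([^\]]+)\]")


def extract_tags(query: str) -> frozenset[str]:
    """Return the set of bracketed field tags (e.g. 'au', 'mh') in a query."""
    return frozenset(t.strip().lower() for t in TAG_PATTERN.findall(query))


def pair_rules(queries: list[str]) -> dict[tuple[str, str], tuple[float, float]]:
    """Map each rule (a -> b) to (support, confidence) over all queries."""
    tag_sets = [extract_tags(q) for q in queries]
    n = len(tag_sets)
    single = Counter(t for s in tag_sets for t in s)
    pairs = Counter(p for s in tag_sets for p in combinations(sorted(s), 2))
    rules = {}
    for (a, b), count in pairs.items():
        support = count / n                            # share of queries with both tags
        rules[(a, b)] = (support, count / single[a])   # P(b | a)
        rules[(b, a)] = (support, count / single[b])   # P(a | b)
    return rules
```

Support is computed over all queries, including untagged ones, so with tag usage as low as reported above most rules will have small support; confidence is the more informative figure for judging how strongly two tags travel together.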
PMCID: PMC3552776  PMID: 23302604
21.  SPIRS: A Web-based Image Retrieval System for Large Biomedical Databases 
With the increasing use of images in disease research, education, and clinical medicine, the need for methods that effectively archive, query, and retrieve these images by their content is underscored. This paper describes the implementation of a Web-based retrieval system called SPIRS (Spine Pathology & Image Retrieval System), which permits exploration of a large biomedical database of digitized spine x-ray images and data from a national health survey using a combination of visual and textual queries.
SPIRS is a generalizable framework that consists of four components: a client applet, a gateway, an indexing and retrieval system, and a database of images and associated text data. The prototype system is demonstrated using text and imaging data collected as part of the second U.S. National Health and Nutrition Examination Survey (NHANES II). Users search the image data by providing a sketch of the vertebral outline or selecting an example vertebral image and some relevant text parameters. Pertinent pathology on the image/sketch can be annotated and weighted to indicate importance.
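Query-by-sketch with weighted regions can be illustrated as a nearest-neighbour ranking under a weighted distance. This sketch makes strong assumptions that the abstract does not state: each vertebral outline is already segmented, aligned, and resampled to the same number of corresponding points, and a user-assigned weight per point stands in for the annotated pathology. It is not the SPIRS indexing or retrieval algorithm, and it ignores the text parameters.

```python
import math

Point = tuple[float, float]


def weighted_shape_distance(query: list[Point], candidate: list[Point],
                            weights: list[float]) -> float:
    """Weighted Euclidean distance between two outlines with corresponding
    points; larger weights emphasise the regions the user annotated."""
    if not (len(query) == len(candidate) == len(weights)):
        raise ValueError("outlines and weights must have equal length")
    total = sum(w * ((qx - cx) ** 2 + (qy - cy) ** 2)
                for (qx, qy), (cx, cy), w in zip(query, candidate, weights))
    return math.sqrt(total)


def rank_images(query: list[Point], database: dict[str, list[Point]],
                weights: list[float], k: int = 5) -> list[str]:
    """Return the IDs of the k database outlines closest to the query."""
    ranked = sorted(database.items(),
                    key=lambda item: weighted_shape_distance(query, item[1], weights))
    return [image_id for image_id, _ in ranked[:k]]
```

In such a scheme, relevance feedback could be modelled by adjusting the weights after the user marks returned images as relevant or not; that extension is left out here.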
During the course of development, we explored different algorithms to perform functions such as segmentation, indexing, and retrieval. Each algorithm was tested individually and then implemented as part of SPIRS. To evaluate the overall system, we first tested the system’s ability to return similar vertebral shapes from the database given a query shape. Initial evaluations using visual queries only (no text) have shown that the system achieves up to 68% accuracy in finding images in the database that exhibit similar abnormality type and severity. Relevance feedback mechanisms have been shown to increase accuracy by an additional 22% after three iterations. While we primarily demonstrate this system in the context of retrieving vertebral shape, our framework has also been adapted to search a collection of 100,000 uterine cervix images to study the progression of cervical cancer.
SPIRS is automated, easily accessible, and can be integrated with other complementary information retrieval systems. The system allows users to query large amounts of imaging data intuitively by providing visual examples and text keywords, and it has beneficial implications for research, education, and patient care.
PMCID: PMC2693318  PMID: 18996737
Medical informatics applications; Information storage and retrieval; Content-based image retrieval; Visual access methods; Web-based systems
22.  An ontology-based comparative anatomy information system 
This paper describes the design, implementation, and potential use of a comparative anatomy information system (CAIS) for querying similarities and differences between homologous anatomical structures across species, as well as the knowledge base it operates on, the method it uses to determine the answers to queries, and the user interface it employs to present the results. The relevant informatics contributions of our work include (1) the development and application of the structural difference method, a formalism for symbolically representing anatomical similarities and differences across species; (2) the design of the structure of a mapping between the anatomical models of two different species and its application to information about specific structures in humans, mice, and rats; and (3) the design of the internal syntax and semantics of the query language. These contributions provide the foundation for the development of a working system that allows users to submit queries about the similarities and differences between mouse, rat, and human anatomy; delivers result sets that describe those similarities and differences in symbolic terms; and serves as a prototype for the extension of the knowledge base to any number of species. Additionally, we expanded the domain knowledge by identifying medically relevant structural questions for the human, the mouse, and the rat, and made an initial foray into the validation of the application and its content by means of user questionnaires, software testing, and other feedback.
The anatomical structures of the species to be compared, as well as the mappings between species, are modeled on templates from the Foundational Model of Anatomy knowledge base, and compared using graph-matching techniques. A graphical user interface allows users to issue queries that retrieve information concerning similarities and differences between structures in the species being examined. Queries from diverse information sources, including domain experts, peer-reviewed articles, and reference books, have been used to test the system and to illustrate its potential use in comparative anatomy studies.
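The idea of comparing two species' models through a mapping can be illustrated with a deliberately simplified, set-based comparison. This sketch is not the CAIS structural difference method or its graph-matching technique: it assumes each model is a set of (part, relation, whole) triples, renames one species' structures into the other's vocabulary through a homology mapping, and reports shared and species-specific relationships. The lung-lobe fragment is a toy example (the mouse left lung has a single lobe, the human left lung two).

```python
Triple = tuple[str, str, str]  # (part, relation, whole)


def translate(triples: set[Triple], mapping: dict[str, str]) -> set[Triple]:
    """Rename structures using a homology mapping; unmapped names are kept."""
    return {(mapping.get(p, p), rel, mapping.get(w, w)) for p, rel, w in triples}


def structural_difference(model_a: set[Triple], model_b: set[Triple],
                          mapping_b_to_a: dict[str, str]) -> dict[str, set[Triple]]:
    """Compare two species' models after expressing B in A's vocabulary."""
    b_as_a = translate(model_b, mapping_b_to_a)
    return {
        "shared": model_a & b_as_a,
        "only_a": model_a - b_as_a,
        "only_b": b_as_a - model_a,
    }


human = {
    ("superior lobe of left lung", "part_of", "left lung"),
    ("inferior lobe of left lung", "part_of", "left lung"),
    ("left lung", "part_of", "lungs"),
}
mouse = {("left lung (mouse)", "part_of", "lungs (mouse)")}
mouse_to_human = {"left lung (mouse)": "left lung", "lungs (mouse)": "lungs"}

diff = structural_difference(human, mouse, mouse_to_human)
```

Real comparative anatomy needs more than exact triple matching, for example partial homologies where one structure corresponds to several, which is exactly the refinement the evaluation below points to.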
A total of 157 test queries were submitted to CAIS, and all of them were answered correctly. The interface was evaluated for clarity and ease of use. This testing showed that the application works well and is fairly intuitive to use, but users wanted clearer explanations of the meaning of the different types of possible queries. Some of the interface issues will naturally be resolved as we refine our conceptual model to handle partial and complex homologies in the content.
The CAIS system and its associated methods are expected to be useful to biologists and translational medicine researchers. Possible applications range from supporting theoretical work in clarifying and modeling ontogenetic, physiological, pathological, and evolutionary transformations, to concrete techniques for improving the analysis of genotype–phenotype relationships among various animal models in support of a wide array of clinical and scientific initiatives.
PMCID: PMC3055271  PMID: 21146377
Ontology; Anatomy; Comparative anatomy; Knowledge base; Protégé; Homology; Graph matching; Graph similarity; Isomorphism; Foundational Model of Anatomy
23.  Study of Query Expansion Techniques and Their Application in the Biomedical Information Retrieval 
The Scientific World Journal  2014;2014:132158.
Information retrieval focuses on finding documents whose content matches a user query within a large document collection. Because formulating well-designed queries is difficult for most users, query expansion is needed to retrieve relevant information. Query expansion techniques are widely applied to improve the efficiency of textual information retrieval systems. These techniques help overcome vocabulary mismatch by expanding the original query with additional relevant terms and reweighting the terms in the expanded query. In this paper, different text preprocessing and query expansion approaches are combined to improve the set of documents initially retrieved by a query in a scientific document database. A MEDLINE corpus called Cystic Fibrosis is used as a knowledge source. Experimental results show that the proposed combinations of techniques greatly enhance the efficiency obtained by traditional queries.
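A common way to expand a query is pseudo-relevance feedback: treat the top-ranked documents as relevant and add their most frequent new terms to the query. The sketch below shows that one technique with minimal preprocessing (lowercasing, tokenizing, and a tiny illustrative stopword list); it is an assumption for illustration, not the specific combination of preprocessing and expansion methods evaluated in the paper, and it only appends terms without the reweighting step mentioned above.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one
STOPWORDS = {"the", "of", "and", "in", "a", "to", "with", "for", "is", "on"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def expand_query(query: str, top_docs: list[str], n_terms: int = 3) -> list[str]:
    """Pseudo-relevance feedback: append the most frequent new terms
    from the top-ranked documents to the original query terms."""
    original = tokenize(query)
    counts = Counter(t for doc in top_docs for t in tokenize(doc))
    for t in original:
        counts.pop(t, None)  # only add terms not already in the query
    extra = [t for t, _ in counts.most_common(n_terms)]
    return original + extra
```

For example, if the top documents for "cystic fibrosis" repeatedly mention chloride, `expand_query("cystic fibrosis", docs, n_terms=1)` returns the original terms followed by `chloride`, letting the expanded query match relevant documents that never use the original phrase.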
PMCID: PMC3958669  PMID: 24723793
24.  Federated ontology-based queries over cancer data 
BMC Bioinformatics  2012;13(Suppl 1):S9.
Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. This is particularly relevant in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. However, the vast amounts of data, coupled with the use of different terms (semantic heterogeneity) in each discipline, make the retrieval and integration of information difficult.
Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality operates at the structural metadata level, without exploiting the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. A graphical user interface has been developed, supporting ontology-based queries over caGrid data sources. An extensive evaluation of the query reformulation technique is included.
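How semantic annotations can reveal join conditions is easiest to see with a toy model. In this minimal sketch, each attribute of each data-source class carries one ontology concept; a requested concept is located by scanning the annotations, and two sources are joinable on attributes that share a concept. The source names, class names, and `CONCEPT_*` codes are hypothetical, and the sketch does not use caGrid's actual metadata model or query language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedAttribute:
    """An attribute of a data-source class plus the concept annotating it."""
    source: str      # hypothetical data service name
    class_name: str
    attribute: str
    concept: str     # hypothetical ontology concept code

    def path(self) -> str:
        return f"{self.source}.{self.class_name}.{self.attribute}"


def locate(concept: str, attributes: list[AnnotatedAttribute]) -> list[str]:
    """Reformulation step: which source attributes hold a requested concept?"""
    return [a.path() for a in attributes if a.concept == concept]


def join_conditions(attributes: list[AnnotatedAttribute],
                    source_a: str, source_b: str) -> list[str]:
    """Propose equality joins between attributes of two sources that are
    annotated with the same concept."""
    conditions = []
    for left in attributes:
        if left.source != source_a:
            continue
        for right in attributes:
            if right.source == source_b and right.concept == left.concept:
                conditions.append(f"{left.path()} = {right.path()}")
    return conditions
```

The point of the design is that the user names concepts, not tables: the structural names (`symbol` versus `geneSymbol`) can differ freely between sources as long as both attributes are annotated with the same concept.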
To support personalised medicine in oncology, it is crucial to retrieve and integrate molecular, pathology, radiology and clinical data in an efficient manner. The semantic heterogeneity of the data makes this a challenging task. Ontologies provide a formal framework to support querying and integration. This paper provides an ontology-based solution for querying distributed databases over service-oriented, model-driven infrastructures.
PMCID: PMC3471355  PMID: 22373043
25.  Integrating the UMLS into VNS Retriever. 
We are developing a networked resource for the National Library of Medicine's Unified Medical Language System. We call this resource the UMLS Retriever, which is an instance of our VNS Retriever architecture. Our prototype user interface makes use of the Virtual Notebook System Browser. The development of a networked UMLS service will offer numerous advantages to our user community.
PMCID: PMC2248017  PMID: 1482881
