Results 1-10 (10)

1.  Building a gold standard to construct search filters: a case study with biomarkers for oral cancer 
To support clinical researchers, librarians and informationists may need search filters for particular tasks. Development of filters typically depends on a “gold standard” dataset. This paper describes generalizable methods for creating a gold standard to support future filter development and evaluation using oral squamous cell carcinoma (OSCC) as a case study. OSCC is the most common malignancy affecting the oral cavity. Investigation of biomarkers with potential prognostic utility is an active area of research in OSCC. The methods discussed here should be useful for designing quality search filters in similar domains.
The authors searched MEDLINE for prognostic studies of OSCC, developed annotation guidelines for screeners, ran three calibration trials before annotating the remaining body of citations, and measured inter-annotator agreement (IAA).
We retrieved 1,818 citations. After calibration, we screened the remaining citations (n = 1,767; 97.2%); IAA was substantial (kappa = 0.76). The dataset has 497 (27.3%) citations representing OSCC studies of potential prognostic biomarkers.
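The agreement statistic reported above can be illustrated with a minimal sketch. It assumes the kappa in question is Cohen's kappa for two annotators, which the abstract does not state; the function name and the toy labels are hypothetical and are not the study's data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening decisions for ten citations (illustration only).
annotator_1 = ["include", "include", "exclude", "exclude", "include",
               "exclude", "exclude", "exclude", "include", "exclude"]
annotator_2 = ["include", "exclude", "exclude", "exclude", "include",
               "exclude", "include", "exclude", "include", "exclude"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

In this toy case the annotators agree on 8 of 10 items, but half of that agreement is expected by chance, so kappa is well below the raw agreement rate.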
The gold standard dataset is likely to be high quality and useful for future development and evaluation of filters for OSCC studies of potential prognostic biomarkers.
The methodology we used is generalizable to other domains requiring a reference standard to evaluate the performance of search filters. A gold standard is essential because the labels regarding relevance enable computation of diagnostic metrics, such as sensitivity and specificity. Librarians and informationists with data analysis skills could contribute to developing gold standard datasets and subsequent filters tuned for their patrons' domains of interest.
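As a rough sketch of how relevance labels in a gold standard enable these metrics, the following computes sensitivity and specificity for a search filter. The function name and the toy data are hypothetical; the abstract does not describe an implementation.

```python
def sensitivity_specificity(gold, retrieved):
    """Return (sensitivity, specificity) of a filter against gold labels.

    gold[i] is True when citation i is relevant in the gold standard;
    retrieved[i] is True when the filter retrieved citation i.
    """
    if len(gold) != len(retrieved):
        raise ValueError("gold and retrieved must have the same length")
    tp = sum(g and r for g, r in zip(gold, retrieved))
    fn = sum(g and not r for g, r in zip(gold, retrieved))
    tn = sum(not g and not r for g, r in zip(gold, retrieved))
    fp = sum(not g and r for g, r in zip(gold, retrieved))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one relevant and one irrelevant item")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 4 relevant and 6 irrelevant citations.
gold = [True, True, True, True, False, False, False, False, False, False]
retrieved = [True, True, True, False, True, False, False, False, False, False]
sens, spec = sensitivity_specificity(gold, retrieved)
print(sens, spec)
```

Here the filter finds 3 of the 4 relevant citations and correctly rejects 5 of the 6 irrelevant ones.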
PMCID: PMC4279929  PMID: 25552941
2.  Mentors and tormentors on the road to informatics 
PMCID: PMC3988775  PMID: 24860259
3.  Feature Engineering and a Proposed Decision-Support System for Systematic Reviewers of Medical Evidence 
PLoS ONE  2014;9(1):e86277.
Evidence-based medicine depends on the timely synthesis of research findings. An important source of synthesized evidence resides in systematic reviews. However, a bottleneck in review production involves dual screening of citations with titles and abstracts to find eligible studies. For this research, we tested the effect of various kinds of textual information (features) on performance of a machine learning classifier. Based on our findings, we propose an automated system to reduce screening burden, as well as offer quality assurance.
We built a database of citations from 5 systematic reviews that varied with respect to domain, topic, and sponsor. Consensus judgments regarding eligibility were inferred from published reports. We extracted 5 feature sets from citations: alphabetic, alphanumeric+, indexing, features mapped to concepts in systematic reviews, and topic models. To simulate a two-person team, we divided the data into random halves. We optimized the parameters of a Bayesian classifier, then trained and tested models on alternate data halves. Overall, we conducted 50 independent tests.
All tests of summary performance (mean F3) surpassed the corresponding baseline, P<0.0001. The ranks for mean F3, precision, and classification error were statistically different across feature sets averaged over reviews; P-values for Friedman's test were .045, .002, and .002, respectively. Differences in ranks for mean recall were not statistically significant. Alphanumeric+ features were associated with best performance; mean reduction in screening burden for this feature type ranged from 88% to 98% for the second pass through citations and from 38% to 48% overall.
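The abstract does not define F3. Assuming it is the standard F-beta measure with beta = 3, which weights recall more heavily than precision, a minimal sketch looks like this; the function name and the input values are hypothetical.

```python
def f_beta(precision, recall, beta=3.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # no true positives at all
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical classifier with modest precision but high recall.
f3 = f_beta(precision=0.5, recall=0.9)
f1 = f_beta(precision=0.5, recall=0.9, beta=1.0)
print(round(f3, 3), round(f1, 3))
```

With these inputs F3 is noticeably higher than F1, which reflects the priority that citation screening places on not missing eligible studies.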
A computer-assisted, decision support system based on our methods could substantially reduce the burden of screening citations for systematic review teams and solo reviewers. Additionally, such a system could deliver quality assurance both by confirming concordant decisions and by naming studies associated with discordant decisions for further consideration.
PMCID: PMC3903545  PMID: 24475099
4.  Are dentists interested in the oral-systemic disease connection? A qualitative study of an online community of 450 practitioners 
BMC Oral Health  2013;13:65.
Dentists in the US see an increasing number of patients with systemic conditions. These patients are challenging to care for when the relationship between oral and systemic disease is not well understood. The prevalence of professional isolation exacerbates the problem due to the difficulty in finding expert advice or peer support. This study aims to identify whether dentists discuss the oral-systemic connection and what aspects they discuss; to understand their perceptions of and attitudes toward the connection; and to determine what information they need to treat patients with systemic conditions.
We retrieved 14,576 messages posted to the Internet Dental Forum from April 2008 to May 2009. Using natural language processing and human classification, we identified substantive phrases and keywords and used them to retrieve 141 messages on the oral-systemic connection. We then conducted coding and thematic analysis to identify recurring themes on the topic.
Dentists discuss a variety of topics on oral diseases and systemic health, with the association between periodontal and systemic diseases, the effect of dental materials or procedures on general health, and the impact of oral-systemic connection on practice behaviors as the leading topics. They also disseminate and share research findings on oral and systemic health with colleagues online. However, dentists are very cautious about the nature of the oral-systemic linkage that may not be causal. Nonetheless, they embrace the positive association as a motivating point for patients in practice. When treating patients with systemic conditions, dentists enquire about the cause of less common dental diseases potentially in relation to medical conditions in one-third of the cases and in half of the cases seek clinical guidelines and evidence-based interventions on treating dental diseases with established association with systemic conditions.
Dentists’ unmet information needs call for more research into the association between less studied dental conditions and systemic diseases, and more actionable clinical guidelines for well-researched disease connections. To improve dissemination and foster behavioral change, it is imperative to understand what information clinicians need and in which situations. Leveraging peer influence via social media could be a useful strategy to achieve the goal.
PMCID: PMC3924341  PMID: 24261423
Oral health; Systemic diseases; Evidence-based dentistry; Information needs; Social media
5.  Screening nonrandomized studies for medical systematic reviews: a comparative study of classifiers 
To investigate whether (1) machine learning classifiers can help identify nonrandomized studies eligible for full-text screening by systematic reviewers; (2) classifier performance varies with optimization; and (3) the number of citations to screen can be reduced.
We used an open-source, data-mining suite to process and classify biomedical citations that point to mostly nonrandomized studies from 2 systematic reviews. We built training and test sets for citation portions and compared classifier performance by considering the value of indexing, various feature sets, and optimization. We conducted our experiments in 2 phases. The design of phase I with no optimization was: 4 classifiers × 3 feature sets × 3 citation portions. Classifiers included k-nearest neighbor, naïve Bayes, complement naïve Bayes, and evolutionary support vector machine. Feature sets included bag of words, and 2- and 3-term n-grams. Citation portions included titles, titles and abstracts, and full citations with metadata. Phase II with optimization involved a subset of the classifiers, as well as features extracted from full citations, and full citations with overweighted titles. We optimized features and classifier parameters by manually setting information gain thresholds outside of a process for iterative grid optimization with 10-fold cross-validations. We independently tested models on data reserved for that purpose and statistically compared classifier performance on 2 types of feature sets. We estimated the number of citations needed to screen by reviewers during a second pass through a reduced set of citations.
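The feature sets named above (bag of words, and 2- and 3-term n-grams) can be sketched as follows. This is an illustration only: the lowercase alphanumeric tokenizer and the function names are assumptions, and the study used a data-mining suite whose preprocessing is not described here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(text, n):
    """Count contiguous n-term sequences; n=1 gives a bag of words."""
    tokens = tokenize(text)
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Example input: the title of this citation.
title = "Screening nonrandomized studies for medical systematic reviews"
bag_of_words = ngram_counts(title, 1)
bigrams = ngram_counts(title, 2)
trigrams = ngram_counts(title, 3)
print(sorted(bigrams))
```

Each counter can then serve as a sparse feature vector for one citation portion (title, title and abstract, or full citation).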
In phase I, the evolutionary support vector machine returned the best recall for bag of words extracted from full citations; the best classifier with respect to overall performance was k-nearest neighbor. No classifier attained good enough recall for this task without optimization. In phase II, we boosted performance with optimization for evolutionary support vector machine and complement naïve Bayes classifiers. Generalization performance was better for the latter in the independent tests. For evolutionary support vector machine and complement naïve Bayes classifiers, the initial retrieval set was reduced by 46% and 35%, respectively.
Machine learning classifiers can help identify nonrandomized studies eligible for full-text screening by systematic reviewers. Optimization can markedly improve performance of classifiers. However, generalizability varies with the classifier. The number of citations to screen during a second independent pass through the citations can be substantially reduced.
PMCID: PMC3393813  PMID: 22677493
medical informatics; clinical research informatics; text mining; document classification; systematic reviews
6.  Comparative effectiveness research designs: an analysis of terms and coverage in Medical Subject Headings (MeSH) and Emtree 
We analyzed the extent to which comparative effectiveness research (CER) organizations share terms for designs, analyzed coverage of CER designs in Medical Subject Headings (MeSH) and Emtree, and explored whether scientists use CER design terms.
We developed local terminologies (LTs) and a CER design terminology by extracting terms in documents from five organizations. We defined coverage as the distribution over match type in MeSH and Emtree. We created a crosswalk by recording terms to which design terms mapped in both controlled vocabularies. We analyzed the hits for queries restricted to titles and abstracts to explore scientists' language.
Pairwise LT overlap ranged from 22.64% (12/53) to 75.61% (31/41). The CER design terminology (n = 78 terms) consisted of terms for primary study designs and a few terms useful for evaluating evidence, such as opinion paper and systematic review. Patterns of coverage were similar in MeSH and Emtree (gamma = 0.581, P = 0.002).
Stakeholder terminologies vary, and terms are inconsistently covered in MeSH and Emtree. The CER design terminology and crosswalk may be useful for expert searchers. For partially mapped terms, queries could consist of free text for modifiers such as nonrandomized or interrupted added to broad or related controlled terms.
PMCID: PMC3634392  PMID: 23646024
7.  Barriers to implementing evidence-based clinical guidelines: A survey of early adopters 
The purpose of this study is to identify barriers that early-adopting dentists perceive as common and challenging when implementing recommendations from evidence-based (EB) clinical guidelines.
This is a cross-sectional study. Dentists who attended the 2008 Evidence-based Dentistry Champion Conference were eligible for inclusion. Forty-three dentists (34%) responded to a 22-item questionnaire administered online. Two investigators independently coded and categorized responses to open-ended items. Descriptive statistics were computed to assess the frequency of barriers and perceived challenges.
The most common barriers to implementation are difficulty in changing current practice model, resistance and criticism from colleagues, and lack of trust in evidence or research. Barriers perceived as serious problems have to do with lack of up-to-date evidence, lack of clear answers to clinical questions, and contradictory information in the scientific literature.
Knowledge of barriers will help improve translation of biomedical research for dentists. Information in guidelines needs to be current, clear, and simplified for use at chairside; dentists’ fears need to be addressed.
PMCID: PMC3011934  PMID: 21093800
Dentistry; Evidence-Based Dentistry; Guidelines as Topic; Dental Informatics
8.  Using Natural Language Processing to Enable In-depth Analysis of Clinical Messages Posted to an Internet Mailing List: A Feasibility Study 
An Internet mailing list may be characterized as a virtual community of practice that serves as an information hub with easy access to expert advice and opportunities for social networking. We are interested in mining messages posted to a list for dental practitioners to identify clinical topics. Once we understand the topical domain, we can study dentists’ real information needs and the nature of their shared expertise, and can avoid delivering useless content at the point of care in future informatics applications. However, a necessary first step involves developing procedures to identify messages that are worth studying given our resources for planned, labor-intensive research.
The primary objective of this study was to develop a workflow for finding a manageable number of clinically relevant messages from a much larger corpus of messages posted to an Internet mailing list, and to demonstrate the potential usefulness of our procedures for investigators by retrieving a set of messages tailored to the research question of a qualitative research team.
We mined 14,576 messages posted to an Internet mailing list from April 2008 to May 2009. The list has about 450 subscribers, mostly dentists from North America interested in clinical practice. After extensive preprocessing, we used the Natural Language Toolkit to identify clinical phrases and keywords in the messages. Two academic dentists classified collocated phrases in an iterative, consensus-based process to describe the topics discussed by dental practitioners who subscribe to the list. We then consulted with qualitative researchers regarding their research question to develop a plan for targeted retrieval. We used selected phrases and keywords as search strings to identify clinically relevant messages and delivered the messages in a reusable database.
About half of the subscribers (245/450, 54.4%) posted messages. Natural language processing (NLP) yielded 279,193 clinically relevant tokens or processed words (19% of all tokens). Of these, 2.02% (5634 unique tokens) represent the vocabulary for dental practitioners. Based on pointwise mutual information score and clinical relevance, 325 collocated phrases (eg, fistula filled obturation and herpes zoster) with 108 keywords (eg, mercury) were classified into 13 broad categories with subcategories. In the demonstration, we identified 305 relevant messages (2.1% of all messages) over 10 selected categories with instances of collocated phrases, and 299 messages (2.1%) with instances of phrases or keywords for the category systemic disease.
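Scoring collocated phrases by pointwise mutual information (PMI) can be sketched in plain Python. The study used the Natural Language Toolkit, whose API is not reproduced here; the function name, the simple maximum-likelihood probability estimates, and the toy tokens are assumptions for illustration.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Map each adjacent word pair to log2(p(x, y) / (p(x) * p(y)))."""
    if len(tokens) < 2:
        return {}
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    scores = {}
    for (first, second), count in bigrams.items():
        p_pair = count / n_bigrams
        p_first = unigrams[first] / n_unigrams
        p_second = unigrams[second] / n_unigrams
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


# Hypothetical preprocessed tokens (not taken from the mailing list).
tokens = "herpes zoster pain patient pain herpes zoster patient pain".split()
scores = bigram_pmi(tokens)
best_pair = max(scores, key=scores.get)
print(best_pair, round(scores[best_pair], 3))
```

Pairs whose words co-occur more often than their individual frequencies predict receive the highest scores, which is why a phrase such as herpes zoster ranks first in this toy corpus. In practice a minimum frequency cutoff is usually applied, because PMI overrates rare pairs.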
A workflow with a sequence of machine-based steps and human classification of NLP-discovered phrases can support researchers who need to identify relevant messages in a much larger corpus. Discovered phrases and keywords are useful search strings to aid targeted retrieval. We demonstrate the potential value of our procedures for qualitative researchers by retrieving a manageable set of messages concerning systemic and oral disease.
PMCID: PMC3236668  PMID: 22112583
Dentistry; dental informatics; clinical research informatics; natural language processing; information storage and retrieval; electronic mail; information-seeking behavior
10.  Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy 
Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining. In recent years, conceptual biology has emerged as a complement to empirical biology. This is partly in response to the availability of massive digital resources such as the network of databases for molecular biologists at the National Center for Biotechnology Information. Developments in text mining and hypothesis discovery systems based on the early work of Swanson, a mathematician and information scientist, are coincident with the emergence of conceptual biology. Very little has been written to introduce biomedical digital librarians to these new trends. In this paper, background for data and text mining, as well as for knowledge discovery in databases (KDD) and in text (KDT) is presented, then a brief review of Swanson's ideas, followed by a discussion of recent approaches to hypothesis discovery and testing. 'Testing' in the context of text mining involves partially automated methods for finding evidence in the literature to support hypothetical relationships. Concluding remarks follow regarding (a) the limits of current strategies for evaluation of hypothesis discovery systems and (b) the role of literature-based discovery in concert with empirical research. Report of an informatics-driven literature review for biomarkers of systemic lupus erythematosus is mentioned. Swanson's vision of the hidden value in the literature of science and, by extension, in biomedical digital databases, is still remarkably generative for information scientists, biologists, and physicians.
PMCID: PMC1459187  PMID: 16584552
