26.  Exploring the relationship between patients’ information preference style and knowledge acquisition process in a computerized patient decision aid randomized controlled trial 
Background
We have shown in a randomized controlled trial that a computerized patient decision aid (P-DA) improves medical knowledge and reduces decisional conflict in early-stage papillary thyroid cancer patients considering adjuvant radioactive iodine treatment. Our objectives were to examine the relationship between participants’ baseline information preference style and the following: 1) quantity of detailed information obtained within the P-DA, and 2) medical knowledge.
Methods
We randomized participants to exposure to a one-time viewing of a computerized P-DA (with usual care) or usual care alone. In pre-planned secondary analyses, we examined the relationship between information preference style (Miller Behavioural Style Scale, including respective monitoring [information seeking preference] and blunting [information avoidance preference] subscale scores) and the following: 1) the quantity of detailed information obtained from the P-DA (number of supplemental information clicks), and 2) medical knowledge. Spearman correlation values were calculated to quantify relationships, in the entire study population and respective study arms.
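As an illustration of the Spearman correlation mentioned above, the following is a minimal sketch using only the Python standard library. It ranks both variables (averaging tied ranks) and takes the Pearson correlation of the ranks. The sample values for `monitoring` and `clicks` are hypothetical and are not data from the trial.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical example: monitoring subscale scores and supplemental clicks.
monitoring = [4, 7, 9, 11, 12, 15]
clicks = [0, 2, 1, 5, 5, 8]
rho = spearman(monitoring, clicks)
```

A statistics package would normally also supply the p-value; this sketch covers only the coefficient.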
Results
In the 37 P-DA users, high monitoring information preference was moderately positively correlated with higher frequency of detailed information acquisition in the P-DA (r = 0.414, p = 0.011). The monitoring subscale score weakly correlated with increased medical knowledge in the entire study population (r = 0.268, p = 0.021, N = 74), but not in the respective study arms. There were no significant associations with the blunting subscale score.
Conclusions
Individual variability in information preferences may affect the process of information acquisition from computerized P-DAs. More research is needed to understand how individual information preferences may impact medical knowledge acquisition and decision-making.
Electronic supplementary material
The online version of this article (doi:10.1186/s12911-015-0168-0) contains supplementary material, which is available to authorized users.
doi:10.1186/s12911-015-0168-0
PMCID: PMC4474358  PMID: 26088605
Cancer; Patient decision aid; Behaviour; Health information; Decision making; Consumer health information; Information seeking behaviors
27.  Fuzzy association rule mining and classification for the prediction of malaria in South Korea 
Background
Malaria is the world’s most prevalent vector-borne disease. Accurate prediction of malaria outbreaks may lead to public health interventions that mitigate disease morbidity and mortality.
Methods
We describe an application of a method for creating prediction models utilizing Fuzzy Association Rule Mining to extract relationships between epidemiological, meteorological, climatic, and socio-economic data from Korea. These relationships are in the form of rules, from which the best set of rules is automatically chosen and forms a classifier. Two classifiers have been built and their results fused to become a malaria prediction model. Future malaria cases are predicted as LOW, MEDIUM or HIGH, where these classes are defined as a total of 0–2, 3–16, and above 17 cases, respectively, for a region in South Korea during a two-week period. Based on user recommendations, HIGH is considered an outbreak.
Results
Model accuracy is described by Positive Predictive Value (PPV), Sensitivity, and F-score for each class, computed on test data not previously used to develop the model. For predictions made 7–8 weeks in advance, model PPV and Sensitivity are 0.842 and 0.681, respectively, for the HIGH classes. The F0.5 and F3 scores (which combine PPV and Sensitivity) are 0.804 and 0.694, respectively, for the HIGH classes. The overall FARM results (as measured by F-scores) are significantly better than those obtained by Decision Tree, Random Forest, Support Vector Machine, and Holt-Winters methods for the HIGH class. For the MEDIUM class, Random Forest and FARM obtain comparable results, with FARM being better at F0.5, and Random Forest obtaining a higher F3.
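The F0.5 and F3 scores quoted above can be checked against the reported PPV and Sensitivity with the standard F-beta formula, F = (1 + β²)·PPV·Sensitivity / (β²·PPV + Sensitivity). The short sketch below assumes that this standard definition is the one the authors used; the function name `f_beta` is illustrative.

```python
def f_beta(ppv, sensitivity, beta):
    """Standard F-beta score; beta > 1 favours sensitivity, beta < 1 favours PPV."""
    b2 = beta ** 2
    return (1 + b2) * ppv * sensitivity / (b2 * ppv + sensitivity)

# Values reported for the HIGH class, 7-8 weeks in advance.
ppv, sensitivity = 0.842, 0.681

f_half = f_beta(ppv, sensitivity, 0.5)
f_three = f_beta(ppv, sensitivity, 3)
print(round(f_half, 3), round(f_three, 3))  # prints: 0.804 0.694
```

Both values agree with the reported figures to three decimals, which is consistent with the standard definition.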
Conclusions
A previously described method for creating disease prediction models has been modified and extended to build models for predicting malaria. In addition, some new input variables were used, including indicators of intervention measures. The South Korea malaria prediction models predict LOW, MEDIUM or HIGH cases 7–8 weeks in the future. This paper demonstrates that our data driven approach can be used for the prediction of different diseases.
doi:10.1186/s12911-015-0170-6
PMCID: PMC4472166  PMID: 26084541
Malaria; Prediction; Association rule mining; Fuzzy logic; Classification; Environmental data; Socio-economic data; Epidemiological data
28.  Using a mobile health application to support self-management in chronic obstructive pulmonary disease: a six-month cohort study 
Background
Self-management strategies have the potential to support patients with chronic obstructive pulmonary disease (COPD). Telehealth interventions may have a role in delivering this support along with the opportunity to monitor symptoms and physiological variables. This paper reports findings from a six-month, clinical, cohort study of COPD patients’ use of a mobile telehealth based (mHealth) application and how individually determined alerts in oxygen saturation levels, pulse rate and symptoms scores related to patient self-initiated treatment for exacerbations.
Methods
The development of the mHealth intervention involved a patient focus group and multidisciplinary team of researchers, engineers and clinicians. Individual data thresholds to set alerts were determined, and the relationship to exacerbations, defined by the initiation of stand-by medications, was measured. The sample comprised 18 patients (age range of 50–85 years) with varied levels of computer skills.
Results
Patients identified no difficulties in using the mHealth application and used all functions available. 40 % of exacerbations had an alert signal during the three days prior to a patient starting medication. Patients were able to use the mHealth application to support self-management, including monitoring of clinical data. Within three months, 95 % of symptom reporting sessions were completed in less than 100 s.
Conclusions
Home based, unassisted, daily use of the mHealth platform is feasible and acceptable to people with COPD for reporting daily symptoms and medicine use, and to measure physiological variables such as pulse rate and oxygen saturation. These findings provide evidence for integrating telehealth interventions with clinical care pathways to support self-management in COPD.
doi:10.1186/s12911-015-0171-5
PMCID: PMC4472616  PMID: 26084626
Chronic obstructive pulmonary disease; Chronic condition; Self-management; Telehealth; Digital health; E-health; Mobile health; Alerts
30.  Detection of sentence boundaries and abbreviations in clinical narratives 
Background
In Western languages the period character is highly ambiguous, due to its double role as sentence delimiter and abbreviation marker. This is particularly relevant in clinical free-texts characterized by numerous anomalies in spelling, punctuation, vocabulary and with a high frequency of short forms.
Methods
The problem is addressed by two binary classifiers for abbreviation and sentence detection. A support vector machine exploiting a linear kernel is trained on different combinations of feature sets for each classification task. Feature relevance ranking is applied to investigate which features are important for the particular task. The methods are applied to German language texts from a medical record system, authored by specialized physicians.
Results
Two collections of 3,024 text snippets were annotated regarding the role of period characters for training and testing. Cohen's kappa was 0.98. For abbreviation and sentence boundary detection we report an unweighted micro-averaged F-measure of 0.97 for the training set, using 10-fold cross validation. For test set based evaluation we obtained an unweighted micro-averaged F-measure of 0.95 for abbreviation detection and 0.94 for sentence delineation. Language-dependent resources and rules were found to have less impact on abbreviation detection than on sentence delineation.
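Cohen's kappa, as reported above, corrects the observed agreement between two annotators for the agreement expected by chance. The sketch below shows the standard computation; the label names and the two short annotation lists are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of four period characters.
annotator_1 = ["abbreviation", "abbreviation", "sentence_end", "sentence_end"]
annotator_2 = ["abbreviation", "sentence_end", "sentence_end", "sentence_end"]
kappa = cohens_kappa(annotator_1, annotator_2)  # observed 0.75, expected 0.5
```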
Conclusions
Sentence detection is an important task, which should be performed at the beginning of a text processing pipeline. For the text genre under scrutiny we showed that support vector machines exploiting a linear kernel produce state of the art results for sentence boundary detection. The results are comparable with other sentence boundary detection methods applied to English clinical texts. We identified abbreviation detection as a supportive task for sentence delineation.
doi:10.1186/1472-6947-15-S2-S4
PMCID: PMC4474545  PMID: 26099994
clinical narrative; natural language processing; machine learning
31.  SNOMED CT in a language isolate: an algorithm for a semiautomatic translation 
Background
The Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) is officially released in English and Spanish. In the Basque Autonomous Community two languages, Spanish and Basque, are official. This paper presents the first attempt to semi-automatically translate the SNOMED CT terminology content to Basque, a less-resourced language.
Methods
A translation algorithm based on Natural Language Processing methods has been designed and partially implemented. The algorithm comprises four phases, of which the first two have been implemented and quantitatively evaluated.
Results
Results are promising as we obtained the equivalents in Basque of 21.41% of the disorder terms of the English SNOMED CT release. As the methods developed are focused on that hierarchy, the results in other hierarchies are lower (12.57% for body structure descriptions, 8.80% for findings and 3% for procedures).
Conclusions
We are on the way to reaching two of our objectives in translating SNOMED CT to Basque: to use our language to access rich multilingual resources and to strengthen the use of the Basque language in the biomedical area.
doi:10.1186/1472-6947-15-S2-S5
PMCID: PMC4474582  PMID: 26100112
SNOMED CT translation; Basque Language Isolate; Natural Language Processing; Finite State Transducers
32.  Exploring Spanish health social media for detecting drug effects 
Background
Adverse drug reactions (ADRs) cause a high number of deaths among hospitalized patients in developed countries. Major drug agencies have shown great interest in the early detection of ADRs due to their high incidence and increasing health care costs. Reporting systems are available for both healthcare professionals and patients to report possible ADRs. However, several studies have shown that these adverse events are underestimated. Our hypothesis is that health social networks could be a significant information source for the early detection of ADRs as well as of new drug indications.
Methods
In this work we present a system for detecting drug effects (which include both adverse drug reactions as well as drug indications) from user posts extracted from a Spanish health forum. Texts were processed using MeaningCloud, a multilingual text analysis engine, to identify drugs and effects. In addition, we developed the first Spanish database storing drugs as well as their effects automatically built from drug package inserts gathered from online websites. We then applied a distant-supervision method using the database on a collection of 84,000 messages in order to extract the relations between drugs and their effects. To classify the relation instances, we used a kernel method based only on shallow linguistic information of the sentences.
Results
Regarding Relation Extraction of drugs and their effects, the distant supervision approach achieved a recall of 0.59 and a precision of 0.48.
Conclusions
The task of extracting relations between drugs and their effects from social media is a complex challenge due to the characteristics of social media texts. These texts, typically posts or tweets, usually contain many grammatical errors and spelling mistakes. Moreover, patients use lay terminology to refer to diseases, symptoms and indications that is not usually included in lexical resources in languages other than English.
doi:10.1186/1472-6947-15-S2-S6
PMCID: PMC4474583  PMID: 26100267
33.  Care episode retrieval: distributional semantic models for information retrieval in the clinical domain 
Patients' health related information is stored in electronic health records (EHRs) by health service providers. These records include sequential documentation of care episodes in the form of clinical notes. EHRs are used throughout the health care sector by professionals, administrators and patients, primarily for clinical purposes, but also for secondary purposes such as decision support and research. The vast amounts of information in EHR systems complicate information management and increase the risk of information overload. Therefore, clinicians and researchers need new tools to manage the information stored in the EHRs. A common use case is, given a - possibly unfinished - care episode, to retrieve the most similar care episodes among the records. This paper presents several methods for information retrieval, focusing on care episode retrieval, based on textual similarity, where similarity is measured through domain-specific modelling of the distributional semantics of words. Models include variants of random indexing and the semantic neural network model word2vec. Two novel methods are introduced that utilize the ICD-10 codes attached to care episodes to better induce domain-specificity in the semantic model. We report on experimental evaluation of care episode retrieval that circumvents the lack of human judgements regarding episode relevance. Results suggest that several of the methods proposed outperform a state-of-the-art search engine (Lucene) on the retrieval task.
doi:10.1186/1472-6947-15-S2-S2
PMCID: PMC4474584  PMID: 26099735
34.  Using text mining techniques to extract phenotypic information from the PhenoCHF corpus 
Background
Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text.
Methods
To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013.
Results
Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set.
Conclusions
PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. Although the scope of our annotation is currently limited to a single disease, the promising results achieved can stimulate further work into the extraction of phenotypic information for other diseases. The PhenoCHF annotation guidelines and annotations are publicly available at https://code.google.com/p/phenochf-corpus.
doi:10.1186/1472-6947-15-S2-S3
PMCID: PMC4474585  PMID: 26099853
35.  Understanding older women’s decision making and coping in the context of breast cancer treatment 
Background
Primary endocrine therapy (PET) is a recognised alternative to surgery followed by endocrine therapy for a subset of older, frailer women with breast cancer. Choice of treatment is preference-sensitive and may require decision support. Older patients are often conceptualised as passive decision-makers. The present study used the Coping in Deliberation (CODE) framework to gain insight into decision making and coping processes in a group of older women who have faced breast cancer treatment decisions, and to inform the development of a decision support intervention (DSI).
Methods
Semi-structured interviews were carried out with older women who had been offered a choice of PET or surgery from five UK hospital clinics. Women’s information and support needs, their breast cancer diagnosis and treatment decisions were explored. A secondary analysis of these interviews was conducted using the CODE framework to examine women’s appraisals of health threat and coping throughout the deliberation process.
Results
Interviews with 35 women aged 75-98 years were analysed. Appraisals of breast cancer and treatment options were sometimes only partial, with most women forming a preference for treatment relatively quickly. However, a number of considerations which women made throughout the deliberation process were identified, including: past experiences of cancer and its treatment; scope for choice; risks, benefits and consequences of treatment; instincts about treatment choice; and healthcare professionals’ recommendations. Women also described various strategies to cope with breast cancer and their treatment decisions. These included seeking information, obtaining practical and emotional support from healthcare professionals, friends and relatives, and relying on personal faith. Based on these findings, key questions were identified that women may ask during deliberation.
Conclusions
Many older women with breast cancer may be considered involved rather than passive decision-makers, and may benefit from DSIs designed to support decision making and coping within and beyond the clinic setting.
doi:10.1186/s12911-015-0167-1
PMCID: PMC4461993  PMID: 26058557
Breast cancer treatment; Old age; Decision making; Coping; Deliberation
36.  Predicting postoperative complications of head and neck squamous cell carcinoma in elderly patients using random forest algorithm model 
Background
Head and Neck Squamous Cell Carcinoma (HNSCC) has a high incidence in elderly patients. Postoperative complications present great challenges in treatment, and early warning of them is difficult.
Methods
Data were collected from 525 patients diagnosed with HNSCC in our institution between 2006 and 2011, comprising a training set (n = 513) and an external testing set (n = 12). The variables covered general demographic characteristics, complications, disease and treatment given. Five data mining algorithms were first used to construct predictive models in the training set. Subsequently, cross-validation was used to compare the performance of these models, and the best data mining algorithm model was then selected to perform the prediction in the external testing set.
Results
Data from 513 patients (age > 60 y) with HNSCC in a training set was included while 44 variables were selected (P < 0.05). Five predictive models were constructed; the model with 44 variables based on the Random Forest algorithm demonstrated the best accuracy (89.084 %) and the best AUC value (0.949). In an external testing set, the accuracy (83.333 %) and the AUC value (0.781) were obtained by using the random forest algorithm model.
Conclusions
Data mining is a promising approach for predicting the probability of postoperative complications in elderly patients with HNSCC. Our results highlighted the potential of computational prediction of postoperative complications in elderly patients with HNSCC by using the random forest algorithm model.
doi:10.1186/s12911-015-0165-3
PMCID: PMC4459053  PMID: 26054335
Head and neck squamous cell carcinoma (HNSCC); Postoperative complications; Predictive model; Data mining (DM); Elderly patients
37.  PubMed-supported clinical term weighting approach for improving inter-patient similarity measure in diagnosis prediction 
Background
Similarity-based retrieval of Electronic Health Records (EHRs) from large clinical information systems provides physicians with supporting evidence when making diagnoses or referring suspected cases for examination. Clinical terms in EHRs represent high-level conceptual information, and the similarity measure established based on these terms reflects the chance of inter-patient disease co-occurrence. The assumption that clinical terms are equally relevant to a disease is unrealistic and reduces the prediction accuracy. Here we propose a term weighting approach supported by the PubMed search engine to address this issue.
Methods
We collected and studied 112 abdominal computed tomography imaging examination reports from four hospitals in Hong Kong. Clinical terms, which are the image findings related to hepatocellular carcinoma (HCC), were extracted from the reports. Through two systematic PubMed search methods, the generic and specific term weightings were established by estimating the conditional probabilities of clinical terms given HCC. Each report was characterized by an ontological feature vector, giving 6216 vector pairs in total. We optimized the modified direction cosine (mDC) with respect to a regularization constant embedded into the feature vector. Equal, generic and specific term weighting approaches were applied to measure the similarity of each pair, and their performances for predicting inter-patient co-occurrence of HCC diagnoses were compared using Receiver Operating Characteristics (ROC) analysis.
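The pair count above follows from choosing 2 reports out of 112. The sketch below checks that figure and illustrates how term weights can change a cosine-style similarity between two reports. It uses a plain weighted cosine, not the authors' modified direction cosine (mDC), whose exact form and regularization constant are not given here. The feature vectors and weights are hypothetical.

```python
from math import comb, sqrt

def weighted_cosine(u, v, weights):
    """Cosine similarity in which each term contributes according to its weight."""
    dot = sum(w * a * b for w, a, b in zip(weights, u, v))
    norm_u = sqrt(sum(w * a * a for w, a in zip(weights, u)))
    norm_v = sqrt(sum(w * b * b for w, b in zip(weights, v)))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a report with no weighted terms is similar to nothing
    return dot / (norm_u * norm_v)

# Number of unordered pairs among 112 reports.
n_pairs = comb(112, 2)  # 6216

# Hypothetical binary feature vectors: 1 = clinical term present in the report.
report_a = [1, 0, 1, 1]
report_b = [1, 1, 0, 1]

equal_weights = [1.0, 1.0, 1.0, 1.0]
specific_weights = [0.9, 0.1, 0.2, 0.7]  # hypothetical relevance of each term to HCC

sim_equal = weighted_cosine(report_a, report_b, equal_weights)
sim_specific = weighted_cosine(report_a, report_b, specific_weights)
```

In this toy case the two reports share the highly weighted terms, so the weighted similarity is higher than the equally weighted one.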
Results
The areas under the ROC curves (AUROCs) of similarity scores based on equal, generic and specific term weighting approaches were 0.735, 0.728 and 0.743 respectively (p < 0.01). In comparison with equal term weighting, the performance was significantly improved by specific term weighting (p < 0.01) but not by generic term weighting. The clinical terms “Dysplastic nodule”, “nodule of liver” and “equal density (isodense) lesion” were found to be the top three image findings associated with HCC in PubMed.
Conclusions
Our findings suggest that the optimized similarity measure with specific term weighting applied to EHRs can significantly improve the accuracy of predicting the inter-patient co-occurrence of diagnosis when compared with equal and generic term weighting approaches.
Electronic supplementary material
The online version of this article (doi:10.1186/s12911-015-0166-2) contains supplementary material, which is available to authorized users.
doi:10.1186/s12911-015-0166-2
PMCID: PMC4450834  PMID: 26032596
38.  Design and evaluation of a software for the objective and easy-to-read presentation of new drug properties to physicians 
Background
When new pharmaceutical products appear on the market, physicians need to know whether they are likely to be useful in their practices. Physicians currently obtain most of their information about the market release and properties of new drugs from pharmaceutical industry representatives. However, the official information contained in the summaries of product characteristics (SPCs) and evaluation reports from health agencies provides a more complete view of the potential value of new drugs, although these documents can be long and difficult to read. The main objective of this work was to design a prototype computer program to facilitate the objective appraisal of the potential value of a new pharmaceutical product by physicians. This prototype is based on the modeling of pharmaceutical innovations described in a previous paper.
Methods
The interface was designed to allow physicians to develop a rapid understanding of the value of a new drug for their practices. We selected five new pharmaceutical products, to illustrate the function of this prototype. We considered only the texts supplied by national or international drug agencies at the time of market release. The perceived usability of the prototype was evaluated qualitatively, except for the System Usability Scale (SUS) score evaluation, by 10 physicians differing in age and medical background.
Results
The display is based on the various axes of the conceptual model of pharmaceutical innovations. The user can select three levels of detail when consulting this information (highly synthetic, synthetic and detailed). Tables provide a comparison of the properties of the new pharmaceutical product with those of existing drugs, if available for the same indication, in terms of efficacy, safety and ease of use.
The interface was highly appreciated by evaluators, who found it easy to understand and suggested no other additions of important, internationally valid information. The mean System Usability Scale score for the 10 physicians was 82, corresponding to a “good” user interface.
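For readers unfamiliar with how a System Usability Scale score such as the one above is obtained, the sketch below applies the standard SUS scoring rule to one respondent's ten answers (each from 1 to 5). Odd-numbered items contribute the answer minus 1, even-numbered items contribute 5 minus the answer, and the sum is multiplied by 2.5 to give a score from 0 to 100. The example answers are hypothetical, and the study's own scoring procedure is assumed to follow this standard rule.

```python
def sus_score(responses):
    """Standard SUS score (0-100) from ten answers on a 1-5 scale."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs exactly ten answers, each between 1 and 5")
    total = 0
    for item_number, answer in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += answer - 1   # positively worded items
        else:
            total += 5 - answer   # negatively worded items
    return total * 2.5

# Hypothetical answers from one evaluator.
one_physician = [5, 2, 4, 1, 4, 2, 5, 1, 4, 2]
score = sus_score(one_physician)  # 85.0
```

A mean score for a group, such as the one reported for the 10 physicians, is the average of the individual scores.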
Conclusions
This work led us to propose the selection, grouping, and mode of presentation for various types of knowledge on pharmaceutical innovations in a way that was appreciated by evaluators. It provides physicians with readily accessible objective information about new drugs.
Electronic supplementary material
The online version of this article (doi:10.1186/s12911-015-0158-2) contains supplementary material, which is available to authorized users.
doi:10.1186/s12911-015-0158-2
PMCID: PMC4460682  PMID: 26025025
39.  Patient-reported outcomes in a large community-based pain medicine practice: evaluation for use in phenotype modeling 
Background
An academic, community medicine partnership was established to build a phenotype-to-outcome model targeting chronic pain. This model will be used to drive clinical decision support for pain medicine in the community setting. The first step in this effort is an examination of the electronic health records (EHR) from clinics that treat chronic pain. The biopsychosocial components provided by both patients and care providers must be of sufficient scope to populate the spectrum of patient types, treatment modalities, and possible outcomes.
Methods
The patient health records from a large Midwest pain medicine practice (Michigan Pain Consultants, PC) contain physician notes, administrative codes, and patient-reported outcomes (PRO) on over 30,000 patients during the study period spanning 2010 to mid-2014. The PRO consists of a regularly administered Pain Health Assessment (PHA), a biopsychosocial, demographic, and symptomology questionnaire containing 163 items, which is completed approximately every six months with a compliance rate of over 95 %. The biopsychosocial items (74 items with Likert scales of 0–10) were examined by exploratory factor analysis and descriptive statistics to determine the number of independent constructs available for phenotypes and outcomes. Pain outcomes were examined both in the aggregate and as the mean of longitudinal changes in each patient.
Results
Exploratory factor analysis of the intake PHA revealed 15 orthogonal factors representing pain levels; physical, social, and emotional functions; the effects of pain on these functions; vitality and health; and measures of outcomes and satisfaction. Seven items were independent of the factors, offering unique information. As an exemplar of outcomes from the follow-up PHAs, patients reported approximately 60 % relief in their pain. When examined in the aggregate, patients showed both a decrease in pain levels and an increase in coping skills with an increased number of visits. When examined individually, 80-85 % of patients presenting with the highest pain levels reported improvement by approximately two points on an 11-point pain scale.
Conclusions
We conclude that the data available in a community practice can be a rich source of biopsychosocial information relevant to the phenotypes of chronic pain. It is anticipated that phenotype linkages to best treatments and outcomes can be constructed from this set of records.
doi:10.1186/s12911-015-0164-4
PMCID: PMC4446111  PMID: 26017305
Chronic pain; Patient-reported outcomes; Factor analysis; Pain severity
40.  ODM2CDA and CDA2ODM: Tools to convert documentation forms between EDC and EHR systems 
Background
Clinical trials apply standards approved by regulatory agencies for Electronic Data Capture (EDC). Operational Data Model (ODM) from Clinical Data Interchange Standards Consortium (CDISC) is commonly used. Electronic Health Record (EHR) systems for patient care predominantly apply HL7 standards, specifically Clinical Document Architecture (CDA). In recent years more and more patient data is processed in electronic form.
Results
An open source reference implementation was designed and implemented to convert forms between ODM and CDA format. There are limitations of this conversion method due to different scope and design of ODM and CDA. Specifically, CDA has a multi-level hierarchical structure and CDA nodes can contain both XML values and XML attributes.
Conclusions
Automated transformation of ODM files to CDA and vice versa is technically feasible in principle.
doi:10.1186/s12911-015-0163-5
PMCID: PMC4494189  PMID: 26004011
EHR; EDC; CDA; ODM; Documentation form; Data model
41.  Electronic medical record-based multicondition models to predict the risk of 30 day readmission or death among adult medicine patients: validation and comparison to existing models 
Background
There is increasing interest in using prediction models to identify patients at risk of readmission or death after hospital discharge, but existing models have significant limitations. Electronic medical record (EMR) based models that can be used to predict risk on multiple disease conditions among a wide range of patient demographics early in the hospitalization are needed. The objective of this study was to evaluate the degree to which EMR-based risk models for 30-day readmission or mortality accurately identify high risk patients and to compare these models with published claims-based models.
Methods
Data were analyzed from all consecutive adult patients admitted to internal medicine services at 7 large hospitals belonging to 3 health systems in Dallas/Fort Worth between November 2009 and October 2010 and split randomly into derivation and validation cohorts. Performance of the model was evaluated against the Canadian LACE mortality or readmission model and the Centers for Medicare and Medicaid Services (CMS) Hospital Wide Readmission model.
Results
Among the 39,604 adults hospitalized for a broad range of medical reasons, 2.8 % of patients died, 12.7 % were readmitted, and 14.7 % were readmitted or died within 30 days after discharge. The electronic multicondition models for the composite outcome of 30-day mortality or readmission had good discrimination using data available within 24 h of admission (C statistic 0.69; 95 % CI, 0.68-0.70), or at discharge (0.71; 95 % CI, 0.70-0.72), and were significantly better than the LACE model (0.65; 95 % CI, 0.64-0.66; P =0.02) with significant NRI (0.16) and IDI (0.039, 95 % CI, 0.035-0.044). The electronic multicondition model for 30-day readmission alone had good discrimination using data available within 24 h of admission (C statistic 0.66; 95 % CI, 0.65-0.67) or at discharge (0.68; 95 % CI, 0.67-0.69), and performed significantly better than the CMS model (0.61; 95 % CI, 0.59-0.62; P < 0.01) with significant NRI (0.20) and IDI (0.037, 95 % CI, 0.033-0.041).
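The C statistic reported above measures discrimination: the probability that a randomly chosen patient who had the outcome received a higher predicted risk than a randomly chosen patient who did not. The sketch below computes it by direct pairwise comparison, counting ties as one half. The risks and outcomes are hypothetical, and the quadratic loop is meant for illustration rather than for cohorts of this study's size.

```python
def c_statistic(risks, outcomes):
    """Concordance (C) statistic for binary outcomes; ties count as 0.5."""
    events = [r for r, o in zip(risks, outcomes) if o == 1]
    non_events = [r for r, o in zip(risks, outcomes) if o == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for event_risk in events:
        for non_event_risk in non_events:
            if event_risk > non_event_risk:
                concordant += 1.0
            elif event_risk == non_event_risk:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

# Hypothetical predicted risks and 30-day outcomes (1 = readmitted or died).
predicted = [0.10, 0.40, 0.35, 0.80]
observed = [0, 0, 1, 1]
c = c_statistic(predicted, observed)  # 3 of 4 pairs concordant -> 0.75
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation.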
Conclusions
A new electronic multicondition model based on information derived from the EMR predicted mortality and readmission at 30 days, and was superior to previously published claims-based models.
Electronic supplementary material
The online version of this article (doi:10.1186/s12911-015-0162-6) contains supplementary material, which is available to authorized users.
doi:10.1186/s12911-015-0162-6
PMCID: PMC4474456  PMID: 25991003
Readmission; Predictive model; All-cause readmission; Electronic medical record
42.  Developing a hybrid dictionary-based bio-entity recognition technique 
Background
Bio-entity extraction is a pivotal component of information extraction from biomedical literature. Dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques.
Methods
This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves the recall rate through the shortest path edit distance algorithm. In addition, to further improve performance, the proposed technique adopts text mining techniques in the stage that merges similar entities, such as Part of Speech (POS) expansion, stemming, and the exploitation of contextual cues.
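To illustrate how an edit distance can raise the recall of a dictionary lookup, here is a minimal sketch. It uses the standard Levenshtein distance as a stand-in; the abstract does not specify the details of its "shortest path edit distance algorithm", so this is not the authors' implementation. The function names, the distance threshold, and the toy dictionary are assumptions.

```python
def levenshtein(a, b):
    """Standard Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lookup(mention, dictionary, max_distance=1):
    """Return (term, distance) for the closest dictionary term within
    max_distance edits of the mention, or None if no term is close enough."""
    mention = mention.lower()
    best = None
    for term in dictionary:
        d = levenshtein(mention, term.lower())
        if d <= max_distance and (best is None or d < best[1]):
            best = (term, d)
    return best


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = ["interleukin-2", "p53", "insulin"]
print(lookup("interleukin 2", toy_dictionary))
print(lookup("kinase", toy_dictionary))
```

An exact-match lookup would miss the spelling variant "interleukin 2"; allowing one edit recovers it, at the cost of possible false matches as the threshold grows.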
Results
The experimental results show that the proposed technique achieves the best, or at least equivalent, F-measure among the compared techniques: GENIA, MESH, UMLS, and combinations of these three resources.
Conclusions
The results imply that the performance of dictionary-based extraction techniques is largely influenced by the information resources used to build the dictionary. In addition, the edit distance algorithm shows steady precision across the three different dictionaries, whereas the context-only technique achieves high-end recall across the three different dictionaries.
doi:10.1186/1472-6947-15-S1-S9
PMCID: PMC4460617  PMID: 26043907
43.  A formal concept analysis and semantic query expansion cooperation to refine health outcomes of interest 
Background
Electronic Health Records (EHRs) are frequently used by clinicians and researchers to search for, extract, and analyze groups of patients by defining Health Outcomes of Interest (HOIs). The definition of an HOI is generally considered a complex and time-consuming task for health care professionals.
Methods
In our clinical note-based pharmacovigilance research, we often operate upon potentially hundreds of ontologies at once, expand query inputs, and increase the search space over clinical text as well as structured data. Such a method requires specifying an initial set of seed concepts, which are based on concept unique identifiers. This paper presents a novel method based on Formal Concept Analysis (FCA) and Semantic Query Expansion (SQE) to assist end-users in defining their seed queries and in refining the expanded search space that those queries encompass.
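As general background on Formal Concept Analysis, the sketch below enumerates the formal concepts of a small object-attribute table. A formal concept is a pair (extent, intent) in which the extent is exactly the set of objects sharing every attribute in the intent, and the intent is exactly the set of attributes shared by every object in the extent. This is a brute-force illustration of the textbook definition, not the method of the paper; the note names and attributes are hypothetical, and the enumeration is exponential in the number of objects, so it only suits toy inputs.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts of a context.

    context -- dict mapping each object to the set of attributes it has
    Returns a set of (extent, intent) pairs of frozensets.
    """
    objects = list(context)
    all_attributes = set().union(*context.values())

    def intent(objs):
        # Attributes shared by every object in objs (all attributes if objs is empty).
        shared = set(all_attributes)
        for obj in objs:
            shared &= context[obj]
        return frozenset(shared)

    def extent(attrs):
        # Objects that have every attribute in attrs.
        return frozenset(obj for obj in objects if attrs <= context[obj])

    concepts = set()
    for size in range(len(objects) + 1):
        for objs in combinations(objects, size):
            shared = intent(objs)
            concepts.add((extent(shared), shared))
    return concepts


# Hypothetical toy context: clinical notes and the concepts mentioned in them.
toy_context = {
    "note1": {"obesity", "diabetes"},
    "note2": {"obesity"},
    "note3": {"obesity", "hypertension"},
}
for ext, itt in sorted(formal_concepts(toy_context), key=lambda c: len(c[0])):
    print(sorted(ext), sorted(itt))
```

Ordered by extent size, the concepts form a lattice from the most specific attribute combinations to the most general, which is the kind of structure a user could browse when narrowing or widening a query.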
Results
We evaluate our method over a gold-standard corpus from the 2008 i2b2 Obesity Challenge. This experiment yields positive results for the sensitivity and specificity measures. Our new approach provides better recall while keeping the precision of the obtained results high. The most promising aspect of this approach is the discovery of positive results not present in our Obesity NLP reference set.
Conclusions
Together with a Web graphical user interface, our FCA and SQE cooperation proves to be an efficient approach for refining health outcomes of interest using plain terms. We consider that this approach can be extended to support other domains such as cohort building tools.
doi:10.1186/1472-6947-15-S1-S8
PMCID: PMC4460622  PMID: 26043839
Health outcome of interest; Ontology; Semantic Query Expansion; Formal Concept Analysis
44.  Injury narrative text classification using factorization model 
Narrative text is a useful source for identifying injury circumstances in routine emergency department data collections. Automatically classifying narratives with machine learning is a promising approach that can reduce the tedious manual classification process. Existing work focuses on Naive Bayes, which does not always offer the best performance. This paper proposes Matrix Factorization approaches, along with a learning enhancement process, for this task. The results are compared with the performance of various other classification approaches. The impact of parameter settings on the classification results for a medical text dataset is discussed. With the right choice of dimension k, the Non-Negative Matrix Factorization model achieves a 10-fold cross-validation (CV) accuracy of 0.93.
doi:10.1186/1472-6947-15-S1-S5
PMCID: PMC4460654  PMID: 26043671
Narrative Text; Classification; Pre-processing; Matrix Factorization; Learning Enhancement
45.  Entity linking for biomedical literature 
Background
The Entity Linking (EL) task links entity mentions from an unstructured document to entities in a knowledge base. Although this problem is well studied in news and social media, it has not received much attention in the life science domain. One outcome of tackling the EL problem in the life science domain is to enable scientists to build computational models of biological processes more efficiently. However, simply applying a news-trained entity linker produces inadequate results.
Methods
Since existing supervised approaches require a large amount of manually-labeled training data, which is currently unavailable for the life science domain, we propose a novel unsupervised collective inference approach to link entities from unstructured full texts of biomedical literature to 300 ontologies. The approach leverages the rich semantic information and structures in ontologies for similarity computation and entity ranking.
Results
Without using any manual annotation, our approach significantly outperforms a state-of-the-art supervised EL method (9% absolute gain in linking accuracy). Furthermore, the state-of-the-art supervised EL method requires 15,000 manually annotated entity mentions for training. These promising results establish a benchmark for the EL task in the life science domain. We also provide in-depth analysis and discussion of both the challenges and opportunities of automatic knowledge enrichment for scientific literature.
Conclusions
In this paper, we propose a novel unsupervised collective inference approach to address the EL problem in a new domain. We show that our unsupervised approach is able to outperform a current state-of-the-art supervised approach that has been trained with a large amount of manually labeled data. Life science presents an underrepresented domain for applying EL techniques. By providing a small benchmark data set and identifying opportunities, we hope to stimulate discussions across natural language processing and bioinformatics and motivate others to develop techniques for this largely untapped domain.
doi:10.1186/1472-6947-15-S1-S4
PMCID: PMC4460707  PMID: 26045232
semantic web; biological ontologies; text mining; signal transduction; wikification; entity linking; biomedical literature
46.  Identification of genomic features in the classification of loss- and gain-of-function mutation 
Background
Alterations of a genome can lead to changes in protein functions. Through these genetic mutations, a protein can lose its native function (loss-of-function, LoF), or it can confer a new function (gain-of-function, GoF). However, when a mutation occurs, it is difficult to determine whether it will result in a LoF or a GoF. Therefore, in this paper, we propose a study that analyzes the genomic features of LoF and GoF instances to find features that can be used to classify LoF and GoF mutations.
Methods
In order to collect experimentally verified LoF and GoF mutational information, we obtained 816 LoF mutations and 474 GoF mutations from a literature text-mining process. Next, after data-preprocessing steps, 258 LoF and 129 GoF mutations remained for further analysis. We analyzed the properties of these LoF and GoF mutations. Among the properties, we selected features that show different tendencies between the two groups and implemented classifications using support vector machine, random forest, and linear logistic regression methods to confirm whether or not these features can identify LoF and GoF mutations.
Results
We analyzed the properties of the LoF and GoF mutations and identified six features which have discriminative power between LoF and GoF conditions: the reference allele, the substituted allele, mutation type, mutation impact, subcellular location, and protein domain. When using the six selected features with the random forest, support vector machine, and linear logistic regression classifiers, the result showed accuracy levels of 72.23%, 71.28%, and 70.19%, respectively.
Conclusions
We analyzed LoF and GoF mutations and selected several properties which were different between the two classes. By implementing classifications with the selected features, it is demonstrated that the selected features have good discriminative power.
doi:10.1186/1472-6947-15-S1-S6
PMCID: PMC4460711  PMID: 26043747
47.  Parsing clinical text: how good are the state-of-the-art parsers? 
Background
Parsing, which generates a syntactic structure of a sentence (a parse tree), is a critical component of natural language processing (NLP) research in any domain including medicine. Although parsers developed in the general English domain, such as the Stanford parser, have been applied to clinical text, there are no formal evaluations and comparisons of their performance in the medical domain.
Methods
In this study, we investigated the performance of three state-of-the-art parsers (the Stanford parser, the Bikel parser, and the Charniak parser) using the following two datasets: (1) a Treebank containing 1,100 sentences that were randomly selected from progress notes used in the 2010 i2b2 NLP challenge and manually annotated according to a Penn Treebank based guideline; and (2) the MiPACQ Treebank, which was developed from pathology notes and clinical notes and contains 13,091 sentences. We conducted three experiments on both datasets. First, we measured the performance of the three state-of-the-art parsers on the clinical Treebanks with their default settings. Then we re-trained the parsers using the clinical Treebanks and evaluated their performance using 10-fold cross validation. Finally, we re-trained the parsers by combining the clinical Treebanks with the Penn Treebank.
Results
Our results showed that the original parsers achieved lower performance on clinical text (Bracketing F-measure in the range of 66.6%-70.3%) than on general English text. After retraining on the clinical Treebanks, all parsers achieved better performance; the Stanford parser performed best, reaching the highest Bracketing F-measure of 73.68% on progress notes and 83.72% on the MiPACQ corpus using 10-fold cross validation. When the combined clinical Treebanks and Penn Treebank were used, the Charniak parser achieved the highest Bracketing F-measure of the three parsers on progress notes (73.53%), and the Stanford parser reached the highest F-measure on the MiPACQ corpus (84.15%).
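As a reminder of what a bracketing F-measure summarizes, the sketch below scores a predicted parse against a gold parse by comparing their labeled constituent spans: precision is the share of predicted brackets found in the gold tree, recall is the share of gold brackets found in the prediction, and the F-measure is their harmonic mean. This is a simplified illustration under the assumption that each parse has already been reduced to (label, start, end) tuples. Standard evaluation tools apply further conventions, such as punctuation handling, that are not reproduced here, and the toy brackets are hypothetical.

```python
from collections import Counter


def bracketing_f_measure(gold, predicted):
    """F-measure over labeled brackets.

    gold, predicted -- lists of (label, start, end) constituent spans
    """
    gold_counts = Counter(gold)
    predicted_counts = Counter(predicted)
    # Multiset intersection: a bracket is matched at most as often as it occurs in both.
    matched = sum((gold_counts & predicted_counts).values())

    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy parses of a five-token sentence; one bracket differs in label.
toy_gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
toy_predicted = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
print(bracketing_f_measure(toy_gold, toy_predicted))
```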
Conclusions
Our study demonstrates that re-training using clinical Treebanks is critical for improving general English parsers' performance on clinical text, and combining clinical and open domain corpora might achieve optimal performance for parsing clinical text.
doi:10.1186/1472-6947-15-S1-S2
PMCID: PMC4460747  PMID: 26045009
Medical language processing; natural language processing; parsing; clinical text; NLP
48.  Discovering transnosological molecular basis of human brain diseases using biclustering analysis of integrated gene expression data 
Background
It has been reported that several brain diseases can be treated in a transnosological manner, implying a possible common molecular basis underlying those diseases. However, molecular-level commonality among those brain diseases has been largely unexplored. Gene expression analyses of the human brain have been used to find genes associated with brain diseases, but most of those studies were restricted either to an individual disease or to a couple of diseases. In addition, attempts to identify significant genes in such brain diseases have mostly failed when they relied on typical methods based on differentially expressed genes.
Results
In this study, we used a correlation-based biclustering approach to find coexpressed gene sets in five neurodegenerative diseases and three psychiatric disorders. By using biclustering analysis, we could efficiently and fairly identify various gene sets expressed specifically in both single and multiple brain diseases. We found 4,307 gene sets correlatively expressed in multiple brain diseases and 3,409 gene sets expressed exclusively in individual brain diseases. The function enrichment analysis of those gene sets showed many new possible functional bases as well as neurological processes that are common or specific to those eight diseases.
Conclusions
This study introduces possible common molecular bases for several brain diseases, which opens the opportunity to clarify the transnosological perspective assumed in brain diseases. It also shows the advantages of correlation-based biclustering analysis and accompanying function enrichment analysis for gene expression data in this type of investigation.
doi:10.1186/1472-6947-15-S1-S7
PMCID: PMC4460778  PMID: 26043779
49.  Inference of brain pathway activities for Alzheimer's disease classification 
Background
Alzheimer's disease (AD) is a neurodegenerative and progressive disorder that results in brain malfunctions. Resting-state (RS) functional magnetic resonance imaging (fMRI) techniques have been successfully applied for quantifying brain activities of both AD and amnestic mild cognitive impairment (aMCI) patients. Region-based approaches are widely utilized to distinguish patients from cognitively normal subjects (CN). Nevertheless, region-based approaches have a few limitations: limited reproducibility owing to the selection of disease-specific brain regions, and heterogeneity of brain activities during disease progression. To cope with similar issues, network-based approaches have been suggested in the field of molecular bioinformatics. In comparison with individual gene-based approaches, they achieved more accurate results in diverse disease classification, and reproducibility was confirmed by replication studies. In our work, we applied a similar methodology, integrating brain pathway information into pathway activity inference and permitting classification of both aMCI and AD patients based on pathway activities rather than single region activities.
Results
After aggregating the 59 brain pathways from literature, we estimated brain pathway activities by using exhaustive search algorithms between patients and cognitively normal subjects, and identified discriminatory pathways according to disease progression. We used three different data sets and each data set consists of two different groups. Our results show that the pathway-based approach (AUC = 0.89, 0.9, 0.75) outperformed the region-based approach (AUC = 0.69, 0.8, 0.68). Also, our approach provided enhanced diagnostic power achieving higher accuracy, sensitivity, and specificity (pathway-based approach: accuracy = 83%; sensitivity = 86%; specificity = 78%, region-based approach: accuracy = 74%; sensitivity = 78%; specificity = 76%).
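For reference, the accuracy, sensitivity, and specificity figures quoted above are all derived from the four cells of a binary confusion matrix. The following minimal sketch shows the arithmetic; the function name and the toy labels are hypothetical and unrelated to the study's data.

```python
def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = patient, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # patients correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # patients missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient and one control")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical toy labels, for illustration only.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(diagnostic_metrics(toy_true, toy_pred))
```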
Conclusions
We proposed a novel method that infers brain pathway activities for disease classification. Our approach shows better classification performance than the region-based approach in four classification models. We expect that the brain pathway-based approach will be helpful for precise classification of brain disorders and will provide new opportunities for uncovering disrupted brain pathways caused by disease. Moreover, discriminatory pathways between patients and cognitively normal subjects may facilitate the interpretation of functional alterations during disease progression.
doi:10.1186/1472-6947-15-S1-S1
PMCID: PMC4460780  PMID: 26044913
50.  Context-based resolution of semantic conflicts in biological pathways 
Background
Interactions between biological entities such as genes, proteins and metabolites, so-called pathways, are key to understanding the molecular mechanisms of life. As pathway information is being accumulated rapidly through various knowledge resources, there is growing interest in maintaining the integrity of the heterogeneous databases.
Methods
Here, we defined a conflict as a state in which two contradictory pieces of evidence (i.e. 'A increases B' and 'A decreases B') coexist in the same pathway. Such a conflict damages the unity of the pathway, so that inference or simulation on the integrated pathway network might be unreliable. We defined a rule and a rule group. A rule consists of an interaction between two entities, a meta-relation (increase or decrease), and context terms about tissue specificity or environmental conditions. Rules that have the same interaction are grouped into a rule group. If the rules in a group do not have a unanimous meta-relation, the rule group and its rules are judged to be conflicting.
Results
This analysis revealed that almost 20% of known interactions suffer from conflicting information, and that conflicting information occurred much more frequently in the literature than in the public database.
Because an entity can have dual functions depending on context, we hypothesized that taking context into account might resolve conflicts. We therefore grouped rules that share the same context terms as well as the same interaction. This revealed that up to 86% of the conflicts could be resolved by considering context.
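The grouping logic described above can be sketched as follows: rules are grouped by interaction, a group is flagged as conflicting when its rules disagree on the meta-relation, and regrouping by interaction plus context shows which conflicts disappear. This is a minimal illustration, not the authors' implementation. The tuple layout, the function name, and the toy rules are assumptions, and each rule carries a single context string here, whereas the rules in the study may carry several context terms.

```python
from collections import defaultdict


def conflicting_groups(rules, use_context=False):
    """Return the keys of rule groups whose rules disagree on the meta-relation.

    rules       -- iterable of (source, target, meta_relation, context) tuples
    use_context -- if True, group by interaction and context; otherwise by interaction only
    """
    relations_by_group = defaultdict(set)
    for source, target, meta_relation, context in rules:
        key = (source, target, context) if use_context else (source, target)
        relations_by_group[key].add(meta_relation)
    # A group conflicts when it holds more than one distinct meta-relation.
    return {key for key, relations in relations_by_group.items() if len(relations) > 1}


# Hypothetical toy rules, for illustration only.
toy_rules = [
    ("A", "B", "increase", "liver"),
    ("A", "B", "decrease", "muscle"),   # conflicts with the rule above only if context is ignored
    ("C", "D", "increase", "liver"),
    ("C", "D", "decrease", "liver"),    # conflicts even within the same context
    ("E", "F", "increase", "hypoxia"),
]
print(conflicting_groups(toy_rules))
print(conflicting_groups(toy_rules, use_context=True))
```

In this toy input, ignoring context flags both the A-B and C-D interactions, while grouping by context leaves only the C-D conflict, which mirrors the idea that part of the apparent contradiction is explained by differing tissues or conditions.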
Subsequent analysis also showed that contradictory records generally compete closely with each other, but that some information might be suspicious when the evidence levels of the competing records are seriously imbalanced.
Conclusions
By identifying and resolving the conflicts, we expect that pathway databases can be cleaned and used for better secondary analyses such as gene/protein annotation, network dynamics and qualitative/quantitative simulation.
doi:10.1186/1472-6947-15-S1-S3
PMCID: PMC4461014  PMID: 26045143