Results 1-10 (10)

1.  Use of Fluorescence in situ Hybridization to Predict Patient Response to BCG Therapy for Bladder Cancer: Results of a Prospective Trial 
The Journal of Urology  2012;187(3):862-867.
Purpose
No reliable methods currently exist to predict patient response to intravesical immunotherapy with bacillus Calmette-Guérin (BCG), given after transurethral resection for high-risk non-muscle-invasive bladder cancer. We initiated a prospective clinical trial to determine whether fluorescence in situ hybridization (FISH) results during BCG immunotherapy can predict therapy failure.
Materials and Methods
Candidates for standard of care BCG were offered participation in a clinical trial. FISH was performed prior to BCG and at 6 weeks, 3 months, and 6 months during BCG therapy with maintenance. Cox proportional hazards regression was used to assess the relationship between FISH results and tumor recurrence or progression; the Kaplan-Meier product limit method was used to estimate recurrence- and progression-free survival.
Results
One hundred twenty-six patients participated. At a median follow-up of 24 months, 31% of patients had recurrent tumors and 14% experienced disease progression. Patients who had positive FISH results during BCG therapy were 3-5 times more likely than those who had negative FISH results to develop recurrent tumors and 5-13 times more likely to experience disease progression (p < 0.01). The timing of positive FISH results also affected outcome; for example, patients with a negative FISH result at baseline, 6 weeks, and 3 months demonstrated an 8.3% recurrence rate, compared to 48.1% in those with a positive FISH result at all three time points.
Conclusions
FISH results can identify patients who are at risk of tumor recurrence and progression during BCG immunotherapy. This information may be used to counsel patients about alternative treatment strategies.
doi:10.1016/j.juro.2011.10.144
PMCID: PMC3278506  PMID: 22245325
bladder cancer; BCG; FISH; response; prediction
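The Kaplan-Meier product limit method named in the abstract above can be sketched as follows. This is a minimal illustration, not the trial's analysis code: the function name `kaplan_meier` and the toy follow-up data are assumptions made here for demonstration and do not come from the study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of event-free survival.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if the event (e.g. recurrence) was observed, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    observed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)          # everyone leaves the risk set at their time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data (NOT from the trial): six patients, three events.
toy_times = [3, 6, 6, 12, 24, 24]
toy_events = [1, 1, 0, 1, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients reduce the number at risk without lowering the curve, which is why the estimate only steps down at times where an event was observed.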
2.  Enhancing clinical concept extraction with distributional semantics 
Journal of Biomedical Informatics  2011;45(1):129-140.
Extracting concepts (such as drugs, symptoms, and diagnoses) from clinical narratives constitutes a basic enabling technology to unlock the knowledge within and support more advanced reasoning applications such as diagnosis explanation, disease progression modeling, and intelligent analysis of the effectiveness of treatment. The recent release of annotated training sets of de-identified clinical narratives has contributed to the development and refinement of concept extraction methods. However, as the annotation process is labor-intensive, training data are necessarily limited in the concepts and concept patterns covered, which impacts the performance of supervised machine learning applications trained with these data. This paper proposes an approach to minimize this limitation by combining supervised machine learning with empirical learning of semantic relatedness from the distribution of the relevant words in additional unannotated text.
The approach uses a sequential discriminative classifier (Conditional Random Fields) to extract the mentions of medical problems, treatments and tests from clinical narratives. It takes advantage of all Medline abstracts indexed as being of the publication type “clinical trials” to estimate the relatedness between words in the i2b2/VA training and testing corpora. In addition to the traditional features such as dictionary matching, pattern matching and part-of-speech tags, we also used as a feature words that appear in similar contexts to the word in question (that is, words that have a similar vector representation measured with the commonly used cosine metric, where vector representations are derived using methods of distributional semantics). To the best of our knowledge, this is the first effort exploring the use of distributional semantics, the semantics derived empirically from unannotated text often using vector space models, for a sequence classification task such as concept extraction. Therefore, we first experimented with different sliding window models and found the model with parameters that led to best performance in a preliminary sequence labeling task.
The evaluation of this approach, performed against the i2b2/VA concept extraction corpus, showed that incorporating features based on the distribution of words across a large unannotated corpus significantly aids concept extraction. Compared to a supervised-only approach as a baseline, the micro-averaged f-measure for exact match increased from 80.3% to 82.3% and the micro-averaged f-measure based on inexact match increased from 89.7% to 91.3%. These improvements are highly significant according to the bootstrap resampling method and also considering the performance of other systems. Thus, distributional semantic features significantly improve the performance of concept extraction from clinical narratives by taking advantage of word distribution information obtained from unannotated data.
doi:10.1016/j.jbi.2011.10.007
PMCID: PMC3272090  PMID: 22085698
NLP; Information extraction; NER; Distributional Semantics; Clinical Informatics
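The idea of comparing words by the contexts they appear in, using a sliding window and the cosine metric, can be sketched roughly as below. This is an illustrative toy and not the authors' system: the helper names `context_vectors` and `cosine`, the window size, and the three-sentence corpus are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Build a co-occurrence count vector for every word.

    Each word is represented by counts of the words seen within
    `window` positions to its left or right (a sliding window model).
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus standing in for a large unannotated collection.
toy_corpus = [
    "patient received aspirin for pain".split(),
    "patient received ibuprofen for pain".split(),
    "chest xray showed no infiltrate".split(),
]
vecs = context_vectors(toy_corpus)
sim_drugs = cosine(vecs["aspirin"], vecs["ibuprofen"])  # identical contexts
sim_other = cosine(vecs["aspirin"], vecs["xray"])       # no shared contexts
```

In a sequence labeler, the nearest neighbours of a token under this similarity could be added as extra features, so that a word unseen in the annotated data still shares features with similar words that were seen.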
3.  Enhancing Phylogeography by Improving Geographical Information from GenBank 
Journal of Biomedical Informatics  2011;44(Suppl 1):S44-S47.
Phylogeography is a field that focuses on the geographical lineages of species such as vertebrates or viruses. Here, geographical data, such as the location of a species or viral host, are as important as the sequence information extracted from the species. Together, this information can help illustrate the migration of the species over time within a geographical area, the impact of geography on its evolutionary history, or the expected population of the species within the area. Molecular sequence data from NCBI, specifically GenBank, provide an abundance of available sequence data for phylogeography. However, geographical data are inconsistently represented and sparse across GenBank entries. This can impede analysis and, in situations where the geographical information is inferred, potentially lead to erroneous results. In this paper, we describe the current state of geographical data in GenBank and illustrate how automated processing techniques, such as named entity recognition, can enhance the geographical data available for phylogeographic studies.
doi:10.1016/j.jbi.2011.06.005
PMCID: PMC3199023  PMID: 21723960
Phylogeography; Databases; Nucleic Acid; Geographic Locations; Bioinformatics
4.  A Hybrid System for Emotion Extraction from Suicide Notes 
Biomedical Informatics Insights  2012;5(Suppl. 1):165-174.
The reasons that drive someone to commit suicide are complex, and their study has attracted the attention of scientists in different domains. Analyzing this phenomenon could significantly improve preventive efforts. In this paper we present a method for sentiment analysis of suicide notes submitted to the i2b2/VA/Cincinnati Shared Task 2011. In this task the sentences of 900 suicide notes were labeled with the possible emotions that they reflect. To label each sentence with emotions, we propose a hybrid approach that uses both rule-based and machine learning techniques. To solve the multi-class problem, a rule-based engine and an SVM model are used for each category. A set of syntactic and semantic features is selected for each sentence to build the rules and train the classifier. The rules are generated manually based on a set of lexical and emotional clues. We propose a new approach to extract the sentence’s clauses and constitutive grammatical elements and to use them in syntactic and semantic feature generation. The approach includes a novel method to measure the polarity of the sentence based on the extracted grammatical elements, reaching a precision of 41.79% with a recall of 55.03% for an f-measure of 47.50%. The overall mean f-measure of all submissions was 48.75% with a standard deviation of 7%.
doi:10.4137/BII.S8981
PMCID: PMC3409484  PMID: 22879773
NLP; sentiment analysis; emotion classification; polarity measurement; machine learning
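The f-measure reported in the abstract above is the harmonic mean of precision and recall, and the stated figures can be checked with a short calculation. The function name `f_measure` and the `beta` parameter are illustrative choices made here; only the precision and recall values come from the abstract.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract, in percent.
f1 = f_measure(41.79, 55.03)
```

Rounded to two decimals, `f1` agrees with the 47.50 stated in the abstract.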
5.  The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text 
BMC Bioinformatics  2011;12(Suppl 8):S3.
Background
Determining usefulness of biomedical text mining systems requires realistic task definition and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results, textual evidence for human interpretation or measuring time savings by using automated systems. Detecting articles describing complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. Therefore the BCIII-ACT corpus was provided, which includes a training, development and test set of over 12,000 PPI relevant and non-relevant PubMed abstracts labeled manually by domain experts and recording also the human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full text articles and interaction detection method ontology concepts that had been applied to detect the PPIs reported in them.
Results
A total of 11 teams participated in at least one of the two PPI tasks (10 in ACT and 8 in the IMT) and a total of 62 persons were involved either as participants or in preparing data sets/evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. From the 52 runs submitted for the ACT, the highest Matthews Correlation Coefficient (MCC) score measured was 0.55 at an accuracy of 89% and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods; some also used lexical resources like MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, and some integrated NER approaches. For the IMT, a total of 42 runs were evaluated by comparing systems against manually generated annotations done by curators from the BioGRID and MINT databases. The highest AUC iP/R achieved by any run was 53%, the best MCC score 0.55. In case of competitive systems with an acceptable recall (above 35%) the macro-averaged precision ranged between 50% and 80%, with a maximum F-Score of 55%.
Conclusions
The results of the ACT task of BioCreative III indicate that classification of large unbalanced article collections reflecting the real class imbalance is still challenging. Nevertheless, text-mining tools that report ranked lists of relevant articles for manual selection can potentially reduce the time needed to identify half of the relevant articles to less than 1/4 of the time when compared to unranked results. Detecting associations between full text articles and interaction detection method PSI-MI terms (IMT) is more difficult than might be anticipated. This is due to the variability of method term mentions, errors resulting from pre-processing of articles provided as PDF files, and the heterogeneity and different granularity of method term concepts encountered in the ontology. However, combining the sophisticated techniques developed by the participants with supporting evidence strings derived from the articles for human interpretation could result in practical modules for biological annotation workflows.
doi:10.1186/1471-2105-12-S8-S3
PMCID: PMC3269938  PMID: 22151929
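The abstract above reports both accuracy and the Matthews Correlation Coefficient for article classification on an unbalanced collection. The sketch below shows how the two can diverge under class imbalance; the confusion-matrix counts are hypothetical and chosen only for illustration, not taken from the BioCreative III evaluation.

```python
import math


def accuracy(tp, fp, tn, fn):
    """Fraction of all items that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts: 100 relevant articles among 1,000.
toy = dict(tp=50, fp=50, tn=850, fn=50)
toy_accuracy = accuracy(**toy)
toy_mcc = mcc(**toy)
```

With these counts the classifier looks strong by accuracy, yet the MCC is far lower because half of the relevant articles are missed and half of the positive predictions are wrong. This is one reason MCC is often preferred for unbalanced collections.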
6.  The GNAT library for local and remote gene mention normalization 
Bioinformatics  2011;27(19):2769-2771.
Summary: Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers have become an important step in many text and data mining pipelines. Despite this need, very few entity normalization systems are publicly available as source code or web services for biomedical text mining. Here we present the Gnat Java library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. The library can be used as a component to be integrated with other text-mining systems, as a framework to add user-specific extensions, and as an efficient stand-alone application for the identification of gene and protein names for data analysis. On the BioCreative III test data, the current version of Gnat achieves a Tap-20 score of 0.1987.
Availability: The library and web services are implemented in Java and the sources are available from http://gnat.sourceforge.net.
Contact: jorg.hakenberg@roche.com
doi:10.1093/bioinformatics/btr455
PMCID: PMC3179658  PMID: 21813477
7.  Pattern Mining for Extraction of mentions of Adverse Drug Reactions from User Comments 
AMIA Annual Symposium Proceedings  2011;2011:1019-1026.
Rapid growth of online health social networks has enabled patients to communicate more easily with each other. This exchange of opinions and experiences has provided a rich source of information about drugs and their effectiveness and, more importantly, their possible adverse reactions. We developed a system to automatically extract mentions of Adverse Drug Reactions (ADRs) from user reviews about drugs on social network websites by mining a set of language patterns. The system applied association rule mining to a set of annotated comments to extract the underlying patterns of colloquial expressions about adverse effects. The patterns were tested on a set of unseen comments to evaluate their performance. We reached a precision of 70.01%, a recall of 66.32%, and an F-measure of 67.96%.
PMCID: PMC3243273  PMID: 22195162
8.  BioSimplify: an open source sentence simplification engine to improve recall in automatic biomedical information extraction 
BioSimplify is an open source tool written in Java that introduces and facilitates the use of a novel model for sentence simplification tuned for automatic discourse analysis and information extraction (as opposed to sentence simplification for improving human readability). The model is based on a “shot-gun” approach that produces many different (simpler) versions of the original sentence by combining variants of its constituent elements. This tool is optimized for processing biomedical scientific literature such as the abstracts indexed in PubMed. We tested the tool's impact on the task of PPI extraction: it improved the f-score of the PPI tool by around 7%, with an improvement in recall of around 20%. The BioSimplify tool and test corpus can be downloaded from https://biosimplify.sourceforge.net
PMCID: PMC3041388  PMID: 21346999
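The "shot-gun" idea of generating many simpler sentences by combining variants of constituent elements can be illustrated with a very rough sketch. This is not BioSimplify's algorithm or API (the tool itself is written in Java): the function `shotgun_variants`, the hand-written constituent list, and the example sentence are all hypothetical, and a real system would derive the constituents from a parser.

```python
from itertools import product


def shotgun_variants(constituents):
    """Combine alternative renderings of each constituent.

    constituents -- list of lists; each inner list holds the alternative
                    forms of one element, with '' meaning "drop it".
    Returns the distinct sentences, in generation order.
    """
    variants = []
    for combo in product(*constituents):
        sentence = " ".join(part for part in combo if part)
        if sentence not in variants:
            variants.append(sentence)
    return variants


# Hypothetical hand-segmented sentence (assumed for illustration only).
toy_constituents = [
    ["The protein BRCA1", "BRCA1"],
    ["directly interacts with", "interacts with"],
    ["BARD1"],
    ["in human cells", ""],
]
toy_variants = shotgun_variants(toy_constituents)
```

An extraction tool could then be run on every variant, which is one plausible way such an approach trades extra processing for higher recall.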
9.  The Performance of Human Papillomavirus High-Risk DNA Testing in the Screening and Diagnostic Settings 
Objective
To evaluate the performance of the Human Papillomavirus High-Risk DNA test in patients 30 years and older.
Materials and Methods
Screening (N=835) and diagnosis (N=518) groups were defined based on prior Papanicolaou smear results as part of a clinical trial for cervical cancer detection. We compared the Hybrid Capture II® (HCII) test result to the worst histological report. We used cervical intraepithelial neoplasia (CIN) 2/3 or worse as the reference of disease. We calculated sensitivities, specificities, positive and negative likelihood ratios (LR+ and LR−), receiver operating characteristic (ROC) curves, and areas under the ROC curves for the HCII test. We also considered alternative strategies, including Papanicolaou smear, a combination of Papanicolaou smear and the HCII test, a sequence of Papanicolaou smear followed by the HCII test, and a sequence of the HCII test followed by Papanicolaou smear.
Results
For the screening group, the sensitivity was 0.69 and the specificity was 0.93; the area under the ROC curve was 0.81. The LR+ and LR− were 10.24 and 0.34, respectively. For the diagnosis group, the sensitivity was 0.88 and the specificity was 0.78; the area under the ROC curve was 0.83. The LR+ and LR− were 4.06 and 0.14, respectively. Sequential testing showed little or no improvement over the combination testing.
Conclusions
The HCII test in the screening group had a greater LR+ for the detection of CIN 2/3 or worse. HCII testing may be an additional screening tool for cervical cancer in women 30 years and older.
doi:10.1158/1055-9965.EPI-08-0137
PMCID: PMC2705895  PMID: 18843032
cervical intraepithelial neoplasia; cervix neoplasms; DNA probes HPV; sensitivity and specificity
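The positive and negative likelihood ratios reported above follow from sensitivity and specificity by the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The sketch below applies them to the rounded screening-group values from the abstract; the function name is an illustrative choice, and because the inputs are rounded to two decimals the results only approximate the published 10.24 and 0.34, which were presumably computed from unrounded counts.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary diagnostic test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Screening group, using the rounded values given in the abstract.
screen_lr_pos, screen_lr_neg = likelihood_ratios(0.69, 0.93)

# Diagnosis group, same caveat about rounding.
diag_lr_pos, diag_lr_neg = likelihood_ratios(0.88, 0.78)
```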
10.  GeneRanker: An Online System for Predicting Gene-Disease Associations for Translational Research 
With the overwhelming volume of genomic and molecular information available on many databases nowadays, researchers need from bioinformaticians more than encouragement to refine their searches. We present here GeneRanker, an online system that allows researchers to obtain a ranked list of genes potentially related to a specific disease or biological process by combining gene-disease (or gene-biological process) associations with protein-protein interactions extracted from the literature, using computational analysis of the protein network topology to more accurately rank the predicted associations. GeneRanker was evaluated in the context of brain cancer research, and is freely available online at http://www.generanker.org.
PMCID: PMC3041521  PMID: 21347122
