Related Articles

1.  New directions in biomedical text annotation: definitions, guidelines and corpus construction 
BMC Bioinformatics  2006;7:356.
Background
While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text-mining lies in the ability to identify and characterize text that satisfies various types of information needs. We report here the results of our inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area, while supporting practical mining of text for factual information. Our ultimate goal is to annotate a significant corpus of biomedical text and train machine learning methods to automatically categorize such text along certain dimensions that we have defined.
Results
We have identified five qualitative dimensions that we believe characterize a broad range of scientific sentences, and are therefore useful for supporting a general approach to text-mining: focus, polarity, certainty, evidence, and directionality. We define these dimensions and describe the guidelines we have developed for annotating text with regard to them.
To examine the effectiveness of the guidelines, twelve annotators independently annotated the same set of 101 sentences that were randomly selected from current biomedical periodicals. Analysis of these annotations shows 70–80% inter-annotator agreement, suggesting that our guidelines indeed present a well-defined, executable and reproducible task.
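The abstract does not say how the 70–80% agreement figure was computed. One common choice is mean pairwise percent agreement, sketched below under that assumption; the function name and the toy labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pairwise_agreement(annotations):
    """Mean fraction of items on which each pair of annotators agrees.

    annotations: list of equal-length label lists, one list per annotator.
    """
    if len(annotations) < 2:
        raise ValueError("need at least two annotators")
    n_items = len(annotations[0])
    if n_items == 0 or any(len(a) != n_items for a in annotations):
        raise ValueError("all annotators must label the same non-empty item set")

    pair_scores = []
    for a, b in combinations(annotations, 2):
        matches = sum(x == y for x, y in zip(a, b))
        pair_scores.append(matches / n_items)
    return sum(pair_scores) / len(pair_scores)


# Hypothetical polarity labels from three annotators on four sentences.
toy = [
    ["P", "N", "P", "P"],
    ["P", "N", "N", "P"],
    ["P", "P", "N", "P"],
]
print(round(pairwise_agreement(toy), 3))
```

Raw percent agreement does not correct for chance, so a chance-corrected statistic such as Cohen's or Fleiss' kappa may be preferable when label distributions are skewed.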
Conclusion
We present our guidelines defining a text annotation task, along with annotation results from multiple independently produced annotations, demonstrating the feasibility of the task. The annotation of a very large corpus of documents along these guidelines is currently ongoing. These annotations form the basis for the categorization of text along multiple dimensions, to support viable text mining for experimental results, methodology statements, and other forms of information. We are currently developing machine learning methods, to be trained and tested on the annotated corpus, that would allow for the automatic categorization of biomedical text along the general dimensions that we have presented. The guidelines in full detail, along with annotated examples, are publicly available.
doi:10.1186/1471-2105-7-356
PMCID: PMC1559725  PMID: 16867190
2.  Overview of the BioCreative III Workshop 
BMC Bioinformatics  2011;12(Suppl 8):S1.
Background
The overall goal of the BioCreative Workshops is to promote the development of text mining and text processing tools which are useful to the communities of researchers and database curators in the biological sciences. To this end, BioCreative I was held in 2004, BioCreative II in 2007, and BioCreative II.5 in 2009. Each of these workshops involved human-annotated test data for several basic tasks in text mining applied to the biomedical literature. Participants in the workshops were invited to compete in the tasks by constructing software systems to perform the tasks automatically and were given scores based on their performance. The results of these workshops have benefited the community in several ways. They have 1) provided evidence for the most effective methods currently available to solve specific problems; 2) revealed the current state of the art for performance on those problems; and 3) provided gold standard data and results on that data by which future advances can be gauged. This special issue contains overview papers for the three tasks of BioCreative III.
Results
The BioCreative III Workshop was held in September of 2010 and continued the tradition of a challenge evaluation on several tasks judged basic to effective text mining in biology, including a gene normalization (GN) task and two protein-protein interaction (PPI) tasks. In total the Workshop involved the work of twenty-three teams. Thirteen teams participated in the GN task which required the assignment of EntrezGene IDs to all named genes in full text papers without any species information being provided to a system. Ten teams participated in the PPI article classification task (ACT) requiring a system to classify and rank a PubMed® record as belonging to an article either having or not having “PPI relevant” information. Eight teams participated in the PPI interaction method task (IMT) where systems were given full text documents and were required to extract the experimental methods used to establish PPIs and a text segment supporting each such method. Gold standard data was compiled for each of these tasks and participants competed in developing systems to perform the tasks automatically.
BioCreative III also introduced a new interactive task (IAT), run as a demonstration task. The goal was to develop an interactive system to facilitate a user’s annotation of the unique database identifiers for all the genes appearing in an article. This task included ranking genes by importance (based preferably on the amount of described experimental information regarding genes). There was also an optional task to assist the user in finding the most relevant articles about a given gene. For BioCreative III, a user advisory group (UAG) was assembled and played an important role 1) in producing some of the gold standard annotations for the GN task, 2) in critiquing IAT systems, and 3) in providing guidance for a future more rigorous evaluation of IAT systems. Six teams participated in the IAT demonstration task and received feedback on their systems from the UAG group. Besides innovations in the GN and PPI tasks making them more realistic and practical and the introduction of the IAT task, discussions were begun on community data standards to promote interoperability and on user requirements and evaluation metrics to address utility and usability of systems.
Conclusions
In this paper we give a brief history of the BioCreative Workshops and how they relate to other text mining competitions in biology. This is followed by a synopsis of the three tasks GN, PPI, and IAT in BioCreative III with figures for best participant performance on the GN and PPI tasks. These results are discussed and compared with results from previous BioCreative Workshops and we conclude that the best performing systems for GN, PPI-ACT and PPI-IMT in realistic settings are not sufficient for fully automatic use. This provides evidence for the importance of interactive systems and we present our vision of how best to construct an interactive system for a GN or PPI like task in the remainder of the paper.
doi:10.1186/1471-2105-12-S8-S1
PMCID: PMC3269932  PMID: 22151647
3.  The biomedical discourse relation bank 
BMC Bioinformatics  2011;12:188.
Background
Identification of discourse relations, such as causal and contrastive relations, between situations mentioned in text is an important task for biomedical text-mining. A biomedical text corpus annotated with discourse relations would be very useful for developing and evaluating methods for biomedical discourse processing. However, little effort has been made to develop such an annotated resource.
Results
We have developed the Biomedical Discourse Relation Bank (BioDRB), in which we have annotated explicit and implicit discourse relations in 24 open-access full-text biomedical articles from the GENIA corpus. Guidelines for the annotation were adapted from the Penn Discourse TreeBank (PDTB), which has discourse relations annotated over open-domain news articles. We introduced new conventions and modifications to the sense classification. We report reliable inter-annotator agreement of over 80% for all sub-tasks. Experiments for identifying the sense of explicit discourse connectives show the connective itself as a highly reliable indicator for coarse sense classification (accuracy 90.9% and F1 score 0.89). These results are comparable to results obtained with the same classifier on the PDTB data. With more refined sense classification, there is degradation in performance (accuracy 69.2% and F1 score 0.28), mainly due to sparsity in the data. The size of the corpus was found to be sufficient for identifying the sense of explicit connectives, with classifier performance stabilizing at about 1900 training instances. Finally, the classifier performs poorly when trained on PDTB and tested on BioDRB (accuracy 54.5% and F1 score 0.57).
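To illustrate what using the connective itself as the only feature can look like, here is a minimal lookup baseline that predicts the most frequent sense seen for each connective in training. This is a sketch under assumptions: the abstract does not specify the classifier used, and the sense labels and training pairs below are hypothetical.

```python
from collections import Counter, defaultdict


def train_connective_baseline(pairs):
    """Build a connective -> most frequent sense table from (connective, sense) pairs."""
    per_connective = defaultdict(Counter)
    overall = Counter()
    for connective, sense in pairs:
        per_connective[connective.lower()][sense] += 1
        overall[sense] += 1
    # Fallback for unseen connectives: the most frequent sense overall.
    fallback = overall.most_common(1)[0][0]
    table = {c: counts.most_common(1)[0][0] for c, counts in per_connective.items()}
    return table, fallback


def predict_sense(table, fallback, connective):
    """Look up the connective; use the fallback sense if it was never seen."""
    return table.get(connective.lower(), fallback)


# Hypothetical training data with coarse sense labels.
train = [
    ("because", "Cause"),
    ("because", "Cause"),
    ("but", "Contrast"),
    ("since", "Cause"),
    ("since", "Temporal"),
    ("since", "Temporal"),
]
table, fallback = train_connective_baseline(train)
print(predict_sense(table, fallback, "Since"))
```

A table like this cannot separate senses of an ambiguous connective such as "since", which is consistent with the abstract's point that finer sense classification needs richer features or more data.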
Conclusion
Our work shows that discourse relations can be reliably annotated in biomedical text. Coarse sense disambiguation of explicit connectives can be done with high reliability by using just the connective as a feature, but more refined sense classification requires either richer features or more annotated data. The poor performance of a classifier trained in the open domain and tested in the biomedical domain suggests significant differences in the semantic usage of connectives across these domains, and provides robust evidence for a biomedical sublanguage for discourse and the need to develop a specialized biomedical discourse annotated corpus. The results of our cross-domain experiments are consistent with related work on identifying connectives in BioDRB.
doi:10.1186/1471-2105-12-188
PMCID: PMC3130691  PMID: 21605399
4.  The TREC 2004 genomics track categorization task: classifying full text biomedical documents 
Background
The TREC 2004 Genomics Track focused on applying information retrieval and text mining techniques to improve the use of genomic information in biomedicine. The Genomics Track consisted of two main tasks, ad hoc retrieval and document categorization. In this paper, we describe the categorization task, which focused on the classification of full-text documents, simulating the task of curators of the Mouse Genome Informatics (MGI) system and consisting of three subtasks. One subtask of the categorization task required the triage of articles likely to have experimental evidence warranting the assignment of GO terms, while the other two subtasks were concerned with the assignment of the three top-level GO categories to each paper containing evidence for these categories.
Results
The track had 33 participating groups. The mean utility measure for the triage subtask was 0.3303, with a top score of 0.6512. No system was able to substantially improve results over simply using the MeSH term Mice. Significant feature overlap between the training and test sets was found to be less than expected. Sample coverage of GO terms assigned to papers in the collection was very sparse. Determining which papers contain GO term evidence will likely need to be treated as a separate task for each concept represented in GO, and will therefore require much denser sampling than was available in the data sets.
The annotation subtask had a mean F-measure of 0.3824, with a top score of 0.5611. The mean F-measure for the annotation plus evidence codes subtask was 0.3676, with a top score of 0.4224. Gene name recognition was found to be of benefit for this task.
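The abstract does not define the utility measure reported for the triage subtask. As a rough illustration only, the sketch below assumes a linear utility of the kind used in filtering-style evaluations, normalized by the best achievable score. The weights `u_r` and `u_nr` and the example counts are assumptions, not values stated in the abstract.

```python
def normalized_utility(tp, fp, fn, u_r=20.0, u_nr=-1.0):
    """Linear utility divided by the maximum achievable utility.

    u_r rewards each relevant document selected; u_nr penalizes each
    non-relevant document selected. Both defaults are illustrative assumptions.
    """
    all_positives = tp + fn
    if all_positives == 0:
        raise ValueError("gold standard contains no positive documents")
    raw = u_r * tp + u_nr * fp
    best = u_r * all_positives  # every positive selected, no false positives
    return raw / best


# Hypothetical triage run: 10 true positives, 50 false positives, 10 misses.
print(normalized_utility(tp=10, fp=50, fn=10))
```

With a heavily recall-weighted utility of this form, a broad filter that selects many documents can score well, which may help explain why a single MeSH term was a hard baseline to beat.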
Conclusion
Automated classification of documents for GO annotation is a challenging task, as was the automated extraction of GO code hierarchies and evidence codes. However, automating these tasks would provide substantial benefit to biomedical curation, and therefore work in this area must continue. Additional experience will allow comparison and further analysis about which algorithmic features are most useful in biomedical document classification, and better understanding of the task characteristics that make automated classification feasible and useful for biomedical document curation. The TREC Genomics Track will be continuing in 2005 focusing on a wider range of triage tasks and improving results from 2004.
doi:10.1186/1747-5333-1-4
PMCID: PMC1440303  PMID: 16722582
5.  Overview of the ID, EPI and REL tasks of BioNLP Shared Task 2011 
BMC Bioinformatics  2012;13(Suppl 11):S2.
We present the preparation, resources, results and analysis of three tasks of the BioNLP Shared Task 2011: the main tasks on Infectious Diseases (ID) and Epigenetics and Post-translational Modifications (EPI), and the supporting task on Entity Relations (REL). The two main tasks represent extensions of the event extraction model introduced in the BioNLP Shared Task 2009 (ST'09) to two new areas of biomedical scientific literature, each motivated by the needs of specific biocuration tasks. The ID task concerns the molecular mechanisms of infection, virulence and resistance, focusing in particular on the functions of a class of signaling systems that are ubiquitous in bacteria. The EPI task is dedicated to the extraction of statements regarding chemical modifications of DNA and proteins, with particular emphasis on changes relating to the epigenetic control of gene expression. In contrast to these two application-oriented main tasks, the REL task seeks to support extraction in general by separating challenges relating to part-of relations into a subproblem that can be addressed by independent systems. Seven groups participated in each of the two main tasks and four groups in the supporting task. The participating systems indicated advances in the capability of event extraction methods and demonstrated generalization in many aspects: from abstracts to full texts, from previously considered subdomains to new ones, and from the ST'09 extraction targets to other entities and events. The highest performance achieved in the supporting task REL, 58% F-score, is broadly comparable with levels reported for other relation extraction tasks. For the ID task, the highest-performing system achieved 56% F-score, comparable to the state-of-the-art performance at the established ST'09 task.
In the EPI task, the best result was 53% F-score for the full set of extraction targets and 69% F-score for a reduced set of core extraction targets, approaching a level of performance sufficient for user-facing applications. In this study, we extend previously reported results and perform further analyses of the outputs of the participating systems. We place specific emphasis on aspects of system performance relating to real-world applicability, considering alternate evaluation metrics and performing additional manual analysis of system outputs. We further demonstrate that the strengths of extraction systems can be combined to improve on the performance achieved by any system in isolation. The manually annotated corpora, supporting resources, and evaluation tools for all tasks are available from http://www.bionlp-st.org and the tasks continue as open challenges for all interested parties.
doi:10.1186/1471-2105-13-S11-S2
PMCID: PMC3384257  PMID: 22759456
6.  Extracting semantically enriched events from biomedical literature 
BMC Bioinformatics  2012;13:108.
Background
Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them.
Results
Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP’09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP’09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task.
Conclusions
We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature. The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare.
doi:10.1186/1471-2105-13-108
PMCID: PMC3464657  PMID: 22621266
7.  Using Twitter to Examine Smoking Behavior and Perceptions of Emerging Tobacco Products 
Background
Social media platforms such as Twitter are rapidly becoming key resources for public health surveillance applications, yet little is known about Twitter users’ levels of informedness and sentiment toward tobacco, especially with regard to the emerging tobacco control challenges posed by hookah and electronic cigarettes.
Objective
To develop a content and sentiment analysis of tobacco-related Twitter posts and build machine learning classifiers to detect tobacco-relevant posts and sentiment towards tobacco, with a particular focus on new and emerging products like hookah and electronic cigarettes.
Methods
We collected 7362 tobacco-related Twitter posts at 15-day intervals from December 2011 to July 2012. Each tweet was manually classified using a triaxial scheme, capturing genre, theme, and sentiment. Using the collected data, machine-learning classifiers were trained to detect tobacco-related vs irrelevant tweets as well as positive vs negative sentiment, using Naïve Bayes, k-nearest neighbors, and Support Vector Machine (SVM) algorithms. Finally, phi contingency coefficients were computed between each of the categories to discover emergent patterns.
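The phi coefficient mentioned above measures association between two binary categories from a 2x2 contingency table. A minimal sketch follows; the cell counts are hypothetical and are not data from the study.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table laid out as:

                 Y present   Y absent
    X present        a           b
    X absent         c           d
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denominator


# Hypothetical counts: tweets tagged with a theme (X) and positive sentiment (Y).
print(round(phi_coefficient(30, 10, 10, 50), 3))
```

Phi ranges from -1 to 1, with 0 indicating no association between the two categories.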
Results
The most prevalent genres were first- and second-hand experience and opinion, and the most frequent themes were hookah, cessation, and pleasure. Sentiment toward tobacco was overall more positive (1939/4215, 46% of tweets) than negative (1349/4215, 32%) or neutral among tweets mentioning it, even excluding the 9% of tweets categorized as marketing. Three separate metrics converged to support an emergent distinction between, on one hand, hookah and electronic cigarettes corresponding to positive sentiment, and on the other hand, traditional tobacco products and more general references corresponding to negative sentiment. These metrics included correlations between categories in the annotation scheme (phi for hookah-positive=0.39; phi for e-cigs-positive=0.19); correlations between search keywords and sentiment (χ²(4)=414.50, P<.001, Cramer’s V=0.36); and the most discriminating unigram features for positive and negative sentiment ranked by log odds ratio in the machine learning component of the study. In the automated classification tasks, SVMs using a relatively small number of unigram features (500) achieved best performance in discriminating tobacco-related from unrelated tweets (F score=0.85).
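Ranking unigrams by log odds ratio can be sketched as follows. The abstract does not give the exact formulation, so this version assumes document-level presence counts with additive smoothing of 0.5; the function name and the toy tweets are hypothetical.

```python
import math


def log_odds_ratios(pos_docs, neg_docs, smoothing=0.5):
    """Score each unigram by log(odds in positive docs) - log(odds in negative docs).

    pos_docs, neg_docs: lists of token lists. Smoothing avoids division by zero
    for words that appear in all or none of the documents of a class.
    """
    pos_sets = [set(doc) for doc in pos_docs]
    neg_sets = [set(doc) for doc in neg_docs]
    vocabulary = set().union(*pos_sets, *neg_sets)

    scores = {}
    for word in vocabulary:
        k_pos = sum(word in s for s in pos_sets)
        k_neg = sum(word in s for s in neg_sets)
        odds_pos = (k_pos + smoothing) / (len(pos_sets) - k_pos + smoothing)
        odds_neg = (k_neg + smoothing) / (len(neg_sets) - k_neg + smoothing)
        scores[word] = math.log(odds_pos) - math.log(odds_neg)
    return scores


# Hypothetical tokenized tweets.
positive = [["love", "hookah"], ["hookah", "night"]]
negative = [["quit", "smoking"], ["smoking", "kills", "night"]]
scores = log_odds_ratios(positive, negative)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0], ranked[-1])
```

Words with large positive scores are characteristic of the positive class, large negative scores of the negative class, and scores near zero indicate words that do not discriminate.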
Conclusions
Novel insights available through Twitter for tobacco surveillance are attested through the high prevalence of positive sentiment. This positive sentiment is correlated in complex ways with social image, personal experience, and recently popular products such as hookah and electronic cigarettes. Several apparent perceptual disconnects between these products and their health effects suggest opportunities for tobacco control education. Finally, machine classification of tobacco-related posts shows a promising edge over strictly keyword-based approaches, yielding an improved signal-to-noise ratio in Twitter data and paving the way for automated tobacco surveillance applications.
doi:10.2196/jmir.2534
PMCID: PMC3758063  PMID: 23989137
social media; twitter messaging; smoking; natural language processing
8.  BioCause: Annotating and analysing causality in the biomedical domain 
BMC Bioinformatics  2013;14:2.
Background
Biomedical corpora annotated with event-level information represent an important resource for domain-specific information extraction (IE) systems. However, bio-event annotation alone cannot cater for all the needs of biologists. Unlike work on relation and event extraction, most of which focusses on specific events and named entities, we aim to build a comprehensive resource, covering all statements of causal association present in discourse. Causality lies at the heart of biomedical knowledge, such as diagnosis, pathology or systems biology, and, thus, automatic causality recognition can greatly reduce the human workload by suggesting possible causal connections and aiding in the curation of pathway models. A biomedical text corpus annotated with such relations is, hence, crucial for developing and evaluating biomedical text mining.
Results
We have defined an annotation scheme for enriching biomedical domain corpora with causality relations. This schema has subsequently been used to annotate 851 causal relations to form BioCause, a collection of 19 open-access full-text biomedical journal articles belonging to the subdomain of infectious diseases. These documents have been pre-annotated with named entity and event information in the context of previous shared tasks. We report an inter-annotator agreement rate of over 60% for triggers and of over 80% for arguments using an exact match constraint. These increase significantly using a relaxed match setting. Moreover, we analyse and describe the causality relations in BioCause from various points of view. This information can then be leveraged for the training of automatic causality detection systems.
Conclusion
Augmenting named entity and event annotations with information about causal discourse relations could benefit the development of more sophisticated IE systems. These will further influence the development of multiple tasks, such as enabling textual inference to detect entailments, discovering new facts and providing new hypotheses for experimental work.
doi:10.1186/1471-2105-14-2
PMCID: PMC3621543  PMID: 23323613
9.  Construction of an annotated corpus to support biomedical information extraction 
BMC Bioinformatics  2009;10:349.
Background
Information Extraction (IE) is a component of text mining that facilitates knowledge discovery by automatically locating instances of interesting biomedical events from huge document collections. As events are usually centred on verbs and nominalised verbs, understanding the syntactic and semantic behaviour of these words is highly important. Corpora annotated with information concerning this behaviour can constitute a valuable resource in the training of IE components and resources.
Results
We have defined a new scheme for annotating sentence-bound gene regulation events, centred on both verbs and nominalised verbs. For each event instance, all participants (arguments) in the same sentence are identified and assigned a semantic role from a rich set of 13 roles tailored to biomedical research articles, together with a biological concept type linked to the Gene Regulation Ontology. To our knowledge, our scheme is unique within the biomedical field in terms of the range of event arguments identified. Using the scheme, we have created the Gene Regulation Event Corpus (GREC), consisting of 240 MEDLINE abstracts, in which events relating to gene regulation and expression have been annotated by biologists. A novel method of evaluating various different facets of the annotation task showed that average inter-annotator agreement rates fall within the range of 66% - 90%.
Conclusion
The GREC is a unique resource within the biomedical field, in that it annotates not only core relationships between entities, but also a range of other important details about these relationships, e.g., location, temporal, manner and environmental conditions. As such, it is specifically designed to support bio-specific tool and resource development. It has already been used to acquire semantic frames for inclusion within the BioLexicon (a lexical, terminological resource to aid biomedical text mining). Initial experiments have also shown that the corpus may viably be used to train IE components, such as semantic role labellers. The corpus and annotation guidelines are freely available for academic purposes.
doi:10.1186/1471-2105-10-349
PMCID: PMC2774701  PMID: 19852798
10.  Evaluation of BioCreAtIvE assessment of task 2 
BMC Bioinformatics  2005;6(Suppl 1):S16.
Background
Molecular biology has accumulated substantial amounts of data concerning functions of genes and proteins. Information relating to functional descriptions is generally extracted manually from textual data and stored in biological databases to build up annotations for large collections of gene products. These annotation databases are crucial for the interpretation of large scale analysis approaches using bioinformatics or experimental techniques. Due to the growing accumulation of functional descriptions in biomedical literature, the need for text mining tools to facilitate the extraction of such annotations is urgent. In order to make text mining tools usable in real world scenarios, for instance to assist database curators during annotation of protein function, comparisons and evaluations of different approaches on full text articles are needed.
Results
The Critical Assessment for Information Extraction in Biology (BioCreAtIvE) contest is a community-wide competition aiming to evaluate different strategies for text mining tools, as applied to biomedical literature. We report on task two, which addressed the automatic extraction and assignment of Gene Ontology (GO) annotations of human proteins, using full text articles. The predictions of task 2 are based on triplets of protein – GO term – article passage. The annotation-relevant text passages were returned by the participants and evaluated by expert curators of the GO annotation (GOA) team at the European Bioinformatics Institute (EBI). Each participant could submit up to three results for each sub-task comprising task 2. In total more than 15,000 individual results were provided by the participants. In addition to the annotation itself, the curators evaluated whether the protein and the GO term were correctly predicted and traceable through the submitted text fragment.
Conclusion
Concepts provided by GO are currently the most extensive set of terms used for annotating gene products, so they were explored to assess how effectively text mining tools are able to extract those annotations automatically. Although the obtained results are promising, they are still far from the performance demanded by real world applications. Among the principal difficulties encountered in addressing the proposed task were the complex nature of the GO terms and protein names (the large range of variants which are used to express proteins and especially GO terms in free text), and the lack of a standard training set. A range of very different strategies were used to tackle this task. The dataset generated in line with the BioCreative challenge is publicly available and will allow new possibilities for training information extraction methods in the domain of molecular biology.
doi:10.1186/1471-2105-6-S1-S16
PMCID: PMC1869008  PMID: 15960828
11.  Comparative analysis of five protein-protein interaction corpora 
BMC Bioinformatics  2008;9(Suppl 3):S6.
Background
Growing interest in the application of natural language processing methods to biomedical text has led to an increasing number of corpora and methods targeting protein-protein interaction (PPI) extraction. However, there is no general consensus regarding PPI annotation and consequently resources are largely incompatible and methods are difficult to evaluate.
Results
We present the first comparative evaluation of the diverse PPI corpora, performing quantitative evaluation using two separate information extraction methods as well as detailed statistical and qualitative analyses of their properties. For the evaluation, we unify the corpus PPI annotations to a shared level of information, consisting of undirected, untyped binary interactions of non-static types with no identification of the words specifying the interaction, no negations, and no interaction certainty.
We find that the F-score performance of a state-of-the-art PPI extraction method varies on average 19 percentage units and in some cases over 30 percentage units between the different evaluated corpora. The differences stemming from the choice of corpus can thus be substantially larger than differences between the performance of PPI extraction methods, which suggests definite limits on the ability to compare methods evaluated on different resources. We analyse a number of potential sources for these differences and identify factors explaining approximately half of the variance. We further suggest ways in which the difficulty of the PPI extraction tasks codified by different corpora can be determined to advance comparability. Our analysis also identifies points of agreement and disagreement in PPI corpus annotation that are rarely explicitly stated by the authors of the corpora.
Conclusions
Our comparative analysis uncovers key similarities and differences between the diverse PPI corpora, thus taking an important step towards standardization. In the course of this study we have created a major practical contribution in converting the corpora into a shared format. The conversion software is freely available at .
doi:10.1186/1471-2105-9-S3-S6
PMCID: PMC2349296  PMID: 18426551
12.  Mining clinical relationships from patient narratives 
BMC Bioinformatics  2008;9(Suppl 11):S3.
Background
The Clinical E-Science Framework (CLEF) project has built a system to extract clinically significant information from the textual component of medical records in order to support clinical research, evidence-based healthcare and genotype-meets-phenotype informatics. One part of this system is the identification of relationships between clinically important entities in the text. Typical approaches to relationship extraction in this domain have used full parses, domain-specific grammars, and large knowledge bases encoding domain knowledge. In other areas of biomedical NLP, statistical machine learning (ML) approaches are now routinely applied to relationship extraction. We report on the novel application of these statistical techniques to the extraction of clinical relationships.
Results
We have designed and implemented an ML-based system for relation extraction, using support vector machines, and trained and tested it on a corpus of oncology narratives hand-annotated with clinically important relationships. Over a class of seven relation types, the system achieves an average F1 score of 72%, only slightly behind an indicative measure of human inter annotator agreement on the same task. We investigate the effectiveness of different features for this task, how extraction performance varies between inter- and intra-sentential relationships, and examine the amount of training data needed to learn various relationships.
Conclusion
We have shown that it is possible to extract important clinical relationships from text, using supervised statistical ML techniques, at levels of accuracy approaching those of human annotators. Given the importance of relation extraction as an enabling technology for text mining and given also the ready adaptability of systems based on our supervised learning approach to other clinical relationship extraction tasks, this result has significance for clinical text mining more generally, though further work to confirm our encouraging results should be carried out on a larger sample of narratives and relationship types.
doi:10.1186/1471-2105-9-S11-S3
PMCID: PMC2586752  PMID: 19025689
13.  BioCreative III interactive task: an overview 
BMC Bioinformatics  2011;12(Suppl 8):S4.
Background
The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical literature, provides a diverse and engaged end user group for text mining tools. Earlier BioCreative challenges involved many text mining teams in developing basic capabilities relevant to biological curation, but they did not address the issues of system usage, insertion into the workflow and adoption by curators. Thus in BioCreative III (BC-III), the InterActive Task (IAT) was introduced to address the utility and usability of text mining tools for real-life biocuration tasks. To support the aims of the IAT in BC-III, involvement of both developers and end users was solicited, and the development of a user interface to address the tasks interactively was requested.
Results
A User Advisory Group (UAG) actively participated in the IAT design and assessment. The task focused on gene normalization (identifying gene mentions in the article and linking these genes to standard database identifiers), gene ranking based on the overall importance of each gene mentioned in the article, and gene-oriented document retrieval (identifying full text papers relevant to a selected gene). Six systems participated and all processed and displayed the same set of articles. The articles were selected based on content known to be problematic for curation, such as ambiguity of gene names, coverage of multiple genes and species, or introduction of a new gene name. Members of the UAG curated three articles for training and assessment purposes, and each member was assigned a system to review. A questionnaire related to the interface usability and task performance (as measured by precision and recall) was answered after systems were used to curate articles. Although the limited number of articles analyzed and users involved in the IAT experiment precluded rigorous quantitative analysis of the results, a qualitative analysis provided valuable insight into some of the problems encountered by users when using the systems. The overall assessment indicates that the system usability features appealed to most users, but the system performance was suboptimal (mainly due to low accuracy in gene normalization). Some of the issues included failure of species identification and gene name ambiguity in the gene normalization task leading to an extensive list of gene identifiers to review, which, in some cases, did not contain the relevant genes. Document retrieval suffered from the same shortfalls. The UAG favored achieving high performance (measured by precision and recall), but strongly recommended the addition of features that facilitate the identification of the correct gene and its identifier, such as contextual information to assist in disambiguation.
Discussion
The IAT was an informative exercise that advanced the dialog between curators and developers and increased the appreciation of challenges faced by each group. A major conclusion was that the intended users should be actively involved in every phase of software development, and this will be strongly encouraged in future tasks. The IAT provides the first steps toward the definition of metrics and functional requirements that are necessary for designing a formal evaluation of interactive curation systems in the BioCreative IV challenge.
doi:10.1186/1471-2105-12-S8-S4
PMCID: PMC3269939  PMID: 22151968
14.  Biomedical named entity extraction: some issues of corpus compatibilities 
SpringerPlus  2013;2:601.
Background
Named Entity (NE) extraction is one of the most fundamental and important tasks in biomedical information extraction. It involves the identification of certain entities in text and their classification into predefined categories. In the biomedical community, there is as yet no general consensus regarding named entity (NE) annotation; thus, it is very difficult to compare existing systems because of corpus incompatibilities. For the same reason, we also cannot exploit the advantages of using different corpora together. In the present work we address the issues of corpus compatibility, and use a single objective optimization (SOO) based classifier ensemble technique that uses the search capability of a genetic algorithm (GA) for NE extraction in biomedicine. We hypothesize that the reliability of the predictions of each classifier differs among the various output classes. We use Conditional Random Field (CRF) and Support Vector Machine (SVM) frameworks to build a number of models depending upon the various representations of the set of features and/or feature templates. Note that we tried to extract the features without using any deep domain knowledge and/or resources.
Results
In order to assess the challenges of corpus compatibility, we experiment with different benchmark datasets and their various combinations. Comparison with existing approaches demonstrates the efficacy of the technique used. The GA-based ensemble achieves a performance improvement of around 2% over the individual classifiers. Degradation in performance on the integrated corpus clearly shows the difficulty of the task.
Conclusions
In summary, our ensemble-based approach attains state-of-the-art performance levels for entity extraction on three different kinds of biomedical datasets. The possible reasons behind the better performance of our approach are (i) the use of a variety of rich features, as described in Subsection “Features for named entity extraction”; and (ii) the use of a GA-based classifier ensemble technique to combine the outputs of multiple classifiers.
doi:10.1186/2193-1801-2-601
PMCID: PMC3837077  PMID: 24294548
15.  Enriching a biomedical event corpus with meta-knowledge annotation 
BMC Bioinformatics  2011;12:393.
Background
Biomedical papers contain rich information about entities, facts and events of biological relevance. To discover these automatically, we use text mining techniques, which rely on annotated corpora for training. In order to extract protein-protein interactions, genotype-phenotype/gene-disease associations, etc., we rely on event corpora that are annotated with classified, structured representations of important facts and findings contained within text. These provide an important resource for the training of domain-specific information extraction (IE) systems, to facilitate semantic-based searching of documents. Correct interpretation of these events is not possible without additional information, e.g., does an event describe a fact, a hypothesis, an experimental result or an analysis of results? How confident is the author about the validity of her analyses? These and other types of information, which we collectively term meta-knowledge, can be derived from the context of the event.
Results
We have designed an annotation scheme for meta-knowledge enrichment of biomedical event corpora. The scheme is multi-dimensional, in that each event is annotated for 5 different aspects of meta-knowledge that can be derived from the textual context of the event. Textual clues used to determine the values are also annotated. The scheme is intended to be general enough to allow integration with different types of bio-event annotation, whilst being detailed enough to capture important subtleties in the nature of the meta-knowledge expressed in the text. We report here on both the main features of the annotation scheme, as well as its application to the GENIA event corpus (1000 abstracts with 36,858 events). High levels of inter-annotator agreement have been achieved, falling in the range of 0.84-0.93 Kappa.
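To make the agreement figures concrete, the following is a minimal sketch of a kappa computation, assuming Cohen's kappa for a single pair of annotators over one annotation dimension; the abstract reports only "Kappa" and does not name the variant. The label values are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four events, for illustration only.
annotator_1 = ["Fact", "Fact", "Analysis", "Fact"]
annotator_2 = ["Fact", "Analysis", "Analysis", "Fact"]
example_kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually lower than the simple percentage of matching labels.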
Conclusion
By augmenting event annotations with meta-knowledge, more sophisticated IE systems can be trained, which allow interpretative information to be specified as part of the search criteria. This can assist in a number of important tasks, e.g., finding new experimental knowledge to facilitate database curation, enabling textual inference to detect entailments and contradictions, etc. To our knowledge, our scheme is unique within the field with regards to the diversity of meta-knowledge aspects annotated for each event.
doi:10.1186/1471-2105-12-393
PMCID: PMC3222636  PMID: 21985429
16.  Cross-National Analysis of the Associations among Mental Disorders and Suicidal Behavior: Findings from the WHO World Mental Health Surveys 
PLoS Medicine  2009;6(8):e1000123.
Using data from over 100,000 individuals in 21 countries participating in the WHO World Mental Health Surveys, Matthew Nock and colleagues investigate which mental health disorders increase the odds of experiencing suicidal thoughts and actual suicide attempts, and how these relationships differ across developed and developing countries.
Background
Suicide is a leading cause of death worldwide. Mental disorders are among the strongest predictors of suicide; however, little is known about which disorders are uniquely predictive of suicidal behavior, the extent to which disorders predict suicide attempts beyond their association with suicidal thoughts, and whether these associations are similar across developed and developing countries. This study was designed to test each of these questions with a focus on nonfatal suicide attempts.
Methods and Findings
Data on the lifetime presence and age-of-onset of Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV) mental disorders and nonfatal suicidal behaviors were collected via structured face-to-face interviews with 108,664 respondents from 21 countries participating in the WHO World Mental Health Surveys. The results show that each lifetime disorder examined significantly predicts the subsequent first onset of suicide attempt (odds ratios [ORs] = 2.9–8.9). After controlling for comorbidity, these associations decreased substantially (ORs = 1.5–5.6) but remained significant in most cases. Overall, mental disorders were equally predictive in developed and developing countries, with a key difference being that the strongest predictors of suicide attempts in developed countries were mood disorders, whereas in developing countries impulse-control, substance use, and post-traumatic stress disorders were most predictive. Disaggregation of the associations between mental disorders and nonfatal suicide attempts showed that these associations are largely due to disorders predicting the onset of suicidal thoughts rather than predicting progression from thoughts to attempts. In the few instances where mental disorders predicted the transition from suicidal thoughts to attempts, the significant disorders are characterized by anxiety and poor impulse-control. The limitations of this study include the use of retrospective self-reports of lifetime occurrence and age-of-onset of mental disorders and suicidal behaviors, as well as the narrow focus on mental disorders as predictors of nonfatal suicidal behaviors, each of which must be addressed in future studies.
Conclusions
This study found that a wide range of mental disorders increased the odds of experiencing suicide ideation. However, after controlling for psychiatric comorbidity, only disorders characterized by anxiety and poor impulse-control predict which people with suicide ideation act on such thoughts. These findings provide a more fine-grained understanding of the associations between mental disorders and subsequent suicidal behavior than previously available and indicate that mental disorders predict suicidal behaviors similarly in both developed and developing countries. Future research is needed to delineate the mechanisms through which people come to think about suicide and subsequently progress from ideation to attempts.
Editors' Summary
Background
Suicide is a leading cause of death worldwide. Every 40 seconds, someone somewhere commits suicide. Over a year, this adds up to about 1 million self-inflicted deaths. In the USA, for example, where suicide is the 11th leading cause of death, more than 30,000 people commit suicide every year. The figures for nonfatal suicidal behavior (suicidal thoughts or ideation, suicide planning, and suicide attempts) are even more shocking. Globally, suicide attempts, for example, are estimated to be 20 times as frequent as completed suicides. Risk factors for nonfatal suicidal behaviors and for suicide include depression and other mental disorders, alcohol or drug abuse, stressful life events, a family history of suicide, and having a friend or relative commit suicide. Importantly, nonfatal suicidal behaviors are powerful predictors of subsequent suicide deaths so individuals who talk about killing themselves must always be taken seriously and given as much help as possible by friends, relatives, and mental-health professionals.
Why Was This Study Done?
Experts believe that it might be possible to find ways to decrease suicide rates by answering three questions. First, which individual mental disorders are predictive of nonfatal suicidal behaviors? Although previous studies have reported that virtually all mental disorders are associated with an increased risk of suicidal behaviors, people often have two or more mental disorders (“comorbidity”), so many of these associations may reflect the effects of only a few disorders. Second, do some mental disorders predict suicidal ideation whereas others predict who will act on these thoughts? Finally, are the associations between mental disorders and suicidal behavior similar in developed countries (where most studies have been done) and in developing countries? By answering these questions, it should be possible to improve the screening, clinical risk assessment, and treatment of suicide around the world. Thus, in this study, the researchers undertake a cross-national analysis of the associations among mental disorders (as defined by the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition [DSM-IV]) and nonfatal suicidal behaviors.
What Did the Researchers Do and Find?
The researchers collected and analyzed data on the lifetime presence and age-of-onset of mental disorders and of nonfatal suicidal behaviors in structured interviews with nearly 110,000 participants from 21 countries (part of the World Health Organization's World Mental Health Survey Initiative). The lifetime presence of each of the 16 disorders considered (mood disorders such as depression; anxiety disorders such as post-traumatic stress disorder [PTSD]; impulse-control disorders such as attention deficit/hyperactivity disorder; and substance misuse) predicted first suicide attempts in both developed and developing countries. However, the increased risk of a suicide attempt associated with each disorder varied. So, for example, in developed countries, after controlling for comorbid mental disorders, major depression increased the risk of a suicide attempt 3-fold but drug abuse/dependency increased the risk only 2-fold. Similarly, although the strongest predictors of suicide attempts in developed countries were mood disorders, in developing countries the strongest predictors were impulse-control disorders, substance misuse disorders, and PTSD. Other analyses indicate that mental disorders were generally more predictive of the onset of suicidal thoughts than of suicide plans and attempts, but that anxiety and poor impulse-control disorders were the strongest predictors of suicide attempts in both developed and developing countries.
What Do These Findings Mean?
Although this study has several limitations—for example, it relies on retrospective self-reports by study participants—its findings nevertheless provide a more detailed understanding of the associations between mental disorders and subsequent suicidal behaviors than previously available. In particular, its findings reveal that a wide range of individual mental disorders increase the chances of an individual thinking about suicide in both developed and developing countries and provide new information about the mental disorders that predict which people with suicidal ideas will act on such thoughts. However, the findings also show that only half of people who have seriously considered killing themselves have a mental disorder. Thus although future suicide prevention efforts should include a focus on screening and treating mental disorders, ways must also be found to identify the many people without mental disorders who are at risk of suicidal behaviors.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000123.
The US National Institute of Mental Health provides information about suicide in the US: statistics and prevention
The UK National Health Service provides information about suicide, including statistics about suicide in the UK and links to other resources
The World Health Organization provides global statistics about suicide and information on suicide prevention
MedlinePlus provides links to further information and advice about suicide and about mental health (in English and Spanish)
Further details about the World Mental Health Survey Initiative and about DSM-IV are available
doi:10.1371/journal.pmed.1000123
PMCID: PMC2717212  PMID: 19668361
17.  Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users 
Bioinformatics  2008;24(18):2086-2093.
Motivation: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no ‘average biologist’ client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD) database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective, if the system can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks.
Results: The annotation scheme was applied to a large corpus in a controlled effort by eight independent annotators, where three individual annotators independently tagged each sentence. We then trained and tested machine learning classifiers to automatically categorize sentence fragments based on the annotation. We discuss here the issues involved in this task, and present an overview of the results. The latter strongly suggest that automatic annotation along most of the dimensions is highly feasible, and that this new framework for scientific sentence categorization is applicable in practice.
Contact: shatkay@cs.queensu.ca
doi:10.1093/bioinformatics/btn381
PMCID: PMC2530883  PMID: 18718948
18.  University of Turku in the BioNLP'11 Shared Task 
BMC Bioinformatics  2012;13(Suppl 11):S4.
Background
We present a system for extracting biomedical events (detailed descriptions of biomolecular interactions) from research articles, developed for the BioNLP'11 Shared Task. Our goal is to develop a system easily adaptable to different event schemes, following the theme of the BioNLP'11 Shared Task: generalization, the extension of event extraction to varied biomedical domains. Our system extends our BioNLP'09 Shared Task winning Turku Event Extraction System, which uses support vector machines to first detect event-defining words, followed by detection of their relationships.
Results
Our current system successfully predicts events for every domain case introduced in the BioNLP'11 Shared Task, being the only system to participate in all eight tasks and all of their subtasks, with best performance in four tasks. Following the Shared Task, we improve the system on the Infectious Diseases task from 42.57% to 53.87% F-score, bringing performance into line with the similar GENIA Event Extraction and Epigenetics and Post-translational Modifications tasks. We evaluate the machine learning performance of the system by calculating learning curves for all tasks, detecting areas where additional annotated data could be used to improve performance. Finally, we evaluate the use of system output on external articles as additional training data in a form of self-training.
Conclusions
We show that the updated Turku Event Extraction System can easily be adapted to all presently available event extraction targets, with competitive performance in most tasks. The scope of the performance gains between the 2009 and 2011 BioNLP Shared Tasks indicates event extraction is still a new field requiring more work. We provide several analyses of event extraction methods and performance, highlighting potential future directions for continued development.
doi:10.1186/1471-2105-13-S11-S4
PMCID: PMC3384251  PMID: 22759458
19.  The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes 
BMC Bioinformatics  2008;9(Suppl 11):S9.
Background
Detecting uncertain and negative assertions is essential in most BioMedical Text Mining tasks where, in general, the aim is to derive factual knowledge from textual data. This article reports on a corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts (we call this corpus the BioScope corpus).
Results
The corpus consists of three parts, namely medical free texts, biological full papers and biological scientific abstracts. The dataset contains annotations at the token level for negative and speculative keywords and at the sentence level for their linguistic scope. The annotation process was carried out by two independent linguist annotators and a chief linguist – also responsible for setting up the annotation guidelines – who resolved cases where the annotators disagreed. The resulting corpus consists of more than 20,000 sentences that were considered for annotation, and over 10% of them contain one or more linguistic annotations suggesting negation or uncertainty.
Conclusion
Statistics are reported on corpus size, ambiguity levels and the consistency of annotations. The corpus is accessible for academic purposes and is free of charge. Apart from the intended goal of serving as a common resource for the training, testing and comparing of biomedical Natural Language Processing systems, the corpus is also a good resource for the linguistic analysis of scientific and clinical texts.
doi:10.1186/1471-2105-9-S11-S9
PMCID: PMC2586758  PMID: 19025695
20.  Semi-automatic semantic annotation of PubMed Queries: a study on quality, efficiency, satisfaction 
Journal of biomedical informatics  2010;44(2):310-318.
Information processing algorithms require significant amounts of annotated data for training and testing. The availability of such data is often hindered by the complexity and high cost of production. In this paper, we investigate the benefits of a state-of-the-art tool to help with the semantic annotation of a large set of biomedical information queries.
Seven annotators were recruited to annotate a set of 10,000 PubMed® queries with 16 biomedical and bibliographic categories. About half of the queries were annotated from scratch, while the other half were automatically pre-annotated and manually corrected. The impact of the automatic pre-annotations was assessed on several aspects of the task: time, number of actions, annotator satisfaction, inter-annotator agreement, quality and number of the resulting annotations.
The analysis of annotation results showed that the number of required hand annotations is 28.9% lower when using pre-annotated results from automatic tools. As a result, the overall annotation time was substantially lower when pre-annotations were used, while inter-annotator agreement was significantly higher. In addition, there was no statistically significant difference in the semantic distribution or number of annotations produced when pre-annotations were used. The annotated query corpus is freely available to the research community.
This study shows that automatic pre-annotations are found helpful by most annotators. Our experience suggests using an automatic tool to assist large-scale manual annotation projects. This helps reduce annotation time and improve annotation consistency while maintaining high quality of the final annotations.
doi:10.1016/j.jbi.2010.11.001
PMCID: PMC3063330  PMID: 21094696
PubMed queries; biomedical entities; annotation standards; annotation methods
21.  Learning an enriched representation from unlabeled data for protein-protein interaction extraction 
BMC Bioinformatics  2010;11(Suppl 2):S7.
Background
Extracting protein-protein interactions from biomedical literature is an important task in biomedical text mining. Supervised machine learning methods have been used with great success in this task, but they tend to suffer from data sparseness because they can only obtain knowledge from a limited amount of labelled data. In this work, we study the use of unlabeled biomedical texts to enhance the performance of supervised learning for this task. We use feature coupling generalization (FCG) – a recently proposed semi-supervised learning strategy – to learn an enriched representation of local contexts in sentences from 47 million unlabeled examples and investigate the performance of the new features on the AIMED corpus.
Results
The new features generated by FCG achieve a 60.1 F-score and produce a significant improvement over supervised baselines. The experimental analysis shows that FCG can make good use of sparse features, which have little effect in supervised learning. The new features perform better in non-linear classifiers than in linear ones. We combine the new features with local lexical features, obtaining an F-score of 63.5 on the AIMED corpus, which is comparable with the current state-of-the-art results. We also find that simple Boolean lexical features derived only from local contexts are able to achieve competitive results against most syntactic feature/kernel based methods.
Conclusions
FCG creates many opportunities for designing new features, since many sparse features ignored by supervised learning can be put to good use. Interestingly, our results also demonstrate that state-of-the-art performance can be achieved without using any syntactic information in this task.
doi:10.1186/1471-2105-11-S2-S7
PMCID: PMC3166043  PMID: 20406505
22.  Impaired Decision-Making in Adolescent Suicide Attempters 
Objective
Decision-making deficits have been linked to suicidal behavior in adults. However, it remains unclear whether impaired decision-making plays a role in the etiopathogenesis of youth suicidal behavior. The purpose of this study was to examine decision-making processes in adolescent suicide attempters and never-suicidal comparison subjects.
Method
Using the Iowa Gambling Task, the authors examined decision-making in 40 adolescent suicide attempters, ages 13–18, and 40 never-suicidal, demographically-matched psychiatric comparison subjects.
Results
Overall, suicide attempters performed significantly worse on the Iowa Gambling Task than comparison subjects. This difference in overall task performance between the groups persisted in an exact conditional logistic regression analysis that controlled for affective disorder, current psychotropic medication use, impulsivity, and hostility (adjusted odds ratio=0.96, 95% confidence interval=0.90–0.99, p<.05). A two-way repeated-measures analysis of variance revealed a significant group-by-block interaction, demonstrating that attempters failed to learn during the task, picking approximately the same proportion of disadvantageous cards in the first and final blocks of the task. In contrast, comparison subjects picked proportionately fewer cards from the disadvantageous decks as the task progressed. Within the attempter group, overall task performance did not correlate with any characteristic of the index attempt or with the personality dimensions of impulsivity, hostility, and emotional lability.
Conclusions
Similar to findings in adults, impaired decision-making is associated with suicidal behavior in adolescents. Longitudinal studies are needed to elucidate the temporal relationship between decision-making processes and suicidal behavior and help frame potential targets for early identification and preventive interventions to reduce youth suicide and suicidal behavior.
doi:10.1016/j.jaac.2012.01.002
PMCID: PMC3314230  PMID: 22449645
23.  Natural Language Processing Versus Content-Based Image Analysis for Medical Document Retrieval 
One of the most significant recent advances in health information systems has been the shift from paper to electronic documents. While research on automatic text and image processing has taken separate paths, there is a growing need for joint efforts, particularly for electronic health records and biomedical literature databases. This work aims at comparing text-based versus image-based access to multimodal medical documents using state-of-the-art methods of processing text and image components. A collection of 180 medical documents containing an image accompanied by a short text describing it was divided into training and test sets. Content-based image analysis and natural language processing techniques are applied individually and combined for multimodal document analysis. The evaluation consists of an indexing task and a retrieval task based on the “gold standard” codes manually assigned to corpus documents. The performance of text-based and image-based access, as well as combined document features, is compared. Image analysis proves more adequate for both the indexing and retrieval of the images. In the indexing task, multimodal analysis outperforms both independent image and text analysis. This experiment shows that text describing images can be usefully analyzed in the framework of a hybrid text/image retrieval system.
doi:10.1002/asi.20955
PMCID: PMC2714909  PMID: 19633735
24.  From Episodes of Care to Diagnosis Codes: Automatic Text Categorization for Medico-Economic Encoding 
We report on the design and evaluation of an original system to help assign ICD (International Classification of Diseases) codes to clinical narratives. The task is defined as a multi-class multi-document classification task. We combine a set of machine learning and data-poor methods to generate a single automatic text categorizer, which returns a ranked list of ICD codes. The combined ranking system currently obtains a precision of 75% at high ranks and a recall of about 63% for the top twenty returned codes for a theoretical upper bound of about 79% (inter-coder agreement). The performance of the data-poor classifier is weak, whereas the use of temporal features such as anamnesis and prescription contents results in a statistically significant improvement.
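The ranked-list measures mentioned above can be sketched as precision and recall at a cut-off k. This is only an illustration of the general metrics, not the evaluation code used in the study; the function names, the example codes and the assumption that the ranked list contains no duplicates are all hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k returned codes that are correct."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(code in relevant for code in top) / len(top)


def recall_at_k(ranked, relevant, k):
    """Fraction of the correct codes found among the top-k returned codes."""
    if not relevant:
        return 0.0
    return sum(code in relevant for code in ranked[:k]) / len(relevant)


# Hypothetical ranked output and gold-standard codes, for illustration only.
ranked_codes = ["I10", "E11", "J45", "K21"]
gold_codes = {"I10", "J45", "N39"}

p_at_1 = precision_at_k(ranked_codes, gold_codes, 1)
p_at_2 = precision_at_k(ranked_codes, gold_codes, 2)
r_at_4 = recall_at_k(ranked_codes, gold_codes, 4)
```

Precision at small k reflects how trustworthy the first few suggestions are, while recall at a larger k (such as the top twenty) reflects how many of the correct codes a coder would find somewhere in the list.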
PMCID: PMC2655971  PMID: 18999206
25.  A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools 
BMC Bioinformatics  2012;13:207.
Background
We introduce the linguistic annotation of a corpus of 97 full-text biomedical publications, known as the Colorado Richly Annotated Full Text (CRAFT) corpus. We further assess the performance of existing tools for performing sentence splitting, tokenization, syntactic parsing, and named entity recognition on this corpus.
Results
Many biomedical natural language processing systems demonstrated large differences between their previously published results and their performance on the CRAFT corpus when tested with the publicly available models or rule sets. Trainable systems differed widely with respect to their ability to build high-performing models based on this data.
Conclusions
The finding that some systems were able to train high-performing models based on this corpus is additional evidence, beyond high inter-annotator agreement, that the quality of the CRAFT corpus is high. The overall poor performance of various systems indicates that considerable work needs to be done to enable natural language processing systems to work well when the input is full-text journal articles. The CRAFT corpus provides a valuable resource to the biomedical natural language processing community for evaluation and training of new models for biomedical full text publications.
doi:10.1186/1471-2105-13-207
PMCID: PMC3483229  PMID: 22901054
