Search tips
Search criteria

Results 1-6 (6)

Clipboard (0)
more »
Year of Publication
Document Types
1.  A UIMA wrapper for the NCBO annotator 
Bioinformatics  2010;26(14):1800-1801.
Summary: The Unstructured Information Management Architecture (UIMA) framework and web services are emerging as useful tools for integrating biomedical text mining tools. This note describes our work, which wraps the National Center for Biomedical Ontology (NCBO) Annotator—an ontology-based annotation service—to make it available as a component in UIMA workflows.
Availability: This wrapper is freely available on the web at as part of the UIMA tools distribution from the Center for Computational Pharmacology (CCP) at the University of Colorado School of Medicine. It has been implemented in Java for support on Mac OS X, Linux and MS Windows.
PMCID: PMC2894505  PMID: 20505005
2.  Ontology quality assurance through analysis of term transformations 
Bioinformatics  2009;25(12):i77-i84.
Motivation: It is important for the quality of biological ontologies that similar concepts be expressed consistently, or univocally. Univocality is relevant for the usability of the ontology for humans, as well as for computational tools that rely on regularity in the structure of terms. However, in practice terms are not always expressed consistently, and we must develop methods for identifying terms that are not univocal so that they can be corrected.
Results: We developed an automated transformation-based clustering methodology for detecting terms that use different linguistic conventions for expressing similar semantics. These term sets represent occurrences of univocality violations. Our method was able to identify 67 examples of univocality violations in the Gene Ontology.
Availability: The identified univocality violations are available upon request. We are preparing a release of an open source version of the software to be available at
PMCID: PMC2687949  PMID: 19478020
3.  U-Compare: share and compare text mining tools with UIMA 
Bioinformatics  2009;25(15):1997-1998.
Summary: Due to the increasing number of text mining resources (tools and corpora) available to biologists, interoperability issues between these resources are becoming significant obstacles to using them effectively. UIMA, the Unstructured Information Management Architecture, is an open framework designed to aid in the construction of more interoperable tools. U-Compare is built on top of the UIMA framework, and provides both a concrete framework for out-of-the-box text mining and a sophisticated evaluation platform allowing users to run specific tools on any target text, generating both detailed statistics and instance-based visualizations of outputs. U-Compare is a joint project, providing the world's largest, and still growing, collection of UIMA-compatible resources. These resources, originally developed by different groups for a variety of domains, include many famous tools and corpora. U-Compare can be launched straight from the web, without needing to be manually installed. All U-Compare components are provided ready-to-use and can be combined easily via a drag-and-drop interface without any programming. External UIMA components can also simply be mixed with U-Compare components, without distinguishing between locally and remotely deployed resources.
PMCID: PMC2712335  PMID: 19414535
4.  MutationFinder: a high-performance system for extracting point mutation mentions from text 
Bioinformatics (Oxford, England)  2007;23(14):1862-1865.
Discussion of point mutations is ubiquitous in biomedical literature, and manually compiling databases or literature on mutations in specific genes or proteins is tedious. We present an open-source, rule-based system, MutationFinder, for extracting point mutation mentions from text. On blind test data, it achieves nearly perfect precision and a markedly improved recall over a baseline.
MutationFinder, along with a high-quality gold standard data set, and a scoring script for mutation extraction systems have been made publicly available. Implementations, source code and unit tests are available in Python, Perl and Java. MutationFinder can be used as a stand-alone script, or imported by other applications.
Project URL
PMCID: PMC2516306  PMID: 17495998
5.  Manual curation is not sufficient for annotation of genomic databases 
Bioinformatics (Oxford, England)  2007;23(13):i41-i48.
Knowledge base construction has been an area of intense activity and great importance in the growth of computational biology. However, there is little or no history of work on the subject of evaluation of knowledge bases, either with respect to their contents or with respect to the processes by which they are constructed. This article proposes the application of a metric from software engineering known as the found/fixed graph to the problem of evaluating the processes by which genomic knowledge bases are built, as well as the completeness of their contents.
Well-understood patterns of change in the found/fixed graph are found to occur in two large publicly available knowledge bases. These patterns suggest that the current manual curation processes will take far too long to complete the annotations of even just the most important model organisms, and that at their current rate of production, they will never be sufficient for completing the annotation of all currently available proteomes.
PMCID: PMC2516305  PMID: 17646325
6.  Identification of OBO nonalignments and its implications for OBO enrichment 
Bioinformatics  2008;24(12):1448-1455.
Motivation: Existing projects that focus on the semiautomatic addition of links between existing terms in the Open Biomedical Ontologies can take advantage of reasoners that can make new inferences between terms that are based on the added formal definitions and that reflect nonalignments between the linked terms. However, these projects require that these definitions be necessary and sufficient, a strong requirement that often does not hold. If such definitions cannot be added, the reasoners cannot point to the nonalignments through the suggestion of new inferences.
Results: We describe a methodology by which we have identified over 1900 instances of nonredundant nonalignments between terms from the Gene Ontology (GO) biological process (BP), cellular component (CC) and molecular function (MF) ontologies, Chemical Entities of Biological Interest (ChEBI) and the Cell Type Ontology (CL). Many of the 39.8% of these nonalignments whose object terms are more atomic than the subject terms are not currently examined in other ontology-enrichment projects due to the fact that the necessary and sufficient conditions required for the inferences are not currently examined. Analysis of the ratios of nonalignments to assertions from which the nonalignments were identified suggests that BP–MF, BP–BP, BP–CL and CC–CC terms are relatively well-aligned, while ChEBI–MF, BP–ChEBI and CC–MF terms are relatively not aligned well. We propose four ways to resolve an identified nonalignment and recommend an analogous implementation of our methodology in ontology-enrichment tools to identify types of nonalignments that are currently not detected.
Availability: The nonalignments discussed in this article may be viewed at Code for the generation of these nonalignments is available upon request.
PMCID: PMC2427163  PMID: 18463117

Results 1-6 (6)