PMCC PMCC

Search tips
Search criteria

Advanced
Results 1-4 (4)
 

Clipboard (0)
None
Journals
Authors
more »
Year of Publication
Document Types
1.  Ontology quality assurance through analysis of term transformations 
Bioinformatics  2009;25(12):i77-i84.
Motivation: It is important for the quality of biological ontologies that similar concepts be expressed consistently, or univocally. Univocality is relevant for the usability of the ontology for humans, as well as for computational tools that rely on regularity in the structure of terms. However, in practice terms are not always expressed consistently, and we must develop methods for identifying terms that are not univocal so that they can be corrected.
Results: We developed an automated transformation-based clustering methodology for detecting terms that use different linguistic conventions for expressing similar semantics. These term sets represent occurrences of univocality violations. Our method was able to identify 67 examples of univocality violations in the Gene Ontology.
Availability: The identified univocality violations are available upon request. We are preparing a release of an open source version of the software to be available at http://bionlp.sourceforge.net.
Contact: karin.verspoor@ucdenver.edu
doi:10.1093/bioinformatics/btp195
PMCID: PMC2687949  PMID: 19478020
2.  U-Compare: share and compare text mining tools with UIMA 
Bioinformatics  2009;25(15):1997-1998.
Summary: Due to the increasing number of text mining resources (tools and corpora) available to biologists, interoperability issues between these resources are becoming significant obstacles to using them effectively. UIMA, the Unstructured Information Management Architecture, is an open framework designed to aid in the construction of more interoperable tools. U-Compare is built on top of the UIMA framework, and provides both a concrete framework for out-of-the-box text mining and a sophisticated evaluation platform allowing users to run specific tools on any target text, generating both detailed statistics and instance-based visualizations of outputs. U-Compare is a joint project, providing the world's largest, and still growing, collection of UIMA-compatible resources. These resources, originally developed by different groups for a variety of domains, include many famous tools and corpora. U-Compare can be launched straight from the web, without needing to be manually installed. All U-Compare components are provided ready-to-use and can be combined easily via a drag-and-drop interface without any programming. External UIMA components can also simply be mixed with U-Compare components, without distinguishing between locally and remotely deployed resources.
Availability: http://u-compare.org/
Contact: kano@is.s.u-tokyo.ac.jp
doi:10.1093/bioinformatics/btp289
PMCID: PMC2712335  PMID: 19414535
3.  MutationFinder: a high-performance system for extracting point mutation mentions from text 
Bioinformatics (Oxford, England)  2007;23(14):1862-1865.
Summary
Discussion of point mutations is ubiquitous in biomedical literature, and manually compiling databases or literature on mutations in specific genes or proteins is tedious. We present an open-source, rule-based system, MutationFinder, for extracting point mutation mentions from text. On blind test data, it achieves nearly perfect precision and a markedly improved recall over a baseline.
Availability
MutationFinder, along with a high-quality gold standard data set, and a scoring script for mutation extraction systems have been made publicly available. Implementations, source code and unit tests are available in Python, Perl and Java. MutationFinder can be used as a stand-alone script, or imported by other applications.
Project URL
http://bionlp.sourceforge.net
Contact
gregcaporaso@gmail.com
doi:10.1093/bioinformatics/btm235
PMCID: PMC2516306  PMID: 17495998
4.  Manual curation is not sufficient for annotation of genomic databases 
Bioinformatics (Oxford, England)  2007;23(13):i41-i48.
Motivation
Knowledge base construction has been an area of intense activity and great importance in the growth of computational biology. However, there is little or no history of work on the subject of evaluation of knowledge bases, either with respect to their contents or with respect to the processes by which they are constructed. This article proposes the application of a metric from software engineering known as the found/fixed graph to the problem of evaluating the processes by which genomic knowledge bases are built, as well as the completeness of their contents.
Results
Well-understood patterns of change in the found/fixed graph are found to occur in two large publicly available knowledge bases. These patterns suggest that the current manual curation processes will take far too long to complete the annotations of even just the most important model organisms, and that at their current rate of production, they will never be sufficient for completing the annotation of all currently available proteomes.
Contact
larry.hunter@uchsc.edu
doi:10.1093/bioinformatics/btm229
PMCID: PMC2516305  PMID: 17646325

Results 1-4 (4)