Results 1-18 (18)
 

1.  Can Inferred Provenance and Its Visualisation Be Used to Detect Erroneous Annotation? A Case Study Using UniProtKB 
PLoS ONE  2013;8(10):e75541.
A constant influx of new data poses a challenge in keeping the annotation in biological databases current. Most biological databases contain significant quantities of textual annotation, which often contains the richest source of knowledge. Many databases reuse existing knowledge; during the curation process annotations are often propagated between entries. However, this is often not made explicit. Therefore, it can be hard, potentially impossible, for a reader to identify where an annotation originated from. Within this work we attempt to identify annotation provenance and track its subsequent propagation. Specifically, we exploit annotation reuse within the UniProt Knowledgebase (UniProtKB), at the level of individual sentences. We describe a visualisation approach for the provenance and propagation of sentences in UniProtKB which enables a large-scale statistical analysis. Initially, levels of sentence reuse within UniProtKB were analysed, showing that reuse is heavily prevalent, which enables the tracking of provenance and propagation. By analysing sentences throughout UniProtKB, a number of interesting propagation patterns were identified, covering over sentences. Over sentences remain in the database after they have been removed from the entries where they originally occurred. Analysing a subset of these sentences suggests that approximately are erroneous, whilst appear to be inconsistent. These results suggest that being able to visualise sentence propagation and provenance can aid in the determination of the accuracy and quality of textual annotation.
Source code and supplementary data are available from the authors' website at http://homepages.cs.ncl.ac.uk/m.j.bell1/sentence_analysis/.
doi:10.1371/journal.pone.0075541
PMCID: PMC3797126  PMID: 24143170
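The sentence-level tracking described in this abstract can be illustrated with a small sketch. It is not the authors' implementation (their code is at the URL above); the function names, the naive sentence splitter and the toy release data are all assumptions made for illustration. The sketch records, for each sentence, which entries contain it in each release, then reports sentences that are still present in the final release but no longer in any entry where they first appeared.

```python
import re
from collections import defaultdict

def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def trace_provenance(releases):
    """releases: chronological list of (release_name, {entry_id: annotation_text}).
    Returns {sentence: [(release_name, sorted_entry_ids), ...]} in release order."""
    history = defaultdict(list)
    for name, entries in releases:
        seen = defaultdict(set)
        for entry_id, text in entries.items():
            for sentence in split_sentences(text):
                seen[sentence].add(entry_id)
        for sentence, ids in seen.items():
            history[sentence].append((name, sorted(ids)))
    return dict(history)

def orphaned(history, last_release):
    """Sentences present in last_release but in none of their originating entries."""
    result = []
    for sentence, timeline in history.items():
        origin = set(timeline[0][1])
        final_name, final_ids = timeline[-1]
        if final_name == last_release and not (origin & set(final_ids)):
            result.append(sentence)
    return result

# Hypothetical toy data: three releases, two entries.
releases = [
    ("r1", {"P1": "Binds DNA. Involved in repair."}),
    ("r2", {"P1": "Binds DNA. Involved in repair.", "P2": "Involved in repair."}),
    ("r3", {"P1": "Binds DNA.", "P2": "Involved in repair."}),
]
history = trace_provenance(releases)
print(history["Involved in repair."])
print(orphaned(history, "r3"))
```

In the toy data, "Involved in repair." first appears in P1, propagates to P2, and survives in P2 after being removed from P1, which is the kind of pattern the abstract singles out for closer inspection.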
2.  An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB 
Bioinformatics  2012;28(18):i562-i568.
Motivation: Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly, time consuming and requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid un-annotated entries. Both manual and automated annotations vary in quality between databases and annotators, making assessment of annotation reliability problematic for users. The community lacks a generic measure for determining annotation quality and correctness, which we look at addressing within this article. Specifically, we investigate word reuse within bulk textual annotations and relate this to Zipf's Principle of Least Effort. We use the UniProt Knowledgebase (UniProtKB) as a case study to demonstrate this approach since it allows us to compare annotation change, both over time and between automated and manually curated annotations.
Results: By applying power-law distributions to word reuse in annotation, we show clear trends in UniProtKB over time, which are consistent with existing studies of quality on free text English. Further, we show a clear distinction between manual and automated analysis and investigate cohorts of protein records as they mature. These results suggest that this approach holds distinct promise as a mechanism for judging annotation quality.
Availability: Source code is available at the authors' website: http://homepages.cs.ncl.ac.uk/m.j.bell1/annotation.
Contact: phillip.lord@newcastle.ac.uk
doi:10.1093/bioinformatics/bts372
PMCID: PMC3436799  PMID: 22962482
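As a rough illustration of relating word reuse to a power law, the sketch below ranks words by frequency and estimates the exponent as the least-squares slope of log(frequency) against log(rank). The abstract does not state how the authors fit their distributions, so the least-squares fit, the tokenizer and the function names are assumptions; the authors' own code is at the URL above.

```python
import math
import re
from collections import Counter

def rank_frequency(text):
    """Return [(rank, count), ...] with rank 1 for the most frequent word."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(counts, start=1))

def loglog_slope(pairs):
    """Least-squares slope of log(count) against log(rank).
    Needs at least two distinct ranks; a slope near -1 is the classic Zipf shape."""
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(count) for _, count in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical annotation text, for illustration only.
sample = "binds dna binds rna binds atp involved in dna repair"
pairs = rank_frequency(sample)
print(pairs)
print(loglog_slope(pairs))
```

A least-squares fit on log-log axes is simple to follow but is known to be a crude estimator for power-law exponents; it is used here only to keep the example short.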
3.  Bayesian integration of networks without gold standards 
Bioinformatics  2012;28(11):1495-1500.
Motivation: Biological experiments give insight into networks of processes inside a cell, but are subject to error and uncertainty. However, due to the overlap between the large number of experiments reported in public databases it is possible to assess the chances of individual observations being correct. In order to do so, existing methods rely on high-quality ‘gold standard’ reference networks, but such reference networks are not always available.
Results: We present a novel algorithm for computing the probability of network interactions that operates without gold standard reference data. We show that our algorithm outperforms existing gold standard-based methods. Finally, we apply the new algorithm to a large collection of genetic interaction and protein–protein interaction experiments.
Availability: The integrated dataset and a reference implementation of the algorithm as a plug-in for the Ondex data integration framework are available for download at http://bio-nexus.ncl.ac.uk/projects/nogold/
Contact: darren.wilkinson@ncl.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/bts154
PMCID: PMC3356839  PMID: 22492647
4.  Customizable views on semantically integrated networks for systems biology 
Bioinformatics  2011;27(9):1299-1306.
Motivation: The rise of high-throughput technologies in the post-genomic era has led to the production of large amounts of biological data. Many of these datasets are freely available on the Internet. Making optimal use of these data is a significant challenge for bioinformaticians. Various strategies for integrating data have been proposed to address this challenge. One of the most promising approaches is the development of semantically rich integrated datasets. Although well suited to computational manipulation, such integrated datasets are typically too large and complex for easy visualization and interactive exploration.
Results: We have created an integrated dataset for Saccharomyces cerevisiae using the semantic data integration tool Ondex, and have developed a view-based visualization technique that allows for concise graphical representations of the integrated data. The technique was implemented in a plug-in for Cytoscape, called OndexView. We used OndexView to investigate telomere maintenance in S. cerevisiae.
Availability: The Ondex yeast dataset and the OndexView plug-in for Cytoscape are accessible at http://bsu.ncl.ac.uk/ondexview.
Contact: anil.wipat@ncl.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr134
PMCID: PMC3077072  PMID: 21414991
5.  Adding a Little Reality to Building Ontologies for Biology 
PLoS ONE  2010;5(9):e12258.
Background
Many areas of biology are open to mathematical and computational modelling. The application of discrete, logical formalisms defines the field of biomedical ontologies. Ontologies have been put to many uses in bioinformatics. The most widespread is for description of entities about which data have been collected, allowing integration and analysis across multiple resources. There are now over 60 ontologies in active use, increasingly developed as large, international collaborations. There are, however, many opinions on how ontologies should be authored; that is, what is appropriate for representation. Recently, a common opinion has been the “realist” approach that places restrictions upon the style of modelling considered to be appropriate.
Methodology/Principal Findings
Here, we use a number of case studies for describing the results of biological experiments. We investigate the ways in which these could be represented using both realist and non-realist approaches; we consider the limitations and advantages of each of these models.
Conclusions/Significance
From our analysis, we conclude that while realist principles may enable straight-forward modelling for some topics, there are crucial aspects of science and the phenomena it studies that do not fit into this approach; realism appears to be over-simplistic which, perversely, results in overly complex ontological models. We suggest that it is impossible to avoid compromise in modelling ontology; a clearer understanding of these compromises will better enable appropriate modelling, fulfilling the many needs for discrete mathematical models within computational biology.
doi:10.1371/journal.pone.0012258
PMCID: PMC2933225  PMID: 20838431
7.  Annotation of SBML models through rule-based semantic integration 
Journal of Biomedical Semantics  2010;1(Suppl 1):S3.
Background
The creation of accurate quantitative Systems Biology Markup Language (SBML) models is a time-intensive, manual process often complicated by the many data sources and formats required to annotate even a small and well-scoped model. Ideally, the retrieval and integration of biological knowledge for model annotation should be performed quickly, precisely, and with a minimum of manual effort.
Results
Here we present rule-based mediation, a method of semantic data integration applied to systems biology model annotation. The heterogeneous data sources are first syntactically converted into ontologies, which are then aligned to a small domain ontology by applying a rule base. We demonstrate proof-of-principle of this application of rule-based mediation using off-the-shelf semantic web technology through two use cases for SBML model annotation. Existing tools and technology provide a framework around which the system is built, reducing development time and increasing usability.
Conclusions
Integrating resources in this way accommodates multiple formats with different semantics, and provides richly-modelled biological knowledge suitable for annotation of SBML models. This initial work establishes the feasibility of rule-based mediation as part of an automated SBML model annotation system.
Availability
Detailed information on the project files as well as further information on and comparisons with similar projects is available from the project page at http://cisban-silico.cs.ncl.ac.uk/RBM/.
doi:10.1186/2041-1480-1-S1-S3
PMCID: PMC2903722  PMID: 20626923
8.  An evolutionary approach to Function 
Journal of Biomedical Semantics  2010;1(Suppl 1):S4.
Background
Understanding the distinction between function and role is vexing and difficult. While it appears to be useful, in practice this distinction is hard to apply, particularly within biology.
Results
I take an evolutionary approach, considering a series of examples, to develop and generate definitions for these concepts. I test them in practice against the Ontology for Biomedical Investigations (OBI). Finally, I give an axiomatisation and discuss methods for applying these definitions in practice.
Conclusions
The definitions in this paper are applicable, formalizing current practice. As such, they make a significant contribution to the use of these concepts within biomedical ontologies.
doi:10.1186/2041-1480-1-S1-S4
PMCID: PMC2903723  PMID: 20626924
9.  Modeling biomedical experimental processes with OBI 
Journal of Biomedical Semantics  2010;1(Suppl 1):S7.
Background
Experimental descriptions are typically stored as free text without using standardized terminology, creating challenges in comparison, reproduction and analysis. These difficulties impose limitations on data exchange and information retrieval.
Results
The Ontology for Biomedical Investigations (OBI), developed as a global, cross-community effort, provides a resource that represents biomedical investigations in an explicit and integrative framework. Here we detail three real-world applications of OBI, provide detailed modeling information and explain how to use OBI.
Conclusion
We demonstrate how OBI can be applied to different biomedical investigations to both facilitate interpretation of the experimental process and increase the computational processing and integration within the Semantic Web. The logical definitions of the entities involved allow computers to unambiguously understand and integrate different biological experimental processes and their relevant components.
Availability
OBI is available at http://purl.obolibrary.org/obo/obi/2009-11-02/obi.owl
doi:10.1186/2041-1480-1-S1-S7
PMCID: PMC2903726  PMID: 20626927
10.  Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project 
Nature biotechnology  2008;26(8):889-896.
The Minimum Information for Biological and Biomedical Investigations (MIBBI) project provides a resource for those exploring the range of extant minimum information checklists and fosters coordinated development of such checklists.
doi:10.1038/nbt.1411
PMCID: PMC2771753  PMID: 18688244
11.  Semantic Similarity in Biomedical Ontologies 
PLoS Computational Biology  2009;5(7):e1000443.
In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization.
We review semantic similarity measures applied to biomedical ontologies and propose their classification according to the strategies they employ: node-based versus edge-based and pairwise versus groupwise. We also present comparative assessment studies and discuss the implications of their results. We survey the existing implementations of semantic similarity measures, and we describe examples of applications to biomedical research. This will clarify how biomedical researchers can benefit from semantic similarity measures and help them choose the approach most suitable for their studies.
Biomedical ontologies are evolving toward increased coverage, formality, and integration, and their use for annotation is increasingly becoming a focus of both effort by biomedical experts and application of automated annotation procedures to create corpora of higher quality and completeness than are currently available. Given that semantic similarity measures are directly dependent on these evolutions, we can expect to see them gaining more relevance and even becoming as essential as sequence similarity is today in biomedical research.
doi:10.1371/journal.pcbi.1000443
PMCID: PMC2712090  PMID: 19649320
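The node-based versus edge-based distinction drawn in this review can be made concrete with a toy example. The sketch below uses a Resnik-style measure (information content of the most informative common ancestor) as one example of a node-based measure, and shortest path length through a common ancestor as one example of an edge-based measure. These two measures were chosen for illustration and are not named in the abstract; the ontology, the annotation counts and all identifiers are hypothetical.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}
# Hypothetical number of annotations made directly to each term.
DIRECT_COUNTS = {"root": 0, "binding": 2, "catalysis": 4,
                 "dna_binding": 1, "rna_binding": 1}

def up_distances(term):
    """Map the term and each of its ancestors to the fewest edges needed to reach it."""
    dist = {term: 0}
    frontier = [term]
    while frontier:
        nxt = []
        for t in frontier:
            for parent in PARENTS[t]:
                if parent not in dist:
                    dist[parent] = dist[t] + 1
                    nxt.append(parent)
        frontier = nxt
    return dist

def information_content(term):
    """-log of the share of annotations on the term or any of its descendants.
    Assumes that share is non-zero, which holds for the toy data above."""
    total = sum(DIRECT_COUNTS.values())
    covered = sum(c for t, c in DIRECT_COUNTS.items() if term in up_distances(t))
    return -math.log(covered / total)

def node_based_similarity(a, b):
    """Resnik-style: information content of the most informative common ancestor."""
    common = up_distances(a).keys() & up_distances(b).keys()
    return max(information_content(t) for t in common)

def edge_based_distance(a, b):
    """Fewest edges between a and b when travelling via a common ancestor."""
    da, db = up_distances(a), up_distances(b)
    return min(da[t] + db[t] for t in da.keys() & db.keys())

print(node_based_similarity("dna_binding", "rna_binding"))
print(edge_based_distance("dna_binding", "rna_binding"))
print(edge_based_distance("dna_binding", "catalysis"))
```

Both functions compare a single pair of terms, so they are pairwise in the review's terminology; a groupwise measure would instead compare whole sets of annotations at once.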
12.  The minimum information about a genome sequence (MIGS) specification 
Nature biotechnology  2008;26(5):541-547.
With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the ‘transparency’ of the information contained in existing genomic databases.
doi:10.1038/nbt1360
PMCID: PMC2409278  PMID: 18464787
13.  Understanding and using the meaning of statements in a bio-ontology: recasting the Gene Ontology in OWL 
BMC Bioinformatics  2007;8:57.
The bio-ontology community falls into two camps: first we have biology domain experts, who actually hold the knowledge we wish to capture in ontologies; second, we have ontology specialists, who hold knowledge about techniques and best practice on ontology development. In the bio-ontology domain, these two camps have often come into conflict, especially where pragmatism comes into conflict with perceived best practice. One of these areas is the insistence of computer scientists on a well-defined semantic basis for the Knowledge Representation language being used. In this article, we will first describe why this community is so insistent. Second, we will illustrate this by examining the semantics of the Web Ontology Language and the semantics placed on the Directed Acyclic Graph as used by the Gene Ontology. Finally we will reconcile the two representations, including the broader Open Biomedical Ontologies format. The ability to exchange between the two representations means that we can capitalise on the features of both languages. Such utility can only arise by the understanding of the semantics of the languages being used. By this illustration of the usefulness of a clear, well-defined language semantics, we wish to promote a wider understanding of the computer science perspective amongst potential users within the biological community.
doi:10.1186/1471-2105-8-57
PMCID: PMC1819394  PMID: 17311682
14.  The Eighth Annual Bio-Ontologies Meeting 
doi:10.1371/journal.pcbi.0010077
PMCID: PMC1323465  PMID: 16758004
15.  The 8th Annual Bio-Ontologies Meeting 
Comparative and Functional Genomics  2005;6(7-8):370-372.
doi:10.1002/cfg.499
PMCID: PMC2447490  PMID: 18629196
16.  The Seventh Annual Bio-Ontologies Meeting Moat House Hotel, Glasgow, 30 July 2004 
Comparative and Functional Genomics  2004;5(6-7):498-500.
The Annual Bio-Ontologies Meeting [1] has now reached its seventh consecutive year, running as a special interest group (SIG) of the much larger ISMB conference. This year's meeting in Glasgow had approximately 100 attendees. Since the advent of the Gene Ontology, which coincided with the first Bio-Ontologies Meeting, we have seen a year-on-year strengthening of the field; bio-ontologies has moved from being dominated by computer science to be led by biological applications; discussion is less about ‘what is an ontology?’ and more about ‘how to build an ontology which is fit for purpose?’. This strengthening of the field can be seen elsewhere. Both the main ISMB conference and this year's Pacific Symposium on Biocomputing (PSB) [2] have seen a large number of submissions to their ontologies track. For the first time a selection of the papers from the SIG is being published in this issue of Comparative and Functional Genomics. We hope that this will complement the publications of the larger conferences, bringing to a wider audience the cutting edge research that characterizes the Bio-Ontologies SIG.
doi:10.1002/cfg.433
PMCID: PMC2447436  PMID: 18629147
17.  ISMB 2003 Bio-ontologies SIG and Sixth Annual Bio-ontologies Meeting Report 
The Annual Bio-Ontologies meeting (http://www.cs.man.ac.uk/~stevens/meeting03/) has now been running for 6 consecutive years, as a special interest group (SIG) of the much larger ISMB conference. It met in Brisbane, Australia, this summer, the first time it was held outside North America or Europe. The bio-ontologies meeting is 1 day long and normally has around 100 attendees. This year there were many fewer, no doubt a result of the distance, global politics and SARS. The meeting consisted of a series of 30 min talks with no formal peer review or publication. Talks ranged in style from fairly formal and complete pieces of work, through works in progress, to the very informal and discursive. Each year's meeting has a theme and this year it was ‘ontologies, and text processing’. There is a tendency for those submitting talks to ignore the theme completely, but this year's theme obviously struck a chord, as half the programme was about ontologies and text analysis (http://www.cs.man.ac.uk/~stevensr/meeting03/programme.html). Despite the smaller size of the meeting, the programme was particularly strong this year, meaning that the tension between allowing time for the many excellent talks, discussion and questions from the floor was particularly keenly felt. A happy problem to have!
doi:10.1002/cfg.339
PMCID: PMC2447310  PMID: 18629028
18.  Building Ontologies in DAML + OIL 
In this article we describe an approach to representing and building ontologies advocated by the Bioinformatics and Medical Informatics groups at the University of Manchester. The hand-crafting of ontologies offers an easy and rapid avenue to delivering ontologies. Experience has shown that such approaches are unsustainable. Description logic approaches have been shown to offer computational support for building sound, complete and logically consistent ontologies. A new knowledge representation language, DAML + OIL, offers a new standard that is able to support many styles of ontology, from hand-crafted to full logic-based descriptions with reasoning support. We describe this language, the OilEd editing tool, reasoning support and a strategy for the language’s use. We finish with a current example, in the Gene Ontology Next Generation (GONG) project, that uses DAML + OIL as the basis for moving the Gene Ontology from its current hand-crafted form to one that uses logical descriptions of a concept’s properties to deliver a more complete version of the ontology.
doi:10.1002/cfg.233
PMCID: PMC2447401  PMID: 18629114