The theory of probability is widely used in biomedical research for data analysis and modelling. In previous work the probabilities of the research hypotheses have been recorded as experimental metadata. The ontology HELO is designed to support probabilistic reasoning, and provides semantic descriptors for reporting on research that involves operations with probabilities. HELO explicitly links research statements such as hypotheses, models, laws, conclusions, etc. to the associated probabilities of these statements being true. HELO enables the explicit semantic representation and accurate recording of probabilities in hypotheses, as well as the inference methods used to generate and update those hypotheses. We demonstrate the utility of HELO on three worked examples: changes in the probability of the hypothesis that sirtuins regulate human life span; changes in the probability of hypotheses about gene functions in the S. cerevisiae aromatic amino acid pathway; and the use of active learning in drug design (quantitative structure activity relation learning), where a strategy for the selection of compounds with the highest probability of improving on the best known compound was used. HELO is open source and available at https://github.com/larisa-soldatova/HELO
ontology; knowledge representation; probabilistic reasoning
Over the 15 years, the Bio-Ontologies SIG at ISMB has provided a forum for discussion of the latest and most innovative research in the bio-ontologies development, its applications to biomedicine and more generally the organisation, presentation and dissemination of knowledge in biomedicine and the life sciences. The seven papers and the commentary selected for this supplement span a wide range of topics including: web-based querying over multiple ontologies, integration of data, annotating patent records, NCBO Web services, ontology developments for probabilistic reasoning and for physiological processes, and analysis of the progress of annotation and structural GO changes.
Over the 14 years, the Bio-Ontologies SIG at ISMB has provided a forum for discussion of the latest and most innovative research in the bio-ontologies development, its applications to biomedicine and more generally the organisation, presentation and dissemination of knowledge in biomedicine and the life sciences. The seven papers selected for this supplement span a wide range of topics including: web-based querying over multiple ontologies, integration of data from wikis, innovative methods of annotating and mining electronic health records, advances in annotating web documents and biomedical literature, quality control of ontology alignments, and the ontology support for predictive models about toxicity and open access to the toxicity data.
Over the years, the Bio-Ontologies SIG at ISMB has provided a forum for discussion of the latest and most innovative research in the application of ontologies and more generally the organisation, presentation and dissemination of knowledge in biomedicine and the life sciences. The ten papers selected for this supplement are extended versions of the original papers presented at the 2010 SIG. The papers span a wide range of topics including practical solutions for data and knowledge integration for translational medicine, hypothesis based querying , understanding kidney and urinary pathways, mining the pharmacogenomics literature; theoretical research into the orthogonality of biomedical ontologies, the representation of diseases, the representation of research hypotheses, the combination of ontologies and natural language processing for an annotation framework, the generation of textual definitions, and the discovery of gene interaction networks.
Hypotheses are now being automatically produced on an industrial scale by computers in biology, e.g. the annotation of a genome is essentially a large set of hypotheses generated by sequence similarity programs; and robot scientists enable the full automation of a scientific investigation, including generation and testing of research hypotheses.
This paper proposes a logically defined way for recording automatically generated hypotheses in machine amenable way. The proposed formalism allows the description of complete hypotheses sets as specified input and output for scientific investigations. The formalism supports the decomposition of research hypotheses into more specialised hypotheses if that is required by an application. Hypotheses are represented in an operational way – it is possible to design an experiment to test them. The explicit formal description of research hypotheses promotes the explicit formal description of the results and conclusions of an investigation. The paper also proposes a framework for automated hypotheses generation. We demonstrate how the key components of the proposed framework are implemented in the Robot Scientist “Adam”.
A formal representation of automatically generated research hypotheses can help to improve the way humans produce, record, and validate research hypotheses.
The reuse of scientific knowledge obtained from one investigation in another investigation is basic to the advance of science. Scientific investigations should therefore be recorded in ways that promote the reuse of the knowledge they generate. The use of logical formalisms to describe scientific knowledge has potential advantages in facilitating such reuse. Here, we propose a formal framework for using logical formalisms to promote reuse. We demonstrate the utility of this framework by using it in a worked example from biology: demonstrating cycles of investigation formalization [F] and reuse [R] to generate new knowledge. We first used logic to formally describe a Robot scientist investigation into yeast (Saccharomyces cerevisiae) functional genomics [f1]. With Robot scientists, unlike human scientists, the production of comprehensive metadata about their investigations is a natural by-product of the way they work. We then demonstrated how this formalism enabled the reuse of the research in investigating yeast phenotypes [r1 = R(f1)]. This investigation found that the removal of non-essential enzymes generally resulted in enhanced growth. The phenotype investigation was then formally described using the same logical formalism as the functional genomics investigation [f2 = F(r1)]. We then demonstrated how this formalism enabled the reuse of the phenotype investigation to investigate yeast systems-biology modelling [r2 = R(f2)]. This investigation found that yeast flux-balance analysis models fail to predict the observed changes in growth. Finally, the systems biology investigation was formalized for reuse in future investigations [f3 = F(r2)]. These cycles of reuse are a model for the general reuse of scientific knowledge.
semantic web; logic; Saccharomyces cerevisiae; ontology
We review the main components of autonomous scientific discovery, and how they lead to the concept of a Robot Scientist. This is a system which uses techniques from artificial intelligence to automate all aspects of the scientific discovery process: it generates hypotheses from a computer model of the domain, designs experiments to test these hypotheses, runs the physical experiments using robotic systems, analyses and interprets the resulting data, and repeats the cycle. We describe our two prototype Robot Scientists: Adam and Eve. Adam has recently proven the potential of such systems by identifying twelve genes responsible for catalysing specific reactions in the metabolic pathways of the yeast Saccharomyces cerevisiae. This work has been formally recorded in great detail using logic. We argue that the reporting of science needs to become fully formalised and that Robot Scientists can help achieve this. This will make scientific information more reproducible and reusable, and promote the integration of computers in scientific reasoning. We believe the greater automation of both the physical and intellectual aspects of scientific investigations to be essential to the future of science. Greater automation improves the accuracy and reliability of experiments, increases the pace of discovery and, in common with conventional laboratory automation, removes tedious and repetitive tasks from the human scientist.
Motivation: Many published manuscripts contain experiment protocols which are poorly described or deficient in information. This means that the published results are very hard or impossible to repeat. This problem is being made worse by the increasing complexity of high-throughput/automated methods. There is therefore a growing need to represent experiment protocols in an efficient and unambiguous way.
Results: We have developed the Experiment ACTions (EXACT) ontology as the basis of a method of representing biological laboratory protocols. We provide example protocols that have been formalized using EXACT, and demonstrate the advantages and opportunities created by using this formalization. We argue that the use of EXACT will result in the publication of protocols with increased clarity and usefulness to the scientific community.
Availability: The ontology, examples and code can be downloaded from http://www.aber.ac.uk/compsci/Research/bio/dss/EXACT/
Contact: Larisa Soldatova email@example.com
The formal description of experiments for efficient analysis, annotation and sharing of results is a fundamental part of the practice of science. Ontologies are required to achieve this objective. A few subject-specific ontologies of experiments currently exist. However, despite the unity of scientific experimentation, no general ontology of experiments exists. We propose the ontology EXPO to meet this need. EXPO links the SUMO (the Suggested Upper Merged Ontology) with subject-specific ontologies of experiments by formalizing the generic concepts of experimental design, methodology and results representation. EXPO is expressed in the W3C standard ontology language OWL-DL. We demonstrate the utility of EXPO and its ability to describe different experimental domains, by applying it to two experiments: one in high-energy physics and the other in phylogenetics. The use of EXPO made the goals and structure of these experiments more explicit, revealed ambiguities, and highlighted an unexpected similarity. We conclude that, EXPO is of general value in describing experiments and a step towards the formalization of science.
ontology; formalization; annotation; artificial intelligence; metadata
Experimental descriptions are typically stored as free text without using standardized terminology, creating challenges in comparison, reproduction and analysis. These difficulties impose limitations on data exchange and information retrieval.
The Ontology for Biomedical Investigations (OBI), developed as a global, cross-community effort, provides a resource that represents biomedical investigations in an explicit and integrative framework. Here we detail three real-world applications of OBI, provide detailed modeling information and explain how to use OBI.
We demonstrate how OBI can be applied to different biomedical investigations to both facilitate interpretation of the experimental process and increase the computational processing and integration within the Semantic Web. The logical definitions of the entities involved allow computers to unambiguously understand and integrate different biological experimental processes and their relevant components.
OBI is available at http://purl.obolibrary.org/obo/obi/2009-11-02/obi.owl