The HIV epidemic has been continuously growing among women, and in some parts of the world, HIV-infected women outnumber men. Women's greater vulnerability to HIV, both biologically and socially, influences their health risk and health outcome. This disparity between sexes has been established for other diseases, for example, autoimmune diseases, malignancies and cardiovascular diseases. Differences in drug effects and treatment outcomes have also been demonstrated.
Despite proven sex and gender differences, women continue to be underrepresented in clinical trials, and the absence of gender analyses in published literature is striking. There is a growing advocacy for consideration of women in research, in particular in the HIV field, and gender mainstreaming of policies is increasingly called for. However, these efforts have not translated into improved reporting of sex-disaggregated data and provision of gender analysis in published literature; science editors, as well as publishers, lag behind in this effort.
Instructions for authors issued by journals contain many guidelines for good standards of reporting, and a policy on sex-disaggregated data and gender analysis should not be amiss here. It is time for editors and publishers to demonstrate leadership in changing the paradigm in the world of scientific publication. We encourage authors, peer reviewers and fellow editors to lend their support by taking necessary measures to substantially improve reporting of gender analysis. Editors' associations could play an essential role in facilitating a transition to improved standard editorial policies.
Several fields have created ontologies for their subdomains. For example, the biological sciences have developed extensive ontologies such as the Gene Ontology, which is considered a great success. Ontologies could provide similar advantages to the Modeling and Simulation community. They provide a way to establish common vocabularies and capture knowledge about a particular domain with community-wide agreement. Ontologies can support significantly improved (semantic) search and browsing, integration of heterogeneous information sources, and improved knowledge discovery capabilities. This paper discusses the design and development of an ontology for Modeling and Simulation called the Discrete-event Modeling Ontology (DeMO), and it presents prototype applications that demonstrate various uses and benefits that such an ontology may provide to the Modeling and Simulation community.
Discrete systems; simulation environments; standards; Web-based environments
Biological ontologies are now being widely used for annotation, sharing and retrieval of the biological data. Many of these ontologies are hosted under the umbrella of the Open Biological Ontologies Foundry. In order to support interterminology mapping, composite terms in these ontologies need to be translated into atomic or primitive terms in other, orthogonal ontologies, for example, gluconeogenesis (biological process term) to glucose (chemical ontology term). Identifying such decompositional ontology translations is a challenging problem. In this paper, we propose a network-theoretic approach based on the structure of the integrated OBO relationship graph. We use a network-theoretic measure, called the clustering coefficient, to find relevant atomic terms in the neighborhood of a composite term. By eliminating the existing GO to ChEBI Ontology mappings from OBO, we evaluate whether the proposed approach can re-identify the corresponding relationships. The results indicate that the network structure provides strong cues for decompositional ontology translation and the existing relationships can be used to identify new translations.
network theory; biomedical ontologies; ontology translation; open biomedical ontologies
Ontologies have increasingly been used in the biomedical domain, which has prompted the emergence of different initiatives to facilitate their development and integration. The Open Biological and Biomedical Ontologies (OBO) Foundry consortium provides a repository of life-science ontologies, which are developed according to a set of shared principles. This consortium has developed an ontology called OBO Relation Ontology aiming at standardizing the different types of biological entity classes and associated relationships. Since ontologies are primarily intended to be used by humans, the use of graphical notations for ontology development facilitates the capture, comprehension and communication of knowledge between its users. However, OBO Foundry ontologies are captured and represented basically using text-based notations. The Unified Modeling Language (UML) provides a standard and widely-used graphical notation for modeling computer systems. UML provides a well-defined set of modeling elements, which can be extended using a built-in extension mechanism named Profile. Thus, this work aims at developing a UML profile for the OBO Relation Ontology to provide a domain-specific set of modeling elements that can be used to create standard UML-based ontologies in the biomedical domain.
We have studied the OBO Relation Ontology, the UML metamodel and the UML profiling mechanism. Based on these studies, we have proposed an extension to the UML metamodel in conformance with the OBO Relation Ontology and we have defined a profile that implements the extended metamodel. Finally, we have applied the proposed UML profile in the development of a number of fragments from different ontologies. Particularly, we have considered the Gene Ontology (GO), the PRotein Ontology (PRO) and the Xenopus Anatomy and Development Ontology (XAO).
The use of an established and well-known graphical language in the development of biomedical ontologies provides a more intuitive form of capturing and representing knowledge than using only text-based notations. The use of the profile requires the domain expert to reason about the underlying semantics of the concepts and relationships being modeled, which helps preventing the introduction of inconsistencies in an ontology under development and facilitates the identification and correction of errors in an already defined ontology.
In recent years, as a knowledge-based discipline, bioinformatics has been made more computationally amenable. After its beginnings as a technology advocated by computer scientists to overcome problems of heterogeneity, ontology has been taken up by biologists themselves as a means to consistently annotate features from genotype to phenotype. In medical informatics, artifacts called ontologies have been used for a longer period of time to produce controlled lexicons for coding schemes. In this article, we review the current position in ontologies and how they have become institutionalized within biomedicine. As the field has matured, the much older philosophical aspects of ontology have come into play. With this and the institutionalization of ontology has come greater formality. We review this trend and what benefits it might bring to ontologies and their use within biomedicine.
bio-ontology; medical ontology; annotation; knowledge; knowledge representation; history
An abstraction network is an auxiliary network of nodes and links that provides a compact, high-level view of an ontology. Such a view lends support to ontology orientation, comprehension, and quality-assurance efforts. A methodology is presented for deriving a kind of abstraction network, called a partial-area taxonomy, for the Ontology of Clinical Research (OCRe). OCRe was selected as a representative of ontologies implemented using the Web Ontology Language (OWL) based on shared domains. The derivation of the partial-area taxonomy for the Entity hierarchy of OCRe is described. Utilizing the visualization of the content and structure of the hierarchy provided by the taxonomy, the Entity hierarchy is audited, and several errors and inconsistencies in OCRe’s modeling of its domain are exposed. After appropriate corrections are made to OCRe, a new partial-area taxonomy is derived. The generalizability of the paradigm of the derivation methodology to various families of biomedical ontologies is discussed.
We present an analysis of some considerations involved in expressing the Gene Ontology (GO) as a machine-processible ontology, reflecting principles of formal ontology.
GO is a controlled vocabulary that is intended to facilitate communication between
biologists by standardizing usage of terms in database annotations. Making such
controlled vocabularies maximally useful in support of bioinformatics applications
requires explicating in machine-processible form the implicit background information
that enables human users to interpret the meaning of the vocabulary terms.
In the case of GO, this process would involve rendering the meanings of GO into
a formal (logical) language with the help of domain experts, and adding additional
information required to support the chosen formalization. A controlled vocabulary
augmented in these ways is commonly called an ontology. In this paper, we make a
modest exploration to determine the ontological requirements for this extended version
of GO. Using the terms within the three GO hierarchies (molecular function,
biological process and cellular component), we investigate the facility with which
GO concepts can be ontologized, using available tools from the philosophical and
ontological engineering literature.
Effective Medline database exploration is critical for the understanding of high throughput experimental results and the development of novel hypotheses about the mechanisms underlying the targeted biological processes. While existing solutions enhance Medline exploration through different approaches such as document clustering, network presentations of underlying conceptual relationships and the mapping of search results to MeSH and Gene Ontology trees, we believe the use of multiple ontologies from the Open Biomedical Ontology can greatly help researchers to explore literature from different perspectives as well as to quickly locate the most relevant Medline records for further investigation.
We developed an ontology-based interactive Medline exploration solution called PubOnto to enable the interactive exploration and filtering of search results through the use of multiple ontologies from the OBO foundry. The PubOnto program is a rich internet application based on the FLEX platform. It contains a number of interactive tools, visualization capabilities, an open service architecture, and a customizable user interface. It is freely accessible at: .
With the growing amount of biomedical data available in public databases it has become increasingly important to annotate data in a consistent way in order to allow easy access to this rich source of information. Annotating the data using controlled vocabulary terms and ontologies makes it much easier to compare and analyze data from different sources. However, finding the correct controlled vocabulary terms can sometimes be a difficult task for the end user annotating these data.
In order to facilitate the location of the correct term in the correct controlled vocabulary or ontology, the Ontology Lookup Service was created. However, using the Ontology Lookup Service as a web service is not always feasible, especially for researchers without bioinformatics support. We have therefore created a Java front end to the Ontology Lookup Service, called the OLS Dialog, which can be plugged into any application requiring the annotation of data using controlled vocabulary terms, making it possible to find and use controlled vocabulary terms without requiring any additional knowledge about web services or ontology formats.
As a user-friendly open source front end to the Ontology Lookup Service, the OLS Dialog makes it straightforward to include controlled vocabulary support in third-party tools, which ultimately makes the data even more valuable to the biomedical community.
The integration of biomedical terminologies is indispensable to the process
of information integration. When terminologies are linked merely
through the alignment of their leaf terms, however, differences in context
and ontological structure are ignored. Making use of the SNAP and
SPAN ontologies, we show how three reference domain ontologies can be
integrated at a higher level, through what we shall call the OBR framework (for: Ontology
of Biomedical Reality). OBR is designed to facilitate
inference across the boundaries of domain ontologies in anatomy, physiology
ontology integration; top-level ontology; domain ontology; terminology; biomedicine
Integrating and relating images with clinical and molecular data is a crucial activity in translational research, but challenging because the information in images is not explicit in standard computer-accessible formats. We have developed an ontology-based representation of the semantic contents of radiology images called AIM (Annotation and Image Markup). AIM specifies the quantitative and qualitative content that researchers extract from images. The AIM ontology enables semantic image annotation and markup, specifying the entities and relations necessary to describe images. AIM annotations, represented as instances in the ontology, enable key use cases for images in translational research such as disease status assessment, query, and inter-observer variation analysis. AIM will enable ontology-based query and mining of images, and integration of images with data in other ontology-annotated bioinformatics databases. Our ultimate goal is to enable researchers to link images with related scientific data so they can learn the biological and physiological significance of the image content.
The Gene Ontology (GO) has become the internationally accepted standard for representing function, process, and location aspects of gene products. The wealth of GO annotation data provides a valuable source of implicit knowledge of relationships among these aspects. We describe a new method for association rule mining to discover implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction and/or between terms from the same sub-ontology. We have developed a bottom-up generalization procedure called Cross-Ontology Data Mining-Level by Level (COLL) that takes into account the structure and semantics of the GO, generates generalized transactions from annotation data and mines interesting multi-level cross-ontology association rules. We applied our method on publicly available chicken and mouse GO annotation datasets and mined 5368 and 3959 multi-level cross ontology rules from the two datasets respectively. We show that our approach discovers more and higher quality association rules from the GO as evaluated by biologists in comparison to previously published methods. Biologically interesting rules discovered by our method reveal unknown and surprising knowledge about co-occurring GO terms.
The technological advances in the past decade have lead to massive progress in the field of biotechnology. The documentation of the progress made exists in the form of research articles. The PubMed is the current most used repository for bio-literature. PubMed consists of about 17 million abstracts as of 2007 that require methods to efficiently retrieve and browse large volume of relevant information. The State-of-the-art technologies such as GOPubmed use simple keyword-based techniques for retrieving abstracts from the PubMed and linking them to the Gene Ontology (GO). This paper changes the paradigm by introducing semantics enabled technique to link the PubMed to the Gene Ontology, called, SEGOPubmed for ontology-based browsing. Latent Semantic Analysis (LSA) framework is used to semantically interface PubMed abstracts to the Gene Ontology.
The Empirical analysis is performed to compare the performance of the SEGOPubmed with the GOPubmed. The analysis is initially performed using a few well-referenced query words. Further, statistical analysis is performed using GO curated dataset as ground truth. The analysis suggests that the SEGOPubmed performs better than the classic GOPubmed as it incorporates semantics.
The LSA technique is applied on the PubMed abstracts obtained based on the user query and the semantic similarity between the query and the abstracts. The analyses using well-referenced keywords show that the proposed semantic-sensitive technique outperformed the string comparison based techniques in associating the relevant abstracts to the GO terms. The SEGOPubmed also extracted the abstracts in which the keywords do not appear in isolation (i.e. they appear in combination with other terms) that could not be retrieved by simple term matching techniques.
In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization.
We review semantic similarity measures applied to biomedical ontologies and propose their classification according to the strategies they employ: node-based versus edge-based and pairwise versus groupwise. We also present comparative assessment studies and discuss the implications of their results. We survey the existing implementations of semantic similarity measures, and we describe examples of applications to biomedical research. This will clarify how biomedical researchers can benefit from semantic similarity measures and help them choose the approach most suitable for their studies.
Biomedical ontologies are evolving toward increased coverage, formality, and integration, and their use for annotation is increasingly becoming a focus of both effort by biomedical experts and application of automated annotation procedures to create corpora of higher quality and completeness than are currently available. Given that semantic similarity measures are directly dependent on these evolutions, we can expect to see them gaining more relevance and even becoming as essential as sequence similarity is today in biomedical research.
While classifications of mental disorders have existed for over one hundred years, it still remains unspecified what terms such as 'mental disorder', 'disease' and 'illness' might actually denote. While ontologies have been called in aid to address this shortfall since the GALEN project of the early 1990s, most attempts thus far have sought to provide a formal description of the structure of some pre-existing terminology or classification, rather than of the corresponding structures and processes on the side of the patient.
We here present a view of mental disease that is based on ontological realism and which follows the principles embodied in Basic Formal Ontology (BFO) and in the application of BFO in the Ontology of General Medical Science (OGMS). We analyzed statements about what counts as a mental disease provided (1) in the research agenda for the DSM-V, and (2) in Pies' model. The results were used to assess whether the representational units of BFO and OGMS were adequate as foundations for a formal representation of the entities in reality that these statements attempt to describe. We then analyzed the representational units specific to mental disease and provided corresponding definitions.
Our key contributions lie in the identification of confusions and conflations in the existing terminology of mental disease and in providing what we believe is a framework for the sort of clear and unambiguous reference to entities on the side of the patient that is needed in order to avoid these confusions in the future.
One criterion for the well-formedness of ontologies is that their hierarchical structure forms a lattice. Formal Concept Analysis (FCA) has been used as a technique for assessing the quality of ontologies, but is not scalable to large ontologies such as SNOMED CT (> 300k concepts). We developed a methodology called Lattice-based Structural Auditing (LaSA), for auditing biomedical ontologies, implemented through automated SPARQL queries, in order to exhaustively identify all non-lattice pairs in SNOMED CT. The percentage of non-lattice pairs ranges from 0 to 1.66 among the 19 SNOMED CT hierarchies. Preliminary manual inspection of a limited portion of the over 544k non-lattice pairs, among over 356 million candidate pairs, revealed inconsistent use of precoordination in SNOMED CT, but also a number of false positives. Our results are consistent with those based on FCA, with the advantage that the LaSA pipeline is scalable and applicable to ontological systems consisting mostly of taxonomic links.
Using Semantic-Web specifications to represent temporal information in clinical narratives is an important step for temporal reasoning and answering time-oriented queries. Existing temporal models are either not compatible with the powerful reasoning tools developed for the Semantic Web, or designed only for structured clinical data and therefore are not ready to be applied on natural-language-based clinical narrative reports directly. We have developed a Semantic-Web ontology which is called Clinical Narrative Temporal Relation ontology. Using this ontology, temporal information in clinical narratives can be represented as RDF (Resource Description Framework) triples. More temporal information and relations can then be inferred by Semantic-Web based reasoning tools. Experimental results show that this ontology can represent temporal information in real clinical narratives successfully.
This paper addresses a family of issues surrounding the biological phenomenon of resistance and its representation in realist ontologies. The treatments of resistance terms in various existing ontologies are examined and found to be either overly narrow, internally inconsistent, or otherwise problematic. We propose a more coherent characterization of resistance in terms of what we shall call blocking dispositions, which are collections of mutually coordinated dispositions which are of such a sort that they cannot undergo simultaneous realization within a single bearer. A definition of ‘protective resistance’ is proposed for use in the Infectious Disease Ontology (IDO) and we show how this definition can be used to characterize the antibiotic resistance in Methicillin-Resistant Staphylococcus aureus (MRSa). The ontological relations between entities in our MRSa case study are used alongside a series of logical inference rules to illustrate logical reasoning about resistance. A description logic representation of blocking dispositions is also provided. We demonstrate that our characterization of resistance is sufficiently general to cover two other cases of resistance in the infectious disease domain involving HIV and malaria.
Infectious Disease Ontology; Basic Formal Ontology; MRSa
Biomedical ontologies have become an increasingly critical lens through which researchers analyze the genomic, clinical and bibliographic data that fuels scientific research. Of particular relevance are methods, such as enrichment analysis, that quantify the importance of ontology classes relative to a collection of domain data. Current analytical techniques, however, remain limited in their ability to handle many important types of structural complexity encountered in real biological systems including class overlaps, continuously valued data, inter-instance relationships, non-hierarchical relationships between classes, semantic distance and sparse data.
In this paper, we describe a methodology called Markov Chain Ontology Analysis (MCOA) and illustrate its use through a MCOA-based enrichment analysis application based on a generative model of gene activation. MCOA models the classes in an ontology, the instances from an associated dataset and all directional inter-class, class-to-instance and inter-instance relationships as a single finite ergodic Markov chain. The adjusted transition probability matrix for this Markov chain enables the calculation of eigenvector values that quantify the importance of each ontology class relative to other classes and the associated data set members. On both controlled Gene Ontology (GO) data sets created with Escherichia coli, Drosophila melanogaster and Homo sapiens annotations and real gene expression data extracted from the Gene Expression Omnibus (GEO), the MCOA enrichment analysis approach provides the best performance of comparable state-of-the-art methods.
A methodology based on Markov chain models and network analytic metrics can help detect the relevant signal within large, highly interdependent and noisy data sets and, for applications such as enrichment analysis, has been shown to generate superior performance on both real and simulated data relative to existing state-of-the-art approaches.