Phenotypes are the observable characteristics of an organism arising from its response to the environment. Phenotypes associated with engineered and natural genetic variation are widely recorded using phenotype ontologies in model organisms, as are signs and symptoms of human Mendelian diseases in databases such as OMIM and Orphanet. Exploiting these resources, several computational methods have been developed for integration and analysis of phenotype data to identify the genetic etiology of diseases or suggest plausible interventions. A similar resource would be highly useful not only for rare and Mendelian diseases, but also for common, complex and infectious diseases. We apply a semantic text-mining approach to identify the phenotypes (signs and symptoms) associated with over 6,000 diseases. We evaluate our text-mined phenotypes by demonstrating that they can correctly identify known disease-associated genes in mice and humans with high accuracy. Using a phenotypic similarity measure, we generate a human disease network in which diseases that have similar signs and symptoms cluster together, and we use this network to identify closely related diseases based on common etiological, anatomical as well as physiological underpinnings.
The use of model organisms to provide information on gene function has proved to be a powerful approach to our understanding of both human disease and fundamental mammalian biology. Large-scale community projects using mice, based on forward and reverse genetics, and now the pan-genomic phenotyping efforts of the International Mouse Phenotyping Consortium (IMPC), are generating resources on an unprecedented scale which will be extremely valuable to human genetics and medicine. We discuss the nature and availability of data, mice and ES cells from these large-scale programmes, the use of these resources to help prioritise and validate candidate genes in human genetic association studies, and how they can improve our understanding of the underlying pathobiology of human disease.
mouse; genetics; phenotyping; human; ontology; GWAS; CNV; database
Semantic similarity measures over phenotype ontologies have been demonstrated to provide a powerful approach for the analysis of model organism phenotypes, the discovery of animal models of human disease, novel pathways, gene functions, druggable therapeutic targets, and determination of pathogenicity.
We have developed PhenomeNET 2, a system that enables similarity-based searches over a large repository of phenotypes in real-time. It can be used to identify strains of model organisms that are phenotypically similar to human patients, diseases that are phenotypically similar to model organism phenotypes, or drug effect profiles that are similar to the phenotypes observed in a patient or model organism. PhenomeNET 2 is available at http://aber-owl.net/phenomenet.
Phenotype-similarity searches can provide a powerful tool for the discovery and investigation of molecular mechanisms underlying an observed phenotypic manifestation. PhenomeNET 2 facilitates user-defined similarity searches and allows researchers to analyze their data within a large repository of human, mouse and rat phenotypes.
Phenotype; Semantic similarity; Ontology
Many ontologies have been developed in biology and these ontologies increasingly contain large volumes of formalized knowledge commonly expressed in the Web Ontology Language (OWL). Computational access to the knowledge contained within these ontologies relies on the use of automated reasoning.
We have developed the Aber-OWL infrastructure that provides reasoning services for bio-ontologies. Aber-OWL consists of an ontology repository, a set of web services and web interfaces that enable ontology-based semantic access to biological data and literature. Aber-OWL is freely available at http://aber-owl.net.
Aber-OWL provides a framework for automatically accessing information that is annotated with ontologies or contains terms used to label classes in ontologies. When using Aber-OWL, access to ontologies and data annotated with them is not merely based on class names or identifiers but rather on the knowledge the ontologies contain and the inferences that can be drawn from it.
Ontology-based data access; Linked data; OWL
The systematic investigation of the phenotypes associated with genotypes in model organisms holds the promise of revealing genotype–phenotype relations directly and without additional, intermediate inferences. Large-scale projects are now underway to catalog the complete phenome of a species, notably the mouse. With the increasing amount of phenotype information becoming available, a major challenge that biology faces today is the systematic analysis of this information and the translation of research results across species and into an improved understanding of human disease. The challenge is to integrate and combine phenotype descriptions within a species and to systematically relate them to phenotype descriptions in other species, in order to form a comprehensive understanding of the relations between those phenotypes and the genotypes involved in human disease. We distinguish between two major approaches for comparative phenotype analyses: the first relies on evolutionary relations to bridge the species gap, while the other approach compares phenotypes directly. In particular, the direct comparison of phenotypes relies heavily on the quality and coherence of phenotype and disease databases. We discuss major achievements and future challenges for these databases in light of their potential to contribute to the understanding of the molecular mechanisms underlying human disease. In particular, we discuss how the use of ontologies and automated reasoning can significantly contribute to the analysis of phenotypes and demonstrate their potential for enabling translational research.
phenotype; animal model; disease; database; comparative phenomics; ontology
The 2013 Rostock Symposium on Systems Biology and Bioinformatics in Aging Research was again dedicated to dissecting the aging process using in silico means. A particular focus was on ontologies, because these are a key technology to systematically integrate heterogeneous information about the aging process. Related topics were databases and data integration. Other talks tackled modeling issues and applications, the latter including talks focused on marker development and cellular stress as well as on diseases, in particular on diseases of kidney and skin.
Recent data from genome-wide chromosome conformation capture analysis indicate that the human genome is divided into conserved megabase-sized self-interacting regions called topological domains. These topological domains form the regulatory backbone of the genome and are separated by regulatory boundary elements or barriers. Copy-number variations can potentially alter the topological domain architecture by deleting or duplicating the barriers and thereby allowing enhancers from neighboring domains to ectopically activate genes causing misexpression and disease, a mutational mechanism that has recently been termed enhancer adoption.
We use the Human Phenotype Ontology database to relate the phenotypes of 922 deletion cases recorded in the DECIPHER database to monogenic diseases associated with genes in or adjacent to the deletions. We identify combinations of tissue-specific enhancers and genes adjacent to the deletion and associated with phenotypes in the corresponding tissue, whereby the phenotype matched that observed in the deletion. We compare this computationally with a gene-dosage pathomechanism that attempts to explain the deletion phenotype based on haploinsufficiency of genes located within the deletions. Up to 11.8% of the deletions could be best explained by enhancer adoption or a combination of enhancer adoption and gene-dosage effects.
Our results suggest that enhancer adoption caused by deletions of regulatory boundaries may contribute to a substantial minority of copy-number variation phenotypes and should thus be taken into account in their medical interpretation.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0423-1) contains supplementary material, which is available to authorized users.
A forum of the Human Variome Project (HVP) was held as a satellite to the 2012 Annual Meeting of the American Society of Human Genetics in San Francisco, California. The theme of this meeting was “Getting Ready for the Human Phenome Project.” Understanding the genetic contribution to both rare single-gene “Mendelian” disorders and more complex common diseases will require integration of research efforts among many fields and better defined phenotypes. The HVP is dedicated to bringing together researchers and research populations throughout the world to provide the resources to investigate the impact of genetic variation on disease. To this end, there needs to be a greater sharing of phenotype and genotype data. For this to occur, many databases that currently exist will need to become interoperable to allow for the combining of cohorts with similar phenotypes to increase statistical power for studies attempting to identify novel disease genes or causative genetic variants. Improved systems and tools that enhance the collection of phenotype data from clinicians are urgently needed. This meeting begins the HVP’s effort toward this important goal.
meeting report; database; phenotype; database interoperability
Traditionally, biomarkers of aging are classified as either pro-longevity or antilongevity. Using longitudinal data sets from the large-scale inbred mouse strain study at the Jackson Laboratory Nathan Shock Center, we describe a protocol to identify two kinds of biomarkers: those with prognostic implication for lifespan and those with longitudinal evidence. Our protocol also identifies biomarkers for which, at first sight, there is conflicting evidence. Conflict resolution is possible by postulating a role switch. In these cases, high biomarker values are, for example, antilongevity in early life and pro-longevity in later life. Role-switching biomarkers correspond to features that must, for example, be minimized early, but maximized later, for optimal longevity. The clear-cut pro-longevity biomarkers we found reflect anti-inflammatory, anti-immunosenescent or anti-anaemic mechanisms, whereas clear-cut antilongevity biomarkers reflect inflammatory mechanisms. Many highly significant blood biomarkers relate to immune system features, indicating a shift from adaptive to innate processes, whereas most role-switching biomarkers relate to blood serum features and whole-body phenotypes. Our biomarker classification approach is applicable to any combination of longitudinal studies with life expectancy data, and it provides insights beyond a simplified scheme of biomarkers for long or short lifespan.
aging; anti-aging; inflammation; lifespan; longevity; mice; senescence
Following the technological advances that have enabled genome-wide analysis in most model organisms over the last decade, there has been unprecedented growth in genomic and post-genomic science with concomitant generation of an exponentially increasing volume of data and material resources. As a result, numerous repositories have been created to store and archive data, organisms and material, which are of substantial value to the whole community. Sustained access, facilitating re-use of these resources, is essential, not only for validation, but for re-analysis, testing of new hypotheses and developing new technologies/platforms. A common challenge for most data resources and biological repositories today is finding financial support for maintenance and development to best serve the scientific community. In this study we examine the problems that currently confront the data and resource infrastructure underlying the biomedical sciences. We discuss the financial sustainability issues and potential business models that could be adopted by biological resources and consider long term preservation issues within the context of mouse functional genomics efforts in Europe.
Motivation: Methods for computational drug target identification use information from diverse information sources to predict or prioritize drug targets for known drugs. One set of resources that has been relatively neglected for drug repurposing is animal model phenotype.
Results: We investigate the use of mouse model phenotypes for drug target identification. To achieve this goal, we first integrate mouse model phenotypes and drug effects, and then systematically compare the phenotypic similarity between mouse models and drug effect profiles. We find a high similarity between phenotypes resulting from loss-of-function mutations and drug effects resulting from the inhibition of a protein through a drug action, and demonstrate how this approach can be used to suggest candidate drug targets.
Availability and implementation: Analysis code and supplementary data files are available on the project Web site at https://drugeffects.googlecode.com.
email@example.com or firstname.lastname@example.org
Supplementary data are available at Bioinformatics online.
A major aim of the biological sciences is to gain an understanding of human physiology and disease. One important step towards such a goal is the discovery of the function of genes that will lead to better understanding of the physiology and pathophysiology of organisms ultimately providing better understanding, diagnosis, and therapy. Our increasing ability to phenotypically characterise genetic variants of model organisms coupled with systematic and hypothesis-driven mutagenesis is resulting in a wealth of information that could potentially provide insight to the functions of all genes in an organism. The challenge we are now facing is to develop computational methods that can integrate and analyse such data. The introduction of formal ontologies that make their semantics explicit and accessible to automated reasoning promises the tantalizing possibility of standardizing biomedical knowledge allowing for novel, powerful queries that bridge multiple domains, disciplines, species and levels of granularity. We review recent computational approaches that facilitate the integration of experimental data from model organisms with clinical observations in humans. These methods foster novel cross species analysis approaches, thereby enabling comparative phenomics and leading to the potential of translating basic discoveries from the model systems into diagnostic and therapeutic advances at the clinical level.
The capture and use of disease-related anatomic pathology data for both model organism phenotyping and human clinical practice requires a relatively simple nomenclature and coding system that can be integrated into data collection platforms (such as computerized medical record-keeping systems) to enable the pathologist to rapidly screen and accurately record observations. The MPATH ontology was originally constructed in 2,000 by a committee of pathologists for the annotation of rodent histopathology images, but is now widely used for coding and analysis of disease and phenotype data for rodents, humans and zebrafish.
Construction and content
MPATH is divided into two main branches describing pathological processes and structures based on traditional histopathological principles. It does not aim to include definitive diagnoses, which would generally be regarded as disease concepts. It contains 888 core pathology terms in an almost exclusively is_a hierarchy nine layers deep. Currently, 86% of the terms have textual definitions and contain relationships as well as logical axioms to other ontologies such the Gene Ontology.
Application and utility
MPATH was originally devised for the annotation of histopathological images from mice but is now being used much more widely in the recording of diagnostic and phenotypic data from both mice and humans, and in the construction of logical definitions for phenotype and disease ontologies. We discuss the use of MPATH to generate cross-products with qualifiers derived from a subset of the Phenotype and Trait Ontology (PATO) and its application to large-scale high-throughput phenotyping studies. MPATH provides a largely species-agnostic ontology for the descriptions of anatomic pathology, which can be applied to most amniotes and is now finding extensive use in species other than mice. It enables investigators to interrogate large datasets at a variety of depths, use semantic analysis to identify the relations between diseases in different species and integrate pathology data with other data types, such as pharmacogenomics.
Pathology; Ontology; Disease; Mouse; Phenotype
Mouse phenotype data represents a valuable resource for the identification of disease-associated genes, especially where the molecular basis is unknown and there is no clue to the candidate gene’s function, pathway involvement or expression pattern. However, until recently these data have not been systematically used due to difficulties in mapping between clinical features observed in humans and mouse phenotype annotations. Here, we describe a semantic approach to solve this problem and demonstrate highly significant recall of known disease-gene associations and orthology relationships. A web application (MouseFinder; www.mousemodels.org) has been developed to allow users to search the results of our whole-phenome comparison of human and mouse. We demonstrate its use in identifying ARTN as a strong candidate gene within the 1p34.1-p32 mapped locus for a hereditary form of ptosis.
phenotype; candidate disease genes; model organism; mouse
High-throughput phenotyping projects in model organisms have the potential to improve our understanding of gene functions and their role in living organisms. We have developed a computational, knowledge-based approach to automatically infer gene functions from phenotypic manifestations and applied this approach to yeast (Saccharomyces cerevisiae), nematode worm (Caenorhabditis elegans), zebrafish (Danio rerio), fruitfly (Drosophila melanogaster) and mouse (Mus musculus) phenotypes. Our approach is based on the assumption that, if a mutation in a gene leads to a phenotypic abnormality in a process , then must have been involved in , either directly or indirectly. We systematically analyze recorded phenotypes in animal models using the formal definitions created for phenotype ontologies. We evaluate the validity of the inferred functions manually and by demonstrating a significant improvement in predicting genetic interactions and protein-protein interactions based on functional similarity. Our knowledge-based approach is generally applicable to phenotypes recorded in model organism databases, including phenotypes from large-scale, high throughput community projects whose primary mode of dissemination is direct publication on-line rather than in the literature.
PhenomeNet is an approach for integrating phenotypes across species and identifying candidate genes for genetic diseases based on the similarity between a disease and animal model phenotypes. In contrast to ‘guilt-by-association’ approaches, PhenomeNet relies exclusively on the comparison of phenotypes to suggest candidate genes, and can, therefore, be applied to study the molecular basis of rare and orphan diseases for which the molecular basis is unknown. In addition to disease phenotypes from the Online Mendelian Inheritance in Man (OMIM) database, we have now integrated the clinical signs from Orphanet into PhenomeNet. We demonstrate that our approach can efficiently identify known candidate genes for genetic diseases in Orphanet and OMIM. Furthermore, we find evidence that mutations in the HIP1 gene might cause Bassoe syndrome, a rare disorder with unknown genetic aetiology. Our results demonstrate that integration and computational analysis of human disease and animal model phenotypes using PhenomeNet has the potential to reveal novel insights into the pathobiology underlying genetic diseases.
phenotype; animal model; rare disease; orphan disease; Orphanet; biomedical informatics
Numerous disease syndromes are associated with regions of copy number variation (CNV) in the human genome and, in most cases, the pathogenicity of the CNV is thought to be related to altered dosage of the genes contained within the affected segment. However, establishing the contribution of individual genes to the overall pathogenicity of CNV syndromes is difficult and often relies on the identification of potential candidates through manual searches of the literature and online resources. We describe here the development of a computational framework to comprehensively search phenotypic information from model organisms and single-gene human hereditary disorders, and thus speed the interpretation of the complex phenotypes of CNV disorders. There are currently more than 5000 human genes about which nothing is known phenotypically but for which detailed phenotypic information for the mouse and/or zebrafish orthologs is available. Here, we present an ontology-based approach to identify similarities between human disease manifestations and the mutational phenotypes in characterized model organism genes; this approach can therefore be used even in cases where there is little or no information about the function of the human genes. We applied this algorithm to detect candidate genes for 27 recurrent CNV disorders and identified 802 gene-phenotype associations, approximately half of which involved genes that were previously reported to be associated with individual phenotypic features and half of which were novel candidates. A total of 431 associations were made solely on the basis of model organism phenotype data. Additionally, we observed a striking, statistically significant tendency for individual disease phenotypes to be associated with multiple genes located within a single CNV region, a phenomenon that we denote as pheno-clustering. Many of the clusters also display statistically significant similarities in protein function or vicinity within the protein-protein interaction network. Our results provide a basis for understanding previously un-interpretable genotype-phenotype correlations in pathogenic CNVs and for mobilizing the large amount of model organism phenotype data to provide insights into human genetic disorders.
Units are basic scientific tools that render meaning to numerical data. Their standardization and formalization caters for the report, exchange, process, reproducibility and integration of quantitative measurements. Ontologies are means that facilitate the integration of data and knowledge allowing interoperability and semantic information processing between diverse biomedical resources and domains. Here, we present the Units Ontology (UO), an ontology currently being used in many scientific resources for the standardized description of units of measurements.
Expertise in the pathology of mice has expanded from traditional regulatory and drug safety screening (toxicologic pathology), primarily performed by veterinary pathologists to the highly specialized area of mouse research pathobiology performed by veterinary and medical pathologists encompassing phenotyping of mutant mice and analysis of research experiments exploiting inbred mouse strains and genetically engineered lines. With increasing use of genetically modified mice in research, mouse pathobiology and, by extension, expert mouse research-oriented pathologists have become integral to the success of basic and translational biomedical research. Training for today’s research-oriented mouse pathologist must go beyond knowledge of anatomic features of mice and strain-specific background diseases to the specialized genetic nomenclature, husbandry, and genetics, including the methodology of genetic engineering and complex trait analysis. While training can be accomplished through “apprenticeships” in formal programs, these are often heavily service-related and do not provide the necessary comprehensive training. Specialty courses and short term mentoring with expert specialists are opportunities that, when combined with active practice and publication, will lead to acquisition of the skills required for cutting-edge mouse-based experimental science.
Recent advances in gene knockout techniques and the in vivo analysis of mutant mice, together with the advent of large-scale projects for systematic mouse mutagenesis and genome-wide phenotyping, have allowed the creation of platforms for the most complete and systematic analysis of gene function ever undertaken in a vertebrate. The development of high-throughput phenotyping pipelines for these and other large-scale projects allows investigators to search and integrate large amounts of directly comparable phenotype data from many mutants, on a genomic scale, to help develop and test new hypotheses about the origins of disease and the normal functions of genes in the organism. Histopathology has a venerable history in the understanding of the pathobiology of human and animal disease, and presents complementary advantages and challenges to in vivo phenotyping. In this review, we present evidence for the unique contribution that histopathology can make to a large-scale phenotyping effort, using examples from past and current programmes at Lexicon Pharmaceuticals and The Jackson Laboratory, and critically assess the role of histopathology analysis in high-throughput phenotyping pipelines.
Pathbase, the database of images of mouse histopathology, was developed as a resource to provide free access to representative images of lesions in background and mutant strains of laboratory mice. When utilized in conjunction with diagnostic workups or phenotyping of mutant mice, it can provide a “virtual second opinion” for those working without access to groups of experienced pathologists. This is a community resource, and facilitates the sharing of expertise and data between members of the pathology community worldwide. The MPATH pathology ontology was originally developed alongside Pathbase for the annotation of images. This now represents an important resource for the coding of diagnoses which permits sophisticated data retrieval and computational analysis of mouse phenotypes. The structure and use of MPATH is discussed together with current and future challenges for the coding of mutant mouse phenotypes.
mouse pathology; database; ontology; MPATH; mouse anatomy; phenotyping
The use of standard nomenclatures for describing the strains, genes, and proteins of species is vital for the interpretation, archiving, analysis, and recovery of experimental data on the laboratory mouse. At a time when sharing of data and meta- analysis of experimental results is becoming a dominant mode of scientific investigation, failure to respect formal nomenclatures can cause confusion, errors, and in some cases contribute to poor science. Here we present the basic nomenclature rules for laboratory mice and explain how these rules should be applied to complex genetic manipulations and crosses.
Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate data- and knowledge bases. Formal ontologies make the semantics of terms and relations explicit such that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies and identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at http://bioonto.de/pmwiki.php/Main/ReasonableOntologies.