A novel method for quantifying the similarity between phenotypes by the use of ontologies can be used to search for candidate genes, pathway members, and human disease models on the basis of phenotypes alone.
Scientists and clinicians who study genetic alterations and disease have traditionally described phenotypes in natural language. The considerable variation in these free-text descriptions has posed a hindrance to the important task of identifying candidate genes and models for human diseases and indicates the need for a computationally tractable method to mine data resources for mutant phenotypes. In this study, we tested the hypothesis that ontological annotation of disease phenotypes will facilitate the discovery of new genotype-phenotype relationships within and across species. To describe phenotypes using ontologies, we used an Entity-Quality (EQ) methodology, wherein the affected entity (E) and how it is affected (Q) are recorded using terms from a variety of ontologies. Using this EQ method, we annotated the phenotypes of 11 gene-linked human diseases described in Online Mendelian Inheritance in Man (OMIM). These human annotations were loaded into our Ontology-Based Database (OBD) along with other ontology-based phenotype descriptions of mutants from various model organism databases. Phenotypes recorded with this EQ method can be computationally compared based on the hierarchy of terms in the ontologies and the frequency of annotation. We utilized four similarity metrics to compare phenotypes and developed an ontology of homologous and analogous anatomical structures to compare phenotypes between species. Using these tools, we demonstrate that we can identify, through the similarity of the recorded phenotypes, other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species. We conclude that EQ-based annotation of phenotypes, in conjunction with a cross-species ontology, and a variety of similarity metrics can identify biologically meaningful similarities between genes by comparing phenotypes alone. This annotation and search method provides a novel and efficient means to identify gene candidates and animal models of human disease, which may shorten the lengthy path to identification and understanding of the genetic basis of human disease.
Model organisms such as fruit flies, mice, and zebrafish are useful for investigating gene function because they are easy to grow, dissect, and genetically manipulate in the laboratory. By examining mutations in these organisms, one can identify candidate genes that cause disease in humans, and develop models to better understand human disease and gene function. A fundamental roadblock for analysis is, however, the lack of a computational method for describing and comparing phenotypes of mutant animals and of human diseases when the genetic basis is unknown. We describe here a novel method using ontologies to record and quantify the similarity between phenotypes. We tested our method by using the annotated mutant phenotype of one member of the Hedgehog signaling pathway in zebrafish to identify other pathway members with similar recorded phenotypes. We also compared human disease phenotypes to those produced by mutation in model organisms, and show that orthologous and biologically relevant genes can be identified by this method. Given that the genetic basis of human disease is often unknown, this method provides a means for identifying candidate genes, pathway members, and disease models by computationally identifying similar phenotypes within and across species.