The problems associated with crossing the phenotype ‘gap’ between the different ways of describing phenotypes in different genera are discussed in Schofield et al. The issues are not simply those of establishing direct mappings, though that is difficult enough; where there are large evolutionary distances between species, they also involve bridging to the most closely related phenotypes. For example, cardiac defects such as the tetralogy of Fallot are closely related in human and mouse. The syndrome is impossible in fish because of the different anatomical structure of the fish heart, yet other cardiac morphological defects may well be the consequence of dysregulation of the same morphogenetic processes in all three organisms. It is useful to include such related defects in the analysis. This requires a way to provide rich, explicit and consistent descriptions, so that automated systems are able to process and distinguish the meaning of their terms and use them to infer new information.
The problems with the use of lexical matching and the lack of a formal ontology for human phenotypes have now been resolved with the much-needed development of the HPO [25]. We now also have phenotype ontologies for the mouse, yeast, worm and fly, all of which are available from the Open Biological and Biomedical Ontology (OBO) Foundry (http://www.obofoundry.org/). Lexical cross-mapping of MPO and HPO using UMLS as a translation layer has been reported [47], but this suffers from the same problems reported by Burgun and co-workers. In recent years an approach has been developed to circumvent the species specificity of the phenotype ontologies by matching the logical definitions of classes within each ontology rather than their text. The definitions use species-agnostic ontologies such as GO to provide a common semantic level at which they can be integrated into a single framework [48], together with a post-composition strategy termed Entity–Quality (EQ).
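The difference between matching on text and matching on logical definitions can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class identifiers, labels and term names are placeholders rather than real HPO, MPO, GO or PATO entries, and `match_by_label` and `match_by_definition` are hypothetical helpers, not functions of any published tool.

```python
from typing import NamedTuple


class LogicalDef(NamedTuple):
    """A class definition built from species-agnostic terms (placeholders)."""
    entity: str   # e.g. a GO process; placeholder label, not a real ID
    quality: str  # e.g. a PATO quality; placeholder label, not a real ID


# Hypothetical classes: the labels differ between the two ontologies,
# but one pair of logical definitions points at the same shared terms.
HUMAN = {
    "HP:demo1": ("Abnormality of cardiac morphogenesis",
                 LogicalDef("GO:heart_morphogenesis", "PATO:abnormal")),
    "HP:demo2": ("Hearing impairment",
                 LogicalDef("GO:sensory_perception_of_sound", "PATO:decreased")),
}
MOUSE = {
    "MP:demo1": ("abnormal heart development",
                 LogicalDef("GO:heart_morphogenesis", "PATO:abnormal")),
    "MP:demo2": ("increased body weight",
                 LogicalDef("GO:demo_growth", "PATO:increased")),
}


def match_by_label(a, b):
    """Naive lexical matching: exact, case-insensitive label equality."""
    labels = {label.lower(): cls for cls, (label, _) in b.items()}
    return {cls: labels[label.lower()]
            for cls, (label, _) in a.items() if label.lower() in labels}


def match_by_definition(a, b):
    """Match classes whose logical definitions are identical."""
    defs = {}
    for cls, (_, definition) in b.items():
        defs.setdefault(definition, []).append(cls)
    return {cls: defs[definition]
            for cls, (_, definition) in a.items() if definition in defs}


print(match_by_label(HUMAN, MOUSE))       # {}
print(match_by_definition(HUMAN, MOUSE))  # {'HP:demo1': ['MP:demo1']}
```

In this toy data the two cardiac classes share no label text, so the lexical match finds nothing, while the shared definition links them. A real system would compare definitions with an ontology reasoner rather than by exact equality.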
Rather than using a precomposed ontology such as HPO, phenotypes may be described using the EQ formalism [49]. In the EQ method, a phenotype is characterized by an affected Entity (from an anatomy or process ontology) and a Quality [from the Phenotype and Trait Ontology (PATO)] that specifies how the entity is affected [48]. The affected entity can be either a biological function or process as specified in GO, or an anatomical entity. The ZFIN database uses the EQ method exclusively to capture phenodeviance [50], but these statements may also be used as logical definitions for classes within a phenotype ontology. While the ontologies used to write the definitions (cross-products) are largely species agnostic, such as GO, ChEBI and MPATH [51], anatomical entities are almost exclusively specified using a species-specific anatomy ontology, for example, the Foundational Model of Anatomy (FMA), the Mouse Adult Anatomy (MA) or the Zebrafish Anatomy (ZFA). To make mappings between these vertebrate anatomies, the metazoan, species-independent UBERON ontology is used in constructing anatomically based cross-products [52]. Once their classes are formally defined, phenotype ontologies may be linked through common or related cross-product definitions, and striking concordances may be discovered between the phenotypes of different species. This method was successfully exploited by Washington et al., who annotated the phenotypes of 11 gene-linked human diseases from OMIM and computationally compared these with other ontology-based phenotype descriptions from model organisms. They showed that, based on the subsumption of classes in the ontologies and the frequency of annotation, they could detect other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species through the similarity of the phenotypes, demonstrating a proof of principle for the EQ approach.
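How class subsumption and annotation frequency can combine into a similarity score may be sketched as follows. This is a generic information-content measure of the Resnik type, offered as an assumption about the general idea and not as the specific metrics used by Washington et al. The hierarchy, the annotated items and all names are invented for illustration.

```python
import math

# Toy is-a hierarchy (child -> parents); all names are hypothetical.
PARENTS = {
    "phenotype": [],
    "cardiac phenotype": ["phenotype"],
    "septal defect": ["cardiac phenotype"],
    "valve defect": ["cardiac phenotype"],
    "skeletal phenotype": ["phenotype"],
}

# Hypothetical direct annotations: gene or disease -> phenotype classes.
ANNOTATIONS = {
    "human_disease_X": {"septal defect"},
    "mouse_gene_A": {"septal defect"},
    "mouse_gene_B": {"valve defect"},
    "fish_gene_C": {"skeletal phenotype"},
}


def ancestors(term):
    """Return the term together with all of its superclasses."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content():
    """IC(t) = -log(fraction of annotated items whose annotations imply t)."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        implied = set().union(*(ancestors(t) for t in terms))
        for t in implied:
            counts[t] += 1
    n = len(ANNOTATIONS)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


def similarity(a, b, ic):
    """Highest IC among common subsumers over all pairs of annotations."""
    best = 0.0
    for ta in ANNOTATIONS[a]:
        for tb in ANNOTATIONS[b]:
            common = ancestors(ta) & ancestors(tb)
            best = max(best, max(ic[t] for t in common))
    return best


ic = information_content()
for other in ("mouse_gene_A", "mouse_gene_B", "fish_gene_C"):
    print(other, round(similarity("human_disease_X", other, ic), 3))
```

Rarely annotated classes carry more information, so two items that share a specific class score higher than two that share only a general one. In the toy data the disease is closest to the gene with the same septal defect, less close to the gene with a different cardiac defect, and unrelated to the skeletal gene.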
More recently, a whole-phenome approach to comparative phenomics was developed that exploits the full semantic content of ontologies [54]. The method requires the formalization of anatomy and phenotype ontologies so that they can be integrated using the parthood relation. This is followed by the generation of a single, unified and logically consistent representation of the phenotype data for multiple species, each annotated to its species-specific phenotype ontology, within one semantically coherent framework that is amenable to automated reasoning [55]. The unified framework, PhenomeNET, permits the use of phenotype information alone to query (‘phenomeblast’) the gathered phenotype annotations from OMIM and the mouse, zebrafish, fly, yeast and worm model organism databases. The ontology contains more than 275 000 classes and more than a million axioms, including classes for 86 203 complex phenotype annotations drawn from the model organism databases and OMIM. A great advantage is that the ontology can be regenerated to include new phenotype annotations and future developments of all of the constituent ontologies, and a tool to query the data has been made available at http://www.phenomebrowser.net/. This method is the first to allow automated reasoning over all of the phenotype ontologies and the gathered ontologies involved in the logical class definitions. It permits a simultaneous survey and computation over all of the phenotype data available from the main model organism and human databases. The network can be used to identify orthologous genes through related phenotypes, as well as genes involved in the same pathway and genes giving rise to the same disease.
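A phenotype-only query over a unified, cross-species class hierarchy might look like the following sketch. It makes several assumptions: the hierarchy is written by hand, whereas in the approach described above it would be inferred by an automated reasoner; the score is a plain Jaccard index over inferred superclasses, which is not necessarily the measure PhenomeNET uses; and all class and gene names, as well as the function `phenotype_query`, are hypothetical.

```python
# Hypothetical unified hierarchy (class -> direct superclasses):
# species-specific phenotype classes sit under shared classes.
SUPERCLASSES = {
    "shared:phenotype": set(),
    "shared:abnormal heart": {"shared:phenotype"},
    "shared:abnormal heart septum": {"shared:abnormal heart"},
    "shared:abnormal ear": {"shared:phenotype"},
    "hp:ventricular septal defect": {"shared:abnormal heart septum"},
    "mp:ventricular septal defect": {"shared:abnormal heart septum"},
    "zf:abnormal heart looping": {"shared:abnormal heart"},
    "mp:deafness": {"shared:abnormal ear"},
}

# Hypothetical annotations gathered from several model organism databases.
ANNOTATED = {
    "mouse:geneA": {"mp:ventricular septal defect"},
    "fish:geneB": {"zf:abnormal heart looping"},
    "mouse:geneC": {"mp:deafness"},
}

ROOT = "shared:phenotype"


def closure(terms):
    """All classes implied by a set of annotations, excluding the root."""
    out, stack = set(), list(terms)
    while stack:
        t = stack.pop()
        if t not in out:
            out.add(t)
            stack.extend(SUPERCLASSES[t])
    out.discard(ROOT)  # the root is shared by everything and adds no signal
    return out


def jaccard(a, b):
    """Overlap of the two inferred class sets, between 0 and 1."""
    ca, cb = closure(a), closure(b)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


def phenotype_query(query, top=None):
    """Rank annotated genes by phenotype similarity to the query."""
    scored = [(jaccard(query, terms), name)
              for name, terms in ANNOTATED.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top]


print(phenotype_query({"hp:ventricular septal defect"}))
# [(0.5, 'mouse:geneA'), (0.25, 'fish:geneB'), (0.0, 'mouse:geneC')]
```

The human query shares no class directly with either model organism, yet the shared superclasses rank the mouse septal defect first and the fish heart phenotype second. This mirrors the bridging to related phenotypes that the unified framework is meant to provide.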
Development of PhenomeNET has allowed the usefulness of the phenotype annotations in the model organism databases and OMIM to be compared. It is clear that the manual annotation of MGD represents a gold standard for literature annotation to a precomposed phenotype ontology. In contrast, the heterogeneity and, in some cases, sparseness of annotation in OMIM are problematic. Oti et al. found that the under-annotation of diseases in OMIM weakens its ability to serve as a resource for identifying animal models of OMIM diseases, and this observation could be confirmed with PhenomeNET. The use of annotations in Orphanet, and particularly the possibility of incorporating frequency data into the ‘phenomeblast’, are important areas of research for extending automated cross-species analyses of phenotype information.