1.  Finding Our Way through Phenotypes 
Deans, Andrew R. | Lewis, Suzanna E. | Huala, Eva | Anzaldo, Salvatore S. | Ashburner, Michael | Balhoff, James P. | Blackburn, David C. | Blake, Judith A. | Burleigh, J. Gordon | Chanet, Bruno | Cooper, Laurel D. | Courtot, Mélanie | Csösz, Sándor | Cui, Hong | Dahdul, Wasila | Das, Sandip | Dececchi, T. Alexander | Dettai, Agnes | Diogo, Rui | Druzinsky, Robert E. | Dumontier, Michel | Franz, Nico M. | Friedrich, Frank | Gkoutos, George V. | Haendel, Melissa | Harmon, Luke J. | Hayamizu, Terry F. | He, Yongqun | Hines, Heather M. | Ibrahim, Nizar | Jackson, Laura M. | Jaiswal, Pankaj | James-Zorn, Christina | Köhler, Sebastian | Lecointre, Guillaume | Lapp, Hilmar | Lawrence, Carolyn J. | Le Novère, Nicolas | Lundberg, John G. | Macklin, James | Mast, Austin R. | Midford, Peter E. | Mikó, István | Mungall, Christopher J. | Oellrich, Anika | Osumi-Sutherland, David | Parkinson, Helen | Ramírez, Martín J. | Richter, Stefan | Robinson, Peter N. | Ruttenberg, Alan | Schulz, Katja S. | Segerdell, Erik | Seltmann, Katja C. | Sharkey, Michael J. | Smith, Aaron D. | Smith, Barry | Specht, Chelsea D. | Squires, R. Burke | Thacker, Robert W. | Thessen, Anne | Fernandez-Triana, Jose | Vihinen, Mauno | Vize, Peter D. | Vogt, Lars | Wall, Christine E. | Walls, Ramona L. | Westerfeld, Monte | Wharton, Robert A. | Wirkner, Christian S. | Woolley, James B. | Yoder, Matthew J. | Zorn, Aaron M. | Mabee, Paula
PLoS Biology  2015;13(1):e1002033.
Imagine if we could compute across phenotype data as easily as genomic data; this article calls for efforts to realize this vision and discusses the potential benefits.
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.
doi:10.1371/journal.pbio.1002033
PMCID: PMC4285398  PMID: 25562316
3.  Deletions of chromosomal regulatory boundaries are associated with congenital disease 
Genome Biology  2014;15(9):423.
Background
Recent data from genome-wide chromosome conformation capture analysis indicate that the human genome is divided into conserved megabase-sized self-interacting regions called topological domains. These topological domains form the regulatory backbone of the genome and are separated by regulatory boundary elements or barriers. Copy-number variations can potentially alter the topological domain architecture by deleting or duplicating the barriers and thereby allowing enhancers from neighboring domains to ectopically activate genes causing misexpression and disease, a mutational mechanism that has recently been termed enhancer adoption.
Results
We use the Human Phenotype Ontology database to relate the phenotypes of 922 deletion cases recorded in the DECIPHER database to monogenic diseases associated with genes in or adjacent to the deletions. We identify combinations of tissue-specific enhancers and genes adjacent to the deletion and associated with phenotypes in the corresponding tissue, whereby the phenotype matched that observed in the deletion. We compare this computationally with a gene-dosage pathomechanism that attempts to explain the deletion phenotype based on haploinsufficiency of genes located within the deletions. Up to 11.8% of the deletions could be best explained by enhancer adoption or a combination of enhancer adoption and gene-dosage effects.
Conclusions
Our results suggest that enhancer adoption caused by deletions of regulatory boundaries may contribute to a substantial minority of copy-number variation phenotypes and should thus be taken into account in their medical interpretation.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0423-1) contains supplementary material, which is available to authorized users.
doi:10.1186/s13059-014-0423-1
PMCID: PMC4180961  PMID: 25315429
4.  Standardized description of scientific evidence using the Evidence Ontology (ECO) 
The Evidence Ontology (ECO) is a structured, controlled vocabulary for capturing evidence in biological research. ECO includes diverse terms for categorizing evidence that supports annotation assertions including experimental types, computational methods, author statements and curator inferences. Using ECO, annotation assertions can be distinguished according to the evidence they are based on such as those made by curators versus those automatically computed or those made via high-throughput data review versus single test experiments. Originally created for capturing evidence associated with Gene Ontology annotations, ECO is now used in other capacities by many additional annotation resources including UniProt, Mouse Genome Informatics, Saccharomyces Genome Database, PomBase, the Protein Information Resource and others. Information on the development and use of ECO can be found at http://evidenceontology.org. The ontology is freely available under Creative Commons license (CC BY-SA 3.0), and can be downloaded in both Open Biological Ontologies and Web Ontology Language formats at http://code.google.com/p/evidenceontology. Also at this site is a tracker for user submission of term requests and questions. ECO remains under active development in response to user-requested terms and in collaborations with other ontologies and database resources.
Database URL: Evidence Ontology Web site: http://evidenceontology.org
doi:10.1093/database/bau075
PMCID: PMC4105709  PMID: 25052702
5.  The influence of disease categories on gene candidate predictions from model organism phenotypes 
Journal of Biomedical Semantics  2014;5(Suppl 1):S4.
Background
The molecular etiology is still to be identified for about half of the currently described Mendelian diseases in humans, thereby hindering efforts to find treatments or preventive measures. Advances, such as new sequencing technologies, have led to increasing amounts of data becoming available with which to address the problem of identifying disease genes. Therefore, automated methods are needed that reliably predict disease gene candidates based on available data. We have recently developed Exomiser as a tool for identifying causative variants from exome analysis results by filtering and prioritising using a number of criteria including the phenotype similarity between the disease and mouse mutants involving the gene candidates. Initial investigations revealed a variation in performance for different medical categories of disease, due in part to a varying contribution of the phenotype scoring component.
Results
In this study, we further analyse the performance of our cross-species phenotype matching algorithm, and examine in more detail the reasons why disease gene filtering based on phenotype data works better for certain disease categories than others. We found that in addition to misleading phenotype alignments between species, some disease categories are still more amenable to automated predictions than others, and that this often ties in with community perceptions of how well the organism works as a model.
Conclusions
In conclusion, our automated disease gene candidate predictions are highly dependent on the organism used for the predictions and the disease category being studied. Future work on computational disease gene prediction using phenotype data would benefit from methods that take into account the disease category and the source of model organism data.
doi:10.1186/2041-1480-5-S1-S4
PMCID: PMC4108905  PMID: 25093073
6.  Unification of multi-species vertebrate anatomy ontologies for comparative biology in Uberon 
Background
Elucidating disease and developmental dysfunction requires understanding variation in phenotype. Single-species model organism anatomy ontologies (ssAOs) have been established to represent this variation. Multi-species anatomy ontologies (msAOs; vertebrate skeletal, vertebrate homologous, teleost, amphibian AOs) have been developed to represent ‘natural’ phenotypic variation across species. Our aim has been to integrate ssAOs and msAOs for various purposes, including establishing links between phenotypic variation and candidate genes.
Results
Previously, msAOs contained a mixture of unique and overlapping content. This hampered integration and coordination due to the need to maintain cross-references or inter-ontology equivalence axioms to the ssAOs, or to perform large-scale obsolescence and modular import. Here we present the unification of anatomy ontologies into Uberon, a single ontology resource that enables interoperability among disparate data and research groups. As a consequence, independent development of TAO, VSAO, AAO, and vHOG has been discontinued.
Conclusions
The newly broadened Uberon ontology is a unified cross-taxon resource for metazoans (animals) that has been substantially expanded to include a broad diversity of vertebrate anatomical structures, permitting reasoning across anatomical variation in extinct and extant taxa. Uberon is a core resource that supports single- and cross-species queries for candidate genes using annotations for phenotypes from the systematics, biodiversity, medical, and model organism communities, while also providing entities for logical definitions in the Cell and Gene Ontologies.
The ontology release files associated with the ontology merge described in this manuscript are available at: http://purl.obolibrary.org/obo/uberon/releases/2013-02-21/
Current ontology release files are always available at: http://purl.obolibrary.org/obo/uberon/releases/
doi:10.1186/2041-1480-5-21
PMCID: PMC4089931  PMID: 25009735
Evolutionary biology; Morphological variation; Phenotype; Semantic integration; Bio-ontology
7.  BioJS: an open source standard for biological visualisation – its status in 2014 
F1000Research  2014;3:55.
BioJS is a community-based standard and repository of functional components to represent biological information on the web. The development of BioJS has been prompted by the growing need for bioinformatics visualisation tools to be easily shared, reused and discovered. Its modular architecture makes it easy for users to find a specific functionality without needing to know how it has been built, while components can be extended or created for implementing new functionality. The BioJS community of developers currently provides a range of functionality that is open access and freely available. A registry has been set up that categorises and provides installation instructions and testing facilities at http://www.ebi.ac.uk/tools/biojs/. The source code for all components is available for ready use at https://github.com/biojs/biojs.
doi:10.12688/f1000research.3-55.v1
PMCID: PMC4103492  PMID: 25075290
8.  Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research 
F1000Research  2014;2:30.
Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species.
We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains classes from the Human Phenotype Ontology, Mammalian Phenotype Ontology, and generated classes for zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases.
This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.
doi:10.12688/f1000research.2-30.v2
PMCID: PMC3799545  PMID: 24358873
9.  The environment ontology: contextualising biological and biomedical entities 
As biological and biomedical research increasingly reference the environmental context of the biological entities under study, the need for formalisation and standardisation of environment descriptors is growing. The Environment Ontology (ENVO; http://www.environmentontology.org) is a community-led, open project which seeks to provide an ontology for specifying a wide range of environments relevant to multiple life science disciplines and, through an open participation model, to accommodate the terminological requirements of all those needing to annotate data using ontology classes. This paper summarises ENVO’s motivation, content, structure, adoption, and governance approach. The ontology is available from http://purl.obolibrary.org/obo/envo.owl - an OBO format version is also available by switching the file suffix to “obo”.
doi:10.1186/2041-1480-4-43
PMCID: PMC3904460  PMID: 24330602
Environment; Ecosystem; Biome; Ontology
10.  The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data 
Nucleic Acids Research  2013;42(Database issue):D966-D974.
The Human Phenotype Ontology (HPO) project, available at http://www.human-phenotype-ontology.org, provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online.
doi:10.1093/nar/gkt1026
PMCID: PMC3965098  PMID: 24217912
11.  Web Apollo: a web-based genomic annotation editing platform 
Genome Biology  2013;14(8):R93.
Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. One of the natural consequences following from current advances in sequencing technology is that there are more and more researchers sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world.
doi:10.1186/gb-2013-14-8-r93
PMCID: PMC4053811  PMID: 24000942
Genome; Collaborative; Editor
12.  PhenoDigm: analyzing curated annotations to associate animal models with human diseases 
The ultimate goal of studying model organisms is to translate what is learned into useful knowledge about normal human biology and disease to facilitate treatment and early screening for diseases. Recent advances in genomic technologies allow for rapid generation of models with a range of targeted genotypes as well as their characterization by high-throughput phenotyping. As an abundance of phenotype data becomes available, only systematic analysis will allow valid conclusions to be drawn from these data and transferred to human diseases. Owing to the volume of data, automated methods are preferable, allowing for a reliable analysis of the data and providing evidence about possible gene–disease associations.
Here, we propose Phenotype comparisons for DIsease Genes and Models (PhenoDigm), as an automated method to provide evidence about gene–disease associations by analysing phenotype information. PhenoDigm integrates data from a variety of model organisms and, at the same time, uses several intermediate scoring methods to identify only strongly data-supported gene candidates for human genetic diseases. We show results of an automated evaluation as well as selected manually assessed examples that support the validity of PhenoDigm. Furthermore, we provide guidance on how to browse the data with PhenoDigm’s web interface and illustrate its usefulness in supporting research.
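Ontology-based phenotype comparison of this kind is often built on information-content (IC) similarity. The sketch below is a generic illustration, not PhenoDigm's actual scoring (which the abstract says combines several intermediate methods): it assumes a hypothetical toy ontology and corpus, computes Resnik similarity (the IC of the most informative common ancestor), and ranks models against a patient profile with a symmetric best-match average.

```python
import math
from typing import Dict, Set

# Hypothetical toy phenotype ontology: term -> direct parents (not real HPO/MP terms).
PARENTS: Dict[str, Set[str]] = {
    "abnormality": set(),
    "abnormal_heart": {"abnormality"},
    "abnormal_limb": {"abnormality"},
    "septal_defect": {"abnormal_heart"},
    "valve_defect": {"abnormal_heart"},
    "short_digit": {"abnormal_limb"},
}

# Hypothetical model annotations: model -> phenotype terms.
CORPUS: Dict[str, Set[str]] = {
    "model_1": {"septal_defect"},
    "model_2": {"valve_defect"},
    "model_3": {"short_digit"},
    "model_4": {"septal_defect", "short_digit"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(corpus: Dict[str, Set[str]]) -> Dict[str, float]:
    """IC(t) = -log p(t), where p(t) is the fraction of annotated items whose
    terms include t or a descendant of t (annotations propagate upward)."""
    counts = {t: 0 for t in PARENTS}
    for terms in corpus.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    n = len(corpus)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


def resnik(a: str, b: str, ic: Dict[str, float]) -> float:
    """IC of the most informative common ancestor of a and b."""
    common = ancestors(a) & ancestors(b)
    return max((ic[t] for t in common if t in ic), default=0.0)


def best_match_average(xs: Set[str], ys: Set[str], ic: Dict[str, float]) -> float:
    """Symmetric best-match average of pairwise Resnik scores."""
    def one_way(src: Set[str], dst: Set[str]) -> float:
        return sum(max(resnik(s, d, ic) for d in dst) for s in src) / len(src)
    return (one_way(xs, ys) + one_way(ys, xs)) / 2


IC = information_content(CORPUS)
patient = {"septal_defect"}
for model, terms in sorted(CORPUS.items()):
    print(model, round(best_match_average(patient, terms, IC), 3))
```

Because the root term is carried by every model, it has an IC of zero, so a patient with a heart defect scores higher against heart-defect models than against a limb-only model.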
Database URL: http://www.sanger.ac.uk/resources/databases/phenodigm
doi:10.1093/database/bat025
PMCID: PMC3649640  PMID: 23660285
13.  MouseFinder: candidate disease genes from mouse phenotype data 
Human Mutation  2012;33(5):858-866.
Mouse phenotype data represents a valuable resource for the identification of disease-associated genes, especially where the molecular basis is unknown and there is no clue to the candidate gene’s function, pathway involvement or expression pattern. However, until recently these data have not been systematically used due to difficulties in mapping between clinical features observed in humans and mouse phenotype annotations. Here, we describe a semantic approach to solve this problem and demonstrate highly significant recall of known disease-gene associations and orthology relationships. A web application (MouseFinder; www.mousemodels.org) has been developed to allow users to search the results of our whole-phenome comparison of human and mouse. We demonstrate its use in identifying ARTN as a strong candidate gene within the 1p34.1-p32 mapped locus for a hereditary form of ptosis.
doi:10.1002/humu.22051
PMCID: PMC3327758  PMID: 22331800
phenotype; candidate disease genes; model organism; mouse
14.  Phenotypic overlap in the contribution of individual genes to CNV pathogenicity revealed by cross-species computational analysis of single-gene mutations in humans, mice and zebrafish 
Disease Models & Mechanisms  2012;6(2):358-372.
SUMMARY
Numerous disease syndromes are associated with regions of copy number variation (CNV) in the human genome and, in most cases, the pathogenicity of the CNV is thought to be related to altered dosage of the genes contained within the affected segment. However, establishing the contribution of individual genes to the overall pathogenicity of CNV syndromes is difficult and often relies on the identification of potential candidates through manual searches of the literature and online resources. We describe here the development of a computational framework to comprehensively search phenotypic information from model organisms and single-gene human hereditary disorders, and thus speed the interpretation of the complex phenotypes of CNV disorders. There are currently more than 5000 human genes about which nothing is known phenotypically but for which detailed phenotypic information for the mouse and/or zebrafish orthologs is available. Here, we present an ontology-based approach to identify similarities between human disease manifestations and the mutational phenotypes in characterized model organism genes; this approach can therefore be used even in cases where there is little or no information about the function of the human genes. We applied this algorithm to detect candidate genes for 27 recurrent CNV disorders and identified 802 gene-phenotype associations, approximately half of which involved genes that were previously reported to be associated with individual phenotypic features and half of which were novel candidates. A total of 431 associations were made solely on the basis of model organism phenotype data. Additionally, we observed a striking, statistically significant tendency for individual disease phenotypes to be associated with multiple genes located within a single CNV region, a phenomenon that we denote as pheno-clustering. 
Many of the clusters also display statistically significant similarities in protein function or vicinity within the protein-protein interaction network. Our results provide a basis for understanding previously uninterpretable genotype-phenotype correlations in pathogenic CNVs and for mobilizing the large amount of model organism phenotype data to provide insights into human genetic disorders.
doi:10.1242/dmm.010322
PMCID: PMC3597018  PMID: 23104991
15.  Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research 
F1000Research  2013;2:30.
Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species.
We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases.
This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.
doi:10.12688/f1000research.2-30.v1
PMCID: PMC3799545  PMID: 24358873
16.  A knowledge based approach to matching human neurodegenerative disease and animal models 
Neurodegenerative diseases present a wide and complex range of biological and clinical features. Animal models are key to translational research, yet typically only exhibit a subset of disease features rather than being precise replicas of the disease. Consequently, connecting animal to human conditions using direct data-mining strategies has proven challenging, particularly for diseases of the nervous system, with its complicated anatomy and physiology. To address this challenge we have explored the use of ontologies to create formal descriptions of structural phenotypes across scales that are machine processable and amenable to logical inference. As proof of concept, we built a Neurodegenerative Disease Phenotype Ontology (NDPO) and an associated Phenotype Knowledge Base (PKB) using an entity-quality model that incorporates descriptions for both human disease phenotypes and those of animal models. Entities are drawn from community ontologies made available through the Neuroscience Information Framework (NIF) and qualities are drawn from the Phenotype and Trait Ontology (PATO). We generated ~1200 structured phenotype statements describing structural alterations at the subcellular, cellular and gross anatomical levels observed in 11 human neurodegenerative conditions and associated animal models. PhenoSim, an open source tool for comparing phenotypes, was used to issue a series of competency questions to compare individual phenotypes among organisms and to determine which animal models recapitulate phenotypic aspects of the human disease in aggregate. Overall, the system was able to use relationships within the ontology to bridge phenotypes across scales, returning non-trivial matches based on common subsumers that were meaningful to a neuroscientist with an advanced knowledge of neuroanatomy. The system can be used both to compare individual phenotypes and also phenotypes in aggregate. 
This proof of concept suggests that expressing complex phenotypes using formal ontologies provides considerable benefit for comparing phenotypes across scales and species.
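The idea of bridging entity-quality (EQ) phenotypes through common subsumers can be illustrated with a small sketch. It is not PhenoSim and does not perform OWL reasoning; it assumes hypothetical mini-hierarchies with illustrative labels (not actual NIF or PATO classes) and generalizes two EQ statements component-wise to the least common subsumers of their entities and of their qualities.

```python
from typing import Dict, NamedTuple, Set

# Hypothetical mini-hierarchies (term -> direct parents). Labels are illustrative
# only; they are not the actual NIF or PATO classes or identifiers.
ENTITY_PARENTS: Dict[str, Set[str]] = {
    "anatomical entity": set(),
    "neuron": {"anatomical entity"},
    "Purkinje cell": {"neuron"},
    "motor neuron": {"neuron"},
    "dendrite": {"anatomical entity"},
}
QUALITY_PARENTS: Dict[str, Set[str]] = {
    "quality": set(),
    "morphology": {"quality"},
    "degenerate": {"morphology"},
    "atrophied": {"morphology"},
    "swollen": {"morphology"},
}


class EQ(NamedTuple):
    """An entity-quality phenotype statement, e.g. (Purkinje cell, degenerate)."""
    entity: str
    quality: str


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def least_common_subsumers(a: str, b: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {t for t in common
            if not any(u != t and t in ancestors(u, parents) for u in common)}


def match(human: EQ, model: EQ) -> Set[EQ]:
    """Generalize two EQ statements component-wise to their shared subsumers."""
    return {EQ(e, q)
            for e in least_common_subsumers(human.entity, model.entity, ENTITY_PARENTS)
            for q in least_common_subsumers(human.quality, model.quality, QUALITY_PARENTS)}


print(match(EQ("Purkinje cell", "degenerate"), EQ("motor neuron", "degenerate")))
```

Degenerating Purkinje cells and degenerating motor neurons meet at "degenerate neuron", a specific and informative match; a distant pair such as atrophied Purkinje cells versus swollen dendrites only meets at general classes, which a scoring scheme would treat as a weak match.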
doi:10.3389/fninf.2013.00007
PMCID: PMC3653101  PMID: 23717278
phenotype; ontology; Neuroscience Information Framework; neurodegenerative disease; semantics
17.  Invariance (?) of Mutational Parameters for Relative Fitness Over 400 Generations of Mutation Accumulation in Caenorhabditis elegans 
G3: Genes|Genomes|Genetics  2012;2(12):1497-1503.
Evidence is accumulating that individuals in poor physiologic condition may accumulate mutational damage faster than individuals in good condition. If poor condition results from pre-existing deleterious mutations, the result is “fitness-dependent mutation rate,” which has interesting theoretical implications. Here we report a study in which 10 mutation accumulation (MA) lines of the nematode Caenorhabditis elegans that had previously accumulated mutations for 250 generations under relaxed selection were expanded into sets of “second-order” MA lines and allowed to accumulate mutations for an additional 150 generations. The 10 lines were chosen on the basis of the relative change in fitness over the first 250 generations of MA: five high-fitness lines and five low-fitness lines. On average, the mutational properties (per-generation change in mean relative fitness, mutational variance, and Bateman-Mukai estimates of genomic mutation rate and average mutational effect) of the high-fitness and low-fitness lines did not differ significantly, and averaged over all lines, the point estimates were extremely close to those of the first-order MA experiment after 200 generations of MA. However, several nonsignificant trends indicate that low-fitness lines may in fact be more likely to suffer mutational damage than high-fitness lines.
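The Bateman-Mukai estimates mentioned above follow from a simple argument. If each line gains mutations at rate U per genome per generation (Poisson-distributed) with effects a, the per-generation decline in mean fitness is ΔM = U·E[a] and the per-generation increase in among-line variance is ΔV = U·E[a²] ≥ U·E[a]². Rearranging gives U ≥ ΔM²/ΔV and E[a] ≤ ΔV/ΔM, with equality only when all effects are equal. The sketch below shows this basic relation; published analyses may include extra constant factors (for ploidy or for how variance is scaled), so treat it as an illustration rather than the exact estimator used in the study.

```python
from typing import Tuple


def bateman_mukai(delta_m: float, delta_v: float) -> Tuple[float, float]:
    """Basic Bateman-Mukai bounds.

    delta_m: per-generation decline in mean relative fitness (positive number).
    delta_v: per-generation increase in among-line variance in fitness.
    Returns (u_min, a_max): a lower bound on the genomic mutation rate and an
    upper bound on the average mutational effect.
    """
    if delta_m <= 0 or delta_v <= 0:
        raise ValueError("delta_m and delta_v must both be positive")
    u_min = delta_m ** 2 / delta_v
    a_max = delta_v / delta_m
    return u_min, a_max


def expected_rates(u: float, mean_a: float, mean_a_sq: float) -> Tuple[float, float]:
    """Forward model: Poisson mutations at rate u with effect moments E[a], E[a^2]."""
    return u * mean_a, u * mean_a_sq


# Equal effects (E[a^2] = E[a]^2): the bounds recover the true U = 0.5 and a = 0.1.
print(bateman_mukai(*expected_rates(0.5, 0.1, 0.1 ** 2)))
# Exponential effects (E[a^2] = 2 E[a]^2): U is underestimated by half.
print(bateman_mukai(*expected_rates(0.5, 0.1, 2 * 0.1 ** 2)))
```

The second case shows why the method yields bounds rather than point values: any variation in mutational effects lowers the rate estimate and inflates the effect estimate.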
doi:10.1534/g3.112.003947
PMCID: PMC3516472  PMID: 23275873
Bateman-Mukai method; fitness-dependent mutation; genetic load; mutational variance
18.  Phenotype Ontology Research Coordination Network meeting report: creating a community network for comparing and leveraging phenotype-genotype knowledge across species 
Standards in Genomic Sciences  2012;6(3):440-443.
Representing phenotype in a way that can be linked to thousands of molecular genetic and environmental databases is an unresolved research challenge. A recent meeting of the Phenotype Research Coordination Network (RCN) aimed to coordinate and leverage current efforts. The three-day summit meeting was hosted by NESCent (The National Evolutionary Synthesis Center) in Durham, North Carolina, from the 23rd to the 25th of February, 2012.
doi:10.4056/sigs.2926219
PMCID: PMC3558964  PMID: 23409218
19.  Entity/Quality-Based Logical Definitions for the Human Skeletal Phenome using PATO 
Conference Proceedings  2009;2009:7069-7072.
This paper describes an approach to providing computer-interpretable logical definitions for the terms of the Human Phenotype Ontology (HPO) using PATO, the ontology of phenotypic qualities, to link terms of the HPO to the anatomic and other entities that are affected by abnormal phenotypic qualities. This approach will allow improved computerized reasoning as well as a facility to compare phenotypes between different species. The PATO mapping will also provide direct links between phenotypic abnormalities and underlying anatomic structures encoded using the Foundational Model of Anatomy, which will be a valuable resource for computational investigations of the links between anatomical components and concepts representing diseases with abnormal phenotypes and associated genes.
doi:10.1109/IEMBS.2009.5333362
PMCID: PMC3398700  PMID: 19964203
20.  On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report 
PLoS Computational Biology  2012;8(2):e1002386.
A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the “functional similarity” between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the “ortholog conjecture” (or, more properly, the “ortholog functional conservation hypothesis”). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches. 
Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: 1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an “open world assumption” (absence of an annotation does not imply absence of a function), and 2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the conclusions have a justifiable biological basis.
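The open-world point can be made concrete with a toy example. The sketch uses a plain Jaccard index as a stand-in for an annotation-overlap metric (it is not the metric of Nehrt et al.) and hypothetical placeholder terms rather than real GO classes: two orthologs that truly share every function can still score zero if each organism's experiments captured a different part of that function.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical placeholders, not real GO terms: suppose both orthologs truly
# share the same three functions.
true_functions = {"function_A", "function_B", "function_C"}

# Each organism's experiments happened to capture a different part of that set.
human_annotations = {"function_A"}
mouse_annotations = {"function_B", "function_C"}

print(jaccard(human_annotations, mouse_annotations))  # 0.0: observed annotations
print(jaccard(true_functions, true_functions))        # 1.0: underlying function
# Both observed sets are consistent with identical function.
print(human_annotations <= true_functions and mouse_annotations <= true_functions)  # True
```

Under the open world assumption, the zero score reflects complementary experimental coverage, not divergent function.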
Author Summary
Understanding gene function—how individual genes contribute to the biology of an organism at the molecular, cellular and organism levels—is one of the primary aims of biomedical research. It has been a longstanding tenet of model organism research that experimental knowledge obtained in one organism is often applicable to other organisms, particularly if the organisms share the relevant genes because they inherited them from their common ancestor. Nevertheless this tenet is, like any hypothesis, not beyond question. A recent paper has termed this hypothesis a “conjecture,” and performed a statistical analysis, the results of which were interpreted as evidence against the hypothesis. This statistical analysis relied on a computational representation of gene function, the Gene Ontology (GO). As representatives of the international consortium that produces the GO, we show how the apparent evidence against the “ortholog conjecture” can be better explained as an artifact of how molecular biology knowledge is accumulated. In short, a complementarity between knowledge obtained in mouse and human experimental systems was incorrectly interpreted as a disagreement. We discuss the proper interpretation of GO annotations and potential sources of bias, with an eye toward enhancing the informed use of the GO by the scientific community.
doi:10.1371/journal.pcbi.1002386
PMCID: PMC3280971  PMID: 22359495
21.  Toward community standards in the quest for orthologs 
Bioinformatics  2012;28(6):900-904.
The identification of orthologs (gene pairs descended from a common ancestor through speciation rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second ‘Quest for Orthologs’ meeting brought together stakeholders from various communities to address these challenges. We report on the achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.
Contact: dessimoz@ebi.ac.uk
doi:10.1093/bioinformatics/bts050
PMCID: PMC3307119  PMID: 22332236
22.  Uberon, an integrative multi-species anatomy ontology 
Genome Biology  2012;13(1):R5.
We present Uberon, an integrated cross-species ontology consisting of over 6,500 classes representing a variety of anatomical entities, organized according to traditional anatomical classification criteria. The ontology represents structures in a species-neutral way and includes extensive associations to existing species-centric anatomical ontologies, allowing integration of model organism and human data. Uberon provides a necessary bridge between anatomical structures in different taxa for cross-species inference. It uses novel methods for representing taxonomic variation, and has proved to be essential for translational phenotype analyses. Uberon is available at http://uberon.org
doi:10.1186/gb-2012-13-1-r5
PMCID: PMC3334586  PMID: 22293552
23.  modMine: flexible access to modENCODE data 
Nucleic Acids Research  2011;40(Database issue):D1082-D1088.
In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database (http://intermine.modencode.org) described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) perform fine-grained analysis of the above data. These sophisticated search features are made possible by the extensive experimental metadata collected by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit the rich modENCODE data sets now available via modMine.
doi:10.1093/nar/gkr921
PMCID: PMC3245176  PMID: 22080565
24.  Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE 
Roy, Sushmita | Ernst, Jason | Kharchenko, Peter V. | Kheradpour, Pouya | Negre, Nicolas | Eaton, Matthew L. | Landolin, Jane M. | Bristow, Christopher A. | Ma, Lijia | Lin, Michael F. | Washietl, Stefan | Arshinoff, Bradley I. | Ay, Ferhat | Meyer, Patrick E. | Robine, Nicolas | Washington, Nicole L. | Di Stefano, Luisa | Berezikov, Eugene | Brown, Christopher D. | Candeias, Rogerio | Carlson, Joseph W. | Carr, Adrian | Jungreis, Irwin | Marbach, Daniel | Sealfon, Rachel | Tolstorukov, Michael Y. | Will, Sebastian | Alekseyenko, Artyom A. | Artieri, Carlo | Booth, Benjamin W. | Brooks, Angela N. | Dai, Qi | Davis, Carrie A. | Duff, Michael O. | Feng, Xin | Gorchakov, Andrey A. | Gu, Tingting | Henikoff, Jorja G. | Kapranov, Philipp | Li, Renhua | MacAlpine, Heather K. | Malone, John | Minoda, Aki | Nordman, Jared | Okamura, Katsutomo | Perry, Marc | Powell, Sara K. | Riddle, Nicole C. | Sakai, Akiko | Samsonova, Anastasia | Sandler, Jeremy E. | Schwartz, Yuri B. | Sher, Noa | Spokony, Rebecca | Sturgill, David | van Baren, Marijke | Wan, Kenneth H. | Yang, Li | Yu, Charles | Feingold, Elise | Good, Peter | Guyer, Mark | Lowdon, Rebecca | Ahmad, Kami | Andrews, Justen | Berger, Bonnie | Brenner, Steven E. | Brent, Michael R. | Cherbas, Lucy | Elgin, Sarah C. R. | Gingeras, Thomas R. | Grossman, Robert | Hoskins, Roger A. | Kaufman, Thomas C. | Kent, William | Kuroda, Mitzi I. | Orr-Weaver, Terry | Perrimon, Norbert | Pirrotta, Vincenzo | Posakony, James W. | Ren, Bing | Russell, Steven | Cherbas, Peter | Graveley, Brenton R. | Lewis, Suzanna | Micklem, Gos | Oliver, Brian | Park, Peter J. | Celniker, Susan E. | Henikoff, Steven | Karpen, Gary H. | Lai, Eric C. | MacAlpine, David M. | Stein, Lincoln D. | White, Kevin P. | Kellis, Manolis
Science (New York, N.Y.)  2010;330(6012):1787-1797.
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
doi:10.1126/science.1198374
PMCID: PMC3192495  PMID: 21177974
25.  Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium 
Briefings in Bioinformatics  2011;12(5):449-462.
The goal of the Gene Ontology (GO) project is to provide a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enable analysis of genomic data. Protein annotations are either based on experiments or predicted from protein sequences. Since most sequences have not been experimentally characterized, most available annotations must be based on predictions. To make these inferences as accurate as possible, the GO Consortium's Reference Genome Project is using an explicit evolutionary framework to infer, in a semi-automated manner, annotations for proteins from a broad set of genomes based on existing experimental annotations. Most components in the pipeline, such as selection of sequences, building multiple sequence alignments and phylogenetic trees, retrieving experimental annotations and depositing inferred annotations, are fully automated. However, the most crucial step in our pipeline relies on software-assisted curation by an expert biologist. This curation tool, the Phylogenetic Annotation and INference Tool (PAINT), helps curators infer annotations among members of a protein family. PAINT allows curators to make precise assertions as to when functions were gained and lost during evolution and to record the evidence (e.g. experimentally supported GO annotations and phylogenetic information including orthology) for those assertions. In this article, we describe how we use PAINT to infer protein function in a phylogenetic context, with an emphasis on its strengths, limitations and usage guidelines. We also discuss specific examples showing how PAINT annotations compare with those generated by other widely used homology-based methods.
doi:10.1093/bib/bbr042
PMCID: PMC3178059  PMID: 21873635
gene ontology; genome annotation; reference genome; gene function prediction; phylogenetics
