Search tips
Search criteria

Results 1-24 (24)

Clipboard (0)

Select a Filter Below

more »
Year of Publication
more »
1.  Model organism databases: essential resources that need the support of both funders and users 
BMC Biology  2016;14:49.
Modern biomedical research depends critically on access to databases that house and disseminate genetic, genomic, molecular, and cell biological knowledge. Even as the explosion of available genome sequences and associated genome-scale data continues apace, the sustainability of professionally maintained biological databases is under threat due to policy changes by major funding agencies. Here, we focus on model organism databases to demonstrate the myriad ways in which biological databases not only act as repositories but actively facilitate advances in research. We present data that show that reducing financial support to model organism databases could prove to be not just scientifically, but also economically, unsound.
PMCID: PMC4918006  PMID: 27334346
2.  Ontology Engineering 
Nature biotechnology  2010;28(2):128-130.
Gene Ontology1 and similar biomedical ontologies are critical tools of today genetic research. These ontologies are crafted through a painstaking process of manual editing, and their organization relies on the intuition of human curators. Here we describe a method that uses information theory to automatically organize the structure of GO and optimize the distribution of the information within it. We used this approach to analyze the evolution of GO, and we identified several areas where the information was suboptimally organized. We optimized the structure of GO and used it to analyze 10,117 gene expression signatures. The use of this new version changed the functional interpretations of 97.5% (p < 10-3) of the signatures by, on average, 14.6%. As a result of this analysis, several changes will be introduced in the next releases of GO. We expect that these formal methods will become the standard to engineer biomedical ontologies.
PMCID: PMC4829499  PMID: 20139945
3.  PomBase 2015: updates to the fission yeast database 
Nucleic Acids Research  2014;43(Database issue):D656-D661.
PomBase ( is the model organism database for the fission yeast Schizosaccharomyces pombe. PomBase provides a central hub for the fission yeast community, supporting both exploratory and hypothesis-driven research. It provides users easy access to data ranging from the sequence level, to molecular and phenotypic annotations, through to the display of genome-wide high-throughput studies. Recent improvements to the site extend annotation specificity, improve usability and allow for monthly data updates. Both in-house curators and community researchers provide manually curated data to PomBase. The genome browser provides access to published high-throughput data sets and the genomes of three additional Schizosaccharomyces species (Schizosaccharomyces cryophilus, Schizosaccharomyces japonicus and Schizosaccharomyces octosporus).
PMCID: PMC4383888  PMID: 25361970
4.  Representing Kidney Development Using the Gene Ontology 
PLoS ONE  2014;9(6):e99864.
Gene Ontology (GO) provides dynamic controlled vocabularies to aid in the description of the functional biological attributes and subcellular locations of gene products from all taxonomic groups ( Here we describe collaboration between the renal biomedical research community and the GO Consortium to improve the quality and quantity of GO terms describing renal development. In the associated annotation activity, the new and revised terms were associated with gene products involved in renal development and function. This project resulted in a total of 522 GO terms being added to the ontology and the creation of approximately 9,600 kidney-related GO term associations to 940 UniProt Knowledgebase (UniProtKB) entries, covering 66 taxonomic groups. We demonstrate the impact of these improvements on the interpretation of GO term analyses performed on genes differentially expressed in kidney glomeruli affected by diabetic nephropathy. In summary, we have produced a resource that can be utilized in the interpretation of data from small- and large-scale experiments investigating molecular mechanisms of kidney function and development and thereby help towards alleviating renal disease.
PMCID: PMC4062467  PMID: 24941002
5.  A method for increasing expressivity of Gene Ontology annotations using a compositional approach 
BMC Bioinformatics  2014;15:155.
The Gene Ontology project integrates data about the function of gene products across a diverse range of organisms, allowing the transfer of knowledge from model organisms to humans, and enabling computational analyses for interpretation of high-throughput experimental and clinical data. The core data structure is the annotation, an association between a gene product and a term from one of the three ontologies comprising the GO. Historically, it has not been possible to provide additional information about the context of a GO term, such as the target gene or the location of a molecular function. This has limited the specificity of knowledge that can be expressed by GO annotations.
The GO Consortium has introduced annotation extensions that enable manually curated GO annotations to capture additional contextual details. Extensions represent effector–target relationships such as localization dependencies, substrates of protein modifiers and regulation targets of signaling pathways and transcription factors as well as spatial and temporal aspects of processes such as cell or tissue type or developmental stage. We describe the content and structure of annotation extensions, provide examples, and summarize the current usage of annotation extensions.
The additional contextual information captured by annotation extensions improves the utility of functional annotation by representing dependencies between annotations to terms in the different ontologies of GO, external ontologies, or an organism’s gene products. These enhanced annotations can also support sophisticated queries and reasoning, and will provide curated, directional links between many gene products to support pathway and network reconstruction.
PMCID: PMC4039540  PMID: 24885854
Gene Ontology; Functional annotation; Annotation extension; Manual curation
6.  Canto: an online tool for community literature curation 
Bioinformatics  2014;30(12):1791-1792.
Motivation: Detailed curation of published molecular data is essential for any model organism database. Community curation enables researchers to contribute data from their papers directly to databases, supplementing the activity of professional curators and improving coverage of a growing body of literature. We have developed Canto, a web-based tool that provides an intuitive curation interface for both curators and researchers, to support community curation in the fission yeast database, PomBase. Canto supports curation using OBO ontologies, and can be easily configured for use with any species.
Availability: Canto code and documentation are available under an Open Source license from Canto is a component of the Generic Model Organism Database (GMOD) project (
PMCID: PMC4058955  PMID: 24574118
7.  Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology 
BMC Genomics  2013;14:513.
The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI.
We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI.
The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from
PMCID: PMC3733925  PMID: 23895341
8.  A guide to best practices for Gene Ontology (GO) manual annotation 
The Gene Ontology Consortium (GOC) is a community-based bioinformatics project that classifies gene product function through the use of structured controlled vocabularies. A fundamental application of the Gene Ontology (GO) is in the creation of gene product annotations, evidence-based associations between GO definitions and experimental or sequence-based analysis. Currently, the GOC disseminates 126 million annotations covering >374 000 species including all the kingdoms of life. This number includes two classes of GO annotations: those created manually by experienced biocurators reviewing the literature or by examination of biological data (1.1 million annotations covering 2226 species) and those generated computationally via automated methods. As manual annotations are often used to propagate functional predictions between related proteins within and between genomes, it is critical to provide accurate consistent manual annotations. Toward this goal, we present here the conventions defined by the GOC for the creation of manual annotation. This guide represents the best practices for manual annotation as established by the GOC project over the past 12 years. We hope this guide will encourage research communities to annotate gene products of their interest to enhance the corpus of GO annotations available to all.
Database URL:
PMCID: PMC3706743  PMID: 23842463
9.  Semantic integration of physiology phenotypes with an application to the Cellular Phenotype Ontology 
Bioinformatics  2012;28(13):1783-1789.
Motivation: The systematic observation of phenotypes has become a crucial tool of functional genomics, and several large international projects are currently underway to identify and characterize the phenotypes that are associated with genotypes in several species. To integrate phenotype descriptions within and across species, phenotype ontologies have been developed. Applying ontologies to unify phenotype descriptions in the domain of physiology has been a particular challenge due to the high complexity of the underlying domain.
Results: In this study, we present the outline of a theory and its implementation for an ontology of physiology-related phenotypes. We provide a formal description of process attributes and relate them to the attributes of their temporal parts and participants. We apply our theory to create the Cellular Phenotype Ontology (CPO). The CPO is an ontology of morphological and physiological phenotypic characteristics of cells, cell components and cellular processes. Its prime application is to provide terms and uniform definition patterns for the annotation of cellular phenotypes. The CPO can be used for the annotation of observed abnormalities in domains, such as systems microscopy, in which cellular abnormalities are observed and for which no phenotype ontology has been created.
Availability and implementation: The CPO and the source code we generated to create the CPO are freely available on
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3381966  PMID: 22539675
10.  FYPO: the fission yeast phenotype ontology 
Bioinformatics  2013;29(13):1671-1678.
Motivation: To provide consistent computable descriptions of phenotype data, PomBase is developing a formal ontology of phenotypes observed in fission yeast.
Results: The fission yeast phenotype ontology (FYPO) is a modular ontology that uses several existing ontologies from the open biological and biomedical ontologies (OBO) collection as building blocks, including the phenotypic quality ontology PATO, the Gene Ontology and Chemical Entities of Biological Interest. Modular ontology development facilitates partially automated effective organization of detailed phenotype descriptions with complex relationships to each other and to underlying biological phenomena. As a result, FYPO supports sophisticated querying, computational analysis and comparison between different experiments and even between species.
Availability: FYPO releases are available from the Subversion repository at the PomBase SourceForge project page ( The current version of FYPO is also available on the OBO Foundry Web site (
Contact: or
PMCID: PMC3694669  PMID: 23658422
11.  Cross-Product Extensions of the Gene Ontology 
The Gene Ontology (GO) consists of nearly 30,000 classes for describing the activities and locations of gene products. Manual maintenance of an ontology of this size is a considerable effort, and errors and inconsistencies inevitably arise. Reasoners can be used to assist with ontology development, automatically placing classes in a subsumption hierarchy based on their properties. However, the historic lack of computable definitions within the GO has prevented the user of these tools.
In this paper we present preliminary results of an ongoing effort to normalize the GO by explicitly stating the definitions of compositional classes in a form that can be used by reasoners. These definitions are partitioned into mutually exclusive cross-product sets, many of which reference other OBO Foundry candidate ontologies for chemical entities, proteins, biological qualities and anatomical entities. Using these logical definitions we are gradually beginning to automate many aspects of ontology development, detecting errors and filling in missing relationships. These definitions also enhance the GO by weaving it into the fabric of a wider collection of interoperating ontologies, increasing opportunities for data integration and enhancing genomic analyses.
PMCID: PMC2910209  PMID: 20152934
12.  PomBase: a comprehensive online resource for fission yeast 
Nucleic Acids Research  2011;40(Database issue):D695-D699.
PomBase ( is a new model organism database established to provide access to comprehensive, accurate, and up-to-date molecular data and biological information for the fission yeast Schizosaccharomyces pombe to effectively support both exploratory and hypothesis-driven research. PomBase encompasses annotation of genomic sequence and features, comprehensive manual literature curation and genome-wide data sets, and supports sophisticated user-defined queries. The implementation of PomBase integrates a Chado relational database that houses manually curated data with Ensembl software that supports sequence-based annotation and web access. PomBase will provide user-friendly tools to promote curation by experts within the fission yeast community. This will make a key contribution to shaping its content and ensuring its comprehensiveness and long-term relevance.
PMCID: PMC3245111  PMID: 22039153
13.  How the gene ontology evolves 
BMC Bioinformatics  2011;12:325.
Maintaining a bio-ontology in the long term requires improving and updating its contents so that it adequately captures what is known about biological phenomena. This paper illustrates how these processes are carried out, by studying the ways in which curators at the Gene Ontology have hitherto incorporated new knowledge into their resource.
Five types of circumstances are singled out as warranting changes in the ontology: (1) the emergence of anomalies within GO; (2) the extension of the scope of GO; (3) divergence in how terminology is used across user communities; (4) new discoveries that change the meaning of the terms used and their relations to each other; and (5) the extension of the range of relations used to link entities or processes described by GO terms.
This study illustrates the difficulties involved in applying general standards to the development of a specific ontology. Ontology curation aims to produce a faithful representation of knowledge domains as they keep developing, which requires the translation of general guidelines into specific representations of reality and an understanding of how scientific knowledge is produced and constantly updated. In this context, it is important that trained curators with technical expertise in the scientific field(s) in question are involved in supervising ontology shifts and identifying inaccuracies.
PMCID: PMC3166943  PMID: 21819553
Gene Ontology; knowledge; maintenance; curation; ontology shifts
14.  Comparison of the Complete Protein Sets of Worm and Yeast: Orthology and Divergence 
Science (New York, N.Y.)  1998;282(5396):2022-2028.
Comparative analysis of predicted protein sequences encoded by the genomes of Caenorhabditis elegans and Saccharomyces cerevisiae suggests that most of the core biological functions are carried out by orthologous proteins (proteins of different species that can be traced back to a common ancestor) that occur in comparable numbers. The specialized processes of signal transduction and regulatory control that are unique to the multicellular worm appear to use novel proteins, many of which re-use conserved domains. Major expansion of the number of some of these domains seen in the worm may have contributed to the advent of multicellularity. The proteins conserved in yeast and worm are likely to have orthologs throughout eukaryotes; in contrast, the proteins unique to the worm may well define metazoans.
PMCID: PMC3057080  PMID: 9851918
15.  Expanding Yeast Knowledge Online 
Yeast (Chichester, England)  1998;14(16):1453-1469.
The completion of the Saccharomyces cerevisiae genome sequencing project11 and the continued development of improved technology for large-scale genome analysis have led to tremendous growth in the amount of new yeast genetics and molecular biology data. Efficient organization, presentation, and dissemination of this information are essential if researchers are to exploit this knowledge. In addition, the development of tools that provide efficient analysis of this information and link it with pertinent information from other systems is becoming increasingly important at a time when the complete genome sequences of other organisms are becoming available. The aim of this review is to familiarize biologists with the type of data resources currently available on the World Wide Web (WWW).
PMCID: PMC3037831  PMID: 9885151
World Wide Web; Saccharomyces Genome Database; Munich Information Center for Protein Sequences; Yeast Protein Database
16.  Gene Ontology: tool for the unification of biology 
Nature genetics  2000;25(1):25-29.
Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web ( are being constructed: biological process, molecular function and cellular component.
PMCID: PMC3037419  PMID: 10802651
17.  The Protein Feature Ontology: A Tool for the Unification of Protein Annotations 
Bioinformatics (Oxford, England)  2008;24(23):2767-2772.
The advent of sequencing and structural genomics projects has provided a dramatic boost in the number of protein structures and sequences. Due to the high-throughput nature of these projects, many of the molecules are uncharacterised and their functions unknown. This, in turn, has led to the need for a greater number and diversity of tools and databases providing annotation through transfer based on homology and prediction methods. Though many such tools to annotate protein sequence and structure exist, they are spread throughout the world, often with dedicated individual web pages. This situation does not provide a consensus view of the data and hinders comparison between methods. Integration of these methods is needed. So far this has not been possible since there was no common vocabulary available that could be used as a standard language. A variety of terms could be used to describe any particular feature ranging from different spellings to completely different terms. The Protein Feature Ontology ( is a structured controlled vocabulary for features of a protein sequence or structure. It provides a common language for tools and methods to use, so that integration and comparison of their annotations is possible. The Protein Feature Ontology comprises approximately 100 positional terms (located in a particular region of the sequence), which have been integrated into the Sequence Ontology (SO). 40 non-positional terms which describe general protein properties have also been defined and, in addition, post-translational modifications are described by using an already existing ontology, the Protein Modification Ontology (MOD). The Protein Feature Ontology has been used by the BioSapiens Network of Excellence, a consortium comprising 19 partner sites in 14 European countries generating over 150 distinct annotation types for protein sequences and structures.
PMCID: PMC2912506  PMID: 18936051
18.  Muscle Research and Gene Ontology: New standards for improved data integration 
The Gene Ontology Project provides structured controlled vocabularies for molecular biology that can be used for the functional annotation of genes and gene products. In a collaboration between the Gene Ontology (GO) Consortium and the muscle biology community, we have made large-scale additions to the GO biological process and cellular component ontologies. The main focus of this ontology development work concerns skeletal muscle, with specific consideration given to the processes of muscle contraction, plasticity, development, and regeneration, and to the sarcomere and membrane-delimited compartments. Our aims were to update the existing structure to reflect current knowledge, and to resolve, in an accommodating manner, the ambiguity in the language used by the community.
The updated muscle terminologies have been incorporated into the GO. There are now 159 new terms covering critical research areas, and 57 existing terms have been improved and reorganized to follow their usage in muscle literature.
The revised GO structure should improve the interpretation of data from high-throughput (e.g. microarray and proteomic) experiments in the area of muscle science and muscle disease. We actively encourage community feedback on, and gene product annotation with these new terms. Please visit the Muscle Community Annotation Wiki .
PMCID: PMC2657163  PMID: 19178689
19.  Standards and Ontologies for Functional Genomics 2 
PMCID: PMC2447478  PMID: 18629185
21.  The European Bioinformatics Institute's data resources 
Nucleic Acids Research  2003;31(1):43-50.
As the amount of biological data grows, so does the need for biologists to store and access this information in central repositories in a free and unambiguous manner. The European Bioinformatics Institute (EBI) hosts six core databases, which store information on DNA sequences (EMBL-Bank), protein sequences (SWISS-PROT and TrEMBL), protein structure (MSD), whole genomes (Ensembl) and gene expression (ArrayExpress). But just as a cell would be useless if it couldn't transcribe DNA or translate RNA, our resources would be compromised if each existed in isolation. We have therefore developed a range of tools that not only facilitate the deposition and retrieval of biological information, but also allow users to carry out searches that reflect the interconnectedness of biological information. The EBI's databases and tools are all available on our website at
PMCID: PMC165513  PMID: 12519944
22.  Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO) 
Nucleic Acids Research  2002;30(1):69-72.
The Saccharomyces Genome Database (SGD) resources, ranging from genetic and physical maps to genome-wide analysis tools, reflect the scientific progress in identifying genes and their functions over the last decade. As emphasis shifts from identification of the genes to identification of the role of their gene products in the cell, SGD seeks to provide its users with annotations that will allow relationships to be made between gene products, both within Saccharomyces cerevisiae and across species. To this end, SGD is annotating genes to the Gene Ontology (GO), a structured representation of biological knowledge that can be shared across species. The GO consists of three separate ontologies describing molecular function, biological process and cellular component. The goal is to use published information to associate each characterized S.cerevisiae gene product with one or more GO terms from each of the three ontologies. To be useful, this must be done in a manner that allows accurate associations based on experimental evidence, modifications to GO when necessary, and careful documentation of the annotations through evidence codes for given citations. Reaching this goal is an ongoing process at SGD. For information on the current progress of GO annotations at SGD and other participating databases, as well as a description of each of the three ontologies, please visit the GO Consortium page at SGD gene associations to GO can be found by visiting our site at
PMCID: PMC99086  PMID: 11752257
23.  Saccharomyces Genome Database provides tools to survey gene expression and functional analysis data 
Nucleic Acids Research  2001;29(1):80-81.
Upon the completion of the Saccharomyces cerevisiae genomic sequence in 1996 [Goffeau,A. et al. (1997) Nature, 387, 5], several creative and ambitious projects have been initiated to explore the functions of gene products or gene expression on a genome-wide scale. To help researchers take advantage of these projects, the Saccharomyces Genome Database (SGD) has created two new tools, Function Junction and Expression Connection. Together, the tools form a central resource for querying multiple large-scale analysis projects for data about individual genes. Function Junction provides information from diverse projects that shed light on the role a gene product plays in the cell, while Expression Connection delivers information produced by the ever-increasing number of microarray projects. WWW access to SGD is available at
PMCID: PMC29796  PMID: 11125055
24.  Integrating functional genomic information into the Saccharomyces Genome Database 
Nucleic Acids Research  2000;28(1):77-80.
The Saccharomyces Genome Database (SGD) stores and organizes information about the nearly 6200 genes in the yeast genome. The information is organized around the ‘locus page’ and directs users to the detailed information they seek. SGD is endeavoring to integrate the existing information about yeast genes with the large volume of data generated by functional analyses that are beginning to appear in the literature and on web sites. New features will include searches of systematic analyses and Gene Summary Paragraphs that succinctly review the literature for each gene. In addition to current information, such as gene product and phenotype descriptions, the new locus page will also describe a gene product’s cellular process, function and localization using a controlled vocabulary developed in collaboration with two other model organism databases. We describe these developments in SGD through the newly reorganized locus page. The SGD is accessible via the WWW at http://genome-www. stanford. edu/Saccharomyces/
PMCID: PMC102447  PMID: 10592186

Results 1-24 (24)