Results 1-12 (12)

1.  Improving functional annotation for industrial microbes: a case study with Pichia pastoris 
Trends in Biotechnology  2014;32(8):396-399.
• The current status of the Pichia pastoris genome is shown to lack extensive functional annotation.
• GO annotation transfer and literature curation pipelines improve the functional annotation of genomes.
• Pipelines and tools that can improve the annotation status of the genomes of Pichia pastoris and many industrial microbes are considered.
• Well-annotated genome sequences will facilitate the utilization of these microbes in a broader range of synthetic biology applications.
The research communities studying microbial model organisms, such as Escherichia coli or Saccharomyces cerevisiae, are well served by model organism databases that have extensive functional annotation. However, this is not true of many industrial microbes that are used widely in biotechnology. In this Opinion piece, we use Pichia (Komagataella) pastoris to illustrate the limitations of the available annotation. We consider the resources that can be implemented in the short term both to improve Gene Ontology (GO) annotation coverage based on annotation transfer, and to establish curation pipelines for the literature corpus of this organism.
PMCID: PMC4111905  PMID: 24929579
Pichia pastoris; Komagataella pastoris; industrial microbes; functional annotation; recombinant protein production
2.  A method for increasing expressivity of Gene Ontology annotations using a compositional approach 
BMC Bioinformatics  2014;15:155.
The Gene Ontology project integrates data about the function of gene products across a diverse range of organisms, allowing the transfer of knowledge from model organisms to humans, and enabling computational analyses for interpretation of high-throughput experimental and clinical data. The core data structure is the annotation, an association between a gene product and a term from one of the three ontologies comprising the GO. Historically, it has not been possible to provide additional information about the context of a GO term, such as the target gene or the location of a molecular function. This has limited the specificity of knowledge that can be expressed by GO annotations.
The GO Consortium has introduced annotation extensions that enable manually curated GO annotations to capture additional contextual details. Extensions represent effector–target relationships such as localization dependencies, substrates of protein modifiers and regulation targets of signaling pathways and transcription factors as well as spatial and temporal aspects of processes such as cell or tissue type or developmental stage. We describe the content and structure of annotation extensions, provide examples, and summarize the current usage of annotation extensions.
The additional contextual information captured by annotation extensions improves the utility of functional annotation by representing dependencies between annotations to terms in the different ontologies of GO, external ontologies, or an organism’s gene products. These enhanced annotations can also support sophisticated queries and reasoning, and will provide curated, directional links between many gene products to support pathway and network reconstruction.
PMCID: PMC4039540  PMID: 24885854
Gene Ontology; Functional annotation; Annotation extension; Manual curation
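The annotation structure described in the abstract above (a gene product associated with a GO term, plus extensions that each pair a relation with a target) can be sketched as a small data model. This is a minimal illustration only, not the GO Consortium's file format or software: the class names, the relation names (`has_substrate`, `occurs_in`) and all `EX:` identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Extension:
    """One contextual detail: a relation plus the entity it points to."""
    relation: str  # hypothetical relation name, e.g. "has_substrate"
    target: str    # identifier of a gene product or ontology term


@dataclass
class Annotation:
    """Association between a gene product and a GO term, with optional context."""
    gene_product: str
    go_term: str
    extensions: List[Extension] = field(default_factory=list)

    def render(self) -> str:
        """Format as 'gene_product<TAB>go_term<TAB>relation(target),...'."""
        ext = ",".join(f"{e.relation}({e.target})" for e in self.extensions)
        return f"{self.gene_product}\t{self.go_term}\t{ext}"


def directed_links(annotations: List[Annotation], relation: str) -> List[Tuple[str, str]]:
    """Collect (source gene product, target) pairs for one relation type."""
    return [
        (a.gene_product, e.target)
        for a in annotations
        for e in a.extensions
        if e.relation == relation
    ]


# All identifiers below are made up for illustration.
example = Annotation(
    gene_product="EX:kinase1",
    go_term="EX:protein_kinase_activity",
    extensions=[
        Extension("has_substrate", "EX:substrate1"),
        Extension("occurs_in", "EX:cell_type1"),
    ],
)
links = directed_links([example], "has_substrate")
```

Filtering extensions by relation, as `directed_links` does, is one simple way to obtain the directional gene-product links that the abstract mentions as input for pathway and network reconstruction.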
3.  Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe 
Nature Biotechnology  2010;28(6):617-623.
We report the construction and analysis of 4,836 heterozygous diploid deletion mutants covering 98.4% of the fission yeast genome. This resource provides a powerful tool for biotechnological and eukaryotic cell biology research. Comprehensive gene dispensability comparisons with budding yeast, the first time such studies have been possible between two eukaryotes, revealed that 83% of single copy orthologues in the two yeasts had conserved dispensability. Gene dispensability differed for certain pathways between the two yeasts, including mitochondrial translation and cell cycle checkpoint control. We show that fission yeast has more essential genes than budding yeast and that essential genes are more likely than non-essential genes to be single copy, broadly conserved and to contain introns. Growth fitness analyses determined sets of haploinsufficient and haploproficient genes for fission yeast, and comparisons with budding yeast identified specific ribosomal proteins and RNA polymerase subunits, which may act more generally to regulate eukaryotic cell growth.
PMCID: PMC3962850  PMID: 20473289
4.  Canto: an online tool for community literature curation 
Bioinformatics  2014;30(12):1791-1792.
Motivation: Detailed curation of published molecular data is essential for any model organism database. Community curation enables researchers to contribute data from their papers directly to databases, supplementing the activity of professional curators and improving coverage of a growing body of literature. We have developed Canto, a web-based tool that provides an intuitive curation interface for both curators and researchers, to support community curation in the fission yeast database, PomBase. Canto supports curation using OBO ontologies, and can be easily configured for use with any species.
Availability: Canto code and documentation are available under an Open Source license. Canto is a component of the Generic Model Organism Database (GMOD) project.
PMCID: PMC4058955  PMID: 24574118
5.  FYPO: the fission yeast phenotype ontology 
Bioinformatics  2013;29(13):1671-1678.
Motivation: To provide consistent computable descriptions of phenotype data, PomBase is developing a formal ontology of phenotypes observed in fission yeast.
Results: The fission yeast phenotype ontology (FYPO) is a modular ontology that uses several existing ontologies from the open biological and biomedical ontologies (OBO) collection as building blocks, including the phenotypic quality ontology PATO, the Gene Ontology and Chemical Entities of Biological Interest. Modular ontology development facilitates partially automated effective organization of detailed phenotype descriptions with complex relationships to each other and to underlying biological phenomena. As a result, FYPO supports sophisticated querying, computational analysis and comparison between different experiments and even between species.
Availability: FYPO releases are available from the Subversion repository at the PomBase SourceForge project page. The current version of FYPO is also available on the OBO Foundry Web site.
PMCID: PMC3694669  PMID: 23658422
6.  A genome-wide resource of cell cycle and cell shape genes of fission yeast 
Open Biology  2013;3(5):130053.
To identify near complete sets of genes required for the cell cycle and cell shape, we have visually screened a genome-wide gene deletion library of 4843 fission yeast deletion mutants (95.7% of total protein encoding genes) for their effects on these processes. A total of 513 genes have been identified as being required for cell cycle progression, 276 of which have not been previously described as cell cycle genes. Deletions of a further 333 genes lead to specific alterations in cell shape and another 524 genes result in generally misshapen cells. Here, we provide the first eukaryotic resource of gene deletions, which describes a near genome-wide set of genes required for the cell cycle and cell shape.
PMCID: PMC3866870  PMID: 23697806
genome-wide gene deletion resource; cell cycle; cell shape; fission yeast
7.  On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report 
PLoS Computational Biology  2012;8(2):e1002386.
A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the “functional similarity” between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the “ortholog conjecture” (or, more properly, the “ortholog functional conservation hypothesis”). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches. 
Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: 1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an “open world assumption” (absence of an annotation does not imply absence of a function), and 2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the conclusions have a justifiable biological basis.
Author Summary
Understanding gene function—how individual genes contribute to the biology of an organism at the molecular, cellular and organism levels—is one of the primary aims of biomedical research. It has been a longstanding tenet of model organism research that experimental knowledge obtained in one organism is often applicable to other organisms, particularly if the organisms share the relevant genes because they inherited them from their common ancestor. Nevertheless this tenet is, like any hypothesis, not beyond question. A recent paper has termed this hypothesis a “conjecture,” and performed a statistical analysis, the results of which were interpreted as evidence against the hypothesis. This statistical analysis relied on a computational representation of gene function, the Gene Ontology (GO). As representatives of the international consortium that produces the GO, we show how the apparent evidence against the “ortholog conjecture” can be better explained as an artifact of how molecular biology knowledge is accumulated. In short, a complementarity between knowledge obtained in mouse and human experimental systems was incorrectly interpreted as a disagreement. We discuss the proper interpretation of GO annotations and potential sources of bias, with an eye toward enhancing the informed use of the GO by the scientific community.
PMCID: PMC3280971  PMID: 22359495
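The "open world assumption" stated in the abstract above (absence of an annotation does not imply absence of a function) can be made concrete with a small sketch. The function names and the `EX:` term identifiers are hypothetical; this is not the metric of Nehrt et al. or any GO Consortium tool, only an illustration of why a missing annotation should be read as "unknown" rather than "no".

```python
from typing import Dict, Optional, Set


def has_function(annotated_terms: Set[str], term: str) -> Optional[bool]:
    """Open-world lookup: True if annotated, otherwise None (unknown).

    The function never returns False, because a missing annotation
    is not evidence that the gene lacks the function.
    """
    return True if term in annotated_terms else None


def compare(a: Set[str], b: Set[str]) -> Dict[str, Set[str]]:
    """Split two annotation sets into shared terms and terms known for one gene only.

    Terms known for only one gene are complementary knowledge,
    not demonstrated functional differences.
    """
    return {
        "shared": a & b,
        "known_only_for_a": a - b,
        "known_only_for_b": b - a,
    }


# Made-up annotation sets for a pair of orthologous genes.
mouse_gene = {"EX:term1", "EX:term2"}
human_gene = {"EX:term2", "EX:term3"}

result = compare(mouse_gene, human_gene)
unknown = has_function(human_gene, "EX:term1")
```

Under this reading, `EX:term1` is simply untested for the human gene in the toy data, which mirrors the abstract's point that different experiment types tend to be performed in different organisms.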
8.  PomBase: a comprehensive online resource for fission yeast 
Nucleic Acids Research  2011;40(Database issue):D695-D699.
PomBase is a new model organism database established to provide access to comprehensive, accurate, and up-to-date molecular data and biological information for the fission yeast Schizosaccharomyces pombe to effectively support both exploratory and hypothesis-driven research. PomBase encompasses annotation of genomic sequence and features, comprehensive manual literature curation and genome-wide data sets, and supports sophisticated user-defined queries. The implementation of PomBase integrates a Chado relational database that houses manually curated data with Ensembl software that supports sequence-based annotation and web access. PomBase will provide user-friendly tools to promote curation by experts within the fission yeast community. This will make a key contribution to shaping its content and ensuring its comprehensiveness and long-term relevance.
PMCID: PMC3245111  PMID: 22039153
9.  The BioGRID Interaction Database: 2008 update 
Nucleic Acids Research  2007;36(Database issue):D637-D640.
The Biological General Repository for Interaction Datasets (BioGRID) database was developed to house and distribute collections of protein and genetic interactions from major model organism species. BioGRID currently contains over 198 000 interactions from six different species, as derived from both high-throughput studies and conventional focused studies. Through comprehensive curation efforts, BioGRID now includes a virtually complete set of interactions reported to date in the primary literature for both the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe. A number of new features have been added to the BioGRID including an improved user interface to display interactions based on different attributes, a mirror site and a dedicated interaction management system to coordinate curation across different locations. The BioGRID provides interaction data with monthly updates to Saccharomyces Genome Database, FlyBase and Entrez Gene. Source code for the BioGRID and the linked Osprey network visualization system is now freely available without restriction.
PMCID: PMC2238873  PMID: 18000002
10.  YOGY: a web-based, integrated database to retrieve protein orthologs and associated Gene Ontology terms 
Nucleic Acids Research  2006;34(Web Server issue):W330-W334.
We present YOGY, a web-based resource for orthologous proteins from nine eukaryotic organisms: Homo sapiens, Mus musculus, Rattus norvegicus, Arabidopsis thaliana, Drosophila melanogaster, Caenorhabditis elegans, Plasmodium falciparum, Schizosaccharomyces pombe and Saccharomyces cerevisiae. Using a gene name from any of these organisms as a query, this database provides comprehensive, combined information on orthologs in other species using data from five independent resources: KOGs, Inparanoid, HomoloGene, OrthoMCL and a table of curated fission and budding yeast orthologs. Associated Gene Ontology (GO) terms of orthologs can also be retrieved for functional inference. Integrating these different and complementary datasets provides a straightforward tool to identify known and predicted orthologs of proteins from a variety of species. This resource should be useful for bench scientists looking for functional clues for their genes of interest as well as for curators looking for information that can be transferred based on orthology and for rapidly identifying the relevant GO terms as an aid to literature curation. YOGY is accessible online.
PMCID: PMC1538793  PMID: 16845020
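The integration idea in the abstract above (one query gene, ortholog predictions pooled from several independent resources) can be sketched as follows. This is not YOGY's implementation: the function `merge_orthologs`, the in-memory data layout and all gene names are assumptions made for illustration, and only the resource names are taken from the abstract.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Each resource maps to a set of (query gene, predicted ortholog) pairs.
Predictions = Dict[str, Set[Tuple[str, str]]]


def merge_orthologs(predictions: Predictions, query: str) -> Dict[str, List[str]]:
    """Return, for each ortholog of `query`, the resources that predict it."""
    support: Dict[str, List[str]] = defaultdict(list)
    # Iterate resources in sorted order so each support list is alphabetical.
    for source in sorted(predictions):
        for gene, ortholog in predictions[source]:
            if gene == query:
                support[ortholog].append(source)
    return dict(support)


# Toy data: the gene names are invented; the resource names come from the abstract.
toy_predictions: Predictions = {
    "KOGs": {("geneA", "orthoX"), ("geneA", "orthoY")},
    "Inparanoid": {("geneA", "orthoX")},
    "OrthoMCL": {("geneA", "orthoX"), ("geneB", "orthoZ")},
}

merged = merge_orthologs(toy_predictions, "geneA")
```

Keeping the list of supporting resources for each ortholog, rather than a single yes/no flag, lets a reader weigh predictions that several complementary datasets agree on against those reported by only one.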
11.  GeneDB: a resource for prokaryotic and eukaryotic organisms 
Nucleic Acids Research  2004;32(Database issue):D339-D343.
GeneDB is a genome database for prokaryotic and eukaryotic organisms. The resource provides a portal through which data generated by the Pathogen Sequencing Unit at the Wellcome Trust Sanger Institute and other collaborating sequencing centres can be made publicly available. It combines data from finished and ongoing genome and expressed sequence tag (EST) projects with curated annotation that can be searched, sorted and downloaded using a single web-based resource. The current release stores 11 datasets, of which six are curated and maintained by biologists, who review and incorporate information from the scientific literature, public databases and the respective research communities.
PMCID: PMC308742  PMID: 14681429
12.  Website Review: How to Get the Best From Fission Yeast Genome Data 
Researchers are increasingly depending on various centralized resources to access the vast amount of information reported in the literature and generated by systematic sequencing and functional genomics projects. Biological databases have become everyday working tools for many researchers. This dependency goes both ways in that the databases require continuous feedback from the research community to maintain accurate, reliable, and up-to-date information. The fission yeast Schizosaccharomyces pombe has recently been sequenced, setting the stage for the post-genome era of this popular model organism. Here, we provide an overview of relevant databases available, or being developed, together with a compilation of Internet resources containing useful information and tools for fission yeast.
PMCID: PMC2447279  PMID: 18628858
