Genome Database’s (SGD’s) (1) goal of annotating yeast genes to Gene Ontology (GO) (6) is to provide users with accurate information about the roles of gene products in the cell and their relationship to other gene products in yeast and other organisms. The availability of published experimental data for Saccharomyces cerevisiae as a model organism, and the participation of other organism databases (currently Drosophila melanogaster, Mus musculus, Arabidopsis thaliana, Caenorhabditis elegans, Schizosaccharomyces pombe, Dictyostelium discoideum and Plasmodium falciparum and other parasites) and organizations [InterPro, SWISS-PROT, TrEMBL (8) and Compugen] in GO development and annotation make this possible. Complete annotation of S. cerevisiae genes to GO will allow users to find all genes, including those across species, which share the same (or related) annotation(s) for function, process and component.
GO consists of three ontologies, representing the fundamental aspects of gene products: molecular function, biological process and cellular component. Each ontology is structured such that specific terms are considered children of broader terms. For instance, when describing localization, the cellular component term ‘nucleus’ may be considered more general than ‘chromosome’. If a gene product is annotated to the cellular component term ‘chromosome’, then it is also implicitly annotated to ‘nucleus’, by virtue of the parent–child relationship between these GO terms. To model biological data appropriately, the structure allows for many-to-many relationships, such that nodes within the structure, representing individual biological concepts, may have many parents and many children, each connected by their relationship to one another (6). For instance, the process of ‘DNA ligation’ has the parent terms ‘DNA recombination’, ‘DNA repair’ and ‘DNA-dependent DNA replication’, as it is required for all of these processes. Gene products may be annotated to as many GO terms as needed, at the most specific levels possible, to reflect the current state of our understanding. The integrity of this structure depends on every relationship between nodes being true. Because annotations can be related by exploring the surrounding GO structure, genes that share related GO terms can be linked, as well as genes that share identical terms. Another important clarification is that, while SGD GO associations are made between the gene/ORF and GO terms in the database, curators are actually annotating the gene product (rather than the gene itself) to the appropriate function, process or component term(s).
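The implicit annotation described above can be illustrated with a minimal Python sketch. It assumes a toy ontology containing only the parent links named in the text; the gene names (`GENE_A`, `GENE_B`, `GENE_C`) and the function names are hypothetical and do not reflect how SGD stores or queries its data.

```python
# Parent links taken from the examples in the text; a real ontology is far larger.
# A term may have several parents, so the structure is a graph rather than a tree.
PARENTS = {
    "chromosome": {"nucleus"},
    "DNA ligation": {
        "DNA recombination",
        "DNA repair",
        "DNA-dependent DNA replication",
    },
}

# Hypothetical direct annotations (gene product -> most specific terms).
DIRECT = {
    "GENE_A": {"chromosome"},
    "GENE_B": {"nucleus"},
    "GENE_C": {"DNA ligation"},
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def implied_terms(term, parents=PARENTS):
    """A direct annotation to `term` implies `term` plus all of its ancestors."""
    return {term} | ancestors(term, parents)


def genes_annotated_to(term, direct=DIRECT, parents=PARENTS):
    """List genes annotated to `term` either directly or implicitly."""
    return sorted(
        gene
        for gene, terms in direct.items()
        if any(term in implied_terms(t, parents) for t in terms)
    )
```

In this sketch, querying for ‘nucleus’ returns both `GENE_A` (annotated to the child term ‘chromosome’) and `GENE_B` (annotated directly), whereas querying for ‘chromosome’ returns only `GENE_A`. Annotating to the most specific term therefore loses nothing, because the broader terms are recovered from the structure.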
With these goals and considerations in mind, SGD curators annotate gene products to GO using the following guidelines: (i) whenever possible, associations are made based on information obtained from published literature, (ii) associations are made to the most specific terms contained within the ontologies, (iii) each annotation requires a GO evidence code, and (iv) each annotation is associated with a literature citation. The third and fourth guidelines can be used to evaluate the confidence level of the association.
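As a rough illustration of guidelines (iii) and (iv), the sketch below models an annotation as a record that cannot be created without an evidence code and a citation. The class name, field names and citation placeholder are hypothetical, and the set of evidence codes is only an illustrative subset that the text above does not enumerate; this is not SGD's actual schema.

```python
from dataclasses import dataclass

# Illustrative subset of GO evidence codes (an assumption; not listed in the text).
EVIDENCE_CODES = {"IDA", "IMP", "IGI", "IPI", "ISS", "TAS", "NAS", "IEA"}


@dataclass(frozen=True)
class Annotation:
    """One gene-product-to-GO-term association (hypothetical record layout)."""

    gene: str
    term: str
    evidence_code: str
    citation: str

    def __post_init__(self):
        # Guideline (iii): each annotation requires a GO evidence code.
        if self.evidence_code not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence_code!r}")
        # Guideline (iv): each annotation is associated with a literature citation.
        if not self.citation.strip():
            raise ValueError("a literature citation is required")


# Usage: a valid record; the gene name and citation are placeholders.
example = Annotation("GENE_A", "chromosome", "IDA", "REF:placeholder")
```

Keeping the evidence code and citation on every record is what lets a user judge how much confidence to place in an individual association.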