The GO consists of three ontologies: Molecular Function, Biological Process and Cellular Component (Ashburner et al., 2000) (Figure 2). The Molecular Function ontology describes the biochemical reactions carried out by gene products at the molecular level, for example, binding or catalysis. The Biological Process ontology describes the biological objective of a gene product, such as germination. Biological processes are achieved by a series or a group of executed molecular functions. For example, the overall objective of the biological process histone phosphorylation (GO:0016572) can be accomplished by protein kinases carrying out their molecular function histone kinase activity (GO:0035173). The Cellular Component ontology describes where in a cell a particular molecular function may occur. In most cases, annotation of a gene product to a cellular component term means that the gene product has been found there, and we infer that this may be where the gene product functions. The primary purpose of GO is to provide a standard way to describe the roles of gene products in any organism. This standardization makes it possible to compare the roles gene products play across many species.
Figure 2 The Gene Ontology consists of three ontologies. Molecular Function describes the biochemical activity of a gene product, such as histone kinase activity. Biological Process describes an overall biological objective, such as seed germination (image obtained ...)
The GO is continually being modified and improved. The ontology changes on a daily basis as ontology curators add finer detail to specific areas of the ontology and steadily work to ensure that the ontology is correct and consistent with current biological knowledge. Just as the ontology is constantly improving, annotators at contributing databases are continuously adding new information about gene products to the GO resource. As a result, the entire system, from the ontology to the gene product annotations, is dynamic.
GO terms have five essential features (Figure 3): 1) GO terms have unique names that unambiguously distinguish them from other terms. Whenever possible, terms are named as close to common usage in the community as possible. 2) GO terms have unique IDs of the format GO:#######. These IDs are fixed and tied to the identity, definition, or meaning of the term. Term names may change over time, but IDs remain constant. Therefore, the ID is the more stable identifier for a term in the ontology. If, upon further development of the GO, an existing term is deemed unsuitable for the ontology, the term is made obsolete and its ID is deprecated to a special obsolete category. 3) Terms have synonyms. Synonyms are very useful for searching purposes because they represent the many different ways that scientists refer to the same concept. For example, the terms ‘vitellogenesis’ and ‘yolk production’ mean the same thing, so a single GO term, vitellogenesis (GO:0007296), exists for this process, with the exact synonym yolk production (GO:0007296). Synonyms in GO can be exact, broad or narrow in relation to a primary term. 4) Terms have a textual definition. The textual definition is both necessary and sufficient to identify the term by its ID. Model Organism Database (MOD) biocurators and users of the ontology use textual definitions to understand the intended meaning of a term and thus the rationale for its association with a given gene product. 5) Terms have relationships to other terms such that each term is placed in the context of all of the other terms in the DAG.
Figure 3 A partial OBO stanza displaying the term vitellogenesis. An OBO stanza is the textual description of an ontology term in the OBO format. Each term has an ID, a unique name, a textual definition that is supported by a reference, appropriate synonyms and ...
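To make the five term features concrete, the following is a minimal sketch of reading one OBO stanza into a Python dictionary. The ID, name and exact synonym are taken from the text above; the `def:` line is a labeled placeholder rather than the real GO definition, and the helper names `parse_stanza` and `synonyms` are hypothetical, not part of any GO tool.

```python
import re

# Illustrative stanza. The id, name and synonym come from the text above;
# the def line is a PLACEHOLDER, not the actual GO definition or reference.
STANZA = '''[Term]
id: GO:0007296
name: vitellogenesis
namespace: biological_process
def: "PLACEHOLDER definition text." [PLACEHOLDER:ref]
synonym: "yolk production" EXACT []
'''

# GO IDs have the fixed shape GO: followed by seven digits.
GO_ID = re.compile(r"^GO:\d{7}$")


def parse_stanza(text):
    """Collect 'tag: value' pairs from one [Term] stanza into a dict of lists."""
    term = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(": ")
        term.setdefault(tag, []).append(value)
    return term


def synonyms(term):
    """Return (text, scope) pairs, where scope is e.g. EXACT, BROAD or NARROW."""
    pairs = []
    for raw in term.get("synonym", []):
        match = re.match(r'"(.*)"\s+(\w+)', raw)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


term = parse_stanza(STANZA)
print(term["id"][0], term["name"][0], synonyms(term))
```

Keeping every tag as a list is a deliberate choice in this sketch, since tags such as `synonym` can appear more than once in a stanza.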
The GO uses six types of relationships: is_a, part_of, regulates, positively_regulates, negatively_regulates and disjoint_from. With the exception of disjoint_from, in these binary relationships we refer to the less specific term as the parent and the more specific term as the child. The further a term is from the root of the graph, the more specific the term is. Therefore, terms that are furthest away from the root will convey more information than those that are closer. Automated methods are now being developed that take advantage of the information content of the ontology to help manage the placement of terms in the graph (Alterovitz et al., 2009). However, it is not particularly useful to discuss the ‘level’ of a term in the graph. Since the graph is a DAG, a single term may be at more than one level depending on the path that is taken to arrive at it from the root.
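The point about levels can be illustrated with a small sketch. The graph below is a hypothetical toy DAG (the term names `A`, `B`, `C` and the function `depths` are invented for illustration and do not correspond to real GO terms); it shows one term reachable from the root along paths of different lengths.

```python
# Hypothetical toy DAG, stored as child -> list of parents.
# "C" has two parents, so it can be reached from the root by two paths.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["A"],
    "C": ["root", "B"],
}


def depths(term, parents):
    """Return the set of path lengths from `term` up to the root."""
    if not parents[term]:
        return {0}  # the root is at depth 0
    return {d + 1 for p in parents[term] for d in depths(p, parents)}


print(sorted(depths("C", PARENTS)))  # C sits at more than one depth
```

Because `C` is both a direct child of the root and a grandchild of `A`, no single level number describes it, which is why level is a poor measure of specificity in a DAG.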
The different relationships describe the multiple ways in which two terms can be linked to each other. The is_a relationship in GO means that if A is_a B, then every time we find an A in a natural biological setting, it is a kind of B, and A is a child of B. For example, plasma membrane (GO:0005886) is_a membrane (GO:0016020) means that every plasma membrane we find is a kind of membrane. The part_of relationship in GO means that if A part_of B, then every time we find an A in a natural biological setting, it, along with other things, makes up a kind of B. Note that it does not mean that every time we find a B it must contain an A as a part. Consider the case of a process in the ontology that has a part_of child. This does not mean that every time the parent process occurs, the child part has to occur. Instead, it means that every time the child part occurs, it has to be in the context of the parent process. For example, oogenesis (GO:0048477) has ovarian follicle cell development (GO:0030707) and ovarian nurse cell to oocyte transport (GO:0007300) as part_of children. Ovarian nurse cell to oocyte transport (GO:0007300) part_of oogenesis (GO:0048477) means that every time this transport occurs, it is part of the process of the formation and maturation of a female gamete. However, not all oocytes require transport from a nurse cell in order to undergo oogenesis (Biliński et al., 1998). In summary, then, if A part_of B, then every time A exists, it is a part of B, but not all kinds of B have to have a part A.
The regulates relationships are very important with respect to the representation of developmental processes in GO. The regulates relationship between processes means that a process A has a direct influence on another process B such that it controls some aspect of how process B unfolds. Process A can affect the rate at which B proceeds, the frequency at which B occurs, or how far along B is allowed to progress. For example, regulation of Notch signaling pathway (GO:0008593) regulates Notch signaling pathway (GO:0007219) means that every time a regulation of Notch signaling event occurs, it somehow modulates the rate, frequency or extent of Notch signaling. The negatively_regulates and positively_regulates relationships follow from the regulates relationship in a straightforward manner: they decrease or increase, respectively, the rate, frequency or extent of another process.
The disjoint_from relationship is a special relationship that is used to maintain the structural integrity of the DAG. If A is disjoint from B, then no A that exists can be a B. For example, cellular process (GO:0009987) disjoint_from multicellular organismal process (GO:0032501) means that if a process is a cellular process, it cannot also be a multicellular organismal process. It can, however, be part_of a multicellular organismal process. For example, hepatocyte differentiation (GO:0070365) is_a cellular process (GO:0009987) and is also part_of liver development (GO:0001889), a multicellular organismal process (GO:0032501).
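One way to picture how disjoint_from protects the graph is as a consistency check over is_a ancestry. The sketch below is an assumption-laden illustration, not the GO project's actual tooling: the direct edges compress whatever intermediate terms the real ontology has, the function names are hypothetical, and `misplaced term` is an invented entry included only to trigger the check.

```python
# Simplified is_a edges (child -> parents) using the terms from the example.
# Real GO has intermediate terms between these; they are omitted here.
IS_A = {
    "hepatocyte differentiation": ["cellular process"],
    "liver development": ["multicellular organismal process"],
    # Invented term that wrongly claims both disjoint parents.
    "misplaced term": ["cellular process", "multicellular organismal process"],
}

# part_of edges are recorded separately and are NOT followed by the check.
PART_OF = {"hepatocyte differentiation": ["liver development"]}

DISJOINT = [("cellular process", "multicellular organismal process")]


def is_a_ancestors(term):
    """Return every term reachable from `term` by following is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def disjoint_violations(term):
    """List disjoint pairs that `term` is a kind of on both sides."""
    kinds = is_a_ancestors(term) | {term}
    return [(a, b) for a, b in DISJOINT if a in kinds and b in kinds]


print(disjoint_violations("hepatocyte differentiation"))
print(disjoint_violations("misplaced term"))
```

Hepatocyte differentiation passes because its link to liver development is part_of rather than is_a, which mirrors the distinction drawn in the paragraph above.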
Rules governing the relationships between terms and how these relationships interact with each other are important because a computer can use them to draw conclusions about one term based on how it relates to other terms (Figure 4). For example, the part_of relationship is transitive. If cell-cell signaling involved in cell fate specification (GO:0045168) is part of cell fate specification (GO:0001708) and cell fate specification (GO:0001708) is part of cell fate commitment (GO:0045165), then cell-cell signaling involved in cell fate specification (GO:0045168) is part of cell fate commitment (GO:0045165). Relationships like this can be linked together in a chain through multiple steps in the ontology. As a result, we can infer the relationship between two terms that might be quite distant from one another in the graph.
Figure 4 Rules governing relationships allow inferences to be made across the ontology. Here we show 3 rules that can be used for inference using the is_a and part_of relationships. The is_a and part_of relationships are transitive over themselves. The part_of ...
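The transitivity of part_of described above can be sketched as a simple graph walk. The two asserted edges are the ones given in the cell fate example; the function name `inferred_part_of` is hypothetical, and only the part_of-over-part_of rule is implemented here, not the other rules from the figure.

```python
# Asserted part_of edges from the example in the text (child -> parents).
PART_OF = {
    "GO:0045168": ["GO:0001708"],  # cell-cell signaling involved in cell fate specification
    "GO:0001708": ["GO:0045165"],  # cell fate specification
}


def inferred_part_of(term, part_of):
    """Return all terms that `term` is part_of, chaining edges transitively."""
    found = []
    stack = list(part_of.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(part_of.get(current, []))
    return found


# The second entry is inferred rather than asserted directly.
print(inferred_part_of("GO:0045168", PART_OF))
```

The same walk works for chains of any length, which is how a relationship between two distant terms can be inferred from a series of direct assertions.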