The Gene Ontology (GO) is a controlled vocabulary used to classify the biological characteristics of gene products. GO terms describe three characteristics: Biological Process (BP), Molecular Function (MF) and Cellular Component (CC). BP terms describe the general process in which a gene product is involved, MF terms describe the specific molecular function of a gene product, and CC terms describe the subcellular compartment in which a gene product is found. This work focused on the expansion of the BP part of GO, which encompasses all developmental processes.
Gene annotators use experimentally supported data from published literature to associate specific GO terms with the genes that have been shown to bear the attributes described by the GO term. Taken as a whole, the set of annotations to a given gene product aims to describe the totality of what is currently known about that gene product’s role in biology, while an individual annotation describes the results of a single experiment. Annotators use their biological knowledge alongside information presented in the paper to judge the most specific term possible for each annotation. For example, individual experiments have shown that the homeobox NKX2-5 gene product takes part in the process of cardiac muscle cell differentiation (GO:0055007; BP) (Tanaka et al., 1999), has transcription factor activity (GO:0003700; MF) (Kasahara and Izumo, 1999) and is found in the nucleus (GO:0005634; CC) (Zhu et al., 2000). Thus the NKX2-5 gene product has been annotated to each of those terms. Additionally, gene products may have multiple functions, often take part in more than one process and can be found in multiple subcellular compartments. GO allows a single gene product to be annotated to any number of terms from each of the three ontologies.
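As a minimal sketch of how such annotations might be held in memory, the three NKX2-5 annotations cited above can be written as simple records and grouped by ontology. The `Annotation` class, its field names and the `group_by_aspect` helper are illustrative assumptions; this is not the GO Consortium's annotation file format, which carries further fields such as evidence codes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Simplified, hypothetical annotation record (not the official GO format)."""
    gene_product: str
    go_id: str
    term_name: str
    aspect: str      # "BP", "MF" or "CC"
    reference: str   # supporting publication


# The three annotations described in the text for NKX2-5.
NKX2_5_ANNOTATIONS = [
    Annotation("NKX2-5", "GO:0055007", "cardiac muscle cell differentiation",
               "BP", "Tanaka et al., 1999"),
    Annotation("NKX2-5", "GO:0003700", "transcription factor activity",
               "MF", "Kasahara and Izumo, 1999"),
    Annotation("NKX2-5", "GO:0005634", "nucleus",
               "CC", "Zhu et al., 2000"),
]


def group_by_aspect(annotations):
    """Return a dict mapping each aspect to the GO IDs annotated in it."""
    grouped = defaultdict(list)
    for annotation in annotations:
        grouped[annotation.aspect].append(annotation.go_id)
    return dict(grouped)
```

Because each record is independent, a gene product with several functions, processes or locations is simply represented by additional records for the same `gene_product`.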
GO terms are structured as a directed acyclic graph (DAG), in which each term can have multiple relationships to broader ‘parent’ and more specific ‘child’ terms. The parent and child terms have specific relationships with each other. In GO, there are seven types of relationships (Gene Ontology Consortium, 2009; Smith et al., 2005), of which five are relevant to the heart development ontology. The ‘is_a’ relationship means that a child term is always a type of its parent term; for example, heart development (GO:0007507) is a type of organ development (GO:0048513). The ‘part_of’ relationship means that the child term is always a part of the parent term; for example, cell migration involved in vasculogenesis (GO:0035441) is part of vasculogenesis (GO:0001570). The ‘regulates’, ‘positively_regulates’ and ‘negatively_regulates’ relationships signify that the children have a regulatory effect on the parent; for example, the term negative regulation of cardiac muscle tissue development (GO:0055026) has a ‘negatively_regulates’ relationship to the term cardiac muscle tissue development (GO:0048738). To illustrate how terms and relationships are used in GO, Figure 1 shows that the term vasculogenesis (GO:0001570) has two direct ancestors, cell differentiation (GO:0030154) and blood vessel morphogenesis (GO:0048514). Vasculogenesis ‘is a’ type of cell differentiation and is also ‘part of’ the process of blood vessel morphogenesis.
Figure 1. The ancestor chart for GO terms vasculogenesis and angiogenesis. The graph shows the terms vasculogenesis (GO:0001570), angiogenesis (GO:0001525) and all of their ancestor terms. The lines marked with I indicate an ‘is_a’ relationship ...
An important benefit of building a DAG, rather than a flat list of controlled vocabulary terms, is that relationships can be used to make inferences from one term to another. For example, vasculogenesis (GO:0001570) is part of blood vessel morphogenesis (GO:0048514), and blood vessel morphogenesis (GO:0048514) is part of blood vessel development (GO:0001568). Therefore, because the ‘part_of’ relationship is transitive, vasculogenesis (GO:0001570) can also be considered a part of blood vessel development (GO:0001568) without requiring a direct link between the two terms. This transitivity is very useful when using GO to find gene products annotated as being involved in a particular process; for example, a search for gene products annotated to blood vessel morphogenesis (GO:0048514) will include those annotated directly to the more specific processes of vasculogenesis (GO:0001570) and angiogenesis (GO:0001525), both of which are a part of blood vessel morphogenesis (GO:0048514).
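The inference described above can be sketched in a few lines of Python. The edges below are only those stated in the text; the gene names `geneA`, `geneB` and `geneC`, their annotations, and the helper functions are hypothetical and serve purely as illustration. The sketch follows ‘is_a’ and ‘part_of’ links only and does not attempt to model the regulatory relationships.

```python
from collections import defaultdict

# (child, relationship, parent) edges as described in the text.
EDGES = [
    ("GO:0001570", "is_a", "GO:0030154"),     # vasculogenesis -> cell differentiation
    ("GO:0001570", "part_of", "GO:0048514"),  # vasculogenesis -> blood vessel morphogenesis
    ("GO:0001525", "part_of", "GO:0048514"),  # angiogenesis -> blood vessel morphogenesis
    ("GO:0048514", "part_of", "GO:0001568"),  # blood vessel morphogenesis -> blood vessel development
]

# Hypothetical direct annotations, for illustration only.
ANNOTATIONS = {
    "geneA": {"GO:0001570"},
    "geneB": {"GO:0001525"},
    "geneC": {"GO:0030154"},
}


def build_parents(edges, follow=("is_a", "part_of")):
    """Map each term to its direct parents over the chosen relationship types."""
    parents = defaultdict(set)
    for child, relationship, parent in edges:
        if relationship in follow:
            parents[child].add(parent)
    return parents


def ancestors(term, parents):
    """Collect every term reachable by repeatedly following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_for(term, annotations, parents):
    """Genes annotated to the term itself or to any of its descendants."""
    return sorted(
        gene
        for gene, terms in annotations.items()
        if any(t == term or term in ancestors(t, parents) for t in terms)
    )


PARENTS = build_parents(EDGES)
```

With these toy data, `genes_for("GO:0001568", ANNOTATIONS, PARENTS)` returns `geneA` and `geneB` even though neither is annotated directly to blood vessel development, which mirrors the search behaviour described in the text.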
Each GO term has several different components (Figure 2). The GO ID is unique to each term. The definition is a textual description of what the term means and in many cases disambiguates identical terms that are used to mean different things. Additionally, the placement of the term within the ontology provides a necessary part of its definition through the term’s relationships with its parents. For example, the term lateral ventricle development (GO:0021670) may at first glance be ambiguous, but upon viewing the ontology it is clear that this term is a descendant of central nervous system development (GO:0007417), and thus ‘ventricle’ in this context refers to a brain ventricle and not a heart ventricle.
Figure 2. Anatomy of a GO term. QuickGO view of GO (www.ebi.ac.uk/QuickGO) illustrates how each GO term has a unique GO ID number. The term name describes the overall concept, which is supplemented by what is often a very detailed definition. Some terms also have ...
GO allows for the description of processes that occur at multiple levels of biology: i) the organ level, ii) the multicellular (tissue) level and iii) the level of the single cell. It also allows for the description of generic processes that are used in multiple ways to accomplish a given objective. The developmental process section of the biological process ontology allows developmental events to be described from either an anatomical perspective or a process perspective. Standard developmental terms such as cell differentiation (GO:0030154) are defined generically and are then used consistently to describe the process in the context of all of the different processes in which it is involved (Hill et al., 2010). The following sections illustrate how the newly created ontology can be used to describe the heart developmental processes that occur at each of these three levels: heart morphogenesis at the organ level, cell differentiation in the heart at the cellular level, and signaling pathways to describe interactions between cells that make up tissues.
The terms and relationships in the ontology are carefully chosen so that the ontology remains correct regardless of species. Species neutrality enables experimentally supported annotations in model organisms to be transferred to human gene products, if appropriate. For example, based on the observation that expression of mouse Mesp1 in embryonic stem cells (ESCs) results in transcriptional regulation of key genes controlling early mesoderm and endoderm cell fates and promotes the progression of cells toward a cardiac fate (Bondue et al., 2008), one of the terms that was used to annotate the mouse Mesp1 gene product is cardiac cell fate determination (GO:0060913). At the time of writing there are 26 experimentally supported annotations associated with the mouse Mesp1 gene product, whereas there are 10 such annotations associated with the human MESP1 gene product. The unique mouse annotations have been transferred to the orthologous human protein, thus enhancing our knowledge about how the orthologous gene products might be involved in human biology or disease.
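At its simplest, identifying the annotations that are unique to one ortholog is a set difference, which can be sketched as follows. Only GO:0060913 is taken from the text; the `EXAMPLE:` identifiers are placeholders rather than real GO terms or real Mesp1/MESP1 annotations, and the function name is hypothetical. The sketch does not model the curator's judgment of whether a transfer is appropriate.

```python
def transferable_terms(source_terms, target_terms):
    """Terms annotated to the source ortholog but not yet to the target."""
    return set(source_terms) - set(target_terms)


# Placeholder annotation sets; EXAMPLE:* IDs are not real GO terms.
mouse_mesp1 = {"GO:0060913", "EXAMPLE:0000001", "EXAMPLE:0000002"}
human_mesp1 = {"EXAMPLE:0000001"}

# Candidate terms a curator could consider transferring to the human ortholog.
candidates = transferable_terms(mouse_mesp1, human_mesp1)

# Human annotation set if every candidate were judged appropriate.
human_after_transfer = human_mesp1 | candidates
```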