The Gene Ontology (GO) resource provides dynamic controlled vocabularies to aid in describing the functional attributes and subcellular locations of gene products from all taxonomic groups (www.geneontology.org). A renal-focused curation initiative, funded by Kidney Research UK and supported by the GO Consortium, has started at the European Bioinformatics Institute. It aims to provide a detailed GO resource for mammalian proteins implicated in renal development and function. This report outlines the aims of the initiative and explains how the renal community can become involved to help improve the availability, quality and quantity of GO terms and their associations with specific proteins.
Over the last decade the renal research community has embraced proteomic and genomic investigative methods to identify, quantify and characterize pathways and networks associated with the renal system.1 For example, several recent proteomic analyses have identified novel potential susceptibility genes and proteins associated with various aspects of renal function, development and disease, whose role and mode of action within the renal system remain unclear.2–9 A number of renal genome and proteome databases also exist, providing the scientific community with central repositories of renal-related physiological data, published and unpublished mass spectrometry data, and microarray data.10–13 Although these high-throughput resources are extremely powerful for investigating multifactorial phenotypes such as renal disease, they also confront scientists with the increasingly complex task of identifying, evaluating and managing the existing biological information for these large sequence sets. Hence, there is a need both for effective bioinformatics tools and for a supply of high-quality, detailed annotations that support rapid evaluation of new experimental data and the generation of hypotheses.
The overall objective of the Renal Gene Ontology Annotation (GOA) Initiative is to provide a unique public resource of comprehensive functional annotations for proteins implicated in renal development, function and disease. The initiative aims to summarize the accumulated experimental knowledge about these proteins using the widely used, structured Gene Ontology (GO) vocabulary, both by improving the descriptiveness of terms for renal processes and by increasing the number of associations between renal proteins and information-rich GO terms. These efforts will help ensure that the vast body of published research on renal development and function can be fully exploited by the renal research community to guide future research towards alleviating renal disease.
Using structured controlled vocabulary terms, the GO project aims to fully describe three aspects of a gene product’s attributes: its molecular function(s), i.e., the activities it can directly perform; the biological process(es) to which it contributes; and the subcellular locations (cellular components) in which it is found.14,15 For more informative and specific descriptions, new cross-product GO terms are created by combining existing GO terms with terms from other ontologies, such as the Cell Type ontology16 and the anatomy ontologies.17 Currently, over 29,800 GO terms exist, describing this wide range of concepts at differing levels of specificity.
GO terms are organized into directed acyclic graphs (DAGs), hierarchical arrangements in which a term can link to one or more general “parent” terms and to zero, one or more specific “child” terms. For example, the GO term “alcohol catabolic process” (GO:0046164) has two parent terms, “catabolic process” (GO:0009056) and “alcohol metabolic process” (GO:0006066), and 14 child terms, including “phenol catabolic process” (GO:0019336) and “ethanolamine catabolic process” (GO:0046336). Each GO term has a unique, numerical, stable identifier, e.g., GO:0070634; a term name, e.g., “transepithelial ammonium transport”; and a definition (Fig. 1).18 A number of distinct relationships can exist between terms, capturing how each term relates to others in the ontology. The two standard relationships are “is a” and “part of.” All terms are linked by the “is a” relationship, which describes subclasses of concepts; e.g., “mitochondrion” (GO:0005739) is an “intracellular organelle” (GO:0043229), which in turn is an “organelle” (GO:0043226). The “part of” relationship represents part-whole relationships between terms; e.g., the “replication fork” (GO:0005657) is part of the “chromosome” (GO:0005694). Newer relationships are being added to make the ontologies more descriptive: the “regulates” relationship (where one process directly affects the manifestation of another process or quality), together with “negatively regulates” and “positively regulates,” has been included in the GO since March 2008. This structure of terms joined by multiple, defined relationships gives users considerable flexibility in navigating the ontology. A user can expand an area of the GO to see the most detailed descriptions of specific functions, or group terms via specific relationships to gain an overview of the functions that different gene products share.
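To make the graph structure concrete, the following minimal Python sketch encodes only the terms and relationships quoted above as a small in-memory DAG and walks it upwards to collect every ancestor of a term. It is an illustration, not GO Consortium software: the dictionary layout and the function names `ancestors` and `children` are hypothetical, and the edges from “alcohol catabolic process” to its parents and children are assumed to be “is a” relationships, since the text does not state their type.

```python
from collections import deque

# Toy fragment of the GO DAG using only terms quoted in the text.
# Each key is a child term; each value lists (relationship, parent term) pairs.
# Relationship types for the alcohol catabolic process edges are ASSUMED to be is_a.
EDGES = {
    "GO:0005739": [("is_a", "GO:0043229")],     # mitochondrion -> intracellular organelle
    "GO:0043229": [("is_a", "GO:0043226")],     # intracellular organelle -> organelle
    "GO:0005657": [("part_of", "GO:0005694")],  # replication fork -> chromosome
    "GO:0046164": [("is_a", "GO:0009056"),      # alcohol catabolic -> catabolic process
                   ("is_a", "GO:0006066")],     # alcohol catabolic -> alcohol metabolic
    "GO:0019336": [("is_a", "GO:0046164")],     # phenol catabolic process
    "GO:0046336": [("is_a", "GO:0046164")],     # ethanolamine catabolic process
}


def ancestors(term, follow=("is_a", "part_of")):
    """Return every term reachable upwards from `term` via the given relationships."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in EDGES.get(current, []):
            # A term may be reachable by several paths; record each ancestor once.
            if relation in follow and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def children(term):
    """Return the direct child terms of `term`, whatever the relationship."""
    return {child for child, parents in EDGES.items()
            for _, parent in parents if parent == term}


print(sorted(ancestors("GO:0005739")))  # ['GO:0043226', 'GO:0043229']
print(sorted(children("GO:0046164")))   # ['GO:0019336', 'GO:0046336']
```

Restricting `follow` to `("is_a",)` shows how grouping terms by a specific relationship changes the result: the replication fork then has no ancestors in this fragment, because its only link is “part of.”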
A wide range of model organism and cross-species database groups participate in the GO Consortium (GOC); these groups apply automated prediction and manual curation methods to generate associations, or “annotations,” between specific GO terms and gene products.19–22 Each GO annotation additionally includes: (1) a reference indicating the source of the data supporting the annotation (a PubMed identifier or a publicly available description of an electronic annotation method); and (2) an evidence code, a short (usually three-letter) acronym describing the type of investigation in the cited reference that supports the annotation.23,24
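The two additional fields described above can be pictured as parts of a simple record. The sketch below is a hypothetical Python data structure, not the GO annotation file format; the protein accessions and the PubMed identifier are placeholders, the GO_REF number is illustrative, and only a small subset of the real evidence codes is listed (for example IDA, “inferred from direct assay,” and IEA, “inferred from electronic annotation”).

```python
from dataclasses import dataclass

# Illustrative subsets of GO evidence codes (not the complete list).
EXPERIMENTAL_CODES = {"IDA", "IMP", "IGI", "IPI", "IEP"}  # experimental evidence
ELECTRONIC_CODES = {"IEA"}                                # inferred from electronic annotation


@dataclass(frozen=True)
class Annotation:
    """One association between a gene product and a GO term (hypothetical layout)."""
    gene_product: str  # protein accession; placeholder values are used below
    go_id: str         # stable GO identifier, e.g. "GO:0070634"
    reference: str     # "PMID:..." for a paper, "GO_REF:..." for an electronic method
    evidence: str      # evidence code, e.g. "IDA" or "IEA"

    def __post_init__(self):
        # A GO identifier is "GO:" followed by seven digits.
        digits = self.go_id[3:]
        if not (self.go_id.startswith("GO:") and len(digits) == 7 and digits.isdigit()):
            raise ValueError(f"malformed GO identifier: {self.go_id!r}")
        if not self.reference.startswith(("PMID:", "GO_REF:")):
            raise ValueError(f"unsupported reference: {self.reference!r}")

    @property
    def is_electronic(self) -> bool:
        return self.evidence in ELECTRONIC_CODES

    @property
    def is_experimental(self) -> bool:
        return self.evidence in EXPERIMENTAL_CODES


# Placeholder records: "PROT_A", "PROT_B" and the PMID are not real identifiers.
manual = Annotation("PROT_A", "GO:0070634", "PMID:00000000", "IDA")
electronic = Annotation("PROT_B", "GO:0003989", "GO_REF:0000002", "IEA")
```

Filtering on `is_electronic` is one way a user might separate curator-made annotations from electronic predictions when that distinction matters for an analysis.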
Electronic annotation pipelines particularly benefit GO users working on non-model organisms or uncharacterized sequences because, when used conservatively, they can rapidly produce large numbers of annotations, either from sequence data or by “translating” annotations made to external controlled vocabularies.19,24 The GOC currently uses a limited number of electronic pipelines; the most widely used applies protein signatures from the InterPro resource to predict functional attributes. For example, the protein signature “IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit” identifies a functionally similar set of proteins, all of which are assigned the GO term “GO:0003989 Acetyl-CoA carboxylase activity.”25 Another electronic method, which improves the consistency of annotation between orthologs, uses the Ensembl Compara orthology resource to transfer manual GO annotations between 1:1 orthologs in over 45 different species (http://www.ebi.ac.uk/GOA/compara_go_annotations.html).
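Both electronic strategies amount to rule-based transfer of GO terms. The minimal Python sketch below works under simplifying assumptions: the signature-to-GO table contains only the InterPro example above, and ortholog pairs are assumed to come from a single species pair, so that “1:1” means each protein appears in exactly one pair. All protein and function names are hypothetical, and this is not the GOA production pipeline.

```python
from collections import Counter

# Signature-to-GO table; its single entry is the example given in the text.
INTERPRO_TO_GO = {"IPR000438": ["GO:0003989"]}


def annotate_from_signatures(protein_signatures):
    """Map each protein's matched InterPro signatures to GO terms (evidence IEA).

    protein_signatures: dict of protein -> set of matched InterPro IDs.
    """
    predicted = set()
    for protein, signatures in protein_signatures.items():
        for signature in signatures:
            for go_id in INTERPRO_TO_GO.get(signature, []):
                predicted.add((protein, go_id, "IEA"))
    return predicted


def transfer_one_to_one(manual_terms, ortholog_pairs):
    """Copy manual GO terms across ortholog pairs, keeping only 1:1 relationships.

    manual_terms: dict of source protein -> set of manually assigned GO IDs.
    ortholog_pairs: list of (source protein, target protein) pairs from ONE species pair.
    """
    source_counts = Counter(src for src, _ in ortholog_pairs)
    target_counts = Counter(tgt for _, tgt in ortholog_pairs)
    transferred = set()
    for src, tgt in ortholog_pairs:
        # Skip one-to-many and many-to-one orthologs, where function may have diverged.
        if source_counts[src] == 1 and target_counts[tgt] == 1:
            for go_id in manual_terms.get(src, set()):
                transferred.add((tgt, go_id, "IEA"))
    return transferred


# Hypothetical proteins: HUMAN_B has two mouse orthologs, so nothing is transferred for it.
by_signature = annotate_from_signatures({"PROT_X": {"IPR000438"}, "PROT_Y": set()})
by_orthology = transfer_one_to_one(
    {"HUMAN_A": {"GO:0070634"}, "HUMAN_B": {"GO:0005739"}},
    [("HUMAN_A", "MOUSE_A"), ("HUMAN_B", "MOUSE_B1"), ("HUMAN_B", "MOUSE_B2")],
)
```

Restricting transfer to 1:1 orthologs is the conservative choice the text describes: where a gene has duplicated, the copies may have acquired different functions, so copying annotations to all of them risks spreading errors.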
However, although electronic annotation can produce many millions of valid annotations in a short time, these methods have limitations. Automatic methods can often predict only high-level, less detailed GO terms, and they rely on manual annotation by external groups to ensure their correctness. Nor can electronic methods capture the details of new, highly valuable experimental results reported in peer-reviewed publications. Hence manual annotation is employed: highly trained curators read and evaluate the evidence in the published literature, associate appropriate GO terms with proteins and choose the most appropriate evidence code for each annotation, producing a detailed summary of the knowledge about a protein (Fig. 2).18 Manual annotation is undoubtedly labor-intensive; however, it produces more annotations per protein, using GO terms that are far more informative and accurate than the current electronic pipelines can achieve.22 Manual curation also allows curators to monitor electronic predictions of annotations to specific protein families and, when necessary, to improve or correct them.24
Annotating the human genome comprehensively with GO is an arduous task, and although several approaches are currently being used to achieve this,19–22 more manual annotation is essential. The biological insights offered by large-scale genetic, genomic and proteomic studies can be difficult to extract and depend largely on computational analyses that incorporate functional annotation datasets. In some cases the current annotation datasets restrict the interpretation of these large-scale results, because the quality and quantity of GO annotations vary greatly between gene products.26 The GO annotation dataset provided by the GOC is one of the most widely used resources in secondary biomedical data analysis, helping researchers to interpret and validate their data and to form hypotheses. For example, RamachandraRao et al.27 recently analysed GO annotations applied to protein-protein interactions and suggested that pirfenidone, an antifibrotic agent that is renoprotective in diabetic kidney disease, may act in part by regulating RNA processing.
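One common way such computational analyses draw on GO annotations is a term-enrichment test, which asks whether a GO term is annotated to more genes in an experimental gene list than chance would predict. The Python sketch below uses a one-sided hypergeometric test; it is offered as a generic illustration (it is not claimed to be the method used by RamachandraRao et al.), and the gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric test: probability of seeing k or more annotated genes.

    N: genes in the background set (e.g. all annotated genes in the genome)
    K: background genes annotated to the GO term of interest
    n: genes in the experimental list (e.g. proteins changed in a proteomic study)
    k: genes in the list that are annotated to the term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    # Sum P(X = i) = C(K, i) * C(N - K, n - i) / C(N, n) over every i >= k.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical counts: 200 of 20,000 genes carry the term (about 1 expected in a
# 100-gene list), but 8 genes in the list are annotated to it.
p = enrichment_p_value(N=20_000, K=200, n=100, k=8)
```

Because GO is hierarchical, such tests are usually run after propagating each annotation to all ancestor terms, and a multiple-testing correction is needed when many terms are tested. The test can only detect what has been annotated, which is why the variable quality and quantity of annotations noted above limits the interpretation of large-scale results.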
Currently, the number of GO terms describing kidney development or renal-related processes such as fluid volume regulation and detoxification is very limited. The aim of the Renal GOA Initiative is therefore not only to generate detailed manual GO annotations, but also to develop and improve terms in the Gene Ontology so that the whole of renal biology is well represented. We believe the entire renal research community will benefit if this central information resource is improved, yielding an annotation dataset that renal biologists can use with confidence. The resource could also be very useful for the many existing renal genome and proteome databases: displaying the relevant GO annotations for their renal-specific datasets would provide consistency and a useful link between these databases, and would increase their visibility in other high-profile databases, which at present seems to be lacking.
For the Renal GOA Initiative to have a large impact in the area of renal biology, it is important that experts from the renal community be consulted to ensure that the current accumulated knowledge has been comprehensively reviewed and correctly summarized by the dedicated curation team. Consequently, an international scientific advisory panel exists for consultation (http://www.ebi.ac.uk/GOA/kidney/) and a range of on-line facilities have been made available to encourage renal scientists to review and comment on the annotations or renal-related GO terms and to suggest publications or proteins for curation:
Although final annotation decisions are made by the professional curators, individual researchers may contribute to the Renal GOA Initiative simply to ensure that their gene(s) of interest are well curated or that data from their own publications are annotated (and hence promoted in several highly visible databases). All contributions will be much appreciated by the GO curation teams and, on request, a record of contributors will be kept so that their participation can be publicly acknowledged.
Collaborations have been initiated with a number of external and internal groups. Work with the Genitourinary Development Molecular Anatomy Project team (GUDMAP, http://www.gudmap.org/) has begun to review the renal GO terms that currently exist in the ontology in relation to nephrogenesis, and has led to the creation of additional development terms in line with the GUDMAP anatomy ontology (http://www.gudmap.org/Resources/Ontologies.html).
An association with the Reactome group at the EBI has led to the addition of further members of the solute-carrier transmembrane transporter protein superfamily to the Reactome database (http://www.reactome.org/). These proteins, as well as others (ion channels, proton pumps and aquaporins), can be viewed selectively using the keyword “kidney,” so that the reaction pathways in which these gene products are involved can be analyzed.
Collaborations within UniProt and with other model organism databases, including FlyBase and AgBase, have begun to improve the annotation of renal-related proteins in non-mammalian organisms, which will also ensure accurate description of excretory and osmoregulatory systems in these species. Such work should also highlight biological similarities and differences between orthologous gene products in distinct species.
The Renal GOA Initiative is funded by the Kidney Research UK Project Grant RP26/2008. The GOA project is supported by the National Institutes of Health grant R01HG02273-02, the British Heart Foundation grant SP:07/007/23671 and EMBL.
We would like to thank the GO editorial team, Bijay Jassal from Reactome and the members of the Edinburgh team of the GUDMAP Consortium.
Previously published online: www.landesbioscience.com/journals/organogenesis/article/11294