The Gene Ontology (GO) has proven to be a valuable resource for the functional annotation of gene products. At well over 27 000 terms, the descriptiveness of GO has grown rapidly in line with the biological data it represents. It is therefore vital that researchers can quickly and easily mine the functional information captured by the association of GO terms with gene products. QuickGO is a fast, web-based tool for browsing the GO and all associated GO annotations provided by the GOA group. Following a redevelopment, QuickGO now offers many more features beyond simple browsing. Users have responded well to the new tool and given very positive feedback about its usefulness. This tutorial demonstrates how some of these features can help researchers who want to discover more about their dataset or particular areas of biology, or to find new ways of directing their research.
Database URL: http://www.ebi.ac.uk/QuickGO
High-throughput sequencing and experimental methodologies have meant that there is an ever-increasing amount of biological data available to researchers, which must be effectively managed, analysed and interpreted. The Gene Ontology (GO) has proven highly useful in helping researchers find biological significance in high-throughput data by supplying a consistent and structured nomenclature for biological concepts. These terms have been employed by a large number of biological database groups to describe the functionality of specific gene products; the combination of a highly descriptive, structured vocabulary and associated gene product annotations has proven effective in ordering and interpreting large data sets.
GO is a controlled vocabulary describing three functional attributes of gene products: molecular function, biological process and cellular component. The terms within these three ontologies have unique identifiers and are organized as a directed acyclic graph (DAG), a hierarchical structure where each term can have one or many parent (less-specific) terms and zero, one or many child (more-specific) terms (Figure 1). Relationships between terms are indicated in the key to the right of the graph in Figure 1. There are currently five relationship types: is a, meaning that the term is a subclass of its parent, e.g. ‘transferase activity’ is a type of ‘catalytic activity’; part of, meaning that the term is part of the parent term, e.g. ‘nucleolus’ is part of ‘nuclear lumen’; regulates, meaning that the term is a process that modulates its parent process, e.g. ‘regulation of apoptosis’ describes the modulation of ‘apoptosis’; the final two relationships are positively regulates and negatively regulates, each describing the relevant modulation of a parent process term.
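Because each term can have several parents connected by different relationship types, finding all of a term's ancestors is a graph traversal rather than a simple walk up a tree. The minimal Python sketch below uses only the example edges named above (plus the link from ‘catalytic activity’ to the molecular_function root); the term names as keys, the `EDGES` structure and the `ancestors` helper are illustrative assumptions, not how QuickGO stores the ontology. Whether `regulates` edges should be traversed depends on the analysis, so the followed relationship types are a parameter.

```python
from collections import deque

# Toy fragment of the GO graph: child term -> list of (relationship, parent term).
# Edges come from the examples in the text; real GO terms carry GO:nnnnnnn
# identifiers and usually have many more parents.
EDGES = {
    "transferase activity": [("is_a", "catalytic activity")],
    "catalytic activity": [("is_a", "molecular_function")],
    "nucleolus": [("part_of", "nuclear lumen")],
    "regulation of apoptosis": [("regulates", "apoptosis")],
}


def ancestors(term, edges, follow=("is_a", "part_of")):
    """Return all ancestors of `term` reachable via the relationship types in `follow`."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in edges.get(current, []):
            # Only traverse the relationship types the caller asked for.
            if relation in follow and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("transferase activity", EDGES)))
    # ['catalytic activity', 'molecular_function']
```

Breadth-first search visits each ancestor once even when several paths lead to it, which matters in a DAG where a term can be reached through more than one parent.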
Associations between these terms and gene products are made by several biological databases that produce detailed functional descriptions of gene products. The Gene Ontology Annotation (GOA) group is one of 20 databases that form the GO Consortium. Each of these databases uses the common vocabulary of GO to annotate a range of gene products from different species in a consistent way. These associations, or annotations, can be made manually or computationally (electronically). Manual annotation is carried out by highly trained biologists who read published experimental literature, whereas electronic annotation involves the automatic assignment of GO terms to gene products. To read more about GOA's annotation methods, please see the ‘Annotation methods’ link on our website (http://www.ebi.ac.uk/GOA/). Gene products can have many GO annotations in each of the ontologies, and well-studied genes or proteins may have hundreds of annotations assigned by manual or computational methods; for example, the human p53 protein (http://www.ebi.ac.uk/QuickGO/GProtein?ac=P04637) has over 200 GO annotations.
As the number of sequenced genomes and characterized gene products increases, the number of GO annotations made from these data increases concomitantly (the GOA database contains over 45 million GO annotations as of July 2009). Researchers therefore need to be able to sort and view these annotations and quickly retrieve the information relevant to their research. All GO annotations from the GO Consortium member groups are available as Gene Association Files, which can be downloaded from the GO Consortium website (http://www.geneontology.org/) and from the individual database websites. These files have a very simple tab-delimited format; however, they are large and somewhat cryptic to a biologist, and extracting the subsets of information of interest from them requires some computational knowledge. QuickGO was developed by the GOA group in August 2001 as a fast, web-based browser of GO term information and of all GO annotations assigned to UniProt Knowledgebase (UniProtKB) accessions. In 2007, the GOA group was awarded a grant from the BBSRC Tools and Resources Development Fund to redevelop QuickGO with extensive new features, and the redeveloped version was released in March 2008. The GO annotations in the GOA database are now at the centre of QuickGO: users can customize annotation sets with the extensive filtering options provided, which include filters on protein accession, evidence code, taxonomic identifier and GO term. Filtering on GO terms also means that users can create GO slims, subsets of GO terms used to simplify the view of annotations to a set of gene products (Binns,D. et al., submitted for publication).
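To illustrate the kind of computational knowledge needed to work with gene association files directly, the Python sketch below parses tab-delimited annotation lines and filters them on accession, evidence code, taxon and GO ID, the same attributes QuickGO exposes as filters. It assumes the 15-column GAF 1.0 layout published by the GO Consortium (DB, object ID, symbol, qualifier, GO ID, reference, evidence code, with/from, aspect, name, synonym, type, taxon, date, assigned by), with ‘!’ marking header lines. The two annotation lines in `SAMPLE` are invented for illustration, and `parse_gaf` and `filter_annotations` are hypothetical helper names.

```python
import io

# Zero-based column positions in a 15-column GAF 1.0 line.
COLUMNS = {"db": 0, "accession": 1, "symbol": 2, "qualifier": 3, "go_id": 4,
           "evidence": 6, "aspect": 8, "taxon": 12}


def parse_gaf(handle):
    """Yield one dict per annotation line of a tab-delimited gene association file."""
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue  # '!' marks header and comment lines
        cols = line.rstrip("\n").split("\t")
        rec = {name: cols[i] for name, i in COLUMNS.items()}
        # The taxon column looks like 'taxon:9606'; a second, '|'-separated taxon
        # may follow for interacting organisms, so keep only the first.
        rec["taxon"] = rec["taxon"].split("|")[0].replace("taxon:", "")
        yield rec


def filter_annotations(records, accessions=None, evidence=None, taxon=None, go_ids=None):
    """Keep records that match every filter given; None means 'do not filter'."""
    for rec in records:
        if accessions is not None and rec["accession"] not in accessions:
            continue
        if evidence is not None and rec["evidence"] not in evidence:
            continue
        if taxon is not None and rec["taxon"] != taxon:
            continue
        if go_ids is not None and rec["go_id"] not in go_ids:
            continue
        yield rec


# Two made-up annotation lines in GAF 1.0 layout.
SAMPLE = "\n".join([
    "!gaf-version: 1.0",
    "\t".join(["UniProtKB", "EXAMPLE1", "GENE1", "", "GO:0005634", "PMID:0000000",
               "IDA", "", "C", "made-up protein 1", "", "protein", "taxon:9606",
               "20090701", "UniProtKB"]),
    "\t".join(["UniProtKB", "EXAMPLE2", "GENE2", "", "GO:0003824", "GO_REF:0000000",
               "IEA", "", "F", "made-up protein 2", "", "protein", "taxon:7955",
               "20090701", "UniProtKB"]),
])

if __name__ == "__main__":
    records = parse_gaf(io.StringIO(SAMPLE))
    for rec in filter_annotations(records, taxon="7955"):
        print(rec["accession"], rec["go_id"], rec["evidence"])
    # EXAMPLE2 GO:0003824 IEA
```

A downloaded file could be passed to `parse_gaf` via `open(path)`, or `gzip.open(path, "rt")` for a compressed copy. Streaming with generators keeps memory use low, which matters for files of the size described above.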
A number of different web-based GO browsers are publicly available (see the GO Consortium Tools website: http://www.geneontology.org/GO.tools.browsers.shtml). The vast majority provide equivalent detail on the terms and structure of the GO; the main differences between browsers lie in how they display and manipulate the associated annotations. Several GO browsers are provided by model organism groups and display the full set of electronic and manual GO annotations for individual species, such as the MGI GO browser (http://www.informatics.jax.org/searches/GO_form.shtml) (1), whereas the GO Consortium browser, AmiGO (http://amigo.geneontology.org/cgi-bin/amigo/go.cgi) (2), provides a comprehensive display of the manual annotations supplied by the groups in the GO Consortium. AmiGO is the GO browser most comparable to QuickGO: the ontology can be searched and browsed, terms and their relationships can be viewed in the context of the GO hierarchy, GO annotations can be viewed and downloaded for multiple species, and it is updated frequently. Like QuickGO, AmiGO also has a GO slim facility that maps annotations up to more general GO terms to give a simplified overview of the attributes of a list of gene products.
In addition, an increasing number of publicly available tools from third-party groups have been created to enable the manipulation and analysis of the GO ontologies and annotations in the context of other public ontology efforts and gene expression data [e.g. Ontology Lookup Service (http://www.ebi.ac.uk/ontology-lookup/) (3), Gene Class Expression (http://gdm.fmrp.usp.br/cgi-bin/gc/upload/upload.pl) (4)].
QuickGO is unique among these browsers: it is the only web-based browser that displays annotations for almost 190 000 species, including both manually and electronically assigned annotations, and it can extensively filter on a number of annotation attributes and map between 17 different identifier types. These capabilities are of particular interest to researchers who require functional predictions for genes or proteins from non-model organisms.
QuickGO is updated weekly with GO annotations and nightly with GO term information, making it one of the most up-to-date GO browsers available. This is a critical feature of GO browsers, and of GO analysis tools in general, because both the ontology and the annotations are constantly growing and being updated. Unfortunately, some GO browsers have a long lag between updates [e.g. Gofetcher (http://mcbc.usm.edu/gofetcher/home.php) (5), GenNav (http://mor.nlm.nih.gov/perl/gennav.pl)], so users may, sometimes unwittingly, be working with old data.
QuickGO is linked from a range of text-mining, protein-analysis and gene expression-analysis tools and from protein databases. These tools provide a wide range of services but use QuickGO as their primary source of GO term and annotation information. Text-mining tools linking to QuickGO include GOCat (http://eagl.unige.ch/GOCat/) (6) and EBIMed (http://www.ebi.ac.uk/Rebholz-srv/ebimed/index.jsp) (7), which can analyse either blocks of text or PubMed identifiers to predict GO terms that could be associated with that text. Other analysis tools linking to QuickGO include InterProScan (http://www.ebi.ac.uk/Tools/InterProScan/) (8), a protein signature recognition tool, and DAVID (http://david.abcc.ncifcrf.gov/) (9), a functional enrichment analysis tool. Major databases that link to QuickGO term and annotation data include the Pfam protein family database (http://pfam.sanger.ac.uk/) (10), UniProtKB (http://www.uniprot.org/) (11) and the Human Protein Atlas (http://www.proteinatlas.org/) (12).
This article aims to provide users with an easy-to-follow guide to some of the more complex and novel functions that QuickGO can perform. Researchers can then apply this knowledge to their own datasets, making it easier to draw conclusions about their chosen area of research. Some of the examples are taken from real-life tasks commonly requested by our users through the GO (gohelp@genome.stanford.edu) and GOA (goa@ebi.ac.uk) helpdesks.
One of the great advantages of QuickGO is that it is very easy to start browsing the GO and its associated annotations. There is no software to download and the basic search interface is intuitive for novice users. Before tackling more complex tasks, we will begin with a ‘quick-start’ guide to QuickGO.
The front page of QuickGO is shown in Figure 2; from this page you have access to the majority of QuickGO's features, which are labelled in the figure. To assist new users, example queries are included below the search box.
A simple example of a GO term search is to type ‘nucleus’ into the search box; this returns a list of GO terms that contain the word ‘nucleus’ in their name, definition or synonym fields. The information page for a term can be viewed by clicking on its GO ID link. Figure 3 shows the information page for ‘nucleus’. The page is organized into tabs (Term Information, Ancestor Chart, Ancestor Table, Child Terms, Protein Annotation and Co-occurring Terms), which together contain all the information for a GO term: its GO ID, definition and synonyms, its position within the ontology, its relationships to its parent and child terms, the proteins associated with the term and the terms most commonly co-assigned with it.
One of QuickGO's strengths is its ability to extensively filter annotation data. This section provides a quick-start guide to the most useful ways of filtering annotations. Filtering can be performed on any protein annotation table, e.g. the Annotation Download table (http://www.ebi.ac.uk/QuickGO/GAnnotation) or the Protein Annotation table associated with a single GO term, by using the blue ‘Filter’ boxes at the head of most columns. To reach the Annotation Download table from the QuickGO front page, click on the ‘Find, View and Download sets of GO annotations’ link. The resulting table contains all the GO annotations in the GOA database, so it is a good place to start customizing an annotation set.
We will now see how more complex questions can be answered with the help of QuickGO.
The first two cases are queries that were sent to the GO helpdesk and represent what many users would like to achieve. The final two cases describe more novel applications of QuickGO that researchers may find useful.
Topics covered include:
Question: “I'm currently working on zebrafish, and I would like to get a list of all genes implied in the development. What is the easiest way to get that list? The best for me would be to get Ensembl IDs of the genes, but other IDs would be ok.”
The user might otherwise have to consult several different resources to find the information required; however, this question can be answered completely and easily with QuickGO alone by using its filtering and identifier-mapping capabilities, which let users view annotations with identifiers such as UniProtKB, Ensembl, RefSeq and FlyBase.
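Outside QuickGO, the same query amounts to three steps: expand the process term to include all of its descendants (so that annotations to more specific development terms are counted), restrict the annotations to zebrafish (NCBI taxonomy ID 7955), and map the resulting accessions to Ensembl identifiers. The Python sketch below assumes the annotations are already loaded as (accession, GO term, taxon) tuples and that an accession-to-Ensembl mapping table is available; `descendants`, `genes_for_process` and all term names, accessions and identifiers in the usage example are hypothetical.

```python
from collections import defaultdict


def descendants(term, child_to_parents):
    """Return every term that has `term` as an ancestor."""
    parent_to_children = defaultdict(set)
    for child, parents in child_to_parents.items():
        for parent in parents:
            parent_to_children[parent].add(child)
    found, stack = set(), [term]
    while stack:
        for child in parent_to_children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def genes_for_process(annotations, term, child_to_parents, taxon, id_map):
    """Identifiers of gene products from `taxon` annotated to `term` or any descendant.

    Accessions missing from `id_map` are returned unchanged, since the user
    would accept other IDs when no Ensembl ID is available.
    """
    wanted = {term} | descendants(term, child_to_parents)
    hits = {acc for acc, go_term, tax in annotations if tax == taxon and go_term in wanted}
    # A set removes duplicates when several accessions map to the same ID.
    return sorted({id_map.get(acc, acc) for acc in hits})


if __name__ == "__main__":
    # Hypothetical toy data: term names, accessions and Ensembl-style IDs are made up.
    parents = {
        "embryo development": {"development"},
        "organ development": {"development"},
        "heart development": {"organ development"},
    }
    annotations = [
        ("ZF1", "heart development", "7955"),
        ("ZF2", "apoptosis", "7955"),
        ("ZF3", "embryo development", "7955"),
        ("HS1", "embryo development", "9606"),
    ]
    print(genes_for_process(annotations, "development", parents, "7955", {"ZF1": "ENSDARG_EX1"}))
    # ['ENSDARG_EX1', 'ZF3']
```

Including descendants is what lets a single high-level term such as ‘development’ capture proteins annotated only to more specific child terms.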
Here is how to obtain these results with QuickGO:
In these few simple steps, therefore, we have retrieved a set of annotations that would otherwise have required several resources, or more advanced computing knowledge to extract the information from a gene association file.
Topics covered include:
Question: “I have a list of SwissProt accession numbers from a proteomics experiment. I am looking for a tool that will let me input (in batch mode) this list of accession numbers and give as output the GO annotation for cellular localization. I prefer this to be in a tab delimited format, such that the GO annotations can be viewed in Excel.”
Since we do not have the list of identifiers from the user, for this exercise we will use a list of breast cancer-associated proteins which were identified in a study by Tripathi et al. (13). The list is supplied as Supplementary Data available from the journal website.
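For readers who want to see what the requested batch output involves, the Python sketch below takes a list of accessions and returns their cellular component annotations as tab-delimited text, the format the user wanted to open in Excel. It assumes each annotation is a dict carrying the accession, GO ID, term name, aspect (‘C’ for cellular component, as in gene association files) and evidence code; `cellular_component_table` is a hypothetical helper and the accessions in the test data are made up.

```python
import csv
import io


def cellular_component_table(accessions, annotations):
    """Return tab-delimited text with the cellular component annotations of `accessions`.

    Each annotation is a dict with 'accession', 'go_id', 'go_name', 'aspect'
    ('P', 'F' or 'C') and 'evidence' keys.
    """
    wanted = set(accessions)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["accession", "go_id", "go_name", "evidence"])
    for rec in annotations:
        # Aspect 'C' marks the cellular component ontology.
        if rec["accession"] in wanted and rec["aspect"] == "C":
            writer.writerow([rec["accession"], rec["go_id"], rec["go_name"], rec["evidence"]])
    return out.getvalue()
```

Writing the returned string to a plain-text file gives a tab-delimited table that spreadsheet programs such as Excel can import. Using the `csv` module rather than joining strings by hand ensures that any field containing a tab or quote is escaped correctly.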
When users have a large number of proteins or genes that they would like to characterize functionally (as in user case 2, above), they may be interested in generating an overview of the main cellular compartments (or molecular functions or biological processes) in which the proteins are found. Such an analysis can be achieved with a GO slim, a subset of more general GO terms. Annotation sets can be ‘mapped up’ to selected high-level terms using the ‘true path rule’, which states that if a protein is associated with a GO term, the association must be equally true for all the parents of that term. For example, if a protein is directly annotated to the term ‘Golgi apparatus’, it must also be true that the protein is part of the cytoplasm, because ‘cytoplasm’ is a parent term of ‘Golgi apparatus’. QuickGO contains various GO slims maintained by the GO Consortium, including slims targeted towards a particular taxonomic range, e.g. yeast or plant, as well as more general GO slims applicable to many species and areas of biology. QuickGO users can use such pre-defined slims directly, modify them or create their own. GO slims are a common way to summarize the functional attributes of a list of genes or proteins from large-scale studies and can give added meaning to a dataset (14–16).
The user from Case 2 might also have been interested in an overview of the cellular compartments in which their proteins are located. We will now see how the user could have achieved this in QuickGO using the 12 cellular component terms from the GOA slim.
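The mapping-up step that a GO slim performs can be sketched in a few lines of Python. Following the true path rule, each annotation is propagated to the annotated term and all of its ancestors, and a protein is counted under every slim term reached. The toy hierarchy, the two-term slim and the protein IDs in the usage example are assumptions made for illustration (only the Golgi apparatus to cytoplasm link comes from the text); they are not the real 12-term GOA slim, and `slim_counts` is a hypothetical helper.

```python
def ancestors_or_self(term, parents):
    """Return `term` together with every ancestor reachable through `parents`."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def slim_counts(annotations, slim_terms, parents):
    """Map (protein, term) annotations up to `slim_terms`; count distinct proteins per slim term.

    By the true path rule, an annotation to a term also holds for all of its
    ancestors, so a protein is counted under every slim term that is the
    annotated term itself or one of its ancestors.
    """
    slim = set(slim_terms)
    proteins = {s: set() for s in slim_terms}
    for protein, term in annotations:
        for s in ancestors_or_self(term, parents) & slim:
            proteins[s].add(protein)
    return {s: len(members) for s, members in proteins.items()}


if __name__ == "__main__":
    # Toy hierarchy and annotations; not the real GOA slim.
    parents = {
        "Golgi apparatus": ["cytoplasm"],
        "nucleolus": ["nuclear lumen"],
        "nuclear lumen": ["nucleus"],
    }
    annotations = [("P1", "Golgi apparatus"), ("P1", "cytoplasm"),
                   ("P2", "nucleolus"), ("P3", "cytoplasm")]
    print(slim_counts(annotations, ["cytoplasm", "nucleus"], parents))
    # {'cytoplasm': 2, 'nucleus': 1}
```

Counting distinct proteins rather than annotations means that P1, annotated both to ‘Golgi apparatus’ and directly to ‘cytoplasm’, is counted only once under ‘cytoplasm’.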
Topics covered include:
QuickGO is well placed for finding gene products that have only predicted (electronic) evidence for a particular attribute, since it includes annotations applied by electronic prediction methods. In addition to the filtering options in QuickGO, annotation sets can be customized using the ‘Advanced Search’, which uses Boolean operators to construct more complex queries. The ability to find gene products that have only a predicted function could be valuable to researchers looking to focus their research. For example, a scientist studying serine-type endopeptidases may want to find proteins that are predicted to have this activity but for which no experimental assay has been performed. Such a list could then be used in further investigations of whether the predictions are true. Documentation on how to use the Advanced Search can be found in the QuickGO Reference Manual (http://www.ebi.ac.uk/QuickGO/reference.html#advanced_annotation).
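The Boolean logic behind such a query, roughly ‘annotated with IEA AND NOT annotated with any experimental evidence code’, can be sketched in Python as a set difference. The sketch assumes annotations are (protein, term, evidence code) tuples and that the caller supplies the target term together with its descendants. It treats the GO experimental codes EXP, IDA, IPI, IMP, IGI and IEP as ‘assayed’ and ignores other codes, such as ISS (inferred from sequence or structural similarity); that treatment is a design choice, not QuickGO's definition. `predicted_only` and the protein IDs in the example are hypothetical.

```python
# GO evidence codes for experimental support.
EXPERIMENTAL_CODES = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})


def predicted_only(annotations, target_terms, experimental=EXPERIMENTAL_CODES):
    """Proteins with an IEA annotation to a target term but no experimental one.

    `target_terms` should contain the term of interest and its descendants.
    """
    predicted, assayed = set(), set()
    for protein, term, code in annotations:
        if term not in target_terms:
            continue
        if code == "IEA":
            predicted.add(protein)
        elif code in experimental:
            assayed.add(protein)
    # Set difference implements 'IEA AND NOT experimental'.
    return sorted(predicted - assayed)


if __name__ == "__main__":
    annotations = [
        ("P1", "serine-type endopeptidase activity", "IEA"),
        ("P2", "serine-type endopeptidase activity", "IEA"),
        ("P2", "serine-type endopeptidase activity", "IDA"),
        ("P3", "serine-type endopeptidase activity", "ISS"),
    ]
    print(predicted_only(annotations, {"serine-type endopeptidase activity"}))
    # ['P1']
```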
Here is how a query for finding predicted serine-type endopeptidases would be performed in QuickGO.
Of course, it is important to note that gene products may actually have published experimental evidence for an activity which has not yet been annotated. The scientist must decide which proteins are worth investigating, but this search is a good place to start.
Topics covered include:
A biological process or pathway consists of a number of steps, each of which may be controlled by different proteins. In the literature it is easy to find links between a protein's activity and a process it is involved in; for example, an author studying phosphofructokinase enzymes may describe their involvement in glycolysis. In GO, however, these attributes are divided between two separate ontologies: molecular function and biological process. There are currently no links between these ontologies, so it can be quite difficult to determine which activities are linked to which processes. Similarly, it is common knowledge that glycolysis occurs in the cytoplasm but, in GO, there is no link between biological process and cellular component that would indicate this.
QuickGO includes a feature that allows users to view which GO terms are commonly co-annotated to gene products. The next example demonstrates how co-occurrence statistics can be used to infer information about a process, activity or subcellular location simply by viewing which GO terms are commonly co-annotated to the same gene products.
This example will use the term ‘apoptosis’ to infer what types of enzyme activities are associated with this process and where in the cell proteins involved in apoptosis are located.
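The idea behind co-occurrence can be sketched in Python: collect the proteins annotated to the chosen term and, for every other term, compute the fraction of those proteins that also carry it. This is the simplest possible measure and an assumption for illustration; QuickGO's own co-occurrence statistics may be calculated differently. `co_occurring_terms` is a hypothetical helper, and the annotations in the test data are made up and imply nothing about real apoptosis proteins.

```python
from collections import Counter, defaultdict


def co_occurring_terms(annotations, target):
    """For each other term, the fraction of `target`-annotated proteins that also carry it."""
    terms_by_protein = defaultdict(set)
    for protein, term in annotations:
        terms_by_protein[protein].add(term)
    with_target = [terms for terms in terms_by_protein.values() if target in terms]
    if not with_target:
        return {}
    counts = Counter(t for terms in with_target for t in terms if t != target)
    # most_common() lists the most frequently co-annotated terms first.
    return {t: c / len(with_target) for t, c in counts.most_common()}
```

Grouping annotations by protein first ensures that each protein contributes at most once per term, even if it has several annotations to that term with different evidence codes.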
QuickGO is a fast, web-based tool for the Gene Ontology. More than just a simple GO browser, it is also a tool for the analysis of GO terms and GO annotations. It has proven useful both to curators browsing for GO terms and to users wanting to analyse gene or protein lists from large-scale experiments. QuickGO was recently redeveloped to include more advanced features, such as the ability to: retrieve annotations for either a list of gene products or a list of GO terms; create customized annotation sets that can be bookmarked for later retrieval; provide statistics on various aspects of an annotation set; perform GO slim analysis; query the GOA database using Boolean operators; download sets of annotations or protein lists; find GO terms that are commonly co-annotated; and compare two or more GO terms and their relationships in a chart diagram. This tutorial has taken an in-depth look at some of the more complex tasks that can be performed in QuickGO, of which users may not be aware, and we hope it has demonstrated that even these complex tasks do not require the significant programming knowledge often needed when using other GO analysis tools.
GOA website: http://www.ebi.ac.uk/GOA/
QuickGO video tutorials: http://www.ebi.ac.uk/QuickGO/tutorial.html
QuickGO FAQs: http://www.ebi.ac.uk/QuickGO/FAQs.html
GO Consortium: http://www.geneontology.org/
GO Consortium evidence code guide: http://www.geneontology.org/GO.evidence.shtml
Other GO browsers/analysis tools: http://www.geneontology.org/GO.tools.shtml
GO helpdesk: gohelp@genome.stanford.edu
Contact GOA: goa@ebi.ac.uk
Supplementary data are available at Database Online.
The Biotechnology and Biological Sciences Research Council, Tools and Resources Fund (BB/E023541/1); the National Human Genome Research Institute (HG002273); and core EMBL funding. Funding for open access charge: National Human Genome Research Institute (2P41HG02273-07).
Conflict of interest. None declared.
We would like to thank Ruth Lovering and Varsha Khodiyar for testing of QuickGO during its redevelopment and Yasmin Alam-Faruque for critical reading of the manuscript.