The Gene Ontology (GO) initiative is a collaborative effort that uses controlled vocabularies for annotating genetic information. We here present AGENDA (Application for mining Gene Ontology Data), a novel web-based tool for accessing the GO database. AGENDA allows the user to simultaneously retrieve and compare gene lists linked to different GO terms in diverse species using batch queries, facilitating comparative approaches to genetic information. The web-based application offers diverse search options and allows the user to bookmark, visualize, and download the results. AGENDA is an open source web-based application that is freely available for non-commercial use at the project homepage. URL: http://sourceforge.net/projects/bioagenda.
The emergence of novel genetic techniques and the exponential accumulation of genomic data have increased the need for bioinformatics tools.1,2 Biological ontologies facilitate the handling of complex biological data and contribute to interoperability across multiple data sources.3,4 The Gene Ontology (GO) database summarizes information about the molecular functions, cellular components, and biological processes related to gene products.5 Many tools have been created to search, browse, and analyze the GO database.6 Many of these tools accept only a single gene or GO term as input, which hampers systematic comparisons between GO annotations associated with different GO terms and genes. Complex biological questions that, for example, involve more than one biological process or molecular function cannot be addressed if only one GO term is considered. Similarly, when elucidating a certain biological mechanism, sets of genes rather than single genes are often the focus, raising the need to simultaneously access GO associations of multiple genes. Another limitation in accessing the GO database is that while most programs (eg, EasyGO,7 GOstat,8 Onto-Express9) produce a short list of significantly enriched GO terms,10,11 they do not allow particular GO terms to be queried independently of enrichment. Such queries are of interest if one wants to know which of the genes that are linked to one GO term are also associated with a second, user-defined term.
Here we present AGENDA (Application for mining Gene Ontology data), a novel web-based application for comparing GO annotations associated with multiple GO terms in different species. The program allows for complex queries using GO Slims17 and Boolean operators. Unlike the programs listed above, AGENDA makes it possible to analyze genes that are annotated to a GO term that is defined by the user rather than by enrichment. Moreover, AGENDA gives access to the evidence for each annotation, and the results of the analysis are visualized and can be exported. The usefulness of Boolean operators for mining the GO database has been acknowledged previously.12 By combining GO Slims with Boolean operators, AGENDA provides more flexible access to the GO database than was previously possible: complex queries are refined step by step, and the results of each step can in principle be used as the starting point for follow-up steps.
AGENDA offers simple and advanced modes of retrieving GO information, which are described below. Twelve organisms are supported: Arabidopsis thaliana, Caenorhabditis elegans, Danio rerio, Dictyostelium discoideum, Drosophila melanogaster, Escherichia coli, Gallus gallus, Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae and Schizosaccharomyces pombe. The genomes of all these organisms are annotated within the Gene Ontology’s ongoing Reference Genome Project.15 The interface is designed to be intuitive and easy to navigate, enabling convenient data mining. AGENDA provides internal and external links for accessing the target data. It is possible to filter the query results by GO evidence codes such as “IMP” (“Inferred from Mutant Phenotype”) and “ISS” (“Inferred from Sequence or Structural Similarity”). A query can be stored by bookmarking its URL, and gene lists can be exported as CSV files. AGENDA also provides access to the homology data generated by the Gene Ontology’s Reference Genome Project.15 Output data obtained with one query can be reused as the input for subsequent queries, allowing searches to be refined step by step.
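The evidence-code filter and CSV export described above can be pictured with a minimal Python sketch. It is not AGENDA's actual code: the record layout, the gene names, and the helper functions `filter_by_evidence` and `to_csv` are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical annotation records (toy data, not taken from the GO database).
ANNOTATIONS = [
    {"gene": "geneA", "go_id": "GO:0006915", "evidence": "IMP"},
    {"gene": "geneB", "go_id": "GO:0006915", "evidence": "ISS"},
    {"gene": "geneC", "go_id": "GO:0006915", "evidence": "IEA"},
]


def filter_by_evidence(annotations, allowed_codes):
    """Keep only the annotations whose evidence code is in allowed_codes."""
    allowed = set(allowed_codes)
    return [a for a in annotations if a["evidence"] in allowed]


def to_csv(annotations):
    """Serialize annotation records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["gene", "go_id", "evidence"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(annotations)
    return buffer.getvalue()


if __name__ == "__main__":
    kept = filter_by_evidence(ANNOTATIONS, ["IMP", "ISS"])
    print(to_csv(kept))
```

Restricting the toy list to “IMP” and “ISS” keeps the first two records and drops the third, and the exported text starts with a header row naming the three columns.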
Apart from simple queries that focus on only one GO term or gene, two types of batch queries are supported. First, different user-defined GO terms can be queried simultaneously using the GO Slimmer, a method that uses parent-child relationships between GO terms to compare gene lists of interest with lists of genes that are annotated to GO terms. GO Slimmer identifies the overlap between these lists and produces “GO Slims” that quantify the overlap for each GO term. In AGENDA, gene lists of interest are always related to certain GO terms, but GO Slims can also be produced if different gene sets of interest, such as whole genomes of specific organisms, are used as query input.16,17 Second, queries of different user-defined GO terms can be combined via Boolean operators (AND, OR, NOT). Each query option is represented by a separate web page in the program. Data from one page can be transferred in full to another so that different types of queries can be linked. The web pages of the program include input fields, output fields, and charts for visualization of the results.
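The idea behind slim counting, namely that a gene annotated to a child term also counts towards every ancestor of that term, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the GO Slimmer implementation used by AGENDA: the term identifiers, the toy hierarchy in `PARENTS`, and the functions `ancestors` and `slim_counts` are hypothetical.

```python
# Hypothetical child -> parents map for a toy ontology (placeholder term IDs).
PARENTS = {
    "T:leaf1": {"T:mid"},
    "T:leaf2": {"T:mid", "T:other"},
    "T:mid": {"T:root"},
    "T:other": {"T:root"},
    "T:root": set(),
}


def ancestors(term, parents):
    """Return the term itself plus all terms reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def slim_counts(gene_annotations, slim_terms, parents, genes_of_interest):
    """Count, per slim term, the genes of interest annotated to it or to a descendant."""
    slim_set = set(slim_terms)
    genes_per_slim = {term: set() for term in slim_terms}
    for gene in genes_of_interest:
        for term in gene_annotations.get(gene, ()):
            for slim in ancestors(term, parents) & slim_set:
                genes_per_slim[slim].add(gene)
    return {term: len(genes) for term, genes in genes_per_slim.items()}


if __name__ == "__main__":
    annotations = {"g1": {"T:leaf1"}, "g2": {"T:leaf2"}, "g3": {"T:other"}}
    print(slim_counts(annotations, ["T:mid", "T:other"], PARENTS, ["g1", "g2", "g3"]))
```

In this toy example, gene g2 is counted for both slim terms because its annotated term has two parents, which is the kind of overlap the slim counts are meant to quantify.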
GO terms can be queried in AGENDA using accession numbers, names or synonyms (if any). When querying apoptotic proteins, for example, “GO:0006915” (accession number), “apoptotic process” (name), or “apoptosis” (synonym) can be typed in as the input. In a similar manner, a gene product can be queried using its symbol, full name or synonyms (if any). For example, “TP53” (symbol), “Cellular tumor antigen p53” (full name) and “P53” (synonym) are all accepted when querying human TP53. A detailed user guide describing this query expansion and other features of AGENDA is available as a web page (http://bioagenda.uni-goettingen.de/userguide.php).
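One way to picture this query expansion is a lookup table that maps every accepted spelling of a term to its accession number. The Python sketch below uses the apoptosis example from the text; the data layout, the functions `build_index` and `resolve`, and the case-insensitive matching are assumptions for illustration and may differ from how AGENDA resolves input.

```python
# Term record taken from the example in the text; the layout is hypothetical.
TERMS = [
    {"id": "GO:0006915", "name": "apoptotic process", "synonyms": ["apoptosis"]},
]


def build_index(terms):
    """Map accession numbers, names, and synonyms (case-folded) to accession numbers."""
    index = {}
    for term in terms:
        for key in [term["id"], term["name"], *term["synonyms"]]:
            index[key.casefold()] = term["id"]
    return index


def resolve(query, index):
    """Return the accession number matching the query, or None if nothing matches."""
    return index.get(query.strip().casefold())


if __name__ == "__main__":
    index = build_index(TERMS)
    for query in ["GO:0006915", "apoptotic process", "apoptosis"]:
        print(query, "->", resolve(query, index))
```

The same pattern would apply to gene products, with the symbol, full name, and synonyms all pointing to one gene identifier.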
Many forms of cancer arise from alterations in apoptosis.18 The Gene Ontology database can, for example, be used to find out which genes are implicated in apoptosis (GO:0006915) in humans, and which of the respective gene products localize to mitochondria (GO:0005739), the nucleus (GO:0005634), and the plasma membrane (GO:0005886). Using simple queries only, the cellular localizations of each of the 1771 human genes that are associated with apoptosis would need to be accessed individually. Using the GO Slimmer page of AGENDA, all these GO terms can be accessed simultaneously and the respective information can be obtained with a single mouse click (Fig. 1). Using Boolean queries, in turn, it is possible, for example, to assess which of the human apoptosis genes are associated with mitochondria or the nucleus but not the plasma membrane, combining all three Boolean operators in a single query. By simply exchanging the name of the species, the genes of, eg, zebrafish or Drosophila that satisfy the same query can be displayed. Details about each of the identified genes can be found by clicking on the gene’s name: this opens a simple query page for the gene, which includes information about all its GO annotations and links to the supporting evidence.
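The Boolean query in this example maps directly onto set operations. The following Python sketch uses the GO accession numbers from the text, but the gene names and the gene-to-term assignments are invented placeholders, and the function `apoptosis_query` is a hypothetical helper rather than part of AGENDA.

```python
# GO accessions are from the text; the gene sets are invented toy data.
GENES_BY_TERM = {
    "GO:0006915": {"geneA", "geneB", "geneC", "geneD"},  # apoptosis
    "GO:0005739": {"geneA", "geneB"},                    # mitochondrion
    "GO:0005634": {"geneC", "geneE"},                    # nucleus
    "GO:0005886": {"geneB", "geneD"},                    # plasma membrane
}


def apoptosis_query(genes_by_term):
    """Apoptosis AND (mitochondrion OR nucleus) NOT plasma membrane."""
    apoptosis = genes_by_term["GO:0006915"]
    mitochondrion = genes_by_term["GO:0005739"]
    nucleus = genes_by_term["GO:0005634"]
    plasma_membrane = genes_by_term["GO:0005886"]
    # AND is set intersection, OR is union, NOT is set difference.
    return (apoptosis & (mitochondrion | nucleus)) - plasma_membrane


if __name__ == "__main__":
    print(sorted(apoptosis_query(GENES_BY_TERM)))
```

With the toy sets above, geneB is excluded because it is also annotated to the plasma membrane, and geneE is excluded because it is not an apoptosis gene, leaving geneA and geneC.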
We thank the GWDG (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen) for technical support. We acknowledge the Gene Ontology Consortium as the source of Gene Ontology data and Google for providing its Charts API infrastructure.
Conceived and designed the experiments: GO, MCG. Analysed the data: GO, QL. Wrote the first draft of the manuscript: GO. Contributed to the writing of the manuscript: QL, MCG. Agree with manuscript results and conclusions: GO, QL, MCG. Jointly developed the structure and arguments for the paper: GO, QL, MCG. Made critical revisions and approved final version: GO, QL, MCG. All authors reviewed and approved of the final manuscript.
Disclosures and Ethics
As a requirement of publication author(s) have provided to the publisher signed confirmation of compliance with legal and ethical obligations including but not limited to the following: authorship and contributorship, conflicts of interest, privacy and confidentiality and (where applicable) protection of human and animal research subjects. The authors have read and confirmed their agreement with the ICMJE authorship and conflict of interest criteria. The authors have also confirmed that this article is unique and not under consideration or published in any other publication, and that they have permission from rights holders to reproduce any copyrighted material. Any disclosures are made in this section. The external blind peer reviewers report no conflicts of interest.
This work was supported by the NRW IGS GFG (International Graduate School in Genetics and Functional Genomics) (to G.O.) and the DFG (Deutsche Forschungsgemeinschaft) (Go 1092/1-1) (to M.C.G.).