RGD has used ontologies for many years to standardize curated data and to present that data to end users in an organized manner. Over time, the number of ontologies used at RGD has grown to provide more comprehensive data. One recent change was the conversion from a MeSH (Medical Subject Headings)-based disease vocabulary to a disease vocabulary based on a combination of MeSH terms (‘C’ branch and terms from supplementary concept data) and OMIM (Online Mendelian Inheritance in Man, http://www.omim.org) terms. This new disease vocabulary was developed at the Comparative Toxicogenomics Database (CTD, http://ctd.mdibl.org). The CTD disease vocabulary contains more terms than the disease vocabulary previously used at RGD, and this greater granularity allows more specific information to be captured and used. Although the disease vocabulary is not technically an ontology, references to ontologies at RGD in this article should be understood to include it. A second change was the conversion from the MeSH-based behavior vocabulary to the Neuro Behavior Ontology (http://www.obofoundry.org/cgi-bin/detail.cgi?id=neuro_behavior_ontology), developed at the University of Cambridge (http://www.gen.cam.ac.uk/research/personal/gkoutos.html). In addition, RGD has significantly expanded its set of ontologies to serve as the basis of phenotype curation for rat strains. Four ontologies have been created at RGD (Clinical Measurement Ontology, Measurement Method Ontology, Experimental Condition Ontology and Rat Strain Ontology; Shimoyama,M., Nigam,R., McIntosh,L.S. et al., unpublished data) to curate and display quantitative phenotype data in a standardized manner in the recently developed RGD PhenoMiner tool (Shimoyama,M., Nigam,R., de Pons,J. et al., unpublished data).
The new ontology search
The original ontology search tool at RGD (Figure 1A) gave both end users and curators a way to find terms in the Gene Ontology (GO) (4), Disease Vocabulary (‘C’ branch of MeSH), Behavior Vocabulary (‘F’ branch of MeSH), Mammalian Phenotype Ontology (MP) (5) and Pathway Ontology (PW) (6). The user chose one ontology to search and a search mode (Contains, Equals, Begins With or Ends With). A major drawback of the original tool was that it searched terms but not their synonyms, and this shortcoming was the main reason for rebuilding it. The new tool searches term, synonym and ID fields using words, portions of words or database IDs. In addition to the improved search algorithm, many more ontologies have been added to the search tool. The additional ontologies make existing curation workflows more efficient: all ontology term searching can now be done in one browser at RGD, instead of in multiple off-site browsers for ontologies not previously available at RGD. The new tool allows searching all, one or a combination of ontologies.
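The matching rule described above can be sketched in Java, the language of RGD's J2EE tools. This is a minimal in-memory sketch, not RGD's code: the class names `IndexRow` and `OntologySearchSketch` are hypothetical, and RGD actually queries an index table in Oracle. A term is treated as a hit when the query occurs, ignoring case, in its name, one of its synonyms or its accession ID, and an empty ontology filter stands for "search all ontologies".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical row of a term/synonym/ID search index. */
final class IndexRow {
    final String ontology;  // ontology label, e.g. "GO" (illustrative)
    final String termAcc;   // accession ID of the term this row points to
    final String text;      // term name, one synonym, or the accession ID itself

    IndexRow(String ontology, String termAcc, String text) {
        this.ontology = ontology;
        this.termAcc = termAcc;
        this.text = text;
    }
}

final class OntologySearchSketch {
    /**
     * Returns accession IDs of terms whose name, synonym or ID contains the query,
     * ignoring case. An empty ontology set means "search all ontologies".
     */
    static List<String> search(List<IndexRow> index, String query, Set<String> ontologies) {
        String q = query.toLowerCase(Locale.ROOT);
        Set<String> hits = new LinkedHashSet<>(); // keeps first-seen order and drops duplicates
        for (IndexRow row : index) {
            boolean ontologySelected = ontologies.isEmpty() || ontologies.contains(row.ontology);
            if (ontologySelected && row.text.toLowerCase(Locale.ROOT).contains(q)) {
                hits.add(row.termAcc);
            }
        }
        return new ArrayList<>(hits);
    }
}
```

Because the rows for a term's name, synonyms and ID all point to the same accession, a term found through several fields is reported once.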
Figure 1. Old and new ontology search. (A) Old ontology search with choice of ‘Contains, Equals, Begins With, or Ends With’ for searching terms in one ontology at a time. (B) New ontology search interface with options of searching all or a combination of ontologies.
The new search first returns a list of the ontologies in which the searched word(s) appear, along with a count of matching terms in each ontology (Figure 2). Each ontology name links to a term results page listing all matching terms in that ontology. For each term, the results page gives the accession number and the annotation counts for the term and its children, and it links both to the ontology report page (for terms with annotations at RGD) and to the ontology browser, where the selected term is highlighted. The ontology annotation report pages and the ontology browser link to each other. All results pages, ontology report pages and browser pages have ontology term search boxes, so users can move easily among all the pages for searching and viewing ontology terms.
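The first results page described above reduces the raw hits to one count per ontology. A minimal sketch, assuming each hit is a hypothetical two-element array {ontology name, term accession}; repeated hits on the same term (for example, a name match and a synonym match) are counted once. `SearchSummarySketch` is an illustrative name, not part of RGD.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class SearchSummarySketch {
    /**
     * Collapses hits into "ontology name -> number of distinct matching terms".
     * Each hit is a two-element array {ontology, termAcc}; this representation is illustrative.
     */
    static Map<String, Integer> countTermsPerOntology(List<String[]> hits) {
        Map<String, TreeSet<String>> termsByOntology = new TreeMap<>();
        for (String[] hit : hits) {
            // a set per ontology, so the same term matched twice is stored once
            termsByOntology.computeIfAbsent(hit[0], k -> new TreeSet<>()).add(hit[1]);
        }
        Map<String, Integer> counts = new TreeMap<>(); // ontologies listed alphabetically
        termsByOntology.forEach((ontology, terms) -> counts.put(ontology, terms.size()));
        return counts;
    }
}
```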
Figure 2. New ontology term search process at RGD. In this example, a search is done for ‘blood vessel’. From the ontologies returned, the user selects ‘GO: Biological Process’. From the terms returned, the user selects ‘angiogenesis’.
The new ontology browser
Figure 3. Old RGD ontology browser. Terms are displayed in an expandable tree format with each term being a link to an ontology report page listing all RGD annotations using that term. This example has 15 paths for the term ‘angiogenesis’.
The new browser is much more compact in its presentation of ontology terms. It minimizes page scrolling by grouping the parent, sibling and child terms of the searched or selected term. The selected term is presented in a center column together with all of its siblings (Figure 4). All parent terms of the selected term are listed in a column to the left of center, and all child terms in a column to the right. This three-column arrangement allows rapid drilling up and down the term tree, regardless of how many branches contain the selected term. When a parent, child or sibling term is selected, the driller columns are redrawn with the newly selected term in the center column alongside its siblings, and with its own parent and child terms in the adjacent columns.
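The three-column layout can be derived from nothing more than each term's set of parents. The sketch below assumes the ontology is held in memory as a map from each term to its direct parents, and that "siblings" means every other child of any parent of the selected term; `DrillerSketch` and `Columns` are hypothetical names, and RGD's own queries run against Oracle.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

final class DrillerSketch {
    /** The three driller columns for one selected term (sorted for a stable display). */
    static final class Columns {
        final SortedSet<String> parents = new TreeSet<>();
        final SortedSet<String> center = new TreeSet<>();   // selected term plus its siblings
        final SortedSet<String> children = new TreeSet<>();
    }

    /**
     * Builds the columns from a term -> direct parents map describing the ontology DAG.
     * Assumption: siblings are all other children of any parent of the selected term.
     */
    static Columns columnsFor(String selected, Map<String, Set<String>> parentsOf) {
        Columns cols = new Columns();
        cols.parents.addAll(parentsOf.getOrDefault(selected, Set.of()));
        cols.center.add(selected);
        for (Map.Entry<String, Set<String>> entry : parentsOf.entrySet()) {
            Set<String> termParents = entry.getValue();
            if (termParents.contains(selected)) {
                cols.children.add(entry.getKey());        // direct child of the selected term
            }
            if (!Collections.disjoint(termParents, cols.parents)) {
                cols.center.add(entry.getKey());          // shares a parent: a sibling
            }
        }
        return cols;
    }
}
```

In this model, selecting a parent, child or sibling amounts to calling `columnsFor` again with that term, which corresponds to the redraw step described above.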
Figure 4. New RGD ontology browser. Terms are displayed in a driller format with each term being selectable such that the selected term is highlighted and placed in the center column with all of its sibling terms. The selected term’s definition is shown.
An additional view of the selected term is shown at the bottom of the browser page (Figure 5). This graph view shows the selected term and all of its ancestor terms, giving users and curators an overview of every path from the selected term up to the top node of the ontology. Every term in the graph is a link to the browser: clicking a term makes it the selected browser term, and the page is redrawn accordingly.
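The graph view needs the set of all ancestors of the selected term, and the Figure 3 caption refers to the number of distinct paths to the root (15 for ‘angiogenesis’). Both can be computed from a term-to-parents map, as in this hypothetical sketch (`AncestorGraphSketch` is not RGD code). Memoizing path counts keeps the count cheap even when many paths share upper-level terms.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class AncestorGraphSketch {
    /** All ancestors of a term (the nodes drawn in the graph view), found by breadth-first search. */
    static Set<String> ancestors(String term, Map<String, Set<String>> parentsOf) {
        Set<String> seen = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>(parentsOf.getOrDefault(term, Set.of()));
        while (!queue.isEmpty()) {
            String t = queue.poll();
            if (seen.add(t)) {                      // expand each ancestor only once
                queue.addAll(parentsOf.getOrDefault(t, Set.of()));
            }
        }
        return seen;
    }

    /** Number of distinct paths from a term up to a root (a term with no parents), memoized. */
    static long pathCount(String term, Map<String, Set<String>> parentsOf, Map<String, Long> memo) {
        Long cached = memo.get(term);
        if (cached != null) {
            return cached;
        }
        Set<String> parents = parentsOf.getOrDefault(term, Set.of());
        long n = parents.isEmpty() ? 1 : 0;         // a root is the end of exactly one path
        for (String p : parents) {
            n += pathCount(p, parentsOf, memo);
        }
        memo.put(term, n);
        return n;
    }
}
```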
Figure 5. Graph view in new ontology browser. This view shows all the paths and parent terms between the selected term ‘angiogenesis’ and the top-level term ‘biological process’.
To compare the new RGD ontology browser with expandable-tree browsers, both experienced browser users and inexperienced volunteers were tested. Subjects were timed while browsing up and down sets of terms from four ontologies/vocabularies, using four tree-format browsers (MeSH browser, http://www.nlm.nih.gov/mesh/2012/mesh_browser/MBrowser.html, for the disease vocabulary; MP browser, http://www.informatics.jax.org/searches/MP_form.shtml, for the Mammalian Phenotype Ontology; GO browser, http://www.informatics.jax.org/searches/GO_form.shtml, for the Gene Ontology; and PW browser, http://bioportal.bioontology.org/ontologies/46237?p=terms, for the Pathway Ontology) and the RGD driller-type browser. On average, subjects browsed significantly faster with the new RGD browser than with the other browsers across all tested ontologies/vocabularies (disease vocabulary, Mammalian Phenotype Ontology, Gene Ontology and Pathway Ontology). Paired t-tests were run on all browser comparisons; the highest P-value (0.007) was for phenotype term browsing by inexperienced users, and the lowest was for Gene Ontology browsing by experienced users. The new RGD browser thus guides users efficiently through ontologies, regardless of their previous experience.
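For readers unfamiliar with the statistic, a paired t-test compares per-subject differences, here the browsing times of the same subject on two browsers. The sketch below computes only the t statistic, t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom; the JDK has no t-distribution CDF, so turning t into a P-value needs a statistics library or table. The class name is hypothetical, the exact pairing used in the study is not described here, and the timing values in the test are made up rather than data from this study.

```java
final class PairedTTestSketch {
    /**
     * Paired t statistic for two timing arrays measured on the same subjects:
     * d[i] = a[i] - b[i]; t = mean(d) / (sd(d) / sqrt(n)), with sd using n - 1.
     */
    static double tStatistic(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two paired samples of equal length >= 2");
        }
        int n = a.length;
        double mean = 0;
        for (int i = 0; i < n; i++) {
            mean += a[i] - b[i];
        }
        mean /= n;
        double sumSq = 0;                            // sum of squared deviations of the differences
        for (int i = 0; i < n; i++) {
            double dev = (a[i] - b[i]) - mean;
            sumSq += dev * dev;
        }
        double sd = Math.sqrt(sumSq / (n - 1));      // sample standard deviation
        return mean / (sd / Math.sqrt(n));           // divide by the standard error of the mean
    }
}
```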
Comparison of browsing speed in various ontology/vocabulary browsers
New ontology report pages
The new ontology report pages at RGD keep upgraded versions of features from the old report pages and add new ones. One feature kept from the old report pages is the GViewer [an embedded DHTML (dynamic HTML) application], which is displayed directly below the basic term information (accession number, definition and synonyms; Figure 6A). The GViewer gives a genomic view of the genes, QTLs and congenic strains annotated to the report page’s ontology term. Strain data are new to the GViewer: congenic strains are mapped using the genomic positions of their flanking markers. Each annotated object is represented by a color-coded marker next to its chromosomal location: genes in brown, QTLs in blue and strains in green. Clicking on a chromosome or object marker makes a scrollable ‘slider’ (gray box on the X chromosome in Figure 6B) appear on the chromosome ideogram and a scrollable ‘zoom pane’ appear below the chromosome pane. The zoom pane labels the objects’ genomic position markers with gene, QTL or strain symbols, and these labels link to the individual report pages for the genes, QTLs or strains. The bottom bar of the chromosome pane provides data upload and download functions.
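Mapping a congenic strain from its flanking markers, as described above, can be sketched as taking the outer base-pair positions of the two markers. This is an illustrative assumption, not RGD's documented procedure: it assumes both markers lie on the same chromosome of one genome assembly, and the marker names and the `GViewerMarkerSketch` class are hypothetical. The color table mirrors the scheme given in the text.

```java
import java.util.Map;

final class GViewerMarkerSketch {
    /** Marker colors per object type, as described for the GViewer (keys are illustrative). */
    static final Map<String, String> COLOR_BY_TYPE =
            Map.of("gene", "brown", "qtl", "blue", "strain", "green");

    /**
     * Genomic span {start, stop} of a congenic strain's region, taken from the base-pair
     * positions of its two flanking markers, which may be given in either order.
     * Assumes both markers are on the same chromosome.
     */
    static long[] strainSpan(String flank1, String flank2, Map<String, Long> markerPositionBp) {
        Long p1 = markerPositionBp.get(flank1);
        Long p2 = markerPositionBp.get(flank2);
        if (p1 == null || p2 == null) {
            throw new IllegalArgumentException("flanking marker not positioned on this assembly");
        }
        return new long[] {Math.min(p1, p2), Math.max(p1, p2)};
    }
}
```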
Figure 6. New ontology report page. (A) This example of an ontology report page is for the disease term ‘hypertension’. The GViewer shows that RGD annotations to ‘hypertension’ have been made to genes, QTLs and strains.
All annotations made to the report page’s ontology term are listed by species in the table immediately below the GViewer (Figure 6A). A new tab feature allows users to view rat, mouse or human annotations separately or all together by selecting the appropriate tab. A second new feature is a toggle button that switches the list between annotations to the report page term alone and annotations to the term plus its children. In addition, the lists can now be sorted by any of their data columns. Together, these features give users more options for retrieving and viewing data than the old report pages offered.
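The three controls on the annotation table (species tab, term versus term-plus-children toggle, and column sort) compose naturally as filter, filter, sort. A minimal sketch under assumed names (`AnnotationTableSketch` and a three-column `Annotation` row are hypothetical), with the descendant accessions supplied by the caller; RGD's pages may apply these controls differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

final class AnnotationTableSketch {
    /** One row of the annotation table; real tables carry more columns. */
    static final class Annotation {
        final String species;
        final String symbol;
        final String termAcc;

        Annotation(String species, String symbol, String termAcc) {
            this.species = species;
            this.symbol = symbol;
            this.termAcc = termAcc;
        }
    }

    /**
     * Applies the three table controls: species tab (null means the "all species" tab),
     * term-only versus term-plus-children toggle, and the chosen sort column.
     */
    static List<Annotation> view(List<Annotation> all, String speciesTab, String termAcc,
                                 Set<String> descendantAccs, boolean includeChildren,
                                 Comparator<Annotation> sortBy) {
        List<Annotation> rows = new ArrayList<>();
        for (Annotation a : all) {
            boolean speciesOk = speciesTab == null || speciesTab.equals(a.species);
            boolean termOk = a.termAcc.equals(termAcc)
                    || (includeChildren && descendantAccs.contains(a.termAcc));
            if (speciesOk && termOk) {
                rows.add(a);
            }
        }
        rows.sort(sortBy);                           // any column can serve as the sort key
        return rows;
    }
}
```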
The bottom of the new ontology report page shows two different views of the report page’s term within the context of the ontology (Figure 7). The first is a tree view of the term with its parents and children. This view is configurable through a dropdown menu that lets the user choose how many paths to the root node are displayed, and it gives the user an alternative way to browse the ontology. The second is a graph view, similar to the one shown at the bottom of the ontology browser page. Clicking any term in the tree or graph view opens the ontology report page for that term. Any term in the tree view can also be opened in the ontology browser by clicking the ‘branch’ icon to its right.
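The tree view's path selector can be modeled as enumerating every path from the term to the root and keeping the k longest, consistent with the Figure 7 caption's note that the single path shown is the longest one. This is a hypothetical sketch (`TreeViewPathsSketch` is not RGD code); ties between equally long paths keep the enumeration order, and full enumeration can grow quickly for terms with many paths, so a production version might stop after k paths.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TreeViewPathsSketch {
    /** Enumerates every path from the term up to a root (a term with no parents), term first. */
    static List<List<String>> pathsToRoot(String term, Map<String, Set<String>> parentsOf) {
        Set<String> parents = parentsOf.getOrDefault(term, Set.of());
        List<List<String>> paths = new ArrayList<>();
        if (parents.isEmpty()) {
            paths.add(new ArrayList<>(List.of(term)));
            return paths;
        }
        for (String p : new TreeSet<>(parents)) {    // TreeSet gives a deterministic order
            for (List<String> up : pathsToRoot(p, parentsOf)) {
                List<String> path = new ArrayList<>();
                path.add(term);
                path.addAll(up);
                paths.add(path);
            }
        }
        return paths;
    }

    /** Keeps the k longest paths, longest first, mirroring a "how many paths" dropdown. */
    static List<List<String>> pathsToShow(String term, Map<String, Set<String>> parentsOf, int k) {
        List<List<String>> paths = pathsToRoot(term, parentsOf);
        paths.sort(Comparator.comparingInt((List<String> p) -> p.size()).reversed());
        return paths.subList(0, Math.min(k, paths.size()));
    }
}
```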
Figure 7. Bottom portion of new ontology report page. The tree view on the left shows the selected term in boldface. The parent and child terms are shown above and below the selected term, respectively. A single path (the longest one) is displayed.
All tools mentioned below are built on J2EE technologies (http://java.sun.com/j2ee/overview.html) and draw their data from the RGD Oracle database. The tools can run in any Java container that implements the Java Servlet 2.5 and JSP (JavaServer Pages) 2.1 specifications or later. The Spring framework (7) and its MVC (model-view-controller) architecture streamline web application development. The user interface relies heavily on the DOM (Document Object Model) together with CSS (Cascading Style Sheets). Supported browsers include Internet Explorer 7+, Firefox 3+, Chrome 13+ and Safari 5+.
Building the new ontology search
All ontologies used at RGD are stored in the Oracle database and updated weekly. The ontology loading pipeline uses FTP (File Transfer Protocol) to download the latest versions of the ontology files, in ‘.obo’ format, from external sources. The SearchIndexer pipeline, also run weekly, examines all ontology terms and their synonyms and builds an index that is stored as a table in RGD’s Oracle database. The ontology search tool uses this index to search efficiently across multiple ontologies.
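The indexing step can be illustrated with a small reader for OBO flat files. The stanza and tag layout it relies on (`[Term]` headers, `id:` as the first tag, `name:`, and `synonym: "text" SCOPE [xrefs]`) follows the OBO 1.2 format; the output rows {term ID, field, text} and the `OboIndexSketch` class are hypothetical stand-ins for RGD's SearchIndexer table, and the FTP download and Oracle load are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OboIndexSketch {
    /** Quoted text at the start of a synonym tag value, allowing backslash escapes. */
    private static final Pattern SYNONYM = Pattern.compile("^\"((?:[^\"\\\\]|\\\\.)*)\"");

    /**
     * Produces {termId, field, text} index rows from OBO text, where field is "id", "name"
     * or "synonym". Only [Term] stanzas are read; escapes inside synonyms are kept as-is.
     */
    static List<String[]> indexRows(String obo) {
        List<String[]> rows = new ArrayList<>();
        boolean inTerm = false;
        String id = null;
        for (String raw : obo.split("\\R")) {
            String line = raw.strip();
            if (line.startsWith("[")) {              // new stanza header, e.g. [Term] or [Typedef]
                inTerm = line.equals("[Term]");
                id = null;
            } else if (inTerm && line.startsWith("id:")) {
                id = line.substring(3).strip();
                rows.add(new String[] {id, "id", id});
            } else if (inTerm && id != null && line.startsWith("name:")) {
                rows.add(new String[] {id, "name", line.substring(5).strip()});
            } else if (inTerm && id != null && line.startsWith("synonym:")) {
                Matcher m = SYNONYM.matcher(line.substring(8).strip());
                if (m.find()) {
                    rows.add(new String[] {id, "synonym", m.group(1)});
                }
            }
        }
        return rows;
    }
}
```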
Building the new ontology browser
Graph views are generated by the ‘dot’ module of the open source Graphviz package (http://www.graphviz.org). First, a document in the DOT language is built that defines all paths from the selected term to the root term. This document is passed to the Graphviz service, which produces both the image and the corresponding image map shown on the term browser page. Developers can supply optional parameters to this graph generation service to customize the output.
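Building the DOT document from the paths to the root might look like the following sketch. Edges shared by several paths are written once, and each node carries a `URL` attribute so that rendering with Graphviz to an image (`dot -Tpng`) and to a client-side image map (`dot -Tcmapx`) yields clickable terms. The `browse?acc=` link pattern, the highlight color and the class name are assumptions, not RGD's actual parameters.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DotGraphSketch {
    /**
     * Builds a DOT digraph from paths, each listed from the selected term up to the root.
     * Edges point parent -> child so the root is drawn at the top.
     */
    static String toDot(List<List<String>> paths, String selected) {
        Set<String> nodes = new LinkedHashSet<>();
        Set<String> edges = new LinkedHashSet<>();   // de-duplicates edges shared by several paths
        for (List<String> path : paths) {
            nodes.addAll(path);
            for (int i = 0; i + 1 < path.size(); i++) {
                edges.add(quote(path.get(i + 1)) + " -> " + quote(path.get(i)) + ";");
            }
        }
        StringBuilder dot = new StringBuilder("digraph ontology {\n  node [shape=box];\n");
        for (String n : nodes) {
            // hypothetical link target; Graphviz copies URL into the image map
            dot.append("  ").append(quote(n)).append(" [URL=").append(quote("browse?acc=" + n));
            if (n.equals(selected)) {
                dot.append(", style=filled, fillcolor=yellow");
            }
            dot.append("];\n");
        }
        for (String e : edges) {
            dot.append("  ").append(e).append('\n');
        }
        return dot.append("}\n").toString();
    }

    /** Quotes a DOT identifier, escaping backslashes and double quotes. */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```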
Building the new ontology report pages
GViewer is written in DHTML (dynamic HTML), taking advantage of CSS3, HTML5, AJAX (8) and recent improvements in DOM technology. This allows the viewer to be platform independent without requiring browser plug-ins. Banding patterns and chromosome definitions are read from XML files, so chromosome structure from any species can be displayed. Genomic object tracks are also supplied as XML and plotted at their base pair locations. Because the configuration is loosely coupled to the viewer, the tool can be embedded flexibly into other applications. Once the viewer is embedded in a web site, the XML configuration is transparent to the end user.
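Plotting a track at its base-pair locations reduces to reading feature coordinates and scaling them to the ideogram. The XML layout used here (`<feature chromosome start end label/>`) is hypothetical, not GViewer's actual file format, and linear scaling is an assumed simplification; the parser is the JDK's standard DOM API, and `TrackXmlSketch` is an illustrative name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class TrackXmlSketch {
    /**
     * Reads feature elements for one chromosome and converts their base-pair positions to
     * pixel offsets along an ideogram of the given height. Returns {label, topPx, bottomPx}.
     */
    static List<String[]> plot(String xml, String chromosome, long chromosomeLengthBp, int ideogramPx) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("unreadable track XML", e);
        }
        NodeList features = doc.getElementsByTagName("feature");
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < features.getLength(); i++) {
            Element f = (Element) features.item(i);
            if (!chromosome.equals(f.getAttribute("chromosome"))) {
                continue;                            // feature belongs to another chromosome
            }
            long start = Long.parseLong(f.getAttribute("start"));
            long end = Long.parseLong(f.getAttribute("end"));
            out.add(new String[] {
                f.getAttribute("label"),
                String.valueOf(toPixel(start, chromosomeLengthBp, ideogramPx)),
                String.valueOf(toPixel(end, chromosomeLengthBp, ideogramPx))
            });
        }
        return out;
    }

    /** Linear bp-to-pixel scaling (position * height / length), rounded down. */
    static int toPixel(long bp, long lengthBp, int heightPx) {
        return (int) (bp * (long) heightPx / lengthBp);
    }
}
```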
To show ontology term trees on the report pages, the tools send hierarchical queries to the Oracle database for optimum performance. Ontology data aggregates are computed after each weekly ontology load. This background job precomputes several values for every ontology term, such as the number of immediate child terms and the number of annotations for a given species across all child terms. RGD stores these values in a separate table in the Oracle database, and the ontology report pages use them to significantly reduce page loading time.
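The weekly aggregation job can be sketched as a traversal from each term over its descendants. This is a minimal in-memory sketch with hypothetical names: it reads "all child terms" as all descendants, visits each descendant once even when the DAG reaches it by several paths, and counts annotations rather than distinct annotated objects. `directCounts` is assumed to hold one species' counts, so running it once per species gives per-species values; RGD writes such values to an Oracle table rather than returning a map.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class TermAggregatesSketch {
    /**
     * For every term, sums the annotations made to the term itself and to its descendants.
     * childrenOf: term -> direct children; directCounts: term -> annotations on that term.
     */
    static Map<String, Integer> annotationsWithDescendants(
            Set<String> allTerms, Map<String, Set<String>> childrenOf, Map<String, Integer> directCounts) {
        Map<String, Integer> result = new HashMap<>();
        for (String term : allTerms) {
            Set<String> seen = new HashSet<>();
            Deque<String> stack = new ArrayDeque<>();
            stack.push(term);
            int total = 0;
            while (!stack.isEmpty()) {
                String t = stack.pop();
                if (!seen.add(t)) {
                    continue;                        // already counted via another path
                }
                total += directCounts.getOrDefault(t, 0);
                for (String c : childrenOf.getOrDefault(t, Set.of())) {
                    stack.push(c);
                }
            }
            result.put(term, total);
        }
        return result;
    }

    /** Count of immediate child terms, the other aggregate mentioned above. */
    static int childCount(String term, Map<String, Set<String>> childrenOf) {
        return childrenOf.getOrDefault(term, Set.of()).size();
    }
}
```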