The Advanced Biomedical Computing Center maintains local copies of many widely used biological databases, and bioDBnet was created to integrate all of them (http://biodbnet.abcc.ncifcrf.gov/dbInfo/faq.php#net2). It is built as a data warehouse-based integration in which connections are formed by exploiting the existing cross-references in the local copies of various public data sources, mainly Ensembl, UniProt and EntrezGene. The current release of bioDBnet integrates 20 biological databases and recognizes more than 100 different database types from the molecular biology database collection (http://www.oxfordjournals.org/nar/database/subcat/3/8). It has 153 database identifiers (nodes) connected by 554 cross-references (edges) (http://biodbnet.abcc.ncifcrf.gov/dbInfo/netGraph.php). It includes gene-centric database identifiers such as EntrezGene Gene ID and Ensembl Gene ID; protein identifiers such as UniProt Accession and Ensembl Protein ID; annotations such as GO and InterPro; microarray identifiers from Affymetrix and Agilent; sequence identifiers from GenBank and RefSeq; and pathway identifiers from Biocarta and KEGG.
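The node-and-edge structure described above can be pictured as an undirected graph in which converting between two identifier types corresponds to finding a path between two nodes. The following is a minimal sketch in Python, not bioDBnet's implementation: the handful of edges is an illustrative assumption rather than the real 554-edge network, and `build_graph` and `shortest_path` are hypothetical helpers (the latter uses a plain breadth-first search).

```python
from collections import deque

# Toy subset of identifier types (nodes) and cross-references (edges).
# ASSUMPTION: these edges are illustrative, not the actual bioDBnet network.
EDGES = [
    ("Affy ID", "Gene ID"),
    ("Gene ID", "Ensembl Gene ID"),
    ("Gene ID", "UniProt Accession"),
    ("Gene ID", "KEGG Pathway ID"),
    ("Ensembl Gene ID", "Ensembl Protein ID"),
    ("UniProt Accession", "InterPro ID"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, source, target):
    """Return one shortest chain of identifier types, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        # Sort neighbours so the result is deterministic.
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


GRAPH = build_graph(EDGES)
```

In this toy graph, a microarray probe identifier reaches an InterPro annotation only through intermediate gene and protein identifiers, which is the kind of multi-step conversion a cross-reference network makes possible.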
The tools within bioDBnet offer a range of functionality to suit different user needs. All of them support batch queries, and the results are downloadable as both Excel and text files. In addition, the identifiers in the results are linked to external resources wherever applicable.
Brief descriptions of the main menu options follow. ‘db2db’ is a conversion tool that lets users convert from one type of biological database identifier to another. ‘dbFind’ allows users to convert from one identifier to any of the standard identifiers in bioDBnet without specifying the actual type of input. It can be used when the exact type of input is not known or with a mixture of database identifiers (Fig. 1, i). ‘dbReport’ generates an all-inclusive report with every possible annotation for a given type of input (Fig. 1, ii). Wherever applicable, the reports have links to polyBrowse (http://pbrowse2.abcc.ncifcrf.gov), a gbrowse-based browser (Stein et al.), and to the UCSC genome browser (http://genome.ucsc.edu, Kent et al.) for visualizing data on the chromosomes, as well as to DAVID (http://david.abcc.ncifcrf.gov, Huang da et al.) for functional annotation clustering. ‘dbWalk’ is a customizable database conversion tool that gives users total control of the type of conversion and the intermediate databases (Fig. 1, iii). This allows a user to incorporate preferences into the path followed, based on the data coverage (http://biodbnet.abcc.ncifcrf.gov/dbInfo/netGraphTbl.php) or the user's confidence in the data quality from a particular database.
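To illustrate the idea behind a user-directed walk, the sketch below follows an explicitly chosen chain of identifier types and applies one cross-reference table per step. It is a simplified illustration under stated assumptions, not the dbWalk implementation: the `XREFS` tables, their placeholder identifiers, and the `walk` function are all hypothetical.

```python
# ASSUMPTION: hypothetical cross-reference tables keyed by (from_type, to_type).
# The identifiers are placeholders, not real database entries.
XREFS = {
    ("Affy ID", "Gene ID"): {
        "probe_A": {"gene_1"},
        "probe_B": {"gene_2"},
    },
    ("Gene ID", "UniProt Accession"): {
        "gene_1": {"prot_X", "prot_Y"},
    },
    ("Gene ID", "Ensembl Gene ID"): {
        "gene_1": {"ens_1"},
        "gene_2": {"ens_2"},
    },
    ("Ensembl Gene ID", "UniProt Accession"): {
        "ens_1": {"prot_X"},
        "ens_2": {"prot_Z"},
    },
}


def walk(identifier, path):
    """Follow `path` (a list of identifier types) one step at a time.

    Returns the set of identifiers of the final type reached from `identifier`.
    Raises KeyError if a step in the path has no cross-reference table.
    """
    current = {identifier}
    for step in zip(path, path[1:]):
        table = XREFS.get(step)
        if table is None:
            raise KeyError(f"no cross-reference table for step {step}")
        reached = set()
        for ident in current:
            reached |= table.get(ident, set())
        current = reached
    return current


direct = ["Affy ID", "Gene ID", "UniProt Accession"]
via_ensembl = ["Affy ID", "Gene ID", "Ensembl Gene ID", "UniProt Accession"]
```

With these toy tables, `walk("probe_B", direct)` finds nothing because the direct gene-to-protein table lacks an entry for `gene_2`, whereas `walk("probe_B", via_ensembl)` reaches `prot_Z`. This is the kind of coverage difference that could lead a user to prefer one route over another.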
Fig. 1. Partial screenshots of the results from bioDBnet. (i) dbFind used to identify the types of a mixture of identifiers and convert them to a single type, in this case Gene IDs. (ii) Partial report for EntrezGene identifier ‘1’ from dbReport.
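One simple way to picture how a mixture of identifiers might be typed before conversion is pattern matching on identifier formats. The sketch below is an assumption made for illustration only: the source does not say how dbFind determines input types, and it may well look identifiers up in the warehouse instead. The `PATTERNS` list covers only a few common formats, and `guess_type` is a hypothetical helper.

```python
import re

# ASSUMPTION: a small, illustrative set of format patterns, checked in order.
# Real identifier formats have more variants than are covered here.
PATTERNS = [
    ("Ensembl Gene ID", re.compile(r"ENS[A-Z]*G\d{11}")),
    ("RefSeq mRNA Accession", re.compile(r"NM_\d+(\.\d+)?")),
    (
        "UniProt Accession",
        re.compile(
            r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
            r"|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
        ),
    ),
    # Purely numeric input is treated as an Entrez Gene ID in this sketch.
    ("Gene ID", re.compile(r"\d+")),
]


def guess_type(identifier):
    """Return the first identifier type whose pattern matches the whole string."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(identifier):
            return name
    return None


def guess_types(identifiers):
    """Map each identifier in a mixed batch to its guessed type (or None)."""
    return {ident: guess_type(ident) for ident in identifiers}
```

Format-based guessing is ambiguous in general (many databases use plain integers, for example), so a sketch like this would need a lookup against actual database contents to be reliable.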
bioDBnet also offers various supporting tools to enhance connectivity of biological knowledge and annotations. ‘bioText’ can be used to text mine Gene, UniProt or GO (Gene Ontology) annotations. ‘goTree’ displays the GO hierarchy for any GO accession in a top-down manner, starting with the input accession and continuing to all of its parents. Given database identifiers of any type, the ‘chrView’ tool tries to find their chromosomal locations and displays the results in a movable and zoomable SVG image. This provides a whole-genome view in which clusters can be detected. ‘orgTaxon’ provides an easy-to-use search interface to find the taxon ID of any organism.
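The traversal described for goTree, from an input GO accession up through all of its parents, can be sketched as follows. This is an illustrative assumption about the approach rather than the tool's code: the `PARENTS` map uses made-up placeholder terms instead of real GO accessions, and `ancestors` is a hypothetical helper. Because GO is a directed acyclic graph, a term may have more than one parent, so the sketch records visited terms to avoid reporting a shared ancestor twice.

```python
# ASSUMPTION: placeholder terms standing in for GO accessions.
# Each term maps to the list of its direct parents.
PARENTS = {
    "term_D": ["term_B", "term_C"],
    "term_B": ["term_A"],
    "term_C": ["term_A"],
    "term_A": [],  # root term: no parents
}


def ancestors(term, parents=PARENTS):
    """Return every ancestor of `term`, each listed once, in discovery order."""
    seen = []
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(parents.get(current, []))
    return seen
```

Here `term_A` is reachable from `term_D` through both `term_B` and `term_C`, but it appears only once in the result.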