Fundamental to expanding our knowledge of how the human body works in health and in disease is the capability to access and share data produced through experimentation and computational analysis. The University of California, Santa Cruz (UCSC) Genome Browser Database (GBD) (http://genome.ucsc.edu) provides a common repository for genomic annotation data (comparative genomics; genes and gene predictions; mRNA and EST alignments; and expression, regulation, variation and assembly data) together with robust, flexible tools for viewing, comparing, distributing and analyzing the information. Produced and maintained by the Genome Bioinformatics Group at the UCSC Center for Biomolecular Science and Engineering, the GBD focuses primarily on vertebrate and model organism genomes, with an emphasis on comparative genomics analysis.
As of September 2007, the GBD contains data for 11 mammalian species: human, mouse, rat, chimpanzee, rhesus macaque, horse, cow, cat, dog, opossum and platypus; 8 other vertebrates: chicken, lizard (Anolis carolinensis), frog (Xenopus tropicalis), zebrafish, fugu, tetraodon, medaka and stickleback; and 21 invertebrates: 11 flies, honeybee, Anopheles mosquito, five worms, one yeast (Saccharomyces cerevisiae) and two deuterostomes (purple sea urchin and sea squirt). For many of the organisms, more than one assembly is provided, and several older archived assemblies may be found at http://genome-archive.cse.ucsc.edu/. The GBD stores a collection of annotation data for each assembly, which can be viewed graphically in the UCSC Genome Browser (2) as a series of ‘tracks’ aligned to the genomic sequence and grouped according to shared characteristics, for example gene predictions, gene expression and variation data. In most instances, each annotation track is represented by a position-oriented table based on genomic sequence coordinates, and may be supplemented by additional non-positional tables that supply related information or link the primary table to other tables in the database. The data are stored in a variety of formats described at http://genome.ucsc.edu/FAQ/FAQformat.
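The relationship between a position-oriented table and a linked non-positional table can be illustrated with a minimal sketch. It is not the GBD schema: the table names `exampleTrack` and `exampleInfo`, the rows, and the helper functions are hypothetical, and an in-memory SQLite database stands in for the real database server. The column names `chrom`, `chromStart` and `chromEnd` follow the BED-style convention of zero-based, half-open coordinates.

```python
import sqlite3


def build_example_db():
    """Create an in-memory database with one positional and one
    non-positional table (hypothetical names and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE exampleTrack (   -- positional: one row per feature
            chrom      TEXT,
            chromStart INTEGER,       -- zero-based start
            chromEnd   INTEGER,       -- exclusive end
            name       TEXT
        );
        CREATE TABLE exampleInfo (    -- non-positional: keyed by name
            name        TEXT PRIMARY KEY,
            description TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO exampleTrack VALUES (?, ?, ?, ?)",
        [
            ("chr1", 1000, 5000, "geneA"),
            ("chr1", 7000, 9000, "geneB"),
            ("chr2", 1000, 3000, "geneC"),
        ],
    )
    conn.executemany(
        "INSERT INTO exampleInfo VALUES (?, ?)",
        [
            ("geneA", "hypothetical kinase"),
            ("geneB", "hypothetical transporter"),
        ],
    )
    return conn


def features_in_region(conn, chrom, start, end):
    """Return (name, chromStart, chromEnd, description) for every feature
    overlapping the half-open interval [start, end) on chrom."""
    query = """
        SELECT t.name, t.chromStart, t.chromEnd, i.description
        FROM exampleTrack AS t
        LEFT JOIN exampleInfo AS i ON i.name = t.name
        WHERE t.chrom = ? AND t.chromStart < ? AND t.chromEnd > ?
        ORDER BY t.chromStart
    """
    return conn.execute(query, (chrom, end, start)).fetchall()


if __name__ == "__main__":
    db = build_example_db()
    for row in features_in_region(db, "chr1", 4000, 7500):
        print(row)
```

The overlap test (`chromStart < end AND chromEnd > start`) is the usual condition for half-open intervals. The `LEFT JOIN` keeps positional rows that have no matching entry in the non-positional table.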
At a minimum, the GBD provides assembly data, comparative genomics annotations, and mRNA, EST and RefSeq (3) gene alignments (when available) from GenBank (4) for each assembly. When available, links are provided to the complementary annotations in two other major genome browsers, Ensembl (5) and NCBI's MapViewer (6). A large set of additional annotations is available for widely studied genomes such as human and mouse. Assemblies that lack sufficient native RefSeq alignments and are sufficiently distant evolutionarily from the human genome may also include a human proteins annotation that maps human exons using tBLASTn. The organizations and individuals who contributed to the sequencing, assembly and annotation of featured organisms are acknowledged at http://genome.ucsc.edu/goldenPath/credits.html; detailed information about the individual annotation tracks may be found in the Genome Browser by clicking the vertical gray or blue bars to the left of the displayed tracks.
UCSC updates the genome assemblies and annotations in the GBD as new releases become available, with priority given to primate and model organism assemblies and annotations that we feel are of widespread interest to GBD users, based on input from our Scientific Advisory Board and feedback received through our mailing lists and user surveys. (The results from a users’ survey conducted in May 2007 may be reviewed at http://genome.ucsc.edu/goldenPath/help/GBsurvey507.html.) RefSeq and mRNA data from GenBank are updated daily; EST data are updated weekly.
In addition to the Genome Browser, several other graphical tools for exploring the data are available from the GBD website, including the Table Browser (7), which provides access for downloading and manipulating the GBD tables as text or tracks; the BLAT sequence-mapping tool (8); the In Silico PCR tool, which searches a sequence database with a pair of PCR primers; the Gene Sorter (9) for exploring expression, homology and other gene relationships; the VisiGene in situ image browser; the Proteome Browser (10) for viewing related protein information; and the new Genome Graphs tool for uploading and viewing genome-wide data sets. This toolset is accompanied by a comprehensive set of online documentation and FAQs listed at http://genome.ucsc.edu/FAQ/. Online and hands-on training materials are available via the Training link (http://genome.ucsc.edu/training) on the GBD home page.
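To make the idea of searching a sequence with a primer pair concrete, the following is a deliberately naive sketch. It does not reproduce the algorithm of the In Silico PCR tool: it assumes exact primer matches, searches only the forward strand of a single template string, and the function names and the `max_size` parameter are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_amplicons(template, forward, reverse, max_size=4000):
    """Return (start, end, amplicon) tuples for exact primer-pair matches.

    The forward primer must match the template as given; the reverse primer
    must match the opposite strand, so its reverse complement is searched
    for downstream of the forward primer. Coordinates are zero-based and
    half-open.
    """
    template = template.upper()
    forward = forward.upper()
    reverse_site = reverse_complement(reverse)
    hits = []
    start = template.find(forward)
    while start != -1:
        # Look for the reverse primer site after the forward primer.
        site = template.find(reverse_site, start + len(forward))
        while site != -1:
            end = site + len(reverse_site)
            if end - start > max_size:
                break  # later sites only yield longer products
            hits.append((start, end, template[start:end]))
            site = template.find(reverse_site, site + 1)
        start = template.find(forward, start + 1)
    return hits


if __name__ == "__main__":
    template = "GGGG" + "ACGTAC" + "TTTTTTTT" + "CCATGA" + "GGGG"
    for hit in find_amplicons(template, "ACGTAC", "TCATGG"):
        print(hit)
```

A practical tool would also need to index large genomes, tolerate mismatches, and search both strands; none of that is attempted here.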