The UCSC Genome Browser Database (GBD, http://genome.ucsc.edu) provides access to the DNA sequences for the human genome and many other organisms (1–4). The database also contains annotation datasets for a wide variety of data types aligned to the reference genome sequence, which are displayed graphically as ‘tracks’ in the UCSC Genome Browser. Currently, the GBD offers sequence, annotations and browsers for 14 mammals, 10 nonmammalian vertebrates and 22 invertebrates, including 11 Drosophila species and six worms. Although we do not provide browsers for low-coverage assemblies, the GBD incorporates the sequences of bushbaby, treeshrew, rabbit, common shrew, hedgehog, armadillo, elephant and tenrec into the human and mouse comparative genomic annotations. We add new and updated assemblies to the database as they are released by the sequencing centers, and maintain older assemblies either on the main site or in the genome archives (http://genome-archive.cse.ucsc.edu), where the complete history of the human genome sequence is available. Links to other major genome databases, including Ensembl (5) and NCBI MapViewer (6), are provided throughout the site.
Genome assemblies are annotated with assembly clone details, GenBank mRNAs (7), RefSeq alignments (8), microarray gene expression data, regulatory element tracks, SNP and other variation data, multiple genome alignments and other datasets. The annotations in the Genome Browser's Comparative Genomics track group facilitate navigation among organisms using both pairwise alignments (the chain and net tracks) and multiple alignments (multiz) (9).
Data in the GBD are updated regularly. New GenBank mRNA submissions (with all new sequences aligned to all assemblies), MGC (10) and consensus coding sequence (CCDS) data are updated nightly, EST data are updated weekly, and a complete realignment is performed whenever GenBank releases a periodic update. Several other datasets are also updated regularly through new automated processes, including Ensembl gene annotations (5) for several organisms (updated three to five times a year), mouse data from the International Gene Trap Consortium (IGTC) (11) (updated monthly) and new releases of the Database of Genomic Variants (DGV) (12). By providing up-to-date releases of data produced by other groups, along with convenient links to the primary sources, we seek to maintain our database as an integrated resource for the scientific community. All data are freely available via the Genome Browser and Table Browser interfaces, and may be downloaded in bulk at http://hgdownload.cse.ucsc.edu. The source code and binaries are free for noncommercial use.
In addition to the Genome Browser graphical interface, the GBD provides other tools for efficient data mining. The Table Browser (13) continues to be one of the most widely used features of the GBD toolset and is increasingly used to export data to the Galaxy (14) tools at Penn State for further processing. The Gene Sorter (15), the Proteome Browser (16), VisiGene (3), Genome Graphs (4) and BLAT (17) have been described previously.
UCSC is the Data Coordination Center for the Encyclopedia of DNA Elements (ENCODE) project (18), which uses the GBD and Genome Browser for data storage and graphical access to the data. The ENCODE project uses a variety of techniques to generate genome-wide annotations, including DNase hypersensitive sites, mRNA expression, histone modifications, transcription factor binding sites and gene annotations (Gencode). Data deposited for the now-completed ENCODE pilot project are presented in the Genome Browser as separate track groups on the human hg18 assembly. Initial ENCODE production-phase data will become available in the coming year.
The tools associated with the GBD continue to grow in capability and configurability. Users can find assistance with the database and tools in a large number of online help pages (http://genome.ucsc.edu/goldenPath/help), FAQs (http://genome.ucsc.edu/FAQ) and links to tutorials produced by Open Helix (http://openhelix.com). Our staff also answer user questions through our mailing list (genome/at/soe.ucsc.edu