BigBed files are generated from Browser Extensible Data (BED) files. Like the BED format, the BigBed format is used for data tables with a varying number of fields. BED files consist of a simple text format: each line contains the fields for one record, separated by white space. The first three fields are required, and must contain the chromosome name, start position and end position. The standard BED format defines nine additional, optional fields, which (if present) must appear in the predefined order (Supplementary Table 1
). Alternatively, BED files may depart from the standard format after the third field, continuing with fields specific to the application and data set. BigBed files that contain custom fields, unlike those of simple BED format, must also contain the field name and a sentence describing the custom field. To help others understand custom BED fields, an autoSql (.as) (Kent and Brumbaugh, 2002
) declaration of the table format can be included in the BigBed file (Supplementary Table 2
BigWig files are derived from text-formatted wiggle plot (wig) or bedGraph files. They associate a floating point number with each base in the genome, and can accommodate missing data points. In the UCSC Genome Browser, these files are used to create graphs in which the horizontal axis is the position along a chromosome and the vertical axis is the floating point data (). Typically, these graphs are represented by a wiggly line, hence the name ‘wiggle’. Three text formats can be used to describe wiggle data at varying levels of conciseness and flexibility. Values may be specified for every base or for regularly spaced fixed-sized windows using the ‘fixedStep’ format. The ‘variableStep’ format encodes fixed-sized windows that are variably spaced. The ‘bedGraph’ format encodes windows that are both variably sized and variably spaced.
Fig. 1. Genome Browser image of BigWig annotation tracks. The top track is displayed as a bar graph, the bottom track as a point graph. Shading is used to distinguish the mean (dark), one standard deviation above the mean (medium) and the maximum (light). Peaks (more ...)
Data files of fixedStep format are divided into sections, each of which starts with a line of the form:
fixedStep chrom=chrN start=position step=N span=N
where ‘chrom’ is the chromosome name, ‘start’ is the start position on the chromosome, ‘step’ is the number of bases between items and ‘span’ shows the number of bases covered by each item. Step and span default to 1 if they are not defined. This section line is followed by a line containing a single floating point number for each item in the section.
The variableStep format is similar, but the section starts with a line of the format:
variableStep chrom=chrN span=N
and each item line contains two fields: the chromosome start position, and the floating point value associated with each base.
The bedGraph format is a BED variant in which the fourth column is a floating point value that is associated with all the bases between the chromStart and chromEnd positions. Unlike the zero-based BED and bedGraph, for compatibility reasons the chromosome start positions in variableStep and fixedStep are one-based.
To create a BigBed or BigWig file, one first creates a text file in BED, fixedStep, variableStep or bedGraph format and then uses the bedToBigBed, wigToBigWig or bedGraphToBigWig command-line utility to convert the file to indexed binary format. In addition to the text file and (in the case of BigBed) the optional .as file, the conversion utilities require a chrom.sizes input file that describes the chromosome (or contig) sizes in a two-column format (chromosome name, chromosome size). The fetchChromSizes program may be used to obtain the chrom.sizes file for any genome hosted at UCSC. All of the command-line utilities can be run without options to display a usage summary.
The wigToBigWig program accepts fixedStep, variableStep or bedGraph input. The bedGraphToBigWig program accepts only bedGraph files, but has the advantage of using much less memory. The wigToBigWig program can take up to 1.5 times as much memory as the wig file it is encoding, while bedGraphToBigWig and bedToBigBed use only about one-quarter as much memory as the size of the input file.
Once a BigBed or BigWig file is created, it can be viewed in the UCSC Genome Browser by using the custom track mechanism (Supplementary Material
). In brief the indexed file is put on a website accessible via HTTP, HTTPS or FTP, and a line describing the file type and data location in the form:
is entered in the custom track section of the browser. Additional settings in var = value format can be used to control the name, color, and other attributes of the track. When the custom track is loaded and displayed, the Genome Browser fetches only the data it needs to display at the resolution appropriate for the size of the region being viewed. While it may take a few minutes to convert the input text file to the indexed format, once this is done there is no need to upload the entire file, and the response time on the browser is nearly as fast as if the file resided on the local UCSC server.
Because the BigWig and BigBed files are binary, we have created additional tools that parse the files and describe the contents. The bigWigSummary and bigBedSummary programs can quickly compute summaries of large sections of the files corresponding to zoomed-out views in the Genome Browser. The bigWigInfo and bigBedInfo can be used to quickly check the version numbers, compression status and data ranges stored in a file. The bigBedToBed, bigWigToWig and bigWigToBedGraph programs can convert all or just a portion of files back to text format.