Beyond the box: ‘wiggle’ tracks display continuous-valued data
A new data type allows the storage of one numeric value per base pair position, enabling a graphical display much like a bar chart or a continuous-valued signal across the genome. This data type is called ‘wiggle’ because of the appearance of the visual display of this data type in the Genome Browser. Numeric values are compressed to spare disk space and time, with a loss of information no greater than that of pixelation in the visual display. Values are stored in binary files that are indexed by database tables; therefore, the values cannot be retrieved from database tables alone. The Table Browser can retrieve values, returning them in a plain text format.
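The trade-off between storage size and precision described above can be illustrated with a minimal sketch. The text does not specify the compression scheme, so the fixed-bin quantization, the choice of 128 bins and the function names below are assumptions made for illustration only. The idea is that a graph drawn 128 pixels tall cannot show differences smaller than 1/128 of the value range, so storing one small integer per value loses nothing the display could have shown.

```python
def compress(values, bins=128):
    """Quantize floats into integer bin indices 0..bins-1 (lossy).

    Assumption: fixed-width bins over the observed range; not taken
    from the paper.
    """
    lower = min(values)
    span = max(values) - lower
    if span == 0:
        return lower, span, [0] * len(values)
    # Clamp so the maximum value falls in the last bin.
    codes = [min(bins - 1, int((v - lower) / span * bins)) for v in values]
    return lower, span, codes


def decompress(lower, span, codes, bins=128):
    """Recover approximate values (the lower edge of each bin)."""
    return [lower + c * span / bins for c in codes]


values = [0.0, 0.25, 0.5, 1.0]
lower, span, codes = compress(values)
restored = decompress(lower, span, codes)
# Each restored value is within one bin width (span / 128) of the original.
```

Under these assumptions each value occupies one byte, and the reconstruction error never exceeds one bin width.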
Track control pages for wiggle tracks offer the user a variety of controls over scaling and visual display of the values. The display mode can be set to ‘full’ for an indication of values by height, or to ‘dense’ for an indication of values by shade. When in full mode, the graph can be displayed as a bar chart or a sequence of points. A line may be drawn at y = 0, an additional line may be drawn at a specified y value and the height and scaling of the values can be adjusted. When zoomed out such that multiple data points must share the same pixel column in the display, the user can choose one of several windowing functions to determine how values are combined for display. The signal may be smoothed within a specified window of 2–16 pixel columns.
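The combination of several data points into one pixel column, followed by optional smoothing, can be sketched as follows. The text does not name the available windowing functions or the smoothing method, so the choices here (maximum, minimum and mean as reducers, and a clipped moving average as the smoother) are assumptions, as are the function names.

```python
def combine_per_pixel(values, n_pixels, func="max"):
    """Reduce per-base values to one value per pixel column."""
    reducers = {
        "max": max,
        "min": min,
        "mean": lambda xs: sum(xs) / len(xs),
    }
    reducer = reducers[func]
    n = len(values)
    columns = []
    for px in range(n_pixels):
        start = px * n // n_pixels
        end = max(start + 1, (px + 1) * n // n_pixels)
        columns.append(reducer(values[start:end]))
    return columns


def smooth(columns, window):
    """Moving average over `window` pixel columns, clipped at the edges."""
    if not 2 <= window <= 16:
        raise ValueError("window must be 2-16 pixel columns")
    n = len(columns)
    out = []
    for i in range(n):
        first = i - (window - 1) // 2
        lo, hi = max(0, first), min(n, first + window)
        out.append(sum(columns[lo:hi]) / (hi - lo))
    return out


per_pixel = combine_per_pixel([1, 2, 3, 4, 5, 6], 3, "max")  # [2, 4, 6]
smoothed = smooth(per_pixel, 2)
```

The choice of reducer matters most when zoomed far out: a maximum preserves isolated peaks that a mean would flatten.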
Some examples of new tracks that use the wiggle data type are Quality (available when an assembly is released with quality score files), the scores component of the Conservation track (Figure 1) and the GC Percent (GC composition in 5-base windows) track.
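As an illustration of how a track such as GC Percent yields one numeric value per window, the following minimal sketch computes GC composition in 5-base windows. The use of non-overlapping windows, the dropping of a trailing partial window and the function name are assumptions; the text specifies only the window size.

```python
def gc_percent(seq, window=5):
    """Percentage of G or C bases in consecutive, non-overlapping windows."""
    seq = seq.upper()
    percents = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc_count = sum(base in "GC" for base in chunk)
        percents.append(100.0 * gc_count / window)
    return percents


print(gc_percent("GGCCAATATAGCGCG"))  # [80.0, 0.0, 100.0]
```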
Figure 1 Genome Browser zoomed in to base-level view, showing Base Position, Restriction Enzymes, sno/miRNA, Conservation and SNP tracks at chr19:58,982,732-58,982,815 in the May 2004 assembly of the human genome. At this location is microRNA hsa-mir-371.
Figure 2 Genome Browser, zoomed out to view all of human chromosome 21 in the May 2004 assembly. A title line and assembly/position line have been added using the Base Position track's new label options. Large gaps show the location of unsequenced heterochromatin.
Conservation: juxtaposed multi-species alignments and conservation scores
The Conservation track, available for many of the vertebrates as well as C.elegans, D.melanogaster and S.cerevisiae, combines multiple-species alignments with per-base-pair conservation scores computed by phastCons. Both are shown in an integrated visual display with special features that highlight differences and gaps at the base-pair level. Figure 1 shows a base-level view of the Conservation track. The track controls page for the Conservation track offers all the wiggle display controls for the conservation scores, as well as controls governing the alignment display. Pairwise alignments can be hidden or displayed. Triplets of genomic bases can be highlighted by alternating colors. In the base-level display, bases identical to the reference may be displayed as dots, and special marks may be enabled to indicate unaligned bases that are spanned by a chained alignment.
For the human, mouse, rat and D.melanogaster genomes, a chromosome ideogram displaying cytological staining patterns can be displayed above the main image with a red box indicating the currently viewed region of the chromosome.
New dynamically computed tracks
Two new Genome Browser tracks, Restriction Enzymes and Short Match, are dynamically computed for display only, rather than retrieved from the database; therefore, they are not available from the Table Browser. This dynamic approach is taken because the storage requirements would be prohibitive if the data were precomputed and stored. The Restriction Enzymes track displays target sites for restriction enzymes described in REBASE (53). When the viewing region is zoomed in to the base level, the restriction sites are displayed with tags showing the cut sites and overhang, with ambiguous bases shown in color (Figure 1). The track control page for Short Match allows the user to input a 2–30 base sequence; the track then displays exact matches of that sequence within the current viewing region. Due to the computational expense and density of these tracks, the Restriction Enzymes track progressively limits the set of enzymes that it aligns when viewing very large regions, up to a maximum of 250 000 bases, beyond which no restriction sites are displayed. The Short Match track limits itself to 1000 matches within the current viewing region.
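The behavior described for Short Match can be sketched as a capped exact-match scan over the viewing region. This is a minimal illustration rather than the Genome Browser's implementation: the function name is hypothetical, and the decisions to report overlapping matches and to search only the forward strand are assumptions not stated in the text.

```python
def short_matches(region, query, max_matches=1000):
    """Return 0-based start offsets of exact matches of `query` in `region`.

    The scan stops once `max_matches` hits have been collected.
    """
    if not 2 <= len(query) <= 30:
        raise ValueError("query must be 2-30 bases long")
    region, query = region.upper(), query.upper()
    hits = []
    pos = region.find(query)
    while pos != -1 and len(hits) < max_matches:
        hits.append(pos)
        pos = region.find(query, pos + 1)  # step by 1 to allow overlaps
    return hits


print(short_matches("ACGTACGTAC", "AC"))  # [0, 4, 8]
```

Because nothing is precomputed, the cost of the scan grows with the size of the viewing region, which is why a cap on the number of reported matches keeps the display responsive.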
Enhancements to user custom track support
User custom tracks can be submitted in the new wiggle format for graphical display of numeric data. In addition to http URLs, ftp URLs are now accepted as sources of data. For websites with password protection, the URL format http://username:password@site/ can be used. Individual items within custom tracks formatted as Browser Extensible Data can be assigned arbitrary colors by including R, G, B color values in the previously reserved ninth column and including the keyword ‘itemRgb=on’ in the track description line. Custom track file format details are provided at http://genome.ucsc.edu/goldenPath/help/customTrack.html. The ‘Custom Tracks’ link on the Genome Browser home page leads to a collection of public custom tracks contributed by various groups. As before, the Table Browser can generate a custom track from the results of a query, either exporting the file or directly importing the track into the user's session for subsequent viewing in the Genome Browser or further querying (including intersection with other tables) in the Table Browser.
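A per-item colored custom track in Browser Extensible Data (BED) format can be assembled as in the following minimal sketch. The helper name, track name, item names, colors and the second item's coordinates are hypothetical. The score and thick-start/thick-end columns are filled with placeholder values because the color must occupy the ninth column; the custom track documentation cited above remains the authority on the format.

```python
def bed_custom_track(name, items):
    """Build custom-track text with an R,G,B color in the ninth BED column.

    items: iterable of (chrom, start, end, label, strand, (r, g, b)).
    """
    lines = [f'track name="{name}" itemRgb=on']
    for chrom, start, end, label, strand, (r, g, b) in items:
        fields = [
            chrom, start, end, label,
            0,               # score (placeholder)
            strand,
            start, end,      # thickStart, thickEnd (placeholders)
            f"{r},{g},{b}",  # ninth column: per-item color
        ]
        lines.append("\t".join(str(field) for field in fields))
    return "\n".join(lines)


# Hypothetical items for illustration only.
print(bed_custom_track("myItems", [
    ("chr19", 58982732, 58982815, "itemA", "+", (255, 0, 0)),
    ("chr19", 58983000, 58983100, "itemB", "-", (0, 0, 255)),
]))
```

The resulting text can be pasted into the custom track submission form or served from a URL.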
New display modes
In addition to the previously described display modes ‘hide’, ‘dense’ and ‘full’, many tracks also support two new display modes. ‘Pack’ mode draws multiple items and their labels on the same horizontal levels when there is room for them to fit side-by-side. ‘Squish’ mode also packs items side-by-side when possible but saves even more space by omitting labels and drawing items at half-height. contains examples of these two new modes.
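The side-by-side placement used by ‘pack’ and ‘squish’ modes can be illustrated with a greedy row-assignment sketch. The algorithm below is an assumption chosen for clarity, not necessarily the one used by the Genome Browser: each item, taken in left-to-right order, is placed on the first row whose last item ends before the new item (plus the space reserved for its label) begins. Reserving label space to the left of each item is likewise an assumption.

```python
def pack_rows(items, label_width=0):
    """Assign each (start, end) pixel interval to the first free row.

    Returns a list of row indices in the same order as `items`.
    label_width > 0 models 'pack' mode; label_width = 0 models 'squish'
    mode, which omits labels.
    """
    order = sorted(range(len(items)), key=lambda i: items[i])
    row_ends = []  # rightmost occupied pixel of each row
    rows = [0] * len(items)
    for i in order:
        start, end = items[i]
        left = start - label_width
        for row, last_end in enumerate(row_ends):
            if left > last_end:
                row_ends[row] = end
                rows[i] = row
                break
        else:  # no existing row has room; open a new one
            row_ends.append(end)
            rows[i] = len(row_ends) - 1
    return rows


items = [(0, 10), (5, 15), (20, 30)]
print(pack_rows(items))                  # [0, 1, 0]
print(pack_rows(items, label_width=12))  # [0, 1, 2]
```

In this example, omitting labels lets the third item share a row with the first, which is the space saving that ‘squish’ mode provides over ‘pack’ mode.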
New labeling options in gene annotations, mRNA and Base Position tracks
A new option in the track controls page of each protein-coding gene annotation track enables coloring and labeling of amino acids translated from genomic codons, drawn on top of a gene's exons, when zoomed in sufficiently. In mRNA alignment tracks, the alignments can be labeled by genomic codons, mRNA codons, mRNA bases, mRNA codons that differ from genomic codons or mRNA bases that differ from genomic bases. The Base Position track has been enhanced to show amino acids from three frames of translation, when in ‘full’ display mode and zoomed in sufficiently. Stop codons are highlighted in red and methionine or start codons are highlighted in green. A small arrow to the left of the base values points to the right when viewing base values on the forward strand and to the left when viewing base values on the reverse strand. Clicking on this arrow toggles the strand. The Base Position track controls page now includes options for placing a title, assembly version and/or position at the top of the Genome Browser tracks image (shown in Figure 2).
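The three-frame translation shown in the Base Position track can be sketched as follows, assuming the standard genetic code and the forward strand only. The function name and the representation of highlights as color strings are hypothetical; codons containing characters other than A, C, G and T are reported as ‘X’.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def three_frames(seq):
    """Translate `seq` in frames 0, 1 and 2 of the forward strand.

    Returns one list per frame of (amino_acid, highlight) pairs, where
    highlight is 'red' for stop codons, 'green' for methionine, else None.
    """
    seq = seq.upper()
    frames = []
    for offset in range(3):
        residues = []
        for i in range(offset, len(seq) - 2, 3):
            aa = CODON_TABLE.get(seq[i:i + 3], "X")
            color = "red" if aa == "*" else "green" if aa == "M" else None
            residues.append((aa, color))
        frames.append(residues)
    return frames


for frame in three_frames("ATGGCCTAA"):
    print(frame)
```

Translating the reverse strand would apply the same procedure to the reverse complement of the sequence.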
New PostScript or PDF image generation
The Genome Browser can now generate a publication-quality image in PostScript or PDF formats, exporting a file which can be saved locally. These vector-graphics image formats do not have the grainy appearance of screen snapshots. was generated using this new feature.
Tracks organized by type
Due to the large number of annotation tracks now available (144 for the human July 2003 assembly as of September 2005), track controls are grouped into several categories (‘track groups’) to ease the task of looking for a track of interest: Mapping and Sequencing, Genes and Gene Predictions, mRNA and expressed sequence tag (EST), Expression and Regulation, Comparative Genomics, and Variation and Repeats. For those human assemblies with dozens of ENCODE tracks, additional ENCODE-specific categories partition the tracks into genes, transcript levels, chromatin immunoprecipitation, chromosome structure, comparative genomics and variation.
Track configuration page
A new ‘configure’ button found on the Genome Browser gateway page and main page (shown in ) leads to a track configuration page that provides brief descriptions and visibility controls for all tracks, with links to track control pages that provide even more configuration options. To enhance the speed of the Genome Browser display, the section of track controls beneath the image can be turned off using the configuration page.
Some datasets contain results of multiple parallel experiments; rather than create a separate track for each experiment in such a dataset, we create a single ‘composite track’ with unified controls. Individual experiments can be selected for display. When a composite track consists of more than 20 subtracks, extra buttons are added to the controls so that groups of related subtracks can be selected or deselected. For example, the Affy pVal track in the human Genome Browser (ENCODE Chromatin Immunoprecipitation section) consists of 41 subtracks, one for each combination of cell type and time point. Its controls include buttons for selecting all subtracks of each cell type and each time point.
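The group-selection buttons of a composite track can be modeled as set operations over subtracks keyed by their attributes. The following minimal sketch uses hypothetical subtrack names, cell types and time points; it does not reproduce the actual Affy pVal subtracks or the Genome Browser's internal representation.

```python
def toggle_group(visible, subtracks, *, cell_type=None, time_point=None,
                 show=True):
    """Return the visible subtrack set after pressing a group button.

    subtracks maps a subtrack name to a (cell_type, time_point) pair.
    A criterion left as None matches every subtrack.
    """
    group = {
        name
        for name, (cell, time) in subtracks.items()
        if (cell_type is None or cell == cell_type)
        and (time_point is None or time == time_point)
    }
    return visible | group if show else visible - group


# Hypothetical subtracks for illustration only.
subtracks = {
    "cellA_0h": ("cellA", "0h"),
    "cellA_2h": ("cellA", "2h"),
    "cellB_0h": ("cellB", "0h"),
}
visible = toggle_group(set(), subtracks, cell_type="cellA")
visible = toggle_group(visible, subtracks, time_point="0h", show=False)
print(sorted(visible))  # ['cellA_2h']
```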