Although basic data retrieval is useful, the real power of the Table Browser lies in the ability to filter and refine queries, intersect query results from different tables and configure the resulting output. These options may be accessed through the Table Browser’s set of advanced query features.
The available query formats and output options vary by table. Many apply only to tables in which the data is position-oriented, thus preserving the database distinction between positional and non-positional tables. Position-based tables may be further differentiated by the types of data they characterize. For example, alignment tables describe a block structure for each element, but other tables may describe only a starting and ending position. Still others may specify translation start and end positions as well as transcription start and end points.
Output format configuration
The Table Browser offers a variety of output formats. In addition to the tab-separated output provided by basic queries, a user can choose from several file formats that may be uploaded as custom annotation tracks in the Genome Browser: Gene Transfer Format (GTF), Browser Extensible Data (BED) and Custom Track format.
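As a rough illustration of what a BED-style custom track looks like, the sketch below serializes a few features into a track line followed by six-column BED records. The function name `to_bed_track`, the sample feature and the placeholder score of 0 are assumptions made for this example; it is not the Table Browser's own output code. BED coordinates are taken to be 0-based with an exclusive end.

```python
def to_bed_track(features, track_name, description):
    """Serialize (chrom, start, end, name, strand) tuples as a BED custom track.

    Coordinates are assumed to be 0-based, half-open, as in BED.
    The score column is filled with a placeholder value of 0.
    """
    lines = [f'track name="{track_name}" description="{description}"']
    for chrom, start, end, name, strand in features:
        fields = [chrom, str(start), str(end), name, "0", strand]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical feature, invented for illustration only.
bed_text = to_bed_track(
    [("chrX", 100, 200, "hit1", "+")], "myQuery", "Example query"
)
```

The resulting text could be saved to a file and uploaded as a custom annotation track.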
The custom annotation tracks generated by the Table Browser are a valuable research tool, offering the ability to view the results of a complex customized query in alignment with the standard annotation tracks in the Genome Browser. Custom annotations are temporary: they persist for only 8 h after they were last accessed. The tracks never become part of the Genome Browser Database, and are therefore accessible only on the machine from which they were uploaded.
The Table Browser FASTA output option allows the user to format and retrieve DNA sequence for a selected region of the genome, similar to the Get DNA utility in the Genome Browser. Other output options enable the user to generate a list of hyperlinks to the Genome Browser corresponding to the locations of features identified by a query on a positional table, or display statistical information about the query, including the number and size of matches and type information about table fields.
The most flexible feature in the Table Browser is its filtering mechanism. The form-based filter provides a straightforward interface for configuring simple SQL-based queries of the data. By default, a Table Browser search retrieves all records for a specified coordinate range or position. Using the filter, the user may set constraints on the values of some or all of the fields within a table to restrict the set of records retrieved from the query range.
The text fields within the filter support wildcard pattern matching and multiple entries. If any word or pattern within the text field matches the value, then the record meets the constraint on that field. Numeric field comparisons support the operators <, >, and != (not equal) and allow comparisons with ranges of numbers.
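The "any pattern matches" rule for text fields can be sketched in a few lines. This example assumes shell-style wildcards (`*` and `?`, as implemented by Python's `fnmatch` module) and whitespace-separated entries; the exact wildcard syntax accepted by the filter form is not specified here, and the function name `text_field_matches` and the sample accession names are hypothetical.

```python
from fnmatch import fnmatchcase


def text_field_matches(value, filter_text):
    """Return True if any whitespace-separated pattern matches the value."""
    patterns = filter_text.split()
    return any(fnmatchcase(value, pattern) for pattern in patterns)


# Hypothetical record names, invented for illustration only.
names = ["NM_000014", "NM_000015", "NR_024540"]

# Two entries in one text field: a record is kept if either pattern matches.
kept = [n for n in names if text_field_matches(n, "NM_*14 NR_*")]
```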
To satisfy the needs of advanced users who find the form-based filtering options to be insufficient, the Table Browser also supports free-form queries allowing more complex constraints, typically to relate two or more fields within the selected table. These queries—which use SQL ‘where’ clause syntax—can combine simple constraints with AND, OR and NOT, using parentheses as needed for clarity. A basic free-form constraint consists of a field name (or an arithmetic expression of numeric field names), a comparison operator and a value.
For example, when searching for gene models in which a promoter region may be present, the simple free-form query (txStart != cdsStart) on the refGene table will produce a list of genes that have the expected 5′ untranslated region (UTR) upstream sequence. Note that if the strand is negative, this will search for cases of 3′ UTR downstream sequence.
In a more complex version of the previous query, (txStart != cdsStart) AND (txEnd != cdsEnd) AND (exonCount = 1) will return a list of single exon genes with both 5′ and 3′ flanking UTRs.
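The two constraints above can be tried out on a toy table. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the real database, with a simplified `refGene`-like schema and three invented rows; only the 'where' clauses are taken from the text, and the helper `query` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE refGene (name TEXT, strand TEXT, txStart INTEGER, "
    "txEnd INTEGER, cdsStart INTEGER, cdsEnd INTEGER, exonCount INTEGER)"
)
# Invented rows for illustration; not real annotation data.
rows = [
    ("geneA", "+", 1000, 5000, 1200, 4800, 1),  # single exon, UTR at both ends
    ("geneB", "+", 2000, 9000, 2000, 8500, 3),  # coding region starts at txStart
    ("geneC", "-", 3000, 7000, 3100, 6900, 4),  # UTR at both ends, four exons
]
conn.executemany("INSERT INTO refGene VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def query(where):
    """Run a free-form 'where' clause against the toy table."""
    sql = f"SELECT name FROM refGene WHERE {where} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]


with_utr = query("(txStart != cdsStart)")
single_exon_with_utrs = query(
    "(txStart != cdsStart) AND (txEnd != cdsEnd) AND (exonCount = 1)"
)
```

On this toy data the first constraint keeps geneA and geneC, and the second keeps only geneA.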
Multiple table comparisons
At times one may wish to compare the data between two tables to determine whether any features have positions in common within the genome. The Table Browser provides a simple interface offering the choice of several types of table comparisons based on feature positions.
One class of comparisons preserves the gene or alignment structure of the primary table, resulting in output that describes the same type of feature as is shown in that table. Primary table features are kept or discarded based on the amount of positional overlap with features contained in the secondary table. The user controls the query output by specifying the threshold of overlap: any, none or a percentage.
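A minimal sketch of this kind of structure-preserving comparison follows. It assumes features are half-open `(start, end)` intervals on one chromosome and that a percentage threshold means "at least that percentage of the primary feature is covered"; both points, along with the function names, are assumptions made for this example rather than a description of the Table Browser's implementation.

```python
def covered_bases(feature, others):
    """Count the bases of feature (start, end) covered by any interval in others."""
    start, end = feature
    # Clip each overlapping interval to the feature, then sweep left to right.
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in others if s < end and e > start
    )
    total, cursor = 0, start
    for s, e in clipped:
        s = max(s, cursor)  # skip bases that were already counted
        if e > s:
            total += e - s
            cursor = e
    return total


def filter_by_overlap(primary, secondary, mode="any", min_percent=0.0):
    """Keep whole primary features according to their overlap with secondary."""
    kept = []
    for feature in primary:
        length = feature[1] - feature[0]
        percent = 100.0 * covered_bases(feature, secondary) / length
        if mode == "any" and percent > 0:
            kept.append(feature)
        elif mode == "none" and percent == 0:
            kept.append(feature)
        elif mode == "percent" and percent >= min_percent:
            kept.append(feature)
    return kept


# Invented coordinates for illustration only.
primary = [(100, 200), (300, 400), (500, 600)]
secondary = [(150, 180), (170, 210), (590, 700)]
```

Each kept feature is returned unchanged, which is what distinguishes this class of comparison from the base-by-base operations.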
For example, one might want to identify all the spliced ESTs that align to a particular region in the Known Genes annotation track. The user would select the location of interest in the Table Browser, choose the chrN_intronEst table, and then proceed to the advanced query options. Intersecting the EST table with the knownGene table results in the desired list.
A second class of intersections and unions compares the positions of table features one base position at a time. These queries return only position ranges and do not preserve the structure of the primary table. A base-by-base intersection of two tables will include a base in the output if that nucleotide position is covered by at least one feature in each of the two tables. In a union, the base position need only be covered by a feature in at least one of the tables.
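The base-by-base semantics can be sketched with plain interval arithmetic. The example assumes half-open `(start, end)` intervals on a single chromosome; the function names and coordinates are invented for illustration and do not reflect how the Table Browser computes these results internally.

```python
def merge(intervals):
    """Merge overlapping or adjacent half-open intervals into sorted ranges."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def union(a, b):
    """Ranges of bases covered by a feature in at least one table."""
    return merge(list(a) + list(b))


def intersection(a, b):
    """Ranges of bases covered by at least one feature in each table."""
    a, b = merge(a), merge(b)
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        s, e = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if s < e:
            out.append((s, e))
        # Advance whichever range ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


# Invented coordinates for illustration only.
repeats = [(10, 50), (40, 80), (200, 250)]
coding = [(30, 60), (70, 220)]

shared = intersection(repeats, coding)
shared_bases = sum(e - s for s, e in shared)
```

Summing the lengths of the returned ranges, as in `shared_bases`, gives the kind of base count used for a density estimate.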
A case in which this kind of comparison is appropriate is a density estimation of a certain feature, e.g. the number of bases within a given genomic region that are repeats or the number of genes within a chromosome that overlap with a repeat. Figure 1 shows an example in which a user wishes to obtain a list of positions in the human chromosome X p-arm in which a SINE repeat overlaps the coding sequence of a gene. This query also illustrates the use of the Table Browser’s custom annotation output format.
Figure 1. Example of an advanced Table Browser query illustrating the use of a base-by-base table intersection between a standard table and a user-created custom annotation table. The goal is to obtain a list of positions in the p-arm of human chromosome X (Build ...)
The set of positions covered by one of the above tables can be complemented (inverted) prior to making the comparison to give the user more flexibility. The user also has the option to set constraints on the field values of the secondary table.
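Complementing a set of positions amounts to listing the gaps between features within the query region. The sketch below assumes half-open `(start, end)` intervals and an explicitly supplied region; the function name `complement` and the coordinates are hypothetical.

```python
def complement(intervals, region_start, region_end):
    """Return the parts of [region_start, region_end) not covered by intervals."""
    gaps, cursor = [], region_start
    for s, e in sorted(intervals):
        # Clip the interval to the region and ignore anything outside it.
        s, e = max(s, region_start), min(e, region_end)
        if s >= e:
            continue
        if s > cursor:
            gaps.append((cursor, s))
        cursor = max(cursor, e)
    if cursor < region_end:
        gaps.append((cursor, region_end))
    return gaps


# Invented coordinates for illustration only.
uncovered = complement([(10, 20), (15, 30), (50, 60)], 0, 100)
```

The inverted set can then be passed to an intersection or union in place of the original positions.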
Retrieving subregions of features
In addition to the SQL constraints on queries, the Table Browser allows the user to specify which subregions of features should be present in the output. For example, someone interested in promoters may want to view the region covered by a gene as well as 5000 additional bases upstream from the 5′ end (or downstream from the 3′ end on the negative strand).
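The strand dependence of "upstream" can be made concrete with a small sketch. It assumes half-open transcript coordinates given on the positive strand, as in the txStart and txEnd fields; the function name `with_upstream` is hypothetical, and the extension is not clipped at the chromosome end because the chromosome length is not known here.

```python
def with_upstream(tx_start, tx_end, strand, upstream=5000):
    """Extend a gene's range to include bases upstream of its 5' end."""
    if strand == "+":
        # The 5' end is at tx_start; do not run past the chromosome start.
        return max(0, tx_start - upstream), tx_end
    if strand == "-":
        # The 5' end is at tx_end, so the extension goes to higher coordinates.
        return tx_start, tx_end + upstream
    raise ValueError(f"unknown strand: {strand!r}")


# Invented coordinates for illustration only.
plus_region = with_upstream(10000, 20000, "+")
minus_region = with_upstream(10000, 20000, "-")
```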
The set of available subregion constraints varies among table types. For instance, gene prediction tables specify both exon structure and translated region. The user may constrain the output to show upstream and downstream regions, exons, introns, or 5′, 3′, or coding exons. In contrast, alignment tables, which specify block structure but not translated region, offer only upstream, downstream, blocks or inter-block regions.
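For a gene prediction record, these subregions can be derived from the exon list and the coding range. The sketch below assumes sorted, non-overlapping, half-open exon intervals and positive-strand coordinates for `cds_start` and `cds_end`; the function names and the example gene are invented for illustration.

```python
def introns(exons):
    """Gaps between consecutive exons."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def coding_exons(exons, cds_start, cds_end):
    """Exon portions that fall inside the translated region."""
    out = []
    for s, e in exons:
        s, e = max(s, cds_start), min(e, cds_end)
        if s < e:
            out.append((s, e))
    return out


def utr_exons(exons, cds_start, cds_end, strand):
    """Return (5' UTR, 3' UTR) exon portions, swapping sides on the minus strand."""
    left = [(s, min(e, cds_start)) for s, e in exons if s < cds_start]
    right = [(max(s, cds_end), e) for s, e in exons if e > cds_end]
    return (left, right) if strand == "+" else (right, left)


# Invented three-exon gene for illustration only.
exons = [(100, 200), (300, 400), (500, 600)]
five_prime, three_prime = utr_exons(exons, 150, 550, "+")
```

An alignment table would support only the analogue of `introns` (inter-block regions) and the blocks themselves, since it carries no coding range.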