The goal of the AthaMap ‘Gene Identification’ function is to identify all binding sites of pre-selected TFs in all A. thaliana genes. The tool can be accessed by selecting ‘Gene Identification’ at http://www.athamap.de. The schematic overview of the new tool shows the parameters that the user can select (red), the results obtained (yellow) and some further options for analysing the obtained data (green). A specific TF can be selected from a list of all annotated TFs. To facilitate selection, the TF family can be chosen first, which restricts the selectable factors to the members of that family. The user can also define specific search parameters. The default upstream and downstream region of all genes to be searched is −500 and 50
bp, respectively. Positions are relative to either the transcription start site or the translation start site, depending on the annotation. The default region of −500 bp already covers the area in which most of the regulatory sequences are found within the upstream region of A. thaliana genes. A recent study on the distribution of sequences corresponding to known regulatory elements revealed a localized distribution pattern upstream of the transcription start site (16). For example, the G-box, CACGTG, shows a peak position at −80 and a peak width of 273 bp. Hexamer sequences corresponding to regulatory sequences show peak positions between −62 and −138 and a peak width between 182 and 366 bp. Based on this study, a default region of −500 to +50 bp seems to cover the promoter region most likely harbouring the relevant TFBS for gene expression regulation. Nevertheless, these values can be changed, and a maximum window of 6000 bp upstream and 4000 bp downstream can be selected around either start site. For TFs with binding sites determined with PWMs, the minimal threshold can be increased to detect only genes with highly conserved TFBS (12
). Furthermore, genes regulated by small RNAs can be excluded. This may be useful to exclude genes that are potentially post-transcriptionally regulated. The results can be displayed in two different sort modes: ‘Gene’ lists the results according to the genome identifier (AGI), whereas ‘Distance’ sorts the results according to the distance of the TFBS to the start site of the gene. Results comprise a set of non-redundant genes (gene IDs) harbouring a potential TFBS of the selected TF, including positional information and the orientation of the TFBS relative to the putative target gene (yellow in the schematic overview). Genes putatively regulated by small RNAs are also identified. Additional information that can be obtained with the data is indicated in green in the schematic overview. For example, each result can be viewed in a sequence display window to analyse the genomic context of the identified TFBS. The gene set can also be submitted to the Gene Analysis function of AthaMap to detect other TFs regulating these genes. Furthermore, the gene IDs can be used for analysis in microarray expression databases to determine whether the genes are coregulated.

As an example of a result display, the partial screenshot of the web interface shows the result obtained with ABF1 and the default parameters. A total of 821 different genes (gene IDs) harbouring TFBS for ABF1 in the selected region were identified. If a gene harbours two TFBS within the selected region or if the TFBS is palindromic, the gene ID is shown twice. Palindromic sites can occur on both the upper and the lower strand (relative orientation). A non-redundant gene list can be displayed by selecting the underlined number of genes detected. The result table also shows the relative distance to the start site and the score of the particular binding site detected. Gene names and positions are linked to the respective AthaMap sequence display window to explore the genomic context of the binding site. For some TFs, the number of sites to be searched had to be restricted.
This applies to 13 TFs with putative binding site numbers of more than 200 000. In these cases, the threshold score used is displayed in a ‘table of restriction scores’, which can be accessed on the web interface. For further data processing of results, binding sites detected around annotated genes can be downloaded as a file containing all sites detected for the selected TF between 2000 bp upstream and 2000 bp downstream of each gene (download). On special request, the complete unrestricted positional information of TFBS in the A. thaliana genome will be provided.
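The search region described above can be illustrated with a minimal sketch that converts the upstream and downstream settings into genomic coordinates. This is not the AthaMap implementation: the `Gene` record, its field names, the strand handling and the example identifiers and coordinates are assumptions made for illustration. Only the default values (500 bp upstream, 50 bp downstream) and the maximum window (6000 bp upstream, 4000 bp downstream) are taken from the text.

```python
from dataclasses import dataclass

MAX_UPSTREAM = 6000    # largest upstream window stated in the text (bp)
MAX_DOWNSTREAM = 4000  # largest downstream window stated in the text (bp)


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; field names are assumptions."""
    agi: str         # gene identifier (AGI)
    start_site: int  # genomic coordinate of the annotated start site
    strand: str      # "+" or "-"


def search_window(gene, upstream=500, downstream=50):
    """Return (low, high) genomic coordinates of the region to be searched."""
    if not 0 <= upstream <= MAX_UPSTREAM:
        raise ValueError("upstream must be between 0 and 6000 bp")
    if not 0 <= downstream <= MAX_DOWNSTREAM:
        raise ValueError("downstream must be between 0 and 4000 bp")
    if gene.strand == "+":
        return gene.start_site - upstream, gene.start_site + downstream
    if gene.strand == "-":
        # On the reverse strand, "upstream" lies at higher coordinates.
        return gene.start_site - downstream, gene.start_site + upstream
    raise ValueError("strand must be '+' or '-'")


def relative_position(gene, site_position):
    """Position of a site relative to the start site (negative = upstream)."""
    offset = site_position - gene.start_site
    return offset if gene.strand == "+" else -offset


# Made-up genes and coordinates, for illustration only.
forward = Gene("AT0G00010", 10000, "+")
reverse = Gene("AT0G00020", 10000, "-")
print(search_window(forward))            # default window on the forward strand
print(search_window(reverse))            # default window on the reverse strand
print(relative_position(reverse, 10080)) # site 80 bp upstream of a reverse-strand gene
```

In this sketch, a G-box located 80 bp upstream of the start site is reported at −80 irrespective of the strand of the gene, which matches the sign convention used for the peak positions quoted above.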
Schematic representation of the ‘Gene Identification’ function. The first level (red) shows user-selected parameters, the second level (yellow) shows results and the third level (green) shows further options for data analysis.
The web interface of the AthaMap ‘Gene Identification’ function. The result obtained with TF ABF1 is partially shown.
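The handling of the result table (score threshold, the two sort modes and the non-redundant gene list) can be sketched in the same hedged way. The `Hit` record, the function names, the example gene IDs and the scores are hypothetical. It is also an assumption that ‘Distance’ sorts by the signed relative position rather than by the absolute distance, since the text does not specify this.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical row of a result table; field names are assumptions."""
    agi: str          # gene identifier (AGI)
    distance: int     # position relative to the start site (negative = upstream)
    orientation: str  # "+" or "-" relative to the putative target gene
    score: float      # score of the detected binding site


def select_hits(hits, min_score=0.0, upstream=500, downstream=50):
    """Keep hits inside the search window that reach the minimal score."""
    return [
        h for h in hits
        if h.score >= min_score and -upstream <= h.distance <= downstream
    ]


def sort_hits(hits, mode="Gene"):
    """Sort by gene identifier ('Gene') or by relative position ('Distance')."""
    if mode == "Gene":
        return sorted(hits, key=lambda h: (h.agi, h.distance))
    if mode == "Distance":
        return sorted(hits, key=lambda h: (h.distance, h.agi))
    raise ValueError("mode must be 'Gene' or 'Distance'")


def non_redundant_genes(hits):
    """Collapse repeated gene IDs (e.g. from palindromic sites) into one list."""
    return sorted({h.agi for h in hits})


# Made-up example: the palindromic site of AT0G00020 is listed on both strands.
hits = [
    Hit("AT0G00020", -80, "+", 9.1),
    Hit("AT0G00020", -80, "-", 9.1),
    Hit("AT0G00010", -310, "+", 7.4),
    Hit("AT0G00030", 20, "-", 6.0),
    Hit("AT0G00040", -900, "+", 9.9),
]
selected = select_hits(hits, min_score=7.0)
for hit in sort_hits(selected, mode="Distance"):
    print(hit.agi, hit.distance, hit.orientation, hit.score)
print(len(non_redundant_genes(selected)), "non-redundant genes")
```

Here the gene with the palindromic site contributes two rows to the table but only one entry to the non-redundant list, which mirrors the difference between the displayed rows and the number of different genes reported for ABF1.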