The regulation of gene expression is mainly conferred by transcription factors (TFs) that bind to cis-regulatory sequences. These sequences can be used to generate hypotheses about TFs that may be involved in the regulation of nearby genes (1). In Arabidopsis thaliana, more than 1500 TFs, corresponding to ~5% of all genes, have been identified (3). The largest families are MYB and MYB-related (190 members), AP2/EREBP (144), bHLH (139), NAC (109), C2H2(Zn) (105), HD (89), MADS (82), bZIP (81) and WRKY (72).
Since the complete sequence of the A. thaliana genome has been published (4), it was desirable to have a map of transcription factor binding sites (TFBSs) for the whole genome. The non-restrictive nature of such a map permits the identification of regulatory sequences within transcribed and coding regions as well. To construct such a map, the pattern search program Patser (5) and publicly available alignment matrices were used to generate the AthaMap database, the first TFBS map for the whole A. thaliana genome. The second release of the AthaMap database presented here has increased the data content from ~2.4 × 10^6 to >7.4 × 10^6 putative sites. Specific care has been taken in the annotation of CAT- and TATA-boxes, which were predicted using alignment matrices from the PlantProm database (7) together with the positional information relative to transcription start sites (TSSs) or translation start sites. Because each TFBS is associated with a score that represents the similarity of the site to the underlying alignment matrix, a new functionality was implemented that allows the identification of highly conserved binding sites.
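The relationship between an alignment matrix, a site score and a score threshold can be illustrated with a minimal sketch. It is not the Patser algorithm or the AthaMap implementation: it assumes a plain log-odds weighting with a uniform background and a pseudocount, and the names `log_odds_matrix`, `scan`, `toy_counts` and the threshold value are hypothetical and chosen for illustration only.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into log2-odds weights."""
    weights = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        weights.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return weights

def scan(sequence, weights, min_score):
    """Return (position, site, score) for every window scoring >= min_score."""
    width = len(weights)
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        if any(base not in BASES for base in site):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(weights[i][base] for i, base in enumerate(site))
        if score >= min_score:
            hits.append((start, site, score))
    return hits

# Hypothetical 4-column matrix built from ten aligned "TATA" sites.
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
weights = log_odds_matrix(toy_counts)

# A high threshold keeps only the window that matches the matrix closely.
hits = scan("GGTATACC", weights, min_score=5.0)
```

Raising `min_score` restricts the result to sites that resemble the matrix most closely, which is the idea behind selecting highly conserved binding sites by score.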
It is well known that the composition of binding sites in the regulatory region of a gene confers its specific expression profile (8). For example, two G-box-like sequences constitute the as-1 element that is bound by bZIP TFs (9). Another example is the ocs element, which occurs in certain glutathione S-transferase genes of Arabidopsis and harbours a bZIP and a DOF factor binding site in close vicinity (10). A wide variety of expression specificities is associated with the co-localization of MYB- and MYC-binding sites (13). Other examples are MADS/MADS TFBSs and those TFs that harbour two DNA-binding domains, such as AP2 (17).
For the identification of such co-localizing elements, a new web tool was implemented that permits the user-defined identification of pairs of TFBSs in the Arabidopsis genome on the basis of distance and quality parameters. This web tool was used to identify the co-localizing sites for known interacting factors. Such combinatorial elements were annotated in the AthaMap database and can also be used for the identification of more complex elements consisting of, for example, two combinatorial elements harbouring four TFBSs.
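A search for co-localizing sites of this kind can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the web tool: distance is taken here as the absolute difference between site start positions, quality as a minimum score per factor, and the `Site` type, the function `colocalized_pairs` and all positions and scores are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import NamedTuple

class Site(NamedTuple):
    factor: str
    position: int   # start coordinate on the chromosome (assumed convention)
    score: float    # similarity of the site to its alignment matrix

def colocalized_pairs(sites_a, sites_b, min_distance, max_distance,
                      min_score_a=0.0, min_score_b=0.0):
    """Return (a, b) pairs whose start positions lie min..max bases apart."""
    # Apply the quality filter to factor B once, then sort for binary search.
    kept_b = sorted((s for s in sites_b if s.score >= min_score_b),
                    key=lambda s: s.position)
    positions_b = [s.position for s in kept_b]
    pairs = []
    for a in sites_a:
        if a.score < min_score_a:
            continue
        # Candidate partners lie within max_distance on either side of a.
        lo = bisect_left(positions_b, a.position - max_distance)
        hi = bisect_right(positions_b, a.position + max_distance)
        for b in kept_b[lo:hi]:
            if abs(b.position - a.position) >= min_distance:
                pairs.append((a, b))
    return pairs

# Hypothetical sites for two factors on one chromosome.
bzip_sites = [Site("bZIP", 100, 8.0), Site("bZIP", 500, 3.0)]
dof_sites = [Site("DOF", 112, 6.0), Site("DOF", 300, 7.0), Site("DOF", 495, 6.5)]

strict = colocalized_pairs(bzip_sites, dof_sites, 5, 20,
                           min_score_a=5.0, min_score_b=5.0)
relaxed = colocalized_pairs(bzip_sites, dof_sites, 5, 20)
```

With the score thresholds applied, only the pair at positions 100 and 112 remains; without them, the low-scoring bZIP site at position 500 also pairs with the DOF site at 495. Pairs found this way could in turn be treated as single elements and searched against each other to find the more complex arrangements mentioned above.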