The AthaMap database generates a map of predicted transcription factor binding sites (TFBS) for the whole Arabidopsis thaliana genome. AthaMap has now been extended to include data on post-transcriptional regulation. A total of 403 173 genomic positions of small RNAs have been mapped in the A. thaliana genome. These identify 5772 putative post-transcriptionally regulated target genes. AthaMap tools have been modified to improve the identification of common TFBS in co-regulated genes by subtracting post-transcriptionally regulated genes from such analyses. Furthermore, AthaMap was updated to the TAIR7 genome annotation, a graphic display of gene analysis results was implemented, and the TFBS data content was increased. AthaMap is freely available at http://www.athamap.de/.
A large number of different databases are available for database-assisted gene-expression analysis (1). The first level of gene-expression regulation is transcription, which is controlled by the synchronized binding of transcription factors (TFs) to adjacent cis-regulatory sequences. The bioinformatic identification of cis-regulatory sequences is an important tool to predict target genes of specific TFs (2). To this end, the AthaMap database was developed. AthaMap generates a genome-wide map of predicted transcription factor binding sites (TFBS) and cis-regulatory elements for Arabidopsis thaliana (3,4). Unlike similar databases such as AGRIS, Athena and ATTED-II (5–8), AthaMap covers the whole-genome sequence and includes predicted TFBS that were identified with positional weight matrices. Recently, plant-related contents of the transcription and promoter databases TRANSFAC and TRANSPRO (9,10) were integrated with plant proteome and pathway data into the platform BKL Plant (BIOBASE Knowledge Library). This was combined with the previously reported ExPlain tool, which screens promoter regions with positional weight matrices for TFBS and evaluates the results using the ‘Composite Module Analyst’ (CMA) as its core component (11,12). This commercial product integrates promoter and pathway analysis of gene-expression data (BIOBASE, Wolfenbüttel, Germany).
In contrast, AthaMap is in the public domain and provides online tools to display TFBS in user-selected genes or at specific genomic positions (3). The detection of combinatorial elements and their target genes allows the prediction of co-regulated genes (13). The gene analysis function detects common TFBS in user-provided genes (14). A short user manual has been published recently (15), and all tools are also explained on the ‘Description’ page of the AthaMap website. AthaMap has been linked with PathoPlant, a database on plant–pathogen interactions (16). Arabidopsis thaliana microarray experiments in PathoPlant can be screened for co-regulated genes that respond to up to three different stimuli (17). A list of co-regulated genes can be exported directly to AthaMap for identification of common TFBS. However, not all differentially expressed genes are transcriptionally regulated (18). One important factor in post-transcriptional regulation is the expression of small RNAs such as miRNA, siRNA and ta-siRNA (19). Although distinct pathways generate these types of small RNAs, the resulting molecules are very similar in size and together represent the small RNA transcriptome of the organism (20). Using a massively parallel sequencing approach, small RNA transcriptome data became available for seedlings and inflorescence tissue of A. thaliana (21). The genome-wide nature of AthaMap and the availability of small RNA data provide a unique opportunity to combine transcriptional and post-transcriptional data in a single database. This combination may significantly improve the identification of cis-regulatory sequences involved in transcriptional regulation.
Sequence signatures (17-mers) derived from a small RNA transcriptome analysis of A. thaliana inflorescence tissue and seedlings were used to screen the genome (21). The complete lists of screening sequences (accession numbers GSM65747 and GSM65750) were downloaded from NCBI's Gene Expression Omnibus (GEO) repository (22). Genomic positions were determined with a Perl script that searches for perfect matches of all 109 590 small RNA 17-mer screening sequences within the five chromosomes of A. thaliana. Absolute positions and orientations of small RNA matches from inflorescence tissue and seedlings were annotated to AthaMap, resulting in a total of 403 173 genomic matches. For screening sequences yielding more than one genomic match, the corresponding loci were determined. A total of 5772 genes were predicted to be post-transcriptionally regulated by small RNAs because their transcribed regions are targets of at least one small RNA in antisense orientation. A text file with the genome identifiers of the 5772 predicted target genes of small RNAs can be downloaded from the documentation page at AthaMap.
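The screening procedure described above can be illustrated with a minimal Python sketch. It is not the Perl script used for AthaMap, which is not reproduced here. The function names, the tuple layouts and the toy sequences are hypothetical. The sketch also makes two assumptions: a match must lie completely within the transcribed region of a gene, and "antisense" means that the match strand differs from the gene strand.

```python
from collections import defaultdict

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_perfect_matches(chromosomes, signatures, k=17):
    """Find every perfect match of each k-mer signature on both strands.

    chromosomes: dict mapping chromosome name to forward-strand sequence.
    Returns tuples (signature, chromosome, start, strand), where start is
    the 1-based position of the match on the forward strand.
    """
    # Index each signature by how it would read on the forward strand.
    index = defaultdict(list)
    for sig in sorted(set(signatures)):
        if len(sig) != k:
            raise ValueError(f"signature {sig!r} is not a {k}-mer")
        index[sig].append((sig, "+"))
        index[reverse_complement(sig)].append((sig, "-"))

    hits = []
    for chrom, seq in chromosomes.items():
        # Slide a window of length k along the chromosome.
        for i in range(len(seq) - k + 1):
            for sig, strand in index.get(seq[i:i + k], ()):
                hits.append((sig, chrom, i + 1, strand))
    return hits


def predict_target_genes(hits, genes, k=17):
    """Return identifiers of genes hit by a signature in antisense orientation.

    genes: iterable of (gene_id, chromosome, start, end, strand) giving the
    transcribed region in 1-based inclusive coordinates (hypothetical layout).
    """
    targets = set()
    for _sig, chrom, start, strand in hits:
        end = start + k - 1
        for gene_id, g_chrom, g_start, g_end, g_strand in genes:
            inside = g_chrom == chrom and g_start <= start and end <= g_end
            if inside and strand != g_strand:
                targets.add(gene_id)
    return targets


# Toy example with 4-mers instead of 17-mers (invented data).
toy_genome = {"chr1": "TTACGGATCCA"}
toy_hits = find_perfect_matches(toy_genome, ["ACGG", "TCCG"], k=4)
toy_genes = [("geneA", "chr1", 1, 11, "+"), ("geneB", "chr1", 1, 11, "-")]
toy_targets = predict_target_genes(toy_hits, toy_genes, k=4)
```

Indexing the signatures once and sliding a single window over each chromosome keeps the search linear in genome length. A signature that occurs at several loci simply produces several hits, which corresponds to the multiple genomic matches mentioned above.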
Genomic positions of small RNAs are displayed in AthaMap analogously to TFBSs and are symbolized as xxxxx>. The arrowhead gives the orientation of the small RNA. A tool tip box appears when the mouse pointer moves over the arrow, indicating the absolute genomic position and the screening library of the small RNA. Selecting the name adjacent to this symbol opens a new window giving additional information. Figure 1 shows a partial screenshot of position 11 911 on chromosome 1 with a small RNA from the inflorescence library, the tool tip box and the associated pop-up window. This new window shows the screening sequence, the corresponding genomic positions for this particular small RNA and the reference.
Putative post-transcriptionally regulated genes are identified within the Colocalization and Gene Analysis functions. These genes are tagged on the result pages with an italicized genome identifier. They can be subtracted in the Colocalization and Gene Analysis functions by activating the checkbox ‘exclude genes regulated by smallRNA’ in order to restrict the analyses exclusively to transcriptionally regulated genes.
The recent publication of the TAIR7 A. thaliana genome release motivated the implementation of this genome annotation in AthaMap (23). The annotation of the gene structure is based on five chromosomal XML flatfiles downloaded from the TAIR web site (release 7). These files were parsed using a Perl script, and positional information for 5′- and 3′-UTRs, exons and introns was annotated to AthaMap. These regions are displayed in AthaMap with a colour code similar to the one used by TAIR. Because the number of genes with an annotated transcription start site (TSS) increased significantly in TAIR7, the Gene Analysis and Colocalization functions of AthaMap have been changed to show positions of TFBS relative to the TSS of the nearest gene. This applies to 23 222 (73.1%) genes, while for the remaining 8540 (26.9%) genes results are still displayed relative to the translation start site. In earlier versions of AthaMap, all positions were shown relative to the translation start site. Compared to TAIR5, the version previously annotated to AthaMap, the nucleotide sequence of the A. thaliana genome was not changed in TAIR7. Therefore, the positional information of all previously determined TFBS remained constant, except for TATA boxes. The number of annotated TATA boxes decreased from 16 277 (13) to currently 15 955 because more genes now have an annotated TSS: for genes lacking a TSS, a larger upstream region was screened for putative TATA boxes than for genes with an annotated TSS (3). The lower number of TATA boxes therefore results from the elimination of false positives.
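The choice of reference point can be sketched in a few lines of Python. This is an illustration only, not AthaMap code. The function name and arguments are hypothetical, and the coordinate convention is an assumption: the reference position counts as 0, upstream positions are negative, and the sign is flipped for genes on the reverse strand.

```python
def relative_position(site_pos, gene_strand, tss=None, translation_start=None):
    """Return (offset, reference_name) for a binding site.

    The TSS is used as reference when it is annotated; otherwise the
    translation start site is used. Negative offsets are upstream.
    All positions are absolute forward-strand coordinates.
    """
    if tss is not None:
        ref_pos, ref_name = tss, "TSS"
    elif translation_start is not None:
        ref_pos, ref_name = translation_start, "translation start"
    else:
        raise ValueError("gene has neither a TSS nor a translation start")

    if gene_strand == "+":
        offset = site_pos - ref_pos
    else:
        # On the reverse strand, upstream lies at higher coordinates.
        offset = ref_pos - site_pos
    return offset, ref_name


# Invented coordinates: a site 70 bp upstream of an annotated TSS.
with_tss = relative_position(930, "+", tss=1000, translation_start=1150)
# The same site for a gene without an annotated TSS.
without_tss = relative_position(930, "+", translation_start=1150)
```

With these invented coordinates, the first call reports the site at −70 relative to the TSS, and the second falls back to the translation start site and reports −220.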
The Gene Analysis function of AthaMap generates long lists with positional information on TFBSs in all genes investigated (14). Although overviews or summaries of the data can be displayed, the positional information is difficult to take in from such lists. Therefore, a graphic display of TFBS in the analysed gene region was implemented that enables easy comparison between genes and visual identification of common binding site patterns. Every TF family, as well as the small RNAs and combinatorial elements, is identified by a different colour, and their display can be selected individually. Figure 2 shows the web interface with the buttons to select the TF families and a graphic display of TFBS for selected TF family members in the Arabidopsis genes At2g42530 and At2g42540. Also shown is a tool tip box that opens when the mouse pointer moves over a colour-coded TFBS. The tool tip box gives additional information on the TF that identified this particular TFBS: the factor (RAV1) and factor family (AP2/EREBP) are identified, as well as the position relative to the TSS (−70). For TFBS identified with positional weight matrices, the threshold score, the maximum score and the score of the binding site are given (3).
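The three scores shown in the tool tip box can be illustrated with a generic positional weight matrix scan. The Python sketch below assumes simple additive scoring, in which the site score is the sum of the column weights of the observed bases. The scoring scheme actually used for AthaMap is described in (3) and is not reproduced here. The toy matrix, the threshold and all function names are hypothetical.

```python
def pwm_score(pwm, site):
    """Sum the weights of the observed base in each matrix column."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal matrix width")
    return sum(column[base] for column, base in zip(pwm, site))


def pwm_max_score(pwm):
    """Highest score any site can reach: the sum of the column maxima."""
    return sum(max(column.values()) for column in pwm)


def scan(pwm, sequence, threshold):
    """Return (0-based offset, site, score) for windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        site = sequence[i:i + width]
        score = pwm_score(pwm, site)
        if score >= threshold:
            hits.append((i, site, score))
    return hits


# Invented 3-column matrix with integer weights, one dict per position.
toy_pwm = [
    {"A": 2, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 2, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 2, "T": 1},
]
toy_threshold = 5
toy_max = pwm_max_score(toy_pwm)
toy_sites = scan(toy_pwm, "TTACGACT", toy_threshold)
```

In this toy case the maximum score is 6, and two windows pass the threshold of 5: `ACG` with the maximum score of 6 and `ACT` with a score of 5. Reporting the threshold, the maximum and the site score together, as the tool tip box does, lets a user judge how close a predicted site is to the best possible match.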
Recently published binding sites for the Arabidopsis TFs TAC1, RAP2.2 and MYB98 were annotated to AthaMap (24–26). These factors belong to the C2H2(Zn), AP2/EREBP and MYB TF families. Detection and annotation of single binding sites were done as described earlier (4). Binding sites for two TFs for which positional weight matrices could be generated were annotated as well. These are the factors STF1 and SPL1, which belong to the bZIP and SBP TF families (27,28). Detection and annotation of matrix-based binding sites were done as described earlier (3). AthaMap now harbours 9 998 736 predicted TFBSs.
German Federal Ministry for Education and Research through GABI-ADVANCIS (BMBF 0315037B). Funding for open access charge: Technical University of Braunschweig.
Conflict of interest statement. None declared.
We would like to thank Anne-Kareen Blechert for help implementing the TAIR7 genome annotation and for TFBS screenings.