Arabidopsis thaliana is an important model organism for plant biologists. Its small genome has been completely sequenced; it contains little repetitive DNA and has a high gene density [1]. Furthermore, many mutants have been characterised phenotypically, and insertion mutants are available for nearly all genes. An integrated information resource can be accessed at http://arabidopsis.org [2]. The large amount of information available for this model plant justifies its use for almost all basic biological questions.
All scientific questions that address developmental processes or responses to biotic and abiotic signals focus on understanding how gene expression is regulated. Regulation occurs at three levels: pre-transcriptional, transcriptional and post-transcriptional. The pre-transcriptional level concerns chromatin structure and remodelling. Transcriptional control is executed mainly by transcription factors (TFs) that recruit the transcriptional preinitiation complex to the promoter. A major aspect of post-transcriptional regulation is RNA stability, which is affected by small RNAs.
Of these levels, transcriptional control is the most accessible to database-assisted analysis [3]. Transcription factors bind to short sequence motifs, and members of a TF family usually bind to similar sequences. In Arabidopsis thaliana, more than 1500 TFs were initially identified, constituting at least 5% of all protein-coding genes [4, 5]. More recently, more than 2000 protein-coding sequences belonging to 68 families have been predicted to be TFs [6].
Experimental data on these TFs vary considerably. While a few type members of each family have usually been analysed extensively, the functions of the remaining family members often remain unknown. The same applies to information on the binding sites of these factors. Representative binding sites have been published for members of 25 TF families and have been annotated in databases [7-9].
A simple concept of transcriptional control is based on the presence of a binding site, or cis-regulatory sequence, in the promoter of a gene; the gene is then a target of a TF that regulates its expression through DNA binding [3]. A further level of complexity arises from combinatorial control of gene expression, in which TFs bind after homo- or heterodimerization with other TFs [10].
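The binding-site concept can be illustrated with a minimal Python sketch that scans a promoter sequence for a consensus motif on both DNA strands. It assumes that a binding site is written as an IUPAC consensus string; the G-box (CACGTG) and W-box (TTGACY) serve only as familiar examples, and the promoter sequence is hypothetical. Dedicated resources typically score sites with position weight matrices rather than exact consensus matches, so this is a simplification and not the method of any particular database.

```python
import re

# IUPAC nucleotide codes translated into regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence or IUPAC consensus."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(promoter: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for each motif match on either strand.

    Start positions are given in forward-strand coordinates.
    """
    promoter = promoter.upper()
    motif = motif.upper()
    queries = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # palindromic motifs would otherwise be reported twice
        queries.append(("-", rc))
    hits = []
    for strand, query in queries:
        pattern = "".join(IUPAC[base] for base in query)
        # The lookahead lets overlapping matches be reported as well.
        for match in re.finditer(f"(?=({pattern}))", promoter):
            hits.append((match.start(), strand))
    return sorted(hits)


if __name__ == "__main__":
    promoter = "AACACGTGTTTTGACCAAGTCAA"  # hypothetical promoter fragment
    print(find_sites(promoter, "CACGTG"))  # G-box: [(2, '+')]
    print(find_sites(promoter, "TTGACY"))  # W-box: [(10, '+'), (17, '-')]
```

A gene whose promoter yields hits for a motif would, in this simple model, be a candidate target of the TF family recognising that motif.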
Based on the occurrence or combination of cis-regulatory elements, predictions can be made about which TF family is involved in regulating transcription [3]. However, it remains a major challenge to predict which particular members of that family bind. For this, knowledge of a possible coexpression of a member of the putatively binding TF family with the target gene may be useful [11].
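One way to sketch the coexpression criterion is to rank the members of a candidate TF family by the Pearson correlation between their expression profiles and that of the putative target gene. This is a minimal illustration under stated assumptions: the gene names and expression values are hypothetical, Pearson correlation is only one of several coexpression measures in use, and a high correlation is supporting evidence for regulation, not proof of binding.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(ss_x * ss_y)


def rank_candidate_tfs(target: list[float],
                       tf_profiles: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank candidate TFs by correlation with the target gene, highest first."""
    scores = {name: pearson(profile, target) for name, profile in tf_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical expression values of one target gene and three
    # hypothetical family members across four conditions.
    target_gene = [1.0, 2.0, 3.0, 4.0]
    candidates = {
        "TF_A": [2.0, 4.0, 6.0, 8.0],
        "TF_B": [4.0, 3.0, 2.0, 1.0],
        "TF_C": [1.0, 3.0, 2.0, 4.0],
    }
    for name, r in rank_candidate_tfs(target_gene, candidates):
        print(f"{name}\t{r:+.2f}")
```

Ranking rather than thresholding keeps the choice of a cut-off, which depends on the expression data set, with the user.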
Often, single TFs are not sufficient to regulate gene expression [10]. To gain more insight into the complexity of expression control, it is helpful to learn whether cis-regulatory elements recognized by known interacting TFs colocalize in their target genes [12]. Furthermore, additional information on protein-protein interactions may provide insight into upstream signal transduction pathways [13].
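Colocalization of cis-regulatory elements can be sketched as a search for pairs of binding sites of two interacting TFs that lie within a maximum distance of each other in the same promoter. The positions below are hypothetical (they could, for instance, come from a motif scan), and the 50 bp window is an arbitrary assumption; actual spacing constraints differ between TF pairs.

```python
from bisect import bisect_left, bisect_right


def colocalized_pairs(sites_a: list[int], sites_b: list[int],
                      max_distance: int) -> list[tuple[int, int]]:
    """Pair each site of TF A with every site of TF B starting within max_distance bp."""
    sorted_b = sorted(sites_b)
    pairs = []
    for a in sorted(sites_a):
        # Binary search for the B sites inside [a - max_distance, a + max_distance].
        lo = bisect_left(sorted_b, a - max_distance)
        hi = bisect_right(sorted_b, a + max_distance)
        pairs.extend((a, b) for b in sorted_b[lo:hi])
    return pairs


if __name__ == "__main__":
    # Hypothetical start positions of two motifs in the same promoter.
    tf_a_sites = [10, 200, 480]
    tf_b_sites = [35, 150, 700]
    # 50 bp is an arbitrary, assumed window.
    print(colocalized_pairs(tf_a_sites, tf_b_sites, max_distance=50))
    # [(10, 35), (200, 150)]
```

Sorting the second list once and using binary search avoids comparing every pair of sites, which matters when many promoters are screened.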
Another level of complexity that can be addressed with databases is the post-transcriptional control of gene expression. A large number of small RNAs have been cloned from Arabidopsis thaliana, and the genomic identification of their target sequences may reveal which genes are subject to small RNA-mediated degradation [14].
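A first approximation of small RNA target identification is a complementarity search: the small RNA is reverse-complemented into DNA letters and slid along a candidate sequence, counting mismatches in each window. The sketch below assumes that the sequence given is the sense strand of the candidate transcript, uses a hypothetical 10-nt small RNA, and treats the mismatch threshold as a free parameter. It ignores the G:U wobble pairing and position-dependent mismatch penalties that dedicated target-prediction tools typically consider.

```python
# Complement of RNA bases written as DNA bases (A -> T, C -> G, G -> C, U -> A).
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def target_site_sequence(small_rna: str) -> str:
    """DNA sequence of a perfectly complementary target site for a small RNA."""
    return small_rna.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def find_targets(small_rna: str, sequence: str,
                 max_mismatches: int = 3) -> list[tuple[int, int]]:
    """Return (0-based start, mismatch count) for windows with at most max_mismatches."""
    probe = target_site_sequence(small_rna)
    seq = sequence.upper()
    k = len(probe)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # Simple Hamming distance; G:U wobble pairs count as mismatches here.
        mismatches = sum(1 for p, s in zip(probe, window) if p != s)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    toy_small_rna = "UUCGAAGCUA"  # hypothetical 10-nt small RNA
    transcript = "GGGTAGCTTCGAACCCTAGCTACGAAGG"  # hypothetical sense-strand sequence
    print(find_targets(toy_small_rna, transcript, max_mismatches=1))
    # [(3, 0), (16, 1)]
```

A genome-wide search would apply the same idea to all annotated transcripts, usually with an index rather than this brute-force scan.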
This review describes internet resources that are available for the study of gene expression regulation in Arabidopsis thaliana. It focuses on two databases, AthaMap and PathoPlant, and on the questions that can be addressed with them. The subjects discussed are shown schematically in the flow chart in Fig. (). Other online resources that can be used to address these subjects are summarized in Table ().
| Table 1. Alphabetical List of Names and Links of Web Resources Mentioned in the Text |