The Complete Arabidopsis Transcriptome MicroArray (CATMA) project (http://www.catma.org/) was formed in 2000 and now consists of groups from eight European countries. Our aim was to use the newly completed Arabidopsis thaliana genome sequence to develop a complete and specific microarray for A. thaliana by producing a specific gene sequence tag (GST) for every known or predicted gene found in the genome sequence, at the time believed to number 25 498 genes (1) and currently 29 084 genes (http://www.tigr.org). We believe that this approach will overcome many of the drawbacks of using ESTs and cDNAs as microarray probes. In particular, because the complete sequence of an EST clone is rarely known, its specificity to a particular gene cannot be guaranteed, and known ESTs may represent only a fraction of the genes identified in a eukaryotic genome. The former point is particularly important for genes that belong to gene families (around 65% of all Arabidopsis genes), where a full-length clone may cross-hybridize to other family members.
We have designed the CATMA GSTs to be specific only to their target gene. Furthermore, because the full sequence of each GST is known, the GSTs can be used not only for microarray experiments but also for purposes such as RNAi work. This is aided by the introduction of extension primers onto each GST, which allow any GST to be reamplified, and cloning sites to be introduced, using one pair drawn from a set of only twenty-four 3′ and sixteen 5′ PCR primers.
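As a rough illustration of why such a small primer set suffices, the sketch below enumerates every possible 5′/3′ combination from sixteen 5′ and twenty-four 3′ primers. The primer labels and the function name are hypothetical, since only the counts are given here, and the sketch makes no claim about how pairs are actually assigned to individual GSTs.

```python
from itertools import product

# Hypothetical labels: only the counts (sixteen 5' and twenty-four 3' primers)
# come from the text; the names below are placeholders for illustration.
FIVE_PRIME = [f"5p_{i:02d}" for i in range(1, 17)]
THREE_PRIME = [f"3p_{i:02d}" for i in range(1, 25)]


def primer_pairs(five_prime, three_prime):
    """Return every (5' primer, 3' primer) combination as a list of tuples."""
    return list(product(five_prime, three_prime))


PAIRS = primer_pairs(FIVE_PRIME, THREE_PRIME)

# Forty primers in total give 16 x 24 distinct pairs.
print(len(FIVE_PRIME) + len(THREE_PRIME), "primers,", len(PAIRS), "possible pairs")
```

Under these assumptions, forty primers give 384 distinct pairs, so each GST needs only a reference to one pair rather than its own dedicated reamplification primers.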
To track the ~30 000 GSTs and primer pairs expected to be generated by the project, and to allow easy dissemination of data about the GSTs, we developed a web-interfaced, MySQL-driven database. We made this database publicly available in June 2002; here we describe the generation of the data and some of the features of the database.