‘Position-specific scoring matrices’ (PSSMs) and their derivatives have become the standard representation of a transcription factor's (TF) DNA-binding preference. For example, experimentally derived DNA-binding preferences for a growing number of TFs are stored as frequency matrices in databases such as JASPAR (1) and TRANSFAC (2). In addition, most de novo motif-finding software tools report statistically over-represented degenerate sequence features in the form of frequency matrices or consensus sequences.
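To make the relationship between a frequency matrix and a PSSM concrete, the following minimal Python sketch converts per-position base counts into log-odds scores. The function names, the uniform background, the pseudocount scheme and the example counts are all illustrative assumptions; they are not taken from JASPAR, TRANSFAC or any particular motif-finder.

```python
import math

BASES = "ACGT"

def counts_to_pssm(counts, background=None, pseudocount=1.0):
    """Convert a count matrix (one dict of base counts per position)
    into log2-odds scores against a background distribution.
    The pseudocount is split across bases in proportion to the background."""
    if background is None:
        # Assumption: uniform background; real genomes are often AT- or GC-rich.
        background = {b: 0.25 for b in BASES}
    pssm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount
        scores = {}
        for b in BASES:
            freq = (column[b] + pseudocount * background[b]) / total
            scores[b] = math.log2(freq / background[b])
        pssm.append(scores)
    return pssm

def score_site(pssm, site):
    """Sum the per-position scores of a candidate site of the same length."""
    return sum(column[base] for column, base in zip(pssm, site))

# Hypothetical 3-position count matrix built from 10 aligned sites.
example_counts = [
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 8, "G": 1, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
]
example_pssm = counts_to_pssm(example_counts)
```

Under these assumptions, a site matching the preferred base at every position (here `"ACG"`) receives a higher score from `score_site` than a site that mismatches throughout.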
Motif discovery is often one of the first steps performed during computational analysis of gene regulation. For instance, researchers often wish to discover over-represented motifs that are common to sets of genes with similar expression patterns. However, interpreting the output of motif-finders is often daunting; many distinct motifs may be reported with little or no indication as to whether each may possess regulatory function. Furthermore, no information is provided about the TF proteins that may bind to them. It is therefore surprising that few tools currently exist that can assess similarity between novel, computationally identified motifs and the known motifs stored in the databases. Available tools [such as T-Reg Comparator (3) and MACO (4)] currently allow for only a single type of alignment method, which may not be suitable for all database searches, and none support the direct analysis of motif-finder output files.
Recently, a number of studies have focused on the evolution of binding preference amongst related TFs. For example, generalized models of the binding preferences from a group of structurally related TFs have been described (5). Such ‘familial binding profiles’ (FBPs) have been shown to have wide applicability in improving the performance of motif-finders (5) and in predicting the structural class of the TF associated with novel motifs (5). Other studies have shown evolutionary conservation and change in fixed-order cis-regulatory modules [e.g. in the SXY modules controlling vertebrate MHC gene expression (8)]. Currently, however, there is no publicly available software to support evolutionary analyses of DNA-binding motifs and facilitate the study of FBPs.
In response to the gaps in the current bioinformatics software repertoire outlined above, the STAMP web server aims to provide a platform for ‘BLAST-like’ database searching and ‘ClustalW-like’ multiple alignment and tree building for DNA-binding frequency matrices and motifs. Instead of limiting analyses to a single ungapped alignment strategy, STAMP allows various combinations of the implemented scoring metrics, pairwise alignment methods, gap penalties, multiple alignment strategies and tree-building algorithms. The web server accepts many commonly used motif and frequency matrix formats, and in addition allows the uploading of entire output files from 12 supported motif-finders. STAMP therefore offers a highly flexible and comprehensive toolbox for the study of relationships between TF-binding motifs.
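As an illustration of what one combination of scoring metric and pairwise alignment method might look like, the sketch below slides one frequency matrix along another without gaps and sums a column-wise Pearson correlation over the overlapping positions. This is a simplified example under stated assumptions, not a description of STAMP's implementation: the function names, the choice of Pearson correlation, the `min_overlap` parameter and the toy matrices are all hypothetical.

```python
import math

def pearson(col_a, col_b):
    """Pearson correlation between two columns of base frequencies (A, C, G, T)."""
    n = len(col_a)
    mean_a = sum(col_a) / n
    mean_b = sum(col_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(col_a, col_b))
    var_a = sum((x - mean_a) ** 2 for x in col_a)
    var_b = sum((y - mean_b) ** 2 for y in col_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat column expresses no base preference
    return cov / math.sqrt(var_a * var_b)

def best_ungapped_alignment(motif_a, motif_b, min_overlap=2):
    """Try every ungapped offset of motif_b relative to motif_a and return
    (best_score, best_offset), where offset is the position in motif_a
    at which the first column of motif_b is placed."""
    best_score, best_offset = float("-inf"), None
    first = -(len(motif_b) - min_overlap)
    last = len(motif_a) - min_overlap
    for offset in range(first, last + 1):
        score, overlap = 0.0, 0
        for i, col in enumerate(motif_a):
            j = i - offset
            if 0 <= j < len(motif_b):
                score += pearson(col, motif_b[j])
                overlap += 1
        if overlap >= min_overlap and score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset

# Toy matrices: each row is one motif position with frequencies for A, C, G, T.
motif_a = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
motif_b = motif_a[1:3]  # a sub-motif that should align at offset 1
score, offset = best_ungapped_alignment(motif_a, motif_b)
```

A summed score favours longer overlaps; a practical tool would also need to consider reverse-complement matches, gapped alignments and some normalization for motif length, none of which is attempted here.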