WebMOTIFS is designed to automate the identification of regulatory sequence motifs using multiple motif discovery algorithms. Users may provide gene names (RefSeq or yeast ORF names) or probe identifiers from one of several microarray platforms for Saccharomyces cerevisiae, Mus musculus and Homo sapiens. WebMOTIFS automates all the remaining steps and sends the user an email with a link to the results, which are kept on our server for 30 days. In addition to graphical output, the results can be downloaded in text-based formats. An overview of the processing is shown in .
Run with the default options, WebMOTIFS reports integrated results from four motif discovery programs: MEME (6), AlignACE (7), MDscan (8) and Weeder (9,10). WebMOTIFS retrieves sequences corresponding to each input gene or probe name. If a gene name is provided or the probe is from a microarray designed for transcriptional profiling, the sequences are chosen based on the corresponding transcriptional start site. The sequence surrounding a probe is used for arrays designed for ChIP-chip. These sequences are automatically passed to the requested motif discovery programs without masking.
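The two sequence-selection rules described above can be illustrated with a minimal sketch. It is not the WebMOTIFS implementation: the function names, the 0-based half-open coordinate convention and the window sizes (1000 bp upstream, 200 bp downstream, 250 bp flank) are hypothetical values chosen for illustration only, since the actual window sizes are not stated here.

```python
def promoter_window(tss, strand, upstream=1000, downstream=200):
    """Return a (start, end) window around a transcription start site.

    Coordinates are 0-based and half-open. `tss` is the position of the
    first transcribed base. The default sizes are illustrative
    assumptions, not values taken from WebMOTIFS.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip at the start of the chromosome.
    return max(start, 0), end


def probe_window(probe_start, probe_end, flank=250):
    """Return a (start, end) window surrounding a ChIP-chip probe.

    The flank size is a hypothetical choice for illustration.
    """
    return max(probe_start - flank, 0), probe_end + flank


if __name__ == "__main__":
    print(promoter_window(5000, "+"))
    print(promoter_window(5000, "-"))
    print(probe_window(1000, 1060))
```

For a gene name or an expression-array probe, the window is anchored on the transcriptional start site and oriented by strand; for a ChIP-chip probe, it is simply centered on the probe itself.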
To combine the results from different motif discovery programs, WebMOTIFS objectively evaluates the significance of each motif. It compares the hypergeometric enrichment score for each motif to the distribution of scores for motifs found by the same program in sets of randomly selected promoters (2). Next, since motif discovery programs may discover very similar motifs, the significant motifs are clustered and a single representative motif for each cluster is computed. Currently, WebMOTIFS uses the clustering algorithm reported in Harbison et al. (2), although more sophisticated algorithms have been developed (11).
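The significance test described above can be sketched in two steps: a hypergeometric enrichment score for a motif, followed by a comparison of that score with scores obtained on randomly selected promoter sets. The code below is a minimal illustration under stated assumptions, not the procedure used by WebMOTIFS or by Harbison et al.: the function names are hypothetical, the enrichment score is taken to be the upper-tail hypergeometric probability, and the empirical comparison uses a simple rank-based estimate with a pseudocount.

```python
from math import comb


def hypergeom_enrichment_pvalue(bound_with_motif, bound_total,
                                all_with_motif, all_total):
    """Upper-tail hypergeometric probability.

    Probability of seeing at least `bound_with_motif` motif-containing
    sequences in a random draw of `bound_total` sequences from a
    background of `all_total` sequences, `all_with_motif` of which
    contain the motif.
    """
    upper = min(bound_total, all_with_motif)
    denom = comb(all_total, bound_total)
    num = sum(
        comb(all_with_motif, k)
        * comb(all_total - all_with_motif, bound_total - k)
        for k in range(bound_with_motif, upper + 1)
    )
    return num / denom


def empirical_significance(observed_score, random_scores):
    """Fraction of random-set scores at least as extreme as the observed one.

    Smaller enrichment p-values are treated as more extreme. The +1
    pseudocount (an assumption of this sketch) keeps the estimate
    above zero.
    """
    hits = sum(1 for s in random_scores if s <= observed_score)
    return (hits + 1) / (len(random_scores) + 1)


if __name__ == "__main__":
    # Toy numbers, invented for illustration only.
    score = hypergeom_enrichment_pvalue(30, 50, 600, 6000)
    print(score)
    print(empirical_significance(score, [0.4, 0.02, 0.7, 0.15]))
```

Comparing each motif only with random-set scores from the same program, as the text describes, puts the different programs on a common scale before their motifs are clustered.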
WebMOTIFS also provides the option of Bayesian motif discovery with THEME. The THEME algorithm searches for motifs consistent with proteins from specified DNA-binding domain families. The significance of each discovered motif is determined using cross-validation. THEME is particularly powerful in revealing motifs in mammalian promoters that are often missed by other methods (5). The user can specify which DNA-binding domains are expected to be involved in the regulation of the input sequences or test all the available DNA-binding domain families.
WebMOTIFS offers a unique combination of features. First, WebMOTIFS is completely web-based, with all jobs running on our server, so it can work on any operating system. Although many motif discovery programs have web interfaces, it is difficult to merge the results of these programs. Other available tools for running multiple motif discovery programs, such as BEST (the Binding-site Estimation Suite of Tools) (12) and TAMO (Tools for Analysis of Motifs, the software package on which WebMOTIFS is based) (13), are downloadable software packages. Second, WebMOTIFS analyzes the results of motif discovery automatically with default values that typically produce useful results. WebMOTIFS has few adjustable parameters, which makes it less flexible than TAMO and BEST, but also makes it easier to use and less vulnerable to user error. Third, WebMOTIFS facilitates both input and output. It automatically extracts sequences corresponding to gene and probe names, clusters the discovered motifs and produces sequence logos representing the results. Clustering and visualizing the results of motif discovery with multiple programs helps make sense of the large number of discovered motifs, providing a quick summary of the results.
Example
We evaluated WebMOTIFS by analyzing previously reported genome-wide chromatin-immunoprecipitation experiments in S. cerevisiae (2), taking the list of bound genes for each transcription factor in each condition as input to WebMOTIFS. We compared the results from WebMOTIFS with the motifs previously reported by MacIsaac et al. (14). Run with the default settings (without Bayesian motif discovery), WebMOTIFS discovers the correct motif for 51 of the 64 transcription factors. These results are particularly striking because, in contrast to MacIsaac et al. (14), the programs currently incorporated in WebMOTIFS do not take advantage of information from evolutionary conservation.
The significance filtering and clustering steps provided by WebMOTIFS reveal the most statistically significant motifs, which are frequently also the most biologically relevant. For example, applying WebMOTIFS to the genes bound by the transcription factor Fkh2 in high-H2O2 conditions produced 163 motifs. Significance filtering eliminated most of these results, and clustering grouped the remaining 32 motifs into two clusters. The highest-ranked cluster is a good match to the known specificity of Fkh2, and the second cluster is a good match to the known specificity of Mcm1. The Mcm1 motif is the most significant motif identified by THEME, which correctly attributes it to the SRF-TF family (15). The next most significant motif matches the known specificity and the DNA-binding domain family of Fkh2 (). Fkh2 and Mcm1 have previously been reported to bind cooperatively to a number of promoters in S. cerevisiae (16,17).
We also tested WebMOTIFS on chromatin-immunoprecipitation data from mouse and human (5), taking up to 200 bound probes for each transcription factor as input. In general, motif discovery is more difficult on human and mouse data than on yeast sequences (4). Nevertheless, WebMOTIFS is often able to identify the correct motif as significant when run with the default settings. For instance, applying WebMOTIFS to binding data for Hnf4α in human hepatocytes produced 640 motifs. After significance filtering and clustering, the only remaining motif matches the known specificity of this protein, as reported in TRANSFAC (18) (). Running WebMOTIFS with the THEME option also reveals the Hnf4α motif, which is correctly attributed to the nuclear hormone receptor family (19).