Search tips
Search criteria

Results 1-25 (933612)

Clipboard (0)

Related Articles

1.  MEME: discovering and analyzing DNA and protein sequence motifs 
Nucleic Acids Research  2006;34(Web Server issue):W369-W373.
MEME (Multiple EM for Motif Elicitation) is one of the most widely used tools for searching for novel ‘signals’ in sets of biological sequences. Applications include the discovery of new transcription factor binding sites and protein domains. MEME works by searching for repeated, ungapped sequence patterns that occur in the DNA or protein sequences provided by the user. Users can perform MEME searches via the web server hosted by the National Biomedical Computation Resource () and several mirror sites. Through the same web server, users can also access the Motif Alignment and Search Tool to search sequence databases for matches to motifs encoded in several popular formats. By clicking on buttons in the MEME output, users can compare the motifs discovered in their input sequences with databases of known motifs, search sequence databases for matches to the motifs and display the motifs in various formats. This article describes the freely accessible web server and its architecture, and discusses ways to use MEME effectively to find new sequence patterns in biological sequences and analyze their significance.
PMCID: PMC1538909  PMID: 16845028
2.  The XXmotif web server for eXhaustive, weight matriX-based motif discovery in nucleotide sequences 
Nucleic Acids Research  2012;40(Web Server issue):W104-W109.
The discovery of regulatory motifs enriched in sets of DNA or RNA sequences is fundamental to the analysis of a great variety of functional genomics experiments. These motifs usually represent binding sites of proteins or non-coding RNAs, which are best described by position weight matrices (PWMs). We have recently developed XXmotif, a de novo motif discovery method that is able to directly optimize the statistical significance of PWMs. XXmotif can also score conservation and positional clustering of motifs. The XXmotif server provides (i) a list of significantly overrepresented motif PWMs with web logos and E-values; (ii) a graph with color-coded boxes indicating the positions of selected motifs in the input sequences; (iii) a histogram of the overall positional distribution for selected motifs and (iv) a page for each motif with all significant motif occurrences, their P-values for enrichment, conservation and localization, their sequence contexts and coordinates. Free access:
PMCID: PMC3394272  PMID: 22693218
3.  SCOPE: a web server for practical de novo motif discovery 
Nucleic Acids Research  2007;35(Web Server issue):W259-W264.
SCOPE is a novel parameter-free method for the de novo identification of potential regulatory motifs in sets of coordinately regulated genes. The SCOPE algorithm combines the output of three component algorithms, each designed to identify a particular class of motifs. Using an ensemble learning approach, SCOPE identifies the best candidate motifs from its component algorithms. In tests on experimentally determined datasets, SCOPE identified motifs with a significantly higher level of accuracy than a number of other web-based motif finders run with their default parameters. Because SCOPE has no adjustable parameters, the web server has an intuitive interface, requiring only a set of gene names or FASTA sequences and a choice of species. The most significant motifs found by SCOPE are displayed graphically on the main results page with a table containing summary statistics for each motif. Detailed motif information, including the sequence logo, PWM, consensus sequence and specific matching sites can be viewed through a single click on a motif. SCOPE's efficient, parameter-free search strategy has enabled the development of a web server that is readily accessible to the practising biologist while providing results that compare favorably with those of other motif finders. The SCOPE web server is at .
PMCID: PMC1933170  PMID: 17485471
4.  MotifViz: an analysis and visualization tool for motif discovery 
Nucleic Acids Research  2004;32(Web Server issue):W420-W423.
Detecting overrepresented known transcription factor binding motifs in a set of promoter sequences of co-regulated genes has become an important approach to deciphering transcriptional regulatory mechanisms. In this paper, we present an interactive web server, MotifViz, for three motif discovery programs, Clover, Rover and Motifish, covering most available flavors of algorithms for achieving this goal. For comparison, we have also implemented the simple motif-matching program Possum. MotifViz provides uniform and intuitive input and output formats for all four programs. It can be accessed at
PMCID: PMC441564  PMID: 15215422
5.  Detecting seeded motifs in DNA sequences 
Nucleic Acids Research  2005;33(15):e135.
The problem of detecting DNA motifs with functional relevance in real biological sequences is difficult due to a number of biological, statistical and computational issues and also because of the lack of knowledge about the structure of searched patterns. Many algorithms are implemented in fully automated processes, which are often based upon a guess of input parameters from the user at the very first step. In this paper, we present a novel method for the detection of seeded DNA motifs, composed by regions with a different extent of variability. The method is based on a multi-step approach, which was implemented in a motif searching web tool (MOST). Overrepresented exact patterns are extracted from input sequences and clustered to produce motifs core regions, which are then extended and scored to generate seeded motifs. The combination of automated pattern discovery algorithms and different display tools for the evaluation and selection of results at several analysis steps can potentially lead to much more meaningful results than complete automation can produce. Experimental results on different yeast and human real datasets proved the methodology to be a promising solution for finding seeded motifs. MOST web tool is freely available at .
PMCID: PMC1197136  PMID: 16141193
6.  Compo: composite motif discovery using discrete models 
BMC Bioinformatics  2008;9:527.
Computational discovery of motifs in biomolecular sequences is an established field, with applications both in the discovery of functional sites in proteins and regulatory sites in DNA. In recent years there has been increased attention towards the discovery of composite motifs, typically occurring in cis-regulatory regions of genes.
This paper describes Compo: a discrete approach to composite motif discovery that supports richer modeling of composite motifs and a more realistic background model compared to previous methods. Furthermore, multiple parameter and threshold settings are tested automatically, and the most interesting motifs across settings are selected. This avoids reliance on single hard thresholds, which has been a weakness of previous discrete methods. Comparison of motifs across parameter settings is made possible by the use of p-values as a general significance measure. Compo can either return an ordered list of motifs, ranked according to the general significance measure, or a Pareto front corresponding to a multi-objective evaluation on sensitivity, specificity and spatial clustering.
Compo performs very competitively compared to several existing methods on a collection of benchmark data sets. These benchmarks include a recently published, large benchmark suite where the use of support across sequences allows Compo to correctly identify binding sites even when the relevant PWMs are mixed with a large number of noise PWMs. Furthermore, the possibility of parameter-free running offers high usability, the support for multi-objective evaluation allows a rich view of potential regulators, and the discrete model allows flexibility in modeling and interpretation of motifs.
PMCID: PMC2614996  PMID: 19063744
7.  A niched Pareto genetic algorithm for finding variable length regulatory motifs in DNA sequences 
3 Biotech  2011;2(2):141-148.
The transcription factor binding sites also called as motifs are short, recurring patterns in DNA sequences that are presumed to have a biological function. Identification of the motifs from the promoter region of the genes is an important and unsolved problem specifically in the eukaryotic genomes. In this paper, we present a niched Pareto genetic algorithm to identify the regulatory motifs. This approach is based on the maximization of two objectives of the problem that is the motif length and the consensus similarity score. A long motif means it is less likely to be a false motif. The similarity score represents a motifs probability of conservation in a given set of sequences. Proposed method can find multiple, variable length motifs. In this method, we represented a candidate motif as a combination of length and starting position of the motif in each sequence of the co-regulated genes. This enables the algorithm to identify multiple motifs of variable length. We applied this approach on various data sets and the results show that it can find multiple motifs of variable length in co-regulated genes.
PMCID: PMC3376862
Motif; TFBS; Binding sites; Multi-objective and genetic algorithm
8.  Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes 
Nucleic Acids Research  2004;32(Web Server issue):W199-W203.
One of the greatest challenges that modern molecular biology is facing is the understanding of the complex mechanisms regulating gene expression. A fundamental step in this process requires the characterization of regulatory motifs playing key roles in the regulation of gene expression at transcriptional and post-transcriptional levels. In particular, transcription is modulated by the interaction of transcription factors with their corresponding binding sites. Weeder Web is a web interface to Weeder, an algorithm for the automatic discovery of conserved motifs in a set of related regulatory DNA sequences. The motifs found are in turn likely to be instances of binding sites for some transcription factor. Other than providing access to the program, the interface has been designed so to make usage of the program itself as simple as possible, and to require very little prior knowledge about the length and the conservation of the motifs to be found. In fact, the interface automatically starts different runs of the program, each one with different parameters, and provides the user with an overall summary of the results as well as some ‘advice’ on which motifs look more interesting according to their statistical significance and some simple considerations. The web interface is available at the address by following the ‘Tools’ link.
PMCID: PMC441603  PMID: 15215380
9.  cisRED: a database system for genome-scale computational discovery of regulatory elements 
Nucleic Acids Research  2005;34(Database issue):D68-D73.
We describe cisRED, a database for conserved regulatory elements that are identified and ranked by a genome-scale computational system (). The database and high-throughput predictive pipeline are designed to address diverse target genomes in the context of rapidly evolving data resources and tools. Motifs are predicted in promoter regions using multiple discovery methods applied to sequence sets that include corresponding sequence regions from vertebrates. We estimate motif significance by applying discovery and post-processing methods to randomized sequence sets that are adaptively derived from target sequence sets, retain motifs with p-values below a threshold and identify groups of similar motifs and co-occurring motif patterns. The database offers information on atomic motifs, motif groups and patterns. It is web-accessible, and can be queried directly, downloaded or installed locally.
PMCID: PMC1347438  PMID: 16381958
10.  MEME Suite: tools for motif discovery and searching 
Nucleic Acids Research  2009;37(Web Server issue):W202-W208.
The MEME Suite web server provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm which allows discovery of motifs containing gaps. Three sequence scanning algorithms—MAST, FIMO and GLAM2SCAN—allow scanning numerous DNA and protein sequence databases for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm Tomtom. Transcription factor motifs can be further analyzed for putative function by association with Gene Ontology (GO) terms using the motif-GO term association tool GOMO. MEME output now contains sequence LOGOS for each discovered motif, as well as buttons to allow motifs to be conveniently submitted to the sequence and motif database scanning algorithms (MAST, FIMO and Tomtom), or to GOMO, for further analysis. GLAM2 output similarly contains buttons for further analysis using GLAM2SCAN and for rerunning GLAM2 with different parameters. All of the motif-based tools are now implemented as web services via Opal. Source code, binaries and a web server are freely available for noncommercial use at
PMCID: PMC2703892  PMID: 19458158
11.  seeMotif: exploring and visualizing sequence motifs in 3D structures 
Nucleic Acids Research  2009;37(Web Server issue):W552-W558.
Sequence motifs are important in the study of molecular biology. Motif discovery tools efficiently deliver many function related signatures of proteins and largely facilitate sequence annotation. As increasing numbers of motifs are detected experimentally or predicted computationally, characterizing the functional roles of motifs and identifying the potential synergetic relationships between them are important next steps. A good way to investigate novel motifs is to utilize the abundant 3D structures that have also been accumulated at an astounding rate in recent years. This article reports the development of the web service seeMotif, which provides users with an interactive interface for visualizing sequence motifs on protein structures from the Protein Data Bank (PDB). Researchers can quickly see the locations and conformation of multiple motifs among a number of related structures simultaneously. Considering the fact that PDB sequences are usually shorter than those in sequence databases and/or may have missing residues, seeMotif has two complementary approaches for selecting structures and mapping motifs to protein chains in structures. As more and more structures belonging to previously uncharacterized protein families become available, combining sequence and structure information gives good opportunities to facilitate understanding of protein functions in large-scale genome projects. Available at:, or
PMCID: PMC2703912  PMID: 19477961
12.  SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents 
BMC Systems Biology  2013;7(Suppl 2):S14.
Discovering transcription factor binding sites (TFBS) is one of primary challenges to decipher complex gene regulatory networks encrypted in a genome. A set of short DNA sequences identified by a transcription factor (TF) is known as a motif, which can be expressed accurately in matrix form such as a position-specific scoring matrix (PSSM) and a position frequency matrix. Very frequently, we need to query a motif in a database of motifs by seeking its similar motifs, merge similar TFBS motifs possibly identified by the same TF, separate irrelevant motifs, or filter out spurious motifs. Therefore, a novel metric is required to seize slight differences between irrelevant motifs and highlight the similarity between motifs of the same group in all these applications. While there are already several metrics for motif similarity proposed before, their performance is still far from satisfactory for these applications.
A novel metric has been proposed in this paper with name as SPIC (Similarity with Position Information Contents) for measuring the similarity between a column of a motif and a column of another motif. When defining this similarity score, we consider the likelihood that the column of the first motif's PFM can be produced by the column of the second motif's PSSM, and multiply the likelihood by the information content of the column of the second motif's PSSM, and vise versa. We evaluated the performance of SPIC combined with a local or a global alignment method having a function for affine gap penalty, for computing the similarity between two motifs. We also compared SPIC with seven existing state-of-the-arts metrics for their capability of clustering motifs from the same group and retrieving motifs from a database on three datasets.
When used jointly with the Smith-Waterman local alignment method with an affine gap penalty function (gap open penalty is equal to1, gap extension penalty is equal to 0.5), SPIC outperforms the seven existing state-of-the-art motif similarity metrics combined with their best alignments for matching motifs in database searches, and clustering the same TF's sub-motifs or distinguishing relevant ones from a miscellaneous group of motifs.
We have developed a novel motif similarity metric that can more accurately match motifs in database searches, and more effectively cluster similar motifs and differentiate irrelevant motifs than do the other seven metrics we are aware of.
PMCID: PMC3866262  PMID: 24564945
gene regulatory networks; information contents; transcription factor binding site (TFBS); motif; similarity metric
13.  fdrMotif 
Bioinformatics (Oxford, England)  2008;24(5):629-636.
Most de novo motif identification methods optimize the motif model first and then separately test the statistical significance of the motif score. In the first stage, a motif abundance parameter needs to be specified or modeled. In the second stage, a z-score or p-value is used as the test statistic. Error rates under multiple comparisons are not fully considered.
We propose a simple but novel approach, fdrMotif, that selects as many binding sites as possible while controlling a user-specified false discovery rate (FDR). Unlike existing iterative methods, fdrMotif combines model optimization (e.g., position weight matrix (PWM)) and significance testing at each step. By monitoring the proportion of binding sites selected in many sets of background sequences, fdrMotif controls the FDR in the original data. The model is then updated using an expectation (E) and maximization (M)-like procedure. We propose a new normalization procedure in the E-step for updating the model. This process is repeated until either the model converges or the number of iterations exceeds a maximum.
Simulation studies suggest that our normalization procedure assigns larger weights to the binding sites than do two other commonly used normalization procedures. Furthermore, fdrMotif requires only a user-specified FDR and an initial PWM. When tested on 542 high confidence experimental p53 binding loci, fdrMotif identified 569 p53 binding sites in 505 (93.2%) sequences. In comparison, MEME identified more binding sites but in fewer ChIP sequences than fdrMotif. When tested on 500 sets of simulated “ChIP” sequences with embedded known p53 binding sites, fdrMotif, compared to MEME, has higher sensitivity with similar positive predictive value. Furthermore, fdrMotif is robust to noise: it selected nearly identical binding sites in data adulterated with 50% added background sequences and the unadulterated data. We suggest that fdrMotif represents an improvement over MEME.
PMCID: PMC2376047  PMID: 18296465
14.  SLiMFinder: a web server to find novel, significantly over-represented, short protein motifs 
Nucleic Acids Research  2010;38(Web Server issue):W534-W539.
Short, linear motifs (SLiMs) play a critical role in many biological processes, particularly in protein–protein interactions. The Short, Linear Motif Finder (SLiMFinder) web server is a de novo motif discovery tool that identifies statistically over-represented motifs in a set of protein sequences, accounting for the evolutionary relationships between them. Motifs are returned with an intuitive P-value that greatly reduces the problem of false positives and is accessible to biologists of all disciplines. Input can be uploaded by the user or extracted directly from UniProt. Numerous masking options give the user great control over the contextual information to be included in the analyses. The SLiMFinder server combines these with user-friendly output and visualizations of motif context to allow the user to quickly gain insight into the validity of a putatively functional motif. These visualizations include alignments of motif occurrences, alignments of motifs and their homologues and a visual schematic of the top-ranked motifs. Returned motifs can also be compared with known SLiMs from the literature using CompariMotif. All results are available for download. The SLiMFinder server is available at:
PMCID: PMC2896084  PMID: 20497999
15.  D-MATRIX: A web tool for constructing weight matrix of conserved DNA motifs 
Bioinformation  2009;3(10):415-418.
Despite considerable efforts to date, DNA motif prediction in whole genome remains a challenge for researchers. Currently the genome wide motif prediction tools required either direct pattern sequence (for single motif) or weight matrix (for multiple motifs). Although there are known motif pattern databases and tools for genome level prediction but no tool for weight matrix construction. Considering this, we developed a D-MATRIX tool which predicts the different types of weight matrix based on user defined aligned motif sequence set and motif width. For retrieval of known motif sequences user can access the commonly used databases such as TFD, RegulonDB, DBTBS, Transfac. D­MATRIX program uses a simple statistical approach for weight matrix construction, which can be converted into different file formats according to user requirement. It provides the possibility to identify the conserved motifs in the co­regulated genes or whole genome. As example, we successfully constructed the weight matrix of LexA transcription factor binding site with the help of known sos­box cis­regulatory elements in Deinococcus radiodurans genome. The algorithm is implemented in C-Sharp and wrapped in ASP.Net to maintain a user friendly web interface. D­MATRIX tool is accessible through the CIMAP domain network.
PMCID: PMC2737498  PMID: 19759861
Weight matrix; motif prediction; file format; motif databases
16.  Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas 
BMC Plant Biology  2013;13:42.
The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize.
A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize.
An online tool customized for promoter motif discovery in plants has been generated called Promzea. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.
PMCID: PMC3658923  PMID: 23497159
Promoter; cis-acting; Motif; Maize; Anthocyanin; Phlobaphene; Bioprospector; MEME; Weeder; C1; P
17.  Melina II: a web tool for comparisons among several predictive algorithms to find potential motifs from promoter regions 
Nucleic Acids Research  2007;35(Web Server issue):W227-W231.
We present the second version of Melina, a web-based tool for promoter analysis. Melina II shows potential DNA motifs in promoter regions with a combination of several available programs, Consensus, MEME, Gibbs sampler, MDscan and Weeder, as well as several parameter settings. It allows running a maximum of four programs simultaneously, and comparing their results with graphical representations. In addition, users can build a weight matrix from a predicted motif and apply it to upstream sequences of several typical genomes (human, mouse, S. cerevisiae, E. coli, B. subtilis or A. thaliana) or to public motif databases (JASPAR or DBTBS) in order to find similar motifs. Melina II is a client/server system developed by using Adobe (Macromedia) Flash and is accessible over the web at
PMCID: PMC1933176  PMID: 17537821
18.  W-ChIPMotifs: a web application tool for de novo motif discovery from ChIP-based high-throughput data 
Bioinformatics  2009;25(23):3191-3193.
Summary: W-ChIPMotifs is a web application tool that provides a user friendly interface for de novo motif discovery. The web tool is based on our previous ChIPMotifs program which is a de novo motif finding tool developed for ChIP-based high-throughput data and incorporated various ab initio motif discovery tools such as MEME, MaMF, Weeder and optimized the significance of the detected motifs by using a bootstrap resampling statistic method and a Fisher test. Use of a randomized statistical model like bootstrap resampling can significantly increase the accuracy of the detected motifs. In our web tool, we have modified the program in two aspects: (i) we have refined the P-value with a Bonferroni correction; (ii) we have incorporated the STAMP tool to infer phylogenetic information and to determine the detected motifs if they are novel and known using the TRANSFAC and JASPAR databases. A comprehensive result file is mailed to users.
Availability: Data used in the article may be downloaded from
PMCID: PMC2778340  PMID: 19797408
19.  FoldMiner and LOCK 2: protein structure comparison and motif discovery on the web 
Nucleic Acids Research  2004;32(Web Server issue):W536-W541.
The FoldMiner web server ( provides remote access to methods for protein structure alignment and unsupervised motif discovery. FoldMiner is unique among such algorithms in that it improves both the motif definition and the sensitivity of a structural similarity search by combining the search and motif discovery methods and using information from each process to enhance the other. In a typical run, a query structure is aligned to all structures in one of several databases of single domain targets in order to identify its structural neighbors and to discover a motif that is the basis for the similarity among the query and statistically significant targets. This process is fully automated, but options for manual refinement of the results are available as well. The server uses the Chime plugin and customized controls to allow for visualization of the motif and of structural superpositions. In addition, we provide an interface to the LOCK 2 algorithm for rapid alignments of a query structure to smaller numbers of user-specified targets.
PMCID: PMC441527  PMID: 15215444
20.  FIMO: scanning for occurrences of a given motif 
Bioinformatics  2011;27(7):1017-1018.
Summary: A motif is a short DNA or protein sequence that contributes to the biological function of the sequence in which it resides. Over the past several decades, many computational methods have been described for identifying, characterizing and searching with sequence motifs. Critical to nearly any motif-based sequence analysis pipeline is the ability to scan a sequence database for occurrences of a given motif described by a position-specific frequency matrix.
Results: We describe Find Individual Motif Occurrences (FIMO), a software tool for scanning DNA or protein sequences with motifs described as position-specific scoring matrices. The program computes a log-likelihood ratio score for each position in a given sequence database, uses established dynamic programming methods to convert this score to a P-value and then applies false discovery rate analysis to estimate a q-value for each position in the given sequence. FIMO provides output in a variety of formats, including HTML, XML and several Santa Cruz Genome Browser formats. The program is efficient, allowing for the scanning of DNA sequences at a rate of 3.5 Mb/s on a single CPU.
Availability and Implementation: FIMO is part of the MEME Suite software toolkit. A web server and source code are available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3065696  PMID: 21330290
21.  MEME-LaB: motif analysis in clusters 
Bioinformatics  2013;29(13):1696-1697.
Summary: Genome-wide expression analysis can result in large numbers of clusters of co-expressed genes. Although there are tools for ab initio discovery of transcription factor-binding sites, most do not provide a quick and easy way to study large numbers of clusters. To address this, we introduce a web tool called MEME-LaB. The tool wraps MEME (an ab initio motif finder), providing an interface for users to input multiple gene clusters, retrieve promoter sequences, run motif finding and then easily browse and condense the results, facilitating better interpretation of the results from large-scale datasets.
Availability: MEME-LaB is freely accessible at:
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3694638  PMID: 23681125
22.  MEME-ChIP: motif analysis of large DNA datasets 
Bioinformatics  2011;27(12):1696-1697.
Motivation: Advances in high-throughput sequencing have resulted in rapid growth in large, high-quality datasets including those arising from transcription factor (TF) ChIP-seq experiments. While there are many existing tools for discovering TF binding site motifs in such datasets, most web-based tools cannot directly process such large datasets.
Results: The MEME-ChIP web service is designed to analyze ChIP-seq ‘peak regions’—short genomic regions surrounding declared ChIP-seq ‘peaks’. Given a set of genomic regions, it performs (i) ab initio motif discovery, (ii) motif enrichment analysis, (iii) motif visualization, (iv) binding affinity analysis and (v) motif identification. It runs two complementary motif discovery algorithms on the input data—MEME and DREME—and uses the motifs they discover in subsequent visualization, binding affinity and identification steps. MEME-ChIP also performs motif enrichment analysis using the AME algorithm, which can detect very low levels of enrichment of binding sites for TFs with known DNA-binding motifs. Importantly, unlike with the MEME web service, there is no restriction on the size or number of uploaded sequences, allowing very large ChIP-seq datasets to be analyzed. The analyses performed by MEME-ChIP provide the user with a varied view of the binding and regulatory activity of the ChIP-ed TF, as well as the possible involvement of other DNA-binding TFs.
Availability: MEME-ChIP is available as part of the MEME Suite at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3106185  PMID: 21486936
23.  The Limits of De Novo DNA Motif Discovery 
PLoS ONE  2012;7(11):e47836.
A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify “motifs” that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery–searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA “background” sequence model to represent the null hypothesis and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are “too null,” resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed by held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where “ground truth” is known, discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced “over-fitting” in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not active biologically. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary. An implementation of the LR and ALR algorithms is available at
PMCID: PMC3492406  PMID: 23144830
24.  MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes 
Nucleic Acids Research  2006;34(Web Server issue):W566-W570.
Understanding the complex mechanisms regulating gene expression at the transcriptional and post-transcriptional levels is one of the greatest challenges of the post-genomic era. The MoD (MOtif Discovery) Tools web server comprises a set of tools for the discovery of novel conserved sequence and structure motifs in nucleotide sequences, motifs that in turn are good candidates for regulatory activity. The server includes the following programs: Weeder, for the discovery of conserved transcription factor binding sites (TFBSs) in nucleotide sequences from co-regulated genes; WeederH, for the discovery of conserved TFBSs and distal regulatory modules in sequences from homologous genes; RNAProfile, for the discovery of conserved secondary structure motifs in unaligned RNA sequences whose secondary structure is not known. In this way, a given gene can be compared with other co-regulated genes or with its homologs, or its mRNA can be analyzed for conserved motifs regulating its post-transcriptional fate. The web server thus provides researchers with different strategies and methods to investigate the regulation of gene expression, at both the transcriptional and post-transcriptional levels. Available at and .
PMCID: PMC1538899  PMID: 16845071
25.  RSAT 2011: regulatory sequence analysis tools 
Nucleic Acids Research  2011;39(Web Server issue):W86-W91.
RSAT (Regulatory Sequence Analysis Tools) comprises a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. Thirteen new programs have been added to the 30 described in the 2008 NAR Web Software Issue, including an automated sequence retrieval from EnsEMBL (retrieve-ensembl-seq), two novel motif discovery algorithms (oligo-diff and info-gibbs), a 100-times faster version of matrix-scan enabling the scanning of genome-scale sequence sets, and a series of facilities for random model generation and statistical evaluation (random-genome-fragments, random-motifs, random-sites, implant-sites, sequence-probability, permute-matrix). Our most recent work also focused on motif comparison (compare-matrices) and evaluation of motif quality (matrix-quality) by combining theoretical and empirical measures to assess the predictive capability of position-specific scoring matrices. To process large collections of peak sequences obtained from ChIP-seq or related technologies, RSAT provides a new program (peak-motifs) that combines several efficient motif discovery algorithms to predict transcription factor binding motifs, match them against motif databases and predict their binding sites. Availability (web site, stand-alone programs and SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services):
PMCID: PMC3125777  PMID: 21715389

Results 1-25 (933612)