Search tips
Search criteria

Results 1-25 (805040)

Clipboard (0)

Related Articles

1.  AthaMap, integrating transcriptional and post-transcriptional data 
Nucleic Acids Research  2008;37(Database issue):D983-D986.
The AthaMap database generates a map of predicted transcription factor binding sites (TFBS) for the whole Arabidopsis thaliana genome. AthaMap has now been extended to include data on post-transcriptional regulation. A total of 403 173 genomic positions of small RNAs have been mapped in the A. thaliana genome. These identify 5772 putative post-transcriptionally regulated target genes. AthaMap tools have been modified to improve the identification of common TFBS in co-regulated genes by subtracting post-transcriptionally regulated genes from such analyses. Furthermore, AthaMap was updated to the TAIR7 genome annotation, a graphic display of gene analysis results was implemented, and the TFBS data content was increased. AthaMap is freely available at
PMCID: PMC2686474  PMID: 18842622
2.  AthaMap web tools for database-assisted identification of combinatorial cis-regulatory elements and the display of highly conserved transcription factor binding sites in Arabidopsis thaliana 
Nucleic Acids Research  2005;33(Web Server issue):W397-W402.
The AthaMap database generates a map of cis-regulatory elements for the Arabidopsis thaliana genome. AthaMap contains more than 7.4 × 106 putative binding sites for 36 transcription factors (TFs) from 16 different TF families. A newly implemented functionality allows the display of subsets of higher conserved transcription factor binding sites (TFBSs). Furthermore, a web tool was developed that permits a user-defined search for co-localizing cis-regulatory elements. The user can specify individually the level of conservation for each TFBS and a spacer range between them. This web tool was employed for the identification of co-localizing sites of known interacting TFs and TFs containing two DNA-binding domains. More than 1.8 × 105 combinatorial elements were annotated in the AthaMap database. These elements can also be used to identify more complex co-localizing elements consisting of up to four TFBSs. The AthaMap database and the connected web tools are a valuable resource for the analysis and the prediction of gene expression regulation at .
PMCID: PMC1160156  PMID: 15980498
3.  ‘MicroRNA Targets’, a new AthaMap web-tool for genome-wide identification of miRNA targets in Arabidopsis thaliana 
BioData Mining  2012;5:7.
The AthaMap database generates a genome-wide map for putative transcription factor binding sites for A. thaliana. When analyzing transcriptional regulation using AthaMap it may be important to learn which genes are also post-transcriptionally regulated by inhibitory RNAs. Therefore, a unified database for transcriptional and post-transcriptional regulation will be highly useful for the analysis of gene expression regulation.
To identify putative microRNA target sites in the genome of A. thaliana, processed mature miRNAs from 243 annotated miRNA genes were used for screening with the psRNATarget web server. Positional information, target genes and the psRNATarget score for each target site were annotated to the AthaMap database. Furthermore, putative target sites for small RNAs from seven small RNA transcriptome datasets were used to determine small RNA target sites within the A. thaliana genome.
Putative 41,965 genome wide miRNA target sites and 10,442 miRNA target genes were identified in the A. thaliana genome. Taken together with genes targeted by small RNAs from small RNA transcriptome datasets, a total of 16,600 A. thaliana genes are putatively regulated by inhibitory RNAs. A novel web-tool, ‘MicroRNA Targets’, was integrated into AthaMap which permits the identification of genes predicted to be regulated by selected miRNAs. The predicted target genes are displayed with positional information and the psRNATarget score of the target site. Furthermore, putative target sites of small RNAs from selected tissue datasets can be identified with the new ‘Small RNA Targets’ web-tool.
The integration of predicted miRNA and small RNA target sites with transcription factor binding sites will be useful for AthaMap-assisted gene expression analysis. URL:
PMCID: PMC3410767  PMID: 22800758
Arabidopsis thaliana; AthaMap; MicroRNAs; Small RNAs; Post-transcriptional regulation
4.  AthaMap web tools for the analysis and identification of co-regulated genes 
Nucleic Acids Research  2006;35(Database issue):D857-D862.
The AthaMap database generates a map of cis-regulatory elements for the whole Arabidopsis thaliana genome. This database has been extended by new tools to identify common cis-regulatory elements in specific regions of user-provided gene sets. A resulting table displays all cis-regulatory elements annotated in AthaMap including positional information relative to the respective gene. Further tables show overviews with the number of individual transcription factor binding sites (TFBS) present and TFBS common to the whole set of genes. Over represented cis-elements are easily identified. These features were used to detect specific enrichment of drought-responsive elements in cold-induced genes. For identification of co-regulated genes, the output table of the colocalization function was extended to show the closest genes and their relative distances to the colocalizing TFBS. Gene sets determined by this function can be used for a co-regulation analysis in microarray gene expression databases such as Genevestigator or PathoPlant. Additional improvements of AthaMap include display of the gene structure in the sequence window and a significant data increase. AthaMap is freely available at .
PMCID: PMC1761422  PMID: 17148485
5.  AthaMap-assisted transcription factor target gene identification in Arabidopsis thaliana 
The AthaMap database generates a map of potential transcription factor binding sites (TFBS) and small RNA target sites in the Arabidopsis thaliana genome. The database contains sites for 115 different transcription factors (TFs). TFBS were identified with positional weight matrices (PWMs) or with single binding sites. With the new web tool ‘Gene Identification’, it is possible to identify potential target genes for selected TFs. For these analyses, the user can define a region of interest of up to 6000 bp in all annotated genes. For TFBS determined with PWMs, the search can be restricted to high-quality TFBS. The results are displayed in tables that identify the gene, position of the TFBS and, if applicable, individual score of the TFBS. In addition, data files can be downloaded that harbour positional information of TFBS of all TFs in a region between −2000 and +2000 bp relative to the transcription or translation start site. Also, data content of AthaMap was increased and the database was updated to the TAIR8 genome release.
Database URL:
PMCID: PMC3011983  PMID: 21177332
6.  Internet Resources for Gene Expression Analysis in Arabidopsis thaliana 
Current Genomics  2008;9(6):375-380.
The number of online databases and web-tools for gene expression analysis in Arabidopsis thaliana has increased tremendously during the last years. These resources permit the database-assisted identification of putative cis-regulatory DNA sequences, their binding proteins, and the determination of common cis-regulatory motifs in coregulated genes. DNA binding proteins may be predicted by the type of cis-regulatory motif. Further questions of combinatorial control based on the interaction of DNA binding proteins and the colocalization of cis-regulatory motifs can be addressed. The database-assisted spatial and temporal expression analysis of DNA binding proteins and their target genes may help to further refine experimental approaches. Signal transduction pathways upstream of regulated genes are not yet fully accessible in databases mainly because they need to be manually annotated. This review focuses on the use of the AthaMap and PathoPlant® databases for gene expression regulation analysis and discusses similar and complementary online databases and web-tools. Online databases are helpful for the development of working hypothesis and for designing subsequent experiments.
PMCID: PMC2691667  PMID: 19506727
Bioinformatics; databases; gene expression; plants; transcription; web-server.
7.  TRANSFAC: an integrated system for gene expression regulation 
Nucleic Acids Research  2000;28(1):316-319.
TRANSFAC is a database on transcription factors, their genomic binding sites and DNA-binding profiles ( ). Its content has been enhanced, in particular by information about training sequences used for the construction of nucleotide matrices as well as by data on plant sites and factors. Moreover, TRANSFAC has been extended by two new modules: PathoDB provides data on pathologically relevant mutations in regulatory regions and transcription factor genes, whereas S/MARt DB compiles features of scaffold/matrix attached regions (S/MARs) and the proteins binding to them. Additionally, the databases TRANSPATH, about signal transduction, and CYTOMER, about organs and cell types, have been extended and are increasingly integrated with the TRANSFAC data sources.
PMCID: PMC102445  PMID: 10592259
8.  Nencki Genomics Database—Ensembl funcgen enhanced with intersections, user data and genome-wide TFBS motifs 
We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql –h –u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface.
Database URL:
PMCID: PMC3788330  PMID: 24089456
9.  CREME: Cis-Regulatory Module Explorer for the human genome 
Nucleic Acids Research  2004;32(Web Server issue):W253-W256.
The binding of transcription factors to specific regulatory sequence elements is a primary mechanism for controlling gene transcription. Eukaryotic genes are often regulated by several transcription factors whose binding sites are tightly clustered and form cis-regulatory modules. In this paper, we present a web server, CREME, for identifying and visualizing cis-regulatory modules in the promoter regions of a given set of potentially co-regulated genes. CREME relies on a database of putative transcription factor binding sites that have been annotated across the human genome using a library of position weight matrices and evolutionary conservation with the mouse and rat genomes. A search algorithm is applied to this data set to identify combinations of transcription factors whose binding sites tend to co-occur in close proximity in the promoter regions of the input gene set. The identified cis-regulatory modules are statistically scored and significant combinations are reported and graphically visualized. Our web server is available at
PMCID: PMC441523  PMID: 15215390
10.  MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes 
BMC Bioinformatics  2005;6:79.
Cis-regulatory modules are combinations of regulatory elements occurring in close proximity to each other that control the spatial and temporal expression of genes. The ability to identify them in a genome-wide manner depends on the availability of accurate models and of search methods able to detect putative regulatory elements with enhanced sensitivity and specificity.
We describe the implementation of a search method for putative transcription factor binding sites (TFBSs) based on hidden Markov models built from alignments of known sites. We built 1,079 models of TFBSs using experimentally determined sequence alignments of sites provided by the TRANSFAC and JASPAR databases and used them to scan sequences of the human, mouse, fly, worm and yeast genomes. In several cases tested the method identified correctly experimentally characterized sites, with better specificity and sensitivity than other similar computational methods. Moreover, a large-scale comparison using synthetic data showed that in the majority of cases our method performed significantly better than a nucleotide weight matrix-based method.
The search engine, available at , allows the identification, visualization and selection of putative TFBSs occurring in the promoter or other regions of a gene from the human, mouse, fly, worm and yeast genomes. In addition it allows the user to upload a sequence to query and to build a model by supplying a multiple sequence alignment of binding sites for a transcription factor of interest. Due to its extensive database of models, powerful search engine and flexible interface, MAPPER represents an effective resource for the large-scale computational analysis of transcriptional regulation.
PMCID: PMC1131891  PMID: 15799782
11.  AGRIS: Arabidopsis Gene Regulatory Information Server, an information resource of Arabidopsis cis-regulatory elements and transcription factors 
BMC Bioinformatics  2003;4:25.
The gene regulatory information is hardwired in the promoter regions formed by cis-regulatory elements that bind specific transcription factors (TFs). Hence, establishing the architecture of plant promoters is fundamental to understanding gene expression. The determination of the regulatory circuits controlled by each TF and the identification of the cis-regulatory sequences for all genes have been identified as two of the goals of the Multinational Coordinated Arabidopsis thaliana Functional Genomics Project by the Multinational Arabidopsis Steering Committee (June 2002).
AGRIS is an information resource of Arabidopsis promoter sequences, transcription factors and their target genes. AGRIS currently contains two databases, AtTFDB (Arabidopsis thaliana transcription factor database) and AtcisDB (Arabidopsis thaliana cis-regulatory database). AtTFDB contains information on approximately 1,400 transcription factors identified through motif searches and grouped into 34 families. AtTFDB links the sequence of the transcription factors with available mutants and, when known, with the possible genes they may regulate. AtcisDB consists of the 5' regulatory sequences of all 29,388 annotated genes with a description of the corresponding cis-regulatory elements. Users can search the databases for (i) promoter sequences, (ii) a transcription factor, (iii) a direct target genes for a specific transcription factor, or (vi) a regulatory network that consists of transcription factors and their target genes.
AGRIS provides the necessary software tools on Arabidopsis transcription factors and their putative binding sites on all genes to initiate the identification of transcriptional regulatory networks in the model dicotyledoneous plant Arabidopsis thaliana. AGRIS can be accessed from .
PMCID: PMC166152  PMID: 12820902
12.  AtPAN: an integrated system for reconstructing transcriptional regulatory networks in Arabidopsis thaliana 
BMC Genomics  2012;13:85.
Construction of transcriptional regulatory networks (TRNs) is of priority concern in systems biology. Numerous high-throughput approaches, including microarray and next-generation sequencing, are extensively adopted to examine transcriptional expression patterns on the whole-genome scale; those data are helpful in reconstructing TRNs. Identifying transcription factor binding sites (TFBSs) in a gene promoter is the initial step in elucidating the transcriptional regulation mechanism. Since transcription factors usually co-regulate a common group of genes by forming regulatory modules with similar TFBSs. Therefore, the combinatorial interactions of transcription factors must be modeled to reconstruct the gene regulatory networks.
Description For systems biology applications, this work develops a novel database called Arabidopsis thaliana Promoter Analysis Net (AtPAN), capable of detecting TFBSs and their corresponding transcription factors (TFs) in a promoter or a set of promoters in Arabidopsis. For further analysis, according to the microarray expression data and literature, the co-expressed TFs and their target genes can be retrieved from AtPAN. Additionally, proteins interacting with the co-expressed TFs are also incorporated to reconstruct co-expressed TRNs. Moreover, combinatorial TFs can be detected by the frequency of TFBSs co-occurrence in a group of gene promoters. In addition, TFBSs in the conserved regions between the two input sequences or homologous genes in Arabidopsis and rice are also provided in AtPAN. The output results also suggest conducting wet experiments in the future.
The AtPAN, which has a user-friendly input/output interface and provide graphical view of the TRNs. This novel and creative resource is freely available online at
PMCID: PMC3314555  PMID: 22397531
13.  PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences 
Nucleic Acids Research  2002;30(1):325-327.
PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Regulatory elements are represented by positional matrices, consensus sequences and individual sites on particular promoter sequences. Links to the EMBL, TRANSFAC and MEDLINE databases are provided when available. Data about the transcription sites are extracted mainly from the literature, supplemented with an increasing number of in silico predicted data. Apart from a general description for specific transcription factor sites, levels of confidence for the experimental evidence, functional information and the position on the promoter are given as well. New features have been implemented to search for plant cis-acting regulatory elements in a query sequence. Furthermore, links are now provided to a new clustering and motif search method to investigate clusters of co-expressed genes. New regulatory elements can be sent automatically and will be added to the database after curation. The PlantCARE relational database is available via the World Wide Web at
PMCID: PMC99092  PMID: 11752327
14.  COTRASIF: conservation-aided transcription-factor-binding site finder 
Nucleic Acids Research  2009;37(7):e49.
COTRASIF is a web-based tool for the genome-wide search of evolutionary conserved regulatory regions (transcription factor-binding sites, TFBS) in eukaryotic gene promoters. Predictions are made using either a position-weight matrix search method, or a hidden Markov model search method, depending on the availability of the matrix and actual sequences of the target TFBS. COTRASIF is a fully integrated solution incorporating both a gene promoter database (based on the regular Ensembl genome annotation releases) and both JASPAR and TRANSFAC databases of TFBS matrices. To decrease the false-positives rate an integrated evolutionary conservation filter is available, which allows the selection of only those of the predicted TFBS that are present in the promoters of the related species’ orthologous genes. COTRASIF is very easy to use, implements a regularly updated database of promoters and is a powerful solution for genome-wide TFBS searching. COTRASIF is freely available at
PMCID: PMC2673430  PMID: 19264796
15.  In silico Analysis of Transcription Factor Repertoire and Prediction of Stress Responsive Transcription Factors in Soybean 
Sequence-specific DNA-binding transcription factors (TFs) are often termed as ‘master regulators’ which bind to DNA and either activate or repress gene transcription. We have computationally analysed the soybean genome sequence data and constructed a proper set of TFs based on the Hidden Markov Model profiles of DNA-binding domain families. Within the soybean genome, we identified 4342 loci encoding 5035 TF models which grouped into 61 families. We constructed a database named SoybeanTFDB ( containing the full compilation of soybean TFs and significant information such as: functional motifs, full-length cDNAs, domain alignments, promoter regions, genomic organization and putative regulatory functions based on annotations of gene ontology (GO) inferred by comparative analysis with Arabidopsis. With particular interest in abiotic stress signalling, we analysed the promoter regions for all of the TF encoding genes as a means to identify abiotic stress responsive cis-elements as well as all types of cis-motifs provided by the PLACE database. SoybeanTFDB enables scientists to easily access cis-element and GO annotations to aid in the prediction of TF function and selection of TFs with functions of interest. This study provides a basic framework and an important user-friendly public information resource which enables analyses of transcriptional regulation in soybean.
PMCID: PMC2780956  PMID: 19884168
soybean; transcription factors; abiotic stress; database
16.  Measuring similarities between transcription factor binding sites 
BMC Bioinformatics  2005;6:237.
Collections of transcription factor binding profiles (Transfac, Jaspar) are essential to identify regulatory elements in DNA sequences. Subsets of highly similar profiles complicate large scale analysis of transcription factor binding sites.
We propose to identify and group similar profiles using two independent similarity measures: χ2 distances between position frequency matrices (PFMs) and correlation coefficients between position weight matrices (PWMs) scores.
We show that these measures complement each other and allow to associate Jaspar and Transfac matrices. Clusters of highly similar matrices are identified and can be used to optimise the search for regulatory elements. Moreover, the application of the measures is illustrated by assigning E-box matrices of a SELEX experiment and of experimentally characterised binding sites of circadian clock genes to the Myc-Max cluster.
PMCID: PMC1261160  PMID: 16191190
17.  LASAGNA: A novel algorithm for transcription factor binding site alignment 
BMC Bioinformatics  2013;14:108.
Scientists routinely scan DNA sequences for transcription factor (TF) binding sites (TFBSs). Most of the available tools rely on position-specific scoring matrices (PSSMs) constructed from aligned binding sites. Because of the resolutions of assays used to obtain TFBSs, databases such as TRANSFAC, ORegAnno and PAZAR store unaligned variable-length DNA segments containing binding sites of a TF. These DNA segments need to be aligned to build a PSSM. While the TRANSFAC database provides scoring matrices for TFs, nearly 78% of the TFs in the public release do not have matrices available. As work on TFBS alignment algorithms has been limited, it is highly desirable to have an alignment algorithm tailored to TFBSs.
We designed a novel algorithm named LASAGNA, which is aware of the lengths of input TFBSs and utilizes position dependence. Results on 189 TFs of 5 species in the TRANSFAC database showed that our method significantly outperformed ClustalW2 and MEME. We further compared a PSSM method dependent on LASAGNA to an alignment-free TFBS search method. Results on 89 TFs whose binding sites can be located in genomes showed that our method is significantly more precise at fixed recall rates. Finally, we described LASAGNA-ChIP, a more sophisticated version for ChIP (Chromatin immunoprecipitation) experiments. Under the one-per-sequence model, it showed comparable performance with MEME in discovering motifs in ChIP-seq peak sequences.
We conclude that the LASAGNA algorithm is simple and effective in aligning variable-length binding sites. It has been integrated into a user-friendly webtool for TFBS search and visualization called LASAGNA-Search. The tool currently stores precomputed PSSM models for 189 TFs and 133 TFs built from TFBSs in the TRANSFAC Public database (release 7.0) and the ORegAnno database (08Nov10 dump), respectively. The webtool is available at
PMCID: PMC3747862  PMID: 23522376
18.  EZ-Retrieve: a web-server for batch retrieval of coordinate-specified human DNA sequences and underscoring putative transcription factor-binding sites 
Nucleic Acids Research  2002;30(21):e121.
The availability of a draft human genome sequence and ability to monitor the transcription of thousands of genes with DNA microarrays has necessitated the need for new computational tools that can analyze cis-regulatory elements controlling genes that display similar expression patterns. We have developed a tool designated EZ-Retrieve that can: (i) retrieve any particular region of human genome sequence from the NCBI database and (ii) analyze retrieved sequences for putative transcription factor-binding sites (TFBSs) as they appear on the TRANSFAC database. The tool is web-based, user-friendly and offers both batch sequence retrieval and batch TFBS prediction. A major application of EZ-Retrieve is the analysis of co-expressed genes that are highlighted as expression clusters in DNA microarray experiments.
PMCID: PMC135846  PMID: 12409480
19.  MATCHTM: a tool for searching transcription factor binding sites in DNA sequences 
Nucleic Acids Research  2003;31(13):3576-3579.
MatchTM is a weight matrix-based tool for searching putative transcription factor binding sites in DNA sequences. MatchTM is closely interconnected and distributed together with the TRANSFAC® database. In particular, MatchTM uses the matrix library collected in TRANSFAC® and therefore provides the possibility to search for a great variety of different transcription factor binding sites. Several sets of optimised matrix cut-off values are built in the system to provide a variety of search modes of different stringency. The user may construct and save his/her specific user profiles which are selected subsets of matrices including default or user-defined cut-off values. Furthermore a number of tissue-specific profiles are provided that were compiled by the TRANSFAC® team. A public version of the MatchTM tool is available at: The same program with a different web interface can be found at An advanced version of the tool called MatchTM Professional is available at
PMCID: PMC169193  PMID: 12824369
20.  Identification of conserved regulatory elements by comparative genome analysis 
Journal of Biology  2003;2(2):13.
For genes that have been successfully delineated within the human genome sequence, most regulatory sequences remain to be elucidated. The annotation and interpretation process requires additional data resources and significant improvements in computational methods for the detection of regulatory regions. One approach of growing popularity is based on the preferential conservation of functional sequences over the course of evolution by selective pressure, termed 'phylogenetic footprinting'. Mutations are more likely to be disruptive if they appear in functional sites, resulting in a measurable difference in evolution rates between functional and non-functional genomic segments.
We have devised a flexible suite of methods for the identification and visualization of conserved transcription-factor-binding sites. The system reports those putative transcription-factor-binding sites that are both situated in conserved regions and located as pairs of sites in equivalent positions in alignments between two orthologous sequences. An underlying collection of metazoan transcription-factor-binding profiles was assembled to facilitate the study. This approach results in a significant improvement in the detection of transcription-factor-binding sites because of an increased signal-to-noise ratio, as demonstrated with two sets of promoter sequences. The method is implemented as a graphical web application, ConSite, which is at the disposal of the scientific community at .
Phylogenetic footprinting dramatically improves the predictive selectivity of bioinformatic approaches to the analysis of promoter sequences. ConSite delivers unparalleled performance using a novel database of high-quality binding models for metazoan transcription factors. With a dynamic interface, this bioinformatics tool provides broad access to promoter analysis with phylogenetic footprinting.
PMCID: PMC193685  PMID: 12760745
21.  T-Reg Comparator: an analysis tool for the comparison of position weight matrices 
Nucleic Acids Research  2005;33(Web Server issue):W438-W441.
T-Reg Comparator is a novel software tool designed to support research into transcriptional regulation. Sequence motifs representing transcription factor binding sites are usually encoded as position weight matrices. The user inputs a set of such weight matrices or binding site sequences and our program matches them against the T-Reg database, which is presently built on data from the Transfac [E. Wingender (2004) In Silico Biol., 4, 55–61] and Jaspar [A. Sandelin, W. Alkema, P. Engstrom, W. W. Wasserman and B. Lenhard (2004) Nucleic Acids Res., 32, D91–D94]. Our tool delivers a detailed report on similarities between user-supplied motifs and motifs in the database. Apart from simple one-to-one relationships, T-Reg Comparator is also able to detect similarities between submatrices. In addition, we provide a user interface to a program for sequence scanning with weight matrices. Typical areas of application for T-Reg Comparator are motif and regulatory module finding and annotation of regulatory genomic regions. T-Reg Comparator is available at .
PMCID: PMC1160266  PMID: 15980506
22.  DISPARE: DIScriminative PAttern REfinement for Position Weight Matrices 
BMC Bioinformatics  2009;10:388.
The accurate determination of transcription factor binding affinities is an important problem in biology and key to understanding the gene regulation process. Position weight matrices are commonly used to represent the binding properties of transcription factor binding sites but suffer from low information content and a large number of false matches in the genome. We describe a novel algorithm for the refinement of position weight matrices representing transcription factor binding sites based on experimental data, including ChIP-chip analyses. We present an iterative weight matrix optimization method that is more accurate in distinguishing true transcription factor binding sites from a negative control set. The initial position weight matrix comes from JASPAR, TRANSFAC or other sources. The main new features are the discriminative nature of the method and matrix width and length optimization.
The algorithm was applied to the increasing collection of known transcription factor binding sites obtained from ChIP-chip experiments. The results show that our algorithm significantly improves the sensitivity and specificity of matrix models for identifying transcription factor binding sites.
When the transcription factor is known, it is more appropriate to use a discriminative approach such as the one presented here to derive its transcription factor-DNA binding properties starting with a matrix, as opposed to performing de novo motif discovery. Generating more accurate position weight matrices will ultimately contribute to a better understanding of eukaryotic transcriptional regulation, and could potentially offer a better alternative to ab initio motif discovery.
PMCID: PMC2788558  PMID: 19941641
23.  Increasing Coverage of Transcription Factor Position Weight Matrices through Domain-level Homology 
PLoS ONE  2012;7(8):e42779.
Transcription factor-DNA interactions, central to cellular regulation and control, are commonly described by position weight matrices (PWMs). These matrices are frequently used to predict transcription factor binding sites in regulatory regions of DNA to complement and guide further experimental investigation. The DNA sequence preferences of transcription factors, encoded in PWMs, are dictated primarily by select residues within the DNA binding domain(s) that interact directly with DNA. Therefore, the DNA binding properties of homologous transcription factors with identical DNA binding domains may be characterized by PWMs derived from different species. Accordingly, we have implemented a fully automated domain-level homology searching method for identical DNA binding sequences.
By applying the domain-level homology search to transcription factors with existing PWMs in the JASPAR and TRANSFAC databases, we were able to significantly increase coverage in terms of the total number of PWMs associated with a given species, assign PWMs to transcription factors that did not previously have any associations, and increase the number of represented species with PWMs over an order of magnitude. Additionally, using protein binding microarray (PBM) data, we have validated the domain-level method by demonstrating that transcription factor pairs with matching DNA binding domains exhibit comparable DNA binding specificity predictions to transcription factor pairs with completely identical sequences.
The increased coverage achieved herein demonstrates the potential for more thorough species-associated investigation of protein-DNA interactions using existing resources. The PWM scanning results highlight the challenging nature of transcription factors that contain multiple DNA binding domains, as well as the impact of motif discovery on the ability to predict DNA binding properties. The method is additionally suitable for identifying domain-level homology mappings to enable utilization of additional information sources in the study of transcription factors. The domain-level homology search method, resulting PWM mappings, web-based user interface, and web API are publicly available at
PMCID: PMC3428306  PMID: 22952610
24.  FOOTER: a web tool for finding mammalian DNA regulatory regions using phylogenetic footprinting 
Nucleic Acids Research  2005;33(Web Server issue):W442-W446.
FOOTER is a newly developed algorithm that analyzes homologous mammalian promoter sequences in order to identify transcriptional DNA regulatory ‘signals’. FOOTER uses prior knowledge about the binding site preferences of the transcription factors (TFs) in the form of position-specific scoring matrices (PSSMs). The PSSM models are generated from known mammalian binding sites from the TRANSFAC database. In a test set of 72 confirmed binding sites (most of them not present in TRANSFAC) of 19 TFs, it exhibited 83% sensitivity and 72% specificity. FOOTER is accessible over the web at .
PMCID: PMC1160181  PMID: 15980508
25.  A web server for transcription factor binding site prediction 
Bioinformation  2006;1(5):156-157.
Promoter prediction has gained increased attention in studies related to transcriptional regulation of gene expression.We developed a web server named PMSearch (Poly Matrix Search) which utilizes Position Frequency Matrices (PFMs) to predict transcription factor binding sites (TFBSs) in DNA sequences. PMSearch takes PFMs (either user-defined or retrieved from local dataset which currently contains 507 PFMs from Transfac Public 7.0 and JASPAR) and DNA sequences of interest as the input, then scans the DNA sequences with PFMs and reports the sites of high scores as the putative binding sites. The output of the server includes 1) A plot for the distribution of predicted TFBS along the DNA sequence, 2) A table listing location, score and motif for each putative binding site, and 3) Clusters of predicted binding sites. PMSearch also provides links for accessing clusters of PFMs that are similar to the input PFMs to facilitate complicated promoter analysis.
PMSearch is available for free at
PMCID: PMC1891680  PMID: 17597879
position frequency matrix; motif; transcription factor binding site; web server

Results 1-25 (805040)