Related Articles

1.  AthaMap web tools for the analysis and identification of co-regulated genes 
Nucleic Acids Research  2006;35(Database issue):D857-D862.
The AthaMap database generates a map of cis-regulatory elements for the whole Arabidopsis thaliana genome. This database has been extended by new tools to identify common cis-regulatory elements in specific regions of user-provided gene sets. A resulting table displays all cis-regulatory elements annotated in AthaMap including positional information relative to the respective gene. Further tables show overviews with the number of individual transcription factor binding sites (TFBS) present and TFBS common to the whole set of genes. Overrepresented cis-elements are easily identified. These features were used to detect specific enrichment of drought-responsive elements in cold-induced genes. For identification of co-regulated genes, the output table of the colocalization function was extended to show the closest genes and their relative distances to the colocalizing TFBS. Gene sets determined by this function can be used for a co-regulation analysis in microarray gene expression databases such as Genevestigator or PathoPlant. Additional improvements of AthaMap include display of the gene structure in the sequence window and a significant data increase. AthaMap is freely available at .
doi:10.1093/nar/gkl1006
PMCID: PMC1761422  PMID: 17148485
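The overview tables described in this abstract (number of sites per factor, and factors common to the whole gene set) can be illustrated with a minimal Python sketch. This is not AthaMap code: the input layout, the function name `summarize_tfbs`, and the gene and factor labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def summarize_tfbs(sites_by_gene):
    """Summarize predicted sites for a gene set.

    sites_by_gene maps a gene ID to a list of factor names, one entry per
    predicted binding site (hypothetical input layout, not AthaMap's).
    Returns (total sites per factor, factors present in every gene).
    """
    sites_per_factor = Counter()
    genes_per_factor = Counter()
    for factors in sites_by_gene.values():
        sites_per_factor.update(factors)       # count every site
        genes_per_factor.update(set(factors))  # count each gene once
    n_genes = len(sites_by_gene)
    common = sorted(f for f, n in genes_per_factor.items() if n == n_genes)
    return sites_per_factor, common


# Made-up gene set with placeholder factor labels.
example = {
    "gene1": ["TF_A", "TF_B", "TF_A"],
    "gene2": ["TF_A", "TF_C"],
    "gene3": ["TF_B", "TF_A"],
}
totals, common = summarize_tfbs(example)
print(totals["TF_A"], common)
```

Counting each gene once per factor keeps a gene with many copies of one site from masking the fact that another gene has none.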
2.  AthaMap, integrating transcriptional and post-transcriptional data 
Nucleic Acids Research  2008;37(Database issue):D983-D986.
The AthaMap database generates a map of predicted transcription factor binding sites (TFBS) for the whole Arabidopsis thaliana genome. AthaMap has now been extended to include data on post-transcriptional regulation. A total of 403 173 genomic positions of small RNAs have been mapped in the A. thaliana genome. These identify 5772 putative post-transcriptionally regulated target genes. AthaMap tools have been modified to improve the identification of common TFBS in co-regulated genes by subtracting post-transcriptionally regulated genes from such analyses. Furthermore, AthaMap was updated to the TAIR7 genome annotation, a graphic display of gene analysis results was implemented, and the TFBS data content was increased. AthaMap is freely available at http://www.athamap.de/.
doi:10.1093/nar/gkn709
PMCID: PMC2686474  PMID: 18842622
3.  AthaMap web tools for database-assisted identification of combinatorial cis-regulatory elements and the display of highly conserved transcription factor binding sites in Arabidopsis thaliana 
Nucleic Acids Research  2005;33(Web Server issue):W397-W402.
The AthaMap database generates a map of cis-regulatory elements for the Arabidopsis thaliana genome. AthaMap contains more than 7.4 × 10^6 putative binding sites for 36 transcription factors (TFs) from 16 different TF families. A newly implemented functionality allows the display of subsets of higher conserved transcription factor binding sites (TFBSs). Furthermore, a web tool was developed that permits a user-defined search for co-localizing cis-regulatory elements. The user can specify individually the level of conservation for each TFBS and a spacer range between them. This web tool was employed for the identification of co-localizing sites of known interacting TFs and TFs containing two DNA-binding domains. More than 1.8 × 10^5 combinatorial elements were annotated in the AthaMap database. These elements can also be used to identify more complex co-localizing elements consisting of up to four TFBSs. The AthaMap database and the connected web tools are a valuable resource for the analysis and the prediction of gene expression regulation at .
doi:10.1093/nar/gki395
PMCID: PMC1160156  PMID: 15980498
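A search for co-localizing sites with a user-defined spacer range, as described in this abstract, might be sketched as follows. The sketch assumes sites are given as (start, end) coordinates on one chromosome, defines the spacer as the number of bases between the end of site A and the start of site B, and only considers A upstream of B. These conventions and the function name `colocalizing_pairs` are assumptions for illustration, not the AthaMap implementation.

```python
from bisect import bisect_left, bisect_right


def colocalizing_pairs(sites_a, sites_b, min_spacer, max_spacer):
    """Pair each A site with B sites that start min_spacer..max_spacer
    bases after the A site ends.

    Sites are (start, end) tuples with inclusive coordinates on the same
    chromosome (assumed convention). Only A-upstream-of-B is checked.
    """
    b_sorted = sorted(sites_b)
    b_starts = [start for start, _ in b_sorted]
    pairs = []
    for a_start, a_end in sorted(sites_a):
        first_allowed = a_end + 1 + min_spacer
        last_allowed = a_end + 1 + max_spacer
        lo = bisect_left(b_starts, first_allowed)
        hi = bisect_right(b_starts, last_allowed)
        for b_site in b_sorted[lo:hi]:
            pairs.append(((a_start, a_end), b_site))
    return pairs


# Made-up coordinates: the first two B sites lie 2 and 12 bases downstream.
print(colocalizing_pairs([(100, 107)], [(110, 115), (120, 125), (200, 205)], 0, 15))
```

Sorting the B sites once and using binary search avoids comparing every A site with every B site, which matters when each factor has many thousands of predicted sites.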
4.  ‘MicroRNA Targets’, a new AthaMap web-tool for genome-wide identification of miRNA targets in Arabidopsis thaliana 
BioData Mining  2012;5:7.
Background
The AthaMap database generates a genome-wide map for putative transcription factor binding sites for A. thaliana. When analyzing transcriptional regulation using AthaMap it may be important to learn which genes are also post-transcriptionally regulated by inhibitory RNAs. Therefore, a unified database for transcriptional and post-transcriptional regulation will be highly useful for the analysis of gene expression regulation.
Methods
To identify putative microRNA target sites in the genome of A. thaliana, processed mature miRNAs from 243 annotated miRNA genes were used for screening with the psRNATarget web server. Positional information, target genes and the psRNATarget score for each target site were annotated to the AthaMap database. Furthermore, putative target sites for small RNAs from seven small RNA transcriptome datasets were used to determine small RNA target sites within the A. thaliana genome.
Results
A total of 41,965 putative genome-wide miRNA target sites and 10,442 miRNA target genes were identified in the A. thaliana genome. Taken together with genes targeted by small RNAs from small RNA transcriptome datasets, a total of 16,600 A. thaliana genes are putatively regulated by inhibitory RNAs. A novel web-tool, ‘MicroRNA Targets’, was integrated into AthaMap which permits the identification of genes predicted to be regulated by selected miRNAs. The predicted target genes are displayed with positional information and the psRNATarget score of the target site. Furthermore, putative target sites of small RNAs from selected tissue datasets can be identified with the new ‘Small RNA Targets’ web-tool.
Conclusions
The integration of predicted miRNA and small RNA target sites with transcription factor binding sites will be useful for AthaMap-assisted gene expression analysis. URL: http://www.athamap.de/
doi:10.1186/1756-0381-5-7
PMCID: PMC3410767  PMID: 22800758
Arabidopsis thaliana; AthaMap; MicroRNAs; Small RNAs; Post-transcriptional regulation
5.  AthaMap: an online resource for in silico transcription factor binding sites in the Arabidopsis thaliana genome 
Nucleic Acids Research  2004;32(Database issue):D368-D372.
Gene expression is controlled mainly by the binding of transcription factors to regulatory sequences. To generate a genomic map for regulatory sequences, the Arabidopsis thaliana genome was screened for putative transcription factor binding sites. Using publicly available data from the TRANSFAC database and from publications, alignment matrices for 23 transcription factors of 13 different factor families were used with the pattern search program Patser to determine the genomic positions of more than 2.4 × 10^6 putative binding sites. Due to the dense clustering of genes and the observation that regulatory sequences are not restricted to upstream regions, the prediction of binding sites was performed for the whole genome. The genomic positions and the underlying data were imported into the newly developed AthaMap database. These data can be accessed by positional information or the Arabidopsis Genome Initiative identification number. Putative binding sites are displayed in the defined region. Data on the matrices used and on the thresholds applied in these screens are given in the database. Considering the high density of sites it will be a valuable resource for generating models on gene expression regulation. The data are available at http://www.athamap.de.
doi:10.1093/nar/gkh017
PMCID: PMC308752  PMID: 14681436
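The general idea of screening a sequence with an alignment matrix and a score threshold can be sketched in Python. This is a generic log-odds scan, not a reimplementation of Patser: the uniform background of 0.25 per base, the pseudocount, the threshold, and the toy matrix are all assumptions made for illustration, and only the forward strand is scanned.

```python
import math


def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into log2 odds scores.

    counts is a list of dicts such as {"A": 10, "C": 0, ...}, one per
    motif position. Uniform background is an assumption of this sketch.
    """
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        matrix.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (position, window, score) for forward-strand windows
    scoring at or above the threshold; windows with non-ACGT are skipped."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width].upper()
        if any(b not in "ACGT" for b in window):
            continue
        score = sum(matrix[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, window, score))
    return hits


# Toy matrix built from ten aligned copies of "ACGT" (made-up data).
toy_counts = [{"A": 10}, {"C": 10}, {"G": 10}, {"T": 10}]
toy_matrix = log_odds_matrix(toy_counts)
print(scan("TTACGTAA", toy_matrix, threshold=5.0))
```

A whole-genome screen would also scan the reverse complement and would likely use a background that reflects the base composition of the genome rather than 0.25 per base.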
6.  AthaMap-assisted transcription factor target gene identification in Arabidopsis thaliana 
The AthaMap database generates a map of potential transcription factor binding sites (TFBS) and small RNA target sites in the Arabidopsis thaliana genome. The database contains sites for 115 different transcription factors (TFs). TFBS were identified with positional weight matrices (PWMs) or with single binding sites. With the new web tool ‘Gene Identification’, it is possible to identify potential target genes for selected TFs. For these analyses, the user can define a region of interest of up to 6000 bp in all annotated genes. For TFBS determined with PWMs, the search can be restricted to high-quality TFBS. The results are displayed in tables that identify the gene, position of the TFBS and, if applicable, individual score of the TFBS. In addition, data files can be downloaded that harbour positional information of TFBS of all TFs in a region between −2000 and +2000 bp relative to the transcription or translation start site. Also, data content of AthaMap was increased and the database was updated to the TAIR8 genome release.
Database URL: http://www.athamap.de/gene_ident.php
doi:10.1093/database/baq034
PMCID: PMC3011983  PMID: 21177332
7.  PathoPlant®: a platform for microarray expression data to analyze co-regulated genes involved in plant defense responses 
Nucleic Acids Research  2006;35(Database issue):D841-D845.
Plants react to pathogen attack by expressing specific proteins directed toward the infecting pathogens. This involves the transcriptional activation of specific gene sets. PathoPlant®, a database on plant–pathogen interactions and signal transduction reactions, has now been complemented by microarray gene expression data from Arabidopsis thaliana subjected to pathogen infection and elicitor treatment. New web tools enable identification of plant genes regulated by specific stimuli. Sets of genes co-regulated by multiple stimuli can be displayed as well. A user-friendly web interface was created for the submission of gene sets to be analyzed. This results in a table, listing the stimuli that act either inducing or repressing on the respective genes. The search can be restricted to certain induction factors to identify, e.g. strongly up- or down-regulated genes. Up to three stimuli can be combined with the option of induction factor restriction to determine similarly regulated genes. To identify common cis-regulatory elements in co-regulated genes, a resulting gene list can directly be exported to the AthaMap database for analysis. PathoPlant is freely accessible at .
doi:10.1093/nar/gkl835
PMCID: PMC1669748  PMID: 17099232
8.  AGRIS: Arabidopsis Gene Regulatory Information Server, an information resource of Arabidopsis cis-regulatory elements and transcription factors 
BMC Bioinformatics  2003;4:25.
Background
The gene regulatory information is hardwired in the promoter regions formed by cis-regulatory elements that bind specific transcription factors (TFs). Hence, establishing the architecture of plant promoters is fundamental to understanding gene expression. The determination of the regulatory circuits controlled by each TF and the identification of the cis-regulatory sequences for all genes have been identified as two of the goals of the Multinational Coordinated Arabidopsis thaliana Functional Genomics Project by the Multinational Arabidopsis Steering Committee (June 2002).
Results
AGRIS is an information resource of Arabidopsis promoter sequences, transcription factors and their target genes. AGRIS currently contains two databases, AtTFDB (Arabidopsis thaliana transcription factor database) and AtcisDB (Arabidopsis thaliana cis-regulatory database). AtTFDB contains information on approximately 1,400 transcription factors identified through motif searches and grouped into 34 families. AtTFDB links the sequence of the transcription factors with available mutants and, when known, with the possible genes they may regulate. AtcisDB consists of the 5' regulatory sequences of all 29,388 annotated genes with a description of the corresponding cis-regulatory elements. Users can search the databases for (i) promoter sequences, (ii) a transcription factor, (iii) the direct target genes of a specific transcription factor, or (iv) a regulatory network that consists of transcription factors and their target genes.
Conclusion
AGRIS provides the necessary software tools on Arabidopsis transcription factors and their putative binding sites on all genes to initiate the identification of transcriptional regulatory networks in the model dicotyledoneous plant Arabidopsis thaliana. AGRIS can be accessed from .
doi:10.1186/1471-2105-4-25
PMCID: PMC166152  PMID: 12820902
9.  AtPAN: an integrated system for reconstructing transcriptional regulatory networks in Arabidopsis thaliana 
BMC Genomics  2012;13:85.
Background
Construction of transcriptional regulatory networks (TRNs) is of priority concern in systems biology. Numerous high-throughput approaches, including microarray and next-generation sequencing, are extensively adopted to examine transcriptional expression patterns on the whole-genome scale; those data are helpful in reconstructing TRNs. Identifying transcription factor binding sites (TFBSs) in a gene promoter is the initial step in elucidating the transcriptional regulation mechanism. Because transcription factors usually co-regulate a common group of genes by forming regulatory modules with similar TFBSs, the combinatorial interactions of transcription factors must be modeled to reconstruct the gene regulatory networks.
Description
For systems biology applications, this work develops a novel database called Arabidopsis thaliana Promoter Analysis Net (AtPAN), capable of detecting TFBSs and their corresponding transcription factors (TFs) in a promoter or a set of promoters in Arabidopsis. For further analysis, according to the microarray expression data and literature, the co-expressed TFs and their target genes can be retrieved from AtPAN. Additionally, proteins interacting with the co-expressed TFs are also incorporated to reconstruct co-expressed TRNs. Moreover, combinatorial TFs can be detected by the frequency of TFBSs co-occurrence in a group of gene promoters. In addition, TFBSs in the conserved regions between the two input sequences or homologous genes in Arabidopsis and rice are also provided in AtPAN. The output can also suggest candidates for future wet-lab experiments.
Conclusions
AtPAN has a user-friendly input/output interface and provides a graphical view of the TRNs. This novel and creative resource is freely available online at http://AtPAN.itps.ncku.edu.tw/.
doi:10.1186/1471-2164-13-85
PMCID: PMC3314555  PMID: 22397531
10.  AGRIS: the Arabidopsis Gene Regulatory Information Server, an update 
Nucleic Acids Research  2010;39(Database issue):D1118-D1122.
The Arabidopsis Gene Regulatory Information Server (AGRIS; http://arabidopsis.med.ohio-state.edu/) provides a comprehensive resource for gene regulatory studies in the model plant Arabidopsis thaliana. Three interlinked databases, AtTFDB, AtcisDB and AtRegNet, furnish comprehensive and updated information on transcription factors (TFs), predicted and experimentally verified cis-regulatory elements (CREs) and their interactions, respectively. In addition to significant contributions in the identification of the entire set of TF–DNA interactions, which are the key to understand the gene regulatory networks that govern Arabidopsis gene expression, tools recently incorporated into AGRIS include the complete set of words length 5–15 present in the Arabidopsis genome and the integration of AtRegNet with visualization tools, such as the recently developed ReIN application. All the information in AGRIS is publicly available and downloadable upon registration.
doi:10.1093/nar/gkq1120
PMCID: PMC3013708  PMID: 21059685
11.  Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae 
BMC Plant Biology  2009;9:126.
Background
Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs.
Results
We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins.
Conclusion
Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs. The majority of discovered motifs match experimentally characterized cis-regulatory elements. These results provide a good starting point for further experimental analysis of plant seed-specific promoters and our methodology can be used to unravel more transcriptional regulatory mechanisms in plants and other eukaryotes.
doi:10.1186/1471-2229-9-126
PMCID: PMC2770497  PMID: 19843335
12.  Ab initio identification of putative human transcription factor binding sites by comparative genomics 
BMC Bioinformatics  2005;6:110.
Background
Understanding transcriptional regulation of gene expression is one of the greatest challenges of modern molecular biology. A central role in this mechanism is played by transcription factors, which typically bind to specific, short DNA sequence motifs usually located in the upstream region of the regulated genes. We discuss here a simple and powerful approach for the ab initio identification of these cis-regulatory motifs. The method we present integrates several elements: human-mouse comparison, statistical analysis of genomic sequences and the concept of coregulation. We apply it to a complete scan of the human genome.
Results
By using the catalogue of conserved upstream sequences collected in the CORG database we construct sets of genes sharing the same overrepresented motif (short DNA sequence) in their upstream regions both in human and in mouse. We perform this construction for all possible motifs from 5 to 8 nucleotides in length and then filter the resulting sets looking for two types of evidence of coregulation: first, we analyze the Gene Ontology annotation of the genes in the set, searching for statistically significant common annotations; second, we analyze the expression profiles of the genes in the set as measured by microarray experiments, searching for evidence of coexpression. The sets which pass one or both filters are conjectured to contain a significant fraction of coregulated genes, and the upstream motifs characterizing the sets are thus good candidates to be the binding sites of the TF's involved in such regulation.
In this way we find various known motifs and also some new candidate binding sites.
Conclusion
We have discussed a new integrated algorithm for the "ab initio" identification of transcription factor binding sites in the human genome. The method is based on three ingredients: comparative genomics, overrepresentation, different types of coregulation. The method is applied to a full-scan of the human genome, giving satisfactory results.
doi:10.1186/1471-2105-6-110
PMCID: PMC1097714  PMID: 15865625
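The first step of this approach, grouping genes by motifs of 5 to 8 nucleotides found upstream in both human and mouse, can be illustrated with a simplified Python sketch. It uses plain presence of a k-mer in both orthologous upstream sequences in place of the statistical overrepresentation test, which the abstract does not specify, and it omits the Gene Ontology and coexpression filters. The function names, input layout, and sequences are hypothetical.

```python
from collections import defaultdict


def kmers(sequence, k_min=5, k_max=8):
    """Return the set of all substrings of length k_min..k_max."""
    found = set()
    for k in range(k_min, k_max + 1):
        for i in range(len(sequence) - k + 1):
            found.add(sequence[i:i + k])
    return found


def gene_sets_by_shared_motif(human_upstream, mouse_upstream):
    """Map each motif to the genes whose human AND mouse upstream
    sequences both contain it.

    Presence in both species stands in for the overrepresentation
    criterion here (a simplification of this sketch).
    """
    gene_sets = defaultdict(set)
    for gene, human_seq in human_upstream.items():
        mouse_seq = mouse_upstream.get(gene)
        if mouse_seq is None:
            continue  # no orthologous upstream sequence available
        for motif in kmers(human_seq) & kmers(mouse_seq):
            gene_sets[motif].add(gene)
    return dict(gene_sets)


# Made-up upstream sequences keyed by placeholder gene names.
human = {"g1": "AAACGTGAA", "g2": "TTACGTGTT"}
mouse = {"g1": "CCACGTGCC", "g2": "GGACGTGGG"}
print(gene_sets_by_shared_motif(human, mouse))
```

In a fuller version, each resulting gene set would then be tested for shared annotations or correlated expression before its motif is treated as a candidate binding site.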
13.  PlantPAN: Plant promoter analysis navigator, for identifying combinatorial cis-regulatory elements with distance constraint in plant gene groups 
BMC Genomics  2008;9:561.
Background
The elucidation of transcriptional regulation in plant genes is an important area of research for plant scientists, following the mapping of various plant genomes, such as A. thaliana, O. sativa and Z. mays. A variety of bioinformatic servers or databases of plant promoters have been established, although most have been focused only on annotating transcription factor binding sites in a single gene and have neglected some important regulatory elements (tandem repeats and CpG/CpNpG islands) in promoter regions. Additionally, the combinatorial interaction of transcription factors (TFs) is important in regulating the gene group that is associated with the same expression pattern. Therefore, a tool for detecting the co-regulation of transcription factors in a group of gene promoters is required.
Results
This study develops a database-assisted system, PlantPAN (Plant Promoter Analysis Navigator), for recognizing combinatorial cis-regulatory elements with a distance constraint in sets of plant genes. The system collects the plant transcription factor binding profiles from PLACE, TRANSFAC (public release 7.0), AGRIS, and JASPAR databases and allows users to input a group of gene IDs or promoter sequences, enabling the co-occurrence of combinatorial transcription factor binding sites (TFBSs) within a defined distance (20 bp to 200 bp) to be identified. Furthermore, the new resource enables other regulatory features in a plant promoter, such as CpG/CpNpG islands and tandem repeats, to be displayed. The regulatory elements in the conserved regions of the promoters across homologous genes are detected and presented.
Conclusion
In addition to providing a user-friendly input/output interface, PlantPAN has numerous advantages in the analysis of a plant promoter. Several case studies have established the effectiveness of PlantPAN. This novel analytical resource is now freely available at .
doi:10.1186/1471-2164-9-561
PMCID: PMC2633311  PMID: 19036138
14.  Discovery of Stress Responsive DNA Regulatory Motifs in Arabidopsis 
PLoS ONE  2012;7(8):e43198.
The discovery of DNA regulatory motifs in the sequenced genomes using computational methods remains challenging. Here, we present MotifIndexer - a comprehensive strategy for de novo identification of DNA regulatory motifs at a genome level. Using word-counting methods, we indexed the existence of every 8-mer oligo composed of bases A, C, G, T, r, y, s, w, m, k, n or 12-mer oligo composed of A, C, G, T, n, in the promoters of all predicted genes of Arabidopsis thaliana genome and of selected stress-induced co-expressed genes. From this analysis, we identified a number of over-represented motifs. Among these, major critical motifs were identified using a position filter. We used a model based on uniform distribution and the z-scores derived from this model to describe position bias. Interestingly, many motifs showed position bias towards the transcription start site. We extended this model to show biased distribution of motifs in the genomes of both A. thaliana and rice. We also used MotifIndexer to identify conserved motifs in co-expressed gene groups from two Arabidopsis species, A. thaliana and A. lyrata. This new comparative genomics method does not depend on alignments of homologous gene promoter sequences.
doi:10.1371/journal.pone.0043198
PMCID: PMC3418279  PMID: 22912824
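Two ideas from this abstract, matching degenerate words written with the codes r, y, s, w, m, k, n and scoring position bias against a uniform model, can be sketched in Python. This is not MotifIndexer: the binning scheme and the binomial form of the z-score are assumptions of this sketch, since the abstract does not give the exact formula, and all names and sequences are made up.

```python
import math

# Standard IUPAC meanings of the degenerate codes named in the abstract.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "M": "AC", "K": "GT", "N": "ACGT",
}


def matches(motif, word):
    """True if every base of word is allowed by the motif code at that position."""
    return len(motif) == len(word) and all(
        base in IUPAC[code] for code, base in zip(motif.upper(), word.upper())
    )


def occurrences(motif, promoter):
    """Start positions (0-based) where the degenerate motif matches."""
    width = len(motif)
    return [
        i for i in range(len(promoter) - width + 1)
        if matches(motif, promoter[i:i + width])
    ]


def position_bias_z(positions, promoter_length, motif_length, n_bins):
    """Per-bin z-scores assuming start positions are uniform over bins.

    Each of the n occurrences falls in a given bin with probability
    1/n_bins, so the bin count has mean n/n_bins and variance
    n * (1/n_bins) * (1 - 1/n_bins). This binomial form is an assumption.
    """
    n_starts = promoter_length - motif_length + 1
    counts = [0] * n_bins
    for p in positions:
        counts[p * n_bins // n_starts] += 1
    n = len(positions)
    p_bin = 1.0 / n_bins
    sd = math.sqrt(n * p_bin * (1.0 - p_bin))
    return [(c - n * p_bin) / sd if sd > 0 else 0.0 for c in counts]


print(occurrences("ACGTGkC", "TTACGTGGCAA"))
print(position_bias_z([1, 2, 3, 4], promoter_length=107, motif_length=8, n_bins=4))
```

A large positive z-score in the bin nearest the transcription start site would correspond to the kind of position bias the abstract reports, under the assumptions stated above.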
15.  Integration of Arabidopsis thaliana stress-related transcript profiles, promoter structures, and cell-specific expression 
Genome Biology  2007;8(4):R49.
The integration of stress-dependent, tissue- and cell-specific expression profiles and 5'-regulatory sequence motif analysis defines a common stress transcriptome, identifies major motifs for stress response, and places stress response in the context of tissue and cell lineages in the Arabidopsis root.
Background
Arabidopsis thaliana transcript profiles indicate effects of abiotic and biotic stresses and tissue-specific and cell-specific gene expression. Organizing these datasets could reveal the structure and mechanisms of responses and crosstalk between pathways, and in which cells the plants perceive, signal, respond to, and integrate environmental inputs.
Results
We clustered Arabidopsis transcript profiles for various treatments, including abiotic, biotic, and chemical stresses. Ubiquitous stress responses in Arabidopsis, similar to those of fungi and animals, employ genes in pathways related to mitogen-activated protein kinases, Snf1-related kinases, vesicle transport, mitochondrial functions, and the transcription machinery. Induced responses to stresses are attributed to genes whose promoters are characterized by a small number of regulatory motifs, although secondary motifs were also apparent. Most genes that are downregulated by stresses exhibited distinct tissue-specific expression patterns and appear to be under developmental regulation. The abscisic acid-dependent transcriptome is delineated in the cluster structure, whereas functions that are dependent on reactive oxygen species are widely distributed, indicating that evolutionary pressures confer distinct responses to different stresses in time and space. Cell lineages in roots express stress-responsive genes at different levels. Intersections of stress-responsive and cell-specific profiles identified cell lineages affected by abiotic stress.
Conclusion
By analyzing the stress-dependent expression profile, we define a common stress transcriptome that apparently represents universal cell-level stress responses. Combining stress-dependent and tissue-specific and cell-specific expression profiles, and Arabidopsis 5'-regulatory DNA sequences, we confirm known stress-related 5' cis-elements on a genome-wide scale, identify secondary motifs, and place the stress response within the context of tissues and cell lineages in the Arabidopsis root.
doi:10.1186/gb-2007-8-4-r49
PMCID: PMC1896000  PMID: 17408486
16.  Extraction of transcription regulatory signals from genome-wide DNA–protein interaction data 
Nucleic Acids Research  2005;33(2):605-615.
Deciphering gene regulatory network architecture amounts to the identification of the regulators, conditions in which they act, genes they regulate, cis-acting motifs they bind, expression profiles they dictate and more complex relationships between alternative regulatory partnerships and alternative regulatory motifs that give rise to sub-modalities of expression profiles. The ‘location data’ in yeast is a comprehensive resource that provides transcription factor–DNA interaction information in vivo. Here, we provide two contributions: first, we developed means to assess the extent of noise in the location data, and consequently for extracting signals from it. Second, we couple signal extraction with better characterization of the genetic network architecture. We apply two methods for the detection of combinatorial associations between transcription factors (TFs), the integration of which provides a global map of combinatorial regulatory interactions. We discover the capacity of regulatory motifs and TF partnerships to dictate fine-tuned expression patterns of subsets of genes, which are clearly distinct from those displayed by most genes assigned to the same TF. Our findings provide carefully prioritized, high-quality assignments between regulators and regulated genes and as such should prove useful for experimental and computational biologists alike.
doi:10.1093/nar/gki166
PMCID: PMC548334  PMID: 15684410
17.  Identification of novel regulatory modules in dicotyledonous plants using expression data and comparative genomics 
Genome Biology  2006;7(11):R103.
A strategy combining classical motif overrepresentation in co-regulated genes with comparative footprinting is applied to identify 80 transcription factor binding sites and 139 regulatory modules in Arabidopsis thaliana.
Background
Transcriptional regulation plays an important role in the control of many biological processes. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity and are organized into separable cis-regulatory modules, each defining the cooperation of several transcription factors required for a specific spatio-temporal expression pattern. Consequently, the discovery of novel TFBSs in promoter sequences is an important step to improve our understanding of gene regulation.
Results
Here, we applied a detection strategy that combines features of classic motif overrepresentation approaches in co-regulated genes with general comparative footprinting principles for the identification of biologically relevant regulatory elements and modules in Arabidopsis thaliana, a model system for plant biology. In total, we identified 80 TFBSs and 139 regulatory modules, most of which are novel, and primarily consist of two or three regulatory elements that could be linked to different important biological processes, such as protein biosynthesis, cell cycle control, photosynthesis and embryonic development. Moreover, studying the physical properties of some specific regulatory modules revealed that Arabidopsis promoters have a compact nature, with cooperative TFBSs located in close proximity of each other.
Conclusion
These results create a starting point to unravel regulatory networks in plants and to study the regulation of biological processes from a systems biology point of view.
doi:10.1186/gb-2006-7-11-r103
PMCID: PMC1794593  PMID: 17090307
18.  A survey of DNA motif finding algorithms 
BMC Bioinformatics  2007;8(Suppl 7):S21.
Background
Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments that are called motifs. Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade. This survey reviews the latest developments in DNA motif finding algorithms.
Results
Earlier algorithms use promoter sequences of coregulated genes from single genome and search for statistically overrepresented motifs. Recent algorithms are designed to use phylogenetic footprinting or orthologous sequences and also an integrated approach where promoter sequences of coregulated genes and phylogenetic footprinting are used. All the algorithms studied have been reported to correctly detect the motifs that have been previously detected by laboratory experimental approaches, and some algorithms were able to find novel motifs. However, most of these motif finding algorithms have been shown to work successfully in yeast and other lower organisms, but perform significantly worse in higher organisms.
Conclusion
Despite considerable efforts to date, DNA motif finding remains a complex challenge for biologists and computer scientists. Researchers have taken many different approaches in developing motif discovery tools and the progress made in this area of research is very encouraging. Performance comparison of different motif finding tools and identification of the best tools have proven to be a difficult task because tools are designed based on algorithms and motif models that are diverse and complex and our incomplete understanding of the biology of regulatory mechanism does not always provide adequate evaluation of underlying algorithms over motif models.
doi:10.1186/1471-2105-8-S7-S21
PMCID: PMC2099490  PMID: 18047721
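The idea of searching promoter sequences of coregulated genes for statistically overrepresented motifs can be illustrated with a minimal Python sketch. It is not one of the algorithms covered by the survey: it simply counts every k-mer in a foreground set and ranks the k-mers by a log2 enrichment ratio against a background set. The function names, the pseudocount and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def kmer_counts(sequences, k):
    """Count every k-mer made only of A, C, G, T across the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def overrepresented_kmers(foreground, background, k=6, pseudocount=1.0):
    """Rank foreground k-mers by log2(foreground freq / background freq).

    The pseudocount (an assumption of this sketch) avoids division by
    zero for k-mers that never occur in the background.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_kmers = 4 ** k
    scores = {}
    for kmer, count in fg.items():
        fg_freq = (count + pseudocount) / (fg_total + pseudocount * n_kmers)
        bg_freq = (bg.get(kmer, 0) + pseudocount) / (bg_total + pseudocount * n_kmers)
        scores[kmer] = log2(fg_freq / bg_freq)
    # Highest enrichment first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): three "promoters" sharing the core GACGTC
foreground = ["TTGACGTCAATT", "GGGACGTCACCC", "ATGACGTCAGAT"]
background = ["ATATATATATAT", "GCGCGCGCGCGC", "AATTCCGGAATT"]
ranked = overrepresented_kmers(foreground, background, k=6)
```

A real tool would replace the raw ratio with a proper significance test and would allow degenerate positions, which exact k-mer counting cannot capture.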
19.  MEME Suite: tools for motif discovery and searching 
Nucleic Acids Research  2009;37(Web Server issue):W202-W208.
The MEME Suite web server provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm which allows discovery of motifs containing gaps. Three sequence scanning algorithms—MAST, FIMO and GLAM2SCAN—allow scanning numerous DNA and protein sequence databases for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm Tomtom. Transcription factor motifs can be further analyzed for putative function by association with Gene Ontology (GO) terms using the motif-GO term association tool GOMO. MEME output now contains sequence LOGOS for each discovered motif, as well as buttons to allow motifs to be conveniently submitted to the sequence and motif database scanning algorithms (MAST, FIMO and Tomtom), or to GOMO, for further analysis. GLAM2 output similarly contains buttons for further analysis using GLAM2SCAN and for rerunning GLAM2 with different parameters. All of the motif-based tools are now implemented as web services via Opal. Source code, binaries and a web server are freely available for noncommercial use at http://meme.nbcr.net.
doi:10.1093/nar/gkp335
PMCID: PMC2703892  PMID: 19458158
20.  DAtA: Database of Arabidopsis thaliana Annotation 
Nucleic Acids Research  2000;28(1):102-103.
The Database of Arabidopsis thaliana Annotation (DAtA) was created to enable easy access to and analysis of all the Arabidopsis genome project annotation. The database was constructed using the completed A. thaliana genomic sequence data currently in GenBank. An automated annotation process was used to predict coding sequences for GenBank records that do not include annotation. DAtA also contains protein motifs and protein similarities derived from searches of the proteins in DAtA with motif databases and the non-redundant protein database. The database is routinely updated to include new GenBank submissions for Arabidopsis genomic sequences and new Blast and protein motif search results. A web interface to DAtA allows coding sequences to be searched by name, comment, blast similarity or motif field. In addition, browse options present lists of either all the protein names or identified motifs present in the sequenced A. thaliana genome. The database can be accessed at http://baggage.stanford.edu/group/arabprotein/.
PMCID: PMC102487  PMID: 10592193
21.  Genome analysis: RNA recognition motif (RRM) and K homology (KH) domain RNA-binding proteins from the flowering plant Arabidopsis thaliana 
Nucleic Acids Research  2002;30(3):623-635.
Regulation of gene expression at the post-transcriptional level is mainly achieved by proteins containing well-defined sequence motifs involved in RNA binding. The most widely spread motifs are the RNA recognition motif (RRM) and the K homology (KH) domain. In this article, we survey the complete Arabidopsis thaliana genome for proteins containing RRM and KH RNA-binding domains. The Arabidopsis genome encodes 196 RRM-containing proteins, a more complex set than found in Caenorhabditis elegans and Drosophila melanogaster. In addition, the Arabidopsis genome contains 26 KH domain proteins. Most of the Arabidopsis RRM-containing proteins can be classified into structural and/or functional groups, based on similarity with either known metazoan or Arabidopsis proteins. Approximately 50% of Arabidopsis RRM-containing proteins do not have obvious homologues in metazoa, and for most of those that are predicted to be orthologues of metazoan proteins, no experimental data exist to confirm this. Additionally, the function of most Arabidopsis RRM proteins and of all KH proteins is unknown. Based on the data presented here, it is evident that among all eukaryotes, only those RNA-binding proteins that are involved in the most essential processes of post-transcriptional gene regulation are preserved in structure and, most probably, in function. However, the higher complexity of RNA-binding proteins in Arabidopsis, as evident in groups of SR splicing factors and poly(A)-binding proteins, may account for the observed differences in mRNA maturation between plants and metazoa. This survey provides a first systematic analysis of plant RNA-binding proteins, which may serve as a basis for functional characterisation of this important protein group in plants.
PMCID: PMC100298  PMID: 11809873
22.  PlantQTL-GE: a database system for identifying candidate genes in rice and Arabidopsis by gene expression and QTL information 
Nucleic Acids Research  2006;35(Database issue):D879-D882.
We have designed and implemented a web-based database system, called PlantQTL-GE, to facilitate quantitative trait locus (QTL) based candidate gene identification and gene function analysis. We collected a large number of genes, gene expression information in microarray data and expressed sequence tags (ESTs), and genetic markers from multiple sources for Oryza sativa and Arabidopsis thaliana. The system integrates these diverse data sources and has a uniform web interface for easy access. It supports QTL queries specifying QTL marker intervals or genomic loci, and displays, on the rice or Arabidopsis genome, known genes, microarray data, ESTs and candidate genes, as well as similar putative genes in the other plant. Candidate genes in QTL intervals are further annotated based on matching ESTs, microarray gene expression data and cis-elements in regulatory sequences. The system is freely available at .
doi:10.1093/nar/gkl814
PMCID: PMC1669735  PMID: 17142239
23.  Cis-motifs upstream of the transcription and translation initiation sites are effectively revealed by their positional disequilibrium in eukaryote genomes using frequency distribution curves 
BMC Bioinformatics  2006;7:522.
Background
The discovery of cis-regulatory motifs still remains a challenging task even though the number of sequenced genomes is constantly growing. Computational analyses using pattern search algorithms have been valuable in phylogenetic footprinting approaches as have expression profile experiments to predict co-occurring motifs. Surprisingly little is known about the nature of cis-regulatory element (CRE) distribution in promoters.
Results
In this paper we used the Motif Mapper open-source collection of Visual Basic scripts for the analysis of motifs in any aligned set of DNA sequences. We focused on promoter motif distribution curves to identify positional over-representation of DNA motifs. Using differentially aligned datasets from the model species Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster and Saccharomyces cerevisiae, we demonstrated the importance of position and orientation for motif discovery. Analysis with known CREs and all possible hexanucleotides showed that some functional elements cluster close to the transcription and translation initiation sites and that elements other than the TATA-box motif are conserved between eukaryote promoters. While a high background frequency usually decreases the effectiveness of such an enumerative investigation, we improved our analysis by constructing motif distribution maps from large datasets.
Conclusion
This is the first study to reveal positional over-representation of CREs and promoter motifs in a cross-species approach. CREs and motifs shared between eukaryotic promoters support the observation that a eukaryotic promoter structure has been conserved throughout evolutionary time. Furthermore, with information on the positional enrichment of a motif or a known functional CRE, it is possible to gain more detailed insight into where an element appears to function. This in turn might accelerate the in-depth examination of known and as yet unknown cis-regulatory sequences in the laboratory.
doi:10.1186/1471-2105-7-522
PMCID: PMC1698937  PMID: 17137509
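A motif distribution curve of the kind described in this abstract can be approximated with a short Python sketch. This is not the Motif Mapper code (which the abstract describes as Visual Basic scripts); it only assumes that the promoters are equal-length strings aligned on a common reference point such as the transcription start site, and it counts motif occurrences per positional bin on the forward strand. The function name, the bin size and the toy sequences are hypothetical.

```python
def positional_distribution(aligned_promoters, motif, bin_size=10):
    """Count occurrences of `motif` per positional bin across aligned promoters.

    Assumes all sequences have the same length and share one alignment
    point, so index i means the same promoter position in every sequence.
    Only the forward strand is scanned.
    """
    if not aligned_promoters:
        return []
    length = len(aligned_promoters[0])
    if any(len(seq) != length for seq in aligned_promoters):
        raise ValueError("all promoters must have the same length")
    motif = motif.upper()
    n_bins = (length + bin_size - 1) // bin_size  # ceiling division
    bins = [0] * n_bins
    for seq in aligned_promoters:
        seq = seq.upper()
        start = seq.find(motif)
        while start != -1:
            bins[start // bin_size] += 1
            # Step by one so overlapping occurrences are also counted
            start = seq.find(motif, start + 1)
    return bins


# Toy data (hypothetical): a TATA-like hexamer enriched in the middle bin
promoters = ["ACGTTATAAAGG", "CCGGTATAAATT", "TATAAACCGGTT"]
curve = positional_distribution(promoters, "TATAAA", bin_size=4)
```

A peak in the returned list marks a position where the motif is over-represented relative to the other bins. Scanning the reverse complement as well would account for the orientation effect the abstract mentions.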
24.  Small interfering peptides as a novel way of transcriptional control 
Plant Signaling & Behavior  2008;3(9):615-617.
Transcription factors are key components of transcriptional regulatory networks governing virtually all aspects of plant growth and developmental processes. Their activities are regulated at various steps, including gene transcription, posttranscriptional mRNA metabolism, posttranslational modifications, nucleocytoplasmic transport, and controlled proteolytic cleavage of membrane-anchored, dormant forms. Dynamic protein dimerization also plays a critical role in this process. An exquisite regulatory scheme has recently been proposed to modulate the action of transcription factors. Small peptides possessing a protein dimerization motif but lacking the DNA-binding motif form nonfunctional heterodimers with a group of specific transcription factors (TFs), inhibiting their transcriptional activation activities. Extensive database searches for small proteins with a similar structural organization revealed that small peptide-mediated transcription control is not an exceptional case but may be a regulatory mechanism that is widespread in the Arabidopsis genome.
PMCID: PMC2634540  PMID: 19513250
Arabidopsis; flowering time; HD-ZIP III; homodimer; transcription factor; ZPR
25.  Discovering common stem–loop motifs in unaligned RNA sequences 
Nucleic Acids Research  2001;29(10):2135-2144.
Post-transcriptional regulation of gene expression is often accomplished by proteins binding to specific sequence motifs in mRNA molecules, to affect their translation or stability. The motifs are often composed of a combination of sequence and structural constraints such that the overall structure is preserved even though much of the primary sequence is variable. While several methods exist to discover transcriptional regulatory sites in the DNA sequences of coregulated genes, the RNA motif discovery problem is much more difficult because of covariation in the positions. We describe the combined use of two approaches for RNA structure prediction, FOLDALIGN and COVE, that together can discover and model stem–loop RNA motifs in unaligned sequences, such as UTRs from post-transcriptionally coregulated genes. We evaluate the method on two datasets, one a section of rRNA genes with randomly truncated ends so that a global alignment is not possible, and the other a hyper-variable collection of IRE-like elements that were inserted into randomized UTR sequences. In both cases the combined method identified the motifs correctly, and in the rRNA example we show that it is capable of determining the structure, which includes bulge and internal loops as well as a variable length hairpin loop. Those automated results are quantitatively evaluated and found to agree closely with structures contained in curated databases, with correlation coefficients up to 0.9. A basic server, Stem–Loop Align SearcH (SLASH), which will perform stem–loop searches in unaligned RNA sequences, is available at http://www.bioinf.au.dk/slash/.
PMCID: PMC55461  PMID: 11353083
